Recognizing comma-separated rows as duplicates

I have a dataset in R that looks like this:

sample_table = data.frame( colors = c("red", "blue", "red,blue", "blue, red"), counts = c(12, 10, 5,6))

     colors counts
1       red     12
2      blue     10
3  red,blue      5
4 blue, red      6

I want to recognize that “blue, red” and “red,blue” are the same combination, and sum their counts into a single row:

    colors counts
1      red     12
2     blue     10
3 red,blue     11

Is there a standard way to do this in R that also works for more than two colors (e.g. “red, blue, green” = “green, blue, red”)?

I did this manually:

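standardize_colors <- function(color_string) {
    # Split on a comma plus optional whitespace; the backslash must be doubled inside an R string
    colors <- unlist(strsplit(color_string, ",\\s*"))
    # Sort so that the order of the colors no longer matters
    return(paste(sort(colors), collapse = ","))
}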

sample_table$standardized_colors <- sapply(sample_table$colors, standardize_colors)

aggregated_table <- aggregate(counts ~ standardized_colors, data = sample_table, sum)

print(aggregated_table)

Is there a more efficient way to do this?
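
One possible direction, as a rough base R sketch rather than a definitive answer: split the whole column with a single strsplit() call, build a sorted key per row with vapply(), and sum with rowsum(). The two three-color rows and the names parts, key and totals are made up here for illustration. as.character() is only a guard in case colors is stored as a factor.

```r
# Toy data from the question, plus one three-color pair (added for illustration)
sample_table <- data.frame(
  colors = c("red", "blue", "red,blue", "blue, red",
             "red, blue, green", "green,blue,red"),
  counts = c(12, 10, 5, 6, 1, 2)
)

# Split every string in one call; strsplit() is vectorized over its input
parts <- strsplit(as.character(sample_table$colors), ",", fixed = TRUE)

# Trim whitespace, sort, and re-join each set of colors into a canonical key
key <- vapply(parts,
              function(p) paste(sort(trimws(p)), collapse = ","),
              character(1))

# rowsum() adds up the counts within each key and returns a one-column matrix
totals <- rowsum(sample_table$counts, group = key)

aggregated_table <- data.frame(colors = rownames(totals),
                               counts = totals[, 1],
                               row.names = NULL)
print(aggregated_table)
```

Because the colors are sorted alphabetically, the combined row is labelled "blue,red" rather than "red,blue"; its count is 5 + 6 = 11. Whether this is noticeably faster than the sapply()/aggregate() version depends on the size of the data, so it is worth timing on the real table.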
