I have a dataset in R that looks like this:
sample_table = data.frame( colors = c("red", "blue", "red,blue", "blue, red"), counts = c(12, 10, 5,6))
colors counts
1 red 12
2 blue 10
3 red,blue 5
4 blue, red 6
I want to treat “red,blue” and “blue, red” as the same combination and sum their counts into a single row:
colors counts
1 red 12
2 blue 10
3 red,blue 11
Is there a standard way to do this in R that also works for more than two colors (e.g. “red, blue, green” = “green, blue, red”)?
I did this manually:
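standardize_colors <- function(color_string) {
  # Split on commas plus any following whitespace; the backslash must be doubled in an R string
  colors <- unlist(strsplit(color_string, ",\\s*"))
  # Sort the parts so that every ordering maps to the same key
  return(paste(sort(colors), collapse = ","))
}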
sample_table$standardized_colors <- sapply(sample_table$colors, standardize_colors)
aggregated_table <- aggregate(counts ~ standardized_colors, data = sample_table, sum)
print(aggregated_table)
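For comparison, the same normalize-then-sum idea can be written without the extra column. This is only a sketch, assuming base R; `normalize_colors` is a hypothetical helper name, not a function from any package, and it trims whitespace with `trimws()` instead of relying on the regular expression.

```r
# Sketch only: normalize_colors is a hypothetical helper, not from any package.
sample_table <- data.frame(
  colors = c("red", "blue", "red,blue", "blue, red"),
  counts = c(12, 10, 5, 6)
)

normalize_colors <- function(x) {
  # Split each string on literal commas, trim whitespace, sort, and rejoin.
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) paste(sort(trimws(p)), collapse = ","), character(1))
}

# rowsum() adds up the counts within each normalized key.
totals <- rowsum(sample_table$counts, normalize_colors(sample_table$colors))
result <- data.frame(
  colors = rownames(totals),
  counts = unname(totals[, 1]),
  row.names = NULL
)
print(result)
```

Because the parts are sorted alphabetically, the combined key comes out as "blue,red" rather than "red,blue", and this does not preserve the original row order.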
Is there a more efficient way to do this?