Conditional sum of multiple columns based on multiple (other) columns

I have a data frame of the form below:

ID <- c(1, 2, 3, 4, 5)
Type1 <- c("A", "", "A", "B", "C")
Count1 <- c(40, NA, 10, 5, 100)
Type2 <- c("D", "", "", "C", "D")
Count2 <- c(5, NA, NA, 30, 5)
Type3 <- c("E", "", "", "D", "")
Count3 <- c(10, NA, NA, 5, NA)
df <- data.frame(ID, Type1, Count1, Type2, Count2, Type3, Count3)

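I would like to sum the values in the “Count” columns that share the same “Type”. Each Type column is paired with the Count column of the same number, so whenever a type appears in Type1, Type2, or Type3, the corresponding value in Count1, Count2, or Count3 should be added to that type's total. Empty types (with NA counts) should be ignored.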

Ideally, I could get an output of the form below:

Type <- c("A", "B", "C", "D", "E")
n <- c(2, 1, 2, 3, 1)
Total <- c(50, 5, 130, 15, 10)

result <- data.frame(Type, n, Total)

I was able to achieve this using the following code, but it’s quite clunky. I’m sure there is a more elegant method!

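library(dplyr)

# Split the wide data into one Type/Count pair per data frame
df1 <- data.frame(Type1, Count1)
df2 <- data.frame(Type2, Count2)
df3 <- data.frame(Type3, Count3)

# Give every pair the same column names so they can be stacked
colnames(df1) <- c("Type", "Count")
colnames(df2) <- c("Type", "Count")
colnames(df3) <- c("Type", "Count")

df_all <- rbind(df1, df2, df3)

# Drop the empty slots (Type == "", Count is NA), then count and sum per type
result <- df_all %>%
     filter(Type != "") %>%
     group_by(Type) %>%
     summarize(num = n(),
               total = sum(Count))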
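
One possible shortening of the stacking step above, as a minimal sketch: it assumes the tidyr package is available alongside dplyr (tidyr is not used in the original code) and that every column other than ID follows the Type/Count-plus-number naming. The `slot` column name is made up here for illustration. `pivot_longer()` with the special `".value"` name reshapes all Type/Count pairs into two long columns in one call, replacing the three intermediate data frames.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; the original code only uses dplyr

# Same example data as in the question
df <- data.frame(
  ID     = c(1, 2, 3, 4, 5),
  Type1  = c("A", "", "A", "B", "C"),
  Count1 = c(40, NA, 10, 5, 100),
  Type2  = c("D", "", "", "C", "D"),
  Count2 = c(5, NA, NA, 30, 5),
  Type3  = c("E", "", "", "D", ""),
  Count3 = c(10, NA, NA, 5, NA)
)

result <- df %>%
  # ".value" keeps "Type" and "Count" as column names; the trailing
  # digit goes into a helper column ("slot" is a hypothetical name)
  pivot_longer(
    cols = -ID,
    names_to = c(".value", "slot"),
    names_pattern = "(Type|Count)(\\d+)"
  ) %>%
  # Drop the empty slots so they do not form their own group
  filter(Type != "") %>%
  group_by(Type) %>%
  summarize(n = n(), Total = sum(Count), .groups = "drop")

print(result)
```

If some non-empty types could have a missing count, `sum(Count, na.rm = TRUE)` would be the safer choice; with the example data every non-empty type has a count, so it makes no difference here.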
