I have a dataframe as below. I want to group the data by recid and add column count1 and count2 such that all the values are counted in both col1 and col2. So example count1 in row 1 will be 2 because 10 occurs 2 times in both the columns and Freqcol1 will then be 2/4

ID col1 col2 recid
 1   10   12 abc_12
 2   10   15 abc_12
 3   10   10 def_34
Desired output:

ID col1 col2 recid count1 count2 Freqcol1 Freqcol2
 1   10   12 abc_12     2     1     0.5     1
 2   10   15 abc_12     2     1     0.5     1
 3   10   10 def_34     2     2     0.5     0.5

My code from Calculate the occurance of a number in two columns

df %>%
  pivot_longer(-ID) %>%
  mutate(count = n(), .by = value) %>%
  mutate(freq = count / n()) %>%
  pivot_wider(values_from = c(value, count, freq))

5

Another dplyr method, no pivoting, dynamic to the number of col# columns (in case that’s useful):

library(dplyr)
quux %>%
  mutate(
    across(starts_with("col"),
           ~ colSums(sapply(.x, `==`, c(col1, col2))),
           .names = "{sub('col', 'count', .col)}"),
    across(starts_with("count"), 
           ~ .x / sum(.x), 
           .names = "{sub('count', 'freq', .col)}"),
    .by = recid
  )
#   ID col1 col2  recid count1 count2 freq1 freq2
# 1  1   10   12 abc_12      2      1   0.5   0.5
# 2  2   10   15 abc_12      2      1   0.5   0.5
# 3  3   10   10 def_34      2      2   1.0   1.0

Data

quux <- structure(list(ID = 1:3, col1 = c(10L, 10L, 10L), col2 = c(12L, 15L, 10L), recid = c("abc_12", "abc_12", "def_34")), class = "data.frame", row.names = c(NA, -3L))

Recognized by R Language Collective

I think what you want is

df %>% 
  pivot_longer(col1:col2) %>% 
  mutate(count=n(), .by=c(recid, value)) %>%
  mutate(freq = 1/count, .by=c(recid, name)) %>% 
  pivot_wider(names_from=name, values_from=c(value, count, freq))

which returns

# A tibble: 3 × 8
     ID recid  value_col1 value_col2 count_col1 count_col2 freq_col1 freq_col2
  <int> <chr>       <int>      <int>      <int>      <int>     <dbl>     <dbl>
1     1 abc_12         10         12          2          1       0.5       1  
2     2 abc_12         10         15          2          1       0.5       1  
3     3 def_34         10         10          2          2       0.5       0.5

for the sample data

df <- read.table(text="ID col1 col2 recid
1   10   12 abc_12
2   10   15 abc_12
3   10   10 def_34", header=T)

This produces slightly different column names by you can rename them in a later step if required.

Recognized by R Language Collective

1

library(dplyr)

df %>%
  group_by(recid) %>%
  mutate(
    count1 = sum(col1 == col1[1]),  # Count occurrences of col1 value within group
    count2 = sum(col2 == col2[1]),  # Count occurrences of col2 value within group
    Freqcol1 = count1 / (count1 + count2),  # Calculate frequency for col1
    Freqcol2 = count2 / (count1 + count2)  # Calculate frequency for col2
  ) %>%
  ungroup()  # Release grouping
A tibble: 3 × 8
     ID  col1  col2 recid  count1 count2 Freqcol1 Freqcol2
  <dbl> <dbl> <dbl> <chr>   <int>  <int>    <dbl>    <dbl>
1     1    10    12 abc_12      2      1    0.667    0.333
2     2    10    15 abc_12      2      1    0.667    0.333
3     3    10    10 def_34      1      1    0.5      0.5  

1

Khám phá các thẻ bài đăng