Can you recommend some methods for tidying a dataset using the tidyverse or dplyr packages? [duplicate]
This question already has answers here: Reshaping wide to long with multiple values columns [duplicate] (5 answers) Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (8 answers) Closed 1 hour ago. I’m seeking advice on how to tidy this dataset. My objective is to arrange the columns in the following […]
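The linked duplicates cover reshaping wide data with multiple sets of value columns. A minimal sketch of that pattern with `tidyr::pivot_longer()`, using hypothetical column names (`score_*`, `time_*`) since the question's excerpt is truncated:

```r
library(tidyr)

# Hypothetical wide data: one row per id, two measurements per visit
df <- data.frame(
  id      = 1:2,
  score_1 = c(10, 20), score_2 = c(11, 21),
  time_1  = c(5, 6),   time_2  = c(7, 8)
)

# names_to = c(".value", "visit") splits "score_1" into a value
# column ("score") and a key column ("visit")
long <- pivot_longer(
  df,
  cols      = -id,
  names_to  = c(".value", "visit"),
  names_sep = "_"
)
long
```

The `".value"` sentinel is what lets several measurement columns land in separate value columns instead of one stacked column.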
How to summarize counts based on multiple conditions in R
I would like to summarize the number of each species that meet a certain condition for further analysis.
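Since the question doesn't show its data, here is a hedged sketch with invented columns (`species`, `weight`) showing the usual dplyr idiom: `sum()` over a logical vector counts how many rows meet a condition per group.

```r
library(dplyr)

# Hypothetical data: one row per observation
obs <- data.frame(
  species = c("a", "a", "b", "b", "b"),
  weight  = c(5, 12, 8, 15, 20)
)

# Count, per species, how many rows satisfy the condition;
# sum() over a logical vector counts the TRUEs
heavy_counts <- obs %>%
  group_by(species) %>%
  summarise(n_heavy = sum(weight > 10, na.rm = TRUE))
heavy_counts
```

Multiple conditions combine with `&`/`|` inside the `sum()`, or become separate `summarise()` columns.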
How to count the number of “No” occurrences across multiple columns in R [duplicate]
This question already has answers here: Compute row-wise counts in subsets of columns in dplyr (2 answers) Closed 15 days ago. I have a large dataset and I need to count the total occurrences of "no" across multiple columns. The dataset looks like this: id = c(1,2,3,4,5,6,7,8) trat = c("a","b","a","b","a","b","a","b") var1 = c("no","no","no","no","yes",NA,NA,"no") var2 = […]
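A sketch of the row-wise counting approach from the linked duplicate, using the question's `id`, `trat`, and `var1` columns; `var2` is truncated in the excerpt, so its values below are invented:

```r
library(dplyr)

df <- data.frame(
  id   = 1:8,
  trat = rep(c("a", "b"), 4),
  var1 = c("no", "no", "no", "no", "yes", NA, NA, "no"),
  var2 = c("no", "yes", NA, "no", "no", "no", "yes", "no")  # hypothetical
)

# rowSums() over the logical columns counts the "no"s per row;
# na.rm = TRUE keeps NA cells from poisoning the count
df <- df %>%
  mutate(n_no = rowSums(across(starts_with("var"), ~ .x == "no"),
                        na.rm = TRUE))
df
```

`across(starts_with("var"), ...)` scales to any number of `var*` columns without listing them by name.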
Attempting to calculate ratios in multiple columns of a dataframe via a for loop
I’m using R to calculate simple ratios across columns and save those values in another table. The columns shown in the included str() output cover every hour of every day in January, and I need to calculate ratios for all of them. The loop produces a table of the right shape, but every value is NA, so the logic is failing somewhere. I’ve read everything I can find and hit my limit on tracking down the issue, so I need help.
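The question's data isn't shown, but all-NA results from this kind of loop frequently come from indexing a column by a constructed name that doesn't quite match. A hedged alternative with invented hourly columns (`h01`, `h02`, `total`): `across()` computes the ratio for every matching column at once, with no name bookkeeping to get wrong.

```r
library(dplyr)

# Hypothetical hourly counts; the real data has one column per hour
counts <- data.frame(
  h01   = c(2, 4),
  h02   = c(1, 5),
  total = c(10, 20)
)

# One mutate() replaces the loop; .names controls the output columns
ratios <- counts %>%
  mutate(across(starts_with("h"), ~ .x / total, .names = "ratio_{.col}"))
ratios
```

If the loop must stay, printing the constructed column name on each iteration (and checking it against `names(df)`) is the quickest way to find the mismatch producing the NAs.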
Compute actual_tat by excluding non-business hours in R
I have a dataframe like this
Difference in Output Between Single and Double Bracket Indexing in R’s case_when()
I’m working with a list in R and I’ve noticed an unexpected difference in output when using single and double bracket indexing in the case_when() function from the dplyr package. Here’s the sample list I’m working with:
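The question's list isn't shown, but the difference it describes is the standard `[` vs `[[` distinction: single brackets return a sub-list, double brackets return the element itself. A sketch with an invented list `vals`:

```r
library(dplyr)

vals <- list(x = c(1, 5, 10))

# Single brackets keep the container; double brackets extract it
length(vals[1])    # 1 - a list holding one element
length(vals[[1]])  # 3 - the numeric vector inside

# case_when() evaluates conditions over atomic vectors, so the
# element must be extracted with [[ ]]
result <- case_when(
  vals[[1]] > 4 ~ "big",
  TRUE          ~ "small"
)
result  # "small" "big" "big"
```

With `vals[1]` the condition is built from a length-1 list rather than the length-3 vector, which is why the outputs differ.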
Apply a function in a data frame group-wise on a subset of rows?
Using the tidyverse, I want to calculate the standard deviation of the alt_freq column in a data frame, grouped by rsid. Within each group, I want to consider only those rows which have at least 100 samples.
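A sketch using the question's `rsid` and `alt_freq` names; the sample-size column is not named in the excerpt, so `n_samples` here is an assumption. Filtering before grouping keeps the under-100 rows out of every group's SD:

```r
library(dplyr)

# Hypothetical data; n_samples is an invented column name
df <- data.frame(
  rsid      = c("rs1", "rs1", "rs1", "rs2", "rs2"),
  alt_freq  = c(0.10, 0.12, 0.50, 0.20, 0.25),
  n_samples = c(150, 200, 50, 120, 130)
)

# Drop low-sample rows first, then take the SD per rsid
sd_by_rsid <- df %>%
  filter(n_samples >= 100) %>%
  group_by(rsid) %>%
  summarise(sd_alt_freq = sd(alt_freq))
sd_by_rsid
```

If instead whole groups should be dropped when they are too small, the condition moves to `filter(n() >= 100)` after `group_by()`.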
Weighted mean per group with different weights per group using dplyr
I am attempting to rewrite my code below as a single dplyr pipeline. I am calculating a weighted mean across two columns per year, where each year has a different weighting. How can this be done without explicitly splitting the dataset and then recombining the results? Thanks
Using ‘slice_max()’ in for loop
I’m trying to create new dataframes containing the top three values of each column of a dataframe.
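Rather than a `for` loop that assigns into the global environment, `lapply()` over the column names yields a list of data frames, one per column, with `slice_max()` taking the ranking column via the `.data` pronoun. A sketch on invented data:

```r
library(dplyr)

df <- data.frame(a = c(5, 1, 9, 3), b = c(2, 8, 4, 6))

# One top-3 data frame per column; .data[[col]] turns the column
# name string into something slice_max() can order by
tops <- lapply(names(df), function(col) {
  df %>% slice_max(.data[[col]], n = 3)
})
names(tops) <- names(df)

tops$a  # rows of df with the three largest values of column a
```

Keeping the results in a named list avoids cluttering the workspace with one object per column and makes downstream iteration easy.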
How to create subgroups based on group relationship criteria
Context:
I have a dataframe of individual people grouped by household, which includes relationship parameters for each individual describing their relationship to every other individual in the household.