Random Sampling with Condition in R

I would like to split a dataset into train and test subsets so that each subset has the same proportion of values of a binary variable as the original dataset. As an example, the original dataset contains a binary column named BIN_VAR (along with numerous other columns), and BIN_VAR is made up of 5% ones and 95% zeroes. I would like both the train and test subsets to keep this 5:95 ratio of ones to zeroes. Assume train holds 80% of the rows of the original dataset and test holds the remaining 20%. I understand that calling sample() on the whole dataset will likely get me close to the 5:95 ratio, but I would like it to be exact. Thanks.
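One way to get an exact ratio is to sample within each value of BIN_VAR separately (stratified sampling) rather than from the dataset as a whole. The following is a minimal base R sketch of that idea, not a definitive solution. The data frame name `df`, the placeholder column `x`, and the synthetic data are assumptions made for illustration; only BIN_VAR, the 5:95 ratio, and the 80/20 split come from the question.

```r
# Synthetic stand-in for the original dataset (hypothetical; use your own data).
set.seed(42)
df <- data.frame(
  BIN_VAR = rep(c(1, 0), times = c(50, 950)),  # 5% ones, 95% zeroes
  x       = rnorm(1000)                        # placeholder for the other columns
)

train_frac <- 0.8

# Row indices grouped by the value of BIN_VAR.
idx_by_class <- split(seq_len(nrow(df)), df$BIN_VAR)

# Within each class, draw 80% of that class's row indices for train.
# sample.int() on the length avoids the sample() pitfall where a
# length-one vector is treated as the range 1:n.
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  n_train <- round(train_frac * length(idx))
  idx[sample.int(length(idx), n_train)]
}), use.names = FALSE)

train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Check the class proportions in each subset.
prop.table(table(train$BIN_VAR))
prop.table(table(test$BIN_VAR))
```

The ratio comes out exact only when 80% of each class count is a whole number, as it is here (40 of 50 ones, 760 of 950 zeroes). Otherwise the rounding step gets each subset as close to 5:95 as the counts allow. If the caret package is available, its createDataPartition() function performs a comparable within-class split.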