Random sampling of specific observations


I am currently working on a project that contains one data frame with columns V1, V2, V3 and V4. V1 and V2 are unique IDs. V3 and V4 represent disease status.

df

V1   V2   V3  V4
101  201  1   1
102  202  1   1
103  203  2   NA
104  204  1   2
105  205  1   1
106  206  1   1
107  207  2   NA
108  208  1   1
109  209  2   1
110  210  2   2
111  211  2   NA
112  212  NA  2
113  213  1   1
114  214  1   1
115  215  2   NA

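Here, V3 and V4 contain multiple scenarios: 1 and 1, 2 and NA, NA and 2, etc. I want to change only the 1 and 1 scenario, meaning rows where V3 and V4 are both 1. In each of those rows, one of the two values should stay 1 and the other should become NA, chosen at random: either V3 stays 1 and V4 becomes NA, or vice versa. Rows with 1 and 2 or 2 and 1 should not be changed. Overall, more of the 1s should end up in V3 than in V4. I want to use R.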

Final dataframe should look like this:

V1   V2   V3  V4
101  201  1   NA
102  202  NA  1
103  203  2   NA
104  204  1   2
105  205  1   NA
106  206  1   NA
107  207  2   NA
108  208  NA  1
109  209  2   1
110  210  2   2
111  211  2   NA
112  212  NA  2
113  213  1   NA
114  214  1   NA
115  215  2   NA
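
For reference, here is a minimal base R sketch of one way the transformation described above could be done. It is not a definitive solution. It assumes "more 1s in V3" means that, among the rows where both columns are 1, a fixed share keeps the 1 in V3. The share `p_keep_v3 = 0.7`, the seed, and the names `both`, `keep_v3`, `keep_v4` and `out` are my own choices for illustration.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  V1 = 101:115,
  V2 = 201:215,
  V3 = c(1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, NA, 1, 1, 2),
  V4 = c(1, 1, NA, 2, 1, 1, NA, 1, 1, 2, NA, 2, 1, 1, NA)
)

set.seed(42)      # assumption: fixed seed so the random split is reproducible
p_keep_v3 <- 0.7  # assumption: share of 1/1 rows that keep their 1 in V3 (> 0.5)

# Row indices where V3 and V4 are both 1; which() ignores NA comparisons
both <- which(df$V3 == 1 & df$V4 == 1)

# Number of those rows that keep the 1 in V3 (a majority whenever p_keep_v3 > 0.5)
n_v3 <- ceiling(length(both) * p_keep_v3)

# Randomly pick the rows that keep V3; sample.int() avoids the sample()
# pitfall when 'both' has length 1
keep_v3 <- both[sample.int(length(both), n_v3)]
keep_v4 <- setdiff(both, keep_v3)

out <- df
out$V4[keep_v3] <- NA  # V3 stays 1, V4 becomes NA
out$V3[keep_v4] <- NA  # V4 stays 1, V3 becomes NA

print(out)
```

With the example data there are 7 rows in the 1 and 1 scenario, so this keeps the 1 in V3 for 5 of them and in V4 for the other 2. Which rows land on which side depends on the seed. All other rows (1 and 2, 2 and 1, 2 and NA, and so on) are left untouched.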


     
