R – remove rows if a certain value is reached, and recalculate


I have a dataset of GPS points and I want to remove every point recorded less than 2 hours after the previous point that is kept. Here’s a sample of the dataset:

       gps_data_animals_id    acquisition_time
348179              348179 2015-09-18 00:00:00
348180              348180 2015-09-18 01:45:00
348181              348181 2015-09-18 02:00:00
348182              348182 2015-09-18 02:15:00
348183              348183 2015-09-18 02:30:00
348184              348184 2015-09-18 04:30:00
348185              348185 2015-09-18 04:45:00
348186              348186 2015-09-18 05:00:00
348187              348187 2015-09-18 06:00:00
348188              348188 2015-09-18 12:00:00
348189              348189 2015-09-18 17:15:00
348190              348190 2015-09-18 17:30:00
348191              348191 2015-09-18 17:45:00
348192              348192 2015-09-18 18:00:00
348193              348193 2015-09-18 18:15:00
348194              348194 2015-09-18 18:30:00
348195              348195 2015-09-18 18:45:00
348196              348196 2015-09-19 00:00:00
348197              348197 2015-09-19 06:01:00
348198              348198 2015-09-19 11:15:00

I want the retained locations to be separated in time by at least 2 hours, so this would be the filtered dataset:

       gps_data_animals_id    acquisition_time
348179              348179 2015-09-18 00:00:00
348181              348181 2015-09-18 02:00:00
348184              348184 2015-09-18 04:30:00
348188              348188 2015-09-18 12:00:00
348189              348189 2015-09-18 17:15:00
348196              348196 2015-09-19 00:00:00
348197              348197 2015-09-19 06:01:00
348198              348198 2015-09-19 11:15:00

I’ve been playing a bit with the lag() function as it seems to do more or less what I need, but I end up removing more than I want. This is what I have done so far:

dataset$time_diff <- unlist(tapply(dataset$acquisition_time, INDEX = dataset$animals_id,
                                 FUN = function(x) c(0, `units<-`(diff(x), "hours"))))

Then I would remove the rows where time_diff is less than 2 hours, but that removes more than I want: it would also drop e.g. gps_data_animals_id = 348181, which I want to keep because it is exactly 2 hours after the first location. The difference needs to be measured from the last location that was kept, not from the immediately preceding row.

Any thoughts?

Here’s the reproducible example of the dataset:
structure(list(gps_data_animals_id = 348179:348198, acquisition_time = structure(c(1442534400,
1442540700, 1442541600, 1442542500, 1442543400, 1442550600, 1442551500,
1442552400, 1442556000, 1442577600, 1442596500, 1442597400, 1442598300,
1442599200, 1442600100, 1442601000, 1442601900, 1442620800, 1442642460,
1442661300), class = c("POSIXct", "POSIXt"), tzone = "GMT")), row.names = 348179:348198, class = "data.frame")
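
To make the rule concrete, here is a minimal sketch of the behaviour I am after: walk through the times in order and keep a location only if it is at least 2 hours after the last location that was kept. The helper name keep_min_gap and its min_gap_hours argument are made up for illustration, and the sketch assumes the rows are already sorted by acquisition_time and belong to a single animal.

```r
# Sample data from the question (times in GMT).
dataset <- data.frame(
  gps_data_animals_id = 348179:348198,
  acquisition_time = as.POSIXct(
    c(1442534400, 1442540700, 1442541600, 1442542500, 1442543400,
      1442550600, 1442551500, 1442552400, 1442556000, 1442577600,
      1442596500, 1442597400, 1442598300, 1442599200, 1442600100,
      1442601000, 1442601900, 1442620800, 1442642460, 1442661300),
    origin = "1970-01-01", tz = "GMT"
  )
)

# Hypothetical helper (not from any package): returns a logical vector that is
# TRUE for each time at least `min_gap_hours` after the most recently kept time.
# Assumes `times` is sorted in ascending order.
keep_min_gap <- function(times, min_gap_hours = 2) {
  keep <- logical(length(times))
  if (length(times) == 0) return(keep)

  secs <- as.numeric(times)      # POSIXct stores seconds since the epoch
  gap  <- min_gap_hours * 3600   # minimum gap in seconds

  keep[1]   <- TRUE              # the first location is always kept
  last_kept <- secs[1]
  for (i in seq_along(secs)[-1]) {
    if (secs[i] - last_kept >= gap) {
      keep[i]   <- TRUE
      last_kept <- secs[i]       # the reference point moves only when a row is kept
    }
  }
  keep
}

filtered <- dataset[keep_min_gap(dataset$acquisition_time), ]
```

On the sample above this keeps the eight rows listed in the desired output. With several animals, the same helper would presumably have to be applied separately within each animals_id group, for example after split(), rather than on the whole column at once.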
