I have a dataset of GPS points and I want to remove every point recorded less than 2 hours after the previously kept point. Here’s a sample of the dataset:
gps_data_animals_id acquisition_time
348179 348179 2015-09-18 00:00:00
348180 348180 2015-09-18 01:45:00
348181 348181 2015-09-18 02:00:00
348182 348182 2015-09-18 02:15:00
348183 348183 2015-09-18 02:30:00
348184 348184 2015-09-18 04:30:00
348185 348185 2015-09-18 04:45:00
348186 348186 2015-09-18 05:00:00
348187 348187 2015-09-18 06:00:00
348188 348188 2015-09-18 12:00:00
348189 348189 2015-09-18 17:15:00
348190 348190 2015-09-18 17:30:00
348191 348191 2015-09-18 17:45:00
348192 348192 2015-09-18 18:00:00
348193 348193 2015-09-18 18:15:00
348194 348194 2015-09-18 18:30:00
348195 348195 2015-09-18 18:45:00
348196 348196 2015-09-19 00:00:00
348197 348197 2015-09-19 06:01:00
348198 348198 2015-09-19 11:15:00
And I want locations separated in time by at least 2h, so this would be the filtered dataset:
gps_data_animals_id acquisition_time
348179 348179 2015-09-18 00:00:00
348181 348181 2015-09-18 02:00:00
348184 348184 2015-09-18 04:30:00
348188 348188 2015-09-18 12:00:00
348189 348189 2015-09-18 17:15:00
348196 348196 2015-09-19 00:00:00
348197 348197 2015-09-19 06:01:00
348198 348198 2015-09-19 11:15:00
I’ve been playing a bit with the lag() function as it seems to do more or less what I need, but I end up removing more than I want. This is what I have done so far:
dataset$time_diff <- unlist(tapply(dataset$acquisition_time, INDEX = dataset$animals_id,
FUN = function(x) c(0, `units<-`(diff(x), "hours"))))
I would then remove the rows where time_diff is less than 2 h, but that removes more than I want: it would also drop, for example, gps_data_animals_id = 348181, which I want to keep because it is exactly 2 h after the first location.
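To make the problem concrete, here is a minimal sketch using only the first four timestamps of the sample, retyped by hand. The names times, consecutive_gap and gap_to_first are illustrative and not part of my real code. It compares the gap to the immediately preceding point with the gap to the first point:

```r
# Assumption: the first four sample timestamps, rebuilt by hand for illustration
times <- as.POSIXct(c("2015-09-18 00:00:00", "2015-09-18 01:45:00",
                      "2015-09-18 02:00:00", "2015-09-18 02:15:00"), tz = "GMT")

# Gap to the immediately preceding point, in hours (0 for the first point)
consecutive_gap <- c(0, as.numeric(diff(times), units = "hours"))

# Gap to the first point, in hours
gap_to_first <- as.numeric(difftime(times, times[1], units = "hours"))

data.frame(times, consecutive_gap, gap_to_first)
```

The third point (02:00) is only 0.25 h after its predecessor, so a filter on consecutive differences drops it, even though it is 2 h after the first point, which is the last location I would actually keep.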
Any thoughts?
Here’s the reproducible example of the dataset:
structure(list(gps_data_animals_id = 348179:348198, acquisition_time = structure(c(1442534400,
1442540700, 1442541600, 1442542500, 1442543400, 1442550600, 1442551500,
1442552400, 1442556000, 1442577600, 1442596500, 1442597400, 1442598300,
1442599200, 1442600100, 1442601000, 1442601900, 1442620800, 1442642460,
1442661300), class = c("POSIXct", "POSIXt"), tzone = "GMT")), row.names = 348179:348198, class = "data.frame")
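One possible way to express the rule described above (keep a point only if it is at least 2 h after the last kept point) is a simple loop that remembers the last kept time. This is only a sketch under some assumptions: keep_min_gap is a hypothetical helper name, the times are assumed to be sorted in ascending order, and the data are treated as belonging to a single animal.

```r
# Sample data from the question (straight quotes so it can be pasted into R)
dataset <- structure(list(gps_data_animals_id = 348179:348198,
  acquisition_time = structure(c(1442534400,
    1442540700, 1442541600, 1442542500, 1442543400, 1442550600, 1442551500,
    1442552400, 1442556000, 1442577600, 1442596500, 1442597400, 1442598300,
    1442599200, 1442600100, 1442601000, 1442601900, 1442620800, 1442642460,
    1442661300), class = c("POSIXct", "POSIXt"), tzone = "GMT")),
  row.names = 348179:348198, class = "data.frame")

# Hypothetical helper: returns TRUE for each time that should be kept.
# Assumption: `times` is sorted in ascending order.
keep_min_gap <- function(times, min_gap_hours = 2) {
  keep <- logical(length(times))
  if (length(times) == 0) return(keep)
  secs <- as.numeric(times)          # seconds since the epoch
  min_gap <- min_gap_hours * 3600    # threshold in seconds
  keep[1] <- TRUE                    # the first point is always kept
  last_kept <- secs[1]
  for (i in seq_along(secs)[-1]) {
    # Compare with the last KEPT point, not with the previous row
    if (secs[i] - last_kept >= min_gap) {
      keep[i] <- TRUE
      last_kept <- secs[i]
    }
  }
  keep
}

filtered <- dataset[keep_min_gap(dataset$acquisition_time), ]
filtered
```

On the sample this keeps the same eight rows as the desired output listed earlier. For a dataset with several animals, the helper would presumably need to be applied separately within each animals_id group (for example via tapply() or ave()), which is not shown here.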