I am using the Ames Housing data and I want to include all the variables with the suffix “SF” in my recipe, so that I can apply step_pca() to the variables that are measured in square feet.

I used reformulate() to no avail:

SF <- reformulate(grep("SF", names(ames), value = TRUE), 
              response = 'Sale_Price')
simple_ames <- 
  recipe(SF + Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type + Latitude, 
                        data = ames_train) %>% 
  step_log(Gr_Liv_Area, base = 10) %>% 
  step_other(Neighborhood, threshold = 0.01) %>%
  step_dummy(all_nominal_predictors()) %>% 
  step_interact(~ Gr_Liv_Area:starts_with('Bldg_Type_')) %>% 
  step_ns(Latitude, deg_free = 20) %>% 
  step_pca(matches('(SF$)|(Gr_Liv)'))
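
For context on the attempt above: a formula object stored in a variable such as SF is not spliced into another formula, so the full formula probably has to be built in one go. Below is a minimal sketch of that idea using only base R. The data frame `toy` is a hypothetical stand-in for ames_train (only its column names matter), and `sf_vars`, `other_vars`, and `ames_formula` are illustrative names, not anything from the book.

```r
# Toy stand-in for ames_train (hypothetical values; only the column names matter here)
toy <- data.frame(
  Sale_Price    = c(200000, 150000, 310000),
  First_Flr_SF  = c(900, 750, 1400),
  Second_Flr_SF = c(0, 600, 800),
  Total_Bsmt_SF = c(850, 700, 1300),
  Gr_Liv_Area   = c(900, 1350, 2200),
  Year_Built    = c(1970, 1995, 2005)
)

# Collect every column whose name ends in "SF"
sf_vars <- grep("SF$", names(toy), value = TRUE)

# The remaining predictors are listed by hand
other_vars <- c("Gr_Liv_Area", "Year_Built")

# Build one complete formula from the combined character vector
ames_formula <- reformulate(c(sf_vars, other_vars), response = "Sale_Price")
print(ames_formula)
```

With the real data, the resulting formula could presumably be passed as the first argument, as in recipe(ames_formula, data = ames_train), with the step_*() calls left unchanged.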

I also tried using grep() directly in the formula:

 simple_ames <- 
   recipe(Sale_Price ~ paste(grep("SF"), collapse = '+') + Neighborhood + 
   Gr_Liv_Area + Year_Built + Bldg_Type + Latitude, data = ames_train) %>% 
   step_log(Gr_Liv_Area, base = 10) %>% 
   step_other(Neighborhood, threshold = 0.01) %>%
   step_dummy(all_nominal_predictors()) %>% 
   step_interact(~ Gr_Liv_Area:starts_with('Bldg_Type_')) %>% 
   step_ns(Latitude, deg_free = 20) %>% 
   step_pca(matches('(SF$)|(Gr_Liv)'))

I am using the examples from Tidy Modeling with R (https://www.tmwr.org/recipes), section 8.4.4. The authors do not explain an efficient way to insert all of those variables into the recipe.

Thanks