```r
library(dplyr, warn.conflicts = FALSE)

starwars %>%
  summarise(across(where(is.character), n_distinct))
#> # A tibble: 1 x 8
#>    name hair_color skin_color eye_color   sex gender homeworld species
#>   <int>      <int>      <int>     <int> <int>  <int>     <int>   <int>
#> 1    87         13         31        15     5      3        49      38

starwars %>%
  group_by(species) %>%
  filter(n() > 1) %>%
  summarise(across(c(sex, gender, homeworld), n_distinct))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 9 x 4
#>   species    sex gender homeworld
#>   <chr>    <int>  <int>     <int>
#> 1 Droid        1      2         3
#> 2 Gungan       1      1         1
#> 3 Human        2      2        16
#> 4 Kaminoan     2      2         1
#> 5 Mirialan     1      1         1
#> 6 Twi'lek      2      2         1
#> 7 Wookiee      1      1         1
#> 8 Zabrak       1      1         2
#> 9 <NA>         1      1         3

starwars %>%
  group_by(homeworld) %>%
  filter(n() > 1) %>%
  summarise(across(where(is.numeric), mean, na.rm = TRUE), n = n())
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 10 x 5
#>   homeworld height  mass birth_year     n
#>   <chr>      <dbl> <dbl>      <dbl> <int>
#> 1 Alderaan    176.
```

There are three cool features you might be particularly interested in:

- If needed, you can access the name of the column currently being processed with

If you’ve tackled this problem with an older version of dplyr, you might’ve used one of the functions with an _if, _at, or _all suffix. These functions solved a pressing need and are used by many people, but are now superseded. This means that they’ll stay around, but will only receive critical bug fixes.

Why did we decide to move away from these functions in favour of across()?

1. across() makes it possible to compute useful summaries that were previously impossible. For example, it’s now easy to summarise numeric vectors with one function, factors with another, and still compute the number of rows in each group:

   ```r
   df %>%
     group_by(g1, g2) %>%
     summarise(
       across(where(is.numeric), mean),
       across(where(is.factor), nlevels),
       n = n(),
     )
   ```

2. across() reduces the number of functions that dplyr needs to provide. This makes dplyr easier for you to use (because there are fewer functions to remember) and easier for us to develop (since we only need to implement one function for each new verb, not four).

3. With the where() helper, across() unifies _if and _at semantics, allowing combinations that used to be impossible. For example, you can now transform all numeric columns whose name begins with “x”: across(where(is.numeric) & starts_with("x")).

4. across() doesn’t need vars(). The _at() functions are the only place in dplyr where you have to use vars(), which makes them unusual, and hence harder to learn and remember.

Why did it take so long to discover across()? Surprisingly, the key idea that makes across() work came out of our low-level work on the vctrs package, where we learnt that you can have a column of a data frame that is itself a data frame. It’s a bummer that we had a few false starts before we discovered across(), but even with hindsight, I don’t see how we could’ve skipped the intermediate steps.

If you want to update your existing code to use across() instead of the _if(), _at(), or _all() functions, it’s generally straightforward:

- Strip the _if(), _at() and _all() suffix off the function.
- Call across(). For the _at() functions, if there was a single element in vars() you can remove vars(); otherwise replace it with c().
- The subsequent arguments can be copied as is.

We’re going to work with a dataset of mammal life-history, geography, and ecology traits from the PanTHERIA database:

PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals.

First we’ll download the data:

```r
pantheria <- "…" # URL of the PanTHERIA data file
download.file(pantheria, destfile = "mammals.txt")
```

This gets a bit ugly, but you can safely just run this code chunk and ignore the details:

```r
mammals <- read.table("mammals.txt", sep = "\t", header = TRUE)
# Strip the "X<digits>." prefixes read.table() adds to numbered trait columns
names(mammals) <- sub("X[0-9._]+", "", names(mammals))
# Strip the "MSW05_" prefix from the taxonomy columns (Order, Binomial, ...)
names(mammals) <- sub("MSW05_", "", names(mammals))
mammals <- dplyr::select(mammals, Order, Binomial, AdultBodyMass_g,
  AdultHeadBodyLen_mm, HomeRange_km2, LitterSize)
```
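The renaming step works because of how read.table() rewrites column names. By default (`check.names = TRUE`) it passes the header through make.names(). That function turns `-` into `.` and adds an `X` in front of names that start with a digit. The sketch below assumes the PanTHERIA header uses names such as `MSW05_Order` and `5-1_AdultBodyMass_g`. That is an assumption, so check `names(mammals)` on the real file. The sketch shows that the two sub() calls reduce those names to the ones passed to select().

```r
# Assumed raw PanTHERIA column headers (illustrative only; verify on the real file)
header <- c("MSW05_Order", "MSW05_Binomial", "5-1_AdultBodyMass_g",
            "13-1_AdultHeadBodyLen_mm", "22-1_HomeRange_km2", "15-1_LitterSize")

# read.table(check.names = TRUE) applies make.names(): "-" becomes "." and a
# name starting with a digit gets an "X" prefix, e.g. "5-1_..." -> "X5.1_..."
checked <- make.names(header)

# The same two substitutions used in the cleanup chunk
cleaned <- sub("X[0-9._]+", "", checked)
cleaned <- sub("MSW05_", "", cleaned)
print(cleaned)
```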
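The conversion recipe for the superseded _if(), _at() and _all() functions can be checked directly. The sketch below assumes dplyr 1.0.0 or later (the superseded verbs are still exported) and uses the built-in iris and mtcars datasets. It pairs each old call with its across() rewrite and confirms that the two give the same result:

```r
library(dplyr, warn.conflicts = FALSE)

# _if(): the old predicate moves inside where()
old_if <- iris %>% mutate_if(is.numeric, round)
new_if <- iris %>% mutate(across(where(is.numeric), round))

# _at() with one element in vars(): drop vars()
old_at1 <- mtcars %>% summarise_at(vars(mpg), mean)
new_at1 <- mtcars %>% summarise(across(mpg, mean))

# _at() with several elements in vars(): replace vars() with c()
old_at2 <- mtcars %>% summarise_at(vars(mpg, hp), mean)
new_at2 <- mtcars %>% summarise(across(c(mpg, hp), mean))

# _all(): select every column with everything()
old_all <- mtcars %>% mutate_all(as.integer)
new_all <- mtcars %>% mutate(across(everything(), as.integer))
```

In every case the function (round, mean, as.integer) is copied across unchanged, as the recipe says.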
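Point 3 above says where() can now be combined with name-based selection using `&`. The sketch below shows this on a small hypothetical data frame, where `x1`, `x2` and `y1` are made-up columns. Only the column that is both numeric and named with an "x" prefix is transformed:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: x1 and y1 are numeric, x2 is character
df <- tibble(x1 = c(1, 2, 3), x2 = c("a", "b", "c"), y1 = c(10, 20, 30))

# Double only the columns that are numeric AND whose names start with "x"
out <- df %>%
  mutate(across(where(is.numeric) & starts_with("x"), ~ .x * 2))
out
```

`x2` starts with "x" but is not numeric, and `y1` is numeric but does not start with "x", so both are left unchanged.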
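The grouped-summary example in point 1 uses a placeholder `df`. To see it run, the sketch below builds a small hypothetical data frame. It has two grouping columns, one numeric column and one factor column, and all names and values are invented for illustration. It assumes dplyr 1.0.0 or later, where summarise() accepts `.groups`:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data: grouping columns g1/g2, a numeric and a factor column
df <- tibble(
  g1 = c("a", "a", "b", "b"),
  g2 = c("u", "u", "v", "v"),
  value = c(1, 3, 5, 7),
  colour = factor(c("red", "blue", "red", "red"))
)

res <- df %>%
  group_by(g1, g2) %>%
  summarise(
    across(where(is.numeric), mean),   # numeric columns: group mean
    across(where(is.factor), nlevels), # factor columns: number of levels
    n = n(),                           # number of rows in each group
    .groups = "drop"
  )
res
```

across() skips the grouping columns, so only `value` and `colour` are summarised. nlevels() counts the levels of the whole factor, not the levels observed within each group, so both groups report 2.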