TIL: dplyr::mutate()'s .keep argument

Use the .keep argument in dplyr::mutate() to control which columns are retained when you create new variables.

Dec 02, 2024

In the spirit of learning in public,[1] today I learned about the .keep argument in dplyr::mutate(). It doesn’t add anything you can’t do with select() or transmute(), but it might help simplify some of your dplyr pipelines.[2]

In the examples below I use a few rows from the built-in iris dataset to demonstrate the .keep argument by creating a new variable, Sepal.Ratio, which is the ratio of sepal length to sepal width.

Here’s a Colab notebook with the code you can run (yes, you can use R in Colab!).

The default: keep everything with “all”

The default is to keep all existing variables along with the new ones.

library(dplyr)

# Just get a few rows for easy display
# (note: this shadows the built-in iris for the rest of the session)
iris <- as_tibble(iris) |> head()

# Keep everything
iris |> 
  mutate(Sepal.Ratio = Sepal.Length/Sepal.Width, .keep = "all")

Result (just adds the newly created variable to the end):

# A tibble: 6 × 6
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Ratio
         <dbl>       <dbl>        <dbl>       <dbl> <fct>         <dbl>
1          5.1         3.5          1.4         0.2 setosa         1.46
2          4.9         3            1.4         0.2 setosa         1.63
3          4.7         3.2          1.3         0.2 setosa         1.47
4          4.6         3.1          1.5         0.2 setosa         1.48
5          5           3.6          1.4         0.2 setosa         1.39
6          5.4         3.9          1.7         0.4 setosa         1.38

Keep nothing with “none”

You can retain only the columns created by mutate with the “none” option:

iris |> 
  mutate(Sepal.Ratio = Sepal.Length/Sepal.Width, .keep = "none")

Result (only keeps the new variable):

# A tibble: 6 × 1
  Sepal.Ratio
        <dbl>
1        1.46
2        1.63
3        1.47
4        1.48
5        1.39
6        1.38

Keep the used columns with “used”

Keep the new column and the columns used to create it with the “used” option:

iris |> 
  mutate(Sepal.Ratio = Sepal.Length/Sepal.Width, .keep = "used")

Result (only keeps the new variable and the columns used to create it):

# A tibble: 6 × 3
  Sepal.Length Sepal.Width Sepal.Ratio
         <dbl>       <dbl>       <dbl>
1          5.1         3.5        1.46
2          4.9         3          1.63
3          4.7         3.2        1.47
4          4.6         3.1        1.48
5          5           3.6        1.39
6          5.4         3.9        1.38

Keep new and unused columns with “unused”

If you don’t need the columns you used to create the new variable, the “unused” option keeps the new column along with every column that wasn’t used to create it:

iris |> 
  mutate(Sepal.Ratio = Sepal.Length/Sepal.Width, .keep = "unused")

Result (drops the Sepal.Length and Sepal.Width columns):

# A tibble: 6 × 4
  Petal.Length Petal.Width Species Sepal.Ratio
         <dbl>       <dbl> <fct>         <dbl>
1          1.4         0.2 setosa         1.46
2          1.4         0.2 setosa         1.63
3          1.3         0.2 setosa         1.47
4          1.5         0.2 setosa         1.48
5          1.4         0.2 setosa         1.39
6          1.7         0.4 setosa         1.38

Can I do this in Python?

Sort of. I prefer the ergonomics of Polars over Pandas, but there’s nothing like the .keep argument in either (that I know of). As you can see from the code, you still have to explicitly state which columns you want to keep or, in the last example, which ones to drop.
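The Python code isn’t reproduced in the text of this post, so here is a minimal sketch of the four cases in Pandas (not Polars, and not necessarily what the linked notebook does). The six rows are typed in from the R output above, and the variable names used, keep_all, keep_none, keep_used and keep_unused are illustrative placeholders rather than anything from the original code.

```python
import pandas as pd

# First six rows of iris, copied from the R output above so the
# example runs without downloading anything.
iris = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "Petal.Width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
    "Species": ["setosa"] * 6,
})

# Columns the new variable is built from (hypothetical helper name).
used = ["Sepal.Length", "Sepal.Width"]

# Like .keep = "all": assign() returns a copy with the new column appended.
# The dict unpacking is needed because the column name contains a dot.
keep_all = iris.assign(
    **{"Sepal.Ratio": iris["Sepal.Length"] / iris["Sepal.Width"]}
)

# Like .keep = "none": select only the new column by name.
keep_none = keep_all[["Sepal.Ratio"]]

# Like .keep = "used": list the input columns plus the new one.
keep_used = keep_all[used + ["Sepal.Ratio"]]

# Like .keep = "unused": drop the input columns explicitly.
keep_unused = keep_all.drop(columns=used)

print(keep_unused)
```

In every case except the first, the column list has to be written out by hand, which is the bookkeeping that .keep does for you in dplyr.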

Here’s a Colab notebook with the code you can run.

1. I have other “today I learned” posts like this in the TIL section of this newsletter.

2. This could also be useful in teaching, instead of introducing the limited-use transmute() function.

Paired Ends
