2  Transforming data

Beyond subsetting, other dplyr functions restructure data by rearranging rows and columns or by modifying their contents.

Mutate

The mutate function creates new columns by calculating values from existing ones. For example, we can add a column of DBH (diameter at breast height) values in inches this way:

treeData2 <- mutate(treeData, DBHin = `DBH (cm)` / 2.54)
treeData2
# A tibble: 14,487 × 42
   DbaseID Region City   Source TreeID Zone  `Park/Street` SpCode ScientificName
     <dbl> <chr>  <chr>  <chr>   <dbl> <chr> <chr>         <chr>  <chr>         
 1       1 InlVal Modes… Motow…      1 Nurs… Nursery       ACSA1  Acer sacchari…
 2       2 InlVal Modes… Motow…      2 Nurs… Nursery       BEPE   Betula pendula
 3       3 InlVal Modes… Motow…      3 Nurs… Nursery       CESI4  Celtis sinens…
 4       4 InlVal Modes… Motow…      4 Nurs… Nursery       CICA   Cinnamomum ca…
 5       5 InlVal Modes… Motow…      5 Nurs… Nursery       FRAN_R Fraxinus angu…
 6       6 InlVal Modes… Motow…      6 Nurs… Nursery       FREX_H Fraxinus exce…
 7       7 InlVal Modes… Motow…      7 Nurs… Nursery       FRHO   Fraxinus holo…
 8       8 InlVal Modes… Motow…      8 Nurs… Nursery       FRPE_M Fraxinus penn…
 9       9 InlVal Modes… Motow…      9 Nurs… Nursery       FRVE_G Fraxinus velu…
10      10 InlVal Modes… Motow…     10 Nurs… Nursery       GIBI   Ginkgo biloba 
# ℹ 14,477 more rows
# ℹ 33 more variables: CommonName <chr>, TreeType <chr>, address <chr>,
#   street <chr>, side <chr>, cell <dbl>, OnStreet <chr>, FromStreet <chr>,
#   ToStreet <chr>, Age <dbl>, `DBH (cm)` <dbl>, `TreeHt (m)` <dbl>,
#   CrnBase <dbl>, `CrnHt (m)` <dbl>, `CdiaPar (m)` <dbl>,
#   `CDiaPerp (m)` <dbl>, `AvgCdia (m)` <dbl>, `Leaf (m2)` <dbl>,
#   Setback <dbl>, TreeOr <dbl>, CarShade <dbl>, LandUse <dbl>, Shape <dbl>, …
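Because the new DBHin column is hidden among the "more variables" in the printout above, a minimal sketch on a tiny, made-up tibble makes the conversion easy to check by eye. The name `toyTrees` and its values are hypothetical, not taken from the tree dataset; the sketch assumes dplyr is installed.

```r
library(dplyr)  # dplyr also re-exports tibble()

# Hypothetical toy data. Column names containing spaces or parentheses,
# like the tree dataset's `DBH (cm)`, must be wrapped in backticks.
toyTrees <- tibble(`DBH (cm)` = c(25.4, 50.8, 12.7))

# mutate() keeps the existing column and appends DBHin;
# dividing by 2.54 converts centimeters to inches
toyTrees2 <- mutate(toyTrees, DBHin = `DBH (cm)` / 2.54)
toyTrees2
```

The DBHin column should hold 10, 20 and 5 inches (up to floating-point rounding).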

Here we’ve created a new variable called DBHin by dividing all the values in DBH (cm) by 2.54, the number of centimeters in an inch. We can also create new values by combining values across multiple columns. For example, to get crown depth, which is the difference between the total tree height and the crown height, we can subtract one from the other inside a mutate function:

treeData3 <- mutate(treeData2, crownDepth_m = `TreeHt (m)` - `CrnHt (m)`)
ggplot(treeData3, aes(x = crownDepth_m)) + geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The important thing to remember about these operations is that they work row by row: each value in the new column is calculated from the values in the same row of the columns used. For example, the first tree height value is 2 meters and the first crown height value is 0.5 meters. Subtracting the latter from the former gives 1.5, which becomes the first value in our new column.
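To see the row-by-row pairing on a small scale, here is a sketch with a hypothetical three-tree tibble. Only the first row's values (2 m and 0.5 m) come from the text above; the other rows are made up for illustration.

```r
library(dplyr)

# Hypothetical heights; row 1 mirrors the example in the text
toyHeights <- tibble(
  `TreeHt (m)` = c(2, 10, 6),
  `CrnHt (m)`  = c(0.5, 4, 1)
)

# Each output value uses only the inputs from its own row:
# 2 - 0.5, 10 - 4 and 6 - 1
toyDepth <- mutate(toyHeights, crownDepth_m = `TreeHt (m)` - `CrnHt (m)`)
toyDepth$crownDepth_m
```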

Bind

Sometimes you may want to combine two datasets into a single table. Let’s say we had two tables, each holding a different maple species:

sugarMaples <- filter(treeData, CommonName == "Sugar maple")
redMaples <- filter(treeData, CommonName == "Red Maple")

We can put these two together using the bind_rows function:

mapleTrees <- bind_rows(sugarMaples, redMaples)
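Here is a minimal sketch with two small hypothetical tables standing in for sugarMaples and redMaples. It also uses the optional `.id` argument of bind_rows, which adds a column labeling each row with the name of the argument (table) it came from; that labeling step is an extra not used in the example above.

```r
library(dplyr)

# Hypothetical stand-ins with identical columns
toySugar <- tibble(DbaseID = c(1, 2), CommonName = "Sugar maple")
toyRed   <- tibble(DbaseID = 3, CommonName = "Red maple")

# Rows of toyRed are stacked under toySugar; columns are matched by name.
# .id = "source" records which named input each row came from.
toyMaples <- bind_rows(sugar = toySugar, red = toyRed, .id = "source")
toyMaples
```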

The bind_cols function works similarly, but instead attaches new columns to an existing table. For example, let’s say we had our maple tree data in two pieces: information about streets and information about species names:

mapleNames <- select(mapleTrees, ends_with("Name", ignore.case = TRUE))
mapleStreets <- select(mapleTrees, DbaseID, contains("Street", ignore.case = FALSE))

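Because bind_cols lines rows up by position rather than by an ID column, both pieces must keep the trees in the same order, as they do here. We can recombine them using the bind_cols function this way: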

mapleData <- bind_cols(mapleStreets, mapleNames)
mapleData
# A tibble: 246 × 7
   DbaseID `Park/Street` OnStreet FromStreet ToStreet ScientificName CommonName 
     <dbl> <chr>         <chr>    <chr>      <chr>    <chr>          <chr>      
 1    3936 Street        -1       -1         -1       Acer saccharum Sugar maple
 2    4014 Street        -1       -1         -1       Acer saccharum Sugar maple
 3    4101 Street        -1       -1         -1       Acer saccharum Sugar maple
 4    4150 Street        -1       -1         -1       Acer saccharum Sugar maple
 5    4206 Street        -1       -1         -1       Acer saccharum Sugar maple
 6    4268 Street        -1       -1         -1       Acer saccharum Sugar maple
 7    4295 Street        -1       -1         -1       Acer saccharum Sugar maple
 8    4348 Street        -1       -1         -1       Acer saccharum Sugar maple
 9    4353 Street        -1       -1         -1       Acer saccharum Sugar maple
10    4371 Street        -1       -1         -1       Acer saccharum Sugar maple
# ℹ 236 more rows
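The positional matching is easy to verify with a toy example. This sketch uses hypothetical two-row tables (`toyStreets`, `toyNames`, `tooLong` are made-up names) and catches the error from a row-count mismatch with tryCatch so the script keeps running.

```r
library(dplyr)

# Hypothetical halves of the same two trees, in the same row order
toyStreets <- tibble(DbaseID = c(1, 2), OnStreet = c("-1", "-1"))
toyNames   <- tibble(ScientificName = c("Acer saccharum", "Acer saccharum"))

# Row 1 of toyStreets is paired with row 1 of toyNames, and so on;
# no ID column is consulted
toyData <- bind_cols(toyStreets, toyNames)

# A three-row table cannot be lined up with a two-row table, so
# bind_cols() raises an error instead of padding with NA
tooLong <- tibble(CommonName = c("Sugar maple", "Sugar maple", "Sugar maple"))
mismatch <- tryCatch(bind_cols(toyStreets, tooLong),
                     error = function(e) "error")
mismatch
```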
Try it yourself!

A key difference between the tidyverse bind_rows and its Base R equivalent (rbind) is how they handle tables that don't match. rbind only works with tables that have the same columns (the same number of columns, with the same names). bind_rows matches columns by name and fills any column missing from one of the tables with NA values. bind_cols does not pad missing rows in the same way, so tables combined side by side should still have the same number of rows. Try it by combining the tree dataset with these datasets from the modeldata package:

  • crickets

  • penguins

  • Sacramento
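If you'd like to check the behavior on something smaller first, this sketch uses two hypothetical data frames that share only an `id` column. bind_rows matches columns by name and fills the gaps with NA, while rbind stops with an error, which tryCatch turns into a returned message here.

```r
library(dplyr)

# Hypothetical data frames that share only the `id` column
heightTable  <- data.frame(id = 1:2, height = c(5, 7))
speciesTable <- data.frame(id = 3L, species = "Acer saccharum")

# bind_rows() matches columns by name; cells with no source value become NA
combined <- bind_rows(heightTable, speciesTable)
combined

# rbind() needs matching column names, so it stops with an error;
# tryCatch() returns the error message instead
baseResult <- tryCatch(rbind(heightTable, speciesTable),
                       error = function(e) conditionMessage(e))
baseResult
```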