Environmental Data Analysis and Visualization

On Deliberate Visualization

Warm-up activity

  • Start RStudio and open a new script. Load the palmerpenguins package (install it first if needed)

  • Using ggplot2, create a graphic that shows the relationship between two variables: body mass and bill depth

  • Modify the aesthetic mapping so that the geometry is colored based on species

  • With the + operator, add a layer consisting of the labs function, and give it arguments for x, y, color, and title
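
One possible solution to the warm-up, as a minimal sketch. It assumes the ggplot2 and palmerpenguins packages are already installed; the label wording and the object name warmup_plot are placeholders, not something the activity prescribes.

```r
library(ggplot2)
library(palmerpenguins)  # provides the penguins data frame

# Map body mass and bill depth to position, and species to color
warmup_plot <- ggplot(penguins, aes(x = body_mass_g, y = bill_depth_mm, color = species)) +
  geom_point() +
  # labs() is added with the + operator, just like a geometry
  labs(
    x = "Body mass (g)",
    y = "Bill depth (mm)",
    color = "Species",
    title = "Bill depth by body mass"
  )

# Printing the object draws the plot
warmup_plot
```

ggplot2 will warn that it dropped the rows with missing measurements; that is expected with this dataset.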

Sensor of the day

Mobile phones!

epSos.de, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0>, via Wikimedia Commons

Sensor of the day

https://www.kaggle.com/datasets

Labeling

Cheatsheets

Order of operations

ggplot(data=penguins,aes(body_mass_g,bill_depth_mm,color=species))+
  geom_point()

Order of operations

ggplot(data=penguins)+
  geom_point(aes(body_mass_g,bill_depth_mm,color=species))

Order of operations

ggplot()+
  geom_point(data=penguins,aes(body_mass_g,bill_depth_mm,color=species))

When does this matter?

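  • Where you add an aesthetic mapping determines which geometries it applies to: mappings in ggplot() are inherited by every layer, while mappings inside a geom apply only to that layer

  • The order in which you add layers determines the order in which they are drawn: later layers are drawn on top of earlier ones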
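
The first point is easiest to see with a second geometry that reacts to grouping. The sketch below is an illustration, not part of the original slides: it adds geom_smooth() to the penguin plot and moves the color mapping between ggplot() and geom_point(). The object names p_global and p_local are made up for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Color mapped in ggplot(): inherited by BOTH layers,
# so geom_smooth() fits one line per species
p_global <- ggplot(penguins, aes(body_mass_g, bill_depth_mm, color = species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Color mapped only in geom_point(): geom_smooth() sees no grouping,
# so it fits a single line through all of the points
p_local <- ggplot(penguins, aes(body_mass_g, bill_depth_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm", se = FALSE)

p_global
p_local
```

Both plots use the same data and the same geometries; only the location of the color mapping changes.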

When does this matter?

ggplot(data=gm2007,aes(x = continent, y = lifeExp, color = continent)) +
  geom_boxplot() +
  geom_jitter()

When does this matter?

ggplot(data=gm2007,aes(x = continent, y = lifeExp)) +
  geom_jitter(aes(color = continent)) +
  geom_boxplot()
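
The slides do not show how gm2007 is created. The sketch below assumes it holds the 2007 rows of the gapminder data frame from the gapminder package; that definition is a guess, as is the object name p_order. It also shows one way to use drawing order on purpose: boxes first, points on top.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the gm2007 data

# Assumed definition: keep only the 2007 observations
gm2007 <- subset(gapminder, year == 2007)

# Boxes are added first, so they are drawn first; the jittered points
# are added second and therefore sit on top of the boxes
p_order <- ggplot(gm2007, aes(x = continent, y = lifeExp)) +
  # Hide the boxplot's outlier dots; the jitter layer already shows every point
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(color = continent), width = 0.2)

p_order
```

Swapping the two geom lines would draw the boxes over the points and hide some of them.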

How to make a good visualization

  • Choose the right chart for the data

  • Maximize the data-ink ratio

  • Make deliberate design decisions

Scales

Scales: Being deliberate with color

What can color show effectively?

Scales: Being deliberate with color

ggplot(penguins,aes(body_mass_g,bill_depth_mm,color=species))+
  geom_point() +
  labs(x="Body Mass (g)",y="Bill Depth (mm)",title="Penguin bill size by body mass") +
  scale_color_manual(values=c("Purple","Orange","Green"))

Scales: Being deliberate with color

library(ggthemes)

cvdCols <- c("#000000", "#E69F00", "#56B4E9")

ggplot(penguins,aes(body_mass_g,bill_depth_mm,color=species))+
  geom_point() +
  labs(x="Body Mass (g)",y="Bill Depth (mm)",title="Penguin bill size by body mass") +
  scale_color_manual(values=cvdCols)
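
With an unnamed vector, scale_color_manual() assigns colors in the order of the factor levels. A minimal sketch of a more explicit alternative is to name each entry after a species, so the pairing is visible in the code. The particular pairing below is arbitrary, and species_cols and p_named are names invented for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Naming each entry ties a color to a species explicitly
species_cols <- c(Adelie = "#000000", Chinstrap = "#E69F00", Gentoo = "#56B4E9")

p_named <- ggplot(penguins, aes(body_mass_g, bill_depth_mm, color = species)) +
  geom_point() +
  labs(x = "Body Mass (g)", y = "Bill Depth (mm)", color = "Species",
       title = "Penguin bill size by body mass") +
  scale_color_manual(values = species_cols)

p_named
```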

Scales: Being deliberate with color

library(ggthemes)

ggplot(penguins,aes(body_mass_g,bill_depth_mm,color=species))+
  geom_point() +
  labs(x="Body Mass (g)",y="Bill Depth (mm)",title="Penguin bill size by body mass") +
  scale_color_colorblind()

Scales: Being deliberate with size

What sort of information does size suggest?

Scales: Being deliberate with size

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp))+
  geom_point() 

Scales: Being deliberate with size

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp,size=pop))+
  geom_point() 
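
Size tends to be read as magnitude, so it helps if the scale treats it that way. The sketch below uses scale_size_area(), which makes point area proportional to the value so that a value of zero would map to zero size. It assumes gmAsia2007 is the 2007 Asian subset of the gapminder package data, which the slides do not state; max_size, alpha, and the legend title are illustrative choices.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the gmAsia2007 data

# Assumed definition: Asian countries in 2007
gmAsia2007 <- subset(gapminder, year == 2007 & continent == "Asia")

p_size <- ggplot(gmAsia2007, aes(x = gdpPercap, y = lifeExp, size = pop)) +
  # Semi-transparent points keep overlapping countries visible
  geom_point(alpha = 0.6) +
  # Area (not radius) is proportional to population
  scale_size_area(max_size = 12, name = "Population")

p_size
```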

Scales: Being deliberate with axes

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp,size=pop))+
  geom_point() +
  scale_x_log10() 
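
A log axis is easier to read when its breaks and labels are chosen on purpose. The following is a sketch under the same assumption that gmAsia2007 is the 2007 Asian subset of gapminder; the break values and label text are illustrative choices, not taken from the slides.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the gmAsia2007 data

# Assumed definition: Asian countries in 2007
gmAsia2007 <- subset(gapminder, year == 2007 & continent == "Asia")

p_log <- ggplot(gmAsia2007, aes(x = gdpPercap, y = lifeExp, size = pop)) +
  geom_point() +
  # Explicit breaks with comma-formatted labels instead of the defaults
  scale_x_log10(
    breaks = c(1000, 3000, 10000, 30000),
    labels = scales::label_comma()
  ) +
  # Say in the axis title that the scale is logarithmic
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)", size = "Population")

p_log
```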

Activity: Deliberate penguins

  • For this activity, use the penguins dataset to generate a graphic. Use at least 3 variables in your plot.

  • Try using some of the scale options located in the ggplot2 cheatsheet.

    • What sorts of scales might help you clarify or emphasize different variables?

Being deliberately misleading

https://eagereyes.org/blog/2013/banking-45-degrees

Being deliberately misleading

https://eagereyes.org/blog/2013/baselines

Being deliberately misleading

https://infolific.com/technology/internet/seo-lie-factor/

Data-ink and themes

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp,size=pop))+
  geom_point() +
  scale_x_log10() +
  theme_classic()

Themes and data-ink

While each graphical element can be modified individually, themes provide a way to modify the overall look of the “non-data ink”

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp,size=pop))+
  geom_point() +
  scale_x_log10()

Themes and data-ink

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp,size=pop))+
  geom_point() +   
  scale_x_log10()+
  theme_bw()

Themes and data-ink

ggplot(gmAsia2007,aes(x=gdpPercap,y=lifeExp,size=pop))+
  geom_point() +   
  scale_x_log10()+
  theme_classic()
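
A complete theme and individual element changes can be combined. The sketch below starts from theme_classic() and then adjusts a few elements with theme(); the specific tweaks are illustrative, and gmAsia2007 is again assumed to be the 2007 Asian subset of gapminder.

```r
library(ggplot2)
library(gapminder)  # assumption: source of the gmAsia2007 data

# Assumed definition: Asian countries in 2007
gmAsia2007 <- subset(gapminder, year == 2007 & continent == "Asia")

p_theme <- ggplot(gmAsia2007, aes(x = gdpPercap, y = lifeExp, size = pop)) +
  geom_point() +
  scale_x_log10() +
  # Start from a complete theme...
  theme_classic() +
  # ...then override individual non-data elements
  theme(
    legend.position = "bottom",                        # move the legend below the panel
    axis.title = element_text(size = 12),              # slightly larger axis titles
    panel.grid.major.y = element_line(color = "grey90")  # faint horizontal guides
  )

p_theme
```

The theme() call comes after theme_classic() because a complete theme added later would replace the earlier element settings.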

The big picture

  • Visualization is foremost about making data more understandable

  • Guidelines like maximizing data-ink and being deliberate about design help us make decisions that will facilitate this goal

  • The grammar of graphics helps us make these decisions explicitly by connecting data, aesthetic mappings, and geometries

  • The ggplot2 package lets us put that grammar to work inside the data environment we’re creating in R

Coursekeeping

  • Visualization critiques begin next week

    • Check the list on Canvas to see when your critique is due
  • Coding assignment #1 is due on Thursday

Next week

  • Introducing data analysis

  • Finding a statistic for assessing your data

  • Visualizing stats