Environmental Data Analysis and Visualization

Getting a Clearer View

Visualization Critique

Sensor of the week

Traffic sensors

https://auto.howstuffworks.com

https://trafficvision.com/

MassDOT Transportation Data Management System

Learning how to look

University of Wollongong

Getting a clearer view

Flip x and y axes

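library(tidyverse)  # loads ggplot2, forcats (fct_reorder) and tidyr (drop_na)
library(modeldata)  # assumed source of the Sacramento data

ggplot(data=Sacramento,aes(x=city,y=price)) +
  geom_boxplot() +
  coord_flip()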

Resize y-axis labels

ggplot(data=Sacramento,aes(x=city,y=price)) + 
  geom_boxplot() +
  coord_flip() +
  theme(axis.text.y=element_text(size=rel(0.75)))

Reorder based on median house prices

ggplot(data=Sacramento,aes(x=fct_reorder(city,price,median),y=price)) + 
  geom_boxplot() +
  coord_flip()  +
  theme(axis.text.y=element_text(size=rel(0.5)))

Add some labels
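
As a minimal sketch of this step, the labs() function can set a title and axis labels on the reordered boxplot. The label wording below is illustrative only, and Sacramento is assumed to come from the modeldata package.

```r
library(ggplot2)
library(forcats)    # fct_reorder()
library(modeldata)  # assumed source of the Sacramento data

data(Sacramento)

# Labels are assigned to the original aesthetics (x = city, y = price);
# coord_flip() then draws the x aesthetic on the vertical axis.
p_labeled <- ggplot(data=Sacramento,aes(x=fct_reorder(city,price,median),y=price)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.text.y=element_text(size=rel(0.5))) +
  labs(title="House prices by city, Sacramento area",  # illustrative wording
       x="City",
       y="Sale price (USD)")

p_labeled
```

Because labels follow the aesthetics rather than the screen position, x="City" still ends up beside the city names after the flip.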

Getting a clearer view

ggplot(data=abalone,aes(x=diameter*200,y=weight.whole*200)) + 
  geom_point()

Changing the transparency (alpha)

ggplot(data=abalone,aes(x=diameter*200,y=weight.whole*200)) + 
  geom_point(alpha=0.1)

Changing the size

ggplot(data=abalone,aes(x=diameter*200,y=weight.whole*200)) + 
  geom_point(size=0.25)

Binning the values (geom_bin2d)

ggplot(data=abalone,aes(x=diameter*200,y=weight.whole*200)) + 
  geom_bin2d()

Getting a clearer view

Data: Common bully (Gobiomorphus cotidianus) records from the New Zealand Freshwater Fish Database

ggplot(data=bully,aes(x=altitude,y=maxl)) +
  geom_point()

Getting a clearer view

ggplot(data=drop_na(bully,maxl),aes(x=altitude,y=maxl)) +
  geom_point()

Zooming in

ggplot(data=drop_na(bully,maxl),aes(x=altitude,y=maxl)) +
  geom_point() +
  coord_cartesian(xlim=c(0,100))
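
A short sketch of why coord_cartesian() is a reasonable choice for zooming: it changes only the visible window, whereas setting limits on the scale turns out-of-range values into missing values. The fish data frame below is simulated stand-in data, not the real bully records, and its column values are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical stand-in data; the real bully records are not reproduced here
set.seed(1)
fish <- data.frame(altitude = runif(200, min = 0, max = 500))
fish$maxl <- 60 + 0.05 * fish$altitude + rnorm(200, sd = 10)

# coord_cartesian() changes only the visible window; every row is kept
p_zoom <- ggplot(data=fish,aes(x=altitude,y=maxl)) +
  geom_point() +
  coord_cartesian(xlim=c(0,100))

# Scale limits instead turn out-of-range x values into NA before plotting
p_limits <- ggplot(data=fish,aes(x=altitude,y=maxl)) +
  geom_point() +
  scale_x_continuous(limits=c(0,100))

# Count the non-missing x values in each built plot
kept_zoom <- sum(!is.na(layer_data(p_zoom)$x))
kept_limits <- sum(!is.na(layer_data(p_limits)$x))
```

Here kept_zoom equals the full number of rows, while kept_limits counts only the records at or below 100 m. The difference matters most once a statistical layer such as geom_smooth() is added, since that layer would be fitted to whatever data remain.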

Log-10 transformation on x-axis

ggplot(data=drop_na(bully,maxl),aes(x=altitude,y=maxl)) +
  geom_point() +
  scale_x_continuous(trans='log10')

Add a smooth line

ggplot(data=drop_na(bully,maxl),aes(x=altitude,y=maxl)) +
  geom_point() +
  scale_x_continuous(trans='log10')+
  geom_smooth()

Seeing the bigger picture

Faceted plots can help us view the same relationship across the levels of a categorical variable.

ggplot(data=Sacramento,aes(x=sqft,y=price)) +
  geom_point() +
  facet_wrap(vars(type))

Activity: Looking and comparing

  • In this exercise, you’ll use the scat dataset in the modeldata package (same data, fewer bobcats)

  • Use the facet_wrap function to look at these relationships (remember to wrap the faceting variable in the vars function):

    • length and diameter by species

    • diameter and mass by species

Correlation coefficient

https://online.stat.psu.edu
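
As a small worked example, Pearson's correlation coefficient can be computed by hand and checked against R's built-in cor() function. The two vectors are made up for illustration and do not come from any dataset used in these slides.

```r
# Small made-up vectors, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson's r: sum of cross-products of the deviations from the means,
# divided by the square root of the product of their sums of squares
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in equivalent (method = "pearson" is the default)
r_builtin <- cor(x, y)

r_manual
r_builtin
```

Both values agree; the coefficient ranges from -1 to 1 and measures only the strength of a linear relationship.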

What to do about outliers?

Being an outlier doesn’t automatically make an observation wrong or a mistake.

But some outliers are mistakes.

Your job as the data scientist is to decide which is which.

If you decide to eliminate an observation, do not modify the original data. Instead, create a new dataset without the observation.

Always document your decision-making.
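
One possible way to follow this advice, sketched with dplyr: filter into a new object and record the reason in a comment. The readings data frame, its column names, and the reason given for the exclusion are all hypothetical.

```r
library(dplyr)

# Hypothetical sensor readings; 150.0 stands in for a suspected recording error
readings <- tibble(
  id = 1:6,
  temp_c = c(14.2, 15.1, 13.8, 150.0, 14.9, 15.3)
)

# Decision log (assumed reasoning): id 4 is excluded because 150 C is
# physically implausible for an air temperature reading
readings_clean <- readings %>%
  filter(id != 4)

# The original object is unchanged and still holds all six rows
nrow(readings)
nrow(readings_clean)
```

Keeping both objects means the exclusion can be revisited later, and the comment travels with the code that made the decision.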

Next week

  • Hypothesis testing

  • Best practices for data documentation

  • Introducing Quarto

  • Data ethics and open science models