Environmental Data Analysis and Visualization
Visualization Critique
Sensor of the week
Traffic sensors
Learning how to look
University of Wollongong
Getting a clearer view
Flip x and y axes
library(ggplot2)
ggplot(data = Sacramento, aes(x = city, y = price)) +
  geom_boxplot() +
  coord_flip()
Resize y-axis labels
ggplot(data = Sacramento, aes(x = city, y = price)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.text.y = element_text(size = rel(0.75)))
Reorder based on median house prices
library(forcats)  # provides fct_reorder()
ggplot(data = Sacramento, aes(x = fct_reorder(city, price, median), y = price)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.text.y = element_text(size = rel(0.5)))
Add some labels
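The code for this slide did not survive, so here is a minimal sketch of one way to label the previous plot with labs(). It assumes Sacramento is loaded from the modeldata package, and the label text is illustrative rather than taken from the original slide.

```r
library(ggplot2)
library(forcats)

# Assumption: Sacramento comes from the modeldata package.
data(Sacramento, package = "modeldata")

p <- ggplot(data = Sacramento, aes(x = fct_reorder(city, price, median), y = price)) +
  geom_boxplot() +
  coord_flip() +
  theme(axis.text.y = element_text(size = rel(0.5))) +
  # Illustrative label text, not from the original slide.
  labs(x = "City", y = "Sale price", title = "Sacramento house prices by city")
p
```

Because of coord_flip(), the x label ("City") appears on the vertical axis.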
Getting a clearer view
ggplot(data = abalone, aes(x = diameter * 200, y = weight.whole * 200)) +
  geom_point()
Changing the transparency (alpha)
ggplot(data = abalone, aes(x = diameter * 200, y = weight.whole * 200)) +
  geom_point(alpha = 0.1)
Changing the size
ggplot(data = abalone, aes(x = diameter * 200, y = weight.whole * 200)) +
  geom_point(size = 0.25)
Binning the values (geom_bin2d)
ggplot(data = abalone, aes(x = diameter * 200, y = weight.whole * 200)) +
  geom_bin2d()
Getting a clearer view
library(tidyr)  # provides drop_na()
ggplot(data = drop_na(bully, maxl), aes(x = altitude, y = maxl)) +
  geom_point()
Zooming in
ggplot(data = drop_na(bully, maxl), aes(x = altitude, y = maxl)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 100))
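A short sketch of why coord_cartesian() is a reasonable choice for zooming: it changes only the visible window, whereas limits set on the scale discard out-of-range rows before any statistic is computed. The data frame fish and its values are made up for illustration and only stand in for the bully data. The plot names p_zoom and p_limits are hypothetical too.

```r
library(ggplot2)

# Made-up data for illustration only; it stands in for the bully dataset.
set.seed(1)
fish <- data.frame(altitude = runif(200, min = 0, max = 800))
fish$maxl <- 60 + 0.05 * fish$altitude + rnorm(200, sd = 10)

# coord_cartesian() zooms the view: the line is still fit to all 200 rows.
p_zoom <- ggplot(data = fish, aes(x = altitude, y = maxl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  coord_cartesian(xlim = c(0, 100))

# Scale limits drop rows outside 0-100 (with a warning) before the line is fit.
p_limits <- ggplot(data = fish, aes(x = altitude, y = maxl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_continuous(limits = c(0, 100))
```

Printing both plots side by side shows the same window on the x-axis, but the fitted lines can differ because they are based on different subsets of the data.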
Getting a clearer view
ggplot(data = drop_na(bully, maxl), aes(x = altitude, y = maxl)) +
  geom_point()
Log-10 transformation on x-axis
ggplot(data = drop_na(bully, maxl), aes(x = altitude, y = maxl)) +
  geom_point() +
  scale_x_continuous(trans = 'log10')
Add a smooth line
ggplot(data = drop_na(bully, maxl), aes(x = altitude, y = maxl)) +
  geom_point() +
  scale_x_continuous(trans = 'log10') +
  geom_smooth()
Seeing the bigger picture
Faceted plots can help us view the same pattern across multiple variables.
ggplot(data = Sacramento, aes(x = sqft, y = price)) +
  geom_point() +
  facet_wrap(vars(type))
Activity: Looking and comparing
In this exercise, you’ll use the scat dataset in the modeldata package (same data, fewer bobcats).
Use the facet_wrap function to look at relationships (remember to use the vars function to identify the faceting variable):
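One possible starting point for the activity, offered as a sketch rather than the intended answer. It assumes the scat data frame has columns named Species, d13C and d15N; swap in whichever variables you want to compare.

```r
library(ggplot2)

# Assumption: scat is loaded from modeldata and has these column names.
data(scat, package = "modeldata")

p <- ggplot(data = scat, aes(x = d13C, y = d15N)) +
  geom_point() +
  # One panel per species; vars() names the faceting variable.
  facet_wrap(vars(Species))
p
```

Rows with missing values in the plotted columns are dropped with a warning when the plot is drawn.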
Correlation coefficient
https://online.stat.psu.edu
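As a small worked example, the Pearson correlation coefficient can be computed in base R with cor(). The vectors below are made-up values chosen so the results are easy to check by hand. They are not from any dataset used in these slides.

```r
# Made-up values for illustration only.
x       <- c(1, 2, 3, 4, 5)
y_up    <- c(2, 4, 6, 8, 10)   # rises exactly in step with x
y_down  <- rev(y_up)           # falls exactly in step with x
y_noisy <- c(2, 1, 4, 3, 5)    # rises with x, but not perfectly

r_up    <- cor(x, y_up)     # 1: perfect positive linear relationship
r_down  <- cor(x, y_down)   # -1: perfect negative linear relationship
r_noisy <- cor(x, y_noisy)  # 0.8: strong but imperfect positive relationship
```

The coefficient always lies between -1 and 1, and it measures only the linear part of a relationship, so it is worth looking at the scatterplot as well.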
What to do about outliers?
An outlier is not automatically wrong or a mistake.
But some outliers are mistakes.
Your job as the data scientist is to decide which is which.
If you decide to eliminate an observation, do not modify the original data. Instead, create a new dataset without the observation.
Always document the reasoning behind any such decision.
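The advice above can be sketched as follows, using a made-up data frame. The names readings and readings_clean and the decision to drop observation 4 are hypothetical and serve only to show the pattern.

```r
library(dplyr)

# Made-up sensor readings for illustration only.
readings <- data.frame(
  id    = 1:6,
  value = c(10.2, 9.8, 10.5, 99.0, 10.1, 9.9)
)

# Hypothetical decision: observation 4 was judged to be a recording error.
# The reason is written down here, next to the code that removes it.
readings_clean <- filter(readings, id != 4)

nrow(readings)        # original data is untouched: 6 rows
nrow(readings_clean)  # new dataset without the observation: 5 rows
```

Keeping readings intact means the removal can be reviewed or reversed later.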