Create a project and file system for this week’s lecture
Download the middlesexEColi.csv dataset from Canvas to the appropriate place in your file system and read it into R
Create a histogram of the results of these tests. The horizontal (x) axis is colony forming units per 100mL sample (cfu/100mL)
After you’ve plotted the histogram, you can use the `abline` function with the argument `v=235` to draw a vertical line at 235, and color it red.
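One way the exercise steps might look in base R is sketched below. The file path and the column name `cfu` are assumptions (the actual header in middlesexEColi.csv is not given here), so the sketch uses simulated counts as a stand-in and leaves the `read.csv` call as a comment.

```r
# With the real file, something like the two lines below would replace the
# simulated data. The path and the column name `cfu` are assumptions.
# ecoli <- read.csv("data/middlesexEColi.csv")
# cfu <- ecoli$cfu

# Stand-in data: simulated, right-skewed counts (NOT the real test results)
set.seed(42)
cfu <- round(rlnorm(100, meanlog = 4.5, sdlog = 1))

# Histogram with colony forming units per 100 mL on the x axis
h <- hist(cfu,
          xlab = "cfu/100mL",
          main = "E. coli test results (simulated stand-in data)")

# Vertical reference line at 235 cfu/100mL, drawn in red
abline(v = 235, col = "red")
```

`abline` draws on the most recent plot, so it has to be called after `hist`.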
The Mystic River Water Quality Buoy!
Mystic River Buoy Readings
National Water Quality Monitoring Council
Visualization helps us to identify patterns and structures in data that are not evident from tables or numerical summaries.
Faith, J. Tyler. 2018. “Paleodietary Change and Its Implications for Aridity Indices Derived from δ18O of Herbivore Tooth Enamel.” https://doi.org/10.1016/j.palaeo.2017.11.045.
Marine Reservoir Correction data from calib.org
Mean x: 9
Mean y: 7.5
Pearson correlation coefficient (r): 0.816
Coefficient of determination (R²): 0.67
Avenue, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons
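The summary values above match those of Anscombe's quartet, which ships with R as the built-in `anscombe` data frame. Assuming that is the dataset pictured, a short sketch like the following recomputes the same statistics for each of the four x/y pairs:

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  r <- cor(x, y)  # Pearson correlation is the default method
  c(mean_x = mean(x), mean_y = mean(y), r = r, r_squared = r^2)
})
colnames(stats) <- paste0("set", 1:4)

# One column per dataset; the rows are nearly identical across columns
round(stats, 3)
```

The four datasets share almost the same summaries, yet look very different when plotted, which is the point of the figure.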
Matejka and Fitzmaurice, 2017. “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing.” https://doi.org/10.1145/3025453.3025912.
Also see: datasaurus package
Choose the right chart for the data
Maximize the data-to-ink ratio
Make deliberate design decisions
The type of visualization that should be used depends on the kind of information being conveyed.
Wilke, Claus. 2019. Fundamentals of Data Visualization
Data-ink (per Tufte 1983) refers to ink (or pixels) that, if erased, would reduce the information being presented.
Ideally, the ratio between data and total ink should be close to 1.
The most egregious use of non-data ink is often referred to as chartjunk.
Su, Yu-Sung. 2008. "It's Easy to Produce Chartjunk Using Microsoft® Excel 2007 but Hard to Make Good Graphs." https://doi.org/10.1016/j.csda.2008.03.007.
How does this choice help someone understand the data?
Contrast
Clarity
Highlighting
Messaging
Some design choices affect some audiences more than others
Color palette
Text and symbol sizes
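As an illustration of these two choices, the sketch below uses base R's built-in Okabe-Ito palette, which was designed to remain distinguishable under common forms of color vision deficiency, together with enlarged symbols and text. The `iris` dataset and the specific size values are assumptions picked only for the example.

```r
# Three colors from the Okabe-Ito palette (available in base R >= 4.0.0)
okabe_ito <- unname(palette.colors(n = 3, palette = "Okabe-Ito"))

# Map each species to one color; enlarge points, axis labels, and tick labels
plot(iris$Petal.Length, iris$Petal.Width,
     col = okabe_ito[as.integer(iris$Species)],
     pch = 19,
     cex = 1.5,       # symbol size
     cex.lab = 1.4,   # axis title size
     cex.axis = 1.2,  # tick label size
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)")

legend("topleft", legend = levels(iris$Species),
       col = okabe_ito, pch = 19, cex = 1.2)
```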
Wilke, Claus. 2019. Fundamentals of Data Visualization
A framework (per Wilkinson et al. 2005; Wickham 2010) used to describe the components of a data visualization in terms of a set of layered objects
https://wilkelab.org
Component | Description | Example |
---|---|---|
Statistics | Statistical transformations or summaries of data | mean, log transformation, smoothing spline |
Facets | Divisions in data used for multi-plotting | side-by-side plot, 2 x 2 plot |
Coordinates | Space used for plotting values | Cartesian 2D space, polar coordinate space |
Themes | Non-data ink | Font size, shading of background grid, location of tick marks |
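To make the components in the table concrete, here is a minimal ggplot2 sketch in which each `+` adds one layer. The built-in `mtcars` dataset and the particular choices (a linear fit, faceting by cylinder count) are assumptions made for illustration; the data, aesthetic mapping, and point geometry appear only because ggplot2 needs them to draw anything.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +       # data and aesthetic mapping
  geom_point() +                                   # geometry: one point per car
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                        # statistics: fitted line
  facet_wrap(~ cyl) +                              # facets: one panel per cyl
  coord_cartesian() +                              # coordinates: Cartesian 2D
  theme_minimal(base_size = 12)                    # theme: non-data ink

p
```

Swapping a single line, for example `theme_minimal()` for `theme_classic()`, changes one component without touching the others.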
Use the `as_tibble` function to convert the `faithful` dataset from a data frame to a tibble
Using ggplot2, plot the `faithful` dataset as a scatterplot, with waiting time as the x variable
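A possible solution sketch for the two steps above follows. `faithful` is a built-in R dataset with columns `eruptions` and `waiting`; putting `eruptions` on the y axis is an assumption, since only the x variable is specified, and the axis labels are likewise illustrative.

```r
library(tibble)
library(ggplot2)

# Convert the built-in data frame to a tibble
faithful_tbl <- as_tibble(faithful)

# Scatterplot with waiting time on the x axis
ggplot(faithful_tbl, aes(x = waiting, y = eruptions)) +
  geom_point() +
  labs(x = "Waiting time (min)", y = "Eruption duration (min)")
```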