Environmental Data Analysis & Visualization

Does anybody really know what time it is?

Warm-up exercise

Create a new file system and Quarto document for this lecture. Download the walkability.csv dataset. Using pipes, create a dataset with the following variables:

  • Core-Based Statistical Area (CBSA) name (excluding the state)

  • Population (only above 5 million)

  • National Walkability Index

Show the distribution of walkability index scores for all CBSAs in this new dataset.

Warm-up exercise

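library(tidyverse)

walkData <- read_csv("data/walkability.csv")

walkData %>%
  select(CBSA_Name, CBSA_POP, NatWalkInd) %>%
  # split "City, ST" at the comma; the state piece is not used below
  separate(CBSA_Name, into = c("cities", "states"), sep = ",", extra = "drop") %>%
  filter(CBSA_POP > 5000000) %>%
  # order the CBSAs by median walkability
  ggplot(aes(x = reorder(cities, NatWalkInd, median), y = NatWalkInd)) +
  geom_boxplot() +
  coord_flip() +
  # labs() follows the aesthetics, not the flipped positions
  labs(x = "CBSA", y = "National Walkability Index")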

Visualization critique

New York Times

Visualization critique

https://www.toptal.com/designers/data-visualization/data-visualization-best-practices

Visualization critique

https://experience.arcgis.com/experience/be9006c30a2148f595693066441fb8eb/page/Map/

Dataset of the day

Stock market data

Yahoo! Finance

When does “when” matter?

NASA Scientific Visualization Studio (https://svs.gsfc.nasa.gov/5137/)

Time: what is it?

R can store time as character values.

  • "2023-10-19"

  • "10/19/2023"

  • "17:05:00"

Time: what is it?

It can also store some kinds of time data as numbers:

  • 2023

  • 10

  • 19

Time: what is it?


Time as data is a strange beast.

Time by itself isn’t something that varies in a meaningful way.

Time has a defined order, but you can’t really count time, nor can you really treat it like a number.

Time: what is it?

R can also store time as dedicated object types, such as date and date-time objects. For example:

#give today's date as a date object
today()
[1] "2023-10-20"
#give the precise time as a date-time object
now()
[1] "2023-10-20 12:51:43 EDT"

Handling time data

The lubridate package lets us convert between other values and time objects.

library(lubridate)
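
As a small sketch of that conversion, the character formats listed earlier can each be parsed with the lubridate function whose name matches the order of the components. The variable names here are illustrative only.

```r
library(lubridate)

# The same calendar day written in two different character formats
iso_text <- "2023-10-19"
us_text  <- "10/19/2023"

# Pick the parser whose name matches the order of the components
iso_date <- ymd(iso_text)   # year-month-day
us_date  <- mdy(us_text)    # month/day/year

# Both strings parse to the same Date value
iso_date == us_date

# A clock time can be parsed into a period object with hms()
clock <- hms("17:05:00")
hour(clock)
minute(clock)
```

If the parser does not match the text (for example, ymd() on "10/19/2023"), lubridate returns NA with a warning, so it is worth checking the result before plotting.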

Handling time data

#storing the date as a character value
todaysDate<-"2023-10-19"

todaysDate
[1] "2023-10-19"
typeof(todaysDate)
[1] "character"

Handling time data

# converting the date to a date object
ymd(todaysDate)
[1] "2023-10-19"
typeof(ymd(todaysDate))
[1] "double"
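
The "double" result can look surprising. A short sketch of what is going on, assuming only base R and lubridate: typeof() reports how the value is stored, while class() reports how R treats it.

```r
library(lubridate)

d <- ymd("2023-10-19")

typeof(d)   # storage type: "double"
class(d)    # object class: "Date"

# The stored number is the count of days since 1970-01-01
as.numeric(d)

# Because a Date is a number underneath, date arithmetic works
later <- d + 30                   # 30 days later
gap   <- ymd("2023-12-31") - d    # difference between two dates, in days
later
gap
```

This is why a date object can be placed on a continuous axis, while the character version of the same date cannot.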

Visualizing time data

NASDAQ Composite data

nasdaq<-read_csv("data/nasdaq.csv")
nasdaq
# A tibble: 65 × 7
   Date        Open   High    Low  Close `Adj Close`     Volume
   <chr>      <dbl>  <dbl>  <dbl>  <dbl>       <dbl>      <dbl>
 1 7/18/2023 14212. 14397. 14176. 14354.      14354. 4824070000
 2 7/19/2023 14399. 14447. 14317. 14358.      14358. 5112420000
 3 7/20/2023 14273. 14310. 14031. 14063.      14063. 5128020000
 4 7/21/2023 14148. 14179. 14020. 14033.      14033. 5254180000
 5 7/24/2023 14082. 14110. 13997. 14059.      14059. 4083070000
 6 7/25/2023 14093. 14202. 14093. 14145.      14145. 3812470000
 7 7/26/2023 14124. 14187. 14042. 14127.      14127. 4322000000
 8 7/27/2023 14319. 14360. 14007. 14050.      14050. 5115840000
 9 7/28/2023 14200. 14344. 14188. 14317.      14317. 4453520000
10 7/31/2023 14338. 14371. 14293. 14346.      14346. 4934440000
# ℹ 55 more rows

Visualizing time data

# Date is still a character column here, so ggplot treats it as discrete
ggplot(nasdaq,aes(x=Date,y=Open)) +
  geom_line(color="darkblue") +
  labs(x="Date",y="Points at Opening",title="NASDAQ Composite, July-October 2023") +
  theme_bw()

Visualizing time data

# mdy() parses the month/day/year strings into Date values
ggplot(nasdaq,aes(x=mdy(Date),y=Open)) +
  geom_line(color="darkblue") +
  labs(x="Date",y="Points at Opening",title="NASDAQ Composite, July-October 2023") +
  theme_bw()
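
One possible variation, sketched here under assumptions: instead of converting inside aes(), convert the column once with mutate() so that every later step sees a Date, then control the axis labels with scale_x_date(). The small table below copies the first three Date and Open values from the printed tibble (rounded as displayed); the name nasdaq_small is illustrative.

```r
library(tidyverse)
library(lubridate)

# First three rows of the NASDAQ table shown earlier (Date and Open only)
nasdaq_small <- tibble(
  Date = c("7/18/2023", "7/19/2023", "7/20/2023"),
  Open = c(14212, 14399, 14273)
)

# Convert once and store the result as a Date column
nasdaq_small <- nasdaq_small |>
  mutate(Date = mdy(Date))

# scale_x_date() only works because Date is now a Date, not a character
p <- ggplot(nasdaq_small, aes(x = Date, y = Open)) +
  geom_line(color = "darkblue") +
  scale_x_date(date_breaks = "1 day", date_labels = "%b %d") +
  theme_bw()
p
```

Storing the converted column also makes it available for filtering and summarizing by date, not only for plotting.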

Activity: Visualizing time data

Load the openintro package to access the nycflights data

Create a pipe that

  • subsets the data to Delta Airlines (DL) flights only

  • combines the year, month, and day into a single date column

  • converts to a table of counts using the count function

Plot this new data as a line graph using geom_line.

Activity: Visualizing time data

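library(tidyverse)
library(openintro)

nycflights2 <- nycflights

nycflights2 |>
  filter(carrier == "DL") |>
  # paste year, month, and day into one "year-month-day" character column
  unite(col = "Date", year, month, day, sep = "-", na.rm = TRUE) |>
  count(Date) |>
  # ymd() turns the character column into dates for the x axis
  ggplot(aes(x = ymd(Date), y = n)) +
  geom_line(color = "darkblue")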
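
An alternative sketch for the "combine year, month, and day" step: lubridate's make_date() builds a Date directly from numeric columns, so no character column or later ymd() call is needed. The rows below are a hypothetical stand-in for the flights data, not real records.

```r
library(tidyverse)
library(lubridate)

# Toy stand-in for the flights data (hypothetical rows)
flights_toy <- tibble(
  year    = c(2013, 2013, 2013),
  month   = c(1, 1, 2),
  day     = c(5, 5, 14),
  carrier = c("DL", "DL", "DL")
)

daily <- flights_toy |>
  filter(carrier == "DL") |>
  # build a Date column straight from the numeric pieces
  mutate(Date = make_date(year, month, day)) |>
  count(Date)

daily
```

The resulting Date column can go directly into aes(x = Date, y = n) with geom_line().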

Next time

  • Data storytelling

  • Introducing sf

  • Making and plotting vector data