Environmental Data Analysis and Visualization

Does Anybody Really Know What Time It Is?

Warm-Up Exercise

Create a new file system and Quarto document for this lecture. Download the walkability.csv dataset. Using pipes, create a dataset with the following variables:

  • Core-Based Statistical Area (CBSA) name (excluding the state)

  • Population (only above 5 million)

  • National Walkability Index

Show the distribution of walkability index scores for all CBSAs in this new dataset.

Warm-Up Exercise

library(tidyverse)

# read the walkability data from the data folder
walkData<-read_csv("data/walkability.csv")

walkData %>%   
  select(CBSA_Name,CBSA_POP,NatWalkInd) %>%   
  separate(CBSA_Name,into=c("cities","states"),sep=",",extra="drop") %>%   
  filter(CBSA_POP>5000000) %>%
  ggplot(aes(x=reorder(cities,NatWalkInd,median),y=NatWalkInd)) +     
  geom_boxplot() +     
  coord_flip() +   
  # labs() names the aesthetics, so x is still the CBSA axis after coord_flip()
  labs(x="CBSA",y="National Walkability Index")

Warm-up exercise

Visualization Critique

r/dataisbeautiful

Visualization Critique

nytimes.com

Visualization Critique

nytimes.com

Next week

  • Rolando

  • Tal

Dataset of the day

Stock market data

nasdaq.com

Dataset of the day

The quantmod package

[1] "XOM"
           XOM.Open XOM.High XOM.Low XOM.Close XOM.Volume XOM.Adjusted
2007-01-03    76.26    76.27   73.51     74.11   30510700     40.00807
2007-01-04    73.85    73.90   72.21     72.72   31046600     39.25766
2007-01-05    72.45    73.38   72.22     73.24   24671500     39.53840
2007-01-08    73.88    74.25   71.85     72.65   25981900     39.21988
2007-01-09    71.85    72.84   71.44     72.09   27039900     38.91758
2007-01-10    71.81    73.11   70.64     70.99   30572000     38.32374

Dates as data

# A tibble: 4,486 × 7
   Date       XOM.Open XOM.High XOM.Low XOM.Close XOM.Volume XOM.Adjusted
   <date>        <dbl>    <dbl>   <dbl>     <dbl>      <dbl>        <dbl>
 1 2007-01-03     76.3     76.3    73.5      74.1   30510700         40.0
 2 2007-01-04     73.8     73.9    72.2      72.7   31046600         39.3
 3 2007-01-05     72.4     73.4    72.2      73.2   24671500         39.5
 4 2007-01-08     73.9     74.2    71.8      72.7   25981900         39.2
 5 2007-01-09     71.8     72.8    71.4      72.1   27039900         38.9
 6 2007-01-10     71.8     73.1    70.6      71.0   30572000         38.3
 7 2007-01-11     71.0     72.8    70.8      71.0   29161700         38.3
 8 2007-01-12     71.2     72.7    71.2      72.7   24438700         39.2
 9 2007-01-16     72.5     72.7    71.2      71.6   25444400         38.7
10 2007-01-17     71.5     72.6    71.4      72.5   31292200         39.1
# ℹ 4,476 more rows
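
The slides do not show the code behind these two outputs. The first looks like what quantmod's getSymbols("XOM") followed by head(XOM) prints, and the second looks like that object converted to a tibble with an explicit Date column. Below is a minimal sketch of one way to do that conversion, not necessarily the one used in class. It assumes the xts and tidyverse packages are installed. The three rows are retyped by hand from the output above so the sketch runs offline, and xom_xts and xom_tbl are illustrative names.

```r
library(tidyverse)
library(xts)   # attached by quantmod; brings in zoo's index() and coredata()

# In class this object presumably came from quantmod (needs internet):
#   library(quantmod); getSymbols("XOM")
# Here the first three rows and two columns are retyped by hand.
xom_xts <- xts(
  cbind(XOM.Open  = c(76.26, 73.85, 72.45),
        XOM.Close = c(74.11, 72.72, 73.24)),
  order.by = as.Date(c("2007-01-03", "2007-01-04", "2007-01-05"))
)

# An xts object keeps its dates in an index rather than in a column,
# so pull the values and the index apart and rejoin them as a tibble.
xom_tbl <- as_tibble(coredata(xom_xts)) %>%
  mutate(Date = index(xom_xts), .before = 1)

xom_tbl
```

Once the dates live in an ordinary <date> column, the data can go through the same pipes and ggplot calls as any other tibble.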

Dataset of the day

The quantmod package

Time: what is it?


Time as data is a strange beast.

Time by itself isn’t something that varies in a meaningful way.

Time has a defined order, but you can’t really count time, nor can you really treat it like a number.

Time: what is it?

R can store time data as character values.

  • "2023-10-19"

  • "10/19/2023"

  • "17:05:00"

Time: what is it?

It can also store some kinds of time data as numbers:

  • 2023

  • 10

  • 19

Time: what is it?

R can also represent time with dedicated classes, such as date and date-time objects. For example:

#give today's date as a date object
today()
[1] "2024-10-29"
#give the precise time as a date-time object
now()
[1] "2024-10-29 14:59:44 EDT"
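
To see what sets these objects apart from the character and number versions above, it can help to check their class and try some arithmetic. This is a small sketch that assumes lubridate is installed; the variable names are illustrative.

```r
library(lubridate)

d <- today()          # a date object
right_now <- now()    # a date-time object

class(d)              # "Date"
class(right_now)      # "POSIXct" "POSIXt"

# Because these are time objects rather than text, arithmetic works:
next_week <- d + 7    # adding a number to a date adds days
next_week - d         # a time difference of 7 days
```

The same arithmetic on a character value such as "2024-10-29" would fail with an error, because R sees it as text.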

Handling time data

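The lubridate package lets us convert character and numeric values into date and date-time objects. It is loaded automatically by library(tidyverse) as of tidyverse 2.0.0; with older versions, load it separately with library(lubridate).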

Handling time data

#storing the date as a character value
todaysDate<-"2024-10-29"

todaysDate
[1] "2024-10-29"
typeof(todaysDate)
[1] "character"

Handling time data

# converting the date to a date object
ymd(todaysDate)
[1] "2024-10-29"
typeof(ymd(todaysDate))
[1] "double"
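
The "double" result can look surprising. A short sketch of what is going on, assuming lubridate is installed: typeof() reports how the value is stored, class() reports how R treats it, and a date is stored as the number of days since 1970-01-01.

```r
library(lubridate)

todaysDate <- "2024-10-29"
d <- ymd(todaysDate)

typeof(d)       # "double": how the value is stored
class(d)        # "Date":   how R treats and prints it
as.numeric(d)   # 20025, the number of days since 1970-01-01

# Subtracting two dates gives the number of days between them
d - ymd("1970-01-01")
```

This is why date objects can be ordered, subtracted, and placed on a continuous axis even though they print like text.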

Handling time data

  • ymd = “year, month, day” e.g., 2024/10/29

  • dmy = “day, month, year” e.g., 29/10/2024

  • mdy = “month, day, year” e.g., 10/29/2024
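
A quick sketch of the three parsers applied to the examples in the list above, assuming lubridate is installed. The variable names are illustrative. The last two lines show why picking the parser that matches the data matters.

```r
library(lubridate)

from_ymd <- ymd("2024/10/29")
from_dmy <- dmy("29/10/2024")
from_mdy <- mdy("10/29/2024")

# All three produce the same date object
from_ymd == from_dmy
from_dmy == from_mdy

# An ambiguous string parses without complaint either way,
# but the two results are different dates
dmy("10/11/2024")   # 10 November 2024
mdy("10/11/2024")   # 11 October 2024
```

R cannot tell which order was intended, so it is worth checking a few rows of the raw data before choosing a parser.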

Visualizing time data

NASA Scientific Visualization Studio (https://svs.gsfc.nasa.gov/5137/)

Visualizing time data

NASDAQ Composite data

nasdaq<-read_csv("data/nasdaq.csv")
nasdaq
# A tibble: 127 × 5
   Date       `Close/Last`   Open   High    Low
   <chr>             <dbl>  <dbl>  <dbl>  <dbl>
 1 10/28/2024       18567. 18648. 18671. 18563.
 2 10/25/2024       18519. 18513. 18690. 18487.
 3 10/24/2024       18415. 18384. 18435. 18305.
 4 10/23/2024       18277. 18502. 18509. 18147.
 5 10/22/2024       18573. 18452. 18621. 18413.
 6 10/21/2024       18540  18456. 18544. 18378.
 7 10/18/2024       18490. 18466  18524. 18453.
 8 10/17/2024       18374. 18537. 18541. 18369.
 9 10/16/2024       18367. 18333. 18383. 18215.
10 10/15/2024       18316. 18516. 18564. 18253.
# ℹ 117 more rows

Visualizing time data

ggplot(nasdaq,aes(x=Date,y=Open)) +
  geom_line(color="darkblue") +
  labs(x="Date",y="Points at Open",title="NASDAQ Composite, May-October 2024") +
  theme_bw()

Visualizing time data

ggplot(nasdaq,aes(x=mdy(Date),y=Open)) +
  geom_line(color="darkblue") +
  labs(x="Date",y="Points at Open",title="NASDAQ Composite, May-October 2024") +
  theme_bw()
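
The two plots differ only in how Date is handled. In the first, Date is still a <chr> column, so ggplot treats every date as its own category and geom_line() has nothing to connect. Wrapping it in mdy() turns it into a date and gives a continuous time axis. An alternative is to convert the column once with mutate() so every later step sees a date. The sketch below uses five rows retyped from the output above (Open values rounded as printed) so it runs without the CSV. The names nasdaq_small and nasdaq_dated and the scale_x_date() label format are illustrative choices, not part of the original slides.

```r
library(tidyverse)
library(lubridate)   # already attached by tidyverse 2.0.0 and later

# Five rows retyped from the printed tibble, Open rounded as shown
nasdaq_small <- tibble(
  Date = c("10/28/2024", "10/25/2024", "10/24/2024",
           "10/23/2024", "10/22/2024"),
  Open = c(18648, 18513, 18384, 18502, 18452)
)

# Convert the character column to a date column once, up front
nasdaq_dated <- nasdaq_small %>%
  mutate(Date = mdy(Date))

p <- ggplot(nasdaq_dated, aes(x = Date, y = Open)) +
  geom_line(color = "darkblue") +
  scale_x_date(date_labels = "%b %d") +   # e.g. "Oct 28"
  labs(x = "Date", y = "Points at Open") +
  theme_bw()

p
```

Converting up front also means the rows can be sorted or filtered by date before plotting.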

Activity: Visualizing time data

Load the openintro package to access the nycflights data

Create a pipeline that

  • subsets the data to Delta Airlines (DL) flights only

  • combines the year, month, and day into a single date column

  • converts this to a table of counts using the count function (this is dplyr's count(), which returns a data frame, not the after_stat(count) used inside a ggplot call)

Plot this new data as a line graph using geom_line.

Activity: Visualizing time data

library(tidyverse)
library(openintro)
# work on a copy of the nycflights data
nycflights2<-nycflights

nycflights2 %>% 
  filter(carrier=="DL") %>%
  unite(col="Date",year,month,day,sep="-",na.rm = TRUE ) %>%
  count(Date) %>%
  ggplot(aes(x=ymd(Date),y=n)) +
  geom_line(color="darkblue") +
  labs(x="Date",y="Number of Flights")
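
The solution above pastes the three columns into text with unite() and then parses that text with ymd(). Another route, sketched here, is lubridate's make_date(), which builds the date directly from the numeric columns. The tibble flights_small is a hypothetical stand-in with the same column names as those used in the solution, so the sketch runs without openintro.

```r
library(tidyverse)
library(lubridate)   # already attached by tidyverse 2.0.0 and later

# Hypothetical stand-in for nycflights, with made-up rows
flights_small <- tibble(
  year    = 2013L,
  month   = c(1L, 1L, 1L, 2L),
  day     = c(1L, 1L, 2L, 14L),
  carrier = c("DL", "DL", "UA", "DL")
)

daily_counts <- flights_small %>%
  filter(carrier == "DL") %>%                    # Delta flights only
  mutate(Date = make_date(year, month, day)) %>% # numbers -> date
  count(Date)                                    # one row per date, count in n

daily_counts
```

Because Date is already a date object here, the plotting step can use aes(x=Date,y=n) without calling ymd() inside aes().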

Activity: Visualizing time data

Coursekeeping

By the end of this week, you will have feedback on your project proposal.

Grades on Coding Exercise 1 will be returned early next week.

Coding Exercise 2 is due November 6.

Coursekeeping

For the next two weeks, we will only have the longer lab session.

Students who have visualization critiques due will deliver these at the start of the session.

There will also be a (brief) introduction to the week’s topics before moving on to the lab.

Coursekeeping

Next week, be prepared to meet with either Denisse or Ben to discuss feedback on your proposal and next steps.

Next time

  • Introducing sf

  • Making and plotting vector data

  • Storytelling (with maps!)