Environmental Data Analysis and Visualization
Does Anybody Really Know What Time It Is?
Warm-Up Exercise
Create a new file system and Quarto document for this lecture. Download the walkability.csv dataset. Using pipes, create a dataset with the following variables:
Core-Based Statistical Area (CBSA) name (excluding the state)
Population (only above 5 million)
National Walkability Index
Show the distribution of walkability index scores for all CBSAs in this new dataset
Warm-Up Exercise
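library(tidyverse)
walkData <- read_csv("data/walkability.csv")
walkData %>%
  select(CBSA_Name, CBSA_POP, NatWalkInd) %>%
  # split the CBSA name at the comma into a city part and a state part
  separate(CBSA_Name, into = c("cities", "states"), sep = ",", extra = "drop") %>%
  # keep CBSAs with a population above 5 million, as the exercise asks
  filter(CBSA_POP > 5000000) %>%
  ggplot(aes(x = reorder(cities, NatWalkInd, median), y = NatWalkInd)) +
  geom_boxplot() +
  # coord_flip() draws x (CBSA) vertically and y (index) horizontally
  coord_flip() +
  labs(x = "CBSA", y = "Walkability Index")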
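To see what the separate() step does on its own, here is a minimal sketch. The two CBSA names are hypothetical examples written in an assumed "cities, states" style; they are not rows read from walkability.csv.

```r
library(tidyr)
library(tibble)

# Hypothetical CBSA names in an assumed "cities, states" format
cbsa <- tibble(CBSA_Name = c("New York-Newark-Jersey City, NY-NJ-PA",
                             "Chicago-Naperville-Elgin, IL-IN-WI"))

# Split at the comma; extra = "drop" discards any pieces beyond the first two
cbsa_split <- separate(cbsa, CBSA_Name, into = c("cities", "states"),
                       sep = ",", extra = "drop")
cbsa_split$cities
```

Because the split happens at the comma only, the states column keeps the space that followed the comma. That does not matter here, since only the cities column is plotted.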
Dataset of the day
The quantmod package
           XOM.Open XOM.High XOM.Low XOM.Close XOM.Volume XOM.Adjusted
2007-01-03    76.26    76.27   73.51     74.11   30510700     40.00807
2007-01-04    73.85    73.90   72.21     72.72   31046600     39.25766
2007-01-05    72.45    73.38   72.22     73.24   24671500     39.53840
2007-01-08    73.88    74.25   71.85     72.65   25981900     39.21988
2007-01-09    71.85    72.84   71.44     72.09   27039900     38.91758
2007-01-10    71.81    73.11   70.64     70.99   30572000     38.32374
Dates as data
# A tibble: 4,486 × 7
   Date       XOM.Open XOM.High XOM.Low XOM.Close XOM.Volume XOM.Adjusted
   <date>        <dbl>    <dbl>   <dbl>     <dbl>      <dbl>        <dbl>
 1 2007-01-03     76.3     76.3    73.5      74.1   30510700         40.0
 2 2007-01-04     73.8     73.9    72.2      72.7   31046600         39.3
 3 2007-01-05     72.4     73.4    72.2      73.2   24671500         39.5
 4 2007-01-08     73.9     74.2    71.8      72.7   25981900         39.2
 5 2007-01-09     71.8     72.8    71.4      72.1   27039900         38.9
 6 2007-01-10     71.8     73.1    70.6      71.0   30572000         38.3
 7 2007-01-11     71.0     72.8    70.8      71.0   29161700         38.3
 8 2007-01-12     71.2     72.7    71.2      72.7   24438700         39.2
 9 2007-01-16     72.5     72.7    71.2      71.6   25444400         38.7
10 2007-01-17     71.5     72.6    71.4      72.5   31292200         39.1
# ℹ 4,476 more rows
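The slides do not show how the date-indexed object became a tibble with a Date column, so the following is only one possible route, not necessarily the one used in the lecture. It builds a tiny xts object by hand (two rows copied from the printout above, so no download is needed) and moves the index into an ordinary column. The names xom_xts and xom_tbl are made up for this sketch.

```r
library(xts)     # also attaches zoo, which provides index() and coredata()
library(tibble)

# Two rows copied from the printed XOM data, rebuilt by hand
vals <- cbind(XOM.Open  = c(76.26, 73.85),
              XOM.Close = c(74.11, 72.72))
xom_xts <- xts(vals, order.by = as.Date(c("2007-01-03", "2007-01-04")))

# index() extracts the dates, coredata() the numeric matrix;
# combining them gives a regular table with Date as its first column
xom_tbl <- as_tibble(data.frame(Date = index(xom_xts), coredata(xom_xts)))
xom_tbl
```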
Time: what is it?
Time as data is a strange beast.
Time by itself isn’t something that varies in a meaningful way.
Time has a defined order, but you can’t really count time, nor can you really treat it like a number.
Time: what is it?
R can store time data as character values.
"2023-10-19"
"10/19/2023"
"17:05:00"
Time: what is it?
It can also store some kinds of time data as numbers:
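As a minimal illustration using base R only: a Date is stored as the number of days since 1970-01-01, and a date-time (POSIXct) as the number of seconds since 1970-01-01 00:00:00 UTC. The variable names are chosen for this sketch.

```r
# A date is stored internally as a count of days since 1970-01-01
a_date <- as.Date("2023-10-19")
day_count <- as.numeric(a_date)
day_count       # 19649

# A date-time is stored as a count of seconds since 1970-01-01 00:00:00 UTC
a_datetime <- as.POSIXct("2023-10-19 00:00:00", tz = "UTC")
second_count <- as.numeric(a_datetime)
second_count    # 19649 days * 86400 seconds per day
```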
Time: what is it?
R can also recognize time as particular objects like date and date-time objects. For example:
# give today's date as a date object
today()
# give the precise time as a date-time object
now()
[1] "2024-10-29 14:59:44 EDT"
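A quick way to confirm that these are time objects rather than character values is to check their classes. A minimal sketch, assuming lubridate is installed:

```r
library(lubridate)

date_class <- class(today())      # "Date"
datetime_class <- class(now())    # "POSIXct" "POSIXt"

date_class
datetime_class
```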
Handling time data
The lubridate package lets us convert between other values and time objects. It is loaded automatically with tidyverse (version 2.0.0 and later).
Handling time data
# storing the date as a character value
todaysDate <- "2024-10-29"
todaysDate
Handling time data
# converting the date to a date object
ymd(todaysDate)
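Once the value is a date object, R can do things with it that are not possible with a character string. A short sketch, assuming lubridate is installed; the names parsed and next_week are chosen for illustration:

```r
library(lubridate)

todaysDate <- "2024-10-29"
parsed <- ymd(todaysDate)

# Date arithmetic: one week later
next_week <- parsed + days(7)
next_week

# Extract components of the date
month(parsed)                  # month number
wday(parsed, week_start = 1)   # day of the week, counting Monday as 1
```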
Handling time data
ymd = “year, month, day”, e.g., 2024/10/29
dmy = “day, month, year”, e.g., 29/10/2024
mdy = “month, day, year”, e.g., 10/29/2024
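The three functions differ only in the order they expect, so the same calendar day written three ways should parse to the same date object. A minimal sketch, assuming lubridate is installed:

```r
library(lubridate)

# The same day written in three different orders
from_ymd <- ymd("2024/10/29")
from_dmy <- dmy("29/10/2024")
from_mdy <- mdy("10/29/2024")

# All three comparisons are TRUE
from_ymd == from_dmy
from_dmy == from_mdy
```

Choose the function whose name matches the order in which the pieces appear in your data.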
Visualizing time data
NASDAQ Composite data
nasdaq <- read_csv("data/nasdaq.csv")
nasdaq
# A tibble: 127 × 5
   Date       `Close/Last`   Open   High    Low
   <chr>             <dbl>  <dbl>  <dbl>  <dbl>
 1 10/28/2024       18567. 18648. 18671. 18563.
 2 10/25/2024       18519. 18513. 18690. 18487.
 3 10/24/2024       18415. 18384. 18435. 18305.
 4 10/23/2024       18277. 18502. 18509. 18147.
 5 10/22/2024       18573. 18452. 18621. 18413.
 6 10/21/2024       18540  18456. 18544. 18378.
 7 10/18/2024       18490. 18466  18524. 18453.
 8 10/17/2024       18374. 18537. 18541. 18369.
 9 10/16/2024       18367. 18333. 18383. 18215.
10 10/15/2024       18316. 18516. 18564. 18253.
# ℹ 117 more rows
Visualizing time data
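# Date is still a character column (<chr>), so ggplot2 treats it as
# discrete categories rather than as a continuous time axis
ggplot(nasdaq, aes(x = Date, y = Open)) +
  geom_line(color = "darkblue") +
  labs(x = "Date", y = "Points at Open", title = "NASDAQ Composite, May-October 2024") +
  theme_bw()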
Visualizing time data
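# mdy() converts the month/day/year strings to date objects,
# giving the plot a proper, ordered time axis
ggplot(nasdaq, aes(x = mdy(Date), y = Open)) +
  geom_line(color = "darkblue") +
  labs(x = "Date", y = "Points at Open", title = "NASDAQ Composite, May-October 2024") +
  theme_bw()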
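An alternative to converting inside aes() is to convert the column once with mutate(), so every later step sees a real date. The sketch below uses a three-row stand-in for the NASDAQ data (values rounded from the printed rows above); nasdaq_small and nasdaq_dates are names made up for this example.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Three-row stand-in for the NASDAQ tibble, with Date stored as text
nasdaq_small <- tibble(Date = c("10/28/2024", "10/25/2024", "10/24/2024"),
                       Open = c(18648, 18513, 18384))

# Convert the text to dates once, then sort chronologically
nasdaq_dates <- nasdaq_small %>%
  mutate(Date = mdy(Date)) %>%
  arrange(Date)

class(nasdaq_dates$Date)   # "Date"
nasdaq_dates
```

With the column converted up front, the plotting call can use aes(x = Date) directly.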
Activity: Visualizing time data
Load the openintro package to access the nycflights data.
Create a pipeline that
subsets the data to Delta Airlines (DL) flights only
combines the year, month, and day into a single date column
converts this to a table of counts using the count function (different from after_stat)
Plot this new data as a line graph using geom_line.
Activity: Visualizing time data
library(openintro)
nycflights2 <- nycflights
nycflights2 %>%
  filter(carrier == "DL") %>%
  # paste year, month, and day together as "year-month-day" text
  unite(col = "Date", year, month, day, sep = "-", na.rm = TRUE) %>%
  # one row per date, with the number of flights in column n
  count(Date) %>%
  ggplot(aes(x = ymd(Date), y = n)) +
  geom_line(color = "darkblue") +
  labs(x = "Date", y = "Number of Flights")
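As a possible alternative to unite() followed by ymd(), lubridate's make_date() builds a date object directly from numeric year, month, and day columns. This is a sketch of a different route, not the solution shown in the slides, and the three flights below are hypothetical rows rather than data from nycflights.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical flights with separate year, month, and day columns
flights_small <- tibble(year    = c(2013L, 2013L, 2013L),
                        month   = c(1L, 1L, 2L),
                        day     = c(5L, 5L, 14L),
                        carrier = c("DL", "DL", "DL"))

daily_counts <- flights_small %>%
  filter(carrier == "DL") %>%
  # build a Date column straight from the numeric parts
  mutate(Date = make_date(year, month, day)) %>%
  # count flights per day
  count(Date)

daily_counts
```

Because Date is already a date object after mutate(), the plot could map x = Date without a further ymd() call.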
Coursekeeping
By the end of this week, you will have feedback on your project proposal.
Grades on Coding Exercise 1 will be returned early next week.
Coding Exercise 2 is due November 6.
Coursekeeping
For the next two weeks, we will only have the longer lab session.
Students who have visualization critiques due will deliver these at the start of the session.
There will also be a (brief) introduction to the week’s topics before moving on to the lab.
Coursekeeping
Next week, be prepared to meet with either Denisse or Ben to discuss feedback on proposals and next steps.