Environmental Data Analysis and Visualization

Postering 101

Visualization critique

NY Times

Visualization critique

Kirk Goldsberry

Visualization critique

data.epa.gov

Next visualization critiques (Dec 3)

  • Bella

  • Anya

  • Tamara

Checking in

  • How did it go yesterday?

  • How much do you feel like you remember?

  • Ready to keep going?

Data source of the day

Natural Earth

Reviewing null hypotheses

Every hypothesis test comes with a null hypothesis.

In a hypothesis test, the p-value is the probability of observing a pattern at least as extreme as the one in the data if the null hypothesis were true.
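To make that definition concrete, here is a minimal sketch in base R that estimates a p-value by permutation. Shuffling the group labels simulates a world where the null hypothesis (no difference between groups) is true, and the p-value is the share of shuffled results at least as extreme as the observed one. The simulated groups, their sizes, and the 10,000 shuffles are arbitrary assumptions for illustration, not course data.

```r
# Minimal permutation-test sketch (illustrative; simulated data only).
set.seed(42)
groupA <- rnorm(20, mean = 0, sd = 1)   # hypothetical group A
groupB <- rnorm(20, mean = 3, sd = 1)   # hypothetical group B
values <- c(groupA, groupB)
labels <- rep(c("A", "B"), each = 20)

# Observed test statistic: difference in group means
observed <- mean(values[labels == "B"]) - mean(values[labels == "A"])

# If the null hypothesis is true, the labels are interchangeable, so
# shuffling them generates statistics from a "null is true" world.
nPerm <- 10000
nullStats <- replicate(nPerm, {
  shuffled <- sample(labels)
  mean(values[shuffled == "B"]) - mean(values[shuffled == "A"])
})

# Two-sided p-value: share of null statistics at least as extreme as the
# observed one (the +1 counts the observed arrangement itself)
pValue <- (sum(abs(nullStats) >= abs(observed)) + 1) / (nPerm + 1)
pValue
```

Here the groups differ by about three standard deviations, so almost no shuffle comes close to the observed difference and the p-value is very small.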

Reviewing null hypotheses

Depending on the test, the null hypothesis may be framed in terms that sound positive or negative.

  • For the Shapiro-Wilk normality test, the null hypothesis is that the data are drawn from a normally distributed population

    • Therefore, when p<0.05, the data are unlikely to have been drawn from a normal distribution
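A quick way to see this in R is to run `shapiro.test()` on one sample drawn from a normal distribution and one drawn from a skewed distribution. This is a sketch on simulated data; the sample size of 200 and the exponential distribution are assumptions chosen for illustration.

```r
# Shapiro-Wilk normality test on two simulated samples (illustrative only).
set.seed(1)
normalSample <- rnorm(200)   # drawn from a normal distribution
skewedSample <- rexp(200)    # drawn from a right-skewed exponential distribution

shapiroNormal <- shapiro.test(normalSample)
shapiroSkewed <- shapiro.test(skewedSample)

# Null hypothesis for both: the sample comes from a normal population.
shapiroNormal$p.value   # typically > 0.05 for normal data: no evidence against normality
shapiroSkewed$p.value   # far below 0.05: reject normality
```

Note the direction: a small p-value is evidence *against* normality, which is the opposite of what people often hope to see.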

  • For the Spearman’s rank correlation test, the null hypothesis is that there is no monotonic relationship between the variables

    • Therefore, when p<0.05, the pattern in the data is inconsistent with there being no monotonic relationship (i.e., it is unlikely to have arisen by chance alone)

Reviewing null hypotheses

When using a test, it’s important to know what its null hypothesis is. However, R’s help pages vary in how clearly they state it.
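When the help page is unclear, the test result itself often states the alternative hypothesis in words, and the null hypothesis is its complement. A minimal sketch with simulated, monotonic-but-nonlinear data; the variables `x` and `y` are hypothetical:

```r
# Spearman's rank correlation on simulated data (illustrative only).
set.seed(7)
x <- runif(50, min = 0, max = 10)
y <- exp(x / 3) + rnorm(50, sd = 1)   # increases with x, but not linearly

spearmanResult <- cor.test(x, y, method = "spearman")

# Printing the result states the alternative hypothesis,
# e.g. "true rho is not equal to 0"; the null is its complement (rho = 0).
print(spearmanResult)

# The same information is stored in the result's components:
spearmanResult$null.value    # value of rho under the null hypothesis (0)
spearmanResult$alternative   # "two.sided"
spearmanResult$p.value
```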

Reporting the results of a statistical test

When reporting the results of a statistical test, include at least the following information, framed in terms of the hypothesis being tested:

  • Name of test and variable(s)

  • Test statistic(s) (the symbol varies by test, e.g., t or W)

  • Degrees of freedom (df)

  • P-value (p)

Reporting the results of a statistical test

library(palmerpenguins)  # penguins dataset
library(dplyr)           # %>%, filter(), select()
library(tidyr)           # drop_na()

# Create a dataset of just Chinstrap and Gentoo penguins, keeping only the species and body mass columns
genChin <- penguins %>%
  filter(species == "Chinstrap" | species == "Gentoo") %>%
  select(species, body_mass_g) %>%
  drop_na()

t.test(data = genChin, body_mass_g ~ species)

    Welch Two Sample t-test

data:  body_mass_g by species
t = -20.628, df = 170.4, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Chinstrap and group Gentoo is not equal to 0
95 percent confidence interval:
 -1471.440 -1214.416
sample estimates:
mean in group Chinstrap    mean in group Gentoo 
               3733.088                5076.016 

Reporting the results of a statistical test

“I used Welch’s two-sample t-test of the body masses for Gentoo and Chinstrap penguins, and rejected the null hypothesis that there is no difference in mean body mass between these two groups (t=-20.63, df=170.4, p<2.2e-16).”
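The numbers in that sentence can be pulled from the test result instead of retyped by hand. A minimal sketch, using R’s built-in `sleep` dataset as a stand-in for the penguin data so it runs without extra packages; the rounding choices (two decimals for t, one for df) are assumptions, not a required style:

```r
# Welch two-sample t-test on the built-in sleep data (stand-in example)
res <- t.test(extra ~ group, data = sleep)

tStat <- unname(res$statistic)   # t statistic
dfVal <- unname(res$parameter)   # Welch degrees of freedom (often non-integer)
pVal  <- res$p.value

# Assemble the parenthetical used in a results sentence
report <- sprintf("(t=%.2f, df=%.1f, p=%s)",
                  tStat, dfVal, format.pval(pVal, digits = 3))
report
```

The same code works on the `genChin` test above by swapping in that `t.test()` call; for extremely small p-values, `format.pval()` returns a "<" string rather than the raw number.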

Linear modeling

Unlike with hypothesis tests, evaluating a linear model often involves judgment calls about how well the model fits the data.

Testing assumptions

Linearity: The relationship between X and the mean of Y is linear.

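# Fit a simple linear model: tree height as a function of diameter at breast height
treeModel <- lm(`TreeHt (m)` ~ `DBH (cm)`, data = treeDataNC)
# which = 1: residuals vs. fitted values
plot(treeModel, which = 1)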

If the red trend line is roughly flat and centered on zero, we can proceed.
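Because `treeDataNC` isn’t included here, this sketch simulates a hypothetical stand-in with the same two column names and fits the same model, so the residuals vs. fitted plot can be reproduced. The simulated linear relationship and its coefficients are assumptions, not the real tree data.

```r
# Hypothetical stand-in for treeDataNC (simulated, not the course data).
set.seed(3)
dbh <- runif(100, min = 5, max = 60)
treeSim <- data.frame(
  `DBH (cm)`   = dbh,
  `TreeHt (m)` = 2 + 0.4 * dbh + rnorm(100, sd = 2),
  check.names = FALSE   # keep the non-syntactic column names as written
)

simModel <- lm(`TreeHt (m)` ~ `DBH (cm)`, data = treeSim)

# Residuals vs. fitted: with a truly linear relationship, the red smoothed
# line should stay close to the horizontal line at zero.
plot(simModel, which = 1)
```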

Testing assumptions

Normality: For any fixed value of X, Y is normally distributed.

plot(treeModel, which = 2)  # which = 2: normal Q-Q plot of standardized residuals

If the points mostly follow the diagonal line, we can proceed.
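The Q-Q plot can also be built by hand from standardized residuals, and the Shapiro-Wilk test from earlier can be applied to the residuals as an optional formal check. This sketch uses simulated data (an assumption); with large samples the test can flag departures too small to matter, so the plot remains the main tool.

```r
# Simulated linear data (hypothetical; stands in for the tree model).
set.seed(3)
x <- runif(100, min = 5, max = 60)
y <- 2 + 0.4 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

# Essentially the same Q-Q plot as plot(fit, which = 2)
stdRes <- rstandard(fit)
qqnorm(stdRes)
qqline(stdRes)

# Optional formal check (null hypothesis: residuals are normally distributed)
shapiro.test(residuals(fit))
```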

Testing assumptions

Homoscedasticity: The variance of the residuals is the same for any value of X.

plot(treeModel, which = 3)  # which = 3: scale-location plot

If the trend line is mostly flat (the spread of the residuals doesn’t change with the fitted values), we can proceed.
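To see what the scale-location plot is showing, this sketch computes its values by hand on simulated data whose spread deliberately grows with x (an assumption chosen to violate homoscedasticity), so the trend line should slope upward.

```r
# Simulated data whose spread grows with x (deliberately heteroscedastic).
set.seed(5)
x <- runif(100, min = 1, max = 10)
y <- 1 + 2 * x + rnorm(100, sd = 0.5 * x)
fit <- lm(y ~ x)

# Scale-location values: square root of |standardized residuals|
scaleLoc <- sqrt(abs(rstandard(fit)))

# Plotting them against fitted values with a smoother mirrors
# plot(fit, which = 3); an upward trend signals heteroscedasticity.
plot(fitted(fit), scaleLoc,
     xlab = "Fitted values", ylab = "sqrt(|standardized residuals|)")
lines(lowess(fitted(fit), scaleLoc), col = "red")
```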

Final projects

For your final projects, you will submit two documents:

  • A Quarto document demonstrating your process, together with its accompanying data files, as a single zipped folder

  • A poster that you will use to present your project to a non-specialist audience

Posters

  • Make it readable

  • Show, don’t tell

Make it readable

  • Consistent and clean layout; content oriented in a single direction

  • Headings should be clearly readable and convey the idea behind each graphic or text section

  • Graphic elements should be visible from 10 feet away

Make it readable

Wilke, Claus. 2019. Fundamentals of Data Visualization.

Show, don’t tell

  • Fewer words, bigger words

  • Limit use of text (800-1200 words max)

  • Graphics should be largely self-explanatory

  • The poster is meant to come with an interpreter: you!

Activity: Draft a poster

Find a poster template or build your own!

  • You can use your preferred platform, or there is a PowerPoint template on Canvas

If you have any draft graphics, you can also add these in the appropriate places to begin filling out the space

Poster size should be 36 inches tall by 48 inches wide

Next week

  • Monday we will be working on final projects in the lab

  • Tuesday I will be holding final project consultations on Zoom between 1:30 PM and 4:00 PM. A sign-up sheet will be posted shortly.

  • If you need to discuss your project outside this time, please connect with me as soon as possible.