Week 6 Lab

Author

Ben Davies

Published

October 7, 2024

Introduction

So far in this course, we’ve stressed the importance of visualizing data as a way to address the limitations of numerical summaries. At the same time, human brains are wired to find patterns, and can perceive them in visual representations of data even where none actually exist. Because of this, it is important to take a skeptical view of the patterns we see in our data.

Statistical inference is a tool for systematically reasoning about data. In environmental research, statistics are used to understand differences and similarities among populations of organisms, measurements of pollutants in the air or water, or the opinions held by residents of a city. A common form of statistical inference is hypothesis testing: a way of assessing how well the evidence in our data supports a given proposition about the world. The first part of this lab guides you through the process of conducting hypothesis tests in R. Here, we’ll focus on what a hypothesis test looks like and how to select the right test, using some of the datasets you’re already familiar with from this course.

At the end of the day, we want to be able to communicate our efforts to others. One major reason is that we want our work to be replicable, and for others to be able to build upon it. Those others will likely include our future selves. So far we’ve learned to use comments as a way to leave notes about what our code is doing, but these are very limited and do not give us much room to emphasize, enumerate, and represent complex ideas and reasoning. In the second part of this lab, we will be learning how to use code notebooks to document our work; specifically, we’ll be using Quarto documents, which come bundled with RStudio.

Preliminaries

To begin, I recommend, nay, I implore you to set up a folder structure for this lab, including a folder for storing data.
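
One way to create the data folder from the R console is sketched below. The folder name "data" and its location inside the current working directory are assumptions made for illustration; use whatever layout suits your own project.

```r
# Build the path to a "data" folder inside the current working directory.
# The name "data" is only an example, not a requirement of the lab.
data_dir <- file.path(getwd(), "data")

# Create the folder only if it is not already there.
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Confirm that the folder now exists.
dir.exists(data_dir)
```

If you work inside an RStudio project, the working directory is normally the project folder, so the data folder will sit alongside your scripts and notebooks.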

We’ll also be using the tidyverse package, and drawing on data from the modeldata, palmerpenguins, and gapminder packages.
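
A minimal sketch of loading these packages is shown below. It assumes all four are already installed; if one is missing, the commented-out install.packages() line can be run once beforehand.

```r
# Run once if any of the packages are not yet installed:
# install.packages(c("tidyverse", "modeldata", "palmerpenguins", "gapminder"))

# Load the packages for this session.
library(tidyverse)       # data wrangling and plotting tools
library(modeldata)       # example datasets
library(palmerpenguins)  # provides the penguins dataset
library(gapminder)       # provides the gapminder dataset
```

Packages need to be loaded with library() in every new R session, but only installed once per machine.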