Week 10 Lab

Published

November 5, 2023

Introduction

So far in this course, our main analytical tool has been exploratory data analysis, where we use summaries and visualizations to describe and identify patterns in data. We’ve stressed the importance of visualizing data as a way to address the limitations of numerical summaries. At the same time, human brains are wired to find patterns, and can perceive them in visual representations of data even where none actually exist. Because of this, it is important to take a skeptical view of the patterns we see in our data.

Statistical inference is a tool for systematically reasoning about data. Statistics help us to determine whether a regularity or relationship among variables is likely to be meaningful, or whether it may have arisen purely by chance. In environmental research, statistics are used to understand differences and similarities among populations of organisms, measurements of pollutants in the air or water, or the opinions held by residents of a city. Statistics are also part of everyday life, helping us make decisions and navigate the world, whether it is evaluating gender bias in a company’s hiring processes or deciding whether to take a coat with us when we leave the house.

A common form of statistical inference is hypothesis testing: a way of assessing whether the data are consistent with a given proposition about the world. This lab guides you through the process of conducting hypothesis tests in R. This course assumes you’ve already had some experience with statistics, and you are encouraged to use statistical inference and hypothesis testing in your final projects, but how much you include will depend on your comfort with statistical concepts. Here, we’ll focus on what a hypothesis test looks like and how to select the right test, using some of the datasets you’re already familiar with from this course.

There are many packages that can be used for hypothesis testing, but many of the key functions come built into base R. We’ll also be using the tidyverse package, and drawing on data from the modeldata, palmerpenguins, and gapminder packages.
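As a rough sketch of the setup, the chunk below loads the four packages named above and runs a single base R test to show what a call looks like. The commented-out install line, the choice of t.test(), the variables compared, and the object name mass_by_sex are illustrative assumptions, not a required part of the lab.

```r
# One-time setup (assumed; skip if the packages are already installed):
# install.packages(c("tidyverse", "modeldata", "palmerpenguins", "gapminder"))

library(tidyverse)       # data wrangling and plotting
library(modeldata)       # example datasets
library(palmerpenguins)  # penguins data
library(gapminder)       # gapminder data

# t.test() ships with base R (in the stats package), so it needs no extra install.
# Illustrative only: compare mean body mass between female and male penguins.
# Rows with a missing sex or body mass are dropped by the formula interface.
mass_by_sex <- t.test(body_mass_g ~ sex, data = palmerpenguins::penguins)

mass_by_sex$p.value   # p-value of the two-sample (Welch) t-test
```

Printing mass_by_sex on its own shows the full test summary, including the test statistic, degrees of freedom, and confidence interval.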