Exploratory Data Analysis in R

Author

Ben Davies

Published

September 28, 2024

Introduction

Last week we looked at visualizing data, emphasizing how visualization can reveal patterns that are not obvious in raw data. But how do we begin to look at data? What kinds of things should we be looking for? And how does this help us analyze data and tell stories with it?

This week we’ll build on our visualization skills by employing them in Exploratory Data Analysis (EDA). EDA is one of the first steps taken in any data science project. It is used to reveal patterns and problems in data, and it helps to solidify the research questions, analytical methods, and visualization techniques you will use. But EDA is less a formal set of procedures than a dialogue you have with the data; the actual procedures you undertake will depend on what you are looking for and what you find as you go. In this lab, you will learn about:

  • Common data types, their properties, and how they are visualized

  • What kinds of descriptive patterning to look for when evaluating data visually

  • How to manage true and anomalous outliers

For this lab, we’ll be looking at a number of different datasets. You will need to download the Week5Data.zip archive, unzip it, and then save the contents to the appropriate place in your file system. We’ll also be using the following packages, which you may need to install first:

# The datasets package ships with R, so it does not need to be installed.
#install.packages("modeldata")
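
The setup steps above can also be done from within R. The following is only a minimal sketch: the archive location (`zip_path`) and the destination folder (`data_dir`) are hypothetical placeholders, so change them to match wherever you saved the download and wherever you keep your lab files.

```r
# Hypothetical locations: adjust both paths to match your own file system.
zip_path <- "Week5Data.zip"  # assumed to sit in the current working directory
data_dir <- "Week5Data"      # assumed destination folder for the unzipped files

# Unzip only if the archive is present; otherwise report where R looked.
if (file.exists(zip_path)) {
  unzip(zip_path, exdir = data_dir)
  # List what was extracted so you can confirm the files arrived.
  print(list.files(data_dir, recursive = TRUE))
} else {
  message("Could not find ", zip_path, " in ", getwd())
}

# requireNamespace() returns FALSE instead of raising an error
# when a package is missing, so this check is safe to re-run.
if (!requireNamespace("modeldata", quietly = TRUE)) {
  message("modeldata is not installed; run install.packages(\"modeldata\")")
}
```

Checking before installing means the script can be run repeatedly without downloading the package again each time.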