Data Visualization: Glossary

Key Points

  • dplyr is a useful tool for manipulating, summarizing, and analyzing data.

  • ggplot2 is a useful tool for visualizing and analyzing data.

  • A clearly defined research question is essential to a successful analysis and graphical representation of the results.

Data Management in R
  • Use setwd() to set the working directory to the folder that contains your data file before importing it.

  • Import data using read.csv().

  • Familiarize yourself with your data and its structure prior to analysis.
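
A minimal sketch of these steps in base R. The file name example_data.csv and the object names data_dir and my_data are hypothetical. So that the sketch runs anywhere, it first writes a small CSV (the first 20 rows of the built-in iris dataset) to a temporary folder.

```r
# Hypothetical setup: write a small CSV to a temporary folder.
data_dir <- tempdir()
write.csv(head(iris, 20), file.path(data_dir, "example_data.csv"), row.names = FALSE)

# Point the working directory at the folder that holds the data file.
setwd(data_dir)

# Import the file; the path is now relative to the working directory.
my_data <- read.csv("example_data.csv")

# Inspect size, column types, the first rows, and a quick summary before analysis.
dim(my_data)
str(my_data)
head(my_data)
summary(my_data)
```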

Data Structures
  • ‘Variables’ are associated with columns in tidy data.

  • When describing a specific column of data, ‘variables’ tend to hold continuous numeric data, while ‘categories’ hold discrete data organized into groups.

  • ‘Values’ are associated with cells in a data table.

  • ‘Replicates’ are repeated values for the same variable and category; they are most often seen in experimental data.

  • ‘Absolute’ values get their context from their units, while ‘relative’ values are standardized in some way (e.g., as a proportion or per unit) to allow comparison among categories.
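
As a small illustration of absolute versus relative values, the sketch below uses dplyr and the built-in mtcars dataset: count() gives the absolute number of cars in each cylinder category, and dividing by the total turns those counts into proportions that can be compared directly. The object name cyl_counts is illustrative.

```r
library(dplyr)

cyl_counts <- mtcars %>%
  # Absolute values: number of cars (the unit) in each cylinder category.
  count(cyl) %>%
  # Relative values: each count as a proportion of all cars.
  mutate(proportion = n / sum(n))

cyl_counts
```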

dplyr Basics
  • Use dplyr to manipulate, summarize, and analyze your data.
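
A minimal sketch of a typical dplyr pipeline on the built-in iris dataset, chaining filter(), group_by(), summarise(), and arrange(). The filtering threshold and the names of the summary columns are arbitrary choices for illustration.

```r
library(dplyr)

species_summary <- iris %>%
  # Keep only flowers with sepals wider than 2.5 cm (arbitrary threshold).
  filter(Sepal.Width > 2.5) %>%
  # One group per species.
  group_by(Species) %>%
  # Mean sepal length and number of flowers in each group.
  summarise(mean_sepal_length = mean(Sepal.Length), n = n()) %>%
  # Largest mean first.
  arrange(desc(mean_sepal_length))

species_summary
```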

Scientific Questions and Hypotheses
  • Strong research questions and hypotheses direct the goals of data collection and analysis. They can be answered clearly through deliberate investigation.

  • A well-defined research question can be answered with a good chart.

Choosing a Good Chart
  • A good chart directly and clearly addresses a scientific question.

ggplot2 Basics
  • Use ggplot2 to visualize and analyze data.
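
A minimal ggplot2 sketch: ggplot() takes a data frame and an aesthetic mapping from columns to visual properties, and layers such as geom_point() are added with +. The mpg dataset ships with ggplot2; the object name p is illustrative.

```r
library(ggplot2)

# Map engine displacement to x and highway fuel economy to y, then draw points.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

print(p)
```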

Generating a Line Histogram
  • Line histograms represent the distribution of a single variable that has many data points.
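
One way to draw this in ggplot2, assuming a "line histogram" here means a frequency polygon, is geom_freqpoly(): it counts observations in bins like geom_histogram() but connects the counts with a line. The diamonds dataset ships with ggplot2; the bin width of 0.1 carat is an arbitrary choice.

```r
library(ggplot2)

# diamonds has about 54,000 rows, so the binned counts trace a detailed shape.
line_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_freqpoly(binwidth = 0.1) +
  labs(x = "Carat", y = "Number of diamonds")

print(line_hist)
```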

Making Publication Quality Figures
  • Publication quality figures stand alone to address a scientific question.

  • Publication quality figures include clear headings, labels, and symbols.

  • Publication quality figures are often accompanied by a descriptive caption.
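
A sketch of some common ggplot2 finishing steps: labs() sets a title plus axis and legend labels, a theme preset such as theme_classic() removes visual clutter, and ggsave() writes the figure at a fixed size and resolution. The label wording, theme, file name, and 300 dpi setting are illustrative choices, not requirements of any particular journal.

```r
library(ggplot2)

fig <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  # Descriptive title, axis labels with units, and a readable legend title.
  labs(
    title = "Highway fuel economy versus engine displacement",
    x = "Engine displacement (L)",
    y = "Highway fuel economy (miles per gallon)",
    colour = "Vehicle class"
  ) +
  # Plain background and larger base font for print.
  theme_classic(base_size = 14)

# Save at a fixed size (inches) and resolution; the file name is hypothetical.
out_file <- file.path(tempdir(), "fuel_economy.png")
ggsave(out_file, plot = fig, width = 7, height = 5, dpi = 300)
```

A caption describing what the figure shows would usually accompany the saved image in the manuscript itself.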

Bubble Charts
  • Bubble charts represent a relationship among three variables.
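
In ggplot2, a bubble chart can be sketched as a scatter plot with the third variable mapped to point size. This example uses the built-in mtcars dataset: weight on the x axis, fuel economy on the y axis, and horsepower as bubble size. scale_size_area() makes bubble area, rather than radius, proportional to the value; the max_size of 10 is an arbitrary choice.

```r
library(ggplot2)

bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  # Semi-transparent points so overlapping bubbles stay visible.
  geom_point(alpha = 0.5) +
  # Bubble area proportional to horsepower.
  scale_size_area(max_size = 10) +
  labs(x = "Weight (1000 lbs)", y = "Miles per (US) gallon", size = "Gross horsepower")

print(bubble)
```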

Tidy Data Structure
  • Tidy data has one value per cell, with all values of the same variable in a single column.
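
A common way to reach this structure in R is tidyr::pivot_longer(); tidyr is a companion package to dplyr and is not otherwise mentioned in this lesson. The sketch reshapes tidyr's built-in relig_income table, which spreads counts across one column per income bracket, into a tidy table with one row per religion and income bracket. The names income, count, and relig_tidy are chosen for this example.

```r
library(tidyr)

# Wide format: one row per religion, one column per income bracket.
head(relig_income)

# Tidy format: income bracket becomes a variable in its own column,
# and every count sits in a single 'count' column.
relig_tidy <- pivot_longer(
  relig_income,
  cols = -religion,
  names_to = "income",
  values_to = "count"
)

head(relig_tidy)
```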

Faceted Table of Histograms
  • Faceted tables represent a comparison among many items or categories.

  • Multiple good charts can address a single scientific question.
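
A minimal sketch of a faceted set of histograms in ggplot2: facet_wrap() splits one histogram into a panel per category, here one per species in the built-in iris dataset. Stacking the panels in a single column (ncol = 1) keeps the x axes aligned for comparison; the bin width of 0.25 cm is an arbitrary choice.

```r
library(ggplot2)

faceted <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25) +
  # One panel per species, stacked vertically with a shared x axis.
  facet_wrap(~ Species, ncol = 1) +
  labs(x = "Sepal length (cm)", y = "Number of flowers")

print(faceted)
```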

Stacked Area Chart
  • Stacked area charts represent a composition changing over time.

  • After the hard part of thinking through your scientific question and choosing a suitable chart, implementation should be straightforward.
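
A sketch of a stacked area chart in ggplot2 using geom_area() and the built-in WorldPhones matrix, which records the number of telephones (in thousands) in several world regions across a series of years. The matrix is first reshaped into a long data frame with one row per year and region; the column names year, region, and phones are chosen for this example.

```r
library(ggplot2)

# Reshape the year-by-region matrix into one row per year and region.
phones <- as.data.frame(as.table(WorldPhones), responseName = "phones")
names(phones)[1:2] <- c("year", "region")
# Row names of the matrix are years stored as text; convert them to numbers.
phones$year <- as.integer(as.character(phones$year))

# Each region's area is stacked on top of the previous ones.
stacked <- ggplot(phones, aes(x = year, y = phones, fill = region)) +
  geom_area() +
  labs(x = "Year", y = "Telephones (thousands)", fill = "Region")

print(stacked)
```

Replacing geom_area() with geom_area(position = "fill") rescales each year to proportions, turning the absolute counts into relative values.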