Data Analysis II: Data Exploration and Visualization

SE350 Team

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to

  1. maximize insight into a data set;
  2. uncover underlying structure;
  3. extract important variables;
  4. detect outliers and anomalies;
  5. test underlying assumptions;
  6. develop parsimonious models; and
  7. determine optimal factor settings.

    Engineering Statistics Handbook
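
Two of the goals above (maximizing insight and detecting outliers) can be sketched numerically before any plotting. The following is a minimal illustration, not a method prescribed by the Handbook: it assumes the common 1.5 × IQR fence rule for flagging outliers, and the function names and sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    # method="inclusive" treats the data as the full population of interest
    # and interpolates between observed values.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample for illustration only (not data from the course).
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 58]

print(five_number_summary(sample))
print(iqr_outliers(sample))
```

A flagged value is a prompt to look more closely at the data, ideally with a plot, rather than a reason to delete the observation.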

Tufte: "Graphical displays should...

  • show the data
  • induce the viewer to think about the substance rather than…something else
  • avoid distorting what the data have to say
  • present many numbers in a small space
  • make large data sets coherent
  • encourage the eye to compare different pieces of data
  • reveal the data at several levels of detail
  • serve a reasonably clear purpose: description, exploration, tabulation, or decoration
  • be closely integrated with the statistical and verbal descriptions of the data set"

Example: