Introduction to Exploratory Data Analysis (EDA)
Before data analysis begins, a business situation can typically be classified into one of two scenarios:
- Exploratory: The business has collected operational data but has not yet formed a concrete hypothesis. The goal is to explore the data for hidden patterns and insights.
- Hypothesis-driven: The business has a specific hypothesis about its products or services and wants to test it with data.
Statistical techniques are essential in both scenarios. Statistics provides the mathematical foundation for understanding and interpreting data.
"There are very few things which we know, which are not capable of being reduc’d to a Mathematical Reasoning, ... and where a Mathematical Reasoning can be had, it’s as great folly to make use of any other, as to grope for a thing in the dark when you have a Candle standing by you"
— John Arbuthnot (1692)
Statistical methods are broadly divided into two categories:
- Descriptive Statistics: Used to summarize and explore data. When you don't have a predefined hypothesis, you use descriptive statistics (like summaries and charts) to understand the data's main features and uncover potential hypotheses.
- Inferential Statistics: Used to test a specific hypothesis. With inferential statistics, you aim to draw conclusions about a larger population based on a sample of data. This often involves hypothesis tests, confidence intervals, and statistical models, with judgments made from the results.
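To make the distinction concrete, here is a minimal Python sketch using only the standard library. The data (`daily_orders`) and the function names (`describe`, `mean_confidence_interval`) are hypothetical and chosen purely for illustration. The first function summarizes the sample itself, which is the descriptive side. The second estimates a range for the unseen population mean, which is one simple example of inference. It relies on a normal approximation, which is an assumption; for a sample this small a t-distribution would usually be preferred.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: daily order counts over 10 days (made-up data).
daily_orders = [12, 15, 11, 18, 14, 13, 17, 16, 12, 22]


def describe(values):
    """Descriptive statistics: summarize the sample we actually have."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def mean_confidence_interval(values, confidence=0.95):
    """Inferential statistics: estimate the population mean from a sample.

    Assumes a normal approximation for the sampling distribution of the
    mean, which is only a rough guide for small samples.
    """
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / n ** 0.5
    # z-value that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error


summary = describe(daily_orders)
low, high = mean_confidence_interval(daily_orders)

print(summary)
print(f"Approximate 95% interval for the population mean: {low:.2f} to {high:.2f}")
```

The summary describes only these ten days. The interval is a statement about the wider population the days were drawn from, and it is only as trustworthy as the sampling and the approximation behind it.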
This eBook focuses on Descriptive Statistics and its application in Exploratory Data Analysis (EDA). While we will briefly touch on data collection concepts relevant to hypothesis testing, a deep dive into Inferential Statistics is beyond its scope. Our goal is to equip aspiring Data Analysts with the fundamental techniques of EDA.