Topic 5: Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is the practice of running initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions, using summary statistics and graphical representations (Patil, 2018).

EDA is used to analyze large volumes of data (Big Data) and to support decision-making in businesses, government agencies, and international organizations. EDA is commonly divided into three types:

i) Univariate: Analyzes one variable, or data column, at a time.

ii) Multivariate: Analyzes multiple variables at once, exploring the relationships between them.

iii) Bivariate: The most common form of multivariate EDA, which analyzes the relationship between exactly two variables.

Tip: It is usually best to perform a univariate EDA on each variable that will take part in a multivariate EDA before performing the multivariate EDA itself (Seltman, 2018).
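To make the three types concrete, here is a minimal sketch using pandas. The DataFrame, its column names (price, volume, weight), and all values are made up purely for illustration; they do not come from any real dataset. The sketch follows the order suggested in the tip: one variable first, then two, then several.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration only.
df = pd.DataFrame({
    "price": [1.0, 1.5, 2.0, 2.5, 3.0],
    "volume": [500, 400, 300, 200, 100],
    "weight": [10.0, 12.0, 11.0, 15.0, 14.0],
})

# Univariate: summary statistics for a single column.
price_summary = df["price"].describe()

# Bivariate: Pearson correlation between exactly two columns.
price_volume_corr = df["price"].corr(df["volume"])

# Multivariate: pairwise correlations across all numeric columns.
corr_matrix = df.corr()

print(price_summary)
print(price_volume_corr)
print(corr_matrix)
```

Correlation is only one possible way to look at relationships between variables. Plots such as histograms, scatter plots, and pair plots are common alternatives for the same three steps.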

A simple code example of EDA’s first step

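# Libraries commonly used for EDA: pandas (tables), numpy (numerics), seaborn (plots)
import pandas as pd
import numpy as np
import seaborn as sns

# Load the CSV file into a DataFrame (replace the path with the location of your own file)
data = pd.read_csv("C:/Users/User/Desktop/Folder/something.csv")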
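Once the data is loaded, a quick first look usually helps before any deeper analysis. The sketch below is one possible way to do that, not a fixed recipe. So that it runs without a file on disk, it reads a small CSV from an in-memory string; the CSV content and its column names (region, price, volume) are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file; values are invented.
csv_text = """region,price,volume
North,1.2,100
South,,150
East,1.5,120
"""

# read_csv accepts a file-like object as well as a file path.
data = pd.read_csv(io.StringIO(csv_text))

# Size of the table: number of rows and columns.
n_rows, n_cols = data.shape

# Number of missing values in each column.
missing_per_column = data.isna().sum()

print(data.head())         # first rows of the table
print(data.dtypes)         # data type of each column
print(missing_per_column)  # missing values per column
```

With a real file, the same checks apply unchanged after replacing the in-memory string with the path to the CSV.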

Implementing Data Collection: The example of avocado