Univariate and Bivariate Analysis

In the previous article, we explored the consolidation of datasets. Once the datasets are consolidated, the next step is to explore the resulting dataset. Descriptive and inferential statistics, together with visualisation techniques, can be used for this exploration.

The exploration of a dataset can be divided into three types: Univariate Analysis, Bivariate Analysis, and Multivariate Analysis.

Univariate Analysis

In univariate analysis, each variable is analysed individually and we don’t look at more than one variable at a time. It is the simplest and most basic form of analysis. Univariate Analysis can be done for two kinds of variables: Categorical and Numerical.

Categorical Variables

Measures of frequency can be used to analyse a categorical variable. A frequency table counts how often each category of the variable occurs, and pie charts and bar charts can be built from such a table. For example, suppose a dataset has a variable Continent with three continents as its categories. We can count the number of times each category occurs.
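The frequency table described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the sample values and the helper name `frequency_table` are made up for the example and do not come from any particular dataset.

```python
from collections import Counter

# Hypothetical sample of the Continent variable (three categories).
continents = ["Asia", "Europe", "Asia", "Africa", "Europe", "Asia"]


def frequency_table(values):
    """Map each category to (count, relative frequency), most frequent first."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.most_common()}


table = frequency_table(continents)
for category, (count, share) in table.items():
    # One row per category: name, absolute count, share of all observations.
    print(f"{category:<8} {count:>3} {share:6.1%}")
```

The counts in such a table are what a bar chart would plot, and the relative frequencies are what a pie chart would plot.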

Numerical Variables

Descriptive statistics such as measures of Frequency (count), Shape (skewness, kurtosis), Variability (minimum value, maximum value, range, quantiles, variance, standard deviation), and Central Tendency (mean, median, mode) can be used to explore a numerical variable. The main visualisation techniques are the histogram and the box plot.
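As a rough sketch, the descriptive statistics listed above can be computed with Python's standard `statistics` module. The sample values and the helper name `describe` are hypothetical. Skewness and kurtosis are computed here with the simple population (moment-based) formulas, which is one of several conventions; statistical packages may apply different corrections.

```python
import statistics

# Hypothetical numerical variable.
values = [2, 4, 4, 5, 7, 9, 11]


def describe(xs):
    """Return common univariate descriptive statistics for a list of numbers."""
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population standard deviation
    # Population (moment-based) skewness and excess kurtosis.
    skewness = sum((x - mean) ** 3 for x in xs) / n / sd ** 3
    kurtosis = sum((x - mean) ** 4 for x in xs) / n / sd ** 4 - 3
    return {
        "count": n,
        "min": min(xs),
        "max": max(xs),
        "range": max(xs) - min(xs),
        "quartiles": statistics.quantiles(xs, n=4),
        "variance": statistics.pvariance(xs),
        "std_dev": sd,
        "mean": mean,
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


summary = describe(values)
for name, value in summary.items():
    print(f"{name:>9}: {value}")
```

A positive skewness, as in this sample, indicates a longer tail on the right, which is the same pattern a histogram of the values would show.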

Bivariate Analysis

Bivariate analysis examines two variables together in order to explore the relationship or association between them. Various inferential statistics can be used for this purpose. Bivariate analysis is of three types, depending on the variables involved: two numerical variables (Numerical-Numerical), two categorical variables (Categorical-Categorical), and one numerical and one categorical variable (Numerical-Categorical).

Numerical-Numerical

Statistics such as the correlation coefficient can be used to explore the relationship between two numerical variables. Visualisation techniques such as the scatterplot can be used.
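To make the idea concrete, here is a minimal sketch of the Pearson correlation coefficient, one common choice of correlation coefficient, written with the standard library only. The two variables (`hours` and `scores`) and the function name `pearson_r` are hypothetical.

```python
import math

# Hypothetical paired observations of two numerical variables.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(hours, scores)
print(f"Pearson r = {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship. A scatterplot of the same pairs shows whether that relationship is in fact linear.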

Categorical-Categorical

Inferential Statistics such as the Chi-Square test can be used to explore two categorical variables. Visualisation techniques such as Stacked Column Charts can be used.
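The following sketch computes the Pearson chi-square statistic for a contingency table of two categorical variables. The counts and the function name `chi_square` are hypothetical, no continuity correction is applied, and the p-value lookup is left out to keep the example dependency-free.

```python
# Hypothetical 2x2 contingency table: rows are the categories of one
# variable, columns are the categories of the other, cells are counts.
observed = [
    [20, 30],
    [30, 20],
]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_square(observed)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom. With 1 degree of freedom, the critical value at the 5% significance level is about 3.84, so a larger statistic suggests the two variables are not independent.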

Numerical-Categorical

Inferential statistics such as the t-test, z-test, and ANOVA can be used to compare a numerical variable across the categories of a categorical variable. The insights provided by such statistics help us explore the dataset by looking at the various combinations of numerical and categorical variables. Visualisation techniques such as combination charts or line charts with error bars can be used for such analyses.
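As one illustration, the sketch below computes a two-sample t statistic for a numerical variable split by a two-level categorical variable. It uses Welch's form, which does not assume equal variances; this is a choice made for the example, not the only variant of the t-test. The group data and the function name `welch_t` are hypothetical, and the p-value is omitted.

```python
import math
import statistics

# Hypothetical numerical variable, split by a two-level categorical variable.
group_a = [10, 12, 14]
group_b = [20, 22, 24]


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)


t_stat = welch_t(group_a, group_b)
print(f"mean A = {statistics.mean(group_a)}, mean B = {statistics.mean(group_b)}")
print(f"Welch t = {t_stat:.3f}")
```

A t statistic far from zero indicates that the group means differ by much more than their standard errors would suggest. When there are more than two categories, ANOVA plays the same role.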

Multivariate Analysis

Multivariate analysis involves analysing more than two variables simultaneously. For example, analysing four variables at the same time increases the dimensionality of the problem. It is extremely difficult for the human mind to visualise the relationship among four variables (four dimensions) in a single graph. Multivariate analysis, generally performed with specialised statistical software, is therefore used to study complex datasets that cannot be analysed through univariate or bivariate analysis alone.

Types of multivariate analysis include cluster analysis, factor analysis, multiple regression analysis, and principal component analysis, among others.

The statistics covered in the Basic Statistics section can be put to use to explore a dataset. Data exploration is most commonly done through univariate and bivariate analysis. Once the data has been explored and is better understood, one can proceed with the other data preparation and modeling steps.
