// pillar 01 · foundation
The statistical concepts you need to understand before you can responsibly play with data: Descriptive and Inferential Statistics, the building blocks for how any data analysis gets done.
In this section, I'll cover basic statistics, both Descriptive and Inferential, which are the building blocks for understanding how data analysis is performed.
Let me first start from the beginning and explain what data is.
Data, in very simple terms, is information that has been quantified. When this information is processed and stored by a computer, it becomes the "Computer Data" used in Data Analysis. At the most rudimentary level, this data is made up of binary digits: 0 and 1. Today, it takes many forms, such as text documents, images, videos, software, and more.
With the revolution in computing storage and processing power, the amount of data being generated is humongous, and often there is a need to find some method in the madness. That's where Data Analysis kicks in.
Before knowing the type of analysis that can be done on the data, it's imperative to understand the types of data that exist. Broadly, there are two types of data: Qualitative (Categorical) and Quantitative (Numerical).
Qualitative Data, also known as Categorical Data, is generally non-numeric in nature. It describes categories or labels rather than measured amounts, so arithmetic on the values is not meaningful, although the number of observations in each category can be counted. Examples of Qualitative Data include Location, Gender, Colour, and Shape.
Qualitative Data can be of three types: Binary, Nominal, and Ordinal.
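As a rough illustration of how these three types differ in practice, here is a minimal Python sketch. The survey values, the variable names, and the ordering of the satisfaction labels are all assumptions made up for this example; only the standard library is used.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
owns_car = ["yes", "no", "yes", "yes", "no"]                # binary: exactly two categories
colours = ["red", "blue", "red", "green", "red"]            # nominal: no natural order
satisfaction = ["low", "high", "medium", "medium", "high"]  # ordinal: ordered categories

# Binary and nominal data: counting categories and finding the most
# frequent one (the mode) is meaningful; sorting or averaging is not.
car_counts = Counter(owns_car)
colour_counts = Counter(colours)
most_common_colour = colour_counts.most_common(1)[0][0]

# Ordinal data: the categories have an order, so we can sort them and
# pick a middle value. The ranking below is an assumed ordering.
rank = {"low": 0, "medium": 1, "high": 2}
ordered = sorted(satisfaction, key=rank.__getitem__)
median_satisfaction = ordered[len(ordered) // 2]

print(car_counts["yes"], car_counts["no"])
print(most_common_colour)
print(median_satisfaction)
```

The point of the sketch is that the type of the data decides which summaries make sense: a mode works for all three types, while a median needs at least an ordering.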
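Quantitative Data is numerical in nature and, as the name suggests, is the kind of data that can be quantified. By level of measurement, we can divide Quantitative Data into two sub-categories: Interval and Ratio. Both carry information about relative value: in Interval data the differences between values are meaningful but there is no true zero (temperature in degrees Celsius, for example), while Ratio data has a true zero, so ratios between values are meaningful as well (height or weight, for example).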
Quantitative Data can also be categorized into two types: Continuous and Discrete.
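To make these distinctions concrete, here is a small Python sketch. The temperature readings, the sales counts, and the distances are hypothetical values chosen for illustration, and `to_kelvin` is a helper written just for this example. It relies on the fact that degrees Celsius has an arbitrary zero while Kelvin has a true zero.

```python
# Hypothetical temperature readings in degrees Celsius (an interval scale).
morning_c = 10.0
afternoon_c = 20.0

# Differences are meaningful on an interval scale.
difference_c = afternoon_c - morning_c

# Ratios are not: 20 degrees C is not "twice as hot" as 10 degrees C,
# because 0 degrees C is an arbitrary zero point.
naive_ratio = afternoon_c / morning_c


def to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to Kelvin, a scale with a true zero."""
    return celsius + 273.15


# On a ratio scale such as Kelvin, the ratio is meaningful.
kelvin_ratio = to_kelvin(afternoon_c) / to_kelvin(morning_c)

# Discrete data takes countable values; continuous data can take any
# value within a range. Both lists are made-up examples.
cars_sold_per_day = [3, 0, 5, 2]        # discrete counts
distance_km = [12.4, 7.85, 30.02]       # continuous measurements

print(difference_c)
print(naive_ratio)
print(round(kelvin_ratio, 3))
```

The naive Celsius ratio comes out as 2, while the Kelvin ratio is only about 1.035, which is why "twice as hot" is not a safe statement about interval data.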
To understand the type of analysis that can be done on the various kinds of data mentioned above, the prerequisite is to have a brief understanding of what we mean by Statistics, Population, and Sample.
Generally, the first thought that comes to mind when someone says Population is the number of individuals in a country, while Sample means a small part of this population used to represent the entire population. If you have this understanding of Population and Sample, you're not very far from the actual meaning of these terms in Statistics.
Population means every member of a particular group taken as a whole. It can be described as the complete set of individuals or items that are the subject of a statistical observation. It's important to note that a population doesn't have to be large; it can be as small as 2 individuals if they make up the whole of the category in question. For example, imagine a manufacturer produced only 3 units of a special limited-edition car; if we want to find the average distance covered by that model, our population is just those 3 cars and no others.
While Population represents the whole, a Sample is a subset of this Population. Various kinds of analysis are done on sample data, and the values computed from it are known as Statistics. For example, a mean computed from the entire Population is a Parameter, while a mean computed from a Sample is a Statistic.
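The difference between a Parameter and a Statistic can be sketched in a few lines of Python using the limited-edition car example. The distances below are invented for illustration, and the fixed seed is only there to make the random sample repeatable.

```python
import random
import statistics

# Hypothetical distances (in km) covered by the 3 limited-edition cars.
# These 3 cars are the entire population for this question.
population_km = [12000, 15500, 9800]

# A mean computed over the whole population is a parameter.
population_mean = statistics.mean(population_km)

# A mean computed over a subset of the population is a statistic.
# Here we draw a simple random sample of 2 cars without replacement.
rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability
sample_km = rng.sample(population_km, 2)
sample_mean = statistics.mean(sample_km)

print(f"Parameter (population mean): {population_mean:.2f} km")
print(f"Statistic (sample mean):     {sample_mean:.2f} km")
```

The sample mean will usually differ from the population mean, and it changes depending on which cars happen to be drawn. That gap is what Inferential Statistics tries to reason about.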
As it's often very difficult to measure every individual in a population, various methods for selecting samples are used, such as:
With this understanding in place, we're ready to dive into the world of statistics, where I'll discuss the basic statistics required to perform any advanced analysis on data.

In this section, I discuss the equations, distributions, and other tools involved in calculating the various statistics we use to analyze data. From calculating a simple arithmetic mean to comparing the means of two data sets, this section answers how certain calculations are performed, what inferences we can draw from them, and how those inferences feed into more sophisticated kinds of data analysis. We can simply write one line of code and memorize shortcuts for what a given value of a statistic indicates, but to understand deeply what we are doing and, most importantly, why we are doing it, we need to understand the theory behind it.

In this age of computers, it becomes imperative to apply formal knowledge through machines, since they produce results faster and more reliably than manual calculation. With an understanding of the theory behind the statistics, we can use computers to their full potential. The same basic statistics can be applied to very large datasets whose calculations would be highly complex and time-consuming to perform by hand. It is important to balance knowledge of what happens behind the scenes with knowledge of how to apply it. Various software can be used; in this section, I discuss code in languages like R and Python, through which the basic statistics covered in the theory section can be applied to very large datasets, often with a single line of code.
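As a taste of what such one-liners can look like in Python, here is a minimal sketch using only the standard library `statistics` module. The data values are made up for illustration, and this is just one possible choice of library rather than the tooling used elsewhere in this series.

```python
import statistics

# Hypothetical observations, invented for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Each descriptive statistic is a single function call.
mean_value = statistics.mean(values)          # arithmetic mean
median_value = statistics.median(values)      # middle value
mode_value = statistics.mode(values)          # most frequent value
population_sd = statistics.pstdev(values)     # treats values as the whole population
sample_sd = statistics.stdev(values)          # treats values as a sample (n - 1 divisor)

print(mean_value, median_value, mode_value)
print(population_sd, round(sample_sd, 3))
```

Note the two standard deviation functions: which one applies depends on whether the data is the whole Population or only a Sample, which is exactly the kind of decision the theory is meant to inform.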