Other Derived Variables
New variables can be created in many ways. One of them is to create derived variables, that is, new variables computed from existing ones using some method. We have already discussed how Encoding and Binning can be used to create new variables, and how Feature Transformation also, in a way, leads to the construction of new features. However, there are many other ways to create new variables, and some of them are discussed below.
Feature Crosses
This is a method of feature creation in which a new categorical variable is created by combining two existing categorical variables. Each category of the new variable corresponds to one combination of categories from the two original variables, which is why the result is called a cross product. There is no fixed rule for choosing which two variables to combine; the choice largely depends on the business problem at hand. Such combinations are useful because some variables are not very informative on their own but become much more informative when combined with another categorical variable, and the resulting combinations can then be used to draw better inferences. Crossed features are also useful to linear classifiers, which cannot learn interactions between features on their own.
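To see why the result is called a cross product, here is a minimal plain-Python sketch using two hypothetical categorical columns (`colour` and `size`, invented for illustration). It builds the crossed feature and checks that, for every pair of categories, the one-hot indicator of the crossed feature equals the element-wise product of the two individual indicators. A linear model that sees only the individual one-hot columns adds their effects together and has no way to represent that product term, which is why explicitly crossed features help linear classifiers.

```python
from itertools import product

# Two hypothetical categorical columns for four records (illustrative only).
colour = ["red", "red", "blue", "blue"]
size = ["small", "large", "small", "large"]

def one_hot(values, category):
    """Return a 0/1 indicator list: 1 where the value equals `category`."""
    return [1 if v == category else 0 for v in values]

# The crossed feature joins both values of a record into one categorical label.
crossed = [f"{c}_{s}" for c, s in zip(colour, size)]

# For every pair of categories, the crossed feature's one-hot column equals
# the element-wise product of the two individual one-hot columns.
for c, s in product(sorted(set(colour)), sorted(set(size))):
    product_col = [a * b for a, b in zip(one_hot(colour, c), one_hot(size, s))]
    assert one_hot(crossed, f"{c}_{s}") == product_col

print(crossed)  # ['red_small', 'red_large', 'blue_small', 'blue_large']
```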
For example, suppose we have a dataset with many independent variables and one dependent variable, income. Among the independent variables are age and academic qualification.

Here, as discussed in the blog post on Binning, we can convert the Age variable (which is numeric) into a categorical variable by binning.
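As a minimal sketch of this binning step, the plain-Python function below maps numeric ages to the three age groups used later in this example (20-29, 30-49, 50-65). The bin edges are an assumption chosen to match those groups. With pandas, `pandas.cut(ages, bins=[20, 30, 50, 66], right=False, labels=[...])` would produce the same grouping.

```python
from bisect import bisect_right

# Assumed bin edges: each group is [lower, next lower), so 66 closes the last one.
EDGES = [20, 30, 50, 66]
LABELS = ["20-29", "30-49", "50-65"]

def age_group(age):
    """Map a numeric age to its bin label; ages outside 20-65 are rejected."""
    if not EDGES[0] <= age < EDGES[-1]:
        raise ValueError(f"age {age} is outside the binned range")
    # bisect_right finds how many edges are <= age; subtract 1 for the bin index.
    return LABELS[bisect_right(EDGES, age) - 1]

ages = [23, 29, 30, 47, 50, 65]
print([age_group(a) for a in ages])
# ['20-29', '20-29', '30-49', '30-49', '50-65', '50-65']
```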

Once we have converted the numerical variable into a categorical variable, we can use feature crossing to combine Age and Qualification, and the crossed feature can help us predict Income.
In our example, with three age groups and three qualifications, the crossed feature has 3 × 3 = 9 categories: 20-29 - Under Graduate, 30-49 - Under Graduate, 50-65 - Under Graduate, 20-29 - Post Graduate, 30-49 - Post Graduate, 50-65 - Post Graduate, 20-29 - PhD, 30-49 - PhD, and 50-65 - PhD.
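The nine combinations can be enumerated mechanically. In this sketch, `cross` is a hypothetical helper that joins one record's age group and qualification into a single label, and `itertools.product` generates every pairing in the same order as the list above.

```python
from itertools import product

AGE_GROUPS = ["20-29", "30-49", "50-65"]
QUALIFICATIONS = ["Under Graduate", "Post Graduate", "PhD"]

def cross(age_group, qualification):
    """Hypothetical helper: combine two categorical values into one crossed label."""
    return f"{age_group} - {qualification}"

# Qualification is the outer loop and age group the inner loop,
# giving all 3 x 3 = 9 categories of the crossed feature.
crosses = [cross(a, q) for q, a in product(QUALIFICATIONS, AGE_GROUPS)]

print(len(crosses))           # 9
print(cross("30-49", "PhD"))  # 30-49 - PhD
```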

We can then, for example, find the average income of each of these groups, which gives more useful insights into the data than either variable does alone. In this example, the group averages show that a medium to high qualification combined with a young age is associated with a high income. This is how cross products can be of a lot of use to us.
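A minimal sketch of computing the average income per crossed group is shown below. The records are made-up toy values for illustration, not data from this post, so the printed averages are not results. With pandas, a comparable step would be `df.groupby(["age_group", "qualification"])["income"].mean()`, where the column names are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (age group, qualification, income). Not real data.
records = [
    ("20-29", "PhD", 70_000),
    ("20-29", "PhD", 80_000),
    ("20-29", "Post Graduate", 60_000),
    ("50-65", "Under Graduate", 40_000),
    ("50-65", "Under Graduate", 50_000),
]

# Group incomes by the crossed label.
incomes_by_cross = defaultdict(list)
for age_group, qualification, income in records:
    incomes_by_cross[f"{age_group} - {qualification}"].append(income)

# Average each group and print the groups in sorted order.
average_income = {label: mean(vals) for label, vals in incomes_by_cross.items()}
for label, avg in sorted(average_income.items()):
    print(f"{label}: {avg:,.0f}")
```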
Creating Variables by Changing Units of Measurement
Suppose that in a dataset all the numerical features record the weights of different fruits in kilograms, except for a few fruits whose weights are recorded in grams. This mismatch in units can cause problems when we visualize these variables and can make various kinds of statistical analysis difficult. The earlier blog post on Feature Scaling mentioned that various scaling methods can be used to transform features, but we can also construct a separate feature by changing a feature's unit of measurement. In this case, since most of the features are in kilograms, we can convert each feature that is in grams into kilograms and use the newly constructed feature in the dataset for further analysis.
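As a minimal sketch, assuming a hypothetical dataset stored as a dictionary of columns (one column per fruit, with the unit in the column name), the gram-valued column is converted into a new kilogram column. The original column is kept here; whether to drop it afterwards is a separate choice.

```python
# Hypothetical dataset: one numeric column per fruit, most recorded in kilograms.
data = {
    "apple_kg": [0.18, 0.21, 0.19],
    "mango_kg": [0.35, 0.40, 0.33],
    "cherry_g": [8.0, 9.5, 7.0],  # the odd one out, recorded in grams
}

GRAMS_PER_KG = 1000

def grams_to_kg(values):
    """Convert a list of weights from grams to kilograms."""
    return [v / GRAMS_PER_KG for v in values]

# Construct a new kilogram feature so every weight column shares one unit.
data["cherry_kg"] = grams_to_kg(data["cherry_g"])
print(data["cherry_kg"])
```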
Key Performance Indicators (KPI)
A KPI is an indicator that helps us understand how an organization is performing in various areas. The term is mainly used in management, but it is worth knowing a little about KPIs, because constructing them involves creating new variables from existing data. These variables can help us draw inferences about the current state of an organization and help managers and other leaders see the gap between current performance and their business objectives.
For example, consider a dataset that contains the demographic details of individuals along with their outstanding debt and income. When deciding eligibility for a loan, we might be inclined to choose the person with the higher income. However, we should first construct KPIs as new features, for instance by considering both the income and the outstanding debt of a person to estimate their current repaying capacity. Such a feature helps the people concerned decide eligibility and calculate the MCL (Maximum Credit Limit). Features created in this way can be categorized as KPIs, and knowing the various KPIs in a domain can be very useful.
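The post does not specify how repaying capacity is computed, so the sketch below uses one common choice, the debt-to-income ratio (outstanding debt divided by income), as an assumption. The applicants, field names, and values are hypothetical. The MCL calculation is not shown because its rules vary by lender. The example illustrates the point above: applicant B has the higher income but also the heavier debt burden relative to it.

```python
# Hypothetical applicants; field names and values are illustrative only.
applicants = [
    {"name": "A", "annual_income": 60_000, "outstanding_debt": 30_000},
    {"name": "B", "annual_income": 90_000, "outstanding_debt": 81_000},
]

def debt_to_income(applicant):
    """Assumed KPI: outstanding debt as a fraction of annual income (lower is better)."""
    return applicant["outstanding_debt"] / applicant["annual_income"]

# Construct the KPI as a new feature on each record.
for a in applicants:
    a["debt_to_income"] = debt_to_income(a)
    print(a["name"], round(a["debt_to_income"], 2))
```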
Using various methods, we can create new features from existing ones that provide more information and give us better insight into the data we are dealing with. These methods of construction can help in the modeling process and can lead to a better interpretation of results.