Feature Scaling

The predictors in a dataset often have very different magnitudes. For example, in a ‘user’ dataset the ‘age’ feature will have positive values, normally in single or double digits, whereas a salary column can easily run into five or six figures. We will discuss some techniques to rescale the variables so that all features end up with the same or a similar magnitude. This process is called Feature Scaling.

Standardisation

The values are converted to their z-score equivalent. To learn more about z-scores, check out the article on outliers.
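
In the same notation used for the other techniques below, the z-score is the value centred at the mean and scaled by the standard deviation:

scaled_data = (X - mean) / std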

# Using Scikit-learn

from sklearn.preprocessing import StandardScaler

>>> data = [[10],[20],[15],[25]]

>>> scaler = StandardScaler()
>>> scaler.fit(data)
StandardScaler()

>>> scaler.mean_
array([17.5])
>>> scaler.var_
array([31.25])

>>> scaler.transform(data)
array([[-1.34164079],
       [ 0.4472136 ],
       [-0.4472136 ],
       [ 1.34164079]])

You can also pass a DataFrame to the scaler, but it will return a NumPy array, which you will have to convert back into a DataFrame using the pd.DataFrame() function. Make sure you first split your data and pass only the training set to the scaler for learning; the same fitted scaler is then used to transform both the training and test sets.
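
A minimal sketch of that workflow (the DataFrame and its single ‘age’ column are made up for illustration):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical DataFrame with one illustrative 'age' column
X = pd.DataFrame({'age': [10, 20, 15, 25, 30, 35]})
X_train, X_test = train_test_split(X, test_size=0.33, random_state=0)

scaler = StandardScaler()
scaler.fit(X_train)   # learn the mean and variance from the training set only

# transform() returns NumPy arrays, so wrap them back into DataFrames
X_train_scaled = pd.DataFrame(scaler.transform(X_train),
                              columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             columns=X_test.columns, index=X_test.index)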

# Using Scipy

>>> from scipy.stats import zscore

>>> zscore(data)
array([[-1.34164079],
       [ 0.4472136 ],
       [-0.4472136 ],
       [ 1.34164079]])

The final output has a mean of 0 and a variance of 1. This technique also preserves the shape of the original distribution, including any outliers.
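
A quick check on the array returned by the StandardScaler fitted above confirms this:

>>> scaled = scaler.transform(data)
>>> scaled.mean(axis=0)
array([0.])
>>> scaled.var(axis=0)
array([1.])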

Mean Normalisation

Like standardisation, the data is centred at the mean, but here it will lie roughly between -1 and +1. The scaling is based on the range of the data, where the range is the difference between the maximum and minimum values.

scaled_data = (X - mean) / (max - min)

# X_train and X_test come from a prior train/test split
# (NumPy arrays or pandas DataFrames)
data_mean = X_train.mean(axis=0)                        # per-feature mean of the training set
data_range = X_train.max(axis=0) - X_train.min(axis=0)  # per-feature range (max - min)

X_train_new = (X_train - data_mean) / data_range
X_test_new = (X_test - data_mean) / data_range           # reuse the training statistics
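
Applied to the same toy values used in the earlier examples, where the mean is 17.5 and the range is 25 - 10 = 15, the scaled values stay within -1 and +1:

>>> import numpy as np
>>> X_train = np.array([[10], [20], [15], [25]], dtype=float)

>>> data_mean = X_train.mean(axis=0)                        # array([17.5])
>>> data_range = X_train.max(axis=0) - X_train.min(axis=0)  # array([15.])

>>> (X_train - data_mean) / data_range
array([[-0.5       ],
       [ 0.16666667],
       [-0.16666667],
       [ 0.5       ]])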

MinMaxScaler

It is similar to Mean Normalisation, but instead of subtracting the mean we subtract the minimum value, as shown below.

scaled_data = (X - min) / (max - min)

The values will range from 0 to 1, so with this technique the magnitudes of all features are brought onto the same scale.

# Using Scikit-learn

from sklearn.preprocessing import MinMaxScaler

>>> data = [[10],[20],[15],[25]]

>>> scaler = MinMaxScaler()
>>> scaler.fit(data)
MinMaxScaler()

>>> scaler.transform(data)
array([[0.        ],
       [0.66666667],
       [0.33333333],
       [1.        ]])
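
The fitted scaler keeps the minimum and maximum it learned, and a new value outside that range is simply mapped outside [0, 1]:

>>> scaler.data_min_, scaler.data_max_
(array([10.]), array([25.]))

>>> scaler.transform([[30]])
array([[1.33333333]])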

MaxAbsScaler

The values are scaled by the maximum of the absolute values (not the absolute value of the maximum). So if the data is [10, 20, 15, -25], then -25 will be used for scaling, as its absolute value, 25, is the highest.

scaled_data = X / max(abs(X))

>>> from sklearn.preprocessing import MaxAbsScaler
>>> data = [[10],[20],[15],[-25]]

>>> scaler = MaxAbsScaler()
>>> scaler.fit(data)
MaxAbsScaler()

>>> scaler.transform(data)
array([[ 0.4],
       [ 0.8],
       [ 0.6],
       [-1. ]])
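
The learned statistic is also stored on the fitted scaler:

>>> scaler.max_abs_
array([25.])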

RobustScaler

The values are centred on the median and scaled by the interquartile range (IQR).

scaled_data = (X - median) / IQR

>>> from sklearn.preprocessing import RobustScaler
>>> scaler = RobustScaler()

>>> data = [[10],[20],[15],[25]]

>>> scaler.fit(data)
RobustScaler()

>>> scaler.transform(data)
array([[-1.        ],
       [ 0.33333333],
       [-0.33333333],
       [ 1.        ]])
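
The fitted attributes confirm the median and the IQR (by default the range between the 25th and 75th percentiles) used above:

>>> scaler.center_
array([17.5])
>>> scaler.scale_
array([7.5])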

Wrap Up

We have seen some of the popular techniques for feature scaling. Before scaling, keep in mind that it is not mandatory in every case. In particular, tree-based algorithms do not require feature scaling. On the other hand, algorithms that rely on distance or gradient calculations, such as regression or clustering, should use feature scaling, as otherwise the features with very high magnitudes will dominate; a minimal sketch of this follows below. Also check the impact of outliers before choosing a scaler.
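
As a rough sketch of that last point, scaling is often chained with a distance-based estimator such as k-nearest neighbours; the data, labels and model choice below are made up purely for illustration:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: [age, salary] rows with made-up class labels
X = [[25, 20000], [30, 60000], [45, 120000], [50, 150000]]
y = [0, 0, 1, 1]

# The scaler and the classifier are fitted together on the same data,
# so salary does not dominate the distance calculation
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1))
model.fit(X, y)
model.predict([[28, 55000]])   # predicts class 0 for this made-up point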

Refer to the official documentation of Scikit-learn for more information.