Handling Outliers

Outliers are values that are extremely different from the other values in a dataset. Working with outliers means answering two questions: first, how do we define an outlier, and second, how do we handle it? Let’s look at the two questions separately. Outlier Identification: Before handling the outliers, it is important to first establish which data points […]
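
The excerpt stops before naming a specific rule, so the sketch below assumes one common choice: the interquartile-range (IQR) rule with the conventional multiplier k = 1.5 for identification, and capping values at the fences for handling. The helper names `iqr_bounds` and `cap_outliers` are hypothetical, not taken from the article.

```python
import numpy as np


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences using the IQR rule (assumed k=1.5)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cap_outliers(values, k=1.5):
    """Handle outliers by clipping them to the IQR fences (one option among several)."""
    values = np.asarray(values, dtype=float)
    lower, upper = iqr_bounds(values, k)
    return np.clip(values, lower, upper)


data = np.array([10, 12, 11, 13, 12, 11, 95])
lower, upper = iqr_bounds(data)

# Identification: points outside the fences are flagged as outliers
outliers = data[(data < lower) | (data > upper)]
print(outliers)  # [95]

# Handling: replace extreme values with the nearest fence
print(cap_outliers(data))
```

Capping keeps every row, which suits small datasets; dropping the flagged rows instead is equally simple but discards data.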

Encoding Categorical Variables

Most machine learning models cannot train directly on categorical variables, so these need to be encoded into a numerical format. In this article we will discuss different encoding techniques. One Hot Encoding: In this technique we replace each categorical variable with multiple dummy variables, where the number of new variables depends on the cardinality of the categorical variable. The dummy variables have binary values where […]
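
As a minimal sketch of one hot encoding, assuming pandas is available, `pd.get_dummies` turns one categorical column into one binary column per distinct value. The `colour` column and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical categorical column with cardinality 3 (blue, green, red)
df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# One dummy column per category; each row has exactly one 1
dummies = pd.get_dummies(df["colour"], prefix="colour", dtype=int)
print(dummies)
```

Because the column has three distinct values, three dummy columns are produced, matching the cardinality rule described above.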

Missing Data Imputation

One of the most common issues faced during feature engineering is handling missing data. It needs to be addressed because many estimators in machine learning libraries such as Scikit-learn raise an error when they encounter missing values. Before we look at the various ways to handle missing data, we first need to analyse its causes and patterns. There can be several causes, ranging […]
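
The excerpt is cut off before it lists any methods, so the sketch below is only one common option, not necessarily what the article covers: first measure the fraction of missing values per column, then fill gaps with the column mean using scikit-learn's `SimpleImputer`. The `age` and `income` data are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [50.0, 60.0, np.nan, 80.0],
})

# Analyse the pattern first: fraction of missing values in each column
print(df.isnull().mean())

# Mean imputation: each NaN is replaced by its column's mean
imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(filled)
```

Mean imputation is simple but shrinks the variance of the filled column, so it is usually reserved for cases where only a small share of values is missing.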

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a technique for dimensionality reduction, which means reducing the number of features or dimensions while retaining as much of the variance of the original data set as possible. The new principal components are ordered by decreasing variance, so the first component has the maximum variance. Data: We will use the same Breast Cancer data we used in Support Vector Machines […]
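
A minimal sketch, assuming the Breast Cancer data is the copy bundled with scikit-learn (`load_breast_cancer`) and that features are standardised first, since PCA is sensitive to feature scale. Keeping two components is an arbitrary choice for illustration; the article's own settings may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 569 samples with 30 numeric features
X, y = load_breast_cancer(return_X_y=True)

# Standardise so no feature dominates the variance because of its units
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)  # (569, 30) -> (569, 2)

# Share of total variance explained by each component, in descending order
print(pca.explained_variance_ratio_)
```

Inspecting `explained_variance_ratio_` is a practical way to decide how many components to keep.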

What is Machine Learning?

Machine learning is commonly defined as a field of computer science that gives machines the ability to learn without being explicitly programmed. Although this statement is correct, it may not give a clear explanation to someone new to the field. Let’s first understand what we mean by the term ‘ability to learn’. In the context of machine learning, it can be considered a process of applying […]