Column Transformer for Faster Feature Engineering in Machine Learning

Data pre-processing techniques

Amit Chauhan


Photo by Caspar Camille Rubin on Unsplash

Data is essential for predictive modeling, but raw data usually has to go through several pre-processing steps before it is fed into a machine learning model. Feature engineering is a vital part of data pre-processing: it covers handling missing values, scaling numerical data, encoding string categories as numbers, and other techniques.
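To make these three techniques concrete, here is a minimal sketch that applies each one on its own to a small, hypothetical DataFrame. The column names `age` and `city` and their values are invented for illustration. `SimpleImputer` fills the gap, `StandardScaler` rescales the numbers, and `OneHotEncoder` turns the city strings into numeric indicator columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: a numeric column with a gap and a string category
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 45.0],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune"],
})

# 1. Missing values: replace NaN with the column mean, (25 + 35 + 45) / 3 = 35
age_filled = SimpleImputer(strategy="mean").fit_transform(df[["age"]])

# 2. Scaling: rescale the filled column to zero mean and unit variance
age_scaled = StandardScaler().fit_transform(age_filled)

# 3. Encoding: one binary column per city; .toarray() converts the sparse output
city_encoded = OneHotEncoder().fit_transform(df[["city"]]).toarray()

print(age_filled.ravel())  # [25. 35. 35. 45.]
print(city_encoded)
```

Each call works on a single column and returns its own NumPy array, which is exactly where the bookkeeping problem described next comes from.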

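Each column in a dataset can have its own problem, such as missing values in one column and text categories in another, and each problem calls for a different pre-processing technique. The main difficulty in data pre-processing is applying the right technique to each column.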

Suppose we need to apply one-hot encoding, imputation, ordinal encoding, and other transformations. Done separately, each process produces its own array for the columns it handles, and all these arrays then have to be concatenated into one big array. This approach is neither efficient nor convenient, because every step has to be written, tracked, and joined together by hand.
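A minimal sketch of this manual approach, assuming a hypothetical DataFrame whose columns `age`, `gender`, and `size` each need a different treatment (the data and the small < medium < large ordering are invented for illustration). Every transformer is fitted on its own column, and the resulting arrays are stitched together with `np.concatenate`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical data: each column needs a different treatment
df = pd.DataFrame({
    "age": [22.0, np.nan, 40.0],             # missing value -> impute
    "gender": ["Male", "Female", "Female"],  # nominal -> one-hot encode
    "size": ["small", "large", "medium"],    # ordered -> ordinal encode
})

# Each transformer is fitted on its own column and returns its own array
age_arr = SimpleImputer(strategy="mean").fit_transform(df[["age"]])
gender_arr = OneHotEncoder().fit_transform(df[["gender"]]).toarray()
size_arr = OrdinalEncoder(
    categories=[["small", "medium", "large"]]  # explicit order: 0, 1, 2
).fit_transform(df[["size"]])

# The separate arrays must then be joined by hand, column-wise
X = np.concatenate([age_arr, gender_arr, size_arr], axis=1)
print(X.shape)  # (3, 4)
```

Nothing here is wrong, but three fitted objects and the column order of `X` must all be remembered and reapplied in the same way to any new data.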

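The column transformer (scikit-learn's ColumnTransformer, in the sklearn.compose module) handles all these problem columns in a single process: it applies a different transformer to each specified column or group of columns and joins the results into one array.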
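As a sketch of the idea, the same hypothetical three-column DataFrame can be processed with one `ColumnTransformer`. Each entry in `transformers` is a `(name, transformer, columns)` tuple; the names used here are arbitrary labels chosen for illustration. `sparse_threshold=0` is set so the result is always a dense NumPy array, even though `OneHotEncoder` produces sparse output by default.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical data: each column needs a different treatment
df = pd.DataFrame({
    "age": [22.0, np.nan, 40.0],
    "gender": ["Male", "Female", "Female"],
    "size": ["small", "large", "medium"],
})

# Each entry: (label, transformer, list of columns it applies to)
ct = ColumnTransformer(
    transformers=[
        ("impute_age", SimpleImputer(strategy="mean"), ["age"]),
        ("onehot_gender", OneHotEncoder(), ["gender"]),
        ("ordinal_size",
         OrdinalEncoder(categories=[["small", "medium", "large"]]),
         ["size"]),
    ],
    remainder="drop",   # columns not listed above are dropped (the default)
    sparse_threshold=0, # always return a dense array
)

# One call fits every transformer and concatenates the outputs in listed order
X = ct.fit_transform(df)
print(X.shape)  # (3, 4)

# The fitted object reapplies the learned mean and category mappings to new rows
new_df = pd.DataFrame({"age": [np.nan], "gender": ["Male"], "size": ["medium"]})
print(ct.transform(new_df))  # missing age filled with the training mean, 31.0
```

The output matches the manually concatenated array, but the column bookkeeping now lives inside a single object that can be reused with `transform` or placed inside a scikit-learn `Pipeline`.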

Python Example:

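import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer         # fills missing values
from sklearn.preprocessing import OneHotEncoder  # nominal categories -> binary columns
from sklearn.preprocessing import OrdinalEncoder # ordered categories -> integers
# Load the demo dataset (demo.csv is expected in the working directory)
df1 = pd.read_csv('demo.csv')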