Fully Explained Voting Ensemble Technique in Machine Learning

Ensemble method for machine learning and data science

Amit Chauhan


Photo by ThisisEngineering RAEng on Unsplash

Ensemble learning in machine learning combines multiple models, often called weak learners, into a single strong predictive model (a strong learner). The base models can be different algorithms or different versions of the same algorithm.

In general, the main types of ensemble methods are:

  1. Bagging
  2. Boosting
  3. Voting
  4. Stacking

Of these methods, this article covers the voting ensemble.

Voting Ensemble

In this technique, several different machine learning models are trained on the same dataset, and their individual predictions are combined into one final classification or regression prediction.
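A minimal sketch of this idea with scikit-learn's VotingClassifier follows. It assumes scikit-learn's bundled copy of the Iris data (load_iris) in place of a CSV file, and the choice of the three base models (LogisticRegression, RandomForestClassifier, GaussianNB) and the train/test split settings are illustrative assumptions, not taken from the article. Each base model and each voting ensemble is scored on the same held-out test set.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: scikit-learn's bundled Iris data stands in for a local CSV file.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Three different algorithms (illustrative choice), all trained on the same data.
base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("nb", GaussianNB()),
]

# Accuracy of each base model on its own.
scores = {}
for name, model in base_models:
    scores[name] = model.fit(X_train, y_train).score(X_test, y_test)

# Accuracy of the combined model with both voting strategies.
# VotingClassifier clones the estimators, so the fits above do not leak in.
for voting in ("hard", "soft"):
    ensemble = VotingClassifier(estimators=base_models, voting=voting)
    ensemble.fit(X_train, y_train)
    scores[f"voting_{voting}"] = ensemble.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name:>12}: {acc:.3f}")
```

Soft voting needs every base model to implement predict_proba, which is why models such as an SVC without probability estimates are left out of this sketch.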

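Assumptions behind the voting technique:

  1. The base models should be different (for example, different algorithms), so that they do not all make the same mistakes.
  2. Each model should perform better than random guessing (for a binary problem, accuracy greater than 50%). The ensemble's final accuracy depends on the predictions (or, for soft voting, the predicted probabilities) of each base model.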
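The reason for the "better than random" requirement can be seen with a small calculation. The sketch below assumes an idealized binary problem in which each model is correct with the same probability p and the models' errors are fully independent (real models are usually correlated, so this is an upper bound on the benefit). Under those assumptions, the chance that a strict majority of n models is correct follows a binomial sum; the helper name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n_models: int) -> float:
    """Probability that a strict majority of n_models independent binary
    classifiers, each correct with probability p, gives the right answer.
    Assumes n_models is odd so that ties cannot occur."""
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Compare a single model's accuracy p with a vote of 3 and of 11 such models.
for p in (0.4, 0.6, 0.7):
    print(p, round(majority_vote_accuracy(p, 3), 3), round(majority_vote_accuracy(p, 11), 3))
```

With p = 0.7, three voters are jointly correct with probability 0.784, better than any single model; with p = 0.4, the vote drops to 0.352, worse than a single model. Above 50% the vote helps and below 50% it hurts.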

Because several base models contribute to every prediction, a poor prediction from one model can be outvoted or offset by the better-performing models.

The way the predictions are combined depends on the task:

For Classification:

  1. Soft voting: the predicted class probabilities of all base models are averaged for each class, and the class with the highest
    average probability is chosen.
  2. Hard voting: each base model predicts a class label, and the class predicted by the majority of models is chosen.
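The two rules can disagree on the same inputs. The NumPy sketch below uses hypothetical predicted probabilities from three models for a single sample with two classes: two models lean weakly toward class 1, while one model is very confident in class 0.

```python
import numpy as np

# Hypothetical predicted probabilities for ONE sample from three base models.
# Rows = models, columns = classes (class 0, class 1).
probas = np.array([
    [0.45, 0.55],  # model A leans weakly toward class 1
    [0.40, 0.60],  # model B leans toward class 1
    [0.90, 0.10],  # model C is very confident in class 0
])

# Hard voting: each model votes for its most probable class,
# and the most frequent label wins.
votes = probas.argmax(axis=1)  # one label per model: [1, 1, 0]
hard_prediction = np.bincount(votes, minlength=probas.shape[1]).argmax()

# Soft voting: average each class's probability across models,
# then pick the class with the highest average.
avg_proba = probas.mean(axis=0)  # roughly [0.583, 0.417]
soft_prediction = avg_proba.argmax()

print("hard voting ->", hard_prediction)  # prints: hard voting -> 1
print("soft voting ->", soft_prediction)  # prints: soft voting -> 0
```

Hard voting counts only the labels, so the two weak votes for class 1 win. Soft voting also uses how confident each model is, so model C's strong preference for class 0 outweighs the other two.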

For Regression:

  1. The prediction output is the average of the predictions of all base models.
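For regression, scikit-learn provides VotingRegressor. The sketch below assumes synthetic data from make_regression and an illustrative choice of three base regressors. It checks that, with no weights set, the ensemble's prediction equals the plain mean of the fitted base models' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Three different regressors (illustrative choice), all fitted on the same data.
reg = VotingRegressor(estimators=[
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
])
reg.fit(X, y)

# Ensemble prediction vs. the manual average of the fitted base models.
ensemble_pred = reg.predict(X[:5])
manual_mean = np.mean([est.predict(X[:5]) for est in reg.estimators_], axis=0)
print(np.allclose(ensemble_pred, manual_mean))  # prints: True
```

VotingRegressor also accepts a weights argument, in which case the average becomes a weighted average of the base predictions.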

Soft and Hard Voting Classifier Examples

import numpy as np  # numerical arrays
import pandas as pd  # data loading and manipulation (e.g. pd.read_csv)
from sklearn.preprocessing import LabelEncoder

# load the Iris dataset from a local CSV file
df = pd.read_csv('iris.csv')

# encode the class names in the target column 'variety' as integer labels
encoder = LabelEncoder()
df['variety'] = encoder.fit_transform(df['variety'])
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import…