UNIT IV ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING 9
Combining multiple learners: Model combination schemes, Voting, Ensemble Learning - bagging,
boosting, stacking, Unsupervised learning: K-means, Instance Based Learning: KNN, Gaussian mixture
models and Expectation maximization
Combining Multiple Learners
• When designing a learning machine, we make many choices: the learning algorithm, its parameters, the training data, the input representation, and so on. Each of these choices introduces some variance in performance. For example, in a classification setting we may use a parametric classifier or a multilayer perceptron, and for the perceptron we must also decide on the number of hidden units.
• Each learning algorithm dictates a certain model that comes with a set of assumptions. This inductive
bias leads to error if the assumptions do not hold for the data.
• Different learning algorithms have different accuracies. The learning algorithms can be combined to
attain higher accuracy.
• Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation.
• Combining different models is done to improve the performance of deep learning models. Building a new model by combination requires less time, data, and computational resources. The most common way to combine models is to average the outputs of multiple models; taking a weighted average, with larger weights for the more reliable models, further improves the accuracy (a small sketch follows).
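The following is a minimal NumPy sketch of such a weighted average. The probability values, the three hypothetical base models, and the weights are illustrative assumptions only; in practice the weights would come from validation performance.

import numpy as np

# Hypothetical predicted class probabilities from three base models for
# four test instances and two classes (rows: instances, columns: classes).
proba_1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
proba_2 = np.array([[0.8, 0.2], [0.6, 0.4], [0.5, 0.5], [0.1, 0.9]])
proba_3 = np.array([[0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.3, 0.7]])

# Weights expressing how much each model is trusted (here they sum to 1).
weights = np.array([0.5, 0.3, 0.2])

# Weighted average of the probabilities, then pick the most probable class.
combined = weights[0] * proba_1 + weights[1] * proba_2 + weights[2] * proba_3
predictions = combined.argmax(axis=1)
print(predictions)  # -> [0 1 0 1]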
1. Generating Diverse Learners:
• Different Algorithms: We can use different learning algorithms to train different base-learners.
Different algorithms make different assumptions about the data and lead to different classifiers.
• Different Hyper-parameters: We can use the same learning algorithm but use it with different hyper-
parameters.
• Different Input Representations: Different representations make different characteristics explicit
allowing better identification.
• Different Training Sets: Another possibility is to train different base-learners by different subsets of
the training set.
Model Combination Schemes
• Two different schemes are used to generate the final output from multiple base-learners: multiexpert combination and multistage combination.
1. Multiexpert combination
• Multiexpert combination methods have base-learners that work in parallel.
a) Global approach (learner fusion): given an input, all base-learners generate an output and all of these outputs are used; voting and stacking are examples.
b) Local approach (learner selection): in mixture of experts, there is a gating model, which looks at
the input and chooses one (or very few) of the learners as responsible for generating the output.
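To make learner selection concrete, here is a brief Python sketch. It is a simplification, not the standard mixture-of-experts procedure (which trains the gate and the experts jointly): the experts are trained first, and a separate gating classifier then learns which expert to trust for each input. The data set and model choices are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two experts trained on the same data.
experts = [LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
           DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)]

# Gate: learn, for each input, the index of an expert that classifies it
# correctly (index 0 is used when both or neither expert is correct).
train_preds = np.column_stack([e.predict(X_tr) for e in experts])
gate_target = (train_preds == y_tr[:, None]).argmax(axis=1)
gate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, gate_target)

# At prediction time the gate chooses exactly one expert per input.
test_preds = np.column_stack([e.predict(X_te) for e in experts])
chosen = gate.predict(X_te)
y_pred = test_preds[np.arange(len(X_te)), chosen]
print("learner-selection accuracy:", (y_pred == y_te).mean())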
Multistage combination methods use a serial approach, where the next base-learner is trained on, or tested on, only the instances where the previous base-learners are not accurate enough (a cascade of this kind is sketched below).
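The following is a minimal two-stage cascade under these assumptions; the data set, the choice of stage-1 and stage-2 models, and the 0.75 confidence threshold are arbitrary illustrative choices.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Stage 1: a simple, cheap learner trained on all of the training data.
stage1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Stage 2: a more complex learner trained only on the instances that
# stage 1 misclassifies (this assumes stage 1 makes some training errors).
wrong = stage1.predict(X_tr) != y_tr
stage2 = RandomForestClassifier(random_state=1).fit(X_tr[wrong], y_tr[wrong])

# At test time, fall back to stage 2 only when stage 1 is not confident enough.
confidence = stage1.predict_proba(X_te).max(axis=1)
use_stage2 = confidence < 0.75
y_pred = np.where(use_stage2, stage2.predict(X_te), stage1.predict(X_te))
print("cascade accuracy:", (y_pred == y_te).mean())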
• Let's assume that we want to construct a function that maps inputs to outputs from a set of N_train known input-output pairs:
D_train = {(x_i, y_i)}, i = 1, ..., N_train,
where x_i ∈ X is a D-dimensional feature vector and y_i ∈ Y is the corresponding output.
Voting
A voting classifier is a machine learning model that trains a collection of several base models and forecasts an output (class) based on the class with the highest likelihood (or the most votes) among their predictions.
• Voting is an ensemble machine learning algorithm.
• For regression, a voting ensemble involves making a prediction that is the average of multiple other
regression models.
Voting Strategies:
Hard Voting – The class that receives the majority of the votes is selected as the final prediction. It is commonly used in classification problems; in regression, the ensemble instead predicts the average of the individual predictions.
Soft Voting – A (possibly weighted) average of the predicted probabilities is used to make the final prediction. It is suitable when the classifiers provide probability estimates: for each class, the predicted probabilities are summed and the class with the highest sum is predicted. Both strategies are illustrated in the sketch below.
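The following scikit-learn sketch shows both strategies side by side; the Iris data set, the particular base models, and the soft-voting weights are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
base_models = [("lr", LogisticRegression(max_iter=1000)),
               ("dt", DecisionTreeClassifier(max_depth=3)),
               ("nb", GaussianNB())]

# Hard voting: each classifier casts one vote for a class.
hard = VotingClassifier(estimators=base_models, voting="hard")
# Soft voting: predicted probabilities are averaged (optionally weighted).
soft = VotingClassifier(estimators=base_models, voting="soft", weights=[2, 1, 1])

print("hard voting accuracy:", cross_val_score(hard, X, y, cv=5).mean())
print("soft voting accuracy:", cross_val_score(soft, X, y, cv=5).mean())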
• In these methods, the first step is to create multiple classification/regression models using some training dataset. Each base model can be created using different splits of the same training dataset with the same algorithm, using the same dataset with different algorithms, or by any other method.
Fig. 9.1.2 shows the general idea of base-learners combined by a model combiner.
• When combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced; human ensembles are demonstrably better for the same reason. The short simulation below illustrates this effect.
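A small simulation of this error-cancellation argument (the 60 % accuracy, the 11 voters, and their assumed independence are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
n_trials, n_classifiers, p_correct = 100_000, 11, 0.6

# Each row is one test instance; each column is one classifier's vote
# (True means that classifier happened to be correct on that instance).
votes = rng.random((n_trials, n_classifiers)) < p_correct
majority_correct = votes.sum(axis=1) > n_classifiers // 2

print("single classifier accuracy:", p_correct)            # 0.6
print("majority-vote accuracy:", majority_correct.mean())  # about 0.75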
• Use a single, arbitrary learning algorithm but manipulate training data to make it learn multiple
models.
Base Models: Individual models that form the ensemble. For example, Support Vector Machines,
Logistic Regression, Decision Trees.
Classifier and Regressor Variants:
Voting Classifier – Combines multiple classifiers for classification tasks.
Voting Regressor – Combines multiple regressors for regression tasks by averaging their predictions (see the sketch below).
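A brief scikit-learn sketch of a voting regressor; the diabetes data set and the two base regressors are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# The ensemble prediction is the average of the base regressors' predictions.
ensemble = VotingRegressor(estimators=[
    ("lin", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
])

print("cross-validated R^2:", cross_val_score(ensemble, X, y, cv=5).mean())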
Ensemble Learning
Ensemble modeling is the process of running two or more related but different analytical models and
then synthesizing the results into a single score or spread in order to improve the accuracy of predictive
analytics and data mining applications.
• An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way to classify new examples.
• Ensemble methods combine several decision tree classifiers to produce better predictive performance than a single decision tree classifier.
The main principle behind the ensemble model is that a group of weak learners come together to form a
strong learner, thus increasing the accuracy of the model.
• Why do ensemble methods work? They are based on one of two basic observations:
1. Variance reduction: If the training sets are completely independent, it always helps to average an ensemble, because this reduces variance without affecting bias (e.g., bagging) and reduces sensitivity to individual data points.
2. Bias reduction: For simple models, the average of models has much greater capacity than a single model. Averaging models can reduce bias substantially by increasing capacity, while variance is controlled by fitting one component at a time. A short bagging sketch illustrating the variance-reduction effect follows.
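A brief scikit-learn sketch of variance reduction by bagging (the synthetic data set and number of estimators are illustrative choices; BaggingClassifier bags decision trees by default):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# A single, fully grown tree has low bias but high variance.
single_tree = DecisionTreeClassifier(random_state=0)
# Averaging 50 trees, each fit on a bootstrap resample, reduces that variance.
bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())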