Supervised Learning: The process of building a machine learning model that is based on labeled training data.
Unsupervised Learning: The process of building a machine learning model without relying on labeled training data.
Classification: The process of arranging data into a fixed number of categories.
Overfitting: The model will not perform well on new data outside the training dataset because it is too fine-tuned to the training dataset.
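As a rough illustration of overfitting, the sketch below fits a degree-9 polynomial to ten noisy points using NumPy. The sine-shaped data, the noise level, and the polynomial degree are all assumptions chosen for the example, not part of the definition above.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_samples(x):
    # Assumed ground truth: a sine curve plus Gaussian noise.
    return np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

# Ten training points and ten different test points from the same process.
x_train = np.linspace(0.0, 1.0, 10)
y_train = noisy_samples(x_train)
x_test = np.linspace(0.05, 0.95, 10)
y_test = noisy_samples(x_test)

# A degree-9 polynomial has enough freedom to pass through all ten
# training points, so it fits the noise as well as the signal.
coeffs = np.polyfit(x_train, y_train, deg=9)

def mse(x, y):
    # Mean squared error of the fitted polynomial on the points (x, y).
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

train_error = mse(x_train, y_train)
test_error = mse(x_test, y_test)
print("train MSE:", train_error)
print("test MSE:", test_error)
```

The training error should come out close to zero while the test error stays noticeably larger, which is the gap the definition describes.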
Preprocessing Data: Involves formatting and manipulating the data so that the algorithm will better learn from it.
Binarization: The process of converting our numerical values into Boolean values.
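A minimal sketch of binarization with NumPy follows. The sample array and the threshold of 2.1 are illustrative assumptions. scikit-learn's `preprocessing.Binarizer` offers a comparable transformation, although it returns 0/1 values rather than Booleans.

```python
import numpy as np

# Illustrative sample data (an assumption): 4 samples, 3 features.
data = np.array([[5.1, -2.9, 3.3],
                 [-1.2, 7.8, -6.1],
                 [3.9, 0.4, 2.1],
                 [7.3, -9.9, -4.5]])

threshold = 2.1  # assumed cutoff

# Values strictly above the threshold become True, all others False.
binarized = data > threshold
print(binarized)
```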
Mean Removal: Involves removing the mean from our feature vector so each feature is centered on zero.
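Mean removal can be sketched in a couple of lines of NumPy. The sample array is an assumption for illustration. Note that scikit-learn's `preprocessing.scale` also divides by the standard deviation by default, which goes one step beyond the centering shown here.

```python
import numpy as np

# Illustrative sample data (an assumption): rows are samples, columns are features.
data = np.array([[5.1, -2.9, 3.3],
                 [-1.2, 7.8, -6.1],
                 [3.9, 0.4, 2.1],
                 [7.3, -9.9, -4.5]])

# Subtract each column's mean so every feature is centered on zero.
centered = data - data.mean(axis=0)

print("means before:", data.mean(axis=0))
print("means after:", centered.mean(axis=0))
```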
Scaling: We don't want any feature's values to be artificially large or small (for example, because of outliers), so we scale the data to make the features more even.
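One common way to scale features is min-max scaling, sketched below with NumPy. The choice of min-max scaling, the target range of 0 to 1, and the sample array are assumptions, since the definition above does not name a specific method. scikit-learn's `preprocessing.MinMaxScaler` implements the same idea.

```python
import numpy as np

# Illustrative sample data (an assumption): rows are samples, columns are features.
data = np.array([[5.1, -2.9, 3.3],
                 [-1.2, 7.8, -6.1],
                 [3.9, 0.4, 2.1],
                 [7.3, -9.9, -4.5]])

col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Map each feature linearly so its minimum becomes 0 and its maximum becomes 1.
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
```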
Normalization: Modifying the values in the data so that we can measure them on a common scale.
L1 Normalization (Least Absolute Deviations): Makes sure that the sum of absolute values is 1 in each row. Considered more robust than L2 normalization because it is less sensitive to outliers.
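L1 normalization of each row can be sketched with NumPy as follows. The sample array is an assumption. In scikit-learn, `preprocessing.normalize(data, norm='l1')` performs the same row-wise operation.

```python
import numpy as np

# Illustrative sample data (an assumption): rows are samples, columns are features.
data = np.array([[5.1, -2.9, 3.3],
                 [-1.2, 7.8, -6.1],
                 [3.9, 0.4, 2.1],
                 [7.3, -9.9, -4.5]])

# L1 norm of each row: the sum of the absolute values in that row.
l1_norms = np.abs(data).sum(axis=1, keepdims=True)

# After division, the absolute values in every row sum to 1.
data_l1 = data / l1_norms
print(np.abs(data_l1).sum(axis=1))
```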
L2 Normalization: Refers to least squares. Makes sure that the sum of squares in each row is equal to 1. May be a better choice when the outliers are important to the problem.
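L2 normalization of each row can be sketched the same way, dividing by the Euclidean length of the row. The sample array is an assumption. In scikit-learn, `preprocessing.normalize(data, norm='l2')` performs the same row-wise operation.

```python
import numpy as np

# Illustrative sample data (an assumption): rows are samples, columns are features.
data = np.array([[5.1, -2.9, 3.3],
                 [-1.2, 7.8, -6.1],
                 [3.9, 0.4, 2.1],
                 [7.3, -9.9, -4.5]])

# L2 norm of each row: the square root of the sum of squared values.
l2_norms = np.sqrt((data ** 2).sum(axis=1, keepdims=True))

# After division, the squared values in every row sum to 1.
data_l2 = data / l2_norms
print((data_l2 ** 2).sum(axis=1))
```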
Label Encoding: The process of transforming word labels into numerical form. Many scikit-learn (sklearn) functions expect labels to be numerical.
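The mapping from word labels to integers can be sketched with NumPy's `np.unique`. The color labels are an assumption for illustration. scikit-learn's `preprocessing.LabelEncoder` produces the same kind of encoding, assigning integers to the sorted unique labels.

```python
import numpy as np

# Illustrative word labels (an assumption).
labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# classes holds the sorted unique labels; encoded holds, for each input
# label, the index of that label within classes.
classes, encoded = np.unique(labels, return_inverse=True)

print(list(classes))     # ['black', 'green', 'red', 'white', 'yellow']
print(encoded.tolist())  # [2, 0, 2, 1, 0, 4, 3]

# Decoding: index back into classes to recover the original words.
decoded = classes[encoded]
print(list(decoded))
```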