1. Introduction
Machine Learning
Learn to perform a task based on experience (data) 𝑋 while minimizing an error ϵ
● If the data is biased, the model will be biased too
$f_\theta(X) = y$
Inductive Bias
The assumptions put into the model (β):
● What should the model look like
● User-defined settings (hyperparameters)
● Assumptions about the distribution of the data (e.g. 𝑋 ∼ 𝑁)
● Knowledge transferred from previous tasks ($f_1, f_2, f_3, \ldots \Rightarrow f_{new}$)
$\arg\min_{\theta,\beta} \epsilon(f_{\theta,\beta}(X))$
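As a minimal sketch of what minimizing ϵ over θ can look like, the example below fits a linear model by gradient descent on the mean squared error. The toy data, the linear form of the model, and the values of `learning_rate` and `steps` are assumptions chosen for illustration; the last two play the role of user-defined settings in β.

```python
# Toy data generated by y = 2x + 1 (an assumption made for this example).
X = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

def f(theta, x):
    """Linear model f_theta(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x

def error(theta, X, y):
    """Mean squared error between predictions and targets."""
    return sum((f(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

def fit(X, y, learning_rate=0.05, steps=5000):
    """Gradient descent on the error; learning_rate and steps are hyperparameters."""
    theta = [0.0, 0.0]
    n = len(X)
    for _ in range(steps):
        residuals = [f(theta, xi) - yi for xi, yi in zip(X, y)]
        grad0 = 2 * sum(residuals) / n
        grad1 = 2 * sum(r * xi for r, xi in zip(residuals, X)) / n
        theta = [theta[0] - learning_rate * grad0,
                 theta[1] - learning_rate * grad1]
    return theta

theta = fit(X, y)
print(theta, error(theta, X, y))
```

On this noise-free data the learned parameters approach the intercept 1 and slope 2 that generated it.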
Statistics vs. Machine Learning
Statistics:
● Help humans understand the world
● Assume the data is generated according to an understandable model
Machine Learning:
● Automate a task entirely
● Assume the data generation process is unknown
Supervised Learning
Learn a model 𝑓 from labeled data (𝑋, 𝑦), where 𝑦 is the ground truth.
● Classification: predict a class label (category), discrete and unordered
○ Result can be binary (0, 1) or multi-class (a, b, c, d)
○ Can return confidence per class
○ Predictions yield a decision boundary separating classes
● Regression: predict a continuous value (e.g. temperature)
○ Target variable is numeric
○ Some algorithms can return a confidence interval
○ Find the relationship between predictors and the target variable
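The classification bullets can be illustrated with a small k-nearest-neighbors sketch. The data, the choice of k-nearest neighbors, and the function names `class_confidence` and `classify` are assumptions made for this example. The confidence per class is taken to be the fraction of neighbor votes, which is one simple option among many.

```python
from collections import Counter

# Labeled training data (X, y): one feature per point, labels "a" or "b".
X_train = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
y_train = ["a", "a", "a", "b", "b", "b"]

def class_confidence(x, k=3):
    """Fraction of the k nearest training points that belong to each class."""
    neighbors = sorted(zip(X_train, y_train), key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return {label: count / k for label, count in votes.items()}

def classify(x, k=3):
    """Predict the class with the highest confidence."""
    confidence = class_confidence(x, k)
    return max(confidence, key=confidence.get)

print(classify(1.2), classify(6.8))
print(class_confidence(4.4))
```

Points near 1 are assigned to class "a" and points near 7 to class "b"; somewhere in between the prediction flips, and that point is the decision boundary of this model.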
Unsupervised Learning
Explore the structure of unlabeled data (𝑋) to extract meaningful information.
● Clustering: organize information into meaningful subgroups (clusters)
○ Objects in the cluster share a certain degree of similarity (and dissimilarity to
other clusters)
● Dimensionality reduction: compress data into fewer dimensions while retaining most
of the information
○ New features lose original meaning
○ New representation can be easier to model/visualize
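For the clustering bullet, a minimal sketch of k-means (Lloyd's algorithm) in one dimension is shown below. The points, the two starting centers, and the fixed iteration count are assumptions made for illustration; similarity is measured here by absolute distance.

```python
def k_means_1d(points, centers, iterations=10):
    """Lloyd's algorithm in one dimension: assign points, then move centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No labels are used: the two groups emerge only from the distances between the points.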
Semi-supervised Learning
Learn a model from a few labeled and many unlabeled data points.
Reinforcement Learning
Develop an agent that improves performance based on interactions with the environment.
● Search a (large) space of actions and states
● Learn a series of actions (policy) that maximizes reward through exploration
● Reward function: defines how well an action (or series of actions) works
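The ideas above can be sketched with tabular Q-learning on a tiny, made-up environment: five states on a line, where reaching the rightmost state yields a reward of 1. The environment, the use of Q-learning, the purely random exploration, and the values of `alpha` and `gamma` are all assumptions for this example, not something the notes prescribe.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from interaction, exploring with random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)                 # explore: pick a random action
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q = q_learning()
# Policy: in every non-goal state, take the action with the highest learned value.
policy = [ACTIONS[max((0, 1), key=lambda a: q[s][a])] for s in range(N_STATES - 1)]
print(policy)
```

The learned policy moves right in every state, which is the shortest route to the reward. Exploration here is uniformly random to keep the sketch short; practical agents usually balance exploration against exploiting what they have already learned.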
Learning = Representation + Evaluation + Optimization
● Representation: defines the concepts the model can learn (hypothesis space)
● Evaluation: a way to choose one hypothesis over another using an objective function,
scoring function or loss function (ℓ) (difference between correct output and predictions)
● Optimization: an efficient way to search the hypothesis space
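The three components can be made concrete with a deliberately small example. Everything in it is an assumption chosen for illustration: the hypothesis space contains only threshold rules, evaluation uses the 0-1 loss, and optimization is an exhaustive search over a grid of thresholds.

```python
# Labeled data: feature value and binary label.
X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 1, 1]

# Representation: the hypothesis space is every rule "predict 1 if x > t".
def hypothesis(t):
    return lambda x: 1 if x > t else 0

# Evaluation: 0-1 loss, the number of predictions that differ from the label.
def loss(h, X, y):
    return sum(h(xi) != yi for xi, yi in zip(X, y))

# Optimization: exhaustive search over a finite grid of candidate thresholds.
candidates = [0.5 * i for i in range(20)]   # 0.0, 0.5, ..., 9.5
best_t = min(candidates, key=lambda t: loss(hypothesis(t), X, y))
print(best_t, loss(hypothesis(best_t), X, y))
```

Exhaustive search is only feasible because this hypothesis space is tiny; larger spaces need a more efficient optimizer such as gradient descent.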
Overfitting
A model that is too complex for the amount of data you have (high train score & low test score)
→ Solution: make the model simpler (regularization), collect more data, remove features or
scale data.
Underfitting
A model that is too simple given the complexity of the data (low train score & low test score)
→ Solution: use a more complex model
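Both failure modes can be seen in a small k-nearest-neighbors regression sketch, where k controls model complexity. The data, the noise pattern, and the choice of k-nearest neighbors are assumptions for this example. It reports errors rather than scores, so lower is better.

```python
# Training data: y = 2x plus alternating noise of +1 / -1 (toy values assumed here).
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
# Test data: nearby inputs, same trend, but the noise pattern is flipped.
test_x = [x + 0.25 for x in train_x]
test_y = [2 * x - (1 if i % 2 == 0 else -1) for i, x in enumerate(test_x)]

def knn_predict(x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(target for _, target in nearest) / k

def mse(k, xs, ys):
    """Mean squared error of the k-NN model on the given data."""
    return sum((knn_predict(x, k) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for k in (1, 2, 10):
    print(f"k={k:2d}  train error={mse(k, train_x, train_y):6.2f}  "
          f"test error={mse(k, test_x, test_y):6.2f}")
```

With k = 1 the model memorizes the training set (zero train error) but does worse on the test set than k = 2, which is the overfitting pattern. With k = 10 it predicts the training mean everywhere and has a large error on both sets, which is the underfitting pattern.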
Model Selection
An (external) evaluation function lets us:
● Check whether we are learning the right thing (feedback signal) → underfitting/overfitting
● Choose the model that fits the application
● Choose between different hyperparameter settings
Data Split
Data needs to be split to avoid data leakage (optimizing hyperparameters or preprocessing
based on the test data).
● Train model → train set
● Optimize hyperparameters → validation set
● Evaluate → test set
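A minimal sketch of such a three-way split is given below. The function name `split_data`, the 60/20/20 proportions, and the fixed seed are assumptions for illustration. The last lines show preprocessing statistics being computed on the train set only, so that nothing from the validation or test data leaks into them.

```python
import random

def split_data(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))

# Preprocessing statistics come from the train set only and are then reused.
train_mean = sum(train) / len(train)
val_centered = [v - train_mean for v in val]
test_centered = [t - train_mean for t in test]
```

The test set is touched only once, for the final evaluation; every tuning decision before that uses the validation set.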