Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 4 (April to June 2022)
Course: Machine Learning (880083-M-6)
Lecturers: Ç. Güven
Lecture 1: Introduction to Machine Learning
Machine Learning
• Machine Learning means learning from experience
• Generalization: the learned model also performs well on unseen data
Types of learning problems
• Supervised (Classification, Regression) vs Unsupervised Learning (Clustering)
• Multilabel Classification: multiple labels per sample
o Assign songs to one or more genres (for each genre, each song is labeled yes or no)
• Multiclass Classification: one label per sample
o Assign songs to one genre (for each song one label is chosen)
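The difference between the two settings shows up in how the targets are stored. The following is a minimal sketch; the genre list, the song names, and the helper `to_indicator` are made up for illustration.

```python
# Hypothetical genre vocabulary, for illustration only.
GENRES = ["rock", "pop", "jazz"]

def to_indicator(labels, classes=GENRES):
    """Multilabel target: one yes/no (1/0) entry per class."""
    return [1 if c in labels else 0 for c in classes]

# Multiclass: exactly one genre is chosen per song.
multiclass_targets = {"song_a": "rock", "song_b": "jazz"}

# Multilabel: any subset of genres per song, stored as an indicator vector.
multilabel_targets = {
    "song_a": to_indicator({"rock", "pop"}),
    "song_b": to_indicator({"jazz"}),
}

print(multilabel_targets["song_a"])  # [1, 1, 0]
```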
Evaluation
• Mean absolute error (MAE): average absolute difference between the true value and the predicted value
• Mean squared error (MSE): average squared difference between the true value and the predicted
value (more sensitive to outliers than MAE, because large errors are squared)
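A small sketch of both regression metrics in plain Python. The numbers are invented; the second prediction list contains one large error to show how much more strongly MSE reacts to an outlier than MAE.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3, 5, 2, 7]
y_pred = [2, 5, 4, 7]            # errors: 1, 0, 2, 0
y_pred_outlier = [2, 5, 4, 17]   # last prediction is off by 10

print(mean_absolute_error(y_true, y_pred))          # 0.75
print(mean_squared_error(y_true, y_pred))           # 1.25
print(mean_absolute_error(y_true, y_pred_outlier))  # 3.25
print(mean_squared_error(y_true, y_pred_outlier))   # 26.25
```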
• Type I error: false positive
• Type II error: false negative
• Accuracy is the fraction of correct predictions among all data points
o (TP + TN) / (TP + FN + FP + TN)
• Error rate / misclassification rate
o (FP + FN) / (TP + FN + FP + TN)
• Accuracy and error rate are only informative if the classes are balanced (if one class is rare, always predicting the majority class already gives a high accuracy)
• Precision is the hit rate among predicted positives (true positives vs. all samples predicted as positive)
o “What fraction of flagged emails are real SPAM?”
o (TP) / (TP + FP)
• Recall is the true positive rate (true positives vs. all actual positives)
o “What fraction of real SPAM has been flagged?”
o (TP) / (TP + FN)
• The F score (F1 score) combines precision and recall as their harmonic mean
o F1 = 2 * [ Precision * Recall ] / [ Precision + Recall ]
o Written in counts: F1 = 2 * TP / (2 * TP + FP + FN)
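• Use the F-beta score to give more weight to recall or precision
o F-beta = (1 + beta^2) * [ Precision * Recall ] / [ beta^2 * Precision + Recall ]
o beta > 1: recall is weighted more
o beta < 1: precision is weighted more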
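The formulas above can be computed directly from the four confusion-matrix counts. This is a minimal sketch with made-up spam-filter counts; the function name is hypothetical and the code assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn, beta=1.0):
    """Binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }

# Invented counts: 100 emails, 12 of them real spam.
f1_metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
f2_metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86, beta=2.0)

print(round(f1_metrics["accuracy"], 3))   # 0.94
print(round(f1_metrics["precision"], 3))  # 0.8
print(round(f1_metrics["recall"], 3))     # 0.667
print(round(f1_metrics["f_beta"], 3))     # 0.727 (F1)
print(round(f2_metrics["f_beta"], 3))     # 0.69  (F2, recall weighted more)
```

In this imbalanced example accuracy looks high (0.94) although a third of the real spam is missed, which is why precision, recall, and F scores are reported alongside it.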
• When there are more than two classes, use micro and macro averaging
o Macro average
▪ rare classes have the same impact as frequent classes (don’t use this one
when the classes are not balanced!)
▪ Compute precision and recall per-class, and average them
o Micro average
▪ Micro averaging treats the entire set of data as an aggregate result, and
calculates 1 metric rather than k metrics that get averaged together
o Macro F1-Score is the harmonic mean of Macro-Precision and Macro-Recall
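A sketch of the two averaging schemes for three classes. The per-class counts are invented, and the macro F1 follows the definition given in these notes (harmonic mean of macro-precision and macro-recall); other sources average the per-class F1 scores instead.

```python
# Invented per-class counts (one label per sample, so total FP == total FN).
counts = {
    "pop":  {"tp": 90, "fp": 10, "fn": 5},
    "rock": {"tp": 8,  "fp": 2,  "fn": 4},
    "jazz": {"tp": 1,  "fp": 3,  "fn": 6},
}

def macro_average(counts):
    """Compute precision and recall per class, then average: every class counts equally."""
    precisions = [c["tp"] / (c["tp"] + c["fp"]) for c in counts.values()]
    recalls = [c["tp"] / (c["tp"] + c["fn"]) for c in counts.values()]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

def micro_average(counts):
    """Pool the counts of all classes first, then compute one precision and one recall."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return tp / (tp + fp), tp / (tp + fn)

def harmonic_mean(a, b):
    return 2 * a * b / (a + b)

macro_p, macro_r = macro_average(counts)
micro_p, micro_r = micro_average(counts)
macro_f1 = harmonic_mean(macro_p, macro_r)

print(round(macro_p, 3), round(macro_r, 3))  # 0.65 0.586
print(round(micro_p, 3), round(micro_r, 3))  # 0.868 0.868
```

The rare "jazz" class pulls the macro averages down, while the micro averages are dominated by the frequent "pop" class.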
Find the best possible solution
• We are trying to approximate the relation between the input and the target value
• For a single value, the loss function captures the difference between the predicted and the
true target value
• The cost function is the loss averaged over all training samples, plus a regularization term
→ find the parameters which minimize the cost function
• Empirical risk minimization: we are trying to minimize the risk on the sample set
o If the risk is represented by MAE: calculate the average absolute difference between the estimated function 𝑓̂(𝑥) and the true function 𝑓(𝑥) over the samples → minimize that one
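Empirical risk minimization with MAE can be sketched as follows, assuming a linear model with slope theta and intercept c. The data and the coarse grid of candidate parameters are made up, and the grid search is only a stand-in for a real optimizer.

```python
def predict(x, theta, c):
    return theta * x + c

def empirical_risk(theta, c, xs, ys):
    """Average absolute loss (MAE) of the model on the sample set."""
    return sum(abs(y - predict(x, theta, c)) for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # generated by y = 2x + 1

# Candidate parameters: theta in {0, 0.5, ..., 4}, c in {0, 0.5, ..., 2}.
candidates = [(t * 0.5, c * 0.5) for t in range(9) for c in range(5)]
best_theta, best_c = min(candidates, key=lambda p: empirical_risk(p[0], p[1], xs, ys))

print(best_theta, best_c)                          # 2.0 1.0
print(empirical_risk(best_theta, best_c, xs, ys))  # 0.0
```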
• 𝑓̂(𝑥) can be a linear function or a more complex one (e.g. a polynomial function). The higher the power,
the more complex the model.
o Linear example: 𝑓̂(𝑥) = 𝜃𝑥 + 𝑐
o Fit the parameter theta on the training data; choose the power (a hyperparameter) on the validation data
• Optimal solution minimizes the loss between 𝑓(𝑥) and 𝑓̂(𝑥)
• Use a polynomial function for more complex relationships
• A higher power p means more degrees of freedom, i.e. a more flexible model
• Use cross validation to find the best hyperparameter p
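The following sketch selects the power p by k-fold cross validation on toy data. Everything here is an illustrative assumption: the data (a quadratic plus a small deterministic wiggle standing in for noise), the interleaved fold assignment, and the hand-written least-squares fit via the normal equations, which is fine for small powers but numerically fragile for large ones. In practice a library routine would normally do the fitting.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations; coefficients, lowest power first."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        rest = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - rest) / a[i][i]
    return coeffs

def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def cv_mse(xs, ys, degree, folds=4):
    """Validation MSE averaged over the folds (sample i belongs to fold i % folds)."""
    fold_errors = []
    for f in range(folds):
        pairs = list(enumerate(zip(xs, ys)))
        train = [xy for i, xy in pairs if i % folds != f]
        valid = [xy for i, xy in pairs if i % folds == f]
        coeffs = fit_polynomial([x for x, _ in train], [y for _, y in train], degree)
        fold_errors.append(sum((y - evaluate(coeffs, x)) ** 2 for x, y in valid) / len(valid))
    return sum(fold_errors) / folds

# Toy data: y = x^2 plus a small deterministic wiggle.
xs = [-2 + 0.25 * i for i in range(17)]
ys = [x ** 2 + 0.1 * math.sin(7 * i) for i, x in enumerate(xs)]

scores = {p: cv_mse(xs, ys, p) for p in range(1, 5)}
best_p = min(scores, key=scores.get)
print(scores, best_p)
```

The linear model (p = 1) cannot follow the curvature, so its cross-validation error is far larger than that of p = 2; the power with the lowest validation error is kept.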
Regularization
• Add a regularization term, weighted by lambda, to the cost function to constrain theta and avoid overfitting
o A large value of lambda shrinks theta and reduces overfitting, since a simpler
model is favoured
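The shrinking effect of lambda can be shown with a one-parameter model without intercept. The sketch assumes a squared loss and an L2 penalty (lambda times theta squared), which the notes do not specify. Under that assumption the minimizer has a closed form.

```python
def ridge_theta(xs, ys, lam):
    """Minimizer of (1/n) * sum((y - theta * x)^2) + lam * theta^2."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # y = 2x, noise-free toy data

for lam in (0, 1, 10):
    print(lam, round(ridge_theta(xs, ys, lam), 3))
# 0 2.0
# 1 1.765
# 10 0.857
```

With noise-free data the unregularized fit is already exact, so this example only illustrates how a larger lambda pulls theta towards zero.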