
Machine Learning Cheatsheet + 27 Exam Questions (No Answers)


Machine Learning Cheatsheet + 27 questions that were asked in the exam. There are no answers included.


Document information

Uploaded on: 17 January 2025
Number of pages: 4
Written in: 2024/2025
Grade: 7-8
Type: Exam (elaborations)
Contains: Questions only

Preview of the content

Introduction to ML

Decision trees use logic-based "if-then" rules; logistic regression uses weighted features. Regression: predicts a number. (Binary) classification: categorizes data (spam or not spam). Multi-class classification: one label out of several (sports or finance or politics). Multi-label classification: several labels at once (jazz and pop). Sequence labeling: assigns a label to each element in a sequence. Sequence to sequence: maps one sequence to another (Latin to English). Train: to teach the model. Validation: to tune and improve the model during development. Test: for final evaluation. Cross-validation: used when the dataset is small or splits may be unrepresentative (split the training data into 10 parts, train on 9 and test on 1). Data points are randomly assigned to avoid bias. Stratification: all classes are proportionally represented in the training and validation splits. Time series: past data for training and future data for validation. Time-series cross-validation: expand the training set while ensuring validation is always forward-looking. MAE: average absolute difference between predictions and true values. MSE: averages the squared differences. Precision: minimizing false positives (TP / (TP + FP)). Recall: minimizing false negatives (TP / (TP + FN)). Accuracy: proportion of correct predictions ((TP + TN) / total). Error rate: proportion of mistakes ((FP + FN) / total). F-score: harmonic mean of precision and recall (2·(P·R) / (P + R)). β = 1 means precision and recall are equally important, β > 1 means recall is more important, β < 1 means precision is more important.
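A minimal sketch of these metrics in Python; the confusion-matrix counts (tp, fp, fn, tn) are assumed to be given, and the function names are illustrative:

```python
def precision(tp, fp):
    # Fraction of predicted positives that are truly positive: TP / (TP + FP).
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found: TP / (TP + FN).
    return tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    # Weighted harmonic mean; beta = 1 weighs precision and recall equally,
    # beta > 1 favours recall, beta < 1 favours precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def accuracy(tp, tn, fp, fn):
    # Proportion of all predictions that were correct.
    return (tp + tn) / (tp + tn + fp + fn)
```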
Gradient descent

Linear regression: the target is computed from weights (coefficients w and b) applied to the features. Gradient descent: an optimization algorithm that adjusts model parameters (like weights and intercept) step by step to minimize the error between predictions and actual values (it can work in high-dimensional spaces). Goal: find the minimum of a function; stop when the change in w becomes very small. SSE: sum of squared errors, used to evaluate how well the model fits the data. Slope: describes the steepness in a single dimension. Derivative: gives the slope at a specific point. Gradient: collection of slopes, one for each dimension. Learning rate: controls the size of the steps (small = slow progress, large = risk of overshooting). A typical approach is to start with a large learning rate, then decrease it as the model becomes more refined; a smaller learning rate near the minimum helps fine-tune the model. Update rule: adjust the model's weights using the slope and the learning rate, and keep repeating until the weights stop changing much. Disadvantages for large data: computationally expensive, requires more memory and resources → solution: use a subset of the data or SGD (Stochastic Gradient Descent), where only one example is used for each update (erratic movement, as updates are based on random subsets of the data, but it still ends up near the minimum). Advantages: faster updates, better generalization, and less overfitting. Batch gradient descent: moves steadily towards the minimum (smooth path). Momentum: smooths out the noisy updates in SGD by combining the current gradient with past updates, with the degree of smoothing controlled by the parameter β. Local minima: points where the model gets stuck, unable to reach the global minimum; SGD can help the model avoid local minima. In higher-dimensional settings (like neural networks), local minima are less of a problem because the structure of the error function is more complex and multidimensional. Autodiff: automatically computes derivatives by applying calculus rules to the model's computation graph.
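As a sketch of the idea, gradient descent for a one-feature linear regression under the MSE loss; NumPy is assumed, and the learning rate, tolerance, and toy data are illustrative:

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, tol=1e-6, max_steps=10_000):
    # Fit y ≈ w*x + b by minimizing the mean squared error.
    w, b = 0.0, 0.0                      # start from an arbitrary point
    for _ in range(max_steps):
        error = (w * x + b) - y          # prediction error on every example
        grad_w = 2 * np.mean(error * x)  # slope of the MSE with respect to w
        grad_b = 2 * np.mean(error)      # slope of the MSE with respect to b
        step_w, step_b = lr * grad_w, lr * grad_b
        w, b = w - step_w, b - step_b    # update rule: move against the gradient
        if max(abs(step_w), abs(step_b)) < tol:
            break                        # stop when the weights barely change
    return w, b

# toy data: y is roughly 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.9, 7.2, 9.8, 13.1])
print(gradient_descent(x, y))
```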
Gradient boosted trees

Ensemble: multiple models work together to make predictions (each model contributes its own prediction); when we combine their predictions, the mistakes tend to cancel each other out. Bagging (Bootstrap Aggregating): technique used in random forests (taking random samples of the data to train each decision tree). Residual: difference between predictions and actual values. MAE and MSE can be used to measure how well the models are performing. Ensembles have less spread and lower error; their errors need to be uncorrelated. For classification, instead of averaging, we use voting (majority vote). Gradient boosting: builds a model step by step, starting with a simple prediction and then adding small trees that focus on correcting the errors of the previous ones (each tree is fitted to the negative gradient (residual) of the loss function), which allows GB to handle different loss functions for various tasks. Negative gradient: shows the direction and size of the adjustment needed to reduce the model's prediction error as quickly as possible. Squared loss: exaggerates the influence of large errors (outliers). Absolute loss: less sensitive to large errors and outliers. Huber loss: uses squared loss for smaller residuals and absolute loss for larger residuals. GB in regression: uses decision trees and relies on residuals and gradients to update the model.
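A rough sketch of gradient boosting for regression with squared loss, where the negative gradient is simply the residual; it assumes scikit-learn's DecisionTreeRegressor is available, and the number of trees, learning rate, and depth are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gb_fit(X, y, n_trees=50, lr=0.1, depth=2):
    # Start with a simple prediction: the mean of the targets.
    base = y.mean()
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                        # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        pred = pred + lr * tree.predict(X)         # small corrective step
        trees.append(tree)
    return base, trees

def gb_predict(X, base, trees, lr=0.1):
    # Sum the base prediction and every tree's (scaled) correction.
    return base + lr * sum(tree.predict(X) for tree in trees)
```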
GB in classification: the problem becomes more complex, so we use sequential addition of trees (instead of a single tree for all classes, the model builds separate trees for each class, assigning scores rather than labels, and combines them). Softmax: converts raw scores into probabilities, ensuring they are between 0 and 1 and sum to 1 across all classes (used in the output layer for multi-class problems). The true class is represented as a "one-hot" distribution (all probabilities are 0 except for the correct class, which is 1). Cross-entropy: measures how well the predicted probabilities match the true one-hot labels, aiming to minimize the difference, using concepts from entropy and KL divergence.
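A small NumPy sketch of softmax and cross-entropy against a one-hot target; the example scores are made up:

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability, then normalize so the
    # outputs are between 0 and 1 and sum to 1 across all classes.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def cross_entropy(probs, one_hot):
    # Only the predicted probability of the true class contributes to the loss.
    return -np.sum(one_hot * np.log(probs))

scores = np.array([2.0, 0.5, -1.0])    # raw per-class scores
probs = softmax(scores)
target = np.array([1.0, 0.0, 0.0])     # one-hot: class 0 is the true class
print(probs, cross_entropy(probs, target))
```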
Decision trees & Forests

Use the feature that best divides the data by class: a good split has less impurity. Misclassification impurity: measures the mistakes made when labeling data after a split (calculated as 1 − proportion of the majority class); it can be less effective in cases with more than two classes. Gini impurity: measures the likelihood of a random element being incorrectly classified. Entropy: measures the uniformity of a distribution (all classes equal → entropy is high → uncertainty). Lower impurity = better question. Trees are built incrementally, one split at a time. Goal: create a structure that minimizes mistakes and simplifies decision-making. Branch node: holds a question. Leaf node: holds the class label. Base case: leaf node. Recursive case: a function that calls itself until some base case is reached. Handling numerical data: convert numerical features into binary questions (is the size >= threshold?). The depth of the tree determines classification speed (balanced trees are faster → classification time grows logarithmically with the number of leaf nodes). Advantages of DT: easy to interpret and visualize, especially with smaller trees. Disadvantages of DT: large trees can become hard to interpret and are prone to overfitting (to prevent overfitting, control the depth of the tree and use pruning or set minimum sample sizes for splits). Pruning: removes unnecessary nodes to simplify the tree. Random forest: a collection of decision trees, each trained on a different subset of the data, often using majority voting for classification. Advantages: reduces overfitting and improves generalization. Disadvantages: less interpretable.
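A sketch of the three impurity measures, assuming p is a NumPy array holding the class proportions in a node; the function names are illustrative:

```python
import numpy as np

def misclassification_impurity(p):
    # 1 minus the proportion of the majority class.
    return 1.0 - np.max(p)

def gini_impurity(p):
    # Chance that a random element is labelled incorrectly
    # when labels are drawn from the node's class distribution.
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Highest when all classes are equally likely (most uncertainty).
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = np.array([0.5, 0.5])   # a perfectly mixed two-class node
print(misclassification_impurity(p), gini_impurity(p), entropy(p))
```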
Linear classifiers

Perceptron: computes a weighted sum of the input features (plus a bias); if the sum >= 0 it outputs +1, otherwise −1. Linear classifier: draws a straight line (boundary) to separate data into different groups. Bias: helps the perceptron decide when there is no information or when all features are zero (if misclassified, increase or decrease the bias). Weights: if misclassified, adjust the weights by adding or subtracting the feature values (x). Batch learning: models that train on the whole dataset at once (decision trees). Online learning: models like the perceptron that update one example at a time, useful when data is continuously generated (like social media posts). The order of examples affects how the model learns, so it is important to randomize the data order. Zero-one loss: counts classification mistakes as 0 (correct) or 1 (incorrect); it doesn't give a slope when used for gradient descent, making it unhelpful for learning.
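A minimal sketch of the perceptron's online update rule, assuming X is a 2-D NumPy array of features and y holds ±1 labels; the example order is randomized each pass:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        idx = np.random.permutation(len(y))       # randomize the data order
        for i in idx:
            pred = 1 if X[i] @ w + b >= 0 else -1
            if pred != y[i]:                      # update only on mistakes
                w += y[i] * X[i]                  # add/subtract the feature values
                b += y[i]                         # nudge the bias
    return w, b
```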
Logistic regression: unlike the perceptron, it uses probabilities to estimate class likelihoods and minimizes errors through a loss function, typically trained using gradient descent. It calculates the logit, the logarithm of the odds ratio, showing the relative likelihood of the positive class versus the negative class; it maps probabilities (0 to 1) to real numbers (minus infinity to plus infinity). So if w·x + b = 3.0 (the logit), then with e ≈ 2.718: e^3 ≈ 20.08, so e^−3 ≈ 1/20.08 ≈ 0.0498, and 1/(1 + 0.0498) ≈ 0.95. Inverse logit (sigmoid): transforms real numbers (negative infinity to positive infinity) into probabilities (0 to 1), suitable for binary classification. The perceptron only updates weights when there is an error; logistic regression updates weights based on the difference between predicted probabilities and true labels. To prevent overfitting, add a regularization term (L2 regularization) that penalizes large weights by adding a penalty to the loss function, controlled by a hyperparameter (alpha): larger alpha values encourage simpler models with smaller weights, while smaller values allow the model to fit the data more closely. SVM: uses hinge loss (a similar kind of loss function) to create a decision boundary that maximizes the margin between classes. (Logistic regression predicts probabilities (output between 0 and 1); linear regression predicts continuous numerical values.)
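The worked example above, checked in code; the sigmoid (inverse logit) maps the logit 3.0 to roughly 0.95:

```python
import math

def sigmoid(logit):
    # Inverse logit: maps any real number to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

logit = 3.0                  # w·x + b for some example
print(math.exp(logit))       # ≈ 20.09
print(sigmoid(logit))        # ≈ 0.953
```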