
Machine Learning Cheatsheet + 27 Exam Questions (No Answers)


Machine Learning Cheatsheet + 27 questions that were asked in the exam. There are no answers included.


Document information

Uploaded on: 17 January 2025
Number of pages: 4
Written in: 2024/2025
Grade: 7-8
Type: Exam (elaborations)
Contains: Questions only

Preview of the content

Introduction to ML

Decision trees use logic-based "if-then" rules; logistic regression uses weighted features. Regression: predicts a number. (Binary) classification: categorizes data (spam or not spam). Multi-class classification: one label out of several (sports or finance or politics). Multi-label classification: several labels at once (jazz and pop). Sequence labeling: assigns a label to each element in a sequence. Sequence to sequence: maps one sequence to another (Latin to English). Train: to teach the model. Validation: to tune and improve the model during development. Test: for final evaluation. Cross-validation: used when the dataset is small or splits may be unrepresentative (split the training data into 10 parts, train on 9 and test on 1). Data points are randomly assigned to avoid bias. Stratification: all classes are proportionally represented in the training and validation splits. Time series: past data for training and future data for validation. Time-series cross-validation: expand the training set while ensuring validation is always forward-looking. MAE: average absolute difference between predictions and true values. MSE: averages the squared differences. Precision: minimizing false positives (TP / (TP + FP)). Recall: minimizing false negatives (TP / (TP + FN)). Accuracy: proportion of correct predictions ((TP + TN) / total). Error rate: proportion of mistakes ((FP + FN) / total). F-score: harmonic mean of precision and recall (2·(P·R) / (P + R)). β = 1 means precision and recall are equally important, β > 1 means recall is more important, β < 1 means precision is more important.
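A minimal sketch of these metrics in Python; the confusion-matrix counts (tp, fp, fn, tn) are assumed to be given, and the function names are illustrative:

```python
def precision(tp, fp):
    # Fraction of predicted positives that are truly positive: TP / (TP + FP).
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that were found: TP / (TP + FN).
    return tp / (tp + fn)

def f_beta(p, r, beta=1.0):
    # Weighted harmonic mean; beta = 1 weighs precision and recall equally,
    # beta > 1 favours recall, beta < 1 favours precision.
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def accuracy(tp, tn, fp, fn):
    # Proportion of all predictions that were correct.
    return (tp + tn) / (tp + tn + fp + fn)
```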
Gradient descent

Linear regression: the target is computed from weights (coefficients w and b) applied to the features. Gradient descent: an optimization algorithm that adjusts model parameters (like weights and intercept) step by step to minimize the error between predictions and actual values (it can work in high-dimensional spaces). Goal: find the minimum of a function; stop when the change in w becomes very small. SSE: sum of squared errors, used to evaluate how well the model fits the data. Slope: describes the steepness in a single dimension. Derivative: gives the slope at a specific point. Gradient: collection of slopes, one for each dimension. Learning rate: controls the size of the steps (small = slow progress, large = risk of overshooting). A typical approach is to start with a large learning rate, then decrease it as the model becomes more refined; a smaller learning rate near the minimum helps fine-tune the model. Update rule: adjust the model's weights using the slope and the learning rate, and keep repeating until the weights stop changing much. Disadvantages for large data: computationally expensive, requires more memory and resources → solution: use a subset of the data or SGD (Stochastic Gradient Descent), where only one example is used for each update (erratic movement, as updates are based on random subsets of the data, but it still ends up near the minimum). Advantages: faster updates, better generalization, and less overfitting. Batch gradient descent: moves steadily towards the minimum (smooth path). Momentum: smooths out the noisy updates in SGD by combining the current gradient with past updates, with the degree of smoothing controlled by the parameter β. Local minima: points where the model gets stuck, unable to reach the global minimum; SGD can help the model avoid local minima. In higher-dimensional settings (like neural networks), local minima are less of a problem because the structure of the error function is more complex and multidimensional. Autodiff: automatically computes derivatives by applying calculus rules to the model's computation graph.
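As a sketch of the idea, gradient descent for a one-feature linear regression under the MSE loss; NumPy is assumed, and the learning rate, tolerance, and toy data are illustrative:

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, tol=1e-6, max_steps=10_000):
    # Fit y ≈ w*x + b by minimizing the mean squared error.
    w, b = 0.0, 0.0                      # start from an arbitrary point
    for _ in range(max_steps):
        error = (w * x + b) - y          # prediction error on every example
        grad_w = 2 * np.mean(error * x)  # slope of the MSE with respect to w
        grad_b = 2 * np.mean(error)      # slope of the MSE with respect to b
        step_w, step_b = lr * grad_w, lr * grad_b
        w, b = w - step_w, b - step_b    # update rule: move against the gradient
        if max(abs(step_w), abs(step_b)) < tol:
            break                        # stop when the weights barely change
    return w, b

# toy data: y is roughly 3x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.9, 7.2, 9.8, 13.1])
print(gradient_descent(x, y))
```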
Gradient boosted trees

Ensemble: multiple models work together to make predictions (each model contributes its own prediction); when we combine their predictions, the mistakes tend to cancel each other out. Bagging (Bootstrap Aggregating): technique used in random forests (taking random samples of the data to train each decision tree). Residual: difference between predictions and actual values. MAE and MSE can be used to measure how well the models are performing. Ensembles have less spread and lower error; their errors need to be uncorrelated. For classification, instead of averaging, we use voting (majority vote). Gradient boosting: builds a model step by step, starting with a simple prediction and then adding small trees that focus on correcting the errors of the previous ones (each tree is fitted to the negative gradient (residual) of the loss function), which allows GB to handle different loss functions for various tasks. Negative gradient: shows the direction and size of the adjustment needed to reduce the model's prediction error as quickly as possible. Squared loss: exaggerates the influence of large errors (outliers). Absolute loss: less sensitive to large errors and outliers. Huber loss: uses squared loss for smaller residuals and absolute loss for larger residuals. GB in regression: uses decision trees and relies on residuals and gradients to update the model.
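A rough sketch of gradient boosting for regression with squared loss, where the negative gradient is simply the residual; it assumes scikit-learn's DecisionTreeRegressor is available, and the number of trees, learning rate, and depth are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gb_fit(X, y, n_trees=50, lr=0.1, depth=2):
    # Start with a simple prediction: the mean of the targets.
    base = y.mean()
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred                        # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        pred = pred + lr * tree.predict(X)         # small corrective step
        trees.append(tree)
    return base, trees

def gb_predict(X, base, trees, lr=0.1):
    # Sum the base prediction and every tree's (scaled) correction.
    return base + lr * sum(tree.predict(X) for tree in trees)
```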
GB in classification: the problem becomes more complex, so we use sequential addition of trees (instead of a single tree for all classes, the model builds separate trees for each class, assigning scores rather than labels, and combines them). Softmax: converts raw scores into probabilities, ensuring they are between 0 and 1 and sum to 1 across all classes (used in the output layer for multi-class problems). The true class is represented as a "one-hot" distribution (all probabilities are 0 except for the correct class, which is 1). Cross-entropy: measures how well the predicted probabilities match the true one-hot labels, aiming to minimize the difference, using concepts from entropy and KL divergence.
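A small NumPy sketch of softmax and cross-entropy against a one-hot target; the example scores are made up:

```python
import numpy as np

def softmax(scores):
    # Shift by the max for numerical stability, then normalize so the
    # outputs are between 0 and 1 and sum to 1 across all classes.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def cross_entropy(probs, one_hot):
    # Only the predicted probability of the true class contributes to the loss.
    return -np.sum(one_hot * np.log(probs))

scores = np.array([2.0, 0.5, -1.0])    # raw per-class scores
probs = softmax(scores)
target = np.array([1.0, 0.0, 0.0])     # one-hot: class 0 is the true class
print(probs, cross_entropy(probs, target))
```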
Decision trees & Forests

Use the feature that best divides the data by class: a good split has less impurity. Misclassification impurity: measures the mistakes made when labeling data after a split (calculated as 1 − proportion of the majority class); it can be less effective in cases with more than two classes. Gini impurity: measures the likelihood of a random element being incorrectly classified. Entropy: measures the uniformity of a distribution (all classes equal → entropy is high → uncertainty). Lower impurity = better question. Trees are built incrementally, one split at a time. Goal: create a structure that minimizes mistakes and simplifies decision-making. Branch node: holds a question. Leaf node: holds the class label. Base case: leaf node. Recursive case: a function that calls itself until some base case is reached. Handling numerical data: convert numerical features into binary questions (is the size >= threshold?). The depth of the tree determines classification speed (balanced trees are faster → classification time grows logarithmically with the number of leaf nodes). Advantages of DT: easy to interpret and visualize, especially with smaller trees. Disadvantages of DT: large trees can become hard to interpret and are prone to overfitting (to prevent overfitting, control the depth of the tree and use pruning or set minimum sample sizes for splits). Pruning: removes unnecessary nodes to simplify the tree. Random forest: a collection of decision trees, each trained on a different subset of the data, often using majority voting for classification. Advantages: reduces overfitting and improves generalization. Disadvantages: less interpretable.
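A sketch of the three impurity measures, assuming p is a NumPy array holding the class proportions in a node; the function names are illustrative:

```python
import numpy as np

def misclassification_impurity(p):
    # 1 minus the proportion of the majority class.
    return 1.0 - np.max(p)

def gini_impurity(p):
    # Chance that a random element is labelled incorrectly
    # when labels are drawn from the node's class distribution.
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # Highest when all classes are equally likely (most uncertainty).
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = np.array([0.5, 0.5])   # a perfectly mixed two-class node
print(misclassification_impurity(p), gini_impurity(p), entropy(p))
```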
Linear classifiers

Perceptron: computes a weighted sum of the input features (plus a bias); if the sum >= 0 it outputs +1, otherwise −1. Linear classifier: draws a straight line (boundary) to separate data into different groups. Bias: helps the perceptron decide when there is no information or when all features are zero (if misclassified, increase or decrease the bias). Weights: if misclassified, adjust the weights by adding or subtracting the feature values (x). Batch learning: models that train on the whole dataset at once (decision trees). Online learning: models like the perceptron that update one example at a time, useful when data is continuously generated (like social media posts). The order of examples affects how the model learns, so it is important to randomize the data order. Zero-one loss: counts classification mistakes as 0 (correct) or 1 (incorrect); it doesn't give a slope when used for gradient descent, making it unhelpful for learning.
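A minimal sketch of the perceptron's online update rule, assuming X is a 2-D NumPy array of features and y holds ±1 labels; the example order is randomized each pass:

```python
import numpy as np

def perceptron_train(X, y, epochs=10):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        idx = np.random.permutation(len(y))       # randomize the data order
        for i in idx:
            pred = 1 if X[i] @ w + b >= 0 else -1
            if pred != y[i]:                      # update only on mistakes
                w += y[i] * X[i]                  # add/subtract the feature values
                b += y[i]                         # nudge the bias
    return w, b
```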
Logistic regression: unlike the perceptron, it uses probabilities to estimate class likelihoods and minimizes errors through a loss function, typically trained using gradient descent. It calculates the logit, the logarithm of the odds ratio, showing the relative likelihood of the positive class versus the negative class; it maps probabilities (0 to 1) to real numbers (minus infinity to plus infinity). So if w·x + b = 3.0 (the logit), then with e ≈ 2.718: e^3 ≈ 20.08, so e^−3 ≈ 1/20.08 ≈ 0.0498, and 1/(1 + 0.0498) ≈ 0.95. Inverse logit (sigmoid): transforms real numbers (negative infinity to positive infinity) into probabilities (0 to 1), suitable for binary classification. The perceptron only updates weights when there is an error; logistic regression updates weights based on the difference between predicted probabilities and true labels. To prevent overfitting, add a regularization term (L2 regularization) that penalizes large weights by adding a penalty to the loss function, controlled by a hyperparameter (alpha): larger alpha values encourage simpler models with smaller weights, while smaller values allow the model to fit the data more closely. SVM: uses hinge loss (a similar kind of loss function) to create a decision boundary that maximizes the margin between classes. (Logistic regression predicts probabilities (output between 0 and 1); linear regression predicts continuous numerical values.)
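The worked example above, checked in code; the sigmoid (inverse logit) maps the logit 3.0 to roughly 0.95:

```python
import math

def sigmoid(logit):
    # Inverse logit: maps any real number to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

logit = 3.0                  # w·x + b for some example
print(math.exp(logit))       # ≈ 20.09
print(sigmoid(logit))        # ≈ 0.953
```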