TML Cheat Sheet
Definition of machine learning:
Machine learning algorithms build a model based on sample data, known as training data, in order to
make predictions or decisions on new unseen data, without being explicitly programmed to do so.
Supervised:
Classification:
- Evaluation: Accuracy, Precision, Recall, F1 (see the sketch after this list)
- Classification: Predicting discrete labels
Regression:
- Evaluation: RMSE (Root mean squared error), R2
- Regression: Predicting continuous labels
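A minimal sketch of computing the classification metrics above with scikit-learn (the toy labels are made up for illustration):
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]  # toy ground-truth labels (made up)
y_pred = [1, 0, 0, 1, 1]  # toy predictions (made up)

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall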
Unsupervised:
Clustering:
- K-means (see the sketch after this list)
- Agglomerative clustering
- DBSCAN
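A minimal clustering sketch with scikit-learn's KMeans (the 2-D points and n_clusters=2 are arbitrary choices for illustration):
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [10, 2], [10, 4]])  # toy 2-D points (made up)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment per point
print(km.cluster_centers_)  # learned centroids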
The sigmoid or logistic function:
- the domain (of a function) describes which values you can put into the function
- the range (of a function) describes which values the function can take
- for the sigmoid, the domain is all real numbers and the range is (0, 1)
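A minimal sketch of the sigmoid, sigma(x) = 1 / (1 + e^(-x)); it accepts any real x and always returns a value strictly between 0 and 1:
import numpy as np

def sigmoid(x):
    # maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))                       # 0.5
print(sigmoid(np.array([-10, 0, 10])))  # close to 0 and 1 at the extremes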
RMSE:
Root mean squared error:
- Evaluation metric for regression
- The mean squared error is the mean of the squared differences between the predictions and
the true values; RMSE takes the square root of this.
- When assessing how well a model fits a dataset, we use the RMSE more often because it is
measured in the same units as the response variable.
- When you predict the mean, the MSE is the mean squared distance to the mean, i.e. the
variance. The RMSE then equals the square root of the variance, which we call the std.
- Error measure: so lower is better.
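In formula form, with y_i the true values and \hat{y}_i the predictions over N samples:
\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 }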
# Pure-Python version
from math import sqrt
def rmse(A, B):
    N = len(A)
    return sqrt(sum((A[i] - B[i])**2 for i in range(N)) / N)

# NumPy version
import numpy as np
def rmse(A, B):
    A, B = np.array(A), np.array(B)
    return np.sqrt(((A - B)**2).mean())

# scikit-learn version (y_true and y_pred defined elsewhere)
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
MSE:
1. The reason that the MSE is sometimes calculated as half of the mean squared error is that
this makes the derivative of the cost function less complicated: the factor 2 from
differentiating the square and the division by two cancel each other out (see the sketch
after this list). Since the cost function is still convex, this change does not affect the outcome.
2. The general trend is that if we discard too many points, our estimator will not have enough
data to make an accurate estimate, and the more data we allow, the more accurate it
becomes. However, at some point the outliers start to affect our result, and we again lose
some accuracy in the predictions (see the trimming sketch after this list).
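For point 1, a one-line sketch of the cancellation, for a single prediction h and target y:
\frac{d}{dh} \left[ \tfrac{1}{2} (h - y)^2 \right] = h - y
Without the factor 1/2 the derivative would be 2(h - y), which only rescales the gradient and leaves the location of the minimum unchanged.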
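For point 2, a minimal sketch of the trade-off using a trimmed mean (the data and trimming fractions are made up for illustration):
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# mostly well-behaved data plus a few extreme outliers (made up)
data = np.concatenate([rng.normal(0, 1, 1000), [50.0, 60.0, 70.0]])

for frac in [0.0, 0.01, 0.25, 0.45]:
    # trim_mean drops the given fraction from both tails before averaging;
    # frac=0.0 keeps the outliers, very large fractions keep little data
    print(frac, stats.trim_mean(data, frac))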
Baseline:
→ A predictor "learned" using only global information
→ Your predictor is only based on "prejudice"
→ Rule-based predictor
The mode, or majority class, is a good baseline for 'geslacht' (gender) of an IK student.
A baseline for the tip given could, for example, use the mean or median.
median → the middle value of the dataset, with half of the cases above the median and the
other half below it
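A minimal sketch of such baselines with scikit-learn's dummy estimators (the toy data is made up; 'geslacht' and the tip amounts are just stand-ins):
from sklearn.dummy import DummyClassifier, DummyRegressor

X = [[0], [0], [0], [0]]           # dummy features; these baselines ignore them
y_geslacht = ["m", "m", "f", "m"]  # toy labels (made up)
y_fooi = [1.0, 2.0, 2.5, 10.0]     # toy tip amounts (made up)

# majority-class baseline: always predicts the most frequent label
print(DummyClassifier(strategy="most_frequent").fit(X, y_geslacht).predict([[0]]))

# mean / median baselines: always predict the mean or the median of the target
print(DummyRegressor(strategy="mean").fit(X, y_fooi).predict([[0]]))
print(DummyRegressor(strategy="median").fit(X, y_fooi).predict([[0]]))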