MINE 272 FINAL EXAM REVIEW
QUESTIONS WITH COMPLETE SOLUTIONS
Logistic regression - Answer-Output is a binary outcome rather than a continuous variable
Based on the logistic function f(y) = e^y / (1 + e^y) for -infinity < y < +infinity
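A minimal sketch of the logistic function and how it turns a continuous score y into a binary outcome. The function names and the 0.5 threshold are illustrative assumptions, not part of the course material.

```python
import math

def logistic(y: float) -> float:
    """Logistic function f(y) = e^y / (1 + e^y), arranged to avoid overflow."""
    if y >= 0:
        # Equivalent form 1 / (1 + e^-y); safe for large positive y.
        return 1.0 / (1.0 + math.exp(-y))
    ey = math.exp(y)
    return ey / (1.0 + ey)

def classify(y: float, threshold: float = 0.5) -> int:
    """Map the score y to class 1 or 0 (threshold is an assumed default)."""
    return 1 if logistic(y) >= threshold else 0
```

The output always lies between 0 and 1, so it can be read as the probability of the positive class.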
Unsupervised learning - Answer-Training data consists of inputs without corresponding
target vectors (outputs)
Ex. clustering (K-means clustering), density estimation
Clustering - Answer-Discovers groups of similar examples within data, any number of
attributes (unsupervised learning)
Ex. ore grades of orebody
Density estimation - Answer-Determine distribution of data within given input space
(unsupervised learning)
Related unsupervised task: converting high dimension data to 1D or 2D (dimensionality reduction)
K-means clustering - Answer-Finds k clusters in dataset with any number of attributes
(unsupervised learning)
Centroid = cluster mean
1. Define K, guess the initial location of each centroid
2. Compute the distance between each point and each centroid (assign each point to its closest centroid)
3. Recompute the centroid of each cluster
4. Repeat 2 and 3 until convergence (centroids stationary or oscillating)
WSS (within-cluster sum of squares) - if going from k to k+1 clusters does not appreciably
reduce WSS, k clusters is suitable
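The four steps above can be sketched in plain Python as follows. This is an illustrative sketch, not a reference implementation: the function name, the max_iter cap, and the handling of empty clusters are assumptions.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means on a list of equal-length tuples.

    `centroids` is the initial guess (step 1), so K = len(centroids).
    Returns (centroids, labels, wss).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # assumption: leave an empty cluster's centroid in place
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # WSS: sum of squared distances from each point to its assigned centroid.
    wss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wss
```

Running this for several values of K and comparing the returned WSS is one way to look for the point where adding a cluster stops helping.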
Time series - Answer-Observations taken over time
Time series analysis - Answer-Analyzing the underlying structure of a time series for
characteristics including:
trends (long-term movement)
seasonality (periodic fluctuations with a fixed period)
cyclic (fluctuations without a fixed period)
random (remaining component of series)
Autoregressive model (AR) - Answer-Order of p
p depends on the autocorrelation lags included in the model
Predict future value based on the past p values
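A minimal sketch of a one-step AR(p) forecast, assuming the coefficients are already known (in practice they are estimated from the data). The function name and the optional constant term are illustrative assumptions.

```python
def ar_predict(history, coeffs, const=0.0):
    """One-step AR(p) forecast: const + sum_i coeffs[i] * x[t-1-i].

    coeffs[0] multiplies the most recent value; p = len(coeffs).
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past values")
    recent = history[-p:][::-1] if p else []  # most recent value first
    return const + sum(c * x for c, x in zip(coeffs, recent))
```

For example, with history [1.0, 2.0, 3.0] and coefficients [0.5, 0.25], the forecast is 0.5 * 3.0 + 0.25 * 2.0 = 2.0.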
Moving average model - Answer-Order of q
Uses the error terms of the previous q values: each is multiplied by a constant and added
Accounts for effect of earlier random shocks on current value
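A minimal sketch of a one-step MA(q) forecast, assuming the past error terms (shocks) and their weights are already known. The function name and the mean parameter are illustrative assumptions.

```python
def ma_predict(errors, thetas, mean=0.0):
    """One-step MA(q) forecast: mean + sum_i thetas[i] * e[t-1-i].

    thetas[0] multiplies the most recent error term; q = len(thetas).
    """
    q = len(thetas)
    if len(errors) < q:
        raise ValueError("need at least q past error terms")
    recent = errors[-q:][::-1] if q else []  # most recent error first
    return mean + sum(th * e for th, e in zip(thetas, recent))
```

Each past shock contributes to the current value only through its weight, which is how the model carries the effect of earlier random shocks forward.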
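ARIMA - Answer-Autoregressive Integrated Moving Average
Order of p, d, q (p = autoregressive order, d = degree of differencing, q = moving average order)
Only input is the series' own history over time (no other explanatory variables), provides
effective short-term forecasting because it accounts for the impact of severe shocks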
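The "Integrated" part of ARIMA is usually understood as differencing the series d times before the AR and MA terms are applied. A minimal sketch of that step, with a hypothetical helper name:

```python
def difference(series, d=1):
    """Apply first-order differencing d times: x[t] - x[t-1]."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out
```

Each pass shortens the series by one value. A series with a quadratic trend, such as [1, 4, 9, 16], becomes constant after two passes.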
Polynomial regression - Answer-Fit curve y(x,w) = w0 + w1x + w2x^2 + ... + wmx^m
m = order of polynomial
Non-linear function of x, linear in the coefficients w
Error function - Answer-Misfit between predicted y and training targets t
Polynomial coefficients determined by fitting to training data
E(w) = 1/2 sum over n of (y(xn,w) - tn)^2
E(w) = 0 if the curve passes exactly through every training point
Choose the set of coefficients w that minimizes the error
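A short sketch of evaluating the polynomial and its sum-of-squares error for a given coefficient set. The function names are illustrative assumptions, and the search for the minimizing w is not shown.

```python
def poly(x, w):
    """y(x, w) = w[0] + w[1]*x + ... + w[m]*x**m, with m = len(w) - 1."""
    return sum(wj * x ** j for j, wj in enumerate(w))

def sum_squares_error(w, xs, ts):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 over the training data."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))
```

With training inputs [0, 1, 2] and targets [1, 3, 7], the coefficients [1, 1, 1] pass through every point and give E(w) = 0. The straight line [1, 2] misses the last point by 2 and gives E(w) = 2.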
Coefficient of determination - Answer-R^2 value - proportion of the variance in the data
explained by the model
R^2 = 1 - SSR/SST
Residual sum of squares - Answer-SSR = sum of (yi - ypredicted,i)^2
Total sum of squares - Answer-SST = sum of (yi - ymean)^2
Explained sum of squares - Answer-SSE = sum of (ypredicted,i - ymean)^2
SSR + SSE = SST (for a least-squares fit with an intercept)
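A small worked sketch of these quantities for a straight-line least-squares fit, taking the reference value in SST and SSE to be the mean of the observed y values. The function names and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def sums_of_squares(ys, preds):
    """Return (SSR, SSE, SST) for observed ys and predicted preds."""
    y_mean = sum(ys) / len(ys)
    ssr = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual
    sse = sum((p - y_mean) ** 2 for p in preds)         # explained
    sst = sum((y - y_mean) ** 2 for y in ys)            # total
    return ssr, sse, sst

def r_squared(ys, preds):
    """R^2 = 1 - SSR/SST."""
    ssr, _, sst = sums_of_squares(ys, preds)
    return 1.0 - ssr / sst

# Made-up example data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
b0, b1 = fit_line(xs, ys)
preds = [b0 + b1 * x for x in xs]
```

For this data the fitted slope is 1.9, SSR is about 0.70, SSE about 18.05 and SST 18.75, so SSR + SSE = SST and R^2 is about 0.963.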
Precision (classifier diagnostic) - Answer-Closeness of measurements to each other
(consistency of outputs)
Precision = TP / (TP + FP), where TP = true positives and FP = false positives
Accuracy (classifier diagnostic) - Answer-Closeness of measurement to true value (how
good output is)
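A minimal sketch of computing both diagnostics from binary labels. The accuracy formula used here, (TP + TN) / (TP + TN + FP + FN), is the standard classifier definition and is not stated in the notes. The function names are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def precision(tp, fp):
    """Precision = TP / (TP + FP); returns 0.0 if nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def accuracy(tp, fp, tn, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0
```

Precision looks only at the predicted positives, while accuracy counts every correct prediction, so the two can differ widely on imbalanced data.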