QMB3302 Final Exam 2
Features of a model - answer represent your inputs
problems with the hold-out method for validation - answer model is sensitive to outliers
model is not built on all of the training data
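A minimal sketch of a hold-out split in plain Python, to show why the model never sees part of the data. The toy data, the 80/20 fraction, and the helper name `hold_out_split` are assumptions for illustration; in practice scikit-learn's `train_test_split` does this job.

```python
import random

def hold_out_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = rows[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))              # toy data: ten observations
train, test = hold_out_split(data)

# The model would be fit on `train` only; `test` is held back for validation.
print(len(train), len(test))
```

Because only one split is made, a single outlier landing in the small test set can noticeably move the validation score.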
Pipelines are useful for the following reasons - answer make it easy to change small
things in your model (variables to include)
help organize code used to clean and treat data
make it easy to repeat steps and run multiple models
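The pipeline idea can be sketched as an ordered list of steps applied one after another. This is a plain-Python stand-in, not scikit-learn's `Pipeline` class; the step names `drop_missing` and `scale_to_unit` and the toy values are hypothetical.

```python
def drop_missing(rows):
    """Cleaning step: remove missing observations."""
    return [r for r in rows if r is not None]

def scale_to_unit(rows):
    """Treatment step: divide every value by the largest value."""
    top = max(rows)
    return [r / top for r in rows]

def run_pipeline(steps, rows):
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        rows = step(rows)
    return rows

raw = [4.0, None, 2.0, 8.0]
cleaned = run_pipeline([drop_missing, scale_to_unit], raw)
print(cleaned)
```

Changing a small thing (dropping or swapping a step) means editing one list, and the same list of steps can be rerun for several models.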
X values are ____________, and y value(s) are something we are trying to
___________ - answer features, predict
y hat - answer predicted value of y
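A small sketch of features, target, and y hat, assuming a one-feature least-squares line fitted by hand. The data and the helper name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

X = [1.0, 2.0, 3.0, 4.0]    # feature values (inputs)
y = [3.0, 5.0, 7.0, 9.0]    # target values we are trying to predict

slope, intercept = fit_line(X, y)
y_hat = [slope * x + intercept for x in X]   # predicted values of y
print(y_hat)
```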
what is a good model fit value - answer Unknowable without knowing/understanding the
context and the domain.
first variable in decision tree - answer root
last variable in decision tree - answer terminal node
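One way to picture root and terminal nodes is a tree stored as nested dictionaries. The feature names, thresholds, and labels below are hypothetical, and this is only a sketch of how a fitted tree is read, not how one is trained.

```python
# The outermost dict is the root; dicts with a "predict" key are terminal nodes.
tree = {
    "feature": "income", "threshold": 50,
    "left": {"predict": "no"},
    "right": {
        "feature": "age", "threshold": 30,
        "left": {"predict": "no"},
        "right": {"predict": "yes"},
    },
}

def predict(node, row):
    """Walk from the root down to a terminal node and return its prediction."""
    while "predict" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["predict"]

print(predict(tree, {"income": 80, "age": 40}))
```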
best source of info on parameters that can be used in models (e.g., random forest) -
answer the scikit-learn documentation
do you need to run a linear regression to understand a decision tree? - answer No
what types of problems can random forests be run on - answer classification and
regression
random forests are ______ interpretable than decision trees - answer less
Unsupervised learning methods - answer
Elbow Plot - answer gives an estimate of how many clusters you need based on the
curvature of the plot; not exact
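The numbers behind an elbow plot can be sketched by running a simple clustering for several values of k and recording the within-cluster squared distance (often called inertia). This is a hand-rolled 1-D k-means with a deterministic starting rule, both assumptions made for repeatability; the data are made up.

```python
def k_means_1d(values, k, max_iter=100):
    """Return (centroids, inertia) for a simple 1-D k-means run."""
    ordered = sorted(values)
    # Assumed starting rule: k evenly spaced data points (not a random start).
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # New centroid = mean of its group (keep the old one if the group is empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return centroids, inertia

values = [1, 2, 3, 11, 12, 13, 21, 22, 23]   # three visible groups
inertias = [k_means_1d(values, k)[1] for k in range(1, 5)]
for k, inertia in zip(range(1, 5), inertias):
    print(k, round(inertia, 2))
```

Plotting inertia against k would show a sharp drop up to k = 3 and very little improvement after that, which is the bend the elbow plot is read for.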
Silhouette score - answer some indication of how far away a point is from other clusters
what is a good silhouette score - answer a higher number (scores range from -1 to 1)
0 would be average
what kind of data does a silhouette score use - answer array (not pandas)
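A sketch of how a silhouette score is computed, assuming at least two clusters. For each point, a is the mean distance to its own cluster and b is the mean distance to the nearest other cluster; the point's score is (b - a) / max(a, b). The helper name `mean_silhouette` and the toy points are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_silhouette(points, labels):
    """Average silhouette value over all points (assumes two or more clusters)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_label = {}
        for j, q in enumerate(points):
            if i != j:
                by_label.setdefault(labels[j], []).append(euclidean(p, q))
        own = by_label.pop(labels[i], None)
        if not own:                 # a point alone in its cluster scores 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # own cluster
        b = min(sum(d) / len(d) for d in by_label.values())      # nearest other
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0,), (1,), (10,), (11,)]
good = mean_silhouette(points, [0, 0, 1, 1])   # sensible grouping
bad = mean_silhouette(points, [0, 1, 0, 1])    # mixed-up grouping
print(round(good, 3), round(bad, 3))
```

The sensible grouping scores close to 1 and the mixed-up grouping scores below 0, which is why a higher number is better.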
clustering algos do what - answer measure distance between observations
distance does not necessarily mean two values or dimensions
Euclidean distance - answer the straight-line distance, or shortest possible path, between
two points
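Euclidean distance works the same way for any number of dimensions: square the difference in each dimension, add them up, and take the square root. A short sketch with made-up points:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean((0, 0), (3, 4)))              # two dimensions
print(euclidean((1, 2, 3, 4), (1, 2, 3, 6)))  # four dimensions
```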
what is K in clustering - answer how many groups you want
centroids - answer points dropped among the data in order to measure distance.
points are assigned to the cluster of the nearest centroid, and the mean of those points creates a new centroid
when centroids don't change upon repetition it is called - answer convergence
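The assign-then-average loop described in these cards can be sketched as below. The starting centroids are passed in by hand (an assumption to keep the run repeatable; real implementations usually pick them at random), and K is simply the number of starting centroids.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, centroids, max_iter=100):
    """Repeat assignment and averaging until the centroids stop moving."""
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: the mean of each cluster becomes the new centroid.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)   # keep a centroid that attracted no points
        # Step 3: convergence means the centroids did not change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])   # K = 2
print(centroids)
```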
regression analysis - answer An analytic technique where a series of input variables are
examined in relation to their corresponding output results in order to develop a
mathematical or statistical relationship.
regression analysis methods - answer linear
nonlinear
hierarchical clustering
Naive Bayes Classifier - answer predicts the probability of a certain outcome based on
prior occurrences of related events
fast and simple classification algorithms
are naive Bayes classifiers suitable for high- or low-dimensional datasets? - answer high
generative model - answer an unsupervised model that predicts how likely a given
example is, e.g., predicting the next word in a sentence
P(L | features) - answer P(features | L) P(L) / P(features)
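A worked instance of the formula with made-up numbers: the label L is "spam" and the feature is "the message contains a certain word". All probabilities below are hypothetical.

```python
def posterior(p_features_given_label, p_label, p_features):
    """Bayes' rule: P(L | features) = P(features | L) * P(L) / P(features)."""
    return p_features_given_label * p_label / p_features

p_spam = 0.2                # P(L): prior probability of spam
p_word_given_spam = 0.5     # P(features | L)
p_word_given_ham = 0.05     # P(features | not L)

# P(features): total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_word_given_spam, p_spam, p_word)
print(round(p_spam_given_word, 3))
```

Seeing the word raises the probability of spam from the prior of 0.2 to roughly 0.71.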
neural network model - answer inspired by the way the brain stores and processes info
input layer (x)
hidden layer
Output layer (Y)
hidden layer is - answer what should be applied to the x to equal the y
deep neural networks are - answer networks with more than one hidden layer
hidden layers do what - answer apply weights, which are initially chosen at random
the number of weights is equal to the number of synapses
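A minimal sketch of one forward pass through input, hidden, and output layers with randomly chosen starting weights. The layer sizes (3 inputs, 4 hidden units, 1 output), the ReLU activation, and the omission of bias terms and training are all assumptions made to keep the example short.

```python
import random

def make_weights(n_in, n_out, rng):
    """One weight per connection (synapse) between two layers, chosen at random."""
    return [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]

def weighted_sums(inputs, weights):
    """Multiply each input by its weights and add up the results per output unit."""
    n_out = len(weights[0])
    return [sum(inputs[i] * weights[i][j] for i in range(len(inputs))) for j in range(n_out)]

def relu(values):
    """Assumed activation: negative sums become 0."""
    return [max(0.0, v) for v in values]

rng = random.Random(0)
w_hidden = make_weights(3, 4, rng)   # input layer -> hidden layer
w_output = make_weights(4, 1, rng)   # hidden layer -> output layer

x = [0.5, -1.0, 2.0]                           # input layer (x)
hidden = relu(weighted_sums(x, w_hidden))      # hidden layer
y_hat = weighted_sums(hidden, w_output)        # output layer (y)

n_weights = sum(len(row) for row in w_hidden) + sum(len(row) for row in w_output)
print(n_weights)
```

The weight count matches the number of connections: 3 x 4 between input and hidden, plus 4 x 1 between hidden and output.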