Linear regression - correct answer ✔✔uses ordinary least squares to allow us to predict a target
(dependent) variable based on one or more input (independent) variables
There are two major issues when linear regression is used to model binary target variables -
correct answer ✔✔1. Non-Conforming Probabilities - Predicted probabilities can be >1 or <0
leading to model interpretation difficulties
2. The variance of error terms (residuals) is not constant (violation of homoscedasticity
assumption), and there are obvious trends in residuals (violation of linearity assumption)
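As a rough illustration of the first issue, the sketch below fits an ordinary least squares line to a binary target and predicts outside the observed input range. The data and the helper name ols_predict are made up for this example.

```python
import numpy as np

# Hypothetical toy data: one input x and a binary target y (0 or 1).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Ordinary least squares fit of y = b0 + b1 * x.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def ols_predict(new_x):
    """Return the fitted straight-line value for each new input."""
    new_x = np.asarray(new_x, dtype=float)
    return coef[0] + coef[1] * new_x

# The straight line keeps going, so predictions for extreme inputs
# fall below 0 or above 1 and cannot be read as probabilities.
preds = ols_predict([-2.0, 12.0])
print(preds)
```

Because the fitted line is unbounded, any sufficiently small or large input produces a "probability" outside the 0 to 1 range.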
A logistic regression (logit) model addresses these issues - correct answer ✔✔• Selects
regression coefficients to force predicted values for Y to fall between 0 and 1
• Produces an s-shaped (sigmoid) curve rather than a straight line to model probabilities
• Selects coefficients using Maximum Likelihood Estimation (MLE) rather than Ordinary Least
Squares (OLS)
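The points above can be sketched in a few lines. This is a minimal illustration, not a production fitting routine: it maximizes the log-likelihood with plain gradient ascent, and the data, learning rate, step count, and function names are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """S-shaped curve that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X, y):
    """Log-likelihood of binary outcomes y under coefficients beta."""
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(X, y, lr=0.1, steps=5000):
    """Approximate the MLE by gradient ascent on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ beta)
        # Gradient of the average log-likelihood with respect to beta.
        beta += lr * X.T @ (y - p) / len(y)
    return beta

# Hypothetical toy data with overlapping classes.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

beta = fit_logistic(X, y)
probs = sigmoid(X @ beta)
print(beta, probs)
```

Every value in probs lies strictly between 0 and 1, which is the property the straight-line model lacks.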
probability - correct answer ✔✔number of outcomes of interest/number of all possible outcomes
odds - correct answer ✔✔p(event occurs)/p(event does not occur)
or
p/(1-p)
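A short sketch of converting between the two quantities, where odds = p / (1 - p). The function names are hypothetical and chosen only for this example.

```python
def odds_from_probability(p):
    """Convert a probability p (0 <= p < 1) to odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Convert non-negative odds back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)

# A probability of 0.5 is even odds (1 to 1);
# a probability of 0.8 is odds of roughly 4 to 1.
even = odds_from_probability(0.5)
four_to_one = odds_from_probability(0.8)
print(even, four_to_one)
```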
Logistic regression model with a binary target predicts the probability of the - correct answer
✔✔desired target occurring
Summary - correct answer ✔✔- Logistic regression is similar to linear regression, except that it
is used with a categorical response
- It can be used for explanatory tasks (=profiling) or predictive tasks (=classification)
- The predictors are related to the response Y via a nonlinear function called the logit
- As in linear regression, reducing predictors can be done via variable selection
- Logistic regression can be generalized to more than two classes
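To make the logit link concrete, here is a minimal sketch assuming a single predictor x and the usual form logit(p) = b0 + b1 * x. The coefficient values are invented for illustration and are not taken from any fitted model.

```python
import math

def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map a log-odds value z back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept b0 and slope b1.
b0, b1 = -3.0, 0.5

def predicted_probability(x):
    """The log-odds are linear in x; the probability is not."""
    return inverse_logit(b0 + b1 * x)

print(predicted_probability(2.0), predicted_probability(6.0))
```

With these assumed coefficients the log-odds are zero at x = 6, so the predicted probability there is 0.5.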
Cluster - correct answer ✔✔A collection of data objects
• Large similarity among objects in the same cluster
• Dissimilarity among objects in different clusters
• Cluster analysis is also known as segmentation
Clustering is an _______ technique - correct answer ✔✔unsupervised: no predetermined
classes (target)
Typical applications of clustering - correct answer ✔✔- A stand-alone analysis, to gain insight on
the data
- A pre-processing step for other predictive models
Good clustering will produce high quality clusters with: - correct answer ✔✔• High intra-class
similarity
• Low inter-class similarity
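One simple way to check these two properties is to compare average distances within and between clusters. The sketch below assumes Euclidean distance as the (dis)similarity measure and uses made-up points and labels; other measures could be substituted.

```python
import numpy as np

# Hypothetical 2-D points already assigned to two clusters.
points = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                   [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2))

def mean_intra_distance(cluster):
    """Average distance over distinct pairs inside one cluster."""
    d = pairwise_distances(cluster, cluster)
    i, j = np.triu_indices(len(cluster), k=1)
    return d[i, j].mean()

def mean_inter_distance(a, b):
    """Average distance between members of two different clusters."""
    return pairwise_distances(a, b).mean()

c0, c1 = points[labels == 0], points[labels == 1]
intra0, intra1 = mean_intra_distance(c0), mean_intra_distance(c1)
inter = mean_inter_distance(c0, c1)
print(intra0, intra1, inter)
```

Small within-cluster distances (high intra-class similarity) together with a large between-cluster distance (low inter-class similarity) suggest well-separated clusters.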
Quality of the clustering depends on - correct answer ✔✔• The similarity measure used
• The implementation
- Quality is also measured by the ability to discover hidden patterns