DATA MINING EXAM REVIEW GUIDE
QUESTIONS WITH VERIFIED ANSWERS
True or False: A good predictive model is one that fits the data closely. - Answer-
False: A good predictive model predicts new cases accurately, whereas an
explanatory model fits data closely.
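To make the distinction concrete, here is a minimal Python sketch. It assumes synthetic data (y = 2x plus random noise) and a deliberately naive "memorizer" model (a 1-nearest-neighbour lookup; the helper names are hypothetical). The memorizer fits the training rows perfectly but still makes errors on new cases, which is why a predictive model is judged on data it has not seen.

```python
import random

# Hypothetical data: y = 2x plus Gaussian noise (an assumption for illustration)
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(40)]
data = [(x, 2 * x + rng.gauss(0, 2)) for x in xs]
train, valid = data[:30], data[30:]

def memorizer_predict(x, train_rows):
    """1-nearest-neighbour 'memorizer': return the y of the closest training x."""
    return min(train_rows, key=lambda row: abs(row[0] - x))[1]

def mse(rows, predict):
    """Mean squared error of predict() over (x, y) rows."""
    return sum((y - predict(x)) ** 2 for x, y in rows) / len(rows)

train_mse = mse(train, lambda x: memorizer_predict(x, train))
valid_mse = mse(valid, lambda x: memorizer_predict(x, train))
print(f"training MSE:   {train_mse:.2f}")   # 0.00: each training point is reproduced exactly
print(f"validation MSE: {valid_mse:.2f}")   # positive: new cases are not predicted perfectly
```

A close fit to the training data (zero error here) says little about accuracy on new cases; the validation error is the number that matters for prediction.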
Categorical is one type of variable. Which of the following is not a categorical
variable?
A. Hair color
B. Gender
C. Integer
D. Political affiliation - Answer-C: There are two types of variables, categorical and
numeric. Categorical variables can be ordered (low, medium, high) or unordered (male or
female). Numeric variables are either continuous or integer-valued, so an integer is
numeric, not categorical.
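As an illustration not taken from the original answer, the sketch below shows one common way to prepare categorical variables for a model, assuming plain Python lists as input: ordered categories are mapped to ranks that preserve their order, and unordered categories are one-hot (dummy) encoded. The names `size_rank` and `one_hot` are hypothetical.

```python
# Ordered categorical: map each level to a rank that preserves its order
size_levels = ["low", "medium", "high"]
size_rank = {level: rank for rank, level in enumerate(size_levels)}

def one_hot(values):
    """Unordered categorical: one 0/1 (dummy) column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

ratings = ["medium", "low", "high"]
print([size_rank[r] for r in ratings])   # [1, 0, 2]

hair_color = ["brown", "black", "brown", "red"]
categories, rows = one_hot(hair_color)
print(categories)   # ['black', 'brown', 'red']
print(rows)         # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Ranks keep the order low < medium < high, while one-hot columns avoid implying an order that hair color does not have.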
True/False: The equation that describes how y is related to x and the error term is
called the regression model. - Answer-True: The simple linear regression model is
y = β0 + β1x + ε. β0 and β1 are called the parameters of the model, and ε is a random
variable called the error term.
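A minimal sketch of how the parameters β0 and β1 can be estimated from data by ordinary least squares, assuming a small hypothetical dataset; `fit_simple_linear_regression` is an illustrative name, not a library function. It uses the standard estimates b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (b0, b1) of beta0 and beta1 in y = beta0 + beta1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x, so every error term is zero here
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)   # 1.0 2.0
```

With real, noisy data the same formulas give the line that minimizes the sum of squared errors.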
True or False: Data mining is a scientific approach to managerial decision making in
which raw data are processed and manipulated to produce meaningful information. -
Answer-True: To support those decisions, data must be extracted from large data
sets and analyzed to detect meaningful patterns, rules, correlations, and trends.
The variable being predicted is called the ___, while the variables used to predict
its value are called the ___.
A: Independent variable, denoted by y / Dependent variables, denoted by x
B: Dependent variable, denoted by x / Independent variables, denoted by y
C: Dependent variable, denoted by y / Independent variables, denoted by x
D: Independent variable, denoted by x / Dependent variables, denoted by y - Answer-C:
y depends on x. Understanding the relationship between two or more variables helps
managers make decisions, and regression analysis can be used to develop an
equation showing how the variables are related.
The usefulness of a data mining method depends on _________.
A. The size of the dataset
B. The types of patterns that exist in the data
C. The noisiness of the data
D. The particular goal of the analysis
E. All of the above - Answer-E: Every data mining method has advantages and
disadvantages, so the method best suited to the current goal and data should be
used.
True or False: The goal of unsupervised learning is to segment data into meaningful
segments and detect patterns. - Answer-True: With unsupervised learning there is no
target variable to predict or classify.
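To show what "no target variable" means in practice, here is a minimal k-means clustering sketch in plain Python. The customer data and the `kmeans` helper are hypothetical, and k-means is only one example of an unsupervised segmentation method: it groups rows using their input values alone.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:                        # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical customers: (annual visits, average spend); no outcome column is used
customers = [(1, 20), (2, 22), (1, 18), (10, 95), (11, 100), (9, 90)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The two segments (occasional low spenders and frequent high spenders) emerge from the data itself rather than from a labeled outcome.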
What is the name of the type of regression that relates one dependent variable to
one independent variable?
A) Multiple Linear Regression
B) Simple Linear Regression
C) Logistic Regression
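D) Regression Trees - Answer-B: Simple linear regression uses a single independent
variable; multiple linear regression uses two or more.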
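For contrast with simple linear regression, the sketch below fits a multiple linear regression (two independent variables) by least squares using NumPy's `np.linalg.lstsq`. The data are hypothetical and generated exactly from y = 1 + 2·x1 + 3·x2, so the recovered coefficients should match those values; with a single x column, the same code would fit a simple linear regression.

```python
import numpy as np

# Hypothetical data: two independent variables x1, x2 and y = 1 + 2*x1 + 3*x2 exactly
X = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]], dtype=float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

# Prepend a column of ones so the first coefficient is the intercept
design = np.column_stack([np.ones(len(X)), X])
coef, residuals, rank, singular_values = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 6))   # [1. 2. 3.]
```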
T/F: Regression analysis is a poor way to show the relationship between the
dependent variable and independent variable(s). - Answer-False: Regression
analysis is one of the most widely used methods for describing and quantifying the
relationship between a dependent variable and one or more independent variables.
Out of the six core ideas in data mining, which are associated with unsupervised
learning algorithms?
a.) Association rules, classification, data reduction, data exploration
b.) Data reduction, prediction, data visualization, association rules
c.) Association rules, data visualization, data exploration, data reduction
d.) Prediction, data reduction, data exploration, classification - Answer-C:
Unsupervised learning algorithms are those used where there is no outcome variable
to predict or classify.
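Association rules are one of the unsupervised ideas in answer C. A minimal sketch of the two basic rule measures, support and confidence, assuming hypothetical market-basket transactions stored as Python sets (the function names are illustrative, not from a library).

```python
# Hypothetical market-basket transactions, one set of items per purchase
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Share of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

# Rule: {diapers} -> {beer}
print(support({"diapers", "beer"}))        # 0.6
print(confidence({"diapers"}, {"beer"}))   # 0.75
```

No outcome variable is involved: the rule is found purely from which items co-occur.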
T/F: Training data refers to that portion of the data used to assess how well the
model fits. - Answer-False: Training data refers to that portion of the data used to fit
a model. Validation data refers to that portion of the data used to assess how well
the model fits.
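A minimal sketch of partitioning data before modeling, assuming a 60/40 training/validation split and a fixed random seed (both arbitrary choices for illustration; `partition` is a hypothetical helper). A model would be fit on `train` only, and its accuracy assessed on `valid`.

```python
import random

def partition(rows, train_frac=0.6, seed=42):
    """Randomly split rows into a training set (to fit) and a validation set (to assess)."""
    shuffled = rows[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))                   # stand-in for 100 hypothetical records
train, valid = partition(records)
print(len(train), len(valid))                # 60 40
```

Shuffling before splitting keeps any ordering in the original records (for example, by date) from putting systematically different cases in the two partitions.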
True/False: The first step in trying to reduce the number of predictors should always
be to use domain knowledge. - Answer-True: It is important to understand what the
various predictors measure and why. Domain knowledge lets the analyst drop
irrelevant or redundant predictors and condense the data to a manageable level,
which makes finding a solution much easier.