Business Intelligence & Data Mining
Exam 1 Questions with Complete
Solutions
Class Levels Count Threshold - Answer-change this to "2" so that interval variables
with only 2 levels (1/2, 0/1) will be automatically assigned the level of binary.
Reject level count threshold - Answer-detects the number of class levels of nominal
variables and assigns a role of Rejected to those with class counts above the selected
threshold (default=20); for example, a column of first names of 1000 students would be
rejected.
When we determine if a variable should be input, rejected, id, etc., we are
determining the variable's - Answer-role
When we are determining the variable type we are determining the - Answer-
Measurement Level
Nominal, Interval, Binary
Alphanumeric - Answer-2 levels--> binary
>20 levels--> rejected
2-20 levels--> nominal
Numeric - Answer-2 levels--> binary variable
>20 levels--> interval
2-20 levels--> nominal
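The level-count rules above can be sketched in Python. This is only an illustration of the rules as written in these notes, not SAS Enterprise Miner's actual implementation; the function name `assign_level_and_role`, the `threshold` parameter, and the default role of "input" are assumptions made for the example.

```python
def assign_level_and_role(values, threshold=20):
    """Return (measurement_level, role) for one column, following the notes' rules."""
    levels = len(set(values))  # number of distinct values in the column
    # Treat the column as numeric only if every value is an int or float.
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if levels == 2:
        return ("binary", "input")
    if levels > threshold:
        # Many levels: numeric columns become interval, alphanumeric ones are rejected.
        return ("interval", "input") if numeric else ("nominal", "rejected")
    return ("nominal", "input")


first_names = [f"name{i}" for i in range(1000)]  # hypothetical data
print(assign_level_and_role([0, 1, 1, 0]))
print(assign_level_and_role(list(range(1, 13))))  # birth months 1..12
print(assign_level_and_role(first_names))
```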
Fixed costs of a direct marketing campaign are variable. - Answer-False;
Fixed costs are a fixed number, not dependent on volume or other factors. Variable
costs change with volume.
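A minimal sketch of the distinction, assuming the usual split of total cost into a fixed part plus a per-piece variable part. The function name and all dollar figures are hypothetical and chosen only for illustration.

```python
def campaign_cost(fixed_cost, cost_per_piece, pieces_mailed):
    """Total cost = fixed cost (independent of volume) + variable cost (scales with volume)."""
    return fixed_cost + cost_per_piece * pieces_mailed


# The fixed part stays at 5000 whether 1,000 or 10,000 pieces are mailed;
# only the variable part grows.
small = campaign_cost(5000, 0.5, 1000)
large = campaign_cost(5000, 0.5, 10000)
print(small, large)
```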
Response modeling scores prospects on when they respond to a direct marketing
campaign. - Answer-False;
Response modeling scores prospects on their likelihood to respond to a direct
marketing campaign.
Which of the following would be considered nominal variables?
a. the birth month of a customer
b. the high and low temperatures for a given location over 5 years
c. the actors' roles in a play
d. the total sales amount ($'s) of a customer's dinner check
e. children's shoe sizes - Answer-a. the birth month of a customer
Although month is a number, it represents a category. Also, there are fewer than 20
levels, so it would be considered nominal by SAS.
c. the actors' roles in a play
A role represents a category. Also, there are fewer than 20 levels, so it would be
considered nominal by SAS.
e. children's shoe sizes
Although children's shoe size is a number, it represents a category. Also, there are
fewer than 20 levels, so it would be considered nominal by SAS.
Which is more dangerous?
Select one:
a. Learning things that are not true
b. Learning things that are true but not useful - Answer-a. Learning things that are
not true
True or False:
Learning things that are true but not useful can serve as a validation that the data
mining techniques are working and the data is reasonably accurate. - Answer-True
An insurance company tracks the total insurance claim amount (in dollars) for all
customer claims using a variable called Amount. How would you display the
distribution of the variable Amount?
Select one:
a. Use a bar chart with Amount given the role of X.
b. Use a histogram with Amount given the role of X.
c. Use a histogram with Amount given the role of Category
d. Use a bar chart with Amount given the role of Category - Answer-Use a histogram
with Amount given the role of X.
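To see why a histogram suits an interval variable like Amount, here is a minimal sketch in plain Python that groups dollar amounts into equal-width bins and counts them. The claim amounts and the bin width of 1000 are made up for the example, and the helper name `histogram_counts` is an assumption; a real tool would draw the bars rather than print them.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


amounts = [120.0, 180.5, 950.0, 1020.0, 1100.0, 4300.0]  # hypothetical claims
for lower, n in histogram_counts(amounts, 1000).items():
    print(f"{lower:>6.0f}-{lower + 1000:<6.0f} {'#' * n}")
```

A bar chart would instead count each distinct value as its own category, which is not informative for a continuous dollar amount. Note that this sketch lists only bins that contain at least one value.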
True or False:
The most important difference between directed and undirected data mining is that
undirected data mining does not use a specific target variable. - Answer-True
True or False:
Undirected data mining requires even more human understanding than directed
techniques. - Answer-True
One advantage of undirected data mining is that it can be fully automated. - Answer-
False
In geometric terms, clusters should be widely spaced from each other. - Answer-
True;
Cluster members should be close to each other. Clusters, themselves, should be
widely spaced.
k-means clustering is one of the most popular techniques for clustering analysis. -
Answer-True;
K-means clustering is one of the most popular techniques for clustering analysis.
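As an illustration of how k-means produces tight, well-separated clusters, here is a minimal pure-Python sketch of the standard assign-then-update loop (Lloyd's algorithm). The sample points, the fixed iteration count, and the naive choice of the first k points as starting centroids are assumptions for the example; this is not how SAS or any particular library implements it.

```python
import math


def kmeans(points, k, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points; repeat a fixed number of times."""
    centroids = [points[i] for i in range(k)]  # assumption: naive initialisation
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two hypothetical groups: one near (1, 1), one near (9, 9).
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.0), (9.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Members of each cluster end up close to their own centroid, while the two centroids sit far apart, which matches the geometric description above.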