DATA MINING FINAL EXAM REVIEW
QUESTIONS WITH COMPLETE ANSWERS
What is the best method we have learned so far to handle missing data in a data set?
a. Replace missing data with the mode of an attribute
b. Remove the columns containing missing data
c. Replace all of the missing values with zero
d. Replace all of the missing values with one - Answer-Replace missing data with the
mode of an attribute
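A minimal sketch of mode replacement in plain Python, assuming missing entries are stored as None. The function name fill_missing_with_mode and the sample data are hypothetical and not taken from the course material.

```python
from statistics import mode


def fill_missing_with_mode(values, missing=None):
    """Return a copy of values with each missing entry replaced by the mode."""
    # Only observed (non-missing) entries take part in finding the mode.
    observed = [v for v in values if v is not missing]
    if not observed:
        # Nothing to learn the mode from, so return the data unchanged.
        return list(values)
    most_common = mode(observed)
    return [most_common if v is missing else v for v in values]


# Hypothetical attribute with two missing entries.
colors = ["red", None, "blue", "red", None]
filled = fill_missing_with_mode(colors)
print(filled)
```

Unlike filling with zero or one, the mode is always a value that already occurs in the attribute, so it also works for non-numeric attributes.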
In the logistic regression method, what should be the minimum confidence level in
order for the results to be dependable?
a. .10%
b. 90%
c. .05%
d. .95% - Answer-.95%
Logistic regression is a(n) ____ model
a. prediction
b. expectation
c. classification
d. differentiation - Answer-prediction
Linear regression is a(n) ____ model
a. classification
b. expectation
c. prediction
d. differentiation - Answer-prediction
Midterm one - Answer-1-5
Which of the following companies is not among those that developed the data mining
process we use in class?
a. NCR
b. SPSS
c. Microsoft
d. Daimler-Benz - Answer-Microsoft
What is the name of the software tool we use to retrieve data from a database?
a. query
b. correlation
c. data warehouse
d. data mart - Answer-Query
How many data sets do you need in order to implement linear regression?
a. one
b. two
c. three
d. four - Answer-two
In the linear regression method, one needs to designate the attribute to be predicted
as the 'label'. Which of the following RapidMiner operators can be used in order to do
that?
a. set label
b. set role
c. apply model
d. apply label - Answer-set role
In the linear regression method, the ranges for all attributes in the scoring data must
be within the ranges for the corresponding attributes in the training data. Which of the
following RapidMiner operators can be used in order to match the ranges?
a. filter examples
b. filter range
c. set examples
d. set ranges - Answer-filter examples
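The range-matching idea can be sketched in plain Python. This is an illustration of the concept only, not the RapidMiner operator itself; the helper names training_ranges and filter_to_ranges and the sample rows are hypothetical.

```python
def training_ranges(training_rows):
    """Map each attribute to its (min, max) range in the training data."""
    attributes = training_rows[0].keys()
    return {
        a: (min(r[a] for r in training_rows), max(r[a] for r in training_rows))
        for a in attributes
    }


def filter_to_ranges(scoring_rows, ranges):
    """Keep only scoring rows whose attributes all lie inside the training ranges."""
    return [
        row
        for row in scoring_rows
        if all(low <= row[a] <= high for a, (low, high) in ranges.items())
    ]


# Hypothetical training and scoring data sets.
training = [{"age": 20, "income": 30}, {"age": 50, "income": 90}]
scoring = [
    {"age": 25, "income": 40},  # inside both ranges
    {"age": 60, "income": 50},  # age above the training maximum
    {"age": 30, "income": 95},  # income above the training maximum
]

kept = filter_to_ranges(scoring, training_ranges(training))
print(kept)
```

Rows outside the training ranges are dropped because the model has seen no examples there, so its predictions for them would be extrapolations.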
Any value that is more than ____ standard deviations below (or above) the
mean is considered inconsistent
a. one
b. two
c. three
d. four - Answer-two
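A small sketch of the two-standard-deviation rule, assuming the sample standard deviation is meant (the source does not say whether sample or population is used). The function name inconsistent_values and the data are hypothetical.

```python
from statistics import mean, stdev


def inconsistent_values(values, k=2):
    """Return the values lying more than k standard deviations from the mean."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (assumption)
    return [v for v in values if abs(v - center) > k * spread]


# Hypothetical attribute values with one obvious outlier.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 50]
flagged = inconsistent_values(data)
print(flagged)
```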
What is the mathematical formula for multiple linear regression? - Answer-Y =
m_1 x_1 + m_2 x_2 + ... + m_n x_n + b
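To make the formula concrete, here is a minimal sketch that evaluates it for one observation. The function name predict and the coefficient values are made up for illustration; in practice the coefficients m_i and the intercept b come from fitting the model on training data.

```python
def predict(coefficients, intercept, attributes):
    """Evaluate Y = m_1*x_1 + ... + m_n*x_n + b for one observation."""
    if len(coefficients) != len(attributes):
        raise ValueError("need exactly one coefficient per attribute")
    return sum(m * x for m, x in zip(coefficients, attributes)) + intercept


# Hypothetical model with two attributes: Y = 2.0*x1 + 0.5*x2 + 1.0
y = predict([2.0, 0.5], 1.0, [3.0, 4.0])
print(y)  # 2.0*3.0 + 0.5*4.0 + 1.0 = 9.0
```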
What does the letter 'k' in k-means clustering stand for?
a. number of groups
b. number of attributes
c. number of correlations
d. number of observations - Answer-number of groups
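The role of k can be seen in a bare-bones k-means sketch. This is an illustrative plain-Python version, not the RapidMiner operator; starting from the first k points as centroids is an assumption made here to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters; return (centroids, clusters)."""
    # Assumption: use the first k points as the starting centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D observations forming two obvious groups, so k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Changing k changes how many groups the same observations are split into, which is why k is chosen before the algorithm runs.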