Which statement about the data mining process is INCORRECT? - Answers Data cleaning and
pre-processing is usually a trivial step in the process
According to the data-driven decision-making technology pyramid shown in the following figure,
which statement is FALSE? - Answers The process only moves in one direction (upward) and
higher layers never give feedback to the lower layers
Which statement is FALSE about the data-driven decision-making approach? - Answers It is
loaded with assumptions and theories
Which statement about business intelligence workflow is CORRECT? - Answers Data in the
operational database is transformed to analytical data in the data warehouse
Which of the following is a core idea/task in data mining? - Answers All of the others
"Estimating the repair time required for an aircraft based on a trouble ticket."
Performing this task in data mining requires an unsupervised learning approach. - Answers False
ANOVA is an analysis under which of the following data mining task categories? - Answers Data
exploration
"Learn from the observed records to predict numerical values of unseen records."
In data mining, this is called... - Answers Regression
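As an illustration of regression, here is a minimal sketch that fits a straight line to observed records by ordinary least squares and predicts a numerical value for an unseen record. The function name `fit_line` and the numbers are made up for the example; real projects would typically use a library model instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observed records: a numeric input and a numeric target.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(inputs, targets)

# Predict the numerical value for an unseen record with input 5.0.
predicted = slope * 5.0 + intercept
```

The target values are known during fitting, which is what makes this a supervised task.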
Data exploration includes summary statistics, univariate and bivariate analysis, basic statistical
tests (t-test, correlation), ANOVA, and outlier detection. - Answers True
"Identifying segments of similar customers."
Performing this task in data mining requires a supervised learning approach. - Answers False
Which of the following tasks is an unsupervised learning task? - Answers Grouping customers
based on the similarity in their online behavior
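To make the unsupervised case concrete, the sketch below groups one-dimensional values into two clusters with a tiny k-means loop. No target labels are supplied; the groups come only from similarity. The function `two_means_1d` and the sample values are hypothetical, chosen for illustration.

```python
def two_means_1d(values, iterations=10):
    """Split 1-D values into two groups with a tiny k-means (k = 2)."""
    # Start the two centers at the extremes of the data.
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer center; no labels are used.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical minutes spent online per customer.
minutes_online = [5, 7, 6, 48, 52, 50]
light_users, heavy_users = two_means_1d(minutes_online)
```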
"Learn from the observed records to predict the class value of unseen records."
In data mining, this is called... - Answers Classification
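A minimal sketch of classification, assuming a 1-nearest-neighbor rule: the class of an unseen record is taken from the closest record whose class is already known. The helper `nearest_neighbor_class`, the feature values, and the labels are invented for this example.

```python
def nearest_neighbor_class(train, query):
    """Return the label of the training record closest to `query`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Each training record is a (features, label) pair.
    _, label = min(train, key=lambda rec: squared_distance(rec[0], query))
    return label


# Hypothetical observed records with known class values.
train = [
    ((1.0, 1.0), "safe"),
    ((1.2, 0.8), "safe"),
    ((5.0, 5.5), "dangerous"),
    ((5.5, 4.8), "dangerous"),
]

# Predict the class value of an unseen record.
prediction = nearest_neighbor_class(train, (5.1, 5.0))
```

Because the known labels drive the prediction, this is a supervised approach.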
"Identifying a network data packet as dangerous (virus, hacker attack) based on comparison to
other packets whose threat status is known."
Performing this task in data mining requires a supervised learning approach. - Answers True
"Automated sorting of mail by zip code scanning."
Performing this task in data mining requires an unsupervised learning approach. - Answers False
What is the first phase in the CRISP-DM approach for data mining tasks? - Answers business
understanding
What is the essential element in machine learning algorithms that distinguishes supervised
from unsupervised learning? - Answers In supervised learning models, a target variable is used
in the model, but in unsupervised learning models there is no target to predict
Which of the following tasks is a supervised learning task? - Answers Predicting air pollution
"Predicting whether a company will go bankrupt based on comparing its financial data to those
of similar bankrupt and non-bankrupt firms."
Performing this task in data mining requires an unsupervised learning approach. - Answers
False
Which of the following statements is INCORRECT about imputing missing numerical values? -
Answers A random generator function is one of the best methods of imputing
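For contrast with random filling, here is a minimal sketch of one common imputation strategy: replacing missing numerical values with the median of the observed values. Missing entries are represented as `None`, and the function name `impute_median` and the data are assumptions made for the example.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical numeric column with two missing values.
ages = [23, None, 31, 27, None, 45]
imputed = impute_median(ages)
```

The mean is another common fill value; the median is used here because it is less affected by extreme values.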
Which is NOT one of the primary reasons for discretizing numerical variables? - Answers Higher
accuracy
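Discretization can be sketched as equal-width binning, which is one of several possible schemes (equal-frequency binning is another). The function `equal_width_bins`, the bin labels, and the values are hypothetical, and the sketch assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index from 0 to n_bins - 1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    bins = []
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        bins.append(idx)
    return bins


# Hypothetical incomes (in thousands) turned into three categories.
incomes = [20, 35, 50, 65, 80]
labels = ["low", "medium", "high"]
binned = [labels[i] for i in equal_width_bins(incomes, 3)]
```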
When data is not uniformly distributed and includes outliers, linear normalization is better than
the z-score standardization method. - Answers False
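The sketch below compares the two rescaling methods on made-up data containing one outlier. It assumes "linear normalization" means min-max scaling to the range 0 to 1. Because min-max scaling is defined entirely by the smallest and largest values, a single outlier squeezes the remaining values into a very narrow band. Z-scores use the mean and standard deviation, which an outlier also influences, but they do not force the data into a fixed range.

```python
from statistics import mean, pstdev


def min_max(values):
    """Linear (min-max) normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical data: four typical values and one outlier.
data = [10, 12, 11, 13, 1000]

scaled = min_max(data)        # typical values end up very close to 0
standardized = z_scores(data)  # centered on 0, not limited to [0, 1]
```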
Transforming numerical variables means performing mathematical functions on them and
creating new variables that are better suited for our data mining model. - Answers True
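One example of such a transformation is taking the logarithm of a heavily skewed variable. This is a minimal sketch with invented values; whether a log transform suits a given model depends on the data.

```python
import math

# Hypothetical skewed variable spanning several orders of magnitude.
incomes = [1_000, 10_000, 100_000, 1_000_000]

# New variable: base-10 logarithm, which turns multiplicative gaps into equal steps.
log_incomes = [math.log10(x) for x in incomes]

# Differences between neighbors are now (approximately) constant.
steps = [b - a for a, b in zip(log_incomes, log_incomes[1:])]
```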
In practice, data preprocessing takes a significant portion of data mining projects. - Answers
True
Which of the following is NOT a step in data pre-processing? - Answers Data modeling
Which of the following statements is INCORRECT about the missing values in a data set? -
Answers The best strategy is always to drop records with any missing values
Which of the following tasks is NOT included in the data preprocessing phase? - Answers
Performance Evaluation
The data dictionary is meta-data, which is data about data. - Answers True
In the data preparation step, normalizing numerical data is a popular method to transform
variables into a more suitable scale for modeling. - Answers True
In statistics and data mining, "a statistical measure of the strength of the relationship between
the relative changes of 2 variables" is called... - Answers Correlation Coefficient
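As a worked example, the Pearson correlation coefficient can be computed directly from its definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the data are made up for illustration, and the sketch assumes neither variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Perfectly increasing and perfectly decreasing linear relationships.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.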