BI EXAM 2 QUESTIONS AND ANSWERS
Data mining can be very useful in detecting patterns such as credit card fraud, but is of
little help in improving sales. - Answer -False
Data mining can be very useful in detecting patterns such as credit card fraud and -
Answer -is very helpful in increasing sales
If using a mining analogy, _________________ would be a more appropriate term than
"data mining." - Answer -knowledge mining
In the Miami-Dade Police Department case study, predictive analytics helped to identify
- Answer -a holistic view of the world of crime and criminals for better and faster
reaction and management.
(It was NOT used to identify the best schedule to pay the least overtime.)
In the cancer research case study, data mining algorithms that predict cancer
survivability with high predictive power are good replacements for medical
professionals. - Answer -False
(They are good at helping doctors, not replacing them.)
In the Influence Health case study, what was the goal of the system? - Answer -
Increasing service use
All of the following statements about data mining are true EXCEPT - Answer -the
process aspect means that data mining should be a one-step process to results.
Data mining is - Answer --the novel aspect means that previously unknown patterns are
discovered.
-the potentially useful aspect means that results should lead to some business benefit.
-the valid aspect means that the discovered patterns should hold true on new data.
The data field "ethnic group" can be best described as - Answer -nominal data.
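A minimal sketch of what "nominal" implies in practice: the categories are labels with no order, so they are usually encoded as separate indicator columns rather than as ranked numbers. The function name `one_hot` and the sample labels are illustrative assumptions, not from the course material.

```python
def one_hot(values):
    """Encode nominal labels as indicator vectors; the labels carry no order."""
    # Sorting is only for a reproducible column order, not a ranking of categories.
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical group labels, made up for illustration.
categories, encoded = one_hot(["B", "A", "C", "A"])
print(categories)
print(encoded)
```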
Which broad area of data mining applications partitions a collection of objects into
natural groupings with similar features? - Answer -Clustering
Clustering partitions a collection of things into segments whose members share -
Answer -Similar Characteristics
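One way to picture clustering is the sketch below, a tiny one-dimensional k-means. The notes do not name a specific algorithm, so k-means is an assumption here, as are the function name `kmeans_1d`, the starting centers, and the made-up ages.

```python
def kmeans_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: assign each point to its nearest center, then recompute centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members; keep the old center if empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical customer ages, made up for illustration.
ages = [21, 23, 25, 58, 60, 64]
centers, clusters = kmeans_1d(ages, centers=[20, 70])
print(clusters)
```

No labels are supplied: the two segments emerge only because their members have similar values.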
What does the robustness of a data mining method refer to? - Answer -its ability to
overcome noisy data to make somewhat accurate predictions
Why Data Mining? - Answer --More intense competition at the global scale
-Recognition of the value in data sources
-Availability of quality data on customers, vendors, transactions, Web, etc.
-Consolidation and integration of data repositories into data warehouses
-The exponential increase in data processing and storage capabilities, and the decrease in
cost
-Movement toward conversion of information resources into nonphysical form
Definition of Data Mining - Answer -The nontrivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in data stored in structured
databases.
Other names for data mining - Answer --knowledge extraction,
-pattern analysis,
-knowledge discovery,
-information harvesting,
-pattern searching,
-data dredging
Source of data for DM is often a - Answer -consolidated data warehouse (not always!)
DM environment is usually a - Answer -client-server or Web-based information
systems architecture.
Data is the most - Answer -critical ingredient for DM, which may include
soft/unstructured data
The miner is often - Answer -an end user
Striking it rich - Answer -requires creative thinking
Data mining tools' capabilities and ease of use are - Answer -essential
DM extracts ________ from data - Answer -patterns
pattern - Answer -A mathematical (numeric and/or symbolic) relationship among data
items
Types of patterns - Answer --Association
-Prediction
-Cluster (segmentation)
-Sequential (or time series) relationships
Prediction consists of - Answer -Classification
Regression
Time Series
Association consists of - Answer -Market Basket
Link Analysis
Sequence Analysis
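Market basket analysis is commonly summarized with two measures, support and confidence. These measures are not named in the notes above, so treat the following as a hedged sketch; the baskets and the helper names `support` and `confidence` are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical shopping baskets, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(baskets, {"bread", "milk"})
bread_to_milk = confidence(baskets, {"bread"}, {"milk"})
print(bread_milk_support, bread_to_milk)
```

Here bread and milk appear together in 2 of 4 baskets, and 2 of the 3 baskets with bread also contain milk.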
Segmentation consists of - Answer -Clustering
Outlier analysis
Data mining versus statistics - Answer -main difference is that statistics starts with a
well-defined proposition and hypothesis, whereas data mining starts with a loosely
defined discovery statement
Data mining looks for data sets that are as "big" as possible; statistics looks for the right
size of data
Data Mining Process - Answer -•A manifestation of the best practices
•A systematic way to conduct DM projects
•Moving from *Art to Science* for DM projects
Everybody has a different version
Most common standard processes - Answer --CRISP-DM (Cross-Industry Standard
Process for Data Mining)
-SEMMA (Sample, Explore, Modify, Model, and Assess)
-KDD (Knowledge Discovery in Databases)
CRISP-DM stands for - Answer -Cross-Industry Standard Process for Data Mining
CRISP-DM is composed of six consecutive phases - Answer --Step 1: Business
Understanding
-Step 2: Data Understanding
-Step 3: Data Preparation
-Step 4: Model Building
-Step 5: Testing and Evaluation
-Step 6: Deployment
Which steps account for 85% of the total CRISP-DM project time - Answer --Step 1:
Business Understanding
-Step 2: Data Understanding
-Step 3: Data Preparation
Classification - Answer -•Most frequently used DM method
•Part of the machine-learning family
•Employs supervised learning
•Learn from past data, classify new data
•The output variable is categorical (nominal or ordinal) in nature
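The "learn from past data, classify new data" idea can be sketched with a k-nearest-neighbors vote. This is only one of many classification methods and is an assumption here, not something the notes prescribe; the function name `knn_classify`, the feature values, and the risk labels are all made up.

```python
from collections import Counter


def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest labeled examples."""
    by_distance = sorted(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], new_point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Past (labeled) data: the supervision comes from the known class of each example.
past = [
    ((1.0, 1.0), "low risk"),
    ((1.5, 2.0), "low risk"),
    ((2.0, 1.5), "low risk"),
    ((8.0, 8.0), "high risk"),
    ((8.5, 9.0), "high risk"),
    ((9.0, 8.5), "high risk"),
]

first = knn_classify(past, (1.2, 1.4))
second = knn_classify(past, (8.8, 8.7))
print(first, second)
```

The output is a category label rather than a number, which is what separates classification from regression.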