If the number of observations with missing values is small,
______________ may be a reasonable option. - ANSWERS-throwing
out these incomplete observations
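A minimal sketch of discarding incomplete observations, in plain Python. The records, the field names, and the use of `None` as the missing-value marker are hypothetical and chosen only for illustration.

```python
# Hypothetical records; None marks a missing value.
observations = [
    {"customer": "A", "age": 34, "sales": 120.0},
    {"customer": "B", "age": None, "sales": 85.5},
    {"customer": "C", "age": 51, "sales": 240.0},
    {"customer": "D", "age": 29, "sales": 310.0},
    {"customer": "E", "age": 45, "sales": 199.9},
]


def is_complete(row):
    """Return True when no field in the row is missing."""
    return all(value is not None for value in row.values())


# Keep only the complete observations and report how much was discarded.
complete = [row for row in observations if is_complete(row)]
share_dropped = 1 - len(complete) / len(observations)

print(f"kept {len(complete)} of {len(observations)} observations")
print(f"share dropped: {share_dropped:.0%}")
```

Checking the share dropped first is one way to judge whether the number of incomplete observations really is small.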
Methods to uncover data quality issues and outliers - ANSWERS-
Examine the variables in the data set by means of summary statistics,
histograms, PivotTables, scatter plots, and other tools
Example: Negative values for sales may result from a data entry error
or may actually denote a missing value.
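As an illustration of examining a variable with summary statistics, the sketch below summarizes a hypothetical sales column and flags negative values for follow-up. The data and the "negative means suspect" rule are assumptions made for this example only.

```python
import statistics

# Hypothetical sales figures; the -1.0 entry imitates a data entry
# error or a code that stands for a missing value.
sales = [120.0, 85.5, 240.0, -1.0, 199.9, 310.0, 95.0]

summary = {
    "n": len(sales),
    "min": min(sales),
    "max": max(sales),
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# Negative sales are implausible, so list them (with position) for review.
suspect = [(index, value) for index, value in enumerate(sales) if value < 0]
print("suspect values:", suspect)
```

A minimum below zero in the summary is the first hint; the flagged list shows which observations to investigate.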
Conservative approach to identification of outliers and erroneous
data - ANSWERS-• Create two data sets, one with and one without
outliers, and then construct a model on both data sets.
• If a model's implications depend on the inclusion or exclusion of
outliers, then one should spend additional time to track down the
cause of the outliers.
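The two-data-set approach can be sketched as follows, using a simple least-squares line as the model. The data points, the choice of a linear model, and the set of indices flagged as outliers are all hypothetical; in practice the flags would come from an earlier screening step.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (x, y) observations; the last one is a suspected outlier.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (5, 9.9), (6, 40.0)]
outlier_indices = {5}  # assumed to be flagged beforehand

with_outliers = data
without_outliers = [p for i, p in enumerate(data) if i not in outlier_indices]

slope_with, _ = fit_line(*zip(*with_outliers))
slope_without, _ = fit_line(*zip(*without_outliers))

print(f"slope with outliers:    {slope_with:.2f}")
print(f"slope without outliers: {slope_without:.2f}")
```

If the two fitted slopes differ materially, as they do for this toy data, the model's implications depend on the outlier and its cause deserves a closer look.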
Variable representation - ANSWERS-• In many data-mining
applications, the number of variables for which data are recorded may
be too large to analyze practically.
Dimension Reduction - ANSWERS-Process of removing variables
from the analysis without losing any crucial information
Methods of dimension reduction - ANSWERS-• examine pairwise
correlations to detect variables or groups of variables that may supply
similar information
• such variables can be aggregated or removed to allow more
parsimonious model development.
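A minimal sketch of screening pairwise correlations to spot variables that may supply similar information. The variable names, the values, and the 0.9 cutoff are hypothetical; a suitable cutoff depends on the application.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical variables recorded for six observations.
variables = {
    "income": [40, 55, 62, 71, 88, 95],
    "spending": [21, 27, 32, 35, 45, 47],
    "age": [25, 47, 33, 52, 29, 41],
}
threshold = 0.9  # hypothetical cutoff for "similar information"

# Collect every pair whose absolute correlation reaches the cutoff.
redundant_pairs = []
for a, b in combinations(variables, 2):
    r = pearson(variables[a], variables[b])
    print(f"{a:>8} vs {b:<8} r = {r:+.3f}")
    if abs(r) >= threshold:
        redundant_pairs.append((a, b, r))

print("candidates to aggregate or remove:",
      [(a, b) for a, b, _ in redundant_pairs])
```

Only the flagged pairs are candidates; whether to aggregate them or drop one of the two remains a modeling judgment.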
Model Construction - ANSWERS-Apply the appropriate data-mining
technique (regression, classification trees, k-means) to accomplish the
desired data-mining task (prediction, classification, clustering, etc.).
Model Assessment - ANSWERS-Evaluate models by comparing
performance on appropriate data sets.
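One common reading of "appropriate data sets" is data held out from model construction. The sketch below compares two candidate models by root-mean-squared error on a hypothetical validation set; the data, the two models, and the choice of RMSE as the metric are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical (x, y) data, split into training and validation sets.
train = [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.2)]
validation = [(5, 10.1), (6, 11.8)]


def rmse(model, data):
    """Root-mean-squared prediction error of a model on a data set."""
    return sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


# Model A: always predict the training mean of y.
mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)


def model_a(x):
    return mean_y


# Model B: least-squares line fitted on the training data only.
sxx = sum((x - mean_x) ** 2 for x, _ in train)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def model_b(x):
    return intercept + slope * x


rmse_a = rmse(model_a, validation)
rmse_b = rmse(model_b, validation)
print(f"validation RMSE, mean-only model: {rmse_a:.2f}")
print(f"validation RMSE, linear model:    {rmse_b:.2f}")
```

The model with the lower error on data it was not fitted to is the better performer under this metric.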
cluster analysis - ANSWERS-• goal is to segment observations into
similar groups based on the observed variables
• can be employed during the data preparation step to identify
variables or observations that can be aggregated or removed from
consideration.
• can also be used to identify outliers.
Unsupervised Learning
Market Segmentation - ANSWERS-cluster analysis is commonly used
in marketing to divide consumers into different homogeneous groups
Clustering methods - ANSWERS-• hierarchical
• k-means
• both depend on how the similarity between two observations is measured
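To make the role of similarity concrete, here is a minimal k-means sketch that uses Euclidean distance as the (dis)similarity measure. The points, the choice of k = 2, the starting centroids, and the use of Euclidean distance are all assumptions for illustration; hierarchical clustering is not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical two-variable observations forming two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def k_means(points, centroids, iterations=10):
    """Alternate assignment and centroid update until centroids stop moving."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Starting centroids picked by hand here; real use would vary them.
centroids, clusters = k_means(points, [points[0], points[3]])
for centroid, cluster in zip(centroids, clusters):
    print(f"centroid ({centroid[0]:.2f}, {centroid[1]:.2f}): {cluster}")
```

Swapping `dist` for another distance function changes which observations count as similar, and therefore can change the resulting clusters.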
First development - ANSWERS-Technological advances, social
networks, and data generated from personal electronic devices produce
incredible amounts of data
Second development - ANSWERS-• Advances in computational
approaches to effectively handle and explore massive amounts of data
• Faster algorithms for optimization and simulation, and
• More effective approaches for visualizing data
Third development - ANSWERS-• The methodological developments
were paired with an explosion in computing power and storage
capability.
• Better computing hardware, parallel computing, and cloud
computing have enabled businesses to solve big problems faster and
more accurately than ever before.
Strategic Decision - ANSWERS-• Involve higher-level issues
concerned with the overall direction of the organization.
• These decisions define the organization's overall goals and
aspirations for the future.
Tactical Decision - ANSWERS-• Concern how the organization should
achieve the goals and objectives set by its strategy.
• They are usually the responsibility of midlevel management.
Operational Decision - ANSWERS-• Affect how the firm is run from
day to day.
• They are the domain of operations managers, who are the closest to
the customer.
Decision making can be defined as the following process -
ANSWERS-1. Identify and define the problem
2. Determine the criteria that will be used to evaluate alternative
solutions
3. Determine the set of alternative solutions
4. Evaluate the alternatives
5. Choose an alternative
Common approaches to making decisions - ANSWERS-• Tradition
• Intuition
• Rules of thumb