What happens in the Data Mining phase? Create training and testing datasets to build
models from
Identify/detect patterns
Determine if groups (clusters) exist in data
Classify data into groups
Create models that "learn" and improve (e.g., machine/deep learning, AI, etc)
Test Hypotheses Refine
Another name for Reporting and Visualization? Dashboards
What happens in the Reporting/Visualization phase? Tell a story with data
Provide a summary of the analysis
Provide insights to stakeholders
Create insightful graphs that showcase trends and forecasts
Potential Problems with Data Cleaning Some cleaning techniques could dramatically change
data/outcomes
Outliers not dealt with can cause problems with statistical models due to excessive variability.
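As a minimal sketch of how an unhandled outlier inflates variability, the Python snippet below flags values using the common 1.5 x IQR rule. The function name `iqr_outliers` and the sales figures are made up for illustration, and other outlier rules are equally valid choices.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical monthly sales with one extreme value.
sales = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(sales)
cleaned = [v for v in sales if v not in flagged]

print(flagged)                    # [95]
print(statistics.stdev(sales))    # large spread, driven by the outlier
print(statistics.stdev(cleaned))  # much smaller spread once it is set aside
```

Whether to drop, cap, or keep a flagged value is a judgment call, since the cleaning step itself can change the outcome.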
Potential Problems with Business Understanding Lack of clear focus on stakeholders,
timeline, limitations and budget could potentially derail an analysis
Potential Problems with Data Acquisition Quality and type of data may make access more
difficult
Potential problems with Data Exploration Skipping this step can lead to faulty
perceptions of the data, which hurt advanced analytics
Potential problems with Predictive Modeling Too many input variables (predictors) can
cause problems
Correlation does not imply causation
Time series models often need a sufficiently long history of data to produce precise trends
Predictive model accuracy should be assessed using cross-validation.
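Cross-validation can be sketched in plain Python as follows. This is an illustrative k-fold loop, not a production implementation: the "model" is a deliberately trivial baseline that predicts the mean of the training values, and the names `k_fold_indices` and `cross_validate` and the sample data are assumptions made for the example.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(y, k=5, seed=0):
    """Return one mean-absolute-error score per held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        test = [y[i] for i in held_out]
        prediction = statistics.mean(train)  # baseline "model"
        errors = [abs(v - prediction) for v in test]
        scores.append(statistics.mean(errors))
    return scores

# Hypothetical target values.
y = [3, 5, 4, 6, 8, 7, 9, 10, 12, 11]
scores = cross_validate(y, k=5)
print(scores)                   # one error score per fold
print(statistics.mean(scores))  # average error across folds
```

Each observation is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting.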
Potential problems with Data Mining Running on an entire dataset is problematic; the data
needs to be subset into training and testing datasets to build models
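A minimal sketch of splitting a dataset into training and testing subsets, using only the standard library. The function name `train_test_split`, the 80/20 ratio, and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 100 records, represented here by their ids.
rows = list(range(100))
train, test = train_test_split(rows)

print(len(train), len(test))  # 80 20
```

The model is built on `train` only, and `test` is kept aside to check how well it generalizes.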
Potential problems with Reporting/Visualization Because reports may reach a large audience,
mistakes can cause bad business decisions and loss of revenue
Improper scales in graphs can push readers toward inaccurate interpretations of the story
Causation is when there is a real-world explanation for WHY this is logically happening; it
implies a cause and effect
C, C++, and Java are general-purpose languages that are used for the back end, the
foundational elements of data science, and they provide maximum speed
Privacy Torts protect individuals' rights to keep certain things out of public view even if they
are true
IRAC is our legal analysis tool to understand how to move from identifying a legal issue to
reaching a conclusion and a decision about how to take action
Which aspect of data exploration occurs when an analyst writes code to compile a bar graph of
dog food sales per month? Verification through visualization
Content Listening Understanding and retaining the information provided and identifying the
main points of the message
Which analytic method asks what happened in the past? Descriptive analytics
Which analytic method asks what might happen in the future? Predictive
analytics
Which analytic method asks what we should do going forward? Prescriptive
analytics
Which are examples of models used in predictive analytics? regression and decision trees
Overfitting The process of fitting a model too closely to the training data for the model to be
effective on other data.
Autocorrelation means that each point in time is influenced by the points that came before
it.
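One way to see autocorrelation in numbers is the lag-1 sample autocorrelation, sketched below in plain Python. The function name `autocorrelation` and both series are made up for illustration.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

# Hypothetical series: a steady upward trend and a strict alternation.
trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

print(round(autocorrelation(trend), 2))        # 0.7
print(round(autocorrelation(alternating), 2))  # -0.9
```

A value near 1 means each point tends to resemble the one before it; a value near -1 means each point tends to move opposite to the one before it.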
CSV file Comma-separated values file; a text file with one record per line, with the fields of
each record separated by commas
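A short sketch of reading CSV data with Python's standard `csv` module. The column names and values are invented for the example, and an in-memory string stands in for a file on disk.

```python
import csv
import io

# Hypothetical CSV content: a header line, then one record per line.
text = "month,product,units\nJan,dog food,120\nFeb,dog food,135\n"

# DictReader uses the first line as field names for every later record.
records = list(csv.DictReader(io.StringIO(text)))

for record in records:
    print(record["month"], record["units"])
```

Every field is read as a string, so numeric columns such as `units` need converting (for example with `int`) before analysis.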
Implicit rules help the algorithm function. They are the rules that are developed while
analyzing the test data. They cannot be easily described to humans
What is a soft skill? Persuasion
Communication
Emotional intelligence
Active listening
Logic and reasoning
Interpersonal skills
Negotiation
Data scientists They are data analysts who create software, work with mathematics, know the
business, ask the interesting questions, and hack.
Data Science team Data analysts, research lead, and project manager. Their job is to create
an interesting data model, show trends in the model, and show a correlation between certain
topics and the likelihood of being shared.
Data analyst The person who obtains and scrubs the data. They display the data in
graphs and reports
Research lead Someone from the business side who pushes the team to ask
interesting questions and identifies key problems. They know the most about the business,
know the topic categories, and drive the questions
Who clears organizational hurdles? Project manager
Andre is part of a data science team. He is good at creating visualizations and reports. Which
role on the team is best for him? Data analyst
Joyce is part of a data science team. She is very interested in running experiments. At what
point would Joyce do this? during the process of asking questions
What are the mapping techniques for stakeholders? Power and Interest Grid
Salience Model
Direction of Influence