My D204 WGU Study Set
Study online at https://quizlet.com/_9oklo9
1. Which of these is NOT a topic of interest for Discovery/Planning/Business Understanding?
A. Project Scope
B. Identify stakeholders and research questions/KPIs
C. Build a data pipeline (ETL)
D. Identify timeline, budget, and participants
Answer: C
2. What is a potential problem to consider in the planning phase?
A. Lack of clear focus on stakeholders, timeline, limitations, and budget
B. Quality and type of data may make access more difficult
C. Some cleaning techniques could dramatically change data/outcomes
D. Outliers not dealt with can cause problems with statistical models due to excessive variability
Answer: A
3. In what phase does the analyst identify the stakeholders and research questions?
Answer: Business Understanding/Planning/Discovery
4. In what phase does the analyst deal with the following:
Gather/collect data from a variety of sources
Provide structure to data accessible via relational databases (SQL)
Build data pipeline (ETL)
Use of API to download data from an external source
Answer: Data acquisition
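As a memory aid for the acquisition tasks above, here is a minimal sketch of an extract-transform-load (ETL) pipeline that ends in a SQL query. The CSV text, the `orders` table, and the function names are all hypothetical, chosen only for illustration; a real pipeline would more likely read from files, an API, or a production database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this might come from a file or an API response.
RAW_CSV = """order_id,region,amount
1,west,19.99
2,east,5.00
3,west,7.50
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast strings to proper types and normalize region names."""
    return [
        (int(r["order_id"]), r["region"].strip().upper(), float(r["amount"]))
        for r in rows
    ]

def load(records, conn):
    """Load: write the transformed records into a relational table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

# Run the pipeline against an in-memory SQLite database, then query it with SQL.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Keeping the three stages as separate functions mirrors the E, T, and L in the acronym, which may make the term easier to recall.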
5. In what phase does the analyst deal with the following:
Fixing improperly formatted values
Dealing with duplicates, missing data, and outliers
Data reduction
Answer: Data cleaning/wrangling/scrubbing/munging
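The cleaning tasks above can be sketched on a tiny, made-up dataset. The records, the choice of median imputation, and the 1.5 * IQR outlier rule are assumptions for illustration; other techniques are equally valid, and each choice can change the downstream results.

```python
import statistics

# Hypothetical raw records: (id, age as text). Values are invented for illustration.
raw = [
    ("a", " 34"), ("b", "29 "), ("b", "29 "), ("c", ""),
    ("d", "31"), ("e", "30"), ("f", "33"), ("g", "250"),
]

# 1. Fix improperly formatted values: strip whitespace, cast to int, mark blanks as None.
parsed = [(rid, int(v.strip()) if v.strip() else None) for rid, v in raw]

# 2. Drop exact duplicates while preserving the original order.
deduped = list(dict.fromkeys(parsed))

# 3. Handle missing data: impute with the median of the observed values.
observed = [v for _, v in deduped if v is not None]
median_age = statistics.median(observed)
imputed = [(rid, v if v is not None else median_age) for rid, v in deduped]

# 4. Handle outliers: keep only values inside the 1.5 * IQR fences.
q1, _, q3 = statistics.quantiles([v for _, v in imputed], n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = [(rid, v) for rid, v in imputed if low <= v <= high]

print(cleaned)
```

Dropping the outlier is only one option; capping it or keeping it with a flag would lead to different summary statistics later.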
6. In what phase does the analyst deal with the following:
Central tendency/measures of center (e.g., mean, median, mode), variability (e.g., standard deviations and quartiles), and distributions (e.g., normal, skewed, etc.)
Identify basic correlations between variables
Pattern discovery
Answer: Data exploration/Exploratory Data Analysis (EDA)/Descriptive Statistics
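The descriptive statistics named above can be computed with Python's standard library. This is a minimal sketch; the `hours` and `score` values are invented, and `pearson_r` is a hypothetical helper written out by hand so the correlation formula is visible.

```python
import statistics

# Hypothetical paired observations, invented for illustration.
hours = [1, 2, 2, 3, 4, 5, 6]
score = [52, 55, 60, 61, 70, 74, 80]

# Measures of center and variability for one variable.
summary = {
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "mode": statistics.mode(hours),
    "stdev": statistics.stdev(hours),              # sample standard deviation
    "quartiles": statistics.quantiles(hours, n=4),  # Q1, Q2, Q3
}

def pearson_r(x, y):
    """Pearson correlation: co-deviation divided by the product of deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    co_dev = sum((a - mx) * (b - my) for a, b in zip(x, y))
    dev_x = sum((a - mx) ** 2 for a in x) ** 0.5
    dev_y = sum((b - my) ** 2 for b in y) ** 0.5
    return co_dev / (dev_x * dev_y)

r = pearson_r(hours, score)
print(summary)
print(round(r, 3))
```

A correlation close to 1 here only describes the sample; turning it into a predictive model belongs to the next phase.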
7. In what phase does the analyst deal with the following:
Estimate/project future values or likelihood of an event
Extend correlations found in EDA to mathematical models
Predict/determine output values based on input values
Cross-validation of predictive models to ensure accuracy
Answer: Predictive Modeling/Data Modeling/Correlation-based models/Regression models/Time Series
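To make the modeling tasks above concrete, here is a minimal sketch of a simple linear regression checked with k-fold cross-validation. The data, the helper names `fit_line` and `k_fold_mse`, and the choice of three folds are assumptions for illustration, not a prescribed method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def k_fold_mse(xs, ys, k=3):
    """Average held-out mean squared error across k folds."""
    fold_errors = []
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, len(xs), k))
        pairs = list(zip(xs, ys))
        train = [p for i, p in enumerate(pairs) if i not in test_idx]
        test = [p for i, p in enumerate(pairs) if i in test_idx]
        slope, intercept = fit_line(*zip(*train))
        mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Hypothetical data that roughly follows y = 2x + 1 with small noise.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9, 19.2]

slope, intercept = fit_line(xs, ys)
cv_mse = k_fold_mse(xs, ys, k=3)
prediction = slope * 10 + intercept  # projected output for a new input value

print(slope, intercept, cv_mse, prediction)
```

The cross-validated error is measured only on points the model did not see during fitting, which is what makes it a check on predictive accuracy.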