Guaranteed Success
What is the third phase in the Data Life Cycle? Data cleaning phase
Data cleaning phase: describe AKA: cleansing, wrangling, munging, feature engineering.
Uses SQL, Python, R, or Excel to perform data modifications and transformations.
Data cleaning phase: tools/techniques - Python
- R
- SQL
- Excel
- data reduction: optimize storage capacity
- modification
- transformation
- anomaly detection
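A minimal Python sketch of the techniques listed above, using only the standard library. The records, the field names (`city`, `temp_f`), and the 1.5 x IQR rule for anomaly detection are assumptions made for illustration; they are not part of the course material.

```python
import statistics

# Hypothetical raw records; field names and values are made up for illustration.
raw = [
    {"city": " Boston ", "temp_f": "68"},
    {"city": "boston", "temp_f": "68"},    # duplicate once the text is normalized
    {"city": "Denver", "temp_f": "59"},
    {"city": "Miami", "temp_f": "86"},
    {"city": "Austin", "temp_f": "77"},
    {"city": "Seattle", "temp_f": "950"},  # likely a data-entry error
]


def clean(records):
    """Apply modification, transformation, and reduction to raw records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Modification: trim whitespace and normalize capitalization.
        city = rec["city"].strip().title()
        # Transformation: text to number, Fahrenheit to Celsius.
        temp_c = round((float(rec["temp_f"]) - 32) * 5 / 9, 1)
        # Reduction: skip rows that duplicate one already kept.
        if (city, temp_c) in seen:
            continue
        seen.add((city, temp_c))
        cleaned.append({"city": city, "temp_c": temp_c})
    return cleaned


def flag_anomalies(values):
    """Anomaly detection: flag values outside 1.5 * IQR of the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


rows = clean(raw)
anomalies = flag_anomalies([r["temp_c"] for r in rows])
print(rows)
print(anomalies)
```

The same steps could be done in R, SQL, or Excel; Python is used here only because it is one of the listed tools.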
Data cleaning phase: problems - some cleaning techniques can dramatically change the data and the resulting outcomes
- outliers that are not dealt with can cause problems with statistical models
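Both problems can be seen in one small sketch: a single outlier pulls the mean far from the rest of the data, and removing it changes the result just as sharply. The numbers below are invented for illustration.

```python
import statistics

# Hypothetical measurements; the last value in with_outlier is an invented outlier.
values = [12, 14, 15, 15, 16, 18]
with_outlier = values + [150]

mean_clean = statistics.mean(values)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(values)
median_outlier = statistics.median(with_outlier)

# The mean shifts a lot when the outlier is present; the median does not move.
print("mean:  ", mean_clean, "vs", round(mean_outlier, 2))
print("median:", median_clean, "vs", median_outlier)
```

Whether the outlier should be kept or removed depends on whether it is an error or a real observation, which is why the cleaning decision needs to be documented.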
What is the fourth phase in the Data Life Cycle? Data exploration
Data exploration: describe The analyst begins to understand the basic nature of the data: the relationships
within it, the structure of the data set, the presence of outliers, and the distribution of the data.
Uses data visualization tools and numerical summaries.
Data exploration: tools and techniques - distributions: normal or skewed
- visualization tools: Tableau, R, Python, RStudio
- stats tools: mean, median, mode
- correlation discovery
- pattern discovery
- histograms, charts, tables, boxplots, etc.
- variability: standard deviation, quartiles
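The numerical summaries above can be sketched with Python's standard library. The data sets (`scores`, `hours`, `marks`) and the helper names `text_histogram` and `pearson` are hypothetical, chosen only to illustrate the ideas; a real analysis would more likely use Tableau, R, or a Python plotting library for the visual part.

```python
import math
import statistics
from collections import Counter

# Hypothetical exam scores, invented for illustration.
scores = [55, 61, 64, 64, 68, 70, 72, 75, 79, 92]

# Central tendency and variability.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "stdev": statistics.stdev(scores),
    "quartiles": statistics.quantiles(scores, n=4),
}


def text_histogram(values, bin_width=10):
    """Return one text line per non-empty bin, a rough stand-in for a histogram."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start}-{start + bin_width - 1}: {'#' * bins[start]}"
        for start in sorted(bins)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient, for correlation discovery."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical paired data: hours studied and marks earned.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

print(summary)
print("\n".join(text_histogram(scores)))
print(round(pearson(hours, marks), 2))
```

A mean noticeably above the median, as in this sample, is one quick hint that the distribution may be skewed to the right rather than normal.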
Data exploration: problems Skipping this step can lead to faulty perceptions of the data,
which hurt advanced analytics.
The analyst will lack insight into the structure of the data set.
What is the fifth phase in the Data Life Cycle? Predictive modeling
Predictive modeling: describe Allows the analyst to move beyond describing the data to creating models
that enable predictions of outcomes.
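As one hedged illustration of moving from description to prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict an unseen value. Linear regression is an assumed example of a model, not necessarily the one the course has in mind, and the data and function names (`fit_line`, `predict`) are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Use the fitted line to predict an outcome for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data, invented for illustration.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)
print(model)
print(predict(model, 5))
```

Describing the data stops at the slope and intercept; prediction starts when the model is applied to an x value that was not in the data.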
Predictive modeling: tools/techniques - Python
- R
- modeling:
correlation