3 types of analysis techniques
Descriptive : what has happened, identify problems and solutions
Predictive : what could happen, historical data, estimate, etc
Prescriptive : what should we do, optimize and simulate, explore, build, etc
Big data : massive volume of both structured and unstructured data that are
difficult to manage, process, and analyze using traditional data-processing
tools.
3 Vs : Volume, Velocity, Variety
Statistical data types
Cross-sectional data : for a given sort of entity for a single period of time
Time-series data : for a single entity for multiple periods of time
Panel data : for multiple entities for multiple periods of time
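A minimal sketch of how the three statistical data types relate, assuming observations are stored as (entity, period, value) records. The entities "A" and "B", the years, and the values are hypothetical and invented only for illustration.

```python
# Hypothetical records: (entity, period, value). All figures are invented.
panel = [
    ("A", 2021, 10.0), ("A", 2022, 11.0),
    ("B", 2021, 20.0), ("B", 2022, 19.0),
]

# Panel data: multiple entities observed over multiple periods (the full list).

# Cross-sectional data: several entities, one single period.
cross_section = [record for record in panel if record[1] == 2021]

# Time-series data: one single entity, several periods.
time_series = [record for record in panel if record[0] == "A"]

print(cross_section)
print(time_series)
```

Seen this way, a cross-section and a time series are two different slices of the same panel.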
The linear regression model
Def : postulates that the relationship between the dependent and independent
variables is linear
Dependent variable = y
Independent variables = X1, X2, ..., Xn
A regression model treats all independent variables as numerical
Big data : CM1 1
Dummy variable : used to describe 2 categories of a categorical variable, d
d = 1 for 1 of the categories
d = 0 for the other(s)
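The coding rule above can be sketched in a few lines. The helper name `to_dummy`, its parameter `coded_as_one`, and the sample home-ownership data are hypothetical and not part of the course material.

```python
def to_dummy(values, coded_as_one):
    """Return d for each value: 1 for the chosen category, 0 for the other(s)."""
    return [1 if value == coded_as_one else 0 for value in values]

# Hypothetical categorical variable with two categories.
owns_home = ["yes", "no", "no", "yes"]

d = to_dummy(owns_home, coded_as_one="yes")
print(d)
```

Which category is coded 1 is a free choice; it changes how the coefficient of d is read, not the predictions of the model.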
Simple regression model :
Multiple regression model :
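In the usual textbook notation, the two models can be written as follows. The coefficient symbols b_0, b_1, ..., b_n are an assumption (they are not defined in these notes); y^ and X1, ..., Xn are the notes' own symbols.

```latex
% Simple regression model: one independent variable
\hat{y} = b_0 + b_1 X_1

% Multiple regression model: n independent variables
\hat{y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n
```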
ŷ : predicted value of the dependent variable
This equation is the model. It allows us to calculate
the predicted value of the dependent variable for
any given values of the independent variables.
The difference between the observed and the
predicted values is the residual : e = y - ŷ
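A minimal sketch of predicted values and residuals for a simple regression. The notes do not say how the coefficients are estimated, so ordinary least squares is assumed here. The function `fit_simple_regression`, the names `b0` and `b1`, and the data are hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1) for y_hat = b0 + b1 * x, using ordinary least squares (assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical observations, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_simple_regression(x, y)

# Predicted value of the dependent variable for each observed x.
y_hat = [b0 + b1 * xi for xi in x]

# Residual: observed value minus predicted value, e = y - y_hat.
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

print(b0, b1)
print(residuals)
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding error, which is a quick sanity check on the fit.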