the study of the collection, organization, analysis, interpretation, and presentation of data -
ANS-Statistics
collecting, organizing, and presenting the data - ANS-descriptive statistics
drawing conclusions about a population based on sample data from that population - ANS-inferential statistics
a number used to describe the population - ANS-parameter
a number calculated from a sample and used to estimate the parameter - ANS-statistic
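The parameter vs. statistic distinction can be sketched in a few lines of Python. The population here is simulated, made-up data (an assumption for illustration only); in practice the whole population is rarely available, which is why the statistic is used as an estimate.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated values (illustrative only)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameter: a number that describes the whole population
population_mean = statistics.mean(population)

# Statistic: a number calculated from a sample, used to estimate the parameter
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values should be close but not identical, since the sample mean only estimates the population mean.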
the smallest unit upon which an observation is made - ANS-observational unit
when a variable is measured at the same time point (or time frame) for multiple observational
units (or when several variables are measured at the same point in time) - ANS-cross-sectional
data
a variable that is measured at regular intervals over time for a single observational unit -
ANS-time series
where you ask a set of customers the same questions at a regular interval to see how their
opinions change over time - ANS-longitudinal survey
a stack of time series for multiple observational units - ANS-panel data
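A minimal sketch of how cross-sectional data, a time series, and panel data relate, using a small made-up table of monthly sales. The store names, months, and values are hypothetical.

```python
# Hypothetical monthly sales records: (store, month, sales)
records = [
    ("A", "2024-01", 120), ("A", "2024-02", 135), ("A", "2024-03", 128),
    ("B", "2024-01", 90),  ("B", "2024-02", 95),  ("B", "2024-03", 101),
]

# Cross-sectional data: multiple observational units at one time point
cross_section = {store: sales for store, month, sales in records
                 if month == "2024-01"}

# Time series: one observational unit measured at regular intervals
time_series = [(month, sales) for store, month, sales in records
               if store == "A"]

# Panel data: one time series per observational unit, stacked together
panel = {}
for store, month, sales in records:
    panel.setdefault(store, []).append((month, sales))

print(cross_section)
print(time_series)
print(panel)
```

Here each store is the observational unit: slicing by month gives a cross section, slicing by store gives a time series, and keeping everything gives the panel.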
stores structured, semi-structured, and unstructured raw data - ANS-data lake
stores structured data that is ready for data analytics - ANS-data warehouse
term used to describe data sets so large that traditional methods of storage and analysis are
inadequate - ANS-big data
* have a numerical value that works like a number
* must have units
* are divided into discrete variables and continuous variables - ANS-Quantitative variables
do not have a meaningful numerical value; aka categorical variables
* can be nominal or ordinal - ANS-Qualitative variables
is a unique identifier assigned to each individual or item in a group. These variables:
* don't have units
* are a special kind of categorical variable
* are useful in combining data from different sources to avoid duplication
* aren't variables to be analyzed - ANS-identifier variable
displays quantitative data, works for small to large datasets, plots the 5 number summary, great
for side-by-side comparisons of a quantitative variable according to some categorical variable,
can mask certain features of the shape of a distribution - ANS-Boxplot
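The five-number summary that a boxplot draws (minimum, Q1, median, Q3, maximum) can be computed as in the sketch below. It assumes Python 3.8+ and uses the "inclusive" quartile method from the standard library; textbooks and software use several quartile conventions, so Q1 and Q3 may differ slightly from hand calculations.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

# Made-up example values
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(values)
print(summary)
```

Because the boxplot shows only these five numbers, two datasets with very different shapes (for example, one bimodal) can produce the same plot.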
works for medium to large quantitative data sets, bins touch, nice because you can
visualize the shape of the distribution, even multimodality - ANS-histogram
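A rough text-only sketch of how a histogram bins data and reveals shape. The helper name `text_histogram`, the bin width, and the data are all made up for illustration, and the helper assumes integer values; a real plot would normally come from a plotting library.

```python
from collections import Counter

def text_histogram(data, bin_width):
    """Count integer values into equal-width bins; return {bin_start: count}."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    start, stop = min(counts), max(counts)
    # Keep empty bins too, so neighboring bins touch with no gaps on the axis
    return {b: counts.get(b, 0)
            for b in range(start, stop + bin_width, bin_width)}

# Hypothetical data with two clusters (two modes)
data = [11, 12, 14, 15, 15, 18, 22, 41, 43, 44, 44, 47, 48]
bins = text_histogram(data, 10)
for bin_start, count in bins.items():
    print(f"{bin_start:>3}-{bin_start + 9:<3} {'#' * count}")
```

The two tall bins separated by a low or empty region are the kind of multimodality that a boxplot of the same data would hide.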
is used to depict 2 potentially related variables
-each point is a pairing: (x,y)
-form of the relationship: linear, curvilinear, clustered, etc.
-direction of the relationship: positive vs. negative - ANS-scatterplot
quantitative data changing over time, time should go on the horizontal axis, variable should go
on the vertical axis, use different lines to denote separate categories or groups, beware of
plotting different scales - ANS-line graph
categorical (qualitative) data, can be horizontal or vertical, can display parts of a whole or
separate values, for nominal data put bars in ascending or descending order, for ordinal data put bars
in order of categories - ANS-bar graph
categorical (qualitative) data, displays parts of a whole, not good when there are too many
categories, don't ever make it 3-D or "tilted"! not good for comparisons - ANS-pie graphs
average of the sample - ANS-Sample Mean
-50th percentile of the sample
-middle observation in the ordered list if n is odd
-average of 2 middle observations if n is even - ANS-Median (M)
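The odd/even rule for the median can be written out directly. This is a minimal sketch with made-up numbers; the standard library's `statistics.median` follows the same rule.

```python
import statistics

def median(values):
    """Median via the odd/even rule on the ordered list."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n is odd: the single middle observation
        return ordered[mid]
    # n is even: average of the 2 middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 5]       # ordered: 1, 5, 7
even_sample = [7, 1, 5, 3]   # ordered: 1, 3, 5, 7
print(median(odd_sample), median(even_sample))
print(statistics.median(odd_sample), statistics.median(even_sample))
```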
mean is greater than median - ANS-Data skewed right
mean is less than median - ANS-Data skewed left
the mean equals the median - ANS-data is symmetric
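The mean vs. median comparison can be sketched as a quick check. The function name `describe_skew` and the datasets are made up, and this is a rule of thumb rather than a formal test: a few extreme values pull the mean toward the tail, but mean = median alone does not guarantee a symmetric distribution.

```python
import statistics

def describe_skew(data):
    """Rule-of-thumb skew direction from comparing mean and median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "skewed right"
    if mean < median:
        return "skewed left"
    return "symmetric"

# Hypothetical datasets
right = [1, 2, 3, 4, 100]      # large value pulls the mean up
left = [1, 97, 98, 99, 100]    # small value pulls the mean down
even = [1, 2, 3, 4, 5]         # mean and median are both 3

for data in (right, left, even):
    print(data, "->", describe_skew(data))
```

With real measurements the mean and median are rarely exactly equal, so in practice "symmetric" means they are approximately equal.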
most frequently occurring value
-benefit: works for quant and qual data
-drawback: finding the mode for continuous data can be tricky, so it is not often used -
ANS-Mode
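A short sketch of the mode for qualitative and quantitative data, and of why it is awkward for continuous data. It assumes Python 3.8+ for `statistics.multimode`; all data values are made up.

```python
import statistics

# Qualitative data: the mode is the most common category
colors = ["red", "blue", "red", "green", "blue", "red"]
color_mode = statistics.mode(colors)

# Quantitative data can have more than one mode
ratings = [1, 2, 2, 3, 3]
rating_modes = statistics.multimode(ratings)

# Continuous measurements rarely repeat exactly, so every value ties
measurements = [1.21, 1.27, 1.33, 1.38]
measurement_modes = statistics.multimode(measurements)

print(color_mode)
print(rating_modes)
print(measurement_modes)
```

For continuous data, one common workaround is to bin the values first (as in a histogram) and report the bin with the most observations.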