population - Answers the entire group of items/individuals (units) about which we want information
census - Answers a special case when every unit in the population is measured or surveyed (ex. US
census)
sample - Answers a small part of the population that we actually examine in order to gather information
sampling frame - Answers the (best) list of items or people forming a population from which a sample is
taken
sampled population - Answers the collection of all possible observation units that might have been
chosen in a sample; the population from which the sample was taken
parameter - Answers a (numerical) summary of a variable for the entire population (typically unknown)
statistic - Answers a (numerical) summary of a variable for a sample (used to estimate for the
population)
statistical inference - Answers generalizing the results from a sample to the population
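To make the parameter/statistic distinction concrete, here is a minimal sketch. The simulated population (heights in cm), the seed, and the sample size of 50 are all illustrative assumptions; in practice the parameter is usually unknown and only the statistic can be computed.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm (made-up values).
population = [rng.gauss(170, 10) for _ in range(10_000)]

parameter = mean(population)         # summary of the whole population
sample = rng.sample(population, 50)  # the part we actually examine
statistic = mean(sample)             # summary of the sample, used as an estimate

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

Using the statistic as an estimate of the parameter is the step that statistical inference formalizes.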
bias - Answers tendency of an estimate to deviate in one direction from a true value
selection bias - Answers occurs when the sample is non-representative of the larger population of
interest (best way to avoid this is by using randomness)
simple random sampling (SRS) - Answers a sampling strategy in which every unit in the population is
assigned a number and a random number generator selects which units will be used in the sample, so
that every possible sample of the chosen size is equally likely to be selected
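A simple random sample can be sketched with Python's standard `random` module. The population of 100 numbered units, the sample size of 10, and the seed are assumptions chosen for illustration.

```python
import random

# Hypothetical population: 100 units numbered 1..100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# random.sample draws k distinct units without replacement.
srs = rng.sample(population, k=10)
print(srs)
```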
stratified sampling - Answers a sampling strategy in which the population is divided into groups based
on a certain factor, each unit in the groups is given a number, and then a random number generator
selects which units from each group will be used in the sample
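The stratified procedure might look like the sketch below. The strata names ("freshman", "senior"), the helper `stratified_sample`, and the choice of an equal number of units per stratum are hypothetical; real designs often sample each stratum in proportion to its size.

```python
import random

# Hypothetical population: (unit_id, stratum) pairs with made-up strata.
units = [(i, "freshman" if i < 60 else "senior") for i in range(100)]

def stratified_sample(units, per_stratum, seed=0):
    """Draw a simple random sample of `per_stratum` units from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit_id, stratum in units:
        strata.setdefault(stratum, []).append(unit_id)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

sample = stratified_sample(units, per_stratum=5)
print(sample)
```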
cluster sampling - Answers a sampling strategy in which units are already in naturally occurring groups
(clusters), whole clusters are selected at random, and when a cluster is selected, all of its units
are used in the sample
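A minimal sketch of cluster sampling, assuming the clusters are pre-existing groups and that whole clusters are chosen at random. The classroom and student labels and the helper `cluster_sample` are made up for illustration.

```python
import random

# Hypothetical clusters: 6 classrooms with 4 students each.
clusters = {
    f"classroom_{c}": [f"student_{c}_{i}" for i in range(4)]
    for c in range(6)
}

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random, then keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sampled_units = [unit for name in chosen for unit in clusters[name]]
    return chosen, sampled_units

chosen, sampled_units = cluster_sample(clusters, n_clusters=2)
print(chosen)
print(sampled_units)
```

Note the contrast with stratified sampling: here randomness picks the groups, not the units within each group.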
systematic sampling - Answers a sampling strategy in which the sample is obtained by selecting every
kth individual from a list, usually after a random starting point (k is usually calculated as
population size / sample size)
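The every-kth-unit rule can be sketched as follows, taking k as population size divided by sample size (integer division here). The random starting point and the helper name `systematic_sample` are assumptions of this sketch, and it assumes the sample size does not exceed the population size.

```python
import random

def systematic_sample(population, sample_size, seed=0):
    """Select every k-th unit, where k = population size // sample size."""
    k = len(population) // sample_size
    start = random.Random(seed).randrange(k)  # random start within the first k units
    return population[start::k][:sample_size]

population = list(range(1, 101))  # hypothetical list of 100 numbered units
sample = systematic_sample(population, sample_size=10)
print(sample)
```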
multi-stage sampling - Answers a sampling strategy in which several stages of sampling are carried out
(useful for large-scale sampling)
sampling error - Answers the error that results from using a sample to estimate information regarding a
population; cannot be avoided but can be minimized by using a large, representative sample
non-sampling error - Answers the error that results from deficiencies in the survey process (poor
sampling method, questionnaire wording, behavioral effects, etc.)
non-response bias - Answers a non-sampling error that occurs when individuals selected by researchers
decline/are not available to be part of the sample
response bias - Answers a non-sampling error that occurs when individuals respond differently from
how they truly feel (to avoid judgement, to look good, etc.)
undercoverage - Answers a non-sampling error that occurs when the sampling frame excludes some
parts of the population
convenience sampling - Answers a poor sampling method (and so a source of non-sampling error) in which
the most convenient (readily available) group is used as the sample
volunteer response sampling - Answers a poor sampling method (and so a source of non-sampling error) in
which only those people who volunteer to participate are included in the sample
variable - Answers characteristic associated with the cases in the data (what is being recorded)
cases - Answers the objects described by a set of data (ex. customers, companies, experimental subjects,
or other objects)
categorical variable - Answers variable that places a unit into one of several groups or categories
quantitative variable - Answers variable that takes numeric values for which arithmetic operations such
as adding and averaging make sense
graphical summaries for categorical variables - Answers pie charts and bar graphs
graphical summaries for quantitative variables - Answers histograms, time plots, and scatterplots
shape of a distribution - Answers skewed right (long tail to right), skewed left (long tail to left),
symmetric (major cluster in the middle)
modality - Answers number of peaks on a histogram
outliers - Answers unusual values that do not fit with the rest of the pattern (may result from data entry
errors or could be actual unusual values)
sample mean - Answers x-bar = sum of observations / number of observations; strongly affected by
skewness and outliers
median - Answers midpoint of a distribution
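A quick comparison of the sample mean and the median shows how one extreme value pulls the mean while the median barely moves. The income figures below are made up for illustration.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]  # hypothetical incomes (in thousands)
skewed = incomes + [400]        # same data plus one extreme value

print(mean(incomes), median(incomes))  # 35 and 35
print(mean(skewed), median(skewed))    # mean jumps to about 95.8, median is 36.5
```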
quartiles - Answers used to make box plots; Q1 = median of lower half of data, Q2 = median of data, Q3
= median of upper half of data
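The quartile definition above can be sketched directly. One assumption is needed: when the number of observations is odd, this sketch leaves the overall median out of both halves. Textbooks and software use several slightly different conventions, so results may not match every tool. The helper name `quartiles` is hypothetical, and it needs at least two observations.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)
    half = len(xs) // 2            # size of each half (middle value excluded if n is odd)
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return median(lower), median(xs), median(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13]))  # (3, 7, 11)
```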
inter-quartile range (IQR) - Answers calculated by subtracting Q1 from Q3; observations are considered
outliers if they fall more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3
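Under the common 1.5 x IQR rule, an observation is flagged when it lies below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The sketch below applies that rule with the median-of-halves quartiles. The data values and the helper name `iqr_outliers` are made up for illustration.

```python
from statistics import median

def iqr_outliers(data):
    """Return the observations outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])             # median of the lower half
    q3 = median(xs[len(xs) - half:])   # median of the upper half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < low or x > high]

data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical observations
print(iqr_outliers(data))          # [30]
```

Here Q1 = 4.5 and Q3 = 8.5, so IQR = 4 and the fences are -1.5 and 14.5; only 30 falls outside them.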