1 INTRODUCTION
1.1 INTRODUCTION TO STATISTICAL METHODOLOGY
Statistical methodology plays a fundamental role in research. Why?
For example:
- To learn about which factors have the greatest impact on student
performance in school
- To investigate what affects people’s political beliefs, the quality of their
health care, or their decisions about work and home life
“Statistics isn’t just about data analysis or numbers; it is about understanding the
world around us. The diverse face of statistics means you can use your education
in statistics and apply it to nearly any area you are passionate about…”
- Data: the observations gathered on the characteristics of interest
- Databases: existing archived collections of data
- Statistics consists of a body of methods for obtaining and analyzing data
Statistical science provides methods for:
1. Design: planning how to gather data for a research study to investigate
questions of interest to us.
2. Description: summarizing the data obtained in the study. (Graphs, tables
and numerical summaries are called descriptive statistics.)
3. Inference: making predictions based on the data, to help us deal with
uncertainty in an objective manner. (Predictions made using data are
called statistical inferences.)
1.2 DESCRIPTIVE STATISTICS AND INFERENTIAL STATISTICS
- A statistical analysis is classified as descriptive or inferential, according to
whether its main purpose is to describe the data or to make predictions.
- Subjects: the entities on which a study makes observations
- Population and sample: the population is the total set of subjects of
interest in a study. A sample is the subset of the population on which the
study collects data.
- Descriptive statistics summarize the information in a collection of data
- Inferential statistics provide predictions about a population, based on
data from a sample of that population. We can predict characteristics of
populations well by selecting samples that are small relative to the
population size.
- Parameter: a numerical summary of the population (a statistic is a
numerical summary of the sample data)
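The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population of ages below is made up for illustration, and the names `population`, `sample`, `population_mean` and `sample_mean` are assumptions, not part of the original notes.

```python
import random
from statistics import mean

# Hypothetical population: ages of 10,000 subjects (made-up values).
rng = random.Random(42)  # fixed seed only so the run is repeatable
population = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: a numerical summary of the whole population.
population_mean = mean(population)

# Sample: the subset of the population on which data are collected.
sample = rng.sample(population, 100)

# Statistic: a numerical summary of the sample, used to predict the parameter.
sample_mean = mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In a real study the population values are usually unknown, so only the statistic can be computed. Here both are available only because the population is simulated.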
1.3 THE ROLE OF COMPUTERS AND SOFTWARE IN STATISTICS
Statistical software analyzes data organized in the spreadsheet form of a data
file:
- Any one row contains the observations for a particular subject in the
sample
- Any one column contains the observations for a particular characteristic
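As a small illustration of this layout, the sketch below reads a tiny data file with Python's standard `csv` module. The subjects, column names and values are hypothetical and chosen only to show the row and column structure.

```python
import csv
import io

# Hypothetical data file: subjects and values are made up for illustration.
data_file = io.StringIO(
    "subject,age,marital_status,siblings\n"
    "1,34,married,2\n"
    "2,19,single,0\n"
    "3,52,divorced,3\n"
)

rows = list(csv.DictReader(data_file))

# One row holds every observation for one subject.
first_subject = rows[0]

# One column holds the observations of one characteristic for all subjects.
ages = [int(row["age"]) for row in rows]

print(first_subject)
print(ages)
```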
1.4 CHAPTER SUMMARY
The field of statistical science includes methods for:
- Designing research studies
- Describing the data (descriptive statistics)
- Making predictions using the data (inferential statistics)
Statistical methods apply to observations in a sample taken from a
population.
Statistics summarize sample data, while parameters summarize entire
populations.
Descriptive statistics summarize sample or population data with numbers,
tables and graphs.
Inferential statistics use sample data to make predictions about population
parameters.
A data file has a separate row of data for each subject and a separate column for
each characteristic. Software applies statistical methods to data files.
2 SAMPLING AND MEASUREMENT
2.1 VARIABLES AND THEIR MEASUREMENT
- Variable: a characteristic that can vary in value among subjects in a
sample or population.
- Measurement scale: formed by the values that the variable can take
(e.g. male/female, 0/1/2/3/4, etc.)
- A variable is called quantitative when the measurement scale has
numerical values that represent different magnitudes of the variable
(income, number of siblings, age, etc.)
- A variable is called categorical when the measurement scale is a set of
categories (marital status, employment, favorites, etc.)
o Categorical variables are often called qualitative
- Interval scale: formed by the possible numerical values of a quantitative
variable
- Nominal scale: a categorical scale with unordered categories; it does not
have a “high” or “low” end
- Ordinal scale: a scale that falls between nominal and interval. It consists
of categories that have a natural ordering of values (e.g. social class:
upper, middle, lower; political philosophy)
- Discrete and continuous variables: a variable is discrete if its possible
values form a set of separate numbers, such as (0, 1, 2, 3, 4…). It is
continuous if it can take an infinite continuum of possible real number
values.
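The scales above differ in which operations on the values are meaningful. The following minimal sketch uses made-up observations for three subjects. The variable names and the category order in `class_order` are assumptions for illustration.

```python
# Hypothetical observations for three subjects (values made up for illustration).
marital_status = ["married", "single", "married"]  # nominal: unordered categories
social_class = ["middle", "upper", "lower"]        # ordinal: ordered categories
siblings = [2, 0, 3]                               # interval scale, discrete

# Nominal: only counting how often each category occurs is meaningful.
status_counts = {s: marital_status.count(s) for s in set(marital_status)}

# Ordinal: categories can be ranked, but distances between them are undefined.
class_order = ["lower", "middle", "upper"]
ranked = sorted(social_class, key=class_order.index)

# Interval: values represent magnitudes, so arithmetic such as a mean makes sense.
mean_siblings = sum(siblings) / len(siblings)

print(status_counts)
print(ranked)
print(mean_siblings)
```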
2.2 RANDOMIZATION
- Randomization: the mechanism for achieving good sample
representation
- Sample size: n denotes the number of subjects in the sample
- Simple random sampling is a method of sampling for which every possible
sample of size n has an equal chance of selection
o Why it is a good idea: everyone has the same chance of inclusion in
the sample, so it provides fairness. This reduces the chance that the
sample is seriously biased in some way, leading to inaccurate
inferences about the population. Most inferential statistical methods
assume randomization of the sort provided by random sampling
- Simple random sample (also random sample): a simple random sample
of n subjects from a population is one in which each possible sample of
that size has the same probability (chance) of being selected
- Sampling frame: to select a random sample, we need a list of all subjects
in the population. This list is called the sampling frame
- Random numbers are numbers that are computer generated according
to a scheme whereby each digit is equally likely to be any of the integers
0, 1, 2, …, 9 and does not depend on the other digits generated
- Sample survey: selecting a sample of people from a population and
interviewing them
- Experiment: the purpose of most experiments is to compare responses
of subjects on some outcome measure under different conditions. Those
conditions are levels of a variable that can influence the outcome
- Treatments: the conditions in an experiment
- Observational studies: studies that merely observe the outcomes for
available subjects on the variables, without any experimental manipulation
of the subjects. The researcher measures subjects’ responses on the
variables of interest but has no experimental control over the subjects
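Drawing a simple random sample from a sampling frame, as described in this section, can be sketched with Python's standard `random` module. The frame of 500 student identifiers, the sample size of 25 and the seed are hypothetical choices for illustration.

```python
import random

# Hypothetical sampling frame: a list of every subject in the population.
sampling_frame = [f"student_{i:03d}" for i in range(1, 501)]

n = 25  # sample size

# random.sample draws n distinct subjects without replacement, so every
# possible sample of size n has the same chance of selection.
rng = random.Random(2024)  # fixed seed only so the example is repeatable
simple_random_sample = rng.sample(sampling_frame, n)

print(simple_random_sample[:5])
```

In practice the seed would be omitted or recorded, and the hard part is usually obtaining a complete sampling frame rather than drawing from it.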
2.3 SAMPLING VARIABILITY AND POTENTIAL BIAS
- Sampling error of a statistic is the error that occurs when we use a
statistic based on a sample to predict the value of a population parameter
- For sample sizes of about 1000, the sampling error for estimating
percentages is usually no greater than plus or minus 3%. This bound is
called the margin of error
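The figure of roughly 3% for samples of about 1000 can be checked with a short calculation. The notes do not state a formula, so the sketch below assumes the standard large-sample approximation for a 95% margin of error of a sample proportion under simple random sampling, z * sqrt(p(1 - p)/n) with z = 1.96. The function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes simple random sampling; p = 0.5 gives the largest margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Margin of error, in percentage points, for a few sample sizes.
for n in (100, 1000, 2500):
    print(n, f"{100 * margin_of_error(n):.1f}%")
```

With n = 1000 and p = 0.5 this gives about 0.031, or close to 3 percentage points. The margin shrinks as n grows and is smaller when p is far from 0.5.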
THREE TYPES OF BIAS:
1. Sampling bias
2. Response bias
3. Nonresponse bias
SAMPLING BIAS
- Probability sampling: sampling methods for which the probability that any
particular sample will be selected is known (used with inferential statistical
methods). Simple random sampling is one such method: each possible
sample of n subjects has the same probability of selection
- Nonprobability sampling: methods for which it is not possible to
determine the probabilities of the possible samples
o Inferences using such samples have unknown reliability and result in
sampling bias
o The most common method is volunteer sampling: subjects
volunteer for the sample, so the sample can poorly represent
the population and yield misleading conclusions
o The sampling bias inherent in volunteer sampling is also called
selection bias. It is problematic to evaluate policies and programs
when individuals can choose whether or not to participate in them
o Even with random sampling, sampling bias can occur. One case is
when the sampling frame suffers from undercoverage: it lacks
representation from some groups in the population (e.g. prison
inmates/homeless people)
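The effect of undercoverage can be seen in a small deterministic sketch. The population, group labels and income values below are made up for illustration, and the idea of a frame built from home addresses is an assumption, not something stated in the notes.

```python
from statistics import mean

# Hypothetical population: (group, monthly income) pairs, values made up.
population = [("housed", 3000)] * 90 + [("homeless", 400)] * 10

# A sampling frame built from a list of home addresses misses one group entirely.
sampling_frame = [subject for subject in population if subject[0] == "housed"]

population_mean = mean(income for _, income in population)
frame_mean = mean(income for _, income in sampling_frame)

print(population_mean)  # the parameter we want to learn about
print(frame_mean)       # what samples drawn from this frame estimate instead
```

Even a perfectly random sample from this frame estimates the mean of the frame, not of the population, so a larger sample does not remove the bias.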
RESPONSE BIAS
- Response bias: the result of poorly worded or confusing questions. Even the
order in which questions are asked can influence the results
dramatically.
- In an interview, characteristics of the interviewer may result in response
bias.