10.1 Samples, populations and sampling
Our data set (the sample) is finite and incomplete: we can’t get everybody to
participate in our experiment. In statistics, a sample is a subset of data collected
from a larger population. It is used to make inferences about the characteristics of
the population, and the quality of the sample is crucial for the reliability of those
inferences.
10.1.1 Defining a population
A population refers to the set of all possible people/observations you want to draw
conclusions about. This is generally much bigger than the sample.
10.1.2 Simple random samples
The relationship between the sample and the population depends on the procedure
by which the sample was selected. This procedure is referred to as a sampling method.
A procedure in which every member of the population has the same chance of being
selected is called simple random sampling, and its result is a simple random sample.
If each member can be selected at most once, so that we cannot observe the same
thing twice, the observations are said to have been sampled without replacement.
Simple random sampling can also be performed with replacement, in which case it
is possible to observe the same thing multiple times.
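The difference between the two procedures can be sketched in a few lines of Python. This is only an illustration: the population of ten numbered members and the fixed seed are assumptions made for the example, not part of the notes.

```python
import random

# Hypothetical population of 10 numbered members (assumed for illustration).
population = list(range(1, 11))

# Fixed seed so that repeated runs give the same draws.
rng = random.Random(42)

# Without replacement: every member can be selected at most once.
without_replacement = rng.sample(population, k=5)

# With replacement: the same member may be selected more than once.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

In the first draw all five values are guaranteed to be distinct; in the second, repeats are possible but not guaranteed.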
10.1.3 Most samples are not simple random samples
Obtaining a true simple random sample from most populations is a difficult task.
While a comprehensive discussion of sampling schemes is beyond our scope, here
are a few key alternatives:
Stratified Sampling: This method involves collecting separate random samples
from distinct subpopulations, making it more practical when the population is
already stratified and more efficient when some subpopulations are rare.
Snowball Sampling: Useful for hidden or hard-to-reach populations, it starts
with a small group and expands through referrals, but it can result in non-random
samples.
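As a rough sketch of stratified sampling, the Python snippet below draws a separate simple random sample from each subpopulation. The function name `stratified_sample`, the two strata and the sample sizes are all hypothetical and chosen only to show how a rare subpopulation can be deliberately included.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate random sample (without replacement) from each stratum.

    strata: dict mapping stratum name -> list of members
    sizes:  dict mapping stratum name -> number of members to draw
    """
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical population: 95 members of a common group, 5 of a rare group.
strata = {
    "common": [f"c{i}" for i in range(95)],
    "rare": [f"r{i}" for i in range(5)],
}

# Sample the rare group more heavily than its population share would suggest.
sample = stratified_sample(strata, {"common": 10, "rare": 3}, seed=0)
print(sample)
```

A simple random sample of 13 from all 100 members would often contain no rare members at all, or only one; sampling each stratum separately guarantees three. Because the rare group is then overrepresented, population-level estimates would need to be reweighted accordingly.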