Chapter 14 - Data Collection: Sampling
Key terms:
● Population: Whole set of items that are of interest for a particular investigation
● Sample: Subset of the population which represents the population
● Sampling unit: Each individual item in the population that can be sampled
● Sampling frame: A list of all the sampling units in the population, each one named or
numbered
● Sampling fraction: Proportion of the population actually sampled (sample size ÷ population
size)
● Census: Data collected from the entire population
○ Should give very accurate results, because every item in the population is included
○ Time consuming and expensive
○ May destroy every item if the test is destructive (e.g. testing all the bananas in a shop
means all of the bananas are destroyed)
○ Large volume of data to process
● Sample: Subset of the population on which the investigation is conducted, chosen to
represent the population:
○ Cheaper, easier and less data to process
○ Necessary if the test is destructive, since not every item is destroyed
○ Allows information to be obtained, parameters (such as the mean and standard deviation)
to be estimated and hypothesis tests to be carried out
○ Results may not be accurate
○ Sample may not be large enough to represent every subgroup of the population
○ Potential for bias
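The sampling fraction is just the sample size divided by the population size. A minimal Python sketch, where the function name `sampling_fraction` and the banana figures are hypothetical and chosen only for illustration:

```python
def sampling_fraction(sample_size: int, population_size: int) -> float:
    """Proportion of the population that is actually sampled: n / N."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    return sample_size / population_size

# Hypothetical example: 40 bananas tested out of 500 in a shop.
print(sampling_fraction(40, 500))   # 0.08
# A census samples every item, so its sampling fraction is 1.
print(sampling_fraction(500, 500))  # 1.0
```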
Considerations when sampling:
● Relevance to question
● Potential for bias (who is being asked? How is the question phrased?)
● Who is collecting the data?
● Sampling fraction
● Is the sample representative?
Random sampling: Each item in the sampling frame has a non-zero chance of being chosen
(ideally, all items have an equal chance), which helps to avoid bias.
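Under simple random sampling without replacement (the case assumed here), giving every possible sample the same chance means every item ends up in the sample with the same probability, n/N. The sketch below checks this by brute force on a hypothetical five-item sampling frame; `inclusion_probabilities` is an illustrative helper, not a library function.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_probabilities(frame, n):
    """Enumerate every size-n sample (all equally likely) and return, for each
    item, the exact probability that it appears in the chosen sample."""
    samples = list(combinations(frame, n))
    counts = {item: 0 for item in frame}
    for s in samples:
        for item in s:
            counts[item] += 1
    return {item: Fraction(c, len(samples)) for item, c in counts.items()}

frame = ["A", "B", "C", "D", "E"]  # hypothetical sampling frame, N = 5
probs = inclusion_probabilities(frame, 2)  # sample size n = 2
print(set(probs.values()))  # {Fraction(2, 5)}, i.e. n/N for every item
```

There are 10 possible samples of size 2, and each item appears in 4 of them, giving 4/10 = 2/5 for every item.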
Non-random sampling: Used when a sampling frame is not available
1. Random - Simple random sampling: Items are chosen by a random procedure, such as a random
name generator or picking names from a hat. All items must be named or numbered. Every
possible sample of a given size has the same chance of being chosen as any other sample of
that size.
○ Free of selection bias
○ Easy and cheap to implement
○ Each item has an equal chance of being picked
○ Sampling frame needed
○ Not suitable when population size is large
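Picking names from a hat can be mimicked in code once a sampling frame exists. A minimal sketch, assuming the frame is numbered 1 to 30 (a hypothetical class of students); `simple_random_sample` is an illustrative helper, while `random.Random` and its `sample` method are part of Python's standard library and select every size-n subset with equal probability.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    rng.sample gives every size-n subset of the frame the same chance of
    being chosen, like drawing n names from a hat."""
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the size of the sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: 30 students numbered 1 to 30.
frame = list(range(1, 31))
sample = simple_random_sample(frame, 5, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the example repeatable; in a real investigation the seed would normally be left unset so that each draw is different.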