STAT 1000
CHAPTER 1 - 4 SUMMARIES
DISCLAIMER:
These summaries are prepared with the intent to closely align with the content of the prescribed lecture notes. Full credit and
acknowledgment are given to the authors and publishers of the original material on which these summaries are based. All information and
images are taken directly from the authors of the lecture notes.
While every effort has been made to ensure accuracy, the author of these summaries cannot guarantee that the information contained
herein is entirely free from errors or omissions. As such, no liability is accepted for any inaccuracies, errors, or any resultant consequences
arising from the use of this material.
Users are encouraged to refer to the original textbook for comprehensive understanding and to verify any information. Any reliance you
place on the content of these summaries is strictly at your own risk.
Be advised that the distribution of these notes via any platform, including but not limited to email, WhatsApp, AirDrop, or any other
digital means, leaves an identifiable trail. Unauthorized distribution or reproduction of these materials may be considered a violation of
plagiarism and copyright laws in South Africa. Perpetrators found involved in such actions are liable to be prosecuted under the relevant
legal provisions. Exercise caution and respect intellectual property rights.
Unauthorised dissemination/distribution of this document is strictly prohibited.
By downloading this document, the purchaser acknowledges and agrees to the following terms:
The purchaser accepts full responsibility to refrain from distributing, sharing, or reproducing this document, in whole or in part, without the
express written consent of the author. Any unauthorized distribution is strictly prohibited.
Should the purchaser be found distributing or enabling the distribution of this document, they acknowledge that they may be subject to
legal action. The author reserves the right to take legal recourse, including but not limited to initiating court proceedings, against any
individual or entity found in violation of these terms.
Your download and use of this document indicate your acceptance of these terms and your understanding of the potential legal
consequences of non-compliance.
CHAPTER 1: DESCRIPTIVE STATISTICS
Introduction
• Statistics: a word commonly used to describe numerical information; more formally,
a field of study that deals with the collection, analysis, interpretation and presentation of numerical data
• The study of statistics is divided into two areas:
- Descriptive statistics: deals with results obtained from collecting data on a group of people/items and reaching
valid conclusions only on that group (sample)
- Inferential statistics: deals with results obtained from collecting data on a subset of people/items (sample) and
reaching conclusions about a full set of people/items (population), from which the sample was selected
Terminology
• Population: a complete or exhaustive collection of people, objects or items to be studied. The total number of items in a
population is referred to as the population size and is denoted with a capital letter N
• Sample: subset or portion of the population from which data are collected. The sample size is denoted with a lowercase
letter n (Sample = Statistic)
• Data: entire collection of information gathered from a sample
• Variable: property or characteristic of the items of the population that we would like to measure or record.
Variables are denoted by capital letters and their values by lowercase letters
• Categorical variables: variables that have two or more categories
- Nominal variables: categorical variables where the categories have no implicit ordering, such
as gender
- Ordinal variables: categorical variables that consist of categories which can be arranged in a
logical order
• Numerical variables: variables that yield numerical responses
- Discrete: numerical variable that can take on whole numbers only
- Continuous: numerical variable that can take on any value in an interval
• Parameter: descriptive summary measure calculated from population data (Parameter = Population).
Sample information is used to estimate population parameters, which forms part of inferential statistics
• Statistic: descriptive summary measure calculated from sample data
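The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The population values, the sample size and the choice of the mean as the summary measure are all hypothetical and chosen only for illustration; they do not come from the lecture notes.

```python
import random
from statistics import mean

# Hypothetical population: ages of all N = 10 members of a small club.
population = [21, 25, 19, 30, 22, 27, 24, 20, 23, 29]
N = len(population)  # population size, capital N

# Parameter: a summary measure calculated from the whole population.
population_mean = mean(population)

# Sample: a subset of n = 4 items selected from the population.
rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 4)
n = len(sample)  # sample size, lowercase n

# Statistic: the same summary measure calculated from the sample only.
sample_mean = mean(sample)

print(f"N = {N}, parameter (population mean) = {population_mean}")
print(f"n = {n}, statistic (sample mean) = {sample_mean}")
```

The sample mean will usually differ from the population mean; using the former to estimate the latter is the job of inferential statistics.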
Data Types
• Raw/ungrouped data
- the data as first collected
- records all information (variables) for all individuals (observations)
• Grouped data
- use raw data to construct a table where the frequency distribution of a variable is summarised
- format depends on nature of variable
- Grouped discrete data: frequency table of a discrete variable
- Grouped continuous data: frequency table of a continuous variable or a discrete variable grouped into intervals
Grouped discrete data
• Grouped discrete table lists all possible values of a discrete variable followed by the number of times each value
occurs in the sample
• First sort the data in ascending order to form an ordered array
• The count of how many times a value occurs is its frequency of occurrence, which tells how often an observed
value occurs within the data
• The collection of all frequencies describes how the data are distributed across the values of the variable and is
therefore termed the frequency distribution
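The steps above can be sketched in Python as follows. The raw data (number of siblings reported by 12 students) are hypothetical, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical raw data: number of siblings reported by n = 12 students.
raw = [2, 0, 1, 3, 1, 2, 1, 0, 2, 1, 4, 1]

# Step 1: sort the data in ascending order to form an ordered array.
ordered_array = sorted(raw)

# Step 2: count how many times each value occurs (its frequency).
frequency = Counter(ordered_array)

# Step 3: list each value with its frequency; together these rows
# form the frequency distribution of the variable.
table = sorted(frequency.items())

print("Value  Frequency")
for value, count in table:
    print(f"{value:>5}  {count:>9}")
```

The frequencies add up to the sample size n, which is a quick check that no observation was missed.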
Grouped continuous data
• If a discrete variable has a large range of values, or the data are continuous, a frequency table of individual
values will not adequately show the distribution of the variable
• First group the data into intervals (classes), then count the number of observations that fall into each class (frequency)
• Requirements for creating class intervals:
- classes should be convenient to work with
- keep classes equal in length to clearly indicate distribution
- first class must start below smallest value
- last class must end above largest value
- each class must consist of a lower limit and an upper limit
- classes MUST NOT overlap
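A minimal Python sketch of these requirements is given below. The data (commuting times in minutes), the starting point, the class width and the number of classes are hypothetical. The sketch also assumes that each class includes its lower limit and excludes its upper limit, which is one common way to guarantee that classes do not overlap; the lecture notes may use a different convention.

```python
# Hypothetical raw data: commuting times in minutes for n = 10 people.
raw = [12.5, 31.0, 18.2, 24.9, 9.8, 27.3, 15.0, 22.1, 38.4, 19.6]

start = 5        # first class starts below the smallest value (9.8)
width = 10       # all classes are equal in length
num_classes = 4  # last class ends at 45, above the largest value (38.4)

# Each class is a (lower limit, upper limit) pair.
classes = [(start + i * width, start + (i + 1) * width)
           for i in range(num_classes)]

# Check that the classes cover the whole range of the data.
assert classes[0][0] < min(raw) and classes[-1][1] > max(raw)

# Count the observations in each class. Assumed convention: lower <= x < upper,
# so a value on a boundary (such as 15.0) belongs to exactly one class.
frequencies = {c: 0 for c in classes}
for x in raw:
    for lower, upper in classes:
        if lower <= x < upper:
            frequencies[(lower, upper)] += 1
            break

print("Class      Frequency")
for (lower, upper), count in frequencies.items():
    print(f"{lower:>2} - <{upper:<3}  {count:>9}")
```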
Relation between Raw and Grouped Data
• All variables start as raw data and can be tabulated into either grouped discrete or grouped continuous data
• In grouped discrete data the original raw data are represented without loss of information, so it is possible to
recreate the raw data from grouped discrete data
• In grouped continuous data, grouping the data into classes leads to a loss of information because the individual
observed values are no longer represented, so we cannot recreate the original data from the tabulated data
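The contrast can be illustrated with a short Python sketch. All data and class limits are hypothetical, and the helper `group_into_classes` is written only for this illustration. Note that a grouped discrete table recovers the observed values but not the order in which they were collected.

```python
from collections import Counter

# Grouped discrete data: the table keeps every observed value.
raw = [2, 0, 1, 3, 1, 2, 1, 0, 2, 1]
grouped_discrete = Counter(raw)

# Expanding each value by its frequency recreates the ordered raw data.
recreated = sorted(grouped_discrete.elements())
print(recreated == sorted(raw))  # True: no information was lost

# Grouped continuous data: only the class counts are kept.
def group_into_classes(data, classes):
    """Count observations per class, assuming lower <= x < upper."""
    return [sum(lower <= x < upper for x in data) for lower, upper in classes]

classes = [(10, 15), (15, 20)]
raw_a = [10.2, 14.9, 15.1, 19.8]
raw_b = [12.0, 13.0, 17.5, 18.0]

# Two different raw data sets produce the same table, so the table
# alone cannot tell us which raw data it came from.
table_a = group_into_classes(raw_a, classes)
table_b = group_into_classes(raw_b, classes)
print(table_a, table_b, table_a == table_b)
```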