Statistics: the science of data; the study of collecting, organising, analysing, interpreting and presenting data.
Used to gain information about a group of people and/or objects (the population), to make decisions and predictions.
Population → choose subset → sample (has to be a good sample of the population)
Sample → collect data → sample data → analysis → draw conclusions
Census: collect data from the whole population instead of a sample.
Statistical study:
1. Prepare
   - Context
   - Source: how was the data collected?
   - Sampling method: how were the participants chosen?
2. Analyse
   - Graph data
   - Explore data
   - Apply statistical methods to further investigate data
3. Conclude
   - Statistical and critical thinking
Common flaws:
1. Bad sampling method
   Different sample → different data → different conclusions
   Sample should be representative and unbiased
   - Representative: same characteristics as population
   - Unbiased: no systematic difference with population
2. Misleading data presentation
   → e.g. y-axis not starting from 0
3. Correlation does not imply causation
Collecting sample data
Different sampling methods:
- Voluntary response sample: subjects decide themselves to be included in the sample (biased)
- Random sample: each member of the population has equal probability of being selected (can be biased)
- Simple random sample: each sample of size n has equal probability of being chosen (unbiased and very representative, but hard to do with a large population)
- Systematic sampling: after a starting point, select every k-th member
- Stratified sampling: divide population into subgroups such that subjects within groups have the same characteristics, then draw a simple random sample from each group
- Cluster sampling: divide population into clusters, then randomly select some of these clusters (can lead to biased data)
- Convenience sampling: easily available results (biased; ok for pilots)
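
A minimal sketch of four of the sampling methods above, using only Python's standard `random` module. The population (members numbered 1 to 100), the two strata, the ten clusters and all function names are hypothetical and chosen only for illustration; the fixed seed is there to make the run reproducible.

```python
import random

def simple_random_sample(population, n, rng):
    """Every sample of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Random starting point among the first k members, then every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each subgroup (stratum)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)              # fixed seed (assumption, for reproducibility)
population = list(range(1, 101))     # hypothetical population: members 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

# Hypothetical strata: members are assumed to be similar within each half.
strata = {"low": population[:50], "high": population[50:]}
strat = stratified_sample(strata, 5, rng)

# Hypothetical clusters: ten blocks of ten consecutive members.
clusters = {f"c{i}": population[i * 10:(i + 1) * 10] for i in range(10)}
clus = cluster_sample(clusters, 2, rng)
```

Note the difference between the last two: stratified sampling takes a few members from every subgroup, while cluster sampling takes all members of a few subgroups.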
Important concepts
Variable: varying quantity
In cause and effect studies:
- Response (dependent) variable: representing the effect to study
- Explanatory (independent) variable: possibly causing that effect
- Confounding: mixing influence of several explanatory variables on the response
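
Confounding can be illustrated with a small simulation. This is only a sketch under made-up assumptions: a hypothetical variable `z` influences both `x` and `y`, while `x` and `y` have no direct link to each other. All names, coefficients and noise levels are invented for the example, and `pearson` is a hand-written helper.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)  # fixed seed (assumption, for reproducibility)

# z influences both x and y; x never enters the formula for y.
z = [rng.uniform(0, 30) for _ in range(5000)]
x = [2.0 * zi + rng.gauss(0, 5) for zi in z]
y = [0.5 * zi + rng.gauss(0, 2) for zi in z]

r_all = pearson(x, y)  # strong correlation, caused entirely by z

# Hold z roughly constant: keep only observations with 14 <= z < 16.
band = [(xi, yi) for xi, yi, zi in zip(x, y, z) if 14 <= zi < 16]
r_band = pearson([p[0] for p in band], [p[1] for p in band])

print(f"all data:      r = {r_all:.2f}")
print(f"z held steady: r = {r_band:.2f}")
```

In this setup `x` and `y` are strongly correlated over the whole data set, but the correlation mostly disappears once `z` is held roughly constant, which is one way to see that correlation does not imply causation.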
Different types of study
Observational study: characteristics of subjects are observed; subjects are not modified
- Retrospective (case-control): data from the past
- Cross-sectional: data from one point in time
- Prospective (longitudinal): data are to be collected
Experiment: some subjects receive a treatment
- Sometimes control group and treatment group: single-blind (participant doesn't know) or double-blind (participant + researcher don't know)
- To measure placebo effect or experimenter effect
Types of data:
- Parameter: numerical measurement describing a population's characteristic. Notation: typically Greek symbols, e.g. µ, σ.
- Statistic: numerical measurement describing a sample's characteristic. Notation: small letters, e.g. x̄, s.
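
To make the parameter/statistic distinction concrete, here is a sketch that computes both on simulated data with Python's standard `statistics` module. The population (10,000 simulated heights in cm) and the sample size of 50 are assumptions made up for the example; in practice the population values are usually unknown, which is why statistics are used to estimate parameters.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (assumption, for reproducibility)

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameters: describe the whole population.
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics: describe a sample drawn from that population.
sample = rng.sample(population, 50)    # simple random sample of size 50
x_bar = statistics.fmean(sample)       # sample mean
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

A different sample would give slightly different values of `x_bar` and `s`, while `mu` and `sigma` stay fixed.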
- Qualitative (categorical): names or labels (do not represent counts or measurements)
- Quantitative (numerical): numbers represent counts or measurements
  - Discrete: the set of possible values is countable (e.g. number of siblings)
  - Continuous: the set of possible values is uncountable (e.g. weight)
Level of measurement of data determines which statistical methods are applicable
Qualitative data:
- Nominal: names, labels, categories (no ordering)
- Ordinal: categories with ordering, no differences (e.g. grades)
Quantitative data:
- Interval: ordering possible and meaningful differences (e.g. YOB, year of birth)
- Ratio: ordering possible and meaningful differences, natural starting point (e.g. marathon times)
Summarising and Graphing Data
Choose the summary most suitable for the research question
Often interested in data distribution
A good summary shows:
- Characteristics of the data distribution: location, spread, range, extremes, accumulations, gaps, symmetry
Depending on context and goal:
- Are data sampled from a certain distribution?
- Are different groups needed for further analysis?
- Are there influences of other variables?
- Is there dependence between variables?
Summarise → describe/find structure in data distribution:
- Graphical: tables, graphs, other figures
- Descriptive:
  - Qualitative: describe location, dispersion/variation and shape
  - Quantitative: numerical summaries of location/variation
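
As a sketch of numerical summaries of location and variation, the following uses Python's standard `statistics` module on a small made-up data set. The marathon times are hypothetical, and the choice of summaries is only one reasonable selection.

```python
import statistics

# Hypothetical marathon times in minutes (ratio-level data).
times = [185, 192, 201, 201, 214, 230, 246, 305]

summary = {
    # Location
    "mean": statistics.mean(times),
    "median": statistics.median(times),
    "mode": statistics.mode(times),
    # Extremes and spread
    "min": min(times),
    "max": max(times),
    "range": max(times) - min(times),
    "stdev": statistics.stdev(times),  # sample standard deviation (n - 1)
}

for name, value in summary.items():
    print(f"{name:<7}{value:>8.2f}")
```

Here the mean (221.75) lies above the median (207.5), which hints that the distribution is not symmetric: the single slow time of 305 pulls the mean upwards.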
Graphical summaries:
- Frequency distribution: count occurrences
- Cumulative frequencies and relative (cumulative) frequencies
- Bar chart, cumulative bar chart and Pareto bar chart, circle diagram
- Histogram (can only be used for quantitative data)
- Time series plot: visualize time-varying quantity
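
A frequency distribution with cumulative and relative (cumulative) frequencies can be tabulated as in the sketch below, which uses only the standard library. The list of grades is made-up example data, and ordering the bars by decreasing frequency is shown as the ordering used for a Pareto bar chart.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ordinal data: grades of ten students.
grades = ["B", "A", "C", "B", "B", "D", "A", "C", "B", "C"]

counts = Counter(grades)                  # frequency: count occurrences
categories = sorted(counts)               # ordinal order A < B < C < D
freq = [counts[c] for c in categories]
cum_freq = list(accumulate(freq))         # cumulative frequencies
n = len(grades)
rel_freq = [f / n for f in freq]          # relative frequencies
rel_cum_freq = [c / n for c in cum_freq]  # relative cumulative frequencies

# Columns: category, frequency, cumulative, relative, relative cumulative.
for row in zip(categories, freq, cum_freq, rel_freq, rel_cum_freq):
    print("{:<3}{:>5}{:>5}{:>7.2f}{:>7.2f}".format(*row))

# Pareto bar chart ordering: categories sorted by decreasing frequency.
pareto_order = [c for c, _ in counts.most_common()]
```

Cumulative frequencies only make sense when the categories have an ordering, so they apply to ordinal or quantitative data, not to nominal data.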