When a researcher takes a single measure from each item in their sample, it is called
univariate data.
Univariate data: comprises only a single measurement from each individual in our sample.
Bivariate data: includes two measurements from each individual. Example: collecting the
health status and age of a person.
Multivariate data: more than two measurements from each individual. Example: age, health
status and BMI.
The people (or items) from whom data are collected are called cases.
On a computer, each case is conventionally placed in its own row, with the measures for
each case in columns.
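As a rough illustration of this layout, each case can be stored as one row and each measure as one column. The column names and all values below are invented for the example and are not taken from the notes.

```python
# Hypothetical cases: one row per person, one column per measure.
# Every value here is made up purely to show the layout.
columns = ["age", "health_status", "bmi"]
rows = [
    [34, "good", 22.5],
    [51, "fair", 27.1],
    [46, "poor", 30.4],
]

def column(name):
    """Return every case's value for one measure (i.e. one column)."""
    index = columns.index(name)
    return [row[index] for row in rows]

ages = column("age")
print(len(rows), "cases,", len(columns), "measures")
print(ages)
```

With three measures per case this would count as multivariate data; keeping only two of the columns would make it bivariate.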
When we have paired data, we can determine the relationship between two measures.
TIME-SERIES DATA: A SPECIAL KIND OF BIVARIATE DATA.
Some data are collected over a period of time and show the variation of the factor across
time. Example: rainfall in different years.
A time series is a set of ordered observations.
The observations are ordered in time (chronologically). Each observation is denoted by a
subscript.
A time series is always bivariate data (time and observation).
Place ‘time’ on the x-axis when charting time-series data.
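A minimal sketch of these points, assuming made-up yearly rainfall totals: each observation is paired with its time, the pairs are put in chronological order, and the subscript is simply the position in that order.

```python
# Made-up yearly rainfall totals (mm), deliberately listed out of order.
observations = [(2003, 610), (2001, 540), (2002, 720)]

# Tuples sort by their first element, so this orders the pairs by year.
series = sorted(observations)

times = [t for t, _ in series]    # would go on the x-axis of a chart
values = [v for _, v in series]   # would go on the y-axis

# The subscript i marks the position of each observation in time order.
for i, (t, v) in enumerate(series, start=1):
    print(f"y_{i} (year {t}) = {v}")
```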
Graphing paired data.
It is a good idea to make a scatterplot when trying to understand the relationship between
the two measures in paired data.
With the data from this tutorial about women and fertility, label each factor with a letter
and a number (a1, a2, a3, b1, b2, b3..)
Correlation is a technique designed to measure linear relationships in bivariate data.
Positive and negative relationships.
In a positive linear relationship, a positive change on the x-axis variable is linked with a
positive change in the y-axis variable.
Increase-increase.
This means the points on a scatterplot rise from left to right.
‘Centring’ means that the mean of a dataset is subtracted from each score in the dataset so
that the mean of the centred data is zero, but the scores are otherwise unchanged.
Centring helps make the quadrants in a scatterplot visible.
Quadrants are like the four regions of a Cartesian plane: the axes divide the plot into 4
parts.
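Centring can be sketched in a few lines. The function name `centre` and the scores are illustrative assumptions, not part of the notes.

```python
def centre(scores):
    """Subtract the mean from every score so the centred mean is zero."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]

# Made-up scores, used only for illustration.
scores = [2, 4, 6, 8]
centred = centre(scores)

print(centred)                       # [-3.0, -1.0, 1.0, 3.0]
print(sum(centred) / len(centred))   # 0.0
```

The gaps between neighbouring scores are still 2 after centring, which is what "otherwise unchanged" means here: only the location of the data moves.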
A negative linear relationship occurs when a positive change in the x-axis variable is linked
with a negative change in the y-axis variable.
Increase-decrease.
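If we centre the data, it is apparent that more data points fall in the upper left and lower
right quadrants (the quadrants where the centred x and y values have opposite signs, so
their product is negative).
The opposite holds for a positive correlation scatterplot: more points fall in the upper
right and lower left quadrants.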
If you centre bivariate data and make a scatterplot but cannot see either pattern (from lower
left to upper right quadrant, or from upper left to lower right quadrant) in the scatterplot,
then there may not be a linear relationship between the two variables you are plotting, or
the linear relationship may be weak.
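One way to check for these patterns without a plot is to count how many centred points land in each quadrant. This is only a sketch with made-up data; the helper names are hypothetical, and points lying exactly on an axis are left uncounted.

```python
def centre(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def quadrant_counts(x, y):
    """Count centred points per quadrant; points on an axis are skipped."""
    counts = {"upper_right": 0, "upper_left": 0,
              "lower_left": 0, "lower_right": 0}
    for cx, cy in zip(centre(x), centre(y)):
        if cx > 0 and cy > 0:
            counts["upper_right"] += 1
        elif cx < 0 and cy > 0:
            counts["upper_left"] += 1
        elif cx < 0 and cy < 0:
            counts["lower_left"] += 1
        elif cx > 0 and cy < 0:
            counts["lower_right"] += 1
    return counts

# Made-up paired data with a broadly positive trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
counts = quadrant_counts(x, y)
print(counts)
```

Here four of the six points fall in the lower left and upper right quadrants, which hints at a positive linear relationship. Roughly equal counts in all four quadrants would suggest a weak or absent linear relationship.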
Linear models and scatter.
A linear model is a straight line on a graph.
Model fit is indicated by how closely the scatterplot points cluster around this line.
Correlations are linear models of bivariate data.
When you want to correlate two variables, consider how useful a linear model is for
explaining the relationship.
Three common circumstances can reduce the value of the linear model that a correlation
implements:
(1). The data are not linear, but better fit some other relationship.
(2). There are outliers in the data that obscure the linear relationship.
(3). There is too much scatter (or spread) in the data to find a strong relationship.
The first two can be detected by inspecting a scatterplot.
Traditionally, an ellipse is fitted around the main oval portion of a scatterplot to give a
visual indication of spread.
The more elongated the ellipse (ie the longer the major axis in relation to the minor axis),
the smaller the spread of the data and the stronger the linear relationship between the two
variables.
Product-Moment correlation coefficient.
AKA Pearson’s correlation coefficient.
Standardised measure. Always between +1 and -1.
When there is a perfect positive relationship between two variables, Pearson’s r will be +1.
When there is a perfect negative relationship between two variables, Pearson’s r will be –1.
When there is no linear relationship between two variables, Pearson’s r will be 0.
$$ r = \frac{s_{xy}}{s_x \, s_y} $$
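In this formula, s_xy is the covariance of x and y, and s_x and s_y are their standard deviations. Written out in full, and assuming the usual sample definitions with an n - 1 divisor (the notes do not say which divisor is used, and it cancels in r either way):

```latex
s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}

r = \frac{s_{xy}}{s_x \, s_y}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```

The numerator is a sum of products of centred scores, so points in the upper right and lower left quadrants push r up, and points in the other two quadrants push it down.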
Meaning of r.
The product–moment correlation formula was designed to ensure that the value of r will fall
within the range of –1 to +1.
r = –1: perfect negative correlation (a perfect inverse relationship).
As x increases, y decreases.
TABLE 9.4.
Calculating Pearson’s r.
Correlation is not a robust measure: it is strongly affected by outlying points.
Use the formula for r given above.
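The calculation, and the effect of a single outlier, can be sketched as follows. The function name `pearson_r`, the n - 1 divisor and the data are assumptions made for illustration; they do not come from Table 9.4.

```python
import math

def pearson_r(x, y):
    """Pearson's r as covariance divided by the product of the SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations (n - 1 divisor assumed).
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Made-up data lying exactly on a rising straight line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)          # 1, up to floating-point rounding

# Same data with the last point replaced by an outlier.
y_outlier = [2, 4, 6, 8, -10]
r_outlier = pearson_r(x, y_outlier)  # about -0.447

print(round(r_clean, 3), round(r_outlier, 3))
```

Changing one point out of five is enough to turn a perfect positive correlation into a moderate negative one, which is why a scatterplot should be checked for outliers before r is interpreted.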
CH10: SIMPLE REGRESSION.
Regression analysis was named after a phenomenon called ‘regression to the mean’.
The principal concern of regression is to build a model of data that allows us to estimate
how new cases (ie those not included in the original data) will behave.
In statistics, regression to the mean refers to the observation that if a variable is extreme
on its first measurement, it is likely to be less extreme and closer to the mean on its second
measurement.
The simplest model of bivariate data is the linear model.
Regression analysis, like correlation analysis, starts with the inspection of a scatterplot
showing the relationship between two paired variables.
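Once the scatterplot suggests a straight-line pattern, a linear model can be fitted and used to estimate a new case. The sketch below assumes an ordinary least-squares fit, which these notes have not introduced at this point; the function name `fit_line` and the data are made up.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of centred scores, and sum of squared centred x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up paired data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = fit_line(x, y)

# Estimate a new case that was not in the original data.
new_x = 6
estimate = intercept + slope * new_x
print(slope, intercept, estimate)   # 2.0 1.0 13.0
```

Real data will scatter around the fitted line, so an estimate for a new case is a best guess under the linear model rather than an exact value.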
The x variable is known as the predictor, or independent variable.