In the formula for r, what is one function of the SDs in the denominator? - Answers The SDs in the
denominator are there to ensure that r stays in the range from minus one to plus one.
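Why is the numerator of Pearson's r referred to as the "Covariance"? - Answers In its form, the
Covariance resembles a Variance: it is a mean of products of deviations from the mean, but each product
pairs a deviation on one variable with a deviation on the other instead of squaring a single deviation.
It shows how the two variables move together.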
In the formula for r, how is evidence of a positive association tallied up? Of a negative association? -
Answers Cases in which both variables score above their means suggest a positive association, as do
cases in which both lie below their means. In either case, the product of the two scores (measured as
deviations from their means) will be positive.
Cases in which the two variables are on opposite sides of their means suggest a negative association, and
the products of their scores will be negative.
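The tallying described above can be sketched in a few lines of Python. The scores are hypothetical, invented only for illustration. Each case contributes the product of its two deviations from the means, and the mean of those products is the Covariance.

```python
# Hypothetical scores for five cases on two variables (illustration only).
x = [2, 4, 6, 8, 10]
y = [1, 4, 2, 5, 3]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# One product of deviations per case: positive when the case lies on the same
# side of both means, negative when it lies on opposite sides.
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]

# The Covariance is the mean of these products (dividing by N).
covariance = sum(products) / len(products)

for xi, yi, p in zip(x, y, products):
    print(f"x={xi}, y={yi}, product of deviations={p}")
print("Covariance:", covariance)
```

In this made-up data the first case is below both means and adds a positive product, while the second case is below the mean on x but above the mean on y and adds a negative one.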
Starting from a formula for r that does not use algebraic notation, show what happens when the
variables are standardized. - Answers r = Cov(x,y)/(SD(x)SD(y))
When the variables are standardized, it turns into:
r = ∑ zₓzᵧ / N
In words, this is the mean of the products of the standardized variables.
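As a check on the two forms of the formula, the sketch below computes r both ways on hypothetical scores. It assumes SDs that divide by N (not N - 1), which is what makes the mean of the z-score products equal to r. The function names are illustrative, not from the source.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # Population SD: divides by N, matching the "/N" in the formula.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def r_from_covariance(x, y):
    # r = Cov(x,y) / (SD(x) SD(y))
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sd(x) * sd(y))

def r_from_z_scores(x, y):
    # r = mean of the products of the standardized scores
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(p * q for p, q in zip(zx, zy)) / len(x)

# Hypothetical scores (illustration only).
x = [2, 4, 6, 8, 10]
y = [1, 4, 2, 5, 3]
print(r_from_covariance(x, y), r_from_z_scores(x, y))
```

Both functions should agree up to floating-point rounding, because dividing each deviation by its SD before multiplying is the same as dividing the Covariance by the product of the SDs afterwards.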
What are two ways to interpret Pearson's r? - Answers Pearson's r tells us how many SDs of change we
predict in one variable for each SD of change in the other. If we square r, we obtain a PRE (proportional
reduction in error) measure, which tells us the proportion of Variance in one variable that can be
predicted from the other.
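Both interpretations can be checked numerically. The sketch below uses hypothetical scores and assumes a least-squares line predicting y from x. It shows that the slope expressed in SD units equals r, and that the proportional reduction in squared error equals r squared. Variable names such as `e1` and `e2` are illustrative.

```python
import math

# Hypothetical scores (illustration only).
x = [2, 4, 6, 8, 10]
y = [1, 4, 2, 5, 3]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r = cov / (sd_x * sd_y)

# Least-squares line for predicting y from x.
slope = cov / sd_x ** 2
intercept = mean_y - slope * mean_x

# Interpretation 1: SDs of change in y per SD of change in x.
standardized_slope = slope * sd_x / sd_y

# Interpretation 2: PRE. e1 is the squared error when predicting the mean of y
# for every case; e2 is the squared error when predicting from x.
e1 = sum((b - mean_y) ** 2 for b in y)
e2 = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
pre = (e1 - e2) / e1

print("r:", r, "standardized slope:", standardized_slope)
print("r squared:", r ** 2, "PRE:", pre)
```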
What is the difference between Spearman's ρ and Pearson's r? - Answers Spearman's measure is
calculated using the ranks of cases rather than their scores.
Spearman - for ordinal scales - measures the strength of the relationship between two variables
Pearson - for interval scales - measures the strength of the correlation - most widely used
What do we do before calculating rho if more than one case lies in a category? - Answers If several
scores lie in the same category, we suppose that, with finer measurement, they could be distinguished,
and we take the median rank that would then be found for the set.
We assign each case in the category the median rank for the category.
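A minimal sketch of this procedure, under stated assumptions: tied cases share the median of the ranks they would occupy, and rho is then computed as Pearson's r on the two sets of ranks. The helper names `tied_ranks` and `spearman_rho` are hypothetical, and the data are made up.

```python
import math

def tied_ranks(values):
    # Rank from lowest (1) to highest; tied cases share the median of the
    # ranks they would have occupied with finer measurement.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Median of the consecutive ranks start+1 .. end+1.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def spearman_rho(x, y):
    # Spearman's rho as Pearson's r applied to the ranks.
    return pearson_r(tied_ranks(x), tied_ranks(y))

print(tied_ranks([10, 20, 20, 30]))  # the two tied cases share rank 2.5
print(spearman_rho([1, 2, 2, 3, 3], [10, 30, 20, 50, 40]))
```

For a set of consecutive ranks the median and the mean coincide, so this gives the same shared ranks as the "average rank" convention.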
Why might we find entries only in the lower triangle of a correlation matrix? - Answers The entries in the
upper triangle will be the same as corresponding entries in the lower triangle, and so are unnecessary.
The entries on the diagonal represent rs between variables and themselves, which are of course 1.00, so
they are also unnecessary.
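To illustrate why the lower triangle is enough, the sketch below computes r only for pairs in which the row variable comes after the column variable. The three variables and the helper `lower_triangle` are hypothetical.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def lower_triangle(variables):
    # Keep only entries below the diagonal: each pair of variables appears once.
    names = list(variables)
    entries = {}
    for i, row in enumerate(names):
        for col in names[:i]:
            entries[(row, col)] = pearson_r(variables[row], variables[col])
    return entries

# Hypothetical scores on three variables (illustration only).
scores = {
    "a": [2, 4, 6, 8, 10],
    "b": [1, 4, 2, 5, 3],
    "c": [9, 7, 8, 4, 5],
}

lower = lower_triangle(scores)
for (row, col), r in lower.items():
    print(f"{row} with {col}: {r:.2f}")
```

Three variables give three entries rather than nine, since r for a with b equals r for b with a, and r for any variable with itself is 1.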
What is a scatterplot? What are some alternatives? - Answers A scatterplot is a graph in which cases are
placed at points corresponding to the scores on two variables, one plotted on each axis. An alternative
would be a boxplot.
What is a moving average? - Answers A moving average is an average of the values for two or more
points in time. The points used shift as we move forward in time, or across a graph.
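A simple (unweighted) moving average can be sketched as follows. The series is hypothetical, and the function name and `window` parameter are illustrative.

```python
def moving_average(values, window):
    # Average each run of `window` consecutive points; the run shifts forward
    # by one point at a time.
    if window < 2 or window > len(values):
        raise ValueError("window must be between 2 and the number of points")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical values at five points in time (illustration only).
series = [2, 4, 6, 8, 10]
print(moving_average(series, 3))
```

With a window of three, the first average covers points 1 to 3, the second covers points 2 to 4, and so on, so the result has two fewer entries than the original series.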
What are the two advantages of a bar chart over a line graph? Two disadvantages? - Answers In a bar
chart, the level top of the bar can make it easier to estimate a value on the y-axis, and, particularly if
coloured, a bar chart can have greater visual impact. On the other hand, bars can break up the flow of a
line, and require us to use more ink to express the same information as in a line graph.
What is a mosaic plot? Why are the rectangles in the plot different sizes? Why do we care about the
"Pearson's residuals"? - Answers Each cell is represented by a rectangle whose area is proportional to
the number of cases in the cell, so cells with more cases get larger rectangles.
Pearson's residuals tell us whether the observed cell value is greater or smaller than the expected value.
Why are some cells shaded or patterned differently? Why might we be interested in a cell that is
particularly heavy (dark) or particularly light? - Answers The shading and patterns represent the
standardized residuals; dark (heavy) cells mark a larger difference between the observed
and expected values.
If we wanted to percentage the table showing vote by region, discussed above, in which direction
would we do this? How would we then interpret the differences in percentages? - Answers We
percentage down the columns rather than across the rows. If we have percentages down the columns,
we can see how the figures change as we move across, from one category of the IV to another.
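The vote-by-region table itself is not reproduced here, so the sketch below uses invented counts purely to illustrate percentaging down the columns. It assumes region is the IV and sits in the columns, and the party and region labels are hypothetical.

```python
# Hypothetical counts (NOT the table discussed in the text): columns are the
# categories of the IV (region), rows are the categories of the DV (vote).
counts = {
    "North": {"Party A": 30, "Party B": 70},
    "South": {"Party A": 60, "Party B": 40},
}

def column_percentages(table):
    # Percentage down each column: every column sums to 100.
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * count / total for row, count in cells.items()}
    return result

percentages = column_percentages(counts)

# Compare across the columns within one row of the DV.
difference = percentages["South"]["Party A"] - percentages["North"]["Party A"]
print(percentages)
print("Percentage-point difference for Party A:", difference)
```

Reading across a row then shows how the share of that vote changes from one category of the IV to the next.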
What measure do we typically use to identify heavy cells? How is it related to chi-square? What values
of the measure are we typically interested in? - Answers - We typically use the standardized residual.
- It is the signed square root of the cell's contribution to chi-square.
- We typically look for values of at least +2 or at most -2, but for tables based on large samples, which
may lead to many values beyond this, we tend to look only at the largest residuals.