In the formula for r, what is one function of the SDs in the denominator? - correct answer
✔✔The SDs in the denominator are there to ensure that r stays in the range from minus one to
plus one.
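Why is the numerator of Pearson's r referred to as the "Covariance"? - correct answer ✔✔In its
form, the Covariance resembles a Variance (an average of products of deviations from the mean,
here taken across two variables rather than one), and it shows how the two variables move together.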
In the formula for r, how is evidence of a positive association tallied up? Of a negative
association? - correct answer ✔✔Cases in which both variables score above their means suggest
a positive association, as do cases in which both lie below their means. In either case, the
product of the two deviations from the mean will be positive.
Cases in which the two variables are on opposite sides of their means suggest a negative
association, and the products of their deviations will be negative.
Starting from a formula for r that does not use algebraic notation, show what happens when the
variables are standardized. - correct answer ✔✔r = Cov(x, y) / (SD(x) SD(y))
When the variables are standardized, this turns into:
r = ∑ zₓzᵧ / N
In words, this is the mean of the products of the standardized variables.
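The two forms of the formula can be checked against each other numerically. The following is a minimal sketch, assuming population SDs (dividing by N, to match the "/N" in the standardized form); the function names and the scores are made up for illustration.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # Population SD: divide by N, matching the "/N" in the standardized formula.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def covariance(x, y):
    # Mean of the products of deviations from the two means.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def pearson_r(x, y):
    # r = Cov(x, y) / (SD(x) SD(y))
    return covariance(x, y) / (sd(x) * sd(y))

def pearson_r_from_z(x, y):
    # r = sum(z_x * z_y) / N, the mean product of standardized scores.
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    zx = [(a - mx) / sx for a in x]
    zy = [(b - my) / sy for b in y]
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_cov = pearson_r(x, y)
r_z = pearson_r_from_z(x, y)
print(round(r_cov, 4), round(r_z, 4))
```

Both routes give the same value because dividing each deviation by its SD before multiplying is the same as dividing the covariance by the product of the SDs afterwards.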
What are two ways to interpret Pearson's r? - correct answer ✔✔Pearson's measure tells us
how much of an SD of change we get in one variable for an SD of change in the other. If we
square r, we obtain a PRE (proportional reduction in error) measure, which tells us the proportion of Variance in one variable that
can be predicted from the other.
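The second interpretation can be illustrated with a small calculation. This sketch assumes that "PRE" is read as proportional reduction in error and that the prediction rule is the least-squares line; the function names and scores are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def r_squared(x, y):
    # Square of Pearson's r, computed from sums of squares and cross-products.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def pre_from_regression(x, y):
    # E1: squared error when every y is predicted by the mean of y.
    # E2: squared error when y is predicted from x by the least-squares line.
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    e1 = sum((b - my) ** 2 for b in y)
    e2 = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return (e1 - e2) / e1

# Hypothetical scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

pre = pre_from_regression(x, y)
r2 = r_squared(x, y)
print(round(pre, 4), round(r2, 4))
```

With these made-up scores, knowing x removes 60% of the squared prediction error, which is the same figure as r squared.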
What is the difference between Spearman's ρ and Pearson's r? - correct answer ✔✔Spearman's
measure is calculated using the ranks of cases rather than their scores.
Spearman - for ordinal scales - measures the strength of the relationship between 2 variables
Pearson - for interval scales - measures the strength of the correlation - most widely used
What do we do before calculating rho if more than one case lies in a category? - correct answer
✔✔If several scores lie in the same category, we suppose that, with finer measurement, they
could be distinguished, and we take the median rank that would then be found for the set.
We assign each case in the category the median rank for the category.
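A minimal sketch of this procedure follows, assuming rho is obtained by applying Pearson's formula to the ranks. The helper names and the scores are hypothetical. For a run of tied cases the ranks they would occupy are consecutive, so the median rank coincides with the mean rank.

```python
from math import sqrt
from statistics import median

def median_ranks(scores):
    # Rank cases from lowest to highest; tied cases all get the median
    # of the ranks they would occupy if they could be distinguished.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        tied_rank = median(range(start + 1, end + 2))
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_rho(x, y):
    # Pearson's r calculated on the ranks rather than the scores.
    return pearson_r(median_ranks(x), median_ranks(y))

# Hypothetical scores: the two cases scoring 20 share ranks 2 and 3.
tied = median_ranks([10, 20, 20, 30])
print(tied)

# A curved but perfectly monotonic relationship.
rho = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
print(rho)
```

Because only the ordering matters, the curved relationship in the second example still yields a rho of 1.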
Why might we find entries only in the lower triangle of a correlation matrix? - correct answer
✔✔The entries in the upper triangle will be the same as corresponding entries in the lower
triangle, and so are unnecessary. The entries on the diagonal represent rs between variables
and themselves, which are of course 1.00, so they are also unnecessary.
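The redundancy can be seen by computing the entries directly. This is a small sketch with three hypothetical variables; the variable names, the data, and the `lower_triangle` helper are made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lower_triangle(names, columns):
    # Keep only pairs where the row variable comes after the column variable.
    entries = []
    for i in range(1, len(names)):
        for j in range(i):
            entries.append((names[i], names[j], pearson_r(columns[i], columns[j])))
    return entries

# Hypothetical data, for illustration only.
names = ["income", "education", "age"]
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 5, 4, 5],
    [5, 3, 4, 1, 2],
]

entries = lower_triangle(names, columns)
for row_name, col_name, r in entries:
    print(f"{row_name:10s} {col_name:10s} {r: .3f}")

# The mirrored entry and the diagonal entry add no information.
r_ab = pearson_r(columns[0], columns[1])
r_ba = pearson_r(columns[1], columns[0])
r_aa = pearson_r(columns[0], columns[0])
```

For k variables the lower triangle holds k(k-1)/2 distinct correlations, here 3.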
What is a scatterplot? What are some alternatives? - correct answer ✔✔A scatterplot is a graph
in which cases are placed at points corresponding to the scores on two variables, one plotted on
each axis. An alternative would be a boxplot.
What is a moving average? - correct answer ✔✔A moving average is an average of the values
for two or more points in time. The points used shift as we move forward in time, or across a
graph.
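As a short illustration, the sketch below averages each run of `window` consecutive points; the function name, the default window of 3, and the series are assumptions made for the example.

```python
def moving_average(values, window=3):
    # Average each run of `window` consecutive points; the run shifts
    # forward one point at a time.
    if window < 2 or window > len(values):
        raise ValueError("window must be between 2 and the series length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical series, for illustration only.
series = [2, 4, 6, 8, 10]
smoothed = moving_average(series, window=3)
print(smoothed)
```

The smoothed series is shorter than the original by `window - 1` points, since the first and last points do not have a full set of neighbours.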
What are the two advantages of a bar chart over a line graph? Two disadvantages? - correct
answer ✔✔In a bar chart, the level top of the bar can make it easier to estimate a value on the
y-axis, and, particularly if coloured, a bar chart can have greater visual impact. On the other
hand, bars can break up the flow of a line, and require us to use more ink to express the same
information as in a line graph.
What is a mosaic plot? Why are the rectangles in the plot different sizes? Why do we care about
the "Pearson's residuals"? - correct answer ✔✔In a mosaic plot, each cell is represented by a rectangle whose
area is proportional to the number of cases in the cell, so cells with more cases get larger rectangles.
Pearson's residuals tell us whether the observed cell value is greater or smaller than the
expected value.
Why are some cells shaded or patterned differently? Why might we be interested in a cell that is
particularly heavy (dark) or particularly light? - correct answer ✔✔The shading and patterns
represent the standardized residuals: darker, heavier shading marks a larger difference
between the observed and expected values.
If we wanted to percentage the table showing vote by region, discussed above, in which
direction would we do this? How would we then interpret the differences in
percentages? - correct answer ✔✔We percentage down the columns rather than across the
rows. If we have percentages down the columns, we can see how the figures change as we
move across, from one category of the IV to another.
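A minimal sketch of percentaging down the columns follows. The counts, region labels, and party labels are hypothetical and are not the figures from the vote-by-region table; the layout assumes the IV (region) defines the columns.

```python
def column_percentages(columns):
    # Each column (one category of the IV) is percentaged on its own total,
    # so every column sums to 100.
    result = {}
    for column_name, cells in columns.items():
        total = sum(cells.values())
        result[column_name] = {
            row_name: 100 * count / total for row_name, count in cells.items()
        }
    return result

# Hypothetical counts, for illustration only.
table = {
    "Region A": {"Party X": 30, "Party Y": 70},
    "Region B": {"Party X": 60, "Party Y": 40},
}

percentages = column_percentages(table)
for region, cells in percentages.items():
    print(region, cells)

# Compare across the columns, within one row of the DV.
difference = percentages["Region B"]["Party X"] - percentages["Region A"]["Party X"]
print(difference)
```

Reading across the "Party X" row in this made-up table, support is 30 percentage points higher in Region B than in Region A.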
What measure do we typically use to identify heavy cells? How is it related to chi-square? What
values of the measure are we typically interested in? - correct answer ✔✔- We typically use the
standardized residual.
- It is the signed square root of the cell's contribution to chi-square.
- We typically look for values of at least +2 or at most -2, but for tables based on large samples, which
may lead to many values beyond these, we tend to look only at the largest residuals.
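The relationship to chi-square can be sketched as follows, assuming the standardized residual is computed as (observed - expected) / sqrt(expected), which is the signed square root of the cell's contribution. The function names and the counts are hypothetical.

```python
from math import sqrt

def standardized_residuals(observed):
    # Expected count for a cell = row total * column total / N.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            residual_row.append((count - expected) / sqrt(expected))
        residuals.append(residual_row)
    return residuals

def chi_square(observed):
    # Chi-square is the sum of the squared standardized residuals.
    return sum(r ** 2 for row in standardized_residuals(observed) for r in row)

def flagged_cells(observed, cutoff=2.0):
    # Cells whose residual is at least +cutoff (heavy) or at most -cutoff (light).
    flagged = []
    for i, row in enumerate(standardized_residuals(observed)):
        for j, r in enumerate(row):
            if abs(r) >= cutoff:
                flagged.append((i, j, "heavy" if r > 0 else "light"))
    return flagged

# Hypothetical 2x2 table of counts, for illustration only.
observed = [
    [30, 10],
    [20, 40],
]

residuals = standardized_residuals(observed)
chi2 = chi_square(observed)
flagged = flagged_cells(observed)
print(residuals)
print(round(chi2, 3))
print(flagged)
```

In this made-up table only the two cells in the first row pass the cutoff of 2: one has more cases than expected and the other fewer.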
What is an association plot? What is the difference between rectangles above a line and those
below? - correct answer ✔✔An association plot shows us which cells in a table are heavy and
which are light, and thus informs us of the key aspects of how two variables are related.
Rectangles above the line represent heavy cells, and rectangles below it represent light cells.
What are conditional tables? What is another name for them? - correct answer ✔✔- They are used to
check how an association might change when a third variable is controlled.
- Each table includes only people for whom the value of the third variable, called a "test factor",
is fixed.
- They are also called "partial tables".
How do conditional tables "control for" third variables? - correct answer ✔✔They examine the
relationship between two variables while holding the third variable fixed.
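Holding a third variable fixed amounts to building one two-way table per value of that variable. The sketch below assumes cases stored as dictionaries; the variable names ("sex", "region", "vote"), the cases, and the `conditional_tables` helper are all hypothetical.

```python
from collections import Counter, defaultdict

def conditional_tables(cases, iv, dv, test_factor):
    # One table of (IV category, DV category) counts for each value of
    # the test factor; within a table the test factor does not vary.
    tables = defaultdict(Counter)
    for case in cases:
        tables[case[test_factor]][(case[iv], case[dv])] += 1
    return dict(tables)

# Hypothetical cases, for illustration only.
cases = [
    {"sex": "F", "region": "North", "vote": "X"},
    {"sex": "F", "region": "North", "vote": "X"},
    {"sex": "F", "region": "South", "vote": "Y"},
    {"sex": "M", "region": "North", "vote": "Y"},
    {"sex": "M", "region": "South", "vote": "Y"},
    {"sex": "M", "region": "South", "vote": "X"},
]

tables = conditional_tables(cases, iv="region", dv="vote", test_factor="sex")
for value, table in sorted(tables.items()):
    print(value, dict(table))
```

Each resulting table can then be percentaged and compared with the original two-way table to see whether the association changes once the test factor is fixed.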
What, for the Columbia School, was a "test factor"? - correct answer ✔✔A third
variable whose value is held fixed within each conditional table.
What is a doubledecker? What does the width of the bars tell us? - correct answer ✔✔A doubledecker is a graph
in which the two variables are identified in the layers at the bottom.
The width of the bars reflects the size of the subgroups.
Define the following terms: specification, moderation, spurious relationship, distortion,
intervening variable, mediator - correct answer ✔✔Specification: exists when the association
between two variables is different for subsamples with different values of a third variable.
Moderation: another term for specification.