UNIVERSITY OF CAPE TOWN
DEPARTMENT OF STATISTICAL SCIENCES
STA3022F
JUNE 2013 EXAMINATION
INTERNAL EXAMINERS: A/Prof S Lubbe, Dr Ş Er
INTERNAL ASSESSOR: A/Prof F Little
EXTERNAL EXAMINER: Dr T Berning
TOTAL MARKS: 100
TIME ALLOWED: 3 hours
PAGES: 14
INSTRUCTIONS: ANSWER EACH SECTION IN A SEPARATE BOOK.
ALL QUESTIONS MAY BE ATTEMPTED.
MARKS ARE ALLOCATED FOR INTERMEDIATE CALCULATIONS.
SECTION A: EXPLORATORY METHODS [Available marks: 51]
Question 1 [3 marks]
Consider a data set of the monthly inflation rate over several years for all African countries.
(a) Would you consider inflation rate to be a numerical, ordinal categorical or nominal variable? (1)
Answer: Numerical.
(b) If the inflation figure for Zimbabwe for June 2012 is missing, name one method of overcoming this
problem. (1)
Answer: Impute the missing value with the mean.
(c) You want to perform an analysis that requires normally distributed data. You plotted histograms of the
data, some of which are shown below. Since the data are very skewed, what can be done before embarking
on the analysis? (1)
Answer: Transform the data, for example with a log transformation.
[Figure: six histograms of the data, each showing Frequency against x.]
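The remedies named in (b) and (c) can be sketched in a few lines of Python. This is only an illustration: the inflation figures and the skewed sample below are made-up values, not taken from the exam data, and the helper names `impute_mean` and `skewness` are hypothetical. Note that a log transformation assumes strictly positive values.

```python
import math
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical monthly inflation rates with one missing month (None).
rates = [0.04, 0.06, None, 0.05, 0.09]
imputed = impute_mean(rates)

# Hypothetical right-skewed sample: exponentials of a symmetric set of values,
# so the log transformation restores symmetry.
skewed = [math.exp(k) for k in (-2, -1, -1, 0, 0, 0, 1, 1, 2)]
logged = [math.log(v) for v in skewed]

skew_raw = skewness(skewed)
skew_log = skewness(logged)
print(f"imputed series: {imputed}")
print(f"skewness before log: {skew_raw:.3f}, after log: {skew_log:.3f}")
```

Mean imputation is the simplest choice and keeps the sample mean unchanged, but it understates the variability of the series, so it is best treated as a first approximation.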
Question 2 [15 marks]
Consider the following contingency table and associated R output. In the table below, 300 people
were asked what bank they used most often and to identify the most important reason why they
chose that bank.
         Helpful  Enjoy        Branch close  Keeps me  Many  Competitive
         staff    advertising  to home       informed  ATMs  interest rates  TOTAL
FNB      14       14           2             23        7     12              72
ABSA     13       6            10            15        5     21              70
NEDBANK  11       16           8             4         10    8               57
STDBANK  12       14           30            8         28    9               101
TOTAL    50       50           50            50        50    50              300
> ca(bank.data)

Principal inertias (eigenvalues):
            1         2         3
Value       0.174676  0.046073  0.021778
Percentage  72.02%    19%       8.98%

Rows:
         FNB        ABSA       NEDBANK    STDBANK
Mass     0.240000   0.233333   0.190000   0.336667
ChiDist  0.542201   0.468606   0.383164   0.525126
Inertia  0.070556   0.051238   0.027895   0.092838
Dim. 1   -1.172400  -0.794016  0.278962   1.228645
Dim. 2   -0.781987  1.457919   -1.315956  0.289685

Columns:
         Helpful    Enjoy        Branch close  Keeps me   Many       Competitive
         staff      advertising  to home       informed   ATMs       interest rates
Mass     0.166667   0.166667     0.166667      0.166667   0.166667   0.166667
ChiDist  0.205443   0.400249     0.618174      0.614089   0.516271   0.476418
Inertia  0.007034   0.026700     0.063690      0.062851   0.044423   0.037829
Dim. 1   -0.427021  0.023291     1.378472      -1.336570  1.197046   -0.835218
Dim. 2   -0.278986  -1.788994    1.041542      0.087283   -0.301211  1.240367
(a) In order to test the hypothesis $H_0$: no significant association between bank and reason for using that
bank, give the test statistic and its associated distribution. Also define each of the symbols you use in the
definition of the test statistic and indicate how to calculate or obtain them. (4)
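Answer: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} (O_{ij} - E_{ij})^2 / E_{ij}$, which under $H_0$ follows a $\chi^2$ distribution with $(r-1)(c-1) = 15$ degrees of freedom. Here $O_{ij}$ is the observed count in row $i$ and column $j$ of the table, $E_{ij} = (\text{row } i \text{ total} \times \text{column } j \text{ total}) / n$ is the expected count, $n$ is the grand total, and $r = 4$ and $c = 6$ are the numbers of rows and columns.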
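As a rough check of the test statistic for this table, the following Python sketch computes the expected counts from the row and column totals and sums the chi-square contributions. The function name `chi_square` is illustrative only. Dividing the statistic by the grand total should reproduce the total inertia, which is the sum of the three principal inertias in the R output (0.174676 + 0.046073 + 0.021778 = 0.242527).

```python
# Observed counts from the contingency table (rows: banks, columns: reasons).
observed = {
    "FNB": [14, 14, 2, 23, 7, 12],
    "ABSA": [13, 6, 10, 15, 5, 21],
    "NEDBANK": [11, 16, 8, 4, 10, 8],
    "STDBANK": [12, 14, 30, 8, 28, 9],
}


def chi_square(table):
    """Return (statistic, degrees of freedom, grand total) for a two-way table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for o, col_total in zip(row, col_totals):
            e = row_total * col_total / n  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df, n


stat, df, n = chi_square(observed)
print(f"chi-square = {stat:.3f} on {df} df; total inertia = {stat / n:.6f}")
```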
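(b) Give an expression for the Pearson residuals calculated for a contingency table. (1)
Answer: $(O_{ij} - E_{ij}) / \sqrt{E_{ij}}$, where $O_{ij}$ and $E_{ij}$ are the observed and expected counts in cell $(i, j)$.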
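To make the residuals concrete, here is a minimal Python sketch that computes the Pearson residual for every cell of the bank table and reports the cell with the largest positive residual. The variable names are illustrative; the counts are those of the contingency table in this question.

```python
import math

banks = ["FNB", "ABSA", "NEDBANK", "STDBANK"]
reasons = [
    "Helpful staff",
    "Enjoy advertising",
    "Branch close to home",
    "Keeps me informed",
    "Many ATMs",
    "Competitive interest rates",
]
counts = [
    [14, 14, 2, 23, 7, 12],
    [13, 6, 10, 15, 5, 21],
    [11, 16, 8, 4, 10, 8],
    [12, 14, 30, 8, 28, 9],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
n = sum(row_totals)

# Pearson residual: (observed - expected) / sqrt(expected), cell by cell.
residuals = []
for row, row_total in zip(counts, row_totals):
    res_row = []
    for o, col_total in zip(row, col_totals):
        e = row_total * col_total / n
        res_row.append((o - e) / math.sqrt(e))
    residuals.append(res_row)

# The squared residuals add up to the chi-square statistic.
sum_of_squares = sum(r ** 2 for row in residuals for r in row)

largest = max(
    (residuals[i][j], banks[i], reasons[j])
    for i in range(len(banks))
    for j in range(len(reasons))
)
print(f"sum of squared residuals = {sum_of_squares:.3f}")
print(f"largest residual: {largest[1]} / {largest[2]} = {largest[0]:.3f}")
```

Large positive residuals mark bank and reason combinations that occur more often than independence would predict, which is the information the CA map displays.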
(c) Once the matrix of Pearson residuals is obtained, each entry is divided by the grand total from the
contingency table. Name and give a mathematical expression for the method of obtaining the CA map
coordinates from this matrix.
(2)
Answer: Singular value decomposition (SVD): $S = U D_\alpha V^T$.
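The link between the decomposition and the map coordinates can be written out as follows. This is a sketch in the usual correspondence analysis notation, which is an assumption here since the course notes may scale the matrix differently: $P$ is the table divided by the grand total, $r$ and $c$ are the row and column masses, and $D_r$ and $D_c$ are the diagonal matrices of those masses.

```latex
% Standardised residual matrix and its singular value decomposition
\[
  S = D_r^{-1/2}\,(P - r c^{T})\,D_c^{-1/2} = U D_\alpha V^{T},
  \qquad U^{T}U = V^{T}V = I
\]
% Principal inertias are the squared singular values
\[
  \lambda_k = \alpha_k^{2}
\]
% Standard coordinates of rows and columns
\[
  \Phi = D_r^{-1/2}\,U, \qquad \Gamma = D_c^{-1/2}\,V
\]
% Principal coordinates of rows and columns
\[
  F = \Phi D_\alpha, \qquad G = \Gamma D_\alpha
\]
```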
(d) Construct a CA map for the contingency table above. (5)
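One way to obtain plotting positions for the map is sketched below in Python. It assumes that the "Dim. 1" and "Dim. 2" rows of the R output are standard coordinates and that a symmetric map is wanted, in which both row and column points are drawn in principal coordinates (standard coordinate times the square root of the principal inertia). If a different map type is intended, the scaling changes. The helper name `to_principal` is hypothetical.

```python
import math

# First two principal inertias from the R output.
principal_inertias = [0.174676, 0.046073]

# Standard coordinates (Dim. 1, Dim. 2) copied from the R output.
row_standard = {
    "FNB": (-1.172400, -0.781987),
    "ABSA": (-0.794016, 1.457919),
    "NEDBANK": (0.278962, -1.315956),
    "STDBANK": (1.228645, 0.289685),
}
col_standard = {
    "Helpful staff": (-0.427021, -0.278986),
    "Enjoy advertising": (0.023291, -1.788994),
    "Branch close to home": (1.378472, 1.041542),
    "Keeps me informed": (-1.336570, 0.087283),
    "Many ATMs": (1.197046, -0.301211),
    "Competitive interest rates": (-0.835218, 1.240367),
}


def to_principal(standard):
    """Scale each dimension by its singular value (sqrt of principal inertia)."""
    singular_values = [math.sqrt(v) for v in principal_inertias]
    return {
        name: tuple(coord * sv for coord, sv in zip(coords, singular_values))
        for name, coords in standard.items()
    }


row_principal = to_principal(row_standard)
col_principal = to_principal(col_standard)

for name, (x, y) in {**row_principal, **col_principal}.items():
    print(f"{name:28s} x = {x:7.3f}  y = {y:7.3f}")
```

Plotting the printed (x, y) pairs on axes through the origin, with dimension 1 horizontal and dimension 2 vertical, gives the two-dimensional map.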
(e) What proportion of variance in the data are you displaying in your map in (d)? (1)
Answer: 72.02% + 19% = 91.02%
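The proportion displayed by a two-dimensional map follows directly from the principal inertias in the R output. A small Python sketch of the arithmetic:

```python
# Principal inertias (eigenvalues) from the R output.
principal_inertias = [0.174676, 0.046073, 0.021778]

total_inertia = sum(principal_inertias)
shares = [100 * value / total_inertia for value in principal_inertias]

# A two-dimensional map displays the first two dimensions.
displayed = shares[0] + shares[1]
print(f"shares: {[round(s, 2) for s in shares]}")
print(f"displayed in two dimensions: {displayed:.2f}%")
```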
(f) Which bank provides the most competitive interest rates? (1)
Answer: ABSA
(g) Why do customers tend to choose Nedbank? (1)
Answer: They enjoy the advertising.