ST362A & ST562A Regression Analysis Assignment 2 Solutions
Due: Wednesday, October 6th, 2021 (11:59pm)
ST362A: Questions 1-6, with total 51 marks
ST562A: Questions 1-7, with total 61 marks
Note: The students in ST362A can work on the questions assigned to ST562A,
and receive bonus marks if the answers are correct.
Part I: R Programming
1. (12 marks) A study was made on the effect of temperature on the yield of
a chemical process. The following data were collected:
x -5 -4 -3 -2 -1 0 1 2 3 4 5
Y 1 5 4 7 10 8 9 13 14 13 18
Please use R to construct a linear model, and retrieve results from the outputs
to answer the questions below. Note: R code must be provided for this
question.
(a). (2 marks) Assuming a model Y = β0 + β1 x + ε, what are the least squares
estimates of β0 and β1? What is the fitted regression line?
(b). (2 marks) Construct an ANOVA table and test the hypothesis H0 : β1 =
0 with the significance level α = 0.05.
(c). (2 marks) Construct a 95% confidence interval for the true mean value
of Y when x = 3. Interpret this interval.
(d). (2 marks) Construct a 95% confidence interval for the difference between
the true mean value of Y when x1 = 3 and the true mean value of Y
when x2 = −2.
(e). (2 marks) Are there any indications that a better model should be tried?
(f). (2 marks) Comment on the number of levels of temperature investigated
with respect to the estimate of β1 in the assumed model.
Solution:
(a). > x_temp <- c(-5:5)
> y_yield <- c(1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18)
> result <- lm(y_yield~x_temp)
> summary(result)
Call:
lm(formula = y_yield ~ x_temp)
Residuals:
    Min      1Q  Median      3Q     Max
-2.0182 -1.1818  0.4182  1.1636  2.1636
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.2727     0.4632  20.021 9.00e-09 ***
x_temp        1.4364     0.1465   9.807 4.21e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.536 on 9 degrees of freedom
Multiple R-squared: 0.9144, Adjusted R-squared: 0.9049
F-statistic: 96.18 on 1 and 9 DF, p-value: 4.207e-06
The least squares estimates for β0 and β1 are β̂0 = 9.2727 and β̂1 =
1.4364 respectively. The fitted line is Ŷ = 9.2727 + 1.4364 × x.
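As a cross-check on the lm() output, the same estimates can be reproduced from the closed-form least squares formulas. This is a minimal sketch; the names Sxx, Sxy, b0 and b1 are introduced here for illustration and are not part of the original solution.

```r
# Data from the question
x_temp  <- -5:5
y_yield <- c(1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18)

# Corrected sum of squares of x and corrected cross-product of x and y
Sxx <- sum((x_temp - mean(x_temp))^2)
Sxy <- sum((x_temp - mean(x_temp)) * (y_yield - mean(y_yield)))

# Closed-form least squares estimates (b0, b1 are illustrative names)
b1 <- Sxy / Sxx
b0 <- mean(y_yield) - b1 * mean(x_temp)

c(intercept = b0, slope = b1)
```

Rounded to four decimals, these agree with the Estimate column of summary(result).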
(b). > anova(result)
Analysis of Variance Table
Response: y_yield
          Df  Sum Sq Mean Sq F value    Pr(>F)
x_temp     1 226.945  226.94   96.18 4.207e-06 ***
Residuals  9  21.236    2.36
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> qf(0.05, 1, 9, lower.tail = FALSE)
[1] 5.117355
Construct the F ratio, Fobs = MSE(Reg) / MSE(Res) = 226.94 / 2.36 = 96.16102
(R reports 96.18 from the unrounded mean squares), which is much
greater than the critical value 5.117355 at α = 0.05. Therefore we reject
H0 and conclude that β1 is not equal to 0.
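The same decision can be read directly off the ANOVA table by comparing the p-value with α. The sketch below pulls the F statistic and p-value out of the anova() object and checks that both decision rules agree; the variable names (tab, f_obs, p_val, f_crit, t_obs) are illustrative only.

```r
# Data and model from part (a)
x_temp  <- -5:5
y_yield <- c(1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18)
result  <- lm(y_yield ~ x_temp)

# Extract the observed F statistic and its p-value from the ANOVA table
tab   <- anova(result)
f_obs <- tab["x_temp", "F value"]
p_val <- tab["x_temp", "Pr(>F)"]

# Critical value at alpha = 0.05 with 1 and 9 degrees of freedom
f_crit <- qf(0.05, df1 = 1, df2 = 9, lower.tail = FALSE)

# Both rules lead to the same decision about H0: beta1 = 0
reject_by_critical_value <- f_obs > f_crit
reject_by_p_value        <- p_val < 0.05

# In simple linear regression the F statistic is the square of the slope's t statistic
t_obs <- summary(result)$coefficients["x_temp", "t value"]
c(F = f_obs, t_squared = t_obs^2)
```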
(c). > predict(result, newdata=data.frame(x_temp=3), interval="confidence", level=0.95)
       fit      lwr    upr
1 13.58182 12.13764 15.026
The 95% confidence interval for the true mean value of Y when x = 3
is (12.13764, 15.026). That is, we are 95% confident that the true mean
yield at x = 3 lies between 12.13764 and 15.026.
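For reference, the interval returned by predict() can be rebuilt from the standard formula for the mean response, fit ± t × s × sqrt(1/n + (x0 − x̄)² / Sxx). The following is a sketch under that textbook formula; x0, se_fit, t_crit, ci_manual and ci_predict are names chosen here for illustration.

```r
# Data and model from part (a)
x_temp  <- -5:5
y_yield <- c(1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18)
result  <- lm(y_yield ~ x_temp)

x0  <- 3                                 # temperature level of interest
n   <- length(x_temp)
s   <- summary(result)$sigma             # residual standard error
Sxx <- sum((x_temp - mean(x_temp))^2)

# Point estimate of the mean response and its standard error at x0
fit_x0 <- sum(coef(result) * c(1, x0))
se_fit <- s * sqrt(1 / n + (x0 - mean(x_temp))^2 / Sxx)
t_crit <- qt(0.975, df = n - 2)

ci_manual <- c(fit = fit_x0,
               lwr = fit_x0 - t_crit * se_fit,
               upr = fit_x0 + t_crit * se_fit)

# Same interval via predict(), for comparison
ci_predict <- predict(result, newdata = data.frame(x_temp = x0),
                      interval = "confidence", level = 0.95)
ci_manual
```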
(d). > qt(0.025, 9, lower.tail=FALSE)
[1] 2.262157
> 1.4364+2.262157*0.1465
[1] 1.767806
> 1.4364-2.262157*0.1465
[1] 1.104994
The true mean value of Y when x1 = 3 is E[Y1] = β0 + 3β1, and the
true mean value of Y when x2 = −2 is E[Y2] = β0 − 2β1. The difference
is E[Y1] − E[Y2] = (β0 + 3β1) − (β0 − 2β1) = 5β1. Therefore,
the 95% confidence interval is 5 × the 95% confidence interval of β1:
5 × (1.104994, 1.767806) = (5.52497, 8.83903).
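The same interval can be obtained without typing rounded numbers by scaling the output of confint(). This is a sketch, with ci_slope and ci_diff as illustrative names; because it uses the unrounded estimate and standard error, the endpoints differ slightly in later decimal places from the hand calculation above.

```r
# Data and model from part (a)
x_temp  <- -5:5
y_yield <- c(1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18)
result  <- lm(y_yield ~ x_temp)

# 95% confidence interval for the slope beta1
ci_slope <- confint(result, parm = "x_temp", level = 0.95)

# E[Y1] - E[Y2] = (x1 - x2) * beta1 with x1 = 3 and x2 = -2, so scale by 5
ci_diff <- (3 - (-2)) * ci_slope
ci_diff
```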
(e). R² = 0.9144 is high: 91.44% of the variance of the response
variable is explained by the regression line. The attached scatter plot
also indicates a strong positive linear relationship. There is no
indication that a better model needs to be tried.
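R² alone does not reveal curvature, so one additional way to look for indications of a better model is to inspect the residuals and to test whether a quadratic term adds anything. The sketch below is not part of the original solution; result_quad and quad_test are hypothetical names.

```r
# Data and model from part (a)
x_temp  <- -5:5
y_yield <- c(1, 5, 4, 7, 10, 8, 9, 13, 14, 13, 18)
result  <- lm(y_yield ~ x_temp)

# Residuals against fitted values: curvature or a funnel shape
# would suggest that the straight-line model is inadequate
plot(fitted(result), resid(result),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Partial F-test: does adding a quadratic term improve the fit?
result_quad <- lm(y_yield ~ x_temp + I(x_temp^2))
quad_test   <- anova(result, result_quad)
quad_test
```

A large p-value in the second row of quad_test would be consistent with the conclusion that the straight-line model is adequate.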
(f). > result1 <- lm(y_yield[1:5]~x_temp[1:5])
> summary(result1)
Call:
lm(formula = y_yield[1:5] ~ x_temp[1:5])
Residuals:
   1    2    3    4    5
-0.4  1.6 -1.4 -0.4  0.6
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  11.4000     1.3808   8.256  0.00372 **
x_temp[1:5]   2.0000     0.4163   4.804  0.01717 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1