ASSIGNMENT in R (part 1 – Analysis of categorical data)
For all problems, show the code required to answer each question in blue and the
corresponding R output in black. Use the copy and paste function in the R environment
to copy the code and the output.
Problem 1 (5 marks)
Nearly 75% of all deaths in the United States are attributed to just 10 causes, with the top four of
these accounting for over 50% of all non-accidental deaths as follows: heart disease (23.4%),
cancer (22.5%), respiratory disease (5.6%), and stroke (5.1%). A study of the causes of n = 308
non-accidental deaths at a local hospital gave the following counts.
Cause    Heart Disease   Cancer   Respiratory Disease   Stroke   Other   Total
Deaths   78              81       28                    16       105     308
Do the data provide sufficient evidence to indicate that the numbers of deaths at this hospital
differ significantly from the proportions in the population at large?
Test the appropriate hypothesis at the 5% level of significance using R.
For a full solution:
1. State the null and alternative hypotheses.
2. Represent the frequency table with the categories
Heart Disease, Cancer, Respiratory Disease, Stroke, Other.
3. Perform the chi-square test and report its results.
Show the code required to answer questions 2 and 3 in blue and the
corresponding R output in black. Use the copy and paste function in the R environment
to copy the code and the output.
4. Draw a conclusion about the truth or falsity of the stated hypotheses.
We test whether the distribution of causes of death at this hospital differs
significantly from the proportions in the population at large.
H0: p1 = 23.4%, p2 = 22.5%, p3 = 5.6%, p4 = 5.1%, p5 = 1 - (p1 + p2 + p3 + p4) = 43.4%
H1: at least two of the proportions differ from the values specified in the null hypothesis
> A <- as.table(c(78, 81, 28, 16, 105))
> row.names(A) <- c("Heart Disease", "Cancer", "Respiratory Disease",
+                   "Stroke", "Other")
> A
Heart Disease Cancer Respiratory Disease Stroke
78 81 28 16
Other
105
> P <- c(0.234, 0.225, 0.056, 0.051, 1 - (0.234 + 0.225 + 0.056 + 0.051))
> chisq.test(A, p = P)
Chi-squared test for given probabilities
data: A
X-squared = 15.321, df = 4, p-value = 0.00408
Since the p-value = 0.00408 < 0.05, we reject the null hypothesis at α = 0.05.
There is sufficient sample evidence that the distribution of causes of death at this
hospital differs from the population proportions p1 = 23.4%, p2 = 22.5%, p3 = 5.6%,
p4 = 5.1%, p5 = 43.4% stated in the null hypothesis.
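As a cross-check on the test output, the statistic can be reproduced by hand from the expected counts E_i = n * p_i and the sum of (O_i - E_i)^2 / E_i. The following is only an illustrative sketch, not part of the required solution; the variable names observed, p0, expected, x2 and p_value are chosen here for the example.

```r
# Observed counts and hypothesised proportions from Problem 1
observed <- c(78, 81, 28, 16, 105)
p0 <- c(0.234, 0.225, 0.056, 0.051, 1 - (0.234 + 0.225 + 0.056 + 0.051))

n <- sum(observed)            # 308 deaths in total
expected <- n * p0            # expected counts under H0

# Pearson statistic: sum of (observed - expected)^2 / expected
contrib <- (observed - expected)^2 / expected
x2 <- sum(contrib)
df <- length(observed) - 1    # 5 categories, so 4 degrees of freedom
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(expected, 3)   # expected counts per category
round(contrib, 3)    # contribution of each category to the statistic
round(x2, 3)
signif(p_value, 3)
```

The values of x2 and p_value should agree with the chisq.test output above. The per-category contributions indicate which causes drive the rejection; here the Respiratory Disease and Other categories account for most of the statistic. All expected counts are well above 5, so the chi-square approximation appears reasonable.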
Problem 2 (5 marks)
Accident data were analyzed to determine the number of fatal accidents for cars of three
sizes. The data for 346 accidents are as follows.
            Small   Medium   Large
Fatal       67      26       16
Not Fatal   128     63       46
Do the data indicate that the frequency of fatal accidents depends on the size of the car?
Test the appropriate hypothesis using R. Assume a 0.05 level of significance.
For a full solution:
1. State the null and alternative hypotheses.
2. Represent the contingency table with two categorical variables: Type of Accident
(levels Fatal and Not Fatal) and
Car Size (levels Small, Medium, and Large).
3. Perform the chi-square test and report its results.
Show the code required to answer questions 2 and 3 in blue and the
corresponding R output in black. Use the copy and paste function in the R environment
to copy the code and the output.
4. Draw a conclusion about the truth or falsity of the stated hypotheses.
H0: the frequency of fatal accidents is independent of the size of the car
H1: the frequency of fatal accidents depends on the size of the car
> B <- matrix(c(67, 128, 26, 63, 16, 46), nrow = 2,
+             dimnames = list(Type_of_Accident = c("Fatal", "Not Fatal"),
+                             Car_Size = c("Small", "Medium", "Large")))
> B
                Car_Size
Type_of_Accident Small Medium Large
       Fatal        67     26    16
       Not Fatal   128     63    46
> chisq.test(B)
Pearson's Chi-squared test
data: B
X-squared = 1.8857, df = 2, p-value = 0.3895
Since the p-value = 0.3895 > 0.05, the null hypothesis is not rejected. There is insufficient
sample evidence to conclude that the frequency of fatal accidents depends on
the size of the car.
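To see where the statistic comes from, the expected counts under independence, E_ij = (row total * column total) / n, can be computed directly and compared with the values that chisq.test stores in its result. This is a minimal sketch for illustration; the names expected, x2, p_value and res are assumptions made for this example.

```r
# Contingency table from Problem 2
B <- matrix(c(67, 128, 26, 63, 16, 46), nrow = 2,
            dimnames = list(Type_of_Accident = c("Fatal", "Not Fatal"),
                            Car_Size = c("Small", "Medium", "Large")))

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(B), colSums(B)) / sum(B)

# Pearson statistic and degrees of freedom (r - 1) * (c - 1)
x2 <- sum((B - expected)^2 / expected)
df <- (nrow(B) - 1) * (ncol(B) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(expected, 2)
round(x2, 4)
round(p_value, 4)

# The same expected counts are available from the test object
res <- chisq.test(B)
all.equal(unname(expected), unname(res$expected))
min(res$expected) >= 5   # rule of thumb for the chi-square approximation
```

Since the table is larger than 2 x 2, chisq.test applies no continuity correction, so the hand-computed x2 and p_value should match its output.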