Instructions
This R Markdown file includes the questions, the empty code chunk sections for your code, and
the text blocks for your responses. Answer the questions below by completing this R Markdown
file. You must answer the questions using this file. You can change the format from pdf to Word
or html and make other slight adjustments to get the file to knit but otherwise keep the formatting
the same. Once you’ve finished answering the questions, submit your responses in a single knitted
file (just like the homework peer assessments).
There are 3 sections. Partial credit may be given if your code is correct but your conclusion is
incorrect or vice versa.
Next Steps:
1. Save this .Rmd file in your R working directory - the same directory into which you will
download the heart.csv data file. Having both files in the same directory will help
in reading the .csv file.
2. Read the question and create the R code necessary within the code chunk section
immediately below each question. Knitting this file will generate the output and insert it
into the section below the code chunk.
3. Type your code and/or answer(s) to the questions in the text block provided immediately
after the question prompt.
4. We recommend knitting the file often, not only at the end of the exam, to avoid working
through knitting problems right before the exam submission. We will apply a 10% grade
reduction if you do not submit the knitted file. We will also apply a 20% grade reduction
if you do not submit the file via Canvas.
5. Submit the knitted file on Canvas.
Example Question Format:
(8a) This will be the exam question. Each question is already copied from Canvas and inserted
into individual text blocks below, so you do not need to copy/paste the questions from the online
Canvas exam.
Response to question (8a):
# Example code chunk area. Enter your code below the comment and
# between the ```{r} and ```
This is the section where you type your written answers to the question. Depending on the
question asked, your typed response may be a number, a list of variables, a few sentences, or a
combination of these elements.
Ready? Let’s begin. We wish you the best of luck!
Final Exam Part 2 - Data Set Background
For this exam, you will be building a logistic regression model to predict if an individual has heart
disease, and you will be building a standard linear regression model to predict resting blood
pressure.
The heart.csv data set consists of the following 13 variables:
1. age: age in years
2. sex: (M, F)
3. cp: chest pain type
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results (3 levels)
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. target: have disease or not (1=yes, 0=no)
Read the data and answer the questions below.
Read Data
# Import the libraries
library(car)
## Loading required package: carData
library(glmnet)
## Loading required package: Matrix
## Loaded glmnet 3.0-2
# Ensure that the sampling type is correct
RNGkind(sample.kind="Rejection")
# Read the data
data = read.csv('heart.csv', header= TRUE)
# Create a dummy variable for males
data$sexM = ifelse(data$sex=='M', 1, 0)
data$sex = NULL
# Split into training and testing sets
set.seed(6414)
smp_siz = floor(0.75*nrow(data))
train_ind = sample(seq_len(nrow(data)), size = smp_siz)
train = data[train_ind,]
test = data[-train_ind,]
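The dummy-variable and train/test split steps above can be tried on a small stand-in data frame before running them on heart.csv. The sketch below is only an illustration: the data frame `toy` and its values are made up, and the names `n_train`, `train_idx`, `toy_train`, and `toy_test` are hypothetical, not part of the exam file.

```r
# Toy stand-in for heart.csv (hypothetical values, not the exam data)
toy <- data.frame(
  age    = c(63, 37, 41, 56, 57, 57, 56, 44),
  sex    = c("M", "M", "F", "M", "F", "M", "F", "M"),
  target = c(1, 1, 1, 1, 1, 0, 0, 0)
)

# Dummy variable: 1 for males, 0 otherwise; then drop the original column
toy$sexM <- ifelse(toy$sex == "M", 1, 0)
toy$sex <- NULL

# 75/25 split: sample row indices without replacement
set.seed(6414)
n_train <- floor(0.75 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)
toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# The two sets partition the rows: sizes add up and no row is in both
nrow(toy_train)
nrow(toy_test)
length(intersect(rownames(toy_train), rownames(toy_test)))
```

Negative indexing with `-train_idx` keeps every row that was not sampled, so the training and testing sets never overlap.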
1: 19pts - Classification
For this section you will be building logistic regression models that classify whether an individual
has heart disease. For GLMs, we need replications in order to perform residual analysis. The
following code aggregates the training data using a subset of the predictors to ensure that we have
replications.
# Aggregate the training data
train.agg.n = aggregate(target~sexM+exang+slope+ca,data=train,FUN=length)
train.agg.y = aggregate(target~sexM+exang+slope+ca,data=train,FUN=sum)
train.agg = cbind(train.agg.y,total=train.agg.n$target)
train.agg$prob.target = train.agg$target/train.agg$total
train.agg$target=NULL
head(train.agg,1)
##   sexM exang slope ca total prob.target
## 1    1     0     0  0     5         0.8
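To see what the aggregation produces, the same pattern can be run on a few made-up rows. This is a minimal sketch with hypothetical data (`toy`, `agg_n`, `agg_y`, and `agg` are illustrative names, and only two predictors are used): each distinct predictor combination becomes one row, `total` counts its replications, and `prob.target` is the observed proportion with target equal to 1.

```r
# Toy training rows (hypothetical values): two predictors and a 0/1 outcome
toy <- data.frame(
  sexM   = c(1, 1, 1, 0, 0, 1, 0, 0),
  exang  = c(0, 0, 0, 1, 1, 1, 0, 0),
  target = c(1, 0, 1, 0, 0, 1, 1, 1)
)

# Number of rows in each predictor combination (the replications)
agg_n <- aggregate(target ~ sexM + exang, data = toy, FUN = length)
# Number of positive outcomes in each combination
agg_y <- aggregate(target ~ sexM + exang, data = toy, FUN = sum)

# Both calls use the same formula and data, so their rows line up
agg <- cbind(agg_y, total = agg_n$target)
agg$prob.target <- agg$target / agg$total
agg$target <- NULL
agg
```

The totals sum to the number of original rows, and `prob.target * total` recovers the count of positive outcomes in each group.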
(1a) 3pts - Build a logistic regression model (use the logit link function) called model1. For this model,
use the train.agg data set with prob.target as the response variable and sexM, exang, slope, and
ca as predictors.
Remember the replications. Display the model summary.
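As a general illustration of the point about replications (not the graded answer), one common way to fit a logistic regression to proportions in R is to pass the group sizes through the `weights` argument of `glm`. The sketch below uses made-up aggregated data; `toy_agg`, `x`, `fit_w`, and `fit_c` are hypothetical names. It also checks that this form agrees with the equivalent successes/failures form.

```r
# Toy aggregated data (hypothetical values): one row per predictor value
toy_agg <- data.frame(
  x     = c(0, 1, 2, 3),
  total = c(10, 8, 12, 10),       # replications per row
  prob  = c(0.2, 0.5, 0.75, 0.9)  # observed proportion of successes
)

# Proportion response, with the number of trials supplied as weights
fit_w <- glm(prob ~ x, weights = total, data = toy_agg,
             family = binomial(link = "logit"))

# Equivalent form: two-column response of successes and failures
toy_agg$succ <- round(toy_agg$prob * toy_agg$total)
toy_agg$fail <- toy_agg$total - toy_agg$succ
fit_c <- glm(cbind(succ, fail) ~ x, data = toy_agg,
             family = binomial(link = "logit"))

summary(fit_w)
# The two parameterizations give the same coefficient estimates
all.equal(coef(fit_w), coef(fit_c), tolerance = 1e-6)
```

Without the weights, every row would count as a single trial regardless of how many observations it summarizes, which changes both the estimates and the residual deviance.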
Response to question (1a):