Exam (elaborations)

Georgia Tech, Question 10.1, Questions and answers, Rated A+ 2022/2023

Grade: A+
Uploaded on: 25-04-2023
Written in: 2022/2023

ISYE 6501 Week 7 HW

Question 10.1

Using the same crime data set as in Questions 8.2 and 9.1, find the best model you can using (a) a regression tree model, and (b) a random forest model. In R, you can use the tree package or the rpart package, and the randomForest package. For each model, describe one or two qualitative takeaways you get from analyzing the results (i.e., don't just stop when you have a good model, but interpret it too).

Regression tree model

The data set contains only 47 points, so a regression tree may not be able to produce many splits, or it may overfit, and we cannot be sure the model would work as well on a larger data set. For this regression tree, I did not split the data into training and validation sets; I used all the data points to build the model. The initial model used "Po1", "Pop", "LF", and "NW", and its residual mean deviance was 47390. This tree had 7 terminal nodes.

[Figure: the unpruned regression tree with 7 terminal nodes]

Next, I pruned this tree to successively fewer leaf nodes (from 6 down to 2) and looked at the residual mean deviance, which kept increasing each time a node was dropped. It might seem that the 7-leaf tree is the best fit, but with such a small sample it is overfitted. To address this, I applied cross-validation. Instead of computing the deviance on the full training data, cross-validation reports cross-validated values for each of the 6 successive prunings. Comparing the deviance from pruning alone with the cross-validated deviance shows that the cross-validated values are higher at every step. Pruning alone is tested on the training data and so under-reports the deviance; the cross-validated values are more realistic. My random cross-validation showed that the RMSE with 6 leaf nodes is very close to that with 7.
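The gap between training deviance and cross-validated deviance described above can be illustrated with a minimal, library-free Python sketch. It is not the R tree or rpart workflow used for this homework. The data are 47 synthetic points (an assumption standing in for the crime data), the model is a greedy one-variable regression tree, and fit, deviance, and cv_deviance are hypothetical helper names.

```python
import bisect
import random

def sse(ys):
    """Deviance of one leaf: squared error around the leaf mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(leaf):
    """Best (gain, threshold) for splitting one leaf, or None if impossible."""
    pts = sorted(leaf)
    ys = [y for _, y in pts]
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        gain = sse(ys) - sse(ys[:i]) - sse(ys[i:])
        cand = (gain, (pts[i - 1][0] + pts[i][0]) / 2)
        if best is None or cand > best:
            best = cand
    return best

def fit(points, max_leaves):
    """Greedy best-first 1-D regression tree: returns (thresholds, leaf means)."""
    leaves, thresholds = [list(points)], []
    while len(leaves) < max_leaves:
        options = [(best_split(leaf), k) for k, leaf in enumerate(leaves)]
        options = [(s, k) for s, k in options if s is not None]
        if not options:
            break
        (_, thr), k = max(options)  # split the leaf with the largest gain
        leaf = leaves.pop(k)
        leaves.append([p for p in leaf if p[0] < thr])
        leaves.append([p for p in leaf if p[0] >= thr])
        thresholds.append(thr)
    thresholds.sort()
    buckets = [[] for _ in range(len(thresholds) + 1)]
    for x, y in points:
        buckets[bisect.bisect_right(thresholds, x)].append(y)
    return thresholds, [sum(b) / len(b) for b in buckets]

def deviance(model, points):
    """Total squared prediction error of a fitted tree on the given points."""
    thresholds, means = model
    return sum((y - means[bisect.bisect_right(thresholds, x)]) ** 2
               for x, y in points)

def cv_deviance(points, max_leaves, folds=5, seed=1):
    """k-fold cross-validated deviance: each point is scored by a tree
    that was fitted without it."""
    order = list(points)
    random.Random(seed).shuffle(order)
    total = 0.0
    for f in range(folds):
        held_out = order[f::folds]
        training = [p for i, p in enumerate(order) if i % folds != f]
        total += deviance(fit(training, max_leaves), held_out)
    return total

# Synthetic stand-in data (assumption): 47 points, one true step plus noise.
rng = random.Random(0)
data = [(x, (10.0 if x < 20 else 25.0) + rng.gauss(0, 5)) for x in range(47)]

sizes = range(1, 8)
train_dev = [deviance(fit(data, k), data) for k in sizes]
cv_dev = [cv_deviance(data, k) for k in sizes]
for k, t, c in zip(sizes, train_dev, cv_dev):
    print(f"leaves={k}  training deviance={t:8.1f}  cross-validated deviance={c:8.1f}")
```

In this sketch the training deviance can only go down as leaves are added, which is why it always favors the largest tree. The cross-validated column is the one to look at when choosing a tree size.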
So I pruned the tree to 6 leaf nodes and then calculated the R² of both the unpruned and pruned models, which were very close to each other, within the 0.70 to 0.72 range. With a different cross-validation sampling, the minimum RMSE could fall at a different number of leaf nodes, and a regression tree fit to such limited training data may still be overfitted.

Takeaway: The model makes its first split on Po1, and LF is possibly the least important variable, as it is the first to be dropped in the pruned tree. NW is probably more important than Pop, as pruning removed Pop but kept NW in the same branch.

Random forest model

To choose the nodesize and mtry of the random forest model, I looped over node sizes from 2 to 15 and mtry values from 1 to 10 and charted the resulting R² values to find the optimal combination. I found that mtry = 3 and nodesize = 3 gave the highest R² = 0. I used these values to build the model and looked at the importance of the variables in it.

Takeaway: The random forest used more variables than the regression tree but did not produce better R² values. Possibly this is because the sample is too small for this method and most of the trees were very similar to each other. The charts suggest that increasing the number of variables considered at each split (mtry) actually decreases the accuracy of this model.

Question 10.2

Describe a situation or problem from your job, everyday life, current events, etc., for which a logistic regression model would be appropriate. List some (up to 5) predictors that you might use.

When sending out targeted emails with offers, our marketing team at a leading automotive company would use a logistic regression model to determine which types of email offers particular groups of customers would react to.
The predictors that could be used are customer age group, type of car owned, age of car, frequency of services availed at the dealership, past offer redemption types, etc. Based on these, customer segments are created and the emails are formatted accordingly through Salesforce. Once the recipients click through the emails and we get back the sales and service data from dealerships, these results are fed back into the model for further adjustments.

Question 10.3

1. Using the GermanCredit data set from


Institution: Georgia Tech
Course: Georgia Tech


Document information

Uploaded on: April 25, 2023
Number of pages: 44
Written in: 2022/2023
Type: Exam (elaborations)
Contains: Questions & answers
