ISYE 6501 9/5/2019 Homework 2, Georgia Tech, Graded A+
Question 3.1: Using the same data set (credit_card_ or credit_card_) as in Question 2.2, use the ksvm or kknn function to find a good classifier: (a) using cross-validation (do this for the k-nearest-neighbors model; SVM is optional);

Answer: For cross-validation of the KNN model, I used the built-in cross-validation function of the kknn package. This function uses leave-one-out cross-validation (LOOCV), which means that each data point is predicted by a model fit to all the other data points. LOOCV is essentially cross-validation with X folds, where X is the number of data points. I tested 20 values of K (just as in week 1), but this time I introduced a kmax variable so the code is easy to change to test more values. I once again scaled the data for a better fit. For each value of K, the number of correctly predicted data points was divided by the total number of data points to give the fraction of points classified correctly.

The results indicated that low values of K performed the worst (e.g., K=1 is only 81% accurate) and that K>5 tends to produce fairly similar results, with peaks around K=12 and K=15-17 at 85.3% accuracy. It is important to note that despite the fairly high accuracy here, the model quality is not necessarily as high: these accuracies were also used to choose K, so we would need additional data outside our train/validation set to confirm it.

For cross-validation of the SVM model, I added the cross argument to the ksvm call from last week's homework and set it to 20 to do 20-fold cross-validation of the data. I also found that model@cross holds the error measured by cross-validation, so I could use it to determine the accuracy of this model.
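The cross-validated SVM fit described above might look roughly like the following. This is only a sketch under stated assumptions: the synthetic data, the linear kernel ("vanilladot"), and the value C = 1 are illustrative choices that do not come from the homework; only the cross argument and the model@cross slot come from the answer.

```r
library(kernlab)

set.seed(1)  # reproducible synthetic data

# Hypothetical stand-in for the credit card data: two numeric predictors
# and a binary response (assumption, not the homework's data set).
n <- 200
x <- matrix(rnorm(2 * n), ncol = 2)
y <- factor(as.integer(x[, 1] + x[, 2] + rnorm(n, sd = 0.5) > 0))

# cross = 20 asks ksvm to run 20-fold cross-validation on the training data.
# The kernel and C value here are assumptions chosen for illustration.
model <- ksvm(x, y, type = "C-svc", kernel = "vanilladot",
              C = 1, scaled = TRUE, cross = 20)

# The cross slot stores the cross-validation error, so accuracy is 1 - error.
cv_error <- model@cross
cv_accuracy <- 1 - cv_error
print(cv_accuracy)
```

Because the folds are drawn at random, the reported accuracy can change slightly from run to run unless the seed is fixed.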
Finally, I used a for loop over various values of C to test the accuracy of each one (I tested 10 values from 0. to 1000 by magnitudes of 10). As noted in last week's homework, larger values of C tend to give better results, but accuracy reaches a saturation point of about 86.2% around C=0.01.

Code: KNN
rm(list = ls())
ges("kknn")
library(kknn)
data <- ("/Users/Vikram/Downloads/credit_card_", stringsAsFactors = FALSE, header = FALSE)
head(data)
(1)
kmax <- 20
model <- (V11~.,data,kmax=kmax,scale=TRUE)
accuracy <- rep(0,kmax)
for (k in 1:kmax) {
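The leave-one-out procedure described in the answer can be sketched as follows. This is a minimal illustration, not the original submission: it assumes the cross-validation function meant is train.kknn from the kknn package, and it uses a small synthetic data frame (hypothetical, named toy) in place of the credit card file, with a 0/1 response column called V11 to mirror the formula in the listing.

```r
library(kknn)

set.seed(1)  # reproducible synthetic data

# Hypothetical stand-in for the credit card data: two predictors and a
# 0/1 response named V11 (assumption, not the homework's data set).
n <- 200
toy <- data.frame(V1 = rnorm(n), V2 = rnorm(n))
toy$V11 <- as.integer(toy$V1 + toy$V2 + rnorm(n, sd = 0.5) > 0)

kmax <- 20  # largest number of neighbours to try

# train.kknn runs leave-one-out cross-validation for every k from 1 to kmax.
model <- train.kknn(V11 ~ ., data = toy, kmax = kmax, scale = TRUE)

accuracy <- rep(0, kmax)
for (k in 1:kmax) {
  # fitted.values holds one vector of LOOCV predictions per k. The response
  # is numeric, so predictions are fractions; threshold them at 0.5.
  predicted <- as.integer(model$fitted.values[[k]] >= 0.5)
  # Fraction of points classified correctly for this k.
  accuracy[k] <- mean(predicted == toy$V11)
}

print(round(accuracy, 3))
print(which.max(accuracy))  # k with the highest LOOCV accuracy
```

Keeping kmax as a variable, as the answer suggests, means a wider range of K can be tested by changing a single line.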
Written for
- Institution: ISYE 6501
- Module: ISYE 6501
Document information
- Uploaded on: April 25, 2023
- Number of pages: 7
- Written in: 2022/2023
- Type: Exam (elaborations)
- Contains: Questions & answers
Subjects
- georgia tech
- isye 6501 952019 homework 2