Georgia Tech ISYE - 6501 Homework 2, Due Date: Thursday, September 3rd, 2020
Contents
1 ISYE - 6501 Homework 2
2 Homework Analysis
 2.1 Analysis 3.1
 2.2 Analysis 4.1
 2.3 Analysis 4.2

1 ISYE - 6501 Homework 2

This document contains my analysis for ISYE - 6501 Homework 2, due on Thursday, September 3rd, 2020. Enjoy!

2 Homework Analysis

2.1 Analysis 3.1

Q: Using the same data set (credit_card_ or credit_card_) as in Question 2.2, use the ksvm or kknn function to find a good classifier:

(a) using cross-validation (do this for the k-nearest-neighbors model; SVM is optional)

RESULTS

Using 10-fold cross-validation on a k-nearest-neighbors (KNN) model with k = 15 and a rectangular kernel, we achieved an accuracy score of roughly 85% (85.47009%) on the test set. In other words, about 85 of every 100 applicants are classified correctly.

THE CODE:

# needed libraries
rm(list = ls())
library(kknn)
library(dplyr)
set.seed(12345)

# read data into R
data_path <- "data 3.1/"
data_filename <- "credit_card_"
credit_data <- read.table(paste0(data_path, data_filename), header = TRUE)

# train-validation-test split (70/15/15)
sample_split <- sample(1:3, size = nrow(credit_data), prob = c(0.7, 0.15, 0.15), replace = TRUE)
train_credit <- credit_data[sample_split == 1, ]
valid_credit <- credit_data[sample_split == 2, ]
test_credit  <- credit_data[sample_split == 3, ]

# training our model: 10-fold cross-validation over k (up to 100) and six kernels
train_model <- train.kknn(R1 ~ ., data = train_credit, kmax = 100, scale = TRUE, kcv = 10,
                          kernel = c("rectangular", "triangular", "epanechnikov",
                                     "gaussian", "rank", "optimal"))
train_model

## Call:
## train.kknn(formula = R1 ~ ., data = train_credit, kmax = 100, kernel = c("rectangular", "triangul
##
## Type of response variable: continuous
## Minimal mean absolute error: 0.
## Minimal mean squared error: 0.
## Best kernel: rectangular
## Best k: 15

Using 10-fold cross-validation, we can see that the best kernel for our model is rectangular, with a k of 15. Now that we have the best parameters for our model, let's use them to fit our validation data.

# validating our model
valid_model <- train.kknn(R1 ~ ., data = valid_credit, ks = 15, kernel = "rectangular", scale = TRUE)
valid_pred <- round(predict(valid_model, valid_credit))
accuracy_score <- sum(valid_pred == valid_credit[, 11]) / nrow(valid_credit)
accuracy_score * 100

## [1] 91

Our validation model yields an accuracy score of 91%. Now let's run the model on our test data to measure its true performance on data it hasn't seen before.

# run the test data through the model
test_pred <- round(predict(valid_model, test_credit))
accuracy_score <- sum(test_pred == test_credit[, 11]) / nrow(test_credit)
accuracy_score * 100

## [1] 85.47009

The test data yields an accuracy score of roughly 85%, lower than the validation accuracy (91%). The validation score partly reflects fitting to the particular random sample in valid_credit, while the test score is computed on data the model never touched during tuning, so the test score is the closer estimate of the model's true performance.
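Before moving on, here is a minimal sketch of what the 10-fold cross-validation above does, written out by hand with kknn instead of delegated to train.kknn. The fold assignment (fold_id) and the loop are illustrative additions, not part of the original solution; the sketch assumes train_credit and the selected settings (k = 15, rectangular kernel) from the code above.

# Hand-rolled 10-fold cross-validation sketch (illustrative; assumes
# train_credit and the settings chosen above)
library(kknn)
set.seed(12345)

# randomly assign each training row to one of 10 folds
fold_id <- sample(rep(1:10, length.out = nrow(train_credit)))
fold_acc <- numeric(10)

for (f in 1:10) {
  cv_train <- train_credit[fold_id != f, ]   # nine folds to fit on
  cv_valid <- train_credit[fold_id == f, ]   # one held-out fold
  fit <- kknn(R1 ~ ., train = cv_train, test = cv_valid,
              k = 15, kernel = "rectangular", scale = TRUE)
  # round the continuous fitted values to 0/1 and score the held-out fold
  fold_acc[f] <- mean(round(fitted(fit)) == cv_valid$R1)
}
mean(fold_acc)  # average held-out accuracy across the 10 folds

Averaging the held-out accuracies gives a fold-level performance estimate, comparable in spirit to the cross-validated error that train.kknn reports.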
(b) splitting the data into training, validation, and test data sets (pick either KNN or SVM; the other is optional)

RESULTS

A train-validation-test split allows us to produce a Support Vector Machine (SVM) with a splinedot kernel and a C value of 1 that achieves an accuracy score of roughly 82% (82.05128%).

OUR CLASSIFIER EQUATION

The SVM model's classifier takes the general form (here p = 10 predictors):

classifier = β0 + β1x1 + β2x2 + … + βpxp

With an error margin (C) of 1 and the splinedot kernel, we produced the following classifier equation:

classifier = −0. + 0.x1 + 0.x2 − 0.x3 + 0.x4 + 0.x5 − 0.x6 − 0.x7 − 0.x8 + 0.x9 + 0.x10

THE CODE:

# needed libraries
library(kernlab)
library(magicfor)
library(ggplot2)
library(hrbrthemes)
set.seed(12345)

# train-validation-test split (70/15/15)
sample_split <- sample(1:3, size = nrow(credit_data), prob = c(0.7, 0.15, 0.15), replace = TRUE)
train_credit <- credit_data[sample_split == 1, ]
valid_credit <- credit_data[sample_split == 2, ]
test_credit  <- credit_data[sample_split == 3, ]

# training our model once per kernel and recording training accuracy
magic_for(print, silent = TRUE)
kerns <- list("rbfdot", "polydot", "vanilladot", "tanhdot",
              "laplacedot", "besseldot", "anovadot", "splinedot")
for (kern in kerns) {
  train_model <- ksvm(R1 ~ ., data = train_credit, type = "C-svc",
                      kernel = kern, C = 1, scaled = TRUE, kpar = list())
  train_pred <- predict(train_model, train_credit[, 1:10])
  accuracy <- sum(train_pred == train_credit[, 11]) / nrow(train_credit)
  print(accuracy)
}
kern_accuracy <- magic_result_as_dataframe()

# displaying our model's best kernel
ggplot(kern_accuracy, aes(x = kern, y = accuracy, color = accuracy)) +
  geom_point(size = 8) +
  ylim(c(0.7, 1)) +
  labs(title = "Accuracy Scores vs Kernel Function",
       y = "Accuracy Scores", x = "Kernel Function") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(hjust = 0.5, size = 14),
        axis.title.y = element_text(hjust = 0.5, size = 14),
        legend.position = "none",
        axis.line = element_line(size = 1, linetype = "solid")) +
  geom_vline(xintercept = "splinedot", linetype = "dotted", color = "blue", size = 2)

The dotted vertical line highlights splinedot, the kernel that produced the highest accuracy and the one we selected for the final classifier.
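A note on the classifier equation reported above: coefficients of the form β0 + β1x1 + … + β10x10 describe a linear decision boundary, so they can only be recovered exactly from a linear kernel; with splinedot the boundary is nonlinear. The sketch below is an illustrative assumption rather than this solution's original code: it shows the usual kernlab recipe for computing such coefficients from a vanilladot (linear) model, where the @xmatrix, @coef, and @b slots hold the scaled support vectors, their weights, and the negative intercept.

# Illustrative sketch (assumes train_credit from above): recovering
# linear-classifier coefficients from a kernlab SVM. This is exact only
# for the linear (vanilladot) kernel, not for splinedot.
library(kernlab)
set.seed(12345)

linear_model <- ksvm(as.matrix(train_credit[, 1:10]),
                     as.factor(train_credit[, 11]),
                     type = "C-svc", kernel = "vanilladot",
                     C = 1, scaled = TRUE)

# beta_j = sum over support vectors i of coef_i * x_ij; beta_0 = -b
beta  <- colSums(linear_model@xmatrix[[1]] * linear_model@coef[[1]])
beta0 <- -linear_model@b
beta
beta0

Because scaled = TRUE, these are coefficients of the standardized predictors; translating them back to the original units requires dividing by each column's standard deviation and adjusting the intercept accordingly.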