Georgia Tech Homework 2 Question 3.1, 100% Graded A+
Homework 2 Question 3.1

Using the same data set (credit_card_data.txt or credit_card_data-headers.txt) as in Question 2.2, use the ksvm or kknn function to find a good classifier:

(a) using cross-validation (do this for the k-nearest-neighbors model; SVM is optional); and

Using leave-one-out cross-validation with different kernels for classification:

```r
library(kknn)

data <- read.table("credit_card_data-headers.txt", header = TRUE, sep = "")

# Splitting data for training (70%) and validating (30%)
number_of_data_points <- nrow(data)
training_sample <- sample(number_of_data_points, size = round(number_of_data_points * 0.7))
training_data <- data[training_sample, ]
validating_data <- data[-training_sample, ]

# train.kknn uses leave-one-out cross-validation to compare k values and kernels
kmax <- 100
model <- train.kknn(R1 ~ ., training_data, kmax = kmax, scale = TRUE,
                    kernel = c("rectangular", "triangular", "epanechnikov",
                               "gaussian", "rank", "optimal"))
model
```

```
## Call:
## train.kknn(formula = R1 ~ ., data = training_data, kmax = kmax, kernel = c("rectangular", "triang
##
## Type of response variable: continuous
## minimal mean absolute error: 0.
## Minimal mean squared error: 0.
## Best kernel: gaussian
## Best k: 41
```

```r
pred <- predict(model, validating_data[, -11])

# Accuracy on validation data
accuracy <- sum(round(pred) == validating_data[, 11]) / nrow(validating_data)
cat("Best accuracy for validation data is", accuracy, "for K value of", model$best.parameters$k, "\n")
```

```
## Best accuracy for validation data is 0. for K value of 41
```

(b) splitting the data into training, validation, and test data sets (pick either KNN or SVM; the other is optional).

```r
data <- read.table("credit_card_data-headers.txt", header = TRUE, sep = "")

# Splitting data into training: 60%, validating: 20% and testing: 20%
number_of_data_points <- nrow(data)
training_sample <- sample(number_of_data_points, size = round(number_of_data_points * 0.6))
training_data <- data[training_sample, ]
non_training_data <- data[-training_sample, ]

number_of_non_training_data_points <- nrow(non_training_data)
validating_sample <- sample(number_of_non_training_data_points,
                            size = round(number_of_non_training_data_points * 0.5))
validating_data <- non_training_data[validating_sample, ]
testing_data <- non_training_data[-validating_sample, ]

# Using kknn and the validation set to choose the best k
Ks <- seq(1, 100)
bestK <- 0
bestAccuracy <- 0
bestModel <- NULL
for (k in Ks) {
  model <- kknn(R1 ~ ., training_data, validating_data, k = k, scale = TRUE)
  pred <- round(predict(model))
  accuracy <- sum(pred == validating_data[, 11]) / nrow(validating_data)
  # Keeping the best accuracy data for later use
  if (accuracy > bestAccuracy) {
    bestAccuracy <- accuracy
    bestK <- k
    bestModel <- model
  }
}

# Best K and its accuracy on validation data
cat("Best K value is", bestK, "with accuracy of", bestAccuracy, "\n")
```

```
## Best K value is 11 with accuracy of 0.
```

```r
# Running the test data with the best K value
model <- kknn(R1 ~ ., training_data, testing_data, k = bestK, scale = TRUE)
pred <- round(predict(model))
accuracy <- sum(pred == testing_data[, 11]) / nrow(testing_data)

# Accuracy on test data with the best K
cat("Accuracy with K value of", bestK, "on test data is", accuracy, "\n")
```

```
## Accuracy with K value of 11 on test data is 0.
```
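Both parts of Question 3.1 leave the SVM model optional. The sketch below is not part of the graded solution; it shows one way the same cross-validation idea could be applied with kernlab's ksvm. The file name, kernel, and C value are illustrative assumptions.

```r
library(kernlab)

# Assumed file name, matching the data set loaded above
data <- read.table("credit_card_data-headers.txt", header = TRUE, sep = "")

# Setting `cross = 10` makes ksvm estimate a 10-fold cross-validation error.
# The response is converted to a factor so ksvm fits a classifier (C-svc)
# rather than a regression model; the kernel and C value are example choices.
svm_model <- ksvm(as.factor(R1) ~ ., data = data,
                  type = "C-svc", kernel = "vanilladot",
                  C = 100, scaled = TRUE, cross = 10)

# cross() returns the cross-validation error; accuracy is its complement
cat("Estimated cross-validation accuracy:", 1 - cross(svm_model), "\n")
```

The same call could be repeated over a grid of C values (and other kernels such as rbfdot) to pick the combination with the lowest cross-validation error.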
Question 4.1

Describe a situation or problem from your job, everyday life, current events, etc., for which a clustering model would be appropriate. List some (up to 5) predictors that you might use.

One of the key revenue generators for our e-commerce business is recommendation-based online sales. To make product recommendations, we need to group our online visitors and returning customers into various segments. Some of the common predictors we use are:

1. Average money spent per transaction by returning customers on our website, which helps us understand a customer's budget. For example, customers can be grouped into $1-$25, $26-$100, $101-$200, and so on.
2. Search history, which helps us group online visitors by the keywords they use to search our inventory.
3. Customer location. A good use case: of a customer shopping from New York City, NY and one shopping from Miami, FL in January, only one is likely to buy snow gear.
4. Previously purchased product brands, which give us a good idea of the quality and brands each customer may prefer.
5. Age group, which is also a major predictor of product selection.

Question 4.2

The iris data set contains 150 data points, each with four predictor variables and one categorical response. The predictors are the width and length of the sepal and petal of the flowers, and the response is the type of flower. The data is available from the R library datasets and can be accessed with iris once the library is loaded. It is also available at the UCI Machine Learning Repository.
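As a quick check of the description above, the data set can be loaded and inspected directly from the datasets package. This minimal sketch uses only base R and assumes nothing beyond what the prompt states.

```r
library(datasets)

# 150 observations: four numeric predictors and the categorical response Species
str(iris)
head(iris)

# 50 flowers of each of the three species
table(iris$Species)
```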