Question 3.1
Using the same data set (credit_card_data.txt or credit_card_data-headers.txt)
as in Question 2.2, use the ksvm or kknn function to find a good classifier:
(a) using cross-validation (do this for the k-nearest-neighbors model;
SVM is optional);
Cross-validation is a model evaluation method that estimates how well a model performs on data it has not
seen and helps guard against overfitting. Suppose we receive a dataset with n data points. We do not use all
of these data points for training; instead, we split them into training data and testing data. We repeat this
over different splits of training and test data to check the performance of our model.
The types of cross-validation that I came across are the holdout method, leave-one-out cross-validation,
and k-fold cross-validation.
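The three schemes differ only in how the n indices are divided, which a short sketch can make concrete. This is a plain Python illustration, not the kknn or ksvm interface; the helper names k_fold_indices and train_test_splits are hypothetical, and real data would normally be shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        # The first n % k folds each take one extra point.
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n, k):
    """Yield one (train_indices, test_indices) pair per fold."""
    for test in k_fold_indices(n, k):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        yield train, test


n = 10
holdout = next(train_test_splits(n, 5))        # a single split: 8 train, 2 test
k_fold = list(train_test_splits(n, 5))         # 5 splits, each point tested once
leave_one_out = list(train_test_splits(n, n))  # n splits, 1 test point each
```

Leave-one-out is the special case of k-fold with k equal to n, which is why it needs n model fits.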
For the k-nearest-neighbors (KNN) model, I am going to use leave-one-out cross-validation. In this
case, if I have n data points, I leave one out for validation and use the remaining n-1 for training,
repeating this once for each of the n points.
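The leave-one-out procedure for a nearest-neighbor classifier can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the kknn implementation: it uses unweighted majority voting with Euclidean distance, the function names knn_predict and loocv_accuracy are hypothetical, and the eight points are toy data rather than the credit card data set.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(xs, ys, k):
    """Hold out each point once, train on the other n-1, score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: two well-separated clusters labeled 0 and 1.
xs = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
      (2.0, 2.0), (2.1, 2.2), (2.2, 2.1), (2.3, 2.3)]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

accuracy_k3 = loocv_accuracy(xs, ys, k=3)  # neighbors stay within one cluster
accuracy_k7 = loocv_accuracy(xs, ys, k=7)  # every remaining point votes
```

On this toy data, k = 3 classifies every held-out point correctly, while k = 7 misclassifies all of them because the held-out point's own class is always outvoted 4 to 3. That sensitivity is the reason to compare several values of k.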
Displaying my_model shows the error and the best k. (In this case, the best k = 58.)
Now, let us check the accuracy of our model:
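One common way to do this check is to compare the predicted labels with the true labels and report the fraction that match. The sketch below is a Python illustration only: the functions accuracy and confusion_counts are hypothetical helpers, and the two label lists are made-up values, not results from the credit card data.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def confusion_counts(predicted, actual):
    """Count each (actual, predicted) pair to show where the errors fall."""
    return Counter(zip(actual, predicted))


# Hypothetical labels for illustration, not output from the model above.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(predicted, actual)
counts = confusion_counts(predicted, actual)
```

Accuracy is one minus the misclassification rate, so an error reported by cross-validation converts directly into an accuracy figure.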