Exam (elaborations)

CPSC HW 1234 WITH COMPLETE SOLUTIONS

Pages
5
Grade
A+
Uploaded on
05-03-2025
Written in
2024/2025

Institution
CPSC
Course
CPSC

You should start with the following CountVectorizer and LogisticRegression objects, as
well as X_train and y_train (which you should further split with train_test_split and
shuffle=False):

countvec = CountVectorizer(stop_words="english")
lr = LogisticRegression(max_iter=1000, random_state=123)

- ANSWER-
# BEGIN SOLUTION
from sklearn.model_selection import train_test_split

# Hold out the last 20% of the training data as a validation fold (no shuffling).
X_train_fold, X_valid_fold, y_train_fold, y_valid_fold = train_test_split(
    X_train, y_train, test_size=0.2, shuffle=False
)

# Learn the vocabulary on the training fold only, then reuse it on the validation fold.
X_train_fold_vec = countvec.fit_transform(X_train_fold)
X_valid_fold_vec = countvec.transform(X_valid_fold)

lr.fit(X_train_fold_vec, y_train_fold)

# Accuracy on the validation fold.
fold_score = lr.score(X_valid_fold_vec, y_valid_fold)
# END SOLUTION
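
The solution above depends on X_train and y_train from the assignment, which are not shown here. A minimal runnable sketch of the same steps follows, assuming a small made-up text dataset (the documents and labels are hypothetical stand-ins, not the assignment's data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the assignment's X_train / y_train.
X_train = [
    "free prize click now", "meeting at noon tomorrow",
    "win cash prize today", "lunch with the team",
    "claim your free cash", "project review on friday",
    "free tickets click here", "notes from the meeting",
    "cash prize winner", "team lunch on friday",
]
y_train = ["spam", "ham"] * 5

countvec = CountVectorizer(stop_words="english")
lr = LogisticRegression(max_iter=1000, random_state=123)

# With shuffle=False the validation fold is simply the last 20% of the rows.
X_train_fold, X_valid_fold, y_train_fold, y_valid_fold = train_test_split(
    X_train, y_train, test_size=0.2, shuffle=False
)

# fit_transform on the training fold, transform only on the validation fold,
# so both matrices share the vocabulary learned from the training fold.
X_train_fold_vec = countvec.fit_transform(X_train_fold)
X_valid_fold_vec = countvec.transform(X_valid_fold)

lr.fit(X_train_fold_vec, y_train_fold)
fold_score = lr.score(X_valid_fold_vec, y_valid_fold)
print(len(X_train_fold), len(X_valid_fold), fold_score)
```

Calling transform rather than fit_transform on the validation fold keeps validation words out of the vocabulary, so the score is not influenced by information from the held-out rows.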

 #   Column          Non-Null Count   Dtype
---  ------          --------------   -----
 0   age             13024 non-null   int64
 1   workclass       12284 non-null   object
 2   fnlwgt          13024 non-null   int64
 3   education       13024 non-null   object
 4   education.num   13024 non-null   int64
 5   marital.status  13024 non-null   object
 6   occupation      12281 non-null   object
...

Given the information above, after performing cross-validation with a dummy classifier,
would training sklearn's SVC model on X_train and y_train work at this point? Why or
why not? - ANSWER-It won't work at this point because our data is not preprocessed
yet: we have some categorical (object) columns and some NaN values. In the columns
shown, workclass and occupation have fewer than 13024 non-null entries. We need to
preprocess the data (impute the missing values and encode the categorical columns)
before feeding it into ML algorithms.
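
A minimal sketch of what such preprocessing could look like, assuming a tiny made-up frame that reuses a few column names from the listing above (the rows, labels, and the choice of imputer, encoder, and scaler are illustrative assumptions, not the assignment's solution):

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical toy rows; only the column names come from the info() listing.
X_toy = pd.DataFrame({
    "age": [25, 38, 52, 41, 29, 60],
    "education.num": [13, 9, 14, 10, 13, 9],
    "workclass": ["Private", np.nan, "State-gov", "Private", "Private", np.nan],
    "occupation": ["Sales", np.nan, "Exec-managerial", "Craft-repair", "Sales", np.nan],
})
y_toy = ["<=50K", "<=50K", ">50K", "<=50K", "<=50K", ">50K"]

# Fitting SVC on the raw frame fails because of the string columns.
try:
    SVC().fit(X_toy, y_toy)
    raw_fit_ok = True
except (ValueError, TypeError):
    raw_fit_ok = False

numeric_cols = ["age", "education.num"]
categorical_cols = ["workclass", "occupation"]

# Scale numeric columns; impute then one-hot encode categorical columns.
preprocessor = make_column_transformer(
    (StandardScaler(), numeric_cols),
    (
        make_pipeline(
            SimpleImputer(strategy="constant", fill_value="missing"),
            OneHotEncoder(handle_unknown="ignore"),
        ),
        categorical_cols,
    ),
)

pipe = make_pipeline(preprocessor, SVC())
pipe.fit(X_toy, y_toy)
preds = pipe.predict(X_toy)
print(raw_fit_ok, len(preds))
```

Putting the preprocessing inside the pipeline means it is refit on each training fold during cross-validation, rather than once on all the data.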

After performing the CV, you get

max_features   train      cv
100            0.843253   0.839331
1000           0.911779   0.911779
10,000         0.964317   0.894983
100,000        0.976644   0.895098

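Which one should you choose? - ANSWER-In terms of cross-validation score, it looks
like the best is max_features=100_000 (cv = 0.895098, narrowly ahead of 0.894983 for
10,000). This assumes the cv value listed for max_features = 1000 is a copying slip,
since it is identical to the train score listed for the same setting; taken at face
value, it would be the highest cv score. Here max_features=100_000 means using all the
words, since the total number of words = 27345, which is less than 100,000.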
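
A sweep like the one above could be produced roughly as follows. This is a sketch under assumptions: the corpus is a made-up toy set, the grid values are scaled down to match it, and the pipeline (CountVectorizer plus LogisticRegression) is assumed from the earlier question rather than confirmed for this one.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; the assignment's data is not shown.
docs = [
    "free prize click now", "win cash prize today", "claim your free cash",
    "free tickets click here", "cash prize winner announced", "click to win free tickets",
    "meeting at noon tomorrow", "lunch with the team", "project review on friday",
    "notes from the meeting", "team lunch on friday", "review the project notes",
]
labels = ["spam"] * 6 + ["ham"] * 6

results = []
for max_features in [5, 10, 100_000]:
    pipe = make_pipeline(
        CountVectorizer(stop_words="english", max_features=max_features),
        LogisticRegression(max_iter=1000, random_state=123),
    )
    scores = cross_validate(pipe, docs, labels, cv=3, return_train_score=True)
    results.append(
        (max_features, scores["train_score"].mean(), scores["test_score"].mean())
    )

for max_features, train_mean, cv_mean in results:
    print(f"max_features={max_features}: train={train_mean:.3f}, cv={cv_mean:.3f}")

# A cap larger than the vocabulary is the same as no cap at all.
full_vocab = len(CountVectorizer(stop_words="english").fit(docs).vocabulary_)
capped_vocab = len(
    CountVectorizer(stop_words="english", max_features=100_000).fit(docs).vocabulary_
)
print(full_vocab, capped_vocab)
```

The last two lines illustrate the point in the answer: once max_features exceeds the number of distinct words, every word is kept.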

Discuss how changing the max_depth hyperparameter affects the training and cross-
validation accuracy.

What does it mean when the accuracy is 1.0 for max_depth >= 15? - ANSWER-On the
training data, a higher value of max_depth results in higher accuracy.
For max_depth >= 15 the training accuracy is 1.0, which means that the model is able to
classify all training examples perfectly. This happens because for higher max_depth
values, the decision tree learns a specific rule for almost every example in the training
data. The cross-validation accuracy, by contrast, initially increases a bit and
then goes back down, a sign that the deeper trees are overfitting.
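
The pattern described above can be reproduced with a short sweep. This is a sketch on synthetic data from make_classification (an assumption; the assignment's dataset is not shown), so the depth at which training accuracy reaches 1.0 will differ from the value quoted in the answer.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise, so a perfect training fit means memorization.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, flip_y=0.1, random_state=0
)

results = []
for depth in [1, 3, 5, 10, None]:  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_validate(tree, X, y, cv=5, return_train_score=True)
    results.append(
        (depth, scores["train_score"].mean(), scores["test_score"].mean())
    )

for depth, train_mean, cv_mean in results:
    print(f"max_depth={depth}: train={train_mean:.3f}, cv={cv_mean:.3f}")
```

Training accuracy rises with depth until the tree fits every training example, while the cross-validation score stays well below it.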

Generally speaking, should the best CV scores from optimizing *individual
hyperparameters* agree with the best CV scores from the joint optimization (multiple
hyperparameters)? Why or why not? - ANSWER-In general there is no reason they
need to agree. Hyperparameters can interact, so by jointly optimizing them you might
find a combination that is better than anything found by tuning one at a time.
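
A small sketch of the comparison, assuming an SVC on synthetic data with hypothetical grids for C and gamma (the model and grid values are not given in the question). The joint grid here deliberately includes SVC's defaults (C=1.0, gamma="scale"), so it covers every setting the two individual sweeps try.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical grids; both contain the SVC default for that hyperparameter.
C_values = [0.1, 1.0, 10.0]
gamma_values = ["scale", 0.01, 0.1]

# Tune one hyperparameter at a time, leaving the other at its default.
search_C = GridSearchCV(SVC(), {"C": C_values}, cv=5).fit(X, y)
search_gamma = GridSearchCV(SVC(), {"gamma": gamma_values}, cv=5).fit(X, y)

# Tune both together over the full 3 x 3 grid.
search_joint = GridSearchCV(
    SVC(), {"C": C_values, "gamma": gamma_values}, cv=5
).fit(X, y)

print("best C alone:    ", search_C.best_params_, search_C.best_score_)
print("best gamma alone:", search_gamma.best_params_, search_gamma.best_score_)
print("best joint:      ", search_joint.best_params_, search_joint.best_score_)
```

Because the joint grid contains the individual sweeps as special cases, its best CV score cannot be lower here; whether it is strictly higher depends on the data. If the grids did not overlap in this way, either search could come out ahead.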

Given
validation score = 0.8955017301038062
test score = 0.8913193910502845

How does your test accuracy compare to your validation accuracy?
If they are different: do you think this is because you "overfitted on the validation set", or
simply random luck? - ANSWER-The test score is very close to the cross-validation
score (about 0.004 lower). It doesn't seem like we are overfitting on the validation set;
a gap this small is consistent with random variation.
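
One rough way to put the gap in context is to compare it with how much the score varies from fold to fold. The sketch below uses the two scores quoted above, but the fold scores come from a synthetic dataset and a LogisticRegression model (both assumptions for illustration), since the assignment's per-fold results are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Scores quoted in the question.
validation_score = 0.8955017301038062
test_score = 0.8913193910502845
gap = validation_score - test_score

# Synthetic stand-in data, used only to show how fold-to-fold spread is measured.
X, y = make_classification(n_samples=500, random_state=0)
fold_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
fold_spread = fold_scores.std()

print(f"validation - test gap: {gap:.4f}")
print(f"std of fold scores (synthetic example): {fold_spread:.4f}")
```

If the gap is comparable to or smaller than the spread across folds on the real data, random variation is a reasonable explanation.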

given: CV score = 0.679, test score = 0.683
