CPSC 330 final exam Ch 16, 17, 18 with
complete solutions
.... gives you the ability to summarize the major themes in a large collection of
documents (corpus). - ANSWER-Topic modelling
.... is a great EDA tool to get a sense of what's going on in a large corpus. - ANSWER-
Topic modelling
2 approaches to reduce multi-class classification into binary classification - ANSWER-
the one-vs.-rest approach
- 1v{2,3}, 2v{1,3}, 3v{1,2}
- Learn a binary model for each class which tries to separate that class from all of the
other classes.
the one-vs.-one approach
- 1v2, 1v3, 2v3
- Build a binary model for each pair of classes.
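Both reductions above are available as wrappers in sklearn. A minimal sketch on made-up 3-class data (the dataset and base estimator here are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Toy 3-class dataset, purely for illustration.
X, y = make_classification(n_samples=150, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=42)

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# OVR trains K binary models (one per class);
# OVO trains K*(K-1)/2 (one per pair). With K = 3 both happen to be 3.
print(len(ovr.estimators_))
print(len(ovo.estimators_))
```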
After creating a user profile in content-based filtering, what do we do? - ANSWER-Create a Ridge() model
Predict the rating of a movie that the user has not seen
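A minimal sketch of this step, with made-up movie features and ratings: the user's profile is a Ridge model fit on the features of the movies they have rated, which can then score an unseen movie.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical item-feature matrix: rows = movies, columns = genre features.
movie_feats = np.array([[1, 0, 1],   # movie 0: action, sci-fi
                        [0, 1, 0],   # movie 1: romance
                        [1, 1, 0],   # movie 2: action, romance
                        [0, 0, 1]])  # movie 3: sci-fi (unseen)

seen = [0, 1, 2]            # movies this user has rated
ratings = [5.0, 1.0, 3.0]   # the user's ratings for those movies

# The user "profile" is a Ridge model fit on features of the rated movies.
profile = Ridge(alpha=1.0).fit(movie_feats[seen], ratings)

# Predict the rating of movie 3, which the user has not seen.
pred = profile.predict(movie_feats[[3]])
print(round(float(pred[0]), 2))
```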
An unsupervised approach which only uses the user-item interactions given in the
ratings matrix - ANSWER-Collaborative filtering in a recommender system
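One common way to sketch this idea is matrix factorization on the ratings matrix alone (no item features). The toy ratings below are made up; this uses `TruncatedSVD` as a stand-in for a proper recommender factorization:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical ratings matrix: rows = users, columns = items, 0 = unrated.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Factor the matrix into 2 latent dimensions and reconstruct it;
# reconstructed values at unrated positions act as predicted ratings.
svd = TruncatedSVD(n_components=2, random_state=0)
U = svd.fit_transform(R)        # user factors, shape (4, 2)
R_hat = U @ svd.components_     # reconstructed ratings, shape (4, 4)

print(R_hat.shape)
```

Note that only user-item interactions are used; nothing about what the items *are* enters the model.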
Apply all of the classifiers on the test example.
Count how often each class was predicted.
Predict the class with the most votes.
These are the properties of - ANSWER-One Vs. One approach (OVO) for prediction
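The voting steps above can be sketched in a few lines (the pairwise predictions here are made up):

```python
from collections import Counter

# Hypothetical predictions from the 3 pairwise classifiers on one test point:
# the 1-vs-2 model said class 1, 1-vs-3 said 1, 2-vs-3 said 3.
pairwise_preds = [1, 1, 3]

# Count how often each class was predicted, then take the majority vote.
votes = Counter(pairwise_preds)
predicted_class = votes.most_common(1)[0][0]
print(predicted_class)  # → 1
```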
Basic text preprocessing (7) - ANSWER-Tokenization
- the process of breaking down a text or document into individual words, phrases,
symbols, or other meaningful elements known as tokens. In simpler terms, tokenization
is like splitting a sentence into its component parts
- Sentence segmentation: Split text into sentences
- Word tokenization: Split sentences into words
Converting text to lowercase
Removing punctuation and stopwords
- stopwords: commonly used words that are often considered insignificant or carry little
meaning for understanding the context of a text (such as "a," "an," "the," "is," "in,"
"and,").
Discarding words with length < threshold OR word frequency < threshold
Lemmatization:
- Consider the lemmas instead of inflected forms.
- lemmatization: finding the base form of a word
- For example, lemmatizing the words "running," "ran," and "runs" would give you the
base form "run."
- Vancouver's → Vancouver
- computers → computer
- rising, rises, rose → rise
POS: restrict to a specific part of speech; For example, only consider nouns, verbs, and
adjectives
Stemming
- the process of reducing words to their base or root form, often by removing suffixes, to
simplify analysis and improve text processing efficiency.
- Before stemming: UBC is located in the beautiful province of British Columbia... It's
very close to the U.S. border.
- After stemming: ubc is locat in the beauti provinc of british columbia ... it 's veri close to
the u.s. border .
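Several of these steps can be sketched in plain Python (real pipelines would use nltk or spaCy, which also handle sentence segmentation, lemmatization, and stemming; the stopword list below is a made-up toy subset):

```python
import re

# Toy stopword list, purely for illustration.
STOPWORDS = {"a", "an", "the", "is", "in", "and", "it", "to", "of"}

def preprocess(text, min_len=2):
    text = text.lower()                   # convert to lowercase
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    tokens = text.split()                 # word tokenization
    # drop stopwords and words shorter than the length threshold
    return [t for t in tokens
            if t not in STOPWORDS and len(t) >= min_len]

print(preprocess("UBC is located in the beautiful province of British Columbia."))
# → ['ubc', 'located', 'beautiful', 'province', 'british', 'columbia']
```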
Collaborative filtering vs. Content-based filtering (1) - ANSWER-use item features or not
Collaborative: Recommends items based on similar users or items without requiring
explicit knowledge of item features.
Content-based: Recommends items based on item features like text, genre, or
metadata
Explain Neural Networks - ANSWER-Neural networks apply a sequence of
transformations on your input data.
We are adding one "layer" of transformations in between the features (inputs) and the target (output).
The hidden units (e.g., h[1], h[2], ...) represent the intermediate processing steps.
At a very high level you can also think of them as Pipelines in sklearn.
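A minimal numpy sketch of one hidden layer: hidden units h = relu(W1·x + b1), then output y = w2·h + b2. All weights here are random made-up numbers, purely to show the sequence of transformations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])                  # input features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # layer 1: 3 inputs -> 4 hidden units
w2, b2 = rng.normal(size=4), 0.0                # layer 2: 4 hidden units -> 1 output

h = np.maximum(0, W1 @ x + b1)                  # hidden units h[1]..h[4] (ReLU)
y = w2 @ h + b2                                 # target (output)

print(h.shape)  # (4,)
```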
Explain what transfer learning is - ANSWER-Recall: CNNs can take in images without flattening them ← a solution to image classification!
Training a CNN from scratch is not common due to the need for a large dataset,
powerful computers, and significant human effort.
Instead, a common practice is to download a pre-trained model and fine-tune it for your
task.
This is called transfer learning.
- Transfer learning is like using what you already know to learn something new faster. In
machine learning, it means using a pre-trained model's knowledge to solve a different
problem instead of starting from scratch. It saves time and resources while improving
performance.
Given a test point, get scores from all binary classifiers (e.g., raw scores for logistic
regression). This is - ANSWER-OVR
How does LDA work (4) - ANSWER-1. Create a bag-of-words (BOW) representation for the text column using CountVectorizer
2. Create a topic model with sklearn's LatentDirichletAllocation
3. LDA gives access to two associations:
Topic-words association
- `lda.components_` gives us the weights associated with each word (columns) for each
topic (rows).
- In other words, it tells us which word is important for which topic.
Document-topic association
- Calling `transform` on the data gives us the document-topic association.
- It tells us which topic is important for which document.
4. Optionally, change the data representation (e.g., rename the topic labels, round values, and drop the sum column)
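Steps 1-3 above can be sketched end-to-end on a made-up toy corpus (documents and topic count are purely illustrative):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus: two sports-like documents and two legal-like documents.
docs = ["the team won the hockey game",
        "the player scored a goal in the game",
        "the court ruled on the lawsuit",
        "the judge heard the case in court"]

# Step 1: bag-of-words representation.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

# Step 2: topic model with 2 topics.
lda = LatentDirichletAllocation(n_components=2, random_state=42)
doc_topics = lda.fit_transform(X)  # document-topic association

# Step 3: topic-word association (rows = topics, columns = words).
print(lda.components_.shape)
print(doc_topics.shape)
```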
If you have K classes, it'll train K binary classifiers, one for each class.
this is - ANSWER-One-vs.-Rest approach (OVR)
If you want to pull documents related to a particular lawsuit, you can use - ANSWER-Topic modelling
ImageNet - ANSWER-An image dataset
There are 14 million images and 1000 classes