Exam (worked solutions)

Worked solutions to data mining exam questions

Pages: 16
Grade: 8-9
Uploaded on: 21-12-2023
Written in: 2022/2023

Worked solutions to older exam questions, written by students, with feedback.













Date Download: 19/06/2023




Data Mining Exam Questions
Link:
https://docs.google.com/document/d/1P2Za3RewqiRAVlJkEUFZPb1T3H82_3w9AeuhwVeFvKQ/edit?fbclid=IwAR1lmzkov2kXUnQ-HWm6LWXP_qW7kWKnwpWPOgeXxbevJvL9Xo0QAFhmqJA

Other questions can be found here: https://wiki.vtk.be/Data_Mining_(H02C6A)
There is no real reason to create an account: the questions there are mostly the same and have no more detail than the ones here.
→ Just be sure to add new questions/exams to the wiki for continuity :)

If a question is answered and confirmed to be correct, mark it green.

If a question is answered but not confirmed to be correct, mark it yellow.

If a question is open and has no answer yet, mark it red.

There is a fixed formula sheet that is provided for you during the exam and it can be found
on Toledo as well (it does not contain all formulas though)


2022 July
1. Logistic regression weight update
2. PCY exercise
3. Calculate movie recommendations for a user with a latent factor model -> WE NEED AN
EXAMPLE
4. 5 small questions testing your insights
5. Anomaly detection: You are given a series of graphs, one per day (x-axis: time,
y-axis: number of visitors on a website).
a. Is there anything unusual about the data? (On one specific day in the fall, the
number of visitors doubled at midnight.)
b. If there is anything unusual about the data, is this an anomaly or normal but
unusual behaviour? (It was an anomaly caused by the switch from daylight saving
time to standard time, if I remember correctly.)
6. 5 small questions testing your insights.
a. One was about active learning
7. BIRCH vs CURE: Given a set of points, show how BIRCH (only ellipsoids) and CURE
(can take more complex shapes) would cluster these points (2 clusters)
8. Google created a model in 2008 (Google Flu Trends) to predict flu outbreaks from Google
searches. The model was fairly accurate up until 2013; afterwards it started
overestimating flu cases. Why? I think it might have to do with the rise of social
media: many articles about potential flu outbreaks make people search more
about the flu even when they are not ill, so the model overestimates. Correlation != Causation
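
For question 3 above, here is a possible example. It is a minimal sketch that assumes the basic latent factor model, where the predicted rating is the dot product of a user vector and a movie vector, optionally plus a global mean and bias terms. The users, movies and factor values are invented for illustration; on the exam they would be given, and the course may use a slightly different notation.

```python
def predict_rating(p_u, q_i, mu=0.0, b_u=0.0, b_i=0.0):
    """Predicted rating: mu + b_u + b_i + (p_u . q_i)."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))

# Hypothetical 2-factor model (all values invented for this example).
user_factors = {"alice": [1.0, 0.5], "bob": [0.2, 1.5]}
movie_factors = {"Matrix": [2.0, 1.0], "Titanic": [0.5, 2.0]}

def recommend(user):
    """Return the movie with the highest predicted rating, plus all scores."""
    scores = {m: predict_rating(user_factors[user], q) for m, q in movie_factors.items()}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = recommend("alice")
print(best, scores)  # alice: Matrix scores 2.5, Titanic scores 1.5
```

By hand: for alice, Matrix gets 1.0*2.0 + 0.5*1.0 = 2.5 and Titanic gets 1.0*0.5 + 0.5*2.0 = 1.5, so Matrix is recommended.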

2022 June
1. Logistic regression weight update

2. Max-Miner exercise
3. Bi-level projection exercise (I think?)
4. K-means vs GMM (same as 2022 January)
5. 5 small questions testing your insights
6. kNN for anomalies (not sure)
7. A table of vaccination rates for different age groups. What are 2 potential problems
with this data? Something about Simpson's paradox



2022 January
1. Logistic regression -> but with gradient descent. Does this mean we also have to flip
the objective function (multiply L by -1)? Yes: gradient descent minimizes, so you
minimize -L, the negative log-likelihood.
ChatGPT: Logistic regression can be trained using various optimization algorithms,
and gradient descent is one of them. Gradient descent is a common optimization
algorithm used to find the optimal parameters for logistic regression, but it is not the
only option.

Logistic regression aims to model the probability of a binary outcome based on a set
of input features. The model applies a logistic (sigmoid) function to a linear
combination of the features to map the continuous input space to a probability
between 0 and 1. The parameters of the logistic regression model are estimated to
maximize the likelihood of the observed data.

Gradient descent is an iterative optimization algorithm that adjusts the model
parameters in the direction of the steepest descent of the loss function. In logistic
regression, the loss function is typically the negative log-likelihood (the
log-likelihood itself is maximized, not minimized). By taking steps proportional to the
negative gradient of the loss function, gradient descent iteratively updates the
parameters until convergence to the optimal solution.
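
To make the weight update concrete, here is a minimal sketch of one stochastic gradient descent step. It assumes labels y in {0, 1} and the loss -log L for a single example, which gives the gradient (p - y) * x. The function names and the numbers are made up, and the formula sheet may use a different convention (for example labels in {-1, +1}).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, x, y, lr):
    """One gradient descent step on the negative log-likelihood of one example.

    w, x: lists of floats (x[0] = 1.0 acts as the bias input); y is 0 or 1.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # d(-log L)/dw_j = (p - y) * x_j, so we step in the opposite direction.
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]

w = sgd_step([0.0, 0.0], x=[1.0, 2.0], y=1, lr=0.1)
print(w)  # p = 0.5 at w = 0, so each weight moves by 0.1 * 0.5 * x_j
```

Note that w - lr * (p - y) * x is the same update as gradient ascent on L, w + lr * (y - p) * x, which is why flipping the sign of the objective does not change the resulting weights.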

2. Bi-level projection of the
sequence DB: 10: <c(ad)a>, 20: <d(ac)da>, 30: <c(cd)a(ac)>
What is this? → look at the end of the sequence mining material
3. Max-Miner algorithm

4. K-means vs GMM

Both with 2 clusters -> given the data, where would X1 and X2 be after 1 iteration of
clustering from these starting points (for K-means and for GMM)?

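How can you estimate this for the GMM case?

Does anyone know what the GMM result would look like?
=> See the EM clustering example in the slides and plot it out.

=> Is there an intuitive way of doing this? Probably: K-means moves each center to the
mean of the points assigned to it, while EM moves each mean to the average of all points,
weighted by how strongly each point belongs to that cluster. Points between the clusters
therefore pull on both means.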
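
One way to see the difference is to run a single iteration of both methods on the same toy data. The sketch below uses 1-D points for readability and a simplified GMM (equal mixing weights and a fixed, shared sigma, so only the means are updated); a full M-step would also update the variances and mixing weights. The data and starting centers are invented.

```python
import math

def kmeans_step(points, centers):
    """One k-means iteration: hard-assign each point to its nearest center, then average."""
    clusters = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
        clusters[nearest].append(x)
    # An empty cluster keeps its old center.
    return [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]

def gmm_em_step(points, means, sigma=1.0):
    """One EM iteration for the means of a 1-D GMM (equal weights, fixed shared sigma)."""
    # E-step: responsibility of each component for each point.
    resp = []
    for x in points:
        dens = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
        total = sum(dens)
        resp.append([d / total for d in dens])
    # M-step: each mean becomes the responsibility-weighted average of all points.
    new_means = []
    for k in range(len(means)):
        weight = sum(r[k] for r in resp)
        new_means.append(sum(r[k] * x for r, x in zip(resp, points)) / weight)
    return new_means

points = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]  # hypothetical data
start = [0.0, 3.0]                         # hypothetical starting centers

print(kmeans_step(points, start))  # [0.5, 7.25]
print(gmm_em_step(points, start))  # soft assignments: the points at 1 and 2 pull on both means
```

In k-means the point at 2 belongs entirely to the second center; in the GMM it is shared, so the first mean ends up a bit higher and the second a bit lower than in k-means.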

5. Short questions (I only remember the answers, not the questions)
a. LR and overfitting
b. GBRT with a small learning rate
c. Run the learning algorithm on data with actively acquired labels
d. Drawback of Toivonen's algorithm

6. Question on DTW (diagram, and how to improve DTW to make it robust to noise)
Does anyone know the answer to this?
There are slides on Longest Common Subsequence (LCSS), which tackles the noise
problem by allowing gaps (unmatched points). They include the algorithm and an example.
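
A minimal sketch of LCSS for time series, assuming the usual formulation: two points match when they differ by at most epsilon, and optionally only when their indices differ by at most delta. The parameter names and the example series are assumptions, not taken from the slides.

```python
def lcss_length(a, b, epsilon, delta=None):
    """Length of the longest common subsequence of two numeric series.

    Points a[i] and b[j] match if |a[i] - b[j]| <= epsilon and, when delta is
    given, |i - j| <= delta. Unmatched points (e.g. noise spikes) are skipped.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            in_window = delta is None or abs(i - j) <= delta
            if in_window and abs(a[i - 1] - b[j - 1]) <= epsilon:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

def lcss_similarity(a, b, epsilon, delta=None):
    """LCSS length normalised by the length of the shorter series."""
    return lcss_length(a, b, epsilon, delta) / min(len(a), len(b))

clean = [1, 2, 3, 4, 5]
noisy = [1, 2, 50, 3, 4, 5]  # 50 is a made-up noise spike
print(lcss_length(clean, noisy, epsilon=0.5))  # 5: the spike is simply skipped
```

DTW has to align every point, so the spike at 50 would add a large cost; LCSS leaves it unmatched, which is why it is more robust to noise.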

7. kNN for outliers (rank the points from most to least anomalous)
8. Like the table on slides p54-55 (Some Data Puzzles)

2021-07-18

1. Exercise on the generate + prune step of apriori (single iteration)
2. Compute LCSS (Time series)
3. Predict movie ratings using collaborative filtering
4. Exercise on complete link agglomerative clustering
5. GMM: rank the points from most to least anomalous
The data is modelled as a mixture of Gaussians ⇒ each example x has a probability (density) p(x)
of being generated by the GMM
High p(x) → the GMM is likely to generate x → not an anomaly
Low p(x) → the GMM is unlikely to generate x → anomaly
How could this be asked? Given alpha and the probabilities of x belonging to each cluster?
Anomaly detection -> slide 13 → that slide is kNN for anomalies though: there, the points with
the largest distances are the most anomalous. For a GMM you order by probability from low to
high (a low chance of being generated means the point is likely anomalous)
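
The ranking rule above can be sketched as follows. This assumes a 1-D mixture whose parameters are given as (alpha, mean, sigma) triples, with the alphas acting as mixing weights that sum to 1; the components and points are invented for illustration.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def gmm_density(x, components):
    """p(x) under a 1-D GMM; components is a list of (alpha, mean, sigma)."""
    return sum(alpha * gaussian_pdf(x, mean, sigma) for alpha, mean, sigma in components)

def rank_by_anomaly(points, components):
    """Most anomalous first, i.e. sorted by increasing density p(x)."""
    return sorted(points, key=lambda x: gmm_density(x, components))

components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]  # hypothetical mixture
points = [0.0, 5.0, 9.0, 14.0]                     # hypothetical points
print(rank_by_anomaly(points, components))  # [5.0, 14.0, 9.0, 0.0]
```

The point at 5.0 lies exactly between the two clusters, five standard deviations from both, so it gets a lower density than 14.0, which is four standard deviations from the nearest cluster.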
6. Convert the data from a training set into the proper format for logistic regression
What are we supposed to do here?
I guess this is related to the fact that logistic regression requires the input data to be
numerical, so you need to convert categorical variables into indicator variables
(dummy coding).
So e.g. when you have an attribute with values (small, medium, large), you create one 0/1
indicator variable per value. Mapping the values to (0, 1, 2) instead is only defensible for
ordered attributes like this one, and even then it assumes equal spacing between the values.
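
A minimal sketch of dummy coding in plain Python, assuming each training example is a dict. The column names and rows are made up, and the indicator naming scheme ("size=small") is just one possible choice.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column by one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

# Hypothetical training set.
rows = [
    {"size": "small", "price": 1.0},
    {"size": "large", "price": 3.5},
    {"size": "medium", "price": 2.0},
]
for r in one_hot_encode(rows, "size"):
    print(r)
```

In practice one indicator per attribute is often dropped (k-1 dummies for k categories), because with an intercept the full set is redundant. The class label also has to be numeric, for example 0/1.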