Worked exam questions: Data Mining

Pages: 16
Grade: 8-9
Uploaded on: 21-12-2023
Written in: 2022/2023

Worked solutions to older exam questions, contributed by students, with feedback.


Date Download: 19/06/2023




Data Mining Exam Questions
Link: https://docs.google.com/document/d/1P2Za3RewqiRAVlJkEUFZPb1T3H82_3w9AeuhwVeFvKQ/edit?fbclid=IwAR1lmzkov2kXUnQ-HWm6LWXP_qW7kWKnwpWPOgeXxbevJvL9Xo0QAFhmqJA

Other questions can be found here: https://wiki.vtk.be/Data_Mining_(H02C6A)
There is no real reason to create an account: the wiki has mostly the same questions, in no more detail than the ones here.
→ Just be sure to add new questions/exams to the wiki for continuity reasons :)

If a question is answered and confirmed to be correct, mark it green.

If a question is answered but not confirmed to be correct, mark it yellow.

If a question is open and has no answer yet, mark it red.

A fixed formula sheet is provided during the exam. It can also be found on Toledo (it does not contain all formulas, though).


2022 July
1. Logistic regression weight update
2. PCY exercise
3. Calculate the recommendation for a movie and a user with a latent factor model -> WE NEED AN EXAMPLE
4. 5 small questions testing your insights
5. Anomaly detection: You are given a series of graphs, one for each day (x-axis: time, y-axis: number of visitors on a website).
   a. Is there anything unusual about the data? (On one specific day in the fall, the number of visitors at midnight was double.)
   b. If there is anything unusual about the data, is it an anomaly or normal but unusual behaviour? (It was an anomaly due to the switch from daylight saving to standard time, if I remember correctly.)
6. 5 small questions testing your insights.
a. One was about active learning
7. BIRCH vs CURE: Given a set of points, show how BIRCH (only ellipsoids) and CURE (can take more complex shapes) would cluster these points (2 clusters).
8. Google created a model in 2008 to predict flu outbreaks by looking at Google searches. The model was fairly accurate up until 2013; afterwards it started overestimating flu cases. Why? I think it might have to do with the rise of social media: many articles about potential flu outbreaks cause people to search more about the flu, which causes the model to overestimate. Correlation != causation.
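For item 3 (latent factor model), here is a minimal sketch of how a recommendation could be computed, assuming the basic model in which the predicted rating is the dot product of a user factor vector and a movie factor vector, without bias terms. The factor values and the names `user_factors`, `movie_factors`, `predict` and `recommend` are made up for illustration; they are not taken from the exam.

```python
# Hypothetical latent factors (2 factors per user and per movie); values are illustrative only.
user_factors = {"alice": [1.0, 0.5], "bob": [0.2, 1.5]}
movie_factors = {"Matrix": [2.0, 1.0], "Titanic": [0.5, 2.0], "Alien": [1.5, 0.0]}

def predict(user, movie):
    """Predicted rating = dot product of the user and movie factor vectors."""
    p = user_factors[user]
    q = movie_factors[movie]
    return sum(pf * qf for pf, qf in zip(p, q))

def recommend(user, already_rated=()):
    """Return the not-yet-rated movie with the highest predicted rating."""
    candidates = [m for m in movie_factors if m not in already_rated]
    return max(candidates, key=lambda m: predict(user, m))

print(predict("alice", "Matrix"))   # 1.0*2.0 + 0.5*1.0 = 2.5
print(recommend("bob"))             # Titanic
```

If the exam version includes global, user and movie bias terms, they would simply be added to the dot product inside `predict`.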

2022 June
1. Logistic regression weight update

2. Max miner exercise
3. Bi-level projection exercise (I think?)
4. K means vs GMM (same as 2022 Jan)
5. 5 small questions testing your insights
6. kNN for anomalies (not sure)
7. A table of vaccination rates for different age groups. What are 2 potential problems with this data? Something about Simpson's paradox.
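For item 7, a small sketch of what Simpson's paradox looks like in numbers. The exam table is not reproduced here, so the counts below are invented purely to show the effect: within each age group the vaccinated have the lower case rate, but after pooling the groups the vaccinated appear to do worse, because most vaccinated people are in the high-risk old group.

```python
# Invented counts: (people, cases) per age group and vaccination status.
data = {
    "young": {"vaccinated": (100, 1), "unvaccinated": (1000, 20)},
    "old": {"vaccinated": (1000, 100), "unvaccinated": (100, 20)},
}

def rate(people, cases):
    """Fraction of people in a group that became a case."""
    return cases / people

def pooled_rate(status):
    """Case rate for one status after pooling all age groups together."""
    people = sum(group[status][0] for group in data.values())
    cases = sum(group[status][1] for group in data.values())
    return rate(people, cases)

# Per age group: vaccinated rate vs unvaccinated rate.
for age, group in data.items():
    print(age, rate(*group["vaccinated"]), rate(*group["unvaccinated"]))

# Pooled over age groups: the direction of the comparison flips.
print("pooled", pooled_rate("vaccinated"), pooled_rate("unvaccinated"))
```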



2022 Jan
1. Logistic regression, but with gradient descent. Does this mean we also have to flip the objective function (multiply L by -1)? Yes: gradient descent minimizes, so you minimize -L (the negative log-likelihood) instead of maximizing L.
ChatGPT: logistic regression can be trained using various optimization algorithms, and gradient descent is one of them. Gradient descent is a common optimization algorithm used to find the optimal parameters for logistic regression, but it is not the only option.

Logistic regression aims to model the probability of a binary outcome based on a set
of input features. The model applies a logistic (sigmoid) function to a linear
combination of the features to map the continuous input space to a probability
between 0 and 1. The parameters of the logistic regression model are estimated to
maximize the likelihood of the observed data.

Gradient descent is an iterative optimization algorithm that adjusts the model parameters in the direction of steepest descent of the loss function. In logistic regression, the loss function is typically the negative log-likelihood (the log-likelihood itself is maximized, not minimized). By taking steps proportional to the negative gradient of the loss function, gradient descent iteratively updates the parameters until they converge to the optimal solution.
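A minimal sketch of one weight update, assuming 0/1 labels and batch gradient descent on the negative log-likelihood, so the update is w ← w − lr · Σ (σ(w·x_i) − y_i) x_i. The toy data, the learning rate and the function names are made up; the formula sheet may use a different convention (for example labels in {-1, +1} or an averaged gradient), so check it against the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w, X, y, lr):
    """One batch gradient-descent step on the negative log-likelihood.

    Labels y are 0/1; the gradient of -L w.r.t. w is sum_i (sigmoid(w.x_i) - y_i) * x_i.
    """
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        error = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i))) - y_i
        for j, xj in enumerate(x_i):
            grad[j] += error * xj
    return [wj - lr * gj for wj, gj in zip(w, grad)]

# Toy data (made up): the first feature is a constant 1 acting as the bias term.
X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
w = gradient_step([0.0, 0.0], X, y, lr=0.1)
print(w)  # approximately [0.0, 0.15]
```

Gradient ascent on L gives exactly the same numbers: the sign of the gradient and the sign of the step both flip.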

2. Bi-level projection of the sequence DB: 10:<c(ad)a>, 20:<d(ac)da>, 30:<c(cd)a(ac)>
What is this? → look at the last lines of sequence mining
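A hedged sketch for this one, assuming "bi-level projection" refers to the PrefixSpan optimisation in which a triangular S-matrix is built: for every pair of items x, y it stores the support of <xy>, <yx> and <(xy)>, and on the diagonal the support of <xx>. The helper names are made up, and the slides should be checked for the exact layout they expect. The code counts these supports for the sequence DB from the question.

```python
from itertools import combinations_with_replacement

# Sequence DB from the question: each sequence is a list of itemsets (elements).
db = {
    10: [{"c"}, {"a", "d"}, {"a"}],
    20: [{"d"}, {"a", "c"}, {"d"}, {"a"}],
    30: [{"c"}, {"c", "d"}, {"a"}, {"a", "c"}],
}

def has_sequential(seq, x, y):
    """True if x occurs in some element strictly before an element containing y."""
    return any(x in seq[i] and y in seq[j]
               for i in range(len(seq)) for j in range(i + 1, len(seq)))

def has_itemset(seq, x, y):
    """True if x and y occur together in one element, i.e. <(xy)>."""
    return any(x in elem and y in elem for elem in seq)

def support(test, x, y):
    """Number of sequences in the DB for which the given containment test holds."""
    return sum(1 for seq in db.values() if test(seq, x, y))

items = sorted(set().union(*(elem for seq in db.values() for elem in seq)))
s_matrix = {}
for x, y in combinations_with_replacement(items, 2):
    if x == y:
        s_matrix[(x, x)] = support(has_sequential, x, x)      # <xx>
    else:
        s_matrix[(x, y)] = (support(has_sequential, x, y),     # <xy>
                            support(has_sequential, y, x),     # <yx>
                            support(has_itemset, x, y))        # <(xy)>

for cell, counts in s_matrix.items():
    print(cell, counts)
```

For example, the cell (a, c) comes out as (1, 3, 2): <ac> is contained only in sequence 30, <ca> in all three sequences, and <(ac)> in sequences 20 and 30.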
3. Max miner algo


4. K means vs GMM




Both with 2 clusters -> where would X1 and X2 be after 1 iteration of clustering from these starting points, given the data (for K means and for GMM)?


How can you estimate this for the GMM case?


Does someone know what the GMM would look like?
=> EM clustering example in slides, plot it out

=> There should be an intuitive way of doing this, no :((((? HELP
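One way to build the intuition is to run a single iteration of both methods on a tiny example. The sketch below assumes 1D data and a full EM step (weights, means and variances are all updated); the data points, starting centers and starting variances are made up, since the exam figure is not available here. K-means moves each center to the mean of the points that are hard-assigned to it, while the GMM moves each mean to the responsibility-weighted average of all points, so its centers get pulled a little towards the other cluster.

```python
import math

def kmeans_step(points, centers):
    """One k-means iteration: hard-assign each point to its nearest center, then average."""
    clusters = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
        clusters[nearest].append(x)
    # Keep the old center if a cluster received no points.
    return [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(points, weights, means, variances):
    """One EM iteration for a 1D GMM: soft responsibilities (E), then weighted updates (M)."""
    resp = []
    for x in points:
        dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        resp.append([d / total for d in dens])
    new_weights, new_means, new_vars = [], [], []
    for k in range(len(means)):
        n_k = sum(r[k] for r in resp)
        mean_k = sum(r[k] * x for r, x in zip(resp, points)) / n_k
        var_k = sum(r[k] * (x - mean_k) ** 2 for r, x in zip(resp, points)) / n_k
        new_weights.append(n_k / len(points))
        new_means.append(mean_k)
        new_vars.append(var_k)
    return new_weights, new_means, new_vars

points = [0.0, 1.0, 2.0, 8.0, 9.0]   # made-up 1D data
start = [1.0, 5.0]                   # made-up starting centers
print(kmeans_step(points, start))    # hard assignment: [1.0, 8.5]
print(em_step(points, [0.5, 0.5], start, [4.0, 4.0])[1])  # soft assignment: means lie closer together
```

On an exam without a calculator, a rough estimate is probably enough: start from the k-means answer and shift each center slightly towards the points of the other cluster, more so when the clusters overlap.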

5. Short questions (we only know the answers, not the questions)
   a. LR and overfitting
   b. GBRT with a small LR (learning rate)
   c. Run learning algo on data with actively acquired labels
   d. Drawback of Toivonen's algorithm

6. Question on DTW (diagram, and how to improve DTW to handle noise)
Does someone know the answer to this?
The slides on Longest Common Subsequence (LCSS) tackle the noise problem by allowing gaps. They include the algorithm and an example.
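A minimal sketch of LCSS for real-valued time series, assuming the usual dynamic-programming formulation in which two values match when they differ by at most a threshold `eps`. The series and the threshold are made up, and any time-window constraint the slides may add is left out. Unlike DTW, which has to align every point, LCSS can simply skip a noisy value.

```python
def lcss_length(a, b, eps):
    """Length of the longest common subsequence of two real-valued series.

    Two values match when they differ by at most eps; unmatched values are skipped (gaps).
    """
    n, m = len(a), len(b)
    # table[i][j] = LCSS length of the first i values of a and the first j values of b
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

clean = [1.0, 2.0, 3.0, 4.0]
noisy = [1.1, 9.0, 2.0, 2.9, 4.2]   # made-up series with one noise spike (9.0)
print(lcss_length(clean, noisy, eps=0.25))   # 4: the spike is skipped
```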

7. kNN for outliers (rank the points from most to least anomalous)
8. Like the table on slides p54-55 (Some Data Puzzles)
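For item 7 (kNN for outliers), a small sketch assuming the anomaly score is the distance to the k-th nearest neighbour. The course may instead use the average distance to the k nearest neighbours, so treat the scoring rule, the data and the function name as assumptions.

```python
import math

def knn_outlier_ranking(points, k):
    """Rank points from most to least anomalous by distance to their k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append((dists[k - 1], p))
    return sorted(scores, reverse=True)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]   # made-up 2D data
for score, point in knn_outlier_ranking(points, k=2):
    print(point, round(score, 3))
```

Here (5, 5) comes out first because even its second-nearest neighbour is far away, while the four corner points of the unit square all get a score of 1.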

2021-07-18

1. Exercise on the generate + prune step of apriori (single iteration)
2. Compute LCSS (Time series)
3. Predict movie ratings using collaborative filtering
4. Exercise on complete link agglomerative clustering
5. GMM: rank the points from most to least anomalous
The data is represented by a mixture of Gaussians ⇒ each example x has a probability (density) p(x) of being generated by the GMM
High p(x) → the GMM is likely to generate this sample x → no anomaly
Low p(x) → the GMM is unlikely to generate this sample x → anomaly
How could he ask this? Given alpha and the probabilities of x belonging to each cluster?
Anomaly detection -> slide 13 → That one is kNN for anomalies though: there, the point with the largest distance is the most anomalous. For a GMM you order the points by p(x) from low to high (a low chance of being generated means the point is highly likely anomalous).
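A sketch of how that ranking could be computed, assuming a 1D mixture where the mixing weights alpha and the component means and standard deviations are given. All numbers and names are made up. The density is p(x) = Σ_k alpha_k · N(x | mean_k, std_k), and the points are sorted from lowest to highest density.

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def gmm_density(x, components):
    """p(x) = sum_k alpha_k * N(x | mean_k, std_k) for a 1D mixture."""
    return sum(alpha * gaussian_pdf(x, mean, std) for alpha, mean, std in components)

# Made-up mixture: (alpha, mean, std) per component; the alphas sum to 1.
components = [(0.6, 0.0, 1.0), (0.4, 10.0, 2.0)]
points = [0.2, 5.0, 9.0, 20.0]

# Lowest density first = most anomalous first.
ranking = sorted(points, key=lambda x: gmm_density(x, components))
print(ranking)   # [20.0, 5.0, 9.0, 0.2]
```

Note that the probabilities of x belonging to each cluster sum to 1 for every point, so on their own they do not say how anomalous a point is; the ranking needs the densities weighted by alpha.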
6. Convert the data from a training set into the proper format for logistic regression
What are we supposed to do here?
I guess this is related to the fact that logistic regression requires numerical input, so you need to convert categorical variables into indicator variables (dummy coding).
So e.g. when you have data with labels (small, medium, large), can you convert it to (0,1,2)? Strictly speaking that is an integer (ordinal) encoding: dummy coding would give each label its own 0/1 indicator column.
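A small sketch of the two encodings for the size example, with a made-up training column and hypothetical helper names. The integer encoding assumes the categories are ordered and equally spaced, which is why indicator columns are the safer default for a categorical feature.

```python
def dummy_code(values, categories):
    """One 0/1 indicator column per category (dummy / one-hot coding)."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def ordinal_code(values, categories):
    """Single integer column; only sensible when the categories are ordered."""
    return [categories.index(v) for v in values]

sizes = ["small", "large", "medium", "small"]   # made-up training column
categories = ["small", "medium", "large"]

print(dummy_code(sizes, categories))    # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(ordinal_code(sizes, categories))  # [0, 2, 1, 0]
```

In practice one indicator column is often dropped when the model has an intercept, because the columns otherwise always sum to 1.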