
Summary: Data Mining and its Applications (EBB056B05)

Pages: 96
Uploaded on: 24-06-2024
Written in: 2023/2024

Summary of the lectures of Data Mining and its Applications. All slides from all lectures are included and supplemented with material from the book and explanations from ChatGPT. I scored an 8.5 on the exam using this summary.

Content preview

Lecture 1
Lecture 2: Regression
    R-squared vs. RMSE
    Linear regression:
    Polynomial regression:
    Regression tree: the algorithm
    Bootstrap AGGregating (Bagging): for each tree/model a training set is generated by sampling uniformly with replacement from the standard training set
    Generalization
        Advantages of 5-Fold Cross-Validation
Lecture 3: Time series analysis
    Seasonal effect:
    Exponential smoothing
    Stationarity
    A seasonal difference is the difference between an observation and the corresponding observation from the previous (seasonal) cycle
    ARIMA Models:
    Sequence segmentation
    Characteristics of a time series
Lecture 4: Clustering
    Hierarchical Clustering (Linkage-Based Clustering)
    K-Means Clustering (Model-Based Clustering)
    Density-Based Clustering (DBScan)
        Example:
        Importance of MinPts:
    Clustering Evaluation
    Attribute Weighting
    Prototype & model-based (k-means,... clustering)
    Partitioning; goal: a (disjoint) partitioning into k clusters with minimal costs
    K-means
    Outliers: k-means vs. k-medoids
    Density-based clustering
    Clustering evaluation
Lecture 5: Classifiers; Decision Trees, Model validation
    Decision Trees
    Evaluation measures - Shannon Entropy
    Gain Ratio
    Gini Index
    χ² (chi-squared) measure
    Decision Trees - Missing Values
    Pruning
    Reduced Error Pruning
    Pessimistic Pruning
    Model Validation
Lecture 6: Additional topics on Data Mining
Lecture 7: Overview
ChatGPT
        Example Usage
            Row Splitter Node
            Partitioning Node
        Practical Example
        How Gain Ratio is Calculated:
        Example Use:
        How Gini Index is Calculated:
        Purpose of the Gini Index:
        Example Use:
        Characteristics of String Variables
        Use in Data Mining
        Handling String Variables
        Example
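
Two outline entries above state full definitions: the bootstrap sampling behind bagging and the seasonal difference. The following is a minimal Python sketch of both, not the course's own implementation. The function names, the toy training set, and the quarterly series are assumptions made up for illustration.

```python
import random


def bootstrap_sample(training_set, rng):
    """Draw len(training_set) items uniformly with replacement."""
    return [rng.choice(training_set) for _ in training_set]


def seasonal_difference(series, period):
    """Subtract from each observation the one a full seasonal cycle earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Bagging: one bootstrap training set per tree/model (toy data, assumed).
rng = random.Random(0)
training_set = list(range(10))
samples = [bootstrap_sample(training_set, rng) for _ in range(3)]

# Seasonal difference on an assumed quarterly series (period = 4).
quarterly = [10, 20, 30, 40, 12, 23, 33, 45]
diffs = seasonal_difference(quarterly, period=4)
print(diffs)  # [2, 3, 3, 5]
```

Because sampling is with replacement, a bootstrap set usually repeats some items and omits others, which is what makes the individual trees/models differ from each other.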




Lecture 1
What is data mining?
→ The extraction of interesting information or patterns from large data sets, which may originally have been collected for other purposes.

Data states:
● Data at rest
● Data on the move
● Data in use

From data to knowledge: [figure not included in this preview]




Data mining project understanding
- What is the primary objective?
- What are the criteria for success?
- These are difficult to define
- Stakeholders involved in the data analysis/mining process speak different languages




Data Mining Stakeholders
● Business User: business understanding
○ Has a sound understanding of the business domain targeted by the data mining project. This person can offer insight into the project context and the business value to be extracted via data mining, and can advise on how results can be operationalized.
● Project Sponsor: project driver
○ The initiator or driver of the data mining project. Concerned with the potential ROI; sets priorities and desired outputs. This person champions the project and motivates key personnel to engage with the business problem.
● Project Manager: end-to-end project delivery
○ In charge of implementing the data mining project and concerned with meeting quality, time and budget targets.
● Business Intelligence Analyst: data understanding
○ The bridge between the data and the business view of the targeted problem. Maintaining a sound understanding of the relevant data, the Business Intelligence Analyst drives activities related to Key Performance Indicators (KPIs) and extracts relevant data for reporting and dashboarding purposes. Understands the sources and ‘consumers’ of data, as well as the need for changes in data management processes.
● Data Administrator & Integrator: data preparation & solution delivery
○ Provides practical support for implementing the key data access and processing activities needed by stakeholders of the data mining project. A technical person with sound data management competences, including awareness of security and/or privacy concerns, is appropriate for this role.
● Data Scientist/Engineer: data modeling & evaluation
○ Combines data management skills with a sound understanding of data analysis methods and tools, and drives the ingestion of data into the overall data analytics process. The data scientist is able to communicate the analytics methods to the other stakeholders.
→ The data engineer and the data administrator & integrator work closely together on the technical side of data mining and share relevant code and documentation.

Data Mining Project Workflow
1. Inception and discovery
a. Tool to sketch beliefs, experiences, known factors
b. How often will a certain product be found in a basket?
2. Data preparation
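
The discovery question in step 1b ("How often will a certain product be found in a basket?") can be answered by counting, per product, the fraction of baskets that contain it. Below is a minimal Python sketch. The baskets, the product names, and the helper `item_support` are hypothetical and only serve as an illustration.

```python
from collections import Counter

# Hypothetical shopping baskets; the items are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"milk"},
]


def item_support(baskets):
    """Return, per product, the fraction of baskets that contain it."""
    counts = Counter(item for basket in baskets for item in set(basket))
    return {item: count / len(baskets) for item, count in counts.items()}


support = item_support(baskets)
print(support["bread"])  # 0.75
```

Here bread appears in 3 of the 4 baskets, so its relative frequency is 0.75.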




Seller: donnakartoidjojo, Rijksuniversiteit Groningen