Summary

Summary: Data Mining and its Applications (EBB056B05)

Rating: 4.0 (1 review)
Sold: 11
Pages: 96
Uploaded on: 24-06-2024
Written in: 2023/2024

Summary of the lectures of Data Mining and its Applications. All slides from all lectures are included and supplemented with material from the book and explanations from ChatGPT. I scored an 8.5 on the exam using this summary.


Document information

Entire book summarized? Yes
Type: Summary

Preview of the content

Lecture 1
Lecture 2: Regression
    R-squared vs. RMSE
    Linear regression:
    Polynomial regression:
    Regression tree: the algorithm
    Bootstrap AGGregating (Bagging): for each tree/model a training set is generated by sampling uniformly with replacement from the standard training set
    Generalization
        Advantages of 5-Fold Cross-Validation
Lecture 3: Time series analysis
    Seasonal effect:
    Exponential smoothing
    Stationarity
    A seasonal difference is the difference between an observation and the corresponding observation from the previous (seasonal) cycle
    ARIMA Models:
    Sequence segmentation
    Characteristics of a time series
Lecture 4: Clustering
    Hierarchical Clustering (Linkage-Based Clustering)
    K-Means Clustering (Model-Based Clustering)
    Density-Based Clustering (DBScan)
        Example:
        Importance of MinPts:
    Clustering Evaluation
    Attribute Weighting
    Prototype & model-based (k-means, ... clustering)
    Partitioning; goal: a (disjoint) partitioning into k clusters with minimal costs
    K-means
    Outliers: k-means vs. k-medoids
    Density-based clustering
    Clustering evaluation
Lecture 5: Classifiers; Decision Trees, Model validation
    Decision Trees
    Evaluation measures - Shannon Entropy
    Gain Ratio
    Gini Index
    χ² measure
    Decision Trees - Missing Values
    Pruning
    Reduced Error Pruning
    Pessimistic Pruning
    Model Validation
Lecture 6: Additional topics on Data Mining
Lecture 7: Overview
ChatGPT
    Example Usage
        Row Splitter Node
        Partitioning Node
    Practical Example
    How Gain Ratio is Calculated:
    Example Use:
    How Gini Index is Calculated:
    Purpose of the Gini Index:
    Example Use:
    Characteristics of String Variables
    Use in Data Mining
    Handling String Variables
    Example





Lecture 1
What is data mining?
→ the extraction of interesting information or patterns from large data sets, which may originally have been
developed for other purposes.

Data states:
● Data at rest
● Data on the move
● Data in use

From data to knowledge:




Data mining project understanding
- What is the primary objective?
- What are the criteria for success?
- These are difficult to define
- Stakeholders involved in the data analysis/mining process speak different languages




Data Mining Stakeholders
● Business User: business understanding
○ Has a sound understanding of the business domain targeted by the data mining project. The
person can offer insight into the project context, the business value sought to be extracted via
data mining and advise on how results can be operationalized.
● Project Sponsor: project driver
○ The initiator or driver for the data mining project. Concerned with the potential ROI and sets
priorities and desired outputs. This person is championing the project, motivating
engagement of key personnel around the business problem.
● Project Manager: end-to-end project delivery
○ In charge of the data mining project implementation and concerned with meeting the targets for
quality, time and budget.
● Business Intelligence Analyst: data understanding
○ Bridge between the data and the business view of the targeted problem. Maintaining a sound
understanding of relevant data, the Business Intelligence Analyst drives activities related to
Key Performance Indicators (KPIs) and extracts relevant data for reporting and dashboarding
purposes. Understands the sources and ‘consumers’ of data, as well as the need for changes in data
management processes.
● Data Administrator & Integrator: data preparation & solution delivery
○ Provides action support for implementing key data access and processing activities, needed
by stakeholders of the data mining project. A technical person with sound data management
competences, including awareness of security and/or privacy concerns would be appropriate.
● Data Scientist/Engineer: data modeling & evaluation
○ This person combines data management skills with a sound understanding of data analysis
methods and tools and is driving the ingestion of data into the overall data analytics process.
The data scientist is able to communicate the analytics methods to the other stakeholders.
→ The data engineer and the data administrator & integrator work closely together on the technical side of
data mining and share relevant code and documentation.

Data Mining Project Workflow
1. Inception and discovery
a. Tool to sketch beliefs, experiences, known factors
b. How often will a certain product be found in a basket?
2. Data preparation
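
The discovery question in step 1b ("How often will a certain product be found in a basket?") can be made concrete with a small sketch. The baskets below are hypothetical toy data and `product_support` is an illustrative helper name, not something taken from the lectures; the sketch simply counts, for each product, the fraction of baskets that contain it.

```python
from collections import Counter

# Hypothetical toy transactions: each basket is the set of products bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"milk", "butter"},
]

def product_support(baskets):
    """Return, per product, the fraction of baskets that contain it."""
    # Baskets are sets, so each product is counted at most once per basket.
    counts = Counter(item for basket in baskets for item in basket)
    return {item: count / len(baskets) for item, count in counts.items()}

support = product_support(baskets)
for item in sorted(support):
    print(f"{item}: {support[item]:.2f}")
```

With this toy data, bread appears in three of the four baskets, so its relative frequency is 0.75. On real data the same count would only be meaningful after the data preparation step that follows.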




Seller: donnakartoidjojo, Rijksuniversiteit Groningen