
Summary Data Mining For Business And Governance (880022-M-6)

Pages: 17
Uploaded on: 21-06-2022
Written in: 2021/2022

Detailed summary of all lectures, with additional notes, explanations, and examples, for the course "Data Mining for Business and Governance" at Tilburg University, part of the Master's program Data Science and Society. The course was taught by Ç. Güven and G.R. Nápoles in the second semester, block three, of the academic year 2021/2022 (January to March 2022).


Tilburg University
Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 3 (January to March 2022)


Course: Data Mining for Business and Governance (880022-M-6)
Lecturers: Ç. Güven, G.R. Nápoles

Introduction to Data Mining
• no fixed definition; "data mining" is an umbrella term
o overlaps with knowledge discovery in databases (KDD), statistics, artificial intelligence, and machine learning
• computation vs. large data sets: trade-off between processing time and memory
o the larger the dataset, the more computational resources are needed
• large amounts of data (big data) are characterized by volume, variety, and velocity

Pipeline of a data mining task




Basic data types
• Dependency-oriented data: explicit or implicit relationships between records (e.g., time series, discrete sequences, spatial data, networks)
• Non-dependency-oriented data: no specified dependencies between records (multidimensional data)
• For many machine learning models, observations are assumed to be independent

What makes prediction possible?
• Associations between features and the target make prediction possible; they show how data points are related
• Numerical features: correlation coefficient
• Categorical features: mutual information (how much the value of x1 tells us about the value of x2)
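To make the categorical case concrete, here is a minimal sketch of mutual information estimated from frequency counts, I(x1; x2) = Σ p(a, b) · log2[p(a, b) / (p(a) p(b))]. It assumes both features are equal-length Python lists of category labels and uses base-2 logarithms, so the result is in bits; the function name `mutual_information` is hypothetical, not taken from the course.

```python
from collections import Counter
from math import log2


def mutual_information(x1, x2):
    """Mutual information (in bits) between two categorical features.

    Hypothetical helper: probabilities are estimated as relative frequencies.
    """
    n = len(x1)
    joint = Counter(zip(x1, x2))  # counts of each (value of x1, value of x2) pair
    counts1 = Counter(x1)         # marginal counts of x1
    counts2 = Counter(x2)         # marginal counts of x2
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # each observed pair contributes p(a,b) * log2(p(a,b) / (p(a) * p(b)))
        mi += p_ab * log2(p_ab / ((counts1[a] / n) * (counts2[b] / n)))
    return mi


# x2 is fully determined by x1: they share 1 bit of information
print(mutual_information(["a", "a", "b", "b"], ["yes", "yes", "no", "no"]))  # prints 1.0
# x2 is independent of x1: they share no information
print(mutual_information(["a", "a", "b", "b"], ["yes", "no", "yes", "no"]))  # prints 0.0
```

A value of 0 means the two features are independent (knowing one tells you nothing about the other); larger values mean a stronger association.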

Correlation coefficient
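• Pearson's r measures the strength and direction of a linear relationship (dependency); it does not capture other (non-linear) shapes
• range [-1, 1]: the sign gives the direction of the relationship; the closer r is to 0, the more dispersed the data are around a straight line; r = 0 means no linear relationship
• for a strong linear relationship between two features, one feature can be expressed linearly in terms of the other, which makes one of the two redundant in the analysis

$$r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
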
• Numerator: covariance (to what extent the features change together)
• Denominator: product of standard deviations (makes correlations independent of units)
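The definition of Pearson's r can be checked with a short from-scratch implementation. This is a minimal sketch assuming two equal-length lists of numbers; `pearson_r` is a hypothetical name, and the height/weight values are made-up toy data. The 1/(n-1) factors of the covariance and of the standard deviations cancel, so they are left out.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: how much x and y vary together (covariance without the 1/(n-1) factor)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the spreads of x and y (again without 1/(n-1)),
    # which cancels out the units of measurement
    spread_x = sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (spread_x * spread_y)


heights_m = [1.60, 1.70, 1.80, 1.90]
weights_kg = [55, 66, 72, 85]
heights_cm = [h * 100 for h in heights_m]

print(pearson_r(heights_m, weights_kg))   # same value as the next line,
print(pearson_r(heights_cm, weights_kg))  # up to floating-point rounding
# A perfect quadratic (non-linear) dependency still gives r = 0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # prints 0.0
```

The last line illustrates the limitation noted above: y is completely determined by x, yet r is 0 because the relationship is not linear.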

Correlation versus causation
• Correlation does not imply causation
• a correlation can be a coincidence or the result of a third (confounding) variable
• causation can be explained and checked in an experimental study
o vary a single variable while all others are kept constant
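A small simulation can illustrate why a correlation alone is not enough. In this sketch (all numbers are arbitrary assumptions), a hidden variable z drives both x and y, while x has no effect on y. In the observational data x and y are clearly correlated (theoretically r = 1/1.25 = 0.8); when x is instead set at random by the experimenter, the correlation disappears. Random assignment is used here as one way of varying x independently of everything else; it is a stand-in, not a procedure taken from the course.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 5000

# Hidden confounder z (an unmeasured third variable)
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational study: x and y both depend on z; x itself has no effect on y
x_obs = [zi + rng.gauss(0, 0.5) for zi in z]
y_obs = [zi + rng.gauss(0, 0.5) for zi in z]

# Experimental study: the experimenter sets x at random, independently of z;
# y is generated exactly as before, so it still depends only on z
x_exp = [rng.gauss(0, 1) for _ in range(n)]
y_exp = [zi + rng.gauss(0, 0.5) for zi in z]

r_obs = pearson_r(x_obs, y_obs)
r_exp = pearson_r(x_exp, y_exp)
print(f"observational r = {r_obs:.2f}")  # should be close to 0.8
print(f"experimental r  = {r_exp:.2f}")  # should be close to 0
```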

Supervised learning
• use labeled data to train the algorithm
• classification and regression problems

learning workflow
• 1) collect data
o consider reliability of measurement, privacy, and other regulations
o split the data into training, validation, and test sets with a similar structure (similar distributions)
▪ training set for learning
▪ validation set for tuning and setting hyperparameters
▪ test set for final evaluation
• 2) label examples (sometimes part of data collection)
o Annotation guidelines, Measure inter-annotator agreement, Crowdsourcing
• 3) choose representation (part of preprocessing)
o Features: attributes describing examples
o Observations: observed values for a given attribute
▪ numerical features: discrete or continuous
▪ categorical / nominal features, binary features
▪ ordinal features (scale)
o each example can be represented as a vector of its feature values (feature vector)
o ‘feature transformation’: e.g., use dummy coding to transform a categorical feature
to a numerical one
o ‘feature extraction’: select or derive relevant features that represent the input and are informative about the output
• 4) train model(s)
o hyperparameters: settings of an algorithm chosen by the programmer before training (not learned from the data)
▪ for each value of hyperparameter:
1) Apply algorithm to training set to learn
2) Check performance on validation set
3) Find/Choose best-performing setting
• 5) evaluate
o Check performance of tuned model on test set
o Goal: estimate how well your model will do in the real world (generalization)
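The workflow can be sketched end to end in plain Python. Everything concrete here is an assumption for illustration: a synthetic one-feature dataset with 10% label noise, a 60/20/20 split, and k-nearest neighbours as the learning algorithm, with the number of neighbours k as the hyperparameter. The helper names (`make_data`, `split`, `knn_predict`, `accuracy`) are hypothetical.

```python
import random


def make_data(n, seed=0):
    """Hypothetical toy data: one feature x in [0, 1), label 1 if x > 0.5, ~10% of labels flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.1:  # label noise
            label = 1 - label
        rows.append((x, label))
    return rows


def split(rows, train=0.6, val=0.2, seed=0):
    """Step 1: shuffle, then cut into training, validation and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)  # shuffling gives the three sets a similar structure
    n_train = round(train * len(rows))
    n_val = round(val * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def knn_predict(train_rows, x, k):
    """Majority vote among the k training examples closest to x (odd k, binary labels)."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def accuracy(train_rows, eval_rows, k):
    """Fraction of eval_rows classified correctly by k-NN on train_rows."""
    correct = sum(knn_predict(train_rows, x, k) == label for x, label in eval_rows)
    return correct / len(eval_rows)


train_set, val_set, test_set = split(make_data(300))

# Step 4: for each hyperparameter value, learn from the training set
# and check performance on the validation set
val_scores = {k: accuracy(train_set, val_set, k) for k in (1, 3, 5, 7, 9)}
best_k = max(val_scores, key=val_scores.get)  # best-performing setting

# Step 5: evaluate the tuned model once on the untouched test set
test_acc = accuracy(train_set, test_set, best_k)
print("validation accuracy per k:", val_scores)
print(f"chosen k = {best_k}, test accuracy = {test_acc:.2f}")
```

The test set is used only once, after k has been fixed on the validation set, so the test accuracy remains an honest estimate of generalization.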

regression task: predicting a numeric quantity
• regression analysis describes the relationship between random variables: a dependent variable and one or more independent variables
• it can predict the value of one variable based on another variable and show trends
• the output of a regression problem is a function describing the relationship between x and y
• unlike classification, which predicts discrete class labels, regression allows numerical prediction (predicting values of continuous variables)

linear regression
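• simplest regression technique; it involves two types of variables: an independent variable (input x) and a dependent variable (target y)
• aim is to minimize the difference between the predicted and the actual values
• loss functions that measure this difference:
o sum of squared errors (SSE), with y_i the actual and ŷ_i the predicted value of example i:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

o or other loss functions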
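For a single input variable, the weights that minimize the sum of squared errors have a closed-form solution (ordinary least squares): slope w1 = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)² and intercept w0 = ȳ - w1·x̄. The sketch below assumes the model ŷ = w0 + w1·x; the names w0 and w1, both function names, and the toy data are hypothetical, not taken from the course.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one input: returns (intercept w0, slope w1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    w1 = s_xy / s_xx           # slope
    w0 = mean_y - w1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return w0, w1


def sum_squared_errors(x, y, w0, w1):
    """SSE = sum over examples of (actual y - predicted y)^2."""
    return sum((yi - (w0 + w1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up toy values
w0, w1 = fit_simple_linear_regression(x, y)
print(f"y_hat = {w0:.2f} + {w1:.2f} * x")
print("SSE of fitted line:", sum_squared_errors(x, y, w0, w1))
print("SSE with a slightly different slope:", sum_squared_errors(x, y, w0, w1 + 0.1))
```

Any change to the fitted weights increases the SSE, which is exactly what "minimizing the difference between predicted and actual values" means for this loss.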

Author: hannahgruber, Tilburg University