
Summary Data Mining For Business And Governance (880022-M-6)

Pages
17
Uploaded on
21-06-2022
Written in
2021/2022

Detailed summary of all lectures, with additional notes, explanations, and examples, for the course "Data Mining for Business and Governance" at Tilburg University, which is part of the Master Data Science and Society. The course was given by Ç. Güven and G.R. Nápoles during the second semester, block three, of the academic year 2021/2022 (January to March 2022).


Tilburg University
Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 3 (January to March 2022)


Course: Data Mining for Business and Governance (880022-M-6)
Lecturers: Ç. Güven, G.R. Nápoles

Introduction to Data Mining
• no fixed definition, umbrella term
o Knowledge discovery in databases, Statistics, Artificial Intelligence, Machine learning
• Computation vs large data sets: trade-off between processing time and memory
o the larger the dataset, the more computational resources are needed
• Large amounts or big data: Volume, Variety, Velocity

[Figure: pipeline of a data mining task]

Basic data types
• Dependency oriented: explicit or implicit relationships
• Non-dependency oriented: no specified dependency between records (multidimensional data)
• For many machine learning models, observations are assumed to be independent

What makes prediction possible?
• Associations between features and the target: understand how data points are related
• Numerical: correlation coefficient
• Categorical: mutual information (the value of x1 contains information about the value of x2)
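
As a rough illustration of the categorical case, mutual information can be estimated from co-occurrence counts. The sketch below uses only the Python standard library; the function name `mutual_information` and the toy data are assumptions made for this example, not part of the lecture material.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Estimate mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n                 # joint relative frequency
        p_x = count_x[x] / n         # marginal relative frequencies
        p_y = count_y[y] / n
        mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi

# Hypothetical toy data
weather = ["sun", "sun", "rain", "rain"]
umbrella = ["no", "no", "yes", "yes"]   # fully determined by weather
coin = ["h", "t", "h", "t"]             # unrelated to weather

mi_dependent = mutual_information(weather, umbrella)   # 1 bit
mi_independent = mutual_information(weather, coin)     # 0 bits
```

A value of 0 means that knowing x1 tells us nothing about x2; larger values mean that x1 carries more information about x2.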

Correlation coefficient
• Pearson's r measures the strength of a linear relationship (dependency) only; it does not capture other shapes
• range [-1, 1]: the closer r is to 0, the more dispersed the data is around a straight line; 0 = no linear relationship; the sign gives the direction of the relationship
• for a strong linear relationship between two features, one of the features can be linearly expressed in terms of the other, which makes one of them redundant in the analysis





• Numerator: covariance (to what extent the features change together)
• Denominator: product of standard deviations (makes correlations independent of units)
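
The numerator and denominator described above correspond to the standard definition r = cov(x, y) / (s_x · s_y). A minimal sketch in plain Python follows; the function name `pearson_r` and the toy data are assumptions made for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: covariance (to what extent the features change together)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Denominator: product of standard deviations (removes the units)
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Hypothetical toy data
hours = [1, 2, 3, 4, 5]
score_linear = [2 * h + 1 for h in hours]      # perfect increasing line
score_inverse = [10 - h for h in hours]        # perfect decreasing line
score_curved = [(h - 3) ** 2 for h in hours]   # symmetric U shape

r_linear = pearson_r(hours, score_linear)      # close to +1
r_inverse = pearson_r(hours, score_inverse)    # close to -1
r_curved = pearson_r(hours, score_curved)      # 0, despite a clear dependency
```

The U-shaped case shows why r only captures linear relationships: the second feature is fully determined by the first, yet the coefficient is 0.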

Correlation versus causation
• Correlation does not imply causation
• an observed correlation can be a coincidence or be caused by a third (confounding) variable
• explain and check causation in an experimental study
o vary a single variable while the others are kept equal

Supervised learning
• use labeled data to train the algorithm
• classification and regression problems

learning workflow
• 1) collect data
o consider reliability of measurement, privacy, and other regulations
o split data into training, validation, and test set with similar structure
▪ training set for learning
▪ validation set for tuning and setting hyperparameters
▪ test set for final evaluation
• 2) label examples (sometimes part of data collection)
o Annotation guidelines, Measure inter-annotator agreement, Crowdsourcing
• 3) choose representation (part of preprocessing)
o Features: attributes describing examples
o Observations: observed values for a given attribute
▪ numerical features: discrete or continuous
▪ categorical / nominal features, binary features
▪ ordinal features (scale)
o features can be converted to a vector
o ‘feature transformation’: e.g., use dummy coding to transform a categorical feature to a numerical one
o ‘feature extraction’: select relevant features which represent the input and define the output
• 4) train model(s)
o hyperparameters: settings for an algorithm decided by the programmer
▪ for each value of hyperparameter:
1) Apply algorithm to training set to learn
2) Check performance on validation set
3) Find/Choose best-performing setting
• 5) evaluate
o Check performance of tuned model on test set
o Goal: estimate how well your model will do in the real world (generalization)
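
Steps 1, 4, and 5 of the workflow above can be sketched as follows. This is a minimal illustration in plain Python, assuming a toy one-dimensional dataset and a simple k-nearest-neighbour classifier whose hyperparameter is k; the data, the split sizes, the candidate values, and all function names are hypothetical choices for this example.

```python
import random

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

def accuracy(train, data, k):
    """Fraction of examples in `data` that k-NN (fitted on `train`) labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical labeled data: label is 1 when x > 5, with about 10% label noise
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    label = int(x > 5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

# Step 1: split into training, validation, and test sets
rng.shuffle(data)
train, validation, test = data[:120], data[120:160], data[160:]

# Step 4: for each hyperparameter value, learn on the training set
# and check performance on the validation set; keep the best setting
candidate_ks = [1, 3, 5, 9, 15]
validation_scores = {k: accuracy(train, validation, k) for k in candidate_ks}
best_k = max(validation_scores, key=validation_scores.get)

# Step 5: evaluate the tuned model once on the held-out test set
test_accuracy = accuracy(train, test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

The test set is used only once, after tuning, so that the reported score remains an honest estimate of generalization.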

regression task: predicting a numeric quantity
• regression analysis describes the relationship between random variables
• it can predict the value of one variable based on another variable and show trends
• output of regression problem is a function describing the relation between x and y
• numerical prediction (predicting values of continuous variables) is possible, unlike in classification

linear regression
• simplest regression technique, with two types of variables: independent (input) and dependent (output)
• aim is to minimize the difference between the predicted and the actual values
• measurements
o sum of squared errors




o or different loss functions
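
To make the sum of squared errors concrete: for predictions ŷ_i and actual values y_i, SSE = Σ (y_i − ŷ_i)². The sketch below fits a line with one input variable using the standard closed-form least-squares solution and computes its SSE. The function names and the toy data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sse(xs, ys, slope, intercept):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, roughly following y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
fitted_sse = sse(xs, ys, slope, intercept)

# Any other line gives a larger (or equal) SSE on the same data
other_sse = sse(xs, ys, slope + 0.5, intercept)
```

Comparing `fitted_sse` with `other_sse` shows what "minimizing the difference between predicted and actual values" means in practice: the fitted line is the one with the smallest SSE.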
Uploaded by: hannahgruber (Tilburg University)