Summary Lecture Slides & notes Real-life Machine learning (300363-B-6)

Rating: -
Sold: 3
Pages: 71
Uploaded on: 17-12-2023
Written in: 2023/2024

This document contains all the lecture slides and notes of the course 'Real-life Machine learning (300363-B-6)', given at Tilburg University as a pre-master course for JADS. It is complete and contains everything needed for the exam. Good luck with the course!


Lecture 1
What are we going to learn today?
- What is machine learning?
- What are supervised and unsupervised machine learning?
- Which are the most common types of machine learning problems?
- What are the basic steps of the Cross-Industry Standard Process for Data Mining
(CRISP-DM)?

Machine learning is the field of study that gives computers the ability to learn without being
explicitly programmed.

Machine learning
Assume that you are repeating an exercise over and over again.
What should stay constant in your exercise?
- Learning! - machine learning applies strategies and algorithms, combined with data
and statistics
- Improving! - machine learning applies statistical measures to quantify the overlap
between the ML prediction and the expected result
When you do it, it is human learning.
When a machine does it, it is machine learning!
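
The "overlap between ML prediction and expected result" can be measured in many ways. A minimal sketch, assuming accuracy (the fraction of matching labels) as the measure; the labels below are made up for illustration and are not from the lecture:

```python
# Hypothetical example data: what the model predicted vs. what was expected.
predicted = ["spam", "ham", "spam", "ham", "ham"]
expected = ["spam", "ham", "ham", "ham", "ham"]


def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("both lists must be non-empty and of equal length")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


score = accuracy(predicted, expected)
print(f"accuracy = {score:.2f}")  # 4 of the 5 predictions match
```

Accuracy is only one possible measure; which one fits depends on the problem.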




An example of supervised learning
Supervised learning - classification
Given a labelled dataset, the model learns to
predict the labels of new examples
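
As an illustration of classification, here is a minimal sketch of a 1-nearest-neighbour classifier. The choice of algorithm, the features and the labels are assumptions made for this example; the notes do not prescribe a specific method.

```python
import math

# Hypothetical labelled dataset: (height in cm, weight in kg) -> label.
training_data = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def predict(features, training_data):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label


# New, unlabelled examples get the label of their nearest labelled neighbour.
print(predict((152.0, 52.0), training_data))
print(predict((182.0, 88.0), training_data))
```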




An example of unsupervised learning
Unsupervised learning - clustering,
dimensionality reduction, anomaly detection
and novelty detection
Given a dataset without labels, the model
learns to cluster/group similar data
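
As an illustration of clustering, here is a minimal sketch of k-means on one-dimensional data. The algorithm choice, the values and the starting centroids are assumptions made for this example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# No labels are given; the groups emerge from the data alone.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
clusters, centroids = k_means_1d(values, centroids=[0.0, 5.0])
print(clusters)
print(centroids)
```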

[Figure: CRISP-DM process model]

[Figure: Business understanding in the CRISP-DM process]

Determine business objectives and success criteria
Business objectives and measures to evaluate the results have to be established

Business objectives:
● What is the customer’s primary objective?
● Increase the number of loyal customers
● Selling more of a certain product
● Have a positive marketing campaign

Business success criteria:
● Objective measure to establish success (e.g. return on investment)

Main steps in a data mining project
1. Define the goals:
Business and data mining experts together have to define the goals. For each goal a
measure must be defined to understand its success
2. Obtain the models:
Pre-process the data, apply data mining algorithms
3. Evaluate results:
Use the pre-specified measures to evaluate the models
4. Deploy:
If the evaluation is successful, the model can be deployed
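
The four steps above can be sketched as a simple check of measured results against the pre-specified measures. The measure names, thresholds and measured values are hypothetical.

```python
# Step 1: goals agreed with the business, each with a measure and a threshold.
# The measure names and thresholds below are hypothetical.
success_criteria = {"accuracy": 0.90, "recall": 0.80}


def evaluate(measured, success_criteria):
    """Step 3: compare each measured value with its pre-specified threshold."""
    return {
        name: measured.get(name, 0.0) >= threshold
        for name, threshold in success_criteria.items()
    }


def can_deploy(measured, success_criteria):
    """Step 4: deploy only if every criterion is met."""
    return all(evaluate(measured, success_criteria).values())


# Step 2 would produce these numbers by pre-processing data and running
# data mining algorithms; here they are made-up values for illustration.
measured = {"accuracy": 0.93, "recall": 0.76}

print(evaluate(measured, success_criteria))
print(can_deploy(measured, success_criteria))
```

In this made-up case the recall criterion is missed, so the model would not be deployed yet.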

Costs & benefits
Perform a cost-benefit analysis
Compute the benefits of the project (e.g. return on investment)
Compute the costs of the project - main factors:
● Data sources
● Data mining problem to be solved
● Available tools
● Expertise of the development team

Quantify the risk that the project fails:
● Knowledge not available
● Data not available
● Missing tools

Quality data & feature engineering
What are we going to learn today?
- What kind of data exists?
- How to prepare data?
- What is data balancing?
- How to apply data cleaning and feature scaling?
- What is feature selection?

What kind of data exists?
- Structured data
- Unstructured data
- Semi-structured data

Structured data
Tabular data (rows and columns) with a very well-defined layout.
We know which columns there are and what kind of data they contain (the format is very
strict).
Such data is often stored in databases that also represent the relationships between the
data. Questions about the data can be answered by using a query language.
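
A minimal sketch of structured data with a strict format, a relationship between tables, and a query, using SQL through Python's built-in sqlite3 module. The table names, columns and rows are made up for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
connection = sqlite3.connect(":memory:")

# The schema fixes which columns exist and what kind of data they contain.
connection.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
# The customer_id column expresses the relationship between the two tables.
connection.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)

connection.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", "Tilburg"), (2, "Bob", "Eindhoven")],
)
connection.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0)],
)

# Question: how much did each customer spend in total?
rows = connection.execute(
    """
    SELECT customers.name, SUM(orders.amount)
    FROM customers
    JOIN orders ON orders.customer_id = customers.id
    GROUP BY customers.name
    ORDER BY customers.name
    """
).fetchall()
print(rows)

connection.close()
```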

Unstructured data
The rawest form of data, which can be any type of file.
Extracting value out of this kind of data is hard, since you first need to extract structured
features from the data.
For example, you might want to extract topics from movies.
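
Extracting topics from movies is a hard problem. As a much simpler stand-in, the sketch below turns a piece of free text into structured features (word counts). The text, the stop-word list and the function name are made up for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "and", "is", "in", "of", "to", "it"}


def word_counts(text):
    """Turn free text into a structured feature: a count per word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


review = "The robot saves the city, and the robot is a hero of the city."
features = word_counts(review)
print(features)
```

The resulting counts fit in a table with one column per word, which makes the text usable by methods that expect structured input.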

Semi-structured data
This format sits between structured and unstructured data.
A consistent format is defined. However, the structure is not very strict. For example, the data
might not be tabular, or parts of the data may be missing.
Semi-structured data are often stored as files. However, some kinds of semi-structured data
can be stored in document-oriented databases. Such databases allow you to query the
semi-structured data.
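
JSON is a common example of a semi-structured format. A minimal sketch with made-up records that share a consistent format but have missing and nested parts, flattened into rows with Python's built-in json module:

```python
import json

# Hypothetical records: same general shape, but not every field is present.
raw_records = """
[
  {"name": "Ann", "email": "ann@example.com", "tags": ["loyal", "newsletter"]},
  {"name": "Bob"},
  {"name": "Cem", "address": {"city": "Tilburg"}}
]
"""

records = json.loads(raw_records)

# Flatten into tabular rows; missing parts become None instead of raising errors.
rows = [
    {
        "name": record["name"],
        "email": record.get("email"),
        "city": record.get("address", {}).get("city"),
    }
    for record in records
]

for row in rows:
    print(row)
```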
