Summary

Summary Data Science Methods for MADS (EBM216A05)

Uploaded on

05-02-2021

Written in

2020/2021

Summary of all lectures and mandatory literature included in the course Data Science Methods for MADS.

Written for

Institution: Rijksuniversiteit Groningen (RuG)
Study: Marketing Analytics & Data Science
Course: Data Science Methods for MADS (EBM216A05)

Document information

Uploaded on: February 5, 2021
Number of pages: 79
Written in: 2020/2021
Type: Summary

Subjects

marketing analytics
mads
data science
marketing intelligence
data science methods
dsm
rug

Content preview

Data Science Methods – Lectures and readings

Week 1
Lecture 1

Introduction to Machine Learning

Machine Learning according to Herbert Alexander Simon:
 Learning is any process by which a system improves performance from experience.
 Machine Learning is concerned with computer programs that automatically improve their performance through experience.
 Example: spam e-mails. After analyzing many e-mails that are labeled as spam or not spam, the machine recognizes spam by the characteristics spam e-mails share (supervised learning). Grouping unlabeled e-mails purely by similarity would be unsupervised learning.
 Supervised learning means that the correct results (targets) are given to the machine together with the inputs during training.

Why machine learning?
 Develop systems that can automatically adapt and customize themselves to individual users.
 Discover new knowledge from large databases (data mining).
 Mimic humans and take over monotonous tasks that require some intelligence.
 Develop systems that are too difficult or expensive to construct manually because they require detailed skills or knowledge tuned to a specific task (knowledge engineering bottleneck).

Statistics, machine learning and data mining: distinctions are fuzzy!
 Statistics: more theory-based, more focused on testing hypotheses.
 Machine learning: more heuristic, focused on improving the performance of a learning agent. Also looks at real-time learning and robotics.
 Data mining: integrates theory and heuristics, with a focus on the entire process of knowledge discovery, including data cleaning, learning, and the integration and visualization of results.

Machine learning:
 Study of algorithms that improve their performance at some task with experience.
 Optimize a performance criterion using example data or past experience.
 Role of statistics: inference from a sample
 Role of computer science: efficient algorithms to solve the optimization problem and
representing and evaluating the model for inference.

Machine learning is the preferred approach to:
 Speech recognition, natural language processing, computer vision, medical outcome
analysis, robot control.

This trend is accelerating because of:
 Improved machine learning algorithms
 Improved data capture, networking and faster computers
 Software too complex to write by hand
 New sensors/IO devices
 Demand for self-customization to user/environment
 Failure of expert systems in the 1980s

Origins of data mining:

 Draws ideas from machine learning/artificial intelligence, pattern recognition, statistics and database systems.
 Traditional techniques may be unsuitable due to the enormity of the data, its high dimensionality, and its heterogeneous, distributed nature.

What is data mining?
 Extraction of implicit, previously unknown and potentially useful information from data.
 Exploration & analysis, by automatic or semi-automatic means, of large quantities of data
in order to discover meaningful patterns.

Knowledge discovery in databases (KDD) is…
 The non-trivial process of identifying patterns in data that are:
• implicit (in contrast to explicit)
• valid (patterns should be valid on new data)
• novel (novelty can be measured by comparing to expected values)
• potentially useful (should lead to useful actions)
• understandable (to humans)
 Data mining is a step in the KDD process

Steps of a KDD process:
1. Data cleaning: missing values, noisy data, inconsistent data
2. Data integration: merging data from multiple data stores
3. Data selection: select data relevant to the analysis
4. Data transformation: aggregation (daily to weekly sales) or generalization (street to city)
5. Data mining: apply intelligent methods to extract patterns
6. Pattern evaluation: interesting patterns should contradict the user's beliefs or confirm a hypothesis
7. Knowledge presentation: visualization and representation techniques to present the mined knowledge
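
The preparation steps (1 to 4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two stores, their records, and the function names `clean`, `integrate`, `select` and `to_weekly` are hypothetical and do not come from the lecture.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records from two data stores (made up for illustration).
store_a = [
    {"day": date(2021, 1, 4), "item": 15, "units": 3},
    {"day": date(2021, 1, 5), "item": 15, "units": None},  # missing value
    {"day": date(2021, 1, 12), "item": 15, "units": 4},
]
store_b = [
    {"day": date(2021, 1, 6), "item": 15, "units": 2},
    {"day": date(2021, 1, 6), "item": 99, "units": 7},
]


def clean(records):
    """Step 1, data cleaning: drop records with a missing unit count."""
    return [r for r in records if r["units"] is not None]


def integrate(*sources):
    """Step 2, data integration: merge records from several data stores."""
    return [r for source in sources for r in source]


def select(records, item):
    """Step 3, data selection: keep only the records relevant to the analysis."""
    return [r for r in records if r["item"] == item]


def to_weekly(records):
    """Step 4, data transformation: aggregate daily sales to (ISO year, ISO week)."""
    weekly = defaultdict(int)
    for r in records:
        iso = r["day"].isocalendar()
        weekly[(iso[0], iso[1])] += r["units"]
    return dict(weekly)


weekly_sales = to_weekly(select(integrate(clean(store_a), clean(store_b)), item=15))
print(weekly_sales)
```

Dropping incomplete records is just one possible cleaning choice; imputing the missing value would be another.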

Good to know…
 60 to 80% of the KDD effort is preparing the data and the remaining 20 to 40% is mining.
 A data mining project always starts with an analysis of the data with traditional query tools:
• 80% of the interesting information can be extracted using SQL, for example "How many transactions per month include item number 15?" or "Show me all the items purchased by Mandy Smith."
• 20% of the hidden information requires more advanced techniques, for example "Which items are frequently purchased together by my customers?" or "How should I classify my customers in order to decide whether future loan applicants will be given a loan or not?"
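
The two SQL-style questions above can be answered with plain queries. The sketch below uses Python's built-in `sqlite3` module; the `purchases` table, its columns and its rows are assumptions made up for illustration, not data from the course.

```python
import sqlite3

# In-memory database with a hypothetical purchases table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (
        transaction_id INTEGER,
        customer TEXT,
        item_number INTEGER,
        purchased_on TEXT
    );
    INSERT INTO purchases VALUES
        (1, 'Mandy Smith', 15, '2021-01-15'),
        (1, 'Mandy Smith', 22, '2021-01-15'),
        (2, 'John Doe',    15, '2021-01-20'),
        (3, 'Mandy Smith', 15, '2021-02-03'),
        (4, 'John Doe',    30, '2021-02-10');
""")

# How many transactions per month include item number 15?
per_month = conn.execute("""
    SELECT strftime('%Y-%m', purchased_on) AS month,
           COUNT(DISTINCT transaction_id) AS n_transactions
    FROM purchases
    WHERE item_number = 15
    GROUP BY month
    ORDER BY month
""").fetchall()

# Show all the items purchased by Mandy Smith.
mandy_items = conn.execute("""
    SELECT DISTINCT item_number
    FROM purchases
    WHERE customer = 'Mandy Smith'
    ORDER BY item_number
""").fetchall()

print(per_month)
print(mandy_items)
```

Questions such as "which items are frequently purchased together?" are not a single lookup of this kind, which is why the notes place them in the part that needs more advanced techniques.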

Data mining tasks
 Prediction Tasks
• Use some variables to predict unknown or future values of other variables
 Description Tasks
• Find human-interpretable patterns that describe the data.

 Common data mining tasks
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
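
As a small illustration of a descriptive task, association rule discovery can start by counting which item pairs occur together. The baskets below are made up, and the `support` and `confidence` measures are the standard textbook definitions rather than something stated in these notes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (one set of items per transaction).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

# Count how often each item and each unordered item pair occurs.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def support(pair):
    """Share of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]


most_frequent_pair, count = pair_counts.most_common(1)[0]
print(most_frequent_pair, count)
print(support(("bread", "butter")), confidence("bread", "butter"))
```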

Applications of KDD
 Retail: market basket analysis, customer relationship management (CRM)
 Finance: credit scoring, fraud detection
 Manufacturing: optimization, troubleshooting
 Medicine: medical diagnosis
 Web mining: search engines (web 2.0)

Classification of ML algorithms
There are two main types of ML algorithms: supervised learning and unsupervised learning.

 Supervised learning (dependence techniques)
• Training data includes both the input and the desired results.
• For some examples the correct results (targets) are known and are given as input to the model during the learning process.
• The construction of a proper training, validation and test set is crucial.
• The model has to be able to generalize: give the correct results for new input data whose target is not known in advance.
• Examples: (logistic) regression, neural networks, support vector machines (SVM)
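
A minimal sketch of building a training, validation and test set is shown below. The 60/20/20 proportions, the function name `split_dataset` and the toy examples are assumptions for illustration; the notes do not prescribe specific proportions.

```python
import random


def split_dataset(examples, train=0.6, validation=0.2, seed=0):
    """Shuffle the examples and split them into training, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )


# Toy labeled examples: (input, target).
examples = [(x, int(x >= 50)) for x in range(100)]
train_set, validation_set, test_set = split_dataset(examples)
print(len(train_set), len(validation_set), len(test_set))
```

The model is fitted on the training set, tuned on the validation set, and the untouched test set is used once to check how well it generalizes.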

 Unsupervised learning (interdependence techniques)
• The model is not provided with the correct results during training.
• Can be used to cluster the input data into classes on the basis of their statistical properties only.
• Examples: clustering, factor analysis
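
To make the clustering idea concrete, here is a small k-means sketch on one-dimensional data. K-means is used as one common clustering method, not as the method named in the notes; the spend values, the starting centroids and the name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(values, centroids, n_iterations=10):
    """Cluster numbers by repeatedly assigning each to its nearest centroid."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iterations):
        # Assignment step: each value joins the cluster of the nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical yearly spend per customer; no targets are provided.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, centroids=[10, 105])
print(centroids, clusters)
```

No correct answers are given to the algorithm: the two groups emerge only from how the values are distributed.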

Example supervised learning (algorithms)
 Logistic regression
 Data: bankruptcy of firms over years
• Firm level variables (leverage, liquidity)
• Industry level variables (market growth, recession)
• Marketing related variables (capabilities, assets)

Note: logistic regression gives the better fit here (the outcome, bankrupt or not, is binary), and it is itself a kind of machine learning.
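
The sketch below fits a logistic regression by batch gradient descent on a handful of invented firms. Everything in it is an assumption for illustration: the six firms, the two firm-level features (leverage, liquidity), the learning rate and the function names. It is not the model or the data used in the lecture.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, xi):
    """Predicted probability of bankruptcy for one firm."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))


def fit_logistic(X, y, learning_rate=0.5, n_epochs=2000):
    """Fit weights and bias by batch gradient descent on the average log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(n_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j, x in enumerate(xi):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


# Made-up firms: (leverage, liquidity); target 1 = went bankrupt.
X = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.1), (0.3, 0.8), (0.2, 0.9), (0.4, 0.7)]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, (0.9, 0.15)))   # highly leveraged, illiquid firm
print(predict_proba(weights, bias, (0.25, 0.85)))  # low leverage, liquid firm
```

In this toy data the fitted weight on leverage is positive and the weight on liquidity is negative, so the model assigns a higher bankruptcy probability to leveraged, illiquid firms.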

The uses of supervised learning
Example: decision trees, tools that create rules
 Prediction of future cases: use the rule to predict the output for future inputs
 Knowledge extraction: the rule is easy to understand
 Compression: the rule is simpler than the data it explains
 Outlier detection: exceptions that are not covered by the rule, e.g., fraud
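
A one-level decision tree (a "stump") is enough to show these four uses. The sketch below learns a single threshold rule from invented loan data; the incomes, labels and the name `fit_stump` are hypothetical, and a real decision tree would combine many such splits.

```python
def fit_stump(values, labels):
    """Return the threshold t for the rule 'predict 1 if value > t' with the fewest errors."""
    candidates = sorted(set(values))
    # Candidate thresholds lie halfway between neighbouring observed values.
    thresholds = [(a + b) / 2 for a, b in zip(candidates, candidates[1:])]

    def errors(t):
        return sum((v > t) != bool(label) for v, label in zip(values, labels))

    return min(thresholds, key=errors)


# Made-up applicants: yearly income (in thousands) and whether the loan was repaid.
incomes = [20, 25, 30, 45, 50, 60, 52]
repaid = [0, 0, 0, 1, 1, 1, 0]

threshold = fit_stump(incomes, repaid)

# Prediction of future cases: apply the rule to a new applicant.
new_applicant_repays = 48 > threshold

# Outlier detection: training cases the rule does not cover.
exceptions = [(v, label) for v, label in zip(incomes, repaid) if (v > threshold) != bool(label)]

print(threshold, new_applicant_repays, exceptions)
```

The learned rule ("repays if income is above the threshold") is easy to read, is shorter than the seven records it summarizes, and leaves the one applicant it misclassifies as a candidate outlier.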

Unsupervised learning
 Learning “what normally happens”
 No output
 Clustering: grouping similar instances
 Other applications: summarization, association analysis
 Example applications
• Customer segmentation in CRM
• Image compression: colour quantization
• Bioinformatics: learning motifs

Reinforcement learning
 Topics:
• Policies: what actions should an agent take in a particular situation

Seller: nikkinuman (Rijksuniversiteit Groningen)
