
ALL lecture notes of the course Data Analytics (P. Snoeren) (Grade: 8.5)

Pages: 37
Uploaded on: 10-03-2020
Written in: 2019/2020

All lecture notes of P. Snoeren, including:
- Extra slides that he didn't include on Canvas
- Notes of what he said during the lectures
- Exam info he gave at the last lecture: which topics get how many questions

Preview of the contents

Notes on the strategy analytics lectures

Lecture 1

Two phenomena explain why data science is important:
1. The possibility of data collection in every aspect of business
2. There is huge technological development

Big data = very large data set with 3 distinct characteristics
1. Volume = quantity of generated & stored data
2. Variety = type & nature of the data
3. Velocity = speed at which the data is generated & processed

You can own and recombine the data

Data science = the principles, processes, and techniques for understanding phenomena
via the analysis of data
Business understanding→ data collection→ data storage→ data analysis→ implementation
➢ We focus on data analysis

Data mining = the extraction of knowledge from data, via technologies that incorporate
these principles
Data-driven decision making (DDD) = the practice of basing decisions on the
analysis of data, rather than purely on intuition
Two types of decisions of interest:
1. Need discovery (find patterns in the data that help you understand the business)
a. E.g. Walmart looked at how demand changed after hurricanes, saw that water
was in higher demand, and therefore kept more water in stock.
2. Repetitive decisions (made on a large scale)
a. E.g. a customer with a telecom contract may at some point want to switch to
another provider for a better offer. If the current provider can predict when
the customer will switch, it can retain them with a better offer first.

Marketing
- Online advertising (whenever you click on a link with an advert and the page loads,
a real-time auction decides how much advertisers are willing to pay for your click)
- Recommendations for cross-selling (when you buy a photo camera on Amazon, it
suggests an SD card as well): things that are bought together
- Customer relationship management (easyJet sends you information about how much
you have traveled with them to give you a warm feeling)
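Cross-selling recommendations like the Amazon example rest on counting which items appear together in purchases. A minimal Python sketch, assuming a handful of hypothetical baskets; the item names and the helpers `pair_counts` and `recommend` are invented for illustration and are not Amazon's actual method:

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set holds the items of one purchase.
baskets = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "bag"},
    {"sd_card", "usb_stick"},
]

def pair_counts(baskets):
    """Count how often each unordered pair of items occurs in the same basket."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def recommend(item, baskets, top_n=1):
    """Return the items most often bought together with `item`."""
    together = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a == item:
            together[b] += n
        elif b == item:
            together[a] += n
    return [other for other, _ in together.most_common(top_n)]

print(recommend("camera", baskets))  # ['sd_card']
```

The SD card wins because it co-occurs with the camera in two baskets, against one for the tripod and one for the bag.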

Finance
- Credit scoring and trading
- Fraud detection
- Workforce management

Retail
- Marketing (AH bonus weeks are determined by customer behavior in the store)
- Supply chain management (predict which products are going to be back-ordered and
prevent this from happening)

Data analytics = the process of examining datasets in order to draw conclusions about the
useful info they may contain
3 types of data analytics
1. Descriptive analytics (BI): What has happened?
a. Simple descriptive statistics, dashboards, charts, diagrams
b. Simple correlational methods
2. Predictive analytics: What could happen?
a. Regression, classification
b. Advanced correlation methods
3. Prescriptive analytics: What should we do?
a. A/B testing, advanced econometric techniques
b. Causality
We focus on the first two
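The difference between the first two types can be sketched on one small series. This assumes hypothetical weekly sales figures and uses a simple least-squares line as the predictive model; `fit_line` is an illustrative helper, not a method prescribed in the lecture:

```python
from statistics import mean

# Hypothetical weekly sales (invented for illustration).
weeks = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

# Descriptive analytics: what has happened?
print("average weekly sales:", mean(sales))  # 14.0

# Predictive analytics: what could happen? Fit y = a + b*x by ordinary least squares.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(weeks, sales)
print("forecast for week 6:", a + b * 6)  # 20.0
```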

Data science can help generate & sustain a competitive advantage (CA) if you align:
- Human capital
o Incentives
- Organization
o Center of excellence + local implementation (you need data scientists who can
do all the magic, plus local implementation by people who can talk to both the data
scientists and the TM team)
- Culture
o Data science at core of strategy making
- Infrastructure
o No data, no DDD

Challenges in data science
In a large mass of data you can always find some pattern, but it is not always clear
whether it generalizes to the wider population
➢ Risk of over-fitting
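The risk of over-fitting can be made concrete: a model that simply memorizes the training data scores perfectly on that data, yet on new cases does no better than chance when there is no real pattern. A sketch in Python, assuming randomly generated points with random labels and a 1-nearest-neighbor model chosen purely for illustration:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def make_data(n):
    """Random 2-D points with random 0/1 labels: there is no real pattern to learn."""
    return [((random.random(), random.random()), random.randint(0, 1)) for _ in range(n)]

train = make_data(200)
test = make_data(500)

def predict_1nn(point, data):
    """1-nearest-neighbor: copy the label of the closest training point (pure memorization)."""
    nearest = min(data, key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2)
    return nearest[1]

def accuracy(rows, data):
    """Share of rows whose label the 1-NN model predicts correctly."""
    return sum(predict_1nn(x, data) == y for x, y in rows) / len(rows)

train_acc = accuracy(train, train)  # every training point finds itself: 1.0
test_acc = accuracy(test, train)    # roughly 0.5: the memorized "pattern" does not generalize
print(train_acc, round(test_acc, 2))
```

The gap between the two numbers is the over-fitting: the pattern found in the training data exists only there.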

Data mining process
Cross-Industry Standard Process for Data Mining (CRISP-DM) / analytics
➢ This is also the core of the course: structure your assignments according to
this model
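As a rough sketch, the CRISP-DM cycle can be written down as an ordered set of phases. The six phase names follow the standard CRISP-DM model; the `next_phase` helper is a hypothetical simplification that only models the loop from evaluation (and deployment) back to business understanding, not the other back-and-forth steps the full diagram allows:

```python
from enum import Enum

class Phase(Enum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

def next_phase(phase, evaluation_ok=True):
    """Simplified transition: step forward, but loop back to business understanding
    after a failed evaluation or after deployment (the process is a cycle)."""
    if phase is Phase.EVALUATION and not evaluation_ok:
        return Phase.BUSINESS_UNDERSTANDING
    if phase is Phase.DEPLOYMENT:
        return Phase.BUSINESS_UNDERSTANDING
    return Phase(phase.value + 1)

# Walk one full pass through the cycle.
phase = Phase.BUSINESS_UNDERSTANDING
for _ in range(len(Phase)):
    print(phase.name)
    phase = next_phase(phase)
```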
Data analytic thinking
- Routinely transform business problems into data science problems
- Tacit skill that is only learned through trial & error

Supervised learning
Training data has one feature that is the target

Supervised = classification, regression, similarity matching
Unsupervised = clustering, profiling, co-occurrence grouping
Both = similarity matching, link prediction, data reduction
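The supervised/unsupervised distinction comes down to whether a target is used. A minimal Python sketch on hypothetical one-dimensional data: a nearest-centroid classifier (supervised, uses the `churned` labels) next to a simple k-means clustering with k=2 (unsupervised, ignores them). Both techniques and all data here are illustrative choices:

```python
from statistics import mean

# Hypothetical 1-D data: monthly spend per customer, plus a churn label.
spend = [5.0, 6.0, 7.0, 50.0, 52.0, 54.0]
churned = ["yes", "yes", "yes", "no", "no", "no"]  # target: only used in the supervised case

# Supervised (classification): learn one centroid per class from the labeled examples.
centroids = {
    label: mean(x for x, y in zip(spend, churned) if y == label)
    for label in set(churned)
}

def classify(x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Unsupervised (clustering): no target; k-means groups the points on its own.
def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: abs(x - centers[i]))

def kmeans_1d(xs, centers, iterations=10):
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in xs:
            groups[nearest(x, centers)].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

print(classify(8.0))                   # yes
print(kmeans_1d(spend, [0.0, 60.0]))   # [6.0, 52.0]
```

Here the clusters happen to match the classes, but k-means only finds groups; it cannot say which group means "churn" without the target.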




Boundaries
- Knowledge discovery and data mining (KDD) is a subfield of machine learning
- Data science (prediction) is not econometrics (correlation & causality), and neither is
statistics (interested in whether an observed distribution is likely to come from a random
distribution)
o Therefore, rely heavily on business understanding
o Always separate training, test and use data
o This is also why we are not interested in R² or p-values (though we will use
other tools to evaluate models)
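Keeping training and test data apart can be sketched as a shuffled split. A minimal version, assuming hypothetical labeled rows; `train_test_split` here is a hand-written helper (not scikit-learn's function of the same name), and the `use` list stands for new cases whose target is not yet known:

```python
import random

def train_test_split(rows, test_frac=0.25, seed=42):
    """Shuffle labeled rows and split them into a training set and a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]

# Hypothetical labeled history: (feature, target) pairs.
history = [(x, x > 5) for x in range(12)]
train, test = train_test_split(history)

# 'Use' data: new cases whose target is not yet known. It arrives after the model
# is built and is never part of training or testing.
use = [3, 9]

assert not set(train) & set(test)   # no row is both trained on and tested on
print(len(train), len(test), len(use))  # 9 3 2
```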

Case 1: Capital One
Today a very data-driven company
Invested in high-quality data
- Gave customers random terms for their credit cards
- This yielded data on customers who normally would not be given credit cards
- These customers turned out to be very profitable, i.e. those who pay off just enough
of their debt not to default, while Capital One still earns a lot of interest

What can they do that other banks can’t?
- Customer acquisition
o Provide data-driven services before even speaking to customers
- Product customization
o Differentiate interest rates for credit cards (make custom made products for
each individual customer)
- Customer retention
o Invested heavily in both IT and data analysts

What is required for Capital One to translate the business problem of fraud detection into a
data science task?

Drawbacks of data-driven strategy
- Cost and risk in data acquisition
o Providing customers with random terms for their credit cards is risky and in
short term likely to lead to losses
o Signet Bank incurred losses for several years
- Capital One found out nobody recognized their brand
o Target variables generally short-term
o What is profitable in the short run does not necessarily help in the long run
- Might weed out certain customers
o Reciprocators vs. self-regulating stakeholders
o Customers who are likely to leave if someone else gives cheaper offer

Lecture 2

Datasets contain entities with certain attributes
Dataset = sample, population, data, set, work set
Entity = object, instance, observation, element, example, line, row, feature vector
Attribute = feature, characteristic, variable, column
- Predicted attribute = dependent, explained
- Predicting attribute = independent, explanatory
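In code, this vocabulary maps naturally onto a list of rows. A sketch assuming a hypothetical churn dataset: each dictionary is an entity, each key an attribute, and `churned` is treated as the predicted attribute:

```python
# Hypothetical dataset: each entity (row) has the same attributes (columns).
dataset = [
    {"age": 34, "income": 52_000, "churned": False},
    {"age": 22, "income": 18_000, "churned": True},
    {"age": 45, "income": 61_000, "churned": False},
]

target = "churned"  # the predicted (dependent) attribute
features = [name for name in dataset[0] if name != target]  # predicting (independent) attributes

X = [[row[name] for name in features] for row in dataset]  # one feature vector per entity
y = [row[target] for row in dataset]

print(features)    # ['age', 'income']
print(X[0], y[0])  # [34, 52000] False
```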

Model = a simplified representation of reality created to serve a purpose (abstracts away
irrelevant details)
Purpose
- Unsupervised setting: to identify (classes, group, patterns) → descriptive
- Supervised setting: to predict (try to estimate an unknown value) → predictive
o What is the value of this house?
Induction = generalizing from specific cases to general rules
e.g. developing classification and regression models
Deduction = applying general rules and specific facts to create other specific facts
e.g. using classification and regression models
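Induction and deduction can be illustrated with a one-rule model. This is a hypothetical example: the rule "spend below a threshold means churn", the data, and both helpers are assumptions made for illustration:

```python
# Hypothetical labeled cases: (monthly spend, churned?)
cases = [(5.0, True), (8.0, True), (12.0, True), (40.0, False), (55.0, False), (60.0, False)]

def induce_threshold(cases):
    """Induction: from specific cases, pick the threshold that makes the general rule
    'spend < threshold -> churn' commit the fewest errors."""
    candidates = sorted(x for x, _ in cases)

    def errors(t):
        return sum((x < t) != label for x, label in cases)

    return min(candidates, key=errors)

def deduce(threshold, spend):
    """Deduction: apply the general rule to a specific new fact."""
    return spend < threshold

rule = induce_threshold(cases)
print(rule, deduce(rule, 20.0))  # 40.0 True
```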

Supervised and unsupervised learning are not directly tied to induction/deduction; either can involve both

Supervised segmentation
Objective: How can we segment the population into groups that differ from each other with
respect to some quantity of interest?
Inputs: Informative attributes (they must be knowable beforehand: you cannot use as input
the value of an acquisition that has yet to happen)
Knowable attributes that correlate with the target of interest
Outputs: Segments that are pure (or less impure) in the quantity of interest
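The excerpt stops before naming a purity measure; a common choice (assumed here, not confirmed by these notes) is entropy, with information gain measuring how much splitting into segments reduces it. A minimal sketch with hypothetical target values:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Impurity of a set of target values: 0 if pure, 1 for a 50/50 binary mix."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, segments):
    """Drop in entropy from splitting `parent` into `segments`, weighted by segment size."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in segments)

# Hypothetical target values (1 = write-off, 0 = no write-off), split by a yes/no attribute.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
segments = [[1, 1, 1, 0], [1, 0, 0, 0]]

print(entropy(parent))                               # 1.0
print(round(information_gain(parent, segments), 3))  # 0.189
```

An attribute whose segments are purer would yield a higher gain, which is what a supervised segmentation looks for.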

Notes by hannah2501 (Universiteit van Amsterdam)