Summary Analysis of Customer Data

Uploaded on: 29-03-2023
Pages: 35
Written in: 2022/2023

Summary/lecture notes of the course Analysis of Customer Data.














Summary: Analysis of Customer Data
Association rule mining → tries to find correlations between different items, or sets of items (itemsets), in a database.
- A rule-based machine learning method for discovering interesting relations between variables in large databases.
- Initially used for market basket analysis, to find how the items purchased by customers are related.
- Assumes all data are categorical; there is no good algorithm for numerical data.

Terminology
- Set of items → I = {i1, i2, …, im}
  - Supermarket example: the list of all articles sold in a supermarket.
- Transaction → t (a smaller set of items), with t ⊆ I: t is a subset of the entire set of items I.
  - Supermarket example: the articles that one customer buys.
- Transaction database → T = {t1, t2, …, tn}, a collection of transactions.
Example transaction database:




Another example: you could analyze a set of documents to see which words, concepts, or tags often co-occur. In that case, each document is reduced to a list of keywords.
- The keywords are the items.
- Each document is a transaction.
- All the documents together make up a transaction database.
- All the keywords that appear anywhere in the transaction database form I, the set of all possible items.
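This document-to-transaction mapping can be sketched in a few lines of Python; the mini-corpus below (document names and keywords) is invented purely for illustration:

```python
# Hypothetical mini-corpus: each document has already been reduced to its keywords.
docs = {
    "doc1": ["mining", "rules", "support"],
    "doc2": ["rules", "confidence"],
    "doc3": ["mining", "support", "confidence"],
}

# Each keyword list becomes one transaction; together they form the transaction database T.
T = [frozenset(keywords) for keywords in docs.values()]

# I is the set of all possible items: every keyword that appears in any document.
I = frozenset().union(*T)
print(sorted(I))  # ['confidence', 'mining', 'rules', 'support']
```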

Association rules
Association rules are "if-then" statements that express the probability of relationships between data items within large data sets, in various types of databases.
An association rule is an implication of the form:
- X → Y, where X, Y ⊂ I and X ∩ Y = ∅.
A single item is an itemset of size 1.
A k-itemset is an itemset with k items.
- E.g., {milk, bread, cereal} is a 3-itemset.

How do you determine which patterns are interesting and which are not?
- Support measure → tells us how often a particular itemset occurs in the database, or how often a particular association rule holds in the dataset.
  - sup = Pr(X ∪ Y)
  - It measures the probability of encountering an association rule or an itemset in the dataset.
- Confidence measure → looks at the conditional probability. X implies Y, but that doesn't mean that Y also implies X: an association rule is one-directional.
  - conf = Pr(Y | X)
  - The confidence tells us what proportion of the transactions that contain X also contain Y.
Once you have computed the support and confidence of an association rule, you know how frequently it occurs and how confident you can be in the rule. So an association rule tells us that when X occurs, Y occurs with a certain probability.
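As a quick sanity check of these two definitions, here is a minimal sketch in Python; the five-transaction toy database is an invented example, not the dataset from these notes:

```python
def support(itemset, transactions):
    """Relative support: the fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """conf(X -> Y) = Pr(Y | X) = support(X union Y) / support(X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

# Hypothetical toy transaction database.
T = [
    {"milk", "bread"},
    {"milk", "bread", "cereal"},
    {"bread", "cereal"},
    {"milk", "cereal"},
    {"milk", "bread"},
]

print(support({"milk", "bread"}, T))       # 0.6  (3 of 5 transactions)
print(confidence({"milk"}, {"bread"}, T))  # 0.75 (3 of the 4 milk transactions)
```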

Support count = the number of transactions in T that contain X.
- T = transaction dataset, X = itemset.
The (relative) support divides the support count by n, the number of transactions in T:
- support(X) = count(X) / n
- Support and support count are often used as synonyms:
  - Absolute support = support count (the number of transactions that contain the itemset); a whole number.
  - Relative support = support (the proportion of transactions that contain the itemset); a real number between 0 and 1.

Goal = to find all rules that satisfy a user-specified minimum support (minsup) and minimum confidence (minconf).
- Confidence measures how strong the rule is; support measures how reliable it is.

Example:
Transaction data:




Minsup = 30%
Minconf = 80%

An example frequent itemset:
- {Chicken, Clothes, Milk} [sup = 3/7]
Association rules from the itemset:
- {Clothes} → {Milk, Chicken} [sup = 3/7, conf = 3/3] → confident (3/3 = 100% ≥ minconf)
- {Milk, Chicken} → {Clothes} [sup = 3/7, conf = 3/4] → not confident (3/4 ≈ 75% < minconf)

Mining algorithms
There are many; they all use different strategies and data structures.
- Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may differ.
We'll study only one:

The apriori algorithm
Has two steps:
1. Find all itemsets that satisfy the minimum support threshold (the frequent itemsets).
2. Use the frequent itemsets to generate rules.

Step 1: mining all frequent itemsets
Frequent itemset = an itemset whose support ≥ minsup.
The apriori property (the key idea of the apriori algorithm): all subsets of a frequent itemset are also frequent itemsets.
Iterative algorithm (also called level-wise search):
- First find all frequent 1-itemsets, then all frequent 2-itemsets, etc.
- In iteration k, only consider candidate k-itemsets all of whose (k−1)-subsets are frequent.
Example:




- Minsup = 0.5
When looking at the itemsets of a specific length, you only keep the ones that meet the minsup value. That is why in the F1 step itemset {4} is dropped.

The items are sorted in lexicographic order.
- Also called dictionary order.
- For example, the permutations of {1, 2, 3} in lexicographic order are 123, 132, 213, 231, 312, and 321.




Apriori candidate generation
- Join step: generate all possible candidate itemsets Ck of length k.
- Prune step: remove those candidates in Ck that cannot be frequent.
Simpler explanation:
1. Generate a list of all candidate itemsets of length 1 → each item in the dataset is a candidate itemset.
2. Prune infrequent 1-itemsets → any candidate 1-itemset that does not meet the minsup threshold is discarded.
3. Generate a list of candidate 2-itemsets → the algorithm generates itemsets of length 2 by combining frequent 1-itemsets.
4. Prune infrequent 2-itemsets → any candidate 2-itemset that does not meet the minsup threshold is discarded.
This continues level by level until no new frequent itemsets are found.
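The join/prune loop can be sketched as follows. This is a minimal, unoptimized version, and the four-transaction database at the bottom is an invented example:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise mining of frequent itemsets; returns {itemset: relative support}."""
    n = len(transactions)
    sup = lambda s: sum(s <= t for t in transactions) / n

    # Level 1: every single item is a candidate; keep those meeting minsup.
    items = {i for t in transactions for i in t}
    Fk = {frozenset([i]) for i in items if sup(frozenset([i])) >= minsup}
    frequent = {s: sup(s) for s in Fk}

    k = 2
    while Fk:
        # Join step: candidate k-itemsets from unions of frequent (k-1)-itemsets.
        Ck = {a | b for a in Fk for b in Fk if len(a | b) == k}
        # Prune step (apriori property): drop candidates with an infrequent (k-1)-subset.
        Ck = {c for c in Ck if all(frozenset(s) in Fk for s in combinations(c, k - 1))}
        # Count support and keep only the frequent candidates.
        Fk = {c for c in Ck if sup(c) >= minsup}
        frequent.update({s: sup(s) for s in Fk})
        k += 1
    return frequent

T = [frozenset(t) for t in [{1, 2, 3}, {1, 2}, {2, 3}, {2, 4}]]
result = apriori(T, minsup=0.5)
# {4} is dropped at level 1 (sup 0.25); {1,3} is dropped at level 2 (sup 0.25);
# {1,2,3} is pruned before counting because its subset {1,3} is not frequent.
```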

Step 2: generating rules from frequent itemsets
One more step is needed to generate association rules.
The algorithm generates association rules from the frequent itemsets, where each rule consists of an antecedent A (a set of items) and a consequent B (a set of items). The confidence of each rule is calculated as the support of the combined antecedent and consequent divided by the support of the antecedent alone.
- A → B is an association rule if confidence(A → B) ≥ minconf, where:
  - support(A → B) = support(A ∪ B)
  - confidence(A → B) = support(A ∪ B) / support(A)
Generating rules, example:
Suppose {2, 3, 4} is frequent, with sup = 50%.
- Its proper nonempty subsets are {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%, 75%, 75%, 75% respectively.
These generate the following association rules (each with sup = 50%):
- {2,3} → {4}, conf = 50% / 50% = 100%
- {2,4} → {3}, conf = 50% / 50% = 100%
- {3,4} → {2}, conf = 50% / 75% ≈ 67%
- {2} → {3,4}, conf = 50% / 75% ≈ 67%
- {3} → {2,4}, conf = 50% / 75% ≈ 67%
- {4} → {2,3}, conf = 50% / 75% ≈ 67%
Only the rules whose confidence meets minconf are kept.



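Rule generation from a single frequent itemset can be sketched as below, plugging in the support values from the {2, 3, 4} example and a minconf of 80%:

```python
from itertools import combinations

def rules_from_itemset(itemset, sup, minconf):
    """Generate rules X -> (itemset minus X) whose confidence meets minconf.

    `sup` maps itemset and each of its nonempty subsets to a relative support."""
    itemset = frozenset(itemset)
    out = []
    for r in range(1, len(itemset)):                 # antecedent sizes 1 .. k-1
        for X in map(frozenset, combinations(itemset, r)):
            conf = sup[itemset] / sup[X]             # conf(X -> Y) = sup(X u Y) / sup(X)
            if conf >= minconf:
                out.append((set(X), set(itemset - X), conf))
    return out

# Support values from the {2, 3, 4} example (sup({2,3,4}) = 50%).
sup = {frozenset(s): v for s, v in [
    ((2, 3, 4), 0.50), ((2, 3), 0.50), ((2, 4), 0.50), ((3, 4), 0.75),
    ((2,), 0.75), ((3,), 0.75), ((4,), 0.75),
]}

rules = rules_from_itemset({2, 3, 4}, sup, minconf=0.8)
for X, Y, conf in rules:
    print(X, "->", Y, f"conf={conf:.0%}")  # only {2,3}->{4} and {2,4}->{3} pass
```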

The apriori algorithm is a very computationally expensive algorithm.
Clearly the space of all association rules is exponential, O(3^m), where m is the number of items in I. The mining exploits the sparseness of the data, and high minimum support and minimum confidence values. Still, it typically produces a huge number of rules: thousands, tens of thousands, millions, …
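The O(3^m) size of the rule space can be checked by brute force: for each itemset of size ≥ 2, count every way to split it into a nonempty antecedent X and consequent Y. The closed form 3^m − 2^(m+1) + 1 used in the assertion is a standard identity for this count:

```python
from itertools import combinations

def count_rules(m):
    """Count all association rules X -> Y over m items, where X and Y are
    nonempty, disjoint subsets of the items."""
    total = 0
    for k in range(2, m + 1):            # k = size of X union Y
        for s in combinations(range(m), k):
            total += 2 ** k - 2          # nonempty proper subsets of s serve as X
    return total

for m in range(1, 6):
    # Matches the closed form 3^m - 2^(m+1) + 1, which grows as O(3^m).
    assert count_rules(m) == 3 ** m - 2 ** (m + 1) + 1
    print(m, count_rules(m))
```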

Itemsets → unordered collections of items, events, etc.
Sequential patterns → items, elements, events, etc. are ordered.



Transactional database vs. sequential database

Author: sophiedekkers54 (Tilburg University)