Lecture 1: Introduction
Introduction
Types of customer data
● Purchase data → e.g., everyone uses Google but we don’t pay them directly. Google earns
money from ads placed on their websites. They also make money by collecting user data and
using it for various purposes.
● Click logs → Data collected about (potential) customers or visitors’ interactions with their
websites, such as clicks, page visits, navigation patterns.
● Trajectories (GPS data) → Data collected via GPS, to detect patterns in people’s behaviours,
e.g., suggesting where you might go next or what is nearby.
● Opinions (ratings, reviews, etc.) → e.g., films, hotels, restaurants.
● Bank transactions
● Demographic data → Data about people, e.g., age, gender, location.
What can we do with customer data?
● Classification → e.g., should a bank give a loan to a particular customer? The decision is
based on historical data from similar customers. The bank estimates the risk using the data
you provided and compares it to patterns in their full customer database.
● Clustering → e.g., finding subgroups of customers to organise targeted marketing campaigns.
This is the basis of personalised marketing: companies segment their customers and target
each group with tailored messages or offers based on shared characteristics or interests.
● Recommender systems → e.g., what product might a customer want to buy next? This is a
popular topic in the field of customer data analysis. Recommendations can be based on:
- Item similarity = recommending items similar to those you’ve interacted with.
- User similarity = recommending items liked by users who are similar to you.
● Next event prediction → e.g., pre-fetching web pages based on expected clicks. Companies
try to predict what you’ll do next (e.g., buy, browse, or leave) and can act on that prediction,
for example by offering a discount to keep you from leaving.
● And much more.
This course: focus on pattern mining
● Patterns are interesting:
- People who buy diapers also buy beer
- People who like “Lord of the Rings” also like “Harry Potter”
- People read domestic news before international news
● Patterns are also useful in many other applications:
- Classify/cluster customers based on common patterns in their data
- Recommend items to customers based on patterns in their purchase behaviour (and
patterns in similar customers’ purchase behaviour)
- Find anomalies in bank transactions (potential fraud). Patterns are the normal
characteristics of data, and anomalies are deviations from these known patterns. So, if
a transaction doesn’t follow typical patterns, it might be flagged as suspicious.
- Place beer close to diapers on supermarket shelves
Frequent itemsets & association rules
Designed for categorical data (e.g., product IDs, user IDs); not suited for numerical data. The idea
began with pattern mining in purchase data, especially in supermarkets. The goal was to identify not
just which products are frequently bought together, but also whether buying one product influences
the purchase of another. This leads to association rules, which describe how one itemset (e.g.,
bread) might imply the presence of another (e.g., milk). Association rule mining involves two steps:
1. Mining frequent itemsets (products that often appear together)
2. Generating association rules from these itemsets.
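The two steps can be sketched as a brute-force procedure. This is only an illustration under stated assumptions: the thresholds `min_support` and `min_confidence`, the function names, and the toy data are hypothetical and not part of the lecture, and enumerating every candidate itemset only works on tiny examples. It relies on the support and confidence measures introduced just below.

```python
from itertools import combinations

def frequent_itemsets(T, min_support):
    """Step 1 (brute force): keep every itemset whose support reaches min_support."""
    items = sorted(set().union(*T))
    frequent = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            X = frozenset(candidate)
            # Support = fraction of transactions that contain all items of X.
            sup = sum(1 for t in T if X <= t) / len(T)
            if sup >= min_support:
                frequent[X] = sup
    return frequent

def generate_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into X -> Y and keep confident rules."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                X = frozenset(lhs)
                Y = itemset - X
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[X] is always available here.
                conf = sup / frequent[X]
                if conf >= min_confidence:
                    rules.append((X, Y, sup, conf))
    return rules

# Hypothetical toy data, made up for illustration.
T = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
    {"eggs"},
]
freq = frequent_itemsets(T, min_support=0.4)
rules = generate_rules(freq, min_confidence=0.6)
for X, Y, sup, conf in rules:
    print(sorted(X), "->", sorted(Y), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Separating the two steps means the expensive counting pass over the transactions happens only in step 1; step 2 works purely on the stored support values.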
Patterns are evaluated using two different measures:
● Support: how often the pattern appears in the dataset.
● Confidence: how often the second itemset appears in transactions that contain the first.
● I = {i1, i2, …, im}: a set of items → e.g., all products sold in a supermarket.
● Transaction t: t is a set of items, meaning t ⊆ I. → This means every item in a transaction is
also part of the overall item set I. A transaction represents a list of items that one customer
bought in a single visit, i.e., items purchased together.
● Transaction Database T = {t1, t2, …, tn}: a set of transactions → e.g., the entire purchase
history recorded by the supermarket. It’s a simplified model:
- Repeated purchases of the same item are only recorded once.
- Prices of items are not included, only the presence/absence of items matters.
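As a minimal sketch of this model, the transaction database can be represented as a list of sets. The variable names and the toy purchase log are assumptions made for illustration, not part of the lecture.

```python
# Hypothetical raw purchase log: one list of scanned products per visit.
raw_visits = [
    ["bread", "milk", "milk"],   # milk was scanned twice
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
]

# A transaction is a set, so repeated items collapse into one entry
# and no prices or quantities are kept.
T = [frozenset(visit) for visit in raw_visits]

# I is the set of all items; here it is simply derived from the data.
I = frozenset().union(*T)

# Every transaction is a subset of I.
for t in T:
    assert t <= I

print(sorted(T[0]))  # ['bread', 'milk']
```

Using sets rather than lists makes the two simplifications explicit: duplicates disappear automatically, and only presence or absence of an item is recorded.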
Though originally developed for market basket analysis, these models are widely applicable, including
domains like text analysis, where each transaction is a document and each item is a word. This
illustrates how association rule mining extends beyond customer data analysis.
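The text-analysis mapping can be sketched the same way: each document becomes one transaction, and each distinct word becomes an item. The two example documents and the whitespace tokenisation are assumptions made for illustration only.

```python
# Hypothetical mini corpus; splitting on whitespace is a deliberate simplification.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Each document is a transaction: the set of distinct words it contains.
doc_transactions = [frozenset(doc.lower().split()) for doc in documents]

# The item set I is the vocabulary across all documents.
vocabulary = frozenset().union(*doc_transactions)

print(sorted(doc_transactions[0]))  # ['cat', 'mat', 'on', 'sat', 'the']
```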
● The model: rules
- I is the full set of all items (e.g., everything sold in a supermarket).
- A transaction (t) is a set of items bought together (e.g., {milk, bread}).
- A transaction t contains an itemset X if all items in X are also in t.
- Written as: X ⊆ t
- For example: if t = {milk, bread, eggs}, then t contains X = {milk, bread}.
● An association rule expresses a relationship between two itemsets.
- Written as: X → Y
- X and Y are both itemsets (i.e., subsets of I).
- X ∩ Y = ∅ → This means X and Y should not overlap.
- For example: if X = {bread} and Y = {milk}, the rule {bread} → {milk} means: if
someone buys bread, they are likely to buy milk too.
● Association rule mining helps us understand how products are often bought together by
analysing transactions. We first identify frequent itemsets (sets of items that appear often
together in transactions). Then, generate rules that describe likely co-purchases (statements
about how the presence of certain items implies the presence of others).
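The containment test X ⊆ t and the disjointness requirement X ∩ Y = ∅ translate directly into set operations. In the sketch below, the helper names `contains` and `make_rule` are hypothetical and chosen for illustration.

```python
def contains(t, X):
    """Return True if transaction t contains itemset X, i.e. X is a subset of t."""
    return set(X) <= set(t)

def make_rule(X, Y):
    """Build a rule X -> Y as a pair of frozensets, rejecting overlapping sides."""
    X, Y = frozenset(X), frozenset(Y)
    if X & Y:
        raise ValueError("X and Y must be disjoint")
    return X, Y

t = {"milk", "bread", "eggs"}
print(contains(t, {"milk", "bread"}))   # True
print(contains(t, {"milk", "butter"}))  # False

# The rule {bread} -> {milk} from the example above.
rule = make_rule({"bread"}, {"milk"})
```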
● Support measures how often a particular itemset or rule occurs in your dataset.
- The rule holds with support (sup) if sup% of transactions contain X ∪ Y (X union Y).
- Think of it as: if you pick a random transaction, what’s the chance it contains both X
and Y? → e.g., {milk, bread} likely has a higher support than {caviar, champagne}.
● Confidence measures how often Y appears in transactions that already contain X.
- The rule holds with confidence (conf) if conf% of transactions that contain X also
contain Y.
- This is a conditional probability: what’s the chance you see Y, given that X is present?
→ e.g., out of all bread purchases, how many also include milk?
● An association rule is a pattern: when X occurs, Y occurs with a certain probability. The
higher the probability, the stronger the rule.
● Support count of an itemset X (X.count) is the number of transactions in dataset T that
contain X. It’s the absolute version of support. → e.g., if 10 out of 100 transactions include
{bread, milk} the support count of {bread, milk} is 10.
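The three measures above can be computed in a few lines. This is a sketch on made-up data: the function names and the toy transactions are assumptions, and returning 0.0 when X never occurs is an illustrative choice, since confidence is undefined in that case.

```python
def support_count(X, T):
    """Number of transactions in T that contain every item of X."""
    X = set(X)
    return sum(1 for t in T if X <= t)

def support(X, Y, T):
    """Fraction of transactions that contain X ∪ Y."""
    return support_count(set(X) | set(Y), T) / len(T)

def confidence(X, Y, T):
    """Fraction of the transactions containing X that also contain Y."""
    x_count = support_count(X, T)
    if x_count == 0:
        return 0.0  # assumption: treat the undefined case as 0
    return support_count(set(X) | set(Y), T) / x_count

# Hypothetical toy database of five transactions.
T = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk"},
    {"eggs"},
]

print(support_count({"bread", "milk"}, T))           # 2
print(support({"bread"}, {"milk"}, T))               # 0.4
print(round(confidence({"bread"}, {"milk"}, T), 2))  # 0.67
```

Here {bread, milk} appears in 2 of 5 transactions (support 40%), and 2 of the 3 transactions containing bread also contain milk (confidence about 67%).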