DATA MINING EXAM Q&A
User-based collaborative filtering - Answer-1) Starts with a user
2) Finds users who have purchased a similar set of items or ranked items in a similar
fashion
3) Makes recommendations to the initial user based on what the similar users
purchase or like
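The three steps above can be sketched roughly as follows. This is only an illustration, assuming cosine similarity over co-rated items and a tiny made-up data set; the names `ratings`, `cosine`, and `recommend` are hypothetical and not from the course material.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 4},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(target, ratings, k=1):
    """Step 1: start with a user. Step 2: find the k most similar users.
    Step 3: suggest items those users rated that the target has not."""
    others = [(cosine(ratings[target], r), name)
              for name, r in ratings.items() if name != target]
    neighbors = [name for _, name in sorted(others, reverse=True)[:k]]
    seen = set(ratings[target])
    scores = {}
    for name in neighbors:
        for item, rating in ratings[name].items():
            if item not in seen:
                scores[item] = max(scores.get(item, 0), rating)
    return sorted(scores, key=scores.get, reverse=True)

top_items = recommend("ann", ratings)
```

Here "bob" rates items A and B much like "ann" does, so the item only bob has rated is what gets suggested to ann.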
Item-based collaborative filtering - Answer-1) Starts with an item being considered by
a user
2) Locates other items that tend to be co-purchased with that first item
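A minimal sketch of the item-based idea, assuming simple co-purchase counts as the measure of "tends to be bought together". The baskets and the helper `co_purchased` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per customer
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

# Count how often each ordered pair of items appears in the same basket
co_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def co_purchased(item, n=2):
    """Return up to n items most often bought together with `item`."""
    pairs = [(count, other) for (first, other), count in co_counts.items()
             if first == item]
    # Sort by count descending, then by name for a stable order
    pairs.sort(key=lambda p: (-p[0], p[1]))
    return [other for _, other in pairs[:n]]

similar_to_bread = co_purchased("bread")
```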
Ratings Matrix - Answer-rows=users
columns=items (each cell holds that user's rating of that item)
Usually mostly empty (sparse) because every rating requires input from the customer
Out of 100 items a customer may only rate 10
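To make the sparsity concrete, here is a small sketch that builds a ratings matrix from known ratings and measures how empty it is. It assumes the usual layout of users as rows and items as columns; the data and variable names are hypothetical.

```python
# Hypothetical known ratings as (user, item, rating) triples
known = [("u1", "i1", 4), ("u1", "i3", 5), ("u2", "i2", 3), ("u3", "i1", 2)]

users = sorted({u for u, _, _ in known})
items = sorted({i for _, i, _ in known})

# Rows are users, columns are items; None marks an unrated cell
matrix = {u: {i: None for i in items} for u in users}
for u, i, r in known:
    matrix[u][i] = r

# Share of cells with no rating
filled = sum(r is not None for row in matrix.values() for r in row.values())
sparsity = 1 - filled / (len(users) * len(items))
```

In the card's own example, a customer who rates 10 of 100 items leaves 90% of their row blank.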
Process of building a ratings matrix - Answer-1) Assemble known ratings
2) Predict unknown ratings
3) Prediction evaluation
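The three steps can be sketched end to end. As an assumption for illustration only, the predictor below is a deliberately simple item-mean baseline (not a method named in these notes) and the evaluation metric is RMSE on held-out ratings; all data is made up.

```python
from math import sqrt

# 1) Assemble known ratings (hypothetical), holding some out for evaluation
train = [("u1", "i1", 4), ("u2", "i1", 2), ("u1", "i2", 5), ("u3", "i2", 3)]
held_out = [("u3", "i1", 4), ("u2", "i2", 4)]

# 2) Predict unknown ratings with a simple baseline: the item's mean rating
global_mean = sum(r for _, _, r in train) / len(train)
item_ratings = {}
for _, item, r in train:
    item_ratings.setdefault(item, []).append(r)

def predict(user, item):
    """Item mean if the item has ratings, otherwise the global mean."""
    rs = item_ratings.get(item)
    return sum(rs) / len(rs) if rs else global_mean

# 3) Evaluate predictions against the held-out ratings using RMSE
errors = [(predict(u, i) - r) ** 2 for u, i, r in held_out]
rmse = sqrt(sum(errors) / len(errors))
```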
What is the objective of a recommendation engine? - Answer-To fill the blank cells of
the ratings matrix with a predicted rating for each user-item pair that the user
presumably hasn't experienced yet.
What is Collaborative filtering based on? - Answer-The idea that a user will prefer an
item if it is recommended by their like-minded friends
Distinguishing feature of Collaborative filtering - Answer-The algorithm considers
only the ratings matrix -- the past user-item interactions
It is general, so it works for any item category as long as ratings are available
Approaches for Collaborative filtering - Answer-1) Neighborhood method
-K-Nearest Neighbor classification
2) Latent factor method - explains the ratings through a set of dimensions called
latent factors
-Matrix factorization
Matrix factorization - Answer-A latent factor method
Both users and items are mapped to a common set of self-extracted factors
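A rough sketch of how users and items can be mapped to a shared set of latent factors, assuming stochastic gradient descent on squared error. The number of factors, learning rate, regularization strength, and data are all assumed values chosen for illustration.

```python
import random

random.seed(0)

# Hypothetical known ratings as (user index, item index, rating)
known = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
n_users, n_items, n_factors = 3, 3, 2

# Each user and each item gets a vector in the same latent-factor space
P = [[random.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[random.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    """Predicted rating is the dot product of user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

def mse():
    """Mean squared error over the known ratings."""
    return sum((r - predict(u, i)) ** 2 for u, i, r in known) / len(known)

initial_error = mse()
lr, reg = 0.02, 0.01  # assumed learning rate and regularization strength
for _ in range(2000):
    for u, i, r in known:
        err = r - predict(u, i)
        for f in range(n_factors):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)
final_error = mse()

# A blank cell of the ratings matrix can now be scored, e.g. user 0 on item 2
estimate = predict(0, 2)
```

The factors are "self-extracted" in the sense that nothing labels them in advance; they are whatever values reduce the error on the known ratings.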
Can you do training and scoring separately with recommendations? - Answer-No.
What is a difficult part of recommendation? - Answer-User-level preference data is very
sensitive (personal and private).
Users implicitly trust companies to safeguard their data by agreeing to terms and
conditions
Clustering - Answer-Process of finding meaningful grouping in a dataset.
Group individuals by similarity
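As one possible illustration of grouping by similarity, here is a plain k-means sketch. k-means is an assumption here (these notes do not name a specific clustering algorithm), as are the data, the distance measure, and the starting centroids.

```python
# Hypothetical 2-D customer data: (annual spend, visits per month)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def dist2(a, b):
    """Squared Euclidean distance, used here as the dissimilarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then move each centroid
    to the mean of its assigned points; repeat a fixed number of times."""
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Assumed starting centroids: the first and last data points
labels, centroids = kmeans(points, [points[0], points[-1]])
```

No target variable is involved: the labels come only from which points sit close together.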
Is clustering about predicting a target class variable? - Answer-No, it's meant to
simply capture the possible natural groupings in data.
Why is clustering useful? - Answer-It can be used to explore whether natural groups exist.
It helps answer questions about those groups or what to offer them.
Classification vs. Clustering - Answer-Classification - Supervised learning. Does the
data belong to a known group?
Clustering - Unsupervised learning. Process of dividing
Applications of Clustering - Answer-1) Marketing - discover groups of customers
2) Land use - identify similar land
3) Insurance - identify classes of policyholders
How are association rules and basket analysis used? - Answer-As an exploratory
tool to mine a limited number of common rules that can then be analyzed by a
human.
Can be used for building recommendations.
What is the goal of Association Rules? - Answer--Identify co-occurring item sets in
transaction-type databases.
-"What goes with what?"
What are the stages of Association rules? - Answer-1) Generate a list of meaningful
rules
2) Filter the list of rules based on interest criteria
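The two stages can be sketched as follows, assuming support and confidence as the interest criteria and restricting rules to a single item on each side for brevity. The transactions and thresholds are hypothetical.

```python
from itertools import permutations

# Hypothetical transactions
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

# Stage 1: generate candidate single-item rules {A} --> {B}
items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    sup = support({a, b})
    if sup > 0:
        # confidence = support(A and B) / support(A)
        rules.append((a, b, sup, sup / support({a})))

# Stage 2: filter by interest criteria (thresholds are assumed values)
min_support, min_confidence = 0.4, 0.7
strong = [(a, b) for a, b, sup, conf in rules
          if sup >= min_support and conf >= min_confidence]
```

Only the bread/butter rules survive both thresholds here; a rule such as {jam} --> {bread} has high confidence but too little support.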
What are popular algorithms for generating association rules? -
Answer-1) Apriori
2) FP-Growth
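For orientation, a compact sketch of the level-wise idea behind Apriori: frequent itemsets of size k are built only from frequent itemsets of size k-1. This is a simplified illustration with hypothetical data and an assumed threshold, not a production implementation, and FP-Growth is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    singles = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in singles if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transactions and threshold
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
]
frequent = apriori(transactions, min_support=0.5)
```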
Pivoted data - Answer-A list of sessions, each with the combination of categories seen
in that session, can be transformed into a binary (0/1) form for processing
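A small sketch of that transformation, assuming the input is a log of (session, category) pairs. The data and names are hypothetical.

```python
# Hypothetical session log: (session id, category viewed)
log = [("s1", "shoes"), ("s1", "socks"), ("s2", "hats"),
       ("s2", "shoes"), ("s3", "socks")]

sessions = sorted({s for s, _ in log})
categories = sorted({c for _, c in log})
seen = set(log)

# One row per session, one 0/1 column per category
pivoted = {s: [int((s, c) in seen) for c in categories] for s in sessions}
```

Each row now says which categories were present in a session, which is the shape rule-mining algorithms typically expect.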
Association Rule terminology - Answer-If "item A" then "item B"
antecedent (premise) --> Consequent (conclusion)
{item A} --> {item B}