Essay

Data Mining Module

Pages: 30
Grade: A+
Uploaded on: 07-10-2022
Written in: 2022/2023

First Class data mining Essay












Data Mining



Organics Assignment




Table of contents

Summary
Business Problem
Data Mining Representation
Methodology - data mining approach
Data Exploration
Data partition creation of model sets
Data Modelling
Regression
Neural Network
  Development of models
  Model performance
  Neural network architecture of best model
Decision Tree
  Development of models
  Performance of models
  Overfitting and limitations
Analysis of the best model
Conclusion
Recommendation
References and Bibliography
  References
  Other Resources
Appendix
My Reflections on the Patchwork assignment




IMAT Data Mining Organics

Summary
This report presents the results of using the SAS data mining framework to build
models that will help the management of a supermarket concentrate their resources
on the customers who are most likely to purchase organic products. Three
binary classification models were generated using regression analysis, decision
trees and neural networks. Age and AFFL were identified as the most important
predictors. All three models were predictive and performed at a satisfactory
level. The decision tree was chosen as the champion model on the basis of its
performance and the technique's ability to provide a non-technical explanation
of the model. It achieved a lift of 2.97, i.e. it identified organic buyers at
2.97 times the rate achieved by selecting customers at random. Several
recommendations are made for improving data quality in the next cycle of data
mining.
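Lift compares the rate of organic buyers among the customers a model ranks highest with the rate across all customers: lift at depth d = (buyers in the top d / customers in the top d) / (all buyers / all customers). The Python below is a minimal sketch of that arithmetic, assuming the scores are predicted probabilities of buying organic products and that lift is measured cumulatively at a chosen depth (fraction of customers targeted). The function `cumulative_lift` and the toy scores and labels are hypothetical; this is not the SAS implementation.

```python
def cumulative_lift(scores, labels, depth):
    """Return the cumulative lift when targeting the top `depth` fraction of customers.

    scores: predicted probabilities of buying organic products (higher = more likely)
    labels: actual outcomes, 1 for an organic buyer and 0 otherwise
    depth:  fraction of customers targeted, in (0, 1]
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")

    # Response rate you would expect from selecting customers at random.
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no buyers")

    # Rank customers from highest to lowest score and keep the top `depth` fraction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(depth * len(ranked)))
    selected_rate = sum(label for _, label in ranked[:n_selected]) / n_selected

    return selected_rate / base_rate


# Hypothetical toy data: ten customers, three of whom bought organic products.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for d in (0.2, 0.5, 1.0):
    print(f"depth {d:.0%}: lift {cumulative_lift(example_scores, example_labels, d):.2f}")
```

With a 30% base rate, the top 20% of this toy ranking contains two buyers out of two (a 100% response rate), giving a lift of about 3.33; targeting everyone always gives a lift of 1.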




Business Problem
The business problem is to identify the customers who are most likely to purchase
organic products in the supermarket. Data models will be built from the data set
(organics.xls) collected during the supermarket's incentive period. By identifying
customers who are likely to purchase organic products, the company will be able to
target its marketing efforts more effectively, which should result in more sales per
unit of marketing spend.


Data Mining Representation
The business problem, identifying customers who are likely to buy organic
products, is a type of data mining representation known as a classification problem.
The most suitable target variable is ORGYN, which is a binary variable.
The remaining variables are given the role INPUT and are assigned the default
measurement levels, with the exception of AFFL, which has been changed from
interval to ordinal.
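The role and level assignments amount to a small metadata table. The Python below is a minimal sketch of that table under stated assumptions: `VariableMeta` and `assign_metadata` are hypothetical names, the level strings only mimic SAS-style labels, and AGE is an assumed column name; it is not how SAS stores metadata. Marking AFFL as ordinal tells the modelling tools that its values are ordered categories rather than a continuous measurement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VariableMeta:
    role: str   # modelling role, e.g. "TARGET" or "INPUT"
    level: str  # measurement level, e.g. "BINARY", "INTERVAL", "ORDINAL", "NOMINAL"


def assign_metadata(default_levels: dict[str, str]) -> dict[str, VariableMeta]:
    """Apply the role and level choices for the organics data to every column.

    default_levels maps each column name to the measurement level a tool
    assigned by default. ORGYN becomes the binary target, AFFL is overridden
    from interval to ordinal, and every other column is an INPUT that keeps
    its default level.
    """
    if "ORGYN" not in default_levels:
        raise KeyError("target column ORGYN is missing")

    metadata = {}
    for name, level in default_levels.items():
        if name == "ORGYN":
            metadata[name] = VariableMeta(role="TARGET", level="BINARY")
        elif name == "AFFL":
            metadata[name] = VariableMeta(role="INPUT", level="ORDINAL")
        else:
            metadata[name] = VariableMeta(role="INPUT", level=level)
    return metadata


# Hypothetical defaults for three columns; AGE is an assumed column name.
example_defaults = {"ORGYN": "BINARY", "AFFL": "INTERVAL", "AGE": "INTERVAL"}

for column, meta in assign_metadata(example_defaults).items():
    print(f"{column:6} role={meta.role:7} level={meta.level}")
```

Keeping every override in one function makes the two deliberate deviations from the defaults (the target and AFFL) easy to audit.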


Methodology - data mining approach

The process adopted is the first pass through the virtuous cycle of data mining,
which has four distinct steps:

- Identify the business problem or opportunity.
- Mine the data to transform it into actionable information.
- Act on the information. This is outside the remit of the brief (a marketing
  initiative driven by the model, e.g. a targeted marketing offer).
- Measure the results of the marketing initiative. This is also outside the remit
  of the brief (e.g. measuring the profitability of a pilot marketing study that
  uses the model).



Seller: marcinsielewicz, Cambridge University