Summary 2024
Written by Isabel Rutten (November 2024, Leiden University)
Competing on Analytics – Davenport
Analytics competitor: a company that uses sophisticated data-collection technology and
analysis to get value from all its business processes. To become one:
- Change to an analytics-focused company, by supporting changes in culture,
processes, and skills
- Place all data-collection and analysis activities under a common leadership, with
common technologies and tools to facilitate data-sharing
- Focus on analytics initiatives that most directly serve your overarching competitive
strategy
- Establish an analytics culture, i.e. respect for measuring, testing and evaluating
quantitative evidence
- Hire analysts with top-notch quantitative analysis skills who can also communicate their results
- Use the right technology, such as CRM or ERP
Lecture 1 – Business Understanding
Business analytics: the use of analytical modeling and numerical analysis, including explanatory
and predictive modeling, and fact-based management to drive decision-making.
Features: task (provide decision support for specific defined goals), foundation (relies on
empirical information), realization (as a system using actual info and communication
capabilities), delivery (at the right time to the right people in an appropriate form).
Examples are in manufacturing (frequency of errors), marketing and sales (most profitable
customer segments), and professional services (will business progress achieve the objectives?).
Data science: interdisciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge from structured and unstructured data.
Machine learning: set of methods that learn a specific task from data, closely linked to
statistics and optimization; algorithms whose performance improves as they are exposed to
more data over time. Methods include linear regression, decision trees, neural networks, and
k-means clustering. ML is relevant because it can handle models with millions of parameters.
Nowadays: pre-trained models are available, the cloud offers unlimited computation power,
no coding is required, more data is collected than ever through digitalization and automation.
When to use: the problem is related to data, too complex for coding, constantly changing,
a perceptive problem, an unstudied phenomenon, or has a simple objective. When not: it needs a
complete explanation, the cost of error is high, the right data is hard to get, or the
operational environment cannot handle it.
Drivers of cost: complexity, data, accuracy (cost of a wrong prediction). Lessons: know the
unknowns such as quality, quantity, relevant features, model size, etc.; progress on model error
is fast at first and slows down later; simplify the problem first and solve subproblems.
Artificial intelligence: a program that can sense, reason, act, and adapt; a broad definition of
intelligence displayed by machines, with various subfields for specific tasks, e.g. reasoning,
knowledge representation, planning, learning.
Isabel Rutten. Machine Learning for Business Analytics. Leiden University. November 2024.
Neural networks: subset of machine learning based on connected neurons approximated
by simple functions; capable of dealing with complex ML tasks but in need of a large
amount of data. Examples: computer vision, natural language processing.
Data science pyramid of needs, from bottom to top: collect (e.g. logging), move/store (e.g.
pipelines), explore/transform (e.g. cleaning), aggregate/label (e.g. analytics), learn/optimize
(e.g. A/B testing), AI/deep learning.
Linear regression: the task of finding a linear relationship between x and y observations,
allowing predictions of future outputs for unseen inputs. The data is a set of
measurements including inputs and outputs. During training, the parameters of the model are
searched such that the model has the lowest error. During testing, the trained model with fixed
parameters is evaluated on representative unseen data to estimate its performance. During
deployment, the model with fixed parameters is run in the final software.
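The three phases of linear regression can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the measurements are made up (roughly y = 2x + 1 with small fixed noise), the error is taken to be the mean squared error, and the parameters are found with the closed-form least-squares solution rather than an iterative search.

```python
def train(xs, ys):
    """Training: find the slope and intercept with the lowest squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(params, x):
    """Apply the fixed parameters to one input."""
    slope, intercept = params
    return slope * x + intercept


def mean_squared_error(params, xs, ys):
    """Average squared difference between predictions and observed outputs."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical measurements (assumption): roughly y = 2x + 1 with fixed noise.
xs = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

# Hold out a few points spread over the input range as unseen test data.
test_idx = {1, 4, 7}
x_train = [x for i, x in enumerate(xs) if i not in test_idx]
y_train = [y for i, y in enumerate(ys) if i not in test_idx]
x_test = [x for i, x in enumerate(xs) if i in test_idx]
y_test = [y for i, y in enumerate(ys) if i in test_idx]

params = train(x_train, y_train)                          # training
test_error = mean_squared_error(params, x_test, y_test)   # testing
new_prediction = predict(params, 12.0)                    # deployment: fixed parameters

print("slope, intercept:", params)
print("test MSE:", test_error)
print("prediction for x = 12:", new_prediction)
```

The parameters are only changed inside `train`; testing and deployment both reuse them unchanged, which mirrors the "fixed parameters" wording above.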
ML template: projects fail due to a misunderstanding of objectives between business and
analyst, so use an ML template to structure the idea of the project, document the approach for
all participants and provide a starting point for the data scientists. The template includes:
task (business problem from the perspective of the business), data (data used to solve the task),
method (high-level ML approach from the data scientist's perspective), evaluation (how the
solution to the task is evaluated and how success is defined), hypotheses (statements that use the
data and method and are accepted or rejected; each describes how to achieve a goal stated in the
task, each should focus on a single improvement, start simple, and have acceptance criteria, e.g.
"Using [data] with [method] we will ensure [evaluation] to be [value]"). See figure 1 for an example.
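One way to make the template concrete is to record it as a small data structure. The sketch below is a hypothetical illustration: the class names, field names and the churn example are assumptions and are not the example from figure 1. Only the hypothesis sentence pattern is taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One testable statement with a single acceptance criterion."""
    data: str
    method: str
    evaluation: str
    value: str

    def statement(self) -> str:
        # Pattern: Using [data] with [method] we will ensure [evaluation] to be [value]
        return (f"Using {self.data} with {self.method} "
                f"we will ensure {self.evaluation} to be {self.value}.")


@dataclass
class MLTemplate:
    """The template sections shared between business and data scientists."""
    task: str          # business problem, from the business perspective
    data: str          # data used to solve the task
    method: str        # high-level ML approach
    evaluation: str    # how the solution is evaluated and success is defined
    hypotheses: list = field(default_factory=list)


# Hypothetical example values (assumption, for illustration only).
hypothesis = Hypothesis(
    data="12 months of purchase history",
    method="a decision tree classifier",
    evaluation="recall on churned customers",
    value="at least 70%",
)
template = MLTemplate(
    task="Identify customers who are likely to cancel their subscription",
    data="12 months of purchase history",
    method="Binary classification",
    evaluation="Recall on churned customers in a held-out month",
    hypotheses=[hypothesis],
)

print(template.task)
for h in template.hypotheses:
    print(h.statement())
```

Keeping each hypothesis as a separate object makes it easy to start with one simple hypothesis and add further ones, each focused on a single improvement.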
Typology of tasks/goals: descriptive (summary description e.g. reporting, segmentation,
detect interesting behavior), predictive (predict behavior of instances in business processes
e.g. regression, classification), understanding (support stakeholders in understanding
business processes e.g. process identification, analysis).
Machine learning lifecycles: iterative steps taken to build, deliver and maintain any data-
driven product. Different versions exist and can be adapted to team size, analytics
task/project scope, organization, existing pipelines and tools. Examples:
- KDD (Knowledge Discovery in Databases)
- SEMMA (Sample, Explore, Modify, Model, Assess)
- OSEMN (Obtain, Scrub, Explore, Model, iNterpret)
- TDSP (Team Data Science Process)
- CRISP-DM (Cross-Industry Standard Process for Data Mining): 6 major phases with
back-and-forth movement; the arrows show the most frequent transitions. See figure 2.
- AI Engineering: Model life cycle with emphasis on the model. See figure 3.
Business Intelligence (BI) tools: See figure 4 for the BI tools pyramid.
Tool    Advantages                                               Disadvantages
SPSS    Good UI, easy to start                                   High costs, license system
Python  Scalability, big community                               Not strong in explanatory analysis, need programming skills
R       Free, big community for empirical methods                Can be slow, steep learning curve
LLM     Create code and pictures, automated content creation     Higher competitiveness, unique
Colab   Work on data from browser without need of installations  -