Data Science for Business
Wouter Verbeke
Part 1: Foundations of Data Science
Chapter 1: Data Science for business
i. The digital revolution
Technologies from the past were crucial: they allowed us to do things
more efficiently and to fulfil some basic needs (storing and processing
information, transporting things, exchanging information, …).
More recent technologies focus on integrating information technology
(chips especially). Chips have become small, cheap and very powerful.
NOTE: the industry tends to expect a little too much from a new
technology (a “hype”). Once the hype is over, the trough of
disillusionment follows.
ii. How did we get here?
Computers: once computers were able to process data → need to
collect, store and transfer data → idea to develop a network
technology.
Networks: facilitated the development of the internet.
Internet (1990s): triggered many business ideas (YouTube, Amazon,
…) → need to store a lot of data; traditional databases were no
longer enough.
Big Data: database, data mart, data warehouse, data lake, in the cloud.
Artificial intelligence.
iii. Impact on business?
There were extensive investments in business infrastructure;
companies kept investing in infrastructure
to support or automate virtually every business function
o Manufacturing, supply chain management, marketing,
accounting, HR, …
Every aspect of business is now open to and instrumented for
data collection.
o Initially: data was collected and considered as a by-product
of digitalization
o Now: we know that data makes it possible to improve efficiency,
so data is collected much more intentionally (data science, AI, …)
NOTE: Artificial general intelligence needs a lot of data to
train
iv. Data
i. External
Data = anything you can store as a sequence of bits; data by itself
has no utility
Information = data with utility; data which has been transformed,
processed, … (the processing adds value to the data)
ii. Internal
Knowledge = processed information inside people's heads; internal
to people (vs. information, which is external); further increase of utility
e.g. a book is information; once you study it, it becomes knowledge
Wisdom = even more processed, more utility
iii. Programs and algorithms
A program implements an algorithm. We feed data into the
program to obtain information.
An algorithm defines a finite sequence of steps that transforms an input
into an output.
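As a minimal sketch of this idea, the Python function below implements a simple algorithm (a finite sequence of steps) that turns raw data into information. The transaction records and the function name average_order_value are hypothetical, chosen only for illustration.

```python
def average_order_value(transactions):
    """Algorithm: (1) sum the amounts, (2) count the records, (3) divide."""
    if not transactions:
        raise ValueError("need at least one transaction")
    total = sum(t["amount"] for t in transactions)
    return total / len(transactions)


# Raw data (hypothetical): on its own, a list of records has little utility.
transactions = [
    {"customer": "A", "amount": 20.0},
    {"customer": "B", "amount": 35.0},
    {"customer": "A", "amount": 5.0},
]

# Information: a processed figure that can support a decision.
aov = average_order_value(transactions)
print(f"Average order value: {aov:.2f}")
```

The input is data, the returned number is information; it only becomes knowledge once someone interprets it.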
iv. Types of data
Tabular vs. non-tabular
Structured vs. unstructured
o Structured: tabular, images, … vs. unstructured: text
Real-time vs. historical
Experimental vs. observational
o Observational: gathered under the currently applied policy
Internal vs. external
Big data vs. small data
v. Data science
“Every aspect of business is now open to and instrumented for data
collection”
i. Opportunities
Data opens a world of opportunities BUT collecting data is not a
new idea; what is new is having big data/large databases available.
Data science is about seizing these opportunities
To get to know what you need to know
By analysing data, applying data analytics
Transforming data into information and knowledge
ii. How to use data
How to use data for improving the efficiency of business decision-
making?
Data science for business
Efficiency → bottom line (decrease costs) and top line
(increase revenue)
Decisions → something we control
NOTE: to exploit the business opportunities offered by
technology, a thorough understanding of both business AND
technology is required, in order to align business
needs/opportunities with technological solutions
iii. What is data science
Provost and Fawcett: DS = methods for extracting useful information
and knowledge from data
Data science ≈ Data analytics
Science vs. business
Theory vs. application
Statistics vs. engineering
iv. Methods
Any method that transforms data into information, insights,
knowledge, decisions, …
Any method f that transforms input X into output Y
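A method f in this sense can be as simple as a hand-written rule. The sketch below is a hypothetical example: the feature names (income, monthly_payment) and the "three times the payment" rule are assumptions made for illustration, not part of the course material.

```python
from typing import Dict


def f(x: Dict[str, float]) -> str:
    """Hand-written method f: input X (applicant features) -> output Y (decision)."""
    # Hypothetical rule, chosen for illustration only.
    if x["income"] >= 3 * x["monthly_payment"]:
        return "accept"
    return "reject"


X = {"income": 3000.0, "monthly_payment": 800.0}  # input X
Y = f(X)                                          # output Y
print(Y)
```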
vi. Data science for business
i. Types of decisions
Strategic: high-level, long-term oriented, about business strategy
(which products, which customers, suppliers, …)
Operational: low-level, high frequency (which applicants to
accept, …)
Tactical: connects strategic decision-making to operational
decision-making
ii. Machine learning for business decision-making
We can use ML to put data to work in improving the efficiency of business
decision-making (DM), by applying it to operational decisions (low-level,
high frequency). ML makes it possible to make predictions at the individual
level (not an average estimate, but a prediction for each individual case).
The use of organisational data is a key component of evidence-based
DM, where ML plays an increasingly important role.
iii. What is machine learning?
A (computer) algorithm specifies the steps needed to go from
inputs to outputs in order to solve a problem.
Programming: algorithm → program BUT for complex tasks, it can be
hard to implement an algorithm. Maybe the computer can come up
with an algorithm itself?
The idea of ML is to make computers smart enough to learn f, rather
than having a human program it.
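To make "learning f" concrete, here is a minimal sketch in which the computer picks a decision threshold from labelled examples instead of a human hard-coding it. The simple threshold rule is swapped in purely for illustration (the notes do not name a specific method), and the data and names (ratios, repaid, learn_threshold) are hypothetical.

```python
def learn_threshold(xs, ys):
    """Learn f(x) = 1 if x >= t else 0 by picking the t with the fewest training errors."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):  # every observed value is a candidate threshold
        errors = sum((x >= t) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical historical data: income-to-payment ratio, and whether
# the loan was repaid (1) or not (0).
ratios = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
repaid = [0, 0, 0, 1, 1, 1, 1]

t = learn_threshold(ratios, repaid)  # the "learning" step


def learned_f(x):
    """The learned method f: predicts repayment for a new applicant."""
    return 1 if x >= t else 0


print(t, learned_f(1.2), learned_f(3.8))
```

The human still chooses the form of f (a threshold rule); only its parameter is learned from the data.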
Different ML methods:
o Different types of tasks
#1 Different types of output Y, or Ŷ
Classification: binary or multiclass (e.g.
ChatGPT)
Regression: like classification, but with a continuous
outcome/target variable (age, income, …)
Forecasting: not a single Y but the evolution of the
output over time (e.g. demand over the next 20
hours)
Survival analysis: probability that an outcome
happens, as a function of time
Spatial analysis: where the event will happen
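The difference between the first two output types can be sketched with one toy method used in two ways. k-nearest neighbours is chosen here only as an illustration (it is not named in the notes), and the customer data and function names are hypothetical.

```python
from collections import Counter


def nearest(train_x, x, k=3):
    """Indices of the k training points closest to x (1-D inputs)."""
    return sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]


def predict_class(train_x, train_y, x, k=3):
    """Classification: the output is a discrete label (majority vote)."""
    votes = Counter(train_y[i] for i in nearest(train_x, x, k))
    return votes.most_common(1)[0][0]


def predict_value(train_x, train_y, x, k=3):
    """Regression: the output is a continuous number (mean of the neighbours)."""
    idx = nearest(train_x, x, k)
    return sum(train_y[i] for i in idx) / len(idx)


# Hypothetical data: years as a customer -> churn label / yearly spend.
years = [1, 2, 3, 6, 7, 8]
churned = ["yes", "yes", "yes", "no", "no", "no"]
spend = [100.0, 120.0, 140.0, 300.0, 320.0, 340.0]

print(predict_class(years, churned, 2.5))  # a label
print(predict_value(years, spend, 7))      # a number
```

The same inputs X are used in both calls; only the type of target Y changes, and with it the type of task.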