2025-2026
This summary is based on the slides covered in the Machine Learning for Business course taught at the
Faculty of Business and Economics at the University of Antwerp.
1: INTRO TO MACHINE LEARNING
WHAT IS…
1. Key concepts and terms
There are a lot of terms, and none of them has a fixed definition. They are often used interchangeably, depending on the
situation and environment. The following definitions serve as a guideline.
1.1 Data science
Data science = the process of extracting insights and knowledge from data using techniques from statistics, computer
science and domain expertise
● the broadest of terms
● used by people with backgrounds in statistics, computer science, mathematics, economics, engineering
or business analytics (~ everyone)
1.2 Data analytics
Data analytics = the process of examining, cleaning, transforming and modeling data to discover useful information and
support decision-making
● the most manual of the “data sciences”
● used by people with backgrounds in business, economics, marketing, finance or statistics
1.3 Data mining
Data mining = the process of discovering patterns, relationships or anomalies in large datasets using statistical and
computational methods
→ e.g. patterns in products that people often buy together
● the oldest of the “data sciences”
● used when data analytics is not doable anymore, but you still make the decisions yourself
● used by people with backgrounds in business intelligence, marketing, statistics or computer science
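The "products that people often buy together" example can be made concrete with a minimal sketch. The baskets below are made up for illustration and the function name `count_pairs` is hypothetical; real data mining tools use more refined measures than raw pair counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data, not from the course).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

def count_pairs(baskets):
    """Count how often each pair of products appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("bread", "butter")
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

pair_counts = count_pairs(baskets)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

The algorithm only surfaces the pattern; deciding what to do with it (for instance, placing the two products next to each other) is still up to a human.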
1.4 Machine learning
Machine learning = the process of building algorithms that can learn from data and make predictions or decisions without
being explicitly programmed
→ e.g. ML guesses a customer’s needs and decides whether the customer needs a human employee to help with info
● the highest value of the “data sciences”
● decisions are made by the algorithm instead of a human
● used by people with backgrounds in computer science, data science, (digital business) engineering,
mathematics or physics (~ more technical and not mainly about business understanding)
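To illustrate "learning from data without being explicitly programmed", here is a minimal sketch in which the rule for handing a customer over to a human employee is not hard-coded but derived from past examples. The data, the single feature (number of chatbot messages) and the names `learn_threshold` and `needs_human` are all assumptions made for this illustration.

```python
# Hypothetical history: (number of chatbot messages, did the customer need a human?)
history = [
    (1, False), (2, False), (3, False), (4, False),
    (5, True), (6, True), (7, True), (9, True),
]

def learn_threshold(examples):
    """Pick the threshold t for which the rule 'x >= t means True' makes the fewest mistakes."""
    candidates = sorted({x for x, _ in examples})
    best_t, best_errors = None, len(examples) + 1
    for t in candidates:
        # Count the examples where the rule disagrees with the recorded outcome.
        errors = sum((x >= t) != label for x, label in examples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

# The "training" step: the threshold comes from the data, not from the programmer.
threshold = learn_threshold(history)

def needs_human(messages):
    """Decision made by the learned rule instead of by a person."""
    return messages >= threshold

print(threshold, needs_human(8), needs_human(2))
```

If the history changes, rerunning `learn_threshold` yields a different rule without anyone rewriting the decision logic.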
1.5 Artificial intelligence
Artificial intelligence = the art of creating systems that can perform specific tasks typically requiring human intelligence,
such as reasoning, learning and perception
→ e.g. an AI chatbot in customer service
● the hippest of the “data sciences”
● used by people with backgrounds in computer science, robotics, cognitive science or philosophy of
mind
Artificial (General) Intelligence = the quest for a theoretical form of AI that can perform any intellectual task a human
can, with general reasoning and learning abilities across domains
● the absolute hippest of the “data sciences”
● not restricted to domain- or task-specific knowledge, but aimed at general capabilities
● most used by Sam Altman, Ilya Sutskever and other AI prophets
1.6 Big data
Big data = extremely large and complex datasets that require advanced tools and infrastructure to store, process and
analyze
● a bit of an outdated term in 2025 (we can store almost anything)
● used by people with backgrounds in computer science, information systems, cloud computing or data
engineering
1.7 Data engineering
Data engineering = the discipline of designing and building systems for collecting, storing and processing (big) data
efficiently and reliably
● underrated, but the number one reason why data science projects succeed or fail
→ even a good model only succeeds if it is implemented well
● used by people with backgrounds in software engineering, computer science or information technology
1.8 Machine learning engineering
Machine learning engineering = the practice of implementing, deploying and maintaining machine learning models in
production environments
● one of the hardest things to do well
● used by people with backgrounds in computer science, software engineering or applied mathematics
The world constantly changes, so tomorrow your model might not be as good at making predictions as it is today. This is a
problem we have only recently been confronted with. Many models failed during COVID because this was a situation that
had never been seen before, so it was not present in the data the models had learned from.
→ e.g. prediction for stock amount in stores: physical stores (closed) had too much and online stores had too little stock
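The stock example above can be sketched with a deliberately naive forecast. All numbers are made up, and the "model" (always predict the average of past weeks) is an assumption chosen for simplicity, not a method from the course.

```python
from statistics import mean

# Hypothetical weekly sales of one product in a physical store (made-up numbers).
sales_before = [100, 104, 98, 102, 96]  # normal weeks, used to build the forecast
sales_normal = [101, 99, 103]           # later weeks in an unchanged world
sales_lockdown = [0, 0, 5]              # later weeks while the store is closed

# "Model": always predict the average of the weeks it has seen.
forecast = mean(sales_before)

def mean_absolute_error(actual, prediction):
    """Average absolute gap between actual sales and a constant prediction."""
    return mean([abs(a - prediction) for a in actual])

error_normal = mean_absolute_error(sales_normal, forecast)
error_lockdown = mean_absolute_error(sales_lockdown, forecast)
print(forecast, error_normal, error_lockdown)
```

The forecast itself has not changed between the two evaluations; only the world has. This is why a model in production has to be monitored and, when needed, retrained.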
1.9 Data governance
Data governance = a set of policies, processes and standards that ensure data is accurate, secure and used responsibly
across an organization
→ e.g. Is it ethical to use people’s ethnicity when building a fraud detection model?
● the most overlooked aspect of data science
● used by people with backgrounds in information systems, law, business administration or data
management
The more we use data on a day-to-day basis in our businesses, the more we need to take into account the risks, the things
that can go wrong and ways to safely store the data.
2. Time to exercise 🐾
pdf ‘1. Introduction to Machine Learning’
slides 34 - 40
DATA TYPES
1. Key concepts
1.1 Structured data
Structured data = data that can be neatly displayed in rows and columns in relational databases
● easy to retrieve and analyze
● typically consists of numbers, dates, strings
● does not require much preprocessing before use in a classifier
EXAMPLE: tabular data in a SQL database or spreadsheet
→ the students’ names are replaced by an identifier (pseudonymisation) to protect their identity
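A minimal sketch of pseudonymisation on tabular data, assuming made-up student records and a hypothetical helper `pseudonymise`. The identifier format (`S001`, `S002`, ...) is an arbitrary choice for this illustration.

```python
# Hypothetical student records (names and grades are made up).
students = [
    {"name": "Alice Peeters", "grade": 14},
    {"name": "Bram Janssens", "grade": 11},
    {"name": "Alice Peeters", "grade": 16},
]

def pseudonymise(records, field="name"):
    """Replace each distinct value of `field` with a stable identifier.

    Returns the new records plus the lookup table. The lookup table must be
    stored separately and securely: whoever holds it can reverse the mapping.
    """
    lookup = {}
    result = []
    for record in records:
        value = record[field]
        if value not in lookup:
            lookup[value] = f"S{len(lookup) + 1:03d}"
        # Copy every column except the identifying one, then add the identifier.
        cleaned = {k: v for k, v in record.items() if k != field}
        cleaned["student_id"] = lookup[value]
        result.append(cleaned)
    return result, lookup

pseudonymised, lookup = pseudonymise(students)
for row in pseudonymised:
    print(row)
```

The same student keeps the same identifier across rows, so the data stays usable for analysis while the names no longer appear in the table.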
1.2 Unstructured data
Unstructured data = data that is primarily meant for humans, not machines
● challenging to retrieve and analyze with machine learning models
● requires preprocessing before it can be used in a classifier
● makes up roughly 80% of the data out there
EXAMPLE: text, video, audio, images …
1.3 Semi-structured data
Semi-structured data = data that is somewhere in between structured and unstructured
→ has some structure, but structure is not rigid
EXAMPLE: HTML, XML, JSON, YAML …
● (L) missing data: some people might not fill in their city
● (L) other countries structure their addresses differently (districts, postal codes …)
● (R) all these things have a meaning, but you have to know the context to be able to understand them
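The points about missing fields and differing address structures can be sketched with a small JSON example. The records, the chosen columns in `FIELDS` and the helper `flatten` are assumptions made for this illustration.

```python
import json

# Hypothetical JSON records: every record has some structure, but not the same one.
raw = """
[
  {"name": "A", "address": {"street": "Voorbeeldstraat 1", "postal_code": "2000", "city": "Antwerpen"}},
  {"name": "B", "address": {"street": "1 Example Road", "district": "Central"}},
  {"name": "C"}
]
"""

# The fixed columns we want in the structured (tabular) version.
FIELDS = ["street", "postal_code", "city"]

def flatten(record):
    """Turn one semi-structured record into a row with fixed columns.

    Missing fields become None; fields outside FIELDS (such as "district") are dropped.
    """
    address = record.get("address", {})
    row = {"name": record["name"]}
    for field in FIELDS:
        row[field] = address.get(field)
    return row

rows = [flatten(record) for record in json.loads(raw)]
for row in rows:
    print(row)
```

Forcing the records into fixed columns makes them easy to analyze, but it also shows the cost: record B loses its district and record C ends up with an empty address.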