Summary: Fundamentals of Data Science
Terminology (i) and (ii)
• Querying and reporting
• You know exactly what you are looking for.
• SQL:
• SELECT * FROM CUSTOMERS WHERE AGE > 45
• OLAP: Online Analytical Processing
• GUI to query large data collections in real-time
• Pre-programmed dimensions of analysis
• Summary level
--> no modeling or pattern finding
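A minimal sketch of the difference between a plain query and an OLAP-style summary, using Python's built-in sqlite3 module. The table contents, the NAME and REGION columns, and the variable names are assumptions made up for illustration; only the CUSTOMERS table and the AGE > 45 filter come from the notes. In both cases we decide up front what to ask, so nothing is modeled or discovered.

```python
import sqlite3

# Hypothetical in-memory table; the rows and extra columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (NAME TEXT, REGION TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [("Ann", "North", 52), ("Bob", "South", 31),
     ("Cleo", "North", 47), ("Dev", "South", 60)],
)

# Querying and reporting: we know exactly which rows we want.
older = conn.execute(
    "SELECT * FROM CUSTOMERS WHERE AGE > 45 ORDER BY NAME"
).fetchall()

# OLAP-style summary: aggregate along a predefined dimension (here REGION).
per_region = conn.execute(
    "SELECT REGION, COUNT(*), AVG(AGE) FROM CUSTOMERS "
    "GROUP BY REGION ORDER BY REGION"
).fetchall()

print(older)
print(per_region)
```

An OLAP tool would typically offer such group-by summaries through a GUI over pre-programmed dimensions; the SQL here only mimics the idea.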
OLAP GUI example
Classic Business Intelligence: You know what you are looking for -> Query/OLAP
• Data Science: “A set of fundamental principles that guide extraction of knowledge
from data”
• Data Mining: “The extraction of knowledge from data, via technologies that
incorporate these principles”
• Big Data: “Data that is so large that traditional data storage and processing systems
are unable to deal with it”
You don’t know what you are looking for, or you want to find new, intricate patterns in
the (big) data -> Data Mining
Terminology (iii): Technologies
no real faces
Applications of what we just saw?
Concerns?
• Modern ML techniques are very good at learning complex patterns in data to solve
certain types of predefined tasks
• Data science harnesses these techniques to solve commercial and business issues to
create value
Data
• At the basis of all of this: data!
• What is data?
-> Raw stream of facts
Data as a strategic asset
• Data can lead to better decision making through data science
• Data -> information/knowledge
• Data is a valuable asset
Which types of decisions to support through data science
• Decisions for which discoveries need to be made
• Usually high impact
• E.g., prediction of demand shocks in times of crisis
• Decisions that repeat, especially at massive scale
• Decision-making can then benefit from even small increases in accuracy
gained through data analysis.
• E.g., credit scoring
Remember!
• Data science: “A set of fundamental principles that guide extraction of knowledge
from data”
• Data mining: “The extraction of knowledge from data, via technologies that
incorporate these principles”
• Important technology: machine learning
• Learns from data
But what is learning?
Learning
• We usually learn a function:
y = f(x)
• f: a mathematical or logical formula
• Can be learned by algorithms that infer f(x) from data, i.e. from examples
• E.g.: f() is a program that identifies cats in video data
• Gets better with more examples -> Remember: this is machine learning
• OR: the mapping of x to y can be hardcoded into the program -> the solution is
not learned
• y = f(x)?
• Looks suspiciously like:
y = a + bx (a and b are learnable parameters)
• But…
• Often more complex
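To make "learning a function from examples" concrete, here is a minimal sketch for the simplest case y = a + bx. It contrasts a hardcoded mapping with one whose parameters a and b are estimated from data by ordinary least squares. The data points and the names fit_line, f_hardcoded and f_learned are hypothetical, and least squares is just one possible learning algorithm, chosen here as an assumption.

```python
def fit_line(xs, ys):
    """Estimate a and b in y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def f_hardcoded(x):
    # Not learned: the programmer fixed the parameters by hand.
    return 1.0 + 2.0 * x


# Hypothetical examples (x, y) from which the parameters are learned.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)


def f_learned(x):
    # Learned: a and b come from the data, not from the programmer.
    return a + b * x


print(a, b, f_learned(5.0), f_hardcoded(5.0))
```

Giving fit_line more (or different) examples changes a and b, whereas f_hardcoded stays the same regardless of the data. More complex models follow the same idea with far more parameters.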
Some learned functions can be very complex
• E.g., T-NLG: 17 billion parameters
Some decision support systems can involve multiple complex distributions/functions
E.g., for supporting liver assignment