Data in Sport & Health
Lecture 1 – an introduction to data science
Data types
Structured
- Quantitative data
- Relational: what can be found in a database
- Less than 50% of it is used in decision making
Unstructured data
- Qualitative data
- Can be found in the ‘wild’ → properties have to be extracted from the data
- Less than 1% of it is analysed
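Extracting properties from unstructured data can be sketched as follows. This is a minimal illustration, not a general method: the training notes, the regular expression and the field names (distance_km, minutes) are all invented for the example.

```python
import re

# Hypothetical free-text training notes (unstructured data), invented for illustration.
notes = [
    "Ran 5.2 km in 27 min, felt good",
    "Easy recovery jog: 3 km in 20 min",
    "Rest day, no training",
]

# Assumed phrasing: "<distance> km in <minutes> min".
pattern = re.compile(r"(\d+(?:\.\d+)?)\s*km in (\d+)\s*min")


def extract_properties(note):
    """Return a structured record for one note, or None if nothing matches."""
    match = pattern.search(note)
    if match is None:
        return None
    return {"distance_km": float(match.group(1)), "minutes": int(match.group(2))}


# One entry per note; notes without a recognisable pattern give None.
records = [extract_properties(note) for note in notes]

# Only the notes that yielded properties become structured rows.
structured = [record for record in records if record is not None]
```

The rows in structured could be stored in a relational table; the note that does not match the pattern stays unanalysed, which is the usual fate of most unstructured data.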
Big data
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and
systems to extract knowledge and insights from structured and unstructured data.
Knowledge discovery in databases (KDD):
Data: Raw pieces of data
- Red, 192.234.234.453.674, v2.0
Information: Data made useful, organised and structured
- South-facing traffic light on the corner of Pitt and George streets has turned red
Knowledge: Information is read, heard or seen, then integrated and understood
- The traffic light I am driving towards turned red
Wisdom: Informed decision making
- I'd better stop the car!
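The four levels above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the raw reading is taken to be a comma-separated colour, sensor ID and version, and the lookup table SENSOR_LOCATIONS, the constant LIGHT_AHEAD and all function names are hypothetical.

```python
# Data: a raw reading. Its field layout (colour, sensor ID, version) is an assumption.
raw = "Red, 192.234.234.453.674, v2.0"

# Hypothetical lookup from sensor ID to a location.
SENSOR_LOCATIONS = {
    "192.234.234.453.674": "south-facing traffic light, corner of Pitt and George streets",
}

# Hypothetical context: the light the driver is approaching.
LIGHT_AHEAD = "south-facing traffic light, corner of Pitt and George streets"


def to_information(raw_reading):
    """Information: organise the raw reading into a structured, useful record."""
    colour, sensor_id, _version = [part.strip() for part in raw_reading.split(",")]
    return {"location": SENSOR_LOCATIONS[sensor_id], "colour": colour.lower()}


def to_knowledge(information, light_ahead):
    """Knowledge: relate the information to one's own situation."""
    return information["location"] == light_ahead and information["colour"] == "red"


def to_wisdom(red_light_ahead):
    """Wisdom: turn the understanding into a decision."""
    return "stop the car" if red_light_ahead else "keep driving"


information = to_information(raw)
decision = to_wisdom(to_knowledge(information, LIGHT_AHEAD))
```

Each step adds context that the previous level lacks: structure, then the observer's situation, then a decision.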
Data science vs statistics:
- Not a very strict separation
- Each has its merits; the statistical foundations of data science should not be ignored
- Primary differences:
• Statistics starts with a hypothesis, data science (DS) with data
• Statistics shines with limited data, DS with lots of data
- Cultural differences between statisticians and data scientists.
Statistics
Old, extensive and mature (somewhat conservative) field
- Hypothesis → data collection → statistics
- Limited data
• Only the data needed for the hypothesis
• Few data points (test subjects, ethical considerations)
- Limited data demands carefulness and rigour
• A danger of drawing unfounded conclusions
- Hypothesis testing
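The hypothesis → data collection → statistics workflow can be sketched with a small two-group comparison. This is a minimal sketch using an exact permutation test, chosen here only because it needs nothing beyond the standard library; the notes do not prescribe a particular test. The measurements and the names control, treatment and permutation_p_value are invented for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothesis stated up front: the two groups differ in their mean measurement.
# Invented measurements for two small groups (few test subjects).
control = [61, 64, 58, 66, 63]
treatment = [57, 59, 55, 60, 58]


def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference in group means."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    total = sum(pooled)
    n_a, n_b = len(a), len(b)
    extreme = 0
    splits = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        # Small tolerance so that ties with the observed value are counted.
        if difference >= observed - 1e-12:
            extreme += 1
    return extreme / splits


p_value = permutation_p_value(control, treatment)
```

The p-value is the share of relabellings whose mean difference is at least as large as the observed one. With only five subjects per group there are just 252 relabellings, so the smallest attainable p-value is 2/252, which is one reason limited data calls for care.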
Data science
Relatively young, methodology still growing
- Dataset → analysis → hypothesis
- Extensive data (long and wide)
- Leave room for exploration and discovery
- Extensive data demands carefulness and rigour
• Danger of drawing unfounded conclusions
- Hypothesis generation
• Findings are not an unequivocal, publishable result
• The process doesn’t stop with DS
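The dataset → analysis → hypothesis direction can be sketched as a scan over all variable pairs in a wide dataset. This is a minimal sketch under stated assumptions: the variables and values are invented, the 0.8 threshold is arbitrary, and pearson and candidate_hypotheses are hypothetical helper names.

```python
from itertools import combinations
from math import sqrt

# Invented "wide" dataset: one list of observations per variable.
dataset = {
    "sleep_hours": [6.0, 7.5, 8.0, 5.5, 7.0, 6.5],
    "training_load": [300, 420, 450, 280, 400, 350],
    "resting_hr": [62, 55, 54, 65, 57, 60],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def candidate_hypotheses(data, threshold=0.8):
    """Return variable pairs whose absolute correlation reaches the threshold."""
    candidates = []
    for name_a, name_b in combinations(data, 2):
        r = pearson(data[name_a], data[name_b])
        if abs(r) >= threshold:
            candidates.append((name_a, name_b, r))
    # Strongest associations first.
    return sorted(candidates, key=lambda item: -abs(item[2]))


candidates = candidate_hypotheses(dataset)
```

Each returned pair is a candidate hypothesis only. Scanning many pairs will produce some strong correlations by chance, so the candidates still need a dedicated test on new data.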
Data science lifecycle