What is Data Science?
= Data science is a "concept to unify statistics, data analysis and their related methods" in order to
"understand and analyze actual phenomena" with data.
What makes a Data Scientist?
Data scientists use their data and analytical ability to find and interpret rich data sources; manage large
amounts of data; create visualizations to aid in understanding data; build mathematical models using the
data; and present and communicate the data insights/findings.
Machine learning = we want machines to learn from data and, ideally, to be better than humans at a task
Data Mining = dealing with data: pre-processing it and visualising it
One commonality between all these fields = data-driven science
What is data?
Example
Given these three conditions, what will the child do?
- We don't have enough data, so at some point we have to make a prediction
- We have to convert the data into numerical representations so that the computer can process it
Converting into numerical representations:
● Sunny = 1
● Cloudy = 0
● Rainy = 2
Binary representations:
● Yes = 1
● No = 0
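The conversion above can be sketched in a few lines of Python. This is only a minimal illustration: the two mappings come from the notes, but the function name `encode_row` and the three sample observations are made up for the example and are not part of the original data.

```python
# Mappings taken from the notes above.
WEATHER_CODES = {"Sunny": 1, "Cloudy": 0, "Rainy": 2}
YES_NO_CODES = {"Yes": 1, "No": 0}


def encode_row(weather, answer):
    """Return the numerical representation of one (weather, yes/no) observation."""
    return [WEATHER_CODES[weather], YES_NO_CODES[answer]]


# Hypothetical observations, invented for illustration only.
observations = [("Sunny", "Yes"), ("Rainy", "No"), ("Cloudy", "Yes")]

encoded = [encode_row(weather, answer) for weather, answer in observations]
print(encoded)  # [[1, 1], [2, 0], [0, 1]]
```

Note that the weather codes are just labels: the numbers 0, 1 and 2 do not mean that Rainy is "more" than Sunny.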
Features = attributes
We can also convert the data into specific measurements...
Or visualize the data...
Interpreting data
Algorithms look for rules in the data so that they can predict some kind of behavior or action
Formal notation
Our prediction = y-hat
Our target = y
- The difference is that y is given, and y-hat is
a prediction of a model and therefore not
necessarily correct
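A small Python sketch may help to keep y and y-hat apart. Everything in it is an assumption made for illustration: the encoded weather values, the targets in `y`, and the rule inside `predict` are invented and are not the example from the notes. The point is only that `y` is given, while `y_hat` comes out of a rule and can be wrong.

```python
# Hypothetical encoded weather values (Sunny = 1, Cloudy = 0, Rainy = 2).
weather = [1, 0, 2, 1]

# y: the given targets (1 = Yes, 0 = No). Invented for illustration.
y = [1, 1, 0, 0]


def predict(weather_code):
    """Made-up rule: predict Yes (1) unless it is rainy (2)."""
    return 0 if weather_code == 2 else 1


# y_hat: the predictions of the rule, one per observation.
y_hat = [predict(code) for code in weather]

# Compare each prediction with its target (1 = correct, 0 = wrong).
correct = [int(prediction == target) for prediction, target in zip(y_hat, y)]
accuracy = sum(correct) / len(y)

print(y_hat)     # [1, 1, 0, 1]
print(correct)   # [1, 1, 1, 0]
print(accuracy)  # 0.75
```

The last observation shows the difference: the target y is 0, but the rule predicts y-hat = 1.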
The notations applied to the example above: