Methodology lecture 1
Big data analysis:
Data science = interdisciplinary field of science involved in unlocking knowledge and insight
from structured and unstructured data
- Draws from statistics, computer science, information science and mathematics
Data mining = finding information, meaning and patterns in (large) data sets
Text mining = gleaning useful information from a large body of natural-language text
Data analytics = Mostly the same as “data analysis”, but in the context of big data and data
science
Big data = data in such large quantities, so complex and generated so fast that it is hard (or even
impossible) to process using traditional approaches
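The text-mining definition can be made concrete with a term-frequency count, one of the simplest ways to glean information from text. The sketch below is illustrative only: the three sentences and the tiny stop-word list are made up, and real text mining uses much larger corpora and richer preprocessing.

```python
import re
from collections import Counter

# A tiny, made-up corpus standing in for "a large body of natural-language text".
documents = [
    "Big data is data in large quantities.",
    "Text mining gleans information from text.",
    "Data mining finds patterns in data.",
]

# Hypothetical minimal stop-word list; real projects use a much larger one.
STOP_WORDS = {"is", "in", "from", "the", "a"}

def term_frequencies(docs):
    """Count how often each lower-cased word occurs across all documents."""
    counts = Counter()
    for doc in docs:
        words = re.findall(r"[a-z]+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

freqs = term_frequencies(documents)
print(freqs.most_common(3))
```

The most frequent remaining words ("data", "text", "mining") already hint at what the corpus is about, which is the basic idea behind extracting meaning from raw text.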
The 5 V’s of Big Data:
Problems with research design over big data:
◼ Validity
◼ Reliability of measures
◼ Can we use it to do scientific research?
Big data comes with big problems and new types of questions, and therefore calls for new statistical
methods
Big data analytics = use of advanced analytic techniques for very large and diverse data sets
Analytics process model:
3 types of data
- Unstructured data
- Semi-structured data
- Structured data
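The three types can be contrasted in a few lines of Python. The sample records are invented for illustration, and JSON is used here as a common example of semi-structured data (an assumption, since the notes do not name a format). The key difference shows up when reading the semi-structured records: the field names travel with the data, but not every record has the same fields.

```python
import json

# Structured: fixed columns, every record has the same fields (like a table row).
structured_rows = [
    ("alice", 34, "Amsterdam"),
    ("bob", 29, "Utrecht"),
]

# Semi-structured: self-describing keys, but records may differ in shape.
semi_structured = """
[
  {"name": "alice", "age": 34, "city": "Amsterdam"},
  {"name": "bob", "tags": ["student", "runner"]}
]
"""

# Unstructured: free text with no predefined fields.
unstructured = "Alice (34) moved to Amsterdam last year and loves it."

records = json.loads(semi_structured)
# Keys are present in the data itself, but missing fields must be handled explicitly.
ages = [r.get("age") for r in records]
print(ages)  # [34, None]
```

Extracting the age from the unstructured sentence would require text processing rather than a simple lookup.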
Structured data:
- Data are stored in a relational database, accessible using Structured Query Language (SQL).
Because the data are placed on the website in an orderly way, they can easily be "scraped"
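Querying structured data with SQL can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming a hypothetical `respondents` table in an in-memory database; a real project would connect to an existing relational database instead.

```python
import sqlite3

# In-memory database as a stand-in for a real relational database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: every row has the same columns with fixed types.
cur.execute("CREATE TABLE respondents (id INTEGER PRIMARY KEY, age INTEGER, country TEXT)")
cur.executemany(
    "INSERT INTO respondents (age, country) VALUES (?, ?)",
    [(34, "NL"), (29, "BE"), (41, "NL"), (23, "DE")],
)
conn.commit()

# Because the structure is known in advance, SQL can filter and aggregate directly.
cur.execute(
    "SELECT country, COUNT(*), AVG(age) FROM respondents "
    "GROUP BY country ORDER BY country"
)
rows = cur.fetchall()
for country, n, mean_age in rows:
    print(country, n, mean_age)
conn.close()
```

The fixed schema is what makes a one-line aggregation possible; the same question asked of unstructured text would first require extracting the ages and countries.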
Unstructured data:
◼ Unstructured web pages
◼ Siri voice command
◼ Video
◼ Body of emails
◼ Social media streams
◼ Interstellar radio signals
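Unstructured data such as the body of an email often has to be turned into structured fields before it can be analysed. The sketch below uses regular expressions on a made-up email body; the patterns are simplified assumptions, and real-world email addresses and date formats vary far more.

```python
import re

# Made-up email body: free text with no predefined fields.
email_body = """Hi team,
The survey closes on 2024-03-15. Send questions to data.desk@example.org
or to j.smith@example.com before then.
Thanks!"""

# Simplified patterns (assumptions); they will miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Turn the unstructured text into a small structured record.
record = {
    "emails": EMAIL_RE.findall(email_body),
    "dates": DATE_RE.findall(email_body),
}
print(record)
```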
Machine learning methods:
Supervised task:
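In a supervised task the training data come with known labels, and the model learns to predict the label of new cases. As one illustration (not necessarily the method covered in the lecture), here is a minimal k-nearest-neighbours classifier in plain Python; the features, labels and choice of k = 3 are all made up.

```python
import math
from collections import Counter

# Labelled training data (made up): (hours studied, hours slept) -> label.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 6.0), "pass"),
    ((7.0, 8.0), "pass"),
]

def predict(x, data, k=3):
    """k-nearest neighbours: label a new point by majority vote of its k closest examples."""
    neighbours = sorted(data, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(predict((1.5, 4.5), training_data))  # fail
print(predict((7.0, 7.0), training_data))  # pass
```

What makes this supervised is that the labels ("pass"/"fail") are given in the training data; an unsupervised method would have to find groups without them.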