All Lectures and Tutorials Summary
Lecture 1 - Summary
Course introduction, overview, and why privacy is important
Data analysis is a process of inspecting, cleansing, transforming and modelling data with the goal of
discovering useful information, informing conclusions and supporting decision-making.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to
extract knowledge and insights from structured and unstructured data.
Data collection and pre-processing are always the first steps of a Business Analytics project
• Descriptive (What happened?) → just grasping what is in your data
- Activities
- Results
• Diagnostic (Why did it happen?) → whatever we can understand from the data
- Content correlations
- Win/loss (W/L) analysis
• Predictive (What will happen next?)
- Lead scoring
- Sales forecast
• Prescriptive (How can we make it happen?)
- Content recommendations based on past activities & demographics
- Opportunity prioritization
Goal: reach the prescriptive state
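As a deliberately tiny illustration of the descriptive and predictive levels above, here is a minimal Python sketch, assuming an invented monthly-sales table; the numbers, column names, and the choice of linear regression are illustrative, not from the lecture.

```python
# Minimal sketch: descriptive vs. predictive analytics on a toy sales table.
# Values and column names are made up for illustration.
import pandas as pd
from sklearn.linear_model import LinearRegression

sales = pd.DataFrame({
    "month": [1, 2, 3, 4, 5, 6],
    "revenue": [10.0, 12.5, 11.8, 14.2, 15.1, 16.0],
})

# Descriptive: what happened? Summarise what is in the data.
print(sales["revenue"].describe())

# Predictive: what will happen next? Fit a simple trend and forecast month 7.
model = LinearRegression().fit(sales[["month"]], sales["revenue"])
forecast = model.predict(pd.DataFrame({"month": [7]}))
print(f"Forecast for month 7: {forecast[0]:.1f}")
```

Diagnostic and prescriptive analytics then build on top of this: explaining why the trend looks the way it does and recommending actions to change it.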
Data
Big data is data with 3Vs
1. Volume - Enormous amounts of data (zettabytes)
2. Velocity - Real-time streams of data
3. Variety - Data from a range of sensors, with different types
Problems with big data
What makes the privacy of Big Data a different problem from traditional privacy? Scale!
- Lack of control and transparency (about what is being collected from us and what is happening with it)
- Data reusability (data is used for purposes other than the one it was originally collected for)
- Data inference and re-identification
Most BA projects do not involve big data, but work with relatively small and structured data sets.
Structured data sets:
Used by most predictive techniques. Usually consists of entries (e.g. people) with attributes (e.g., name,
income, sex, nationality).
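A minimal sketch of such a structured data set, with invented values (rows are entries, columns are attributes):

```python
# Minimal sketch of a structured data set: rows are entries (people),
# columns are attributes. Values are invented for illustration.
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Chen"],
    "income": [42000, 38000, 51000],
    "sex": ["F", "M", "M"],
    "nationality": ["NL", "DE", "CN"],
})
print(people)
```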
Unstructured data sets:
Has no structure. It might be data from cameras, social media sites, text entered in free-text fields, etc.
Unstructured data is the majority of the data that is stored today, and it is often also big data. When
working with unstructured data, the first step is often to extract features to make it structured and
therefore suitable as input for an algorithm working with structured data (e.g., images from road-side
cameras are used to extract license plates which are then used to analyze the movement of cars).
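A rough sketch of that pipeline, where extract_license_plate() is a hypothetical placeholder for a real OCR/ANPR component; the point is only that unstructured images become structured rows suitable for a standard algorithm:

```python
# Sketch: turning unstructured data (camera images) into structured rows.
# extract_license_plate() is a hypothetical placeholder for a real OCR/ANPR step.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Sighting:
    plate: str         # extracted feature
    camera_id: str     # where the image came from
    timestamp: datetime

def extract_license_plate(image_bytes: bytes) -> str:
    raise NotImplementedError("placeholder for an OCR/ANPR model")

def to_structured(images):
    """images: iterable of (image_bytes, camera_id, timestamp) tuples."""
    rows = []
    for image_bytes, camera_id, timestamp in images:
        plate = extract_license_plate(image_bytes)
        rows.append(Sighting(plate, camera_id, timestamp))
    return rows  # structured input for, e.g., analysing the movement of cars
```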
Tutorial 1 - Tutorial Notes
Privacy is Dead! Long Live Privacy!
Workgroup Discussion:
1. In what ways could data compromise our autonomy? Our human dignity? Our
rationality?
2. Are there ‘no-go’ areas for computer scientists? Should there be?
3. What role for law in computer science? What role for computer science in law?
4. Where should the intervention of law be in building digital technology?
Tutorial attendees will be asked to think about the design of an app (description will be
provided). Students will be asked to identify what parts of their lives might be
compromised by the design of the app.
Important questions to think about:
What private data can be inferred from what app data? For example:
• Location data can reveal religion (e.g., if someone is at the location of a church every Sunday); see the sketch after this list
• Diet + physical + medical data can reveal religion (e.g., if someone does not eat during daylight hours in
Ramadan)
Apps collect a lot of data, and each combination of data can reveal something more, such as habits, religion, or diet.
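A small illustrative sketch of the location example above; the data format, place names, and the 75% threshold are assumptions made purely for illustration:

```python
# Illustrative sketch: inferring a sensitive attribute (religion) from location data.
# Data format, places, and threshold are assumptions, not from the tutorial.
from datetime import datetime

visits = [
    (datetime(2024, 1, 7, 10, 30), "St. Mary's Church"),
    (datetime(2024, 1, 14, 10, 45), "St. Mary's Church"),
    (datetime(2024, 1, 21, 10, 40), "St. Mary's Church"),
    (datetime(2024, 1, 28, 11, 0), "supermarket"),
]

sundays = [place for ts, place in visits if ts.weekday() == 6]  # 6 = Sunday
church_sundays = sum(1 for place in sundays if "Church" in place)

# If most observed Sundays include a church visit, the app could infer religion.
if sundays and church_sundays / len(sundays) >= 0.75:
    print("Inferred (probable): regular churchgoer")
```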
Even typing speed can become a diagnostic signal: someone's typing patterns can be cross-referenced with
those of dementia patients.
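A sketch of how such a behavioural signal could be derived; the keystroke timestamps and the chosen feature (mean inter-keystroke interval) are assumptions for illustration:

```python
# Sketch: deriving a typing-speed feature from keystroke timestamps (assumed format).
timestamps = [0.00, 0.21, 0.45, 0.80, 1.30, 1.95]  # seconds at which keys were pressed

intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
mean_interval = sum(intervals) / len(intervals)
print(f"Mean inter-keystroke interval: {mean_interval:.2f} s")
# Such features could then be compared against reference groups (e.g., dementia patients).
```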