Summary - Interactive Data Transformation - Master Information Management
Sven van Alem
Table of contents
1. Lecture 1: DBMS & Relational & SQL
1.1 Database Management Systems
1.2 Relational Data Model
1.3 Single table queries using SQL
2. Lecture 2: Entity Relationship, and translating from natural
2.1 Entity-Relationship Model
2.2 Business concepts
2.3 Relationships, degrees, and cardinalities
2.4 Generalization and Specialization
3. Lecture 3: Translating ERD to DB schema & Database Normalization
3.1 Relational schema
3.2 Transforming ERD to Relational schema
3.3 Data Normalization
4. Lecture 4: Evolution of data management, big data, and data intensive systems
4.1 Evolution of Data management
4.2 Big Data Analytics
4.3 Reasons for going beyond traditional RDBMS
4.4 Big data
4.5 Storage layer (HDFS)
4.6 Computation layer (MapReduce)
5. Lecture 5: The Spark ecosystem, RDDs, Programming model, and PySpark
5.1 Data flow models
5.2 Lambda expressions: preliminary material
5.3 Apache Spark architecture
5.4 The programming model: why Spark?
6. Lecture 6: Data transformations with SQL, entity recognition, data cleaning tools, etc.
6.1 Processing multiple tables
6.2 Views
6.3 Functions
6.4 Creating & Populating
6.5 Data from Websites, Integration & Cleaning, Entity Extraction & resolution