Summary

[23-24] Interactive Data Transformation complete summary IM

Pages: 61
Uploaded on: 13-11-2022
Written in: 2021/2022

A complete summary of the lecture slides, recorded videos, and live lectures. Passed the course with a 7.5 by only studying this summary.

Summary
Interactive Data
Transformation

Table of Contents
Lecture 1: Database management systems, relational data models, and SQL
1.1. Database management systems
1.2. Relational data model
1.3. Single table queries using SQL
Lecture 2: Entity relationship and translating from a natural language specification
2.1. Basic concepts
2.2. Relationships, degrees & cardinalities
2.3. Generalization & specialization
Lecture 3: Transforming ERD to relational schema, and normalization
3.1. Transforming ERDs
3.2. Data normalization
Lecture 4: Evolution of data management, big data, and data intensive systems
4.1. Evolution of data management
4.2. Big data analytics
4.3. Reasons for going beyond traditional RDBMS
4.4. Storage layer
4.5. Computation layer
Lecture 5: The Spark ecosystem, RDDs, programming model, and PySpark
5.1. Lambda expressions
5.2. Apache Spark
5.3. RDDs
5.4. Programming model
Lecture 6: Data transformations with SQL, entity recognition, data cleaning tools, and more
6.1. Processing multiple tables
6.2. Views
6.3. Functions
6.4. Creating & populating
6.5. Data from websites, integration & cleaning, and entity extraction & resolution
6.6. Integration & cleaning

Lecture 1: Database management systems, relational data models, and SQL

1.1. Database management systems
Reasons for using a database management system (DBMS): it solves the following problems of
storing data in plain files:
• Data redundancy and inconsistency: multiple file formats, duplication of data in different files.
• Difficulty in accessing data: a new program has to be written to carry out each new task.
• Data isolation: data is scattered across multiple files and formats.
• Integrity problems: integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly. Hard to add new constraints or change
existing ones.
• Atomicity of updates: transfer of funds from one account to another should either be
complete or not happen at all. Failures may leave data in an inconsistent state with partial
updates carried out.
• Concurrent access by multiple users: uncontrolled concurrent accesses can lead to
inconsistencies.
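o Example: two people read the same balance (e.g., €100) and then withdraw money at the
same time (e.g., €50 for person A, €70 for person B). Each computes the new balance from
the stale €100, so the stored result is €50 or €30 depending on which write lands last,
although €120 was paid out.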
• Security problems: hard to provide user access to some, but not all, data.
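
The integrity and atomicity points above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The accounts table, its columns, and the transfer helper are hypothetical names chosen for illustration, and the rule enforced here (balance >= 0) is an assumption, not something taken from the lectures. The constraint is declared once in the schema instead of being buried in program code, and a failed transfer leaves no partial update behind.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        owner   TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- constraint stated explicitly
    )
    """
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failing debit would leave a partial update
            # behind if the DBMS did not roll the transaction back.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    """Return the current balances as a dict keyed by owner."""
    return dict(conn.execute("SELECT owner, balance FROM accounts"))
```

On the initial data, a call such as transfer(conn, "A", "B", 150) returns False and leaves both balances unchanged: the debit violates the CHECK constraint, so the credit that was already applied is rolled back with it.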

Database (DB): shared collection of data with the same structure, including correlations and
relationships for a common purpose.

DBMS: a collection of programs that manages the database structure and controls access to the data
stored in the database. It offers functions and methods to build and manipulate the data. It can be
seen as a black box that sits between users/applications and the database.




Goals of a DBMS: separate data from application.
• Provide an interface that the application programmer must follow.
• Allow the system administrator to make modifications, for example improving or
reconfiguring systems, without having an impact on the user.
• Users can change their view of the data without having to worry about how it is stored.
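
A minimal sketch of this separation, using Python's built-in sqlite3 module and an SQL view. All table, column, and view names and the sample rows are hypothetical. The application only queries the view customer_contact; the administrator then reorganizes the stored tables and redefines the view, and the application's query keeps returning the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stored structure (hypothetical): one wide table.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, city TEXT, credit_limit INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Tilburg", 500), (2, "Bob", "Breda", 900)],
)
# The interface the application sees: only the columns it needs.
conn.execute("CREATE VIEW customer_contact AS SELECT id, name, city FROM customer")


def contacts(conn):
    """Application code: it only knows the view, not the stored tables."""
    return conn.execute(
        "SELECT name, city FROM customer_contact ORDER BY id"
    ).fetchall()


before = contacts(conn)

# The administrator splits the table in two and redefines the view.
conn.executescript(
    """
    CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE customer_credit (id INTEGER PRIMARY KEY, credit_limit INTEGER);
    INSERT INTO customer_core SELECT id, name, city FROM customer;
    INSERT INTO customer_credit SELECT id, credit_limit FROM customer;
    DROP VIEW customer_contact;
    DROP TABLE customer;
    CREATE VIEW customer_contact AS SELECT id, name, city FROM customer_core;
    """
)

after = contacts(conn)  # same call, same result, different storage underneath
```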





Layers of a DBMS (architecture):
• Internal layer: software that stores and structures the data and offers efficient access
methods.
• Logical layer: optimizes queries, resolves conflicting accesses by multiple users, and
guarantees constant availability (even in case of failures).
• External layer: communicates with users, analyses user requests/queries, controls access and
presents the answers.
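
The division of labour between the layers can be made visible with a small sketch in Python's built-in sqlite3 module. This is only a loose illustration in one specific system, not a description of the layered architecture itself; the accounts table, the index name, and the generated rows are assumptions. The user's query text stays the same, while the DBMS chooses a different access method once an index exists. The exact wording of the plan varies between SQLite versions, so the sketch only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts (owner, balance) VALUES (?, ?)",
    [(f"user{i}", i) for i in range(1000)],
)

# What the user submits; it never mentions how the data should be found.
QUERY = "SELECT balance FROM accounts WHERE owner = 'user42'"


def plan(conn, sql):
    """Return SQLite's description of how it intends to execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the readable detail


plan_without_index = plan(conn, QUERY)
result_without_index = conn.execute(QUERY).fetchall()

# A storage-level change: add an access method for lookups by owner.
conn.execute("CREATE INDEX idx_accounts_owner ON accounts (owner)")

plan_with_index = plan(conn, QUERY)
result_with_index = conn.execute(QUERY).fetchall()
```

Printing plan_without_index and plan_with_index shows the change in strategy, while the two result lists are identical.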




Development process / life cycle of a DBMS:
• Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem. Steps include analyzing the current data processing and
general business functions and needs.
• Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements. The output is a conceptual schema/ERD that corresponds to a
detailed, technology independent specification of the overall organizational data structure.
• Logical design: representation of the database in the data model of the chosen DBMS.
Transform the conceptual schema (the outcome of the previous step) into the structures of
that data management system, e.g., relational tables.
• Physical design: the set of specifications that describe how data are stored in a computer’s
secondary memory by a specific database management system.
• Implementation: build the database, populate it with data, install and test
applications, and complete documentation and training materials.
• Maintenance: monitor the operation and usefulness of the system. Repair errors in the
database and applications. Enhance by analyzing the database and applications to ensure that
evolving information requirements are met.
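
The logical design, physical design, and implementation steps above can be sketched for a tiny case with Python's built-in sqlite3 module. The conceptual schema assumed here (a customer places zero or more orders) and every table, column, and index name are hypothetical. Each entity becomes a table, the 1:N relationship becomes a foreign key on the "many" side, an index stands in for a physical design decision, and a small helper tests the populated database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript(
    """
    -- Logical design: one table per entity, foreign key for the 1:N relationship.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE purchase_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       INTEGER NOT NULL
    );
    -- Physical design choice: an index to speed up lookups by customer.
    CREATE INDEX idx_order_customer ON purchase_order (customer_id);
    """
)

# Implementation: populate with (made-up) data.
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO purchase_order VALUES (10, 1, 250)")
conn.commit()


def insert_order(conn, order_id, customer_id, total):
    """Insert an order; return False if it violates a constraint."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO purchase_order VALUES (?, ?, ?)",
                (order_id, customer_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def order_count(conn):
    return conn.execute("SELECT COUNT(*) FROM purchase_order").fetchone()[0]
```

An order for an existing customer is accepted, while an order that refers to a customer_id that does not exist is rejected by the foreign key, which is the kind of check the testing part of the implementation step would exercise.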



