100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.2 TrustPilot
logo-home
Summary

COMPLETE Summary - Interactive Data Transformation (2023)

Rating
-
Sold
1
Pages
23
Uploaded on
08-06-2023
Written in
2022/2023

Includes everything from: Lecture 1: Data Management Systems, Relational, and SQL Lecture 2: Entity Relationships, and Translating from a Natural Lecture 3: Translating ERD to DB Schema, and Database Normalization Lecture 4: Evolution of Data Management, Big Data, and Data Intensive Systems Lecture 5: The Spark Ecosystem, RDDs, Programming Model, and PySpark Lecture 6: Data Transformations with SQL, Entity Recognition, Data Cleaning Tools

Show more Read less
Institution
Course










Whoops! We can’t load your doc right now. Try again or contact support.

Written for

Institution
Study
Course

Document information

Uploaded on
June 8, 2023
Number of pages
23
Written in
2022/2023
Type
Summary

Subjects

Content preview

Interactive Data Transformation
Unit 4 – Year 2022/2023


Contents
Lecture 1: Data Management Systems, Relational, and SQL .............................................................. 2
Relational data model ............................................................................................................................ 3
Single table queries using SQL ............................................................................................................. 4
Lecture 2: Entity Relationships, and Translating from a Natural Language Perspective................ 5
Basic concepts ........................................................................................................................................ 5
Lecture 3: Translating ERD to DB Schema, and Database Normalization ....................................... 8
Transforming entity relationship diagram to relational schema ..................................................... 8
Normalization ........................................................................................................................................ 11
Lecture 4: Evolution of Data Management, Big Data, and Data Intensive Systems .................... 12
Evolution of Data Management ......................................................................................................... 12
Big data and its challenges .................................................................................................................. 12
Big data analytics .................................................................................................................................. 13
Reasons for going beyond traditional RDBMS ................................................................................ 13
Lecture 5: The Spark Ecosystem, RDDs, Programming Model, and PySpark ................................ 17
Lambda expressions ............................................................................................................................. 17
Apache Spark ........................................................................................................................................ 18
Programming models ........................................................................................................................... 19
Lecture 6: Data Transformations with SQL, Entity Recognition, Data Cleaning Tools ................ 20
Views ...................................................................................................................................................... 21
Functions ............................................................................................................................................... 21
Creating and populating ...................................................................................................................... 22
Data from websites, integration & cleaning, entity extraction & resolution .............................. 22

,Lecture 1: Data Management Systems, Relational, and SQL
Reasons for Database Management Systems (DBMS):
In the early days database applications were built on top of file systems,
Drawbacks:
• Data redundancy and inconsistency: multiple file formats, duplication in different files
• Difficulty in accessing data: need to write a new program to carry out each new task
• Data isolation: multiple files and formats
• Integrity problems: integrity constraints become buried in program code rather than being
stated explicitly
- Hard to add new constraints or change existing ones
• Atomicity of updates: failures leave data in an inconsistent state with partial updates carried
out
• Concurrent access by multiple users: needed for performance, uncontrolled concurrent
accesses can lead to inconsistencies
• Security problems: hard to provide user access to some, but not all, data

A database (DB) is a collection of data with the same structure, including correlations and
relationships, that is defined for a particular use and used by several users
A database management system (DBMS) is a collection of programs over DB that specify the
data types, structure, constraints, and store this on a disk, retrieve, update, and mange access
rights
• A black box interacting between users/applications and the database
• The ultimate goal is to separate data from application
- Provide an interface that the application programmer must follow
- Allow system administrator to make modifications without having an impact on the user
- Users can change their view of the data without having to worry about how it is stored

There are different layers within the DBMS:
• External layer: communication with users
- Analysis of user requests (queries)
- Access control
- Answer presentation
• Logical layer:
- Optimization of queries
- Resolving conflicting accesses, i.e., multiple users
- Guarantees constant availability even in case of failures
• Internal layer:
- Storing the data
- Software for structuring the data
- Efficient access methods, i.e., keys, indices, etc.

System development life cycle
1. Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem
• Enterprise modeling: analyze current data processing, the general business functions and
their database needs, and justify need for new data and databases in support of business
• Conceptual data modeling: identify scope of database requirements for proposed
information system and analyze overall data requirements for business function(s) supported
by database
• Understand current data processing
• Understand general business functions and needs

, 2. Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements
• Conceptual data modeling:
- Develop preliminary conceptual data model, including entities and relationships
- Compare to enterprise data model
- Develop detailed conceptual data model, including entities, relationships, attributes, and
business rules
- Make conceptual data model consistent with other models of information systems
- Populate repository with all conceptual database specifications
• Output: conceptual schema
• Corresponds to a detailed technology independent specification of the overall organizational
data structure

3. Design
• Logical: representation of the DB; transform the conceptual schema into a logic schema,
which describes the data in terms of the data management technology that will be used to
implement the database
• Physical: the set of specifications that describe how data from a logical schema are stored in
a computer’s secondary memory by a specific database management system

4. Implementation: a designer writes, tests, and installs the program/scripts that access, create,
or modify the database
• Finalize all database documentation, train users, and put procedures into place for the
ongoing support of the information system users
• Populate with data
• Install application(s) and test
• Complete documentation and training materials

5. Maintenance: add, delete, or change characteristics of the structure of the database in order
to meet changing business conditions, to correct errors in database design, or to improve the
processing speed of database applications
• Monitor the operation and usefulness of the system
• Repair by fixing errors in database and applications
• Enhance by analyzing the database and applications to ensure that evolving information
requirements are met

Different types of DBMS:
• Traditional DBMS: text and numerical data
• Multimedia DBMS: multimedia data
• Spatial DBMS: geographic and geometric data
• Data warehouses
Relational data model
A relational model is an approach to managing data by representing it grouped into relations

A relational DBMS (RDBMS) is a database management system that manages data as a collection
of tables in which all relationships are represented by common values in related tables
$8.38
Get access to the full document:

100% satisfaction guarantee
Immediately available after payment
Both online and in PDF
No strings attached


Document also available in package deal

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
anneTBKIM Tilburg University
Follow You need to be logged in order to follow users or courses
Sold
52
Member since
5 year
Number of followers
39
Documents
9
Last sold
8 months ago

3.0

1 reviews

5
0
4
0
3
1
2
0
1
0

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their exams and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can immediately select a different document that better matches what you need.

Pay how you prefer, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card or EFT and download your PDF document instantly.

Student with book image

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions