[23-24] Interactive Data Transformation complete summary IM

Pages: 61
Uploaded: 13-11-2022
Written in: 2021/2022

A complete summary of the lecture slides, recorded videos, and live lectures. Passed the course with a 7.5 by only studying this summary.

Summary
Interactive Data
Transformation

Table of Contents
Lecture 1: Database management systems, relational data models, and SQL
1.1. Database management systems
1.2. Relational data model
1.3. Single table queries using SQL
Lecture 2: Entity relationship and translating from a natural language specification
2.1. Basic concepts
2.2. Relationships, degrees & cardinalities
2.3. Generalization & specialization
Lecture 3: Transforming ERD to relational schema, and normalization
3.1. Transforming ERDs
3.2. Data normalization
Lecture 4: Evolution of data management, big data, and data intensive systems
4.1. Evolution of data management
4.2. Big data analytics
4.3. Reasons for going beyond traditional RDBMS
4.4. Storage layer
4.5. Computation layer
Lecture 5: The Spark ecosystem, RDDs, programming model, and PySpark
5.1. Lambda expressions
5.2. Apache Spark
5.3. RDDs
5.4. Programming model
Lecture 6: Data transformations with SQL, entity recognition, data cleaning tools, and more
6.1. Processing multiple tables
6.2. Views
6.3. Functions
6.4. Creating & populating
6.5. Data from websites, integration & cleaning, and entity extraction & resolution
6.6. Integration & cleaning

Lecture 1: Database management systems, relational data models, and SQL

1.1. Database management systems
Reasons for using a database management system (DBMS): it offers solutions to the following problems:
• Data redundancy and consistency: multiple file formats, duplication in different files.
• Difficulty in accessing data: need to write a new program to carry out each new task.
• Data isolation: multiple files and formats.
• Integrity problems: integrity constraints (e.g., account balance > 0) become “buried” in
program code rather than being stated explicitly. Hard to add new constraints or change
existing ones.
• Atomicity of updates: transfer of funds from one account to another should either be
complete or not happen at all. Failures may leave data in an inconsistent state with partial
updates carried out.
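• Concurrent access by multiple users: uncontrolled concurrent accesses can lead to
inconsistencies.
o Example: two people read the same balance (e.g., €100) and then withdraw money at the
same time (e.g., €50 for person A, €70 for person B). Each withdrawal is checked against
the €100 that was read, so both succeed and €120 leaves an account that held only €100.
Whichever update is written last (€50 or €30) overwrites the other.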
• Security problems: hard to provide user access to some, but not all, data.
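
The integrity and atomicity points above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the `account` table, the `transfer` and `balances` helpers, and the amounts are hypothetical and not taken from the lectures. The constraint balance > 0 is stated once in the schema instead of being buried in program code, and the two updates of a transfer are either both committed or both rolled back.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance > 0))"  # explicit integrity constraint
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failing debit shows the credit being undone.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back.
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 30), balances(conn))   # valid transfer
print(transfer(conn, 1, 2, 150), balances(conn))  # would overdraw account 1
```

The second call fails on the debit, and the credit that had already been applied to account 2 is undone, so the balances are the same as after the first call.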

Database (DB): shared collection of data with the same structure, including correlations and
relationships for a common purpose.

DBMS: a collection of programs that manages the database structure and controls access to the data
stored in the database. It offers functions and methods to build and manipulate the data. It can be
seen as a black box interacting between users/applications and the database.




Goals of a DBMS: separate data from application.
• Provide an interface that the application programmer must follow.
• Allow system administrator to make modifications without having an impact on the user, for
example improve or reconfigure systems.
• Users can change their view of the data without having to worry about how it is stored.
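
As a rough illustration of separating data from application, the sketch below lets an application read only through a view while the storage behind it is reorganized. It again assumes Python's sqlite3 module; the table names, the view name `customer_city`, and the sample rows are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original storage: a single table (hypothetical schema and data).
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Ann', 'Tilburg'), (2, 'Bob', 'Breda');
    CREATE VIEW customer_city AS SELECT name, city FROM customer;
""")


def app_query(conn):
    """The application only knows the view, not the tables behind it."""
    return conn.execute(
        "SELECT name, city FROM customer_city ORDER BY name"
    ).fetchall()


before = app_query(conn)

# The administrator reorganizes storage: cities move to their own table,
# and the view is redefined on top of the new tables.
conn.executescript("""
    DROP VIEW customer_city;
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO city (name) SELECT DISTINCT city FROM customer;
    CREATE TABLE customer_v2 (
        id INTEGER PRIMARY KEY,
        name TEXT,
        city_id INTEGER REFERENCES city(id)
    );
    INSERT INTO customer_v2
        SELECT c.id, c.name, ci.id
        FROM customer c JOIN city ci ON ci.name = c.city;
    DROP TABLE customer;
    CREATE VIEW customer_city AS
        SELECT cu.name AS name, ci.name AS city
        FROM customer_v2 cu JOIN city ci ON ci.id = cu.city_id;
""")

after = app_query(conn)
print(before == after)
```

The application code in `app_query` is not touched by the reorganization; only the view definition, which belongs to the database, changes.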





Layers of a DBMS (architecture):
• Internal layer: software that stores and structures the data and offers efficient access
methods.
• Logical layer: optimizes queries, resolves conflicting accesses by multiple users, and
guarantees constant availability (even in case of failures).
• External layer: communicates with users, analyses user requests/queries, controls access and
presents the answers.




Development process / life cycle of a DBMS:
• Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem. Steps include analyzing the current data processing and
general business functions and needs.
• Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements. The output is a conceptual schema/ERD that corresponds to a
detailed, technology independent specification of the overall organizational data structure.
• Logical design: representation of the database. Transform the conceptual schema (the
outcome of the previous step) into the terms of the data management system that will be used.
• Physical design: the set of specifications that describe how data are stored in a computer’s
secondary memory by a specific database management system.
• Implementation: build the database implementation, populate it with data, install and test
applications, and complete the documentation and training materials.
• Maintenance: monitor the operation and usefulness of the system. Repair errors in the
database and applications. Enhance by analyzing the database and applications to ensure that
evolving information requirements are met.



