Escrito por estudiantes que aprobaron Inmediatamente disponible después del pago Leer en línea o como PDF ¿Documento equivocado? Cámbialo gratis 4,6 TrustPilot
logo-home
Resumen

COMPLETE Summary - Interactive Data Transformation (2023)

Puntuación
-
Vendido
1
Páginas
23
Subido en
08-06-2023
Escrito en
2022/2023

Includes everything from: Lecture 1: Data Management Systems, Relational, and SQL Lecture 2: Entity Relationships, and Translating from a Natural Lecture 3: Translating ERD to DB Schema, and Database Normalization Lecture 4: Evolution of Data Management, Big Data, and Data Intensive Systems Lecture 5: The Spark Ecosystem, RDDs, Programming Model, and PySpark Lecture 6: Data Transformations with SQL, Entity Recognition, Data Cleaning Tools

Mostrar más Leer menos
Institución
Grado

Vista previa del contenido

Interactive Data Transformation
Unit 4 – Year 2022/2023


Contents
Lecture 1: Data Management Systems, Relational, and SQL .............................................................. 2
Relational data model ............................................................................................................................ 3
Single table queries using SQL ............................................................................................................. 4
Lecture 2: Entity Relationships, and Translating from a Natural Language Perspective................ 5
Basic concepts ........................................................................................................................................ 5
Lecture 3: Translating ERD to DB Schema, and Database Normalization ....................................... 8
Transforming entity relationship diagram to relational schema ..................................................... 8
Normalization ........................................................................................................................................ 11
Lecture 4: Evolution of Data Management, Big Data, and Data Intensive Systems .................... 12
Evolution of Data Management ......................................................................................................... 12
Big data and its challenges .................................................................................................................. 12
Big data analytics .................................................................................................................................. 13
Reasons for going beyond traditional RDBMS ................................................................................ 13
Lecture 5: The Spark Ecosystem, RDDs, Programming Model, and PySpark ................................ 17
Lambda expressions ............................................................................................................................. 17
Apache Spark ........................................................................................................................................ 18
Programming models ........................................................................................................................... 19
Lecture 6: Data Transformations with SQL, Entity Recognition, Data Cleaning Tools ................ 20
Views ...................................................................................................................................................... 21
Functions ............................................................................................................................................... 21
Creating and populating ...................................................................................................................... 22
Data from websites, integration & cleaning, entity extraction & resolution .............................. 22

,Lecture 1: Data Management Systems, Relational, and SQL
Reasons for Database Management Systems (DBMS):
In the early days database applications were built on top of file systems,
Drawbacks:
• Data redundancy and inconsistency: multiple file formats, duplication in different files
• Difficulty in accessing data: need to write a new program to carry out each new task
• Data isolation: multiple files and formats
• Integrity problems: integrity constraints become buried in program code rather than being
stated explicitly
- Hard to add new constraints or change existing ones
• Atomicity of updates: failures leave data in an inconsistent state with partial updates carried
out
• Concurrent access by multiple users: needed for performance, uncontrolled concurrent
accesses can lead to inconsistencies
• Security problems: hard to provide user access to some, but not all, data

A database (DB) is a collection of data with the same structure, including correlations and
relationships, that is defined for a particular use and used by several users
A database management system (DBMS) is a collection of programs over DB that specify the
data types, structure, constraints, and store this on a disk, retrieve, update, and mange access
rights
• A black box interacting between users/applications and the database
• The ultimate goal is to separate data from application
- Provide an interface that the application programmer must follow
- Allow system administrator to make modifications without having an impact on the user
- Users can change their view of the data without having to worry about how it is stored

There are different layers within the DBMS:
• External layer: communication with users
- Analysis of user requests (queries)
- Access control
- Answer presentation
• Logical layer:
- Optimization of queries
- Resolving conflicting accesses, i.e., multiple users
- Guarantees constant availability even in case of failures
• Internal layer:
- Storing the data
- Software for structuring the data
- Efficient access methods, i.e., keys, indices, etc.

System development life cycle
1. Planning: develop a preliminary understanding of the business situation and how information
systems might help solve the problem
• Enterprise modeling: analyze current data processing, the general business functions and
their database needs, and justify need for new data and databases in support of business
• Conceptual data modeling: identify scope of database requirements for proposed
information system and analyze overall data requirements for business function(s) supported
by database
• Understand current data processing
• Understand general business functions and needs

, 2. Analysis: analyze the business situation thoroughly to determine requirements and to
structure those requirements
• Conceptual data modeling:
- Develop preliminary conceptual data model, including entities and relationships
- Compare to enterprise data model
- Develop detailed conceptual data model, including entities, relationships, attributes, and
business rules
- Make conceptual data model consistent with other models of information systems
- Populate repository with all conceptual database specifications
• Output: conceptual schema
• Corresponds to a detailed technology independent specification of the overall organizational
data structure

3. Design
• Logical: representation of the DB; transform the conceptual schema into a logic schema,
which describes the data in terms of the data management technology that will be used to
implement the database
• Physical: the set of specifications that describe how data from a logical schema are stored in
a computer’s secondary memory by a specific database management system

4. Implementation: a designer writes, tests, and installs the program/scripts that access, create,
or modify the database
• Finalize all database documentation, train users, and put procedures into place for the
ongoing support of the information system users
• Populate with data
• Install application(s) and test
• Complete documentation and training materials

5. Maintenance: add, delete, or change characteristics of the structure of the database in order
to meet changing business conditions, to correct errors in database design, or to improve the
processing speed of database applications
• Monitor the operation and usefulness of the system
• Repair by fixing errors in database and applications
• Enhance by analyzing the database and applications to ensure that evolving information
requirements are met

Different types of DBMS:
• Traditional DBMS: text and numerical data
• Multimedia DBMS: multimedia data
• Spatial DBMS: geographic and geometric data
• Data warehouses
Relational data model
A relational model is an approach to managing data by representing it grouped into relations

A relational DBMS (RDBMS) is a database management system that manages data as a collection
of tables in which all relationships are represented by common values in related tables

Escuela, estudio y materia

Institución
Estudio
Grado

Información del documento

Subido en
8 de junio de 2023
Número de páginas
23
Escrito en
2022/2023
Tipo
RESUMEN

Temas

$8.85
Accede al documento completo:

¿Documento equivocado? Cámbialo gratis Dentro de los 14 días posteriores a la compra y antes de descargarlo, puedes elegir otro documento. Puedes gastar el importe de nuevo.
Escrito por estudiantes que aprobaron
Inmediatamente disponible después del pago
Leer en línea o como PDF


Documento también disponible en un lote

Conoce al vendedor

Seller avatar
Los indicadores de reputación están sujetos a la cantidad de artículos vendidos por una tarifa y las reseñas que ha recibido por esos documentos. Hay tres niveles: Bronce, Plata y Oro. Cuanto mayor reputación, más podrás confiar en la calidad del trabajo del vendedor.
anneTBKIM Tilburg University
Seguir Necesitas iniciar sesión para seguir a otros usuarios o asignaturas
Vendido
52
Miembro desde
6 año
Número de seguidores
39
Documentos
8
Última venta
11 meses hace

3.0

1 reseñas

5
0
4
0
3
1
2
0
1
0

Documentos populares

Recientemente visto por ti

Por qué los estudiantes eligen Stuvia

Creado por compañeros estudiantes, verificado por reseñas

Calidad en la que puedes confiar: escrito por estudiantes que aprobaron y evaluado por otros que han usado estos resúmenes.

¿No estás satisfecho? Elige otro documento

¡No te preocupes! Puedes elegir directamente otro documento que se ajuste mejor a lo que buscas.

Paga como quieras, empieza a estudiar al instante

Sin suscripción, sin compromisos. Paga como estés acostumbrado con tarjeta de crédito y descarga tu documento PDF inmediatamente.

Student with book image

“Comprado, descargado y aprobado. Así de fácil puede ser.”

Alisha Student

Preguntas frecuentes