100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.2 TrustPilot
logo-home
Summary

Summary Data management for data analytics (JBI050)

Rating
-
Sold
3
Pages
10
Uploaded on
24-03-2023
Written in
2022/2023

This is a summary for the course Data Management for Data Analytics provided by the TU/e. The summary is based on the lectures.

Institution
Module









Whoops! We can’t load your doc right now. Try again or contact support.

Written for

Institution
Study
Module

Document information

Uploaded on
March 24, 2023
Number of pages
10
Written in
2022/2023
Type
Summary

Subjects

Content preview

DATA MANAGEMENT FOR DATA ANALYTICS
Book Database system concepts, 6th edition – INTRODUCTION
Silberschatz, Korth and Sudarshan 2011
Lectures Pre-recorded Database systems are used to replace database applications
Instructions AUD6 during the lecture timeslots that were built directly on top of file systems, as this had a
The tutorial focusses on the previous subject number of issues.
 Data redundancy and inconsistency
40% Homework assignments weeks 1-6 Usually you would work with multiple sources and file
formats, and there can be duplication of information
60% On campus exam
across files
 Difficulty in accessing data
We are allowed a 2-sided A4 cheat sheet during the exam. You need a new program to carry out each task
 Lack of data isolation
e.g. if you have an issue with your parsers, you can no
SUBJECTS + READINGS + ASSIGNMENTS longer access your data
 Integrity problems
Integrity constraints become buried in code as
opposed to being stated and managed explicitly. It is
Basic E-R model CH1 + CH7 difficult to add new constraints or change existing
Basic RA CH2 + CH6.1 ones
Week 1  Atomicity of updates
Failures may leave databases partially updated
HW1 Sunday night
 Concurrent access by multiple users
Uncontrolled concurrency can lead to inconsistencies
Advanced E-R model CH7  Security problems
Advanced RA CH6.1 It’s difficult to provide access to just a part of the data
Week 2 Database managements provides a solution for these issues
HW2 Sunday night
Two classes of languages:
Advanced E-R model CH7 Procedural User specifies what data is required and
how to get it. This is algebraic
Basic SQL CH3
Week 3 Declarative User specifies what data is required, not
how to get it. Based on calculus and logic
HW3 Sunday night
Data manipulation Entering, updating, deleting and
Constraints: FDs CH8 + additional material retrieving information from a database.
Advanced SQL CH3 + parts of CH4 & 5 Querying logical schemas
Week 4 Database design Given requirements for an information
HW4 Sunday night system, how to design the database?
How do we transform the design into an
actual implementation?
Constraints: FDs CH8
Advanced SQL CH3 + parts of CH4 & 5
Object system The “real world” that you try to model
Week 5
Information system The representation of the real world.
HW5 Sunday night This is an approximation.

Decomposition CH8 + appendix B Data modelling Which information? The structure,
based on FDs relationships, constraints, semantics
Week 6 Datalog Parts of appendix C Process modelling How is the information used/
manipulated/shared/created?
HW6 Sunday night
Data independence Insulation from changes in the way
the data is structured and stored
The questions in the homework assignments are
representative of possible exam questions. Schema The logical structure of the database
Physical data independence is the ability to
modify the physical schema without changing
the logical schema.
Instance The actual content of the database at a particular
point in time

, ENTITY-RELATIONSHIP MODELS ADVANCED E-R MODELS

BASIC E-R MODELS CARDINALITY CONSTRAINTS

E-R models They model relational databases and represent Cardinality constraints express the number of entities to
entities and their relationships. which another entity can be associated via a relationship set:
This is a conceptual schema.  One-to-one: e.g. each person only has one birth
Sets Collections of data that do not have duplicates certificate, and each birth certificate can only be
associated with one person. Each entity is mapped to
E-R models connect entities and relationships with lines. It is at most one entity in the other entity set.
not possible to directly connect two entities with a line, they
can only be connected through relationships.  One-to-many: e.g. you can have many passports, but
each passport only relates to one person
ENTITIES  Many-to-one: e.g. passports and people switch sides
 Many-to-many: e.g. friends to friends relationships
Entity A “thing” or “object” in the enterprise that is
distinguishable from other objects. → signifies “one”, − signifies “many”
Entity set A set of entities of the same type that share the
same properties (e.g. a set of persons)

Entities have attributes: the data that you want to store.
These are descriptive properties that all members of an Each passport is associated with at most one person.
entity set have. Each attribute has a domain, which is the set Each person can be associated with multiple passports.
of permitted values per attribute.

An example of an entity in a E-R diagram, with different
types of attributes: PARTICIPATION CONSTRAINTS

Entity set name Participation constraints provide information about whether
all entities in the set participate in at least one relationship.
Primary key → Primary key
 Total participation: Every entity in the set participates
Address → Composite attribute (vs. simple)
in at least one relationship in the relationship set.
Street → Component attribute
This is indicated by a double line (=).
Street name → Component attribute
e.g. each student is related to at least one instructor
Home number → Component attribute through the advisor relationship
City → Component attribute
 Partial participation: Some entities may not participate
{ phone_number } → Multivalued attribute (a set of values)
in any relationship in the relationship set (default).
date_of_birth → Simple attribute
e.g. instructors don’t need to advise any students,
Age () → Derived attribute (from date_of_birth) there are instructors that don’t advise any students.

In ternary relationships (between 3 entity sets), at most one
RELATIONSHIPS arrow may be used to indicate a cardinality constraint.
Any non-binary relationship can be represented using
Relationship An association between several entities binary relationships, by replacing the relationship set with a
Relationship set A relation among 𝑛 ≥ 2 entity sets. new entity set.
Each set must have a unique name.

{(𝑒1 , … , 𝑒𝑛 ) ∣ 𝑒1 ∈ 𝐸1 , … , 𝑒𝑛 ∈ 𝐸𝑛 } DESIGN CHOICES
Where (𝑒1 , … , 𝑒𝑛 ) is a relationship
𝐸𝑖 is an entity set and 𝑒𝑖 is an entity Avoid nulls and redundancies.
The role an entity has in the relationship can be indicated by
The degree of a relationship set is the number of entity sets labelling the lines/arrows that connect diamonds and
that participate in a relationship set. Relationship sets that rectangles. This is optional and explains how
involve two entity sets are binary. the entities interact via the relationship set.

Relationships can also have attributes. These are then A relationship attribute can also be added as an entity set
connected to the relationship by a dashed line. attribute if the relationship is many-to-one and
the entity set in question has total participation.
KEYS Attribute (A) as entity set attribute or as entity
set: It can be added as entity set attribute if the
Super key A subset of 1+ attributes whose values relationship would otherwise be one-to-one
uniquely identify each entity and the other entity set has total participation.
Candidate key A minimal super key.
No subset of this key is a super key. Use of a relationship set or entity set: If a relationship
There can be multiple candidate keys between two entities can occur more than once, a new
Primary key The chosen identifier for an entity set. entity set should be added in between. Otherwise a
These are underlined in E-R diagrams. relationship set suffices.

The combination of primary keys of the participating entity
sets forms the super key of the relationship set.

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
NienkeUr Technische Universiteit Eindhoven
Follow You need to be logged in order to follow users or courses
Sold
46
Member since
3 year
Number of followers
18
Documents
11
Last sold
1 month ago

4.7

3 reviews

5
2
4
1
3
0
2
0
1
0

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their exams and reviewed by others who've used these revision notes.

Didn't get what you expected? Choose another document

No problem! You can straightaway pick a different document that better suits what you're after.

Pay as you like, start learning straight away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

Student with book image

“Bought, downloaded, and smashed it. It really can be that simple.”

Alisha Student

Frequently asked questions