Databases - summary
Lecture 1: Introduction and Overview
A database (DB) is a collection of data with
● A certain logical structure
● A specific semantics
● A specific group of users
A database management system (DBMS) allows to
● Create, modify and manipulate a database
● Query (retrieve) the data using a query language
● Support persistent storage of large amounts of data
● Enable durability and recovery from failure
● Control access to the data by many users in parallel
○ Without unexpected interactions among users (isolation)
○ Actions on the data should never be partial (atomicity)
Motivation for Database Management Systems
● why not just store data in files?
○ No query language
○ Weak logical structure (limited to directories)
○ No efficient access
■ Searching through a large file can take hours
○ No or limited protection from data loss
○ No access control for parallel manipulation of data
● So we need database management systems…
● Data independence
○ Logical view on the data independent of physical storage
○ User interacts with a simple view on the data
○ Behind the scenes (invisible for the user) are complex storage structures that
allow rapid access and manipulation
● Avoidance of duplication
○ Different views on the same database
■ For different users or different applications
■ Hiding parts of the data for privacy or security
● This is achieved by the ANSI SPARC Architecture…
ANSI SPARC Architecture: 3 levels
● View level
○ Application programs hide details of data types
○ Hide information (e.g., exam grade) for privacy
or security
● Logical level, also called conceptual schema
○ Describes data stored in the database, and
○ Relations among the data
1
,Databases - summary
● Physical level
○ How the data is stored
○ Disk pages, index structures, byte layout, record order
● This ensures logical and physical data independence…
● Different applications might use different views
● Data stored once at physical level good for consistency
● With these different views, we avoid that data has to be duplicated
Data independence
● Logical data independence
○ Logical data independence is the ability to modify the logical schema without
breaking existing applications
■ Applications access the views, not the logical database
● Physical data independence
○ Physical data independence is the ability to modify the physical schema without
changing the logical schema
■ E.g., a change in workload might cause the need for
● Different indexing structures
● Different database engine
● Distributing database engine
● Distributing the database on multiple machines
● Etc.
Relational model
● In this course, we work with relational databases
● View and logical level represent data as relations/tables
● In the pure relational model, a table is a set of tuples:
○ Has no duplicate tuples (rows)
○ No order on the tuples
Relational Model: Schema
● Database schema: structure of the database, that is, relations + constraints
● Example schema
○ Customers(id, name, street, city)
○ Accounts(depositor → Customers(id), accountnr)
● Database instance: actual content (‘state’) of the database at some moment
2
,Databases - summary
Motivation for database management systems
● High-level declarative query languages
○ Query tells you what you want, independent of storage structure
○ Efficient data access (automatic query optimization)
● Declarative query languages
○ Describe what information is sought
○ Not prescribe how to retrieve the desired information
Imperative vs. declarative languages
● Algorithm = logic + control (Kowalski)
● Imperative languages
○ Explicit control
○ Implicit logic
● Declarative languages
○ Implicit control
○ Explicit logic
SQL = Structured Query Language
● SQL is a declarative data manipulation language. The user describes conditions the
requested data is required to fulfil
● More concise than imperative languages
○ Less expensive program development
○ Easier maintenance
● Database system will optimise the query
○ Decides how to execute the query as fast as possible
○ Users (usually) do no need to think about efficiency
Motivation for Database Management Systems
● Well-defined data models & data integrity constraints
○ Relational model
○ Meta language for describing
■ Data
■ Data relations
■ Data constraints
● SQL can be used for table and constraint definitions…
3
, Databases - summary
Integrity constraints
● Various types of constraints
○ Data types, constrained data types
(domains)
■ E.g., numeric(2,0), varchar(40)
○ Columns constraints
■ E.g., unique, nullability, counter
○ Check constraints: logical expression
for domain integrity
■ E.g., age >= 18 and age <= 150
SQL DDL (Data Definition Language)
Concurrent Access & Transitions
● Motivation for database management systems
○ Multiple users, concurrent access
■ Transactions with ACID properties
● A transaction is a sequence of operations that performs a single logical function in a
database application
● Database management system ensures ACID properties
○ Atomicity: transaction executes fully (commit) or not at all (abort)
○ Consistency: database remains in a consistent state where all integrity
constraints hold
○ Isolation: multiple users can modify the database at the same time but will not
see each others partial actions
○ Durability: once a transaction is committed successfully, the modified data is
persistent, regardless of disk crashes
Designing database schemes: Entity Relationship (ER) model
● Entities = objects
○ E.g., customers, accounts, bank branches
4
Lecture 1: Introduction and Overview
A database (DB) is a collection of data with
● A certain logical structure
● A specific semantics
● A specific group of users
A database management system (DBMS) allows to
● Create, modify and manipulate a database
● Query (retrieve) the data using a query language
● Support persistent storage of large amounts of data
● Enable durability and recovery from failure
● Control access to the data by many users in parallel
○ Without unexpected interactions among users (isolation)
○ Actions on the data should never be partial (atomicity)
Motivation for Database Management Systems
● why not just store data in files?
○ No query language
○ Weak logical structure (limited to directories)
○ No efficient access
■ Searching through a large file can take hours
○ No or limited protection from data loss
○ No access control for parallel manipulation of data
● So we need database management systems…
● Data independence
○ Logical view on the data independent of physical storage
○ User interacts with a simple view on the data
○ Behind the scenes (invisible for the user) are complex storage structures that
allow rapid access and manipulation
● Avoidance of duplication
○ Different views on the same database
■ For different users or different applications
■ Hiding parts of the data for privacy or security
● This is achieved by the ANSI SPARC Architecture…
ANSI SPARC Architecture: 3 levels
● View level
○ Application programs hide details of data types
○ Hide information (e.g., exam grade) for privacy
or security
● Logical level, also called conceptual schema
○ Describes data stored in the database, and
○ Relations among the data
1
,Databases - summary
● Physical level
○ How the data is stored
○ Disk pages, index structures, byte layout, record order
● This ensures logical and physical data independence…
● Different applications might use different views
● Data stored once at physical level good for consistency
● With these different views, we avoid that data has to be duplicated
Data independence
● Logical data independence
○ Logical data independence is the ability to modify the logical schema without
breaking existing applications
■ Applications access the views, not the logical database
● Physical data independence
○ Physical data independence is the ability to modify the physical schema without
changing the logical schema
■ E.g., a change in workload might cause the need for
● Different indexing structures
● Different database engine
● Distributing database engine
● Distributing the database on multiple machines
● Etc.
Relational model
● In this course, we work with relational databases
● View and logical level represent data as relations/tables
● In the pure relational model, a table is a set of tuples:
○ Has no duplicate tuples (rows)
○ No order on the tuples
Relational Model: Schema
● Database schema: structure of the database, that is, relations + constraints
● Example schema
○ Customers(id, name, street, city)
○ Accounts(depositor → Customers(id), accountnr)
● Database instance: actual content (‘state’) of the database at some moment
2
,Databases - summary
Motivation for database management systems
● High-level declarative query languages
○ Query tells you what you want, independent of storage structure
○ Efficient data access (automatic query optimization)
● Declarative query languages
○ Describe what information is sought
○ Not prescribe how to retrieve the desired information
Imperative vs. declarative languages
● Algorithm = logic + control (Kowalski)
● Imperative languages
○ Explicit control
○ Implicit logic
● Declarative languages
○ Implicit control
○ Explicit logic
SQL = Structured Query Language
● SQL is a declarative data manipulation language. The user describes conditions the
requested data is required to fulfil
● More concise than imperative languages
○ Less expensive program development
○ Easier maintenance
● Database system will optimise the query
○ Decides how to execute the query as fast as possible
○ Users (usually) do no need to think about efficiency
Motivation for Database Management Systems
● Well-defined data models & data integrity constraints
○ Relational model
○ Meta language for describing
■ Data
■ Data relations
■ Data constraints
● SQL can be used for table and constraint definitions…
3
, Databases - summary
Integrity constraints
● Various types of constraints
○ Data types, constrained data types
(domains)
■ E.g., numeric(2,0), varchar(40)
○ Columns constraints
■ E.g., unique, nullability, counter
○ Check constraints: logical expression
for domain integrity
■ E.g., age >= 18 and age <= 150
SQL DDL (Data Definition Language)
Concurrent Access & Transitions
● Motivation for database management systems
○ Multiple users, concurrent access
■ Transactions with ACID properties
● A transaction is a sequence of operations that performs a single logical function in a
database application
● Database management system ensures ACID properties
○ Atomicity: transaction executes fully (commit) or not at all (abort)
○ Consistency: database remains in a consistent state where all integrity
constraints hold
○ Isolation: multiple users can modify the database at the same time but will not
see each others partial actions
○ Durability: once a transaction is committed successfully, the modified data is
persistent, regardless of disk crashes
Designing database schemes: Entity Relationship (ER) model
● Entities = objects
○ E.g., customers, accounts, bank branches
4