2025 – 2026
Contents
I Physical File Organization and Indexing 4
1 Storage Hardware and Physical DB Design 4
1.1 The Storage Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Internals of Hard Drives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 From Logical Concepts to Physical Constructs (3-layer architecture) . . . . . . . . . . . . 6
2 Record Organization 7
3 File Organization 8
3.1 Heap File Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Sequential File Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Random File Organization (Hashing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Indexed Sequential File Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.5 List Data Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.6 Secondary Indexes and Inverted Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.7 B-trees and B+ -trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
II Physical DB Organization 21
4 Physical DB Organization and DB Access Methods 21
4.1 From DB to Tablespace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Index Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3 DB Access Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3.1 Functioning of Query Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3.2 Index Search (with Atomic Search Key) . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3.3 Multiple Index and Multicolumn Index Search . . . . . . . . . . . . . . . . . . . . 25
4.3.4 Index Only Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.5 Full Table Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 Join Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4.1 Nested-Loop join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4.2 Sort-Merge join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4.3 Hash join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5 Enterprise Storage Subsystems and Business Continuity 29
5.1 Disk Arrays and RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.1.1 Techniques Applied in RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2 Enterprise Storage Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.2.1 DAS Directly Attached Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.2.2 SAN Storage Area Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.3 NAS Network Attached Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.2.4 NAS Gateway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.5 iSCSI / storage over IP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.3 Business Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
,III Extended Relational Databases 37
6 Active RDBMS Extensions 38
6.1 Triggers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.2 Stored Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7 Object-Relational RDBMS Extensions 39
8 Recursive SQL queries 43
IV Object Oriented Databases and Object Persistence 43
9 Advanced Concepts of OO 44
9.1 Method Overloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9.2 Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9.3 Method Overriding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
9.4 Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
9.5 Dynamic Binding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
10 Basic Principles of Object Persistence 45
11 OODBMS 47
11.1 Components of ODMG Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
11.1.1 Object Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
11.1.2 Object Definition Language ODL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
11.1.3 Object Query Language OQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
11.1.4 Language Bindings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
11.2 Evaluation of OODBMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
V XML Databases 50
12 Extensible Markup Language XML 50
12.1 Document Type Definitions DTD and XML Schema . . . . . . . . . . . . . . . . . . . . . 51
12.2 Extensible Stylesheet Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
12.3 Namespaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
12.4 XPath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
13 Processing XML Documents 53
13.1 DOM API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
13.2 SAX API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14 Storage of XML Documents 54
15 Differences between XML and Relational Data 55
16 Mappings Between XML Documents and (Object-) Relational Data 55
16.1 Table-Based Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
16.2 Schema-Oblivious Mapping (Schredding) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
16.3 Schema-Aware Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
16.4 SQL/XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
17 Searching XML Data 58
17.1 Full-Text Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
17.2 Keyword-Based Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
17.3 Structured Search with Xquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Advanced Database Management and Content Management
2/73
,18 XML for Information Exchange 59
18.1 Message Oriented Middleware (MOM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
18.2 SOAP-Based Web Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
18.3 REST-Based Web Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
18.4 Web Services and Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
19 Other Data Representation Formats 60
19.1 JSON (JavaScript Object Notation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
19.2 YAML (YAML Ain’t a Markup Language) . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
VI NoSQL Databases 61
20 The NoSQL Movement 61
21 Key-Value Stores 62
22 Tuple and Document Stores 65
22.1 Items with Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
23 Column-Oriented DBs 65
24 Graph Based DBs 66
25 Other NoSQL Categories 67
VII Data Distribution and Distributed Transaction Management 67
26 Distributed Systems and Distributed DBs 67
27 Architectural Implications of Distributed DBs 67
28 Fragmentation, Allocation and Replication 68
29 Transparancy 69
VIII Data Intergration, Data Quality and Data Governance 69
30 Data and Process Integration 69
30.1 Convergence of Analytical and Operational Data Needs . . . . . . . . . . . . . . . . . . . 69
30.2 Data Integration and Data Integration Patterns . . . . . . . . . . . . . . . . . . . . . . . . 70
Advanced Database Management and Content Management
3/73
, Quizzes and questions to prepare for the exam → https://www.pdbmbook.com
Legend Code Highlightings
SQL keywords are highlighted in blue
Java keywords are highlighted in red
ODL keywords are highlighted in purple
Part I
Physical File Organization and Indexing
1 Storage Hardware and Physical DB Design
1.1 The Storage Hierarchy
Figure 1: Computer memory hierarchy
→ High speed memory, expensive and limited in cap. at the top
→ Slower memory, relatively cheap and larger in size at the bottom
1. Primary Storage (volatile memory)
∗ CPU → executes mathematical and logical processor operations
∗ Cache memory opeates at nearly the same speed as CPU
∗ Central storage (internal-/ main memory) → consists of memory chips == RAM, performance
expressed in nanoseconds
∗ Contains DB buffer and runtime code of the applications and DBMS
2. Secondary Storage
∗ Persistent storage media
∗ HDD Hard disk drive and SSD solid state drives based on flash memory
∗ Contains physical DB files
Exchange of data between secondary and primary storage is called I/O (Input/Output) and is
supervised by the operating system
Advanced Database Management and Content Management
4/73
,1.2 Internals of Hard Drives
HDD stores data on circular platters, covered with magnetic particles. Platters are secured on a
spindle, which rotates at a constant speed. Magnetic particles on platters are organized in concentric
circular tracks, each track consisting of sectors1 , which are the smalles addressable unit on HDD. A set
of tracks with the same diameter are called cylinders.
∗ HDD also contains a hard disk controller
∗ HDDs are directly accessible storage devices (DASDs)
∗ Disk blocks (clusters, pages, allocation units) consist of 2 or more physically adjacent sectors
Figure 2: Internals of HDD
Operations
BS: block size, ROT: Rotation time, TR: Transfer rate
→ Reading from a block, or writing to a block implies
∗ Positioning the actuator (seek time)
∗ Wait until the desired sector has rotated under the read/write head (rotational delay/ latency)
Transfer time depends on block size, density of magnetic particles and rotation speed of disks
Response time
service time + queueing time
Service time
seek time + rotational delay + transfer time
−→ Minimize expected seek time and rotational delay
Trba = Seek + ROT/2 + BS/TR
→ Trba expected time to retrieve/write disk block that’s situated randomly on the platter, independently
of previous read/write (rba = random block access)
Tsba = ROT/2 + BS/TR
→ Tsba expected time to sequentially retrieve disk block with R/W head already in correct position, thus
when the R/W can just go on sequentially, not having to search for the block’s position (sba = sequential
block access)
1 Traditionally 512 bytes, more recently 4096 bytes
Advanced Database Management and Content Management
5/73
,1.3 From Logical Concepts to Physical Constructs (3-layer architecture)
Figure 3: 3-Layer-Architecture
The Layers can also be defined as followed:
1. Presentation Layer (external) e.g. tables etc.
Views are the visual components responsible for displaying information to a user
2. Application Layer (conceptual/logical)
3. Data Layer (internal) stores and manages all the data
Catalog is the heart of the DBMS and contains the data definition or metadata of de DB application.
– Stores the definitions of the views, logical and internal DM and synchronizes these 3 DM to guarantee
their consistency.
Advanced Database Management and Content Management
6/73
,2 Record Organization
→ Record organization refers to organization of data items into stored records. Physical
implementation of a data item is a series of bits. Common techniques:
Relative Location simplest and most widespread
∗ Data items that represent attributes of same entity are stored on physically adjacent addresses
∗ Attribute types determined by relative ordering
→ Here is a record a horizontal row in the table that represents a single, complete entry of related data.
Embedded Identification data items representing attributes always preceded by attribute type
∗ Only non-empty attributes of record included
∗ Missing attributes no problem and no need to store attributes in fixed order to identify
∗ Similar to XML and JSON
Advanced Database Management and Content Management
7/73
,Pointers and Lists ideal for dealing with variable length records (due to e.g. variable length data type
multivalued attribute type, optional attribute type. . . )
BS: Block size, ROT: Rotation time, TR: Transfer rate, BF: Blocking factor = ⌊BS/RS⌋
Blocking Factor indicates how many records are stored in a single disk block. Determines how many
records are retrieved with a single read operation
3 File Organization
Introductory Concepts
Search Key single attribute type, or set of attribute types, whose values determine criteria according
to which records are retrieved
∗ Can be PK, Alternative key, or one or more non-key attribute types
∗ Can be composite (e.g. country, gender)
∗ can also be used to specify range queries, e.g. YearOfBearth btw 1980 and 1990
Primary file organization methods determine physical positioning of stored records on storage medium
→ e.g. heap files, random file organization, indexed seq. file organiz.
∗ Can only be applied once
Secondary file organization methods provide constructs to efficiently retrieve records according to
search key that was not used for primary file organiz. Based on secondary index
Linear search each record in file is retrieved and assessed against search key
Hashing and indexing primary techniques that specify relationship btw record’s search key and physical
location
3.1 Heap File Organization
Basic (and simplest) primary file organization method
∗ New records inserted at end of file
∗ No relationship between record’s attributes and physical location
∗ Only option for record retrieval is linear search
∗ For a file with NBLK blocks, it takes on average NBLK/2 sba to find record according to unique
search key
∗ Searching records according to non-unique search key requires scanning entire file
→ Always adding records at the end results in an unstructured sequence. When retrieving or updating
a record, searching the record means checking every other record as well. Efficient for inserting,
inefficient for retrieval.
Ppt does not say anything about the specifics and details of heap file organization.
Advanced Database Management and Content Management
8/73
,3.2 Sequential File Organization
∗ Records stored in ascending/descending order of search key
∗ Efficient to retrieve records in order determined by search key
∗ Records can still be retrieved by means of linear search, but now a more effective stopping criterion
can be used, i.e. once first higher/lower key value than required one is found
Binary search technique can be used, for unique search key K, with values Ki : algorithm to retrieve
record with key value Kµ :
Expected number of block accesses to retrieve record according to PK by means of:
∗ Linear search: NBLK/2 sba
∗ Binary search: log2 (NBLK) rba
Figure 4: Example: Sequential File Organization
Updating sequential file is more cumbersome than updating heap file → often done in batch.
Sequential files often combined with one or more indexes, see chapter 3.4
3.3 Random File Organization (Hashing)
Random file organization / direct file organization / hash file organization
assumes direct relationship between value of search key and physical location. Hashing algorithm defines
key-to-address transformation → generated addressess pertain to a bucket (bucket is simply a storage
area that can hold one or more records)
Advanced Database Management and Content Management
9/73
, Figure 5: Hashing
Logically, most effective when using PK or other candidate key as search key.
Hashing cannot guarantee that all keys are mapped to different hash values, hence bucket addresses →
collision occurs when several records are assigned to same bucket, also called synonyms.
∗ If more synonyms than slots for a bucket, bucket is in overflow → additional block accesses needed
to retrieve overflow records
∗ Hashing algorithm should distribute keys as evenly as possible over the respective bucket addresses
→ Popular hashing technique is division
address(keyi ) = keyi mod M
M is often a prime number close to, but a bit larger than, the number of available addresses
Why M is typically chosen a prime number:
it minimizes repetitive mappings and ensures
a more uniform distribution of records among
Efficiency
buckets. A non-prime divisor tends to cause
clustering, increasing the risk of collisions and
overflows.
of hashing algorithm measured by expected number of rba and sba → After using the algorithm and
retrieving the bucket, there is 1 rba to the first block of the bucket. Possibly followed by sba’s if the
bucket has overflow records.
Percentage of overflow records depend on hashing algorithm and key set → aim to achieve uniform
distribution, spreading the set of records evenly over the set of available buckets. Required number of
buckets NB:
NR
NB = ⌈ ⌉
BS × LF
Advanced Database Management and Content Management
10/73