100% de satisfacción garantizada Inmediatamente disponible después del pago Tanto en línea como en PDF No estas atado a nada 4.2 TrustPilot
logo-home
Resumen

Samenvatting Advanced Database Management and Content Management

Puntuación
-
Vendido
-
Páginas
31
Subido en
18-12-2025
Escrito en
2025/2026

gegeven door prof. Lemahieu in de 1ste Master HIRB. Volledige samenvatting: bevat alle info uit de slides + uit de lessen en het boek

Institución
Grado











Ups! No podemos cargar tu documento ahora. Inténtalo de nuevo o contacta con soporte.

Libro relacionado

Escuela, estudio y materia

Institución
Estudio
Grado

Información del documento

¿Un libro?
No
¿Qué capítulos están resumidos?
Hoofdstuk 8-13, 16, 18
Subido en
18 de diciembre de 2025
Número de páginas
31
Escrito en
2025/2026
Tipo
Resumen

Temas

Vista previa del contenido

Chapter 12 — Physical Storage, File
Organization & Indexing
Expanded study summary (based on the Chapter 12 handbook + course slides)

Last updated: 2025-12-12




1. Storage hierarchy and why I/O dominates
A database system runs on top of a storage hierarchy. At the top are CPU registers and cache, then main
memory (RAM). These are extremely fast but volatile and limited. Below the I/O boundary lies
secondary storage (HDD/SSD), which is persistent but much slower. Because database files live on
secondary storage, physical database design is mostly about reducing disk I/O: one bad access pattern
can dominate total query time.

The key intuition is that reading from RAM is measured in nanoseconds, while retrieving data from disk
requires milliseconds. Therefore, the DBMS tries to (1) keep frequently used data in memory buffers,
and (2) organize files and indexes so that disk access is as sequential and as predictable as possible.


2. HDD internals and disk-access time components
A traditional hard disk stores bits on rotating platters. Data on a platter surface is organized into tracks,
which are divided into sectors. A set of tracks with the same radius across all surfaces is called a cylinder.
The read/write head moves on an actuator arm to the correct track (seek), while rotation brings the
sector under the head.

2.1 Addressing units: sector, block, and blocking
A sector is the smallest addressable unit on disk (historically 512 bytes, commonly 4096 bytes today).
DBMSs typically read and write disk blocks (also called pages or allocation units), each consisting of one
or more adjacent sectors. Because I/O happens per block, the DBMS benefits when logically related
records are stored within the same or nearby blocks.

2.2 Service time vs response time
Accessing a block requires: (1) seek time (position actuator), (2) rotational delay (wait for sector), and (3)
transfer time (read/write bytes). The response time additionally includes queuing time under load.

• Service time = seek time + rotational delay + transfer time
• Response time = service time + queuing time

,2.3 Random vs sequential block access
When the head is already near the correct position, sequential access avoids most seek time and often
reduces rotational delay. This is why database engines prefer sequential scans and range scans when
possible.

Typical formulas used in the chapter (notation: BS = block size, TR = transfer rate, ROT = rotation time):

• Random block access time (expected): T_rba = Seek + ROT/2 + BS/TR
• Sequential block access time (expected): T_sba = ROT/2 + BS/TR (head already positioned)


3. From logical concepts to physical constructs
Physical database design translates the logical data model (tables, attributes, and relationships) into an
internal model that the DBMS stores as files, blocks, records, and access paths. A central trade-off is
performance vs storage efficiency: structures that accelerate retrieval (indexes, extra pointers) consume
space and must be maintained on updates.

3.1 Data independence view
The chapter connects the conceptual/logical model (what the data means) to the internal model (how
data is physically stored). Physical data independence means the physical layout can change without
forcing changes to user views or application logic.


4. Record organization (how attributes become stored records)
Record organization describes how data items (attribute values) are stored together in a record on disk.
Physically, every value is a sequence of bits, but the DBMS must choose how to pack values so that
retrieval and updates are efficient.

4.1 Relative location (fixed layout)
Relative location is the most common approach: attributes of the same entity are stored in adjacent
positions in a fixed order. The schema tells the DBMS which bytes correspond to which attribute. This is
space efficient and fast, but inflexible when many attributes are optional or variable-length.

4.2 Embedded identification (self-describing records)
With embedded identification, each stored value is preceded by an attribute identifier (type/name),
similar to JSON or XML. This supports missing attributes and variable sets of fields, but adds overhead
because the record must store identifiers.

4.3 Pointers and lists (variable-length and multivalued attributes)
Pointers and lists handle attributes such as multiple addresses or optional repeated fields. The record
stores one or more pointers to overflow areas or chained blocks containing the extra values. This avoids
wasting space inside the main record but adds indirection and additional I/O.

,4.4 Blocking factor (BF)
The blocking factor BF indicates how many records fit in one disk block. For fixed-length records: BF =
floor(BS / RS), where RS is record size. For variable-length records, BF denotes the average number of
records per block. BF matters because a single I/O reads a full block: higher BF means more records
retrieved per disk access.


5. File organization (where records are placed on disk)
Primary file organization determines the physical placement of records on disk and can only be applied
once per file. Secondary file organization provides additional access paths without changing the physical
ordering. Key concepts used throughout are the search key (attribute(s) used for retrieval) and the
number of block accesses required.

5.1 Search keys
A search key is an attribute type or a set of attribute types used to retrieve records. It can be a primary
key, an alternative key, or a non-key attribute. Search keys can also be composite (e.g., (Country,
Gender)) and can support range queries (e.g., YearOfBirth between 1980 and 1990).

5.2 Heap file organization
In a heap file, new records are inserted at the end of the file, and there is no relationship between
attribute values and physical location. The only way to retrieve records is linear search (full scan). This
makes heap files excellent for inserts but inefficient for selective retrieval.

• Expected block accesses to find a record using a unique key: NBLK/2 sequential block accesses (sba)
on average
• For non-unique keys: must scan the entire file

5.3 Sequential file organization
In a sequential file, records are stored in ascending or descending order of a search key. This is very
efficient for ordered retrieval and range queries. Linear search can stop early when keys exceed the
desired value, and binary search becomes possible because the file is sorted.

Search cost comparisons (NBLK = number of blocks in the data file):

• Linear search: NBLK/2 sba (expected)
• Binary search on data file: log2(NBLK) random block accesses (rba)

A key drawback is maintenance: inserting/deleting records while keeping sorted order is cumbersome,
often handled via batch updates or overflow areas.

5.4 Random file organization (hashing)
Hashing (random file organization) assumes a direct relationship between a search key and a bucket
address. A hash function performs a key-to-address transformation and maps each record to a bucket (a
contiguous area of record slots). Hashing is most effective for equality search on unique keys.

, Collisions and overflow
Different keys can map to the same bucket (collision/synonyms). If more records map to a bucket than it
can hold, the bucket overflows. A good hash function distributes keys evenly to minimize overflow.

Division hashing
A common hashing technique is division: address(key) = key mod M, where M is often chosen as a prime
number close to the number of buckets to reduce clustering.

Efficiency and loading factor
Efficiency is measured by expected block accesses to retrieve a record. A non-overflow record typically
costs 1 rba to the bucket (plus possible sba inside the bucket). Overflow records require extra accesses
depending on overflow handling.

• Loading factor (LF): average fill degree of buckets (trade-off between space efficiency and overflow
probability)
• Overflow handling: open addressing (store overflow in primary area) or chaining (store overflow in
separate area linked by pointers)
• Dynamic hashing methods allow the file to grow/shrink without full reorganization

5.5 Indexed sequential file organization (ISFO)
Indexed sequential file organization reconciles two concerns: fast retrieval of individual records (like
hashing) and efficient retrieval of many records in key order (like sequential files). It combines a
sequentially ordered data file with one or more indexes.

Intervals and index entries
The file is divided into intervals (partitions). Each interval is represented by an index entry that stores
the first key in that interval and a pointer to the physical position of the first record (a block pointer or a
record pointer). The index itself is ordered by search key and is searched via binary search.

Dense vs sparse index (revisited)
A dense index has an entry for every search key value (often one per record for unique keys). A sparse
index stores entries for only some key values (typically one per block or per interval). Dense indexes are
faster but larger and more expensive to maintain; sparse indexes are smaller but require scanning inside
the pointed block.

Primary vs clustered index
A primary index is defined over a unique ordering key of the data file. A clustered index is similar but
uses a non-unique attribute (or set of attributes) as the ordering criterion. For clustered indexes,
retrieving all records with the same key value may require additional sequential reads after the first
match.

Because a file can only be physically ordered in one way, a primary index and a clustered index cannot
both exist on the same file. Secondary indexes, however, can be created on other attributes without
changing physical ordering.
$6.11
Accede al documento completo:

100% de satisfacción garantizada
Inmediatamente disponible después del pago
Tanto en línea como en PDF
No estas atado a nada

Conoce al vendedor
Seller avatar
zenocominotto
5.0
(1)

Conoce al vendedor

Seller avatar
zenocominotto Katholieke Hogeschool Leuven
Seguir Necesitas iniciar sesión para seguir a otros usuarios o asignaturas
Vendido
9
Miembro desde
1 año
Número de seguidores
1
Documentos
5
Última venta
1 semana hace

5.0

1 reseñas

5
1
4
0
3
0
2
0
1
0

Recientemente visto por ti

Por qué los estudiantes eligen Stuvia

Creado por compañeros estudiantes, verificado por reseñas

Calidad en la que puedes confiar: escrito por estudiantes que aprobaron y evaluado por otros que han usado estos resúmenes.

¿No estás satisfecho? Elige otro documento

¡No te preocupes! Puedes elegir directamente otro documento que se ajuste mejor a lo que buscas.

Paga como quieras, empieza a estudiar al instante

Sin suscripción, sin compromisos. Paga como estés acostumbrado con tarjeta de crédito y descarga tu documento PDF inmediatamente.

Student with book image

“Comprado, descargado y aprobado. Así de fácil puede ser.”

Alisha Student

Preguntas frecuentes