Physical Design¶
Learning Goals
- Understand how tables are stored in files
- Understand basic indexing techniques
- Use knowledge of file/index to tune basic SQL queries
File Organization¶
Storage Hierarchy¶
- Volatile storage (cache, main memory): loses content when power is switched of
- Non-volatile storage: content persists even when power is switched of
- The higher the level, the faster the access
Magnetic Hard Disk¶
For each disk access
- Each track is divided into sectors (smallest unit of data that can be read/written)
- A block (page) is a contiguous sequence of sectors from a single track
- Basic unit of data transfer between disk and memory
Optimization of disk block access
- Organize blocks in the way data is accessed
- Store related information close to each other
Conceptually, relational data is stored as sequences of bits on disk.
Functional requirements
- Processing records sequentially
- Efficient key-value search
- Insertion/deletion of records
Performance objectives
- Little wasted space
- Fast response time
- High number of transactions
Files¶
Storing databases on disks
- The database is stored as a collection of files
- Each file corresponds to a set of records
- A record contains a number of fields
Multiple records are “grouped” into blocks (pages)
Record size
- Fixed
- Variable
Files reside on mass storage (usually a disk)
- Fast random access
- Non-volatile storage
Example¶
Blocks in a file system are not necessarily contiguous
Speed
- Speed of reading a block (an I/O):
- \sim 10 msec to a non-contiguous block read
- \sim 1 msec for a contiguous block read
- DBMS/OS can reorganize blocks to make them contiguous
Fixed Size Records¶
All records use the same length if they occupy the whole space or not
Read record i
- Get record at location: \mathrm{recordSize \times i}
Delete record i
-
Move records i+1,\dots,n to location i,\dots,n-1 to close the gap
or
-
Move record n to i
or
-
Mark the gaps and fill them with future insertions
- Mark first gap in the file header
- Use the gaps to point to other gaps (free-list)
Variable Size Records¶
Records have different lengths and occupy different amounts of disk space
Use case: variable-lengths attributes (e.g., varchar)
Implementation alternatives
- If max. size is known:
- Map variable size records to fixed size records
- Slotted-page-structure
- Records stored contiguously
- Block header contains pointers to records
Organizing Records in a File¶
Determine the order of records within a file
Heap file organization
- A record can be placed anywhere in the file
Sequential file organization
- Records stored in sequential order by the value of the search key
Hash file organization
- Hash the record to a block based on a hash function and a hash key
Heap¶
- No apparent ordering on any of the columns in the table
- Search: linear scan always works
- Insert: find a free slot
Sequential¶
- Example table is stored in order of the ID column
- The ordering column does not need to be the primary key
- Search: binary search when search on the ID column
- Insert: reorganization of the file
Hashing¶
- Hash function used in this example is: mod(ID, 3)
- Search: use the hash function on the search key value (ID) to find the right block
- Insert: use the hash function on the search key value (ID) to find the right block and append
Indexes¶
Assumptions
- Many rows are stored in the database
- Many queries only access a small fraction of the rows
- Must be able to modify the rows, i.e., insert, update, and delete
- Cannot take the database offline to reorganize the files used in the database
Goal: access as little data as possible
Overview¶
Classification
Variations
- Single-level index
- Multi-level index
It is possible to have multiple indexes for the same relation.
Primary Sparse Index¶
- Defined on a file ordered on the search key
- One index entry for each block in the file
Secondary Dense Index¶
- Defined on a file not ordered on the search key
- One index entry for each record
Overview¶
Primary vs. secondary
- = clustering vs. non-clustering
- Is the file ordered on the search key?
Dense vs. sparse
- A separate index entry for each record (unique search key value)?
- Yes → dense
- No → sparse
Tradeoff
- Dense indexes: faster location of records
- Sparse indexes: smaller indexes
Multi-level Index¶
- Goal: the outer index (sparse) fits into main memory
- The number of levels can be greater than two
Limitations of index-sequential file organization
- Insertions and deletions
- Expensive reorganization of multiple levels of ordered files
B^+^-trees
- Balanced search trees Number of lookups/levels is the same for all entries
- Leaving some space in each disk block
Index Concepts Combined¶
A secondary sparse index does not make sense!
- Since records are not sorted on that key, we cannot predict the location of a record from the location of any other record
- Thus, secondary indexes are always dense.
Clustering Dense
- Index entry has a pointer to the first record
Non-clustering Dense
- Index entry has pointers to all the records
B+-Tree¶
- n = the fanout = the number of pointers = 3 in the example above
- \dashv = unused pointer
- nu = Unused search key value
- Leaf-level pointers: point to the data file
- Search: Both key lookups and range queries are well supported
Node Structure¶
Node structure used for root, internal, and leaf nodes
In this example, the fanout n is 10
- In each node: 10 (n) pointers (p_i)
- In each node: 9 (n-1) search key values (k_j)
The last pointer in leaf nodes is used to chain together the leaf nodes in search key order
Properties¶
Balanced
- All paths from the root node to leaf nodes have the same length
Bushy
- Each node has between \lceil {n \over 2} \rceil and n children
- Exception: the root node has between 2 and n children
- Leaf nodes have between $\lceil {n-1\over2} \rceil $ and n - 1 pointers to file records and 1 pointer to the next leaf node
Ordered
- The key values are sorted in each node, i.e, k_i < k_j, if i < j
- The subtrees are ordered
Minimal B+-Tree Example¶
- A minimally filled B+-tree for n=3
B+-Tree in Practice¶
- Each B+-tree node has the size of one I/O block of data.
- A B+-tree node is at least 50% filled (by design)
- The B+-tree contains a rather small number of levels, usually logarithmic in the size of the main file
- The first one or two levels of the tree are stored in main memory
- “Logically” close does not imply “physically close” Typically reading a node in a B+-tree requires one random I/O
- The non-leaf nodes are a hierarchy of sparse indexes
When a primary key constraint for a table is created in a DBMS, then the uniqueness is checked by using a B+-tree.
B+-Tree Updates¶
Insertions
- Overflow, split
- The tree height may be increased by 1
Deletions
- Underflow, borrow, coalesce
- The tree height may be decreased by 1
Multiple examples in DBS7 Slide 30- p. 54-
B+-Trees Online¶
Interactive apps available on the Web
https://goneill.co.nz/btree-demo.php
https://www.cs.usfca.edu/~galles/visualization/BPlusTree.html
third party material, not fully tested, might have a slightly different implementation (and errors)
Unordered Indexes (Hashing)¶
Static Hash Index¶
Build an index based on a hash function (instead of based on a search key order)
- Choose an appropriate hash function h
- Apply hash function h to search key value k to compute h(k)
- Bucket (disk block) for each value of h(k)
- Lookup
- One access to the bucket directory
- One access to the data file
- Good performance depends on a good hash function
- Bucket overflow
- Too many different key values hash to the same bucket
- Solution: overflow buckets and overflow chains
Open vs Closed Hashing¶
Open Hashing
- Overflow chains
Closed Hashing
- Fixed number of buckets
- Overflow: use one of the other buckets (linear probing, further hash functions, etc.)
Static Hashing¶
Problem for static hashing:
- Hash function and number of buckets determined during initialization
Databases grow or shrink over time!
-
Initial number of buckets too small
\to many overflows, performance degrades
-
Initial number of buckets too big
\to underflow, space is wasted
Solutions
- Periodic re-organization
- Dynamic hashing:
- allows modifications dynamically
Physical Design Tuning¶
Design decisions
- Ordered/clustering indexes vs. hashing
- Sparse vs. dense index
- Clustering vs. non-clustering indexes
- Joins and indexes
Tuning questions
- Are the costs of periodic reorganization acceptable?
- How frequent are insertions and deletions?
- Optimization goal: average access time or worst-case access time?
- What types of queries are expected?
Use of Indexes¶
Sometimes available indexes are not used, why?
- Catalog out of date optimizer may think table is small
- “Good” and “bad” query examples
SELECT * FROM EMP WHERE salary/12 > 4000
- Operations on conditions may prevent the optimizer from realizing the index could be useful
SELECT * FROM EMP WHERE salary > 48000
- Good
SELECT * FROM EMP WHERE SUBSTR(name, 1, 1) = ’G’
- Functions on conditions require function-based indexes
SELECT * FROM EMP WHERE name LIKE ’G%’
- Good
SELECT * FROM EMP WHERE name = ’Smith’
- Good
SELECT * FROM EMP WHERE salary IS NULL
- Requires an index on nullable values
- Nested sub-query
- Selection by negation
- Queries with
OR