A description of the database file page format.
This section provides an overview of the page format used by
PostgreSQL tables and indexes.  (Index
access methods need not use this page format.  At present, all index
methods do use this basic format, but the data kept on index metapages
usually doesn't follow the item layout rules exactly.)  TOAST tables
and sequences are formatted just like a regular table.
In the following explanation, a byte is assumed to contain 8 bits.
In addition, the term item refers to an individual data value that is
stored on a page.  In a table, an item is a tuple (row); in an index, an
item is an index entry.
Table 7-1 shows the basic layout of a page.
There are five parts to each page.
Table 7-1. Sample Page Layout
| Item | Description |
|---|---|
| PageHeaderData | 20 bytes long. Contains general information about the page, including free space pointers. |
| ItemIdData | Array of (offset,length) pairs pointing to the actual items. |
| Free space | The unallocated space. All new tuples are allocated from here, generally from the end. |
| Items | The actual items themselves. |
| Special Space | Index access method specific data. Different methods store different data. Empty in ordinary tables. |
  The first 20 bytes of each page consist of a page header
  (PageHeaderData). Its format is detailed in Table 7-2. The first two fields
  hold bookkeeping information for the write-ahead log (WAL). These are
  followed by three 2-byte integer fields (pd_lower, pd_upper,
  and pd_special), which hold byte offsets to the start of unallocated
  space, to the end of unallocated space, and to the start of
  the special space, respectively.
  
 
Table 7-2. PageHeaderData Layout
| Field | Type | Length | Description | 
|---|---|---|---|
| pd_lsn | XLogRecPtr | 8 bytes | LSN: next byte after last byte of xlog record for the last change to this page | 
| pd_sui | StartUpID | 4 bytes | SUI of last changes (currently it's used by heap AM only) | 
| pd_lower | LocationIndex | 2 bytes | Offset to start of free space. | 
| pd_upper | LocationIndex | 2 bytes | Offset to end of free space. | 
| pd_special | LocationIndex | 2 bytes | Offset to start of special space. | 
| pd_pagesize_version | uint16 | 2 bytes | Page size and layout version number information. | 
  All the details may be found in src/include/storage/bufpage.h.
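
As an illustration of Table 7-2, the following minimal sketch decodes the 20-byte header from a raw page image. It is not taken from the PostgreSQL sources: the names `PAGE_HEADER`, `PageHeader`, `parse_page_header` and `free_space` are hypothetical, little-endian byte order with no padding is assumed, and pd_lsn is simply read as two 32-bit words.

```python
import struct
from collections import namedtuple

# Field widths follow Table 7-2. "<" (little-endian, no padding) is an
# assumption; a page written on a big-endian machine would need ">".
# pd_lsn (8 bytes) is read as two 32-bit words, which is also an assumption.
PAGE_HEADER = struct.Struct("<IIIHHHH")  # 4 + 4 + 4 + 2 + 2 + 2 + 2 = 20 bytes

PageHeader = namedtuple(
    "PageHeader",
    "lsn_word0 lsn_word1 pd_sui pd_lower pd_upper pd_special pd_pagesize_version",
)


def parse_page_header(page):
    """Decode the first 20 bytes of a page image into named fields."""
    return PageHeader(*PAGE_HEADER.unpack_from(page, 0))


def free_space(header):
    """Size in bytes of the unallocated region between pd_lower and pd_upper."""
    return header.pd_upper - header.pd_lower
```

Given the bytes of one page, `parse_page_header(page)` returns the seven values, and `free_space` reports how much room is left between the item identifiers and the items.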
 
  
  Special space is a region at the end of the page that is allocated at page
  initialization time and contains information specific to an access method.
  The last 2 bytes of the page header,
  pd_pagesize_version, store both the page size
  and a version indicator.  Beginning with
  PostgreSQL 7.3 the version number is 1; prior
  releases used version number 0.  (The basic page layout and header format
  have not changed, but the layout of heap tuple headers has.)  The page size
  is present only as a cross-check; there is no support for having
  more than one page size in an installation.
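
The text above does not say how the two values share the 16-bit field. The sketch below assumes that the page size occupies the high-order byte and the version number the low-order byte, which is possible because page sizes are multiples of 256; check this against bufpage.h before relying on it. The function name `split_pagesize_version` is hypothetical.

```python
def split_pagesize_version(value):
    """Split pd_pagesize_version into (page_size, layout_version).

    Assumption: the high-order byte carries the page size and the
    low-order byte carries the layout version number.
    """
    page_size = value & 0xFF00
    version = value & 0x00FF
    return page_size, version
```

Under this assumption, a field value of `8192 | 1` describes an 8192-byte page with layout version 1.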
 
  Following the page header are item identifiers
  (ItemIdData), each requiring four bytes.
  An item identifier contains a byte-offset to
  the start of an item, its length in bytes, and a set of attribute bits
  which affect its interpretation.
  New item identifiers are allocated
  as needed from the beginning of the unallocated space.
  The number of item identifiers present can be determined by looking at
  pd_lower, which is increased to allocate a new identifier.
  Because an item
  identifier is never moved until it is freed, its index may be used on a
  long-term basis to reference an item, even when the item itself is moved
  around on the page to compact free space.  In fact, every pointer to an
  item (ItemPointer, also known as
  CTID) created by
  PostgreSQL consists of a page number and the
  index of an item identifier.
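
The arithmetic described above can be sketched as follows. The count follows directly from pd_lower, the 20-byte header and the 4-byte identifier size. The bit layout used in `decode_item_id` (15 bits of offset, 2 bits of flags, 15 bits of length, packed from the low-order end of a little-endian 32-bit word) is an assumption, since the placement of C bit-fields depends on the compiler and platform. Both function names are hypothetical.

```python
import struct

HEADER_SIZE = 20   # size of PageHeaderData
ITEM_ID_SIZE = 4   # size of one ItemIdData


def item_id_count(pd_lower):
    """Number of item identifiers currently allocated on the page."""
    return (pd_lower - HEADER_SIZE) // ITEM_ID_SIZE


def decode_item_id(page, position):
    """Return (offset, flags, length) for the identifier at `position`.

    `position` is a zero-based position in the identifier array, used here
    for simplicity. The bit widths and their order are assumptions.
    """
    (word,) = struct.unpack_from("<I", page, HEADER_SIZE + position * ITEM_ID_SIZE)
    offset = word & 0x7FFF
    flags = (word >> 15) & 0x3
    length = (word >> 17) & 0x7FFF
    return offset, flags, length
```

Because the identifier stays in place while the item may move, a reader always goes through the identifier to find the item's current offset.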
 
 
  The items themselves are stored in space allocated backwards from the end
  of unallocated space.  The exact structure varies depending on what the
  table is to contain. Tables and sequences both use a structure named
  HeapTupleHeaderData, described below.
 
 
  The final section is the "special section", which may contain anything the
  access method wishes to store. Ordinary tables do not use this at all
  (indicated by setting pd_special to equal the page size).
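
Taken together, the three offsets in the header divide a page into its five parts. The following sketch, with the hypothetical name `page_regions`, derives the byte range of each part and rejects offsets that cannot describe a valid page. The ordering check is inferred from the layout described here rather than copied from PostgreSQL.

```python
HEADER_SIZE = 20  # size of PageHeaderData


def page_regions(pd_lower, pd_upper, pd_special, page_size):
    """Map each part of the page to a (start, end) byte range, end exclusive."""
    if not (HEADER_SIZE <= pd_lower <= pd_upper <= pd_special <= page_size):
        raise ValueError("inconsistent page header offsets")
    return {
        "header": (0, HEADER_SIZE),
        "item_identifiers": (HEADER_SIZE, pd_lower),
        "free_space": (pd_lower, pd_upper),
        "items": (pd_upper, pd_special),
        "special": (pd_special, page_size),
    }
```

For an ordinary table page the `special` range is empty, because pd_special equals the page size.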
  
 
  All table tuples are structured the same way. There is a fixed-size
  header (occupying 23 bytes on most machines), followed by an optional null
  bitmap, an optional object ID field, and the user data. The header is
  detailed
  in Table 7-3.  The actual user data
  (fields of the tuple) begins at the offset indicated by
  t_hoff, which must always be a multiple of the MAXALIGN
  distance for the platform.
  The null bitmap is
  only present if the HEAP_HASNULL bit is set in
  t_infomask. If it is present it begins just after
  the fixed header and occupies enough bytes to have one bit per data column
  (that is, t_natts bits altogether). In this list of bits, a
  1 bit indicates not-null and a 0 bit indicates null.  When the bitmap is
  not present, all columns are assumed not-null.
  The object ID is only present if the HEAP_HASOID bit
  is set in t_infomask.  If present, it appears just
  before the t_hoff boundary.  Any padding needed to make
  t_hoff a MAXALIGN multiple will appear between the null
  bitmap and the object ID.  (This in turn ensures that the object ID is
  suitably aligned.)
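
The layout rules above can be sketched as two small helpers. `expected_hoff` adds up the 23-byte fixed header, the optional null bitmap and the optional object ID, then rounds up to a MAXALIGN multiple. `is_null` tests one bit of the bitmap. Several details are assumptions: a MAXALIGN of 4 as the default (it varies by platform), a 4-byte object ID, and bits numbered from the least significant bit of each bitmap byte. The helper names are hypothetical.

```python
FIXED_HEADER_SIZE = 23  # "on most machines", per the text
OID_SIZE = 4            # assumed size of the object ID field


def maxalign(size, alignment=4):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def expected_hoff(natts, has_nulls, has_oid, alignment=4):
    """Offset at which user data should start for a tuple with natts columns."""
    size = FIXED_HEADER_SIZE
    if has_nulls:
        size += (natts + 7) // 8  # one bit per column, rounded up to bytes
    if has_oid:
        size += OID_SIZE
    return maxalign(size, alignment)


def is_null(bitmap, column):
    """True if the zero-based column is NULL (a 0 bit means null)."""
    return not (bitmap[column >> 3] & (1 << (column & 7)))
```

For example, a three-column tuple with a null bitmap and an object ID needs 23 + 1 + 4 = 28 bytes before the user data, which is already a multiple of 4.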
  
 
Table 7-3. HeapTupleHeaderData Layout
| Field | Type | Length | Description | 
|---|---|---|---|
| t_xmin | TransactionId | 4 bytes | insert XID stamp | 
| t_cmin | CommandId | 4 bytes | insert CID stamp (overlays with t_xmax) | 
| t_xmax | TransactionId | 4 bytes | delete XID stamp | 
| t_cmax | CommandId | 4 bytes | delete CID stamp (overlays with t_xvac) | 
| t_xvac | TransactionId | 4 bytes | XID for VACUUM operation moving tuple | 
| t_ctid | ItemPointerData | 6 bytes | current TID of this or newer tuple | 
| t_natts | int16 | 2 bytes | number of attributes | 
| t_infomask | uint16 | 2 bytes | various flags | 
| t_hoff | uint8 | 1 byte | offset to user data | 
   All the details may be found in src/include/access/htup.h.
 
 
  Interpreting the actual data can only be done with information obtained
  from other tables, mostly pg_attribute. The
  relevant columns are attlen and
  attalign. There is no way to locate a
  particular attribute directly, except when there are only fixed-width
  fields and no NULLs. All this trickery is wrapped up in the functions
  heap_getattr, fastgetattr
  and heap_getsysattr.
  
 
  To read the data you need to examine each attribute in turn. First check
  whether the field is NULL according to the null bitmap. If it is, skip to
  the next attribute. Otherwise, advance to the alignment boundary the field
  requires.  If the field is a fixed-width field, its bytes are simply stored
  in place. If it is a variable-length field (attlen == -1), it is a bit more
  complicated, using the variable-length structure varattrib.
  Depending on the flags, the data may be either inline, compressed or in
  another table (TOAST).
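
The walk described above can be sketched for the simple case of fixed-width columns. The function below takes the attlen and attalign values for each column plus a list of NULL flags, and returns each column's offset relative to t_hoff. It stops at the first variable-length column, because decoding varattrib is outside the scope of this sketch. The byte counts assigned to the attalign codes are typical values and are assumptions, and the name `fixed_width_offsets` is hypothetical.

```python
# Typical alignment in bytes for each attalign code; the real values are
# platform-dependent, so treat this table as an assumption.
ALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}


def fixed_width_offsets(columns, null_flags):
    """Return the offset of each column's data relative to t_hoff.

    columns    -- list of (attlen, attalign) pairs, in column order
    null_flags -- list of booleans, True where the column is NULL
    NULL columns occupy no space and are reported as None.
    """
    offset = 0
    offsets = []
    for (attlen, attalign), null in zip(columns, null_flags):
        if null:
            offsets.append(None)
            continue
        if attlen < 0:
            raise NotImplementedError("variable-length columns are not handled")
        align = ALIGN_BYTES[attalign]
        offset = (offset + align - 1) // align * align  # pad to the boundary
        offsets.append(offset)
        offset += attlen
    return offsets
```

This also shows why an attribute cannot be located directly in general: the offset of each column depends on which earlier columns are NULL and on the padding inserted before it.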