PostgreSQL uses an internal heuristic
parser for all date/time input support. Dates and times are input as
strings, and are broken up into distinct fields with a preliminary
determination of what kind of information may be in the
field. Each field is interpreted and either assigned a numeric
value, ignored, or rejected.
The parser contains internal lookup tables for all textual fields,
including months, days of the week, and time
zones.
This appendix includes information on the content of these
lookup tables and describes the steps used by the parser to decode
dates and times.
The date/time types are all decoded using a common set of routines.
Date/Time Input Interpretation
Break the input string into tokens and categorize each token as
a string, time, time zone, or number.
If the numeric token contains a colon (:), this is
a time string. Include all subsequent digits and colons.
If the numeric token contains a dash (-), slash
(/), or two or more dots (.), this is
a date string which may have a text month.
If the token is numeric only, then it is either a single field
or an ISO 8601 concatenated date (e.g.,
19990113 for January 13, 1999) or time
(e.g., 141516 for 14:15:16).
If the token starts with a plus (+) or minus
(-), then it is either a time zone or a special
field.
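The token classification above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not PostgreSQL's own C implementation: the function name classify_token and its category labels are hypothetical, and the sketch assumes the input has already been split into tokens.

```python
def classify_token(token: str) -> str:
    """Roughly categorize one date/time token (hypothetical sketch).

    Returns one of "string", "time", "date", "number",
    or "timezone_or_special".
    """
    if not token:
        raise ValueError("empty token")
    first = token[0]
    if first in "+-":
        # A leading sign marks a time zone offset or a special field.
        return "timezone_or_special"
    if first.isdigit():
        if ":" in token:
            return "time"  # e.g. 14:15:16
        if "-" in token or "/" in token or token.count(".") >= 2:
            return "date"  # e.g. 1999-01-13, 1/13/1999, 13.01.1999
        # Digits only (or a single dot): a single field, or an ISO 8601
        # concatenated date/time such as 19990113 or 141516, resolved later.
        return "number"
    if first.isalpha():
        return "string"  # e.g. today, Thursday, January, at
    raise ValueError(f"unrecognized token: {token!r}")


for tok in "January 13 1999 14:15:16 -08".split():
    print(tok, classify_token(tok))
```

Checking the leading character first keeps a signed offset such as -08 from being mistaken for a date string merely because it contains a dash.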
If the token is a text string, match it against the possible strings.
Do a binary-search table lookup for the token
as either a special string (e.g., today),
day (e.g., Thursday),
month (e.g., January),
or noise word (e.g., at, on).
Set field values and bit mask for fields.
For example, set year, month, day for today,
and additionally hour, minute, second for now.
If not found, do a similar binary-search table lookup to match
the token with a time zone.
If not found, throw an error.
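The text-token lookup just described might look like this in Python, using the standard bisect module over sorted key lists. The tables are hypothetical and drastically shortened, case-insensitive matching is an assumption, and the Field flags only stand in for the field bit mask mentioned above.

```python
from bisect import bisect_left
from enum import Flag, auto


class Field(Flag):
    """Bit mask of the date/time fields a token fills in."""
    YEAR = auto()
    MONTH = auto()
    DAY = auto()
    HOUR = auto()
    MINUTE = auto()
    SECOND = auto()


NO_FIELDS = Field(0)
DATE_FIELDS = Field.YEAR | Field.MONTH | Field.DAY
TIME_FIELDS = Field.HOUR | Field.MINUTE | Field.SECOND


def make_table(entries):
    """Sort (key, kind, mask) entries by key so they can be binary-searched."""
    entries = sorted(entries, key=lambda e: e[0])
    return [e[0] for e in entries], [(e[1], e[2]) for e in entries]


# Hypothetical, heavily abbreviated tables; the real ones are far larger.
DATE_TOKENS = make_table([
    ("today", "special", DATE_FIELDS),
    ("now", "special", DATE_FIELDS | TIME_FIELDS),
    ("thursday", "day", NO_FIELDS),
    ("january", "month", Field.MONTH),
    ("at", "noise", NO_FIELDS),
    ("on", "noise", NO_FIELDS),
])
TIMEZONE_TOKENS = make_table([
    ("utc", "timezone", NO_FIELDS),
    ("pst", "timezone", NO_FIELDS),
    ("est", "timezone", NO_FIELDS),
])


def bsearch(table, key):
    """Binary-search a (keys, values) table; return the value or None."""
    keys, values = table
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None


def lookup_text_token(token):
    """Return (kind, field mask) for a text token, or raise ValueError."""
    key = token.lower()  # case-insensitive matching (an assumption)
    hit = bsearch(DATE_TOKENS, key)
    if hit is None:
        hit = bsearch(TIMEZONE_TOKENS, key)  # fall back to time zone names
    if hit is None:
        raise ValueError(f"invalid date/time token: {token!r}")
    return hit
```

With these tables, today sets only the year, month, and day bits, while now also sets hour, minute, and second.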
If the token is a number or number field:
If there are more than 4 digits,
and if no other date fields have been previously read, then interpret
as a "concatenated date" (e.g., 19990118). Eight- and six-digit
fields are interpreted as year, month, and day, while seven- and five-digit
fields are interpreted as year and day of year.
If the token is three digits
and a year has already been decoded, then interpret as day of year.
If four or six digits and a year has already been read, then
interpret as a time.
If four or more digits, then interpret as a year.
If in European date mode, and if the day field has not yet been read,
and if the value is less than or equal to 31, then interpret as a day.
If the month field has not yet been read,
and if the value is less than or equal to 12, then interpret as a month.
If the day field has not yet been read,
and if the value is less than or equal to 31, then interpret as a day.
If two digits or four or more digits, then interpret as a year.
Otherwise, throw an error.
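A Python sketch of the numeric-field rules above, tried in the order listed. It is illustrative only: decode_number and the field names are hypothetical, the digit splits for concatenated dates (YYYYMMDD, YYMMDD, YYYYDDD, YYDDD) and for times (HHMM, HHMMSS) are assumptions, and token lengths the text does not cover simply fall through to the later rules.

```python
DATE_KEYS = {"year", "month", "day", "doy"}  # "doy" = day of year


def decode_number(token: str, fields: dict, european: bool = False) -> dict:
    """Interpret one all-digit token given the fields decoded so far.

    Returns a new dict; `fields` itself is not modified.
    """
    if not token.isdigit():
        raise ValueError(f"not a numeric field: {token!r}")
    n, value = len(token), int(token)
    out = dict(fields)

    if n > 4 and not (out.keys() & DATE_KEYS):
        # Concatenated date; the digit splits are an assumption.
        if n in (8, 6):  # YYYYMMDD or YYMMDD
            y = n - 4
            out.update(year=int(token[:y]), month=int(token[y:y + 2]),
                       day=int(token[y + 2:]))
            return out
        if n in (7, 5):  # YYYYDDD or YYDDD
            y = n - 3
            out.update(year=int(token[:y]), doy=int(token[y:]))
            return out
        # Other lengths are not covered by the text; fall through.

    if n == 3 and "year" in out:
        out["doy"] = value
    elif n in (4, 6) and "year" in out:
        # Assumed HHMM or HHMMSS.
        out.update(hour=int(token[:2]), minute=int(token[2:4]))
        if n == 6:
            out["second"] = int(token[4:6])
    elif n >= 4:
        out["year"] = value
    elif european and "day" not in out and value <= 31:
        out["day"] = value
    elif "month" not in out and value <= 12:
        out["month"] = value
    elif "day" not in out and value <= 31:
        out["day"] = value
    elif n == 2:  # two digits (four or more were handled above)
        out["year"] = value
    else:
        raise ValueError(f"cannot interpret numeric field: {token!r}")
    return out
```

Feeding the fields 1, 13, 1999 in turn yields month 1, day 13, year 1999; with european=True, the fields 13, 01, 1999 yield day 13, month 1, year 1999.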
If BC has been specified, negate the year and add one for
internal storage. (There is no year zero in the Gregorian
calendar, so numerically 1 BC becomes year
zero.)
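The BC rule amounts to storing 1 - year, so 1 BC is stored as 0, 2 BC as -1, and so on. A minimal Python sketch of the mapping and its inverse (both function names are hypothetical):

```python
def to_internal_year(year: int, bc: bool) -> int:
    """Convert a year as written (with optional BC) to internal numbering."""
    if year < 1:
        raise ValueError("years as written start at 1")
    # There is no year zero: negate and add one, so 1 BC -> 0, 2 BC -> -1.
    return 1 - year if bc else year


def from_internal_year(year: int) -> tuple[int, bool]:
    """Inverse mapping: internal year -> (year as written, is_bc)."""
    return (year, False) if year > 0 else (1 - year, True)
```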
If BC was not specified, and if the year field was two digits in length, then
adjust the year to four digits. If the field value is less than 70, add 2000;
otherwise, add 1900.
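The two-digit year rule, sketched in Python. The function name adjust_year is hypothetical; it takes the year field as text because the rule depends on how many digits were entered, not on the numeric value alone.

```python
def adjust_year(year_field: str, bc: bool = False) -> int:
    """Expand a two-digit year (without BC) to four digits; leave others alone."""
    year = int(year_field)
    if not bc and len(year_field) == 2:
        # 00-69 -> 2000-2069, 70-99 -> 1970-1999
        year += 2000 if year < 70 else 1900
    return year
```

Because only the field length is checked, 0099 stays year 99 while 99 becomes 1999.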
Tip: Gregorian years AD 1-99 may be entered by using 4 digits with leading
zeros (e.g., 0099 is AD 99). Previous versions of
PostgreSQL accepted years with three
digits and with single digits, but as of version 7.0 the rules have
been tightened up to reduce the possibility of ambiguity.