[postgis-users] Looking at TOAST and compression

David Blasby dblasby at refractions.net
Fri Feb 27 17:08:02 PST 2004


I took a deeper look at postgresql's tuple TOASTing and compression.

Basically, if you have a variable-length data type (like geometries) in 
postgresql it will do "special" things as it grows larger.

Basically, as an object gets to a certain size postgresql will either 
try to compress it (and stick it in the main tuple table as normal), or 
it can TOAST it (basically put it in a Large Object table) or both.  You 
have some control over this when you create the type (c.f. create type's 
'storage' option).

When you have line in the postgis code like this:

GEOMETRY *geom1 = (GEOMETRY *)  PG_DETOAST_DATUM(PG_GETARG_DATUM(0));

When you call the PG_DETOAST_DATUM() the Datum will be:

a) NOT EXTENDED -- its a "normal tuple" (small) the Datum already has 
the Geometry in it.

b) EXTENDED, NOT EXTERNAL, COMPRESSED -- the main tuple table has a 
compressed geometry in it.  Uncompress it and return the Geometry.

c) EXTENDED, EXTERNAL, NOT COMPRESSED -- this is TOASTed.  Grab the 
actual geometry from the TOAST table.

d) EXTENDED, EXTERNAL, COMPRESSED -- this is TOASTed and compressed. 
Grab the compressed geometry from the TOAST table, then uncompress it.

I had originally thought that that postgresql would TOASTing/compressing 
at 8kb (the disk's page size).  It actually starts at page_size/4 (2kb) 
for normal tuples and page_size/16 (500 bytes) for inside-the-index tuples.

I prepared a table of geometries of increasing size.  You can see how 
when the geometry gets to 2kb, it will remain in the main table but be 
compressed.  When the compressed size gets to 2kb, it becomes TOASTed.

The table below shows this process happening (ignore the extra info).

  mem_size |                                             toastinfo 

----------+----------------------------------------------------------------------------------------------------
       300 | normal tuple of size 300 bytes (182 bytes when compressed)
       368 | normal tuple of size 368 bytes (149 bytes when compressed)
       440 | normal tuple of size 440 bytes (194 bytes when compressed)
...
      1740 | normal tuple of size 1740 bytes (968 bytes when compressed)
      1868 | normal tuple of size 1868 bytes (969 bytes when compressed)
      2000 | is EXTENDED is compressed (1996 raw) 985 bytes (985 bytes 
when compressed)
      2136 | is EXTENDED is compressed (2132 raw) 1002 bytes (1002 bytes 
when compressed)
...
      8336 | is EXTENDED is compressed (8332 raw) 1306 bytes (1306 bytes 
when compressed)
      8600 | is EXTENDED is compressed (8596 raw) 1668 bytes (1668 bytes 
when compressed)
      8868 | is EXTENDED is TOASTed(external) is compressed (8864 raw) 
2096 bytes (2100 bytes when compressed)
      9140 | is EXTENDED is TOASTed(external) is compressed (9136 raw) 
2130 bytes (2134 bytes when compressed)


NOTE: I found out that the PostGIS GEOMETRY type leaks memory (has extra 
junk in it) when you have a reasonably large number of sub-geometries. 
This doesnt cause any problem except waste disk space.  The new LWGEOM 
type will not do this!

I looked at the compression of postgis geometries - its about 50%.  This 
is because of the large overhead in the header and wasted space.

I also looked at the WKB type.  I found that it only compresses by about 
7% for smallish geometries (<500 points) and 12% for larger geometries 
(>7000 point).  This is a good indication that there's not much wasted 
space.

I thought it might be worth compressing mid-sized (500bytes +) LWGEOMs, 
but i dont think its worth the work since its only going to save a very 
small amount of space.

dave




More information about the postgis-users mailing list