[postgis-users] A PostGIS-Raster data proposal

Marshall, Steve smarshall at wsi.com
Wed Oct 25 12:14:11 PDT 2006


>From what I've been able to find re toasted columns; they're stored
>somewhere in the database as a separate table, outside the referencing
row,
>and limited to 1GB.  The 1GB size may be significant in that some
raster
>formats can be larger.  Not a show-stopper, but it would require
>repackaging of very large jpeg2000s, for example.

The 1GB limitation is a good point.  It is not a stopper for my
purposes, but could be a problem for some applications, e.g. very high
resolution satellite imagery, unless the data is reorganized into
smaller tiles.

> Given that raster archives can be many TB in size, what are the
> ramifications for backup and restore of a database containing toasted
> rasters ?
> Since toasted values are in a separate table, can I backup and restore
the
> primary table rows sans toasted columns, and manage the toast in
slices ?
> What happens if my raster table gets, um, toasted  ?  Hopefully, a
full
> multiTB  restore/reorg would not be necessary before I'm running
again.

The TOAST implementation is hidden from the end user, so toasted columns
go through the backup/restore processes as though they were stored
inline in the table.  At least to my knowledge, you cannot dump the
non-toasted columns in a table without dumping the toasted columns.  The
same limitations apply to restore, which may require more inefficient
restore procedures if the image data becomes corrupted.  However, for
what it's worth, I've had no corruption problems in PostgreSQL that were
not related to hardware or power failures, things I now take more pains
to avoid.

Both the TOAST limitations you cite might be better mitigated by using
the PostgreSQL large object interface instead.  LO size limitation is
2GB, instead of 1GB for TOAST, and you can dump the LO data
independently of the tables that reference the LO data, which would
address your backup concerns.  Also, LO has a more capable random access
interface than TOAST, providing seeks for both reads and writes.

Unfortunately, the LO implementation has some limitations.  I don't
think the seeking writes are guaranteed to follow transactional
semantics, i.e. they are not rolled back correctly (this at least used
to be true).  But more worrying to me is the fact that all LO's are
stored in a single relation (pg_largeobject) so this relation is likely
to get very big.  For your multi-TB image repository, all this data
would be in a single relation.   This would mean thousands of 1GB files
to hold the pg_largeobject relation, as well as a lot of space for the
index into this pg_largeobject.  Very large relations in PostgreSQL are
difficult to VACUUM, and tend to have poor performance characteristics,
in part because their indices cannot be loaded into memory all-at-once.


TOAST allows one to store image data in many relations because there is
one toast table for each relation that has toastable columns.   I think
the  multi-relation storage for image data will perform better and be
more maintainable, but your comments have been good food for thought on
the subject.





More information about the postgis-users mailing list