[postgis-users] Is this a perverse use-case for raster types?

Mon Oct 29 15:50:04 PDT 2012

James,

I use PostGIS raster for a similar purpose (model outputs) though my
model outputs are for a specific day (average temperature for a specific
date).  So, one raster with one band per day per variable.  I could
combine a year's worth of bands into one raster but I decided against that.

If you've got a large number of bands (100s or more), you may want to
consider having the rasters be out-of-db.  Part of the problem is that
anything stored in PostgreSQL (in-db) is TOASTed so needs to be
deserialized (and probably decompressed).  So, if the serialized raster
is big (more bands), the deTOASTing will take longer.

Another problem with your benchmark query is that the ST_Clip() is
running twice (for height and width).

If you're in the evaluation stage and you're compiling PostGIS yourself,
I'd recommend trying SVN -trunk (will become 2.1) as it has additional
capabilities and performance improvements.  I'm already using -trunk in
production as I needed the new features (full disclosure: I wrote almost
the new features in -trunk).

-bborie

On 10/29/2012 03:32 PM, James Hiebert wrote:
> Hi All,
> 
> I'm considering using PostGIS rasters for storage of raster data at my organization and I'm looking for some advice (or perhaps a reality check).  I work for a region climate services provider and the vast majority of our data (by volume, not necessarily complexity) are output from climate models.  These are generally a n-by-m raster with one band for each timestep.  There could be upwards of 36k to 72k timesteps for a typical model run.  We have hundreds of model runs.
> 
> So my question is, is it insane to be thinking of storing that many bands in a PostGIS raster?  Or more specifically, is this _not_ a use case for which PostGIS rasters were designed?  I notice that most of the examples in the docs and in "PostGIS In Action" focus only on images and I can imagine that handling multispectral satellite images as being more of the intended use case.
> 
> I did a little benchmarking of a typical use case of ours ("What's the average temperature inside a some polygon, e.g. a river basin?").  I noticed that the run time for doing a ST_Clip(raster, band, geometry) and ST_Intersects(raster, band, geometry) appears to be super-linear even when doing it on just a single band.  I ran the following query:
> SELECT rid, st_height(st_clip(rast, 1, the_geom)), st_width(st_clip(rast, the_geom)) FROM basins INNER JOIN bcsd ON ST_Intersects(rast, 1, the_geom) WHERE rid = <rid> (where basins is table of river basins with one single polygon and bcsd is a table with a raster column "rast").
> for a set of rasters with increasing number of bands, and the time to run the query is shown in the attached plot.  Since the raster properties are presumably shared across all the bands, it seems odd to me that run time would increase.  I would expect it to be _contant_ (with constant number of pixels), but I suppose that that's my own ignorance as to how the PG type extensions work?
> 
> Comments or explanations are welcome.
> 
> ~James
> 
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 

-- 
Bborie Park
Programmer
Center for Vectorborne Diseases
UC Davis
530-752-8380
bkpark at ucdavis.edu