[postgis-users] Is this a perverse use-case for raster types?

James Hiebert hiebert at uvic.ca
Mon Oct 29 17:02:35 PDT 2012


> I believe ST_Intersects() works on out-of-db rasters in the 2.0 series,
> possibly 2.0.1.

Hmmm, for me it it fails for the (raster, integer, geometry) signature:

raster_test=> SELECT rid FROM basins INNER JOIN bcsd ON ST_Intersects(rast, 1, the_geom) WHERE rid = 39;
ERROR:  rt_raster_intersects not implemented yet for OFFDB bands
CONTEXT:  PL/pgSQL function "_st_intersects" line 20 at RETURN

but it appears that you're right for the (geometry, raster, integer) signature:

raster_test=> SELECT rid FROM basins INNER JOIN bcsd ON ST_Intersects(the_geom, rast, 1) WHERE rid = 39;
 rid 
-----
  39
(1 row)

> I wonder what your benchmark's performance would be like if the raster
> is out-db.  I'd expect a flat line with little change regardless the #
> of bands.

Ah ha!  Yes, that's definitely the case.  With out of db storage, each of intersects/clip queries comes back in < 200ms, regardless of num bands.  That's more of the behaviour that I was expecting, too.  Thanks for helping me put a finger on it!

~James

On Mon, Oct 29, 2012 at 04:33:36PM -0700, Bborie Park wrote:
> I believe ST_Intersects() works on out-of-db rasters in the 2.0 series,
> possibly 2.0.1.
> 
> As for performance of in-db vs out-db, in-db is slightly faster but my
> benchmarks are rather old.  I hope to do some testing soon to see if I
> can improve out-db performance.
> 
> Tile size is critical regardless of whether or not you're going to store
> your rasters in-db or out-db.  Generally, tiles should be 100x100 or
> smaller.  Ideal tile size depends upon the input raster's dimensions and
> what tile dimension is cleanly divisible from the raster's dimension.
> 
> I wonder what your benchmark's performance would be like if the raster
> is out-db.  I'd expect a flat line with little change regardless the #
> of bands.
> 
> -bborie
> 
> On 10/29/2012 04:23 PM, James Hiebert wrote:
> >> If you've got a large number of bands (100s or more), you may want to
> >> consider having the rasters be out-of-db.
> > 
> > I had considered that (better, actually, than duplicating our data, right?), but was finding that st_intersects wasn't yet implemented for out of db storage.  Looking through the trunk code, though, it appears that maybe you've gone ahead and implemented that since 2.0.1?  If so, great!  ST_PixelAsPoints() is another good reason for me to seriously consider working out of trunk...
> > 
> >> Part of the problem is that
> >> anything stored in PostgreSQL (in-db) is TOASTed so needs to be
> >> deserialized (and probably decompressed).  So, if the serialized raster
> >> is big (more bands), the deTOASTing will take longer.
> > 
> > Thanks; good to know.
> > 
> >> Another problem with your benchmark query is that the ST_Clip() is
> >> running twice (for height and width).
> > 
> > Ah, that changes the picture pretty dramatically (see attached plot).  Since it improves by a lot more than a factor of two, I suspect maybe I'm having some desktop scaling issues or something.  I'll go ahead and actually put this on our database server, try the trunk version, and go from there.  This is at least somewhat encouraging :)  Thanks for the suggestions.
> > 
> > ~James
> > 
> > On Mon, Oct 29, 2012 at 03:50:04PM -0700, Bborie Park wrote:
> >> James,
> >>
> >> I use PostGIS raster for a similar purpose (model outputs) though my
> >> model outputs are for a specific day (average temperature for a specific
> >> date).  So, one raster with one band per day per variable.  I could
> >> combine a year's worth of bands into one raster but I decided against that.
> >>
> >> If you've got a large number of bands (100s or more), you may want to
> >> consider having the rasters be out-of-db.  Part of the problem is that
> >> anything stored in PostgreSQL (in-db) is TOASTed so needs to be
> >> deserialized (and probably decompressed).  So, if the serialized raster
> >> is big (more bands), the deTOASTing will take longer.
> >>
> >> Another problem with your benchmark query is that the ST_Clip() is
> >> running twice (for height and width).
> >>
> >> If you're in the evaluation stage and you're compiling PostGIS yourself,
> >> I'd recommend trying SVN -trunk (will become 2.1) as it has additional
> >> capabilities and performance improvements.  I'm already using -trunk in
> >> production as I needed the new features (full disclosure: I wrote almost
> >> the new features in -trunk).
> >>
> >> -bborie
> >>
> >> On 10/29/2012 03:32 PM, James Hiebert wrote:
> >>> Hi All,
> >>>
> >>> I'm considering using PostGIS rasters for storage of raster data at my organization and I'm looking for some advice (or perhaps a reality check).  I work for a region climate services provider and the vast majority of our data (by volume, not necessarily complexity) are output from climate models.  These are generally a n-by-m raster with one band for each timestep.  There could be upwards of 36k to 72k timesteps for a typical model run.  We have hundreds of model runs.
> >>>
> >>> So my question is, is it insane to be thinking of storing that many bands in a PostGIS raster?  Or more specifically, is this _not_ a use case for which PostGIS rasters were designed?  I notice that most of the examples in the docs and in "PostGIS In Action" focus only on images and I can imagine that handling multispectral satellite images as being more of the intended use case.
> >>>
> >>> I did a little benchmarking of a typical use case of ours ("What's the average temperature inside a some polygon, e.g. a river basin?").  I noticed that the run time for doing a ST_Clip(raster, band, geometry) and ST_Intersects(raster, band, geometry) appears to be super-linear even when doing it on just a single band.  I ran the following query:
> >>> SELECT rid, st_height(st_clip(rast, 1, the_geom)), st_width(st_clip(rast, the_geom)) FROM basins INNER JOIN bcsd ON ST_Intersects(rast, 1, the_geom) WHERE rid = <rid> (where basins is table of river basins with one single polygon and bcsd is a table with a raster column "rast").
> >>> for a set of rasters with increasing number of bands, and the time to run the query is shown in the attached plot.  Since the raster properties are presumably shared across all the bands, it seems odd to me that run time would increase.  I would expect it to be _contant_ (with constant number of pixels), but I suppose that that's my own ignorance as to how the PG type extensions work?
> >>>
> >>> Comments or explanations are welcome.
> >>>
> >>> ~James

-- 
James Hiebert
Lead, Computational Support
Pacific Climate Impacts Consortium
http://www.pacificclimate.org
Room 112, University House 1, University of Victoria
PO Box 1700 Sta CSC, Victoria, BC V8V 2Y2
E-mail: hiebert at uvic.ca
Tel: (250) 472-4521
Fax: (250) 472-4830



More information about the postgis-users mailing list