[postgis-users] selected aligned tiles in pgraster between several large raster tables.

Graeme B. Bell grb at skogoglandskap.no
Wed Jun 19 01:51:25 PDT 2013


Hi Regina,

I tried dropping the geometry cast, it makes no difference. Results below. But I found another solution which is rather nice... 


===== 


I thought I should redesign this test so that you can check it yourself rather than take my word for it. Unfortunately I can't share my dataset so I'm using an OSGeo dataset below. 


Results found:  

- RAST vs. RAST::GEOMETRY join performance was pretty much identical.  

- There is a pretty huge difference (here, 100000x worse performance) between joining on RID and joining on either RAST or RAST::GEOMETRY, if you're working between different raster data in the same tiling/georeferencing system. 

- You can use a b-tree on your rasters instead of GiST, if you don't mind a bit of a workaround. This lets you make comparisons/joins on the rast column metadata super-fast. Not sure how well it will scale on gigabyte datasets yet, but I'm hopeful. The only downside is that your rids will get returned out of order and you'll need to 'order by' manually.



Detail.

0a. Test machine - decently fast: 

Server is Xeon E3-1270v2, running single-threaded with no other load at 3.9Ghz turbo. Samsung 840 pro SSD 512GB as the postgres main disk, and 32GB of 1600Mhz ECC RAM.

0b. Postgres was configured for good performance using a couple of online guides and pgbench.

0c. PostGIS 2.0.3, postgres 9.2.4. 



1. ilatlon.tif is a georeferenced tif available at http://download.osgeo.org/geotiff/samples/usgs/ilatlon.tif, size 13MB. Since the size is only 13MB, data is going to be cached in memory I expect. 

2. Insert to create '4 different rasters in the same georeference/tiling system'.  
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l1 | psql sl
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l2 | psql sl
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l3 | psql sl
raster2pgsql -I -C -M -r -t 50x50 ilatlon_float.tif temp.l4 | psql sl

Enable timing. 

sl=# \timing

The set of tests was run twice and two timing results were recorded.

sl=# select l1.rid from temp.l1 left join temp.l2 on l1.rid=l2.rid left join temp.l3 on l1.rid=l3.rid left join temp.l4 on l1.rid=l4.rid;
Time: 1.366 ms
Time: 1.185 ms


sl=# select l1.rid from temp.l1 inner join temp.l2 on l1.rast::geometry~=l2.rast::geometry inner join temp.l3 on l1.rast::geometry~=l3.rast::geometry inner join temp.l4 on l1.rast::geometry~=l4.rast::geometry;
Time: 99406.689 ms
Time: 99298.702 ms


sl=# select l1.rid from temp.l1 inner join temp.l2 on l1.rast~=l2.rast inner join temp.l3 on l1.rast~=l3.rast inner join temp.l4 on l1.rast~=l4.rast;
Time: 99295.640 ms
Time: 99332.718 ms

Is it the join type? No. 

sl=# select l1.rid from temp.l1 left join temp.l2 on l1.rast~=l2.rast left join temp.l3 on l1.rast~=l3.rast left join temp.l4 on l1.rast~=l4.rast;
Time: 149493.271 ms
Time: 149601.305 ms

sl=# select l1.rid from temp.l1 right join temp.l2 on l1.rast~=l2.rast right join temp.l3 on l1.rast~=l3.rast right join temp.l4 on l1.rast~=l4.rast;
Time: 99271.954 ms
Time: 100857.511 ms


My thoughts:

a. 1 ms vs 100000ms on a 13MB TIFF is quite a big difference. It will be painful with multi-GB TIFFs with tens of millions of records.

b. GIST indices are not good for equality. If there was a b-tree method for rast, perhaps that would be better? 

unfortunately, you can't simply "create index X on rast", b-trees don't work with them; casts don't seem to work either. 

BUT... we could use a b-tree on a function that produces a string describing the raster tile metadata:

sl=# CREATE INDEX temp1_btree ON temp.l1 (ST_GeoReference(rast));
CREATE INDEX
sl=# CREATE INDEX temp2_btree ON temp.l2 (ST_GeoReference(rast));
CREATE INDEX
sl=# CREATE INDEX temp3_btree ON temp.l3 (ST_GeoReference(rast));
CREATE INDEX
sl=# CREATE INDEX temp4_btree ON temp.l4 (ST_GeoReference(rast));
CREATE INDEX

sl=# select l1.rid from temp.l1 inner join temp.l2 on st_georeference(l1.rast)=st_georeference(l2.rast) inner join temp.l3 on st_georeference(l1.rast)=st_georeference(l3.rast) inner join temp.l4 on st_georeference(l1.rast)=st_georeference(l4.rast);

Time: 647.384 ms

Still 500x slower, but a quite reasonable improvement. 

Graeme







More information about the postgis-users mailing list