[postgis-users] ST_Intersection very slow

John Abraham jea at hbaspecto.com
Tue Feb 24 16:29:27 PST 2015


Following advice from Mark and Remi, I am now tiling the troublesome layer.  It's not going quickly.  I'm trying an intersection of the layer with only 5 of my 10kmX10km grids and it has been running for 30 min.  I have 8680 grids, which is more features than the layer I was originally working with (2709 polygons). I was hoping that intersection with squares would go much faster than an intersection with arbitrary polygons, but I can confirm that it's definitely not orders of magnitude faster.

So, I think my idea of simplifying things recursively with a quad grid is worth a shot.  Mark, can you point me to your Quadgrid function?

Mark suggested counting points.   Counting points (and rounding to the nearest 10000) gives:

select round(st_npoints(geom)/100,-2) as point100, count(*)
from lancover_polygons_snap group by round(st_npoints(geom)/100,-2) 
order by point100 desc;

26300;1
1500;1
1100;1
800;1
600;2
500;1
400;5
300;10
200;22
100;141
0;997846

The one with about 2630000 points represents the developed land in the region (towns and cities including roads that connect them).

Mark also suggested counting interior rings.   

select round(ST_NumInteriorRings(st_geometryN(geom,1)),-1) as rings10, avg(st_numpoints(geom)) as avgpts,
count(*) as numfeatures
from lancover_polygons_snap group by round(ST_NumInteriorRings(st_geometryN(geom,1)),-1)
order by rings10 desc;

43140;2629905;1
4230;147121;1
3800;110083;1
2480;76515;1
1760;50261;1
1720;55576;1
1680;64588;1
1350;41826;1
1130;38059;1
970;36425;1
950;37686;1
860;32952;1
850;34657;1
820;35285;2
810;24769;1
800;28113;1
760;30726;1
740;26296;1
720;24788;2
670;18874;1
660;22428;1
640;26258;1
630;27213;2
620;22056;2
610;15771;1
560;19942;2
540;23988;1
500;15763;2
480;16598;1
470;19865;2
410;15466;1
400;16023;1
380;13536;4
370;14646;3
360;14122;1
340;12910;1
330;13273;1
320;15222;2
300;13421;1
290;9072;3
280;10025;4
270;12505;4
260;11002;5
250;9843;3
240;9980;6
230;11456;3
220;9049;3
210;9194;6
200;8586;5
190;7089;5
180;6796;8
170;6913;8
160;6955;6
150;5911;10
140;5769;15
130;5377;13
120;5301;17
110;5246;17
100;4422;28
90;4332;28
80;3664;35
70;3292;43
60;2860;66
50;2236;74
40;1882;141
30;1493;215
20;993;618
10;445;3325
0;20;993265


--
John Abraham
jea at hbaspecto.com
403-232-1060


"POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.0.3 r11132" need upgrade) TOPOLOGY (topology procs from "2.0.3 r11132" need upgrade) RASTER (raster procs from "2.0.3 r11132" need upgrade)"


On Feb 24, 2015, at 3:31 PM, Rémi Cura <remi.cura at gmail.com> wrote:

> Hey,
> I highjack this interesting discussion, sorry =)
> 
> If you have one single massive Polygon (no multi).
> You can generate a point table (keeping path of course) with ST_DumpPoints.
> 
> Then you construct an index on this table.
> Then you perform your Intersection with indexed points, which will be very fast.
> Then you reconstruct parts of polygon.
> It will require some advanced to very advanced SQL (like windows function) to make it work tough.
> 
> I think it would be faster than other alternatives, because basicaly it amounts to using the optimized R-Tree index on points, vs a custom and non-otpimized Quad tree index on polygon.
> Moreover when using a recusrive strategy, you end up computing the same think a lot with geos.
> 
> Cheers,
> Rémi-C
> 
> 2015-02-24 23:10 GMT+01:00 Mark Wynter <mark at dimensionaledge.com>:
> 
> > Thanks Mark.  I will indeed do the st_dump, but I don't think it will help much (very few multis).  I think the tiling will help a lot.  What I wonder, though, is how long it will take to tile?  Afterall, it's still an st_intersection operation to tile each geometry against each tile.
> 
> I’ve almost finished writing the tutorial - where I address many of these points.    The variables that affect performance are:
> * how you’ve written your ST_Intersection query
> * multi vs. non-multi
> * size and complexity of geoms
> * no. available CPUs (for parallelisation)
> * tile batch size - important!!!
> 
> All strategies in combination may be necessary if your queries are taking forever.
> 
> For the demonstration dataset (a multi polygon representing whole of Australia), my tutorial tiling query incorporates ST_Intersection and ST_Difference subqueries to produce tiled features representing land and water.
> I achieved a 49x reduction in the query time to tile the whole of Australia, starting with a single multi polygon.
> 
> The more complex the query, the more significant this time saving is in absolute terms.
> 
> > Is there a quad tree type tiling algorithm in a function?  If I do 256 x 256 tiles, doing it all at once would be 65536 operations of st_intersection(complex_geom, square).  With a quad tree I'll only have 4 operations of st_intersection(complex_geom, square), and then 16 of (a_little_less_complex_geom, square), and then 64 of (even_less_complex_geom, square) and so on, for 8 nests.  The last one will still be 65536 operations, but the complex geoms should be a lot simpler by the time I get down to that level.  What do you think, is this worth trying?  Or is intersecting with a square fairly simple regardless of how complex the other geometry is?
> 
> I do have a SQL quadgrid tiling function - where a cell divides recursively subject to a maximum number of levels or “value thresholds” - but I’m not sure if that’s the right approach.
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150224/f4a7b050/attachment.html>


More information about the postgis-users mailing list