[postgis-users] ST_Intersection very slow.

Andre Mano andre.s.mano at gmail.com
Thu Feb 19 23:41:55 PST 2015


*"I will write up a tutorial explaining the benefits of vector tiling in
PostGIS, with examples and parallelisation code patterns.  If not today,
hopefully over the next week. "*

 Would be nice if you have the time!

Best,

Andre

On Fri, Feb 20, 2015 at 2:52 AM, Mark Wynter <mark at dimensionaledge.com>
wrote:

>
> So I've was running this query for 866000 s (10 days) before I decided to
> kill it:
>
>
>
> One potential thing I've realized is that a few of the geometries in
> tazjan2 are multipolygons, not single polygons.  But it's only a few.
> There are a few very large and complex polygons in lancover_polygons_snap,
> but again, most of the 998031 are simple small polygons, about half would
> be ST_CoveredBy the polygons in tazjan2 and most of the rest would only
> overlap two or three of the polygons in tazjan2.
>
>
>
> A common offender - the multipolygons.
>
> You can get an immediate performance improvement by
> ST_Dump(wkb_geometry).geom into a new table, with a new serial id, but
> retaining the original poly_id so you can don’t lose the multipolygon
> provenance of each polygon.
> Complex operations on the multipolygon table, like ST_Intersection(), are
> extremely slow - Because the query has to access and compute the operation
> on each of the geometry elements contained within the multi.
>
> Another massive performance gain can be achieved by “Tiling” your dumped
> polygon table - which has the effect of breaking your geometries into
> smaller features, which you can then “union” back together again, after
> running ST_Intersection on your vector tiles.  Don’t try to tile your multi
> polygon table - you’ll still be waiting days.
>
> Take the coastline of Australia - a single multi polygon
>
> SELECT Count(ogc_fid) as num_rows, SUM(ST_NumGeometries(wkb_geometry)) as
> num_geoms, SUM(ST_NPoints(wkb_geometry)) as num_points FROM abs_aus11;
>  num_rows | num_geoms | num_points
> -------------------+--------------
>         1 |      6718 |    1622598
>
> If you need to query the intersection of a polygon with land, and the
> water, its near impossible.  Many, many days.
>
> Hence we need to take a different approach.
>
> The first thing you can do is to dump the multi polygon table.
> Then create a regular vector grid.
> Then tile the both your Polygon tables using the vector grid.
> Put indexes on the resultant tiled tables.
> Then run your ST_Intersects.
> The result is many more geometries, and points, but now we get the benefit
> of geometry indexes.
>
> As a result of tiling Australia coastline, we now get
>
> SELECT Count(tid) as num_rows, SUM(ST_NumGeometries(wkb_geometry)) as
> num_geoms, SUM(ST_NPoints(wkb_geometry)) as num_points FROM abs_aus11_tiled;
>  num_rows | num_geoms | num_points
> -------------------+--------------
>     17222 |     29020 |    6513770
>
> Each geometry also has a tile_id, and the original poly_id.  The tile_id’s
> are really useful for subsetting your queries, and for using tile-id’s as
> an additional join condition.  At any time you can rapidly rebuild your
> original geometries doing ST_Union() GROUP BY poly_id.
>
> Using the approach of dumping and tiling, queries that once took days now
> takes minutes at most.  And as Remi mentioned, you can parallelise the
> process.  The concept of vector tiling in PostGIS is analogous to Map
> Reduce and assigning Key Value Pairs.  Its about chunking your data, doing
> lots of fast operations on the smaller bits, and then consolidating your
> outputs.  You don’t necessarily have to write a SQL function to do the SQL
> parallelisation.   My preference is Bash and GNU Parallel because you can
> write vanilla SQL and execute via psql in parallel.   For me its now about
> the speed I can write the SQL.
>
> So we’re now starting to apply Big Data concepts within PostGIS….    Holy
> smoke… and you can apply the same concepts to a wide range of PostGIS
> operations - scale up, scale out….
>
> I will write up a tutorial explaining the benefits of vector tiling in
> PostGIS, with examples and parallelisation code patterns.  If not today,
> hopefully over the next week.
>
> Mark
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>



-- 
..................................
André Mano
http://opussig.blogspot.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150220/f0410fe2/attachment.html>


More information about the postgis-users mailing list