[postgis-users] Does anyone have multipolygons with large numbers of polygons

Kevin Neufeld kneufeld at refractions.net
Fri Apr 18 09:10:10 PDT 2008


Hi Regina,

Unfortunately, I don't think your solution helps.  I performed some
tests extracting points from a linestring (instead of geometries from a
collection; generate_series has the same performance problem either way).

I took three linestrings, with ~5000, ~10000, and ~20000 points and 
tried to extract the coordinates.
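
If anyone wants to reproduce this without real data, a synthetic linestring
of roughly the right size can be cooked up with something like the following
(just a stand-in, not the actual geometry I tested against):

-- build a throw-away linestring of ~5718 points; vertex order from the
-- aggregate isn't guaranteed, but that doesn't matter for timing purposes
SELECT ST_MakeLine(ST_MakePoint(g.n, g.n)) AS the_geom
  INTO kneufeld.tmp
FROM generate_series(1, 5718) g(n);
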
Here are my results:

-- A linestring with 5718 points:
EXPLAIN ANALYZE
SELECT pointn(the_geom, generate_series(1, numpoints(the_geom)))
FROM kneufeld.tmp;
                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..31.53 rows=1230 width=32) (actual time=1.518..3936.619 rows=5718 loops=1)
 Total runtime: 3958.204 ms
 Total runtime: 3961.324 ms
 Total runtime: 3940.620 ms

-- A linestring with 10532 points
 Total runtime: 18170.791 ms
 Total runtime: 18179.784 ms
 Total runtime: 18189.938 ms

-- A linestring with 21227 points
 Total runtime: 77720.651 ms
 Total runtime: 77740.417 ms
 Total runtime: 77710.720 ms


As you can see, the generate_series approach does not scale well.
Here are the same three linestrings using your technique (good try, by
the way!):

-- A linestring with 5718 points:
EXPLAIN ANALYZE
SELECT pointn(t.the_geom, g.n) As poly
FROM kneufeld.tmp t CROSS JOIN pgis_gs g
WHERE g.n <= numpoints(the_geom);
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..431332.72 rows=12300000 width=36) (actual time=1.817..4278.020 rows=5718 loops=1)
   ->  Seq Scan on tmp t  (cost=0.00..22.30 rows=1230 width=32) (actual time=0.008..0.013 rows=1 loops=1)
   ->  Index Scan using pgis_gs_pkey on pgis_gs g  (cost=0.00..175.66 rows=10000 width=4) (actual time=0.094..30.803 rows=5718 loops=1)
         Index Cond: (g.n <= numpoints(t.the_geom))
 Total runtime: 4301.425 ms
 Total runtime: 4316.355 ms
 Total runtime: 4351.617 ms

-- A linestring with 10532 points
 Total runtime: 18170.791 ms
 Total runtime: 18179.784 ms
 Total runtime: 18189.938 ms

-- A linestring with 21227 points
 Total runtime: 77833.419 ms
 Total runtime: 77826.653 ms
 Total runtime: 77793.823 ms


I don't think the problem has anything to do with the result key being
indexed or not.  pg_class reveals that even the largest of the three
geometries fits in one page ... so the problem is also not related to
toast tables.  It may have something to do with the way PostgreSQL passes
parameters around ... do you know if it passes parameters by value or by
reference?  Does it make a copy of "the_geom" for every call to numpoints,
generate_series, and pointn?  That would explain the roughly quadratic
computation time we're seeing.
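
One thing I might try next is pushing the loop into a small plpgsql
function so the geometry only has to be fetched once per row.  Just an
untested sketch of the idea (dump_points is a made-up name):

-- untested sketch: read the geometry into a local variable once, then loop,
-- so every pointn() call should be working off the same in-memory copy
CREATE OR REPLACE FUNCTION kneufeld.dump_points(geom geometry)
RETURNS SETOF geometry AS $$
DECLARE
    i    integer;
    npts integer := numpoints(geom);
BEGIN
    FOR i IN 1 .. npts LOOP
        RETURN NEXT pointn(geom, i);
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- e.g.
-- SELECT * FROM kneufeld.dump_points((SELECT the_geom FROM kneufeld.tmp LIMIT 1));

If the copy-per-call theory is right, I'd expect that to come back in
roughly linear time.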

Cheers,
Kevin

-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7

Phone: (250) 383-3022
Email: kneufeld at refractions.net


Obe, Regina wrote:
> I know there has been some discussion in the past about how
> generate_series doesn't scale well when you are dealing with large
> numbers of geometries.  My suspicion was that it was because
> generate_series has to materialize the result before outputting it and
> that the result key is not indexed.
>
> Unfortunately I don't have geometries large enough to test the theory
> out, but it does seem to be faster for me, even for structures where I have
> more than 10 or so geometries per multipolygon.  Can some people try this
> out?  The below will take each multi geom and break it out into single
> geom structures.  In theory it will handle a MULTI structure with up to
> 500,000 geoms, and it can easily handle more by increasing the number of
> records in the materialized generate_series table.
>
> Can you compare this:
>
> SELECT g.n
> 	INTO pgis_gs
> FROM generate_series(1,500000) g(n);
>
> ALTER TABLE pgis_gs
>   ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN pgis_gs g
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> With these:
>
> -- this will handle at most 5000 geometries per MULTI (and incurs a
> huge penalty for small sets of geometries)
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN generate_series(1,5000) g(n)
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> -- this can handle more, but I suspect it gets really bad for large numbers
> SELECT t.gid, ST_GeometryN(the_geom, generate_series(1, ST_NumGeometries(the_geom))) As poly
> FROM somtable t;
>
> Thanks,
> Regina


