[postgis-users] Does anyone have multipolygons with large numbersof polygons

Obe, Regina robe.dnd at cityofboston.gov
Fri Apr 18 10:48:02 PDT 2008


Not sure the answer to your questions since I don't know much about the internals of PostgreSQL.  I always assumed by ref for large objects.  
 
We could try to force it to calculate the numpoints call only once and see if it makes a difference in timings
 
Something of the form
 
SELECT pointn(t.the_geom, g.n) As poly
FROM (SELECT the_geom, numpoints(the_geom) As np FROM kneufeld.tmp) t CROSS JOIN pgis_gs g
WHERE g.n <= np;
 
 
Also why are the timings for 
-- A linestring with 10532 points  
 
exactly the same in both cases or was that a bad cut and paste job :)
 
Thanks,
Regina
 

________________________________

From: postgis-users-bounces at postgis.refractions.net on behalf of Kevin Neufeld
Sent: Fri 4/18/2008 12:10 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Does anyone have multipolygons with large numbersof polygons



Hi Regina,

Unfortunately, I don't think your solution works.  I performed some
tests extracting points from a linestring (instead of a geometry from a
collection - generate_series has the same performance problems).

I took three linestrings, with ~5000, ~10000, and ~20000 points and
tried to extract the coordinates.
Here are my results:

-- A linestring with 5718 points:
EXPLAIN ANALYZE select pointn(the_geom, generate_series(1,
numpoints(the_geom))) from kneufeld.tmp;
                                               QUERY
PLAN                                             
--------------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..31.53 rows=1230 width=32) (actual
time=1.518..3936.619 rows=5718 loops=1)
 Total runtime: 3958.204 ms
 Total runtime: 3961.324 ms
 Total runtime: 3940.620 ms

-- A linestring with 10532 points
 Total runtime: 18170.791 ms
 Total runtime: 18179.784 ms
 Total runtime: 18189.938 ms

-- A linestring with 21227 points
 Total runtime: 77720.651 ms
 Total runtime: 77740.417 ms
 Total runtime: 77710.720 ms


As you can see the generate_series way does not scale well.
Here are the same three linestrings using your technique (good try by
the way!)

-- A linestring with 5718 points:
EXPLAIN ANALYZE
SELECT pointn(t.the_geom, g.n) As poly
FROM kneufeld.tmp t CROSS JOIN pgis_gs g
WHERE g.n <= numpoints(the_geom);
                                                               QUERY
PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..431332.72 rows=12300000 width=36) (actual
time=1.817..4278.020 rows=5718 loops=1)
   ->  Seq Scan on tmp t  (cost=0.00..22.30 rows=1230 width=32) (actual
time=0.008..0.013 rows=1 loops=1)
   ->  Index Scan using pgis_gs_pkey on pgis_gs g  (cost=0.00..175.66
rows=10000 width=4) (actual time=0.094..30.803 rows=5718 loops=1)
         Index Cond: (g.n <= numpoints(t.the_geom))
 Total runtime: 4301.425 ms
 Total runtime: 4316.355 ms
 Total runtime: 4351.617 ms

-- A linestring with 10532 points
 Total runtime: 18170.791 ms
 Total runtime: 18179.784 ms
 Total runtime: 18189.938 ms

-- A linestring with 21227 points
 Total runtime: 77833.419 ms
 Total runtime: 77826.653 ms
 Total runtime: 77793.823 ms


I think the problem has anything to do with the result key being indexed
or not.  pg_class reveals that even the largest of the three geometries
fit in one page ... so the problem is also not related to toast tables. 
I may have something to do with way PostgreSQL passes parameters
around... do you know if it passes parameters by value or by reference. 
Does it make a copy of "the_geom" for every call to numpoints,
generate_series, and pointn?  This would explain the exponential
computation time we're seeing.

Cheers,
Kevin

-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7

Phone: (250) 383-3022
Email: kneufeld at refractions.net


Obe, Regina wrote:
> I know there has been some discussion in the past about how
> generate_series doesn't scale well when you are dealing with large
> numbers of geometries.  My suspicion was that it was because
> generate_series has to materialize the result before outputting it and
> that the result key is not indexed.
>
> Unfortunately I don't have geometries large enough to test the theory
> out, but does seem to be faster for me even for  structures where I have
> more than 10 or geometries per multipolygon.  Can some people try this
> out.  The below will take each multi geom and break it out into single
> geom structures and will handle in theory a MULTI structure with up to
> 500,000 geoms and can easily handle more by increasing the number of
> records in the materialized generate series table.
>
> Can you compare this:
>
> SELECT g.n
>       INTO pgis_gs
> FROM generate_series(1,500000) g(n);
>
> ALTER TABLE pgis_gs
>   ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN pgis_gs g
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> With these:
>
> --this will handle at most 5000 (for small sets of geometries - incurs a
> huge penalty)
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN generate_series(1,5000) g(n)
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> -this can handle more but I suspect gets really bad for large numbers
> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As poly
> FROM somtable t;
>
> Thanks,
> Regina
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>  
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20080418/a3c719eb/attachment.html>


More information about the postgis-users mailing list