[postgis-users] Does anyone have multipolygons with large numbersof polygons
Obe, Regina
robe.dnd at cityofboston.gov
Fri Apr 18 10:48:02 PDT 2008
Not sure the answer to your questions since I don't know much about the internals of PostgreSQL. I always assumed by ref for large objects.
We could try to force it to calculate the numpoints call only once and see if it makes a difference in timings
Something of the form
SELECT pointn(t.the_geom, g.n) As poly
FROM (SELECT the_geom, numpoints(the_geom) As np FROM kneufeld.tmp) t CROSS JOIN pgis_gs g
WHERE g.n <= np;
Also why are the timings for
-- A linestring with 10532 points
exactly the same in both cases or was that a bad cut and paste job :)
Thanks,
Regina
________________________________
From: postgis-users-bounces at postgis.refractions.net on behalf of Kevin Neufeld
Sent: Fri 4/18/2008 12:10 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Does anyone have multipolygons with large numbersof polygons
Hi Regina,
Unfortunately, I don't think your solution works. I performed some
tests extracting points from a linestring (instead of a geometry from a
collection - generate_series has the same performance problems).
I took three linestrings, with ~5000, ~10000, and ~20000 points and
tried to extract the coordinates.
Here are my results:
-- A linestring with 5718 points:
EXPLAIN ANALYZE select pointn(the_geom, generate_series(1,
numpoints(the_geom))) from kneufeld.tmp;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on tmp (cost=0.00..31.53 rows=1230 width=32) (actual
time=1.518..3936.619 rows=5718 loops=1)
Total runtime: 3958.204 ms
Total runtime: 3961.324 ms
Total runtime: 3940.620 ms
-- A linestring with 10532 points
Total runtime: 18170.791 ms
Total runtime: 18179.784 ms
Total runtime: 18189.938 ms
-- A linestring with 21227 points
Total runtime: 77720.651 ms
Total runtime: 77740.417 ms
Total runtime: 77710.720 ms
As you can see the generate_series way does not scale well.
Here are the same three linestrings using your technique (good try by
the way!)
-- A linestring with 5718 points:
EXPLAIN ANALYZE
SELECT pointn(t.the_geom, g.n) As poly
FROM kneufeld.tmp t CROSS JOIN pgis_gs g
WHERE g.n <= numpoints(the_geom);
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..431332.72 rows=12300000 width=36) (actual
time=1.817..4278.020 rows=5718 loops=1)
-> Seq Scan on tmp t (cost=0.00..22.30 rows=1230 width=32) (actual
time=0.008..0.013 rows=1 loops=1)
-> Index Scan using pgis_gs_pkey on pgis_gs g (cost=0.00..175.66
rows=10000 width=4) (actual time=0.094..30.803 rows=5718 loops=1)
Index Cond: (g.n <= numpoints(t.the_geom))
Total runtime: 4301.425 ms
Total runtime: 4316.355 ms
Total runtime: 4351.617 ms
-- A linestring with 10532 points
Total runtime: 18170.791 ms
Total runtime: 18179.784 ms
Total runtime: 18189.938 ms
-- A linestring with 21227 points
Total runtime: 77833.419 ms
Total runtime: 77826.653 ms
Total runtime: 77793.823 ms
I think the problem has anything to do with the result key being indexed
or not. pg_class reveals that even the largest of the three geometries
fit in one page ... so the problem is also not related to toast tables.
I may have something to do with way PostgreSQL passes parameters
around... do you know if it passes parameters by value or by reference.
Does it make a copy of "the_geom" for every call to numpoints,
generate_series, and pointn? This would explain the exponential
computation time we're seeing.
Cheers,
Kevin
-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7
Phone: (250) 383-3022
Email: kneufeld at refractions.net
Obe, Regina wrote:
> I know there has been some discussion in the past about how
> generate_series doesn't scale well when you are dealing with large
> numbers of geometries. My suspicion was that it was because
> generate_series has to materialize the result before outputting it and
> that the result key is not indexed.
>
> Unfortunately I don't have geometries large enough to test the theory
> out, but does seem to be faster for me even for structures where I have
> more than 10 or geometries per multipolygon. Can some people try this
> out. The below will take each multi geom and break it out into single
> geom structures and will handle in theory a MULTI structure with up to
> 500,000 geoms and can easily handle more by increasing the number of
> records in the materialized generate series table.
>
> Can you compare this:
>
> SELECT g.n
> INTO pgis_gs
> FROM generate_series(1,500000) g(n);
>
> ALTER TABLE pgis_gs
> ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN pgis_gs g
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> With these:
>
> --this will handle at most 5000 (for small sets of geometries - incurs a
> huge penalty)
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN generate_series(1,5000) g(n)
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> -this can handle more but I suspect gets really bad for large numbers
> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As poly
> FROM somtable t;
>
> Thanks,
> Regina
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20080418/a3c719eb/attachment.html>
More information about the postgis-users
mailing list