<HTML dir=ltr><HEAD><TITLE>Re: [postgis-users] Does anyone have multipolygons with large numbersof polygons</TITLE>

<META http-equiv=Content-Type content="text/html; charset=unicode">

<META content="MSHTML 6.00.2900.3314" name=GENERATOR></HEAD>

<BODY>

<DIV id=idOWAReplyText61654 dir=ltr>

<DIV dir=ltr><FONT face=Arial color=#000000 size=2>Not sure the answer to your questions since I don't know much about the internals of PostgreSQL.  I always assumed by ref for large objects.  </FONT></DIV>

<DIV dir=ltr><FONT face=Arial color=#000000 size=2></FONT> </DIV>

<DIV dir=ltr><FONT face=Arial color=#000000 size=2>We could try to force it to calculate the numpoints call only once and see if it makes a difference in timings</FONT></DIV>

<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>

<DIV dir=ltr><FONT face=Arial size=2>Something of the form</FONT></DIV>

<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>

<DIV dir=ltr><FONT size=2>SELECT pointn(t.the_geom, g.n) As poly<BR>FROM (SELECT the_geom, numpoints(the_geom) As np FROM kneufeld.tmp) t CROSS JOIN pgis_gs g<BR>WHERE g.n <= np;</FONT></DIV>

<DIV dir=ltr><FONT size=2></FONT> </DIV>

<DIV dir=ltr><FONT size=2></FONT> </DIV>

<DIV dir=ltr><FONT face=Arial size=2>Also why are the timings for </FONT></DIV>

<DIV dir=ltr><FONT size=2>-- A linestring with 10532 points  </FONT></DIV>

<DIV dir=ltr><FONT size=2></FONT> </DIV>

<DIV dir=ltr><FONT size=2>exactly the same in both cases or was that a bad cut and paste job :)</FONT></DIV>

<DIV dir=ltr><FONT size=2></FONT> </DIV>

<DIV dir=ltr><FONT size=2>Thanks,</FONT></DIV>

<DIV dir=ltr><FONT size=2>Regina</FONT></DIV>

<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV></DIV>

<DIV dir=ltr><BR>

<HR tabIndex=-1>

<FONT face=Tahoma size=2><B>From:</B> postgis-users-bounces@postgis.refractions.net on behalf of Kevin Neufeld<BR><B>Sent:</B> Fri 4/18/2008 12:10 PM<BR><B>To:</B> PostGIS Users Discussion<BR><B>Subject:</B> Re: [postgis-users] Does anyone have multipolygons with large numbersof polygons<BR></FONT><BR></DIV>

<DIV>

<P><FONT size=2>Hi Regina,<BR><BR>Unfortunately, I don't think your solution works.  I performed some<BR>tests extracting points from a linestring (instead of a geometry from a<BR>collection - generate_series has the same performance problems).<BR><BR>I took three linestrings, with ~5000, ~10000, and ~20000 points and<BR>tried to extract the coordinates.<BR>Here are my results:<BR><BR>-- A linestring with 5718 points:<BR>EXPLAIN ANALYZE select pointn(the_geom, generate_series(1,<BR>numpoints(the_geom))) from kneufeld.tmp;<BR>                                               QUERY<BR>PLAN                                             <BR>--------------------------------------------------------------------------------------------------------<BR> Seq Scan on tmp  (cost=0.00..31.53 rows=1230 width=32) (actual<BR>time=1.518..3936.619 rows=5718 loops=1)<BR> Total runtime: 3958.204 ms<BR> Total runtime: 3961.324 ms<BR> Total runtime: 3940.620 ms<BR><BR>-- A linestring with 10532 points<BR> Total runtime: 18170.791 ms<BR> Total runtime: 18179.784 ms<BR> Total runtime: 18189.938 ms<BR><BR>-- A linestring with 21227 points<BR> Total runtime: 77720.651 ms<BR> Total runtime: 77740.417 ms<BR> Total runtime: 77710.720 ms<BR><BR><BR>As you can see the generate_series way does not scale well.<BR>Here are the same three linestrings using your technique (good try by<BR>the way!)<BR><BR>-- A linestring with 5718 points:<BR>EXPLAIN ANALYZE<BR>SELECT pointn(t.the_geom, g.n) As poly<BR>FROM kneufeld.tmp t CROSS JOIN pgis_gs g<BR>WHERE g.n <= numpoints(the_geom);<BR>                                                               QUERY<BR>PLAN                                                             <BR>----------------------------------------------------------------------------------------------------------------------------------------<BR> Nested Loop  (cost=0.00..431332.72 rows=12300000 width=36) (actual<BR>time=1.817..4278.020 rows=5718 loops=1)<BR>   ->  Seq Scan on tmp t  (cost=0.00..22.30 rows=1230 width=32) (actual<BR>time=0.008..0.013 rows=1 loops=1)<BR>   ->  Index Scan using pgis_gs_pkey on pgis_gs g  (cost=0.00..175.66<BR>rows=10000 width=4) (actual time=0.094..30.803 rows=5718 loops=1)<BR>         Index Cond: (g.n <= numpoints(t.the_geom))<BR> Total runtime: 4301.425 ms<BR> Total runtime: 4316.355 ms<BR> Total runtime: 4351.617 ms<BR><BR>-- A linestring with 10532 points<BR> Total runtime: 18170.791 ms<BR> Total runtime: 18179.784 ms<BR> Total runtime: 18189.938 ms<BR><BR>-- A linestring with 21227 points<BR> Total runtime: 77833.419 ms<BR> Total runtime: 77826.653 ms<BR> Total runtime: 77793.823 ms<BR><BR><BR>I think the problem has anything to do with the result key being indexed<BR>or not.  pg_class reveals that even the largest of the three geometries<BR>fit in one page ... so the problem is also not related to toast tables. <BR>I may have something to do with way PostgreSQL passes parameters<BR>around... do you know if it passes parameters by value or by reference. <BR>Does it make a copy of "the_geom" for every call to numpoints,<BR>generate_series, and pointn?  This would explain the exponential<BR>computation time we're seeing.<BR><BR>Cheers,<BR>Kevin<BR><BR>-------------<BR>Kevin Neufeld<BR>Software Developer<BR>Refractions Research Inc.<BR>300-1207 Douglas St.<BR>Victoria, B.C., V8W 2E7<BR><BR>Phone: (250) 383-3022<BR>Email: kneufeld@refractions.net<BR><BR><BR>Obe, Regina wrote:<BR>> I know there has been some discussion in the past about how<BR>> generate_series doesn't scale well when you are dealing with large<BR>> numbers of geometries.  My suspicion was that it was because<BR>> generate_series has to materialize the result before outputting it and<BR>> that the result key is not indexed.<BR>><BR>> Unfortunately I don't have geometries large enough to test the theory<BR>> out, but does seem to be faster for me even for  structures where I have<BR>> more than 10 or geometries per multipolygon.  Can some people try this<BR>> out.  The below will take each multi geom and break it out into single<BR>> geom structures and will handle in theory a MULTI structure with up to<BR>> 500,000 geoms and can easily handle more by increasing the number of<BR>> records in the materialized generate series table.<BR>><BR>> Can you compare this:<BR>><BR>> SELECT g.n<BR>>       INTO pgis_gs<BR>> FROM generate_series(1,500000) g(n);<BR>><BR>> ALTER TABLE pgis_gs<BR>>   ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);<BR>><BR>> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly<BR>> FROM somtable t CROSS JOIN pgis_gs g<BR>> WHERE g.n <= ST_NumGeometries(the_geom);<BR>><BR>> With these:<BR>><BR>> --this will handle at most 5000 (for small sets of geometries - incurs a<BR>> huge penalty)<BR>><BR>> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly<BR>> FROM somtable t CROSS JOIN generate_series(1,5000) g(n)<BR>> WHERE g.n <= ST_NumGeometries(the_geom);<BR>><BR>> -this can handle more but I suspect gets really bad for large numbers<BR>> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As poly<BR>> FROM somtable t;<BR>><BR>> Thanks,<BR>> Regina<BR>> -----------------------------------------<BR>> The substance of this message, including any attachments, may be<BR>> confidential, legally privileged and/or exempt from disclosure<BR>> pursuant to Massachusetts law. It is intended<BR>> solely for the addressee. If you received this in error, please<BR>> contact the sender and delete the material from any computer.<BR>><BR>> _______________________________________________<BR>> postgis-users mailing list<BR>> postgis-users@postgis.refractions.net<BR>> <A href="http://postgis.refractions.net/mailman/listinfo/postgis-users">http://postgis.refractions.net/mailman/listinfo/postgis-users</A><BR>>  <BR>_______________________________________________<BR>postgis-users mailing list<BR>postgis-users@postgis.refractions.net<BR><A href="http://postgis.refractions.net/mailman/listinfo/postgis-users">http://postgis.refractions.net/mailman/listinfo/postgis-users</A><BR></FONT></P></DIV></BODY></HTML>