[postgis-users] Does anyone have multipolygons withlarge numbersof polygons

Paragon Corporation lr at pcorp.us
Sat Apr 19 04:20:14 PDT 2008


It could be the reentry and prepare of the geom that is the culprit.  I
haven't looked at the code, but from thinking about it, would think there
would be some repetitive setup involved in each case (even if passing were
efficient) and then pulling out a single point from a reprepared array of
points.

Given that PostgreSQL has this side effect behavior of allowing set
returning functions in the SELECT part.  Couldn't we then create a SPLIT BY
geom function similar to DUMP, but returns a set of geoms instead of a dump
object.

Then  Kevin's query would be reduced to

SELECT t.gid, Geom_SplitBy(the_geom, 'POINT') as the_geom 
FROM 
	kneufeld.tmp t

You could even have an overloaded allows a range of geoms to pull that does

SELECT t.gid, Geom_SplitBy(the_geom, 'POINT', 1,5) as the_geom 
FROM 
	kneufeld.tmp t

Types would map to our GeomN type functions like PointN, GeometryN,
InteriorRingN.  This would avoid any pass by value inefficiencies since
the_geom would be passed only once and would allow for internal efficiencies
of pulling multiple geometries.

As a side note: A colony of mice is much more fun to play with than a puppy
:)

Thanks,
Regina

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Martin
Davis
Sent: Friday, April 18, 2008 5:25 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Does anyone have multipolygons withlarge
numbersof polygons

Yes, I absolutely agree - SPLIT BY should take "Inverse Aggregate" 
functions, which can be provided for *any* complex object.  The example they
give in the Manifold doc is classic - split a string into "chunks" 
at given characters (or regexes...).  It's funny that they give that example
- but it isn't actually supported by Manifold!

So it looks like it's the pass-by-value semantics which are slowing things
down, not the generate_series idiom.  Hopefully a SPLIT BY facility would be
able to avoid this problem - or would it?

While we're waiting for that, Regina, I suggest maybe getting a puppy?  8^)

Obe, Regina wrote:
> I remember looking at Manifold's SPLIT BY function and thinking - 
> "that's what I'm missing in my life - a inverse GROUP BY."
>
> I was disappointed to find out that there SPLIT BY  only supports 3 
> split functions which makes it much more limiting than the 
> generate_series approach.  It makes nice syntactic sugar at the very 
> least and perhaps if PostgreSQL could implement SPLIT TYPE functions 
> which would be the inverse of the AGGREGATE  functions, then this 
> would be much more interesting.
>
> Looking at Kevin's results I am beginning to think the bottleneck may 
> be in the GeomN type functions and maybe he is right that its doing a 
> copy call for each or something silly like that or a memory leak (I 
> wasn't looking is it linear or quadratic as the number of geometries 
> increases?).
>
> I'm relieved the sub select thing came up with the same results as if 
> it was better it would have shattered some of my beliefs on how things 
> work.
>
> Kevin,
> What happens if you leave out the PointN call altogether just to see 
> what kind of penalty is being incurred there.
> e.g.
> SELECT t.gid, generate_series(1, ST_NumPoints(the_geom)) FROM 
> kneufeld.tmp t
>
> Thanks,
> Regina
>
>
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of 
> Martin Davis
> Sent: Friday, April 18, 2008 2:22 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Does anyone have multipolygons with large 
> numbersof polygons
>
> This is an interesting problem.  I think the general pattern can be 
> stated as something like: "Split a complex column value across 
> multiple rows, along with other columns from the original record".  (A 
> nice way of thinking of this is that it's the inverse of GROUP BY).
>
> This is surprisingly awkward to do in standard SQL.  It's easier in 
> PostgreSQL since it has the generate_series() enhancement, but as you 
> are seeing it's still not a great general solution to the problem.
>
> The annoying thing is is that the underlying processing required to 
> compute the desired result is actually pretty trivial.  This is one of 
> those situations where the fairly limited declaritive semantics of SQL 
> get in your way.
>
> Manifold SQL has an extension which adds a SPLIT BY clause to handle 
> exactly this problem.  I suspect they have a pretty efficient 
> implementation, since it is implemented natively in the engine.
>
> http://www.manifold.net/doc/queries_and_geoms.htm
>
> Maybe it's time to lobby to get this added to PostgreSQL as well?
>
> Obe, Regina wrote:
>   
>> I know there has been some discussion in the past about how 
>> generate_series doesn't scale well when you are dealing with large 
>> numbers of geometries.  My suspicion was that it was because 
>> generate_series has to materialize the result before outputting it 
>> and that the result key is not indexed.
>>
>> Unfortunately I don't have geometries large enough to test the theory 
>> out, but does seem to be faster for me even for  structures where I
>>     
> have
>   
>> more than 10 or geometries per multipolygon.  Can some people try 
>> this out.  The below will take each multi geom and break it out into 
>> single geom structures and will handle in theory a MULTI structure 
>> with up to 500,000 geoms and can easily handle more by increasing the 
>> number of records in the materialized generate series table.
>>
>> Can you compare this:
>>
>> SELECT g.n
>> 	INTO pgis_gs
>> FROM generate_series(1,500000) g(n);
>>
>> ALTER TABLE pgis_gs
>>   ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
>>
>> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly FROM somtable t 
>> CROSS JOIN pgis_gs g WHERE g.n <= ST_NumGeometries(the_geom);
>>
>> With these:
>>
>> --this will handle at most 5000 (for small sets of geometries - 
>> incurs
>>     
> a
>   
>> huge penalty)
>>
>> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly FROM somtable t 
>> CROSS JOIN generate_series(1,5000) g(n) WHERE g.n <= 
>> ST_NumGeometries(the_geom);
>>
>> -this can handle more but I suspect gets really bad for large numbers 
>> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As
>>     
> poly
>   
>> FROM somtable t;
>>
>> Thanks,
>> Regina
>> -----------------------------------------
>> The substance of this message, including any attachments, may be 
>> confidential, legally privileged and/or exempt from disclosure 
>> pursuant to Massachusetts law. It is intended solely for the 
>> addressee. If you received this in error, please contact the sender 
>> and delete the material from any computer.
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>   
>>     
>
>   

--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022

_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users





More information about the postgis-users mailing list