[postgis-users] Does anyone have multipolygons with large numbersof polygons

Obe, Regina robe.dnd at cityofboston.gov
Fri Apr 18 12:54:43 PDT 2008


I remember looking at Manifold's SPLIT BY function and thinking -
"that's what I'm missing in my life - a inverse GROUP BY." 

I was disappointed to find out that there SPLIT BY  only supports 3
split functions which makes it much more limiting than the
generate_series approach.  It makes nice syntactic sugar at the very
least and perhaps if PostgreSQL could implement SPLIT TYPE functions
which would be the inverse of the AGGREGATE  functions, then this would
be much more interesting.

Looking at Kevin's results I am beginning to think the bottleneck may be
in the GeomN type functions and maybe he is right that its doing a copy
call for each or something silly like that or a memory leak (I wasn't
looking is it linear or quadratic as the number of geometries
increases?).

I'm relieved the sub select thing came up with the same results as if it
was better it would have shattered some of my beliefs on how things
work.

Kevin,
What happens if you leave out the PointN call altogether just to see
what kind of penalty is being incurred there.
e.g.
SELECT t.gid, generate_series(1, ST_NumPoints(the_geom))
FROM kneufeld.tmp t

Thanks,
Regina



-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Martin Davis
Sent: Friday, April 18, 2008 2:22 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Does anyone have multipolygons with large
numbersof polygons

This is an interesting problem.  I think the general pattern can be 
stated as something like: "Split a complex column value across multiple 
rows, along with other columns from the original record".  (A nice way 
of thinking of this is that it's the inverse of GROUP BY).

This is surprisingly awkward to do in standard SQL.  It's easier in 
PostgreSQL since it has the generate_series() enhancement, but as you 
are seeing it's still not a great general solution to the problem.

The annoying thing is is that the underlying processing required to 
compute the desired result is actually pretty trivial.  This is one of 
those situations where the fairly limited declaritive semantics of SQL 
get in your way.

Manifold SQL has an extension which adds a SPLIT BY clause to handle 
exactly this problem.  I suspect they have a pretty efficient 
implementation, since it is implemented natively in the engine.

http://www.manifold.net/doc/queries_and_geoms.htm

Maybe it's time to lobby to get this added to PostgreSQL as well?

Obe, Regina wrote:
> I know there has been some discussion in the past about how
> generate_series doesn't scale well when you are dealing with large
> numbers of geometries.  My suspicion was that it was because
> generate_series has to materialize the result before outputting it and
> that the result key is not indexed.
>
> Unfortunately I don't have geometries large enough to test the theory
> out, but does seem to be faster for me even for  structures where I
have
> more than 10 or geometries per multipolygon.  Can some people try this
> out.  The below will take each multi geom and break it out into single
> geom structures and will handle in theory a MULTI structure with up to
> 500,000 geoms and can easily handle more by increasing the number of
> records in the materialized generate series table.
>
> Can you compare this:
>
> SELECT g.n
> 	INTO pgis_gs
> FROM generate_series(1,500000) g(n);
>
> ALTER TABLE pgis_gs
>   ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN pgis_gs g
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> With these:
>
> --this will handle at most 5000 (for small sets of geometries - incurs
a
> huge penalty)
>
> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
> FROM somtable t CROSS JOIN generate_series(1,5000) g(n)
> WHERE g.n <= ST_NumGeometries(the_geom);
>
> -this can handle more but I suspect gets really bad for large numbers
> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As
poly
> FROM somtable t;
>
> Thanks,
> Regina
> -----------------------------------------
> The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended
> solely for the addressee. If you received this in error, please
> contact the sender and delete the material from any computer.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>   

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022

_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users



More information about the postgis-users mailing list