[postgis-users] Does anyone have multipolygons with large numbersof polygons
Martin Davis
mbdavis at refractions.net
Fri Apr 18 14:24:34 PDT 2008
Yes, I absolutely agree - SPLIT BY should take "Inverse Aggregate"
functions, which can be provided for *any* complex object. The example
they give in the Manifold doc is classic - split a string into "chunks"
at given characters (or regexes...). It's funny that they give that
example - but it isn't actually supported by Manifold!
So it looks like it's the pass-by-value semantics which are slowing
things down, not the generate_series idiom. Hopefully a SPLIT BY
facility would be able to avoid this problem - or would it?
While we're waiting for that, Regina, I suggest maybe getting a puppy? 8^)
Obe, Regina wrote:
> I remember looking at Manifold's SPLIT BY function and thinking -
> "that's what I'm missing in my life - a inverse GROUP BY."
>
> I was disappointed to find out that there SPLIT BY only supports 3
> split functions which makes it much more limiting than the
> generate_series approach. It makes nice syntactic sugar at the very
> least and perhaps if PostgreSQL could implement SPLIT TYPE functions
> which would be the inverse of the AGGREGATE functions, then this would
> be much more interesting.
>
> Looking at Kevin's results I am beginning to think the bottleneck may be
> in the GeomN type functions and maybe he is right that its doing a copy
> call for each or something silly like that or a memory leak (I wasn't
> looking is it linear or quadratic as the number of geometries
> increases?).
>
> I'm relieved the sub select thing came up with the same results as if it
> was better it would have shattered some of my beliefs on how things
> work.
>
> Kevin,
> What happens if you leave out the PointN call altogether just to see
> what kind of penalty is being incurred there.
> e.g.
> SELECT t.gid, generate_series(1, ST_NumPoints(the_geom))
> FROM kneufeld.tmp t
>
> Thanks,
> Regina
>
>
>
> -----Original Message-----
> From: postgis-users-bounces at postgis.refractions.net
> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
> Martin Davis
> Sent: Friday, April 18, 2008 2:22 PM
> To: PostGIS Users Discussion
> Subject: Re: [postgis-users] Does anyone have multipolygons with large
> numbersof polygons
>
> This is an interesting problem. I think the general pattern can be
> stated as something like: "Split a complex column value across multiple
> rows, along with other columns from the original record". (A nice way
> of thinking of this is that it's the inverse of GROUP BY).
>
> This is surprisingly awkward to do in standard SQL. It's easier in
> PostgreSQL since it has the generate_series() enhancement, but as you
> are seeing it's still not a great general solution to the problem.
>
> The annoying thing is is that the underlying processing required to
> compute the desired result is actually pretty trivial. This is one of
> those situations where the fairly limited declaritive semantics of SQL
> get in your way.
>
> Manifold SQL has an extension which adds a SPLIT BY clause to handle
> exactly this problem. I suspect they have a pretty efficient
> implementation, since it is implemented natively in the engine.
>
> http://www.manifold.net/doc/queries_and_geoms.htm
>
> Maybe it's time to lobby to get this added to PostgreSQL as well?
>
> Obe, Regina wrote:
>
>> I know there has been some discussion in the past about how
>> generate_series doesn't scale well when you are dealing with large
>> numbers of geometries. My suspicion was that it was because
>> generate_series has to materialize the result before outputting it and
>> that the result key is not indexed.
>>
>> Unfortunately I don't have geometries large enough to test the theory
>> out, but does seem to be faster for me even for structures where I
>>
> have
>
>> more than 10 or geometries per multipolygon. Can some people try this
>> out. The below will take each multi geom and break it out into single
>> geom structures and will handle in theory a MULTI structure with up to
>> 500,000 geoms and can easily handle more by increasing the number of
>> records in the materialized generate series table.
>>
>> Can you compare this:
>>
>> SELECT g.n
>> INTO pgis_gs
>> FROM generate_series(1,500000) g(n);
>>
>> ALTER TABLE pgis_gs
>> ADD CONSTRAINT pgis_gs_pkey PRIMARY KEY(n);
>>
>> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
>> FROM somtable t CROSS JOIN pgis_gs g
>> WHERE g.n <= ST_NumGeometries(the_geom);
>>
>> With these:
>>
>> --this will handle at most 5000 (for small sets of geometries - incurs
>>
> a
>
>> huge penalty)
>>
>> SELECT t.gid, ST_GeometryN(the_geom, g.n) As poly
>> FROM somtable t CROSS JOIN generate_series(1,5000) g(n)
>> WHERE g.n <= ST_NumGeometries(the_geom);
>>
>> -this can handle more but I suspect gets really bad for large numbers
>> SELECT t.gid, ST_GeometryN(the_geom, ST_NumGeometries(the_geom)) As
>>
> poly
>
>> FROM somtable t;
>>
>> Thanks,
>> Regina
>> -----------------------------------------
>> The substance of this message, including any attachments, may be
>> confidential, legally privileged and/or exempt from disclosure
>> pursuant to Massachusetts law. It is intended
>> solely for the addressee. If you received this in error, please
>> contact the sender and delete the material from any computer.
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>>
>
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
More information about the postgis-users
mailing list