[postgis-devel] GeomUnion speedups

strk at refractions.net strk at refractions.net
Sun Jun 26 02:17:26 PDT 2005


On Sat, Jun 25, 2005 at 09:40:55PM -0400, Bill Binko wrote:
> On Sun, 26 Jun 2005 strk at refractions.net wrote:
> 
> > On Sat, Jun 25, 2005 at 06:58:07PM -0400, Bill Binko wrote:
> > ...
> > > As I mentioned before, I really think the extra overhead incurred by 
> > > MemGeomUnion (converting from PostGIS<->GEOS repeatedly) can be removed by 
> > > passing an opaque handle to the internal GEOS structures that you're 
> > > building.
> > 
> > Do you mean a memory pointer ?
> > It would leak on ERROR though...
> 
> I do mean that...
> 
> I thought the ffunc got called even if the sfunc errored out... but that 
> might have been more common sense than actual knowledge... I will look it 
> up.  If so, you could clean up there.  We should look at some of the other 
> aggregation functions out there.  Certainly there is a safe way to do it 
> or they wouldn't bother with the sfunc/ffunc split!

My initial implementation of GeomUnion used an array of pointers to
Geometries detoasted in sfunc as the base type.
Those wouldn't have leaked because they were under pgsql memory management
control (palloc'ed memory), and I never encountered segfaults due to early
deletion. But that was pretty hackish, since in theory the base type should
still be a real (serialized) type. Refer to:
http://postgis.refractions.net/pipermail/postgis-devel/2004-December/000680.html

> > The only way to really reduce memory is dissolving nodes/edges
> > by means of Overlay operations, performed by GEOS/JTS.
> > 
> > Chunking would be useful for that, with the cost of more
> > graph constructions and conversions (not so slow).
> 
> True on the graph constructions, but I was under the (perhaps incorrect) 
> assumption that the conversions could be factored out by passing the 
> pointer.

I'm open to it, but will leave it as a last option to avoid leaks.
GEOS doesn't allow custom memory management functions so far, so
its structures would not be under pgsql control.

> > > That would be nicely configurable (how many per chunk, etc).
> > 
> > I'd use memory size to define chunks if possible.
> 
> I thought about that (I really did), but didn't know how hard it was to 
> keep track of the size of the GEOS structure.

I'd use the size of the postgis geometries instead.
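
To make the size-based chunking idea concrete, here is a minimal Python
sketch. It is only an illustration, not PostGIS code: chunk_by_size and
max_bytes are hypothetical names, and the sizes are assumed to be the
serialized postgis geometry sizes in bytes, obtained by whatever means.

```python
from typing import Iterable, List


def chunk_by_size(sizes: Iterable[int], max_bytes: int) -> List[List[int]]:
    """Group item indices into consecutive chunks whose summed size stays
    within max_bytes. An item larger than max_bytes gets its own chunk."""
    chunks: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, size in enumerate(sizes):
        # Close the running chunk if adding this item would exceed the budget.
        if current and current_bytes + size > max_bytes:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical serialized sizes (bytes) of five geometries, 1000-byte budget.
example_chunks = chunk_by_size([400, 300, 500, 1200, 100], max_bytes=1000)
```

Each chunk would then be unioned on its own and the partial results
unioned together. The input order still decides which geometries share
a chunk.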

> > > I am surprised that the new collect/buffer is not impacted by order: is 
> > > that a supposition?  Or did you test that?
> > 
> > It's the evidence ;)
> > GEOS operation is invoked only once, so a single graph is built
> > with all nodes/edges of all geometries, and a single conversion
> > is made. Order doesn't count then.
> 
> So basically, all of the nodes&edges are loaded no matter what.  But 
> doesn't it still have to traverse them all?  Does it know to traverse them 
> in a sensible order (r-tree) or is the tree reflective of the order 
> they're added?

Nope, it doesn't know. But in the incremental version, not only are ALL
nodes&edges loaded, they are loaded MULTIPLE TIMES (until they dissolve).
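
A toy cost model of the difference, as a Python sketch. It assumes the
worst case, where nothing dissolves and the running result keeps every
edge. Real unions dissolve shared edges, so actual numbers would be
lower. The function names and edge counts are made up for illustration.

```python
from typing import List


def edges_loaded_single_pass(edge_counts: List[int]) -> int:
    """One graph is built from all geometries: each edge is loaded once."""
    return sum(edge_counts)


def edges_loaded_incremental(edge_counts: List[int]) -> int:
    """Worst case for the incremental version: every union step builds a
    graph from the running result plus the next geometry, and nothing
    dissolves, so the running result never shrinks."""
    if not edge_counts:
        return 0
    accumulated = edge_counts[0]  # the first geometry just becomes the state
    total = 0
    for count in edge_counts[1:]:
        total += accumulated + count  # edges loaded for this union step
        accumulated += count
    return total


# Four hypothetical geometries with 10 edges each.
single = edges_loaded_single_pass([10, 10, 10, 10])
incremental = edges_loaded_incremental([10, 10, 10, 10])
```

Under this assumption the single-pass load grows linearly with the
number of geometries, while the incremental load grows quadratically.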

> My original question was: have you actually tested this... I wasn't trying 
> to be facetious (honest!): I just wanted to know whether I needed to test 
> them with the large polygon set I had while I was testing the others.

Sure, since you're there ;)

> > So chunking would again have the ORDERING factor to take into
> > consideration.
> > I'm tempted to leave all of this in the user's hands ;)
> > Given currently available tools you can choose your algorithm
> > for a performant union operation. These are the ingredients:
> > 	collect()
> > 	buffer()
> > 	ORDER BY
> > 	LIMIT
> 
> I'd hope both standards-compliance and common decency would force you to
> choose a reasonable combination of those for geomUnion() :-b
> 
> Just kidding... it's obvious we're making progress.  Hopefully my next 
> note will have more info and less opinion :)

Thank you! If you can find a way to also track memory usage,
that would be great for comparing manual buffer(collect()) queries
with the corresponding internal implementation.

I've committed the internal implementation in postgis head.
You can switch it on and off at compile time using the UNITE_USING_BUFFER
define. It defaults to ON for GEOS (and OFF for JTS, which segfaults).

Thanks

--strk;


