[postgis-devel] GeomUnion speedups

Sat Jun 25 18:40:55 PDT 2005

On Sun, 26 Jun 2005 strk at refractions.net wrote:

> On Sat, Jun 25, 2005 at 06:58:07PM -0400, Bill Binko wrote:
> ...
> > As I mentioned before, I really think the extra overhead incurred by 
> > MemGeomUnion (converting from PostGIS<->GEOS repeatedly) can be removed by 
> > passing an opaque handle to the internal GEOS structures that you're 
> > building.
> 
> Do you mean a memory pointer ?
> It would leak on ERROR though...

I do mean that...

I thought the ffunc got called even if the sfunc errored out... but that 
might have been more common sense then actual knowledge... I will look it 
up.  If so, you could cleanup there.  We should look at some of the other 
aggregation functions out there.  Certainly there is a safe way to do it 
or they wouldn't bother with the sfunc/ffunc split!

> The only way to really reduce memory is dissolving nodes/edges
> by means of Overlay operations, performed by GEOS/JTS.
> 
> Chunking would be useful for that, with the cost of more
> graph constructions and conversions (not so slow).

True on the graph constructions, but I was under the (perhaps incorrect 
assumption) that the conversions could be factored out by passing the 
pointer.

> > That would be nicely configurable (how many per chuck, etc).
> 
> I'd use memory size to define chuncks if possible.

I thought about that (I really did), but didn't know how hard it was to 
keep track of the size of the GEOS structure.

> > I am surprised that the new collect/buffer is not impacted by order: is 
> > that a supposition?  Or did you test that?
> 
> It's the evidence ;)
> GEOS operation is invoked only once, so a single graph is built
> with all nodes/edges of all geometries, and a single conversion
> is made. Order doesn't count then.

So basically, all of the nodes&edges are loaded no matter what.  But 
doesn't it still have to traverse them all?  Does it know to traverse them 
in a sensible order (r-tree) or is the tree reflective of the order 
they're added?

My original question was: have you actually tested this... I wasn't trying 
to be facitious (honest!): I just wanted to know whether I needed to test 
them with the large polygon set I had while I was testing the others.

> So chunking would again have the ORDERING factor to take in
> consideration.
> I'm tempted to leave all of this in the user's hand ;)
> Given currently available tools you can chose your algorithm
> for a performant union operation. These are the ingredients:
> 	collect()
> 	buffer()
> 	ORDER BY
> 	LIMIT

I'd hope both standards-compliance and common decency would force you to
choose a reasonable combination of those for geomUnion() :-b

Just kidding... it's obvious we're making progress.  Hopefully my next 
note will have more info and less opinion :)

Bill