[postgis-devel] GeomUnion speedups
Bill Binko
bill at binko.net
Sat Jun 25 18:40:55 PDT 2005
On Sun, 26 Jun 2005 strk at refractions.net wrote:
> On Sat, Jun 25, 2005 at 06:58:07PM -0400, Bill Binko wrote:
> ...
> > As I mentioned before, I really think the extra overhead incurred by
> > MemGeomUnion (converting from PostGIS<->GEOS repeatedly) can be removed by
> > passing an opaque handle to the internal GEOS structures that you're
> > building.
>
> Do you mean a memory pointer ?
> It would leak on ERROR though...
I do mean that...
I thought the ffunc got called even if the sfunc errored out... but that
might have been more common sense then actual knowledge... I will look it
up. If so, you could cleanup there. We should look at some of the other
aggregation functions out there. Certainly there is a safe way to do it
or they wouldn't bother with the sfunc/ffunc split!
> The only way to really reduce memory is dissolving nodes/edges
> by means of Overlay operations, performed by GEOS/JTS.
>
> Chunking would be useful for that, with the cost of more
> graph constructions and conversions (not so slow).
True on the graph constructions, but I was under the (perhaps incorrect
assumption) that the conversions could be factored out by passing the
pointer.
> > That would be nicely configurable (how many per chuck, etc).
>
> I'd use memory size to define chuncks if possible.
I thought about that (I really did), but didn't know how hard it was to
keep track of the size of the GEOS structure.
> > I am surprised that the new collect/buffer is not impacted by order: is
> > that a supposition? Or did you test that?
>
> It's the evidence ;)
> GEOS operation is invoked only once, so a single graph is built
> with all nodes/edges of all geometries, and a single conversion
> is made. Order doesn't count then.
So basically, all of the nodes&edges are loaded no matter what. But
doesn't it still have to traverse them all? Does it know to traverse them
in a sensible order (r-tree) or is the tree reflective of the order
they're added?
My original question was: have you actually tested this... I wasn't trying
to be facitious (honest!): I just wanted to know whether I needed to test
them with the large polygon set I had while I was testing the others.
> So chunking would again have the ORDERING factor to take in
> consideration.
> I'm tempted to leave all of this in the user's hand ;)
> Given currently available tools you can chose your algorithm
> for a performant union operation. These are the ingredients:
> collect()
> buffer()
> ORDER BY
> LIMIT
I'd hope both standards-compliance and common decency would force you to
choose a reasonable combination of those for geomUnion() :-b
Just kidding... it's obvious we're making progress. Hopefully my next
note will have more info and less opinion :)
Bill
More information about the postgis-devel
mailing list