[postgis-devel] GeomUnion speedups
strk at refractions.net
strk at refractions.net
Sat Jun 25 17:04:42 PDT 2005
On Sat, Jun 25, 2005 at 06:58:07PM -0400, Bill Binko wrote:
...
> As I mentioned before, I really think the extra overhead incurred by
> MemGeomUnion (converting from PostGIS<->GEOS repeatedly) can be removed by
> passing an opaque handle to the internal GEOS structures that you're
> building.
Do you mean a memory pointer ?
It would leak on ERROR though...
> That solves the problem when you're not dealing with 50 complex
> multi-polys (like your test), but with 330,000 simple polyes (like in
> mine). If you load all of those into memory and detoast them all, you
> rapidly run into a memory issue, and start thrashing or failing.
>
> Perhaps we could build do it in chunks (have the sfunc have a counter that
> trips a collect every X shapes) and have the ffunc do any that haven't
> been collected yet and then do the buffer()?
Note that the collect() function does its work in couples.
Still by the end of the run you end up with a collection of
330,000 simple polys which is not much smaller then 330,000
distinct ones.
The only way to really reduce memory is dissolving nodes/edges
by means of Overlay operations, performed by GEOS/JTS.
Chunking would be useful for that, with the cost of more
graph constructions and conversions (not so slow).
> That would be nicely configurable (how many per chuck, etc).
I'd use memory size to define chuncks if possible.
> I am surprised that the new collect/buffer is not impacted by order: is
> that a supposition? Or did you test that?
It's the evidence ;)
GEOS operation is invoked only once, so a single graph is built
with all nodes/edges of all geometries, and a single conversion
is made. Order doesn't count then.
Both conversion and graph build (more the latter then the former)
have a cost related to size of input. Order influences the size
of input between calls. Imagine a complex polygon fully contained
inside a simple one. As soon as they are involved togheter in
a union operation the complex polygon will dissolve. If the
container polygon comes *last* probabilities are higher to have
this complex polygon laying around and being converted, noded
and stuffed in a graph for as many times as there are rows in your
table.
With the short-circuit we reduced number of graph-noding
operations in the disjoint-envelopes case, but there are many
cases in which bbox intersect but geometries doesn't or they
do but only for a small portion (ie. not dissolving many
vertexes).
So chunking would again have the ORDERING factor to take in
consideration.
I'm tempted to leave all of this in the user's hand ;)
Given currently available tools you can chose your algorithm
for a performant union operation. These are the ingredients:
collect()
buffer()
ORDER BY
LIMIT
--strk;
More information about the postgis-devel
mailing list