[geos-devel] GEOSGetCoordinates() and performance (postgis bug fix)

strk strk at keybit.net
Mon Oct 27 06:36:48 EST 2003


Defining a new postgres type for geos-geom might give a better performance
but sounds (IMHO) like a slow development process. I've been thinking 
about this thought. Here is what I propose:

We now have, for each row in input relation:

CASE 1:
	ITERATE:
	1) pgis->geos conversion (first arg)
	2) pgis->geos conversion (second arg)
	3) geosUnion  call
	4) geos->pgis conversion (result)

Note that step 1 will be for each iteration after the first EXACTLY the
output of step 3 from previous iteration. We could use a final func
for the whole job to be done and a state transaction function to just
gather informations about which geometries to work on.

CASE 2:
	ITERATE:
	1) get geometry array (first arg)
	2) get geometry (second arg)
	3) append geometry to array 
	END:
	1) UNION = array[0]->geos (conversion)
	2) for each array item N > 0
	3)	nextgeom = array[N]->geos (conversion)
	4) 	geosUnion call (UNION, nextgeom)
	5) end
	6) geos->pgis conversion (UNION)

With N input geometries we will have the following conversions:
	CASE 1: 2*N pgis->geos + N geos->pgis (might be missing something)
	CASE 2:   N pgis->geos + 1 geos->pgis 

Anyway, CASE 2 seems nicer with system resources.. but it might be
very memory expensive! If output geometry will be just a collection
of input geometries that big memory will be allocated anyway, thus
as long as we release each array element as soon as added to the
output geometry this will not be a big deal. On the other hand, if
resulting union will remove a lot of borders allocated memory will
be much more then the size required by step-by-step processing.

What do you think about this solution ?

--strk;

chodgson wrote:
> The way postgres handles user-defined aggregate functions is what is limiting 
> you. The problem is that aggregates are designed to use a "one-at-a-time" 
> approach to aggregating, which means that at each step of the process you must 
> have "geom UNION geom = geom". 
> 
> Clearly, if there was a postgres type which was a serialized GEOS gemetry (call 
> it type geos_geom) then we could write a geos_geom_union function which was 
> geos_geom UNION geos_geom = geos_geom, and as long as the serialization can be 
> done quickly (and is even possible?) then a faster union aggregate could result 
> from a query like:
> 
> SELECT geos_geom_to_geom( geos_geom_union( geom_to_geos_geom( the-geom ) ) ) 
> FROM the_table GROUP BY class_id;
> 
> I don't know enough C/C++ to know if such a serialization of a GEOS geometry is 
> possible, or fast... but if so, this would probably do it.
> 
> It could even be possible to re-write all of the geos functions to accept 
> geos_geom objects, and have standard geom object be auto-typecast into 
> geos_geom objects (using the "geom_to_geos_geom()" function, of course)... then 
> it would be easy to cache the geos objects in between processing stages - who 
> knows how big they might be... again, if this is even possible.
> 
> Chris
> 
> Quoting strk <strk at keybit.net>:
> 
> > dblasby wrote:
> > > Whenever I call the getCoordinates() function, I immediately make a copy
> > > of the list by converting it from the GEOS coordinates to PostGIS POINT3Ds.
> > >
> > > So, feel free to have all the Postgis GetCoordinates() access read-only
> > > version.
> > 
> > I've made a specialized POINT3D-array-from-Polygon function using read-only
> > direct access to internal Polygon's LinearRings. Speed up is to evaluate
> > but personally I can not really appreciate it...
> > 
> > > UNIT probably spending a lot of time building Postgis and GEOS geometries.
> > >
> > > 1. start with 2 input postgis geometrys
> > > 2. convert them to GEOS (O(n) where n=# of subcomponents)
> > > 3. run union
> > > 4. convert back to postgis (O(n) where n=# of subcomponents)
> > > 5. start again at step #1
> > >
> > > As you can see there possibly O(n^2) constructions (and certainly O(n^2)
> > > point copies).
> > >
> > > If you take an input dataset of disjoint polygons, you'll see this is
> > > O(n^2) in terms of geometry convertions and copying.
> > 
> > Do you think defining an ad-hoc postgresql type will give it a burst ?
> > 
> > --strk;
> > 
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> > 
> 
> 
> 
> 
> 
> _______________________________________________
> geos-devel mailing list
> geos-devel at geos.refractions.net
> http://geos.refractions.net/mailman/listinfo/geos-devel



More information about the geos-devel mailing list