[geos-devel] GEOSGetCoordinates() and performance (postgis bug fix)

Paul Ramsey pramsey at refractions.net
Mon Oct 27 10:20:37 EST 2003


I was talking about this with Chris a week ago, and we came to similar 
conclusions, with one difference: we think we need to have both methods 
available.  A CPU intensive but memory light version for large-but-slow 
queries and an optimized but memory heavy version for small-but-fast 
interactive queries.  Neither version can meet all the likely use cases 
for a union aggregate on their own.
P

On Monday, October 27, 2003, at 03:36 AM, strk wrote:

> Defining a new postgres type for geos-geom might give a better 
> performance
> but sounds (IMHO) like a slow development process. I've been thinking
> about this thought. Here is what I propose:
>
> We now have, for each row in input relation:
>
> CASE 1:
> 	ITERATE:
> 	1) pgis->geos conversion (first arg)
> 	2) pgis->geos conversion (second arg)
> 	3) geosUnion  call
> 	4) geos->pgis conversion (result)
>
> Note that step 1 will be for each iteration after the first EXACTLY the
> output of step 3 from previous iteration. We could use a final func
> for the whole job to be done and a state transaction function to just
> gather informations about which geometries to work on.
>
> CASE 2:
> 	ITERATE:
> 	1) get geometry array (first arg)
> 	2) get geometry (second arg)
> 	3) append geometry to array
> 	END:
> 	1) UNION = array[0]->geos (conversion)
> 	2) for each array item N > 0
> 	3)	nextgeom = array[N]->geos (conversion)
> 	4) 	geosUnion call (UNION, nextgeom)
> 	5) end
> 	6) geos->pgis conversion (UNION)
>
> With N input geometries we will have the following conversions:
> 	CASE 1: 2*N pgis->geos + N geos->pgis (might be missing something)
> 	CASE 2:   N pgis->geos + 1 geos->pgis
>
> Anyway, CASE 2 seems nicer with system resources.. but it might be
> very memory expensive! If output geometry will be just a collection
> of input geometries that big memory will be allocated anyway, thus
> as long as we release each array element as soon as added to the
> output geometry this will not be a big deal. On the other hand, if
> resulting union will remove a lot of borders allocated memory will
> be much more then the size required by step-by-step processing.
>
> What do you think about this solution ?
>
> --strk;
>
> chodgson wrote:
>> The way postgres handles user-defined aggregate functions is what is 
>> limiting
>> you. The problem is that aggregates are designed to use a 
>> "one-at-a-time"
>> approach to aggregating, which means that at each step of the process 
>> you must
>> have "geom UNION geom = geom".
>>
>> Clearly, if there was a postgres type which was a serialized GEOS 
>> gemetry (call
>> it type geos_geom) then we could write a geos_geom_union function 
>> which was
>> geos_geom UNION geos_geom = geos_geom, and as long as the 
>> serialization can be
>> done quickly (and is even possible?) then a faster union aggregate 
>> could result
>> from a query like:
>>
>> SELECT geos_geom_to_geom( geos_geom_union( geom_to_geos_geom( 
>> the-geom ) ) )
>> FROM the_table GROUP BY class_id;
>>
>> I don't know enough C/C++ to know if such a serialization of a GEOS 
>> geometry is
>> possible, or fast... but if so, this would probably do it.
>>
>> It could even be possible to re-write all of the geos functions to 
>> accept
>> geos_geom objects, and have standard geom object be auto-typecast into
>> geos_geom objects (using the "geom_to_geos_geom()" function, of 
>> course)... then
>> it would be easy to cache the geos objects in between processing 
>> stages - who
>> knows how big they might be... again, if this is even possible.
>>
>> Chris
>>
>> Quoting strk <strk at keybit.net>:
>>
>>> dblasby wrote:
>>>> Whenever I call the getCoordinates() function, I immediately make a 
>>>> copy
>>>> of the list by converting it from the GEOS coordinates to PostGIS 
>>>> POINT3Ds.
>>>>
>>>> So, feel free to have all the Postgis GetCoordinates() access 
>>>> read-only
>>>> version.
>>>
>>> I've made a specialized POINT3D-array-from-Polygon function using 
>>> read-only
>>> direct access to internal Polygon's LinearRings. Speed up is to 
>>> evaluate
>>> but personally I can not really appreciate it...
>>>
>>>> UNIT probably spending a lot of time building Postgis and GEOS 
>>>> geometries.
>>>>
>>>> 1. start with 2 input postgis geometrys
>>>> 2. convert them to GEOS (O(n) where n=# of subcomponents)
>>>> 3. run union
>>>> 4. convert back to postgis (O(n) where n=# of subcomponents)
>>>> 5. start again at step #1
>>>>
>>>> As you can see there possibly O(n^2) constructions (and certainly 
>>>> O(n^2)
>>>> point copies).
>>>>
>>>> If you take an input dataset of disjoint polygons, you'll see this 
>>>> is
>>>> O(n^2) in terms of geometry convertions and copying.
>>>
>>> Do you think defining an ad-hoc postgresql type will give it a burst 
>>> ?
>>>
>>> --strk;
>>>
>>> _______________________________________________
>>> geos-devel mailing list
>>> geos-devel at geos.refractions.net
>>> http://geos.refractions.net/mailman/listinfo/geos-devel
>>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> geos-devel mailing list
>> geos-devel at geos.refractions.net
>> http://geos.refractions.net/mailman/listinfo/geos-devel
>>
      Paul Ramsey
      Refractions Research
      Email: pramsey at refractions.net
      Phone: (250) 885-0632




More information about the geos-devel mailing list