[geos-devel] GEOSGetCoordinates() and performance (postgis bug
fix)
Paul Ramsey
pramsey at refractions.net
Mon Oct 27 10:20:37 EST 2003
I was talking about this with Chris a week ago, and we came to similar
conclusions, with one difference: we think we need to have both methods
available. A CPU intensive but memory light version for large-but-slow
queries and an optimized but memory heavy version for small-but-fast
interactive queries. Neither version can meet all the likely use cases
for a union aggregate on their own.
P
On Monday, October 27, 2003, at 03:36 AM, strk wrote:
> Defining a new postgres type for geos-geom might give a better
> performance
> but sounds (IMHO) like a slow development process. I've been thinking
> about this thought. Here is what I propose:
>
> We now have, for each row in input relation:
>
> CASE 1:
> ITERATE:
> 1) pgis->geos conversion (first arg)
> 2) pgis->geos conversion (second arg)
> 3) geosUnion call
> 4) geos->pgis conversion (result)
>
> Note that step 1 will be for each iteration after the first EXACTLY the
> output of step 3 from previous iteration. We could use a final func
> for the whole job to be done and a state transaction function to just
> gather informations about which geometries to work on.
>
> CASE 2:
> ITERATE:
> 1) get geometry array (first arg)
> 2) get geometry (second arg)
> 3) append geometry to array
> END:
> 1) UNION = array[0]->geos (conversion)
> 2) for each array item N > 0
> 3) nextgeom = array[N]->geos (conversion)
> 4) geosUnion call (UNION, nextgeom)
> 5) end
> 6) geos->pgis conversion (UNION)
>
> With N input geometries we will have the following conversions:
> CASE 1: 2*N pgis->geos + N geos->pgis (might be missing something)
> CASE 2: N pgis->geos + 1 geos->pgis
>
> Anyway, CASE 2 seems nicer with system resources.. but it might be
> very memory expensive! If output geometry will be just a collection
> of input geometries that big memory will be allocated anyway, thus
> as long as we release each array element as soon as added to the
> output geometry this will not be a big deal. On the other hand, if
> resulting union will remove a lot of borders allocated memory will
> be much more then the size required by step-by-step processing.
>
> What do you think about this solution ?
>
> --strk;
>
> chodgson wrote:
>> The way postgres handles user-defined aggregate functions is what is
>> limiting
>> you. The problem is that aggregates are designed to use a
>> "one-at-a-time"
>> approach to aggregating, which means that at each step of the process
>> you must
>> have "geom UNION geom = geom".
>>
>> Clearly, if there was a postgres type which was a serialized GEOS
>> gemetry (call
>> it type geos_geom) then we could write a geos_geom_union function
>> which was
>> geos_geom UNION geos_geom = geos_geom, and as long as the
>> serialization can be
>> done quickly (and is even possible?) then a faster union aggregate
>> could result
>> from a query like:
>>
>> SELECT geos_geom_to_geom( geos_geom_union( geom_to_geos_geom(
>> the-geom ) ) )
>> FROM the_table GROUP BY class_id;
>>
>> I don't know enough C/C++ to know if such a serialization of a GEOS
>> geometry is
>> possible, or fast... but if so, this would probably do it.
>>
>> It could even be possible to re-write all of the geos functions to
>> accept
>> geos_geom objects, and have standard geom object be auto-typecast into
>> geos_geom objects (using the "geom_to_geos_geom()" function, of
>> course)... then
>> it would be easy to cache the geos objects in between processing
>> stages - who
>> knows how big they might be... again, if this is even possible.
>>
>> Chris
>>
>> Quoting strk <strk at keybit.net>:
>>
>>> dblasby wrote:
>>>> Whenever I call the getCoordinates() function, I immediately make a
>>>> copy
>>>> of the list by converting it from the GEOS coordinates to PostGIS
>>>> POINT3Ds.
>>>>
>>>> So, feel free to have all the Postgis GetCoordinates() access
>>>> read-only
>>>> version.
>>>
>>> I've made a specialized POINT3D-array-from-Polygon function using
>>> read-only
>>> direct access to internal Polygon's LinearRings. Speed up is to
>>> evaluate
>>> but personally I can not really appreciate it...
>>>
>>>> UNIT probably spending a lot of time building Postgis and GEOS
>>>> geometries.
>>>>
>>>> 1. start with 2 input postgis geometrys
>>>> 2. convert them to GEOS (O(n) where n=# of subcomponents)
>>>> 3. run union
>>>> 4. convert back to postgis (O(n) where n=# of subcomponents)
>>>> 5. start again at step #1
>>>>
>>>> As you can see there possibly O(n^2) constructions (and certainly
>>>> O(n^2)
>>>> point copies).
>>>>
>>>> If you take an input dataset of disjoint polygons, you'll see this
>>>> is
>>>> O(n^2) in terms of geometry convertions and copying.
>>>
>>> Do you think defining an ad-hoc postgresql type will give it a burst
>>> ?
>>>
>>> --strk;
>>>
>>> _______________________________________________
>>> geos-devel mailing list
>>> geos-devel at geos.refractions.net
>>> http://geos.refractions.net/mailman/listinfo/geos-devel
>>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> geos-devel mailing list
>> geos-devel at geos.refractions.net
>> http://geos.refractions.net/mailman/listinfo/geos-devel
>>
Paul Ramsey
Refractions Research
Email: pramsey at refractions.net
Phone: (250) 885-0632
More information about the geos-devel
mailing list