[postgis-devel] Prepared Geometry API

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Mon Oct 6 04:26:21 PDT 2008


Paul Ramsey wrote:

> Now that the things actually work, a discussion of the user-facing API
> is worth entering into.
> 
> Right now, we have
> 
> ST_Contains(geom_a, geom_b, id_a)
> ST_ContainsProperly(geom_a, geom_b, id_a)
> ST_Covers(geom_a, geom_b, id_a)
> ST_Intersects(geom_a, geom_b, id_a, id_b)
> 
> We could easily enough add
> 
> ST_Within(geom_a, geom_b, id_b)
> 
> as an inverted implementation of contains.
> 
> The id_a/id_b keys are usually going to be primary keys, and
> occasionally constants, in the case of tests against geometry
> literals. Because this approach is the fastest, and guaranteed to not
> degrade as the candidates get larger, I would like to see this API
> around, no matter what we do with any other API.
> 
> Discussion? I am loath to volunteer to do a head-to-head of a
> memcmp'ed cache against a key'ed cache, but that's because I'm lazy.
> Oh, wait, I do have a trivial solution to that, though it isn't a
> direct comparison:

Yeah. Just to re-iterate, I'm really not too happy with these new APIs :(

My main argument against this is that we are suffering from a severe 
bout of premature optimisation based on someone's conjecture that 
memcmp() is slow - but no-one has presented any substantial evidence 
that this is the case. Bearing in mind that we have many other 
substantial overheads in the process, such as TOAST overhead and GEOS 
creation overhead, I think we really are getting ahead of ourselves here. 
Also remember that as Tom pointed out, memcmp() exits as soon as it 
finds its first non-matching word and so the overhead should be very 
minimal.
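
To make the memcmp() approach concrete, here is a minimal sketch of a 
one-slot cache check. The names geom_cache and geom_cache_lookup are 
hypothetical and are not taken from the PostGIS source; the sketch also 
assumes the geometry arrives as a flat, already-detoasted byte buffer. 
The length test runs first, so memcmp() is only called when the two 
buffers are the same size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot cache holding a private copy of the last
 * serialized geometry seen. Not the actual PostGIS structure. */
typedef struct {
    unsigned char *bytes;  /* copy of the serialized geometry, or NULL */
    size_t         len;    /* number of bytes in the copy */
} geom_cache;

/* Returns 1 on a cache hit, 0 on a miss (the slot is then refreshed
 * with the new geometry), or -1 if memory allocation fails. */
int geom_cache_lookup(geom_cache *cache, const unsigned char *geom, size_t len)
{
    assert(cache != NULL && geom != NULL);

    /* Cheap length check first; memcmp() runs only on equal sizes. */
    if (cache->bytes != NULL && cache->len == len &&
        memcmp(cache->bytes, geom, len) == 0)
        return 1;

    /* Miss: replace the cached copy. A real implementation would also
     * rebuild the prepared geometry at this point. */
    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, geom, len);
    free(cache->bytes);
    cache->bytes = copy;
    cache->len = len;
    return 0;
}
```

With a scheme like this the existing 2 parameter functions would need no 
extra key argument, at the cost of one comparison per call.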

My second argument is that if we use the new APIs then people will have 
to re-write their existing code to see the benefits. Basing these APIs 
on a primary key is not a smart thing to do from day 1, since not every 
query will necessarily have a PK - think subselects. I can see lots of 
horrible hacks involving generate_series() coming out of the woodwork here.

In terms of preference, I would suggest going with memcmp() by default 
for the 1.3 branch using the existing 2-parameter APIs (we can easily alias 
the existing 3-parameter *prepared functions API for those that use it) 
and then taking further feedback on board. Then if there is a valid 
argument for adding these functions based on performance feedback from 
1.3.4, then let's consider these APIs again for 1.4. But to add an 
arguably more complicated special-case API causing people to have to 
re-write their applications, in a point release, based on no practical 
evidence is just plain crazy.


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


