[postgis-devel] Prepared Geometry API
Mark Cave-Ayland
mark.cave-ayland at siriusit.co.uk
Mon Oct 6 04:26:21 PDT 2008
Paul Ramsey wrote:
> Now that the things actually work, a discussion of the user-facing API
> is worth entering into.
>
> Right now, we have
>
> ST_Contains(geom_a, geom_b, id_a)
> ST_ContainsProperly(geom_a, geom_b, id_a)
> ST_Covers(geom_a, geom_b, id_a)
> ST_Intersects(geom_a, geom_b, id_a, id_b)
>
> We could easily enough add
>
> ST_Within(geom_a, geom_b, id_b)
>
> as an inverted implementation of contains.
>
> The id_a/id_b keys are usually going to be primary keys, and
> occasionally constants, in the case of tests against geometry
> literals. Because this approach is the fastest, and guaranteed to not
> degrade as the candidates get larger, I would like to see this API
> around, no matter what we do with any other API.
>
> Discussion? I am loath to volunteer to do a head-to-head of a
> memcmp'ed cache against a key'ed cache, but that's because I'm lazy.
> Oh, wait, I do have a trivial solution to that, though it isn't a
> direct comparison:

Yeah. Just to re-iterate, I'm really not too happy with these new APIs :(

My main argument against this is that we are suffering from a severe
bout of premature optimisation based on someone's conjecture that
memcmp() is slow - but no-one has presented any substantial evidence
that this is the case. Bearing in mind that we have many other
substantial overheads in the process, such as TOAST overhead and GEOS
creation overhead, I think we really are getting ahead of ourselves here.
Also remember that, as Tom pointed out, memcmp() exits as soon as it
finds its first non-matching word, so the overhead should be minimal.
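
To make the memcmp() idea concrete, here is a minimal sketch of a
one-entry cache that decides whether a prepared geometry can be reused
by comparing serialized bytes. Everything in it (GeomCache,
cache_hit_or_store, the byte-array stand-in for a serialized geometry)
is a hypothetical illustration, not the actual PostGIS code. The early
exit on the first difference is how typical memcmp() implementations
behave rather than something the C standard promises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry cache: remembers the serialized bytes of the
 * last geometry seen. Not the real PostGIS structure. */
typedef struct {
    unsigned char *bytes;  /* private copy of the last serialized geometry */
    size_t         len;    /* length of that copy in bytes */
} GeomCache;

/* Returns true when the incoming geometry is byte-identical to the
 * cached one, so a prepared form built for it could be reused.
 * Otherwise stores a copy of the new geometry and returns false. */
bool cache_hit_or_store(GeomCache *cache, const unsigned char *geom, size_t len)
{
    assert(cache != NULL && geom != NULL);

    /* The length check is cheap and rejects most mismatches outright;
     * memcmp() then typically stops at the first differing byte/word. */
    if (cache->bytes != NULL && cache->len == len &&
        memcmp(cache->bytes, geom, len) == 0)
        return true;

    /* Miss: replace the cached copy with the new geometry. */
    unsigned char *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return false;  /* out of memory: keep the old entry, report a miss */
    memcpy(copy, geom, len);
    free(cache->bytes);
    cache->bytes = copy;
    cache->len = len;
    return false;
}
```

Note that no key is needed from the caller: the existing two-argument
call signature is enough, because the geometry bytes themselves
identify the cache entry.
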

My second argument is that if we use the new APIs then people will have
to re-write their existing code to see the benefits. Basing these APIs
on a primary key is not a smart thing to do from day 1, since not every
query necessarily has a PK - think subselects. I can see lots of
horrible hacks involving generate_series() coming out of the woodwork here.

In terms of preference, I would suggest going with memcmp() by default
for the 1.3 branch using the existing two-parameter APIs (we can easily
alias the existing three-parameter *prepared functions API for those that
use it) and then taking further feedback on board. Then if there is a valid
argument for adding these functions based on performance feedback from
1.3.4, then let's consider these APIs again for 1.4. But to add an
arguably more complicated special-case API, causing people to have to
re-write their applications, in a point release, based on no practical
evidence, is just plain crazy.

ATB,
Mark.
--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063