[postgis-users] PostGIS spatial query performance

Fri Aug 9 04:05:30 PDT 2002

>> One last question: How do I get the inverse result of the above query,
>> i.e all parks, that do *not* have a road running through them?

Syntactically, this can be done by just reversing the query:

select park.*
from park,roads
where not
   (park.the_geom && roads.the_geom
    and distance(park.the_geom,mypoints.the_geom) = 0
   );

However this will take ages, as this query apparently cannot be optimized.
With a subquery you get much better results:

select park.*
from park.roads
where park.oid not in
	( select park.oid, park.the_geom, roads.the_geom
	  from park,roads
	  where park.the_geom && roads.the_geom
	 );

PostgreSQL first executes the geographical selection in the subquery and
uses the res ults to invert the main query. Notice the use of oid as
link-variable; geometry variables like the_geom cannot be used in a "(not)
in"  link.

This works fine for small tables. For large tables and joins I ran into
problems. I counted overlaps of 419 railway lines with three Dutch
administrative levels: municipalities (633), districts (2338) and
neighborhoods (10381), all from pretty large shapefiles (10M). Positive
queries were efficient (40 sec. at most), negative queries with subselects
not much slower for municipalities and districts, but failing for
neighborhoods ("pg_recvbuf: unexpected end on client connection"). I suppose
this can be solved with PL/PGSQL scripts using cursors, but if anyone can
think of a better solution, let me know.

Jan

Jan Hartmann
Department of Geography
University of Amsterdam
jhart at frw.uva.nl