[postgis-users] Help: Performance of spatial query

Jeremy Palmer jpalmer at paradise.net.nz
Mon Mar 15 01:03:51 PST 2004


Hi,

I have been trying to complete a query that tries to locate duplicate points
i.e. points that have been accidentally created twice or more – These points
may not exactly be in the same geometric position. My basic methodology is
to complete a 0.2m proximity check for each geometry point in the database.
The datum of the geometries is in NZGD2000 (basically the same parameters as
WGS84).
To do this I was using the buffer function with the overlaps operator to
find points within 0.000002 of a degree (~ 0.2m) of each other. I was also
trying to stop the point reporting on itself by checking that NOD1.id <
NOD2.id. However as explain query shows below my method produces a Cartesian
product between the 2 aliases of the node table. 

Can anyone help me with constructing a more efficient query? I have provided
the query explain and table details if that helps.


SELECT NOD1.id as nod_id1,
       NOD2.id as nod_id2,
       NOD1.shape as shape1,
       NOD2.shape as shape2
FROM node NOD1,
          node NOD2
WHERE NOD2.shape && buffer(NOD1.shape, 0.000002)
AND   NOD1.id < NOD2.id;

Explain Query was 


  Nested Loop  (cost=590312.88..5062636139085.67 rows=3509048593179
width=288)
   Join Filter: (("inner".shape && buffer("outer".shape, 2e-06::double
precision)) AND ("outer".id < "inner".id))
   ->  Seq Scan on crs_node nod1  (cost=0.00..379897.88 rows=10260188
width=144)
   ->  Materialize  (cost=590312.88..903329.76 rows=10260188 width=144)
         ->  Seq Scan on crs_node nod2  (cost=0.00..379897.88 rows=10260188
width=144)

Table "public. node"
     Column      |         Type         | Modifiers
-----------------+----------------------+-----------
 id              | integer              | not null
 cos_id_official | integer              | not null
 type            | character varying(4) | not null
 status          | character varying(4) | not null
 order_group_off | integer              | not null
 sit_id          | integer              |
 alt_id          | integer              |
 wrk_id_created  | integer              |
 se_row_id       | integer              |
 audit_id        | integer              | not null
 shape           | geometry             |
Indexes:
    "pkey_node" primary key, btree (id)
    "nod_aud_id" unique, btree (audit_id)
    "fk_nod_alt" btree (alt_id)
    "fk_nod_cos" btree (cos_id_official)
    "fk_nod_sit" btree (sit_id)
    "fk_nod_wrk" btree (wrk_id_created)
    "nod_shape_index" gist (shape)
Check constraints:
    "$1" CHECK (srid(shape) = 104108)
    "$2" CHECK (geometrytype(shape) = 'POINT'::text OR shape IS NULL)

Thank you very much

Jeremy Palmer






More information about the postgis-users mailing list