[postgis-users] Filtering Points Based On Proximity

Kevin Neufeld kneufeld at refractions.net
Sun Mar 23 23:47:07 PDT 2008


Hi Oliver,

Echoing Brent, good problem!  At first glance, I see two possible solutions:
1. Create a PL/Pgsql function that will delete one point at a time from 
your point table where it is too close to any other point (perhaps 
starting with the point that has the oldest timestamp).
2. Find "groups" of close points and keep only the point with the newest 
timestamp within each group.

Option 1 will be incredibly slow ... I would not recommend this 
approach.  However, if you're interested, below is a sample function to 
get you going.

Option 2 works relatively fast, but this solution resembles that of a 
transitive closure problem, so you may have to run this solution several 
times to finally end up with a table where no two points are within 5m 
of each other.

Try something like this...
DELETE FROM pts
WHERE id IN (
  SELECT id FROM pts
 
  EXCEPT

  -- create a new table of points where
  --
  (SELECT DISTINCT ON (a.id) b.id
  FROM pts a, pts b
  WHERE ST_DWithin(a.the_geom,b.the_geom, 5)
  ORDER BY a.id, b.my_timestamp DESC)
  );


Here is some sample code I used to simulate your problem:
-- Create a sample table of 300,000 random points over
-- a 20km x 20km area in BCAlbers projection
-- with a random timestamp
CREATE TABLE tmp (id serial, my_timestamp date, the_geom geometry);
INSERT INTO tmp (my_timestamp, the_geom)
  SELECT '2008-01-01'::date + (random()*365)::integer,
         setsrid(
           makepoint(1440000 + random()*20000,
                      460000 + random()*20000),
           3005)
  FROM generate_series(1, 300000);
CREATE INDEX tmp_geom_idx ON tmp USING GIST (the_geom);
ANALYZE tmp;

-- Now running option 2
postgis=# DELETE FROM tmp
postgis-# WHERE id IN (
postgis(#   SELECT id FROM tmp
postgis(# 
postgis(#   EXCEPT
postgis(#
postgis(#   (SELECT DISTINCT ON (a.id) b.id
postgis(#   FROM tmp a, tmp b
postgis(#   WHERE a.the_geom && expand(b.the_geom, 5)
postgis(#   AND distance(a.the_geom, b.the_geom) < 5
postgis(#   ORDER BY a.id, b.my_timestamp DESC)
postgis(#   );
NOTICE:  LWGEOM_gist_joinsel called with arguments that are not column 
references
DELETE 8679
postgis=# DELETE FROM tmp
postgis-# WHERE id IN (
postgis(#   SELECT id FROM tmp
postgis(# 
postgis(#   EXCEPT
postgis(#
postgis(#   (SELECT DISTINCT ON (a.id) b.id
postgis(#   FROM tmp a, tmp b
postgis(#   WHERE a.the_geom && expand(b.the_geom, 5)
postgis(#   AND distance(a.the_geom, b.the_geom) < 5
postgis(#   ORDER BY a.id, b.my_timestamp DESC)
postgis(#   );
NOTICE:  LWGEOM_gist_joinsel called with arguments that are not column 
references
DELETE 77
postgis=# DELETE FROM tmp
postgis-# WHERE id IN (
postgis(#   SELECT id FROM tmp
postgis(# 
postgis(#   EXCEPT
postgis(#
postgis(#   (SELECT DISTINCT ON (a.id) b.id
postgis(#   FROM tmp a, tmp b
postgis(#   WHERE a.the_geom && expand(b.the_geom, 5)
postgis(#   AND distance(a.the_geom, b.the_geom) < 5
postgis(#   ORDER BY a.id, b.my_timestamp DESC)
postgis(#   );
NOTICE:  LWGEOM_gist_joinsel called with arguments that are not column 
references
DELETE 1
postgis=# DELETE FROM tmp
postgis-# WHERE id IN (
postgis(#   SELECT id FROM tmp
postgis(# 
postgis(#   EXCEPT
postgis(#
postgis(#   (SELECT DISTINCT ON (a.id) b.id
postgis(#   FROM tmp a, tmp b
postgis(#   WHERE a.the_geom && expand(b.the_geom, 5)
postgis(#   AND distance(a.the_geom, b.the_geom) < 5
postgis(#   ORDER BY a.id, b.my_timestamp DESC)
postgis(#   );
NOTICE:  LWGEOM_gist_joinsel called with arguments that are not column 
references
DELETE 0


Here's a sample function that also works using option 1, but this will 
be very slow.
CREATE OR REPLACE FUNCTION del_pts() RETURNS VOID AS
$$
DECLARE
  pt_id integer;
 
BEGIN
  LOOP
    SELECT INTO pt_id a.id
      FROM pts a, pts b
      WHERE a.id != b.id
      AND a.the_geom && expand(b.the_geom, 5)
      AND distance(a.the_geom, b.the_geom) < 5
      ORDER BY a.my_timestamp ASC
      LIMIT 1;
    
    EXIT WHEN NOT FOUND;
  
    DELETE FROM tmp
      WHERE id = pt_id;
  
  END LOOP;
END;
$$
LANGUAGE plpgsql;


Cheers,
Kevin

Oliver Monson wrote:
> Hi,
> I'm a very new PostGIS user but I need to solve a somewhat complex 
> filtering problem (anyway I think its complex...)
>
> I have around 300,000 points, each with a lat, lon and altitude (also 
> converted to geometry). I need to get a subset of those points, where 
> none of them are within 5m (or some other arbitrary distance) of each 
> other. It doesnt matter which points get picked over another, as long 
> as whatever the data set it creates, that none of the points are 
> within that 5m radius and that relatively most of the points are used 
> (the points are scattered across a 20km squared area). I have other 
> data like a timestamp for each point so I can give priority to the 
> more recently created points, if that helps as a starting point. If it 
> makes it any easier, the altitude can be ignored and we can treat it 
> as a set of points on a 2D plane.
> But how would I go about doing this? I really have no idea where to 
> start. What would the SQL query look like? Any help would be 
> appreciated, thanks!
>
> -Oliver
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   



More information about the postgis-users mailing list