[postgis-users] Filtering Points Based On Proximity

Sun Mar 23 17:15:22 PDT 2008

On Sun, Mar 23, 2008 at 3:50 PM, Oliver Monson <monson36 at gmail.com> wrote:
> Hi,
> I'm a very new PostGIS user but I need to solve a somewhat complex filtering
> problem (anyway I think its complex...)
>
> I have around 300,000 points, each with a lat, lon and altitude (also
> converted to geometry). I need to get a subset of those points, where none
> of them are within 5m (or some other arbitrary distance) of each other. It
> doesnt matter which points get picked over another, as long as whatever the
> data set it creates, that none of the points are within that 5m radius and
> that relatively most of the points are used (the points are scattered across
> a 20km squared area). I have other data like a timestamp for each point so I
> can give priority to the more recently created points, if that helps as a
> starting point. If it makes it any easier, the altitude can be ignored and
> we can treat it as a set of points on a 2D plane.
>  But how would I go about doing this? I really have no idea where to start.
> What would the SQL query look like? Any help would be appreciated, thanks!
>
> -Oliver
>
> _______________________________________________
>  postgis-users mailing list
>  postgis-users at postgis.refractions.net
>  http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>

good problem.
maybe that's solvable in postgis. but i couldnt resist trying with
python and rtree (http://trac.gispython.org/projects/PCL/wiki/Rtree)
creating a dict/hash, where each key is a point at least [dist] away
from any other key, and values are all within the bbox centered at
their key.
if you want points actually within the distance (not just within the
bbox) you can add an extra check there.
this runs pretty quick. takes longer to plot than to do the
intersection queries.
code below.
-brent

from rtree import Rtree
import random
import pylab

points = {}

dist = 500
index = Rtree()

# create some random points and put them in an index.
for i in range(30000):
    x = random.random() * 10000
    y = random.random() * 10000
    pt = (x, y)
    points[i] =  pt
    index.add(i, pt)

print "index created..."

groups = {}
while len(points.values()):
    pt = random.choice(points.values())
    print pt
    bbox = (pt[0] - dist, pt[1] - dist, pt[0] + dist, pt[1] + dist)

    idxs = index.intersection(bbox)
    # add actual distance here, to get those within dist.

    groups[pt] = []
    for idx in sorted(idxs, reverse=True):
        delpt = points[idx]
        groups[pt].append(delpt)
        index.delete(idx, delpt)
        del points[idx]

# groups contains keys where no key is within dist of any other pt
# the values for a given key are all points within dist of that point.
#print groups

# keys plotted in red.
for pt, subpts in groups.iteritems():
    subpts = pylab.array(subpts)
    pylab.plot(subpts[:,0], subpts[:,1], 'k.')
    pylab.plot([pt[0]], [pt[1]], 'ro')

pylab.show()