[postgis-users] Bounding Box Query

Mon Mar 29 10:04:06 PST 2004

> Yes, I do to both questions.

Good, that makes it easier.

> normally, In PostGIS you get the x and y of each point shape using the 
> x(geometry) and y(geometry) functions.  But In our database, the 
geometry is 
> in the field called the_geom, so you would get the x and y for each 
point 
> using x(the_geom) and y(the_geom).

That's good too.  You shouldn't have to use x() or y() functions with your 
problem at all.

> So the query below works except it is very sloooooooow.

Sorry, I had thought you were reposting your original query, now I see 
that you are just dealing with the performance issue.

> SELECT str_num || ' ' || name as full_address,
> > > >    distance(the_geom,'POINT(2227284.1963667543
> > > > 1435389.6730164126)'::geometry) as dist
> > > >    FROM address
> > > > ORDER BY dist limit 1

I understand what you are doing here, but why don't you reduce it to a 
"normal" query that uses the_geom from the points and compares it to 
the_geom from the lines?  RAther than explicitly saying a coordinate in 
the select statement.  I.e.

SELECT str_num || ' ' || name as full_address,
       distance(a.the_geom,p.the_geom) as dist
FROM address a,
       mypoint p
WHERE ......<add where clause that selects the particular point you are 
interested in>...
ORDER BY dist limit 1

Now on to performance..
First off, do you have GIST indexes built for the point and lines tables?
If not, build them  (ask if you need help) and then try it again.  How 
many records are dealing with in both tables?

Distance() will compare the point with EVERY road line, hence the 
inefficiency.  If you have spatial indexes (GIST) built that may improve 
performance for you substantially. 

What you want to do is help reduce the number features that distance has 
to compare.  In order to do this you want to use another to method to 
first find the features that are generally close to one another.  The 
"&&"/bounding box operator can help with this.  What this operator does is 
it creates a rectangle (aka bounding box) around the extent of the 
features and compares those to each other.  If they overlap, then you can 
use the distance function to check more accurately. 

You can do this by adding to your WHERE clause:
AND a.the_geom && p.the_geom

In your case, since you are using a point feature - it won't have much of 
an extent since it's a discreet location.  Therefore you might not overlap 
features that could be very nearby.  Also, if you have a line segment that 
is long and very narrow, it will have a very narrow "bounding box" that 
might not overlap any points.

So the trick is that we want to expand the bounding box so that it 
overlaps more features, but not ALL of them!  Or else you're back to your 
performance problem again.  You can use the EXPAND() function to do this - 
but you need to decide first on the kind of distance tolerance you want it 
to check around the features.

Replace the above example with:
AND a.the_geom && expand(p.the_geom,1000)
It will search within a distance of 1000 units to find overlap bounding 
boxes from the lines.  Boy, it just hit me that I'm not sure if you are 
comparing points with points or points with lines.  Hopefully lines :)

You could expand your lines too using the same method.

So a final solution might look something like:
SELECT str_num || ' ' || name as full_address,
       distance(a.the_geom,p.the_geom) as dist
FROM address a,
       mypoint p
WHERE p.point_id = 1 
     AND a.the_geom && expand(p.the_geom,1000)
ORDER BY dist limit 1

Does that help?

Tyler