[postgis-users] Massive Performance Issues

Stephen Frost sfrost at snowman.net
Tue Jul 17 18:48:38 PDT 2007


* Alan Cunnane (alan_cunnane at yahoo.co.uk) wrote:
> Ah I am sorry here is the eplain analyze requested. The explain analyze query is shown below:
>  Total runtime: 71155.967 ms

72 seconds, eh?  The explain made it look alot worse.  Not like that's
good though, clearly needs to be improved. :)

> At the moment it is not returning any rows, however this could be because
> there are no two routes which intersect at this time. That can be worked
> out later but the performance of the query still needs to improve.

I'm wondering if there isn't a problem with the query itself...  A plan
like that and returning 0 rows indicates to me that there's something
not quite right...  I broke the query out some and reformatted it:

SELECT
  f.stop_reference AS origstop,
  a.stop_reference AS origconnect,
  h.route_number,
  c.stop_reference AS destconnect,
  i.route_number,
  d.stop_reference AS deststop,
  (e.arrival_time - g.depart_time) AS time
FROM
  service1   a,
  JOIN bus_stops1 d ON TRUE
  JOIN bus_stops1 f ON TRUE
  JOIN stop_link  b ON (a.stop_reference = b.stop_a)
  JOIN service1   c ON (c.stop_reference = b.stop_b)
  JOIN service1   e ON (d.stop_reference = e.stop_reference AND e.service_id = c.service_id)
  JOIN service1   g ON (f.stop_reference = g.stop_reference AND g.service_id = a.service_id)
  JOIN routes1    h ON (h.service_id = a.service_id)
  JOIN routes1    i ON (i.service_id = c.service_id)
WHERE
    a.service_id = 162
AND f.east_north &&
    setsrid(box3d(expand(PointFromText('POINT(318475 673980)',27700),400)),27700)
AND d.east_north &&
    setsrid(box3d(expand(PointFromText('POINT(331727 664569)',27700),400)),27700)
AND distance(PointFromText('POINT(318475 673980)', 27700),f.east_north) < 200
AND distance(PointFromText('POINT(331727 664569)', 27700),d.east_north) < 200
AND c.depart_time BETWEEN
    a.depart_time + interval '3 minutes' AND
    a.depart_time + interval '13 minutes'
AND g.stop_order < a.stop_order
AND c.stop_order < e.stop_order
AND h.route_number != i.route_number
ORDER BY time
LIMIT 1;

Now, as you can see from the above, you're doing a cartesean join
between a, bus_stops1(d) and bus_stops1(f).  You also seem to be pulling
service1 in four times, which seems like it's quite a few times...
You're also not using the same set of join criteria for each.

It strikes me that what you might want to do is start off with a
cartesian join between bus_stops1(d) and bus_stops1(f) (knowing that
it'll get filtered down by the distance calculations) and then build
your two trees from those off of that.

I'd also strongly encourage figuring out how to get some actual data out
of the query to make sure that it's really doing what you want...

> The bounding box seems to make a massive difference and I am now
> wondering can I use it elsewhere too. For example to create the
> stop_link table I need to create a table from a query which selects
> all stops on different routes within 200m of each other. Can I use
> a bounding box to do this? At the moment this is my query which
> takes a long time to complete:

Certainly!  Bounding boxes are great.  You shouldn't (in general) ever
be doing distance calculations without them!

> CREATE table stops_link (stop_a, stop_b)
> AS SELECT DISTINCT a.stop_reference, b.stop_reference 
> FROM bus_stops1 a, bus_stops1 b, service1 c, service1 d
> WHERE distance(a.east_north, b.east_north) < 200
> AND expand(a.east_north, 200) && b.east_north
> AND a.stop_reference = c.stop_reference
> AND b.stop_reference = d.stop_reference
> AND c.service_id != d.service_id
> AND a.stop_reference != b.stop_reference;

How about this:

CREATE table stops_link (stop_a, stop_b) AS
SELECT DISTINCT 
  a.stop_reference,
  b.stop_reference
FROM
  service1 c
  JOIN service1 d ON TRUE
  JOIN bus_stops1 a ON (c.stop_reference = a.stop_reference)
  JOIN bus_stops1 b ON (d.stop_reference = b.stop_reference)
WHERE 
    c.service_id <> d.service_id
AND a.stop_reference <> b.stop_reference
AND b.east_north &&
    expand(a.east_north,400)
AND distance(a.east_north, b.east_north) < 200
;

Obviously you'll want to make sure that you get the correct results.
Also, you want your bounding box to be big enough to cover all possible
valid results from your distance calculation.  I'd have to go back and
look at the expand() documentation to remember what it does exactly, but
generally I'm lazy and just make the expand parameter twice the size of
the distance boundary I'm doing. :)

> Sorry if it seems im bombarding you with questions now its just you have been so helpful so far! 

No prob, hope it helps...

	Thanks,

		Stephen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070717/c767d837/attachment.pgp>


More information about the postgis-users mailing list