[postgis-users] Question on optimizing slow geospatial query

Wed Feb 11 00:20:09 PST 2015

Hi Trang,

I think, it could help to create btree indices on "startts" and "uuid", 
too, since you are using them in your where clause as a filter (a 
probably unnecessary question regarding your date filter: I would expect 
the result of "t.startts > '2015-01-16' and t.startts < '2015-01-17'" to 
be null because there are no days between the 16th and the 17th of 
january - but perhaps it was only an example...). And in general, my 
suggestion would be to reduce the use of st_intersects to the necessary 
mimimum.
You could use the with clause for the filtration of your input data and 
afterwards double join the two tables first on the startloc and second 
on the endloc for the assignment of the origin and the destination zone. 
Then group by origin and destination zones while counting your trips and 
you should have your end result.

Here is how I would imagine the query:

with
t as (select trip_id, startloc, endloc from od1.trip_v1 where startts 
between 'minimum start date' and 'maximum start date'),
z as (select zone from od1.taz where uuid in ('kansas_303', 
'kansas_601', 'kansas_603', etc))

select  count(t.trip_id) as number_of_trips, orig.zone as orig_zone, 
dest.zone as dest_zone from t left join z as orig on 
st_intersects(t.startloc, z.geom) left join z as dest on 
st_intersects(t.endloc, z.geom) group by orig.zone, dest.zone;

However, I am not sure about how the gist indices work together with the 
subselects of the with clause...

Hope this helps,

Birgit.

Am 11.02.2015 um 08:19 schrieb Trang Nguyen:
>
> Hi,
>
> I am a newbie to Postgres/PostGIS and have a long running query that I 
> would like to optimize.
>
> There are two tables (trip and zone) that I am joining in the query, 
> one which has “startloc” and “endloc” columns with type 
> Geometry(Point) and other which contains a Geometry(MultiPolygon). 
> There are GIST indexes on all above columns:
>
> CREATE TABLE od1.trip_v1
>
> (
>
>   pkey bigint NOT NULL,
>
>   trip_id character varying,
>
>   startts timestamp without time zone,
>
>   endts timestamp without time zone,
>
>   startloc geometry(Point),
>
>   endloc geometry(Point),
>
>   …
>
> )
>
> CREATE TABLE od1.taz
>
> (
>
>   uuid character varying NOT NULL,
>
>   zone character varying NOT NULL,
>
>   createdts timestamp without time zone NOT NULL,
>
>   updatedts timestamp without time zone NOT NULL,
>
>   geom geometry(MultiPolygon) NOT NULL,
>
>   CONSTRAINT taz_pkey PRIMARY KEY (uuid)
>
> )
>
>
> I’m interested in building a matrix that, for a given set of input 
> zones, returns trips along with their start and end zones. Output 
> looks like:
>
> 10 trips that start at Zone A, ends at Zone B
>
> 2 trips that start at Zone A, ends at Zone C
>
> 9 trips that start at Zone A, ends at other
>
> 13 trips that start at Zone C, ends at Zone D
>
> Since I am dealing with a large dataset (> 24 million records and 
> growing), I was planning on writing a query that returns the trips 
> grouped by each zone along with match condition (start, end or both) 
> and doing aggregation on the client layer. I’m not sure whether this 
> is the best approach but I expect that otherwise, I would end up 
> having to write a very complex query to handle that type of aggregation.
>
> Even so, the current query is very slow with very high cost:
>
> SELECT t.trip_id,
>
> case
>
>                 when ST_intersects(t.startloc, z.geom) and 
> ST_intersects(t.endloc, z.geom) then 'orig-dest'
>
>                 when ST_intersects(t.startloc, z.geom) then 'orig'
>
>                 when ST_intersects(t.endloc, z.geom) then 'dest'
>
>                 else 'none'
>
> end  as match_cond,
>
> z.zone from od1.trip_v1               t, od1.taz z
>
> where t.startts > '2015-01-16' and t.startts < '2015-01-17'
>
> and z.uuid in ('kansas_303', 'kansas_601', 'kansas_603', etc)
>
> and ST_intersects(t.startloc, z.geom)
>
> or ST_intersects(t.endloc, z.geom)
>
> group by z.zone, t.trip_id, match_cond;
>
> Explain plan:
>
> "Group (cost=231446695055.73..245971533247.59 rows=14240037443 
> width=3498)"
>
> "  ->  Sort (cost=231446695055.73..231482295149.34 rows=14240037443 
> width=3498)"
>
> "        Sort Key: z.zone, t.trip_id, (CASE WHEN ((t.startloc && 
> z.geom) AND _st_intersects(t.startloc, z.geom) AND (t.endloc && 
> z.geom) AND _st_intersects(t.endloc, z.geom)) THEN 'orig-dest'::text 
> WHEN ((t.startloc && z.geom) AND _st_intersects(t.startloc, (...)"
>
> "        ->  Nested Loop (cost=91.70..14401634128.24 rows=14240037443 
> width=3498)"
>
> "              ->  Seq Scan on taz z (cost=0.00..739.19 rows=4619 
> width=3406)"
>
> "              ->  Bitmap Heap Scan on trip_v1 t  (cost=91.70..4151.26 
> rows=453 width=107)"
>
> "                    Recheck Cond: ((startloc && z.geom) OR (endloc && 
> z.geom))"
>
> "                    Filter: (((startts > '2015-01-16 
> 00:00:00'::timestamp without time zone) AND (startts < '2015-01-17 
> 00:00:00'::timestamp without time zone) AND ((z.uuid)::text = ANY 
> ('{kansas_303,kansas_601,kansas_603,kansas_604,kansas_10,kansas_11,kan 
> (...)"
>
> "                    ->  BitmapOr (cost=91.70..91.70 rows=2706 width=0)"
>
> "                          ->  Bitmap Index Scan on 
> idx_trip_v1_startloc  (cost=0.00..45.74 rows=1353 width=0)"
>
> "                                Index Cond: (startloc && z.geom)"
>
> "                          ->  Bitmap Index Scan on 
> idx_trip_v1_endloc  (cost=0.00..45.74 rows=1353 width=0)"
>
> "                                Index Cond: (endloc && z.geom)"
>
> Some help or suggestions on how to speed up the query would be much 
> appreciated.
>
> Also, I currently don’t have a specific map projection defined on the 
> geom columns so they are using the default of 0 in postgis. The points 
> in both trip and zone geometries are lon/lat. Is this an issue for the 
> ST_intersect?
>
> Thanks,
> Trang
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150211/0a8f70fe/attachment.html>