[postgis-users] Question on optimizing slow geospatial query
Trang Nguyen
Trang.Nguyen at inrix.com
Tue Feb 10 23:19:46 PST 2015
Hi,
I am a newbie to Postgres/PostGIS and have a long running query that I would like to optimize.
There are two tables (trip and zone) that I am joining in the query, one which has "startloc" and "endloc" columns with type Geometry(Point) and other which contains a Geometry(MultiPolygon). There are GIST indexes on all above columns:
CREATE TABLE od1.trip_v1
(
pkey bigint NOT NULL,
trip_id character varying,
startts timestamp without time zone,
endts timestamp without time zone,
startloc geometry(Point),
endloc geometry(Point),
...
)
CREATE TABLE od1.taz
(
uuid character varying NOT NULL,
zone character varying NOT NULL,
createdts timestamp without time zone NOT NULL,
updatedts timestamp without time zone NOT NULL,
geom geometry(MultiPolygon) NOT NULL,
CONSTRAINT taz_pkey PRIMARY KEY (uuid)
)
I'm interested in building a matrix that, for a given set of input zones, returns trips along with their start and end zones. Output looks like:
10 trips that start at Zone A, ends at Zone B
2 trips that start at Zone A, ends at Zone C
9 trips that start at Zone A, ends at other
13 trips that start at Zone C, ends at Zone D
Since I am dealing with a large dataset (> 24 million records and growing), I was planning on writing a query that returns the trips grouped by each zone along with match condition (start, end or both) and doing aggregation on the client layer. I'm not sure whether this is the best approach but I expect that otherwise, I would end up having to write a very complex query to handle that type of aggregation.
Even so, the current query is very slow with very high cost:
SELECT t.trip_id,
case
when ST_intersects(t.startloc, z.geom) and ST_intersects(t.endloc, z.geom) then 'orig-dest'
when ST_intersects(t.startloc, z.geom) then 'orig'
when ST_intersects(t.endloc, z.geom) then 'dest'
else 'none'
end as match_cond,
z.zone from od1.trip_v1 t, od1.taz z
where t.startts > '2015-01-16' and t.startts < '2015-01-17'
and z.uuid in ('kansas_303', 'kansas_601', 'kansas_603', etc)
and ST_intersects(t.startloc, z.geom)
or ST_intersects(t.endloc, z.geom)
group by z.zone, t.trip_id, match_cond;
Explain plan:
"Group (cost=231446695055.73..245971533247.59 rows=14240037443 width=3498)"
" -> Sort (cost=231446695055.73..231482295149.34 rows=14240037443 width=3498)"
" Sort Key: z.zone, t.trip_id, (CASE WHEN ((t.startloc && z.geom) AND _st_intersects(t.startloc, z.geom) AND (t.endloc && z.geom) AND _st_intersects(t.endloc, z.geom)) THEN 'orig-dest'::text WHEN ((t.startloc && z.geom) AND _st_intersects(t.startloc, (...)"
" -> Nested Loop (cost=91.70..14401634128.24 rows=14240037443 width=3498)"
" -> Seq Scan on taz z (cost=0.00..739.19 rows=4619 width=3406)"
" -> Bitmap Heap Scan on trip_v1 t (cost=91.70..4151.26 rows=453 width=107)"
" Recheck Cond: ((startloc && z.geom) OR (endloc && z.geom))"
" Filter: (((startts > '2015-01-16 00:00:00'::timestamp without time zone) AND (startts < '2015-01-17 00:00:00'::timestamp without time zone) AND ((z.uuid)::text = ANY ('{kansas_303,kansas_601,kansas_603,kansas_604,kansas_10,kansas_11,kan (...)"
" -> BitmapOr (cost=91.70..91.70 rows=2706 width=0)"
" -> Bitmap Index Scan on idx_trip_v1_startloc (cost=0.00..45.74 rows=1353 width=0)"
" Index Cond: (startloc && z.geom)"
" -> Bitmap Index Scan on idx_trip_v1_endloc (cost=0.00..45.74 rows=1353 width=0)"
" Index Cond: (endloc && z.geom)"
Some help or suggestions on how to speed up the query would be much appreciated.
Also, I currently don't have a specific map projection defined on the geom columns so they are using the default of 0 in postgis. The points in both trip and zone geometries are lon/lat. Is this an issue for the ST_intersect?
Thanks,
Trang
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150211/99047d4e/attachment.html>
More information about the postgis-users
mailing list