[postgis-users] Non-linear time cost; please suggest a better way to structure the query

Paul Ramsey pramsey at cleverelephant.ca
Wed Feb 23 07:36:39 PST 2022

You may need to materialize the trajectories so you can spatially index them
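A minimal sketch of that idea, based on the table and column names in Rory's query below (the view and index names here are illustrative, not from the thread):

```sql
-- Materialize the point-to-point trajectories once,
-- instead of rebuilding them inside every aggregation query.
CREATE MATERIALIZED VIEW gps_traj AS
SELECT
    gps_id,
    event_time,
    bearing,
    date_part('epoch',
        lead(event_time) OVER w - event_time) AS time_delta,
    st_makeline(geom, lead(geom) OVER w) AS traj
FROM gps
WINDOW w AS (PARTITION BY gps_id ORDER BY event_time);

-- Spatially index the stored lines so the join to the hex grid
-- can use an index scan rather than comparing every pair.
CREATE INDEX gps_traj_gix ON gps_traj USING gist (traj);
ANALYZE gps_traj;
```

With the lines precomputed and indexed, the ST_Intersects join against the grid can filter candidates via the GiST index instead of recomputing and rescanning every trajectory for each grid cell.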

On Wed, Feb 23, 2022 at 6:14 AM Rory Meyer <rory.meyer at vliz.be> wrote:

> Afternoon all,
> I've got a database full of GPS points (along with associated data like
> speed, bearing, time, GPS_ID, class etc) and I'm trying to do complex
> aggregations with the data. I'm trying to build up a "heatmap" of the data
> by first creating a grid of polygons using ST_HexagonGrid and then using a
> window function to overlay the lines (created from a lead/lag window of
> each GPS point and the next one from the same GPS_ID) over the grid. I'd
> like to get the number of seconds that GPS-carrying vehicles spend in
> each hex cell, grouped by class, speed, date, direction etc etc. The end
> goal would be to query a lon/lat and get a bunch of aggregated data for
> different classes, speed/bearing distributions.
> Here's a simplified look at the SQL (sorry, it's not really simple...):
> '''
> SELECT
>     grid.gid,
>     grid.geom,
>     avg(traj.bearing) AS avg_bearing,
>     avg(traj.time_delta) AS avg_time_delta,
>     sum(st_length(st_intersection(traj.traj, grid.geom))
>         * traj.time_delta / traj.traj_dist) AS cum_time_in_grid
> FROM my_hex_grid AS grid
> LEFT JOIN (
>     SELECT
>         subquery.gps_id,
>         subquery.bearing,
>         subquery.time_delta,
>         st_makeline(subquery.geom, subquery.geom2) AS traj,
>         st_distance(subquery.geom, subquery.geom2) AS traj_dist
>     FROM (
>         SELECT
>             gps.gps_id,
>             date_part('epoch',
>                 lead(gps.event_time) OVER time_order - gps.event_time
>             ) AS time_delta,
>             gps.geom,
>             gps.bearing,
>             lead(gps.geom) OVER time_order AS geom2
>         FROM gps
>         WHERE gps.event_time >= '<Start Time>'
>           AND gps.event_time <= '<End Time>'
>         WINDOW time_order AS (PARTITION BY gps.gps_id ORDER BY gps.event_time)
>     ) AS subquery
> ) AS traj ON st_intersects(traj.traj, grid.geom)
> GROUP BY grid.gid, grid.geom
> '''
> My issue is that the time the query takes to complete grows
> non-linearly. If <Start Time> to <End Time> spans a couple of hours, it
> takes a couple of seconds to run. For a day, it takes a couple of
> minutes. For a week, it takes a couple of hours.
> I'd like to run this for over a year of data but that won't be feasible at
> this rate.
> Is there some way to avoid this non-linear increase in time or would it be
> best to just write some python code to loop through smaller chunks of data
> and write the results somewhere?
> Regards,
> Rory
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
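For reference, the chunked fallback Rory mentions could be sketched as follows; the function and chunk size are illustrative, and each window would stand in for the <Start Time>/<End Time> placeholders in the query, with partial aggregates written to a results table:

```python
from datetime import datetime, timedelta

def time_chunks(start, end, step=timedelta(days=1)):
    """Yield (chunk_start, chunk_end) windows covering [start, end)."""
    cur = start
    while cur < end:
        nxt = min(cur + step, end)  # clamp the last window to the range end
        yield cur, nxt
        cur = nxt

# Example: a week of data split into one-day query windows.
chunks = list(time_chunks(datetime(2022, 1, 1), datetime(2022, 1, 8)))
```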