[postgis-users] Non-linear time cost; please suggest a better way to structure the query

Paul Ramsey pramsey at cleverelephant.ca
Wed Feb 23 07:36:39 PST 2022


You may need to materialize the trajectories so you can spatially index them.
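
Something along these lines, as a sketch (the gps_traj name is made up;
the columns follow the query quoted below):

'''
-- Materialize one segment per consecutive pair of points, once.
CREATE TABLE gps_traj AS
SELECT gps_id,
       event_time,
       bearing,
       date_part('epoch',
                 lead(event_time) OVER time_order - event_time) AS time_delta,
       ST_MakeLine(geom, lead(geom) OVER time_order) AS traj,
       ST_Distance(geom, lead(geom) OVER time_order) AS traj_dist
FROM gps
WINDOW time_order AS (PARTITION BY gps_id ORDER BY event_time);

-- Spatial index so the grid join can do an index scan instead of
-- rebuilding every segment on every query.
CREATE INDEX ON gps_traj USING gist (traj);
ANALYZE gps_traj;
'''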

On Wed, Feb 23, 2022 at 6:14 AM Rory Meyer <rory.meyer at vliz.be> wrote:

> Afternoon all,
>
> I've got a database full of GPS points (along with associated data like
> speed, bearing, time, GPS_ID, class, etc.) and I'm trying to do complex
> aggregations with the data. I'm trying to build up a "heatmap" of the data
> by first creating a grid of polygons using ST_HexagonGrid and then using a
> window function to overlay lines (each built from a GPS point and the next
> point from the same GPS_ID via a lead/lag window) onto the grid. I'd like
> to get the number of seconds that GPS-carrying vehicles spend in each hex
> cell, grouped by class, speed, date, direction, etc. The end goal would be
> to query a lon/lat and get back aggregated data for different classes,
> plus speed/bearing distributions.
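
A grid like my_hex_grid (taken as a given in the query below) can be built
once with ST_HexagonGrid (PostGIS 3.1+). A sketch; the 1 km cell size and
the EPSG:3857 SRID are assumptions:

'''
-- Build the hex grid once over the extent of the GPS points.
CREATE TABLE my_hex_grid AS
SELECT row_number() OVER () AS gid, hex.geom
FROM ST_HexagonGrid(
       1000.0,  -- cell size in map units; assumes a projected CRS
       (SELECT ST_SetSRID(ST_Extent(geom)::geometry, 3857) FROM gps)
     ) AS hex;

CREATE INDEX ON my_hex_grid USING gist (geom);
'''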
>
> Here's a simplified look at the SQL (sorry, it's not really simple...):
>
> '''
> SELECT
>     grid.gid,
>     grid.geom,
>     avg(nullif(traj.bearing, 511.0)) AS avg_bearing,  -- 511 = "not available"
>     avg(traj.time_delta) AS avg_time_delta,
>     sum(st_length(st_intersection(traj.traj, grid.geom))
>         * traj.time_delta / traj.traj_dist) AS cum_time_in_grid
> FROM my_hex_grid AS grid
> LEFT JOIN (
>     SELECT subquery.gps_id,
>            subquery.event_date,
>            subquery.bearing,
>            subquery.time_delta,
>            st_makeline(subquery.pos, subquery.pos2) AS traj,
>            st_distance(subquery.pos, subquery.pos2) AS traj_dist
>     FROM (
>         SELECT gps.gps_id,
>                gps.event_time::date AS event_date,
>                date_part('epoch',
>                          lead(gps.event_time) OVER time_order
>                          - gps.event_time) AS time_delta,
>                gps.geom AS pos,
>                gps.bearing,
>                lead(gps.geom) OVER time_order AS pos2
>         FROM gps
>         WHERE gps.event_time >= '<Start Time>'
>           AND gps.event_time <= '<End Time>'
>         WINDOW time_order AS (PARTITION BY gps.gps_id
>                               ORDER BY gps.event_time)
>     ) AS subquery
> ) AS traj ON st_intersects(traj.traj, grid.geom)
> GROUP BY grid.gid, grid.geom
> '''
>
> My issue is that the time the query takes to complete grows non-linearly.
> If <Start Time> to <End Time> spans a couple of hours, it takes a couple
> of seconds to run. If it spans a day, it takes a couple of minutes. If it
> spans a week, it takes a couple of hours.
>
> I'd like to run this for over a year of data but that won't be feasible at
> this rate.
>
> Is there some way to avoid this non-linear increase in time, or would it
> be best to just write some Python code to loop through smaller chunks of
> data and write the results somewhere?
>
> Regards,
> Rory
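
Once the segments are materialized and indexed, the aggregation can also
be run in date-sized chunks from plain SQL and accumulated into a results
table, rather than looping in Python. A sketch, assuming the gps_traj
table from the example above and a made-up heat_results table:

'''
-- Results table for accumulated per-cell statistics (assumed schema).
CREATE TABLE IF NOT EXISTS heat_results (
    gid              bigint,
    day              date,
    cum_time_in_grid float8
);

-- Aggregate one day at a time; rerun (or script) per day of interest.
INSERT INTO heat_results
SELECT grid.gid,
       traj.event_time::date AS day,
       sum(ST_Length(ST_Intersection(traj.traj, grid.geom))
           * traj.time_delta / traj.traj_dist) AS cum_time_in_grid
FROM my_hex_grid AS grid
JOIN gps_traj AS traj
  ON ST_Intersects(traj.traj, grid.geom)
WHERE traj.event_time >= DATE '2022-01-01'
  AND traj.event_time <  DATE '2022-01-02'
  AND traj.traj_dist > 0              -- avoid division by zero
GROUP BY grid.gid, day;
'''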