[postgis-users] Non-linear time cost; please suggest a better way to structure the query

Rory Meyer rory.meyer at VLIZ.be
Wed Feb 23 23:53:25 PST 2022


I haven't looked at MobilityDB (although I will now). In addition to PostGIS I'm using TimeScaleDB to build "continuous aggregates" of the GPS data in order to get things like the latest position per hour, the average speed per day etc. It's pretty handy for that but is unable to use more complex aggregates with window functions, distinct, trajectory creation etc.

I suppose the main part of my query would be portion that distributes the difference between the lead/lag and current event_times (time delta) over the grids that the lead/lag created line pass over:
      sum(((st_length(st_intersection(traj.traj, grid.geom)) * traj.time_delta) / traj.traj_dist))
Does MobilityDB have a function that could help with this?

I tried to stay away from using the trajectory data type since it is only (x,y,t) and what I really need is (x,y,t,m1,m2,m3....) so that I can distribute the speed/bearing/dataX over the hexagon weighted with the time associated with each vertex.

Regards,
Rory

________________________________
From: Florian Nadler <florian.nadler at cybertec.at>
Sent: 24 February 2022 07:45
To: PostGIS Users Discussion <postgis-users at lists.osgeo.org>; Rory Meyer <rory.meyer at VLIZ.be>
Subject: Re: [postgis-users] Non-linear time cost; please suggest a better way to structure the query


Hi,


apart from Paul advice did you ever take into consideration to use MobilityDB for this kind of spatial questions?

This will imply creating trajectories out of gps points too, but might simplify query design and processing time as this extension is developed for this kind of queries.


Checkout https://docs.mobilitydb.com/MobilityDB-BerlinMOD/master/mobilitydb-berlinmod.pdf, chapter 3.3 which deals with quite simliar analysis.


Florian


Am 23.02.2022 um 15:14 schrieb Rory Meyer:
Afternoon all,

I've got a database full of GPS points (along with associated data like speed, bearing, time, GPS_ID, class etc) and I'm trying to do complex aggregations with the data. I'm trying to build up a "heatmap" of the data by first creating a grid of polygons using ST_HexagonGrid and then using a window function to overlay the lines (created from a lead/lag window of each GPS point and the next one from the same GPS_ID) over the grid. I'd like to get the the number of seconds that gps carrying vehicles spend in each hex cell, grouped by class, speed, date, direction etc etc. The end goal would be to query a lon/lat and get a bunch of aggregated data for different classes, speed/bearing distributions.

Here's a simplified look at the SQL (sorry, it's not really simple...):

'''
SELECT
    grid.gid,
    grid.geom,
    avg(traj.bearing, 511.0) AS avg_bearing,
    avg(traj.time_delta) AS avg_time_delta,
    sum(((st_length(st_intersection(traj.traj, grid.geom)) * traj.time_delta) / traj.traj_dist)) AS cum_time_in_grid
FROM (
(my_hex_grid AS grid LEFT JOIN ( SELECT
                                                         subquery.gps_id,
                                                         subquery.event_date,
                                                         subquery.bearing,
                                                         subquery.time_delta,
                                                         st_makeline(subquery.pos, subquery.pos2) AS traj,
                                                         st_distance(subquery.pos, subquery.pos2) AS traj_dist
                                                         FROM (
                                                         SELECT
                                                         gps.mmsi,
                                                         date_part('epoch'::text, (lead(gps.event_time) OVER time_order - gps.event_time)) AS                                                                time_delta,
                                                         gps.geom,
                                                         gps.bearing,
                                                         lead(gps.geom) OVER time_order AS geom2
                                                          FROM gps
                                                         WHERE ((gps.event_time >= '<Start Time>') AND (gps.event_time <= '<End Time>'))
                                                         WINDOW time_order AS (PARTITION BY gps.gps_id ORDER BY gps.event_time)) as subquery
                                  ON (st_intersects(gps.traj, grid.geom)))
  GROUP BY grid.gid, grid.geom
'''

My issue is that I've got a non-linear increase in time that the query takes to complete. If <Start Time> to <End Time> is a couple of hours then it's takes a couple of seconds to run. If it's for a day, it takes a couple minutes to run. If it's for a week it takes a couple of hours.

I'd like to run this for over a year of data but that won't be feasible at this rate.

Is there some way to avoid this non-linear increase in time or would it be best to just write some python code to loop through smaller chunks of data and write the results somewhere?

Regards,
Rory



_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>
https://lists.osgeo.org/mailman/listinfo/postgis-users


--
CYBERTEC PostgreSQL International GmbH
Römerstraße 19, A-2752 Wöllersdorf
Web: https://www.cybertec-postgresql.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220224/3c82f277/attachment.html>


More information about the postgis-users mailing list