[postgis-users] Non-linear time cost; please suggest a better way to structure the query

Wed Feb 23 22:45:45 PST 2022

Hi,

apart from Paul advice did you ever take into consideration to use 
MobilityDB for this kind of spatial questions?
This will imply creating trajectories out of gps points too, but might 
simplify query design and processing time as this extension is developed 
for this kind of queries.

Checkout 
https://docs.mobilitydb.com/MobilityDB-BerlinMOD/master/mobilitydb-berlinmod.pdf, 
chapter 3.3 which deals with quite simliar analysis.

Florian

Am 23.02.2022 um 15:14 schrieb Rory Meyer:
> Afternoon all,
>
> I've got a database full of GPS points (along with associated data 
> like speed, bearing, time, GPS_ID, class etc) and I'm trying to do 
> complex aggregations with the data. I'm trying to build up a "heatmap" 
> of the data by first creating a grid of polygons using ST_HexagonGrid 
> and then using a window function to overlay the lines (created from a 
> lead/lag window of each GPS point and the next one from the same 
> GPS_ID) over the grid. I'd like to get the the number of seconds that 
> gps carrying vehicles spend in each hex cell, grouped by class, speed, 
> date, direction etc etc. The end goal would be to query a lon/lat and 
> get a bunch of aggregated data for different classes, speed/bearing 
> distributions.
>
> Here's a simplified look at the SQL (sorry, it's not really simple...):
>
> '''
> SELECT
>     grid.gid,
>     grid.geom,
>     avg(traj.bearing, 511.0) AS avg_bearing,
>     avg(traj.time_delta) AS avg_time_delta,
>     sum(((st_length(st_intersection(traj.traj, grid.geom)) * 
> traj.time_delta) / traj.traj_dist)) AS cum_time_in_grid
> FROM (
> (my_hex_grid AS grid LEFT JOIN ( SELECT
>  subquery.gps_id,
> subquery.event_date,
> subquery.bearing,
> subquery.time_delta,
> st_makeline(subquery.pos, subquery.pos2) AS traj,
> st_distance(subquery.pos, subquery.pos2) AS traj_dist
> FROM (
> SELECT
> gps.mmsi,
> date_part('epoch'::text, (lead(gps.event_time) OVER time_order - 
> gps.event_time)) AS time_delta,
> gps.geom,
> gps.bearing,
> lead(gps.geom) OVER time_order AS geom2
>  FROM gps
> WHERE ((gps.event_time >= '<Start Time>') AND (gps.event_time <= '<End 
> Time>'))
> WINDOW time_order AS (PARTITION BY gps.gps_id ORDER BY 
> gps.event_time)) as subquery
>                                   ON (st_intersects(gps.traj, grid.geom)))
>   GROUP BY grid.gid, grid.geom
> '''
>
> My issue is that I've got a non-linear increase in time that the query 
> takes to complete. If <Start Time> to <End Time> is a couple of hours 
> then it's takes a couple of seconds to run. If it's for a day, it 
> takes a couple minutes to run. If it's for a week it takes a couple of 
> hours.
>
> I'd like to run this for over a year of data but that won't be 
> feasible at this rate.
>
> Is there some way to avoid this non-linear increase in time or would 
> it be best to just write some python code to loop through smaller 
> chunks of data and write the results somewhere?
>
> Regards,
> Rory
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users

-- 
CYBERTEC PostgreSQL International GmbH
Römerstraße 19, A-2752 Wöllersdorf
Web:https://www.cybertec-postgresql.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220224/cf86994f/attachment.html>