[postgis-users] Non-linear time cost; please suggest a better way to structure the query

Florian Nadler florian.nadler at cybertec.at
Sat Feb 26 03:56:56 PST 2022

Hi Rory,

I recommend starting with the latest MobilityDB extension (the Docker 
image seems to be outdated) and importing and partitioning the Berlin 
Workshop data, as described in the BerlinMOD workshop document linked 
further down in this thread.
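
For the partitioning step, a minimal sketch (the TripDate column and 
partition names are assumptions, not taken from the workshop scripts) is 
to range-partition the trips by day, so that a query restricted to a 
single day only scans the matching partition:

CREATE TABLE Trips (
    CarId    integer NOT NULL,
    TripId   integer NOT NULL,
    TripDate date    NOT NULL,
    Trip     tgeompoint           -- MobilityDB temporal geometry point
) PARTITION BY RANGE (TripDate);

CREATE TABLE Trips_2007_05_28 PARTITION OF Trips
    FOR VALUES FROM ('2007-05-28') TO ('2007-05-29');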

The query below should give a good starting point for your analysis - 
for a given period, output the average time cars remain in regions.

SELECT T.CarId, R.RegionId,
       -- average time each trip spends in the region during the period
       avg(duration(atGeometry(
           atPeriod(T.Trip, '[2007-05-28 00:00:00+00, 2007-05-29 00:00:00+00]'::period),
           R.Geom))) AS avg_time_in_region
FROM Trips T, Regions R
WHERE T.Trip && stbox(R.Geom, '[2007-05-28 00:00:00+00, 2007-05-29 00:00:00+00]'::period)
  AND st_intersects(trajectory(atPeriod(T.Trip,
        '[2007-05-28 00:00:00+00, 2007-05-29 00:00:00+00]'::period)), R.Geom)
GROUP BY T.CarId, R.RegionId;
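
The && predicate only pays off if the Trips table has a spatiotemporal 
index. Assuming none was created during the import, a minimal sketch 
would be a GiST index on the trip column:

CREATE INDEX Trips_Trip_gist_idx ON Trips USING gist (Trip);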

Just let me know if this works out for you.
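
One more pointer for the questions in your mail quoted below: MobilityDB 
can compute time-in-cell directly. atGeometry() restricts a trip to a 
polygon, duration() returns the time spent there, and twAvg() gives a 
time-weighted average of a temporal float over that time. A rough sketch 
against your hex grid (my_hex_grid comes from your query; here the speed 
is derived with MobilityDB's speed(), but a sensor-reported speed stored 
as a tfloat column would work the same way):

SELECT grid.gid,
       T.CarId,
       -- time the trip spends inside this hexagon
       duration(atGeometry(T.Trip, grid.geom)) AS time_in_cell,
       -- time-weighted average speed while the trip is inside the hexagon
       twAvg(atPeriodSet(speed(T.Trip),
                         getTime(atGeometry(T.Trip, grid.geom)))) AS tw_speed
FROM my_hex_grid AS grid
JOIN Trips T
  ON st_intersects(trajectory(T.Trip), grid.geom);

This yields one row per trip and cell; summing time_in_cell per gid gives 
the heatmap value.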


On 24.02.2022 at 08:53, Rory Meyer wrote:
> I haven't looked at MobilityDB (although I will now). In addition to 
> PostGIS I'm using TimescaleDB to build "continuous aggregates" of the 
> GPS data in order to get things like the latest position per hour, the 
> average speed per day, etc. It's pretty handy for that, but it can't 
> handle more complex aggregates involving window functions, DISTINCT, 
> trajectory creation, etc.
> I suppose the main part of my query would be the portion that distributes 
> the difference between the lead/lag and current event_times (the time 
> delta) over the grid cells that the lead/lag-created line passes over:
>       sum(((st_length(st_intersection(traj.traj, grid.geom)) * 
> traj.time_delta) / traj.traj_dist))
> Does MobilityDB have a function that could help with this?
> I tried to stay away from using the trajectory data type since it is 
> only (x,y,t) and what I really need is (x,y,t,m1,m2,m3,...) so that I 
> can distribute the speed/bearing/dataX over the hexagons weighted by 
> the time associated with each vertex.
> Regards,
> Rory
> ------------------------------------------------------------------------
> *From:* Florian Nadler <florian.nadler at cybertec.at>
> *Sent:* 24 February 2022 07:45
> *To:* PostGIS Users Discussion <postgis-users at lists.osgeo.org>; Rory 
> Meyer <rory.meyer at VLIZ.be>
> *Subject:* Re: [postgis-users] Non-linear time cost; please suggest a 
> better way to structure the query
> Hi,
> apart from Paul's advice, have you ever considered using 
> MobilityDB for this kind of spatial question?
> It would still mean creating trajectories out of GPS points, but it might 
> simplify query design and reduce processing time, since this extension is 
> built for exactly this kind of query.
> Check out 
> https://docs.mobilitydb.com/MobilityDB-BerlinMOD/master/mobilitydb-berlinmod.pdf, 
> chapter 3.3, which deals with quite similar analyses.
> Florian
> On 23.02.2022 at 15:14, Rory Meyer wrote:
>> Afternoon all,
>> I've got a database full of GPS points (along with associated data 
>> like speed, bearing, time, GPS_ID, class, etc.) and I'm trying to do 
>> complex aggregations with the data. I'm trying to build up a 
>> "heatmap" of the data by first creating a grid of polygons 
>> using ST_HexagonGrid and then using a window function to overlay the 
>> lines (created from a lead/lag window of each GPS point and the next 
>> one from the same GPS_ID) over the grid. I'd like to get the 
>> number of seconds that GPS-carrying vehicles spend in each hex cell, 
>> grouped by class, speed, date, direction, etc. The end goal would 
>> be to query a lon/lat and get a bunch of aggregated data for 
>> different classes and speed/bearing distributions.
>> Here's a simplified look at the SQL (sorry, it's not really simple...):
>> '''
>> SELECT
>>     grid.gid,
>>     grid.geom,
>>     -- 511 is the "not available" sentinel for bearing/heading (AIS);
>>     -- nullif() excludes it from the average (assumed intent)
>>     avg(nullif(traj.bearing, 511.0)) AS avg_bearing,
>>     avg(traj.time_delta) AS avg_time_delta,
>>     -- nullif avoids division by zero for stationary fixes
>>     sum((st_length(st_intersection(traj.traj, grid.geom)) *
>>          traj.time_delta) / nullif(traj.traj_dist, 0)) AS cum_time_in_grid
>> FROM my_hex_grid AS grid
>> LEFT JOIN (
>>     SELECT
>>         subquery.gps_id,
>>         subquery.event_date,
>>         subquery.bearing,
>>         subquery.time_delta,
>>         st_makeline(subquery.pos, subquery.pos2) AS traj,
>>         st_distance(subquery.pos, subquery.pos2) AS traj_dist
>>     FROM (
>>         SELECT
>>             gps.gps_id,
>>             gps.event_time::date AS event_date,
>>             date_part('epoch'::text, lead(gps.event_time) OVER time_order
>>                 - gps.event_time) AS time_delta,
>>             gps.geom AS pos,
>>             gps.bearing,
>>             lead(gps.geom) OVER time_order AS pos2
>>         FROM gps
>>         WHERE gps.event_time >= '<Start Time>'
>>           AND gps.event_time <= '<End Time>'
>>         WINDOW time_order AS (PARTITION BY gps.gps_id ORDER BY gps.event_time)
>>     ) AS subquery
>> ) AS traj
>>     ON st_intersects(traj.traj, grid.geom)
>> GROUP BY grid.gid, grid.geom
>> '''
>> My issue is that there's a non-linear increase in the time the 
>> query takes to complete. If <Start Time> to <End Time> spans a couple of 
>> hours, it takes a couple of seconds to run. If it's a day, 
>> it takes a couple of minutes to run. If it's a week, it takes a 
>> couple of hours.
>> I'd like to run this over a year of data, but that won't be 
>> feasible at this rate.
>> Is there some way to avoid this non-linear increase in time or would 
>> it be best to just write some python code to loop through smaller 
>> chunks of data and write the results somewhere?
>> Regards,
>> Rory
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
> -- 
> CYBERTEC PostgreSQL International GmbH
> Römerstraße 19, A-2752 Wöllersdorf
> Web: https://www.cybertec-postgresql.com

CYBERTEC PostgreSQL International GmbH
Römerstraße 19, A-2752 Wöllersdorf
