[postgis-users] grouping points by time offset

Webb Sprague webb.sprague at gmail.com
Sun Jul 1 11:13:02 PDT 2007


Hi Kurt and everyone,

I have an untried, vaguely specified idea:

1.  Add a column "consistent_track_id" - this will hold a number
identifying a "consistent track" - one formed by a set of points all
within one hour of each other for a given ship.  This will be NULL at
first.

2.  In a procedural language, grab a list of all the measurement
points for a given ship *ordered by time*.  Start with the first one,
give it an arbitrary unique ID for a "consistent track", and store
that ID in a variable.  Grab the next measurement point; if the time
difference is less than one hour, assign the same ID to it, otherwise
get a new "consistent track" ID and start the process again.  This
fills the table with IDs showing which "consistent track" each point
belongs to.  (My logic may be slightly off, especially at the edges,
but you get the idea.)

3.  Now it is easy: use a GROUP BY consistent_track_id in a SELECT to
aggregate the points into LINESTRINGs, and SELECT INTO a new table.

4.  With a ship ID, you could do all of this in one table for all the
ships and measurement points, which would be a little more graceful.
The primary key would probably be (ship_id, measurement_timestamp).
Keep a separate table to store the coalesced tracks.

5.  If you are streaming the measurement points in, try to assign a
consistent track ID to each one as it arrives, which would be easy
and saves on query time.

6.  "Indexes", not "keys", are what (sometimes) speed up queries.
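
The gap-splitting and grouping in steps 2 and 3 can be sketched in
plain Python (a rough sketch only - the function names, the point
tuples, and the WKT output are my own inventions here; in the real
database this would be a PL/pgSQL loop or a psycopg2 script, with the
LINESTRINGs built by aggregating the grouped points):

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(hours=1)  # the one-hour threshold from step 2

def assign_tracks(points):
    """Assign a consistent track ID to each (timestamp, x, y) point.

    `points` must already be sorted by timestamp for a single ship.
    Returns a list of (track_id, timestamp, x, y) tuples.
    """
    tracks = []
    track_id = 0
    prev_ts = None
    for ts, x, y in points:
        if prev_ts is not None and ts - prev_ts > MAX_GAP:
            track_id += 1          # gap too large: start a new track
        tracks.append((track_id, ts, x, y))
        prev_ts = ts
    return tracks

def tracks_to_wkt(tracks):
    """Group points by track ID and emit one WKT LINESTRING per track
    (the Python analogue of step 3's GROUP BY + aggregation)."""
    grouped = {}
    for track_id, _, x, y in tracks:
        grouped.setdefault(track_id, []).append((x, y))
    return {tid: "LINESTRING(" + ", ".join(f"{x} {y}" for x, y in pts) + ")"
            for tid, pts in grouped.items()}
```

The same split could probably also be done in pure SQL with a
self-join or a window over the ordered timestamps, but the procedural
version is easier to reason about at the edges.
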

HTH, and let us all know how you finally fix it.

On 7/1/07, Kurt Schwehr <schwehr at ccom.unh.edu> wrote:
> Hi Rodrigo,
>
> I am not sure how to dump a view in a way that is useful, so here is
> what makes up the data set and answer for one ship.
>
> select key,userid,cg_sec,cg_timestamp from oneShip;
> key |  userid   |   cg_sec   |    cg_timestamp
> -----+-----------+------------+---------------------
> 251 | 366725230 | 1178661410 | 2007-05-08 21:56:50
> 252 | 366725230 | 1178661421 | 2007-05-08 21:57:01
> 253 | 366725230 | 1178661561 | 2007-05-08 21:59:21
> 254 | 366725230 | 1178661710 | 2007-05-08 22:01:50
> 255 | 366725230 | 1178661900 | 2007-05-08 22:05:00
> 256 | 366725230 | 1178661940 | 2007-05-08 22:05:40
> 257 | 366725230 | 1178663311 | 2007-05-08 22:28:31
> 258 | 366725230 | 1178663360 | 2007-05-08 22:29:20
> 259 | 366725230 | 1178663441 | 2007-05-08 22:30:41
> 260 | 366725230 | 1178663540 | 2007-05-08 22:32:20
> -- here is the time jump between the two that I need to split on
> 266 | 366725230 | 1178729461 | 2007-05-09 16:51:01
> 267 | 366725230 | 1178730221 | 2007-05-09 17:03:41
> (12 rows)
>
> Where the results should be this for the separated transits:
>
> userid    | startKey | endKey
> 366725230 |      251 |    260
> 366725230 |      266 |    267
>
> I've also put a dump of the db up on the web of "pg_dump --create
> ais > ais_pgdump.sql" in case that might be easier to deal with.
>
> http://vislab-ccom.unh.edu/~schwehr/ais/tmp/
>
> I've written a little python/psycopg2 script, but the full database
> has 4.5M entries and the processing is going to take the whole day
> (which I think implies that I need to tell postgresql that userid
> should be a key).
>
> Thanks much!
> -kurt
>
