[postgis-users] Issue with ST_Intersection seems related to lines crossing themself
Rory Meyer
rory.meyer at VLIZ.be
Fri Oct 28 02:22:49 PDT 2022
Hi Juergen,
I wrote a quick blog post about how I grouped up AIS points into lines, but broken up on steps in time, distance, and calculated speed.
It's very short but contains some thinking and some code.
https://openais.xyz/post/2022-10-28-ais-traj/
You could add an additional spatial filter in the first part of the blog query to only fetch AIS points that are within the geometry that defines your port area.
Rory
________________________________
From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of magog002 at web.de <magog002 at web.de>
Sent: 24 October 2022 19:40
To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Issue with ST_Intersection seems related to lines crossing themself
Hello Rory,
Hello Simon,
thank you both for the feedback.
Sorry for the delay. I'm now back on this topic after my crazy last week.
First some additional explanation:
==================================
I use the LineString also to draw the track on a map (for each vessel / MMSI).
The track LineString from 00:00 of the current date to NOW() is generated "on-the-fly" from the stored GPS points (basically the RAW data you you get from the "AIS position report").
For today's data + 1-2 days older, the "on-the-fly" generated information is combined with a UNION from the information retrieved from table 'ais_trajectory_mmsi_daily'.
Currently I've got ~2 million new GPS position entries each day.
In general my aggregation of points per day produces lines that go "over land" if there is no data for some time until the next Antenna picks up something again (or jumping points).
As you Rory pointed out I might have to go the Multi-LineString way so if I don't get new GPS points for about ~10+ Min (vessel out of antenna range; by AIS definition the max. interval is 180 seconds) I start a new LineString (plus some other checks like distance).
More on this further down below as this is an extra task.
To answer both of your questions:
=================================
Yes I can also use MultiPoint data.
I have now added a new column to store the MultiPoint (type: PointM) to the table 'ais_trajectory_mmsi_daily'.
To test it I updated the missing data from my stored individual point data "AIS position report" (which is only a short time storage due to the large amount of data coming in).
I also fixed the old LineString column definition which was missing the explicit SRID before.
--
-- Add extra column to also store the PointM (Point + Timestamp) data as MultiPointM geometry (WGS84/GPS).
--
ALTER TABLE ais_trajectory_mmsi_daily ADD COLUMN geom_multipointm GEOMETRY(MULTIPOINTM, 4326);
--
-- Change column definition to geom type of LineStringM with WGS84 (SRID: 4326).
--
ALTER TABLE ais_trajectory_mmsi_daily ALTER COLUMN geo_lat_long_line TYPE GEOMETRY(LINESTRINGM, 4326) USING ST_SetSRID(geo_lat_long_line, 4326);
The complete table looks like this now:
=======================================
CREATE TABLE IF NOT EXISTS aisstream.ais_trajectory_mmsi_daily
(
mmsi integer,
min_created_on timestamp with time zone,
max_created_on timestamp with time zone,
geom_linestring geometry(LineStringM,4326), -- Fixed: Added exact type + SRID 4326!
geom_multipointm geometry(MultiPointM,4326)
)
Index (so far no index on the geometry columns):
================================================
CREATE INDEX IF NOT EXISTS ais_trajectory_mmsi_daily_mmsi ON aisstream.ais_trajectory_mmsi_daily USING btree (mmsi ASC NULLS LAST);
Additional indexes didn't really speed things up so far.
I've checked an GIST index on the 'geom_linestring' used in the ST_Intersects query as well as the 'min_created_on' column. I did an ANALYZE on the table afterwards to update the statistics. ;)
So far I still use the "... WHERE ST_Intersects(geom_linestring, geom_polygon) AND ..." with the LineString on my Polygon which is so far fast enough (0.5 seconds). Once I run "ST_Dump(ST_Intersection(geom_linestring, geom_polygon)) in the SELECT part the query times completely explode (30+ minutes). With MultiPoints the ST_DUMP is much faster.
With the suggested MultiPoint approach (the newly added column) the query times are fine if I run "ST_Intersection(geom_multipointm, geom_polygon) on the already 'pre-filtered' data.
So, I now have my points inside my polygon still in ~0.5 seconds (if I check for a range of ~8 days).
Here is my current query:
=====================
--
-- Find lines intersecting the given polygon (port/berth area).
-- From those get the actual points (at least 2 point) that are inside the given polygon (port/berth area).
-- What we want for now is the timestamp of the first and last point inside the polygon.
-- ==> This information is lost by calling ST_Intersection(...)
-- in the subquery (see alias: geom_intersection)!
--
-- The query also dumps some additional details (now many points do we have,..).
--
--EXPLAIN ANALYZE
WITH my_port AS (
SELECT ST_GeometryFromText('POLYGON((6.972692012786865 50.929290943993465,6.971780061721802 50.930535164787784,6.971426010131836 50.93042697299645,6.972337961196899 50.92918951148323,6.972692012786865 50.929290943993465))', 4326) AS geom
)
,my_traj AS (
SELECT ais.mmsi
,ais.min_created_on AS mintime
,ais.max_created_on AS maxtime
,ais.geom_linestring
,ais.geom_multipointm
FROM ais_trajectory_mmsi_daily AS ais
)
SELECT geom_intersection.mmsi
,geom_intersection.created_on_date
,ST_NPoints(geom_intersection.port_mpoint_intersect) AS point_count
,ST_GeometryN(geom_intersection.port_mpoint_intersect, 1) AS first_pointm
--
-- The following does not exist any more
--
,ST_M(ST_GeometryN(geom_intersection.port_mpoint_intersect, 1)) AS first_pointm_m_value
--
,ST_GeometryType(ST_GeometryN(geom_intersection.port_mpoint_intersect, 1)) AS first_pointm_geom_type
,geom_intersection.port_mpoint_intersect
,ST_GeometryType(geom_intersection.port_mpoint_intersect) AS geom_type
FROM (
SELECT my_traj.mmsi
,CAST(my_traj.mintime AS date) AS created_on_date
,ST_Intersection(geom_multipointm, my_port.geom) AS port_mpoint_intersect
--,ST_3DIntersection(geom_multipointm, my_port.geom) AS port_mpoint_3dintersect -- function does not exist error
,my_traj.geom_linestring
--,(ST_Dump(ST_Intersection(my_traj.geom_linestring, my_port.geom) )).geom AS geom_dump -- extremely slow (linestring)
--,(ST_Dump(ST_Intersection(my_traj.geom_multipointm, my_port.geom) )).geom AS geom_dump -- fast (multipoint)
FROM my_port, my_traj
WHERE ST_Intersects(my_traj.geom_linestring, my_port.geom)
AND my_traj.mintime >= '2022-10-16'
AND my_traj.maxtime < '2022-10-24'
GROUP BY mmsi, created_on_date , my_port.geom, my_traj.geom_multipointm ,my_traj.geom_linestring
) AS geom_intersection
WHERE ST_NPoints(geom_intersection.port_mpoint_intersect) > 1
GROUP BY mmsi, created_on_date, port_mpoint_intersect
;
1 - Issue - How do I get my timestamp for the points inside my polygon?:
========================================================================
I now have the issue that I lost the timestamp stored within my Point ("M" value) because ST_Intersection(..) removes this (https://postgis.net/docs/ST_Intersection.html).
The documentation kind of suggests 'ST_3DIntersection' (at least the Z is not removed...so I assume "M" would be available too after calling this function). This function should be available with PostGIS 2.1.0+.
I'm running PostgreSQL 13.7 with PostGIS 3.2.1 here (Ubuntu Linux).
The problem is that I get the error "function st_3dintersection(geometry, geometry) does not exist". I don't get why this is the case.
So how can I get back the timestamp for my identified points?
Some kind of LATERAL JOIN on my original "geom_multipointm" column data?
Sorry, I'm lost here.
2 - How to creating a Multi-LineString instead of a single LineString?
I assume some kind of window function LAG/LEAD between the points.
=================================================================
The other (new) task is break down my single LineString to Multi-LineString while aggregating the geom point data.
Currently the data is aggreated per day and MMSI. The 'Point' is converted to a 'PointM' (with the created_on timestamp) to be added to ST_MakeLine(geom_pointm ORDER BY created_on).
It seems I need some help on how to create this "multi-linestring".
I assume I need to use a window function to compare the current with the next and if >= 10 Min this following point is in the next linestring (minimum of two points required in each group) + check distance (to remove total nonsense).
For starters the >= 10 Min gap to generate Multi-LineStrings would be ok. The rest is for additional fine tuning later on. I started experimenting with the duration between two points using a window function.
Many thanks in advance!
Juergen
Gesendet: Montag, 17. Oktober 2022 um 15:09 Uhr
Von: "Rory Meyer" <rory.meyer at VLIZ.be>
An: "PostGIS Users Discussion" <postgis-users at lists.osgeo.org>
Betreff: Re: [postgis-users] Issue with ST_Intersection seems related to lines crossing themself
Hi Juergen,
I've been helping on an open source project for processing AIS data in postgis with a focus on scientific applications instead of commercial ones: http://openais.xyz/ It's still a little immature and the database portion of the project is still closed while I rewrite some of the containers.
Somethings that have helped me:
* Since AIS is a timeseries, geospatial dataset it often helps to have a timeseries plugin to work with. The TimeScaleDB HA version contains both PostGIS 3+ and TimescaleDB plugins. Timescaledb lets you automatically partition a table based on time and to automatically create binned data from live data. You could have a first/last position per ship MMSI per hour/day/week.
* The GPS on AIS is generally pretty accurate but can suffer from poor GPS lock when in ports that have high buildings, cranes etc near the harbour. This causes the vessels to "jump" around much more than expected. There are also instances of the vessels jumping several hundred km's in either the latitude or longitude direction. This might be due to bit-flip errors in the AIS VHF message.
* I would say that having a line intersect with your port geometry, even if no points are within it, is the kind of behaviour that you want. If this is not the case, then I would just check for points within the port area instead of the trajectory.
* For handling a trajectory that is on the edge of the port area and causing multiple, close in time, entrance and exit signals it might be worth looking into some of the following:
* Some kind hysteresis function by having a combination of the port boundaries and a buffer of the port boundaries.
* Filtering the entrance/exit signals through time. TimeScaleDB gives aggregate functions like "first(<column>, timestamp)" that would select the first item in the column by timestamp. First grouping the data by some sensible time window (daily) and then selecting the first/last entrance/exit signal per vessel
* Ignoring entrance/exit signals where the reverse signal occurred within X seconds.
* You might need more complex Trajectory generation code, something that uses window functions to break trajectories on obviously bad points (gaps between messages longer than 1 hour, sudden jumps of 100km+, points where calculated speed > 2* reported speed)
* It might be worth dumping the Z-value of the trajectory to get the first/last time in the intersection segment.
* To make your SQL a little easier to read you can use Common Table Expressions:
```
WITH my_port AS
(SELECT ST_GeometryFromText('POLYGON((7.1084729427821 50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 50.7352828020046))', 4326) AS geom ),
my_traj AS
(SELECT
ais.mmsi,
ais.created_on_date as date,
ais.min_created_on as mintime,
ais.max_created_on as maxtime,
ais.geo_lat_long_line as geom --renaming these to just help me out here
FROM ais_trajectory_mmsi_daily AS ais
)
SELECT
my_traj.mmsi,
my_traj.date,
ST_Intersection(my_traj.geom, my_port.geom) as port_traj_intersect,
ST_ZMin(ST_Intersection(my_traj.geom, my_port.geom)) as segment_start_time,
ST_ZMax(ST_Intersection(my_traj.geom, my_port.geom)) as segment_end_time
FROM my_geom, my_traj
WHERE ST_Intersects(my_traj.geom, my_port.geom)
AND my_traj.mintime > '2022-10-01'
AND my_traj.maxtime < '2022-10-02'
```
Hope that helps some,
Rory
________________________________
From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of Simon SPDBA Greener <simon at spdba.com.au>
Sent: 17 October 2022 01:56
To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Issue with ST_Intersection seems related to lines crossing themself
Would using a MultiPoint (or GeometryCollection with Points) for the GPS locations rather than a Linestring be possible?
Simon
On 17/10/2022 10:10 am, magog002 at web.de<mailto:magog002 at web.de> wrote:
Hi,
I've got an issue with an SQL PostGIS query and hope to get a solution here.
First some details.
I'm storing GPS locations of vessels by their MMSI identification number (AIS) that are received in 3 Sek to 3 Minutes ranges in a table.
As this becomes quite unhandy in a very short time, each day the basic details mmsi, received_at (Timestamp), lat, long are converted into a Points with the Timestamp integrated and this is then converted into a LineString for each MMSI, per day, with the first and last timestamp of the day for which a point exists as direct columns.
Now I want to know when some vessel is/was inside a specific area (harbor, berth) so I have a Polygon which the LineString has to intersect.
So far so good.
This works but can create false positives (two points outside the polygon are connected and this created line goes through the Polygon but there is no actual Point of that line in the Polygon)...so I need to re-check again by pulling out the points from my LineString.
Here the basic search query that basically works:
================================
--
-- Return all vessels (MMSI) found in the selected timeframe where the
-- LineString Intersects the Polygon given as parameter to search.
--
SELECT ais_trajectory_mmsi_daily.mmsi
,ais_trajectory_mmsi_daily.min_created_on
,ais_trajectory_mmsi_daily.max_created_on
,ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326) AS geo_lat_long_line
FROM ais_trajectory_mmsi_daily
WHERE ST_Intersects(ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326),
ST_GeometryFromText('POLYGON((7.1084729427821 50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 50.7352828020046))', 4326) )
AND min_created_on >= CAST('2022-10-01' AS TIMESTAMPTZ)
AND max_created_on < CAST('2022-10-15' AS TIMESTAMPTZ)
;
Here is now the query I use with ST_Intersection and where it starts to go wrong.
When a vessel stays a the same place the GPS coordinates are the same + drift.
As those points are all added to the LineString the line sections are running over themself and it looks like a "woolen noodle".
This seems to cause issues with ST_Intersection as I use it because instead of getting only the entering and exiting details as normally 1 row where the LineString hits my Polygon with times I now get hundreds of rows as a result and the visual result shown on the map is weird and sometimes wrong (the line goes out of the polygon in a way the original LineString did never exist).
Here is the query in question (basically the part above) with ST_Dump and ST_Intersection in the SELECT part + plus an outer query to get more details on what was the first and last time there was an intersection.
--
-- Outer query with union to visualize the Polygon and the
-- data received from the LineString that Intersects
-- with the polygon via e.g. pgAdmin.
--
SELECT NULL AS mmsi
,NULL AS created_on_date
,NULL AS min_created_on
,NULL AS max_created_on
,ST_GeometryFromText('POLYGON((7.1084729427821 50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 50.7352828020046))', 4326) AS geom
,NULL AS geom_start
,NULL AS geom_end
UNION
-- The actual data of interest
SELECT intersection.mmsi
,intersection.created_on_date
,intersection.min_created_on
,intersection.max_created_on
,intersection.geom
,MIN(DATE_TRUNC('second', TO_TIMESTAMP(ST_InterpolatePoint(intersection.geo_lat_long_line, ST_StartPoint(intersection.geom)) )::timestamptz )) AS geom_start
,MAX(DATE_TRUNC('second', TO_TIMESTAMP(ST_InterpolatePoint(intersection.geo_lat_long_line, ST_EndPoint(intersection.geom)) )::timestamptz )) AS geom_end
FROM (
SELECT ais_trajectory_mmsi_daily.mmsi
,CAST(ais_trajectory_mmsi_daily.min_created_on AS date) AS created_on_date
,ais_trajectory_mmsi_daily.min_created_on
,ais_trajectory_mmsi_daily.max_created_on
,ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326) AS geo_lat_long_line
--
-- Here seems to be the problem...
--
,(ST_Dump(ST_Intersection(ST_GeometryFromText('POLYGON((7.1084729427821 50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 50.7352828020046))', 4326)
,ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326)
)
)).geom AS geom
FROM ais_trajectory_mmsi_daily
WHERE ST_Intersects(ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326),
ST_GeometryFromText('POLYGON((7.1084729427821 50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 50.7352828020046))', 4326) )
AND min_created_on >= CAST('2022-10-01' AS TIMESTAMPTZ)
AND max_created_on < CAST('2022-10-02' AS TIMESTAMPTZ)
) AS intersection
GROUP BY intersection.mmsi
,intersection.created_on_date
,intersection.min_created_on
,intersection.max_created_on
,intersection.geom
ORDER BY created_on_date, mmsi, geom_start
;
If needed I can try to create some sample with data to put on sqlfiddle.
On Stackoverflow I found this which seems to be related to the same issue:
https://stackoverflow.com/questions/62090829/why-st-intersection-from-postgis-returns-multilinestring-for-self-intersecting-l
Many thanks in advance for hints to solve the problem.
Juergen
_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>
https://lists.osgeo.org/mailman/listinfo/postgis-users
--
Simon Greener
39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, Australia
(m) +61 418 396 391
(w) www.spdba.com.au<http://www.spdba.com.au>
(m) simon at spdba.com.au<mailto:simon at spdba.com.au>
_______________________________________________ postgis-users mailing list postgis-users at lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20221028/d679b9b6/attachment.htm>
More information about the postgis-users
mailing list