[postgis-tickets] [PostGIS] #4853: Documentation of ST_ClusterDBSCAN function and minpoints parameter definition issue
PostGIS
trac at osgeo.org
Tue Feb 16 01:27:39 PST 2021
#4853: Documentation of ST_ClusterDBSCAN function and minpoints parameter
definition issue
-------------------------------------------------+-------------------------
Reporter: gisupanalizy | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS
| 2.4.10
Component: postgis | Version: 2.4.x
Keywords: window st_clusterdbscan windowing |
function |
-------------------------------------------------+-------------------------
While using ST_ClusterDBSCAN function I was confused seeing that some
clusters haven't got created even if they evidently should be. I was
assuming that:
{{{
ST_ClusterDBSCAN(geom, eps := 1, minpoints := 4) OVER(PARTITION BY
CONCAT(country_code, elevation_ft, height_ft, obstacle_type))
}}}
would result in clustering points that have the same "PARTITION BY"
attributes and the minimum group would need to have 4 points (including
core point) in eps distance. According to the docs:
https://postgis.net/docs/manual-2.4/ST_ClusterDBSCAN.html
An input geometry will be added to a cluster if it is either:
- A "core" geometry, that is within eps distance of at least minpoints
input geometries (including itself) or
- A "border" geometry, that is within eps distance of core geometry. its
surrounding area with radius eps.
But it seems that it's not exactly true. It seems that in order to cluster
4 points (like minpoints parameter is set), the grouping query:
{{{
OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft,
obstacle_type))
}}}
needs to result in at least five objects to let clst_id to be created for
other four.. Here is an example:
{{{
CREATE TABLE IF NOT EXISTS public.point_table
(
point_sys_id serial primary key,--System generated Primary Key - Asset
Cache
point_id bigint,
geom geometry(Point,4326),--Geometry Field
country_code varchar(4),--Country Code
elevation_ft numeric(7,2),--Elevation in Feet
height_ft numeric(7,2),--Height in Feet
obstacle_type varchar(50)--Obstacle Type
)
INSERT INTO point_table(point_id, geom, country_code, elevation_ft,
height_ft, obstacle_type)
VALUES
(1, '0101000020E6100000E4141DC9E5934B40D235936FB6193940', 'ARE', 100, 50,
'BUILDING'),
(2, '0101000020E6100000C746205ED7934B40191C25AFCE193940', 'ARE', 100, 50,
'BUILDING'),
(3, '0101000020E6100000C780ECF5EE934B40B6BE4868CB193940', 'ARE', 100, 50,
'BUILDING'),
(4, '0101000020E6100000A97A358FA5AF4B4074A0C65B724C3940', 'ARE', 100, 50,
'BUILDING'), -- this point is outside of the cluster distance (eps)
(5, '0101000020E6100000ABB2EF8AE0934B404451A04FE4193940', 'ARE', 100, 50,
'BUILDING')
select ST_ClusterDBSCAN(geom, eps := 0.000906495804256269, minpoints := 4)
OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft,
obstacle_type)) as clst_id,
point_id, geom, country_code, elevation_ft, height_ft,
obstacle_type
from point_table
--where point_id != 4
}}}
Running clustering query agains all five points works fine. But once you
exclude seemingly irrelevant point_id = 4 (which is outside of eps
distance anyway) the clustering stops working (clst_id becomes null), even
if still theoretically 4 needed points (according to the docs) are in
place.
Once I change the minpoints parameter to 3, clustering works fine for
those 4 neighbor points.
Can someone confirm my conclusion that ST_ClusterDBSCAN is not correct or
give some good exmplanation for this behavior?
The same issue I've posted on StackOverflow:
https://stackoverflow.com/questions/66159186/st-clusterdbscan-function-
and-minpoints-parameter-definition
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/4853>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list