[postgis-tickets] [PostGIS] #4853: Documentation of ST_ClusterDBSCAN function and minpoints parameter definition issue

PostGIS trac at osgeo.org
Tue Feb 16 01:27:39 PST 2021


#4853: Documentation of ST_ClusterDBSCAN function and minpoints parameter
definition issue
-------------------------------------------------+-------------------------
 Reporter:  gisupanalizy                         |      Owner:  pramsey
     Type:  defect                               |     Status:  new
 Priority:  medium                               |  Milestone:  PostGIS
                                                 |  2.4.10
Component:  postgis                              |    Version:  2.4.x
 Keywords:  window st_clusterdbscan windowing    |
  function                                       |
-------------------------------------------------+-------------------------
 While using ST_ClusterDBSCAN function I was confused seeing that some
 clusters haven't got created even if they evidently should be. I was
 assuming that:
 {{{
 ST_ClusterDBSCAN(geom, eps := 1, minpoints := 4) OVER(PARTITION BY
 CONCAT(country_code, elevation_ft, height_ft, obstacle_type))
 }}}

 would result in clustering points that have the same "PARTITION BY"
 attributes and the minimum group would need to have 4 points (including
 core point) in eps distance. According to the docs:
 https://postgis.net/docs/manual-2.4/ST_ClusterDBSCAN.html
 An input geometry will be added to a cluster if it is either:

 - A "core" geometry, that is within eps distance of at least minpoints
 input geometries (including itself) or
 - A "border" geometry, that is within eps distance of core geometry. its
 surrounding area with radius eps.

 But it seems that it's not exactly true. It seems that in order to cluster
 4 points (like minpoints parameter is set), the grouping query:


 {{{
 OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft,
 obstacle_type))
 }}}


 needs to result in at least five objects to let clst_id to be created for
 other four.. Here is an example:


 {{{
 CREATE TABLE IF NOT EXISTS public.point_table
 (
     point_sys_id serial primary key,--System generated Primary Key - Asset
 Cache
     point_id bigint,
     geom geometry(Point,4326),--Geometry Field
     country_code varchar(4),--Country Code
     elevation_ft numeric(7,2),--Elevation in Feet
     height_ft numeric(7,2),--Height in Feet
     obstacle_type varchar(50)--Obstacle Type
 )

 INSERT INTO point_table(point_id, geom, country_code, elevation_ft,
 height_ft, obstacle_type)
 VALUES
 (1, '0101000020E6100000E4141DC9E5934B40D235936FB6193940', 'ARE', 100, 50,
 'BUILDING'),
 (2, '0101000020E6100000C746205ED7934B40191C25AFCE193940', 'ARE', 100, 50,
 'BUILDING'),
 (3, '0101000020E6100000C780ECF5EE934B40B6BE4868CB193940', 'ARE', 100, 50,
 'BUILDING'),
 (4, '0101000020E6100000A97A358FA5AF4B4074A0C65B724C3940', 'ARE', 100, 50,
 'BUILDING'), -- this point is outside of the cluster distance (eps)
 (5, '0101000020E6100000ABB2EF8AE0934B404451A04FE4193940', 'ARE', 100, 50,
 'BUILDING')

 select ST_ClusterDBSCAN(geom, eps := 0.000906495804256269, minpoints := 4)
 OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft,
 obstacle_type)) as clst_id,
        point_id, geom, country_code, elevation_ft, height_ft,
 obstacle_type
 from point_table
 --where point_id != 4
 }}}


 Running clustering query agains all five points works fine. But once you
 exclude seemingly irrelevant point_id = 4 (which is outside of eps
 distance anyway) the clustering stops working (clst_id becomes null), even
 if still theoretically 4 needed points (according to the docs) are in
 place.

 Once I change the minpoints parameter to 3, clustering works fine for
 those 4 neighbor points.

 Can someone confirm my conclusion that ST_ClusterDBSCAN is not correct or
 give some good exmplanation for this behavior?

 The same issue I've posted on StackOverflow:
 https://stackoverflow.com/questions/66159186/st-clusterdbscan-function-
 and-minpoints-parameter-definition

-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/4853>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list