Running clustering queries

Regina Obe lr at pcorp.us
Thu Mar 28 06:34:52 PDT 2024


> On Thu, 28 Mar 2024, Gary Turner wrote:
> 
> >
> > On 28/03/2024 12:32 pm, Max Pyziur wrote:
> >> On Wed, 27 Mar 2024, Regina Obe wrote:
> >>
> >>> That's the function I would have suggested.
> >>>
> >>> What query are you trying to run and what is the spatial reference
> >>> system id for your data.
> >>>
> >>> I'm guessing the issue might be your units of measure.  If you want
> >>> to do
> >>> 500 ft you'd be best using a meter or mile based spatial reference system.
> >>
> >> Here are two examples that I tried:
> >> SELECT city, state, ST_ClusterDBScan(geom, eps := 1000, minpoints :=
> >> 1) OVER () AS cluster FROM shapefiles.atp_filling_stations aa WHERE
> >> aa.amenity = 'fuel'
> >> AND aa.ctry_code = 'US' AND aa.geom IS NOT NULL;
> >>
> >>
> >> SELECT ST_ClusterKMeans(geom, 1000) OVER () as cluster_id FROM
> >> shapefiles.atp_filling_stations GROUP BY cluster_id;
> >
> > But what's the SRS of 'geom'?
> >
> > If it's say WGS84, which is a degree based SRS, 1000 will be 1000 degrees!
> >
> > I don't use archaic units, sorry, so I don' know if there are any foot
> > based spatial reference systems.
> 
> So, I create a table from this data that I scraped (about 80k U.S. filling
> stations). The lat and long fields are floats.
> 
> I then issue these two commands:
> ALTER TABLE shapefiles.atp_filling_stations ADD COLUMN geom
> geometry(Point, 4326);
> 
> UPDATE shapefiles.atp_filling_stations SET geom =
> ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
> 
> So the SRS is 4326, right?
> 
> Max

Depends where you got the data, but generally if they give you longitude latitude 
Columns, then 4326 is a safe assumption and looks like you created the geometry right too.

To Gary's point, 4326 is degree based, so no wonder you aren't getting any meaningful answers since 1000 would cover the whole world.

SELECT city, state, ST_ClusterDBScan(geom, eps := 1000, minpoints :=
> >> 1) OVER () AS cluster FROM shapefiles.atp_filling_stations aa WHERE
> >> aa.amenity = 'fuel'
> >> AND aa.ctry_code = 'US' AND aa.geom IS NOT NULL;

For US data, I have pretty good luck with US National Atlas Equal Area -- SRID=2163 which is a meter based coordinate system generally good enough for accurate measurement.
Web Mercator is another but that is really bad for measure preserving

But SRID=2163 is not common for display on maps etc, so I usually only use it for good enough measurement preserving or as a functional index, not for storage.

I think your dataset is small enough that you can get away with on the fly projection like below


SELECT city, state, ST_ClusterDBScan( ST_Transform(geom,2163), eps := 0.3048*500, minpoints :=1) OVER () AS cluster FROM shapefiles.atp_filling_stations aa 
	WHERE aa.amenity = 'fuel'
AND aa.ctry_code = 'US' AND aa.geom IS NOT NULL;





More information about the postgis-users mailing list