Continuing on clustering queries

Max Pyziur pyz at brama.com
Thu Mar 28 08:31:27 PDT 2024


Thank you again; results are starting to come in.

Apologies for starting a new thread; but the other one was getting messy.

Here are the queries that I am testing:
SELECT city, state, ST_ClusterDBScan( ST_Transform(geom,2163), eps := 
0.3048*5000, minpoints :=1) OVER () AS cluster_id
INTO TEMP aa_zz
FROM shapefiles.atp_filling_stations aa
WHERE aa.amenity = 'fuel'
AND aa.ctry_code = 'US' AND aa.geom IS NOT NULL;

SELECT cluster_id, COUNT(*) FROM aa_zz GROUP BY 1 ORDER BY 2 DESC LIMIT 
10;

SELECT state,  cluster_id, COUNT(*) FROM aa_zz GROUP BY 2, 1 ORDER BY 3 
DESC LIMIT 20;

SELECT city, state,  cluster_id, COUNT(*) FROM aa_zz GROUP BY 3, 2, 1 
ORDER BY 4 DESC LIMIT 20;


I receive the following results:
SELECT 76023
  cluster_id | count
------------+-------
        8467 |   422
        3888 |   371
        5701 |   249
        3351 |   177
        1111 |   141
        8352 |   120
        1610 |   107
        1812 |    90
        1824 |    86
       14232 |    81
(10 rows)

  state | cluster_id | count
-------+------------+-------
  MI    |       8467 |   422
  NY    |       3888 |   371
  IL    |       5701 |   245
  NY    |       3351 |   177
  CO    |       8352 |   110
  TX    |       1610 |   101
  NY    |       1824 |    86
  DC    |       1111 |    82
  IN    |      14748 |    78
  TX    |      14297 |    75
  MO    |       3661 |    74
  CA    |      11272 |    74
  MI    |        462 |    71
  TX    |      14232 |    70
  TX    |      19326 |    69
  TX    |       1812 |    68
  FL    |      13693 |    68
  TX    |      16824 |    68
  CA    |      14006 |    66
  IL    |       2727 |    64
(20 rows)

      city     | state | cluster_id | count
--------------+-------+------------+-------
  DETROIT      | MI    |       8467 |   147
  CHICAGO      | IL    |       5701 |   104
  Austin       | TX    |       1610 |   101
  Detroit      | MI    |       8467 |    85
  BROOKLYN     | NY    |       3888 |    77
  Chicago      | IL    |       5701 |    75
  Fresno       | CA    |      11272 |    73
  Brooklyn     | NY    |       3888 |    71
  WASHINGTON   | DC    |       1111 |    66
  HOUSTON      | TX    |      14297 |    64
  St. Louis    | MO    |       3661 |    63
  HOUSTON      | TX    |      14232 |    63
  DENVER       | CO    |       8352 |    54
  BRONX        | NY    |       3351 |    53
  LOS ANGELES  | CA    |      14006 |    50
  Bakersfield  | CA    |       1777 |    49
  INDIANAPOLIS | IN    |      14748 |    44
  Bronx        | NY    |       3351 |    42
  Amarillo     | TX    |        622 |    41
  Austin       | TX    |       1601 |    38
(20 rows)


Obviously, the case-based aspect of the city name can be cancelled with a 
function such as UPPER(city).

I also am generous in the radius dimension setting it at 5k feet.

But my next question is how do position the cluster_id on a map w/o 
referencing the city, and then tying that out to a geometry that I have 
stored for the city?

Thank you again,

Max


More information about the postgis-users mailing list