[postgis-users] Merging census block data in postgresql

Paul Ramsey pramsey at cleverelephant.ca
Mon May 11 13:54:00 PDT 2020


Check for a GIST index on the geometry column of the address points in particular, but also the blocks.
If missing,

CREATE INDEX blocks_geom_x ON blocks USING GIST(geom);
CREATE INDEX addresses_geom_x ON addresses USING GIST(geom);

What SQL you run will depend to some extent on what you are trying to accomplish with the query. I'll assume you just want to make an address/block lookup table.

CREATE table addresses_blocks (
  address_id bigint,
  block_id bigint
);

INSERT INTO addresses_blocks (address_id, block_id) 
  SELECT a.address_id, b.block_id
  FROM addresses a
  JOIN blocks b ON ST_Intersects(b.geom, a.geom);

Then just wait. Depending on your version of PgSQL and PostGIS this may or may not result in a parallel plan. Use EXPLAIN on the query before running it for real to see if you're getting multiple workers.

If you aren't getting multiple workers, you can get parallel behaviour by establishing multiple connections to the database and running independent queries in each that work on separate pieces of the problem.

INSERT INTO addresses_blocks (address_id, block_id) 
  SELECT a.address_id, b.block_id
  FROM addresses a
  JOIN blocks b ON ST_Intersects(b.geom, a.geom);
  WHERE b.state = 'NY';

INSERT INTO addresses_blocks (address_id, block_id) 
  SELECT a.address_id, b.block_id
  FROM addresses a
  JOIN blocks b ON ST_Intersects(b.geom, a.geom);
  WHERE b.state = 'CA';

Etc...
Depending on your platform, you might find that GNU parallel is a useful tool for spawning a set of workers that chew through a task file and run all the jobs you want in an optimally parallel way (ie, 4 cpus => 4 workers).

ATB,

P

> On May 11, 2020, at 1:27 PM, Tsering W. Shawa <shawatw at princeton.edu> wrote:
> 
> 
> I have to run a spatial join between millions of addresses and the census block boundaries covering whole United States. I can do this function in ArcGIS Pro by merging all the census blocks of each state to one File geodatabase and then run a spatial join tool. Many of you might know that the census block boundaries for the whole United States is over 11 GB and therefore too large for a shapefile. 
> 
> I am learning PostGIS and wanted to see how fast spatial join function works in PostGIS compare to ArcGIS Pro or QGIS. I have already imported all the census block boundaries in postgresql and plan to import geocoded addresses also in the postgresql database. 
> 
> My question is...what sql function should I use to merge them together? Do I need to create a new database or table that has all the census block boundaries merged together? Any spatial index I need to use to run it faster?  
> 
> Any workflow or suggestion or example of sql will be appreciated.
> 
> Many thanks in advance.
> 
> -Tsering
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users



More information about the postgis-users mailing list