[postgis-users] improve query - summarize building count and area on each parcel

karsten karsten at terragis.net
Mon Mar 27 16:56:13 PDT 2023


> >>> Okay that is old version,  
Ha for me PostgreSQL 12.8 + POSTGIS 3.1.4 this is the newest version any of
my servers is running ;)
 
>>> but I think the fixes were done like 6 or 7 years ago such as detailed
here:
https://www.postgresql.org/message-id/011f01c61653$96c509f0$8300a8c0@trideca
p.com

 ok thanks

>>> As far as if a inner join is always better than a left join, I would say
no.
For example here is one that showed at least for this use case, a LEFT JOIN
was faster:
https://towardsdatascience.com/how-we-optimized-postgresql-queries-100x-ff52
555eabe

>>> And here where it shows a INNER + UNION ALL was faster than a LEFT
https://www.crunchydata.com/blog/postgres-query-optimization-left-join-vs-un
ion-all

>>> Since you have to update every single parcel anyway, I am curious how
something like this would fair:
UPDATE parcels p SET
(building_count, building_fp) = 
(select count(*) as num, sum(st_area(b.geom)) as sumarea FROM buildings b on
st_intersects(p.geom,st_centroid(b.geom)  ) ;
>>>
 
Will check and report back
 
Karsten

 

 

 

 

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf
Of karsten
Sent: Monday, March 27, 2023 6:24 PM
To: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] improve query - summarize building count and
area on each parcel

 

Regina,

 

 >>>>  So you had the centroids indexed before you switched to regular join
and that took days too?   I would expect the index on the centroid to make
much more of a difference than the LEFT / INNER JOIN change. 

No at that time I had not the centroids indexed just the polygon boundaries
themselves  ...

 

 >>  I think the LEFT / INNER JOIN performance would have a lot to do with
number of rows in each table and whether it resorts to a nested vs. hash. 
There have been performance issues in PostgreSQL that cause LEFT to be much
slower than INNER, but I think most of those have been long resolved.

It might be some left over stuff with our poor spatial stats estimates.  >>>

Interesting

 

>> SELECT version(), postgis_full_version();

shows:

PostgreSQL 12.8 (Ubuntu 12.8-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit |
POSTGIS="3.1.4 ded6c34" [EXTENSION] PGSQL="120" GEOS="3.10.1-CAPI-1.16.0"
PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3"
WAGYU="0.5.0 (Internal)"

 

Cheers

Karsten


 

 

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf
Of karsten
Sent: Monday, March 27, 2023 5:16 PM
To: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] improve query - summarize building count and
area on each parcel

 

That worked perfectly:
Using just join and setting the index on the centroids geom as well

I had the query finish in less than a minute (compared to three day running
and not yet finished !)

 

Does anyone have an explanation why 'regular' join is so much faster than
'left join' ?

 

Cheers

Karsten

 

________________________________

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf
Of Marcin Mionskowski
Sent: Sunday, March 26, 2023 23:34
To: PostGIS Users Discussion
Subject: Re: [postgis-users] improve query - summarize building count and
area on each parcel

In my "explain tests", made on much smaller tables, I get 500 times smaller
costs only by changing LEFT JOIN to "regular" JOIN.

 

pon., 27 mar 2023 o 01:07 Raj Talati <rajtalati at gmail.com> napisał(a):

	I prefer doing insert instead of update if we doing all rows update
:) so create New table with new table and copy from old and later drop it
you can go without index and do index later after copy. Postgres update is
delete and insert internally and direct big update will bloat table.

	Other thing you can do multi thread if you have 16 core with nothing
else can go 12-14 threads updates which can do portion of your big table
easy if no good criteria can be mod function on PK 

	 

	Hope this helps .

	 

	Raj T

	 

	On Sun, Mar 26, 2023 at 6:54 PM karsten <karsten at terragis.net>
wrote:

		Hi All,

		 

		I have two tables,  a parcel (polygon) table and a building
(polygon) table in a PostGIS db. 
		There are about 1 million parcels and over a million
buildings in these tables ...

		I have set gist indexes on both and clustered each on a
geohash, pkey is set on objectid.

		I would like to fill 2 additional columns for the parcels
layer to have the number of building on each parcel (building_count) and the
summed up area of all those building footprints (building_fp).

		Below is an update query but it is extremely lengthy and
inefficient (on a beefy 16 core Ubuntu server with 128 GB tons of RAM) : I
had to stop an after running 3 days when it was not finished ...

		I am wondering if anyone has an idea how to make it more
efficient ?

		 

		This is the query:

		 

		UPDATE parcels p SET
		building_count = subquery.num,
		building_fp = subquery.sumarea
		from
		(select objectid, count(*) as num, sum(st_area(b.geom)) as
sumarea FROM parcels p
		left join buildings b on
st_intersects(p.geom,st_centroid(b.geom)) group by objectid ) subquery
		where p.objectid = subquery.objectid ;

		 

		Any suggestions ?

		 

		Cheers

		Karsten Vennemann

		_______________________________________________
		postgis-users mailing list
		postgis-users at lists.osgeo.org
		https://lists.osgeo.org/mailman/listinfo/postgis-users

	_______________________________________________
	postgis-users mailing list
	postgis-users at lists.osgeo.org
	https://lists.osgeo.org/mailman/listinfo/postgis-users




More information about the postgis-users mailing list