[postgis-users] Parallel spatial indexing for GiST?

Wed Sep 16 11:29:06 PDT 2020

Hi Giuseppe,

Thanks for your insights regarding BRIN.

I actually do employ BRIN, but only for Point type geometry, where I 
have the (subjective) feeling that the performance is least degraded 
compared to GiST. I also set the 'pages_per_range' parameter to a much 
smaller value than the default. Even with small values for this 
parameter, the size and creation times of the resulting index is nothing 
compared to GiST.

For Polygon data, the few times I tried using a BRIN type spatial index, 
I had the feeling it was probably some 3-4 times slower than GiST in 
terms of display times in a GIS, but these aren't hard figures, because 
I did not really time it. I also had the feeling that there was 
considerably more disk activity needed to access the relevant geometries.

The data is from osm2pgsql, that initially spatially sorts the data 
using the default PostGIS spatial sorting / clustering using Hilbert 
curve. This should be efficient. I derive tables from that, some of 
which are additionally being spatially clustered depending on the 
processing they have had (for those tables I actually also need to 
create GiST type spatial indexes, as the PostgreSQL CLUSTER command 
cannot use BRIN as input for spatial clustering, due to the nature of 
the index, it will fail with an error / warning about this when you 
attempt it, and CLUSTER needs a (spatial) index as input).

osm2pgsql itself already seems to optimize indexing, in the sense that 
it launches multiple index processes against different tables in 
parallel. This is a kind of "parallel indexing", but not against a 
single table / spatial column. For the processes I developed myself, 
this is not feasible though, and I would benefit of having parallel GiST 
index creation for a single geometry column.

Marco

Op 16-9-2020 om 18:55 schreef Giuseppe Broccolo:
> Hi Marco,
>
> Il giorno mer 16 set 2020 alle ore 15:35 Marco Boeringa 
> <marco at boeringa.demon.nl <mailto:marco at boeringa.demon.nl>> ha scritto:
>
>     [...]
>     Yes, I know there are BRIN type spatial indexes for PostGIS, which
>     are
>     comparatively super fast to create and lead to very small indexes
>     even
>     for ultra large tables, but from the little information and personal
>     experience I gathered, BRIN seems most suited for Point data only,
>     and
>     for static, not updated data, due to its requirement of clustered
>     data
>     for efficiency (actually not a problem in my particular case, since I
>     don't do updates, but only reloads). The few times I tried to use
>     it for
>     large, spatially clustered, Polygon data sets, it seemed less
>     efficient
>     when accessing the data spatially in a GIS, with clearly longer
>     display
>     times, although I don't have real benchmarks for that.
>
>     Most OpenStreetMap related tools like e.g. osm2pgsql also default to
>     GiST, and probably with good reason.
>
>
> About BRIN in PostGIS: it internally works using bounding boxes of 
> geometries,
> as GiST, so in principle you can use this index for any geometry type, 
> and as
> far as you use intersect, contains, is_contained operators for 2D 
> geometries and
> intersects for 3D ones in your geospatial queries.
>
> You are right when you say that BRIN is more suitable for "static" 
> data, because
> of how it internally works - creating a sort of summary of which range 
> of tuples are
> included in the data pages physically stored, just to use a few words. 
> New entries
> added during INSERTs or UPDATEs are properly summarised in BRINs as 
> far as
> the new indexed values/geometries are included in ranges/bounding 
> boxes already
> present in the index: in case new pages are created with data which 
> does not fall
> within the last summarized range, the new ranges are not automatically 
> acquired
> in the summary, and the related tuples remain unsummarized until a new 
> summarization
> is invoked, automatically through a VACUUM or manually through 
> |brin_summarize_range |
> or|brin_summarize_new_valuesfunctions. This allows some maintenance of 
> the
> |
> |index even with non static data, of course with some limitation 
> compared to GiST.
> |
> |
> |
> |About the performance: being a range index it surely performs worse 
> compared
> |
> |to Rtree indexes like GiST. How much worse depends from several factors:
>
> |
> |1) how the data pages are physically stored: ranges are as more 
> effective as possible
> |
> |as far as spatially close geometries are adjacently stored even in 
> physical pages the
> |
> |storage, so the initial import of spatial data should need to be done 
> following some|
> |sorting criteria|
> |
> |
> |2) BRIN granularity: performance starts to be closer to an Rtree one 
> as far as the size|
> |of the block range is small. This can be configured during index 
> creation with the|
> |parameter |pages_per_range|, i.e. how many pages are summarised per 
> range.|
> |Of course, the smaller the number, the larger is the resulting BRIN 
> and more time|
> |is needed for the creation|
> |
> |
> |GiSTs remain faster even with 2), but I'd suggest checking how the 
> data was originally|
> |imported into the geospatial DB in order to be sure you could benefit 
> as much as possible|
> |from a range index.
> |
> |
> |
> |Hope it helps,|
> |Giuseppe.
> |
> |
> |
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200916/559784d5/attachment.html>