[postgis-users] Parallel spatial indexing for GiST?

Wed Sep 16 09:55:25 PDT 2020

Hi Marco,

Il giorno mer 16 set 2020 alle ore 15:35 Marco Boeringa <
marco at boeringa.demon.nl> ha scritto:

> [...]
> Yes, I know there are BRIN type spatial indexes for PostGIS, which are
> comparatively super fast to create and lead to very small indexes even
> for ultra large tables, but from the little information and personal
> experience I gathered, BRIN seems most suited for Point data only, and
> for static, not updated data, due to its requirement of clustered data
> for efficiency (actually not a problem in my particular case, since I
> don't do updates, but only reloads). The few times I tried to use it for
> large, spatially clustered, Polygon data sets, it seemed less efficient
> when accessing the data spatially in a GIS, with clearly longer display
> times, although I don't have real benchmarks for that.
>
> Most OpenStreetMap related tools like e.g. osm2pgsql also default to
> GiST, and probably with good reason.
>

About BRIN in PostGIS: it internally works using bounding boxes of
geometries,
as GiST, so in principle you can use this index for any geometry type, and
as
far as you use intersect, contains, is_contained operators for 2D
geometries and
intersects for 3D ones in your geospatial queries.

You are right when you say that BRIN is more suitable for "static" data,
because
of how it internally works - creating a sort of summary of which range of
tuples are
included in the data pages physically stored, just to use a few words. New
entries
added during INSERTs or UPDATEs are properly summarised in BRINs as far as
the new indexed values/geometries are included in ranges/bounding boxes
already
present in the index: in case new pages are created with data which does
not fall
within the last summarized range, the new ranges are not automatically
acquired
in the summary, and the related tuples remain unsummarized until a new
summarization
is invoked, automatically through a VACUUM or manually through
brin_summarize_range

or brin_summarize_new_values functions. This allows some maintenance of the
index even with non static data, of course with some limitation compared to
GiST.

About the performance: being a range index it surely performs worse
compared
to Rtree indexes like GiST. How much worse depends from several factors:

1) how the data pages are physically stored: ranges are as more effective
as possible
as far as spatially close geometries are adjacently stored even in physical
pages the
storage, so the initial import of spatial data should need to be done
following some
sorting criteria

2) BRIN granularity: performance starts to be closer to an Rtree one as far
as the size
of the block range is small. This can be configured during index creation
with the
parameter pages_per_range, i.e. how many pages are summarised per range.
Of course, the smaller the number, the larger is the resulting BRIN and
more time
is needed for the creation

GiSTs remain faster even with 2), but I'd suggest checking how the data was
originally
imported into the geospatial DB in order to be sure you could benefit as
much as possible
from a range index.

Hope it helps,
Giuseppe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200916/984fa53c/attachment.html>