[postgis-users] Parallel spatial indexing for GiST?

Sat Jan 23 02:30:40 PST 2021

Hi,

Here it is, committed to PG14. Works for Postgres's internal `point` type
now as far as I know. No PostGIS support yet since that's gonna be in
almost a year and committed just a week ago.
https://commitfest.postgresql.org/29/2276/

There is no parallel sort there though so you're welcome to improve that
too.

You're welcome to test and report your findings.

On Sat, Jan 23, 2021 at 1:25 PM Marco Boeringa <marco at boeringa.demon.nl>
wrote:

> Hi all,
>
> A couple of months ago, I asked a question about GiST spatial indexing
> speed for (PostGIS) geometries, and the fact that the current
> implementation is slow and not capable of parallel indexing. GiST
> indexing is currently one of the major bottlenecks when dealing with
> massive spatial datasets like e.g. OpenStreetMap.
>
> Back then, a couple of posters indicated there was ongoing work to
> improve this, and that there might be improvements coming up for a
> future release of PostgreSQL, e.g. see the remarks of Alexander below as
> one example.
>
> Does anyone know what is the current status of all of this work? Is
> there any realistic chance of an implementation in the upcoming
> PostgreSQL 14? This GiST indexing performance issue appears to have a
> long history, but with the ever growing OpenStreetMap database, is
> becoming an increasing problem for PostGIS users. I think improvements
> in PG14 would be really welcomed by the PostGIS community.
>
> Marco Boeringa
>
> > Op 21-9-2020 om 18:39 schreef Peter Geoghegan:
> >> On Wed, Sep 16, 2020 at 3:17 PM Alexander Korotkov
> >> <aekorotkov at gmail.com> wrote:
> >>> In fact, GiST indexes are inferior to B-tree indexes for many reasons.
> >>> For instance, B-tree implements disk-ordered scan during VACUUM, while
> >>> GiST doesn't.  B-tree could implement retail index tuple delete (which
> >>> will be very useful for pluggable table AMs), but for GiST retail
> >>> delete is problematic, because its performance is not guaranteed to be
> >>> log(N).  B-tree implements deduplication, but GiST doesn't. And so
> >>> on.  So, most of advances are applied to B-tree, while GiST is
> >>> retarded.  That makes me think it's a good idea to reimplement GiST on
> >>> top of B-tree.
> >> A standard B-Tree with Z ordering is sometimes called a UB-Tree. This
> >> is described in "Modern B-Tree techniques", which promotes UB-Trees as
> >> having all of the advantages you list here. Though the book also says
> >> that it probably has somewhat inferior read performance to
> >> "specialized structures" (which probably refers to loosely ordered
> >> tree structures like GiST). This matches my own high level intuitions
> >> about it.
> >>
> >>> In short, my idea is to add to the B-tree ability to maintain "union
> >>> key" (MBR for spatial) in pivot tuple (including leftmost pivot
> >>> tuple), while navigation of insertions will remain comparison-based
> >>> (Z-ordered for spatial).  Once we have union keys correctly
> >>> maintained, we can implement a special "union key" scan for B-tree,
> >>> which would work similar to the GiST scan. Maintaining union keys
> >>> could be tricky, because we will have to make sure no concurrent split
> >>> ruined our work on extension of parent union key before we've extended
> >>> the child union key.  But nevertheless, I think it's solvable.
> >> That sounds hard. It sounds like you more or less want to make the
> >> structure a true UB-Tree for inserts -- a structure where you cannot
> >> allow there to be more than one place for any possible key in the
> >> index. But you also want to maintain a union key for selects, in order
> >> to make it possible to have a choice of subtree to insert into, just
> >> like GiST. Is it really possible to do both at the same time? It feels
> >> like a fundamental trade-off.
> >>
> >> I suspect that it would be helpful (if you just did a traditional
> >> UB-Tree, without the union key stuff) to add specialized suffix
> >> truncation logic, that picked a split point according to special
> >> criteria for the conditioned Z-order keys. That might limit the extent
> >> of the performance hit for reads, without the added complexity of the
> >> union key thing. This is just a guess.
> >>
> >> I don't have a particularly good sense of the costs here. It's obvious
> >> that the benefits would be very significant, but you don't need me to
> >> say that. The hard parts are 1.) determining the costs, and 2.)
> >> determining if the benefits are worth it.
> >>
> >>
> >> --
> >> Peter Geoghegan
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>

-- 
Darafei "Komяpa" Praliaskouski
OSM BY Team - http://openstreetmap.by/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210123/4ed8f57e/attachment.html>