[postgis-users] Parallel spatial indexing for GiST?

Sat Sep 26 23:24:52 PDT 2020

Peter, Paul and Alexander,

One last question: is there anything actionable by me regarding this, 
e.g. opening a ticket on the PostgreSQL issue tracker?

That is as much as I can do, because I have zero experience with 
PostgreSQL development, but if you want me to open an issue describing 
the performance issues and benefits to the PostGIS community of having 
much faster GiST indexing and referencing this mailing list thread, that 
is something I could potentially do. Probably not a whole lot of value 
in it, but at least it would document the enhancement request.

Marco

Op 21-9-2020 om 18:39 schreef Peter Geoghegan:
> On Wed, Sep 16, 2020 at 3:17 PM Alexander Korotkov <aekorotkov at gmail.com> wrote:
>> In fact, GiST indexes are inferior to B-tree indexes for many reasons.
>> For instance, B-tree implements disk-ordered scan during VACUUM, while
>> GiST doesn't.  B-tree could implement retail index tuple delete (which
>> will be very useful for pluggable table AMs), but for GiST retail
>> delete is problematic, because its performance is not guaranteed to be
>> log(N).  B-tree implements deduplication, but GiST doesn't.  And so
>> on.  So, most of advances are applied to B-tree, while GiST is
>> retarded.  That makes me think it's a good idea to reimplement GiST on
>> top of B-tree.
> A standard B-Tree with Z ordering is sometimes called a UB-Tree. This
> is described in "Modern B-Tree techniques", which promotes UB-Trees as
> having all of the advantages you list here. Though the book also says
> that it probably has somewhat inferior read performance to
> "specialized structures" (which probably refers to loosely ordered
> tree structures like GiST). This matches my own high level intuitions
> about it.
>
>> In short, my idea is to add to the B-tree ability to maintain "union
>> key" (MBR for spatial) in pivot tuple (including leftmost pivot
>> tuple), while navigation of insertions will remain comparison-based
>> (Z-ordered for spatial).  Once we have union keys correctly
>> maintained, we can implement a special "union key" scan for B-tree,
>> which would work similar to the GiST scan. Maintaining union keys
>> could be tricky, because we will have to make sure no concurrent split
>> ruined our work on extension of parent union key before we've extended
>> the child union key.  But nevertheless, I think it's solvable.
> That sounds hard. It sounds like you more or less want to make the
> structure a true UB-Tree for inserts -- a structure where you cannot
> allow there to be more than one place for any possible key in the
> index. But you also want to maintain a union key for selects, in order
> to make it possible to have a choice of subtree to insert into, just
> like GiST. Is it really possible to do both at the same time? It feels
> like a fundamental trade-off.
>
> I suspect that it would be helpful (if you just did a traditional
> UB-Tree, without the union key stuff) to add specialized suffix
> truncation logic, that picked a split point according to special
> criteria for the conditioned Z-order keys. That might limit the extent
> of the performance hit for reads, without the added complexity of the
> union key thing. This is just a guess.
>
> I don't have a particularly good sense of the costs here. It's obvious
> that the benefits would be very significant, but you don't need me to
> say that. The hard parts are 1.) determining the costs, and 2.)
> determining if the benefits are worth it.
>
>
> --
> Peter Geoghegan