<div dir="ltr">Hi, <div><br></div><div>Here it is, committed to PG14. Works for Postgres's internal `point` type now as far as I know. No PostGIS support yet since that's gonna be in almost a year and committed just a week ago.<br><a href="https://commitfest.postgresql.org/29/2276/">https://commitfest.postgresql.org/29/2276/</a></div><div><br></div><div>There is no parallel sort there though so you're welcome to improve that too.<br><br>You're welcome to test and report your findings.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jan 23, 2021 at 1:25 PM Marco Boeringa <<a href="mailto:marco@boeringa.demon.nl">marco@boeringa.demon.nl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>

<br>

A couple of months ago, I asked a question about GiST spatial indexing <br>

speed for (PostGIS) geometries, and the fact that the current <br>

implementation is slow and not capable of parallel indexing. GiST <br>

indexing is currently one of the major bottlenecks when dealing with <br>

massive spatial datasets like e.g. OpenStreetMap.<br>

<br>

Back then, a couple of posters indicated there was ongoing work to <br>

improve this, and that there might be improvements coming up for a <br>

future release of PostgreSQL, e.g. see the remarks of Alexander below as <br>

one example.<br>

<br>

Does anyone know what is the current status of all of this work? Is <br>

there any realistic chance of an implementation in the upcoming <br>

PostgreSQL 14? This GiST indexing performance issue appears to have a <br>

long history, but with the ever growing OpenStreetMap database, is <br>

becoming an increasing problem for PostGIS users. I think improvements <br>

in PG14 would be really welcomed by the PostGIS community.<br>

<br>

Marco Boeringa<br>

<br>

> Op 21-9-2020 om 18:39 schreef Peter Geoghegan:<br>

>> On Wed, Sep 16, 2020 at 3:17 PM Alexander Korotkov <br>

>> <<a href="mailto:aekorotkov@gmail.com" target="_blank">aekorotkov@gmail.com</a>> wrote:<br>

>>> In fact, GiST indexes are inferior to B-tree indexes for many reasons.<br>

>>> For instance, B-tree implements disk-ordered scan during VACUUM, while<br>

>>> GiST doesn't.  B-tree could implement retail index tuple delete (which<br>

>>> will be very useful for pluggable table AMs), but for GiST retail<br>

>>> delete is problematic, because its performance is not guaranteed to be<br>

>>> log(N).  B-tree implements deduplication, but GiST doesn't. And so<br>

>>> on.  So, most of advances are applied to B-tree, while GiST is<br>

>>> retarded.  That makes me think it's a good idea to reimplement GiST on<br>

>>> top of B-tree.<br>

>> A standard B-Tree with Z ordering is sometimes called a UB-Tree. This<br>

>> is described in "Modern B-Tree techniques", which promotes UB-Trees as<br>

>> having all of the advantages you list here. Though the book also says<br>

>> that it probably has somewhat inferior read performance to<br>

>> "specialized structures" (which probably refers to loosely ordered<br>

>> tree structures like GiST). This matches my own high level intuitions<br>

>> about it.<br>

>><br>

>>> In short, my idea is to add to the B-tree ability to maintain "union<br>

>>> key" (MBR for spatial) in pivot tuple (including leftmost pivot<br>

>>> tuple), while navigation of insertions will remain comparison-based<br>

>>> (Z-ordered for spatial).  Once we have union keys correctly<br>

>>> maintained, we can implement a special "union key" scan for B-tree,<br>

>>> which would work similar to the GiST scan. Maintaining union keys<br>

>>> could be tricky, because we will have to make sure no concurrent split<br>

>>> ruined our work on extension of parent union key before we've extended<br>

>>> the child union key.  But nevertheless, I think it's solvable.<br>

>> That sounds hard. It sounds like you more or less want to make the<br>

>> structure a true UB-Tree for inserts -- a structure where you cannot<br>

>> allow there to be more than one place for any possible key in the<br>

>> index. But you also want to maintain a union key for selects, in order<br>

>> to make it possible to have a choice of subtree to insert into, just<br>

>> like GiST. Is it really possible to do both at the same time? It feels<br>

>> like a fundamental trade-off.<br>

>><br>

>> I suspect that it would be helpful (if you just did a traditional<br>

>> UB-Tree, without the union key stuff) to add specialized suffix<br>

>> truncation logic, that picked a split point according to special<br>

>> criteria for the conditioned Z-order keys. That might limit the extent<br>

>> of the performance hit for reads, without the added complexity of the<br>

>> union key thing. This is just a guess.<br>

>><br>

>> I don't have a particularly good sense of the costs here. It's obvious<br>

>> that the benefits would be very significant, but you don't need me to<br>

>> say that. The hard parts are 1.) determining the costs, and 2.)<br>

>> determining if the benefits are worth it.<br>

>><br>

>><br>

>> -- <br>

>> Peter Geoghegan<br>

> _______________________________________________<br>

> postgis-users mailing list<br>

> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>

> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>

_______________________________________________<br>

postgis-users mailing list<br>

<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>

<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>

</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Darafei "Komяpa" Praliaskouski<br>OSM BY Team - <a href="http://openstreetmap.by/" target="_blank">http://openstreetmap.by/</a><br></div></div>