[postgis-users] Parallel spatial indexing for GiST?
Marco Boeringa
marco at boeringa.demon.nl
Sat Jan 23 02:39:26 PST 2021
Hi Darafei,
How does this relate to e.g. PostGIS with polygon or line data?
Faster indexing for point data only is only of limited value, there are
already reasonable alternatives for point type data, e.g. BRIN indexes.
However, for polygon and line data, GiST does still appear to be the
most efficient index for accessing the data in spatial lookups, and
tools like e.g. osm2pgsql default to them.
Is this commit just the "groundwork" layed out to implement something
like this in PostGIS, or does it require more work on the PostgreSQL
side itself before that is even conceivable?
Marco
Op 23-1-2021 om 11:30 schreef Darafei "Komяpa" Praliaskouski:
> Hi,
>
> Here it is, committed to PG14. Works for Postgres's internal `point`
> type now as far as I know. No PostGIS support yet since that's gonna
> be in almost a year and committed just a week ago.
> https://commitfest.postgresql.org/29/2276/
> <https://commitfest.postgresql.org/29/2276/>
>
> There is no parallel sort there though so you're welcome to
> improve that too.
>
> You're welcome to test and report your findings.
>
> On Sat, Jan 23, 2021 at 1:25 PM Marco Boeringa
> <marco at boeringa.demon.nl <mailto:marco at boeringa.demon.nl>> wrote:
>
> Hi all,
>
> A couple of months ago, I asked a question about GiST spatial
> indexing
> speed for (PostGIS) geometries, and the fact that the current
> implementation is slow and not capable of parallel indexing. GiST
> indexing is currently one of the major bottlenecks when dealing with
> massive spatial datasets like e.g. OpenStreetMap.
>
> Back then, a couple of posters indicated there was ongoing work to
> improve this, and that there might be improvements coming up for a
> future release of PostgreSQL, e.g. see the remarks of Alexander
> below as
> one example.
>
> Does anyone know what is the current status of all of this work? Is
> there any realistic chance of an implementation in the upcoming
> PostgreSQL 14? This GiST indexing performance issue appears to have a
> long history, but with the ever growing OpenStreetMap database, is
> becoming an increasing problem for PostGIS users. I think
> improvements
> in PG14 would be really welcomed by the PostGIS community.
>
> Marco Boeringa
>
> > Op 21-9-2020 om 18:39 schreef Peter Geoghegan:
> >> On Wed, Sep 16, 2020 at 3:17 PM Alexander Korotkov
> >> <aekorotkov at gmail.com <mailto:aekorotkov at gmail.com>> wrote:
> >>> In fact, GiST indexes are inferior to B-tree indexes for many
> reasons.
> >>> For instance, B-tree implements disk-ordered scan during
> VACUUM, while
> >>> GiST doesn't. B-tree could implement retail index tuple
> delete (which
> >>> will be very useful for pluggable table AMs), but for GiST retail
> >>> delete is problematic, because its performance is not
> guaranteed to be
> >>> log(N). B-tree implements deduplication, but GiST doesn't. And so
> >>> on. So, most of advances are applied to B-tree, while GiST is
> >>> retarded. That makes me think it's a good idea to reimplement
> GiST on
> >>> top of B-tree.
> >> A standard B-Tree with Z ordering is sometimes called a
> UB-Tree. This
> >> is described in "Modern B-Tree techniques", which promotes
> UB-Trees as
> >> having all of the advantages you list here. Though the book
> also says
> >> that it probably has somewhat inferior read performance to
> >> "specialized structures" (which probably refers to loosely ordered
> >> tree structures like GiST). This matches my own high level
> intuitions
> >> about it.
> >>
> >>> In short, my idea is to add to the B-tree ability to maintain
> "union
> >>> key" (MBR for spatial) in pivot tuple (including leftmost pivot
> >>> tuple), while navigation of insertions will remain
> comparison-based
> >>> (Z-ordered for spatial). Once we have union keys correctly
> >>> maintained, we can implement a special "union key" scan for
> B-tree,
> >>> which would work similar to the GiST scan. Maintaining union keys
> >>> could be tricky, because we will have to make sure no
> concurrent split
> >>> ruined our work on extension of parent union key before we've
> extended
> >>> the child union key. But nevertheless, I think it's solvable.
> >> That sounds hard. It sounds like you more or less want to make the
> >> structure a true UB-Tree for inserts -- a structure where you
> cannot
> >> allow there to be more than one place for any possible key in the
> >> index. But you also want to maintain a union key for selects,
> in order
> >> to make it possible to have a choice of subtree to insert into,
> just
> >> like GiST. Is it really possible to do both at the same time?
> It feels
> >> like a fundamental trade-off.
> >>
> >> I suspect that it would be helpful (if you just did a traditional
> >> UB-Tree, without the union key stuff) to add specialized suffix
> >> truncation logic, that picked a split point according to special
> >> criteria for the conditioned Z-order keys. That might limit the
> extent
> >> of the performance hit for reads, without the added complexity
> of the
> >> union key thing. This is just a guess.
> >>
> >> I don't have a particularly good sense of the costs here. It's
> obvious
> >> that the benefits would be very significant, but you don't need
> me to
> >> say that. The hard parts are 1.) determining the costs, and 2.)
> >> determining if the benefits are worth it.
> >>
> >>
> >> --
> >> Peter Geoghegan
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
> https://lists.osgeo.org/mailman/listinfo/postgis-users
> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>
>
>
> --
> Darafei "Komяpa" Praliaskouski
> OSM BY Team - http://openstreetmap.by/ <http://openstreetmap.by/>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210123/8d6c5b1d/attachment.html>
More information about the postgis-users
mailing list