[postgis-users] Parallel spatial indexing for GiST?
Paul Ramsey
pramsey at cleverelephant.ca
Wed Sep 16 10:05:57 PDT 2020
> On Sep 16, 2020, at 7:35 AM, Marco Boeringa <marco at boeringa.demon.nl> wrote:
>
> Hi all,
>
> This is probably more of a PostgreSQL question than a PostGIS one, but I have wondered if there is actually any work going on in allowing PostgreSQL / PostGIS to build GiST type spatial indexes in parallel, and / or if this is even logically and technically feasible? According to the PostgreSQL documentation, only B-tree indexes can be indexed in parallel.
>
> With the ever growing size of spatial databases like OpenStreetMap, with tables running into the 100s of million records, spatial indexing using GiST is one of the major bottle necks in re-creating or reloading a spatial PostGIS database. The indexing process seems highly CPU bound, with negligible disk activity for the majority of the time the indexing process runs, hence being able to take advantage of multiple cores seems like a possible big win. Nonetheless, there seems little to no mention of such (future) option for GiST type indexing when searching on the internet for relevant information.
Marco,
I do not know if there is active work in the area of making GIST index builds faster, but I have heard discussions of various approaches from people much smarter than I, so I am sure there are potential areas of improvement available. The single-threaded performance of index build might be made faster with some bulk/batch handling of inserts, though how that interacts with the generic GIST API expectation of one-at-a-time insertion I do not know.
Probably the biggest hurdle is just that the number of size-constrained GIST data sets is much smaller than that of BTREE data sets, so it's a lower priority. Certainly the growth in OSM ubiquity is increasing the number of users with very large spatial databases they need to index though, so we can expect more pressure as time goes on.
ATB,
P
>
> I have no knowledge of the internals of PostgreSQL, so there may be technical limitations I am unaware of that make this impossible or technically very difficult to implement, but it would be nice to hear something about this even if it is not feasible.
>
> Yes, I know there are BRIN type spatial indexes for PostGIS, which are comparatively super fast to create and lead to very small indexes even for ultra large tables, but from the little information and personal experience I gathered, BRIN seems most suited for Point data only, and for static, not updated data, due to its requirement of clustered data for efficiency (actually not a problem in my particular case, since I don't do updates, but only reloads). The few times I tried to use it for large, spatially clustered, Polygon data sets, it seemed less efficient when accessing the data spatially in a GIS, with clearly longer display times, although I don't have real benchmarks for that.
>
> Most OpenStreetMap related tools like e.g. osm2pgsql also default to GiST, and probably with good reason.
>
> Marco
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list