[postgis-users] Parallel spatial indexing for GiST?

Marco Boeringa marco at boeringa.demon.nl
Wed Sep 16 12:54:21 PDT 2020


Hi Paul,

Appreciate your insights. Good to hear there appear to be opportunities
for improving GiST index build speed in the future, even if no active
work is being done right now. Yes, I do think a lot of people, and an
increasing number, could benefit from such work. I would certainly
applaud any improvements, especially since disk speed is clearly not the
bottleneck for most of the processing involved and is therefore unlikely
to become limiting once index creation gets faster.

Marco

On 16-9-2020 at 19:05, Paul Ramsey wrote:
>
>> On Sep 16, 2020, at 7:35 AM, Marco Boeringa <marco at boeringa.demon.nl> wrote:
>>
>> Hi all,
>>
>> This is probably more of a PostgreSQL question than a PostGIS one, but I have wondered whether any work is going on to allow PostgreSQL / PostGIS to build GiST-type spatial indexes in parallel, and whether this is even logically and technically feasible. According to the PostgreSQL documentation, only B-tree indexes can be built in parallel.
>>
>> With the ever-growing size of spatial databases like OpenStreetMap, with tables running into the hundreds of millions of records, spatial indexing using GiST is one of the major bottlenecks in re-creating or reloading a spatial PostGIS database. The indexing process seems highly CPU bound, with negligible disk activity for the majority of the time it runs, so being able to take advantage of multiple cores seems like a potentially big win. Nonetheless, there seems to be little to no mention of such a (future) option for GiST-type indexing when searching the internet for relevant information.
> Marco,
> I do not know if there is active work in the area of making GIST index builds faster, but I have heard discussions of various approaches from people much smarter than I, so I am sure there are potential areas of improvement available. The single-threaded performance of index build might be made faster with some bulk/batch handling of inserts, though how that interacts with the generic GIST API expectation of one-at-a-time insertion I do not know.
> Probably the biggest hurdle is just that the number of size-constrained GIST data sets is much smaller than that of BTREE data sets, so it's a lower priority. Certainly the growth in OSM ubiquity is increasing the number of users with very large spatial databases they need to index though, so we can expect more pressure as time goes on.
> ATB,
> P
>

