[postgis-users] Slow construction of GiST index, but better with smaller # of big rows

Tue Jan 15 13:59:15 PST 2019

In addition to the aforementioned solutions and recommendations (this is a fun thread), if your geometry are points than you could leverage the pointcloud extension which would do a bit of the clustering for you by storing the points in patches.

[http://sig.cmparks.net/cmp-ms-90x122.png]Stephen V. Mather
GIS Manager
(216) 635-3243<tel:(216)%20635-3243> (Work)
--sent from phone--

On Jan 15, 2019, at 16:45, Felix Kunde <felix-kunde at gmx.de<mailto:felix-kunde at gmx.de>> wrote:

As already mentioned by others, the more rows the longer it takes to build the index.

In preparation for a conference talk, I'm currently testing the performance of different spatial indexes that PostGIS offers: GiST, sp-GiST and BRIN. I'm also testing with 1 billion randomly generated points, lines and polygons and it took me around 6 hours to build the GiST index on my laptop. sp-GIST was usually twice as fast.

So far, I can say that GiST tends to slightly faster for large data sets than sp-GIST and often twice as fast as BRIN. The big advantage of BRIN imo is that it's fast to build (6hrs vs. 3,5 min) and requires hardly any disk space (50 GB vs. 3,6 MB) due to its simplicity. Just give it a try and see if the executions times are fine for you. Note, that it can be necessary to execute SET enable_seqscan = false before. In my case, only then did the query planner recognize my BRIN index.

For optimal performance your spatial can be clustered on disk. For this, I created a functional btree index transforming the geometries with ST_GeoHash. Btree indexes can be created in parallel with the latest Postgres version (took me 30min). Unfortunately, the CLUSTER command then takes a long time (8hrs on my machine). In the end my test query (a simple containment test) took around 250ms with BRIN (and 220ms with GIST and sp-GIST).

lg Felix

Gesendet: Sonntag, 13. Januar 2019 um 18:38 Uhr
Von: "Giuseppe Broccolo" <g.broccolo.7 at gmail.com<mailto:g.broccolo.7 at gmail.com>>
An: "PostGIS Users Discussion" <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Betreff: Re: [postgis-users] Slow construction of GiST index, but better with smaller # of big rows
Hi Wembo,

Il giorno sab 12 gen 2019 alle ore 16:29 Wenbo Tao <taowenbo1993 at gmail.com<mailto:taowenbo1993 at gmail.com>> ha scritto:
Hello,

    I was trying to build a GiST index on a geometry column in a table with 1 billion rows. It took an entire week to finish.

    Then I reduced the number of rows by grouping closer objects into one clump (using some clustering algorithm), and then compressed the clump as one row (the geometry column becomes the bounding box of all objects in that clump). The construction then went way faster -- down to 12 hours. I did this because the query I need to answer is finding all objects whose bbox intersects with a given rectangle. I can now query all clumps whose bbox intersects with the rectangle.

   So essentially, the index construction is slow for too many rows, but much faster for a smaller # of bigger rows. Any intuition why this is the case would be greatly appreciated!

Well, building GiST indexes requires an execution time that grows linearly with the size of the dataset (~O(N)). Of course, also hardware (CPU, storage, ...) impacts the build. So long execution times for one billion rows sound reasonable.

Your solution could be fine: you cluster close objects and index the obtained rows, than you can retrieve the clusters themselves and finally find the exact match. Of course, it is not an "elegant" solution.

You already had the suggestion to partition your table, and then index the single partitions, that could be completely fine.

A second suggestion I would like to give you, is to consider BRIN indexing, thought specifically for large datasets:

https://postgis.net/docs/using_postgis_dbmanagement.html#brin_indexes

Of course, there are some limitations with this index, so I invite you to read the linked documentation and consider your specific use case. But for intersections between bbox (and your case looks to be the case), BRINs could be a really good solution.

Hope this can help,
Giuseppe.
_______________________________________________ postgis-users mailing list postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org> https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>
https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20190115/b503313e/attachment.html>