[postgis-users] Slow construction of GiST index, but better with smaller # of big rows

Giuseppe Broccolo g.broccolo.7 at gmail.com
Sun Jan 13 09:38:25 PST 2019


Hi Wembo,

Il giorno sab 12 gen 2019 alle ore 16:29 Wenbo Tao <taowenbo1993 at gmail.com>
ha scritto:

> Hello,
>
>     I was trying to build a GiST index on a geometry column in a table
> with 1 billion rows. It took an entire week to finish.
>
>     Then I reduced the number of rows by grouping closer objects into one
> clump (using some clustering algorithm), and then compressed the clump as
> one row (the geometry column becomes the bounding box of all objects in
> that clump). The construction then went way faster -- down to 12 hours. I
> did this because the query I need to answer is finding all objects whose
> bbox intersects with a given rectangle. I can now query all clumps whose
> bbox intersects with the rectangle.
>
>    So essentially, the index construction is slow for too many rows, but
> much faster for a smaller # of bigger rows. Any intuition why this is the
> case would be greatly appreciated!
>

Well, building GiST indexes requires an execution time that grows linearly
with the size of the dataset (~O(N)). Of course, also hardware (CPU,
storage, ...) impacts the build. So long execution times for one billion
rows sound reasonable.

Your solution could be fine: you cluster close objects and index the
obtained rows, than you can retrieve the clusters themselves and finally
find the exact match. Of course, it is not an "elegant" solution.

You already had the suggestion to partition your table, and then index the
single partitions, that could be completely fine.

A second suggestion I would like to give you, is to consider BRIN indexing,
thought specifically for large datasets:

https://postgis.net/docs/using_postgis_dbmanagement.html#brin_indexes

Of course, there are some limitations with this index, so I invite you to
read the linked documentation and consider your specific use case. But for
intersections between bbox (and your case looks to be the case), BRINs
could be a really good solution.

Hope this can help,
Giuseppe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20190113/e8d5f9b6/attachment.html>


More information about the postgis-users mailing list