<div dir="ltr"><div dir="ltr"><div dir="ltr">As others said, index build time grows with table size (actually more like O(N log N) than strictly proportional). So if the number of rows is reduced, the index build time will decrease.<div><br></div><div>In a recent blog post Paul listed some ideas about how index build performance could be improved via parallelization [1]. But that will require some changes in Postgres. (Although given the recent flurry of parallelization enhancements, maybe it won't be long now.)</div><div><br></div><div>It would be nifty if GiST trees could be packed/bulk-loaded using one of the R-tree packing approaches (STR-tree or Hilbert) [2]. That should be faster than one-by-one insertion, although it might not be as amenable to parallelization.</div><div><br></div><div>[1] <a href="http://blog.cleverelephant.ca/2018/10/postgis-sprint-2.html" target="_blank">http://blog.cleverelephant.ca/2018/10/postgis-sprint-2.html</a></div><div>[2] <a href="https://en.wikipedia.org/wiki/R-tree">https://en.wikipedia.org/wiki/R-tree</a></div><div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Jan 12, 2019 at 8:29 AM Wenbo Tao &lt;<a href="mailto:taowenbo1993@gmail.com" target="_blank">taowenbo1993@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hello,<div><br></div><div> I was trying to build a GiST index on a geometry column in a table with 1 billion rows. It took an entire week to finish. </div><div><br></div><div> Then I reduced the number of rows by grouping closer objects into one clump (using some clustering algorithm), and then compressed the clump as one row (the geometry column becomes the bounding box of all objects in that clump). The construction then went way faster -- down to 12 hours. I did this because the query I need to answer is finding all objects whose bbox intersects with a given rectangle. 
I can now query all clumps whose bbox intersects with the rectangle. </div><div><br></div><div> So essentially, the index construction is slow for too many rows, but much faster for a smaller # of bigger rows. Any intuition why this is the case would be greatly appreciated!</div><div><br></div><div>Thank you,</div><div>Wenbo Tao</div></div>
</blockquote></div>
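<div dir="ltr">To make the packing idea in the reply above a bit more concrete, here is a minimal Python sketch of Sort-Tile-Recursive (STR) bulk loading over plain bounding boxes. It is only an illustration of the general technique under simple assumptions (2D boxes as <code>(xmin, ymin, xmax, ymax)</code> tuples, everything in memory); the names <code>bbox_union</code>, <code>str_pack_level</code> and <code>str_build</code> are hypothetical, and this is not how Postgres or PostGIS builds a GiST index.</div>

```python
import math


def bbox_union(boxes):
    """Bounding box (xmin, ymin, xmax, ymax) covering every box in `boxes`."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def str_pack_level(boxes, capacity):
    """Group boxes into nodes of at most `capacity` entries with one STR pass."""
    n_nodes = math.ceil(len(boxes) / capacity)
    n_slices = math.ceil(math.sqrt(n_nodes))
    slice_size = n_slices * capacity

    # Sort by centre x and cut into vertical slices.
    by_x = sorted(boxes, key=lambda b: (b[0] + b[2]) / 2)
    nodes = []
    for i in range(0, len(by_x), slice_size):
        # Within each slice, sort by centre y and fill nodes in order.
        by_y = sorted(by_x[i:i + slice_size], key=lambda b: (b[1] + b[3]) / 2)
        for j in range(0, len(by_y), capacity):
            nodes.append(by_y[j:j + capacity])
    return nodes


def str_build(boxes, capacity=4):
    """Return the node bounding boxes of each tree level, leaves first, root last."""
    if not boxes:
        raise ValueError("need at least one box")
    if capacity < 2:
        raise ValueError("capacity must be at least 2")

    levels = []
    current = list(boxes)
    while True:
        nodes = str_pack_level(current, capacity)
        current = [bbox_union(node) for node in nodes]
        levels.append(current)
        if len(current) == 1:
            return levels


if __name__ == "__main__":
    # Hypothetical input: a 4 x 4 grid of unit squares.
    grid = [(x, y, x + 1, y + 1) for x in range(4) for y in range(4)]
    for depth, level in enumerate(str_build(grid, capacity=4)):
        print(depth, level)
```

<div dir="ltr">The whole tree comes out of a couple of sorts per level, with no per-row tree descent or node splitting, which is where the expected speedup over one-by-one insertion would come from. The sorts are still O(N log N) overall, so this changes the constant factor rather than the growth rate.</div>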