<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Howard,</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">The thing to note is that when you “TOAST” a tuple that is larger than the page size, you just cut it into page-sized chunks and store them in a side table.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">So you still get a billion records for your trillion-point case; you just get them somewhere hidden off to the side.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Similarly, if we moved to using blobs instead, we’d still end up with a billion records (maybe not in one table; that would be an implementation question): http://www.postgresql.org/docs/current/static/lo-implementation.html</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Since we store patches with practically no extra information next to them in the tuple, just reading the PC_MemSize (I think that’s the function) of the patch gives you an idea of the storage it takes, which you can use to bump your patch size up to close to the maximum (8 kB).
</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">You can also recompile your database with a larger page size if you’re feeling hacky.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">P.</div> <div id="bloop_sign_1391708832561353984" class="bloop_sign"><div><br></div><span style="font-family:helvetica,arial;font-size:13px"></span>-- <br>Paul Ramsey<br>http://cleverelephant.ca<div>http://postgis.net</div></div> <br><p style="color:#A0A0A8;">On February 6, 2014 at 9:44:13 AM, Howard Butler (<a href="mailto:howard@hobu.co">howard@hobu.co</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div>Paul,

<br>

<br>In playing around with both data loading and data export with pgpointcloud, I've noticed some interesting things that are counter-intuitive to my Oracle-polluted mind. I have made a number of improvements to PDAL and the pgpointcloud drivers to speed things up here and there. In some cases, the load speeds are 30% faster than before.

<br>

<br>1) The total import run time for a significant file (8 million points) is shorter with small patches (400 pts) than with large ones (120,000 pts)

<br>2) Query times back out to PDAL are roughly the same for both large and small patches

<br>

<br>After discussing with you, the reason for this is TOAST [1]: the small patch scenario gets inline storage, while the large patch scenario gets side-car storage. Small patches present some other challenges, though:

<br>

<br>If I want to store 1 trillion points in pg (I have a 5 trillion point Oracle scenario right now), I might need 2.5 billion rows to get Toast'able performance (2x or better in many of my tests). The large patch scenario above only has 8.3 million rows. Each patch has a fixed overhead that, once you get to 2.5 billion rows, starts to overwhelm things a little bit. 2.5 billion index entries. 2.5 billion patch boundaries. 2.5 billion primary keys.
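
<br>As a rough check on the row counts above, here is a minimal Python sketch. The function name rows_needed is made up for illustration, and it assumes every patch is filled to the same point count, which real data will only approximate.
<pre>
```python
def rows_needed(total_points: int, points_per_patch: int) -> int:
    """Number of patch rows needed to hold total_points (ceiling division)."""
    return -(-total_points // points_per_patch)


TOTAL_POINTS = 10**12  # the 1 trillion point scenario

small = rows_needed(TOTAL_POINTS, 400)      # small patches (400 pts each)
large = rows_needed(TOTAL_POINTS, 120_000)  # large patches (120,000 pts each)

print(f"400-point patches:     {small:,} rows")
print(f"120,000-point patches: {large:,} rows")
```
</pre>
<br>With these inputs the small-patch case comes to 2.5 billion rows and the large-patch case to roughly 8.3 million, matching the figures above.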

<br>

<br>Of course you can say, "use large patches" then and shoo me away, but the performance is noticeably sucky (except that I'm using out-of-the-box configuration with the checkpoints bumped up a lot). How can I get the best of both worlds? Can I increase the toast size? For a given schema, is there a way to determine the maximum patch size to still fit inside a toast'able row? Can more effort be funded to beef up PostgreSQL in general on the toast aspect? Is it silly to have a table with 2.5 billion rows, and should I instead have a bunch of sharded-out databases that collectively are 2.5 billion rows (ends up with all the same overheads though, even if you can run it)?

<br>

<br>Howard

<br>

<br>[1] http://www.postgresql.org/docs/8.3/static/storage-toast.html

<br>_______________________________________________

<br>pgpointcloud mailing list

<br>pgpointcloud@lists.osgeo.org

<br>http://lists.osgeo.org/cgi-bin/mailman/listinfo/pgpointcloud

<br></div></div></span></blockquote></body></html>