[pgpointcloud] Toast(ed)

Paul Ramsey pramsey at cleverelephant.ca
Thu Feb 6 09:57:48 PST 2014


Howard,
The thing to note is that, when you “TOAST” a tuple that is larger than the page size, you just cut it into page-sized chunks and store them in a side table.
So you still get a billion records for your trillion-point case, you just get them hidden off to the side.
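For what it's worth, you can see that side table and how big it is with something like the following — assuming your patch table is named "patches", adjust to taste:

    SELECT c.relname AS main_table,
           t.relname AS toast_table,
           pg_size_pretty(pg_total_relation_size(t.oid)) AS toast_size
    FROM pg_class c
    JOIN pg_class t ON t.oid = c.reltoastrelid
    WHERE c.relname = 'patches';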
Similarly, if we moved to using blobs instead, we’d still end up with a billion records (maybe not in one table; that would be an implementation question): http://www.postgresql.org/docs/current/static/lo-implementation.html
Since we store patches with practically no extra information next to them in the tuple, reading the PC_MemSize (I think that’s the function) of a patch gives you a storage figure you can use to bump your patch size up to close to the maximum (8 kB).
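Something like this rough sketch works (the table name "patches" and column "pa" are just placeholders for whatever your loader created):

    SELECT PC_NumPoints(pa)                  AS pts,
           PC_MemSize(pa)                    AS patch_bytes,
           PC_MemSize(pa) / PC_NumPoints(pa) AS approx_bytes_per_point
    FROM patches
    LIMIT 5;

Divide 8192 by the per-point figure and you get a rough upper bound on points per patch.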
You can also recompile your database with a higher page size if you’re feeling hacky.
P.

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net

On February 6, 2014 at 9:44:13 AM, Howard Butler (howard at hobu.co) wrote:

Paul,  

In playing around with both data loading and data export with pgpointcloud, I've noticed some interesting things that are counter-intuitive to my Oracle-polluted mind. I have made a number of improvements to PDAL and the pgpointcloud drivers to speed things up here and there. In some cases, the load speeds are 30% faster than before.  

1) The total import time for a significant file (8 million points) using small patches (400 pts) is faster than with large ones (120,000 pts)  
2) Queries back out to PDAL are roughly the same for both large and small patches  

After discussing with you, the reason for this is TOAST [1]: inline storage in the small-patch scenario versus side-car storage in the large-patch scenario. Small patches present some other challenges though:  

If I want to store 1 trillion points in pg (I have a 5 trillion point Oracle scenario right now), I might need 2.5 billion rows to get TOAST'able performance (2x or better in many of my tests). The large-patch scenario above would only need 8.3 million rows. Each patch has a fixed overhead that, once you get to 2.5 billion rows, starts to overwhelm things a little bit: 2.5 billion index entries, 2.5 billion patch boundaries, 2.5 billion primary keys.  

Of course you can say "use large patches" and shoo me away, but the performance is noticeably sucky (except that I'm using the out-of-the-box configuration with the checkpoints bumped up a lot). How can I get the best of both worlds? Can I increase the TOAST size? For a given schema, is there a way to determine the maximum patch size that still fits inside a TOAST'able row? Can more effort be funded to beef up PostgreSQL in general on the TOAST front? Is it silly to have a table with 2.5 billion rows, and should I instead have a bunch of sharded-out databases that collectively hold those 2.5 billion rows (which ends up with all the same overheads, even if it can run)?  

Howard  

[1] http://www.postgresql.org/docs/8.3/static/storage-toast.html  
_______________________________________________  
pgpointcloud mailing list  
pgpointcloud at lists.osgeo.org  
http://lists.osgeo.org/cgi-bin/mailman/listinfo/pgpointcloud  