[pgpointcloud] Toast(ed)
Howard Butler
howard at hobu.co
Thu Feb 6 09:44:07 PST 2014
Paul,
In playing around with both data loading and data export with pgpointcloud, and I've noticed some interesting things that are counter-intuitive to my Oracle-polluted mind. I have made a number of improvements to PDAL and the pgpointcloud drivers to speed things up here and there. In some cases, the load speeds are 30% faster than before.
1) The total run import time of a significant file (8 million points) using small patches (400 pts) is faster than large ones (120,000 pts)
2) Query back out to PDAL are roughly the same for both large and small patches
After discussing with you, the reason for this is Toast [1], and the inline storage in the small patch scenario but the side-car storage in the large patch scenario. Small patches present some other challenges though:
If I want to store 1 trillion points in pg (I have a 5 trillion point Oracle scenario right now), I might need 2.5 billion rows to get Toast'able performance (2x or better in many of my tests). The large patch scenario above only has 8.3 million rows. Each patch has a fixed overhead that once you get to 2.5 billion rows starts to overwhelm things a little bit. 2.5 billion index entries. 2.5 billion patch boundaries. 2.5 billion primary keys.
Of course you can say, "use large patches" then and shoo me away, but the performance is noticeably sucky (expect that I'm using out-of-the-box configuration with the checkpoints bumped up a lot). How can I get the best of both worlds? Can I increase the toast size? For a given schema, is there a way to determine the maximum patch size to still fit inside a toast'able row? Can more effort be funded to beef Postgresql in general up on the toast aspect? Is it silly to have a table with 2.5 billion rows, and instead have a bunch of sharded-out databases that collectively are 2.5 billion rows (ends up with all the same overheads though, even if you can run it)?
Howard
[1] http://www.postgresql.org/docs/8.3/static/storage-toast.html
More information about the pgpointcloud
mailing list