[postgis-users] A PostGIS-Raster data proposal
Marshall, Steve
smarshall at wsi.com
Fri Oct 27 14:31:40 PDT 2006
Per Frank Warmerdam's suggestion, I've tested access performance
using PostgreSQL's internal TOAST functions vs. normal file seeking.
The test involved seeking in a toasted bytea column containing
approximately 20 MB of binary data. The TOAST column was set to
EXTERNAL storage (i.e. in a separate TOAST table, but not compressed).
The test involved seeking through the data sequentially in chunks of
1000 bytes, and measuring the time to retrieve each chunk. The code to
do this was encapsulated in a PostgreSQL server-side function and
invoked through SQL. I restarted the PostgreSQL server before the test
to avoid any caching of data in shared memory, which could
artificially speed up the data access.
As a comparison, I also wrote a program that would do the equivalent
data access from a file. The file contained the same data as the bytea
column, and the access was replaced with fseek and fread calls.
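The file-side measurement described above can be sketched roughly as
follows. This is an illustration in Python, not the original test
program (which was not posted): seek() and read() stand in for fseek
and fread, and the function names, the temporary file, and the random
test data are all assumptions made for the example.

```python
import os
import tempfile
import time


def make_test_file(size):
    """Create a temporary file holding `size` random bytes; return its path.

    Hypothetical helper: the original test used a copy of the bytea data.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path


def time_sequential_reads(path, chunk_size=1000):
    """Seek and read through the file in chunk_size pieces.

    Returns a list with the elapsed time, in seconds, of each seek+read.
    """
    timings = []
    # buffering=0 disables Python's own buffering, so each read is a system call.
    with open(path, "rb", buffering=0) as f:
        size = os.fstat(f.fileno()).st_size
        for offset in range(0, size, chunk_size):
            start = time.perf_counter()
            f.seek(offset)
            f.read(chunk_size)
            timings.append(time.perf_counter() - start)
    return timings
```

Calling time_sequential_reads() on a roughly 20 MB file and taking the
mean and maximum of the returned list gives figures comparable in kind
to the ones reported here. Note that this sketch does nothing about the
operating system's file cache, so a freshly written file will probably
be read from memory.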
The result was that TOAST seeking was about 10 times more expensive
than seeking in a local file. Each local file access averaged on the
order of microseconds, while TOAST seeks averaged tens of microseconds.
The worst-case file seek was on the order of milliseconds, while the
worst-case TOAST seek was tens of milliseconds. The absolute values for
TOAST seeks don't seem too bad to me, but it is a bit worrying that they
are an order of magnitude worse than local file I/O.
I did play around with some parameters in the DB test. Changing the
chunk size did not make a big difference, but I got a small boost by
setting it to the TOAST chunk size (1994 bytes). I did not vary the test
to seek randomly instead of sequentially. Random access might give
a boost to the DB implementation due to caching; I'm not sure what it
would do to file I/O.
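One possible reason for the small boost at 1994 bytes, offered as an
assumption rather than something measured: a read that is not aligned
to the TOAST chunk size can straddle two stored chunks. The sketch
below is only arithmetic on offsets. It does not model how PostgreSQL
actually fetches or caches chunks, and the function names are
hypothetical.

```python
TOAST_CHUNK = 1994  # chunk size quoted in this post; may differ on other builds


def chunks_touched(offset, length, chunk=TOAST_CHUNK):
    """Number of fixed-size chunks overlapped by bytes [offset, offset + length)."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1


def total_chunk_fetches(total_size, read_size, chunk=TOAST_CHUNK):
    """Chunk fetches for a sequential scan in read_size pieces.

    Assumes every read fetches each chunk it overlaps, with no caching.
    """
    fetches = 0
    for offset in range(0, total_size, read_size):
        length = min(read_size, total_size - offset)
        fetches += chunks_touched(offset, length, chunk)
    return fetches


# Scanning 10 chunks' worth of data:
print(total_chunk_fetches(10 * TOAST_CHUNK, 1000))         # 29 fetches for 20 reads
print(total_chunk_fetches(10 * TOAST_CHUNK, TOAST_CHUNK))  # 10 fetches for 10 reads
```

Under these assumptions, 1000-byte reads touch some chunks twice, while
reads aligned to the chunk size touch each chunk exactly once.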
I also have not explored the performance of repeated access to the same
data segments. Here PostgreSQL data caching might help DB access
relative to file I/O.
There are still more things to do here, but I thought I'd share some
early results. I'm happy to provide the code and SQL definitions for
the test, if anyone else is interested in it.
Steve Marshall