[postgis-users] A PostGIS-Raster data proposal

Gregory S. Williamson gsw at globexplorer.com
Tue Oct 31 19:11:40 PST 2006


I don't know if it could be applicable, but Informix allows users to define different chunk sizes for different dbspaces when storing blobs/sblobs. Let's the user control the performance vs. waste to some degree. But I have no clue if something similar could be implemented in postgres.

And as with "normal" data, there's a trade-off. Shapefiles can be faster than postGIS for some things, but the database gives me benefits that far outweigh the increase in size [maybe -- haven't checked] and decresed throughput.  So as long as a rasterblob gives me something as a tradeoff, it becomes a potentially useful tool.

My $0.03 worth (inflation, ya know ...)

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From:	postgis-users-bounces at postgis.refractions.net on behalf of Paul Ramsey
Sent:	Tue 10/31/2006 12:49 PM
To:	postgis-users at postgis.refractions.net
Cc:	
Subject:	Re: [postgis-users] A PostGIS-Raster data proposal

Remember not to get too hung up on performance... after all, we knew going into 
a database would degrade performance.  Presumably there are other 
characteristics of the database that we *like* that is worth taking a 
performance hit.

Also consider the Oracle approach in your thinking, which is to shred everything 
into < pagesize chunks.  We used this approach in our only raster-in-db 
implementation (using the CHIP type) which I can tell you from experience can be 
a real PITA to manage.

P

Quoting Stephen Marshall <smarshall at wsi.com>:

> I've extended my testing of server-side access to include testing large
> objects (LO's) as well as TOAST.
> 
> I found that LO's have similar performance characteristics to TOAST.  On
> average, it takes tens of microseconds to access a 1000 bytes chunk of
> data from a much larger LO.  For both TOAST and LO's, the access time
> varied around close to 40 microseconds per access, compared to around 4
> microseconds to do a similiar access from a local file using fseek and
> fread.
> 
> The LO and flatfile tests differed from the TOAST tests in that LO's and
> flatfiles must be opened before access.  Opening the LO and flatfiles
> were done once prior to all the reads, and were not included in timing
> tests.  Open a file general takes around 100 microseconds and opening a
> LO takes several hundred microseconds, so the overhead is large relative
> to the time to read a small chunk of data.  It will be important to
> avoid designs the require opening an LO or file more than once.
> 
> Determining the size the data also has performance characteristics I
> have not yet tested thoroughly, but which may be important.  My tests
> require knowing the total data size, which may not necessarily be needed
> for raster-in-DB (for example the data formats themselves may carry
> sufficient size information).  However, finding the size of the TOAST or
> LO can be relatively expensive.
> 
> I am looking into ways to find the size of the TOAST object, but my
> current method seems to require reading all the data into memory.  Since
> this subverts the whole point of seeking within the TOASTed data, I'm
> not happy with this.  I'm looking into how to just scape the length from
> the data from the bytea, which I believe should be stored in the first 4
> bytes of a bytea (variable length field).  If anyone knows different or
> better ways to find the length of a bytea variable, please let me know.
> 
> Finding the size of an LO seems to require seeking to the end of the LO
> and using lo_ftell to return the seek position.  This takes more time
> than a single 1000 byte read (~ 65 microseconds).  Anyone know a faster
> way to find the size of an LO?
> 
> Finally, these tests (still) do not include the time it takes to
> transfer the data back to the database client, thus all the accesses are
> inherently acces to the local system only.  I'll be expanding the tests
> to include remote access soon.
> 
> Steve Marshall
> 
> Marshall, Steve wrote:
> 
> >Per Frank Warmerdam's suggestion, I've done a test of access performance
> >using internal  postgresql toast functions vs. normal file seeking.
> >
> >The test involved seeking in a toasted bytea column containing
> >approximately 20 MB of binary data.  The TOAST column was set to
> >EXTERNAL storage (i.e. in separate TOAST table, but not compressed).
> >The test involved seeking through the data sequentially in chunks of
> >1000 bytes, and measuring the time to retrieve each chunk.  The code to
> >do this was encapsulated in a postgresql server-side function and
> >invoked through SQL.  I restarted the PostgreSQL server before the test
> >to avoid having any cacing of data in shared memory, which could
> >artificially speed up the data access.
> >
> >As a comparison, I also wrote a program that would do the equivalent
> >data access from a file.  The file contained the same data as the bytea
> >column, and the access was replaced with fseek and fread calls.
> >
> >The results of the test were that toast seeking was about 10 times more
> >expensive than seeking in a local file.  Each local file access averaged
> >in microseconds, while toast-seeks averaged 10's of microseconds.  The
> >worst case file seeking was in milliseconds, while worst case
> >toast-seeking was in 10's of milliseconds.  The absolute values for
> >toast-seeks don't seem too bad to me, but it is a bit worrying that the
> >values are an order of magnitude worse than local file I/O.
> >
> >I did play around with some parameters in the DB test.  Changing the
> >chunk size did not make a big difference, but it got a small boost by
> >setting it to the toast chunk size (1994 bytes). I did not vary the test
> >to do seeking around randomly instead of sequentially.  This might give
> >a boost to the DB implementation due to caching; I'm not sure what this
> >would do to file I/O.
> >
> >I also have not explored the performance of repeated access to the same
> >data segments.  Here PostgreSQL data caching might help DB access
> >relative to file I/O.
> >
> >There are still more things to do here, but I thought I'd share some
> >early results.  I'm happy to provide the code and SQL definitions for
> >the test, if anyone else is interested in it.
> >
> >Steve Marshall
> >
> >
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 




_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users


-------------------------------------------------------
Click link below if it is SPAM gsw at globexplorer.com
"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=4547b447209581804284693&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:4547b447209581804284693!
-------------------------------------------------------








More information about the postgis-users mailing list