[postgis-users] A PostGIS-Raster data proposal

Craig Miller craig.miller at spatialminds.com
Tue Oct 31 12:57:27 PST 2006


PITA to manage from a programmers perspective, or PITA to manage from an DBA
perspective?  I was considering adding pgchip support into my software real
soon.

--Craig
 

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Paul
Ramsey
Sent: Tuesday, October 31, 2006 12:50 PM
To: postgis-users at postgis.refractions.net
Subject: Re: [postgis-users] A PostGIS-Raster data proposal

Remember not to get too hung up on performance... after all, we knew going
into a database would degrade performance.  Presumably there are other
characteristics of the database that we *like* that is worth taking a
performance hit.

Also consider the Oracle approach in your thinking, which is to shred
everything into < pagesize chunks.  We used this approach in our only
raster-in-db implementation (using the CHIP type) which I can tell you from
experience can be a real PITA to manage.

P

Quoting Stephen Marshall <smarshall at wsi.com>:

> I've extended my testing of server-side access to include testing 
> large objects (LO's) as well as TOAST.
> 
> I found that LO's have similar performance characteristics to TOAST.  
> On average, it takes tens of microseconds to access a 1000 bytes chunk 
> of data from a much larger LO.  For both TOAST and LO's, the access 
> time varied around close to 40 microseconds per access, compared to 
> around 4 microseconds to do a similiar access from a local file using 
> fseek and fread.
> 
> The LO and flatfile tests differed from the TOAST tests in that LO's 
> and flatfiles must be opened before access.  Opening the LO and 
> flatfiles were done once prior to all the reads, and were not included 
> in timing tests.  Open a file general takes around 100 microseconds 
> and opening a LO takes several hundred microseconds, so the overhead 
> is large relative to the time to read a small chunk of data.  It will 
> be important to avoid designs the require opening an LO or file more than
once.
> 
> Determining the size the data also has performance characteristics I 
> have not yet tested thoroughly, but which may be important.  My tests 
> require knowing the total data size, which may not necessarily be 
> needed for raster-in-DB (for example the data formats themselves may 
> carry sufficient size information).  However, finding the size of the 
> TOAST or LO can be relatively expensive.
> 
> I am looking into ways to find the size of the TOAST object, but my 
> current method seems to require reading all the data into memory.  
> Since this subverts the whole point of seeking within the TOASTed 
> data, I'm not happy with this.  I'm looking into how to just scape the 
> length from the data from the bytea, which I believe should be stored 
> in the first 4 bytes of a bytea (variable length field).  If anyone 
> knows different or better ways to find the length of a bytea variable,
please let me know.
> 
> Finding the size of an LO seems to require seeking to the end of the 
> LO and using lo_ftell to return the seek position.  This takes more 
> time than a single 1000 byte read (~ 65 microseconds).  Anyone know a 
> faster way to find the size of an LO?
> 
> Finally, these tests (still) do not include the time it takes to 
> transfer the data back to the database client, thus all the accesses 
> are inherently acces to the local system only.  I'll be expanding the 
> tests to include remote access soon.
> 
> Steve Marshall
> 
> Marshall, Steve wrote:
> 
> >Per Frank Warmerdam's suggestion, I've done a test of access 
> >performance using internal  postgresql toast functions vs. normal file
seeking.
> >
> >The test involved seeking in a toasted bytea column containing 
> >approximately 20 MB of binary data.  The TOAST column was set to 
> >EXTERNAL storage (i.e. in separate TOAST table, but not compressed).
> >The test involved seeking through the data sequentially in chunks of 
> >1000 bytes, and measuring the time to retrieve each chunk.  The code 
> >to do this was encapsulated in a postgresql server-side function and 
> >invoked through SQL.  I restarted the PostgreSQL server before the 
> >test to avoid having any cacing of data in shared memory, which could 
> >artificially speed up the data access.
> >
> >As a comparison, I also wrote a program that would do the equivalent 
> >data access from a file.  The file contained the same data as the 
> >bytea column, and the access was replaced with fseek and fread calls.
> >
> >The results of the test were that toast seeking was about 10 times 
> >more expensive than seeking in a local file.  Each local file access 
> >averaged in microseconds, while toast-seeks averaged 10's of 
> >microseconds.  The worst case file seeking was in milliseconds, while 
> >worst case toast-seeking was in 10's of milliseconds.  The absolute 
> >values for toast-seeks don't seem too bad to me, but it is a bit 
> >worrying that the values are an order of magnitude worse than local file
I/O.
> >
> >I did play around with some parameters in the DB test.  Changing the 
> >chunk size did not make a big difference, but it got a small boost by 
> >setting it to the toast chunk size (1994 bytes). I did not vary the 
> >test to do seeking around randomly instead of sequentially.  This 
> >might give a boost to the DB implementation due to caching; I'm not 
> >sure what this would do to file I/O.
> >
> >I also have not explored the performance of repeated access to the 
> >same data segments.  Here PostgreSQL data caching might help DB 
> >access relative to file I/O.
> >
> >There are still more things to do here, but I thought I'd share some 
> >early results.  I'm happy to provide the code and SQL definitions for 
> >the test, if anyone else is interested in it.
> >
> >Steve Marshall
> >
> >
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 




_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list