[postgis-users] A PostGIS-Raster data proposal
Craig Miller
craig.miller at spatialminds.com
Tue Oct 31 12:57:27 PST 2006
PITA to manage from a programmers perspective, or PITA to manage from an DBA
perspective? I was considering adding pgchip support into my software real
soon.
--Craig
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Paul
Ramsey
Sent: Tuesday, October 31, 2006 12:50 PM
To: postgis-users at postgis.refractions.net
Subject: Re: [postgis-users] A PostGIS-Raster data proposal
Remember not to get too hung up on performance... after all, we knew going
into a database would degrade performance. Presumably there are other
characteristics of the database that we *like* that is worth taking a
performance hit.
Also consider the Oracle approach in your thinking, which is to shred
everything into < pagesize chunks. We used this approach in our only
raster-in-db implementation (using the CHIP type) which I can tell you from
experience can be a real PITA to manage.
P
Quoting Stephen Marshall <smarshall at wsi.com>:
> I've extended my testing of server-side access to include testing
> large objects (LO's) as well as TOAST.
>
> I found that LO's have similar performance characteristics to TOAST.
> On average, it takes tens of microseconds to access a 1000 bytes chunk
> of data from a much larger LO. For both TOAST and LO's, the access
> time varied around close to 40 microseconds per access, compared to
> around 4 microseconds to do a similiar access from a local file using
> fseek and fread.
>
> The LO and flatfile tests differed from the TOAST tests in that LO's
> and flatfiles must be opened before access. Opening the LO and
> flatfiles were done once prior to all the reads, and were not included
> in timing tests. Open a file general takes around 100 microseconds
> and opening a LO takes several hundred microseconds, so the overhead
> is large relative to the time to read a small chunk of data. It will
> be important to avoid designs the require opening an LO or file more than
once.
>
> Determining the size the data also has performance characteristics I
> have not yet tested thoroughly, but which may be important. My tests
> require knowing the total data size, which may not necessarily be
> needed for raster-in-DB (for example the data formats themselves may
> carry sufficient size information). However, finding the size of the
> TOAST or LO can be relatively expensive.
>
> I am looking into ways to find the size of the TOAST object, but my
> current method seems to require reading all the data into memory.
> Since this subverts the whole point of seeking within the TOASTed
> data, I'm not happy with this. I'm looking into how to just scape the
> length from the data from the bytea, which I believe should be stored
> in the first 4 bytes of a bytea (variable length field). If anyone
> knows different or better ways to find the length of a bytea variable,
please let me know.
>
> Finding the size of an LO seems to require seeking to the end of the
> LO and using lo_ftell to return the seek position. This takes more
> time than a single 1000 byte read (~ 65 microseconds). Anyone know a
> faster way to find the size of an LO?
>
> Finally, these tests (still) do not include the time it takes to
> transfer the data back to the database client, thus all the accesses
> are inherently acces to the local system only. I'll be expanding the
> tests to include remote access soon.
>
> Steve Marshall
>
> Marshall, Steve wrote:
>
> >Per Frank Warmerdam's suggestion, I've done a test of access
> >performance using internal postgresql toast functions vs. normal file
seeking.
> >
> >The test involved seeking in a toasted bytea column containing
> >approximately 20 MB of binary data. The TOAST column was set to
> >EXTERNAL storage (i.e. in separate TOAST table, but not compressed).
> >The test involved seeking through the data sequentially in chunks of
> >1000 bytes, and measuring the time to retrieve each chunk. The code
> >to do this was encapsulated in a postgresql server-side function and
> >invoked through SQL. I restarted the PostgreSQL server before the
> >test to avoid having any cacing of data in shared memory, which could
> >artificially speed up the data access.
> >
> >As a comparison, I also wrote a program that would do the equivalent
> >data access from a file. The file contained the same data as the
> >bytea column, and the access was replaced with fseek and fread calls.
> >
> >The results of the test were that toast seeking was about 10 times
> >more expensive than seeking in a local file. Each local file access
> >averaged in microseconds, while toast-seeks averaged 10's of
> >microseconds. The worst case file seeking was in milliseconds, while
> >worst case toast-seeking was in 10's of milliseconds. The absolute
> >values for toast-seeks don't seem too bad to me, but it is a bit
> >worrying that the values are an order of magnitude worse than local file
I/O.
> >
> >I did play around with some parameters in the DB test. Changing the
> >chunk size did not make a big difference, but it got a small boost by
> >setting it to the toast chunk size (1994 bytes). I did not vary the
> >test to do seeking around randomly instead of sequentially. This
> >might give a boost to the DB implementation due to caching; I'm not
> >sure what this would do to file I/O.
> >
> >I also have not explored the performance of repeated access to the
> >same data segments. Here PostgreSQL data caching might help DB
> >access relative to file I/O.
> >
> >There are still more things to do here, but I thought I'd share some
> >early results. I'm happy to provide the code and SQL definitions for
> >the test, if anyone else is interested in it.
> >
> >Steve Marshall
> >
> >
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list