[postgis-devel] Postgis Raster JPEG 2000 (openjpeg)

Nathaniel Clay clay.nathaniel at gmail.com
Mon Mar 17 08:05:29 PDT 2014


Hi all,

Sorry for the long wait for a reply; I fell ill while starting to write this
email. Here is my response.


> How does OpenJPEG 2000 compare to other formats, such as GeoTIFF, HDF5 or
> NetCDF?
>
 All of the mentioned formats use libz compression, a lossless
compression scheme, with the exception that GTiff also offers plain JPEG
(DCT) compression along with a fax (CCITT) format, I believe.
I think it may be a good idea to also offer libz (deflate) and/or LZMA
compression. -- Paul Ramsey pointed out that libz compression is applied
automatically by PgSQL.
JPEG 2000 uses wavelet compression, allowing for resolution progression,
quality progression, and other progression orderings.
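As a rough illustration of the deflate vs. LZMA option mentioned above (the data here is synthetic, not anything from PostGIS internals; LZMA usually compresses tighter at a higher CPU cost):

```python
import zlib
import lzma

# Synthetic "raster band": a smooth gradient, which compresses well losslessly.
band = bytes((x + y) % 256 for y in range(256) for x in range(256))

deflated = zlib.compress(band, level=9)  # deflate, as provided by libz
xz = lzma.compress(band, preset=9)       # LZMA, typically smaller but slower

print(len(band), len(deflated), len(xz))
```

Both round-trip exactly, so either would be a candidate for a lossless table-level option.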

Killer Features:
1. Compression.
2. A 1-to-1 relation of tiles from one resolution to the next, giving the
ability to turn overviews into views.
3. Union and clip via sort and append (serialization != encode and decode).
4. In-format support for collections and series, with GML to support rendering.
5. In-format support for GML, to support mixed queries of geometry and
raster.
6. Client/server operations are well defined and part of the JP2K
specification, which even suggests DB storage of the codestream.
7. Partial (quality) reads, supporting error-tolerant scientific
calculations.
8. 3D (volumetric data).
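As a toy illustration of the resolution-progression property: one Haar wavelet step splits a row into a half-resolution approximation plus detail coefficients, so decoding only the approximation already yields a usable lower-resolution view. This is a pure-Python sketch of the idea, not the actual JPEG 2000 codec (which uses the CDF 5/3 and 9/7 wavelets):

```python
def haar_step(signal):
    """One level of the Haar transform: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact reconstruction from approximation + detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

row = [10, 12, 14, 16, 20, 24, 28, 32]   # one raster row
approx, detail = haar_step(row)

# Resolution progression: read only `approx` for a half-resolution view,
# or also read `detail` to reconstruct the full-resolution original.
assert haar_inverse(approx, detail) == row
```

Repeating the step on the approximation gives further resolution levels, which is why a prefix of a wavelet codestream is already a coherent image.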

Before I move on to the next questions, here is a little background on my
thought process.
I see four different use types:
1. Active Server Tables
     * Tables that are actively processed on the server side (e.g.
MapAlgebra, Clip, and the like), responding to the client with processed data.
2. Client/Server Tables
      * The classic database model: the client requests and processes
data, and the server, at the client's request, inserts/updates individual
cell data frequently.
3. Client Tables
      * Tables that are updated by the client and rarely processed server
side. Compressed data may be sent back for insert/update.
4. Archival Tables
      * Tables that may require end-to-end verification and/or validation
of data. Inserts and updates to these tables may also be forcefully
versioned.

1. Should not be compressed, as regular server-side access and processing
will take place.
2. The DB administrator should have an option for dynamic compression,
whereby "active" rows are left decompressed/cached and are compressed and
moved to an inactive compressed state in the background at the server's
convenience and scheduling. This would require background workers.
Progressive reads would be restricted to inactive rows.
3. The DB administrator should have an option for the client and server to
deal only with compressed data: it sits on disk compressed and is sent to
the client compressed, and any updates/inserts are compressed by the client
before being sent to the server. Any server-side operation, such as clip
with trim, would always incur a decompression in this mode.
Progressive reads would be unrestricted.
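The per-table policies above could be sketched as a simple mode switch. Everything here (the enum, the store_tile helper, using zlib as a stand-in codec) is hypothetical pseudocode for the policy, not an existing PostGIS or PostgreSQL API:

```python
import zlib
from enum import Enum

class TableMode(Enum):
    ACTIVE_SERVER = 1   # mode 1: server-side processing, store decompressed
    DYNAMIC = 2         # mode 2: compress inactive rows in the background
    CLIENT = 3          # mode 3: store and ship compressed only

def store_tile(mode, tile_bytes, row_is_active=True):
    """Decide how a tile is written under each policy (hypothetical sketch)."""
    if mode is TableMode.ACTIVE_SERVER:
        return tile_bytes                 # never compressed
    if mode is TableMode.DYNAMIC and row_is_active:
        return tile_bytes                 # active rows stay raw; a background
                                          # worker compresses them later
    return zlib.compress(tile_bytes)      # stored (and shipped) compressed

tile = bytes(1024)
assert store_tile(TableMode.ACTIVE_SERVER, tile) == tile
assert len(store_tile(TableMode.CLIENT, tile)) < len(tile)
```

The interesting engineering is all in mode 2's background transition, which is why it would depend on background workers.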


Before deciding on a replacement format, we should identify whether a
replacement is required at all.
What are the current shortcomings?
1. Lack of high-ratio compression, both lossless and lossy.
2. Overview table clutter. *This is an annoyance of mine; I would like to
investigate ways of merging the overviews and the main table into one, or
cleaning it up in some way (possibly with views).
3. Simple union and clip should not incur a copy into a new raster
structure; assembly of the union or clip should be shifted to the client or
to the consuming server-side function, allowing the consuming
function/application/client to make its own optimizations. MapAlgebra,
for example, would only (and transparently) union/decode the tiles that its
current operation spanned, while the union operator would only organize the
data into one coherent, valid flow. In a geometry clip, only the tiles to be
trimmed would be decoded and replaced in the stream. Yes, I realize that
you get that behaviour if you clip then union, but a union then clip should
behave the same.
4. Partial and progressive reads are not possible. (That is, the ability to
quickly and memory-efficiently "peek" into a raster, choosing both resolution
and quality, and identify regions of interest, either by a client or by an
optimized function, cancelling or not performing reads in unneeded regions.
This is partially provided by overviews currently.)
5. Raster size should not be limited.
6. Overviews are not automatically updated. Views?
7. Being able to validate and correct raster data all the way through to
the client (GDAL) and on writes back; make our edits bulletproof.
8. Easily, opaquely, and efficiently store and access rasters that must be
versioned, and provide markup to support client rendering.
9. Easily, opaquely, and efficiently store and access rasters of a series,
and provide markup to support client rendering.
10. Mosaic rasters and raster collections: provide simultaneous access to
both the contiguously tiled product and the original shards, and store them
efficiently (compression and dedup on the shards).
11. All data within imported rasters should be faithfully preserved,
including color tables, color profiles, and other metadata. Where and
how the data is stored and used internally by PostgreSQL is another matter.
12. Associated raster table data should transparently (in format) be
available to the client, including geometry.
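To make point 3 concrete, a union that avoids copying can be sketched as a lazy view over tile references, where decoding happens only when a consumer asks for a specific tile. The Tile/LazyUnion names are illustrative, not PostGIS internals:

```python
class Tile:
    """A reference to one tile's encoded payload at grid position (x, y)."""
    def __init__(self, x, y, data):
        self.x, self.y, self.data = x, y, data

class LazyUnion:
    """A union that only organizes tile references; no pixel data is copied
    or decoded until a consumer asks for a specific tile."""
    def __init__(self, *rasters):
        self.index = {}
        for raster in rasters:
            for tile in raster:
                # Later rasters win on overlap, mirroring union semantics.
                self.index[(tile.x, tile.y)] = tile

    def tile_at(self, x, y):
        return self.index.get((x, y))   # decode would happen here, on demand

a = [Tile(0, 0, b"A"), Tile(1, 0, b"B")]
b = [Tile(1, 0, b"C"), Tile(2, 0, b"D")]
u = LazyUnion(a, b)
assert u.tile_at(1, 0).data == b"C"   # overlap resolved without copying a or b
```

A consumer such as MapAlgebra would then call tile_at only for the tiles its current operation spans, which is the behaviour argued for above.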


Where are the performance limitations?
1. Raster size is limited; there should be no limit.
2. Disk, memory, network, and CPU utilization should be manageable at table
creation time by the DB administrator. If disk space or I/O is at a premium,
then compression may be a good option, of course trading for higher CPU
utilization.
3. A read spanning multiple tiles should not require a copy into a new
raster structure and a full allocation in memory.

What are the wish-list items for a replacement format?
1. Configurable compression.
2. Partial reads: the ability to have a usable representation of the data
without reading all the data off the disk (faster response to clients,
feature detection and classification, value search).
3. Union (bands, tiles) without a full copy in memory.
4. Raster size should not be limited.
5. 3D (volumetric data).
6. Series support.
7. Quality, resolution, and ordering of the product should be variable by
the client or consuming application/function.
8. The ability to efficiently support parallel processing when that becomes
available.
9. Support for efficient and logical parallel loading of data.
10. Data validation all the way through to the client and back.
11. Raster versioning.
12. Application-specific data ordering.


I have spent some time poking at what I consider the shortcomings, where
I think performance is suffering, and what features are missing. I'm not
ready to publicly share my list, though.

I wasn't involved with the original PostGIS raster binary format, but from
my understanding while working with it, it appears as though the goal was
to keep things simple, since all input and output goes through PostgreSQL,
which can (should?) be considered a black box.

-bborie