[postgis-devel] Postgis Raster JPEG 2000 (openjpeg)

Even Rouault even.rouault at mines-paris.org
Mon Mar 17 09:12:10 PDT 2014


Hi,

Just lurking at this discussion...

One practical element to take into account is that open-source JPEG 2000
encoders/decoders (the only maintained one I know of is OpenJPEG v2) are, at
least up to now, significantly slower or less capable than their proprietary
alternatives.
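
For anyone who wants to check that on their own data, here is a minimal
timing sketch using GDAL's Python bindings; the file name is a placeholder,
and any JPEG 2000-capable GDAL build (e.g. one compiled against OpenJPEG v2)
will do:

    import time
    from osgeo import gdal

    # Hypothetical input: any JPEG 2000 file readable by the local GDAL build.
    ds = gdal.Open("sample.jp2")

    start = time.time()
    data = ds.GetRasterBand(1).ReadAsArray()  # full decode of band 1
    print("decoded %dx%d px in %.2f s"
          % (ds.RasterXSize, ds.RasterYSize, time.time() - start))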

Even

> Hi all,
>
> Sorry for the long wait for a reply; I fell ill when starting to write this
> email. Here is my response.
>
>
> > How does OpenJPEG 2000 compare to other formats, such as GeoTIFF, HDF5 or
> > NetCDF?
> >
> All of the mentioned formats use libz compression, a lossless compression
> scheme, except that GTiff additionally offers plain JPEG (DCT) compression
> along with CCITT fax compression.
> I think it may be a good idea to also offer libz (deflate) and/or LZMA
> compression. -- Paul Ramsey pointed out that libz compression is applied
> automatically by PostgreSQL.
> JPEG 2000 uses wavelet compression, allowing for resolution progression,
> quality progression, and other progression orderings.
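>
> As a rough way to compare those options in practice, here is a minimal
> sketch (GDAL Python bindings; file names and option values are
> placeholders, not a benchmark) writing the same raster as a deflate
> GeoTIFF and as a JPEG 2000 via OpenJPEG:
>
>     from osgeo import gdal
>
>     src = gdal.Open("input.tif")  # hypothetical source raster
>
>     # Lossless deflate (libz) in GeoTIFF.
>     gdal.GetDriverByName("GTiff").CreateCopy(
>         "out_deflate.tif", src, options=["COMPRESS=DEFLATE"])
>
>     # Lossy wavelet compression via the JP2OpenJPEG driver;
>     # REVERSIBLE=YES would make it lossless instead.
>     gdal.GetDriverByName("JP2OpenJPEG").CreateCopy(
>         "out.jp2", src, options=["QUALITY=25", "REVERSIBLE=NO"])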
>
> Killer Features:
> 1. Compression
> 2. One-to-one relation of tiles from one resolution to the next; the
> ability to turn overviews into views (see the sketch after this list).
> 3. Union and clip via sort and append (serialization != encode and decode).
> 4. In-format support for collections and series, with GML to support
> rendering.
> 5. In-format support for GML, to support mixed queries of geometry and
> raster.
> 6. Client/server operations are well defined and part of the JP2K
> specification, which even suggests DB storage of the code stream.
> 7. Partial (quality) reads, supporting error-tolerant scientific
> calculations.
> 8. 3D (volumetric data)
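>
> To illustrate features 2 and 7: the wavelet resolution levels live inside
> the one code stream, and GDAL exposes them as overviews, so a coarse read
> never touches the full-resolution data. A minimal sketch (the file name is
> a placeholder):
>
>     from osgeo import gdal
>
>     ds = gdal.Open("sample.jp2")  # hypothetical JPEG 2000 file
>     band = ds.GetRasterBand(1)
>
>     # Resolution levels of the code stream appear as GDAL overviews.
>     print("resolution levels:", band.GetOverviewCount())
>     coarsest = band.GetOverview(band.GetOverviewCount() - 1)
>     thumb = coarsest.ReadAsArray()  # decodes only the low-res levels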
>
> Before I move on to the next questions, a little background on my thought
> process.
> I see four different use types:
> 1. Active Server Tables
>      * Tables that are actively processed on the server side, e.g.
> MapAlgebra, Clip, and the like, responding to the client with processed
> data.
> 2. Client/Server Tables
>       * Classic database model: the client requests and processes data,
> and the server, at the client's request, frequently inserts/updates
> individual cell data.
> 3. Client Tables
>       * These tables are updated by the client and rarely processed server
> side. Compressed data may be sent back for insert/update.
> 4. Archival Tables
>       * These tables may require end-to-end verification and/or validation
> of data. Inserts and updates to these tables may also be forcefully
> versioned.
>
> 1. Should not be compressed, as regular server-side access and processing
> will take place.
> 2. The DB administrator should have an option for dynamic compression,
> whereby "active" rows are left decompressed/cached and are compressed and
> moved to an inactive compressed state in the background at the server's
> convenience and scheduling. This would require background workers.
> Progressive reads would be restricted to inactive rows.
> 3. The DB administrator should have an option for the client and server to
> deal only with compressed data: it sits on disk compressed and is sent to
> the client compressed, and any updates/inserts are compressed by the
> client before being sent to the server (see the sketch below). Any
> server-side operation, such as a clip with trim, would always incur a
> decompression on the server in this mode. Progressive reads would be
> unrestricted.
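>
> A sketch of what mode 3 could look like on the client, encoding a tile to
> a JPEG 2000 code stream in memory before shipping the raw bytes to the
> server (GDAL Python bindings; the file names are placeholders and the
> actual INSERT is omitted):
>
>     from osgeo import gdal
>
>     tile = gdal.Open("tile.tif")  # hypothetical uncompressed tile
>
>     # Encode to JPEG 2000 in GDAL's in-memory filesystem.
>     gdal.GetDriverByName("JP2OpenJPEG").CreateCopy("/vsimem/tile.jp2", tile)
>
>     # Extract the raw code stream; these bytes would be sent to the
>     # server as-is (e.g. as a bytea parameter), never decompressed here.
>     size = gdal.VSIStatL("/vsimem/tile.jp2").size
>     fp = gdal.VSIFOpenL("/vsimem/tile.jp2", "rb")
>     payload = gdal.VSIFReadL(1, size, fp)
>     gdal.VSIFCloseL(fp)
>     gdal.Unlink("/vsimem/tile.jp2")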
>
>
> Before deciding on a replacement format, we should identify whether a
> replacement is required.
> What are the current shortcomings?
> 1. Lack of high-ratio compression, both lossless and lossy.
> 2. Overview table clutter. *This is an annoyance of mine; I would like to
> investigate ways of merging the overviews and the main table into one, or
> cleaning it up in some way (possibly views).
> 3. A simple union or clip should not incur a copy into a new raster
> structure; assembly of the union or clip should be shifted to the client
> or the consuming server-side function, allowing the consuming
> function/application/client to make its own optimizations. MapAlgebra,
> for example, would transparently union/decode only the tiles its current
> operation spans, while the union operator would only organize the data
> into one coherent, valid flow. In a geometry clip, only the tiles to be
> trimmed would be decoded and replaced in the stream. Yes, I realize that
> you get that behaviour if you clip then union, but a union then clip
> should behave the same.
> 4. Partial and progressive reads are not possible: the ability to quickly
> and memory-efficiently "peek" into a raster (choosing both resolution and
> quality) and identify regions of interest, either by a client or an
> optimized function, cancelling or not performing reads in unneeded
> regions. This is partially provided by overviews currently. (See the
> sketch after this list.)
> 5. Raster size should not be limited.
> 6. Overviews are not automatically updated. Views?
> 7. Being able to validate and correct raster data all the way through to
> the client (GDAL) and writes back; make our edits bulletproof.
> 8. Easily, opaquely, and efficiently store and access rasters that must
> be versioned, and provide markup to support client rendering.
> 9. Easily, opaquely, and efficiently store and access rasters of a
> series, and provide markup to support client rendering.
> 10. Mosaic rasters and raster collections: provide simultaneous access to
> both the contiguously tiled product and the original shards, and store
> them efficiently (compression and dedup on shards).
> 11. All data within imported rasters should be faithfully preserved,
> including color tables, color profiles, and other metadata. Where and how
> the data is stored and used internally by Postgres is another matter.
> 12. Associated raster table data should transparently (in-format) be
> available to the client, including geometry.
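>
> To make item 4 concrete, here is the kind of "peek" JPEG 2000 permits,
> sketched with GDAL's Python bindings (the file name is a placeholder):
> request the full extent into a small buffer and let the decoder stop at
> the matching resolution level:
>
>     from osgeo import gdal
>
>     ds = gdal.Open("sample.jp2")  # hypothetical large raster
>     band = ds.GetRasterBand(1)
>
>     # Full extent into a 256x256 buffer; with a JPEG 2000 source only
>     # the low-resolution wavelet levels are decoded, so the peek stays
>     # fast and memory-light even on a huge raster.
>     peek = band.ReadAsArray(0, 0, ds.RasterXSize, ds.RasterYSize,
>                             buf_xsize=256, buf_ysize=256)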
>
>
> Where are the performance limitations?
> 1. Raster size is limited; there should be no limit.
> 2. Disk, memory, network, and CPU utilization should be manageable by the
> DB administrator at table creation. If disk space or I/O is at a premium,
> then compression may be a good option, of course trading it for higher
> CPU utilization.
> 3. A read spanning multiple tiles should not require a copy into a new
> raster structure and a full allocation in memory (see the sketch below).
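>
> For item 3, the behaviour to aim for is roughly what GDAL already gives
> at the file level (the file name is a placeholder): a window spanning
> several internal tiles is decoded tile by tile into one output buffer,
> with no full-raster allocation:
>
>     from osgeo import gdal
>
>     ds = gdal.Open("mosaic.jp2")  # hypothetical large tiled raster
>
>     # Read a 512x512 window at an arbitrary offset; only the internal
>     # tiles intersecting the window are decoded, and memory use is
>     # bounded by the window, not by the raster.
>     window = ds.GetRasterBand(1).ReadAsArray(10000, 20000, 512, 512)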
>
> What are the wish-list items for a replacement format?
> 1. Configurable compression.
> 2. Partial reads: the ability to have a usable representation of the data
> without reading all the data off the disk (faster response to clients,
> feature detection and classification, value search).
> 3. Union (bands, tiles) without a full copy in memory.
> 4. Raster size should not be limited.
> 5. 3D (volumetric data)
> 6. Series support.
> 7. Quality, resolution, and ordering of the product should be variable by
> the client or consuming application/function.
> 8. Ability to efficiently support parallel processing when that becomes
> available.
> 9. Support efficient and logical parallel loading of data.
> 10. Data validation all the way through to the client and back.
> 11. Raster versioning.
> 12. Application specific data ordering.
>
>
> I have spent some time poking at what I consider shortcomings, where I
> think performance is suffering, and what features are missing. I'm not
> ready to publicly share my list, though.
>
> I wasn't involved with the original PostGIS raster binary format, but
> from my understanding while working with it, it appears the goal was to
> keep things simple, as all input and output goes through PostgreSQL,
> which can (should?) be considered a black box.
>
> -bborie
>

More information about the postgis-devel mailing list