[postgis-devel] Postgis Raster JPEG 2000 (openjpeg)

Fri Mar 21 08:29:00 PDT 2014

Bborie,

Regarding unlimited raster size: I think an unlimited raster size is doable,
given the peculiarities of the JPEG 2000 format. The JPEG 2000 code stream
is broken into packets; these packets are 1 meg in size and have
a header that can be, and was designed to be, indexed. The raster is
reassembled by reading each of the packets related to the current tile.
The packet order determines the progression order of the raster, being
either Resolution, Quality, Component, or Layer (band). We could
leave the packets in a table and use a cursor to scroll through them
in a requested order. Component ordering would be the default, as it would
assemble the tiles in linear order and at full resolution and quality. We
could create scan lines in memory, and as our consuming function passes
over a tile and out of it, that tile could be dropped, so that only
an operation window is decoded in memory. This would allow us to treat a
table of "unlimited size" as a contiguous raster. This would require a
concept of private packet tables, tucked under a different schema, and a
public raster table containing the "tiles". Each of the tiles would
contain metadata about the raster and maintain ordered lists of OIDs for
each of the progression orders. If I am totally off base here, please tell
me; also, if you have a simpler or more efficient idea, please write back.
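
To make the windowed-read idea above a bit more concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions:
sqlite3 stands in for PostgreSQL, the "packets" table layout, the function
names and the fake 4-byte payloads are all hypothetical, and joining the
payloads stands in for real JPEG 2000 decoding (no OpenJPEG calls are made).
It shows a cursor scrolling through packets in tile order while only a small
window of tiles is kept in memory.

```python
import sqlite3
from collections import OrderedDict


def build_packet_table(conn, tiles, packets_per_tile):
    """Create a hypothetical private packet table filled with fake payloads."""
    conn.execute("CREATE TABLE packets (tile INTEGER, seq INTEGER, payload BLOB)")
    rows = [
        (t, s, bytes([t % 256]) * 4)  # 4 fake bytes per packet, not real JP2K data
        for t in range(tiles)
        for s in range(packets_per_tile)
    ]
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?)", rows)


def stream_tiles(conn):
    """Scroll a cursor over the packets and yield (tile, data) one tile at a time."""
    cur = conn.execute("SELECT tile, payload FROM packets ORDER BY tile, seq")
    current, chunks = None, []
    for tile, payload in cur:
        if current is not None and tile != current:
            # All packets of the previous tile have been read: hand it over.
            yield current, b"".join(chunks)
            chunks = []
        current = tile
        chunks.append(payload)
    if current is not None:
        yield current, b"".join(chunks)


def scan(conn, window=2):
    """Consume tiles in order, keeping at most `window` tiles resident."""
    resident = OrderedDict()
    visited, peak = [], 0
    for tile, data in stream_tiles(conn):
        resident[tile] = data
        if len(resident) > window:
            resident.popitem(last=False)  # drop the tile we have passed out of
        peak = max(peak, len(resident))
        visited.append(tile)
    return visited, peak


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_packet_table(conn, tiles=5, packets_per_tile=3)
    print(scan(conn, window=2))
```

In a real implementation the ORDER BY clause would be swapped for whichever
progression order the consumer requested, and the window size would follow
from the footprint of the operation being run.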

Thanks,

Nathaniel Hunter Clay

On Mon, Mar 17, 2014 at 11:39 AM, Bborie Park <dustymugs at gmail.com> wrote:

> Nice list. Have you looked at what of these are possible in PostgreSQL?
>
> I have a few general comments:
>
> 1. Overview clutter. I agree with this one.
> 2. Unlimited raster size. This is a no go for anything with PostgreSQL
>
> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
> http://michael.otacoo.com/postgresql-2/playing-with-large-objects-in-postgres/
>
> 3. Lossy compression. I can't say I'm for adding complexity. I prefer
> keeping things simple and doing that simple thing extremely well.
>
> 4. All data faithfully stored. Given that raster file formats themselves
> can't faithfully keep all data between formats, I don't have high hopes for
> this. I do think there are opportunities to provide structures to stash
> metadata...
>
> 5. Partial read. Given a new serialized format, this is doable.
>
> 6. Lots of items with simultaneous/parallel keywords. This depends heavily
> upon what to do in PostgreSQL. Parallel processing (reading, operations,
> etc) through threads is not recommended by the PostgreSQL developers
> themselves. Parallel processing is doable through PostgreSQL's dynamic
> background workers but that API is still in active development by the
> PostgreSQL developers...
>
> Basically, it sounds like you want the kitchen sink in this proposed
> format. I wonder if that is the right approach/philosophy.
>
> Actually, I guess a more appropriate question would be: Would you want
> what you've listed in an in-db data store? If so, you really should start
> digging into what PostgreSQL can do and thin out your list.
>
> Given out-db support, we can add additional tools to take advantage of
> what is available in the out-db file formats.
>
> I suppose I should collect my thoughts on what I find lacking in the
> current format and propose a viable replacement format.
>
> -bborie
>
>
> On Mon, Mar 17, 2014 at 8:05 AM, Nathaniel Clay <clay.nathaniel at gmail.com> wrote:
>
>> Hi all,
>>
>> Sorry for the long wait for a reply, I fell ill when starting to write
>> this email. Here is my response.
>>
>>
>>> How does OpenJPEG 2000 compare to other formats, such as GeoTIFF, HDF5
>>> or NetCDF?
>>>
>> All of the mentioned formats use libz compression, a lossless
>> compression scheme, with the exception that GTiff also offers plain JPEG
>> (DCT) compression along with a fax format??
>> I think it may be a good idea to also offer libz (deflate) and/or LZMA
>> compression. -- Paul Ramsey pointed out that libz compression is applied
>> automatically by PgSQL.
>> JPEG 2000 uses Wavelet compression, allowing for both resolution
>> progression and quality progression and other progression ordering.
>>
>> Killer Features:
>> 1. Compression
>> 2. 1 to 1 relation of tiles from one resolution to the next. Ability to
>> turn overviews into views.
>> 3. Union and Clip via sort and append. ( Serialization != encode and
>> decode )
>> 4. In format support for collections, series, with GML to support
>> rendering.
>> 5. In format support for GML, to support mixed queries of geometry and
>> raster.
>> 6. Client server operations well defined and part of the specification
>> for JP2K and even suggests DB storage of the Code stream.
>> 7. Partial (Quality) reads supporting error tolerant Scientific
>> calculations.
>> 8. 3D (volumetric data)
>>
>> Before I move on to the next questions a little background on my thought
>> process.
>>  I see four different use types:
>> 1. Active Server Tables
>>      * Tables that are actively processed on the server side, e.g.
>> MapAlgebra, Clip and the like, responding to the client with processed data.
>> 2. Client/Server Tables
>>       * Classic database model: the client requests and processes data,
>> and the server, at the client's request, frequently inserts/updates
>> individual cell data.
>> 3. Client Tables
>>       * These tables are updated by the client and rarely processed
>> server side. Compressed data may be sent back for insert/update.
>> 4. Archival Tables
>>       * These tables may require end-to-end verification and/or
>> validation of data. Inserts and updates to these tables may also be
>> forcefully versioned.
>>
>> 1. Should not be compressed, as regular server-side access and processing
>> will take place.
>> 2. The db administrator should have an option to have dynamic compression
>> available, whereby "active" rows are left decompressed/cached and are
>> compressed and moved to an inactive compressed state in the background at
>> the server's convenience and scheduling. This would require background
>> workers. Progressive reads would be restricted to inactive rows.
>> 3. The db administrator should have an option for the client and server
>> to deal with only compressed data: it is on the disk compressed and is sent
>> to the client compressed, and any updates/inserts are compressed by the
>> client before being sent to the server. Any server-side operation such as
>> Clip with trim would always incur a decompression on the server in this
>> mode. Progressive reads would be unrestricted.
>>
>>
>> Before deciding on a replacement format, we should identify if a
>> replacement is required?
>> What are the current short-comings?
>> 1. Lack of high-ratio compression, both lossless and lossy.
>> 2. Overview table clutter. *This is an annoyance of mine; I would like to
>> investigate ways of moving overviews and the main table into one, or
>> cleaning it up in some way (possibly views).
>> 3. Simple Union and Clip should not incur a copy into a new raster
>> structure, e.g. assembly of the union and clip should be shifted to the
>> client or consuming server-side function, allowing the consuming
>> function/application/client to make its own optimizations. MapAlgebra,
>> for example, would transparently union/decode only the tiles that its
>> current operation spanned, while the union operator would only organize the
>> data into one coherent valid flow. In a geometry clip, only the tiles to be
>> trimmed would be decoded and replaced in the stream. Yes, I realize that
>> you get that behaviour if you clip then union, but a union then clip should
>> behave the same.
>> 4. Partial and progressive reads are not possible. (The ability to
>> quickly and memory-efficiently "peek" into a raster (choosing both
>> resolution and quality) and identify regions of interest, either by a
>> client or an optimized function, cancelling or not performing reads in
>> unneeded regions. This is partially provided by overviews currently.)
>> 5. Raster size should not be limited.
>> 6.  Overviews are not automatically updated. Views?
>> 7.  Being able to validate and correct raster data all the way through to
>> the client GDAL and writes back. Have our edits bulletproof.
>> 8.  Easily, opaquely and efficiently store and access rasters that must
>> be versioned and provide markup to support client rendering.
>> 9.  Easily, opaquely and efficiently store and access rasters of a series
>> and provide markup to support client rendering.
>> 10. Mosaic rasters and raster collections. Provide simultaneous access to
>> both the contiguously tiled product and the original shards, and store them
>> efficiently (compression and dedup on shards).
>> 11. All data with imported rasters should be faithfully preserved,
>> including color tables, color profiles, and other metadata. Where and
>> how the data is stored and used internally to postgres is another matter.
>> 12. Associated raster table data should transparently (in format) be
>> available to the client, including geometry.
>>
>>
>> Where are the performance limitations?
>> 1. Raster size is limited, there should be no limit.
>> 2. Disk, memory, network and CPU utilization should be manageable at
>> table creation by the db administrator. If disk space or I/O is at a
>> premium, then compression may be a good option, of course trading for
>> higher CPU utilization.
>> 3. Read spanning multiple tiles should not require copy to new raster
>> structure and full allocation in memory.
>>
>> What are the wish-list items for a replacement format?
>> 1. configurable compression
>> 2. Partial reads: the ability to have a usable representation of the data
>> without reading all the data off the disk. (Faster response to clients,
>> feature detection and classification, value search)
>> 3. Union (band, tiles), without full copy in memory.
>> 4. Raster size should not be limited.
>> 5. 3D (volumetric data)
>> 6. series support
>> 7. Quality, resolution and ordering of the product should be variable by
>> the client or consuming application/function.
>> 8. Ability to efficiently support parallel processing when that becomes
>> available.
>> 9. Support efficient and logical parallel loading of data.
>> 10. Data validation all the way through to the client and back.
>> 11. Raster versioning.
>> 12. Application specific data ordering.
>>
>>
>> I have spent some time poking at what I considered short-comings,
>> where I think performance is suffering, and what features are missing. I'm
>> not ready to publicly share my list though.
>>
>> I wasn't involved with the original PostGIS raster binary format but from
>> my understanding while working with it, it appears as though the goal was
>> to keep things simple as all input and output goes through PostgreSQL,
>> which can (should?) be considered a black-box.
>>
>> -bborie
>>
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>>
>
>