[postgis-devel] Postgis Raster JPEG 2000 (openjpeg)

Bborie Park dustymugs at gmail.com
Fri Mar 21 09:13:38 PDT 2014


So, would this JPEG 2000 file be stored in a single record's raster column?
Or would each packet of the JPEG 2000 file get its own record? Or neither?

-bborie


On Fri, Mar 21, 2014 at 8:29 AM, Nathaniel Clay <clay.nathaniel at gmail.com> wrote:

> Bborie,
>
> Unlimited raster size: I think an unlimited raster size is doable, given
> the peculiarities of the JPEG 2000 format. The JPEG 2000 code stream
> is broken into packets; these packets are 1 MB in size and have
> a header that can be, and was designed to be, indexed. The raster is
> reassembled by reading each of the packets related to the current tile.
> The packet order determines the progression order of the raster, which is
> either resolution, quality, component, or layer (band). We could
> leave the packets in a table and use a cursor to scroll through them
> in a requested order. Component ordering would be the default, as it would
> assemble the tiles in linear order and at full resolution and quality. We
> could create scan lines in memory, and as our consuming function passes
> over a tile and out of it, that tile could be dropped, thus decoding only
> an operation window in memory. This would allow us to treat a
> table of "unlimited size" as a contiguous raster. This would require a
> concept of private packet tables, tucked under a different schema, and a
> public raster table containing the "tiles". Each of the tiles would
> contain metadata about the raster and maintain ordered lists of OIDs for
> each of the progression orders. If I am totally off base here, please tell
> me; also, if you have a simpler or more efficient idea, please write back.
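
The windowed decode described above can be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions: packets are
assumed to arrive from a cursor already sorted by tile row and column, and
each packet is assumed to carry one whole tile. `Packet`, `decode_tile` and
`scanlines` are hypothetical names, not PostGIS or OpenJPEG API, and the
decoder is a stand-in that treats the payload as raw 8-bit pixels.

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical packet row layout: (tile_row, tile_col, payload).
Packet = Tuple[int, int, bytes]
Tile = List[List[int]]


def decode_tile(payload: bytes, width: int = 2) -> Tile:
    """Stand-in decoder: treats the payload as raw 8-bit pixels, `width` per line."""
    return [list(payload[i:i + width]) for i in range(0, len(payload), width)]


def scanlines(packets: Iterable[Packet],
              decode: Callable[[bytes], Tile] = decode_tile) -> Iterator[List[int]]:
    """Yield full-width scanlines, holding only one row of decoded tiles at a time."""
    # Assumption: packets are sorted by (tile_row, tile_col), as a cursor
    # with an ORDER BY clause would deliver them.
    for _, row_packets in groupby(packets, key=lambda p: p[0]):
        tiles = [decode(payload) for _, _, payload in row_packets]
        for y in range(len(tiles[0])):
            # Stitch line y of every tile in this tile row into one scanline.
            yield [px for tile in tiles for px in tile[y]]
        # `tiles` is rebound on the next iteration, so the previous row is dropped.


packets = [
    (0, 0, bytes([1, 2, 3, 4])), (0, 1, bytes([5, 6, 7, 8])),
    (1, 0, bytes([9, 10, 11, 12])), (1, 1, bytes([13, 14, 15, 16])),
]
lines = list(scanlines(packets))
```

Because `scanlines` is a generator, a consumer that stops early never
triggers decoding of the remaining tile rows.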
>
> Thanks,
>
> Nathaniel Hunter Clay
>
>
>
> On Mon, Mar 17, 2014 at 11:39 AM, Bborie Park <dustymugs at gmail.com> wrote:
>
>> Nice list. Have you looked at what of these are possible in PostgreSQL?
>>
>> I have a few general comments:
>>
>> 1. Overview clutter. I agree with this one.
>> 2. Unlimited raster size. This is a no go for anything with PostgreSQL
>>
>> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>>
>> http://michael.otacoo.com/postgresql-2/playing-with-large-objects-in-postgres/
>>
>> 3. Lossy compression. I can't say I'm for adding complexity. I prefer
>> keeping things simple and doing that simple thing extremely well.
>>
>> 4. All data faithfully stored. Given that raster file formats themselves
>> can't faithfully keep all data between formats, I don't have high hopes for
>> this. I do think there are opportunities to provide structures to stash
>> metadata...
>>
>> 5. Partial read. Given a new serialized format, this is doable.
>>
>> 6. Lots of items with simultaneous/parallel keywords. This depends
>> heavily upon what to do in PostgreSQL. Parallel processing (reading,
>> operations, etc.) through threads is not recommended by the PostgreSQL
>> developers themselves. Parallel processes are doable through PostgreSQL's
>> dynamic background workers, but that API is still in active development by
>> the PostgreSQL developers...
>>
>> Basically, it sounds like you want the kitchen sink in this proposed
>> format. I wonder if that is the right approach/philosophy.
>>
>> Actually, I guess a more appropriate question would be: Would you want
>> what you've listed in an in-db data store? If so, you really should start
>> digging into what PostgreSQL can do and thin out your list.
>>
>> Given out-db support, we can add additional tools to take advantage of
>> what is available in the out-db file formats.
>>
>> I suppose I should collect my thoughts on what I find lacking in the
>> current format and propose a viable replacement format.
>>
>> -bborie
>>
>>
>> On Mon, Mar 17, 2014 at 8:05 AM, Nathaniel Clay <clay.nathaniel at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Sorry for the long wait for a reply; I fell ill when starting to write
>>> this email. Here is my response.
>>>
>>>
>>>> How does OpenJPEG 2000 compare to other formats, such as GeoTIFF, HDF5
>>>> or NetCDF?
>>>>
>>> All of the formats mentioned use libz compression, a lossless
>>> compression scheme, with the exception that GTiff also offers plain JPEG
>>> (DCT) compression along with a fax format??
>>> I think it may be a good idea to also offer libz (deflate) and/or LZMA
>>> compression. -- Paul Ramsey pointed out that libz compression is applied
>>> automatically by PgSQL.
>>> JPEG 2000 uses wavelet compression, allowing for resolution
>>> progression, quality progression, and other progression orderings.
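
To make "resolution progression" concrete, here is a small Python sketch of
one level of a reversible integer Haar transform. Haar is used only because
it is the simplest wavelet to show; JPEG 2000 itself uses different wavelet
filters, and the function names here are made up for illustration. The `low`
half is a usable half-resolution version on its own; adding `high` restores
the original exactly.

```python
from typing import List, Tuple


def haar_forward(signal: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the reversible integer Haar (S) transform on an even-length list."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) >> 1 for a, b in pairs]  # half-resolution approximation
    high = [a - b for a, b in pairs]        # detail needed to restore full resolution
    return low, high


def haar_inverse(low: List[int], high: List[int]) -> List[int]:
    """Exactly undo haar_forward."""
    out: List[int] = []
    for s, d in zip(low, high):
        a = s + ((d + 1) >> 1)
        out.extend([a, a - d])
    return out


signal = [10, 12, 7, 4, 4, 7, 20, 20]
low, high = haar_forward(signal)  # a reader could stop after `low`
restored = haar_inverse(low, high)
```

Applying `haar_forward` again to `low` gives the next coarser level, which
is the sense in which each resolution is derived from the one above it.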
>>>
>>> Killer Features:
>>> 1. Compression
>>> 2. 1-to-1 relation of tiles from one resolution to the next. Ability to
>>> turn overviews into views.
>>> 3. Union and clip via sort and append. (Serialization != encode and
>>> decode)
>>> 4. In-format support for collections and series, with GML to support
>>> rendering.
>>> 5. In-format support for GML, to support mixed queries of geometry and
>>> raster.
>>> 6. Client/server operations are well defined and part of the JP2K
>>> specification, which even suggests DB storage of the code stream.
>>> 7. Partial (quality) reads supporting error-tolerant scientific
>>> calculations.
>>> 8. 3D (volumetric data)
>>>
>>> Before I move on to the next questions, a little background on my thought
>>> process.
>>> I see four different use types:
>>> 1. Active Server Tables
>>>      * Tables that are actively processed on the server side, e.g.
>>> MapAlgebra, Clip and the like, responding to the client with processed data.
>>> 2. Client/Server Tables
>>>       * Classic database model: the client requests and processes
>>> data, and the server, at the client's request, frequently inserts/updates
>>> individual cell data.
>>> 3. Client Tables
>>>       * These tables are updated by the client and rarely processed
>>> server side. Compressed data may be sent back for insert/update.
>>> 4. Archival Tables
>>>       * These tables may require end-to-end verification and/or
>>> validation of data. Inserts and updates to these tables may also be
>>> forcibly versioned.
>>>
>>> 1. Should not be compressed, as regular server-side access and
>>> processing will take place.
>>> 2. The db administrator should have an option to have dynamic
>>> compression available, whereby "active" rows are left decompressed/cached
>>> and are compressed and moved to an inactive compressed state in the
>>> background at the server's convenience and scheduling. This would require
>>> background workers. Progressive reads would be restricted to inactive rows.
>>> 3. The db administrator should have an option for the client and server
>>> to deal only with compressed data: it is compressed on disk and is sent
>>> to the client compressed, and any updates/inserts are compressed by the
>>> client before being sent to the server. Any server-side operation such as
>>> Clip with trim would always incur a decompression on the server in this
>>> mode. Progressive reads would be unrestricted.
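
The "dynamic compression" option in item 2 could look something like the
Python sketch below. Everything here is an assumption made for illustration:
`Row`, `read` and `background_pass` are hypothetical names, zlib stands in
for whatever codec would actually be used, and the idle threshold is
arbitrary. It only shows the policy (active rows stay raw, idle rows are
compressed by a background pass), not how PostgreSQL background workers
would do it.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Row:
    data: bytes
    last_access: float
    compressed: bool = False


def read(row: Row, now: float) -> bytes:
    """Return raw bytes, decompressing and re-activating the row if needed."""
    if row.compressed:
        row.data = zlib.decompress(row.data)
        row.compressed = False
    row.last_access = now
    return row.data


def background_pass(rows: Iterable[Row], now: float, idle_after: float) -> int:
    """Compress rows not touched for `idle_after` seconds; return how many."""
    count = 0
    for row in rows:
        if not row.compressed and now - row.last_access >= idle_after:
            row.data = zlib.compress(row.data)
            row.compressed = True
            count += 1
    return count


rows = [Row(b"\x00" * 1024, last_access=0.0), Row(b"\x01" * 1024, last_access=90.0)]
newly_compressed = background_pass(rows, now=100.0, idle_after=60.0)
```

Here only the first row has been idle long enough, so it alone is moved to
the compressed state; a later `read` on it brings it back to the active set.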
>>>
>>>
>>> Before deciding on a replacement format, we should identify if a
>>> replacement is required?
>>> What are the current short-comings?
>>> 1. Lack of high-ratio compression, both lossless and lossy.
>>> 2. Overview table clutter. *This is an annoyance of mine; I would like to
>>> investigate ways of merging overviews and the main table into one, or
>>> cleaning it up in some way (possibly views).
>>> 3. Simple Union and Clip should not incur a copy into a new raster
>>> structure, i.e. assembly of the union and clip should be shifted to the
>>> client or consuming server-side function, allowing the consuming
>>> function/application/client to make its own optimizations. MapAlgebra,
>>> for example, would transparently union/decode only the tiles that its
>>> current operation spanned, while the union operator would only organize the
>>> data into one coherent, valid flow. In a geometry clip, only the tiles to be
>>> trimmed would be decoded and replaced in the stream. Yes, I realize that
>>> you get that behaviour if you clip then union, but a union then clip should
>>> behave the same.
>>> 4. Partial and progressive reads are not possible. (The ability to
>>> quickly and memory-efficiently "peek" into a raster (choosing both
>>> resolution and quality) and identify regions of interest, either by a
>>> client or an optimized function, cancelling or not performing reads in
>>> unneeded regions. This is partially provided by overviews currently.)
>>> 5. Raster size should not be limited.
>>> 6. Overviews are not automatically updated. Views?
>>> 7. Being able to validate and correct raster data all the way through
>>> to the client (GDAL) and on writes back. Have our edits bulletproof.
>>> 8. Easily, opaquely and efficiently store and access rasters that must
>>> be versioned, and provide markup to support client rendering.
>>> 9. Easily, opaquely and efficiently store and access rasters of a
>>> series, and provide markup to support client rendering.
>>> 10. Mosaic rasters and raster collections. Provide simultaneous access
>>> to both the contiguously tiled product and the original shards, and store
>>> them efficiently (compression and dedup on shards).
>>> 11. All data with imported rasters should be faithfully preserved,
>>> including color tables, color profiles, and other metadata. Where and
>>> how the data is stored and used internally to Postgres is another matter.
>>> 12. Associated raster table data should transparently (in format) be
>>> available to the client, including geometry.
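
Item 3 in the list above (union and clip without copying into a new raster)
can be illustrated with a short Python sketch. It is a toy under stated
assumptions: tiles are square, keyed by (tile_col, tile_row), and the
"decoder" just reads raw 8-bit pixels. `LazyUnion` and `read_window` are
hypothetical names, not an existing PostGIS interface. The union only merges
references to encoded tiles; pixels are decoded when a consumer asks for a
window.

```python
from typing import Dict, List, Tuple

TILE = 2  # tile edge length in pixels (assumed square tiles)
TileKey = Tuple[int, int]  # (tile_col, tile_row)


class LazyUnion:
    """Union by reference: keeps encoded tiles and decodes only on demand."""

    def __init__(self, *tile_sets: Dict[TileKey, bytes]) -> None:
        self.tiles: Dict[TileKey, bytes] = {}
        for tile_set in tile_sets:
            self.tiles.update(tile_set)  # merges references, copies no pixels
        self.decoded = 0  # number of decode calls, kept for illustration

    def _decode(self, payload: bytes) -> List[List[int]]:
        # Stand-in decoder: raw 8-bit pixels, TILE per line.
        self.decoded += 1
        return [list(payload[i:i + TILE]) for i in range(0, len(payload), TILE)]

    def read_window(self, x0: int, y0: int, x1: int, y1: int) -> List[List[int]]:
        """Return pixels in [x0, x1) x [y0, y1), decoding only overlapping tiles."""
        cache: Dict[TileKey, List[List[int]]] = {}
        out: List[List[int]] = []
        for y in range(y0, y1):
            line: List[int] = []
            for x in range(x0, x1):
                key = (x // TILE, y // TILE)
                if key not in cache:
                    cache[key] = self._decode(self.tiles[key])
                line.append(cache[key][y % TILE][x % TILE])
            out.append(line)
        return out


left = {(0, 0): bytes([1, 2, 3, 4])}
right = {(1, 0): bytes([5, 6, 7, 8])}
union = LazyUnion(left, right)
window = union.read_window(2, 0, 4, 2)  # touches only the right-hand tile
```

In this example the window falls entirely inside the right-hand tile, so the
left-hand tile is never decoded even though it is part of the union.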
>>>
>>>
>>> Where are the performance limitations?
>>> 1. Raster size is limited; there should be no limit.
>>> 2. Disk, memory, network and CPU utilization should be manageable at
>>> table creation by the db administrator. If disk space or I/O is at a
>>> premium, then compression may be a good option, of course trading for
>>> higher CPU utilization.
>>> 3. Reads spanning multiple tiles should not require a copy to a new raster
>>> structure and full allocation in memory.
>>>
>>> What are the wish-list items for a replacement format?
>>> 1. Configurable compression.
>>> 2. Partial reads: the ability to have a usable representation of the
>>> data without reading all the data off the disk. (Faster response to
>>> clients, feature detection and classification, value search.)
>>> 3. Union (band, tiles) without a full copy in memory.
>>> 4. Raster size should not be limited.
>>> 5. 3D (volumetric data).
>>> 6. Series support.
>>> 7. Quality, resolution and ordering of the product should be variable by
>>> the client or consuming application/function.
>>> 8. Ability to efficiently support parallel processing when that becomes
>>> available.
>>> 9. Support efficient and logical parallel loading of data.
>>> 10. Data validation all the way through to the client and back.
>>> 11. Raster versioning.
>>> 12. Application-specific data ordering.
>>>
>>>
>>> I have spent some time poking at what I considered short-comings,
>>> where I think performance is suffering, and what features are missing. I'm
>>> not ready to publicly share my list though.
>>>
>>> I wasn't involved with the original PostGIS raster binary format, but
>>> from my understanding while working with it, it appears as though the goal
>>> was to keep things simple, as all input and output goes through PostgreSQL,
>>> which can (should?) be considered a black box.
>>>
>>> -bborie
>>>
>>>
>>> _______________________________________________
>>> postgis-devel mailing list
>>> postgis-devel at lists.osgeo.org
>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>>>