[postgis-devel] Postgis Raster JPEG 2000 (openjpeg)
Bborie Park
dustymugs at gmail.com
Mon Mar 17 08:39:56 PDT 2014
Nice list. Have you looked at which of these are possible in PostgreSQL?
I have a few general comments:
1. Overview clutter. I agree with this one.
2. Unlimited raster size. This is a no-go for anything stored in-db in
PostgreSQL (see the tiling sketch after this list):
https://wiki.postgresql.org/wiki/BinaryFilesInDB
http://michael.otacoo.com/postgresql-2/playing-with-large-objects-in-postgres/
3. Lossy compression. I can't say I'm for adding complexity. I prefer
keeping things simple and doing that simple thing extremely well.
4. All data faithfully stored. Given that raster file formats themselves
can't faithfully preserve all data when converting between formats, I don't
have high hopes for this. I do think there are opportunities to provide
structures to stash metadata...
5. Partial read. Given a new serialized format, this is doable.
6. Lots of items with simultaneous/parallel keywords. This depends heavily
on what can be done in PostgreSQL. Parallel processing (reading, operations,
etc.) through threads is not recommended by the PostgreSQL developers
themselves. Parallel processes are doable through PostgreSQL's dynamic
background workers, but that API is still in active development by the
PostgreSQL developers...
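
Regarding item 2 above, a minimal sketch of how the size ceiling is worked
around today (the table and column names are hypothetical): keep tiles small
so every row stays far below PostgreSQL's 1 GB per-field limit and TOASTs
cleanly.

    -- split an imported scene into 256x256 tiles, one row per tile
    CREATE TABLE scene_tiles AS
    SELECT ST_Tile(rast, 256, 256) AS rast
    FROM scene_staging;

    -- functional spatial index so tile-level access stays cheap
    CREATE INDEX ON scene_tiles USING gist (ST_ConvexHull(rast));
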
Basically, it sounds like you want the kitchen sink in this proposed
format. I wonder if that is the right approach/philosophy.
Actually, I guess a more appropriate question would be: would you want what
you've listed in an in-db data store? If so, you really should start digging
into what PostgreSQL can do and thin out your list.
Given out-db support, we can add additional tools to take advantage of what
is available in the out-db file formats.
I suppose I should collect my thoughts on what I find lacking in the
current format and propose a viable replacement format.
-bborie
On Mon, Mar 17, 2014 at 8:05 AM, Nathaniel Clay <clay.nathaniel at gmail.com> wrote:
> Hi all,
>
> Sorry for the long wait for a reply; I fell ill when starting to write
> this email. Here is my response.
>
>
>> How does OpenJPEG 2000 compare to other formats, such as GeoTIFF, HDF5 or
>> NetCDF?
>>
> All of the mentioned formats use zlib compression, a lossless compression
> scheme, with the exception that GTiff also offers plain JPEG (DCT)
> compression along with a fax-style (CCITT) scheme.
> I think it may be a good idea to also offer zlib (deflate) and/or LZMA
> compression. -- Paul Ramsey pointed out that compression is already applied
> automatically by PgSQL (TOAST).
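>
> A minimal sketch of where that automatic compression lives (the table and
> column names are hypothetical): large values are TOASTed, and a
> compression-capable storage mode compresses them before moving them out of
> line.
>
>     -- EXTENDED storage = compress, then store out of line if still large
>     ALTER TABLE my_rasters ALTER COLUMN rast SET STORAGE EXTENDED;
>     -- compare on-disk (possibly compressed) size with the serialized size
>     SELECT pg_column_size(rast) AS on_disk,
>            octet_length(ST_AsBinary(rast)) AS serialized
>     FROM my_rasters LIMIT 1;
>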
> JPEG 2000 uses wavelet compression, allowing for resolution progression,
> quality progression, and other progression orderings.
>
> Killer Features:
> 1. Compression
> 2. A one-to-one relation of tiles from one resolution to the next; the
> ability to turn overviews into views.
> 3. Union and clip via sort and append (serialization != encode and decode).
> 4. In-format support for collections and series, with GML to support
> rendering.
> 5. In-format support for GML, enabling mixed queries of geometry and
> raster.
> 6. Client/server operations are well defined and part of the JP2K
> specification, which even suggests DB storage of the code stream.
> 7. Partial (quality) reads supporting error-tolerant scientific
> calculations.
> 8. 3D (volumetric data)
>
> Before I move on to the next questions, here is a little background on my
> thought process.
> I see four different use types:
> 1. Active Server Tables
> * Tables that are actively processed on the server side, e.g.
> MapAlgebra, Clip, and the like, responding to the client with processed data.
> 2. Client/Server Tables
> * The classic database model: the client requests and processes the
> data, and the server, at the client's request, frequently inserts/updates
> individual cell data.
> 3. Client Tables
> * These tables are updated by the client and rarely processed
> server side. Compressed data may be sent back for insert/update.
> 4. Archival Tables
> * These tables may require end-to-end verification and/or
> validation of data. Inserts and updates to these tables may also be
> forcefully versioned.
>
> 1. Should not be compressed, as regular server-side access and processing
> will take place.
> 2. The DB administrator should have the option of dynamic compression,
> whereby "active" rows are left decompressed/cached and are compressed and
> moved to an inactive, compressed state in the background, at the server's
> convenience and scheduling. This would require background workers.
> Progressive reads would be restricted to inactive rows.
> 3. The DB administrator should have an option for the client and server to
> deal only with compressed data: it is compressed on disk and is sent to
> the client compressed, and any updates/inserts are compressed by the client
> before being sent to the server. Any server-side operation such as Clip
> with trim would always incur a decompression on the server in this mode.
> Progressive reads would be unrestricted.
>
>
> Before deciding on a replacement format, we should identify whether a
> replacement is required.
> What are the current shortcomings?
> 1. Lack of high-ratio compression, both lossless and lossy.
> 2. Overview table clutter. *This is an annoyance of mine; I would like to
> investigate ways of merging the overviews and the main table into one, or
> cleaning it up in some way (possibly views).
> 3. A simple union or clip should not incur a copy into a new raster
> structure; e.g., assembly of the union or clip should be shifted to the
> client or consuming server-side function, allowing the consuming
> function/application/client to make its own optimizations. MapAlgebra,
> for example, would only and transparently union/decode the tiles that its
> current operation spans, while the union operator would only organize the
> data into one coherent, valid flow. In a geometry clip, only the tiles to
> be trimmed would be decoded and replaced in the stream. Yes, I realize that
> you get that behaviour if you clip then union, but a union then clip should
> behave the same.
> 4. Partial and progressive reads are not possible. (The ability to quickly
> and memory-efficiently "peek" into a raster, choosing both resolution and
> quality, and identify regions of interest, either by a client or an
> optimized function, cancelling or not performing reads in unneeded regions.
> This is partially provided by overviews currently.)
> 5. Raster size should not be limited.
> 6. Overviews are not automatically updated. Views? (See the sketch after
> this list.)
> 7. Being able to validate and correct raster data all the way through to
> the client (GDAL) and back on writes. Make our edits bulletproof.
> 8. Easily, opaquely, and efficiently store and access rasters that must be
> versioned, and provide markup to support client rendering.
> 9. Easily, opaquely, and efficiently store and access rasters of a series,
> and provide markup to support client rendering.
> 10. Mosaic rasters and raster collections. Provide simultaneous access to
> both the contiguously tiled product and the original shards, and store them
> efficiently (compression and dedup on shards).
> 11. All data from imported rasters should be faithfully preserved,
> including color tables, color profiles, and other metadata. Where and
> how the data is stored and used internally by Postgres is another matter.
> 12. Associated raster table data should transparently (in format) be
> available to the client, including geometry.
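>
> On points 2 and 6, a minimal sketch of the view idea (the names are
> hypothetical): an overview computed on the fly by a view can never go
> stale, at the cost of rescaling at query time.
>
>     -- hypothetical factor-4 overview; rescales each tile in place rather
>     -- than re-tiling the whole coverage, but it never needs updating
>     CREATE VIEW o_4_my_rasters AS
>     SELECT rid,
>            ST_Rescale(rast, ST_ScaleX(rast) * 4, ST_ScaleY(rast) * 4) AS rast
>     FROM my_rasters;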
>
>
> Where are the performance limitations?
> 1. Raster size is limited; there should be no limit.
> 2. Disk, memory, network, and CPU utilization should be manageable at table
> creation by the DB administrator. If disk space or I/O is at a premium,
> then compression may be a good option, of course trading for higher CPU
> utilization.
> 3. A read spanning multiple tiles should not require a copy into a new
> raster structure and full allocation in memory (see the sketch below).
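>
> To illustrate point 3, this is roughly what the pattern looks like today
> (the table names are hypothetical): every tile touching the area of
> interest is decoded, clipped, and then materialized into one new in-memory
> raster by the union.
>
>     -- hypothetical tables: scene_tiles (raster) and aoi (geometry)
>     SELECT ST_Union(ST_Clip(t.rast, a.geom)) AS rast
>     FROM scene_tiles t
>     JOIN aoi a ON ST_Intersects(t.rast, a.geom);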
>
> What are the wish-list items for a replacement format?
> 1. Configurable compression.
> 2. Partial reads: the ability to have a usable representation of the data
> without reading all the data off the disk (faster responses to clients,
> feature detection and classification, value search). See the sketch after
> this list.
> 3. Union (bands, tiles) without a full copy in memory.
> 4. Raster size should not be limited.
> 5. 3D (volumetric data)
> 6. Series support.
> 7. Quality, resolution, and ordering of the product should be variable by
> the client or consuming application/function.
> 8. Ability to efficiently support parallel processing when that becomes
> available.
> 9. Support efficient and logical parallel loading of data.
> 10. Data validation all the way through to the client and back.
> 11. Raster versioning.
> 12. Application specific data ordering.
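>
> On wish 2, a minimal sketch of the building block PostgreSQL already
> offers (the table and column names are hypothetical): an uncompressed,
> out-of-line bytea can be sliced without fetching the whole value, which is
> what a partial/progressive read path could lean on.
>
>     -- store encoded tile data out of line and uncompressed
>     ALTER TABLE jp2_tiles ALTER COLUMN tile_data SET STORAGE EXTERNAL;
>     -- substr() on an EXTERNAL value fetches only the TOAST chunks it needs,
>     -- e.g. just the first 8 KB (header and lowest-resolution packets)
>     SELECT substr(tile_data, 1, 8192) FROM jp2_tiles WHERE tile_id = 42;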
>
>
>> I have spent some time poking at what I considered short-comings, where
>> I think performance is suffering, and what features are missing. I'm not
>> ready to publicly share my list though.
>>
>> I wasn't involved with the original PostGIS raster binary format, but from
>> my understanding while working with it, it appears as though the goal was
>> to keep things simple, as all input and output goes through PostgreSQL,
>> which can (should?) be considered a black box.
>>
>> -bborie
>
>