<div dir="ltr">Nice list. Have you looked at which of these are possible in PostgreSQL?<div><br></div><div>I have a few general comments:</div><div><br></div><div>1. Overview clutter. I agree with this one.</div><div>2. Unlimited raster size. This is a no-go for anything stored in PostgreSQL.</div>
<div><br></div><div><a href="https://wiki.postgresql.org/wiki/BinaryFilesInDB">https://wiki.postgresql.org/wiki/BinaryFilesInDB</a></div><div><a href="http://michael.otacoo.com/postgresql-2/playing-with-large-objects-in-postgres/">http://michael.otacoo.com/postgresql-2/playing-with-large-objects-in-postgres/</a><br>
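For context on why "unlimited" is a no-go: PostgreSQL caps a single field value (e.g. a bytea) at 1 GB, so an unlimited raster would have to be split across rows or large objects, as the pages above discuss. A minimal, hypothetical Python sketch of that kind of chunking (the chunk size here is tiny purely for illustration, not a PostgreSQL constant):

```python
# Sketch: PostgreSQL limits a single varlena value (e.g. bytea) to 1 GB,
# so an "unlimited" raster would have to be split across rows.
# CHUNK_SIZE is illustrative only; a real store might use tens of MB.

CHUNK_SIZE = 4  # bytes, illustrative

def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (sequence_number, chunk) pairs, one per hypothetical row."""
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        yield seq, payload[start:start + chunk_size]

def reassemble(chunks):
    """Rebuild the original payload from (seq, chunk) rows in any order."""
    return b"".join(chunk for _, chunk in sorted(chunks))

data = b"0123456789"
rows = list(split_into_chunks(data))
assert reassemble(reversed(rows)) == data
```

This sidesteps the per-value limit but pushes bookkeeping (ordering, partial fetches) into the schema, which is part of why I see it as a no-go for simple in-db storage.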
</div><div><br></div><div>3. Lossy compression. I can't say I'm for adding complexity. I prefer keeping things simple and doing that simple thing extremely well.</div><div><br></div><div>4. All data faithfully stored. Given that raster file formats themselves can't faithfully keep all data when converting between formats, I don't have high hopes for this. I do think there are opportunities to provide structures to stash metadata...</div>
<div><br></div><div>5. Partial read. Given a new serialized format, this is doable.</div><div><br></div><div>6. Lots of items with simultaneous/parallel keywords. This depends heavily upon what is possible in PostgreSQL. Parallel processing (reading, operations, etc.) through threads is not recommended by the PostgreSQL developers themselves. Parallelism through processes is doable via PostgreSQL's dynamic background workers, but that API is still in active development by the PostgreSQL developers...</div>
<div><br></div><div>Basically, it sounds like you want the kitchen sink in this proposed format. I wonder if that is the right approach/philosophy.</div><div><br></div><div>Actually, I guess a more appropriate question would be: would you want what you've listed in an in-db data store? If so, you really should start digging into what PostgreSQL can do and thin out your list.</div>
<div><br></div><div>Given out-db support, we can add additional tools to take advantage of what is available in the out-db file formats.</div><div><br></div><div>I suppose I should collect my thoughts on what I find lacking in the current format and propose a viable replacement format.<br>
<div><br></div><div>-bborie</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Mar 17, 2014 at 8:05 AM, Nathaniel Clay <span dir="ltr"><<a href="mailto:clay.nathaniel@gmail.com" target="_blank">clay.nathaniel@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi all,</div><div><br></div><div>Sorry for the long wait for a reply; I fell ill while starting to write this email. Here is my response.</div>
<div class=""><div>
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">How does OpenJPEG 2000 compare to other formats, such as GeoTIFF, HDF5 or NetCDF?</div>
</blockquote></div></div><div>
All of the mentioned formats use libz compression, a lossless compression scheme, with the exception that GTiff also offers plain JPEG (DCT) compression along with a fax-style format, if I recall correctly.<br></div><div>I think it may be a good idea to also offer libz (deflate) and/or LZMA compression. -- Paul Ramsey pointed out that libz compression is applied automatically by PgSQL. <br>
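For a rough sense of what offering both codecs would look like, here is a small sketch using Python's zlib (deflate) and lzma modules as stand-ins for whatever the format would link against; the synthetic band data is purely illustrative:

```python
import lzma
import zlib

# Highly repetitive synthetic data stands in for a raster band
# with large uniform areas (nodata regions, flat terrain, etc.).
band = bytes([0]) * 50_000 + bytes(range(256)) * 50

deflated = zlib.compress(band, level=9)  # libz / deflate
xz = lzma.compress(band, preset=6)       # LZMA

# Both are lossless: decompressing restores the band exactly.
assert zlib.decompress(deflated) == band
assert lzma.decompress(xz) == band

# Both shrink this kind of input dramatically.
assert len(deflated) < len(band)
assert len(xz) < len(band)
```

LZMA typically trades slower compression for better ratios than deflate, which is why having both configurable per table could be worthwhile.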
</div><div>JPEG 2000 uses wavelet compression, allowing for resolution progression, quality progression, and other progression orderings.<br>
</div><div><br></div><div>Killer Features:<br></div><div>1. Compression<br></div><div>2. A 1-to-1 relation of tiles from one resolution to the next; the ability to turn overviews into views.<br></div><div>3. Union and Clip via sort and append. (Serialization != encode and decode)<br>
</div><div>4. In-format support for collections and series, with GML to support rendering.<br></div><div>5. In-format support for GML, to support mixed queries of geometry and raster.</div><div>6. Client/server operations are well defined and part of the JP2K specification, which even suggests DB storage of the code stream.</div>
<div>7. Partial (quality) reads supporting error-tolerant scientific calculations.</div><div>8. 3D (volumetric data)</div><div class=""><div><br>Before I move on to the next questions, a little background on my thought process.<br>
</div>
</div><div>I see four different use types:<br></div><div>
1. Active Server Tables<br><div><div>
 * Tables that are actively processed on the server side, e.g. MapAlgebra, Clip and the like, responding to the client with processed data.<br></div>
<div>2. Client/Server Tables<br></div><div> * Classic database model: the client requests data and processes it, and at the client's request the server inserts/updates individual cell data frequently.<br>
</div><div>3. Client Tables<br></div><div> * These tables are updated by the client and rarely processed server-side. Compressed data may be sent back for insert/update.<br></div><div>4. Archival Tables<br>
</div></div><div> * These tables may require end-to-end verification and/or validation of data. Inserts and updates to these tables may also be forcibly versioned.<br><br></div><div>1. Should not be compressed, as regular server-side access and processing will take place.<br>
</div><div>2. The db administrator should have an option to enable dynamic compression, whereby "active" rows are left decompressed/cached and are compressed and moved to an inactive compressed state in the background at the server's convenience and scheduling. This would require background workers. Progressive reads would be restricted to inactive rows.<br>
</div><div>3. The db administrator should have an option for the client and server to deal only with compressed data: it sits on disk compressed and is sent to the client compressed, and any updates/inserts are compressed by the client before being sent to the server. Any server-side operation, such as Clip with trim, would always incur a decompression on the server in this mode. Progressive reads would be unrestricted.<br>
<br>
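To make use type 2 concrete, here is a toy model of the hot/cold behaviour described above, with Python's zlib standing in for whatever codec the format would use. The TileStore class, its fields, and the idle threshold are all hypothetical:

```python
import time
import zlib

class TileStore:
    """Toy model of use type 2: hot tiles stay decompressed and cached,
    while a background sweep compresses tiles not touched recently."""

    def __init__(self, idle_seconds=0.0):
        self.idle_seconds = idle_seconds
        # tile id -> {"data": bytes, "compressed": bool, "last": float}
        self.tiles = {}

    def put(self, tile_id, data):
        self.tiles[tile_id] = {"data": data, "compressed": False,
                               "last": time.monotonic()}

    def get(self, tile_id):
        t = self.tiles[tile_id]
        if t["compressed"]:  # transparently decompress on access
            t["data"] = zlib.decompress(t["data"])
            t["compressed"] = False
        t["last"] = time.monotonic()
        return t["data"]

    def background_sweep(self):
        """What a background worker would do at the server's convenience."""
        now = time.monotonic()
        for t in self.tiles.values():
            if not t["compressed"] and now - t["last"] >= self.idle_seconds:
                t["data"] = zlib.compress(t["data"])
                t["compressed"] = True

store = TileStore()
store.put("t1", b"\x00" * 1024)
store.background_sweep()          # idle threshold is 0, so t1 goes cold
assert store.tiles["t1"]["compressed"]
assert store.get("t1") == b"\x00" * 1024  # access rehydrates it
```

In PostgreSQL this sweep would live in a dynamic background worker rather than client code, which is exactly the still-maturing API mentioned earlier in the thread.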
</div></div>
<div dir="ltr"><div class=""><br><div>Before deciding on a replacement format, we should identify whether a replacement is required.<br>What are the current short-comings?<br></div></div><div>1. Lack of high-ratio compression, both lossless and lossy.<br></div><div>2. Overview table clutter. This is an annoyance of mine; I would like to investigate ways of merging the overviews and the main table into one, or cleaning it up in some way (possibly views).<br>
</div><div>3. A simple Union or Clip should not incur a copy into a new raster structure; assembly of the union or clip should be shifted to the client or to the consuming server-side function, allowing the consuming function/application/client to make its own optimizations. MapAlgebra, for example, would transparently union/decode only the tiles its current operation spans, while the union operator would only organize the data into one coherent, valid flow. In a geometry clip, only the tiles to be trimmed would be decoded and replaced in the stream. Yes, I realize that you get that behaviour if you clip then union, but a union then clip should behave the same.<br>
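A sketch of this union-without-copy idea: treat each raster as a sorted stream of tiles and merge lazily, so no combined raster is ever materialized and the consumer decodes only what it iterates over. The (row, col, payload) tile representation is hypothetical:

```python
import heapq

# A "raster" here is just an iterable of (row, col, payload) tiles,
# sorted by (row, col). Union = a k-way merge of tile streams.

def lazy_union(*rasters):
    """Yield tiles from all rasters in (row, col) order without building
    a combined raster. Later rasters win on coordinate collisions."""
    merged = heapq.merge(*rasters, key=lambda t: (t[0], t[1]))
    prev = None
    for tile in merged:
        if prev is not None and prev[:2] == tile[:2]:
            prev = tile      # overlapping tile: keep the later raster's
            continue
        if prev is not None:
            yield prev
        prev = tile
    if prev is not None:
        yield prev

a = [(0, 0, "a00"), (0, 1, "a01")]
b = [(0, 1, "b01"), (1, 0, "b10")]
assert list(lazy_union(a, b)) == [(0, 0, "a00"), (0, 1, "b01"), (1, 0, "b10")]
```

Because the result is itself a sorted tile stream, a clip or MapAlgebra consumer can sit directly on top of it and decode only the tiles it actually touches.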
</div><div>4. Partial and progressive reads are not possible. (The ability to quickly and memory-efficiently "peek" into a raster, choosing both resolution and quality, and identify regions of interest, either by a client or by an optimized function, cancelling or skipping reads in unneeded regions. This is partially provided by overviews currently.)<br>
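The tile-selection half of such a peek can be sketched as follows; the tile size is illustrative and the function name is hypothetical:

```python
# Sketch of a partial read: given a tiled raster, compute which tiles
# intersect a region of interest so only those are read and decoded.

TILE = 256  # pixels per tile edge, illustrative

def tiles_for_roi(x0, y0, x1, y1, tile=TILE):
    """Return the (col, row) tile indices covering the pixel box
    [x0, x1) x [y0, y1). Only these tiles need to be fetched."""
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return [(c, r) for r in rows for c in cols]

# A window aligned inside one tile touches a single tile...
assert tiles_for_roi(0, 0, 256, 256) == [(0, 0)]
# ...while one straddling a tile boundary touches four.
assert tiles_for_roi(200, 200, 300, 300) == [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The resolution and quality dimensions of a progressive read would layer on top of this, choosing which overview level and how much of each tile's code stream to fetch.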
</div><div>5. Raster size should not be limited.<br></div><div>6. Overviews are not automatically updated (views?).<br></div><div>7. Being able to validate and correct raster data all the way through to the client (GDAL) and on writes back. Our edits should be bulletproof.<br>
</div><div>8. Easily, opaquely, and efficiently store and access rasters that must be versioned, and provide markup to support client rendering.<br></div><div>9. Easily, opaquely, and efficiently store and access rasters of a series, and provide markup to support client rendering.<br>
</div><div>10. Mosaic rasters and raster collections. Provide simultaneous access to both the contiguously tiled product and the original shards, and store them efficiently (compression and dedup on shards).<br></div><div>
11. All data from imported rasters should be faithfully preserved, including color tables, color profiles, and other metadata. Where and how the data is stored and used internally by Postgres is another matter.<br></div>
<div>12. Associated raster table data should transparently (in format) be available to the client, including geometry. <br></div><div class=""><div><br></div><div>
<br></div><div>Where are the performance limitations?<br></div></div><div>1. Raster size is limited; there should be no limit.<br></div><div>2. Disk, memory, network, and CPU utilization should be manageable by the db administrator at table creation. If disk space or I/O is at a premium, then compression may be a good option, of course trading it for higher CPU utilization.<br>
</div><div>3. A read spanning multiple tiles should not require a copy into a new raster structure and full allocation in memory.<br></div><div class=""><div><br>What are the wish-list items for a replacement format?<br></div></div><div>1. Configurable compression.<br></div><div>2. Partial reads: the ability to have a usable representation of the data without reading all the data off the disk. (Faster response to clients, feature detection and classification, value search.)<br>
</div><div>3. Union (band, tiles) without a full copy in memory.<br></div><div>4. Raster size should not be limited.<br></div><div>5. 3D (volumetric data).<br></div><div>6. Series support.<br></div><div>7. Quality, resolution, and ordering of the product should be variable by the client or consuming application/function.<br>
</div><div>8. Ability to efficiently support parallel processing when that becomes available.<br></div><div>9. Support efficient and logical parallel loading of data.<br></div><div>10. Data validation all the way through to the client and back.<br>
</div><div>11. Raster versioning.<br></div><div>12. Application specific data ordering.<br></div><div class=""><div><br>
</div>
</div><div><br></div><div>I have spent some time poking at what I considered short-comings, where I think performance is suffering, and what features are missing. I'm not ready to publicly share my list, though.</div><div><br></div><div>I wasn't involved with the original PostGIS raster binary format, but from my understanding while working with it, it appears as though the goal was to keep things simple, as all input and output goes through PostgreSQL, which can (should?) be considered a black box.</div>
<div><br></div><div>-bborie</div></div></div><br></div>
<br>_______________________________________________<br>
postgis-devel mailing list<br>
<a href="mailto:postgis-devel@lists.osgeo.org">postgis-devel@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel</a><br></blockquote></div><br></div>