[gdal-dev] GDAL, vsis3 and vsisubfile

Mike Pfaffenberger mike.pfaffenberger at gmail.com
Mon Jul 24 19:41:05 PDT 2017


Hi Even,

I ran the script you linked, and your hypothesis is absolutely correct.

<JP2KCodeStream filename="/vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf">
.
.
.
<Field name="SGcod_Progress" type="uint8" description="RLCP">1</Field>
<Field name="SGcod_NumLayers" type="uint16">19</Field>
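
For reference, the filename in that output follows the /vsisubfile/ pattern of offset_size,path. As Even points out in the quoted message below, the second value is meant to be a length rather than an end offset. A minimal sketch of building the name that way, assuming 901949970 really is the end offset of the codestream (the helper name vsisubfile_path is hypothetical, not part of GDAL):

```python
def vsisubfile_path(start_offset, end_offset, inner_path):
    """Build a /vsisubfile/ name from a start and an end offset.

    The second number in the name is a size in bytes, so it is
    computed as end_offset - start_offset.
    """
    if end_offset <= start_offset:
        raise ValueError("end_offset must be greater than start_offset")
    size = end_offset - start_offset
    return f"/vsisubfile/{start_offset}_{size},{inner_path}"


# Values taken from this thread; the end offset is an assumption.
path = vsisubfile_path(4038, 901949970, "/vsis3/glitch253/test2.ntf")
print(path)
```

The resulting string can then be passed to gdalinfo, gdal_translate or dump_jp2.py in place of the original name.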

I also added a quick printf in the vsi subfile read function which prints
the nSize and nCount variables. Running the python script you linked me
triggered the vsi subfile read function 75,868 times, mostly with small
sizes, and nCount=1.

Doing the same thing on my gdal_translate -srcwin 000 000 1000 1000
triggered vsi subfile read 9,024 times, almost all with nSize=1 and
nCount=1024. If the vsisubfile object is wrapping a vsis3 dataset, then
does each vsi subfile read turn into an HTTP request? That would certainly
explain the extremely long time to crop my window.
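
To get a feel for how small scattered reads interact with the chunk-by-chunk downloading Even describes in the quoted message, here is a rough model. It is only a sketch: the 16 KB chunk size and the evenly spaced read offsets are assumptions made for illustration, not measurements from my file, and it does not model how GDAL groups chunks into HTTP requests.

```python
CHUNK_SIZE = 16384  # assumed download chunk size in bytes (illustrative)


def chunks_touched(reads, chunk_size=CHUNK_SIZE):
    """Return the set of chunk indices covered by (offset, length) reads."""
    touched = set()
    for offset, length in reads:
        if length <= 0:
            continue
        first = offset // chunk_size
        last = (offset + length - 1) // chunk_size
        touched.update(range(first, last + 1))
    return touched


def scattered_reads(n_reads, read_len, stride):
    """Hypothetical access pattern: one read of read_len bytes every stride bytes."""
    return [(i * stride, read_len) for i in range(n_reads)]


# 9,024 reads of 1,024 bytes each (nSize=1, nCount=1024), as counted above.
reads = scattered_reads(n_reads=9024, read_len=1024, stride=CHUNK_SIZE)
requested = sum(length for _, length in reads)
downloaded = len(chunks_touched(reads)) * CHUNK_SIZE
print(f"requested {requested} bytes, fetched {downloaded} bytes")
```

With these assumptions every read lands in a different chunk, so the bytes fetched are a fixed multiple of the bytes actually requested. If the reads were contiguous instead, most of them would fall in chunks that are already cached.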

Just out of curiosity I ran the python script you linked on my JP2 file
(same image as the NITF, I just ran gdal_translate on it).

This one appears to have the progression order LRCP, with only one layer:

<Field name="SGcod_Progress" type="uint8" description="LRCP">0</Field>
<Field name="SGcod_NumLayers" type="uint16">1</Field>

I'm guessing the fact that my JP2 file only has one layer is the reason
vsis3 works well with it, regardless of it being LRCP (not optimal for
windowed reads).
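
In case it helps anyone else comparing files, the two fields can be pulled out of the dump_jp2.py output with a few lines of Python. This is a sketch, not part of GDAL: the function name summarize_cod and the "expected slow" set are my own, the latter taken from Even's expectation in the quoted message. The numeric codes 0 to 4 are the progression order values from the JPEG 2000 COD marker.

```python
import xml.etree.ElementTree as ET

# Progression order codes as stored in SGcod_Progress.
PROGRESSION_ORDERS = {0: "LRCP", 1: "RLCP", 2: "RPCL", 3: "PCRL", 4: "CPRL"}

# Orders expected to cause issues for windowed remote reads (per the quoted message).
EXPECTED_SLOW = {"LRCP", "RLCP", "RPCL"}


def summarize_cod(xml_fragment):
    """Return (order, num_layers, expected_slow) from dump_jp2.py <Field> lines."""
    # Wrap the fragment so that several sibling <Field> elements parse as one document.
    root = ET.fromstring(f"<root>{xml_fragment}</root>")
    fields = {f.get("name"): f.text for f in root.iter("Field")}
    order = PROGRESSION_ORDERS[int(fields["SGcod_Progress"])]
    layers = int(fields["SGcod_NumLayers"])
    return order, layers, order in EXPECTED_SLOW


nitf_fields = """
<Field name="SGcod_Progress" type="uint8" description="RLCP">1</Field>
<Field name="SGcod_NumLayers" type="uint16">19</Field>
"""
jp2_fields = """
<Field name="SGcod_Progress" type="uint8" description="LRCP">0</Field>
<Field name="SGcod_NumLayers" type="uint16">1</Field>
"""
print(summarize_cod(nitf_fields))
print(summarize_cod(jp2_fields))
```

Note that the progression order alone flags both files as "expected slow", so the layer count has to be looked at alongside it, which matches what I am seeing.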

Anyway, thanks. I learned some more about JPEG2K here. Unfortunately I
think I'm pretty out of luck on the prospect of doing remote windowed reads
quickly on this data. However, I'm very open to suggestions if anyone has
any ideas on how it might work.

Cheers.

On Mon, Jul 24, 2017 at 11:21 AM, Even Rouault <even.rouault at spatialys.com>
wrote:

> Mike,
>
>
>
> (note to other readers: this is the continuation of the thread
> [gdal-dev] VSIS3 on digital globe multiview-stereo (NITF))
>
>
>
> > I turned on some debug options that shed some light on what's going on.
> > It appears that the NITF driver must internally open a JPEG 2000 driver on
> > a virtual subfile. In my case, that virtual subfile starts at offset 4038
> > and continues to the end of the file, offset 901949970.
> >
> > While this is a nice way of providing a JPEG2000 decompression routine to
> > the NITF driver, when accessing a remote dataset, it causes the entire file
> > to be downloaded even when reading a small window.
> >
> > I used gdal_translate locally on my NITF file and turned it into a JP2
> > file, then I uploaded this file to S3 and ran my gdal_translate -srcwin 000
> > 000 1000 1000 /vsis3/mybucket/jp2file.JP2 local_file.tiff and it ran
> > instantly. Is there a way to completely bypass using the NITF driver and
> > simply open the NITF file with the JP2 driver wrapped up with vsis3?
>
>
>
> Yes, you should be able to open the following filename, but this is
> actually what the NITF driver does:
>
> /vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf (you may need to
> adjust the second value '901949970' to be 901949970-4038, since it is
> supposed to be a length and not an offset)
>
> This should be recognized by one of the JPEG2000 drivers, and you should
> likely get the same performance characteristics as using it through the
> NITF driver (or the NITF driver does something that requires reading the
> whole file, but I don't think so).
>
>
>
> My hypothesis is that the root cause of the performance issue is the
> progression order of the JPEG2000 codestream of this NITF file, which causes
> most of the file to be read through. Likely only X % of bytes are really
> read, but as they are scattered throughout the whole file, given the chunk
> by chunk downloading logic of /vsis3, you end up reading the whole file in
> practice.
>
> For example I'd expect LRCP (Layer-Resolution-Component-Precincts), RLCP
> and RPCL to cause issues, whereas PCRL and CPRL should perform better for
> windowed requests.
>
>
>
> http://www.gwg.nga.mil/ntb/baseline/docs/bpj2k01/ISOJ2K_profile.pdf
> recommends using LRCP with 19-20 quality layers, so that would indeed cause
> a lot of seeking through the file. You can check the progression order in
> the output of the following (check for "SGcod_Progress"):
>
>
>
> python dump_jp2.py /vsisubfile/4038_901949970,/vsis3/glitch253/test2.ntf
>
>
>
> where dump_jp2.py is
>
> https://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/dump_jp2.py
>
>
>
> It is likely that your translating into JP2 turned the original codestream
> into one with a progression order that is more seeking friendly (the
> default progression order may be different depending on drivers).
>
>
>
> Even
>
>
>
> --
>
> Spatialys - Geospatial professional services
>
> http://www.spatialys.com
>