[gdal-dev] Info about technical details of loading massive data

Thu Feb 11 03:05:59 PST 2021

On Thu, 11 Feb 2021, Richard Duivenvoorde wrote:

> Hi Dev's,
>
> I had a discussion with a friend about the sometimes hard times a
> GIS-person has when handling/loading/viewing (using QGIS/GDAL)
> massive (vector/raster) datasets, versus the R/Data-mangling
> community.
>
> Ending with a conclusion that it seemed (to us) that data-scientists
> try to load as much (clean objects/multi dimensional arrays) data in
> memory as possible, while GIS peeps always take the 'let's make it
> some kind of feature object first, and do lazy loading'
> approach.
>
> BUT I'm not sure about this, so: is there maybe somebody who held a
> presentation or wrote a paper on how, for example gdal, handles a
> huge point file vs R (memory/disk/io wise)?
>
> While historically the 'Simple Features'-paradigm has been VERY
> valuable for us, I'm wondering whether there could be some 'more
> efficient' way of handling the ever-growing datasets we have
> to handle... I envision a super fast memory-data viewer or so, so I
> can quickly view my 16 Million points in my Postgis DB easily (mmm
> probably have to fork QGIS 0.1 for this... QGIS started off as a
> 'simple' postgis viewer :-) )

My experience is limited to file-based data, and machines
have grown to the point where those files will fit in memory.

I have written a couple of GDAL drivers (not yet released)
for raster file formats which seem designed for memory-mapped
read access. Although functions like VSIFReadL support
reading from memory-based files, I have not found a way to
use memory-mapping in a driver.
This makes me wonder whether I end up with three copies of the map in
memory, in addition to whatever is needed for the screen display:
one in the Linux file-system cache, one in the driver,
and one (or perhaps two) in the GDAL library and QGIS?
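
As a rough illustration of the memory-mapping idea (this is plain
Python, not GDAL driver code), the sketch below maps a headerless
raster file read-only and pulls out one window of pixels. The file
layout (unsigned 16-bit pixels in native byte order, row-major, no
header), the raster size, and the function names are all assumptions
made up for the example. With a read-only file mapping the process
reads directly from the pages held in the file-system cache, so the
only extra copy made here is the small window that is returned.

```python
import mmap
import struct

WIDTH, HEIGHT = 8, 4         # hypothetical raster size in pixels
PIXEL = struct.Struct("H")   # assumed layout: unsigned 16-bit, native byte order


def write_demo_raster(path):
    """Write a tiny headerless raster whose pixel value is row * 100 + col."""
    with open(path, "wb") as f:
        for row in range(HEIGHT):
            for col in range(WIDTH):
                f.write(PIXEL.pack(row * 100 + col))


def read_window(path, row0, col0, nrows, ncols):
    """Return nrows x ncols pixels starting at (row0, col0) as nested lists."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm, \
            memoryview(mm) as raw, \
            raw.cast("H") as pixels:   # typed view over the mapping, no copy
        window = []
        for r in range(nrows):
            start = (row0 + r) * WIDTH + col0
            # Only this slice is copied out of the mapped pages.
            window.append(pixels[start:start + ncols].tolist())
        return window
```

Calling write_demo_raster("demo.raw") and then
read_window("demo.raw", 1, 2, 2, 3) reads six pixels without ever
loading the rest of the file into a buffer owned by the process.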

I haven't looked, and perhaps should, to see whether QGIS
reads a map once into whatever format it finds best (possibly compressed),
keeps each map open and reads areas as needed, or repeatedly opens, reads
and closes each map.
Without knowing that, it isn't clear which map decompressions and
memory-to-memory copies are necessary.
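
To make the three possibilities concrete, here is a minimal sketch
using an ordinary file as a stand-in for a map. It says nothing about
what QGIS or GDAL actually do; the function and class names are
hypothetical. The three strategies return the same bytes and differ
only in how often the file is opened and how much is held in memory.

```python
def read_once(path):
    """Strategy 1: read the whole file into memory a single time."""
    with open(path, "rb") as f:
        return f.read()


class KeepOpenReader:
    """Strategy 2: open once, then seek and read only what is requested."""

    def __init__(self, path):
        self._f = open(path, "rb")

    def read(self, offset, size):
        self._f.seek(offset)
        return self._f.read(size)

    def close(self):
        self._f.close()


def reopen_and_read(path, offset, size):
    """Strategy 3: open, read and close the file for every request."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```

Strategy 1 pays the full memory cost up front, strategy 2 holds a file
handle but copies only the requested areas, and strategy 3 repeats the
open and close work on every request.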

-- 
Andrew C. Aitchison					Kendal, UK
 			andrew at aitchison.me.uk