[gdal-dev] Info about technical details of loading massive data

Richard Duivenvoorde rdmailings at duif.net
Thu Feb 11 00:55:22 PST 2021


Hi Dev's,

I had a discussion with a friend about the hard time a GIS person sometimes has when handling/loading/viewing massive (vector/raster) datasets (using QGIS/GDAL), compared to the R/data-mangling community.

We ended with the conclusion that it seems (to us) that data scientists try to load as much data into memory as possible (as clean objects/multi-dimensional arrays), while GIS people tend to take the 'let's first make some kind of feature object out of it, and do lazy loading' approach (see the sketch below).
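To make the contrast concrete, here is a minimal sketch (Python, using the osgeo.ogr bindings and GeoPandas; the file name "points.gpkg" is just a placeholder) of the two access patterns as I understand them: the feature-by-feature lazy iteration that GIS tooling typically does, versus pulling everything into in-memory arrays the way a data scientist might.

    from osgeo import ogr
    import geopandas as gpd

    # "GIS style": lazy, feature-by-feature iteration; only one feature
    # object lives in memory at a time, at the cost of per-feature overhead.
    ds = ogr.Open("points.gpkg")        # placeholder dataset
    layer = ds.GetLayer(0)
    for feature in layer:               # features are fetched lazily
        geom = feature.GetGeometryRef()
        x, y = geom.GetX(), geom.GetY() # do something per point

    # "data-science style": load the whole table into memory at once
    # as columnar structures (a GeoDataFrame backed by arrays).
    gdf = gpd.read_file("points.gpkg")  # everything materialised in RAM
    print(gdf.geometry.x.mean(), gdf.geometry.y.mean())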

BUT I'm not sure about this, so: has somebody perhaps given a presentation or written a paper on how, for example, GDAL handles a huge point file versus R (memory/disk/IO wise)?

While historically the 'Simple Features' paradigm has been VERY valuable for us, I'm asking myself whether there could be some 'more efficient' way of handling the ever-growing datasets we have to deal with... I envision a super-fast in-memory data viewer, so I can quickly view the 16 million points in my PostGIS DB (mmm, probably have to fork QGIS 0.1 for this... QGIS started off as a 'simple' PostGIS viewer :-) )
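For what it's worth, a rough sketch of that 'load only the coordinates into memory' idea against PostGIS (Python with psycopg2 and NumPy; the connection string and the table/column names are made up, just to illustrate the pattern): instead of building a feature object per row, fetch bare X/Y columns and keep them as flat arrays, which for 16 million points is only a couple of hundred MB.

    import numpy as np
    import psycopg2

    # hypothetical connection string and table/column names
    conn = psycopg2.connect("dbname=gis")
    cur = conn.cursor()
    cur.execute("SELECT ST_X(geom), ST_Y(geom) FROM my_points")
    xy = np.asarray(cur.fetchall(), dtype=np.float64)  # shape (n, 2)
    cur.close()
    conn.close()

    # xy is now a plain in-memory array that a fast point renderer
    # could draw directly, with no per-feature objects in between.
    print(xy.shape, xy.nbytes / 1e6, "MB")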

Thanks for any input.

Regards,

Richard Duivenvoorde
