[GRASS-user] Large File Support (LFS)

Wed Mar 24 07:57:52 EDT 2010

Hamish:
> > if you have a 32bit machine, the solution/workaround for this is
> > to pipe from stdin instead.
...
> Ok, this worked fine for r.in.xyz, thanks.
> 
> How about v.in.ascii (using sqlite)?

I assume v.in.ascii input= also opens the file with glibc's fopen(), which
in a 32-bit build without LFS fails for files larger than 2 GB. But yes,
v.in.ascii can read its input from stdin as well, bypassing that problem.
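
A minimal sketch of that workaround as a shell function. It assumes GRASS
6.x syntax (v.in.ascii reads stdin when input= is left out, as in the cat
example below) and uses getconf LONG_BIT as a rough test for a 32-bit
build. The names import_xyz, run and DRY_RUN are hypothetical, added only
so the commands can be previewed without a GRASS session:

```sh
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN=1,
# so the wrapper can be previewed outside a GRASS session.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: import an x,y,z text file as 3D points (GRASS 6 syntax).
# On a 32-bit build (getconf LONG_BIT prints 32) the file is piped through cat,
# so v.in.ascii reads stdin and never has to fopen() a >2 GB file itself.
# On other builds the file is simply passed with input=.
import_xyz() {
  file=$1
  map=$2
  if [ "$(getconf LONG_BIT)" = 32 ]; then
    cat "$file" | run v.in.ascii out="$map" -zbt z=3 fs=,
  else
    run v.in.ascii input="$file" out="$map" -zbt z=3 fs=,
  fi
}
```

Inside GRASS you would call it as import_xyz lidar_tile.txt lidar; with
DRY_RUN=1 set it only prints the v.in.ascii command it would run.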

I don't know whether SQLite supports files larger than 2 GB on 32-bit
systems. Upgrading to the latest versions of everything may help.

(Remember that you only need a back-end database if you want to store
more than simple x,y,z position data.)

Currently GRASS's default SQLite setup is a single database file per
mapset. This becomes a problem if you have many large datasets in the
same mapset and are trying to keep the $MAPSET/sqlite.db file under 2 GB.
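
If you stay with one sqlite.db per mapset, a small check like this could
warn before the file reaches the limit. It is only a sketch: 2147483647
bytes (2^31 - 1) is the largest offset a signed 32-bit off_t can hold,
while the function name near_2gb_limit and the 90% default margin are
arbitrary, hypothetical choices:

```sh
#!/bin/sh
# Largest file offset a signed 32-bit off_t can represent: 2^31 - 1 bytes.
LIMIT_32BIT=2147483647

# Hypothetical helper: exit 0 if FILE has reached PCT percent (default 90)
# of the 32-bit limit, i.e. it may be time to put new maps in another DB file.
near_2gb_limit() {
  file=$1
  pct=${2:-90}
  # wc -c on some systems pads its output with spaces; strip them.
  size=$(wc -c < "$file" | tr -d ' ')
  [ "$size" -ge $((LIMIT_32BIT / 100 * pct)) ]
}
```

Inside a GRASS session, eval "$(g.gisenv)" sets $GISDBASE, $LOCATION_NAME
and $MAPSET, after which near_2gb_limit
"$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite.db" can be used as the condition
of an if statement.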

Perhaps one DB file per map is as easy as pointing db.connect at a
directory instead of a single sqlite.db file? Or maybe it needs work in
the driver code. I'm not sure; you will have to experiment.

[I just tested: currently it only supports one file per connection, and
pointing the sqlite driver at a directory results in an error. Of course
you can run db.connect once per map to keep separate files, but that's
a bit more work (easy enough in a script, though). Once created, maps
remember their own DB settings; db.connect just sets the default that is
used when new maps are created.]
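
A minimal sketch of the once-per-map approach. It assumes GRASS 6.x
syntax, that the sqlite driver expands $GISDBASE, $LOCATION_NAME and
$MAPSET in the database= path, and that each input file name (minus .txt)
is a valid map name. The function import_one_db_per_map and the DRY_RUN
preview switch are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Import each x,y,z text file into its own vector map, pointing db.connect at
# a separate SQLite file first so every map keeps its table in its own file.
# Single quotes keep $GISDBASE etc. literal so the driver can expand them.
import_one_db_per_map() {
  for file in "$@"; do
    map=$(basename "$file" .txt)
    run db.connect driver=sqlite \
      database='$GISDBASE/$LOCATION_NAME/$MAPSET/'"$map"'.db'
    # -z: 3D points, -b: skip topology; the attribute table is still created.
    cat "$file" | run v.in.ascii out="$map" -zb z=3 fs=,
  done
}
```

Since maps remember their own DB settings, it is probably worth running
db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite.db'
once more afterwards to put the mapset default back.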

(Or just switch over to the DBF driver; the two seem to take about the
same amount of time to import a sample lidar set.)

> Is the solution to import multiple files with v.in.ascii and then
> v.patch?

That's not needed. You could try something like:

cat file1.txt file2.txt file3.txt | \
  v.in.ascii out=all_points -zbt z=3 fs=,

(cat is really meant for concatenating many files, even if 90% of the
time it is just used to print a single one.)

With the -z, -b and -t flags (create a 3D map, don't build topology,
don't create an attribute table) you should be able to import many
millions of points, but I simply don't know how well GRASS's vector
library supports LFS. (If it fails, please file a bug, as supporting
that is a goal.)

To save disk space I usually bzip2 (or gzip) big lidar text files, then
use bzcat (or zcat) instead of plain cat to pipe them into the import
program.
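
A small helper along those lines, sketched under the assumption that the
files are named by extension (.bz2, .gz, or anything else for plain text).
The name stream_points is hypothetical; bzip2 -dc and gzip -dc do the same
job as bzcat and zcat:

```sh
#!/bin/sh
# Hypothetical helper: write the uncompressed contents of every argument to
# stdout, picking the decompressor from the file extension. Stops with a
# non-zero status at the first file that cannot be read.
stream_points() {
  for f in "$@"; do
    case "$f" in
      *.bz2) bzip2 -dc "$f" ;;  # same as bzcat
      *.gz)  gzip -dc "$f" ;;   # same as zcat on GNU systems
      *)     cat "$f" ;;
    esac || return 1
  done
}
```

It can then replace cat in the pipeline, e.g. stream_points tile1.txt.bz2
tile2.txt.gz tile3.txt | v.in.ascii out=all_points -zbt z=3 fs=,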

> Is this likely to work with LiDAR data where topology isn't
> built?

A problem with both topology and databases is excessive memory use, as
each data point needs a small but non-zero amount of memory. Once you get
beyond approx. 3 million points, RAM starts to be an issue.

So it is more likely to work well when topology is not built.
Of course, without topology you are limited in what you can do with the
map.
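
That rule of thumb could be scripted roughly like this. It is only a
sketch: it assumes one point per line with no header, takes the ~3 million
figure above as a hard cutoff, and the name choose_flags is hypothetical:

```sh
#!/bin/sh
# Rough threshold quoted in the mailing-list note: beyond ~3 million points,
# RAM used by topology and attribute tables starts to become an issue.
POINT_THRESHOLD=3000000

# Hypothetical helper: count the point records arriving on stdin and print
# the v.in.ascii flags to use: -zbt (no topology, no table) for large inputs,
# plain -z (3D, with topology and table) otherwise.
choose_flags() {
  n=$(wc -l | tr -d ' ')
  if [ "$n" -gt "$POINT_THRESHOLD" ]; then
    printf '%s\n' -zbt
  else
    printf '%s\n' -z
  fi
}
```

For example, flags=$(cat file1.txt file2.txt | choose_flags) followed by
cat file1.txt file2.txt | v.in.ascii out=all_points $flags z=3 fs=, .
Note that this reads the data twice, which is slow for really big files.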

liblas is experimenting with different spatial indexing schemes for
point data; we'll see what lessons they learn and can teach us. :)

Experiment and let us know what you find out!

Hamish