[GRASS-user] Large vector files
epatton at nrcan.gc.ca
Fri Oct 6 07:22:39 EDT 2006
Michael and Jonathan,
I would _highly_ recommend trying r.in.xyz if you have not already done so,
especially with LIDAR and other forms of remotely-sensed data. I've had good
success with it. Note there is also a parameter in r.in.xyz to control how
much of the input map is kept in memory, allowing you to run the data import
in multiple passes.
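For example, a multi-pass import might look something like this (the file
names, region, and resolution are made up; percent= is the parameter that
controls how much of the input is held in memory per pass):

```shell
# Set the target raster region and resolution first (values are examples)
g.region n=4925000 s=4920000 w=600000 e=605000 res=1

# Bin the x,y,z points directly into a raster; percent=10 makes r.in.xyz
# read the input in ten passes, holding only a tenth of it in memory
r.in.xyz input=lidar_points.txt output=lidar_dem \
    method=mean fs=',' percent=10
```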
Is it imperative that your data be imported as vector?
From: grassuser-bounces at grass.itc.it
To: Jonathan Greenberg
Cc: GRASS Users Users
Sent: 10/6/2006 3:03 AM
Subject: Re: [GRASS-user] Large vector files
I feel your pain. I'm one of those Lidar users and our library has
just passed 100,000 km^2 collected at 1-2 points/m^2. Data management
is a real nightmare and, so far as I've seen, the commercial vendors
fail to deal with the problem. I'm pretty new to GRASS, but it
combined with GMT appears to offer a far more appealing solution.
Right now I've just been experimenting with everything at a very
superficial level, but I'll share what I've learned, although it is
biased toward working with lidar data.
-on my MacBook Pro (2 GB of RAM and lots of swap), v.in.ascii chokes
at around the 5 million point level (with topology building).
-without topology I have no issues importing as many as 20 million
points, but it again choked when I tried another file with 100 million
points. However, the error I received was not a memory allocation
error. I never dug any further into the problem once I discovered
how slowly v.surf.rst ran.
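For what it's worth, the no-topology import I've been using looks roughly
like this (the file name and column layout are assumptions):

```shell
# -b skips topology building, -t skips attribute-table creation,
# -z marks the import as 3D; x/y/z give the column numbers in the CSV
v.in.ascii -z -b -t input=points.csv output=lidar_pts \
    fs=',' x=1 y=2 z=3
```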
-I've had really positive experiences working with the GMT programs
surface and triangulate. surface generated a grid comparable with
v.surf.rst's output but was 2 orders of magnitude faster; triangulate
was 3 orders faster.
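The GMT invocations were along these lines (GMT 4 syntax; the region,
grid spacing, tension value, and file names are just examples):

```shell
# surface: continuous-curvature gridding with adjustable tension (-T)
surface points.xyz -R600000/605000/4920000/4925000 -I1 -T0.35 -Gdem.grd

# triangulate: Delaunay-based gridding, typically much faster still
triangulate points.xyz -R600000/605000/4920000/4925000 -I1 -Gdem_tri.grd
```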
-I've found that it is quite easy to write scripts that automatically
break the tasks up into smaller "tiles". Better yet, you can use
an idea posted earlier by Hamish (many thanks! :-)) to parallelize the
computations. At least I have been able to with GMT (I think the
way GRASS handles regions is going to cause me grief when multiple
threads are trying to work with different sub-regions...any thoughts?)
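Stripped down to a sketch, the tiling script looks like this (the bounds,
tile counts, and file names are invented; the GMT side works because each
surface job takes its own -R on the command line, which is exactly what
GRASS's shared region makes awkward):

```shell
#!/bin/sh
# Split a bounding box into NX x NY tiles and grid each one in parallel
W=600000; E=605000; S=4920000; N=4925000
NX=2; NY=2

for i in $(seq 0 $((NX - 1))); do
  for j in $(seq 0 $((NY - 1))); do
    # compute this tile's w/e/s/n sub-region with awk
    R=$(awk -v w=$W -v e=$E -v s=$S -v n=$N -v i=$i -v j=$j -v nx=$NX -v ny=$NY \
        'BEGIN { dx=(e-w)/nx; dy=(n-s)/ny;
                 printf "%g/%g/%g/%g", w+i*dx, w+(i+1)*dx, s+j*dy, s+(j+1)*dy }')
    # one background GMT job per tile
    surface points.xyz -R$R -I1 -Gtile_${i}_${j}.grd &
  done
done
wait  # block until every background tile has finished
```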
-But maybe the most important conclusion I've come to for working
with really large data sets is that files are not the way to go and
that a database serving the application manageable chunks of data is
a better option. Then again, I really don't know too much about
databases so I could be totally wrong on that one. Anyone have any
experience working with lidar through databases?
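As a rough illustration of what I mean (assuming a PostgreSQL server and a
hypothetical table lidar(x, y, z) -- I haven't actually built this), the
application would pull one manageable chunk at a time by bounding box:

```shell
# Fetch one tile's worth of points as CSV for gridding downstream;
# -A -t -F',' give unaligned, header-free, comma-separated output
psql -d lidar_db -A -t -F',' -c "
  SELECT x, y, z FROM lidar
  WHERE x BETWEEN 600000 AND 601000
    AND y BETWEEN 4920000 AND 4921000;" > tile.csv
```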
On 5-Oct-06, at 5:29 PM, Jonathan Greenberg wrote:
> I wonder (and I'm thinking out loud here) if there are ways to
> "tile" vector
> processes in an analogous (if not algorithmic) way to how we deal with
> massive raster datasets? Are the issues I'm running into
> something to do with older file formats, operating system/file system
> algorithmic maturity, or some mixture of all of these things? As you
> pointed out, the Lidar community seems to have the most pressing
> need for
> these issues to get sorted out -- however, as GIS analyses get more
> complex and require more data, I'm guessing the average user may run
> into this as well.
> On a related note, apparently ESRI may be releasing a new version
> of their
> geodatabase format to get around some of the filesize issues in
> their 9.2
> release (the beta apparently has this functionality). No word on
> whether it
> a) works or b) has algorithmic advances to deal with these DB...
> On 10/5/06 4:16 PM, "Hamish" <hamish_nospam at yahoo.com> wrote:
>> Jonathan Greenberg wrote:
>>> Case in point: I just got this error on a v.in.ascii import of a
>>> ~200mb csv file with points:
>>> G_realloc: out of memory (I have 4gb RAM and plenty of swap
>>> space, and
>>> the program never hit that limit anyway).
>> The vector format has a small but finite memory overhead for each
>> feature, which makes loading more than several million data points
>> impractical. To get around this, v.in.ascii (and a couple of other
>> modules) lets you load vector data without building topology
>> (v.in.ascii -b -t). Then it's unknown how many points you can load,
>> but it's a lot. Without topology, about the only thing you can do
>> with the data is run it through v.surf.rst.
>> For multi-gigabyte x,y,z datasets (or x,y,f(x,y) just as well),
>> you can
>> use r.in.xyz to bin it directly into a raster map.
>> With regard to the vector library and LFS support, I think you can
>> expect some "first user" problems. Radim commented on this some
>> time ago on the mailing lists; you'll have to search there for a
>> better answer.
> Jonathan A. Greenberg, PhD
> NRC Research Associate
> NASA Ames Research Center
> MS 242-4
> Moffett Field, CA 94035-1000
> Office: 650-604-5896
> Cell: 415-794-5043
> AIM: jgrn307
> MSN: jgrn307 at hotmail.com
> grassuser mailing list
> grassuser at grass.itc.it