[GRASS-user] Large vector files
michael_perdue at yahoo.ca
Fri Oct 6 03:03:21 EDT 2006
I feel your pain. I'm one of those lidar users and our library has
just passed 100,000 km^2 collected at 1-2 points/m^2. Data management
is a real nightmare and, as far as I've seen, the commercial vendors
fail to deal with the problem. I'm pretty new to GRASS, but it,
combined with GMT, appears to offer a far more appealing solution.
Right now I've just been experimenting with everything at a very
superficial level, but I'll share what I've learned, although it is
biased toward working with lidar data.
-on my MacBook Pro (2 GB of RAM and lots of swap), v.in.ascii chokes
at around the 5 million point level (with topology building).
-without topology I have no issues importing as many as 20 million
points, but it choked again when I tried another file with 100 million
points. The error I received was not a memory allocation error,
though; I never dug any further into the problem once I discovered
how slowly v.surf.rst ran.
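For reference, the import I've been testing looks roughly like this
(the file name, separator, and column numbers are just examples; the
-b/-t flags are the ones from the v.in.ascii docs):

```shell
# Make a small synthetic x|y|z file as a stand-in for a real lidar tile.
awk 'BEGIN { srand(); for (i = 0; i < 1000; i++)
       printf "%.2f|%.2f|%.2f\n", rand()*1000, rand()*1000, rand()*50 }' > points.txt
wc -l points.txt

# Inside a GRASS session: import without building topology (-b) and
# without an attribute table (-t) to keep per-point memory overhead down.
# v.in.ascii -z -b -t input=points.txt output=lidar format=point fs="|" x=1 y=2 z=3
```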
-I've had really positive experiences working with the GMT programs
surface and triangulate. Surface generated a grid that was comparable
with v.surf.rst's but was two orders of magnitude faster; triangulate
was three orders faster.
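Roughly the pipeline I've been using. The region computation below
just mimics what minmax does; the GMT calls themselves are left
commented since the spacing and tension values are only placeholders:

```shell
# Space-separated x y z points (placeholder data).
printf '0 0 1.0\n1000 1000 5.0\n500 500 3.0\n' > pts.xyz

# Work out the -R region string the way `minmax -I` would:
region=$(awk 'NR==1 { minx=maxx=$1; miny=maxy=$2 }
              { if ($1<minx) minx=$1; if ($1>maxx) maxx=$1;
                if ($2<miny) miny=$2; if ($2>maxy) maxy=$2 }
              END { printf "-R%g/%g/%g/%g", minx, maxx, miny, maxy }' pts.xyz)
echo "$region"

# With GMT installed (options are illustrative, not tuned):
# blockmean pts.xyz $region -I10 | surface $region -I10 -T0.35 -Gsurf.grd
# triangulate pts.xyz $region -I10 -Gtri.grd > /dev/null
```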
-I've found that it is quite easy to write scripts that automatically
break the tasks up into smaller "tiles". Better yet, you can use
an idea posted earlier by Hamish (many thanks! :-)) to parallelize the
computations. At least I have been able to with GMT (I think the
way GRASS handles regions is going to cause me grief when multiple
threads are trying to work with different sub-regions...any thoughts?)
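The tiling itself is just a bounding-box filter per tile, with one
background job each; a sketch (the tile size and the per-tile gridding
command are placeholders):

```shell
printf '10 10 1\n900 900 2\n100 800 3\n800 100 4\n' > pts.xyz
size=500   # tile edge length (placeholder)

for x0 in 0 500; do
  for y0 in 0 500; do
    tile="tile_${x0}_${y0}"
    # Clip out the points falling in this tile's bounding box...
    awk -v x0=$x0 -v y0=$y0 -v s=$size \
        '$1 >= x0 && $1 < x0+s && $2 >= y0 && $2 < y0+s' pts.xyz > $tile.xyz &
    # ...and in a real run, grid each tile here too, e.g.:
    # surface $tile.xyz -R$x0/$((x0+size))/$y0/$((y0+size)) -I10 -G$tile.grd &
  done
done
wait   # let all background tile jobs finish
ls tile_*.xyz | wc -l
```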
-But maybe the most important conclusion I've come to for working
with really large data sets is that flat files are not the way to go,
and that a database serving the application manageable chunks of data
is a better option. Then again, I really don't know much about
databases, so I could be totally wrong on that one. Anyone have
experience working with lidar through databases?
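I haven't actually tried this, but even plain SQLite shows the idea of
handing the application one chunk at a time (a real setup would
presumably want PostGIS or SpatiaLite plus a spatial index):

```shell
rm -f pts.db
printf '10 20 1.5\n500 600 2.0\n900 100 3.2\n' > pts.xyz

# Load the points once...
sqlite3 pts.db 'CREATE TABLE pts (x REAL, y REAL, z REAL);'
awk '{ printf "INSERT INTO pts VALUES (%s,%s,%s);\n", $1, $2, $3 }' pts.xyz \
  | sqlite3 pts.db

# ...then pull back only the tile you are currently gridding:
sqlite3 pts.db 'SELECT COUNT(*) FROM pts
                WHERE x BETWEEN 0 AND 400 AND y BETWEEN 0 AND 400;'
```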
On 5-Oct-06, at 5:29 PM, Jonathan Greenberg wrote:
> I wonder (and I'm thinking out loud here) if there are ways to
> "tile" vector processes in an analogous (if not algorithmic) way to
> how we deal with massive raster datasets? Are the issues I'm running
> into something to do with older file formats, operating system/file
> system limitations, algorithmic maturity, or some mixture of all of
> these things? As you pointed out, the lidar community seems to have
> the most pressing need for these issues to get sorted out -- however,
> as GIS analyses get more complex and require more data, I'm guessing
> the average user may run into this as well.
> On a related note, apparently ESRI may be releasing a new version of
> their geodatabase format to get around some of the filesize issues in
> their 9.2 release (the beta apparently has this functionality). No
> word on whether it a) works or b) has algorithmic advances to deal
> with these DB...
> On 10/5/06 4:16 PM, "Hamish" <hamish_nospam at yahoo.com> wrote:
>> Jonathan Greenberg wrote:
>>> Case in point: I just got this error on a v.in.ascii import of a
>>> ~200mb csv file with points:
>>> G_realloc: out of memory (I have 4gb RAM and plenty of swap space,
>>> and the program never hit that limit anyway).
>> The vector format has a small but finite memory overhead for each
>> feature, which makes loading more than several million data points
>> impractical. To get around this, v.in.ascii (and a couple of other
>> modules) lets you load vector data without building topology
>> (v.in.ascii -b -t). It's unknown exactly how many points you can
>> load that way, but it's a lot. Without topology, about the only
>> thing you can do with the data is run it through v.surf.rst.
>> For multi-gigabyte x,y,z datasets (or x,y,f(x,y) just as well), you
>> can use r.in.xyz to bin them directly into a raster map.
>> With regard to the vector library and LFS support, I think you can
>> expect some "first user" problems. Radim commented on this some
>> time ago on the mailing lists; you'd have to search there for a
>> better answer.
> Jonathan A. Greenberg, PhD
> NRC Research Associate
> NASA Ames Research Center
> MS 242-4
> Moffett Field, CA 94035-1000
> Office: 650-604-5896
> Cell: 415-794-5043
> AIM: jgrn307
> MSN: jgrn307 at hotmail.com