[GRASS-user] Large vector files

Michael Perdue michael_perdue at yahoo.ca
Fri Oct 6 03:03:21 EDT 2006


Jonathan,

I feel your pain. I'm one of those lidar users, and our library has
just passed 100,000 km^2 collected at 1-2 points/m^2. Data management
is a real nightmare and, so far as I've seen, the commercial vendors
fail to deal with the problem. I'm pretty new to GRASS, but combined
with GMT it appears to offer a far more appealing solution.
Right now I've just been experimenting with everything at a very
superficial level, but I'll share what I've learned, although it is
biased toward working with lidar data.

-on my MacBook Pro (2 GB of RAM and lots of swap) v.in.ascii chokes
at around the 5 million point level (with topology building).

-without topology I have no issues importing as many as 20 million
points, but it choked again when I tried another file with 100 million
points. However, the error I received was not a memory allocation
error. I never dug any further into the problem once I discovered
how slowly v.surf.rst ran.

-I've had really positive experiences working with the GMT programs
surface and triangulate. Surface generated a grid that was comparable
with v.surf.rst but was 2 orders of magnitude faster. Triangulate was
3 orders of magnitude faster.

-I've found that it is quite easy to write scripts that automatically
break up the tasks into smaller "tiles". Better yet, you can use
an idea posted earlier by Hamish (many thanks! :-)) to parallelize the
computations. Or at least I have been able to with GMT (I think the
way GRASS handles regions is going to cause me grief when multiple
threads are trying to work with different sub-regions...any thoughts?)
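
A minimal sketch of the tiling idea, in Python. This is not the
script referred to above: the function names, the tile size, and the
overlap margin are all assumptions made for illustration, and
process_tile() is a placeholder for wherever an external gridding
program would actually be launched on one tile. Threads are used only
because the per-tile work is assumed to happen in a separate process.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def make_tiles(west, south, east, north, tile_size, overlap=0.0):
    """Split a bounding box into tiles of at most tile_size per side.

    Each tile is padded by `overlap` (clipped to the outer box) so that
    surfaces interpolated per tile can be blended at the seams.
    Returns a list of (west, south, east, north) tuples.
    """
    nx = math.ceil((east - west) / tile_size)
    ny = math.ceil((north - south) / tile_size)
    tiles = []
    for j in range(ny):
        for i in range(nx):
            w = west + i * tile_size
            s = south + j * tile_size
            e = min(w + tile_size, east)
            n = min(s + tile_size, north)
            tiles.append((max(w - overlap, west), max(s - overlap, south),
                          min(e + overlap, east), min(n + overlap, north)))
    return tiles


def process_tile(tile):
    """Placeholder for the per-tile job (hypothetical).

    A real version would start the gridding program for this tile's
    bounds; here it only returns the tile area so the sketch runs.
    """
    w, s, e, n = tile
    return (e - w) * (n - s)


def run_tiles(tiles, workers=4):
    """Run process_tile over all tiles with a small worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_tile, tiles))
```

For example, run_tiles(make_tiles(0, 0, 2000, 1000, 1000, overlap=50))
processes two overlapping 1 km tiles. Each worker only ever sees its
own tile bounds, which is what makes the GMT case easy; the sketch
does nothing about the shared-region question raised above for GRASS.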

-But maybe the most important conclusion I've come to for working  
with really large data sets is that files are not the way to go and  
that a database serving the application manageable chunks of data is  
a better option. Then again, I really don't know too much about  
databases so I could be totally wrong on that one. Anyone have any  
experience working with lidar through databases?
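
To make the "database serving manageable chunks" idea concrete, here
is a rough sketch using Python's built-in sqlite3 module. The table
name, column layout, and chunk size are assumptions for illustration
only; a production setup would more likely use a proper spatial index
(for example PostGIS or SQLite's R*Tree module) rather than the plain
(x, y) index shown here.

```python
import sqlite3


def build_point_db(points, path=":memory:"):
    """Load (x, y, z) tuples into a table and index them on x, y."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE pts (x REAL, y REAL, z REAL)")
    con.executemany("INSERT INTO pts VALUES (?, ?, ?)", points)
    con.execute("CREATE INDEX pts_xy ON pts (x, y)")
    con.commit()
    return con


def points_in_box(con, west, south, east, north, chunk=100000):
    """Yield the points inside a bounding box in lists of <= chunk rows.

    The box is half-open (west <= x < east, south <= y < north) so that
    adjacent tiles never return the same point twice.
    """
    cur = con.execute(
        "SELECT x, y, z FROM pts "
        "WHERE x >= ? AND x < ? AND y >= ? AND y < ?",
        (west, east, south, north),
    )
    while True:
        rows = cur.fetchmany(chunk)
        if not rows:
            break
        yield rows
```

The application then asks for one tile at a time, e.g.
for rows in points_in_box(con, 0, 0, 1000, 1000): ..., and never has
to hold the whole point cloud in memory.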

Cheers,

Mike



On 5-Oct-06, at 5:29 PM, Jonathan Greenberg wrote:

> I wonder (and I'm thinking out loud here) if there are ways to  
> "tile" vector
> processes in an analogous (if not algorithmic) way to how we deal with
> massive raster datasets?  Are the issues I'm running into  
> fundamentally
> something with older file formats, operating system/file system  
> limitations,
> algorithmic maturity, or some mixture of all of these things?  As you
> pointed out, the Lidar community seems to have the most pressing  
> need for
> these issues to get sorted out -- however as GIS analyses get more  
> advanced
> and require more data, I'm guessing the average user may run into  
> this as
> well.
>
> On a related note, apparently ESRI may be releasing a new version  
> of their
> geodatabase format to get around some of the filesize issues in  
> their 9.2
> release (the beta apparently has this functionality).  No word on  
> whether it
> a) works or b) has algorithmic advances to deal with these DB...
>
> --j
>
>
> On 10/5/06 4:16 PM, "Hamish" <hamish_nospam at yahoo.com> wrote:
>
>> Jonathan Greenberg wrote:
>>>
>>> Case in point: I just got this error on a v.in.ascii import of a
>>> ~200mb csv file with points:
>>>
>>> G_realloc: out of memory (I have 4gb RAM and plenty of swap  
>>> space, and
>>> the program never hit that limit anyway).
>>
>>
>> The vector format has a small but finite memory overhead for each
>> feature which makes more than several million data points  
>> impractical.
>>
>> To get around this v.in.ascii (and a couple of other modules) let you
>> load in vector data without building topology.  (v.in.ascii -b -t)
>>
>> Then it's unknown how many points you can load, but it's a lot.
>>
>> Without topology, about the only thing you can do with the data is  
>> run
>> it through v.surf.rst.
>>
>>
>> For multi-gigabyte x,y,z datasets (or x,y,f(x,y) just as well),  
>> you can
>> use r.in.xyz to bin it directly into a raster map.
>>
>> see:
>>   http://grass.ibiblio.org/grass63/manuals/html63_user/r.in.xyz.html
>>   http://hamish.bowman.googlepages.com/grassfiles#xyz
>>
>>
>> with regard to the vector library and LFS support, I think you can
>> expect some "first user" problems, Radim commented on this some  
>> time ago
>> in the mailing lists, have to search there for a better answer.
>>
>>
>> Hamish
>
>
> -- 
> Jonathan A. Greenberg, PhD
> NRC Research Associate
> NASA Ames Research Center
> MS 242-4
> Moffett Field, CA 94035-1000
> Office: 650-604-5896
> Cell: 415-794-5043
> AIM: jgrn307
> MSN: jgrn307 at hotmail.com
>
>
