[GRASS-user] Large vector files

Fri Oct 6 07:22:39 EDT 2006

Michael and Jonathan, 

I would _highly_ recommend trying r.in.xyz if you have not already done so,
especially with LIDAR and other forms of remotely sensed data. I've had good
success with it. Note that r.in.xyz also has a parameter to control how
much of the map to keep in memory, allowing you to run the data import
in multiple passes.
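
To make the multi-pass idea concrete, here is a rough Python sketch of binning x,y,z points into a grid of per-cell means while only holding a band of raster rows in memory at a time. It illustrates the concept only and is not r.in.xyz's actual code; the function bin_mean_multipass, the open_input callable, and the way percent is interpreted here are all assumptions made for the example.

```python
import io
import math


def bin_mean_multipass(open_input, west, south, east, north, res,
                       percent=100, sep=","):
    """Return a row-major grid (north row first) of per-cell mean z values.

    open_input is a hypothetical callable that returns a fresh iterable of
    "x,y,z" text lines on every call, so the input can be re-read per pass.
    """
    ncols = int(round((east - west) / res))
    nrows = int(round((north - south) / res))
    # Number of raster rows held in memory during one pass.
    rows_per_pass = max(1, math.ceil(nrows * percent / 100))

    grid = []
    for first_row in range(0, nrows, rows_per_pass):
        band = min(rows_per_pass, nrows - first_row)
        sums = [[0.0] * ncols for _ in range(band)]
        counts = [[0] * ncols for _ in range(band)]

        for line in open_input():  # one full scan of the input per pass
            line = line.strip()
            if not line:
                continue
            x, y, z = (float(v) for v in line.split(sep)[:3])
            if not (west <= x < east and south < y <= north):
                continue  # point lies outside the region
            row = min(int((north - y) / res), nrows - 1)
            col = min(int((x - west) / res), ncols - 1)
            # Keep only points that fall in the band for this pass.
            if first_row <= row < first_row + band:
                sums[row - first_row][col] += z
                counts[row - first_row][col] += 1

        for r in range(band):
            grid.append([
                sums[r][c] / counts[r][c] if counts[r][c] else None
                for c in range(ncols)
            ])
    return grid


sample = "0.5,1.5,10\n0.5,1.5,20\n1.5,0.5,4\n"
grid = bin_mean_multipass(lambda: io.StringIO(sample), 0, 0, 2, 2, 1, percent=50)
print(grid)  # [[15.0, None], [None, 4.0]]
```

The trade-off is memory for time: with percent=50 the input is scanned twice, but only half of the rows are in memory during each scan.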

Is it imperative that your data be imported as vector?

~ ERIC.

-----Original Message-----
From: grassuser-bounces at grass.itc.it
To: Jonathan Greenberg
Cc: GRASS Users
Sent: 10/6/2006 3:03 AM
Subject: Re: [GRASS-user] Large vector files

Jonathan,

I feel your pain. I'm one of those lidar users, and our library has
just passed 100,000 km^2 collected at 1-2 points/m^2. Data management
is a real nightmare and, so far as I've seen, the commercial vendors
fail to deal with the problem. I'm pretty new to GRASS, but it,
combined with GMT, appears to offer a far more appealing solution.
Right now I've just been experimenting with everything at a very
superficial level, but I'll share what I've learned, although it is
biased toward working with lidar data.

-on my MacBook Pro (2 GB of RAM and lots of swap) v.in.ascii chokes
at around the 5 million point level (with topology building).

-without topology I have no issues importing as many as 20 million
points, but it choked again when I tried another file with 100 million
points. However, the error I received was not a memory allocation
error. I never dug any further into the problem once I discovered
how slowly v.surf.rst ran.

-I've had really positive experiences working with the GMT programs
surface and triangulate. Surface generated a grid that was comparable
with v.surf.rst but was 2 orders of magnitude faster. Triangulate was
3 orders of magnitude faster.

-I've found that it is quite easy to write scripts that automatically
break up the tasks into smaller "tiles". Better yet, you can use
an idea posted earlier by Hamish (many thanks! :-)) to parallelize the
computations. Or at least I have been able to with GMT (I think the
way GRASS handles regions is going to cause me grief when multiple
threads are trying to work with different sub-regions...any thoughts?)

-But maybe the most important conclusion I've come to for working  
with really large data sets is that files are not the way to go and  
that a database serving the application manageable chunks of data is  
a better option. Then again, I really don't know too much about  
databases so I could be totally wrong on that one. Anyone have any  
experience working with lidar through databases?
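
For what it's worth, the tiling idea above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: make_tiles, count_points and process_tiles are hypothetical names, and count_points is a stand-in for whatever per-tile job you would really run (for example, launching a gridding program on that tile's extent).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def make_tiles(west, south, east, north, tile_size, overlap=0.0):
    """Split a bounding box into (w, s, e, n) tiles, clipped to the box.

    overlap pads each tile on every side so that neighbouring tiles share
    a margin of points.
    """
    nx = math.ceil((east - west) / tile_size)
    ny = math.ceil((north - south) / tile_size)
    tiles = []
    for j in range(ny):          # south to north
        for i in range(nx):      # west to east
            x0 = west + i * tile_size
            y0 = south + j * tile_size
            tiles.append((
                max(west, x0 - overlap),
                max(south, y0 - overlap),
                min(east, x0 + tile_size + overlap),
                min(north, y0 + tile_size + overlap),
            ))
    return tiles


def count_points(tile, points):
    """Placeholder per-tile task: count the points inside the tile."""
    w, s, e, n = tile
    return sum(1 for x, y in points if w <= x < e and s <= y < n)


def process_tiles(tiles, points, workers=4):
    """Run the per-tile task on a pool of worker threads, in tile order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: count_points(t, points), tiles))


tiles = make_tiles(0, 0, 100, 50, 50)
points = [(10, 10), (60, 10), (50, 20)]
print(tiles)                         # [(0, 0, 50, 50), (50, 0, 100, 50)]
print(process_tiles(tiles, points))  # [1, 2]
```

Threads are enough when each task mostly waits on an external program; for pure-Python number crunching a process pool would be the more likely choice. A non-zero overlap is usually wanted when interpolating, so that tile edges do not show up as seams in the result.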

Cheers,

Mike

On 5-Oct-06, at 5:29 PM, Jonathan Greenberg wrote:

> I wonder (and I'm thinking out loud here) if there are ways to  
> "tile" vector
> processes in an analogous (if not algorithmic) way to how we deal with
> massive raster datasets?  Are the issues I'm running into  
> fundamentally
> something with older file formats, operating system/file system  
> limitations,
> algorithmic maturity, or some mixture of all of these things?  As you
> pointed out, the Lidar community seems to have the most pressing  
> need for
> these issues to get sorted out -- however as GIS analyses get more  
> advanced
> and require more data, I'm guessing the average user may run into  
> this as
> well.
>
> On a related note, apparently ESRI may be releasing a new version  
> of their
> geodatabase format to get around some of the filesize issues in  
> their 9.2
> release (the beta apparently has this functionality).  No word on  
> whether it
> a) works or b) has algorithmic advances to deal with these DB...
>
> --j
>
>
> On 10/5/06 4:16 PM, "Hamish" <hamish_nospam at yahoo.com> wrote:
>
>> Jonathan Greenberg wrote:
>>>
>>> Case in point: I just got this error on a v.in.ascii import of a
>>> ~200mb csv file with points:
>>>
>>> G_realloc: out of memory (I have 4gb RAM and plenty of swap  
>>> space, and
>>> the program never hit that limit anyway).
>>
>>
>> The vector format has a small but finite memory overhead for each
>> feature which makes more than several million data points  
>> impractical.
>>
>> To get around this v.in.ascii (and a couple of other modules) let you
>> load in vector data without building topology.  (v.in.ascii -b -t)
>>
>> Then it's unknown how many points you can load, but it's a lot.
>>
>> Without topology, about the only thing you can do with the data is  
>> run
>> it through v.surf.rst.
>>
>>
>> For multi-gigabyte x,y,z datasets (or x,y,f(x,y) just as well),  
>> you can
>> use r.in.xyz to bin it directly into a raster map.
>>
>> see:
>>   http://grass.ibiblio.org/grass63/manuals/html63_user/r.in.xyz.html
>>   http://hamish.bowman.googlepages.com/grassfiles#xyz
>>
>>
>> with regard to the vector library and LFS support, I think you can
>> expect some "first user" problems, Radim commented on this some  
>> time ago
>> in the mailing lists, have to search there for a better answer.
>>
>>
>> Hamish
>
>
> -- 
> Jonathan A. Greenberg, PhD
> NRC Research Associate
> NASA Ames Research Center
> MS 242-4
> Moffett Field, CA 94035-1000
> Office: 650-604-5896
> Cell: 415-794-5043
> AIM: jgrn307
> MSN: jgrn307 at hotmail.com
>
>
> _______________________________________________
> grassuser mailing list
> grassuser at grass.itc.it
> http://grass.itc.it/mailman/listinfo/grassuser

