[GRASS-user] Large vector files

Jonathan Greenberg jgreenberg at arc.nasa.gov
Fri Oct 6 15:17:08 EDT 2006

Unfortunately, I was hoping to work in a vector environment with the data.
I'm sure I could think up raster analogs to the analyses I'm trying to do,
but as Michael pointed out earlier, this is a problem that does need to be
solved. Lidar in particular is getting more popular while software support
remains primitive, and although my problem is not a Lidar one, it has the
same underlying issue: I need to be able to create and manipulate massive
vector files.

I am hearing a lot of suggestions here and elsewhere about using PostGIS
and PostgreSQL, but I am a total novice at this -- is there a "dummy's
guide" to working with these databases instead of shapefiles?  I'd like to
be able to use all of the existing GRASS vector commands on a "large
vector" (what would the format be called?).  I'm noticing there's a lot
more setup involved in getting a database running, and the process has yet
to be streamlined.  Is it possible to simply substitute a Postgres-driven
vector database for a GRASS vector in the GRASS algorithms, or do the
v.* modules need to be reworked to support this?
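From the little reading I've done, I'm guessing the setup would look
something like this (GRASS 6 commands; the database, table, and map names
are all hypothetical, and I may well have the details wrong):

```shell
# All names here are hypothetical -- this is my guess at the workflow,
# not a tested recipe.
# Point GRASS attribute handling at PostgreSQL instead of DBF:
db.connect driver=pg database=lidar_db
# Link an existing PostGIS layer as a GRASS vector without copying
# the geometry into the GRASS database:
v.external dsn="PG:dbname=lidar_db" layer=big_points output=big_points
```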


On 10/6/06 4:22 AM, "Patton, Eric" <epatton at nrcan.gc.ca> wrote:

> Michael and Jonathan,
> I would _highly_ recommend trying r.in.xyz if you have not already done
> so, especially with LIDAR and other forms of remotely-sensed data. I've
> had good success with it. Note there is also a parameter in r.in.xyz to
> control how much of the input map to keep in memory, allowing you to
> run the data import in multiple passes.
> Is it imperative that your data be imported as vector?
> ~ ERIC.
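If I understand right, the call Eric means looks roughly like this (file
and map names are made up, and I'm reading the parameter names off the
GRASS 6 manual page, so treat this as a sketch):

```shell
# Hypothetical names; percent= is the memory control Eric mentions --
# e.g. percent=20 reads the input in five passes of ~20% each.
r.in.xyz input=lidar_points.txt output=lidar_elev method=mean \
    fs=, percent=20
```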
> -----Original Message-----
> From: grassuser-bounces at grass.itc.it
> To: Jonathan Greenberg
> Cc: GRASS Users Users
> Sent: 10/6/2006 3:03 AM
> Subject: Re: [GRASS-user] Large vector files
> Jonathan,
> I feel your pain. I'm one of those Lidar users, and our library has
> just passed 100,000 km^2 collected at 1-2 points/m^2. Data management
> is a real nightmare, and as far as I've seen, the commercial vendors
> fail to deal with the problem. I'm pretty new to GRASS, but it
> combined with GMT appears to offer a far more appealing solution.
> Right now I've just been experimenting with everything at a very
> superficial level, but I'll share what I've learned, although it is
> biased toward working with lidar data.
> - On my MacBook Pro (2 GB of RAM and lots of swap), v.in.ascii chokes
> at around the 5 million point mark (with topology building).
> - Without topology I had no issues importing as many as 20 million
> points, but it choked again when I tried another file with 100 million
> points. However, the error I received was not a memory allocation
> error. I never dug any further into the problem once I discovered
> how slowly v.surf.rst ran.
> - I've had really positive experiences working with the GMT programs
> surface and triangulate. surface generated a grid comparable with
> v.surf.rst but was two orders of magnitude faster; triangulate was
> three orders faster.
> - I've found that it is quite easy to write scripts that automatically
> break the tasks up into smaller "tiles". Better yet, you can use an
> idea posted earlier by Hamish (many thanks! :-)) to parallelize the
> computations. At least I have been able to with GMT (I think the way
> GRASS handles regions is going to cause me grief when multiple threads
> are trying to work with different sub-regions... any thoughts?)
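To make sure I follow the tiling idea, here is a rough sketch in plain
shell (the extents are invented, and the named-region-plus-WIND_OVERRIDE
trick for giving each process its own region is something I've only read
about, not verified):

```shell
#!/bin/sh
# Split a (made-up) square region into a 2x2 grid of tiles and print
# the bounds each worker process would use.
W=0; S=0; E=4000; N=4000   # full region bounds, in map units
NX=2; NY=2                 # number of tiles in x and y

tw=$(( (E - W) / NX ))     # tile width
th=$(( (N - S) / NY ))     # tile height

for i in $(seq 0 $((NX - 1))); do
  for j in $(seq 0 $((NY - 1))); do
    w=$(( W + i * tw )); e=$(( w + tw ))
    s=$(( S + j * th )); n=$(( s + th ))
    # Each worker could then (reportedly) get a private region via a
    # saved region and the WIND_OVERRIDE environment variable, e.g.:
    #   g.region save=tile_${i}_${j} w=$w e=$e s=$s n=$n
    #   WIND_OVERRIDE=tile_${i}_${j} v.surf.rst ... &
    echo "tile_${i}_${j}: w=$w e=$e s=$s n=$n"
  done
done
```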
> -But maybe the most important conclusion I've come to for working
> with really large data sets is that files are not the way to go and
> that a database serving the application manageable chunks of data is
> a better option. Then again, I really don't know too much about
> databases so I could be totally wrong on that one. Anyone have any
> experience working with lidar through databases?
> Cheers,
> Mike
> On 5-Oct-06, at 5:29 PM, Jonathan Greenberg wrote:
>> I wonder (and I'm thinking out loud here) if there are ways to "tile"
>> vector processes in an analogous (if not algorithmic) way to how we
>> deal with massive raster datasets?  Are the issues I'm running into
>> fundamentally something with older file formats, operating system/file
>> system limitations, algorithmic maturity, or some mixture of all of
>> these things?  As you pointed out, the Lidar community seems to have
>> the most pressing need for these issues to get sorted out -- however,
>> as GIS analyses get more advanced and require more data, I'm guessing
>> the average user may run into this as well.
>> On a related note, apparently ESRI may be releasing a new version of
>> their geodatabase format to get around some of the filesize issues in
>> their 9.2 release (the beta apparently has this functionality).  No
>> word on whether it a) works or b) has algorithmic advances to deal
>> with these databases...
>> --j
>> On 10/5/06 4:16 PM, "Hamish" <hamish_nospam at yahoo.com> wrote:
>>> Jonathan Greenberg wrote:
>>>> Case in point: I just got this error on a v.in.ascii import of a
>>>> ~200 MB csv file with points:
>>>> G_realloc: out of memory
>>>> (I have 4 GB of RAM and plenty of swap space, and the program never
>>>> hit that limit anyway).
>>> The vector format has a small but finite memory overhead for each
>>> feature, which makes more than several million data points
>>> impractical.  To get around this, v.in.ascii (and a couple of other
>>> modules) lets you load vector data without building topology
>>> (v.in.ascii -b -t).  It's unknown how many points you can load that
>>> way, but it's a lot.  Without topology, about the only thing you can
>>> do with the data is run it through v.surf.rst.
>>> For multi-gigabyte x,y,z datasets (or x,y,f(x,y) just as well), you
>>> can use r.in.xyz to bin them directly into a raster map.  See:
>>>   http://grass.ibiblio.org/grass63/manuals/html63_user/r.in.xyz.html
>>>   http://hamish.bowman.googlepages.com/grassfiles#xyz
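The binning idea itself can be sketched in a few lines of shell and awk
(illustrative only -- this is the concept, not how r.in.xyz is actually
implemented, and the sample points and grid origin are made up):

```shell
# Toy version of the r.in.xyz idea: stream "x y z" points and keep only
# a per-cell sum and count, so memory scales with raster cells rather
# than with the number of points.
means=$(printf '0.5 0.5 2.0\n0.6 0.4 4.0\n1.5 0.2 7.0\n' |
  awk -v west=0 -v south=0 -v res=1 '
    { col = int(($1 - west) / res)    # cell column for this point
      row = int(($2 - south) / res)   # cell row
      sum[row "," col] += $3
      n[row "," col]++ }
    END { for (k in sum)
            printf "cell %s mean z %.1f\n", k, sum[k] / n[k] }' | sort)
echo "$means"
```

The first two points land in the same cell, so their z values get
averaged into a single cell value.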
>>> With regard to the vector library and LFS support, I think you can
>>> expect some "first user" problems.  Radim commented on this some
>>> time ago on the mailing lists; you'd have to search there for a
>>> better answer.
>>> Hamish
>> -- 
>> Jonathan A. Greenberg, PhD
>> NRC Research Associate
>> NASA Ames Research Center
>> MS 242-4
>> Moffett Field, CA 94035-1000
>> Office: 650-604-5896
>> Cell: 415-794-5043
>> AIM: jgrn307
>> MSN: jgrn307 at hotmail.com
>> _______________________________________________
>> grassuser mailing list
>> grassuser at grass.itc.it
>> http://grass.itc.it/mailman/listinfo/grassuser

