[GRASS-user] Large vector files

Dylan Beaudette dylan.beaudette at gmail.com
Fri Oct 6 17:33:28 EDT 2006


Jonathan:

Contact me if you would like some tips on PostGIS; I use it all the time for 
massive soil-survey-based analysis. 

Cheers,

Dylan

On Friday 06 October 2006 12:17, Jonathan Greenberg wrote:
> Unfortunately, I was hoping to work in a vector environment with the data
> -- I'm sure I could think up raster analogs to the analyses I'm trying to
> do right now, but as Michael pointed out earlier, this is a problem that
> does need to be solved -- Lidar, in particular, is getting more popular,
> and software support remains primitive -- while my problem is not a Lidar
> one, it still has the same underlying issue -- I need to be able to create
> and manipulate massive vector files.
>
> I am hearing a lot of suggestions about using things like PostGIS and
> PostgreSQL here and elsewhere, but I am a total novice to this -- is there
> a "dummy's guide" to working with these DBs instead of shapefiles?  I'd like
> to be able to use all of the various GRASS vector commands on a "large
> vector" (what would the format be called?) that already exist -- I'm
> noticing there's a lot more setup involved in getting a DB running, and the
> process has yet to be streamlined.  Is it possible to simply substitute
> some postgres driven vector DB for a GRASS vector in the GRASS algorithms,
> or do the v.[whatever] algorithms need to be reworked to support this?
>
> --j
>
> On 10/6/06 4:22 AM, "Patton, Eric" <epatton at nrcan.gc.ca> wrote:
> > Michael and Jonathan,
> >
> > I would _highly_ recommend trying r.in.xyz if you have not already done
> > so, especially with LIDAR and other forms of remotely-sensed data. I've
> > had good success with it. Note there is also a parameter in r.in.xyz to
> > control how much of the input map to keep in memory, allowing you to run
> > the data import in multiple passes.
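
The multi-pass idea described above can be sketched roughly as follows.
This is not r.in.xyz's actual code; it is a minimal Python illustration
under stated assumptions: the input is comma-separated x,y,z text, the
statistic is the per-cell mean, and the names bin_xyz_multipass,
open_input and percent are hypothetical (percent only mimics the idea of
keeping part of the output map in memory on each pass).

```python
import io

def bin_xyz_multipass(open_input, west, south, east, north, res, percent=100):
    """Bin x,y,z points into a grid of per-cell means.

    Only `percent` of the output rows are held in memory at a time, so the
    input is re-read once per pass. `open_input` is a hypothetical callable
    returning a fresh text stream of "x,y,z" lines.
    """
    ncols = int(round((east - west) / res))
    nrows = int(round((north - south) / res))
    rows_per_pass = max(1, nrows * percent // 100)
    grid = []
    for row0 in range(0, nrows, rows_per_pass):
        row1 = min(row0 + rows_per_pass, nrows)
        sums = [[0.0] * ncols for _ in range(row1 - row0)]
        counts = [[0] * ncols for _ in range(row1 - row0)]
        with open_input() as stream:  # one full scan of the input per pass
            for line in stream:
                x, y, z = (float(v) for v in line.split(",")[:3])
                # Skip points outside the region.
                if not (west <= x < east and south < y <= north):
                    continue
                col = min(int((x - west) / res), ncols - 1)
                row = min(int((north - y) / res), nrows - 1)
                # Keep only points that fall in this pass's band of rows.
                if row0 <= row < row1:
                    sums[row - row0][col] += z
                    counts[row - row0][col] += 1
        for s_row, c_row in zip(sums, counts):
            grid.append([s / c if c else None for s, c in zip(s_row, c_row)])
    return grid
```

Lowering percent trades extra passes over the input for a smaller memory
footprint; the resulting grid is the same either way.
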
> >
> > Is it imperative that your data be imported as vector?
> >
> > ~ ERIC.
> >
> > -----Original Message-----
> > From: grassuser-bounces at grass.itc.it
> > To: Jonathan Greenberg
> > Cc: GRASS Users Users
> > Sent: 10/6/2006 3:03 AM
> > Subject: Re: [GRASS-user] Large vector files
> >
> > Jonathan,
> >
> > I feel your pain. I'm one of those Lidar users and our Library has
> > just passed 100000km^2 collected at 1-2 points/m^2. Data management
> > is a real nightmare and so far as I've seen, the commercial vendors
> > fail to deal with the problem. I'm pretty new to GRASS, but it
> > combined with GMT appear to offer a far more appealing solution.
> > Right now I've just been experimenting with everything at a very
> > superficial level, but I'll share what I've learned; although it is
> > biased to working with lidar data.
> >
> > -on my MacBook Pro (2gigs of ram and lots of swap) v.in.ascii chokes
> > at around the 5 million point level (with topology building).
> >
> > -without topology I have no issues importing as many as 20 million
> > points but it again choked when I tried another file with 100 million
> > points. However the error I received was not a memory allocation
> > error. I never dove any further into the problem when I discovered
> > how slowly v.surf.rst ran.
> >
> > -I've had really positive experiences working with the GMT programs
> > surface and triangulate. Surface generated a grid that was comparable
> > with v.surf.rst but was 2 orders of magnitude faster. Triangulate was
> > 3 orders of magnitude faster.
> >
> > -I've found that it is quite easy to write scripts that automatically
> > break up the tasks into smaller "tiles". Even better yet, you can use
> > an idea posted earlier by Hamish (many thanks! :-)) to parallelize the
> > computations. Or at least I have been able to with GMT (I think the
> > way GRASS handles regions is going to cause me grief when multiple
> > threads are trying to work with different sub-regions...any thoughts?)
> >
> > -But maybe the most important conclusion I've come to for working
> > with really large data sets is that files are not the way to go and
> > that a database serving the application manageable chunks of data is
> > a better option. Then again, I really don't know too much about
> > databases so I could be totally wrong on that one. Anyone have any
> > experience working with lidar through databases?
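
As a rough illustration of "a database serving the application manageable
chunks of data", here is a minimal sketch in Python. It is an assumption-
laden stand-in, not a PostGIS recipe: it uses the standard-library sqlite3
module with plain x/y columns and an ordinary index, whereas PostGIS would
typically use a geometry column with a spatial index. The table name pts
and the helpers load_points and fetch_chunk are hypothetical.

```python
import sqlite3

def load_points(conn, points):
    """Store (x, y, z) tuples in a table indexed on the coordinates."""
    conn.execute("CREATE TABLE IF NOT EXISTS pts (x REAL, y REAL, z REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS pts_xy ON pts (x, y)")
    conn.executemany("INSERT INTO pts VALUES (?, ?, ?)", points)
    conn.commit()

def fetch_chunk(conn, west, south, east, north):
    """Return only the points inside one bounding box (one 'chunk')."""
    cur = conn.execute(
        "SELECT x, y, z FROM pts "
        "WHERE x >= ? AND x < ? AND y >= ? AND y < ? "
        "ORDER BY x, y",
        (west, east, south, north),
    )
    return cur.fetchall()

# In-memory database so the sketch runs without any server setup.
conn = sqlite3.connect(":memory:")
load_points(conn, [
    (0.5, 0.5, 10.0), (1.5, 0.5, 11.0),
    (0.5, 1.5, 12.0), (1.5, 1.5, 13.0),
])
# Ask for the western half only; the application never sees the rest.
chunk = fetch_chunk(conn, 0, 0, 1, 2)
```

The point of the design is that the application asks for one bounding box
at a time and never has to hold the full point cloud in memory.
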
> >
> > Cheers,
> >
> > Mike
> >
> > On 5-Oct-06, at 5:29 PM, Jonathan Greenberg wrote:
> >> I wonder (and I'm thinking out loud here) if there are ways to "tile"
> >> vector processes in an analogous (if not algorithmic) way to how we deal
> >> with massive raster datasets?  Are the issues I'm running into
> >> fundamentally something with older file formats, operating system/file
> >> system limitations, algorithmic maturity, or some mixture of all of
> >> these things?  As you pointed out, the Lidar community seems to have the
> >> most pressing need for these issues to get sorted out -- however as GIS
> >> analyses get more advanced and require more data, I'm guessing the
> >> average user may run into this as well.
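
One way to picture the "tiling" idea for vector data is to cut the
bounding box into fixed-size tiles and process the points of each tile
separately. The sketch below only generates the tile extents; it is a
minimal illustration, and the function name tile_bounds and its overlap
parameter are hypothetical (an overlap is assumed here to be useful for
reducing edge effects when interpolating tile by tile).

```python
import math

def tile_bounds(west, south, east, north, tile_size, overlap=0.0):
    """Yield (w, s, e, n) extents that cover the region tile by tile.

    Tiles are generated row by row from the south-west corner. Each tile
    is grown by `overlap` on every side but clipped to the full region.
    """
    nx = math.ceil((east - west) / tile_size)
    ny = math.ceil((north - south) / tile_size)
    for j in range(ny):
        for i in range(nx):
            w = west + i * tile_size
            s = south + j * tile_size
            e = min(w + tile_size, east)
            n = min(s + tile_size, north)
            yield (
                max(w - overlap, west),
                max(s - overlap, south),
                min(e + overlap, east),
                min(n + overlap, north),
            )
```

Each extent could then be handed to a separate import or interpolation
job, which is also what makes the work easy to spread across processes.
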
> >>
> >> On a related note, apparently ESRI may be releasing a new version of
> >> their geodatabase format to get around some of the filesize issues in
> >> their 9.2 release (the beta apparently has this functionality).  No word
> >> on whether it a) works or b) has algorithmic advances to deal with these
> >> DBs...
> >>
> >> --j
> >>
> >> On 10/5/06 4:16 PM, "Hamish" <hamish_nospam at yahoo.com> wrote:
> >>> Jonathan Greenberg wrote:
> >>>> Case in point: I just got this error on a v.in.ascii import of a
> >>>> ~200mb csv file with points:
> >>>>
> >>>> G_realloc: out of memory (I have 4gb RAM and plenty of swap space, and
> >>>> the program never hit that limit anyway).
> >>>
> >>> The vector format has a small but finite memory overhead for each
> >>> feature which makes more than several million data points impractical.
> >>>
> >>> To get around this v.in.ascii (and a couple of other modules) let you
> >>> load in vector data without building topology.  (v.in.ascii -b -t)
> >>>
> >>> Then it's unknown how many points you can load, but it's a lot.
> >>>
> >>> Without topology, about the only thing you can do with the data is run
> >>> it through v.surf.rst.
> >>>
> >>>
> >>> For multi-gigabyte x,y,z datasets (or x,y,f(x,y) just as well), you can
> >>> use r.in.xyz to bin it directly into a raster map.
> >>>
> >>> see:
> >>>   http://grass.ibiblio.org/grass63/manuals/html63_user/r.in.xyz.html
> >>>   http://hamish.bowman.googlepages.com/grassfiles#xyz
> >>>
> >>>
> >>> With regard to the vector library and LFS (large file support), I
> >>> think you can expect some "first user" problems. Radim commented on
> >>> this some time ago in the mailing lists; you will have to search there
> >>> for a better answer.
> >>>
> >>>
> >>> Hamish
> >>
> >> --
> >> Jonathan A. Greenberg, PhD
> >> NRC Research Associate
> >> NASA Ames Research Center
> >> MS 242-4
> >> Moffett Field, CA 94035-1000
> >> Office: 650-604-5896
> >> Cell: 415-794-5043
> >> AIM: jgrn307
> >> MSN: jgrn307 at hotmail.com
> >>
> >>
> >> _______________________________________________
> >> grassuser mailing list
> >> grassuser at grass.itc.it
> >> http://grass.itc.it/mailman/listinfo/grassuser

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341



