[GRASS-user] FW: Working with large vector files

Patton, Eric epatton at nrcan.gc.ca
Thu Oct 5 22:42:43 EDT 2006


Jonathan,

You can use PostgreSQL, MySQL, SQLite, and probably a few others as database
backends for GRASS. AFAIK, most SQL databases can handle creating and
manipulating multi-gigabyte databases without too much hassle. I haven't
worked with databases as large as the ones you're talking about, but I've
imported ~300 MB CSV files into SQLite in about 7 seconds. They're pretty
fast.

v.in.ascii will use whatever database driver you have connected via
db.connect, not just dbf.
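For what it's worth, here's a minimal Python sketch of the kind of SQLite bulk
load I timed above; inside GRASS you'd point db.connect at the resulting
database rather than load it by hand. The table and column names are just
illustrative assumptions, not anything GRASS requires.

```python
# Minimal sketch: bulk-load an x,y,radius CSV into SQLite using only
# Python's standard library. Table/column names are illustrative.
import csv
import sqlite3

def load_csv(csv_path, db_path, table="points"):
    con = sqlite3.connect(db_path)
    con.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (x REAL, y REAL, radius REAL)"
    )
    with open(csv_path, newline="") as f:
        # executemany streams all rows inside one transaction, which is
        # what makes large imports fast.
        con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)",
                        csv.reader(f))
    con.commit()
    return con
```

After that, a GRASS session connected to the same SQLite file via db.connect
can use the table as an attribute backend.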

As Hamish has indicated, the memory bottleneck is going to occur in
v.in.ascii, specifically the topology creation stage, not with the database
backend per se. 

~ Eric. 

-----Original Message-----
From: grassuser-bounces at grass.itc.it
To: grassuser at grass.itc.it
Sent: 10/5/2006 7:00 PM
Subject: [GRASS-user] FW: Working with large vector files

I sent a similar question about large vector files to a listserv I
moderate (starserv), and one of the users made the comment below
indicating that DBF files themselves can't be larger than 2 GB.  Can other
types of databases be used as the backend for vector files?  Is this
statement not true?  How would this behavior affect things like
v.in.ascii (which I noticed uses a process called dbf for most of the
importing)?

--j

-- 
Jonathan A. Greenberg, PhD
NRC Research Associate
NASA Ames Research Center
MS 242-4
Moffett Field, CA 94035-1000
Office: 650-604-5896
Cell: 415-794-5043
AIM: jgrn307
MSN: jgrn307 at hotmail.com

------ Forwarded Message
From: Richard Pollock <pollock at pcigeomatics.com>
Reply-To: <starserv at ucdavis.edu>
Date: Thu, 5 Oct 2006 18:40:54 -0400
To: <starserv at ucdavis.edu>
Conversation: Working with large vector files
Subject: RE: Working with large vector files

The file format can also be an issue. The maximum size of a .DBF file is
2 GB. That is because the file has to contain offsets to various other
locations within the file, and those offsets are stored as 32-bit
integers. No software that writes to a .DBF file can get around
this. Ideally, the software should detect when it has written as much
data as the output file format can accommodate, refuse to write any
more, close the file, and inform the user of the situation. If the
software keeps writing, then all it will do is convert a maximum-sized
file that is at least usable into a corrupt file that may contain more
data but is unusable because of messed-up offset values.
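The arithmetic behind that ceiling is simple, assuming signed 32-bit offsets:

```python
# A signed 32-bit integer tops out at 2**31 - 1, so file offsets stored
# that way cannot address any byte past roughly 2 GiB.
max_offset = 2**31 - 1
print(max_offset)            # 2147483647 bytes
print(max_offset / 2**30)    # just under 2.0 GiB
```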

Lots of file formats that have been around a while have similar
problems. When they were designed, people weren't worried about files
anywhere near 2GB in size.

So, the first thing is to find a storage format that is not
intrinsically limited in size.

I understand that GRASS can write to a PostGIS database. PostGIS is
based on PostgreSQL (a free, open-source DBMS), which has a maximum table
size of 32 TB. If GRASS can read a buffer-full of the input data,
process it, write the results out to a PostGIS table, and repeat
until all the input data are processed, then that may be your solution.
At least, as long as the processing doesn't involve displaying the data
(displaying very large datasets has its own problems).

Cheers,

Richard


  _____  

From: owner-starserv at ucdavis.edu [mailto:owner-starserv at ucdavis.edu]
On Behalf Of Jonathan Greenberg
Sent: Thursday, October 05, 2006 5:31 PM
To: STARServ
Subject: Re: Working with large vector files

I've been working on techniques to perform tree crown recognition using
high-spatial-resolution remote sensing imagery. The final output of
these algorithms is a polygon coverage representing each tree crown in
an image; as you can imagine, that's a LOT of trees for a standard
QuickBird image (on the order of 2 million polygons).  I understand that
I could subset the rasters and do smaller extractions, but this
is, at best, a hack.  There's been a lot of work on efficient handling
of massive raster images (look at RS packages like ENVI and GRASS), but
massive vector handling is seriously lagging.  Early estimates are
that I'd need about 25 or so subsets for a single QuickBird scene to
stay under the memory requirements.

Right now I'm just trying to import a CSV of xloc, yloc, crown radius
(the output of my crown mapping algorithm) into SOME GIS, perform a
buffer operation on that crown radius parameter (to give me the crown
polygon), and work with that layer.  ArcMap can actually import the
points, but the buffering process completely overwhelms it (I noticed
the DBF file hits 2 GB and then I get the error).  I'm trying GRASS right
now, but my first try also got a memory error (I'm working on a 32-bit PC
and a 64-bit Mac, incidentally).
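If it helps, the buffering step itself doesn't need to hold everything in
memory. A rough pure-Python sketch that streams the CSV and emits one
approximate circle polygon per row (the column names xloc/yloc/radius are my
guess from the description above):

```python
import csv
import math

def buffer_point(x, y, radius, segments=32):
    """Approximate the circular buffer around (x, y) as a polygon ring."""
    step = 2 * math.pi / segments
    return [(x + radius * math.cos(i * step),
             y + radius * math.sin(i * step))
            for i in range(segments)]

def crown_polygons(csv_path):
    """Stream crown polygons one at a time so memory use stays flat."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield buffer_point(float(row["xloc"]), float(row["yloc"]),
                               float(row["radius"]))
```

Each polygon could then be appended to a PostGIS- or SQLite-backed layer as it
is produced, rather than materializing all ~2 million polygons at once.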

Besides GRASS and ArcMap, what else could I be trying out?  I should
point out that ENVI also has a vector size problem: displaying large vectors
creates an out-of-memory error (at least on a 32-bit PC; I haven't tried
it on my Mac yet).

--j

On 10/5/06 2:08 PM, "Richard Pollock" <pollock at pcigeomatics.com> wrote:



What software created these large files in the first place?  

What format are the files in?

Cheers,

Richard

 

  _____  

From: owner-starserv at ucdavis.edu [mailto:owner-starserv at ucdavis.edu]
On Behalf Of Joanna Grossman
Sent: Thursday, October 05, 2006 4:23 PM
To: starserv at ucdavis.edu
Subject: Re: Working with large vector files

I'm not sure, Jonathan, but it's certainly worth trying out GRASS and
some of the other open source tools out there.
http://www.freegis.org/database/?cat=4

Good luck!

Joanna

Jonathan Greenberg <jgreenberg at arc.nasa.gov> wrote:


After banging my head against this issue for the Nth time, I'm putting out
a plaintive cry of "HELP!" -- I am working with (or would like to work with)
vector files which are larger than the 2 GB limit imposed on them by ArcMap
-- can anyone recommend a GIS program that CAN deal with massive vector
coverages -- efficiently would be nice, but simply being able to open and
process them without getting corruption errors would be a great start...

--j







------ End of Forwarded Message




