[GRASS-dev] Re: [GRASS GIS] #516: v.extract slow on large datasets

GRASS GIS trac at osgeo.org
Wed Mar 4 02:26:59 EST 2009

#516: v.extract slow on large datasets
  Reporter:  gisboa       |       Owner:  grass-dev at lists.osgeo.org
      Type:  enhancement  |      Status:  new                      
  Priority:  minor        |   Milestone:  6.4.0                    
 Component:  Vector       |     Version:  unspecified              
Resolution:               |    Keywords:                           
  Platform:  All          |         Cpu:  All                      
Comment (by mmetz):

 Replying to [ticket:516 gisboa]:
 > Using v.extract on large datasets is incredibly slow. From a 3,000,000
 > areas dataset I extracted the first 99 (id<100). It took 12 minutes to
 > extract the geometries,

 There are probably several reasons for this. The spatial index is built
 from topology, which can take a while. The category index used to select
 features is rather inefficient for large numbers of categories. Both of
 these are handled by the vector libs. v.extract itself also has potential
 for speed improvement. Changes to the spatial index and the category
 index in the vector libs will only be made in grass7. Improving v.extract
 is possible for grass6 and I have some ideas, but I won't get to it soon,
 and I don't know whether anybody else will rewrite v.extract soon.
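
 To illustrate the point about category lookups, here is a minimal sketch
 in Python of why selecting a small range of categories is cheaper against
 a sorted index than with repeated full scans. It is not the GRASS vector
 library code and does not claim to show how the vector libs actually
 store or search categories. The names cat_index, select_linear and
 select_sorted are hypothetical, and the (category, feature id) pair
 layout is an assumption made purely for illustration.

```python
from bisect import bisect_left, bisect_right


def select_linear(cat_index, wanted):
    """Scan the whole index once per wanted category (slow on large data)."""
    hits = []
    for cat in wanted:
        for c, fid in cat_index:
            if c == cat:
                hits.append(fid)
    return hits


def select_sorted(cat_index, lo, hi):
    """Return feature ids with lo <= category <= hi.

    Assumes cat_index is a list of (category, feature_id) tuples sorted
    by category, so both range bounds can be found by binary search.
    """
    # (lo,) sorts before every (lo, fid), so this finds the first match.
    start = bisect_left(cat_index, (lo,))
    # (hi, inf) sorts after every (hi, fid), so this lands just past the last match.
    end = bisect_right(cat_index, (hi, float("inf")))
    return [fid for _, fid in cat_index[start:end]]


# Hypothetical data: 3,000 features, feature id = category + 1000.
cat_index = sorted((cat, cat + 1000) for cat in range(1, 3001))

linear_hits = select_linear(cat_index, range(1, 100))
sorted_hits = select_sorted(cat_index, 1, 99)
```

 Both functions return the same features for id<100, but the sorted
 variant needs two binary searches plus a slice, while the linear variant
 walks the full index once for every requested category.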

 > after that it says 'writing attributes' for another 6 minutes. The pg
 > process is a runner-up in top, consuming about 50% cpu time,

 I think Glynn answered that in his comment to #513.

 > Would this be another reason to implement the file based geometry index?

 Probably yes. But that's not easy. There are "off-the-shelf" solutions for
 that, but 1) someone needs to evaluate these solutions for their
 suitability for grass, and 2) someone has to implement it.

 > Maybe a few modules should be rewritten to perform a dedicated task on
 > their own, instead of relying on others, if that makes it slow.

 AFAICT, v.extract does not rely on other modules; it uses library
 functions only. IMHO, modules should not bypass core libraries. If a
 particular task is done inefficiently by the core libraries, these
 libraries need to be improved. A workaround for a specific module would
 only create a mess.

Ticket URL: <http://trac.osgeo.org/grass/ticket/516#comment:1>
GRASS GIS <http://grass.osgeo.org>
