[GRASS-user] speeding up v.clean for large datasets

Thu Apr 25 23:33:36 PDT 2013

Thanks, Markus, for the explanation. I've set PostGIS as my backend. I'll report back as I get further into v.net.

On 22/04/2013, at 8:20 PM, Markus Metz wrote:

> On Mon, Apr 22, 2013 at 11:03 AM, Mark Wynter <mark at dimensionaledge.com> wrote:
>> Thanks Markus.
>> I tried the sqlite backend suggestion with no improvement, then read that sqlite is already the default backend for GRASS 7.
>> I suspect the complexity of the input dataset is the contributing factor. For example, I ran v.clean over the already cleaned OSM dataset (2.6M lines), and it took only a few minutes because there were no intersections and no duplicates to remove.
> 
> I tested with an OSM road vector with 2.6M lines; the output has 5.3M
> lines. There were lots of intersections and duplicates, which were
> cleaned in less than 15 minutes.
> 
> I am surprised that you experience slow removal of duplicates;
> breaking lines should take much longer.
> 
> About why removing duplicates takes longer at the end: when you have 5
> lines that could be duplicates, you could check
> 
> 1 with 2, 3, 4, 5
> 2 with 1, 3, 4, 5
> 3 with 1, 2, 4, 5
> 4 with 1, 2, 3, 5
> 5 with 1, 2, 3, 4
> 
> or checking each combination only once:
> 
> 1 with 2, 3, 4, 5
> 2 with 3, 4, 5
> 3 with 4, 5
> 4 with 5
> 
> or, alternatively:
> 
> 2 with 1
> 3 with 1, 2
> 4 with 1, 2, 3
> 5 with 1, 2, 3, 4
> 
> The current implementation uses the latter.
> 
> Markus M
> 
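The three enumeration orders described above can be compared with a small counting sketch. It only illustrates the arithmetic of the pair checks; it is not the v.clean source, and it does not attempt to model how v.clean actually selects candidate lines. The function names are hypothetical. The last function shows why, under the "check line i against earlier lines" scheme, each successive line costs more than the previous one, which is consistent with duplicate removal slowing down towards the end.

```python
def all_ordered_pairs(n):
    """Scheme 1: every line i is checked against every other line j (each pair twice)."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]


def forward_pairs(n):
    """Scheme 2: each unordered pair once, line i against the later lines j > i."""
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]


def backward_pairs(n):
    """Scheme 3 (the one the current implementation uses, per the mail):
    each unordered pair once, line i against the earlier lines j < i."""
    return [(i, j) for i in range(2, n + 1) for j in range(1, i)]


def backward_work_per_line(n):
    """Number of checks performed when line i is processed under scheme 3.

    Line i is compared with the i - 1 lines before it, so the cost per line
    grows linearly as processing advances through the dataset.
    """
    return [i - 1 for i in range(1, n + 1)]


if __name__ == "__main__":
    n = 5
    print(len(all_ordered_pairs(n)))   # n * (n - 1) checks
    print(len(forward_pairs(n)))       # n * (n - 1) / 2 checks
    print(len(backward_pairs(n)))      # n * (n - 1) / 2 checks
    print(backward_work_per_line(n))   # cost per line rises towards the end
```

For the 5-line example this prints 20, 10, 10 and `[0, 1, 2, 3, 4]`: schemes 2 and 3 do half the work of scheme 1, and scheme 3 concentrates that work on the later lines.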
>> 
>> 
>>> Something is wrong there. Your dataset has 971074 roads; I tested with
>>> an OSM dataset with 2645287 roads, 2.7 times as many as in your
>>> dataset. Cleaning these 2645287 lines took me less than 15 minutes. I
>>> suspect a slow database backend (dbf). Try using sqlite as the database
>>> backend:
>>> 
>>> db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db'
>>> 
>>> Do not substitute the variables.
>>> 
>>> HTH,
>>> 
>>> Markus M
>>