[GRASS-user] speeding up v.clean for large datasets
Markus Metz
markus.metz.giswork at gmail.com
Mon Apr 22 03:20:01 PDT 2013
On Mon, Apr 22, 2013 at 11:03 AM, Mark Wynter <mark at dimensionaledge.com> wrote:
> Thanks Markus.
> Tried the SQLite backend suggestion, with no improvement, then read that SQLite is the default backend for GRASS 7.
> I suspect the complexity of the input dataset may be the contributing factor. For example, I ran v.clean over the already cleaned OSM dataset (2.6M lines), and it took only a few minutes since there were no intersections and no duplicates to remove.
I tested with an OSM road vector with 2.6M lines; the output has 5.3M
lines. There were lots of intersections and duplicates, and they were
cleaned in less than 15 minutes.
I am surprised that you experience slow removal of duplicates;
breaking lines should take much longer.
As for why removing duplicates takes longer towards the end: when you
have 5 lines that could be duplicates, you could check
1 with 2, 3, 4, 5
2 with 1, 3, 4, 5
3 with 1, 2, 4, 5
4 with 1, 2, 3, 5
5 with 1, 2, 3, 4
or check each combination only once:
1 with 2, 3, 4, 5
2 with 3, 4, 5
3 with 4, 5
4 with 5
or, alternatively:
2 with 1
3 with 1, 2
4 with 1, 2, 3
5 with 1, 2, 3, 4
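The current implementation uses the latter: each candidate line is
compared only with the candidates before it, so later lines need more
checks than earlier ones, which is why duplicate removal slows down
towards the end.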
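To make the counting concrete, here is a small POSIX shell sketch. It is purely illustrative and not taken from the GRASS sources, and the function names print_checks and count_checks are hypothetical. It reproduces the last listing for n candidate lines and counts the checks. Line k needs k-1 checks, so the total is 1 + 2 + ... + (n-1) = n(n-1)/2. That is half of the n(n-1) checks in the first listing, but each line still costs one check more than the line before it.

```shell
#!/bin/sh
# Hypothetical illustration (not GRASS code) of the "compare with earlier
# lines" scheme: candidate line k is checked against lines 1 .. k-1.

# Print which earlier lines each line is compared with, for n candidates.
print_checks() {
  n=$1
  k=2
  while [ "$k" -le "$n" ]; do
    list=""
    j=1
    while [ "$j" -lt "$k" ]; do
      if [ -z "$list" ]; then list="$j"; else list="$list, $j"; fi
      j=$((j + 1))
    done
    echo "$k with $list"
    k=$((k + 1))
  done
}

# Total number of pairwise checks: 1 + 2 + ... + (n-1) = n(n-1)/2.
count_checks() {
  n=$1
  total=0
  k=2
  while [ "$k" -le "$n" ]; do
    total=$((total + k - 1))   # line k adds k-1 checks
    k=$((k + 1))
  done
  echo "$total"
}

print_checks 5
echo "total checks for 5 lines: $(count_checks 5)"
```

For 5 lines this prints the four rows of the last listing, followed by a total of 10 checks.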
Markus M
>
>
>> Something is wrong there. Your dataset has 971074 roads, I tested with
>> an OSM dataset with 2645287 roads, 2.7 times as many as in your
>> dataset. Cleaning these 2645287 lines took me less than 15 minutes. I
>> suspect a slow database backend (DBF). Try using SQLite as the database
>> backend:
>>
>> db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db'
>>
>> Do not substitute the variables.
>>
>> HTH,
>>
>> Markus M
>
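The following is a minimal sketch of the quoted db.connect suggestion, assuming a POSIX shell. Single quotes keep $GISDBASE, $LOCATION_NAME and $MAPSET literal, so GRASS fills them in rather than the shell. This is what "Do not substitute the variables" asks for. The variable name db_path is hypothetical. The GISRC test is an assumption, used here only to skip the GRASS commands when the script runs outside a GRASS session.

```shell
#!/bin/sh
# Keep the placeholders literal: single quotes stop the shell from
# expanding them, so db.connect receives them unchanged.
db_path='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db'
echo "database=$db_path"

# Only run the GRASS commands inside a GRASS session (GISRC is set there)
# and when db.connect is actually available on the PATH.
if [ -n "$GISRC" ] && command -v db.connect >/dev/null 2>&1; then
  db.connect driver=sqlite database="$db_path"
  db.connect -p   # print the current connection to confirm the change
fi
```

Outside GRASS the script only echoes the literal setting. Expanding "$db_path" in double quotes passes its value through without re-expanding the $ placeholders inside it.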