[GRASS-user] speeding up v.clean for large datasets
Mark Wynter
mark at dimensionaledge.com
Thu Apr 25 23:33:36 PDT 2013
Thanks Markus for the explanation. I've set PostGIS as my backend. I'll report back as I get further into v.net.
On 22/04/2013, at 8:20 PM, Markus Metz wrote:
> On Mon, Apr 22, 2013 at 11:03 AM, Mark Wynter <mark at dimensionaledge.com> wrote:
>> Thanks Markus.
>> Tried the sqlite backend suggestion - no improvement - then read that sqlite is the default backend for GRASS 7.
>> I suspect the complexity of the input dataset may be the contributing factor. For example, I ran v.clean over the already cleaned OSM dataset (2.6M lines), and it took only a few minutes since there were no intersections and no duplicates to remove.
>
> I tested with an OSM road vector with 2.6M lines, the output has 5.3M
> lines: lots of intersections and duplicates which were cleaned in less
> than 15 minutes.
>
> I am surprised that you experience slow removal of duplicates;
> breaking lines should take much longer.
>
> About why removing duplicates takes longer at the end: when you have 5
> lines that could be duplicates you could check
>
> 1 with 2, 3, 4, 5
> 2 with 1, 3, 4, 5
> 3 with 1, 2, 4, 5
> 4 with 1, 2, 3, 5
> 5 with 1, 2, 3, 4
>
> or checking each combination only once:
>
> 1 with 2, 3, 4, 5
> 2 with 3, 4, 5
> 3 with 4, 5
> 4 with 5
>
> alternatively
>
> 2 with 1
> 3 with 1, 2
> 4 with 1, 2, 3
> 5 with 1, 2, 3, 4
>
> The current implementation uses the latter.
>
> Markus M
>
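The three pairing schemes above can be written out as a small Python sketch. This is only an illustration of the comparison order Markus describes, not the v.clean source; the function names are made up for the example, and lines are simply numbered 1..n. It shows that the first scheme does every check twice, and that with the third scheme line i is compared with the i - 1 lines before it, so the work per line grows towards the end of the dataset.

```python
def pairs_all(n):
    """Scheme 1: each line against every other line, n * (n - 1) checks."""
    return [(i, j) for i in range(1, n + 1) for j in range(1, n + 1) if i != j]


def pairs_forward(n):
    """Scheme 2: each line against the lines that come after it."""
    return [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]


def pairs_backward(n):
    """Scheme 3: each line against the lines that come before it."""
    return [(i, j) for i in range(2, n + 1) for j in range(1, i)]


def checks_per_line(n):
    """Number of comparisons made for each line under scheme 3."""
    return {i: i - 1 for i in range(1, n + 1)}


if __name__ == "__main__":
    n = 5
    print(len(pairs_all(n)), len(pairs_forward(n)), len(pairs_backward(n)))
    print(checks_per_line(n))
```

Schemes 2 and 3 cover the same unordered pairs, n * (n - 1) / 2 of them; they differ only in which line of each pair drives the check.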
>>
>>
>>> Something is wrong there. Your dataset has 971074 roads, I tested with
>>> an OSM dataset with 2645287 roads, 2.7 times as many as in your
>>> dataset. Cleaning these 2645287 lines took me less than 15 minutes. I
>>> suspect a slow database backend (dbf). Try to use sqlite as database
>>> backend:
>>>
>>> db.connect driver=sqlite database=$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db
>>>
>>> Do not substitute the variables.
>>>
>>> HTH,
>>>
>>> Markus M
>>