[GRASS-user] speeding up v.clean for large datasets
Mark Wynter
mark at dimensionaledge.com
Fri Apr 19 18:02:15 PDT 2013
Thanks Markus.
I upgraded to GRASS 7 and re-ran v.clean on the same OSM Australia dataset.
It was substantially faster. The bulk of the time went on removing duplicates, and that step got exponentially slower as the process approached 100%. Overall it took 12 hours, but I'm wondering how it would perform if we were to repeat v.clean for even larger road networks, e.g. USA or Europe?
I'm tempted to try dividing the input dataset into, say, 4 smaller subregions (i.e. vector tiles), cleaning each one, and then patching them back together.
I read that we will still need to run v.clean over the patched datasets to remove duplicates.
Since the only duplicates should be nodes along the common tile edges, is there a way to in effect constrain the v.clean process to slivers containing the common edges?
I've had a quick go at g.region but to no avail.
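To make the tiling idea concrete, here is a minimal Python sketch of the geometry only: it splits a bounding box into a grid of tiles and derives a narrow strip around each shared tile edge. The function names, the extent and the strip width are illustrative assumptions, not values from the OSM dataset, and nothing here calls GRASS.

```python
# Sketch only: pure geometry, no GRASS calls.
# Function names, extents and strip width are illustrative assumptions.

def tile_bounds(west, south, east, north, cols=2, rows=2):
    """Split a bounding box into cols x rows tiles, each (w, s, e, n)."""
    dx = (east - west) / cols
    dy = (north - south) / rows
    return [
        (west + i * dx, south + j * dy,
         west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(rows)
        for i in range(cols)
    ]


def seam_strips(west, south, east, north, cols=2, rows=2, half_width=0.01):
    """Narrow boxes centred on each internal tile edge, each (w, s, e, n)."""
    dx = (east - west) / cols
    dy = (north - south) / rows
    strips = []
    for i in range(1, cols):  # vertical seams between tile columns
        x = west + i * dx
        strips.append((x - half_width, south, x + half_width, north))
    for j in range(1, rows):  # horizontal seams between tile rows
        y = south + j * dy
        strips.append((west, y - half_width, east, y + half_width))
    return strips


if __name__ == "__main__":
    # Made-up extent, just to show the shape of the output.
    for box in tile_bounds(0, 0, 4, 2):
        print("tile ", box)
    for box in seam_strips(0, 0, 4, 2, half_width=0.5):
        print("strip", box)
```

Whether v.clean can actually be restricted to such strips is exactly the open question; the sketch only shows how the boxes would be derived.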
Thanks
GRASS 7.0.svn (PERMANENT):/data/grassdata > v.clean input=osm_roads_split output=osm_roads_split_cleaned tool=break type=line -c
--------------------------------------------------
Tool: Threshold
Break: 0
--------------------------------------------------
Copying vector features...
Copying features...
100%
Rebuilding parts of topology...
Building topology for vector map <osm_roads_split_cleaned@PERMANENT>...
Registering primitives...
971074 primitives registered
13142529 vertices registered
Number of nodes: 1458192
Number of primitives: 971074
Number of points: 0
Number of lines: 971074
Number of boundaries: 0
Number of centroids: 0
Number of areas: -
Number of isles: -
--------------------------------------------------
Tool: Break lines at intersections
100%
Tool: Remove duplicates
100%
--------------------------------------------------
Rebuilding topology for output vector map...
Building topology for vector map <osm_roads_split_cleaned@PERMANENT>...
Registering primitives...
2462829 primitives registered
13322052 vertices registered
Building areas...
100%
0 areas built
0 isles built
Attaching islands...
Attaching centroids...
100%
Number of nodes: 1819237
Number of primitives: 2462829
Number of points: 0
Number of lines: 2462829
Number of boundaries: 0
Number of centroids: 0
Number of areas: 0
Number of isles: 0
On 19/04/2013, at 6:07 PM, Markus Metz wrote:
> On Fri, Apr 19, 2013 at 9:06 AM, Mark Wynter <mark at dimensionaledge.com> wrote:
>> Hi All, we're looking for ways to speed up the cleaning of a large OSM road network (relating to Australia). We're running on a large Amazon AWS EC2 instance.
>>
>> What we've observed is exponential growth in time taken as number of linestrings increases.
>>
>> This means it's taking about 3 days to clean entire network.
>>
>> We were wondering if we were to split the dataset into say 4 subregions, and clean each separately, is it then possible to patch them back together at the end without having to run v.clean afterwards? We want to be able to run v.net over the entire network spanning the subregions.
>>
>> Alternatively, has anyone found a way to speed up v.clean for large network datasets?
>
> Yes, implemented in GRASS 7 ;-)
>
> Also, when breaking lines it is recommended to split the lines first
> in smaller segments with v.split using the vertices option. Then run
> v.clean tool=break. After that, use v.build.polylines to merge lines
> again. Or use in GRASS 7 the -c flag with v.clean tool=break
> type=line. The rmdupl tool is then automatically added, and the
> splitting and merging is done internally.
>
> Markus M
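
The two routes Markus describes can be written out as command lines. The Python sketch below only builds and prints them; it does not run GRASS. The map names (osm_roads, osm_roads_cleaned and the intermediate names) and the vertices value of 50 are assumptions chosen for illustration; the module names and the tool=break, type=line and -c options are the ones named in the message above.

```python
# Sketch only: builds the GRASS command lines, does not execute them.
# Map names and the max_vertices value are illustrative assumptions.

def split_break_merge(in_map, out_map, max_vertices=50):
    """Explicit route: v.split, then v.clean tool=break, then v.build.polylines."""
    split_map = in_map + "_split"    # hypothetical intermediate map name
    broken_map = in_map + "_broken"  # hypothetical intermediate map name
    return [
        ["v.split", "input=" + in_map, "output=" + split_map,
         "vertices=%d" % max_vertices],
        ["v.clean", "input=" + split_map, "output=" + broken_map,
         "tool=break", "type=line"],
        ["v.build.polylines", "input=" + broken_map, "output=" + out_map],
    ]


def combined_break(in_map, out_map):
    """GRASS 7 route: -c adds rmdupl and does the split/merge internally."""
    return [
        ["v.clean", "-c", "input=" + in_map, "output=" + out_map,
         "tool=break", "type=line"],
    ]


if __name__ == "__main__":
    for cmd in split_break_merge("osm_roads", "osm_roads_cleaned"):
        print(" ".join(cmd))
    for cmd in combined_break("osm_roads", "osm_roads_cleaned"):
        print(" ".join(cmd))
```

Each inner list is one command, so the printed lines can be pasted into a GRASS session after adjusting the map names.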