[GRASS-user] speeding up v.clean for large datasets

Fri Apr 19 00:06:47 PDT 2013

Hi all, we're looking for ways to speed up the cleaning of a large OSM road network covering Australia. We're running on a large Amazon AWS EC2 instance.

What we've observed is that the time taken appears to grow exponentially as the number of linestrings increases.

This means it's taking about 3 days to clean the entire network.

We were wondering: if we split the dataset into, say, 4 subregions and clean each one separately, is it then possible to patch them back together at the end without having to run v.clean afterwards? We want to be able to run v.net over the entire network spanning the subregions.
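
To make the idea concrete, here is a rough dry-run sketch of the split/clean/patch workflow we have in mind. It only prints the GRASS commands, it does not run them. The tile bounds are assumed round lat/lon numbers for illustration, and the tile and box map names (box_sw, osm_roads_sw, and so on) are hypothetical. The modules used (g.region, v.in.region, v.overlay, v.clean, v.patch) and their options are written as we understand them for GRASS 6.4, so please check them against the manual pages. Whether the patched result still needs a v.clean pass is exactly the open question, and this sketch does not settle it.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the GRASS commands for cleaning a
# vector map in four tiles and patching the cleaned tiles back together.

# ASSUMPTION: rough integer lat/lon bounds, chosen for illustration only.
W=112; E=154; S=-44; N=-10
MIDX=$(( (W + E) / 2 ))   # longitude of the vertical split line
MIDY=$(( (S + N) / 2 ))   # latitude of the horizontal split line

# Print the commands for one tile. Arguments: name west east south north.
tile_commands() {
    echo "g.region w=$2 e=$3 s=$4 n=$5"
    # Turn the current region into an area map to clip against.
    echo "v.in.region output=box_${1}"
    # Keep only the parts of the lines that fall inside the tile box.
    echo "v.overlay ainput=${MAP} atype=line binput=box_${1} operator=and output=${MAP}_${1}"
    # Same cleaning tools as in the full run.
    echo "v.clean input=${MAP}_${1} output=${MAP}_${1}_cleaned tool=break,rmdupl"
}

# Print the commands for all four tiles plus the final patch step.
all_commands() {
    MAP=$1
    tile_commands sw "$W" "$MIDX" "$S" "$MIDY"
    tile_commands se "$MIDX" "$E" "$S" "$MIDY"
    tile_commands nw "$W" "$MIDX" "$MIDY" "$N"
    tile_commands ne "$MIDX" "$E" "$MIDY" "$N"
    echo "v.patch input=${MAP}_sw_cleaned,${MAP}_se_cleaned,${MAP}_nw_cleaned,${MAP}_ne_cleaned output=${MAP}_patched"
}

all_commands osm_roads
```

Once the printed commands look right, the output could be piped to sh inside a GRASS session. Note that the g.region calls would change the current region. Since the four v.clean runs are independent of each other, they could also be run in parallel on a multi-core instance.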

Alternatively, has anyone found a way to speed up v.clean for large network datasets?

GRASS 6.4.3svn (road_network):/data/grassdata > v.clean input=osm_roads output=osm_roads_cleaned tool=break,rmdupl
--------------------------------------------------
Tool: Threshold
Break: 0.000000e+00
Remove duplicates: 0.000000e+00
--------------------------------------------------
Copying vector lines...
Rebuilding parts of topology...
Building topology for vector map <osm_roads_cleaned>...
Registering primitives...
971074 primitives registered
13142529 vertices registered
Number of nodes: 1458192
Number of primitives: 971074
Number of points: 0
Number of lines: 971074
Number of boundaries: 0
Number of centroids: 0
Number of areas: -
Number of isles: -
--------------------------------------------------
Tool: Break lines at intersections