[GRASS-user] large shapefile not importing properly with v.import

Markus Metz markus.metz.giswork at gmail.com
Sun Oct 22 12:42:22 PDT 2017


On Fri, Oct 20, 2017 at 10:07 PM, Helmut Kudrnovsky <hellik at web.de> wrote:
>
> >Can you provide examples where v.in.ogr with cleaning/polygon conversion did
> >not work, but v.in.ogr -c + v.clean produced better results?
>
> Really good data sets with all kinds of (topological) mess (from [correct
> and incorrect] overlaps to self-intersections, etc.) are:
>
> o Natura 2000 data (~ 1GB):
>
> https://www.eea.europa.eu/data-and-maps/data/natura-8#tab-gis-data

Apparently, there are a lot of polygons in Natura2000 data that are really
overlapping, e.g.
SITECODE: UK0030395
SITENAME: Southern North Sea
with
SITECODE: UK0030352
SITENAME: Dogger Bank

Maybe some sites have been updated (both spatial delineation and name), but
the old versions have not been deleted.

v.in.ogr without snapping gives me lots of warnings about
WARNING: Unable to calculate area centroid

This is a symptom of floating-point precision errors, so I tried v.in.ogr
with snapping. With snap=1e-3, these warnings disappeared. Small areas
could still be removed with v.clean tool=rmarea. In this example, there are
still 75 areas smaller than 100 square meters, which are most probably noise.

A hint for snapping: for these data, v.in.ogr suggests a range of [1e-08, 1]
for suitable snapping values, so the exponent ranges from -8 to 0.
Testing every possible value in this range obviously takes a lot of time.

You could set low = -8, high = 0, and mid = (low + high) / 2 = -4.
Test with snap=1e$mid.
If you still get these warnings, increase: set low to mid and compute the
new mid as (low + high) / 2;
otherwise, decrease: set high to mid and compute the new mid as
(low + high) / 2.
Continue until you find the threshold where these warnings just disappear.
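The bisection above can be sketched in Python. This is a minimal sketch, assuming a hypothetical helper has_warnings(snap) that you would write yourself to run v.in.ogr with that snap value and report whether the "Unable to calculate area centroid" warnings still appear. It also assumes the warnings disappear monotonically as snap grows. Bisecting on the exponent means each test halves the range in log space: with -8..0 and a tolerance of 0.25, it takes 5 imports instead of trying every value.

```python
from typing import Callable


def find_snap_exponent(has_warnings: Callable[[float], bool],
                       low: float = -8.0, high: float = 0.0,
                       tol: float = 0.25) -> float:
    """Bisect on the exponent e of snap = 10**e (hypothetical helper).

    Assumes warnings occur at snap=10**low, not at snap=10**high, and that
    a larger snap never brings the warnings back.
    Returns an exponent at which has_warnings() reported no warnings.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if has_warnings(10 ** mid):
            low = mid   # still warnings: more snapping needed
        else:
            high = mid  # clean import: try less snapping
    return high
```

Using 10 ** mid instead of pasting the exponent into "1e$mid" also keeps fractional exponents such as -3.5 valid, since "1e-3.5" is not a valid number.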

Snapping is slow and uses quite a bit of memory because it needs a spatial
search tree. The nearest-neighbor tree (kd tree) currently used could do
with some more optimization, but I (as the author of that beast) would need
quite a bit of time to come up with a faster balancing method.
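To illustrate why snapping needs a spatial index: a naive version compares every vertex with every other vertex, which is O(n²) for millions of vertices. The sketch below swaps in a uniform hash grid with cell size equal to the threshold instead of the kd tree GRASS uses, so each vertex is only compared with already kept vertices in the surrounding 3x3 cells. It is a simplified model that only snaps vertices to earlier vertices; it does not reproduce what v.in.ogr actually does with boundaries, and snap_vertices is a hypothetical name.

```python
import math


def snap_vertices(points, thresh):
    """Replace each point by the nearest already-kept point within thresh.

    Points with no kept point within thresh are kept as-is and become
    snap targets for later points. A hash grid serves as the spatial index.
    """
    grid = {}   # (cell_x, cell_y) -> list of indices into kept
    kept = []   # snap targets accepted so far
    out = []
    for x, y in points:
        cx, cy = math.floor(x / thresh), math.floor(y / thresh)
        best, best_d = None, thresh
        # With cell size == thresh, any kept point within thresh lies in the
        # 3x3 block of cells around this one (up to float rounding at edges).
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for k in grid.get((cx + dx, cy + dy), ()):
                    d = math.hypot(x - kept[k][0], y - kept[k][1])
                    if d <= best_d:
                        best, best_d = k, d
        if best is None:
            grid.setdefault((cx, cy), []).append(len(kept))
            kept.append((x, y))
            out.append((x, y))
        else:
            out.append(kept[best])
    return out
```

The memory cost mentioned above shows up here too: the index has to hold every kept vertex for the whole run.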
>
> o World database of protected areas (~ 1 GB):
>
> https://www.protectedplanet.net/
>
> >The real cleaning happens only if the snap option is set to > 0.
>
> v.in.ogr gives some hints about the snap option, but sometimes I don't know
> what the optimal setting should be.
>
> >noticed that v.in.ogr complains about overlapping areas, which were input
> >polygons that should not overlap, but snapping did not help there, instead
> >I needed to remove small areas afterwards with v.clean.
>
> same experiences here.
>
> >Should the current min_area option of v.in.ogr also be used to remove small
> >areas in the output?
>
> never used this option:
>
> min_area=float
>     Minimum size of area to be imported (square meters)
>     Smaller areas and islands are ignored. Should be greater than snap^2
>     Default: 0.0001
>
> Do you mean that small areas shouldn't be imported, or that they should be
> merged into the neighboring area with the longest shared boundary?

Thinking about it, small areas should be removed afterwards with v.clean
tool=rmarea.
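For the 100 square meter case above, that would be something like v.clean input=natura2000 output=natura2000_clean tool=rmarea threshold=100 (map names are hypothetical, and this assumes a projection in meters). According to the v.clean manual, rmarea removes the longest boundary between a small area and an adjacent area, so the small area ends up merged into the neighbor it shares the most boundary with, i.e. the second option Helmut describes. The toy model below works on an adjacency table (area sizes plus shared boundary lengths) instead of real geometry. remove_small_areas and its inputs are hypothetical, and isolated small areas without neighbors are simply left in place here, which may differ from v.clean.

```python
from collections import defaultdict


def remove_small_areas(sizes, shared, threshold):
    """Toy model: merge each small area into its longest-boundary neighbor.

    sizes:  {area_id: area}
    shared: {frozenset({a, b}): shared boundary length}, with a != b
    Returns (sizes, parent); parent maps each removed id to the id it
    was merged into.
    """
    sizes = dict(sizes)
    nbrs = defaultdict(dict)  # area -> {neighbor: shared boundary length}
    for pair, length in shared.items():
        a, b = tuple(pair)
        nbrs[a][b] = nbrs[a].get(b, 0) + length
        nbrs[b][a] = nbrs[b].get(a, 0) + length
    parent = {}
    # Process smallest first; the order is fixed from the initial sizes.
    for small in sorted(sizes, key=sizes.get):
        if small not in sizes or sizes[small] >= threshold or not nbrs[small]:
            continue  # already merged, big enough, or isolated (left as-is)
        target = max(nbrs[small], key=nbrs[small].get)
        sizes[target] += sizes.pop(small)
        parent[small] = target
        # Drop the boundary to target; re-attach the other boundaries to it.
        for other, length in nbrs.pop(small).items():
            del nbrs[other][small]
            if other != target:
                nbrs[target][other] = nbrs[target].get(other, 0) + length
                nbrs[other][target] = nbrs[other].get(target, 0) + length
    return sizes, parent
```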

Markus M
> -----
> best regards
> Helmut