[postgis-users] Trying to import natural earth data into postgis 1.5.2

Nathan Gerber ngerber999 at gmail.com
Fri Dec 31 08:01:45 PST 2010


David,

I don't have my importer script anymore, but the function I built for
correcting the polygons is still around somewhere. I need official approval
before I can release the code publicly, though I doubt that will be a
problem. FYI, the code is written in Perl. Below is the basic logic I used,
as far as I can remember without looking at the code.


   1. Check the validity of the multipolygon.
   2. If invalid, split the multipolygon into an array of individual
   polygons with no inner rings. The inner rings also go into the array as
   polygons of their own.
   3. Check all polygons for validity.
      1. If valid, continue to the next polygon until finished.
      2. If invalid, check whether the start and end points are the same;
      if not, make them the same.
      3. If still invalid, check the reason (ST_IsValidReason) to see if
      there is a self-intersection and at what coordinate. If so, attempt
      to split at the self-intersection to create a new polygon. This is
      rather complex and doesn't work nicely in many cases. You may have to
      recurse through this, as there are sometimes multiple
      self-intersections in a single polygon, so be sure to check whether
      the current intersection coordinate is the same as the previous one
      to avoid getting into a possible infinite loop.
      4. If still invalid, check for "spurs", which are what I referred to
      as the zero area piers and peninsulas. This involves ripping the polygon
      into individual points and iterating through the array looking for a part
      that loops back on itself.
      5. As a last resort do a buffer(geom,0) on the geometry.
   4. Reassemble the polygons and do a quick-and-dirty check of whether the
   result is valid as a multipolygon. This works most of the time, as inner
   rings are rare in the data set, though they do occur.
   5. If that was not valid, check the polygons for duplicates using
   ST_Equals. The data sometimes contains duplicated polygons that cause
   issues.
   6. If still not valid, loop through all of the individual polygons by
   size, starting with the largest.
      1. For each polygon, check the other polygons, again by size starting
      with the largest, to see if they fall completely within it. Be sure
      to flag a polygon as used, and when checking additional polygons make
      sure you are comparing against the new multipolygon, to prevent
      issues caused by a bull's-eye in the middle of an inner ring.
   7. If there are still issues, take your new multipolygons and attempt to
   merge them with each other using ST_Union. This can fix some issues where
   the outer rings overlap each other.
   8. If still invalid, attempt an ST_Buffer(multigeom,0) and flag the result
   for manual inspection, as ST_Buffer occasionally has issues with
   multipolygons and will leave some of the rings missing.
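
The ring-level repairs in steps 3.2 and 3.4 can be sketched in a few lines.
This is only an illustration in Python, not the Perl code mentioned above:
the function names (close_ring, remove_spurs, is_usable_ring) are made up for
the example, rings are assumed to be plain lists of (x, y) tuples, and a spur
is assumed to backtrack over exactly the same vertices it went out on. Spurs
that return along slightly different coordinates would need a tolerance check
that is not shown here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]


def close_ring(ring: Ring) -> Ring:
    """Step 3.2: make the start and end point the same (returns a copy)."""
    if ring and ring[0] != ring[-1]:
        return ring + [ring[0]]
    return list(ring)


def remove_spurs(ring: Ring) -> Ring:
    """Step 3.4: drop zero-area spurs, i.e. paths that loop back on themselves.

    Assumption: the spur retraces exactly the same vertices (A -> B -> A).
    Returns a closed ring; the result may be degenerate (see is_usable_ring).
    """
    pts = close_ring(ring)[:-1]      # work on the open vertex list
    out: Ring = []
    for p in pts:
        if out and out[-1] == p:
            continue                 # repeated vertex, skip it
        if len(out) >= 2 and out[-2] == p:
            out.pop()                # previous vertex was the tip of a spur
            continue
        out.append(p)

    # The ring is cyclic, so also look for spurs across the start/end seam.
    while len(out) >= 3:
        if out[-1] == out[0]:
            out.pop()                # duplicate vertex across the seam
        elif out[-1] == out[1]:
            out.pop(0)               # first vertex is the tip of a spur
        elif out[-2] == out[0]:
            out.pop()                # last vertex is the tip of a spur
        else:
            break
    return close_ring(out)


def is_usable_ring(ring: Ring) -> bool:
    """A closed ring needs at least 4 points (3 distinct vertices + closure)."""
    return len(ring) >= 4 and ring[0] == ring[-1]


if __name__ == "__main__":
    # A square with a zero-area "pier" sticking out to (6, 2), left unclosed.
    square_with_pier = [(0, 0), (4, 0), (4, 2), (6, 2), (4, 2), (4, 4), (0, 4)]
    cleaned = remove_spurs(square_with_pier)
    print(cleaned, is_usable_ring(cleaned))
```

Rings that come out of this with fewer than 4 points are the ones that would
trigger the "Invalid number of points in LinearRing" error, so they are
better dropped than passed on to the later steps.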

You will need to work out your own rules for handling naming issues that
may exist in the data set. Also keep in mind that in some cases you may find
two distinct entries for a single political entity, and you will want to do
an ST_Union on them so you only have one entry in your database for a
country (or not, depending upon your needs).
--
Nathan Gerber


On Thu, Dec 30, 2010 at 11:26 PM, David Blackman <whizziwig at gmail.com> wrote:

> Do you still have the script? Would you be willing to share it? I'd be
> happy to put in the work to update & host it if I can get it working.
>
> thanks,
> --dave
>
> On Thu, Dec 30, 2010 at 8:50 PM, Nathan Gerber <ngerber999 at gmail.com>
> wrote:
> > I had to build a rather complex fixing script as the problems are quite
> > varied. Some could be fixed with a simple st_buffer(geom,0) while others
> > required more advanced cleaning. Below is an incomplete list of problems
> > I found while sorting through the data:
> >
> > Figure eights or as I like to call them, Loop-di-dos |><|
> > Self-Intersections at river heads due to the rather simplistic boundary
> > simplification algorithms used.
> > Boundaries that back tracked along themselves.
> > Boundaries that contained a line off the boundary that represented a
> > peninsula or pier that had zero area.
> > Overlapping borders that had to be cleaned for my data set so that no
> > single point could belong in two countries.
> > There were also a few inconsistencies with naming conventions that had
> > to be manually corrected based upon some research (thank you wikipedia).
> > Missing state/province and/or county/municipality borders that left a few
> > holes at the sub-country level in a few areas.
> > Some of my missing data and inconsistencies may have been corrected since
> > I pulled the data a year or so ago.
> >
> > I'd offer to send you my cleaned up data set but unfortunately it has
> > been updated with some proprietary data for Canada and Mexico.
> > --
> > Nathan Gerber
> >
> >
> > On Thu, Dec 30, 2010 at 7:03 PM, David Blackman <david at whizziwig.com> wrote:
> >>
> >> Hi all--
> >>
> >> I'm trying to import the 10m-admin-1 data from
> >> http://www.naturalearthdata.com/ to postgis 1.5.2. It's generating a
> >> lot of invalid geometry that I don't know how to fix.
> >>
> >> My import command looks like this:
> >>
> >> shp2pgsql -W LATIN1 -I 10m-admin-1-states-provinces-shp >
> >> 10m-admin-1-states-provinces-shp.sql
> >> (note, in postgis2, this entirely fails on invalid input characters,
> >> none of the character sets I tried worked)
> >>
> >> and of the import, 148 rows have invalid geometry, with errors like:
> >> blackmad=# select fips_1 FROM
> >> public."10m-admin-1-states-provinces-shp" WHERE ST_IsValid(the_geom) =
> >> false;
> >> NOTICE: Holes are nested at or near point 101.662 3.04074
> >> NOTICE: Self-intersection at or near point 120.185 22.9625
> >> NOTICE: IllegalArgumentException: Invalid number of points in
> >> LinearRing found 2 - must be 0 or >= 4
> >> NOTICE: Ring Self-intersection at or near point -47.3025 -16.0401
> >> NOTICE: Too few points in geometry component at or near point -65.458
> >> -22.1012
> >>
> >> This cleangeometry.sql script @
> >> http://www.sogis1.so.ch/sogis/dl/postgis/cleanGeometry.sql fixes most,
> >> but the LinearRing errors cause the script to choke.
> >>
> >> Can someone advise me on how to fix these errors, or where the problem
> >> lies (in the data or the import tool)?
> >>
> >> thanks
> >> --dave
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-users