[postgis-users] Getting TopologyExections when trying to node linestrings to create an overlay

BladeOfLight16 bladeoflight16 at gmail.com
Tue Feb 3 08:28:35 PST 2015


I'm trying to create a polygon overlay. The basic process is relatively
simple: 1) Get the boundaries 2) Union the boundaries to node the
linestrings 3) Polygonize the noded outlines 4) Filter out holes using a
contains or intersects test. The problem I'm running into is that I'm
getting "GEOSUnaryUnion: TopologyException: found non-noded intersection
between LINESTRING" errors from ST_Union.

I've taken the liberty of creating a script that will set up a database to
reproduce what I'm seeing. I've gzipped it in the interest of size, since
it's got a fairly sizable set of polygons. Let me know if this delivery
method is a problem; I'll be happy to use other means. You can find it
here:
https://drive.google.com/file/d/0B_6I7kRgE8teVUpha2Q4ZlNDMWs/view?usp=sharing.
(I originally tried to attach it, but the PostGIS mailing list rejected
it.) The contents are a plain SQL file. It won't create a database for you,
but it will create the PostGIS extension IF NOT EXISTS. Specifically for
this problem, it creates and populates this table:

CREATE TABLE error_generating_polygons (
    geom_set_id SERIAL PRIMARY KEY,
    outer_boundary geometry NOT NULL,
    polygons geometry[] NOT NULL,
    error_code text NOT NULL,
    error_message text NOT NULL
);

It's a little weird, so I think I should provide some explanation. The
primary key is just a surrogate. The outer_boundary can be ignored for the
purposes of this error; it's there for me for that filtering out holes and
chaff after I get the raw overlay back. The polygons column is the most
interesting; it contains a set of polygons that will reproduce this error.
(I'll get into why I use an array in a minute.) The error_code and
error_message are the values of SQLSTATE and SQLERRM I get when noding the
boundaries; I've included them so that anyone can compare if they get
different results.

The reason I have an array of polygons is that this was the simplest method
of providing a large group of polygons. The data you're seeing here is not
sample data I've made up. This is real world data provided to me by a
client, who has produced them primarily through a mixture of GPS, manual
processing, and probably some more or less "automated" processing. In
short, it's pretty messy, and these are the actual polygons I need to
overlay. As for why I'm giving you these big groups instead of breaking
things down into smaller chunks, I did try that, at least somewhat. I found
that if I broke this up into pairs of polygons, I could only reproduce the
error with a *single* pair out of thousands of combinations. So sorry for
the shear number of polygons here, but apparently, the errors come at least
partly from some cumulative effect.

I have verified that every polygon in there is a valid one:

SELECT geom_set_id, ST_IsValidDetail(geom)
FROM (SELECT geom_set_id, UNNEST(polygons) geom
      FROM error_generating_polygons
     ) x
WHERE NOT ST_IsValid(geom);

That query gives me an empty set, plus I do plenty of checking before
letting a geometry get into the system.

Now, to get down to business. Here's the query you can run to see errors:

DO $$
DECLARE problem_row error_generating_polygons%ROWTYPE;
BEGIN
  FOR problem_row IN (SELECT * FROM error_generating_polygons) LOOP
    BEGIN
      PERFORM ST_Union(ST_Boundary(geom))
      FROM UNNEST(problem_row.polygons) p (geom);
      RAISE NOTICE 'geom_set_id % succeeded', problem_row.geom_set_id;
    EXCEPTION
      WHEN OTHERS THEN
        RAISE NOTICE 'Error for geom_set_id % (Code %): %',
problem_row.geom_set_id, SQLSTATE, SQLERRM;
    END;
  END LOOP;
END
$$;

I also tried a variant of this: I replaced ST_Bounrary(geom) with
ST_Boundary(ST_SnapToGrid(geom,
0.01)).

I tested on two separate installations of PostgreSQL/PostGIS. Believe it or
not, I'm getting different results from these. Here are the specs and
results for each one:

1) Runs on CentOS, installed from CentOS repositories (I believe. I need to
double check with IT to be 100% sure). Version Info:

PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-4), 64-bit
POSTGIS="2.1.4 r12966" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6
March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.7.6"
LIBJSON="UNKNOWN" RASTER

Every row throws an error on this one. This is where the errors in the
table come from. Adding ST_SnapToGrid makes it error out on only one row.

This one is more similar to my production machine at present.

2) Runs on Debian, installed from PostgreSQL maintained repository. Version
info:

PostgreSQL 9.4.0 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
4.7.2-5) 4.7.2, 64-bit
POSTGIS="2.1.5 r13152" GEOS="3.3.3-CAPI-1.7.4" PROJ="Rel. 4.7.1, 23
September 2009" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.8.0"
LIBJSON="UNKNOWN" RASTER

This one errored out less; only about 1/3 of the rows error out. Adding
ST_SnapToGrid also makes it error out on only one row.


An oddity to note: the Debian install has a more up to date PostGIS (2.1.5)
but a lagging GEOS (3.3.3), while the CentOS install has a slightly older
PostGIS (2.1.4) and an up to date GEOS (3.4.2). This makes it difficult for
me to determine where the improvement comes from: it could be from the
older version of GEOS, or it could be from the newer version of PostGIS. It
might also not have anything to do with the versions. As I said, it seems
to be some sort of cumulative effect, so it might even be order dependent
with the different installs choosing different evaluation orders.

I would also like to make it a point that even a single row erroring out is
a problem for me. I need to run this as part of a long running automatic
process, and once the overlays are generated, I will need to further
generate summary statistics using them. Also, while snapping to grid
greatly improved my results, it also caused a few errors on groups that
were working without it, if I recall correctly. (Those sets aren't included
here.)

I considered going straight to filing a bug, but I decided I'd rather bring
it up on the mailing list first, especially given that I'm seeing different
behavior on different installs. I wanted to see if any further work could
be done to narrow this down before filing a report, and I wanted to confirm
that filing it on the GEOS library would be the right place. Feel free to
ask for any clarification or for me to try anything with this data.
(Although, please be aware that my skill set may be lacking, especially if
you want me to do something with native code. =/)

Any potential work-around suggestions would be appreciated as well. Right
now, my work-around is to dump out all the individual line segments to a
table and use ST_Snap and ST_Equals to eliminate very close line segments.
It gets rid of these errors, but I think it may be the source of some
problems I'm having when I polygonize the resulting noded line string. With
that process of noding everything, ST_Polygonize is spitting out polygons
that cover areas that should be separate polygons or even just leaving out
entire areas altogether. (I have taken a look at those dumped-rebuilt
linstring using QGIS. The lines appear to be correct, but there could be
some microscopic gap or something weird like that for all I know.) I'm not
really interested in addressing that problem right now; I'd rather see what
dealing with the noding problems does to fix it first.

Thanks for anyone's help looking into this.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20150203/bfe212ab/attachment.html>


More information about the postgis-users mailing list