[postgis-users] invalid polygons, buffer(geom, 0) and silent changes to the actual geometry

Paul Ramsey pramsey at opengeo.org
Tue Apr 7 08:18:47 PDT 2009


Here's the philosophy: if you filter validity at insert/parse time you
(a) add a very large CPU overhead to data loading and (b) you make it
impossible for people to clean their data within the database.

Here's the ugly reality: there is currently no bullet-proof was to
clean data within the database.

Here's a plea for help: if anyone on this list needs geometry cleaning
in the database and can provide funding, let me know, I'll build a
multi-funder project to (a) get the algorithms built in JTS (b) get
them ported to GEOS (c) get them tied into PostGIS.

Here's an operational way to deal with invalid incoming data so it
doesn't silently mix with your good data:

pramsey=# create table geometry_good ( id integer, the_geom geometry );
CREATE TABLE
pramsey=# create table geometry_bad ( id integer, the_geom geometry );
CREATE TABLE
pramsey=# create rule geometry_valid as on insert to geometry_good
where not st_isvalid(NEW.the_geom) do instead insert into geometry_bad
values ( NEW.id, NEW.the_geom );
CREATE RULE
pramsey=# insert into geometry_good values (1, 'POLYGON((200 200,200
400,400 400,400 200,200 200))');
INSERT 0 1
pramsey=# insert into geometry_good values (2, 'POLYGON((100 100,100
200,400 200,400 400,200 400,200 100,100 100))');
NOTICE:  Self-intersection at or near point 200 200
NOTICE:  Self-intersection at or near point 200 200
INSERT 0 0
pramsey=# select count(*) from geometry_good;
 count
-------
     1
(1 row)

pramsey=# select count(*) from geometry_bad;
 count
-------
     1
(1 row)

pramsey=# select id,st_astext(the_geom) from geometry_good;
 id |                     st_astext
----+----------------------------------------------------
  1 | POLYGON((200 200,200 400,400 400,400 200,200 200))
(1 row)

pramsey=# select id,st_astext(the_geom) from geometry_bad;
 id |                             st_astext
----+--------------------------------------------------------------------
  2 | POLYGON((100 100,100 200,400 200,400 400,200 400,200 100,100 100))
(1 row)

pramsey=#


On Tue, Apr 7, 2009 at 3:56 AM, Steve Grey <stevegrey78 at gmail.com> wrote:
> Hi,
>
> I hope someone can help, postgis appears to be silently changing invalid
> geometries when attempting to "clean" with buffer(geom,0).
>
> select postgis_full_version() gives:
> 'POSTGIS="1.3.3" GEOS="3.0.0-CAPI-1.4.1" PROJ="Rel. 4.6.0, 21 Dec 2007"
> USE_STATS'
>
> select version() gives:
> 'PostgreSQL 8.3.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real
> (Ubuntu 4.3.2-1ubuntu11) 4.3.2'
>
>
> I've been trying to "clean" some invalid polygons, and the advice often
> seems to buffer(geom,0) which turns an invalid polygon into a valid
> polygon.  However, the actual shape itself seems to change:
>
> select asewkt(geomfromtext('POLYGON((100 100,100 200,400 200,400 400,200
> 400,200 100,100 100))',27700)) gives...
> 'SRID=27700;POLYGON((100 100,100 200,400 200,400 400,200 400,200 100,100
> 100))'
> ...with no warning that the geometry is invalid - only when checking
> isvalid() is there a notice of  "Self-intersection at or near point 200
> 200".
>
> If I do...
> select isvalid(buffer(geomfromtext('POLYGON((100 100,100 200,400 200,400
> 400,200 400,200 100,100 100))',27700),0))
> ...I get...
> 't'
> ...so I've made the polygon valid and check it is so i the way that is often
> recommended.
>
> Except, if I look at the resulting shape as text using...
> select asewkt(buffer(geomfromtext('POLYGON((100 100,100 200,400 200,400
> 400,200 400,200 100,100 100))',27700),0))
> ...I get...
> 'SRID=27700;POLYGON((200 200,200 400,400 400,400 200,200 200))'
> ..which is a different shape, silent of errors, warnings or notices.  I've
> lost the smaller part of the shape.
>
>
>
> This follows onto similar lines with donut polygons.  I had a function that
> was occasionally being sent malformed polygons (as text) from a client
> application.  Checking if they are invalid, such as ...
> select isvalid(geomfromtext('MULTIPOLYGON(((100 100,100 400,600 400,600
> 100,100 100)),((200 200,200 300,300 300,300 200,200 200)))',27700))
> ... I get a warning (NOTICE:  Holes are nested at or near point 200 200)
> and...
> 'f'
>
> Again, isvalid() is raising an error - but simply putting the string into a
> geometry and getting it back, as in ...
> select asewkt(geomfromtext('MULTIPOLYGON(((100 100,100 400,600 400,600
> 100,100 100)),((200 200,200 300,300 300,300 200,200 200)))',27700))
> ...raises no notices etc.
>
> More seriously,  if I ask for the area of the polygon I don't get an error -
> I get...
> select area(geomfromtext('MULTIPOLYGON(((100 100,100 400,600 400,600 100,100
> 100)),((200 200,200 300,300 300,300 200,200 200)))',27700))
> 160000
>
> If I had a properly formed donut, I get
> select area(geomfromtext('MULTIPOLYGON(((100 100,100 400,600 400,600 100,100
> 100),(200 200,200 300,300 300,300 200,200 200)))',27700))
> 140000
>
>
> So, unless I check a polygon is valid, I could silently be getting an
> incorrect area measure, and perhaps more besides.  As far as this particular
> application goes I think I can work around he issue by assembling the
> (multi)polygons myself, but I'm concerned at the lack of warnings and its
> not just this application - some of my data providers send data with
> self-intersecting polygons etc..
>
> Regards,
>
> Steve
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>



More information about the postgis-users mailing list