[postgis-devel] Call for 1.4.2 and 1.5.1 (Handling of Invalid Geometries)

Chris Hodgson chodgson at refractions.net
Wed Feb 17 10:30:06 PST 2010


strk wrote:
 > I guess we can debate this one.

I believe this is the crux of the problem:

1) Spatial data with various levels of invalidity exists. Whether it 
comes from shapefiles or other formats, whether people paid for it or 
got it free, people have it and they want to use PostGIS with it.

2) If we cannot accept their invalid data, that is a barrier to entry 
for users who want to move to PostGIS. It may be that the data has been 
working just fine for them in other systems and they don't understand 
or immediately care that anything is wrong with it.

3) Some people would like to reduce that barrier to entry by providing a 
way to load the invalid data. There are two ways to do this: clean it 
before it gets into the database, or allow the invalid data in and clean 
it once it is there.

4) There is some minimum level of validity required in order for data to 
be properly stored in the database, so there will always be some invalid 
cases which cannot be loaded into the database.

I am personally of the belief that letting invalid stuff into the 
database is a good thing. Even if we can't actually do anything with it 
in the database because it is so invalid, at least it can be stored. 
Given that it is possible for functions to create invalid output, we 
already know that it is possible for invalid stuff to be given to other 
functions, so we have to handle those cases anyway. So we will already 
be able to output these invalid geometries; why not allow them to be input?

If we agree that PostGIS should provide tools to help clean invalid 
geometries, it seems to make sense for a spatial database to provide 
these tools inside the database, not as external loader-helper tools. 
This makes even more sense given that we may need to clean up invalid 
geometries that are actually created within PostGIS.

This does mean that we have to accept that even basic calculations such 
as length() and area() will potentially fail - however, a user with 
invalid data should not expect to be able to use these functions, as 
this is a problem inherent in their data. I'd rather tell them "invalid 
geometry; can't calculate area" than "invalid geometry; can't load into 
database".
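
To make the trade-off concrete, here is a rough sketch in plain Python 
(not PostGIS code). The names GeometryStore, ring_is_valid and ring_area 
are made up for illustration, and the validity test is deliberately 
minimal (closed ring, at least four points), far weaker than a real 
validity check. The point is only that loading always succeeds, invalid 
rows can be found afterwards, and the failure shows up at calculation 
time with a clear message.

```python
from typing import List, Tuple

Point = Tuple[float, float]


class InvalidGeometryError(ValueError):
    """Raised when a calculation is attempted on an invalid ring."""


def ring_is_valid(ring: List[Point]) -> bool:
    # Assumption: a toy rule only. A ring is "valid" here if it is closed
    # and has at least four points (three distinct corners plus closure).
    return len(ring) >= 4 and ring[0] == ring[-1]


def ring_area(ring: List[Point]) -> float:
    # Refuse to calculate on invalid input, but say why.
    if not ring_is_valid(ring):
        raise InvalidGeometryError("invalid geometry; can't calculate area")
    # Shoelace formula over consecutive vertex pairs.
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


class GeometryStore:
    """Hypothetical stand-in for a table that accepts any ring."""

    def __init__(self) -> None:
        self.rows: List[List[Point]] = []

    def load(self, ring: List[Point]) -> int:
        # Loading never rejects data; validity is checked later.
        self.rows.append(list(ring))
        return len(self.rows) - 1

    def invalid_ids(self) -> List[int]:
        # Find the rows that still need cleaning, after the load.
        return [i for i, r in enumerate(self.rows) if not ring_is_valid(r)]
```

With this, an unclosed ring loads without complaint, shows up in 
invalid_ids(), and only ring_area() refuses it.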

My 2 cents worth.

Cheers,
Chris


