[postgis-users] Problem with ST_Within, Polygons and Multipolygons

Ben Madin ben at remoteinformation.com.au
Thu May 7 17:24:46 PDT 2009


Jorge,

I would have some faith in Brent's approach - if you run a query like :

select gid, nice_text_name, st_astext(st_centroid(the_geom)) as centroid
from bad_data_set
where st_isValid(the_geom) is false;

You will quickly get a list in a rough order of the magnitude of your  
problem, as well as the notices of where the self-intersections are.  
In QGIS there is a Zoom to Point Plugin so you can just put in the  
value given in the notice, and then zoom straight in and fix the  
geometry as best you can (or need!).

I recently had to do this for a smallish data set (but plenty of  
errors), and once you have your list and have done a few, you may be  
surprised how quickly you can tidy them up.

cheers

Ben






On 08/05/2009, at 6:39 AM, pcreso at pcreso.com wrote:

>
> HIi
>
> I have tried a few ways of dealing with this, including  
> ST_Simplify_PreserveTopology() to try to fix errors.
>
> I have yet to find a solution that automatically resolves such  
> probles as I think they should be. Using some variation of  
> buffer(buffer(geom,-1),1) instead of buffer(geom,0) sometimes fixes  
> things that the simple buffer of zero has failed to.
>
> What I have done that works for me is to use QGIS to edit the  
> geometries that are not fixed by the above operations.
>
> I run isvalid() limit 1, then zoom into the returned coodinates &  
> manually fix the problems, which so far have always been pretty  
> simple & obvious. I then rerun isvalid() & work through each error,  
> It doesn't take long per error, 200 or so can be fixed in a morning  
> without too much trouble.
>
> Also note that an empty geometry is regarded as an error by some  
> applications, but passes isvalid(), so I also do a check for  
> ST_NumGeomteries()=0 to clean up PostGIS datasets.
>
>
> HTH,
>
>   Brent Wood
>
> --- On Fri, 5/8/09, Jorge Arévalo <jorge.arevalo at gmail.com> wrote:
>
>> From: Jorge Arévalo <jorge.arevalo at gmail.com>
>> Subject: Re: [postgis-users] Problem with ST_Within, Polygons and   
>> Multipolygons
>> To: "PostGIS Users Discussion" <postgis- 
>> users at postgis.refractions.net>
>> Date: Friday, May 8, 2009, 9:44 AM
>> Hello,
>>
>> Many thanks for your help Kevin. I can confim one thing:
>> isvalid only returns the first error. Just today, I realized
>> on the big amount of errors of my data, as you said. Indeed,
>> I was reading the same link that you provided me about
>> geometry validity, and I split my multipolygon into single
>> polygons to check them. My conclusions are the same as
>> yours: what a mess!
>>
>>
>>
>> Oh, and thanks again for your response to my original
>> question. It helped me a lot... Maybe is it time to ask for
>> new data, without errors? If not, how should I
>> "clean" my data? I tried with ST_Buffer(geom, 0.0)
>> (I read in this link http://www.bostongis.com/postgis_extent_expand_buffer_distance.snippet
>> that is useful to repair data with self-interesections), but
>> doesn't seem to work for me...
>>
>>
>>
>> Regards
>> Jorge
>>
>>
>>
>> On Thu, May 7, 2009 at 6:58 PM,
>> Kevin Neufeld <kneufeld at refractions.net>
>> wrote:
>>
>>
>> Hi Jorge,
>>
>>
>>
>> So, I had a quick look at your data ... I'm sorry to
>> say that you've got a lot of problems with your ocean
>> polygon.
>>
>> Paul can correct me since he did the work on isvalidreason,
>> but I think st_isvalid only returns the first thing it finds
>> wrong with your input geometry.
>>
>>
>>
>> I did this and found all kinds of things:
>>
>>
>>
>> -- Extract the POLYGONs from the MULTI
>>
>> CREATE TABLE water_polys AS
>>
>> SELECT (ST_Dump(the_geom)).geom AS the_geom
>>
>> FROM water;
>>
>>
>>
>> SELECT ST_IsValidReason(the_geom)
>>
>> FROM water_polys
>>
>> WHERE NOT ST_IsValid(the_geom);
>>
>>
>>
>> So, looking at the individual polygons (of which only the
>> first error is again reported), I found over 66
>> self-intersections (figure 8's and so forth) and several
>> whose boundaries loop back and touch itself.
>>
>>
>>
>> Also, in response to your original question about why a
>> point on your airport also intersects the ocean polygon: it
>> turns out that the island in question is not a hole in your
>> ocean at all ... but a second polygon on top of your ocean
>> polygon.  This is why your point query returns true - there
>> are two surfaces (the island and the water) that intersect
>> the point (which, by the way, also makes an invalid
>> multipolygon).
>>
>>
>>
>>
>>
>> A while back I put together some isvalid/issimple geometry
>> descriptions with soem pretty pictures:
>>
>> http://postgis.refractions.net/documentation/manual-svn/ch04.html#OGC_Validity
>>
>>
>>
>> You may want to have a look through it and clean up your
>> dataset.  The invalidity will most certainly result in more
>> unexpected behaviour.
>>
>>
>>
>> Cheers,
>>
>> Kevin
>>
>>
>>
>>
>>
>> Jorge Arévalo wrote:
>>
>>
>> Hello,
>>
>>
>>
>> I have additional info. When running the method
>> "isvalid", I get this warning message:
>>
>>
>>
>> NOTICE:  Self-intersection at or near point -2749.99
>> 4.7955e+06
>>
>>
>>
>> What is exactly a "self-intersection"? May this
>> "self-intersection" be the cause of my problem?
>>
>>
>>
>> Thanks
>>
>> Jorge
>>
>>
>>
>>
>>
>>
>>
>> 2009/5/6 Jorge Arévalo <jorge.arevalo at gmail.com
>> <mailto:jorge.arevalo at gmail.com>>
>>
>>
>>
>>
>>
>>    Hi,
>>
>>
>>
>>    Thanks for your response Kevin. I can provide some
>> data:
>>
>>
>>
>>    - Screenshot (over geoserver) of the
>> "water" multipolygon. You can
>>
>>    see the shape of Spain and the Balearic Islands, and
>> a hole on the
>>
>>    left (Portugal). The multipolygon is open, isn't
>> it? -->
>>
>>    http://www.nebulared.com/tmp_geo/water_multipolygon.jpg
>>
>>
>>
>>    - Zoom over Menorca (one of the Balearic Islands). As
>> you can see,
>>
>>    the point tested is inside the "airports"
>> multipolygon, but
>>
>>    ST_Within and ST_Contains returns that the point is
>> inside both,
>>
>>    "water" and "airports"
>> multipolygons -->
>>
>>    http://www.nebulared.com/tmp_geo/zoom_in_water_multipolygon.jpg
>>
>>
>>
>>    - The "ST_AsText" data for the
>> "airport" multipolygon -->
>>
>>    http://www.nebulared.com/tmp_geo/airports_multipolygon.txt
>>
>>
>>
>>    - The "ST_AsText" data for the
>> "water" multipolygon (rar file, 2.5
>>
>>    MB of plain text...) -->
>>
>>    http://www.nebulared.com/tmp_geo/water_multipolygon.rar
>>
>>
>>
>>    So, how can I check the validity of my multipolygons?
>> Do someone
>>
>>    could help me with this?
>>
>>
>>
>>    Now, the query. My context is the next:
>>
>>    - I have a table with polygons that I want to check
>> (to say if they
>>
>>    are in an airport, in a forest, in water, etc). We
>> can call this
>>
>>    table T1. I use the centroid of the polygons to
>> check, not the
>>
>>    entire polygon.
>>
>>    - In other different table, that we'll call T2, I
>> have all the
>>
>>    multipolygons ("water",
>> "airports", etc). The name of the field that
>>
>>    says if the multipolygon is "water" or
>> "airport" or anything is
>>
>>    "clutter"
>>
>>
>>
>>    With this code, I pretend to loop over T1, checking
>> if the centroid
>>
>>    of its polygons are inside of one (and ONLY one) of
>> the
>>
>>    multipolygons of T2.  (Is part of a procedure)
>>
>>
>>
>>    (...)
>>
>>    BEGIN
>>
>>      FOR s IN SELECT * FROM T1 LOOP
>>
>>            select clutter from T2 where
>>
>>    ST_Within(ST_Centroid(T1.wkb_geometry),
>> T2.wkb_geometry))
>>
>>            (...)
>>
>>      END LOOP;
>>
>>      RETURN;
>>
>>    END;
>>
>>
>>
>>    Most of the times, the select inside the loop,
>> returns more than one
>>
>>    result ("water" and other one, normally).
>> That's my problem
>>
>>
>>
>>    I have another version of the query, with Java.
>> Again, I loop over
>>
>>    the table T1, and check the centroid of its polygons
>> with all the
>>
>>    multipolygons (water, airports...) that are in T2.
>> Then, my query,
>>
>>    inside the loop, is:
>>
>>
>>
>>    SELECT clutter FROM T2 WHERE ST_Within(?,
>> T2.wkb_geometry)
>>
>>
>>
>>    The '?' is replaced with the centroid of the
>> polygon in T1, of
>>
>>    course. Basically, is the same idea of the previous
>> SQL code, but
>>
>>    with Java.
>>
>>
>>
>>    Apart from this, other problem is that ST_Within
>> takes a long time.
>>
>>    1 sec per query. And I have like 2 millions of
>> polygons (queries) to
>>
>>    check...
>>
>>
>>
>>    My multipolygons' table (T2) has only 29 rows
>> (small table with
>>
>>    large geometries), and I found this
>>
>>    http://postgis.refractions.net/documentation/manual-1.3/ch05.html.
>>
>>    Seems to be good for me. But using "SET
>> enable_seqscan TO off", the
>>
>>    queries take the same time (~ 1 sec). Maybe the other
>> approach
>>
>>    (create an additional column that "caches"
>> the bbox, and matching
>>
>>    against this) could help me. Any ideas on how to
>> create this bbox
>>
>>    with my huge multipolygons?
>>
>>
>>
>>
>>
>>    Thanks in advance
>>
>>
>>
>>    Regards
>>
>>    Jorge
>>
>>
>>
>>
>>
>>    On Wed, May 6, 2009 at 4:28 AM, Kevin Neufeld
>>
>>    <kneufeld at refractions.net
>> <mailto:kneufeld at refractions.net>>
>> wrote:
>>
>>
>>
>>        Jorge Arévalo wrote:
>>
>>
>>
>>            ... If is useful, I tested the method
>> "ST_isvalid" with the
>>
>>            multipolygon and returns
>> "false". Maybe the multipolygon is
>>
>>            not closed? I loaded the data from a
>> shapefile. Is it
>>
>>            possible to create a
>> "non-valid" multipolygon? Does PostGIS
>>
>>            accept this?
>>
>>
>>
>>
>>
>>        Ah, yes.  Most spatial predicates in PostGIS
>> assume the input
>>
>>        geometry is valid.  Suppose you had defined a
>> POLYGON with a
>>
>>        hole or inner ring outside of the exterior
>> ring.  What is the
>>
>>        area?  Does the question even make sense?
>> PostGIS allows
>>
>>        invalid geometries in the database so users can
>> make full use of
>>
>>        the PostGIS toolset to do whatever they need
>> (ie. breakdown a
>>
>>        polygon to it's constituent linework and
>> rebuild it back up
>>
>>        again to a valid polygon)
>>
>>
>>
>>
>>
>>            Ok, being even more specific. I'm
>> working with data about
>>
>>            Spain. I have a HUGE multipolygon that
>> represents "water"
>>
>>            (this is, the coasts around Spain and its
>> islands). Then,
>>
>>            the "holes" inside this
>> multypolygon have the shape of
>>
>>            Spain, Balearic Islands and Canary
>> Islands. Of course, I
>>
>>            have more multipolygons that represent
>> "forests",
>>
>>            "airports", "cities",
>> etc, that fit into these holes.
>>
>>
>>
>>            Really, my problem is with some points
>> that belong to an
>>
>>            airport in an island. Using
>> "ST_Within" and "ST_Contains",
>>
>>            the result is that these points belong to
>> the multipolygon
>>
>>            "airport" and multipolygon
>> "water" at same time. Obviously,
>>
>>            the island (and its airport) is
>> surrounded by water, but the
>>
>>            airport's points shouldn't be
>> part of the multopolygon
>>
>>            "water". And, as I said, when I
>> apply "ST_isvalid" to the
>>
>>            multipolygon "water", returns
>> false. Maybe is not closed?
>>
>>
>>
>>
>>
>>        Yeah, as mentioned before, a point that is not
>> on the surface of
>>
>>        a (multi)polygon (whether in a hole or
>> completely outside) is
>>
>>        not considered within the (multi)polygon.  It
>> does sound like
>>
>>        your ocean polygon has validity issues.
>>
>>
>>
>>
>>
>>
>>
>>            Oh, btw, what's the difference
>> between "ST_Within" and
>>
>>            "Within". Does
>> "ST_Within" use index instead of geometry? Am
>>
>>            I right?
>>
>>
>>
>>        No, not instead of.  Both use the actual
>> geometry for testing
>>
>>        within.  ST_Within will also use the index to
>> narrow down the
>>
>>        candidate list first.
>>
>>        As you can see, the definition of ST_Within is
>> just a simple SQL
>>
>>        wrapper that first invokes the index.
>>
>>        postgis=# select prosrc from pg_proc where
>> proname = 'st_within';
>>
>>                      prosrc
>>          ---------------------------------------
>>
>>        SELECT $1 && $2 AND _ST_Within($1,$2)
>>
>>        (1 row)
>>
>>
>>
>>        http://postgis.refractions.net/documentation/manual-svn/ST_Within.html
>>
>>
>>
>>        Cheers,
>>
>>        Kevin
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>>
>>        postgis-users mailing list
>>
>>        postgis-users at postgis.refractions.net
>>
>>        <mailto:postgis-users at postgis.refractions.net>
>>
>>        http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>> _______________________________________________
>>
>> postgis-users mailing list
>>
>> postgis-users at postgis.refractions.net
>>
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>> _______________________________________________
>>
>> postgis-users mailing list
>>
>> postgis-users at postgis.refractions.net
>>
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>>
>>
>> -----Inline Attachment Follows-----
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

-- 

Ben Madin
REMOTE INFORMATION

t : +61 8 9192 5455
f : +61 8 9192 5535
m : 0448 887 220
Broome   WA   6725

ben at remoteinformation.com.au



							Out here, it pays to know...





More information about the postgis-users mailing list