[postgis-users] Problem with ST_Within, Polygons and Multipolygons

Jorge Arévalo jorge.arevalo at gmail.com
Fri May 8 00:00:36 PDT 2009


Hello,

Of course I have faith, I need it :-). But, to put you on context: I have a
table with 29 rows. Each row has a column with a kind of terrain (as a
string) and other column with a multipolygon. Each multipolygon has
thousands of polygons. For example, the multipolygon "water" has 8872.

Ok, After running ST_IsValid in the entire table, 27 of 29 multipolygons has
errors. ST_IsValid only informs about the first error.

For these reasons I think that fixing the errors manually could take a
really LONG time. Maybe applying ST_Buffer can reduce the amount of errors.
I'll try this option, and the Kevin's option too. It's gonna be a hard
work... But I'll keep you informed.

Regards
Jorge

On Fri, May 8, 2009 at 2:24 AM, Ben Madin <ben at remoteinformation.com.au>wrote:

> Jorge,
>
> I would have some faith in Brent's approach - if you run a query like :
>
> select gid, nice_text_name, st_astext(st_centroid(the_geom)) as centroid
> from bad_data_set
> where st_isValid(the_geom) is false;
>
> You will quickly get a list in a rough order of the magnitude of your
> problem, as well as the notices of where the self-intersections are. In QGIS
> there is a Zoom to Point Plugin so you can just put in the value given in
> the notice, and then zoom straight in and fix the geometry as best you can
> (or need!).
>
> I recently had to do this for a smallish data set (but plenty of errors),
> and once you have your list and have done a few, you may be surprised how
> quickly you can tidy them up.
>
> cheers
>
> Ben
>
>
>
>
>
>
>
> On 08/05/2009, at 6:39 AM, pcreso at pcreso.com wrote:
>
>
>> HIi
>>
>> I have tried a few ways of dealing with this, including
>> ST_Simplify_PreserveTopology() to try to fix errors.
>>
>> I have yet to find a solution that automatically resolves such probles as
>> I think they should be. Using some variation of buffer(buffer(geom,-1),1)
>> instead of buffer(geom,0) sometimes fixes things that the simple buffer of
>> zero has failed to.
>>
>> What I have done that works for me is to use QGIS to edit the geometries
>> that are not fixed by the above operations.
>>
>> I run isvalid() limit 1, then zoom into the returned coodinates & manually
>> fix the problems, which so far have always been pretty simple & obvious. I
>> then rerun isvalid() & work through each error, It doesn't take long per
>> error, 200 or so can be fixed in a morning without too much trouble.
>>
>> Also note that an empty geometry is regarded as an error by some
>> applications, but passes isvalid(), so I also do a check for
>> ST_NumGeomteries()=0 to clean up PostGIS datasets.
>>
>>
>> HTH,
>>
>>  Brent Wood
>>
>> --- On Fri, 5/8/09, Jorge Arévalo <jorge.arevalo at gmail.com> wrote:
>>
>>  From: Jorge Arévalo <jorge.arevalo at gmail.com>
>>> Subject: Re: [postgis-users] Problem with ST_Within, Polygons and
>>>  Multipolygons
>>> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
>>> Date: Friday, May 8, 2009, 9:44 AM
>>> Hello,
>>>
>>> Many thanks for your help Kevin. I can confim one thing:
>>> isvalid only returns the first error. Just today, I realized
>>> on the big amount of errors of my data, as you said. Indeed,
>>> I was reading the same link that you provided me about
>>> geometry validity, and I split my multipolygon into single
>>> polygons to check them. My conclusions are the same as
>>> yours: what a mess!
>>>
>>>
>>>
>>> Oh, and thanks again for your response to my original
>>> question. It helped me a lot... Maybe is it time to ask for
>>> new data, without errors? If not, how should I
>>> "clean" my data? I tried with ST_Buffer(geom, 0.0)
>>> (I read in this link
>>> http://www.bostongis.com/postgis_extent_expand_buffer_distance.snippet
>>> that is useful to repair data with self-interesections), but
>>> doesn't seem to work for me...
>>>
>>>
>>>
>>> Regards
>>> Jorge
>>>
>>>
>>>
>>> On Thu, May 7, 2009 at 6:58 PM,
>>> Kevin Neufeld <kneufeld at refractions.net>
>>> wrote:
>>>
>>>
>>> Hi Jorge,
>>>
>>>
>>>
>>> So, I had a quick look at your data ... I'm sorry to
>>> say that you've got a lot of problems with your ocean
>>> polygon.
>>>
>>> Paul can correct me since he did the work on isvalidreason,
>>> but I think st_isvalid only returns the first thing it finds
>>> wrong with your input geometry.
>>>
>>>
>>>
>>> I did this and found all kinds of things:
>>>
>>>
>>>
>>> -- Extract the POLYGONs from the MULTI
>>>
>>> CREATE TABLE water_polys AS
>>>
>>> SELECT (ST_Dump(the_geom)).geom AS the_geom
>>>
>>> FROM water;
>>>
>>>
>>>
>>> SELECT ST_IsValidReason(the_geom)
>>>
>>> FROM water_polys
>>>
>>> WHERE NOT ST_IsValid(the_geom);
>>>
>>>
>>>
>>> So, looking at the individual polygons (of which only the
>>> first error is again reported), I found over 66
>>> self-intersections (figure 8's and so forth) and several
>>> whose boundaries loop back and touch itself.
>>>
>>>
>>>
>>> Also, in response to your original question about why a
>>> point on your airport also intersects the ocean polygon: it
>>> turns out that the island in question is not a hole in your
>>> ocean at all ... but a second polygon on top of your ocean
>>> polygon.  This is why your point query returns true - there
>>> are two surfaces (the island and the water) that intersect
>>> the point (which, by the way, also makes an invalid
>>> multipolygon).
>>>
>>>
>>>
>>>
>>>
>>> A while back I put together some isvalid/issimple geometry
>>> descriptions with soem pretty pictures:
>>>
>>>
>>> http://postgis.refractions.net/documentation/manual-svn/ch04.html#OGC_Validity
>>>
>>>
>>>
>>> You may want to have a look through it and clean up your
>>> dataset.  The invalidity will most certainly result in more
>>> unexpected behaviour.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Kevin
>>>
>>>
>>>
>>>
>>>
>>> Jorge Arévalo wrote:
>>>
>>>
>>> Hello,
>>>
>>>
>>>
>>> I have additional info. When running the method
>>> "isvalid", I get this warning message:
>>>
>>>
>>>
>>> NOTICE:  Self-intersection at or near point -2749.99
>>> 4.7955e+06
>>>
>>>
>>>
>>> What is exactly a "self-intersection"? May this
>>> "self-intersection" be the cause of my problem?
>>>
>>>
>>>
>>> Thanks
>>>
>>> Jorge
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2009/5/6 Jorge Arévalo <jorge.arevalo at gmail.com
>>> <mailto:jorge.arevalo at gmail.com>>
>>>
>>>
>>>
>>>
>>>
>>>   Hi,
>>>
>>>
>>>
>>>   Thanks for your response Kevin. I can provide some
>>> data:
>>>
>>>
>>>
>>>   - Screenshot (over geoserver) of the
>>> "water" multipolygon. You can
>>>
>>>   see the shape of Spain and the Balearic Islands, and
>>> a hole on the
>>>
>>>   left (Portugal). The multipolygon is open, isn't
>>> it? -->
>>>
>>>   http://www.nebulared.com/tmp_geo/water_multipolygon.jpg
>>>
>>>
>>>
>>>   - Zoom over Menorca (one of the Balearic Islands). As
>>> you can see,
>>>
>>>   the point tested is inside the "airports"
>>> multipolygon, but
>>>
>>>   ST_Within and ST_Contains returns that the point is
>>> inside both,
>>>
>>>   "water" and "airports"
>>> multipolygons -->
>>>
>>>   http://www.nebulared.com/tmp_geo/zoom_in_water_multipolygon.jpg
>>>
>>>
>>>
>>>   - The "ST_AsText" data for the
>>> "airport" multipolygon -->
>>>
>>>   http://www.nebulared.com/tmp_geo/airports_multipolygon.txt
>>>
>>>
>>>
>>>   - The "ST_AsText" data for the
>>> "water" multipolygon (rar file, 2.5
>>>
>>>   MB of plain text...) -->
>>>
>>>   http://www.nebulared.com/tmp_geo/water_multipolygon.rar
>>>
>>>
>>>
>>>   So, how can I check the validity of my multipolygons?
>>> Do someone
>>>
>>>   could help me with this?
>>>
>>>
>>>
>>>   Now, the query. My context is the next:
>>>
>>>   - I have a table with polygons that I want to check
>>> (to say if they
>>>
>>>   are in an airport, in a forest, in water, etc). We
>>> can call this
>>>
>>>   table T1. I use the centroid of the polygons to
>>> check, not the
>>>
>>>   entire polygon.
>>>
>>>   - In other different table, that we'll call T2, I
>>> have all the
>>>
>>>   multipolygons ("water",
>>> "airports", etc). The name of the field that
>>>
>>>   says if the multipolygon is "water" or
>>> "airport" or anything is
>>>
>>>   "clutter"
>>>
>>>
>>>
>>>   With this code, I pretend to loop over T1, checking
>>> if the centroid
>>>
>>>   of its polygons are inside of one (and ONLY one) of
>>> the
>>>
>>>   multipolygons of T2.  (Is part of a procedure)
>>>
>>>
>>>
>>>   (...)
>>>
>>>   BEGIN
>>>
>>>     FOR s IN SELECT * FROM T1 LOOP
>>>
>>>           select clutter from T2 where
>>>
>>>   ST_Within(ST_Centroid(T1.wkb_geometry),
>>> T2.wkb_geometry))
>>>
>>>           (...)
>>>
>>>     END LOOP;
>>>
>>>     RETURN;
>>>
>>>   END;
>>>
>>>
>>>
>>>   Most of the times, the select inside the loop,
>>> returns more than one
>>>
>>>   result ("water" and other one, normally).
>>> That's my problem
>>>
>>>
>>>
>>>   I have another version of the query, with Java.
>>> Again, I loop over
>>>
>>>   the table T1, and check the centroid of its polygons
>>> with all the
>>>
>>>   multipolygons (water, airports...) that are in T2.
>>> Then, my query,
>>>
>>>   inside the loop, is:
>>>
>>>
>>>
>>>   SELECT clutter FROM T2 WHERE ST_Within(?,
>>> T2.wkb_geometry)
>>>
>>>
>>>
>>>   The '?' is replaced with the centroid of the
>>> polygon in T1, of
>>>
>>>   course. Basically, is the same idea of the previous
>>> SQL code, but
>>>
>>>   with Java.
>>>
>>>
>>>
>>>   Apart from this, other problem is that ST_Within
>>> takes a long time.
>>>
>>>   1 sec per query. And I have like 2 millions of
>>> polygons (queries) to
>>>
>>>   check...
>>>
>>>
>>>
>>>   My multipolygons' table (T2) has only 29 rows
>>> (small table with
>>>
>>>   large geometries), and I found this
>>>
>>>   http://postgis.refractions.net/documentation/manual-1.3/ch05.html.
>>>
>>>   Seems to be good for me. But using "SET
>>> enable_seqscan TO off", the
>>>
>>>   queries take the same time (~ 1 sec). Maybe the other
>>> approach
>>>
>>>   (create an additional column that "caches"
>>> the bbox, and matching
>>>
>>>   against this) could help me. Any ideas on how to
>>> create this bbox
>>>
>>>   with my huge multipolygons?
>>>
>>>
>>>
>>>
>>>
>>>   Thanks in advance
>>>
>>>
>>>
>>>   Regards
>>>
>>>   Jorge
>>>
>>>
>>>
>>>
>>>
>>>   On Wed, May 6, 2009 at 4:28 AM, Kevin Neufeld
>>>
>>>   <kneufeld at refractions.net
>>> <mailto:kneufeld at refractions.net>>
>>> wrote:
>>>
>>>
>>>
>>>       Jorge Arévalo wrote:
>>>
>>>
>>>
>>>           ... If is useful, I tested the method
>>> "ST_isvalid" with the
>>>
>>>           multipolygon and returns
>>> "false". Maybe the multipolygon is
>>>
>>>           not closed? I loaded the data from a
>>> shapefile. Is it
>>>
>>>           possible to create a
>>> "non-valid" multipolygon? Does PostGIS
>>>
>>>           accept this?
>>>
>>>
>>>
>>>
>>>
>>>       Ah, yes.  Most spatial predicates in PostGIS
>>> assume the input
>>>
>>>       geometry is valid.  Suppose you had defined a
>>> POLYGON with a
>>>
>>>       hole or inner ring outside of the exterior
>>> ring.  What is the
>>>
>>>       area?  Does the question even make sense?
>>> PostGIS allows
>>>
>>>       invalid geometries in the database so users can
>>> make full use of
>>>
>>>       the PostGIS toolset to do whatever they need
>>> (ie. breakdown a
>>>
>>>       polygon to it's constituent linework and
>>> rebuild it back up
>>>
>>>       again to a valid polygon)
>>>
>>>
>>>
>>>
>>>
>>>           Ok, being even more specific. I'm
>>> working with data about
>>>
>>>           Spain. I have a HUGE multipolygon that
>>> represents "water"
>>>
>>>           (this is, the coasts around Spain and its
>>> islands). Then,
>>>
>>>           the "holes" inside this
>>> multypolygon have the shape of
>>>
>>>           Spain, Balearic Islands and Canary
>>> Islands. Of course, I
>>>
>>>           have more multipolygons that represent
>>> "forests",
>>>
>>>           "airports", "cities",
>>> etc, that fit into these holes.
>>>
>>>
>>>
>>>           Really, my problem is with some points
>>> that belong to an
>>>
>>>           airport in an island. Using
>>> "ST_Within" and "ST_Contains",
>>>
>>>           the result is that these points belong to
>>> the multipolygon
>>>
>>>           "airport" and multipolygon
>>> "water" at same time. Obviously,
>>>
>>>           the island (and its airport) is
>>> surrounded by water, but the
>>>
>>>           airport's points shouldn't be
>>> part of the multopolygon
>>>
>>>           "water". And, as I said, when I
>>> apply "ST_isvalid" to the
>>>
>>>           multipolygon "water", returns
>>> false. Maybe is not closed?
>>>
>>>
>>>
>>>
>>>
>>>       Yeah, as mentioned before, a point that is not
>>> on the surface of
>>>
>>>       a (multi)polygon (whether in a hole or
>>> completely outside) is
>>>
>>>       not considered within the (multi)polygon.  It
>>> does sound like
>>>
>>>       your ocean polygon has validity issues.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>           Oh, btw, what's the difference
>>> between "ST_Within" and
>>>
>>>           "Within". Does
>>> "ST_Within" use index instead of geometry? Am
>>>
>>>           I right?
>>>
>>>
>>>
>>>       No, not instead of.  Both use the actual
>>> geometry for testing
>>>
>>>       within.  ST_Within will also use the index to
>>> narrow down the
>>>
>>>       candidate list first.
>>>
>>>       As you can see, the definition of ST_Within is
>>> just a simple SQL
>>>
>>>       wrapper that first invokes the index.
>>>
>>>       postgis=# select prosrc from pg_proc where
>>> proname = 'st_within';
>>>
>>>                     prosrc
>>>         ---------------------------------------
>>>
>>>       SELECT $1 && $2 AND _ST_Within($1,$2)
>>>
>>>       (1 row)
>>>
>>>
>>>
>>>
>>> http://postgis.refractions.net/documentation/manual-svn/ST_Within.html
>>>
>>>
>>>
>>>       Cheers,
>>>
>>>       Kevin
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>>       postgis-users mailing list
>>>
>>>       postgis-users at postgis.refractions.net
>>>
>>>       <mailto:postgis-users at postgis.refractions.net>
>>>
>>>       http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>> postgis-users mailing list
>>>
>>> postgis-users at postgis.refractions.net
>>>
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>> _______________________________________________
>>>
>>> postgis-users mailing list
>>>
>>> postgis-users at postgis.refractions.net
>>>
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>
>>>
>>> -----Inline Attachment Follows-----
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>  _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>
> --
>
> Ben Madin
> REMOTE INFORMATION
>
> t : +61 8 9192 5455
> f : +61 8 9192 5535
> m : 0448 887 220
> Broome   WA   6725
>
> ben at remoteinformation.com.au
>
>
>
>                                                        Out here, it pays to
> know...
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20090508/98d34199/attachment.html>


More information about the postgis-users mailing list