[postgis-users] Problem with ST_Within, Polygons and Multipolygons

Jorge Arévalo jorge.arevalo at gmail.com
Mon May 11 08:47:02 PDT 2009


Hello again,

I think that we are very close to solve this problem (I mean, my co-worker
and me :-)).

Our original data were a TAB file, from Mapinfo. If we open this file with
this software, the result is this:
http://www.nebulared.com/tmp_geo/mapinfo_all_polygons.jpg

We can see the islands with different colours than the water. So, Mapinfo
"understands" the data in the way we want. And if we open the table directly
in Quantum GIS, we get this:
http://www.nebulared.com/tmp_geo/water_polygons_in_qgis.png (you can see
explained our problem in the picture).

But, if we select all the polygons of "water", we get this:
http://www.nebulared.com/tmp_geo/water_polygons_selected_in_qgis.png(explained,
the cause of our problem)

So, we have some kind of "water" polygons, inside the multipolygon "water",
with exactly the same shape of the islands. More clear:
http://www.nebulared.com/tmp_geo/water polygons.jpg

We need to declare this polygons as "inner", and we did, manually
(transforming outer polygons into inner ones using the WKT representation
and updating the table):
http://www.nebulared.com/tmp_geo/fixed_island.jpg

Of course, our data don't have "semantic information" to know that these
polygons of "water" with the shape of the islands, have to be inner
polygons, and represent a hole. But Mapinfo knows! So, we did this:

TAB file --> MIF file (with Mapinfo)
MIF file --> SHP file (with ogr2ogr)
SHP file --> PostGIS (with shp2pgsql)

And seems to work! I mean, we have holes in the place of islands, and
QuantumGIS shows the data in the way we want:
http://www.nebulared.com/tmp_geo/balearic_islands_fixed.png

We don't know how, we don't know why... but that's. Any ideas? Now, we are
checking the data. Cross as much fingers as you can... Many thanks for your
help.

Regards! And many thanks
Jorge

2009/5/8 Jorge Arévalo <jorge.arevalo at gmail.com>

> Hello,
>
> Of course I have faith, I need it :-). But, to put you on context: I have a
> table with 29 rows. Each row has a column with a kind of terrain (as a
> string) and other column with a multipolygon. Each multipolygon has
> thousands of polygons. For example, the multipolygon "water" has 8872.
>
> Ok, After running ST_IsValid in the entire table, 27 of 29 multipolygons
> has errors. ST_IsValid only informs about the first error.
>
> For these reasons I think that fixing the errors manually could take a
> really LONG time. Maybe applying ST_Buffer can reduce the amount of errors.
> I'll try this option, and the Kevin's option too. It's gonna be a hard
> work... But I'll keep you informed.
>
> Regards
> Jorge
>
>
> On Fri, May 8, 2009 at 2:24 AM, Ben Madin <ben at remoteinformation.com.au>wrote:
>
>> Jorge,
>>
>> I would have some faith in Brent's approach - if you run a query like :
>>
>> select gid, nice_text_name, st_astext(st_centroid(the_geom)) as centroid
>> from bad_data_set
>> where st_isValid(the_geom) is false;
>>
>> You will quickly get a list in a rough order of the magnitude of your
>> problem, as well as the notices of where the self-intersections are. In QGIS
>> there is a Zoom to Point Plugin so you can just put in the value given in
>> the notice, and then zoom straight in and fix the geometry as best you can
>> (or need!).
>>
>> I recently had to do this for a smallish data set (but plenty of errors),
>> and once you have your list and have done a few, you may be surprised how
>> quickly you can tidy them up.
>>
>> cheers
>>
>> Ben
>>
>>
>>
>>
>>
>>
>>
>> On 08/05/2009, at 6:39 AM, pcreso at pcreso.com wrote:
>>
>>
>>> HIi
>>>
>>> I have tried a few ways of dealing with this, including
>>> ST_Simplify_PreserveTopology() to try to fix errors.
>>>
>>> I have yet to find a solution that automatically resolves such probles as
>>> I think they should be. Using some variation of buffer(buffer(geom,-1),1)
>>> instead of buffer(geom,0) sometimes fixes things that the simple buffer of
>>> zero has failed to.
>>>
>>> What I have done that works for me is to use QGIS to edit the geometries
>>> that are not fixed by the above operations.
>>>
>>> I run isvalid() limit 1, then zoom into the returned coodinates &
>>> manually fix the problems, which so far have always been pretty simple &
>>> obvious. I then rerun isvalid() & work through each error, It doesn't take
>>> long per error, 200 or so can be fixed in a morning without too much
>>> trouble.
>>>
>>> Also note that an empty geometry is regarded as an error by some
>>> applications, but passes isvalid(), so I also do a check for
>>> ST_NumGeomteries()=0 to clean up PostGIS datasets.
>>>
>>>
>>> HTH,
>>>
>>>  Brent Wood
>>>
>>> --- On Fri, 5/8/09, Jorge Arévalo <jorge.arevalo at gmail.com> wrote:
>>>
>>>  From: Jorge Arévalo <jorge.arevalo at gmail.com>
>>>> Subject: Re: [postgis-users] Problem with ST_Within, Polygons and
>>>>  Multipolygons
>>>> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
>>>> Date: Friday, May 8, 2009, 9:44 AM
>>>> Hello,
>>>>
>>>> Many thanks for your help Kevin. I can confim one thing:
>>>> isvalid only returns the first error. Just today, I realized
>>>> on the big amount of errors of my data, as you said. Indeed,
>>>> I was reading the same link that you provided me about
>>>> geometry validity, and I split my multipolygon into single
>>>> polygons to check them. My conclusions are the same as
>>>> yours: what a mess!
>>>>
>>>>
>>>>
>>>> Oh, and thanks again for your response to my original
>>>> question. It helped me a lot... Maybe is it time to ask for
>>>> new data, without errors? If not, how should I
>>>> "clean" my data? I tried with ST_Buffer(geom, 0.0)
>>>> (I read in this link
>>>> http://www.bostongis.com/postgis_extent_expand_buffer_distance.snippet
>>>> that is useful to repair data with self-interesections), but
>>>> doesn't seem to work for me...
>>>>
>>>>
>>>>
>>>> Regards
>>>> Jorge
>>>>
>>>>
>>>>
>>>> On Thu, May 7, 2009 at 6:58 PM,
>>>> Kevin Neufeld <kneufeld at refractions.net>
>>>> wrote:
>>>>
>>>>
>>>> Hi Jorge,
>>>>
>>>>
>>>>
>>>> So, I had a quick look at your data ... I'm sorry to
>>>> say that you've got a lot of problems with your ocean
>>>> polygon.
>>>>
>>>> Paul can correct me since he did the work on isvalidreason,
>>>> but I think st_isvalid only returns the first thing it finds
>>>> wrong with your input geometry.
>>>>
>>>>
>>>>
>>>> I did this and found all kinds of things:
>>>>
>>>>
>>>>
>>>> -- Extract the POLYGONs from the MULTI
>>>>
>>>> CREATE TABLE water_polys AS
>>>>
>>>> SELECT (ST_Dump(the_geom)).geom AS the_geom
>>>>
>>>> FROM water;
>>>>
>>>>
>>>>
>>>> SELECT ST_IsValidReason(the_geom)
>>>>
>>>> FROM water_polys
>>>>
>>>> WHERE NOT ST_IsValid(the_geom);
>>>>
>>>>
>>>>
>>>> So, looking at the individual polygons (of which only the
>>>> first error is again reported), I found over 66
>>>> self-intersections (figure 8's and so forth) and several
>>>> whose boundaries loop back and touch itself.
>>>>
>>>>
>>>>
>>>> Also, in response to your original question about why a
>>>> point on your airport also intersects the ocean polygon: it
>>>> turns out that the island in question is not a hole in your
>>>> ocean at all ... but a second polygon on top of your ocean
>>>> polygon.  This is why your point query returns true - there
>>>> are two surfaces (the island and the water) that intersect
>>>> the point (which, by the way, also makes an invalid
>>>> multipolygon).
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> A while back I put together some isvalid/issimple geometry
>>>> descriptions with soem pretty pictures:
>>>>
>>>>
>>>> http://postgis.refractions.net/documentation/manual-svn/ch04.html#OGC_Validity
>>>>
>>>>
>>>>
>>>> You may want to have a look through it and clean up your
>>>> dataset.  The invalidity will most certainly result in more
>>>> unexpected behaviour.
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Kevin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Jorge Arévalo wrote:
>>>>
>>>>
>>>> Hello,
>>>>
>>>>
>>>>
>>>> I have additional info. When running the method
>>>> "isvalid", I get this warning message:
>>>>
>>>>
>>>>
>>>> NOTICE:  Self-intersection at or near point -2749.99
>>>> 4.7955e+06
>>>>
>>>>
>>>>
>>>> What is exactly a "self-intersection"? May this
>>>> "self-intersection" be the cause of my problem?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>> Jorge
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2009/5/6 Jorge Arévalo <jorge.arevalo at gmail.com
>>>> <mailto:jorge.arevalo at gmail.com>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   Hi,
>>>>
>>>>
>>>>
>>>>   Thanks for your response Kevin. I can provide some
>>>> data:
>>>>
>>>>
>>>>
>>>>   - Screenshot (over geoserver) of the
>>>> "water" multipolygon. You can
>>>>
>>>>   see the shape of Spain and the Balearic Islands, and
>>>> a hole on the
>>>>
>>>>   left (Portugal). The multipolygon is open, isn't
>>>> it? -->
>>>>
>>>>   http://www.nebulared.com/tmp_geo/water_multipolygon.jpg
>>>>
>>>>
>>>>
>>>>   - Zoom over Menorca (one of the Balearic Islands). As
>>>> you can see,
>>>>
>>>>   the point tested is inside the "airports"
>>>> multipolygon, but
>>>>
>>>>   ST_Within and ST_Contains returns that the point is
>>>> inside both,
>>>>
>>>>   "water" and "airports"
>>>> multipolygons -->
>>>>
>>>>   http://www.nebulared.com/tmp_geo/zoom_in_water_multipolygon.jpg
>>>>
>>>>
>>>>
>>>>   - The "ST_AsText" data for the
>>>> "airport" multipolygon -->
>>>>
>>>>   http://www.nebulared.com/tmp_geo/airports_multipolygon.txt
>>>>
>>>>
>>>>
>>>>   - The "ST_AsText" data for the
>>>> "water" multipolygon (rar file, 2.5
>>>>
>>>>   MB of plain text...) -->
>>>>
>>>>   http://www.nebulared.com/tmp_geo/water_multipolygon.rar
>>>>
>>>>
>>>>
>>>>   So, how can I check the validity of my multipolygons?
>>>> Do someone
>>>>
>>>>   could help me with this?
>>>>
>>>>
>>>>
>>>>   Now, the query. My context is the next:
>>>>
>>>>   - I have a table with polygons that I want to check
>>>> (to say if they
>>>>
>>>>   are in an airport, in a forest, in water, etc). We
>>>> can call this
>>>>
>>>>   table T1. I use the centroid of the polygons to
>>>> check, not the
>>>>
>>>>   entire polygon.
>>>>
>>>>   - In other different table, that we'll call T2, I
>>>> have all the
>>>>
>>>>   multipolygons ("water",
>>>> "airports", etc). The name of the field that
>>>>
>>>>   says if the multipolygon is "water" or
>>>> "airport" or anything is
>>>>
>>>>   "clutter"
>>>>
>>>>
>>>>
>>>>   With this code, I pretend to loop over T1, checking
>>>> if the centroid
>>>>
>>>>   of its polygons are inside of one (and ONLY one) of
>>>> the
>>>>
>>>>   multipolygons of T2.  (Is part of a procedure)
>>>>
>>>>
>>>>
>>>>   (...)
>>>>
>>>>   BEGIN
>>>>
>>>>     FOR s IN SELECT * FROM T1 LOOP
>>>>
>>>>           select clutter from T2 where
>>>>
>>>>   ST_Within(ST_Centroid(T1.wkb_geometry),
>>>> T2.wkb_geometry))
>>>>
>>>>           (...)
>>>>
>>>>     END LOOP;
>>>>
>>>>     RETURN;
>>>>
>>>>   END;
>>>>
>>>>
>>>>
>>>>   Most of the times, the select inside the loop,
>>>> returns more than one
>>>>
>>>>   result ("water" and other one, normally).
>>>> That's my problem
>>>>
>>>>
>>>>
>>>>   I have another version of the query, with Java.
>>>> Again, I loop over
>>>>
>>>>   the table T1, and check the centroid of its polygons
>>>> with all the
>>>>
>>>>   multipolygons (water, airports...) that are in T2.
>>>> Then, my query,
>>>>
>>>>   inside the loop, is:
>>>>
>>>>
>>>>
>>>>   SELECT clutter FROM T2 WHERE ST_Within(?,
>>>> T2.wkb_geometry)
>>>>
>>>>
>>>>
>>>>   The '?' is replaced with the centroid of the
>>>> polygon in T1, of
>>>>
>>>>   course. Basically, is the same idea of the previous
>>>> SQL code, but
>>>>
>>>>   with Java.
>>>>
>>>>
>>>>
>>>>   Apart from this, other problem is that ST_Within
>>>> takes a long time.
>>>>
>>>>   1 sec per query. And I have like 2 millions of
>>>> polygons (queries) to
>>>>
>>>>   check...
>>>>
>>>>
>>>>
>>>>   My multipolygons' table (T2) has only 29 rows
>>>> (small table with
>>>>
>>>>   large geometries), and I found this
>>>>
>>>>   http://postgis.refractions.net/documentation/manual-1.3/ch05.html.
>>>>
>>>>   Seems to be good for me. But using "SET
>>>> enable_seqscan TO off", the
>>>>
>>>>   queries take the same time (~ 1 sec). Maybe the other
>>>> approach
>>>>
>>>>   (create an additional column that "caches"
>>>> the bbox, and matching
>>>>
>>>>   against this) could help me. Any ideas on how to
>>>> create this bbox
>>>>
>>>>   with my huge multipolygons?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   Thanks in advance
>>>>
>>>>
>>>>
>>>>   Regards
>>>>
>>>>   Jorge
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   On Wed, May 6, 2009 at 4:28 AM, Kevin Neufeld
>>>>
>>>>   <kneufeld at refractions.net
>>>> <mailto:kneufeld at refractions.net>>
>>>> wrote:
>>>>
>>>>
>>>>
>>>>       Jorge Arévalo wrote:
>>>>
>>>>
>>>>
>>>>           ... If is useful, I tested the method
>>>> "ST_isvalid" with the
>>>>
>>>>           multipolygon and returns
>>>> "false". Maybe the multipolygon is
>>>>
>>>>           not closed? I loaded the data from a
>>>> shapefile. Is it
>>>>
>>>>           possible to create a
>>>> "non-valid" multipolygon? Does PostGIS
>>>>
>>>>           accept this?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>       Ah, yes.  Most spatial predicates in PostGIS
>>>> assume the input
>>>>
>>>>       geometry is valid.  Suppose you had defined a
>>>> POLYGON with a
>>>>
>>>>       hole or inner ring outside of the exterior
>>>> ring.  What is the
>>>>
>>>>       area?  Does the question even make sense?
>>>> PostGIS allows
>>>>
>>>>       invalid geometries in the database so users can
>>>> make full use of
>>>>
>>>>       the PostGIS toolset to do whatever they need
>>>> (ie. breakdown a
>>>>
>>>>       polygon to it's constituent linework and
>>>> rebuild it back up
>>>>
>>>>       again to a valid polygon)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>           Ok, being even more specific. I'm
>>>> working with data about
>>>>
>>>>           Spain. I have a HUGE multipolygon that
>>>> represents "water"
>>>>
>>>>           (this is, the coasts around Spain and its
>>>> islands). Then,
>>>>
>>>>           the "holes" inside this
>>>> multypolygon have the shape of
>>>>
>>>>           Spain, Balearic Islands and Canary
>>>> Islands. Of course, I
>>>>
>>>>           have more multipolygons that represent
>>>> "forests",
>>>>
>>>>           "airports", "cities",
>>>> etc, that fit into these holes.
>>>>
>>>>
>>>>
>>>>           Really, my problem is with some points
>>>> that belong to an
>>>>
>>>>           airport in an island. Using
>>>> "ST_Within" and "ST_Contains",
>>>>
>>>>           the result is that these points belong to
>>>> the multipolygon
>>>>
>>>>           "airport" and multipolygon
>>>> "water" at same time. Obviously,
>>>>
>>>>           the island (and its airport) is
>>>> surrounded by water, but the
>>>>
>>>>           airport's points shouldn't be
>>>> part of the multopolygon
>>>>
>>>>           "water". And, as I said, when I
>>>> apply "ST_isvalid" to the
>>>>
>>>>           multipolygon "water", returns
>>>> false. Maybe is not closed?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>       Yeah, as mentioned before, a point that is not
>>>> on the surface of
>>>>
>>>>       a (multi)polygon (whether in a hole or
>>>> completely outside) is
>>>>
>>>>       not considered within the (multi)polygon.  It
>>>> does sound like
>>>>
>>>>       your ocean polygon has validity issues.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>           Oh, btw, what's the difference
>>>> between "ST_Within" and
>>>>
>>>>           "Within". Does
>>>> "ST_Within" use index instead of geometry? Am
>>>>
>>>>           I right?
>>>>
>>>>
>>>>
>>>>       No, not instead of.  Both use the actual
>>>> geometry for testing
>>>>
>>>>       within.  ST_Within will also use the index to
>>>> narrow down the
>>>>
>>>>       candidate list first.
>>>>
>>>>       As you can see, the definition of ST_Within is
>>>> just a simple SQL
>>>>
>>>>       wrapper that first invokes the index.
>>>>
>>>>       postgis=# select prosrc from pg_proc where
>>>> proname = 'st_within';
>>>>
>>>>                     prosrc
>>>>         ---------------------------------------
>>>>
>>>>       SELECT $1 && $2 AND _ST_Within($1,$2)
>>>>
>>>>       (1 row)
>>>>
>>>>
>>>>
>>>>
>>>> http://postgis.refractions.net/documentation/manual-svn/ST_Within.html
>>>>
>>>>
>>>>
>>>>       Cheers,
>>>>
>>>>       Kevin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>>
>>>>       postgis-users mailing list
>>>>
>>>>       postgis-users at postgis.refractions.net
>>>>
>>>>       <mailto:postgis-users at postgis.refractions.net>
>>>>
>>>>       http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>>
>>>> postgis-users mailing list
>>>>
>>>> postgis-users at postgis.refractions.net
>>>>
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>>
>>>> _______________________________________________
>>>>
>>>> postgis-users mailing list
>>>>
>>>> postgis-users at postgis.refractions.net
>>>>
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>>
>>>>
>>>>
>>>> -----Inline Attachment Follows-----
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>>  _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>> --
>>
>> Ben Madin
>> REMOTE INFORMATION
>>
>> t : +61 8 9192 5455
>> f : +61 8 9192 5535
>> m : 0448 887 220
>> Broome   WA   6725
>>
>> ben at remoteinformation.com.au
>>
>>
>>
>>                                                        Out here, it pays
>> to know...
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20090511/ab2ebbbe/attachment.html>


More information about the postgis-users mailing list