[postgis-users] Problem with ST_Within, Polygons and Multipolygons

Jorge Arévalo jorge.arevalo at gmail.com
Mon May 11 09:04:44 PDT 2009


Sorry, a mistake

Change this:
http://www.nebulared.com/tmp_geo/water polygons.jpg

For this:
http://www.nebulared.com/tmp_geo/water_polygons.jpg<http://www.nebulared.com/tmp_geo/water>



2009/5/11 Jorge Arévalo <jorge.arevalo at gmail.com>

> Hello again,
>
> I think that we are very close to solve this problem (I mean, my co-worker
> and me :-)).
>
> Our original data were a TAB file, from Mapinfo. If we open this file with
> this software, the result is this:
> http://www.nebulared.com/tmp_geo/mapinfo_all_polygons.jpg
>
> We can see the islands with different colours than the water. So, Mapinfo
> "understands" the data in the way we want. And if we open the table directly
> in Quantum GIS, we get this:
> http://www.nebulared.com/tmp_geo/water_polygons_in_qgis.png (you can see
> explained our problem in the picture).
>
> But, if we select all the polygons of "water", we get this:
> http://www.nebulared.com/tmp_geo/water_polygons_selected_in_qgis.png(explained, the cause of our problem)
>
> So, we have some kind of "water" polygons, inside the multipolygon "water",
> with exactly the same shape of the islands. More clear:
> http://www.nebulared.com/tmp_geo/water polygons.jpg
>
> We need to declare this polygons as "inner", and we did, manually
> (transforming outer polygons into inner ones using the WKT representation
> and updating the table):
> http://www.nebulared.com/tmp_geo/fixed_island.jpg
>
> Of course, our data don't have "semantic information" to know that these
> polygons of "water" with the shape of the islands, have to be inner
> polygons, and represent a hole. But Mapinfo knows! So, we did this:
>
> TAB file --> MIF file (with Mapinfo)
> MIF file --> SHP file (with ogr2ogr)
> SHP file --> PostGIS (with shp2pgsql)
>
> And seems to work! I mean, we have holes in the place of islands, and
> QuantumGIS shows the data in the way we want:
> http://www.nebulared.com/tmp_geo/balearic_islands_fixed.png
>
> We don't know how, we don't know why... but that's. Any ideas? Now, we are
> checking the data. Cross as much fingers as you can... Many thanks for your
> help.
>
> Regards! And many thanks
> Jorge
>
> 2009/5/8 Jorge Arévalo <jorge.arevalo at gmail.com>
>
> Hello,
>>
>> Of course I have faith, I need it :-). But, to put you on context: I have
>> a table with 29 rows. Each row has a column with a kind of terrain (as a
>> string) and other column with a multipolygon. Each multipolygon has
>> thousands of polygons. For example, the multipolygon "water" has 8872.
>>
>> Ok, After running ST_IsValid in the entire table, 27 of 29 multipolygons
>> has errors. ST_IsValid only informs about the first error.
>>
>> For these reasons I think that fixing the errors manually could take a
>> really LONG time. Maybe applying ST_Buffer can reduce the amount of errors.
>> I'll try this option, and the Kevin's option too. It's gonna be a hard
>> work... But I'll keep you informed.
>>
>> Regards
>> Jorge
>>
>>
>> On Fri, May 8, 2009 at 2:24 AM, Ben Madin <ben at remoteinformation.com.au>wrote:
>>
>>> Jorge,
>>>
>>> I would have some faith in Brent's approach - if you run a query like :
>>>
>>> select gid, nice_text_name, st_astext(st_centroid(the_geom)) as centroid
>>> from bad_data_set
>>> where st_isValid(the_geom) is false;
>>>
>>> You will quickly get a list in a rough order of the magnitude of your
>>> problem, as well as the notices of where the self-intersections are. In QGIS
>>> there is a Zoom to Point Plugin so you can just put in the value given in
>>> the notice, and then zoom straight in and fix the geometry as best you can
>>> (or need!).
>>>
>>> I recently had to do this for a smallish data set (but plenty of errors),
>>> and once you have your list and have done a few, you may be surprised how
>>> quickly you can tidy them up.
>>>
>>> cheers
>>>
>>> Ben
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 08/05/2009, at 6:39 AM, pcreso at pcreso.com wrote:
>>>
>>>
>>>> HIi
>>>>
>>>> I have tried a few ways of dealing with this, including
>>>> ST_Simplify_PreserveTopology() to try to fix errors.
>>>>
>>>> I have yet to find a solution that automatically resolves such probles
>>>> as I think they should be. Using some variation of buffer(buffer(geom,-1),1)
>>>> instead of buffer(geom,0) sometimes fixes things that the simple buffer of
>>>> zero has failed to.
>>>>
>>>> What I have done that works for me is to use QGIS to edit the geometries
>>>> that are not fixed by the above operations.
>>>>
>>>> I run isvalid() limit 1, then zoom into the returned coodinates &
>>>> manually fix the problems, which so far have always been pretty simple &
>>>> obvious. I then rerun isvalid() & work through each error, It doesn't take
>>>> long per error, 200 or so can be fixed in a morning without too much
>>>> trouble.
>>>>
>>>> Also note that an empty geometry is regarded as an error by some
>>>> applications, but passes isvalid(), so I also do a check for
>>>> ST_NumGeomteries()=0 to clean up PostGIS datasets.
>>>>
>>>>
>>>> HTH,
>>>>
>>>>  Brent Wood
>>>>
>>>> --- On Fri, 5/8/09, Jorge Arévalo <jorge.arevalo at gmail.com> wrote:
>>>>
>>>>  From: Jorge Arévalo <jorge.arevalo at gmail.com>
>>>>> Subject: Re: [postgis-users] Problem with ST_Within, Polygons and
>>>>>  Multipolygons
>>>>> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
>>>>> Date: Friday, May 8, 2009, 9:44 AM
>>>>> Hello,
>>>>>
>>>>> Many thanks for your help Kevin. I can confim one thing:
>>>>> isvalid only returns the first error. Just today, I realized
>>>>> on the big amount of errors of my data, as you said. Indeed,
>>>>> I was reading the same link that you provided me about
>>>>> geometry validity, and I split my multipolygon into single
>>>>> polygons to check them. My conclusions are the same as
>>>>> yours: what a mess!
>>>>>
>>>>>
>>>>>
>>>>> Oh, and thanks again for your response to my original
>>>>> question. It helped me a lot... Maybe is it time to ask for
>>>>> new data, without errors? If not, how should I
>>>>> "clean" my data? I tried with ST_Buffer(geom, 0.0)
>>>>> (I read in this link
>>>>> http://www.bostongis.com/postgis_extent_expand_buffer_distance.snippet
>>>>> that is useful to repair data with self-interesections), but
>>>>> doesn't seem to work for me...
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>> Jorge
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 7, 2009 at 6:58 PM,
>>>>> Kevin Neufeld <kneufeld at refractions.net>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Jorge,
>>>>>
>>>>>
>>>>>
>>>>> So, I had a quick look at your data ... I'm sorry to
>>>>> say that you've got a lot of problems with your ocean
>>>>> polygon.
>>>>>
>>>>> Paul can correct me since he did the work on isvalidreason,
>>>>> but I think st_isvalid only returns the first thing it finds
>>>>> wrong with your input geometry.
>>>>>
>>>>>
>>>>>
>>>>> I did this and found all kinds of things:
>>>>>
>>>>>
>>>>>
>>>>> -- Extract the POLYGONs from the MULTI
>>>>>
>>>>> CREATE TABLE water_polys AS
>>>>>
>>>>> SELECT (ST_Dump(the_geom)).geom AS the_geom
>>>>>
>>>>> FROM water;
>>>>>
>>>>>
>>>>>
>>>>> SELECT ST_IsValidReason(the_geom)
>>>>>
>>>>> FROM water_polys
>>>>>
>>>>> WHERE NOT ST_IsValid(the_geom);
>>>>>
>>>>>
>>>>>
>>>>> So, looking at the individual polygons (of which only the
>>>>> first error is again reported), I found over 66
>>>>> self-intersections (figure 8's and so forth) and several
>>>>> whose boundaries loop back and touch itself.
>>>>>
>>>>>
>>>>>
>>>>> Also, in response to your original question about why a
>>>>> point on your airport also intersects the ocean polygon: it
>>>>> turns out that the island in question is not a hole in your
>>>>> ocean at all ... but a second polygon on top of your ocean
>>>>> polygon.  This is why your point query returns true - there
>>>>> are two surfaces (the island and the water) that intersect
>>>>> the point (which, by the way, also makes an invalid
>>>>> multipolygon).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> A while back I put together some isvalid/issimple geometry
>>>>> descriptions with soem pretty pictures:
>>>>>
>>>>>
>>>>> http://postgis.refractions.net/documentation/manual-svn/ch04.html#OGC_Validity
>>>>>
>>>>>
>>>>>
>>>>> You may want to have a look through it and clean up your
>>>>> dataset.  The invalidity will most certainly result in more
>>>>> unexpected behaviour.
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Kevin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Jorge Arévalo wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>>
>>>>> I have additional info. When running the method
>>>>> "isvalid", I get this warning message:
>>>>>
>>>>>
>>>>>
>>>>> NOTICE:  Self-intersection at or near point -2749.99
>>>>> 4.7955e+06
>>>>>
>>>>>
>>>>>
>>>>> What is exactly a "self-intersection"? May this
>>>>> "self-intersection" be the cause of my problem?
>>>>>
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jorge
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2009/5/6 Jorge Arévalo <jorge.arevalo at gmail.com
>>>>> <mailto:jorge.arevalo at gmail.com>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   Hi,
>>>>>
>>>>>
>>>>>
>>>>>   Thanks for your response Kevin. I can provide some
>>>>> data:
>>>>>
>>>>>
>>>>>
>>>>>   - Screenshot (over geoserver) of the
>>>>> "water" multipolygon. You can
>>>>>
>>>>>   see the shape of Spain and the Balearic Islands, and
>>>>> a hole on the
>>>>>
>>>>>   left (Portugal). The multipolygon is open, isn't
>>>>> it? -->
>>>>>
>>>>>   http://www.nebulared.com/tmp_geo/water_multipolygon.jpg
>>>>>
>>>>>
>>>>>
>>>>>   - Zoom over Menorca (one of the Balearic Islands). As
>>>>> you can see,
>>>>>
>>>>>   the point tested is inside the "airports"
>>>>> multipolygon, but
>>>>>
>>>>>   ST_Within and ST_Contains returns that the point is
>>>>> inside both,
>>>>>
>>>>>   "water" and "airports"
>>>>> multipolygons -->
>>>>>
>>>>>   http://www.nebulared.com/tmp_geo/zoom_in_water_multipolygon.jpg
>>>>>
>>>>>
>>>>>
>>>>>   - The "ST_AsText" data for the
>>>>> "airport" multipolygon -->
>>>>>
>>>>>   http://www.nebulared.com/tmp_geo/airports_multipolygon.txt
>>>>>
>>>>>
>>>>>
>>>>>   - The "ST_AsText" data for the
>>>>> "water" multipolygon (rar file, 2.5
>>>>>
>>>>>   MB of plain text...) -->
>>>>>
>>>>>   http://www.nebulared.com/tmp_geo/water_multipolygon.rar
>>>>>
>>>>>
>>>>>
>>>>>   So, how can I check the validity of my multipolygons?
>>>>> Do someone
>>>>>
>>>>>   could help me with this?
>>>>>
>>>>>
>>>>>
>>>>>   Now, the query. My context is the next:
>>>>>
>>>>>   - I have a table with polygons that I want to check
>>>>> (to say if they
>>>>>
>>>>>   are in an airport, in a forest, in water, etc). We
>>>>> can call this
>>>>>
>>>>>   table T1. I use the centroid of the polygons to
>>>>> check, not the
>>>>>
>>>>>   entire polygon.
>>>>>
>>>>>   - In other different table, that we'll call T2, I
>>>>> have all the
>>>>>
>>>>>   multipolygons ("water",
>>>>> "airports", etc). The name of the field that
>>>>>
>>>>>   says if the multipolygon is "water" or
>>>>> "airport" or anything is
>>>>>
>>>>>   "clutter"
>>>>>
>>>>>
>>>>>
>>>>>   With this code, I pretend to loop over T1, checking
>>>>> if the centroid
>>>>>
>>>>>   of its polygons are inside of one (and ONLY one) of
>>>>> the
>>>>>
>>>>>   multipolygons of T2.  (Is part of a procedure)
>>>>>
>>>>>
>>>>>
>>>>>   (...)
>>>>>
>>>>>   BEGIN
>>>>>
>>>>>     FOR s IN SELECT * FROM T1 LOOP
>>>>>
>>>>>           select clutter from T2 where
>>>>>
>>>>>   ST_Within(ST_Centroid(T1.wkb_geometry),
>>>>> T2.wkb_geometry))
>>>>>
>>>>>           (...)
>>>>>
>>>>>     END LOOP;
>>>>>
>>>>>     RETURN;
>>>>>
>>>>>   END;
>>>>>
>>>>>
>>>>>
>>>>>   Most of the times, the select inside the loop,
>>>>> returns more than one
>>>>>
>>>>>   result ("water" and other one, normally).
>>>>> That's my problem
>>>>>
>>>>>
>>>>>
>>>>>   I have another version of the query, with Java.
>>>>> Again, I loop over
>>>>>
>>>>>   the table T1, and check the centroid of its polygons
>>>>> with all the
>>>>>
>>>>>   multipolygons (water, airports...) that are in T2.
>>>>> Then, my query,
>>>>>
>>>>>   inside the loop, is:
>>>>>
>>>>>
>>>>>
>>>>>   SELECT clutter FROM T2 WHERE ST_Within(?,
>>>>> T2.wkb_geometry)
>>>>>
>>>>>
>>>>>
>>>>>   The '?' is replaced with the centroid of the
>>>>> polygon in T1, of
>>>>>
>>>>>   course. Basically, is the same idea of the previous
>>>>> SQL code, but
>>>>>
>>>>>   with Java.
>>>>>
>>>>>
>>>>>
>>>>>   Apart from this, other problem is that ST_Within
>>>>> takes a long time.
>>>>>
>>>>>   1 sec per query. And I have like 2 millions of
>>>>> polygons (queries) to
>>>>>
>>>>>   check...
>>>>>
>>>>>
>>>>>
>>>>>   My multipolygons' table (T2) has only 29 rows
>>>>> (small table with
>>>>>
>>>>>   large geometries), and I found this
>>>>>
>>>>>   http://postgis.refractions.net/documentation/manual-1.3/ch05.html.
>>>>>
>>>>>   Seems to be good for me. But using "SET
>>>>> enable_seqscan TO off", the
>>>>>
>>>>>   queries take the same time (~ 1 sec). Maybe the other
>>>>> approach
>>>>>
>>>>>   (create an additional column that "caches"
>>>>> the bbox, and matching
>>>>>
>>>>>   against this) could help me. Any ideas on how to
>>>>> create this bbox
>>>>>
>>>>>   with my huge multipolygons?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   Thanks in advance
>>>>>
>>>>>
>>>>>
>>>>>   Regards
>>>>>
>>>>>   Jorge
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>   On Wed, May 6, 2009 at 4:28 AM, Kevin Neufeld
>>>>>
>>>>>   <kneufeld at refractions.net
>>>>> <mailto:kneufeld at refractions.net>>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>       Jorge Arévalo wrote:
>>>>>
>>>>>
>>>>>
>>>>>           ... If is useful, I tested the method
>>>>> "ST_isvalid" with the
>>>>>
>>>>>           multipolygon and returns
>>>>> "false". Maybe the multipolygon is
>>>>>
>>>>>           not closed? I loaded the data from a
>>>>> shapefile. Is it
>>>>>
>>>>>           possible to create a
>>>>> "non-valid" multipolygon? Does PostGIS
>>>>>
>>>>>           accept this?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>       Ah, yes.  Most spatial predicates in PostGIS
>>>>> assume the input
>>>>>
>>>>>       geometry is valid.  Suppose you had defined a
>>>>> POLYGON with a
>>>>>
>>>>>       hole or inner ring outside of the exterior
>>>>> ring.  What is the
>>>>>
>>>>>       area?  Does the question even make sense?
>>>>> PostGIS allows
>>>>>
>>>>>       invalid geometries in the database so users can
>>>>> make full use of
>>>>>
>>>>>       the PostGIS toolset to do whatever they need
>>>>> (ie. breakdown a
>>>>>
>>>>>       polygon to it's constituent linework and
>>>>> rebuild it back up
>>>>>
>>>>>       again to a valid polygon)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           Ok, being even more specific. I'm
>>>>> working with data about
>>>>>
>>>>>           Spain. I have a HUGE multipolygon that
>>>>> represents "water"
>>>>>
>>>>>           (this is, the coasts around Spain and its
>>>>> islands). Then,
>>>>>
>>>>>           the "holes" inside this
>>>>> multypolygon have the shape of
>>>>>
>>>>>           Spain, Balearic Islands and Canary
>>>>> Islands. Of course, I
>>>>>
>>>>>           have more multipolygons that represent
>>>>> "forests",
>>>>>
>>>>>           "airports", "cities",
>>>>> etc, that fit into these holes.
>>>>>
>>>>>
>>>>>
>>>>>           Really, my problem is with some points
>>>>> that belong to an
>>>>>
>>>>>           airport in an island. Using
>>>>> "ST_Within" and "ST_Contains",
>>>>>
>>>>>           the result is that these points belong to
>>>>> the multipolygon
>>>>>
>>>>>           "airport" and multipolygon
>>>>> "water" at same time. Obviously,
>>>>>
>>>>>           the island (and its airport) is
>>>>> surrounded by water, but the
>>>>>
>>>>>           airport's points shouldn't be
>>>>> part of the multopolygon
>>>>>
>>>>>           "water". And, as I said, when I
>>>>> apply "ST_isvalid" to the
>>>>>
>>>>>           multipolygon "water", returns
>>>>> false. Maybe is not closed?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>       Yeah, as mentioned before, a point that is not
>>>>> on the surface of
>>>>>
>>>>>       a (multi)polygon (whether in a hole or
>>>>> completely outside) is
>>>>>
>>>>>       not considered within the (multi)polygon.  It
>>>>> does sound like
>>>>>
>>>>>       your ocean polygon has validity issues.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           Oh, btw, what's the difference
>>>>> between "ST_Within" and
>>>>>
>>>>>           "Within". Does
>>>>> "ST_Within" use index instead of geometry? Am
>>>>>
>>>>>           I right?
>>>>>
>>>>>
>>>>>
>>>>>       No, not instead of.  Both use the actual
>>>>> geometry for testing
>>>>>
>>>>>       within.  ST_Within will also use the index to
>>>>> narrow down the
>>>>>
>>>>>       candidate list first.
>>>>>
>>>>>       As you can see, the definition of ST_Within is
>>>>> just a simple SQL
>>>>>
>>>>>       wrapper that first invokes the index.
>>>>>
>>>>>       postgis=# select prosrc from pg_proc where
>>>>> proname = 'st_within';
>>>>>
>>>>>                     prosrc
>>>>>         ---------------------------------------
>>>>>
>>>>>       SELECT $1 && $2 AND _ST_Within($1,$2)
>>>>>
>>>>>       (1 row)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> http://postgis.refractions.net/documentation/manual-svn/ST_Within.html
>>>>>
>>>>>
>>>>>
>>>>>       Cheers,
>>>>>
>>>>>       Kevin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>>       postgis-users mailing list
>>>>>
>>>>>       postgis-users at postgis.refractions.net
>>>>>
>>>>>       <mailto:postgis-users at postgis.refractions.net>
>>>>>
>>>>>       http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>> postgis-users mailing list
>>>>>
>>>>> postgis-users at postgis.refractions.net
>>>>>
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>>
>>>>> postgis-users mailing list
>>>>>
>>>>> postgis-users at postgis.refractions.net
>>>>>
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -----Inline Attachment Follows-----
>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>>  _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>
>>> --
>>>
>>> Ben Madin
>>> REMOTE INFORMATION
>>>
>>> t : +61 8 9192 5455
>>> f : +61 8 9192 5535
>>> m : 0448 887 220
>>> Broome   WA   6725
>>>
>>> ben at remoteinformation.com.au
>>>
>>>
>>>
>>>                                                        Out here, it pays
>>> to know...
>>>
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20090511/8b046e1d/attachment.html>


More information about the postgis-users mailing list