FW: [Gdal-dev] OGRGeometry::Contains speed issue

Rob McCulley RMcCulley at county24.com
Wed Mar 22 11:58:39 EST 2006


After more than a month, I finally managed to get back to this script.  I thought I would give you all a recap.

I couldn't use the bounding boxes as an initial step because the bounding box of the AOI is the same as the whole area.

>From my testing, I determined that there is very little penalty from the centroid operation.  I did all of my initial testing on one of the 76 initial shapefiles to make the times a little more reasonable.  The one file ran for 22.4 minutes with the centroid operation in place.  When I removed the centroid operation, it ran for 21.3 minutes.  Not much of penalty.  If I removed the contains operation from the script, the one file ran for about 25 seconds.

The problem was with the complexity of the AOI shapefile itself.  It was a very detailed polygon, with four donut holes (also complex) in it.  What I did was create a generalized polygon within the AOI, which caught about 80% of the polygons within the AOI, then I created I generalized polygon outside of the AOI which eliminated about 80% of the polygons outside of the AOI.  For both of these I ignored the donut holes.  Any of the polygons not figured on the first two passes were then compared against the actual AOI boundary, again with the donut holes removed.  After it was all done, any polygons that were within the AOI were compared to polygons representing the donut holes.  When I ran this on the one file for testing, it took about 40 seconds.  Much better than 20 minutes.

All in all, it works pretty well.  The script now completes on all 76 shapefiles in about 45 minutes instead of 17 hours.

The lesson here is that the complexity of the polygon in the contains operation is very important to the speed of the operation.

Thanks for the help Frank and Sean.  I've now added a couple of more steps to the script, and now in one step and about an hour of processing time, I can perform what use to take the better part of a day for me.

Rob McCulley


-----Original Message-----
From: fwarmerdam at gmail.com [mailto:fwarmerdam at gmail.com]On Behalf Of
Frank Warmerdam
Sent: Wednesday, February 08, 2006 2:07 PM
To: Rob McCulley
Cc: Gdal (E-mail)
Subject: Re: [Gdal-dev] OGRGeometry::Contains speed issue


On 2/8/06, Rob McCulley <RMcCulley at county24.com> wrote:
> Hi all,
>
> I've got a python script that runs through a bunch of shapefiles, performs an OGRGeometry::Transform on them, adds a couple of attributes, and merges them all into one final shapefile.  All told it opens 76 shapefiles containing a total of ~18,000 polygons.  It performs this in about 10 minutes.
>
> My area of interest is an irregular area which contains ~16,000 of the polygons. I added in a couple of lines to the script to check if the polygon was in the aoi, and set an attribute accordingly:
>
> point = newGeom.Centroid()
> aoiTest = aoiGeom.Contains(point)
> if aoiTest == 1:
>         aoiVal = 1
> else:
>         aoiVal = 0
>
> There shouldn't ever be a situation where a polygon will cross the aoi boundary, but I use a centroid for the Contains, just in case it isn't perfect.
>
> The problem I'm having is this makes things a lot slower.  The first run through after adding those lines of code was ~17 hours (At the end the script outputs the run time in seconds).  For my purposes, 17 hours is too long.  I know that the aoi polygon is quite complex (~4000 vertices), and that probably slows things down a bit, but that much?  Am I missing something, or is there a better way of doing this?

Rob,

It might be nice to determine if the it is the Contains
operation, or the Centroid which is expensive. I suspect
it is the contains, but I'm not sure.

It might make sense to setup a much similar, but slightly
smaller AOI and first do the contains test against that.
Then do the test against the whole AOI if the item isn't
trivially contained in the simplified AOI.  That should reduce
the cost of the Contains operation alot.

But we you really need to do is preserve the spatial index
thingy that GEOS computes internally for the AOI from
test to test.  There is no way to do that through OGR.
In fact, I don't know how to do it directly with GEOS either
but I gather it is possible with some degree of fiddling.

Various other hacky solutions which roughly boil down to
maintaining your own course spatial index could also be
applied but will substantially complicate your script.

Modulo this particular issue, I am pleased that ogr and python
was useful for this fairly interesting application.

Best regards,

--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent




More information about the Gdal-dev mailing list