[gdal-dev] Re: Performance of reading large polygons with holes

Even Rouault even.rouault at mines-paris.org
Sun Apr 22 07:23:00 EDT 2012


On Sunday 22 April 2012 10:53:22, Jukka Rahkonen wrote:
> Rahkonen Jukka <Jukka.Rahkonen <at> mmmtike.fi> writes:
> > Thus it is 4 seconds vs. 32 seconds measured by Martin. It is a
> > considerable difference but perhaps Martin is not doing exactly the
> > same thing.  Anyway, speed of OGR seems to be excellent.
> 
> I am acting as a man in the middle, and Martin now writes as follows:
> 
> "Since the claimed 4 s for OGR seems suspiciously fast, I played around
> with some scenarios intended to ensure that OGR was actually reading and
> constructing every geometry.  I eventually settled on computing the
> maximum area of the polygons, since this was the simplest query I could
> come up with that would guarantee building the polygons.  On the Java
> side I used JEQL, since I could easily replicate this query, it's using
> more or less the same shapefile code as OJ, and it was the source of the
> 32 s number I gave earlier.
> 
> The result surprised me:  OGR: ~40s, JEQL ~20s.
> 
> The details, in case anyone wants to retry this:
> 
> OGR:
> 
> ogrinfo -sql "select max(OGR_GEOM_AREA) from tpi_1" tpi_1.shp
> 
> Result:    max_OGR_GEOM_AREA (Real) = 68476900073.166
> 
> JEQL:
> 
> ShapefileReader t file: "tpi_1.shp";
> t = select max(Geom.area(GEOMETRY)) from t;
> Print t;
> 
> Result:
> 
> col0:Double
> 68476900073.13647
> Run completed in 19.015 s
> 
> (Good to see that the areas are within 0.03 m^2!)
> 
> So either the OGR area routine is slow, or else the reader is pretty
> smart and only builds geometries when it really has to.  On the JEQL
> side, the read time of 32 s I quoted before was actually based on a
> query which was computing the maximum number of points in the geometries
> (again, to force reading the geometries, since JEQL uses lazy
> evaluation). So for some reason counting the max number of points is
> slow, which is peculiar. More research required to track down why that
> is.  But I think the area-based result is valid."
> 

Hi,

OK, this is interesting for several reasons:

1) Trying to run 'ogrinfo -sql "select max(OGR_GEOM_AREA) from tpi_1" 
tpi_1.shp' revealed that a badly written optimization in OGR 1.9.0 broke this 
kind of query. This is now fixed per http://trac.osgeo.org/gdal/ticket/4633

2) With GDAL trunk, 1.9, 1.8 and 1.7, I never get Martin's result. I always 
get:

  MAX_OGR_GEOM_AREA (Real) = 70726589354.8597

This maximum is reached at shape index 289 (the first shape has index 0):

OGRFeature(tpi_1):289
  id (Real) = 904743.0000000000000000000000000000000
  gridcode (Real) = 4.0000000000000000000000000000000
  class_name (String) = Plains or Open Slopes

I thought it might come from a compiler issue, but I get the same results with 
GCC 4.4.3 in debug and optimized modes, and also with MSVC 2008.
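
For anyone who wants to cross-check the two area figures independently of 
both readers, here is a minimal pure-Python sketch of the usual way to 
compute the area of a polygon with holes (shoelace formula on each ring, 
shell minus holes) while tracking the 0-based index of the largest one. 
The function names and the ring representation (lists of (x, y) tuples, 
shell first) are assumptions made for illustration; this is not the actual 
OGR or JEQL/JTS code.

```python
def ring_area(ring):
    """Unsigned area of a ring given as [(x, y), ...] (shoelace formula).

    Works whether or not the first point is repeated at the end.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(rings):
    """Area of a polygon: the first ring is the shell, the others are holes."""
    shell, holes = rings[0], rings[1:]
    return ring_area(shell) - sum(ring_area(hole) for hole in holes)


def max_area_with_index(polygons):
    """Return (index, area) of the largest polygon; index is 0-based."""
    best_index, best_area = -1, float("-inf")
    for i, rings in enumerate(polygons):
        area = polygon_area(rings)
        if area > best_area:
            best_index, best_area = i, area
    return best_index, best_area


if __name__ == "__main__":
    # Hypothetical test data: a 10x10 square with a 2x2 hole, and a 5x5 square.
    square_with_hole = [
        [(0, 0), (10, 0), (10, 10), (0, 10)],
        [(1, 1), (3, 1), (3, 3), (1, 3)],
    ]
    small_square = [[(0, 0), (5, 0), (5, 5), (0, 5)]]
    print(max_area_with_index([small_square, square_with_hole]))
```

Feeding the rings of shape 289 to such a routine would give a third, 
independent value to compare against the two results quoted in this thread.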

3) This runs in about 5 seconds on all those versions on my machine (after 
several runs, so that the I/O cache is hot):

OS: Ubuntu 10.04 64-bit
CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
RAM: 4 GB
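
Since cold versus hot I/O cache can easily explain part of a timing gap, 
here is a small sketch of how the runs could be timed one after another so 
that the first (possibly cold) run can be told apart from the later ones. 
The helper name time_command is an assumption for illustration, and the 
command used below is only a stand-in that does nothing; substitute the 
ogrinfo invocation from this thread to time the real query.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times; return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the command's output so printing does not dominate the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere (assumption, not the
    # real benchmark); replace with the ogrinfo command line to measure it.
    cmd = [sys.executable, "-c", "pass"]
    timings = time_command(cmd, runs=3)
    print("first run: %.3f s" % timings[0])
    print("later runs: " + ", ".join("%.3f s" % t for t in timings[1:]))
```

Reporting the first run separately from the following ones makes it clearer 
whether two people are comparing cold-cache or hot-cache numbers.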

Best regards,

Even

