[gdal-dev] Re: Performance of reading large polygons with holes
Even Rouault
even.rouault at mines-paris.org
Sun Apr 22 07:23:00 EDT 2012
On Sunday 22 April 2012 10:53:22, Jukka Rahkonen wrote:
> Rahkonen Jukka <Jukka.Rahkonen <at> mmmtike.fi> writes:
> > Thus it is 4 seconds vs. 32 seconds measured by Martin. It is a
> > considerable difference but perhaps Martin is not doing exactly the
> > same thing. Anyway, speed of OGR seems to be excellent.
>
> I am acting as the man in the middle; Martin now writes as follows:
>
> "Since the claimed 4 s for OGR seems suspiciously fast, I played around
> with some scenarios intended to ensure that OGR was actually reading and
> constructing every geometry. I eventually settled on computing the
> maximum area of the polygons, since this was the simplest query I could
> come up with that would guarantee building the polygons. On the Java
> side I used JEQL, since I could easily replicate this query, it's using
> more or less the same shapefile code as OJ, and it was the source of the
> 32 s number I gave earlier.
>
> The result surprised me: OGR: ~40s, JEQL ~20s.
>
> The details, in case anyone wants to retry this:
>
> OGR:
>
> ogrinfo -sql "select max(OGR_GEOM_AREA) from tpi_1" tpi_1.shp
>
> Result: max_OGR_GEOM_AREA (Real) = 68476900073.166
>
> JEQL:
>
> ShapefileReader t file: "tpi_1.shp";
> t = select max(Geom.area(GEOMETRY)) from t;
> Print t;
>
> Result:
>
> col0:Double
> 68476900073.13647
> Run completed in 19.015 s
>
> (Good to see that the areas are within 0.03 m^2!)
>
> So either the OGR area routine is slow, or else the reader is pretty
> smart and only builds geometries when it really has to. On the JEQL
> side, the read time of 32 s I quoted before was actually based on a
> query which was computing the maximum number of points in the geometries
> (again, to force reading the geometries, since JEQL uses lazy
> evaluation). So for some reason counting the max number of points is
> slow, which is peculiar. More research required to track down why that
> is. But I think the area-based result is valid."
>
Hi,
OK, this is interesting for several reasons:
1) Trying to run the 'ogrinfo -sql "select max(OGR_GEOM_AREA) from tpi_1"
tpi_1.shp' showed that a badly written optimization in OGR 1.9.0 broke such a
query. Now fixed per http://trac.osgeo.org/gdal/ticket/4633
2) With GDAL trunk, 1.9, 1.8 and 1.7, I never get Martin's result. I always
get:
MAX_OGR_GEOM_AREA (Real) = 70726589354.8597
Reached on shape index = 289 (first shape is index = 0):
OGRFeature(tpi_1):289
id (Real) = 904743.0000000000000000000000000000000
gridcode (Real) = 4.0000000000000000000000000000000
class_name (String) = Plains or Open Slopes
I thought it could come from a compiler issue, but I get the same results with
GCC 4.4.3 in debug and optimized mode, and also with MSVC 2008.
3) This runs in about 5 seconds on all those versions on my machine (after
several runs so that the I/O cache is hot):
OS: Ubuntu 10.04 64bit
CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
RAM: 4 GB
Best regards,
Even
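
For anyone who wants to cross-check the two differing maximum areas without
either tool, here is a minimal pure-Python sketch of the same kind of query:
the area of each polygon with holes, then the maximum and the index where it
is reached. It assumes planar coordinates and that a polygon's area is the
exterior ring area minus the hole areas, computed with the shoelace formula.
The function names (ring_area, polygon_area, max_area_with_index) and the
list-of-rings layout are hypothetical, chosen for illustration; this is not
the code used by OGR or JEQL, and reading the shapefile is left out.

```python
def ring_area(ring):
    """Unsigned area of one ring (list of (x, y) tuples), shoelace formula.

    Works whether or not the first point is repeated at the end, because
    the repeated closing segment contributes zero to the sum.
    """
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area(rings):
    """Area of a polygon given as [exterior, hole1, hole2, ...].

    Assumption: holes lie inside the exterior and do not overlap.
    """
    exterior, holes = rings[0], rings[1:]
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def max_area_with_index(polygons):
    """Return (index, area) of the largest polygon; first shape is index 0."""
    best_index, best_area = -1, float("-inf")
    for index, rings in enumerate(polygons):
        area = polygon_area(rings)
        if area > best_area:
            best_index, best_area = index, area
    return best_index, best_area


if __name__ == "__main__":
    # Made-up test shapes, not data from tpi_1.shp.
    square_3 = [[(0, 0), (3, 0), (3, 3), (0, 3), (0, 0)]]
    square_4_with_hole = [
        [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)],
        [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)],
    ]
    print(max_area_with_index([square_3, square_4_with_hole]))
```

Feeding both readers' rings for the same feature through such a function
would show whether the discrepancy comes from the geometry that was read or
from the area routine itself.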