[gdal-dev] Call for discussion on RFC 92 text: WKB Only geometries

Even Rouault even.rouault at spatialys.com
Sat Feb 4 10:55:37 PST 2023


Hi Sean,


> but wouldn't it be possible for all OGRFeatures to carry WKB data by 
> default and add a method to provide it to callers?

My understanding is that what you propose would involve massive code 
rewrites in all drivers and wouldn't be desirable from a performance 
point of view, because most drivers can't generate WKB easily (PostGIS 
and GPKG are the exceptions rather than the norm). So either all other 
drivers would have to be modified to compose WKB by hand (a massive 
coding effort: probably several weeks of work, with a significant risk 
of regressions), or they would get it from the exportToWkb() method of 
the OGRGeometry instance they currently build, but then you pay the 
price in memory and CPU time to generate WKB that might not be consumed 
by users.

> And only construct an OGRGeometry when it's asked for? Such as when 
> GetGeometryRef is called?

Good point, we could make both GetGeometryRef() and GetGeomFieldRef() 
virtual methods whose default implementation would be the same as 
currently, i.e. return the value of the corresponding member variable in 
the base OGRFeature class, as stored with 
SetGeometry[Directly]()/SetGeomField[Directly]().

And add a new virtual method:

virtual GByte* OGRFeature::GetWKBGeometry(int iGeomField, size_t* 
pnOutSize) const

whose default implementation would just use 
GetGeomFieldRef(iGeomField)->exportToWkb().

The few drivers that can provide a more efficient implementation (GPKG 
typically) would create a derived class OGRFeatureGPKG with a specific 
implementation of those new virtual methods to avoid systematic 
OGRGeometry instantiation. The only drawback I see is that making 
GetGeometryRef() and GetGeomFieldRef() virtual would have a slight 
performance impact, but probably small enough.
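
To make the idea above a bit more concrete, here is a rough, self-contained 
sketch of it. None of this is actual GDAL code: Geometry, Feature and 
GpkgFeature are hypothetical stand-ins for OGRGeometry, OGRFeature and the 
OGRFeatureGPKG class mentioned above, there is a single geometry field, and 
the WKB is returned as a std::vector rather than through the GByte*/size_t* 
signature proposed above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for OGRGeometry. A real geometry holds a parsed
// object model; here it just keeps the bytes it was built from.
struct Geometry {
    std::vector<std::uint8_t> wkb;
    static int instances;  // counts how many geometry objects were built
    explicit Geometry(std::vector<std::uint8_t> bytes) : wkb(std::move(bytes)) {
        assert(!wkb.empty());  // WKB always starts with a byte-order byte
        ++instances;
    }
    std::vector<std::uint8_t> exportToWkb() const { return wkb; }
};
int Geometry::instances = 0;

// Hypothetical stand-in for OGRFeature (one geometry field for brevity).
class Feature {
public:
    virtual ~Feature() = default;
    void SetGeometryDirectly(std::unique_ptr<Geometry> g) { geom_ = std::move(g); }

    // Default: return the stored member, as the base class does today.
    virtual const Geometry* GetGeometryRef() const { return geom_.get(); }

    // Default: serialize through the geometry object.
    virtual std::vector<std::uint8_t> GetWKBGeometry() const {
        const Geometry* g = GetGeometryRef();
        return g ? g->exportToWkb() : std::vector<std::uint8_t>{};
    }

protected:
    mutable std::unique_ptr<Geometry> geom_;  // mutable to allow lazy creation
};

// Hypothetical driver-specific feature that keeps the raw WKB blob read
// from storage and only builds a Geometry when one is requested.
class GpkgFeature : public Feature {
public:
    explicit GpkgFeature(std::vector<std::uint8_t> rawWkb) : raw_(std::move(rawWkb)) {}

    const Geometry* GetGeometryRef() const override {
        if (!geom_) geom_ = std::make_unique<Geometry>(raw_);  // built lazily
        return geom_.get();
    }

    // Hand out the stored blob unless a geometry object already exists
    // (for example one set explicitly by the caller).
    std::vector<std::uint8_t> GetWKBGeometry() const override {
        return geom_ ? geom_->exportToWkb() : raw_;
    }

private:
    std::vector<std::uint8_t> raw_;
};
```

With something of that shape, a caller that only asks for WKB on a 
GpkgFeature never triggers a geometry instantiation, while callers of 
GetGeometryRef() still get an object, just created on first use.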


But fundamentally I'm wondering if RFC 92 hasn't been made mostly 
outdated now that we have RFC 86. RFC 86 generally leads to a 2x speed-up 
or more on real-world datasets compared to OGRFeature iteration (as 
measured by the bench_ogr_c_api vs bench_ogr_batch utilities) on drivers 
that have implemented it (currently Arrow, Parquet, FlatGeobuf, GPKG), 
whereas RFC 92 only applies to GPKG & PostGIS and, in the best 
(artificial) case, only leads to a 30% speed-up.

Of course, adopting RFC 86 requires significant effort from GDAL users, 
but the benefit is really measurable whereas with RFC 92 it would be 
marginal in most scenarios. As far as I can tell, the performance boost 
of RFC 86 comes mostly from avoiding the creation & destruction of 
millions of OGRFeature instances, with their array members, string 
attributes and geometry objects, more than from the columnar 
organization of the ArrowArray data structures. In the GeoPackage 
driver, I've also shown that it makes efficient multi-threaded 
pre-fetching possible, in a way that is totally transparent to the user.

But to avoid selling false hopes, the benefit of RFC 86 in end-to-end 
scenarios would probably drop significantly if you need to recreate 
individual features (QGIS' QgsFeature or MapServer's msShape objects) 
from the content of the ArrowArray, at least when looking at the 
performance gain as a percentage; the absolute savings on the GDAL side 
would remain. So this would likely require a complete shift of 
concepts.
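
As a rough illustration of that last point, here is a small self-contained 
sketch. It does not use the real ArrowArray structures or any GDAL API: 
Batch is a hypothetical, heavily simplified columnar layout (one 
contiguous buffer per column plus offsets), and RowFeature is a 
hypothetical application-side per-feature object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified columnar batch, loosely in the spirit of ArrowArray.
struct Batch {
    std::vector<std::int64_t> fid;
    std::string nameData;                  // all names concatenated
    std::vector<std::size_t> nameOffsets;  // size() == rows() + 1
    std::vector<std::uint8_t> wkbData;     // all WKB blobs concatenated
    std::vector<std::size_t> wkbOffsets;   // size() == rows() + 1
    std::size_t rows() const { return fid.size(); }
};

// Hypothetical per-feature object, as an application might define it.
struct RowFeature {
    std::int64_t fid;
    std::string name;
    std::vector<std::uint8_t> wkb;
};

// Consuming the batch in place: no per-row object is created.
std::size_t TotalWkbBytes(const Batch& b) {
    assert(b.wkbOffsets.size() == b.rows() + 1);
    return b.wkbOffsets.back() - b.wkbOffsets.front();
}

// Rebuilding one object per row: each feature gets its own string and its
// own byte vector again, which is the per-object cost the batch avoided.
std::vector<RowFeature> Materialize(const Batch& b) {
    assert(b.nameOffsets.size() == b.rows() + 1);
    assert(b.wkbOffsets.size() == b.rows() + 1);
    std::vector<RowFeature> out;
    out.reserve(b.rows());
    for (std::size_t i = 0; i < b.rows(); ++i) {
        RowFeature f;
        f.fid = b.fid[i];
        f.name = b.nameData.substr(b.nameOffsets[i],
                                   b.nameOffsets[i + 1] - b.nameOffsets[i]);
        const auto first = b.wkbData.begin() +
                           static_cast<std::ptrdiff_t>(b.wkbOffsets[i]);
        const auto last = b.wkbData.begin() +
                          static_cast<std::ptrdiff_t>(b.wkbOffsets[i + 1]);
        f.wkb.assign(first, last);
        out.push_back(std::move(f));
    }
    return out;
}
```

A consumer in the style of TotalWkbBytes() works directly on the batch 
buffers, whereas one that goes through Materialize() pays again for one 
object per feature on its own side.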

Even


>
> On Tue, Jan 31, 2023 at 4:27 AM Even Rouault 
> <even.rouault at spatialys.com> wrote:
>
>     Hi,
>
>     Please find for review "RFC 92 text: WKB Only geometries" at
>     https://github.com/OSGeo/gdal/pull/7149
>
>     This RFC provides shortcuts to avoid instantiation of full
>     OGRGeometry instances in scenarios where only the WKB
>     representation of geometries is needed. The hope is to save
>     CPU time.
>
>     This is something I wanted to at least experiment with. I have
>     mixed feelings about whether it's something we actually want to
>     adopt.
>
>     Even
>
>     -- 
>     http://www.spatialys.com
>     My software is free, but my time generally not.
>
>     _______________________________________________
>     gdal-dev mailing list
>     gdal-dev at lists.osgeo.org
>     https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
>
>
> -- 
> Sean Gillies

-- 
http://www.spatialys.com
My software is free, but my time generally not.