[gdal-dev] Call for discussion on RFC 92 text: WKB Only geometries

Daniel Baston dbaston at gmail.com
Mon Feb 6 16:58:32 PST 2023


| And only construct an OGRGeometry when it's asked for? Such as when
GetGeometryRef is called?
I'm wondering about a broader application of this. Would it be helpful
to have the ability to lazy-initialize an OGRGeometry from multiple source
types such as WKB and GEOS, initially storing only a reference to the
external data in WKB/GEOS/etc and actually materializing the geometry when
required? Then methods such as OGRGeometry::exportToWkb and
OGRGeometry::exportToGEOS could check the external data type and use it
directly if it is compatible, avoiding materialization. This would avoid
multiple conversions to/from GEOS in cases where operations are chained, as
well as allowing WKB to pass directly between input and output drivers that
support it. Relatedly, this ability could be used to cache external-format
data when it is generated for an OGRGeometry, avoiding inefficiencies such
as two conversions to GEOS when checking to see if two geometries intersect
before calculating their intersection.
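
A minimal sketch of the idea, not a proposed implementation. Every name here (Point, LazyGeometry, encodePointWkb, decodePointWkb, materializations) is hypothetical and none of it is existing GDAL API. It only handles a 2D point and assumes a little-endian host. The geometry keeps the WKB it was created from, hands it back unchanged on export, and only decodes it the first time the materialized form is requested:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a materialized OGRGeometry (2D point only).
struct Point {
    double x;
    double y;
};

// Encode a point as WKB: byte-order flag, geometry type, x, y (21 bytes).
// Assumption: the host is little-endian, so memcpy yields NDR byte order.
std::vector<std::uint8_t> encodePointWkb(const Point& p) {
    std::vector<std::uint8_t> wkb(21);
    wkb[0] = 1;                    // 1 = little-endian (NDR)
    const std::uint32_t type = 1;  // 1 = Point
    std::memcpy(&wkb[1], &type, sizeof(type));
    std::memcpy(&wkb[5], &p.x, sizeof(p.x));
    std::memcpy(&wkb[13], &p.y, sizeof(p.y));
    return wkb;
}

// Decode the 21-byte point WKB produced by encodePointWkb().
Point decodePointWkb(const std::vector<std::uint8_t>& wkb) {
    assert(wkb.size() == 21 && wkb[0] == 1);
    Point p{};
    std::memcpy(&p.x, &wkb[5], sizeof(p.x));
    std::memcpy(&p.y, &wkb[13], sizeof(p.y));
    return p;
}

// Hypothetical lazily-initialized geometry: it stores the external WKB
// and builds the in-memory point only when materialize() is called.
class LazyGeometry {
public:
    explicit LazyGeometry(std::vector<std::uint8_t> wkb)
        : wkb_(std::move(wkb)) {}

    // The stored representation is already WKB, so no conversion is needed.
    const std::vector<std::uint8_t>& exportToWkb() const { return wkb_; }

    // Decode on first use, then reuse the cached result.
    const Point& materialize() {
        if (!materialized_) {
            point_ = decodePointWkb(wkb_);
            materialized_ = true;
            ++materializations_;
        }
        return point_;
    }

    // Number of times the WKB was actually decoded (0 or 1).
    int materializations() const { return materializations_; }

private:
    std::vector<std::uint8_t> wkb_;
    Point point_{};
    bool materialized_ = false;
    int materializations_ = 0;
};
```

With this shape, a WKB-to-WKB pass-through never calls materialize(), and repeated calls to materialize() pay the decoding cost once. A cached GEOS handle could presumably be kept in the same way, but that is not shown here.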

Dan




On Sat, Feb 4, 2023 at 1:55 PM Even Rouault <even.rouault at spatialys.com>
wrote:

> Hi Sean,
>
> but wouldn't it be possible for all OGRFeatures to carry WKB data by
> default and add a method to provide it to callers?
>
> My understanding of what you propose would involve massive code rewrites
> in all drivers and wouldn't be desirable from a performance point of view,
> because most drivers can't generate WKB easily (PostGIS and GPKG are the
> exceptions rather than the norm). So either all other drivers would have to
> be modified to compose WKB by hand (a massive coding effort, probably several
> weeks of work, with a significant risk of regressions), or they would get it from the
> ExportToWkb() method of the OGRGeometry instance they currently build, but
> then you pay the price in memory and CPU time to generate WKB that might
> not be consumed by users.
>
> | And only construct an OGRGeometry when it's asked for? Such as when
> GetGeometryRef is called?
>
> Good point, we could make both GetGeometryRef() and GetGeomFieldRef()
> virtual methods whose default implementation would be the same as
> currently, i.e. return the value of the corresponding member variable in the
> base OGRFeature class stored with
> SetGeometry[Directly]()/SetGeomField[Directly]().
>
> And add a new virtual method:
>
> virtual GByte* OGRFeature::GetWKBGeometry(int iGeomField, size_t*
> pnOutSize) const
>
> whose default implementation would just use
> GetGeomFieldRef(iGeomField)->ExportToWkb().
>
> The few drivers that can provide a more efficient implementation (GPKG
> typically) would create a derived class OGRFeatureGPKG with a specific
> implementation of those new virtual methods to avoid systematic OGRGeometry
> instantiation. The only drawback I see is that making GetGeometryRef() and
> GetGeomFieldRef() virtual would have a slight performance impact, but
> probably small enough.
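
A minimal sketch of the virtual-method approach described above, under stated assumptions. Geometry, Feature and FeatureGPKG are hypothetical stand-ins for OGRGeometry, OGRFeature and the suggested OGRFeatureGPKG, and the signatures are simplified (a single geometry field, a std::vector return value instead of GByte* plus size). The instance counter exists only to make the saving observable:

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Wkb = std::vector<std::uint8_t>;

// Hypothetical stand-in for OGRGeometry: wraps bytes and counts how many
// instances have been constructed.
struct Geometry {
    static int& instanceCount() {
        static int n = 0;
        return n;
    }
    Wkb bytes;
    explicit Geometry(Wkb b) : bytes(std::move(b)) { ++instanceCount(); }
    Wkb ExportToWkb() const { return bytes; }
};

// Hypothetical stand-in for the base OGRFeature class.
class Feature {
public:
    virtual ~Feature() = default;

    void SetGeometryDirectly(std::unique_ptr<Geometry> g) {
        geom_ = std::move(g);
    }

    // Default: return the stored member variable, as today.
    virtual const Geometry* GetGeometryRef() { return geom_.get(); }

    // Default: go through the geometry object and export it.
    virtual Wkb GetWKBGeometry() {
        const Geometry* g = GetGeometryRef();
        return g ? g->ExportToWkb() : Wkb{};
    }

protected:
    std::unique_ptr<Geometry> geom_;
};

// Hypothetical driver-specific feature: it keeps the raw WKB read from
// the database and only builds a Geometry if one is asked for.
class FeatureGPKG : public Feature {
public:
    explicit FeatureGPKG(Wkb raw) : raw_(std::move(raw)) {}

    // Build the geometry on first request, then reuse it.
    const Geometry* GetGeometryRef() override {
        if (!geom_) {
            geom_ = std::make_unique<Geometry>(raw_);
        }
        return geom_.get();
    }

    // No Geometry is instantiated on this path.
    Wkb GetWKBGeometry() override { return raw_; }

private:
    Wkb raw_;
};
```

A caller that only needs WKB would call GetWKBGeometry() and never trigger construction of a Geometry, while existing callers of GetGeometryRef() keep working, at the cost of one virtual dispatch per call.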
>
>
> But fundamentally I'm wondering if RFC 92 hasn't been made mostly
> obsolete now that we have RFC 86. RFC 86 generally leads to a 2x speed-up or
> more on real-world datasets compared to OGRFeature iteration (as measured
> by the bench_ogr_c_api vs bench_ogr_batch utilities) on drivers that have
> implemented it (currently Arrow, Parquet, FlatGeobuf, GPKG), whereas RFC 92
> only applies to GPKG & PostGIS and in the best - artificial - case only
> leads to a 30% speed-up.
>
> Of course, adopting RFC 86 requires significant effort from GDAL users,
> but the benefit is really measurable whereas with RFC 92 it would be
> marginal in most scenarios. As far as I can tell, the performance boost of
> RFC 86 comes mostly from saving the creation & destruction of millions of
> OGRFeature instances, their array members, string attributes and geometry
> objects, more than from the columnar organization of the ArrowArray data
> structures. In the GeoPackage driver, I've also shown that it makes
> efficient multi-threaded pre-fetching possible, in a way that is totally
> transparent to the user.
>
> But to avoid selling false hopes, the benefit of RFC 86 in end-to-end
> scenarios would probably drop significantly if you need to recreate
> individual features (QGIS' QgsFeature or MapServer's msShape objects) from
> the content of ArrowArray, at least when looking at the performance gain as
> a percentage (the absolute performance savings on the GDAL side would
> remain). So a complete shift of concepts would likely be required.
>
> Even
>
>
>
> On Tue, Jan 31, 2023 at 4:27 AM Even Rouault <even.rouault at spatialys.com>
> wrote:
>
>> Hi,
>>
>> Please find for review "RFC 92 text: WKB Only geometries" at
>> https://github.com/OSGeo/gdal/pull/7149
>>
>> This RFC provides shortcuts to avoid instantiation of full OGRGeometry
>> instances in scenarios where only the WKB representation of geometries
>> is needed. The hope is to save CPU time.
>>
>> This is something I wanted to at least experiment with. I have mixed
>> feelings about whether it's something we actually want to adopt.
>>
>> Even
>>
>> --
>> http://www.spatialys.com
>> My software is free, but my time generally not.
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>
>
> --
> Sean Gillies
>