[gdal-dev] Improving details of the project build system and/or documentation
Howard Butler
howard at hobu.co
Tue May 30 07:57:57 PDT 2023
> On May 29, 2023, at 5:16 PM, Even Rouault <even.rouault at spatialys.com> wrote:
> 300 ms to load the 127 plugins on my system, i.e. each GDAL command line invocation will take at least 300 ms, which might be a significant penalty in some workflows)
PDAL had this penalty and then we did some profiling and saw that most of the cost was the dlopen, not the finding of the plugin via the filesystem. PDAL doesn't have a GDALOpen-like single entry method where drivers are all expected to be loaded and "tried" sequentially, however. We have a map that registers some properties of the "filename" that tells us which driver to actually dlopen, which is managed by libpdal.so. Users also have the ability to explicitly say "open this file with this driver at this dll/so location", which allows them to short cut the map.
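To make the idea concrete, here is a minimal Python sketch of that kind of lazy registry. It is not PDAL's actual code: the class name, the suffix-based lookup, and the driver and library names in the usage note are all hypothetical, and ctypes.CDLL merely stands in for dlopen.

```python
import ctypes
from pathlib import Path


class LazyPluginRegistry:
    """Hypothetical sketch: map filename suffixes to plugin libraries and
    load a library only when a file that needs it is opened."""

    def __init__(self, loader=ctypes.CDLL):
        # The loader is injectable so the sketch can be exercised without
        # real shared libraries; ctypes.CDLL wraps dlopen/LoadLibrary.
        self._loader = loader
        self._by_suffix = {}  # ".ext" -> (driver name, library path)
        self._loaded = {}     # library path -> loaded handle

    def register(self, suffix, driver, library):
        # Registration is cheap: nothing is loaded at this point.
        self._by_suffix[suffix.lower()] = (driver, library)

    def open(self, filename, driver=None, library=None):
        # An explicit driver and library location bypasses the map.
        if driver is None or library is None:
            suffix = Path(filename).suffix.lower()
            if suffix not in self._by_suffix:
                raise LookupError(f"no driver registered for {filename!r}")
            driver, library = self._by_suffix[suffix]
        # Pay the load cost at most once per library.
        if library not in self._loaded:
            self._loaded[library] = self._loader(library)
        return driver, self._loaded[library]
```

With something like this, registering 127 plugins costs only dictionary inserts, and a call such as registry.open("scan.rxp") would load just the one library mapped to ".rxp" (assuming such a mapping had been registered).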
> - you can "emulate" building a single driver as plugin, by re-running a GDAL build with just the dependencies of that driver, and setting -DGDAL_ENABLE_DRIVER_xxxx_PLUGIN=ON. The only build artifact of interest in that use case is the [gdal|ogr]_XXX.so/dll corresponding to that driver. The main inconvenience I can see with this current approach is that you have to pay the price for the core lib to be rebuilt, whereas with the free-standing CMake project, you'd point to the installed GDAL headers & lib
This is not so bad because the pain is borne by those building, but the scenario below, where a user can take a clean source tree and configure/make/install only the driver(s) they want as individual projects, is ideal IMO. A packager or someone looking to build specific plugins could do so against a stock GDAL install with ease.
> - by free-standing CMake project, what do you mean? Couldn't that just be the existing frmts/XXXX/CMakeLists.txt file that would detect it is called as the top level CMakeLists.txt (probably by checking ${CMAKE_CURRENT_SOURCE_DIR} == ${PROJECT_SOURCE_DIR}) and then change its logic to find libgdal and its dependencies? And users would "cmake -S frmts/XXXX -B plugin_XXXX" to configure just for that plugin.
Yes. Here's what we do for PDAL, where just a little bit of extra CMake stuff is required for the -DSTANDALONE=ON scenario https://github.com/PDAL/PDAL/blob/master/plugins/rxp/CMakeLists.txt and a user with a clean source tree can go into plugins/rxp and issue "cmake -DSTANDALONE=ON -DPdal_DIR=wherever -DCMAKE_INSTALL_PREFIX=wherever" and configure/make/install just the plugin. Obviously, GDAL headers/lib need to be findable in this scenario, and only public stuff can be used. The source for this driver is managed inside the PDAL source tree, but it is configurable as a standalone installable project.
> - I don't understand the "where multiple drivers have similar sets of dependencies, could be handled in some kind of hierarchical fashion based on common dependencies". Why is a special case needed if several drivers share the same dependencies? Or maybe you were thinking of drivers depending on other drivers?
I was thinking of drivers that depended on other drivers. It isn't always obvious that some drivers are really families of dependencies.
> - we would need to keep the current monolithic CMake project approach working, at the very least for people doing static builds, but even for people doing dynamic builds as it might still be more practical if you can install all the dependencies at once. Having the standalone capability in addition would be an extra complexity, so we need to be sure it addresses real use cases, and perhaps restrict it to a few select drivers. Obvious candidates for standalone CMake projects would likely be for the few drivers that have proprietary dependencies (ECW, JP2KAK, MrSID, Oracle, ...) and aren't shipped by FOSS binary distributions.
I totally agree that we would need to do some kind of cost/benefit to see if the complexity is worth the trouble. Our experience with PDAL is that drivers with hairy external dependencies that are either closed source or not conveniently distributed are the best candidates for this approach.
> - offering the possibility of building just a driver as a plugin in a unit way would require people to build against the same GDAL headers as libgdal.
Does it really? If communication across the boundary of the plugin to GDAL is using public GDAL pointers/classes/methods that haven't changed in many releases, does the plugin version actually need to perfectly match the main library version? For PDAL it hasn't, and we have used older binary plugins against newer main libraries with success. PDAL's interface in this regard is smaller, however, so the risk of changes causing problems is lower. You could also wire in some explicit plugin versioning if you wanted a way to force a bump. I think if this were a community desire and priority, it could be done.
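As one way to picture explicit plugin versioning, here is a small Python sketch. The two-part interface version and the acceptance rule are assumptions made for illustration only; neither GDAL nor PDAL is claimed to work this way.

```python
from typing import NamedTuple


class AbiVersion(NamedTuple):
    """Hypothetical plugin-interface version, separate from the release number."""
    major: int  # bumped only when the plugin-facing interface breaks
    minor: int  # bumped for backward-compatible additions


def plugin_is_loadable(host: AbiVersion, plugin: AbiVersion) -> bool:
    # An older plugin may run against a newer host as long as no breaking
    # change (major bump) happened in between. A plugin built against a
    # newer interface than the host offers is rejected.
    return plugin.major == host.major and plugin.minor <= host.minor
```

Under this scheme the main library could go through many releases without touching the interface version, and a plugin binary would only need rebuilding when someone deliberately forces a major bump.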
> Is it in the expected cultural background of rasterio users to do that on their own ?
My take on the culture of rasterio users is most just want to do 'pip install rasterio' and have it all work and not have to think about the fact that rasterio depends on a massive native library that must be compiled just so in order for that to happen. Rasterio does very well at abstracting GDAL away and being the pain sponge for its users. Part of that absorption duty is GDAL's build system :)
My hunch is that the ability to more easily take advantage of GDAL plugins within the various GDAL binary deployment archipelagos (conda-forge, pip, DebianGIS, vcpkg, Homebrew, OSGeo4W) would be leveraged if it were easier to do, it reliably worked, and it didn't come with a big performance penalty. I would love for 'pip install rasterio[ecw]' to work and not have to be updated with every GDAL release unless the ECW driver itself was actually updated. That hunch could be misinformed, however, and we would need feedback and information from the community before we commit to changing anything significant.
Howard