[gdal-dev] Draft GDAL/OGR class hierarchy for GDAL 2.0

Etienne Tourigny etourigny.dev at gmail.com
Thu Mar 27 12:23:47 PDT 2014


Dmitry - I am not sure that a wholescale refactor is the way to go - many
formats will be dropped because quite a few do not have active maintainers,
and development is mostly on a volunteer basis (correct me if I am wrong,
Even).

As an example, the netcdf driver has been mostly maintained by myself for
the past few years (with the help of a few users and devs), and it has been
about 99% voluntary work. And I will be unable to work on it for the next
year, so it probably would not be upgraded if the API changes drastically.
And I am sure there are quite a few drivers in the same situation.

It there a way to think of a new design, while retaining backwards
compatibility for the drivers that would not be updated? Especially for the
raster drivers, which would probably need less changes.

cheers
Etienne


On Thu, Mar 27, 2014 at 4:13 PM, Dmitriy Baryshnikov
<bishop.dev at gmail.com>wrote:

> Hi Even,
>
> I think that we can discuss the starting point for 2.0.
> You start with idea "...to "refactor" a code base of 1.2 million lines.."
> Maybe we have to start from white list - create ideal (brilliant) set of
> classes and create a well-designed foundation for GDAL. Also rewrite
> several wide used drivers (shp, tab, PostGIS, etc.). Than this will works
> very good, begin some kind of hackathon for rewriting overs.
> Also we can add some testing framework inside GDAL (e.g. googletest) and
> start empty repo on github for GDAL 2.0.
> The GDAL can be enhanced, we can move common things that use several
> drivers to core or replace some core thing with outer libraries. CMake
> maybe used as build system.
> Yes this is deep refactoring, but maybe this is need for GDAL 2.0?
>
> Best regards,
>     Dmitry
>
> 27.03.2014 14:24, Even Rouault пишет:
>
>  Hi,
>>
>>  Hi all,
>>>
>>> MY opinion is that's not a really good way to count how many classes you
>>> will have and trying to have minimal classes.
>>>
>>> Drawing a model with UML is not easy : it's not easy to have a good model
>>> from the beginning, it's not easy to name all members and methods
>>> correctly, and the worst is inheritance.
>>> You can think you spend too much time to make the model. But you can't
>>> say
>>> if you will spend less or much time to change code because your model was
>>> too simple.
>>>
>> The issue is that I'm trying to "refactor" a code base of 1.2 million
>> lines of
>> C/C++ code (not mentionning the 140 000 lines of Python tests), and with
>> 211
>> existing drivers. So, given that nobody will probably ever want to fund
>> the
>> effort, it is completely excluded to have to edit each of those drivers
>> in a non
>> automated way...
>> The end solution will necessary be a compromise. I would also like the
>> existing
>> C API to continue working as much as possible.
>>
>>  Because I don't know all the model of Gdal and the model can be "reset" I
>>> have some questions :
>>>
>>> 1) Why GDALMajorObject not inherit of GDALIMajorObject ?
>>>
>> It was just an omission in the UML diagram. My C++ sketch had this
>> inheritance.
>>
>>  2) Why all classes inherit (directly or not) of GDALMajorObject ?
>>>
>> That's how it is done currently.
>>
>>  What do
>>> you store in it ?
>>>
>> It contains a list of list of strings, or in more concrete terms, metadata
>> domains that contain metadata items.
>>
>>  Is it possible to put it as a member in classes ?
>>>
>> That could have been a way of doing it, yes. Always the debate
>> composition vs
>> inheritance.
>>
>>  Have
>>> you consider templates ?
>>>
>> Actually, no. I tend to avoid them unless they are an obvious solution.
>>
>>  3) An interface for Drivers ? geName(), getLongName(), getDescription() ?
>>> Or maybe abstract ...
>>>
>>> 4) What is GDALDataset in your diagram ?
>>>
>> It is the base class that a driver that supports both raster and vector
>> would
>> have to subclass.
>>
>>  5) Where is the link between Driver (Gdal/Ogr) and Dataset
>>> (Raster/Vector)
>>> ? Member, templates ?
>>>
>> A Driver instanciates a Dataset by opening an existing one, or by
>> creating a new
>> one. But the Dataset object itself is a free-standing objects that is
>> only owned
>> by user code, and not by the Driver.
>>
>> For now, I've dropped my crazy UML diagram that nobody would be able to
>> understand, even if I managed to make it work, and I'm pursuing my attempt
>> just merging OGRDataSource inside GDALDataset. And make OGRDriver and
>> OGRSFDriverRegistrar mostly disappear, or just create a minimized version
>> of
>> them, as compatibility layer for existing drivers, that forward the real
>> work to
>> GDALDriver and GDALDriverManager.
>>
>> Even
>>
>>  Florent
>>>
>>>
>>> 2014-03-25 21:14 GMT+01:00 Even Rouault <even.rouault at mines-paris.org>:
>>>
>>>  Selon Etienne Tourigny <etourigny.dev at gmail.com>:
>>>>
>>>>  I also (respectfully) think that there are too many classes (mostly the
>>>>> abstract classes and interfaces) which make it a bit hard to
>>>>> understand.
>>>>>
>>>>> I like Vincent's second suggestion to have 3 base Dataset classes
>>>>> (GDALRasterDataset, GDALVectorDataset, GDALHybridDataset) which inherit
>>>>> from GDALDataset, and an enum (vector,raster,hybrid) to easily query
>>>>>
>>>> which
>>>>
>>>>> type the Dataset is (or which type its driver supports). You then
>>>>>
>>>> subclass
>>>>
>>>>> the relevant Dataset class and it *should* work auto-magically for
>>>>>
>>>> rasters.
>>>>
>>>>>   For example, a given Dataset which subclasses GDALDataset would be
>>>>> migrated as  a subclass of GDALRasterDataset, which would imply that it
>>>>> does not have a vector dataset. If the implementation changes to allow
>>>>> vector, then you would change its parent and implement relevant vector
>>>>> methods.
>>>>>
>>>>> My view on this is KISS with a minimum number of classes...
>>>>>
>>>> Yes, the practice shows that the UML brainstorming leads to nowhere (at
>>>> least
>>>> with my current limited knowledge of C++ multi-inheritance)... So
>>>> actually
>>>> instead of creating new classes, I think that eventually there will be
>>>> less
>>>> classes... I'm not even sure about the interest of having subclasses of
>>>> GDALDataset to distinguish between raster only, vector only or
>>>> raster+vector. As
>>>> I wrote previously that could be done as a driver metadata instread.
>>>>
>>>> So, for now, I've just "imported" the usefull methods of OGRDataSource
>>>> into
>>>> GDALDataset, and made a few changes here or there so that it works. Well
>>>> except
>>>> ogr_refcount.py, because of a funny discovery was that the ref counting
>>>> of
>>>> GDALDataset and OGRDataSource is (was) not the same. Starts at 1 for
>>>> GDALDataset
>>>> and 0 for OGRDataSource...
>>>>
>>>> The temporary result is in the following branch:
>>>> https://github.com/rouault/gdal-1/commits/unification
>>>>
>>>> OGRDataSource is kept with just that :
>>>>
>>>> class CPL_DLL OGRDataSource : public GDALDataset
>>>> {
>>>> public:
>>>>                          OGRDataSource();
>>>>
>>>>      virtual const char  *GetName() = 0;
>>>>
>>>>      static void         DestroyDataSource( OGRDataSource * );
>>>>
>>>>      OGRSFDriver        *GetOGRDriver() const;
>>>>      void                SetOGRDriver( OGRSFDriver *poDriver );
>>>>      virtual     const char* GetDriverName();
>>>>
>>>> protected:
>>>>      friend class OGRSFDriverRegistrar;
>>>>
>>>>      OGRSFDriver        *m_poOGRDriver;
>>>>
>>>> };
>>>>
>>>> And it could eventually completely disappear (or maybe just remain as a
>>>> convenience class if we don't want to touch existing drivers) if I
>>>> manage
>>>> to
>>>> merge OGRSFDriver into GDALDriver, and get rid of OGRSFDriverRegistrar.
>>>>
>>>> Even
>>>>
>>>>  cheers
>>>>> Etienne
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 25, 2014 at 9:13 AM, Vincent Mora
>>>>> <vincent.mora at oslandia.com>wrote:
>>>>>
>>>>>  On 25/03/2014 11:44, Even Rouault wrote:
>>>>>>
>>>>>>  Selon Vincent Mora <vincent.mora at oslandia.com>:
>>>>>>>
>>>>>>>   On 24/03/2014 21:46, Even Rouault wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> "Release soon, release early", so for people who like UML diagrams
>>>>>>>>> (there
>>>>>>>>>
>>>>>>>>>  is
>>>>>>>>
>>>>>>>>  also a prototype of C++ classes for those who don't like UML very
>>>>>>>>>
>>>>>>>> much),
>>>>
>>>>> here's
>>>>>>>>
>>>>>>>>  a blog entry with the outcome of my thoughts for a possible
>>>>>>>>> re-organisation
>>>>>>>>>
>>>>>>>>>  of
>>>>>>>>
>>>>>>>>  the GDAL/OGR class hierarchy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   http://erouault.blogspot.ca/2014/03/draft-gdalogr-class-
>>>>>>>>>
>>>>>>>> hierarchy-for-gdal.html
>>>>>>>
>>>>>>>  I don't understand the need for HandleRasterData() and
>>>>>>>> HandleVectorData().
>>>>>>>>
>>>>>>>> Isn't inheriting from GDALEmptyRasterDataset or
>>>>>>>>
>>>>>>> GDALEmptyVectorDataset
>>>>
>>>>> sufficient to convey the information that the class doesn't handle
>>>>>>>>
>>>>>>> those
>>>>
>>>>> kind of datasets.
>>>>>>>>
>>>>>>>> A call to GetLayerCount()/GetRasterCount() from the class user is
>>>>>>>> sufficient to say "no vector/raster data here", and a dynamic _cast
>>>>>>>>
>>>>>>> to
>>>>
>>>>> GDALEmptyVectorDataset/GDALEmptyRasterDataset is even clearer to me,
>>>>>>>>
>>>>>>> so
>>>>
>>>>> HandleRasterData()/HandleVectorData() would be a third way to tell
>>>>>>>>
>>>>>>> the
>>>>
>>>>> same thing.
>>>>>>>>
>>>>>>>>  Actually, when thinking more about this, this kind of information
>>>>>>>
>>>>>> should
>>>>
>>>>> be at
>>>>>>> the driver level, and not at the dataset level. The idea is that
>>>>>>> there
>>>>>>> are use
>>>>>>> cases where you want to know if a driver can handle only vector, only
>>>>>>> raster, or
>>>>>>> both. For example if you still want to have separate Open dialog
>>>>>>>
>>>>>> boxes for
>>>>
>>>>> raster or vector.
>>>>>>>
>>>>>>>  Also I'm not sure about the names GDALAbstract* since those are
>>>>>>>>
>>>>>>> partial
>>>>
>>>>> implementations.
>>>>>>>>
>>>>>>>>  I know I'm not very good at naming. Do you have an alternative
>>>>>>>
>>>>>> proposal ?
>>>>
>>>>> The
>>>>>>> issue is that with my draft we end up with a lot of classes and
>>>>>>> interfaces, so
>>>>>>> it is not obvious to find a good name to reflect their content.
>>>>>>>
>>>>>>>  Partial sound ok to me, but I not so good at naming  ?
>>>>>>
>>>>>> Considering the number of classes, I gave it some thoughts this
>>>>>> morning
>>>>>> and, like Dmitry, thought of merging the Abtract (Partial) into the
>>>>>> interface before realizing that, even if it works (you can overload de
>>>>>> default function in Empty) it's a bit ugly and I ended up preferring
>>>>>>
>>>>> your
>>>>
>>>>> solution.
>>>>>>
>>>>>> An altenative would be to have a Dataset with both aspects and provide
>>>>>> three partial specialisations: one for vector (it will behave like
>>>>>>
>>>>> empty
>>>>
>>>>> raster) one for raster, and one for both. Code duplications could be
>>>>>> avoided by implementing protected member functions in the Dataset
>>>>>>
>>>>> class and
>>>>
>>>>> simply calling them in the implementations of partial specialization.
>>>>>>
>>>>> This
>>>>
>>>>> solution avoid the diamond shaped inheritance diagram of hell :)
>>>>>>
>>>>>>     Even
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  _______________________________________________
>>>>>> gdal-dev mailing list
>>>>>> gdal-dev at lists.osgeo.org
>>>>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>>
>>>>>>
>>>> _______________________________________________
>>>> gdal-dev mailing list
>>>> gdal-dev at lists.osgeo.org
>>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>
>>>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>>
>>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20140327/71d97b72/attachment-0001.html>


More information about the gdal-dev mailing list