[RFC-24] Updated public draft

Sat Feb 24 19:36:58 EST 2007

2007/2/23, Umberto Nicoletti <umberto.nicoletti at gmail.com>:
> Tamas,
> you suggest we add the refcount member instead of the global map keyed
> by memory address?
>
> Anyway if we encapsulate the INCR/DECR properly we'll be able to
> switch between them later. I'd prefer the hash because it less
> invasive on the data structures and makes for an easier implementation
> (less #IFDEF, etc). More over it is easier to keep it only mapscript
> related, thus leaving the CGI unaffected (smaller structures, less
> memory and no unnecessary refcount housekeeping)

Umberto,

I would prefer the the refcount member instead of using a global
repository since the code should perform well in a multithreaded

environment. We should also avoid the extra cost of the various hash
table lookups. However - for debug purposes - it would be

considerable to establish that table so as to detect the potential
memory allocation problems (like the leaks for example).

> >
> > 2) The destroy function of the objects may be called from different
> > threads so it should be protected by locks (when USE_THREAD is
> > enabled). I would prefer using a per-object lock.
> >
>
> I don't know about C#, but Java and probably Python all have a single
> thread that performs GC so this is unnecessary. Also I've never seen
> this possibility mentioned on the swig docs, can you check?
> If it is needed only for some languages we should wrap it with
> preprocessor macros.
>

I mean that the garbage collector operates on a different thread as
the normal execution. I guess the java runtime implements a

similar approach as the .NET runtime in this case. Therefore the
--refcount will possibly be executed on different thread as the

++refcount.
(It depends on the compiler how the increment and decrement operations
are actually implemented. In some cases the the actual

refcount value will be fetched into a register, the addition and the
substraction will happen in that register and later the value is

written back to the memory. In the meantime the other thread might
change the value in the memory but those changes will

eventually be lost. In addition - for example - when a DWORD value is
not aligned to 4 byte boundaries on a 32 bit system, either

the writing of the value cannot be considered as an atomic operation.
While one part of that value is written the other thread might

read that partially written value back.)

> >
> > 3.2 Add references to the mapscript wrapper object
> > --------------------------------------------------------------------------
> > This effort can be ignored, see below.
> >
>
> I'm afraid we can't. There are many places where the code is like this:
>
> layer->map->layers[layersorder[i]]
>
> the parent object is accessed by the child without checking. If this
> happens and the parent has been gc'ed we obtain a segfault which
> results in a crash of the hosting language, which is exactly what we
> want to avoid *at any cost*.
>

Well, I haven't looked over the code in this regard and needless to
say I'm *totally disappointed*. Using a pointer without checking

it's value is not acceptable in most cases. Now then it's not so
difficult to write some test cases that will definitely fail with

unhandled exception, like:

     layerObj layer = new layerObj(null);
     layer.connection = "data.shp";
     layer.connectiontype = MS_CONNECTION_TYPE.MS_SHAPEFILE;
     layer.open();

DAMN !

> >
> > 3.3 Change arrays of structures in arrays of pointers
> > ---------------------------------------------------------------------------
> > Since we will change the object arrays to object pointer arrays it's
> > safe to destroy the parent when the child is still alive. Therefore
> > there's no need to keep a reference to the parent at the wrapper
> > object.
>
> This is wrong: see my reasoning above.
>

As of the revelation mentioned above we cannot consider the child
object to be self-owned at all. These objects should be kept alive
just as long as the parent. The lifespan of the child cannot exceed
the the lifespan of the parent. Thus we can see the distinction

how msFreeMap calls msFree on the map object, but freeLayer does not
call msFree on the layer object. Therefore freeLayer is

not the best place where the reference counting decrementation should
take place for the layer object.

The early garbage collection issue results in the possibility that the
destructor of the parent object might be called but references

on it's child still exist at the target language. To avoid this
situation we would want to keep references on the parent to extend
it's

lifetime until the reference to the child exists.
In my opinion - however- the protection logic should rather reside at
the C side instead of at the target language side. Otherwise the

code is not coherent with the fashion how the back references are used.

(Apparently at the target language I would like to see a fairly
obvious model as: every wrapper object is responsible for it's own C

memory reference and should not be aware of the lifetime of the other
wrapper objects.)

Incrementing the refcount on the parent at the C side will also keep
the lifespan of the parent as long as the child. In other words

the reference counting on the child should be dispatched to the parent:

int msAddRefMap(mapObj *map)
{
    return ++map->refcount;
}

int msAddRefLayer(layerObj *layer)
{
    if (layer->map)
        msAddRefMap(layer->map);

    return ++layer->refcount;
}

%typemap(out) struct map_obj*, mapObj* {
	msAddRefMap((mapObj*)$1);
	$result = $1;
}

%typemap(out) struct layer_obj*, layerObj* {
	msAddRefLayer((layerObj*)$1);
	$result = $1;
}

In the examples above msAddRefLayer increments the refcount of the
self as well as the refcount of the parent. When adding an

isolated child object to the parent the refcount of the child should
be added to the refcount of the parent. When separating the child

from the parent the refcount of the child should be substracted from
the refcount of the parent.

The destroy function of the layer should also propagate the call to
the parent object

%extend layerObj
{
~layerObj()
    {
        msReleaseLayer(self);
    }
}

%extend mapObj
{
~layerObj()
    {
        msReleaseMap(self);
    }
}

void msReleaseLayer(layerObj *layer)
{
    if (layer->map) {
            --layer->refcount;
            msReleaseMap(layer->map);
        }
        else {
         if (--layer->refcount == 0)
         {
            freeLayer(layer);
            free(layer);
           }
        }
}

void msReleaseMap(mapObj *map)
{
  if (--map->refcount ==0)
  {
    msFreeMap(map);
  }
}

The multithreading related code has been left out of the examples above.

>
> >
> > 3.4 Keep parent references and C internal structure in sync
> > -------------------------------------------------------------------------------------
> >
> > The only SWIG customizations I would want to see are:
> >
> > a) modifying the destroy  function of the wrapper so as to destroy the
> > underlying object every time regardless of the actual SWIGCMemOwn
> > setting.  For the C# binding it looks something like this:
> >
> > %typemap(csdestruct, methodname="Dispose", methodmodifiers="public") SWIGTYPE {
> >     if(swigCPtr.Handle != IntPtr.Zero) {
> >       swigCMemOwn = false;
> >       $imcall;
> >     }
> >     swigCPtr = new HandleRef(null, IntPtr.Zero);
> >     GC.SuppressFinalize(this);
> >   }
> >
>
> We only have to evaluate if it costs more to remove the newobject
> statements or to  write a typemap for every object's destructor, but I
> agree on the concept.
>

I cannot really understand that %newobject approach though. The
%newobject feature denotes that the object should be returned

as SWIGCMemOwn = true which alters the default SWIG behaviour (where
the returned object references are always "disowned").

How the removal of this notation will solve the same problem? On the
contrary, IMHO every object returned to SWIG should be marked as
%newobject in effect.

>
> > 3.5 Destructors obey the reference counter
> > --------------------------------------------------------------
> > Should take care of the multithreading issues have been mentioned above.
> > #2 should be omitted IMHO ("When the children are GC'ed earlier than
> > their parent, they must NULLify their pointers in the parent") this is
> > not an issue.
> >
>
> But then how would the parent know if it has to invoke the destructor
> on a child or not if the
> child has already been gc'ed? If they don't know they'll end up
> dereferencing a NULL pointer and that-is-bad(c).
> Imagine this scenario: a layer is added to a map. After a while they
> are gc'ed in this order: the layer first and then the map. If the
> layer does not set its reference to NULL in the parent the map will
> segfault when it attempts to free the layer.
>

The layer - being added to the map - cannot be considered as an
isolated object any more. The final destruction of the layer should be
occurred when the map is destroyed (as it is now). The map should be
kept alive until references on the layer still exist.
Though the layerObj is GC'ed and the destructor calls msReleaseLayer
the layerObj will not be freed because msReleaseLayer will propagate
the call to msReleaseMap and msFreeMap will actually destroy the
layer.
Upon calling ~layerObj, when there are no more references on the root
object and the garbage collector might even have called the destructor
of the map object the the underlying mapObj struct will be kept alive
until the following sequence destroys the layer:

~layerObj --> msReleaseLayer --> msReleaseMap --> msFreeMap --> freeLayer

Best  regards,

Tamas