[GRASS5] Re: Grass clones

Glynn Clements glynn.clements at virgin.net
Thu Mar 14 22:42:02 EST 2002


Markus Neteler wrote:

> Developers: Prof. Antoniol was kind enough to make a first attempt at
> GRASS clone detection. A clone is a piece of code which is very
> similar to another piece of code (poor man's definition).
> If a clone appears several times, it should become a library
> function to simplify the maintenance. In fact, detecting clones
> is not so easy. For an introduction, you may read this paper:
>  Paolo Tonella, ITC-irst: "An Introduction to Clone Detection" 
>  http://mpa.itc.it/grass2001/tonella2001_clones.ps
> 
> I am not re-sending the new GRASS clones analysis here due to bandwidth
> limitations, but I have put this preliminary analysis online:
> 
>  http://mpa.itc.it/markus/tmp/grass.cln
>  (550k, ASCII)
> 
> Please read on in Giulio Antoniol's mail below.
> 
> Giulio: let's wait for some comments... I'll forward if needed.

Some comments:

1. I don't think that we need to consider "similar" functions
initially. By "similar", I mean functions which could be merged into a
single function with the addition of extra parameters. I'm more
concerned about "exact" clones, which could simply be moved into a
library without requiring any interface changes.

I refer to these below as false matches, in the sense that they aren't
what I'm interested in at present. I accept that they could be
considered valid matches in other contexts; they may subsequently
prove useful in highlighting areas where it might be worth rethinking
the design.

2. Removing the parameterisation of identifiers (i.e. requiring
identifier names to match exactly) would eliminate many of the false
matches, without eliminating many (any?) true matches. There are quite
a lot of functions which have identical structure, but operate upon
different global variables. E.g.

src/libes/gis/opencell.c G__reallocate_temp_buf 815 828
src/libes/gis/opencell.c G__reallocate_mask_buf 796 809
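
The effect of identifier parameterisation can be illustrated with a
minimal sketch in Python. This is not the tool used for the analysis;
the two C snippets are invented stand-ins (not the actual contents of
opencell.c), and the names tokens() and same() are hypothetical. It
also uses the crudest form of parameterisation, mapping every
non-keyword identifier to one placeholder.

```python
import re

# C keywords are kept as-is; every other identifier may be parameterised.
# (A subset only; enough for this illustration.)
KEYWORDS = {"if", "else", "for", "while", "return",
            "int", "char", "void", "static", "sizeof"}

# An identifier, a run of digits, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokens(src, parameterise):
    """Tokenise C-like text, optionally replacing identifiers by 'ID'."""
    out = []
    for tok in TOKEN.findall(src):
        if parameterise and re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tok = "ID"
        out.append(tok)
    return out

def same(a, b, parameterise):
    """True if the two fragments have identical token sequences."""
    return tokens(a, parameterise) == tokens(b, parameterise)

# Hypothetical fragments: same structure, different global variables.
TEMP = "if (n > temp_size) { temp_buf = realloc(temp_buf, n); temp_size = n; }"
MASK = "if (n > mask_size) { mask_buf = realloc(mask_buf, n); mask_size = n; }"

print(same(TEMP, MASK, parameterise=True))   # parameterised comparison
print(same(TEMP, MASK, parameterise=False))  # exact-identifier comparison
```

With parameterisation the two fragments compare equal and would be
reported as a clone; with exact identifiers they do not match, which is
the behaviour suggested above.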

3. Alternatively, most[*] of the false matches relate to functions
within a common source file. I think that we can safely eliminate all
such cases. The cases which deserve attention are those where new
functionality was created by copy-paste-modify, and the current
developers are unaware of the origin of the code.

[*] But not all; e.g. r.mapcalc/r.mapcalc3/r3.mapcalc have many
similarly structured functions (e.g. sin/cos/tan), but each lives in
its own source file.
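
Dropping same-file matches could be sketched as follows. This assumes
the report has already been reduced to pairs of entries of the form
"path function start end", as in the two lines quoted under point 2;
the real layout of grass.cln may differ, and the second pair below and
the names parse_entry() and cross_file_pairs() are made up for
illustration.

```python
def parse_entry(line):
    """Split 'path function start end' into typed fields."""
    path, func, start, end = line.split()
    return path, func, int(start), int(end)

def cross_file_pairs(pairs):
    """Keep only pairs whose two entries live in different source files."""
    return [(a, b) for a, b in pairs
            if parse_entry(a)[0] != parse_entry(b)[0]]

pairs = [
    # Same-file pair, taken from the example under point 2.
    ("src/libes/gis/opencell.c G__reallocate_temp_buf 815 828",
     "src/libes/gis/opencell.c G__reallocate_mask_buf 796 809"),
    # Hypothetical cross-file pair (invented paths and line numbers).
    ("a/foo.c f 10 20", "b/bar.c g 30 40"),
]

for a, b in cross_file_pairs(pairs):
    print(a, "<->", b)
```

Such a filter removes the opencell.c pair but, as the footnote notes,
would still leave cross-file cases like the mapcalc functions.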

4. There will be some "exact" clones which I don't expect to be found
this way, due to differences which aren't straightforward to ignore. 
Specific examples include:

a) Code has been converted from pre-ANSI (K&R) C to ANSI C.

b) Code has undergone stylistic changes other than simple
reformatting, e.g. changing "if (x == 0)" to "if (!x)".

c) Bugfixes, minor enhancements etc., which are equally applicable to
all copies, have only been made to certain copies.

-- 
Glynn Clements <glynn.clements at virgin.net>


