shapeObj buffer resize algorithm

Tue Jul 26 23:20:28 EDT 2005

On 7/26/05, Kovacs Baldvin <baldvin at cs.elte.hu> wrote:
> Hi!
> 
> According to profiling data a lot of time was spent in msAddLine
> when having pretty much lines. One of the reasons was the extremely
> inefficient way of pushing data onto the vector p->line: for every
> new item the whole content of the vector was copied to a newly
> allocated memory area.
> I attached a patch that makes shapeObj use the convetional heuristic
> to double its buffer size every time it fills up. Naturally I had
> to introduce a new member of shapeObj for holding the actual buffer
> size.
> 
> So this is actually a speed for memory tradeoff. I really do think
> that it worth it: the running time decreased pretty much.
> 
> Please consider the patch for inclusion.

Baldvin, 

I saw some similar behavior when I was profiling mapserver earlier
this year.  I added msAddLineDirectly() to avoid unnecessary 
recopying and reallocation of the points array, and changed the
code to use memcpy() on the points array.  This helped a great
deal in the profiled runs, but very little in the optimized runs. I came
to the conclusion that some of the word by word copying is slow
under the profiler but fast without it.  Have you verified that your
changes give significant improvements in optimized mode? 

Generally speaking I like your approach, and I have employed
similar methods in other circumstances to grow things efficiently.
However, I am finding it hard to imagine it making a significant
difference in mapserver.   You mention that your test case as
"having pretty much lines".  Are these features with a single
line, or with many lines in one feature?  I would have thought 
that multi-line line features would be relatively uncommon in 
general use.  Polygons with holes might be much more common,
but even then I am dubious that having many holes in a polygon
is common. 

Looking at the msAddLine() code, I must admit I wonder why it 
does not just use realloc() to reallocate the array larger instead
of mallocing a new array, and copying stuff over one structure
at a time.  If we used realloc() the runtime library would have the
opportunity to recognise that the underlying allocation block
is actually larger than needed, and that no transfer is actually needed.
I am not sure how widely this sort of optimization is implemented in
libc implementation but it seems like it *ought* to already be embedded.

I am somewhat concerned by your optimization because it now means
anywhere that creates a local lineObj needs to remember to 
properly initialize the bufsize array.  You may have found and updated
everything in the core mapserver code properly, but are you sure you
got all occurances in the mapscript bindings, utility programs and so 
forth?  Because of MapServer's habit of making core structures
like shapeObj public and leave initialization up to higher level code it
can be dangerous to change the structures and assumptions. 

My take is that we should only be taking this risk if we are reasonly
convinced that this is a significant optimization in fairly common
circumstances.   Furthermore, I think we should check if use of realloc()
does or doesn't give similar benefits on Linux and win32. 

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent