Parallelizing calls to msDrawLayer()

Sat Oct 13 20:20:47 EDT 2007

Tamas,

Tamas Szekeres wrote:
> 2007/10/14, David Fuhry <dfuhry at cs.kent.edu>:
> 
>> I might be misunderstanding your point here, but... Rendering a layer
>> into an independent imageObj should be a pretty independent operation,
>> and could be made so if it's not now.
> 
> If the vtable functions implemented by the driver are not reentrant
> then the rendering of the layers connected to the same driver is
> definitely dependent. The drawing itself might be created
> independently if mapserver and gd or agg could avoid using global
> variables during the drawings.

Right, that would be necessary.

> The same applies to the drivers as well, however it's considerably more
> difficult to audit the code from this aspect because it might as well
> depend on the subsequent libraries. Moreover we should consider not
> only the globals (globally accessed static variables) but also all of
> the potential common resources like database connections, file handles,
> etc.

Right, we would have to lock those, or better, make them "expandable". 
For example, two threads rendering PostGIS layers would need to render 
sequentially in turn, or better, each have their own db connection 
(requiring some modification of the currently common db connection code).
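
The two options above (lock a shared connection, or give each thread its own) can be sketched roughly as follows. This is only an illustration under assumptions: fake_conn, worker_shared, worker_private and the two count_queries_* functions are hypothetical names, not MapServer or PostGIS client code, and a "query" is just a counter increment.

```c
#include <pthread.h>

/* Hypothetical stand-in for a database connection; not a MapServer
 * or libpq type. It only counts the queries issued through it. */
typedef struct {
    int queries_run;
} fake_conn;

#define NUM_WORKERS 4
#define QUERIES_PER_WORKER 1000

/* Option 1: one shared connection, serialized by a mutex. */
static fake_conn shared_conn;
static pthread_mutex_t shared_conn_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker_shared(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < QUERIES_PER_WORKER; i++) {
        pthread_mutex_lock(&shared_conn_lock);
        shared_conn.queries_run++;   /* "query" on the shared connection */
        pthread_mutex_unlock(&shared_conn_lock);
    }
    return NULL;
}

/* Option 2: each thread owns its connection, so no lock is needed. */
static void *worker_private(void *arg)
{
    fake_conn *own = (fake_conn *)arg;
    int i;
    for (i = 0; i < QUERIES_PER_WORKER; i++)
        own->queries_run++;
    return NULL;
}

/* Total queries issued through the single locked connection. */
int count_queries_shared(void)
{
    pthread_t t[NUM_WORKERS];
    int i, started = 0;
    shared_conn.queries_run = 0;
    for (i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&t[i], NULL, worker_shared, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return shared_conn.queries_run;
}

/* Total queries issued through the per-thread connections. */
int count_queries_private(void)
{
    pthread_t t[NUM_WORKERS];
    fake_conn conns[NUM_WORKERS];
    int i, started = 0, total = 0;
    for (i = 0; i < NUM_WORKERS; i++) {
        conns[i].queries_run = 0;
        if (pthread_create(&t[i], NULL, worker_private, &conns[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(t[i], NULL);
        total += conns[i].queries_run;
    }
    return total;
}
```

Both variants do the same amount of work, but in the first the threads take turns on the lock, while in the second they never wait on each other.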

>> Glancing at the mapserver
>> thread-safety FAQ, it seems there are more unsafe & locked components
>> related to data-fetching drivers than there are for rendering.  Which
>> makes me wonder why you suggest parallelizing the data-fetching but not
>> the rendering.
>>
> 
> Because I expect a significantly greater performance gain from
> parallelizing the data retrieval than from the drawing (+ the extra
> image overlays) itself.

Absolutely agreed.  I wonder if it's actually /easier/ to do both (by 
wrapping each msDrawLayer() in a thread) than to do just parallel 
retrieval though.
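
A minimal sketch of the thread-per-layer idea, using POSIX threads. The names layer_job, draw_layer_stub and draw_layers_parallel are hypothetical; the stub merely stands in for a call to msDrawLayer() that renders into its own image, and the sketch assumes everything underneath that call is reentrant, which is exactly the open question in this thread.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-layer job. In a real version this would also carry
 * the layer and a private output image for that layer. */
typedef struct {
    int layer_index;   /* which layer this thread draws */
    int status;        /* 0 on success, set by the worker */
} layer_job;

/* Stand-in for msDrawLayer(); it does no rendering at all. */
static void *draw_layer_stub(void *arg)
{
    layer_job *job = (layer_job *)arg;
    job->status = 0;   /* pretend the draw succeeded */
    return NULL;
}

/* Start one thread per layer, wait for all of them, and return how
 * many layers were drawn successfully (-1 on allocation failure). */
int draw_layers_parallel(int n)
{
    pthread_t *threads;
    layer_job *jobs;
    int i, started = 0, ok = 0;

    if (n <= 0)
        return 0;
    threads = malloc((size_t)n * sizeof *threads);
    jobs = malloc((size_t)n * sizeof *jobs);
    if (threads == NULL || jobs == NULL) {
        free(threads);
        free(jobs);
        return -1;
    }

    for (i = 0; i < n; i++) {
        jobs[i].layer_index = i;
        jobs[i].status = -1;
        if (pthread_create(&threads[i], NULL, draw_layer_stub, &jobs[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (jobs[i].status == 0)
            ok++;
    }
    /* The per-layer images would be composited here, in layer order. */

    free(threads);
    free(jobs);
    return ok;
}
```

The join loop runs in layer order, so the final overlay step can stay sequential even though the draws themselves overlap.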

> 
>> Forgive me if I'm playing a bit of devil's advocate here.  I'm aware
>> that non-reentrant functions don't rewrite themselves, and that critical
>> sections don't surround themselves with mutexes.
> 
> Using a mutex in that function would serialize the operation and
> definitely kill the parallel behaviour. However, currently the driver
> operations are separated into fairly atomic operations, so it
> wouldn't cause too many problems.

Oh, I'm just saying that some additional sections of code here and there 
will need to be locked to make them thread-safe.  Nothing too 
performance critical, I wouldn't think.

>> Surely though, it
>> ought not to be a tremendous amount of work to keep separate
>> layer-drawing operations from stepping on each other's toes?
> 
> I'm pretty sure that currently parallelizing the data retrieval is more
> trivial than reconstructing the drawing logic inside mapserver.
> For example the LayerWhichShapes data provider functions would trigger
> an asynchronous fetch operation to the data source and later the
> NextShape would serve the retrieved data from the memory when drawing
> the map.

Ah ok, glancing at maplayer.c and mapdraw.c, I'm starting to see what 
you mean.  So msDrawVectorLayer currently loops like:

/* simplified; the real calls take more arguments */
while ((s = layer->vtable->NextShape()) != NULL)
{
    msDrawShape(s);
}

and your thought is to... buffer the shapes (some of them, or all of 
them) with asynchronous NextShape calls, then render the buffer?  I 
think I fail to grasp the full picture, because what will be going on 
while NextShape() asynchronously fetches the next shape(s)?  The answer 
can't be "nothing", or we fail to exploit parallelism.

Or are you suggesting we fetch the /first/ shape of every layer in 
parallel, so as to get the rest of the shapes queued up behind the first 
one (depending on the driver, sort of)?

Steve W. had valid concerns that overzealous buffering would use 
excessive memory.  I see now that msDrawVectorLayer() uses a pipelined 
approach which keeps minimal geometry (a single shape) around at once, 
leaving buffering decisions to the driver.  I like it.
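
One way to picture the buffered variant while keeping memory bounded is a fixed-size queue between a fetching thread and the drawing loop. The sketch below is an assumption-laden illustration, not the driver design proposed in this thread: shape_queue, fetcher, next_buffered_shape, fetch_and_draw and the QUEUE_CAP value are all hypothetical, and a "shape" is just an int.

```c
#include <pthread.h>

#define QUEUE_CAP 8   /* assumed cap on buffered shapes */

/* Hypothetical bounded queue of "shapes" (plain ints here). */
typedef struct {
    int items[QUEUE_CAP];
    int head, count;
    int done;    /* set once the fetcher has produced everything */
    int total;   /* how many shapes the fetcher will produce */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} shape_queue;

/* Fetching thread: blocks whenever the buffer is full. */
static void *fetcher(void *arg)
{
    shape_queue *q = (shape_queue *)arg;
    int i;
    for (i = 0; i < q->total; i++) {
        pthread_mutex_lock(&q->lock);
        while (q->count == QUEUE_CAP)
            pthread_cond_wait(&q->not_full, &q->lock);
        q->items[(q->head + q->count) % QUEUE_CAP] = i;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }
    pthread_mutex_lock(&q->lock);
    q->done = 1;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Returns 1 and stores a shape in *out, or 0 when none are left. */
static int next_buffered_shape(shape_queue *q, int *out)
{
    int got = 0;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->done)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        got = 1;
        pthread_cond_signal(&q->not_full);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

/* Drawing loop: consumes shapes while the fetcher keeps working.
 * Returns the number of shapes "drawn", or -1 if no thread started. */
int fetch_and_draw(int nshapes)
{
    shape_queue q;
    pthread_t t;
    int s, drawn = 0;

    q.head = q.count = q.done = 0;
    q.total = nshapes;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    if (pthread_create(&t, NULL, fetcher, &q) != 0) {
        drawn = -1;
    } else {
        while (next_buffered_shape(&q, &s))
            drawn++;   /* a real loop would draw the shape here */
        pthread_join(t, NULL);
    }

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return drawn;
}
```

With a cap like this, at most QUEUE_CAP shapes sit in memory per layer, and the drawing loop keeps the same one-shape-at-a-time form as the existing pipeline.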

Thanks,

Dave

> Best regards,
> 
> Tamas