Parallelizing calls to msDrawLayer()

Sun Oct 14 13:43:14 EDT 2007

Ed,

Ed McNierney wrote:
> Dave -
> 
> I think the answer to your "what will be the net effect" is, "it
> depends".  That's the problem - there's no unalloyed good here; some
> scenarios would benefit, and some would be degraded.
> 
> If the "real-world I/O scheduler" simply re-serializes the requests
> we've gone to some effort to parallelize, there won't be much gain.  And
> I'm not nearly as confident as you that "real-world I/O schedulers"
> really work that way.  My empirical (not analytical) experience is
> otherwise.  You seem to be suggesting that the scheduler *knows* that
> thread 2 is going to call NextShape again, so it will just wait for that
> request and serve it rather than serving thread 3 - it might be good,
> but it's not clairvoyant.

Well, the system will at least read `hdparm /dev/sda | grep readahead` 
KB from the disk (on my system, 256).  That will keep layer 2's 
NextShape() spinning for a bit, but eventually a NextShape() call will 
need to go to disk again.  After a few NextShape() calls which read from 
disk, it will trigger adaptive readahead, which will read a significant 
chunk of the file into the page buffer.  Then it won't need the disk 
again for some time.
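
Editorial sketch, not part of the original message: the readahead described above happens automatically, but a process can also ask the kernel to start prefetching a file before the first read. The snippet below assumes a Linux-like system where Python exposes `os.posix_fadvise`; the helper name `hint_sequential_read` is hypothetical and nothing here is MapServer code.

```python
import os
import tempfile

def hint_sequential_read(path):
    """Advise the kernel that `path` will be read sequentially, soon.

    Returns True if the hints were issued, False if the platform or
    filesystem does not support them. The hint is advisory only.
    """
    if not hasattr(os, "posix_fadvise"):
        return False  # e.g. macOS or Windows
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, length=0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        return True
    except OSError:
        return False  # filesystem rejected the hint
    finally:
        os.close(fd)

# Usage on a throwaway file standing in for layer2.shp
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 4096)
    tmp.flush()
    hinted = hint_sequential_read(tmp.name)
```

Whether such a hint helps depends on the kernel, the filesystem, and how much of the file is already in the page cache.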

This whole time, the device may have been doing a bit of readahead of 
its own.  It's what the drives do with those 8MB-16MB caches.

Even assuming that msDrawShape() is blindingly fast, it seems that there 
would be some head idle time in here, during which the disk could get 
layer 3 going.

If the files are located on separate disks, of course, it's game over in 
favor of the threaded model.
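
Editorial sketch, not part of the original message: the thread-per-layer idea can be illustrated with simulated I/O waits. Everything here is an assumption made for illustration: `draw_layer` is a stand-in for msDrawLayer(), the layer names and wait times are invented, and `time.sleep` models a blocking wait on an otherwise idle device.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-layer I/O waits in seconds (NFS, local disk, local disk).
LAYER_WAITS = {"raster_nfs": 0.08, "roads_shp": 0.05, "rivers_shp": 0.05}

def draw_layer(name):
    """Stand-in for msDrawLayer(): block on 'I/O', then return a rendered layer."""
    time.sleep(LAYER_WAITS[name])  # sleeping releases the GIL, like blocking I/O
    return f"image({name})"

def draw_map_serial(layers):
    # Current model: each layer waits for the previous one to finish.
    return [draw_layer(name) for name in layers]

def draw_map_threaded(layers):
    # One worker per layer. map() yields results in input order, so the
    # composition order is preserved even if layers finish out of order.
    with ThreadPoolExecutor(max_workers=len(layers)) as pool:
        return list(pool.map(draw_layer, layers))

def timed(fn, layers):
    start = time.perf_counter()
    result = fn(layers)
    return result, time.perf_counter() - start

layers = list(LAYER_WAITS)
serial_images, serial_s = timed(draw_map_serial, layers)
threaded_images, threaded_s = timed(draw_map_threaded, layers)
print(f"serial {serial_s * 1000:.0f} ms, threaded {threaded_s * 1000:.0f} ms")
```

The simulation only shows the best case, where the waits are independent. It says nothing about seek contention when the layers share one disk, which is the open question in this thread.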

> I still think it is important to understand just who the target user is
> for this exercise.  Is this intended to be the user not interested in
> careful data organization and hardware tuning, or the user who doesn't
> want to know all that and just wants things to work faster out of the
> box?  Many users don't have any NFS servers in their systems at all and
> only read local disk data.  MapServer is used across a very wide
> spectrum of usage scenarios, and I don't think that we (as a community)
> have a very good understanding of what the "typical" MapServer system
> looks like.

The target user would be anyone who has an idle core on their system, or 
who has an idle disk or idle db server or network connection, which will 
need to be accessed by a mapserv process which is currently running.

Replace "NFS request" with "PostGIS / Oracle Spatial / SDE request" or 
"WMS/WFS request" or "RAID request" or "secondary disk request" or 
"request to a disk with a buffer".  These I/O operations take time, 
during which nothing is done (unless every component of your system is 
always 100% saturated with other requests).
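
Editorial sketch, not part of the original message: a back-of-the-envelope estimate of what overlapping could save. The 36 + 8.5 + 36 ms NFS wait comes from the earlier message quoted below; the three 20 ms vector layers are a hypothetical figure. The model assumes that work on different resources overlaps perfectly and that work on the same resource stays serial.

```python
# Time spent per resource, in milliseconds.
work_ms = {
    "nfs": [36 + 8.5 + 36],            # network out + remote seek + network back
    "local_disk": [20.0, 20.0, 20.0],  # hypothetical vector layers on one disk
}

def serial_ms(work):
    """Everything waits in line: total is the sum of all stages."""
    return sum(sum(stages) for stages in work.values())

def overlapped_lower_bound_ms(work):
    """Resources run concurrently: total is the busiest resource's time."""
    return max(sum(stages) for stages in work.values())

serial_total = serial_ms(work_ms)
overlapped_total = overlapped_lower_bound_ms(work_ms)
print(serial_total, overlapped_total)
```

Under these assumptions the serial loop takes 140.5 ms and the overlapped lower bound is 80.5 ms. A real system would land somewhere in between.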

We don't know all users' usage scenarios, but we know what machines are 
doing.  Pretty soon users won't be able to buy single-core CPUs.  Most 
production environments run RAID1 or RAID5 arrays which can often 
efficiently respond to multiple requests at once.  DBs are moved from 
app servers to other machines if scalability warrants it.  Systems are 
becoming /more/ parallel, not less.  MapServer may be able to take 
advantage of that, while also helping (or at least not hurting) the 
lone-CPU, single-disk guy.

Thanks,

Dave

> 	- Ed
> 
> 
> -----Original Message-----
> From: UMN MapServer Developers List [mailto:MAPSERVER-DEV at LISTS.UMN.EDU]
> On Behalf Of David Fuhry
> Sent: Sunday, October 14, 2007 11:33 AM
> To: MAPSERVER-DEV at LISTS.UMN.EDU
> Subject: Re: [UMN_MAPSERVER-DEV] Parallelizing calls to msDrawLayer()
> 
> Ed,
> 
>     I can answer your second question succinctly.  My whole line of 
> thinking stems from one question: "If, at the top of msDrawMap(), I 
> spawn a thread to render each layer, what will be the net effect?"
> 
>     The effect for my earlier example will be this.  Thread 1 will make 
> the NFS request and wait on I/O.  Simultaneously, threads 2-n will make 
> use of the idle CPU and idle local disk to render their vector layers. 
> Then, when Thread 1 has finished, much or all of the other layers' 
> work has been done.
> 
>     As to the excessive disk-seek concern.  Yes, a stupid I/O scheduler 
> will do exactly that.  Thread 2 will call NextShape(), and it will seek 
> to layer2.shp.  At the next instant, thread 3 will call NextShape(), and
> the head will seek to layer3.shp.  Then thread 2 will call NextShape() 
> again and the head will seek back to layer2.shp.  It's hyperactive. 
> Will a real-world I/O scheduler really do this?  No.
> 
>     Consider an analogy with the process scheduler.  You have two 
> simultaneous map requests come in.  Two mapserv processes are launched. 
> They both begin rendering shapes on the canvas.  Will the OS scheduler
> call process 1's NextShape(); msDrawShape(); context-switch to process 
> 2, call a single NextShape(); msDrawShape();, context-switch back to 
> process 1, call its next NextShape(); msDrawShape();, context-switch, 
> etc.?  No way!  Context switches are expensive.  It will timeslice the 
> tasks, and let process 1 do a significant amount of work, then switch to
> process 2 for a significant amount of time, then back, etc.  And if one 
> process's NextShape() should cause a wait on I/O, it will surely yield 
> the CPU to the other so that it can get work done.
> 
>     The I/O scheduler may be no Einstein, but surely it knows that seeks
> are very expensive (much more expensive than context-switches).  It can 
> timeslice I/O for a process accessing a device.  It can do readahead. 
> If it has several requests queued up for a device, it can order them to 
> minimize seek time.  It knows that requests to different devices can be 
> sent in parallel.  A huge part of its job is to avoid excessive disk 
> seeks for multiple requests to the same device.
> 
>     I think that if it knows the requests ahead of time, it can retrieve
> the data more quickly than a serial loop.  Which makes an NFS request. 
> And waits.  Then processes it.  Then makes a local disk request.  And 
> waits.  etc.
> 
> Thanks,
> 
> Dave
> 
> Ed McNierney wrote:
>> Dave -
>>
>> If those rasters and shapefiles are on the same disk volume,
>> "parallelizing" them is very likely to actually make things worse!
>> MapServer will simply get in its own way, causing excessive disk seeks.
>> And that may well be the most common situation for many users.
>>
>> There are many good ideas here, but not all good ideas work for all
>> scenarios.  Your initial comments seemed focused on rasterization time,
>> not disk I/O optimization, which could be implemented in a rather
>> different manner (one could fetch data in parallel and then render
>> serially, as is done with WMS input).  It's good to bring these ideas
>> out on the table, but this discussion is ranging over a lot of different
>> ideas suitable for a lot of different situations.  It would be helpful
>> to distinguish (a) whether you're trying to provide a built-in
>> optimization for common use cases or a tool that can be used by
>> sophisticated users, and (b) whether you're focusing on drawing speed or
>> data fetch times.
>>
>> 	- Ed
>>
>> Ed McNierney
>> Chief Mapmaker
>> Demand Media / TopoZone.com
>> 73 Princeton Street, Suite 305
>> North Chelmsford, MA  01863
>> Phone: 978-251-4242, Fax: 978-251-1396
>> ed at topozone.com
>>
>>
>>
>>
>> -----Original Message-----
>> From: David Fuhry [mailto:dfuhry at cs.kent.edu] 
>> Sent: Saturday, October 13, 2007 11:19 PM
>> To: Ed McNierney
>> Cc: MAPSERVER-DEV at LISTS.UMN.EDU; woodbri at SWOODBRIDGE.COM
>> Subject: Re: [UMN_MAPSERVER-DEV] Parallelizing calls to msDrawLayer()
>>
>> Ed,
>>
>>     Indeed.  I'm very appreciative of all your guys' comments.  I should
>> clarify one point.
>>
>>     I agree that it would not be worthwhile to just parallelize 
>> rendering.  What I think might be worthwhile is to parallelize both 
>> rendering *and* the I/O that necessarily precedes it.  I view the latter
>> as (through no fault of mapserver's) the bottleneck, and the former as
>> perhaps icing on the cake.  Since the late bird doesn't get the worm ;),
>> Ed, may I use you for an example?
>>
>>     You serve lots of raster.  Let's say your mapfile is composed of a
>> base raster layer and a few vector layers on top.  A request is made. 
>> mapserv goes to render the first layer.  The tileindex is probably in 
>> the page buffer (memory), so it looks up the tile(s) quick and goes to
>> fetch the raster image(s).
>>
>>     Maybe the images have to come across NFS.  The request goes over 
>> GigEth; ping / 2 says this takes 36ms.  The fileserver seeks its 7200RPM
>> disk to the start of the TIFF; Seagate says this takes 8.5ms.  Let's say
>> the sequential read & transfer back take zero time, except the 36ms 
>> lower-bound on network time.
>>
>>     What has mapserv done in this 36 + 8.5 + 36 = 80.5ms?  Nothing. 
>> Just waited on I/O.  It could have perhaps rendered several vector 
>> layers from ramdisk in this time.
>>
>>     Now Frank's GDAL goes to work mosaicing/clipping/warping/resizing 
>> the raster image(s).  It's chewing CPU.  Let's say that layers 2 thru n
>> are shapefiles on the local disk.  What is the local disk doing during
>> this time?  Nothing.  It could be checking for the existence of 
>> layer2.qix, or seeking to the start of layer2.shp, or likewise for 
>> layers 3 thru n.  None of which will interfere with GDAL's work. 
>> Instead, we wait until GDAL is through, then waste 8.5ms seeking to the
>> start of layer2.shp.  We could have sought layer2.shp even earlier, 
>> while waiting on the NFS request.
>>
>>     The thing which seems beautiful to me is that OS schedulers (both
>> process & I/O) are designed to be good at receiving a bunch of requests
>> and resolving them efficiently.  I think there may be value in launching
>> a thread for each layer, thus throwing all the requests up against the
>> OS at once, and letting its schedulers try to make the best use of CPU
>> and I/O resources.  I have to imagine that they are likely to do better
>> than having everything wait in line.
>>
>>     Yes, this would put more of a strain on the server at that instant.
>> By pipelining I/O though, it will also return a map quicker.
>>
>> Thanks,
>>
>> Dave
>>
>>
>> Ed McNierney wrote:
>>> David -
>>>
>>> I didn't have the time for a thoughtful reply earlier, and now most
>>> other folks have already raised some of the concerns I had.  I should
>>> hesitate more often, I guess - it saves typing <g>.
>>>
>>> I think Steve Woodbridge's comment is informative.  For his
>>> application he found a 4x-5x improvement by caching data files in a RAM
>>> disk.  That basically says that something like 80% of his entire
>>> MapServer rendering time is spent in disk I/O, not drawing.  Many users
>>> don't have a situation in which they can put their data in RAM disk.
>>> For a comparable kind of application you would then reasonably predict
>>> that optimizing multi-layer rendering so it was instantaneous would only
>>> produce a 20% performance improvement.
>>>
>>> Although I think MapServer's disk I/O is pretty good, if I were to
>>> spend time hunting for performance improvements I would be inclined to
>>> look at the various data I/O schemes.  Anything that can be done to
>>> reduce disk I/O is a big win (some of those improvements are, of course,
>>> external to MapServer itself in the form of data organization and
>>> indexing schemes).
>>>
>>> 	- Ed
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: UMN MapServer Developers List [mailto:MAPSERVER-DEV at LISTS.UMN.EDU] On Behalf Of David Fuhry
>>> Sent: Saturday, October 13, 2007 8:39 PM
>>> To: MAPSERVER-DEV at LISTS.UMN.EDU
>>> Subject: Re: [UMN_MAPSERVER-DEV] Parallelizing calls to msDrawLayer()
>>>
>>> Paul,
>>>
>>>     Thanks, that's a good suggestion.
>>>
>>>     I guess my thought is, given a really good implementation, a 
>>> heavily-contended server with a bright scheduler would just end up 
>>> scheduling the threads sequentially on the same CPU (perhaps likely, 
>>> since a small bit of the necessary data is in that processor's L1 cache
>>> already).  Then the onus is on the implementer to make sure that the 
>>> extra overhead is pretty low.
>>>
>>>     It sort of pushes some of the responsibility to the OS scheduler.
>>> Which I think most of the time, will make better decisions than will a
>>> deterministically-ordered mapserv loop.
>>>
>>> Thanks,
>>>
>>> Dave
>>>
>>> Paul Spencer wrote:
>>>> David,
>>>>
>>>> While you can perhaps gain some performance in a single map draw, in
>>>> most real life uses of mapserver, folks are either serving many 
>>>> simultaneous requests or generating tiles in some way.  I think in 
>>>> either case, the addition of multi-threaded layer draws will actually
>>>> cause contention for processor time with the multiple processes that are
>>>> serving the requests and could hurt overall performance in high load
>>>> systems.
>>>>
>>>> I think that you could probably get more bang for your development bucks
>>>> by investing time in profiling the existing code.
>>>>
>>>> Cheers
>>>>
>>>> Paul
>>>>
>>>> On 13-Oct-07, at 6:37 PM, David Fuhry wrote:
>>>>
>>>>> Tamas,
>>>>>
>>>>>    (responses inline)
>>>>>
>>>>> Tamas Szekeres wrote:
>>>>>> David,
>>>>>> I consider it would be reasonable to establish such mechanism only
>>>>>> when fetching the data of the layers. Likewise currently the WMS/WFS
>>>>>> layers are pre-downloaded in parallel before starting to draw the map.
>>>>>> We should have a similar approach when fetching the other layers as
>>>>>> well.
>>>>>    Yes, I noticed that WMS/WFS layers are downloaded in parallel 
>>>>> before rendering begins.  And I agree, it would be advantageous to 
>>>>> extend the parallel-data-fetching paradigm to all layers.
>>>>>
>>>>>    For non-WMS/WFS layers though, wouldn't it be a significant 
>>>>> disruption to the codebase to add lines 1 and 2 into msDrawMap()?
>>>>>
>>>>> 1. for i=1 to layers.length (in parallel)
>>>>> 2.   data[i] = fetch_data_for_layer(i)
>>>>> 3. for i=1 to layers.length (serially)
>>>>> 4.   msDrawLayer(data[i])
>>>>>
>>>>>   ISTM that the data-fetching logic might be best left abstracted 
>>>>> beneath msDrawLayer().
>>>>>
>>>>>> However pre drawing all of the layers and later copying the layers
>>>>>> over the map image seems to be much less efficient.
>>>>> Drawing n layers onto n imageObjs is no more expensive than drawing n
>>>>> layers onto one imageObj, and the former can be parallelized across n
>>>>> threads.
>>>>> Although yes, I agree that composition (the "merge" step) will cost
>>>>> something.
>>>>> I'm entertaining the idea that the time saved by parallel fetching &
>>>>> drawing might outweigh the cost of composition.
>>>>>
>>>>>> When using the parallel fetching approach we should deal only with the
>>>>>> drivers from the aspect of the thread safety issues.
>>>>> I might be misunderstanding your point here, but... Rendering a layer
>>>>> into an independent imageObj should be a pretty independent operation,
>>>>> and could be made so if it's not now.  Glancing at the mapserver 
>>>>> thread-safety FAQ, it seems there are more unsafe & locked components
>>>>> related to data-fetching drivers than there are for rendering.  Which
>>>>> makes me wonder why you suggest parallelizing the data-fetching but
>>>>> not the rendering.
>>>>>
>>>>> Forgive me if I'm playing a bit of devil's advocate here.  I'm aware
>>>>> that non-reentrant functions don't rewrite themselves, and that 
>>>>> critical sections don't surround themselves with mutexes.  Surely 
>>>>> though, it ought not to be a tremendous amount of work to keep
>>>>> separate layer-drawing operations from stepping on each other's toes?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dave Fuhry
>>>>>
>>>>>> Best regards,
>>>>>> Tamas
>>>>>> 2007/10/12, David Fuhry <dfuhry at cs.kent.edu>:
>>>>>>> Has anyone looked into parallelizing the calls to msDraw[Query]Layer()
>>>>>>> in msDrawMap()?
>>>>>>>
>>>>>>> Although I'm new to the codebase, it seems that near the top of
>>>>>>> msDrawMap(), we could launch a thread for each (non-WMS/WFS) layer,
>>>>>>> rendering the layer's output onto its own imageObj.  Then where we now
>>>>>>> call msDraw[Query]Layer, wait for thread i to complete, and compose
>>>>>>> that layer's imageObj onto the map's imageObj.
>>>>>>>
>>>>>>> In msDraw[Query]Layer(), critical sections of the mapObj (adding labels
>>>>>>> to the label cache, for instance) would need to be protected by a
>>>>>>> mutex.
>>>>>>>
>>>>>>> A threaded approach would let some layers get drawn while others are
>>>>>>> waiting on I/O or for query results, instead of the current serial
>>>>>>> approach where each layer is drawn in turn.  Multiprocessor machines
>>>>>>> could schedule the threads across all of their cores for simultaneous
>>>>>>> layer rendering.
>>>>>>>
>>>>>>> It seems this could significantly speed up common-case rendering,
>>>>>>> especially on big machines, for very little overhead.  Has there been
>>>>>>> previous work in this area, or are any major drawbacks evident?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dave Fuhry
>>>>>>>
>>>> +-----------------------------------------------------------------+
>>>> |Paul Spencer                          pspencer at dmsolutions.ca    |
>>>> +-----------------------------------------------------------------+
>>>> |Chief Technology Officer                                         |
>>>> |DM Solutions Group Inc                http://www.dmsolutions.ca/ |
>>>> +-----------------------------------------------------------------+