[mapserver-dev] Speed in accessing World .wld files varies across disk systems

David Fuhry dfuhry at acm.org
Sat Oct 18 12:58:19 EDT 2008


As a general tip, if you use the ext3 filesystem (most Linux systems 
do), using the "-O dir_index" option when creating a filesystem will, 
according to `man mkfs.ext3`, "Use hashed b-trees to speed up lookups in 
large directories".

If you've got tens of thousands of files in a directory, looking up a 
specific filename will be much quicker.  If you've only got hundreds of 
files, you probably wouldn't see much of a difference.
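
To get a rough feel for whether directory size matters on a particular 
disk, something like the following Python sketch could be used. It is 
only an illustration: the function name, the file naming scheme and the 
parameters are made up for this example, and the numbers it produces 
depend heavily on the filesystem and on caching. Files that were just 
created are usually still in the kernel's directory-entry cache, so a 
meaningful comparison probably needs cold caches (for example, after a 
remount), and base_dir should point at the filesystem under test rather 
than at a RAM-backed temp directory.

```python
import os
import random
import tempfile
import time


def time_lookups(num_files, num_lookups=200, base_dir=None):
    """Create num_files empty files in a fresh temporary directory and
    return the average time in seconds for one os.stat() by exact name.

    base_dir (hypothetical parameter) chooses where the temporary
    directory is created; None means the system default temp location.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as tmpdir:
        # Placeholder names; any naming scheme would do.
        names = ["tile_%06d.wld" % i for i in range(num_files)]
        for name in names:
            open(os.path.join(tmpdir, name), "w").close()

        # Look up randomly chosen files by their full path, the way a
        # program that already knows the filename would.
        targets = [os.path.join(tmpdir, random.choice(names))
                   for _ in range(num_lookups)]
        start = time.perf_counter()
        for path in targets:
            os.stat(path)
        elapsed = time.perf_counter() - start
        return elapsed / num_lookups


# Example usage (sizes are arbitrary):
# for n in (100, 1000, 20000):
#     print(n, "files:", time_lookups(n), "seconds per lookup")
```

Comparing the result for a few hundred files against tens of thousands, 
on filesystems with and without dir_index, should show whether the 
lookup cost is actually growing with directory size.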

ext4 filesystems will default dir_index to on.

-Dave

Chris Galli (XC Skies) wrote:
> That's exactly why I came up with such a bold theory :) My python code 
> is very simple and targets the exact same files. The performance for 
> getting any arbitrary 256x256 tile from any of several layers with my 
> code is about .08 seconds. The exact same mapserv request was roughly .3 
> seconds in this new (slower) environment.
> 
> When multiple requests are made via http for tiles, then mapserv starts 
> slowing down considerably, where in some instances it takes over 3 
> seconds per tile request. This is likely due to competition for disk IO 
> resources, but my custom python code running as a listening http server 
> consistently yields .07 to .09 seconds per tile for identical 
> simultaneous requests. This is why I'm left scratching my head still.
> 
> My temporary "solution" was to create more directories on the file 
> system by day, and store a limited number of files in each directory, 
> e.g. 20080901/*files* and 20080902/*files*. I only store a rotating 
> archive of 2 weeks, but this was the best I could do for now.
> 
> Thanks for your thoughts. I'll keep thinking about how else to debug this.
> 
> -Chris
> 
> Paul Ramsey wrote:
>> Note that "lots of files in a directory" is a common performance
>> anti-pattern. The fopen() has to linearly scan the directory contents
>> to find the file requested, so too many files means too much seeking.
>> However, you should see about the same performance fall off for
>> Mapserver as for any other program doing file opens, and you haven't
>> described that kind of behavior.
>>
>> P.
>>
>> On Fri, Oct 17, 2008 at 3:55 PM, Chris Galli (XC Skies)
>> <cgalli at xcskies.com> wrote:
>>  
>>> Thanks for the post Paul. I can clearly see what you've pointed out so
>>> I can chuck that theory out the window. I'll dig deeper into the
>>> intricacies of my different disks again and see what I can shake out.
>>>
>>> -Chris
>>>
>>> Paul Ramsey wrote:
>>>    
>>>> Here's the function in question:
>>>>
>>>> http://trac.osgeo.org/mapserver/browser/trunk/mapserver/mapraster.c#L342 
>>>>
>>>>
>>>> As you can see, it doesn't do a directory search, though it does work
>>>> its way through a number of possible extension options. Note that
>>>> "wld" is the *first* option though, so that's not your problem.
>>>>
>>>> P.
>>>>
>>>> On Fri, Oct 17, 2008 at 3:29 PM, Chris Galli <cgalli at xcskies.com> 
>>>> wrote:
>>>>
>>>>      
>>>>> Hi Everyone,
>>>>>
>>>>> I know the above statement seems like it deserves an obvious answer,
>>>>> so first let me say that I understand the complexities of disk
>>>>> implementations enough to realize that speed depends on a tremendous
>>>>> number of factors and so cannot be easily discussed in terms of
>>>>> absolutes when comparing different disk systems. With that said,
>>>>> however, I'm seeing behaviour that leads me to believe the discovery
>>>>> process for .wld files can be improved in mapserv. I've tested with
>>>>> V 4.10 and 5.2 and they produce identical results.
>>>>>
>>>>> Here's the crux:
>>>>> When rendering raster images (say png files) which use .wld world
>>>>> files via the cgi interface, I get wildly different response times on
>>>>> different linux systems. After a lengthy discovery process of why this
>>>>> was, I have come to the conclusion that mapserv is probably not
>>>>> targeting wld files directly on the file system, and instead looking
>>>>> for matching wld files for raster images by using some type of 'wild
>>>>> card' or other inefficient scan of the file's current directory.
>>>>>
>>>>> For example, if I place a single raster png file called world.png with
>>>>> a world.wld in an empty directory and turn on mapserver debug, response
>>>>> times seem reasonable. As I increase the number of files within the
>>>>> directory, the mapserv raster rendering becomes increasingly slow
>>>>> (asking for a single 256x256 tile from a 1MB png file). When I perform
>>>>> the same test on another system, I barely see a slowdown in
>>>>> performance. Why? Because one disk system is much more robust with
>>>>> directory caching and disk-to-memory hardware. Fair enough. But when I
>>>>> run the same tests on tiff files, both systems produce identical
>>>>> results to within a few milliseconds. This implies that wld files are
>>>>> likely not being targeted efficiently.
>>>>>
>>>>> In addition to the above, I have some custom python code that accesses
>>>>> the exact same png raster files and serves them up to the exact extents
>>>>> and tile size as does mapserv using the GD libs. And that code was
>>>>> actually returning tiles faster on the system on which mapserv was
>>>>> running so poorly. My code expects a file to exist and so does not need
>>>>> to 'discover' it, making the process much more efficient.
>>>>>
>>>>> Does anyone know or suspect that the above is true? If so, how does one
>>>>> go about providing more details and elevating this to a potential
>>>>> change/enhancement?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> -Chris
>>>>>
>>>>> -- 
>>>>> View this message in context:
>>>>> http://www.nabble.com/Speed-in-accessing-World-.wld-files-varies-across-disk-systems-tp20042027p20042027.html 
>>>>>
>>>>> Sent from the Mapserver - Dev mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> mapserver-dev mailing list
>>>>> mapserver-dev at lists.osgeo.org
>>>>> http://lists.osgeo.org/mailman/listinfo/mapserver-dev

