[Benchmarking] Some thoughts about a disk bound test

Andrea Aime aaime at opengeo.org
Tue Sep 14 09:01:25 EDT 2010


Adrian Custer wrote:
> Hey Andrea, All,
> 
> On Mon, 2010-09-13 at 09:15 +0200, Andrea Aime wrote:
>> Hi,
>> I was thinking about the possibility of making a disk bound
>> test next year.
> 
> During some wonderful hours in the squares and cafés of Barcelona this
> weekend, I got to digest the exercise some more and plan for the future.
> After many hours of thought, my analysis has certainly evolved.
> 
> 
>> The current server setup is quite un-balanced on that point of
>> view: we have 8 cpu's (which allows for a good cpu bound testing)
>> but the disk subsystem is low performance, single disk with rather
>> low throughput and high seek times (mind, I'm not complaining,
>> I'm very grateful we have such hardware to start with, just making
>> an honest assessment of what I think would be necessary to perform
>> a disk bound test).
>>
>> If next year we're going to make a I/O bound testing we should make
>> sure we have a better disk subsystem, which also implies we should
>> look for funding to get one.
> 
> Why? 
> 
> Once again, it all depends what the test is trying to *do*. If one wants
> 'realistic' numbers, we might have to buff up this weak link. However,
> if we are trying either to discriminate between servers or to find
> weaknesses in our own server, then the slower the disk the better --- it
> highlights any issues there might be.

A slow resource magnifies certain kinds of problems, but tends to hide
others.
Think of the Java2D scalability issue: it has been there for over ten
years, so it was present in all the previous benchmarks as well.
In particular, last year we had 8 CPUs too, but they were slower, and
the test exercised PNG encoding quite a bit and feature drawing only to
a lesser extent. A ten-year-old problem was finally made evident only
this year.

While I don't have proof, I suspect something similar might pop up.
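
Just to make the masking effect concrete, below is a toy Java sketch
(every stage cost in it is invented, not measured on the benchmark
machines): each "request" goes through a serialized draw stage,
standing in for the Java2D lock, followed by an encode stage that runs
in parallel. While encoding is the expensive part the lock is nearly
invisible; make encoding cheap and the very same lock flattens the
throughput curve.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class HiddenBottleneck {

    private static final Object DRAW_LOCK = new Object();

    // one "request": a serialized draw stage plus a parallel encode stage
    static void request(long drawMs, long encodeMs) throws InterruptedException {
        synchronized (DRAW_LOCK) {   // stand-in for the Java2D lock
            Thread.sleep(drawMs);
        }
        Thread.sleep(encodeMs);      // stand-in for PNG encoding
    }

    // run the given number of threads for 5 seconds, return requests/second
    static double reqPerSec(int threads, long drawMs, long encodeMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong done = new AtomicLong();
        long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    while (System.nanoTime() < end) {
                        request(drawMs, encodeMs);
                        done.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    // stop on interrupt
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return done.get() / 5.0;
    }

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 2, 4, 8}) {
            // encode-heavy (5ms draw + 45ms encode) scales almost linearly;
            // encode-light (5ms + 5ms) hits the draw lock at 2 threads
            System.out.printf("%d threads: encode-heavy %.0f req/s, encode-light %.0f req/s%n",
                    threads, reqPerSec(threads, 5, 45), reqPerSec(threads, 5, 5));
        }
    }
}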

The other reason to upgrade is that the servers, as they are, are quite
unbalanced, whilst people normally tend to balance resources (and
their prices) when acquiring a server. If one knows the disk subsystem
is going to be important for performance, money gets moved from
the CPU/memory compartment to the disk one (e.g., when I got my last
desktop machine I spent twice as much on the disks as on the CPU
because experience showed disk was one of my major bottlenecks).

I also think that one of the recurring complaints this year was
about having a system and a test that were not "realistic" or
not "representative". Having balanced machines should help address
this issue and make people looking at the results more confident,
as the machine used would actually look like a real-world server.

One final note emerges from looking at the disk bound results:
what you get is basically a flat line, mostly because the disk
is already acting as the bottleneck at 2 or 4 threads.
If we go up to 64 threads it would be nice to have a system that
allows performance to rise as we add threads, at least a bit beyond
just 2-4 of them.
Again, this is more a matter of perception when presenting
the results than of scientific correctness. Not to say the latter
is not important, just observing that this is a public presentation
at a popular conference as opposed to a scientific paper, so we
should also care about people's perception and produce a
result that is not only correct but also, to some extent,
entertaining.
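
A back-of-the-envelope model shows why the curve flattens so early
(the service times below are made up for illustration, nothing was
measured on the benchmark machine): with a single disk the throughput
ceiling is 1 / (disk time per request) regardless of how many threads
we add, and when the disk work dominates the CPU work that ceiling is
already reached at the second thread.

public class FlatCurve {
    public static void main(String[] args) {
        double cpuSeconds = 0.010;   // parallel work per request (assumed)
        double diskSeconds = 0.025;  // single-disk work per request (assumed)
        for (int threads : new int[] {1, 2, 4, 8, 16, 32, 64}) {
            // N threads can overlap the CPU part, but the disk serializes
            double ideal = threads / (cpuSeconds + diskSeconds); // no bottleneck
            double capped = Math.min(ideal, 1.0 / diskSeconds);  // single-disk ceiling
            System.out.printf("%2d threads: ideal %6.1f req/s, disk-capped %6.1f req/s%n",
                    threads, ideal, capped);
        }
    }
}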


> I believe we have a *lot* of work to do defining our own aims and goals
> before we launch into hardware purchases or sponsorship requests. Let us
> work on defining our goals, develop the tests we want to perform, and
> prepare the groundwork for any future tests before worrying about the
> hardware.
> I hope to issue a longish document with my analysis and proposals
> sometime in the near future

Yep, I agree. Unfortunately I cannot put days or weeks into thinking
about a detailed plan; I'm just using the scraps of time I have
available to make simple suggestions and propositions, and
I'm looking forward to your proposed test design.

Cheers
Andrea

-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

