Server Requirements for the OSGEO Reference Stack
Arnulf Christl (CCGIS)
arnulf.christl at ccgis.de
Thu Mar 9 09:10:13 PST 2006
Hi All,
some thoughts about the server requirements.
Is this a case of PiitWiSt! again (that's Finnish for 'Put it in the Wiki,
Stupid!')?
== Basic Hardware Requirements ==
The machine itself need not be exceptionally large or fast (in my
opinion). What we would want to aim at is that it is as fast as possible
and has incredibly high connectivity and bandwidth all over the world. It
should be able to handle literally thousands of parallel requests. Why
that? Every time we launch a press release the boxes are going to be
slashdotted. You simply cannot prevent this from happening. Every news
channel will be linked and connected by RSS or Schuyler so it will be
virtually and real-lifely impossible to stagger news over several days to
relieve load. Whenever a new application or service goes online and OSGeo
goes on air, within minutes it will go flat out. I am fairly curious how
well CN will deal with the standard web pages and the Wikis.
The Mapbender boxes regularly blow up or rather stand still. For the
production environments that are operated by customers this is heavily
embarrassing but only temporary: after a few hours they take up service
again and the intended people can go back to business as usual.
With the OSGeo Service it will be different. It will be referenced from
many sites, universities and companies and it will be linked directly from
every single customer that uses the OSGeo stack (at least this should be
the case once projects go under one roof). This means that most
requests to this server will come in peaks as there is not much "business
as usual". As soon as universities find out that they can actually use it
to do their GIS, SDI and OGC courses some constant load will build up
additionally. (((For obvious reasons the OGC never got around to setting
up a comprehensive reference site. This is obvious because their business
is dry standards and they cannot promote one solution over another or take
care that each implementation is maintained (license issues aside).
OSGeo can do this and it would be nice to start it some day soon. It would
develop into something like the software business card))).
== Connectivity and Bandwidth ==
OK, conceded - then it does have to be a good machine. To get a feeling
for the available speed and type of distributed architecture (not high
parallel processing capabilities) you might want to check out this UMN
MapServer aerial photography service:
mb_perf_mon (can't find it from this terminal, I'll send it next week.)
The frame around it comes from an elderly dual 1 GHz CPU box with one GB
RAM. It runs on a devel server and is connected by a steady 2 Mbit/s line.
What is more interesting is the orthophotos of the MapServer box. Zoom in
to around 1:5000 and then click on the frame border. The server is not
magic: a pyramid with four levels and uncompressed TIFF. The box itself is
just a dual CPU 1.8 GHz with local SCSI discs, also no magic. But it is
dead fast from anywhere in the world - I tested it from Delhi, Frankfurt,
London, Chicago (when we were in the meeting), and even Itajai (that's
where the last Brazilian MapServer meeting took place). The magic lies in
the connectivity, it was done by the Dutch / Swedish / Norwegian /
Extraterrestrial experts group adsemantics. They did an excellent job
connecting the City of Nuremberg (at least Dirk-Willem van Gulik said so
at the OSG 05 after he talked about the Apache Foundation). Those guys
work magic on the wires, nodes and also seem to know where to jump into
class A backbones.
== Potential Bottlenecks ==
The experience with many corporate sites is that they are slow. There are
several distinct reasons for this. One reason is that Internet
(capitalized for FW only) *content* is still often considered to be mostly
junk (true). Another is that many corporations operate large business
applications across the same wire as the internet services (or they are
identical). To ensure that standard business is not disturbed by a demo
service the throttle for the unloved "internet services" is often taken
back. The result is low performance.
But many of the OSGeo Foundation packages use the internet to do GIS,
which is very different. It's more comparable to what Google, Wikipedia or
eBay do - each again from very different perspectives.
== University Connectivity ==
For all of these reasons it might be interesting to lure a university or
better even a network of universities into providing the demo services
that OSGeo wants to provide. Why then are some universities a lot faster?
Because for historical reasons they actually still make up a lot of class A
backbones. In Germany universities on average use only a low single-digit
percentage of their bandwidth, storage capacity and server load. Which is
amazing considering that students share several TB worth of video streams,
music files and thousands of backup copies of every software imaginable.
OK. Considering all this, the easier part of setting up the startup stack
is actually installing and maintaining the SDI software stack. This could
include a standard interface with an edit option for the OSGeo member data
as WMS and WFS-T.
== Spatial Data for the Demo Stack ==
For a start it might do nothing but serve the locations of members,
sponsors and friends (however they will be named and categorized) as
points and symbols. Additionally it should have a repository of WMS and
WFS that are available online (there are several repositories, one is
maintained in the Mapbender CVS, Refractions has a list, etc.).
== Software Packages for the SDI Stack ==
The coordinates and alphanumeric attributes are stored in a
PostgreSQL/PostGIS database, rendered by a MapServer WMS, edited and
queried using a GeoServer WFS-T and displayed with the WMS clients
MapBuilder and Mapbender. MapGuide OS probably comes into play at all
levels (but I don't really know MapGuide OS well yet).
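As a rough sketch of how a client such as Mapbender would ask the MapServer
WMS for a rendered map of the member points, the snippet below builds a
WMS 1.1.1 GetMap request. The /wms path, the layer name osgeo_members and
the image size are assumptions made up for illustration; nothing in this
mail fixes them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; only the host name is suggested in this mail.
WMS_ENDPOINT = "http://reference.osgeo.org/wms"


def getmap_url(layer, bbox, width=600, height=300, srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL for a single layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value selects the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return WMS_ENDPOINT + "?" + urlencode(params)


# "osgeo_members" is an invented layer name for the member locations.
url = getmap_url("osgeo_members", (-180, -90, 180, 90))
print(url)
```

Editing the same points would go through a WFS-T Transaction request
against the GeoServer instead, which is an XML POST and not shown here.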
== Names and URL ==
"Reference Site" is a more appropriate term than "Demo Site". Right from
the start all services should be available at a short URL, something like
http://reference.osgeo.org
== Installation and Maintenance ==
In future it should be fairly easy to maintain the site by hitting a
button and automatically copying or compiling the current SVN version to
this address. (Most packages will need different ways to do this,
obviously; it points to the operating system packages, ports, etc.)
Ideally the OSGeo
Foundation will operate one stack as 'dev', another as 'test', and the
stable version as 'prod' or 'reference'.
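The 'hit a button' idea for the three stacks could look roughly like the
sketch below. It only tracks which SVN revision each stack runs and moves
a revision one stage forward; the revision numbers are invented and the
actual copy or compile step per package is left out, since that will
differ for every project.

```python
# Stage names from the text, in promotion order.
STAGES = ("dev", "test", "prod")


def promote(deployed, stage):
    """Return a new mapping where the revision on `stage` also runs on the next stage."""
    idx = STAGES.index(stage)
    if idx == len(STAGES) - 1:
        raise ValueError("prod is the last stage, nothing to promote to")
    updated = dict(deployed)  # leave the caller's mapping untouched
    updated[STAGES[idx + 1]] = deployed[stage]
    return updated


# Made-up SVN revision numbers, for illustration only.
deployed = {"dev": 1207, "test": 1180, "prod": 1102}
after = promote(deployed, "dev")
print(after)
```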
== Visions ==
A small vision for days to come (sitting on top of the Alps gives a really
good panorama). People adding data in Germany will automatically connect
to e.g. University of Bonn. People doing that in the US will connect to
University of Minnesota or Dakota because they all appear under the same
URL. When data is requested from Germany it comes from the corresponding
mirror in the US. If the data seems to be out of date (or has been updated
recently, RSS feeds take care of that) it is heartbeated, replicated or
simply copied from the source to the mirror server. And vice versa
obviously. Some day people will have worked out how historizing spatial
data works (follow the geowiki discussions). Then we would have
accomplished the next generation Wiki that Ward Cunningham sketched in the
keynote of the Wikimania 05, we would be able to stuff it with Free Maps
(sketched by Jimmy Wales as freedom topic to-be-done number 7, "Free the
maps") and would have done all this respecting good old RMS's perfectly
ethical and moral considerations regarding freedom of use, copyright, DRM
and the evil of the world.
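The mirror part of this vision can be sketched in a few lines: pick the
mirror for the region a request comes from, and refresh a mirror when the
source has changed since its last copy. The Mirror class, the region codes
and the fallback to the US are assumptions for illustration; how the
regions are detected and how the copy itself is done is not covered.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    region: str       # hypothetical region code of the users it serves
    last_sync: float  # time of the last copy from the source, seconds since epoch


# Example hosts named in the text; the sync times are placeholders.
MIRRORS = [
    Mirror("University of Bonn", "DE", 0.0),
    Mirror("University of Minnesota", "US", 0.0),
]


def pick_mirror(client_region, mirrors, default_region="US"):
    """Return the mirror for the client's region, or the default one."""
    by_region = {m.region: m for m in mirrors}
    return by_region.get(client_region, by_region[default_region])


def needs_refresh(mirror, source_updated_at):
    """True when the source changed after the mirror's last copy."""
    return source_updated_at > mirror.last_sync


print(pick_mirror("DE", MIRRORS).name)
```

All mirrors sit behind the same URL, so the user never sees which one
answered.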
None of these ideas originated in my brain; I just proxy and glue
them together in a spatial context a little.
Best regards,
--
Arnulf Christl