[OSGeo-Discuss] OS Spatial environment 'sizing'

Randy George rkgeorge at cadmaps.com
Tue Feb 19 08:08:42 PST 2008


Hi Bruce,

 

On the "scale relatively quickly" front, you should look at Amazon's EC2/S3
services. I've recently worked with them and find them an attractive
platform for scaling: http://www.cadmaps.com/gisblog

 

The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk +
Tomcat + GeoServer + custom SVG or XAML clients served out of Tomcat.

 

If you use the larger instances the cost is higher, but it sounds like you
plan on some heavy raster services (WMS, WCS), and lots of memory will help.

Small EC2 instance, $0.10/hr:

1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute
Unit), 160 GB of instance storage, 32-bit platform

Large EC2 instance, $0.40/hr:

7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute
Units each), 850 GB of instance storage, 64-bit platform

Extra large EC2 instance, $0.80/hr:

15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute
Units each), 1690 GB of instance storage, 64-bit platform
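
As a rough illustration of picking an instance type by memory need, here is a
minimal Python sketch. The memory, compute, storage and price figures are the
ones listed above; the class and function names are my own and purely
illustrative, and the memory requirement you pass in is something you would
have to measure for your own raster services.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    memory_gb: float
    compute_units: int
    storage_gb: int
    usd_per_hour: float


# Figures copied from the list above (pricing as of February 2008).
INSTANCE_TYPES = [
    InstanceType("small", 1.7, 1, 160, 0.10),
    InstanceType("large", 7.5, 4, 850, 0.40),
    InstanceType("extra large", 15.0, 8, 1690, 0.80),
]


def cheapest_fit(required_memory_gb: float) -> Optional[InstanceType]:
    """Return the cheapest instance type with at least the required memory.

    Returns None when no listed type is big enough.
    """
    candidates = [t for t in INSTANCE_TYPES if t.memory_gb >= required_memory_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.usd_per_hour)
```

For example, cheapest_fit(4.0) selects the large instance, since the small
one has only 1.7 GB.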

 

Note that the instances do not need to be permanent. Some people (WeoGeo)
have been using a couple of failover small instances and then starting new
large instances for specific requirements. The idea is to start and stop
instances as required rather than carrying ongoing infrastructure costs. It
only takes a minute or so to start an EC2 instance. If you are running a
corporate service there may be parts of the day with very little use, so you
just schedule your heavy-duty instances for peak times. If you can connect
your raster to S3 buckets rather than instance storage, you have built-in
replicated backup.
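
To put numbers on the start-and-stop idea, here is a small Python sketch
comparing a scheduled setup with an always-on one. The hourly prices are the
small and large figures quoted above; the 08:00 to 18:00 peak window, the
instance counts and all function names are assumptions made up for the
example, not anything measured.

```python
# Prices in US cents per instance-hour, from the figures quoted above.
SMALL_CENTS_PER_HOUR = 10
LARGE_CENTS_PER_HOUR = 40


def large_instances_wanted(hour: int, peak_count: int,
                           peak_start: int = 8, peak_end: int = 18) -> int:
    """How many large instances should run during the given hour (0-23).

    The peak window is a hypothetical business-hours schedule.
    """
    return peak_count if peak_start <= hour < peak_end else 0


def scheduled_daily_cost_cents(small_count: int, peak_count: int) -> int:
    """Small instances run all day; large ones only inside the peak window."""
    total = 0
    for hour in range(24):
        total += small_count * SMALL_CENTS_PER_HOUR
        total += large_instances_wanted(hour, peak_count) * LARGE_CENTS_PER_HOUR
    return total


def always_on_daily_cost_cents(small_count: int, large_count: int) -> int:
    """Everything runs 24 hours a day."""
    return 24 * (small_count * SMALL_CENTS_PER_HOUR
                 + large_count * LARGE_CENTS_PER_HOUR)
```

With two small failover instances and two large peak-time instances, the
scheduled setup comes to 1280 cents ($12.80) a day against 2400 cents
($24.00) for running everything around the clock.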

 

I know that Java JAI can easily eat up memory, and it is core to GeoServer
WMS/WCS, so you probably want a large memory footprint on any platform with
lots of raster services. I'm partial to GeoServer because of its Java
foundation. I think I would try to keep the Apache2 mod_jk + Tomcat +
GeoServer stack on a separate server instance from PostGIS. This might avoid
problems at instance startup, since your database would need to be loaded
separately. The instance AMI resides in a 10 GB partition; the balance of the
data will probably reside on a /mnt partition, separate from
ec2-run-instances. You may be able to avoid datadir problems by adding
something like Elastra to the mix. Elastra beta is a wrapper for PostgreSQL
that puts the datadir on S3 rather than local to an instance. I suppose they
still keep indices (GiST et al.) on the local instance.

(I still think it an interesting exercise to see what could be done
connecting PostGIS to AWS SimpleDB services.)

 

So, thinking out loud, here is a possible architecture:

    Basic permanent setup

put raster in S3 - this may require some customization of GeoServer

build a datadir in PostGIS and back it up to S3

create a private AMI for PostgreSQL/PostGIS

create a private AMI for the load balancer instance

create a private AMI with your service stack for both a small and a large
instance, for flexibility

    Startup services

start a balancer instance

point your DNS CNAME to this balancer instance

start a PostGIS instance (you could have more than one if necessary, but it
would be easier to just scale to a larger instance type if the load demands
it)

have a scripted download from an S3 backup to your PostGIS datadir (I'm
assuming a relatively static data resource)

    Variable services

start a service stack instance and connect it to PostGIS

update the balancer to see the new instance - this could be tricky

repeat the previous two steps as needed

at night, scale back - cron scaling for a known cycle, or use a controller
like WeoCEO to detect and respond to load fluctuation
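
On the "update balancer" step, one possible approach, assuming the balancer
is Apache2 with mod_jk, is to regenerate workers.properties from the list of
running service instances and then reload Apache gracefully. Here is a
minimal Python sketch; the host addresses, worker names (node1, node2, ...)
and the function name are hypothetical, and 8009 is simply Tomcat's default
AJP port.

```python
from typing import List


def render_workers_properties(hosts: List[str], ajp_port: int = 8009) -> str:
    """Build a mod_jk workers.properties body for the given Tomcat hosts.

    Each host becomes an ajp13 worker; a single lb worker named
    "loadbalancer" spreads requests across all of them.
    """
    if not hosts:
        raise ValueError("at least one service instance is required")

    names = [f"node{i}" for i in range(1, len(hosts) + 1)]
    lines = ["worker.list=loadbalancer"]
    for name, host in zip(names, hosts):
        lines.append(f"worker.{name}.type=ajp13")
        lines.append(f"worker.{name}.host={host}")
        lines.append(f"worker.{name}.port={ajp_port}")
    lines.append("worker.loadbalancer.type=lb")
    lines.append("worker.loadbalancer.balance_workers=" + ",".join(names))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical internal addresses of two running service instances.
    print(render_workers_properties(["10.0.0.11", "10.0.0.12"]), end="")
```

A cron job or controller could write this output to the balancer's
workers.properties each time an instance starts or stops. Discovering the
list of running hosts is the tricky part and is left out here.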

 

By the way, the public AWS AMI with the best resources that I have found is
Ubuntu 7.10 Gutsy. The Debian dependency tools are much easier to use and
the resources are plentiful.

 

I've been toying with using an AWS stack adapted for serving some larger
PostGIS vector sets, such as fully connected census demographic data and
block polygons here in the US. The idea would be to populate the data
directly from the census SF* and TIGER files with a background Java bot.
There are some potentially novel 3D viewing approaches possible with XAML.
Anyway, it's lots of fun to have access to virtual systems like this.

 

As you can see I'm excited anyway.

 

randy

 

 

From: discuss-bounces at lists.osgeo.org
[mailto:discuss-bounces at lists.osgeo.org] On Behalf Of
Bruce.Bannerman at dpi.vic.gov.au
Sent: Monday, February 18, 2008 6:35 PM
To: OSGeo Discussions
Subject: [OSGeo-Discuss] OS Spatial environment 'sizing'

 


IMO: 


Hello everyone, 

I'm trying to get a feel for server 'sizing' for a **hypothetical**
Corporate environment to support OS Spatial apps. 



Assume that: 

- this is a dedicated environment to allow the use of OS Spatial
applications to serve Corporate OGC Services. 

- the applications of interest are GeoServer, Deegree, GeoNetwork,
MapServer, MapGuide and Postgres/PostGIS. 

- the environment may need to scale relatively quickly. 

- it will be required to serve in the vicinity of 5 to 10 TB of data
initially (WMS, WFS, WCS). 



Can anyone shed some light on the following questions please? 

- I'm assuming a Linux installation (SLES, Redhat or Debian) or possibly
Intel Solaris. Has anyone experienced any issues in these (or other)
environments that they'd like to share? 

- Are there any recommendations as to dedicated network bandwidth that
should be allocated? 

- Has anyone done any work with load balancing and would like to share their
experiences? 

- Of the above OS Spatial products, which ones could co-exist on the same
server (excluding Postgres/PostGIS)? 


Any thoughts are appreciated. 


Bruce Bannerman 
Australia


 

 

 
