[OSGeo-Discuss] OS Spatial environment 'sizing'

Cameron Shorter cameron.shorter at gmail.com
Tue Feb 19 11:43:25 PST 2008


Randy, what an informative email.
It is almost a "Howto for OSGeo hardware and performance tuning". I'm 
not aware of anyone who has written something similar (although I admit 
I have not looked).

I'd love to see it incorporated into an easily referenced resource - 
maybe a chapter in
http://wiki.osgeo.org/index.php/Educational_Content_Inventory

Also, a link from http://wiki.osgeo.org/index.php/Case_Studies .

What do you think?

Randy George wrote:
>
> Hi Bruce,
>
> On the “scale relatively quickly” front, you should look at Amazon’s 
> EC2/S3 services. I’ve recently worked with them and find them an 
> attractive platform for scaling: http://www.cadmaps.com/gisblog
>
> The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk + 
> Tomcat + GeoServer + custom SVG or XAML clients run out of Tomcat.
>
> If you use the larger instances the cost is higher but it sounds like 
> you plan on some heavy raster services (WMS,WCS) and lots of memory 
> will help.
>
> Small EC2 instances ($0.10/hr) provide:
>
> 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 
> Compute Unit), 160 GB of instance storage, 32-bit platform
>
> Large EC2 instances ($0.40/hr) provide:
>
> 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 
> Compute Units each), 850 GB of instance storage, 64-bit platform
>
> Extra large EC2 instances ($0.80/hr) provide:
>
> 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 
> Compute Units each), 1690 GB of instance storage, 64-bit platform
>
> Note that the instances do not need to be permanent. Some people 
> (WeoGeo) have been using a couple of small failover instances and then 
> starting new large instances for specific requirements. The idea is to 
> start and stop instances as required rather than carry ongoing 
> infrastructure costs. It only takes a minute or so to start an EC2 
> instance. If you are running a corporate service there may be parts of 
> the day with very little use, so you just schedule your heavy-duty 
> instances for peak times. If you can connect your raster to S3 buckets 
> rather than instance storage, you have built-in replicated backup.
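To put rough numbers on the "schedule heavy instances for peak times" idea, here is a minimal Python sketch using the hourly prices quoted above. The deployment itself (two small failover instances always on, two large instances for a 10-hour peak window, a 30-day month) is a made-up assumption for illustration, and storage and bandwidth charges are ignored.

```python
# Hourly prices as quoted in this thread (USD, February 2008).
HOURLY_RATE = {"small": 0.10, "large": 0.40, "xlarge": 0.80}


def monthly_cost(instance_type, count, hours_per_day, days=30):
    """Cost of running `count` instances for `hours_per_day` hours each day."""
    return HOURLY_RATE[instance_type] * count * hours_per_day * days


# Hypothetical deployment: two small failover instances running all day,
# plus two large instances started only for a 10-hour peak window.
always_on = monthly_cost("small", 2, 24)
peak_only = monthly_cost("large", 2, 10)

# For comparison: the same two large instances left running permanently.
large_24x7 = monthly_cost("large", 2, 24)

print(f"small 24x7 + large at peak: ${always_on + peak_only:.2f}/month")
print(f"large 24x7:                 ${large_24x7:.2f}/month")
```

Changing `hours_per_day` shows how quickly the saving shrinks as the peak window widens.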
>
> I know that Java JAI can easily eat up memory and is core to GeoServer 
> WMS/WCS, so you probably want a large memory footprint on any 
> platform with lots of raster services. I’m partial to GeoServer because 
> of its Java foundation. I think I would try to keep the Apache2 mod_jk 
> Tomcat GeoServer stack on a separate server instance from PostGIS. This 
> might avoid problems at instance startup, since your database would 
> need to be loaded separately. The instance AMI resides in a 10 GB 
> partition; the balance of the data will probably reside on a /mnt 
> partition separate from ec2-run-instances. You may be able to avoid datadir 
> problems by adding something like Elastra to the mix. Elastra beta is 
> a wrapper for PostgreSQL that puts the datadir on S3 rather than local 
> to an instance. I suppose they still keep indices (GiST et al.) on the 
> local instance.
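As a back-of-envelope check on the memory point, the sketch below estimates how much memory a fully decoded, uncompressed raster occupies and how many such rasters would fit in a given heap. The image dimensions and the 1 GiB heap are assumptions chosen for illustration; real usage depends on how the imagery is tiled and cached, so treat this as a rough bound rather than a measurement.

```python
def raster_bytes(width, height, bands=3, bytes_per_sample=1):
    """Size of an uncompressed raster: one sample per pixel per band."""
    return width * height * bands * bytes_per_sample


def rasters_that_fit(heap_bytes, width, height, bands=3, bytes_per_sample=1):
    """How many fully decoded rasters of this size fit in `heap_bytes`."""
    return heap_bytes // raster_bytes(width, height, bands, bytes_per_sample)


GIB = 1024 ** 3

# Hypothetical 10,000 x 10,000 pixel, 3-band, 8-bit image.
size = raster_bytes(10_000, 10_000)
print(f"one image: {size / 1024 ** 2:.0f} MiB")
print(f"images per 1 GiB heap: {rasters_that_fit(1 * GIB, 10_000, 10_000)}")
```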
>
> (I still think it an interesting exercise to see what could be done 
> connecting PostGIS to AWS SimpleDB services.)
>
> So, thinking out loud, here is a possible architecture:
>
> Basic permanent setup
>
> - put raster in S3 (this may require some customization of GeoServer)
>
> - build a datadir in PostGIS and back it up to S3
>
> - create a private AMI for PostgreSQL/PostGIS
>
> - create a private AMI for the load balancer instance
>
> - create a private AMI with your service stack for both a small and a 
> large instance, for flexibility
>
> Startup services
>
> - start a balancer instance
>
> - point your DNS CNAME to this balancer instance
>
> - start a PostGIS instance (you could have more than one if necessary, 
> but it would be easier to just scale to a larger instance type if the 
> load demands it)
>
> - have a scripted download from an S3 backup to your PostGIS datadir (I’m 
> assuming a relatively static data resource)
>
> Variable services
>
> - start a service stack instance and connect it to PostGIS
>
> - update the balancer to see the new instance (this could be tricky)
>
> - repeat the previous two steps as needed
>
> - at night, scale back: cron scaling for a known cycle, or use a 
> controller like weoceo to detect and respond to load fluctuation
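One way the "update balancer to see new instance" step might be scripted, assuming the balancer is Apache2 with mod_jk (the thread does not say which balancer is used): regenerate workers.properties from the current list of service-stack hosts. The host addresses, the `node1`, `node2` worker names and the `loadbalancer` name are hypothetical; 8009 is Tomcat's default AJP port.

```python
def render_workers(hosts, ajp_port=8009, lb_name="loadbalancer"):
    """Return mod_jk workers.properties text balancing across `hosts`."""
    # One AJP worker per Tomcat instance, named node1, node2, ...
    names = [f"node{i}" for i, _ in enumerate(hosts, start=1)]
    lines = [f"worker.list={lb_name}"]
    for name, host in zip(names, hosts):
        lines += [
            f"worker.{name}.type=ajp13",
            f"worker.{name}.host={host}",
            f"worker.{name}.port={ajp_port}",
            f"worker.{name}.lbfactor=1",
        ]
    # The lb worker spreads requests over all the AJP workers above.
    lines += [
        f"worker.{lb_name}.type=lb",
        f"worker.{lb_name}.balance_workers={','.join(names)}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical private addresses of two running service-stack instances.
print(render_workers(["10.0.0.11", "10.0.0.12"]))
```

A cron job or controller would write this output to the balancer's workers.properties whenever an instance starts or stops; Apache would then most likely need a graceful restart to pick up the change, which may be part of why this step is tricky.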
>
> By the way, the public AWS AMI with the best resources that I have 
> found is Ubuntu 7.10 Gutsy. The Debian dependency tools are much 
> easier to use and the resources are plentiful.
>
> I’ve been toying with using an AWS stack adapted for serving some 
> larger PostGIS vector sets such as fully connected census demographic 
> data and block polygons here in the US. The idea would be to populate the 
> data directly from the census SF* and TIGER with a background Java 
> bot. There are some potentially novel 3D viewing approaches possible 
> with XAML. Anyway, lots of fun to have access to virtual systems like 
> this.
>
> As you can see I’m excited anyway.
>
> randy
>
> *From:* discuss-bounces at lists.osgeo.org 
> [mailto:discuss-bounces at lists.osgeo.org] *On Behalf Of 
> *Bruce.Bannerman at dpi.vic.gov.au
> *Sent:* Monday, February 18, 2008 6:35 PM
> *To:* OSGeo Discussions
> *Subject:* [OSGeo-Discuss] OS Spatial environment 'sizing'
>
>
> IMO:
>
>
> Hello everyone,
>
> I'm trying to get a feel for server 'sizing' for a **hypothetical** 
> Corporate environment to support OS Spatial apps.
>
>
>
> Assume that:
>
> - this is a dedicated environment to allow the use of OS Spatial 
> applications to serve Corporate OGC Services.
>
> - the applications of interest are GeoServer, Deegree, GeoNetwork, 
> MapServer, MapGuide and Postgres/PostGIS.
>
> - the environment may need to scale relatively quickly.
>
> - it will be required to serve in the vicinity of 5 to 10 TB of data 
> initially (WMS, WFS, WCS).
>
>
>
> Can anyone shed some light on the following questions please?
>
> - I'm assuming a Linux installation (SLES, Redhat or Debian) or 
> possibly Intel Solaris. Has anyone experienced any issues in these (or 
> other) environments that they'd like to share?
>
> - Are there any recommendations as to dedicated network bandwidth that 
> should be allocated?
>
> - Has anyone done any work with load balancing and would like to share 
> their experiences?
>
> - Of the above OS Spatial products, which ones could co-exist on the 
> same server (excluding Postgres/PostGIS)?
>
>
> Any thoughts are appreciated.
>
>
> Bruce Bannerman
> Australia
>
>   


-- 
Cameron Shorter
Geospatial Systems Architect
Tel: +61 (0)2 8570 5050
Mob: +61 (0)419 142 254

Think Globally, Fix Locally
Commercial Support for Geospatial Open Source Solutions
http://www.lisasoft.com/LISAsoft/SupportedProducts.html



