[OSGeo-Discuss] OS Spatial environment 'sizing' + Image Management
Arnulf Christl
arnulf.christl at wheregroup.com
Wed Feb 20 10:25:58 PST 2008
Lucena, Ivan wrote:
> Hi Bruce,
>
> Here I am again...
>
> Randy's suggestions are pretty valuable and very well founded, but I have
> a special interest in storing raster in databases, so that is why I asked
> about it.
>
> Yes, raster is chunky and not very fluid, but I love to hear about
> successful experiences like Bruce's. And as Bruce also mentioned,
> analytical processes often need to query on cell space rather than bands.
>
> Remember that decades ago some of us would be discussing the
> disadvantages of storing "vector" in databases; now it is the norm for
> client/server applications.
Hehe, good move, that one caught my attention. Nonetheless I will stick with suggesting to keep rasters out of databases, especially if you want to keep them for generations to come. You never know whether you will be able to get the stuff back out of SDE and Oracle in time to back it up and use it elsewhere[1]. So for long-term preservation, save the rasters as uncompressed pixels on some large hard disks. Better still, publish them openly so that people can take them home and store them there for further reference. Distributing them and making them widely available will ensure that they survive (if you really mean it).
> Bruce mentioned SDE and Oracle, but what are the *open source* options
> to do *image management* on open source databases and who is using it?
There are none because nobody is using them. That's an old and very useful Open Source deadlock: if there is no use for it, don't implement it. Joking aside, AFAIK to this day no need has emerged for any real-world Open Source implementation.
Regards,
[1] The brutal truth is this: when your key business processes are executed by opaque blocks of bits that you can't even see inside (let alone modify) you have lost control of your business. You need your supplier more than your supplier needs you--and you will pay, and pay, and pay again for that power imbalance. You'll pay in higher prices, you'll pay in lost opportunities, and you'll pay in lock-in that grows worse over time as the supplier (who has refined its game on a lot of previous victims) tightens its hold.
http://www.oreilly.com/catalog/cathbazpaper/chapter/ch05.html#AUTOID-1787
> I can only think of two, the PostGIS CHIP datatype and Terralib schemas
> (MySQL, PostgreSQL, and commercial RDBMS), but I don't know of any
> *sizable* project that is using them.
>
> Does anybody know and would like to share?
>
> Best regards,
>
> Ivan
>
>
> Bruce.Bannerman at dpi.vic.gov.au wrote:
>>
>> IMO:
>>
>>
>> Hi Randy,
>>
>> Thank you for your informative post. It has given me a lot to follow
>> up on and think about.
>>
>> I can see an immediate need that this type of solution could well be
>> used for. I like it.
>>
>> I suspect that in many larger corporate types of environments, it
>> could well be used effectively for 'pilot' and 'pre-production' type
>> tasks.
>>
>> For 'production' type environments, there would be issues of
>> integrating an external service hosting spatial data with internal
>> services hosting corporate aspatial data sources and applications.
>>
>>
>>
>> With regard to storing imagery in a database:
>>
>> <rant> (and not directed at you)
>>
>> I've also seen a lot of reports suggesting that image management
>> should be file based.
>>
>> My personal preference is to use a database if possible, so that I can
>> take advantage of corporate data management facilities, backups, point
>> in time restores etc.
>>
>> I've managed 70 GB orthophoto mosaics in ArcSDE / Oracle before with
>> minimal problems. I found performance and response times to be
>> comparable with other image web server options on the market that use
>> file based solutions for storing data.
>>
>> Ideally, I'm looking to manage state wide mosaics with a consistent
>> look and feel that can be treated as a single 'layer' by client GIS /
>> Remote Sensing applications (data integrity issues allowing).
>>
>> One potential use is 'best available' data mosaics could undergo
>> regular updates as more imagery is flown or captured. A database makes
>> it easier to manage and deliver such data.
>>
>> My definition of 'imagery' goes beyond aerial photographs and includes
>> multi or hyper-spectral imagery; various geophysics data sources such
>> as aeromagnetics, gravity, radiometrics; radar data etc.
>>
>> Typically this data is required for digital image analysis purposes
>> using a remote sensing application, so the integrity of 'the numbers'
>> that make up the image is very important.
>>
>> Many of today's image based solutions use a (lossy) wavelet
>> compression that can corrupt the integrity of 'the numbers' describing
>> the radiometric data in the image.
>>
>> When we consider the big picture issues facing us today, such as
>> Climate Change, I think that it is important to protect our definitive
>> image libraries from such corruption as they will be invaluable
>> sources of data for future multi-temporal analysis.
>>
>> That said, if the end use is just for a picture, then a wavelet
>> compression is a good option. Just protect the source data for future
>> use.
>>
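To make the point about 'the numbers' concrete, here is a minimal sketch in Python. It assumes a made-up 8-bit band and uses plain quantization as a crude stand-in for a lossy codec (it is not a wavelet compressor); zlib stands in for any lossless scheme. The function names are hypothetical.

```python
import zlib


def lossless_round_trip(pixels: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), which is lossless."""
    return zlib.decompress(zlib.compress(pixels))


def lossy_round_trip(pixels: bytes, step: int = 8) -> bytes:
    """Crude lossy stand-in: snap each 8-bit value down to a multiple of step.

    This is NOT a wavelet codec. It only illustrates that lossy storage
    changes the stored cell values.
    """
    return bytes((p // step) * step for p in pixels)


def changed_cells(original: bytes, restored: bytes) -> int:
    """Count the cells whose value differs after the round trip."""
    return sum(1 for a, b in zip(original, restored) if a != b)


# Hypothetical 8-bit band values, made up for illustration.
band = bytes([12, 13, 200, 201, 77, 78, 255, 0])

print("lossless changed cells:", changed_cells(band, lossless_round_trip(band)))
print("lossy changed cells:   ", changed_cells(band, lossy_round_trip(band)))
```

The lossless round trip returns every cell unchanged, while the lossy stand-in alters most of them, which is the concern for any later digital image analysis.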
>> </rant>
>>
>> So, does anyone know of a good open source spatial solution for
>> storing and accessing (multi and hyperspectral) imagery in a
>> database? ;-)
>>
>> WMS 1.3 and WCS are showing promise for serving imagery, including
>> multi and hyperspectral data.
>>
>>
>>
>> Bruce Bannerman
>>
>>
>>
>>
>>
>> discuss-bounces at lists.osgeo.org wrote on 20/02/2008 10:09:28 AM:
>>
>> > Hi Ivan,
>> >
>> > The most common advice I've seen says to leave raster out of the DB.
>> > Of course footprints and meta data could be there, but you would want to
>> > point Geoserver coverage to the image/image pyramid url somewhere in the
>> > directory hierarchy.
>> >
>> > Brent has a nice writeup here:
>> > http://docs.codehaus.org/display/GEOSDOC/Load+NASA+Blue+Marble+Data
>> >
>> > In an AWS sense my idea is to Java proxy the Geoserver Coverage Data URL
>> > to S3 buckets and park the imagery over on the S3 side to take advantage
>> > of stability and replication. Performance, though, might not be as good
>> > as a local directory. Maybe a one time cache to a local directory would
>> > work better.
>> >
>> > Note: Amazon doesn't charge for inside AWS data transfers.
>> >
>> > So in theory:
>> > - PostGIS holds the footprint geometry + metadata
>> > - EC2 Geoserver WFS handles footprint queries into an SVG/XAML client;
>> >   just stick it on top of something like JPL BMNG. Once a user picks a
>> >   coverage, switch to the Geoserver WMS/WCS service for zooming around
>> >   in the selected image pyramid
>> > - S3 buckets contain the tiffs, pyramids ...
>> > - EC2 Geoserver handles WMS/WCS service
>> > - EC2 proxy pulls the imagery from the S3 side as needed
>> >
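The split described above (footprints and metadata in the database, pixels elsewhere) can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory list stands in for the PostGIS footprint table, bounding boxes stand in for real footprint geometry, and the names, extents and s3:// URLs are all made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Footprint:
    """Bounding-box footprint plus a pointer to where the pixels live."""
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    url: str  # hypothetical S3 location of the image pyramid

    def contains(self, x: float, y: float) -> bool:
        """True if the point lies inside (or on the edge of) the box."""
        return (self.min_x <= x <= self.max_x
                and self.min_y <= y <= self.max_y)


def find_coverages(catalog: List[Footprint], x: float, y: float) -> List[Footprint]:
    """Return every footprint whose bounding box contains the point (x, y)."""
    return [fp for fp in catalog if fp.contains(x, y)]


# Hypothetical catalog entries; names, extents and URLs are made up.
CATALOG = [
    Footprint("mosaic_west", 140.0, -39.0, 145.0, -34.0,
              "s3://example-bucket/mosaic_west/"),
    Footprint("mosaic_east", 144.0, -39.0, 150.0, -34.0,
              "s3://example-bucket/mosaic_east/"),
]

# A client picks a location; the lookup returns only pointers, never pixels.
for fp in find_coverages(CATALOG, 144.5, -37.8):
    print(fp.name, fp.url)
```

In a real setup the lookup would presumably be a spatial query against the footprint table, and the returned URL is what the WMS/WCS side or the proxy would resolve.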
>> > Sorry I haven't had time to try this so it is just theoretical. Of course
>> > you can go traditional and just keep the coverage imagery files on the
>> > local instance, avoiding the S3 proxy idea. The reason I don't like that
>> > idea is that the imagery has to be loaded with every instance creation,
>> > while an S3 approach would need only one copy.
>> >
>> >
>> > randy
>> >
>> > -----Original Message-----
>> > From: Lucena, Ivan [mailto:ivan.lucena at pmldnet.com]
>> > Sent: Tuesday, February 19, 2008 2:59 PM
>> > To: rkgeorge at cadmaps.com; OSGeo Discussions
>> > Subject: Re: [OSGeo-Discuss] OS Spatial environment 'sizing'
>> >
>> > Hi Randy, Bruce,
>> >
>> > That is a nice piece of advice, Randy. I am sorry to intrude on the
>> > conversation, but I would like to ask how that "heavy raster"
>> > manipulation would be treated by PostgreSQL/PostGIS: managed or
>> > unmanaged?
>> >
>> > Best regards,
>> >
>> > Ivan
>> >
>> > Randy George wrote:
>> > > Hi Bruce,
>> > >
>> > > On the "scale relatively quickly" front, you should look at Amazon's
>> > > EC2/S3 services. I've recently worked with it and find it an
>> > > attractive platform for scaling: http://www.cadmaps.com/gisblog
>> > >
>> > > The stack I like is Ubuntu + Java + PostgreSQL/PostGIS + Apache2 mod_jk
>> > > Tomcat + Geoserver + custom SVG or XAML clients run out of Tomcat.
>> > >
>> > > If you use the larger instances the cost is higher, but it sounds
>> > > like you plan on some heavy raster services (WMS, WCS) and lots
>> > > of memory will help.
>> > >
>> > > Small EC2 instances cost $0.10/hr and provide:
>> > > 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2
>> > > Compute Unit), 160 GB of instance storage, 32-bit platform
>> > >
>> > > Large EC2 instances cost $0.40/hr and provide:
>> > > 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2
>> > > Compute Units each), 850 GB of instance storage, 64-bit platform
>> > >
>> > > Extra large EC2 instances cost $0.80/hr and provide:
>> > > 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2
>> > > Compute Units each), 1690 GB of instance storage, 64-bit platform
>> > >
>> > > Note that the instances do not need to be permanent. Some people
>> > > (WeoGeo) have been using a couple of failover small instances and then
>> > > starting new large instances for specific requirements. The idea is to
>> > > start and stop instances as required rather than having ongoing
>> > > infrastructure costs. It only takes a minute or so to start an EC2
>> > > instance. If you are running a corporate service there may be parts of
>> > > the day with very little use, so you just schedule your heavy duty
>> > > instances for peak times. If you can connect your raster to S3 buckets
>> > > rather than instance storage, you have built-in replicated backup.
>> > >
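As a rough worked example of the start/stop idea, the sketch below compares an always-on large instance with two small failover instances plus a large instance run only at peak times. It uses only the hourly prices quoted above; the 30-day month, the 8 peak hours per day and the function name are assumptions, and storage and bandwidth charges are ignored.

```python
# Hourly prices quoted in the thread, kept in whole cents to avoid
# floating-point rounding surprises.
HOURLY_RATE_CENTS = {"small": 10, "large": 40, "xlarge": 80}


def monthly_cost_usd(instance_type: str, hours_per_day: int,
                     count: int = 1, days: int = 30) -> float:
    """Instance-hours cost in USD for one (assumed 30-day) month."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    cents = HOURLY_RATE_CENTS[instance_type] * hours_per_day * days * count
    return cents / 100


# One large instance running around the clock.
always_on = monthly_cost_usd("large", 24)

# Two small failover instances around the clock, plus one large instance
# for an assumed 8 peak hours per day.
scheduled = monthly_cost_usd("small", 24, count=2) + monthly_cost_usd("large", 8)

print(f"always-on large:          ${always_on:.2f}")
print(f"2 small + scheduled large: ${scheduled:.2f}")
```

Under these assumptions the scheduled setup comes out somewhat cheaper while keeping two machines up at all times. The gap grows if the peak window is shorter.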
>> > > I know that Java JAI can easily eat up memory and is core to Geoserver
>> > > WMS/WCS, so you probably want to look at a large memory footprint for
>> > > any platform with lots of raster service. I'm partial to Geoserver
>> > > because of its Java foundation. I think I would try to keep the Apache2
>> > > mod_jk Tomcat Geoserver on a separate server instance from PostGIS.
>> > > This might avoid problems at instance startup, since your database
>> > > would need to be loaded separately. The instance AMI resides in a 10 GB
>> > > partition; the balance of data will probably reside on a /mnt
>> > > partition, separate from ec2-run-instances. You may be able to avoid
>> > > datadir problems by adding something like Elastra to the mix. Elastra
>> > > beta is a wrapper for PostgreSQL that puts the datadir on S3 rather
>> > > than local to an instance. I suppose they still keep indices (GiST et
>> > > al.) on the local instance.
>> > >
>> > > (I still think it an interesting exercise to see what could be done
>> > > connecting PostGIS to AWS SimpleDB services.)
>> > >
>> > > So, thinking out loud, here is a possible architecture:
>> > >
>> > > Basic permanent setup
>> > > - put raster in S3 (this may require some customization of Geoserver)
>> > > - build a datadir in PostGIS and back it up to S3
>> > > - create a private AMI for PostgreSQL/PostGIS
>> > > - create a private AMI for the load balancer instance
>> > > - create a private AMI with your service stack for both a small and a
>> > >   large instance, for flexibility
>> > >
>> > > Startup services
>> > > - start a balancer instance
>> > > - point your DNS CNAME to this balancer instance
>> > > - start a PostGIS instance (you could have more than one if necessary,
>> > >   but it would be easier to just scale to a larger instance type if the
>> > >   load demands it)
>> > > - run a scripted download from an S3 backup to your PostGIS datadir
>> > >   (I'm assuming a relatively static data resource)
>> > >
>> > > Variable services
>> > > - start a service stack instance and connect it to PostGIS
>> > > - update the balancer to see the new instance (this could be tricky)
>> > > - repeat the previous two steps as needed
>> > > - at night, scale back: cron scaling for a known cycle, or use a
>> > >   controller like weoceo to detect and respond to load fluctuation
>> > >
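The "cron scaling for a known cycle" step could look roughly like the sketch below: a script run from cron each hour works out how many service stack instances should be up and how many to start or stop. The peak window, the instance count and the function names are assumptions for illustration; the actual start/stop calls and the balancer update are deliberately left out.

```python
def desired_instances(hour: int, peak_start: int = 8, peak_end: int = 18,
                      peak_count: int = 2) -> int:
    """How many service stack instances should run at this hour (0-23).

    Assumed cycle: peak_count instances during [peak_start, peak_end),
    none outside it.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return peak_count if peak_start <= hour < peak_end else 0


def scaling_action(running: int, hour: int) -> int:
    """Positive result: instances to start. Negative: instances to stop."""
    return desired_instances(hour) - running


# Example: walk through a few hours of the assumed daily cycle.
for hour, running in [(7, 0), (8, 0), (12, 2), (18, 2)]:
    print(hour, running, scaling_action(running, hour))
```

A load-aware controller would replace the fixed window with measured demand, but the decision it has to produce is the same.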
>> > > By the way, the public AWS AMI with the best resources that I have
>> > > found is Ubuntu 7.10 Gutsy. The Debian dependency tools are much easier
>> > > to use and the resources are plentiful.
>> > >
>> > > I've been toying with using an AWS stack adapted for serving some larger
>> > > PostGIS vector sets, such as fully connected census demographic data
>> > > and block polygons here in the US. The idea would be to populate the
>> > > data directly from the census SF* and TIGER with a background Java bot.
>> > > There are some potentially novel 3D viewing approaches possible with
>> > > XAML. Anyway, lots of fun to have access to virtual systems like this.
>> > >
>> > > As you can see I'm excited anyway.
>> > >
>> > > randy
>> > >
>> > > From: discuss-bounces at lists.osgeo.org
>> > > [mailto:discuss-bounces at lists.osgeo.org] On Behalf Of
>> > > Bruce.Bannerman at dpi.vic.gov.au
>> > > Sent: Monday, February 18, 2008 6:35 PM
>> > > To: OSGeo Discussions
>> > > Subject: [OSGeo-Discuss] OS Spatial environment 'sizing'
>> > >
>> > > IMO:
>> > >
>> > >
>> > > Hello everyone,
>> > >
>> > > I'm trying to get a feel for server 'sizing' for a **hypothetical**
>> > > Corporate environment to support OS Spatial apps.
>> > >
>> > >
>> > >
>> > > Assume that:
>> > >
>> > > - this is a dedicated environment to allow the use of OS Spatial
>> > > applications to serve Corporate OGC Services.
>> > >
>> > > - the applications of interest are GeoServer, Deegree, GeoNetwork,
>> > > MapServer, MapGuide and Postgres/PostGIS.
>> > >
>> > > - the environment may need to scale relatively quickly.
>> > >
>> > > - it will be required to serve in the vicinity of 5 to 10 TB of data
>> > > initially (WMS, WFS, WCS).
>> > >
>> > >
>> > >
>> > > Can anyone shed some light on the following questions please?
>> > >
>> > > - I'm assuming a Linux installation (SLES, Redhat or Debian) or
>> > > possibly Intel Solaris. Has anyone experienced any issues in these (or
>> > > other) environments that they'd like to share?
>> > >
>> > > - Are there any recommendations as to dedicated network bandwidth that
>> > > should be allocated?
>> > >
>> > > - Has anyone done any work with load balancing and would like to share
>> > > their experiences?
>> > >
>> > > - Of the above OS Spatial products, which ones could co-exist on the
>> > > same server (excluding Postgres/PostGIS)?
>> > >
>> > >
>> > > Any thoughts are appreciated.
>> > >
>> > >
>> > > Bruce Bannerman
>> > > Australia
>> > >
>> ------------------------------------------------------------------------
>> > >
>> > > _______________________________________________
>> > > Discuss mailing list
>> > > Discuss at lists.osgeo.org
>> > > http://lists.osgeo.org/mailman/listinfo/discuss