[postgis-users] VMWare and PostGIS
maplabs at light42.com
Sun Jan 13 19:38:33 PST 2013
> Date: Sun, 13 Jan 2013 00:36:31 +0000
> From: "Stephen V. Mather" <svm at clevelandmetroparks.com>
> ...
> I should have asked this a long time ago regarding performance...
> So the classic storage solution (AFAIK) for a spatial database is
> RAID 10 for maximum read and write speed. I have a RAID 10 running
> under a virtualization layer (VMWare in this case) and my sustained
> read speeds are in the 1Gbps range. The hardware is oldish, but they
> are 10k SAS drives, so I would expect something a bit faster.
> To the question-- I know virtualization makes a (not-so-good)
> difference in performance running spatial databases on e.g. Amazon
> EC2 instances. I assume this penalty is paid even for dedicated
> private clouds. What is the consensus/experience with
> virtualization? For my next machine, should I keep it to bare metal
> for the PostGIS portion?
>
A few questions come to mind..
* What kind of queries on what kind of data? If you are servicing
thousands of visitors and answering the same dozen questions again and
again, like at a kiosk (how far is this thing from me, and how far is
that?), what about pre-processing all the answers in a batch and then
not invoking geometry at GUI time at all? Non-spatial is generally a
lot faster than spatial!
* If you are answering thousands of users, and the questions and the
data do vary, is it a distance calculation? Is there some way to
break that into parts, some of which are done in advance? Say, the
PointOnSurface of a Poly beforehand, so it's not done again and again...
Pre-calculating intersections or areas of invariant elements.. Some
preparation can make a lot of difference.. that's how games do it...
* on another end of the spectrum, what about deep, detailed queries on
a lot of data.. for generally one or two users.. I set up hardware and
software to do that, I can tell you after months of running things,
that the disk IO was not the bottleneck by a long shot, even for pretty
huge data.. the CPU is.. PostGIS is *one core per query*.. if you have
something that is computationally intensive, the cost of IO wears off
very fast and then you are just watching a single core for a long, long
while.. We had jobs in the 30-200 minute range, often.. and into the
6-8 hour range occasionally.. on fast Xeon CPUs.. 100 hours or so was
the longest well-written query I had on that project... very fast
CPUs matter *a lot*, with *big caches*, though fast IO never
hurts.. Short point - PostGIS needs some kind of multi-core strategy,
that it does not have now..
* there is something in between, with more than a dozen users, but
doing real maps work.. I have never run that kind of setup.. I suspect
it could be disk IO, net IO or CPU bound, and at different times..
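
One common workaround for the one-core-per-query limit described above is to split a big job by key range and run each piece as its own query on its own connection, so each piece gets its own core. A minimal sketch of the splitting step only, not something from the original project: the table names `parcels` and `parcel_area` and the column names `gid` and `geom` are hypothetical, and actually running the statements (one connection per chunk) is left out.

```python
from typing import List, Tuple


def split_id_range(min_id: int, max_id: int, parts: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_id, max_id] into contiguous chunks."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    parts = min(parts, total)           # never create empty chunks
    base, extra = divmod(total, parts)  # the first `extra` chunks get one more id
    chunks = []
    start = min_id
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def chunk_queries(min_id: int, max_id: int, parts: int) -> List[str]:
    """Build one SQL statement per chunk; each is meant for a separate connection."""
    # Hypothetical schema: parcels(gid, geom) -> parcel_area(gid, area).
    template = (
        "INSERT INTO parcel_area (gid, area) "
        "SELECT gid, ST_Area(geom) FROM parcels "
        "WHERE gid BETWEEN {lo} AND {hi};"
    )
    return [
        template.format(lo=lo, hi=hi)
        for lo, hi in split_id_range(min_id, max_id, parts)
    ]


if __name__ == "__main__":
    for sql in chunk_queries(1, 1000, 4):
        print(sql)
```

This only helps when the work really is independent per row or per region; the chunk count would normally be matched to the number of cores available.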
in sum:
I think there is a chance to separate pre-processing, and deep queries,
from mass serving of results.. mass serving of results is a cloud
thing.. the 36GB distilled result that took us more than a calendar
year to generate, is ok to put in a cloud environment.. the 2000+GB of
source materials, and the queries that it takes to reduce that into the
distilled results, I would never put in a cloud .. it makes no sense to
me...
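
The split between batch pre-processing and mass serving can be sketched roughly as follows. This is an illustration under assumptions, not the setup used on the project: plain Python stands in for the database, the haversine formula stands in for a spatial distance function, and the kiosk and landmark names and coordinates are made up.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres between two lon/lat points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def precompute_distances(
    kiosks: Dict[str, Point], landmarks: Dict[str, Point]
) -> Dict[Tuple[str, str], float]:
    """Batch step: do all the geometry work once, ahead of time."""
    return {
        (k_name, l_name): haversine_m(k_pt, l_pt)
        for k_name, k_pt in kiosks.items()
        for l_name, l_pt in landmarks.items()
    }


def lookup(table: Dict[Tuple[str, str], float], kiosk: str, landmark: str) -> float:
    """Serving step: a plain key lookup, no geometry involved."""
    return table[(kiosk, landmark)]


if __name__ == "__main__":
    # Hypothetical sample data.
    kiosks = {"north_gate": (-81.70, 41.40)}
    landmarks = {"boat_house": (-81.72, 41.42), "nature_center": (-81.69, 41.38)}
    table = precompute_distances(kiosks, landmarks)
    print(round(lookup(table, "north_gate", "boat_house")), "m")
```

In a real deployment the precomputed table would live in an ordinary, non-spatial table or cache, which is the part that is comfortable to host in a cloud environment.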
oh, a note on disk IO.. I watched that carefully.. on our main RAID 5
with quality 7200 rpm SATA II disks, I could get >200M/sec read and
write, sustained; on a single Western Digital black label, 120-170
M/sec; what you describe on your 10k disks is lower than either of
those.. so yes I think your performance is low.. I have one machine
with SAS disks but we didn't use it for the long work mentioned above..
all of our disks were local to their hosts..
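
For anyone who wants to take a comparable reading themselves, here is a rough sketch of a sequential-read timer. It is an assumption-laden stand-in, not the tool used for the numbers above: it reads whatever file path it is given in 1 MB blocks and reports decimal megabytes per second, and the operating system's page cache will inflate the result unless the file is much larger than RAM or has not been read recently. The helper for converting to Gbit/s is there only so a result can be compared with a figure quoted in Gbps.

```python
import time


def sequential_read_mb_per_s(path: str, block_size: int = 1024 * 1024) -> float:
    """Read a file front to back and return throughput in MB/s (1 MB = 10**6 bytes)."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level only
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return (total_bytes / 1e6) / elapsed


def mb_per_s_to_gbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mb_per_s * 8 / 1000
```

Calling `sequential_read_mb_per_s` on a large existing file on the volume in question (a database dump, for example) gives one data point; repeating it inside and outside the VM is the interesting comparison.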
RAM: 8GB of RAM is the upper end of what PostgreSQL / PostGIS knows
what to do with... the difference between <1GB of RAM and, say, 3GB of
RAM was huge for us.. after 3GB of RAM, it didn't make that much
difference for our single, big queries.. less than 10% performance
difference, I'd say, from 3600MB to 8000MB.. no gain above 8GB.. as a
hindsight guesstimate..
personally, I am a fan of real hardware.. yet VMs are here to stay.. To
make *your* environment rip, there is no substitute for measurement,
since there are very different kinds of loads and very different kinds
of computational questions to ask.. hope that helps.. I am interested
to hear what the others say on this topic as things evolve
Brian M Hamlin
OSGeo California Chapter
415-717-4462 cell
> Thanks in advance,
> Best,
> Steve
>
> Stephen V. Mather
> GIS Manager
> (216) 635-3243 (Work)
> clevelandmetroparks.com<http://www.clemetparks.com>