[postgis-users] VMWare and PostGIS

maplabs at light42.com maplabs at light42.com
Sun Jan 13 19:38:33 PST 2013


> Date: Sun, 13 Jan 2013 00:36:31 +0000
> From: "Stephen V. Mather" <svm at clevelandmetroparks.com>

> ... 
> I should have asked this a long time ago regarding performance... .  
> So the classic storage solution (AFAIK) for a spatial database is 
> RAID 10 for maximum read and write speed.  I have a RAID 10 running 
> under a virtualization layer (VMWare in this case) and my sustained 
> read speeds are in the 1Gbps range.  The hardware is oldish, but they 
> are 10k SAS drives, so I would expect something a bit faster. 
>         To the question-- I know virtualization makes a (not-so-good) 
> difference in performance running spatial databases on e.g. Amazon 
> EC2 instances.  I assume this penalty is paid even for dedicated 
> private clouds.  What is the consensus/experience with 
> virtualization?  For my next machine, should I keep it to bare metal 
> for the PostGIS portion?
>

A few questions come to mind:

* What kind of queries on what kind of data?  If you are servicing 
thousands of visitors and answering the same dozen questions again and 
again, like at a kiosk ("how far is this thing from me, how far is 
that?"), what about pre-processing all the answers in a batch and then 
not invoking geometry at GUI time at all?  Non-spatial is generally a 
lot faster than spatial!
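For the kiosk case, the batch idea can be sketched in a few lines (plain Python; the kiosk location, the POI names, and the haversine distance are all made-up illustrative stand-ins for whatever the real system uses):

```python
import math

# Hypothetical kiosk location and points of interest -- illustrative
# data, not from the original post.
KIOSK = (41.4993, -81.6944)                  # (lat, lon) of the kiosk
POIS = {
    "trailhead": (41.5200, -81.7000),
    "nature_center": (41.4800, -81.6500),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Batch step: compute every answer once, up front.
ANSWERS = {name: haversine_km(KIOSK, pt) for name, pt in POIS.items()}

# GUI step: a plain dictionary lookup -- no spatial work at request time.
def distance_to(name):
    return ANSWERS[name]
```

The GUI never touches geometry; it only reads the pre-computed table.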

* If you are answering thousands of users, and the questions and the 
data do vary, is it a distance calculation?  Is there some way to 
break it into parts, some of which are done in advance?  Say, the 
PointOnSurface of a polygon beforehand, so it's not done again and 
again, or pre-calculating intersections or areas of invariant 
elements.  Some preparation can make a lot of difference.. that's how 
games do it... 
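A minimal sketch of doing the invariant part once, assuming the polygons don't change between queries (the plain area-weighted centroid stands in for PostGIS's ST_PointOnSurface here; a real point-on-surface needs more care for concave shapes):

```python
# Cache one representative point per polygon so repeated queries reuse
# it instead of recomputing geometry every time.

def centroid(ring):
    """Area-weighted centroid of a simple closed ring [(x, y), ...]."""
    a = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        cross = x1 * y2 - x2 * y1        # shoelace term for this edge
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a))

# Pre-computation pass: done once, ahead of query time.  The polygon
# here is illustrative.
polygons = {"park": [(0, 0), (4, 0), (4, 2), (0, 2)]}
cached_points = {name: centroid(ring) for name, ring in polygons.items()}
```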

* On another end of the spectrum, what about deep, detailed queries on 
a lot of data, for generally one or two users?  I set up hardware and 
software to do that, and I can tell you after months of running things 
that disk IO was not the bottleneck by a long shot, even for pretty 
huge data.. the CPU is.  PostGIS is *one core per query*.. if you have 
something that is computationally intensive, the cost of IO wears off 
very fast and then you are just watching a single core for a long, long 
while.  We had jobs in the 30-200 minute range, often, and into the 
6-8 hour range occasionally, on fast Xeon CPUs.. 100 hours or so was 
the longest well-written query I had on that project.  Very fast 
CPUs matter *a lot*, with *big caches*, though fast IO never 
hurts.. Short point: PostGIS needs some kind of multi-core strategy, 
which it does not have now.. 
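The usual workaround, sketched below, is to partition a big job into independent chunks and run each chunk on its own process and connection in parallel (my sketch, not a built-in PostGIS feature; `cpu_bound_chunk` is a placeholder for "run one chunk's query"):

```python
from multiprocessing import Pool

def cpu_bound_chunk(id_range):
    """Placeholder for one chunk's query, e.g. WHERE id BETWEEN lo AND hi."""
    lo, hi = id_range
    return sum(i * i for i in range(lo, hi))   # stand-in for real work

def run_parallel(n_ids, n_workers=4):
    """Split [0, n_ids) into n_workers ranges and process them in parallel."""
    step = n_ids // n_workers
    chunks = [(i * step, (i + 1) * step) for i in range(n_workers)]
    chunks[-1] = (chunks[-1][0], n_ids)        # last chunk takes the remainder
    with Pool(n_workers) as pool:
        return sum(pool.map(cpu_bound_chunk, chunks))

if __name__ == "__main__":
    print(run_parallel(100_000))
```

The split only helps when the chunks are genuinely independent (by ID range, by tile, etc.), so the per-chunk results can be combined afterwards.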

* There is something in between, with more than a dozen users, but 
doing real maps work.  I have never run that kind of setup.. I suspect 
it could be disk IO, net IO, or CPU bound, at different times.. 

In sum:
I think there is a chance to separate pre-processing and deep queries 
from mass serving of results.  Mass serving of results is a cloud 
thing.. the 36GB distilled result that took us more than a calendar 
year to generate is fine to put in a cloud environment.  The 2000+GB of 
source materials, and the queries it takes to reduce that into the 
distilled results, I would never put in a cloud.. it makes no sense to 
me... 

Oh, a note on disk IO.. I watched that carefully.  On our main RAID 5 
with quality 7200 rpm SATA II disks, I could get >200 MB/sec read and 
write, sustained; on a single Western Digital black label, 120-170 
MB/sec.  What you describe on your 10k disks is lower than either of 
those, so yes, I think your performance is low.  I have one machine 
with SAS disks, but we didn't use it for the long work mentioned above.. 
all of our disks were local to their hosts.. 
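If you want to sanity-check your own numbers, a rough sequential-write measurement is easy to script (a sketch; note that re-reading a just-written file mostly measures the OS page cache, not the disks, so a fair read test needs the caches dropped first):

```python
import os
import tempfile
import time

def write_throughput_mb_s(size_mb=64, block_mb=1):
    """Write size_mb of zeroes sequentially, fsync, and report MB/sec."""
    block = b"\0" * (block_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb // block_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())               # force data out to disk
        return size_mb / (time.perf_counter() - t0)
    finally:
        os.remove(path)
```

Run it with a file size well past your RAM to defeat caching on the write side too; the defaults here are just for a quick smoke test.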

RAM: 8GB of RAM is the upper end of what PostgreSQL / PostGIS knows 
what to do with.  The difference between <1GB of RAM and, say, 3GB 
was huge for us.. after 3GB it didn't make that much difference 
for our single, big queries.. less than a 10% performance penalty, I'd 
say, running at 3600M instead of 8000M, and no gain above 8GB.. as a 
hindsight guess'timate.. 
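For reference, the knobs I mean are the usual postgresql.conf memory settings; the values below are illustrative guess'timates for a machine in that RAM range, not measured recommendations:

```
# Illustrative postgresql.conf fragment -- tune by measurement:
shared_buffers = 2GB            # PostgreSQL's own buffer cache
work_mem = 256MB                # per sort/hash op; big spatial joins like this high
maintenance_work_mem = 512MB    # index builds, VACUUM
effective_cache_size = 6GB      # planner hint: shared_buffers + OS file cache
```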

Personally, I am a fan of real hardware, yet VMs are here to stay.  To 
make *your* environment rip, there is no substitute for measurement, 
since there are very different kinds of loads and very different kinds 
of computational questions to ask.  Hope that helps.. I am interested 
to hear what others say on this topic as things evolve.

Brian M Hamlin
OSGeo California Chapter
415-717-4462 cell

 
> Thanks in advance,
> Best,
> Steve
>
> [http://sig.cmparks.net/cmp-ms-90x122.png] Stephen V. Mather
> GIS Manager
> (216) 635-3243 (Work)
> clevelandmetroparks.com<http://www.clemetparks.com>
