[postgis-users] Hardware requirements for a server

Mathieu Basille basille.web at ase-research.org
Wed Feb 11 15:37:44 PST 2015


Thanks to everyone who contributed to this thread, either on the PostGIS 
[1] or the PostgreSQL [2] mailing lists. I will try to summarize everything 
in this message, which I will actually post on both lists to give an update 
to everyone. I hope it can be useful for other people interested. Please 
feel free to add more advice and other experiences, this is always useful!


Performance
===========

* CPU
Good CPU required for faster processing. Number of cores helps in parallel 
processing, but number of users != number of queries (Example: with no more 
than 4 concurrent users, it should be fine with a single quad-core CPU).

* Memory
Examples go from 8 to >32 GB RAM.

* Disks
Lots of I/0 with geoprocessing requires fast disks: best with SSD, 
otherwise 10k/15k RPM. An alternative would be to store indexes on faster 
disks and data on slower disks (need to tune PostgreSQL).
Better to have direct-attached storage (DAS), i.e. on the server directly 
(direct transfer between RAM and disks); external storage requires good 
network (additional RAM increases performance).

* Massive multi-user environment (lot of simultaneous connections): pgpool 
[3] (Linux/UNIX only). pgpool can be added later on, no need to worry about 
it as a start.


Platform
========

Linux is the platform of choice:
* Easier administration (install/configuration/upgrade), which is also true 
for addons/dependencies (starting with PostGIS, but also GEOS, GDAL, PL/R);
* Better performance [4];
* More tuning options (limited with MS systems);

There is still the possibility of a virtualbox on a MS server.


Other considerations
====================

* Backup: integrate a script that runs daily pg_dump daily to export and 
upload DB to storage box (which is part of the backup system)

* Integration with R: a dedicated R server brings more flexibility / 
extensions (e.g. Shiny) / performance (more cores and memory available for 
PostGIS) except if data transfer is the bottleneck. Use Pl/R for small 
functions (also if it fits naturally into PostgreSQL workflow) / otherwise 
in R with PostgreSQL connector.


Example setups
==============

* Dell Precision 2×6 cores, 20 GB RAM, SSD for indexes, 7200 HDD for big 
tables [Rémi Cura]:
Various usages, from visualization (few users) to complex queries with a 
lot of reading/writing (several users).

* Bare metal machine with ESXI; PostgreSQL machine with 8Gb RAM; 2 
quad-core processors; PostgreSQL tuned for fast reads, with large cache; 
pgPool; disks: 2 7.200RPM disks - with RAID 1 [George Silva]:
12 concurrent QGIS users, editing around 50.000 km² of land use coverage in 
1:5000 scale with lot of detail (in two separate DB). More editing than 
processing; some heavy queries (e.g. "complete feature" tool from QGIS) can 
take some time.

* Nominatim (OpenStreetMap data) [5]: > 1 GB RAM necessary, >32 GB 
recommended; 700 GB HDD; SSD recommended; example machine: 12-core with 
32GB RAM and standard SATA disks, I/O limiting factor.

Thanks again for the good feedback! This gives me very useful information 
to get started (I think this is still going to be a long process).

Mathieu Basille.


[1] http://lists.osgeo.org/pipermail/postgis-users/2015-February/040120.html

[2] http://www.postgresql.org/message-id/54DAA7D9.8020908@ase-research.org

[3] http://www.pgpool.net/

[4] 
https://stackoverflow.com/questions/8368924/postgresql-performance-on-windows-using-latest-versions

[5] http://wiki.openstreetmap.org/wiki/Nominatim/Installation


Le 09/02/2015 23:07, Mathieu Basille a écrit :
> Dear PostGIS users,
>
> I am currently planning to set up a PostGIS instance for my lab. Turns out
> I believe this would be useful for the whole center, so that I'm now
> considering setting up a PostGIS server for everyone—if interest is shared
> of course. At the moment, I am however struggling with what would be
> required in terms of hardware, and of course, the cost will depend on
> that—at the end of the day, it's really a matter of money well spent. I
> have then a series of questions/remarks, and I would welcome any feedback
> from people with existing experience on setting up a multi-user PostGIS
> server.
>
> * My own experience is rather limited: I used PostGIS quite a bit, but only
> on a desktop, with 2 users. The desktop was quite good (quad-core Xeon, 12
> Go RAM, 500 GB hd), running Debian, and we never had any performance issue
> (although some queries were rather long, but still acceptable).
>
> * The use case I'm envisioning would be (at least in the foreseeable future):
> - About 10 faculty users (which means potentially a little bit more
> students using it); I would have hard time considering more than 4
> concurrent users;
> - Data would primarily involve a lot (hundreds/thousands) of high
> resolution (spatial and temporal) raster and vector maps, possibly over
> large areas (Florida / USA / continental), as well as potentially millions
> of GPS records (animals individually monitored);
> - Queries will primarily involve retrieving points/maps over given
> areas/time, as well as intersecting points over environmental layers; other
> use cases will involve working with steps, i.e. the straight line segment
> connecting two successive locations, and intersecting them with
> environmental layers;
>
> * I couldn't find comprehensive or detailed guidelines on-line about
> hardware, but from what I could see, it seems that memory wouldn't be the
> main issue, but the number of cores would be (one core per database
> connection if I'm not mistaken). At the same time, we want to make sure
> that the experience is smooth for everyone...
>
> * Is there a difference in terms of performance and usability between a
> Linux-based and a MS-based server? My center is unfortunately MS-centered,
> and existing equipment runs with MS systems... It would thus be easier for
> them to set up a MS-based server.
>
> * Does anyone have worked with a server running the DB engine, while the DB
> itself was stored on another box/server? That would likely be the case here
> since we already have a dedicated box for file storage. Along these lines,
> does the system of the file storage box matter (Linux vs. MS)?
>
> * We may also use the server as a workstation to streamline PostGIS
> processing with further R analyses/modeling (or even use R from within the
> database using PL/R). Again, does anyone have experience doing it? Is a
> single workstation the recommended way to work with such workflow? Or would
> it be better (but more costly) to have one server dedicated to PostGIS and
> another one, with different specs, dedicated to analyses (R)?
>
> I realize my questions and comments may be a confusing, likely because of
> the lack of experience about these issues on my side. I really welcome any
> feedback of people working with PostGIS servers in a small unit, or any
> similar setting that could be informative!
>
> In advance, thank you very much!
>
> Sincerely,
> Mathieu Basille.
>
>

-- 

~$ whoami
Mathieu Basille
http://ase-research.org/basille

~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314

~$ fortune
« Le tout est de tout dire, et je manque de mots
Et je manque de temps, et je manque d'audace. »
  -- Paul Éluard



More information about the postgis-users mailing list