[SAC] OSGeo7 Server Config Quote

Alex M tech_dev at wildintellect.com
Wed Feb 14 09:29:45 PST 2018


Chris,

Want to take a stab at configuring some options? Just make some quotes on
the Silicon Mechanics website. I can get us the non-profit discount
(3-5% typically) on any quote through my rep. See what you can do for
under $6000.

Some bigger questions to discuss:
What services should be on what type of disk?
Which services should be easy to containerize?

Major services:
Downloads - no database, frequent writes only to the Maven stuff?
Container: easy.

Trac/SVN - Postgres DB, not sure how heavy on writes. Container: tricky.

Webextra - mostly static archives. Container: easy.

Wiki - MediaWiki (PHP/MySQL), lots of writes. Container: moderate.

Mailman - frequent writes. Container: ?

Projects & Adhoc - a variety of PHP and other types of sites, mostly
reads, not a lot of writes. Container: it would make life easier if
every project were its own container, but we don't want that many DB
instances.


Why VMs:
Mostly because one bad VM can easily be restarted, and CPU/RAM
allocation can be somewhat isolated. Yes, I know there is a way to
quota RAM on containers. Access control is relatively easy; not
everyone needs access to the host. Yes, I know SSH can be put inside
containers, but that means network routing is more complicated. That
said, I agree we should move some things to containers when it's easy.
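
For reference, if we do move some services to LXC/LXD later,
per-container limits would look something like this (a rough sketch;
the container name is just a placeholder):

  # cap a hypothetical "wiki" container at 16 GB RAM and 4 CPUs
  lxc config set wiki limits.memory 16GB
  lxc config set wiki limits.cpu 4
  # restart just the one misbehaving container, not the host
  lxc restart wiki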

Minor notes:
Our experience with RAID 6 was terrible, but that was a hardware RAID
card; rebuilds even on small SAS drives took days.
ZFS - doesn't ZFS require a lot of RAM?
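
(From what I've read, ZFS mostly uses otherwise-idle RAM for its ARC
read cache and gives it back under memory pressure. If we're worried,
the ARC can also be capped -- an untested sketch for ZFS on Linux,
assuming we wanted an 8 GiB ceiling:

  # /etc/modprobe.d/zfs.conf -- limit the ARC to 8 GiB
  options zfs zfs_arc_max=8589934592

followed by a reboot or reloading the zfs module.)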


Thanks,
Alex

On 02/13/2018 02:09 PM, Chris Giorgi wrote:
> On Tue, Feb 13, 2018 at 8:57 AM, Alex M <tech_dev at wildintellect.com> wrote:
> 
>> Thanks for the feedback, some comments inline. - Alex
>>
>> Quick note: all mention of RAID is software, not hardware.
>>
>> On 02/13/2018 12:13 AM, Chris Giorgi wrote:
>>> Hi Alex,
>>>
>>> Overall, this looks like a solid machine, but I do have a few suggestions
>>> considering the details of the hardware configuration.
>>>
>>> -First, a RAID5 array for the spinning rust pool may leave the pool
>>> unduly susceptible to complete failure during recovery from a single
>>> drive failure and replacement, due to the extreme load on all discs
>>> while recreating the data on the replaced disk tending to trigger a
>>> subsequent failure. Also, no hot spare is available, leaving the pool
>>> running in degraded mode until someone can physically swap the drive.
>>> A RAID6 (or ZFS RAIDZ2) configuration having two drives' worth of
>>> recovery data greatly minimizes such risk.
>>> --Suggest that all 4 hot-swap bays be provisioned with the HGST models
>>> listed (512b emulated sector size) in the quote if the 4k native
>>> sector size drives are not available -- this can be worked around at
>>> the FS level (ashift=12) with minimal performance impact.
>>>
>> We actually don't need that much space; 2 TB drives would have
>> sufficed, I picked the smallest size they offered, which was 8 TB.
>> What about just going with a mirror on 2x HGST drives? Note we do
>> have a backup server. Normally I would also use 4 drives, but the
>> machines just don't have that many bays.
>>
> 
> If bulk storage isn't at a premium, a simple mirror would work fine
> using either 2 or 3 drives (which would allow having the hot spare
> online in the mirror, while improving read performance for free).
> 
> Having 4 drives does double the capacity while tolerating two drives
> failing before losing any data, and the extra space can be used for
> storing snapshots locally, allowing a particular dataset to be rolled
> back to a previous working state without having to recover from
> backups -- ZFS allows access to files in previous snapshots simply by
> reading them from the hidden .zfs directories, which is a life-saver
> for accidental file overwrites or deletions.
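> 
> For instance (a rough sketch -- the pool/dataset names and file are
> just placeholders):
> 
>   # take a snapshot of a hypothetical "tank/www" dataset
>   zfs snapshot tank/www@2018-02-14
>   # later, pull a clobbered file straight out of the hidden directory
>   cp /tank/www/.zfs/snapshot/2018-02-14/index.html /tank/www/index.html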
> 
> 
>>
>>> -Second, RAID5 will seriously reduce the performance of the SSDs
>>> and, especially on writes, increases latency, which somewhat defeats
>>> the purpose of utilizing SSDs. A simple mirror array would perform
>>> much better and have the same level of redundancy, while a stripe
>>> could be much faster when used as a cache for hot data from the HDDs
>>> rather than as the primary storage. For heavy write loads, such as
>>> databases, MLC SSDs really aren't suitable because of the wear rate
>>> and their usual lack of power-loss protection. A smaller-capacity
>>> but higher-IOPS NVMe-type SSD on the PCI bus would be much more
>>> effective for those workloads.
>>> --Suggest identifying workloads needing high-speed storage and
>>> determining read vs. write requirements before final selection of
>>> SSDs. Use two SATA SSDs in mirror or stripe configuration for bulk
>>> storage or cache. Consider PCIe-connected NVMe for large numbers of
>>> writes and transactions.
>>>
>>
>> Speed isn't a huge issue, SSDs in any form seem to perform fast enough.
>> We tend to use SSD storage for everything. Having slow spinning disks at
>> all is new, and only suggested to start holding larger archives in a
>> publicly accessible way.
>>
> 
> Looking at
> http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/diskstats_latency/index.html
> shows that write latency is the largest bottleneck, while
> http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/diskstats_iops/index.html
> indicates that home, www, and mailman have the highest IOPS needs,
> with most IO being small writes, which are poorly suited to the SSDs
> indicated in the quote (DC S4500); those are intended for low-write,
> high-read use and will quickly wear out under high write loads.
> 
> DC 3600 or 3700 series drives are designed for the higher-write
> environment and would be a better choice.
> 
> Another option that may be better yet would be to purchase one of the
> lower-cost PCIe Optane SSDs (900P) to provide a very high-performance
> cache for hot data, and a pair of small, high-write-volume SSDs to use
> for mirrored write caching -- in ZFS terms, this would allow a portion
> of the PCIe SSD to be used for the L2ARC, and the mirrored SATA SSDs
> for the SLOG.
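> 
> Roughly, in ZFS terms (an untested sketch; the pool and device names
> are only placeholders):
> 
>   # add a partition of the Optane card as L2ARC (read cache)
>   zpool add tank cache nvme0n1p1
>   # add the mirrored SATA SSDs as the SLOG (sync write log)
>   zpool add tank log mirror sdb1 sdc1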
> 
> 
>>
>> We could go to 4 drives and do a 2x2 mirror. We want the ability to
>> keep going when a drive drops and to get a new drive in within a
>> couple of days.
>>
>> Note OSGeo6 is 6x SSD, with, I believe, 2 RAID 5 arrays.
>>
>> Can you verify the type of SSDs? There are other options - also make
>> sure to note these are not the consumer models.
>>
>>> -Third, the memory really is the biggest bottleneck and resource
>>> limit, so I would favor increasing that as much as possible over the
>>> size of the SSD pool. Unused memory is used to cache filesystem
>>> contents in RAM, which is orders of magnitude faster than an SSD,
>>> but is still there for your workloads when needed.
>>> --Suggest 128 GB RAM, making trade-offs against SSD capacity if
>>> budget requires.
>>>
>>
>> If you look at OSGeo6, I'm not sure we're really utilizing all the
>> RAM we bought.
>> http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/index.html
>>
>> Though really in this case it's about $900 to add the additional RAM
>> up to 128 GB. I'm on the fence about this, since I'd prefer to buy
>> cheaper machines more often than to load up expensive ones.
>>
> 
> Looking at the stats, it appears that it's fully utilizing the ~128G
> allocated, with most of it acting as a page cache for the filesystems.
> 
> There do appear to be instances when swap was getting hit, but I can't
> tell why from just the graphs.
> 
> Increasing the memory will almost always be worth it compared to
> processor or disk speed upgrades, especially in a multi-tenant
> environment -- in Unix, RAM is never wasted, it just gets allocated to
> various subsystems as needed.
> 
> 
>>> Some general comments on filesystems, software stack, and virtualization,
>>> in reverse order.
>>>
>>> For most of the needs I have seen discussed, full virtualization is
>>> far more heavy-handed than necessary -- a container solution such as
>>> LXC/LXD would be much more appropriate and would allow much better
>>> granularity with lower overhead. A few VMs may be useful for
>>> particular projects that need to run their own kernel, run low-level
>>> services, or be suspended and moved to another host for some reason,
>>> but those are the exception, not the rule. Many tools for managing
>>> VMs can also manage containers, and provisioning many containers off
>>> the same base template is both very easy and consumes very little
>>> additional disk space when used on a copy-on-write (CoW) filesystem
>>> that supports cloning; additionally, backups are both instantaneous
>>> and only take up as much space as the changed files. My personal
>>> preference is to use ZFS for a filesystem because it supports all
>>> levels of the storage stack, from disk to filesystem to snapshots
>>> and remote backup, in a single tool and thus can detect and correct
>>> data corruption anywhere in the stack before it can be persisted.
>>> LVM2 and associated tools provide mostly similar functionality, but
>>> I find them much less intuitive and more difficult to administer -
>>> that may certainly be just a matter of personal taste and experience.
>>>
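>>> As a rough illustration (untested; the container names are only
>>> placeholders), with LXD on a ZFS storage pool, stamping out and
>>> snapshotting containers is basically:
>>>
>>>   # clone a new container from a prepared base template
>>>   lxc copy base-template trac-svn
>>>   lxc start trac-svn
>>>   # instant, space-efficient restore point before an upgrade
>>>   lxc snapshot trac-svn pre-upgrade
>>>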
>>
>> We've already decided to go VM, so we can migrate existing services.
>> In our case administering a VM can be delegated easily. We do plan to
>> try out containers on OSGeo6 (existing). But for now we really just
>> need to move existing VMs from OSGeo3 so we can retire the hardware.
>> These include Downloads, Wiki, Trac/SVN, and Webextra (FOSS4G).
>>
> 
> As a migration path, it's certainly easy enough to spin up a couple of VMs,
> but in the long run, those services really should be split up in a
> more fine-grained manner to make them both easier to admin and to
> reduce resource usage for services when they aren't in active use.
> usage for services when they aren't in active use.
> 
> The current setup with many services running in each VM both increases
> resource contention and decreases performance, while also requiring
> more administrative overhead and making upgrading of a portion of the
> software stack very difficult without upgrading the entire system.
> 
> I would be happy to go over which services are needed by which users
> to help plan a proper migration away from the all-in-one VMs, and
> there is no problem with having both VMs and containers in various
> combinations, including containers within VMs where that makes sense.
> 
> 
>>
>>
>>> I hope this helps with the purchasing and provisioning decisions.
>>>
>>> Take care,
>>>    ~~~Chris Giorgi~~~
>>>
>>>
>>>
>>>
>>> On Mon, Feb 12, 2018 at 1:27 PM, Regina Obe <lr at pcorp.us> wrote:
>>>
>>>> Alex,
>>>>
>>>> This looks good to me +1.  Really excited to have a new Box in place.
>>>>
>>>> I'm also thinking that with the new box, we could start off-loading
>>>> osgeo3 and osgeo4 and allow Lance to upgrade Ganeti on them.
>>>> Since we won't have anything mission critical -- after we migrate
>>>> mission-critical stuff to osgeo7 -- if hardware on osgeo4 fails
>>>> during the upgrade, I assume it wouldn't be a big deal.
>>>> As I recall, was it only osgeo4 that had a hardware issue?
>>>>
>>>> Thanks,
>>>> Regina
>>>>
>>>> -----Original Message-----
>>>> From: Sac [mailto:sac-bounces at lists.osgeo.org] On Behalf Of Alex M
>>>> Sent: Monday, February 12, 2018 3:54 PM
>>>> To: sac >> System Administration Committee Discussion/OSGeo
>>>> <sac at lists.osgeo.org>
>>>> Subject: [SAC] OSGeo7 Server Config Quote
>>>>
>>>> Here's the latest quote for us to discuss server configuration for
>>>> OSGeo7.
>>>>
>>>> https://drive.google.com/open?id=1X-z66jXXBUZuPqh6EP0d43g2NUCL7xcL
>>>>
>>>> The plan based on discussions is to manage KVM virtual machines on
>>>> LVM volumes with libvirt. If at some point we feel we need to go to
>>>> something more advanced because we are managing multiple physical
>>>> machines, we could convert to Ganeti or OpenStack (I'm less sure
>>>> about how to convert to OpenStack).
>>>>
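>>>> As a rough sketch of that (untested; the volume group, sizes, and
>>>> names below are just placeholders), creating one guest would look
>>>> something like:
>>>>
>>>>   # carve a logical volume for the guest out of a "vg0" volume group
>>>>   lvcreate -L 200G -n downloads vg0
>>>>   # define and install the KVM guest through libvirt (16 GB RAM)
>>>>   virt-install --name downloads --memory 16384 --vcpus 4 \
>>>>     --disk path=/dev/vg0/downloads --os-variant debian9 \
>>>>     --location http://deb.debian.org/debian/dists/stretch/main/installer-amd64/ \
>>>>     --graphics none --extra-args 'console=ttyS0'
>>>>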
>>>> The idea was up to 4 virtual machines, each with someone designated
>>>> to make sure it is kept updated, along with use of unattended
>>>> upgrades for security patches.
>>>>
>>>> As quoted I've done RAID 5 SSD and RAID 5 traditional, 3 drives
>>>> each. That will give us fast storage and large storage (think
>>>> downloads and FOSS4G archives).
>>>>
>>>> I did redundant power to maximize uptime.
>>>>
>>>> RAM is only 64 GB, which is up to 16 GB for each of the virtual
>>>> machines.
>>>>
>>>> Please discuss and ask questions so we can possibly vote this week
>>>> at the meeting.
>>>>
>>>> Thanks,
>>>> Alex


