[SAC] OSGeo7 Server Config Quote - New

Alex M tech_dev at wildintellect.com
Thu Feb 15 08:54:47 PST 2018


All,

With Chris' help, here are updated quotes: 4 variants that differ only in
the size of the large spinning disks. If I understand correctly, the
suggestion is to run them in a mirror, giving n-2 storage.

Chris, is that mirror RAID-based, and if so, which mode? Or is it done some
other way?

https://drive.google.com/file/d/1X-z66jXXBUZuPqh6EP0d43g2NUCL7xcL/view?usp=sharing

My opinion: I'm leaning towards the 4 TB drives instead of the 8 TB. That
still gives over 7 TB of usable space with dual redundancy and saves us
$500, which goes towards either the +$800 of RAM to get to 128GB or the
Optane card, depending on how you look at it.
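
For reference, here's the back-of-the-envelope math behind that 7 TB figure
(a quick Python sketch; my own arithmetic, not from the quote):

# Usable space for 4x 4 TB drives with dual redundancy (RAIDZ2-style:
# capacity of n-2 drives). Rough numbers; real usable space also depends on
# filesystem overhead.
drives = 4
size_tb = 4                          # marketing TB (10**12 bytes)
usable_tb = (drives - 2) * size_tb
usable_tib = usable_tb * 10**12 / 2**40
print(f"~{usable_tb} TB raw usable, ~{usable_tib:.1f} TiB as the OS reports it")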

Thanks,
Alex

On 02/14/2018 01:03 PM, Chris Giorgi wrote:
> On Wed, Feb 14, 2018 at 9:29 AM, Alex M <tech_dev at wildintellect.com> wrote:
> 
>> Chris,
>>
>> Want to take a try at configuring some options? Just make some quotes on
>> the Silicon Mechanics website. I can get us the non-profit discount
>> (3-5% typically) on any quote through my rep. See what you can do for
>> under $6000.
>>
> 
> Targeting < $6000 using that base configuration is a bit tricky, but the
> changes I would make are the following:
> +$848   64G -> 128G RAM
> -$30    Hot-swap 4: S4500 -> HGST 8TB He SATA
>         (+$212 for 4x 10TB He, or +$540 for 4x 12TB He)
> -~$150  for 2x S3700 200G - 10 DWPD, preferred for frequent writes
>         (not listed by Silicon Mechanics - open market price)
>         (or -$210: 2x S4500 480G - 1 DWPD -> 2x S4600 240G - 3 DWPD,
>          moderate writes okay)
>         (or +$212 for 2x S4600 480G - 3 DWPD)
>         (or +~$350 for 2x S3700 400G - 10 DWPD, preferred for frequent
>          writes; not listed by Silicon Mechanics - open market price)
>         (or +$586 for 2x Micron 5100 MAX 960G - 5 DWPD)
> +~$650  1x Optane SSD 900P 480GB PCIe - 10 DWPD (not listed by Silicon
>         Mechanics - open market price)
> 
> 
> - Additional RAM will make the largest improvement in overall performance
> and responsiveness under load.
> 
> - Configured with dual redundancy for the HDD array, this gives a usable
> capacity of 16TB (20TB for +$212, 24TB for +$540).
> 
> - SATA SSDs would be mirrored and used primarily for write caching.
> 2x 200G S3700 is more than enough and can sustain 2TB of writes daily
> without premature failure (vs. 480GB/day with the S4500s) -- see the quick
> arithmetic sketch below. Extra space may be used for DB tables with high
> transaction rates.
> 
> - A PCIe-connected Optane SSD would provide blisteringly fast caching for
> hot data, with orders of magnitude lower latency and higher IOPS than SATA
> SSDs can provide.
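> 
> For a sanity check on those endurance numbers, a quick Python sketch using
> the vendor-rated DWPD figures (my arithmetic, not from the quote):
> 
> # Daily write headroom for a mirrored pair of write-cache SSDs. In a
> # mirror every write lands on both drives, so the pair's daily endurance
> # equals that of a single drive: capacity_gb * rated DWPD.
> def mirror_daily_writes_gb(capacity_gb, dwpd):
>     return capacity_gb * dwpd
> 
> print(mirror_daily_writes_gb(200, 10))  # 2x S3700 200G -> 2000 GB/day (~2TB)
> print(mirror_daily_writes_gb(480, 1))   # 2x S4500 480G ->  480 GB/day
> print(mirror_daily_writes_gb(240, 3))   # 2x S4600 240G ->  720 GB/day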
> 
> 
>> Some bigger questions to discuss:
>> What services should be on what type of disk?
>> Which services should be easy to containerize?
>>
>> Major services:
>> Downloads - no database, frequent writes only to the maven stuff?
>> Container Easy
>>
> 
> Downloads can reside entirely on spinning platters with no problems.
> Frequently accessed files will be cached in memory by the filesystem, and
> the initial latency of accessing cold data is still reasonable.
> 
> 
>>
>> Trac/SVN - Postgres DB, not sure how heavy on writes. Container, tricky
>>
> 
> - Trac/SVN will have no qualms about sitting in containers, preferably one
> per Trac instance.
> - A proxying web server would allow a single front-end configuration to
> handle connections for all instances.
> - Trac itself should have low write requirements, with moderate reads.
> - SVN/git I/O load scales with commit volume; mostly high-frequency small
> random writes.
> - A single PostgreSQL installation should handle all of them with ease.
> - PostgreSQL will carry the highest transaction load, but it is
> log-structured, so block rewrites do not get out of hand.
> - A VM may be appropriate for PostgreSQL, but it also works fine in a
> container, possibly better due to lower overhead.
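> 
> As a rough illustration of the container layout described above (a sketch
> only -- the image alias and container names are made up, and it assumes an
> LXD host is already set up):
> 
> # Sketch: one LXD container per Trac instance, all backed by a single
> # PostgreSQL container. Names ("pg", "trac-gdal", "trac-grass") are
> # hypothetical.
> import subprocess
> 
> def lxc(*args):
>     subprocess.run(["lxc", *args], check=True)
> 
> lxc("launch", "ubuntu:16.04", "pg")           # shared PostgreSQL instance
> for project in ("gdal", "grass"):             # example project names
>     lxc("launch", "ubuntu:16.04", "trac-" + project)
> # A proxying web server (on the host or in its own container) would then
> # route each project's /trac path to the matching container.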
> 
> 
> 
> 
>>
>> Webextra - mostly static archives. Container easy.
>>
> Same story as Downloads.
> 
> 
>> Wiki - MediaWiki php/mysql, lots of writes. Container moderate.
>>
> Write caching + plenty of RAM should make this load perform well.
> No difficulty with containerization, but possibly use a VM to limit
> resources if needed.
> 
>>
>> Mailman - frequent writes. Container ?
>>
> 
> Mail services may use either containers or VMs comfortably. Write-caching
> will help immensely here. Long term storage on spinning rust is fine.
> 
>>
>> Projects & Adhoc - a variety of PHP and other types of sites, mostly read,
>> not a lot of writes. Container - it would make life easier if every project
>> were its own container, but we don't want that many db instances.
>>
> 
> A container for every project is easily doable with a single PostgreSQL
> instance serving all of them -- if a few projects need their own dedicated
> instances, that's easy to set up using another container that they control.
> 
>>
>>
>> Why VMs:
>> Mostly because one bad VM can easily be restarted and CPU/RAM allocation
>> can be somewhat isolated. Yes, I know there is a way to set RAM quotas on
>> containers. ACL is relatively easy: not everyone needs access to the
>> host. Yes, I know ssh can be put inside containers, but that makes
>> network routing more complicated. That said, I agree we should move
>> some things to containers where it's easy.
>>
> 
> VMs are best used where unbounded resource consumption is likely or
> where services need to be suspended and resumed elsewhere. Most
> container hypervisor technologies (such as LXC/LXD) can fully
> compartmentalize resource usage, and containers can be stopped and
> restarted just like VMs. Networking with ssh connections directly into
> the containers is SOP these days and is the primary means used by
> provisioning tools like Vagrant.
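> 
> For example, the per-container resource caps Alex mentions are one-liners
> in LXD (a sketch; the container name and limit values are placeholders):
> 
> # Sketch: cap a container's RAM and CPU count, then restart it much like a
> # VM. "wiki" is a placeholder container name.
> import subprocess
> 
> for key, value in (("limits.memory", "16GB"), ("limits.cpu", "4")):
>     subprocess.run(["lxc", "config", "set", "wiki", key, value], check=True)
> subprocess.run(["lxc", "restart", "wiki"], check=True)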
> 
> 
>> Minor notes:
>> Our experience with RAID 6 was terrible, but that was a hardware RAID
>> card. Rebuilds on even small SAS drives took days.
>> ZFS - doesn't ZFS require a lot of RAM?
>>
> 
> Hardware RAID 6 suffers badly when you have to resilver because the
> computational load must be handled by the card while remaining reasonably
> transparent to the operational state of the machine. A good software
> RAID is generally much faster, given modern CPU power. ZFS RAIDZ2
> doesn't suffer from the same issues because the filesystem understands
> the data duplication strategy directly, rather than having a translation
> layer between the physical disks and the filesystem.
> 
> ZFS does like a lot of RAM to increase the amount of cached data, but
> unless you're trying to use the deduplication feature (DON'T!), it will
> happily get by on a modest amount. In general, all filesystems perform
> better when given lots of RAM for caching. ZFS can also very effectively
> use both read (L2ARC) and write (SLOG) caches on fast drives to improve
> performance of spinning rust arrays to the point that only cold data has
> any noticeable latency. When using ZFS, the entire pool can be compressed
> with lz4, which increases both useful storage and effective bandwidth --
> files which are already compressed are recognized, so there is no drawback
> to using it on mixed data.
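> 
> Concretely, the layout described here would look roughly like the following
> (a sketch only -- device names are placeholders and will differ on the real
> box, and it assumes the ZFS tools are installed):
> 
> # Sketch of the pool layout: RAIDZ2 spinning disks, mirrored SSD SLOG,
> # Optane L2ARC, lz4 everywhere. /dev/sd* and nvme0n1 are placeholders.
> import subprocess
> 
> def run(cmd):
>     subprocess.run(cmd.split(), check=True)
> 
> run("zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd")
> run("zpool add tank log mirror /dev/sde /dev/sdf")   # write cache (SLOG)
> run("zpool add tank cache /dev/nvme0n1")             # read cache (L2ARC)
> run("zfs set compression=lz4 tank")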
> 
> Take a look at the drives listed above and perhaps inquire
> with Silicon Mechanics about the availability of the
> DC S3700 200GB SATA SSDs and the Optane 900p 480G
> PCIe SSD (workstation class, not the insanely expensive one!)
> and let me know how far off the price target we end up.
> 
> Thanks,
>     ~~~Chris~~~
> 
> 
>>
>> Thanks,
>> Alex
>>
>> On 02/13/2018 02:09 PM, Chris Giorgi wrote:
>>> On Tue, Feb 13, 2018 at 8:57 AM, Alex M <tech_dev at wildintellect.com>
>>> wrote:
>>>
>>>> Thanks for the feedback, some comments inline. - Alex
>>>>
>>>> Quick note: all mention of RAID is software, not hardware.
>>>>
>>>> On 02/13/2018 12:13 AM, Chris Giorgi wrote:
>>>>> Hi Alex,
>>>>>
>>>>> Overall, this looks like a solid machine, but I do have a few
>>>>> suggestions considering the details of the hardware configuration.
>>>>>
>>>>> -First, a RAID5 array for the spinning rust pool may leave the pool
>>>>> unduly susceptible to complete failure during recovery from a single
>>>>> drive failure and replacement, because the extreme load on all disks
>>>>> while recreating the data on the replaced disk tends to trigger a
>>>>> subsequent failure. Also, no hot spare is available, leaving the pool
>>>>> running in degraded mode until someone can physically swap the drive.
>>>>> A RAID6 (or ZFS RAIDZ2) configuration, having two drives' worth of
>>>>> recovery data, greatly reduces that risk.
>>>>> --Suggest that all 4 hot-swap bays be provisioned with the HGST models
>>>>> listed in the quote (512b emulated sector size) if the 4k native sector
>>>>> size drives are not available -- this can be worked around at the FS
>>>>> level (ashift=12) with minimal performance impact.
>>>>>
>>>> We actually don't need that much space; 2 TB drives would have sufficed.
>>>> I picked the smallest size they offered, which was 8 TB. What about just
>>>> going with a mirror of 2x HGST drives? Note we do have a backup server.
>>>> Normally I would also use 4 drives, but the machines just don't have
>>>> that many bays.
>>>>
>>>
>>> If bulk storage isn't at a premium, a simple mirror would work fine using
>>> either 2 or 3 drives (which would allow having the hot-spare online in
>>> the mirror, while improving read performance for free).
>>>
>>> Having 4 drives doubles the capacity while tolerating two drive failures
>>> before losing any data, and the extra space can be used for storing
>>> snapshots locally, allowing a particular dataset to be rolled back to a
>>> previous working state without having to recover from backups -- ZFS
>>> exposes the files in previous snapshots through the hidden .zfs
>>> directories, which is a life-saver for accidental file overwrites or
>>> deletions.
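>>>
>>> For example, recovering a clobbered file is just a copy out of the
>>> snapshot directory -- a sketch, with made-up paths and snapshot names:
>>>
>>> # Sketch: restore one file from a ZFS snapshot via the hidden .zfs
>>> # directory. Pool, dataset, snapshot, and file names are hypothetical.
>>> import shutil
>>>
>>> snap = "/tank/wiki/.zfs/snapshot/nightly-2018-02-13"
>>> shutil.copy2(snap + "/LocalSettings.php", "/tank/wiki/LocalSettings.php")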
>>>
>>>
>>>>
>>>>> -Second, RAID5 will seriously reduce the performance of the SSDs and,
>>>>> especially on writes, increase latency, which somewhat defeats the
>>>>> purpose of using SSDs. A simple mirror array would perform much better
>>>>> with the same level of redundancy, while a stripe could be much faster
>>>>> when used as a cache for hot data from the HDDs rather than as primary
>>>>> storage. For heavy write loads, such as databases, MLC SSDs really
>>>>> aren't suitable because of the wear rate, and they usually lack
>>>>> power-loss protection. A smaller-capacity but higher-IOPS NVMe SSD on
>>>>> the PCIe bus would be much more effective for those workloads.
>>>>> --Suggest identifying workloads needing high-speed storage and
>>>>> determining read vs. write requirements before final selection of SSDs.
>>>>> Use two SATA SSDs in a mirror or stripe configuration for bulk storage
>>>>> or cache. Consider PCIe-connected NVMe if there are a large number of
>>>>> writes and transactions.
>>>>>
>>>>
>>>> Speed isn't a huge issue; SSDs in any form seem to perform fast enough.
>>>> We tend to use SSD storage for everything. Having slow spinning disks at
>>>> all is new, and only suggested so we can start holding larger archives
>>>> in a publicly accessible way.
>>>>
>>>
>>> Looking at
>>> http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/diskstats_latency/index.html
>>> shows that write latency is the largest bottleneck, while
>>> http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/diskstats_iops/index.html
>>> indicates that home, www, and mailman have the highest IOPS needs, with
>>> most I/O being small writes. Those are poorly suited to the SSDs in the
>>> quote (DC S4500), which are intended for low-write, high-read use and
>>> will quickly wear out under heavy write loads.
>>>
>>> DC S3600 or S3700 series drives are designed for the higher-write
>>> environment and would be a better choice.
>>>
>>> Another option that may be better yet would be to purchase one of the
>>> lower-cost PCIe Optane SSDs (900P) to provide a very high-performance
>>> cache for hot data, plus a pair of small high-write-volume SSDs for
>>> mirrored write caching -- in ZFS terms, a portion of the PCIe SSD would
>>> be used for the L2ARC and the mirrored SATA SSDs for the SLOG.
>>>
>>>
>>>>
>>>> We could go to 4 drives and do a 2x2 mirror. We want the ability to
>>>> keep going when a drive drops and to get a new drive in within a couple
>>>> of days.
>>>>
>>>> Note OSGeo6 is 6x SSD, with, I believe, 2x RAID5.
>>>>
>>>> Can you verify the type of SSDs? There are other options - also make
>>>> sure to note these are not the consumer models.
>>>>
>>>>> -Third, memory really is the biggest bottleneck and resource limit, so
>>>>> I would favor increasing that as much as possible over the size of the
>>>>> SSD pool. Unused memory is used to cache filesystem contents in RAM,
>>>>> which is orders of magnitude faster than an SSD, yet is still there for
>>>>> your workloads when needed.
>>>>> --Suggest 128GB RAM, making trade-offs against SSD capacity if the
>>>>> budget requires.
>>>>>
>>>>
>>>> If you look at OSGeo6, I'm not sure we're really utilizing all the RAM
>>>> we bought:
>>>> http://webextra.osgeo.osuosl.org/munin/osgeo.org/osgeo6.osgeo.org/index.html
>>>>
>>>> Though really, in this case it's about $900 to add the additional RAM to
>>>> get to 128GB. I'm on the fence about this, since I'd prefer to buy
>>>> cheaper machines more often than to load up expensive ones.
>>>>
>>>
>>> Looking at the stats, it appears that it's fully utilizing the ~128G
>>> allocated, with most of it acting as a page cache for the filesystems.
>>>
>>> There do appear to be instances when swap was getting hit, but I can't
>>> tell why from just the graphs.
>>>
>>> Increasing the memory will almost always be worth it compared to
>>> processor or disk speed upgrades, especially in a multi-tenant
>>> environment -- in Unix, RAM is never wasted; it just gets allocated to
>>> various subsystems as needed.
>>>
>>>
>>>>> Some general comments on filesystems, software stack, and
>>>>> virtualization, in reverse order.
>>>>>
>>>>> For most of the needs I have seen discussed, full virtualization is far
>>>>> more heavy-handed than necessary -- a container solution such as
>>>>> LXC/LXD would be much more appropriate and allow for much better
>>>>> granularity with lower overhead. A few VMs may be useful for particular
>>>>> projects that need to run their own kernel, run low-level services, or
>>>>> be suspended and moved to another host for some reason, but those are
>>>>> the exception, not the rule. Many tools for managing VMs can also
>>>>> manage containers, and provisioning many containers off the same base
>>>>> template is both very easy and consumes very little additional disk
>>>>> space when done on a CoW (copy-on-write) filesystem that supports
>>>>> cloning; additionally, backups are both instantaneous and only take up
>>>>> as much space as the changed files. My personal preference is to use
>>>>> ZFS for the filesystem because it covers all levels of the storage
>>>>> stack -- from disk to filesystem to snapshots and remote backup -- in a
>>>>> single tool, and thus can detect and correct data corruption anywhere
>>>>> in the stack before it is persisted. LVM2 and associated tools provide
>>>>> mostly similar functionality, but I find them much less intuitive and
>>>>> more difficult to administer - that may certainly be just a matter of
>>>>> personal taste and experience.
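>>>>>
>>>>> As a concrete sketch of the clone-based provisioning and instant
>>>>> backups (dataset names here are made up):
>>>>>
>>>>> # Sketch: clone containers from a snapshotted base template, then take
>>>>> # an instant, space-efficient backup snapshot. Dataset names are
>>>>> # placeholders.
>>>>> import subprocess
>>>>>
>>>>> def run(cmd):
>>>>>     subprocess.run(cmd.split(), check=True)
>>>>>
>>>>> run("zfs snapshot tank/containers/base@template")
>>>>> run("zfs clone tank/containers/base@template tank/containers/wiki")
>>>>> run("zfs snapshot tank/containers/wiki@nightly")  # instantaneous backup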
>>>>>
>>>>
>>>> We've already decided to go with VMs so we can migrate existing
>>>> services. In our case, administering a VM can be delegated easily. We do
>>>> plan to try out containers on OSGeo6 (existing). But for now we really
>>>> just need to move the existing VMs off OSGeo3 so we can retire the
>>>> hardware. These include Downloads, Wiki, Trac/SVN, and Webextra (Foss4g).
>>>>
>>>
>>> As a migration path, it's certainly easy enough to spin up a couple of
>>> VMs, but in the long run those services really should be split up in a
>>> more fine-grained manner, both to make them easier to admin and to reduce
>>> resource usage when they aren't in active use.
>>>
>>> The current setup, with many services running in each VM, increases
>>> resource contention and decreases performance, while also requiring more
>>> administrative overhead and making it very difficult to upgrade a portion
>>> of the software stack without upgrading the entire system.
>>>
>>> I would be happy to go over which services are needed by which users to
>>> help plan a proper migration away from the all-in-one VMs, and there is
>>> no problem with having both VMs and containers in various combinations,
>>> including containers within VMs where that makes sense.
>>>
>>>
>>>>
>>>>
>>>>> I hope this helps with the purchasing and provisioning decisions.
>>>>>
>>>>> Take care,
>>>>>    ~~~Chris Giorgi~~~
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 12, 2018 at 1:27 PM, Regina Obe <lr at pcorp.us> wrote:
>>>>>
>>>>>> Alex,
>>>>>>
>>>>>> This looks good to me +1.  Really excited to have a new Box in place.
>>>>>>
>>>>>> I'm also thinking that with the new box we could start off-loading
>>>>>> osgeo3 and osgeo4 and allow Lance to upgrade the ganeti on them.
>>>>>> Since we won't have anything mission critical on them -- after we
>>>>>> migrate the mission-critical stuff to osgeo7 -- if the hardware on
>>>>>> osgeo4 fails during the upgrade, I assume it wouldn't be a big deal.
>>>>>> As I recall, was it only osgeo4 that had a hardware issue?
>>>>>>
>>>>>> Thanks,
>>>>>> Regina
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Sac [mailto:sac-bounces at lists.osgeo.org] On Behalf Of Alex M
>>>>>> Sent: Monday, February 12, 2018 3:54 PM
>>>>>> To: sac >> System Administration Committee Discussion/OSGeo <
>>>>>> sac at lists.osgeo.org>
>>>>>> Subject: [SAC] OSGeo7 Server Config Quote
>>>>>>
>>>>>> Here's the latest quote for us to discuss server configuration for
>>>>>> OSGeo7.
>>>>>>
>>>>>> https://drive.google.com/open?id=1X-z66jXXBUZuPqh6EP0d43g2NUCL7xcL
>>>>>>
>>>>>> The plan, based on discussions, is to manage KVM virtual machines on
>>>>>> LVM volumes with libvirt. If at some point we feel we need something
>>>>>> more advanced because we are managing multiple physical machines, we
>>>>>> could convert to ganeti or openstack (I'm less sure about how to
>>>>>> convert to openstack).
>>>>>>
>>>>>> The idea was up to 4 virtual machines, each with someone designated to
>>>>>> make sure it stays updated, along with use of unattended upgrades for
>>>>>> security patches.
>>>>>>
>>>>>> As quoted, I've done RAID 5 SSD and RAID 5 traditional, 3 drives each.
>>>>>> That will give us both fast storage and large storage (think downloads
>>>>>> and foss4g archives).
>>>>>>
>>>>>> I did redundant power to maximize uptime.
>>>>>>
>>>>>> RAM is only 64 GB, which is up to 16 GB for each of the virtual machines.
>>>>>>
>>>>>> Please discuss and ask questions so we can possibly vote this week at
>>>>>> the meeting.
>>>>>>
>>>>>> Thanks,
>>>>>> Alex
>>>>>
>>>>
>>>>
>>>
>>
>>
> 


