[SAC] New Hardware, can we purchase now

Alex M tech_dev at wildintellect.com
Mon Apr 2 10:59:34 PDT 2018


To clarify, I was pondering 2 devices, not 3. The answer may be that you want
the 3 we've already selected so that the read cache is separate and larger.

Please let me know if there are any other issues with the config before
we proceed.

Thanks,
Alex

On 03/30/2018 02:54 PM, Chris Giorgi wrote:
> I'm not sure how we would go about fitting a third Optane device --
> the quote had HHHL PCIe cards listed, not the required U.2 devices
> which go in place of the micron sata ssds.
> The PCIe -> U.2 interface card provides 4 PCIe 3.0 lanes to each U.2
> interface, which then connects by cable to the drives themselves.
> The M.2 card slot on the board should be on its own set of lanes, as
> none of the remaining PCIe slots on the board are occupied due to
> space constraints.
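> 
> Once the board is in hand that's easy to sanity-check from the OS, e.g.
> (purely illustrative, any recent Linux with pciutils):
> 
>    lspci -tv    # PCIe topology: which devices hang off which root ports
>    lspci -vv    # per-device LnkCap/LnkSta lines show negotiated lane width
> 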
> The reason for using the more expensive (and faster) Optanes for the
> write cache is that a write-cache failure can lead to data corruption,
> and they have an order of magnitude more write endurance than a
> standard SSD.
> The read cache can use a larger, cheaper (but still fast) SSD because
> it sees much lower write-amplification than the write cache and a
> failure won't cause corruption.
> 
>    ~~~Chris~~~
> 
> On Fri, Mar 30, 2018 at 11:53 AM,  <harrison.grundy at astrodoggroup.com> wrote:
>> Can someone confirm that the 4x PCIe slots aren't shared with the M.2 slot on the board and that 2 independent 4x slots are available?
>>
>> If all 3 devices (the SSD and the two Optanes) are on a single 4x bus, it kinda defeats the purpose.
>>
>> Harrison
>>
>>
>>
>> Sent via the BlackBerry Hub for Android
>>
>>   Original Message
>> From: tech_dev at wildintellect.com
>> Sent: March 31, 2018 02:21
>> To: sac at lists.osgeo.org
>> Reply-to: tech at wildintellect.com; sac at lists.osgeo.org
>> Cc: chrisgiorgi at gmail.com
>> Subject: Re: [SAC] New Hardware, can we purchase now
>>
>> Here's the latest quote with the modifications Chris suggested.
>>
>> One question: any reason we can't just use the Optanes for both read &
>> write caches?
>>
>> Otherwise, unless there are other suggestions or clarifications, I will
>> send out another thread for an official vote to approve. Note the price
>> is about $1,000 more than originally budgeted.
>>
>> Thanks,
>> Alex
>>
>> On 03/14/2018 09:47 PM, Chris Giorgi wrote:
>>> Further investigation into the chassis shows this is the base system sm is using:
>>> https://www.supermicro.com/products/system/1U/6019/SYS-6019P-MT.cfm
>>> It has a full-height PCIe 3.0 x8 port, as well as an M.2 PCIe 3.0 x4
>>> slot on the motherboard.
>>> In light of this, I am changing my recommendation to the following;
>>> please follow up with sm for pricing:
>>> 2ea. Intel Optane 900p 280GB PCIe 3.0 x4 with U.2 interfaces,
>>> replacing SATA SSDs
>>> ...connected to either a SuperMicro AOC-SLG3-2E4 or AOC-SLG3-2E4R
>>> (Depending on compatibility)
>>> Then, a single M.2 SSD such as a 512GB Samsung 960 PRO in the motherboard slot.
>>>
>>> With this configuration, the Optanes supply a very fast mirrored write
>>> cache (ZFS ZIL SLOG), while the M.2 card provides read caching (ZFS
>>> L2ARC), with no further cache configuration needed.
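>>>
>>> As a rough sketch only (pool name and device paths are placeholders,
>>> not the final layout), the resulting ZFS pool would look something like:
>>>
>>>    zpool create tank raidz2 sda sdb sdc sdd     # 4 spinning disks, double parity
>>>    zpool add tank log mirror nvme0n1 nvme1n1    # mirrored Optanes as SLOG
>>>    zpool add tank cache nvme2n1                 # M.2 SSD as L2ARC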
>>>
>>> Let me know if that sounds more palatable.
>>>    ~~~Chris~~~
>>>
>>>
>>> On Wed, Mar 14, 2018 at 10:36 AM, Chris Giorgi <chrisgiorgi at gmail.com> wrote:
>>>> Alex,
>>>>
>>>> Simply put, write caching requires redundant devices; read caching does not.
>>>>
>>>> The write cache can be relatively small -- it only needs to handle
>>>> writes which have not yet been committed to disks. This allows sync
>>>> writes to finish as soon as the data hits the SSD, with the write to
>>>> disk being done async. Failure of the write cache device(s) may result
>>>> in data loss and corruption, so they MUST be redundant for
>>>> reliability.
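>>>>
>>>> As a rough sizing sketch (assuming default ZFS tunables): the SLOG only
>>>> has to hold a few transaction groups' worth of sync writes, on the order
>>>> of 5 seconds each, so even 1 GB/s of sustained sync writes needs only
>>>> ~10-15 GB -- the 280GB Optanes are far more than enough.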
>>>>
>>>> The read cache should be large enough to hold all hot data and much of the
>>>> warm data. It provides a second-level cache behind the in-memory block cache,
>>>> so that cache-misses to evicted blocks can be serviced very quickly
>>>> without waiting for drives to seek. Failure of the read cache device
>>>> degrades performance, but has no impact on data integrity.
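>>>>
>>>> Once running, cache effectiveness is easy to watch (assuming the usual
>>>> ZFS-on-Linux userland tools; pool name is a placeholder), e.g.:
>>>>
>>>>    zpool iostat -v tank 5    # per-device traffic, incl. log and cache vdevs
>>>>    arcstat 5                 # ARC/L2ARC hit-rate statistics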
>>>>
>>>>   ~~~Chris~~~
>>>>
>>>> On Wed, Mar 14, 2018 at 9:05 AM, Alex M <tech_dev at wildintellect.com> wrote:
>>>>> My overall response, I'm a little hesitant to implement so many new
>>>>> technologies at the same time with only 1 person who knows them (Chris G).
>>>>>
>>>>> My opinion
>>>>> +1 on some use of ZFS, if we have a good guide
>>>>> -1 on use of Funtoo; we've preferred Debian or Ubuntu for many years and
>>>>> have more people comfortable with them.
>>>>> +1 on trying LXD
>>>>> +1 on Optane
>>>>> ?0 on the SSD caching
>>>>>
>>>>> 1. What tool are we using to configure write-caching on the SSDs? I'd
>>>>> rather not be making an overly complicated database configuration.
>>>>>
>>>>> 2. That seems a reasonable answer to me, though do we still need the
>>>>> SSDs if we use the Optane for caching? It sounds to me like Optane or
>>>>> SSD would suffice.
>>>>>
>>>>> 3. Disks - Yes, if we plan to archive OSGeo Live, that would benefit from
>>>>> larger disks. I'm a -1 on storing data for the geodata committee unless
>>>>> they can find large data that is not publicly hosted elsewhere, in which
>>>>> case I would recommend we find partners to host the data, like GeoForAll
>>>>> members or companies such as Amazon/Google etc... Keep in mind we also need
>>>>> to plan for backup space. Note, I don't see the total usable disk size of
>>>>> the backup in the wiki; can someone figure that out and add it? We need
>>>>> to update https://wiki.osgeo.org/wiki/SAC:Backups
>>>>>
>>>>> New question, which disk are we installing the OS on, and therefore the
>>>>> ZFS packages?
>>>>>
>>>>> Thanks,
>>>>> Alex
>>>>>
>>>>> On 03/13/2018 12:57 PM, Chris Giorgi wrote:
>>>>>>  Hi Alex,
>>>>>> Answers inline below:
>>>>>> Take care,
>>>>>>    ~~~Chris~~~
>>>>>>
>>>>>> On Mon, Mar 12, 2018 at 10:41 AM, Alex M <tech_dev at wildintellect.com> wrote:
>>>>>>> On 03/02/2018 12:25 PM, Regina Obe wrote:
>>>>>>>> I'm in IRC meeting with Chris and he recalls the only outstanding thing
>>>>>>>> before hardware purchase was the disk size
>>>>>>>>
>>>>>>>> [15:17] <TemptorSent> From my reply to the mailing list a while back, the
>>>>>>>> pricing for larger drives: (+$212 for 4x10he or +$540 for 4x12he)
>>>>>>>>  [15:19] <TemptorSent> That gives us practical double-redundant storage of
>>>>>>>> 12-16TB and 16-20TB respectively, depending how we use it.
>>>>>>>>
>>>>>>>>
>>>>>>>> If that is all, can we just get the bigger disk and move forward with the
>>>>>>>> hardware purchase? Unless of course the purchase has already been made.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Regina
>>>>>>>>
>>>>>>>
>>>>>>> Apologies, I dropped the ball on many things while traveling for work...
>>>>>>>
>>>>>>> My take on this: I was unclear on whether we really understood how we would
>>>>>>> utilize the hardware for our needs, since there are a few new
>>>>>>> technologies in discussion that we haven't used before. I was also in favor
>>>>>>> of small savings, as we're over the line item, and that money could be used
>>>>>>> for things like people hours, 3rd-party hosting, spare parts, etc...
>>>>>>>
>>>>>>> So a few questions:
>>>>>>> 1. If we get the optane card, do we really need the SSDs? What would we
>>>>>>> put on the SSDs that would benefit from it, considering the Optane card?
>>>>>>
>>>>>> The Optane is intended for caching frequently read data on very fast storage.
>>>>>> As a single unmirrored device, it is not recommended for write-caching of
>>>>>> important data, but will serve quite well for temporary scratch space.
>>>>>>
>>>>>> Mirrored SSDs are required for write caching to prevent failure of a single
>>>>>> device causing data loss. The size of the write cache is very small by
>>>>>> comparison to the read cache, but the write-to-read ratio is much higher,
>>>>>> necessitating the larger total DWPD*size rating. The SSDs can also provide
>>>>>> the fast tablespace for databases as needed, which also have high write-
>>>>>> amplification. The total allocated space should probably be 40-60% of the
>>>>>> device size to ensure long-term endurance. The data stored on the SSDs
>>>>>> can be automatically backed up to the spinning rust on a regular basis for
>>>>>> improved redundancy.
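>>>>>>
>>>>>> As a sketch of that automatic backup (dataset names are hypothetical),
>>>>>> a nightly cron job could do an incremental snapshot send to the
>>>>>> spinning-rust pool:
>>>>>>
>>>>>>    zfs snapshot ssd/db@2018-04-02
>>>>>>    zfs send -i ssd/db@2018-04-01 ssd/db@2018-04-02 \
>>>>>>      | zfs recv tank/backup/db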
>>>>>>
>>>>>>> 2. What caching tool will we use with the Optane? Something like
>>>>>>> fscache/CacheFS that just does everything accessed, or something
>>>>>>> configured per site like varnish/memcache etc?
>>>>>>
>>>>>> We can do both if desirable, allocating a large cache for the fs (L2ARC in ZFS),
>>>>>> as well as providing an explicit cache where desirable. This configuration can
>>>>>> be modified at any time, as the system's operation is not dependent on the
>>>>>> caching device being active.
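>>>>>>
>>>>>> For example (device and pool names are placeholders), the L2ARC device
>>>>>> can be attached to or detached from the live pool at any time:
>>>>>>
>>>>>>    zpool add tank cache nvme2n1     # enable the read cache
>>>>>>    zpool remove tank nvme2n1        # drop it later; no data is lost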
>>>>>>
>>>>>>> 3. Our storage growth is modest, not that I don't consider the quoted 8
>>>>>>> or 10 TB to be reliable, but the 2 and 4 TB models have a lot more
>>>>>>> reliability data, and take significantly less time to rebuild in a Raid
>>>>>>> configuration. So how much storage do we really need for Downloads and
>>>>>>> Foss4g archives?
>>>>>>
>>>>>> OSGeo-Live alone has a growth rate and retention policy that indicates needs
>>>>>> on the order of 100GB-1TB over the next 5 years from my quick calculations, not
>>>>>> including any additional large datasets. Supporting the geodata project would
>>>>>> likely consume every bit of storage we throw at it and still be
>>>>>> thirsty for more in
>>>>>> short order, so I would propose serving only the warm data on the new server and
>>>>>> re-purposing one of the older machines for bulk cold storage and backups once
>>>>>> services have been migrated successfully.
>>>>>>
>>>>>> Remember, the usable capacity will approximately equal the total capacity of
>>>>>> a single drive in a doubly redundant configuration with 4 drives at proper
>>>>>> filesystem fill ratios. We'll gain some due to compression, but we also want
>>>>>> to provision for snapshots and backups of the SSD-based storage, so 1x single
>>>>>> drive size is a good SWAG. Resilver times for ZFS are based on actual stored
>>>>>> data, not disk size, and can be done online with minimal degradation of
>>>>>> service, so that's a moot point I believe.
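>>>>>>
>>>>>> Rough arithmetic, for example: 4 x 10TB in RAIDZ2 leaves ~20TB for data
>>>>>> after two drives' worth of parity; at a conservative ~50-60% fill target,
>>>>>> plus room for snapshots and the SSD backups, that works out to roughly
>>>>>> one drive's worth (~10TB) of comfortable working capacity.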
>>>>>>
>>>>>>> 4. Do we know what we plan to put on the SSD drives vs the Spinning Disks?
>>>>>>
>>>>>> See (1).
>>>>>>
>>>>>>> I think with the answers to these we'll be able to vote this week and order.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>> _______________________________________________
>>>>>>> Sac mailing list
>>>>>>> Sac at lists.osgeo.org
>>>>>>> https://lists.osgeo.org/mailman/listinfo/sac
>>>>>
>>
>> _______________________________________________
>> Sac mailing list
>> Sac at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/sac


