[SAC] New Hardware, can we purchase now

Chris Giorgi chrisgiorgi at gmail.com
Fri Apr 13 19:15:40 PDT 2018


No, it appears this quote has the wrong Optane drives again:
"Optane: 2 x Intel 280GB 900P Series (3D XPoint, 10 DWPD) HHHL PCIe
3.0 x4 NVMe SSD"
is incorrect; they should be the U.2 form-factor devices.

On Fri, Apr 13, 2018 at 8:39 AM, Alex M <tech_dev at wildintellect.com> wrote:
> Chris and Harrison,
>
> Can you confirm that this quote is acceptable and we should move on to
> voting?
>
> https://drive.google.com/open?id=1M491x3mSl51K1o60Bksulf7KOCqkru55
>
>
> Thanks,
> Alex
>
> On 04/02/2018 10:59 AM, Alex M wrote:
>> To clarify, I was pondering 2 devices, not 3. The answer may be that you
>> want the 3 we've already selected, so the read cache is separate and larger.
>>
>> Please let me know if there are any other issues with the config before
>> we proceed.
>>
>> Thanks,
>> Alex
>>
>> On 03/30/2018 02:54 PM, Chris Giorgi wrote:
>>> I'm not sure how we would go about fitting a third Optane device --
>>> the quote had HHHL PCIe cards listed, not the required U.2 devices,
>>> which go in place of the Micron SATA SSDs.
>>> The PCIe -> U.2 interface card provides 4 PCIe 3.0 lanes to each U.2
>>> interface, which then connects by cable to the drives themselves.
>>> The M.2 slot on the board should be on its own set of lanes, as
>>> none of the remaining PCIe slots on the board are occupied due to
>>> space constraints.
>>> The reason for using the more expensive (and faster) Optanes for the
>>> write cache is that a write-cache failure can lead to data corruption,
>>> and they have an order of magnitude more write endurance than a
>>> standard SSD.
>>> The read cache can use a larger, cheaper (but still fast) SSD because
>>> it sees much lower write amplification than the write cache, and a
>>> failure won't cause corruption.
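>>>
>>> To put rough numbers on that endurance claim, here's a back-of-the-envelope
>>> sketch: the 10 DWPD figure comes from the quoted spec above, while the
>>> 0.3 DWPD used for a typical consumer SSD is only an illustrative assumption.
>>>
>>>     # endurance_sketch.py -- compare rated daily write endurance.
>>>     def daily_endurance_tb(capacity_gb, dwpd):
>>>         """Rated writes per day in TB: capacity * drive-writes-per-day."""
>>>         return capacity_gb * dwpd / 1000.0
>>>
>>>     optane = daily_endurance_tb(280, 10)     # Optane 900P, per the quote
>>>     consumer = daily_endurance_tb(512, 0.3)  # typical consumer SSD (assumed)
>>>     print(f"Optane 900P: ~{optane:.1f} TB/day, consumer SSD: ~{consumer:.2f} TB/day")
>>>     print(f"ratio: ~{optane / consumer:.0f}x")  # comfortably an order of magnitude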
>>>
>>>    ~~~Chris~~~
>>>
>>> On Fri, Mar 30, 2018 at 11:53 AM,  <harrison.grundy at astrodoggroup.com> wrote:
>>>> Can someone confirm that the 4x PCIe slots aren't shared with the M.2 slot on the board and that 2 independent 4x slots are available?
>>>>
>>>> If all 3 devices (the SSD and both Optanes) are on a single 4x bus, it
>>>> kinda defeats the purpose.
>>>>
>>>> Harrison
>>>>
>>>>
>>>>
>>>> Sent via the BlackBerry Hub for Android
>>>>
>>>>   Original Message
>>>> From: tech_dev at wildintellect.com
>>>> Sent: March 31, 2018 02:21
>>>> To: sac at lists.osgeo.org
>>>> Reply-to: tech at wildintellect.com; sac at lists.osgeo.org
>>>> Cc: chrisgiorgi at gmail.com
>>>> Subject: Re: [SAC] New Hardware, can we purchase now
>>>>
>>>> Here's the latest quote with the modifications Chris suggested.
>>>>
>>>> One question: any reason we can't just use the Optanes for both the read
>>>> & write caches?
>>>>
>>>> Otherwise, unless there are other suggestions or clarifications, I will
>>>> send out another thread for an official vote to approve. Note the price
>>>> is about $1,000 more than originally budgeted.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On 03/14/2018 09:47 PM, Chris Giorgi wrote:
>>>>> Further investigation into the chassis shows this is the base system
>>>>> SuperMicro is using:
>>>>> https://www.supermicro.com/products/system/1U/6019/SYS-6019P-MT.cfm
>>>>> It has a full-height PCIe 3.0 x8 slot, as well as an M.2 PCIe 3.0 x4
>>>>> slot on the motherboard.
>>>>> In light of this, I am changing my recommendation to the following;
>>>>> please follow up with SuperMicro for pricing:
>>>>> 2ea. Intel Optane 900p 280GB PCIe 3.0 x4 with U.2 interfaces,
>>>>> replacing SATA SSDs
>>>>> ...connected to either a SuperMicro AOC-SLG3-2E4 or AOC-SLG3-2E4R
>>>>> (depending on compatibility).
>>>>> Then, a single M.2 SSD such as a 512GB Samsung 960 PRO in the motherboard slot.
>>>>>
>>>>> With this configuration, the Optanes supply a very fast mirrored write
>>>>> cache (ZFS ZIL SLOG), while the M.2 card provides read caching (ZFS
>>>>> L2ARC), with no further cache configuration needed.
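>>>>>
>>>>> For illustration, here's a rough sketch of what that pool layout could
>>>>> look like at creation time; the pool name and device paths below are
>>>>> placeholders, not the actual enumeration on the new box:
>>>>>
>>>>>     # pool_layout_sketch.py -- assemble the zpool create command for the
>>>>>     # proposed layout: spinning disks in raidz2, the two U.2 Optanes as
>>>>>     # a mirrored SLOG, and the M.2 SSD as a single L2ARC device.
>>>>>     data_disks = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]  # placeholders
>>>>>     slog_devs = ["/dev/nvme0n1", "/dev/nvme1n1"]  # placeholder Optane U.2 paths
>>>>>     l2arc_dev = "/dev/nvme2n1"                    # placeholder M.2 SSD path
>>>>>
>>>>>     cmd = (["zpool", "create", "tank", "raidz2"] + data_disks
>>>>>            + ["log", "mirror"] + slog_devs
>>>>>            + ["cache", l2arc_dev])
>>>>>     print(" ".join(cmd))  # print for review; run by hand once verified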
>>>>>
>>>>> Let me know if that sounds more palatable.
>>>>>    ~~~Chris~~~
>>>>>
>>>>>
>>>>> On Wed, Mar 14, 2018 at 10:36 AM, Chris Giorgi <chrisgiorgi at gmail.com> wrote:
>>>>>> Alex,
>>>>>>
>>>>>> Simply put, write caching requires redundant devices; read caching does not.
>>>>>>
>>>>>> The write cache can be relatively small -- it only needs to handle
>>>>>> writes which have not yet been committed to disks. This allows sync
>>>>>> writes to finish as soon as the data hits the SSD, with the write to
>>>>>> disk being done async. Failure of the write cache device(s) may result
>>>>>> in data loss and corruption, so they MUST be redundant for
>>>>>> reliability.
>>>>>>
>>>>>> The read cache should be large enough to hold all hot data and much of
>>>>>> the warm data. It provides a second-level cache behind the in-memory
>>>>>> block cache, so that cache misses on evicted blocks can be serviced
>>>>>> very quickly without waiting for drives to seek. Failure of the read
>>>>>> cache device degrades performance, but has no impact on data integrity.
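>>>>>>
>>>>>> As a side note, once it's running we can sanity-check how both cache
>>>>>> layers are doing; on ZFS-on-Linux the counters live under
>>>>>> /proc/spl/kstat/zfs/arcstats. A minimal sketch, assuming the standard
>>>>>> counter names (hits, misses, l2_hits, l2_misses):
>>>>>>
>>>>>>     # arc_hits_sketch.py -- rough ARC/L2ARC hit-ratio check.
>>>>>>     stats = {}
>>>>>>     with open("/proc/spl/kstat/zfs/arcstats") as f:
>>>>>>         for line in f:
>>>>>>             parts = line.split()
>>>>>>             if len(parts) == 3 and parts[1].isdigit():
>>>>>>                 stats[parts[0]] = int(parts[2])
>>>>>>
>>>>>>     arc_total = stats["hits"] + stats["misses"]
>>>>>>     l2_total = stats["l2_hits"] + stats["l2_misses"]
>>>>>>     print("ARC hit ratio:   %.1f%%" % (100.0 * stats["hits"] / max(arc_total, 1)))
>>>>>>     print("L2ARC hit ratio: %.1f%%" % (100.0 * stats["l2_hits"] / max(l2_total, 1)))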
>>>>>>
>>>>>>   ~~~Chris~~~
>>>>>>
>>>>>> On Wed, Mar 14, 2018 at 9:05 AM, Alex M <tech_dev at wildintellect.com> wrote:
>>>>>>> My overall response, I'm a little hesitant to implement so many new
>>>>>>> technologies at the same time with only 1 person who knows them (Chris G).
>>>>>>>
>>>>>>> My opinion
>>>>>>> +1 on some use of ZFS, if we have a good guide
>>>>>>> -1 on use of Funtoo; we've preferred Debian or Ubuntu for many years and
>>>>>>> have more people comfortable with them.
>>>>>>> +1 on trying LXD
>>>>>>> +1 on Optane
>>>>>>> ?0 on the SSD caching
>>>>>>>
>>>>>>> 1. What tool are we using to configure write-caching on the SSDs? I'd
>>>>>>> rather not be making an overly complicated database configuration.
>>>>>>>
>>>>>>> 2. That seems a reasonable answer to me, though do we still need the
>>>>>>> SSDs if we use the Optane for caching? It sounds to me like Optane or
>>>>>>> SSD would suffice.
>>>>>>>
>>>>>>> 3. Disks - Yes, if we plan to archive OSGeo Live, that would benefit
>>>>>>> from larger disks. I'm a -1 on storing data for the geodata committee,
>>>>>>> unless they can find large data that is not publicly hosted elsewhere,
>>>>>>> at which point I would recommend we find partners to host the data,
>>>>>>> like GeoForAll members or companies like Amazon/Google etc... Keep in
>>>>>>> mind we also need to plan for backup space. Note, I don't see the total
>>>>>>> usable disk size of the backup in the wiki; can someone figure that out
>>>>>>> and add it? We need to update https://wiki.osgeo.org/wiki/SAC:Backups
>>>>>>>
>>>>>>> New question: which disk are we installing the OS on, and therefore the
>>>>>>> ZFS packages?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Alex
>>>>>>>
>>>>>>> On 03/13/2018 12:57 PM, Chris Giorgi wrote:
>>>>>>>>  Hi Alex,
>>>>>>>> Answers inline below:
>>>>>>>> Take care,
>>>>>>>>    ~~~Chris~~~
>>>>>>>>
>>>>>>>> On Mon, Mar 12, 2018 at 10:41 AM, Alex M <tech_dev at wildintellect.com> wrote:
>>>>>>>>> On 03/02/2018 12:25 PM, Regina Obe wrote:
>>>>>>>>>> I'm in an IRC meeting with Chris, and he recalls the only outstanding
>>>>>>>>>> thing before the hardware purchase was the disk size:
>>>>>>>>>>
>>>>>>>>>> [15:17] <TemptorSent> From my reply to the mailing list a while back, the
>>>>>>>>>> pricing for larger drives: (+$212 for 4x10he or +$540 for 4x12he)
>>>>>>>>>>  [15:19] <TemptorSent> That gives us practical double-redundant storage of
>>>>>>>>>> 12-16TB and 16-20TB respectively, depending how we use it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If that is all, can we just get the bigger disks and move forward with the
>>>>>>>>>> hardware purchase? Unless of course the purchase has already been made.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Regina
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Apologies, I dropped the ball on many things while traveling for work...
>>>>>>>>>
>>>>>>>>> My take on this: I was unclear on whether we really understood how we
>>>>>>>>> would utilize the hardware for our needs, since there are a few new
>>>>>>>>> technologies in discussion that we haven't used before. I was also in
>>>>>>>>> favor of small savings, as we're over the line item, and that money
>>>>>>>>> could be used for things like people hours, 3rd-party hosting, spare
>>>>>>>>> parts, etc...
>>>>>>>>>
>>>>>>>>> So a few questions:
>>>>>>>>> 1. If we get the Optane card, do we really need the SSDs? What would we
>>>>>>>>> put on the SSDs that would benefit from it, considering the Optane card?
>>>>>>>>
>>>>>>>> The Optane is intended for caching frequently read data on very fast storage.
>>>>>>>> As a single unmirrored device, it is not recommended for write-caching of
>>>>>>>> important data, but will serve quite well for temporary scratch space.
>>>>>>>>
>>>>>>>> Mirrored SSDs are required for write caching to prevent failure of a single
>>>>>>>> device from causing data loss. The size of the write cache is very small by
>>>>>>>> comparison to the read cache, but the write-to-read ratio is much higher,
>>>>>>>> necessitating the larger total DWPD*size rating. The SSDs can also provide
>>>>>>>> the fast tablespace for databases as needed, which also have high write-
>>>>>>>> amplification. The total allocated space should probably be 40-60% of the
>>>>>>>> device size to ensure long-term endurance. The data stored on the SSDs
>>>>>>>> can be automatically backed up to the spinning rust on a regular basis for
>>>>>>>> improved redundancy.
>>>>>>>>
>>>>>>>>> 2. What caching tool will we use with the Optane? Something like
>>>>>>>>> fscache/CacheFS that just does everything accessed, or something
>>>>>>>>> configured per site like varnish/memcache etc?
>>>>>>>>
>>>>>>>> We can do both if desirable, allocating a large cache for the fs
>>>>>>>> (L2ARC in ZFS) as well as providing an explicit cache where needed.
>>>>>>>> This configuration can be modified at any time, as the system's
>>>>>>>> operation is not dependent on the caching device being active.
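>>>>>>>>
>>>>>>>> For instance, a rough sketch of the sort of runtime change that stays
>>>>>>>> possible (pool name and device path are placeholders):
>>>>>>>>
>>>>>>>>     # cache_tuning_sketch.py -- cache devices can be attached to or
>>>>>>>>     # detached from a live pool without touching the data vdevs.
>>>>>>>>     pool, l2arc = "tank", "/dev/nvme2n1"   # placeholders
>>>>>>>>     for cmd in (
>>>>>>>>         ["zpool", "add", pool, "cache", l2arc],  # attach an L2ARC device
>>>>>>>>         ["zpool", "remove", pool, l2arc],        # detach it again later
>>>>>>>>     ):
>>>>>>>>         print(" ".join(cmd))  # print only; run by hand when tuning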
>>>>>>>>
>>>>>>>>> 3. Our storage growth is modest. Not that I consider the quoted 8 or
>>>>>>>>> 10 TB drives unreliable, but the 2 and 4 TB models have a lot more
>>>>>>>>> reliability data and take significantly less time to rebuild in a RAID
>>>>>>>>> configuration. So how much storage do we really need for the Downloads
>>>>>>>>> and FOSS4G archives?
>>>>>>>>
>>>>>>>> OSGeo-Live alone has a growth rate and retention policy that indicates
>>>>>>>> a need on the order of 100GB-1TB over the next 5 years, from my quick
>>>>>>>> calculations, not including any additional large datasets. Supporting
>>>>>>>> the geodata project would likely consume every bit of storage we throw
>>>>>>>> at it and still be thirsty for more in short order, so I would propose
>>>>>>>> serving only the warm data on the new server and re-purposing one of
>>>>>>>> the older machines for bulk cold storage and backups once services have
>>>>>>>> been migrated successfully.
>>>>>>>>
>>>>>>>> Remember, the usable capacity will approximately equal the total
>>>>>>>> capacity of a single drive in a doubly redundant configuration with 4
>>>>>>>> drives at proper filesystem fill ratios. We'll gain some due to
>>>>>>>> compression, but we also want to provision for snapshots and backup of
>>>>>>>> the SSD-based storage, so 1x a single drive's size is a good SWAG.
>>>>>>>> Resilver times for ZFS are based on actual stored data, not disk size,
>>>>>>>> and resilvering can be done online with minimal degradation of service,
>>>>>>>> so that's a moot point I believe.
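>>>>>>>>
>>>>>>>> To make that concrete, a quick back-of-the-envelope for the 4x10TB
>>>>>>>> option from the earlier pricing; the 60-80% fill range is an assumption
>>>>>>>> based on common ZFS guidance:
>>>>>>>>
>>>>>>>>     # capacity_sketch.py -- rough usable-space estimate for 4x10TB in a
>>>>>>>>     # doubly redundant (raidz2) layout.
>>>>>>>>     drives, size_tb, parity = 4, 10, 2
>>>>>>>>     raw_usable = (drives - parity) * size_tb          # ~20 TB before overhead
>>>>>>>>     practical = [raw_usable * f for f in (0.6, 0.8)]  # ~12-16 TB at sane fill
>>>>>>>>     print(f"raw ~{raw_usable} TB, practical ~{practical[0]:.0f}-{practical[1]:.0f} TB")
>>>>>>>>     # leaving headroom for snapshots + SSD backups lands near 1 drive (~10 TB)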
>>>>>>>>
>>>>>>>>> 4. Do we know what we plan to put on the SSD drives vs the Spinning Disks?
>>>>>>>>
>>>>>>>> See (1).
>>>>>>>>
>>>>>>>>> I think with the answers to these we'll be able to vote this week and order.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alex
>>>>>>>>> _______________________________________________
>>>>>>>>> Sac mailing list
>>>>>>>>> Sac at lists.osgeo.org
>>>>>>>>> https://lists.osgeo.org/mailman/listinfo/sac
>>>>>>>
>>>>
>>>> _______________________________________________
>>>> Sac mailing list
>>>> Sac at lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/sac
>>
>> _______________________________________________
>> Sac mailing list
>> Sac at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/sac
>>
>
> _______________________________________________
> Sac mailing list
> Sac at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/sac

