[SAC] [OSGeo] #2706: Set up load balancing configuration for download.osgeo.org

OSGeo trac_osgeo at osgeo.org
Mon Mar 28 15:41:26 PDT 2022


#2706: Set up load balancing configuration for download.osgeo.org
---------------------------+----------------------------------------
 Reporter:  robe           |       Owner:  sac@…
     Type:  task           |      Status:  new
 Priority:  normal         |   Milestone:  Sysadmin Contract 2022-II
Component:  Systems Admin  |  Resolution:
 Keywords:                 |
---------------------------+----------------------------------------
Changes (by robe):

 * milestone:  Sysadmin Contract 2022-I => Sysadmin Contract 2022-II


Comment:

 I've started to work on this -- a lot of the notes are on #2705.

 I have set up a round-robin for download.osgeo.org and notified people via
 project and discuss to use upload.osgeo.org for sftp.  upload.osgeo.org
 will remain connected only to osgeo7-download.

 For testing I have download-cache.osgeo.org, which consists of osgeo4 and
 osgeo9, both pulling directly from upload.osgeo.org.

 download.osgeo.org itself consists of osgeo7 (pulling via download.lxd)
 and osgeo9 (pulling via upload.osgeo.org).  Note that both ultimately go
 through the nginx on osgeo7, so nginx itself is not the cause of the slow
 downloads on osgeo7.

 All osgeo9 does is proxy straight to upload.osgeo.org (nginx) ->
 osgeo7-download, and yet when it is active, download speeds range
 anywhere from 6MB/s to 20MB/s.

 How is this possible?  My guess is that the connectivity between the hosts
 is at least 100GB/s, but the throughput out to the world is much lower,
 and since osgeo7's outbound network is heavily taxed, its downloads are
 crippled.  osgeo9 only caches the current request, pulling at
 100-1000GB/s from download, and since it is not taxed with as many
 requests it can push data out much faster.

 Putting this in place immediately ballooned osgeo9 traffic.

 Here are stats from osgeo9:

 osgeo9 vnstat output as of now.  Note that I turned it on 2 days ago, so
 the 2022-03 total of 7.58 TiB is just for those 2 days.  I think that
 traffic also includes copying from upload.osgeo.org, so the outbound part
 is really about half of that.

 Anyway it's huge and I can't believe how huge it is.

 On osgeo9 as of now

 vnstat output
 {{{

                      rx      /      tx      /     total    /   estimated
  enp2s0f0:
        2022-02      5.44 GiB  /  425.10 GiB  /  430.54 GiB
        2022-03      3.60 TiB  /    3.98 TiB  /    7.58 TiB  /    8.42 TiB
      yesterday      1.27 TiB  /    1.24 TiB  /    2.51 TiB
          today      2.16 TiB  /    2.11 TiB  /    4.27 TiB  /    4.73 TiB

 }}}



 vnstat -d 5 #for last 5 days

 # note late 3/26 is when I added it to round robin
 {{{
  enp2s0f0  /  daily

           day        rx      |     tx      |    total    |   avg. rate
      ------------------------+-------------+-------------+---------------
      2022-03-24   191.36 MiB |   21.29 GiB |   21.48 GiB |    2.14 Mbit/s
      2022-03-25   471.22 MiB |   21.44 GiB |   21.90 GiB |    2.18 Mbit/s
      2022-03-26   160.08 GiB |  174.21 GiB |  334.29 GiB |   33.24 Mbit/s
      2022-03-27     1.27 TiB |    1.24 TiB |    2.51 TiB |  255.63 Mbit/s
      2022-03-28     2.23 TiB |    2.17 TiB |    4.40 TiB |  483.15 Mbit/s
      ------------------------+-------------+-------------+---------------
      estimated      2.40 TiB |    2.34 TiB |    4.75 TiB |


 }}}


 ----


 Now on osgeo7:

 vnstat
 {{{
                       rx      /      tx      /     total    /   estimated
  eno1:
        2022-02      1.54 TiB  /  104.72 TiB  /  106.26 TiB
        2022-03      1.76 TiB  /  115.49 TiB  /  117.25 TiB  /  130.25 TiB
      yesterday     27.14 GiB  /    2.95 TiB  /    2.97 TiB
          today     44.84 GiB  /    4.18 TiB  /    4.22 TiB  /    4.66 TiB

 }}}


 vnstat -d 5  #for last 5 days


 {{{
  eno1  /  daily

           day        rx      |     tx      |    total    |   avg. rate
      ------------------------+-------------+-------------+---------------
      2022-03-24    75.43 GiB |    4.45 TiB |    4.52 TiB |  460.09 Mbit/s
      2022-03-25    70.75 GiB |    4.40 TiB |    4.47 TiB |  454.90 Mbit/s
      2022-03-26    46.99 GiB |    4.24 TiB |    4.29 TiB |  436.80 Mbit/s
      2022-03-27    27.14 GiB |    2.95 TiB |    2.97 TiB |  302.77 Mbit/s
      2022-03-28    45.75 GiB |    4.26 TiB |    4.31 TiB |  473.15 Mbit/s
      ------------------------+-------------+-------------+---------------
      estimated     49.35 GiB |    4.60 TiB |    4.65 TiB |

 }}}
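
 As a sanity check on the avg. rate column, the daily totals above can be
 converted to a sustained rate by hand.  This is only a sketch: it assumes
 vnstat reports totals in binary units (1 TiB = 2^40 bytes) and rates in
 decimal Mbit/s over a full 24-hour day, and the helper name
 avg_rate_mbit is made up for illustration.  The totals are already
 rounded to two decimals in the tables, so small differences are expected.

```python
TIB = 2**40                      # bytes per TiB (assumed binary unit)
SECONDS_PER_DAY = 24 * 60 * 60


def avg_rate_mbit(total_tib, seconds=SECONDS_PER_DAY):
    """Average rate in decimal Mbit/s for total_tib TiB moved in `seconds`."""
    bits = total_tib * TIB * 8
    return bits / seconds / 1e6


# (label, daily total in TiB, avg. rate reported by vnstat in Mbit/s),
# all copied from the daily tables above for complete days only.
samples = [
    ("osgeo9 2022-03-27", 2.51, 255.63),
    ("osgeo7 2022-03-24", 4.52, 460.09),
    ("osgeo7 2022-03-27", 2.97, 302.77),
]

for label, total_tib, reported in samples:
    computed = avg_rate_mbit(total_tib)
    print(f"{label}: computed {computed:.1f} Mbit/s, vnstat says {reported}")
```

 Under those assumptions the computed and reported rates agree to within
 about 1 Mbit/s, so the totals and the rates in the tables are consistent
 with each other.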

 So how do we solve this issue?

 1. Finish setting up osgeo8 to also act as a proxy.  This one can be a
 true cache since it has much more disk space than osgeo9, so it can do a
 full rsync of download.  This is a short-term solution.  One issue I am
 working out is that all the traffic coming through osgeo9 to osgeo7 is
 logged as osgeo9 on the download container, which is both good and bad:
 good in that it's easy to see how much traffic osgeo9 is picking up, but
 bad in that I don't have a single authoritative log (then again, we
 wouldn't anyway with a true round-robin).  The osgeo9 logs do show the
 true identity of the traffic it is handling.

 2. Curb traffic.  I'm investigating nginx settings to limit each user to,
 say, 1 or 2 requests per second, or to limit bandwidth.  I've been trying
 https://www.nginx.com/blog/rate-limiting-nginx/ but my settings seem to
 be ignored or are not working as expected.  There is a lot of bot traffic
 that we really don't need hogging resources.  I still need to break up
 the stats to figure out the low-hanging fruit that should just be killed
 off.

 3. Set up a true CDN for download around the world.  This is a future
 plan and could be costly.  Something like keycdn comes to mind, as
 someone suggested a while back, since they offer an open source plan:
 https://www.keycdn.com/open-source-cdn.  Given how much traffic this is,
 though, I suspect we'd quickly exhaust it or not be able to use
 download.osgeo.org as the name, which would make it worse than just
 adding some extra round-robin VMs on commercial cloud hosters (hetzner,
 atlantic, digital ocean come to mind).  Keycdn commercial pricing is
 $0.01/GB per month for NA/Europe for over 100 TB/month, which would be
 the bulk of our traffic.  Given we are doing about 105-130 TB a month,
 that would be about $1,050-$1,300/month if my math is right -- way too
 much.
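
 To make the "1 or 2 requests per second" idea in item 2 concrete, here is
 a rough sketch of per-client rate limiting in the leaky-bucket style that
 the nginx rate-limiting documentation describes.  It is not the nginx
 implementation and not the configuration tried above; the class name, the
 rate and burst values, and the client addresses are all hypothetical.

```python
class LeakyBucketLimiter:
    """Per-client limiter: `rate` requests/second, plus `burst` extra."""

    def __init__(self, rate, burst=0):
        self.rate = rate      # steady requests per second allowed per client
        self.burst = burst    # extra requests tolerated above the steady rate
        self._state = {}      # client -> (excess requests, time last accepted)

    def allow(self, client, now):
        excess, last = self._state.get(client, (0.0, now))
        # Drain the bucket for the time elapsed, then count this request.
        excess = max(0.0, excess - self.rate * (now - last)) + 1.0
        if excess > self.burst + 1.0:
            return False      # over the limit: reject and leave state as-is
        self._state[client] = (excess, now)
        return True


# Hypothetical bot at 192.0.2.10 sending 10 requests within one second,
# against a limit of 2 requests/second with no burst allowance.
limiter = LeakyBucketLimiter(rate=2, burst=0)
accepted = sum(limiter.allow("192.0.2.10", i / 10) for i in range(10))
print(f"accepted {accepted} of 10 requests")
```

 Each client is tracked separately, so one heavy bot being throttled does
 not affect anyone else.  A nonzero burst lets a client briefly exceed the
 steady rate before requests start being rejected.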

-- 
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2706#comment:3>
OSGeo <https://osgeo.org/>
OSGeo committee and general foundation issue tracker.

