[SAC] [OSGeo] #2706: Set up load balancing configuration for download.osgeo.org
OSGeo
trac_osgeo at osgeo.org
Mon Mar 28 15:41:26 PDT 2022
#2706: Set up load balancing configuration for download.osgeo.org
---------------------------+----------------------------------------
Reporter: robe | Owner: sac@…
Type: task | Status: new
Priority: normal | Milestone: Sysadmin Contract 2022-II
Component: Systems Admin | Resolution:
Keywords: |
---------------------------+----------------------------------------
Changes (by robe):
* milestone: Sysadmin Contract 2022-I => Sysadmin Contract 2022-II
Comment:
I've started to work on this -- a lot of the notes are on #2705.
So I have set up a round-robin for download.osgeo.org and sent a notice via
project and discuss to use upload.osgeo.org for sftp. upload.osgeo.org
will remain connected only to osgeo7-download.
I have download-cache.osgeo.org for testing, which consists of osgeo4 and
osgeo9 (which pull directly from upload.osgeo.org).
I have download.osgeo.org, which consists of osgeo7 (pulling via
download.lxd) and osgeo9 (pulling via upload.osgeo.org). Note that both
ultimately go through the nginx on osgeo7, so nginx itself is not the
cause of the slow downloads on osgeo7.
All osgeo9 does is proxy straight to upload.osgeo.org (nginx) ->
osgeo7-download, and yet when this is active, speed can be anywhere
from 6MB/s to 20MB/s.
How is this possible? My guess is that the connectivity between the hosts
is at least 100GB/s, but the throughput out to the world is much lower,
and since osgeo7's outbound network is heavily taxed, its downloads are
crippled. osgeo9 only caches the current request, pulling at 100-1000GB/s
from download, and since it is not taxed with as many requests, it can
push out much faster.
Putting this in place immediately ballooned osgeo9 traffic.
Here are stats from osgeo9:
osgeo9 vnstat output as of now - note I turned it on 2 days ago, so the
2022-03 total of 7.58 TiB is just for those 2 days. I think the traffic
includes copying from upload.osgeo.org, though (so the download traffic
is really about half of that).
Anyway, it's huge and I can't believe how huge it is.
On osgeo9 as of now
vnstat output
{{{
                 rx      /      tx      /    total     /  estimated
 enp2s0f0:
   2022-02     5.44 GiB  /  425.10 GiB  /  430.54 GiB
   2022-03     3.60 TiB  /    3.98 TiB  /    7.58 TiB  /  8.42 TiB
   yesterday   1.27 TiB  /    1.24 TiB  /    2.51 TiB
   today       2.16 TiB  /    2.11 TiB  /    4.27 TiB  /  4.73 TiB
}}}
vnstat -d 5 #for last 5 days
# note late 3/26 is when I added it to round robin
{{{
 enp2s0f0  /  daily

 day             rx     |     tx      |    total    |   avg. rate
------------------------+-------------+-------------+---------------
 2022-03-24  191.36 MiB |   21.29 GiB |   21.48 GiB |    2.14 Mbit/s
 2022-03-25  471.22 MiB |   21.44 GiB |   21.90 GiB |    2.18 Mbit/s
 2022-03-26  160.08 GiB |  174.21 GiB |  334.29 GiB |   33.24 Mbit/s
 2022-03-27    1.27 TiB |    1.24 TiB |    2.51 TiB |  255.63 Mbit/s
 2022-03-28    2.23 TiB |    2.17 TiB |    4.40 TiB |  483.15 Mbit/s
------------------------+-------------+-------------+---------------
 estimated     2.40 TiB |    2.34 TiB |    4.75 TiB |
}}}
----
Now on osgeo7:
vnstat
{{{
                 rx       /      tx      /    total     /  estimated
 eno1:
   2022-02      1.54 TiB  /  104.72 TiB  /  106.26 TiB
   2022-03      1.76 TiB  /  115.49 TiB  /  117.25 TiB  /  130.25 TiB
   yesterday   27.14 GiB  /    2.95 TiB  /    2.97 TiB
   today       44.84 GiB  /    4.18 TiB  /    4.22 TiB  /    4.66 TiB
}}}
vnstat -d 5 #for last 5 days
{{{
 eno1  /  daily

 day             rx     |     tx      |    total    |   avg. rate
------------------------+-------------+-------------+---------------
 2022-03-24   75.43 GiB |    4.45 TiB |    4.52 TiB |  460.09 Mbit/s
 2022-03-25   70.75 GiB |    4.40 TiB |    4.47 TiB |  454.90 Mbit/s
 2022-03-26   46.99 GiB |    4.24 TiB |    4.29 TiB |  436.80 Mbit/s
 2022-03-27   27.14 GiB |    2.95 TiB |    2.97 TiB |  302.77 Mbit/s
 2022-03-28   45.75 GiB |    4.26 TiB |    4.31 TiB |  473.15 Mbit/s
------------------------+-------------+-------------+---------------
 estimated    49.35 GiB |    4.60 TiB |    4.65 TiB |
}}}
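For anyone wanting to sanity-check the avg. rate column against the daily
totals in the tables above, here is a rough Python sketch. It assumes
vnstat's TiB is a binary unit (2^40 bytes) and its Mbit/s is decimal
(10^6 bits), and it only uses full days, since the "today" rows are partial.
The function name is made up for illustration.

```python
# Rough cross-check of vnstat's "avg. rate" column from the daily totals.
# Assumptions (not stated in the ticket): TiB means 2**40 bytes and
# Mbit/s means 10**6 bits per second.

SECONDS_PER_DAY = 24 * 60 * 60


def avg_rate_mbit(total_tib: float, seconds: float = SECONDS_PER_DAY) -> float:
    """Average rate in Mbit/s for total_tib TiB moved over the given seconds."""
    bits = total_tib * 2**40 * 8
    return bits / seconds / 1e6


# Full-day totals (rx + tx) copied from the daily tables above.
osgeo9_mar27 = avg_rate_mbit(2.51)  # vnstat reports 255.63 Mbit/s
osgeo7_mar24 = avg_rate_mbit(4.52)  # vnstat reports 460.09 Mbit/s

print(f"osgeo9 2022-03-27: {osgeo9_mar27:.1f} Mbit/s")
print(f"osgeo7 2022-03-24: {osgeo7_mar24:.1f} Mbit/s")
```

Both values should land within about 1 Mbit/s of what vnstat prints; the
small difference comes from the totals being rounded to two decimals.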
So how do we solve this issue?
1. Finish setting up osgeo8 to also act as a proxy. This one can be a
true cache since it has much more disk space than osgeo9, so it can do a
full rsync of download. This is a short-term solution. One issue I am
working out is that all the traffic coming through osgeo9 to osgeo7 is
being logged as osgeo9 on the download container, which is both good and
bad: good in that it's easy to see how much traffic osgeo9 is picking up,
but bad in that I don't have a single authoritative log (then again, we
wouldn't anyway with a true round-robin). The osgeo9 logs do show the
true identity of the traffic it is handling.
2. Curb traffic - I'm investigating nginx settings to, say, limit each
user to 1 or 2 requests per second, or to limit bandwidth. I've been
trying https://www.nginx.com/blog/rate-limiting-nginx/ but my settings
seem to be ignored or are not working as expected. There is a lot of bot
traffic that we really don't need hogging resources. I still need to
break up the stats to figure out the low-hanging fruit that should just
be killed off.
3. Set up a true CDN for download around the world. This is a future plan
and could be costly. Something like keycdn comes to mind, as someone had
suggested it a while back since they offer an open source plan -
https://www.keycdn.com/open-source-cdn. Though given how much traffic
this is, I suspect we'll quickly run out or not be able to use
download.osgeo.org for the name, which would make it worse than just
adding some extra round-robin VMs on commercial cloud hosters (hetzner,
atlantic, digital ocean come to mind).
Keycdn commercial pricing is $0.01/GB per month for NA/Europe for over 100
TB/month - which would be the bulk of our traffic. Given we are doing
about 105-130 TB, if my math is right that would be about $1050-1300/mth
-- way too much.
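The CDN cost estimate in item 3 can be reproduced with a few lines of
Python. This is only a back-of-the-envelope sketch using the figures quoted
above; it assumes the $0.01/GB rate applies flat to all traffic (real
pricing may be tiered) and that 1 TB = 1000 GB. The function name is made
up for illustration.

```python
# Back-of-the-envelope CDN cost from the figures quoted in this comment.
# Assumptions: a flat $0.01/GB on all traffic and 1 TB = 1000 GB.

PRICE_PER_GB_USD = 0.01


def monthly_cost_usd(traffic_tb: float,
                     price_per_gb: float = PRICE_PER_GB_USD) -> float:
    """Monthly cost in USD for traffic_tb terabytes at a flat per-GB price."""
    return traffic_tb * 1000 * price_per_gb


low = monthly_cost_usd(105)   # low end of the 105-130 TB range
high = monthly_cost_usd(130)  # high end of the range

print(f"${low:,.0f} to ${high:,.0f} per month")
```

Note that vnstat reports TiB, not TB, so if the 105-130 figure is really in
TiB, the bill would come out somewhat higher than this.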
--
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/2706#comment:3>
OSGeo <https://osgeo.org/>
OSGeo committee and general foundation issue tracker.