[SAC] osgeo.org outage + 'fix'

Christopher Schmidt crschmidt at metacarta.com
Tue Mar 4 23:14:46 EST 2008


Frank asked me to look into an outage on osgeo. (Or rather, begged for
someone to help him ;)

I found that there were many network connections open to 202.114.10.251
(via netstat), and then got into HTTP logs to find that that IP address
was repeatedly downloading specific files from the ossim SVN repository.
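
For anyone repeating this check, here is a minimal Python sketch of the
"count connections per remote IP" step. It assumes the Linux
`netstat -tn` column layout (proto, recv-q, send-q, local address,
foreign address, state); the function name `count_remote_ips` and the
192.0.2.x addresses in the sample are made up for illustration.

```python
from collections import Counter

def count_remote_ips(netstat_output):
    """Count TCP connections per remote IP in `netstat -tn` style text.

    Assumes six whitespace-separated columns with the foreign address
    in the fifth column; header and LISTEN lines are skipped.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header line or non-TCP socket
        if fields[5] == "LISTEN":
            continue  # listening sockets have no real peer
        # Split on the last colon so the port is dropped.
        ip, _, _port = fields[4].rpartition(":")
        if ip:
            counts[ip] += 1
    return counts

# Hypothetical sample output, not a capture from the real server.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address    Foreign Address       State
tcp        0      0 192.0.2.1:80     202.114.10.251:51001  ESTABLISHED
tcp        0      0 192.0.2.1:80     202.114.10.251:51002  ESTABLISHED
tcp        0      0 192.0.2.1:80     202.114.10.251:51003  ESTABLISHED
tcp        0      0 192.0.2.1:80     192.0.2.77:40000      ESTABLISHED
tcp        0      0 0.0.0.0:80       0.0.0.0:*             LISTEN
"""

if __name__ == "__main__":
    for ip, n in count_remote_ips(SAMPLE).most_common(5):
        print(n, ip)
```

Feeding it the real `netstat -tn` output should put the heaviest
remote address at the top of the list.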

This is in the same APNIC block that has been giving me problems on
hypercube, so I've set up a 'deny' rule for this IP address in the ossim
SVN repo config:

Order allow,deny
Allow from all
Deny from 202.114.10.251

I've also enabled http://svn.osgeo.org/server-status for the time being:
at some point we should limit this based on IP (possibly just to
localhost), but in the short term, it provides a useful debugging tool
for seeing a quick apache status. (The reason to hide it is that there
is information there that isn't 'public', like which URLs are being
visited. Since I consider most OSGeo information generally public
knowledge, I don't see this as a major risk, but still think that it's
worth being a bit less cavalier when someone has more time: It's in
httpd/conf/httpd.conf, under the server-status Location block.)

The cause of the problem was simply that the IP address in question was
opening many connections, and holding them open for a long period of
time while downloading even small files. It's not clear why the IP
address/person behind it was doing this: It behaves somewhat like a robot
gone horribly wrong, but loops again and again, so it's not clear how
one could write a bot *that* bad and not notice. (Also not clear: Why it
is coming from APNIC, a somewhat unlikely candidate for ossim
downloads.) When the number of connections opened got to 50, the apache
server appeared to have 'locked up' due to lack of available children,
only letting traffic in as a remote IP address finally dropped a
connection. Blocking the IP address reduces the response to a very small
403 page, so those requests finish quickly and free up children for
other traffic.

I've checked my changes in, but would invite anyone with more insight to
do a more thorough job: this is the solution I've been using for
hypercube, and it's working somewhat okay, though we should probably
investigate a more automatic solution than "notice the server is dead,
and block the offending IP address."  
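
As a starting point for something more automatic, a rough Python sketch
could scan the access log and print candidate deny lines. This is only
an illustration, not what is running on either server: it assumes the
client IP is the first whitespace-separated field of each log line (as
in Apache's common log format), and the function name `deny_rules` and
the threshold value are invented. Note that it counts logged requests,
which is only a proxy for connections being held open.

```python
from collections import Counter

def deny_rules(log_lines, threshold=100):
    """Return 'Deny from <ip>' lines for IPs with >= threshold requests.

    Assumes each non-empty log line starts with the client IP, as in
    Apache's common log format. The threshold is an arbitrary example.
    """
    hits = Counter(
        line.split(None, 1)[0] for line in log_lines if line.strip()
    )
    return [
        "Deny from %s" % ip
        for ip, n in sorted(hits.items())
        if n >= threshold
    ]

if __name__ == "__main__":
    # Hypothetical log lines; the request path is made up.
    noisy = '202.114.10.251 - - [04/Mar/2008:23:00:00 -0500] "GET /x HTTP/1.1" 200 512'
    quiet = '192.0.2.7 - - [04/Mar/2008:23:00:01 -0500] "GET /y HTTP/1.1" 200 128'
    for rule in deny_rules([noisy] * 5 + [quiet], threshold=5):
        print(rule)
```

The output would still need a human to review it before anything is
pasted into the repo config, since a busy but legitimate mirror or
proxy could cross the same threshold.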

Regards,
-- 
Christopher Schmidt
MetaCarta

