[SAC] Google walking trac...

Frank Warmerdam warmerdam at pobox.com
Fri Mar 30 00:28:55 EDT 2007


Folks,

I was doing a bit of digging around to see what the httpd processes
were doing that were pegging at 100% for a while.  I had blamed this on
subversion.   It looks like I was partly right, and partly wrong.  Digging
in /proc for the httpd's I found they were doing trac related stuff, and
looking in /var/log/httpd/trac-access_log I found these entries near the end
of the file:

66.249.66.10 - - [29/Mar/2007:23:17:40 -0400] "GET /mapguide/changeset/1142/trunk?old_path=%2F&old=1142&format=zip HTTP/1.1" 200 123170821

66.249.66.10 - - [29/Mar/2007:23:21:02 -0400] "GET /fdo/changeset/2615/trunk?old_path=%2F&old=2615&format=zip HTTP/1.1" 200 208304671

It appears that googlebots have been walking the changeset links available
in Trac, and requesting changesets that were basically the complete state
of the svn repository - in the above cases about 123MB and 208MB respectively.
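
For anyone who wants to repeat this check, here is a minimal sketch of scanning an access log for very large responses. It assumes the entries follow the Apache common log format seen above; the function name large_responses and the 100MB threshold are illustrative choices, not anything we actually ran.

```python
import re

# Fields of interest from an Apache common-log-format line.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def large_responses(lines, min_bytes=100_000_000):
    """Return (ip, path, size) for responses of at least min_bytes.

    The threshold is an arbitrary illustrative default.
    """
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("size") == "-":
            continue  # skip unparseable lines and responses with no body size
        size = int(m.group("size"))
        if size >= min_bytes:
            hits.append((m.group("ip"), m.group("path"), size))
    return hits

# The two entries quoted above, each on a single line.
SAMPLE = [
    '66.249.66.10 - - [29/Mar/2007:23:17:40 -0400] "GET /mapguide/changeset/1142/trunk?old_path=%2F&old=1142&format=zip HTTP/1.1" 200 123170821',
    '66.249.66.10 - - [29/Mar/2007:23:21:02 -0400] "GET /fdo/changeset/2615/trunk?old_path=%2F&old=2615&format=zip HTTP/1.1" 200 208304671',
]

for ip, path, size in large_responses(SAMPLE):
    print(f"{ip} {size / 1e6:.0f}MB {path}")
```

To run it against the real log, pass an open file object for /var/log/httpd/trac-access_log in place of SAMPLE.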

So, I think this is a significant part of our problem.  I also suspect, but
don't know, that our httpd memory leaks (httpd's were continuing to bloat
through the day till the machine came down if they weren't restarted) were
related to this machinery.

Howard and I are putting robots.txt files into place that we hope will screen
direct access to svn and access to svn through trac.  Hopefully this will
moderate our load issues.
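
As a rough illustration of the kind of rule involved, the sketch below checks a hypothetical robots.txt against the changeset URLs from the log, using Python's standard urllib.robotparser. The Disallow lines are assumptions based only on the two paths quoted above; they are not the files we are actually deploying, and they do not cover direct svn access since those paths are not shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, derived only from the two logged paths above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mapguide/changeset/
Disallow: /fdo/changeset/
"""

def is_allowed(path, agent="Googlebot"):
    """Return True if the given agent may fetch path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)

# A changeset download like the ones in the log, and an ordinary page.
print(is_allowed("/mapguide/changeset/1142/trunk?old_path=%2F&old=1142&format=zip"))
print(is_allowed("/mapguide/wiki"))
```

Note that robots.txt only helps with crawlers that honor it; it does nothing to stop a client that ignores the file.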

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org


