[SAC] Motion: robots.txt, Drupal and performance

Frank Warmerdam warmerdam at pobox.com
Thu Mar 29 13:03:25 EDT 2007

Howard Butler wrote:
> Motion:
> I move that we disable crawlers on osgeo.org until we can do 500+ 
> pages/sec with Drupal.  In our current configuration, even if we moved 
> svn and trac off to the other server, we would not be able to keep up 
> with the bots.


I'm -1 on this.


#1 Removing ourselves from search engines is death to OSGeo's promotional
    efforts.

#2 Even with full crawler activity, I haven't seen us doing anything
    near 50 pages/sec - I think 500 pages/sec is an unnecessarily high bar.

#3 I don't think we are grokking where our load is coming from.
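
To put numbers behind points #2 and #3, something like the following rough
sketch could tally requests per user agent and the peak requests in any one
second. It assumes the web server writes Apache "combined" format access
logs; the sample lines, addresses and the "ExampleBot" agent are made up
for illustration and are not taken from our logs.

```python
import re
from collections import Counter

# Tail of an Apache "combined" log line: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')
# Timestamp field, e.g. [29/Mar/2007:13:00:01 -0400] (one-second resolution)
TS_PATTERN = re.compile(r'\[([^\]]+)\]')


def requests_by_agent(lines):
    """Count requests per user-agent string."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def peak_requests_per_second(lines):
    """Return the largest number of requests logged within a single second."""
    counts = Counter()
    for line in lines:
        match = TS_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return max(counts.values(), default=0)


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [29/Mar/2007:13:00:01 -0400] "GET /node/1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [29/Mar/2007:13:00:01 -0400] "GET /node/2 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [29/Mar/2007:13:00:02 -0400] "GET /node/3 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [29/Mar/2007:13:00:05 -0400] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for agent, count in requests_by_agent(SAMPLE).most_common():
        print(count, agent)
    print("peak requests/sec:", peak_requests_per_second(SAMPLE))
```

Run against a real access log (for example by passing an open file object
instead of SAMPLE), this would show whether crawlers actually dominate the
request count and how close we get to the pages/sec figures being discussed.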

My short term suggestion would be:

  o a moderately restrictive robots.txt (i.e. screen out stuff like the
    mapguide and fdo docs).  That way the total number of pages open to
    crawlers should be modest.
  o continued monitoring of load and activity to try and better understand
    the issues
  o try some of the performance tweaks previously suggested for Drupal
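
As a rough illustration of the first suggestion, the sketch below holds a
moderately restrictive robots.txt as a string and checks it with Python's
standard urllib.robotparser before it would be deployed. The Disallow paths
and the "ExampleBot" agent name are assumptions for the example; the real
locations of the mapguide and fdo docs on the site may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are illustrative, not the site's real layout.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mapguide/docs/
Disallow: /fdo/docs/
"""


def build_parser(text):
    """Parse robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "http://osgeo.org/",
        "http://osgeo.org/mapguide/docs/api/index.html",
        "http://osgeo.org/fdo/docs/index.html",
    ):
        allowed = parser.can_fetch("ExampleBot", url)
        print("allowed" if allowed else "blocked", url)
```

The main pages stay visible to search engines while the bulky generated
documentation is screened out, which is the point of restricting rather
than disabling crawlers.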

> I completely agree that a website not on the search engines is 
> equivalent to practically not existing.  I would rather not exist and 
> have the machine stand up.  In the last week, load issues have caused us 
> to burn through *a lot* of our volunteer admin juice.  Someone needs to 
> step forward to aggressively marshal the Drupal performance stuff and 
> get us where we need to be.  Otherwise we can just leave it alone and 
> not be on the indexes.

I agree, and understand that we can't keep babying things at the level we
have, but in the last week we have also learned a lot about our machine
configuration, and improved quite a few things.

I am prepared to continue tweaking backup schemes and robots.txt to
reach a healthy steady state.

Best regards,
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | President OSGeo, http://osgeo.org
