ATTN: mirror sites using wget

Justin Hickey jhickey at impact1.hpcc.nectec.or.th
Fri Nov 27 05:13:13 EST 1998


Hello all

If you maintain a mirror of Markus's web site using wget, you may run into the
following problem.

I recently found that wget was downloading extra files from Markus's site that
overwrote some of the pages of the grass site. My wget command was as follows:

wget -b -m -np --cut-dirs=1 -P /www/hpcc/grass -o mirrorLog \
     http://www.geog.uni-hannover.de/grass/

As far as I know, this is supposed to download only files under
http://www.geog.uni-hannover.de/grass/. However, wget was also downloading
files from http://www.geog.uni-hannover.de/grasslinks/ and
http://www.geog.uni-hannover.de/grasslinksB/. As a result, the
grasslinks/index.html file overwrote the grass/index.html file, wiping out the
main home page for the grass site. Other files with the same name may have been
overwritten as well.
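
The overwrite itself is consistent with what --cut-dirs=1 does: once the first
directory component is dropped, grass/index.html and grasslinks/index.html end
up with the same local name. The Python sketch below only illustrates that
mapping. The function name local_path is made up for this example, and the
PREFIX/host/remaining-dirs/file layout is an approximation of how wget names
files with -m and -P, not wget's actual code.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url, prefix, cut_dirs=0):
    """Approximate the local file name for a mirrored URL (illustrative only)."""
    parts = urlsplit(url)
    path = parts.path
    # A URL that ends in "/" is saved as index.html in that directory.
    if path == "" or path.endswith("/"):
        path += "index.html"
    components = [c for c in path.split("/") if c]
    dirs, filename = components[:-1], components[-1]
    # --cut-dirs=N drops the first N remote directory components.
    dirs = dirs[cut_dirs:]
    return posixpath.join(prefix, parts.netloc, *dirs, filename)


main_page = local_path("http://www.geog.uni-hannover.de/grass/",
                       "/www/hpcc/grass", cut_dirs=1)
links_page = local_path("http://www.geog.uni-hannover.de/grasslinks/index.html",
                        "/www/hpcc/grass", cut_dirs=1)

# Both map to /www/hpcc/grass/www.geog.uni-hannover.de/index.html
print(main_page)
print(main_page == links_page)
```

This accounts for the overwriting only; it does not explain why wget followed
links into grasslinks/ in the first place.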

I don't know why wget does this, but I have mailed the wget mailing list to see
what they say. If you find your mirror site has the same problem, a workaround
is to use the -X option to exclude the "offending" directories like so:

wget -b -m -np -X /grasslinks/,/grasslinksB/ --cut-dirs=1 -P /www/hpcc/grass \
     -o mirrorLog http://www.geog.uni-hannover.de/grass/
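
One way to check whether a mirror is affected is to scan the log written by
-o mirrorLog for URLs that fall outside the directory being mirrored. The
sketch below is only a rough helper: the function name urls_outside and the
sample log lines are invented for illustration, and it assumes nothing about
the log format beyond URLs appearing in it as plain text.

```python
import re
from urllib.parse import urlsplit

# Matches anything that looks like an http(s) URL, up to whitespace or a quote.
URL_RE = re.compile(r"https?://[^\s'\"<>`]+")


def urls_outside(log_text, base_url):
    """Return the sorted, de-duplicated URLs in log_text not under base_url."""
    base = urlsplit(base_url)
    # Compare against the directory with a trailing slash, so that
    # /grasslinks/ is not mistaken for something under /grass/.
    base_dir = base.path if base.path.endswith("/") else base.path + "/"
    outside = set()
    for match in URL_RE.finditer(log_text):
        url = urlsplit(match.group(0))
        if url.netloc != base.netloc or not url.path.startswith(base_dir):
            outside.add(match.group(0))
    return sorted(outside)


# Made-up log excerpt; a real run would read the file instead, e.g.
#   log_text = open("mirrorLog").read()
sample_log = """\
http://www.geog.uni-hannover.de/grass/index.html
http://www.geog.uni-hannover.de/grasslinks/index.html
http://www.geog.uni-hannover.de/grass/welcome.html
"""

for url in urls_outside(sample_log, "http://www.geog.uni-hannover.de/grass/"):
    print(url)
```

Any URL this prints is a candidate for the -X list.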

Just thought I'd let you know.

-- 
Sincerely,

Jazzman (a.k.a. Justin Hickey)  e-mail: jhickey at hpcc.nectec.or.th
High Performance Computing Center
National Electronics and Computer Technology Center (NECTEC)
Bangkok, Thailand
==================================================================
People who think they know everything are very irritating to those
of us who do.  ---Anonymous

Jazz and Trek Rule!!!
==================================================================


