Mirror site with wget

Justin Hickey jhickey at impact1.hpcc.nectec.or.th
Thu Jun 18 13:02:39 EDT 1998


Hello all

I set up a mirror site of Markus's http and ftp sites using wget. After
compiling wget, this is what I did:

1. Defined directories under my http server and my ftp server to hold the grass
data

2. Made changes (set proxies etc.) to the global wgetrc file (default is
/usr/local/etc/wgetrc). The only change worth noting is that I added the
following line so that the host name of each URL is dropped when the files
are saved:

	add_hostdir = off

Otherwise, a directory is created with the name of the host machine (eg
www.laum.uni-hannover.de) as the root of your mirror site.

3. Wrote the shell script shown below to run the wget commands (I plan to run
this script as a cron job)

------------------------------ begin script ----------------------------------

#! /bin/sh

# Rotate the logs
mv /home/webmaster/mirrorLog/httpLog3 /home/webmaster/mirrorLog/httpLog4
mv /home/webmaster/mirrorLog/httpLog2 /home/webmaster/mirrorLog/httpLog3
mv /home/webmaster/mirrorLog/httpLog1 /home/webmaster/mirrorLog/httpLog2
mv /home/webmaster/mirrorLog/httpLog0 /home/webmaster/mirrorLog/httpLog1
mv /home/webmaster/mirrorLog/ftpLog3 /home/webmaster/mirrorLog/ftpLog4
mv /home/webmaster/mirrorLog/ftpLog2 /home/webmaster/mirrorLog/ftpLog3
mv /home/webmaster/mirrorLog/ftpLog1 /home/webmaster/mirrorLog/ftpLog2
mv /home/webmaster/mirrorLog/ftpLog0 /home/webmaster/mirrorLog/ftpLog1

# Get the grass html pages
wget -b -m -np --cut-dirs=3 -P /www/grass -o /home/webmaster/mirrorLog/httpLog0 \
    http://www.laum.uni-hannover.de/iln/grass/grass42/

# Get the grass ftp pages
wget -b -m -np --cut-dirs=2 -P /ftp/grass -o /home/webmaster/mirrorLog/ftpLog0 \
    ftp://130.75.72.14/pub/grass421/

----------------------------------- end script ------------------------------
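
Step 2 edits the global wgetrc. wget also reads a per-user $HOME/.wgetrc, so
the same setting can live there instead. The following is only a sketch: the
function name ensure_no_hostdir is made up for illustration, and it simply
appends the line unless the file already sets add_hostdir.

```shell
#!/bin/sh

# ensure_no_hostdir FILE
# Append "add_hostdir = off" to FILE unless FILE already sets add_hostdir.
# (Hypothetical helper name; only the add_hostdir setting comes from the post.)
ensure_no_hostdir() {
    rc=$1
    # Create the file if it does not exist yet
    touch "$rc"
    if ! grep -q '^[[:space:]]*add_hostdir[[:space:]]*=' "$rc"; then
        printf 'add_hostdir = off\n' >> "$rc"
    fi
}

# Example call (commented out so that sourcing this file changes nothing):
# ensure_no_hostdir "$HOME/.wgetrc"
```

For a single run, the -nH (--no-host-directories) command-line option has the
same effect as this setting.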

Notes:

The script keeps a rotation of 5 logs each of the http and ftp wget output
(httpLog0 to httpLog4 and ftpLog0 to ftpLog4, with 0 being the newest)

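Each wget command must reach the shell as a single command (of course); if one
is wrapped onto a second line, join the lines or end the first line with a
backslash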
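
The eight mv commands in the script could also be written as a loop. This is a
minimal sketch, not what the script actually uses: rotate_logs is a
hypothetical function name, and it skips any log that does not exist yet
(plain mv would print an error on the first few runs).

```shell
#!/bin/sh

# rotate_logs DIR PREFIX [KEEP]
# Shift DIR/PREFIX0 -> DIR/PREFIX1, DIR/PREFIX1 -> DIR/PREFIX2, and so on,
# keeping KEEP logs in total (default 5). The oldest log is overwritten.
rotate_logs() {
    dir=$1
    prefix=$2
    keep=${3:-5}
    i=$((keep - 1))
    # Work from the oldest log down so nothing is overwritten too early
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -f "$dir/$prefix$prev" ]; then
            mv -f "$dir/$prefix$prev" "$dir/$prefix$i"
        fi
        i=$prev
    done
}

# Example calls matching the script in this post (commented out):
# rotate_logs /home/webmaster/mirrorLog httpLog 5
# rotate_logs /home/webmaster/mirrorLog ftpLog 5
```

After a call, PREFIX0 no longer exists, so wget's -o option starts a fresh log
there.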

Explanation of the options:

	-b		run in the background
	-m		use the mirror options
	-np		no parent files - only download files that are under the
			given URL, even if there are links to files in other
			directories (eg without -np, if there is a link to the
			website's top page you will download the whole site
			instead of just the grass files)
	--cut-dirs=n	remove n directories from the path of the URL (eg the
			http URL is www.laum.uni-hannover.de/iln/grass/grass42/
			so if n equals 3 then iln/grass/grass42 is removed from
			the path. Otherwise the mirror site will have
			iln/grass/grass42 as its root)
	-P <dest>	path to the mirror site destination
	-o <log>	specify the log file
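
To see how -P and --cut-dirs combine, here is a rough sketch that prints where
a file should end up. It assumes the host directory is dropped (add_hostdir =
off, or -nH) and ignores query strings and other special cases. The function
name local_path and the file names index.html and README are made up for
illustration.

```shell
#!/bin/sh

# local_path PREFIX CUT URL
# Print the local file name expected for URL when it is saved under PREFIX
# with CUT leading directories removed and no host directory.
local_path() {
    prefix=$1
    cut=$2
    url=$3
    path=${url#*://}    # drop the scheme (http://, ftp://)
    path=${path#*/}     # drop the host name
    while [ "$cut" -gt 0 ]; do
        case $path in
            */*) path=${path#*/} ;;   # drop one leading directory
            *)   break ;;             # only the file name is left
        esac
        cut=$((cut - 1))
    done
    printf '%s/%s\n' "$prefix" "$path"
}

local_path /www/grass 3 http://www.laum.uni-hannover.de/iln/grass/grass42/index.html
local_path /ftp/grass 2 ftp://130.75.72.14/pub/grass421/README
```

The two calls print /www/grass/index.html and /ftp/grass/README. With
--cut-dirs=0 the first would print /www/grass/iln/grass/grass42/index.html
instead.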

I hope this is of help to anyone who is setting up a mirror site.

-- 
Sincerely,

Jazzman (a.k.a. Justin Hickey)  e-mail: jhickey at hpcc.nectec.or.th
High Performance Computing Center
National Electronics and Computer Technology Center (NECTEC)
Bangkok, Thailand
==================================================================
People who think they know everything are very irritating to those
of us who do.  ---Anonymous

Jazz and Trek Rule!!!
==================================================================


