Mirror site with wget
Justin Hickey
jhickey at impact1.hpcc.nectec.or.th
Thu Jun 18 13:02:39 EDT 1998
Hello all
I set up a mirror site of Markus's http and ftp sites using wget. After
compiling wget, this is what I did:
1. Defined directories under my http server and my ftp server to hold the grass
data
2. Made changes (set proxies etc.) to the global wgetrc file (default is
/usr/local/etc/wgetrc). The only change worth noting is that I added the
following line so that the host name of the URLs would be dropped when saving
them:
add_hostdir = off
Otherwise, wget creates a directory named after the host machine (e.g.
www.laum.uni-hannover.de) as the root of your mirror site.
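
If you would rather not touch the global wgetrc, wget's -nH
(--no-host-directories) command-line option has the same effect for a single
run. A minimal sketch, assuming the same URL and destination directory as in
the script further down; mirror_grass_http is only an illustrative function
name, and defining it does not start a download:

```shell
#!/bin/sh
# mirror_grass_http is a hypothetical wrapper, not part of wget.
# Nothing is downloaded until the function is actually called.
mirror_grass_http() {
    # -nH (--no-host-directories) matches "add_hostdir = off" in wgetrc
    wget -m -np -nH --cut-dirs=3 -P /www/grass \
        http://www.laum.uni-hannover.de/iln/grass/grass42/
}
```

Calling mirror_grass_http then runs the mirror in the foreground with the host
directory suppressed, whatever the wgetrc says.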
3. Wrote the shell script shown below to run the wget commands (I plan to run
it as a cron job):
------------------------------ begin script ----------------------------------
#! /bin/sh
# Rotate the logs
mv /home/webmaster/mirrorLog/httpLog3 /home/webmaster/mirrorLog/httpLog4
mv /home/webmaster/mirrorLog/httpLog2 /home/webmaster/mirrorLog/httpLog3
mv /home/webmaster/mirrorLog/httpLog1 /home/webmaster/mirrorLog/httpLog2
mv /home/webmaster/mirrorLog/httpLog0 /home/webmaster/mirrorLog/httpLog1
mv /home/webmaster/mirrorLog/ftpLog3 /home/webmaster/mirrorLog/ftpLog4
mv /home/webmaster/mirrorLog/ftpLog2 /home/webmaster/mirrorLog/ftpLog3
mv /home/webmaster/mirrorLog/ftpLog1 /home/webmaster/mirrorLog/ftpLog2
mv /home/webmaster/mirrorLog/ftpLog0 /home/webmaster/mirrorLog/ftpLog1
# Get the grass html pages
wget -b -m -np --cut-dirs=3 -P /www/grass -o /home/webmaster/mirrorLog/httpLog0
http://www.laum.uni-hannover.de/iln/grass/grass42/
# Get the grass ftp pages
wget -b -m -np --cut-dirs=2 -P /ftp/grass -o /home/webmaster/mirrorLog/ftpLog0
ftp://130.75.72.14/pub/grass421/
----------------------------------- end script ------------------------------
Notes:
The script keeps a rotation of 5 logs each for the http and the ftp wget output.
Each wget command must be on a single line (of course); they are wrapped here
only by the mail.
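
The eight mv lines can also be written as a loop. This is only a sketch of one
possible variant, not what my script does: rotate_logs is a made-up helper
name, and unlike the plain mv lines it skips logs that do not exist yet (for
example on the very first run).

```shell
#!/bin/sh
# rotate_logs DIR NAME KEEP
# Hypothetical helper: shifts NAME0 -> NAME1 -> ... -> NAME(KEEP-1),
# starting from the oldest log so nothing is overwritten too early.
rotate_logs() {
    dir=$1 name=$2 keep=$3
    n=$((keep - 1))
    while [ "$n" -gt 0 ]; do
        prev=$((n - 1))
        # Only move logs that exist, so a first run stays quiet
        if [ -f "$dir/$name$prev" ]; then
            mv -f "$dir/$name$prev" "$dir/$name$n"
        fi
        n=$prev
    done
}
```

With the paths from the script, "rotate_logs /home/webmaster/mirrorLog httpLog 5"
and "rotate_logs /home/webmaster/mirrorLog ftpLog 5" would replace the two
groups of mv commands.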
Explanation of the options:
-b             run in the background
-m             use the mirror options (recursive retrieval with
               time-stamping)
-np            no parent files - only download files that are under the
               given URL, even if there are links to files in other
               directories (e.g. without -np, if there is a link to the
               website's top page you will download the whole site
               instead of just the grass files)
--cut-dirs=n   remove n directories from the path of the URL (e.g. the http
               URL is www.laum.uni-hannover.de/iln/grass/grass42/; if n
               equals 3 then iln/grass/grass42 is removed from the URL.
               Otherwise the mirror site will have iln/grass/grass42
               as its root)
-P <dest>      path to the mirror site destination
-o <log>       specify the log file
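
To make the interaction of -P, --cut-dirs and the host directory setting
concrete, here is a small sketch that predicts where a file should end up. It
does not call wget at all; local_path is a made-up helper, and the file names
index.html and README are only assumed examples, not files I checked on the
servers.

```shell
#!/bin/sh
# local_path PREFIX URL CUT_DIRS [on|off]
# Hypothetical helper: prints the local path a URL would be saved under.
# The 4th argument mimics add_hostdir and defaults to "off".
local_path() {
    prefix=$1 url=$2 cut=$3 hostdir=${4:-off}
    rest=${url#*://}      # drop the scheme (http://, ftp://)
    host=${rest%%/*}      # host name or address
    path=${rest#*/}       # directories and file name after the host
    i=0
    # Drop one leading directory per step; never drop the file name itself
    while [ "$i" -lt "$cut" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   break ;;
        esac
        i=$((i + 1))
    done
    if [ "$hostdir" = on ]; then
        printf '%s/%s/%s\n' "$prefix" "$host" "$path"
    else
        printf '%s/%s\n' "$prefix" "$path"
    fi
}

# http mirror from the script: 3 directories cut, host directory off
local_path /www/grass http://www.laum.uni-hannover.de/iln/grass/grass42/index.html 3
# ftp mirror from the script: 2 directories cut
local_path /ftp/grass ftp://130.75.72.14/pub/grass421/README 2
# no cutting and host directory on, for comparison
local_path /www/grass http://www.laum.uni-hannover.de/iln/grass/grass42/index.html 0 on
```

The first two calls print /www/grass/index.html and /ftp/grass/README; the
third prints the long path
/www/grass/www.laum.uni-hannover.de/iln/grass/grass42/index.html that the two
settings are meant to avoid.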
I hope this is of help to anyone who is setting up a mirror site.
--
Sincerely,
Jazzman (a.k.a. Justin Hickey) e-mail: jhickey at hpcc.nectec.or.th
High Performance Computing Center
National Electronics and Computer Technology Center (NECTEC)
Bangkok, Thailand
==================================================================
People who think they know everything are very irritating to those
of us who do. ---Anonymous
Jazz and Trek Rule!!!
==================================================================