Tuesday, April 17, 2012

Using wget to download files recursively from an https site

wget is a command-line tool, used mostly on Linux, for downloading files over HTTP, HTTPS or FTP. To download a single file, a simple invocation of wget like
  wget http://mylibrary.com/build-kits/rel-2.7.1/prod1/prod1-2.7.1.10.tar
will download prod1-2.7.1.10.tar to the current directory.

To download a set of files (e.g. all .tar files in a directory) from a secured https site, we need to pass several options to it
  wget --no-check-certificate -r -l1 --no-parent -A.tar https://mylibrary.com/build-kits/rel-2.7.1/prod1/
where
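  --no-check-certificate: don't validate the server's certificate (handy for self-signed certificates, but it removes the protection against an impersonated server)
  -r or --recursive: specify recursive download
  -l1 or --level=NUMBER: maximum recursion depth (inf or 0 for infinite); with 1, only files linked directly from the given page are fetched
  --no-parent: don't ascend to the parent directory
  -A or --accept=LIST: comma-separated list of accepted extensions
  -nd or --no-directories: don't create directories. This option is not part of the command above; without it, wget recreates the host and path as a directory tree (mylibrary.com/build-kits/...) under the current directory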
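
The options above can be wrapped in a small shell function so the command does not have to be retyped for each directory. This is only a sketch: the function name fetch_by_ext and the DRY_RUN variable are made up for illustration and are not part of wget, and -nd is added here on the assumption that a flat download into the current directory is wanted.

```shell
#!/bin/sh
# Hypothetical helper (name and DRY_RUN switch are illustrative, not part of wget):
# fetch every file with a given extension from a single directory listing.
fetch_by_ext() {
    url=$1          # directory URL, should end with a slash
    ext=${2:-tar}   # extension to accept, defaults to tar

    # Same options as described above, plus -nd to keep the files flat.
    set -- wget --no-check-certificate -r -l1 --no-parent -nd "-A.$ext" "$url"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"   # print the command instead of running it
    else
        "$@"        # run wget with the assembled options
    fi
}
```

Running it as
  DRY_RUN=1 fetch_by_ext https://mylibrary.com/build-kits/rel-2.7.1/prod1/
prints the wget command without touching the network, which is a cheap way to check the options before starting a real download.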
