Archive a website with wget
On occasion I have needed to download the contents of a website before it was decommissioned. I wanted to keep the content, but had no interest in maintaining the entire website and its data. A quick and easy way to do this is with wget.
To do this, open a terminal and type:
mkdir website-name
cd website-name
You can name the folder anything you like.
Now run the following command:
wget --limit-rate=200k --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla https://joshdawes.com
Replace joshdawes.com with the site you actually want to archive, and make sure it is one you have permission to copy, such as a site you own. Here is what each flag does:
- --limit-rate=200k: Limit the download to 200 KB/s; higher rates can look suspicious to the server.
- --no-clobber: Do not overwrite files that already exist, which is useful if you need to resume an interrupted run.
- --convert-links: Rewrite links in the downloaded pages so they work locally, offline (see the check after this list).
- --random-wait: Insert random pauses between requests.
- -r: Recursive; follows links to download the full site.
- -p: Downloads page requisites such as images and stylesheets so pages render properly.
- -E: Saves files with the appropriate extension (e.g. .html).
- -e robots=off: Ignore robots.txt restrictions (only use this where you have permission).
- -U mozilla: Send a browser-like user agent string.
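Once the download finishes, wget puts everything in a folder named after the host (joshdawes.com in this example), and thanks to --convert-links the copy should be browsable offline. As a quick check, you can open the saved pages directly, or serve the folder locally. The following is a minimal sketch that assumes Python 3 is installed and that the site has an index.html at its root:

cd joshdawes.com
python3 -m http.server 8000

Then visit http://localhost:8000 in your browser and click around to confirm the internal links resolve to your local copies rather than the live site.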