Software Query

I’m looking for an application that will allow me to download the entire contents of a web site in one throw. In the old days, I would have used GoZilla for this purpose, but GoZilla seems to have fallen on hard times.

Ideally, I’d like to be able to download just the pages on a site that contain particular keywords, but the whole site would do.

Can anybody recommend an application that does this that (a) runs on Windows XP, (b) ain’t spyware, and (c) is free- or shareware?

Thanks much, folks!

Author: Jimmy Akin

Jimmy was born in Texas, grew up nominally Protestant, but at age 20 experienced a profound conversion to Christ. Planning on becoming a Protestant seminary professor, he started an intensive study of the Bible. But the more he immersed himself in Scripture the more he found to support the Catholic faith, and in 1992 he entered the Catholic Church. His conversion story, "A Triumph and a Tragedy," is published in Surprised by Truth. Besides being an author, Jimmy is the Senior Apologist at Catholic Answers, a contributing editor to Catholic Answers Magazine, and a weekly guest on "Catholic Answers Live."

26 thoughts on “Software Query”

  1. No, this has nothing to do with secret project #4. That’s something I am working on at the moment (with a bunch of other people), though.

  2. Oops. Forgot a closing tag there. Anyway, both links still work (in Firefox at least), only the first spills over to the edge of the second.

  3. Yes, I second the suggestion of Publius. Try wget. It’s a command-line utility, which means that it can also be easily run from within a shell script (or batch file on MS-Windows).
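    Since wget is just a command, the "run it from a script" idea can be sketched like this. (The site URL, folder name, and extra flags below are placeholders I'm assuming for illustration, not something from the thread; the script only echoes the command until you uncomment the last line.)

    ```shell
    #!/bin/sh
    # Reusable mirror script: put the real site in SITE, then uncomment the
    # final line to actually start the download.
    SITE="http://www.example.com"   # placeholder URL

    # -m  mirror mode (recursion + timestamping)
    # -k  rewrite links so the local copy browses correctly offline
    # -p  also fetch images/CSS needed to render each page
    # -w 1  wait one second between requests, to be polite to the server
    CMD="wget -m -k -p -w 1 $SITE"

    echo "$CMD"   # show what would run
    # $CMD        # uncomment to run the download for real
    ```

    The same file works as a batch-file equivalent on Windows with minor syntax changes, which is the scriptable/repeatable angle mentioned above.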

  4. I use WebZIP from Spidersoft. It works great and downloads everything exactly as it is on the server (which it sounds like you want). It has multiple options, so you can set it to take just what you want. The program is shareware with a free trial period. (If you get your hands on an old version, the trial period doesn’t end; old versions work just fine for plain text sites, though not for database- or PHP-driven ones.)
    http://www.spidersoft.com/

  5. Jimmy, it sounds like you want a web mirroring program, not just a program that downloads chosen files or downloads all linked files in a particular page, some levels deep. A program that can do web mirroring synchronizes every file hosted in an online directory with a backup location on your computer.
    So I don’t think you want Firefox’s extension, Download Them All, because all that does is download everything linked on a given page. Scrapbook is nice, but similarly, it requires you to start on a given page and tell it to fetch what’s linked up to X levels deep, so it’s still possible to miss parts of the site, because Scrapbook depends on HTML links to discover content. I seriously doubt Scrapbook would be able to get everything from a site like catholic.com even if it was set to go many levels deep.
    A cursory search on Tucows gave me the shareware result “AJC Directory Synchronizer”. I’m sure there are many other site mirroring tools. I think you’ll want to use the terms “site” and “mirror” in your search criteria on download sites such as Tucows or Download.com, to get specifically what you’re looking for as opposed to the rest.

  6. Oh, just wanted to add: Besides “site” and “mirror”, try additionally the search term “synchronize”.
    I also found this page for you which can give you an intro to mirroring: http://www.boutell.com/newfaq/creating/mirroring.html
    They recommend wget. Here’s how it says to use wget to mirror a site:

    Type wget -m -k http://xxx.yyy and press enter.

    where http://xxx.yyy of course would be replaced with, say, http://www.catholic.com
    I’d create a new folder just for this first, to keep your downloads tidy, and then change into the new folder (cd) from your command prompt before issuing the wget command.
    (If Catholic.com is the site you want to mirror, make sure to use the “www” in the URL to avoid downloading a bunch of forum posts from forums.catholic.com, assuming you don’t want to grab those.)
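    One gap worth noting: wget can’t filter by page content on its own, so it won’t directly do the "only pages with particular keywords" part of the original question. A common workaround is to mirror first and then use grep to list just the matching files. Here is a sketch that fakes a tiny mirrored folder so the filtering step is visible without a live download (the folder, file names, and keyword are all made up):

    ```shell
    # Simulate a small mirrored folder (stand-in for wget's output directory):
    mkdir -p demo-mirror
    printf '<html><body>apologetics article</body></html>' > demo-mirror/a.html
    printf '<html><body>unrelated page</body></html>'      > demo-mirror/b.html

    # -r recurse into the folder, -l print only the names of matching files:
    grep -rl 'apologetics' demo-mirror
    ```

    On a real mirror you would point grep at the folder wget created (named after the host) and swap in your own keyword.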

  7. I’d second the recommendation for wget if you’re looking for something scriptable/repeatable. It has loads of parameters/switches for tweaking how it crawls through a website and what it does with the resulting files. I’ve only ever played with it in the Unix/Linux/OSX domain, but I’m sure there are Windows ports available.

  8. I would like to find something that would accomplish the same job, but for Safari (on a Mac of course). Any ideas?

  9. I think Adobe Acrobat Pro v7.0 can do what you’re talking about. I do something similar with Adobe Acrobat Standard 5.0, and I think 7.0 pro has this feature and then some.
    Of course, I’m pretty sure it will download them as PDFs, and that would present some difficulty if you wanted to convert them over to another format, I think. Also, Adobe software is pricey.
    So, FWIW,
