Broken Links

View the latest version of this manual online at http://mindprod.com/application/brokenlinks.manual.html.
Introduction
Why use Xenu?
How to Use Xenu
Configuring Brokenlinks
Running Brokenlinks
Presumed Good File
Sample Text Report
Sample HTML Export
Repairing Broken Links
Automatically Repairing Redirects
ReplaceURLs
Futures
Links

Introduction

BrokenLinks is a tool to help you find and track broken links on your website, namely URLs (Uniform Resource Locators) that no longer point to anything useful. It is a back end to the Xenu broken link detector that compensates for XENU’s weakness of overwhelming you with reports of links that are not really broken. BrokenLinks whittles XENU’s giant list of broken links down to the ones you should look at first. This saves you immense amounts of time researching links that are not really broken.
Both XENU and BrokenLinks share a common limitation. They can’t detect a broken link that has been redirected to a working place-holder site, e.g. one advertising that the domain is up for sale. Similarly, some sites just quietly redirect all broken links to the home page. BrokenLinks cannot detect that. Most embarrassingly, BrokenLinks can’t detect a domain bought out by a pornography company. You can still have people threaten to sue or kill you for deliberately trying to send them to a porn site.

Why use Xenu?

Finding the broken links is only 10% of the work. Fixing them is what is so labour intensive. If you let your website deteriorate with broken links, visitors become frustrated and stop visiting. Having clean links encourages Google to take your site more seriously.

How to Use Xenu

Download and install a free copy of Xenu Link Sleuth.

First you spider your local copy of your website with Xenu. Read the Xenu documentation on how to do that. You first have to be sure XENU is working properly before BrokenLinks will work. Use XENU directly to find orphans.

Once you are pretty sure you have XENU configured correctly, run it on your local website, with external link checking turned on.

Be careful to verify the check external links option is on at the very last moment before you start the spidering.
When it has finished spidering your website and checking all the links, click Export Page Map to TAB-separated File. (Don’t confuse this with Export to TAB-separated File). You may optionally get XENU to also produce an HTML (Hypertext Markup Language) report.

Configuring Brokenlinks

Download and install a free copy of Brokenlinks.

The first time you use BrokenLinks you must configure it by creating a text file with a text editor. It will look something like this:

Configure it according to the embedded comments. Then save the file, giving it a name of the form xxxx.properties.

The properties are all pretty straightforward except for brokenForgivenessDays=7.

  1. If you have only a handful of broken links, and you religiously run XENU/BrokenLinks every day, you might set brokenForgivenessDays=2, though I still set it to 6. One advantage of running every day is you stay on top of researching and repairing broken links. You are never faced with large numbers of them to fix all at once. I personally run BrokenLinks twice a day so that I test sites at different times of day, avoiding treating them as dead when they are just temporarily down for backup. Further, that way I rarely have more than a couple of links to research at any one time.
  2. If you have only a handful of broken links, and you religiously run XENU/BrokenLinks twice a week, use brokenForgivenessDays=5
  3. If you don’t want to think about brokenForgivenessDays, leave this property out, and accept the default: brokenForgivenessDays=7
  4. If you have only a handful of broken links, and you religiously run XENU/BrokenLinks every week, use brokenForgivenessDays=8
  5. If you have hundreds of broken links, and you run XENU/BrokenLinks only every once in a while, use brokenForgivenessDays=14
  6. You can experiment setting it to various values. The smaller the brokenForgivenessDays number, the sooner and the more broken links will be revealed to you. However, you will be pestered with more temporarily broken links. If you are feeling overwhelmed by broken links, increase the value to show you only the deadest links. The minimum value that makes much sense is 1. XENU itself effectively uses 0.
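
The effect of brokenForgivenessDays can be pictured with a small Java sketch. It is not the BrokenLinks source. The class name ForgivenessSketch, the helper shouldReport, and the rule of counting days since the link last tested good are all assumptions made for illustration; only the property name and its default of 7 come from this manual.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Properties;

/** Hypothetical illustration only; not part of BrokenLinks. */
final class ForgivenessSketch {

    /** Reads brokenForgivenessDays, falling back to the documented default of 7. */
    static int forgivenessDays(Properties props) {
        return Integer.parseInt(props.getProperty("brokenForgivenessDays", "7").trim());
    }

    /**
     * Assumed rule: report a link only when at least forgivenessDays days
     * have passed since it last tested good.
     */
    static boolean shouldReport(LocalDate lastGood, LocalDate today, int forgivenessDays) {
        return ChronoUnit.DAYS.between(lastGood, today) >= forgivenessDays;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Stand-in for the contents of xxxx.properties.
        props.load(new StringReader("brokenForgivenessDays=5\n"));
        int days = forgivenessDays(props);
        LocalDate today = LocalDate.of(2010, 2, 2);
        // Last good 8 days ago: past the forgiveness period, so reported.
        System.out.println(shouldReport(LocalDate.of(2010, 1, 25), today, days));
        // Last good 2 days ago: still forgiven, so not reported.
        System.out.println(shouldReport(LocalDate.of(2010, 1, 31), today, days));
    }
}
```

Under this assumed rule, a smaller value reports a link sooner after it first fails, which matches the trade-off described in item 6.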

BrokenLinks Files

File
    Description
brokenlinks.html
    List of broken links that have remained broken for a number of days. In HTML format so that you can embed them in an HTML page to view and research them with a browser.
brokenlinks.properties
    Master BrokenLinks configuration file. It names and locates the other files. You might rename it to some other *.properties name. You specify the name of this file on the BrokenLinks command line.
DESCRIPT.ION
    Optional TCC (Take Command line) file descriptions for the TCC Describe program.
history.bin
    Link checking history database. In binary, not human readable. It contains records of all the links on your website and when they were last tested good and last tested bad (echoes of Santa Claus). It gets updated each time you run BrokenLinks with information from the XENU spider and from BrokenLinks’s own slower but more reliable tests.
permanentRedirects.csv
    URLs that have been permanently redirected. You will likely want to update most of these to the new value with ReplaceURLs.
presumedgood.csv
    List of presumed good URLs that BrokenLinks will not check because they fail even though they are actually OK.
report.txt
    Report from BrokenLinks on how the last run went.
temporaryRedirects.csv
    URLs that have been temporarily redirected. You might want to update a few of these to the new value with ReplaceURLs.
xenupage.csv
    Output from XENU that BrokenLinks uses for input.
_O_V_E_R_V_I_E_W.txt
    An optional one-line description of each file.

Running Brokenlinks

Now run BrokenLinks like this:
java.exe -jar brokenlinks.jar xxxx.properties
If you have Jet, you can simplify that to:
brokenlinks.exe xxxx.properties

You will get a report of the critical broken links to research, in both text and HTML form. Embed the HTML in a web page somewhere. Here is my list of broken links for mindprod.com. The layout is designed to make it easy to research the problems. You can click to get the page where the broken link is, or click to where it was trying to go.

Then research the broken links and fix them. Then run XENU again, click Export Page Map to TAB-separated File and run BrokenLinks. Run this cycle at different times of the day, since some websites shut down for part of the day for maintenance. You want to catch them when they are up. Run the cycle after repairing a batch of links to see how you did. After you get the list whittled down to none, run the cycle weekly, twice weekly or daily to stay on top of the broken links. I find running it daily works best since you never get overwhelmed with work, and thus are not tempted to postpone the work.

If you are pressed for time, you can also rerun BrokenLinks without a new XENU run. This will catch most of the problems you would catch by rerunning XENU, but not all.

If you erase the history.bin file, it will automatically start over from scratch collecting history.

It is best to run BrokenLinks at various times of day so that you won’t think a site is down that is just offline for an hour each day for backup. I am a bit compulsive. I run it twice a day.

Presumed Good File

If you find a link that XENU/BrokenLinks thinks is broken, but which is actually ok, or it doesn’t matter for some reason, add it to your list of presumed good links. The presumedgood.csv CSV (Comma Separated Value) file will look something like this:
Thereafter that presumed good link will be excluded from the broken links list.
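
The exclusion step can be sketched in Java as follows. This is only an illustration, not the BrokenLinks source. It assumes presumedgood.csv holds one URL per line (the real layout is not shown here), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not part of BrokenLinks. */
final class PresumedGoodSketch {

    /** Collects the non-blank lines into a set of URLs (assumed layout: one URL per line). */
    static Set<String> parsePresumedGood(List<String> lines) {
        Set<String> urls = new HashSet<>();
        for (String line : lines) {
            String url = line.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    /** Returns the broken links that are not presumed good, in their original order. */
    static List<String> excludePresumedGood(List<String> broken, Set<String> presumedGood) {
        List<String> remaining = new ArrayList<>();
        for (String url : broken) {
            if (!presumedGood.contains(url)) {
                remaining.add(url);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        Set<String> good = parsePresumedGood(
                List.of("http://localhost/", "http://www.akademika.no/"));
        List<String> broken = List.of(
                "http://localhost/", "http://ru.rbytes.net/info/submit/");
        // Only the link that is not on the presumed good list survives.
        System.out.println(excludePresumedGood(broken, good));
    }
}
```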

Sample Text Report

Here is roughly what the text report that BrokenLinks produces will look like:

Sample HTML Export

Here is roughly what the combined broken links and presumed good HTML report that BrokenLinks produces will look like:

Broken Links Sorted by Error Code

There are 5 links that have been broken for at least 6 days yet to be fixed. Last revised: 2010-02-02

Broken Links by Status Code
Status Code  Links To
    Linked From
no connect  http://www.spectra-strip.amphenol.com/
    /bgloss/cables.html
Not Found  http://chronicle.augusta.com/stories/110602/met_143-4952.000.shtml
    http://www.augustachronicle.com/stories/110602/met_143-4952.000.shtml
Not Found  http://lyle.smu.edu/~lghuang/CSE8313/notes/Module2_UML_Part1.ppt
    /jgloss/uml.html
Not Found  http://ru.rbytes.net/info/submit/
    /jgloss/minorhassle.html
Not Found  http://www.ifaw.org/ifaw_canada_english/join_campaigns/save_baby_seals_end_the_seal_hunt/seal_hunt_facts_canada_doesn't_want_you_to_know/index.php
    /animalrights/sealhunt.html

Links Presumed Good

Xenu claims the following links are broken, but they have been manually found to be good. They should be manually rechecked from time to time. The problem may be an unknown SSL (Secure Sockets Layer) certificate authority which needs to be OKed manually (a missing/unknown/uninstalled certificate root authority), or it may be that the website sends the data, but with not-found status.

There are 13 links marked as presumed good despite what Xenu says. Last revised: 2010-02-02

Links Presumed Good
Link To
http://cgi.omroep.nl/cgi-bin/streams?/rnw/smac/2004/amsterdam_forum__chomsky_on_iraq_and_war_on_terror_20051216_low.rm
http://localhost/
http://www.akademika.no/
http://www.glish.com/css/7.asp
http://www.microsoft.com/windows/windowsmedia/download/AllDownloads.aspx
http://www.os2site.com/sw/internet/time/clock2.htm
http://www.telegraph.co.uk/news/yourview/1562772/David-Cameron-answers-your-questions.html
http://www.thefreedictionary.com/
http://www.theserverside.com/tt/books/wiley/masteringEJB/
https://player.helixcommunity.org/2008/help/playerfaq.html
https://tsa.aloaha.com/
https://www.eecs.harvard.edu/mailman/listinfo/jopt-users
https://www.foldershare.com/welcome.aspx


Repairing Broken Links

Here are some tips to help you find a replacement link for a broken one.

Automatically Repairing Redirects

BrokenLinks can automatically repair permanently redirected URLs. Websites often reorganise, and leave behind tombstones on the old page that describe where the information is now. Your browser will automatically follow these chains to find the new information. You know this has happened when the URL displayed when the page is found does not match the original. It is best to update your web pages with the new link since they browse faster by going direct to the link, and because they will continue to work if the tombstone is deleted.

BrokenLinks has a feature to automatically maintain these changes for you. BrokenLinks automatically exports a redirects.csv CSV file that gives the old URL, the new URL, and the pages where the old URL appears. It is best to manually examine this list to prune any changes you don’t want to apply, e.g. Yahoo’s replacement links that go preposterously on and on and on. Then use replaceURLs to process that file and apply the changes to your local website mirror. Best take a backup before you try it out. If you generate URLs with code, or import them from databases, replaceURLs will correct your website, but the changes will be undone the next time you regenerate your HTML.

You can use the CSVPatch utility to automatically replace URLs in CSV files as well.

Here is the TakeCommand script I use to run BrokenLinks, automatically discard some of the redirects I won’t apply, let me edit the list of both permanent and temporary links, and also use them to update two CSV files, hassle.csv and air.csv.

I also scan the temporary redirects looking for redirects to pages with names containing words like error or suspended. I then manually check these out. Usually it means the website owner has not paid his ISP (Internet Service Provider) bills and the account has been suspended. Sometimes sites have died, or not paid bills, and the owner or ISP redirects them to another living site, sometimes the ISP’s or someone else’s parking site. He should use a permanent redirect, but uses a temporary one instead. I can catch these by eyeballing the list. The list is mostly just internal housekeeping junk, so I don’t scan it carefully every day.
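
That eyeball scan could be partly automated. The Java sketch below is an assumption-laden illustration, not something BrokenLinks provides. The class name, the method names and the example.com URLs are hypothetical, and the word list holds just the two words mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration only; not part of BrokenLinks. */
final class SuspiciousRedirectSketch {

    /** Words that hint at a suspended account or an error page. Extend to taste. */
    static final List<String> WARNING_WORDS = List.of("error", "suspended");

    /** True if the redirect target contains any warning word, ignoring case. */
    static boolean looksSuspicious(String targetUrl) {
        String lower = targetUrl.toLowerCase(Locale.ROOT);
        for (String word : WARNING_WORDS) {
            if (lower.contains(word)) {
                return true;
            }
        }
        return false;
    }

    /** Picks out the redirect targets worth checking by hand. */
    static List<String> suspiciousTargets(List<String> targets) {
        List<String> hits = new ArrayList<>();
        for (String target : targets) {
            if (looksSuspicious(target)) {
                hits.add(target);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> targets = List.of(
                "http://example.com/news/index.html",
                "http://example.com/cgi-sys/Suspended.cgi",
                "http://example.com/error404.html");
        System.out.println(suspiciousTargets(targets));
    }
}
```

A substring match like this will also flag innocent pages whose names happen to contain those words, so the hits still need a manual look.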

ReplaceURLs

ReplaceURLs is a command line utility that takes only one parameter, the name of the file of redirects, e.g.
rem run replaceURLs to update all the redirected URLs on a website
java.exe -jar replaceurls.jar E:\redirects.csv
You don’t have to tell replaceURLs where your local website mirror is. The names of the files that need changing are in redirects.csv. You told it earlier when you configured BrokenLinks where your website files were and you also told XENU.

You might want to repair some of the links manually. You want to make sure the new link truly points to the original information, not some parking page. Just prune the ones you want to ignore or handle manually, and feed the remainder to replaceURLs.
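
The core replacement step might look like the Java sketch below. It is a guess at the general technique, not the replaceURLs source. It works on page text held in memory, the class and method names are hypothetical, and it assumes every URL in the page is enclosed in double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of replaceURLs. */
final class ReplaceUrlSketch {

    /** Replaces each double-quoted old URL in the page text with its new URL. */
    static String applyRedirects(String html, Map<String, String> redirects) {
        String result = html;
        for (Map.Entry<String, String> redirect : redirects.entrySet()) {
            // Match the URL together with its quotes, so that an old URL that is
            // only the prefix of a longer URL is left alone.
            String oldQuoted = "\"" + redirect.getKey() + "\"";
            String newQuoted = "\"" + redirect.getValue() + "\"";
            result = result.replace(oldQuoted, newQuoted);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new LinkedHashMap<>();
        redirects.put("http://old.example.com/a", "http://new.example.com/a");
        String page = "<a href=\"http://old.example.com/a\">one</a> "
                + "<a href=\"http://old.example.com/a/b\">two</a>";
        // The first link is updated; the second, longer URL is untouched.
        System.out.println(applyRedirects(page, redirects));
    }
}
```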

ReplaceURLs presumes all your URLs are pure lower case. It won’t find them if they are mixed or all upper case (except for the tail end path part). Some validator programs will complain about URLs not in all lower case. You can condition your website to use all lower case URLs by running TidyURLs.
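
To make the lower case rule concrete, here is a minimal Java sketch that lower-cases only the scheme and host of an absolute URL and leaves the tail end path alone. It is an illustration under my own assumptions, not the TidyURLs source, and the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical illustration only; not part of TidyURLs. */
final class LowerCaseHostSketch {

    /**
     * Lower-cases the scheme and host of an absolute URL. The path, query and
     * fragment keep their case. Anything without a plain scheme followed by
     * "://" is returned unchanged. Simplification: a user:password@ prefix,
     * if present, would be lower-cased along with the host.
     */
    static String lowerCaseHost(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd <= 0) {
            return url; // relative link: nothing to do
        }
        for (int i = 0; i < schemeEnd; i++) {
            if (!Character.isLetterOrDigit(url.charAt(i))) {
                return url; // "://" sits inside a path or query, not after a scheme
            }
        }
        int hostEnd = url.length();
        for (int i = schemeEnd + 3; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }
        return url.substring(0, hostEnd).toLowerCase(Locale.ROOT) + url.substring(hostEnd);
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseHost("HTTP://MindProd.COM/JGloss/UML.html"));
        System.out.println(lowerCaseHost("/JGloss/UML.html"));
    }
}
```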

TidyURLs

TidyURLs will clean up the links on your website, making sure they are lower case (just the host part). It will put quotes around URLs that are missing them. It will replace spaces in URLs with %20. There are many other cleanups and validations. It is a command line utility that allows the switches -s for subdirectories too, -q for quiet, -v for verbose, and -dry for dry run (does not actually change your files, just tells you what it would do if the -dry option were not there). It allows you to specify which files or file trees you want to process. It automatically ignores all files except *.html files. Here is how you typically use it:

Futures

Here are various ways I hope eventually to improve BrokenLinks:
  1. Vastly improve the speed of rechecking links by checking 30 of them simultaneously the way XENU does.
  2. Convert to Java Web Start. This will make the program easier to use by novices since it will not require configuration. The configuration properties file will be replaced by a GUI (Graphical User Interface). The user will not have to manually allocate a directory for the history file.
  3. Remove the dependence on XENU. Handle everything it does in BrokenLinks. This will as a side effect make BrokenLinks notice local links that are in the wrong case. Wrong case links work under Windows and XENU, but fail after you upload to a Unix-based webserver.
  4. Avoid checking links that recently checked OK to vastly speed up link checking. You could then afford to do it daily or even before every upload. XENU rechecks everything from scratch every time you run it.
  5. Tools to insert warning styles on broken links so they will have an icon next to them warning your visitors of the problem and letting them know you are aware of it.
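
Item 1 could be approached with a fixed pool of 30 threads. The Java sketch below shows one way it might be done and is not a description of how BrokenLinks or XENU work. The class name, the method name and the stand-in probe are hypothetical; a real probe would open an HTTP connection and inspect the status code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical illustration only; not part of BrokenLinks. */
final class ParallelCheckSketch {

    /** Tests every URL with the given probe, running up to 'threads' probes at once. */
    static Map<String, Boolean> checkAll(List<String> urls, Predicate<String> isAlive, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Queue one task per URL; the pool runs at most 'threads' of them at a time.
            List<Future<Boolean>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<Boolean> task = () -> isAlive.test(url);
                pending.add(pool.submit(task));
            }
            // Collect the results in the original order of the URLs.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), pending.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking links", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("link probe failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in probe so the sketch runs without touching the network.
        Predicate<String> fakeProbe = url -> !url.contains("dead");
        System.out.println(checkAll(
                List.of("http://example.com/", "http://example.com/dead"), fakeProbe, 30));
    }
}
```

Passing the probe in as a parameter keeps the threading separate from the HTTP details, which also makes the sketch easy to try without a network connection.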

Acquiring Brokenlinks

Package: Brokenlinks
Version: 2.3
Released: 2011-02-08
Licence: free
Language: Java
Notes: Find and track broken links on your website. Back end to Xenu Link Sleuth. Also tidies URLs on a website, and updates redirected links.
Download: 2.8 MB zip for Brokenlinks Java source, compiled class files, jar and documentation to run on your own machine as an application.

First install the most recent Java.

To install, extract the zip download with WinZip (or a similar unzip utility) into any directory you please, often J:\, with the use folder names option ticked.

To check out the corresponding source from the Subversion repository, use the TortoiseSVN repo-browser to access the brokenlinks source at wush.net/svn/mindprod/com/mindprod/brokenlinks/.

After you have installed the jar, you can run it as an application. Type:

java -jar J:\com\mindprod\brokenlinks\brokenlinks.jar parms

adjusting as necessary to account for where the jar file is.

download ASP PAD XML program description for the current version of Brokenlinks.

Brokenlinks is free.
$489.00 US donated so far. If the CMP utilities solved your problem, please donate a buck or two, or donate to one of the charities featured in the footer public service ads throughout the website and get a tax receipt.
Full source included. You may even include the source code, modified or unmodified in free/commercial open source/proprietary programs that you write and distribute. Non-military use only.
 
 
Links

A 1 Website Analyser
Download BrokenLinks
Google sitemap
HTML Broken link fixer student project
Xenu
