Broken Links
©1996-2012 Roedy Green, Canadian Mind Products
View the latest version of this manual online at http://mindprod.com/application/brokenlinks.manual.html.
Introduction
BrokenLinks is a tool to help you find and track broken links on your website, namely URLs (Uniform Resource Locators)
that no longer point to anything useful. It is a back end to the Xenu broken link detector that compensates for XENU’s
weakness of overwhelming you with reports of links that are not really broken.
- XENU often tells you a link is broken when it really isn’t. BrokenLinks retests the links XENU reports as broken and removes from your
consideration those that are not really broken. BrokenLinks typically discovers that only 1%
of the links XENU reports as broken actually need to be repaired. The other 99% are just down temporarily or fail XENU’s quick probe,
but pass BrokenLinks’s more patient test, which more accurately simulates what a real browser does.
- XENU gives you a report of the status of the universe at a given instant in time. A
link may be broken simply because the website it points to is down for a few hours for maintenance. BrokenLinks maintains a history of when a link was last found good and last found bad. It then removes
from your consideration links that may be only temporarily not working.
- There are links that any automated link checker, including BrokenLinks, believes to be
broken, but which, when you try them manually, work for all practical purposes. They may include pages that require a
password or certificate override. Sometimes these involve multiple layers of redirection, with a problem at one of the
intermediate steps. With BrokenLinks, you can provide a list of such URLs to treat, at
least for now, as good, to take them out of consideration. BrokenLinks retests this list
itself and prepares a list of them in a form that lets you manually retest them too, to make sure they truly are
good (or irrelevant). XENU’s similar feature simply ignores all such links for now
and all time. Out of sight, out of mind.
You get the basic idea. BrokenLinks whittles XENU’s giant
list of broken links to the ones you should look at first. This saves you immense amounts of time researching links
that are not really broken.
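To make the idea of a more patient test concrete, here is a minimal sketch of what such a retest could look like in Java 11 or later. It is not BrokenLinks's actual code. The class name, the 60-second timeout, the User-Agent string and the choice of a full GET that follows redirects are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch of a "patient" link retest. Not taken from BrokenLinks.
final class PatientProbe {
    // Assumed values; the manual does not say what BrokenLinks really uses.
    static final Duration TIMEOUT = Duration.ofSeconds(60);
    static final String USER_AGENT = "Mozilla/5.0 (compatible; link-retest-sketch)";

    /** Builds a full GET request with a generous timeout and a browser-like User-Agent. */
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(TIMEOUT)
                .header("User-Agent", USER_AGENT)
                .GET()
                .build();
    }

    /** Fetches the URL, following redirects, and returns the final HTTP status code. */
    static int probe(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(TIMEOUT)
                .build();
        HttpResponse<Void> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

Calling PatientProbe.probe("http://mindprod.com/") would return the status code of the page finally reached. A quick checker that gives up after a second or two will report far more failures than a probe like this one.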
Both XENU and BrokenLinks share a common limitation. They
can’t detect a broken link that has been redirected to a working place-holder site, e.g. one advertising that
the domain is up for sale. Similarly, some sites just quietly redirect all broken links to the home page.
BrokenLinks cannot detect that. Most embarrassingly, BrokenLinks
can’t detect a domain bought out by a pornography company. You can still have people threaten to sue or kill
you for deliberately trying to send them to a porn site.
Why use Xenu?
- XENU is extremely fast compared with the competition.
- It is free.
- It does the spidering work (link chasing and finding) that BrokenLinks does not yet
handle.
Finding the broken links is only 10% of the work. Fixing them is what is so labour
intensive. If you let your website deteriorate with broken links, visitors become frustrated, and stop visiting.
Having clean links also encourages Google to treat your site more favourably.
How to Use Xenu
Download and install a free copy of
Xenu Link Sleuth.
First you spider your local copy of your website with Xenu. Read
the Xenu documentation on how to do that. You first have to be sure
XENU is working properly before BrokenLinks will work. Use
XENU directly to find orphans.
Once you are pretty sure you have XENU configured correctly, run it on your local
website, with external link checking turned on.
Be careful to verify the check external links option is on at the very last moment
before you start the spidering.
When it has finished spidering your website and checking all the links, click Export Page Map to
TAB-separated File. (Don’t confuse this with Export to TAB-separated File).
You may optionally get XENU to also produce an HTML (Hypertext Markup Language) report.
Configuring Brokenlinks
Download and install a free copy of Brokenlinks.
The first time you use BrokenLinks you must configure it by creating a text file with a
text editor. It will look something like this:
Configure it according to the embedded comments. Then save the file, giving it a name of the form xxxx.properties.
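For readers curious how a Java program typically reads such a file, here is a minimal sketch using java.util.Properties. Only the property name brokenForgivenessDays and its default of 7 come from this manual. The class and method names are hypothetical, and this is not BrokenLinks's own code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of reading one setting from an xxxx.properties file.
final class ForgivenessConfig {
    // Default documented in this manual.
    static final int DEFAULT_FORGIVENESS_DAYS = 7;

    /** Returns brokenForgivenessDays from the given properties text, or 7 if it is absent. */
    static int readForgivenessDays(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("brokenForgivenessDays");
        return raw == null ? DEFAULT_FORGIVENESS_DAYS : Integer.parseInt(raw.trim());
    }
}
```

In real use you would pass a java.io.FileReader opened on your xxxx.properties file. Lines starting with # are comments and are ignored by Properties.load.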
The properties are all pretty straightforward except for brokenForgivenessDays=7.
- If you have only a handful of broken links, and you religiously run XENU/BrokenLinks every day, you might set brokenForgivenessDays=2, though I
still set it to 6. One advantage of running every day is you stay on top of researching and repairing broken links.
You are never faced with large numbers of them to fix all at once. I personally run BrokenLinks twice a day so that I test sites at different times of day, avoiding treating them as
dead when they are just temporarily down for backup. Further, that way I rarely have more than a couple of links to
research at any one time.
- If you have only a handful of broken links, and you religiously run XENU/BrokenLinks twice a week, use brokenForgivenessDays=5
- If you don’t want to think about brokenForgivenessDays, leave this property
out, and accept the default: brokenForgivenessDays=7
- If you have only a handful of broken links, and you religiously run XENU/BrokenLinks every week, use brokenForgivenessDays=8
- If you have hundreds of broken links, and you run XENU/BrokenLinks only every once in a while, use brokenForgivenessDays=14
- You can experiment setting it to various values. The smaller the brokenForgivenessDays number, the sooner and the more broken links will be revealed to you.
However, you will be pestered with more temporarily broken links. If you are feeling overwhelmed by broken links,
increase the value to show you only the deadest links. The minimum value that makes much sense is 1. XENU itself effectively uses 0.
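The effect of brokenForgivenessDays can be sketched as a small rule over the last-good and last-bad dates kept in the history. This is an illustration under stated assumptions, not BrokenLinks's real logic. It assumes a link is shown to you only when its most recent test was bad and it has not tested good within the forgiveness period. It also assumes that a link never seen good is always shown. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of a forgiveness rule. The real rule in BrokenLinks may differ.
final class ForgivenessRule {
    /**
     * @param lastGood date the link last tested good, or null if never
     * @param lastBad  date the link last tested bad, or null if never
     * @return true if the link should appear on the list of broken links to research
     */
    static boolean isReportable(LocalDate lastGood, LocalDate lastBad,
                                LocalDate today, int brokenForgivenessDays) {
        if (lastBad == null) {
            return false; // never seen broken
        }
        if (lastGood == null) {
            return true; // assumption: never seen working, so report it
        }
        if (!lastGood.isBefore(lastBad)) {
            return false; // it has worked again since the last failure
        }
        // Forgive the link until it has gone this many days without testing good.
        long daysSinceGood = ChronoUnit.DAYS.between(lastGood, today);
        return daysSinceGood >= brokenForgivenessDays;
    }
}
```

Under this rule a value of 0 reports every link whose latest test failed, which matches the remark that XENU itself effectively uses 0.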
BrokenLinks Files

| File | Description |
|---|---|
| brokenlinks.html | List of broken links that have remained broken for a number of days. In HTML format so that you can embed them in an HTML page to view and research them with a browser. |
| brokenlinks.properties | Master BrokenLinks configuration file. Names and locates other files. You might rename it to some other *.properties name. You specify the name of this file on the BrokenLinks command line. It contains the names and locations of the other files. |
| DESCRIPT.ION | Optional TCC (Take Command line) file descriptions for the TCC Describe program. |
| history.bin | Link checking history database. In binary, not human readable. It contains a record of all the links on your website, when they were last tested good and last tested bad (echoes of Santa Claus). It gets updated each time you run BrokenLinks with information from the XENU spider and from BrokenLinks’s own slower but more reliable tests. |
| permanentRedirects.csv | URLs that have been permanently redirected. You will likely want to update most of these to the new value with ReplaceURLs. |
| presumedgood.csv | List of presumed good URLs that BrokenLinks will not check because they fail even though they are actually OK. |
| report.txt | Report from BrokenLinks on how the last run went. |
| temporaryRedirects.csv | URLs that have been temporarily redirected. You might want to update a few of these to the new value with ReplaceURLs. |
| xenupage.csv | Output from XENU that BrokenLinks uses for input. |
| _O_V_E_R_V_I_E_W.txt | An optional one-line description of each file. |
Running Brokenlinks
Now run BrokenLinks like this:
java.exe -jar brokenlinks.jar xxxx.properties
If you have Jet, you can simplify that to:
brokenlinks.exe xxxx.properties
You will get a report of the critical broken links to research, in both text and HTML form. Embed the HTML in a web
page somewhere. Here is my list of broken links for
mindprod.com. The layout is designed to make it easy to research the problems. You can click to get the page
where the broken link is, or click to where it was trying to go.
Then research the broken links and fix them. Then run XENU again, click Export Page Map to TAB-separated File and run BrokenLinks. Run this cycle at
different times of the day, since some websites shut down for part of the day for maintenance. You want to catch them when
they are up. Run the cycle after repairing a batch of links to see how you did. After you get the list whittled down
to none, run the cycle weekly, twice weekly or daily to stay on top of the broken links. I find running it daily
works best since you never get overwhelmed with work, and thus are not tempted to postpone the work.
If you are pressed for time, you can also rerun BrokenLinks without a new XENU run. This will catch most of the problems that rerunning XENU would, but
not all.
If you erase the history.bin file, it will automatically start over from scratch
collecting history.
It is best to run BrokenLinks at various times of day so that you won’t think a
site is down that is just offline for an hour each day for backup. I am a bit compulsive. I run it twice a day.
Presumed Good File
If you find a link that XENU/BrokenLinks thinks is broken, but
which is actually ok, or it doesn’t matter for some reason, add it to your list of presumed good links. The
presumedgood.csv CSV (Comma Separated Value) file will look something like this:
Thereafter that presumed good link will be excluded from the broken links list.
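The exclusion step amounts to a set difference, as the following minimal sketch shows. It assumes the presumed good URLs have already been read from presumedgood.csv into a set. The layout of that file is not reproduced here, so the parsing is left out. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: drop presumed good URLs from a list of apparently broken ones.
final class PresumedGoodFilter {
    /** Returns the broken URLs that are not on the presumed good list, in their original order. */
    static List<String> excludePresumedGood(List<String> brokenUrls, Set<String> presumedGood) {
        return brokenUrls.stream()
                .filter(url -> !presumedGood.contains(url))
                .collect(Collectors.toList());
    }
}
```

The comparison here is an exact string match, so a URL must be listed exactly as it appears in the broken links list to be excluded.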
Sample Text Report
Here is roughly what the text report that BrokenLinks produces will look like:
Sample HTML Export
Here is roughly what the combined broken links and presumed good HTML report that BrokenLinks produces will look like:
Broken Links Sorted by Error Code
There are 5 links that have been broken for at least 6 days yet to be fixed. Last revised: 2010-02-02
Links Presumed Good
Xenu claims the following links are broken, but they have been manually found to
be good. They should be manually rechecked from time to time. The problem may be
an unknown SSL certificate authority which needs to be OKed manually,
(a missing/unknown/uninstalled certificate root authority) or
it may be the website sends the data, but with not-found status.
There are 13 links marked as presumed good despite what Xenu says. Last revised: 2010-02-02
Repairing Broken Links
Here are some tips to help you find a replacement link for a broken one.
- The more often you run XENU/BrokenLinks, the better your odds
of catching a website when it is up, and thus the fewer false broken links you have to deal with.
- If an entire website goes down, procrastinate fixing any links. It will usually come back within 5 days. If the website itself is up, check to see that most of it is working before investing time
fixing links. They may just be having temporary server problems.
- Email the author or webmaster telling them that a certain link is not working and ask if the material is still
available and where. Often it is a technical problem they are unaware of e.g. a file accidentally deleted or a
website down. They fix it and the link will come back to life within a day or two. They are typically embarrassed
and they thank me profusely for bringing the problem to their attention.
- Go to the home page of the target website, and use the local search to see if you can find the document.
- Use Google’s site search, e.g. site:mindprod.com, to get Google to look only on
one particular site.
- Go to the home page and try to find what you want by using the menu system.
- Look in the Google cache. The original document may be there. The date on the cache can be a clue too.
- Look in http://web.archive.org, aka the Wayback Machine, for
old snapshots of the site.
- If you know the title of a video, Google will almost surely find it posted somewhere else.
- When you first insert a link, and have duplicate sources, record them. They may come in handy later.
- Make sure you label links (perhaps in the comments) with what they are, and perhaps quoting a little content.
Having a precise quotation of some content will make it easier to find the document if it moves.
- Scan the XENU export document for any unrecognised links. Chase them back to the
original link you put on your web page. These complicated chains occur when a website keeps referring you over and
over to a replacement of a replacement before it finally tells you it does not have the document. This is tricky to
do, so I do it as a last resort. I hope eventually to get BrokenLinks to automate this for
me.
Automatically Repairing Redirects
BrokenLinks can automatically repair permanently redirected URLs. Websites often
reorganise, and leave behind tombstones on the old page that describe where the information is now. Your browser will
automatically follow these chains to find the new information. You know this has happened when the URL displayed when
the page is found does not match the original. It is best to update your web pages with the new link since they
browse faster by going direct to the link, and because they will continue to work if the tombstone is deleted.
BrokenLinks has a feature to automatically maintain these changes for you. BrokenLinks automatically exports a redirects.csv CSV file that gives the old
URL, the new URL, and the pages where the old URL appears. It is best to manually examine this list to prune any
changes you don’t want to apply, e.g. Yahoo’s replacement links that go preposterously on and on and on.
Then use replaceURLs to process that file and apply the changes to your local website
mirror. Best take a backup before you try it out. If you generate URLs
with code, or import them from databases,
replaceURLs will correct your website, but the changes will be undone the next time you regenerate your HTML.
You can use the CSVPatch utility to automatically replace URLs
in CSV files as well.
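Applying one redirect to the text of a page is essentially a search and replace. The sketch below is an illustration only, not the code of replaceURLs. It assumes URLs in your HTML are enclosed in double quotes, and it matches the quotes too so that a URL which merely begins with the old URL is left alone. The class and method names are hypothetical.

```java
// Hypothetical sketch: replace one redirected URL inside the text of an HTML page.
final class RedirectApplier {
    /** Replaces every double-quoted occurrence of oldUrl in html with newUrl. */
    static String applyRedirect(String html, String oldUrl, String newUrl) {
        // Including the quotes avoids touching longer URLs that share the same prefix.
        return html.replace("\"" + oldUrl + "\"", "\"" + newUrl + "\"");
    }
}
```

A real tool would also read and rewrite each page file listed beside the URL pair, which is why taking a backup first is wise.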
Here is the TakeCommand script I use to run BrokenLinks, automatically discard some of
the redirects I won’t apply, let me edit the list of both permanent and temporary links, and also use them to
update two CSV files, hassle.csv and air.csv.
I also scan the temporary redirects looking for redirects to pages with names containing words like
error or suspended. I then manually check these out. Usually it means the website owner
has not paid his ISP (Internet Service Provider) bills and the account has been suspended. Sometimes a site has died, or its bills have gone unpaid, and the
owner or ISP redirects it to another living site, sometimes the ISP’s or someone else’s parking site.
The owner should use a permanent redirect, but uses a temporary one instead. I can catch these by eyeballing the list. The
list is mostly just internal housekeeping junk, so I don’t scan it carefully every day.
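The keyword scan described above could be automated along these lines. This is a sketch, not part of BrokenLinks. The keywords error and suspended come from the text, while the class and method names are hypothetical, and the redirect targets are assumed to have been read from the temporary redirects file already.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch: flag redirect targets whose names hint at a dead or suspended site.
final class SuspiciousRedirects {
    static final List<String> KEYWORDS = List.of("error", "suspended");

    /** True if the target URL contains any of the keywords, ignoring case. */
    static boolean looksSuspicious(String targetUrl) {
        String lower = targetUrl.toLowerCase(Locale.ROOT);
        return KEYWORDS.stream().anyMatch(lower::contains);
    }

    /** Returns only the targets worth checking by hand. */
    static List<String> flag(List<String> targetUrls) {
        return targetUrls.stream()
                .filter(SuspiciousRedirects::looksSuspicious)
                .collect(Collectors.toList());
    }
}
```

A plain substring match will throw up the odd false alarm, but since every hit gets checked manually anyway, that costs little.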
ReplaceURLs
ReplaceURLs is a command line utility that takes only one parameter, the name of the file of redirects, e.g.
java.exe -jar replaceurls.jar E:\redirects.csv
You don’t have to tell replaceURLs where your local website mirror is. The names of
the files that need changing are in redirects.csv. You told it earlier when you configured
BrokenLinks where your website files were and you also told XENU.
You might want to repair some of the links manually. You want to make sure the new link truly points to the
original information, not some parking page. Just prune the ones you want to ignore or handle manually, and feed the
remainder to replaceURLs.
ReplaceURLs presumes all your URLs
are pure lower case. It won’t find them if they are mixed or all upper
case (except for the tail end path part). Some validator programs will complain about URLs not in all lower case.
You can condition your website to use all lower case URLs
by running TidyURLs.
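To show what lower-casing everything except the tail end path part means in practice, here is a minimal sketch. It is not the code of TidyURLs. It lower-cases the scheme and host of an absolute URL, leaves the path, query and fragment untouched, and also replaces spaces with %20. The class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch: lower-case the scheme and host of a URL and encode spaces.
final class UrlTidySketch {
    static String tidy(String url) {
        String encoded = url.replace(" ", "%20");
        int schemeEnd = encoded.indexOf("://");
        if (schemeEnd < 0) {
            return encoded; // relative link: no host part to lower-case
        }
        // The host part runs from just after "://" to the first '/', '?' or '#'.
        int hostEnd = encoded.length();
        for (int i = schemeEnd + 3; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }
        // Paths can be case-sensitive on the server, so only the front part is changed.
        return encoded.substring(0, hostEnd).toLowerCase(Locale.ROOT) + encoded.substring(hostEnd);
    }
}
```

For example, tidy("HTTP://MindProd.COM/Some Page.html") gives http://mindprod.com/Some%20Page.html.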
TidyURLs
TidyURLs will clean up the links on your website, making sure they are lower case (just the host part). It will
put quotes around URLs
that are missing them. It will replace spaces in URLs
with %20. There are many other cleanups
and validations. It is a command line utility that allows the switches -s for
subdirectories too, -q for quiet, -v for verbose, -dry for dry run (does not actually change your files, just tells you what it would do if the
-dry option were not there). It allows you to specify which files or file trees you want to
process. It automatically ignores all files except *.html files. Here is how you typically
use it:
Futures
Here are various ways I hope eventually to improve BrokenLinks:
- Vastly improve the speed of rechecking links by checking 30 of them simultaneously the way XENU does.
- Convert to Java Web Start. This will make the program
easier to use by novices since it will not require configuration. The Configuration properties file will be
replaced by a GUI (Graphic User Interface). The user will not have to manually allocate a directory for the history file.
- Remove the dependence on XENU. Handle everything it does in BrokenLinks. This will as a side effect make BrokenLinks notice local links
that are in the wrong case. Wrong case links work under Windows and XENU, but fail after
you upload to a Unix-based webserver.
- Avoid checking links that recently checked OK to vastly speed up link checking. You could then afford to do it
daily or even before every upload. XENU rechecks everything from scratch every time you
run it.
- Tools to insert warning styles on broken links so they will have an icon next to
them warning your visitors of the problem and letting them know you are aware of it.
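As a rough idea of how the first item, checking 30 links at once, might be approached, here is a sketch using a fixed pool of 30 threads. It describes nothing that BrokenLinks currently does. The class and method names are hypothetical, and the actual link test is passed in as a Predicate so the sketch stays independent of any particular way of probing a URL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: test many links at once with a pool of 30 worker threads.
final class ParallelChecker {
    static final int THREADS = 30;

    /** Runs isGood on every URL, up to 30 at a time, and returns the results in input order. */
    static Map<String, Boolean> checkAll(List<String> urls, Predicate<String> isGood) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String url : urls) {
                Callable<Boolean> task = () -> isGood.test(url);
                futures.add(pool.submit(task));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).get()); // waits for that check to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Most of the time spent checking a link goes on waiting for the remote server, so running the waits side by side is where the speedup would come from.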
Acquiring Brokenlinks
| Package | Version | Released | Licence | Language |
|---|---|---|---|---|
| Brokenlinks | 2.3 | 2011-02-08 | free | Java |

Notes:
Find and track broken links on your website. Back end to Xenu Link Sleuth. Also tidies URLs on a website, and updates redirected links.
Download: 2.8MB zip for Brokenlinks Java source, compiled class files, jar and documentation to run on your own machine as an application.
First install the most recent Java.
To install, extract the zip download with WinZip
(or similar unzip utility) into any directory you please,
often J:\, ticking off the
use folder names option. To check out the corresponding source from the Subversion repository, use the TortoiseSVN repo-browser to
access the brokenlinks source in the repository on wush.net/svn/mindprod/com/mindprod/brokenlinks/.
After you have installed the jar, you can run it as an application. Type: java -jar J:\com\mindprod\brokenlinks\brokenlinks.jar parms
adjusting as necessary to account for where the jar file is.
Brokenlinks is free. Full source included.
You may even include the source code, modified or unmodified,
in free/commercial open source/proprietary programs that you write and distribute. Non-military use only.