Link Replacer



Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

The Problem

Imagine you are the webmaster of some large website, e.g. the CBC (Canadian Broadcasting Corporation). A listener points out that there are hundreds of obsolete links on the website to the old URL for the home page of some show. They need to be updated. The site is so large it would take an unacceptably long time to do a global search and replace. Further, other people are working on all these pages. You might not be able to get immediate access to a particular page. Most of the pages in the website don’t even need to be revised.

Your Mission

Your mission, if you choose to accept it, is to find a way to rapidly update all the links and to generate 301 permanent redirects, so that hits from outside the website on the old pages are sent on to the new ones.
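To make the redirect half of the mission concrete, here is a minimal sketch that turns a table of old and new URLs into redirect directives. It assumes the site runs on Apache with mod_alias, whose Redirect directive takes a status, an old URL path and a new URL; the essay does not say which server is involved, so treat that as a guess. The class name RedirectWriter and the example.com URLs are invented for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns old-URL to new-URL pairs into Apache mod_alias directives. */
final class RedirectWriter {

    /** Builds one "Redirect 301 oldPath newUrl" line per mapping, in the map's iteration order. */
    static List<String> toApacheRedirects(Map<String, String> oldToNew) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : oldToNew.entrySet()) {
            // mod_alias matches on the URL path, so drop the scheme and host of the old URL.
            String oldPath = URI.create(e.getKey()).getRawPath();
            // 301 is the HTTP status for a permanent redirect.
            lines.add("Redirect 301 " + oldPath + " " + e.getValue());
        }
        return lines;
    }
}
```

Feeding it the pair http://example.com/shows/old.html and http://example.com/radio/new.html yields a line you could paste into the server configuration. An old URL with no path at all would produce an empty path, which a real version would have to reject.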

Implementation

This is similar to the way a search engine works. You periodically spider the website, avoiding respidering documents that have not changed since the last spider. You create a giant index. The key is the absolute URL (Uniform Resource Locator) linked to, the value is the list of pages where that link appears.
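The index described above might be sketched as a map from each absolute link target to the set of pages that contain it. This is only one possible shape. The class and method names are hypothetical, the spider that would feed it is left out, and it assumes every href is a syntactically valid URI (a real spider would have to cope with ones that are not).

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reverse index: absolute link target -> pages on which that link appears. */
final class LinkIndex {
    private final Map<String, Set<String>> pagesByTarget = new HashMap<>();

    /** Records that pageUrl contains href; a relative href is resolved against the page URL. */
    void addLink(String pageUrl, String href) {
        String absolute = URI.create(pageUrl).resolve(href).normalize().toString();
        pagesByTarget.computeIfAbsent(absolute, k -> new TreeSet<>()).add(pageUrl);
    }

    /** Forgets everything known about a page, so it can be re-indexed after a respider. */
    void removePage(String pageUrl) {
        pagesByTarget.values().forEach(pages -> pages.remove(pageUrl));
        pagesByTarget.values().removeIf(Set::isEmpty);
    }

    /** Returns the pages that link to targetUrl, or an empty set if none do. */
    Set<String> pagesLinkingTo(String targetUrl) {
        return Collections.unmodifiableSet(
                pagesByTarget.getOrDefault(targetUrl, Collections.emptySet()));
    }
}
```

When the spider finds that a page has changed, it would call removePage for it and then addLink for each link it now contains. Pages that have not changed keep their old entries, which is what makes the incremental spider cheap.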

When you get a request to update, you look up that URL in the index, then search each of those pages in turn and replace the URLs (Uniform Resource Locators) with the new one. If you cannot get access to a page, you put it in a queue to do later.
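Here is a rough sketch of that update step, including the queue for pages you cannot get at right now. PageStore is a made-up interface standing in for however you actually read and write pages (file system, FTP, a content management system); an IOException is assumed to mean "locked or unreachable, try later". The replacement itself is deliberately naive: it only rewrites double-quoted attribute values that spell the URL exactly as indexed, so relative links would need a proper HTML parser.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Hypothetical access layer; a real one might wrap a CMS, FTP or the file system. */
interface PageStore {
    String read(String pageUrl) throws IOException;
    void write(String pageUrl, String html) throws IOException;
}

/** Replaces one link on every page that carries it, queueing pages it cannot reach. */
final class LinkUpdater {
    private final PageStore store;
    private final Deque<String[]> retryQueue = new ArrayDeque<>(); // {page, oldUrl, newUrl}

    LinkUpdater(PageStore store) {
        this.store = store;
    }

    /** Tries every page once; returns how many were processed without an access failure. */
    int replaceOn(Set<String> pages, String oldUrl, String newUrl) {
        int done = 0;
        for (String page : pages) {
            if (tryReplace(page, oldUrl, newUrl)) done++;
            else retryQueue.add(new String[] {page, oldUrl, newUrl});
        }
        return done;
    }

    /** Makes one pass over the queue; pages still unavailable go back on the end. */
    int retryPending() {
        int done = 0;
        for (int i = retryQueue.size(); i > 0; i--) {
            String[] job = retryQueue.remove();
            if (tryReplace(job[0], job[1], job[2])) done++;
            else retryQueue.add(job);
        }
        return done;
    }

    int pendingCount() {
        return retryQueue.size();
    }

    private boolean tryReplace(String page, String oldUrl, String newUrl) {
        try {
            String html = store.read(page);
            String fixed = html.replace("\"" + oldUrl + "\"", "\"" + newUrl + "\"");
            if (!fixed.equals(html)) store.write(page, fixed);
            return true;
        } catch (IOException e) {
            return false; // page locked or unreachable; the caller queues it
        }
    }
}
```

You would pass replaceOn the set of pages the index returned for the old URL, then call retryPending on a timer until pendingCount drops to zero.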

After the next spider, you need to automatically repeat all the updates since the last spider, but only on the newly spidered pages.
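The reason for the replay is that the index goes stale between spiders: somebody may have added or restored an old link on a page after you looked the URL up. One way to handle it, sketched below, is to keep a log of every replacement requested since the last spider and run the whole log, in order, over each newly spidered page. ReplacementLog is a hypothetical name, and the string matching is the same naive double-quoted match as any simple search and replace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical log of the replacements requested since the last spider. */
final class ReplacementLog {
    private final List<String[]> entries = new ArrayList<>(); // {oldUrl, newUrl}

    /** Remembers a requested replacement so it can be repeated after the next spider. */
    void record(String oldUrl, String newUrl) {
        entries.add(new String[] {oldUrl, newUrl});
    }

    /** Applies every logged replacement, oldest first, to the HTML of a newly spidered page. */
    String replay(String html) {
        String result = html;
        for (String[] e : entries) {
            result = result.replace("\"" + e[0] + "\"", "\"" + e[1] + "\"");
        }
        return result;
    }

    /** Call once a spider has finished and all its new pages have been replayed. */
    void clear() {
        entries.clear();
    }
}
```

Replaying oldest first matters when a URL has moved twice since the last spider: a link to the first address is carried through the second replacement as well.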

Wrinkles

Selling

After you get this working on your own small website, find some broken links on the website of some potential customer. You might use BrokenLinks. Use your program to generate a list of the pages that need to be fixed, and a list of redirects. Send these results, innocently, as if you had discovered them manually. Repeat. Eventually they should ask you how you know all this. If they don’t, try a different potential customer. This way you will not be rebuffed as a salesman and you will have already proved the value of your program.

Links

CSVReplaceUrls
HTTP
redirection
response codes

This page is posted
on the web at:

http://mindprod.com/project/linkreplacer.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
