
Date-Sensitive Search Engine


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of the project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

Let us say that I was trying to track down what George W. Bush had to say about torture at Abu Ghraib. In particular, the thing I am after was said some time in January or February of 2005. I would like to be able to search like this: Bush Abu Ghraib between:2005-01-01: 2005-02-28
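Here is a minimal sketch of how such a query might be split into its search terms and its date range. The class name DateQuery and its fields are hypothetical, and the exact grammar of the between: operator is an assumption inferred from the single example above (two ISO dates separated by a colon, with optional whitespace). It is a starting point, not a specification.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the proposed between: search operator. */
final class DateQuery {
    // Assumed syntax: between:YYYY-MM-DD: YYYY-MM-DD
    private static final Pattern BETWEEN = Pattern.compile(
            "between:\\s*(\\d{4}-\\d{2}-\\d{2})\\s*:\\s*(\\d{4}-\\d{2}-\\d{2})");

    final List<String> terms; // ordinary search words
    final LocalDate from;     // null when the query has no between: clause
    final LocalDate to;       // null when the query has no between: clause

    private DateQuery(List<String> terms, LocalDate from, LocalDate to) {
        this.terms = terms;
        this.from = from;
        this.to = to;
    }

    static DateQuery parse(String query) {
        Matcher m = BETWEEN.matcher(query);
        LocalDate from = null;
        LocalDate to = null;
        String rest = query;
        if (m.find()) {
            // LocalDate.parse throws DateTimeParseException for impossible dates.
            from = LocalDate.parse(m.group(1));
            to = LocalDate.parse(m.group(2));
            // Cut the between: clause out, leaving only the search words.
            rest = query.substring(0, m.start()) + " " + query.substring(m.end());
        }
        List<String> terms = new ArrayList<>();
        for (String t : rest.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return new DateQuery(terms, from, to);
    }

    public static void main(String[] args) {
        DateQuery q = parse("Bush Abu Ghraib between:2005-01-01: 2005-02-28");
        System.out.println(q.terms + " from " + q.from + " to " + q.to);
    }
}
```

You would still have to decide what to do with open-ended ranges, reversed dates and queries with more than one between: clause.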

I don’t mean the web page was updated or created between those dates; I mean that the web page contains an internal date, e.g. a sentence of the form: Mr. Bush said that he was totally innocent of any wrongdoing at Abu Ghraib on January 3, 2005. It was all the fault of a few bad apples.
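To make the idea of searching on internal dates concrete, here is a rough sketch of an index keyed by the dates found inside documents rather than by when the pages were created or updated. The class InternalDateIndex, its method names and the document ids are all hypothetical. A real search engine would use its own on-disk index structures; a sorted in-memory map is just the simplest thing that supports a range query.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory index from internal date to the documents that mention it. */
final class InternalDateIndex {
    // Sorted by date so that a range query is a cheap sub-map view.
    private final NavigableMap<LocalDate, Set<String>> docsByDate = new TreeMap<>();

    /** Record that document docId contains a reference to internalDate. */
    void add(String docId, LocalDate internalDate) {
        docsByDate.computeIfAbsent(internalDate, d -> new TreeSet<>()).add(docId);
    }

    /** All documents mentioning at least one date in [from, to], both ends inclusive. */
    Set<String> between(LocalDate from, LocalDate to) {
        Set<String> result = new TreeSet<>();
        // subMap throws IllegalArgumentException if from is after to.
        for (Set<String> docs : docsByDate.subMap(from, true, to, true).values()) {
            result.addAll(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        InternalDateIndex index = new InternalDateIndex();
        index.add("doc-a", LocalDate.of(2005, 1, 3));
        index.add("doc-b", LocalDate.of(2004, 5, 10));
        index.add("doc-c", LocalDate.of(2005, 2, 28));
        // Prints [doc-a, doc-c]
        System.out.println(index.between(LocalDate.of(2005, 1, 1), LocalDate.of(2005, 2, 28)));
    }
}
```

The result of the date lookup would then be intersected with the ordinary keyword matches for Bush and Abu Ghraib.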

When you are spidering the web to create the indexes, you have to be clever about plucking dates. The article might have a date of the form 01/19/2005 and the date reference might be last Wednesday. You have to handle all the many forms of dates and date references embedded in articles and convert them to the ISO 8601 (International Organization for Standardization standard 8601 for dates and times) YYYY-MM-DD form.
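As a small illustration of that conversion, here is a sketch that normalises three of the many possible forms to YYYY-MM-DD. The class DateNormalizer and the method toIso are hypothetical names. It assumes that numeric dates are in American month/day/year order, that month names are capitalised English, and that last Wednesday means the nearest Wednesday strictly before the article's own date. All three assumptions are debatable, and a real solution would need many more formats and languages.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical converter from a few date forms to ISO 8601 YYYY-MM-DD. */
final class DateNormalizer {
    // Absolute forms, tried in order. Numeric dates are ASSUMED to be month/day/year.
    private static final DateTimeFormatter[] ABSOLUTE = {
        DateTimeFormatter.ofPattern("M/d/uuuu", Locale.ENGLISH),     // 01/19/2005
        DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH), // January 3, 2005
        DateTimeFormatter.ISO_LOCAL_DATE                             // 2005-01-19
    };

    private static final Pattern LAST_WEEKDAY = Pattern.compile(
            "last\\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday)",
            Pattern.CASE_INSENSITIVE);

    /**
     * @param raw         the date text plucked from the article
     * @param articleDate the date the article was written, needed for relative references
     * @return the ISO form, or empty if the text is not one of the recognised forms
     */
    static Optional<String> toIso(String raw, LocalDate articleDate) {
        String text = raw.trim();
        for (DateTimeFormatter f : ABSOLUTE) {
            try {
                // LocalDate.toString() yields the ISO YYYY-MM-DD form.
                return Optional.of(LocalDate.parse(text, f).toString());
            } catch (DateTimeParseException ignored) {
                // Not this form; try the next one.
            }
        }
        Matcher m = LAST_WEEKDAY.matcher(text);
        if (m.matches()) {
            DayOfWeek day = DayOfWeek.valueOf(m.group(1).toUpperCase(Locale.ROOT));
            // ASSUMPTION: the nearest such weekday strictly before the article date.
            return Optional.of(articleDate.with(TemporalAdjusters.previous(day)).toString());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        LocalDate written = LocalDate.of(2005, 1, 21); // a Friday
        System.out.println(toIso("01/19/2005", written));      // Optional[2005-01-19]
        System.out.println(toIso("January 3, 2005", written)); // Optional[2005-01-03]
        System.out.println(toIso("last wednesday", written));  // Optional[2005-01-19]
    }
}
```

Note that relative references can only be resolved if you can first work out when the article itself was written, which is a research problem of its own.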

You create a list of dates and the offsets at which they occur in the document.
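The list of dates and offsets might be built along these lines. This is only a sketch: the names DatePlucker, Hit and pluck are hypothetical, offsets are assumed to be character positions in the document text, and only two absolute date forms (American numeric and English long form) are recognised.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner that plucks dates and their offsets out of a document. */
final class DatePlucker {
    /** One plucked date: character offset in the document plus the parsed date. */
    static final class Hit {
        final int offset;
        final LocalDate date;

        Hit(int offset, LocalDate date) {
            this.offset = offset;
            this.date = date;
        }

        @Override
        public String toString() {
            return offset + ":" + date; // LocalDate prints as YYYY-MM-DD
        }
    }

    // Group 1: numeric form, ASSUMED month/day/year. Group 2: English long form.
    private static final Pattern CANDIDATE = Pattern.compile(
            "\\b(\\d{1,2}/\\d{1,2}/\\d{4})\\b"
            + "|\\b((?:January|February|March|April|May|June|July|August"
            + "|September|October|November|December) \\d{1,2}, \\d{4})\\b");

    private static final DateTimeFormatter NUMERIC =
            DateTimeFormatter.ofPattern("M/d/uuuu", Locale.ENGLISH);
    private static final DateTimeFormatter LONG =
            DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH);

    static List<Hit> pluck(String document) {
        List<Hit> hits = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(document);
        while (m.find()) {
            try {
                LocalDate d = m.group(1) != null
                        ? LocalDate.parse(m.group(1), NUMERIC)
                        : LocalDate.parse(m.group(2), LONG);
                hits.add(new Hit(m.start(), d));
            } catch (DateTimeParseException e) {
                // Looked like a date but was not one, e.g. 13/45/2005. Skip it.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "Mr. Bush spoke on January 3, 2005. A follow-up ran on 01/19/2005.";
        for (Hit h : pluck(doc)) {
            System.out.println(h);
        }
    }
}
```

Keeping the offsets lets the search engine later show the sentence around each date, or check that the date sits near the other search words.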

You could demonstrate this with a local database of articles randomly selected from the web. Then show your demo to each of the search engine companies and see if you can sell your date-plucking code for them to incorporate. It had better be quick. Every document on the web will be repeatedly run through your algorithm.


This page is posted on the web at:

http://mindprod.com/project/datesearchengine.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
