Bulk Spell Checker


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

This is a relatively simple program for bulk spell-checking and correcting an entire website. It has several pieces:
  1. A parser that scans a local copy of the entire website, ignoring HTML (Hypertext Markup Language) markup, creating a HashMap of all the words used anywhere on the website, with a count of how many times each word was used. The list is case-sensitive.
  2. An interface to a traditional spell check program to mark the words in the list as good or bad and provide a list of likely alternates for bad words.
  3. You then remove the good words from the list.
  4. Then you display the words sorted by frequency, least frequently used first.
  5. The user can peruse the list and select a replacement word, or type one in.
  6. The user runs a batch replace program that replaces the uses of the bad words with replacements.
  7. That deals with the errors that do not depend on context.
  8. Then the user prepares a list of bad words remaining, in context.
  9. The user can peruse the list and select a replacement word, or type one in.
  10. The user runs a batch replace program that replaces the uses of the bad words with the replacements, but only in the one context.
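
Steps 1, 3, 4 and 6 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a specification. The class name BulkSpellSketch, its method names and the regular expressions are all invented for the example. The Set of good words is a hypothetical stand-in for the real spell check program of step 2, which would also have to cope with capitalisation and supply suggested alternates. A regular expression is a crude way to skip markup; a real implementation would want a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BulkSpellSketch {
    // Crude assumption: anything from '<' to the next '>' is markup.
    // Comments, scripts, styles and entities such as &amp; need a real HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // A word is a run of letters, optionally joined by single apostrophes (don't).
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    /** Step 1: add the case-sensitive word counts of one page to counts. */
    public static void countWords(String html, Map<String, Integer> counts) {
        String text = TAG.matcher(html).replaceAll(" ");
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
    }

    /** Steps 3 and 4: drop the good words, then sort the rest rarest first. */
    public static List<Map.Entry<String, Integer>> suspects(
            Map<String, Integer> counts, Set<String> goodWords) {
        List<Map.Entry<String, Integer>> bad = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (!goodWords.contains(e.getKey())) {
                bad.add(e);
            }
        }
        // Ascending count; ties are broken alphabetically so the order is stable.
        bad.sort((a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return bad;
    }

    /** Step 6: replace a bad word in the text between tags, leaving the tags alone. */
    public static String replaceWord(String html, String bad, String good) {
        // Match the bad word only where it is not flanked by another letter.
        Pattern whole = Pattern.compile(
                "(?<!\\p{L})" + Pattern.quote(bad) + "(?!\\p{L})");
        String replacement = Matcher.quoteReplacement(good);
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int pos = 0;
        while (tag.find()) {
            String text = html.substring(pos, tag.start());
            out.append(whole.matcher(text).replaceAll(replacement));
            out.append(tag.group()); // markup is copied through unchanged
            pos = tag.end();
        }
        out.append(whole.matcher(html.substring(pos)).replaceAll(replacement));
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<p class=\"teh\">the cat saw teh dog and teh bird</p>";
        Map<String, Integer> counts = new HashMap<>();
        countWords(page, counts);

        // Hypothetical dictionary standing in for the spell check program.
        Set<String> good = new HashSet<>(
                Arrays.asList("the", "cat", "saw", "dog", "and"));
        for (Map.Entry<String, Integer> e : suspects(counts, good)) {
            System.out.println(e.getValue() + " " + e.getKey());
        }
        System.out.println(replaceWord(page, "teh", "the"));
    }
}
```

With this toy dictionary, bird and teh are the only suspects, and bird is listed first because it is rarer. The replacement leaves the class="teh" attribute untouched because it sits inside a tag. The context-dependent replacement of steps 8 to 10 is not covered here; it would need each occurrence recorded with its file and position rather than a single count per word.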

A large website might overflow RAM (Random Access Memory). So you might implement this with a 64-bit JVM (Java Virtual Machine) and a ton of RAM, and offer it as a service on the web.
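
Before reaching for more RAM, it may help to scan the local copy one page at a time, so that only the word-count map and a single page are held in memory at once. The following is a minimal sketch of that idea, not a tested design. The class name SiteWordCounter, the choice to scan only files ending in .html, and the assumption that pages are encoded in UTF-8 are all mine. The regular expressions are the same crude stand-ins for a real HTML parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SiteWordCounter {
    // Crude assumption: anything from '<' to the next '>' is markup.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // A word is a run of letters, optionally joined by single apostrophes.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    /** Counts words in every .html file under root, one file at a time. */
    public static Map<String, Integer> countSite(Path root) throws IOException {
        List<Path> pages;
        // Files.walk holds directory handles open, so close it promptly.
        try (Stream<Path> walk = Files.walk(root)) {
            pages = walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".html"))
                    .collect(Collectors.toList());
        }
        Map<String, Integer> counts = new HashMap<>();
        for (Path page : pages) {
            // Assumption: UTF-8. Each page becomes garbage once it is counted.
            String html = new String(Files.readAllBytes(page), StandardCharsets.UTF_8);
            Matcher m = WORD.matcher(TAG.matcher(html).replaceAll(" "));
            while (m.find()) {
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The map grows with the number of distinct words rather than with the total size of the site, so this may be enough for the context-free steps. Keeping every bad word in context, as in steps 8 to 10, needs more memory or an on-disk index.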

This page is posted on the web at: http://mindprod.com/project/spellchecker.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.