This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in
Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything
else useful to implementing this project. Everything I have prepared to help you is right here.
This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, and the answer is
fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully
define the end point, or a series of ever more difficult versions of this project, and research the information yourself to solve them.
Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or give you any additional materials. I have too many
other projects of my own.
Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.
You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.
There are many spam-filtering
programs; however, I have not found a single one that worked satisfactorily.
What makes this suggested student project different?
- It is user-extensible.
- It is written in pure Java, so it runs almost anywhere.
- You can use it in four modes.
The Four Modes
- As a parallel email client. It fetches the mail before your email program
does, deletes mail on the server that is spam, and marks mail on the server that
is likely spam. The advantage of this is simplicity. You don’t have to
reconfigure your mail clients. The disadvantage is that spam can leak through in the
time between when you run the filter and when you run your email program. The other
disadvantage is that you have to turn off automatic mail fetch in your email program to
ensure the spam filter has completed.
- As a POP3 (Post Office Protocol version 3) proxy. Your email program fetches
mail from the spam filter and the spam filter fetches it from the server. The
disadvantage of this approach is you have to configure your email program to use the
proxy filter for receiving and the ordinary mail server for sending. You can’t do
that with some email programs such as Eudora 6.
- As a POP3/SMTP proxy. Your email program fetches mail from the spam filter, which
fetches it from the server, and also sends outgoing mail through the filter. The
disadvantage of this approach is you have to configure your email program to use the
filter, and it is inefficient to send outgoing mail through the filter.
- As a server application. The program runs on the server and deletes spam without
even having to download it. The disadvantage is you need permission from your
ISP (Internet Service Provider) or webserver admin to run the filter this way.
You can write your own spam filters that are applied just like
built-in ones. All you need do is write Java code that implements this interface:
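The interface listing itself is not included here. As a stand-in, the following is a minimal sketch of what such a plug-in contract might look like. The names `SpamFilter`, `rate` and `SingleWordFilter` are hypothetical, and the sketch takes the raw message text rather than a JavaMail MimeMessage purely so that it compiles without any external library.

```java
import java.util.Locale;

/** Hypothetical plug-in contract; the names are illustrative only. */
interface SpamFilter {
    /**
     * @param rawMessage full text of the message, headers and body
     * @return estimated likelihood that the message is spam, 0 to 100
     */
    int rate(String rawMessage);
}

/** A deliberately trivial filter: flags any message that mentions one word. */
class SingleWordFilter implements SpamFilter {
    @Override
    public int rate(String rawMessage) {
        // Case-insensitive match; 90 and 0 are arbitrary example ratings.
        return rawMessage.toLowerCase(Locale.ROOT).contains("viagra") ? 90 : 0;
    }
}

class SingleWordFilterDemo {
    public static void main(String[] args) {
        SpamFilter filter = new SingleWordFilter();
        System.out.println(filter.rate("Subject: Cheap VIAGRA now")); // prints 90
        System.out.println(filter.rate("Subject: Lunch on Friday?")); // prints 0
    }
}
```

Keeping the contract down to a single method means a user-written filter and a built-in one look identical to the program that calls them.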
Some of the filters you could write using this interface:
- Hook into Vipul’s Razor.
- List of spam words.
- List of spam phrases, perhaps with weights.
- Filter that extracts everyone in your address book as a friend.
- Filter that extracts everyone you have sent mail to recently as a friend.
- Bayesian filtering.
- Neural net.
- Avoid languages you don’t speak.
- Something to deal with a particular virus’s junk mail.
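To make one of the ideas above concrete, here is a rough sketch of the weighted spam phrase filter. The class name `WeightedPhraseFilter`, the phrases and the weights are all made up for illustration; a real list would have to be tuned against actual mail.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical weighted-phrase scorer; phrases and weights are invented examples. */
class WeightedPhraseFilter {
    private final Map<String, Integer> weights = new LinkedHashMap<>();

    /** Registers a phrase (matched case-insensitively) and the points it contributes. */
    WeightedPhraseFilter add(String phrase, int weight) {
        weights.put(phrase.toLowerCase(Locale.ROOT), weight);
        return this;
    }

    /** Sums the weights of every phrase present, clamped to the range 0..100. */
    int rate(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        int score = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            if (lower.contains(e.getKey())) {
                score += e.getValue();
            }
        }
        return Math.max(0, Math.min(100, score));
    }
}
```

For example, with "free money" weighted 50 and "act now" weighted 40, a message containing both rates 90. Negative weights would let a phrase count as evidence that the message is legitimate.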
A simple filter that acts as a parallel email client can be
implemented using JavaMail fairly easily. Each filter can extract information from its
own configuration file, or from -D system properties.
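One way a filter might read its settings is sketched below. It assumes the configuration file uses the standard java.util.Properties key=value format and that a -D system property should override the file; the class name `FilterConfig` and the key `spamfilter.threshold` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical settings holder: -D system properties override the file. */
class FilterConfig {
    private final Properties fileProps = new Properties();

    /** Parses key=value text in java.util.Properties format. */
    static FilterConfig parse(String text) {
        FilterConfig config = new FilterConfig();
        try {
            config.fileProps.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config;
    }

    /** Looks in the system properties first, then the file, then falls back to the default. */
    String get(String key, String defaultValue) {
        String fromSystem = System.getProperty(key);
        if (fromSystem != null) {
            return fromSystem;
        }
        return fileProps.getProperty(key, defaultValue);
    }

    /** Same lookup order as get, with the value parsed as an integer. */
    int getInt(String key, int defaultValue) {
        return Integer.parseInt(get(key, Integer.toString(defaultValue)).trim());
    }
}
```

With this arrangement, running `java -Dspamfilter.threshold=95 ...` would override whatever threshold the file specifies, which is handy for one-off experiments.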
It is simpler than most such applications, since it has no user interface, just a
configuration file. It is intended to be used by Java programmers, who then may configure
it and install it for their technopeasant friends and customers.
What you need are two interfaces for the spam filter.
One is a Java interface that is given a JavaMail MimeMessage and returns the percentage likelihood that the message is spam.
Another is a socket protocol where the message gets sent to a socket and gets back a
rating. The filter can then be written in any language.
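No wire format is specified for the socket protocol, so the following sketch invents one: the client sends the whole message, half-closes its side of the connection, and the server answers with a single line holding the rating. The class names `RatingServer` and `RatingClient`, the UTF-8 encoding, and the loopback-only binding are all assumptions. It needs Java 9 or later for `readAllBytes`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.ToIntFunction;

/** Hypothetical protocol: client sends message and half-closes; server replies "NN\n". */
class RatingServer implements AutoCloseable {
    private final ServerSocket listener;
    private final ToIntFunction<String> rater;

    RatingServer(ToIntFunction<String> rater) {
        try {
            // Port 0 asks the OS for any free port; listen on loopback only.
            this.listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.rater = rater;
        Thread worker = new Thread(this::serve, "rating-server");
        worker.setDaemon(true);
        worker.start();
    }

    int port() {
        return listener.getLocalPort();
    }

    private void serve() {
        while (!listener.isClosed()) {
            try (Socket s = listener.accept()) {
                // Read until the client half-closes, then answer with one line.
                String message = new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
                String reply = rater.applyAsInt(message) + "\n";
                s.getOutputStream().write(reply.getBytes(StandardCharsets.US_ASCII));
                s.getOutputStream().flush();
            } catch (IOException e) {
                // accept() fails once close() is called; other errors just drop that client.
            }
        }
    }

    @Override
    public void close() {
        try {
            listener.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class RatingClient {
    /** Sends one message to a local RatingServer and returns the rating it replies with. */
    static int rate(int port, String message) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            s.getOutputStream().write(message.getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput(); // signals end of message to the server
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            String line = in.readLine();
            if (line == null) {
                throw new IOException("server closed the connection without a rating");
            }
            return Integer.parseInt(line.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the reply is a plain line of text, the server side could equally be written in any other language, so long as it reads to end of stream and writes back a number.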
You also need some way to register the existence of multiple spam filters.
Then any email program can plug into any combination of spam filters.
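A registry could be as simple as the sketch below. The names `FilterRegistry` and `SpamFilter` are hypothetical (the one-method interface is repeated so the block compiles on its own), and combining ratings by taking the highest one is just one possible policy, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-method plug-in contract, repeated here for self-containment. */
interface SpamFilter {
    int rate(String rawMessage);
}

/** Hypothetical registry that holds any number of filters and combines their ratings. */
class FilterRegistry {
    private final List<SpamFilter> filters = new ArrayList<>();

    /** Adds a filter; filters are consulted in registration order. */
    void register(SpamFilter filter) {
        filters.add(filter);
    }

    int size() {
        return filters.size();
    }

    /** Assumed policy: the message is as suspicious as its most suspicious rating. */
    int rate(String rawMessage) {
        int highest = 0;
        for (SpamFilter filter : filters) {
            highest = Math.max(highest, filter.rate(rawMessage));
        }
        return highest;
    }
}
```

Taking the maximum does not suit filters that vouch for friends, which would need to lower the combined rating; an average or a weighted sum are alternatives worth trying. The standard `java.util.ServiceLoader` class is another possible route for discovering filter implementations on the classpath instead of registering them by hand.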