This essay does not describe an
existing computer program, just one that should exist. It
outlines a suggested student project in
Java programming and gives a
rough overview of how it might work. I have no source, object,
specifications, file layouts or anything else useful to implementing this project.
This project outline is not like the artificial, tidy little problems you
are spoon-fed in school, when all the facts you need are included, nothing
extraneous is mentioned, the answer is fully specified, along with hints
to nudge you toward a single expected canonical solution. This project is
much more like the real world of messy problems where it is up to you to
fully define the end point, or a series of ever more difficult versions
of this project, and research the information yourself to solve them.
Everything I have to say to help you with this project is written below.
I am not prepared to help you implement it or give you any additional
materials. I have too many other projects of my own.
Though I am a programmer, I don’t do people’s homework for
them. That just robs them of an education.
You have my full permission to implement this project in any way you please
and to keep all the profits from your endeavour.
Please do not email me about this project without reading the disclaimer above.
There are many spam filtering programs; however, I have not found a
single one that worked satisfactorily. What makes this suggested student project different?
- It is user-extensible.
- It is written in pure Java, so it runs almost anywhere.
- You can use it in four modes.
The Four Modes
- As a parallel email client. It fetches the mail before your email program does, deletes mail on the
server that is spam, and marks mail on the server that is likely spam. The advantage of this is simplicity. You
don’t have to reconfigure your mail clients. The disadvantage is that spam can leak through in the time
between when you run the filter and when you run your email program. The other disadvantage is you have to turn off
automatic mail fetch in your email program to ensure the spam filter has completed.
- As a POP3 (Post Office Protocol version 3) proxy. Your email program fetches mail from the spam filter and the spam filter fetches it from the
server. The disadvantage of this approach is you have to configure your email program to use the proxy filter for
receiving and the ordinary mail server for sending. You can’t do that with some email programs such as Eudora.
- As a POP3/SMTP proxy. Your email program fetches mail from the spam filter and the spam filter fetches it from
the server. The disadvantage of this approach is you have to configure your email program to use the filter. It is
inefficient to send outgoing mail through the filter.
- As a server application. The program runs on the server and deletes spam without even having to download it.
The disadvantage is you need permission from your ISP (Internet Service Provider) or web server admin to run the filter this way.
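The first mode distinguishes between mail that is deleted outright and mail that is merely marked as likely spam. One way to sketch that decision, assuming each message has already been given a spam percentage, is a pair of thresholds. The names Action and TriagePolicy and the threshold values used below are hypothetical; the essay does not specify them.

```java
/** What to do with a message on the server (hypothetical names). */
enum Action { KEEP, MARK, DELETE }

/** Maps a 0-100 spam percentage to an action using two thresholds. */
final class TriagePolicy {
    private final int markAt;    // at or above this percentage, mark as likely spam
    private final int deleteAt;  // at or above this percentage, delete on the server

    TriagePolicy(int markAt, int deleteAt) {
        if (markAt < 0 || deleteAt > 100 || markAt > deleteAt) {
            throw new IllegalArgumentException("need 0 <= markAt <= deleteAt <= 100");
        }
        this.markAt = markAt;
        this.deleteAt = deleteAt;
    }

    Action decide(int spamPercent) {
        if (spamPercent >= deleteAt) {
            return Action.DELETE;
        }
        if (spamPercent >= markAt) {
            return Action.MARK;
        }
        return Action.KEEP;
    }
}
```

With new TriagePolicy(50, 95), for instance, a message rated 60 would be marked and one rated 97 would be deleted. Keeping the delete threshold high limits the damage from a false positive, since deleted mail cannot be recovered.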
You can write your own spam filters that are applied just like built-in ones. All you need do is write Java code that
implements this interface:
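A minimal sketch of what such an interface might look like, offered as an illustration rather than the author's definition. The names SpamFilter, percentSpam and EmptyMessageFilter are hypothetical, and the message is passed as raw text only to keep the sketch free of dependencies; a real version would more likely accept a JavaMail MimeMessage.

```java
/**
 * Hypothetical plug-in contract for a spam filter.
 * Assumption: the message arrives as raw text (headers, blank line, body).
 */
interface SpamFilter {
    /** @return estimated likelihood, from 0 to 100, that the message is spam. */
    int percentSpam(String rawMessage);
}

/** Trivial example plug-in: treats a completely empty message as spam. */
final class EmptyMessageFilter implements SpamFilter {
    @Override
    public int percentSpam(String rawMessage) {
        return rawMessage.trim().isEmpty() ? 100 : 0;
    }
}
```

Because the interface has a single method, a user-written filter could also be supplied as a lambda.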
Some of the filters you could write using this interface include:
- Hook into Vipul’s Razor.
- List of spam words.
- List of spam phrases, perhaps with weights.
- Filter that extracts everyone in your address book as a friend.
- Filter that extracts everyone you have sent mail to recently as a friend.
- Bayesian filtering.
- Neural net.
- Avoid languages you don’t speak.
- Something to deal with a particular virus’s junk mail.
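As an illustration of one entry in the list above, here is a rough sketch of a filter based on spam phrases with weights. The class name WeightedPhraseFilter, the scoring rule (sum the weights of the phrases found, capped at 100) and the sample phrases are all assumptions made for the example; the essay leaves these choices to you.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical filter: scores a message by the weighted phrases it contains. */
final class WeightedPhraseFilter {
    // phrase (lower case) -> weight in percentage points
    private final Map<String, Integer> weights = new LinkedHashMap<>();

    /** Registers a phrase; returns this so calls can be chained. */
    WeightedPhraseFilter add(String phrase, int weight) {
        weights.put(phrase.toLowerCase(Locale.ROOT), weight);
        return this;
    }

    /** Sums the weights of all phrases present, clamped to the range 0..100. */
    int percentSpam(String messageText) {
        String haystack = messageText.toLowerCase(Locale.ROOT);
        int score = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (haystack.contains(entry.getKey())) {
                score += entry.getValue();
            }
        }
        return Math.max(0, Math.min(100, score));
    }
}
```

A filter built with add("free money", 50).add("act now", 40) would rate a message containing both phrases at 90. Simple substring matching like this is easy to evade, which is one reason the list also suggests Bayesian filtering.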
A simple filter that acts as a parallel email client can be implemented using JavaMail fairly easily. Each filter can
extract information from its own configuration file, or from -D system properties.
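A sketch of how a filter might read its settings, assuming a standard Java properties file and letting a -D system property override the file. The class name FilterConfig and any keys you choose, such as spamfilter.deleteAt, are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical settings holder: -D system property first, then file, then default. */
final class FilterConfig {
    private final Properties fileProps = new Properties();

    /** Loads key=value pairs from the filter's own configuration file. */
    FilterConfig(Reader configFile) {
        try {
            fileProps.load(configFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String get(String key, String defaultValue) {
        // A value given on the command line, e.g. -Dspamfilter.deleteAt=80, wins.
        String fromCommandLine = System.getProperty(key);
        if (fromCommandLine != null) {
            return fromCommandLine;
        }
        return fileProps.getProperty(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        return Integer.parseInt(get(key, String.valueOf(defaultValue)).trim());
    }
}
```

A filter could then be launched with something like java -Dspamfilter.deleteAt=80 followed by its usual arguments to try a setting without editing the file.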
It is simpler than most such applications, since it has no user interface, just a configuration file. It is
intended to be used by Java programmers, who then may configure it and install it for their technopeasant friends and
What we need are two interfaces for the spam filter.
One is a Java interface that is given a JavaMail MimeMessage and returns the
percentage likelihood that the message is spam.
Another is a socket protocol where the message gets sent to a socket and gets back a rating. The filter can then
be written in any language.
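The essay does not define the socket protocol, so the following is only one possible shape for it, under stated assumptions: the client writes the whole message, half-closes the connection to signal the end, and the filter replies with a single line holding an integer from 0 to 100. The class name SocketRating and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.ToIntFunction;

/** Hypothetical wire protocol: message in, half-close, one rating line out. */
final class SocketRating {
    private SocketRating() {}

    /** Filter side: accepts one connection, rates the message, sends the rating. */
    static void serveOne(ServerSocket listener, ToIntFunction<String> rater) throws IOException {
        try (Socket client = listener.accept()) {
            // Read until the client half-closes. ISO-8859-1 maps bytes 1:1 to chars.
            byte[] raw = client.getInputStream().readAllBytes();
            String message = new String(raw, StandardCharsets.ISO_8859_1);
            int rating = Math.max(0, Math.min(100, rater.applyAsInt(message)));
            OutputStream out = client.getOutputStream();
            out.write((rating + "\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    /** Email-program side: sends the message and returns the rating received. */
    static int rate(String host, int port, String message) throws IOException {
        try (Socket server = new Socket(host, port)) {
            server.getOutputStream().write(message.getBytes(StandardCharsets.ISO_8859_1));
            server.shutdownOutput(); // half-close marks the end of the message
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(server.getInputStream(), StandardCharsets.US_ASCII));
            String line = in.readLine();
            if (line == null) {
                throw new IOException("filter closed the connection without a rating");
            }
            return Integer.parseInt(line.trim());
        }
    }
}
```

Since the exchange is just bytes in and a line of text out, the filter end could equally be written in Perl, Python, C or anything else that can open a socket. A production protocol would probably also want a length prefix or terminator so one connection can rate many messages.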
You also need some way to register the existence of multiple spam filters.
Then any email program can plug into any combination of spam filters.
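A rough sketch of such a registry, assuming each filter can be treated as a function from raw message text to a 0-100 spam percentage. The class name FilterRegistry and the combining rule (take the most suspicious verdict) are assumptions for illustration; averaging or weighting the verdicts would be equally defensible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.ToIntFunction;

/** Hypothetical registry that lets a caller plug in any combination of filters. */
final class FilterRegistry {
    private final List<ToIntFunction<String>> filters = new ArrayList<>();

    /** Adds a filter; returns this so registrations can be chained. */
    FilterRegistry register(ToIntFunction<String> filter) {
        filters.add(Objects.requireNonNull(filter, "filter"));
        return this;
    }

    int size() {
        return filters.size();
    }

    /** Combined verdict: the highest percentage any filter reports, 0 if none are registered. */
    int percentSpam(String rawMessage) {
        int worst = 0;
        for (ToIntFunction<String> filter : filters) {
            int verdict = Math.max(0, Math.min(100, filter.applyAsInt(rawMessage)));
            worst = Math.max(worst, verdict);
        }
        return worst;
    }
}
```

For filters packaged as separate jars, the standard java.util.ServiceLoader mechanism is one existing way to discover implementations at run time instead of registering them by hand.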