Jonathan Coupal
ITX is a business consulting and technology solutions firm committed to
consistently providing superior products and services in nine practice
areas including Business Performance, IT Solution Strategies and
Implementation, Internet Marketing, Technical Services, IT Staffing,
Internet Services, and Technology Research.
http://www.itx.net / (800) 600-7785.
Executive Summary
Unsolicited commercial email, also commonly known as spam, has
developed a negative reputation because it is at best a waste of
valuable time and at worst an offensive intrusion into one’s desktop.
It is estimated that 56% of all mail that passes through the Internet
is spam, an increase of 40% from one year ago [1]. In addition, it is
estimated that spam costs an average of $874 per employee per year,
with a loss of approximately 1.4% of productivity due to managing spam
on the desktop [2].
Introduction
Internet email is an electronic system through which messages are
transferred between systems on behalf of their users. It is a trusting
system in that the mail server will deliver a message to the recipient
that it is addressed to. This level of trust becomes a problem when
anyone in the world can send an email to anyone. Individuals and
organizations that send unsolicited email (spammers) are taking
advantage of this trusting system.
Currently, there is very little that can be done to prevent spammers
from creating and sending emails. At this time, the only effective
remedy to this nuisance is implementing a filtering system to aid in
the management of spam.
Determining the Nature of Spam
It is fairly obvious to a person who is reading an advertisement for
Viagra that the message is spam. However, to a computer the email is
just strings of numbers, letters, and symbols. This is the first
challenge in the process of managing spam: how to get a computer to
analyze these strings to recognize and differentiate the welcome from
the unwelcome emails.
The simplest method for avoiding spam is to only accept mail from
authenticated senders. This is easy to implement but would result in
the receipt of almost no email from outside organizations. This is due
to the trusting nature of the email system which treats every incoming
connection as a valid connection. Building a “white list” of known peer
servers from trusted organizations (e.g., business partners and
clients) is, unfortunately, impossible to maintain for a large
organization.
One method for determining whether or not an incoming message is spam
is by testing the sending server for full compliance with RFC 2821 and
correct DNS setups. RFC 2821 is the specification used to describe how
messages are to be sent between SMTP mail servers, and the correct DNS
setups have to occur with the cooperation of the ISP who owns the IP
address assigned to the mail server. This method of testing for
compliance with the RFCs and collaboration with the sender’s ISP
allows the receiving server to block the incoming connection once it
has determined that the sending server is probably not a mainstream
server. This system works because, traditionally, many spammers have
built very basic bulk emailing engines, and many of these engines are
so poorly constructed that they are barely capable of sending mail at
all. The downside to this method is that there is a high likelihood of
a false positive when interacting with some open source or low-cost
email servers.
Freeware code or low-cost systems are commonly used by small
organizations for budgetary reasons. Another issue with this method is
that many spammers send their mail through valid mail servers that have
been inadvertently left available to act as relays (mail forwarders),
and the relay servers will pass all of these initial configuration
tests.
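One DNS check of this kind can be sketched as follows. This is a
minimal illustration, not the test any particular product performs. It
assumes that "correct DNS setup" means the sender's IP address has a
reverse (PTR) record, maintained by the ISP, whose hostname resolves
back to the same address. The function name and the injectable lookup
parameters are hypothetical; the defaults use Python's standard socket
module.

```python
import socket


def forward_confirmed_rdns(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip -> hostname -> addresses leads back to ip."""
    # Default to the standard-library resolver; callers may inject fakes.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        hostname = reverse_lookup(ip)
        addresses = forward_lookup(hostname)
    except OSError:
        # No PTR record, or the hostname does not resolve: check fails.
        return False
    return ip in addresses
```

A receiving server could run such a check when the connection opens
and refuse, or merely score, senders that fail it. As noted above, an
open relay with correct DNS records would still pass.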
Another common manner of avoiding mail classified as spam is to
identify the sending server as it attempts to send mail by using a
third party mechanism called a Blacklist server. These servers, such as
the Open Relay DataBase or SpamCop, offer a free service by holding
databases of identified addresses of numerous spammers and open relays.
When a mail server attempts to send mail, a simple query of one or more
of these services is typically sufficient to reduce the volume of spam
by 25%. However, these services are not perfect and can result in false
positives or negatives due to either overly aggressive databases or
latency in reporting. For instance, one of the most reliable databases
of server addresses, ORDB.org, only tests to see if the server is
misconfigured as an open relay. If a spammer sends mail from a properly
configured server or from one that has not yet been reported, then the
mail will pass this test. In addition, many companies have had
difficulty getting their valid servers out of these databases, which
causes an issue with irresolvable false positives.
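The "simple query" mentioned above is usually an ordinary DNS lookup.
The sketch below assumes the common DNS blacklist convention: reverse
the octets of the sender's IPv4 address, append the blacklist's zone
name, and treat any answer as "listed". The zone dnsbl.example.org is a
placeholder, not a real service, and the function names are
hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask a blacklist zone about an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone answers for the address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # The lookup failed (typically no such name): treat as not listed.
        return False
    return True


# 192.0.2.25 is queried as 25.2.0.192.dnsbl.example.org
print(dnsbl_query_name("192.0.2.25", "dnsbl.example.org"))
```

Because a failed lookup is treated as "not listed", a server that has
not yet been reported passes the test, which is the latency problem
described above.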
A third way of identifying spam is through the use of content
filtering. This method involves matching mail against a list of words
or phrases. This list is maintained on the server either by the local
system administrator or through a subscription service. Although this
method was popular several years ago, spammers have become adept at
avoiding these filtering engines by masking their content with
misspellings. In order to keep the list accurate and up to date, the
system administrator or keyword service must continually increase the
size of the list. This particular method tends to be fraught with false
positives and false negatives. For example, a legitimate message to a
user who actually takes Viagra would be blocked, a false positive,
while a spam message that spells the word V1agr@ would pass, a false
negative. This filtering method is considered by most to be ineffective
at filtering out anything but the most offensive email.
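The weakness of keyword matching is easy to reproduce. The sketch
below is a deliberately naive filter with a hypothetical one-word
list; real keyword engines are more elaborate, but they fail in the
same two ways.

```python
import re

BLOCKED_WORDS = {"viagra"}  # hypothetical one-word keyword list


def keyword_filter(message, blocked=BLOCKED_WORDS):
    """Return True if any blocked word appears as a whole word."""
    words = re.findall(r"[a-z0-9@]+", message.lower())
    return any(word in blocked for word in words)


print(keyword_filter("Cheap Viagra, order now"))                   # True
print(keyword_filter("My doctor renewed my Viagra prescription"))  # True
print(keyword_filter("Cheap V1agr@, order now"))                   # False
```

The second message is a false positive and the third is a false
negative. Adding "v1agr@" to the list fixes only that one spelling,
which is why the list keeps growing.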
A final method of filtering spam is Bayesian filtering, which is a very
successful variation of keyword filtering. This method differs from
traditional filters in that it filters messages statistically based on
their content, and the end user trains the system by providing feedback
on what is and is not spam. Based on this feedback the filter builds an
index of all of the words and phrases, including misspellings, that
tend to occur in messages indicated as spam. The main reason this
system operates better than conventional keyword filtering is that it
is able to filter a message that misspells the primary keyword phrases
or words. Essentially, once users have flagged a misspelling, the
misspelling itself marks a message as spam, so even a message in which
the spammer misspells everything cannot get through. The only
significant drawback to this system is that there is a delay in
implementation due to the need to train the system, and there is an
ongoing need for the end users to inform the system of false positives
and false negatives. The advantage of this system is that the frequency
of the required feedback declines over time. Also, the system reacts
quickly to spammers’ new techniques because it is constantly learning
based on feedback from the users.
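The training loop described above can be sketched with a small naive
Bayes classifier. This is an illustration under simplifying
assumptions (single-word tokens, add-one smoothing, two classes named
"spam" and "ham"), not the algorithm of any specific product. The
class and method names are hypothetical.

```python
import math
import re
from collections import Counter


class BayesianFilter:
    """Tiny naive Bayes classifier trained from user feedback."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Keep digits and '@' so misspellings such as "v1agr@" stay whole.
        return re.findall(r"[a-z0-9@$!]+", text.lower())

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham'."""
        self.counts[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return P(spam | text) under the naive independence assumption."""
        vocabulary = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.messages.values())
        scores = {}
        for label in ("spam", "ham"):
            # Prior from message counts, with add-one smoothing.
            score = math.log((self.messages[label] + 1) / (total + 2))
            label_total = sum(self.counts[label].values())
            denominator = label_total + len(vocabulary) + 1
            for token in self.tokenize(text):
                score += math.log((self.counts[label][token] + 1) / denominator)
            scores[label] = score
        # Convert the log-score difference to a probability, clamped so
        # that math.exp cannot overflow on very long messages.
        difference = max(-700.0, min(700.0, scores["ham"] - scores["spam"]))
        return 1 / (1 + math.exp(difference))


spam_filter = BayesianFilter()
spam_filter.train("cheap v1agr@ order now", "spam")
spam_filter.train("v1agr@ best price", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("please review the quarterly budget", "ham")
print(spam_filter.spam_probability("v1agr@ at a cheap price") > 0.5)
```

Because the misspelled token "v1agr@" was learned from flagged
messages, it counts against a new message just as the correctly
spelled word would. Each call to train() corresponds to one piece of
user feedback.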
Handling Spam
Once a message has been identified as spam, the recipient must
determine how the message will be handled. One approach is to refuse to
accept the message once it has been identified, or to delete the
message before it is delivered to the end user. This method, although a
common method for implementing controls on a mail server, carries the
risk of deleting a valid message that has been improperly identified as
spam.
An alternative way of managing spam is to clearly identify the message
as such. This involves inserting an identifier of some sort into the
header or subject line of the message that identifies it as spam. For
instance, a spam filter can be configured to insert the word [SPAM]
into the subject line so that the recipient can see the message and
manage it appropriately. Once the message is identified, it is simple
enough to set up a rule or filter on the mail client that will
automatically sort the message into a folder marked “spam”. The end
user can periodically review the messages in this folder to identify
any false positives.
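Marking a message rather than deleting it can be sketched with
Python's standard email package. The [SPAM] tag comes from the text
above; the X-Spam-Flag header name and the function name are
assumptions chosen for illustration.

```python
from email import message_from_string


def tag_as_spam(raw_message, tag="[SPAM]", header="X-Spam-Flag"):
    """Return the message text with a spam tag in the subject and a marker header."""
    message = message_from_string(raw_message)
    subject = message.get("Subject", "")
    if not subject.startswith(tag):
        # Delete first so the message never carries two Subject headers.
        del message["Subject"]
        message["Subject"] = f"{tag} {subject}".strip()
    del message[header]
    message[header] = "YES"
    return message.as_string()


raw = "From: a@example.com\nTo: b@example.com\nSubject: Great offer\n\nBuy now\n"
print(tag_as_spam(raw))
```

The message body is left untouched, so a false positive is still
delivered in full. A mail client rule can then match either the
subject tag or the added header to sort the message into the "spam"
folder.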
Implementing a Filtering System
Building a method for determining whether or not a message is spam is
not sufficient, and until there is a major revision to SMTP, one that
is less trusting, a mechanism needs to be built to somehow filter the
spam out of incoming mail.
The simplest way to accomplish this is by purchasing a
spam-filtering appliance. This mechanism would be installed in front of
the organization’s current mail server and would serve as a relay that
would filter out spam as it passes through the server. Although this
“black box” system is easy to implement, there are a few issues that
might prompt a system administrator to look elsewhere for an
alternative solution. The first notable issue is that these systems
typically require some sort of subscription service to maintain
functionality. Second, the level of control or optimization available
is minimal and limited to the features or options planned and
implemented in the appliance’s system. Aside from its ease of
installation, this approach allows for very little change or adjustment
to the systems that are already in place.
Another approach similar to the appliance method would be to implement
filtering software in front of the mail server. Once the software is
selected it can be installed on a separate server or on the same system
as the organization’s existing server. The software acts as a relay
server, forwarding and filtering messages into the existing mail
server. Since this is a software-only solution, there are a few
advantages over the appliance solution. One significant benefit is that
there is a wide variety of software solutions offered. Another
advantage is that this mechanism can often be installed directly on the
mail server itself, avoiding the cost of maintaining another system.
This system has the same disadvantage as the mail filtering appliance
in that it is another system for the administrator to maintain.
Many of the more robust mail server systems implement some sort of spam
control mechanism directly, as part of their receipt and delivery
function. This is very easy to use and implement, but is dependent on
how well the filtering mechanism was implemented in the mail server’s
system and which method might be available. The advantage to this is
tight integration with the mail server. For instance, if a mail server
receives a message that is deemed spam, it may automatically reply with
a message stating that the “user does not exist”. This may help to
decrease the likelihood of the spammer sending additional messages to
the address, but other systems that relay messages might not be able to
provide this level of filtering.
Another approach for implementing a filtering system is to install spam
control software as a component of the mail client itself.
Using this mechanism is simple and inexpensive to implement, but works
differently than some of the server-based controls. For instance, some
of these software controls simply key in on the sender’s email address
and mark anything from an unknown sender as spam. In addition, many of
these systems require a subscription to maintain keyword lists, as
Bayesian statistical techniques are not effective for a single user’s
small volume of mail messages.
An alternative option would be to outsource the entire filtering
mechanism to an outside organization that specializes in mail hosting
and filtering. This is a cost-effective approach, because it is often
much less expensive compared to the purchase and implementation of a
software filtering system on-site. In addition, outsourcing the
function makes change very easy; if a third party provider does not
provide sufficient or accurate filtering, then the provider can quickly
be changed. This level of freedom is not available when a solution is
purchased for local installation.
End User Training
It is important to train end users on how to handle spam when it comes
into their mailbox. A few simple rules can be sufficient to help to
reduce the ongoing impact of spam:
1. Regardless of the circumstances, a user should never respond to
spam. Responding to the message tells the spammer that the email
address is valid and makes the address worth more than it did before.
In fact, one is likely to receive more spam after responding in an
attempt to get off the spammer’s list.
2. Avoid viewing or previewing spam. There are telltale markers
embedded in HTML-based spam that help to identify that the message was
actually read.
3. Don’t forward spam to a colleague, especially if utilizing a
Bayesian filtering system, as this will lend validity to the content of
the message.
4. Be judicious in how an email address is handed out. Filling out every
form on the Internet with a personal or business email address is bound
to result in the receipt of more spam messages.
5. Remove email addresses from a personal or business website and
replace them with forms that send email in the background from the web
server via CGI. There are automated programs that simply scour the web
for email addresses that might be embedded in web pages.
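The reason for rule 5 can be shown with a short sketch of what such a
harvesting program looks for. The pattern below is a simplified
assumption about how addresses are matched, and the form path
/cgi-bin/contact is a hypothetical example.

```python
import re

# Simplified address pattern; real harvesters may use broader patterns.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_addresses(html):
    """Return the distinct email addresses that appear in page markup."""
    return sorted(set(EMAIL_PATTERN.findall(html)))


page_with_address = '<p>Contact <a href="mailto:sales@example.com">sales@example.com</a></p>'
page_with_form = (
    '<form action="/cgi-bin/contact" method="post">'
    '<textarea name="message"></textarea></form>'
)

print(harvest_addresses(page_with_address))  # ['sales@example.com']
print(harvest_addresses(page_with_form))     # []
```

With a form, the destination address lives only in the server-side
script, so there is nothing in the page for a harvester to collect.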
About ITX Spam Filtering
ITX offers a hosted anti-spam solution that provides a blended method
of identifying and handling spam. Acting as an SMTP relay for incoming
mail, ITX’s mail server first identifies the sender by IP address and
checks that address against a blacklist database from ORDB.org. Next,
the content is checked for relevancy against a fully trained Bayesian
database. These factors are combined to determine, with 99% accuracy,
whether or not the message is truly spam. Messages identified as spam
are marked as such in the subject line, checked for virus signatures,
and forwarded on to their final destination. Unlike most
providers, ITX has a strict policy of never deleting or refusing mail
that is earmarked to a client.
This conservative, blended approach has proved to be a stable and
reliable solution that has been extensively tested in a production
environment. Anti-spam filtering is available both to ITX mail hosting
clients, along with POP3/SMTP services, and to spam filtering clients
as a relay mechanism.
About Jonathan Coupal
Jonathan Coupal is the Vice President and Chief Technology Officer of
ITX Corp, a business consulting and technology solutions firm based in
Rochester, New York. Mr. Coupal manages both the day-to-day and
strategic operations of the Technology Integration Practice Group.
Among Mr. Coupal’s greatest strengths are evaluating customers’ unique
problems, developing innovative, cost effective solutions and providing
a “best practice” implementation methodology. Mr. Coupal’s extensive
knowledge and experience enable him to fully analyze client systems to
recommend the most effective technologies and solutions that will both
optimize their business processes and fulfill immediate and future
goals. Mr. Coupal and his team build a high level of trust with
clients, establishing ITX as their IT partner of choice. Mr. Coupal
holds certifications with Microsoft and CompTIA, including MCSE, MCSA,
Security+, Linux+ and i-Net+.