Avoiding the Impact of Spam

Jonathan Coupal
May 11, 2007

ITX is a business consulting and technology solutions firm committed to consistently providing superior products and services in nine practice areas including Business Performance, IT Solution Strategies and Implementation, Internet Marketing, Technical Services, IT Staffing, Internet Services, and Technology Research.
http://www.itx.net / (800) 600-7785.

Executive Summary

Unsolicited commercial email, commonly known as spam, has developed a negative reputation because it is at best a waste of valuable time and at worst an offensive intrusion into one’s desktop. It is estimated that 56% of all mail that passes through the Internet is spam, an increase of 40% from one year ago [1]. In addition, it is estimated that spam costs an average of $874 per employee per year, with a loss of approximately 1.4% of productivity due to managing spam on the desktop [2].

Introduction

Internet email is an electronic system through which messages are transferred between systems on behalf of their users. It is a trusting system in that the mail server will deliver a message to the recipient to whom it is addressed. This level of trust becomes a problem when anyone in the world can send an email to anyone else. Individuals and organizations that send unsolicited email (spammers) are taking advantage of this trusting system.

Currently, there is very little that can be done to prevent spammers from creating and sending emails. At this time, the only effective remedy to this nuisance is implementing a filtering system to aid in the management of spam.

Determining the Nature of Spam

It is fairly obvious to a person who is reading an advertisement for Viagra that the message is spam. However, to a computer the email is just strings of numbers, letters, and symbols. This is the first challenge in the process of managing spam: how to get a computer to analyze these strings to recognize and differentiate the welcome from the unwelcome emails.

The simplest method for avoiding spam is to accept mail only from authenticated senders. This is easy to implement but would result in the receipt of almost no email from outside organizations. This is due to the trusting nature of the email system, which treats every incoming connection as a valid connection. Building a “white list” of known peer servers from trusted organizations (i.e., business partners and clients) is, unfortunately, impossible to maintain for a large organization.

One method for determining whether or not an incoming message is spam is to test the sending server for full compliance with RFC 2821 and for a correct DNS setup. RFC 2821 is the specification that describes how messages are to be sent between SMTP mail servers, and a correct DNS setup requires the cooperation of the ISP that owns the IP address assigned to the mail server. Testing for compliance with the RFCs and for collaboration with the sender’s ISP allows the receiving server to block the incoming connection once it has determined that the sending server is probably not a mainstream server. This system works because, traditionally, many spammers have built very basic bulk emailing engines, and many of these engines are so poorly constructed that they are barely capable of sending mail at all. The downside to this method is that there is a high likelihood of a false positive when interacting with some open source or low-cost email servers.
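One DNS test consistent with this description is a forward-confirmed reverse DNS check: look up the name the ISP has published for the connecting IP address, then confirm that the name resolves back to the same address. The following is a minimal sketch, assuming that this is the kind of DNS test meant here. The function name and the injectable lookup parameters are hypothetical, and the addresses in the usage note come from ranges reserved for documentation.

```python
import socket


def forward_confirmed_rdns(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if the PTR name for `ip` resolves back to `ip`.

    The two lookup callables are hypothetical hooks so the check can be
    exercised without network access; by default the system resolver is used.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        name = reverse_lookup(ip)
        return ip in forward_lookup(name)
    except OSError:
        # No PTR record, or the name does not resolve: treat as not confirmed.
        return False
```

For example, `forward_confirmed_rdns("192.0.2.10")` would query the system resolver. A receiving server might treat a failed check as one signal among several instead of an outright rejection, given the false positive risk described above.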

Freeware code or low-cost systems are commonly used by small organizations for budgetary reasons. Another issue with this method is that many spammers send their mail through valid mail servers that have been inadvertently left available to act as relays (mail forwarders), and these relay servers will pass all of the initial configuration tests.

Another common way of avoiding spam is to identify the sending server as it attempts to send mail by using a third-party mechanism called a blacklist server. These servers, such as the Open Relay DataBase or SpamCop, offer a free service by holding databases of identified addresses of numerous spammers and open relays. When a mail server attempts to send mail, a simple query of one or more of these services is typically sufficient to reduce the volume of spam by 25%. However, these services are not perfect and can result in false positives or negatives due to either overly aggressive databases or latency in reporting. For instance, one of the most reliable databases of server addresses, ORDB.org, only tests to see if the server is misconfigured as an open relay. If a spammer sends mail from a properly configured server or from one that has not yet been reported, then the mail will pass this test. In addition, many companies have had difficulty getting their valid servers out of these databases, which leads to false positives that are hard to resolve.
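Blacklist services of this kind are commonly queried over DNS: the octets of the sender's IPv4 address are reversed, the service's zone is appended, and the resulting name is looked up. A successful answer means the address is listed. The sketch below assumes that convention; the zone `dnsbl.example.org` and the function names are placeholders, not a real service or API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask `zone` about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone returns an answer for `ip`.

    `resolve` is a hypothetical hook for testing. Note that a lookup
    failure of any kind is treated here as "not listed".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Calling `is_listed("192.0.2.99", "dnsbl.example.org")` would look up `99.2.0.192.dnsbl.example.org`. Because a newly active spammer is not yet in the database, a "not listed" result says nothing about whether the message is welcome.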

A third way of identifying spam is through the use of content filtering. This method filters mail by matching it against a list of words or phrases. The list is maintained on the server either by the local system administrator or through a subscription service. Although this method was popular several years ago, spammers have become adept at avoiding these filtering engines by masking their content with misspellings. In order to keep the list accurate and up to date, the system administrator or keyword service must continually increase the size of the list. This particular method tends to be fraught with false positives and false negatives. For example, a mail user might actually be using Viagra, so a legitimate message that mentions it would be a false positive, while “V1agr@” still means Viagra, so a message that slips through with that spelling would be a false negative. This filtering method is considered by most to be ineffective at filtering out anything but the most offensive email.
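The weakness of a fixed word list can be seen in a few lines of code. This is a minimal sketch of keyword matching, with a hypothetical one-word list and function name; real products use far larger lists and more elaborate matching.

```python
import re

# Hypothetical keyword list maintained by an administrator or a service.
BLOCKED_WORDS = {"viagra"}


def keyword_filter(message, blocked=BLOCKED_WORDS):
    """Return True if any alphabetic word in the message is on the list."""
    words = re.findall(r"[a-z]+", message.lower())
    return any(word in blocked for word in words)


# A legitimate message is flagged (false positive) ...
legitimate = "My doctor asked whether the Viagra prescription was renewed."
# ... while an obfuscated advertisement is not (false negative).
obfuscated = "Buy V1agr@ today at a discount!"
```

Here `keyword_filter(legitimate)` is true and `keyword_filter(obfuscated)` is false, which is exactly the pair of failures described above. Adding "v1agr@" to the list only invites the next misspelling.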

A final method of filtering spam is Bayesian filtering, which is a very successful variation of keyword filtering. This method differs from traditional filters in that it uses a statistical method to filter messages based on their content, and the end user trains the system by providing feedback on what is and is not spam. Based on this feedback, the filter builds an index of all of the words and phrases, including misspellings, that tend to occur in messages indicated as spam. The main reason this system operates better than conventional keyword filtering is that it is able to filter a message that misspells the primary keywords or phrases. Essentially, even if the spammer misspells everything in the message, the misspellings themselves become indicators of spam, so the message still cannot get through. The only significant drawback to this system is that there is a delay in implementation while the system is trained, and there is an ongoing need for end users to inform the system of false positives and false negatives. The advantage of this system is that the frequency of the required feedback declines over time. Also, the system reacts quickly to spammers’ new techniques because it is constantly learning from user feedback.
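The idea can be sketched as a small naive Bayes classifier. This is an illustration under stated assumptions and not the algorithm of any particular product: the class and method names are hypothetical, tokens are whitespace-separated strings so that misspellings such as "v1agr@" are kept intact, and add-one smoothing handles words the filter has not seen.

```python
import math
import re
from collections import Counter


class BayesianFilter:
    """Toy naive Bayes spam filter trained by user feedback."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Keep punctuation and digits so obfuscated spellings stay distinct.
        return re.findall(r"\S+", text.lower())

    def train(self, text, label):
        """Record user feedback; `label` is "spam" or "ham"."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior for the class.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for token in self.tokenize(text):
                count = self.word_counts[label][token]
                # Add-one smoothing so unseen words never give zero probability.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Convert the log-odds to a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

After training on a handful of messages marked by the user, a message containing a misspelling seen only in spam scores high, because the misspelled token itself has become evidence. Each correction from the user is simply another call to `train`.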

Handling Spam

Once a message has been identified as spam, the recipient must determine how the message will be handled. One approach is to refuse to accept the message once it has been identified, or to delete the message before it is delivered to the end user. This method, although a common method for implementing controls on a mail server, carries the risk of deleting a valid message that has been improperly identified as spam.

An alternative way of managing spam is to clearly identify the message as such. This involves inserting an identifier of some sort into the header or subject line of the message that identifies it as spam. For instance, a spam filter can be configured to insert the word [SPAM] into the subject line so that the recipient can see the message and manage it appropriately. Once the message is identified, it is simple enough to set up a rule or filter on the mail client that will automatically sort the message into a folder marked “spam”. The end user can periodically review the messages in this folder to identify any false positives.
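A minimal sketch of this tagging step, using Python's standard `email` package, might look like the following. The function name is hypothetical, and the `X-Spam-Flag` header is an assumption borrowed from common filter conventions, not something the approach above requires.

```python
from email.message import EmailMessage


def tag_as_spam(msg, tag="[SPAM]"):
    """Mark a message as spam in its subject line and in a header."""
    subject = str(msg.get("Subject", ""))
    if not subject.startswith(tag):
        # A message may carry only one Subject header, so replace it.
        del msg["Subject"]
        msg["Subject"] = f"{tag} {subject}".strip()
    # Assumed header name; mail clients can filter on either marker.
    del msg["X-Spam-Flag"]
    msg["X-Spam-Flag"] = "YES"
    return msg
```

The message itself is delivered unchanged apart from the markers, so a client-side rule that matches `[SPAM]` in the subject can move it to a folder the user reviews for false positives.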

Implementing a Filtering System

Having a method for determining whether or not a message is spam is not sufficient on its own. Until there is a major revision to SMTP that makes it less trusting, a mechanism must also be built to filter the spam out of incoming mail.

The simplest manner to accomplish this is by purchasing a spam-filtering appliance. This mechanism would be installed in front of the organization’s current mail server and would serve as a relay that would filter out spam as it passes through the server. Although this “black box” system is easy to implement, there are a few issues that might prompt a system administrator to look elsewhere for an alternative solution. The first notable issue is that these systems typically require some sort of subscription service to maintain functionality. Second, the level of control or optimization available is minimal and limited to the features or options planned and implemented in the appliance’s system. Aside from its ease of installation, this approach allows for very little change or adjustment to the systems that are already in place.

Another approach, similar to the appliance method, is to implement filtering software in front of the mail server. Once the software is selected, it can be installed on a separate server or on the same system as the organization’s existing mail server. The software acts as a relay server, filtering messages and forwarding them to the existing mail server. Since this is a software-only solution, it has a few advantages over the appliance solution. One significant benefit is that a wide variety of software solutions is available. Another advantage is that this mechanism can often be installed directly on the mail server itself, avoiding the cost of maintaining another system. When it is installed separately, it has the same disadvantage as the mail filtering appliance in that it is another system for the administrator to maintain.

Many of the more robust mail server systems implement some sort of spam control mechanism directly, as part of their receipt and delivery function. This is very easy to use and implement, but is dependent on how well the filtering mechanism was implemented in the mail server’s system and which method might be available. The advantage to this is tight integration with the mail server. For instance, if a mail server receives a message that is deemed spam, it may automatically reply with a message stating that the “user does not exist”. This may help to decrease the likelihood of the spammer sending additional messages to the address, but other systems that relay messages might not be able to provide this level of filtering.

Another approach to implementing a filtering system is to install spam control software as a component of the mail client itself. This mechanism is simple and inexpensive to implement, but it works differently from some of the server-based controls. For instance, some of these software controls simply key in on the sender’s email address and mark anything from an unknown sender as spam. In addition, many of these systems require a subscription to maintain keyword lists, as Bayesian statistical techniques are not effective for a single user’s small volume of mail messages.

An alternative option would be to outsource the entire filtering mechanism to an outside organization that specializes in mail hosting and filtering. This is a cost-effective approach, because it is often much less expensive compared to the purchase and implementation of a software filtering system on-site. In addition, outsourcing the function makes change very easy; if a third party provider does not provide sufficient or accurate filtering, then the provider can quickly be changed. This level of freedom is not available when a solution is purchased for local installation.

End User Training

It is important to train end users on how to handle spam when it comes into their mailbox. A few simple rules can be sufficient to help to reduce the ongoing impact of spam:
1. Regardless of the circumstances, a user should never respond to spam. Responding to the message tells the spammer that the email address is valid and makes the address worth more than it was before. In fact, one is likely to receive more spam after responding in an attempt to get off the spammer’s list.
2. Avoid viewing or previewing spam. There are telltale markers embedded in HTML-based spam that help the sender identify that the message was actually read.
3. Don’t forward spam to a colleague, especially if utilizing a Bayesian filtering system, as this will lend validity to the content of the message.
4. Be judicious in how an email address is handed out. Filling out every form on the Internet with a personal or business email address is bound to result in the receipt of more spam messages.
5. Remove email addresses from personal or business websites and replace them with forms that send email in the background from the web server via CGI. There are automated programs that simply scour the web for email addresses embedded in web pages.
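The last rule is easier to appreciate once it is clear how little effort address harvesting takes. The sketch below shows the kind of pattern match such a program might run over a downloaded page. The pattern, the function name, and the sample pages are illustrative assumptions, not a description of any specific harvesting tool.

```python
import re

# Simplified pattern for things that look like email addresses.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_addresses(html):
    """Return the distinct address-like strings found in a page."""
    return sorted(set(EMAIL_PATTERN.findall(html)))


# A page that publishes an address directly exposes it ...
page_with_address = '<a href="mailto:sales@example.com">sales@example.com</a>'
# ... while a page that only offers a contact form gives the harvester nothing.
page_with_form = '<form action="/cgi-bin/contact" method="post"><input name="message"></form>'
```

With these samples, `harvest_addresses(page_with_address)` finds the published address and `harvest_addresses(page_with_form)` finds none, since the destination address lives only in the server-side script.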

About ITX Spam Filtering

ITX offers a hosted anti-spam solution that provides a blended method of identifying and handling spam. Acting as an SMTP relay for incoming mail, ITX’s mail server starts by identifying the sender by IP address and checking that address against a blacklist database from ORDB.org. Next, the content is checked against a fully trained Bayesian database. These factors are combined to determine, with 99% accuracy, whether or not the message is truly spam. Messages identified as spam are marked as such in the subject line, checked for virus signatures, and forwarded on to their final destination. Unlike most providers, ITX has a strict policy of never deleting or refusing mail that is destined for a client.

This conservative, blended approach has proved to be a stable and reliable solution that has been extensively tested in a production environment. Anti-spam filtering is available both to ITX mail hosting clients, along with POP3/SMTP services, and to spam filtering clients as a relay mechanism.

About Jonathan Coupal

Jonathan Coupal is the Vice President and Chief Technology Officer of ITX Corp, a business consulting and technology solutions firm based in Rochester, New York. Mr. Coupal manages both the day-to-day and strategic operations of the Technology Integration Practice Group. Among Mr. Coupal’s greatest strengths are evaluating customers’ unique problems, developing innovative, cost-effective solutions, and providing a “best practice” implementation methodology. Mr. Coupal’s extensive knowledge and experience enable him to fully analyze client systems and recommend the most effective technologies and solutions that will both optimize their business processes and fulfill immediate and future goals. Mr. Coupal and his team build a high level of trust with clients, establishing ITX as their IT partner of choice. Mr. Coupal holds certifications with Microsoft and CompTIA, including MCSE, MCSA, Security+, Linux+ and i-Net+.