Build a Perl/CGI voting systemUsing locked DBM files with CGI-driven forms saves client data without DBMS overkill
|
|
|
| 1.0/5.0 (1 votes total) |
|
|
|
Allan Peda March 27, 2006
|
Allan Peda |
Allan Peda has worked as a software developer and a Linux system
administrator for ten years. He previously worked as an environmental
engineer for New York State and for several consulting firms. You can contact Allan at [email protected]. |
Allan Peda
has written 1 articles for CGIDir. |
View all articles by Allan Peda... |
Many Web-deployed applications are written within elaborate
database-driven server-side development frameworks such as PHP and
Java™ servlets, but for simple applications (for example, where the
entire dataset fits comfortably within a Web server's RAM), data
persistence can be easily accomplished using locked DBM files in
conjunction with the Perl MLDBM module. This article
presents a real-world example -- a Web-based voting application -- that
highlights the use of minimal external modules, forgoes using
client-based cookies, and takes advantage of CGI attributes.
It's no secret that software is becoming increasingly complex and that
we are seeing additional layers added to systems in an effort to keep
software components modular. The primary result should be that these
systems are now easier to maintain and more scalable, yet sometimes
these techniques are simply overkill that results in over-designed
software. In other cases, developers choose an overly complex but
well-known technology rather than investigate a simpler but less
familiar alternative.
After all, if all one has is a hammer, then every problem looks like a nail.
I was recently asked to design a small program to tally election
ballots for a university student organization. It was a simple project
insofar as it was limited to serving no more than 500 students over the
course of one week; afterwards, the results would be tallied and
released.
Since this project required an almost trivial level of service,
there would be no benefit to passing queries to and from an external
database. Instead, the script could rapidly read and write data
structures directly. Still, I wanted to put something together that was
a little bit better designed than a few pages of spaghetti code. I
wanted a well-thought out, self-contained design that would also offer
simplified deployment.
CGI considerations: Simplicity vs. complexity
Perl seemed an obvious language to choose for this project -- it is
supported on most typical platforms, and there are many convenient
libraries available in the Perl library repository, CPAN.
As to underlying architectures, the Common Gateway Interface (CGI)
was the first widely used approach to extend Web servers to provide
interactive content. Developers often speak of the superiority of newer
standards such as JSP, .NET, mod_perl, PHP, and ISAPI, and they are
correct to point out CGI's deficiencies. But in the case of this
project, a CGI script that counts votes for several hundred users
hardly constitutes a large-scale application, since all ballot
information can easily be held in the system RAM of the Web server.
This allows the entire lookup table to be loaded into memory each time
a user submits a request to read or write data.
Furthermore, the logical sequence of drafting a ballot, confirming a
ballot, and tallying the results lends itself to splitting up the
logical data into three different physical files. This would minimize
attempts to open locked files.
There would also be no real penalty for a transaction failing
occasionally due to a locked file. Whether a transaction failed due to
network problems or a locked file, the result would be the same: The
user would simply click a second (or third) time, then the vote would
most likely be counted on one of those attempts. This behavior should
be kept in mind, however, as a different application might not be as
forgiving of an inability to process concurrent transactions.
For this project, CGI offers several advantages:
-
It needs no special Web server enhancements.
-
It requires no database engine (in this simple case).
-
It can be incrementally developed.
-
It can even be upgraded later through the use of accelerators such as mod_perl.
Keep in mind, however, that due to platform constraints, CGI
applications (those that create new processes) run much slower on Win32
systems. Also, the Apache Web server is still considered a
Linux™/UNIX®-hosted application, although it may run fine under
Windows®. The IBM's Resources
section offers information on another (non-IIS) Web server alternative
for Win32 systems, as well as a classic explanation of the original CGI
specification on the original National Center for Supercomputing
Applications (NCSA) site.
Functional design considerations
Let's dive right into a major concern for this simple project: the functionality of the design.
Here's the idea. The user is presented with an initial screen
requesting that she enter her e-mail address and select several
candidates from a Web form. The submission of the selections is
recorded as a draft ballot entry locally, then an e-mail verification
is sent to the supplied e-mail address. In this case, I am assuming
that a verified e-mail address is sufficient to establish a user's
identity.
This brings up the issue of multiple voting. Practically speaking, I can think of no way to make it impossible
for users to vote multiple times using multiple e-mail addresses, but
we can limit the tallying of votes so that only one vote is allowed per
e-mail account. This verification e-mail contains a link back to the
originating CGI script, which allows comparison of the link to the
record saved in a local DBM file. If the two records match, a ballot
entry is made in the cast ballot table and the vote is tallied. If
records do not match, no entry is made and the vote is not scheduled
for confirmation. Instead, a new confirmation e-mail is generated, with
a new verification record in the database. This overwrites any draft
ballot entry for the associated e-mail address, effectively starting
the process again from the beginning.
If the records match, the voter may confirm the draft ballot. At
this point, if the voter changes her mind, she could simply return to
the Web form and enter a new draft ballot, which would replace the
previous one. This design presents a reasonably secure system; as long
as each voting user has one and only one acceptable e-mail account,
there is reasonable assurance that users could not vote twice. (I'll
return to this later.)
Let's get into the system details.
Details: Hash keys
The use of hashed keys to create associative arrays in Perl allows
on-the-fly development of complex data structures. When you combine
this feature with the ability to store these (arbitrarily complex)
structures in a binary DBM file, you can develop what amounts to a tiny
database system. The missing component that allows all of this to work
is provided by the MLDBM and MLDBM::Sync modules.
The MLDBM module allows complex Perl hashes to be almost seamlessly stored in a local file. The MLDBM::Sync module makes it possible to safely lock these file, using the $sync->Lock and $sync->ReadLock methods. After loading or saving the structure of interest, a subsequent call to the UnLock() method flushes I/O and unties the variable. (See the Perl documentation regarding the MLDBM::Sync module for more information -- man 3 MLDBM::Sync .)
Essentially, the logic flow is as simple as that shown in Listing 1.
Listing 1. Pseudo-code logic flow
1 unless( defined( $q->param( $vparm ) )){
2 # Display initial voting stuff here
3 # select a candidate
4 $ballotBox->printForm( $q );
5 } else {
6 # if vote is tallied, do _not_ mail a ballot
7 if( $castBallot->voteIsTallied( $q ) ){
8 print "Your vote has already been recorded"
9 } else {
10 #
11 # vote not tallied yet, check if we have a draft ballot on file
12 # and move the draftBallot into the castBallot object
13 #
14 if( $draftBallot->exactMatch( $q ) ){
15 # cast ballot
16 print $q->h2('Thank you, your vote has been recorded.');
17 # add the vote to the cast ballot db file
18 $castBallot->tallyVote( $q );
19 # sum up all votes
20 $ballotBox->addVotes( $castBallot );
21 $cc_msg->send();
22 } elsif ( $draftBallot->voter_is_okay( $voter_email )){
23 # Send e-mail to allow voter to confirm vote
24 $mime_msg->send()
25 } else {
26 print 'Only University ballots are acceptable';
27 }
28 }
29 }
Once I had decided what the underlying conditional flow would be, the
only remaining task was to construct objects that would fit into this
sequence of actions. As I noted previously, the necessary hash data
structures were retrieved and updated using tie 'd variables and MLDBM
file locking. The objects used were more like smart structures than
full-fledged objects; the data was passed between them in a way that
paralleled the progress of a ballot from an initial draft to a final
counted vote.
In other words, the list of ballots was used to construct a DraftBallot , which in turn was used to create the CastBallot and BallotBox classes. In this way there was minimal coupling to the main ballot CGI program.
On another tangent, while I understand that it is generally
considered poor practice to have constructors that rely on external
resources such as files (since they could fail and wind up in an
unpredictable state), the code in this case was much easier to
understand by doing just that. Since Perl doesn't rely on pointers, I
can see no reason not to take advantage of this simplification.
Details: E-mail gotchas
Allowing users to send e-mail from your Web server is a risky move
since spammers could potentially exploit your host to send unsolicited
e-mail. In order to minimize the possibility of such an exploit, the
script always checks to determine if the e-mail is being sent to an
acceptable address. You could further tighten the system by modifying
the validation method voter_is_okay() within the DraftBallot class to consult a list of acceptable e-mail addresses. Effectively, this would require users to register in advance to vote.
Other methods of preventing duplicate votes could involve collecting IP
addresses or setting cookies on the client, but I rejected these
approaches as it is likely that many students would use shared public
terminals on campus.
|
SASL authentication
Many service providers allow e-mail to be relayed only from mail clients that use SASL authentication.
Simple Authentication and Security Layer is a method for adding
authentication support to connection-based protocols in which a
protocol includes a command to identify and authenticate a user to a
server and optionally negotiate protection of subsequent protocol
interactions. If its use is negotiated, a security layer is inserted
between the protocol and the connection.
When using SASL authentication, you have two options: Either point
the script to a machine that can forward the e-mail using correct
credentials, or rewrite the script to contact the external SMTP server
directly using the Perl Net::SMTP_auth and Authen::SASL modules. (The second option is slower.)
|
|
Details: Not-so-secret ballots
The call to the $castBallot->dumpHTMLentrys()
method would echo back a detailed accounting of who voted for whom. In
practice I would comment this call out, scheduling the Web server to be
shut down when elections were over by using the Linux at batch command.
With the server off, you can uncomment this section and restart with the Web server temporarily set to listen only
on the localhost address. The full results would be then be echoed back
to anyone clicking on a previously submitted link, which can be
collected through a copy sent to a dedicated free e-mail account.
Notice that ballots are not tallied twice in this example. These
results are hidden using a short JavaScript function in case it was
decided to make them discreetly available to everyone in an unobtrusive
manner. Admittedly some people would prefer totally anonymous voting,
but since club elections are often done as a show of hands, that hardly
constitutes a secret ballot.
As I thought about this workflow scheme, I realized that the necessity of using a GET -based
verification link and the use of unencrypted verification links made it
a trivial exercise to read these links and construct a false
confirmation ballot based on a specific e-mail address and some known
verification links. In order to discourage this, while still allowing
the easy debugging via unencrypted links, I decided to add one more
twist to the verification process: adding a unique identifier to each
draft ballot.
This identifier was based on the operating system process identifier
(PID) of the executing script. This was combined with a random number
in order to make it difficult to predict the URL to validate a draft
ballot. I was concerned about this because a malicious individual might
be able to deconstruct the very visible URL patterns in order to create
false confirmation ballots. This is the one part of the code that would
not translate directly to a mod_perl
version, since it relies on the use of the PID of the running Perl
instance concatenated with random digits. If the form is generated from
a reused mod_perl instance, the PID number will not necessarily change between invocations.
In hindsight, I realize that the best way to obfuscate this link would
be to use an MD5-generated hash value, effectively hiding all voter
information. This would have the dual benefit of being pretty tough to
forge, while remaining portable to mod_perl -based
scripts. The drawback would be that the code is slightly more difficult
to debug by inspecting the exchange of information between the client
and the server.
Details: File layout
Installation requires three types of directories on the Web server:
-
A writable directory for saving user submissions
-
An area for the CGI to run from
-
An area where static data (such as CSS, logo images, and possibly a file containing more detailed instructions) can reside
Also note that permissions will most likely need to be tweaked so that
the Web server can write to the directory for the DBM files.
Listing 2 shows the creation of some typical directories on the Web server.
Listing 2. Setting up directories on the Web server
$ id uid=500(allan) gid=500(allan) groups=10(wheel),48(apache),500(allan)
$ sudo mkdir /var/www/db /var/www/javascript/ /var/www/css/
$ sudo chmod 2775 /var/www/db
$ sudo chmod 2755 /var/www/javascript/ /var/www/css/
$ sudo chown apache.apache /var/www/db/
Strictly speaking, only the cgi-bin (/var/www/cgi-bin) and DBM
(/var/www/db) directories are absolutely essential, since they hold the
script executable and the voting data, respectively. The layout shown
in Listing 1 is specific to Linux, and the user and group names of the
Web server process may vary, but the essential point is that there are
several components that need to be placed in the correct area of the
file system to be available to the Web server. After copying the
support files into their respective directories, be sure to update any
aliases in the Web server configuration files such as httpd.conf.
After creating the directories as described in Listing 2, copy the
files from the ZIP file into the analogous subdirectories on your
system. Most importantly, the ballot, DraftBallot.pm, BallotBox.pm, and
CastBallot.pm files should be in the cgi-bin directory. Only three
nonstandard Perl modules are required; the process of installing them
is presented in Listing 3 (see the modules' README files for more
details).
Listing 3. Installation of Perl modules
$ sudo perl -MCPAN -e 'install MLDBM'
$ sudo perl -MCPAN -e 'install MLDBM::Sync'
$ sudo perl -MCPAN -e 'install MIME::Lite'
Details: Static vs. dynamic DNS
While it would have been possible for me to set up this service from
a site with an assigned domain on a static IP address, I felt that a
dynamic DNS would offer several security advantages. Normally a server
is not reachable from the Web at large without a static IP address, but
a dynamic DNS service allows me to set up a temporary resolvable
machine name under another top-level domain. This allows me to quickly
appear and disappear from the Internet, minimizing my exposure to black
hats. Best of all, the service is free.
It is also worth noting that it may be advisable to configure the
server to listen on a nonstandard high-numbered port such as 8000 since
many ISPs block incoming connection requests on port 80. A client
(voter) could then typically be referred to the vote server only
through a link from a well-known static address, such as a
school-supplied Web page. When voting is completed, the server offering
the Web application can be completely removed from the Web remotely
without shutting it down or reconfiguring it. There would be no
possibility of any vulnerabilities affecting the referring page (which
may be administered by someone else). This is an especially important
consideration in politically sensitive environments.
Details: Is GET harmful?
Browsers can maintain state by transferring data to referred pages using the GET and POST
methods, as well as through cookie information contained within the
headers passed to the server. In order to confirm that a ballot was
sent from a real person (or at least from an active e-mail account),
draft ballots would be sent to an e-mail address for confirmation.
Additionally, cc: or bcc: messages could be used later for reference.
As I mentioned earlier, the most straightforward way to accomplish this
is to send an HTTP GET structured link to the voter. Still, some authors claim that GET s
that update records are simply bad form. In this case, though, any user
subsequently clicking on a link will simply receive an update as to the
current tally of votes for each candidate, so no harm is done.
Other possible improvements
Other security considerations could and should be taken into account
when using this script. Any program that allows an external entity to
input data is vulnerable to malicious activity, such as buffer
overflows and embedded control characters. Conversely, the use of a
dedicated routine to read and write local DBM files has at least one
benefit: There is no possibility of SQL injection when there is no SQL
back end to access.
In a nod to the need to filter incoming data, I set the variables $CGI::DISABLE_UPLOADS and $CGI::POST_MAX to very strict values. Additionally I recommend the following:
-
String all incoming variables of all unexpected characters and truncate the length to a reasonable limit.
- A lot of runtime data is kept inside the script. The advantage to
this practice is that there are fewer files to distribute and to set
permissions for. The disadvantage is that users may not want to edit
code, and the code becomes less clear. A possible compromise may be to
take advantage of kludges such as the
DATA pseudo-file handle to tuck data at the end of a script.
-
File locking is a very tricky issue and race conditions abound. It
seemed that for every guideline I found spelling out the correct way to
lock files, a subsequent correction was posted. I tried to minimize the
time files were open and leverage the locking mechanism provided for
the
MLDBM module.
-
Perl modules are not put in their own path away from the CGI, so they
could theoretically be executed from the cgi-bin directory. It is
recommended that these modules not be set as executable.
- PHP is practically ubiquitous on Linux platforms, so I would
consider porting this script in PHP if the need to reimplement it
arose. However, I am not sure there is a PHP equivalent to the
MLDBM module.
-
The layout of the voting form is considered unfair by some since it presents the first candidate as a default.
-
I did not use perldoc. I should.
Conclusion
Having the opportunity to set up this system, while attempting to keep
it simple and self contained, allowed me to discover some very useful
Perl modules. I found the process of refining features and developing
functional specifications for such a simple project to be both
interesting and enjoyable. I hope the list of considerations when
crafting this type of system will help you with similar projects.
|