Intro to CGI
"What is CGI? What is Perl? Chmod? Executable? 500 Errors? What the hell is a cgi-bin?"
Matt Kruse April 06, 2006
1. WHAT IS CGI?
CGI stands for "Common Gateway Interface" - a term you don't really
need to know. In short, CGI defines how a web server hands information,
such as the contents of an HTML form, to a program and gets the
program's output back to send to the web browser. That's simplifying it,
but you get the point. In the broader sense, however, the term 'CGI' is
often used to mean "any program that runs on a web server and interacts
with a web browser". You may hear someone ask, "Where can I get a CGI
script to handle this form?" or "Use CGI to do what you need". What
they are referring to is a program of some sort that runs on your web
server.
2. DYNAMIC CONTENT
Okay, so a web server spends most of its time answering requests,
loading the HTML page that a user is requesting, and sending it to
them. Nothing too complicated. But this isn't very exciting, is it?
What if I want the user to see something different every time they load
a page? What if I want to ask a user for information, and save it to a
database? What if I want to display information from a file that may
change 5 times a day? In situations like this, loading a static
(non-changing) page just isn't good enough. We need the web server to
run a program, take some action, and then send a results page back to
the user's web browser. The results page might be different every time
the program is run.
Let's take an example...
You create a web page that has a form in it, asking for the user's
name and email address. There is also a 'Submit' button. When
the user presses submit, their information should be saved in a file on
the web server that you can view later, and they should get a 'Thank
You' screen back.
In your original HTML document, you will have a <FORM> and some <INPUT> tags. For example:
<FORM METHOD="POST" ACTION="http://www.server.com/cgi-bin/program.cgi">
Name: <INPUT NAME="name"><BR>
Email: <INPUT NAME="email"><BR>
<INPUT TYPE="SUBMIT" VALUE="Submit">
</FORM>
This is a basic form. Webmonkey's HTML Tutorial
(http://www.hotwired.com/webmonkey/99/30/index4a.html?tw=html) is good if you need to learn more about Forms in HTML. If you don't
understand forms, read up on them first. You can't really tackle CGI
scripts without understanding forms.
The <FORM> tag has two attributes that are important for us. The
METHOD attribute defines how the browser will send the information to
the server, and how the web server will send it to your program. It can
be either "POST" or "GET" - you will most often see "POST". In short,
GET tacks the form data onto the end of the URL, while POST sends it
separately in the body of the request. For a full explanation of the
difference, you need a longer tutorial or a book.
The other attribute, ACTION, is the URL of the program on the server
that will process the information sent from the form and do something
with it.
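Under the CGI specification (RFC 3875), the METHOD decides where your program finds the form data: with GET the server puts it in the QUERY_STRING environment variable, and with POST it arrives on the program's standard input, with its size in bytes in CONTENT_LENGTH. Here is a minimal sketch in Python, used only for illustration; `read_form_data` is a hypothetical helper name, and it assumes the browser's default form encoding (application/x-www-form-urlencoded).

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form_data(environ=None, stdin=None):
    """Return the submitted form fields as a dict of lists.

    Hypothetical helper (not from any library). Assumes the default
    application/x-www-form-urlencoded form encoding.
    """
    environ = os.environ if environ is None else environ
    stdin = sys.stdin.buffer if stdin is None else stdin

    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the server writes the form data to our standard input and
        # tells us how many bytes to read in CONTENT_LENGTH.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8")
    else:
        # GET: the form data is the part of the URL after the '?',
        # which the server hands us in QUERY_STRING.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


if __name__ == "__main__":
    # Simulate the example form being submitted with METHOD="POST".
    body = b"name=Jane+Doe&email=jane%40example.com"
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
    print(read_form_data(fake_env, io.BytesIO(body)))
    # {'name': ['Jane Doe'], 'email': ['jane@example.com']}
```

Note that the browser encodes spaces and special characters (the `+` and `%40` above), so the program has to decode them; `parse_qs` does that step.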
When the user hits 'submit', the web browser makes a connection to the
server, requests the URL in the ACTION attribute, and also sends all
the form values that the user entered. The web server looks at the URL,
realizes it is a program rather than a static file, and runs it. The
program then grabs all the data sent to it, does something with it, and
returns HTML back to the browser as the response. That's it! That's the
basic process that almost all CGI scripts are going to go through.
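Putting the pieces together for the name-and-email example, the program only has to read the posted data, append it to a file, and print a header followed by the 'Thank You' page. The sketch below is in Python rather than Perl; the data file name `submissions.txt`, the tab-separated format, and the function name `handle_post` are assumptions made for illustration. The data file must be writable by the account the web server runs programs as (see 'File permissions' in section 6).

```python
#!/usr/bin/env python3
"""Sketch of the example form handler: save name and email, then say thanks.

Assumptions (not from the original article): the data file name and the
tab-separated record format.
"""
import html
import os
import sys
from urllib.parse import parse_qs

DATA_FILE = "submissions.txt"  # hypothetical; must be writable by the server's user


def handle_post(raw_body: str, data_file: str = DATA_FILE) -> str:
    """Save the submitted fields and return the complete CGI response."""
    fields = parse_qs(raw_body)
    name = fields.get("name", [""])[0]
    email = fields.get("email", [""])[0]

    # Append one line per submission so you can read the file later.
    with open(data_file, "a", encoding="utf-8") as f:
        f.write(f"{name}\t{email}\n")

    # Header, empty line, then the HTML page the browser will display.
    # html.escape keeps a visitor from injecting markup into the page.
    return (
        "Content-Type: text/html\n"
        "\n"
        f"<html><body><h1>Thank You, {html.escape(name)}!</h1></body></html>\n"
    )


if __name__ == "__main__":
    if os.environ.get("REQUEST_METHOD") == "POST":
        length = int(os.environ.get("CONTENT_LENGTH") or 0)
        body = sys.stdin.buffer.read(length).decode("utf-8")
        sys.stdout.write(handle_post(body))
    else:
        # Not a form submission: still send a valid, header-first response.
        sys.stdout.write("Content-Type: text/plain\n\nPlease submit the form.\n")
```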
3. CONFIGURATION ON THE SERVER SIDE
In order for all this to happen, you need to make sure your web server
is set up to handle it. By default, some are and some aren't. You will
probably need to check with your webmaster or ISP to see if your server
is set up correctly, and if you are able to run CGI programs. Or, if you
are the webmaster, you need to read your server's documentation with
respect to CGI. Either way, it's important to understand why it works
and why it doesn't.
When you (your browser, actually) request a URL from a server, the
server needs to do some checking to find out what to do. How does the
server know if the URL you are requesting is a static file it should
just load and send, or if it's a program it should run and send to you?
This is typically decided by two factors: Which directory the file is
in, and its file extension.
First, let's look at the directory part. If you're reading this, you've
no doubt heard of a 'cgi-bin' directory, and noticed that most CGI
scripts need to be in this directory. Why? Well, this is a server
configuration issue. The server is set up to know that any file in this
directory is a program to run, and not a static file to send to the
browser. Usually, you can't even put a regular HTML file in this
directory, because when the server tries to load it, it will try to run
it as a program rather than just send it as a file.
Are you curious where the name 'cgi-bin' came from? Well, it goes back
to the original days of the NCSA web server. By default, this web
server had two directories: cgi-src and cgi-bin. The first contained
source code for CGI programs that could run on the server. The second
contained the binaries (compiled executables) of the programs, which
could be run on the server. Web servers typically don't have the
cgi-src directory anymore, but the name cgi-bin has stuck around as the
'default' place to put executable CGI programs on a web server.
Now let's look at the second factor to determine whether a web server
runs the file or loads it as a static file: the file extension.
The extension of a file on the server - .html, .cgi, .pl, .txt, etc -
tells the server what kind of file it is and how to handle it. It knows
that .html and .txt files are plain text static files that should just
be sent to the browser, for example. You can add your own file
extensions through the web server's configuration options, and tell it
how to handle those files. The .cgi extension is one example of an
extension that the web server is configured to recognize as a program
it should run.
Okay, now let's take another look at the <FORM> line from our example above:
<FORM METHOD="POST" ACTION="http://www.server.com/cgi-bin/program.cgi">
When the web browser sends its request to the ACTION URL, the web
server sees that it is in the cgi-bin directory, and its extension is
.cgi - so it knows that this is a program that it should run. So it
hands off a request to the operating system telling it to run the
program, and also passes all the form data to this program. Makes
perfect sense, doesn't it?
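The two checks can be sketched as a small function. This is a Python illustration under stated assumptions, not how any particular server is written: the settings `CGI_DIRECTORIES` and `EXECUTABLE_EXTENSIONS` are hypothetical stand-ins for what a real server reads from its configuration files, and the sketch treats either signal as enough to run the file, while real servers may combine them differently (for example, allowing .cgi files to run only in certain directories).

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical server settings; a real server reads these from its config.
CGI_DIRECTORIES = {"/cgi-bin"}
EXECUTABLE_EXTENSIONS = {".cgi"}


def should_execute(url: str) -> bool:
    """Return True if the server should run the file instead of sending it."""
    path = PurePosixPath(urlsplit(url).path)
    # Factor 1: is the file inside a directory set aside for programs?
    in_cgi_dir = any(str(parent) in CGI_DIRECTORIES for parent in path.parents)
    # Factor 2: does its extension mark it as a program?
    has_cgi_ext = path.suffix.lower() in EXECUTABLE_EXTENSIONS
    return in_cgi_dir or has_cgi_ext


if __name__ == "__main__":
    for url in (
        "http://www.server.com/cgi-bin/program.cgi",  # run it
        "http://www.server.com/cgi-bin/page.html",    # run it (cgi-bin wins)
        "http://www.server.com/index.html",           # send it as a file
    ):
        print(url, "->", "run" if should_execute(url) else "send")
```

The second URL shows why a regular HTML file placed in cgi-bin usually fails: the server tries to run it as a program.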
4. RUNNING THE PROGRAM
We're now at the point where the web server has decided it should run
the CGI program, and it has made the request to the operating system to
execute the file. This is where a lot of problems start happening,
because there are a lot of things that need to be exactly correct in
order for the program to run successfully and send the output back to
the web browser. Some of these potential problems are specific to UNIX,
and some are specific to Windows NT (I won't go into other operating
systems because these two are the most common). I'll just go down the
list of things that need to be correct in order for this to work.
1. The file needs to be executable (Unix only)
In Unix, files have attributes that don't exist in the Windows NT
world. One of these is the executable bit. Each file has a setting that
tells the operating system whether it can be executed as a program or
not, and whether it can be run by only the file owner, only the group
that the file owner is in, or by everyone on the server. In order for
the operating system to run the file for the web server, it needs to be
executable by the user the web server runs as. That is usually not your
own account, so in practice the file needs to be marked as 'executable'
by everyone. This is what the 'chmod' command does. I won't go into
detail about how chmod works, but when you see an instruction that says
something like 'do a chmod 755 on the program.cgi file', what it is
telling you is to make the file readable and executable by everyone on
your server (and writable only by you), so it can be run from the web
server.
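The number in 'chmod 755' is three octal digits, one each for the owner, the group, and everyone else, where 4 means read, 2 write, and 1 execute, so 7 is rwx and 5 is r-x. A short Python sketch (Unix only, like chmod itself) that spells the bits out with the standard `stat` module and applies them with `os.chmod`; the helper names are made up for this example.

```python
import os
import stat


def describe_mode(mode: int) -> str:
    """Show permission bits the way 'ls -l' prints them for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def make_cgi_executable(path: str) -> None:
    """Same effect as 'chmod 755 path': owner rwx, group r-x, others r-x."""
    os.chmod(path, 0o755)


def executable_by_everyone(path: str) -> bool:
    """True if the 'others' execute bit is set, so any account can run it."""
    return bool(os.stat(path).st_mode & stat.S_IXOTH)


if __name__ == "__main__":
    print(describe_mode(0o755))  # -rwxr-xr-x
    print(describe_mode(0o644))  # -rw-r--r--  (readable, but nobody can run it)
```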
2. The file needs to point to a valid executable (Unix only)
For .cgi files, the server knows to run it as a program, but it needs
to know HOW to run it. If it's a compiled executable, there's no
problem - it just runs it. But if it's a script using a language like
Perl, it needs to know where to find the Perl program that will run the
script. This is the function of the first line of the file. For example:
#!/usr/bin/perl
In Unix, this points to an executable file (in this case, the program
is named 'perl') that will run your script. The first two characters,
#!, are called a shebang, and this is common Unix syntax. If your script
starts with the line above, and your server doesn't have a program
called /usr/bin/perl, the whole thing will die and you'll get an error
back. For Perl scripts, the line above is typical, and most servers
have /usr/bin/perl. But in some rare instances, things are configured
differently and you need to edit this first line to point to a valid
program to run.
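Checking a shebang line by hand means reading the first line and confirming that the program it names exists and is executable. The Python sketch below automates that check; `check_shebang` is a hypothetical helper, and it only handles the simple form '#!/path/to/program [arguments]'.

```python
import os


def check_shebang(script_path: str) -> str:
    """Return the interpreter named on a script's '#!' line.

    Raises ValueError if the first line isn't a shebang, and
    FileNotFoundError if the program it names doesn't exist or can't be
    executed, which is the case where the script dies before it can
    print anything.
    """
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        raise ValueError(f"{script_path}: first line is not a #! line")
    # The interpreter is the first word after '#!'; anything after it,
    # such as the -w in '#!/usr/bin/perl -w', is an argument to it.
    words = first_line[2:].split()
    if not words:
        raise ValueError(f"{script_path}: #! line names no program")
    interpreter = words[0]
    if not (os.path.isfile(interpreter) and os.access(interpreter, os.X_OK)):
        raise FileNotFoundError(
            f"{script_path}: {interpreter} not found or not executable"
        )
    return interpreter
```

Calling `check_shebang("program.cgi")` on a typical Perl script would return '/usr/bin/perl' if that program is present, and raise an error naming the missing path otherwise.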
3. The file needs to have an executable extension (NT only)
In the NT world, Unix-style execute permissions don't exist and the
shebang line doesn't apply. (Note: Some web servers on NT are "smart"
and have been designed to use the shebang line to act like Unix.) So
you don't need to worry about using 'chmod' on servers running Windows
NT, and you can usually ignore the first line of the file. What you do
need to be concerned about, however, is the file extension.
Windows decides which program to use to open a file based on its
extension. If you want to run a perl script on NT, for example, you
need to give it an extension of .pl, and the Operating System will know
to use Perl to open and run the file.
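The lookup Windows does can be pictured as a table from extension to program. In this Python sketch the table `ASSOCIATIONS`, the Perl path in it, and the function `command_for` are hypothetical examples; on a real NT server the mappings live in the system's file-type settings or in the web server's configuration.

```python
import ntpath

# Hypothetical association table: extension -> program that runs the file.
ASSOCIATIONS = {
    ".pl": r"C:\Perl\bin\perl.exe",
    ".exe": None,  # compiled programs run directly
}


def command_for(script_path: str) -> list[str]:
    """Build the command line used to run a file, chosen by its extension."""
    ext = ntpath.splitext(script_path)[1].lower()
    if ext not in ASSOCIATIONS:
        raise ValueError(f"no program associated with '{ext}' files")
    program = ASSOCIATIONS[ext]
    return [script_path] if program is None else [program, script_path]
```

With this table, a Perl script saved as form.cgi would have no program to run it, which is why the text recommends the .pl extension on NT.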
4. The program needs to return a valid response
Any CGI program that runs needs to return a valid response to the
browser. If it encounters a problem while running and dies, though, it
may print only an error message. If you were running the program in a
normal window on NT or Unix, you would simply see the error message. In
the web world, however, the program needs to hand its response back to
the web server, which then packages it up to send back to the browser.
If the program outputs an error message instead, the web server does not
get the response it expects, and it returns a general error (usually
"500 Internal Server Error") to the browser saying there was a problem
running the program.
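One defensive pattern is to catch any error inside the program and still send a header first, so the browser shows your error text instead of the server's generic error page. The Python sketch below does this with a hypothetical `run_safely` wrapper; Perl programmers get a similar effect from the `fatalsToBrowser` option of the CGI::Carp module. Showing error details to visitors is handy while debugging, but on a live site you may prefer to log them instead.

```python
import sys
import traceback


def run_safely(main, out=None) -> None:
    """Run a CGI program's main() and always emit a valid response.

    main() is assumed to return the HTML page as a string.
    """
    out = sys.stdout if out is None else out
    try:
        page = main()
    except Exception:
        # The header still goes first, so the server accepts the output.
        out.write("Content-Type: text/plain\n\n")
        out.write("The script failed:\n\n")
        out.write(traceback.format_exc())
        return
    out.write("Content-Type: text/html\n\n")
    out.write(page)


if __name__ == "__main__":
    def main():
        return "<p>" + str(1 / 0) + "</p>"  # deliberate bug: division by zero

    run_safely(main)
```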
5. THE WHOLE SERVER-SIDE PROCESS
Now that you understand how things need to be set up, it's a good time
to step through the whole process and see exactly what happens when a
CGI script is run. Going back to our original example, here is the
sequence of events (assuming a Unix server):
1. The browser requests the URL in the ACTION attribute, and passes all the data along with the request
2. The server recognizes that .cgi means that this file should be run
3. It checks to make sure that CGI programs are allowed to run on the server
4. It checks to make sure that CGI programs are allowed to run in the /cgi-bin directory
5. It launches a sub-process to run the program in the operating system
6. The operating system opens the file and looks at the first line to see which program to use with the script
7. It runs this program and passes it the filename to run
8. The script runs, does whatever it needs to, and then returns an HTML response, using print() statements, for example
9. The whole response is passed back to the server, which then packages it up in an HTTP response, including content length, etc.
10. The server then passes the whole response back to the browser, which displays it.
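Steps 5 through 9 can be imitated in a few lines of Python, which is also a handy way to see what your script really prints. `run_cgi` is a hypothetical helper and deliberately simplified: a real server sets many more environment variables, switches to a low-privilege user, and applies time limits. It passes request details in environment variables such as REQUEST_METHOD and QUERY_STRING (names defined by the CGI specification), feeds POST data to standard input, and splits the output at the first empty line into headers and body.

```python
import os
import subprocess
import sys


def run_cgi(argv, method="GET", query="", body=b""):
    """Run a CGI program roughly the way a web server does.

    Returns (headers, page) where headers is a dict with lower-case names.
    """
    env = dict(os.environ)  # inherited so the child can find its libraries
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": str(len(body)),
    })
    # Steps 5-8: start the program, hand it the form data on standard
    # input, and collect everything it prints on standard output.
    raw = subprocess.run(argv, input=body, capture_output=True, env=env).stdout

    # Step 9: headers end at the first empty line (CRLF or bare LF endings).
    ends = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n") if sep in raw]
    if not ends:
        raise RuntimeError("invalid CGI response: no empty line after headers")
    idx, sep = min(ends)
    headers = {}
    for line in raw[:idx].decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # This sketch only accepts responses that say what they contain.
    if "content-type" not in headers:
        raise RuntimeError("invalid CGI response: no Content-type header")
    return headers, raw[idx + len(sep):]


if __name__ == "__main__":
    script = "print('Content-Type: text/html'); print(); print('<p>Hello</p>')"
    print(run_cgi([sys.executable, "-c", script]))
```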
6. POSSIBLE PROBLEMS
Of course, that whole process doesn't always go as planned, and there
are some things that can stand in the way of your program running
correctly.
1. File permissions
When the web server launches a sub-process to run the program (step 5
above), it does a trick. It changes the user ID it is running as to a
user that has few or no permissions to do anything on the web server.
This is for security purposes - so you can't write a script that
overwrites important files by accident, or deletes whole directory
trees. But this also creates a problem when your program tries to
access files to read and write. If you want your program to write to a
file, you need to make sure its permissions are set up correctly for
this user (usually a user named 'nobody') to write to it. Once again,
you need to use the 'chmod' command. A command like 'chmod 777
filename.txt' will give read, write, and execute permissions on the
file to anyone on the server machine, so even after the server changes
to the new user it will still have access to the file.
File permissions are an important thing to remember when trying to
set up someone's CGI script, and are often the reason it doesn't work.
Make sure to follow the instructions on which file permissions are
needed for which files in order to set up the script correctly.
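To check whether an account like 'nobody', which owns neither the file nor its group, can write a file, look at the 'others' write bit. The Python helpers below use hypothetical names for illustration. A data file such as filename.txt needs read and write permission but never execute, so 'chmod 666' is enough for it; the 777 mentioned above also works, but adds execute permission the file doesn't need. If the script creates the file instead of adding to an existing one, the directory itself must be writable by that user as well.

```python
import os
import stat


def others_can_write(path: str) -> bool:
    """True if the 'others' write bit is set, so a low-privilege account
    such as 'nobody' (not the owner, not in the group) can write the file."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


def open_to_web_server(path: str, mode: int = 0o666) -> None:
    """Like 'chmod 666 path': read and write for everyone, no execute."""
    os.chmod(path, mode)
```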
2. Content-type
The first thing a CGI script needs to output, assuming it's giving an
HTML response back to the user, is "Content-type: text/html" followed
by an empty line (two line breaks in a row). This is needed in any CGI
script so that the web server knows what kind of data is being sent
back to the browser and can handle it appropriately. The CGI script
could actually return any type of response it wanted to - it could be
plain text, or a PDF document, or a Microsoft Word file. But 99% of the
time, the result of any CGI script is going to be plain HTML.
If the script runs and outputs anything before the Content-type header,
such as a stray message or an error, the web server will return an
error message to the browser saying the script returned an invalid
response.
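A small helper keeps the header in the right place. `cgi_response` below is a hypothetical name used for this Python sketch; it returns the header, the empty line, and then the body, and the same mechanism works for any content type, not just HTML.

```python
import sys


def cgi_response(body: bytes, content_type: str = "text/html") -> bytes:
    """Return a CGI response: Content-Type header, empty line, then the body.

    The header has to be the first thing printed; output that comes before
    it (a debug message, a warning) makes the server reject the response.
    """
    return f"Content-Type: {content_type}\n\n".encode("ascii") + body


if __name__ == "__main__":
    # An HTML page; cgi_response(data, "text/plain") would send plain text.
    sys.stdout.buffer.write(cgi_response(b"<html><body>Hello!</body></html>\n"))
```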
7. DEBUGGING
Any time you have a script that doesn't work, you need to go through a
series of steps to figure out what is wrong. Most of the time, you'll
be setting up a script that someone else wrote, so sometimes it can be
difficult to figure out what is wrong. But if you follow a few steps,
it should be easier.
1. Make sure CGI scripts are allowed
If you aren't the Webmaster in charge of the web server, the very first
thing to check is that you are allowed to run CGI scripts. Some ISPs
don't allow users to run CGI scripts on their web sites at all. Some
others allow it only in certain directories. Others allow it anywhere,
with no restrictions. It really depends on your host, which web server
they are running, and what they allow.
2. Make sure the file is executable
See above. Make sure the file is executable!
3. Check shebang line
Check the first line of the file and see if the program that's trying
to run it actually exists. If it says #!/usr/bin/perl, you need to make
sure this program exists on your web server (on Unix, 'which perl' will
usually tell you where Perl is). If you don't know how to check, you can
ask your ISP or webmaster and they will be able to tell you if this is
correct. For Perl scripts, some people may need to change it to
/usr/local/bin/perl or /usr/bin/perl5 or possibly other locations.
4. Run it locally
Hopefully you have access to a command-prompt or terminal on your web
server. In other words, you can telnet into your Unix machine or sit
down at a DOS prompt on your NT server. If you don't have this kind of
access, it will be harder to figure out what is wrong.
Log into the machine, change to the directory where the script is
actually located, and try to run it from the command-line. It may give
you an error message if the script has a syntax error in it, for
example, which will help you to edit the file and fix it. If it runs
perfectly, you know that the problem is most likely in your web
server's configuration or possibly with security and file permissions
problems.
5. Check the error log
Web servers keep a log of all errors, which includes problems with
trying to run CGI programs. If there is a problem and your program
cannot run correctly, looking at the error log might show you an error
message like "access to filename.txt not allowed" which means your
program tried to access a file that didn't have file permissions set
correctly.
If you don't have access to your web server's error log on your ISP or
host, ask the administrators to check it for you and see if there are
errors for the script you are trying to run.
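Error logs can be long, so it helps to pull out only the lines that mention your script. On a Unix server you could do this with `grep program.cgi error_log | tail -20`; the same idea as a Python sketch is below. The function name is hypothetical, and the log's name and location vary by server (ask your host if you don't know them).

```python
import sys
from collections import deque


def recent_errors(log_path: str, script_name: str, limit: int = 20) -> list[str]:
    """Return the last `limit` lines of a log file that mention script_name."""
    matches = deque(maxlen=limit)  # older matches fall off the front
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if script_name in line:
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    # Usage: python recent_errors.py /path/to/error_log program.cgi
    if len(sys.argv) == 3:
        for line in recent_errors(sys.argv[1], sys.argv[2]):
            print(line)
```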
6. Ask your Webmaster for help
If you've checked all these things, email your webmaster and see if
they can help you out. They may have some special things setup, or they
might be able to give you some clues. Also, be sure to tell them you've
tried the above steps - they will love you! Nothing is more aggravating
than receiving a request for help from someone who has apparently done
NOTHING to try to help themselves.
7. Contact the script author for help
If all else fails, contact the script author for help. I recommend you
use this as a last resort. In all my experience with CGI scripts and
helping people install what I've written, I would say 95% of the
problems are because of web server setup problems or other issues that
I simply couldn't help them with. Be sure to tell them which steps
you've tried to resolve the issue too. I know that I am much more
willing to help someone who has gone through some debugging steps on
their own before asking me. When I get requests like "I can't get your
script to work at all - please help!" I simply delete them. I cannot
help everyone, and I can't spend time teaching someone how to solve
their problems who doesn't even help me understand what their problem is.
8. SUMMARY
Hopefully I've gone into enough detail in this tutorial to help you
out. If you want to know more, or really get into the nitty-gritty of
things, you'll need to buy a book (I always recommend O'Reilly books) or learn through trial-and-error on your web server.
An important thing to remember is that this can be a complicated subject
and you shouldn't expect easy answers. Programming and/or installing
CGI scripts is much more difficult and involved than writing simple
HTML. If you're trying to make the jump from one to the other, make
sure you've got the desire to really learn it and the knowledge to make
it happen. I've seen all too many people become frustrated while trying
to install scripts or write simple CGI scripts because they just don't
have the knowledge or experience needed for it.
Learning CGI programming and making scripts work on your site
can be a very satisfying experience. Hopefully this helped you along
your way, and you'll have much success with it! Good luck!