Intro to CGI

"What is CGI? What is Perl? Chmod? Executable? 501 Errors? What the hell is a cgi-bin?"


Matt Kruse
April 06, 2006

Much content relevant to aspiring web developers is available on the author's website at http://www.mattkruse.com.

1. WHAT IS CGI?

CGI stands for "Common Gateway Interface" - a term you don't really need to know. In short, CGI defines how a web server passes the information from a request (such as the data from an HTML form) to a program, and how that program hands its output back to the server. That's simplifying it, but you get the point. In the broader sense, however, the term 'CGI' is often used to mean "any program that runs on a web server and interacts with a web browser". You may hear someone ask, "Where can I get a CGI script to handle this form?" or say, "Use CGI to do what you need". What they are referring to is a program of some sort that runs on your web server.

2. DYNAMIC CONTENT

Okay, so a web server spends most of its time answering requests, loading the HTML page that a user is requesting, and sending it to them. Nothing too complicated. But this isn't very exciting, is it? What if I want the user to see something different every time they load a page? What if I want to ask a user for information, and save it to a database? What if I want to display information from a file that may change 5 times a day? In situations like this, loading a static (non-changing) page just isn't good enough. We need the web server to run a program, take some action, and then send a results page back to the user's web browser. The results page might be different every time the program is run.

Let's take an example...

You create a web page that has a form in it, asking for the user's name and email address. There is also a 'Submit' button. When the user presses submit, their information should be saved in a file on the web server that you can view later, and they should get a 'Thank You' screen back.

In your original HTML document, you will have a <FORM> and some <INPUT> tags. For example:

<FORM METHOD="POST" ACTION="http://www.server.com/cgi-bin/program.cgi">
Name: <INPUT NAME="name"><BR>
Email: <INPUT NAME="email"><BR>
<INPUT TYPE="SUBMIT" VALUE="Submit">
</FORM>

This is a basic form. Webmonkey's HTML Tutorial (http://www.hotwired.com/webmonkey/99/30/index4a.html?tw=html) is good if you need to learn more about Forms in HTML. If you don't understand forms, read up on them first. You can't really tackle CGI scripts without understanding forms.

The <FORM> tag has two parameters that are important for us. The METHOD parameter defines how the browser will send the information to the server, and how the web server will send it to your program. It can be either "POST" or "GET" - you will most often see "POST". In brief, GET appends the form values to the URL, while POST sends them in the body of the request; for a full explanation of the difference, you need a longer tutorial or a book. The other parameter, ACTION, is the URL of the program on the server that will process the information sent from the form and do something with it.

When the user hits 'submit', the web browser makes a connection to the server, requests the URL in the 'ACTION' parameter, and also sends all the form values that the user entered. The web server looks at the URL, realizes it is a program rather than a static file, and runs it. The program then grabs all the data sent to it, does something, and returns HTML back to the browser as the response. That's it! That's the basic process that almost all CGI scripts are going to go through.
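The "grab the data, do something, return HTML" step can be sketched in a few lines. This is only an illustration, written in Python rather than the Perl most scripts of this kind use. The function name handle_form and the sample values are made up for the example, and it assumes the form data arrives as a URL-encoded string, which is how a browser sends a simple form like the one above by default.

```python
from html import escape
from urllib.parse import parse_qs


def handle_form(body: str) -> str:
    """Turn URL-encoded form data into a 'Thank You' HTML page."""
    fields = parse_qs(body)
    # parse_qs returns a list of values per field; use the first one,
    # or a placeholder if the field was left empty.
    name = fields.get("name", ["(no name)"])[0]
    email = fields.get("email", ["(no email)"])[0]
    # Escape user input before placing it in HTML.
    return (
        "<html><body>"
        f"<h1>Thank you, {escape(name)}!</h1>"
        f"<p>We recorded {escape(email)}.</p>"
        "</body></html>"
    )


# What the browser would send for Name "Ann Lee", Email "ann@example.com".
print(handle_form("name=Ann+Lee&email=ann%40example.com"))
```

A real script would also save the values somewhere before answering; that part is left out here to keep the flow visible.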

3. CONFIGURATION ON THE SERVER SIDE

In order for all this to happen, you need to make sure your web server is set up to handle this whole thing. By default, some are and some aren't. You will probably need to check with your webmaster or ISP to see if your server is set up correctly, and if you are able to run CGI programs. Or if you are the webmaster, you need to read your server's documentation with respect to CGI. But it's important to understand why it works and why it doesn't.

When you (your browser, actually) request a URL from a server, the server needs to do some checking to find out what to do. How does the server know if the URL you are requesting is a static file it should just load and send, or if it's a program it should run and send to you? This is typically decided by two factors: Which directory the file is in, and its file extension.

First, let's look at the directory part. If you're reading this, you've no doubt heard of a 'cgi-bin' directory, and noticed that most CGI scripts need to be in this directory. Why? Well, this is a server configuration issue. The server is set up to know that any file in this directory is a program to run, and not a static file to send to the browser. Usually, you can't even put a regular HTML file in this directory, because when the server tries to load it, it will try to run it as a program rather than just send it as a file.

Are you curious where the name 'cgi-bin' came from? Well, it goes back to the original days of the NCSA web server. By default, this web server had two directories: cgi-src and cgi-bin. The first contained source code for CGI programs that could run on the server. The second contained the binaries (compiled executables) of the programs, which could be run on the server. Web servers typically don't have the cgi-src directory anymore, but the name cgi-bin has stuck around as the 'default' place to put executable CGI programs on a web server.

Now let's look at the second factor to determine whether a web server runs the file or loads it as a static file: the file extension.

The extension of a file on the server - .html, .cgi, .pl, .txt, etc - tells the server what kind of file it is and how to handle it. It knows that .html and .txt files are plain text static files that should just be sent to the browser, for example. You can add your own file extensions through the web server's configuration options, and tell it how to handle those files. The .cgi extension is one example of an extension that the web server is configured to recognize as a program it should run.

Okay, now let's take another look at the <FORM> line from our example above:

<FORM METHOD="POST" ACTION="http://www.server.com/cgi-bin/program.cgi">

When the web browser sends its request to the ACTION URL, the web server sees that it is in the cgi-bin directory, and its extension is .cgi - so it knows that this is a program that it should run. So it hands off a request to the operating system telling it to run the program, and also passes all the form data to this program. Makes perfect sense, doesn't it?
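The decision the server makes here can be sketched as a small function. This is a rough illustration in Python, not how any particular web server implements it. The names CGI_DIRECTORIES, CGI_EXTENSIONS, and should_execute are invented for the example, and treating either factor alone as sufficient is an assumption; real servers let the administrator configure this.

```python
import posixpath

# Assumed configuration for this sketch; a real server reads
# these from its own configuration files.
CGI_DIRECTORIES = ("/cgi-bin/",)
CGI_EXTENSIONS = (".cgi", ".pl")


def should_execute(url_path: str) -> bool:
    """Decide whether a requested path is a program to run."""
    in_cgi_directory = url_path.startswith(CGI_DIRECTORIES)
    extension = posixpath.splitext(url_path)[1].lower()
    return in_cgi_directory or extension in CGI_EXTENSIONS


print(should_execute("/cgi-bin/program.cgi"))
print(should_execute("/index.html"))
```

Note that under these assumed rules an .html file placed in /cgi-bin/ would also be treated as a program, which matches the behavior described earlier.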

4. RUNNING THE PROGRAM

We're now at the point where the web server has decided it should run the CGI program, and it has made the request to the operating system to execute the file. This is where a lot of problems start happening, because there are a lot of things that need to be exactly correct in order for the program to run successfully and send the output back to the web browser. Some of these potential problems are specific to UNIX, and some are specific to Windows NT (I won't go into other operating systems because these two are the most common). I'll just go down the list of things that need to be correct in order for this to work.

1. The file needs to be executable (Unix only)

In Unix, files have attributes that don't exist in the Windows NT world. One of these is the executable bit. Each file has a setting that tells the operating system whether it can be executed as a program or not, and whether it can be run by only the file owner, only the group that the file owner is in, or by everyone on the server. In order for the operating system to run the file, it needs to be marked as 'executable' by everyone. This is what the 'chmod' command does. I won't go into detail about how chmod works, but when you see an instruction that says something like 'do a chmod 755 on the program.cgi file', what it is telling you is to give the file's owner read, write, and execute permission and everyone else read and execute permission. That makes the file executable by everyone on your server, so it can be run from the web server.
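For readers who want to see what 'chmod 755' actually changes, here is a small Python sketch that applies the same permissions to a temporary file and prints them in the familiar 'ls -l' style. The helper names are made up for the example, and it assumes a Unix system; on Windows the permission bits behave differently.

```python
import os
import stat
import tempfile


def make_executable(path: str) -> None:
    """Same effect as running `chmod 755 path` in a Unix shell."""
    os.chmod(path, 0o755)


def describe_mode(path: str) -> str:
    """Return the permissions as a string such as '-rwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)


# Create a throwaway file standing in for program.cgi.
with tempfile.NamedTemporaryFile(suffix=".cgi", delete=False) as handle:
    script_path = handle.name

make_executable(script_path)
print(describe_mode(script_path))
os.remove(script_path)
```

On a Unix system this should print -rwxr-xr-x: the three 'x' characters are the executable bits for the owner, the group, and everyone else.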

2. The file needs to point to a valid executable (Unix only)

For .cgi files, the server knows to run it as a program, but it needs to know HOW to run it. If it's a compiled executable, there's no problem - it just runs it. But if it's a script using a language like Perl, it needs to know where to find the Perl program that will run the script. This is the function of the first line of the file. For example:
#!/usr/bin/perl
In Unix, this points to an executable file (in this case, the program is named 'perl') that will run your script. The first two characters - #! - are called a shebang, and it's common Unix syntax. If your script starts with the line above, and your server doesn't have a program called /usr/bin/perl, the whole thing will die and you'll get an error back. For Perl scripts, the line above is typical, and most servers have /usr/bin/perl. But in some rare instances, things are configured differently and you need to edit this first line to point to a valid program to run.
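If you'd rather check this with a program than by eye, the following Python sketch reads the first line of a script and reports whether the program it names exists and is executable. The function names are invented for this example, and it only mimics the check; the operating system does the real work when the script is launched.

```python
import os
from typing import Optional


def read_interpreter(script_path: str) -> Optional[str]:
    """Return the program named on the shebang line, or None if there is none."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    # Anything after the program path (such as -w) is an argument.
    parts = first_line[2:].split()
    return parts[0] if parts else None


def interpreter_exists(script_path: str) -> bool:
    """True if the shebang names a program that can be executed."""
    interpreter = read_interpreter(script_path)
    return interpreter is not None and os.access(interpreter, os.X_OK)
```

For a script beginning with #!/usr/bin/perl, read_interpreter returns "/usr/bin/perl", and interpreter_exists tells you whether that program is really there on the machine where you run the check.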

3. The file needs to have an executable extension (NT only)

In the NT world, execute permissions don't exist and the 'shebang' doesn't apply. (Note: Some web servers on NT are "smart" and have been designed to use the shebang line to act like Unix). So you don't need to worry about using 'chmod' on servers running Windows NT, and you can ignore the first line of the file. What you do need to be concerned about, however, is the file extension.

Windows decides which program to use to open a file based on its extension. If you want to run a perl script on NT, for example, you need to give it an extension of .pl, and the Operating System will know to use Perl to open and run the file.

4. The program needs to return a valid response

Any CGI program that runs needs to return a valid response to the browser. If it encounters a problem while running and dies, however, it may output only an error message. If you were running the program in a normal window on NT or Unix, you would simply see the error message. In the web world, however, the program needs to hand its response back to the web server, which then packages it up to send back to the browser. If the program outputs an error message instead, the web server does not get the response it expects and returns a general error (typically a 500 Internal Server Error) back to the browser saying there was a problem running the program.

5. THE WHOLE SERVER-SIDE PROCESS

Now that you understand how things need to be set up, it's a good time to step through the whole process and see exactly what happens when a CGI script is run. Going back to our original example, here is the sequence of events (assuming a Unix server):

1. The browser requests the URL in the form's ACTION parameter, and passes all the data along with the request
2. The server recognizes that .cgi means that this file should be run
3. It checks to make sure that CGI programs are allowed to run on the server
4. It checks to make sure that CGI programs are allowed to run in the /cgi-bin directory
5. It launches a sub-process to run the program in the operating system
6. The operating system opens the file and looks at the first line to see which program to use with the script
7. It runs this program and passes it the filename to run
8. The script runs, does whatever it needs to, and then returns an HTML response, using print() statements, for example
9. The whole response is passed back to the server which then packages it up in an HTTP response, including content length, etc.
10. The server then passes the whole response back to the browser, which displays it.
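Steps 5 through 9 can be imitated on your own machine. The Python sketch below plays the part of the web server: it launches a tiny stand-in script as a sub-process, passes it form data, and splits the output into headers and body. The stand-in script, the name run_cgi, and the sample data are all invented for the example. REQUEST_METHOD and CONTENT_LENGTH are environment variables that CGI uses to describe the request, and with POST the form data arrives on the script's standard input. To stay portable, the sketch names the interpreter directly instead of relying on the script's first line, so steps 6 and 7 are skipped.

```python
import os
import subprocess
import sys
import tempfile
from typing import Tuple

# A tiny stand-in CGI script (in Python, so the sketch is self-contained).
SCRIPT_SOURCE = '''\
import os, sys
length = int(os.environ.get("CONTENT_LENGTH", "0"))
data = sys.stdin.read(length)
sys.stdout.write("Content-type: text/html\\n\\n")
sys.stdout.write("<p>Received: " + data + "</p>")
'''


def run_cgi(script_path: str, form_data: str) -> Tuple[str, str]:
    """Run a script roughly the way a server would; return (headers, body)."""
    env = dict(os.environ, REQUEST_METHOD="POST",
               CONTENT_LENGTH=str(len(form_data)))
    result = subprocess.run(
        [sys.executable, script_path],   # step 5: launch a sub-process
        input=form_data,                 # POST data goes to standard input
        capture_output=True, text=True, env=env, check=True,
    )
    # Step 9: the blank line separates the headers from the HTML.
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body


with tempfile.NamedTemporaryFile("w", suffix=".cgi", delete=False) as handle:
    handle.write(SCRIPT_SOURCE)
headers, body = run_cgi(handle.name, "name=Ann&email=ann%40example.com")
print(headers)
print(body)
os.remove(handle.name)
```

A real server does more than this (it checks its configuration first and adds its own HTTP headers), but the hand-off between server and script is the same idea.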

6. POSSIBLE PROBLEMS

Of course, that whole process doesn't always go as planned, and there are some things that can stand in the way of your program running correctly.

1. File permissions

When the web server launches a sub-process to run the program (Step 5 above) it does a trick. It changes the User ID of who it is running as to a user that has very little or no permissions to do anything on the web server. This is for security purposes - so you can't write a script that overwrites important files by accident, or deletes whole directory trees. But this also creates a problem when your program tries to access files to read and write. If you want your program to write to a file, you need to make sure it has permissions set up correctly for this user (usually a user named 'nobody') to write to it. Once again, you need to use the 'chmod' command. A command like 'chmod 777 filename.txt' will give Read, Write, and Execute permissions for the file to anyone on the server machine, so even when the server changes to the new user it will still have access to the file.
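Here is a short Python sketch of the permission check described above, plus the kind of write a form-handling script might do. The helper names and the tab-separated record layout are made up for the example, and it assumes a Unix system. It only looks at the 'everyone else' write bit, which is what matters when the script runs as a user like 'nobody' who neither owns the file nor belongs to its group.

```python
import os
import stat
import tempfile


def world_writable(path: str) -> bool:
    """True if users other than the owner and group may write to the file."""
    return bool(os.stat(path).st_mode & stat.S_IWOTH)


def append_record(path: str, name: str, email: str) -> None:
    """Append one form submission to the data file as a tab-separated line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"{name}\t{email}\n")


with tempfile.NamedTemporaryFile(delete=False) as handle:
    data_path = handle.name

os.chmod(data_path, 0o644)          # only the owner may write
print(world_writable(data_path))
os.chmod(data_path, 0o777)          # same as: chmod 777 filename.txt
print(world_writable(data_path))
append_record(data_path, "Ann", "ann@example.com")
os.remove(data_path)
```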

File permissions are an important thing to remember when trying to set up someone's CGI script, and are often the cause of it not working correctly. Make sure to follow instructions on which file permissions are needed for which files in order to set up the script correctly.

2. Content-type

The first thing a CGI script needs to output, assuming it's giving an HTML response back to the user, is "Content-type: text/html" followed by two returns (creating an empty line). This is needed in any CGI script so that the web server knows what kind of data is being sent back to the browser and can handle it appropriately. The CGI script could actually return any type of response it wanted to - it could be plain text, or a PDF document, or a Microsoft Word file. But 99% of the time, the result of any CGI script is going to be plain HTML.

If the script runs and it outputs something other than the Content-type, the web server will return an error message to the browser saying the script returned an invalid response.
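The rule is simple enough to check mechanically. The Python sketch below tests whether a script's output begins with a Content-type line followed by an empty line. The function name is invented for the example, and the check follows the rule exactly as stated here; it is stricter than some web servers, which also accept certain other header lines.

```python
def has_valid_header(output: str) -> bool:
    """Check that output starts with a Content-type line and an empty line."""
    # Treat Windows-style line endings the same as Unix ones.
    normalized = output.replace("\r\n", "\n")
    header_block, separator, _body = normalized.partition("\n\n")
    if not separator:
        return False  # no empty line after the headers
    first_line = header_block.split("\n", 1)[0]
    return first_line.lower().startswith("content-type:")


print(has_valid_header("Content-type: text/html\n\n<html>Thank you!</html>"))
print(has_valid_header("syntax error at program.cgi line 3\n"))
```

The second call stands for a script that died and printed an error message first, which is the case that makes the web server report an invalid response.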

7. DEBUGGING

Any time you have a script that doesn't work, you need to go through a series of steps to figure out what is wrong. Most of the time, you'll be setting up a script that someone else wrote, so sometimes it can be difficult to figure out what is wrong. But if you follow a few steps, it should be easier.

1. Make sure CGI scripts are allowed

If you aren't the Webmaster in charge of the web server, the very first thing to check is that you are allowed to run CGI scripts. Some ISPs don't allow users to run CGI scripts on their web sites at all. Some others allow it only in certain directories. Others allow it anywhere, with no restrictions. It really depends on your host, which web server they are running, and what they allow.

2. Make sure the file is executable

See above. Make sure the file is executable!

3. Check shebang line

Check the first line of the file and see if the program that's supposed to run it actually exists. If it says #!/usr/bin/perl, you need to make sure this program exists on your web server. If you don't know how to check, you can ask your ISP or webmaster and they will be able to tell you if this is correct. For Perl scripts, some people may need to change it to /usr/local/bin/perl or /usr/bin/perl5 or possibly other locations.

4. Run it locally

Hopefully you have access to a command-prompt or terminal on your web server. In other words, you can telnet into your Unix machine or sit down at a DOS prompt on your NT server. If you don't have this kind of access, it will be harder to figure out what is wrong.

Log into the machine, change to the directory where the script is actually located, and try to run it from the command-line. It may give you an error message if the script has a syntax error in it, for example, which will help you to edit the file and fix it. If it runs perfectly, you know that the problem is most likely in your web server's configuration or possibly with security and file permissions problems.
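The same test can be automated. The Python sketch below runs a script from the command line the way you would by hand and reports its exit code and any error message. The deliberately broken stand-in script and the name try_script are invented for the example; with a real Perl script you would pass something like ["perl", "program.cgi"] instead.

```python
import os
import subprocess
import sys
import tempfile
from typing import List, Tuple

# A stand-in script with a deliberate mistake on its second line.
BROKEN_SCRIPT = 'print("Content-type: text/html\\n")\nprint(undefined_name)\n'


def try_script(command: List[str]) -> Tuple[int, str]:
    """Run a command and return its exit code and error output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


with tempfile.NamedTemporaryFile("w", suffix=".cgi", delete=False) as handle:
    handle.write(BROKEN_SCRIPT)
exit_code, errors = try_script([sys.executable, handle.name])
print("exit code:", exit_code)
print(errors)
os.remove(handle.name)
```

A non-zero exit code together with a message on the error output points at the script itself; a clean run points back at the server configuration or file permissions.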

5. Check the error log

Web servers keep a log of all errors, which includes problems with trying to run CGI programs. If there is a problem and your program cannot run correctly, looking at the error log might show you an error message like "access to filename.txt not allowed" which means your program tried to access a file that didn't have file permissions set correctly.

If you don't have access to your web server's error log on your ISP or host, ask the administrators to check it for you and see if there are errors for the script you are trying to run.

6. Ask your Webmaster for help

If you've checked all these things, email your webmaster and see if they can help you out. They may have some special things set up, or they might be able to give you some clues. Also, be sure to tell them you've tried the above steps - they will love you! Nothing is more aggravating than receiving a request for help from someone who has apparently done NOTHING to try to help themselves.

7. Contact the script author for help

If all else fails, contact the script author for help. I recommend you use this as a last resort. In all my experience with CGI scripts and helping people install what I've written, I would say 95% of the problems are because of web server setup problems or other issues that I simply couldn't help them with. Be sure to tell them which steps you've tried to resolve the issue too. I know that I am much more willing to help someone who has gone through some debugging steps on their own before asking me. When I get requests like "I can't get your script to work at all - please help!" I simply delete them. I cannot help everyone, and I can't spend time teaching someone how to solve their problems who doesn't even help me understand what their problem is.

8. SUMMARY

Hopefully I've gone into enough detail in this tutorial to help you out. If you want to know more, or really get into the nitty-gritty of things, you'll need to buy a book (I always recommend O'Reilly books) or learn through trial-and-error on your web server.

An important thing to remember is that this can be a complicated subject and you shouldn't expect easy answers. Programming and/or installing CGI scripts is much more difficult and involved than writing simple HTML. If you're trying to make the jump from one to the other, make sure you've got the desire to really learn it and the knowledge to make it happen. I've seen all too many people become frustrated while trying to install scripts or write a simple CGI script because they just don't have the knowledge or experience needed for it.

Learning CGI programming and making scripts work on your site can be a very satisfying experience. Hopefully this helped you along your way, and you'll have much success with it! Good luck!