22.9. Active Contents

Apache provides several ways to deliver active contents. Active contents are HTML pages generated from variable input data supplied by the client. A typical example is a search engine that responds to one or more search strings (possibly combined with logical operators such as AND or OR) by returning a list of pages that contain those strings.

Apache offers three ways of generating active contents:

Server Side Includes (SSI)

These are directives that are embedded in an HTML page by means of special comments. Apache interprets the content of the comments and delivers the result as part of the HTML page.

Common Gateway Interface (CGI)

These are programs located in specific directories. Apache forwards the parameters transmitted by the client to these programs and returns the programs' output to the client. This kind of programming is quite easy, especially since existing command-line programs can be adapted to accept their input from Apache and return their output to Apache.

Modules

Apache offers interfaces for executing arbitrary modules as part of request processing. Apache gives these modules access to important information, such as the request and the HTTP headers. Modules can take part in generating active contents as well as in other tasks (such as authentication). Programming such modules requires some expertise. In return, this approach offers high performance and capabilities that exceed those of SSI and CGI.

While Apache starts each CGI script as a separate process (which, if suEXEC is used, runs under the user ID of the script's owner), scripts for language modules such as mod_perl are handled by a persistent interpreter embedded in Apache. This avoids starting and terminating a separate process for every request, which would cause considerable overhead for process management, memory management, and so on. Instead, the script is handled by the interpreter, which runs under the user ID of the web server.

However, this approach has a catch. Compared to module-based scripts, CGI scripts are relatively tolerant of careless programming. With CGI scripts, errors such as failing to release memory or other resources have no lasting effect, because the program terminates after the request has been processed and the operating system reclaims whatever the program failed to release. With modules, the effects of such programming errors accumulate, because the interpreter is persistent. If the server is not restarted and the interpreter runs for several months, unreleased resources such as database connections can become a serious problem.
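The following Python sketch is only a loose illustration of this effect. FakeConnection, careless_handler, and careful_handler are hypothetical names, and the loop merely simulates a persistent interpreter serving many requests in one process. A leak that a terminating CGI process would shed simply keeps growing:

```python
# Hypothetical illustration: a resource leak that is harmless in a
# short-lived CGI process accumulates in a persistent interpreter.

class FakeConnection:
    """Hypothetical stand-in for a database connection."""
    open_count = 0  # number of connections currently open

    def __init__(self):
        FakeConnection.open_count += 1

    def close(self):
        FakeConnection.open_count -= 1


def careless_handler():
    """Handles one request but never closes its connection."""
    conn = FakeConnection()
    return "page content"  # bug: conn.close() is never called


def careful_handler():
    """Releases the connection even if page generation fails."""
    conn = FakeConnection()
    try:
        return "page content"
    finally:
        conn.close()


# Simulated persistent interpreter: one process serves 1000 requests.
for _ in range(1000):
    careless_handler()
print(FakeConnection.open_count)  # 1000 connections leaked

FakeConnection.open_count = 0
for _ in range(1000):
    careful_handler()
print(FakeConnection.open_count)  # 0
```

Under CGI, each process would exit after its single request, so the leak could never exceed one connection per process. The try/finally in careful_handler is one way to guarantee release in a long-running interpreter.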

22.9.1. Server Side Includes: SSI

Server-side includes are directives that are embedded in special comments and executed by Apache. The result is inserted into the output. For example, the current date can be printed with <!--#echo var="DATE_LOCAL" -->. The # immediately following the opening comment mark <!-- tells Apache that this is an SSI directive and not a plain comment.
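To make the mechanism concrete, here is a small Python sketch of how a server might expand echo directives. It is not Apache's mod_include: process_ssi and the regular expression are hypothetical, only the echo directive is handled, and the "(none)" text for unknown variables is an assumed default chosen to match what mod_include prints by default. Plain comments pass through unchanged:

```python
import re
from datetime import datetime

# Matches directives of the form <!--#echo var="NAME" -->
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def process_ssi(html, variables):
    """Replace each echo directive with the value of the named variable."""
    return ECHO_DIRECTIVE.sub(
        lambda match: variables.get(match.group(1), "(none)"), html)


page = '<p>Today is <!--#echo var="DATE_LOCAL" -->.</p><!-- plain comment -->'
# The date format here is arbitrary.
print(process_ssi(page, {"DATE_LOCAL": datetime.now().strftime("%A, %d-%b-%Y")}))
```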

SSIs can be activated in several ways. The easiest approach is to have Apache search all executable files for SSIs. Alternatively, specific file types can be designated for SSI processing. Both settings are explained in Section 22.7.2.15. “Server-Side Includes”.

22.9.2. Common Gateway Interface: CGI

CGI stands for common gateway interface. With CGI, the server does not simply deliver a static HTML page but executes a program that generates the page. This makes it possible to deliver pages that represent the result of a computation, such as the result of a database search. Because arguments are passed to the executed program, it can return an individual response page for every request.

The main advantage of CGI is its simplicity. The program merely needs to be located in a specific directory. The web server then executes it just like a command-line program and sends whatever the program writes to the standard output channel (stdout) to the client.
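A minimal CGI program might look like the following Python sketch (render_page is a hypothetical name). It assumes the file is placed in the CGI directory, for example /srv/www/cgi-bin/, and made executable. The program writes a header block, a blank line, and then the HTML body to stdout, which Apache forwards to the client:

```python
#!/usr/bin/env python3
# Minimal CGI sketch: the web server runs this like a command-line
# program and forwards everything written to stdout to the client.
import sys
from datetime import datetime


def render_page():
    """Build the response: a header line, a blank line, then the HTML body."""
    body = "<html><body><p>Generated at {}</p></body></html>".format(
        datetime.now().isoformat(timespec="seconds"))
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(render_page())
```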

22.9.3. GET and POST

Input parameters can be passed to the server with the GET or POST method. Depending on the method, the server passes the parameters to the script in different ways. With POST, the server passes the parameters to the program on the standard input channel (stdin), just as the program would receive its input when started from a console.

With GET, the server passes the parameters to the program in the environment variable QUERY_STRING. An environment variable is a variable that the system makes available to a process and the processes it starts (such as PATH, which contains the list of directories the system searches for executable commands when the user enters a command).
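Both methods can be handled in one place. The sketch below is hypothetical (read_cgi_params is not a library function) and assumes URL-encoded form data. It relies on the CGI environment variables REQUEST_METHOD and CONTENT_LENGTH, which the server sets in addition to QUERY_STRING, and uses Python's urllib.parse.parse_qs to decode the parameters:

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_cgi_params(environ=None, stdin=None):
    """Return the request parameters as a dict mapping names to value lists.

    GET:  the parameters arrive in the QUERY_STRING environment variable.
    POST: the parameters arrive on stdin; CONTENT_LENGTH gives their length.
    """
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)  # assumes ASCII, URL-encoded form data
    else:
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# Simulated GET request: the parameters come from the environment.
print(read_cgi_params({"REQUEST_METHOD": "GET",
                       "QUERY_STRING": "q=apache&q=cgi&op=AND"}))
# → {'q': ['apache', 'cgi'], 'op': ['AND']}

# Simulated POST request: the parameters come from stdin.
body = "q=perl+modules"
print(read_cgi_params({"REQUEST_METHOD": "POST",
                       "CONTENT_LENGTH": str(len(body))}, io.StringIO(body)))
# → {'q': ['perl modules']}
```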

22.9.4. Languages for CGI

In principle, CGI programs can be written in any programming language. Usually, scripting (interpreted) languages such as Perl or PHP are used. If speed is critical, compiled languages such as C or C++ may be more suitable.

In the simplest case, Apache looks for these programs in a specific directory (cgi-bin). This directory can be set in the configuration file (see Section 22.7. “Configuration”).

If necessary, additional directories can be specified, and Apache then searches them for executable programs as well. However, this is a security risk, because any user who can place files in those directories can have Apache execute programs, some of which may be malicious. If executable programs are restricted to cgi-bin, the administrator can easily see who places which scripts and programs in this directory and check them for malicious intent.
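An administrator who wants to review such a directory could use a short script like the following. It is a sketch for Unix systems: list_executables is a hypothetical name, and /srv/www/cgi-bin is assumed to be the CGI directory (the SUSE default). It lists every executable regular file together with its owner:

```python
import os
import pwd
import stat

CGI_DIR = "/srv/www/cgi-bin"  # assumed CGI directory
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def list_executables(directory):
    """Yield (file name, owner name) for each executable regular file.

    Symbolic links are not followed, so they are not reported.
    """
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        st = entry.stat(follow_symlinks=False)
        if stat.S_ISREG(st.st_mode) and st.st_mode & EXEC_BITS:
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:  # the UID has no entry in the user database
                owner = str(st.st_uid)
            yield entry.name, owner


if __name__ == "__main__" and os.path.isdir(CGI_DIR):
    for name, owner in list_executables(CGI_DIR):
        print(f"{owner:<12} {name}")
```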

22.9.5. Generating Active Contents with Modules

A variety of modules are available for Apache. The term “module” is used in two different senses. First, there are modules that can be integrated into Apache to handle specific functions, such as modules that embed programming languages. These modules are introduced below.

Second, in connection with programming languages, a module is an independent group of functions, classes, and variables that is integrated into a program to provide a certain functionality, such as the CGI modules available for most scripting languages. These modules simplify programming CGI applications by providing functions such as methods for reading request parameters and generating HTML output.

22.9.6. mod_perl

Perl is a popular, proven scripting language. There are numerous modules and libraries for Perl, including a library for extending the Apache configuration file. The home page for Perl is http://www.perl.com/. A range of libraries for Perl is available in the Comprehensive Perl Archive Network (CPAN) at http://www.cpan.org/.

22.9.6.1. Setting up mod_perl

To set up mod_perl in SUSE LINUX, install the corresponding package (see Section 22.6. “Installation”). After installation, the Apache configuration file includes the necessary entries (see /etc/apache2/mod_perl-startup.pl). Information about mod_perl is available at http://perl.apache.org/.

22.9.6.2. mod_perl versus CGI

In the simplest case, you can run an existing CGI script as a mod_perl script simply by requesting it with a different URL. The configuration file contains aliases that point to the same directory and execute the scripts it contains either via CGI or via mod_perl. All these entries already exist in the configuration file. The alias entry for CGI is:

ScriptAlias /cgi-bin/ "/srv/www/cgi-bin/"

The entries for mod_perl are:

<IfModule mod_perl.c>
# Provide two aliases to the same cgi-bin directory,
# to see the effects of the two different mod_perl modes.
# for Apache::Registry mode
ScriptAlias /perl/          "/srv/www/cgi-bin/"
# for Apache::PerlRun mode
ScriptAlias /cgi-perl/      "/srv/www/cgi-bin/"
</IfModule>

The following entries are also needed for mod_perl. These entries already exist in the configuration file.

#
# If mod_perl is activated, load configuration information
#
<IfModule mod_perl.c>
PerlRequire /usr/include/apache/modules/perl/startup.perl
PerlModule Apache::Registry

#
# set Apache::Registry Mode for /perl Alias
#
<Location /perl>
SetHandler  perl-script
PerlHandler Apache::Registry
Options ExecCGI
PerlSendHeader On
</Location>

#
# set Apache::PerlRun Mode for /cgi-perl Alias
#
<Location /cgi-perl>
SetHandler  perl-script
PerlHandler Apache::PerlRun
Options ExecCGI
PerlSendHeader On
</Location>

</IfModule>

These entries create aliases for the Apache::Registry and Apache::PerlRun modes. The difference between these two modes is as follows:

Apache::Registry

All scripts are compiled once and kept in a cache. Each script is wrapped in a subroutine. Although this is good for performance, it has a drawback: the scripts must be programmed very carefully, because variables and subroutines persist between requests. You must therefore reset variables before they are used in the next request. If, for example, an online banking script stores a customer's credit card number in a variable, this number could reappear when the next customer requests the same script.

Apache::PerlRun

The scripts are recompiled for every request. Variables and subroutines disappear from the namespace between requests (the namespace is the set of all variable and subroutine names defined at a given time while a script runs). Therefore, Apache::PerlRun does not require painstaking programming, because all variables are reinitialized when the script starts and no values are kept from previous requests. As a result, Apache::PerlRun is slower than Apache::Registry but still much faster than CGI (despite some similarities to CGI), because no separate interpreter process is started for each request.
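The difference between the two modes can be imitated loosely in Python. This is an analogy, not how mod_perl works internally; SCRIPT, run_registry_style, and run_perlrun_style are hypothetical. The first function compiles the script once and reuses one namespace, as Apache::Registry keeps compiled scripts and their variables. The second recompiles the script into a fresh namespace for every request, as Apache::PerlRun does. The careless script only sets card_number when the request supplies one, reproducing the credit card example:

```python
# A loose Python analogy (not mod_perl internals) for the two modes.
# The hypothetical script only sets card_number if the request has one.
SCRIPT = """
if request.get("card"):
    card_number = request["card"]
response = "card on file: " + globals().get("card_number", "none")
"""

COMPILED = compile(SCRIPT, "<script>", "exec")  # compiled once and cached


def run_registry_style(namespace, request):
    """Like Apache::Registry: reuse the cached code and one namespace."""
    namespace["request"] = request
    exec(COMPILED, namespace)
    return namespace["response"]


def run_perlrun_style(request):
    """Like Apache::PerlRun: recompile into a fresh namespace per request."""
    namespace = {"request": request}
    exec(compile(SCRIPT, "<script>", "exec"), namespace)
    return namespace["response"]


shared = {}
print(run_registry_style(shared, {"card": "1234"}))  # card on file: 1234
print(run_registry_style(shared, {}))                # card on file: 1234 (leaked)
print(run_perlrun_style({"card": "1234"}))           # card on file: 1234
print(run_perlrun_style({}))                         # card on file: none
```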

22.9.7. mod_php4

PHP is a programming language that was developed especially for use with web servers. In contrast to other languages, whose commands are stored in separate files (scripts), PHP commands are embedded in an HTML page (similar to SSI). The PHP interpreter processes the PHP commands and embeds the result in the HTML page.

The home page for PHP is http://www.php.net/. For PHP to work, install mod_php4-core and, in addition, apache2-mod_php4 for Apache 2.

22.9.8. mod_python

Python is an object-oriented programming language with a very clear, legible syntax. An unusual but convenient feature is that the program structure depends on indentation: blocks are not delimited by braces (as in C and Perl) or other markers (such as begin and end), but by their level of indentation. The package to install is apache2-mod_python.
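For example, in the following Python function, the bodies of the for loop and of the if/else branches are defined solely by their indentation (classify is just an illustrative name):

```python
def classify(numbers):
    """Split numbers into even and odd values."""
    evens, odds = [], []
    for n in numbers:          # the loop body is everything indented below
        if n % 2 == 0:         # no braces, no begin/end
            evens.append(n)
        else:
            odds.append(n)
    return evens, odds         # dedented: back at function level


print(classify([1, 2, 3, 4]))  # ([2, 4], [1, 3])
```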

More information about this language is available at http://www.python.org/. For more information about mod_python, visit the URL http://www.modpython.org/.

22.9.9. mod_ruby

Ruby is a relatively new, object-oriented high-level programming language that resembles certain aspects of Perl and Python and is well suited for scripts. Like Python, it has a clean, transparent syntax. On the other hand, Ruby has adopted abbreviations from Perl, such as $. for the number of the last line read from the input file, a feature that some programmers welcome and others abhor. The basic concept of Ruby closely resembles Smalltalk.

The home page of Ruby is http://www.ruby-lang.org/. An Apache module is available for Ruby. The home page is http://www.modruby.net/.