
KPLUG Apache Tutorial




Introduction

Web pages. It's old news that the internet has made information available on a scale never before seen in human history. It's old news that the most popular way to provide this information is HTML. It's even old news that Apache serves more than 60% of all web pages, making it more popular than all other web servers combined. So why is there a mystery about configuring Apache to serve this information? Well.... simply put, it's not mysterious. It's actually fairly easy, and information on how to do it is scattered all over typical Linux distro CDs and the web. This tutorial is an attempt to consolidate some of that information and offer a bit of explanation about the hows and whys. So let's get started.


Roll your own vs. Instant Gratification

Most distributions of Linux include pre-compiled versions of Apache, and many include the source code as well. Usually the easiest thing to do is just install Apache from your CDs. But if you want the latest version of Apache or insist on pristine sources, you're going to want to download it. Everything Apache can be found at http://www.apache.org, including source code, documentation, helper applications, more information about the Apache API than you can take in one sitting (for the code-head), and of course, binaries for Linux. If you're just installing a pre-compiled binary (either from apache.org, your favorite archive, or your distro CD) you can skip ahead to the basic configuration section. For those of you following at home, I'll be working with version 1.3.20. If you're reading this tutorial sometime in 2002, much of this may no longer apply, as version 2 is rapidly getting stable enough for a production environment, so don't complain to KPLUG.

Once you have the source code and have unpacked it (I won't go into that here.... if you don't know how to download and install a source rpm or a tarball, you probably shouldn't be trying to do it this way anyway), it will be necessary to configure and compile it. Do not mistake this configuration process for the apache run-time configuration process. They're apples and oranges. Configuring the compile-time options determines such defaults as where apache will look for its configuration files, where it looks for web pages to serve, and several other important features.

Since you're one of the adventurous souls compiling your own apache, you get to make some decisions. Do you want to follow RedHat's structure, or maybe SuSE's, or maybe you want to stay "True to GNU" and follow the guidelines laid out in 'make-stds', or maybe you're one of the FHS puritans and demand that non-boot-essential config files not reside in /etc, but rather in /etc/opt/<package>. The choice is yours. Many of these important choices are gathered into one location for you, the layout file config.layout. Edit this file by either altering one of the existing entries, or creating a new one that matches your desired configuration.

Basic config.layout options
Option         Meaning
prefix         The basic top level directory of most things apache. This option is almost purely at your discretion.
exec_prefix    The prefix for binary files. Typically /usr
bindir         Where apache places binary files. Typically $exec_prefix/bin
sbindir        Where apache places system binaries. Typically $exec_prefix/sbin
libexecdir     Where apache looks for what can best be described as apache-specific helper files. These include such things as dynamic modules, which we'll discuss later. Typically /usr/lib/apache
mandir         Where apache will install the manpage(s). Typically /usr/man
sysconfdir     Where apache looks by default for the runtime config files. Typically /etc/httpd/conf
datadir        The top level of the directory tree containing the data to be served. Typically the same as $prefix
iconsdir       Where apache looks for the icons representing various MIME types in automatically generated (FTP-style) directory listings. Typically $datadir/icons
htdocsdir      The "document root", where your main "index.html" lives. Typically $datadir/htdocs
cgidir         The default location for CGI executables. Typically $datadir/cgi-bin
includedir     Location of include files for compiling apache add-ons. Typically $prefix/include/apache
localstatedir  Where apache stores state files. Typically /var
runtimedir     Where apache stores runtime state files. Typically $localstatedir/run
logfiledir     Default location of apache log files. Typically $localstatedir/log/apache
proxycachedir  Where apache stores cached files if you've included the proxy module in the configuration. Typically $localstatedir/cache/apache

Now when I say typically, this does not necessarily mean that's where you'll find these files on your system. RedHat, SuSE, GNU, the FHS standard, Debian, Storm, and Mandrake all have different things to say about it. These decisions are yours to make, and you should make them based on your analysis of current and future requirements, disk space available, backup criticality, corporate policy, and your own personal preferences. The only recommendation that I will strongly make is that you make every effort to define your paths relative to the $prefix, $datadir, and $localstatedir variables rather than as completely static paths. This makes changing the config much easier, and when we start talking about security, that can become important.
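To see why relative paths pay off, here is a minimal Python sketch of how $name references in a layout resolve. It is not Apache's own config.layout parser, and the path values are hypothetical; the variable names follow the stock Apache 1.3 config.layout entries (exec_prefix, htdocsdir, and so on).

```python
import re

# Hypothetical layout written the way the tutorial recommends: nearly every
# path is defined relative to $prefix, $datadir or $localstatedir.
LAYOUT = {
    "prefix": "/usr/local/apache",
    "exec_prefix": "$prefix",
    "bindir": "$exec_prefix/bin",
    "sbindir": "$exec_prefix/sbin",
    "sysconfdir": "$prefix/conf",
    "datadir": "$prefix",
    "iconsdir": "$datadir/icons",
    "htdocsdir": "$datadir/htdocs",
    "cgidir": "$datadir/cgi-bin",
    "localstatedir": "/var",
    "runtimedir": "$localstatedir/run",
    "logfiledir": "$localstatedir/log/apache",
}

_VAR = re.compile(r"\$([a-z_]+)")


def expand(layout):
    """Return a copy of layout with every $name reference resolved."""
    resolved = {}

    def resolve(key, seen=()):
        if key in resolved:
            return resolved[key]
        if key in seen:
            raise ValueError(f"circular reference through {key!r}")
        value = _VAR.sub(lambda m: resolve(m.group(1), seen + (key,)), layout[key])
        resolved[key] = value
        return value

    for key in layout:
        resolve(key)
    return resolved


if __name__ == "__main__":
    print(expand(LAYOUT)["htdocsdir"])                           # /usr/local/apache/htdocs
    # Relocating the whole tree only takes a change to one line:
    print(expand(dict(LAYOUT, prefix="/opt/apache"))["cgidir"])  # /opt/apache/cgi-bin
```

Only prefix changed in the second call, yet bindir, htdocsdir, cgidir and friends all follow it, while the $localstatedir-based paths stay put.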

Now that you've decided your file layout structure, you need to consider what capabilities you want your web server to have. Several things that we take for granted about web servers may not be default behaviour. In general, the apache team included the most useful modules in the source distribution's default configuration, but you should probably take a good look at src/Configuration.tmpl. Most modules can either be included statically in the binary or can be loaded dynamically by the server as needed (DSO -- Dynamic Shared Object).

Both methods have their pros and cons, and in general the normal guidelines for static vs. dynamic apply. The static method is the easiest, and makes for faster servers. The downside is that your webserver can suffer "Microsoft Syndrome" and begin to take on swiss army knife features at the expense of memory efficiency and executable size. Using dynamic shared objects makes your overall executable size smaller, meaning fewer resources are required to handle multiple instances (apache uses the fork-ahead, or pre-forking, server model for those C coders keeping score at home) and children spawn faster. The downsides are that there is a measurable latency to loading/linking the module into apache on the fly, and DSOs don't execute quite as fast as static modules. Since benchmarking these tradeoffs is highly dependent on traffic patterns, and patterns tend to change over time, it's a real tough call at design time. In general, just make your best guess and forget about it.

Short list of my favorite modules
Mod Name Description
actions Executing CGI scripts based on media type or request method
autoindex Automatic directory listings
cgi Invoking CGI scripts
env Altering the environment passed to CGI and SSI pages
imap Improved support for server side image maps
include Support for Server Side Includes
log_config Configurable logging support.
mime_magic Support for media types based on file contents (type).
mime Support for media types based on file name extensions (common, but braindead).
negotiation Support for content negotiation.
proxy Provides for HTTP 1.0 caching proxy support.
rewrite URI to filename rewriting on the fly.
setenvif Allows setting env variables based on request attributes. This is useful to deal with buggy browsers, or to deny cool features to MSIE users just for the fun of it.
so Supports loading shared modules at runtime.
status Provides information on server status and performance.
userdir Supports user-specific directories (member home pages).
vhost_alias Dynamically configured mass virtual hosting.

Once you have decided your layout, and made your decisions about modules, you're ready to configure the source code for compilation. This important step sets up the makefiles to be compatible with Linux and also sets up the proper linking options for your modules. Go to the root of the apache source tree, and enter the command

./configure --with-layout=MyLayoutName \
            --enable-module=module_name \
            --enable-module=module_name2 \
            --enable-shared=shared_module_name \
            --enable-shared=shared_module_name2 \
            --disable-module=unwanted_module_name

Since you may have to configure the source tree several times to get exactly the features you want, I recommend that you create a small shell script for your configure command. It will save a lot of typing in the long run. Once your source tree has been configured, all you need to do is build and install the program.
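Any scripting language will do for that wrapper; here's one way it might look in Python. The layout name and module lists are placeholders to swap for your own, and the --run switch belongs to this hypothetical script, not to configure. The script only builds and prints the ./configure command (with no stray spaces after the "=" signs) and runs it only when asked.

```python
import shlex
import subprocess
import sys

# Placeholders: substitute your own layout name and module choices.
LAYOUT_NAME = "MyLayoutName"
STATIC_MODULES = ["rewrite", "status"]
SHARED_MODULES = ["vhost_alias", "proxy"]
DISABLED_MODULES = ["imap"]


def configure_args(layout, static, shared, disabled):
    """Build the ./configure argument list; note there is no space after '='."""
    args = ["./configure", f"--with-layout={layout}"]
    args += [f"--enable-module={m}" for m in static]
    args += [f"--enable-shared={m}" for m in shared]
    args += [f"--disable-module={m}" for m in disabled]
    return args


if __name__ == "__main__":
    argv = configure_args(LAYOUT_NAME, STATIC_MODULES, SHARED_MODULES, DISABLED_MODULES)
    print(" ".join(shlex.quote(a) for a in argv))
    # Run from the root of the apache source tree, e.g. `python myconfig.py --run`.
    if "--run" in sys.argv[1:]:
        sys.exit(subprocess.call(argv))
```

Keeping the module lists as data means each reconfiguration is a one-line edit rather than a retyped command.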

Building apache once the source is configured is a snap. (Note that, as with most installations, you will probably need root permissions to install apache, though not to build it.) Just enter the command

   make
and you're on your way. If your linux box has a proper development environment set up (and it should, or you probably would have already skipped ahead to the configuration section) everything should go smoothly. Once the build has completed, installing apache is just a matter of typing
   make install

We now need a way to start and stop apache on our system. Most distros have a fairly good SysV init template to copy somewhere in the /etc/rc directories, but apache provides a program called apachectl to start and stop the server if you want to use it. Now.... just because you've compiled and installed apache doesn't mean it's ready to run. You still need to configure the runtime environment. The fun is just starting.


Basic Configuration

Generally, for the home hobbyist, there is no need to do any editing at all to apache's configuration file, but this wouldn't be much of a tutorial if I just ignored the whole issue. You can locate the config file by remembering where you put it (if you built your own) or usually by looking for the file /etc/httpd/conf/httpd.conf. The config file is broken up into three sections: the Global section, the Main (or default server) section, and the Virtual Hosts section. In older versions of Apache, two additional files, srm.conf and access.conf, controlled resources and access rights. These files are still kept around, but are now deprecated.

Section 1: Global Section

This section controls behaviour that is global to all instances of apache running on your system. The example configuration file contains excellent documentation for each of the options. Below is a table containing some general guidance for use when modifying the options.

Global Section Directives
Directive Hints
ServerRoot If you configured sysconfdir to be /etc/httpd/conf then make this "/etc/httpd".
LockFile The lock file apache uses to serialize accept() calls among its child processes on platforms that need it.
If the path does not start with a leading /, apache will assume the path is relative to the ServerRoot defined above.
(RedHat /var/lock/httpd.lock)
PidFile This file is where apache stores the process id of the server.
If the path does not start with a leading "/", apache will assume the path is relative to the ServerRoot defined above.
(RedHat /var/run/httpd.pid)
ScoreBoardFile This file stores internal server information, but is not needed on most Linux configurations. Just to be safe, create a place for it.
(RedHat /var/run/httpd.scoreboard)
TimeOut This is the number of seconds before net traffic times out. The default is 300, which is 5 minutes. It can be set much lower, but values below 30 tend to cause problems.
KeepAlive Allows persistent connections. Unless you have a good reason not to want them, set this to "on".
MaxKeepAliveRequests This determines the maximum number of requests allowed on a persistent connection before it closes. 100 is a reasonable number.
KeepAliveTimeout Determines how long a KeepAlive channel will remain open if idle. 15 is a good number.
MinSpareServers Sets the desired minimum number of idle servers awaiting requests. If there are ever fewer than this many idle child processes, apache will spawn more until this number is reached. Too many wastes resources; too few, and spikes in server hits could degrade performance. 2 is a good number for home or SOHO use, 3 - 5 for a business or small university.
MaxSpareServers Sets the maximum desired number of idle servers. If there are more idle servers than desired, apache will begin to kill off children, reclaiming their resources. 10 is the default, while for the hobbyist or SOHO user, a value of 5 can be used to save resources.
StartServers The number of children to spawn at startup. The default is 5. Busy sites should set this higher, but not too high or you'll spend your first minute and a half spawning children and not serving requests. Apache will dynamically adjust the number of processes later, so setting this value very high is almost never useful.
MaxClients This sets a ceiling on the number of child processes that can be spawned. It can be set up to 256 without modifying source code.
MaxRequestsPerChild This sets the maximum number of requests that a child process will handle before dying. It is mainly useful on IRIX and SunOS, where there are noticeable memory leaks in the libraries. A value of 0 will allow unlimited requests per child, and is claimed to be safe on Linux. I recommend a value of 1000, or 10000 for heavily loaded sites.
Listen Determines the address and port number that apache will bind. This can be used to limit apache to a specific address. For instance, you can use Listen 127.0.0.1:80 to cause apache to respond only to requests from the localhost. The usual value is 80, which tells apache to listen on the HTTP port of all interfaces. Multiple Listen directives can be used.
BindAddress Determines which IP addresses apache will respond to. This is used on machines with multiple IP addresses (either through IP aliasing or multiple interfaces). The normal value is *, which causes apache to listen on all addresses.
ExtendedStatus This is only useful if you have loaded mod_status, and tells apache to keep track of extended information on a per request basis. It cannot be used on a virtualhost by virtualhost basis. Set this value to "on" if you've decided to compile mod_status as a built-in module (recommended).
ClearModuleList Apache has a list of modules that should be active. This directive clears that list. It is assumed that you will then turn on what you want using the AddModule directive.
AddModule Modules are sort of complicated. When you compile apache, it gets a list of included modules, not all of which are "turned on". This directive is used to activate a built-in module. It can be used even if you haven't used the ClearModuleList directive.
LoadModule This directive is used to load a dynamically loaded module (as opposed to a built-in module). Load order can be important, so pay close attention to the example configuration and to the documentation for any additional modules you load.
<IfDefine></IfDefine> This is used to conditionally execute directives based on whether or not a specific value is defined, usually by means of a command line switch (-D foo). One use for this is for a startup script to check for the existence of a module, and load/configure it if it exists (RedHat's startup script does this, for example).
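The interplay of MinSpareServers, MaxSpareServers, and MaxClients is easier to see in a toy model. This Python sketch is a simplification, not Apache's code: it assumes the whole shortfall of idle children is made up in a single pass, whereas the real server spreads spawning out over several seconds, and it trims surplus idle children one per pass. The default values are the home/SOHO suggestions from the table above.

```python
from dataclasses import dataclass


@dataclass
class PoolConfig:
    min_spare: int = 2       # MinSpareServers (home/SOHO suggestion)
    max_spare: int = 5       # MaxSpareServers (hobbyist/SOHO suggestion)
    max_clients: int = 256   # MaxClients (the compile-time ceiling)


def maintenance_step(busy, idle, cfg):
    """Return the idle-child count after one simplified maintenance pass."""
    if idle < cfg.min_spare:
        # Spawn up to MinSpareServers idle children, never exceeding MaxClients.
        room = max(cfg.max_clients - (busy + idle), 0)
        return idle + min(cfg.min_spare - idle, room)
    if idle > cfg.max_spare:
        # Reclaim resources by killing off one surplus idle child per pass.
        return idle - 1
    return idle


def settle(busy, idle, cfg, passes=20):
    """Run several passes with a fixed number of busy children."""
    for _ in range(passes):
        idle = maintenance_step(busy, idle, cfg)
    return idle


if __name__ == "__main__":
    cfg = PoolConfig()
    print(maintenance_step(busy=4, idle=1, cfg=cfg))  # 2: a spike triggered a spawn
    print(settle(busy=0, idle=10, cfg=cfg))           # 5: surplus children trimmed
```

Raising min_spare buys headroom for spikes at the cost of idle memory; lowering max_spare reclaims that memory sooner once things calm down.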


Section 2: Main (Default Server) Section

Section 2 of the configuration file deals with the default server. The default server (or main server) is the one that will handle any requests not captured by a <VirtualHost> stanza in your configuration. Directives and instructions that you set in this section are, in general, inherited by virtualhosts as well, so you can set some good default behaviours here rather than duplicating a lot of effort. Settings inside <VirtualHost> stanzas will override these options for that particular virtualhost only.
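A rough way to picture this inheritance is Python's ChainMap: a lookup checks the virtual host's own settings first and falls back to the main server. The directive values below are hypothetical, and the model is only approximate, since some directives merge rather than override and a few (such as Port, below) don't carry over to virtual hosts at all.

```python
from collections import ChainMap

# Main (default) server section; hypothetical values.
main_server = {
    "ServerAdmin": "webmaster@example.com",
    "HostNameLookups": "off",
    "DirectoryIndex": "index.html",
    "ErrorLog": "logs/error_log",
}

# Each <VirtualHost> lists only what it changes; lookups fall back to main_server.
foo_org = ChainMap({"ServerAdmin": "webmaster@foo.org",
                    "ErrorLog": "logs/foo.org_error"}, main_server)
paranoid = ChainMap({"HostNameLookups": "double"}, main_server)

if __name__ == "__main__":
    print(foo_org["DirectoryIndex"])   # index.html (inherited)
    print(foo_org["ErrorLog"])         # logs/foo.org_error (overridden)
    print(paranoid["ServerAdmin"])     # webmaster@example.com (inherited)
```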

Common Directives
Directive Notes
Port Here for historical reasons, and for setting the SERVER_PORT environment variable for CGI and SSI. Set this to whatever your HTTP port will be (usually 80). Note: This does NOT apply to virtualhosts.
User Sets the user that apache will handle requests as. For security reasons, apache changes its effective UID before handling requests, so all of your documents must be accessible to this user. For this reason, it is useful to create a user called www or apache to use with your webserver. Running as the user nobody or as UID -1 does not work on all systems or with all libraries.
Group Just as apache changes its UID, it also changes its GID. This is the group to change to. Once again, using nobody can cause some difficult-to-track-down problems, so it's probably a good idea to create a dedicated group.
ServerAdmin Set this to the e-mail address that should receive all error notifications.
ServerName Set this to the fully qualified domain name of the server. Also used when setting up name-based virtual hosts. If you don't set this, you will likely encounter problems on startup.
DocumentRoot Set this to the directory to search for the main index file for this server. Apache will search for a file that matches your DirectoryIndex in this directory to display when no other page is requested (as when you request http://www.example.com)
UserDir When using the mod_userdir module, this allows you to map requests to users' home directories instead of to the document root tree. Set this to "www" to map requests for http://example.org/~foo to ~foo/www on the example.org server, for example. For security reasons, if you use this, also use UserDir disabled root.
DirectoryIndex Used with mod_dir, this option sets the list of files to look for when a user requests a directory, either by ending a URL with a "/" or by requesting the document root. Normally this will just be "index.html", but you could specify
DirectoryIndex index.html index.php index.pl index.cgi
to have apache search for each of these files in order, returning the first one it finds.
HostNameLookups Generally set to "off" to save the latency of the DNS lookup, but you can set this to either "on" or "double". "On" is useful to pass the hostname as REMOTE_HOST to CGI/SSI's, and "double" is the ultra-paranoid setting to detect spoofed hostnames (apache does a reverse lookup, then checks that the resulting name maps back to the same address). On heavily loaded sites this can cause some real slowdown, and most people don't need it.
ErrorLog Sets the name of the file to use for error logging. As of version 1.3, you can also direct errors to the syslog facility.
LogLevel Sets the level of information that apache will send to the error log. Defaults to "error". Possible options are "emerg", "alert", "crit", "error", "warn", "notice", "info", and "debug". These options follow the general content guidelines for syslog(3).
LogFormat When using mod_log_config (recommended), this directive allows you to customize the format of the log file. The options are many and various. Read the documentation. The most commonly used are
LogFormat "%h %l %u %t \"%r\" %>s %b" for the main host, and
LogFormat "%v %h %l %u %t \"%r\" %>s %b" for virtual hosts (%v adds the name of the virtual host).
Alias Allows for transparent redirection of requests. Typically used for icon, library image, and cgi directory redirection on a wholesale basis. Aliases are processed after <Location> stanzas and before <Directory> stanzas.
ScriptAlias Has the same result as Alias, but also marks the directory as containing cgi scripts, so apache will process them as such.
AddHandler If using mod_mime (recommended) this directive maps file extensions to handlers. An example of this is using
AddHandler cgi-script .cgi
to cause any file with the extension .cgi to be treated as a cgi file. This overrides any previous mappings.
AddType If using mod_mime (recommended) this directive maps file extensions to MIME types. One particularly forward looking use for this directive is mapping the ".xhtml" extension to text/html. An example of this is using
AddType text/html .xhtml
to cause any file with the extension .xhtml to be treated as html by the client. Converting your html to xhtml will generally only have small impacts on presentation, which can almost always be mediated with proper adjustments to CSS. While it isn't fully desirable to treat xhtml as html, no major browser is fully XHTML aware as of yet, so waddayagonnadoo?
ErrorDocument Allows you to set custom pages or scripts to handle HTTP exceptions and errors. This lets you get away from the canned error messages and allows for a more friendly and effective way to handle things like broken links and access denial.
Example: ErrorDocument 404 /errordocs/404.cgi would invoke a custom error script when a file is not found on the server (bad typing or broken/obsolete link).
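Here's a Python sketch of how DocumentRoot, UserDir, and DirectoryIndex combine to turn a URL into a file name. It is a simplified model, not Apache's code: the paths are hypothetical, it assumes home directories live under /home (Apache actually looks them up in the password file), and it ignores Alias, MultiViews, redirects, and permissions. The exists() test is passed in so the sketch never touches the disk.

```python
import posixpath

# Hypothetical settings mirroring the directives in the table above.
DOCUMENT_ROOT = "/www/docs/example.com"   # DocumentRoot
USER_DIR = "www"                           # UserDir www
DISABLED_USERS = {"root"}                  # UserDir disabled root
DIRECTORY_INDEX = ["index.html", "index.php", "index.pl", "index.cgi"]
HOME_BASE = "/home"                        # assumption: home directories are /home/<user>


def map_url(path, exists):
    """Map a URL path to a file name, or return None for "not found"."""
    if path.startswith("/~"):
        user, _, rest = path[2:].partition("/")
        if not user or user in DISABLED_USERS:
            return None
        base = posixpath.join(HOME_BASE, user, USER_DIR)
        rest = "/" + rest
    else:
        base, rest = DOCUMENT_ROOT, path
    target = posixpath.normpath(base + rest)
    # Refuse anything that ".." has walked out of the base directory.
    if target != base and not target.startswith(base + "/"):
        return None
    if path.endswith("/"):
        # A directory was requested: serve the first DirectoryIndex file found.
        for name in DIRECTORY_INDEX:
            candidate = posixpath.join(target, name)
            if exists(candidate):
                return candidate
        return None  # a real server might fall back to a mod_autoindex listing
    return target if exists(target) else None


if __name__ == "__main__":
    files = {"/www/docs/example.com/index.php", "/home/foo/www/pics/index.html"}
    print(map_url("/", files.__contains__))            # /www/docs/example.com/index.php
    print(map_url("/~foo/pics/", files.__contains__))  # /home/foo/www/pics/index.html
```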


Section 3: Virtual Servers

Virtual servers are a way for a single invocation of apache to serve multiple domain names. There are three ways to go about it: name-based, port-based, and address-based. Port-based is commonly used to serve HTTP and HTTPS from the same server. Address-based virtual hosting is used primarily for backward compatibility with older HTTP 1.0 clients, which don't transmit the desired hostname as part of the request. The most commonly used method of virtual hosting is name-based, where multiple domain names share the same IP address (for example, through CNAME aliases). It is commonly used by web hosting services to conserve IP space, and by SOHOs who wish to serve something like www.my_business.com and www.my_personal_page.net from the same server. One caveat is that name-based virtual hosting cannot be used with SSL secure servers, because the SSL handshake (including the choice of certificate) happens before apache ever sees the Host header that names the requested site.

The third section of the apache configuration file deals with virtual servers. Virtual servers are defined in a <VirtualHost> stanza. Stanzas are almost like HTML tags.... they start with a <keyword> in angle brackets, and end with </keyword>. Other common examples of stanzas are <Location>, <Directory>, and <IfDefine>. Directives inside stanzas only apply within the scope defined by that stanza. For instance, if you added
<Directory /home/foouser/public_html>
   Order Deny,Allow
   Deny from joe.example.com
   Allow from All
</Directory>

then requests from the host joe.example.com would be refused for files located under /home/foouser/public_html, but that host's access would remain unaffected for all other areas of your server. (Deny and Allow match host names and addresses, not user names; restricting individual users takes authentication.) Sorry about the short digression.... stanzas are important, and I'm running short of free time before the KPLUG meeting to integrate them better into the tutorial.

Let's give an example of setting up a name based virtual host. We will assume that www.example.com and www.foo.org point to the same IP address. In your httpd.conf file you would add the following:


   NameVirtualHost *
   <VirtualHost *>
      ServerAdmin webmaster@example.com
      DocumentRoot /www/docs/example.com
      ServerName www.example.com
      ErrorLog logs/example.com_error
   </VirtualHost>

   <VirtualHost *>
      ServerAdmin webmaster@foo.org
      DocumentRoot /www/docs/foo.org
      ServerName www.foo.org
      ErrorLog logs/foo.org_error
   </VirtualHost>

This is about all you need to get started. Of course, you may want to enable or disable certain features for each virtual host, like disabling cgi or enabling paranoid DNS lookups for logging purposes. Simply place the appropriate directives in that virtual host's stanza, and you're done.
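Under the hood, name-based hosting boils down to matching the Host header the browser sends against each virtual host's names. A minimal Python sketch, assuming a hypothetical host list in which example.com and www.example.com are both names of the first host (as they would be with a ServerAlias directive); when the header is missing or matches nothing, the first virtual host listed handles the request.

```python
# Hypothetical vhosts in httpd.conf order; "names" stands for ServerName
# plus any ServerAlias entries.
VHOSTS = [
    {"names": {"www.example.com", "example.com"}, "docroot": "/www/docs/example.com"},
    {"names": {"www.foo.org", "foo.org"}, "docroot": "/www/docs/foo.org"},
]


def pick_vhost(host_header, vhosts=VHOSTS):
    """Choose a name-based virtual host from the HTTP Host header."""
    if host_header:
        # Host may carry a port ("www.foo.org:80") and is case-insensitive.
        name = host_header.split(":", 1)[0].strip().lower().rstrip(".")
        for vhost in vhosts:
            if name in vhost["names"]:
                return vhost
    # No Host header (an old HTTP 1.0 client) or no match: first vhost wins.
    return vhosts[0]


if __name__ == "__main__":
    print(pick_vhost("WWW.FOO.ORG:80")["docroot"])  # /www/docs/foo.org
    print(pick_vhost(None)["docroot"])              # /www/docs/example.com
```

That fallback is why the first <VirtualHost> stanza deserves some thought: it catches every request that doesn't name one of your sites.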

But what if you want to host hundreds of virtual hosts? Your httpd.conf would grow quite huge, be slow to load, and consume a lot of resources. The answer is the dynamically configured mass virtual hosting provided by mod_vhost_alias. If you enable this module, either as a dynamic module or built-in, you can use something like this:


# Turn off Canonical Names so CGI/SSI works properly
UseCanonicalName off

# Set the logging format for all virtual hosts
LogFormat "%V %h %l %u %t \"%r\" %s %b" vcommon
CustomLog logs/access_log vcommon

# Dynamically include server names in file requests
VirtualDocumentRoot /www/vhosts/%0/htdocs
VirtualScriptAlias  /www/vhosts/%0/cgi-bin
With this setup, a request to http://www.virtualhost.com/foo/bar.html would map to a request for the file /www/vhosts/www.virtualhost.com/htdocs/foo/bar.html. You can still use <Directory> and other stanzas to control things on a directory by directory basis.
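The %0 in those paths is an interpolation specifier. This Python sketch mimics only a subset of what mod_vhost_alias understands: %%, %0 (the whole server name), and %N / %-N (the Nth dot-separated part counting from the left / right). The %N+ and %N.M forms are left out, and the underscore for a part that doesn't exist follows the behaviour described in the mod_vhost_alias documentation.

```python
import re

# %% is a literal percent sign; %0 is the whole name; %N / %-N pick the Nth
# dot-separated part of the server name counting from the left / right.
_SPEC = re.compile(r"%(%|0|-?[1-9][0-9]*)")


def interpolate(template, server_name):
    """Expand a subset of mod_vhost_alias specifiers in template."""
    parts = server_name.split(".")

    def substitute(match):
        spec = match.group(1)
        if spec == "%":
            return "%"
        if spec == "0":
            return server_name
        n = int(spec)
        index = n - 1 if n > 0 else len(parts) + n
        return parts[index] if 0 <= index < len(parts) else "_"

    return _SPEC.sub(substitute, template)


def map_request(server_name, url_path):
    """Mimic VirtualDocumentRoot /www/vhosts/%0/htdocs for one request."""
    return interpolate("/www/vhosts/%0/htdocs", server_name) + url_path


if __name__ == "__main__":
    print(map_request("www.virtualhost.com", "/foo/bar.html"))
    # /www/vhosts/www.virtualhost.com/htdocs/foo/bar.html
    print(interpolate("/www/vhosts/%-2/%-1/htdocs", "www.virtualhost.com"))
    # /www/vhosts/virtualhost/com/htdocs
```

The second form hints at why the other specifiers exist: splitting names into parts lets you spread thousands of sites across a directory tree instead of one enormous directory.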

One interesting thing you can do with virtual hosts is make your own web server behave differently depending on how you access it. For instance, on my web server at home, I have my DNS set up with several aliases to the web server, like "docs", "weather", "mirror", "daily", "rfc", and "howto". I then access my webserver by different names, like "http://rfc" or "http://weather", to reach the right sets of pages.


Logging Options

At the KPLUG meeting where this tutorial was first presented, there seemed to be a lot of interest in various logging options. Apache offers a very well rounded set of logging options, including options to place logs from virtual hosts into separate files. Using configurable logs via mod_log_config, you can accomplish just about any type of logging you desire, including logging cookies, conditional logging, or passing logs to a logging host via syslog. Maintaining a separate logging host is almost always beneficial to large sites. Lincoln D. Stein (known to Perl fans everywhere) has a quick example of one way to accomplish this.
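For a feel of what the main-host LogFormat string "%h %l %u %t \"%r\" %>s %b" actually writes, here's a small Python sketch that renders one line in that common format. The request values are made up; the field notes in the comments follow the mod_log_config documentation.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]  # fixed English names, like %t


def clf_time(when):
    """Format a datetime the way %t does: [day/Mon/year:HH:MM:SS +zone]."""
    offset = when.utcoffset() or timedelta(0)
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    minutes = abs(minutes)
    return "[%02d/%s/%04d:%02d:%02d:%02d %s%02d%02d]" % (
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second, sign, minutes // 60, minutes % 60)


def common_log_line(host, user, when, request_line, status, size):
    """Render one line in the common log format used by the main host."""
    return '%s %s %s %s "%s" %d %s' % (
        host,                          # %h  remote host
        "-",                           # %l  remote logname (identd); "-" if unknown
        user or "-",                   # %u  authenticated user, "-" if none
        clf_time(when),                # %t  time the request was received
        request_line,                  # %r  first line of the request
        status,                        # %>s final status code
        str(size) if size else "-")    # %b  body bytes sent, "-" when zero


if __name__ == "__main__":
    when = datetime(2001, 11, 9, 19, 30, 5, tzinfo=timezone(timedelta(hours=-8)))
    print(common_log_line("192.0.2.7", None, when, "GET /index.html HTTP/1.0", 200, 2326))
    # 192.0.2.7 - - [09/Nov/2001:19:30:05 -0800] "GET /index.html HTTP/1.0" 200 2326
```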

ApacheToday has a four part series of tutorials on apache logging that you can view online.


Authorization Options

Another issue that garnered some interest was Apache authorization methods. There are dozens of ways to authorize and authenticate access in apache, and a separate tutorial in and of itself could be written on setting up some of these. I'll just refer you to other people's fine work on the subject.

Dynamic Content

Dynamic content is a fairly fun thing to play with. It includes things like negotiated content (for folks who want their web-pages gif-free and in french), CGI, PHP, Perl generated pages, and SSI (Server Side Includes).

Negotiated Content

Beginning with HTTP 1.1, compliant browsers have been able to send information to the server specifying additional information and preferences along with their requests for web documents. The browser can, for example, inform the web server that it will accept GIF images, but would really prefer PNG or JPEG if they're available. Apache can parse these preferences and react to them. The common request headers that Apache understands are Accept, Accept-Language, Accept-Charset, and Accept-Encoding.

Apache's negotiation rules can be quite complex, so it's a real good idea to read the documentation if you really want to fine-tune your website, but basic negotiation is actually quite easy. First, ensure that mod_negotiation is enabled for your server (it is compiled in by default, so unless you changed that, you're OK). Second, add a handler for type-map, usually by including the configuration directive
   AddHandler type-map .var
and third by setting up the type-map files themselves. Then instead of hyper-linking to an image file or web-page, you hyperlink to the .var file, and let Apache sort out what should get served. An example file that would serve a page in a preferred language might be helpful here. If you create a file called foo.var, and create a hyperlink to it, and fill in the contents like this:

URI: foo.english.html
Content-type: text/html
Content-language: en

URI: foo.french.html
Content-type: text/html
Content-language: fr

URI: foo.german.html
Content-type: text/html
Content-language: de

Now when the user clicks on the link, Apache looks at which language the browser says it prefers (the Accept-Language header), and returns the matching file. You can do the same thing with images. If you had a link like <IMG SRC=./foo.var> and the foo.var file contained

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.png
Content-type: image/png; qs=0.3

then apache would look at the Accept header in the request, and return the type of image that was 1) in the list of acceptable media types, and 2) had the highest qs value (these range from 1.000 down to 0.000).

Now let's say you have a case where none of the options in your .var file are acceptable to the browser. Apache will return error 406 (NOT ACCEPTABLE), along with a hyperlinked list of the possible options. This can be a cool feature with translated pages, but tends not to work too well with images, as you can probably imagine.
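Here's a simplified Python model of the image example: each variant's score is the client's preference for its media type (from the Accept header) multiplied by the server's qs, and a list with no positive score means 406. Apache's real algorithm has more steps and tie-breakers (languages, charsets, encodings, variant size), so treat this as an approximation rather than a description of mod_negotiation's internals.

```python
def parse_accept(header):
    """Parse an Accept header into {media_range: q}."""
    prefs = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[fields[0].lower()] = q
    return prefs


def client_q(prefs, media_type):
    """The client's q for media_type, using the most specific matching range."""
    major = media_type.split("/")[0]
    for media_range in (media_type, major + "/*", "*/*"):
        if media_range in prefs:
            return prefs[media_range]
    return 0.0


def choose_variant(variants, accept_header):
    """variants: (uri, media_type, qs) tuples. Returns a URI, or None (406)."""
    # No Accept header at all means the client accepts anything.
    prefs = parse_accept(accept_header) if accept_header else {"*/*": 1.0}
    best, best_score = None, 0.0
    for uri, media_type, qs in variants:
        score = client_q(prefs, media_type) * qs
        if score > best_score:
            best, best_score = uri, score
    return best


# The image type-map from the text, as (URI, type, qs) tuples.
FOO_VAR = [("foo.jpeg", "image/jpeg", 0.8),
           ("foo.gif", "image/gif", 0.5),
           ("foo.png", "image/png", 0.3)]

if __name__ == "__main__":
    print(choose_variant(FOO_VAR, "image/*"))                   # foo.jpeg
    print(choose_variant(FOO_VAR, "image/png, image/*;q=0.2"))  # foo.png
    print(choose_variant(FOO_VAR, "text/html"))                 # None -> 406
```

The second call shows the client's own q values can outweigh the server's qs: 1.0 x 0.3 for PNG beats 0.2 x 0.8 for JPEG.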

Transparent Content Negotiation

Now, it's often not that much fun to do all of this work.... setting up the .var files, checking all the links for validity, reconfiguring your browser's preferences for each test run, etc. So Apache offers what is called "transparent content negotiation". If you enable MultiViews in the Options directive, have files like foo.en.html, foo.fr.html, foo.de.html, and foo.html, and simply hyper-link to "foo", with no extension, Apache will fake up a type-map on the fly, and serve the best match. It's often a good idea to have a "default", like foo.html, which, since it has no encoding or language specified at all, is always acceptable to the browser.
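Here's a Python sketch of that "fake up a type-map" step plus a simple language pick. The extension tables are hypothetical stand-ins for mime.types and AddLanguage, and giving a language-less default like foo.html a tiny score is an assumption made to mimic its "always acceptable" role; it is not how Apache actually weighs variants. The sketch also doesn't match a request for en-us against an .en file.

```python
# Hypothetical extension tables; a real server builds these from mime.types
# and its AddType / AddLanguage directives.
TYPES = {"html": "text/html", "gif": "image/gif", "png": "image/png"}
LANGUAGES = {"en", "fr", "de"}


def build_type_map(request_name, filenames):
    """Fake up a type-map from files named request_name.<ext>[.<ext>...]."""
    variants = []
    prefix = request_name + "."
    for filename in filenames:
        if not filename.startswith(prefix):
            continue
        media_type, language = None, None
        for ext in filename[len(prefix):].split("."):
            if ext in TYPES:
                media_type = TYPES[ext]
            elif ext in LANGUAGES:
                language = ext
        variants.append({"file": filename, "type": media_type, "language": language})
    return variants


def parse_languages(header):
    """Parse Accept-Language into {language: q}."""
    prefs = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[fields[0].lower()] = q
    return prefs


def pick_variant(variants, accept_language):
    """Best language match; a language-less variant is the fallback default."""
    prefs = parse_languages(accept_language)

    def score(variant):
        if variant["language"] is None:
            return 0.001  # assumption: acceptable, but loses to any requested language
        return prefs.get(variant["language"], 0.0)

    best = max(variants, key=score, default=None)
    return best["file"] if best is not None and score(best) > 0 else None


if __name__ == "__main__":
    files = ["foo.en.html", "foo.fr.html", "foo.de.html", "foo.html", "bar.html"]
    type_map = build_type_map("foo", files)
    print(pick_variant(type_map, "fr, en;q=0.5"))  # foo.fr.html
    print(pick_variant(type_map, "ja"))            # foo.html (the default)
```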

Of course, for the self-flagellating code-head types, you could "simply" use mod_actions to re-write documents into the desired format on the fly using CGI scripts, but you'd want a really fast server, lots of time on your hands to write the translators, a box of Cheez-Its, and a case of Mountain Dew just to get started on such a project.


CGI

CGI refers to the Common Gateway Interface, and is the most common method of executing external programs or scripts on the server side to generate content. Even things like PHP make use of the concepts of CGI to perform their functions and features. CGI can also be your worst security nightmare, so use it carefully, and pay close attention to your server configuration. Probably the best instructions on enabling CGI in Apache ever written are the CGI HOWTO included with the Apache documentation. Look it over carefully. For those who don't want to click the mouse, be aware that the default setting for the Options directive is "All", which allows executing CGIs from anywhere they are found. This can be a big security hole in and of itself, so if your web server will be visible from the internet (I can't say it enough) pay close attention to your server configuration.

Getting CGI to work

Modules
mod_alias
mod_cgi
mod_mime
Configuration Directives
AddHandler
Options
ScriptAlias
If you wish to allow execution of CGI on your web server, you should include mod_cgi, mod_mime, and mod_alias in your server. You may also want to add a couple lines to your configuration file:
    AddModule mod_mime.c
    AddModule mod_cgi.c
    AddModule mod_alias.c
    ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
    AddHandler cgi-script .cgi
ScriptAlias maps requests for http://www.example.com/cgi-bin/foo to the script /home/httpd/cgi-bin/foo, and tells Apache that every file in the cgi-bin directory should be treated as a CGI script. The AddHandler directive tells apache that files whose names end with .cgi should be treated as CGI programs; that is, if the file exists, is executable, and Options ExecCGI is in effect for its directory, Apache should run it. This directive works anywhere in the document tree, not just in the cgi-bin directory, so you only need it if you wish to allow execution of CGIs outside the ScriptAlias'ed directory. You could drop this directive into <VirtualHost> or <Directory> stanzas to limit its scope. No matter how you choose to configure your CGI access, consider security at every step of the way.
   Options -ExecCGI
   <Directory /foo/bar/>
      Options +ExecCGI
   </Directory>
   <Directory /home/httpd/*/www/cgi-bin/>
      Options +ExecCGI
   </Directory>
This disables CGI execution globally, but allows it for the /foo/bar directory and for any directory whose path matches /home/httpd/*/www/cgi-bin. This might be useful to allow execution of CGI's from users' home directories. Interaction between the ScriptAlias, Options, and AddHandler directives can be tricky (ScriptAlias and ScriptAliasMatch override Options, for example, while Options and the handler work hand in hand), so it will take some experimentation on your part before you are comfortable with the way things work.
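Since getting those directives to cooperate usually takes a few tries, it helps to have a known-good script to point your browser at. This isn't a lesson in CGI programming, just a throwaway test: a minimal /bin/sh sketch that is not part of Apache, where the name test.cgi and the install location in the comments are assumptions. Copy it into any directory where you expect CGI to run and make it executable. If the browser shows a few lines of plain text, CGI works there; if you get the script's source or a "403 Forbidden", it doesn't.

```shell
#!/bin/sh
# test.cgi: a hypothetical throwaway script for checking a CGI setup.
# Install it as e.g. /home/httpd/cgi-bin/test.cgi, then: chmod 755 test.cgi

# emit_response prints a complete CGI response: a header line, the blank
# line that ends the headers, then the document body.
emit_response() {
    printf 'Content-type: text/plain\n\n'
    echo "CGI is working."
    # Apache sets these variables for every CGI request; fall back to
    # "unknown" when the script is run by hand from a shell.
    echo "Server: ${SERVER_SOFTWARE:-unknown}"
    echo "Script: ${SCRIPT_NAME:-unknown}"
}

emit_response
```

Running it by hand from a shell should print the same header and text, with "unknown" in place of the server variables.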

Since this is strictly an Apache tutorial, we're not going to cover how to write CGI scripts, but if there is enough interest, KPLUG will do a CGI HOWTO in the future.

(Just as a side note, sites heavy in CGI should consider looking at the FastCGI module available from the Apache Module Registry to speed up CGI responsiveness.)


PHP

PHP is becoming more and more popular on the web. The latest Netcraft survey (November 2001) shows around 40% of all web sites on the net are PHP enabled. While compiling and installing PHP as a module for your Apache web server is a bit tricky, it is well worth the effort. Luckily most distributions already come with PHP, so unless you're rolling your own Apache, it should be a breeze. If you're compiling your own PHP, download the latest stable source from the PHP homepage and unpack it. There are about a hundred configuration options, many with their own particular dependencies, so configuring the source tree can be a real pain, but to get exactly the features you want, it's the only way to go. You'll need the Apache source tree as well, so if you want to build your own PHP, you'll almost have to build your own Apache too; then again, if you're interested in this step, that's almost a given.
To compile PHP as a static module, perform the following:
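tar zvxf apache_1.3.x.tar.gz
tar zvxf php-x.x.x.tar.gz
cd apache_1.3.x
# Configure Apache once so PHP's --with-apache step finds a configured tree
./configure --with-whatever-options
cd ../php-x.x.x
./configure --with-apache=../apache_1.3.x --with-whatever-options
make
make install
cd ../apache_1.3.x
# Re-run configure with the same options, plus the newly built PHP module
./configure --with-whatever-options --activate-module=src/modules/php4/libphp4.a
make
make install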
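Before going further, it's worth confirming that PHP really did end up inside the new binary. Apache 1.3's httpd -l prints the list of modules compiled into the server, and a static PHP build should show up there as mod_php4.c. The helper below is a small sketch; the function name and the httpd path in the usage comment are assumptions, so substitute your own.

```shell
#!/bin/sh
# module_compiled_in NAME  (hypothetical helper)
# Reads the output of "httpd -l" (Apache's list of compiled-in modules)
# on stdin and succeeds if NAME appears as one of the listed entries.
module_compiled_in() {
    grep -q "^[[:space:]]*$1[[:space:]]*\$"
}

# Example (the httpd path is an assumption; use your installed binary):
#   /usr/local/apache/bin/httpd -l | module_compiled_in mod_php4.c \
#       && echo "PHP is linked into this httpd"
```

This check only applies to the static build; a DSO build won't appear in the httpd -l listing.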
To compile PHP as a DSO (you'll need the apxs utility and DSO support already compiled into apache for this), simply do:
tar zxvf php-x.x.x.tar.gz
cd php-x.x.x
./configure --with-whatever-options --with-apxs
make
make install
For the static build, some people prefer to skip the final make install and just copy the new Apache binary over the old one, thus avoiding any possible overwrite of existing configuration files.

Configuring Apache for PHP

Simply add the following lines to your httpd.conf file.
# Use the next line if PHP is a DSO, omit it otherwise
LoadModule php4_module /path/to/php4/module/libphp4.so

# These lines need to go in for both DSO and static
AddModule mod_php4.c
AddType application/x-httpd-php .php4 .php
That's about it. Pretty simple. Again, this is an Apache tutorial, so we won't go into writing PHP programs, but if there is enough interest, KPLUG will whip up a tutorial.
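To check that Apache now hands .php files to PHP, a one-line test page calling PHP's phpinfo() function does the job. The shell function below is a minimal sketch that writes such a page; the function name, the file name info.php, and the document root in the example are all assumptions. phpinfo() reports a lot about your server, so delete the page once you've seen it work.

```shell
#!/bin/sh
# make_php_test_page DOCROOT  (hypothetical helper)
# Writes DOCROOT/info.php, a one-line PHP page that calls phpinfo().
make_php_test_page() {
    docroot=$1
    if [ ! -d "$docroot" ]; then
        echo "make_php_test_page: $docroot is not a directory" >&2
        return 1
    fi
    printf '<?php phpinfo(); ?>\n' > "$docroot/info.php"
    # The web server only needs to read it; nobody needs to execute it.
    chmod 644 "$docroot/info.php"
}

# Example (use your own DocumentRoot):
#   make_php_test_page /home/httpd/html
#   ...load http://localhost/info.php, then: rm /home/httpd/html/info.php
```

If the browser offers to download info.php or shows the raw source, the AddType line isn't taking effect.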


Perl and mod_perl

Perl, while not written from the ground up for web use as PHP was, has an enormous existing code base. With the advent of mod_perl's server-embedded Perl engine, it's now fairly fast not only to use Perl scripts as CGI's, but to actually code entire Apache extension modules in Perl. Compiling and installing mod_perl is similar to compiling and installing PHP. For example, to install mod_perl as a DSO, do:
tar zvxf mod_perl-1.x.tar.gz
cd mod_perl-1.x
perl Makefile.PL           \
   USE_APXS=1              \
   WITH_APXS=/path/to/apxs \
   EVERYTHING=1            \
   [... more options if desired ]
make
make test
make install
Please note that Perl 5.003 requires patching if you build mod_perl as a DSO (the patch is included in the INSTALL file in the mod_perl source tree). Apply the patch and recompile Perl BEFORE building mod_perl.

Configuring Apache for mod_perl

There's more than one way to skin a cat, according to Perl fans. Similarly, there are lots of ways to configure Apache with mod_perl. In fact, using <Perl> sections, it's possible to rewrite httpd.conf entirely in Perl! But for basic functionality, just add the following:
   # Include the next line if mod_perl is a DSO; it must come before
   # any of the Perl* directives below
   LoadModule perl_module  /path/to/apache/modules/libperl.so

   AddModule mod_perl.c

   # for Apache::Registry Mode
   Alias /perl/   "/home/httpd/cgi-bin/"
   # for Apache::PerlRun Mode
   Alias /cgi-perl/  "/home/httpd/cgi-bin/"

   # For /perl/*: CGI scripts compiled once and cached by Apache::Registry
   <Location /perl>
      PerlRequire  /path/to/apache/modules/perl/startup.perl
      PerlModule   Apache::Registry
      SetHandler   perl-script
      PerlHandler  Apache::Registry
      Options      ExecCGI
      PerlSendHeader On
   </Location>

   # For /cgi-perl/*: CGI scripts run by the embedded Perl, recompiled per request
   <Location /cgi-perl>
      SetHandler   perl-script
      PerlHandler  Apache::PerlRun
      Options      ExecCGI
      PerlSendHeader On
   </Location>

   # For mod_perl status information, viewable only from this machine
   <Location /perl-status>
      SetHandler   perl-script
      PerlHandler  Apache::Status
      order        deny,allow
      deny from all
      allow from localhost
   </Location>
While this, of course, just scratches the surface, there is plenty of additional information available in the pod files that come with mod_perl, in the Apache module help file, and on the mod_perl home page. As with CGI and PHP, this isn't a Perl tutorial, but if enough interest develops, KPLUG will eventually cover it in more depth.


Server Side Includes (SSI)

Much like HTML pages with embedded scripts, SSI is just another set of what can be thought of as almost-HTML tags; the directives are written as specially formatted HTML comments. SSI offers an easy way to include, right in the middle of a web page, such things as file modification times, values of environment variables, the current date and time, and even the output of programs and scripts. It differs from standard CGI in that the "included" information is parsed right into an HTML file, rather than the entire content being generated by a program or script. The Apache documentation carries quite a good tutorial. Probably the most common use for SSI's is including a standard header or footer on web pages.

Configuring Apache for SSI

There isn't much to do, really. Just configure and compile mod_include (either as DSO or static), and add a few lines to the config file:
   # Use this to allow SSI in files.  This can go in stanzas, too.
   Options +Includes
   # Or you can have SSI but disable executing scripts via SSI with
   Options +IncludesNOEXEC

   # Use this if mod_include is a DSO
   LoadModule includes_module     /path/to/apache/modules/mod_include.so

   AddModule mod_include.c 

   AddType text/html .shtml
   AddHandler server-parsed .shtml
   
   # Optionally, you could run *all* html files through the SSI parser.
   # This does no harm to non SSI html files, but slows you down a bit
   AddHandler server-parsed .html
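To see whether SSI is actually active for a directory, a tiny test page is handy. The function below is a minimal sketch that writes one using two of mod_include's standard variables, DATE_LOCAL and LAST_MODIFIED; the function name and the file name test.shtml are made up for this example. Load the page in a browser: real dates mean SSI is working, while blank spots (view the source and you'll see the raw <!--#echo ... --> comments) mean the .shtml handler or Options +Includes isn't in effect there.

```shell
#!/bin/sh
# make_ssi_test_page DOCROOT  (hypothetical helper)
# Writes DOCROOT/test.shtml containing two SSI echo directives.
make_ssi_test_page() {
    docroot=$1
    if [ ! -d "$docroot" ]; then
        echo "make_ssi_test_page: $docroot is not a directory" >&2
        return 1
    fi
    # Quoted 'EOF' keeps the shell from expanding anything in the page.
    cat > "$docroot/test.shtml" <<'EOF'
<html><body>
<p>Server time: <!--#echo var="DATE_LOCAL" --></p>
<p>This page last changed: <!--#echo var="LAST_MODIFIED" --></p>
</body></html>
EOF
    chmod 644 "$docroot/test.shtml"
}

# Example (use your own DocumentRoot):
#   make_ssi_test_page /home/httpd/html
```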


Security in an insecure world

Apache, in a nutshell, is a way to let someone see data. It can therefore be important that you know who that someone is, and what you are letting them see. Things like PHP, CGI, and SSI have the potential to expose your entire filesystem (and any filesystem you have access to) to users in unintended ways. There are hundreds of precautions you can take, ranging from the simple to the absurd. The simplest thing to do is to restrict access a bit. Setting proper permissions and ownership on the ServerRoot directory is a good start. Disallowing access to the filesystem root using <Directory> stanzas is another. Ensuring that CGI is only run in proper script-aliased directories is a must. The same goes for SSI (IncludesNOEXEC helps here). suExec'ing CGI's or using CGIWrap isn't a bad idea. Finally, you could run your entire web server in a chroot'ed environment. There are patches to the Apache source code that make Apache drop itself into a chroot environment, but you might have trouble keeping those patches in sync with your current version of Apache. Without patching Apache's source, you can still start and run it in a chroot jail using proper configs and startup scripts. How to do that is quite nicely described in chapter 29 of the online book Securing and Optimizing RedHat Linux.
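Setting "proper permissions and ownership on the ServerRoot directory" means something fairly specific. The security tips in the Apache documentation recommend that ServerRoot and its bin, conf and logs directories be owned by root and writable by no one else, so that the unprivileged user the server children run as can't alter the binaries, the configuration or the logs. The function below is a minimal sketch assuming that conventional bin/conf/logs layout; its name is hypothetical, and many distributions scatter these directories elsewhere, so adjust the paths. It only changes ownership when run as root.

```shell
#!/bin/sh
# lock_down_serverroot SERVERROOT  (hypothetical helper)
# Makes SERVERROOT and its bin, conf and logs subdirectories readable by
# everyone but writable only by their owner (mode 755). When run as root,
# it also sets their owner and group to root (uid/gid 0).
lock_down_serverroot() {
    root=$1
    for d in "$root" "$root/bin" "$root/conf" "$root/logs"; do
        if [ ! -d "$d" ]; then
            echo "lock_down_serverroot: missing directory $d" >&2
            return 1
        fi
        if [ "$(id -u)" -eq 0 ]; then
            chown 0:0 "$d" || return 1
        fi
        chmod 755 "$d" || return 1
    done
}

# Example (ServerRoot is an assumption; check the ServerRoot line in httpd.conf):
#   lock_down_serverroot /usr/local/apache
```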


Performance

While Apache is fast, some people read the so-called "independent benchmark studies" and wonder "Is it fast enough?". The Apache Development Team has rewritten portions of Apache to deal with some of the unreal situations (also known as "dirty tricks") covered by some of the benchmark studies, but Apache was written to be stable first and fast second. That leaves room for improvement. There are some patches out there to speed things up a bit (at the expense of a little functionality), but in general the best thing you can do to enhance performance is to tweak your config file a bit.
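Which settings matter depends on your traffic, but in Apache 1.3 the usual suspects are HostnameLookups (leave it Off so each hit doesn't cost a DNS lookup), the KeepAlive family (KeepAlive, MaxKeepAliveRequests, KeepAliveTimeout) and the process-pool directives (StartServers, MinSpareServers, MaxSpareServers, MaxClients, MaxRequestsPerChild). As a starting point, the function below is a small sketch (its name is made up) that lists the active lines setting any of these, with line numbers, so you know what you're changing before you change it.

```shell
#!/bin/sh
# show_tuning_directives CONF  (hypothetical helper)
# Prints the uncommented lines of an Apache 1.3 httpd.conf that set the
# common performance directives, prefixed with their line numbers.
show_tuning_directives() {
    grep -n -E '^[[:space:]]*(HostnameLookups|KeepAlive|MaxKeepAliveRequests|KeepAliveTimeout|StartServers|MinSpareServers|MaxSpareServers|MaxClients|MaxRequestsPerChild)[[:space:]]' "$1"
}

# Example (the config path is an assumption):
#   show_tuning_directives /etc/httpd/conf/httpd.conf
```

Directives that don't show up are running at their compiled-in defaults.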

TUX

TUX is a new kernel-level web server from RedHat which, while quite limited in ability (so far), has some exciting features. The only real caveat is that TUX works only with 2.4 or newer kernels, and may require you to perform a kernel build. TUX is designed to serve static pages (most web documents are static), and to hand off anything it can't handle to a companion web server (usually Apache). While this may not sound like much, consider a web page that needs SSI but has 5 images on it. TUX would get the request for the SSI page and hand it off to Apache. Then the browser would parse the page, realize it needed 5 inline images, and request them. TUX would serve those just fine, since the images are static. And since TUX runs partly in kernel mode, it can take advantage of zero-copy block I/O, and with a good Ethernet card it can do scatter-gather DMA straight from the page cache to the network. With a nifty little thread scheduler to help minimize the impact of disk I/O, and the reality that many complete web sites fit in a 512MB computer's page cache, TUX can blaze along at over 12000 transactions a second. Even SPECweb99, the most stringent web benchmark there is (and the one that most closely models reality), scores TUX (3435) at about 2.4 times IIS 5 (1454) on comparable hardware. Now that's fast! Of course, the IBM e390 with 16 processors and 32 Ethernet cards (gigabit, of course) scored 21000, but waddayagonnadoo?

Configuring TUX

TUX isn't all that hard to get, compile, install, and configure, but since TUX is a young product, configuration options vary widely between versions 2.0 and 2.1. First, if you don't already have TUX enabled in your 2.4 kernel, go get it from RedHat. Follow the included directions to apply the patch and compile the kernel, the user-space programs, and the optional TUX modules. Then configure it via /etc/sysconfig/tux (on RedHat systems), via sysctl (if you have it on your system), or by writing directly to the /proc filesystem. While there are many options, the only three that must be set to get TUX to work are serverport, clientport, and documentroot. If you're configuring TUX via the /proc filesystem, just echo the values you want into the files.
   echo 80 > /proc/sys/net/tux/serverport
   echo 8080 > /proc/sys/net/tux/clientport
   echo '/home/httpd/html' > /proc/sys/net/tux/documentroot
Then just start /usr/sbin/tux, and you're on your way.
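If you'd rather script the /proc step, the three echo commands can be wrapped with a check that the TUX directory exists at all; a missing directory is the usual symptom of a kernel without TUX. This is a minimal sketch: the function name is hypothetical, and the file names are the ones used above, which may differ between TUX versions. The optional fourth argument lets you point it at a scratch directory to see what it writes.

```shell
#!/bin/sh
# set_tux_basics SERVERPORT CLIENTPORT DOCROOT [PROCDIR]  (hypothetical helper)
# Writes the three settings TUX needs into its /proc files. PROCDIR
# defaults to /proc/sys/net/tux; writing there requires root.
set_tux_basics() {
    procdir=${4:-/proc/sys/net/tux}
    if [ ! -d "$procdir" ]; then
        echo "set_tux_basics: $procdir not found (is TUX in this kernel?)" >&2
        return 1
    fi
    echo "$1" > "$procdir/serverport"   || return 1
    echo "$2" > "$procdir/clientport"   || return 1
    echo "$3" > "$procdir/documentroot" || return 1
}

# Example matching the settings above:
#   set_tux_basics 80 8080 /home/httpd/html && /usr/sbin/tux
```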

Configuring Apache to work with TUX

Apache requires very little tweaking to work with TUX. All that is really required is to change the port Apache listens on to whatever you've set as TUX's clientport. An additional measure is to limit Apache to listening on the loopback interface, so that it only services requests handed over by TUX and can't be accessed directly from the web.
   # Change httpd.conf
   # From: Port 80 to:
   Port 8080

   # From: BindAddress * to:
   BindAddress 127.0.0.1
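The same two edits can be made with sed. The function below is a minimal sketch (its name is hypothetical) that rewrites an active Port line and an active or commented-out BindAddress line, appends either one if it's missing, and keeps the original file alongside with a .orig suffix. It assumes your httpd.conf uses Port and BindAddress; if it uses Listen directives instead, edit those by hand.

```shell
#!/bin/sh
# apache_behind_tux CONF [PORT]  (hypothetical helper)
# Points an Apache 1.3 httpd.conf at the loopback interface on PORT
# (default 8080, which must match TUX's clientport). The original file
# is kept as CONF.orig.
apache_behind_tux() {
    conf=$1
    port=${2:-8080}
    cp "$conf" "$conf.orig" || return 1
    # Rewrite an active Port line, and an active or commented-out
    # BindAddress line; everything else is copied through unchanged.
    sed -e "s/^Port[[:space:]].*/Port $port/" \
        -e 's/^#*[[:space:]]*BindAddress[[:space:]].*/BindAddress 127.0.0.1/' \
        "$conf.orig" > "$conf" || return 1
    # Append either directive if the file did not contain it at all.
    grep -q '^Port[[:space:]]' "$conf" || echo "Port $port" >> "$conf"
    grep -q '^BindAddress[[:space:]]' "$conf" || echo 'BindAddress 127.0.0.1' >> "$conf"
}

# Example (config path is an assumption), then restart Apache:
#   apache_behind_tux /etc/httpd/conf/httpd.conf 8080
```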
With a little experimentation, you can get the webserver of your dreams up and running in less than a day.