Table of Contents
- Prerequisites
- Other programs
- Installation
- Linux system configuration
- General configuration
- Web pages' MIME types
- Access logs
- Log rotation
- Aliases
- Handling directories
- Writing HTTP rather than HTML
- Users' own web pages
- Delegating the controls for certain pages
- Access control by client IP address
- Access control by user authentication
- Virtual hosts
Figure 1. Prerequisites
Network
Hardware
Software
Wetware (people!)
Figure 2. Prerequisites: Network
Permanent and direct IP access
Vulnerable periods?
Support?
24hrs/day, 365 days/year?
Holiday/Illness cover?
Figure 3. Prerequisites: Hardware
Macs, PCs, Suns, ...
Hardware support? (24x7?)
Backups?
Disc space
Network speed
Memory
Processor power
Figure 4. Prerequisites: Software
Permanently running daemon
Software support?
Service rates?
DNS lookup rates?
CGI?
Figure 5. Prerequisites: Wetware
Checking logfiles
Changing configuration files
Software updates & patches
Data files
Backups
Holiday/Illness cover
To run a web service you need four things: a connection to the outside world (network), a machine to run the service from (hardware), a program to run it with (software) and people to maintain both the server and the data it serves (‘wetware’).
Your network needs to be a permanent connection and your server needs to have a constant IP address. You need to know what the support is for your network, who to contact in case of problems, when the vulnerable periods are, etc. (The CUDN has Monday to Saturday 0800–0930, 1700–1900 and Sunday 0000 to Monday 0800 for its vulnerable periods.)
Your machine must have the power to support the number of hits the server will get. Note that it must be powerful to cope with the peak demand and not just the mean or modal demand. The most important element of your hardware for a web server is the network card; buy a good one. Next most important is the amount of RAM. The CPU comes last in the list. Unless you are planning to run very computationally expensive CGI programs you don't need the latest, greatest, fastest chip in the world.
Clearly you need a good web server program. The program described in this course is Apache. It has all the facilities you could want (and then some) and is free (and that's ‘free’ as in speech as well as in beer). It is also the most widely used web server on the Internet, being used by >65% of active sites. (Source: Netcraft Web Server Survey, February 2002.)
Finally, it is important to realise that to provide a web service rather than just a server you need people. Pages get dated, users' needs change, links pointing out of your site go stale, dump tapes need to be changed and error reports need to be addressed.
Figure 6. Support tools
Editors
HTML checkers
Graphics manipulators
Scanners etc.
Log file analyser
CGI programs
Figure 7. Support tools: Text editors
Plain text editor
Configuration files
HTML data files
emacs, vi, pico
Figure 8. Deprecated support tools: HTML editors
There exist specialist HTML editors
Inflexible & incomplete
Poor quality HTML
Plain text editors still pretty good
Avoid MS Word like the plague
Figure 9. Support tools: HTML checkers
Check HTML syntax
Check HTML quality
Check links still work
weblint
cron job
Figure 10. Support tools: Graphics manipulators
Best all-rounder is gimp—the GNU Image Manipulation Program
Also ee—Electric Eyes
Both available as Red Hat packages.
Figure 11. Support tools: Scanners etc.
Flat bed scanners
Digital cameras
A web server serves out web pages. However, to populate the web site the pages need to be written and checked and log files may need to be analysed.
Firstly you will need to write the web pages, or possibly edit those submitted by others. The author still regards a plain text editor (emacs, vi or perhaps even pico -w) as the best tool for editing web pages. Contrary to popular belief, the dedicated web authoring tools are still not very good. Of the various authoring packages by far the worst is Microsoft Word's ‘save as HTML’ feature. The quality of HTML generated by this is appalling and it should be avoided like the plague.
The HTML in the page still needs to be checked for syntax, link integrity and accessibility. This is true whether or not a dedicated HTML authoring package was used; indeed, if one was used then a ‘second opinion’ is all the more important. The text itself should also be checked for spelling and grammar, but beware the rather over-simplistic grammar rules in some word processors.
In addition to the text there are all the other media formats, with static graphics the most common. The GNU Image Manipulation Program (the GIMP), gimp, is a massively powerful GUI image manipulator that starts at the level of Adobe Photoshop but takes things much further, including having a scripting language. For simple viewing, cropping, rescaling and format conversion the Electric Eyes program, ee, is considerably simpler to learn and use. Images can be initially created interactively (e.g. with the GIMP), with a digital camera or by scanning in photographs.
Figure 12. Support tools: CGI programs
Common Gateway Interface
Not covered in this course
SSI
SSIexec
PHP
perl CGI module
python CGI module
The other support you may need is for CGI programs. First you need to make a decision: are you going to permit the running of programs on the server? While this course is not going to review the technologies in any depth it should give some idea of the spectrum available and the dangers associated with them.
The author is aware of only one vulnerability of a system through the web server itself. He is, however, aware of many break-ins via the CGI programs run by web servers. Static pages are vastly more secure.
The simplest of this style of program is the ‘server side include’ facility. This allows you to add certain tags to a web page which are not valid HTML but which are transformed by the server into valid HTML with dynamic content. A common example is the SSI tag that says when the page was last modified. However, consider whether you want the ‘last updated’ tag to be the last time you fixed your spelling (automatic version) or the last time you changed the content (manual version). Slightly beyond this is the ‘server side include executable’ where the tag runs an external program to generate the content. It is at this point, where an extra program is run by the web server, that you need to start being very careful about security. (It is possible to turn off the SSI executable feature while retaining the weaker SSI functionality.)
PHP takes this one stage further, offering a scripting language embedded in the HTML to provide powerful functionality and logic. The Perl and Python CGI modules take the page author away from HTML altogether. The CGI modules are presented with a URL (and some input data for POST queries) and have to write their own HTTP as well as HTML, in the format described for the ‘as is’ pages in the section called “Writing HTTP rather than HTML”. The modules provide simple function calls for most of this, though.
Figure 13. Support tools: Secure access
ssh: Replacement for rsh, rlogin, rcp
Matching daemon: sshd
Red Hat package
Unix Support's CD
Finally, unless you plan to work exclusively at the console (you don't) you will need secure network access to your server. Don't use telnet or the ‘r-commands’ (rlogin, rsh, rcp and rsync); use their secure analogues provided by the ‘ssh’ suite of programs instead.
Red Hat Linux version 7.0 and above ship with an SSH system. Also, Unix Support provides a CD with ssh clients for most platforms including a Red Hat Linux packaging of the software suite for the Intel platform. The CD is free from the CS Reception.
Figure 14. Example server
3Com 3c905B, 700MHz Athlon, 256MB RAM, 20GB disc
Red Hat Linux 7.3
Apache v1.3.23
The example server we are going to use for this course is a 700MHz Athlon with 256MB of RAM, a 20GB disc and a 3Com 3c905B card. This is adequate for a production server. If it were very heavily used I would increase the disc size. The RAM and the CPU are perfectly adequate.
We will be running Red Hat Linux 7.3. Typically we would not be running X on the web server but we will for this example because we will be our own client too. We will run with Apache 1.3.23 which is the version shipped with Red Hat Linux 7.3.
Figure 15. Apache installation
As root
Unix Support's NFS server
Mount Red Hat mirror
Locate Apache package
Install Apache package
Unmount Red Hat mirror
Figure 16. Apache installation: Mounting the mirror
Unix Support mirror:
nfs-uxsup.csx.cam.ac.uk
Red Hat mirror:
/linux/redhat
# mount -o ro nfs-uxsup.csx.cam.ac.uk:/linux/redhat /mnt
# cd /mnt/updates/7.3/en/os/i386/
# ls -l apache-*
-rw-r--r-- ... apache-1.3.23-14.i386.rpm
-rw-r--r-- ... apache-devel-1.3.23-14.i386.rpm
-rw-r--r-- ... apache-manual-1.3.23-14.i386.rpm
Figure 17. Apache installation: Examining the package
# rpm --query --info --package apache-1.3.23-14.i386.rpm
Name        : apache                        Relocations: (not relocateable)
Version     : 1.3.23                        Vendor: Red Hat, Inc.
Release     : 14                            Build Date: Wed 19 Jun 2002 16:55:48
Install date: (not installed)               Build Host: daffy.perf.redhat.com
Group       : System Environment/Daemons    Source RPM: apache-1.3.23-14.src.rpm
Size        : 1248999                       License: Apache Software License
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : The most widely used Web server on the Internet.
Description :
Apache is a powerful, full-featured, efficient, and freely-available
Web server. Apache is also the most popular Web server on the Internet.
Figure 18. Apache installation: Examining the package
# rpm --query --list --package apache-1.3.23-14.i386.rpm
/etc/httpd/conf
/etc/httpd/conf/httpd.conf
...
/etc/rc.d/init.d/httpd.init
...
/var/www
/var/www/html
/var/www/html/index.html
/var/www/icons
/var/www/icons/a.gif
...
/usr/man/man8/httpd.8
...
/usr/sbin/httpd
...
Figure 19. Apache installation: Installing the package
# rpm --install apache-1.3.23-14.i386.rpm
# cd
# umount /mnt
This has not started the server.
Please remember to unmount the mirror.
We install Apache as root and then configure it so that root will not be needed subsequently for the configuration or administration of the server except to shut it down or restart it.
We use the network file system (NFS) to mount Unix Support's mirror of the Red Hat distribution. Within it (/mnt/updates/7.3/en/os/i386/) are all the software packages, including Apache (apache-1.3.23-14.i386.rpm).
We examine the Apache package for information and a listing of its contents and finally we install it. Once we've done the installation we unmount the file server.
This installation has not started the server but has arranged that it will be started on the next reboot. (Though we don't need to and won't reboot just to start it.)
Figure 20. Apache installation: Configuration file layout
               +--- conf/ ---+--- *.conf
               |                               +--- access.log
/etc/httpd/ ---+--- logs -> /var/log/httpd/ ---+
               |                               +--- error.log
               +--- modules -> /usr/lib/apache
Figure 21. Apache installation: Data file layout
            +--- cgi-bin/                  empty
            |
/var/www/---+--- icons/ --- *.gif
            |
            +--- html/ --- index.html      default
Figure 22. Apache installation: System file layout
/usr/sbin: Binaries
/usr/man: Manual pages
/etc/rc.d: Startup/Shutdown scripts
/etc/logrotate.d: Log rotation
The files installed come in three classes: the configuration files (/etc/httpd/, /etc/logrotate.d/apache) that the server managers need access to, the data files (/var/www/) that the web page authors and editors need access to and the system files (everything else) that we aren't going to touch. We will define groups to keep these categories apart.
If you were prepared to do all the updates to the web pages as root and had no special requirements such as access controls then you could just run the program now. The website exists under /var/www/html/ and I wish you much happiness together. However, a small amount of work (typically about 15 minutes) will make everything a lot easier and safer.
Figure 23. Configuring the operating system
Package provides a user and group for the daemon
We need to add a group for the apache administrators
And at least one group for the web authors
Avoid use of root
Log rotation
Figure 24. Configuring the O/S: User & groups
# groupadd -r webadmins
# groupadd -r webeditor
# vi /etc/group
Figure 25. Configuring the O/S: File permissions as installed
# ls -ld /var/www /etc/httpd /var/log/httpd
drwxr-xr-x 3 root root 1024 Jun 27 12:09 /etc/httpd
drwxr-xr-x 5 root root 1024 Jun 27 12:09 /var/www
drwxr-xr-x 2 root root 1024 Jun 27 16:36 /var/log/httpd
Only root can make modifications.
Figure 26. Configuring the O/S: File permissions
Change the group to webadmins:
# chgrp -R webadmins /etc/httpd /var/log/httpd /etc/logrotate.d/apache
# chgrp -R webeditor /var/www
Let the group write to the directories:
# chmod -R g+w /var/www /etc/httpd /var/log/httpd /etc/logrotate.d/apache
Make the group ownership ‘setgid’:
# find /var/www /etc/httpd /var/log/httpd -type d -exec chmod g+s {} \;
Figure 27. Configuring the O/S: File permissions—as changed
# ls -ld /var/www /etc/httpd /var/log/httpd /etc/logrotate.d/apache
drwxrwsr-x 3 root webadmins 1024 Jun 27 12:09 /etc/httpd
-rw-rw-r-- 1 root webadmins 172 Jun 27 12:09 /etc/logrotate.d/apache
drwxrwsr-x 5 root webeditor 1024 Jun 27 12:09 /var/www
drwxrwsr-x 2 root webadmins 1024 Jun 27 12:09 /var/log/httpd
The daemon will run as user apache.
How can the daemon write its log files?
It starts life and opens the log files as user root.
Figure 28. Being a webadmin
A fresh login will pick up membership of group webadmins.
This gives access to existing webadmins-writable files.
Files created in setgid directories will be owned by group webadmins.
Check your permissions mask
We create system groups for the administration of the server and the management of the web pages. On a small server these can be combined. On large servers you may well want multiple groups to manage various subsets of the web pages. We specifically want to avoid requiring root access to reconfigure the web server. root will be used to start or stop the server and nothing else.
We set the permissions on the data and configuration directories so that members of the relevant group can make changes (g+w on files and directories) and any files or subdirectories created will have matching group ownership (g+s on directories).
The chmod (change the mode (permissions) of a file system object) and chgrp (change the group ownership of a file system object) commands (and the chown command which changes the user ownership of a file system object, though we aren't using that command here) have a -R option to make them behave recursively. Every file system object beneath the named directories will have its mode or group modified.
The find command is slightly trickier. We want to apply the g+s mode change to every directory beneath the named directories but we don't want to apply it to the files. The find command shown starts at each of the three directories listed and checks each file system element beneath them, testing to see if the element in question is a directory (‘-type d’). If it is then it executes a command (‘-exec ... \;’) and that command is ‘chmod g+s dir’, where dir is the directory just found. (‘{}’ is replaced by the name of the file system element being considered.)
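The permission changes described above can be rehearsed on a scratch directory before they are applied to the real server. What follows is a minimal sketch rather than part of the course's configuration: the tree under the temporary directory is a hypothetical stand-in for /var/www, it assumes a Linux system with mktemp, and it needs no root access.

```shell
#!/bin/sh
# Build a small scratch tree (a hypothetical stand-in for /var/www).
demo=$(mktemp -d)
mkdir -p "$demo/html/images"
touch "$demo/html/index.html"

# Let the group write to every file and directory...
chmod -R g+w "$demo"

# ...but set the setgid bit on the directories only, never on the files.
find "$demo" -type d -exec chmod g+s {} \;

# With a permissions mask of 002, newly created files are group-writable.
# The subshell keeps the changed umask from leaking into the rest of the session.
( umask 002; touch "$demo/html/new-page.html" )

# Directories should show an 's' in the group triplet; files should not.
ls -lR "$demo"
```

The scratch tree is left in place so that it can be inspected; remove it with rm -r when finished.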
Figure 29. Starting the server
# /etc/rc.d/init.d/httpd start
Starting httpd: [ OK ]
While we're here, we shall describe the manual stopping of the server, which we will hardly ever need, and the manual restarting of the server which we will use frequently in this course to bring in a new configuration file. Restarting is just stopping and starting wrapped into a single command.
Figure 30. Restarting or stopping the server
# /etc/rc.d/init.d/httpd restart
Shutting down http: [ OK ]
Starting httpd: [ OK ]
# /etc/rc.d/init.d/httpd stop
Shutting down http: [ OK ]
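Since we restart the server mainly to bring in a new configuration file, it can be worth checking the file before the running server is stopped. The sketch below is one possible way to do that, not something the course itself prescribes: check_and_restart is a hypothetical helper that takes the checking command and the restart command as its two arguments, and the example call assumes that the httpd binary's -t (syntax test) option is available in the installed version.

```shell
#!/bin/sh
# check_and_restart CHECK_CMD RESTART_CMD
# Hypothetical helper: run CHECK_CMD and run RESTART_CMD only if the check
# succeeds. The arguments are deliberately left unquoted so that a string
# such as "/usr/sbin/httpd -t" is split into a command and its options.
check_and_restart() {
    if $1; then
        $2
    else
        echo "Configuration check failed; the server has been left alone." >&2
        return 1
    fi
}

# On the example server this might be called (as root) as:
#   check_and_restart "/usr/sbin/httpd -t" "/etc/rc.d/init.d/httpd restart"
```

Passing the two commands in as arguments keeps the helper independent of where a particular distribution puts its binary and its startup script.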
Figure 31. Configuring the service
As a webadmin, not as root!
Directory: /etc/httpd/conf/
Directory and contents are group-writable by webadmins
httpd.conf: Configuration file
srm.conf & access.conf: Obsolete & empty
Directory: /etc/logrotate.d/
apache: Controls the rotation of the log files.
File is writable by members of group webadmins.
Red Hat's packaging of Apache's configuration files echoes an obsolete format of having three distinct configuration files in the /etc/httpd/conf/ directory. In this course we will put all our configuration in the single file /etc/httpd/conf/httpd.conf and we will write this file from scratch to better learn what it all means.
The only other configuration file we will need is the log rotation file in /etc/logrotate.d/apache. We need to change this only if we change either the log files being kept or the duration they are kept for. These two reasons take on extra significance given the lunacy of the 1998 Data Protection Act. The client machine names or addresses that appear in logs and your record of what they have fetched may constitute personal data. We will return to this file in the section called “Log rotation”.
Figure 32. httpd.conf: Running the daemon
ServerType standalone
ServerRoot /etc/httpd
DocumentRoot /var/www/html
Port 80
User apache
Group apache
ServerAdmin rjd4@cam.ac.uk
ServerName www.inst.cam.ac.uk
ErrorLog /var/log/httpd/error_log
LogLevel info
Options None
Figure 33. Syntax: Running the daemon
ServerType standalone
The daemon will not rely on inetd to launch it on demand but will run permanently.
ServerRoot /etc/httpd
Any files referred to in this configuration file will either be fully qualified or resolved relative to this directory.
DocumentRoot /var/www/html
The documents to be served are found in this directory.
Port 80
This is the standard port of WWW services. It is privileged on a Unix system so must be opened by root. Once opened, the port can be passed to unprivileged services (e.g. running as user apache). Ports 8000 and 8080 are commonly used ports for completely unprivileged servers.
User apache
Group apache
The package provides a user and group specifically for the web server. These two lines tell the server to use them. The server can change its user and group ids only if it is started as root.
ServerAdmin rjd4@cam.ac.uk
Some error messages displayed to the client can contain a contact email address. This is where it is defined.
ServerName www.inst.cam.ac.uk
You may not need this line. If your machine's real name is boring.inst.cam.ac.uk but there is a DNS record pointing www.inst.cam.ac.uk to it as well then you want the server to identify itself as www.inst.cam.ac.uk. This is how you override the machine's host name.
ErrorLog /var/log/httpd/error_log
Any error messages will be logged to the file /var/log/httpd/error_log.
LogLevel info
An error in Apache comes with a severity rating. This directive specifies what the minimum level to log is.
Options None
Apache has various options, almost all of which default to ‘on’. We will turn them off so we are forced to meet them explicitly in this course.
Figure 34. Syntax: Suboptions to LogLevel
emerg
Emergencies: the system is unusable, e.g. ‘Child cannot open lock file. Exiting.’
alert
Alert: action must be taken immediately, e.g. ‘getpwuid: couldn't determine user name from uid.’
crit
Critical condition. Any different from alert? e.g. ‘socket: Failed to get a socket, exiting child’
error
Error condition: affects a single transfer, not the system as a whole, e.g. ‘Premature end of script headers’
warn
Warning, e.g. ‘child process 1234 did not exit, sending another SIGHUP’
notice
Notice: normal but significant condition, e.g. ‘caught SIGTERM, shutting down’
info
Informational messages, e.g. ‘Server seems busy, (you may need to increase StartServers, or Min/Max SpareServers).’
debug
Debugging messages, e.g. ‘Opening config file /etc/httpd/conf/httpd.conf’
Figure 35. “Pool” of daemons
Single initially launched daemon.
Runs as root
Answers no requests
Maintains a “pool” of child daemons
Pool of child daemons that do the real work.
These do the real work
Run as user apache
Answer a certain number of requests and then die
Parameters for experts only!
Figure 36. httpd.conf: Parameters for daemon pool
PidFile /var/run/httpd.pid
LockFile /var/lock/httpd.lock
ScoreBoardFile /var/run/httpd.scoreboard
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 5
MaxSpareServers 20
StartServers 8
MaxClients 150
MaxRequestsPerChild 100
Figure 37. Apache's functionality
Our server has very little functionality.
It serves all documents as ‘text/plain’.
It can only log errors.
We can add functionality as we need it.
‘Modules’
We can run a web server with just the configuration lines we have met so far. It will not be very good, to say the least. Its principal failing is that it has no concept of the MIME content types of the objects it serves and dishes everything up as MIME content type ‘text/plain’.
If we look at http://localhost/index.html we see the HTML source (because the browser has been told that the document is of type text/plain). We need to add some functionality to Apache: the ability to determine what MIME content type a document is.
Apache's functionality comes in a set of files called ‘modules’. We start by clearing the list of modules built into the system. Without this line many modules would be active by default. Partly because this is a lesson and partly because all good system administrators are control freaks (regarding the systems, not the users!) the only modules used here will be the ones we explicitly add. The mod_so.c module is built in to the Apache binary. But because we have cleared the module list it is not turned on by default. This is the module that allows us to load extra modules that are not built into the binary.
Figure 38. httpd.conf: Initialising the modules
# Start with an empty module list
ClearModuleList
AddModule mod_so.c
Figure 39. Syntax: Starting up the module system
ClearModuleList
Lose all information about modules in use.
AddModule mod_so.c
Use the mod_so.c module. Because it is built in to the binary we don't need to specify the external file the module lives in.
Figure 40. httpd.conf: Following symbolic links
Options +FollowSymLinks
The server at the moment also doesn't respect symbolic links, refusing to follow them either for pages or directories. Following symbolic links is an option under Apache and, as you will recall, we turned off all options so that we would notice them. There are two options relevant to symbolic links: FollowSymLinks and SymLinksIfOwnerMatch.
Figure 41. Syntax: Option suboptions for symbolic links
Options +FollowSymLinks
The web server will follow symbolic links.
Options +SymLinksIfOwnerMatch
The web server will follow symbolic links if the owner of the link (typically its creator) and the owner of the target of the link are the same.
The Options directive has a catch. If we give the line
Options FollowSymLinks
then this completely overrides any previous Options lines and FollowSymLinks becomes the only option in force. For this reason, we use the modifier syntax
Options +FollowSymLinks
which adds the option to the set of options in force.
Figure 42. httpd.conf: Adding support for MIME types
LoadModule mime_module modules/mod_mime.so
AddModule mod_mime.c
TypesConfig /etc/mime.types
DefaultType text/plain
AddEncoding x-compress Z
AddEncoding x-gzip gz tgz
Now we see our first use of an external module. The syntax for the process is rather obscure. This is unfortunate but nothing we can't handle.
Figure 43. Syntax: Loading an external module
LoadModule mime_module modules/mod_mime.so
This line says that the file modules/mod_mime.so (resolved relative to the ServerRoot definition at the start of the configuration file) contains a module called mime_module. This module is added to the list of modules that the server knows about. As yet the server won't use the module; it just knows where to get it should it be called upon to use it.
AddModule mod_mime.c
This line tells the server to look through all the modules it knows about (either built-in or located with LoadModule directives) looking for a module whose original source file was called mod_mime.c (stupid, but that's how they chose to do it) and activate it.
When a module is activated some commands are added to the set permitted in the configuration file. The three directives used here (‘TypesConfig’, ‘DefaultType’ and ‘AddEncoding’) are all provided by the mime_module module and would be invalid without the preceding LoadModule and AddModule lines.
Next we will consider those extra commands that the mod_mime module adds. Unless the module is loaded and added before these commands are used they will result in a syntax error.
Figure 44. mod_mime: Directives
TypesConfig /etc/mime.types
Red Hat ships with a file called /etc/mime.types (part of the mailcap package) which identifies the file name extensions used for various MIME content types on the system. This line instructs the web server to use that file to identify MIME content types of files.
DefaultType text/plain
This says that if the server cannot determine the MIME content type of the file it is about to send then it should presume text/plain.
AddEncoding x-compress Z
This declares that any file whose name ends in ‘.Z’ should be declared as having MIME encoding type ‘x-compress’ (i.e. it is compressed) and the file name without the .Z suffix should be used to determine the underlying MIME content type.
Figure 45. Some lines from /etc/mime.types
# MIME type                   Extension
application/activemessage
application/andrew-inset      ez
application/applefile
application/mac-binhex40      hqx
application/octet-stream      bin dms lha lzh exe class
application/postscript        ai eps ps
application/x-dvi             dvi
application/x-javascript      js
image/gif                     gif
image/jpeg                    jpeg jpg
image/x-xwindowdump           xwd
message/partial
message/rfc822
model/vrml                    wrl vrml
text/plain                    asc txt
text/html                     html htm
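To make the interplay of TypesConfig, DefaultType and AddEncoding concrete, here is a rough shell imitation of the decision the server makes for a file name. It is a simplified sketch and not how mod_mime is implemented: mime_lookup is a hypothetical helper, the sample file holds just a few lines in /etc/mime.types format, and only a single trailing encoding suffix is handled.

```shell
#!/bin/sh
# A few entries in /etc/mime.types format (sample data, not the real file).
types=$(mktemp)
cat > "$types" <<'EOF'
# MIME type                   Extension
application/postscript        ai eps ps
image/gif                     gif
text/plain                    asc txt
text/html                     html htm
EOF

# mime_lookup FILENAME
# Print the content type, followed by the encoding if there is one.
mime_lookup() {
    name=$1
    encoding=
    # AddEncoding: strip the encoding suffix and remember the encoding.
    case $name in
        *.Z)   encoding=x-compress; name=${name%.Z}   ;;
        *.gz)  encoding=x-gzip;     name=${name%.gz}  ;;
        *.tgz) encoding=x-gzip;     name=${name%.tgz} ;;
    esac
    # Whatever follows the last remaining dot is the extension to look up.
    case $name in
        *.*) ext=${name##*.} ;;
        *)   ext= ;;
    esac
    # TypesConfig: the first line listing the extension supplies the type.
    type=$(awk -v ext="$ext" '
        /^#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == ext) { print $1; exit } }
    ' "$types")
    # DefaultType: fall back to text/plain when nothing matched.
    echo "${type:-text/plain}${encoding:+ $encoding}"
}

mime_lookup index.html
mime_lookup paper.ps.Z
mime_lookup README
```

A name such as paper.ps.Z is reported with the type of paper.ps plus the x-compress encoding, while a name with no recognised extension falls through to the default type.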
At the moment we are only logging errors. There is an independent mechanism to log transfers and it comes as a module. Furthermore, we have no means to deal with the log files generated. This section will address the first issue and the following will address dealing with log files once we've got them.
Figure 46. httpd.conf: Logging transfers
LoadModule config_log_module modules/mod_log_config.so
AddModule mod_log_config.c
HostnameLookups On
IdentityCheck Off
CustomLog /var/log/httpd/access_log "%t %h \"%r\" %>s %B"
Figure 47. mod_log_config: Directives
CustomLog filename "format"
Log to the file with the given format. Multiple log files may be defined.
HostnameLookups On
Convert IP addresses to hostnames.
IdentityCheck On
Do an ident lookup for each incoming request.
Figure 48. mod_log_config: Logging escape sequences
%t: Time of the request
%h: Remote hostname
%r: First line of the request
%s: Status code
%B: Data bytes sent
The CustomLog directive takes two arguments. The first is the file name to log to and the second is the format of the log itself. The format line consists of a series of ‘escape sequences’ (the things starting with percentage characters). Each of these is replaced by some piece of information about the request or the server's response to it. There is no reason why you should not have more than one log file; you just have multiple CustomLog lines each defining a different log file.
The simple escape sequence is ‘%X’ for some value of ‘X’. See the figure for the most useful examples.
It is possible to log an arbitrary header from the query or response. For the server it is usually of more use to see the incoming headers. See the syntax description for some examples. Of most use in log files is the referring page. For example, you could strip out just those log lines with status code 404 (page not found) and check the referring page. If it's an internal page you can fix the link and if it's external you can contact the webmaster responsible.
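The 404 check just described might be sketched as follows. This is an illustration under stated assumptions rather than a finished tool: it assumes the log format has been extended with a quoted referring-page field, i.e. "%t %h \"%r\" %>s %B \"%{Referer}i\"", it assumes no field contains an embedded double quote, and the host names and pages in the sample data are invented.

```shell
#!/bin/sh
# Sample lines in the assumed format (invented data).
log=$(mktemp)
cat > "$log" <<'EOF'
[17/Apr/2000:10:10:25 +0100] client.example.org "GET /index.html HTTP/1.0" 200 1316 "-"
[17/Apr/2000:10:11:00 +0100] client.example.org "GET /bogus.html HTTP/1.0" 404 0 "http://www.example.org/links.html"
[17/Apr/2000:10:11:30 +0100] client.example.org "GET /old.html HTTP/1.0" 404 0 "http://www.example.org/links.html"
EOF

# Split each line on double quotes: field 2 is the request line, field 3
# holds the status code and byte count, and field 4 is the referring page.
broken=$(awk -F'"' '
    { split($3, f, " ") }
    f[1] == 404 { split($2, req, " "); print $4 " -> " req[2] }
' "$log" | sort -u)

# One line per (referring page, missing page) pair.
echo "$broken"
rm -f "$log"
```

Each output line names a page containing a stale link and the missing page it points to, which is the list you would work through to fix internal links or to contact external webmasters.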
The %h code requires the server to perform a lookup in the DNS to turn the IP address of the incoming request into a name. This is not an expensive operation, but if your web site is very heavily used you may want to avoid it. There are two ways to go about this. You can use %a instead. This just logs the IP address and attempts no lookup. Alternatively you can use %h but set the directive HostnameLookups Off. Under these circumstances %h behaves like %a. However, if you want to do access control based on client host name you must have HostnameLookups On, hence the provision of %a.
The %l escape also requires some explanation. The ident protocol provides a means for the server to ask of the client the name of the user on the client (or some tag uniquely identifying the user) who is making the connection. This is only possible if the client system is running the corresponding ident server. This server is quite common on multi-user systems and almost unknown on single-user systems. Again, the load is small for a lightly loaded web server but potentially severe for a heavily loaded one. (Far more so than for the hostname lookups.)
Finally we need to explain the ‘%>s’ construction. We will see in a later section that some modules run a page through quite intricate processing. ‘%s’ is the status code for the processing of the query and ‘%>s’ is the status code finally returned to the client. The latter is typically what we really want. The figure below lists the most commonly seen status codes. The full set can be found in RFC 2616.
Figure 49. Common status codes
- 200 OK
- 301 Moved Permanently
- 307 Temporary Redirect
- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 500 Internal Server Error
- 505 HTTP Version Not Supported
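A quick way to see which of these status codes a server is actually returning is to count them in the transfer log. The sketch below assumes the log format defined in our configuration file ("%t %h \"%r\" %>s %B"), in which the status code is always the last field but one; the sample lines stand in for /var/log/httpd/access_log.

```shell
#!/bin/sh
# Sample lines in the format "%t %h \"%r\" %>s %B".
log=$(mktemp)
cat > "$log" <<'EOF'
[17/Apr/2000:10:10:25 +0100] hostname "GET /index.html HTTP/1.0" 200 1316
[17/Apr/2000:10:11:00 +0100] hostname "GET /bogus.html HTTP/1.0" 404 0
[17/Apr/2000:10:12:00 +0100] hostname "GET http://elsewhere/index.html HTTP/1.0" 200 1316
[17/Apr/2000:10:30:23 +0100] hostname "GET /cgi-bin/phf?Qalias=x%0a/bin/cat/%20/etc/passwd HTTP/1.0" 404 0
EOF

# Count how often each status code occurs, then list the codes in order.
summary=$(awk '{ count[$(NF-1)]++ }
               END { for (s in count) print s, count[s] }' "$log" | sort)

echo "$summary"
rm -f "$log"
```

For the four sample lines this reports two requests with status 200 and two with status 404.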
Figure 50. mod_log_config: Common logging escape sequences
%a: Client's IP address.
%B: Bytes sent, excluding HTTP headers.
%f: The name of the file served.
%h: Remote hostname, or IP address if hostname lookups are off.
%l: Remote logname from identd if IdentityCheck is on.
%r: The first (typically only) line of the request.
%s: Status code of the request.
%T: Number of seconds taken to service the request.
%t: Time of the request.
%U: The URL requested.
%u: The userid used if this is a page that requires userid/password.
%{header}i: Argument of header in the incoming request.
%{header}o: Argument of header in the outgoing response.
The escape sequences can be more involved than this. Full details are in the Apache documentation.
The %{header}i logging option records the value of an incoming request header. The most commonly useful headers are given below.
Figure 51. HTTP request headers
Authorization: Access rights to restricted pages.
From: E-mail address of the user making the request. (Often blank.)
If-Modified-Since: Only send the data if necessary.
Referer: The URL of the referring page.
User-Agent: The web client. Many lie.
Figure 52. Some example log lines
[17/Apr/2000:10:10:25 +0100] hostname "GET /index.html HTTP/1.0" 200 1316
[17/Apr/2000:10:11:00 +0100] hostname "GET /bogus.html HTTP/1.0" 404 0
[17/Apr/2000:10:12:00 +0100] hostname \
    "GET http://elsewhere/index.html HTTP/1.0" 200 1316
[17/Apr/2000:10:30:23 +0100] hostname \
    "GET /cgi-bin/phf?Qalias=x%0a/bin/cat/%20/etc/passwd HTTP/1.0" 404 0
The figure has four example log lines in the format defined in our configuration file.
[17/Apr/2000:10:10:25 +0100] hostname
"GET /index.html HTTP/1.0" 200 1316
The first line shows a succesful transfer of the URL
http://
.
Note that the client need only request the local part of the URL
having determined what machine to connect to itself.machine
/index.html
[17/Apr/2000:10:11:00 +0100] hostname
"GET /bogus.html HTTP/1.0" 404 0
The second line shows an unsuccessful transfer request. The file being looked for does not exist (status code 404). Note that the logged number of bytes sent back is 0.
[17/Apr/2000:10:12:00 +0100]hostname
"GET http://elsewhere
/index.html HTTP/1.0" 200 1316
The third line is an example of someone trying to use the server as a proxy server. If a request comes in for a fully qualified URL some servers (and Apache if you configure it appropriately) will act as a web client, fetch that URL and pass it back to you. By default Apache does not do this. Instead, it ignores the http://elsewhere component and treats it as a request for the local URL /index.html. Note that this request generates a status code 200 and returns 1316 bytes, exactly the same number as in line one.
[17/Apr/2000:10:30:23 +0100] hostname "GET /cgi-bin/phf?Qalias=x%0a/bin/cat/%20/etc/passwd HTTP/1.0" 404 0
The fourth line is an example of an unsuccessful hacking attempt. The phf script has a hole permitting arbitrary shell commands to be run. Note that these would have run as the user apache, which has no special privilege, but it is still a way in.
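The four lines above can also be picked apart mechanically. What follows is only a sketch, assuming the log format is exactly the bracketed time, host name, quoted request, status code and byte count shown in the figure. The names LOG_LINE, parse_log_line and looks_suspicious are our own inventions for illustration and are not part of Apache.

```python
import re

# Pattern for the layout used in the figure:
#   [time] host "request" status bytes
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\]\s*(?P<host>\S+)\s+'
    r'"(?P<request>[^"]*)"\s+(?P<status>\d{3})\s+(?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Split one access log line into a dict, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields

def looks_suspicious(entry):
    """Flag proxy-style requests (absolute URLs) and probes for the phf script."""
    parts = entry["request"].split()
    target = parts[1] if len(parts) > 1 else ""
    return target.startswith("http://") or target.startswith("/cgi-bin/phf")
```

Run over a whole access log, a filter like this would pick out the third and fourth example lines while leaving the first two alone.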
The Data Protection Act (1998). The Data Protection Commissioner's office has advised us that machine names and IP addresses that can be used to identify an individual (e.g. that of the computer in a student's room) may constitute personal data in the meaning of the DPA(98). Until there is an expensive test case and some ignorant, senile, senior judge pronounces precedent we won't know for certain.
In this section we consider what we can do with the logs and, in particular, how to stop them growing out of control.
Figure 53. /etc/logrotate.conf
# rotate log files weekly
weekly
# keep 4 weeks worth of backlogs
rotate 4
# send errors to root
errors root
# create new (empty) log files after rotating old ones
create
# RPM packages drop log rotation information into this directory
include /etc/logrotate.d
Figure 54. /etc/logrotate.d/apache (as shipped)
/var/log/httpd/access_log /var/log/httpd/error_log {
    missingok
    sharedscripts
    postrotate
        /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
}
Red Hat Linux provides a service called ‘log rotation’ which provides a uniform mechanism to stop log files growing out of control over time. At regular intervals (nightly, weekly and monthly are all common) the log file error_log, say, is renamed to error_log.1. If there was a previously existing error_log.1 it is renamed to error_log.2, error_log.2 to error_log.3 and so on up to some limit. The default frequency of this operation is defined in the file /etc/logrotate.conf to be weekly and the number of old log files kept defaults to 4, so error_log.4 is discarded rather than renamed to error_log.5. A new error_log is created.
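The renaming scheme is easy to lose track of, so here is a small simulation of it. This is only a sketch of the bookkeeping and not how logrotate itself is written. The function name rotate is our own, and we assume that keep (standing in for the rotate 4 line) counts the numbered old copies that are retained.

```python
def rotate(existing, base="error_log", keep=4):
    """Return the set of file names present after one rotation.

    `existing` is the set of names present beforehand.  `keep` is assumed
    to be the number of numbered old logs retained.
    """
    after = set()
    # Shift each numbered log up by one.  The log numbered `keep` is not
    # renamed, which is how the oldest copy gets discarded.
    for n in range(1, keep):
        if f"{base}.{n}" in existing:
            after.add(f"{base}.{n + 1}")
    # The live log becomes the newest numbered copy.
    if base in existing:
        after.add(f"{base}.1")
    # The `create` line: a new, empty live log appears.
    after.add(base)
    return after
```

Starting from a lone error_log and calling rotate repeatedly, the set of files grows by one per rotation until the limit is reached and then stays the same size.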
The directory /etc/logrotate.d/ contains the rotation instructions specific to the log files of a particular package. The instructions for the apache package are kept in the file /etc/logrotate.d/apache. The log files are given as /var/log/httpd/access_log and /var/log/httpd/error_log, and the braces that follow them hold the instructions that apply to both. The missingok line says that a missing log file is not an error. The lines between postrotate and endscript identify a (single line) shell script that should be run after the log files have been rotated, and sharedscripts says to run it once for the pair rather than once per file. The script sends the HUP signal to the web daemon, which causes it to reopen all its log files so that it is now logging to the newly created log files rather than the .1 versions.
While this course does not consider the analog log analysis program, we will remark that the log rotation script is a good place to run it from. Each time the system rotates a log file, analog gets to process it. We might also want to address the DPA(98) issues here by insisting that the log files not be world-readable. The create line stipulates that when the log files are rotated a new, empty one is to be created which is read/write to root, read-only to members of group webadmins and not readable at all by anyone else.
Figure 55. /etc/logrotate.d/apache (as modified)
/var/log/httpd/access_log /var/log/httpd/error_log {
    missingok
    sharedscripts
    create 0640 root webadmins
    postrotate
        /bin/kill -HUP `cat /var/run/httpd.pid 2>/dev/null` 2> /dev/null || true
    endscript
}
Figure 56. Resolving a URL to a file via an alias
By default, the ‘local part’ of any URL is converted to a file name by simply resolving it as a file name relative to DocumentRoot, which is /var/www/html/ on a Red Hat Linux installation. So, for example, the URL http://server/wombat/index.html would resolve to the file /var/www/html/wombat/index.html.
However, sometimes we want a URL to point out of the DocumentRoot directory tree. For example we can see that the Red Hat Linux Apache installation puts a collection of GIF files in /var/www/icons/ which is not below /var/www/html/. We might want the URL http://server/icons/new.gif to resolve to the file /var/www/icons/new.gif which it won't by default.
We can accomplish this in two ways: either we create a symbolic link from /var/www/html/icons to /var/www/icons/ or we tell Apache to override the DocumentRoot setting in certain regards. As this is an Apache course, we will do the latter.
Figure 57. httpd.conf: Aliases in Apache configuration
# Aliases
LoadModule alias_module modules/mod_alias.so
AddModule mod_alias.c
Alias /icons/ /var/www/icons/
As before, to add functionality to Apache we need a module.
In this case it is the mod_alias
module.
This module adds a number of keywords to the configuration syntax
but we need only one for now. In the slide the
Alias directive maps a set of URLs with local
parts starting /icons/
to the directory
/var/www/icons/
.
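To make the mapping concrete, here is a rough model of how the local part of a URL might be turned into a file name once an alias is in play. It is a sketch only: the real module does rather more, and WEB_ROOT, ALIASES and url_path_to_file are names made up for this illustration. WEB_ROOT stands for the /var/www/html/ directory that URLs resolve against by default.

```python
import posixpath

WEB_ROOT = "/var/www/html"                     # default tree for local parts
ALIASES = [("/icons/", "/var/www/icons/")]     # mirrors the Alias line above

def url_path_to_file(path, aliases=ALIASES, root=WEB_ROOT):
    """Map the local part of a URL to a file name, trying aliases first."""
    for prefix, target in aliases:
        if path.startswith(prefix):
            # Swap the aliased prefix for its target directory.
            return posixpath.join(target, path[len(prefix):])
    # No alias matched: resolve relative to the default tree.
    return posixpath.join(root, path.lstrip("/"))
```

With these settings /wombat/index.html still lands under /var/www/html/ while /icons/new.gif is diverted to /var/www/icons/.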
Figure 58. Access log: Failing to read a directory
[27/Apr/2000:15:47:11 +0100] hostname "GET /index.html HTTP/1.0" 200 2537
[27/Apr/2000:15:48:09 +0100] hostname "GET / HTTP/1.0" 404 0
http://server/index.html works
http://server/ doesn't
At the moment, while our web server can handle files, determine their MIME content and encoding types from their names' extensions and log their transfer, it still can't handle URLs that resolve to directories. Attempts to get such a URL (e.g. the top level URL for the site as a whole) give 404 errors. This is clearly unacceptable.
There are two ways to handle this and most sites implement both.
The first is to provide automatic indexing. Given a URL corresponding to a directory, the web server will create an HTML web page giving a list of all the entries in that directory. These can be annotated with icons (or their ALT text) to identify the corresponding MIME content types. They can be labelled with sizes, titles etc. or left completely plain. We will start with the basic functionality (and the relevant module) and slowly add in some flashier functions.
The other approach is to nominate one or more filenames so
that if such a file exists within a directory then that file will
be displayed instead. The name index.html
is
traditional for this, but is not compulsory.
Figure 59. httpd.conf: Module for automatic indexing
# Automatic indexing of directory URLs
LoadModule autoindex_module modules/mod_autoindex.so
AddModule mod_autoindex.c
Options +Indexes
Figure 60. Browser's view of automatic indexing
Index of /
* Parent Directory
* index.html
* poweredby.png
If we simply add the automatic indexing module and enable automatic indexing with an Options directive then we see lists of contents for directory URLs (including index.html). Notice that the three links shown are one directory, one HTML file and a graphic in PNG format but there is no indication of the MIME content type in the page shown. Each entry is simply preceded by a bullet.
Figure 61. httpd.conf
: ‘Fancy’ indexing
IndexOptions +FancyIndexing
Figure 62. Browser's view of fancy indexing
Index of /
Name                   Last modified       Size  Description
__________________________________________________________________
Parent Directory       25-Apr-2000 14:00      -
index.html             25-Apr-2000 18:08     2k
poweredby.png          01-Mar-2000 18:37     1k
__________________________________________________________________
Figure 63. httpd.conf
: Fancy indexing options
IndexOptions +SuppressLastModified +ScanHTMLTitles
Figure 64. Browser's view of fancy indexing options
Index of /
Name                   Size  Description
__________________________________________________________________
Parent Directory          -
index.html               2k  Test Page for the Apache Web Server on Re>
poweredby.png            1k
__________________________________________________________________
The mod_autoindex module adds a large number of directives to the allowed set. We'll start with just IndexOptions. This allows us to modify the displayed format. Almost always it is passed the FancyIndexing suboption which turns on the “long form” listing seen in the figure. In conjunction with this are a number of other options to modify this long form of the output, as shown in the figure. Figure 69 below lists the more useful options to IndexOptions.
Figure 65. httpd.conf: Adding icons to the fancy listing
IndexOptions IconWidth IconHeight
AddIconByType (HTM,/icons/layout.gif) text/html
AddIconByType (TXT,/icons/text.gif) text/*
AddIconByType (IMG,/icons/image2.gif) image/*
AddIconByType (MOD,/icons/world2.gif) model/*
AddIconByType (SND,/icons/sound2.gif) audio/*
AddIconByType (VID,/icons/movie.gif) video/*
We can very usefully augment the automatic listings by adding icons (or the corresponding alternative text) to the lines of output depending on the MIME content types of the files. The directive AddIconByType is provided for this purpose. Its first argument is a pair: the ALT text and the icon. Its second argument is the MIME content type or types it should be used for. Note that wild cards can be used for the MIME content subtype.
Whenever an image is included in a page it should have its WIDTH and HEIGHT parameters explicitly specified, but Apache doesn't have the facility to parse the image files it serves to determine these numbers automatically, so a compromise is made. All the icons shipped with Apache are the same size. The IndexOptions parameters IconHeight and IconWidth instruct Apache to include these values (which are wired in to the module's source). All the Apache icons have width 20 pixels and height 22 pixels. If you choose to replace the icons you are strongly recommended to make them all the same size and to use the line IndexOptions IconWidth=X IconHeight=Y in the httpd.conf file to supply their values.
In this example I use one icon for HTML pages (by far the most common, we might expect) and another icon for all the other text subtypes. If the distribution of your MIME content types is different you might choose a different strategy. One place where this might make sense is with the application subtypes, where lumping them all together as “application content types” is not particularly useful.
Figure 66. httpd.conf: Application subtypes
AddIconByType (_PS,/icons/a.gif) application/postscript
AddIconByType (PDF,/icons/a.gif) application/pdf
AddIconByType (HQX,/icons/binhex.gif) application/mac-binhex40
AddIconByType (DVI,/icons/dvi.gif) application/x-dvi
AddIconByType (TEX,/icons/tex.gif) application/x-tex
AddIconByType (TAR,/icons/tar.gif) application/x-tar
AddIconByType (BIN,/icons/binary.gif) application/octet-stream
AddIconByType (XXX,/icons/unknown.gif) application/*
There is a vast array of application subtypes. Every
application-specific data type can claim one using the
“x-
” extension subtypes.
The mainstream applications have applied for “real”
application subtypes. The application types you have on your
website should be represented by useful icons (there are plenty)
and the default (unknown.gif
in our case)
should only be used very rarely. The image in file
/var/www/icons/icon.sheet.gif
shows all of
them in a single picture.
Figure 67. httpd.conf: Directories
AddIcon (_UP,/icons/back.gif) ..
AddIcon (DIR,/icons/folder.gif) ^^DIRECTORY^^
AddIcon (---,/icons/blank.gif) ^^BLANKICON^^
Directories don't have MIME types so we need to explicitly add an icon for these. To do this, we use AddIcon which associates icons with items either by name or by special controls. For example, we can match on the name “..” to provide an icon for the reference to the parent directory. There are also some special controls, written “^^DIRECTORY^^” and “^^BLANKICON^^”, which match directories and places where no icon would be used (to get the formatting right).
Figure 68. Browser's view of a fully labelled web page
Index of /
      Name                   Size  Description
__________________________________________________________________________
[_UP] Parent Directory          -
[HTM] index.html               2k  Test Page for the Apache Web Server on R e>
[DIR] manual/                   -
[IMG] poweredby.png            1k
__________________________________________________________________________
On the subject of formatting, we need to point out a few problems. Because Apache generates PRE formatted pages rather than tables it is important that all the icons be the same size and that all the ALT text be the same length (traditionally three characters). It doesn't appear possible to put spaces in the ALT text so I tend to use underscores for spaces and three dashes for the blank icon (because it precedes a horizontal rule which in text browsers is written with a row of hyphens).
It is possible to modify the widths of the displayed columns. The IndexOptions directive has suboptions NameWidth=x and DescriptionWidth=y. The variables x and y can be either an explicit number of characters or an asterisk. In the latter case the name column is made as wide as its widest element and the description column is sized to make the whole thing 79 columns wide.
Figure 69. mod_autoindex
: IndexOptions suboptions
FancyIndexing
: Turns on the “long” format.
ScanHTMLTitles
: Display the HTML title of web pages as their description. This can be intensive on the disc.
SuppressDescription
: Turn off the description column altogether.
SuppressLastModified
: Turn off the column for the last modification date and time.
SuppressSize
: Turn off the column for the size of documents.
IconWidth[=X]
: Specify the width of all the icons in pixels (defaults to 20).
IconHeight[=Y]
: Specify the height of all the icons in pixels (defaults to 22).
NameWidth=X
: Width in characters of the file name column. An asterisk means “as wide as the widest element”.
DescriptionWidth=Y
: Width in characters of the “description” or “title scan” column. An asterisk means that the whole row should be 79 characters wide.
Figure 70. httpd.conf: Headers and footers
HeaderName HEADER.html
ReadmeName README.html
Figure 71. Browser's view of headers and footers
This is some text to go at the top of the page above the listing.
      Name                   Size  Description
__________________________________________________________________________
[_UP] Parent Directory          -
[HTM] HEADER.html              1k
[HTM] README.html              1k
[HTM] index.html               2k  Test Page for the Apache Web Server on R e>
[DIR] manual/                   -
[IMG] poweredby.png            1k
__________________________________________________________________________
In addition to customising the listing itself, we can also
append information to the top and bottom of the listing. The
mod_autoindex
module provides two
directives HeaderName and
ReadmeName for this purpose. The
HeaderName directive specifies the name of a
file whose contents are placed above the listing and the
ReadmeName a file whose contents go beneath
it.
The files named must have a text MIME content type. If it is text/html then they are included directly into the generated HTML directory listing. If it is text/plain then they are included within a PRE block.
Note that the text above the listing replaces the original text “Index of /”. Also note that the HEADER.html and README.html files appear in the listing. The last directive from the mod_autoindex module we will consider, IndexIgnore, deals with this. It takes a number of shell-style wildcard expressions following it. Files that match one or more of these expressions are not listed in the index.
Figure 72. httpd.conf
: Suppressing files from the listing
IndexIgnore .??* *~ *# HEADER* README* SCCS RCS CVS
Figure 73. httpd.conf: Default files
# Default files in directory URLs
LoadModule dir_module modules/mod_dir.so
AddModule mod_dir.c
DirectoryIndex index.html index.htm
The other approach to dealing with directory URLs is to
define a filename such that if that file appears within the
directory it is displayed instead of the directory itself. The
mod_dir
module provides precisely this
functionality.
It provides the DirectoryIndex directive which gives a list of names which should be tried. Note that it can take an absolute local path. In the example given, if a directory URL is requested then its index.html file is used if it exists. If it doesn't exist then index.htm is used if that exists. Finally, if neither exists and the mod_autoindex module is loaded then the directory listing is given. If the module is not loaded then a 404 “file not found” error is given.
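The order of fallbacks just described can be written down as a few lines of logic. This is a sketch of the decision only and not of Apache's actual code. The function name resolve_directory and its arguments are hypothetical, with autoindex standing for "the mod_autoindex module is loaded and indexing is enabled".

```python
def resolve_directory(entries, index_names=("index.html", "index.htm"), autoindex=True):
    """Decide what to send back for a directory URL.

    `entries` is the collection of file names found in the directory.
    Returns a (status code, what is sent) pair.
    """
    # Try each DirectoryIndex name in the order it was listed.
    for name in index_names:
        if name in entries:
            return 200, name
    # No default file: fall back to a generated listing if we can.
    if autoindex:
        return 200, "generated listing"
    return 404, "file not found"
```

Note that the order of the names matters: with both files present, the first one listed is the one used.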
If you use both the mod_autoindex
and mod_dir
modules then in the
configuration file, mod_autoindex
must precede
mod_dir
. If they are placed in the
other order then the mod_dir
is
ignored. The author has no idea why this is and assumes it is a
bug.
We saw in the logging section that HTTP (the transfer protocol, not the language of the web pages) has the concept of status codes, with 200 being the ‘OK’ response and 404 being the ‘file not found’ response. From time to time, we may want to force the generation of a particular error message or status code. There are two ways to go about doing this.
The core Apache system has a directive called ErrorDocument. This lets us specify exactly what page will be sent back to accompany a 404, say, status code.
Figure 74. httpd.conf: Setting the 404 error document
ErrorDocument 404 /errors/404.html
ErrorDocument 500 "Oops, server goof."
Figure 75. Syntax: Specifying error messages
ErrorDocument nnn "text"
: If the server generates status code nnn then a text/plain page will be returned with that status code and text as the text.
ErrorDocument nnn URL
: If the server generates status code nnn then the local web page at URL will be returned along with status code nnn.
This depends on the server generating a specific status
code. You will recall that status code 403 corresponds to
‘forbidden’. We might want to indicate that trying
to fetch a particular URL was expressly forbidden rather than just
not present. For example, given a directory URL, we might want to
display an index.html
file if one exists but
give a 403 status code if one does not. So we need a way to
generate pages with status codes of other than 200. We could
do this just by turning off or on the indexing option but the
mechanisms described here provide more flexibility.
This functionality is provided by a module called
mod_asis
. This lets us provide web pages
that aren't HTML or any other MIME type but which are the entire
HTTP response to a query. This allows us to add status codes and
other HTTP metadata beyond just the HTML content.
First let's see what a full HTTP session looks like.
Figure 76. Faking a browser with telnet
$ telnet draig.csi.cam.ac.uk 80
Trying 131.111.10.224...
Connected to draig.csi.cam.ac.uk.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 16 May 2000 08:54:29 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
Last-Modified: Tue, 25 Apr 2000 17:08:10 GMT
ETag: "f242-9e9-3905d0fa"
Content-Length: 2537
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
...
</BODY>
</HTML>
Figure 77. HTTP response headers
HTTP/1.1 200 OK
: The HTTP protocol version number (our query was version 1.0 but the server is entitled to reply with version 1.1), followed by the status code and a text explanation of the status code.
Date
: The timestamp of the response.
Server
: A description of the responding server.
Last-Modified
: When the page was last modified.
ETag
: ‘Entity tag’: a key used to uniquely identify this version of the page for caches etc.
Content-Length
: Number of bytes in the body of the response. (i.e. the HTML page, but not the HTTP headers.)
Connection
: Whether the TCP connection should be kept open after this transfer to allow further requests.
Content-Type
: The MIME content type of the following document.
Blank line
: The separator between the headers and the body of the web page.
So, if we are going to generate a status code 403, say, then we will need to create that first line and perhaps some others. The module will assist us with many of them, though.
The module works as follows: we create a new, fake MIME
content type called httpd/send-as-is
and
associate it with files ending with one or more suffixes
(.asis
, traditionally). The module then
causes the server to process these files as nearly raw HTTP rather
than as HTML or some other MIME content type. Because
httpd/send-as-is
is not a true MIME type,
we don't want to define it in the
/etc/mime.types
file, so we use the
AddType directive of the
mod_mime
module to define it purely
within the web server. This gives us a module dependency: the
mod_asis
module cannot be used without
the mod_mime
module already being
added.
Figure 78. Adding the mod_asis module
# Send .asis files "as is"
AddType httpd/send-as-is asis
LoadModule asis_module modules/mod_asis.so
AddModule mod_asis.c
Now, if we wanted to provide for forbidding directory indexing in certain directories as opposed to providing an index.html file, we could provide the DirectoryIndex line
DirectoryIndex index.html index.asis
Then, if a user creates an index.html file it is treated as usual. If there is no index.html file but there is an index.asis file it is used and sent ‘as is’. If there is neither then the directory is autoindexed.
Let's now look at constructing a plausible index.asis file.
Figure 79. A plausible index.asis file
Status: 403 Directory searching is prohibited
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
 "http://www.w3.org/TR/REC-html40/strict.dtd">
<HTML><HEAD>
<TITLE>Security policy violation</TITLE>
</HEAD><BODY>
<H1>Security policy violation</H1>
<P>This web site's security policy prohibits the autoindexing of
this directory. Your request has been logged.</P>
</BODY></HTML>
A more useful page would give links to a search engine and suchlike. More importantly, observe the headers at the start of the page, split from the body by the first blank line of the page. (The line is truly empty; there are no spaces or other whitespace characters in it.) The Status: header introduces the status code and the explanatory text message. We don't get to specify the HTTP version being spoken; the server will take care of that for us. Any following lines (before the blank line) that look like HTTP headers will be passed through untouched and must be valid HTTP header lines. The server will add the Server:, Date: and Connection: lines and we should not write these.
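The layout of an .asis file (a Status: line, other headers, a truly empty line, then the body) can be checked with a few lines of code before the file is put in place. The sketch below is our own illustration and not the module's implementation. The name parse_asis is hypothetical, and we assume for the sake of the example that a file with no Status: line would be treated as a plain 200 response.

```python
def parse_asis(text):
    """Split the text of an .asis file into (status, reason, headers, body)."""
    # The first truly empty line separates the headers from the body.
    head, _, body = text.partition("\n\n")
    status, reason, headers = 200, "OK", []   # assumed default when no Status: line
    for line in head.splitlines():
        name, _, value = line.partition(":")
        value = value.strip()
        if name.lower() == "status":
            # "403 Directory searching is prohibited" -> code and message
            code, _, reason = value.partition(" ")
            status = int(code)
        else:
            headers.append((name, value))
    return status, reason, headers, body
```

Feeding it the file from the figure should give status 403, the explanatory message, a single Content-Type header and the HTML as the body.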
Figure 80. Faking a browser with telnet again
$ telnet draig.csi.cam.ac.uk 80
Trying 131.111.10.224...
Connected to draig.csi.cam.ac.uk.
Escape character is '^]'.
GET /two/ HTTP/1.0

HTTP/1.1 403 Directory searching is prohibited
Date: Tue, 16 May 2000 11:30:40 GMT
Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
 "http://www.w3.org/TR/REC-html40/strict.dtd">
<HTML><HEAD>
<TITLE>Security policy violation</TITLE>
</HEAD><BODY>
<H1>Security policy violation</H1>
<P>This web site's security policy prohibits the autoindexing of
this directory. Your request has been logged.</P>
</BODY></HTML>
Connection closed by foreign host.
If we inspect the access log file we will see the 403 lines there too.
[16/May/2000:12:06:30 +0100] hydra.csi.cam.ac.uk "GET /two/ HTTP/1.0" 403 345
This is where we get to see the difference between the
logging code ‘%s
’
and ‘%>s
’. The
former would log a status code of 200 because the
.asis
file was processed correctly. The
latter shows 403 because that is the ultimate status code after
all the internal reprocessing is complete.
Figure 81. httpd.conf: User directories
# Users' web pages
LoadModule userdir_module modules/mod_userdir.so
AddModule mod_userdir.c
UserDir public_html
Apache contains a mechanism (read module) that allows users to supply their own web pages. It is not uncommon for a web server to offer nothing but these and to have no ‘central’ web pages at all (except perhaps for a top level index.html file). The mod_userdir module provides a single directive, UserDir. It can be used in a number of ways, however.
Figure 82. user_dir: Remapping http://server/~user/index.html
UserDir public_html
: Maps URL to ~user/public_html/index.html.
UserDir /home/userpages
: Maps URL to /home/userpages/user/index.html.
UserDir /home/*/webstuff
: Maps URL to /home/user/webstuff/index.html.
UserDir http://other/home/userpages
: Maps URL to http://other/home/userpages/user/index.html.
UserDir http://other/*/webstuff
: Maps URL to http://other/user/webstuff/index.html.
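The five forms in the figure follow one pattern, which the sketch below tries to capture. It is an illustration of the mapping rules as listed and not the module's own code. The function map_user_url is a made-up name, and home_root is a stand-in for looking up the user's real home directory, which we simply assume to be /home/user.

```python
def map_user_url(userdir, user, rest, home_root="/home"):
    """Map http://server/~user/rest according to a UserDir argument."""
    if "*" in userdir:
        # An asterisk marks where the user name is substituted.
        base = userdir.replace("*", user, 1)
    elif userdir.startswith("/") or userdir.startswith("http://"):
        # An absolute path or URL has the user name appended to it.
        base = f"{userdir.rstrip('/')}/{user}"
    else:
        # A bare name is a subdirectory of the user's home directory.
        base = f"{home_root}/{user}/{userdir}"
    return f"{base}/{rest}"
```

For a user fred, the argument public_html gives /home/fred/public_html/index.html under this assumption about home directories, and the other four forms give the results listed in the figure with fred in place of user.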
So far, our editing of the file
httpd.conf
has set parameters for the entire
server. On occasion it is appropriate to have one set of
parameters for one set of web pages and another for other parts.
We need some way to pass directives applicable just to a certain
set of pages. There are a number of ways to describe subsets of
pages and Apache supports them all. We will restrict ourselves to
just the simplest in this course.
The simplest is by considering subtrees of the web pages.
We can tag a directory with some special options and have those
apply to every web page beneath it. For example we might want to
restrict access to everything under
/var/www/html/restricted/
.
We might want to tag multiple directory trees by applying
these overrides to every directory that matches a regular
expression, rather than by specifying its explicit name. For
example, anything under any directory called
restricted
might get special rules.
We might just specify a regular expression that matches files and apply the rules to any files (as opposed to directories) that match. So any web page called special.html might get nonstandard rules.
Alternatively, rather than specify the restriction by file name (after the URL has been resolved) we might change the rules according to the URL quoted before this gets mapped onto a file (or directory) name. Again this could be a subtree of URLs or the set of URLs that match a regular expression.
Using the directory structure to control options also
permits the placing of special files in the directory structure to
control the trees beneath them (traditionally called
.htaccess
). These control files, in turn,
might benefit from the filename matching rules to stop their being
fetched by clients.
While these all make sense in isolation, the combination of rules governing directory trees, URLs, filename regular expressions and URL regular expressions is a recipe for trouble. We are going to approach this issue from the KISS (‘Keep It Simple, Stupid!’) standpoint and restrict ourselves to directory subtrees here.
Figure 83. A simple restriction example
By default:
index.html files to be respected.
Automatic indexing permitted.
Under /var/www/html/fubar/:
index.html files to be respected.
Automatic indexing forbidden.
Our configuration file will run with DirectoryIndex index.html but we need Options +Indexes for the default case and Options -Indexes for the /fubar/ subdirectory. The next element of the configuration file we will examine provides precisely this functionality.
Figure 84. httpd.conf: Restricting options to subdirectories
# Default
Options +Indexes
# Subdirectory restriction
<Directory /var/www/html/fubar/>
Options -Indexes
</Directory>
The <Directory> tag limits the
application of parts of the configuration to just those files and
directories beneath
/var/www/html/fubar/
.
We start to see a problem here, though. Inevitably, the
directory structure will get larger and larger. The set of
overrides and rules will get longer and longer. More and more
people will need access to the httpd.conf
file. More and more lines will get added to it. This is bad.
What is needed is a way to delegate the controls over a directory
tree to the directory itself. This facility exists, using control
files in some directories, traditionally called
.htaccess
. The file can, however, have any
name we choose to give it. However, before we start delegating
control we might want to restrict just what configurations in the
httpd.conf
file we are prepared to have
overridden in the delegated control files.
Figure 85. httpd.conf: Delegation of (some) control
AccessFileName .config
<Directory /var/www/html>
AllowOverride AuthConfig FileInfo Indexes
</Directory>
The delegated control file was originally used to control access to subtrees of web pages (and we'll see how to do this soon) and the name of the directive that sets it (AccessFileName) reflects that history. It is a more general overriding facility, though, so to reinforce that, we'll use the AccessFileName directive to set the name of the delegated configuration file to be .config.
The AllowOverride line specifies what facets of the Apache configuration can be overridden in the .config files. This aspect of the Apache control mechanism is not as refined as it might be, unfortunately. Any directive that appears in the httpd.conf file and which ‘makes sense’ applied to a directory tree (more precisely, any directive that could appear in a <Directory>...</Directory> block) can be placed in this subconfiguration file.
Figure 86. Core functionality: Delegation of (some) control
AccessFileName fname
Within the document tree a file called fname will override the default behaviour with the behaviour specified within (insofar as is permitted).
AllowOverride suboptions
This directive specifies exactly what aspects of the configuration may and may not be overridden in the files named by the AccessFileName directive.
Figure 87. Core functionality: AllowOverride suboptions
AuthConfig
Control the mechanisms used for authenticating users for access to restricted documents. See the section on access control for more on this option.
FileInfo
This permits the use of the directives found in the MIME module to change or add MIME types.
Indexes
This permits the use of the directives found in the two directory modules.
Options
Allow the use of the Options directive in the delegated control files.
All
Permit all overrides.
None
Permit no overrides. Ignore the delegated control files.
Now let's return to the case study in the slide. We will drop the subdirectory directives entirely and instead specify what overrides we will permit and in what file. We then have a .config file that changes the options again.
Figure 88. httpd.conf: Restricting options to subdirectories
# Default
Options +Indexes
AccessFileName .config
<Directory /var/www/html>
AllowOverride Options
</Directory>
Figure 89. /var/www/html/fubar/.config
contents
Options -Indexes
Now we start to see Apache creak at the seams. Note that to change the nature of indexes (using the IndexOptions directive) we would need to allow the override Indexes. However, because turning automatic indexing on or off (Options +Indexes or Options -Indexes) is handled by the Options directive we have to permit the Options override. This is unfortunate because there are other suboptions to Options that we might not want to delegate. Mercifully, indexing is the exception rather than the rule in this regard. In most other cases the controls over delegation do make sense.
The next question to address is how these files nest. If I
have a default state of Options +Indexes
and a file /var/www/html/fubar/.config
containing Options -Indexes, what happens
if I have a file
/var/www/html/fubar/snafu/.config
with
Options +Indexes? As you might expect the
fubar/snafu/.config
overrides the
fubar/.config
file for the contents of
fubar/snafu/
.
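The nesting rule can be modelled by walking down from the top of the document tree and letting each delegated file adjust the setting in turn, so that the deepest one has the last word. The sketch below does just that for the Indexes option alone. It is a toy model and not Apache's merging code: indexing_enabled is a hypothetical name, and the configs dictionary stands in for the .config files actually found on disc.

```python
import posixpath

def indexing_enabled(directory, configs, default=True, root="/var/www/html"):
    """Work out whether automatic indexing applies to `directory`.

    `configs` maps a directory path to "+Indexes" or "-Indexes", standing in
    for the Options line of a .config file found in that directory.
    """
    enabled = default
    rel = posixpath.relpath(directory, root)
    parts = [] if rel == "." else rel.split("/")
    current = root
    # Visit the root first, then each directory on the way down.
    for part in [None] + parts:
        if part is not None:
            current = posixpath.join(current, part)
        setting = configs.get(current)
        if setting == "+Indexes":
            enabled = True
        elif setting == "-Indexes":
            enabled = False
    return enabled
```

With -Indexes in fubar/ and +Indexes in fubar/snafu/, a sibling such as fubar/other/ inherits the restriction from fubar/ while fubar/snafu/ gets its indexing back.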
There are basically two ways to restrict access to web pages: by the client's IP address or by making the client quote a userid and password. For the time being we will control the entire web site. We can then use the previous section to control just subsets of the site. In this section we will restrict by IP address and in the following section we will describe a superior system.
So, first of all, a warning about IP access restrictions: web proxies can really spoil your day. A web proxy is a system that forwards web requests on to another server. So if www.inst.cam.ac.uk restricts access to clients inside cam.ac.uk it is vulnerable to proxies within cam.ac.uk. If randompc.example.com makes a direct request it will be rejected. However, if randompc.example.com makes a request of a web proxy, proxy.college.cam.ac.uk, then the proxy forwards the request to www.inst.cam.ac.uk. The latter sees a query coming from within cam.ac.uk and honours it. The proxy then forwards the answer back to randompc.example.com. The moral of this tale is to use client address restriction only when you restrict to a set of machines you control (enough to restrict proxies on them). Don't use it blindly.
To give an example of how hard this is, the Computing Service discovered that it was running an unintended proxy, which allowed the CS minutes (restricted to the CS internal network by IP address) to be read from any machine in the world by anyone who knew about our proxy. The CS friendly probing suite now probes for web proxies so you won't be surprised by yours the way we were by ours.
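The proxy problem comes down to the server only ever seeing the machine that opened the connection. A minimal Python sketch of that reasoning follows; the function names are invented for the illustration, and proxy.college.cam.ac.uk is a placeholder name rather than a real proxy.

```python
RESTRICTED_TO = ".cam.ac.uk"   # the domain the server restricts access to

def server_allows(connecting_host):
    """The server can only judge the host that opened the connection."""
    return connecting_host.endswith(RESTRICTED_TO)

def request(client, proxy=None):
    """Return True if the page is served to `client`."""
    # With a proxy in the way, the server sees the proxy, not the client.
    connecting_host = proxy if proxy is not None else client
    return server_allows(connecting_host)

print(request("randompc.example.com"))                                   # False
print(request("randompc.example.com", proxy="proxy.college.cam.ac.uk"))  # True
```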
And now, we will demonstrate how to add client address access restrictions in the Apache configuration file using the mod_access module.
Figure 90. httpd.conf: Access restrictions
    # Access control by IP address
    LoadModule access_module modules/mod_access.so
    AddModule mod_access.c
    order deny,allow
    allow from .csi.cam.ac.uk
    deny from all
    allow from .csx.cam.ac.uk
The order line is read first. The ‘deny,allow’ argument specifies that initially all requests will be honoured, then all the deny lines will be applied, and then all the allow lines will be applied, regardless of the order in which the lines appear.
In the example given, access is permitted to the site from clients in the two domains csi.cam.ac.uk and csx.cam.ac.uk but no others. Note the use of the leading dot to indicate that, for example, .csx.cam.ac.uk is a domain and not a hostname. Also note that for access control by domain name to work you need to have HostnameLookups set to On.
Figure 91. Request from randompc.example.com
Initial state: Access allowed
deny from all: Access denied
allow from .csi.cam.ac.uk: Inapplicable, no change
allow from .csx.cam.ac.uk: Inapplicable, no change
Final state: Access denied
Figure 92. Request from ghoul.csi.cam.ac.uk
Initial state: Access allowed
deny from all: Access denied
allow from .csi.cam.ac.uk: Applicable, access allowed
allow from .csx.cam.ac.uk: Inapplicable, no change
Final state: Access allowed
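The two walk-throughs above can be reproduced with a short Python sketch. It models only what the text describes (an initial state followed by two passes over the deny and allow lines) and handles host names only. The function names matches and access_allowed are invented for the example and are not part of Apache.

```python
def matches(pattern, client_host):
    """Match 'all', an exact host name, or a '.domain' suffix."""
    if pattern == "all":
        return True
    if pattern.startswith("."):
        return client_host.endswith(pattern)
    return client_host == pattern

def access_allowed(order, allow, deny, client_host):
    """Evaluate allow/deny lists in the sequence given by `order`."""
    if order == "deny,allow":
        allowed = True                       # initial state: allowed
        passes = [(deny, False), (allow, True)]
    elif order == "allow,deny":
        allowed = False                      # initial state: denied
        passes = [(allow, True), (deny, False)]
    else:
        raise ValueError("unsupported order: " + order)
    for patterns, outcome in passes:
        if any(matches(p, client_host) for p in patterns):
            allowed = outcome
    return allowed

ALLOW = [".csi.cam.ac.uk", ".csx.cam.ac.uk"]
DENY = ["all"]

print(access_allowed("deny,allow", ALLOW, DENY, "randompc.example.com"))  # False
print(access_allowed("deny,allow", ALLOW, DENY, "ghoul.csi.cam.ac.uk"))   # True
```

Because the lists are applied in passes, the position of deny from all among the allow lines in the configuration file makes no difference to the result.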
Figure 93. mod_access: “allow” directives
order deny,allow
Initially all access allowed, then apply all deny lines, then apply all allow lines.
order allow,deny
Initially all access denied, then apply all allow lines, then apply all deny lines.
allow from all
All requests are allowed.
allow from host.inst.cam.ac.uk
Requests from the host are allowed. Requires HostnameLookups On.
allow from .inst.cam.ac.uk
Requests from hosts within the domain are allowed. Requires HostnameLookups On.
allow from 131.111.11.84
Requests from the host are permitted.
allow from 131.111.11.0/255.255.255.0
Requests from any IP address starting 131.111.11. are allowed.
allow from 131.111.11.0/24
Requests from any IP address starting 131.111.11. are allowed. (The first three numbers correspond to the first 24 bits of the IP address quoted.)
Figure 94. mod_access: “deny” directives
deny from ...
As per allow from ...
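The netmask and prefix-length notations describe the same set of addresses. As a quick check, here is a sketch using Python's standard ipaddress module. The helper in_network is invented for the example, and 131.111.12.1 is simply an arbitrary address outside the network.

```python
import ipaddress

# The two notations from the figure describe the same network.
by_mask = ipaddress.ip_network("131.111.11.0/255.255.255.0")
by_prefix = ipaddress.ip_network("131.111.11.0/24")
print(by_mask == by_prefix)          # True

def in_network(client_ip, network=by_prefix):
    """Return True if client_ip falls inside the network."""
    return ipaddress.ip_address(client_ip) in network

print(in_network("131.111.11.84"))   # True
print(in_network("131.111.12.1"))    # False
```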
As said before, the author advises very strongly against restricting access to .cam.ac.uk. You may be able to get away with restrictions to inst.cam.ac.uk if you rule your institution with an iron fist. Often it is far more useful to require authorised users to authenticate themselves against the server. The mechanisms for this are split over a variety of modules and core Apache functionality depending on how you want to run the authentication. We will take the simplest approach here and authenticate against a text password file. Equivalent modules exist for authenticating against more complex databases. This becomes necessary if the database gets too big and linear text file searching too slow.
Figure 95. httpd.conf: Restricting access to authenticated users
    LoadModule auth_module modules/mod_auth.so
    AddModule mod_auth.c
    <Directory /var/www/html/restricted>
        AuthType Basic
        AuthName wombat
        AuthUserFile /etc/httpd/conf/passwd
        require valid-user
    </Directory>
Figure 96. Creating an Apache password file
    $ touch /etc/httpd/conf/passwd
    $ ls -l /etc/httpd/conf/passwd
    -rw-rw-r-- 1 root webadmin 0 Jun 1 10:12 passwd
    $ htpasswd /etc/httpd/conf/passwd demouser
    New password: dem0user
    Re-type new password: dem0user
    Adding password for user demouser
First let's consider what we've done to the httpd.conf file. We have included a module, mod_auth, whose function is to permit checking IDs against a plain text password file. This module provides us with the AuthUserFile directive, which specifies the location of that password file. The AuthName and AuthType directives belong to the core Apache functionality because they are independent of the supporting database. The AuthType directive specifies the mechanism that is going to be used to transmit the ID and password. If we are going to use the mod_auth module we must specify Basic as the authentication type because this is the only one widely understood. This sends passwords unencrypted over HTTP.
Basic authentication is best illustrated by using telnet as our web client again.
Figure 97. Basic authentication uncovered—1
    $ telnet hydra.csi.cam.ac.uk 80
    Trying 131.111.11.148...
    Connected to hydra.csi.cam.ac.uk.
    Escape character is '^]'.
    GET /restricted/ HTTP/1.0

    HTTP/1.1 401 Authorization Required
    Date: Thu, 01 Jun 2000 10:29:37 GMT
    Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
    WWW-Authenticate: Basic realm="wombat"
    Connection: close
    Content-Type: text/html; charset=iso-8859-1

    ...
    Connection closed by foreign host.
So our attempt to get the /restricted/ URL fails with a status code 401 ‘Authorization Required’. Note the HTTP header line
    WWW-Authenticate: Basic realm="wombat"
On receipt of this status code and header line a sensible browser will prompt the user for an ID and password for the server, quoting the realm ‘wombat’. The concept of realms allows us to split the web site into more than one distinctly controlled area. For one directory tree (/var/www/html/restricted/) we can demand IDs and passwords for one realm (wombat) and for another tree we can demand a different set of IDs and passwords.
The browser will then send back the same request as before but this time quoting the ID and password given, Base64 encoded. (The Base64 encoding of ‘demouser:dem0user’ is ‘ZGVtb3VzZXI6ZGVtMHVzZXI=’.)
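The encoding quoted above is easy to check with Python's standard base64 module. The helper name basic_authorization is invented for this sketch. The decoding step shows that Base64 is an encoding rather than encryption: anyone who sees the header can recover the password.

```python
import base64

def basic_authorization(user_id, password):
    """Build the value of the Authorization header for Basic authentication."""
    token = base64.b64encode(f"{user_id}:{password}".encode("ascii"))
    return "Basic " + token.decode("ascii")

header = basic_authorization("demouser", "dem0user")
print(header)   # Basic ZGVtb3VzZXI6ZGVtMHVzZXI=

# Decoding needs no key at all, which is why the password is not protected.
print(base64.b64decode("ZGVtb3VzZXI6ZGVtMHVzZXI=").decode("ascii"))  # demouser:dem0user
```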
Figure 98. Basic authentication uncovered—2
    $ telnet hydra.csi.cam.ac.uk 80
    Trying 131.111.11.148...
    Connected to hydra.csi.cam.ac.uk.
    Escape character is '^]'.
    GET /restricted/ HTTP/1.0
    Authorization: Basic ZGVtb3VzZXI6ZGVtMHVzZXI=

    HTTP/1.1 200 OK
    Date: Thu, 01 Jun 2000 11:09:15 GMT
    Server: Apache/1.3.12 (Unix) (Red Hat/Linux)
    Last-Modified: Thu, 01 Jun 2000 10:28:10 GMT
    ETag: "6b543-144-39363aba"
    Accept-Ranges: bytes
    Content-Length: 324
    Connection: close
    Content-Type: text/html

    ...
The browser will typically remember the userid and password for realm ‘wombat’ and if challenged for the same realm again won't reprompt the user.
Figure 99. ID-based access restriction logic
Authenticate the ID
Is the ID allowed access?
So far we have only explained how Basic authentication authenticates a web user. We still haven't really explained why the user is subsequently let in. There are two sides to the permissions. First, the client must authenticate themselves to the server as a particular ID. Second, the ID must, of itself, have permission to access the pages. This second stage is covered by the require directive. The line in our example file
    require valid-user
means that any user from the /etc/httpd/conf/passwd file is allowed access if they can quote the password.
Figure 100. An example /etc/httpd/conf/passwd file
    demouser:RGMhGsfmvLQeE
    bob:ylxjJ83Fx7p8E
    tom:C6QeAIpNqz9IE
    dick:yfPWrksACScys
    harry:tXFkoaIYJqbrk
The password file maintained by the htpasswd program uses the same password hashing algorithm as the traditional Unix password file, but note that you cannot use the system password file for the Apache system. This file must be maintained separately. Also note that the IDs used in this file are not login names. There need be no relation at all between the IDs used for web authentication and the system's login names.
Figure 101. A more refined access control
/var/www/html/restricted/alpha: Any valid user
/var/www/html/restricted/beta: tom, dick, harry
/var/www/html/restricted/gamma: bob, tom
Figure 102. httpd.conf: Finer grained access control
    LoadModule auth_module modules/mod_auth.so
    AddModule mod_auth.c
    <Directory /var/www/html/restricted>
        AuthType Basic
        AuthName wombat
        AuthUserFile /etc/httpd/conf/passwd
    </Directory>
    <Directory /var/www/html/restricted/alpha>
        require valid-user
    </Directory>
    <Directory /var/www/html/restricted/beta>
        require user tom dick harry
    </Directory>
    <Directory /var/www/html/restricted/gamma>
        require user bob tom
    </Directory>
In the slide we see an alternative use of require. Here, we set up a single mechanism to authenticate the clients for the directory /var/www/html/restricted and three different schemes for determining who (once authenticated) is allowed in.
Figure 103. httpd.conf: Access control by groups
    LoadModule auth_module modules/mod_auth.so
    AddModule mod_auth.c
    <Directory /var/www/html/restricted>
        AuthType Basic
        AuthName wombat
        AuthUserFile /etc/httpd/conf/passwd
        AuthGroupFile /etc/httpd/conf/group
    </Directory>
    <Directory /var/www/html/restricted/alpha>
        require valid-user
    </Directory>
    <Directory /var/www/html/restricted/beta>
        require group betagrp
    </Directory>
    <Directory /var/www/html/restricted/gamma>
        require group gammagrp
    </Directory>
Figure 104. An example /etc/httpd/conf/group file
    betagrp: tom dick harry
    gammagrp: bob tom
There is one level of sophistication above lists of users: lists of groups. In addition to the password file for web IDs to be authenticated there can be a group file assigning these web IDs to web groups. Again, these are completely independent of the Unix login groups and note that the web group file has a different format from the Unix group file.
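To make the second stage of the check concrete, here is a small Python sketch that parses a group file in the format shown and tests an already authenticated ID against a require line. It is an illustrative model only; the function names parse_group_file and satisfies are invented, and password checking is deliberately left out.

```python
def parse_group_file(text):
    """Parse 'group: member member ...' lines into a dict of sets."""
    groups = {}
    for line in text.splitlines():
        if ":" not in line:
            continue                      # skip blank or malformed lines
        name, members = line.split(":", 1)
        groups[name.strip()] = set(members.split())
    return groups

def satisfies(require_line, user_id, groups):
    """Check an already authenticated ID against one require line."""
    _, kind, *names = require_line.split()   # drop the word 'require'
    if kind == "valid-user":
        return True
    if kind == "user":
        return user_id in names
    if kind == "group":
        return any(user_id in groups.get(g, set()) for g in names)
    raise ValueError("unsupported require line: " + require_line)

GROUPS = parse_group_file("betagrp: tom dick harry\ngammagrp: bob tom\n")

print(satisfies("require group betagrp", "tom", GROUPS))    # True
print(satisfies("require group gammagrp", "dick", GROUPS))  # False
print(satisfies("require user bob tom", "harry", GROUPS))   # False
print(satisfies("require valid-user", "demouser", GROUPS))  # True
```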
It's worth recalling that anything that appears in a <Directory> block can also appear in the directory's corresponding .htaccess file (or whatever you chose to call it with the AccessFileName directive).
Figure 105. mod_auth: Directives
AuthType Basic: Specifies the ‘basic’ authentication mechanism.
AuthName realm: Specifies the ‘security realm’.
AuthUserFile file: Specifies the web ID password file.
AuthGroupFile file: Specifies the web group file.
require valid-user: Any authenticated ID may have access.
require user user1 user2: ID must be authenticated and be one of user1 or user2 to have access.
require group grp1 grp2: ID must be authenticated and be in group grp1 or grp2 to have access.
Figure 106. HTTP request headers
GET / HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.14-6.1.1 i686)
Host: hydra.csi.cam.ac.uk
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: es, en
Accept-Charset: iso-8859-1,*,utf-8
A header that is part of the HTTP/1.1 spec. but has been a standard extension to HTTP/1.0 in all browsers is the Host: header. This identifies by name the host the browser was trying to connect to.
At first glance this is pointless. If the browser hadn't been trying to connect to the server it wouldn't have been connecting to this instance of Apache in the first place! However, it is possible to have multiple different names all pointing to the same IP address and hence the same instance of Apache.
There are several ways to do this in the DNS but the most common and easiest is to have a single real name for the IP address (its ‘canonical name’) specified in the DNS by an A record (so called because it looks up the address corresponding to a name) and one or more aliases. These aliases are other names defined to be equivalent to the real name by CNAME records in the DNS (so called because they look up the canonical name of the alias).
Figure 107. DNS entries
    www-uxsup.csx.cam.ac.uk. 1D IN CNAME nymph.csi.cam.ac.uk.
    nymph.csi.cam.ac.uk.     1D IN A     131.111.10.245
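The relationship between the two record types can be sketched in Python using an in-memory copy of the records in the figure. This is a toy model with no real DNS queries, and the RECORDS table and resolve function are invented for the illustration.

```python
# Hypothetical in-memory copy of the two records from the figure.
RECORDS = {
    "www-uxsup.csx.cam.ac.uk.": ("CNAME", "nymph.csi.cam.ac.uk."),
    "nymph.csi.cam.ac.uk.": ("A", "131.111.10.245"),
}

def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME records until an A record gives the address."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return name, value           # canonical name and its address
        name = value                     # CNAME: look up the canonical name
    raise RuntimeError("too many CNAME hops")

print(resolve("www-uxsup.csx.cam.ac.uk."))  # ('nymph.csi.cam.ac.uk.', '131.111.10.245')
print(resolve("nymph.csi.cam.ac.uk."))      # ('nymph.csi.cam.ac.uk.', '131.111.10.245')
```

Both names end at the same address, which is why the server needs the Host: header to tell the requests apart.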
By explicit inclusion of the originally requested host name it is possible to have a multiplicity of websites each corresponding to different names for the same host. This is managed in the configuration file with the <VirtualHost> directive.
Figure 108. httpd.conf: Setting up a virtual host
    # Virtual host example
    <VirtualHost cockatrice.csi.cam.ac.uk>
        DocumentRoot /var/www/cock
    </VirtualHost>
The slide shows the setting up of a virtual host with the system definitions but with a different document root. You might want to create separate Unix user groups for the control of the content of the virtual host data and the ‘canonical’ host data.
On systems that run multiple virtual hosts, it is very common for the canonical document root to have nothing but a home page saying ‘go to one of these virtual hosts’ and for all the data to be under the document trees for the various virtual hosts.
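The idea behind name-based virtual hosts can be sketched as a lookup on the Host: header. The Python below is only a model of that idea, not of Apache's internals: the function name document_root is invented, and using /var/www/html as the canonical host's document root is an assumption made for the example.

```python
DEFAULT_ROOT = "/var/www/html"            # assumed canonical document root
VIRTUAL_HOSTS = {
    "cockatrice.csi.cam.ac.uk": "/var/www/cock",
}

def document_root(request_text):
    """Choose a document root from the Host: header of a raw request."""
    for line in request_text.splitlines()[1:]:     # skip the request line
        if not line:
            break                                   # end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop any ':port' suffix and ignore case in the host name.
            host = value.strip().lower().split(":")[0]
            return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
    return DEFAULT_ROOT                             # no Host: header sent

virtual = "GET / HTTP/1.0\r\nHost: cockatrice.csi.cam.ac.uk\r\n\r\n"
canonical = "GET / HTTP/1.0\r\nHost: hydra.csi.cam.ac.uk\r\n\r\n"
print(document_root(virtual))     # /var/www/cock
print(document_root(canonical))   # /var/www/html
```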
The title of this document is:
Web Server Management: Running Apache on Red Hat Linux
URL:
http://www-uxsup.csx.cam.ac.uk/courses/apache/student.html