Webalizer is a good free program for analyzing your Apache web server’s logs. It will give you a rough idea of who is visiting your site, who referred each visitor, and what search terms were used to find your site.
To install, simply open a terminal and run:
sudo /usr/bin/yum install webalizer
This will install the binaries and a simple configuration file.
cd /etc/webalizer
Make a copy of the example webalizer.conf file for your first virtual domain. Our example will be virtual_1_webalizer.conf. After we customize this file to meet your server’s needs, we will simply copy and rename virtual_1_webalizer.conf for each of your other domains (virtual_2_webalizer.conf, etc.).
cp webalizer.conf virtual_1_webalizer.conf
First we’ll do a little preparatory work. Create a directory for each of your virtual domains where webalizer can store its output. You will need to do this if you want to let users access their webalizer results remotely. In our example:
mkdir /var/www/usage/
mkdir /var/www/usage/virtual_1
Second, we’ll need to create a history file so that webalizer will accumulate statistics for more than a single day.
touch /var/lib/webalizer/virtual_1_webalizer.hist
mkdir /var/lib/webalizer/virtual_1
touch /var/lib/webalizer/virtual_1/webalizer.current
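The preparation steps above have to be repeated for every domain, so they can be wrapped in a small function. This is only a sketch and not part of webalizer: the name prepare_domain and its two optional base-directory arguments are my own invention, with defaults that match the paths used in this article.

```shell
#!/bin/sh
# prepare_domain NAME [WWW_BASE] [LIB_BASE]
# Hypothetical helper: creates the output directory, the history file and
# the incremental ("current") file for one virtual domain.
prepare_domain() {
    name=$1
    www_base=${2:-/var/www/usage}
    lib_base=${3:-/var/lib/webalizer}

    # -p creates missing parent directories and is harmless on re-runs
    mkdir -p "$www_base/$name" "$lib_base/$name" || return 1

    # touch creates the files if absent and leaves existing data alone
    touch "$lib_base/${name}_webalizer.hist" \
          "$lib_base/$name/webalizer.current"
}
```

With the defaults, running prepare_domain virtual_1 as root should produce the same directories and files as the individual commands above.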
Open virtual_1_webalizer.conf in your favorite editor (I use vi). The portions I had to modify are described in the steps below.
vi /etc/webalizer/virtual_1_webalizer.conf
1. Insert the name of your apache server’s access log for virtual_1. In our example below, it’s www.yourdomain.com-access_log.
2. Specify the log file format. In our example below, it’s clf (common log format). Since clf is the default, the LogType line can stay commented out.
3. Indicate the output directory /var/www/usage/virtual_1
4. Tell webalizer where to find the virtual host’s history file /var/lib/webalizer/virtual_1_webalizer.hist
5. Tell webalizer where to find the virtual host’s current file /var/lib/webalizer/virtual_1/webalizer.current
6. Now tell webalizer the name of the virtual domain www.your_domain.com
Now save the file. The relevant portions of the finished file look like this:
#
# Sample Webalizer configuration file
# Copyright 1997-2000 by Bradford L. Barrett (brad@mrunix.net)
#
# Distributed under the GNU General Public License. See the
# files “Copyright” and “COPYING” provided with the webalizer
# distribution for additional information.
#
# This is a sample configuration file for the Webalizer (ver 2.01)
# Lines starting with pound signs ‘#’ are comment lines and are
# ignored. Blank lines are skipped as well. Other lines are considered
# as configuration lines, and have the form “ConfigOption Value” where
# ConfigOption is a valid configuration keyword, and Value is the value
# to assign that configuration option. Invalid keyword/values are
# ignored, with appropriate warnings being displayed. There must be
# at least one space or tab between the keyword and its value.
#
# As of version 0.98, The Webalizer will look for a ‘default’ configuration
# file named “webalizer.conf” in the current directory, and if not found
# there, will look for “/etc/webalizer.conf”.
# LogFile defines the web server log file to use. If not specified
# here or on on the command line, input will default to STDIN. If
# the log filename ends in ‘.gz’ (ie: a gzip compressed file), it will
# be decompressed on the fly as it is being read.
LogFile /var/log/httpd/www.yourdomain.com-access_log
# LogType defines the log type being processed. Normally, the Webalizer
# expects a CLF or Combined web server log as input. Using this option,
# you can process ftp logs as well (xferlog as produced by wu-ftp and
# others), or Squid native logs. Values can be ‘clf’, ‘ftp’ or ’squid’,
# with ‘clf’ the default.
#LogType clf
# OutputDir is where you want to put the output files. This should
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.
OutputDir /var/www/usage/virtual_1
# HistoryName allows you to specify the name of the history file produced
# by the Webalizer. The history file keeps the data for up to 12 months
# worth of logs, used for generating the main HTML page (index.html).
# The default is a file named “webalizer.hist”, stored in the specified
# output directory. If you specify just the filename (without a path),
# it will be kept in the specified output directory. Otherwise, the path
# is relative to the output directory, unless absolute (leading /).
HistoryName /var/lib/webalizer/virtual_1_webalizer.hist
# Incremental processing allows multiple partial log files to be used
# instead of one huge one. Useful for large sites that have to rotate
# their log files more than once a month. The Webalizer will save its
# internal state before exiting, and restore it the next time run, in
# order to continue processing where it left off. This mode also causes
# The Webalizer to scan for and ignore duplicate records (records already
# processed by a previous run). See the README file for additional
# information. The value may be ‘yes’ or ‘no’, with a default of ‘no’.
# The file ‘webalizer.current’ is used to store the current state data,
# and is located in the output directory of the program (unless changed
# with the IncrementalName option below). Please read at least the section
# on Incremental processing in the README file before you enable this option.
Incremental yes
# IncrementalName allows you to specify the filename for saving the
# incremental data in. It is similar to the HistoryName option where the
# name is relative to the specified output directory, unless an absolute
# filename is specified. The default is a file named “webalizer.current”
# kept in the normal output directory. If you don’t specify “Incremental”
# as ‘yes’ then this option has no meaning.
IncrementalName /var/lib/webalizer/virtual_1/webalizer.current
[PORTION OMITTED]
# HostName defines the hostname for the report. This is used in
# the title, and is prepended to the URL table items. This allows
# clicking on URL’s in the report to go to the proper location in
# the event you are running the report on a ‘virtual’ web server,
# or for a server different than the one the report resides on.
# If not specified here, or on the command line, webalizer will
# try to get the hostname via a uname system call. If that fails,
# it will default to “localhost”.
HostName www.your_domain.com
[OMITTED REST OF FILE]
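As the sample file's comments indicate, options that are left out fall back to their defaults, so the six edits can also be collected into a short stand-alone configuration. The sketch below prints such a file. The name write_conf and its arguments are hypothetical, the paths simply follow this article's layout, and I have not checked this against every webalizer version.

```shell
#!/bin/sh
# write_conf NAME HOSTNAME LOGFILE
# Hypothetical helper: prints a minimal per-domain webalizer configuration
# containing only the options changed in this article.
write_conf() {
    name=$1
    host=$2
    logfile=$3
    cat <<EOF
LogFile         $logfile
LogType         clf
OutputDir       /var/www/usage/$name
HistoryName     /var/lib/webalizer/${name}_webalizer.hist
Incremental     yes
IncrementalName /var/lib/webalizer/$name/webalizer.current
HostName        $host
EOF
}
```

For example, write_conf virtual_1 www.yourdomain.com /var/log/httpd/www.yourdomain.com-access_log > /etc/webalizer/virtual_1_webalizer.conf would write the file for the first domain.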
Now create a script called run_webalizer.sh
touch /etc/webalizer/run_webalizer.sh
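Open it in your editor and insert the following:
#!/bin/sh
# Run webalizer once for every configuration file in /etc/webalizer
for i in /etc/webalizer/*.conf; do
    webalizer -c "$i"
    echo "webalizer has run"
done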
Now, make the script executable.
chmod +x /etc/webalizer/run_webalizer.sh
Now, run the script.
sh /etc/webalizer/run_webalizer.sh
NOTE: You may get an error message like
"Error: Unable to save current run data"
The message probably indicates a permissions error: webalizer could not write the incremental data file named by IncrementalName. I haven’t figured out how to fix it, but it does not appear to prevent webalizer from doing its basic job.
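One way to narrow down a permissions problem like this is to check whether the account running webalizer can actually write to the files and directories it needs to update. The following is a rough diagnostic sketch. check_writable is a hypothetical helper of mine, and it only tests ordinary file permissions, so it will not detect restrictions from something like SELinux.

```shell
#!/bin/sh
# check_writable PATH...
# Hypothetical helper: reports, for each path, whether the current user
# can write to it. Returns non-zero if any path is not writable.
check_writable() {
    status=0
    for p in "$@"; do
        if [ -w "$p" ]; then
            echo "ok: $p"
        else
            echo "NOT writable: $p"
            status=1
        fi
    done
    return $status
}
```

Run it as the same user that runs the script, for example: check_writable /var/lib/webalizer/virtual_1 /var/lib/webalizer/virtual_1/webalizer.current /var/lib/webalizer/virtual_1_webalizer.hist /var/www/usage/virtual_1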
Now, let’s check our work. You should now be able to see your output using your web browser at:
http://server_address/usage/virtual_1/
Now we want to add the other domains. Just repeat the above steps for each domain (e.g. copy virtual_1_webalizer.conf to virtual_2_webalizer.conf, then adjust the paths, log file and host name).
Now we want webalizer to run automatically for all our virtual hosts at 12:01 each morning and mail the output to us. To do this we will add it to the cron file:
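Because the example paths all embed the name virtual_1, the copy for a new domain can be generated with sed instead of being edited entirely by hand. This is a sketch under that assumption. clone_conf is a hypothetical helper, and the LogFile and HostName lines in the new file still have to be edited manually, since they do not contain the virtual_1 name.

```shell
#!/bin/sh
# clone_conf SRC_NAME DST_NAME [CONF_DIR]
# Hypothetical helper: copies SRC_NAME_webalizer.conf to
# DST_NAME_webalizer.conf, replacing every occurrence of SRC_NAME.
# The names are used as-is in a sed pattern, so keep them to letters,
# digits and underscores.
clone_conf() {
    src=$1
    dst=$2
    dir=${3:-/etc/webalizer}
    # Refuse to overwrite an existing configuration
    if [ -e "$dir/${dst}_webalizer.conf" ]; then
        echo "already exists: $dir/${dst}_webalizer.conf" >&2
        return 1
    fi
    sed "s/$src/$dst/g" "$dir/${src}_webalizer.conf" \
        > "$dir/${dst}_webalizer.conf"
}
```

For example, clone_conf virtual_1 virtual_2 creates /etc/webalizer/virtual_2_webalizer.conf. After that, repeat the directory and file preparation for virtual_2 and fix the LogFile and HostName lines.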
crontab -e
1 0 * * * sh /etc/webalizer/run_webalizer.sh 2>&1 | mail -s "webalizer output" root@your_domain.com
By default, webalizer is only viewable locally (localhost). To enable others to view it, you will have to edit the file:
vi /etc/httpd/conf.d/webalizer.conf
To restrict access, you can require a password by modifying your Apache server configuration file like this:
vi /etc/httpd/conf/httpd.conf
and insert this block for the virtual domain’s output directory:
<Directory /var/www/usage/virtual_1>
AuthType Basic
AuthName "Password Required"
AuthUserFile /var/www/pass/.virtual_1.password
Require valid-user
</Directory>
Now make a directory for your passwords and create the password file like this:
mkdir /var/www/pass
htpasswd -c /var/www/pass/.virtual_1.password username
Explanations of the data displayed by webalizer can be found here: