Linux in the server world

One of the guiding principles at LinuxNovice.org is that not all Linux novices are computer novices, so we endeavor to provide information for developers and administrators who may not be new to computers but are new to Linux.

With its stability and ease of use, Linux continues to make advances in the server market. While Linux is a great solution for providing file and print services, its most common use is as a web server. And with recent additions such as IBM's WebSphere and ChiliSoft's ASP, the flexibility of Linux-based web servers is at an all-time high.

What is webalizer

Webalizer is a free web server log analyzer. It provides very detailed traffic reports that can help webmasters understand where their traffic is coming from and where it is going. Webalizer generates reports in both tabular and graphical form, making it a useful tool for system admins as well as business managers.

Getting and installing webalizer (rpm, srpm, tarball)

As with most programs for Linux, you have several options for installing Webalizer. From Webalizer's download page you can get the source code in tar/gz format, as well as precompiled binaries for different Unix variants and even DOS/Windows. Your other alternative, and possibly your easiest, is to download the RPM package (if your system supports RPM). You can find the Webalizer RPMs at RPMFind.

INSTALLATION INSTRUCTIONS
Tar/gz

[root@]# cd /usr/src
[root@]# tar -zxvf /path/to/webalizer-XXX.src.tgz
[root@]# cd webalizer-XXX
[root@]# ./configure
[root@]# make
[root@]# make install

RPM

[root@]# rpm -i webalizer-xxx.rpm

SRPM

[root@]# rpm --rebuild webalizer-xxx.src.rpm
[root@]# rpm -i /usr/src/redhat/RPMS/i386/webalizer-xxx.rpm

Please note that you may need to install the GD graphics library (LibGD) for graphics support.

Log files and config files

Webalizer is designed to be used with almost any log file, and it works out of the box with Apache/NCSA-style web log files. It provides reports on everything from where traffic is coming from to where it is going and how it got there.
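For reference, a line in Apache's combined log format carries the client host, a timestamp, the request, the status code, the byte count, the referrer, and the user agent. The sketch below pulls a few of those fields out of a single made-up line with awk. The sample data and variable names are illustrative only, and this is not how Webalizer itself parses logs.

```shell
# A made-up line in Apache combined log format
line='192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08"'

# The client host is the first whitespace-separated field
host=$(printf '%s\n' "$line" | awk '{print $1}')

# Splitting on double quotes: field 2 is the request, field 4 the referrer
request=$(printf '%s\n' "$line" | awk -F'"' '{print $2}')
referrer=$(printf '%s\n' "$line" | awk -F'"' '{print $4}')

echo "host:     $host"
echo "request:  $request"
echo "referrer: $referrer"
```

If your own log lines split cleanly this way, they are most likely in a format Webalizer can read without extra configuration.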

The primary configuration file for webalizer is located at /etc/webalizer.conf; however, you can place this file anywhere you want and specify via the command line which file to use. I personally recommend placing the webalizer configuration file in the same location as your Apache config files for consistency.

There are a number of things you can change in the configuration file, so let's take a look at some of them.

LogFile /var/log/httpd/access_log
If you have set up custom logs for virtual domains, or have placed your log file in a different directory, simply specify the path to it here (including the file name).
OutputDir /var/lib/httpd/htdocs/usage
The output directory should be somewhere within your web root. You can enable Apache authentication if you don't want the reports to be publicly available.
#HostName localhost
If you are running a virtual domain, or `hostname` reports something other than the web server name, make sure to set this.
PageType htm*
If you use SSI-enabled pages (.shtml), PHP (.php, .php3, .phtml), or any other filename extension, you need to add these so webalizer knows that you consider files with these extensions to be pages.
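As a rough illustration of the settings above, the following sketch writes a minimal configuration fragment to a scratch file so that nothing under /etc is touched. The LogFile and OutputDir values repeat the ones shown above; the extra PageType lines are an assumption about which extensions your site uses, and the scratch file location is arbitrary.

```shell
# Scratch file so the real configuration is left alone
conf=$(mktemp)

# Minimal fragment: one PageType line per extension (a trailing * is a wildcard)
cat > "$conf" <<'EOF'
LogFile   /var/log/httpd/access_log
OutputDir /var/lib/httpd/htdocs/usage
PageType  htm*
PageType  shtml
PageType  php*
PageType  phtml
EOF

# List the extensions that would be treated as pages
grep '^PageType' "$conf" | awk '{print $2}'

# To try it for real (requires webalizer and a readable log file):
#   webalizer -c "$conf"
```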

#IgnoreSite bad.site.net
#IgnoreURL /test*
#IgnoreReferrer file:/*

Now this is a tricky one. There is a matching set of options called Hide (HideSite, HideURL, HideReferrer). The difference is that Hide still counts hits that match your criteria but excludes them from the "Top" sections of the report, while Ignore does not count those hits at all. Using Ignore will therefore skew your actual results; however, this may be what you want. Either way the format is the same. If you want to remove hits coming from specific sites, add an IgnoreSite entry (for example, to ignore hits from your company, add IgnoreSite *.company.com, or for just yourself, IgnoreSite mybox.company.com). To keep certain directories out of the report, add an IgnoreURL entry (if you write your output to /webalizer, you can ignore hits to this directory by adding IgnoreURL /webalizer*). You can find a sample report at Sample Report.
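The difference between the two option families can be illustrated without Webalizer at all. The sketch below uses four made-up request paths and plain grep and wc: the Ignore-style count drops matching hits before counting, while the Hide-style count keeps every hit in the total and only leaves the matches out of the listing. It mimics the behaviour described above under that assumption; it is not Webalizer's own code.

```shell
# Made-up request paths, one per hit
hits='/index.html
/webalizer/index.html
/about.html
/webalizer/usage.png'

# Ignore-style: hits under /webalizer are dropped before anything is counted
ignore_total=$(printf '%s\n' "$hits" | grep -v '^/webalizer' | wc -l | tr -d ' ')

# Hide-style: every hit is counted, matches are only left out of the listing
hide_total=$(printf '%s\n' "$hits" | wc -l | tr -d ' ')
hide_listing=$(printf '%s\n' "$hits" | grep -v '^/webalizer')

echo "Ignore: total hits = $ignore_total"
echo "Hide:   total hits = $hide_total, listed:"
printf '%s\n' "$hide_listing"
```

In both cases the listing shows the same two pages; only the totals differ, which is why Ignore changes your overall numbers and Hide does not.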

Running webalizer

Most of the time you will want to run webalizer as part of your cron tasks. The main thing you will need to make sure of is that the directory listed as the output directory exists and is owned by the user running webalizer. In many cases you can run it as the web server user (typically nobody). To set things up, do the following as root:

# mkdir /path/to/webroot/
# webalizer -c /path/to/webalizer-conf
# crontab -u nobody -e
Add the following entry to your crontab using the editor:
0 5 * * 7 /usr/local/bin/webalizer -c /path/to/webalizer-conf

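The entry given will run webalizer at 5:00 am every Sunday; the first five fields are minute, hour, day of month, month, and day of week (for 12:05 am, the first two fields would be 5 0). You may choose a different schedule; see man 5 crontab for help with crontab entries.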
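If the field order of a crontab entry is hard to keep straight, it can help to split the entry apart and label each piece. The sketch below does that for the entry given above using the shell's read builtin; the variable names are only for illustration.

```shell
# The crontab entry from the text
entry='0 5 * * 7 /usr/local/bin/webalizer -c /path/to/webalizer-conf'

# The first five words are the schedule; the rest of the line is the command
read -r minute hour day_of_month month day_of_week command <<EOF
$entry
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day_of_month"
echo "month:        $month"
echo "day of week:  $day_of_week"
echo "command:      $command"
```

Here the hour is 5 and the minute is 0, and a day of week of 7 (like 0) means Sunday.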

# chown -R nobody /path/to/webroot/

These steps will create the directory needed, create the initial webalizer report, set up cron to run the reports (as nobody), and change the owner of the reporting directory to nobody (so that when webalizer runs as nobody, it can overwrite the current files).
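A quick way to confirm the ownership requirement is to check, as the user who will run webalizer, that the output directory exists and is writable. The helper below is a hypothetical sketch (the name check_output_dir is not part of Webalizer), and a temporary directory stands in for the real output directory.

```shell
# Hypothetical helper: prints "ok", "missing", or "not writable"
check_output_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable"
        return 1
    fi
    echo "ok"
}

# Demonstration against a scratch directory standing in for the output directory
scratch=$(mktemp -d)
check_output_dir "$scratch"
check_output_dir "$scratch/does-not-exist" || true
rmdir "$scratch"
```

Pointing the function at the directory named in OutputDir, while logged in as the reporting user, shows whether the scheduled run will be able to write its files.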

Conclusion

Webalizer is a great GPL log analyzer and report generator. While it does not offer the depth of some of the commercial packages, it is very useful. The reports it generates give an excellent snapshot of a site's traffic and offer a nicely formatted view of who is accessing what resources. For my money, it's a great choice.