[ Top | Up | Prev | Next | Map | Index ]

Analog 5.03: Choosing a logfile


The basic command for selecting a logfile is
LOGFILE logfilename
or just to put the logfile name on the command line without any arguments, e.g., analog logfilename. A - sign or the word stdin is interpreted as standard input: this is useful on Unix systems for constructing pipes. All logfiles must be within your computer's file system (on disk, or at least mounted under Unix, or on a mapped drive under NT) -- analog won't use FTP or HTTP to fetch them from the internet. In the Mac version, you can also analyse a particular single logfile by dragging it onto the analog icon.

You can have several LOGFILE commands. You can include wildcards in the logfile name (but not necessarily in the directory name: this is system-dependent), and you can use a list of logfiles separated by commas (without spaces). So the following commands would tell analog to read logfile1, c:\logs\logfile2, and all files ending in .log:

LOGFILE logfile1,*.log
LOGFILE c:\logs\logfile2
Or if you were on a Mac, you might use something like
LOGFILE "Hard Drive:Internet Applications:Analog:Logs:*"
You can also use the special command
LOGFILE none
to erase the list of logfiles specified so far.

If the name of a logfile doesn't include a directory, it will be looked for wherever analog expects to find logfiles. (This location is built in when the program is compiled.) For example, on Windows it would be in the same folder as the analog executable. So to include a logfile from the current working directory, for example on Unix, you need to use ./logfile.log instead of just logfile.log.

The LOGFILE commands are cumulative, except that any logfiles on the command line or in configuration files specified on the command line override any in the default configuration file or configuration files loaded from there, and are themselves overridden by any in the mandatory configuration file or configuration files loaded from there. Usually you don't need to worry about this, and it will do what you expect! (Actually I should have said "logfiles or cache files" -- but we'll get on to that later).


Analog knows about several different types of logfile. By default it will attempt to see if your logfile is of one of the types it knows about, based on the first line. The types it can usually diagnose are the common log format, the NCSA combined format, referrer log and browser log, the W3 extended log format, the Microsoft IIS format, the Netscape format, the WebSTAR format and the WebSite format. Examples of all these formats are given at the end of this section. If you have debugging on, analog will report what type of logfile it thinks yours is.

If your logfile is not in one of the standard formats, you will probably still be OK, because it is possible to tell analog about other formats using a LOGFORMAT command. This is explained in the next section. But most users don't ever need to know about this because they have logfiles in a standard format. So the best thing to do is just to try analysing your logfile and see if analog will understand it. If it does, you don't need to worry about LOGFORMATs.

If analog can't understand your logfile, it will warn you that it can't detect the format, or possibly that it found a lot of corrupt lines. There are basically four reasons why this might happen:

  1. Some log formats are not very well designed and analog can't analyse them reliably. In this case it will give up, usually with a helpful message, rather than risk doing a bad job. For example, you might get "Logfile with ambiguous dates" or "Time without date." In this case you should read the notes on all the built-in formats below where some common problems with those formats are described.
  2. Since analog tries to deduce the format based on the first line of the logfile, it could just be that the first line is corrupt. In this case, you could tell analog the format, or you could just fix the first line.
  3. For the same reason, if the format changes midway through the log, analog will count the remaining lines as corrupt. In this case, you will find that your report contains a partial analysis but with a large number of corrupt lines too. You will need to give analog two LOGFORMAT commands to tell it about the two different formats.
  4. Finally, some logfiles really aren't in one of the standard formats. In this case you will need to read the next section and learn how to tell analog about your format.
If you can't see what's wrong with your logfile, you can specify DEBUG ON, and analog will report where each line was corrupt.
There's also a second argument to the LOGFILE command, which specifies a prefix to add to all the filenames in that logfile. This is useful if you've got several different servers or virtual hosts, when the same filename may occur on each of the servers. For example,
LOGFILE mydomain.log http://www.mydomain.com
would translate a filename /file.html in mydomain.log to http://www.mydomain.com/file.html.

If you are using this command to combine logfiles from several different virtual hosts, then the Virtual Host Report doesn't tell you about the different virtual hosts. The virtual host name has just become part of the filename. So you want to look in the Directory Report instead. (And you will probably want to use the SUBDIR command as well.)

If the logfile contains the name of the virtual host on each line, then the argument can contain a %v, and the name of the virtual host will be inserted at that point. If %v is included and the logfile line doesn't have a virtual host, then that line will be marked as corrupt.


It is often convenient to store logfiles compressed to save disk space. Analog on the Mac can read logfiles compressed using gzip. And analog on Unix and Win32 can read compressed logfiles provided that you use an UNCOMPRESS command to say how to uncompress them. You need to supply the types of file that you want to uncompress in a comma-separated list, together with the name of a command that will uncompress the files to standard output (rather than to a file). For example, on Unix you might use
UNCOMPRESS *.gz,*.Z  /usr/bin/gzcat
whereas on Windows NT, you might use
UNCOMPRESS *.gz ("c:\Program Files\gzip\gzip" -cd)
This would be a suitable command to include in the default configuration file.

If analog determines when it starts to uncompress a logfile that that file isn't wanted for the analysis, two undesirable things can happen. Either the program might pause until the logfile is fully uncompressed, or there might be a "broken pipe" error reported. This is system dependent, and out of analog's control.

(NB: There's nothing to stop you using the UNCOMPRESS command for other types of preprocessing, for example DNS resolution.)


Logfile formats

Here is a summary of the various logfile formats which analog knows about. To illustrate them, I have used the same (fictional) request as it might be recorded in the different formats.

The common logfile format is written by most servers. Its lines look like

jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ HTTP/1.0" 200 1243
(except all on one line). Some versions of Microsoft software have a buggy version of this with an extra quote mark before the HTTP like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000]
"GET /~sret1/ "HTTP/1.0" 200 1243
Analog will understand these, but (as with any two formats) it will reject lines if the format changes half way through.
The NCSA referrer log looks like
[25/Dec/1998:17:45:35] http://www.site.com/ -> /~sret1/
and the browser (or agent) log looks like
[25/Dec/1998:17:45:35] Mozilla/2.0 (X11; I; HP-UX A.09.05)
In the referrer log, the date can be omitted.
The NCSA combined log is the same as the common log, except that it has the referrer and browser on the end in quotes, like this:
jay.bird.com - fred [25/Dec/1998:17:45:35 +0000] "GET /~sret1/ HTTP/1.0"
200 1243 "http://www.site.com/" "Mozilla/2.0 (X11; I; HP-UX A.09.05)"
(except all one line). If you are using the Apache server, you can generate this with the mod_log_config module, using the Apache command
LogFormat "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\""
It is usually better to use the combined log than separate logs, because it stores more information in less space.
The Microsoft IIS logfile looks like
192.64.25.41, -, 25/12/98, 17:45:35, W3SVC1, HOST1, 192.16.225.10,
2178, 303, 1243, 200, 0, GET, /~sret1/, -,
(except all on one line). However, the format is extremely badly designed, in that the date follows local conventions: for example, in North America the above example would have the date 12/25/98 instead. Also the year may have two or four digits.

Analog will diagnose which form the logfile is in if possible: but if both the date and the month are at most 12, there is no way to tell which format it is. In this case, it will report "Logfile with ambiguous dates" and advise you to use the command LOGFORMAT MICROSOFT-NA2 for North American date format, or LOGFORMAT MICROSOFT-INT2 for international date format. (Or -NA4 or -INT4 if you have four digit years.)

In some countries, the date will not be in any of these formats, in which case you need to write your own LOGFORMAT command, based on the examples in the next section.

There are also various third-party extensions to the Microsoft format to include, for example, the browser and referrer. But they all do it in different ways, so analog can't automatically diagnose them, and again, you need to write a LOGFORMAT command for them.


The WebSite format looks like
12/25/98 17:45:35  jay.bird.com  host1  Server  fred  GET  /~sret1/
http://www.site.com/    Mozilla/2.0 (X11; I; HP-UX A.09.05)  200  1243  2178
(except all on one line, and with the fields separated by tabs). It suffers from the same problem with ambiguous dates as the IIS logfile (above), so again you might have to use LOGFORMAT WEBSITE-NA or LOGFORMAT WEBSITE-INT, or even have to write your own LOGFORMAT command.
The W3 extended log, the Netscape log, and the WebSTAR log can be recognised because they must include at or near the top a line telling analog what format to expect on subsequent lines. (They may also contain later lines changing the format). If the header line is missing, analog won't be able to interpret the subsequent lines and so won't be able to analyse the logfile. In this case, you will have to either replace the missing header or use a LOGFORMAT command to tell analog your format.

If analog finds that the header line is corrupt, it will usually tell you what was wrong with it. The most common problem is that you're not allowed the time without the date or vice versa -- in particular, having the date just at the top of the logfile is not sufficient; you must have it on each line. By default, Microsoft servers produce extended logs with the date only at the top. But if the date changes during the logfile, the server doesn't then write a new date line. For this reason analog can't analyse such logfiles safely. There are some programs on the helper applications page to put the date on each line. If you already have such a logfile you might want to use one of these programs, but they have to assume that the date doesn't change during the logfile, so it would be safer to tell your server to log the date on every line in future.

The extended log is described at http://www.w3.org/TR/WD-logfile.html. Its header line looks like

#Fields: date time cs-uri
In the rest of the logfile, the fields can be separated by spaces or tabs. Remember the logfile must contain the date as well as the time on every line -- see above.

There is also Microsoft's attempt at the extended format -- unfortunately they didn't read the spec., so they didn't enclose the browser and referrer in quotes, they replaced spaces in the browser name with +'s, and they put the time taken to serve the request in milliseconds instead of seconds. And there is WebSTAR's attempt which is very nearly right except that they erroneously used the CS-HOST field as the client hostname instead of the server hostname. Analog will understand all of these versions.

Extended logs always record the time in GMT, so you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone.

The WebSTAR format is described at http://www.starnine.com/webstar/docs/ws4manual.3f.html. It has a header line like

!!LOG_FORMAT DATE TIME RESULT URL BYTES_SENT HOSTNAME
In the rest of the logfile, the fields are separated by tabs. The WebSTAR server also records the time in GMT, so again you will probably need to use a LOGTIMEOFFSET command to convert to your local timezone. Some other Mac servers also use the WebSTAR format, or something looking like it. Analog will understand these too.

Finally, the Netscape header line looks like

format=%Ses->client.ip% [%SYSDATE%] "%Req->reqpb.clf-request%"
%Req->srvhdrs.clf-status% %Req->srvhdrs.content-length%

Go to the analog home page.

Stephen Turner
07 July 2001

Need help with analog? Use the analog-help mailing list.

[ Top | Up | Prev | Next | Map | Index ]