 Analog 5.22:
Inclusions and exclusions
For example, the command
HOSTEXCLUDE mycomputer.myisp.com
would exclude all requests by that computer from the statistics. (To exclude lines just from one specific report, see below.)
The rule for determining whether an item is included or excluded is as follows. All the INCLUDE and EXCLUDE commands for that item are considered one by one in order, and the item is included or excluded according to the last command it matched. Items which don't match any of the INCLUDE or EXCLUDE commands are included if the first command was an exclusion, and excluded if the first command was an inclusion. For example, the configuration
FILEINCLUDE /~sret1/*
FILEEXCLUDE /~sret1/backgammon/*,/~sret1/analog/*
FILEINCLUDE /~sret1/backgammon/*.gif
would instruct the program to examine only my files, excluding my backgammon and analog files, but including gifs in my backgammon directory. On the other hand,
FILEEXCLUDE /~sret1/*/img/*
would analyse all files, except for images in my various directories. (If you get confused with all the inclusions and exclusions, remember that you can always use SETTINGS ON to see what the options you have specified represent.) Note that inclusions and exclusions can contain any number of wildcards, and can be lists separated by commas (but no spaces).
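The rule above can be sketched in a few lines of Python. This is only an illustration and not analog's own code: the function name is_included and the (kind, patterns) representation are made up for the example, and Python's fnmatchcase is assumed to be a close enough stand-in for analog's wildcard matching.

```python
from fnmatch import fnmatchcase

def is_included(item, commands):
    """Apply a list of (kind, patterns) commands to one item.

    kind is "INCLUDE" or "EXCLUDE"; patterns is a comma-separated list
    of wildcard patterns, as in the configuration file.
    """
    if not commands:
        return True  # nothing configured, so everything is included
    # Items matching no command get the opposite of the first command.
    included = commands[0][0] == "EXCLUDE"
    for kind, patterns in commands:
        # fnmatchcase is an assumed approximation of analog's wildcards.
        if any(fnmatchcase(item, p) for p in patterns.split(",")):
            included = kind == "INCLUDE"  # the last matching command wins
    return included

config = [
    ("INCLUDE", "/~sret1/*"),
    ("EXCLUDE", "/~sret1/backgammon/*,/~sret1/analog/*"),
    ("INCLUDE", "/~sret1/backgammon/*.gif"),
]
print(is_included("/~sret1/backgammon/rules.html", config))  # False
print(is_included("/~sret1/backgammon/board.gif", config))   # True
print(is_included("/elsewhere/index.html", config))          # False
```

The last call shows the default for unmatched items: the first command was an inclusion, so a file matching nothing is excluded.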
The full list of these commands is HOSTINCLUDE and HOSTEXCLUDE; FILEINCLUDE and FILEEXCLUDE; BROWINCLUDE and BROWEXCLUDE; REFINCLUDE and REFEXCLUDE; USERINCLUDE and USEREXCLUDE; VHOSTINCLUDE and VHOSTEXCLUDE; and STATUSINCLUDE and STATUSEXCLUDE.
Because the inclusions and exclusions take place after the aliasing, the name you must use is the aliased name. (In the absence of output alias commands, this is the name of the item in the output.)
Sometimes a line doesn't contain a particular sort of item, either because there is no field reserved for it on the line, or because the browser didn't send it for that request, or because it was present but corrupt. You can include or exclude these lines by making a special blank entry in the INCLUDE or EXCLUDE command. For example,
USERINCLUDE jim
USERINCLUDE ""
would include lines from user jim and lines without any user specified.
The behaviour of REQINCLUDE and REFINCLUDE can be slightly unintuitive if the file has search arguments.
You can also use regular expressions for the inclusions and exclusions by prefixing the expression with "REGEXP:" or "REGEXPI:". I've already described this at length in the context of aliases, so you can look there for all the details. A regular expression must be on a line on its own, not within a comma-separated list.
STATUSINCLUDE and STATUSEXCLUDE accept ranges of status codes. For example,
STATUSINCLUDE 200-206,304,500-
would mean only look at lines with status codes 200-206, 304 or 500-599.
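As an illustration of how such a list reads, here is a small Python sketch that expands a status specification into the set of codes it names. It is not analog's parser. The function name parse_status_list is hypothetical, and the lower bound of 100 for a range with no start is an assumption; only the upper bound of 599 for an open range such as 500- comes from the example above.

```python
def parse_status_list(spec):
    """Expand a spec like '200-206,304,500-' into a set of status codes."""
    codes = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            start = int(low) if low else 100   # assumed lower bound
            end = int(high) if high else 599   # open range runs to 599
            codes.update(range(start, end + 1))
        else:
            codes.add(int(part))
    return codes

wanted = parse_status_list("200-206,304,500-")
print(304 in wanted)  # True
print(404 in wanted)  # False
```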
Some people want to exclude status code 304 (Not Modified) to stop those requests appearing in the Request Report. But there is a better solution. By default, analog counts code 304 as a successful request, because it assumes that the cached version of the document is then presented to the user. But you can count it as a redirected request with the command
304ISSUCCESS OFF
Again, if you don't understand this, stick with the default.
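The FROM and TO commands restrict the analysis to a range of dates and times, given in the form yymmdd or yymmdd:hhmm. For example,
FROM 990701
TO 000615:1300
Alternatively, each of the components can be preceded by + or - to represent time relative to the time at which the program was invoked. In this case, the date can have more than 2 digits. This allows constructions like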
FROM -01-00+01   # from tomorrow last year
TO -00-0131  # to the end of last month (OK even if last month
             # didn't have 31 days)
FROM -00-00-112
TO   -00-00-01  # statistics for the last 16 weeks
FROM -00-00-00:-06+01  # statistics for the last 6 hours
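To make the relative forms concrete, the following Python sketch resolves the date part of a FROM or TO argument against a given "today". It is a reading of the examples above and not analog's implementation. The function name resolve_date is hypothetical, the two-digit year pivot (70 and above means 19yy) is an assumption, a day that is too large is assumed to be clamped to the end of the month, and the :hhmm time part is not handled at all.

```python
import calendar
import re
from datetime import date, timedelta

# Year and month: two digits, optionally signed.
# Day: two digits, or a sign followed by any number of digits.
_DATE_RE = re.compile(r"([+-]?\d\d)([+-]?\d\d)([+-]\d+|\d\d)$")

def resolve_date(spec, today):
    """Resolve a date such as '990701' or '-00-0131' relative to today."""
    m = _DATE_RE.match(spec)
    if m is None:
        raise ValueError(f"unrecognised date: {spec!r}")
    y, mo, d = m.groups()

    if y[0] in "+-":
        year = today.year + int(y)
    else:
        yy = int(y)
        year = 1900 + yy if yy >= 70 else 2000 + yy  # assumed pivot

    if mo[0] in "+-":
        # Count months from year 0 so that offsets can cross year ends.
        total = year * 12 + (today.month - 1) + int(mo)
        year, month0 = divmod(total, 12)
        month = month0 + 1
    else:
        month = int(mo)

    last_day = calendar.monthrange(year, month)[1]
    if d[0] in "+-":
        base = date(year, month, min(today.day, last_day))
        return base + timedelta(days=int(d))
    return date(year, month, min(int(d), last_day))  # clamp, e.g. 31 -> 29

today = date(2024, 3, 15)
print(resolve_date("-01-00+01", today))   # 2023-03-16
print(resolve_date("-00-0131", today))    # 2024-02-29
```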
There are command line abbreviations +F and +T for the FROM and TO commands; for example, +T-00-00-01:1800 looks at statistics until 6pm yesterday. -F and -T turn off the from and to, as do FROM OFF and TO OFF.
So, for example, the command
REFREPEXCLUDE http://your.site.com/*
would exclude your internal referrers from the Referrer Report. However, it would not exclude them from the Failed Referrer Report, the Referring Site Report, etc. (you need to use FAILREFEXCLUDE, REFSITEEXCLUDE etc. for that); nor would it prevent other analysis of logfile lines with those referrers, as REFEXCLUDE would.
The full list of these commands is REQINCLUDE and REQEXCLUDE; REDIRINCLUDE and REDIREXCLUDE; FAILINCLUDE and FAILEXCLUDE; TYPEINCLUDE and TYPEEXCLUDE; DIRINCLUDE and DIREXCLUDE; HOSTREPINCLUDE and HOSTREPEXCLUDE; REDIRHOSTINCLUDE and REDIRHOSTEXCLUDE; FAILHOSTINCLUDE and FAILHOSTEXCLUDE; DOMINCLUDE and DOMEXCLUDE; ORGINCLUDE and ORGEXCLUDE; REFREPINCLUDE and REFREPEXCLUDE; REFSITEINCLUDE and REFSITEEXCLUDE; SEARCHQUERYINCLUDE and SEARCHQUERYEXCLUDE; SEARCHWORDINCLUDE and SEARCHWORDEXCLUDE; INTSEARCHQUERYINCLUDE and INTSEARCHQUERYEXCLUDE; INTSEARCHWORDINCLUDE and INTSEARCHWORDEXCLUDE; REDIRREFINCLUDE and REDIRREFEXCLUDE; FAILREFINCLUDE and FAILREFEXCLUDE; BROWSUMINCLUDE and BROWSUMEXCLUDE; BROWREPINCLUDE and BROWREPEXCLUDE; OSINCLUDE and OSEXCLUDE; VHOSTREPINCLUDE and VHOSTREPEXCLUDE; REDIRVHOSTREPINCLUDE and REDIRVHOSTREPEXCLUDE; FAILVHOSTREPINCLUDE and FAILVHOSTREPEXCLUDE; USERREPINCLUDE and USERREPEXCLUDE; REDIRUSERREPINCLUDE and REDIRUSERREPEXCLUDE; and FAILUSERINCLUDE and FAILUSEREXCLUDE.
The inclusion or exclusion applies to the unaliased name, if you are doing any output aliases. (This contrasts with the behaviour of normal INCLUDE and EXCLUDE commands, which apply to the aliased name.)
All directory names end in slashes, so DIRINCLUDE and DIREXCLUDE, and REFSITEINCLUDE and REFSITEEXCLUDE, implicitly add a trailing slash even if you don't give one. This sometimes catches people out in the following situation.
REFSITEEXCLUDE http://my.host.com/*  # probably not what you want
means not to list subdirectories of the referring site http://my.host.com/, but to keep the site itself in the list. To exclude the site completely, just use
REFSITEEXCLUDE http://my.host.com/
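The effect of the implicit trailing slash can be seen in a short Python sketch. It is an illustration only: the function names are hypothetical, and fnmatchcase is assumed to behave like analog's wildcard matching for these patterns.

```python
from fnmatch import fnmatchcase

def with_trailing_slash(pattern):
    """Add the slash that the directory and site commands add implicitly."""
    return pattern if pattern.endswith("/") else pattern + "/"

def site_excluded(site, pattern):
    # fnmatchcase is an assumed stand-in for analog's wildcard matching.
    return fnmatchcase(site, with_trailing_slash(pattern))

# The pattern becomes http://my.host.com/*/ and so needs a second slash.
print(site_excluded("http://my.host.com/", "http://my.host.com/*"))      # False
print(site_excluded("http://my.host.com/dir/", "http://my.host.com/*"))  # True
print(site_excluded("http://my.host.com/", "http://my.host.com/"))       # True
```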
You can also use the symbolic word pages in suitable INCLUDE and EXCLUDE commands; one very common command is
REQINCLUDE pages
to include only pages in the Request Report.
PAGEINCLUDE *.asp
PAGEEXCLUDE /sret1.html
I.e., *.asp are pages, but /sret1.html isn't. (If the file has search arguments, the PAGEINCLUDE and PAGEEXCLUDE are reckoned just on the part of the filename before the question mark.)
REQLINKINCLUDE pages,*.pdf
would link to pages and PDF files in the Request Report. The full set of these commands is REQLINKINCLUDE and REQLINKEXCLUDE (Request Report), REDIRLINKINCLUDE and REDIRLINKEXCLUDE (Redirection Report), FAILLINKINCLUDE and FAILLINKEXCLUDE (Failure Report), REFLINKINCLUDE and REFLINKEXCLUDE (Referrer Report), REDIRREFLINKINCLUDE and REDIRREFLINKEXCLUDE (Redirected Referrer Report), and FAILREFLINKINCLUDE and FAILREFLINKEXCLUDE (Failed Referrer Report). Note that the target of the links is also affected by the BASEURL command.
ROBOTINCLUDE Googlebot/*
Stephen Turner