KWIC reads a list of lines, generates all the circular shifts of each line, sorts the shifted lines, and writes the resulting lines to stdout. For a given line, a circular shift is generated by removing the first word in the line and appending it to the end of the line. Thus, a line with N words has N circular shifts (including the original line).
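The circular-shift rule can be sketched in Python by following the definition literally: record the current line, then move its first word to the end, N times. The function name `circular_shifts` is hypothetical. Splitting on whitespace, which collapses runs of spaces, is also an assumption, since the text does not say how words are delimited.

```python
def circular_shifts(line: str) -> list[str]:
    """Return the N circular shifts of a line with N words, starting with the line itself."""
    words = line.split()  # assumption: words are separated by whitespace
    shifts = []
    for _ in range(len(words)):
        shifts.append(" ".join(words))
        # Remove the first word and append it to the end to get the next shift.
        words = words[1:] + words[:1]
    return shifts


if __name__ == "__main__":
    for shift in circular_shifts("The Cat in the Hat"):
        print(shift)
```

A five-word line yields five shifts, the first being the unshifted line.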
The KWIC output is useful because it supports a simple lookup scheme. Suppose that the input is a long list of book titles, and that the user wants all the books on "programming". In the KWIC output, the title (perhaps shifted) of every one of those books will have an entry under "p".
For some words, e.g., "the" and "of", there is no point in generating a shift; no user will look up all lines beginning with "the". Therefore, KWIC provides a facility for ignoring such "noise words" in the output. The user supplies a noise words file, and any shift that begins with a noise word is suppressed.
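The suppression rule might look like the following, assuming the shifts are already whitespace-separated strings and the noise words are held in memory. `suppress_noise` is a hypothetical name. The first word is compared in lowercase because the specification treats case as irrelevant.

```python
def suppress_noise(shifts: list[str], noise_words: set[str]) -> list[str]:
    """Drop every shift whose first word is a noise word, ignoring case."""
    noise = {w.lower() for w in noise_words}
    kept = []
    for shift in shifts:
        words = shift.split()
        # Keep the shift only if it has a first word and that word is not noise.
        if words and words[0].lower() not in noise:
            kept.append(shift)
    return kept


if __name__ == "__main__":
    shifts = ["The Cat in the Hat", "Cat in the Hat The", "in the Hat The Cat"]
    print(suppress_noise(shifts, {"the", "in"}))
```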
kwic [-n noiseWordsFile] [linesFile ...]

If the -n argument is present, then the noise words are read from noiseWordsFile. Otherwise, if the KWICNOISEWORDS environment variable exists, then KWICNOISEWORDS is assumed to contain the name of the noise words file. If the -n argument is not present and the KWICNOISEWORDS environment variable does not exist, then the noise words are read from the file noiseWords in the current directory.
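The option handling and the fallback order for the noise words file can be sketched with Python's standard `argparse` module. The names `parse_args` and `noise_words_path` are hypothetical. Passing the environment in as a mapping is a choice made here so that the lookup can be exercised without modifying `os.environ`.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse: kwic [-n noiseWordsFile] [linesFile ...]"""
    parser = argparse.ArgumentParser(prog="kwic")
    parser.add_argument("-n", dest="noise_file", metavar="noiseWordsFile")
    # Zero or more input files; none means the lines come from stdin.
    parser.add_argument("lines_files", nargs="*", metavar="linesFile")
    return parser.parse_args(list(argv))


def noise_words_path(noise_file: Optional[str],
                     environ: Mapping[str, str] = os.environ) -> str:
    """Choose the noise words file: -n first, then $KWICNOISEWORDS, then noiseWords."""
    if noise_file is not None:
        return noise_file
    if "KWICNOISEWORDS" in environ:
        return environ["KWICNOISEWORDS"]
    return "noiseWords"  # relative path, so it resolves in the current directory


if __name__ == "__main__":
    args = parse_args(["-n", "myNoise", "titles.txt"])
    print(noise_words_path(args.noise_file))  # myNoise
```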
If one or more linesFile arguments are present, then the lines to be shifted are read from these files. Otherwise, the lines are read from stdin.
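The input rule can be sketched with the standard `fileinput` module, which reads stdin when it is given an empty list of files. `read_lines` is a hypothetical helper. Stripping only the trailing newline from each line is an assumption.

```python
import fileinput
from typing import List, Sequence


def read_lines(paths: Sequence[str]) -> List[str]:
    """Read lines from the given files in order, or from stdin if paths is empty."""
    # FileInput substitutes stdin ("-") when the file list is empty.
    with fileinput.FileInput(files=list(paths)) as stream:
        return [line.rstrip("\n") for line in stream]


if __name__ == "__main__":
    import sys
    for line in read_lines(sys.argv[1:]):
        print(line)
```

Using `FileInput` directly, rather than the module-level `fileinput.input()`, avoids the module's shared global state.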
In the noise words file, each word must appear on a line of its own, and the words must be sorted in ascending order. Case is irrelevant: if "the" is in the noise words file, then a shift beginning with "The" or "the" will be suppressed.
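Because the noise words file must be sorted, lookups can use binary search. The sketch below shows one way to do this and is not a reference implementation. It lowercases every word, which presumes the file is sorted case-insensitively. It also skips blank lines and rejects a file that is out of order. What kwic should do with an unsorted file is not stated, so raising an error is an assumption. `load_noise_words` and `is_noise` are hypothetical names.

```python
import bisect
from typing import List


def load_noise_words(path: str) -> List[str]:
    """Read one noise word per line, lowercased; reject a file that is not sorted."""
    with open(path) as f:
        words = [line.strip().lower() for line in f if line.strip()]
    # Assumption: the required ascending order is checked case-insensitively.
    if any(a > b for a, b in zip(words, words[1:])):
        raise ValueError(f"{path}: noise words are not sorted in ascending order")
    return words


def is_noise(word: str, noise_words: List[str]) -> bool:
    """Case-insensitive membership test by binary search over the sorted list."""
    key = word.lower()
    i = bisect.bisect_left(noise_words, key)
    return i < len(noise_words) and noise_words[i] == key
```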
For example, suppose that the input contains the following lines:

    The C Programming Language
    The Cat in the Hat

and the noise words file contains the following words:

    and
    in
    the

Then KWIC will write the following lines to stdout:

    C Programming Language, The
    Cat in the Hat, The
    Hat, The Cat in the
    Language, The C Programming
    Programming Language, The C
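The example can be reproduced with the end-to-end sketch below. The comma placement is inferred from the example output, where the comma appears to mark the point at which the original line ended and the following words are the ones that wrapped around from the front. The example does not cover two cases, so both are handled by assumption: an unshifted line is printed without a comma, and entries are sorted in plain code-point order. `kwic` is a hypothetical function name.

```python
from typing import Iterable, List


def kwic(lines: Iterable[str], noise_words: Iterable[str]) -> List[str]:
    """Return the sorted, noise-filtered circular shifts of all lines."""
    noise = {w.lower() for w in noise_words}
    entries = []
    for line in lines:
        words = line.split()
        for i, first in enumerate(words):
            if first.lower() in noise:
                continue  # suppress shifts that begin with a noise word
            # Shift i: words i..end of the original line, then the wrapped-around words.
            rest = " ".join(words[i:])
            wrapped = " ".join(words[:i])
            # Assumed format: ", " marks where the original line ended.
            entries.append(f"{rest}, {wrapped}" if wrapped else rest)
    return sorted(entries)  # assumption: plain code-point order


if __name__ == "__main__":
    titles = ["The C Programming Language", "The Cat in the Hat"]
    for entry in kwic(titles, ["and", "in", "the"]):
        print(entry)
```

Run as a script, this prints the five output lines of the example in the order shown.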