Searching Files
Many programs provide facilities for searching files for
groups of characters that match patterns.
These programs include:
- all the standard editors (vi, emacs, ed, ex, sed ...)
- special scripting languages such as awk and perl
- pattern matching commands, namely grep, egrep
and fgrep.
The Grep Family of Commands
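As a rough illustration of how the members of the family differ, the sketch below runs the same pattern through grep and through fgrep. It assumes the usual behaviour that fgrep treats its pattern as a fixed string rather than as a regular expression, and it spells fgrep as grep -F, the equivalent form on current systems. The sample file and its contents are hypothetical, created only for this example.

```shell
# Hypothetical sample file, created only for this illustration.
family_sample=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'a*c' > "$family_sample"

# grep: the period is a pattern character that matches any single
# character, so all three lines are printed.
grep 'a.c' "$family_sample"

# fgrep (written here as grep -F): the pattern is taken literally,
# so only the line "a.c" is printed.
grep -F 'a.c' "$family_sample"
```

Egrep accepts the extended notation described later in this section.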
Regular Expressions in Grep
The following conventions are shared by essentially all the programs
which support regular expression patterns (vi, awk,
perl, ...).
- Any ordinary character matches itself
- The special characters are:
. ^ $ [ * \
- The notations \. \^ \$
\[ \* and \\ match exactly one
occurrence of the special character given after the backslash.
- Special pattern matching notations:

  .        A period matches any single character (but not a
           newline).
  ^        A caret matches an empty string at the beginning of the
           line.
  $        A dollar symbol matches an empty string at the end of
           the line.
  [abc]    A set of characters, from which exactly one character
           will be matched. A shorthand notation exists for a
           range of characters, e.g. [a-z].
  [^abc]   This matches exactly one character, and this character
           is not contained in the set after the caret symbol.
  *        An asterisk following a one-character pattern (i.e. a
           period or the square bracket set notation) indicates
           that zero or more repetitions of that one-character
           pattern are allowed.

Note that only a subset of the notation has been covered above.
For more complete information, you should consult the on-line
manual pages.
Examples using Basic Grep Patterns
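The notations described above can be tried out with a few short commands. This is a minimal sketch rather than a definitive set of examples: the file of words is hypothetical and is created only for the illustration.

```shell
# Hypothetical word list, created only for this illustration.
words=$(mktemp)
printf '%s\n' 'cat' 'cot' 'coat' 'c.t' 'concat' 'ct' > "$words"

# A period matches any single character.
grep 'c.t' "$words"          # prints: cat, cot, c.t, concat

# ^ and $ anchor the pattern to the start and end of the line.
grep '^c.t$' "$words"        # prints: cat, cot, c.t

# A backslash removes the special meaning of the period.
grep 'c\.t' "$words"         # prints: c.t

# A set matches exactly one of the listed characters.
grep 'c[ao]t' "$words"       # prints: cat, cot, concat

# A caret inside the set matches one character NOT in the set.
grep 'c[^a]t' "$words"       # prints: cot, c.t

# An asterisk allows zero or more repetitions of the preceding set.
grep '^c[a-z]*t$' "$words"   # prints: cat, cot, coat, concat, ct
```

The patterns are enclosed in single quotes so that the shell passes characters such as * and $ to grep unchanged.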
Regular Expressions in Egrep
Egrep has an annoyingly different set of pattern matching operations
compared to grep.
It is generally more expressive, and includes the following operators,
which basic grep patterns either lack or support only in a more
restricted form.

  *        Matches zero or more repetitions of the preceding
           regular expression (i.e. we are not constrained to a
           one-character pattern).
  +        Matches one or more repetitions of the preceding regular
           expression.
  ?        Matches zero or one occurrence of the preceding regular
           expression.
  |        Separates two regular expressions, either of which may
           be matched.
  ( .. )   Sub-expressions in a regular expression may be enclosed
           in parentheses.

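The egrep operators listed above can be sketched with a few commands. The example writes egrep as grep -E, which is the equivalent spelling on current systems (some newer versions print a warning when the name egrep is used). The sample file is hypothetical, created only for the illustration.

```shell
# Hypothetical sample file, created only for this illustration.
spellings=$(mktemp)
printf '%s\n' 'color' 'colour' 'colouur' 'gray' 'grey' 'ab' 'abab' > "$spellings"

# ? allows zero or one occurrence of the preceding "u".
grep -E 'colou?r' "$spellings"     # prints: color, colour

# + requires one or more occurrences of the preceding "u".
grep -E 'colou+r' "$spellings"     # prints: colour, colouur

# | separates alternatives; parentheses limit its scope.
grep -E 'gr(a|e)y' "$spellings"    # prints: gray, grey

# * applies to the whole parenthesised sub-expression,
# not just to a one-character pattern.
grep -E '^(ab)*$' "$spellings"     # prints: ab, abab
```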
Command-Line Flags
The commonly used flags are:

  -i       Ignore the case of letters when pattern matching.
  -n       Show the line number of each line containing a match.
  -v       Invert the pattern matching, so that lines which do not
           match the pattern are output by grep/egrep/fgrep.
  -w       Match the pattern only against whole words in the file
           (i.e. this is equivalent to enclosing the pattern in
           \< and \> brackets).

More Examples of Use
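The following is a minimal sketch of the command-line flags described above, applied to a small hypothetical file that is created only for the illustration.

```shell
# Hypothetical sample file, created only for this illustration.
notes=$(mktemp)
printf '%s\n' 'The cat sat' 'concatenate files' 'a CAT nap' 'no match here' > "$notes"

# -i ignores case, so "cat" and "CAT" both match.
grep -i 'cat' "$notes"    # prints the first three lines

# -n shows the line number in front of each matching line.
grep -n 'cat' "$notes"    # prints: 1:The cat sat, 2:concatenate files

# -v outputs the lines that do NOT match.
grep -v 'cat' "$notes"    # prints: a CAT nap, no match here

# -w matches whole words only, so "concatenate" is not selected.
grep -w 'cat' "$notes"    # prints: The cat sat
```

Flags can be combined; for instance, grep -iw 'cat' would be expected to select both the first and the third line of this file.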