sh & csh), and the Unix awk program notation.
The #! characters on
the first line must be in columns one and two. The remainder
of the first line must give the location of the executable
file for the perl program. (If the script file is flagged as
executable, the Unix system will automatically invoke perl
when you use the name of the perl script as a command.)
    #!/public/bin/perl
    chdir("/usr/man/man1") || die "cannot cd to /usr/man/man1";

    # loop over all files in directory /usr/man/man1
    while ($filename = <*>) {
        if (!open(MANFILE, "$filename")) {
            print STDERR "could not read $filename\n";
            next;
        }
        # read the file, looking for a line reading:
        #     .SH NAME
        while (<MANFILE>) {
            if (/^\.SH *NAME\b/) {
                last;   # exit the loop
            }
        }
        if ($_) {
            # we found that line, read the following line
            $_ = <MANFILE>;
            if ($_) {
                s/\\//g;   # remove backslashes in $_
                print $_;
            } else {
                # It's an error if we are at the end of file
                print STDERR "$filename: eof found after .SH NAME\n";
            }
        } else {
            print STDERR "$filename: no .SH NAME line found\n";
        }
        close(MANFILE);
    }

Our example program visits a system directory named /usr/man/man1.
In that directory, there are files containing unformatted on-line manual information.
For example, there may be a file named ls.1v, and this contains the on-line documentation for the ls command.
The file is meant to be input to the nroff or troff text formatting packages on Unix.
(The groff program is an equivalent to nroff and troff.)
Our Perl script reads each file looking for a line which is exactly as follows.
    .SH NAME
(where the period is in column one). This line appears in each description to introduce a one-line synopsis of the command. For example, on SunOS, the ls.1v file contains the two lines
    .SH NAME
    ls \- list the contents of a directory
Our Perl script will output the second of these lines (after removing the backslash character).
The net result is that we obtain synopses of all the Unix commands for which documentation is available.
Here is a quick run through of the program logic:
1. The script changes to the directory /usr/man/man1 where the on-line documentation for commands normally resides. If we cannot change to that directory, we print an error message and quit.
2. The construct <*> matches all the filenames in the current directory, and delivers the next filename to us each time we use that construct. Each filename is assigned to the variable $filename. (When there are no more filenames, the <*> construct delivers a special undef value; this value is considered equivalent to False and causes the loop to terminate.)
3. The construct <MANFILE> implicitly reads the next line (as a string value) into the variable named $_. If no lines remain to be read, the undef value is assigned to $_.
4. The inner loop tests each line held in $_. If $_ matches the pattern ^\.SH *NAME\b then we exit the loop. (The ^ forces a match at the beginning of the line; the * allows zero or more repetitions of the preceding space; the \b construct insists that we are at a word boundary.)
5. After the loop, we read the following line into $_. If we successfully read that line, we use the statement
       s/\\//g;
   to remove all occurrences of a backslash in variable $_. (Without the g, only the first occurrence would be changed.)
6. Finally, we print the value of $_. In fact, the statement could be abbreviated to just:
       print;
   because $_ will be supplied as a default argument if none is given in the program. (And this default behaviour applies to many Perl operations.)
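The same logic can be tried on a machine that has no /usr/man/man1 directory. The following is only a sketch under stated assumptions: it scans an array of lines (the hypothetical @sample, standing in for the contents of one manual file) instead of reading a file, and the subroutine name synopsis is made up for the illustration.

```perl
# Hypothetical sample data standing in for the lines of a manual file.
@sample = (
    ".TH LS 1V\n",
    ".SH NAME\n",
    "ls \\- list the contents of a directory\n",
    ".SH SYNOPSIS\n",
);

# Return the line that follows ".SH NAME" with backslashes removed,
# or undef if no such line exists.
sub synopsis {
    local(@lines) = @_;
    local($line);
    while (@lines) {
        $line = shift(@lines);
        if ($line =~ /^\.SH *NAME\b/) {
            $line = shift(@lines);        # the line after .SH NAME
            return undef unless defined($line);
            $line =~ s/\\//g;             # remove all backslashes
            return $line;
        }
    }
    return undef;                         # no .SH NAME line found
}

$result = &synopsis(@sample);
print $result;
```

Keeping the search in a subroutine makes it possible to check the two error cases (no .SH NAME line, or nothing after it) without touching the file system.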
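Examples of numeric constants:
    0 99 -123 2.34 -6.3e23 24.34E-4
String constants may be enclosed in single quotes or in double quotes. If single quotes are used, the only escape sequences recognized are \' and \\.
The string can contain any characters including linefeeds.
If double quotes are used, all the usual C escape codes may be used (plus some more) and, as in csh, variable names may get expanded. Examples:
    'hi'
    'don\'t'
    'the backslash is \\, so there!'
    'line one
    line two'
    "line one\nline two"
    "a doublequote is \"!"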
Arithmetic operators:
    + - * / % **
(% is modulus, as in C; ** is exponentiation.)
Numeric comparison operators:
    < <= == >= > !=
String comparison operators:
    lt le eq ge gt ne
String operators:
    . x
The dot operator concatenates two strings. E.g. "abc" . "def" has the same value as "abcdef".
The x operator performs replication. E.g. "abc" x 3 produces the string value "abcabcabc".
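Because numeric and string comparisons use different operators, the same pair of values can compare differently depending on which operator is chosen. A small sketch (the variable names are illustrative only):

```perl
$a_num = 10;
$b_num = 9;

# Numeric comparison: 10 is greater than 9.
$numeric = ($a_num > $b_num) ? "greater" : "not greater";

# String comparison: "10" sorts before "9" because "1" precedes "9".
$string = ($a_num gt $b_num) ? "greater" : "not greater";

# Replication and concatenation can be combined.
$rule = ("-" x 5) . ">";

print "numeric: $numeric\n";
print "string:  $string\n";
print "$rule\n";
```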
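The name of a scalar variable begins with $. The name of an array variable begins with @. The name of an associative array variable begins with %. Examples:
    $Counter  $i  $a_silly_name_37X  @Array1  @A_B_C  %Map_Number_37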
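The assignment operator is =. E.g.
    $X = $Y + 1;
    @Array1 = ( 0, "Hi!", -4.5, "Bye!" );
Combined assignment, increment and decrement operators are also available:
    += -= *= /= %= **= ++ --
E.g.
    $X += 10;
    $Y = ++$X;
    $Z--;
The string operators have assignment forms too:
    .= x=
Examples:
    $A .= "abc";   ## same as $A = $A . "abc";
    $B x= 3;       ## same as $B = $B x 3;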
The chop() operator chops the last character off a string variable and returns that character. Examples:
    $A = "abcdef";
    chop($A);        ## sets $A to be "abcde"
    $X = chop($A);   ## sets $X = "e", $A = "abcd"
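An array value may be written as a list, and an array that appears inside a list contributes all of its elements:
    @A1 = ( 0, "A", -4.5 );
    @A2 = ( "First", @A1, "Last" );   ## @A2 has 5 elements
An individual array element is selected with a subscript. The element reference begins with $ (not @) to indicate a scalar value.
Arrays have, by default, zero origin indexing as in C.
Example:
    @A1 = ( 0, "A", -4.5 );
    $i = $A1[1];       ## assigns "A" to $i
    $A1[1] = 3.3;      ## replaces "A" with 3.3
Assigning an array to a scalar variable yields the number of elements:
    @A1 = ( 0, "A", -4.5 );
    $Len = @A1;        ## set $Len = 3
An array may be assigned to a list of variables; surplus values are ignored:
    @A1 = ( 0, "A", -4.5 );
    ($V0, $V1) = @A1;  ## sets $V0 = 0, $V1 = "A"
    ($X) = @A1;        ## sets $X = 0
A list of subscripts selects a slice of an array:
    @A1 = (10, 20, 30, 40, 50, 60);
    @A2 = @A1[2..4];   ## sets @A2 = (30, 40, 50);
    @A3 = @A1[4,0,2];  ## sets @A3 = (50, 10, 30);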
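The array operations above can be combined in one short script. This is a sketch with made-up data (@days and the other variable names are illustrative):

```perl
@days = ("Mon", "Tue", "Wed", "Thu", "Fri");

$count = @days;               # scalar assignment: number of elements
$first = $days[0];            # zero origin indexing
$last  = $days[$count - 1];   # last element
@mid   = @days[1..3];         # slice: ("Tue", "Wed", "Thu")
($x, $y) = @days;             # list assignment: extra values are ignored

print "count=$count first=$first last=$last\n";
print "mid=@mid x=$x y=$y\n";
```

An array interpolated into a double-quoted string, as with @mid here, is printed with its elements separated by spaces.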
The push() and pop() operators are provided for adding and removing values at the end of an array.
    push(@A, $val);      ## same as
                         ##   @A = (@A, $val);
    $X = pop(@A);        ## removes last value from @A
                         ## and assigns it to $X
The unshift() and shift() operations insert and extract values at the front of an array (so that, for example, push() and shift() together implement a queue).
    unshift(@Q, $val);   ## same as
                         ##   @Q = ($val, @Q);
    $x = shift(@Q);      ## same as
                         ##   $x = $Q[0];
                         ##   @Q = @Q[1..@Q-1];
Other operations on arrays include reverse(), sort() and chop(). E.g.
    @A1 = (1.1, 5.5, 2.2, 6.6, 3.3, 4.4);
    @A2 = reverse @A1;   ## sets @A2 = (4.4,3.3, ... 1.1)
    @A3 = sort @A1;      ## sets @A3 = (1.1,2.2, ... 6.6)
Applying chop() to an array is the same as applying chop() to each element.
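One point worth checking when sorting numbers: by default sort() compares its elements as strings, which happens to give the expected order for the values above but not in general. Perl lets you supply a comparison block; the sketch below uses the numeric comparison operator <=> with the special variables $a and $b. The data and variable names are illustrative.

```perl
@nums = (10, 9, 100, 1);

@as_strings = sort @nums;                 # string order:  1, 10, 100, 9
@as_numbers = sort { $a <=> $b } @nums;   # numeric order: 1, 9, 10, 100

# push/pop work at the end of the array, unshift/shift at the front.
@stack = ();
push(@stack, 1);
push(@stack, 2);
$top = pop(@stack);        # 2; @stack is now (1)
unshift(@stack, 0);        # @stack is now (0, 1)
$front = shift(@stack);    # 0; @stack is now (1)

print "@as_strings\n";
print "@as_numbers\n";
print "$top $front @stack\n";
```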
Compared with C, continue is renamed to next and break becomes last.
There are also some additions, which appear in the following list.
{ stmt1; stmt2; ... stmtM; }
if ( expr ) { stmt1; ... stmtM; }
if ( expr ) { stmt1; ... stmtM; }
else { stmt1; ... stmtN; }
if ( expr1 ) { stmt1; ... stmtM; }
elsif ( expr2 ) { stmt1; ... stmtN; }
...
else { stmt1; ... stmtP; }
unless ( expr ) { stmt1; ... stmtM; }
while ( expr ) { stmt1; ... stmtM; }
until ( expr ) { stmt1; ... stmtM; }
for( init_expr; test_expr; incr_expr ) {
stmt1; ... stmtM;
}
foreach $Var (@ListOfValues) {
stmt1; ... stmtM;
}
The next statement causes the next iteration of the loop to start immediately. I.e., control jumps to the top of the loop, as with a C continue statement.
The last statement causes Perl to exit the closest containing loop, as with a C break statement.
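A short sketch of next and last inside a foreach loop, followed by an until loop. The data values and variable names are illustrative.

```perl
@values = (3, -1, 4, -1, 5, 0, 9);
$sum = 0;

foreach $v (@values) {
    if ($v < 0)  { next; }    # skip negative values
    if ($v == 0) { last; }    # stop at the first zero
    $sum += $v;               # adds 3, 4 and 5
}

# until repeats the body while the condition is false.
$n = 0;
until ($n >= 3) {
    $n++;
}

print "sum = $sum, n = $n\n";
```

Note that the braces are required even when the body is a single statement, unlike C.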
Three filehandles are predefined: STDIN, STDOUT, and STDERR.
They correspond to the standard input, standard output and standard error output streams of Unix, respectively.
Given a filehandle such as STDIN, we can read one line from the file using the <STDIN> notation. E.g.
    $Line = <STDIN>;
reads the next line, as a string value, into variable $Line.
At end of file, $Line is assigned the special value undef.
The line that is read includes its terminating newline character ("\n"). Thus,
    $Line = <STDIN>;
    chop($Line);
is common usage.
If the <STDIN> notation is used alone as the condition of a while loop, the input line is implicitly assigned to the variable $_. Thus,
    while (<STDIN>) {
        print $_;
    }
copies all the standard input to the standard output stream.
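chop() removes the last character whatever it is, so a final line that lacks a trailing newline loses a real character. Perl 5 and later provide chomp(), which removes the character only if it is a newline. A brief comparison (the sample strings are made up):

```perl
$with    = "last line\n";
$without = "last line";

chop($with);       # removes the newline
chop($without);    # removes the "e": probably not what was intended

$safe = "last line";
chomp($safe);      # no trailing newline, so nothing is removed

print "[$with] [$without] [$safe]\n";
```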
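The print operator sends a list of values to the standard output. E.g.
    print "Variable X = ", $X, " Y = ", $Y, "\n";
sends five values in succession to the output.
An array may be printed too:
    print @A;
A filehandle may be given to send the output elsewhere:
    print STDERR "Unrecoverable error!\n";
The operator die prints to the standard error stream and then terminates the program. Example:
    die "Unrecoverable error!\n";
Formatted output, as in C, is available through the printf operator. For example:
    printf "Pi = %10.8f; e = %10.8f\n", $pi, $e;
(where $pi and $e are assumed to hold the two values).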
A file is opened for input with the open function. E.g.
    open(INPUTFILE,"/home/sue/project");
An undef value is returned if the file cannot be opened.
A file is opened for output in a similar way:
    open(OUTPUTFILE,">results.txt");
(The leading '>' indicates output; if the first two characters are '>>', the file is opened in append mode.)
A file is closed with the close function. E.g.
    close(INPUTFILE);
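Putting these operations together, the following sketch writes two lines to a file, appends a third with printf, and reads them all back. It assumes a Unix system with a writable /tmp; the scratch filename (built from the process ID $$) and the die messages are hypothetical.

```perl
$file = "/tmp/perl_io_demo.$$";     # hypothetical scratch file

open(OUT, ">$file") || die "cannot create $file\n";
print OUT "first\n", "second\n";
close(OUT);

open(OUT, ">>$file") || die "cannot append to $file\n";
printf OUT "%s %5.2f\n", "third", 3.14159;
close(OUT);

open(IN, $file) || die "cannot read $file\n";
$count = 0;
while (<IN>) {
    chop;                           # remove the trailing newline from $_
    $lines[$count++] = $_;
}
close(IN);
unlink($file);                      # remove the scratch file

$last = $lines[$count - 1];
print "$count lines, last is '$last'\n";
```

Testing the result of open with || die, as the first example program does, avoids silently reading from or writing to a filehandle that was never opened.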
The pattern expressions in Perl are more complex than regular expressions. Nonetheless, a good understanding of regular expressions helps a lot in understanding Perl pattern expressions.
There are a small number of rules for determining which strings match a regular expression:
The =~ operator may be used to match a string value on the left with a pattern on the right.
For example:
    if ($Line =~ /^$/) { print "empty line\n"; }
performs a match against the pattern ^$ (with the same meaning as in the Unix programs grep and egrep).
The slash characters surrounding the pattern indicate to perl that the meanings of special symbols are treated differently.
If the left operand and the =~ operator are omitted, a match against the $_ variable takes place.
E.g.
    if ( /^$/ ) { $emptyLineCnt++; }
The substitute operator edits a string. For example,
    $Line =~ s/man/person/;
changes the first occurrence of man in $Line to person.
By default, the substitution is applied to the $_ variable. The following is a complete statement:
    s/man/person/;      ## edit the $_ variable
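A short sketch combining both forms. Inside a foreach loop with no loop variable, $_ stands for the current array element, so a substitution applied to $_ changes the array itself. The sample data is made up.

```perl
@text = ("one man\n", "\n", "man to man\n", "\n");
$emptyLineCnt = 0;

foreach (@text) {
    if (/^$/) {              # match against $_
        $emptyLineCnt++;
        next;
    }
    s/man/person/g;          # edit $_, and so the array element
}

# Without the g modifier only the first occurrence is replaced.
$first = "man to man";
$first =~ s/man/person/;

print @text;
print "empty lines: $emptyLineCnt\n";
print "$first\n";
```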
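    ^              matches at the beginning of the line
    $              matches at the end of the line
    \b             matches at a word boundary
    \B             matches anywhere except at a word boundary
    {m,n}          at least m and at most n repetitions of the preceding item
    {m,}           at least m repetitions of the preceding item
    {n}            exactly n repetitions of the preceding item
    *              zero or more repetitions of the preceding item
    +              one or more repetitions of the preceding item
    ?              zero or one occurrence of the preceding item
    .              any character except newline
    [abc]          any one of the characters a, b or c
    [^abc]         any one character other than a, b or c
    (...)          grouping; the matched text can be referred to later
    \n \t \r \f    newline, tab, carriage return, form feed
    \d             a digit, same as [0-9]
    \D             a non-digit, same as [^0-9]
    \w             a word character, same as [a-zA-Z0-9_]
    \W             a non-word character
    \s             a whitespace character, same as [ \t\n\r\f]
    \S             a non-whitespace character
    \1 \2          the text matched by the first, second parenthesized group
    \023           the character with octal code 023 (etc.)
    \x7f           the character with hexadecimal code 7f
    \cD            the control-D character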
After a successful match, $` holds the text before the match, $& holds the matched text, and $' holds the text after the match, so that
    $` . $& . $'
is always the same as the original string.
The program:
    #!/public/bin/perl
    @a = ( 'a', 'ba', 'a*', 'aa*', '(aa)*', '\b', ' .\*',
           '(a|b|c)', '(a|b|c)* (c|b|a)*' );
    $s0 = "abc aaa x*";
    foreach $s (@a) {
        if ($s0 =~ /$s/) {
            print "$s\n\t!$`!$&!$'!\n";
        } else {
            print "$s\n\tno match\n";
        }
    }
Output (each indented result line actually begins with a tab character):
    a
            !!a!bc aaa x*!
    ba
            no match
    a*
            !!a!bc aaa x*!
    aa*
            !!a!bc aaa x*!
    (aa)*
            !!!abc aaa x*!
    \b
            !!!abc aaa x*!
     .\*
            !abc aaa! x*!!
    (a|b|c)
            !!a!bc aaa x*!
    (a|b|c)* (c|b|a)*
            !!abc aaa! x*!
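As a further illustration of grouping and the \1 back-reference, the sketch below looks for a word that is immediately repeated. The sample string and variable names are made up.

```perl
$s = "Paris in the the spring";

# \b(\w+) \1\b : a word, one space, then the same word again.
if ($s =~ /\b(\w+) \1\b/) {
    $before  = $`;      # text before the match
    $doubled = $&;      # the matched text
    $after   = $';      # text after the match
    print "found '$doubled' after '$before'\n";
} else {
    print "no repeated word\n";
}
```

The three variables are copied immediately because the next successful pattern match overwrites them.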
A subroutine is invoked with the & notation. E.g.
    &foo( $arg1, "arg2", 3 );   ## 3 args passed
    &bar( @argArray );          ## each array element is an argument
    &doSomething();             ## no arguments passed
A subroutine is defined like this:
    sub foo { ... }
Inside the subroutine, the arguments are available in the array @_.
E.g., our foo subroutine could receive its three arguments like this:
    sub foo {
        ($v1, $v2, $v3) = @_;
        ...
    }
or it could process them one at a time:
    sub foo {
        foreach (@_) {
            ## $_ holds the next argument to process
            ...
        }
        ...
    }
Variables are global unless declared otherwise. (The first version of foo above assigns to the global variables $v1, $v2 and $v3.) Local variables can be declared with local:
    sub foo {
        local($v1, $v2, $v3) = @_;
        ...
    }
and additional local variables may be created by executing a statement like this:
    local($temp);       ## create variable $temp
The new variables are initialized to the undef value.
The variables disappear when the subroutine returns.
The value returned by a subroutine is the value of the last expression evaluated. E.g.
    sub maxOfMany {
        local($result) = pop(@_);   ## remove last argument
        foreach (@_) {
            if ($_ > $result) {
                $result = $_;
            }
        }
        $result;                    ## return the result
    }
And we can invoke that subroutine as in this example:
    $m = &maxOfMany( -10, 3, 15, -6, 4 );
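The effect of local can be seen by giving a subroutine variable the same name as an existing global. In this sketch (sumOfMany and the variable names are hypothetical) the global $temp keeps its value across the call:

```perl
$temp = "global value";

sub sumOfMany {
    local($temp) = 0;         # hides the global $temp until we return
    foreach (@_) {
        $temp += $_;
    }
    $temp;                    # value of the last expression is returned
}

@nums  = (1, 2, 3, 4);
$total = &sumOfMany(@nums);   # each array element is an argument

print "total = $total, temp = $temp\n";
```

Without the local declaration, the call would have overwritten the global $temp with 10.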