Introduction to Perl

PERL

AN EXAMPLE PERL PROGRAM

The program is listed below. Note that the #! characters on the first line must be in columns one and two. The remainder of the first line must give the full path of the perl interpreter (here /public/bin/perl, a site-specific location; adjust it to match your system). If the script file is flagged as executable, the Unix system will automatically invoke perl when you use the name of the script as a command.
    #!/public/bin/perl

    chdir("/usr/man/man1") || die "cannot cd to /usr/man/man1";

    # loop over all files in directory /usr/man/man1
    while ($filename = <*>) {

	if (!open(MANFILE, "$filename")) {
	    print STDERR "could not read $filename\n";
	    next;
	}

	# read the file, looking for a line reading:
	#		.SH NAME
	while (<MANFILE>) {
	    if (/^\.SH  *NAME\b/) {
		last; # exit the loop
	    }
	}

	if ($_) {
	    # we found that line, read the following line
	    $_ = <MANFILE>;
	    if ($_) {
		s/\\//g; # remove backslashes in $_
		print $_;
	    } else {
		# It's an error if we are at the end of file
		print STDERR "$filename: eof found after .SH NAME\n";
	    }
	} else {
	    print STDERR "$filename: no .SH NAME line found\n";
	}
	close(MANFILE);
    }
Our example program visits a system directory named /usr/man/man1. That directory holds files containing unformatted on-line manual pages. For example, there may be a file named ls.1v, which contains the on-line documentation for the ls command. The file is meant to be input to the nroff or troff text formatting packages on Unix. (GNU groff provides equivalents of nroff and troff.) Our Perl script reads each file looking for the following line:
	.SH NAME
(where the period is in column one). This line appears in each description to introduce a one-line synopsis of the command. For example, on SunOS, the ls.1v file contains the two lines
	.SH NAME
	ls \- list the contents of a directory
Our Perl script will output that line (after removing the backslash character).

The net result is that we obtain synopses of all the Unix commands for which documentation is available.
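The core of the script, finding .SH NAME and returning the line after it, can be sketched in present-day Perl style with use strict, a lexical filehandle and three-argument open. This is a minimal sketch, not the original program: the subroutine name synopsis_from_fh is hypothetical, an in-memory filehandle (opening a reference to a string) stands in for a real file such as ls.1v, and the pattern's "  *" (a space followed by zero or more spaces) is written as the equivalent " +".

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): read an unformatted
# man page from $fh and return the line after ".SH NAME" with all
# backslashes removed. Returns undef (in scalar context) if there is no
# ".SH NAME" line or if the file ends right after it.
sub synopsis_from_fh {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ /^\.SH +NAME\b/;   # " +" is the same as "  *"
        my $next = <$fh>;
        return unless defined $next;            # EOF just after .SH NAME
        $next =~ s/\\//g;                       # strip every backslash
        return $next;
    }
    return;                                     # no .SH NAME line found
}

# Usage: an in-memory filehandle stands in for a file such as ls.1v.
my $page = ".TH LS 1V\n.SH NAME\nls \\- list the contents of a directory\n";
open(my $fh, '<', \$page) or die "cannot open in-memory file: $!";
print synopsis_from_fh($fh);   # prints: ls - list the contents of a directory
close($fh);
```

Returning undef for both failure cases keeps the sketch short; the original script instead prints a different error message for each case.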

Here is a quick run-through of the program logic:

  1. We change our directory into /usr/man/man1 where the on-line documentation for commands normally resides. If we cannot change to that directory, we print an error message and quit.

  2. The construct <*> matches all the filenames in the current directory, and delivers the next filename to us each time we use that construct.

  3. We assign the next filename into the variable $filename. (When there are no more filenames, the <*> construct delivers a special undef value; this value is considered equivalent to False and causes the loop to terminate.)

  4. We open the file for input. If we cannot open the file, we print an error message and go back to the top of the loop for the next file.

  5. We enter a loop, reading one line from the file on each iteration. When <MANFILE> is used by itself as the condition of a while loop, it reads the next line (as a string value, normally including the trailing newline) into the variable named $_. If no lines remain to be read, the undef value is assigned to $_ and the loop ends.

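  6. Inside the while loop, we use regular expression pattern matching to check the contents of $_. If $_ matches the pattern ^\.SH  *NAME\b (note the two spaces before the *) then we exit the loop.
    (The ^ forces a match at the beginning of the line; the * allows zero or more repetitions of the preceding space, so together with the first space the pattern requires at least one space between .SH and NAME; the \b construct insists that we are at a word boundary, so that a line such as .SH NAMES does not match.)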

  7. If we left the loop through last, meaning we found the line rather than reaching the end of the file, we read the next line into variable $_. If we successfully read that line, we use the statement
    	s/\\//g;
    
    to remove all occurrences of a backslash in variable $_. (Without the g, only the first occurrence would be changed.)

  8. And then we print the line in $_. In fact, the statement could be abbreviated to just:
    	print;
    
    because $_ will be supplied as a default argument if none is given in the program. (And this default behaviour applies to many Perl operations.)
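Steps 6 to 8 can be tried in isolation. The sketch below, assuming a handful of made-up input lines, applies the step 6 pattern to $_ inside a foreach loop, then contrasts s/\\// with s/\\//g and finishes with a bare print. The test strings are stored in arrays first, because foreach aliases $_ to each element and s/// would otherwise try to modify read-only literals.

```perl
use strict;
use warnings;

# Made-up header lines to try against the step 6 pattern, copied exactly
# from the program: "\.SH", a space, " *" (more spaces), "NAME", "\b".
my @candidates = (".SH NAME", ".SH   NAME", ".SHNAME", ".SH NAMES", " .SH NAME");
my %verdict;
for (@candidates) {                 # foreach aliases $_ to each element
    # With no =~ operator, the match is applied to $_ by default.
    $verdict{$_} = /^\.SH  *NAME\b/ ? "match" : "no match";
    print "$_: $verdict{$_}\n";
}

# Steps 7 and 8: substitution without and with /g, then a bare print.
my @lines = ('ls \- list the contents of a directory', 'a\b\c');
for (@lines) {
    (my $first_only = $_) =~ s/\\//;   # no /g: only the first backslash is removed
    s/\\//g;                           # /g: every backslash in $_ is removed
    print "$first_only\n";
    print;                             # no argument, so $_ is printed
    print "\n";
}
```

The first two candidates match; .SHNAME fails because at least one space is required, .SH NAMES fails at \b, and " .SH NAME" fails because ^ pins the match to column one. For 'a\b\c', the version without /g still contains one backslash.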

SCALAR DATATYPES AND CONSTANTS

OPERATORS

VARIABLES

ASSIGNMENT OPERATORS, etc

OPERATIONS ON ARRAYS

CONTROL STRUCTURES

The control structures are similar to C's, except that C's continue statement is spelled next and break is spelled last. Perl also adds some control structures that C does not have.
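A short illustration of next and last, using a made-up list of numbers. The second loop shows loop labels, which C lacks: next and last can name an enclosing loop, so an inner loop can move straight on to the next iteration of an outer one. The label ROW is an arbitrary name chosen for this example.

```perl
use strict;
use warnings;

# Made-up data: add up the numbers, skipping negatives and stopping at 0.
my @numbers = (3, -1, 4, -1, 5, 0, 9, 2);
my $sum = 0;
for my $n (@numbers) {
    next if $n < 0;    # like C's continue: go on to the next element
    last if $n == 0;   # like C's break: leave the loop at once
    $sum += $n;
}
print "sum = $sum\n";  # prints: sum = 12

# Loop labels: next and last can name an enclosing loop.
my @kept;
ROW: for my $row ([1, 2, 3], [4, -5, 6], [7, 8, 9]) {
    for my $x (@$row) {
        next ROW if $x < 0;   # abandon the whole row, not just this element
    }
    push @kept, $row->[0];    # reached only for rows with no negative value
}
print "first elements of kept rows: @kept\n";   # prints: ... 1 7
```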

INPUT-OUTPUT AND FILE OPERATIONS

PATTERN MATCHING

Regular expressions

Perl patterns

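Frequently, a pattern will match more than one substring in a string. Perl then chooses the leftmost position at which any match is possible. At that position, quantifiers such as * are greedy, taking as many repetitions as they can while still letting the rest of the pattern match, and alternatives separated by | are tried from left to right, with the first one that leads to an overall match being used. The result is often, but not always, the longest match at that position: (a|ab) matched against "ab" matches only "a".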
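The sketch below makes these rules concrete. The helper matched is hypothetical; it simply returns $& after a successful match (or undef). The cases show the earliest start winning over a longer later match, a greedy *, and two orderings of the same alternatives.

```perl
use strict;
use warnings;

# Hypothetical helper: return what $re matches in $str ($&), or undef.
sub matched {
    my ($str, $re) = @_;
    return $str =~ $re ? $& : undef;
}

my %result = (
    leftmost  => matched("ab abbbb", qr/ab+/),  # 'ab': the earliest start wins,
                                                # although 'abbbb' is longer
    greedy    => matched("aaab",     qr/a*/),   # 'aaa': * takes all it can
    first_alt => matched("ab",       qr/a|ab/), # 'a': first alternative that works
    reordered => matched("ab",       qr/ab|a/), # 'ab': same choices, other order
);
print "$_: '$result{$_}'\n" for sort keys %result;
```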

Example program

The program below illustrates the pattern operators. Note that, after a successful match, $& contains the matched substring, $` contains the portion of the original string preceding the match, and $' contains the portion of the original string following the match. Thus
	$` . $& . $'
is always the same as the original string.

The program:

	#!/public/bin/perl

	@a = (  'a',    'ba',           'a*',
		'aa*',  '(aa)*',        '\b',
		' .\*', '(a|b|c)',      '(a|b|c)* (c|b|a)*'
	);

	$s0 = "abc aaa x*";

	foreach $s (@a) {
		if ($s0 =~ /$s/) {
			print "$s\n\t!$`!$&!$'!\n";
		} else {
			print "$s\n\tno match\n";
		}
	}
Exact output:
	a
		!!a!bc aaa x*!
	ba
		no match
	a*
		!!a!bc aaa x*!
	aa*
		!!a!bc aaa x*!
	(aa)*
		!!!abc aaa x*!
	\b
		!!!abc aaa x*!
	 .\*
		!abc aaa! x*!!
	(a|b|c)
		!!a!bc aaa x*!
	(a|b|c)* (c|b|a)*
		!!abc aaa! x*!
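The same three substrings can also be recovered from the offset arrays @- and @+: $-[0] is where the whole match starts and $+[0] is where it ends. In Perl 5.18 and earlier, mentioning $`, $& or $' anywhere in a program slowed down every successful match, which is one reason the offsets are often preferred. This sketch re-runs four of the patterns above, checks that both methods agree and that the three parts rebuild the original string, and prints in the same !...!...!...! format.

```perl
use strict;
use warnings;

my $s0 = "abc aaa x*";
my @rows;
for my $pat ('a*', '(aa)*', ' .\*', '(a|b|c)* (c|b|a)*') {
    next unless $s0 =~ /$pat/;
    # $-[0] is the offset where the match starts, $+[0] where it ends.
    my $pre   = substr($s0, 0, $-[0]);
    my $match = substr($s0, $-[0], $+[0] - $-[0]);
    my $post  = substr($s0, $+[0]);
    die "offsets disagree with \$`, \$& or \$'"
        unless $pre eq $` && $match eq $& && $post eq $';
    die "parts do not rebuild the string" unless $pre . $match . $post eq $s0;
    push @rows, "!$pre!$match!$post!";
    print "$pat\n\t$rows[-1]\n";
}
```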

SUBROUTINES