CSCI 330 Quiz 3 Sample Solutions

Question 1 Regular expressions [10 marks]

(Part i) Provide a regular expression for each of the patterns described below:
  1. Names consist of one or more alphabetic characters (anything in 'a' to 'z' or 'A' to 'Z').
  2. Keywords match any one of the following: "def", "base", "hex", "ace".
  3. Comments consist of any sequence of characters that begins with '[' and ends with ']' and does not have any ']' inside.
  4. Hexlit: consist of one or more hex characters (anything in '0' to '9' or 'a' to 'f' or 'A' to 'F').

Sample solution
Names: [a-zA-Z]+
Keywords: def | base | hex | ace
Comments: [\[][^\]]*[\]]
Hexlit: [0-9a-fA-F]+

(Part ii) First, identify which pairs of regular expressions from part i conflict (overlap) and in what ways, then suggest an ordering to use when matching the regexes against tokens to resolve these conflicts and briefly justify your ordering.

Sample solution
Comments don't overlap with any of the others, can go anywhere in the order

"def" and "ace" overlap with Hexlit, all four keywords overlap with Names
   and it seems likely that keywords would be higher priority than either,
   thus the Keywords rule should be before Names and Hexlit

Names and Hexlit overlap for anything in [a-fA-F]*
    e.g.  aaa+bbb could easily describe the sum of two hexlits or the sum of two identifiers
    and I don't see an overarching rationale for prioritizing one or the other:
       so it could come down to a language philosophy decision
     *or* tweak your regular expressions to eliminate the ambiguity,
          like add some unique prefix character(s) for hexlits, e.g. #[0-9a-fA-F]+

Question 2 Context free gramars [14 marks]

Suppose our language has the token set and grammar rules shown below, and we are attempting to parse the expression
    22   +   a   *   19   *   b   +   c

(i) Sketch two different valid parse trees for the expression.

Sample solution

There is a core of the parse tree that is fixed, the ambiguity surfaces in a subtree within in.
The ascii sketch below shows the overall tree then the two different subtree versions.

           expr
          / |  \
      expr PLUS mult
     / |  \        \
 expr PLUS subtree  IDENT
  |                  (c)
 mult
  |
NUMBER
 (22)

       mult                                     mult
     / |   \                                  / |   \
 mult TIMES mult                          mult TIMES mult
  |        / |   \                      /  |  \       |
IDENT  mult TIMES mult              mult TIMES mult  IDENT
 (a)    |           |                |          |     (b)
      NUMBER      IDENT            IDENT      NUMBER
       (19)        (b)              (a)        (19)

(ii) Briefly explain how/why this shows an ambiguity in the grammar and the nature of that ambiguity.

Sample solution
Two structurally different parse trees for the same input in a language is the very definition of ambiguity,
and the importance of the ambiguity in this example is that
one subtree implies the subtree 'meaning' is (a*19)*b while the other implies a*(19*b), resulting in different
intepretations of the correct sequence of computations.

In a more general sense, the rule mult: mult TIMES mult means that in any subexpression
with a sequence of TIMES operators they could justifiably be parsed in any order relative to
one another.

 Tokens
PLUS: "+"
TIMES: "*"
EQ: "="
SEMI: ";"
NUMBER: [0-9]+
IDENT: [a-z]+
 CFG rules
many: one many;
many: one;
one: IDENT EQ expr SEMI;
expr: expr PLUS mult;
expr: mult;
mult: mult TIMES mult;
mult: IDENT;
mult: NUMBER