Sample solution: Identifiers: [A-Z][a-zA-Z0-9]* Integers: [-]?[0-9]+ Reals: [.][0-9]+[x][+-][0-9]+ |
Sample solution: keywords and identifiers can't clash since keywords are all lowercase whereas identifiers must start with an uppercase, so no issues there reals (and their embedded x,+,-) can't conflict with anything else because of the leading . we do need to check integers before the - operator, however, since we'd want -12 to be treated as the integer -12 not the - operator followed by the integer 12 |
Token types: IDENTIFIER: [a-z]+ INTEGER: [0-9]+ ADD: [+] MULT: [*] Grammar rules: expression --> multexpr ADD expression expression --> multexpr multexpr --> value multexpr --> multexpr MULT multexpr value --> IDENTIFIER value --> INTEGER
Sample solution:
The ambiguity stems from the multexpr --> multexpr MULT multexpr.
If we have a token sequence like x * y * z either of the two * operations
could be parsed first, producing either of the following parse trees:
multexpr
/ \
/ \
value multexpr
| / \
IDENTIFIER multexpr multexpr equating to
(x) | | (x * (y * z))
value value
| |
IDENTIFIER IDENTIFIER
(y) (z)
multexpr
/ \
/ \
multexpr multexpr equating to
/ \ \ ((x * y) * z)
multexpr multexpr value
| | \
value value IDENTIFIER
| | (z)
IDENTIFIER IDENTIFIER
(x) (y)
|
Token types: IDENTIFIER: [a-z]+ INTEGER: [0-9]+ ADD: [+] ASSIGN: [=] MULT: [*] SEMI: [;] Grammar rules: assignment --> IDENTIFIER ASSIGN expr SEMI expr --> subexpr expr --> subexpr op expr subexpr --> val subexpr --> subexpr ASSIGN val val --> IDENTIFIER | INTEGER op --> ADD | MULT assignment is the 'starting' non-terminal
Sample solution:
(i) precedence and associativity
The leftmost assign (from the assignment rule) is a special case in that it
will always get parsed out first (evaluated last) and there can only be one.
The remainder of the discussion deals with precedence within the subexpr's.
The only way ADD/MULT can be processed is through expr rules,
and all the expr rules must be applied before moving on to subexprs,
so all the ADD/MULTs must be parsed out before/above any of the ASSIGNs.
This means they'll be higher in the parse tree and evaluated later than ASSIGNs
(since the tree is constructed top-down but evaluated bottom-up).
Thus ASSIGN has higher precedence thand ADD/MULT.
ADD and MULT are treated interchangably within expr, so have the same
associativity and precedence.
In terms of associativity, if we consider x = y = z we can see in the subexpr rule
that the rightmost = must be parsed first, since the thing to its right can only
be a value (identifier or integer)
Thus it parses like
=
/ \
= z
/ \
x y
which effectively means ((x = y) = z) so assignments evaluate left-to-right.
The rules for +/* are the opposite: the expr --> subexpr op expr means the portion
with any other */+ must appear in the right subtree, so x * y * z gives
*
/ \
x * which effectively means (x * (y * z))
/ \ and implies */+ evaluate right to left
y z
(ii) is this sensible?
From a conventional viewpoint probably not: this is the opposite associativity to
"normal" math, and the opposite associativity to the way most languages handle assignment,
and the leftmost = operator has a different precedence level than all the other ones do.
Consider a = b + c = d = e + f is permitted and gives something like
a = (b + (((c = d) = e) + f))?
(That doesn't mean it's unusable, and there may be a solid justification for it
in some special-purpose language, but it would be counterintutitive for most
programmers and probably not recommended in a general-purpose language.)
|
Token types: Grammar rules
IDENTIFIER: [a-z]+ program --> statements
INTLIT: [0-9]+ statements --> stmt
STRLIT: ["][^\n]*["] statements --> stmt statements
ASSIGN: [=] stmt --> assignment
SEMI: [;] stmt --> declaration
INT: "integer" assignment --> IDENTIFIER ASSIGN value SEMI
STR: "string" declaration --> IDENTIFIER type SEMI
type --> STR
type --> INT
value --> INTLIT
value --> STRLIT
Briefly outline the type/declaration checking that would need to be performed in each of
the relevant rules.
Sample solution:
We'd need some way to record/look up what variables have been declared
and what type they were declared with, I'll assume here we're using
some form of symbol table.
We'd also need to evaluate the data type associated with the right hand value
in the assignment statement, so here I'll assume we can associate types with
the nonterminals as we evaluate rules.
Given those assumptions, a workable checking approach would be:
value --> INTLIT
action: associate datatype integer with this 'value' nonterminal
value --> STRLIT
action: associate datatype string with this 'value' nonterminal
type --> STR
action: associate datatype string with this 'type' nonterminal
type --> INT
action: associate datatype integer with this 'type' nonterminal
declaration --> IDENTIFIER type SEMI
action: lookup the identifier in the symbol table, error if previously declared
insert the identifier into the table with whatever datatype is associated with the 'type' nonterminal
assignment --> IDENTIFIER ASSIGN value SEMI
action: lookup the identifier in the symbol table, error if not declared yet
check the datatype for that identifier (from the lookup) against the datatype
associated with the 'value' nonterminal, error if the two datatypes don't match
|