Sample solution Names: [a-zA-Z]+ Keywords: def | base | hex | ace Comments: [\[][^\]]*[\]] Hexlit: [0-9a-fA-F]+ |
Sample solution Comments don't overlap with any of the others, can go anywhere in the order "def" and "ace" overlap with Hexlit, all four keywords overlap with Names and it seems likely that keywords would be higher priority than either, thus the Keywords rule should be before Names and Hexlit Names and Hexlit overlap for anything in [a-fA-F]* e.g. aaa+bbb could easily describe the sum of two hexlits or the sum of two identifiers and I don't see an overarching rationale for prioritizing one or the other: so it could come down to a language philosophy decision *or* tweak your regular expressions to eliminate the ambiguity, like add some unique prefix character(s) for hexlits, e.g. #[0-9a-fA-F]+ |
Sample solution There is a core of the parse tree that is fixed, the ambiguity surfaces in a subtree within in. The ascii sketch below shows the overall tree then the two different subtree versions. expr / | \ expr PLUS mult / | \ \ expr PLUS subtree IDENT | (c) mult | NUMBER (22) mult mult / | \ / | \ mult TIMES mult mult TIMES mult | / | \ / | \ | IDENT mult TIMES mult mult TIMES mult IDENT (a) | | | | (b) NUMBER IDENT IDENT NUMBER (19) (b) (a) (19) |
Sample solution Two structurally different parse trees for the same input in a language is the very definition of ambiguity, and the importance of the ambiguity in this example is that one subtree implies the subtree 'meaning' is (a*19)*b while the other implies a*(19*b), resulting in different intepretations of the correct sequence of computations. In a more general sense, the rule mult: mult TIMES mult means that in any subexpression with a sequence of TIMES operators they could justifiably be parsed in any order relative to one another. |
Tokens PLUS: "+" TIMES: "*" EQ: "=" SEMI: ";" NUMBER: [0-9]+ IDENT: [a-z]+ | CFG rules many: one many; many: one; one: IDENT EQ expr SEMI; expr: expr PLUS mult; expr: mult; mult: mult TIMES mult; mult: IDENT; mult: NUMBER |