| Known issues in lab4 sample solution
this is here as a just-in-case: if anyone spots a bug in the lab4 sample solution let me know and I'll share it here along with a suggested fix |
program
open
def x integer ;
# x is accessible here
repeat open
# x is accessible here
def y real ;
# x and y are accessible here
close until (x < 10) ;
# x is accessible here
close
# valid
program
open
def x integer ;
repeat open
def x integer ;
close until (x < 10) ;
close
program
open
def x integer ;
def x integer ; # invalid, already declared an x once in this scope
close
(Note: the important part here is that we're trying to redeclare the same name
in the same scope, it would still be an error if the second def x was as a real.)
int nextID; /* id to be assigned to next scope */ int scopes[128]; /* array of scopes */ int numScopes; /* how many scopes are currently on the stack */ void initScopes(); /* init nextID, numScopes, and scopes array */ int pushScope(); /* push annd increment nextID, return 0 on fail or numScopes otherwise */ int popScope(); /* pop and return top scope id, 0 if fails */ int topScope(); /* return top scope id, 0 if fails */e can call the initScopes from the main routine In the yacc rule for blocks, where block -> open statements close, we can insert a block of C to do the push operation after the open, and a block of C to do the pop operation after the close (along with error checking).
block: OPENKEY { if (pushScope() == 0) { deal with error, push failed, e.g. maybe stack is full }
statements
CLOSEKEY { if (popScope() == 0) { deal with error, pop failed somehow }
SEMI ;
%union { long info }
%token<long> OPENKEY CLOSEKEY etc
%type<long> program statements etc
Now, rather than just a single long, we'll allow storage of an entire struct for each token
and nonterminal, e.g.
%union { struct itemInfo { char content[128], int dtype; } info; }
%token<struct itemInfo> OPENKEY CLOSEKEY etc
%token<struct itemInfo program statements etc
This specifies that each token and nonterminal gets an info.content field (char array
to hold the text of an identifier or literal)
and an info.dtype field (e.g. to use 0 for unknown, 1 for integers, 2 for reals).
Setting fields in the lex file
We can get at these fields for a specific token inside the lex file through a variable named yylval,
e.g. yylval.info.content or yylval.info.dtype.
If you recall, in the lex file we had access to another local variable named yytext, containing
a copy of the text pattern that was just read.
If we wanted to copy yytext into the yylval.info.content for an identifier we can thus use
something: AAA foo ICK {
/* look at AAA and ICK's dtypes and, based on them, set something's dtype */
};
At the bottom layers of the parsing rules the actions are relatively simple
(here again assuming we've decided to use 1 for dtype when something is an integer,
2 when we've decided it's a real).
value: INTLIT { $<info.dtype>$ = 1; }
value: REALLIT { $<info.dtype>$ = 2; }
Similarly for type specifiers, if we had typespec -> INTKEY or REALKEY then we could
record the correct type for the typespec nonterminal using
typespec: INTKEY { $<info.dtype>$ = 1; }
typespec: REALKEY { $<info.dtype>$ = 2; }
For expressions, we can look at the types of its subexpressions and base our decisions on that, e.g.
addexpr: addexpr addop multexpr {
/* if the RHS addexpr and multexpr have the same type then use that for the LHS addexpr
otherwise use real as the type */
if ($<info.dtype>1 == $<info.dtype>2)
$<info.dtype>$ = $<info.dtype>1;
else $<info.dtype>$ = 2;
};
(The above assumes that every subexpression would have dtype 1 or 2: the logic would
be a little more complex if there are other possible types to worry about.)
Logic like the above can be used for everything except a rule like
| One last complication with the positions in $<info.whatever>N
If you have interleaved C chunks in your rule those count as 'positions', e.g. thing: aaa bbb { C code stuff } ccc ; ccc is in position 4 NOT position 3. |
/* storage for a single symbol table entry */
struct SymEntry {
char name[128]; /* symbol name */
int scopeId; /* declaration scope */
int typeId; /* 1 for integer, 2 for real */
};
struct SymEntry SymbolTbl[128]; /* just an array with space for 128 (or whatever) entries */
int numSyms; /* number of symbols currently in table */
void initSyms(); /* initializes numSyms for SymbolTbl */
int insert(char nm[], int r, int c, int sc, int typ); /* insert new entry, return 1 if ok, 0 otherwise */
struct SymEntry* lookup(char nm[]); /* return ptr to entry of innermost name match in active scopes, null if not found */
The initSyms would be called in main to set up the table (maybe just sets numSyms to 0).
The insert is pretty straightforward: if the table is already full then return an error (0),
otherwise fill in the next SymbolTbl array position with the passed data (remember to use strncpy
for the name), increment numSyms and return true (1).
The lookup is a bit tricker: we want to search from the innermost current scope to the outermost,
returning a pointer to the first variable found with a matching name. This means searching
backwards through the activescopes, e.g.
for s = numScopes-1 downto 0:
currentScope = activeScopes[s]
for e = 0 to numSyms-1:
if SymbolTbl[e]'s name matches the parameter nm
return the address of SymbolTbl[e]
return null if we never found a match
Using lookup and insert in our grammar rules
Within our grammar rules we would make appropriate use of insert and lookup, e.g.
vardecl: DEFKEY IDENT typespec SEMI {
/* $<info.content>2 has the name of the variable being declared
* $<info.dtype>3 has the 1 or 2 for the desired type
* topScope() tells us what scope we're currently in
*
* we can call lookup(the variable name) to see if there's already a variable with that name,
* (lookup returns a pointer to the variable's entry),
* and it's an error if that variable's declared scope is the exact same as our current scope
* as long as that isn't the case, we can simply call insert,
* passing the name, scope, and data type
* (and checking the return type to make sure insert didn't fail)
*/
};
value: IDENT {
/* call lookup on the variable, if it returns null then it's an error
* (undeclared variable), otherwise look up the ->typeId to find out
* what data type to assign to value ($<info.dtype>$)
*/
};
assign: SETKEY IDENT expr SEMI {
/* call lookup on the identifier, if it returns null then it's an error
* (undeclared variable), otherwise get the variable's type from the ->typeId
* in the returned pointer and compare that with the info.dtype for expr
* just in case the variable is an int and the expr is a real
* (in which case print a warning but keep going)
*/
};
Aside from the usual foray into debugging and testing, that should take care of everything
we need to get working type checking and variable declaration/scope checking working
for a basic language!