| Question | Mark |
| 1. Tokenizing IttyBitty in Ruby [10 marks] | |
| 2. Auto-generating scripts (cmake) [10 marks] | |
| 3. Parse trees and symbol tables [10 marks] | |
| 4. Grammars and code translation [10 marks] | |
| 5. C macros and C++ templates [10 marks] | |
| Exam Total (Best 4) [40 marks] |
Question 1: Tokenizing IttyBitty in Ruby [10]
Given the IttyBitty specifications and Ruby tokenizer studied this term (copies are appended to the end of the exam), suppose we wanted to replace the 'integer' token type with a set of token types representing positive integers of different bases:
(i) Show the modifications necessaray for the IttyBitty grammar rules to support this change (i.e. what happens to the rule Value --> integer, and what new rules need to be added?).
(ii) Show an exact set of changes/additions that would be needed in order for the Ruby tokenizer to support the revised language specifications. (Note: for parts (i) and (ii) you only need to show changes/additions, you do not need to copy/rewrite the entire grammar or tokenizer.)
Sample solution:
(i) We need to replace the rule for integer with a set of rules for the
different types of integer, possibly making Integer a new non-terminal,
e.g. the old rule was
Value --> integer
the new rules could be
Value --> Integer
Integer --> binary
Integer --> decimal
Integer --> octal
Integer --> hexadecimal
Similarly the integer regex in the original grammar
would be removed, and the regexes for the four new
integer categories would be inserted instead, i.e.
binary b[0-1]+
octal o[0-7]+
decimal d[0-9]+
hexadecimal x[0-9a-f]+
Note that the variable regex would be impacted by this,
since now there is ambiguity with tokens whose pattern
is x[a-f]+ ... each of these could be a valid hex pattern
or a valid identifier under the current rules.
One solution would be to specify that variable identifiers
could not begin with the letter x, e.g.
variable [a-wyzA-WYZ]+
(ii) The changes to the ruby tokenizer would mimic those listed above,
the when case for integers would need to be replaced with four when
cases, one for each of the regexes identified in (i),
and either (a) the when case for variables would need to have its
regex adjusted to remove ambiguity, or (b) we would need to define
which interpretation was correct for the x[a-f]+ case, and ensure
that interpretation's when case came first in the code block.
|
Question 2: Auto-generating scripts (cmake) [10]
The cmake program generates a custom makefile for a project
by
(i) examining source code for dependencies (e.g. detecting which files #include which others),
(ii)
obtaining user responses to a set of project configuration questions provided by the project
developer,
(iii) applying the data
from parts (i) and (ii) to a template, also provided by the project developer.
1. Does what the project developer does constitute metaprogramming (the developer providing the template and the project configuration questions)? Why or why not? Clearly justify your answer.
2. Does writing the cmake software itself constitute metaprogramming? Why or why not? Clearly justify your answer.
Sample solution:
For both of these cases the important part is the justification you provide,
not your actual choice of yes/no.
|
Question 3: Parse trees and symbol tables [10]
Many high level languages support both integer and floating point numeric data, using the single '+' operator to represent both forms of addition.
However, most assembly languages use different instructions for integer addition than for floating point addition, e.g. ADD vs ADDFLT.
During the process of compilation, the compiler must then identify which is the correct type of instruction to use in the resulting computation.
Describe, in terms of parse trees and symbol tables, how a compiler is able to do this. Use the expression 3.2 * y + 4 - x as an example.
Sample solution:
To determine which type of operation should be used in any given
computation, the compiler must be able to identify the data types
of the operands.
If the operands were simple numeric values it would be relatively
easy for the compiler to simply consider the token type, and
use a fixed set of rules for operation selection, e.g.
float op float: use float
int op int: use int
float op int: use ? float
int op float: use ? int
However, if either of the operands is a variable, then the data type
of that variable must be determined, which is where the symbol table
comes in. In a statically typed language, the type of the variable
would generally be determined in its declaration statement, and recorded
in the symbol table to be looked up later, e.g. if we saw an instruction
like int x,y; we would add x and y to the table along with their
now known type.
Furthermore, if the operands are expressions, then those expressions
must first be evaluated to determine their resulting types
before identifying the appropriate data type for the current op.
This is where the parse tree comes in, e.g.
given the expression 3.2*y+4-x, apply the grammar rules to produce
a parse tree identifying the "meaning" of the expression
suppose the resulting parse tree is as shown on the left below,
then the tree can be evaluated bottom-up to identify the
types of the resulting operands for each operator
- (iv) once the + type is known from (iii), and the type of x is known from the
/ \ symbol table, we can determine the type of - op to use
+ x (iii) once the type of * op is know, and knowing the type of 4,
/ \ we can determine the + type
* 4 (ii) from the types determined in (i), we know the float*int rule should
/ \ be used to determine the right * op to use
3.2 y (i) 3.2 is a float, look up y in the symbol table, say it's an int
|
Question 4: Grammars and code translation [10]
Suppose we are working on a translator to take source code written in one language and translate it to another language.
For this question, we consider specifically the problem of translating sequences of if-then-else cases from one language to another.
The table below shows the grammar rules for the two languages, with non-terminals shown in uppercase beginning with an underscore.
Given a parse tree for an if-else sequence in language I, outline (pseudocode + discussions) what the translator would need to do to produce the logically-comparable parse tree for language II.
LANGUAGE I _IFELSESEQ --> _IF _IFELSESEQ --> _IF _ELSESEQ _ELSESEQ --> _ELSE _ELSESEQ --> _ELSIF _ELSESEQ --> _ELSIF _ELSESEQ _IF --> if _CONDITION _BLOCK _ELSIF --> elsif _CONDITION _BLOCK _ELSE --> else _BLOCK | LANGUAGE II _IFELSESEQ --> (cond _CONDLIST ) _CONDLIST --> _CONDCASE _CONDLIST --> _CONDCASE _CONDLIST _CONDCASE --> ( _CONDITION _BLOCK ) |
Sample solution:
If we look at the trees for a typical structure, we can see the translation
patterns we need to apply, e.g. (shortening _CONDITION and _BLOCK below)
The first version for a statement with an if, elsif, elsif, else:
_IFELSESEQ
/ \
_IF _ELSESEQ
/ | \ / \
if _CONDT _BLK _ELSIF _ELSESEQ
/ | \ / \
elsif _CONDT _BLK _ELSIF _ELSESEQ
/ | \ |
elsif _CONDT _BLK _ELSE
/ \
else BLOCK
The corresponding cond version:
_IFELSESEQ
/ | \
(cond _CONDLIST )
/ \
_CONDCASE _CONDLIST
/ | / \
(_CONDT _BLK) _CONDCASE _CONDLIST
/ | / \
(_CONDT _BLK) _CONDCASE _CONDLIST
/ | |
(_CONDT _BLK) _CONDCASE
/ |
(_CONDT _BLK)
To transform from the first to the second:
we introduce a (cond subtree on the left at the top level,
a ) subtree on the right at the top level,
then treat that first _CONDT _BLOCK as an _ELSESEQ
for each _ELSESEQ, we replace the _ELSIF with a _CONDLIST
- the _CONDT and _BLOCK subtrees are identifcal, only the
surrounding syntax changes, from the elsif to the (cond .... )
for the final else case we simply replace with a _CONDCASE,
substituting (t for the else
|
Question 5: C macros and C++ templates [10]
Suppose in our C source code we want to be able to use round brackets ( ) for block delimiters in our if/else if/else statements, instead of the traditional curly brackets { }
We decide to try the following set of macros to take our customized C variant and transform it into valid C code just prior to compilation:
#define if(...) if(__VA_ARGS__) Block
#define else else Block
#define Block(...) { __VA_ARGS__ }
(i) Would our macros correctly translate the following into valid C?
if ( x < y ) ( a = 1 ; b = 2 ; ) else ( b = 1 ; a = 2 ; )Clearly show the sequence of macro expansions to justify your answer.
(ii) Would the macros work for single-statement C blocks, e.g.?
if ( x < y ) a = 0 ; else if ( x > y ) a = 1 ; else a = 2 ;Again, show the macro expansions needed to justify your answer.
(iii) Using C++ templates, write a type-safe variadic min function using the < operator for comparisons, e.g. the following would each be valid calls (assuming the parameters passed were compatible with < and each other).
min(x,y,z) min(x) min(a,b,c,d,e)
Sample solution:
(i) Yes, the expansion would produce the following:
if( x < y ) { a = 1 ; b = 2 ; }
else { b = 1 ; a = 2 ; }
Expansion process:
if (x < y) ( a = 1 ; b = 2 ; )
after the first rule becomes if (x < y) Block ( a = 1 ; b = 2 ; )
after the third rule becomes if (x < y) { a = 1 ; b = 2 ; }
Similarly,
else ( b = 1 ; a = 2 ; )
after the second rule becomes else Block ( b = 1 ; a = 2 ; )
after the third rule becomes else { b = 1 ; a = 2; }
(ii) No, the resulting expansion would produce the following (leftover Blocks):
if( x < y ) Block a = 0 ;
else Block if( x > y ) Block a = 1 ;
else Block a = 2 ;
Expansion process:
if (x < y ) a = 0;
after the first rule becomes if (x < y) Block a = 0;
but the third rule doesn't apply now, because there are no ( ) around the a = 0,
so the Block sticks in our expanded code.
(Similar problems arise in the expansions of the else's)
(iii)
template <typename T>
T min(T x) {
return x;
}
template <typename T, typename... Args>
T min(T x, Args... args) {
T tmp = min(args...);
return ((x<tmp)?x:tmp);
}
|
# The IttyBitty Language
# ======================
#
# A program consists of one or more statements,
# with the permissible token types and list of statement
# grammar rules described below
#
# Note that tokens must all be whitespace delimited,
# token names begin with a lowercase letter,
# non-terminal language component names begin with an uppercase letter
#
# Comments in IttyBitty begin with a # and go to eoln
# and are removed by the tokenizer
# Tokens:
# =======
# variable [a-zA-Z]+
# integer [-]?[0-9]+
# quote "
# text between quotes, this is any block of non-whitespace
# characters that does not contain a quote
# mathop +, -, *, /, %
# assignop =
# compop ==, <, !=, <=, >=, >
# keyword if, while, print
# bracket ( )
# delimiter { }
# unknown any otherwise unrecognized string of non-whitespace characters
# Grammar rules
# =============
# Program --> StatementList
# StatementList --> Statement
# StatementList --> Statement StatementList
# Statement --> VarAssign
# Statement --> Loop
# Statement --> Conditional
# Statement --> Print
# VarAssign --> variable assignop Expression
# Print --> print Value
# Print --> print Text
# Text --> quote WordList quote
# WordList --> text
# WordList --> text WordList
# Conditional --> if Condition Block
# Loop --> while Condition Block
# Condition --> bracket-( Value compop Value bracket-)
# Value --> variable
# Value --> integer
# Block --> delimiter-{ StatementList delimiter-}
# Expression --> Value
# Expression --> Value mathop Expression
# Sample program
# --------------
# print " Welcome! "
# mid = 3
# x = -4
# y = 10
# while ( x < y ) {
# if ( x == mid ) {
# print " mid point reached "
# }
# print " x is "
# print x
# x = x + 1
# }
|
# a tokenizer for the IttyBitty language
#
# the tokenize method takes one or more lines of source code,
# and produces an array of tokens, each of the form [ type, value ]
# where the value is the actual text string for the token
# and the token type is one of the following strings:
# quote, text, keyword, compop, assignop, bracket, delimiter, integer, mathop, variable, unknown
# the tokenizer strips out any comments in a line before tokenizing the line
def tokenize(source)
# first split/join the lines of text into a single array of words
rawTokens = Array.new
source.each_line do | line |
# strip out any comments in the line before adding its tokens to the list
cleanline = ""
if line.include?("#") then
pos = line.index("#")
if pos > 0 then
cleanline = line[0..(pos-1)]
end
else
cleanline = line
end
rawTokens += cleanline.split
end
# now translate each token into a token-type, token-value pair
# and put into the tokenlist
tokenlist = Array.new
numTokens = 0
# while matching tokens, keep track of whether or not we are
# currently inside a text literal
inText = false
rawTokens.each do | token |
# see if we are entering/leaving a text literal
if token == "\"" then
inText = !inText # flip state in/out of text mode
ttype = 'quote'
elsif inText then
ttype = 'text' # everything inside " " is text
else
# match each token against the regular expression for the
# corresponding token type
case token
when "if", "while", "print"
ttype = 'keyword'
when "==", "<", "!=", ">", ">=", "<="
ttype = 'compop'
when "="
ttype = 'assignop'
when "(", ")"
ttype = 'bracket'
when "{", "}"
ttype = 'delimiter'
when /^[-]?[0-9]+$/
ttype = 'integer'
when "-","+", "*", "/", "%"
ttype = 'mathop'
when /^[a-zA-Z]+$/
ttype = 'variable'
else
puts "Error: invalid token encountered: #{token}"
ttype = 'unknown'
end
end
# the token value is simply the text for that token
tokenpair = [ ttype, token ]
tokenlist[numTokens] = tokenpair
numTokens += 1
end
# return the final list of token type/value pairs
return tokenlist
end
|
# parser for the IttyBitty language
# - produces an array-based representation of the program structure,
# annotated with component identification
#
# A summary of what the parser produces for each component type
# is provided below.
#
# Notes:
# StatementList returns a non-empty array of statements, which is
# in Program and Block, and referred to below as a StatementsArray
# Integer values are stored as their text representation
#
# Program --> ['Program',StatementsArray]
#
# Statement --> ['Assign', varname, opname, Expression ]
# --> ['Print', Value ]
# --> ['Print', ['Text', ArrayOfWords] ]
# --> ['If', ['Cond', Value, opname, Value ],
# ['Block', StatementsArray]]
# --> ['While', ['Cond', Value, opname, Value ],
# ['Block', StatementsArray]]
#
# Value --> ['Variable', varname]
# --> ['Integer', value]
#
# Expression --> Value
# --> [ 'Expression', Value opname Expression ]
# parsing debug flag (true if parser debugging turned on, false otherwise)
PRSDEBUG = false
# $ymtable: parsing symbol table (global)
# ========
# hash table of variable names and current initialization state,
# true if initialized, valse otherwise
# the hash table is cleared whenever the parser is called on a
# Program component
$ymtable = {}
# Track the current active scope id, $SID
$SID = -1
# (parse tokenlist) builds a parse tree from the token list for a program
# =================
# (parse tokenlist component position scopes) - builds a parse (sub)tree for the
# specified component, assuming it begins at the specified positon in the token list
# scopes is a list of the current active scopes, newest scope at the end
# scope id's are unique, based on the position of their first token in the tokenlist,
# which is 0 for global scope, the position of the { token for blocks
#
# a parser for the IttyBitty language
# checks if a source code is syntactically correct for the language,
# and builds a parse tree
# expects three parameters:
# the overall token list for the source code
# the type of language component we are checking for next in the token list
# (defaults to an entire valid program)
# the token list position where parsing for the component should begin
# (defaults to the start of the token list)
# returns two values:
# if parsing succeeds it returns the parse tree
# and the position of the token immediately after the component
# otherwise
# it returns false and the position originally passed to it
def parse(tokenlist, component="Program", position=0, scopes=[])
if PRSDEBUG then
puts "\n---parsing #{component} from position #{position} (#{tokenlist[position]}) ---"
end
# make sure we're not off the end of the token list
if tokenlist.length <= position then
return false,tokenlist.length
end
# default parse tree is empty array
parsetree = Array.new
# based on the component type, try to parse the tokenlist appropriately
case component
when "Program"
# initialize
$ymtable.clear # make sure we start with a clean table
$SID = 0 # set the current scope id to 0 (global)
scopes << $SID # add the global scope to the list of active scopes
# Program --> StatementList
parsetree,pos = parse(tokenlist, "StatementList", position, scopes)
if parsetree == false then
return false, position
elsif tokenlist.length > pos then
puts "---parsing successful(?) but unused tokens from position #{pos} ---"
return false, position
end
return ['Program',parsetree], pos
when "StatementList"
parsetree,pos = parse(tokenlist, "Statement", position, scopes)
# StatementList --> Statement
if parsetree == false then
return false,position
# StatementList --> Statement StatementList
else
backuppos = pos
backuptree = parsetree
parsetree,pos = parse(tokenlist, "StatementList", pos, scopes)
if parsetree == false then
return [backuptree],backuppos
else
return [backuptree] + parsetree,pos
end
end
when "Statement"
# Statement --> Print
parsetree,pos = parse(tokenlist, "Print",position, scopes)
if parsetree != false then
return parsetree,pos
end
# Statement --> VarAssign
parsetree,pos = parse(tokenlist, "VarAssign",position, scopes)
if parsetree != false then
return parsetree,pos
end
# Statement --> Loop
parsetree,pos = parse(tokenlist, "Loop",position, scopes)
if parsetree != false then
return parsetree,pos
end
# Statement --> Conditional
parsetree,pos = parse(tokenlist, "Conditional",position, scopes)
if parsetree != false then
return parsetree,pos
end
return false,position
when "VarAssign"
# VarAssign --> variable = Expression
if tokenlist.length < position+3 then
return false,position
end
var = tokenlist[position]
op = tokenlist[position+1]
if var[0] == 'variable' and op[0] == 'assignop' then
parsetree,pos = parse(tokenlist, "Expression",position+2, scopes)
if parsetree != false then
# if the variable is not in the symbol table,
# add it with value [ $SID ] (the active scope id)
if !$ymtable.has_key?(var[1]) then
$ymtable[var[1]] = [ $SID ]
varscopes = $ymtable[var[1]]
else
# otherwise, get the variable's entry from the symbol table
# (its list of scopes in which it has been initialized)
# if $SID is not in that list, add it and update
# variable's entry in the scopes list
varscopes = $ymtable[var[1]]
if !varscopes.include?($SID) then
varscopes << $SID
$ymtable[var[1]] = varscopes
end
end
return ['Assign', var[1], op[1], parsetree],pos
end
end
return false,position
when "Print"
if tokenlist.length < position+2 then
return false,position
end
prt = tokenlist[position]
if prt[0] != 'keyword' or prt[1] != 'print' then
return false,position
end
parsetree,pos = parse(tokenlist, "Value", position+1, scopes)
if parsetree != false then
return ['Print', parsetree],pos
end
# Print --> print text
parsetree,pos = parse(tokenlist, "Text", position+1, scopes)
if parsetree != false then
return ['Print', parsetree],pos
end
return false,position
when "Value"
if tokenlist.length < position+1 then
return false,position
end
val = tokenlist[position]
# Value --> variable
if val[0] == 'variable' then
# if the variable is not yet in the symbol table add it now as uninitialized
if !$ymtable.has_key?(val[1]) then
$ymtable[val[1]] = []
end
# get the variable's list of scopes from the symbol table,
# and compare it to the list of active scopes
# if there is no overlap between the two then the
# variable has not been initialized,
# so generate a warning message
varscopes = $ymtable[val[1]]
overlap = varscopes & scopes
if overlap.length < 1 then
puts "WARNING: use of uninitialized variable #{val[1]}"
end
return ['Variable', val[1]],position+1
# Value --> integer
elsif val[0] == 'integer' then
return ['Integer',val[1]],position+1
end
return false,position
when "Conditional"
if tokenlist.length <= position then
return false,position
end
iftok = tokenlist[position]
# Conditional --> if Condition Block
if iftok[0] == 'keyword' and iftok[1] == 'if' then
condtree,newpos = parse(tokenlist, "Condition",position+1, scopes)
if condtree != false then
blocktree,lastpos = parse(tokenlist, "Block",newpos, scopes)
if blocktree != false then
return ['If',condtree,blocktree],lastpos
end
end
end
return false,position
when "Loop"
if tokenlist.length <= position then
return false,position
end
whiletok = tokenlist[position]
# Loop --> while Condition Block
if whiletok[0] == 'keyword' and whiletok[1] == 'while' then
condtree,newpos = parse(tokenlist, "Condition",position+1, scopes)
if condtree != false then
blocktree,lastpos = parse(tokenlist, "Block",newpos, scopes)
if blocktree != false then
return ['While',condtree,blocktree],lastpos
end
end
end
return false,position
when "Condition"
# Condition --> ( Value comparisonop Value )
if tokenlist.length < position+5 then
return false,position
end
br1 = tokenlist[position]
if br1[0] == 'bracket' and br1[1] == "(" then
val1tree,pos = parse(tokenlist, "Value",position+1, scopes)
if val1tree != false then
op = tokenlist[position+2]
if op[0] == 'compop' then
val2tree,lastpos = parse(tokenlist, "Value",position+3, scopes)
if val2tree != false then
br2 = tokenlist[position+4]
if br2[0] == 'bracket' and br2[1] == ')' then
return ['Cond',val1tree,op[1],val2tree],position+5
end
end
end
end
end
return false,position
when "Block"
# Block --> { StatementList }
if tokenlist.length < position+1 then
return false,position
end
br1 = tokenlist[position]
if br1[0] == 'delimiter' and br1[1] == '{' then
oldSID = $SID # remember the current scope id
$SID = position # capture the new scope id
scopes << $SID # add the id for the current scope to the active list
parsetree,pos = parse(tokenlist, "StatementList",position+1, scopes)
scopes.pop # remove the scope id, back to the previous scope list
$SID = oldSID # restore the correct active scope id
if parsetree != false then
if tokenlist.length < pos+1 then
return false,position
end
br2 = tokenlist[pos]
if br2[0] == 'delimiter' and br2[1] == '}' then
return ['Block',parsetree],pos+1
end
end
end
return false,position
when "Expression"
# Expression --> Value
parsetree,pos = parse(tokenlist, "Value",position, scopes)
if parsetree != false then
if tokenlist.length <= pos
return parsetree,pos
end
backuppos = pos
# Expression --> Value mathop Expression
op = tokenlist[pos]
if op[0] == 'mathop' then
exprtree,pos = parse(tokenlist, "Expression",pos+1, scopes)
if exprtree != false then
return ['Expression', parsetree, op[1], exprtree],pos
end
end
return parsetree,backuppos
end
return false,position
when "Text"
# Text --> " WordList "
if tokenlist.length <= position then
return false,position
end
q1 = tokenlist[position]
if q1[0] == 'quote' then
parsetree,pos = parse(tokenlist, "WordList",position+1, scopes)
if parsetree != false then
if tokenlist.length <= pos then
return false,position
end
q2 = tokenlist[pos]
if q2[0] == 'quote' then
return ['Text', parsetree],pos+1
end
end
end
return false,position
when "WordList"
# WordList --> text
if tokenlist.length <= position then
return false,position
end
word = tokenlist[position]
if word[0] == 'text' then
# WordList --> text WordList
wordstree,pos = parse(tokenlist, "WordList",position+1, scopes)
if wordstree != false then
return [word[1]]+wordstree,pos
end
return [word[1]],position+1
end
return false,position
else # this else is for the overall component case
puts "Error, invalid component type specified: #{component}"
return false,position
end
# return the results of parsing as the suggested component type
# should not be reachable?
puts "---???Reached default end of parse???---"
return parsetree,position
end
|