CSCI 485: Spring Midterm Exam and Sample Solutions
Sections S18N01-S18N02 (Dr. Wessels)

Question Mark
1. Tokenizing IttyBitty in Ruby
[10 marks]
 
2. Auto-generating scripts (cmake)
[10 marks]
 
3. Parse trees and symbol tables
[10 marks]
 
4. Grammars and code translation
[10 marks]
 
5. C macros and C++ templates
[10 marks]
 
Exam Total (Best 4)

[40 marks]

 

Question 1: Tokenizing IttyBitty in Ruby [10]

Given the IttyBitty specifications and Ruby tokenizer studied this term (copies are appended to the end of the exam), suppose we wanted to replace the 'integer' token type with a set of token types representing positive integers of different bases:

For example, the decimal value 12 would be represented in the four different bases as xa, d12, o14, b1100.

(i) Show the modifications necessaray for the IttyBitty grammar rules to support this change (i.e. what happens to the rule Value --> integer, and what new rules need to be added?).

(ii) Show an exact set of changes/additions that would be needed in order for the Ruby tokenizer to support the revised language specifications. (Note: for parts (i) and (ii) you only need to show changes/additions, you do not need to copy/rewrite the entire grammar or tokenizer.)

Sample solution:

(i) We need to replace the rule for integer with a set of rules for the
different types of integer, possibly making Integer a new non-terminal, 
e.g. the old rule was
   Value --> integer
the new rules could be
   Value --> Integer
   Integer --> binary
   Integer --> decimal
   Integer --> octal
   Integer --> hexadecimal

Similarly the integer regex in the original grammar
would be removed, and the regexes for the four new
integer categories would be inserted instead, i.e.
   binary b[0-1]+
   octal  o[0-7]+
   decimal d[0-9]+
   hexadecimal x[0-9a-f]+

Note that the variable regex would be impacted by this,
   since now there is ambiguity with tokens whose pattern
   is x[a-f]+ ... each of these could be a valid hex pattern
   or a valid identifier under the current rules.

   One solution would be to specify that variable identifiers
   could not begin with the letter x, e.g.
      variable [a-wyzA-WYZ]+

(ii) The changes to the ruby tokenizer would mimic those listed above,
   the when case for integers would need to be replaced with four when
   cases, one for each of the regexes identified in (i),
   and either (a) the when case for variables would need to have its
   regex adjusted to remove ambiguity, or (b) we would need to define
   which interpretation was correct for the x[a-f]+ case, and ensure
   that interpretation's when case came first in the code block.

Question 2: Auto-generating scripts (cmake) [10]

The cmake program generates a custom makefile for a project by
(i) examining source code for dependencies (e.g. detecting which files #include which others),
(ii) obtaining user responses to a set of project configuration questions provided by the project developer,
(iii) applying the data from parts (i) and (ii) to a template, also provided by the project developer.

1. Does what the project developer does constitute metaprogramming (the developer providing the template and the project configuration questions)? Why or why not? Clearly justify your answer.

2. Does writing the cmake software itself constitute metaprogramming? Why or why not? Clearly justify your answer.

Sample solution:
For both of these cases the important part is the justification you provide,
     not your actual choice of yes/no.

Question 3: Parse trees and symbol tables [10]

Many high level languages support both integer and floating point numeric data, using the single '+' operator to represent both forms of addition.

However, most assembly languages use different instructions for integer addition than for floating point addition, e.g. ADD vs ADDFLT.

During the process of compilation, the compiler must then identify which is the correct type of instruction to use in the resulting computation.

Describe, in terms of parse trees and symbol tables, how a compiler is able to do this. Use the expression 3.2 * y + 4 - x as an example.

Sample solution:
To determine which type of operation should be used in any given
   computation, the compiler must be able to identify the data types
   of the operands.

If the operands were simple numeric values it would be relatively
   easy for the compiler to simply consider the token type, and
   use a fixed set of rules for operation selection, e.g.
      float op float: use float
      int   op int:   use int
      float op int:   use ? float
      int   op float: use ? int

However, if either of the operands is a variable, then the data type
   of that variable must be determined, which is where the symbol table
   comes in.  In a statically typed language, the type of the variable
   would generally be determined in its declaration statement, and recorded
   in the symbol table to be looked up later, e.g. if we saw an instruction
   like    int x,y;   we would add x and y to the table along with their
   now known type.

Furthermore, if the operands are expressions, then those expressions
   must first be evaluated to determine their resulting types
   before identifying the appropriate data type for the current op.
   This is where the parse tree comes in, e.g.

   given the expression 3.2*y+4-x, apply the grammar rules to produce
     a parse tree identifying the "meaning" of the expression

   suppose the resulting parse tree is as shown on the left below,
      then the tree can be evaluated bottom-up to identify the
      types of the resulting operands for each operator

         -      (iv)  once the + type is known from (iii), and the type of x is known from the
        / \           symbol table, we can determine the type of - op to use
       +   x    (iii) once the type of * op is know, and knowing the type of 4,
      / \             we can determine the + type
     *   4      (ii)  from the types determined in (i), we know the float*int rule should
    / \               be used to determine the right * op to use
  3.2  y        (i)   3.2 is a float, look up y in the symbol table, say it's an int

Question 4: Grammars and code translation [10]

Suppose we are working on a translator to take source code written in one language and translate it to another language.

For this question, we consider specifically the problem of translating sequences of if-then-else cases from one language to another.

The table below shows the grammar rules for the two languages, with non-terminals shown in uppercase beginning with an underscore.

Given a parse tree for an if-else sequence in language I, outline (pseudocode + discussions) what the translator would need to do to produce the logically-comparable parse tree for language II.

LANGUAGE I

_IFELSESEQ --> _IF
_IFELSESEQ --> _IF  _ELSESEQ

_ELSESEQ --> _ELSE
_ELSESEQ --> _ELSIF
_ELSESEQ --> _ELSIF _ELSESEQ

_IF --> if _CONDITION _BLOCK
_ELSIF --> elsif _CONDITION _BLOCK
_ELSE --> else _BLOCK
LANGUAGE II

_IFELSESEQ --> (cond _CONDLIST )
_CONDLIST --> _CONDCASE
_CONDLIST --> _CONDCASE _CONDLIST
_CONDCASE --> ( _CONDITION _BLOCK )

Sample solution:
If we look at the trees for a typical structure, we can see the translation
patterns we need to apply, e.g. (shortening _CONDITION and _BLOCK below)

The first version for a statement with an if, elsif, elsif, else:

            _IFELSESEQ
            /         \
          _IF          _ELSESEQ
         / | \        /        \
   if _CONDT _BLK _ELSIF        _ELSESEQ
                 / |   \       /        \
         elsif _CONDT _BLK _ELSIF       _ELSESEQ
                          /   |   \          |
                     elsif _CONDT _BLK     _ELSE
                                          /    \
                                       else   BLOCK

The corresponding cond version:

               _IFELSESEQ
              /        |  \
           (cond _CONDLIST  )
                /         \
         _CONDCASE        _CONDLIST
        /     |          /         \
  (_CONDT _BLK)    _CONDCASE      _CONDLIST
                  /    |           /        \
            (_CONDT  _BLK)    _CONDCASE   _CONDLIST
                             /     |         |
                        (_CONDT  _BLK)    _CONDCASE
                                         /      |
                                     (_CONDT  _BLK)

To transform from the first to the second:

   we introduce a (cond subtree on the left at the top level,
                a ) subtree on the right at the top level,
           then treat that first _CONDT _BLOCK as an _ELSESEQ

   for each _ELSESEQ, we replace the _ELSIF with a _CONDLIST
       - the _CONDT and _BLOCK subtrees are identifcal, only the
        surrounding syntax changes, from the elsif to the (cond .... )

   for the final else case we simply replace with a _CONDCASE,
        substituting (t for the else

Question 5: C macros and C++ templates [10]

Suppose in our C source code we want to be able to use round brackets ( ) for block delimiters in our if/else if/else statements, instead of the traditional curly brackets { }

We decide to try the following set of macros to take our customized C variant and transform it into valid C code just prior to compilation:

#define if(...) if(__VA_ARGS__) Block
#define else else Block
#define Block(...) { __VA_ARGS__ }
(i) Would our macros correctly translate the following into valid C?
if ( x < y ) ( a = 1 ; b = 2 ; )
else ( b = 1 ; a = 2 ; )
Clearly show the sequence of macro expansions to justify your answer.

(ii) Would the macros work for single-statement C blocks, e.g.?

if ( x < y ) a = 0 ;
else if ( x > y ) a = 1 ;
else a = 2 ;
Again, show the macro expansions needed to justify your answer.

(iii) Using C++ templates, write a type-safe variadic min function using the < operator for comparisons, e.g. the following would each be valid calls (assuming the parameters passed were compatible with < and each other).

min(x,y,z) min(x) min(a,b,c,d,e)

Sample solution:
(i) Yes, the expansion would produce the following:

if( x < y ) { a = 1 ; b = 2 ; }
else { b = 1 ; a = 2 ; }

Expansion process:
      if (x < y) ( a = 1 ; b = 2 ; )
after the first rule becomes  if (x < y) Block ( a = 1 ; b = 2 ; )
after the third rule becomes if (x < y) { a = 1 ; b = 2 ; }

Similarly,
      else ( b = 1 ; a = 2 ; )
after the second rule becomes else Block ( b = 1 ; a = 2 ; )
after the third rule becomes else { b = 1 ; a = 2; }

(ii) No, the resulting expansion would produce the following (leftover Blocks):

if( x < y ) Block a = 0 ;
else Block if( x > y ) Block a = 1 ;
else Block a = 2 ;

Expansion process:
      if (x < y ) a = 0;
after the first rule becomes  if (x < y) Block a = 0;
but the third rule doesn't apply now, because there are no ( ) around the a = 0,
   so the Block sticks in our expanded code.

(Similar problems arise in the expansions of the else's)

(iii)

template <typename T>
T min(T x) {
  return x;
}

template <typename T, typename... Args>
T min(T x, Args... args) {
   T tmp = min(args...);
   return ((x<tmp)?x:tmp);
}

# The IttyBitty Language
# ======================
#
# A program consists of one or more statements,
#   with the permissible token types and list of statement
#   grammar rules described below
#
# Note that tokens must all be whitespace delimited,
#   token names begin with a lowercase letter,
#   non-terminal language component names begin with an uppercase letter
#
# Comments in IttyBitty begin with a # and go to eoln
#   and are removed by the tokenizer

# Tokens:
# =======
# variable        [a-zA-Z]+
# integer         [-]?[0-9]+
# quote            "
# text            between quotes, this is any block of non-whitespace
#                   characters that does not contain a quote
# mathop          +, -, *, /, %
# assignop        =
# compop          ==, <, !=, <=, >=, >
# keyword         if, while, print
# bracket         ( )
# delimiter       { }
# unknown         any otherwise unrecognized string of non-whitespace characters

# Grammar rules
# =============
# Program --> StatementList
# StatementList --> Statement
# StatementList --> Statement StatementList
# Statement --> VarAssign
# Statement --> Loop
# Statement --> Conditional
# Statement --> Print
# VarAssign --> variable assignop Expression
# Print --> print Value
# Print --> print Text
# Text --> quote WordList quote
# WordList --> text
# WordList --> text WordList
# Conditional --> if Condition Block
# Loop --> while Condition Block
# Condition --> bracket-( Value compop Value bracket-)
# Value --> variable
# Value --> integer
# Block --> delimiter-{ StatementList delimiter-}
# Expression --> Value
# Expression --> Value mathop Expression

# Sample program
# --------------
# print " Welcome! "
# mid = 3
# x = -4
# y = 10
# while ( x < y ) {
#    if ( x == mid ) {
#       print " mid point reached "
#    }
#    print " x is "
#    print x
#    x = x + 1
# }

# a tokenizer for the IttyBitty language
#

# the tokenize method takes one or more lines of source code,
#     and produces an array of tokens, each of the form [ type, value ]
# where the value is the actual text string for the token
#     and the token type is one of the following strings:
#     quote, text, keyword, compop, assignop, bracket, delimiter, integer, mathop, variable, unknown
# the tokenizer strips out any comments in a line before tokenizing the line
def tokenize(source)
   # first split/join the lines of text into a single array of words
   rawTokens = Array.new
   source.each_line do | line |
      # strip out any comments in the line before adding its tokens to the list
      cleanline = ""
      if line.include?("#") then
         pos = line.index("#")
         if pos > 0 then
            cleanline = line[0..(pos-1)]
         end
      else
         cleanline = line
      end
      rawTokens += cleanline.split
   end

   # now translate each token into a token-type, token-value pair
   #     and put into the tokenlist
   tokenlist = Array.new
   numTokens = 0

   # while matching tokens, keep track of whether or not we are
   #    currently inside a text literal
   inText = false

   rawTokens.each do | token |
      # see if we are entering/leaving a text literal
      if token == "\"" then
         inText =  !inText  # flip state in/out of text mode
         ttype = 'quote'
      elsif inText then
         ttype = 'text'       # everything inside " " is text
      else
         # match each token against the regular expression for the
         #    corresponding token type
         case token
             when "if", "while", "print"
                ttype = 'keyword'
             when "==", "<", "!=", ">", ">=", "<="
                ttype = 'compop'
             when "="
                ttype = 'assignop'
             when "(", ")"
                ttype = 'bracket'
             when "{", "}"
                ttype = 'delimiter'
             when /^[-]?[0-9]+$/
                ttype = 'integer'
             when "-","+", "*", "/", "%"
                ttype = 'mathop'
             when /^[a-zA-Z]+$/
                ttype = 'variable'
             else
                puts "Error: invalid token encountered: #{token}"
                ttype = 'unknown'
          end
      end

      # the token value is simply the text for that token
      tokenpair = [ ttype, token ]
      tokenlist[numTokens] = tokenpair
      numTokens += 1
   end

   # return the final list of token type/value pairs
   return tokenlist
end

# parser for the IttyBitty language
#   - produces an array-based representation of the program structure,
#     annotated with component identification
#
# A summary of what the parser produces for each component type
#   is provided below.
#
# Notes:
#    StatementList returns a non-empty array of statements, which is
#       in Program and Block, and referred to below as a StatementsArray
#    Integer values are stored as their text representation
#
# Program --> ['Program',StatementsArray]
#
# Statement --> ['Assign', varname, opname, Expression ]
#           --> ['Print', Value ]
#           --> ['Print', ['Text', ArrayOfWords] ]
#           --> ['If', ['Cond', Value, opname, Value ],
#                      ['Block', StatementsArray]]
#           --> ['While', ['Cond', Value, opname, Value ],
#                         ['Block', StatementsArray]]
#
# Value --> ['Variable', varname]
#       --> ['Integer', value]
#
# Expression --> Value
#            --> [ 'Expression', Value opname Expression ]


# parsing debug flag (true if parser debugging turned on, false otherwise)
PRSDEBUG = false

# $ymtable: parsing symbol table (global)
# ========
#    hash table of variable names and current initialization state,
#       true if initialized, valse otherwise
#    the hash table is cleared whenever the parser is called on a
#       Program component
$ymtable = {}

# Track the current active scope id, $SID
$SID = -1

# (parse tokenlist) builds a parse tree from the token list for a program
# =================
# (parse tokenlist component position scopes) - builds a parse (sub)tree for the
#    specified component, assuming it begins at the specified positon in the token list
#    scopes is a list of the current active scopes, newest scope at the end
# scope id's are unique, based on the position of their first token in the tokenlist,
#    which is 0 for global scope, the position of the { token for blocks
#
# a parser for the IttyBitty language
#    checks if a source code is syntactically correct for the language,
#    and builds a parse tree
# expects three parameters:
#      the overall token list for the source code
#      the type of language component we are checking for next in the token list
#          (defaults to an entire valid program)
#      the token list position where parsing for the component should begin
#          (defaults to the start of the token list)
# returns two values:
#     if parsing succeeds it returns the parse tree
#        and the position of the token immediately after the component
#     otherwise
#        it returns false and the position originally passed to it
def parse(tokenlist, component="Program", position=0, scopes=[])

if PRSDEBUG then
   puts "\n---parsing #{component} from position #{position} (#{tokenlist[position]}) ---"
end

    # make sure we're not off the end of the token list
    if tokenlist.length <= position then
       return false,tokenlist.length
    end

    # default parse tree is empty array
    parsetree = Array.new

    # based on the component type, try to parse the tokenlist appropriately
    case component
        when "Program"
            # initialize
            $ymtable.clear # make sure we start with a clean table
            $SID = 0       # set the current scope id to 0 (global)
            scopes << $SID # add the global scope to the list of active scopes

            # Program --> StatementList
            parsetree,pos = parse(tokenlist, "StatementList", position, scopes)
            if parsetree == false then
               return false, position
            elsif tokenlist.length > pos then
               puts "---parsing successful(?) but unused tokens from position #{pos} ---"
               return false, position
            end
            return ['Program',parsetree], pos

        when "StatementList"
            parsetree,pos = parse(tokenlist, "Statement", position, scopes)
            # StatementList --> Statement
            if parsetree == false then
               return false,position
            # StatementList --> Statement StatementList
            else
               backuppos = pos
               backuptree = parsetree
               parsetree,pos = parse(tokenlist, "StatementList", pos, scopes)
               if parsetree == false then
                  return [backuptree],backuppos
               else
                  return [backuptree] + parsetree,pos
               end
            end

        when "Statement"
           # Statement --> Print
           parsetree,pos = parse(tokenlist, "Print",position, scopes)
           if parsetree != false then
              return parsetree,pos
           end
           # Statement --> VarAssign
           parsetree,pos = parse(tokenlist, "VarAssign",position, scopes)
           if parsetree != false then
              return parsetree,pos
           end
           # Statement --> Loop
           parsetree,pos = parse(tokenlist, "Loop",position, scopes)
           if parsetree != false then
              return parsetree,pos
           end
           # Statement --> Conditional
           parsetree,pos = parse(tokenlist, "Conditional",position, scopes)
           if parsetree != false then
              return parsetree,pos
           end
           return false,position

        when "VarAssign"
           # VarAssign --> variable = Expression
           if tokenlist.length < position+3 then
              return false,position
           end
           var = tokenlist[position]
           op = tokenlist[position+1]
           if var[0] == 'variable' and op[0] == 'assignop' then
              parsetree,pos = parse(tokenlist, "Expression",position+2, scopes)
              if parsetree != false then
                 # if the variable is not in the symbol table,
                 #    add it with value [ $SID ] (the active scope id)
                 if !$ymtable.has_key?(var[1]) then
                    $ymtable[var[1]] = [ $SID ]
                    varscopes = $ymtable[var[1]]
                 else
                    # otherwise, get the variable's entry from the symbol table
                    #    (its list of scopes in which it has been initialized)
                    #    if $SID is not in that list, add it and update
                    #       variable's entry in the scopes list
                    varscopes = $ymtable[var[1]]
                    if !varscopes.include?($SID) then
                       varscopes << $SID
                       $ymtable[var[1]] = varscopes
                    end
                 end
                 return ['Assign', var[1], op[1], parsetree],pos
              end
           end
           return false,position

        when "Print"
           if tokenlist.length < position+2 then
              return false,position
           end
           prt = tokenlist[position]
           if prt[0] != 'keyword' or prt[1] != 'print' then
              return false,position
           end
           parsetree,pos = parse(tokenlist, "Value", position+1, scopes)
           if parsetree != false then
              return ['Print', parsetree],pos
           end
           # Print --> print text
           parsetree,pos = parse(tokenlist, "Text", position+1, scopes)
           if parsetree != false then
              return ['Print', parsetree],pos
           end
           return false,position

        when "Value"
           if tokenlist.length < position+1 then
              return false,position
           end
           val = tokenlist[position]
           # Value --> variable
           if val[0] == 'variable' then
              # if the variable is not yet in the symbol table add it now as uninitialized
              if !$ymtable.has_key?(val[1]) then
                 $ymtable[val[1]] = []
              end
              # get the variable's list of scopes from the symbol table,
              #     and compare it to the list of active scopes
              # if there is no overlap between the two then the
              #     variable has not been initialized,
              #     so generate a warning message
              varscopes = $ymtable[val[1]]
              overlap = varscopes & scopes
              if overlap.length < 1 then
                 puts "WARNING: use of uninitialized variable #{val[1]}"
              end
              return ['Variable', val[1]],position+1
           # Value --> integer
           elsif val[0] == 'integer' then
              return ['Integer',val[1]],position+1
           end
           return false,position

        when "Conditional"
           if tokenlist.length <= position then
              return false,position
           end
           iftok = tokenlist[position]
           # Conditional --> if Condition Block
           if iftok[0] == 'keyword' and iftok[1] == 'if' then
              condtree,newpos = parse(tokenlist, "Condition",position+1, scopes)
              if condtree != false then
                 blocktree,lastpos = parse(tokenlist, "Block",newpos, scopes)
                 if blocktree != false then
                    return ['If',condtree,blocktree],lastpos
                 end
              end
           end
           return false,position

        when "Loop"
           if tokenlist.length <= position then
              return false,position
           end
           whiletok = tokenlist[position]
           # Loop --> while Condition Block
           if whiletok[0] == 'keyword' and whiletok[1] == 'while' then
              condtree,newpos = parse(tokenlist, "Condition",position+1, scopes)
              if condtree != false then
                 blocktree,lastpos = parse(tokenlist, "Block",newpos, scopes)
                 if blocktree != false then
                    return ['While',condtree,blocktree],lastpos
                 end
              end
           end
           return false,position

        when "Condition"
           # Condition --> ( Value comparisonop Value )
           if tokenlist.length < position+5 then
              return false,position
           end
           br1 = tokenlist[position]
           if br1[0] == 'bracket' and br1[1] == "(" then
              val1tree,pos = parse(tokenlist, "Value",position+1, scopes)
              if val1tree != false then
                 op = tokenlist[position+2]
                 if op[0] == 'compop' then
                    val2tree,lastpos = parse(tokenlist, "Value",position+3, scopes)
                    if val2tree != false then
                       br2 = tokenlist[position+4]
                       if br2[0] == 'bracket' and br2[1] == ')' then
                          return ['Cond',val1tree,op[1],val2tree],position+5
                       end
                    end
                 end
              end
           end
           return false,position

        when "Block"
           # Block --> { StatementList }
           if tokenlist.length < position+1 then
              return false,position
           end
           br1 = tokenlist[position]
           if br1[0] == 'delimiter' and br1[1] == '{' then
              oldSID = $SID     # remember the current scope id
              $SID = position   # capture the new scope id
              scopes << $SID    # add the id for the current scope to the active list
              parsetree,pos = parse(tokenlist, "StatementList",position+1, scopes)
              scopes.pop        # remove the scope id, back to the previous scope list
              $SID = oldSID     # restore the correct active scope id
              if parsetree != false then
                 if tokenlist.length < pos+1 then
                    return false,position
                 end
                 br2 = tokenlist[pos]
                 if br2[0] == 'delimiter' and br2[1] == '}' then
                    return ['Block',parsetree],pos+1
                 end
              end
           end
           return false,position

        when "Expression"
           # Expression --> Value
           parsetree,pos = parse(tokenlist, "Value",position, scopes)
           if parsetree != false then
              if tokenlist.length <= pos
                 return parsetree,pos
              end
              backuppos = pos
              # Expression --> Value mathop Expression
              op = tokenlist[pos]
              if op[0] == 'mathop' then
                 exprtree,pos = parse(tokenlist, "Expression",pos+1, scopes)
                 if exprtree != false then
                    return ['Expression', parsetree, op[1], exprtree],pos
                 end
              end
              return parsetree,backuppos
           end
           return false,position

        when "Text"
           # Text --> " WordList "
           if tokenlist.length <= position then
              return false,position
           end
           q1 = tokenlist[position]
           if q1[0] == 'quote' then
              parsetree,pos = parse(tokenlist, "WordList",position+1, scopes)
              if parsetree != false then
                 if tokenlist.length <= pos then
                    return false,position
                 end
                 q2 = tokenlist[pos]
                 if q2[0] == 'quote' then
                    return ['Text', parsetree],pos+1
                 end
              end
           end
           return false,position

        when "WordList"
           # WordList --> text
           if tokenlist.length <= position then
              return false,position
           end
           word = tokenlist[position]
           if word[0] == 'text' then
              # WordList --> text WordList
              wordstree,pos = parse(tokenlist, "WordList",position+1, scopes)
              if wordstree != false then
                 return [word[1]]+wordstree,pos
              end
              return [word[1]],position+1
           end
           return false,position

        else # this else is for the overall component case
           puts "Error, invalid component type specified: #{component}"
           return false,position
    end

    # return the results of parsing as the suggested component type
    #    should not be reachable?
    puts "---???Reached default end of parse???---"
    return parsetree,position
end