Note: the repo has been pushed (4pm Feb. 9th)
Vurbossity revisions:
We'll be working with a slightly revised version of vurbossity. The changes to the language specs are as follows (in addition to the corrections posted on the lab1 page):
Lab 3 overview:
For lab 3 you'll be implementing a hand-crafted tokenizer for vurbossity, written in C++ and following the specifications below. The two core routines are the actual tokenizer, called tokenize, and a routine to display the resulting tokens, printTokens.
The tokenizer reads the source code from standard input (so lab3x is run similarly to labs 1 and 2, ./lab3x < somecode.vurb). It strips comments and identifies each valid token, storing information about the token in an array of token structs and returning the total number of valid tokens read.

    // read each word from standard input,
    // displaying error messages for invalid tokens encountered,
    // filling in the corresponding token information in the tokens array for valid tokens,
    // and returning the number of valid tokens read
    int tokenize(token tokens[]);

    // each token has a type (from the TokenType enum),
    // the associated token text content, and
    // its position in the sequence of valid tokens
    struct token {
        unsigned int ttype;
        string content;
        int pos;
    };

    // enumeration of all the token types [only partially completed]
    enum TokenType {
        Invalid = -1,
        Begin, Left, Right, End,
        Identifier,
        RealLit, BoolLit, IntLit, StrLit
    };

    // display the token information for each token in the array
    void printTokens(token tokens[], int size);
I'll be writing my own version of lab3.cpp to include/call your tokenize and printTokens routines, so make sure all the code they rely upon is in the tokenizing.cpp file (i.e. don't have them rely on code that is in your lab3.cpp).
A sample input file and resulting output are provided below (just testing the tokenization, we don't care at this point if the program is syntactically valid):

    begin left 104 right COM blah blah blah foo begin "this is some text" 9x5? end