Review and prep:
A significant portion of lab 1 will be spent getting familiar with
the basics of lex (lexical analyzer) and yacc (yet another compiler compiler).
If you completed CSCI 330 recently then this will (hopefully) be review, otherwise there is quite a bit of new material to pick up as part of the lab 1 exercise.
If you're rusty with the git submit process then it is also worth seeing the
Review/prep for the labs
As usual, you'll need to fork/clone a git repository with the starting code for lab1
(csci439 as the course, lab1 as the repo name), and remember your git add/commit/push
when the time comes to submit.
Bonus marks opportunity:
If anyone is looking for an additional challenge for lab 1, I'll give a 15% bonus for having the parser correctly detect when variables are used without being declared and when a variable is declared multiple times (10% for detecting one or the other but not both). In the lab today we'll look at a possible structure for adding a simple symbol table and helper functions to support this. |
Lab 1 exercise:
Lab 1 will focus on creating a lex/yacc recognizer for a simple programming
language we'll make up as the lab goes along :)
(I'll use this page to post a summary of the language rules we agreed upon
in the lab.)
Rules for the Monday lab's language | Rules for Wednesday lab's language |
TokenizingBlock delimiter tokens: OPENBR { CLOSEBR } Datatype specifier tokens: INT int REAL real Integer literals: INTLIT {Digit}+ Real literals: REALLIT {Digit}+[.]{Digit}+ Identifiers: IDENT {Alpha}+ Other keywords: PRINT print ASSIGNOP <- | Tokenizing:Block delimiter tokens LBRACK ( RBRACK ) Terminators QUEST ? EXCL ! SEMI ; Keywords LET let SHOW show Int literals INT [0-9]+ Variable names VAR [a-zA-Z_]+ Assignment op ARROW <- |
Parsingprogram --> block block --> OPENBR variabledefs statements CLOSEBR variabledefs --> variabledef variabledefs variabledefs --> variabledef variabledef --> IDENT datatype variabledef --> IDENT datatype value statements --> statements statement statements --> statement statement --> printstmt statement --> assignstmt printstmt --> PRINT IDENT assignstmt --> IDENT ASSIGNOP value value --> INTLIT value --> REALLIT datatype --> INT datatype --> REAL | Parsing:program --> block program --> block program block --> LBRACK decls stmts RBRACK decls --> decl decls --> decl decls stmts --> stmt stmts --> stmt stmts decl --> LET VAR INT SEMI stmt --> assignstmt stmt --> displaystmt assignstmt --> VAR ARROW INT QUEST displaystmt --> SHOW VAR EXCL |
In the lab1 repository, you'll be provided with skeleton lex and yacc files ( lab1.lex, lab1.yacc) and a makefile to compile them into a lab1 executable that will be your recognizer.
After running make to compile the lex/yacc code, we can test user programs in our
new language by running
./lab1 < programname
Currently the skeleton version thinks anything with the form "start NNN finish" (where NNN is an integer) is a valid program, so the .lex and .yacc files need a fair bit of work.
The skeleton lex/yacc is currently set up so it can track the current row/column number as it reads through the source file, and if it decides the source code program is invalid it displays an error message with the row/column. If no error messages are displayed then it appears the source code program is valid.
Objectives:
The goal for lab1 is to modify the .lex and .yacc files so that the constructed lab1 executable accepts syntactically valid programs in our newly created language, and generates at least one error message for each syntactically invalid program. The error message should include an accurate row/column for the point at which the lab1 program identifies there is a problem.
This means adding suitable regular expressions, tokens, rules etc across the two files to support the grammar rules.
Your lab1 program is not required to check the semantic rules (things like declaring variables before they're used, type checking, etc), it just has to check basic syntax.
Debugging:
If lex/yacc give nebulous warnings about shift/reduce conflicts,
one way to get more information is to add a -v option at the end of
the yacc commands in the makefile, e.g.
yacc lab1.yacc -v
This will cause yacc to produce an extra file, y.output, which contains a description of the finite state machine yacc constructed. At the top of y.output it will also tell you which state(s) have conflicts.
These conflicts mean that there is some ambiguity in what action the parser should take given a specific state and token, and could come from ambiguities in either your lex regexes or your yacc grammars.
Sometimes conflicts can also be flagged when you have executable C blocks embedded in your rules at points where the parser can't yet identify which rule will ultimately be correct, e.g.
W: X { } Y ; W: X { } Z ;Either of the two { } code blocks could theoretically be valid at the point we're being asked to run them, it's not until Y or Z is encountered that the parser knows which form of W we're actually dealing with.
Eliminating the code blocks (or moving them after the Y/Z) will elimiate such conflicts, since the parser isn't asked to take external action until after it has figured out which form is correct.
Testing:
When evaluating your programs, they'll be run against a suite of valid programs and a suite of invalid programs, using different subsets and combinations of the language features.
I strongly recommend creating a collection of test cases and programs. While your .lex and .yacc code must be strictly your own work, I have no objection to students sharing interesting programs for use as test cases.
Submission:
Remember to perform "git add"s for your lab1.lex and lab1.yacc files, plus any test cases you used that you'd like me to see, then do a "git commit" and "git push" to submit the work before the deadline.
Reminder of git fetch/clone/submit cycle:
To obtain a lab initially (e.g. lab1 here) ssh csci fork csci439/lab1 csci439/$USER/lab1 git clone csci:csci439/$USER/lab1 |