## Monday, 4 February 2008

### Parentheses in the calculator: a first look at context-free grammars

This series on writing a program that can evaluate simple arithmetic expressions is all about baby steps. I have no idea what I'm doing in terms of the problem space and I'm similarly ignorant of my tool, Ruby. But I'm making progress and have a good idea of what is next:

• expand number support to the rationals
• report the position of a syntax error when found
• implement support for parentheses
The first two shouldn't be a big deal, I could probably get them working pretty quickly with the existing code base. Parentheses however, are significantly more work. I know of only one way to deal with them: context-free grammars. Even if I was aware of another way to deal with the parens I would ignore it - this exercise is about learning compilers and I know that implementing a context-free grammar is necessary. I have found the lecture notes for CS 164 at Berkeley University to be quite useful. They read well without a lecturer and handle the topics with a degree of patience that allows me to comprehend it. I really wish I had been given the chance to do a course like this while I was at university. I think the context-free grammar I need can be defined as follows:
E => n | E+E | E-E | E*E | E/E | (E)
E stands for Expression and is the only non-terminal in this grammar. A non-terminal is a token that can be "expanded" further using the grammar rules. The grammar contains the following terminal tokens: 'n', '+', '-', '*', '/', '(' and ')'. Basically this grammar says that an expression can consist of either a number (n), or a combination of two expressions, or an expression in parentheses. Using this grammar we could derive the following strings:
120+16*2 6+((120-2)*4) (((1)))
But note that we couldn't derive:
1+*2 )2+3 4+(-)
Obviously this is the sort of behaviour we want. So now that I have a grammar, I want to implement a function that will take a string and tell me whether it is derivable from the grammar. There are a few ways to do this but I have to read more before I can figure out which one to try to implement first. Its also possible that I will have to modify my grammar to make it parse able.