Saturday, 9 February 2008

More on context free grammars

I started writing a post on the work I've done this week but soon realised that it would be useful to explain grammars a bit more before continuing. After all, it is possible that some of you went to the Wikipedia page for context-free grammars and were (like me) unable to curb the involuntary reflex to close the tab as soon as that stack of ugly math symbols came into view. Anyway, lets get to it. To make a grammar you need the following:

  1. One or more terminal symbols.
  2. One or more non-terminal symbols.
  3. A start symbol (must be non-terminal).
  4. Production rules that relate a non-terminal symbol to a set of symbols.
Terminal symbols are the output of the grammar and are usually written in lower case. Non-terminal symbols are temporary - they cannot appear in a generated string and are designed solely for consumption via production rules. They are usually written using upper case. Ok, so lets define a grammar: Terminal(s): a, b Non-terminal(s): S Start symbol: S The only thing left is the production rule(s):
S => aS | b
This rule says that the non-terminal S can be replaced with either aS or b. Given this fact, it should be apparent that this grammar produces strings that consist of zero or more a's and end with a b. Just to be clear, let me show you the 4 smallest strings that it can produce:
b ab aab aaab
And now some strings that this grammar cannot produce:
a ba baab abaab
So now that we have that clear, the question is how would you write a program that would return true for the top 4 and false for the bottom 4. Ok thats not the question - its trivial. But given an arbitrary set of production rules... now thats much more tricky. In my previous post I proposed a grammar for my calculator and discussed some valid and invalid strings. From what I've read so far, it looks like that getting an implementation that will return true for the valid strings and false for the invalid ones will actually take me 90% of the way to having an implementation for building a parse tree based on the grammar.

No comments:

Post a comment