Sunday 27 January 2008

More syntactic analysis: building an expression tree

Its been a few days since the last post in this series, mostly because this step was trickier than the previous few and partly because once I had the tree building working I couldn't resist adding the evaluation step. This was actually a good thing, because it helped me discover a mistake I made with the tree building and saved me from making a fool of myself by posting some broken crap (writing unit tests should have theoretically prevented this but I suck at them). Anyway, lets get on with it. I need to define the data structure the tree will be built from. I tried enhancing the tokens to act as the tree nodes themselves, but it made the tree manipulations more complicated. Instead I decide to keep it simple:

class BinaryTreeNode
attr_accessor :lhs, :rhs, :token

def initialize(token)
@token = token
end

def leaf?
return @lhs == nil && @rhs == nil
end
end
Pretty self explanatory I think. The tree node holds a reference to a token and its two child nodes. The 'leaf?' method isn't really necessary but it helps simplify the building algorithm. I considered trying to download some sort of diagramming tool to explain the process, but I knew it was going to suck. So instead, you get this: Hopefully you can follow this without too much trouble. I build the tree by performing the same single operation over and over. I don't know what to call it ('create_parent' isn't particular explanatory), but this is the code:
def create_parent(target_node, operator, integer)
new_node = BinaryTreeNode.new(operator)
new_node.lhs = target_node
new_node.rhs = BinaryTreeNode.new(integer)
return new_node
end
This method takes an existing node and makes it the left child of a new node. It then sets up the operator and the value and returns the new node. In the diagram above, this method is called on the (6) node in step 1 and the result is the tree in step 2. The tree building algorithm only needs to use this operation on two nodes:
  • the root node if the operator is a + or - or if the root node is the only node or if the root operator is a * or /.
  • the root node's right hand child in other cases (the operator is a * or / and the other conditions don't apply).
Once you understand the create_parent operation, you will find there is not much else to it:
def build_tree(tokens)
tokens = tokens.dup
root_node = BinaryTreeNode.new(tokens.shift) 

while(tokens.size > 1)
  operator = tokens.shift
  integer = tokens.shift 
  if(root_node.leaf? || !operator.major || root_node.token.major)
    # If the root_node is the only node or if we have a '+' or '-'
    # OR if the root operator is '*' or '/' then append to root
    root_node = create_parent(root_node, operator, integer)   
  else
    # We have a * or /, append to root.rhs   
    root_node.rhs = create_parent(root_node.rhs, operator, integer)            
  end
end

 return root_node
end
The Operator.major method is a badly named way of determining whether the operator is addition or subtraction (minor) or multiplication or division (major). I'm sure there is some appropriately mathy term that I could use instead but it did not come to me. So while there are still tokens left to process, I take two off, and then do a create_parent with them. The logic I described above is used to figure out where the new node is created and thats it. I must admit I was pleased with the simplicity of this approach. If the next operator is addition or subtraction then you want the existing expression to be evaluated first, so simply creating a new root node and attaching the existing tree to the left does the job. If the next operator is multiplication or division, then you want it to take precedence before the root operator, UNLESS the root operator is also multiplication or division (to preserve left to right evaluation). You might be wondering what that first line of code is, 'tokens = tokens.dup'. This was a quick fix for a small issue I overlooked. I am using Array.shift to work my way through the tokens and as a result when the calling code returns, the array it passed in will be empty. The 'dup' method simply returns a clone of the array which I can modify without risk of breaking anything. I tried writing unit tests for the tree building but they end up being quite lame. Its fiddly work to verify via code that the nodes are arranged into the correct structure. I know it goes against the spirit of unit testing but I found it easier to test the tree structure simply by skipping forward to implementing evaluation and using the evaluation tests. Tomorrow I will post the wacky implementation I chose for evaluating the tree. Its probably not the best way to do it but it demonstrates some cool Ruby functionality.

5 comments:

  1. This noble project is going to be rewarding!

    Question: Are you able to tell how many people are RSS subscribed to this thing?

    ReplyDelete
  2. Not exactly. Google Reader will tell me how many Google Reader users are subscribed. To know the number of subscribers for sure I think I would have to setup a service such as FeedBurner.

    I might investigate it a few months down the track. Right now I guarantee I can count the subscribers on one hand.

    ReplyDelete
  3. Nice algorithm!

    Associating numerical value to operators may help give you a easy operator precedence. Thus the higher valued operators become the RHS of the root node, while the lower priority operators use the root as a LHS child.

    You have the potential to support multi-parameter function calls and the like, but the tree would need to support N child nodes, and not just LHS and RHS.

    ReplyDelete
  4. The "minor" and "major" you use are usually described as "expression" and "term". Terms involve multiplication (or division) with factors. Expressions involve addition (or subtraction) with terms.

    ReplyDelete
  5. it is really good simple algorithm of implementing expression tree.

    ReplyDelete