Monday, 28 January 2008

Evaluating the expression tree

Lets start off with some tests:

class EvaluateTests < Test::Unit::TestCase
  def test_1
    assert_equal(1, calculate("1"))
  def test_add
    assert_equal(3, calculate("1+2"))
  def test_subtract
    assert_equal(3, calculate("7-4"))
  def test_multiply
    assert_equal(6, calculate("3*2"))
  def test_divide
    assert_equal(3, calculate("6/2"))
  def test_negative_result
    assert_equal(-1, calculate("1-2"))
  def test_multitive_before_additive
    assert_equal(14, calculate("4*3+2"))
    assert_equal(14, calculate("2+3*4"))
    assert_equal(10, calculate("4+3*2"))
  def test_left_to_right
    assert_equal(8, calculate("12-6+2"))
    assert_equal(4, calculate("12/6*2"))
Not exactly comprehensive, but its a start. I need to write a calculate method to tie together the work I've done over the last few weeks:
def calculate(expression)
  tokens = tokenize(expression)
  root_node = build_tree(tokens)
  return root_node.evaluate()
Evaluating the tree is obviously a recursive function. Here is the first bit:
class BinaryTreeNode
  def evaluate()
    return @token.evaluate(self)
I decided the easiest way to do this is have the evaluate method take an argument: the tree node that contains the token. When evaluating an IntegerToken the value is simply returned, no need to use the argument:
class IntegerToken < Token
  def evaluate(containerNode)
    return @value
For the OperatorToken, we want: evaluate(left child) [operator] evaluate (right child). There are lots of different ways to do this - a case statement is the simplest:
class OperatorToken
  def evaluate(containerNode)
    case @value
      when "+"
        return containerNode.lhs.evaluate() + containerNode.rhs.evaluate()
      when "-"
        return containerNode.lhs.evaluate() - containerNode.rhs.evaluate()
      when "*"
        return containerNode.lhs.evaluate() * containerNode.rhs.evaluate()
      when "/"
        return containerNode.lhs.evaluate() / containerNode.rhs.evaluate()
My my isn't that boring. Another option would be to implement subclasses for OperatorToken - I wouldn't learn much from doing it though. Lets try something different instead:
class OperatorToken < Token
  attr_reader :major
  def initialize(value)
    @value = value
    @eval_method = Fixnum.instance_method(value)
    case @value
      when "+","-"   
        @major = false
      when "*","/"
        @major = true
  def evaluate(containerNode)
    bound_method = @eval_method.bind(containerNode.lhs.evaluate())    
The three relevant lines of code are highlighted. But first, some background. Operators in Ruby are actually just methods. When I execute the line x = 2 + 3 I am calling the '+' method on the object '2' which is an instance of the class Fixnum. So what I am actually doing with that line is this: x = 2.+(3) . Classes in Ruby are just like normal objects, so you can call methods on them as I am doing in the first highlighted line. If the OperatorToken is constructed with "+", then I acquire an instance of the '+' method defined on the class Fixnum and store it as @eval_method. What I actually store is an instance of UnboundMethod - if you are a .NET developer, it might help to think of it as similar to MethodInfo as you can use it to invoke the method, but you will need to specify the instance you wish to invoke the method on. The second highlighted line does this - the left node is evaluated and the result is our bind target. The third line invokes the bound method and passes the result of evaluating the right node. The result is what we are after and is returned. Yes, this is much more complicated than a switch statement, and surely much slower. But I learned something, and hopefully so did you. Lets run my unit tests:
>ruby calculator.rb
Loaded suite calculator
Finished in 0.022 seconds.

24 tests, 37 assertions, 0 failures, 0 errors
>Exit code: 0
I think its worth mentioning that I have sort of covered up some mistakes I made while implementing the tree building. I didn't initially write the left to right unit tests and so for a while my calculator could do "2+9/3*4" but not "9/3*4" (it returned zero rather than 12). I didn't notice this problem until I ran the stress tester:
def stress_test()
  expressions = generate_expressions(100)
  rejects = []
  expressions.each do |exp|
      rejects << exp if (eval(exp) != calculate(exp))    
  return rejects
def generate_expressions(count)
    expressions = []
    operators = ["+","-","*","/"]    
    count.times do
      exp = (rand(100) + 1).to_s
      rand(10).times do
        exp << operators[rand(4)]
        exp << (rand(100) + 1).to_s
      expressions << exp
    return expressions
This baby will generate 100 expressions, each containing up to 10 terms between 1 and 100. It does not generate zeros because I would then have to deal with division by zero problems (I'll let that wait for version two). The eval method is not mine - it is a built in method that uses Ruby's own expression evaluation engine. This is really useful because I can pass an expression string into my program and compare the result with what Ruby came up with. If there is a discrepancy then I add it to the array of rejects and return them when done. The stress test revealed the aforementioned problem with left to right evaluation which I fixed, but it also raised something else. Consider this test:
def test_really_big_number
  assert_equal(1000000000000000000001, calculate("1000000000000000000000+1"))
Unfortunately, it doesn't pass:
>ruby calculator.rb
Loaded suite calculator
Finished in 0.02 seconds.

1) Error:
TypeError: bind argument must be an instance of Fixnum
calculator.rb:55:in `bind'
calculator.rb:55:in `evaluate'
calculator.rb:12:in `evaluate'
calculator.rb:118:in `calculate'
calculator.rb:209:in `test_really_big_number'

25 tests, 37 assertions, 0 failures, 1 errors
>Exit code: 1
When a Fixnum gets larger than a certain size it changes into a Bignum. Unfortunately, this wreaks havoc with my evaluation code because my unbound method will only bind to a Fixnum, not a Bignum. There are some workarounds, a relatively simple one already springs to mind. However I don't think they are worth entertaining - going back to the case statement or something similar seems to be a better idea. Next post I clean up this issue and the code in general.

No comments:

Post a Comment