Paul Batum: Ruby

Showing posts with label Ruby. Show all posts

Monday, 31 March 2008

Ruby on Rails, IDE's and autocompletion

Over the last month I've been tinkering with a web development framework called Ruby on Rails. Its a relatively new framework (only a few years old), quite popular and very different to ASP.NET. My progress with Rails so far has been pitifully slow, for a few different reasons: I've managed to make things harder for myself by trying to learn the newest version (released Dec 2007) which limits the availability of appropriate tutorials and textbooks, I don't have much of an attention span at home and so I'm doing various other things at the same time, and there is one other reason that I'll discuss in a moment. My original plan was to play with Rails from within Heroku, a Rails web host currently in beta that offers a browser based integrated development environment, but I soon realised that it was not appropriate for a newbie learning the ropes. I already had Ubuntu running on my laptop so I went through the process of installing it but then there was still the matter of which IDE to use. Its funny - as a .NET developer I never really had to worry about IDE's. Visual Studio is fantastic and there is little in the way of realistic alternatives. I've never wished for a radically different IDE, just enhancements to VS. But step outside of the world of Microsoft based development and suddenly there are many alternatives and little in the way of consensus of what is 'best'. The majority of Rail's tutorial videos are done on a Macintosh using TextMate, but that is not option for me. When it comes to Rails development on Linux, there are several options and for now I've settled on RadRails, a plugin for Aptana Studio, which seems to be some sort of fork of Eclipse. Ok so why all this harping on about IDE's? Well it turns out that the biggest lesson I've learned so far isn't about web development, but about how I learn frameworks/api's in general:

AUTOCOMPLETION

Known as Intellisense in the Microsoft world, autocompletion is my bread and butter. I will use the documentation to find the necessary namespace and then explore it with autocompletion. Only now am I truly aware of how much I rely on it. If you asked most developers about autocompletion, they'd say "oh yeah, its great, it saves heaps of typing". This is certainly true, but saved key strokes is not where the real benefit lies for me: its a discovery mechanism. I found it difficult learning Ruby without autocompletion (and still do) . Its an unfortunate fact that autocompletion is harder to implement and in general less useful for dynamically typed languages. Autocompletion for Rails would be particularly difficult to implement because so much of Rails is metaprogramming magic. I expect to talk about this in more detail in some future posts. For me, Ruby's impressive support for metaprogramming is its most exciting feature. Much of what makes Rails worthwhile relies on this. But it is this very feature that makes meaningful autocompletion so difficult to achieve, which makes it a double edged sword. Its obvious to me that I need to learn to get by without my crutch. And learn I shall, because while RadRails claims to support autocompletion, I've been hitting a whole lot of ctrl+space and so far I'm not impressed.

Sunday, 24 February 2008

Status update: compilers and ruby on rails

So you probably noticed I haven't been posting much about the parsing stuff lately and thats because I stopped working on it. I got to the point where wikipedia + lecture notes isn't quite cutting it and I've been unable to find a resource that tackles the subject matter in a way I am comfortable with. A good textbook could make a big difference and there are a few out there that look promising but my heart is not in it at the moment. I'm somewhat torn in between wanting to finish the projects I start and making sure that I am enjoying the projects. However the simple fact is that this plan of less video games and more programming/reading/learning is only sustainable if I am really enjoying the work. My intention is to come back to the compilers stuff in a few months and spend time looking at some of the programs out there that make compiler/interpreter development easier. Previously I avoided this because I wanted to learn the fundamentals and I found this to be worthwhile. It was never my intention to implement an actual compiler or interpreter completely from scratch, although I had hoped to get somewhat further than I did. In any case, the topic that holds my interest at the moment is that of a web development framework called Ruby on Rails. Rails interests me for a number of reasons but I think the primary driver is frustration with ASP.NET at work. I have read about the model-view-controller pattern but never actually worked with it first hand so I am curious about how it compares to developing with web forms. Rather than downloading and configuring Rails locally I am making use of Heroku - they provide free Rails hosting with an integrated development environment. I'm sure you will hear how I'm finding it soon enough.

Monday, 11 February 2008

Syntax checking with a LL(1) parser

After reading about some different types of parsers, I decided to have a go at LL(1). This means the parser works Left-to-right, generates the Leftmost derivation and looks one token ahead. The basic idea of LL(1) is to build a stack of symbols that represent the derived expression by using a lookup table that tells you what rule to apply based on the current symbol and current token. Not all context free grammars can be parsed using LL(1) but I think the calculator's grammar can, with some modification. Lets break the work into some smaller steps:

Modify the grammar I described last week to support LL(1) parsing.
Modify the tokens_are_valid method to use LL(1) to verify whether the token stream is well formed. Given that I already have tests for tokens_are_valid, I will know that I am already on the right track once the existing tests are passing with the new implementation.
Add syntax tests for parentheses. Watch them fail.
Modify the tokens_are_valid implementation to make the new tests pass. Once this is happening we are done with using LL(1) for syntax verification and can move on to using LL(1) to actually do the parsing.
Modify the build_tree method to use LL(1) and watch the operator precedence tests fail because my grammar doesn't deal with the precedence issue.
Fix the grammar to support operator precedence, make the corresponding changes to the lookup table and watch the tests pass.
Add evaluation tests for parentheses . I think these babies should automatically work if I've done the other steps right.

In this post I will tackle steps 1,2 and 3. Here is the grammar I described last week:

E => n | E+E | E-E | E*E | E/E | (E)

As it stands, this grammar will not work with LL(1) because to figure out which rule to apply, I have to look at least 2 tokens ahead. For example, starting with E and given two different inputs "1" and "1+1", its obvious that I need to use a different rule for each input, but both of them start with the token "1". This problem can be fixed by modifying the grammar. Here goes:

E => (E) | nT T => +E | -E | * E | /E | ε

The symbol 'ε' seems to represent null or empty. Basically it means that the symbol T can just disappear without producing anything. Resuming the above example, now there is only one rule to apply: E => nT. In the "1" case, the T disappears. In the "1+1" case the T is expanded to +E. Here is the lookup table for this grammar:

	n	+	-	*	/	(	)	$
E	nT					(E)
T		+E	-E	*E	/E		ε	ε

The '$' represents the end of the input string. The gaps in the table represent syntax errors. Writing the code There are a few issues that need addressing when it comes to implementation. The most obvious one is that I need some way to represent the symbols in the above table. Well it turns out Ruby has a symbol class, though it was not designed to represent grammar symbols - Ruby symbols are just named constants that have no intrinsic value(as far as I can tell). They use the :symbol_name syntax, so for example E and T are now represented as :E and :T. I amended the existing token classes to expose their corresponding symbol:

class AdditionToken < OperatorToken
def symbol
  return :plus
end
end

I need a way to determine if a symbol is terminal or not, and there is a rather nice way to do this - simply defining an extra method on the Symbol class:

class Symbol
def non_terminal?
  return (self == :T || self == :E)
end
end

Just to be clear, Symbol is not my class, I'm just adding an extra method to an existing class. Executing :T.non_terminal? will return true. Next on the agenda is the end token. We don't strictly need an end token but it does make the algorithm simpler.

class EndToken
def symbol
  return :end
end
end

The lookup table can be done as 2 dimensional hash:

def create_lookup_table()
lookup = {}
lookup[:E] = {}
lookup[:E][IntegerToken] = [:int,:T]
lookup[:E][LeftParenToken] = [:lpar, :E, :rpar]
lookup[:T] = {}
lookup[:T][AdditionToken] = [:plus, :E]
lookup[:T][SubtractionToken] = [:minus, :E]
lookup[:T][MultiplicationToken] = [:multiply,:E]
lookup[:T][DivisionToken] = [:divide, :E]
lookup[:T][RightParenToken] = []
lookup[:T][EndToken] = []
return lookup
end

As you can see, the table is first indexed with the non-terminal (:E or :T) and then with a token type. The result is an array of symbols, which may be empty. If an attempt to lookup an undefined entry is made then nil is returned and this is interpreted as a syntax error. Right, with that out of the way, I can begin work on tokens_are_valid. The goal is to replace the old implementation with an LL(1) algorithm and hopefully all the existing tests will pass:

def tokens_are_valid(tokens)
tokens = tokens.dup
symbols = [:E]
lookup = create_lookup_table()

# The symbol and token arrays can be used as stacks if the contents is first reversed.
# An end token/symbol is appended to each before they are reversed.
tokens.push(EndToken.new())
tokens.reverse!
symbols.push(:end)
symbols.reverse!

while(tokens.size > 0)
  current_symbol = symbols.pop() #The current symbol will always be consumed. Safe to pop.
  current_token = tokens.last    #The current token won't necessarily be consumed, cannot pop it yet.
  if(current_symbol.non_terminal?)
    result = lookup[current_symbol][current_token.class]
    return false if result.nil?   
      symbols.push(result.reverse)
      symbols.flatten!     
  else
    return false if current_symbol != current_token.symbol
    tokens.pop()
  end             
end

return true
end

Hopefully this is relatively self-explanatory. The bit that may need some explanation is in bold. See, I am using arrays as stacks but to do so I need to reverse the contents - I do this once at the beginning for the tokens and the symbols. However its also necessary to reverse the result of the table lookup for the same reason. Yes, I could have coded the table with the symbols backwards but that is more confusing. Easiest just to push a reversed copy of the result onto the symbol stack. The only problem with doing this is that you end up with an array inside an array, which is why I call the flatten! method. In case you were wondering, Ruby has a convention to use the exclamation mark for methods that modify the instance you called the method on, rather than just returning a modified object. The following should demonstrate:

irb(main):001:0> x = [1,2,3]
=> [1, 2, 3]
irb(main):002:0> y = x.reverse
=> [3, 2, 1]
irb(main):003:0> x
=> [1, 2, 3]
irb(main):004:0> x.reverse!
=> [3, 2, 1]
irb(main):005:0> x
=> [3, 2, 1]

Well, my proposed change to tokens_are_valid works fine - all my tests are still passing! The next step is to write some tests that include parentheses. Here are a few (no need to bore you with the full set):

class ParenthesesSyntaxTests < Test::Unit::TestCase
def test_surround_one_number # "(1)"
  tokens = [LeftParenToken.new(),IntegerToken.new(1), RightParenToken.new()]
  assert(tokens_are_valid(tokens))
end   

def test_surround_operation # "(1+2)"
  tokens = [LeftParenToken.new(),IntegerToken.new(1), AdditionToken.new(), IntegerToken.new(2), RightParenToken.new()]
  assert(tokens_are_valid(tokens))
end

def test_left_compound # "((500*20)+16)"
  tokens = [LeftParenToken.new(), LeftParenToken.new(), IntegerToken.new(500), MultiplicationToken.new(), IntegerToken.new(20), RightParenToken.new(), AdditionToken.new(), IntegerToken.new(16), RightParenToken.new()]
  assert(tokens_are_valid(tokens))
end

def test_right_compound # "(16+(500*20))"
  tokens = [LeftParenToken.new(), IntegerToken.new(16), AdditionToken.new(), LeftParenToken.new(), IntegerToken.new(500), MultiplicationToken.new(), IntegerToken.new(20), RightParenToken.new() , RightParenToken.new()]
  assert(tokens_are_valid(tokens))
end

def test_empty_parens_error # "()"
  tokens = [LeftParenToken.new(), RightParenToken.new()]
  assert(!tokens_are_valid(tokens))
end

def test_wrong_order_error # ")2("
  tokens = [RightParenToken.new(), IntegerToken.new(1), LeftParenToken.new()]
  assert(!tokens_are_valid(tokens))
end

def test_no_operator_adjacent_to_paren_error # "1+2(3+4)"
  tokens = [IntegerToken.new(1), AdditionToken.new(), IntegerToken.new(2), LeftParenToken.new(), IntegerToken.new(3), AdditionToken.new(), IntegerToken.new(4), RightParenToken.new()]
  assert(!tokens_are_valid(tokens))
end

end

All but one pass! Can you guess which one? It turns out this is a problem with my grammar, not the implementation. Next post I will discuss the failing test and what I have to do to fix it.

Saturday, 2 February 2008

The calculator, version one

There are a few loose ends to tie up and then the first pass of the calculator is complete. If you've forgotten where I was up to, here is a quick reminder:

The calculator falls over when the numbers get big because I implemented the tree evaluation in a educational but less than practical manner.
Whitespace should be ignored.
The codebase needs a general cleanup.

None of these are particularly involved - this should be a relatively short post! During the last post I discussed a few ways to implement the tree evaluation. I pointed out the most obvious solution (a switch statement), mentioned polymorphism and then ventured forth into using unbound methods. Upon discovering the issues with the latter approach it was a no brainer to go back to polymorphism:

class OperatorToken
 attr_reader :multitive, :value

 def evaluate(container_node)
   return do_operation(container_node.left_child.evaluate(), container_node.right_child.evaluate())
 end 
end

class AdditionToken < OperatorToken
 def initialize()
   @value = "+"
   @multitive = false
 end

 def do_operation(left_child_value, right_child_value)
   return left_child_value + right_child_value
 end 
end

I think you can guess what the other Tokens look like. The @value field isn't necessary to the implementation but it allowed me to keep a bunch of existing tests. The @multitive field is a replacement of the crappy named "major" field I used to determine whether the operator was higher precedence. I had to make a few minor changes to other sections of the program but it was all very easy. Whitespace is trivial to deal with. I could have easily handled it when implementing the tokenization but I guess it slipped my mind. Lets add a test for it:

  def test_spaces
   assert_equal([14,"+",62,"*",127], tokenize_for_value("  14 +62*   127 ") ); end

The fix is just stripping whitespace from the string each time a token is generated. I added a few small tests and did my best to clean up the code. You can download it here. Some sample output:

C:\Users\Paul>irb
irb(main):001:0> require 'calculator_v1'
=> true
irb(main):002:0> calculate "10 + 2*6"
=> 22
irb(main):003:0> calculate "1000000000000000000000000000000000-1"
=> 999999999999999999999999999999999
irb(main):004:0> calculate "x-1"
=> "Syntax Error"
irb(main):005:0>

Next post I discuss the what I would like to achieve in the next version.

Monday, 28 January 2008

Evaluating the expression tree

Lets start off with some tests:

class EvaluateTests < Test::Unit::TestCase
  def test_1
    assert_equal(1, calculate("1"))
  end
  
  def test_add
    assert_equal(3, calculate("1+2"))
  end
  
  def test_subtract
    assert_equal(3, calculate("7-4"))
  end
  
  def test_multiply
    assert_equal(6, calculate("3*2"))
  end
  
  def test_divide
    assert_equal(3, calculate("6/2"))
  end
  
  def test_negative_result
    assert_equal(-1, calculate("1-2"))
  end
  
  def test_multitive_before_additive
    assert_equal(14, calculate("4*3+2"))
    assert_equal(14, calculate("2+3*4"))
    assert_equal(10, calculate("4+3*2"))
  end
  
  def test_left_to_right
    assert_equal(8, calculate("12-6+2"))
    assert_equal(4, calculate("12/6*2"))
  end
end

Not exactly comprehensive, but its a start. I need to write a calculate method to tie together the work I've done over the last few weeks:

def calculate(expression)
  tokens = tokenize(expression)
  root_node = build_tree(tokens)
  return root_node.evaluate()
end

Evaluating the tree is obviously a recursive function. Here is the first bit:

class BinaryTreeNode
  def evaluate()
    return @token.evaluate(self)
  end
end

I decided the easiest way to do this is have the evaluate method take an argument: the tree node that contains the token. When evaluating an IntegerToken the value is simply returned, no need to use the argument:

class IntegerToken < Token
  def evaluate(containerNode)
    return @value
  end
end

For the OperatorToken, we want: evaluate(left child) [operator] evaluate (right child). There are lots of different ways to do this - a case statement is the simplest:

class OperatorToken
  def evaluate(containerNode)
    case @value
      when "+"
        return containerNode.lhs.evaluate() + containerNode.rhs.evaluate()
      when "-"
        return containerNode.lhs.evaluate() - containerNode.rhs.evaluate()
      when "*"
        return containerNode.lhs.evaluate() * containerNode.rhs.evaluate()
      when "/"
        return containerNode.lhs.evaluate() / containerNode.rhs.evaluate()
      end
    end
   end
end

My my isn't that boring. Another option would be to implement subclasses for OperatorToken - I wouldn't learn much from doing it though. Lets try something different instead:

class OperatorToken < Token
  attr_reader :major
  
  def initialize(value)
    @value = value
    @eval_method = Fixnum.instance_method(value)
    case @value
      when "+","-"   
        @major = false
      when "*","/"
        @major = true
      end
  end
  
  def evaluate(containerNode)
    bound_method = @eval_method.bind(containerNode.lhs.evaluate())    
    return bound_method.call(containerNode.rhs.evaluate())
  end
end

The three relevant lines of code are highlighted. But first, some background. Operators in Ruby are actually just methods. When I execute the line x = 2 + 3 I am calling the '+' method on the object '2' which is an instance of the class Fixnum. So what I am actually doing with that line is this: x = 2.+(3) . Classes in Ruby are just like normal objects, so you can call methods on them as I am doing in the first highlighted line. If the OperatorToken is constructed with "+", then I acquire an instance of the '+' method defined on the class Fixnum and store it as @eval_method. What I actually store is an instance of UnboundMethod - if you are a .NET developer, it might help to think of it as similar to MethodInfo as you can use it to invoke the method, but you will need to specify the instance you wish to invoke the method on. The second highlighted line does this - the left node is evaluated and the result is our bind target. The third line invokes the bound method and passes the result of evaluating the right node. The result is what we are after and is returned. Yes, this is much more complicated than a switch statement, and surely much slower. But I learned something, and hopefully so did you. Lets run my unit tests:

>ruby calculator.rb
Loaded suite calculator
Started
........................
Finished in 0.022 seconds.

24 tests, 37 assertions, 0 failures, 0 errors
>Exit code: 0

I think its worth mentioning that I have sort of covered up some mistakes I made while implementing the tree building. I didn't initially write the left to right unit tests and so for a while my calculator could do "2+9/3*4" but not "9/3*4" (it returned zero rather than 12). I didn't notice this problem until I ran the stress tester:

def stress_test()
  expressions = generate_expressions(100)
  rejects = []
  expressions.each do |exp|
      rejects << exp if (eval(exp) != calculate(exp))    
  end
  return rejects
end
 
def generate_expressions(count)
    expressions = []
    operators = ["+","-","*","/"]    
    count.times do
      exp = (rand(100) + 1).to_s
      rand(10).times do
        exp << operators[rand(4)]
        exp << (rand(100) + 1).to_s
      end        
      expressions << exp
    end
    return expressions
end

This baby will generate 100 expressions, each containing up to 10 terms between 1 and 100. It does not generate zeros because I would then have to deal with division by zero problems (I'll let that wait for version two). The eval method is not mine - it is a built in method that uses Ruby's own expression evaluation engine. This is really useful because I can pass an expression string into my program and compare the result with what Ruby came up with. If there is a discrepancy then I add it to the array of rejects and return them when done. The stress test revealed the aforementioned problem with left to right evaluation which I fixed, but it also raised something else. Consider this test:

def test_really_big_number
  assert_equal(1000000000000000000001, calculate("1000000000000000000000+1"))
end

Unfortunately, it doesn't pass:

>ruby calculator.rb
Loaded suite calculator
Started
........E................
Finished in 0.02 seconds.

1) Error:
test_really_big_number(EvaluateTests):
TypeError: bind argument must be an instance of Fixnum
calculator.rb:55:in `bind'
calculator.rb:55:in `evaluate'
calculator.rb:12:in `evaluate'
calculator.rb:118:in `calculate'
calculator.rb:209:in `test_really_big_number'

25 tests, 37 assertions, 0 failures, 1 errors
>Exit code: 1

When a Fixnum gets larger than a certain size it changes into a Bignum. Unfortunately, this wreaks havoc with my evaluation code because my unbound method will only bind to a Fixnum, not a Bignum. There are some workarounds, a relatively simple one already springs to mind. However I don't think they are worth entertaining - going back to the case statement or something similar seems to be a better idea. Next post I clean up this issue and the code in general.

Sunday, 27 January 2008

More syntactic analysis: building an expression tree

Its been a few days since the last post in this series, mostly because this step was trickier than the previous few and partly because once I had the tree building working I couldn't resist adding the evaluation step. This was actually a good thing, because it helped me discover a mistake I made with the tree building and saved me from making a fool of myself by posting some broken crap (writing unit tests should have theoretically prevented this but I suck at them). Anyway, lets get on with it. I need to define the data structure the tree will be built from. I tried enhancing the tokens to act as the tree nodes themselves, but it made the tree manipulations more complicated. Instead I decide to keep it simple:

class BinaryTreeNode
attr_accessor :lhs, :rhs, :token

def initialize(token)
@token = token
end

def leaf?
return @lhs == nil && @rhs == nil
end
end

Pretty self explanatory I think. The tree node holds a reference to a token and its two child nodes. The 'leaf?' method isn't really necessary but it helps simplify the building algorithm. I considered trying to download some sort of diagramming tool to explain the process, but I knew it was going to suck. So instead, you get this:

Hopefully you can follow this without too much trouble. I build the tree by performing the same single operation over and over. I don't know what to call it ('create_parent' isn't particular explanatory), but this is the code:

def create_parent(target_node, operator, integer)
new_node = BinaryTreeNode.new(operator)
new_node.lhs = target_node
new_node.rhs = BinaryTreeNode.new(integer)
return new_node
end

This method takes an existing node and makes it the left child of a new node. It then sets up the operator and the value and returns the new node. In the diagram above, this method is called on the (6) node in step 1 and the result is the tree in step 2. The tree building algorithm only needs to use this operation on two nodes:

the root node if the operator is a + or - or if the root node is the only node or if the root operator is a * or /.
the root node's right hand child in other cases (the operator is a * or / and the other conditions don't apply).

Once you understand the create_parent operation, you will find there is not much else to it:

def build_tree(tokens)
tokens = tokens.dup
root_node = BinaryTreeNode.new(tokens.shift) 

while(tokens.size > 1)
  operator = tokens.shift
  integer = tokens.shift 
  if(root_node.leaf? || !operator.major || root_node.token.major)
    # If the root_node is the only node or if we have a '+' or '-'
    # OR if the root operator is '*' or '/' then append to root
    root_node = create_parent(root_node, operator, integer)   
  else
    # We have a * or /, append to root.rhs   
    root_node.rhs = create_parent(root_node.rhs, operator, integer)            
  end
end

 return root_node
end

The Operator.major method is a badly named way of determining whether the operator is addition or subtraction (minor) or multiplication or division (major). I'm sure there is some appropriately mathy term that I could use instead but it did not come to me. So while there are still tokens left to process, I take two off, and then do a create_parent with them. The logic I described above is used to figure out where the new node is created and thats it. I must admit I was pleased with the simplicity of this approach. If the next operator is addition or subtraction then you want the existing expression to be evaluated first, so simply creating a new root node and attaching the existing tree to the left does the job. If the next operator is multiplication or division, then you want it to take precedence before the root operator, UNLESS the root operator is also multiplication or division (to preserve left to right evaluation). You might be wondering what that first line of code is, 'tokens = tokens.dup'. This was a quick fix for a small issue I overlooked. I am using Array.shift to work my way through the tokens and as a result when the calling code returns, the array it passed in will be empty. The 'dup' method simply returns a clone of the array which I can modify without risk of breaking anything. I tried writing unit tests for the tree building but they end up being quite lame. Its fiddly work to verify via code that the nodes are arranged into the correct structure. I know it goes against the spirit of unit testing but I found it easier to test the tree structure simply by skipping forward to implementing evaluation and using the evaluation tests. Tomorrow I will post the wacky implementation I chose for evaluating the tree. Its probably not the best way to do it but it demonstrates some cool Ruby functionality.

Tuesday, 22 January 2008

Really simple syntactic analysis

(Continues from the previous post) So now I have an array of these token objects that I want to build into a binary tree. I said yesterday that I would not handle syntax errors in this first version but I have since realised that I was only talking about a particular type of syntax error: one that occurs at the lexical level. For example, the string "32+65-chocolate" is not an expression I expect my lexer to handle. Now if I was writing a lexer for a language that allows variable declarations, the aforementioned string might be valid because the previous line was "chocolate=6". That however, is outside the scope of this exercise so for now "chocolate" does not lex. So with that said, its time to discuss the other type of syntax error that springs to mind. Here are some examples:

1+
2*+3
-4

Uhoh. How is "-4" a syntax error? Its time to spill the beans: I lied when I said my lexer would support integers. This first version will only support non-negative integers. The important thing is to stay focused on getting something working, no matter how basic. I'll deal with those pesky negative numbers later. Ok, so I wrote units tests for the three examples above:

class SyntaxTests < Test::Unit::TestCase
def test_disallow_end_with_operator
tokens = [IntegerToken.new("1"), OperatorToken.new("+")]
assert(!tokens_are_valid(tokens))
end

def test_disallow_two_consecutive_operators
tokens = [IntegerToken.new("2"), OperatorToken.new("*"), OperatorToken.new("-"), IntegerToken.new("3")]
assert(!tokens_are_valid(tokens))
end

def test_disallow_start_with_operator
tokens = [OperatorToken.new("-"), IntegerToken.new("4")]
assert(!tokens_are_valid(tokens))
end
end

Ok now I need to make them pass. No problem!!

def tokens_are_valid(tokens)
return false
end

Ok that was some savage cheats. I obviously need to define some tests for valid syntax aswell as invalid syntax:

def test_allow_two_numbers
tokens = [IntegerToken.new("2"), OperatorToken.new("+"), IntegerToken.new("6")]
assert(tokens_are_valid(tokens))
end

def test_allow_three_numbers
tokens = [IntegerToken.new("73"), OperatorToken.new("*"), IntegerToken.new("56"), OperatorToken.new("-"), IntegerToken.new("94")]
assert(tokens_are_valid(tokens))
end

Ugh, those token constructors are taking up a lot of space aren't they? I might do something about that later. Also while I'm on the topic of token constructors, I also need to write tests to ensure that they raise exceptions if I do something silly like OperatorToken.new("6"). That will also have to wait. Getting back to the topic at hand, I now have some tests that fail:

>ruby calculator.rb
Loaded suite calculator
Started
FF..........
Finished in 0.009 seconds.

1) Failure:
test_allow_three_numbers(SyntaxTests) [calculator.rb:81]:
<false> is not true.

2) Failure:
test_allow_two_numbers(SyntaxTests) [calculator.rb:76]:
<false> is not true.

12 tests, 12 assertions, 2 failures, 0 errors
>Exit code: 1

Ok, now its time to flake out a bit. I figure I'm supposed to define some sort of context-free grammar but it looks kinda hard. Lets do something way easier:

def tokens_are_valid(tokens)
return false if(tokens.empty?)
return false if(tokens.first.class != IntegerToken)
return false if(tokens.last.class != IntegerToken)

expect_next_is_integer = true
tokens.each do |token|
 token_is_integer = (token.class == IntegerToken)
 return false if(expect_next_is_integer != token_is_integer)    
 expect_next_is_integer = !expect_next_is_integer
end

return true
end

If there are no tokens, or we do not start or end with an integer token, then return false. Then just loop through them and make sure the types alternate. I'm assuming that there are only two types of tokens - we will run into problems when it comes time to implement parentheses but that is to be expected. Ok lets give it a try:

>ruby calculator.rb
Loaded suite calculator
Started
............
Finished in 0.001 seconds.

12 tests, 12 assertions, 0 failures, 0 errors
>Exit code: 0

Good job. The process of building the binary tree will be much easier now that we can ensure that our array of tokens is valid.

Monday, 21 January 2008

Really simple lexical analysis

(Continues from the previous post)

If I ever develop my own programming language, you can bet I won't be writing the lexer by hand. Programs such as Flex will generate a lexer based on the regular expressions you feed it. However the purpose of this exercise is to learn the fundamentals of how compilers work, so today I will write the lexer. It should be easy because the strings I am parsing are so simple. I'm going to write a bunch of unit tests to define the behavior I am aiming for. Here is an example:

def test_integer_operator_integer
assert_equal([193,"*",671], tokenize_for_value("193*671") ); end

Basically this is saying that when passed the string "193*671", I expect the tokenize_for_value function to return an array that contains the values 193,"*" and 671. This is pretty simple and could be done in lots of different ways. I chose a mechanism that hopefully somewhat resembles what a proper lexer would do. First I define two types of token:

class IntegerToken
attr_reader :value
def initialize(value)
@value = value.to_i()
end
end

class OperatorToken
attr_reader :value
def initialize(value)
@value = value
end
end

The only difference between them at this point is that the IntegerToken converts its value to an integer (duh). I expect to extend these classes once I get to the next step but for now they will suffice. Next, I define an array of regular expressions that recognise integers and operators:

definitions = []
definitions << [/^\d+/, IntegerToken]
definitions << [/^\+/, OperatorToken]
definitions << [/^\-/, OperatorToken]
definitions << [/^\*/, OperatorToken]
definitions << [/^\//, OperatorToken]

Now we just loop our way through the string, matching the regular expressions against it and slicing out the characters that were matched successfully and build a token for them.

  tokens = []
lastSize = 0
# Track the number of remaining characters in the expression.
# If we manage to go through all the regular expressions without
# getting a match then we can parse no more.
while(expression.size != lastSize)
lastSize = expression.size
definitions.each do |regex, token_type|
# Do a match with the regex against the string, remove the
# characters that match and put them into lexmeme
lexmeme = expression.slice!(regex)
if (lexmeme)
  # We got a match, create a new token and break out of
  # the regex loop to preserve the regex ordering.
  tokens << token_type.new(lexmeme)
  break
end
end
end

return tokens

It would be fairly easy to add some code that will report if we had to finish parsing with characters still remaining - this would only happen in the case of a syntax error. I'll add that later once I have a basic version of the calculator working. Finally, I need a method that will take the string, call tokenize and give me an array of their values to make the unit tests easy to write:

def tokenize_for_value(expression)
tokens = tokenize(expression)
return tokens.map { |t| t.value }
end

When I run the ruby file (source) the unit tests execute. I wrote seven, so here is the output:

>ruby calculator_source.rb
Loaded suite calculator_source
Started
.......
Finished in 0.001 seconds.

7 tests, 7 assertions, 0 failures, 0 errors
>Exit code: 0

I'm pleased with how easy it is to write and run unit tests in ruby. There was very little friction. Next up: building a binary tree out of the tokens.

Sunday, 13 January 2008

Using IRB as an engine for programming puzzle game things

So I've been screwing around with Ruby a fair bit, but haven't really started writing proper programs in it yet. However I did have this one dumb idea that amused me, perhaps you will get a kick out of it. I mentioned in the last post that Ruby has a REPL called IRB. While experimenting with how it works I had this idea of making small puzzles that can be loaded into the IRB and then solved by executing code. Here is sample output for a toy example I made:

C:\Users\paul>irb -r puzzle1.rb
You have begun Puzzle 1.
You are in a small room, with a single closed door.
You can see a silver_key and copper_key, both on the floor.
The goal of the puzzle is to get the door open so you can leave the room.
To inspect individual objects simply type their name.
To interact with objects, use 'noun.verb' syntax. For example, try 'door.open'.

Puzzle1 >door
The door is closed.
Avaliable actions:
close
inspect
lock
open
unlock

Puzzle1 >door.open
The door is locked.

Puzzle1 >door.unlock
Action 'unlock' expects a single argument.

Puzzle1 >door.unlock silver_key
You try to put the key in the lock but it does not fit.

Puzzle1 >door.unlock copper_key
You unlocked the door.

Puzzle1 >door.open
The door is now open.
You exit the room via the doorway. Wheee, you completed the puzzle!

C:\Users\paul>

The source for this is quite ugly at the moment. I'm looking forward to digging into some of Ruby's meta-programming features that will help me clean it up. Now if player knows just a tiny bit of Ruby they can 'cheat' and finish this without using the silver_key or copper_key. I'm hoping to take this idea one step further and make puzzles that cannot be completed without this sort of 'cheating'. I'm not interested in developing typical text adventure style puzzles - I was never that much of a fan of the genre. But logic puzzles that require programming skills? That is another thing entirely. Surely lots of other people have had this same idea but I did a few searches and didn't find anything. A shame really because I imagine they could be quite fun. I will have a go at making one and we will see what happens.

Thursday, 10 January 2008

First impressions of Ruby

My first experience with Ruby was the really nice interactive tutorial. This is by far the best way to introduce people to a language that I have seen - it is really slick. My complements to Why for managing to make the first step of learning a new language so painless. However I would be remiss if I failed to mention the damn cursor on the thing - when you are using backspace it always deletes one more character than it looks like. Don't let this put you off though, give it a go! After finishing the interactive tutorial I downloaded the latest installer for windows and began playing around. Exploring Ruby is proving to be lots of fun but I am somewhat spoiled by Visual Studio - the tools and documentation included in the package are simply not as sophisticated and this is hardly surprising. The package comes with a REPL (Read-Eval-Print Loop). If you tried the interactive Ruby tutorial then you already have experience using a REPL - they are quite useful for learning a new programming language, though they are far better suited to interpreted languages than compiled languages. I find the simplicity of a REPL attractive because it makes it convenient to do short experimentation sessions. I can sit down for just 10 minutes and learn a couple of new things. Next post: RRC (Random Ruby Crap) #1

Paul Batum