Monday 28 January 2008

Evaluating the expression tree

Lets start off with some tests:

class EvaluateTests < Test::Unit::TestCase
  def test_1
    assert_equal(1, calculate("1"))
  end
  
  def test_add
    assert_equal(3, calculate("1+2"))
  end
  
  def test_subtract
    assert_equal(3, calculate("7-4"))
  end
  
  def test_multiply
    assert_equal(6, calculate("3*2"))
  end
  
  def test_divide
    assert_equal(3, calculate("6/2"))
  end
  
  def test_negative_result
    assert_equal(-1, calculate("1-2"))
  end
  
  def test_multitive_before_additive
    assert_equal(14, calculate("4*3+2"))
    assert_equal(14, calculate("2+3*4"))
    assert_equal(10, calculate("4+3*2"))
  end
  
  def test_left_to_right
    assert_equal(8, calculate("12-6+2"))
    assert_equal(4, calculate("12/6*2"))
  end
end
Not exactly comprehensive, but its a start. I need to write a calculate method to tie together the work I've done over the last few weeks:
def calculate(expression)
  tokens = tokenize(expression)
  root_node = build_tree(tokens)
  return root_node.evaluate()
end
Evaluating the tree is obviously a recursive function. Here is the first bit:
class BinaryTreeNode
  def evaluate()
    return @token.evaluate(self)
  end
end
I decided the easiest way to do this is have the evaluate method take an argument: the tree node that contains the token. When evaluating an IntegerToken the value is simply returned, no need to use the argument:
class IntegerToken < Token
  def evaluate(containerNode)
    return @value
  end
end
For the OperatorToken, we want: evaluate(left child) [operator] evaluate (right child). There are lots of different ways to do this - a case statement is the simplest:
class OperatorToken
  def evaluate(containerNode)
    case @value
      when "+"
        return containerNode.lhs.evaluate() + containerNode.rhs.evaluate()
      when "-"
        return containerNode.lhs.evaluate() - containerNode.rhs.evaluate()
      when "*"
        return containerNode.lhs.evaluate() * containerNode.rhs.evaluate()
      when "/"
        return containerNode.lhs.evaluate() / containerNode.rhs.evaluate()
      end
    end
   end
end
My my isn't that boring. Another option would be to implement subclasses for OperatorToken - I wouldn't learn much from doing it though. Lets try something different instead:
class OperatorToken < Token
  attr_reader :major
  
  def initialize(value)
    @value = value
    @eval_method = Fixnum.instance_method(value)
    case @value
      when "+","-"   
        @major = false
      when "*","/"
        @major = true
      end
  end
  
  def evaluate(containerNode)
    bound_method = @eval_method.bind(containerNode.lhs.evaluate())    
    return bound_method.call(containerNode.rhs.evaluate())
  end
end
The three relevant lines of code are highlighted. But first, some background. Operators in Ruby are actually just methods. When I execute the line x = 2 + 3 I am calling the '+' method on the object '2' which is an instance of the class Fixnum. So what I am actually doing with that line is this: x = 2.+(3) . Classes in Ruby are just like normal objects, so you can call methods on them as I am doing in the first highlighted line. If the OperatorToken is constructed with "+", then I acquire an instance of the '+' method defined on the class Fixnum and store it as @eval_method. What I actually store is an instance of UnboundMethod - if you are a .NET developer, it might help to think of it as similar to MethodInfo as you can use it to invoke the method, but you will need to specify the instance you wish to invoke the method on. The second highlighted line does this - the left node is evaluated and the result is our bind target. The third line invokes the bound method and passes the result of evaluating the right node. The result is what we are after and is returned. Yes, this is much more complicated than a switch statement, and surely much slower. But I learned something, and hopefully so did you. Lets run my unit tests:
>ruby calculator.rb
Loaded suite calculator
Started
........................
Finished in 0.022 seconds.

24 tests, 37 assertions, 0 failures, 0 errors
>Exit code: 0
I think its worth mentioning that I have sort of covered up some mistakes I made while implementing the tree building. I didn't initially write the left to right unit tests and so for a while my calculator could do "2+9/3*4" but not "9/3*4" (it returned zero rather than 12). I didn't notice this problem until I ran the stress tester:
def stress_test()
  expressions = generate_expressions(100)
  rejects = []
  expressions.each do |exp|
      rejects << exp if (eval(exp) != calculate(exp))    
  end
  return rejects
end
 
def generate_expressions(count)
    expressions = []
    operators = ["+","-","*","/"]    
    count.times do
      exp = (rand(100) + 1).to_s
      rand(10).times do
        exp << operators[rand(4)]
        exp << (rand(100) + 1).to_s
      end        
      expressions << exp
    end
    return expressions
end
This baby will generate 100 expressions, each containing up to 10 terms between 1 and 100. It does not generate zeros because I would then have to deal with division by zero problems (I'll let that wait for version two). The eval method is not mine - it is a built in method that uses Ruby's own expression evaluation engine. This is really useful because I can pass an expression string into my program and compare the result with what Ruby came up with. If there is a discrepancy then I add it to the array of rejects and return them when done. The stress test revealed the aforementioned problem with left to right evaluation which I fixed, but it also raised something else. Consider this test:
def test_really_big_number
  assert_equal(1000000000000000000001, calculate("1000000000000000000000+1"))
end
Unfortunately, it doesn't pass:
>ruby calculator.rb
Loaded suite calculator
Started
........E................
Finished in 0.02 seconds.

1) Error:
test_really_big_number(EvaluateTests):
TypeError: bind argument must be an instance of Fixnum
calculator.rb:55:in `bind'
calculator.rb:55:in `evaluate'
calculator.rb:12:in `evaluate'
calculator.rb:118:in `calculate'
calculator.rb:209:in `test_really_big_number'

25 tests, 37 assertions, 0 failures, 1 errors
>Exit code: 1
When a Fixnum gets larger than a certain size it changes into a Bignum. Unfortunately, this wreaks havoc with my evaluation code because my unbound method will only bind to a Fixnum, not a Bignum. There are some workarounds, a relatively simple one already springs to mind. However I don't think they are worth entertaining - going back to the case statement or something similar seems to be a better idea. Next post I clean up this issue and the code in general.

Pasting code into Blogger

I've been using a javascript tool to format my example code. It does two useful things:

  1. Replaces html-unfriendly characters such as <>
  2. Wraps it in some nice formatting tags.
You can choose whether it embeds the style information directly into the output or creates a seperate css entry. I've been using the embedding up until recently, but now I have a problem. I've discovered that Blogger's rich text compose mode annihilates my indenting so its time to switch to using the html editor instead. The embedded styles make for noisy html so I will fiddle with this post and try to get the css working instead.
class EvaluateTests < Test::Unit::TestCase
  def test_1
    assert_equal(1, calculate("1"))
  end
end
Edit: Yay, that works fine.

Sunday 27 January 2008

More syntactic analysis: building an expression tree

Its been a few days since the last post in this series, mostly because this step was trickier than the previous few and partly because once I had the tree building working I couldn't resist adding the evaluation step. This was actually a good thing, because it helped me discover a mistake I made with the tree building and saved me from making a fool of myself by posting some broken crap (writing unit tests should have theoretically prevented this but I suck at them). Anyway, lets get on with it. I need to define the data structure the tree will be built from. I tried enhancing the tokens to act as the tree nodes themselves, but it made the tree manipulations more complicated. Instead I decide to keep it simple:

class BinaryTreeNode
attr_accessor :lhs, :rhs, :token

def initialize(token)
@token = token
end

def leaf?
return @lhs == nil && @rhs == nil
end
end
Pretty self explanatory I think. The tree node holds a reference to a token and its two child nodes. The 'leaf?' method isn't really necessary but it helps simplify the building algorithm. I considered trying to download some sort of diagramming tool to explain the process, but I knew it was going to suck. So instead, you get this: Hopefully you can follow this without too much trouble. I build the tree by performing the same single operation over and over. I don't know what to call it ('create_parent' isn't particular explanatory), but this is the code:
def create_parent(target_node, operator, integer)
new_node = BinaryTreeNode.new(operator)
new_node.lhs = target_node
new_node.rhs = BinaryTreeNode.new(integer)
return new_node
end
This method takes an existing node and makes it the left child of a new node. It then sets up the operator and the value and returns the new node. In the diagram above, this method is called on the (6) node in step 1 and the result is the tree in step 2. The tree building algorithm only needs to use this operation on two nodes:
  • the root node if the operator is a + or - or if the root node is the only node or if the root operator is a * or /.
  • the root node's right hand child in other cases (the operator is a * or / and the other conditions don't apply).
Once you understand the create_parent operation, you will find there is not much else to it:
def build_tree(tokens)
tokens = tokens.dup
root_node = BinaryTreeNode.new(tokens.shift) 

while(tokens.size > 1)
  operator = tokens.shift
  integer = tokens.shift 
  if(root_node.leaf? || !operator.major || root_node.token.major)
    # If the root_node is the only node or if we have a '+' or '-'
    # OR if the root operator is '*' or '/' then append to root
    root_node = create_parent(root_node, operator, integer)   
  else
    # We have a * or /, append to root.rhs   
    root_node.rhs = create_parent(root_node.rhs, operator, integer)            
  end
end

 return root_node
end
The Operator.major method is a badly named way of determining whether the operator is addition or subtraction (minor) or multiplication or division (major). I'm sure there is some appropriately mathy term that I could use instead but it did not come to me. So while there are still tokens left to process, I take two off, and then do a create_parent with them. The logic I described above is used to figure out where the new node is created and thats it. I must admit I was pleased with the simplicity of this approach. If the next operator is addition or subtraction then you want the existing expression to be evaluated first, so simply creating a new root node and attaching the existing tree to the left does the job. If the next operator is multiplication or division, then you want it to take precedence before the root operator, UNLESS the root operator is also multiplication or division (to preserve left to right evaluation). You might be wondering what that first line of code is, 'tokens = tokens.dup'. This was a quick fix for a small issue I overlooked. I am using Array.shift to work my way through the tokens and as a result when the calling code returns, the array it passed in will be empty. The 'dup' method simply returns a clone of the array which I can modify without risk of breaking anything. I tried writing unit tests for the tree building but they end up being quite lame. Its fiddly work to verify via code that the nodes are arranged into the correct structure. I know it goes against the spirit of unit testing but I found it easier to test the tree structure simply by skipping forward to implementing evaluation and using the evaluation tests. Tomorrow I will post the wacky implementation I chose for evaluating the tree. Its probably not the best way to do it but it demonstrates some cool Ruby functionality.

Thursday 24 January 2008

Android Code Day: London

In November last year, Google announced an open source operating system for mobile phones called Android. This is a big deal for me because I have wanted a "gPhone" for a while now and a phone that runs Android will be the closest I can get any time soon. I expect the mobile applications market will grow significantly over the next few years and so this is one of the fields I want to start spending time on. I was expecting to start posting about Android in a week or two, after I finished the series of articles on the parsing/calculator exercise. The reason why I'm posting about Android today is because last night I managed to register for the Android Code Day running next week at Google London. I found out about it via one of Google's official blogs and I'm quite glad I was subscribed - I checked a few minutes ago and registration has already closed. These things are popular! I'm still working on the parsing exercise. I can be kinda slow when it comes to this sort of stuff (hey that's part of why I'm doing it!) and I'm struggling to come up with the algorithm for building the binary tree while ensuring I follow order of operations. I could probably Google/Wiki it but its become a personal challenge now - I know I can do this bit without help, its just about getting my brain to shift into the right gear. "You are SUCH a geek – you know that don’t you?" - My boss's reply when I asked for the day off so I could go to the Android Code Day :)

Tuesday 22 January 2008

Really simple syntactic analysis

(Continues from the previous post) So now I have an array of these token objects that I want to build into a binary tree. I said yesterday that I would not handle syntax errors in this first version but I have since realised that I was only talking about a particular type of syntax error: one that occurs at the lexical level. For example, the string "32+65-chocolate" is not an expression I expect my lexer to handle. Now if I was writing a lexer for a language that allows variable declarations, the aforementioned string might be valid because the previous line was "chocolate=6". That however, is outside the scope of this exercise so for now "chocolate" does not lex. So with that said, its time to discuss the other type of syntax error that springs to mind. Here are some examples:

  • 1+
  • 2*+3
  • -4
Uhoh. How is "-4" a syntax error? Its time to spill the beans: I lied when I said my lexer would support integers. This first version will only support non-negative integers. The important thing is to stay focused on getting something working, no matter how basic. I'll deal with those pesky negative numbers later. Ok, so I wrote units tests for the three examples above:
class SyntaxTests < Test::Unit::TestCase
def test_disallow_end_with_operator
tokens = [IntegerToken.new("1"), OperatorToken.new("+")]
assert(!tokens_are_valid(tokens))
end

def test_disallow_two_consecutive_operators
tokens = [IntegerToken.new("2"), OperatorToken.new("*"), OperatorToken.new("-"), IntegerToken.new("3")]
assert(!tokens_are_valid(tokens))
end

def test_disallow_start_with_operator
tokens = [OperatorToken.new("-"), IntegerToken.new("4")]
assert(!tokens_are_valid(tokens))
end
end
Ok now I need to make them pass. No problem!!
def tokens_are_valid(tokens)
return false
end
Ok that was some savage cheats. I obviously need to define some tests for valid syntax aswell as invalid syntax:
def test_allow_two_numbers
tokens = [IntegerToken.new("2"), OperatorToken.new("+"), IntegerToken.new("6")]
assert(tokens_are_valid(tokens))
end

def test_allow_three_numbers
tokens = [IntegerToken.new("73"), OperatorToken.new("*"), IntegerToken.new("56"), OperatorToken.new("-"), IntegerToken.new("94")]
assert(tokens_are_valid(tokens))
end
Ugh, those token constructors are taking up a lot of space aren't they? I might do something about that later. Also while I'm on the topic of token constructors, I also need to write tests to ensure that they raise exceptions if I do something silly like OperatorToken.new("6"). That will also have to wait. Getting back to the topic at hand, I now have some tests that fail:
>ruby calculator.rb
Loaded suite calculator
Started
FF..........
Finished in 0.009 seconds.

1) Failure:
test_allow_three_numbers(SyntaxTests) [calculator.rb:81]:
<false> is not true.

2) Failure:
test_allow_two_numbers(SyntaxTests) [calculator.rb:76]:
<false> is not true.

12 tests, 12 assertions, 2 failures, 0 errors
>Exit code: 1
Ok, now its time to flake out a bit. I figure I'm supposed to define some sort of context-free grammar but it looks kinda hard. Lets do something way easier:
def tokens_are_valid(tokens)
return false if(tokens.empty?)
return false if(tokens.first.class != IntegerToken)
return false if(tokens.last.class != IntegerToken)

expect_next_is_integer = true
tokens.each do |token|
 token_is_integer = (token.class == IntegerToken)
 return false if(expect_next_is_integer != token_is_integer)    
 expect_next_is_integer = !expect_next_is_integer
end

return true
end
If there are no tokens, or we do not start or end with an integer token, then return false. Then just loop through them and make sure the types alternate. I'm assuming that there are only two types of tokens - we will run into problems when it comes time to implement parentheses but that is to be expected. Ok lets give it a try:
>ruby calculator.rb
Loaded suite calculator
Started
............
Finished in 0.001 seconds.

12 tests, 12 assertions, 0 failures, 0 errors
>Exit code: 0
Good job. The process of building the binary tree will be much easier now that we can ensure that our array of tokens is valid.

Monday 21 January 2008

Really simple lexical analysis

(Continues from the previous post)
If I ever develop my own programming language, you can bet I won't be writing the lexer by hand. Programs such as Flex will generate a lexer based on the regular expressions you feed it. However the purpose of this exercise is to learn the fundamentals of how compilers work, so today I will write the lexer. It should be easy because the strings I am parsing are so simple. I'm going to write a bunch of unit tests to define the behavior I am aiming for. Here is an example:
def test_integer_operator_integer
assert_equal([193,"*",671], tokenize_for_value("193*671") ); end
Basically this is saying that when passed the string "193*671", I expect the tokenize_for_value function to return an array that contains the values 193,"*" and 671. This is pretty simple and could be done in lots of different ways. I chose a mechanism that hopefully somewhat resembles what a proper lexer would do. First I define two types of token:
class IntegerToken
attr_reader :value
def initialize(value)
@value = value.to_i()
end
end

class OperatorToken
attr_reader :value
def initialize(value)
@value = value
end
end
The only difference between them at this point is that the IntegerToken converts its value to an integer (duh). I expect to extend these classes once I get to the next step but for now they will suffice. Next, I define an array of regular expressions that recognise integers and operators:
definitions = []
definitions << [/^\d+/, IntegerToken]
definitions << [/^\+/, OperatorToken]
definitions << [/^\-/, OperatorToken]
definitions << [/^\*/, OperatorToken]
definitions << [/^\//, OperatorToken]
Now we just loop our way through the string, matching the regular expressions against it and slicing out the characters that were matched successfully and build a token for them.
  tokens = []
lastSize = 0
# Track the number of remaining characters in the expression.
# If we manage to go through all the regular expressions without
# getting a match then we can parse no more.
while(expression.size != lastSize)
lastSize = expression.size
definitions.each do |regex, token_type|
# Do a match with the regex against the string, remove the
# characters that match and put them into lexmeme
lexmeme = expression.slice!(regex)
if (lexmeme)
  # We got a match, create a new token and break out of
  # the regex loop to preserve the regex ordering.
  tokens << token_type.new(lexmeme)
  break
end
end
end

return tokens
It would be fairly easy to add some code that will report if we had to finish parsing with characters still remaining - this would only happen in the case of a syntax error. I'll add that later once I have a basic version of the calculator working. Finally, I need a method that will take the string, call tokenize and give me an array of their values to make the unit tests easy to write:
def tokenize_for_value(expression)
tokens = tokenize(expression)
return tokens.map { |t| t.value }
end
When I run the ruby file (source) the unit tests execute. I wrote seven, so here is the output:
>ruby calculator_source.rb
Loaded suite calculator_source
Started
.......
Finished in 0.001 seconds.

7 tests, 7 assertions, 0 failures, 0 errors
>Exit code: 0
I'm pleased with how easy it is to write and run unit tests in ruby. There was very little friction. Next up: building a binary tree out of the tokens.

Sunday 20 January 2008

Discovering compilers

My first exercise for learning how compilers work is to write a program that takes a string containing a mathematical expression and evaluates the result e.g. "2+3*7" => 23. I will be implementing it using Ruby and will write unit tests as I go. Since my goal is to learn the underlying concepts of compiler construction, I am avoiding any libraries/functionality provided by Ruby that would make it too easy. The first implementation will ignore whitespace and accept strings containing integers and the 4 standard operators (plus, minus, multiply, divide). The second implementation will report syntax errors and accept decimal numbers and parentheses. According to my research on Wikipedia, this exercise can be broken into the following steps:

  1. lexical analysis (identifying the tokens that appear in the string)
  2. syntactic analysis (assembling the tokens into a tree)
  3. semantic analysis (evaluating the tree for a result)
I will implement the lexical analysis step using regular expressions. They are well suited for the task and I need the practice with them. The syntactic analysis is the tricky bit because I have to follow order of operations. The semantic analysis will be trivial as it simply requires a recursive algorithm executed on the root node of the tree. Next up: Implementing lexical analysis.

Sunday 13 January 2008

Using IRB as an engine for programming puzzle game things

So I've been screwing around with Ruby a fair bit, but haven't really started writing proper programs in it yet. However I did have this one dumb idea that amused me, perhaps you will get a kick out of it. I mentioned in the last post that Ruby has a REPL called IRB. While experimenting with how it works I had this idea of making small puzzles that can be loaded into the IRB and then solved by executing code. Here is sample output for a toy example I made:

C:\Users\paul>irb -r puzzle1.rb
You have begun Puzzle 1.
You are in a small room, with a single closed door.
You can see a silver_key and copper_key, both on the floor.
The goal of the puzzle is to get the door open so you can leave the room.
To inspect individual objects simply type their name.
To interact with objects, use 'noun.verb' syntax. For example, try 'door.open'.

Puzzle1 >door
The door is closed.
Avaliable actions:
close
inspect
lock
open
unlock

Puzzle1 >door.open
The door is locked.

Puzzle1 >door.unlock
Action 'unlock' expects a single argument.

Puzzle1 >door.unlock silver_key
You try to put the key in the lock but it does not fit.

Puzzle1 >door.unlock copper_key
You unlocked the door.

Puzzle1 >door.open
The door is now open.
You exit the room via the doorway. Wheee, you completed the puzzle!

C:\Users\paul>
The source for this is quite ugly at the moment. I'm looking forward to digging into some of Ruby's meta-programming features that will help me clean it up. Now if player knows just a tiny bit of Ruby they can 'cheat' and finish this without using the silver_key or copper_key. I'm hoping to take this idea one step further and make puzzles that cannot be completed without this sort of 'cheating'. I'm not interested in developing typical text adventure style puzzles - I was never that much of a fan of the genre. But logic puzzles that require programming skills? That is another thing entirely. Surely lots of other people have had this same idea but I did a few searches and didn't find anything. A shame really because I imagine they could be quite fun. I will have a go at making one and we will see what happens.

Thursday 10 January 2008

First impressions of Ruby

My first experience with Ruby was the really nice interactive tutorial. This is by far the best way to introduce people to a language that I have seen - it is really slick. My complements to Why for managing to make the first step of learning a new language so painless. However I would be remiss if I failed to mention the damn cursor on the thing - when you are using backspace it always deletes one more character than it looks like. Don't let this put you off though, give it a go! After finishing the interactive tutorial I downloaded the latest installer for windows and began playing around. Exploring Ruby is proving to be lots of fun but I am somewhat spoiled by Visual Studio - the tools and documentation included in the package are simply not as sophisticated and this is hardly surprising. The package comes with a REPL (Read-Eval-Print Loop). If you tried the interactive Ruby tutorial then you already have experience using a REPL - they are quite useful for learning a new programming language, though they are far better suited to interpreted languages than compiled languages. I find the simplicity of a REPL attractive because it makes it convenient to do short experimentation sessions. I can sit down for just 10 minutes and learn a couple of new things. Next post: RRC (Random Ruby Crap) #1

Monday 7 January 2008

A plan

The first step in fixing a problem is recognising the problem exists. If you read my first post then you know I already have this covered. Just to recap, my software development skills are narrowly focused. I idly dream about working for Google but until just recently I was nowhere near being on the right path (I hope I'm closer now). The second step is to devise a plan to solve the problem. The third step is to put the plan in action, but I will write about that another day. This post is about the second step. Here is my plan in no particular order: 1. Learn a programming language that is outside my comfort zone First and foremost, it cannot be statically typed. It is time I got some perspective and experienced first hand what programming is like without this particular "safety net". As compensation for losing static typing I want some advanced object oriented support. Features such as "pure OO" (everything is an object) and stuff like mix-ins would be good. Finally, it must be professionally relevant, which rules out languages that I consider to be academic or niche such as Haskell and Lisp. It didn't take long to decide on Ruby. I am impressed with its OO features and meta-programming support. It is definitely outside my comfort zone and seems like a worthwhile language to learn given the popularity of Rails. Of course the fact that Steve Yegge has written a fair bit of positive stuff about Ruby has nothing at all to do with my decision. 2. Learn to develop for the web using technology other than ASP.NET I love frameworks, but they do have costs associated with them. One cost I have incurred from using ASP.NET exclusively is that I'm a web developer that can't write javascript. Another cost is that I lack familiarity with other approaches to web development such as MVC. Just as I seek perspective in other programming language paradigms, the same applies for web development. I think its worth concentrating on the web in particular because if I'm ever going to get uber rich, it will be from joining/founding a web startup. No, I'm not going to learn PHP. My current plan is to learn javascript, and tool around with some other frameworks such as Ruby on Rails and the Google Web Toolkit. 3. Write a blog about my endeavours. Write often and carefully. Blogging is an important part of my plan for a number of reasons. The most important reason is that writing a post about a particular topic will force me to get my thinking straight. A useful measure of how well you understand something is how well you can explain it. Another reason is the motivational aspect. I am convinced that maintaining a blog will help me stay interested in the project. Hopefully I'll manage to make it interesting enough to generate comments and lets face it, its much easier to stick to something if other people take interest in it. Yet another reason is that blogging will improve my communication skills. I am generally a slow writer. Well formed code comes to me much easier than well formed written English. I suspect I am not alone in that regard. Its really distressing how slowly I am writing these posts. I think at the moment they are taking about 5 hours each. You are probably blinking incredulously at that figure and frankly I do not blame you. What can I say - sometimes it takes a long time to get my thoughts straight. Plus I like to do research on the topics I'm touching on to avoid saying anything incredibly stupid. Ok, enough blogging about blogging. I am aware that it is often frowned upon. It wont happen again. For a while at least. I promise. 4. Fill out some of the computer-sciencey areas that my software engineering degree did not touch heavily on. Particularly: Algorithms and data structures, big O notation and compilers. If you had asked me yesterday to explain how quicksort works I wouldn't have been able to. I know I learned it at some point at university, but it didn't stick. I would struggle to answer questions about tree traversal (all I can remember is something about breadth first or depth first). I understand the basics of complexity analysis - I can tell you why O(n log n) is better than O(n^2) and what they mean, but I can't go much further than that. And as for compilers, well, there wasn't enough interest in the class and it was cancelled so I had to pick something different. For the algorithms and data structures, simple programming exercises will go a long way. For complexity analysis I will start from Wikipedia and go from there. As far as compilers go, I'll be taking baby steps - a calculator program that parses and calculates the result for strings such as "2 + 3 x 5 / (7-4)" by building a binary tree will be a good place to start I think. Well, that is the plan so far. Next post will be about my first impressions of Ruby.

Friday 4 January 2008

This is Steve Yegge's fault

A couple of weeks ago I discovered Steve Yegge's blog via Jeff Atwood. Steve has written many essays (you should see the size of these things, they are not mere 'posts') and in general they focus on programming languages and improving your development skills & practices. I found Steve's writing fascinating and to put it briefly, his essays are the reason this blog exists. Now, I've been trying to kick this blog off for a couple of days and this first post has been causing me some real difficulty because it keeps devolving into a flowery poem entitled "Dear Stevey, how do I love thee, let me count the ways". Let me try once again to express this succinctly and without unnecessary romantics: taken in concentrated dose, by a certain category of programmer, Steve's essays are a cure for laziness in multiple domains. Please allow me a moment of self-indulgence to provide a little background. I am a software engineering graduate and I have been developing software exclusively using Microsoft tools and environments for the entirety of my short career (5 years, including part time work during university). I suspect I have fallen into a particular stereotype that might not be well named but is at least recognisable, given these attributes:

  • Windows is the only operating system I know how to use.
  • Since leaving university 2 years ago I have not learned any new programming languages (I am primarily a C# developer).
  • I have an unhealthy degree of affection for strong typing, static type checking and intellisense.
  • Everything I know about web development is ASP.NET based.
I could go on, but I think you get the point. I am horribly Microsoft focused and lacking in perspective. I'd say the depth of my knowledge is pretty good given how long I've been working in the industry... but my breadth of knowledge? Forget about it! Its probably worth mentioning one thing that I consider to be well-founded: I am nutty about object oriented programming. I am hoping to learn first hand how programming without static typing can be a good thing, but as far as OO is concerned, you will have to pry it out of my cold, dead fingers! As you may have surmised, Steve has convinced me to start blogging. The primary focus of this blog is breaking out of the mold I described above. Reading software blogs and doing my job is not enough. Not if I ever want to work at Google (which I do). You don't become an amazing programmer simply by clocking in 9-5 for X years and reading Joel on Software (although I'm sure it helps). What embarrasses me is that it has taken me so long to figure that out. Next post: some more details about my plan.