Saturday 21 February 2009

Fluent NHibernate semantic model: Visitors

My work-in-progress semantic model based version of Fluent NHibernate makes heavy use of the visitor pattern. Currently, the visitor implementation is there to serve two purposes:

  1. Facilitate the construction of a Hbm* representation of the mapping model. Hbm* refers to the set of classes found in the NHibernate.Cfg.MappingSchema namespace that are generated from the NHibernate mapping schema. My version of Fluent NHibernate communicates with NHibernate by serializing the Hbm* representation to xml.
  2. Enable powerful, user-defined conventions that walk the mapping model and make changes. The NamingConvention class is a very simple example of this – it takes mapping model instances and sets their Name property based on reflected data such as the System.Type or the PropertyInfo.
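
Throughout the model, every node accepts an IMappingModelVisitor. I won't show the full interface here, but the code in this post implies a shape roughly like the following sketch – the exact member list is an assumption, inferred from the calls you'll see below:

```csharp
// Hypothetical sketch of IMappingModelVisitor, inferred from the calls
// made in the AcceptVisitor implementations shown in this post.
public interface IMappingModelVisitor
{
    // One Process* method per node type, called by the node itself...
    void ProcessHibernateMapping(HibernateMapping hibernateMapping);
    void ProcessClass(ClassMapping classMapping);
    void ProcessColumn(ColumnMapping columnMapping);
    // ...etc...

    // ...and one Visit overload per child type, giving the visitor
    // control over what happens to each child.
    void Visit(ClassMapping classMapping);
    void Visit(PropertyMapping propertyMapping);
    void Visit(ManyToOneMapping manyToOneMapping);
    // ...etc...
}
```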

Let's begin by looking at the implementation of AcceptVisitor for the root of the mapping model – HibernateMapping:

public override void AcceptVisitor(IMappingModelVisitor visitor)
{
    visitor.ProcessHibernateMapping(this);

    foreach (var classMapping in Classes)
        visitor.Visit(classMapping);
}

This is reasonably straightforward. The HibernateMapping tells the visitor to first process a HibernateMapping instance, passing itself as the argument. Then it tells the visitor to visit each of the child classes. What the visitor does when it's told to visit a ClassMapping is its own business, but what it is likely to do is call AcceptVisitor on the ClassMapping:

public override void AcceptVisitor(IMappingModelVisitor visitor)
{
    visitor.ProcessClass(this);            

    if (Id != null)
        visitor.Visit(Id);

    if (Discriminator != null)
        visitor.Visit(Discriminator);

    foreach (var subclass in Subclasses)
        visitor.Visit(subclass);

    base.AcceptVisitor(visitor);
}

This is similar to the previous AcceptVisitor implementation, but it's worth noting that at the end it calls base.AcceptVisitor(visitor). This is necessary because ClassMapping inherits from a common base class (JoinedSubclassMapping and SubclassMapping also inherit from it). Here is AcceptVisitor on ClassMappingBase:

public override void AcceptVisitor(IMappingModelVisitor visitor)
{
    foreach (var collection in Collections)
        visitor.Visit(collection);

    foreach (var property in Properties)
        visitor.Visit(property);

    foreach (var reference in References)
        visitor.Visit(reference);
}

Of course – all class mappings, regardless of how they fit into an inheritance hierarchy, can have collections, properties and references (many-to-ones). It's probably not necessary to follow this any further. The important point is that as long as the visitor calls AcceptVisitor whenever it is told to visit something, it will make its way across the entire mapping model. To make life easier, I've implemented a DefaultMappingModelVisitor class that does precisely this. It has a whole bunch of code that all looks very similar to this:

public override void Visit(PropertyMapping propertyMapping)
{
    propertyMapping.AcceptVisitor(this);
}

public override void Visit(ManyToOneMapping manyToOneMapping)
{
    manyToOneMapping.AcceptVisitor(this);
}

public override void Visit(KeyMapping keyMapping)
{
    keyMapping.AcceptVisitor(this);
}
// ... many more Visit methods

Now you might be looking at this and wondering why it is necessary. Why can't we skip the indirection completely, and have AcceptVisitor implementations call AcceptVisitor on their children directly, like this:

public override void AcceptVisitor(IMappingModelVisitor visitor)
{
    visitor.ProcessHibernateMapping(this);

    foreach (var classMapping in Classes)
        classMapping.AcceptVisitor(visitor); <--- JUST DO THIS??
}

The answer is that the proposed change works when you want a single visitor instance to visit the entire mapping model. While this is fine for conventions (as you will see in a moment), it does not work so well for building the Hbm* representation. I'll get to that in a second, but let's first take a look at the simpler case of the conventions. Here is a gutted version of the NamingConvention:

public class NamingConvention : DefaultMappingModelVisitor
{
    public Func<MemberInfo, string> DetermineNameFromMember = info => info.Name;
    public Func<Type, string> DetermineNameFromType = type => type.AssemblyQualifiedName;

    public override void ProcessOneToMany(OneToManyMapping oneToManyMapping)
    {
        if (!oneToManyMapping.Attributes.IsSpecified(x => x.ClassName))
        {
            if (oneToManyMapping.ChildType == null)
                throw new ConventionException("Cannot apply the naming convention. No type specified.", oneToManyMapping);
            oneToManyMapping.ClassName = DetermineNameFromType(oneToManyMapping.ChildType);
        }
    }
}

I’ve removed the majority of its implementation for the sake of brevity. As it currently stands, it will walk the entire mapping model (because it inherits from the aforementioned DefaultMappingModelVisitor), and when it encounters a OneToManyMapping, it will attempt to set its ClassName based on the ChildType property. It's worth noting that the full implementation of the NamingConvention class handles naming for many other mapping types, such as ClassMappings, ManyToManyMappings, etc. This means that this one visitor completely handles the concern of setting the names of mapping model elements for the entire mapping model. This point is important, because the next example is different. As I mentioned before, this visitor implementation would work fine with the previous simplification of having AcceptVisitor directly call AcceptVisitor on the children. Let's now move on to the process of building an Hbm* representation, and examine why the simplification won't work so well for this case.

I define an interface for classes that build Hbm:

public interface IHbmWriter<T>
{
    object Write(T mappingModel);        
}    

Here is an example implementor, an Hbm writer that handles ColumnMappings:

public class HbmColumnWriter : NullMappingModelVisitor, IHbmWriter<ColumnMapping>
{
    private HbmColumn _hbm;

    public object Write(ColumnMapping mappingModel)
    {
        _hbm = null;
        mappingModel.AcceptVisitor(this);
        return _hbm;
    }

    public override void ProcessColumn(ColumnMapping columnMapping)
    {
        _hbm = new HbmColumn();
        _hbm.name = columnMapping.Name;
        
        if(columnMapping.Attributes.IsSpecified(x => x.IsNotNullable))
        {
            _hbm.notnull = columnMapping.IsNotNullable;
            _hbm.notnullSpecified = true;
        }

        if (columnMapping.Attributes.IsSpecified(x => x.Length))
            _hbm.length = columnMapping.Length.ToString();
        
//etc
    }
}

You’ll notice that this class inherits from NullMappingModelVisitor, which is basically a blank implementation of IMappingModelVisitor. It has all the methods, but none of them do anything. So this visitor ONLY knows how to handle ColumnMappings – if it's passed to any other type of mapping, it will do nothing. This is certainly a different approach to the NamingConvention, which knew how to set the name for Classes, OneToManys, ManyToManys and many other mapping model elements. So why does this difference exist? Basically, the job of generating all the Hbm* that NHibernate requires is too big for one class. Creating separate classes for generating the appropriate Hbm* helps make the job more manageable. These classes can then be composed, like so:

public class HbmIdWriter : NullMappingModelVisitor, IHbmWriter<IdMapping>
{
    private readonly IHbmWriter<ColumnMapping> _columnWriter;
    private readonly IHbmWriter<IdGeneratorMapping> _generatorWriter;

    private HbmId _hbm;

    public HbmIdWriter(IHbmWriter<ColumnMapping> columnWriter, IHbmWriter<IdGeneratorMapping> generatorWriter)
    {
        _columnWriter = columnWriter;
        _generatorWriter = generatorWriter;
    }

    public object Write(IdMapping mappingModel)
    {
        _hbm = null; 
        mappingModel.AcceptVisitor(this);
        return _hbm;
    }

    public override void ProcessId(IdMapping idMapping)
    {
        _hbm = new HbmId();

        if(idMapping.Attributes.IsSpecified(x => x.Name))
            _hbm.name = idMapping.Name;
    }

    public override void Visit(ColumnMapping columnMapping)
    {
        var columnHbm = (HbmColumn) _columnWriter.Write(columnMapping);
        columnHbm.AddTo(ref _hbm.column);
    }

    public override void Visit(IdGeneratorMapping generatorMapping)
    {
        var generatorHbm = (HbmGenerator) _generatorWriter.Write(generatorMapping);
        _hbm.generator = generatorHbm;
    }
}

This Hbm writer handles IdMappings. IdMappings include one or more columns and a generator, so this writer composes an IHbmWriter&lt;ColumnMapping&gt; and an IHbmWriter&lt;IdGeneratorMapping&gt;. In this way, the task of creating the Hbm* representation can be managed by a family of visitors that delegate to each other as required.
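To give a feel for how this family fits together, the writers might be wired up something like the following – a sketch only; HbmColumnWriter and HbmIdWriter appear in this post, but HbmGeneratorWriter and this exact wiring are assumptions on my part:

```csharp
// Hypothetical composition of the writer family.
IHbmWriter<ColumnMapping> columnWriter = new HbmColumnWriter();
IHbmWriter<IdGeneratorMapping> generatorWriter = new HbmGeneratorWriter(); // assumed type
IHbmWriter<IdMapping> idWriter = new HbmIdWriter(columnWriter, generatorWriter);

// Writing an IdMapping now transparently delegates its columns and
// its generator to the composed writers.
var idHbm = (HbmId) idWriter.Write(someIdMapping);
```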

Finally, I can return to the earlier question of why all the AcceptVisitor implementations call visitor.Visit(child) rather than child.AcceptVisitor(visitor). The former gives the current visitor the chance to substitute a different visitor when the child is visited. You can see this happening above, in the override for Visit(ColumnMapping) – it asks the IHbmWriter&lt;ColumnMapping&gt; to write the column, and the implementation of that method calls columnMapping.AcceptVisitor(this), passing the column writer rather than the original visitor. Thus, the Fluent NHibernate semantic model has an implementation of the visitor pattern that supports both single visitors that visit the entire graph themselves, and families of visitors that collaborate to get a large job done.
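The two styles can be boiled down to a self-contained toy version of the pattern (this is illustrative code of my own, not Fluent NHibernate's actual classes):

```csharp
// Toy version of the dispatch pattern, showing the payoff of
// visitor.Visit(child): the current visitor can hand the child
// off to a completely different visitor.
using System;
using System.Collections.Generic;

interface IVisitor
{
    void ProcessNode(Node node);
    void Visit(Node child);
}

class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();

    public void AcceptVisitor(IVisitor visitor)
    {
        visitor.ProcessNode(this);
        foreach (var child in Children)
            visitor.Visit(child); // let the visitor choose who handles it
    }
}

// Walks the whole graph itself, like NamingConvention does.
class WholeGraphVisitor : IVisitor
{
    public readonly List<string> Seen = new List<string>();
    public void ProcessNode(Node node) { Seen.Add(node.Name); }
    public void Visit(Node child) { child.AcceptVisitor(this); }
}

// Hands each child to a dedicated visitor, like HbmIdWriter does.
class DelegatingVisitor : IVisitor
{
    private readonly IVisitor _childVisitor;
    public DelegatingVisitor(IVisitor childVisitor) { _childVisitor = childVisitor; }
    public void ProcessNode(Node node) { }
    public void Visit(Node child) { child.AcceptVisitor(_childVisitor); }
}
```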

Sunday 8 February 2009

Fluent NHibernate semantic model: AttributeStore<T>

My work to rewrite Fluent NHibernate using a semantic model continues at a reasonable pace. I will admit there is some unconventional stuff in the code base, and much of it deserves explanation. Today I would like to explain AttributeStore&lt;T&gt; – what it is for and how it works. Let's start by taking a look at the model class for the contents of a collection with OneToMany semantics:

public class OneToManyMapping : MappingBase, ICollectionContentsMapping
{
    private readonly AttributeStore<OneToManyMapping> _attributes;

    public OneToManyMapping()
    {
        _attributes = new AttributeStore<OneToManyMapping>();
        _attributes.SetDefault(x => x.ExceptionOnNotFound, true);
    }

    public AttributeStore<OneToManyMapping> Attributes
    {
        get { return _attributes; }
    }

    public string ClassName
    {
        get { return _attributes.Get(x => x.ClassName); }
        set { _attributes.Set(x => x.ClassName, value); }
    }

    public bool ExceptionOnNotFound
    {
        get { return _attributes.Get(x => x.ExceptionOnNotFound); }
        set { _attributes.Set(x => x.ExceptionOnNotFound, value); }
    }
}

As you can see, I am doing something unusual in some of the getters and setters. Basically, I am storing the values of certain properties in a dictionary, keyed on the name of the property used to access them. It's very easy to add new attributes; I use this ReSharper live template:

public $ReturnType$ $PropertyName$
{
    get { return $_attributes$.Get(x => x.$PropertyName$); }
    set { $_attributes$.Set(x => x.$PropertyName$, value); }
}

This approach allows me to write code that asks questions about properties beyond simply “what is the value?”. Here is an example from the NamingConvention class:

public override void ProcessOneToMany(OneToManyMapping oneToManyMapping)
{
    if (!oneToManyMapping.Attributes.IsSpecified(x => x.ClassName))
    {
        if (oneToManyMapping.ChildType == null)
            throw new ConventionException("Cannot apply the naming convention. No type specified.", oneToManyMapping);
        oneToManyMapping.ClassName = DetermineNameFromType(oneToManyMapping.ChildType);
    }
}

This convention walks the mapping model, naming mappings based on the assigned Type or MemberInfo. In this example, if the ClassName property for the OneToManyMapping hasn't been specified explicitly, then the NamingConvention uses the ChildType property to set the ClassName. If the user has set the ClassName themselves, I don't want the NamingConvention to overwrite it, so I ask the question "Is the ClassName property specified?". If the ClassName property was implemented as an auto-property, I would probably have to check whether it was null. But what if the property was a bool? Make it a nullable bool? Then I would have nasty if(blah.HasValue) code in various places. Yuck!

Here is another example, this time from the class that creates a HbmOneToMany instance based on my OneToManyMapping:

public override void ProcessOneToMany(OneToManyMapping oneToManyMapping)
{
    _hbmOneToMany = new HbmOneToMany();
    _hbmOneToMany.@class = oneToManyMapping.ClassName;

    if(oneToManyMapping.Attributes.IsSpecified(x => x.ExceptionOnNotFound))
    {
        _hbmOneToMany.SetNotFound(oneToManyMapping.ExceptionOnNotFound);
    }
}

This code always sets the @class field on the HbmOneToMany, but it won't always call SetNotFound. It only calls SetNotFound if the ExceptionOnNotFound property was specified. The point of this behaviour is to generate only the xml the user desires. It is not mandatory to set the not-found attribute on a one-to-many element, so why write it if the user hasn't specified it?

As well as being able to ask questions about the properties, I also wanted a convenient way to copy them. The next code sample is the code for OneToManyPart. This class is part of the fluent interface for Fluent NHibernate. It builds up information on the collection being mapped, and builds the appropriate collection when ResolveCollectionMapping() is called (obviously IsInverse is the only value copied at the moment, but that will change as the supported functionality grows):

public class OneToManyPart<PARENT, CHILD> : IDeferredCollectionMapping
{
    private readonly PropertyInfo _info;
    private readonly AttributeStore<ICollectionMapping> _attributes;

    private Func<ICollectionMapping> _collectionBuilder;

    public OneToManyPart(PropertyInfo info)
    {
        _info = info;
        _attributes = new AttributeStore<ICollectionMapping>();
        AsBag();   
    }

    public OneToManyPart<PARENT, CHILD> AsBag()
    {
        _collectionBuilder = () => new BagMapping();
        return this;
    }

    public OneToManyPart<PARENT, CHILD> AsSet()
    {
        _collectionBuilder = () => new SetMapping();
        return this;
    }

    public OneToManyPart<PARENT, CHILD> IsInverse()
    {
        _attributes.Set(x => x.IsInverse, true);
        return this;
    }

    ICollectionMapping IDeferredCollectionMapping.ResolveCollectionMapping()
    {
        var collection = _collectionBuilder();       
        _attributes.CopyTo(collection.Attributes);

        collection.PropertyInfo = _info;            
        collection.Key = new KeyMapping();
        collection.Contents = new OneToManyMapping {ChildType = typeof (CHILD)};

        return collection;
    }

}

The relevant lines are at the beginning of ResolveCollectionMapping(). Once the collection instance is created, the attributes collected in the _attributes field are copied to the AttributeStore for the new collection instance.

Well, that is probably enough examples of why I am using this pattern. Now I want to run through the implementation. Let's start with AttributeStore&lt;T&gt;:

public class AttributeStore<T>
{
    private readonly AttributeStore _store;

    public AttributeStore()
        : this(new AttributeStore())
    {

    }

    public AttributeStore(AttributeStore store)
    {
        _store = store;
    }

    public U Get<U>(Expression<Func<T, U>> exp)
    {
        return (U)(_store[GetKey(exp)] ?? default(U));
    }

    public void Set<U>(Expression<Func<T, U>> exp, U value)
    {
        _store[GetKey(exp)] = value;
    }

    public void SetDefault<U>(Expression<Func<T, U>> exp, U value)
    {
        _store.SetDefault(GetKey(exp), value);
    }   

    public bool IsSpecified<U>(Expression<Func<T, U>> exp)
    {
        return _store.IsSpecified(GetKey(exp));
    }

    public void CopyTo(AttributeStore<T> target)
    {
        _store.CopyTo(target._store);
    }

    private string GetKey<U>(Expression<Func<T, U>> exp)
    {
        PropertyInfo info = ReflectionHelper.GetProperty(exp);
        return info.Name;
    }
}

As you can see, AttributeStore<T> is a generic wrapper for a non-generic class called AttributeStore. The purpose of AttributeStore<T> is to expose get and set methods that take a lambda, and convert that lambda into a dictionary key, and then delegate to an inner attribute store using that dictionary key. Finally, here is the code for the non-generic attribute store:

public class AttributeStore
{
    private readonly IDictionary<string, object> _attributes;
    private readonly IDictionary<string, object> _defaults;

    public AttributeStore()
    {
        _attributes = new Dictionary<string, object>();
        _defaults = new Dictionary<string, object>();
    }

    public object this[string key]
    {
        get
        {
            if (_attributes.ContainsKey(key))
                return _attributes[key];
            
            if (_defaults.ContainsKey(key))
                return _defaults[key];

            return null;
        }
        set { _attributes[key] = value; }
    }

    public bool IsSpecified(string key)
    {
        return _attributes.ContainsKey(key);
    }

    public void CopyTo(AttributeStore store)
    {
        foreach (KeyValuePair<string, object> pair in _attributes)
            store._attributes[pair.Key] = pair.Value;
    }

    public void SetDefault(string key, object value)
    {
        _defaults[key] = value;
    }
}

AttributeStore is just a wrapper for a couple of dictionaries, one for the values that have been specified, and one for the default values. That’s pretty much all there is to it.
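To tie the two classes together, here is roughly how they behave in use – a sketch assuming the classes above, an invented PersonMapping model type, and the ReflectionHelper that pulls the PropertyInfo out of the lambda:

```csharp
// Hypothetical usage of AttributeStore<T>; PersonMapping is an
// invented model type for illustration.
class PersonMapping
{
    public string Name { get; set; }
    public bool Lazy { get; set; }
}

var store = new AttributeStore<PersonMapping>();
store.SetDefault(x => x.Lazy, true);

store.IsSpecified(x => x.Lazy);  // false – only a default exists
store.Get(x => x.Lazy);          // true – the default value comes back

store.Set(x => x.Name, "Person");
store.IsSpecified(x => x.Name);  // true – explicitly set
```

This is exactly the distinction the NamingConvention and the Hbm writers rely on: a default or unset value can be read normally, but only an explicit Set makes IsSpecified return true.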

I see AttributeStore&lt;T&gt; as a superior alternative to the weakly typed bag of attributes that Fluent NHibernate currently uses. There are no magic strings, and it's all strongly typed. It's more powerful than just using properties with backing fields, and it requires pretty much the same amount of code. Sure, it's much slower, but performance is not really a concern for Fluent NHibernate. I can see myself using this pattern on other projects.

Monday 2 February 2009

I’m starting to Git it

For the last week I have been experimenting with Git. Git is a relatively new source code management tool, set apart from the SCMs you are likely to be familiar with by being distributed rather than centralised. I'm not going to sit here and harp on about all the reasons why this approach has merit when there are plenty of resources available that will do a much better job.

Nor am I going to teach you the fundamentals of how Git works and how to use it, again for similar reasons.

Today I just want to talk about what I had to do to start using Git effectively for my work on Fluent NHibernate (which is hosted in Subversion on Google Code). Basically, I wanted a local repository for Fluent NH. I wanted to be able to commit smaller chunks of work to the local repository and then commit to SVN once I was happy with them. I also wanted to be able to branch, roll back, view history, etc., all without touching the SVN repository. Git gave me these capabilities.

Here are the steps I followed to get up and running with Git and a Google Code hosted project:

  1. Installed msysgit. I installed version 1.5.5 because I heard that later versions had problems with git-svn (the tool for bridging Git and Subversion). I have no idea if this is actually the case. When installing, I chose the option that added Git to my path environment variable. This allowed me to use Git from the Windows command line, but there was a hitch: every time I ran git from the Windows command line, it would close the command line. I followed the fix here to solve it.
  2. Configured Git. I set up an SSH key and my details, as described here. I also set the autocrlf option to false – this seems to minimise problems with line endings. To do so I used the command:

    git config --global core.autocrlf false
     
  3. Used git-svn to create a local clone. Initially I ran into some difficulty because I pulled the repository down using http – this caused problems later when I tried to commit and could not authenticate properly. Using https when cloning the repository fixed the problem. Another issue to be aware of when using git-svn is that, unlike the rest of Git, git-svn is very slow. I had to leave git-svn working away for over 20 minutes when I first pulled down my branch of Fluent NHibernate. Later I learned that you can tell git-svn to clone from a particular revision rather than pull the entire history, so I used this when I pulled the second time (using https instead of http). The command I used was:

    git svn clone https://fluent-nhibernate.googlecode.com/svn/branches/pb-rewrite --username "paul.batum" -r234

    My Google Code username is "paul.batum". I was prompted for my password the first time, but Git seemed to remember it from then on. The -r234 switch specified a particular revision to start from. If I was prepared to wait, I would have omitted the flag and pulled the entire history into my local repository, but patience got the better of me.
  4. Added a .gitignore file to specify which types of files should not be tracked by Git. I grabbed the .gitignore file from a .NET project hosted on GitHub called Machine. I then made a few modifications, copied it into my new repository and checked it in:

    git add .gitignore
    git commit -m "Added ignore file"


    You can view my ignore file here.
  5. Worked with the local repository. My git repository was ready to use. I worked on the Fluent NH code for a while, using the git gui (run simply by using the git gui command) to make small commits.
  6. Rebased my local repository against the SVN repository. Before I commit back to the SVN repository, I need to merge in any changes that have been made since I cloned it. Git offers merge functionality, but it also offers something slightly different called a rebase. In a nutshell, the difference is that a merge will interleave your changes with everyone else's changes, while a rebase will take each of your changes and apply them sequentially after everyone else's changes. You can read a better explanation here. The rebase command was straightforward:

    git svn rebase
     
  7. Checked the state of my repository. I wanted to see what my repository looked like after the rebase. A commit viewer called gitk is installed as part of msysgit; to run it with all branches displayed, the command is:

    gitk --all
     
  8. Committed my changes to the SVN repository. With the rebase complete, my local commits were ready to be pushed to the main repository on Google Code. To commit my changes back to SVN, I ran the following command:

    git svn dcommit --username paul.batum

    The dcommit command pushes my commits to the SVN repository one at a time. This means that the rich history created locally by my small commits is preserved in the SVN history. There are options for "squashing" all of the local changes so that they are represented by a single commit, but I am yet to explore those.
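
The whole round trip from the steps above condenses to a handful of commands (the repository URL, username and revision are from my case; substitute your own):

```shell
# One-time setup: clone a Subversion branch, starting at a given revision
git svn clone https://fluent-nhibernate.googlecode.com/svn/branches/pb-rewrite --username "paul.batum" -r234

# Day-to-day: work locally, committing small chunks as you go
git add somefile.cs
git commit -m "small, focused change"

# When ready to publish: replay your local commits on top of the latest
# SVN changes, then push them to SVN one at a time
git svn rebase
git svn dcommit --username paul.batum
```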

I am still a total Git newbie, but hopefully this guide will save some hair-pulling for other Windows developers who decide to learn Git by using it locally for their SVN-hosted projects.