Saturday, October 4, 2008

RXRDG Part III - (Re)visiting the tree

This is the third part in the Regular Expression Random Data Generator series. This installment builds on and some knowledge of the previous two is required.

It's time to output our syntax tree. The main objective here will be too traverse the tree and output the type and other information on the nodes as we go.


static void Main(string[] args)
{
string expression = "ab(cd)*";

Parser parser = new Parser();
INode node = parser.Parse(expression);

Console.WriteLine(Traverse(node));
}

static string Traverse(INode node)
{
string result = string.Empty;
result += string.Format("<{0}>{1}", node.Token.TokenType, Environment.NewLine);
foreach (INode childNode in node.ChildNodes)
{
result += Traverse(childNode);
}
result += string.Format("{1}", node.Token.TokenType, Environment.NewLine);

return result;
}

This certainly does the job but there are a few problems with this approach:

  • It isn't very efficient - it uses recursion and string concatenation
  • It isn't very extensible - what happens when we change the type of output we want?
  • It isn't very helpful - to get better information out of a node we'll need to add a switch statement and test for the type of INode

A more interesting way of getting the output is using a visitor design pattern. The basic idea behind the visitor pattern is a mechanism called double dispatch. Lets define an interface for the visitor class.


public interface IVisitor
{
void Visit(LiteralNode node);
void Visit(RepetitionNode node);
void Visit(ConcatenationNode node);
void Visit(ParenthesisNode node);
}

The visitor will contain all the necessary logic to create the output we desire. Each Visit method takes a type of node so extracting information from a node within a method will be easy. The tricky part is to call the correct method with the correct parameter. Unfortunately C# does not support multimethods so there are only two ways of doing that. One is to write a switch statement to identify a type of node and call the appropriate method. The other is to let the nodes do it. First we define the caller interface.


public interface IVisitable
{
void Accept(IVisitor visitor);
}

Next we change the INode interface to inherit the IVisitable interface and extend the NodeBase class by adding an Accept method as required.


public abstract class NodeBase : INode
{
public abstract void Accept(IVisitor visitor);

//omited...
}

And finally we need to extend each node class with the concrete implementation of the Accept method.


public override void Accept(IVisitor visitor)
{
visitor.Visit(this);
}

So every node is capable of calling the correct Visitor method passing itself as a parameter. You may be wondering why we are violating the DRY principle here. Couldn't we simply implement the Accept method in the base class? We could, but the base class would simply call the Accept(NodeBase node) method and not the concrete method we need!

The disadvantages of this pattern are:

  • The code is less clear if the design pattern is not known.
  • The list of derived visitable classes must be known in advance
  • Boilerpate code is added to every derived visitable class

There are ways to mitigate these limitations (using generic visitors or dynamic casting) but that's out of scope for this series. In our case the complete list of RE tokens is known and not likely to extend. What visitor allows us to extend is the ways in which to utilize strong typed information we recieve from individual nodes. Lets write our XmlVisitor! First we define the basics.


public class XmlVisitor : IVisitor
{
readonly string LR = Environment.NewLine;
readonly static int TabSpaces = 3;

private int _level;
private StringBuilder _builder = new StringBuilder();

private static string GetTab(int tabLevel)
{
string tab = string.Empty;
return tab.PadLeft(tabLevel * TabSpaces, ' ');
}
}

The StringBuilder will help with the efficiency of building a string representation of the tree while GetTab method will take care of proper indentation. Next we fill in the interface members.


public void Visit(LiteralNode node)
{
string tab = GetTab(_level);
_builder.AppendFormat("{0}<{1}>{2}{3}", tab, "literal", node.Token.Character, LR);
}

public void Visit(RepetitionNode node)
{
string tab = GetTab(_level);
_builder.AppendFormat("{0}<{1} min='{2}' max='{3}'>{4}", tab, "repeatition", node.Token.MinOccurs, node.Token.MaxOccurs, LR);
_level++;
foreach (INode childNode in node.ChildNodes)
{
childNode.Accept(this);
}
_level--;
_builder.AppendFormat("{0}{2}", tab, "repeatition", LR);
}

//rest omited...

Using our visitor is still quite cumbersome as we need to create an instance of the visitor, get the node to accept the visitor and then have a method return the StringBuilder value. We can wrap that functionality in a single static method. This means that initializing the visitor will not be needed so we can declare its default constructor private.


private XmlVisitor()
{ }

public static string Visit(INode node)
{
XmlVisitor visitor = new XmlVisitor();
node.Accept(visitor);
return visitor._builder.ToString();
}

We can remove the cumbersome Traverse method and test our shiny new visitor!


Console.WriteLine(XmlVisitor.Visit(node));
//Console.WriteLine(Traverse(node));


Related posts:

No comments: