Piglet 1.4 released

Another release of Piglet! It’s been some time since the last release, and here are the major new features. Get it on NuGet or fork away on Github.

Performance

Parser construction and especially lexer construction speeds are vastly improved. The main gain is in lexing speed, especially when you are handling large and overlapping character sets.

Usability

Constants in rules are now prioritized over expressions. When you write constants in rules, they receive higher priority than previously declared expressions. This reduces the number of expressions the user needs to define, and it is usually the right thing to do anyway.

Unicode support is improved. \w and \d now handle the same ranges as standard Microsoft regular expressions.

Thread safety

Both the lexer and the parser are now thread safe. The lexer interface is slightly changed in order to accommodate this. Lexers now include the Tokenize method, which is a much nicer way to read token streams than the previous for loop.

Piglet also has a new icon on NuGet!

Bug fixes

A few bug fixes have also been sneaked in.

For full details, check the GitHub list of issues for release 1.4. Please report any and all problems or issues on GitHub, or why not ask a question on Stack Overflow? Make sure to tag it with piglet so that someone spots it.

Building a DSL with Piglet – Extending our DSL

In the last part we created a very basic DSL for use with turtle graphics. Today we’re going to extend the language with more capabilities and learn more about how to use Piglet, and incidentally a bit about compiler construction (nothing scary, I promise).

This is the DSL we ended up with last time:

pendown
move 50
rotate 90
move 50
rotate 90
move 50
rotate 90
move 50
penup
move -10
rotate 90
move 10
pendown
move 30
rotate 90
move 30
rotate 90
move 30
rotate 90
move 30

One obvious thing here is that it would be really nice if we could have variables. We could represent the constants with something else and perform calculations on them. An important thing to consider is the syntax; for a simple DSL we want something that is easy to parse. For this reason we’ll use a var keyword for variable declarations and require all variables to start with a $. Let’s modify the existing program to include our new language features:

var $size = 50
var $innersize = 30
pendown
move $size
rotate 90
move $size
rotate 90
move $size
rotate 90
move $size
penup
move -10
rotate 90
move 10
pendown
move $innersize
rotate 90
move $innersize
rotate 90
move $innersize
rotate 90
move $innersize

Looking at the new DSL, we note that there is now a new legal statement to make – the var declaration. So we need to extend statement with another rule. The statement list is already pretty long, so we separate it out into another rule:

// Runtime stuff
var variables = new Dictionary<string, int>();

var variableIdentifier = configurator.Expression();
variableIdentifier.ThatMatches("$[A-Za-z0-9]+").AndReturns(f => f.Substring(1));

variableDeclaration.IsMadeUp.By("var")
    .Followed.By(variableIdentifier).As("Name")
    .Followed.By("=")
    .Followed.By<int>().As("InitialValue").WhenFound(f =>
    {
        variables.Add(f.Name, f.InitialValue);
        return null;
    });

Since we want to match something that isn’t a single constant string, we need to make an expression. Expressions are made using the configurator.Expression() method. You give it a regex and get back an object. You also need to supply a function that returns a value for the expression. What we want is the variable name without the leading $.

To keep track of what variables we have and what values they have, we add them to a dictionary of strings and integers which we then can reference.

We add this new rule to our choices for statementList by adding this to the end:

.Or.By(variableDeclaration)

Note that since our rule has just a single component, we do not need to add a WhenFound method. It will automatically return the value of that single part. Not that we’re using the returned values yet, but we will.

Continuing on, we want to be able to use the variables in the rest of the commands. In order to do that we need to make another rule; let’s call it expression. For now, a valid expression is either a constant int value or a variable name. We then use the new rule instead of plain ints for the move and rotate commands. Here’s the full listing for the language we have set out to support so far:

// Runtime stuff
var turtle = new Turtle(canvas1);
var variables = new Dictionary<string, int>();

// Parser configurator
var configurator = ParserFactory.Fluent();
var statementList = configurator.Rule();
var statement = configurator.Rule();
var variableDeclaration = configurator.Rule();
var expression = configurator.Rule();

var variableIdentifier = configurator.Expression();
variableIdentifier.ThatMatches("$[A-Za-z0-9]+").AndReturns(f => f.Substring(1));

variableDeclaration.IsMadeUp.By("var")
    .Followed.By(variableIdentifier).As("Name")
    .Followed.By("=")
    .Followed.By<int>().As("InitialValue").WhenFound(f =>
    {
        variables.Add(f.Name, f.InitialValue);
        return null;
    });

expression.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable]);

statementList.IsMadeUp.ByListOf(statement);
statement.IsMadeUp.By("pendown").WhenFound(f =>
{
    turtle.PenDown = true;
    return null;
})
    .Or.By("penup").WhenFound(f =>
    {
        turtle.PenDown = false;
        return null;
    })
    .Or.By("move").Followed.By(expression).As("Distance").WhenFound(f =>
    {
        turtle.Move(f.Distance);
        return null;
    })
    .Or.By("rotate").Followed.By(expression).As("Angle").WhenFound(f =>
    {
        turtle.Rotate(f.Angle);
        return null;
    })
    .Or.By(variableDeclaration);

This opens up a few interesting possibilities. It’s perfectly acceptable to replace the plain int in the variable declaration with the expression rule. In fact, it’s a pretty good idea. This enables us to write things like this:

var $foo = 42
var $bar = $foo
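
Concretely, the modified declaration rule could look something like the sketch below, which simply swaps the By<int>() for the expression rule from the listing above:

variableDeclaration.IsMadeUp.By("var")
    .Followed.By(variableIdentifier).As("Name")
    .Followed.By("=")
    .Followed.By(expression).As("InitialValue").WhenFound(f =>
    {
        // The initial value is now any expression, not just a literal int
        variables.Add(f.Name, f.InitialValue);
        return null;
    });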

What we really want however is to be able to write this:

var $size = 50
var $spacing = 10
var $innersize = $size - $spacing * 2

We’re halfway there already. What is needed is to create some more rules for how an expression can be made. We start simple and allow only a single addition. Separate what is today an expression into a new rule called term, and make a new rule called additionExpression.

expression.IsMadeUp.By(additionExpression);

additionExpression.IsMadeUp.By(term).As("First").Followed.By("+").Followed.By(term).As("Second").WhenFound(f => f.First + f.Second)
    .Or.By(term);
term.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable]);

So, an addition expression is either an addition, or a single lone value. This works fine as long as you don’t try something like var $a = $b + $c + 100. That is easily fixable by changing the first term into an additionExpression. We are wiring the rule to itself! And like magic we can have as many additions in a row as we’d like. It’s trivial to add a subtraction expression as well: duplicate the rule, and change the operator and the WhenFound function to subtract instead of add (renaming the rule to addSub to reflect the new functionality).

addSub.IsMadeUp.By(additionExpression).As("First").Followed.By("+").Followed.By(term).As("Second").WhenFound(f => f.First + f.Second)
                    .Or.By(additionExpression).As("First").Followed.By("-").Followed.By(term).As("Second").WhenFound(f => f.First - f.Second)
                    .Or.By(term);

However, if we add multiplication and division here things will start to get strange. Everything will be evaluated left to right, so when writing var $a = 2 + 3 * 10, $a will have the value 50 instead of the expected 32. In order to solve this we need to redefine term. A term will now be a multiplication or division expression, or a single factor such as a constant. We also wire this rule to itself like the addition rule, so we can do many multiplications in a row. The expression grammar now looks like this:

expression.IsMadeUp.By(addSub);

addSub.IsMadeUp.By(addSub).As("First").Followed.By("+").Followed.By(mulDiv).As("Second").WhenFound(f => f.First + f.Second)
    .Or.By(addSub).As("First").Followed.By("-").Followed.By(mulDiv).As("Second").WhenFound(f => f.First - f.Second)
    .Or.By(mulDiv);

mulDiv.IsMadeUp.By(mulDiv).As("First").Followed.By("*").Followed.By(factor).As("Second").WhenFound(f => f.First * f.Second)
    .Or.By(mulDiv).As("First").Followed.By("/").Followed.By(factor).As("Second").WhenFound(f => f.First / f.Second)
    .Or.By(factor);

factor.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable]);

As the final thing for today, let’s add support for parentheses. They go in the factor rule. So, a factor can be an entire expression wrapped in parentheses. It’s almost a thing of magic. We wire the entire last rule back up to the start.

factor.IsMadeUp.By<int>()
    .Or.By(variableIdentifier).As("Variable").WhenFound(f => variables[f.Variable])
    .Or.By("(").Followed.By(expression).As("Expression").Followed.By(")").WhenFound(f => f.Expression);

This is where we’ll end for this part. We have a fully functional expression parser. You can use expressions for all the commands that previously only took integer values, and you can assign values to variables. A few things for the adventurous to try:

  • Add variable assignment. Make set $foo = 1+3*$bar a legal, working piece of code
  • Add support for the modulus operator
  • Add support for unary minus. Make var $foo = -$bar work. This is a bit tricky, and don’t be afraid if you get some scary exceptions from Piglet

Next time, we’ll make the DSL a bit more fun. We’ll add some basic flow control! The code on GitHub is updated with the latest version.


Using Piglet to create a DSL – Setting the stage for turtles

Piglet has quite a few uses, and a case where it shines in particular is in the area of domain specific languages. So, in order to complement the admittedly scarce tutorials on how to actually use Piglet, I intend to write a short little series on how to make a domain specific language using Piglet. Let’s get straight to it.

Enter the Turtle

I’m going to go for a classic educational DSL here: turtle graphics. Because everyone likes turtles, right? Basically, it’s a way of drawing graphics that can be imagined as if you had a (very obedient) turtle with a paintbrush attached to the back of its shell. Masterfully trained, it responds to three commands:

  • Put the brush up or down
  • Move forward or backward a certain number of steps
  • Rotate a certain number of degrees

I have made a little WPF application that implements this behaviour. Eschewing all the WPF junk, here’s the turtle itself:

    public class Turtle
    {
        private readonly Canvas canvas;
        private double angle;
        private double x;
        private double y;

        public Turtle(Canvas canvas)
        {
            this.canvas = canvas;
            x = canvas.ActualWidth/2;
            y = canvas.ActualHeight/2;
        }

        public bool PenDown { get; set; }

        public void Move(double distance)
        {
            var oldX = x;
            var oldY = y;

            double rads = angle/360.0*Math.PI*2.0;
            x += Math.Sin(rads)*distance;
            y += Math.Cos(rads)*distance;

            if (PenDown)
            {
                canvas.Children.Add(new Line { X1 = oldX, Y1 = oldY, X2 = x, Y2 = y, Stroke = new SolidColorBrush(Color.FromRgb(0, 0, 0))});
            }
        }

        public void Rotate(float a)
        {
            angle += a;
        }
    }

Nothing fancy, unless you’ve forgotten all about basic trigonometry. It takes a canvas and when moving it leaves trails if the pen is set to down. Using the turtle is straightforward. This program draws a twenty-sided polygon:

var turtle = new Turtle(canvas);          
turtle.Rotate(90);

turtle.PenDown = true;

for (int i = 0; i < 20; ++i)
{
   turtle.Rotate(360/20);
   turtle.Move(20);
}

It’s fun drawing with the turtle, but cumbersome, since you’d have to recompile every time you want to change the shape. What we really want is to describe the drawing in a structured way using text, sort of like a mini programming language. LOGO would be the classic choice for turtle programming, but I feel it would be more fun to make our own little language for this.

Turtleese

When making a DSL or full blown language, in any tool, it’s usually easiest to start small and work your way up. To get the ball rolling, or turtle walking, let’s make a language that contains only the basic commands understood by the turtle. A sample piece of this rudimentary turtleese looks like this:

pendown
move 50
rotate 90
move 50
rotate 90
move 50
rotate 90
move 50
penup
move -10
rotate 90
move 10
pendown
move 30
rotate 90
move 30
rotate 90
move 30
rotate 90
move 30

This program should make the turtle draw two concentric squares, 10 units apart. It’s really verbose, but it’s a start that we can improve on. If we look at the structure of the program, we can see that it is made up of a list of statements. This is the key to formulating a grammar for parsing the language. Piglet gives us two options for configuring a parser: a fluent and a more technical interface. I’ll use the fluent interface for this series, and maybe at the end provide an equivalent grammar in the technical interface. It will serve to deepen the understanding of how grammars actually work, though you should have a pretty good idea by the time we get there.

Here’s the entire thing:

var turtle = new Turtle(canvas1);

var configurator = ParserFactory.Fluent();
var statementList = configurator.Rule();
var statement = configurator.Rule();

statementList.IsMadeUp.ByListOf(statement);
statement.IsMadeUp.By("pendown").WhenFound(f =>
    {
        turtle.PenDown = true;
        return null;
    })
    .Or.By("penup").WhenFound(f =>
    {
        turtle.PenDown = false;
        return null;
    })
    .Or.By("move").Followed.By<int>().As("Distance").WhenFound(f =>
    {
        turtle.Move(f.Distance);
        return null;
    })
    .Or.By("rotate").Followed.By<int>().As("Angle").WhenFound(f =>
    {
        turtle.Rotate(f.Angle);
        return null;
    });
var parser = configurator.CreateParser();
parser.Parse(code.Text);

Going through it line by line: first we make a configurator object. This is the object that we’ll use to make the parser. Then we make two rules, statementList and statement. One thing of huge importance here: the order in which these are declared. If you declared them in the wrong order, the parser would just try to find a single statement. A parser constructed by Piglet must be able to reduce everything you give it down to a single rule; for now, that single rule is the statement list. For a programming language it’s usually something like a “program” or “translation unit” rule.

Moving on, we declare that a statementList is, unsurprisingly, made up of a list of statement. This is a reference to another rule.

A statement is declared as one of the four commands that the turtle understands. Following each possibility there is a lambda that gets executed when the parser has recognized that the rule is to be applied. For now we simply make the turtle do what the command says. We need to return something from each rule function; we ignore that for now and return null. Later on we’ll revisit this and find out why you want to return things from rules.

The move and rotate commands are interesting, since they have a parameter which is an integer value. We’ll need to find out the value of this parameter. In Piglet, this requires you to give the parameter a name; this name then becomes accessible on the dynamic object that is the input to the WhenFound function. Integers are simple types, so Piglet has a built-in function for recognizing them.

Calling parser.Parse with the code causes the turtle to do what you wanted it to. The program also gives some helpful hints for when you’ve confused our poor reptilian friend; you get this functionality for free when using Piglet.

The full source code for this tutorial is found on GitHub: TurtlePig. Feel free to fork it and mess about with it. The same repo will be updated as this tutorial continues, so you might find a slightly different version there. I’ll figure out a way to keep all versions accessible.

Next time, we’ll take a look at making this language a lot more capable – step by step.


There’s something about Haskell

This is sort of a follow-up to my previous post about my adventure with Haskell. In retrospect, I’m happy I wrote my experience down, since time would have blurred the details by now. I have not really used Haskell itself for anything since my last effort with the language, but things have stuck with me in a way I hadn’t initially thought possible. All this talk about learning functional programming to make you a better programmer: I thought it was, if not untrue, then at least greatly exaggerated.

In general for C#, my main day-to-day language:

  • Making higher order functions, a lot. I wish the syntax for this was a whole lot better in C#
  • Annoyance with the poor type inference in C#
  • A strong desire to use the Y combinator just because I know how it works
  • A wish that every function call would be curried by default. Just a call with too few parameters could give you a Func back
  • Some way to mark a member function as pure that would allow me to verify that I’m not altering state anywhere

Either way, Haskell has continued to stay attractive. Perhaps because it brings a real mental challenge to problems that would be mundane to solve in an imperative language. It’s the puzzle aspect that drives the attraction. The great mystery of using a language whose paradigm still presents challenges to me is a constant lure. A language that won’t allow me to cheat and mess up its paradigm when I am too dimwitted to understand how to solve the problem in the right way.

I’ve started to watch the haskell tag on Stack Overflow, and respond whenever I think I know the answer. I’m always inordinately proud of any upvotes, or even of getting to the right answer, simply because it feels like an accomplishment rather than some piece of research that the poster usually could have done himself with a few minutes of googling.

I try to keep programming in my spare time to keep broadening my horizons, and I really need to make something real in Haskell. Not just some silly little piece of demo software, but an actual creation that does something. Often when I start at-home projects the enthusiasm dries up after the concept is proven, since the rest is all boilerplate in a technique I’m already familiar with. Writing a piece of real software in Haskell would make that boilerplate interesting and challenging in itself, I imagine.

Now to find that piece of software that none or few have written before me, that might have real users in the end and is within my grasp to actually complete…


Using IDisposable with Autofac

Using IDisposable with Autofac presents a few challenges, and requires you to really think about lifetime scopes to avoid memory leaks. These days, with the pretty excellent garbage collection built into .NET, the age-old task of hunting memory leaks is usually something you need not worry about anymore. However, an IDisposable is still a contract that requires a manual release. And if you’re ever resolving anything that implements IDisposable using Autofac, you’re going to run into problems.

Take a look at this code. What does it do?

using System;
using Autofac;

namespace AutoFacDisposableDemo
{
    public interface IResource : IDisposable 
    {
    }

    public class MyResource : IResource
    {
        private byte[] _lotsOfMemory;

        public MyResource()
        {
            _lotsOfMemory = new byte[1000000];
        }

        public void Dispose()
        {
            _lotsOfMemory = null;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var builder = new ContainerBuilder();
            builder.RegisterType<MyResource>().As<IResource>();
            var container = builder.Build();

            while (true)
            {
                using (container.Resolve<IResource>())
                {
                }
            }
        }
    }
}

At a first glance, it seems to do nothing but feed the garbage collector. But take a look at the memory usage: it will be constantly increasing, and you will run out of memory! If you remove the IDisposable interface from IResource, this code will stop running out of memory. So why does this happen?

Autofac manages your IDisposables

Yes, Autofac tries to be smart and will actually keep a reference to the object internally whenever you resolve a component that implements IDisposable. This is because Autofac doesn’t know what other objects might be referencing your resource, and you haven’t told it anything about when it is supposed to go out of scope. This matters especially when Autofac is wired up to create non-transient instances, where many consumers could be using your disposable object and only the last usage should dispose of it.

This happens transparently, and because you’ve done what is normally the right thing and called Dispose on it, you have released the expensive resources, leaving only a small skeleton object floating around that will never be garbage collected. This is scary because the memory leak isn’t huge and obvious like forgetting to dispose of sockets, which shows up pretty quickly. If you run this through a memory profiler, the object will be held by some internal IDisposable stack that is rooted in a closure somewhere deep down.

It really is not a solution to work around this by removing the IDisposable from the interface, since that will cause all sorts of problems down the road, breaking the semantics in the process. Instead, what you need to do is use lifetime scopes. If you change the main loop to this, it will run without a leak:

while (true)
{
    using (var scope = container.BeginLifetimeScope())
    {
       var myResource = scope.Resolve<IResource>();
    }
}

Note that we are resolving from the opened scope, and disposing the scope instead of the allocated resources. This is all fine but a bit simplistic. What happens if we are using factory functions instead?

public interface IResourceFactory
{
    IResource GetResource();
}

public class MyResourceFactory : IResourceFactory
{
    private readonly Func<IResource> _newResource;

    public MyResourceFactory(Func<IResource> newResource)
    {
        _newResource = newResource;
    }

    public IResource GetResource()
    {
        return _newResource();
    }
}

class Program
{
    static void Main(string[] args)
    {
        var builder = new ContainerBuilder();
        builder.RegisterType<MyResource>().As<IResource>();
        builder.RegisterType<MyResourceFactory>().As<IResourceFactory>();
        var container = builder.Build();

        var factory = container.Resolve<IResourceFactory>();

        while (true)
        {
            using (var scope = container.BeginLifetimeScope())
            {
                var myResource = factory.GetResource();
            }
        }
    }
}

Out of memory again. Autofac will give you a Func that, in a sense, does the new for you. But that Func is dynamically created to make objects that have the same lifetime scope as the factory object, not the lifetime scope that you called it in! This makes perfect sense in a way, since you can have multiple lifetime scopes going on at the same time, even nested ones, because you can create a lifetime scope from another lifetime scope. Changing it to this will eliminate the problem:

class Program
{
    static void Main(string[] args)
    {
        var builder = new ContainerBuilder();
        builder.RegisterType<MyResource>().As<IResource>();
        builder.RegisterType<MyResourceFactory>().As<IResourceFactory>();
        var container = builder.Build();

        while (true)
        {
            using (var scope = container.BeginLifetimeScope())
            {
                var factory = scope.Resolve<IResourceFactory>();
                var myResource = factory.GetResource();
            }
        }
    }
}

Singletons

Ah, the global variables of the 21st century. What happens if you make the resource factory into a singleton?

var builder = new ContainerBuilder();
builder.RegisterType<MyResource>().As<IResource>();
builder.RegisterType<MyResourceFactory>().As<IResourceFactory>().SingleInstance();
var container = builder.Build();

Bam. Out of memory again! Singletons might be better than globals, but they’re still not a very good idea. The Func will be bound to the top lifetime scope, and all the IDisposables that get created are also bound to that scope, regardless of how many times you call Dispose on them. A better idea is to use InstancePerLifetimeScope instead. This removes the problem, but also causes the factory to be instantiated several times. Singletons are generally a bad idea, since you can’t be sure who is going to add a dependency on an IDisposable and cause a memory or resource leak.
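
For reference, here is a sketch of that registration, reusing the types from the examples above:

var builder = new ContainerBuilder();
builder.RegisterType<MyResource>().As<IResource>();

// One factory per lifetime scope: the Func<IResource> injected into it
// creates instances owned by that same scope, so disposing the scope
// cleans everything up.
builder.RegisterType<MyResourceFactory>().As<IResourceFactory>().InstancePerLifetimeScope();
var container = builder.Build();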

More options

There is an Owned<T> class that you can resolve. So, if you resolve an Owned<IResource> instead, you are required to release the resource yourself, and Autofac makes no effort to keep the reference in memory any more. Just make sure you call Dispose on the Owned object instead of on the inner IResource.
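
A minimal sketch of that approach, assuming the registrations from earlier (Owned<T> lives in Autofac.Features.OwnedInstances):

using (var ownedResource = container.Resolve<Owned<IResource>>())
{
    // The wrapper hands ownership to us; Autofac no longer tracks the instance
    var resource = ownedResource.Value;
    // ... use the resource ...
}   // Disposing the Owned<T> wrapper disposes the wrapped IResource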

Creating lifetime scopes

You don’t want to pass your scopes around, so you can get the lifetime scope injected for you by taking a dependency on ILifetimeScope. If you do, the lifetime scope from which the object was resolved will be passed to the constructor, and from it you can create child lifetime scopes or resolve objects in the current scope. It leaves a bit of a bad taste, since it has all the trademarks of passing your container around: the scope will allow you to create an instance of any type without declaring an explicit dependency. A better solution is to avoid this and rely on auto-generated Funcs to create objects in the correct lifetime scope.
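
A sketch of what such a dependency could look like; the consumer class name here is made up for illustration, and the same using Autofac directive as in the first listing is assumed:

public class ResourceConsumer
{
    private readonly ILifetimeScope _scope;

    // Autofac injects the lifetime scope that this instance was resolved from
    public ResourceConsumer(ILifetimeScope scope)
    {
        _scope = scope;
    }

    public void DoWork()
    {
        // Everything resolved from the child scope is disposed along with it
        using (var childScope = _scope.BeginLifetimeScope())
        {
            var resource = childScope.Resolve<IResource>();
            // ... use the resource ...
        }
    }
}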

In conclusion

  • Be really careful. The leaks are not obvious, and if you’re using other people’s code for injection you can never be sure when they’re using something that is disposable.
  • If you’re using factories, they cannot be singletons if they are used to create anything IDisposable. In fact, I’d avoid singleton factories in general, since it’s way too easy to pass a factory from a different scope into a child scope and use it to create objects that will leak.
  • Find distinct units of work and begin and end lifetime scopes there. This is the place to resolve objects that all belong to the same scope.
  • Don’t dispose of injected IDisposables manually. If you work with Autofac instead of against it, the container will dispose of them for you. This keeps you safe whenever someone decides to add another usage of your object or change the number of instances created. If you absolutely need to dispose of something manually, make use of the Owned<T> class and get an object that you yourself are responsible for.

Why waiting for asynchronous IO is evil

Are you doing IO? If so, who is waiting for it to complete? IO looks easy on the surface, but if you intend to have a fully scalable application you’ll soon see that this is a delicate and tricky area. To illustrate this, let’s quickly build a web server using low-level IO!

Old-school IO

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var tcpListener = new TcpListener(IPAddress.Any, 1337);
        tcpListener.Start();
        while (true)
        {
            var tcpClient = tcpListener.AcceptTcpClient();
            new Thread(() => HandleRequest(tcpClient)).Start();
        }
    }

    public static void HandleRequest(TcpClient client)
    {
        try
        {
            // Read the entire header chunk
            var stream = client.GetStream();
            var headers = new StringBuilder();
            while (!(headers.Length > 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
            {
                headers.Append(Encoding.ASCII.GetChars(new[] {(byte) stream.ReadByte()}));
            }

            // Find out what was requested in the first line
            // Assume GET xxxx HTTP/1.1         
            var path = new string(headers.ToString().Skip("GET ".Length).TakeWhile(c => !Char.IsWhiteSpace(c)).ToArray());

            // Read the file and serve it back with the minimal headers
            var file = new FileInfo(path);
            var fileStream = file.OpenRead();

            // Minimal headers
            var responseHeaders = Encoding.ASCII.GetBytes(
                string.Format("HTTP 200 OK\r\n" + "Content-Length: {0}\r\n" + "\r\n", file.Length));
            stream.Write(responseHeaders, 0, responseHeaders.Length);
            fileStream.CopyTo(stream);
            fileStream.Close();
        } 
        catch {} 
        finally
        {
            client.Close();
        }
    }
}

The code is easy to read and straightforward to follow. But this simplicity is deceptive. What will actually go on here is waiting and locked threads, and locked threads are bad news. The read itself will not really run on your thread; instead .NET will be smart and use an IO completion port to get your data from the network if the data isn’t immediately available. This means that you’re wasting a thread. Threads are expensive resources: not only do they cause context switching, but each one also incurs its own stack space. This implementation of a web server will never scale, because of memory usage.

Waiting is evil

Every time you wait, you are locking up a resource. It’s an important point to make, since the simplicity of the synchronous functions is such a lure for the developer. So we need to use the asynchronous pattern, either through the task parallel library or through the Begin/End function pairs. The trouble is that these also present an all too easily accessible wait handle. If you’re writing an application that needs to scale and handle lots of IO, you can’t do any waiting.

In fact, the task parallel library presents another very nasty gotcha. If you wrap code that waits inside tasks, you are hurting the task thread pool by occupying tasks with waiting and preventing new tasks from starting. This leads to thread pool starvation and an unresponsive application. When you use the TPL for IO, you need to create tasks that encapsulate a single IO operation using Task.Factory.FromAsync, in order to make sure that the background IO runs without consuming a thread for waiting.
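
As a minimal sketch of what that looks like, here is a single stream read wrapped with the FromAsync overload that takes the Begin/End pair directly (the helper class name is made up):

using System.IO;
using System.Threading.Tasks;

static class AsyncIoSketch
{
    // Wraps one BeginRead/EndRead pair in a task. The IO completes on a
    // completion port; no thread sits blocked waiting for the bytes.
    public static Task<int> ReadChunkAsync(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }
}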

Thinking asynchronously

The great thing about doing IO asynchronously is that if data is not available, you are not the one running the function that gets it. You’ll get your data back in a callback. This callback runs on the internal worker thread pool, a pool on which you do NOT want to do any sort of long-running operations. It needs to be available to other things.

This has a few other interesting consequences. Since you can’t wait for things, iterating becomes really awkward. Consider the loop that reads each byte from the stream to get the header block. You can’t write that as a loop anymore, since going back and iterating means that the thread that controls the loop has to wait for the result of each operation. So iteration needs to be accomplished using recursion.

Error handling also becomes difficult. If you throw an exception from inside an AsyncCallback, the application will often die straight away. There is no call stack for the exception to propagate back on, since the callback was initiated by the system when the async operation completed asynchronously.
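
A minimal sketch of guarding a callback; the surrounding helper is hypothetical, the point is the try/catch inside the AsyncCallback:

using System;
using System.IO;
using System.Text;

static class GuardedReadSketch
{
    public static void ReadOneByte(Stream stream, StringBuilder headers)
    {
        var buffer = new byte[1];
        stream.BeginRead(buffer, 0, 1, asyncResult =>
        {
            try
            {
                stream.EndRead(asyncResult);
                headers.Append(Encoding.ASCII.GetString(buffer));
                // ... decide here whether to read more or continue ...
            }
            catch (Exception)
            {
                // Rethrowing has no caller to propagate to; clean up locally instead
                stream.Close();
            }
        }, null);
    }
}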

An asynchronous web server

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        var tcpListener = new TcpListener(IPAddress.Any, 1337);
        tcpListener.Start();
        AcceptClient(tcpListener);

        // To avoid the program exiting
        Thread.Sleep(Timeout.Infinite);
    }

    private static void AcceptClient(TcpListener tcpListener)
    {
        tcpListener.BeginAcceptTcpClient(result =>
        {
            var client = tcpListener.EndAcceptTcpClient(result);
            var stream = client.GetStream();

            // Start next connection attempt
            AcceptClient(tcpListener);

            var buffer = new byte[1];
            var headers = new StringBuilder();

            Action readAction = null;
            readAction = () => stream.BeginRead(buffer, 0, 1, readResult =>
            {
                stream.EndRead(readResult);
                headers.Append(Encoding.ASCII.GetString(buffer));
                if (!(headers.Length > 4 && headers.ToString(headers.Length - 4, 4) == "\r\n\r\n"))
                {
                    readAction();   // Recurse to read one more byte
                }
                else
                {
                    // Assume GET xxxx HTTP/1.1         
                    var path = new string(headers.ToString().Skip("GET ".Length).TakeWhile(c => !Char.IsWhiteSpace(c)).ToArray());

                    // Read the file and serve it back with the minimal headers
                    if (!File.Exists(path))
                    {
                        stream.Close();
                        return;
                    }
                    var file = new FileInfo(path);
                    var fileStream = file.OpenRead();

                    // Minimal headers
                    var responseHeaders = Encoding.ASCII.GetBytes(
                        string.Format("HTTP 200 OK\r\n" + "Content-Length: {0}\r\n" + "\r\n", file.Length));
                    stream.BeginWrite(responseHeaders, 0, responseHeaders.Length, writeResult =>
                    {
                        stream.EndWrite(writeResult);
                        byte[] fileBuffer = new byte[file.Length];
                        fileStream.BeginRead(fileBuffer, 0, (int)file.Length, fileReadResult =>
                        {
                            fileStream.EndRead(fileReadResult);

                            stream.BeginWrite(fileBuffer, 0, fileBuffer.Length, contentWriteResult =>
                            {
                                stream.EndWrite(contentWriteResult);
                                fileStream.Close();
                                stream.Close();
                            }, stream);
                        }, fileStream);
                    }, stream);
                }
            }, stream);
            readAction();

        }, tcpListener);
    }
}

The code above is obviously for demonstration purposes. Generally it’s not a good idea to read single bytes from streams, and in this case it’s an especially bad idea since it generates an impressive call stack from the recursion. But it shows the general idea of how you should code to achieve an IO-bound application that will scale. There is no explicit threading. No thread is ever in a waiting state. The application becomes entirely reactive to IO instead of reading and waiting for IO to complete. Interesting things happen when you run this and break at a random point: all threads in the worker thread pool are normally doing nothing, which is great, because it means they’re available to the system to quickly process IO callbacks. A request to this server will not spin up a thread. Memory usage is kept to an absolute minimum.

The situation improves somewhat with the new .NET 4.5, which has the async/await keywords built in. It improves in the sense that the syntax becomes nicer, but the lessons still hold true. If you’re waiting on anything, you’re killing your application with wait handles, and no amount of async keywords is going to rescue you. It’s a pity that so many examples of asynchronous operations in the documentation show off ways to use WaitOne, which pretty much defeats the purpose of being asynchronous in the first place.
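
Purely as an illustration of the .NET 4.5 style (not part of the server above), here is a sketch where awaiting releases the thread, whereas calling .Wait() or .Result on the same task would park a thread exactly like the synchronous code did:

using System.IO;
using System.Text;
using System.Threading.Tasks;

static class AwaitSketch
{
    // The thread is released at the await; the continuation runs when the IO completes
    public static async Task<string> ReadSomeTextAsync(Stream stream)
    {
        var buffer = new byte[4096];
        int read = await stream.ReadAsync(buffer, 0, buffer.Length);
        return Encoding.ASCII.GetString(buffer, 0, read);
    }
}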


Fun with Moq: Dynamic mocking

Run time mocking

Sometimes it’s useful to be able to create a Mock object without the type being known at compile time. Normally when using Moq you create a Mock object using this syntax:

var mock = new Mock<IMyInterface>().Object;

But what if IMyInterface is not known at compile time? What you would like to do is var mock = new Mock(myType).Object and receive the mocked object back. Turns out this method does not exist, but it’s possible to get around this using the magic of reflection!

public static object DynamicMock(Type type)
{
   // Create a Mock<T> where T is only known at runtime
   var mock = typeof(Mock<>).MakeGenericType(type).GetConstructor(Type.EmptyTypes).Invoke(new object[] { });
   // Read the strongly typed Object property (not the base class one) to get the mocked instance
   return mock.GetType().GetProperties().Single(f => f.Name == "Object" && f.PropertyType == type).GetValue(mock, new object[] {});
}

This piece of code will return to you a mock that implements the interface type you specify and return it as an object. This could then be used for various fun stuff to automate the unit testing of certain things.

Dynamic parameter checking

Seen this code before?

public class MyConcreteClass 
{
   private readonly IDependency1 _a;
   private readonly IDependency2 _b;
   private readonly IDependency1 _c;
   private readonly IDependency1 _d;

   public MyConcreteClass(IDependency1 a, IDependency2 b, IDependency1 c, IDependency1 d)
   {
     if (a == null) throw new ArgumentNullException("a");
     if (b == null) throw new ArgumentNullException("b");
     if (c == null) throw new ArgumentNullException("c");
     if (d == null) throw new ArgumentNullException("d");

     _a = a;
     _b = b;
     _c = c;
     _d = d;
   }
}

It’s great to throw all those ArgumentNullExceptions to check the values, but testing this is a complete pain, since you’d need a separate test case for each of the four parameters to verify that the correct exception is being thrown. With dynamic mocking and some reflection this can be totally automated for classes that only take interfaces as constructor parameters, which in a heavily dependency-injected application is pretty much all of them.

All you need to test this scenario is a single call to NullParameterCheck<MyConcreteClass>() and this method somewhere in your testing code.

public static void NullParameterCheck<T>()
{
	var constructor = typeof(T).GetConstructors().Single();

	Assert.IsFalse(constructor.GetParameters().Any(f => !f.ParameterType.IsInterface), "All parameters must be interfaces in order to use this helper");

	for (int i = 0; i< constructor.GetParameters().Length; ++i)
	{
		var paramName = constructor.GetParameters()[i].Name;
		int i1 = i;
		var args = constructor.GetParameters().Select((p, ix) => ix == i1
			? null
			: DynamicMock(p.ParameterType));
		try
		{
			constructor.Invoke(args.ToArray());
			Assert.Fail("No exception for parameter null {0}", paramName);
		}
		catch (TargetInvocationException e)
		{
			Assert.IsTrue(e.InnerException is ArgumentNullException);
			Assert.AreEqual(paramName, ((ArgumentNullException)e.InnerException).ParamName);
		}
	}
}

This code loops over each parameter, replacing one of them with null in turn, to verify that each interface parameter generates the corresponding ArgumentNullException.
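
For completeness, the call site could be as small as this, assuming an NUnit test fixture (matching the Assert calls in the helper above) with the two helper methods in scope:

[Test]
public void Constructor_ThrowsArgumentNullExceptionForEachNullDependency()
{
    NullParameterCheck<MyConcreteClass>();
}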

Happy mocking!


The most important point of paying someone to develop for you

From time to time I amuse myself with other people’s failures. I know I’m not alone in this ignoble endeavour. There are a lot of sites that are pretty much aimed at poking fun at projects that failed and code that is all kinds of shit. I can understand why you end up writing shitty code when learning, and I can understand why you fail projects with regard to deadlines. I have little understanding of how on earth you can manage to run a software project completely into the ground when you’re paying professional people to program it for you.

How on earth can you hire someone to develop for you without checking their work?

I grew up with an image of hired developers as the übermensch of software development. The hired guns of the developer world, the special forces to be put in where brave men have failed. As I came to understand, this is not the general case, especially when it comes to outsourcing.

Again, I’m drawn to the analogy of building houses. You recognize that you cannot build the house yourself, so you hire other people to do it for you. This is all fine, and it’s a good sign that you’re not taking on something you know you don’t truly understand (though it makes for pretty good TV shows). And then, when the house is built, you move in without inspecting it and complain when the roof leaks and the walls are full of mold.

And it’s not very hard to check the quality of software. Sure, understanding a piece of code thoroughly is hard, but getting a general sense of its quality takes all of a few hours for an experienced developer. Even if you paid a consultant triple the normal hourly fee to go through someone else’s work and critique it, it would be the best money spent on the project. And, yes, it needs to be someone other than the one who made the system.

At times, you don’t even need to do the actual review. If the ones developing the system fear the reviewer, then that is probably good enough evidence that the system will suck to high heavens come the review date. Someone who writes competent software would relish the opportunity to learn from the reviewer.

I know I would. Do take my word for it and check my stuff.

Piglet 1.3.0 is out, now with unicode support!

Piglet has been updated today with version 1.3.0. Here’s a list of the most exciting changes:

Unicode support

You can now use the full character range available in your regular expressions and parser tokens. This means that the parser will correctly be able to lex text in languages such as Arabic, Chinese and Japanese. When using regular expressions for these, all the normal rules apply, but the characters will not be included in any of the shorthand notations. For instance, the traditional Japanese numeral kanji are not part of the \d construct.

Nothing in existing code needs to be altered to enable this support. The runtime of the lexer has been slightly altered and is very slightly slower, but it should not even be noticeable.

Choice of lexer runtime

The most costly thing by far in the parser and lexer construction algorithms is the lexer table compression. Though this has been alleviated somewhat by the unicode functionality which actually served to reduce the size of the lexing tables, it can still be quite expensive.

If a faster construction time but a slower lexer is desired, you now have other options. When constructing, set the LexerRuntime in the LexerSettings variable of your ParserConfigurator. Or if constructing just a lexer with no accompanying parser, set the LexerRuntime property.

The available values are:

  • Tabular. Tabular is the slowest to construct but the fastest to run. Lexers built this way use an internal table to perform lookups. Use this method if you construct your lexer once and reuse it continually, or if you are parsing very large texts. Time complexity is O(1), regardless of input size or grammar size. Memory usage is constant, but might be higher than the other methods for small grammars.
  • Nfa. Nfa means that the lexer will run as a nondeterministic finite automaton. This method of construction is VERY fast, but the resulting lexer is slower to run. Also, the lexing performance is not linear and will vary based on the configuration of your grammar. It initially uses less memory than a tabular approach, but memory usage might increase as the lexing proceeds.
  • Dfa. Runs the lexing algorithm as a deterministic finite automaton. This method of construction is slower than Nfa but faster than Tabular. It runs in finite memory and has a complexity of O(1), but with a slower run time and higher memory usage than a tabular lexer. It provides a middle road between the two other options.

The tabular option is still the default option.

Hope you find it useful, and please report any bugs or problems either directly to me or by filing an issue on GitHub.

The language feature abuse threshold

C# has an odd strategy for language features, which can probably best be summarized as “that looks cool, let’s put it in”. This has resulted in a language that is about as full of syntactic sugar as the very best of them.

As an example, C# has had lambdas since a few years back, with their own syntax as well. Not that we actually needed lambdas in a strict sense, since we had delegates before that. Not that we needed those either, since we have objects and interfaces to pass around. Which weren’t themselves needed either. The only thing you really need is a few machine code instructions. Or, given a sufficiently convoluted setup, you need only one assembly language instruction.

Of course you don’t want to code in that, so you end up writing in some sort of sugar-coated language in order to be as productive as possible. But when do you cross the threshold of overusing language features just because you can?

My example here is, again, going to be lambda functions in C#. In part because I use them a lot myself, and the usage is increasing, maybe partly due to my experience with Haskell, which really turned me on to a functional style.

Local functions

Lambdas let you make local functions, something which isn’t possible using a normal member function. Which means you can create something like this.

public void DoStuff(string message)
{
    Func<string, bool> messageContains = s => (message??"").Contains(s);
    if (messageContains("this"))
    {
        //.. stuff
    }
    else if (messageContains("that"))
    {
        // .. other stuff
    }
}

This saves quite a few characters of typing, since you get a useful null-safe comparison that is only visible inside the enclosing method and doesn’t pollute your class. Overuse, or clever?

Self recursive lambdas

This Fibonacci function works just like the standard double recursive function, but from within a local scope.

Func<int, int> fib = null;
fib = f => f < 2 ? f : fib(f - 1) + fib(f - 2);

Granted, this is contrived and probably in all sorts of bad style. Or is it appropriate somewhere?

Functions returning functions

My favourite: honestly very, very useful, but would you use this yourself? Overuse?

public bool HasSpecificChildren(XDocument doc)
{
    Func<string, Func<XContainer, bool>> hasDescendant = 
        name => e => e.Descendants(name).Any();
    Func<Func<XContainer, bool>, Func<XContainer, bool>, Func<XContainer, bool>> and =
        (a, b) => x => a(x) && b(x);

    return and(hasDescendant("child"), hasDescendant("otherChild"))(doc);
}

In case it’s not obvious, this code is equivalent to this

public bool HasSpecificChildren2(XDocument doc)
{
    return doc.Descendants("child").Any() && doc.Descendants("otherChild").Any();
}

Now, interestingly, even though this example is a bit over the top: which of the two implementations is the more redundant one? I’d say it is the second one. The lambda silliness states each piece of functionality only once and is actually as factored as you can get. Consider if you were to change the implementation from Descendants to Elements: one solution has only one place to change.

Currying

A final piece that probably comes from the Haskell world, though interestingly enough the venerable Jon Skeet has written about it.

Currying is the idea that each function only really needs one argument. This can be achieved in C# as well. Consider this.

public static Func<T1, Func<T2, Func<T3, TResult>>> Curry<T1, T2, T3, TResult>(Func<T1, T2, T3, TResult> uncurried)
{
    return a => b => c => uncurried(a, b, c);
}

static void Main(string[] args)
{
    Func<string, string, string, string> func = (a,b,c) => a + b + c;
    var uncurried = func("currying", "is", "awesome");

    Func<string, Func<string, Func<string, string>>> curry = Curry(func);
    var curried = curry("currying")("is")("awesome");
}

This is very rarely seen in C#, but it is a mainstay of other languages and can prove very useful indeed for function composition. So, are we abusing the language enough? Or do we need to go even further?

All of the things above have their place in your development toolbox, honestly. But when are you overusing them?

Is it OK to make a local lambda in order to avoid passing an argument to a private function? I know I make local lambdas constantly for this very reason. Is it cool to prefer LINQ in order to avoid writing for loops? Should you avoid the var keyword because someone else might get confused about your typing intentions, or should you just go along with the speed of development that it offers?

When are you over the line? And who is to determine what is acceptable?
