Introducing Piglet

This is an introductory post for my latest creation, Piglet, the friendly parsing and lexing tool. Like so much software, Piglet was written out of both the satisfaction of solving a difficult problem and seeing a void where my piece of software might be useful to someone. Piglet is open source under the MIT license. This post is intended to complement the readme on GitHub to form a better picture of why I have written this library and how it can be used.

Why another parser generator?!

Because the other parser generators are too big for easy tasks, and harder to integrate into a project. They are also highly technical and hard to understand for someone who isn’t genuinely interested in old-school comp.sci.

The purpose is for you to have a tool at hand for parsing tasks that are smaller than full-blown language construction efforts. Piglet tries to bridge the gap between (ab)using regular expressions and going in guns blazing with a large parsing toolset. It is not going to replace ANTLR or any of those tools, though you certainly can parse larger grammars with Piglet. This is the tool to use if you want a low-threshold, easy-to-use tool for any context-free data.

Code based configuration

Piglet is, in sharp contrast to most other generators, configurable in code. You create your parser and use it as naturally as you configure any other object. It has about the same functionality as the yacc/bison family of tools and generates similar parsers. Configuring in code is an important point: if your parser is configured using a separate input file, or even a completely separate tool, you are always going to have a distance between the running code and the parser. The parser generators also sometimes generate fairly incomprehensible code. If you have a small parsing task, are you really going to use a generator, or are you going to roll your own parser?

There are some tradeoffs in this strategy. Obviously the parser generation is going to be done at runtime. However, the parsers are reentrant, so the construction should only need to be done once. It also enables you to use lambdas as actions when rules are followed, which also means that Piglet can construct type-safe parsers!
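
One way to pay the runtime construction cost only once is to keep the finished parser in a lazily initialized static field. The sketch below is not Piglet code: SharedParser, BuildParser and BuildCount are hypothetical names, and the body of BuildParser is a placeholder so that the example runs without any dependencies. With Piglet, that method is where the configuration and the call to config.CreateParser() would go.

```csharp
using System;

public static class SharedParser
{
    // Counts how often the expensive construction step actually runs.
    public static int BuildCount;

    // Lazy<T> runs BuildParser at most once, on first access.
    private static readonly Lazy<Func<string, object>> Instance =
        new Lazy<Func<string, object>>(BuildParser);

    // Hypothetical stand-in for the costly part. With Piglet, the
    // ParserFactory.Fluent() call, the rule declarations and
    // config.CreateParser() would live here, and the returned delegate
    // would wrap parser.Parse. The placeholder just trims its input.
    private static Func<string, object> BuildParser()
    {
        BuildCount++;
        return text => text.Trim();
    }

    // Every caller goes through the same, already constructed instance.
    public static object Parse(string text)
    {
        return Instance.Value(text);
    }
}
```

Calling SharedParser.Parse any number of times leaves BuildCount at 1. Note that Lazy<T> only guards the construction step; it says nothing about whether a given parser may be used from several threads at once.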

Ease of use

Unfortunately, parsing is a bit complex, and you usually need to know a bit about context-free grammars to be able to construct a parser. Piglet tries to get around this problem by introducing a fluent configuration interface.

It’s easier to show it straight up. This is a full-blown JSON parser. The only things missing are details such as 0x notation for hexadecimal and exponent notation for numbers. The rest is all there.

// Create a configuration object. This object generates all the configuration needed for the parser.
var config = ParserFactory.Fluent();

// Create rules. The first rule is the MAIN rule, the rule that everything must be able
// to condense down to. For JSON this is a single object, since a JSON string is always
// just one object.
var jsonObject = config.Rule();

// Used to represent an element of an object
// which is something like "elementname" : some_value
var jsonElement = config.Rule();

// This represents a value of a json element
var jsonValue = config.Rule();

// This represents an array of values
var jsonArray = config.Rule();

// Now we declare what a jsonObject is made up of
// Literals are found in quotes. The parts that we are interested
// in are named using the As clause, which makes them accessible in the
// .WhenFound. The result of WhenFound is returned to the caller.
	.WhenFound( o => new JsonObject { Elements = o.ElementList } );

// Declares what an element is made up of. Note that the rule above uses the
// jsonElement before what it is made of is declared. This is a crucial part
// of parsing. This bit has two interesting named pieces "Name" and "Value".
// Since each bit has a value, this gets assigned to the o parameter to the WhenFound lambda.
	.WhenFound( o => new JsonElement { Name = o.Name, Value = o.Value } );

// A jsonValue has several interpretations, separated by Or clauses.
// Predefined parsing is found for simple types. Note the recursive use
// of jsonObject, values can be full objects themselves! There is no need
// for .WhenFound clauses for single part rules, they
// will always return the value of the single part.
	.Or.By("null").WhenFound(o => null); // Need to specify, since the default will return the string "null"

// This rule could have been merged into the jsonValue rule, as its 
// own .Or.By clause. It's separated for readability only.
	   .WhenFound(o => o.Values);

// When the configuration is completed - create the parser!
// If the parser creation is unsuccessful, this throws the
// friendliest exception possible to help you find the issue.
var parser = config.CreateParser();

// Here is how you use it!
var jObject = (JsonObject)parser.Parse(
		 ""IntegerProperty"" : 1234, 
		 ""another_object"" : {
		 ""empty_object"" : {

There is also a less verbose technical interface, which right now includes more functions than the fluent one does. In the interest of brevity in this post I’m not going to describe it in detail. The features that currently only exist in the technical interface are context-dependent precedence, token precedence, panic mode error recovery and type-specific parser generation (the fluent parsers will all generate an object as an end result). These should all be coming to the fluent interface in future versions.

The fluent configuration interface is right now in a working state, but can certainly use more functions! Common tasks should be easy to accomplish, while harder tasks may require you to fall back on the technical configuration. As an exciting preview of things to come, I’m currently working on an embedded domain-specific language generator (made using Piglet) called Snout, which when finished should help the maintainability of the fluent interface.

Piglet is totally dependency free, and written in C# 4. The code is well commented and understandable. If you are looking for code that reasonably explains the parser generation algorithm, Piglet is a sensible choice.

In technical terms, Piglet generates LALR(1) parsers, mainly because that makes your grammar easier to express: you are not required to left factor it, and left factoring is a pretty difficult concept to explain.
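
For readers who have not met left factoring, here is a small illustration. The grammar is made up for this purpose and written in a generic BNF-like notation; it is not Piglet syntax and not taken from the Piglet sources.

```text
# The natural way to write a comma-separated list. Both alternatives
# start with "item", which a one-token-lookahead top-down (LL(1)) parser
# cannot choose between. An LALR(1) parser accepts the rule as written.
list : item
     | item "," list

# The left-factored version an LL(1) tool would require: the common
# prefix is pulled out and the difference moves into a helper rule.
list      : item list_tail
list_tail : "," list
          | (empty)
```

Both grammars describe the same language, but the second one introduces a rule that exists only to please the parser, which is the kind of rewriting an LALR(1) generator spares you.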

More examples

I’ve written a demo project, available in the main GitHub repository, which contains a few demos. The unit tests for Piglet are also quite extensive and are suitable for learning. I’m hoping to find more time to write a detailed tutorial on how to go from a given format to a complete parser. Suggestions for tutorials and articles are most welcome.

Getting your hands on it

Piglet lives on GitHub if you want the source code and demos. It also lives on NuGet if you just want the binaries.


You are most welcome, regardless of skill level. There’s lots of stuff to do! Or just fork it and go to town on your own. I would be flattered if you did.

And if you can’t program, Piglet could use a logo. I’m thinking a piglet would make an appropriate one :)
