Ecru 0.2.0

It's been a bit longer for this release.

The major new additions are message buffering on eventual references and the Ref.whenResolved and Ref.whenBroken methods, as well as the __whenMoreResolved Miranda method. Along with eventual sends and the vat support from the previous release, this is enough to support the 'when' syntax sugar. The Python interface can now control the E event loop: ecru.api.iterate runs a single item in the queue of the vat exposed to Python, and ecru.api.incrementalEval queues an E expression to be run, returning a promise to Python immediately. There's an example IRC-bot REPL in doc/examples/.

Special shout-out to Eric Mangold (aka teratorn) for bugfixes and testing. Thanks Eric!

Ecru 0.1.3

Time for another Ecru release. This release should be a lot more stable: ctypes is no longer used for the Python interface. The major behaviour change I've added is support for Selfless objects, allowing for value equality: for example, Map objects with the same contents now compare equal. The rest of the work has been in code reorganization and preparation for concurrency support; vats are implemented now, and the REPL executes code by putting it into the vat's event queue. Eventual sends work now (i.e., "foo <- doStuff(x, y)"). Goals for the next couple releases are actual multi-vat execution, support for the 'when' expression, and portability to other OSes.

Ecru, A C Runtime For E

I'm happy to announce Ecru, a new implementation of the E language.

E is a language designed for both security and safe, efficient concurrency. Twisted borrowed the idea of Deferreds from E, where they are much better integrated into the language than in Python, owing to the syntax and library support E provides. E's design is based on capability security, even to the level of allowing mutually untrusting programs to run in the same process. Due to its consistent focus on security, it's possible to write secure programs without having to do much more than stick to good object-oriented style.

My goals with Ecru development are, initially, to develop an environment suitable for restricted execution of the type that my esteemed colleague has been looking for, allowing safe scripting of server-hosted applications. A wider goal is to provide an effective replacement for the C implementation of Python for development of network software; a successor to Twisted, as it were. Furthermore, E's semantics lend themselves to more efficient implementation than Python's. Although Ecru has received essentially no optimization work thus far, I believe it may be possible to make it faster than Python for many tasks, without being any more difficult to work with.

Right now Ecru depends on Python, since I'm using PyMeta for its parser. Ecru only implements enough of E to run the compiler; I plan to soon implement OMeta in E, so that Ecru can also be used standalone. There's a simple REPL in Python, so you can download Ecru and try it out right now. Currently I'm focusing on cleaning up the code (bootstraps are usually messy affairs) and replacing some of the standard-library object stubs I wrote in C with the versions implemented in E in use by the existing Java version.

PyMeta 0.3.0 Released

Originally when I was implementing PyMeta I was sure that the input stream implementation that the Javascript version used was inefficient. Rather than having a mutable object with methods for advancing and rewinding the stream, it has immutable objects with "head" and "tail" methods, which return the value at the current position and a new object representing the next position in the stream. All that allocation couldn't be healthy.

Turns out I was wrong. I misunderstood the requirements for OMeta's input stream. Various operations require the ability to save a particular point in the stream and go back to it. To further complicate matters, arguments to rules are passed on the stream by pushing them onto the front, and rules that take arguments read them off of the stream. This is very handy for certain types of pattern matching, but it totally destroys any hope of simply implementing the input as a list and an index into it, because there has to be a way to uniquely identify values pushed onto the stream. If a value gets pushed onto the stream, is read from it, then another one is pushed on, both of them have the same predecessor, but they don't have the same stream position. It becomes more like a tree than a sequence. JS-OMeta handled this by just creating a new stream object for each argument. I didn't give up soon enough on my clever idea when initially implementing PyMeta, and it grew more complicated with each feature I implemented, involving a bunch of lists for buffering things and a complicated mark/rewind scheme.
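The head/tail design can be sketched in a few lines of plain Python (a simplified illustration, not PyMeta's actual classes):

```python
# Immutable input stream: "head" returns the value at the current
# position, "tail" returns a NEW object for the next position.
# Saving a point in the stream is just keeping a reference; rewinding
# is just reusing that reference, since nothing ever mutates.

class Stream(object):
    def __init__(self, data, position=0):
        self.data = data
        self.position = position

    def head(self):
        return self.data[self.position]

    def tail(self):
        return Stream(self.data, self.position + 1)

    def prepend(self, value):
        # Pushing an argument onto the front returns a brand-new
        # stream whose tail is this one.  Two values pushed onto the
        # same stream share a predecessor but are distinct positions,
        # which is exactly the tree-like behaviour described above.
        return Pushed(value, self)

class Pushed(Stream):
    # Overrides head/tail entirely, so it doesn't need Stream's state.
    def __init__(self, value, rest):
        self.value = value
        self.rest = rest

    def head(self):
        return self.value

    def tail(self):
        return self.rest
```

With this design, `s.prepend(1)` and `s.prepend(2)` both have `s` as their tail, so each pushed argument gets a unique stream position for free, with no buffering or mark/rewind bookkeeping.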

After writing a rather complicated grammar with PyMeta, I began to wonder if I could improve its speed. By this time I knew the JS version's algorithm was less complicated, so I decided to try it out. It cut my grammar's unit tests' runtime from around 40 seconds to 4 seconds. Also, it fixed a bug.

Moral of the story: no more trying to implement clever optimizations before I understand the original version. :)

I've released a version with this new input implementation. Get it in the usual place.

More Than Just Parsers

PyMeta is more than just a parsing framework: it's a general pattern-matching language. Here's a parser for a tiny HTML-like language:

from pymeta.grammar import OMeta
from itertools import chain

tinyHTMLGrammar = """

name ::= <letterOrDigit>+:ls => ''.join(ls)

tag ::= ('<' <spaces> <name>:n <spaces> <attribute>*:attrs '>'
         <html>:c
         '<' '/' <token n> <spaces> '>'
             => [n.lower(), dict(attrs), c])

html ::= (<text> | <tag>)*

text ::= (~('<') <anything>)+:t => ''.join(t)

attribute ::= <spaces> <name>:k <token '='> <quotedString>:v => (k, v)

quotedString ::= (('"' | '\''):q (~<exactly q> <anything>)*:xs <exactly q>
                  => ''.join(xs))
"""

TinyHTML = OMeta.makeGrammar(tinyHTMLGrammar, globals(), name="TinyHTML")

This will parse an HTML-ish string into a tree structure.

Python 2.5.2 (r252:60911, Apr 8 2008, 21:49:41)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import html
>>> x = html.TinyHTML("<html><title>Yes</title><body><h1>Man, HTML is
<i>great</i>.</h1><p>How could you even <b>think</b>
otherwise?</p><img src='HIPPO.JPG'></img><a
href=''>A Good Website</a></body></html>")

>>> tree = x.apply("html")

>>> import pprint

>>> pprint.pprint(tree)
[['html',
  {},
  [['title', {}, ['Yes']],
   ['body',
    {},
    [['h1', {}, ['Man, HTML is ', ['i', {}, ['great']], '.']],
     ['p', {}, ['How could you even ', ['b', {}, ['think']], ' otherwise?']],
     ['img', {'src': 'HIPPO.JPG'}, []],
     ['a', {'href': ''}, ['A Good Website']]]]]]]

Suppose now that we want to turn this tree structure back into the HTMLish format we found it in. We can write an unparser:

def formatAttrs(attrs):
    """Format a dictionary as HTML-ish attributes."""
    return ''.join([" %s='%s'" % (k, v) for (k, v) in attrs.iteritems()])

unparserGrammar = """
contents ::= [<tag>*:t] => ''.join(t)
tag ::= ([:name :attrs <contents>:t]
           => "<%s%s>%s</%s>" % (name, formatAttrs(attrs), t, name)
        | <anything>)
"""

TinyHTMLUnparser = OMeta.makeGrammar(unparserGrammar, globals(), name="TinyHTMLUnparser")

Square brackets in a rule indicate that a list with the given contents should be matched. This way we can traverse a tree structure and produce a string.

>>> html.TinyHTMLUnparser([tree]).apply("contents")
"<html><title>Yes</title><body><h1>Man, HTML is
<i>great</i>.</h1><p>How could you even <b>think</b>
otherwise?</p><img src='HIPPO.JPG'></img>
<a href=''>A Good Website</a></body></html>"

Other sorts of transformations are possible, of course: here's an example that ignores everything but the 'src' attribute of IMG and the 'href' attribute of A:

linkExtractorGrammar = """
contents ::= [<tag>*:t] => list(chain(*t))
tag ::= ( ["a" :attrs ?('href' in attrs) <contents>:t] => ([attrs['href']] + t)
| ["img" :attrs ?('src' in attrs) <contents>:t] => ([attrs['src']] + t)
| [:name :attrs <contents>:t] => t
| :text => [])
"""

LinkExtractor = OMeta.makeGrammar(linkExtractorGrammar, globals(), name="LinkExtractor")

>>> html.LinkExtractor([tree]).apply("contents")

['HIPPO.JPG', '']

And here's an example that produces another tree, without B or I elements:

boringifierGrammar = """
contents ::= [<tag>*:t] => list(chain(*t))
tag ::= ( ["b" <anything> <contents>:t] => t
| ["i" <anything> <contents>:t] => t
| [:name :attrs <contents>:t] => [[name, attrs, t]]
| :text => [text])
"""

Boringifier = OMeta.makeGrammar(boringifierGrammar, globals(), name="Boringifier")

And once we have the new tree, we can treat it just like the original:

>>> tree2 = html.Boringifier([tree]).apply("contents")
>>> pprint.pprint(tree2)
[['html',
  {},
  [['title', {}, ['Yes']],
   ['body',
    {},
    [['h1', {}, ['Man, HTML is ', 'great', '.']],
     ['p', {}, ['How could you even ', 'think', ' otherwise?']],
     ['img', {'src': 'HIPPO.JPG'}, []],
     ['a', {'href': ''}, ['A Good Website']]]]]]]
>>> html.TinyHTMLUnparser([tree2]).apply("contents")
"<html><title>Yes</title><body><h1>Man, HTML is
great.</h1><p>How could you even think otherwise?</p><img src='HIPPO.JPG'>
</img><a href=''>A Good Website</a></body></html>"

As you can see, the final result is the original string, but with the <b> and <i> tags removed.

This kind of tree transformation is highly useful for implementing language tools, and might form a good basis for a refactoring library. Analysis can be done on the syntax tree produced by the parser, transformation can be done by other tools, and the unparser can then turn it back into valid source code.

PyMeta: How and Why

(Note: In the course of writing this post, I fixed a bug. Get PyMeta 0.1.1 here.)

Python already has a big pile of parsing frameworks, such as ANTLR, PLY, PyParsing, YAPPS2, ZAPPS, ZestyParser, and plenty others. Why did I write another one?

PyMeta Works Like Python

One of the main difficulties I've had using parser generators has been figuring out why a grammar didn't work. Fixing shift-reduce and reduce-reduce conflicts seemed like voodoo to me, and though I now understand slightly better how to fix such things, it's still a different mode of thinking that I don't want to get into when I just want to parse something simple. PyMeta uses a variation on the Parsing Expression Grammar (PEG) approach to parsing. The chief consequence of this is that there's no possibility of ambiguity in a parse: a successful parse will yield exactly one result, and you can trace the control flow through the grammar to figure out how it got there.

This is possible because the alternation operator ("|") works like Python's "or": it tries all the alternatives in order, returning the value of the first expression to succeed. Let's take the example that's so problematic in the Bison doc I linked to:

>>> ifExample = """
ifStmt ::= ( <token 'if'> <expr> <token 'then'> <stmt> <token 'else'> <stmt>
| <token 'if'> <expr> <token 'then'> <stmt>)

expr ::= <spaces> <letter>
stmt ::= <ifStmt> | <expr>
"""

(Rules can take arguments in PyMeta: "token" here is a built-in rule that consumes whitespace and matches all the characters in the string passed to it.) I've reversed the order in 'ifStmt' from the Bison example. This grammar isn't ambiguous: PyMeta will try to match the if/then/else form first, and if that fails, it rewinds back to before the 'if' and tries the if/then branch of the alternation.

>>> Example = OMeta.makeGrammar(ifExample, {})
>>> g = Example("if x then y")
>>> g.apply("ifStmt")
'y'
>>> g2 = Example("if x then y else if a then b")
>>> g2.apply("ifStmt")
'b'

(Since I didn't specify explicit return values for any of the expressions, they simply return the value of the last expression in the sequence. In this case, "letter" returns the letter it parses, which is then passed up through "expr" and "stmt" and "ifStmt".)
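The ordered-choice behaviour can be sketched in plain Python (a toy illustration of the idea, not how PyMeta is actually implemented):

```python
# Toy sketch of PEG ordered choice: each alternative is tried in order
# from the same starting position, and the first success wins.
# "Rewinding" is free because a failed alternative never advances the
# position -- every attempt starts over from `pos`.

def literal(s):
    """Build a parser matching the exact string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return (s, pos + len(s))   # success: (value, new position)
        return None                    # failure
    return parse

def choice(*alternatives):
    """The '|' operator: try each alternative in order; first success wins."""
    def parse(text, pos):
        for alt in alternatives:
            result = alt(text, pos)
            if result is not None:
                return result
        return None
    return parse

# Like ifStmt above: the longer if/then/else form is tried first; if it
# fails, the if/then form is tried from the same starting position.
ifStmt = choice(literal("if e then s else s"), literal("if e then s"))
```

Because the order of alternatives decides the outcome, there is nothing corresponding to a shift-reduce conflict: the "dangling else" simply attaches to the nearest 'if', by construction.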

An additional advantage is that you don't need a separate lexer pass: you can handle everything in a single grammar.

PyMeta Lets You Say It Quickly

So why not ZestyParser or PyParsing, which also don't have these problems? There are a couple of reasons. First of all, neither of them offers a compact syntax for defining a parser. (Which arguably isn't desirable in all cases; I may add a pyparsing-like interface to PyMeta later.) Second, PyMeta fits Python's object-oriented abilities better -- each grammar is its own class, and can be subclassed in the usual way, as well as extended with new grammar rules.

>>> from pymeta.runtime import ParseError
>>> class Example2(Example):
...     def parseIt(self):
...         try:
...             return self.apply("ifStmt")
...         except ParseError:
...             return "No Good"
>>> Example2("if x then y").parseIt()
'y'
>>> Example2("bob's yer uncle").parseIt()
'No Good'
>>> extendedGrammar = """
number ::= <digit>+
expr ::= <super> | <spaces> <number>
"""
>>> Example3 = Example2.makeGrammar(extendedGrammar, {})
>>> Example3("if 1 then x").parseIt()
'x'

Notice here the use of the special rule "super": this calls the grammar rule of the same name as the current one in the superclass. (This uses Python's super() internally.) This way it's possible to try new extensions to a grammar without having to disturb existing code.
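Since grammar rules compile down to methods on the grammar class, the "super" rule can be sketched as ordinary Python method dispatch (a simplified illustration, not PyMeta's generated code):

```python
# Each rule is a method; <super> amounts to calling the superclass's
# version of the same rule and falling back to the extension when the
# inherited rule fails.

class ParseError(Exception):
    pass

class Base(object):
    def rule_expr(self, ch):
        # Original rule: match a single letter.
        if ch.isalpha():
            return ch
        raise ParseError(ch)

class Extended(Base):
    def rule_expr(self, ch):
        # expr ::= <super> | <digit>  -- try the inherited rule first.
        try:
            return super(Extended, self).rule_expr(ch)
        except ParseError:
            if ch.isdigit():
                return ch
            raise
```

The extension never touches the base class, so existing code that uses the original grammar keeps working unchanged.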

Introducing PyMeta

I've just uploaded the initial release of a Python implementation of Alessandro Warth's OMeta, an object-oriented pattern matching language. Get the tarball here (or simply bzr get lp:pymeta). It's an extremely handy tool for all kinds of parsing and filtering jobs. Here's a tiny example:

>>> exampleGrammar = """
word ::= <letter>+:chars => ''.join(chars)
greeting ::= <word>:x ',' <spaces> <word>:y '!' => (x, y)
"""

Here's a breakdown of the syntax in the first rule: as you can probably tell, "word ::=" defines a parsing rule. "<letter>" calls a builtin rule that matches a single letter. The "+" works similarly to its use in regexes, executing the expression before it multiple times and collecting the results into a list. ":chars" binds the result to a local variable, "chars". "=>" is used to provide a Python expression as the result of a successful parse.

The second rule strings multiple expressions together in sequence -- after matching a word, it looks for a comma and then calls the builtin rule "spaces" which matches any amount of whitespace, then another word and finally an exclamation point. Once that's all matched it returns a tuple containing the two matched words.

Using this in a Python program is simple:

>>> from pymeta.grammar import OMeta
>>> GreetingParser = OMeta.makeGrammar(exampleGrammar, {})
>>> gp = GreetingParser("Hello, world!")
>>> gp.apply("greeting")
('Hello', 'world')

In my next few posts I'll be showing more examples of what PyMeta can do and how it's different from other Python parsing tools.