Pyparsing introduction: BNF to code

After reading my previous post, you should have a pretty good understanding of what a BNF definition is all about. Let’s put this theory into practice, and write some basic parsers in Python, using Pyparsing!

Pyparsing allows a nearly one-to-one mapping of BNF to Python code: you define character sets and combinations of them, then parse any text fragment against the result. One important point to notice: a basic BNF definition can (and should) be reused. Once you have written a BNF definition for an integer value, you can easily reuse it in, e.g., a basic integer math expression.
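To make the reuse idea concrete, here is a minimal sketch. The names integer, operator and expression are my own, and the flat "integer operator integer ..." grammar is only an assumption for illustration (it ignores operator precedence). It uses the classic camelCase Pyparsing names such as parseString and oneOf; newer releases also offer snake_case equivalents.

```python
from pyparsing import Word, ZeroOrMore, nums, oneOf

# Defined once: an integer is a run of one or more digits.
integer = Word(nums)

# Hypothetical operator set, chosen for this sketch.
operator = oneOf("+ - * /")

# The integer definition is reused inside the larger expression.
expression = integer + ZeroOrMore(operator + integer)

print(integer.parseString("42").asList())              # ['42']
print(expression.parseString("12 + 30 * 4").asList())  # ['12', '+', '30', '*', '4']
```

The same integer object serves both as a parser on its own and as a building block, which is exactly the kind of reuse a BNF definition gives you.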

The most basic element in Pyparsing is a Word. In its most basic form this is a set of characters which will match a string of one or more characters of arbitrary length, as long as every character in this string is part of the Word character set.
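A quick sketch of a Word in action, assuming a character set of lowercase ASCII letters (the name lowercase_word is mine). Note that parseString only needs to match the start of the input by default, so trailing characters outside the set are simply left unparsed.

```python
from pyparsing import ParseException, Word

# A Word over the lowercase ASCII letters: it matches one or more
# consecutive characters taken from this set.
lowercase_word = Word("abcdefghijklmnopqrstuvwxyz")

print(lowercase_word.parseString("hello").asList())     # ['hello']
print(lowercase_word.parseString("hello123").asList())  # ['hello']

# No character at the start belongs to the set, so parsing fails.
try:
    lowercase_word.parseString("123")
    matched = True
except ParseException:
    matched = False
print(matched)  # False
```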

A little introductory example: let’s write a parser which accepts words consisting of lowercase characters, or sentences which consist of such words separated by spaces. First we write a formal definition using BNF:

Continue reading »


Text parsing, formal grammars and BNF introduction

Parsing input is something most developers run into one day. Parsing binary input can be pretty straightforward, as most of the time you know the format of the input, i.e. you know what to expect: if you receive a message of 10 bytes, the first byte could be a message ID, the second one the payload length, the third one a message type ID, and the others the message content. Pretty easy to handle.
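As a rough sketch of how little work such a binary format needs, here is that hypothetical 10-byte message handled with Python's standard struct module. The field names, the function name parse_message and the sample bytes are made up for illustration; only the layout (ID, payload length, type ID, content) comes from the description above.

```python
import struct

def parse_message(data: bytes) -> dict:
    """Split a message into its three one-byte header fields and its content."""
    # "BBB" = three unsigned bytes: message ID, payload length, message type ID.
    message_id, payload_length, message_type = struct.unpack_from("BBB", data, 0)
    payload = data[3:3 + payload_length]
    if len(payload) != payload_length:
        raise ValueError("message is shorter than its declared payload length")
    return {
        "message_id": message_id,
        "message_type": message_type,
        "payload": payload,
    }

# A 10-byte sample: 3 header bytes followed by 7 bytes of content.
sample = bytes([7, 7, 1]) + b"payload"
print(parse_message(sample))
# {'message_id': 7, 'message_type': 1, 'payload': b'payload'}
```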

Parsing human-readable text can be harder, though: human beings tend to be less strict when providing input (e.g. whitespace), you can’t ask humans to prepend strings with their length, etc.

There are several ways to handle text input. One well-known method is using regular expressions with matches, but writing regular expressions which are able to process not-so-strict input can be pretty tough, writing expressions to parse large bodies of text is hard, and using sub-expressions can become pretty complicated. Overall, regular expressions usually involve quite a lot of black magic for the average outsider.

xkcd comic: Regular expressions

Luckily, there are easier methods to parse text input too, of which I’d like to introduce one: a Python module called Pyparsing, which can do BNF-style text parsing.

First of all, let me explain “BNF”. The Backus-Naur Form, aka BNF, is a metasyntax you can use to express the grammar of a formal language. This might make no sense at all, so let’s split it up:

Continue reading »
