After reading my previous post, you should have a pretty good understanding of what a BNF definition is all about. Let’s put this theory into practice, and write some basic parsers in Python, using Pyparsing!
Pyparsing allows a nearly one-to-one mapping of BNF to Python code: you define sets and combinations, then parse any text fragment against them. One thing is very important to notice here: a basic BNF definition can (and should) be reused. Once you have written a BNF definition for an integer value, you can easily reuse it in, e.g., a basic integer math expression.
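To make the reuse idea concrete, here is a minimal sketch written for Python 3. The names integer and addition are made up for this illustration; only Word, nums and parseString come from Pyparsing itself.

```python
from pyparsing import Word, nums

# One reusable building block: an unsigned integer is one or more digits.
integer = Word(nums)

# The same definition is reused twice inside a larger expression.
# "addition" is a hypothetical name chosen for this sketch.
addition = integer + "+" + integer

print(integer.parseString("42"))
print(addition.parseString("12 + 30"))
```

The plain string "+" is turned into a literal match by Pyparsing, and whitespace between tokens is skipped by default, so the second call should yield the three tokens '12', '+' and '30'.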
The most basic element in Pyparsing is a Word. In its most basic form this is a set of characters which will match a string of arbitrary length, as long as every character in that string is part of the Word's character set.
A little introductory example: let’s write a parser which accepts words consisting of lowercase letters, or sentences which consist of words separated by spaces. First we write a formal definition using BNF:
- character ::= a | b | c | d | … | y | z
- word ::= character | character word
- sentence ::= word | sentence " " word
Let’s port this formal definition to Python code. First we need to do some imports, as in most Python programs. I’d encourage the reader to write this code in an interactive interpreter (give IPython a try, it rocks!) and experiment a little with it (tab-completion and ‘?’ are a great help):
from pyparsing import Word
from string import ascii_lowercase as lowercase  # string.lowercase was removed in Python 3
Pyparsing includes several useful pre-defined lists of characters, including
- alphas: a-zA-Z
- nums: 0-9
- alphanums: alphas + nums
These are normal Python strings. In this sample we only want lowercase characters though, so we import those from the string module instead.
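Since these sets are plain strings, you can inspect and combine them like any other string. A small sketch for Python 3 follows; the identifier definition is a hypothetical example and not part of the parser we are building here.

```python
from pyparsing import Word, alphas, nums, alphanums

# The predefined sets are ordinary strings, so the usual string operations apply.
print(isinstance(alphas, str))
print(alphanums == alphas + nums)

# Hypothetical example: an identifier-like token built from alphanums plus "_".
identifier = Word(alphanums + "_")
print(identifier.parseString("user_42"))
```

Building a custom character set is just string concatenation, which is why a Word accepts any string you hand it.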
Now we can define one word: a word is a concatenation of lowercase characters.
word = Word(lowercase)
Let’s play around with this:
print(word.parseString('hello'))        # prints ['hello']
print(word.parseString('Hello'))        # raises ParseException: Expected W:(abcd...) (0), (1,1)
print(word.parseString('hello world'))  # prints ['hello']
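Note the last case: parsing stops after the first match and silently ignores the rest of the input. If you want the whole string to match, parseString accepts a parseAll argument. The following is a minimal sketch for Python 3, not the only way to handle this:

```python
from pyparsing import Word, ParseException
from string import ascii_lowercase

word = Word(ascii_lowercase)

# By default, parsing stops after the first match and ignores trailing text.
print(word.parseString("hello world"))

# With parseAll=True the whole input has to match, so this call raises.
try:
    word.parseString("hello world", parseAll=True)
except ParseException as exc:
    print("no match:", exc)
```

Catching ParseException like this is also a handy way to experiment in the interpreter without a traceback on every failed attempt. The exact error text depends on your Pyparsing version.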