After reading my previous post, you should have a pretty good understanding of what a BNF definition is all about. Let’s put this theory into practice, and write some basic parsers in Python, using Pyparsing!
Pyparsing allows a nearly one-to-one mapping of BNF to Python code: you define character sets and combinations of them, then parse any text fragment against the result. One point is important to notice: a basic BNF definition can (and should) be reused. Once you have written a BNF definition for an integer value, you can reuse it in, e.g., a basic integer math expression.
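As a rough sketch of that reuse idea: the names `integer`, `operator` and `expression` below are made up for illustration, and the `Word` element they rely on is explained right after this. The point is only that `integer` is defined once and then used twice.

```python
from pyparsing import Literal, Word, nums

# integer ::= digit | digit integer
integer = Word(nums)

# operator ::= "+" | "-"
operator = Literal("+") | Literal("-")

# expression ::= integer operator integer
# The integer definition is written once and reused on both sides.
expression = integer + operator + integer

tokens = list(expression.parseString("12 + 30"))
print(tokens)  # ['12', '+', '30']
```

Note that the tokens come back as strings; turning them into numbers is a separate step.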
The most basic element in Pyparsing is a Word. In its most basic form, a Word is a set of characters; it matches a string of arbitrary length, as long as every character in that string is part of the Word’s character set.
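A minimal sketch of this, using a hypothetical `vowels` Word built from a hand-picked character set (not something used elsewhere in this post):

```python
from pyparsing import Word

# A Word that accepts any non-empty run of the characters a, e, i, o, u.
vowels = Word("aeiou")

# Every character is in the set, so the whole string matches.
all_vowels = list(vowels.parseString("aeiouuu"))
print(all_vowels)  # ['aeiouuu']

# Matching stops at the first character outside the set ('t').
partial = list(vowels.parseString("eat"))
print(partial)  # ['ea']
```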
A little introductory example: let’s write a parser which accepts words consisting of lowercase characters, or sentences which consist of such words separated by spaces. First we write a formal definition using BNF:
- character ::= a | b | c | d | … | y | z
- word ::= character | character word
- sentence ::= word | sentence " " word
Let’s port this formal definition to Python code. First we need to do some imports, as in most Python programs. I’d encourage the reader to type this code into an interactive interpreter (give IPython a try, it rocks!) and experiment a little with it (tab-completion and ‘?’ rock!):
from pyparsing import Word
from string import ascii_lowercase as lowercase  # string.lowercase only exists in Python 2
Pyparsing includes several useful pre-defined lists of characters, including
- alphas: a-zA-Z
- nums: 0-9
- alphanums: alphas + nums
These are normal Python strings. In this sample we only want lowercase characters though, so we import this from the string module.
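Since these are ordinary strings, you can inspect and combine them like any other string. A small sketch (the `hex_number` name is just an example, not part of Pyparsing):

```python
from pyparsing import Word, alphanums, alphas, nums

# The pre-defined character lists are plain str objects.
print(type(nums).__name__)  # str
print(nums)                 # 0123456789
print(alphanums == alphas + nums)  # True

# Because they are strings, ordinary concatenation builds new character sets.
hex_number = Word(nums + "abcdef")
hex_tokens = list(hex_number.parseString("ff00aa"))
print(hex_tokens)  # ['ff00aa']
```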
Now we can define one word: a word is a concatenation of lowercase characters.
word = Word(lowercase)
Let’s play around with this:
print(word.parseString('hello'))
# returns ['hello']
print(word.parseString('Hello'))
# raises ParseException: Expected W:(abcd...) (0), (1,1)
print(word.parseString('hello world'))
# returns ['hello']
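The last call is worth a second look: parsing 'hello world' succeeds, because by default parseString is happy once the start of the input matches and it ignores the rest. If you would rather have trailing input rejected, parseString takes a `parseAll` argument. A short sketch (it imports `ascii_lowercase` directly so it runs on both Python 2 and 3; the `matched_everything` flag is only there for illustration):

```python
from string import ascii_lowercase

from pyparsing import ParseException, Word

word = Word(ascii_lowercase)

# Default behaviour: only the leading 'hello' is consumed.
leading = list(word.parseString('hello world'))
print(leading)  # ['hello']

# With parseAll=True the whole input has to match, so ' world' is an error.
try:
    word.parseString('hello world', parseAll=True)
    matched_everything = True
except ParseException:
    matched_everything = False

print(matched_everything)  # False
```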
Six pages?? Ouch. And no next button. Any chance you could put it all on one page next time? I feel like I’m reading some ad-infested hardware blog.
very nice tutorial! thanks!
What’s the difference compared to other parser systems like SimpleParse?
Regards,
I don’t see the difference between these two (except for the whitespace):
print(sentence.parseString('hello   world')) # notice >1 spaces
# returns ['hello', 'world']
print(sentence.parseString('Hello world'))
# raises a ParseException
Why does the second one raise an exception?
Francis: I guess you’re referring to the snippet on page 2? It says:
from pyparsing import OneOrMore
sentence = OneOrMore(word)
The definition of ‘word’ is given on the previous page:
word = Word(lowercase)
where ‘lowercase’ is imported from the ‘string’ module and equals
abcdefghijklmnopqrstuvwxyz
The definition of the BNF type ‘word’ is Word(lowercase), i.e. a concatenation of any of the characters in the string (or list, if you will) ‘lowercase’, which is a-z.
A sentence is defined as OneOrMore words.
The string ‘Hello world’ cannot be parsed since it does not match OneOrMore(word): its first item (‘Hello’) contains a character that does not match the definition of word, namely the ‘H’. We defined a word to be a concatenation of lowercase characters, so it may not contain any uppercase characters.
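You can check this yourself with the snippet below. It is only a sketch built from the definitions quoted above; it imports `ascii_lowercase` instead of `lowercase` so it also runs on Python 3, and the variable names `ok`, `failed` and `failed_at` are mine.

```python
from string import ascii_lowercase

from pyparsing import OneOrMore, ParseException, Word

word = Word(ascii_lowercase)
sentence = OneOrMore(word)

# Extra whitespace between words is skipped, so this parses fine.
ok = list(sentence.parseString('hello   world'))
print(ok)  # ['hello', 'world']

# The uppercase 'H' is not in the character set of word, so the very
# first word already fails to match.
try:
    sentence.parseString('Hello world')
    failed = False
    failed_at = None
except ParseException as err:
    failed = True
    failed_at = err.loc  # offset in the input where matching failed

print(failed)
```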
As you can see, on page 3 a better definition of sentence is constructed using a ‘startword’ definition, which should be a concatenation of one uppercase character followed by zero or more lowercase characters. The example shows that ‘A valid sentence.’ can be parsed and validated. The string ‘Hello world!’ would be valid in this BNF construct too. ‘Hello world’ would not match, since it is missing a punctuation mark.
Using the definitions from page 3
almost_valid_sentence = startword + body
or (even more limited)
hello_caps = startword + word
would validate and parse ‘Hello world’.
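Since the page 3 definitions are not repeated in this comment, here is a rough sketch of what they could look like. Both `startword` and `body` below are my assumptions based on the description above (one uppercase character followed by zero or more lowercase characters, and one or more lowercase words), not a copy of the original code.

```python
from string import ascii_lowercase, ascii_uppercase

from pyparsing import OneOrMore, ParseException, Word

word = Word(ascii_lowercase)

# Assumed definition: the first character comes from the uppercase set,
# any following characters from the lowercase set.
startword = Word(ascii_uppercase, ascii_lowercase)

# Assumed definition: the rest of the sentence is one or more lowercase words.
body = OneOrMore(word)

almost_valid_sentence = startword + body
hello_caps = startword + word

print(list(almost_valid_sentence.parseString('Hello world')))  # ['Hello', 'world']
print(list(hello_caps.parseString('Hello world')))             # ['Hello', 'world']

# Without the leading capital, startword does not match.
try:
    hello_caps.parseString('hello world')
    lowercase_start_ok = True
except ParseException:
    lowercase_start_ok = False

print(lowercase_start_ok)  # False
```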
Good introduction – thank you!
Although I share the feelings of “sb” about pagination.
Hi poh, what if the expr is like this: A=B+c?
Good introduction to pyparsing. Thanks Nicolas!