Let’s try to define ‘startword’. Here’s a pretty naive approach:
from pyparsing import Word
from string import lowercase
uppercase = lowercase.upper()
startchar = Word(uppercase, exact=1)
endchars = Word(lowercase)
startword = startchar + endchars
Notice the ‘exact’ keyword argument in the startchar definition: it requires the word to be exactly one character long. The related arguments ‘min’ and ‘max’ set a lower and an upper bound on the length. This solution has several issues, though. Consider this test run:
In [85]: print startword.parseString('Hello')
['H', 'ello']
We would rather not have our word split into two pieces in the parser output… We could work around this by aggregating the pieces (which is possible inside the parser), but that is not a nice solution. This approach has another major issue too:
In [86]: print startword.parseString('H')
ParseException: Expected W:(abcd...) (1), (1,2)
This exception is expected for single-character words, because ‘endchars’ requires at least one lowercase character. Such words should be valid, though (‘A valid sentence.’ should be a valid sentence…). This could be fixed by defining a startword to be a single uppercase character (‘startchar’) OR ‘startchar + endchars’.
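For comparison, the verbose fix just described might look roughly like the sketch below. It is written for Python 3, so it assumes `string.ascii_lowercase` and `string.ascii_uppercase` in place of the Python 2 `lowercase` used in this article, and it uses Pyparsing's `Combine` to aggregate the two pieces into one token. The longer alternative is listed first because `|` settles for the first alternative that matches.

```python
from string import ascii_lowercase, ascii_uppercase

from pyparsing import Combine, Word

startchar = Word(ascii_uppercase, exact=1)
endchars = Word(ascii_lowercase)

# Try 'startchar + endchars' first, then fall back to a lone uppercase
# character. Combine joins the matched pieces into a single token.
startword = Combine(startchar + endchars) | startchar

print(startword.parseString('Hello').asList())
print(startword.parseString('H').asList())
```

Both the multi-character and the single-character word now come back as one token, at the price of a noticeably longer definition.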
Luckily, Pyparsing allows a less verbose solution: when given two arguments, the Word constructor takes the set of characters allowed for the first character of the word (first argument) and a second set for the remaining characters. Here’s a correct solution:
In [87]: from pyparsing import Word
In [88]: from string import lowercase
In [89]: uppercase = lowercase.upper()
In [90]: startword = Word(uppercase, lowercase)
In [91]: print startword.parseString('Hello')
['Hello']
In [92]: print startword.parseString('A')
['A']
In [93]: print startword.parseString('hello')
ParseException: Expected W:(ABCD...,abcd...) (0), (1,1)
Finally, we can code our final sentence parser:
In [1]: from pyparsing import Word, ZeroOrMore
In [2]: from string import lowercase
In [3]: uppercase = lowercase.upper()
In [4]: word = Word(lowercase)
In [5]: startword = Word(uppercase, lowercase)
In [6]: end = Word('.?!', exact=1)
In [7]: body = ZeroOrMore(word)
In [8]: sentence = startword + body + end
In [9]: print sentence.parseString('A valid sentence.')
['A', 'valid', 'sentence', '.']
In [10]: print sentence.parseString('I!')
['I', '!']
In [11]: print sentence.parseString('a very INVALID sentence')
ParseException: Expected W:(ABCD...,abcd...) (0), (1,1)
Pretty simple, huh?
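Outside an interactive session you will probably want to catch the exception rather than let it propagate. The following is a minimal sketch of the same grammar as a script, assuming Python 3 (hence `ascii_lowercase` and `ascii_uppercase`); the helper name `is_valid_sentence` is made up for this illustration. It passes `parseAll=True` so that trailing text after the closing punctuation is rejected too, which is stricter than the session above.

```python
from string import ascii_lowercase, ascii_uppercase

from pyparsing import ParseException, Word, ZeroOrMore

word = Word(ascii_lowercase)
startword = Word(ascii_uppercase, ascii_lowercase)
end = Word('.?!', exact=1)
sentence = startword + ZeroOrMore(word) + end


def is_valid_sentence(text):
    """Return True if the whole of text matches the sentence grammar."""
    try:
        # parseAll=True requires the grammar to consume the entire input.
        sentence.parseString(text, parseAll=True)
    except ParseException:
        return False
    return True


for sample in ('A valid sentence.', 'I!', 'a very INVALID sentence'):
    print(sample, '->', is_valid_sentence(sample))
```

Without `parseAll=True`, parseString is satisfied as soon as a prefix of the input matches, so anything following the final punctuation mark would be silently ignored.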
Let’s try to do something remotely useful with our current knowledge now. We’ll write a simple application which allows us to enter basic mathematical expressions, assign their values to variables, and use previously assigned variables in expressions.
Variable names are arbitrary, but may only consist of lowercase characters. We only implement binary operators (+, -, * and /) on two basic operands, or plain assignments. These are some samples of valid expressions:
- foo = 1
- bar = 1 + 2
- baz = foo + bar
- bat = baz
- foo = baz / 2
Exercise: write down a BNF definition of valid expressions. You can use an ‘integer’ definition, as defined in the previous article.













