Now we want to define a sentence. Did you notice that in the last example Pyparsing doesn’t raise an error on the input, but simply stops processing it after the first word? By default, Pyparsing skips whitespace between tokens and treats it as the end of a token, and parseString doesn’t require the whole input to be consumed. So we need to define a sentence as a list of words, implicitly separated by whitespace:
from pyparsing import OneOrMore
sentence = OneOrMore(word)
print(sentence.parseString('hello world'))
# prints ['hello', 'world']
print(sentence.parseString('hello'))
# prints ['hello']
print(sentence.parseString('hello   world'))  # notice >1 spaces
# prints ['hello', 'world']
print(sentence.parseString('Hello world'))
# raises a ParseException
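To make the "stops after the first word" behaviour visible, here is a small sketch. It assumes `word` was defined as a run of lowercase ASCII letters (that definition is an assumption here, chosen to match the results shown above) and uses the `parseAll=True` argument of `parseString`, which turns leftover input into an error instead of silently ignoring it.

```python
from pyparsing import OneOrMore, ParseException, Word

# Assumption: `word` matches one or more lowercase ASCII letters.
word = Word('abcdefghijklmnopqrstuvwxyz')
sentence = OneOrMore(word)

# Without parseAll, parsing stops quietly at the first unmatched input.
partial = word.parseString('hello world').asList()
print(partial)

# With parseAll=True, unparsed leftovers raise a ParseException.
try:
    word.parseString('hello world', parseAll=True)
    strict_ok = True
except ParseException:
    strict_ok = False
print(strict_ok)

# The sentence parser consumes every word, however many spaces separate them.
words = sentence.parseString('hello   world', parseAll=True).asList()
print(words)
```

Passing `parseAll=True` is a handy habit while developing a grammar, since it shows immediately when part of the input was left unparsed.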
Let’s enhance our sentence parser somewhat: we want to parse sentences which are defined using these rules:
- A word is a concatenation of lowercase characters (a-z)
- A sentence starts with a word which starts with one uppercase character (A-Z)
- A sentence consists of one or more words, including the first one
- A sentence ends with a dot, a question mark or an exclamation mark
Let’s rewrite this more formally using BNF:
- lccharacter ::= a | b | c | … | z
- uccharacter ::= A | B | C | … | Z
- word ::= lccharacter | lccharacter word
- startword ::= uccharacter | uccharacter word
- end ::= . | ? | !
- body ::= word | body " " word | ""
- sentence ::= startword body end
Notice ‘body’ can be an empty string, so ‘I!’ is a valid sentence.
Let’s translate this into Python code again. Did you notice ‘body’ is actually almost the same thing as the ‘sentence’ structure we defined before, except that it may also be an empty string? No need to re-explain this.
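One way the BNF above might translate into Pyparsing is sketched below. It is a minimal illustration, not a definitive listing: the names `LOWER` and `UPPER` are made up for this example, `startword` uses the two-argument form of `Word` (allowed first characters, then allowed following characters), and `body` is written as `ZeroOrMore(word)` so that it may be empty. Because Pyparsing skips whitespace implicitly, this sketch is slightly more lenient than the BNF and also accepts extra spaces between tokens.

```python
from pyparsing import ParseException, Word, ZeroOrMore, oneOf

# Hypothetical helper names for the two character classes.
LOWER = 'abcdefghijklmnopqrstuvwxyz'
UPPER = LOWER.upper()

# word ::= one or more lowercase characters
word = Word(LOWER)
# startword ::= one uppercase character, optionally followed by lowercase ones
startword = Word(UPPER, LOWER)
# end ::= . | ? | !
end = oneOf('. ? !')
# body ::= zero or more words (whitespace between them is skipped implicitly)
body = ZeroOrMore(word)
# sentence ::= startword body end
sentence = startword + body + end

full = sentence.parseString('Hello world!', parseAll=True).asList()
print(full)

# 'body' may be empty, so a lone start word plus an end mark is valid.
short = sentence.parseString('I!', parseAll=True).asList()
print(short)

# A sentence without a capitalised first word is rejected.
try:
    sentence.parseString('hello world.', parseAll=True)
    lowercase_ok = True
except ParseException:
    lowercase_ok = False
print(lowercase_ok)
```

Writing `body` as `ZeroOrMore(word)` instead of `OneOrMore(word)` is what covers the empty-string alternative in the BNF.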













