Ikke's blog

'cause this is what I do

Pyparsing introduction: BNF to code

Let’s try to define ‘startword’. Here’s a pretty naive approach:

from pyparsing import Word
from string import lowercase
uppercase = lowercase.upper()
startchar = Word(uppercase, exact=1)
endchars = Word(lowercase)
startword = startchar + endchars

Notice the ‘exact’ keyword argument in the startchar definition: it states that the word length must be exactly one. Other arguments are ‘min’ and ‘max’. This solution has several issues though; check this test drive:

In [85]: print startword.parseString('Hello')
['H', 'ello']

We’d rather not have our word split into 2 pieces in the parser output… We could get around this by aggregating the pieces (which is possible inside the parser), but this is not a nice solution. This approach has another major issue too:

In [86]: print startword.parseString('H')
ParseException: Expected W:(abcd...) (1), (1,2)

This exception is to be expected with our definition, since ‘endchars’ requires at least one lowercase character. Single-character words should be valid though (‘A valid sentence.’ should be a valid sentence…). This could be fixed by defining a startword to be a single uppercase character (‘startchar’) OR ‘startchar + endchars’.
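To make the two workarounds mentioned so far a bit more concrete, here is a rough sketch of what they could look like: Combine glues the matched pieces back into a single token, and the | operator expresses the OR. It is written for Python 3, so it uses string.ascii_lowercase, string.ascii_uppercase and print() rather than the Python 2 spellings used in the other snippets; treat it as an illustration, not as the recommended definition.

```python
from string import ascii_lowercase, ascii_uppercase

from pyparsing import Combine, Word

startchar = Word(ascii_uppercase, exact=1)
endchars = Word(ascii_lowercase)

# Either a capital followed by lowercase characters, or a lone capital.
# Combine() joins the two matched pieces into one token.
startword = Combine(startchar + endchars) | startchar

print(list(startword.parseString('Hello')))  # ['Hello']
print(list(startword.parseString('H')))      # ['H']
```

The longer alternative has to come first: | tries its alternatives from left to right and takes the first one that matches.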

Luckily Pyparsing allows a less verbose solution: by passing 2 arguments to the Word constructor, one can define the set of characters the first character of the word must belong to (first argument), and a second set for the remaining part of the word. Here’s a correct solution:

In [87]: from pyparsing import Word
In [88]: from string import lowercase
In [89]: uppercase = lowercase.upper()
In [90]: startword = Word(uppercase, lowercase)
In [91]: print startword.parseString('Hello')
['Hello']
In [92]: print startword.parseString('A')
['A']
In [93]: print startword.parseString('hello')
ParseException: Expected W:(ABCD...,abcd...) (0), (1,1)

Finally, we can code our final sentence parser:

In [1]: from pyparsing import Word, ZeroOrMore
In [2]: from string import lowercase
In [3]: uppercase = lowercase.upper()
In [4]: word = Word(lowercase)
In [5]: startword = Word(uppercase, lowercase)
In [6]: end = Word('.?!', exact=1)
In [7]: body = ZeroOrMore(word)
In [8]: sentence = startword + body + end
In [9]: print sentence.parseString('A valid sentence.')
['A', 'valid', 'sentence', '.']
In [10]: print sentence.parseString('I!')
['I', '!']
In [11]: print sentence.parseString('a very INVALID sentence')
ParseException: Expected W:(ABCD...,abcd...) (0), (1,1)

Pretty simple, huh?
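Outside an interactive session you would probably want to catch the ParseException instead of letting it bubble up. Here is a minimal sketch of that idea, assuming Python 3 (hence string.ascii_lowercase and string.ascii_uppercase); the helper name is_valid_sentence is made up for this example. It also passes parseAll=True, because by default parseString is happy as soon as the start of the input matches and silently ignores whatever follows.

```python
from string import ascii_lowercase, ascii_uppercase

from pyparsing import ParseException, Word, ZeroOrMore

word = Word(ascii_lowercase)
startword = Word(ascii_uppercase, ascii_lowercase)
end = Word('.?!', exact=1)
sentence = startword + ZeroOrMore(word) + end


def is_valid_sentence(text):
    """Return True if the whole of `text` matches the sentence grammar."""
    try:
        # parseAll=True also rejects leftover input after the end mark.
        sentence.parseString(text, parseAll=True)
    except ParseException:
        return False
    return True


print(is_valid_sentence('A valid sentence.'))        # True
print(is_valid_sentence('a very INVALID sentence'))  # False
print(is_valid_sentence('Two. Sentences.'))          # False
```

Without parseAll=True the last string would parse fine: the grammar matches ‘Two.’ and stops there.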

Let’s try to do something remotely useful with our current knowledge now. We’ll write a simple application which allows us to enter basic mathematical expressions, assign their value to variables, and use previously assigned variables in expressions.

Variable names are arbitrary, but may only consist of lowercase characters. We only implement binary operators (+, -, * and /) on 2 basic operands, or basic assignments. These are some samples of valid expressions:

foo = 1
bar = 1 + 2
baz = foo + bar
bat = baz
foo = baz / 2

Exercise: write down a BNF definition of valid expressions. You can use an ‘integer’ definition, as defined in the previous article.

Pages: 1 2 3 4 5 6

Posted in Development.

Tagged with bnf, parsing, pyparsing, python.

By Nicolas – January 20, 2008

8 Responses


sb says

Six pages?? Ouch. And no next button. Any chance you could put it all on one page next time? I feel like I’m reading some ad-infested hardware blog.

January 20, 2008, 18:09
jauco says

very nice tutorial! thanks!

January 21, 2008, 07:22
Olivier Berger says

What’s the difference to other parser systems like simpleparse ?

Regards,

January 21, 2008, 11:15
Francis says

I don’t see the difference between these two (except for the whitespace):

print sentence.parseString('hello   world') # notice >1 spaces
# returns ['hello', 'world']
print sentence.parseString('Hello world')
# raises a ParseException

Why does the second one raise an exception ?

April 19, 2008, 09:22
Nicolas says

Francis: I guess you’re referring to the snippet on page 2? It says:

from pyparsing import OneOrMore
sentence = OneOrMore(word)

The definition of ‘word’ is given on the previous page:

word = Word(lowercase)

where ‘lowercase’ is imported from the ‘string’ module and equals

abcdefghijklmnopqrstuvwxyz

The definition of the BNF type ‘word’ is Word(lowercase), i.e. a concatenation of any characters in the string (or list, if you like) ‘lowercase’, which is a-z.

A sentence is defined as OneOrMore words.

The string ‘Hello world’ cannot be parsed since it does not match OneOrMore(word): the first item in it (‘Hello’) contains a character not matching the definition of word, the ‘H’ (since we defined a word to be a concatenation of lowercase characters, it shouldn’t contain any uppercase characters).

As you can see, on page 3 a better definition of sentence is constructed using a ‘startword’ definition which should be a concatenation of one uppercase character, followed by zero or more lowercase characters. The example shows ‘A valid sentence.’ can be parsed and validated. The string ‘Hello world!’ would be valid in this BNF construct too. ‘Hello world’ would not match since we’re missing a punctuation mark.

Using the definitions from page 3

almost_valid_sentence = startword + body

or (even more limited)

hello_caps = startword + word

would validate and parse ‘Hello world’.

April 19, 2008, 14:17
GDR! says

Good introduction – thank you!

Although I share the feelings of “sb” about pagination.

December 18, 2009, 11:20
lerry says

hi poh,, what if the expr is like this A=B+c?

March 14, 2011, 05:45
Wayne says

Good introduction to pyparsing. Thanks Nicolas!

December 4, 2012, 01:03
