Embedding JavaScript in Python

Reading some posts about embedding languages/runtimes in applications on Planet GNOME reminded me I still had to announce some really quick and incomplete code blob I created some days after last GUADEC edition (which was insanely cool, thanks guys).

It takes WebKit’s JavaScriptCore and allows you to embed it in some Python program, so you, as a Python developer, can allow consumers to write plugins using JavaScript. Don’t ask me whether it’s useful, maybe it’s not, but anyway.

There’s one catch: currently there is no support to expose custom Python objects to the JavaScript runtime: you’re able to use JavaScript objects and functions etc. from within Python, but not the other way around. I started working on this, but the JSCore API lacked some stuff to be able to implement this cleanly (or I missed a part of it, that’s possible as well), maybe it has changed by now… There is transparent translation of JavaScript base types: unicode strings, booleans, null (which becomes None in Python), undefined (which becomes jscore.UNDEFINED) and floats.

I did not work on the code for quite a long time because of too much real-job-work, maybe it no longer compiles, sorry… Anyway, it’s available in git here, patches welcome etc. I guess this is the best sample code around. It’s using Cython for compilation (never tried with Pyrex, although this might work as well). If anyone can use it, great, if not, too bad, I did learn Cython doing this ;-)

Python ‘all’ odity

[update] Question solved, see bottom of post.

Since Python 2.5 the language got a new built-in method ‘all’ (and it’s nephew ‘any’). I wanted to play around with this a little, combined with generators, so I created a little testcase to test performance.

Here’s the test-case: take a list L of X random numbers in a given range [A, B], and check whether

  • all elements in L are >= A
  • all elements in L are >= (A + Z) where Z is a number in [0, (B - A)]

The first test should always result True, the second test could result to False.

Here’s the output of a test-run:

In [1]: import random, sys

In [2]: a = [random.randint(100, sys.maxint) for i in xrange(2000000)]

In [3]: len(a)
Out[3]: 2000000

In [4]: #Check whether all elements are >= 100 

In [5]: %timeit all(i >= 100 for i in a)
10 loops, best of 3: 515 ms per loop

In [6]: %timeit any(i < 100 for i in a)
10 loops, best of 3: 454 ms per loop

In [7]: def f(l):
   ...:     for i in l:
   ...:         if i < 100:
   ...:             return False
   ...:     return True
   ...: 

In [8]: %timeit f(a)
10 loops, best of 3: 292 ms per loop

In [9]: #Same thing for 100000, since now the list shouldn't be completely iterated

In [10]: %timeit all(i >= 100000 for i in a)
100 loops, best of 3: 4.73 ms per loop

In [11]: %timeit any(i < 100000 for i in a)
100 loops, best of 3: 4.29 ms per loop

In [12]: def g(l):
   ....:     for i in l:
   ....:         if i < 100000:
   ....:             return False
   ....:     return True
   ....: 

In [13]: %timeit g(a)
100 loops, best of 3: 2.82 ms per loop

In [14]: #For reference

In [15]: %timeit False in (i >= 100 for i in a)
10 loops, best of 3: 531 ms per loop

In [16]: %timeit False in (i >= 100000 for i in a)
100 loops, best of 3: 5.03 ms per loop

It’s as if ‘all’, ‘any’ or ‘in’ don’t break/return when a first occurence of False (or True, obviously) is found. Is this the desired behaviour, and if it is, why? The calculation time difference between using all/any/in or a custom-made function (which is, unlike all etc, not written in C) which breaks whenever it can, is pretty astonishing.

[update] Question solved. It’s pretty normal the function-based approach performs better, since it combines what ‘all’ and the generator provided to ‘all’ do, taking away the generator function-call overhead. Damn :-)

Python if/else in lambda

Scott, in your “Functional Python” introduction you write:

The one limitation that most disappoints me is that Python lacks is a functional way of writing if/else. Sometimes you just want to do something like this:

lambda x : if_else(x>100, “big number”, “little number”)

(This would return the string “big number” if x was greater than 100, and “little number” otherwise.) Sometimes I get around this by defining my own if_else that I can use in lambda-functions:

def if_else(condition, a, b) :
   if condition : return a
   else         : return b

Actually, you don’t need this helper if_else function at all:

In [1]: f = lambda x: x > 100 and 'big' or 'small'
In [2]: for i in (1, 10, 99, 100, 101, 110):
...:     print i, 'is', f(i)
...:
1 is small
10 is small
99 is small
100 is small
101 is big
110 is big

James, obviously you’re right… Stupid me didn’t think about that. Your version won’t work when a discriminator isn’t known at import time. But even then a function taking *args and **kwargs with a class-like name, returning a correct class instance, would cut the job.

Regarding the module/plugin stuff, I’d rather use setuptools/pkg_resources :-)

Code Review comic

OSNews: WTFs per minute

I just love this comic.

Python factory-like type instances

When designing applications or libraries, sometimes you need to be able to create instances of a certain interface (in a liberal sense) at runtime without knowing at write/compile time which specific implementation (class) you’ll need to use, as this could depend on runtime variables.

An example of this is an interface providing some functionality which should be implemented differently on different platforms, eg Linux and Windows.

There are some standard patterns how to achieve this. One of them is the factory pattern, which works somewhat like this Python example (let’s pretend ‘PLATFORM’ is ‘linux2′ or ‘win32′, ie sys.platform):

#Pretend we use sys.platform instead of PLATFORM where we use it
PLATFORM = 'linux2'

class FooBase(object):
    def say_foo(self):
        print 'foo'

class PlatformFoo(FooBase):
    def say_platform_foo(self):
        raise NotImplementedError

    @staticmethod
    def get_class():
        #Several ways to get this (dict, introspection, if-tree,...), pick yours
        klass = {
            'linux2': LinuxFoo,
            'win32': WindowsFoo,
        }.get(PLATFORM, None)
        if not klass:
            raise Exception, 'Platform not supported'
        return klass

class WindowsFoo(PlatformFoo):
    def say_platform_foo(self):
        print 'win32 foo'

class LinuxFoo(PlatformFoo):
    def say_platform_foo(self):
        print 'linux foo'

def main():
    foo_class = PlatformFoo.get_class()
    foo = foo_class()
    foo.say_platform_foo()

if __name__ == '__main__':
    main()

Executing this code will, as expected, write ‘linux foo’ to the console. Obviously we could not return the platform-specific class in a PlatformFoo function, but an actual instance, up to you.

Python allows you to handle this situation somewhat nicer though, without introducing any intermediate functions, by using metaclasses.

Continue reading »

Funky C

It’s pretty funny to know this is valid C99 code, implementing a very basic array assignment and hello world:

%:include <stdio.h>
??=include <stdlib.h>

int main(int argc, char *argv<::>) ??<
    int i??(:> = {1, 2, 3??>;
    printf("Hello world\n");
    return 0;
%>

It’s (ab)using an obscure feature in the C89 and C99 specifications called Trigraph and Digraph, when using GCC you need to pass the ‘-trigraphs’ parameter to enable this functionality. More information can be found on the Wikipedia page about it. I wonder whether people joining code obfuscation games use this.

How not to write Python code

Lately I’ve been reading some rather unclean Python code. Maybe this is mainly because the author(s) of the code had no in-depth knowledge of the Python language itself, the ‘platform’ delivered with cPython,… Here’s a list of some of the mistakes you should really try to avoid when writing Python code:

  • Remember Python comes batteries included
    Python is shipped with a whole bunch of standard modules implementing a broad range of functionality, including text handling, various data types, networking stuff (both low- and high-level), document processing, file archive handling, logging, etc. All these are documented in the Python Library Documentation, so it is a must to browse at least through the list of available modules, so you get some notions of what you can use by default. An example: don’t introduce a dependency on Twisted to implement a very basic and simple custom HTTP server if you don’t have any performance needs, use BaseHTTPServer and derivates.
  • Python is Python, don’t try to emulate bad coding patterns from other languages
    Python is a mature programming language which provides great flexibility, but also has some pretty specific patterns which you might not know in other languages you used before.
    As an example, don’t try to emulate PHP’s ‘include’ or ‘require’ function, at all. This could be done, somewhat, by writing the code to be included (and executed on inclusion) in a module on the top level (ie. not in functions/classes/…), and using something like ‘from foo import *’ where you want this code to be executed. This will work, but it can become hard to maintain this. Modules are not meant to be used like this, so don’t. If you need to execute some code at some point, put it in a module as a function, import the function and call it wherever you want.
  • Continue reading »

Pyparsing introduction: BNF to code

After reading my previous post, you should have a pretty good understanding of what a BNF definition is all about. Let’s put this theory into practice, and write some basic parsers in Python, using Pyparsing!

Pyparsing allows a pretty one-to-one mapping of BNF to Python code: you can define sets and combinations, then parse any text fragment against it. This is something very important to notice: one basic BNF definition can (and should) be reused: if you once wrote a BNF definition for an integer value, you can easily reuse this definition in, eg, a basic integer math expression.

The most basic element using Pyparsing is a Word. In it’s most basic form this is a set of characters which will match any arbitrary length string, as long as the characters in this string are part of the Word character set.

A little introduction example: let’s write a parser which accepts words consisting of small-cap characters, or sentences which consist of words separated by spaces. First we define a formal definition using BNF:

Continue reading »

Pages: 1 2 3 4 5 6

Text parsing, formal grammars and BNF introduction

Parsing input is something most developers run into one day. Parsing binary input can be pretty straight-forward, as most of the time you know the format of the input, ie you know what to expect: if you receive a message of 10 bytes, the first byte could be a message ID, the second one the payload length, third one message type ID, and others message content. Pretty easy to handle.

Parsing human-readable text can be harder though, as human beings tend to be less strict when providing input (eg whitespacing), you can’t ask humans to prepend strings with their length, etc.

There are several ways to handle text input. One well-known method is using regular expressions with matches, but writing regular expressions which are able to process not-so-strict input can be pretty though, writing expressions to parse large bodies of text is hard, using sub-expressions can become pretty complicated,… Overall regular expressions usually involve quite a lot of black magic for the average outsider.

xkcd comic: Regular expressions

Luckily, there are easier methods to parse text input too, of which I’d like to introduct one: a Python module called Pyparsing, which can do BNF-style text parsing.

First of all, let me explain “BNF”. The Backus-Naur Form, aka BNF, is a metasyntax you can use to express the grammar of a formal language. This might make no sense at all, so let’s split it up:

Continue reading »

Pages: 1 2

django-validation now includes inheritance support

I’m happy to announce django-validation got field type inheritance support since a couple of minutes. This means your form fields will be validated starting from the most base field type (django.newforms.Field) up to the actual field type (no multiple-inheritance supported though).

In the example I wrote yesterday, when using a TestField field, this field will be validated as a django.newforms.Field (a “required” check will be done), then as a django.newforms.CharField (”min_length” and “max_length” checks), and finally as a TestField. A normal CharField would be validated as a Field first, then as a CharField, etc.

The returned errors will be a list of all errors found, starting with the most basic one (the ones found by the most general class, Field).

Next to this, all generated Javascript code should be namespaced now (based on Python module and class names), although there might be some bad things left, I’m no Javascript guru. The generated code might be somewhat messy.

Current Python code is most certainly ugly and will need more rewrites. Next to this, other field types should be added, and some tests would be nice too.

I made a snapshot of yesterday’s sample (with some changes, the ClientValidator API slightly changed), you can try it here.

Next »