Python ‘all’ oddity

[update] Question solved, see bottom of post.

Python 2.5 added a new built-in function ‘all’ (and its sibling ‘any’). I wanted to play around with this a little, combined with generators, so I created a little test case to measure performance.

Here’s the test-case: take a list L of X random numbers in a given range [A, B], and check whether

  • all elements in L are >= A
  • all elements in L are >= (A + Z) where Z is a number in [0, (B - A)]

The first test should always evaluate to True; the second test could evaluate to False.

Here’s the output of a test-run:

In [1]: import random, sys

In [2]: a = [random.randint(100, sys.maxint) for i in xrange(2000000)]

In [3]: len(a)
Out[3]: 2000000

In [4]: #Check whether all elements are >= 100 

In [5]: %timeit all(i >= 100 for i in a)
10 loops, best of 3: 515 ms per loop

In [6]: %timeit any(i < 100 for i in a)
10 loops, best of 3: 454 ms per loop

In [7]: def f(l):
   ...:     for i in l:
   ...:         if i < 100:
   ...:             return False
   ...:     return True
   ...: 

In [8]: %timeit f(a)
10 loops, best of 3: 292 ms per loop

In [9]: #Same thing for 100000, since now the list shouldn't be completely iterated

In [10]: %timeit all(i >= 100000 for i in a)
100 loops, best of 3: 4.73 ms per loop

In [11]: %timeit any(i < 100000 for i in a)
100 loops, best of 3: 4.29 ms per loop

In [12]: def g(l):
   ....:     for i in l:
   ....:         if i < 100000:
   ....:             return False
   ....:     return True
   ....: 

In [13]: %timeit g(a)
100 loops, best of 3: 2.82 ms per loop

In [14]: #For reference

In [15]: %timeit False in (i >= 100 for i in a)
10 loops, best of 3: 531 ms per loop

In [16]: %timeit False in (i >= 100000 for i in a)
100 loops, best of 3: 5.03 ms per loop

It’s as if ‘all’, ‘any’ and ‘in’ don’t break/return when the first occurrence of False (or True, obviously) is found. Is this the desired behaviour, and if so, why? The difference in run time between all/any/in and a custom-made function (which, unlike ‘all’ etc., is not written in C) that breaks whenever it can is pretty astonishing.

[update] Question solved. It’s pretty normal that the function-based approach performs better: it combines what ‘all’ and the generator passed to ‘all’ do, taking away the generator function-call overhead. Damn :-)
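
A small sketch to confirm that ‘all’ really does stop at the first false value: the generator below counts how many items it hands out. The helper name counting and the sample data are made up for this illustration; the code should run on Python 2.6+ as well as Python 3.

```python
def counting(items, counter):
    """Yield items one by one, recording how many were consumed."""
    for item in items:
        counter[0] += 1
        yield item


data = [500, 50, 700, 800]  # the second element fails the ">= 100" check

consumed = [0]
result = all(i >= 100 for i in counting(data, consumed))

# Shows the result of all() and how many elements it actually looked at
print("%s, consumed %d of %d" % (result, consumed[0], len(data)))
```

Only two of the four elements are consumed, so the slowdown measured above comes from the per-element cost of driving the generator, not from a missing early exit.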

Python if/else in lambda

Scott, in your “Functional Python” introduction you write:

The one limitation that most disappoints me is that Python lacks a functional way of writing if/else. Sometimes you just want to do something like this:

lambda x : if_else(x>100, "big number", "little number")

(This would return the string “big number” if x was greater than 100, and “little number” otherwise.) Sometimes I get around this by defining my own if_else that I can use in lambda-functions:

def if_else(condition, a, b) :
   if condition : return a
   else         : return b

Actually, you don’t need this helper if_else function at all:

In [1]: f = lambda x: x > 100 and 'big' or 'small'
In [2]: for i in (1, 10, 99, 100, 101, 110):
...:     print i, 'is', f(i)
...:
1 is small
10 is small
99 is small
100 is small
101 is big
110 is big
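
One caveat with the and/or idiom: it silently falls through to the second value whenever the first one is itself falsy (an empty string, 0, None). Python 2.5 also introduced conditional expressions, which don’t have that problem. A short sketch; the names f_and_or and f_cond exist only for this illustration.

```python
# and/or idiom: breaks when the "true" branch is itself falsy
f_and_or = lambda x: x > 100 and '' or 'small'

# conditional expression (Python 2.5+): always follows the condition
f_cond = lambda x: '' if x > 100 else 'small'

print(repr(f_and_or(101)))  # 'small', although the condition was true
print(repr(f_cond(101)))    # ''
```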

James, obviously you’re right… Stupid me didn’t think about that. Your version won’t work when the discriminator isn’t known at import time, though. But even then a function with a class-like name, taking *args and **kwargs and returning an instance of the correct class, would do the job.

Regarding the module/plugin stuff, I’d rather use setuptools/pkg_resources :-)

Python factory-like type instances

When designing applications or libraries, sometimes you need to be able to create instances of a certain interface (in a liberal sense) at runtime without knowing at write/compile time which specific implementation (class) you’ll need to use, as this could depend on runtime variables.

An example of this is an interface providing some functionality which should be implemented differently on different platforms, eg Linux and Windows.

There are some standard patterns for achieving this. One of them is the factory pattern, which works somewhat like this Python example (let’s pretend ‘PLATFORM’ is ‘linux2’ or ‘win32’, i.e. sys.platform):

#Pretend we use sys.platform instead of PLATFORM where we use it
PLATFORM = 'linux2'

class FooBase(object):
    def say_foo(self):
        print 'foo'

class PlatformFoo(FooBase):
    def say_platform_foo(self):
        raise NotImplementedError

    @staticmethod
    def get_class():
        #Several ways to get this (dict, introspection, if-tree,...), pick yours
        klass = {
            'linux2': LinuxFoo,
            'win32': WindowsFoo,
        }.get(PLATFORM, None)
        if not klass:
            raise Exception, 'Platform not supported'
        return klass

class WindowsFoo(PlatformFoo):
    def say_platform_foo(self):
        print 'win32 foo'

class LinuxFoo(PlatformFoo):
    def say_platform_foo(self):
        print 'linux foo'

def main():
    foo_class = PlatformFoo.get_class()
    foo = foo_class()
    foo.say_platform_foo()

if __name__ == '__main__':
    main()

Executing this code will, as expected, write ‘linux foo’ to the console. Obviously, the PlatformFoo function could return an actual instance instead of the platform-specific class; that’s up to you.

Python allows you to handle this situation somewhat more nicely though, without introducing any intermediate functions, by using metaclasses.
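
The rest of the post is not shown here, so the following is only a guess at what a metaclass-based variant could look like, not necessarily the approach the post goes on to describe. The metaclass name PlatformMeta, the registry dict and the per-class platform attribute are all assumptions made for this sketch. It uses Python 3 syntax; in Python 2 the metaclass would be set through a __metaclass__ attribute instead.

```python
PLATFORM = 'linux2'  # stand-in for sys.platform, as in the example above


class PlatformMeta(type):
    """Redirects calls on the abstract base to a platform-specific subclass."""

    registry = {}  # maps platform name -> implementation class

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Classes that declare a 'platform' attribute register themselves
        platform = namespace.get('platform')
        if platform is not None:
            PlatformMeta.registry[platform] = cls

    def __call__(cls, *args, **kwargs):
        target = cls
        if 'platform' not in vars(cls):
            # Called on a class without its own platform: pick the concrete one
            target = PlatformMeta.registry.get(PLATFORM)
            if target is None:
                raise NotImplementedError('Platform not supported')
        # Bypass this override and construct the chosen class normally
        return type.__call__(target, *args, **kwargs)


class PlatformFoo(metaclass=PlatformMeta):
    def say_platform_foo(self):
        raise NotImplementedError


class LinuxFoo(PlatformFoo):
    platform = 'linux2'

    def say_platform_foo(self):
        return 'linux foo'


class WindowsFoo(PlatformFoo):
    platform = 'win32'

    def say_platform_foo(self):
        return 'win32 foo'


foo = PlatformFoo()  # actually constructs a LinuxFoo
print(type(foo).__name__, foo.say_platform_foo())
```

With this shape the caller just writes PlatformFoo() and never sees a get_class step; the price is that instantiation becomes less obvious to someone reading the call site.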

Continue reading »

How not to write Python code

Lately I’ve been reading some rather unclean Python code. Maybe this is mainly because the author(s) of the code had no in-depth knowledge of the Python language itself, the ‘platform’ delivered with CPython, and so on. Here’s a list of some of the mistakes you should really try to avoid when writing Python code:

  • Remember Python comes batteries included
    Python is shipped with a whole bunch of standard modules implementing a broad range of functionality, including text handling, various data types, networking stuff (both low- and high-level), document processing, file archive handling, logging, etc. All these are documented in the Python Library Documentation, so it is a must to at least browse through the list of available modules, so you get some notion of what you can use by default. An example: don’t introduce a dependency on Twisted to implement a very basic and simple custom HTTP server if you don’t have any performance needs; use BaseHTTPServer and its derivatives.
  • Python is Python, don’t try to emulate bad coding patterns from other languages
    Python is a mature programming language which provides great flexibility, but also has some pretty specific patterns which you might not know in other languages you used before.
    As an example, don’t try to emulate PHP’s ‘include’ or ‘require’ function, at all. This could be done, somewhat, by writing the code to be included (and executed on inclusion) in a module on the top level (ie. not in functions/classes/…), and using something like ‘from foo import *’ where you want this code to be executed. This will work, but it can become hard to maintain this. Modules are not meant to be used like this, so don’t. If you need to execute some code at some point, put it in a module as a function, import the function and call it wherever you want.

Continue reading »

Pyparsing introduction: BNF to code

After reading my previous post, you should have a pretty good understanding of what a BNF definition is all about. Let’s put this theory into practice, and write some basic parsers in Python, using Pyparsing!

Pyparsing allows a pretty one-to-one mapping of BNF to Python code: you can define sets and combinations, then parse any text fragment against it. This is something very important to notice: one basic BNF definition can (and should) be reused: if you once wrote a BNF definition for an integer value, you can easily reuse this definition in, eg, a basic integer math expression.

The most basic element in Pyparsing is a Word. In its most basic form this is a set of characters which will match a string of arbitrary length, as long as all characters in this string are part of the Word’s character set.

A little introductory example: let’s write a parser which accepts words consisting of lowercase characters, or sentences which consist of words separated by spaces. First we write a formal definition using BNF:

Continue reading »


django-validation now includes inheritance support

I’m happy to announce that django-validation gained field type inheritance support a couple of minutes ago. This means your form fields will be validated starting from the most basic field type (django.newforms.Field) up to the actual field type (multiple inheritance is not supported though).

In the example I wrote yesterday, when using a TestField field, this field will be validated as a django.newforms.Field (a “required” check will be done), then as a django.newforms.CharField (“min_length” and “max_length” checks), and finally as a TestField. A normal CharField would be validated as a Field first, then as a CharField, etc.

The returned errors will be a list of all errors found, starting with the most basic one (the ones found by the most general class, Field).
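
The base-to-derived order can be pictured as walking a class’s method resolution order in reverse. The classes below are plain stand-ins, not the django.newforms ones, and the check and validate names are hypothetical; django-validation’s actual implementation may well differ.

```python
class Field(object):
    def __init__(self, required=True):
        self.required = required

    def check(self, value):
        # Most general check: a required field must have a value
        if self.required and not value:
            return ['This field is required.']
        return []


class CharField(Field):
    def __init__(self, min_length=None, max_length=None, **kwargs):
        Field.__init__(self, **kwargs)
        self.min_length = min_length
        self.max_length = max_length

    def check(self, value):
        errors = []
        if self.min_length is not None and len(value) < self.min_length:
            errors.append('Value is too short.')
        if self.max_length is not None and len(value) > self.max_length:
            errors.append('Value is too long.')
        return errors


class TestField(CharField):
    def check(self, value):
        if value != 'test':
            return ['Value must be "test".']
        return []


def validate(field, value):
    """Run each class's own check, from Field up to the actual field type."""
    errors = []
    for klass in reversed(type(field).__mro__):
        # Only use a check defined on this exact class, not an inherited one
        check = klass.__dict__.get('check')
        if check is not None:
            errors.extend(check(field, value))
    return errors


print(validate(TestField(min_length=3, max_length=10), ''))
```

For an empty value this reports the “required” error first, then the length error, then the TestField-specific one, matching the ordering described above.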

Next to this, all generated Javascript code should be namespaced now (based on Python module and class names), although there might be some bad things left, I’m no Javascript guru. The generated code might be somewhat messy.

Current Python code is most certainly ugly and will need more rewrites. Next to this, other field types should be added, and some tests would be nice too.

I made a snapshot of yesterday’s sample (with some changes, the ClientValidator API slightly changed), you can try it here.

django-validation: an introduction

Some time ago I wrote this generic AJAX Django form validation code. Some people didn’t like this, as AJAX should not be used to perform form validation, which is sometimes true, sometimes not, as I pointed out before.

So I’ve been thinking for some time about creating a Django templatetag which allows one to generate client-side Javascript form validation code without writing any code oneself (unless using custom widgets). Today I got into it.

Continue reading »

CouchDB with Python

Today I’ve been investigating CouchDB a little more closely (I had only heard some rumors about it before). It’s actually a pretty nice technology which can, in some places, be pretty useful… I tend to compare it to caching serialized PHP associative arrays or Python dicts in a Memcached server using some specific prefixes, except it’s not really memory-based (it’s persistent), you get a complete query interface (views), there’s dataset versioning support (!), etc. While writing this I start to wonder what similarities I ever saw between CouchDB and a Python pickled dict in Memcached…

Anyway, one use case I saw was site user profiles: profile data is most of the time not relational at all, so why store it in a relational database, which sometimes makes it rather hard to add extra profile information fields, unless you use some dirty ‘save serialized form’ trick, which renders your data unqueryable? Storing profile information in CouchDB (using e.g. a user’s primary email address or login name as key for the user profile document) allows you to extend the profile “schema” easily: just add a field to your profile editing form, make sure it’s processed server-side and stored in the profile document, and add some extra code to your profile rendering template so the extra field gets displayed too. No need to alter SQL tables at all!

As my last site project also has some sort of user profiles, I was thinking about using CouchDB to store these objects. As the site is written using Django, it would be nice to be able to define a standard Django model for the profile, which would be stored in CouchDB, not in some SQL server. This way you can still enjoy newforms goodness, among others.

So I started a new project, called django-couchdb, which should in time provide a model base class (similar to django.db.models.Model), corresponding managers to query the data, and so on. I don’t know (yet) whether all this is possible to achieve. Anyway, I started by creating a very basic Python class which allows you to access a CouchDB server in a very Pythonic way: using dicts. A Server is a dict consisting of Databases, a Database is a dict of Documents. All this is implemented thanks to the goodness of the DictMixin base class.
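
To give an idea of the dict-of-dicts shape, here is a rough sketch that keeps everything in memory instead of talking to a CouchDB server. It is not the actual client code: the DictBacked helper, the create method and the sample document are assumptions for illustration, and it uses MutableMapping, the abstract base class that plays DictMixin’s role in newer Python versions.

```python
try:
    from collections.abc import MutableMapping  # Python 3
except ImportError:
    from collections import MutableMapping      # Python 2.6+


class DictBacked(MutableMapping):
    """Supplies the full dict API from five primitive methods."""

    def __init__(self):
        # In-memory stand-in; a real client would issue requests to the server
        self._items = {}

    def __getitem__(self, key):
        return self._items[key]

    def __setitem__(self, key, value):
        self._items[key] = value

    def __delitem__(self, key):
        del self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class Database(DictBacked):
    """A dict of documents, keyed by document id."""


class Server(DictBacked):
    """A dict of databases, keyed by database name."""

    def create(self, name):
        self[name] = Database()
        return self[name]


server = Server()
profiles = server.create('profiles')
profiles['jane@example.com'] = {'name': 'Jane', 'city': 'Ghent'}

print(sorted(server.keys()))
print(profiles.get('jane@example.com')['city'])
print('nobody@example.com' in profiles)
```

Methods such as keys, get and the in operator come for free from the mixin, which is what makes this style cheap to implement on top of a handful of primitive operations.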

The client is not finished yet, at least 3 TODO items are on my list:

  • Error/Exception handling
  • Revision handling
  • View support

Currently there is no support for any of these. Views should be easy to add, error handling a little harder. I think revision handling is the hardest part, especially figuring out how to provide this functionality in a Pythonic manner.

You can find the current code in this Git repository. Patches or external branches are very welcome!

By the way: the website I referred to before has been launched. It’s only of any use (well, maybe) for Dutch-speaking users though. You can visit it here. Yes, the template will change.