Python ‘all’ odity

Nicolas — Thu, 01 May 2008 13:57:00 +0000

[update] Question solved, see bottom of post.

Since Python 2.5 the language got a new built-in method ‘all’ (and it’s nephew ‘any’). I wanted to play around with this a little, combined with generators, so I created a little testcase to test performance.

Here’s the test-case: take a list L of X random numbers in a given range [A, B], and check whether

all elements in L are >= A
all elements in L are >= (A + Z) where Z is a number in [0, (B - A)]

The first test should always result True, the second test could result to False.

Here’s the output of a test-run:

In [1]: import random, sys

In [2]: a = [random.randint(100, sys.maxint) for i in xrange(2000000)]

In [3]: len(a)
Out[3]: 2000000

In [4]: #Check whether all elements are >= 100 

In [5]: %timeit all(i >= 100 for i in a)
10 loops, best of 3: 515 ms per loop

In [6]: %timeit any(i < 100 for i in a)
10 loops, best of 3: 454 ms per loop

In [7]: def f(l):
   ...:     for i in l:
   ...:         if i < 100:
   ...:             return False
   ...:     return True
   ...: 

In [8]: %timeit f(a)
10 loops, best of 3: 292 ms per loop

In [9]: #Same thing for 100000, since now the list shouldn't be completely iterated

In [10]: %timeit all(i >= 100000 for i in a)
100 loops, best of 3: 4.73 ms per loop

In [11]: %timeit any(i < 100000 for i in a)
100 loops, best of 3: 4.29 ms per loop

In [12]: def g(l):
   ....:     for i in l:
   ....:         if i < 100000:
   ....:             return False
   ....:     return True
   ....: 

In [13]: %timeit g(a)
100 loops, best of 3: 2.82 ms per loop

In [14]: #For reference

In [15]: %timeit False in (i >= 100 for i in a)
10 loops, best of 3: 531 ms per loop

In [16]: %timeit False in (i >= 100000 for i in a)
100 loops, best of 3: 5.03 ms per loop

It’s as if ‘all’, ‘any’ or ‘in’ don’t break/return when a first occurence of False (or True, obviously) is found. Is this the desired behaviour, and if it is, why? The calculation time difference between using all/any/in or a custom-made function (which is, unlike all etc, not written in C) which breaks whenever it can, is pretty astonishing.

[update] Question solved. It’s pretty normal the function-based approach performs better, since it combines what ‘all’ and the generator provided to ‘all’ do, taking away the generator function-call overhead. Damn

Ikke's blog » performance

Python ‘all’ odity