[update] Question solved, see bottom of post.
Python 2.5 added a new built-in function, ‘all’ (and its nephew ‘any’). I wanted to play around with these a little, combined with generators, so I created a little test case to measure performance.
Here’s the test-case: take a list L of X random numbers in a given range [A, B], and check whether
- all elements in L are >= A
- all elements in L are >= (A + Z) where Z is a number in [0, (B - A)]
The first test should always return True; the second test may return False.
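For readers who want to reproduce the setup, here is a minimal sketch of the test case described above. It is written for Python 3 (an assumption; the session below uses Python 2's xrange and sys.maxint), and the helper names make_list and all_at_least, the seed, and the concrete values of A, B, X and Z are all made up for illustration.

```python
import random


def make_list(x, a, b, seed=0):
    """Build a list of x random integers in the closed range [a, b]."""
    rng = random.Random(seed)  # seeded only so that runs are repeatable
    return [rng.randint(a, b) for _ in range(x)]


def all_at_least(values, bound):
    """Return True if every element of values is >= bound."""
    return all(v >= bound for v in values)


# Illustrative values, not the ones used in the timing run.
A, B, X = 100, 10**6, 10000
Z = (B - A) // 2

L = make_list(X, A, B)
first = all_at_least(L, A)       # True by construction: every element is >= A
second = all_at_least(L, A + Z)  # may be False, depending on the random draw
print(first, second)
```

The first check has to walk the whole list, while the second can stop at the first element below A + Z, which is what makes the two timings interesting to compare.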
Here’s the output of a test-run:
In [1]: import random, sys

In [2]: a = [random.randint(100, sys.maxint) for i in xrange(2000000)]

In [3]: len(a)
Out[3]: 2000000

In [4]: #Check whether all elements are >= 100

In [5]: %timeit all(i >= 100 for i in a)
10 loops, best of 3: 515 ms per loop

In [6]: %timeit any(i < 100 for i in a)
10 loops, best of 3: 454 ms per loop

In [7]: def f(l):
   ...:     for i in l:
   ...:         if i < 100:
   ...:             return False
   ...:     return True
   ...:

In [8]: %timeit f(a)
10 loops, best of 3: 292 ms per loop

In [9]: #Same thing for 100000, since now the list shouldn't be completely iterated

In [10]: %timeit all(i >= 100000 for i in a)
100 loops, best of 3: 4.73 ms per loop

In [11]: %timeit any(i < 100000 for i in a)
100 loops, best of 3: 4.29 ms per loop

In [12]: def g(l):
   ....:     for i in l:
   ....:         if i < 100000:
   ....:             return False
   ....:     return True
   ....:

In [13]: %timeit g(a)
100 loops, best of 3: 2.82 ms per loop

In [14]: #For reference

In [15]: %timeit False in (i >= 100 for i in a)
10 loops, best of 3: 531 ms per loop

In [16]: %timeit False in (i >= 100000 for i in a)
100 loops, best of 3: 5.03 ms per loop
It’s as if ‘all’, ‘any’ and ‘in’ don’t break/return as soon as the first occurrence of False (or True, obviously) is found. Is this the desired behaviour, and if so, why? The difference in run time between using all/any/in and a custom-made function (which, unlike all etc., is not written in C) that breaks whenever it can is pretty astonishing.
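[update] Question solved. It’s pretty normal that the function-based approach performs better: it does in a single loop what ‘all’ and the generator passed to ‘all’ do together, which takes away the overhead of resuming the generator for every element. ‘all’ and ‘any’ do stop at the first decisive element, as the drop from roughly 500 ms to roughly 5 ms in the second test shows. Damn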
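One way to check the short-circuiting directly, rather than inferring it from timings, is to count how many items ‘all’ actually pulls from the generator. The sketch below is Python 3, and the helper name count_consumed and the sample data are hypothetical, chosen just for this illustration.

```python
def count_consumed(values, threshold):
    """Return (result, n): the result of all(), and how many items it consumed."""
    consumed = [0]  # a one-element list so the nested generator can update it

    def tracked():
        for v in values:
            consumed[0] += 1
            yield v >= threshold

    result = all(tracked())
    return result, consumed[0]


data = [500, 700, 50, 900, 1000]
print(count_consumed(data, 100))  # (False, 3): stops at 50, the third item
print(count_consumed(data, 10))   # (True, 5): has to look at every item
```

So the early exit is there; what the hand-written loop saves is the extra hop through the generator on every element.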