[update] Question solved, see bottom of post.
Python 2.5 added a new built-in function, ‘all’ (and its nephew ‘any’). I wanted to play around with it a little, combined with generators, so I created a small test case to measure performance.
Here’s the test case: take a list L of X random numbers in a given range [A, B], and check whether
- all elements in L are >= A
- all elements in L are >= (A + Z) where Z is a number in [0, (B - A)]
The first test should always return True; the second test may return False.
Here’s the output of a test-run:
In [1]: import random, sys

In [2]: a = [random.randint(100, sys.maxint) for i in xrange(2000000)]

In [3]: len(a)
Out[3]: 2000000

In [4]: #Check whether all elements are >= 100

In [5]: %timeit all(i >= 100 for i in a)
10 loops, best of 3: 515 ms per loop

In [6]: %timeit any(i < 100 for i in a)
10 loops, best of 3: 454 ms per loop

In [7]: def f(l):
   ...:     for i in l:
   ...:         if i < 100:
   ...:             return False
   ...:     return True
   ...:

In [8]: %timeit f(a)
10 loops, best of 3: 292 ms per loop

In [9]: #Same thing for 100000, since now the list shouldn't be completely iterated

In [10]: %timeit all(i >= 100000 for i in a)
100 loops, best of 3: 4.73 ms per loop

In [11]: %timeit any(i < 100000 for i in a)
100 loops, best of 3: 4.29 ms per loop

In [12]: def g(l):
   ....:     for i in l:
   ....:         if i < 100000:
   ....:             return False
   ....:     return True
   ....:

In [13]: %timeit g(a)
100 loops, best of 3: 2.82 ms per loop

In [14]: #For reference

In [15]: %timeit False in (i >= 100 for i in a)
10 loops, best of 3: 531 ms per loop

In [16]: %timeit False in (i >= 100000 for i in a)
100 loops, best of 3: 5.03 ms per loop
It’s as if ‘all’, ‘any’ and ‘in’ don’t break/return when the first occurrence of False (or True, obviously) is found. Is this the desired behaviour, and if it is, why? The difference in calculation time between all/any/in and a custom-made function (which, unlike ‘all’ and friends, is not written in C) that breaks whenever it can is pretty astonishing.
[update] Question solved. It’s pretty normal that the function-based approach performs better: it does in a single loop what ‘all’ and the generator passed to ‘all’ do together, which takes away the overhead of resuming the generator for every element. Damn.
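For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The function names `with_all` and `with_loop` and the smaller list size are made up for illustration, and the absolute timings will differ per machine and Python version, so treat the output as a rough indication only.

```python
import random
import timeit

# Hypothetical data set, smaller than the one in the post so it runs quickly.
data = [random.randint(100, 10**9) for _ in range(200000)]

def with_all(values):
    # all() pulls every result out of a generator expression,
    # so the generator is resumed once per element.
    return all(v >= 100 for v in values)

def with_loop(values):
    # A plain loop does the comparison inline, with no generator in between.
    for v in values:
        if v < 100:
            return False
    return True

# Both approaches must agree on the answer before timing them.
assert with_all(data) == with_loop(data)

t_all = timeit.timeit(lambda: with_all(data), number=5)
t_loop = timeit.timeit(lambda: with_loop(data), number=5)
print("all() + generator: %.3f s" % t_all)
print("explicit loop:     %.3f s" % t_loop)
```

Both functions stop at the first failing element; the only difference being measured is the per-element cost of going through the generator.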
Hmm, I don’t understand any of this, but thanks for your message on my blog and good luck with the cello! (It’s still on my list too)
I’m good at programming, but not in this language. I like
Thanks for your message
Today I suspected the same thing: that all() did not bail early.
Here is some code that demonstrates that it does bail on the first False:
x = []
def testnum5(n):
    x.append(1)
    return n==5
print all( (testnum5(i) for i in [5,5,5,1,5,5]) )
print 'testnum5() called:',sum(x)
# should be 4 if all() bails early, which it is
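The same check can be done without a global list, and for any() as well. The sketch below is only an illustration: the helper `counting` is a made-up name, and the code is written so that it should run on both Python 2 and Python 3.

```python
def counting(iterable, counter):
    # Hypothetical helper: yields the items unchanged and records
    # in counter[0] how many of them were actually requested.
    for item in iterable:
        counter[0] += 1
        yield item

numbers = [5, 5, 5, 1, 5, 5]

seen_all = [0]
result_all = all(n == 5 for n in counting(numbers, seen_all))

seen_any = [0]
result_any = any(n != 5 for n in counting(numbers, seen_any))

print("all(): %s after %d items" % (result_all, seen_all[0]))
print("any(): %s after %d items" % (result_any, seen_any[0]))
```

Both calls stop at the fourth item (the 1), so neither of them looks at the last two elements.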