(This is a reply to a post by Ahmed Soliman on recursion performance in (C)Python, and on CPython function call overhead in general. I started writing this as a comment on his post, but it turned out much longer, so I'm posting it here instead.)

Hey,

As discussed before, this is not a fair comparison, since the non-recursive version is much ‘smarter’ than the recursive one: it calculates values and never recalculates them, whilst the recursive version calculates everything over and over again.

Adding some simple memoization helps a lot. First, my testing code:
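(The testing code itself didn't survive the formatting here. Judging from the names and call shapes in the benchmark output below, *memoize_dict(fib)* and *memoize_constant_list(151, fib)*, it could have looked roughly like the following. This is a hedged reconstruction in Python 3 syntax, not the original.)

```python
import time

def fib(n):
    # the 'dumb' doubly recursive version
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def memoize_dict(fn):
    # cache results in a dict keyed on the argument
    cache = {}
    def wrapper(n):
        if n not in cache:
            cache[n] = fn(n)
        return cache[n]
    return wrapper

def memoize_constant_list(size, fn):
    # same idea, but with a preallocated fixed-size list
    cache = [None] * size
    def wrapper(n):
        if cache[n] is None:
            cache[n] = fn(n)
        return cache[n]
    return wrapper

# Rebind the global name so the recursive calls inside fib
# go through the cache as well:
fib = memoize_dict(fib)

start = time.time()
print('fib(150) =', fib(150))
print('Calculation took', time.time() - start, 'seconds')
```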

Here are the benchmarks on my MacBook Pro (Intel Core 2 Duo 2.33GHz, 3GB RAM, running quite a lot of other applications). Do note the ‘dumb’ version calculates *fib(35)*, whilst the slightly optimized versions, which still use recursion but with far fewer recursive calls (as they should), as well as your second version, calculate *fib(150)*.

Using MacOS X 10.5.6 stock CPython 2.5.1:

    MacBook:Projects nicolas $ python -V
    Python 2.5.1
    MacBook:Projects nicolas $ python fib.py 35 150
    fib(35) = 9227465
    Calculation took 12.8542108536 seconds
    Calculating the amount of recursive calls to calculate fib(35)
    Calculating fib(35) = 9227465 took 29860703 calls
    fib2(150) = 9969216677189303386214405760200
    Calculation took 0.00020694732666 seconds
    memoize_dict(fib)(150) = 9969216677189303386214405760200
    Calculation took 0.00141310691833 seconds
    memoize_constant_list(151, fib)(150) = 9969216677189303386214405760200
    Calculation took 0.000310182571411 seconds

Overall it looks like *fib2* and *memoize_constant_list* perform fairly similarly; I guess function call overhead and *list.append* have a comparable influence on performance in this case.
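(The *fib2* implementation isn't shown above either; going by the later description in this discussion, it builds a list of every Fibonacci number up to *n*, it presumably looked something like the sketch below. *timeit* is also a less noisy way to make this kind of comparison than one-shot wall-clock timing.)

```python
import timeit

def fib2(n):
    # iterative version that builds the whole sequence in a list;
    # a guess at the original, based on how it is described later on
    seq = [0, 1]
    for _ in range(n - 1):
        seq.append(seq[-1] + seq[-2])
    return seq[n]

print(fib2(150))  # → 9969216677189303386214405760200
print(timeit.timeit(lambda: fib2(150), number=1000), 'seconds for 1000 runs')
```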

Using Jython 2.5.0 from the binary distribution on the Java HotSpot 64bit Server VM as shipped for OS X 10.5.6:

    MacBook:Projects nicolas $ ./Jython/jython2.5.0/jython -V
    Jython 2.5.0
    MacBook:Projects nicolas $ ./Jython/jython2.5.0/jython fib.py 35 150
    fib(35) = 9227465
    Calculation took 12.5539999008 seconds
    Calculating the amount of recursive calls to calculate fib(35)
    Calculating fib(35) = 9227465 took 29860703 calls
    fib2(150) = 9969216677189303386214405760200
    Calculation took 0.0519998073578 seconds
    memoize_dict(fib)(150) = 9969216677189303386214405760200
    Calculation took 0.00399994850159 seconds
    memoize_constant_list(151, fib)(150) = 9969216677189303386214405760200
    Calculation took 0.00300002098083 seconds

The ‘dumb’ *fib* implementation performs similarly in both CPython and Jython. Jython performs significantly worse on the other implementations though, but maybe today's news could help here; I'm not sure how much locking Jython introduces on *dict* and *list* access.

Finally, using Unladen Swallow 2009Q2, self-compiled from SVN on the same system, using standard settings:

    MacBook:Projects nicolas $ ./unladen-swallow/unladen-2009Q2-inst/bin/python -V
    Python 2.6.1
    MacBook:Projects nicolas $ ./unladen-swallow/unladen-2009Q2-inst/bin/python fib.py 35 150
    fib(35) = 9227465
    Calculation took 12.2675719261 seconds
    Calculating the amount of recursive calls to calculate fib(35)
    Calculating fib(35) = 9227465 took 29860703 calls
    fib2(150) = 9969216677189303386214405760200
    Calculation took 0.000118970870972 seconds
    memoize_dict(fib)(150) = 9969216677189303386214405760200
    Calculation took 0.000972986221313 seconds
    memoize_constant_list(151, fib)(150) = 9969216677189303386214405760200
    Calculation took 0.00036096572876 seconds

which is similar to, slightly better than, or slightly worse than the CPython run. And when enforcing the JIT (which introduces a significant startup time, not measured here):

    MacBook:Projects nicolas $ ./unladen-swallow/unladen-2009Q2-inst/bin/python -j always fib.py 35 150
    fib(35) = 9227465
    Calculation took 14.6129109859 seconds
    Calculating the amount of recursive calls to calculate fib(35)
    Calculating fib(35) = 9227465 took 29860703 calls
    fib2(150) = 9969216677189303386214405760200
    Calculation took 0.0432291030884 seconds
    memoize_dict(fib)(150) = 9969216677189303386214405760200
    Calculation took 0.0363459587097 seconds
    memoize_constant_list(151, fib)(150) = 9969216677189303386214405760200
    Calculation took 0.0335609912872 seconds

which, to my surprise, performs quite a bit worse than with the default settings.

Overall: your first implementation performs tons and tons of function calls, whilst the second one, which resembles *memoize_list_fib* in my code (which is recursive), performs far fewer function calls. In the end *memoize_list_fib* performs almost as well as your second version (it performs roughly the same number of function calls as the number of iterations of your loop).

So whilst I do agree function calls in Python are rather slow compared to plain C function calls (which are little more than a jump, with none of Python's frame handling), the comparison between your recursive and non-recursive implementations is completely unfair. Even if calculating *fib(35)* takes several seconds, consider that you're doing a pretty impressive 29860703 function calls to perform the calculation.
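That call count is easy to reproduce with a small counting wrapper (a sketch, not the counting code used for the benchmarks above); for the naive version the count satisfies calls(n) = 2 * fib(n+1) - 1:

```python
def counted_fib():
    # naive fib plus a call counter
    state = {'calls': 0}
    def fib(n):
        state['calls'] += 1
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)
    return fib, state

fib, state = counted_fib()
fib(20)
print(state['calls'])  # → 21891, i.e. 2 * fib(21) - 1

# an iterative check of the closed form for n = 35
x, y = 0, 1
for _ in range(36):
    x, y = y, x + y   # after k iterations, x == fib(k)
print(2 * x - 1)  # → 29860703, matching the benchmark output
```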

Time to get some sleep.

In your fib2() function, you construct a list of all the Fibonacci numbers up to the point in the sequence you're asked for. This means that its memory requirements will be proportional to the number you ask for: the same problem as with naive recursive solutions.

But each iteration of the loop only requires access to the previous two items in the sequence, so you could get away with only storing those two items. Something like this should be sufficient:

    def fib3(n):
        if n == 0 or n == 1:
            return n
        x, y = 0, 1
        for i in xrange(n - 1):
            x, y = y, x + y
        return y

It'd be interesting to see how Unladen Swallow fares with this version: as well as getting rid of the list reallocation, it also expresses the logic in terms of local variables, which should be easier to optimise.

It is an intelligent solution, James. But am I supposed not to use recursion at all in Python? I wish there were a plan for optimizing the Python interpreter's function calls.

I think programmers should be able to choose their problem-solving technique, not be compelled to avoid using one (I mean recursion here).

Mohamed: I do use recursion in Python (e.g. for processing tree data structures) — I just don't use it in cases where I am going to hit the recursion limit. Python has other tools for those sorts of situations (ones that leave you with a readable stack trace when things go wrong, too).

Furthermore, the naive recursive fib() implementation given here can't take advantage of tail call optimisation anyway, since it doesn't simply return the result of another function call: it returns the sum of two recursive calls, so the addition still has to happen after those calls return.

To take advantage of tail call optimisation, you’d need an implementation like:

    def fib_helper(x, y, n):
        if n == 0:
            return x
        return fib_helper(y, x + y, n - 1)

    def fib4(n):
        return fib_helper(0, 1, n)

While I agree that my fib3() implementation is slightly more difficult to read than the original naive fib() implementation, it isn't clear to me that fib4() is better. If I were using a different language, things might tip in the other direction, but we are talking about Python here.
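One thing worth making explicit here: CPython performs no tail call elimination at all, so even a tail-recursive implementation like fib4() overflows the stack for large n. A quick demonstration (sketched in Python 3, where the exception is RecursionError; Python 2 raises RuntimeError):

```python
import sys

def fib_helper(x, y, n):
    # same shape as the tail-recursive helper above
    if n == 0:
        return x
    return fib_helper(y, x + y, n - 1)

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(100)
try:
    fib_helper(0, 1, 500)
except RecursionError:
    print("recursion limit hit, despite the tail call")
finally:
    sys.setrecursionlimit(old_limit)
```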

Great reply! However, I think that even with memoization you're very likely to hit recursion limits in Python, since you still call the function recursively, and in my experience Python doesn't reserve a huge amount of space for the call stack. Try for example running fib2(1000) as a first run, where you'll hit the limit; now try fib2(900) and then fib2(1000), and you'll immediately get the result, since you only did 100 recursive calls this time.
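The cache-warming trick described above can be sketched like this, using a minimal dict-memoized fib (an illustration, not the exact code from the post): stepping up in increments keeps every individual call chain well below the recursion limit.

```python
def make_memoized_fib():
    # dict-based memoization; the inner function recurses on itself,
    # so every recursive call goes through the cache
    cache = {0: 0, 1: 1}
    def fib(n):
        if n not in cache:
            cache[n] = fib(n - 1) + fib(n - 2)
        return cache[n]
    return fib

fib = make_memoized_fib()

# fib(5000) from a cold cache would recurse ~5000 frames deep and hit
# CPython's default recursion limit (1000); warming the cache in steps
# of 500 keeps each individual call chain short enough:
for n in range(500, 5001, 500):
    fib(n)
print(fib(10))  # → 55
```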

Bottom line: you may use recursion in Python, but you've got to be a little more careful when designing your algorithms than in compiled programming languages.

More discussion going on at http://geekfromcairo.wordpress.com/2009/08/04/to-understand-recursion-you-need-to-understand-recursion-take-2/#comment-39 and before…