Skip to content


Scala tail recursion and decompiler adventures

I’ve been into Scala lately. More about it will follow later, but there’s something I found out which I really like.

Last couple of days I wrote some very basic Scala snippets, containing constructs which would be non-trivial or ‘unusual’ to write in Java, compile it to a class file, and then use a Java decompiler to figure out how the Scala compiler maps those constructs to JVM bytecodes.

There’s one thing which took my attention: looks like (basic) tail-recursive functions are optimized into while-loops! This only happens if the last call of a function is a call to itself (the most basic form of tail recursion), but it’s an interesting feature anyway… No more need to put socket accept handling in an infinite while loop :-)

A little demo. First, here’s a Scala object which implements a very basic ‘reduce’ function:

object Reducer {
  def reduce[T, V](fun: (V, T) => V, values: List[T], initial: V): V = {
    if(values isEmpty)
      return initial
    val next = fun(initial, values head)
    return reduce(fun, values tail, next)
  }

  def main(args: Array[String]): Unit = {
    val values = List(1, 2, 3, 4)
    val sum = reduce[Int, Int]((x, y) => x + y, values, 0)
    println("Result: " + sum)
  }
}

We can compile and run this, and it’ll output the expected result ’10′:

MacBook:reduce nicolas $ scalac Reducer.scala 
MacBook:reduce nicolas $ scala Reducer
Result: 10

Now we can open the generated class files in JD. There are a couple of them (it’s interesting to take a look at all of them and figure out what they represent exactly), but in this case we need ‘Reducer$.class’, which contains the implementations of our public functions, including ‘reduce’.

Here’s the Java version of the ‘reduce’ function:

public <T, V> V reduce(Function2<V, T, V> fun, List<T> values, V initial)
{
  while (true)
  {
    if (values.isEmpty())
      return initial;
    Object next = fun.apply(initial, values.head());
    initial = next;
    values = values.tail();
  }
}

‘Function2′ is a built-in Scala type which represents a function taking 2 parameters. As you can see, this code does exactly the same as our Scala version and is most likely the way we’d write the code manually as well (the only thing I don’t get is why ‘next’ is an Object and not a ‘V’, I might figure that out later), but without forcing us to write the imperative code, whilst still producing bytecodes which will most likely show the best performance on the JVM (which currently has no tail recursion optimization support (although that might change one day)).

I like it :-)

[update]
For reference, here’s a slightly more Scala-ish implementation of reduce, showing the same time performance characteristics during some basic profiling. I was not able to get JD nor jad to generate any usable decompiled code though:

def reduce[T, V](fun: (V, T) => V, values: List[T], initial: V): V = {
    values match {
        case List() => initial;
        case head :: tail => reduce(fun, tail, fun(initial, head))
    }
}

It uses Scala’s “List” pattern matching functionality.

Posted in Development, Technology.

Tagged with , , , .


7 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

  1. Simon says

    Ran into an ugly little bit of Java code recently, that surely must have been written by a functional-language programmer – splitting a comma-separated string, implemented by tail recursion. Never mind that Java has better ways of splitting strings, such methods really don’t work under the JVM – the code was failing with the inevitable StackOverflowException…

  2. Berlin Brown says

    “the code was failing with the inevitable ”

    Yea and I always wonder how Java handles recursive calls. I always feel that I can’t get more than 3000 calls on some of the most basic code. I figure with the default 64MB jvm heap size, I sure hope the JVM can take more than that.

    To the Scala code, have you tested if you increase the number of parameters in your reduce method. I wonder if Scala will ever default to the Java recursive call if you have too many args.

  3. Daniel Gonzalez says

    “the only thing I don’t get is why ‘next’ is an Object and not a ‘V’”
    That’s due to the way Generics work in Java: They only exist at compile time. JVM does NOT know a thing about Generics, it keeps on dealing with Object. As somebody said, Generics is “syntactical sugar”.

    That also mean that you can’t rely on Generics for run time checks, because that information has been lost when transalated to bytecode.

  4. Nicolas says

    @Simon: I can imagine :-) Using *any* functional construct in a primarily imperative language/environment might cause issues, but I do think having notions of functional concepts and techniques can also benefit imperative programmers and code quality.

    @Berlin: Regarding number of arguments, I didn’t test it yet, but I expect things to work out fine, at least until 22 arguments (looks like 22 is the largest function arity in the Scala library: http://lampsvn.epfl.ch/svn-repos/scala/scala/trunk/src/library/scala/).

    @Daniel: Thanks for the information, although I don’t really see how that applies here: why can’t the Scala compiler emit bytecodes which immediately identify ‘next’ to be of type V? I might be missing something… Don’t know Java bytecodes at all, might need to peek into them one day.

  5. Ismael Juma says

    Hi,

    A very useful addition to Scala 2.8.0 (unreleased) is the @tailrec annotation. That allows you to tell the compiler to issue an error if it’s unable to apply the optimisation.

    Best,
    Ismael

    • Nicolas says

      Ismael, I saw that while browsing the Scala sources, indeed :-) Still on 2.7 though, looking forward to 2.8!

  6. Jim says

    “Yea and I always wonder how Java handles recursive calls. I always feel that I can’t get more than 3000 calls on some of the most basic code. I figure with the default 64MB jvm heap size, I sure hope the JVM can take more than that. ”

    The limitation on the number of nested calls is determined by the (per-thread) stack size, not the heap size. This can be expanded with -Xss jvm parameter.



Some HTML is OK

or, reply to this post via trackback.