Tuesday, December 9, 2014

When You Change a Method Return Type...

... strange effects can result under certain circumstances! Recently some Oomph users reported mysterious NoSuchMethodErrors at runtime and I spent quite some time hunting down the problem. What I found is kind of scary. Consider the following simple program:

  public class Util
  {
    public static void run()
    {
      // Run it...
    }
  }

  public class Main
  {
    public static void main(String[] args)
    {
      Util.run();
    }
  }

The bytecode of the compiled main() method is as simple as:

  invokestatic Util/run()V
  return

Notice the uppercase "V" at the end of the run() method call. It is the return type part of the method descriptor and indicates that the called run() method returns "void". This descriptor is part of the bytecode of the caller! Now change the declaration of the called run() method to return a boolean value:

  public class Util
  {
    public static boolean run()
    {
      // Run it...
      return true;
    }
  }

  public class Main
  {
    public static void main(String[] args)
    {
      Util.run();
    }
  }

Recompile both classes and look at the bytecode of the main() method again:

  invokestatic Util/run()Z
  pop
  return

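Notice that the bytecode has changed even though the source code of the Main class has not changed the least bit. The old run() method with the void return type would not be considered a valid call target anymore! The reverse holds as well: a Main class compiled against run()V cannot link against a Util class that only offers run()Z.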
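The two descriptors seen in the bytecode can be reproduced with the standard java.lang.invoke.MethodType API. The following is only a small sketch; the class name DescriptorDemo and the helper descriptorOf() are made up for illustration and are not part of Oomph:

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo
{
  // Hypothetical helper: builds the JVM descriptor of a no-argument method
  // with the given return type.
  public static String descriptorOf(Class<?> returnType)
  {
    return MethodType.methodType(returnType).toMethodDescriptorString();
  }

  public static void main(String[] args)
  {
    // "V" stands for void, "Z" for boolean.
    System.out.println("void run()    -> run" + descriptorOf(void.class));
    System.out.println("boolean run() -> run" + descriptorOf(boolean.class));
  }
}
```

The return type is the only difference between the two descriptors, and it is exactly the part that the caller's bytecode records.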

Interesting, but when can this become a real problem?

Well, in our case the calling and the called method are in different OSGi plugins and we use Maven/Tycho to build them and our Oomph users use p2 to install or update them. The following steps turned out to be tragic:

  • I changed the return type of the called method from void to boolean.
  • Maven/Tycho built both the calling and the called plugin.
    • The called plugin got a new version (build qualifier) because it was really changed.
    • The calling plugin did not get a different version because its source code wasn't changed.
  • A user updated his Oomph installation to the new build.
    • The called plugin was updated because a new version was found.
    • The calling plugin was not updated because there was no new version available. To be clear, there was a plugin with different content in the new build, but it had the same version as in the previous build.

As a result, this user was faced with an evil exception at runtime:

  java.lang.NoSuchMethodError: Util.run()V
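The mismatch can be imitated inside a single JVM, without any plugins, by asking for the method under its old descriptor. This is only a sketch of the lookup rule, not of what the JVM linker or p2 do internally: the class LinkageDemo and the helper resolves() are hypothetical, the nested Util stands in for the updated class, and a lookup via MethodHandles reports the failure as a NoSuchMethodException rather than the NoSuchMethodError seen above.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class LinkageDemo
{
  // Stand-in for the updated Util class: run() now returns boolean.
  public static class Util
  {
    public static boolean run()
    {
      return true;
    }
  }

  // Hypothetical helper: true if Util has a static, no-argument run()
  // with exactly the given return type.
  public static boolean resolves(Class<?> returnType)
  {
    try
    {
      MethodHandles.lookup().findStatic(Util.class, "run", MethodType.methodType(returnType));
      return true;
    }
    catch (NoSuchMethodException ex)
    {
      // Name and parameters match, but the return type does not.
      return false;
    }
    catch (IllegalAccessException ex)
    {
      throw new IllegalStateException(ex);
    }
  }

  public static void main(String[] args)
  {
    System.out.println("run()V resolves: " + resolves(void.class));
    System.out.println("run()Z resolves: " + resolves(boolean.class));
  }
}
```

The lookup with the old void return type fails although a method named run() with no parameters clearly exists, which is the situation the stale calling plugin ran into.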

Now that I know why this happened, I can easily fix the nasty problem by applying a fake change to the calling plugin's source code. That causes Tycho to assign a new version (build qualifier) to it, so that p2 installs a calling plugin whose bytecode is consistent with that of the called plugin.

The fact that this can happen so easily leaves me kind of scared. After all, I'll probably never ever try to change a method return type again.
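One way to get a boolean result without touching the existing signature is to leave run() alone and add a second method next to it. The following is a sketch of that idea, not what was actually done in Oomph; the class name CompatibleUtil and the method name runAndReport() are invented for illustration:

```java
public class CompatibleUtil
{
  // Original signature with descriptor run()V: callers compiled against
  // the old class still link.
  public static void run()
  {
    runAndReport();
  }

  // Hypothetical new method for callers that need the result.
  public static boolean runAndReport()
  {
    // Run it...
    return true;
  }
}
```

Old bytecode that contains "invokestatic run()V" keeps working, and new code can opt in to the boolean variant at its own pace.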

5 comments:

  1. https://wiki.eclipse.org/Evolving_Java-based_APIs_2 is a fairly good reference in this regard.

  2. Changing the return type is a binary incompatible change. Under semantic versioning that is a major version increase. This should have been caught by your build and test system if you used semantic versioning.

  3. Hi Cedric, BJ, Thanks for your hints. All this happened in consecutive nightly builds of unreleased code. No "real" public API was involved.

  4. What is the reason that this is part of the bytecode of the caller and not the callee? I'm sure there's a pretty obvious reason Java designers did this, and I would really like to know the reason :)

  5. @Marco I explained it a bit at http://tomsondev.bestsolution.at/2014/12/09/strange-things-about-java-byte-code-and-consequences-for-api-evolution/
