Monday, August 17, 2009

Are Non-Virtual Methods Safer?

This post is a reply to this post on Ayende's blog, but it concerns a debate I have seen popping up over and over. Please read Ayende's post for a good summary of the "issue" - which I will maintain is a non-issue - but in very short summary, the topic is "when dare we make our methods virtual?", in this case with the example of an elevator that starts to calculate taxes rather than going up when we call a (virtual, overridden) Up() method.

Since methods are virtual by default in Java but not in .NET this of course makes for intriguing debate, often revolving around "are virtual methods dangerous?" in that we don't know what overriding methods will be doing. There are many useful arguments around what methods to make virtual and why, but the "dangerous" argument is in my opinion not one of them, and so it is the one that I will focus on here and do my best to debunk.

In my view, there just isn't a problem, and I will try to explain what I mean, but I apologize in advance for my lack of ability to do so more succinctly.

To begin with, in our elevator case, let's assume we have the following type structure, where the first three types are provided by the framework (the elevator supplier) and the last three would normally be written by the client (but could of course be provided by the framework writer as well in some cases if good generic implementations are plausible):

Framework:
public interface IElevator { ... }
public abstract class ElevatorBase : IElevator { ... }
public class DefaultElevator : ElevatorBase { ... }

Client:
public class CustomElevator : DefaultElevator { ... }
public sealed class CertainElevator : IElevator { ... }
public class SpecialElevator : IElevator { ... }

I think this type structure covers most of the use cases I have seen in discussions around this topic. To explain a bit, the ElevatorBase class provides reusable implementations that most implementations of an elevator would use. DefaultElevator is a sensible default implementation of an elevator - assuming a sensible default elevator can be written. If not, this class would be left out (and anyone writing custom elevators would have to extend ElevatorBase rather than DefaultElevator).
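To make the skeleton above a little more concrete, here is one way it might be fleshed out. Every member beyond the original type names - CurrentFloor, Up(), Down() - is my own illustrative assumption, not something from the original post:

```csharp
public interface IElevator
{
    int CurrentFloor { get; }
    void Up();
    void Down();
}

public abstract class ElevatorBase : IElevator
{
    public int CurrentFloor { get; protected set; }

    // Reusable movement logic; virtual so implementations may refine it.
    public virtual void Up()   { CurrentFloor++; }
    public virtual void Down() { CurrentFloor--; }
}

// A sensible default implementation - nothing extra needed in this sketch.
public class DefaultElevator : ElevatorBase { }

// Sealed, and deliberately NOT inheriting ElevatorBase: its behavior cannot
// change under a client's feet, neither by overriding nor by base-class edits.
public sealed class CertainElevator : IElevator
{
    public int CurrentFloor { get; private set; }
    public void Up()   { CurrentFloor++; }
    public void Down() { CurrentFloor--; }
}
```

Note that CertainElevator duplicates the trivial movement logic on purpose: reusing ElevatorBase would re-introduce the very base-class coupling it exists to avoid.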

CertainElevator is a sealed class, the whole point of which is to give no surprises to its users - a requirement that is enforced (to some extent) by the class being sealed. It should be noted, however, that sealing may not always be enough. The reason why CertainElevator does not inherit from ElevatorBase or DefaultElevator - even though they may well have just the right behavior to reuse - is that it cannot afford to be sensitive to changes in behavior coming from those classes either (the Fragile Base Class problem).

SpecialElevator is a class with such special behavior that the (non-virtual) implementations from ElevatorBase are no good to it - or, if all ElevatorBase methods are virtual, it would have to override all of them (because it is so very special), so there would be no point in extending ElevatorBase. Thus it implements IElevator directly. Note, however, that it does so for a different reason than CertainElevator (which isn't really that special, but needs to protect itself from a potentially evolving base class).

Now, the core principle when working with types is to always bind to the most abstract type possible. Usually, this should be the interface. Thus, variables and parameters should, whenever possible, be declared as IElevator.

The beauty of binding to an interface, of course, is that you bind only to a set of member signatures and to a certain "point" in a type hierarchy (defining what the type "is") - but you do NOT bind yourself to any implementation. This is a prime example of the value of low coupling: when the number of expectations on a service is minimized, the client can more easily swap implementations, making it more reusable and in many senses more robust.
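As a sketch of what "binding to the most abstract type" looks like from the client side - the IElevator members and all the helper names here are my own illustrative assumptions, not from the original post:

```csharp
public interface IElevator
{
    int CurrentFloor { get; }
    void Down();
}

public static class ElevatorClient
{
    // Couples only to the interface, so ANY implementation can be supplied.
    public static void GoToGroundFloor(IElevator elevator)
    {
        while (elevator.CurrentFloor > 0)
            elevator.Down();
    }
}

// One possible implementation - the client code above never needs to change
// when a different one is swapped in.
public class SimpleElevator : IElevator
{
    public int CurrentFloor { get; set; }
    public void Down() { CurrentFloor--; }
}
```

GoToGroundFloor makes no assumption about which concrete class it receives, which is exactly the low coupling described above.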

However, in contrast with this we see the argument that when we call the Up() method on an elevator, we want to know that the elevator goes up rather than calculates taxes. Well, an interface won't make any such promises. Neither will a non-sealed class where the Up() method is virtual, or where one of its helper methods is virtual (perhaps it follows the Template Method pattern). Furthermore, note that even without virtual methods you have the constructors to consider, as pointed out by Peter Morris in the comment section of Ayende's blog!
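The constructor point deserves a concrete illustration, because it is a well-known C# gotcha: a base constructor that calls a virtual method runs the derived override BEFORE the derived constructor body has executed. All type and member names here are my own illustration:

```csharp
public class BaseElevator
{
    public BaseElevator()
    {
        Initialize();  // virtual call from a constructor - runs the override early
    }

    protected virtual void Initialize() { }
}

public class LoggingElevator : BaseElevator
{
    private string _logName;                       // assigned in the ctor body below
    public string SeenByInitialize { get; private set; }

    public LoggingElevator()
    {
        _logName = "elevator.log";                 // too late: Initialize() already ran
    }

    protected override void Initialize()
    {
        // Called from the base constructor, before _logName is assigned.
        SeenByInitialize = _logName ?? "<null>";
    }
}
```

Constructing a LoggingElevator leaves SeenByInitialize as "&lt;null&gt;": the override observed the object in a half-constructed state, even though nothing about the public surface looked overridable to a caller.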

What this means is that we're discussing a case that most explicitly contradicts our default approach of binding to the most abstract type (the interface): We're saying outright that in this case we do want to bind directly to an implementation (and have it non-overridable, to boot) whereas this would normally be considered the opposite of what you should try to strive for.

Well, fair enough, binding to an implementation is allowed, as long as we know we're giving up some reusability of our client code etc (and as long as we do it grudgingly). But here, finally, comes my point:

With a type structure like the one above, the whole issue of "should my methods be virtual?" goes away completely: the client writer can choose in every instance whether to bind to the interface (preferable) or to some sealed and completely predictable CertainElevator class - but in no case will they be binding to any virtual methods of the framework designer's making!

So, from the client's perspective:

Whenever you bind to an interface, you are not tied to any particular implementation. In other words, whenever you bind to an interface, the whole debate about whether some implementation uses virtual methods or not is entirely inconsequential to you and your code: your code is not binding to any non-virtual methods, and thus is not getting any of the behavioral guarantees they come with!

Those guarantees only come when you are actually binding to an implementation - which, again, should be the exceptional case - and the reason is then that you happen to rely on having exactly one implementation. In such cases you should bind to a sealed class, and thus, again, the whole "virtuals or not?" debate goes away. (Note that where the client really needs to bind to a CertainElevator, following this requirement to its final, paranoid conclusion, it is quite possible the client must even write the sealed class themselves, to make *really* sure it doesn't suddenly change under their feet. Yes, versioning helps, but not 100%.)

And from the framework writer/extender's perspective:

As long as you provide the interface, you're good. The rest you provide - the abstract base class (ElevatorBase) and any default implementations (DefaultElevator) - are really just utility classes for implementers. No actual clients should ever be binding to those classes anyway - just to the interfaces and, in the exceptional cases, to sealed classes (which are of course completely void of virtual methods).

So what if you make "the wrong method virtual" in one of your utility base classes? It is a non-concern, because nobody should bind to those implementations! You can't guarantee any particular behavior for a client that binds to an interface anyway, and when clients need to bind to a particular behavior they should bind to a sealed class, in which case the answer becomes: making any method virtual is wrong and, in fact, won't even compile.

The debate always seems to come back to: "Yeah, but what if someone overrides my virtual method in a bad/clumsy way?" - but my question is then: who is binding to that virtual method, making it a problem? Because if nobody is binding to it - and I have done my best to show why I think no one ever should be - then surely it can't matter exactly which methods you decide to make virtual.

Sure, making only a few of them virtual might be helpful, in that it gives the extender a quick clue about what could be a good way to extend (the Template Method pattern is pretty nice, imo) - and those are the kinds of arguments about when to make a method virtual that I think do make sense - but you won't be protecting anyone from anything by making the methods of your non-sealed class non-virtual, since nobody should be binding to your non-sealed classes anyway.
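For readers unfamiliar with the Template Method pattern mentioned above, here is a minimal sketch: the public Up() is deliberately non-virtual so the overall sequence is fixed, and a single protected virtual hook signals the intended extension point. The names are my own illustration:

```csharp
public abstract class TemplateElevatorBase
{
    public int CurrentFloor { get; private set; }

    public void Up()                       // non-virtual: the sequence is fixed
    {
        CurrentFloor++;
        OnFloorReached(CurrentFloor);      // the one intended extension point
    }

    protected virtual void OnFloorReached(int floor) { }
}

public class ChimingElevator : TemplateElevatorBase
{
    public int Chimes { get; private set; }

    protected override void OnFloorReached(int floor)
    {
        Chimes++;   // extend behavior without touching the movement logic
    }
}
```

The limited virtual surface documents where extension is welcome - a hint to the extender, per the argument above, rather than a safety guarantee to the client.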

Sometimes it seems almost as if the whole issue springs from a desire by framework makers - including Microsoft - to market some of their non-sealed classes as something it would be a good idea to bind directly to. In that case this would seem non-serious, in fact sloppy and dangerous - filled with false promises of guarantees that are not really there, and leading to debates such as this one.

I'll stop now before I repeat myself more, but I'm really confused why everyone seems to see a big problem where I just can't see any at all. So if anyone can see what I'm missing, please let me know.

/Mats

2 comments:

  1. You are vastly over-simplifying the problem. I think if you examined your own programs, you would find that you are binding to a lot more explicit implementations than you realize. Not every class has an abstract base class or interface to bind to, nor does it necessarily make sense for them to. If I am creating a Windows form, I want to bind to a concrete implementation, because I am counting on that implementation in order for my application to work correctly. It is not ok with me to bind to an IForm and let anyone come along and supply any implementation they like; windowing is too complex to rely on abstract definitions (as all attempted platform-agnostic GUI frameworks have discovered).

    Moreover, the issue is not just whether a type is being bound to. Implementation classes still want to expose virtual methods so that their behavior can be extended without being completely re-implemented (although this would be less of an issue if modern OO languages made composition as easy as inheritance). But that doesn't mean that all of their methods should be virtual, as that invites inheritors to unknowingly change behaviors that will cause the class to not function correctly. This is the whole concept behind encapsulation in the first place.

    That said, what you are arguing against is the need for "paternalism", protecting developers from themselves by not allowing them to do dangerous things like override methods. There are pros and cons to this argument. However, that is NOT the primary reason why methods should be kept sealed. The primary reason is that it *limits* extensibility. Don't believe me? Check out http://blogs.msdn.com/brada/archive/2003/07/29/50202.aspx:

    "For example, we found that ArrayList item enumeration calls several virtual methods per each MoveNext and Current. Fixing those performance problems could (but probably doesn't) break user-defined implementations of virtual members on the ArrayList class that are dependent upon virtual method call order/frequency."

    Additional analysis of the problem referred to can be found at http://blogs.msdn.com/ricom/archive/2005/08/26/performance-quiz-7-generics-improvements-and-costs-solution.aspx.

    The primary problem with making methods virtual is that it invites consumers to take dependencies, not only on the behavior of the methods, but on the *order* and *frequency* in which they are called. This severely limits your ability to refactor and improve the class in the future. *That* is why it is so important to only make methods virtual that have explicitly been designed to be extensible.

    ReplyDelete
  2. Hi David,

I'm not saying binding to an implementation must never be done, just that if you really need to do it, then you should probably seal the class as well.

    ReplyDelete