Tuesday, August 18, 2009

A Pet Shop Anti Benchmark?

Anyone who has been involved with the many "PetShop" related dramas over the years must have had a difficult time not to develop anti-bodies to the whole idea. However, like a fool who just doesn't know how to stop, I now return to this concept hoping to revive it yet again. No, don't run, please hear me out...there's a bit of a twist to it - I actually think I may have had a bit of an idea!

As far as I can see, historically all the problems have really sprung from the same root: That different players tried to use PetShop implementations to demonstrate that their platforms were faster/better.

"Well, duh", you might say to that :-)

Before I start sounding like I'm heaping blame to the left and right on PetShop contributors, I want to humbly note that if blame there should be, then I am one of those to be blamed. I wrote one such PetShop using Pragmatier (the O/R Mapper I had written, I think it may have been the first O/R Mapping based PetShop for .NET - it was right around the time of the theserverside drama linked at the top of this post) and we tested it in a big lab and published a report ( http://www.devx.com/vb2themax/Article/19867 ) - all, of course, with the express hope that it would show potential customers what a splendid and fast product we offered.

So I want to underline how it would indeed seem very human to be driven towards implementing a PetShop (it takes a bit of time) because one hopes it will demonstrate something positive in terms of the competitive benefits of one's framework. Indeed, that's why I did it.

But while we can go on for a long while detailing why this approach has so many drawbacks that it may be ultimately doomed to collapse in a black hole of bad PR, I am suddenly more intrigued by the notion that perhaps a PetShop benchmark could be very useful after all - but if used in exactly the opposite way of how they have been used so far:

What about a PetShop benchmark that compared the performance of a persistence layer using different O/RMs versus a plain ADO one in order to show how small the difference is between frameworks (and between any framework and plain ADO) in terms of overall application performance? The goal of such a site would be not to compete about who is actually fastest, but to clearly demonstrate how much of overall application performance that is eaten up by the O/RMs on average (and of course show the caps given by the worst and best numbers as well) - and if my predictions will turn out correct, that this average is likely to be convincingly small.

Of course, one could then go on to study the result details to see that mapper X is the one that makes the application go marginally faster than mapper Y - or the results could be anonymized, but I don't think they would have to be: I seriously think that if done in a realistic way, the outcome on a viewer would be the very opposite of wanting to look up who was actually fastest. (*)

The overview would show (I think) that of the total application execution time, all the interesting mappers take up a similar, pretty small percentage, and if so the conclusion would become that it doesn't matter if mapper X or Y is the fastest - they are all fast enough, and the question instead becomes which mapper fits your style of work, existing tool sets, has good documentation etc etc.

Now, as far as I can see, the only mappers who wouldn't want to contribute their mapper to such a benchmark are the ones who 1) are so horribly slow that they would be off the chart, or 2) have speed as their main selling point, since the benchmark would primarily show their selling point to be largely irrelevant.

The rest of the mappers would benefit from jointly demonstrating that speed isn't really the differentiating factor between mappers (and almost never a reason to discourage use of mappers in general), and that thus the client can have the luxury of making their choice based on things like nice documentation, or API suitability, or relevant feature sets...and mapper makers in their turn will have the benfit of being able to convincingly demonstrate just why there is credible room for several players on the market (they all cater to different tastes and prioritize features differently).

In a sense, the idea might be better described as an "anti-benchmark" than a real benchmark, since (if I am right) it would be used mainly to demonstrate that the benchmark itself is meaningless (which, in a paradoxial way, still is very meaningful to demonstrate).

But most importantly, such a benchmark could give us a conclusive answer to the age old question: Does the speed of an (non-dog-slow) O/RM matter?

(*) Further against anonymous results: there would of course still be a small segment of applications where any small performance benefit really would be the deciding factor, and why shouldn't they be able to find the info about who is fastest and why shouldn't the fastest framework be able to have at least those customers coming to it? seems fair to me. See above for why I think any framework spending a large effort on extreme performance optimization can probably use those customers, especially if a benchmark such as this one would go live...

Monday, August 17, 2009

Are Non-Virtual Methods Safer?

This post is in reply this post on Ayende's blog, but it concerns a debate I have seen popping up over and over. Please read Ayende's post for a good summary of the "issue" - that I will maintain is a non-issue - but in really short summary the topic is about "when dare we make our methods virtual?" in this case with the example of an elevator that starts to calculate taxes rather than "going up" when we call a (virtual, overriden) Up() method.

Since methods are virtual by default in Java but not in .NET this of course makes for intriguing debate, often revolving around "are virtual methods dangerous?" in that we don't know what overriding methods will be doing. There are many useful arguments around what methods to make virtual and why, but the "dangerous" argument is in my opinion not one of them, and so it is the one that I will focus on here and do my best to debunk.

In my view, there just isn't a problem, and I will try to explain what I mean, but I apologize in advance for my lack of ability to do so more succinctly.

To begin with, in our elevator case, let's assume we have the following type structure, where the first three types are provided by the framework (the elevator supplier) and the last three would normally be written by the client (but could of course be provided by the framework writer as well in some cases if good generic implementations are plausible):

Framework:
public interface IElevator { ... }
public abstract class ElevatorBase : IElevator { ... }
public class DefaultElevator : ElevatorBase { ... }

Client:
public class CustomElevator : DefaultElevator { ... }
public sealed class CertainElevator : IElevator { ... }
public class SpecialElevator : IElevator { ... }

This type structure I think covers most use cases I have seen in discussion around this topic. To explain a bit, the ElevatorBase class provides reusable implementations that most implementations of an elevator would use. The DefaultElevator is a sensible default implementation of an elevator - assuming a sensible default elevator can be written. If not, this class would be left out (And anyone writing custom elevators would have to extend ElevatorBase rather than DefaultElevator).

The CertainElevator is a sealed class, the whole point of which is to give no surprises to its users - a requirement that is enforced (to some extent) by the class being sealed. It should be noted, however, that sealing may not always be enough. The reason for why CertainElevator does not inherit ElevatorBase or DefaultElevator - even though they may well have just the right behavior to reuse - is that it cannot afford to be sensitive to changes in its behavior coming from those classes either (the Fragile Base Class problem).

The SpecialElevator is a class that has such very special behavior that the (non-virtual) implementations from ElevatorBase are no good to it (or, all ElevatorBase methods are virtual, but SpecialElevator would have to override all of them (because it's so very special) and thus there would be no point in extending ElevatorBase), thus it has to implement IElevator directly. Note, however, that it does so for a different reason than CertainElevator (which isn't really that special, but needs to protect itself from a potentially evolving base class).

Now, the core principle when working with types is to always bind to the most abstract type possible. Usually, this should be the interface. Thus, variables and parameters should, whenever possible, be declared as IElevator.

The beauty when binding to an interface, of course, is that you only bind to a set of member signatures and to a certain "point" in a type hierarchy (defining what the type "is") - but you do NOT bind yourself to any implementation. This is a prime example of the value of low coupling, as when the number of expectations on a service is minimized, the client can more easily swap implementations, making it more reusable and in many senses more robust.

However, in contrast with this we see the argument that when we call the Up() method on an elevator, we want to know that the elevator goes up rather than calculates taxes. Well, an interface won't make any such promises. Neither will a non-sealed class where the Up() method is virtual (or where one of its helper methods is virtual, perhaps it follows the Template Method Pattern. Furthermore note that even without virtual methods, you have the constructors to consider, as pointed out by Peter Morris in the comment section of Ayende's blog!).

What this means is that we're discussing a case that most explicitly contradicts our default approach of binding to the most abstract type (the interface): We're saying outright that in this case we do want to bind directly to an implementation (and have it non-overridable, to boot) whereas this would normally be considered the opposite of what you should try to strive for.

Well, fair enough, binding to an implementation is allowed, as long as we know we're giving up some reusability of our client code etc (and as long as we do it grudgingly). But here, finally, comes my point:

With a type structure like the one above, the whole issue of "should my methods be virtual" goes away completely: The client writer can chose in every instance if they want to bind to the interface (preferable) or to some sealed and completely predictable CertainElevator class - but in no case will they be binding to any virtual methods of the framework designer's making!

So, from the client's perspective:

Whenever you bind to an interface, you are not tied to any particular implementation - in other words, whenever you're binding to an interface, the whole debate of whether some implementation or other uses virtual methods or not is entirely inconsequential to you and your code, as your code is not binding to any non-virtual methods and thus not getting any of the behavioral guarantees they come with!

Those guarantees only come when you are actually binding to an implementation - which, again, should be the exceptional case - and the reason then is that you then happen to rely on having exactly one implementation - which means that in such cases, you should bind to a sealed class and thus again: the whole "virtuals or not?" debate goes away. (Note that in cases where the client really needs to bind to a CertainElevator, following this requirement to its final paranoid conclusion it follows that it is quite possible the client must even write the sealed class themselves, to make *really* sure that it doesn't suddenly change under their feet. Yes, versioning helps, but not 100%)

And from the framework writer/extender's perspective:

As long as you provide the interface, you're good. The rest - the abstract base class (ElevatorBase) and any default implementations (DefaultElevator) - you provide are really just utility classes for implementations. No actual clients should ever be binding to those classes anyway - just to the interfaces and in the exceptional cases to sealed classes (which are of course completely void of virtual methods).

So what if you make "the wrong method virtual" in one of your utility base classes? It is a non-concern, because nobody should bind to these implementations! You can't guarantee any particular behavior for a client that binds to an interface anyway, and when they need to bind to a particular behavior they should bind to a sealed class in which case the answer becomes "making any method virtual is wrong and, actually, won't compile.."

The debate always seems to go back to: "Yeah, but what if someone overrides my virtual method in a bad/clumsy way" - but my question is then: Who is binding to that virtual method, making it a problem? Because if nobody is binding to it - and I have done my best to show why I think no-one ever should be binding to it - then surely it can't be an issue exactly which methods you decide to make virtual.

Sure, making only a few of them virtual might be helpful, in that it gives a quick clue to the extender what could be a good way to extend (template method pattern is pretty nice imo) - and those are the types of arguments around when to make a method virtual that I think do make sense - but you won't be protecting anyone from anything by making the methods of your non-sealed class non-virtual, since nobody should be binding to your non-sealed classes anyway.

Sometimes it seems almost as if the whole issue springs from a desire by framework makers - including Microsoft - to market some of their non-sealed classes as something that it would be a good idea to bind directly to? In that case this would seem non-serious, in fact sloppy and dangerous - filled with false promise of guarantees that are not really there and leading to debates such as this one.

I'll stop now before I repeat myself more, but I'm really confused why everyone seems to see a big problem where I just can't see any at all. So if anyone can see what I'm missing, please let me know.

/Mats