13 January 2008

Avoid mega unit tests

We've just had a posting to the jMock user list that included the following:

I was involved in a project recently where JMock was used quite heavily. Looking back, here's what I found:
  1. The unit tests were at times unreadable (no idea what they were doing).
  2. Some test classes would reach 500 lines in addition to inheriting an abstract class which also would have up to 500 lines.
  3. Refactoring would lead to massive changes in test code.

I've seen this failure mode on another project I've been helping with, so I think there might be a common pattern.

I don't think any unit test code should get that large, except for unusual circumstances. Unit tests are supposed to focus on at most a few classes and shouldn't need a large amount of set up. What I saw on the other project was enormous amounts of preparation and positioning to get the objects into a state where the expected feature could be exercised. Of course it's hard to understand the point of a test when there's just so much code. If you see this pattern, then I'd suggest that the code (or at least the test code) needs breaking up a bit. On the other hand, integration and acceptance tests that happen to be written using a unit test framework might well be larger.

One thing I need to explore is whether the naive use of interaction-testing is particularly susceptible to this failing, or whether it happens all the time and we're the only ones who get complained to. I am, however, convinced that the emphasis on mainly using Mocks to substitute external systems (some of which I perpetrated myself in the early days) is a deeply bad idea which pushes teams towards the sort of problem described here.

18 comments:

Oliver said...

Hi Steve,
we have 2 JMock tests which are getting very large.

Originally I wrote them to help me understand how Hibernate talks to our database. These tests have also been pivotal in finding and avoiding a nasty Hibernate edge case.

Unfortunately each time we extend our data model we have to change the expectations in the test.

I still argue that we need to understand what Hibernate is doing.

We have a compromise: we don't use JMock for all our database tests; for the others we use DBUnit.

DBUnit also has the advantage of really committing transactions to the database, allowing you to discover ancient integrity rules in legacy schemas that we don't have time to learn. It also has its own disadvantages.

I fight to keep our 2 JMock Hibernate tests on the basis that they have found and helped solve what would have been very serious issues.

I could remove them if I had perfect knowledge of Hibernate, but since these tests verify my understanding of Hibernate, even then I'd be on shaky ground.

People moaning about maintaining tests is the thin end of the wedge. If it's slowing you down, speed it up. If you don't understand it, rewrite it.

If you can't rewrite it, then you probably don't understand the thing you've broken and need to ask for help (or, God forbid, do some learning).

If you just can't be bothered to rewrite it, then it's not killing you, probably more on the scale of a slight itch.

OliBye

Steve Freeman said...

Hi Oli,

I'd agree that for testing against Hibernate, I'd prefer to exercise the real thing with integration tests. The configuration is as important as the code.

In your case, I'd wonder whether I could have most of the dependency stubbed out in some automatic way so I could have the test focus on the sliver that really matters — but I'm sure you've considered that.

The risk that I've seen on some projects is that there are things that are killing them and should be rewritten, but the team don't think it's that bad, perhaps because they're used to worse. That's why it's a good idea to have a community where you can get a second opinion.

Colin Jack said...

Yeah I certainly think this is a topic that people need to spend a lot more time documenting.

On the problem affecting other types of tests, I agree. In our case:

1) State - Often these tests use a couple of Object Mothers/Test Data Builders to set up a few domain aggregates, then pass them to some service/entity that does some work. The tests themselves can be simple, but they call out to the object mothers, which internally do quite a bit of work.
2) Integration - They have to inherit from a base class that handles the transaction, and they definitely involve quite a bit of object setup.
3) Interaction - If they involve a lot of mocking/stubbing they get complex very quickly. You also get a different kind of complexity, not least because when you want to find the state tests for the behavior, the ones checking that the correct thing was done, you have to hunt around a good bit. We now use a fluent interface to build the mocks, so we can reuse the mocking code. Realistically, however, a small change in interactions could break quite a few tests.
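As an illustration of the Test Data Builder idea in point 1, here is a minimal sketch in plain Java. The names Order, OrderBuilder, and BuilderDemo are hypothetical and not taken from any project discussed here; the point is only that a builder with safe defaults lets a test state the one value it cares about and leave the rest of the setup out of sight.

```java
// Hypothetical example: none of these classes come from the projects discussed.
class BuilderDemo {
    public static void main(String[] args) {
        // The test only cares about quantity; everything else is defaulted.
        Order order = new OrderBuilder().withQuantity(3).build();
        System.out.println(order.customer + " ordered " + order.quantity + " x " + order.product);
    }
}

// A small immutable domain object standing in for a "domain aggregate".
final class Order {
    final String customer;
    final String product;
    final int quantity;

    Order(String customer, String product, int quantity) {
        this.customer = customer;
        this.product = product;
        this.quantity = quantity;
    }
}

// Builder with defaults that are valid but deliberately uninteresting.
final class OrderBuilder {
    private String customer = "any customer";
    private String product = "any product";
    private int quantity = 1;

    OrderBuilder withCustomer(String customer) {
        this.customer = customer;
        return this;
    }

    OrderBuilder withProduct(String product) {
        this.product = product;
        return this;
    }

    OrderBuilder withQuantity(int quantity) {
        this.quantity = quantity;
        return this;
    }

    Order build() {
        return new Order(customer, product, quantity);
    }
}
```

Unlike an Object Mother that returns a fixed, fully populated object, the builder keeps the relevant value visible at the call site, which may help with the "hunt around" problem described in point 3.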

The problem isn't just about the type of testing, of course; the level you test at has a big effect too. Interaction/state testing at the lowest levels couples you to implementation; doing it at the lower level makes the tests easier to read/write and less fragile. I'm sure there is a way of using different types/levels of tests to maximize benefits; I'm not sure I've found it yet, though.

So I think there generally needs to be more focus on how to write maintainable and useful tests.

Steve Freeman said...

Colin, I think you're right. Most of the material so far has been on "how do I do this at all?" Now we need to work on "how do I do this in practice?", raising the level of skill and judgement in the trade.

We need more material out there beyond the simple two or three class examples.

Colin Jack said...

Yeah, that's my feeling. As you say, more realistic examples would definitely help too.

My opinion is that the risk at the moment isn't just awful tests (interaction or state) but also missed opportunities. Both techniques can help with design but people (probably myself included) sometimes don't take proper advantage of this aspect. People also tend to use one approach or the other, or if they use both they don't really have the knowledge to judge where each is appropriate (other than most people mocking/stubbing external dependencies).

You thus end up with people writing all sorts of tests at all sorts of granularities for all sorts of reasons, which from what I've seen can lead to each technique and TDD in general getting a bit of a bad name.

Steve Freeman said...

Colin.

Yes, I'm getting tired of being given a bad name... :)

Colin Jack said...

Yeah, there does seem to be a bit of a fightback against mocking these days. I used to think the fightback was good because I like state testing (especially for business code), but now I'm turning full circle and wondering if it's just that many people do it badly (myself included) or don't take full advantage of it.

Mocking/stubbing is also an odd topic as you have two types of advantages, one is easy to sell and seems to be what many people focus on whilst the other is much more subtle and arguably more useful.

The easy one is decoupling in order to write unit tests. Many years back, when I started using mocking, it just made sense to mock out dependencies (database/file system/etc). In many cases it still does if you want fast tests.

However, you guys seem to focus a lot on the harder design-oriented issues. Specifying ISP-compliant role interfaces and using them as an aid to your design efforts is more difficult to explain and justify. To me it's also more about changing the way you design, which is hard to sell, and in many cases I think people overuse the technique (which happens with every good technique). Having said that, I'm certainly interested in working out whether/how/when I should weave it into my design process.

Steve Freeman said...

Well, we went from obscure to backlash in one.

Now, after a while, quite a few people understand what we meant but many (including some published names) don't appear to. Apparently, victory is not always to the swift (or so I keep telling myself).

Colin Jack said...

Wow, but hasn't mocking been around for a while now? In general, though, it does seem like your message hasn't really been heeded.

Whilst I'm commenting on the whole thing, I should say that I think the discussion could lead to it being overused. This happens with other topics too. Take IOC: some people use DI/IOC to create truly decoupled systems where no class really has much of a dependency on any other class. To me that's massive overdesign, and I think that then causes backlash (like the whole storm we had over IOC recently).

Anyway, I digress, but I did wonder: have you thought about a wiki or something similar?

Steve Freeman said...

We published at the first XP conference in 2000. Some people agree with our approach, some don't.

It can be overused, like everything, although my guess is that misunderstanding is more common.

This is our current forum; we're working on publishing a book, which is rather taking up the cycles at the moment.

Ernest Hammingweight said...

I look forward to reading your book. When do you think that it'll be available?

Steve Freeman said...

Now there's an interesting question to which, as Agilistas, we should have a better answer.

We'll probably post some large chunks up in the next few weeks.

Colin said...

I read with interest the problem discussed here of tests getting long and difficult to work with. I've found myself in the same position in the past - especially when working with legacy code and tests.

My observation is that problems tend to occur under some of the following circumstances:

* When there are relatively high levels of test coverage (generally a good thing in itself)
* When the code units under test are relatively coarse-grained and have complex dependencies
* When the code involved has an interface with generic operations - whose behaviour is greatly dependent on the supplied parameters
* When the operations being mocked or tested have complex parameter graphs or use message based interfaces e.g. contained in an XML string
* When the code under test makes repetitive calls on mocked code

In these cases, the tests often do become long and involved. Worse, they may tend to overlap with each other - which means that the same code is tested more than once. When a piece of code is changed, it may affect many tests. When a test fails, it may be difficult to work out why or even where it is failing.

Often these problems are symptomatic of badly written code or badly written tests - in which case, the best solution is to refactor as necessary to improve matters. Some steps that I find to help when writing or refactoring tests are:

* Make tests specific to the test case - try not to add assertions that aren't strictly relevant to the test.
* Try to ensure that the requirement under test is clear from reading the test code.
* Pay attention to the design of test code and its impact on test failure diagnosis - if an expectation fails, will it be clear where it has failed and what the problem is?

What this means in practice will depend partly on the tools and development approach being used. For example, with EasyMock, I tend to use custom argument matchers a lot to give myself more control over the argument checking.
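To make the idea of a custom argument matcher concrete, here is a minimal hand-rolled sketch in plain Java. It deliberately does not use the EasyMock or jMock APIs; Gateway, OrderSender, and MatchingGateway are hypothetical names invented for illustration. The code under test sends an XML message, and the test double checks only the part of the message that the test is about, ignoring the timestamp.

```java
import java.util.function.Predicate;

// Hypothetical example: a hand-rolled stand-in for a library argument matcher.
class MatcherDemo {
    public static void main(String[] args) {
        // Match only on the order id; the timestamp is irrelevant to this test.
        MatchingGateway gateway =
                new MatchingGateway(xml -> xml.contains("<orderId>42</orderId>"));
        new OrderSender(gateway).send(42, "2008-01-13T10:00:00");
        System.out.println("matched: " + gateway.matched());
    }
}

// The message-based interface that the code under test talks to.
interface Gateway {
    void post(String xml);
}

// Code under test: builds an XML message and posts it.
class OrderSender {
    private final Gateway gateway;

    OrderSender(Gateway gateway) {
        this.gateway = gateway;
    }

    void send(int orderId, String timestamp) {
        gateway.post("<order><orderId>" + orderId + "</orderId><sentAt>"
                + timestamp + "</sentAt></order>");
    }
}

// Test double that records whether any posted message satisfied the matcher.
class MatchingGateway implements Gateway {
    private final Predicate<String> relevantPart;
    private boolean matched = false;

    MatchingGateway(Predicate<String> relevantPart) {
        this.relevantPart = relevantPart;
    }

    @Override
    public void post(String xml) {
        if (relevantPart.test(xml)) {
            matched = true;
        }
    }

    boolean matched() {
        return matched;
    }
}
```

Because the matcher ignores everything except the order id, a change to the timestamp format would not break this test, which is one way to limit the overlap between tests described above.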

For a different perspective on argument testing, take a look at the SevenMock mock object implementation (www.sevenmock.org). My motivation for writing this was to make parameter assertions the explicit responsibility of the test code. There are various pros and cons of this approach - I'd certainly be interested to hear any opinions that people might have. Any chance of adding a link to the mockobject.com site, Steve?

Nat Pryce said...

How are the parameter assertions in SevenMock any more explicit than parameter matchers in jMock? They seem to be used for the same thing.

Colin said...

Hi Nat,

I think my observations should apply to most mock object implementations - since most allow custom matching of parameters.

I guess you're referring to the statement in the SevenMock documentation "Since assertions are coded explicitly in the test class..."? I agree that jMock parameter matchers are just as explicit. I also find them concise and easy to read - which is something I really like about jMock.

Perhaps I can find a better way to word this, though - what I was trying to convey was that the assertions are executed in-place in the test code, rather than being evaluated later by the mock object runtime. The implication is that the assertions are coded in Java and that the stacktrace resulting from any assertion failure will point back to the assertion rather than the mock runtime - more like EasyMock matchers than jMock.

Nat Pryce said...

Colin, the stacktrace of a jMock expectation failure refers to the *precise point where the code under test failed*, not to the jMock runtime. Making assertions after the fact means that the stacktrace cannot store that information.

JMock is there in the stack-trace, but it does not represent where the failure was detected. I usually filter the jMock stacktrace elements out with my IDE so that the error message is easier to read.
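The general principle can be sketched with a hand-rolled strict mock in plain Java. This is not jMock itself, and AuditLog, Account, and StrictAuditLog are hypothetical names: the mock throws at the moment of the unexpected call, so the resulting stack trace contains both the mock's frame and the frame of the code under test that made the call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a hand-rolled fail-fast mock, not the jMock runtime.
class FailFastDemo {
    public static void main(String[] args) {
        System.out.println(framesOfFailure());
    }

    // Runs the code under test against a strict mock and returns the method
    // names found in the stack trace of the resulting failure, top frame first.
    static List<String> framesOfFailure() {
        StrictAuditLog mock = new StrictAuditLog("deposit:10");
        Account account = new Account(mock);
        List<String> methods = new ArrayList<>();
        try {
            account.withdraw(10); // not the expected interaction
        } catch (AssertionError e) {
            for (StackTraceElement frame : e.getStackTrace()) {
                methods.add(frame.getMethodName());
            }
        }
        return methods;
    }
}

interface AuditLog {
    void record(String entry);
}

// Code under test.
class Account {
    private final AuditLog log;

    Account(AuditLog log) {
        this.log = log;
    }

    void deposit(int amount) {
        log.record("deposit:" + amount);
    }

    void withdraw(int amount) {
        log.record("withdraw:" + amount);
    }
}

// Strict mock: fails immediately when the call does not match the expectation.
class StrictAuditLog implements AuditLog {
    private final String expected;

    StrictAuditLog(String expected) {
        this.expected = expected;
    }

    @Override
    public void record(String entry) {
        if (!expected.equals(entry)) {
            throw new AssertionError("unexpected call: record(" + entry
                    + "), expected record(" + expected + ")");
        }
    }
}
```

The mock's own record frame sits at the top of the trace, with the withdraw frame of the code under test below it. A check made after the code under test has returned would lose that second frame.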

Roberto Simoni said...

I think JMock tells you about the design of your modules/classes.
For example... if you begin to write lots of expectations, JMock is saying to you "hey guys, you haven't done a good redesign", and the answer *must* be "ok, I'll redesign". Instead, I usually see that the answer is "Let's go JMock, this is another expectation, eat it!!!".
Well, this is a joke, but I hope you understand what I mean.