I am pleased to report that the campaign goes well thus far.
We met the enemy at the border of WebLogic, and battle was joined, SVN commits clashing with intractable lengths of binary spaghetti at every turn.
Our original plan was to crush all remote beans entirely, liberating the entire application from this particular scourge. Much progress has been made towards this goal, although the enemy is powerful and resourceful.
We can confirm that at least five of our opponents fell early in the glorious battle, with minimal losses on our side, despite the heavy resistance. At several points in the battle the WTF/minute meter was rising to dangerous levels. One particularly brave refactor was rolled back in a blaze of glory, still muttering “stay in target directory… stay in target directory”…
Our initial attempt at penetration of the enemy’s lines was at the JMX pass. Despite valiant attempts by our brave tests, we were eventually defeated in this attack by proprietary “extensions” to what should have been well-accepted standards. The enemy knows no shame, and will stoop to perverting even innocent RMI for its nefarious purposes. In fact, they dredged up what should have been long-forgotten and discarded protocols to throw us off the track, even resorting to JMX over IIOP, with proprietary client libraries, which was too much for us – we could not withstand that level of firepower. We had to find another way.
We were repelled by the evil machinations of the enemy’s classpath conflicts at the battle of the EAR, but were able to outflank them and retreat to a previous revision number with minimal losses. We then regrouped and attacked again in the region of the remote stateless session bean, and only one of the enemy escaped our wrath. We will be watching this one remaining combatant carefully, and at his first slip-up he is sure to be doomed as well. We have assigned our best integration test to watch his every move.
Despite attempts to deceive our troops with obscure and fiendish JNDI naming practices, we were able to ensnare many of the enemy in our powerful web of functional tests, where they can do no further harm.
At least four entire modules of the enemy’s forces did not survive the encounter!
We have won this conflict for now, yet many challenges still lie ahead in future battles. The enemy used many clever tactics this time, but we have not yet encountered their ultimate weapon: the greatly-feared container-managed entity bean! Weapons of mass deprecation of this nature must not be allowed to continue to intimidate future generations. Massive as these weapons are, they are slow and no match for our lightweight DDD techniques – we will lure them into shallow memory and watch them go aground, tangled in their own distributed transactions.
For now, though, our troops have withdrawn to the safety of our own home OSGi modules, where order, structure and modularity reign supreme.
Stay tuned for future reports…
In my last post, I discussed the concept of “technical debt bankruptcy”, a term for when you make the informed decision that a particular piece of software is costing more to maintain than it would cost to re-write.
Two closely related concepts are how to determine exactly and safely when this point has arrived, and how to handle the “restructuring of debt” that comes with the technical equivalent of a chapter 11 re-organization.
As a colleague pointed out in a comment about the last post, you can’t simply ignore the baked-in business value in the old solution. If you’re considering re-writing it, instead of just turning it off, then there must be someone using it for a valuable purpose.
First, though, you must be able to determine if your project (or some part of it) “qualifies” for such a process. Some projects don’t lend themselves to a proper restructuring, and might simply need to be put out of their (and your) misery, if the business value has evaporated.
This requires measuring, not guessing, even though sometimes there are items involved that are hard to measure.
Some of the things you need to measure are:
1. What is the current business value? This involves asking who uses the feature, how much, and what its relative importance is in the overall project. Sometimes it is difficult to get objective metrics on this, but that’s what’s needed to make an informed decision as opposed to a guess.
Gut feel does not count as a metric – you need to know who is using the system, how often, and for what purpose. Then you can determine if that purpose is equally well or better served by some other system, allowing that user to be weaned off the legacy system under consideration.
If you have logging or other metrics of use being gathered on the old system, this can be a good place to start, although you may need to augment the current metrics to get a baseline. Be sure that you’re looking at a long enough period to get a real picture, as often legacy systems have very intermittent use patterns.
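As a starting point for that baseline, usage logs can be boiled down to a per-feature summary over a chosen window. The following is a minimal sketch in Java (16 or later, for records), assuming the legacy system’s logs can already be parsed into hypothetical `UsageEvent` entries carrying a user, a feature name and a timestamp; the parsing itself depends entirely on the real log format and is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.*;

// One usage event from the legacy system's logs. Hypothetical shape:
// adapt the fields to whatever the real logs actually record.
record UsageEvent(String user, String feature, Instant at) {}

// Per-feature summary over an observation window.
record FeatureUsage(String feature, long calls, int distinctUsers) {}

final class UsageBaseline {

    // Summarises the events falling inside [windowStart, windowStart + window).
    static List<FeatureUsage> summarise(List<UsageEvent> events, Instant windowStart, Duration window) {
        Instant windowEnd = windowStart.plus(window);
        Map<String, Long> calls = new TreeMap<>();
        Map<String, Set<String>> users = new TreeMap<>();
        for (UsageEvent e : events) {
            if (e.at().isBefore(windowStart) || !e.at().isBefore(windowEnd)) {
                continue; // outside the observation window
            }
            calls.merge(e.feature(), 1L, Long::sum);
            users.computeIfAbsent(e.feature(), k -> new HashSet<>()).add(e.user());
        }
        List<FeatureUsage> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : calls.entrySet()) {
            result.add(new FeatureUsage(entry.getKey(), entry.getValue(), users.get(entry.getKey()).size()));
        }
        // Most-used features first, so the rarely used ones collect at the bottom.
        result.sort(Comparator.comparingLong(FeatureUsage::calls).reversed());
        return result;
    }
}
```

Because legacy usage is often intermittent, the window is a parameter rather than a constant: features with few calls from a single user over a long window are the first candidates for a conversation about whether they still carry business value.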
2. What are the features, exactly? The best way to determine this is to write behavior-driven decoupled functional tests around the features of the system under consideration.
What I mean here is a list of scenarios that describe use cases of the feature in question, along with executable specifications for each such case, written so that they do not have any dependencies directly on the code under test.
Like all good tests, they need to be repeatable, atomic (at least at the whole scenario level), and automated. One example might be Selenium/Specs tests that exercise a feature of a web application by simulating a user interacting with the application from a browser. The tests must reflect how the feature is actually used, not how it was intended to be used, as often in legacy apps these are not at all the same thing.
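One way to keep such specifications decoupled is to let each scenario reach the system only through a narrow driver interface, so the scenarios themselves never import code from the application under test. A sketch under that assumption: `FeatureDriver`, `Scenario` and `ScenarioRunner` are hypothetical names, and a real driver might wrap Selenium or an HTTP client pointed at a deployed instance.

```java
import java.util.List;
import java.util.Map;

// The only contact point between the specification and the system.
// Hypothetical: a real implementation would drive a browser or call the app over the network.
interface FeatureDriver {
    String submit(String feature, Map<String, String> input);
}

// One scenario, recording how the feature is actually used: an input and the observed output.
record Scenario(String description, String feature, Map<String, String> input, String expected) {}

final class ScenarioRunner {
    // Runs every scenario through the driver and returns the descriptions of those that fail.
    static List<String> failures(FeatureDriver driver, List<Scenario> scenarios) {
        return scenarios.stream()
                .filter(s -> !s.expected().equals(driver.submit(s.feature(), s.input())))
                .map(Scenario::description)
                .toList();
    }
}
```

Because the scenarios depend only on `FeatureDriver`, the same list can later be run unchanged against a re-written system by swapping in a driver that points at it.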
3. The cost to re-write (or “aggressive refactoring”, as you might call it, given that there are existing functional tests).
This is a tough one, as often the technologies and techniques that might be used for a rebuild of an existing system are very different from those that were used in its original construction – especially if a lot of time has passed. Web and application development techniques have come a long way since, for instance, JSP 1.0, and the amount of effort to get an operational application is massively smaller than it used to be.
You need to estimate carefully, referring to the functional tests as the specification for the new system to be built. Once those functional tests are passing against the re-write, then you have, by definition, replaced the old system with the new, as the functional tests provide the full definition of what needs to be re-written. Don’t gold-plate, and don’t neglect to cover every valuable feature with the functional tests.
Now it’s down to the hard numbers: Let’s say that the old system takes 500 units (of something) to implement a new feature, on average. If we have 10 features in our backlog, our estimated total cost is then 5000 units.
If our new system will take 2000 units to build, and the cost of a new feature is then 50 units (and this is a very reasonable ratio, in my experience), then the math works out like so:
Total forecast for adding our backlog to the old system: 5000 units (10 features × 500 units).
Total forecast for creating the new system and building the 10 features in the backlog: 2500 units (2000 to build, plus 10 features × 50 units).
This step is often made harder by the fact that some of the stakeholders in a project may not fully understand or have faith in either the estimates or the functional tests.
If they don’t have faith in the functional tests, then they must be able to read them and point out where the test doesn’t cover a valuable use-case. If they can’t, then they must accept that the tests do in fact specify the system adequately.
The estimates are another matter – often stakeholders won’t understand how large an advantage modern tools and techniques can offer, and find it difficult to believe that for the cost of a few bug fixes or new features you can actually entirely replace a running legacy application. This is where non-technical stakeholders must have enough confidence in the professionalism and expertise of their developers – and nothing can substitute for that.
4. Conclusion
Now we have to take into account elements like migration of users from the old system to the new, data transfer (if any – perhaps the new system can be built on the same database as the old, in which case this doesn’t cost anything). Assuming these are, let’s say, 500 units (and that’s a lot, proportionally), we are still 2000 units ahead of the game with our current backlog, and that’s not considering if the new system costs less to deploy and/or operate than the old one. If it does, then the decision is even easier.
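The arithmetic above can be checked with a few lines of Java. This is a sketch of the post’s own example figures rather than a general cost model; it also derives a break-even count (the smallest backlog for which the re-write is strictly cheaper), which follows from the same numbers.

```java
// Worked version of the rewrite-versus-maintain arithmetic. "Units" are whatever
// the team estimates in; the figures in main are the example numbers used above.
final class RewriteDecision {

    // Cost of delivering the backlog on the legacy system.
    static long keepLegacyCost(int backlogFeatures, long legacyCostPerFeature) {
        return backlogFeatures * legacyCostPerFeature;
    }

    // Cost of building the new system, delivering the same backlog on it, and migrating.
    static long rewriteCost(long buildCost, int backlogFeatures, long newCostPerFeature, long migrationCost) {
        return buildCost + backlogFeatures * newCostPerFeature + migrationCost;
    }

    // Smallest backlog size n for which n * saving > buildCost + migrationCost.
    static long breakEvenFeatures(long buildCost, long migrationCost,
                                  long legacyCostPerFeature, long newCostPerFeature) {
        long savingPerFeature = legacyCostPerFeature - newCostPerFeature;
        if (savingPerFeature <= 0) {
            throw new IllegalArgumentException("the new system never pays for itself");
        }
        long fixedCost = buildCost + migrationCost;
        return fixedCost / savingPerFeature + 1;
    }

    public static void main(String[] args) {
        long legacy = keepLegacyCost(10, 500);          // 10 x 500
        long rewrite = rewriteCost(2000, 10, 50, 500);  // 2000 + 10 x 50 + 500
        System.out.println("Keep legacy: " + legacy);            // 5000
        System.out.println("Re-write:    " + rewrite);           // 3000
        System.out.println("Advantage:   " + (legacy - rewrite)); // 2000
        System.out.println("Break-even:  " + breakEvenFeatures(2000, 500, 500, 50)); // 6
    }
}
```

With these figures the fixed cost of 2500 units is recovered at 450 units saved per feature, so the re-write is already ahead from the sixth feature onward.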
Of course, once the current backlog is handled, all future features have a 10 to 1 advantage over maintenance on the new system as well, which is where the real long-term massive savings come into play – not to mention the developer satisfaction, lower defect rate, and so forth.
We’re probably all familiar with the term “technical debt”, meaning the cost of doing things in a non-optimal or non-quality way. While I can go on at length (as my colleagues can attest!) about how this is avoidable by baking in quality, and thus saving time and money at every turn, the fact is that many existing projects have considerable technical debt.
Setting aside for the moment the discussion of telling “good” technical debt from “bad” technical debt, let’s just focus on a project’s “bad” technical debt.
If we describe this kind of debt as factors that slow down the ability to change and improve the system, then we see that we are paying the “interest” on this debt every time we touch the codebase or its deployed instances.
What I’d like to focus on in this posting is the point at which this technical debt is evaluated to be sufficient that it makes sense to do what we’d normally call a “re-write” of the offending system or subsystem. Basically, when the -ah- mud gets so deep that the hip-waders aren’t helping, it might be time to throw in the towel and start a do-over.
I call this point “technical debt bankruptcy”. Much like a real bankruptcy, it’s an admission that chipping away at the debt isn’t going to work and isn’t worth it – that it’s time to re-group, and in a chapter-11-ish way, fold our tent in a responsible fashion.
Of course, determining if you’re at this point is critical. If you’re not, then you might be throwing out the baby with the bathwater and losing valuable work and former effort for reasons that are not sufficient. Often, political reasons can get us into that kind of situation, where the pressure to do a re-write is not justified. If there are no or few changes to a system, and it’s working sufficiently well as is, then there may be no reason to declare bankruptcy.
If, however, the bill collectors of technical debt are knocking down the virtual door, it’s important to know when to make the right move. As the song says, “know when to fold ’em”.
Part of knowing this is to be able to measure the pace and cost of change, and to be able to estimate, or better, measure, the cost of change if a re-write were done. Let’s say you’ve got a legacy project and a new project. The legacy project is using some old technology and techniques that are painful to work with, and you know they’re causing you to burn more time than they should. If you also have a newer project (maybe something nice and greenfield) that’s being done with the latest new and shiny tools, Agile techniques, and so forth, you can get a rough idea of what each feature point in the new project is costing you. Now you can compare this to the cost of a feature point in the old project.
If you can look at your backlog of epics and get an idea of what future maintenance on the legacy project is going to cost you, then you can take that same backlog and estimate it instead using the ruler of the new technologies and techniques. The delta is, essentially, the amount you’ve got available to “spend” on a re-write.
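Measuring the two rulers, rather than guessing them, might look like the sketch below: average effort per feature point is taken from work actually delivered on each project, and the difference is applied to the backlog. `Delivered`, `RewriteBudget` and the effort units are hypothetical, and the approach assumes feature points on the two projects are sized comparably, which is worth checking before trusting the result.

```java
import java.util.List;

// One completed feature: its size in points and the effort it actually took
// (hours, days, whatever the team tracks). Hypothetical shape.
record Delivered(int points, double effort) {}

final class RewriteBudget {

    // Average effort per point, measured from delivered work.
    static double costPerPoint(List<Delivered> history) {
        int points = history.stream().mapToInt(Delivered::points).sum();
        double effort = history.stream().mapToDouble(Delivered::effort).sum();
        if (points == 0) {
            throw new IllegalArgumentException("no delivered points to measure");
        }
        return effort / points;
    }

    // The delta: the backlog priced with the legacy ruler minus the same backlog
    // priced with the new one, i.e. what is available to "spend" on a re-write.
    static double budget(int backlogPoints, List<Delivered> legacy, List<Delivered> greenfield) {
        return backlogPoints * (costPerPoint(legacy) - costPerPoint(greenfield));
    }
}
```

If the re-write, estimated with the new project’s ruler, costs less than `budget(...)`, the math favours declaring bankruptcy.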
If the math is right, then declare your technical debt bankruptcy and begin anew!
In some further posts, I’ll explore how to do a well-organized and structured “chapter 11” on a project, rather than just dropping the ball, including the part that functional tests play in this process, and look at a re-write as a form of highly aggressive refactoring, rather than a whole new project.
Recently I’ve had the opportunity to consider the right qualities of a tool and/or framework for acceptance testing.
Acceptance tests can be found at a number of different levels, depending on how the story criteria are expressed (see my previous post on levels of testing). Often they’re called “executable specifications” if they’re written in such a way as to describe the behavior of a system in a given scenario.
Often they are functional or integration tests, and generally they are best expressed as “black box” tests, that is, tests that have no awareness of the internals of the code or component being tested – all they see is what goes in and what comes out, and they assert their success or failure based on those elements alone.
An acceptance test must be comprehensible to the story author, or whatever domain expert is going to actually do the “accepting” that the story has been satisfied. If they simply take the word of a developer that this big ball of code they see in front of them is testing what they said it should, that’s not as good as if they can actually read and understand the executable specification (test) themselves. Ideally, they should even be able to author their own acceptance tests, without a developer involved – perhaps using existing tests as an example.
So, some of the criteria for a good tool might be:
Given these criteria, should the tests be part of the build process for the piece of code under test? For acceptance tests that span modules, this is not practical – you can’t actually run the test when you build a module, you can only run it after a certain set of modules has been built (and perhaps even deployed).
I would propose that acceptance tests don’t even belong with the build they’re testing. I think they belong elsewhere, in an independent location where they can be updated as stories are written and verified. Ideally, there should be a shared space for stories for an entire system or suite of applications, as a single criterion often spans multiple components – so organizing the tests by component is artificial at best.
Another question that comes up when considering tools for these kinds of tests is whether or not they must be stored and versioned with the code they validate. As a developer, my immediate reaction is “yes” – until I think about it a bit. How does it help me to know what tests a previous version of my code passed or did not pass? All that tells me is where I was in the past, not where I am now. How much inconvenience am I willing to put up with on the part of test authors to get this capability? Dealing with any version control system is extra work, especially for a domain expert/BA who’s not also a developer. I’m now convinced that where I am now is more important than where I was, and being able to easily write and run and make visible tests is more important to me than knowing what happened back in history.
Let’s examine a few different tools and ways of doing acceptance tests and look at the pros and cons:
JUnit
Developers working with Java have a choice of a number of excellent test frameworks, including TestNG and JUnit. They are likely already familiar with them, and the tests fit nicely into the build cycle of Java applications built with any of the more popular project build and management tools, such as Ant, Buildr, Maven, and so forth.
Our acceptance tests are written just like a functional test, in that they fire up whatever context our application should run within, then provide input to it and verify the output with assertions.
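A minimal sketch of such a functional test, written as plain Java so it runs without JUnit on the classpath (in a JUnit suite the body of `main` would sit in a test method). The `/greeting` endpoint and the stand-in server are hypothetical, built on the JDK’s `com.sun.net.httpserver` package; the check itself only knows a base URL, sends input over HTTP and asserts on what comes back.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class GreetingAcceptanceCheck {

    // Hypothetical stand-in for the application: answers GET /greeting?name=X.
    static HttpServer startStandIn() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0); // port 0 = any free port
        server.createContext("/greeting", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. "name=Ada", or null
            String name = query != null && query.startsWith("name=") ? query.substring(5) : "stranger";
            byte[] body = ("Hello, " + name).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // The test side: input goes in over HTTP, output comes back over HTTP, nothing else is visible.
    static HttpResponse<String> get(URI base, String pathAndQuery) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(base.resolve(pathAndQuery)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startStandIn();
        try {
            URI base = URI.create("http://localhost:" + server.getAddress().getPort() + "/");
            HttpResponse<String> response = get(base, "greeting?name=Ada");
            if (response.statusCode() != 200 || !response.body().equals("Hello, Ada")) {
                throw new AssertionError("unexpected response: " + response.statusCode() + " " + response.body());
            }
            System.out.println("acceptance check passed");
        } finally {
            server.stop(0);
        }
    }
}
```

For brevity the stand-in runs in the same JVM as the check; a real acceptance test would point the base URL at a separately deployed instance.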
This kind of test is not well-suited for acceptance tests for a number of reasons. First, it’s hard to separate the code from the test to ensure we’re truly doing black-box testing. If, for example, we’re within the same VM as the thing we’re testing, it’s very tempting to manipulate objects directly in our test, as opposed to going through, let’s say, the published REST API. This also makes it more difficult to initialize a new testable instance of the application: to be truly independent from it, our test should launch the system under test in a JVM of its own, which is non-trivial from within Java, although of course re-usable fixtures can be written to take the sting out of it.
This doesn’t help us much when our test spans multiple modules or components, however – then we must either create stubs for each of the components we depend on, or work in a much more complex deployment process to create a “live” version of each of our dependencies.
Finally, a JUnit test is not particularly readable by a non-developer, and not particularly visible, other than as a green or red bar in an IDE or CI server somewhere, so it fails a couple of our most important criteria.
JBehave, easyb, Specs, RSpec
BDD frameworks such as those listed above take our testing power in a different direction. They mostly rely on sophisticated DSLs, enabling us to write our tests in a much more English-like fashion, often quite readable to non-developers who have taken the time to learn a bit of the DSL.
They still suffer from some of the other problems described above, however – although some of these tools do offer better output formats to make their results more visible (such as the excellent Forms capability in specs, which can show test results in HTML table formats).
They of course still have a learning curve, as the DSLs in each case are another whole language to be learned.
FitNesse
A different approach can be seen in tools like FitNesse, which allows tests to be created in a Wiki, by editing web pages and inserting special markup to call test “fixtures”. These fixtures still have to be customized to the situation, of course, but with FitNesse we have the potential of being de-coupled from a single module or system under test, and of developing our tests independently of the code altogether.
The table approach of FitNesse still requires the test author to understand the fixtures available to FitNesse – this is essentially FitNesse’s DSL – but of course only a fairly small number of fixtures need be learned to be able to write a wide variety of tests.
Some projects, however, in an attempt to retain the ability to use FitNesse tests to verify previous versions of their application, take the step of checking the FitNesse tests in to their version control system, thereby losing some of that independence.
FitNesse has its own ability to track changes to its Wiki pages (and thus its tests), but this is not tied to the checkins or releases of the software under test.
FitNesse makes it easy for a non-developer domain expert to author tests by using existing tests as templates, then changing the inputs to the fixtures being used to create new tests from the existing building blocks.
These tests and their results are highly visible, especially if the FitNesse wiki is hosted and available to anyone (including developers) to use at any time to verify against a specific deployed instance of the entire suite of applications.
Of course, you’ve still got the issue of deploying a testable instance of your system to deal with in FitNesse. One approach I’ve seen to solve this is to actually have a fixture that can deploy the testable instance as part of the set up for the whole test suite, using versions to indicate the revision of code to be tested – e.g. you have a fixture that says “deploy module X version 123 to test environment 1”, “deploy module Y version 345 to test environment 1” and so forth, then executes its tests against those deployed instances.
Of course, any database cleanup/reset to get to known state can happen before or just after the deployment, so your tests always start from a known point.
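The set-up logic such a suite-level fixture might delegate to can be sketched in plain Java. This is not FitNesse’s fixture API: `TestEnvironment`, `Deployment` and `SuiteSetUp` are hypothetical names, and the sketch chooses to validate every step first, then reset the database, then deploy each module at the requested version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical operations a test environment exposes; a real implementation
// would call the team's own deployment scripts and database tooling.
interface TestEnvironment {
    void resetDatabase();
    void deploy(String module, String version);
}

// One "deploy module X version 123 to test environment 1" instruction.
record Deployment(String module, String version, int environment) {
    private static final Pattern STEP =
            Pattern.compile("deploy module (\\S+) version (\\S+) to test environment (\\d+)");

    static Deployment parse(String line) {
        Matcher m = STEP.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a deploy step: " + line);
        }
        return new Deployment(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }
}

final class SuiteSetUp {
    // Validates all steps up front so a bad step never leaves the environment half-reset.
    static void prepare(TestEnvironment env, int envId, List<String> steps) {
        List<Deployment> plan = new ArrayList<>();
        for (String step : steps) {
            Deployment d = Deployment.parse(step);
            if (d.environment() != envId) {
                throw new IllegalArgumentException("step targets environment " + d.environment());
            }
            plan.add(d);
        }
        env.resetDatabase(); // known starting state for every run
        for (Deployment d : plan) {
            env.deploy(d.module(), d.version());
        }
    }
}
```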
An important part of making this fully automatable is a mutex service: a way to make a call to a known location and say “check out test environment 1″, basically saying that test environment 1 is now busy until it’s released by another call. This ensures that you don’t start another test run on the same environment while it’s still in a transition state.
The mutex can also of course report on who has what environment in use, and since when, to detect failures that might not release a lock.
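A minimal in-memory sketch of such a mutex follows. `EnvironmentMutex` and `Lease` are hypothetical; a shared service would keep the same state behind an HTTP endpoint or a database table so that every CI job sees it, but the check-out, release and reporting rules would look much the same.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Who holds an environment, and since when.
record Lease(String environment, String owner, Instant since) {}

final class EnvironmentMutex {
    private final Map<String, Lease> leases = new ConcurrentHashMap<>();
    private final Supplier<Instant> clock; // injectable so stale-lock detection can be tested

    EnvironmentMutex(Supplier<Instant> clock) {
        this.clock = clock;
    }

    // "Check out test environment 1": succeeds only if nobody currently holds it.
    boolean checkOut(String environment, String owner) {
        return leases.putIfAbsent(environment, new Lease(environment, owner, clock.get())) == null;
    }

    // Only the current holder may release; the conditional remove keeps this atomic.
    boolean release(String environment, String owner) {
        Lease current = leases.get(environment);
        return current != null && current.owner().equals(owner) && leases.remove(environment, current);
    }

    // Leases held longer than maxAge probably belong to a run that failed without releasing.
    List<Lease> staleLeases(Duration maxAge) {
        Instant cutoff = clock.get().minus(maxAge);
        return leases.values().stream().filter(l -> l.since().isBefore(cutoff)).toList();
    }

    // Who has what environment in use, and since when.
    List<Lease> report() {
        return List.copyOf(leases.values());
    }
}
```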
FitNesse has its warts, however, even in a scenario like this, but overall I’ve seen it succeed more often than other approaches for the high-visibility acceptance tests that many projects need, while tools such as easyb, specs, and RSpec are better for describing behaviours within a single module, and executing as part of that module’s build process.
Why do we want to go fast in the first place? Well, if we’re not accumulating technical debt, our quality is still within the bounds we’ve set for ourselves, and we’re satisfying user stories, then we want to be able to accomplish as much as we sustainably can each sprint. This way we can deliver value faster, and people tend to like to pay for that kind of thing.
Well, first, we need a road: the basics of an agile environment need to exist before we can go very fast at all. If we’re working on the build system every sprint and trying to get the basics of a story understood, we’re still in road construction, and we shouldn’t expect much in the way of speed until we get some of this basic asphalt laid down and smooth. These include such basics as a good development setup for our team, a basic understanding of agile principles and a grasp of the technology stack we’re using, and the support of a continuous integration server, to mention a few. If we’re attempting to bounce along over the potholes without setting up a proper environment for rapid delivery of software value, we’ll reap what we sow.
Assuming we’ve built the road, then, what things tend to hold us back? Just like on a real road, the only things stopping us from going faster and faster (to the mechanical limit of our vehicle, or in our case, our keyboards and brains) are either externally-imposed limitations (e.g. a speed limit and cops to enforce it), or our own ability to control the pace without going off the road. In software construction, as in life, we can go off the road in many varied ways, but they all tend to be spectacular, destructive, and painful. Unlike the real road, we can be cruising along for some time before we discover we’ve left the asphalt behind and are sailing over a cliff.
We’ll start by assuming that our corporate environment has eliminated externally imposed speed limits and political roadblocks – not always a safe assumption, but let’s assume for the moment that we’re among the lucky developers who work in such a situation.
Our top-level speedometer, to overuse our analogy a bit, is our velocity, measured in features per iteration, or complexity points per iteration – in other words, how much business value are we adding per time period?
The most common way to go off the road is for quality to slip. This can be detected in one of a number of ways, including an ever-increasing defect rate. If most of your sprint is taken up by fixing defects or paying off code debt, then you’re probably trying to go too fast (or you’ve not finished laying the road after all). Of course, it’s possible you’ve just got a few bad drivers on your team, but we’ll assume that’s easier to see (if not necessarily easier to fix). What’s worse than seeing quality slip? Not seeing quality slip, even though it is. We can’t measure quality directly, per se, but we sure can measure a lot of other things. Once we know what the normal position of each gauge is (e.g. once we establish reasonable code standards that we can measure), then we can watch them to get early warning of things going awry.
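Two of these gauges, velocity and the share of each sprint spent on defects, are simple enough to compute from sprint records. A sketch, assuming a hypothetical `Sprint` record and a baseline defect share the team has already agreed is normal; the tolerance is likewise an assumption to be tuned.

```java
import java.util.List;

// One sprint's numbers: points delivered as new features, and points spent
// fixing defects. Hypothetical field names.
record Sprint(String name, int featurePoints, int defectPoints) {
    double defectShare() {
        int total = featurePoints + defectPoints;
        return total == 0 ? 0.0 : (double) defectPoints / total;
    }
}

final class SpeedGauges {

    // Average velocity: feature points per sprint.
    static double velocity(List<Sprint> sprints) {
        return sprints.stream().mapToInt(Sprint::featurePoints).average().orElse(0.0);
    }

    // Early warning: sprints whose defect share exceeds the baseline by more than the tolerance.
    static List<String> warnings(List<Sprint> sprints, double baselineShare, double tolerance) {
        return sprints.stream()
                .filter(s -> s.defectShare() > baselineShare + tolerance)
                .map(Sprint::name)
                .toList();
    }
}
```

A sprint that trips the warning is a prompt to slow down and look, not proof that quality has slipped.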
Just like on the real road, we need two categories of things, it seems, to help us go as fast as safely possible: I’ll call them headlights and guardrails.
Headlights
The most basic tools here are user stories, acceptance criteria/tests (ideally executable ones), and metrics such as defect rate and velocity measurements. None of these are trivial or straightforward, and it’s easy to think you’ve got a good view and suddenly discover you’ve been accumulating code debt without realizing it. The only proper reaction at that point is to slow down and correct the problem, as we’ll discuss below.
A good business understanding of the goals and epics behind our user stories gives us more range to see further ahead, and going fast requires looking further ahead, while at the same time paying attention to where you are at the moment.
Just like when driving we must be aware of the road immediately ahead, our user stories give us the close-focus we need to be doing the immediately useful thing. We can’t discard these in favor of looking further ahead exclusively, or we’ll never get to where we want, but we can combine that with an awareness of both the near future and an understanding of the overall destination to make better decisions in our day-to-day work.
If we concentrate exclusively on the user stories in hand for each iteration we can find we’ve lost sight of the forest, and may have a hard time fitting together features that should blend into an overall product. If we concentrate only on the distant horizon and not on the user story we’re working on we’ll never get anything done. The proper balance lets us go fast.
We don’t want to be like the driver in the joke with the punch line that ends “we’re lost… but we’re making bloody good time”!
Looking a bit further ahead also allows us to anticipate curves and obstacles in the road, and be ready to handle them when they arrive. If we know, for instance, from our long-range planning that we intend to scale our application to thousands of users, we might make different decisions than if we’re aware that a single user on a desktop box is the intended audience – even though neither of these factors is really represented directly by each user story we work on.
Executable acceptance tests from a tool like GreenPepper, FitNesse, RSpec, or the like can be valuable headlights, freeing developer time from repetitive manual verification and allowing BA/Customer Proxies to have control over the acceptance process – again freeing up developers to develop, and maximizing team velocity. As was mentioned in a recent stand-up meeting: if you’ve manually tested once, you’ve probably already spent more time than it takes to set up an automated test to do the same thing repeatedly, not to mention you’ve probably enjoyed it a lot less.
Guardrails
There’s a big difference between guardrails and a stone wall built across the road ahead, however – it’s not hard to let a testing tool or technique turn into a straitjacket, with tons of brittle and hard-to-maintain tests that don’t help us at all. We need the right tool for the right job, and used the right way.
If we have guardrails ensuring the basics of our code quality, we can go faster with the confidence that when we look back at the end of each sprint we will not have accumulated more code debt that needs to be paid back later. For example: if we establish a test coverage metric that breaks the build if our code coverage goes below a certain minimum level (I propose this always be 100%, but that’s another post), we can move forward with the assurance that there’s no code that’s being left untested, so we won’t find ourselves in the distinctly non-TDD-like position of having to go back and write tests for existing code, burning time that could have gone to the next story.
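In practice a coverage tool enforces this (JaCoCo’s Maven plugin, for instance, has a `check` goal with configurable rules and limits). The logic such a gate applies is small; a Java sketch, assuming per-class line counts have already been read from the tool’s report into a hypothetical `ClassCoverage` record.

```java
import java.util.List;

final class CoverageGate {

    // Covered and total line counts for one class, as a coverage tool would report them.
    record ClassCoverage(String className, int coveredLines, int totalLines) {}

    // Classes whose line coverage falls below the minimum ratio (1.0 means 100%).
    static List<String> violations(List<ClassCoverage> report, double minimumRatio) {
        return report.stream()
                .filter(c -> c.totalLines() > 0
                        && (double) c.coveredLines() / c.totalLines() < minimumRatio)
                .map(ClassCoverage::className)
                .toList();
    }

    // Throws so that the build step calling it fails.
    static void enforce(List<ClassCoverage> report, double minimumRatio) {
        List<String> failing = violations(report, minimumRatio);
        if (!failing.isEmpty()) {
            throw new IllegalStateException("coverage below " + minimumRatio + " in " + failing);
        }
    }
}
```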
We can also refactor with better confidence if we know for a fact there are tests watching over our shoulders, ready to break should our refactor not be true. Refactoring code that is, at least in part, untested should always be an unacceptable risk.
If we have Checkstyle, PMD, FindBugs or other static analysis tools checking that our cyclomatic complexity is within bounds, that our class size and line length are readable, and other critical maintainability and coding standards factors are met, we can plunge forward without the fear of a huge cleanup being required just to make the code understandable down the road a ways.
Of course, just like guardrails and headlights are not infallible in the real world, all the tools and checks in the world don’t ensure good quality code. One area where quality is particularly hard to ensure via automatic mechanisms is design. You can have code that’s 100% covered, passes every Checkstyle rule known to man, and still represents a terrible design. This is where the human factor comes into play – the automation merely ensures that you’re spending valuable human attention span on the stuff that really requires a brain, as opposed to things that can be verified mechanically.
Discipline is the glue that makes all of this work together – often times developers themselves will have the “smell” of something done not quite right, but not feel like they’ve got the latitude to dig into it and clean it up, so they save it until the mythical “later”, which sometimes never comes. Management and team leads must also be disciplined enough to have the patience while that kind of refactor happens – with the firm knowledge that they’ll get paid back by better productivity and a lower defect rate over the mid to long term.
A final warning: It’s easy to let headlights become leashes and guardrails become cubicle walls. Many agile practitioners are concerned, and rightly so, that adding tools and techniques can turn into a new dogmatism and inflexible methodologies. It’s up to us in the trenches to make sure we don’t let this happen, while at the same time getting all the juice we can out of helpful techniques and tools.
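To make the cyclomatic complexity guardrail concrete: McCabe’s measure for a method is, roughly, one plus the number of decision points. The sketch below is a deliberately crude text-based estimate (it counts keywords and boolean operators, including any that appear inside comments or strings); Checkstyle and PMD compute this properly from the parsed source, and the threshold passed to `withinBounds` is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough McCabe estimate for one method body: 1 + number of decision points.
final class ComplexityEstimate {

    // Branching keywords (whole words only) plus short-circuit boolean operators.
    private static final Pattern DECISION =
            Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    static int of(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1;
    }

    static boolean withinBounds(String methodBody, int max) {
        return of(methodBody) <= max;
    }
}
```

For example, a body containing one `if` with an `&&` condition and one `for` loop scores 4.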
Properly applied, though, headlights and guardrails can be valuable tools in letting us reach our maximum velocity, while still arriving safely at our destination.