Country and State in Vaadin
Nov 25th, 2009 by Mike

In my ongoing tinkering with the Vaadin web framework, I had a need to build a country and state/province dropdown, much like you see on almost every e-commerce site out there.

Unlike the typical pair of these fields, however, I wanted mine to work correctly – that is, for the list of provinces or states to reflect the currently selected country automatically.

Simple as this sounds, it’s an exercise I’ve done many, many times in too many different frameworks and technologies to count, and it’s never quite as easy as it seems like it ought to be. As you can see, many sites give up, and simply have a combined list of all states and Canadian provinces, or some such hack. I’ve never actually seen one that changes the name of the “State” field to “Province” when Canada is selected as the country, for instance.

So I set out to try this in Vaadin, and was pleasantly surprised.

Let’s assume we’re building a Form object in Vaadin. We can add our “country” drop down select box by saying this:

        Select country = new Select("Country:");
        country.addItem("[Unspecified]");
        country.addItem("United States");
        country.addItem("Canada");
        country.addItem("Afghanistan");
        form.addField("country", country);

Of course, we’d probably want our list of countries to come from some other data source, like a property file or static list, but this gives you the idea.

Now we add our “state” drop-down:

        final Select state = new Select("State:");
        form.addField("state", state);

Now we must decide whether to populate the state with actual U.S. States or Canadian provinces….

        country.addListener(new Property.ValueChangeListener() {
            public void valueChange(Property.ValueChangeEvent valueChangeEvent) {
                // getValue() is null when nothing is selected, so guard before comparing.
                Object value = valueChangeEvent.getProperty().getValue();
                String selectedCountry = (value == null) ? "" : value.toString();
                if (selectedCountry.equals("United States")) {
                    state.setCaption("State:");
                    state.removeAllItems();
                    state.addItem("Alabama");
                    state.addItem("Texas");
                    state.addItem("Idaho");
                } else if (selectedCountry.equals("Canada")) {
                    state.setCaption("Province:");
                    state.removeAllItems();
                    state.addItem("Alberta");
                    state.addItem("Saskatchewan");
                    state.addItem("Manitoba");
                }
                state.requestRepaint();
            }
        });

So our fields before we select a country look like this:

Now we’ve got a “state” field that will change its title to “Province:” if the country we select is Canada, and populate itself with the appropriate list of provinces.

But switch right back to “State” and populate the dropdown accordingly if we change our minds and select United States again…

Not too difficult, and very easy to understand and extend to other situations (or countries).
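To extend this to more countries without growing the if/else chain, the captions and region lists could live in a small lookup table. What follows is only a sketch of that idea in plain Java, not the “real” version mentioned below: the class names RegionCatalog and Regions and the lookup method are hypothetical, and the data is hard-coded here where a property file might be used instead.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup table: country name -> caption and items for the second drop-down. */
class RegionCatalog {

    /** Caption and item list for one country's state/province field. */
    static final class Regions {
        final String caption;
        final List<String> names;

        Regions(String caption, String... names) {
            this.caption = caption;
            this.names = Collections.unmodifiableList(Arrays.asList(names));
        }
    }

    /** Fallback for unknown or unselected countries: default caption, no items. */
    private static final Regions NONE = new Regions("State:");

    private final Map<String, Regions> byCountry = new LinkedHashMap<String, Regions>();

    RegionCatalog() {
        // Sample data only; assumed to come from a property file in a fuller version.
        byCountry.put("United States", new Regions("State:", "Alabama", "Texas", "Idaho"));
        byCountry.put("Canada", new Regions("Province:", "Alberta", "Saskatchewan", "Manitoba"));
    }

    /** Returns the entry for a country, or the empty fallback for null or unknown input. */
    Regions lookup(Object selectedCountry) {
        Regions found = (selectedCountry == null) ? null : byCountry.get(selectedCountry.toString());
        return (found == null) ? NONE : found;
    }
}
```

Inside the value change listener, the branches would then collapse to a single lookup followed by the same setCaption, removeAllItems and addItem calls used above. Because the table has no dependency on Vaadin, it can be unit tested on its own.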

As with all Vaadin, it’s pure Java, fully testable (at the unit level, as well as at the functional level). In fact, for the “real” version of this (which reads its countries and such from property files), I was even able to easily TDD it, which often isn’t the case in UI technologies.

First Look at Vaadin
Nov 12th, 2009 by Mike

I’ve had the chance over a recent weekend to have a first crack at a web framework called Vaadin.

I was originally browsing for news about the latest release of Google’s GWT framework when I stumbled on a reference to Vaadin, and decided to go take a look. What I found intrigued me, and I decided to take it for a test drive, as I was down sick for a couple of days with a laptop nearby…. My back became annoyed with me, but it was worth it, I think.

First Look
First off, the practicalities: Vaadin is open source, and with a reasonable license, the Apache License. The essential bits of Vaadin are contained in a single JAR, and it’s both Ant and Maven friendly right out of the box.

The next thing that struck me about Vaadin was the documentation. The first unusual thing about its documentation was the fact of its existence, as open source projects are downright notorious for poor documentation. Vaadin is a pleasant exception, with tons of examples, a well-organized API doc in the usual JavaDoc format, and even the “Book of Vaadin”, an entire PDF book (also available in hardcopy) that takes you through Vaadin in enough detail to be immediately productive.

Given that surprisingly pleasant start, I dug deeper, creating a little app of my own.

Just the Java, Ma’am
The main thing that kept me interested in Vaadin once I started digging further was that it’s pure Java. Many frameworks talk about writing your UI work in Java, such as Wicket, but there’s still a corresponding template and some wiring that happens to put the code and the template together. Not so with Vaadin.

When they say “just Java”, they mean it – your entire UI layer is coded in Java, plain and simple. No templates, no tag libraries, no Javascript, no ‘nuthin. It’s reminiscent of the Echo framework, except in Vaadin’s case the Javascript library that your code automatically produces is Google’s GWT, instead of Echo’s own Core.JS library.

Unlike GWT, though, the Vaadin approach doesn’t bind you to any specific source code; it’s just a binary jar you put on your classpath.

The only thing in my sample app, other than 2 Java files, was a web.xml and a CSS stylesheet, both of which were only a few lines long. And this was no “Hello, World”, either, but a rich AJAX webapp with a tree menu, fancy non-modal “fading” notifications, images, complex layouts, and a form with built-in validation. And it took maybe 4 hours of total work to produce – and that was from a standing start, as I’d never heard of Vaadin before last Thursday. Not bad, not bad at all.

I found I was able to get a very capable little webapp up and running with no need to invent my own components, even though I had trees and sliders and menus and other assorted goodies on the page. It worked in every browser I was able to try it in, which is certainly not the case for my own hand-rolled JavaScript most of the time :)

I haven’t yet tried creating my own custom components, but it certainly looks straightforward enough.

I did try linking to external resources, and included non-Vaadin pages in my app, with no difficulties, so it appears that Vaadin plays well with others, and can be introduced into an existing project that uses, for instance, a whack of JSPs that one might want to obsolete.

Webapps
I think Vaadin warrants more exploration, and I intend to put it further through its paces in the next few weeks. It appears extremely well-suited to web applications, as opposed to websites with a tiny bit of dynamic stuff in them.

It offers an interesting alternative to some of the patterns I’ve seen for advanced dynamic webapp development so far.

One approach I’ve seen a lot is to divide the duties of creating an app into the “back end” services and the “UI”. Generally the UI is written in either JavaScript, or uses Flex or some other semi-proprietary approach. The “back end” stuff is frequently written to expose its services as REST, then the two are bolted together. The pain point here happens when the two meet, as it’s common and easy to have minor (or major!) misunderstandings between the two teams. This usually results in a lot of to-and-fro to work out the differences before the app comes all the way to life.

The other approach, more common on smaller or resource-strapped teams, is to have the same group responsible for both UI and back-end services. This reduces the thrash in the joints a bit, but doesn’t eliminate it, because the two technologies on the two sides of the app aren’t the same. You can’t test JavaScript the same way you test Java, for instance, and they’re two different languages – one of which (Java) has far better tooling support than the other. IDE support, for instance, is superb for Java, and spotty at best for JavaScript.

With Vaadin, both of these approaches become unnecessary, as it’s the same technology all the way through (at least, what you write is – technically it’s still using JavaScript, but because that’s generated, I don’t count it).

You get to use all of the tools you know and love for the back-end services to write the code for the UI, which you can then unit and functional test to your heart’s content.

The temptation to mix concerns between UI code and back-end service code must still be resisted, of course, but at least that code isn’t buried somewhere in the middle of a JSP page, ready to leap out and bite you later.

Because you’re using dynamic layouts, the app always fits properly on the screen without any extra work, addressing a pet peeve of mine, the “skinny” webapp, restraining itself to the least common denominator of screen size, thus rendering impotent my nice wide monitors.

Scala
Just because Vaadin is a Java library doesn’t restrict you to using Java to drive it, however. I made another little webapp where the whole UI was defined in Scala, calling the Vaadin APIs, and it worked like a charm. In some ways, Scala is an even better fit for Vaadin than straight Java, I suspect. I haven’t tried any other JVM compatible language, but I see no reason they wouldn’t work equally well.

Deployment and Development Cycle
As I was building the app with Maven, I added a couple of lines to my POM and was able to say “mvn jetty:run” to get my Vaadin app up and running on my local box in a few seconds. My development cycle was only a few seconds between compile and interactive tests, as I was experimenting with the trial-and-error method.

TDD would be not only possible, but easy in this situation.

I successfully deployed my little Vaadin app to ServiceMix, my OSGi container of choice, without a hitch.

Performance appeared excellent overall, although I haven’t formally tested it with a load-testing tool (yet).

Summary
So far, I’m impressed with Vaadin. I’m more impressed than I have been with any web framework I’ve worked with in a number of years, in fact. I’m sure there are some warts in there somewhere, but for the benefits it brings to the table, I suspect they’re easily worth it. I think the advantages to teams that already speak fluent Java are hard to overstate, and the productivity in producing good-looking and functioning webapps is quite remarkable.

The Case for the Natural-Language Unbound Variable
Nov 3rd, 2009 by Mike

I was recounting to a colleague the other day a story that is probably much funnier to fellow developers than anyone else, and thought I’d share it here as well.

I had the opportunity for a number of years to work with a software team in the Bahamas, for a company with operations there. They were a great group to work with, and we did some good stuff. The Bahamas has its own “dialect” of English sometimes, however, and it was a while before I got used to the slight differences and accents from my native Bermudian.

One of my favorite elements of the unique Bahamian dialect is their use of an unbound variable in natural language.

You know when you’re talking about something, and the name of a specific piece of technology escapes you? Well, those of you with younger memories maybe don’t have this mental lazy-loading happen quite as often as us old geezers, but I’m sure it happens once in a while. You’re saying something about a webapp deployed to a cluster and you want to refer to the device that distributes the processing load of the users across the machines in the cluster. “Load balancer” is the term you’re looking for, but your brain has hit a bad sector and is busy doing an fsck, so you stammer over the right term.

Bahamians have invented a term especially for this situation, and I call it the “unbound variable” in natural language. We’re not sure what its value should be in this situation, but it serves as a placeholder until we think of the right word later on.

It’s called “tingum”, pronounced just like you’d expect, with a short “i”. It’s considered bad form to pronounce it “thing-um”, I understand, even though that might be the root of the word originally.

So you can simply say, “When the tingum moves a session from one server to another”, rather than having to actually work out the necessary technobabble to go in that spot.

It saves us the trouble of having to come up with new terms at random, like “whatsit”, or the much more verbose “thing-a-me-bob” or “whatchamacallit”.

It’s important not to bind this variable to any one term, even if it later becomes clear what that term should be in any one context, because that would reduce its generality. If you know that tingum should be bound to “load balancer”, you might inadvertently use it in its bound form in the wrong sentence.

In this sense, it’s rather like “_” (the underscore) in Scala – it’s *supposed* to be unbound.

So, the next time your mental RAM gets a checksum error and you can’t remember the correct technical term, keep the “tingum” in mind!

Technical Debt Bankruptcy: When does my project qualify?
Sep 14th, 2009 by Mike

In my last post, I discussed the concept of “technical debt bankruptcy”, a term for when you make the informed decision that a particular piece of software is costing more to maintain than it would cost to re-write.

Two closely related concepts are how to determine exactly and safely when this point has arrived, and how to handle the “restructuring of debt” that comes with the technical equivalent of a chapter 11 re-organization.

As a colleague pointed out in a comment about the last post, you can’t simply ignore the baked-in business value in the old solution. If you’re considering re-writing it, instead of just turning it off, then there must be someone using it for a valuable purpose.

First, though, you must be able to determine if your project (or some part of it) “qualifies” for such a process. Some projects don’t lend themselves to a proper restructuring, and might simply need to be put out of their (and your) misery, if the business value has evaporated.

This requires measuring, not guessing, even though sometimes there are items involved that are hard to measure.

Some of the things you need to measure are:

1. What is the current business value? This involves asking who uses the feature, how much, and what its relative importance is in the overall project. Sometimes it is difficult to get objective metrics on this, but that’s what’s needed to make an informed decision as opposed to a guess.

Gut feel does not count as a metric – you need to know who is using the system, how often, and for what purpose. Then you can determine if that purpose is equally well or better served by some other system, allowing that user to be weaned off the legacy system under consideration.

If you have logging or other metrics of use being gathered on the old system, this can be a good place to start, although you may need to augment the current metrics to get a baseline. Be sure that you’re looking at a long enough period to get a real picture, as often legacy systems have very intermittent use patterns.

2. What are the features, exactly? The best way to determine this is to write behavior-driven decoupled functional tests around the features of the system under consideration.

What I mean here is a list of scenarios that describe use cases of the feature in question, along with executable specifications for each such case, written so that they do not have any dependencies directly on the code under test.

Like all good tests, they need to be repeatable, atomic (at least at the whole scenario level), and automated. One example might be Selenium/Specs tests that exercise a feature of a web application by simulating a user interacting with the application from a browser. The tests must reflect how the feature is actually used, not how it was intended to be used, as often in legacy apps these are not at all the same thing.

3. The cost to re-write (or “aggressive refactoring”, as you might call it, given that there are existing functional tests).

This is a tough one, as often the technologies and techniques that might be used for a rebuild of an existing system are very different from those that were used in its original construction – especially if a lot of time has passed. Web and application development techniques have come a long way since, for instance, JSP 1.0, and the amount of effort to get an operational application is massively smaller than it used to be.

You need to estimate carefully, referring to the functional tests as the specification for the new system to be built. Once those functional tests are passing against the re-write, then you have, by definition, replaced the old system with the new, as the functional tests provide the full definition of what needs to be re-written. Don’t gold-plate, and don’t neglect to cover every valuable feature with the functional tests.

Now it’s down to the hard numbers: Let’s say that the old system takes 500 units (of something) to implement a new feature, on average. If we have 10 features in our backlog, our estimated total cost is then 5000 units.

If our new system will take 2000 units to build, and the cost of a new feature is then 50 units (and this is a very reasonable ratio, in my experience), then the math works out like so:

Total forecast for adding our backlog to the old system: 5000 units.

Total forecast for creating the new system and building the 10 features in the backlog: 2500.

This step is often made harder by the fact that some of the stakeholders in a project may not fully understand or have faith in either the estimates or the functional tests.

If they don’t have faith in the functional tests, then they must be able to read them and point out where the test doesn’t cover a valuable use-case. If they can’t, then they must accept that the tests do in fact specify the system adequately.

The estimates are another matter – often stakeholders won’t understand how large an advantage modern tools and techniques can offer, and find it difficult to believe that for the cost of a few bug fixes or new features you can actually entirely replace a running legacy application. This is where non-technical stakeholders must have enough confidence in the professionalism and expertise of their developers – and nothing can substitute for that.

4. Conclusion

Now we have to take into account elements like migration of users from the old system to the new, data transfer (if any – perhaps the new system can be built on the same database as the old, in which case this doesn’t cost anything). Assuming these are, let’s say, 500 units (and that’s a lot, proportionally), we are still 2000 units ahead of the game with our current backlog, and that’s not considering if the new system costs less to deploy and/or operate than the old one. If it does, then the decision is even easier.

Of course, once the current backlog is handled, all future features on the new system have a 10 to 1 cost advantage over maintenance on the old system as well, which is where the real long-term massive savings come into play – not to mention the developer satisfaction, lower defect rate, and so forth.
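The arithmetic above can be written out as a small calculation. This is just a sketch using the illustrative numbers from this post (500 and 50 units per feature, 2000 units to build, 500 units to migrate); the class and method names are made up for the example, and the break-even helper is an extra step the post itself doesn’t work through.

```java
/** Worked version of the post's illustrative numbers; every figure is an assumption. */
class RewriteBreakEven {

    /** Cost of delivering a backlog on the legacy system. */
    static int legacyCost(int features, int costPerFeature) {
        return features * costPerFeature;
    }

    /** Cost of rebuilding, migrating, then delivering the same backlog on the new system. */
    static int rewriteCost(int buildCost, int migrationCost, int features, int costPerFeature) {
        return buildCost + migrationCost + features * costPerFeature;
    }

    /** Smallest backlog size at which the rewrite is strictly cheaper than staying put. */
    static int breakEvenFeatures(int buildCost, int migrationCost,
                                 int oldPerFeature, int newPerFeature) {
        int savingPerFeature = oldPerFeature - newPerFeature;
        if (savingPerFeature <= 0) {
            throw new IllegalArgumentException("the new system must be cheaper per feature");
        }
        // Integer division rounds down; adding one gives the first strictly cheaper backlog.
        return (buildCost + migrationCost) / savingPerFeature + 1;
    }
}
```

With the numbers above, legacyCost(10, 500) is 5000 and rewriteCost(2000, 500, 10, 50) is 3000, which is the 2000-unit advantage described in the conclusion. Under the same assumptions, breakEvenFeatures(2000, 500, 500, 50) is 6, so the re-write would pay for itself once the backlog holds six or more features.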

When to Declare Project “Technical Debt Bankruptcy”
Sep 13th, 2009 by Mike

We’re probably all familiar with the term “technical debt”, meaning the cost of doing things in a non-optimal or non-quality way. While I can go on at length (as my colleagues can attest!) about how this is avoidable by baking in quality, and thus saving time and money at every turn, the fact is that many existing projects have considerable technical debt.

Setting aside for the moment the discussion of telling “good” technical debt from “bad” technical debt, let’s just focus on a project’s “bad” technical debt.

If we describe this kind of debt as factors that slow down the ability to change and improve the system, then we see that we are paying the “interest” on this debt every time we touch the codebase or its deployed instances.

What I’d like to focus on in this posting is the point at which this technical debt is evaluated to be sufficient that it makes sense to do what we’d normally call a “re-write” of the offending system or subsystem. Basically, when the -ah- mud gets so deep that the hip-waders aren’t helping, it might be time to throw in the towel and start a do-over.

I call this point “technical debt bankruptcy”. Much like a real bankruptcy, it’s an admission that chipping away at the debt isn’t going to work and isn’t worth it – that it’s time to re-group, and in a chapter-11-ish way, fold our tent in a responsible fashion.

Of course, determining if you’re at this point is critical. If you’re not, then you might be throwing out the baby with the bathwater and losing valuable work and former effort for reasons that are not sufficient. Often, political reasons can get us into that kind of situation, where the pressure to do a re-write is not justified. If there are no or few changes to a system, and it’s working sufficiently well as is, then there may be no reason to declare bankruptcy.

If, however, the bill collectors of technical debt are knocking down the virtual door, it’s important to know when to make the right move. As the song says “know when to fold ‘em”.

Part of knowing this is to be able to measure the pace and cost of change, and to be able to estimate, or better, measure, the cost of change if a re-write were done. Let’s say you’ve got a legacy project and a new project. The legacy project is using some old technology and techniques that are painful to work with, and you know they’re causing you to burn more time than they should be. If you also have a newer project (maybe something nice and greenfield) that’s being done with the latest new and shiny tools, Agile techniques, and so forth, you can get a rough idea of what each feature point in the new project is costing you. Now you can compare this to the cost of a feature point in the old project.

If you can look at your backlog of epics and get an idea of what future maintenance on the legacy project is going to cost you, then you can estimate that same work instead using the ruler of the new technologies and techniques. The delta is the amount you’ve got available to “spend” on a re-write, essentially.

If the math is right, then declare your technical debt bankruptcy and begin anew!

In some further posts, I’ll explore how to do a well-organized and structured “chapter 11” on a project, rather than just dropping the ball, including the part that functional tests play in this process, and look at a re-write as a form of highly aggressive refactoring, rather than a whole new project.

Maven with included Dependencies
Sep 4th, 2009 by Mike

One of the most common, and, in my opinion, valid criticisms of Apache Maven is the way it handles dependencies.

The default way of doing dependencies in Maven is for a build to look for dependency jars in one of a specified number of places. If you don’t specify a location, it will start with your local Maven repository, then a stock set of online repositories (such as ibiblio).

What this means is that you can quickly and easily add a dependency to your project by just listing it in the POM, and Maven will frequently just go get it for you as required. For dependencies that don’t change rapidly (and rapidly changing dependencies are generally a bad idea for entirely separate reasons), the local copy is always used, so after the first download you don’t go get the jar again unless it changes.

The downside of this (and hence the basis for the criticism) is that your local repository is considered a cache, and is not checked in to your source control system. Maven aficionados believe that source control should be for what the name implies: source, not jars (which are an artifact, not source code). This is good, except in situations where the local repository is removed (as it’s a cache, and not “backed up” by being in source control) and you want to rebuild your project. Normally, no problem: Maven will simply go back out into the wilds of the internet and get your jars for you again. Except when it can’t, which is where the problem starts.

Let’s say your project depends on foo-1.0.jar, for the sake of discussion. Foo-1.0.jar is found in a repository at http://baz.com. No problem. Two months from now, you come back and make a change to your project, and your local repo doesn’t have foo-1.0.jar in it. Maven goes to get it for you, but baz.com has gone offline. Oops. Now you can’t build your project at all. Worse, let’s say you *do* find foo-1.0.jar, but it’s been updated. Granted, this is a gross violation of the version scheme, but as baz.com is not under your control, it *can* happen – and Murphy was an optimist.

The first level of defense against this is to have a local jar proxy that *is* backed up, like Nexus. Nexus not only provides a safe haven for the exact versions of Jars you need to build your project, but also proxies new jars automatically, when set up correctly. That way, once you use that external dependency, it’s available on the local proxy immediately forever more, in precisely the correct version.

This solution is so good, that I always consider Nexus and Maven to be two parts of one solution – as you don’t really have reliable and repeatable builds without Nexus in the mix.

Another way to go, however, is perhaps even more straightforward, and might be appropriate if you have a limited number of “internal” binary dependencies – that is, binary dependencies that you generate within your organization between projects.

This is the option I want to describe in this blog, and it takes a form very similar to the old Ant style of having a “lib” directory in your project, and checking jars into source control.

You simply create a directory (called lib or whatever you wish, but “lib” might be more standard), and put your jar dependencies in it. Then you refer to each dependency in your pom.xml file with a special syntax, like so:

    <dependency>
      <groupId>mygroup</groupId>
      <artifactId>myjar</artifactId>
      <version>1.0.0</version>
      <scope>system</scope>
      <systemPath>${basedir}/src/main/resources/lib/myjar-1.0.0.jar</systemPath>
    </dependency>

Now the myjar-1.0.0.jar will be on my classpath during compilation, easy as that. I can then check in that nicely versioned jar file, and have a fully self-contained project, while at the same time being able to leverage the power of Maven and its rich ecosystem of plugins. If you’re using the shade plugin, this is an excellent way to create a self-contained executable jar file containing all of your necessary dependencies in one shot, rather than having to worry about classpaths at runtime.

In web applications, you can do even better. By placing the dependency in the correct spot in your source tree, you can have it automatically included in the finished .war file, while at the same time finding it in your classpath during compile time, like so:

    <dependency>
      <groupId>xstream</groupId>
      <artifactId>xstream</artifactId>
      <version>1.3.1</version>
      <scope>system</scope>
      <systemPath>${basedir}/src/main/webapp/WEB-INF/lib/xstream-1.3.1.jar</systemPath>
    </dependency>

This includes the xstream jar in my webapp. Every IDE that can digest a POM file (which includes my 3 main ones, IntelliJ IDEA, Eclipse IDE and Netbeans) also automagically finds the dependencies.

One minor constraint here is that you must use a *versioned* jar file – i.e. it must be named according to the artifact-version.jar pattern. I consider this a feature, not a limitation, as I’m a firm believer that all artifacts should be properly versioned, so I can tell what source code originally created them. Yes, I know that can be done internally, in a manifest or such, but I like to see the label on the outside of the box, so to speak.

As one colleague put it, now you can be assured of coming back to your project months or years later, even if you don’t have internet connectivity, and be able to build the darn thing again, without going off on a wild jar chase all over the internet.

This pattern can overcome a serious and legitimate objection to Maven, and might be appropriate in many situations. Give it a try, let me know what you think!

Specs and Selenium together
Aug 9th, 2009 by Mike

I recently had the chance to dive into a new project, this one with a rich web interface. In order to create acceptance tests around the (large and mostly untested) existing code, we’ve started writing specs acceptance tests.

Once we have our specs written to express what the existing functionality is, we can refactor and work on the codebase in more safety, our tests acting as a “motion detector” to let us know if we’ve broken something, while we write more detailed low-level tests (unit tests) to allow easier refactoring of smaller pieces of the application.

What’s interesting about our latest batch of specs is that they are written to express behaviours as experienced through a web browser – e.g. “when a user goes to this link and clicks this button on the page, he sees something happen”. In order to make this work we’ve paired up specs with Selenium, a well-known web testing framework.

By abstracting out the connection to Selenium into a parent Scala object, we can build a DSL-ish testing language that lets us say things like this:

object AUserChangesLanguages extends BaseSpecification {

  "a public user who visits the site" should beAbleTo {
    "Change their language to French" in {
      open("/")
      select("languageSelect", "value=fr")
      waitForPage
      location must include("/fr/")
    }
    "Change their language to German" in {
      select("languageSelect", "value=de")
      waitForPage
      location must include("/de/")
    }
    "Change their language to Polish" in {
      select("languageSelect", "value=pl")
      waitForPage
      location must include("/pl/")
    }
  }
}

This code simply expresses that as a user selects a language from a drop-down of languages, the page should refresh (via some Javascript on the page) and redirect them to a new URL. The new URL contains the language code, so we can tell we’ve arrived at the right page by the “location must include…” line.

Simple and expressive, these tests can be run with any of your choice of browsers (e.g. Firefox, Safari, or, if you insist, Internet Explorer).

Of course, there’s lots more to testing web pages, and we’re fleshing out our DSL day by day as it needs to express more sophisticated interactions with the application.

We can get elements of the page (via Xpath), make assertions about their values, click on things, type things into fields and submit forms, basically all the operations a user might want to do with a web application.

There are some frustrations, of course. The Xpath implementation on different browsers works a bit differently – well, ok, to be fair, it works on all browsers except Internet Exploder, where it fails in various frustrating ways. We’re working on ways to overcome this that don’t involve having any “if browser == ” kind of logic.

It’s also necessary to start the Selenium RC server before running the specs, but a bit of Ant magic fixes this.

We’ve got these specs running on our TeamCity continuous integration server, using the TeamCity runner supplied with Specs, where we get nicely formatted reports as to what’s pending (i.e. not finished being written yet), what’s passing, and what’s failing.

The specs written with Selenium this way are also a bit slow, as they must actually wait in some cases for the browser (and the underlying app!) to catch up. When run with IE as the browser, they’re more than just a bit slow, in fact…

They are, however, gratifyingly black-box, as they don’t have any connection to the code of the running application at all. For that matter, the application under test can be written in any language at all, and in this case is a combination of J2EE/JSP and some .NET.

There’s a lot of promise in this type of testing, even with its occasional frustrations and limitations, and I suspect we’ll be doing a lot more of it.

Scala Tip: Check for Numerics
Aug 7th, 2009 by Mike

A colleague the other day needed to write a test that checked a string to ensure it was all digits, and happened to be using Scala for it.

An initial thought was to just iterate through the string character by character – but that was laborious and verbose…

A second thought was to just try to convert the string to an integer with toInt, then watch for the exception when it wasn’t, but even this seemed a bit heavy-handed, now that we’re getting used to Scala’s expressiveness.

After a quick experiment we came up with a cleaner solution: use the built-in ability to convert the string to a list of characters, filter that list with an anonymous function (written with Scala’s underscore placeholder syntax) so that only the digits remain, then take the length of the result. If this length is not the same as the length of the original string, then there are non-digit characters in the string – otherwise, it’s all digits.

Here’s a transcript of a Scala command-line session trying this out:

scala> "foo".toList.filter(_.isDigit).length
res5: Int = 0

scala> "123".toList.filter(_.isDigit).length
res6: Int = 3

scala> "abc123".toList.filter(_.isDigit).length
res7: Int = 3

Of course, you can turn this into a test easily enough with:

assertEquals(originalString.length, originalString.toList.filter(_.isDigit).length)

Now that's pretty concise :)
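
For comparison, here is a rough Java sketch of the same idea. It is not part of the original Scala session, and the class and method names are made up for illustration. It counts the digit characters and compares that count with the length of the string, just like the filter-then-length approach above.

```java
final class DigitCheck {
    // Counts the digit characters and compares the count with the total
    // length, mirroring the Scala filter(_.isDigit).length approach.
    static boolean isAllDigits(String s) {
        long digits = s.chars().filter(Character::isDigit).count();
        return digits == s.length();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("123"));
        System.out.println(isAllDigits("abc123"));
    }
}
```

Note that, like the Scala version, this treats an empty string as "all digits", so check for emptiness separately if that matters in your test.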

Automatic Dependency Management with Maven
May 28th, 2009 by Mike

Apache Maven is a lot more than a “build tool”, and one of its major strengths is its ability to manage dependencies.

Maven’s not just for external dependency management, though – it can help us work faster and more easily with our own modules as well as those written by others. In fact, its “internal” dependency management is actually far more powerful for most development shops.

Every dependency Maven manages is identified with 3 pieces of information – its group id, its artifact id, and its version. The group id is often derived from the domain name of the company that produces the module, e.g. com.point2.somemodule, and the artifact id identifies the specific module within that group, like rest-api or such.

Possibly the most interesting part is the version number, though, as this is where the real power of Maven comes to the fore. Versions allow us to maximize the opportunity for parallel development without descending into unversioned chaos. Each version represents a specific point in time in a library’s development – and, most importantly, allows us to “re-assemble” our application to a known state at any time (not re-build it).

Let’s take a for-instance to see how this might work…

Component-Based Application “Assembly”
For example, let’s say I’ve got a few teams working on different modules for my new application, let’s call them “persistence”, “rest-api” and the user-interface, “ui”. Each of these modules depends on a set of common utility classes in “util”.

We can represent this through a set of triples like so:

rest-api depends-on persistence
rest-api depends-on util (directly, and not only on the transitive dependency through persistence)
ui depends-on rest-api
persistence depends-on util

The unseen aspect here is the versioning. If we include versions in our triples, we see the picture is a bit more sophisticated:

rest-api-1.0 depends-on persistence-3.1
rest-api-1.0 depends-on util-1.1
ui-1.0 depends-on rest-api-1.0
persistence-3.1 depends-on util-1.0

Now we have a fully defined dependency graph that we can assemble into an application, say app-1.0. At any time, if we want a copy of the app in its 1.0 state, we re-construct it from these deployed modules, with no need to build any source code, and we’ve got the exact same app, every time.
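
To make the “assembly” idea a little more concrete, here is a small Java sketch that walks the versioned list above and collects everything ui-1.0 needs. This is only an illustration of the graph, not how Maven is implemented, and the class and method names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DependencyGraph {
    // Direct dependencies, copied from the versioned list above.
    static final Map<String, List<String>> DIRECT = Map.of(
            "ui-1.0", List.of("rest-api-1.0"),
            "rest-api-1.0", List.of("persistence-3.1", "util-1.1"),
            "persistence-3.1", List.of("util-1.0"));

    // Collects the root module plus every module reachable from it.
    static Set<String> closure(String root) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String module = todo.pop();
            if (seen.add(module)) {
                // Modules with no entry in the map have no dependencies.
                for (String dep : DIRECT.getOrDefault(module, List.of())) {
                    todo.push(dep);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(closure("ui-1.0"));
    }
}
```

Notice that both util-1.0 and util-1.1 turn up in the result. Maven itself settles on a single version of util when it resolves the tree; this sketch makes no attempt to do that.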

Get it in motion…
Now let’s look at this in a dynamic development environment, where we’re trying to maximize sustainable velocity:

Although there are dependencies between each module, we don’t want to hold up one team by forcing them to build the other teams’ modules unnecessarily. We also want each team to choose whether they want to work with the very latest version of the other modules, or work against a fixed and stable version for a time instead.

The “ui” team, for example, might be refactoring JavaScript code that’s relying on version 1.0.3 of “rest-api”, while “rest-api” in turn is already working on 1.0.4 – and it uses 1.1.0 of “persistence”… it can get tangled in a hurry without a way to manage it, and we don’t want to be artificially discouraged from writing modular code just because it’s hard to keep version numbers straight.

Enter Maven again. Instead of forcing everyone to just always work with the latest version of every other module (which can bring productivity to a screeching halt in some situations), we allow each team to decide what dependency they will include in their POM (Project Object Model) file.

What if I want the very latest version of “persistence” while I work on “rest-api”, with changes checked in by other developers while I’m still working? This is where the SNAPSHOT version comes into play. Instead of declaring a dependency on 1.1.0, I declare a dependency on 1.1.1-SNAPSHOT. This represents the latest “edge” code for the referenced dependency.

Now we have a graph that looks like this:

app-1.0-SNAPSHOT depends-on ui-1.1-SNAPSHOT
rest-api-1.1-SNAPSHOT depends-on persistence-3.1-SNAPSHOT
rest-api-1.1-SNAPSHOT depends-on util-1.1
ui-1.1-SNAPSHOT depends-on rest-api-1.1-SNAPSHOT
persistence-3.1-SNAPSHOT depends-on util-1.1

As you can see, we have a mix of stable versioned modules (util in this case), and “on the fly” versions. Yet at the same time we’re assured that major changes that break backwards compatibility will not be seen, as we indicate such changes with a change in our major version number (e.g. 1.X to 2.0).

Then we can set up a CI job (say on TeamCity, Bamboo, or whatever your CI system of choice is) to automatically build and deploy our SNAPSHOT version of “persistence” to our local Maven repository (within our company firewall). The SNAPSHOT version actually turns into a date/timestamped version when it’s deployed to Nexus, and Maven is clever enough to fetch for us the most recent of these SNAPSHOTs every time we build. The “persistence” team checks in some code, CI builds it and deploys the resulting SNAPSHOT jar to our repository, and we get it automagically the next time we build, even though we’re working on rest-api, not persistence.
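
As a rough illustration of what “most recent SNAPSHOT” means, here is a Java sketch that picks the newest entry from a list of deployed versions. It assumes version strings that end in a yyyyMMdd.HHmmss timestamp followed by a build number; the sample values and all the names are made up, and this is not Maven’s actual resolution code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LatestSnapshot {
    // Assumed suffix format: "yyyyMMdd.HHmmss-buildNumber".
    private static final Pattern STAMP = Pattern.compile("(\\d{8}\\.\\d{6})-(\\d+)$");

    private static Matcher match(String version) {
        Matcher m = STAMP.matcher(version);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a timestamped snapshot: " + version);
        }
        return m;
    }

    static String timestamp(String version) {
        return match(version).group(1);
    }

    static int buildNumber(String version) {
        return Integer.parseInt(match(version).group(2));
    }

    // Newest timestamp wins; the build number breaks ties.
    static Optional<String> latest(List<String> deployed) {
        Comparator<String> byTimestamp = Comparator.comparing(LatestSnapshot::timestamp);
        return deployed.stream()
                .max(byTimestamp.thenComparingInt(LatestSnapshot::buildNumber));
    }
}
```

The fixed-width timestamp is what lets a plain string comparison double as a date comparison here.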

When we’re ready to “stabilize” our dependencies, we simply switch from the SNAPSHOT to a specific version. Maven has a pre-defined “release” process that guarantees, among other things, that every released version has no remaining SNAPSHOT dependencies, is tagged in version control, and is verified via all its tests. More than a build tool indeed…

We could of course just put all the modules we’re going to depend on in an aggregator POM, and build everything every time we make a change, but this is hardly efficient, and limits our development velocity unnecessarily (and of course we might not all be in the same source tree, or even the same version-control repository). We want to be building smaller pieces, not bigger ones.

A critical part of this process is our company-local Maven repository – here I mean not just the developer-local repository on each developer’s own workstation, but a product like Nexus that holds a company-wide copy of all required jars for a build. By doing this, we can guarantee a consistent copy of all our required dependencies without having to depend on the availability of outside repositories, such as ibiblio. It’s not a bad idea to in fact *only* permit access to the local repository for building releases, which ensures this policy is not violated accidentally – while at the same time keeping the “external” Maven repos available to developers for experimenting and prototyping. Once something gets used in production code, however, it gets stored in the “inside the firewall” Nexus repo (and backed up from there). This avoids the bad practice of checking jar files into source control (it’s called “source” control for a reason).

Testing, Testing…
To add a new aspect to the problem, let’s say that it’s not only production code we depend on, but helper classes for tests as well. If it’s difficult to set up a fixture for a certain kind of test, that might be a code smell in and of itself, but that’s also another story. If we have some test helpers that reside in our dependent modules, we won’t be able to see those helpers in our tests in another module, as we’re only depending on that module’s production code, not test code.

We can easily tell Maven to also bundle up the test code from a certain module, however, and make it available to us in a jar file, like so:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>test-jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            ...
        </plugins>
    </build>

Now when we build, we’ll get a test jar as well as our regular jar, which we can depend on like so:

    <dependency>
        <groupId>com.point2.core</groupId>
        <artifactId>somemodule</artifactId>
        <version>1.0</version>
        <scope>test</scope>
        <classifier>tests</classifier>
    </dependency>

Now our test classes in the module declaring the above dependency can see the test helpers in the somemodule module – but we’re still not including test code in our production jar.

Again, I have to emphasize that this level of coupling might indicate a deeper issue, but if you do need to do this, it’s good to know how :)

Maven also includes facilities to analyze and clean up a complex dependency tree, remove unnecessary dependencies, and keep the whole project manageable.

In summary, Maven can handle extremely complex dependency management for us in a fully declarative and versioned manner, allowing us at any moment to see exactly what our project depends on, both in production and test code. In conjunction with a CI system (like TeamCity) and repository server (like Nexus), we can automate the deployment of intermediary and full-release versions to the point where we save significant time, and never build code that we’re not actually working on, allowing us to concentrate on the task at hand and leaving the heavy lifting to Maven.

This allows us to only ever build the code we’re actually changing – never code that’s already available in another library, reducing our developer cycle time significantly. It also means we’re spending more time “assembling” software from re-usable components than re-compiling (and probably re-testing) code that’s already verified and available in object form.

Maven: not just for breakfast anymore.

Built-In Database Upgrades
May 28th, 2009 by Mike

Often an application (web application or otherwise) involves a database for storing its persistent data. As the application evolves over time, the database needs to change, and sometimes existing data needs to be updated to reflect those changes.

How do we keep the app in sync with the schema changes of the corresponding database, in all the environments we might deploy into – such as local, testing, perhaps staging, and production? In some situations we want to re-create the database from scratch (such as a testing environment, for instance). In others (such as a production environment), we definitely don’t want to re-create the database, but just make carefully scripted modifications. In all cases, we want to make only the necessary modifications, based on the current state of the database.

A tool that a colleague recently recommended to me for this job is dbdeploy, a utility for maintaining synchronization between our application and our database. Dbdeploy is less complex than other solutions, such as Database Migrations in Rails, arguably easier to implement and more portable to non-Ruby/Rails environments, and uses just SQL as the language to express the database changes.

Like Migrations, dbdeploy allows us to quickly and easily script our database changes, then apply those changes as required to a specific database instance in each of our deployment environments.

We can create a series of simple SQL files, like

  • 001_create_base.sql
  • 002_expand_customer_name.sql
  • 003_add_order_status.sql
  • 004_update_customers.sql

…and so forth. Each file represents a single simple change to our database schema, and the files are numbered in the order in which the changes occur. We don’t edit them, but add new ones as further changes are made, to ensure backwards-compatibility with all versions of our database.

Dbdeploy uses a change-log table, added to each of our target databases, to keep track of which scripts have already been run in that environment, automatically producing a script containing the set of changes that bring the current database up to the latest specification, including only the necessary changes and updates.
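
The selection logic described here can be sketched in a few lines of Java. This is a simplified illustration only, not dbdeploy’s own code: it assumes the change number is the part of the file name before the first underscore, and that the set of already-applied numbers has been read from the change-log table by some other means. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class PendingChanges {
    // Reads the leading number from a name such as "003_add_order_status.sql".
    static int changeNumber(String fileName) {
        return Integer.parseInt(fileName.substring(0, fileName.indexOf('_')));
    }

    // Returns the scripts whose numbers are not yet recorded as applied,
    // sorted in the order they must be run.
    static List<String> pending(List<String> scripts, Set<Integer> applied) {
        List<String> result = new ArrayList<>();
        for (String script : scripts) {
            if (!applied.contains(changeNumber(script))) {
                result.add(script);
            }
        }
        result.sort(Comparator.comparingInt(PendingChanges::changeNumber));
        return result;
    }
}
```

With the four files listed above and changes 1 and 2 already recorded, only the third and fourth scripts would be selected.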

Dbdeploy comes with a command-line interface and an Ant task implementation, each of which is very straightforward to use. By default, however, dbdeploy does not apply the changes – it only produces an SQL script that we can then review and either apply manually or via a second Ant task (or other means). Both dbdeploy and this second task need the connection information for the database supplied – the driver, JDBC URL, login and password. This is where I started to think “this can be simpler”…

If we’re deploying an application whose configuration already includes the necessary data to connect to the database, we could employ dbdeploy in a different way that overcomes this issue, as well as adding a few other advantages.

Let’s say in our development environment we keep our dbdeploy delta scripts in a directory that gets bundled into our finished application as resources. Maven can easily do this if you just put the files in the src/main/resources directory (by default), for instance. Now our database scripts can be bundled with our application automatically every time it’s built, without having any extra files to worry about.

We can also include dbdeploy in our classpath, instead of using it as a separate utility, and wrap up our application as a single jar file that includes our database definition scripts. We can extract these files from our application jar on demand with code like this (we don’t show the getResources method for clarity, but it gets a list of resource paths matching a specified pattern from our current classpath):

    public static String extractScripts() throws Exception {
        File scriptsDir = new File(SCRIPT_DIR_NAME);
        // Create the target directory (and any parents) if it doesn't exist yet
        if (!scriptsDir.exists() && !scriptsDir.mkdirs())
            throw new RuntimeException("Could not create dir '" + SCRIPT_DIR_NAME + "'");
        // Escape the dot so that only names ending in ".sql" match
        Collection<String> scripts = getResources(".*\\.sql");
        for (String script : scripts)
            extractToDir(script);
        return SCRIPT_DIR_NAME;
    }

    private static void extractToDir(String script) throws Exception {
        String endName = new File(script).getName();
        // The scripts are expected at the root of the classpath
        InputStream in = script.getClass().getResourceAsStream("/" + endName);
        if (in == null)
            throw new RuntimeException("Can't find resource " + endName);
        // try-with-resources closes both streams, even if the copy fails
        try (InputStream src = in;
             FileOutputStream fos = new FileOutputStream(new File(SCRIPT_DIR_NAME, endName))) {
            byte[] buf = new byte[1024];
            int count;
            while ((count = src.read(buf)) != -1)
                fos.write(buf, 0, count);
        }
    }

Let’s say our main application is invoked with “java -jar someApplication-1.0.jar”. This implies we’ve set up a default main class and method in our jar meta-data – it’s the same as saying “java -cp someApplication-1.0.jar com.point2.main.MyMainClass”, just easier to type.

Nothing stops us having another main class, however, that we can invoke on demand, e.g.:
java -cp someApplication-1.0.jar com.point2.main.DbDeploy

Code like this allows us to run dbdeploy on our extracted scripts, while passing it the database connection info we already know:

    public static void prepScripts(String dbType, String driverName, String dbUrl, String userName, String password) throws Exception {
        com.dbdeploy.DbDeploy runner = new com.dbdeploy.DbDeploy();
        runner.setDbms(dbType);
        runner.setDriver(driverName);
        runner.setUrl(dbUrl);
        runner.setOutputfile(new File("scripts/output.sql"));
        runner.setUserid(userName);
        runner.setPassword(password);
        runner.setScriptdirectory(new File(SCRIPT_DIR_NAME));
        runner.go();
    }

This code produces our “output.sql” script, which contains exactly the required changes to our specified database to bring it up to date.

Then we can actually execute the resulting script (if we want to) via a method like so:

    public static void applyScript(String script, String driverName, String dbUrl, String userName, String password) throws Exception {
        StringBuilder sb = new StringBuilder();
        // Read the generated script line by line, putting the line breaks back in
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(SCRIPT_DIR_NAME + "/" + script)))) {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                sb.append(strLine);
                sb.append("\n");
            }
        }
        executeSql(sb.toString(), driverName, dbUrl, userName, password);
    }

    private static void executeSql(String sql, String driverName, String dbUrl, String userName, String password) throws Exception {
        Driver driver = (Driver) Class.forName(driverName).getDeclaredConstructor().newInstance();
        // Pass the credentials using the standard JDBC property names
        // (needs java.util.Properties)
        Properties props = new Properties();
        props.setProperty("user", userName);
        props.setProperty("password", password);
        // Close the statement and connection even if the update fails
        try (Connection connection = driver.connect(dbUrl, props);
             Statement statement = connection.createStatement()) {
            statement.executeUpdate(sql);
        }
    }

Doing this starts a sequence that

  1. Extracts the database change scripts from the jar file
  2. Uses the application’s own configuration to get the JDBC Driver, URL, login and password information
  3. Invokes dbdeploy (internally) to create the database update script appropriate to whatever database we’re connected to
  4. Optionally, if we include the “execute” option on our command-line, runs the resulting script – if necessary, creating the “change log” table used by dbdeploy in the process

As a finishing touch, we can examine the changelog table and determine exactly what incremental version our target database has been brought up to, and compare this to a known required version number in our application. In this way, our application can automatically verify that the dbdeploy process has been done on the database it’s being run against – and if it hasn’t, we can output the proper command-line to perform the upgrade as a suggestion to the user (or even do it automatically, if permissible in our environment).
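
A minimal sketch of that startup check might look like the following. It assumes the applied change numbers have already been read from the changelog table (how that query is done is left out here), and the class name, method names and message wording are all invented for the example. The suggested command is the one shown earlier.

```java
import java.util.Collection;
import java.util.Collections;

final class SchemaVersionCheck {
    // The highest applied change number, or 0 for a database with no changes yet.
    static int currentVersion(Collection<Integer> appliedChanges) {
        return appliedChanges.isEmpty() ? 0 : Collections.max(appliedChanges);
    }

    // Returns "OK" when the database is at (or beyond) the required change,
    // otherwise a message suggesting the upgrade command.
    static String check(Collection<Integer> appliedChanges, int requiredVersion) {
        int current = currentVersion(appliedChanges);
        if (current >= requiredVersion) {
            return "OK";
        }
        return "Database is at change " + current + " but change " + requiredVersion
                + " is required; run: java -cp someApplication-1.0.jar com.point2.main.DbDeploy";
    }
}
```

The application could call check() once at startup and refuse to continue (or trigger the upgrade itself) when the result is anything other than "OK".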

Now, instead of having a number of extra pieces to deploy to our target environment, we’re down to one nice tidy self-contained jar file. We can choose to either update the database immediately, or dump the script first so we can examine what it’s going to do, and then either let the application apply it or apply the changes manually ourselves.

Compared to dbdeploy in either Ant or standalone mode we’ve got the following advantages:

  1. We keep the simplicity of our single-jar deployment
  2. We can upgrade our database in any environment our application can run in – no need for extra tools
  3. We don’t need to repeat our database configuration information in two places – so they can’t get out of sync and end up running against the wrong database
  4. We ensure that the scripts required for the current version of the code are always with the code
  5. We guarantee that our application always runs against the database versioned to the expected state

I definitely recommend dbdeploy for this type of scenario.
