I recently had occasion to use a trick with test execution that is simple and very helpful, but that I forget about frequently, so I thought I’d pass it along here.
We use Maven to execute our unit (and sometimes functional, but that’s another story) tests for a suite of applications my team works on. Some of our apps have quite a few tests, and even though each unit test is very fast (as good unit tests ought to be), running that many of them still takes a bit of time. We’re also burdened with a very slow CI server, unfortunately, which aggravates that situation.
Unit tests, by definition, do not rely on the operation of any other test – they are self-contained, stateless and repeatable. This means they’re ideally suited to running in parallel, and fortunately, that’s very easy to do with Maven.
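To make the "self-contained, stateless" point concrete, here is a small sketch in plain Java (no JUnit, so that it stays self-contained). The names `IndependentChecks`, `checkAddition`, `checkConcatenation` and `runAll` are made up for illustration, and this is not how Surefire itself runs tests. Each check builds its own fixture and touches no shared state, so the outcome should be the same whether one thread or twenty run them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: independent checks executed on a thread pool.
class IndependentChecks {

    // Each check creates everything it needs locally and shares nothing.
    static boolean checkAddition() {
        List<Integer> numbers = new ArrayList<>();
        numbers.add(2);
        numbers.add(3);
        int sum = 0;
        for (int n : numbers) {
            sum += n;
        }
        return sum == 5;
    }

    static boolean checkConcatenation() {
        StringBuilder sb = new StringBuilder();
        sb.append("sure").append("fire");
        return sb.toString().equals("surefire");
    }

    // Runs both checks `repeats` times each on `threads` threads
    // and returns the number of failed checks.
    static int runAll(int threads, int repeats) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> checks = new ArrayList<>();
            for (int i = 0; i < repeats; i++) {
                checks.add(IndependentChecks::checkAddition);
                checks.add(IndependentChecks::checkConcatenation);
            }
            int failures = 0;
            for (Future<Boolean> result : pool.invokeAll(checks)) {
                if (!result.get()) {
                    failures++;
                }
            }
            return failures;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("failures with 1 thread:   " + runAll(1, 100));
        System.out.println("failures with 20 threads: " + runAll(20, 100));
    }
}
```

A check that wrote to a static field shared with other checks would lose this property, which is why shared state is the first thing to look for when a test only fails in a parallel run.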
Maven uses the Surefire Plugin to do its test-running, and it’s configurable to run tests in parallel. Our project uses a parent pom (different from an aggregator pom) to specify common configuration, so our pom files stay lean. In our parent pom, we just add the following:
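    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.5</version>
      <configuration>
        <parallel>methods</parallel>
        <threadCount>20</threadCount>
      </configuration>
    </plugin>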
And we’ve told Maven to run our tests in a maximum of 20 threads, and to make each method run in parallel. (There are other options here for only running classes in parallel, etc, which might be helpful for functional tests).
Just doing that took a minute off the time of a couple of our builds, and a bit off the time of even some of the smaller ones. Of course, it’s even more effective if you’re running on a multicore machine, but even the feeble VM running our CI was faster with it there.
Enjoy!
Another update in our team’s journey to functional testing nirvana via Specs and Selenium.
We discovered an earth-shattering revelation (not): Internet Explorer Sucks. Unfortunately, it’s the “target” browser for a certain project we’re working on, therefore we’re stuck with it. Being stuck with it means finding a way to make it suck sufficiently less that it will actually run our lovely tests, which Firefox (and several other browsers) digest quite happily.
One way in which IE sucks is that it’s painfully slow – tests that take a few seconds on Firefox time out after several minutes in IE. If you keep cranking up the timeout they might eventually pass, but we want test results in our lifetimes, so we set about finding a way to get our feedback out of IE a bit faster.
It turns out one of the very slow things about IE is asking for an element via an xpath, and we were doing this a lot. We tried reducing the number of places we did this, and indeed that helped a bit, but there were some things on our existing app that were really hard to refer to without xpath. To ease the pain further, we took a slight detour. Instead of asking Selenium for a specific xpath, which then caused Selenium to ask the browser, we fetched the entire output of the browser (via the getHtmlSource method in Selenium).
What we’d like to do instead, where possible, is simply make assertions on the XML of the entire page – so we get it just one time, and so that we don’t depend on the Xpath implementation of the browser to find bits of our page for us.
Unfortunately, we still can’t ask anything xpath-ish of it, as although it bears a close resemblance to XML, it’s not really XML, so any parser we could find choked on it.
Enter HtmlCleaner, a handy little jar that can digest even the most disgusting HTML and deposit clean shiny XML in its place. Now we could take our XML version of what the browser spat out and finally make Xpath calls to it.
Now Scala itself has excellent support for XML (see the whole book here), but its support for Xpath is… interesting. Xpath is actually done via a series of methods and functions, meaning although it is very powerful, it’s not the same xpath as you might be used to, or, more importantly in our case, not the same as you could expect a browser to understand. This means we could either re-write all our Xpath queries (and make it harder to use Selenium IDE or Firebug to help us write new ones), or find another way…
Enter Jaxen, a handy Java lib with full Xpath support, without all the weight of other solutions. Of course, as Specs is written in Scala and Scala happily uses any existing Java lib, we could just drop a jar into our “lib” dir and work some magic.
We needed to convert the HTML produced by Selenium into XML, then turn it into a DOM document that Jaxen can process, so we can ask questions in xpath of the resulting Jaxen document.
This little incantation:
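    import org.htmlcleaner.{CleanerProperties, DomSerializer, HtmlCleaner, TagNode}
    import org.jaxen.dom.DOMXPath

    val cleanerAndProps = initCleaner

    // Build the cleaner once and keep its properties alongside it
    private def initCleaner = {
      val cleaner: HtmlCleaner = new HtmlCleaner()
      val props: CleanerProperties = cleaner.getProperties()
      props.setTranslateSpecialEntities(true)
      props.setRecognizeUnicodeChars(true)
      props.setOmitComments(true)
      (cleaner, props)
    }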
Returns a tuple of the HtmlCleaner instance and its properties, which we can then use for our conversion somewhat like so:
    def getHtmlText(element: String) = {
      // Clean the raw page source, then query the resulting DOM locally
      val node: TagNode = cleanerAndProps._1.clean(selenium.getHtmlSource())
      val domSerializer: DomSerializer = new DomSerializer(cleanerAndProps._2)
      val document: org.w3c.dom.Document = domSerializer.createDOM(node)
      val xpath = new DOMXPath(element)
      xpath.stringValueOf(document)
    }
This method takes a string (actually an xpath query), “cleans” the HTML from Selenium, then serializes it into a DOM (a regular org.w3c.dom.Document). We can then use Jaxen’s DOMXPath object to fish out the string value of whatever our Xpath query returns, allowing us to say things like:
manufacturerSelected = getHtmlText("//table[@class='DataList']//a")
In our tests.
Of course, we can dress this with other methods that get attributes and other things from the DOM tree, but you get the point.
The nice part about all this is that it doesn’t require a new round trip to the browser – we can assert as many things as we want and fish out as many values as we need directly from the resulting page in milliseconds now, instead of several seconds (or worse for IE) before.
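For readers who would rather see the idea in Java, here is a rough sketch with two substitutions that are mine and not what our tests use: the JDK's built-in `javax.xml.xpath` stands in for Jaxen, and a hard-coded, already well-formed page stands in for the cleaned-up output of `selenium.getHtmlSource()`. The class `PageQueries`, its `textOf` method and the sample markup are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse the page once, then answer any number of XPath queries locally.
class PageQueries {
    private final Document document;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    // Assumes the page is already well-formed XML (in the post, HtmlCleaner does that job).
    PageQueries(String wellFormedPage) {
        try {
            this.document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedPage)));
        } catch (Exception e) {
            throw new IllegalArgumentException("page is not well-formed XML", e);
        }
    }

    // Returns the string value of the first node matched by the expression.
    String textOf(String expression) {
        try {
            return xpath.evaluate(expression, document);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    // Made-up sample page standing in for what the browser would return.
    static final String SAMPLE_PAGE = "<html><body>"
            + "<table class='DataList'><tr><td><a href='/m/1'>Acme</a></td></tr></table>"
            + "<div id='status'>3 results</div>"
            + "</body></html>";

    public static void main(String[] args) {
        PageQueries page = new PageQueries(SAMPLE_PAGE);
        System.out.println(page.textOf("//table[@class='DataList']//a"));       // Acme
        System.out.println(page.textOf("//div[@id='status']"));                 // 3 results
        System.out.println(page.textOf("//table[@class='DataList']//a/@href")); // /m/1
    }
}
```

Every call to `textOf` runs against the in-memory document, so the number of assertions no longer affects the number of round trips to the browser.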
So now we can run our tests much faster than before – but we’re writing a lot of tests, so it’s going to get slow again if we can’t scale better, even with the optimization with Jaxen and HtmlCleaner.
So we looked into Selenium Grid. Before, we had our Ant script that ran the tests (whether locally or on TeamCity), fire up a Selenium RC Server before the run, and shut it down again afterwards. This was time consuming, and problematic as we added more suites of tests. We wanted our tests to be able to run all at the same time, but each Selenium server (which does the communicating with the browser), needs to be on a different port for this to happen. Gah – this was getting complicated.
Selenium Grid, however, addresses this problem and several more at once.
It allows you to start a single “Hub” server, on one port (4444 by default). This hub then redirects load from tests to one of a number of actual Selenium RC servers that you fire up and leave running long-term.
In our case, we started 3 RC servers to talk to Firefox, and one for IE (it’s not a good idea to run more than one IE RC server on a single machine, as IE doesn’t play nice with other IEs).
Then on other VMs we can fire up even more RC servers – both IE and Firefox flavors. Selenium Grid gives you a nice control panel to see the servers and which ones are available, and you can add new servers (and remove them) while the grid is up.
The max number of test runs you can have going at once is now the total number of Selenium RC servers in your grid, allowing many short sets of tests to all run at once, giving much faster actual elapsed time before you get feedback when something breaks.
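The "one run per RC server" limit can be pictured with a toy model. To be clear, this is not Selenium Grid code and it uses no Selenium API: `ToyHub`, `runSuite` and `peakConcurrency` are names invented for this sketch, and a sleep stands in for a test suite. Idle servers sit in a queue, a test run has to take one before it can start, and it hands the server back when it finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: NOT Selenium Grid, just a pool of named "servers".
class ToyHub {
    private final BlockingQueue<String> idleServers;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ToyHub(List<String> serverNames) {
        idleServers = new ArrayBlockingQueue<>(serverNames.size(), false, serverNames);
    }

    // Blocks until a server is idle, "runs" a suite on it, then returns the server.
    void runSuite(long millis) throws InterruptedException {
        String server = idleServers.take();
        try {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // remember the busiest moment
            Thread.sleep(millis);                  // stand-in for the real test run
        } finally {
            running.decrementAndGet();
            idleServers.put(server);
        }
    }

    // Starts `suites` test runs at once against `servers` servers and
    // reports the highest number that were ever running simultaneously.
    static int peakConcurrency(int servers, int suites) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= servers; i++) {
            names.add("rc-" + i);
        }
        final ToyHub hub = new ToyHub(names);
        ExecutorService testRunners = Executors.newFixedThreadPool(suites);
        try {
            List<Callable<Void>> runs = new ArrayList<>();
            for (int i = 0; i < suites; i++) {
                runs.add(() -> { hub.runSuite(20); return null; });
            }
            for (Future<Void> run : testRunners.invokeAll(runs)) {
                run.get();
            }
            return hub.peak.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            testRunners.shutdown();
        }
    }

    public static void main(String[] args) {
        // 12 suites started together, 4 servers: the peak never exceeds 4.
        System.out.println("peak concurrent runs: " + peakConcurrency(4, 12));
    }
}
```

Adding a server to the list raises the ceiling by one, which is the scaling property that makes the grid attractive.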
Of course, one of the benefits of this new approach with Selenium Grid is that I can now test if something works via IE on Windows – from my Mac! For that matter, any combination of browsers and operating systems that we’ve got an RC server for can be tested, all at once. Good stuff.
There are still plenty of frustrations in this process – if you kill a test suite before it finishes, for instance, it appears that the Selenium RC server it was talking to remains unavailable forever, and when you restart the hub, you have to restart all of the RC servers… but it’s a step up from where we were.
Java has always had the ability to develop concurrent applications – right from the start, the “Thread” class and its kin gave a single Java application the power to spawn off a separate thread of execution, while continuing with the original one. Like all concurrency, it was of course limited by the number of cores or systems available to run those threads, but the basics were there.
More recently, however, it’s become clear that higher levels of concurrency are required in order to make our applications as scalable as we’d like, especially in light of recent hardware trends that give us not as many huge leaps upward in terms of speed, but provide more and more cores for our applications to spread themselves into in each system.
Concurrency is more than running multiple processes on the one box, though – we must think of the problems we’re trying to solve in new ways, ways that lend themselves well to expressing groups of smaller tasks that can all be happening at once, with as little temporal dependency as possible – that is, trying to reduce the need for one task to wait for another task to complete before it can move ahead. This is not just a programming issue, it has to be taken into account at design-time as well, to a degree.
Java’s original concurrency support was oriented more towards richer threads – entire streams of execution that could exist for a long time if needed. Just the thing if you’re writing a web server, or something similar. These richer threads were expensive to create, however, so a developer didn’t want to just spin off thousands of them, or the hoped-for increase in scalability would be lost to the overhead pretty quickly.
Other problems lend themselves better to a lighter-weight kind of concurrency, where the cost of breaking up into smaller concurrent tasks is lower, and the tasks are generally shorter-lived and less complex than a full thread. This leads us to lightweight alternatives such as Fibers, Processes, and Actors, like those found in more functionally-oriented languages such as Haskell and Erlang.
In more recent incarnations of Java, libraries were added to the JDK to allow us to do quite a bit of lightweight concurrency right in good ole’ fashioned Java, however, and these bear closer inspection if you’re not familiar with them.
Let’s take a closer look at java.util.concurrent and its classes by way of a code example…
Let’s say we’ve got a class to do some useful piece of work, say a piece of mathematics, that takes some time to process. Let’s call it “UsefulWork” in a fit of creativity. It’s created with a parameter that gives it what it needs to do the useful thing it does – in our example, just an integer.
We have a lot of these classes to run in order to solve our overall problem, each with a different set of parameters. When we’re all done, we need to add up the results and present the final answer to our caller.
In the sequential world, we might define our UsefulWork process as a simple method, like so:
    private long doUsefulWork(int param) {
        // ... do the number crunching ...
        return result;
    }
and we might call this method a bunch of times, once with each of our parameters, adding up the overall result as we go, like thus…
    long finalResult = 0L;
    for (int i = 0; i < numberOfJobs; i++) {
        finalResult = finalResult + doUsefulWork(param[i]);
    }
Each computation complete, we add up our answer, and life is good. But if we look at this problem in a different way, there’s nothing about each call to doUsefulWork that requires the answer from the previous call – each one does its job independently. So what’s stopping all the useful work from happening concurrently, rather than one piece at a time? Not a thing – let’s see how that would look.
Instead of a method, we now define a little class to do our useful computation for us, while implementing the Callable interface, like this:
    private class UsefulWork implements Callable<Long> {
        private int param = 0;

        public UsefulWork(int param) {
            this.param = param;
        }

        @Override
        public Long call() throws Exception {
            // ... work out the result, using the param, however it is we do that ...
            return amount;
        }
    }
The “call()” method is defined in the Callable interface, and is a bit like the “run()” method from Thread, except it allows us to return a result.
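The difference between "run()" and "call()" is easiest to see side by side. In this sketch (the class `RunVersusCall` and its two methods are invented for illustration) the same trivial calculation is done both ways: with a Thread the result has to be smuggled out through shared state, while a Callable simply returns it and the Future hands it over.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical comparison of Runnable.run() and Callable.call().
class RunVersusCall {

    // run() returns void, so the answer travels through a shared holder.
    static long squareWithThread(final int n) {
        final long[] holder = new long[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                holder[0] = (long) n * n;
            }
        });
        worker.start();
        try {
            worker.join(); // wait for the worker; join() also makes its write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return holder[0];
    }

    // call() returns the answer directly; the Future delivers it to the caller.
    static long squareWithCallable(final int n) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Long> future = executor.submit(new Callable<Long>() {
                @Override
                public Long call() {
                    return (long) n * n;
                }
            });
            return future.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(squareWithThread(12));   // 144
        System.out.println(squareWithCallable(12)); // 144
    }
}
```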
Now we tee up a series of our UsefulWork instances, like so:
    // newArrayList is a static import of com.google.common.collect.Lists.newArrayList
    List<UsefulWork> usefulWorkList = newArrayList(
        new UsefulWork(100), new UsefulWork(50), new UsefulWork(200),
        new UsefulWork(10), new UsefulWork(500), new UsefulWork(125),
        new UsefulWork(80));
(we’re using a method from the invaluable Google Collections library to spin up a new ArrayList of the proper type here)
Now we fire up our new Callables like so:
    ExecutorService executorService = Executors.newFixedThreadPool(POOL_SIZE);
    List<Future<Long>> futures = executorService.invokeAll(usefulWorkList, 5, TimeUnit.SECONDS);

    Long total = 0L;
    for (Future<Long> future : futures) {
        total = total + future.get();
    }
In the first line above, we create an ExecutorService, in this case, using a fixed-size “pool” of threads which will be used to run our jobs (this is the expensive bit, so it’s nice to do this in some kind of initialization only once – the same pool can be re-used).
Then we create a list of Future objects, each of which deals with type “Long”, by calling executorService.invokeAll. The parameters are our list of Callables, a timeout, and the unit in which the timeout is expressed – in this case, seconds. What we’ve said is “start doing all these things I’m giving you, but don’t wait more than 5 seconds for the whole batch”; any job that hasn’t finished by then is cancelled. There is also a version of invokeAll without a timeout, but then a hanging Callable can get us in trouble.
The rest of our code above is just collecting the results. The call to invokeAll blocks while our UsefulWork instances churn away (POOL_SIZE of them at a time, anyway), each working out its individual answer without waiting for the herd, and returns once they have all finished or the timeout has expired.
Our for loop at the end simply collects the fruit of their labours, retrieving the results of the calculations that have already been performed. It’s important to note that the call to “get()” is not actually doing the calculation at that moment – by the time invokeAll has returned, the calculation is already finished, ready to return (for a job that was cancelled by the timeout, get() throws a CancellationException instead). This is the power of the Future interface – a handle on a job that does its work elsewhere and is available at a later time for retrieving the result.
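Putting the fragments above together gives something you can compile and run. A few things in this sketch are my own assumptions rather than part of the original code: the number crunching is a placeholder (the sum of 1 to param), POOL_SIZE is set to 4, the wrapper class is called `UsefulWorkDemo`, and a plain ArrayList replaces the Google Collections helper so the example needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class UsefulWorkDemo {
    static final int POOL_SIZE = 4; // assumed value, tune for your machine

    static class UsefulWork implements Callable<Long> {
        private final int param;

        UsefulWork(int param) {
            this.param = param;
        }

        // Placeholder work (an assumption): the sum of 1..param.
        @Override
        public Long call() {
            long amount = 0;
            for (int i = 1; i <= param; i++) {
                amount += i;
            }
            return amount;
        }
    }

    // The sequential version: one job after another.
    static long sequentialTotal(int[] params) {
        long total = 0L;
        for (int p : params) {
            total += new UsefulWork(p).call();
        }
        return total;
    }

    // The concurrent version: hand every job to the pool, then add up the results.
    static long concurrentTotal(int[] params) {
        List<UsefulWork> jobs = new ArrayList<>();
        for (int p : params) {
            jobs.add(new UsefulWork(p));
        }
        ExecutorService executorService = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            List<Future<Long>> futures = executorService.invokeAll(jobs, 5, TimeUnit.SECONDS);
            long total = 0L;
            for (Future<Long> future : futures) {
                total += future.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executorService.shutdown(); // let the pool's threads exit
        }
    }

    public static void main(String[] args) {
        int[] params = {100, 50, 200, 10, 500, 125, 80};
        System.out.println(sequentialTotal(params)); // 162845
        System.out.println(concurrentTotal(params)); // 162845
    }
}
```

Both versions give the same total, which is the point: the jobs are independent, so the order and the thread they run on make no difference to the answer.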
This is of course an extremely simple example, and doesn’t take into account several things you might need to think about for real production code, but I think it gives a taste of what’s possible just with what comes with the standard JDK.
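Two of those production concerns are worth a quick sketch: what happens to a job that outlives the timeout, and shutting the pool down afterwards. Everything here is hypothetical (the class `TimeoutDemo`, the `sleeper` jobs and the timings), with sleeps standing in for real work. A job still running when the invokeAll timeout expires is cancelled, and calling get() on its Future throws a CancellationException rather than returning a value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TimeoutDemo {

    // A made-up job that sleeps for a while and then returns its own name.
    static Callable<String> sleeper(final String name, final long millis) {
        return new Callable<String>() {
            @Override
            public String call() throws InterruptedException {
                Thread.sleep(millis);
                return name;
            }
        };
    }

    // Runs a quick job and a slow job with the given batch timeout and
    // reports, per job, either its result or "cancelled".
    static List<String> runWithTimeout(long timeoutMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> jobs =
                    Arrays.asList(sleeper("quick", 10), sleeper("slow", 5000));
            List<Future<String>> futures =
                    executor.invokeAll(jobs, timeoutMillis, TimeUnit.MILLISECONDS);
            List<String> outcomes = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    outcomes.add(future.get());
                } catch (CancellationException e) {
                    outcomes.add("cancelled"); // the timeout cut this job off
                }
            }
            return outcomes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            // Without a shutdown, the pool's idle threads can keep the JVM alive.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(500)); // [quick, cancelled]
    }
}
```

In real code you would decide what a cancelled job means for the total: skip it, retry it, or fail the whole calculation.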
In a later post we’ll lean on java.util.concurrent a bit harder, and see what other things we can do.
Given a need to work towards more scalable systems, and to increase concurrency, many developers working in Java wonder what’s on the other side of the fence.
Languages such as Erlang and Haskell bring the functional programming style to the fore, as they essentially force you to think and write functionally – they simply don’t support (well, at least not easily) doing things in a non-functional fashion.
The mental grinding of gears, however, can be considerable for developers coming from the Java (or C/C++) worlds, who have become familiar with the object-oriented paradigm. An excellent solution might be to consider the best of both worlds: Scala.
I had the opportunity this weekend to renew my acquaintance with Scala, and to work on blending it with the old mainstay, Java. Scala runs on the JVM, so as you might expect, it can be combined pretty readily with Java – more cleanly than before, given some new Maven and IDE plugins.
IDE Support for Scala
My IDE of choice lately has been Netbeans, and sure enough, there’s a (beta) Scala plugin for Netbeans.
Installing this plugin gives me syntax-aware editing of Scala source, along with basic refactoring support. Here’s the obligatory “Hello, World” in Scala sitting happily in my Netbeans window, color-highlighted syntax and all:
Maven plugin
In order to organize my code and tests, and generally make life easier, I’ve used Maven to first generate the project, then created the code above (and some more, as we’ll see in a minute). I started with the regular “mvn archetype:generate” command to build my source directories just as for a Java project, then added a bit of Maven magic to the generated POM to use the Maven Scala plugin. Here’s the full POM:
Lines 21-25 tell Maven where to find the actual Scala plugin
Lines 29-33 add the Scala libraries to our list of dependencies
Lines 44 thru 48 add the Scala plugin itself to our build lifecycle
Lines 58 thru 75 specify the lifecycle events we bind to for actually compiling first the Scala production code, then the tests.
When we open this POM with Netbeans, you’ll see we’ve got a folder defined for both our Scala sources and tests, and our regular folders for Java sources and tests:
Calling Scala code from Java code
Now that we’ve got our project defined to contain both Scala and Java, let’s put them to work together. We can call our Scala code directly from Java just as if the objects involved were Java, like so:
    HelloWorld.main(null);
The only class we’ve got defined with the name HelloWorld is the Scala version – but it looks like Java as far as our Java classes are concerned. (There are some things to keep in mind when doing more advanced types between the two, but if you keep your interfaces straightforward these edge cases don’t arise very often).
Calling Java code from Scala code
The reverse is also true: we can make use of the entire library of existing Java code out there from Scala, even easier than in the reverse direction, as the more sophisticated type system of Scala handles anything Java can throw at it. Let’s look at a good example: Using JUnit4 (a Java library) to test both Scala and Java code.
Using JUnit4 to test Scala
The simple and expressive JUnit library has implementations in a number of other languages, but in the case of Scala and Java, the connection is so close we don’t really need a Scala version at all – we can use JUnit directly from a test written in Scala, as described below.
With the test written in Scala…
Here we see a Scala test, DogTest, running from within Netbeans (using Maven).
(Yes, the DogTest class is actually defined in a file called AppTest.scala – perfectly valid in Scala)
The results can be seen in the lower window (printing “Hello, World”).
As you can see, we’re using JUnit4, even with its annotations, despite the fact the test is written in Scala.
With the test written in Java…
Now let’s go the other way: This is simply a matter of writing a regular Java JUnit4 test, except instead of calling a Java class we’re actually calling our Scala class again.
In this example, you can see we’re using JUnit 4 in the usual way, but on lines 20 and 25 we make reference directly to Scala classes, just as if they were regular Java classes. (The IDE shows them underlined due to a classpath issue, but they compile just fine in any case – the Netbeans Scala support is still Beta, after all).
As you can see, these two languages blend very smoothly in many respects, even more so than one of my other favorite combinations, Java and JRuby.
Although it is cool to see Scala and Java in a single project, I probably wouldn’t do it this way for production code. A cleaner approach might be to write a pure Scala module that is then a dependency of the Java project (or vice-versa), using Maven’s ability to handle cross-project SNAPSHOT dependencies to break our work up into more manageable chunks.
We can then call Scala functionality from any Java application, even webapps, while still working with all of the exact same tools and procedures we’re used to for pure Java projects. Of course, we can also go the other way: write Scala web applications (perhaps with the excellent Lift framework) and call existing Java functionality, just as easily.
I predict Scala will be used in more and more Java (and other JVM-based language) projects where the advantages of functional programming and high concurrency are necessary, while at the same time preserving the massive investment in Java many of us already have.
Long live the happy couple!