Speed up your Unit Tests with Parallel Execution
Jun 8th, 2010 by Mike

I recently had occasion to use a trick with test execution that is simple and very helpful, but that I forget about frequently, so I thought I’d pass it along here.

We use Maven to execute our unit (and sometimes functional, but that’s another story) tests for a suite of applications my team works on. Some of our apps have quite a few tests, and even though each unit test is very fast (as good unit tests ought to be), running that many of them still takes a bit of time. We’re also burdened with a very slow CI server, unfortunately, which aggravates that situation.

Unit tests, by definition, do not rely on the operation of any other test – they are self-contained, stateless and repeatable. This means they’re ideally suited to running in parallel, and fortunately, that’s very easy to do with Maven.

Maven uses the Surefire Plugin to do its test-running, and it’s configurable to run tests in parallel. Our project uses a parent pom (different from an aggregator pom) to specify common configuration, so our pom files stay lean. In our parent pom, we just add the following:


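  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.5</version>
    <configuration>
      <parallel>methods</parallel>
      <threadCount>20</threadCount>
    </configuration>
  </plugin>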

And with that, we’ve told Maven to run our tests in a maximum of 20 threads, and to run each test method in parallel. (There are other options here, such as running only classes in parallel, which might be helpful for functional tests.)
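To make the idea a bit more concrete, here is a rough sketch of what "run independent tests on a bounded pool of threads" amounts to. This is not how Surefire is implemented; the `ParallelRunner` class and its `runAll` method are made up for illustration, and the "tests" are just plain `Callable`s that return true when they pass.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Maven or Surefire: runs independent
// checks on a fixed-size thread pool and reports how many passed.
class ParallelRunner {

    // Runs every check on at most maxThreads threads. A check "passes"
    // when it returns true without throwing.
    static int runAll(List<Callable<Boolean>> checks, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            // invokeAll blocks until every check has finished
            List<Future<Boolean>> results = pool.invokeAll(checks);
            int passed = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        passed++;
                    }
                } catch (Exception e) {
                    // a check that threw counts as a failure
                }
            }
            return passed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for checks", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

This only works because the checks share no state. If two of them wrote to the same static field or the same temp file, running them side by side could make them fail at random, which is the same reason parallel execution suits well-behaved unit tests and not much else.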

Just doing that took a minute off the time of a couple of our builds, and a bit off the time of even some of the smaller ones. Of course, it’s even more effective if you’re running on a multicore machine, but even the feeble VM running our CI was faster with it there.

Enjoy!

Putting the “fun” back in Functional Testing
Aug 14th, 2009 by Mike

Another update in our team’s journey to functional testing nirvana via Specs and Selenium. :)

We discovered an earth-shattering revelation (not): Internet Explorer Sucks. Unfortunately, it’s the “target” browser for a certain project we’re working on, therefore we’re stuck with it. Being stuck with it means finding a way to make it suck sufficiently less that it will actually run our lovely tests, which Firefox (and several other browsers) digest quite happily.

One way in which IE sucks is that it’s painfully slow – tests that take a few seconds on Firefox time out after several minutes in IE. If you keep cranking up the timeout they might eventually pass, but we want test results in our lifetimes, so we set about finding a way to get our feedback out of IE a bit faster.

It turns out one of the very slow things about IE is asking for an element via an xpath, and we were doing this a lot. We tried reducing the number of places we did this, and indeed that helped a bit, but there were some things on our existing app that were really hard to refer to without xpath. To ease the pain further, we took a slight detour. Instead of asking Selenium for a specific xpath, which then caused Selenium to ask the browser, we fetched the entire output of the browser (via the getHtmlSource method in Selenium).

What we’d like to do instead, where possible, is simply make assertions on the XML of the entire page – so we get it just one time, and so that we don’t depend on the Xpath implementation of the browser to find bits of our page for us.

Unfortunately, we still can’t ask anything xpath-ish of it: although the HTML bears a close resemblance to XML, it’s not really XML, so any parser we could find choked on it.

Enter HtmlCleaner, a handy little jar that can digest even the most disgusting HTML and deposit clean shiny XML in its place. Now we could take our XML version of what the browser spat out and finally make Xpath calls to it.

Now Scala itself has excellent support for XML (see the whole book here), but its support for Xpath is… interesting. Xpath is actually done via a series of methods and functions, meaning although it is very powerful, it’s not the same xpath as you might be used to, or, more importantly in our case, not the same as you could expect a browser to understand. This means we could either re-write all our Xpath queries (and make it harder to use Selenium IDE or Firebug to help us write new ones), or find another way…

Enter Jaxen, a handy Java lib with full Xpath support, without all the weight of other solutions. Of course, as Specs is written in Scala and Scala happily uses any existing Java lib, we could just drop a jar into our “lib” dir and work some magic.

We needed to convert the HTML produced by Selenium into XML, then turn it into a DOM document that Jaxen can process, so we can ask questions in xpath of the resulting Jaxen document.

This little incantation:

  import org.htmlcleaner.{CleanerProperties, HtmlCleaner}

  // Build the cleaner once and reuse it for every page we inspect
  val cleanerAndProps = initCleaner

  private def initCleaner = {
    val cleaner: HtmlCleaner = new HtmlCleaner()
    val props: CleanerProperties = cleaner.getProperties()

    props.setTranslateSpecialEntities(true)
    props.setRecognizeUnicodeChars(true)
    props.setOmitComments(true)
    (cleaner, props)
  }

returns a tuple of the HtmlCleaner instance and its properties, which we can then use for our conversion somewhat like so:

  import org.htmlcleaner.{DomSerializer, TagNode}
  import org.jaxen.dom.DOMXPath

  // Evaluates an XPath query against the cleaned-up page source
  def getHtmlText(element: String) = {
    val node: TagNode = cleanerAndProps._1.clean(selenium.getHtmlSource())
    val domSerializer: DomSerializer = new DomSerializer(cleanerAndProps._2)
    val document: org.w3c.dom.Document = domSerializer.createDOM(node)
    val xpath = new DOMXPath(element)
    xpath.stringValueOf(document)
  }

This method takes a string (actually an XPath query), “cleans” the HTML from Selenium, then serializes it into a DOM (a regular org.w3c.dom.Document). We can then use Jaxen’s DOMXPath object to fish out the string value of whatever our XPath query returns, allowing us to say things like:

manufacturerSelected = getHtmlText("//table[@class='DataList']//a")

In our tests.

Of course, we can dress this with other methods that get attributes and other things from the DOM tree, but you get the point.

The nice part about all this is that it doesn’t require a new round trip to the browser – we can assert as many things as we want and fish out as many values as we need directly from the resulting page in milliseconds now, instead of several seconds (or worse for IE) before.
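Here is a small sketch of that "fetch once, ask many times" idea in plain Java. To keep it self-contained I have swapped in the JDK’s built-in `javax.xml.xpath` support instead of Jaxen, and it assumes the markup has already been cleaned into well-formed XML (the job HtmlCleaner does for us). The `PageQueries` class is a made-up name for illustration, not something from our test code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parses a page once, then answers any number of
// XPath queries from memory without going back to the browser.
class PageQueries {
    private final Document document;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    // Assumes the markup is already well-formed XML (e.g. after cleaning).
    PageQueries(String wellFormedXml) {
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse page", e);
        }
    }

    // Returns the string value of the first node matching the query,
    // or an empty string when nothing matches.
    String text(String query) {
        try {
            return xpath.evaluate(query, document);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath: " + query, e);
        }
    }
}
```

A test would build one `PageQueries` per page load and then call `text(...)` as often as it likes; every call after the constructor is an in-memory lookup.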

So now we can run our tests much faster than before – but we’re writing a lot of tests, so it’s going to get slow again if we can’t scale better, even with the optimization with Jaxen and HtmlCleaner.

So we looked into Selenium Grid. Before, we had our Ant script that ran the tests (whether locally or on TeamCity) fire up a Selenium RC Server before the run, and shut it down again afterwards. This was time consuming, and problematic as we added more suites of tests. We wanted our tests to be able to run all at the same time, but each Selenium server (which does the communicating with the browser) needs to be on a different port for this to happen. Gah – this was getting complicated.

Selenium Grid, however, addresses this problem and several more at once.

It allows you to start a single “Hub” server, on one port (4444 by default). This hub then redirects load from tests to one of a number of actual Selenium RC servers that you fire up and leave running long-term.

In our case, we started 3 RC servers to talk to Firefox, and one for IE (it’s not a good idea to run more than one IE RC server on a single machine, as IE doesn’t play nice with other IEs).

Then on other VMs we can fire up even more RC servers – both IE and Firefox flavors. Selenium Grid gives you a nice control panel to see the servers and which ones are available, and you can add new servers (and remove them) while the grid is up.

The max number of test runs you can have going at once is now the total number of Selenium RC servers in your grid, allowing many short sets of tests to all run at once, giving much faster actual elapsed time before you get feedback when something breaks.
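If it helps to picture why the number of RC servers is the ceiling, here is a toy model of the hub’s bookkeeping. It is only a sketch of the idea and has nothing to do with Selenium Grid’s real code; `ToyHub`, `tryLease` and `release` are invented names.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model only, not Selenium Grid code: a hub that lends each idle
// RC server to one test run at a time.
class ToyHub {
    private final BlockingQueue<String> idleServers;

    ToyHub(List<String> servers) {
        // Capacity must be at least 1, even for an empty grid
        idleServers = new ArrayBlockingQueue<>(Math.max(1, servers.size()), false, servers);
    }

    // Returns an idle server, or null when every server is busy.
    String tryLease() {
        return idleServers.poll();
    }

    // Hands a server back so another test run can use it.
    void release(String server) {
        idleServers.add(server);
    }

    int idleCount() {
        return idleServers.size();
    }
}
```

With four servers registered, the fifth test run gets nothing until one of the first four hands its server back.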

Of course, one of the benefits of this new approach with Selenium Grid is that I can now test if something works via IE on Windows – from my Mac! For that matter, any combination of browsers and operating systems that we’ve got an RC server for can be tested, all at once. Good stuff.

There are still plenty of frustrations in this process – if you kill a test suite before it finishes, for instance, it appears that the Selenium RC server it was talking to remains unavailable forever, and when you restart the hub, you have to restart all of the RC servers… but it’s a step up from where we were.

Built-In Database Upgrades
May 28th, 2009 by Mike

Often an application (web application or otherwise) involves a database for storing its persistent data. As the application evolves over time, the database needs to change, and sometimes existing data needs to be updated to reflect those changes.

How do we keep the app in sync with the schema changes of the corresponding database, in all the environments we might deploy into – such as local, testing, perhaps staging, and production? In some situations we want to re-create the database from scratch (such as a testing environment, for instance). In others (such as a production environment), we definitely don’t want to re-create the database, but just make carefully scripted modifications. In all cases, we want to make only the necessary modifications, based on the current state of the database.

A tool that a colleague recently recommended to me for this job is dbdeploy, a utility for maintaining synchronization between our application and our database. Dbdeploy is less complex than other solutions, such as Database Migrations in Rails, arguably easier to implement and more portable to non-Ruby/Rails environments, and uses just SQL as the language to express the database changes.

Like Migrations, dbdeploy allows us to quickly and easily script our database changes, then apply those changes as required to a specific database instance in each of our deployment environments.

We can create a series of simple SQL files, like

  • 001_create_base.sql
  • 002_expand_customer_name.sql
  • 003_add_order_status.sql
  • 004_update_customers.sql

…and so forth. Each file represents a single simple change to our database schema, and the files are numbered in the order in which the changes occur. We don’t edit existing files, but add new ones as further changes are made, to ensure backwards-compatibility with all versions of our database.

Dbdeploy uses a change-log table, added to each of our target databases, to keep track of which scripts have already been run in that environment, automatically producing a script containing the set of changes that bring the current database up to the latest specification, including only the necessary changes and updates.
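The selection logic is easy to picture with a small sketch. This is only an illustration of the idea, not dbdeploy’s actual code: `PendingScripts` is a made-up class, and the set of applied numbers stands in for whatever the change-log table holds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Illustrative only: mimics the selection idea, not dbdeploy's own code.
class PendingScripts {

    // Reads the leading number of a name such as "003_add_order_status.sql".
    static int changeNumber(String fileName) {
        int end = 0;
        while (end < fileName.length() && Character.isDigit(fileName.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("no leading number: " + fileName);
        }
        return Integer.parseInt(fileName.substring(0, end));
    }

    // Returns the scripts whose numbers are not yet in the change log,
    // sorted by ascending change number.
    static List<String> pending(List<String> scriptFiles, Set<Integer> appliedNumbers) {
        List<String> result = new ArrayList<>();
        for (String file : scriptFiles) {
            if (!appliedNumbers.contains(changeNumber(file))) {
                result.add(file);
            }
        }
        result.sort(Comparator.comparingInt(PendingScripts::changeNumber));
        return result;
    }
}
```

So a database whose change log already lists changes 1 and 2 would only be handed 003_add_order_status.sql and 004_update_customers.sql, in that order.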

Dbdeploy comes with a command-line interface and an Ant task implementation, each of which is very straightforward to use. By default, however, dbdeploy does not apply the changes – it only produces an SQL script that we can then review and either apply manually or via a second Ant task (or other means). Both dbdeploy and this second task need the connection information for the database – the driver, JDBC URL, login and password. This is where I started to think “this can be simpler”…

If we’re deploying an application whose configuration already includes the necessary data to connect to the database, we could employ dbdeploy in a different way that overcomes this issue, as well as adding a few other advantages.

Let’s say in our development environment we keep our dbdeploy delta scripts in a directory that gets bundled into our finished application as resources. Maven can easily do this if you just put the files in the src/main/resources directory (by default), for instance. Now our database scripts can be bundled with our application automatically every time it’s built, without having any extra files to worry about.

We can also include dbdeploy in our classpath, instead of using it as a separate utility, and wrap up our application as a single jar file that includes our database definition scripts. We can extract these files from our application jar on demand with code like this (we don’t show the getResources method for clarity, but it gets a list of resource paths matching a specified pattern from our current classpath):

    public static String extractScripts() throws Exception {
        File scriptsDir = new File(SCRIPT_DIR_NAME);
        if (!scriptsDir.exists() && !scriptsDir.mkdirs())
            throw new RuntimeException("Could not create dir '" + SCRIPT_DIR_NAME + "'");
        // Escape the dot so that only names ending in ".sql" match
        Collection<String> scripts = getResources(".*\\.sql");
        for (String script : scripts)
            extractToDir(script);
        return SCRIPT_DIR_NAME;
    }

    private static void extractToDir(String script) throws Exception {
        String endName = new File(script).getName();
        // Look the script up at the root of the classpath
        InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(endName);
        if (in == null)
            throw new RuntimeException("Can't find resource " + endName);
        FileOutputStream fos = new FileOutputStream(new File(SCRIPT_DIR_NAME, endName));
        try {
            byte[] buf = new byte[1024];
            int i;
            while ((i = in.read(buf)) != -1)
                fos.write(buf, 0, i);
        } finally {
            // Close both streams even if the copy fails
            fos.close();
            in.close();
        }
    }

Let’s say our main application is invoked with “java -jar someApplication-1.0.jar”. This implies we’ve set up a default main class and method in our jar meta-data – it’s the same as saying “java -cp someApplication-1.0.jar com.point2.main.MyMainClass”, just easier to type.

Nothing stops us having another main class, however, that we can invoke on demand, e.g.:
java -cp someApplication-1.0.jar com.point2.main.DbDeploy

Code like this allows us to run dbdeploy on our extracted scripts, while passing it the database connection info we already know:

    public static void prepScripts(String dbType, String driverName, String dbUrl, String userName, String password) throws Exception {
        com.dbdeploy.DbDeploy runner = new com.dbdeploy.DbDeploy();
        // Connection details come from the application's own configuration
        runner.setDbms(dbType);
        runner.setDriver(driverName);
        runner.setUrl(dbUrl);
        runner.setUserid(userName);
        runner.setPassword(password);
        // Read the extracted delta scripts and write the combined upgrade script
        runner.setScriptdirectory(new File(SCRIPT_DIR_NAME));
        runner.setOutputfile(new File("scripts/output.sql"));
        runner.go();
    }

This code produces our “output.sql” script, which contains exactly the changes required to bring our specified database up to date.

Then we can actually execute the resulting script (if we want to) via a method like so:

    public static void applyScript(String script, String driverName, String dbUrl, String userName, String password) throws Exception {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(SCRIPT_DIR_NAME + "/" + script)));
        try {
            String strLine;
            while ((strLine = br.readLine()) != null) {
                sb.append(strLine);
                sb.append("\n");
            }
        } finally {
            br.close();
        }
        executeSql(sb.toString(), driverName, dbUrl, userName, password);
    }

    private static void executeSql(String sql, String driverName, String dbUrl, String userName, String password) throws Exception {
        Driver driver = (Driver) Class.forName(driverName).getDeclaredConstructor().newInstance();
        // Hand the login and password to the driver as connection properties (java.util.Properties)
        Properties info = new Properties();
        info.setProperty("user", userName);
        info.setProperty("password", password);
        Connection connection = driver.connect(dbUrl, info);
        try {
            Statement statement = connection.createStatement();
            statement.executeUpdate(sql);
            statement.close();
        } finally {
            connection.close();
        }
    }

Doing this starts a sequence that

  1. Extracts the database change scripts from the jar file
  2. Uses the application’s own configuration to get the JDBC Driver, URL, login and password information
  3. Invokes dbdeploy (internally) to create the database update script appropriate to whatever database we’re connected to
  4. Optionally, if we include the “execute” option on our command-line, runs the resulting script – if necessary, creating the “change log” table used by dbdeploy in the process

As a finishing touch, we can examine the changelog table and determine exactly what incremental version our target database has been brought up to, and compare this to a known required version number in our application. In this way, our application can automatically verify that the dbdeploy process has been done on the database it’s being run against – and if it hasn’t, we can output the proper command-line to perform the upgrade as a suggestion to the user (or even do it automatically, if permissible in our environment).
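A startup check along those lines might look something like the sketch below. Everything in it is hypothetical: the `SchemaVersionCheck` class, the wording of the hint, and the assumption that the applied change numbers have already been read out of the changelog table into a collection. The command it suggests just reuses the example main class and “execute” option from above.

```java
import java.util.Collection;

// Hypothetical startup check; the class, method names and message
// wording are invented for this sketch.
class SchemaVersionCheck {

    // Highest change number recorded as applied, or 0 for an empty change log.
    static int currentVersion(Collection<Integer> appliedChangeNumbers) {
        int max = 0;
        for (int n : appliedChangeNumbers) {
            max = Math.max(max, n);
        }
        return max;
    }

    // Returns null when the database is up to date, otherwise a hint
    // telling the user how to run the upgrade.
    static String upgradeHint(Collection<Integer> appliedChangeNumbers, int requiredVersion, String jarName) {
        int current = currentVersion(appliedChangeNumbers);
        if (current >= requiredVersion) {
            return null;
        }
        return "Database is at change " + current + " but the application needs " + requiredVersion
                + "; run: java -cp " + jarName + " com.point2.main.DbDeploy execute";
    }
}
```

The application would call `upgradeHint` once at startup and refuse to continue (or kick off the upgrade itself) whenever it gets a non-null answer.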

Now, instead of having a number of extra pieces to deploy to our target environment, we’re down to one nice tidy self-contained jar file. We can choose to either update the database immediately, or dump the script first so we can examine what it’s going to do, and then either let the application apply it or apply the changes manually ourselves.

Compared to dbdeploy in either Ant or standalone mode, we’ve got the following advantages:

  1. We keep the simplicity of our single-jar deployment
  2. We can upgrade our database in any environment our application can run in – no need for extra tools
  3. We don’t need to repeat our database configuration information in two places – so they can’t get out of sync and end up running against the wrong database
  4. We ensure that the scripts required for the current version of the code are always with the code
  5. We guarantee that our application always runs against the database versioned to the expected state

I definitely recommend dbdeploy for this type of scenario.
