Functional Testing for Web Applications
As far as I can tell, available tools for performing functional testing on web applications do not make it easy to reliably reproduce what happens when a real user interacts with the site. Admittedly, it's hard to get a comprehensive list of the available tools and their feature matrix, so there may be a killer solution that I haven't come across yet, and I hope that's the case. (For once, even Wikipedia comes up short on the subject.)

I argue that a solution for performing functional testing on modern web applications should have the following properties:
- User input should be able to be simulated using native key and mouse events.
- Tests should be able to be written in JavaScript.
- Test code should be able to suspend execution while waiting for network or other events.
- Tests should be able to run in multiple browsers and on multiple platforms.
- Tests should not hardcode XPaths, element ids, or CSS class names.
- Test suites should be able to be run via a cron job.
User input should be able to be simulated using native key and mouse events
Most testing frameworks that I have looked at do something analogous to selenium-browserbot.js to emulate user input. JavaScript libraries like these use the browser's built-in API to create JavaScript Event objects and dispatch them. To this end, Internet Explorer offers createEventObject() and fireEvent(), whereas other browsers (that comply with the DOM Level 2 Events Specification) provide createEvent() and dispatchEvent(). This is a sensible, clean solution for automating input events that can be wrapped in a uniform API that works across browsers.
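As a rough sketch, such a uniform wrapper might look like the following. The function name and argument values here are my own, for illustration only, and are not taken from Selenium or any other framework:

```javascript
// Hypothetical cross-browser click simulator built on the browser's
// JavaScript event APIs described above.
function simulateClick(element) {
  if (document.createEvent) {
    // DOM Level 2 Events path (Firefox, Safari, Opera).
    var event = document.createEvent('MouseEvents');
    event.initMouseEvent('click', true /* canBubble */, true /* cancelable */,
        window, 1 /* detail */, 0, 0, 0, 0 /* screen/client coordinates */,
        false, false, false, false /* modifier keys */,
        0 /* left button */, null /* relatedTarget */);
    element.dispatchEvent(event);
  } else if (element.fireEvent) {
    // Internet Explorer path.
    var ieEvent = document.createEventObject();
    element.fireEvent('onclick', ieEvent);
  }
}
```

A wrapper like this will run any JavaScript listeners attached to the element, but as explained next, the events it creates are still not the same as the ones a real mouse produces.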
Unfortunately, this does not accurately recreate what happens when a user clicks
the mouse or presses a key. For example, on Firefox, when automating a mouse
event, the event that is dispatched has null values for Firefox's custom
rangeParent
and rangeOffset
properties. If a user
performs a real click with the mouse, those properties identify a collapsed
Range that corresponds to the point in the document where the user clicked.
(This is invaluable if you are trying to create a click-to-edit interface and
want to easily determine where to position the cursor.) Although such a difference
may seem small, it makes a click-to-edit interface like that one impossible to test
using a framework that automates its events through the browser's JavaScript API.
To be fair, building a framework that supports native events on all browsers and platforms is not easy. It requires writing low-level native code for multiple platforms, which is not something many individual developers know how to do. The most accessible cross-platform solution that I have found for this sort of thing is java.awt.Robot. I tried to create a trusted applet that could be embedded in any web page and scripted via LiveConnect, so that JavaScript test code could drive it. Unfortunately, it did not have the same level of fidelity on all browsers.
While I was at Google, I spent a bit of time talking to Simon Stewart who owns the WebDriver project. Simon is a smart guy who really gets it when it comes to web application testing, and he took on the inglorious task of writing the native code for multiple platforms to build a test framework that did honest automation of input events. It appears that since WebDriver was introduced on the Google Open Source blog, its code has started to be migrated into the Selenium repository, also hosted on Google Code.
Although I was disappointed with Selenium's original approach to automating input events, I now have high hopes for Selenium 2, which will include Simon's work! By comparison, any framework that advertises its ability to work with forms as its main strength likely does things such as document.getElementById('input_name').value = 'simulated input', which is not at all what happens when a user types in a field. Starting with a framework that provides true emulation of low-level events makes it possible to simulate everything else.
Tests should be able to be written in JavaScript
Unless your web application is written in Flash or runs as a Java applet, at some level, it is executing JavaScript code. It seems pretty logical that the application code and test code should be able to be written in the same language, so being able to write tests in JavaScript is a must. This guarantees that any features of the language exercised by the application code can be verified in the test code. It also makes it easier for developers to write tests since they will already be comfortable with the testing language.

Though there are many promising things about Selenium 2, the current version of Selenium has support for just about every popular programming language except JavaScript. What gives? Was it really more important to add support for Perl bindings than JavaScript ones? No, but JavaScript bindings are generally harder to write because the most convenient JavaScript interpreter to use is the one built into the browser.
JavaScript executed in a web page by a browser is sandboxed and has no access to things such as the native event queue. JsUnit leverages the browser's interpreter as I describe, and so long as your tests behave within the constraints of the sandbox, JsUnit works quite well and no special work needs to be done to port tests to other web browsers. However, as the next section will demonstrate, writing tests often requires functionality that the sandbox cannot provide.
Test code should be able to suspend execution while waiting for network or other events
Even though JavaScript does not support threads, that does not mean that all logic will be executed synchronously. For example, an XMLHttpRequest is often configured to call a function asynchronously once it has finished loading data over the network.
Consider the following application code:
```javascript
/** Redraws a menu. Returns true if successful; otherwise, returns false. */
var updateMenu = function() {
  var menu = document.getElementById('menu');
  if (menu) {
    redrawMenu(menu);
  }
  return !!menu;
};

menuButton.addEventListener('click', function(e) {
  if (!updateMenu()) {
    // If the menu element was not available to be updated, then another
    // event handler in the current chain will add it to the DOM, so
    // defer updateMenu() until that happens.
    setTimeout(updateMenu, 0);
  }
}, false);
```

It is common to use setTimeout() to work around timing bugs in browser rendering. Subtleties like these are particularly important to test because it is so easy for them to regress. Ideally, the following would be a suitable test:
```javascript
assertFalse(isMenuUpdated());
menuButton.dispatchEvent(e /* a click event */);
assertTrue(isMenuUpdated());
```

If the browser being tested relies on the code path that uses setTimeout(), then this JavaScript test will likely fail because dispatchEvent() will synchronously call menuButton's event listener, which will schedule the timeout and return immediately. Then assertTrue(isMenuUpdated()) will be called, the assertion will fail, and now that the testing code's thread of execution has terminated, the callback scheduled by the timeout will run. Unfortunately, it is too late because the test has already failed.
Ideally, it would be possible to write the test as follows:
```javascript
assertFalse(isMenuUpdated());
menuButton.dispatchEvent(e /* a click event */);
sleep(10); // suspend execution for 10ms and yield to the browser's JS thread
assertTrue(isMenuUpdated());
```

This would temporarily give control back to the browser, giving the callback scheduled by the timeout a chance to run. When the test resumes, the assertion will succeed as expected.

Unfortunately, the existence of a sleep() function is antithetical to JavaScript's thread-free nature. If your JavaScript is running in the context of a trusted Firefox extension, there is support for working with threads from JavaScript. Chickenfoot uses this API to implement a sleep() command. I am not aware of equivalent solutions on other browsers.
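For the curious, a chrome-privileged sleep() along these lines can be sketched by spinning the current thread's event queue through XPCOM. This is my own reconstruction of the idea under the assumption that the nsIThreadManager API is available (i.e., trusted extension code), not Chickenfoot's actual source:

```javascript
// Sketch of a privileged sleep() for Firefox extension code; assumes
// access to the XPCOM nsIThreadManager service.
function sleep(milliseconds) {
  var threadManager = Components.classes['@mozilla.org/thread-manager;1']
      .getService(Components.interfaces.nsIThreadManager);
  var thread = threadManager.currentThread;
  var awake = false;
  setTimeout(function() { awake = true; }, milliseconds);
  while (!awake) {
    // Pump the event queue so timeouts, network callbacks, and repaints
    // can run while this script waits.
    thread.processNextEvent(true /* mayWait */);
  }
}
```

Because the event queue keeps pumping, callbacks scheduled by the application under test get a chance to run before the test resumes.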
Tests should be able to run on multiple browsers and on multiple platforms
Anyone who does web development knows that each web browser behaves differently -- just because a test passes on one browser does not mean that it is guaranteed to pass on the others. Considering that most large JavaScript codebases do some branching based on the browser in which they are running, it is impossible to get 100% code coverage if a test is only executed on one browser because only one of the browser-specific branches will be followed.

For these reasons, it is imperative that a web application test framework be able to run the test suite on any configuration in which the web application itself may be run (this includes mobile web browsers!). For frameworks that run inside the browser sandbox, like JsUnit, this is trivially achieved. But for frameworks that need to access the chrome of the browser, like Selenium 2, this can be a lot of work.
Tests should not hardcode XPaths, element ids, or CSS class names
Tests that know about XPaths, element ids, or CSS class names are not functional tests. Ideally, a functional test should continue to pass if a developer changes any of these things in such a way that does not alter the behavior of the application from the user's perspective. Hardcoding these values makes tests brittle and harder to maintain.

If, for whatever reason, you cannot avoid this, I recommend honoring the Don't Repeat Yourself (DRY) principle as explained in The Pragmatic Programmer. Instead of hardcoding an id across your tests, create a utility function available to all tests and give it a declarative name:
```javascript
function getSaveButtonElement() {
  // By giving this function a declarative name, it makes it easier to change
  // the implementation to use an XPath or other accessor, if need be, while
  // still maintaining the contract of the function.
  return document.getElementById('save-button');
}
```

Consistently using the accessor in your tests will make them more readable and less brittle. (Note that small design decisions like this are good for software development in general, not just tests.)
This style also helps make tests more reusable, which can be particularly useful if you are maintaining multiple UIs for the same application (commonly, one designed for desktop browsers and another optimized for mobile ones). As the functionality of these UIs will likely overlap, if designed correctly, the same functional tests can be used to test both interfaces.
For example, suppose getSaveButtonElement() were part of some sort of TestDriver object and all tests in a suite were written to use TestDriver.getSaveButtonElement(). It should be possible to inject the appropriate TestDriver object based on the environment (desktop versus mobile) so that a test written as follows would apply to both interfaces:

```javascript
assertFalse(isSaved());
var buttonEl = TestDriver.getSaveButtonElement();
click(buttonEl);
assertTrue(isSaved());
```

Ideally, test engineers who are not developers of the application under test should be able to write functional tests using only the API made available through a TestDriver like the one used in the above example.
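A minimal sketch of that injection follows; the driver objects, the element ids, and the crude user-agent check are all my own inventions for illustration:

```javascript
// Two hypothetical environment-specific drivers that satisfy the same
// contract, so every test can be written against either one.
var DesktopTestDriver = {
  getSaveButtonElement: function() {
    return document.getElementById('save-button');
  }
};

var MobileTestDriver = {
  getSaveButtonElement: function() {
    // The mobile UI might expose the same control under a different id.
    return document.getElementById('mini-save-button');
  }
};

// Pick the driver once at suite startup so individual tests never need
// to know which environment they are exercising.
var TestDriver =
    (typeof navigator !== 'undefined' && /Mobile/.test(navigator.userAgent))
        ? MobileTestDriver
        : DesktopTestDriver;
```

Because the choice is made in one place, adding a third UI later means writing one more driver object rather than touching every test.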
Test suites should be able to run via a cron job
If a test suite can be run via a cron job, then it means that all of the setup required to run the suite has been encapsulated in some sort of script. Anyone on the development team should be able to run that script so that the overhead of running tests is as low as possible.

But despite your best efforts, that overhead can never be quite low enough, which is why running the suite automatically via a cron job at regular intervals is the only way to be absolutely sure that tests will get run and catch errors. The policy for how often tests should be run and how to handle test failures when they happen will vary from team to team, but at least having the ability to run tests from cron will empower your team to experiment and make those decisions.
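For illustration, assuming the suite has been wrapped in a script (the path and log location here are hypothetical), the crontab entry could be as simple as:

```
# Hypothetical entry: run the functional test suite every night at 2 AM
# and append the results to a log that the team can inspect.
0 2 * * * /home/build/run_functional_tests.sh >> /var/log/functional-tests.log 2>&1
```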
More importantly, developers are more likely to write tests if they know that they will be run regularly (and their results will be published to other team members) because they can see the impact of their tests. When tests are not run regularly, developers may only write tests as a token gesture to claim that they tested their code before submitting it. Such tests do not get run regularly, go stale, and never pass again.
Functional testing using Chickenfoot
Chickenfoot is a Firefox extension I built for my Master's Thesis at MIT that enables end-user programmers to customize web pages and automate tasks they perform on the web. The research goal was to make it powerful yet accessible to end-user programmers, but whenever we presented it to people from industry, as soon as they heard the word "automation," they would always immediately ask if it could be used for testing.

At Google, I spent a lot of time developing a cross-browser click-to-edit solution for Google Tasks. Before Firefox 3 came out, Firefox lacked support for content-editable elements, so I made each task a DIV and would shimmy a textarea on top of it whenever it was clicked, putting the cursor in the appropriate place. (You can still observe this behavior on Firefox 2 today, though I would not be surprised if it gets dropped at some point. Google Wave did not even attempt to support Firefox 2, presumably because of its lack of support for content-editable elements.)
Without getting into the details, this was a complicated feature to implement, so coming up with a comprehensive regression test suite was critical. I tried to use Selenium, but discovered that it could not faithfully automate pressing the up and down arrow keys. That is, the key event would fire, triggering the appropriate JavaScript listeners, but because it was not a native event, the cursor did not actually move through the textarea. I could have used JavaScript to position the cursor myself, but the movement of the cursor was the very thing I was trying to test!
It became apparent that native input events were a necessity, so I used LiveConnect in conjunction with Java's java.awt.Robot. As mentioned in the first section, I could not get Robot to work reliably via an applet in all browsers, but it worked well in Firefox, and since that was the only place Chickenfoot was available, it was good enough for me.
At first, using Robot was somewhat comical because it actually took over your mouse and keyboard, so it was impossible to use either of them while the tests were running. My life turned into the "Compiling" xkcd comic because I would take a break to play Guitar Hero whenever I kicked off the Chickenfoot test suite. Months later, a colleague wrote a Python script to run the tests in VNC so I could use my machine to do other things while the tests were running. My Guitar Hero skills suffered dramatically.
Because we were using Chickenfoot, tests were written in JavaScript and could access objects in the page directly. Because the test code was executed in the privileged environment of Firefox's browser chrome, special functions such as sleep() as described in the third section were available.

We adopted the design pattern described at the end of section five, where the application would expose an object named TestDriver and the test code restricted itself to TestDriver's methods. While developing tests, it might not be clear what sorts of methods we would need to add to TestDriver, but using Chickenfoot as a REPL made it efficient to experiment with the page and to determine the best way to implement a new TestDriver method. The test suite evolved to include its own meta-language where TestDriver.setTasks('A > B C* _ D'); meant "clear the task list and create a task with two sub-tasks, skip a line, create a fourth task, and put the cursor at the end of the third task." The implementation of setTasks() fired off all of the appropriate input events to make that happen, so whatever the test did could easily be reproduced by a developer using a real keyboard and mouse.
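As a rough reconstruction: the real setTasks() drove native input events against the actual UI, whereas this sketch only parses the spec string into a task-list description, and the token semantics ('>' indenting what follows, '*' marking the cursor, '_' as a blank line) are my best reading of the example above:

```javascript
// Hypothetical parser for the task-list meta-language described above.
function parseTaskSpec(spec) {
  var tasks = [];
  var indent = 0;
  var cursorIndex = -1;
  spec.split(/\s+/).forEach(function(token) {
    if (token === '>') {
      indent++;  // subsequent tasks become sub-tasks
    } else if (token === '_') {
      tasks.push({blank: true});
      indent = 0;  // a blank line returns to the top level
    } else {
      var hasCursor = token.charAt(token.length - 1) === '*';
      var name = hasCursor ? token.slice(0, -1) : token;
      tasks.push({name: name, indent: indent});
      if (hasCursor) {
        cursorIndex = tasks.length - 1;
      }
    }
  });
  return {tasks: tasks, cursorIndex: cursorIndex};
}
```

For 'A > B C* _ D' this yields five entries (A; sub-tasks B and C; a blank line; D) with the cursor on the third task, matching the description above.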
As you may have guessed, the one important property that this system lacked was the ability to run tests on other browsers. For exactly this reason, a critical data-loss bug that existed only on Safari almost made it into production (fortunately, the QA team caught it in time). After talking to Simon Stewart, I started working on a parallel implementation of TestDriver in Java that could be injected if the JavaScript was evaluated using Rhino while running the test on WebDriver. Unfortunately, I did not complete that work before I left, so I don't know how well such a solution would have worked in practice. However, I still find the idea of being able to develop tests quickly using the REPL in Chickenfoot and then having them automatically work in other browsers particularly attractive.
How browser vendors can help
If I were a browser vendor, I would be thinking hard about how to make my browser the best one for doing functional testing. Think about what Firebug has done for Firefox. No matter how much faster Chrome gets, I'm still going to have to use Firefox at least some of the time because its web developer tools are so far superior to everyone else's. Even though the full Firebug extension is only available in Firefox (and therefore cannot help me debug IE-specific issues), it is still incredibly valuable to me. Similarly, if Firefox had an extension that was exceptionally better than all other functional testing tools, I would use it even though it would not help me with cross-browser testing. Ideally, it would challenge the other browser vendors to make similar offerings so that ultimately using the best tool would not be at odds with testing multiple browsers.

In terms of building a tool that satisfies all of my properties, Firefox is likely in the best position because it already uses JavaScript as its scripting language, so creating JavaScript bindings for its test tool should be straightforward. As demonstrated by Chickenfoot, it already exposes appropriate APIs for suspending execution while other events, such as network activity, continue.
Although it appears that LiveConnect will continue to be supported, it would be better if a test framework did not require the Java plugin, so supporting an API equivalent to Robot's natively would be superior. Ideally the API would funnel events into the native queue without taking over the user's mouse and keyboard, avoiding the issue I ran into that could only be solved with VNC.
Finally, it would be great if the browser could be kicked off from the command line in some sort of headless mode that would just run the test and print the results to standard out:
```
./firefox --test test_file.js --output=json -Xusername=foo -Xpassword=bar
```

This would start Firefox with a clean profile and run test_file.js in the privileged JavaScript context that would have access to all of the testing utilities and other browser chrome. Command-line arguments specified by -X would be available to the test as well, so things like usernames and passwords would not have to be hardcoded into the tests.

Ideally, it would also be possible to run a test that can manage multiple instances of Firefox with different profiles so that applications such as real-time chat can also be tested. (Multiple profiles are needed for applications such as Gmail that only enable one user to be logged in per browser because of how cookies are used.)
The list of feature requirements for such a tool may be endless, but if Mozilla put the basics in place (built-in support for injecting native events and running the browser headless with a clean profile), it is likely that third-parties would rally around it to build testing tools on top of it. Arguably by providing an API for a scriptable debugger, Firefox (intentionally or not) positioned itself as the most attractive platform for a tool like Firebug. Just like that dead guy in Field of Dreams said, "If you build it, he will come."
Despite the perceived dominance of Selenium in the area of functional web testing, I think the space is still wide open and that browser vendors are in the best position to move things forward. By building a framework with the properties I describe, developers will be able to emulate the user experience more precisely, ultimately enabling them to write better tests and deliver higher-quality web applications. It is in the vendors' interest to become the optimal platform for testing because the browser with the best tools is likely to be the browser on which web applications are tested the most. This in turn increases the perception that the browser provides the best overall user experience because the majority of applications will have been tested (and are therefore guaranteed to work) with that browser.