PhantomJS as an HTML Validator

Please, cite this blog post via BibTeX as such:

@misc{bugayenko2014blog0406,
  author = {Bugayenko, Yegor},
  title = {{PhantomJS as an HTML Validator}},
  howpublished = {\url{https://www.yegor256.com/140406b.html}},
  year = {2014},
  month = {apr},
  note = {[Online; accessed 16-11-2025]}
}

I created phandom a few months ago, but yesterday finally found the time to make some needed changes to it. So, now is a good time to explain how I’m using Phandom in some of my unit tests.

Before I get started, though, I should say a few words about phantomjs, which is a JavaScript interface for WebKit. WebKit, on the other hand, is a web browser without a user interface. WebKit is a C++ library that enables manipulation of HTML content, through DOM calls. For example, this is a simple JavaScript located code in example.js:

var page = require('webpage').create();
page.open(
  'http://google.com',
  function() {
    console.log('loaded!');
    phantom.exit(0);
  }
);

We run phantomjs from the command line with the following code:

$ phantomjs example.js

PhantomJS creates a page object (provided by webpage module inside phantomjs), and then asks it to open() a Web page. The object communicates with WebKit and converts this call into DOM instructions. After which, the page loads. The PhantomJS engine then terminates on line 6.

WebKit renders a web page with all necessary components such as CSS, JavaScript, ActionScript, etc, just as any standard Web browser would.

So far so good, and this is the traditional way of using PhantomJS. Now, on to giving you an idea of how Phandom (which stands for “PhantomJS DOM”) works inside Java unit tests:

To test this, let’s give phantomjs an HTML page and ask him to render it. When the page is ready, we’ll ask phantomjs to show us how this HTML looks in WebKit. If we see the elements we need and desire,—we’re good to go. Let’s use the following example:

import com.rexsl.test.XhtmlMatchers;
import org.hamcrest.MatcherAssert;
import org.phandom.Phandom;
public class DocumentTest {
  @Test
  public void rendersValidHtml() {
    Document doc = new Document();
    // This is the method we're testing. It is supposed
    // to return a valid HTML without broken JavaScript
    // and with all required HTML elements.
    String html = doc.html();
    MatcherAssert.assertThat(
      XhtmlMatchers.xhtml(new Phandom(html).dom()),
      XhtmlMatchers.hasXPath("//p[.='Hello, world!']")
    );
  }
}

When we use the above code, here is what happens. First, we get HTML html as a String from doc object, and then pass it to Phandom as an argument. Then, on line 13, we call the Phandom.dom() method to get an instance of the class org.w3c.dom.Document.

If our HTML contains any broken JavaScript code, method dom() produces a runtime exception and the unit test fail. If HTML is clean and WebKit is able to render it without problems, the test passes.

I’m using this mechanism in a few different projects,and it works quite well. Therefore, I highly recommend it.

Of course, you shouldn’t forget that you must have phantomjs installed on your build machine. In order to avoid unit test failures when phantomjs is not available or present, I’ve created the following supplementary method:

public class DocumentTest {
  @Test
  public void rendersValidHtml() {
    Assume.assumeTrue(Phandom.installed());
    // the rest of the unit test method body...
  }
}

Enjoy and feel free to report any bugs or problems you encounter to: GitHub issues