Introduction to the Document Object Model and Selenium

The Inquisitive Dev
4 min read · Nov 13, 2020

In a previous post I covered how to use Selenium to automate the testing of a web user interface. Selenium achieves this by interacting with the HTML Document Object Model (DOM). This post will look at how the DOM can be used in automated Selenium tests to locate and interact with elements on a web page.

This article was originally published over at The Inquisitive Dev.

What is the Document Object Model?

The Document Object Model (DOM) is a standard, originally defined by the W3C (World Wide Web Consortium) and now maintained by the WHATWG. It defines a programming interface for accessing and manipulating the elements of a web page. The standard defines:

  • HTML elements as objects
  • the properties of elements
  • the methods to access elements
  • the events for elements

When a web page loads, the browser creates the HTML DOM. Along with the element methods, properties and events mentioned above, the model consists of a hierarchy of objects (aka the DOM tree), as shown in the diagram below:

[Diagram: the DOM tree]
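The hierarchy can also be printed from code. The sketch below is only an illustration: it uses the org.w3c.dom API that ships with the JDK rather than a real browser, and the sample markup is a hypothetical page assembled from the snippets used later in this post. It is written as well-formed XHTML because the JDK parser is an XML parser and is stricter than a browser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomTreeSketch {

    // Hypothetical sample page (an assumption for this sketch, not a real site).
    static final String PAGE =
        "<html><head><title>Demo</title></head>"
        + "<body><h1 id=\"title\">This is the page title</h1>"
        + "<p class=\"description\">blah blah</p></body></html>";

    // Parse the markup into a Document, the root object of the DOM tree.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    // Walk the tree depth-first and append one indented line per element node.
    static void describe(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append("  ".repeat(depth)).append(node.getNodeName()).append('\n');
            depth++;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            describe(children.item(i), depth, out);
        }
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        describe(parse(PAGE), 0, out);
        System.out.print(out);
    }
}
```

Running main prints html at the top level, head and body indented beneath it, and title, h1 and p one level further in, which is the parent-child nesting the DOM tree describes.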

Selenium WebDriver

The role of Selenium WebDriver is two-way communication with the browser DOM: WebDriver passes commands to the browser and receives information back. This is done via a browser-specific driver (e.g. ChromeDriver).

There are different versions (aka bindings) of WebDriver depending on what language you intend to use to interact with the DOM. There are bindings for Java, JavaScript, Ruby, C# and Python.

WebDriver represents elements in the DOM as WebElement objects, which are located using the findElement() method. This method takes a locator as its argument and returns the first element in the DOM that matches it. If no match is found, a NoSuchElementException is thrown; the related findElements() method returns an empty list instead. Once located and returned, a WebElement exposes methods that can be called programmatically to interact with the element and to read its attributes.
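In the Java bindings, a failed findElement() call throws NoSuchElementException, while findElements() simply returns an empty list. The sketch below shows one way a test might handle the "not found" case. It assumes the Selenium Java library is on the classpath; the class and helper names (ElementLookup, isPresent, findOptional) are made up for illustration and are not part of Selenium.

```java
import java.util.List;
import java.util.Optional;
import org.openqa.selenium.By;
import org.openqa.selenium.NoSuchElementException;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class ElementLookup {

    // findElements returns an empty list when nothing matches,
    // so presence can be checked without any exception handling.
    static boolean isPresent(WebDriver driver, By locator) {
        List<WebElement> matches = driver.findElements(locator);
        return !matches.isEmpty();
    }

    // findElement throws NoSuchElementException when nothing matches;
    // this hypothetical helper turns that into an empty Optional.
    static Optional<WebElement> findOptional(WebDriver driver, By locator) {
        try {
            return Optional.of(driver.findElement(locator));
        } catch (NoSuchElementException e) {
            return Optional.empty();
        }
    }
}
```

A test could then call ElementLookup.isPresent(driver, By.id("title")) before interacting with the element, where driver is an already-initialised WebDriver.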

Finding DOM elements

Before you start writing any code you should be thinking about how you are going to test what you build. This involves making smart decisions about how to locate elements in the DOM. Doing so will make your tests robust and reliable, not to mention more readable. There are a number of ways to locate elements on a web page. Below are some of the most commonly used.

Finding elements by ID

This should be your first choice. An ID is meant to be unique within a page, so locating an element by its ID is unambiguous as well as fast. To avoid ambiguity in your tests, always make sure element IDs are unique in the DOM.

<h1 id="title">This is the page title</h1>

WebElement pageTitle = driver.findElement(By.id("title"));

Finding elements by tag name

You can also locate elements by tag name. The example below will locate all <p> elements in the DOM and store them in a List object.
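A minimal sketch of that lookup, assuming the Selenium Java library is on the classpath and that the caller passes in an already-initialised WebDriver, like the driver variable in the other snippets. The class and method names are illustrative only; findElements() and By.tagName() are part of the Selenium Java bindings.

```java
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;

public class TagNameLookup {

    // Collect every <p> element on the page the driver currently has loaded,
    // e.g. <p>first paragraph</p> and <p>second paragraph</p>.
    // The list is empty if the page contains no <p> elements.
    static List<WebElement> findParagraphs(WebDriver driver) {
        return driver.findElements(By.tagName("p"));
    }
}
```

Note the plural findElements(): it returns every match as a List, whereas findElement() returns only the first.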

Finding elements by class name

You can also select elements by the class name assigned to them. The example below shows a paragraph element being targeted by its class attribute “description”.

<p class="description">blah blah</p>

WebElement desc = driver.findElement(By.className("description"));

Clicking on elements

Once an element has been successfully located, your tests can also click on it. By chaining together the locating and clicking of elements, you can simulate user journeys through your application or site.

To click on an element, use the click() method.

<button id="submit">Submit</button>

driver.findElement(By.id("submit")).click();

Entering text into fields

Another action that helps to simulate realistic user journeys is typing text into input boxes. Take, for example, a login form:

<form action="/action_page.php">
<label for="user">username:</label><br>
<input type="text" id="user" name="user" value="">
<br>
<label for="psword">password:</label><br>
<input type="password" id="psword" name="psword" value="">
<br><br>
<input id="submit" type="submit" value="Submit">
</form>

// Type into each field, then submit the form.
// sendKeys() appends to any text already in a field, so the inputs start empty.
driver.findElement(By.id("user")).sendKeys("theInquisitiveDev");
driver.findElement(By.id("psword")).sendKeys("password123");
driver.findElement(By.id("submit")).click();

As before, we use the findElement() method to locate each input field. The sendKeys() method then enters text into the field programmatically. Finally, we submit the form by calling click() on the submit element.

Conclusion

This was a quick introduction to what the Document Object Model is and how it relates to automated testing with Selenium. There is a lot more you can do with DOM manipulation, but that is outside the scope of this post. For more information, the Selenium documentation is a great place to start.

If you found this article useful, please consider subscribing to my blog.

Thanks for reading
