Before you start automating with any tool, it is very important to know how that tool works and how it is architected. This helps you take full advantage of the tool and, at the same time, build the right automation framework.
In my further posts, I will explain in detail how to use Selenium and how to create a Selenium framework, but before that let’s get an overview of the Selenium WebDriver architecture. Selenium can be a little bit confusing. As a beginner, you will find it simple to record and play back Selenium scripts, but it is not obvious how Selenium does that. At first glance, it might appear that Selenium drives the browser directly from your code, but there’s a little more going on, and looking at the basic architecture also helps us understand how tests can be executed remotely. So here’s a picture of the architecture for Selenium WebDriver, the current version of Selenium, which is what I am going to explain here.
The Selenium WebDriver architecture is mainly divided into three parts:
- Language level bindings
- Selenium WebDriver API
- Drivers
1) Language Level Bindings:
On the left-hand side you can see the language level bindings, with which you write your Selenium WebDriver code. In simple words, these are the languages in which you build your framework; code written in them interacts with Selenium WebDriver and runs against various browsers and other devices. Selenium has a common API with a common set of commands, and there are bindings to it for the different languages. You can see Java, Python, and Ruby; there are also some other bindings, and new ones can be added quite easily.
2) Selenium WebDriver API:
These bindings communicate with the Selenium WebDriver API. The API takes the commands issued through the language level bindings, interprets them, and sends them to the respective driver. Don’t worry about how that works right now; I will explain it in upcoming posts. In basic terms, it is a common library that lets you send commands to the respective drivers.
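To make the idea of a "common set of commands" a bit more concrete, here is a minimal sketch of how a binding might translate a command into an HTTP request for a driver. The paths follow endpoints I know from the W3C WebDriver specification, but the `COMMANDS` table, the command names, and `build_request` are my own illustrative names, not part of any Selenium binding.

```python
import json

# Illustrative subset of WebDriver endpoints: command name -> (HTTP method, path template).
# The command names are hypothetical labels chosen for this sketch.
COMMANDS = {
    "new_session": ("POST", "/session"),
    "navigate_to": ("POST", "/session/{session_id}/url"),
    "get_title": ("GET", "/session/{session_id}/title"),
    "delete_session": ("DELETE", "/session/{session_id}"),
}


def build_request(command, session_id=None, params=None):
    """Translate a binding-level command into (method, path, JSON body)."""
    method, template = COMMANDS[command]
    path = template.format(session_id=session_id)
    # In this sketch only POST requests carry a JSON body with the parameters.
    body = json.dumps(params or {}) if method == "POST" else None
    return method, path, body


if __name__ == "__main__":
    print(build_request("navigate_to", "abc123", {"url": "https://example.com"}))
```

Whatever language the test is written in, the binding ends up producing the same kind of request, which is why one driver can serve all of them.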
3) Drivers:
On the right-hand side you see the browser-specific drivers, such as the IE, Firefox, and Chrome drivers, and other drivers such as HtmlUnit, which is an interesting one: it works in headless mode, which makes test execution faster. There are mobile-specific drivers as well. The basic idea is that each of these drivers knows how to drive the browser it corresponds to. So the Chrome driver knows how to handle the low-level details of the Chrome browser and drive it to do things like clicking a button, navigating to pages, and getting data from the browser itself. The same goes for Firefox, IE, and so on.
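The "each driver knows its own browser" idea can be pictured as one interface with several implementations. This is only a toy sketch: `BrowserDriver`, `ToyChromeDriver`, `ToyFirefoxDriver`, and `run_click` are hypothetical names, and the real drivers are separate programs, not Python classes.

```python
from abc import ABC, abstractmethod


class BrowserDriver(ABC):
    """Hypothetical common interface that every browser-specific driver implements."""

    @abstractmethod
    def click(self, element_id):
        ...


class ToyChromeDriver(BrowserDriver):
    def click(self, element_id):
        # A real driver would handle Chrome's low-level details here.
        return f"chrome: clicked {element_id}"


class ToyFirefoxDriver(BrowserDriver):
    def click(self, element_id):
        # Same command, different browser-specific handling.
        return f"firefox: clicked {element_id}"


def run_click(driver, element_id):
    # The calling code is identical regardless of which browser is behind it.
    return driver.click(element_id)
```

Swapping `ToyChromeDriver()` for `ToyFirefoxDriver()` changes the browser without changing the calling code.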
How do all the blocks work together?
So what happens is this: you write your test in, let’s say, Java, using the common Selenium API, and the Java binding sends commands across the common WebDriver API. On the other end, a driver is listening. It interprets those commands, executes them on the actual browser, and then returns the result back through the WebDriver API to your code, where you can look at that result.
Let’s take a closer look at how exactly that works.
Let’s say you have written a test in Java (the binding code) against the Selenium API. That binding code issues commands across the WebDriver wire protocol to a REST-based web service that is able to interpret those commands. Each driver has a driver server, which is just a small executable that listens on a port on your local machine when you run your tests and waits for these commands to come in. When a command comes in, the driver server interprets it, automates the browser, and then returns the result back.
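The round trip described above can be imitated with nothing but the Python standard library. This is a rough sketch, not how any real driver is implemented: `ToyDriverHandler` stands in for a driver server and only remembers a URL instead of automating a browser, `send_command` plays the role of the binding, and the session id `toy` is made up.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyDriverHandler(BaseHTTPRequestHandler):
    """Stand-in for a driver server: interprets commands and returns JSON results."""

    state = {"url": None}  # pretend browser state, shared across requests

    def _reply(self, value):
        payload = json.dumps({"value": value}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length) or b"{}")
        if self.path.endswith("/url"):
            # A real driver would navigate the browser here.
            self.state["url"] = params["url"]
            self._reply(None)
        else:
            self.send_error(404)

    def do_GET(self):
        if self.path.endswith("/url"):
            self._reply(self.state["url"])
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging


def send_command(port, method, path, params=None):
    """What a binding does: send one HTTP request and decode the JSON reply."""
    body = json.dumps(params) if params is not None else None
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, path, body=body,
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())["value"]
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), ToyDriverHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        send_command(port, "POST", "/session/toy/url", {"url": "https://example.com"})
        return send_command(port, "GET", "/session/toy/url")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the toy server on a free local port, sends a "navigate" command followed by a "get current URL" command, and returns the URL the server reported back. Because the binding and the driver only share HTTP and JSON, the driver server could just as well run on another machine, which is what makes remote execution possible.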
I hope this gives you a clear idea of how Selenium WebDriver is architected.