Dmitri Tikhanski is a Contributing Writer to the BlazeMeter blog.

Nov 16 2016

How to Spider a Site with JMeter - A Tutorial

When it comes to building a web application load test, you might want to simulate a group of users “crawling” the website and randomly clicking the links. Personally, I don’t like this form of testing, as I believe a load test needs to be repeatable, so each consecutive test hits the same endpoints and achieves the same throughput as the previous one. If the entry criteria are different, it’s harder to analyze the load test results.


But sometimes this rule isn’t applicable, especially for dynamic websites like blogs, news portals, social networks, etc., where new content is being added very often or even in real time. This form of testing ensures the user will get a smooth browsing experience and also checks for broken links or any unexpected errors.


This article covers the three most common approaches to simulating website “crawling”: clicking all the links found on a web page, using the HTML Link Parser, and building an advanced spidering test plan.


1. Clicking All Links Found in the Web Page


The process of getting the links using the Regular Expression Extractor is described in the Using Regular Expressions in JMeter article. The algorithm is as simple as this:


1. Extract all the links from the response with the Regular Expression Extractor and store them in JMeter Variables. The relevant regular expression would be something like:


<a[^>]* href="([^"]*)"


Don’t forget to set “Match No.” to -1 to extract all links. If you leave it blank, only the first match will be returned.


[Screenshot: Regular Expression Extractor configuration in JMeter]


2. Use a ForEach Controller to iterate over the extracted links


3. Use an HTTP Request sampler to hit the URL stored in the ForEach Controller’s “Output Variable Name” variable
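To make the extraction step concrete, here is a minimal sketch in plain JavaScript (the language JMeter’s __javaScript() function runs) of what the Regular Expression Extractor does with “Match No.” set to -1. The sample HTML and variable names are illustrative, not part of the original test plan:

```javascript
// Made-up sample response body for demonstration
var html =
  '<a class="nav" href="/home">Home</a>' +
  '<a href="/blog">Blog</a>' +
  '<a href="https://example.com/ext">External</a>';

// Same pattern as in the article; the "g" flag emulates Match No. -1
var pattern = /<a[^>]* href="([^"]*)"/g;

var links = [];
var match;
while ((match = pattern.exec(html)) !== null) {
  links.push(match[1]); // capture group 1 holds the URL, as in the extractor
}
// links now holds every href, analogous to the JMeter variables
// link_1, link_2, link_3 that the ForEach Controller iterates over
```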




[Screenshot: demo of the Regular Expression Extractor approach]



Pros:

● Simplicity of configuration

● Stability

● Failover and resilience



Cons:

● Regular expressions are hard to develop and sensitive to markup changes, hence fragile

● Not actually a “crawler” or “spider” - just consecutive requests to the extracted links


2. Using HTML Link Parser


JMeter provides a special Test Element - the HTML Link Parser. This element is designed to extract HTML links and forms and substitute the extracted values into the relevant fields of matching HTTP Request samplers. Therefore, the HTML Link Parser can be used to simulate crawling a website with minimal configuration. Here’s how:


1. Put the HTML Link Parser under a Logic Controller (e.g. a Simple Controller if you just need a “container”, or a While Controller to set a custom condition like a maximum number of hits)


2. Put the HTTP Request sampler under the Logic Controller and configure the Server Name or IP and Path fields to limit the extracted values to an “interesting” scope. You probably want to focus on the domain(s) belonging to the application under test rather than crawling the wider Internet: if your application has any link leading to an external resource, JMeter will follow it outside. Perl5-style regular expressions can be used to scope the extracted links.


[Screenshot: HTML Link Parser configuration in JMeter]




[Screenshot: demo of the HTML Link Parser approach]



Pros:

● Easy to configure and use

● Acts like a “spider”



Cons:

● Zero error tolerance: any failure to extract links from a response will cause cascading failures in subsequent requests


3. Advanced “Spidering” Test Plan


Given the limitations of the above approaches, you might want a solution that won’t break on errors and will crawl the whole application under test. Below is a reference Test Plan outline which can be used as a skeleton for your “spider”:

● Open the main page

○ Extract all the links from it

○ Click on a random link

○ If the response has a “good” MIME type, extract all the links from it (if the link leads to an image, a PDF document or similar, link extraction is skipped)


[Screenshot: advanced “spidering” Test Plan structure in JMeter]


Explanation of the used elements:

While Controller: to set the maximum number of requests so the test won’t run forever. If you limit the test duration via the scheduler, you can skip it

Once Only Controller: to execute a call to the main page only once

XPath Extractor: to filter out URLs that don’t belong to the application under test, as well as other uninteresting links like “mailto”, “callto”, etc. An example XPath query looks like:


//a[starts-with(@href,'/') or starts-with(@href,'.') or contains(@href,'${SITE}') and not(contains(@href,'mailto'))]/@href


Using XPath is not a must, and in some cases it can be very memory-intensive, so you may need to consider other ways of fetching the links from the response. It is used here for demonstration purposes, as XPath queries are normally more human-readable than CSS/JQuery selectors and especially regular expressions.
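For illustration, the scoping logic of that XPath query can also be expressed as a plain JavaScript predicate. Here SITE stands in for the ${SITE} JMeter variable, and applying the “mailto” exclusion to every branch is an assumption on my part, not something the query above guarantees:

```javascript
// Assumption: the domain of the application under test
var SITE = "example.com";

// Mirrors the XPath condition: keep links that are relative to the
// root ("/..."), relative to the page ("./..."), or same-site
// absolute URLs - and drop mailto links in all cases
function isInterestingLink(href) {
  if (href.indexOf("mailto") !== -1) return false; // skip mailto links
  return href.indexOf("/") === 0 ||    // relative to site root
         href.indexOf(".") === 0 ||    // relative to current page
         href.indexOf(SITE) !== -1;    // absolute link to our own site
}
```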


__javaScript() function: actually does three things:

a. Chooses a random link from those extracted by the XPath Extractor

b. Removes the “../” bit from the beginning of URLs

c. Sets the HTTP Request title to the current random URL
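Those three steps can be sketched as a standalone JavaScript function. The function name and shape are illustrative - in the test plan this logic lives inside a __javaScript() expression, and step c happens by writing the result into the variable the HTTP Request sampler uses as its name:

```javascript
function pickNextUrl(links) {
  // a. choose a random link from those the XPath Extractor found
  var url = links[Math.floor(Math.random() * links.length)];

  // b. drop a leading "../" so the path resolves from the site root
  if (url.indexOf("../") === 0) {
    url = url.substring(3);
  }

  // c. in JMeter, the returned value is also used as the HTTP
  //    Request title, so each sampler is named after the URL it hits
  return url;
}
```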

Regular Expression Extractor: to extract the Content-Type header from the response

If Controller: the next round of extracting links from the response will start only if the response has a matching content type
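The If Controller’s check can be sketched as a predicate like the one below. In the actual plan the condition would be an inline expression over the extracted variable (something along the lines of "${CONTENT_TYPE}".indexOf(...)); the standalone function form here is for illustration only:

```javascript
// Decide whether the next round of link extraction should run,
// based on the Content-Type value pulled from the response headers
// by the Regular Expression Extractor
function shouldExtractLinks(contentType) {
  // only HTML responses are "good" for further link extraction;
  // images, PDFs and the like are skipped
  return typeof contentType === "string" &&
         contentType.indexOf("text/html") !== -1;
}
```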




[Screenshot: demo of the advanced “spidering” Test Plan]



Pros:

● Extreme flexibility and configurability



Cons:

● Complexity


If you want to try the above scripts yourself, they are available in the jmeter-spider repository. As usual, feel free to use the discussion box below to share any feedback.

