Jan. 17th, 2017

How I Learned to Stop Worrying about LoadRunner and Start Loving JMeter

This blog post details the experience of an engineer who switched from LoadRunner to JMeter. For personal and professional reasons, he asked to remain anonymous. This is his story.

 

Once upon a time, in a galaxy far, far away, I worked as a performance testing engineer. The company I worked at had an HP LoadRunner license, so when it came to load testing, the choice was obvious: we always used LoadRunner.

 

LoadRunner used to work for basic use cases, like load testing web applications followed by regression tests to ensure that new features and bug fixes didn't have a negative impact on overall system performance. It also worked for "exotic" test scenarios, like load testing complex algorithm implementations in a variety of programming languages, less common transports like protocol buffers, RTP and VoIP, and even custom in-house developed protocols. Until a very simple project changed everything.

 

One day, I was told to assess the performance of an API endpoint. This was not rocket science: just a simple SOAP API that took specially formed requests and returned responses. The service had already been tested and worked fine; the question was how it would handle increased data volumes.

 

What I was testing was something similar to this:

 

Think about a web service that provides weather information for all the towns in the United Kingdom. For the ~50,000 places there that have "town" status, it works like a charm and returns information in less than 5 seconds. But the next release requires adding support for the rest of the world, ~3,000,000 towns. That is 60x more data, and the stakeholders need to ensure that the increased volume will still be processed in less than 5 seconds to meet the SLAs.

 

The endpoint could be queried for weather information in bulk (i.e. several thousand towns per request) and in parallel. The questions were:

1. Is the web service capable of returning weather information for 3 million towns in less than 5 seconds? This was the most important question.

2. If yes, what is the best-performing configuration (towns per request and number of concurrent requests) with the smallest memory/CPU footprint?

3. If no, how many towns can be processed in 5 seconds, how long will it take to process all 3 million and what is the bottleneck?

 

So I got the WSDL for the web service, launched VuGen and started working on the test script.

 

Challenge 1: Generating a GUID

 

First of all, each message had to contain a unique identifier, a GUID, as its message ID. It turned out that LoadRunner didn't offer a way to generate a GUID, either via Parameters or through its API. The Load Generators used in the company were all running Windows, so I was able to call the CoCreateGuid function from ole32.dll. The relevant code is as "simple" as:

 

int GenerateGUID()
{
    // Local definition of the Windows GUID structure, since
    // VuGen scripts don't include the Windows headers
    typedef struct _GUID
    {
        unsigned long  Group1;
        unsigned short Group2;
        unsigned short Group3;
        unsigned char  Group4[8];
    } GUID;

    GUID m_guid;
    char msgId[37]; // 36 characters for the GUID string plus the terminating null

    // Load the DLL that exports CoCreateGuid
    lr_load_dll("ole32.dll");

    CoCreateGuid(&m_guid);

    // Format the GUID as the canonical 8-4-4-4-12 hex string
    sprintf(msgId, "%08lx-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        m_guid.Group1, m_guid.Group2, m_guid.Group3,
        m_guid.Group4[0], m_guid.Group4[1], m_guid.Group4[2], m_guid.Group4[3],
        m_guid.Group4[4], m_guid.Group4[5], m_guid.Group4[6], m_guid.Group4[7]);

    // Expose the result as the LoadRunner parameter {msgId}
    lr_save_string(msgId, "msgId");

    return 0;
}

 

Challenge 2: Generating SOAP Payload

 

Another surprise was learning that string concatenation can be very tricky within vuser_init, as the majority of LoadRunner functions don't release the memory they allocate until the end of the iteration, and there is no way to change this behavior.

 

So if you are using, for example, the lr_eval_string function, you should know that the memory it uses is freed only at the beginning of the next iteration. This limits you to approximately 1 MB per thread (on Windows), and you won't be able to increase the stack size via the #pragma comment(linker, "/STACK:xxx") compiler option.

 

It also turned out that the lr_param_sprintf function has an internal limit of 32k characters, so if you try to build up a character array exceeding 32 kilobytes, you'll get a memory violation error. The limit is not documented anywhere, and the error message is not informative enough to guess where the problem lies, so I used vanilla C functions where possible to avoid further surprises.

 

Finally, I ended up with this monster (a gentle reminder that this is nothing more than string concatenation):

 

// Builds the {towns} parameter by appending {newTown} to it on every
// pass (i and townsPerRequest are declared earlier in the script).
// Plain malloc/sprintf is used instead of lr_param_sprintf to avoid
// its undocumented 32k limit.
for (i = 1; i <= townsPerRequest; i++)
{
    unsigned long oldTownsLen, newTownsLen;
    char *oldTowns, *newTowns;
    size_t needed;
    char *allocated;

    // Fetch the current values of the {towns} and {newTown} parameters
    lr_eval_string_ext("{towns}", 7, &oldTowns, &oldTownsLen, 0, 0, -1);
    lr_eval_string_ext("{newTown}", 9, &newTowns, &newTownsLen, 0, 0, -1);

    // Measure the concatenated length, then allocate just enough memory
    needed = snprintf(NULL, 0, "%s%s", oldTowns, newTowns);
    allocated = (char*)malloc(needed + 1);

    // Concatenate and save the result back into the {towns} parameter
    sprintf(allocated, "%s%s", oldTowns, newTowns);
    lr_save_string(allocated, "towns");

    // Release both the manually allocated buffer and LoadRunner's buffers
    free(allocated);
    lr_eval_string_ext_free(&oldTowns);
    lr_eval_string_ext_free(&newTowns);
}

 

Challenge 3: Test Execution

 

Finally, I had a proof of concept for the two things I needed. One, it was possible to construct and execute a request for an arbitrary number of towns from all over the world. Two, the result was the same as we would have gotten if the request had originated from production hosts.

 

I configured the test execution in the Performance Center to match the current production setup. This meant that, given a similar number of threads and towns per request, I should receive the same result: weather information for 50,000 towns in less than 5 seconds.

 

I kicked off the test, grabbed some tea while the results were being collated (the Performance Center is pretty slow, even for 1 VU with 1 iteration) and realized that it had taken LoadRunner more than 10 seconds to get the weather information for 50,000 towns. A quick glance at the Load Generator health graphs revealed that CPU and RAM were maxed out at 100% from the very beginning of the test until the end. The deadline was coming closer, and I still had only one half-working solution, which could obtain information for a maximum of 30,000 towns in 5 seconds using one Load Generator.

 

JMeter Evaluation

 

I remembered doing something similar in the past with Apache JMeter, and decided to give it a try to see whether it would fare any better.

 

I recorded a couple of requests on the http://blazedemo.com website (basically just opening the site and clicking the "Find Flights" button) with both JMeter and LoadRunner. Then I configured both tests to run 50 users for 1 minute, without ramp-up and ramp-down, and started them on a brand new Microsoft Azure Standard DS2 v2 (2 cores, 7 GB memory) machine.

 

Here is what I got:

 

LoadRunner Summary Results

 

JMeter Summary Results (plotted from .jtl results file using BlazeMeter Sense)

 

With both tools running 50 virtual users configured to send requests as fast as they could, LoadRunner managed to process 4,742 transactions, while JMeter ended up with 12,356.

 

LoadRunner Transactions per Second

 

JMeter Transactions per Second

 

As can be seen from the graphs above, JMeter's maximum throughput was 272 transactions per second, while LoadRunner's upper limit was just 90 TPS.

 

For a basic HTTP scenario, on the same hardware and with the default JMeter and LoadRunner configurations, JMeter turned out to be more than 2.5x faster. Not only that, but as request size grows, JMeter handles it more smoothly. So when you have huge requests and limited hardware resources, it makes more sense to use JMeter.

 

Given JMeter’s tuned configuration for increased heap, stack and setting up proper garbage collector options, it was possible to reach about 10x throughput, compared to what LoadRunner was able to generate.
Conclusion

 

Given that JMeter is faster than LoadRunner for the HTTP protocol, especially when it comes to sending HTTP requests that contain a lot of data, and given how much simpler the test scenarios are to implement, JMeter was definitely the more feasible choice for SOAP service load testing.

 

Just to compare:

- Generating a GUID in JMeter is as simple as calling the ${__UUID} function

- String concatenation in Groovy using GString templates is also trivial and readable, as the snippets below show:

 

def stringA = "foo"
def stringB = "bar"

def stringC = "$stringA$stringB"
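
Putting the two together, here is a minimal Groovy sketch of how the whole payload-building challenge could look inside a JMeter JSR223 PreProcessor, using the standard vars binding. The townNames list and the towns/msgId variable names are illustrative assumptions, not the original script:

// A minimal sketch with assumed names, not the original script:
// townNames stands in for whatever source supplies the town data.
def townNames = ["London", "Leeds", "York"]

// GString templates make the concatenation trivial
def towns = townNames.collect { name -> "<town>$name</town>" }.join()

// Save the payload and a fresh GUID into JMeter variables, ready to be
// referenced from the SOAP request body via JMeter's variable syntax
vars.put("towns", towns)
vars.put("msgId", UUID.randomUUID().toString())

The GUID could just as easily come from the ${__UUID} function directly in the sampler; either way, both C workarounds shrink to a few readable lines.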

 

I was finally able to mimic obtaining weather information for 3 million towns using Apache JMeter, and I learned that free open source solutions can not only be a feasible alternative, but may also work better and provide more human-friendly interfaces.

 

So don’t limit yourselves to something that used to work. There are no guarantees that you are using the best available option. See the Testing SOAP/REST Web Services Using JMeter article to learn more about web services load testing using Apache JMeter.

 

Learn more about switching from LoadRunner to JMeter from this demo.

 

Switch your LoadRunner scripts to open-source JMeter and Selenium in minutes with our free online script converter. Learn more about how to switch here.

Interested in writing for our Blog? Send us a pitch!