Load Testing Best Practices
What is Load Testing?
Load testing checks how a system behaves when a large number of concurrent virtual users perform transactions over a set period of time. In other words, it tests how the system handles heavy load volumes. Several open-source load testing tools are available, with JMeter being the most popular one.
Here I will go through the most important best practices for load testing your site, including for everyday traffic, and for peak traffic events.
Test Early and Test Often
- Plan ahead to fix failures before peak load is expected
It’s never too early to start load testing, but to ensure you have time to test all scenarios, it’s best to allow at least 90 days. This way you’re not just checking a box to say “I did it,” but you’re actually allowing time to fix bugs and bottlenecks and even rerun the heavy load test.
- Make sure you are testing the right things
Building a checklist of the KPIs you want to test will make sure you only test what you need and will allow you to create a realistic load scenario. There’s no point checking for a million users if your site will never get hit by more than 10,000. Similarly, make sure your site performs well enough that it is easy to use for your audience.
- Protect your reputation and revenue
Downtime can cost over $300K per hour, and poor performance can also harm your reputation. Customers want easy accessibility, and will choose to go elsewhere if your site does not perform. By checking your system under different loads, you can avoid system crashes, ensure there are no memory leaks, and keep response times low. This saves your company money and protects your company name.
When and How to Load Test
When load testing your site, run both small tests after each build and larger tests ahead of specific events that will put your site under extra stress, such as Black Friday.
You should run small load tests after every build to ensure code changes don’t affect the everyday user. It’s not enough to test at the end of the process; by testing continuously, you can find and fix bottlenecks before it’s too late. Other things to check when running small load tests include the average load seen on the application and the average time a user spends on it.
It’s best practice to run large tests for maximum load before peak events. This ensures your infrastructure is prepared, lets you plan ahead for known problems, and lets you monitor the end user experience at the same time.
We recommend running large load tests during low-traffic periods, such as Sunday at 2:00 AM, so that real users aren’t affected, and releasing a maintenance notice beforehand. If you can’t test in production, create a replica environment that is as similar to production as possible.
While your application is under a large load, you also want to verify what the end user is seeing. Having a single front end user capturing metrics like page load time gives you a clearer picture of both server and end user performance concurrently. You can have the fastest response times in the world, but if the user never sees certain page objects, it may not matter.
What Type of Load Tests Should You Run?
We’ve just covered some best practices for when to run your load tests, but there are many different types of tests and scenarios to consider when running performance tests:
Long Soak Tests
A peak event such as Black Friday isn’t over in an hour; it is a long day during which you expect customers to keep coming to your site for the sale. In this case, you want to identify any memory leaks (where an application grabs memory but doesn’t release it when it’s done, eventually forcing a reboot if not caught) or queues that unload slowly. You can do this by running a Soak Test, which strains your system over time to ensure it can properly recycle resources such as CPU and memory, and to check long-term stability, inactive threads, and lost connections. It’s essentially an endurance test that verifies the system stays healthy over a period of time.
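One way to spot a leak in soak test results is to fit a trend line to periodic memory readings: a steadily positive slope suggests memory is not being released. The following is a minimal sketch, assuming memory samples in MB collected at a fixed interval. The function names and the 1 MB-per-interval threshold are hypothetical choices for illustration, not part of any tool.

```python
def memory_slope(samples_mb):
    """Least-squares slope of memory usage, in MB per sampling interval."""
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_mb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def looks_like_leak(samples_mb, max_slope_mb=1.0):
    """Flag a run whose memory grows faster than the allowed slope (assumed threshold)."""
    return memory_slope(samples_mb) > max_slope_mb
```

A flat but noisy series should not be flagged, while a series that climbs at every reading should. The right threshold depends on the sampling interval and on how long the soak test runs.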
Spike Tests
If you want to verify that your system reacts properly to a sudden ramp-up of virtual users, you need to run a Spike Test. The goal is not to find the failure point but to monitor how your system responds to the sudden jump and how it recovers from that peak of users.
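Recovery after a spike can be expressed as a number: how long it takes for response times to return close to their pre-spike baseline. The sketch below is a simplified illustration, assuming samples are `(timestamp_seconds, response_time_ms)` pairs in time order. The function name and the 20% tolerance are assumptions.

```python
def recovery_time(samples, spike_end_s, baseline_ms, tolerance=1.2):
    """Seconds from the end of the spike until response time is back
    within tolerance * baseline. Returns None if it never recovers."""
    limit = baseline_ms * tolerance
    for timestamp_s, response_ms in samples:
        if timestamp_s >= spike_end_s and response_ms <= limit:
            return timestamp_s - spike_end_s
    return None
```

This version reports the first sample that is back within tolerance. A stricter check could require several consecutive healthy samples before declaring the system recovered.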
Failure Point Test
So your expected tests have passed, but you shouldn’t stop there. Take it one step further and determine where your system fails by taking it to its maximum limit. Even if you don’t expect that many users, it’s important to test your breaking point and see that the system can recover well.
How to Identify Your Peak Load Times
We’ve gone over the different test scenarios, and a general approach of when to run large or small load tests, but how do you identify your peak load times? The best place to start is by involving your entire team, from marketing to product to R&D, to discuss when the peak events will occur and to plan ahead for which parts of the application will be stressed. You should also consider that if a competitor’s site goes down, it might cause a flood of traffic to yours.
Because these factors are impossible to guess with certainty, you need to determine when, why and how your system will fail. Industry standards suggest that a system is considered under load when 80% of its resources are being utilized, and you should test at least 20% over your expected peak. Previous metrics can help you plan.
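As a quick worked example of those two figures, the sketch below computes a test target that is 20% above an expected peak and checks a utilization reading against the 80% mark. The function names are hypothetical, and integer arithmetic is used so the result rounds up without floating-point surprises.

```python
def target_load(expected_peak_users, headroom_pct=20):
    """Users to test with: the expected peak plus headroom, rounded up."""
    # Integer ceiling division: -(-a // b) rounds a / b upward.
    return -(-expected_peak_users * (100 + headroom_pct) // 100)


def is_under_load(utilization_pct, threshold_pct=80):
    """True when resource utilization has reached the 'under load' mark."""
    return utilization_pct >= threshold_pct
```

With an expected peak of 10,000 users, this gives a test target of 12,000 users.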
Even if you think you can anticipate what type of traffic your site will experience on a peak day, other factors can be unpredictable. By checking server logs, you can view the history of web requests, including the client IP address, the date and time, and the pages requested.
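To illustrate how server logs can reveal peak hours, here is a minimal sketch that counts requests per hour. It assumes log lines in the widely used Common Log Format; your server may use a different layout, in which case the pattern needs adjusting. The function names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Captures client IP, timestamp, HTTP method and path from a Common Log Format line.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')


def requests_per_hour(lines):
    """Count requests per hour bucket, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(2), "%d/%b/%Y:%H:%M:%S %z")
        # Buckets use the time zone offset recorded in the log line.
        counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


def peak_hour(lines):
    """Return (hour_bucket, request_count) for the busiest hour, or None."""
    counts = requests_per_hour(lines)
    return counts.most_common(1)[0] if counts else None
```

Running `peak_hour` over the logs from a previous peak event gives a first estimate of when traffic is heaviest and how many requests per hour to plan for.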
For example, in the graph above, you can see that the system under test reaches capacity at about 90 virtual users, where the hits per second stop increasing. By discussing these metrics with your entire technical team, you can plan ahead and put together a mitigation procedure if this number is reached. Then as a team you can clearly define your current state and where to focus for your future goals. Just remember, depending on which environment you choose to run your tests in, it is also important to notify your R&D team or end users that maintenance will be happening during the test.
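The plateau described in the example above can also be found programmatically from test results. The sketch below is one possible approach, assuming a list of `(virtual_users, hits_per_second)` measurements sorted by user count. The function name and the 5% minimum gain are assumptions.

```python
def saturation_point(steps, min_gain=0.05):
    """Return the user count after which throughput stops growing
    by at least min_gain per step, or None if it keeps growing."""
    for (prev_users, prev_hits), (_, hits) in zip(steps, steps[1:]):
        if prev_hits > 0 and (hits - prev_hits) / prev_hits < min_gain:
            return prev_users
    return None
```

For invented measurements such as `[(30, 300), (60, 600), (90, 880), (120, 890)]`, throughput grows by only about 1% between 90 and 120 users, so 90 is reported as the saturation point.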
Creating a Great Load Scenario
You’ve determined the best time to test and what type of load test you want to run. Now you need to look at the type of journey your users take in order to build a realistic load scenario.
What kind of journeys do your users take?
Instead of testing only a “happy path,” test the real journeys users are taking by monitoring the behavior of your customers during those peak events.
For example, once you have identified the saturation point, you should cross-reference the information with APM data to check if anything is missing or contradictory and enrich your understanding of how users behave on your website.
It’s also important to look for bottlenecks and high stress points where user traffic spiked. Then, chart different trends and check where your system was close to reaching its limit. For example, if you put a popular product on sale, you would expect high volume on that one page, and it might crash the entire system. Therefore it is best practice to divide your system into logical sections based on the trends you observed and test them individually to help you identify the weakest link.
Where are your users coming from?
A common mistake load testers make is testing their infrastructure only from inside the organization and not from outside. If you fail to test your network infrastructure, the delivery chain is not tested the way it will be used on a peak traffic day. Testing external infrastructures and hosting servers is crucial to preventing failure. It is also important to test the types of networks your users are coming from. You can learn about running tests from up to 50 geo-locations in this guide.
Where do users drop off?
Another common mistake is assuming that a user fully completes their actions in a system. However, users often drop off, and it’s important to test scenarios where they do. For example, test what happens when someone adds something to their cart but does not check out. You need to know how your system reacts to these potential drop-offs.
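Drop-offs can be modeled by giving each step of a journey a probability that the user continues to it. The following is a minimal sketch. The step names and probabilities in `FUNNEL` are invented for illustration and should be replaced with rates observed in your own analytics.

```python
import random
from collections import Counter

# (step, probability that a user who reached the previous step continues to this one)
FUNNEL = [("home", 1.0), ("product", 0.6), ("add_to_cart", 0.3), ("checkout", 0.5)]


def simulate_journey(funnel, rng):
    """Return the list of steps one simulated user completes before dropping off."""
    journey = []
    for step, continue_prob in funnel:
        if rng.random() >= continue_prob:
            break  # the user drops off before this step
        journey.append(step)
    return journey


def journey_mix(funnel, users, seed=0):
    """Count how many simulated users end at each journey length."""
    rng = random.Random(seed)
    return Counter(tuple(simulate_journey(funnel, rng)) for _ in range(users))
```

The resulting mix shows how many virtual users should run each partial journey, so the scenario includes carts that are filled but never checked out.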
Determining ramp up and ramp down speed
Ramp-up is the amount of time it will take to add all test users (threads) to a test execution, and should be configured slow enough so that you have enough time to determine at what stage problems started occurring.
We suggest starting at 10% of your peak load and slowly ramping up from there, monitoring indicators at each stage.
Then, test at 80% of the expected capacity. Once you have decided on those numbers, run load tests at 80% and monitor your KPIs and how your system reacts. Ensure everything is completely stable: memory usage is moderate, CPU is low and recovery from spikes is quick. Everything should be working perfectly at 80%. If something seems jittery at this point, you can be pretty sure that you won’t be able to count on 100%.
If the test succeeded, slowly climb up to 100%. If not, identify bottlenecks and errors and fix what needs fixing before testing your system again.
Once you’ve reached full capacity it’s time to test your system. This is the number of users you are expecting on your website, according to previous user patterns, trend analysis, product requirements and expected events. Check for memory leaks, high CPU usage, unusual server behaviour and any errors.
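The staged approach above (start at 10%, verify at 80%, then climb to 100%) can be written down as a simple schedule. The sketch below is illustrative only. The function name, the 10% step size, and the dictionary layout are assumptions, and the stages would still need to be translated into your load testing tool's own configuration.

```python
def ramp_stages(peak_users, start_pct=10, step_pct=10, checkpoint_pct=80):
    """Build ramp-up stages from start_pct to 100% of peak load.

    The checkpoint stage and the final stage are marked so the test can
    pause there and verify KPIs before going further.
    """
    percents = set(range(start_pct, 101, step_pct)) | {checkpoint_pct, 100}
    return [
        {
            "percent": pct,
            "users": peak_users * pct // 100,
            "hold_and_verify": pct in (checkpoint_pct, 100),
        }
        for pct in sorted(percents)
    ]
```

For a peak of 10,000 users this yields ten stages, from 1,000 to 10,000 users, with the 80% and 100% stages flagged for verification.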
Define KPIs in advance
You want to define failure criteria as part of your test to ensure you’re not just going through the motions, but actually verifying from both an SLA and a business standpoint that the response time was quick enough, that the error percentage did not exceed a certain threshold, and so on.
BlazeMeter offers a configurable pass/fail flagging system based on a number of different criteria.
For example, without a threshold for response time, your application could be performing very slowly and the performance test would still pass. With a criterion such as “average response time greater than 5 seconds,” the test is marked as a failure whenever that condition is met. This way we can verify that our users are having a fast experience, in addition to monitoring things like errors or other KPIs.
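The idea behind failure criteria can be sketched in a few lines. This is a generic illustration and not BlazeMeter's API. The metric names, the result dictionary, and the 1% error-rate limit are assumptions, while the 5-second response time limit comes from the example above.

```python
def evaluate(results, criteria):
    """Return a list of failure messages; an empty list means the test passed.

    Both arguments map metric names to numbers. Each value in `criteria`
    is the maximum allowed value for that metric.
    """
    failures = []
    for metric, limit in criteria.items():
        value = results[metric]
        if value > limit:
            failures.append(f"{metric}: {value} exceeds {limit}")
    return failures


CRITERIA = {"avg_response_time_s": 5.0, "error_rate_pct": 1.0}
```

A run with an average response time of 6.2 seconds would be reported as failed even if no errors occurred, which is exactly the case a test without thresholds would miss.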
Running Your Load Test
It’s now time to calibrate and run your load test!
1. Make sure the test you just created works
The first step is making sure the test you created with your configuration works.
In BlazeMeter, a debug test option allows you to validate your configuration with a low-scale test. You can learn more about debugging and calibrating your tests here.
2. Validate load resources are not over or under utilized
This is done through a calibration procedure that validates that performance test resources are not causing a bottleneck.
There is a recorded step-by-step process for this procedure in BlazeMeter’s help tab.
3. Slowly begin to ramp up to peak load
To determine whether your load resources are over or under utilized, you need to watch both CPU and memory usage on the load generation machines as you slowly ramp up. To prevent load generator failure from being a variable, CPU values should be lower than 80% and memory levels should be less than 70%. BlazeMeter has a tab in the report section to help you monitor these load engines, as seen in the image below.
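The two limits above are easy to check automatically while ramping up. The sketch below assumes you have collected CPU and memory readings from the load generators yourself. The sample layout and the function name are hypothetical.

```python
def overloaded_samples(samples, cpu_limit=80.0, mem_limit=70.0):
    """Return the samples where the load generator itself may be the bottleneck."""
    return [
        s for s in samples
        if s["cpu_pct"] >= cpu_limit or s["mem_pct"] >= mem_limit
    ]
```

If any sample is returned, the results from that stage are suspect. In that case, reduce the number of virtual users per generator or add generators before trusting the measurements.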
Identify, Fix & Eliminate Bottlenecks
The graph below shows what a bottleneck can look like. The hits per second (in purple) drop and the response time (in gold) increases abruptly.
If you do have a failure, you need to rerun your load test after the fix has been implemented. This way you’re not just checking a box that a load test has been done, but you can be confident that your system is prepared for high usage.
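The same pattern (throughput falling while response time jumps) can be detected in raw results. The following is a rough sketch, assuming a time-ordered list of `(hits_per_second, response_time_ms)` samples. The function name and both thresholds are assumptions and would need tuning for real data.

```python
def find_bottleneck(samples, hits_drop=0.2, rt_rise=0.5):
    """Return the index of the first sample where hits per second fall by
    more than hits_drop while response time rises by more than rt_rise,
    both relative to the previous sample. Returns None if no such sample."""
    for i in range(1, len(samples)):
        prev_hits, prev_rt = samples[i - 1]
        hits, rt = samples[i]
        if prev_hits <= 0 or prev_rt <= 0:
            continue
        if hits < prev_hits * (1 - hits_drop) and rt > prev_rt * (1 + rt_rise):
            return i
    return None
```

The returned index tells you which point in the test to inspect in your monitoring data when looking for the cause.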
It’s ideal to mitigate sources of errors ahead of time. This can be done by setting up database replication, or a database or application failover cluster, together with a documented procedure for how to switch over. This lets you keep services up and running while fixes are happening in the background.
It is also helpful to have an organized platform for sharing information during a load test, so that any critical assignments can be distributed quickly and in an organized way.
The bar for high-performing applications will keep rising, requiring faster and more flexible systems as time goes on. Integrating automation to seamlessly update tests, automating test runs, and testing early and often in your development life cycle can help you meet those standards.
As you prepare for your own load test scenarios, make sure to prepare a backup plan with backup servers and locations ready to grow.
In conclusion, a solid load testing strategy involves running both smaller tests within your development cycle and larger stress tests in preparation for peak traffic events.
To get started with load testing, you can simply put your URL into the box below, and begin to run load tests within BlazeMeter’s Continuous Testing platform.