A Guide to Soak Testing and Spike Testing
Imagine you're building a web application or API. You've completed all functional testing, and you've even run a few performance tests. The launch went well. After some time, however, your application's performance starts to degrade, and API response times increase. You may be wondering what went wrong. This article looks at two types of performance testing that can help you avoid such situations: soak testing and spike testing. We'll first recap stress testing, because soak and spike testing are related to it, and then take a closer look at these two types of tests: when they are helpful and how to plan and execute them.
Revisiting Stress Testing
Let's revisit the definition of stress testing to put soak testing and spike testing into perspective. The objective of a stress test is to find the amount of load that causes a system to break. Testers achieve this by continually ramping up the number of users and requests hitting the software to a level beyond expected real-world conditions (for example, twice the expected load). At some point, your software no longer works as expected. In other words, you put stress on your system to see how it performs, so you can decide which areas you want to fix or prepare a backup plan in case you hit these numbers in the real world.
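The ramp-until-failure loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a load-testing tool: `find_breaking_point` and the `fake_error_rate` model are hypothetical, and in a real stress test each step would run actual traffic against the system and measure the resulting error rate.

```python
def find_breaking_point(error_rate_at, start=100, step=100, limit=10_000, max_error_rate=0.01):
    """Raise the simulated user count step by step until the error rate exceeds
    `max_error_rate`. Returns the first failing user count, or None if the system
    holds up all the way to `limit` users."""
    users = start
    while users <= limit:
        if error_rate_at(users) > max_error_rate:
            return users
        users += step
    return None


def fake_error_rate(users):
    """Hypothetical system model: no errors up to 1,200 users, then a growing share fails."""
    return 0.0 if users <= 1200 else (users - 1200) / users


print(find_breaking_point(fake_error_rate))  # 1300
```

The step size trades precision for test time: smaller steps locate the breaking point more precisely but require more steps to get there.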
Soak tests and spike tests are two types of stress tests. The difference between them lies in how they ramp up and ramp down.
What is Soak Testing?
The defining factor of a soak test is its duration. Unlike in a stress test, you don't increase traffic to the point of failure; instead, you stop ramping up at the expected average load. There's still a ramp-up process, since you don't typically start at the target load. Once you've reached the target, however, you maintain this load for an extended period, which can be a few hours or even days.
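As a rough sketch, a soak-test load profile is a ramp followed by a long plateau. The function below returns a per-minute target request rate; the 200 requests per second, the 30-minute ramp, and the 48-hour hold are hypothetical values, and a real test runner would expect such a schedule in its own configuration format.

```python
def soak_profile(target_rps, ramp_minutes, hold_minutes):
    """Per-minute target request rate: a linear ramp-up to `target_rps`,
    followed by a long constant hold at that rate."""
    ramp = [round(target_rps * m / ramp_minutes) for m in range(1, ramp_minutes + 1)]
    hold = [target_rps] * hold_minutes
    return ramp + hold


# Hypothetical values: 200 requests/s average load, 30-minute ramp, 48-hour hold.
profile = soak_profile(target_rps=200, ramp_minutes=30, hold_minutes=48 * 60)
print(len(profile), profile[:3], profile[-1])  # 2910 [7, 13, 20] 200
```

Almost the entire schedule is the plateau; the ramp exists only so the system isn't hit with the full load from a cold start.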
Other names sometimes used for soak testing are endurance testing, capacity testing, and longevity testing, and they describe the purpose of the test quite well. Let's look at the reasons for running soak tests.
Why Do You Need Soak Testing?
While it's evident that an application needs CPU, memory, and bandwidth to handle requests, the system sometimes doesn't properly clean up resources after each request. Or, even if it does, cleanup may not happen fast enough. Every execution reserves some space in the server's memory, and that space may remain unavailable until garbage collection kicks in. The filesystem can also fill up if every request writes to log files or creates temporary files needed to complete the interaction without deleting them afterward. Another common problem is network connections to backend components like databases, which may be opened but not closed in time.
All of these issues have something in common: they won't hurt the system's performance or availability immediately. Let's look at an example. Imagine you do extensive logging. Every request slowly fills up the drive until you run out of disk space. Now, you may have been proactive and already implemented log rotation to delete log files after 30 days. A month's worth of traffic from a hundred users will then never fill the drive. But what about a month's worth of traffic from hundreds of thousands of users? Most applications have potential leaks like this, and soak testing can help you find them so you can make sure they don't harm performance.
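The arithmetic behind this example can be made concrete. All figures below (1 KB of log output per request, 200 requests per user per day, 50 GB of disk reserved for logs) are assumptions for illustration; only the 30-day rotation comes from the example above.

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
DISK_BYTES = 50 * 10**9          # 50 GB reserved for logs
BYTES_PER_REQUEST = 1_000        # ~1 KB of log output per request
REQUESTS_PER_USER_PER_DAY = 200
RETENTION_DAYS = 30              # log rotation deletes files older than this


def retained_log_bytes(users):
    """Disk space held by logs once rotation keeps a full RETENTION_DAYS window."""
    bytes_per_day = users * REQUESTS_PER_USER_PER_DAY * BYTES_PER_REQUEST
    return bytes_per_day * RETENTION_DAYS


def days_until_full(users):
    """Days of traffic needed to fill the disk, ignoring rotation."""
    return DISK_BYTES / (users * REQUESTS_PER_USER_PER_DAY * BYTES_PER_REQUEST)


for users in (100, 100_000):
    needed = retained_log_bytes(users)
    verdict = "fits" if needed <= DISK_BYTES else f"full after {days_until_full(users):.1f} days"
    print(f"{users:>7,} users: {needed / 10**9:,.1f} GB of logs -> {verdict}")
```

With these figures, 100 users keep about 0.6 GB of logs, while 100,000 users would need 600 GB and fill the disk after roughly two and a half days, long before rotation deletes anything. A test lasting an hour or two would not reveal this.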
How to Run a Soak Test
As mentioned earlier, a soak test is a long-running test because its objective is to find problems that don't show up in shorter tests. This requirement makes these tests more complicated to run and calls for extensive planning.
In my previous article, "3 Things to Look Out for When Stress Testing Your API", I talked about test environments and the advantages and disadvantages of dedicated test and staging environments compared with testing in production. Soak tests mostly find issues related to environmental constraints like memory, so if the staging environment differs from production, you will not get accurate results. Of course, an exact clone of the live environment is the best place to run tests, but that may not be feasible.
Sometimes you're lucky, though. Suppose you have an application that doesn't get 24/7 traffic, such as industry-specific software used only during business hours, or a system used exclusively during specific events like broadcasts. In that case, you can test your system during downtime, with zero or minimal actual production traffic.
A commonly used strategy is testing over the weekend: start your test on Friday, let it run for two days, and have results on Monday morning. Although a soak test is supposed to run at regular production load, it's also possible to compress the load somewhat. For example, you could take the number of requests you expect over five days (Monday to Friday) and send the same number over two days (Saturday and Sunday).
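The compressed rate is simple arithmetic: total requests divided by test duration. In this sketch, the 1.728 million requests per weekday is a hypothetical figure, equivalent to an average of 20 requests per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def compressed_rate(requests_per_day, source_days, test_days):
    """Average requests/second needed to send `source_days` worth of traffic in `test_days`."""
    total_requests = requests_per_day * source_days
    return total_requests / (test_days * SECONDS_PER_DAY)


# Hypothetical: 1.728 million requests per weekday (20 requests/s on average),
# five weekdays' worth replayed over Saturday and Sunday.
rps = compressed_rate(requests_per_day=1_728_000, source_days=5, test_days=2)
print(f"{rps:.1f} requests/s")  # 50.0 requests/s
```

Replaying five days of traffic in two raises the average rate by a factor of 2.5, so a compressed soak test also stresses the system harder than normal operation. A single average rate also flattens daily peaks; if those matter to you, you may prefer to scale a recorded hourly pattern instead.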
While your soak test is running, record metrics like CPU, memory, network, and disk utilization, because they give you the hints you need to find bottlenecks. If traffic remains steady, the metrics should stay mostly constant, with only minor deviations in either direction. If a metric keeps growing over time and never returns to normal, something is likely wrong, and you may have found your leak. Use the test results to improve infrastructure planning, for example by adding memory or configuring the number of concurrent database connections accordingly.
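One simple way to tell a leak from normal fluctuation is to fit a straight line to the samples and check whether its slope exceeds a tolerance. The sketch below computes an ordinary least-squares slope by hand; the 10-minute sampling interval, the memory values, and the 10 MB-per-hour threshold are all hypothetical.

```python
def slope_per_hour(samples, interval_minutes):
    """Least-squares slope of evenly spaced samples, in sample units per hour."""
    n = len(samples)
    xs = [i * interval_minutes / 60 for i in range(n)]  # sample times in hours
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def looks_like_leak(samples, interval_minutes, max_growth_per_hour):
    """Flag a metric whose fitted trend grows faster than the allowed rate."""
    return slope_per_hour(samples, interval_minutes) > max_growth_per_hour


# Hypothetical memory usage in MB, sampled every 10 minutes.
steady = [512, 515, 511, 514, 512, 513]
leaking = [512, 530, 548, 566, 584, 602]
print(looks_like_leak(steady, 10, max_growth_per_hour=10))   # False
print(looks_like_leak(leaking, 10, max_growth_per_hour=10))  # True
```

Garbage-collected runtimes often show a sawtooth memory pattern, so fit over a window that spans many collection cycles; a steep slope over a short window can be a false alarm.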
If you can't run long soak tests, an alternative with even more compression is peak testing. In a peak test, you run for only a few hours at the load you expect during peak times, such as a Black Friday sale. It won't find issues that only arise after days of continuous load, but it can still help uncover some performance problems. An interesting question, however, is how the system behaves when it goes from average load to peak load and back. That leads us to the second topic of this article: spike testing.
What is Spike Testing?
Most stress tests only increase load toward the point of failure, and soak tests increase traffic and then keep it constant. In spike tests, by contrast, ramping up and down is essential: testers increase the number of requests very quickly up to stress levels, decrease it again, and continue running the test. Put differently, you can think of the test load as a mostly flat line with a few spikes, hence the name.
Spikes in traffic can happen either randomly or at constant intervals. For example, a spike test could stay at average load for eight minutes, immediately ramp up to double the request count for two minutes, return to the previous level, and repeat this cycle six times, so the whole test runs for an hour. You could also implement a "step-up spike test", in which you gradually increase the height of each spike.
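The example schedule, including the step-up variant, can be sketched as a per-minute list of target loads. The base load of 100 and the step of 0.5 are hypothetical; the eight-minute baseline, two-minute spike, doubling, and six cycles come from the example above.

```python
def spike_profile(base, spike_multiplier=2.0, baseline_minutes=8, spike_minutes=2,
                  cycles=6, step_up=0.0):
    """Per-minute target load: `baseline_minutes` at `base`, then a spike, repeated.

    With step_up > 0, each spike is `step_up * base` higher than the previous one
    (a "step-up spike test")."""
    profile = []
    for cycle in range(cycles):
        peak = round(base * (spike_multiplier + step_up * cycle))
        profile += [base] * baseline_minutes + [peak] * spike_minutes
    return profile


flat = spike_profile(base=100)
print(len(flat), flat[:10])  # 60 [100, 100, 100, 100, 100, 100, 100, 100, 200, 200]

stepped = spike_profile(base=100, step_up=0.5)
print(sorted(set(stepped)))  # [100, 200, 250, 300, 350, 400, 450]
```

With this layout, the first spike falls in minutes nine and ten, which matches the minute numbers used in the rest of this article.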
3 Spike Testing Use Cases
Let's look at three scenarios that benefit from spike testing.
1. Spike Testing for Failure Recovery
A unique property of spike tests compared to other performance tests is that they test not just how the system reacts to load but also how it recovers from failure. To illustrate possible test results and their implications for your application, let's return to the example above and assume various behaviors. If the two minutes of higher load do not affect performance, that's excellent news. It's also not very insightful, so you should probably increase the stress on the system to learn its limits.
Two things could happen during a larger spike. Either response times increase but the system keeps working, or you get more error responses. In both cases, you may want to improve the software, although the first scenario is still better than the second since there are no (additional) errors.
The new question that spike testing can help answer is what happens when the spike is over and traffic returns to normal levels. In the best case, errors stop occurring and response times drop. Depending on the internal cause of the problems, this could ideally happen immediately, in minute eleven, although you may find that the application needs a few more minutes to recover. Worse, error rates and response times may remain high. The catastrophic scenario, however, is finding that the system never recovers.
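To turn "how quickly does it recover?" into a number, you can compare per-minute metrics after the spike with a baseline. This sketch assumes you have per-minute p95 latency and error rate for the minutes after a spike ends; the 20% latency tolerance, the 1% error threshold, and the sample values are hypothetical and should come from your own goals. It reports the minute from which the system stays healthy until the end of the observation window, so minute 1 corresponds to minute eleven in the example above.

```python
def minutes_to_recover(after_spike, baseline_p95_ms, tolerance=1.2, max_error_rate=0.01):
    """Return the 1-based minute (counted from the end of the spike) from which every
    remaining sample is healthy, or None if the last sample is still unhealthy."""
    healthy = [p95 <= baseline_p95_ms * tolerance and err <= max_error_rate
               for p95, err in after_spike]
    recovered_at = None
    # Walk backwards: the answer is where the final unbroken run of healthy minutes starts.
    for minute in range(len(healthy), 0, -1):
        if not healthy[minute - 1]:
            break
        recovered_at = minute
    return recovered_at


# Hypothetical (p95 latency in ms, error rate) samples; baseline p95 is 200 ms.
recovering = [(850, 0.05), (400, 0.01), (230, 0.0), (210, 0.0), (205, 0.0)]
stuck = [(900, 0.08), (950, 0.10), (1000, 0.12)]
print(minutes_to_recover(recovering, 200))  # 3
print(minutes_to_recover(stuck, 200))       # None
```

Requiring the metrics to stay healthy, rather than accepting the first good minute, avoids declaring recovery on a brief dip. A result of None only means the system hadn't recovered within the observed window; extend the window to distinguish slow recovery from none at all.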
Fast recovery can matter more than perfect behavior during a spike. Spikes may be short and some performance degradation unavoidable, but if they have prolonged consequences, they can affect your overall performance metrics and important KPIs like availability. If these matter to you, make sure to add spike tests to your overall testing strategy.
2. Spike Testing for Auto-Scaling
Many systems have some form of auto-scaling built in. Think of cloud environments where an application launches new server instances when load increases and terminates them after the spike. Moreover, even a single process on a single machine may rely on garbage collection or other cleanup processes that run regularly.
The usual ramp-up speed in a stress test gives such systems ample time to provision new resources and run the internal maintenance routines they need to stay performant. In spike tests, however, traffic changes abruptly, faster than the system usually reacts. Understanding how the software and its underlying infrastructure behave in these cases is vital to achieving good scalability.
Going back to the example, you might observe slow response times in minute nine, when the spike starts. Then, in minute ten, the system has caught up by scaling its resources to the additional traffic. If that's what you observe, you can be confident that any performance problems will be temporary. If nothing improves during the spike because scaling up takes more than two minutes, there's a clear need for improvement. Of course, one and two minutes are arbitrary numbers; what's acceptable depends on your goals and expectations.
By the way, don't forget to check the scale-down as well. If you provision new cloud resources when a spike starts but never release them when it ends, you'll end up with underutilized servers and unnecessary bills from your cloud provider. Remember that traffic patterns change in both directions.
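A toy model can show why both scale-up delay and scale-down delay matter. The sketch assumes a purely reactive autoscaler: each instance handles a hypothetical 50 requests per second, new instances take `up_delay` minutes to become ready, and surplus instances are removed only after `down_delay` minutes of lower demand. Real auto-scalers and their settings differ, so treat this as a thinking aid rather than a prediction. The load is the first cycle of the example spike test plus eight quiet minutes, with a hypothetical baseline of 100 requests per second.

```python
import math


def desired_instances(rps, capacity):
    """Instances needed to serve `rps` requests/s, never fewer than one."""
    return max(1, math.ceil(rps / capacity))


def simulate(load, capacity, up_delay, down_delay):
    """Toy reactive autoscaler (assumes down_delay >= up_delay).

    At minute t, the fleet covers the highest demand seen between t - down_delay and
    t - up_delay. Returns the 1-based minutes in which load exceeded capacity and the
    number of instance-minutes spent above what the current load required."""
    desired = [desired_instances(r, capacity) for r in load]
    overloaded_minutes, idle_instance_minutes = [], 0
    for t, rps in enumerate(load):
        lo = max(0, t - down_delay)
        hi = max(0, t - up_delay)
        instances = max(desired[lo:hi + 1])
        if rps > instances * capacity:
            overloaded_minutes.append(t + 1)  # 1-based, like the article's minute numbers
        idle_instance_minutes += max(0, instances - desired[t])
    return overloaded_minutes, idle_instance_minutes


load = [100] * 8 + [200] * 2 + [100] * 8  # requests/s per minute
print(simulate(load, capacity=50, up_delay=1, down_delay=5))  # ([9], 10)
print(simulate(load, capacity=50, up_delay=3, down_delay=5))  # ([9, 10], 8)
```

With a one-minute scale-up delay, only minute nine is overloaded, matching the "slow in minute nine, caught up in minute ten" scenario; the extra instances then sit idle for 10 instance-minutes. With a three-minute delay, the whole spike is overloaded and the new instances arrive only after it has ended, so all 8 of their instance-minutes are wasted.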
3. Spike Testing for Product Launches
Many products and websites have relatively constant demand and traffic patterns, or they grow slowly and steadily. Then some suddenly go viral after being featured on television or a high-traffic website, or when they're marketed to an eagerly waiting audience. If you expect this to happen to your website, spike tests are necessary to avoid disappointing users.
Performance Testing Strategy
Performance tests come after functional testing, and stress tests come after initial basic load testing. Adding soak testing to your strategy is helpful for every application with steady traffic patterns, and spike testing is essential for every application that expects traffic spikes it must handle without degraded performance. You can configure both types of tests in test runners like JMeter and trigger them from the cloud using BlazeMeter.