Guest Blogger: Peter HJ van Eijk, Cloud Computing Master, Returns
Systematic Load Testing
Load testing has a tendency to become a really big and complicated project. Maybe that is one of the reasons why it is not done often enough. You can throw a lot of random traffic at a site, yet never be sure you have totally covered every possibility.
Load testing is like any other testing.
The late computer scientist Edsger Dijkstra is famous for saying “testing can only prove the presence of bugs, not their absence”. The same is true for load testing, with one difference: you know for sure that there is a bottleneck somewhere. In reality, our tests are based on assumptions about what can go wrong. These assumptions are often based on our knowledge of the way things are built. So if we test a calculator, it does not make much sense to test every addition from zero to infinity. If it breaks, it is typically at the edge of its capabilities, such as with very large numbers. Similarly, missing data often leads to errors, because from experience we know programmers tend to overlook that.
Efficient load testing is based on realistic hypotheses about what can go wrong in reality.
That is why we model user scenarios. But it will never be possible to model all user scenarios, and we also have to exercise the infrastructure. It makes quite a difference whether the user requests mostly static content or mostly dynamically generated content and transactions. The latter is much more of a strain on the backend.
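To make this concrete, here is a minimal sketch of how such a scenario mix could be expressed before feeding it to a load generator. The scenario names and fractions are illustrative assumptions, not measurements from any real site:

```python
import random

# Hypothetical traffic mix -- the fractions are assumptions for illustration.
SCENARIOS = {
    "static_page": 0.7,     # images, CSS, cached HTML: light on the backend
    "dynamic_search": 0.2,  # exercises application servers and the database
    "checkout_txn": 0.1,    # transactional: the heaviest backend load
}

def build_request_plan(n_requests, mix=SCENARIOS, seed=42):
    """Return a shuffled list of n_requests scenario names matching the mix."""
    plan = []
    for name, fraction in mix.items():
        plan.extend([name] * round(n_requests * fraction))
    rng = random.Random(seed)  # fixed seed so a test run is reproducible
    rng.shuffle(plan)
    return plan

plan = build_request_plan(1000)
print(plan.count("dynamic_search"))  # 200 of the 1000 requests
```

Shifting weight between the static and dynamic entries is how you steer the load toward, or away from, the backend.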
The infrastructure of a typical website includes storage, databases, application servers, web servers, load balancers and all the networks in between. In addition, the internet, the DNS system, any content delivery network, as well as the users’ browser and PC or smartphone, are also part of the supply chain.
And while you are testing, your load test generator is also a potential bottleneck!
My proposed way of developing load tests is to make a set of hypotheses of the form: ‘this component is a bottleneck’. If tests can only show the presence of bottlenecks, we must maximize the chance that a particular test exposes a bottleneck, as well as minimize the chance that a bottleneck will remain hidden from our view.
‘Hypothesis-based load testing’ starts by grouping the infrastructure along the lines that I have sketched above. Often these groups correspond to teams within the IT infrastructure operations department. For each of these units, traffic scenarios need to be crafted that test the hypothesis that this unit is the bottleneck. With system administration tools you can then typically see which resource is saturated, and validate or reject the hypothesis.
With a bit of modeling you will then be able to say to what extent this unit poses a risk to the total performance target. The result of this is that you not only have more confidence in your conclusion about the maximum capacity of the site, but thanks to your analysis, you can point out with surgical precision the piece of infrastructure that most needs upgrading.
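The simplest form of that modeling is a headroom calculation per unit. The traffic target and the measured saturation point below are invented numbers for illustration:

```python
# Sketch: how much headroom does one unit have against the traffic target?
target_rps = 500    # assumed peak requests/second the site must sustain
unit_max_rps = 650  # assumed rate at which this unit saturated under test

headroom = unit_max_rps / target_rps - 1.0
print(f"headroom: {headroom:.0%}")  # 30% above the target
```

The unit with the smallest (or negative) headroom is the one that most needs upgrading.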
What I particularly like about this approach is that it also works in cloud computing environments. Even if you don’t own a particular component, because it is a cloud service such as a CDN, you can still develop and test the hypothesis that this component contributes to the bottleneck.
We have just begun to explore the world of cloud computing, and load testing in the cloud. I am sure we will see some very interesting developments in the near future.
For more developments in cloud computing follow my blog at http://blog.clubcloudcomputing.com
Peter HJ van Eijk is a trainer, writer, consultant and speaker on Cloud Computing and other digital infrastructures, based in the Netherlands. He is master trainer for the Cloud Essentials course www.cloudessentials.net and a Cloud Credential Council Certified Trainer.