How to Analyze the Results of a Load Test Using BlazeMeter
Running a test plan is only 50% of the performance testing task.
The most challenging part of the performance testing process is analyzing the test results and identifying bottlenecks. Think of the load testing reports as the evidence that proves your innocence of a crime in court (ahem, one we hope you didn't commit).
Load Testing: Active Users vs ALL_Response Time
Note, Your Honor, ladies and gentlemen of the jury, if you will, that until minute 02:52 the response time had barely increased. This means that the site undergoing stress testing was able to handle about 580 users without issue. However, note that from the 580 active user mark upwards, some rather suspect behavior begins: response time climbs to unacceptable values. It is thus imperative that we break down the evidence and assess what might be causing this behavior.
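For readers who would rather cross-examine the raw numbers than the graph, here is a minimal sketch of how the "last healthy load" could be estimated from exported samples. The function name, the 1.5x tolerance, and the sample values are all assumptions made up for illustration; they are not taken from the report or from any BlazeMeter API.

```python
from statistics import mean

def last_healthy_load(samples, baseline_points=3, tolerance=1.5):
    """Return the highest user count seen before response time first
    exceeds tolerance x baseline.

    samples: list of (active_users, response_time_ms), ordered by time.
    The baseline is the mean response time of the first few samples.
    If no sample breaches the threshold, the last user count is returned.
    """
    baseline = mean(rt for _, rt in samples[:baseline_points])
    healthy = None
    for users, rt in samples:
        if rt > tolerance * baseline:
            break
        healthy = users
    return healthy

# Hypothetical samples, shaped like the graph described above.
samples = [(100, 210), (200, 205), (300, 215), (400, 220), (500, 225),
           (580, 230), (620, 410), (660, 780), (700, 1350)]

print(last_healthy_load(samples))
```

The tolerance is a judgment call: a stricter value flags the knee earlier, a looser one later.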
Exhibit A: Active Users vs Latency
As you can see in Exhibit A, Active Users vs Latency, the latency graph turns at exactly the same points as the response time graph. That implies that the alleged bottleneck is not caused by the webserver itself. Thus we can suspect that the performance issues are caused by low network capacity.
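One way to put a number on "the two graphs look the same" is a correlation coefficient between the latency and response time series. The sketch below is only an illustration with invented values; `pearson` is a hand-written helper, not part of any load testing tool, and a high correlation shows only that the curves move together, not why.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)

# Hypothetical per-interval averages in milliseconds.
latency_ms = [100, 105, 110, 300, 650]
response_ms = [210, 215, 225, 410, 780]

print(round(pearson(latency_ms, response_ms), 3))
```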
While it may be our only assumption, we CAN prove it if we look at:
Exhibit B: Active Users vs Hits/s.
Note that Hits/s stops increasing nearly 3 minutes after Latency and Response Time started to increase.
Coincidence? I think not.
What this evidence shows is that 02:52 is the moment latency reached a critical point. Below that point, adding users does not noticeably affect the network. Above it, the network becomes extremely sensitive to the number of users.
About three minutes later, at 02:55, latency reaches its saturation point. We can assume that this leads to errors in requests, and we can check that assumption on another graph:
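The point where Hits/s stops growing can also be located programmatically. The following is a rough sketch under stated assumptions: `plateau_start`, its 2% minimum gain, and its three-sample window are hypothetical choices, and the numbers are invented rather than read from Exhibit B.

```python
def plateau_start(hits_per_s, min_gain=0.02, window=3):
    """Return the index of the sample where throughput levels off.

    A plateau is declared once the relative gain between consecutive
    samples stays below min_gain for `window` samples in a row. The
    returned index is the last sample before that flat run. Returns
    None if throughput never levels off.
    """
    flat = 0
    for i in range(1, len(hits_per_s)):
        prev, cur = hits_per_s[i - 1], hits_per_s[i]
        if prev:
            gain = (cur - prev) / prev
        else:
            # Growth from zero counts as unbounded gain.
            gain = float("inf") if cur else 0.0
        flat = flat + 1 if gain < min_gain else 0
        if flat == window:
            return i - window
    return None

# Hypothetical Hits/s samples: steady growth, then a ceiling.
hits = [50, 100, 150, 200, 240, 242, 243, 243, 244]

print(plateau_start(hits))
```

If users keep ramping up after the returned index while Hits/s stays flat, something between the load generator and the application has stopped scaling.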
Exhibit C: Active Users vs Errors
Our expectations were correct. Note that once latency reaches its saturation point, the number of errors starts to increase. This supports the conclusion that the reason for the performance issues in this test is low network capacity, which led to errors in responses.
From 02:45 to 02:52, all systems are go and the network is insensitive to the number of active users.
From 02:52 to 02:56, the network becomes shaky and sweaty as the load increases. Hence the sharp rise in Response Time. During this period there are no errors in the responses.
From 02:56 onwards, the first errors begin to appear due to the low capacity of said network.
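The three-act timeline above can be sketched as a small classifier over per-interval samples. Everything here is an assumption for illustration: the labels, the 1.5x tolerance, and the sample values are invented, and only the timestamps echo the timeline described in the text.

```python
def classify(rt_ms, errors, baseline_ms, tolerance=1.5):
    """Label one interval as stable, degraded, or failing."""
    if errors > 0:
        return "failing"
    if rt_ms > tolerance * baseline_ms:
        return "degraded"
    return "stable"

def phases(samples, baseline_ms):
    """Collapse per-sample labels into (start_time, label) transitions.

    samples: list of (timestamp, response_time_ms, error_count),
    ordered by time.
    """
    result = []
    for t, rt, err in samples:
        label = classify(rt, err, baseline_ms)
        if not result or result[-1][1] != label:
            result.append((t, label))
    return result

# Hypothetical samples; response times and error counts are made up.
timeline = [("02:45", 210, 0), ("02:48", 215, 0), ("02:52", 420, 0),
            ("02:54", 900, 0), ("02:56", 1400, 12), ("02:58", 1500, 40)]

for start, label in phases(timeline, baseline_ms=210):
    print(start, label)
```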
These results need further investigation. If the webserver is in one subnet and the load generator is in another, we cannot, in good conscience and beyond reasonable doubt, conclude which subnet is at fault. We need a mediator, aka a load webserver in another subnet, and a full review and retest of all the aforementioned steps to try to reproduce the results. That said, it seems quite clear that in this particular case low network capacity caused the issue.
Verdict: We find low network capacity ... guilty of causing a bottleneck!
If you are interested in appealing this case, please do so in the comments section below, or forever hold your peace.