This week’s post is taken from a collection of stories and a Q&A session based on the experiences of Brad Stoner, an expert in testing, performance and Virtual Users. Brad is a senior performance engineer at Neotys and was the manager of the Load and Performance Team at H&R Block. In his tenure there, he managed five people and successfully improved application performance and quality for the company. In a given year, it wasn’t uncommon for Brad’s team to be tasked with more than fifty performance testing projects. This experience led Brad to found his own venture, Sandbreak Digital Solutions, a consulting company with a focus on web application performance testing, web page optimization, front-end optimization, capacity testing, infrastructure validation and cloud testing. With twelve years of experience in IT, Brad has held multiple roles that range from systems engineering to operations management, and we can certainly learn a lot from him.
The Neighbor’s Cookies
Brad recounts multiple examples from his experience in which his classic performance testing methods came up short in some fashion. He explains what he did to correct course in each situation and how he learned from it. In a story called “Keeping An Eye On Your Neighbors,” Brad explains how he was able to narrow down a client’s problem to neighboring servers with too many connections, leading to spikes in CPU. In another story called “Outside The Firewall,” he learned that testing from both inside and outside the firewall (not in isolation) is important as some defects can only be found at full load.
In another example called “All About The Cookies,” Brad explains how a client had called him with a catastrophe in production. Everything had been tested and should have been completely fine. There were no large media or marketing events at that time, so this performance spike was an odd thing to see. Brad looked at the servers and saw that some were handling a much higher load than others, which was not what had happened during the tests. His client was using cookie persistence to spread the load evenly during the test, but because the load generators were sticking to particular web servers, the entire test they ran was invalid. After looking at the data, Brad could see that large numbers of users were logging into one particular application at the same time, and it was causing CPU to spike. As a result, users were getting kicked off right when they logged on! The ultimate culprit: incorrect cookie usage.
Q&A With Brad Stoner
1) What is the impact of having performance testing tools installed on a virtual machine instead of a physical machine to generate load?
Brad: Honestly, to generate the load, I like to have a controller on a physical box. But if I’m running a test from the cloud, it’s all virtual anyway. In practice it’s hard to tell whether a box is virtual or physical, so unless you hit perfect-storm conditions (like in the “Neighbors” story), you’re not going to notice a difference. I’m just as confident running a test from a VM as I am from a physical box, for both my controller and my load generators. I’m not fazed anymore. Had you asked me five years ago, I would have said, “I’m not doing anything on a VM.” That’s come a long way.
2) You told a story of a tester who ran a report during a long-running load test because he suspected it might cause problems and wanted to see if that was the case. If the tester knew this report would cause errors, why didn’t he bring it up beforehand?
Brad: If you are familiar with a system and have an idea of where there might be a weakness, announcing that you are going to push on that weakness makes the tester look much more closely at the system. If you run the test without really telling anyone, and it’s in plain view, it’s a lot easier to tell from the results whether this will be a significant problem. It’s great because the team performs its usual activities, a surprise problem occurs and I can pinpoint it right then and there. Then we repeat it to stabilize the system and begin to isolate the one-offs very quickly.
3) How do you calculate the number of virtual users to test with?
Brad: That’s a good question. When looking at load testing profiles, we look at two things. The first is the amount of work being done on the system, expressed as a transaction rate. The second is the number of virtual users. If you mess with those two things, you can really have problems with your performance results. You need to identify how long the use case takes in real life and how many times it needs to be executed, then do the simple math to get your total concurrency, which gives you the number of virtual users online.
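The “simple math” Brad mentions can be sketched with Little’s Law (concurrency = arrival rate × time in system). This is a minimal illustration, not Brad’s own worksheet: the function name, the per-hour unit and the sample numbers are assumptions made for the example.

```python
import math


def required_virtual_users(executions_per_hour: float, use_case_seconds: float) -> int:
    """Estimate concurrent virtual users with Little's Law.

    concurrency = arrival rate (per second) * duration of one use case (seconds)
    Both inputs are hypothetical planning figures, not measured values.
    """
    if executions_per_hour < 0 or use_case_seconds < 0:
        raise ValueError("inputs must be non-negative")
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    concurrency = executions_per_hour * use_case_seconds / 3600
    # Round up: a fractional user still needs a whole virtual user.
    return math.ceil(concurrency)


if __name__ == "__main__":
    # Assumed profile: 6,000 executions per hour, each taking 5 minutes in real life.
    print(required_virtual_users(6000, 300))  # 500
```

Note that the use case duration should include realistic think time. If the script runs faster than a real user would, the same number of virtual users produces a much higher transaction rate than production will see.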
4) When generating load tests, is it not possible to have functional or integration testing run (for example, with Selenium testing)?
Brad: It’s possible. If you have an internet-facing app, you can do what you want. You can have regression testing, functional testing or manual testing. Getting all of these activities to occur at the same time is the right path. In performance, if you’re following the 80/20 rule, you’re not covering a lot of pieces of the application. So, the more pieces you can get in there to better emulate what’s going on in production, the more defects you will find and the more success you are going to have when the application launches. You might have to use other tools, but what’s important is how you’re exercising the application. That’s the main focus.
5) What do you think would be a good remedy for database connection or session timeouts?
Brad: Don’t kill persistent connections (with firewall issues). It’s done as a protective measure. Applications aren’t that resilient. When performance testing, you’re not focused on resilience. Negative testing is a great way to see what can happen from these activities. If you can do it, kill connections, bring down a server or database, then pass the data to Operations or Development Operations and say, “If this is what you see, here’s the problem.” Use the data to define the incident quickly. Always look at database connection pooling. I will never again write a test report that says everything’s good without looking at trends in connections, threads, open cursors and 10-12 other areas of an application that may come back to bite you if you tested longer. Look at response time and CPU, but really dig down into the other things that interact with the user, to ensure that those don’t have some kind of wobble or negative performance trend in them.
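One way to check for the kind of negative trend Brad describes is to fit a straight line to each resource counter sampled during the test and flag the ones that keep climbing. The sketch below is only an illustration under stated assumptions: the metric names, the sample values and the threshold are hypothetical, and a least-squares slope is just one simple way to detect a trend.

```python
def trend_per_minute(samples):
    """Least-squares slope of (minute, value) samples, in units per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def flag_rising_metrics(metrics, threshold):
    """Return the metrics whose slope exceeds the threshold (units per minute)."""
    slopes = {name: trend_per_minute(samples) for name, samples in metrics.items()}
    return {name: slope for name, slope in slopes.items() if slope > threshold}


if __name__ == "__main__":
    # Hypothetical samples taken every 10 minutes during a steady-state test.
    metrics = {
        "open_connections": [(0, 40), (10, 45), (20, 50), (30, 55)],
        "cpu_percent": [(0, 60), (10, 62), (20, 59), (30, 61)],
    }
    # Flags open_connections, which grows steadily while CPU stays flat.
    print(flag_rising_metrics(metrics, threshold=0.1))
```

A counter that rises steadily under constant load, even while response time and CPU look healthy, is the sort of thing that may only cause trouble if the test ran longer.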
Hopefully you were able to pick up a few knowledge points and lessons from Brad’s follies and experience in performance testing. After hearing quite a few examples of how things went wrong, we were able to learn how to make our tests better. Things such as connecting with agile and development teams more often and alternating penetration and performance testing, among many other tidbits, can completely change the outcome of your test and the end user experience.