I had an absolute blast at the second virtual #NeotysPAC. Between the relevant content and networking, it was well worth my time and investment. I must commend Henrik, Stephane, and everyone else at Neotys for putting together such an engaging event.
In my talk, “Preparing for the Pipeline,” I introduced value stream mapping as a way of understanding and documenting your performance engineering or performance testing process. This approach is a great way to look for opportunities to optimize how you work – removing unnecessary tasks, simplifying, or automating them.
I then narrowed in on automation as a way of being more efficient in our work: not just automated test execution but process automation in the broader sense. I explored what makes a task suitable for automation or not, and the concept of starting small before building up more complex automated pipelines.
For the demonstration, I introduced examples of how we can change the way we use our load testing tools so that we are ready to automate. I used JMeter, but the same concepts could be applied to any load testing tool.
In this blog, I will look at the other side of the coin – how can we simplify our work without doing any automation at all?
Load testing tools generally simulate network traffic. The historic reason for this is that network traffic is extremely resource-efficient to generate. On a single mid-spec laptop, you can potentially simulate thousands of threads or users executing a high volume of activity.
Over time, web and mobile apps have adopted increasingly rich interfaces. As a side-effect, the network traffic that we work with is growing in complexity — for example, asynchronous interactions, client-heavy computation, and dynamically generated payloads.
One of the systems I regularly test is for creating documents. It includes a fully functional word processor that runs entirely in a web browser. The network traffic that passes between the browser and the server when you generate and submit your modified document is diabolical:
- The payloads are hundreds of kilobytes in size, and all of the content is dynamic
- Parts of the payload are Base64 encoded
- Parts of the Base64 content are URL encoded
- There are hundreds of dynamic fields you need to correlate – some of which are within the Base64 encoded parts of the message
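To give a feel for what correlating a value inside an encoded part involves, here is a minimal Python sketch. It assumes one particular nesting (a Base64 part whose decoded content is a URL-encoded key/value string), and the field names `token` and `docId` are made up for illustration. The real payloads in my system are far larger and messier than this.

```python
import base64
from urllib.parse import parse_qsl, urlencode


def decode_part(b64_text):
    """Base64-decode a part, then URL-decode the key/value pairs inside it."""
    inner = base64.b64decode(b64_text).decode("utf-8")
    return dict(parse_qsl(inner, keep_blank_values=True))


def encode_part(fields):
    """Reverse of decode_part: URL-encode the fields, then Base64-encode."""
    inner = urlencode(fields)
    return base64.b64encode(inner.encode("utf-8")).decode("ascii")


def correlate(b64_text, field, fresh_value):
    """Swap a recorded dynamic value for the one the server issued this run."""
    fields = decode_part(b64_text)
    fields[field] = fresh_value
    return encode_part(fields)


# Hypothetical recorded part and a fresh value captured from a response.
recorded = encode_part({"token": "old value/1", "docId": "42"})
replayed = correlate(recorded, "token", "new value/2")
print(decode_part(replayed))
```

Multiply that by hundreds of fields, and repeat it every time the payload format changes, and the maintenance cost becomes clear.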
I’ve load tested this system before using network traffic – it is possible. But it’s not practical. The effort required to build and maintain the test suite is enormous. In a rapid delivery model like agile or CI/CD, you stand no chance of keeping up.
To simplify the problem, I have taken a different approach to generating load. Instead of simulating network traffic, I spin up multiple instances of the Chrome browser and interact with the UI in each one. The browsers handle all the complex network traffic for me. As a result, my load test suite is orders of magnitude simpler and easier to maintain.
The trade-off with this approach is that you need more hardware resources to simulate the same load. Using network traffic, I can simulate thousands of concurrent threads from one of my load agents, but with browsers I can run only 30 concurrent sessions per agent before response times are impacted. That’s enough for the load I need to apply in this situation. It has turned out to be an elegant solution to keep my load test assets simple.
If you have never performed UI automation, there is a bit of a learning curve. Interacting with UI elements is a little different from working with network traffic, and can take some time to get used to.
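The hardware trade-off is easy to size up front. The sketch below is a back-of-the-envelope calculation only: the limit of 30 sessions per agent is the figure from my environment, and the target of 100 concurrent sessions is a made-up number for illustration.

```python
import math


def agents_needed(target_sessions, sessions_per_agent):
    """Number of load agents required to host the target browser sessions."""
    if sessions_per_agent <= 0:
        raise ValueError("sessions_per_agent must be positive")
    return math.ceil(target_sessions / sessions_per_agent)


# 100 concurrent browser sessions at 30 per agent needs 4 agents.
print(agents_needed(100, 30))
```

Your own per-agent limit will depend on the agent hardware and how heavy the pages are, so measure it before relying on any number.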
In my case, I am using JMeter with the WebDriver Sampler plugin. I’m using chromedriver and spinning up instances of Chrome in headless mode. The side benefit of using Selenium WebDriver is the collaboration potential. The delivery teams I work with know WebDriver, which means not only can they maintain load test suites I build using WebDriver, but I can also take their existing test assets and turn them into performance tests. My colleague Philip Webb spoke about this at PAC #4 and how they’ve taken WebDriver performance testing to a whole new level with the Mark59 framework.
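My own scripts live inside JMeter, but the idea translates to any WebDriver binding. The following is a rough Python sketch of the same pattern, not my actual setup: each virtual user opens its own headless Chrome, performs a step, and reports how long it took. The function names and the single "open_page" step are illustrative assumptions, and it presumes the `selenium` package and Chrome are installed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(label, action):
    """Run action() and return (label, elapsed_seconds)."""
    start = time.perf_counter()
    action()
    return label, time.perf_counter() - start


def browser_session(url):
    """One virtual user: open headless Chrome, load a page, report timing."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        return timed("open_page", lambda: driver.get(url))
    finally:
        driver.quit()  # always release the browser, even on failure


def run_sessions(session, url, users):
    """Run `users` sessions concurrently and collect their timings."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(session, [url] * users))
```

A call such as `run_sessions(browser_session, "https://your-app.example/", 5)` would start five browsers at once. A real script would replace the single page load with the UI interactions that make up a business transaction.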
There seems to be what I would consider a religious debate going on about whether to performance test in fully integrated end to end environments or to mock out external components. As with most of these types of decisions, I think the answer is “it depends.”
Entirely integrated end to end environments are excellent. If you want realism and to understand the full end to end customer or user experience, nothing beats them. But system complexity is on the rise. What if you want to performance test a system which has dozens of external components that you require to run a test? If one of those components is down, you can no longer execute the test. It’s just not going to work in a rapid delivery model where you could be deploying multiple times a day.
Sometimes it makes a lot of sense to mock out some or all external integrations so you can focus on performance testing a specific system. The benefits include:
- More control over your environment
- Fewer dependencies and more availability
- Less cost to build and maintain
What it does require is someone to build the mocks, so if you cannot build them yourself, you will need to find someone who can. Most solutions include an API layer between components, and this can be a great place to create your mocks.
One significant risk to call out is that mocks need delays. Your mocks should include delays that match production behavior, and you also need to be careful about timeouts. The worst performance-related issue I’ve ever experienced was one where one system was calling a downstream system that took a very long time to respond (25 seconds per API call). Some unanticipated single-threaded behavior in the calling system ended up crashing it even under trivial amounts of load. To check for these types of issues, you must run a test where the mocks are set to time out.
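As a rough illustration of a mock with a configurable delay, here is a small Python sketch using only the standard library. The JSON body, the delay values, and the helper names are all assumptions for the example. In practice you would set the delay from production measurements, then run a second test with the delay pushed past the caller's timeout.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def start_mock(delay_seconds):
    """Start a mock downstream API that waits before every response."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            time.sleep(delay_seconds)  # simulated downstream processing time
            body = b'{"status": "ok"}'
            try:
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except OSError:
                pass  # the caller gave up and closed the connection

        def log_message(self, *args):
            pass  # keep test output quiet

    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_downstream(port, timeout_seconds):
    """Call the mock; return the HTTP status, or "error" on timeout/failure."""
    url = f"http://127.0.0.1:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status
    except OSError:  # covers socket timeouts and urllib's URLError
        return "error"


normal = start_mock(delay_seconds=0.05)   # production-like delay (made up)
hanging = start_mock(delay_seconds=0.5)   # deliberately beyond the timeout
print(call_downstream(normal.server_address[1], timeout_seconds=0.2))
print(call_downstream(hanging.server_address[1], timeout_seconds=0.2))
normal.shutdown()
hanging.shutdown()
```

The second call is the interesting one: it is where you find out whether the system under test degrades gracefully or falls over when a dependency stops answering.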
To me, shift right is as simple as “doing more in production.” It’s about taking advantage of the fact that we now release software in smaller chunks more frequently. Each release/change carries with it less risk, which allows us to consider dealing with performance after it goes live (in some situations).
The backbone of shift right is application performance monitoring (APM). These are monitoring tools that have visibility of every line of code, every request, every database query made by your applications. They allow you to diagnose performance issues rapidly. What used to take days or weeks to diagnose can now be done in minutes.
Some other shift right options include:
- Canarying is releasing changes to a subset of your users, monitoring how it goes, and if everything works fine, then releasing it to the rest. APM is critical here. I’ve heard New Zealand is Facebook’s “canary.”
- Synthetics are mostly simple automated tests that check your endpoints in production to make sure they are up and performing. I’m not sold on these yet.
- Rapid roll-back is an excellent way to manage risk. If you can roll back a deployment that has performance issues in a minute or two, you can effectively mitigate the risk of deploying changes.
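The canary and roll-back ideas above can be sketched in a few lines of Python. Everything here is a simplification under stated assumptions: the function names are hypothetical, the hash-based bucketing is one common way to pick a stable subset of users, and the 20% tolerance is an arbitrary example threshold. In a real setup the latency figures would come from your APM tool.

```python
import hashlib


def in_canary(user_id, percent):
    """Deterministically place a user in the canary group.

    Hashing the user ID means the same user always gets the same answer,
    so nobody flips between the old and new release.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return bucket < percent


def should_roll_back(baseline_ms, canary_ms, tolerance=1.2):
    """Flag the release if the canary is slower than baseline by the tolerance."""
    return canary_ms > baseline_ms * tolerance


# Route roughly 10% of users to the new release.
for user in ["alice", "bob", "carol"]:
    print(user, "canary" if in_canary(user, 10) else "stable")

# Baseline 100 ms versus canary 130 ms exceeds the 20% tolerance.
print(should_roll_back(baseline_ms=100, canary_ms=130))
```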
Shift right is not and never will be a replacement for performance testing – but it is a pragmatic way to manage performance risk in the right context. Ideally, we should be stretching performance testing in both directions (left and right) to reduce that clunky, expensive end to end performance testing that does not serve rapid delivery.
Collaborate with Delivery Teams
I’ve discussed how switching to UI automation for load generation lets us start collaborating with developers and testers. Here are some other possibilities:
- Get delivery teams to do their own component or API level benchmarking. This is achievable in the right context, and it gives you the chance to mentor and collaborate with them to develop their capability.
- Getting developers into single-user tracing is a great way to catch many performance issues. This involves looking under the covers while using a new or modified feature in a system (without load, just a single user). This could be using an APM tool, profiler, or even just a network proxy like Fiddler or Chrome developer tools. It can be simple things like checking for parallel versus sequential requests, client-side caching, and compression of images.
- If delivery teams build in performance acceptance criteria into their user stories, it gets everyone thinking about performance from the start.
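To make the single-user tracing checks a little more concrete, here is a small Python sketch that inspects a captured trace for two of the things mentioned above: requests that run one after another instead of in parallel, and responses with no caching header. The trace structure (`start_ms`, `duration_ms`, `response_headers`) is a made-up simplification of what a proxy or browser developer tools export would give you.

```python
def runs_sequentially(requests):
    """True if no request starts before the previous one has finished."""
    ordered = sorted(requests, key=lambda r: r["start_ms"])
    return all(
        later["start_ms"] >= earlier["start_ms"] + earlier["duration_ms"]
        for earlier, later in zip(ordered, ordered[1:])
    )


def missing_cache_headers(requests):
    """URLs whose responses carry no Cache-Control header."""
    flagged = []
    for request in requests:
        names = {name.lower() for name in request["response_headers"]}
        if "cache-control" not in names:
            flagged.append(request["url"])
    return flagged


# Hypothetical trace of one user opening a page.
trace = [
    {"url": "/app.js", "start_ms": 0, "duration_ms": 120,
     "response_headers": {"Cache-Control": "max-age=3600"}},
    {"url": "/logo.png", "start_ms": 120, "duration_ms": 80,
     "response_headers": {}},
    {"url": "/api/doc", "start_ms": 200, "duration_ms": 300,
     "response_headers": {"Content-Type": "application/json"}},
]

print(runs_sequentially(trace))
print(missing_cache_headers(trace))
```

Neither check needs any load at all, which is exactly why this kind of tracing fits so well into a developer's day-to-day work.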
One thing I was taught a long time ago was to engage early on with initiatives or projects by conducting a performance risk assessment. Assessments allow you to look at what’s about to be built and help to identify where there may be potential performance problems. The best part – such exercises get the entire project team (including architects) thinking about performance before anything’s developed.
The goal is to get delivery teams feeling accountable for the performance of their systems and hopefully excited about performance too.
Do More Simple Stuff
I suppose I should call this “do less end to end performance testing.” Let’s take advantage of the fact that we’re introducing less risk with each release to do more earlier and more in production. This allows us to keep the scope of our end to end performance testing at a practical level, and to adapt and change it over time.
If I were to draw a new “pyramid” for performance testing, it might look more like an hourglass or yo-yo.
Don’t Buy Untestable Software
Organizations, don’t buy or build untestable (or difficult to test) software. If nothing else, testability should be a key factor when deciding on a technology. The cost of dealing with impossible-to-identify dynamic UI objects or mind-boggling network traffic outweighs any licensing cost savings you would otherwise make.
Putting it All Together
It’s all about acknowledging that the way software is delivered has changed. We now release smaller changes more often – and we must adapt. It’s a double-edged sword. On the one hand, it makes life harder because things are moving faster and we have to keep up. On the other hand, it opens unique opportunities to do a much broader range of activities than we used to. This transition is what takes performance testing into the realm of performance engineering.