#NeotysPAC – Super Saiyan Performance Engineering, by Stephen Townshend


It was an honor to take part in my fifth Neotys PAC, flying to the other side of the world to talk about performance engineering once again. I want to take the time to thank Neotys for putting together what I think is the best performance engineering conference in the world.

One of the common themes during previous PAC events has been AI and machine learning. I’m the first to admit I’ve been skeptical in the past, but I’ve been part of a team over the past six months who have put together something special, which I want to share with you today.

Let’s start with a high-level overview.

I want to introduce you to APT – the fully automated performance testing framework. APT is an AI-based framework that is continuously consuming data from production. It pulls in application logs and performance metrics, and sniffs the traffic between the users and the system.

APT’s goal is to identify “scenarios” of interest. For example, it identifies the busiest five-minute period each day. It also picks up deviations from the norm – for example, users accessing some functionality they usually don’t touch, response times spiking to higher than normal levels, or perhaps a period where a burst of errors occurred. During the day, APT identifies all the conditions that led up to these events and remembers them.

Overnight APT takes the “scenarios” identified during the day and uses them to generate load test assets dynamically. It then executes these tests against our performance testing environment, which has the next release deployed. Because of this, we are simultaneously testing the performance of the next build and trying to reproduce issues that occurred in production.

Here’s the kicker – APT is also listening to our performance testing environment. Much like the tool Akamas, which Stefano Doni has presented at previous PAC events, APT has access to modify infrastructure, platform, and application-level configuration. When APT can reproduce or find a new performance issue, it attempts to improve performance or resolve the problem itself by modifying the system configuration. If APT was able to improve the performance, then those configuration changes are pulled into the next release.

That’s the high-level overview of APT. Before I go into the technical detail of how we implemented this, I have an important question. Did you believe a single word I just said? Because I fabricated the entire story. There is no such thing as APT.

My topic today is Spotting Bullsh*t in Performance Engineering (much like the steaming pile of manure I just laid before you).

I’m going to cover three topics today:
• Bullsh*t in our load testing tools
• Bullsh*t in the cloud
• Bullsh*t with our APM tools

Bullsh*t in our load testing tools

Record and play back

One of the oldest myths in the industry is the concept of record and play back load testing. In short, that we can record a load test script, and without further intervention, it becomes a proper load test that we can execute.

Anyone who has worked on even one performance testing engagement knows that this is rubbish. The only situation where this would work is a static website.

Let’s say for argument’s sake that we could record traffic in a dynamic web application and play it back without any further effort. The problem is we would be sending the same test data every time. The chances are that in the first iteration, we’d populate one (or multiple) caches, and then for the rest of our test, we’d just be pulling the same record from the cache. That’s not a meaningful test.

Even if there was no caching, the test data we use has an impact on performance. As a simple example – in the insurance industry, we have customers who have one or more insurance policies. Searching and retrieving a customer with one policy is likely to be much quicker than retrieving a customer with a hundred policies.

Simply put, the concept of record and play back is a marketing gimmick with no value in the real world. The potential victims include junior performance engineers who are perhaps missing guidance, people who have no experience in performance testing but have been asked to do some anyway (maybe an automated functional tester or a developer in a delivery team), but also decision-makers who have been asked to choose a tool or approach but do not have any experience in performance testing.

No coding necessary

Another typical marketing message is that you can use a particular load test tool without needing to do any programming. Here are some example quotes from load testing tool vendors I picked up from just five minutes browsing the web:

I’m not here to argue that we can use load testing tools without writing code – many tools offer this functionality. You can even use the open-source JMeter without writing a line of code in many situations. What I am arguing is whether this provides any value.

For one, how do we deal with the unusual? Let’s say we make a request, and a base-64 encoded response comes back. Within this response is a dynamic value we need to extract. How do we do this using our load testing tool? There is probably a tool out there that would allow us to do this without writing any code, but not many would handle this situation. The simplest solution is to write a few lines of code to decode the payload and grab the value we need.
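To give a sense of how little code this takes, here is a minimal Python sketch of the decode-and-extract step. The field name `sessionToken` and the JSON-like payload are assumptions made up for illustration; a real response will look different, and most load testing tools would need the same logic written in their own scripting language.

```python
import base64
import re


def extract_token(encoded_body: str) -> str:
    """Decode a base-64 response body and pull out a dynamic value."""
    decoded = base64.b64decode(encoded_body).decode("utf-8")
    # Hypothetical field name: adjust the pattern to the real payload.
    match = re.search(r'"sessionToken"\s*:\s*"([^"]+)"', decoded)
    if match is None:
        raise ValueError("sessionToken not found in decoded response")
    return match.group(1)


# Simulated server response, built here only so the sketch is runnable.
sample = base64.b64encode(
    b'{"user": "demo", "sessionToken": "abc123"}'
).decode("ascii")

print(extract_token(sample))
```

Raising an error when the value is missing is deliberate: a script that silently carries on with an empty token tends to produce misleading results later in the test.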

The more significant issue here is that anyone afraid of or unable to write code is going to struggle with some of the most fundamental concepts required for performance testing, including:

• Network protocols. Even reading and working with HTTP traffic is no less complex than writing code.
• Regular expressions are required for most load testing tools, but even XPath is no less complex than writing code. Also, if you are doing UI load generation, you still need to understand how to traverse the DOM using XPath.
• The mathematics of workload modeling and queuing theory are also no less complicated than writing code.
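As a small taste of the workload modeling maths, here is a sketch based on Little's Law for a closed system: the number of concurrent users equals throughput multiplied by the time each user spends per iteration (response time plus think time). The function name and the example figures are hypothetical.

```python
def required_virtual_users(throughput_per_sec: float,
                           response_time_sec: float,
                           think_time_sec: float) -> float:
    """Little's Law for a closed workload: N = X * (R + Z)."""
    return throughput_per_sec * (response_time_sec + think_time_sec)


# Hypothetical target: 50 requests per second, 2 s response time,
# 8 s think time between requests.
users = required_virtual_users(50, 2.0, 8.0)
print(f"Virtual users needed: {users:.0f}")
```

The formula is one line, but knowing when it applies and what to plug into it is the part that takes understanding.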

The bottom line is that performance testing or engineering requires technical depth. If writing code is scary, then your ability to provide business value with a load testing tool is significantly hampered. And if you can’t write code, you’re limiting yourself to the most basic functionality that your load testing tool provides.

Automatic correlation

There are two kinds of automatic correlation. There’s the purely automatic “magic” correlation that does everything for you, and there are framework parameters, where you define the extract and bind rules yourself.

I’m going to focus on the 100% automatic correlation first. This is where you complete a recording using a commercial load testing tool, and a message will pop up offering to automatically correlate dynamic values – mostly doing a lot of the hard work for you. It sounds great – so you hit the button. The next thing you know, the tool has made an absolute dog’s breakfast of your test script. You’ll spend the next three days picking it apart and fixing it up. Some of the common issues I come across are:

• Fragile regular expressions such as looking for the 214th instance of a ubiquitous HTML tag. This will break immediately with new test data or any change to the application.
• Non-trivial situations won’t be handled. For example – what if you need to extract a value but then submit it again later on with a salt (random characters) added to the end of it? Automatic correlation will either fail to handle this or do it incorrectly and cause all kinds of problems.
• Your test suites end up with a lot of technical debt due to extraction rules and variables which are not sensibly named and organized, and many extractions which may not be necessary. This can also degrade the efficiency of your load test.
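The first issue above, positional extraction, can be illustrated with a short Python sketch. The HTML snippets and the `order-ref` id are invented for the example; the point is only that "the Nth instance of a tag" stops working as soon as the page changes, while a pattern anchored on something meaningful keeps working.

```python
import re

# Two hypothetical versions of the same page. The second has one extra
# <span> near the top and different test data.
page_v1 = ('<span>Home</span><span>Help</span>'
           '<span id="order-ref">ORD-1001</span>')
page_v2 = ('<span>Sale!</span><span>Home</span><span>Help</span>'
           '<span id="order-ref">ORD-2002</span>')


def fragile_extract(html: str) -> str:
    """Positional rule: take the contents of the third <span>."""
    return re.findall(r"<span[^>]*>(.*?)</span>", html)[2]


def anchored_extract(html: str) -> str:
    """Rule anchored on the element that actually holds the value."""
    match = re.search(r'<span id="order-ref">([^<]+)</span>', html)
    if match is None:
        raise ValueError("order-ref not found")
    return match.group(1)


for page in (page_v1, page_v2):
    print(fragile_extract(page), anchored_extract(page))
```

On the second page the positional rule quietly returns a navigation label instead of the order reference, which is exactly the kind of failure that takes days to track down in a large script.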

Most of the time, genuinely automatic correlation does more harm than good. It’s another marketing gimmick that offers little practical value in the real world.

Framework parameters, on the other hand, are great – but they are not magic. You need an intimate understanding of both the application traffic and the system behavior to make them work. Even then, there will be situations you cannot handle.

Important notes:

• I do not want to discourage you from seeking simplification and automation wherever possible. What I will say is don’t try to run before you can walk. To effectively automate a process, you first need to understand how it works.
• Load testing tool vendors are trying to make our lives easier. My point here is really that no tool can handle every situation – you need to be able to problem-solve yourself. The tool will not do your job for you (even if it claims it will). The loudest voices asking for features in our tools are not necessarily from the best performance experts.

Bullsh*t in the cloud

Scale out/up

I’ve often heard comments suggesting that performance is less of an issue in the cloud than on-premise because we can scale out/scale up the underlying infrastructure. I was talking to someone a week ago who said that, every month, they hear senior IT leaders say “we don’t need to performance test because it’s in the cloud”. This argument can be nullified entirely by the fact that performance is more than just capacity.

Let’s say, for example, we migrate a system from on-premise into the cloud. We performance test before and after and can now achieve 10x the load we could before. Unfortunately, though, every user interaction now takes 5x longer, and there is also a 4% chance of an HTTP-500 error occurring every time the user interacts with the system. Overall performance has degraded despite capacity increasing.
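To see why a 4% error rate per interaction hurts more than it sounds, here is a rough Python sketch of the chance that a user hits at least one error during a session. It assumes errors are independent between interactions, and the session length of ten interactions is a made-up figure.

```python
def session_error_probability(per_request_error_rate: float,
                              interactions: int) -> float:
    """Chance of at least one error in a session, assuming each
    interaction fails independently with the same probability."""
    return 1 - (1 - per_request_error_rate) ** interactions


# 4% comes from the example above; 10 interactions per session is a
# hypothetical session length.
p = session_error_probability(0.04, 10)
print(f"Chance of at least one error in a session: {p:.1%}")
```

Under those assumptions roughly one session in three runs into an error, no matter how much extra capacity the migration bought.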

Even capacity issues can continue to occur regardless of how much hardware we have. As a simple example – say you had a server with 32 GB of memory running a single JVM. Let’s say the default JVM settings have not been touched, and you’re only allocating 1GB of memory to the heap. It doesn’t matter how much memory you throw on the server; the JVM isn’t going to use it until you configure it correctly.

Auto-scaling infrastructure can also be the cause of performance issues. Recently I was performance testing some APIs in Azure, which all shared the same app service plan (a set of hardware resources). Performance testing showed that under peak load the service plan was reaching 100% CPU usage. We did some exploratory performance testing and enabled auto-scaling out when the CPU exceeded 70%. Here is a scatter plot showing the raw response times during a performance test when the auto-scaling kicked in:

The auto-scaling was triggered as intended, but unfortunately, each time a new service plan was added, it caused every API to stop responding for two to three minutes.

The thing is, it wasn’t just a problem with how we set up auto-scaling. It was related to how the underlying system was designed and implemented. If you intend to use auto-scaling out, then you need to plan for and execute with this in mind from the start. Don’t just assume you can scale out.

Monitoring is simpler

Another perception I often see is that when something is in the cloud, we don’t have to worry about the underlying hardware. It reminds me of a two-year-old child placing their hands over their eyes and proclaiming, “you can’t see me!” (because if they can’t see you, you can’t see them!) Of course, that’s not how it works.

There’s a big assumption that our cloud providers will give us all the information we need to understand system behavior and diagnose performance issues. This is not always the case. Recently we had an API that was taking 30 seconds. We narrowed the culprit down to our cloud gateway – but we couldn’t figure out why. Eventually, we had to reach out to the cloud vendor, who notified us the gateway had been burning 100% CPU for weeks. The reason we didn’t spot this is that the cloud vendor does not provide visibility of this metric! This is scary because how do we protect ourselves from the same thing happening in the future? We have no way to track CPU usage on our gateway.

It’s important to remember that no matter where your code is running – on-premise, in the cloud, or on Mars – somewhere there is a physical computer with CPU, memory, disk, and network resources running your code. There are many layers of abstraction on top of this, such as virtualization, operating systems, application platforms, etc. The more layers of abstraction there are, the easier the life of a developer – code moves ever closer to human language. But these layers make the life of a performance engineer more difficult. It means we have:

1. More layers we need to understand
2. More layers we need to monitor
3. When a performance issue occurs, more layers could be the culprit

Any layer we cannot see (monitor) becomes a blind spot. When a performance issue arises, we must consider that the culprit might be one of our blind spots. This is a common and frustrating situation in the cloud.

In my own experience, monitoring and understanding cloud systems is not easier than on-premise; quite the opposite is true.

Load test from the cloud

On a slightly different note, there are several tools and services out there that let you build and/or execute performance tests from the cloud. Some of the value propositions include:

• On-demand infrastructure to save cost
• Scale tests massively
• Access your test assets from anywhere

Many organizations have taken a strategic direction to move to the cloud. If you need to build a new load test platform, you might have pressure on you to put it in the cloud. Before you decide, there are some things you need to consider:

• Where are your users in relation to the system under test? For example – if your system is in the cloud but all your users are on-premise, you will miss out on testing the impact of the on-premise network if you run your tests from the cloud.
• How often will you be running your tests? If you run tests once per sprint, there is a real value proposition. If you are continually deploying and testing every day, it is likely the costs of cloud services will outweigh anything you had on-premise.
• Are there any security controls you need to consider? I’m of the view that a cloud solution can be just as secure as any on-premise one. But what are the opinions of your organization? You may find your hands are tied.
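The question of test frequency in the list above comes down to simple arithmetic, sketched below in Python. Every number here (hourly rate, generator count, test duration, on-premise cost) is hypothetical; plug in your own figures before drawing any conclusion.

```python
def yearly_cloud_cost(runs_per_year: int, hours_per_run: float,
                      generators: int, hourly_rate: float) -> float:
    """Pay-per-use cost of cloud load generators over a year."""
    return runs_per_year * hours_per_run * generators * hourly_rate


def break_even_runs(yearly_on_prem_cost: float, hours_per_run: float,
                    generators: int, hourly_rate: float) -> float:
    """Number of runs per year at which cloud cost equals on-premise cost."""
    return yearly_on_prem_cost / (hours_per_run * generators * hourly_rate)


# Hypothetical inputs: 10 generators at 2.00 per hour, 2-hour tests,
# and an on-premise platform costing 6000 per year to run.
per_sprint = yearly_cloud_cost(26, 2, 10, 2.0)   # one run per two-week sprint
daily = yearly_cloud_cost(260, 2, 10, 2.0)       # one run per working day
threshold = break_even_runs(6000, 2, 10, 2.0)

print(f"Per sprint: {per_sprint:.0f}, daily: {daily:.0f}, "
      f"break-even at {threshold:.0f} runs per year")
```

With these made-up figures the cloud option wins comfortably at one run per sprint and loses at one run per working day, which is the shape of the trade-off described above.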

In little old New Zealand, I have only once come across a situation where I thought, “we need cloud load generation!” That was for the national online census. I know New Zealand is a small country, but I suspect there are not as many situations that warrant cloud services as we might be led to believe by vendors and the industry.

Bullsh*t with our APM tools

APM is fantastic – in fact, it’s necessary for rapid delivery models. The bullsh*t is how we implement it. I hate using the phrase, but, in this case, it has to be said – to get value out of APM, you need to change the culture of your organization.

You can’t just install an APM solution and expect magic to happen. If you do, you’ll end up with a tool presenting generic dashboards full of technical information that no-one looks at or cares about.

I was in a situation a while back where New Relic was set up and installed for a business-critical application. This application has one key business process with five steps. I wanted to know how long each of these steps takes in production – but I couldn’t. Because it was a single page application, the five requests were too similar for New Relic to differentiate between. All it would have taken was a developer adding a few comments in the code to achieve this. Unfortunately, the developers were not even aware New Relic existed. It wasn’t set up and installed in the development or test environments. There were only about five people in the organization with access to New Relic – two operations staff who waited for alerts, and three others who barely looked at it.

This situation is anti-DevOps. It encourages a lack of accountability for performance by the developers. One of the most significant gains you can have with APM is building a feedback loop, so developers can see how their code is performing in the real world (or during performance testing). Even if the operations team had been actively looking at New Relic every day – they don’t have the required understanding of the application to interpret the data, let alone act on it.

To get the most out of APM:

• Make it accessible to as many people as possible. Developers, operations, testers, BAs, architects, project managers, even the business. And don’t just make it accessible in terms of setting up user accounts – make the information digestible. Give the business metrics they care about and can act on.
• Remember the goal is to build that feedback loop back from production to delivery. If your APM isn’t doing this, you need to change the culture.
• A tool cannot understand your business – you must build a business understanding into it. Focus on what matters to your business, not just the technical detail.

In summary, APM is not bullsh*t, but the way we implement it often is.

How do we avoid bullsh*t?

There’s a lot to discuss here, and I think this will be a conversation I will need to continue in the future. For now, I have five tips to help keep things real:

1. Remember the basics. No matter how complex the situation, there are some fundamentals you can use to make sense of it. Two things that I always go back to are queuing theory and the fact that somewhere there is a physical computer running your code.

2. Remove the buzzwords. There are so many buzzwords flying around; it’s often hard to make sense of things. If you strip all the buzzwords away, you’re left either with the reality of the situation or nothing – in which case nothing is being said. You might find that “DevOps enabled” actually means “has a Jenkins plugin.”

3. There is no such thing as magic. It’s a silly thing to say – but for new performance testers and engineers – if it sounds too good to be true, it probably is. And many of the things that can make our lives easier require understanding how they work first – you can’t run without learning to walk.

4. It’s OK to say, “I don’t know.” It’s normal to be worried about looking stupid. I do this too at times, but nothing makes you look more foolish than presenting the wrong information, which leads to bad decisions. There’s nothing wrong with admitting you don’t know the answer; many of the smartest and most senior people I’ve ever worked with were the first to say they didn’t know.

5. Drill deeper. To me, this is the most important lesson. How often have you been in a project where everyone understands most of the solution except this one component? No-one quite knows what it does, how it works, or even who is responsible for it. That’s the thing that is going to cause massive issues. When something is fuzzy and unclear – drill deeper. Keep asking until you find the answers. You’ll often uncover things no-one expected.
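Going back to the first tip, here is one example of the kind of queuing theory fundamental that helps make sense of a system. It is a minimal Python sketch of the textbook single-server (M/M/1) model, where response time is service time divided by one minus utilization. Real systems are rarely this simple, and the service time and arrival rates below are hypothetical.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time for an M/M/1 queue: R = S / (1 - rho),
    where rho = arrival_rate * service_time is the utilization."""
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("utilization >= 100%: the queue grows without bound")
    return service_time / (1 - utilization)


# Hypothetical resource with a 0.1 s service time per request.
for rate in (5, 8, 9):
    r = mm1_response_time(rate, 0.1)
    print(f"{rate} req/s -> utilization {rate * 0.1:.0%}, "
          f"response time {r:.2f} s")
```

The takeaway is the shape of the curve rather than the exact numbers: response time climbs slowly at moderate utilization and then very steeply as the resource approaches saturation.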

I initially wanted to cover two more topics – bullsh*t in our organizations, and bullsh*t in ourselves, but that will have to wait for another time.



Learn More about the Performance Advisory Council

Want to see the full conversation? Check out the presentation here.
