#NeotysPAC – Performance Testing On The Unit Level by Joey Hendricks

 

A huge shoutout and thank you to Neotys for organizing this amazing event! I believe that the PAC is one of the best platforms for performance engineers to share their expertise and insights. It was the very first time I presented a topic at an event of this scale, which was an immensely exciting experience.

Within this blog post, I will walk you through the concept of testing code early and we will see how we could create a type of performance test that fits perfectly into the world of Test-Driven Development.

To give practical examples, I have developed an open-source Python framework that can aid in preventing your beautiful code from turning into a lazy sack of potatoes.

Don’t go Jurassic and test your performance early!

As a millennial in his twenties, it can sometimes be hard to grasp how different our online world looked back in the day. The world of application performance has drastically changed over the past decades, yet sometimes you come across an application running so egregiously slowly that it makes you wonder whether it is powered by a bunch of overweight hamsters running on a wheel.

A poorly performing application will give your customers a horrible digital experience, which can result in decreased trust in your brand, a decline in online interactions, and a tarnished reputation, ultimately leading to lost revenue.

So for a lot of modern organizations with an online presence, bad performance can lead to nightmarish situations. That is why we do performance testing: to verify that applications adhere to their non-functional requirements and do not buckle under peak load.

However, many times when we, the performance engineers, show up at the party, a lot of performance defects have already been introduced into the application. Our testing efforts then usually uncover defects that require optimization and additional changes to the code.

As this backtracking can be time-consuming and expensive, it is best to start validating the performance of the code as early as possible, allowing developers to find potential performance bottlenecks early and deal with them swiftly and in a timely manner. It essentially gives performance engineers room to focus their efforts on helping teams tackle the more complicated performance problems that might pop up in the later stages of the development life cycle.

How to test the performance of code early.

So how do you test the performance of code? Well, most developers would argue that profiling is just that, but what exactly does it do? For a developer, this might be very obvious, but not everybody is familiar with the concept. In a nutshell, profiling is done with the help of a tool to gain insight into the performance of the code. Most of these tools focus on measuring the duration of your code and tell you which parts of the code take the most time.
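To make this concrete, here is a minimal sketch of what profiling looks like in Python using the standard library's cProfile module (plain Python tooling, not QuickPotato; slow_function is just a made-up example):

    import cProfile


    def slow_function():
        # Deliberately wasteful work so the profiler has something to report.
        return sum(i ** 2 for i in range(1_000_000))


    # Run the function under the profiler and print, for every function involved,
    # how often it was called and how much time was spent in it.
    cProfile.run("slow_function()", sort="cumulative")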

This information can be very helpful to developers because it tells them exactly which part of their code is causing the slowdown and is most likely to contain the performance bottleneck. However, this work is usually only done by a developer after a performance problem has been brought to their attention.

So how can we as performance engineers bring these slowdowns to their attention earlier? In my opinion, this can only be done by shifting left and encouraging developers to do more profiling or to create automated performance tests for their code.

When we shift-left with our performance testing efforts, we may have to test against units of software rather than the entire stack. Since it can be unclear how big or small a unit of software is, it is best to roughly outline what constitutes one:

  • a set of procedures or functions, in a procedural or functional language
  • a class and its nested classes, in an object-based or object-oriented language

For functional testing, we are already using a concept called Test-Driven Development to create test cases for a particular feature. These test cases are based on the business requirements and are normally written before the development of a feature begins. The tests then exercise all the relevant units of software that together form the feature and simulate its use, allowing developers to run these predefined tests while writing the code to verify that the feature meets the requirements.
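As a tiny illustration, a TDD-style functional test for a hypothetical digit-counting feature (the same kind of example used later in this post) could look like this; the module and function names are assumptions:

    # test_count_digits.py: a functional unit test written before the feature exists.
    from count_digits import count_digits  # hypothetical unit of software under test


    def test_count_digits_meets_business_requirement():
        # Business requirement: the feature must report how many digits a number contains.
        assert count_digits(4) == 1
        assert count_digits(1234) == 4
        assert count_digits(0) == 1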

Test-driven development mainly focuses on validating the functional aspects of a feature, but it is very well possible to also check whether the performance of a feature meets our expectations. Two examples of what we can achieve with these kinds of performance tests are the following (a rough sketch of such a test follows the list):

  • A regression test that checks if a change has not impacted the performance of the feature.
  • A boundary test that verifies that features, or the units within them, do not breach a set boundary.
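As an illustration of the boundary test (plain Python rather than QuickPotato's actual API; the count_digits unit and the 5-millisecond threshold are assumptions for the example), such a test could time the unit repeatedly and fail when the agreed 95th percentile is breached:

    import statistics
    import time

    from count_digits import count_digits  # hypothetical unit under test


    def measure(function, *args, iterations=100):
        # Collect one response time (in seconds) per iteration.
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            function(*args)
            timings.append(time.perf_counter() - start)
        return timings


    def test_count_digits_stays_within_boundary():
        timings = measure(count_digits, 123456789)
        # statistics.quantiles with n=100 returns 99 cut points; index 94 is the 95th percentile.
        percentile_95 = statistics.quantiles(timings, n=100)[94]
        # Example non-functional requirement: 95th percentile under 5 milliseconds.
        assert percentile_95 < 0.005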

To get this type of testing to work properly, non-functional requirements must be defined for the feature you are developing. This allows your team to include performance in the definition of done and to define test cases for it. It shifts the focus of teams towards thinking about performance more proactively, and this extra testing could lead to more performance defects being found in the early stages of the development life cycle. Another consequence could be that teams spot faulty designs more often and involve performance testing experts earlier in development when facing performance challenges, allowing performance engineers to advise teams on design decisions or point out other potential performance issues.

This entire concept is not without its pitfalls: a lack of proper communication between technical teams and the business side can result in poorly defined performance requirements, or none at all. To avoid this issue, better synchronization between the two parties is necessary, which is why we, as performance engineers, should encourage teams to converse amongst themselves and design concrete examples to formalize a shared understanding of how the feature should perform.

With these efforts, we can help shift a company’s development culture towards becoming more performance-oriented. As soon as companies shift-left with performance testing, their teams will be able to detect potential performance problems much earlier. To save time and effort, it would be helpful to automate these tests, but for that we need a framework that can fail a test when the code runs into performance problems and give key insights into the possible issues, without creating additional problems of its own.

Sorry for the long-running code, here’s a potato

For the meme enthusiasts among you, you have probably stumbled across one of those many long posts on social media where, at the end, there is a picture of a potato with the text “Sorry for the long post, here’s a potato”.

Similarly, I decided to take a page out of the social media etiquette rulebook and started calling everything that was slow or not working properly, you guessed it, a potato.

So when I was helping a friend of mine debug slow code, we kept referring to it as “potato code” because it ran so frustratingly slowly. As we went through the steps of fixing up the project so it would run faster, I started thinking to myself: how could I make this easier to work with? How could I gain insights faster and in a more targeted manner? But first and foremost, how could the performance of code be tested earlier, by way of a concept technically similar to a regular unit test?

From that frustration, the idea for an open-source framework called QuickPotato was born, and I started working on making it a reality. I wanted my new framework to do two things: first, verify whether my code had slowed down; and second, if it had, give me quick insights into what was making my code perform like a lazy sack of potatoes.

So with that as my mission, I set out on my first target and implemented a technique that allows QuickPotato to measure and profile the performance of Python code. I also wanted QuickPotato to fail a test if the code breached a defined boundary on measurements such as the max, min, average, and 95th percentile of the response time.

Besides that, I also wanted QuickPotato to detect a change in performance and fail the test if this change was too significant. So I went with a simple statistical test, Student’s t-test, to check whether there was a difference between the previously executed test and the current test, thereby allowing developers to compare a baseline against a benchmark.
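A minimal sketch of that idea, using SciPy's independent-samples t-test rather than QuickPotato's internals (the function name and the 0.05 significance level are assumptions for the example):

    from scipy import stats


    def has_regressed(baseline_timings, benchmark_timings, alpha=0.05):
        # baseline_timings and benchmark_timings are lists of response times (in seconds)
        # collected from a previous run and the current run respectively.
        t_statistic, p_value = stats.ttest_ind(baseline_timings, benchmark_timings)
        # A p-value below alpha means the two runs differ significantly,
        # which we treat here as a signal of a performance regression.
        return p_value < alpha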

Now with the first target behind me, I started working on the second. Since QuickPotato profiles all the code that it tests, all of the data is available to create powerful visualizations for analysis. Choosing a flame graph to project the performance and the code paths, I finished solving the two problems I initially set out to tackle and gave QuickPotato its first visualization, which developers can use to interpret their code.

Baking potatoes on code flame graphs

One of the strongest visualizations I know of for representing how well code is performing is the flame graph, so it was my first choice. I understand that for some performance engineers the term “flame graph” might be unfamiliar, so let’s quickly highlight what they are.

Flame graphs are a visualization of profiled software, allowing the most frequent code paths to be identified. The example below shows what a flame graph generated with QuickPotato looks like:

Example of a simple flame graph

 

“The code that this flame graph was generated from can be found on my GitHub.”

These graphs might look complicated, but once you understand how to interpret them, they become much clearer and you will see what remarkable insights they can deliver. The following points can help make sense of the flame graph shown above:

  • Each box is a function in the stack
  • The y-axis shows the stack depth, the top box shows what was on the CPU
  • The x-axis does not show time but spans the population and is ordered alphabetically
  • The width of the box shows how long it was on-CPU or as part of a parent function that was on-CPU
  • The colour of a box shows how much time, relative to the maximum, was spent in that function. This means the redder a box is, the more time was spent in that function (only done for graphs generated with QuickPotato).

If you want to learn more about flame graphs, I recommend visiting Brendan Gregg’s website (he is the creator of flame graphs); he has a lot of documentation on this topic that is definitely worth a read.

Turning lazy Python code into an Olympic athlete.

If you have come this far in this article, it would be unfair to leave you without a proper technical demonstration of the concept I have explained so far. That is why I have added the animation below to provide a piece of example code that we can performance test. For more in-depth technical information, I recommend checking out my GitHub.
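The animation itself is not reproduced here, but a minimal stand-in for the kind of function it shows (my own sketch, not necessarily the exact code from the animation) could be:

    def count_digits(number: int) -> int:
        # Count how many digits an integer contains.
        number = abs(number)
        digits = 1
        while number >= 10:
            number //= 10
            digits += 1
        return digits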

“The code above functionally counts the number of digits in a number.”

Wrapping your mind around a complex performance problem is not easy, and setting up everything needed to get a good view of the problem at hand can be time-consuming. When facing a performance problem, you want to quickly pull out meaningful statistics and have them available in an understandable visualization, such as a flame graph.

In the following two examples, I will showcase how you can quickly and easily profile your code and pull out a flame graph, which we can use to understand which components within the code are behaving slowly. Below you can see how you can quickly shape your code so that it can be profiled without much hassle:
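The original snippet is not reproduced here, but the idea can be approximated with a plain profiling decorator (QuickPotato wires this kind of instrumentation up for you, so the decorator below is an illustrative stand-in, not the framework’s real API):

    import cProfile
    import functools
    import pstats


    def profile_performance(function):
        # Illustrative stand-in decorator: profile every call to the decorated
        # function and print the five hottest code paths. QuickPotato provides
        # its own instrumentation; this is not its actual API.
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            result = function(*args, **kwargs)
            profiler.disable()
            pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
            return result
        return wrapper


    @profile_performance
    def count_digits(number: int) -> int:
        return len(str(abs(number)))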

The next step is to generate a visual that shows which function in the code is slowing things down. For this example, we will use a flame graph, which is rendered by adding the following code:
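QuickPotato ships its own flame graph export (see the project’s GitHub for the exact call). To give an idea of what such an export boils down to, here is a hypothetical sketch that writes profiled call stacks in the folded-stacks text format that Brendan Gregg’s flamegraph.pl turns into an SVG:

    def export_folded_stacks(stack_samples, output_path):
        # stack_samples maps a call stack (outermost to innermost function names)
        # to the time spent in it, e.g. {("main", "count_digits"): 0.42}.
        # Each line of the folded format reads "frame;frame;frame <value>".
        with open(output_path, "w") as folded:
            for stack, seconds in stack_samples.items():
                folded.write(";".join(stack) + f" {int(seconds * 1_000_000)}\n")


    # Hypothetical usage: feed the resulting file to flamegraph.pl to render the SVG.
    export_folded_stacks({("main", "count_digits"): 0.42}, "profile.folded")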

Once this improvised test is executed, it will generate a flame graph that looks like the example below:

With this information, a developer can start hunting for the root cause of the performance bottleneck. It is also possible to create a test that detects whether the performance of the code has been impacted, which can be done in the following way:
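Sketching the same idea with plain Python and SciPy rather than QuickPotato’s real interface (the count_digits unit, the baseline_timings.json file, and the 0.05 significance level are assumptions for the example):

    import json
    import time

    from scipy import stats

    from count_digits import count_digits  # hypothetical unit under test


    def collect_timings(function, *args, iterations=100):
        # Collect one response time (in seconds) per iteration.
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            function(*args)
            timings.append(time.perf_counter() - start)
        return timings


    def test_count_digits_has_not_regressed():
        # Response times recorded during an earlier, accepted baseline run.
        with open("baseline_timings.json") as baseline_file:
            baseline = json.load(baseline_file)

        benchmark = collect_timings(count_digits, 123456789)

        # Fail the test when the benchmark differs significantly from the baseline.
        _, p_value = stats.ttest_ind(baseline, benchmark)
        assert p_value >= 0.05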

As you can imagine, it would be fairly easy to adopt these tests into a TDD testing framework, which we can use to detect a change in the performance of code. If these tests fail, it would be great if we could also automatically generate a flame graph to analyze the profiled results. Preferably, we would want a comparison between what the performance was before and after the change.

Within QuickPotato, we can compare two flame graphs, baseline vs. benchmark, allowing us to quickly spot a significant difference in the behaviour of our code.

The test executed in the animation above fails because we have added a sleep function that slows our code down significantly compared to the baseline; as a result, the test fails and a flame graph comparison is rendered.

With the bottleneck now visualized in the profiled results, it is easy to quickly remediate the mistake and remove the sleep function. Because we have total flexibility within a “unit performance test”, we can specify exactly what kind of usage we want to simulate, allowing us to create handy performance tests that check our code often and early for potential performance defects.

Final word

I truly love the idea of testing the performance of units of software as early as possible, and I believe this kind of early performance testing can be very beneficial to the overall speed, stability, and quality of an application as a whole. I had a lot of fun with this topic and intend to continue improving QuickPotato, so it can become a more useful and powerful framework that helps Python projects become lightning fast.

If this blog post has piqued your interest and you want to learn more about the technical magic behind QuickPotato, or want to help improve the project with your skills, I recommend visiting my GitHub to check out the project, or getting in touch with me through LinkedIn. I am always open to technical questions and helpful suggestions to make this open-source project a success.

 

If you want to know more about Joey’s presentation, the recording is already available here.
