#NeotysPAC – Performance — Raw Data: The #1 Superfood for a Performance Tester, by Stijn Schepers

Introduction

I was stoked that Neotys organized this year's Performance Advisory Council (PAC) with the theme JURASSIC. The idea behind this JURASSIC PAC is to go back to basics and explain the "one lesson" that was the most valuable throughout your career. I do believe that mentorship is the most efficient way to gain knowledge, and every senior performance engineer has a moral duty to share knowledge with peers getting into performance engineering. The lesson that I want to share is that a performance tester should be looking at raw data when analyzing the results of a load test. Raw data is the "superfood for a performance engineer." This lesson is one of the tips I described in the following LinkedIn article: 25 TIPS FOR A PERFORMANCE ENGINEER | LinkedIn

What is raw data?

For a performance engineer, raw data is the measured response time of every single request, as opposed to averages and aggregates.

For a load test tool, the ability to get to the raw data is a crucial feature. Some of the cloud-based load test tools only provide averages. The reason is that a load test can generate a lot of raw data, and by aggregating it during the run, the software provider can reduce the amount of data collected. This is a huge issue when you want to analyze your results properly, solve system bottlenecks, or tune an application: you really need to look at the raw data and analyze the patterns. Software providers of load test tools should consider storing the collected raw data and performance measurements in one central data repository that is open for engineers to explore.

Why is raw data the superfood for a performance engineer?

The mission of a performance engineer is to provide end users with a great digital experience. Therefore, it is crucial that a performance engineer is able to analyze response times at the finest granularity. Averages hide system bottlenecks and smooth out imperfections. Patterns in the raw data show you the "ugliness" of an application, and you need to see this ugliness to decide whether tuning and optimization are required, which depends on the business requirements.

Same stats, different graphs

While preparing for the NeotysPAC, I came across an article describing the Datasaurus. Brilliant! I needed dinosaurs when talking at the "JURASSIC PAC." I downloaded the dataset and visualized the data in Tableau, a business intelligence tool I have been using for many years to explore data. The data provides us with 13 different figures. You can see 12 of these figures in the image below.

All these figures have the same statistical summary. Is this not amazing?! Thirteen sets with the same statistical summary but with totally different patterns. The Datasaurus illustrates the weakness of statistical summaries and why patterns based on raw data are so important for a performance engineer. Driving a CI/CD pipeline based on (just) an average response time is asking for trouble.
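You can check this yourself outside Tableau. Below is a minimal Python sketch, assuming you have downloaded the Datasaurus Dozen as a tab-separated file with columns dataset, x, and y (the file name is an assumption):

    # Verify that every Datasaurus figure shares the same summary statistics.
    import pandas as pd

    df = pd.read_csv("DatasaurusDozen.tsv", sep="\t")  # columns: dataset, x, y

    # Mean and standard deviation of x and y, per figure.
    stats = df.groupby("dataset").agg(
        x_mean=("x", "mean"),
        y_mean=("y", "mean"),
        x_std=("x", "std"),
        y_std=("y", "std"),
    ).round(2)
    print(stats)  # every row shows (nearly) identical values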

Practical examples of result analysis based on raw data

Banding and performance spikes are two common patterns I have seen many times when analyzing raw data. In both cases, averages alone are not sufficient to understand the application's performance.

Example 1: Banding

In the figures below you can see the results of a load test. The response time (y-axis) is displayed against the duration of the test (x-axis). The first graph is based on averages; the second graph is based on the raw data. Looking at the first graph, we see a stable response time with an average of 213 ms. However, when we look at the second graph displaying the raw data, we see a different view. We can clearly see one group of requests with a response time of a few milliseconds, a second group at around 400 ms, and a third, smaller group at around 1,400 ms. Only by looking at the raw data can you clearly see this pattern of "banding." A performance engineer needs to see these patterns to be able to make correct decisions and tell stakeholders the true story of application performance.
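To make the point concrete, here is a tiny Python sketch with made-up numbers that mimic this test: three bands of requests whose combined average is 213 ms, a value that almost no individual request actually had.

    # Made-up numbers for illustration: three response-time bands.
    fast = [5] * 600      # 600 requests at ~5 ms
    medium = [400] * 350  # 350 requests at ~400 ms
    slow = [1400] * 50    # 50 requests at ~1,400 ms

    samples = fast + medium + slow
    average = sum(samples) / len(samples)
    print(f"average = {average:.0f} ms")  # 213 ms, yet no request took 213 ms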

Example 2: Spikes

The second example displays the response times of a load test in which "performance spikes" occurred. A spike is a sudden, significant build-up of response times. The figure with the averages does not show these spikes; the raw data shows spikes with maximum response times of around 10 seconds. You need to see this behavior to decide whether tuning is required. Spikes are quite common. For example, major garbage collections in Java applications can cause "stop-the-world" events, situations in which the JVM pauses application processing while it performs garbage collection. If you want to learn more about stop-the-world events, the following blog post is a great start: https://www.dynatrace.com/news/blog/major-gcs-separating-myth-from-reality/

How do I get the raw data?

Now that we know that raw data is superfood for a performance engineer, how do we actually get this raw data? And what do we do with it? How do we make cool visualizations that tell a story?

Step 1: Export the raw data

Below we describe how to export raw data with the NeoLoad GUI. The same is possible with other tools like JMeter or LoadRunner; consult the documentation of your toolset to understand how to retrieve the raw data. For JMeter, Stephen Townshend's blog post is a must-read: #NeotysPAC – Tableausaurus Rex: Analysing JMeter Results Using Tableau Desktop.

After a load test has finished in NeoLoad, go to the "Results" section and the "Value" tab. Here you can click on the Export button to export the raw data (Raw results) of all transactions. With NeoLoad, you are limited to retrieving the raw data at the transaction level; transactions are defined in the Design section of a User Path.

Choose where you want the results to be exported and which column separator to use. Save the file as a .csv file.

The exported file will look like this:
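(The original screenshot is not reproduced here. As a purely hypothetical illustration, the export contains one row per measured transaction; the exact columns depend on your NeoLoad version and export settings, but it will be along these lines:)

    Time;Element;Response Time;Success
    2021-06-15 10:03:01;Login;0.213;yes
    2021-06-15 10:03:02;Search;0.405;yes
    2021-06-15 10:03:03;Checkout;1.397;no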

Step 2: Add extra information to the exported data

It is quite common to add an extra column, "RunID," to the exported raw data. A RunID identifies every single test execution: the first load test gets RID001, the second RID002, and so on. A RunID is very useful when you combine all your results into one centralized "global results" file, as it gives you a column to filter your results on.
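A minimal Python sketch of this step, assuming a semicolon-separated export (the file names are hypothetical):

    # Read the NeoLoad export and write a copy with an extra RunID column.
    import csv

    run_id = "RID001"
    with open("export.csv", newline="") as src, \
         open("export_with_runid.csv", "w", newline="") as dst:
        reader = csv.reader(src, delimiter=";")
        writer = csv.writer(dst, delimiter=";")
        writer.writerow(next(reader) + ["RunID"])  # extend the header
        for row in reader:
            writer.writerow(row + [run_id])        # tag every measurement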

Step 3: Import the raw data in Tableau

We use Tableau as a business intelligence tool to visualize the data. You can download Tableau here. There is a 14-day free trial, which is enough to have a decent play with the tool. After you have installed Tableau, open it and connect to a Text file.

You will get a screen that looks like the one below.

Click on “Sheet 1” to open your first worksheet:

The sheet will look like this:

Change Time to a “Date & Time” data type:

The icon will change from string (Abc) to DateTime:

Step 4: Visualize the results (raw data) of a load test

Graph 1: Performance over Time

Right-click on Time, drag and drop the field into the Columns section, and choose Time (Continuous).

Right-click on Response Time, drop the field on Rows, and choose "Response Time."

Great! Now you have created your first basic graph with Tableau: “The response time over time.”
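If you want to reproduce this first graph in code instead of Tableau, a minimal matplotlib sketch would look like the following (the file and column names are assumptions based on the export described above):

    # Scatter plot of raw response times over the duration of the test.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("export_with_runid.csv", sep=";", parse_dates=["Time"])
    plt.scatter(df["Time"], df["Response Time"], s=2)
    plt.xlabel("Time")
    plt.ylabel("Response Time")
    plt.title("Response time over time (raw data)")
    plt.show()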

You can add more dimensions to the graph by using, for example, Color. Drag and drop the Element field on Color.

Different elements (transactions) will get different colors. Drag and drop the Success field on Shape to visualize the failed transactions:

Dimensions are very powerful in Tableau and are extremely useful when creating customized graphs detailing the outcome of a load test. You want these customized graphs to tell a story to your stakeholders. Adding “Reference Lines” can help to bring this message across.

Often we add a reference line to the results. To do so, click on the y-axis "Response Time" and select Add Reference Line. We are going to add a reference line showing the average response time.

The format of the average can easily be changed by right-clicking on Average, selecting Format, and changing the value to a custom number with, for example, 1 decimal place.

Another useful reference line is the 95th percentile. Choose Distribution, and as the Value select the percentile you want to visualize.
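Percentiles are themselves a good reason to keep the raw data: a correct 95th percentile cannot be derived from pre-aggregated averages. A quick cross-check in Python (file and column names are assumptions):

    # Compute the 95th percentile straight from the raw measurements.
    import pandas as pd

    df = pd.read_csv("export_with_runid.csv", sep=";")
    print(df["Response Time"].quantile(0.95))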

Filters can be used to drill down into specific data. Here we will add two filters: one on Time and one on Response Time. To do so, drag and drop the Time field on Filters and select Relative Date. Click Next and select Range of Dates. For Response Time, right-click on the field, drag and drop it on Filters, and select "All values" and "Range of Values."

For both filters, right-clicking and selecting Show Filter will provide you with two sliders that can be used to drill down into specific data areas.

You can use the sliders to zoom in and focus on specific data and patterns. This can be very useful when you analyze system bottlenecks (banding, spikes, etc.):

Note: it is best to give the sheets meaningful names. Double-click on "Sheet 1" and change the name to Response Time.

Graph 2: Throughput

To create a second graph detailing the throughput, select Worksheet, New Worksheet.

Drag and drop the field Time on Columns like we did with the Response Time graph.

Drag and drop the field “Element” on Rows and select CNT(Element).

To display the throughput in Calls / Minute, change the Time to Minute:

Add a Reference Line which displays the maximum throughput.

Edit the names of the x- and y-axes to something more meaningful and rename the sheet to Throughput. This will provide you with the following figure:
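If you want to sanity-check the throughput numbers outside Tableau, a minimal pandas sketch could look like this (file and column names are assumptions):

    # Count calls per minute from the raw data.
    import pandas as pd

    df = pd.read_csv("export_with_runid.csv", sep=";", parse_dates=["Time"])
    per_minute = df.set_index("Time").resample("1min")["Element"].count()
    print(per_minute.max())  # peak throughput in calls/minute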

It can be very useful to combine two or more worksheets into one dashboard.

To do so, select Dashboard, New Dashboard and choose the sheets to be part of the dashboard. We will add both the Response Time and the Throughput sheet.

Step 5: Visualize the results (raw data) of multiple load tests

We have shown you how to analyze one result file. By combining different result files into one global results file, you can use Tableau to easily compare different test execution runs.

In this example, we have three different test execution runs. We have modified the data exports and added the extra RunID column. We have also created a merge.bat file which creates a new globalresults.txt file: one .txt file with the combined data of the three exports.
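The exact contents of the merge.bat are not shown here; as an illustration, the same merge in Python could look like this (file names are assumptions), keeping only the header of the first file:

    # Combine several exports into one global results file.
    files = ["RID001.csv", "RID002.csv", "RID003.csv"]

    with open("globalresults.txt", "w") as out:
        for i, name in enumerate(files):
            with open(name) as f:
                header = f.readline()
                if i == 0:
                    out.write(header)  # write the header once
                out.writelines(f)      # append all measurement rows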

Use the Tableau worksheet you created in the previous steps and change the data source to point to globalresults.txt. Errors will appear, which you can close: they occur because the Time field in globalresults.txt is in string format and needs to be re-adjusted.

In this example, we have two measures in Columns that are not working properly and need to be adjusted. This is because the Time field needs to be changed back to the Date & Time data type.

The figure below displays the three execution runs.

Adding a filter on RunID provides you with the ability to look at one or more specific result sets.

Note that you can make the filters on a sheet apply "globally" to all sheets using the same data source. This is useful for dashboards that combine different Tableau worksheets.

Add a Tableau calculated field: Offset Time.

With an offset time, it is easier to compare different runs while still being able to use Tableau's datetime functions: every run is rebased to start at the same point in time.

To create the Offset Time, right-click on Time and click Create, Calculated Field.

The calculated field "Offset Time" subtracts the earliest timestamp of each run from every measurement: DATETIME([Time] - {FIXED [RunID] : MIN([Time])})
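For reference, the equivalent rebasing in pandas (a sketch, with the same assumed column names as before) subtracts each run's earliest timestamp from every measurement:

    # Rebase every run so that it starts at offset zero.
    import pandas as pd

    df = pd.read_csv("globalresults.txt", sep=";", parse_dates=["Time"])
    # The result is a timedelta per measurement, relative to the run's start.
    df["Offset Time"] = df["Time"] - df.groupby("RunID")["Time"].transform("min")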

Drag and drop Offset Time on Columns and RunID on Rows. The figure below displays the difference between Time and Offset Time.

Using the offset time provides you with a date-time field that is extremely handy for comparisons. You can add the original Time field on Detail to see at exactly which time a measurement was taken.

If we go back to the Tableau dashboard with Throughput and Response Time, we can now get a lot more detail out of the results than we would by looking at the same data in a load test tool.

Adding more execution runs

You can keep on extending the globalresults.txt file with additional execution runs. Let's say you have done an extra execution run which you want to add to globalresults.txt.

Just re-create the global results file by executing the .bat file after you have deleted the old globalresults.txt.

Assuming this file is the Tableau data source, refresh the data (F5) and RID004 will show up. Easy!

Conclusion

Raw data is superfood for a performance engineer. It provides the engineer with the finest granularity of data, which is needed to detect bottlenecks and solve performance and stability issues quickly. Tools like Tableau are very powerful for visualizing the results of a performance test based on raw data, and they help an engineer bring a message across to stakeholders. My hope is that at least one person reading this blog post will be intrigued by raw data and adjust the way they analyze performance testing metrics. This new way of analyzing data will lift their career and make the world of bits and bytes better for the end user. And in the end, that's why we do performance engineering!

More reading – I love raw data

Great! Convinced that raw data is what a performance engineer needs to detect bottlenecks and improve the end-user experience? Here are some great articles and videos that may be worth reading or watching:

Equinox IT finds insight in millions of rows of Telco data (tableau.com)

#NeotysPAC – Performance Testing is not an Average Game, by Stijn Schepers

#NeotysPAC – Tableausaurus Rex: Analysing JMeter results using Tableau Desktop, by Stephen Townshend

How to get a better understanding of your performance test results | LinkedIn

www.performance-workshop.org/wp/wp-content/uploads/2013/12/Chasing_Tornadoes_Davies.pdf (performance-workshop.org)

Download: PAC DataFiles and Tableau Workbook

If you want to know more about Stijn’s presentation, the recording is already available here.
