Data analysis is a big part of performance testing, load testing, and performance engineering, whether you are digging through load testing tool results or application logs to troubleshoot a bottleneck. Data provides critical insight into system behavior, and every performance engineer should be equipped with the skills and tools to analyze data irrespective of its format or where it came from.
Most often, data analysis is exploratory. In my Neotys Virtual PAC presentation, I talked about how the process of analyzing load test results can change from one test to the next. As illustrated in the picture below, analysis is a very fluid process.
For this reason, we require not mere dashboarding tools but tools that support the exploratory analysis of data.
I have been analyzing data for many years now. Early in my career, like most others, I started by looking at the tables and charts presented by load testing tools and worked with spreadsheets to parse and visualize log files. That process was often limited and time-consuming.
When I was introduced to Tableau, I realized how accessible, and even fun, data analysis could be. I have been using Tableau ever since; it is robust and interactive.
But the obvious downside is the cost: it is a paid tool. So I started looking at open source options that could achieve a similar outcome, not just a dashboarding platform, but a tool that supports every aspect of data analysis (preparation, manipulation, exploratory analysis, and visualization).
As discussed in my PAC talk:
- R is an open-source programming language, so it's free, flexible, and easy to learn.
- Because it's a programming language, you can easily reuse code and enhance it further with the help of configuration files.
- With R's recent surge in popularity, there are plenty of active online communities if you need support.
- Being a programming language developed specifically for data analysis, it comes with many handy visualization and statistical libraries.
More about R
R is not just another reporting platform. It is a powerful programming language with a massive collection of libraries that support all aspects of data analysis. While R can be used in any domain, in my presentation I talked about some of the data analysis challenges specific to performance testing and showed ways to address them.
The data we work with comes in a variety of formats, such as plain text files, Excel, CSV, or databases (among others). R has a wide range of packages available on CRAN, such as readxl for Excel files and RODBC for relational databases, alongside base functions like read.csv for delimited files.
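As a minimal sketch of how ingestion looks in practice (the file and column names here are hypothetical, and a sample file is generated so the example is self-contained; readxl::read_excel("results.xlsx") works analogously for Excel):

```r
# Sketch: reading a delimited results file with base R's read.csv.
# The file content is hypothetical sample data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("label,elapsed",
             "login,250",
             "search,1320"), csv_path)

results <- read.csv(csv_path, stringsAsFactors = FALSE)
str(results)  # a data frame with a label column and an elapsed column
```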
One thing to note here: by default, R loads all the data into memory, so you might need to increase the memory allocated to R when working with large volumes of data. If your data lives in a database, you can limit how much of it R has to deal with through the SQL query you issue.
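A sketch of that idea, assuming the DBI and RSQLite packages are installed (the `requests` table and its columns are hypothetical; an in-memory database stands in for a real one):

```r
# Sketch: push filtering into SQL so only the rows you need are loaded
# into R's memory. DBI/RSQLite and the table contents are assumptions.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "requests",
             data.frame(code = c(200, 200, 500),
                        elapsed = c(120, 90, 3000)))

# Only the error rows cross the database boundary into R.
errors <- dbGetQuery(con, "SELECT * FROM requests WHERE code >= 400")
dbDisconnect(con)
```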
In my presentation, I introduced essential functions and libraries in R that solve common problems we encounter, such as:
- Renaming a column into something meaningful
- Converting timings from milliseconds or microseconds to seconds for consistency
- Converting server timestamps (UTC or epoch) to a consistent time zone
- Handling missing values in the datasets
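The four cleanup steps above can be sketched in base R on a hypothetical results data frame (the column names and the choice of replacing missing timings with 0 are assumptions for illustration):

```r
# Hypothetical raw results: epoch timestamps and timings in milliseconds.
df <- data.frame(t = c(1600000000, 1600000060),
                 elapsed_ms = c(250, NA))

names(df)[names(df) == "elapsed_ms"] <- "elapsed"  # rename to something meaningful
df$elapsed <- df$elapsed / 1000                    # milliseconds -> seconds
df$timestamp <- as.POSIXct(df$t, origin = "1970-01-01", tz = "UTC")  # epoch -> UTC
df$elapsed[is.na(df$elapsed)] <- 0                 # handle missing values
```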
R allows filtering the dataset conditionally, adding a new column populated based on specific criteria, and much more. For instance, if you want to restrict the dataset to specific error codes or a particular time duration, you can do that easily in R.
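For instance, filtering by error code and deriving a pass/fail column from a threshold can be sketched like this (the data frame and the 2-second SLA are hypothetical):

```r
# Hypothetical response data: HTTP code and elapsed time in seconds.
df <- data.frame(code = c(200, 500, 200),
                 elapsed = c(1.2, 3.5, 2.8))

errors <- subset(df, code >= 400)                   # conditional filter
df$sla <- ifelse(df$elapsed <= 2, "pass", "fail")   # new derived column
```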
Quite often, we need to convert our data between long and wide formats, especially when processing business data to construct a workload model.
For example, imagine you are analyzing the number of work orders from an ERP system. The data might look something like this:
For our analysis, it makes more sense to have the data in a "long" format. For cases like these, the reshape package in R is a perfect fit:
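As a dependency-free sketch of the same wide-to-long pivot, base R's reshape() works too (the work-order columns here are hypothetical, since the original example data isn't reproduced above):

```r
# Hypothetical wide-format work orders: one column per order type.
wide <- data.frame(month   = c("Jan", "Feb"),
                   created = c(100, 120),
                   closed  = c(80, 90))

# Pivot to long format: one row per (month, order_type) pair.
long <- reshape(wide, direction = "long",
                varying = c("created", "closed"),
                v.names = "orders",
                timevar = "order_type",
                times   = c("created", "closed"),
                idvar   = "month")
```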
Visual Data Exploration
Analyzing large volumes of data is easier when we can visualize it, because our eyes quickly pick up common patterns. ggplot2 and plotly are two of the standard visualization libraries available in R.
The following scatter plot was built with plotly in RStudio. Fortunately, plotly provides some of the interactive features I am used to in Tableau, such as filtering, hovering to see data values, and zooming in and out. This comes in handy when exploring the data.
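A minimal sketch of such a scatter plot, assuming the plotly package is installed (the response-time data frame and column names are hypothetical):

```r
# Sketch: interactive scatter plot with plotly. Hover and zoom come for
# free when the object is rendered in RStudio or a browser.
library(plotly)

df <- data.frame(threads = c(1, 5, 1, 10),
                 elapsed = c(120, 340, 90, 1500),
                 label   = c("login", "search", "login", "checkout"))

p <- plot_ly(df, x = ~threads, y = ~elapsed, color = ~label,
             type = "scatter", mode = "markers")
# p  # printing the object opens the interactive plot
```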
You can play with color, size, shape, or opacity to add another dimension of your dataset to the analysis. Please check my video presentation if you would like to see R in action.
R is known for producing rich graphs that can be used to communicate performance test results with your stakeholders.
This is an example graph created with R that cleanly communicates the pass/fail ratio of all the APIs included in the load test.
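A chart along those lines can be sketched with base R graphics alone (the API names and result counts below are hypothetical, not the data behind the original graph):

```r
# Sketch: stacked pass/fail bar chart per API using base graphics.
results <- data.frame(
  api    = c("login", "login", "search", "search", "search"),
  status = c("pass", "fail", "pass", "pass", "fail"))

counts <- table(results$status, results$api)  # status x API count matrix
barplot(counts, legend.text = TRUE,
        main = "Pass/fail by API", ylab = "Requests")
```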
Taking it a step further, you can build interactive dashboards in R using Shiny. Shiny is an open-source R package that provides a robust web framework for building web applications in R without requiring knowledge of HTML, CSS, or JavaScript.
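A minimal Shiny sketch, assuming the shiny package is installed (the slider and histogram are hypothetical placeholders for a real results dashboard):

```r
# Sketch: the smallest useful Shiny app — a UI definition, a server
# function, and an app object that runApp() would serve locally.
library(shiny)

ui <- fluidPage(
  sliderInput("n", "Number of samples", min = 10, max = 100, value = 50),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot(hist(rnorm(input$n)))
}

app <- shinyApp(ui, server)
# runApp(app)  # serves the dashboard in a browser
```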
Overall, my experience with learning and using R has been positive so far. I am quite keen to explore it further and look for opportunities to fit R in the automated performance testing context.
Learn More about the Performance Advisory Council
Want to see the full conversation? Check out the presentation here.