In today’s world, speed is vital to all types of businesses, which pushes many enterprises to adopt DevOps to achieve two critical objectives: increasing the number of code releases and shortening the time between them. The ability to release a new feature before others gives a company a critical advantage over its competitors and grows its market share. DevOps brings together two teams, development and operations, offering higher flexibility and faster deployment times. This article discusses the benefits of DevOps, the importance of performance engineering, and the way forward for automated performance testing in a DevOps world.
Benefits of DevOps and Microservices
Companies adopt DevOps for several reasons. One is faster deployment to production by merging the Development and Operations teams: applications become available in a short period, giving enterprises the ability to innovate faster than their competitors. By adopting shorter development cycles, they also reduce implementation failures, and long six-month deployment periods belong to the past. Shorter cycles allow code defects to be identified and fixed quickly, and implementation failures drop because teams follow agile programming principles. With DevOps, organizations become more engaged, creating bonds and a good team spirit.
They work towards common goals and, as a result, move away from a finger-pointing mentality. When Developers and Operations feel they belong to the same team, they prioritise team goals over personal ones. Finally, DevOps embraces automation and microservices, making it easier to deploy faster and reduce IT costs, since automation takes over many of the tedious manual tasks that deployments require. One of the most successful DevOps practices is implementing a microservices architecture, which reinforces the DevOps methodology. In this article, we discuss the performance challenges of moving to microservices and how automation in performance should be implemented.
What happens to performance when moving to DevOps with microservices architecture? Is performance engineering still necessary?
A lot of DevOps evangelists insist that moving to DevOps with a microservices architecture eliminates the need for performance engineering. They claim that, with infrastructure and coding simplified, developers can do performance engineering themselves, as complexity is reduced. Another claim is that a microservices architecture lets teams deploy to production and roll back changes in the event of any critical performance issue, rendering performance risks non-critical. On the contrary, a faster rate of change means higher performance risk, making performance in DevOps vital for the business. Additionally, companies are moving to the cloud and microservices gradually, which adds another threat to performance: until the move is complete, there is extra latency, as applications still must talk to the monolith, which is mostly deployed on-premise.
It must be acknowledged that the cloud is not a panacea that scales up and down whenever we want; scaling up comes at a high cost. Any changes cloud providers make to their components, such as load balancers or virtual machines, can impact the performance of your applications. Most cloud migration journeys surface surprising performance issues. Breaking down the monolith only creates more risk for the company: until the journey is complete, there is a hybrid application, making life harder for developers and performance engineers. Meanwhile, competitors keep improving their performance, and we all understand the importance of a well-performing application to the end user; companies lose millions for every second of slowness in their applications. Companies need to aim for stability during peaks, as change and fast development rates come with significant risk.
Having your application offline on a peak day can not only affect your revenue but can also lead to your customers moving to competitors. The modern era is all about speed; you only have a few seconds to order the new costume for tonight’s party when you are in the underground before Wi-Fi gets lost, and the impulse has faded. In conclusion, the risk of change, the need for having a competitive performance to move ahead of competitors, the need for stability during peaks, and the dangers created when migrating to the cloud, make performance testing a necessity for all modern enterprises.
How do infrastructure changes affect performance?
While breaking the monolith down into microservices, there is a transitional period during which applications include both the monolith and the new microservices. There are three architectural stages in this transition: initially the monolith, then the monolith alongside microservices, and finally, once the transition is complete, a pure microservices architecture. These architectural changes affect the four activities that all performance engineers handle: planning, scripting, running, and analyzing.
Planning, the initial phase of a performance engineer’s work, becomes shorter, as it is easier to plan a test for one small service than for the whole monolith. Although planning is simplified, it must be repeated for every microservice in the application, creating the need for more, but less complicated, performance engineering. Scripting becomes a more straightforward task, as scripts are now much shorter and less complex, calling only a few URLs (Uniform Resource Locators) or APIs (Application Programming Interfaces), simplifying the life of performance engineers. Running scripts is mostly done through the pipelines; the engineer must decide how long a pipeline test should run so that performance risks are mitigated.
The challenge for performance engineers arises when developers want their pipelines to finish quickly, so they can deploy as soon as possible. Performance engineers should set the length of their tests so that all risks are mitigated, despite the development team’s objections. In cooperation with the business and the development teams, they must also decide whether a failing test should block the release or only raise a warning. Later in this article, we focus on two activities that can be automated in DevOps to help performance engineers manage their increased workload. These tedious but highly important activities are scripting and analyzing.
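As an illustration, the block-or-warn decision can be encoded as a tiny gating rule in the pipeline. This is a hypothetical sketch: the policy names and the breach count are assumptions for illustration, not part of any specific CI tool.

```python
def gate_release(kpi_breaches: int, policy: str = "warn") -> str:
    """Return the pipeline action for a performance test result.

    The policy is agreed between performance engineers, the business,
    and the development teams: "block" stops the deployment, while
    "warn" lets it through with a flag for follow-up.
    """
    if kpi_breaches == 0:
        return "pass"
    return "block" if policy == "block" else "warn"

print(gate_release(0))           # pass
print(gate_release(3, "block"))  # block
print(gate_release(1, "warn"))   # warn
```

The point of making this explicit is that the same test result can have different pipeline consequences per service, depending on its agreed risk level.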
What is the challenge of performance in DevOps?
Many enterprises have increased the frequency of application deployments through Agile and DevOps best practices. With increased velocity comes the risk of quality issues, particularly performance bottlenecks, creeping into your applications unnoticed. The complexity of scripts is reduced, but their volume increases sharply as the number of microservices grows. Analysis becomes more involved as services talk to each other, making it harder to pinpoint which service is the bottleneck. The challenge for performance engineers is to keep up with the speed of development while ensuring good performance for their application.
What is good performance?
Before we move on to how automation should work in DevOps, we need to understand what performance is. Most people think performance is just throughput and response times, but it is much more than a single straightforward measure. A comprehensive way of measuring performance is the Capacitas 7 Pillars of Performance, shown in Figure 1. If any of the pillars fail, overall performance and the user experience are impacted, and the cost of supporting the service increases substantially.
Figure 1. Capacitas’ 7 Pillars of Performance
How should the analysis be automated?
Figure 2. A workflow of the 5 steps involved in automated analysis
Step 1 – Metrics Framework
Initially, a service metrics framework should be defined, always keeping the seven pillars of performance in mind. Our metrics should fall into three main categories: business, service, and component metrics, following Information Technology Infrastructure Library (ITIL) best practices.
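As a minimal sketch, such a framework can be captured as simple structured data. The metric names below are illustrative examples, not a standard; only the three ITIL-aligned categories come from the text above.

```python
# Illustrative metrics framework: KPI names grouped into the three
# ITIL-aligned categories (business, service, component).
METRICS_FRAMEWORK = {
    "business": ["orders_per_minute", "checkout_conversion_rate"],
    "service": ["response_time_p95_ms", "throughput_rps", "error_rate_pct"],
    "component": ["cpu_utilisation_pct", "memory_mb", "db_lock_waits"],
}

def all_metrics(framework: dict) -> list:
    """Flatten the framework into a single list of KPI names,
    ready to be fed into the threshold and baseline checks."""
    return [m for metrics in framework.values() for m in metrics]
```

Keeping the framework as data rather than code makes it easy to version alongside the application and reuse across microservices.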
Step 2 – Threshold Comparison
Figure 3. Examples of Threshold Comparison
Once we know the metrics framework, our next step is to analyze our tests, comparing the key performance indicators (KPIs) defined in the framework against thresholds. Setting thresholds requires in-depth knowledge of both performance engineering and our application. We aim for realistic limits so that application performance can be assessed against all pillars of performance. Once the boundaries are defined, we need to create the right automation logic to answer the following questions:
- Does the metric breach its threshold?
- How many times does it breach the threshold?
- How many of the metrics defined in our framework breach their thresholds?
This resembles the process a performance engineer would follow when analyzing tests manually. Following the threshold analysis, the engine should give the test a pass or fail status depending on the results of the comparisons, just as a performance engineer would.
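The three questions above can be answered with simple comparison logic. This is a hedged sketch, assuming each KPI is a list of numeric samples and each threshold is a single upper limit; real engines may need per-metric directions and units.

```python
def threshold_check(results: dict, thresholds: dict) -> dict:
    """Compare measured KPI samples against their thresholds.

    Answers: does a metric breach its threshold, how many samples
    breach it, and how many metrics breach overall.
    """
    breaches = {}
    for metric, samples in results.items():
        limit = thresholds.get(metric)
        if limit is None:
            continue  # metric not in the framework's threshold set
        over = [s for s in samples if s > limit]
        if over:
            breaches[metric] = len(over)
    return {
        "status": "fail" if breaches else "pass",
        "breaches": breaches,
        "metrics_breached": len(breaches),
    }
```

For example, p95 response-time samples of 180, 210, and 260 ms against a 200 ms threshold would yield two breaching samples and one breached metric, so the engine would mark the test as failed.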
Step 3 – Baseline Comparison
Figure 4. Example of Baseline Comparison with delta between the baseline and the regression response times
Following the threshold analysis, automation should include a baseline comparison. Baseline comparison is vital to understanding whether a performance test passes or fails: previous baselines are the key to knowing whether the latest code deployment has affected overall performance. The baseline comparison should answer the following questions:
- What are the changes over the last run?
- What are the changes over the last release?
- What are the changes over previous releases?
- What is the trend over time?
At this point, it is imperative to rule out any baselines that come from invalid runs, as the engine cannot deduce the right status when comparing against them. The engine should then rule a pass or a fail depending on the above conditions.
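A minimal sketch of such a baseline comparison, assuming the KPI is a single number per run (for instance p95 response time in ms), baselines are ordered oldest first, and invalid runs have already been excluded. The 10% tolerance is an assumption to be calibrated per service.

```python
def baseline_comparison(current: float, baselines: list,
                        tolerance_pct: float = 10.0) -> dict:
    """Compare the current run's KPI against valid previous baselines.

    Reports the delta versus the last run and a simple trend over
    time (average change between consecutive baseline runs).
    """
    last_run = baselines[-1]
    delta_pct = (current - last_run) / last_run * 100
    steps = [b - a for a, b in zip(baselines, baselines[1:])]
    trend = sum(steps) / len(steps) if steps else 0.0
    return {
        "delta_vs_last_run_pct": round(delta_pct, 1),
        "trend_per_run": round(trend, 2),
        "status": "fail" if delta_pct > tolerance_pct else "pass",
    }
```

The same comparison can be run against the last release and against older releases simply by choosing which baselines are passed in.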
Step 4 – Pattern Analysis
Pattern analysis is the final analysis step of the process. Our engine should analyze specific patterns to conclude whether the performance test was successful. Depending on our tests, there are three types of metrics: some will be ramping, others will be zero, and others will be flat but non-zero, depending on what they represent. For example, in a ramp load test, we expect metrics like Processor % Utilization, throughput, and received/sent KB/sec to increase gradually, following the load profile set up for the test. Other metrics, like response times, should remain flat: they should stay relatively stable during the test (see Figure 5). Small fluctuation is normal, but there should not be an upward trend. Finally, metrics like contention rate and locks are expected to stay close to zero, as resource locks reveal that our application is not performing well.
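The three expected shapes can be checked with a simple classifier. This is an illustrative sketch; the tolerance values are assumptions that would need calibrating per application, and real engines typically use more robust statistics.

```python
def classify_pattern(samples: list, zero_eps: float = 1e-6,
                     flat_tol_pct: float = 10.0) -> str:
    """Classify a metric time series as "zero", "flat", or "ramping".

    A series is "zero" if every sample is (near) zero, "flat" if its
    spread stays within flat_tol_pct of the mean, otherwise "ramping".
    """
    if all(abs(s) < zero_eps for s in samples):
        return "zero"
    mean = sum(samples) / len(samples)
    spread_pct = (max(samples) - min(samples)) / mean * 100
    return "flat" if spread_pct <= flat_tol_pct else "ramping"
```

The engine then compares the observed shape against the expected one for each metric: throughput should come back "ramping" in a ramp test, response times "flat", and lock counts "zero"; any mismatch flags the test for a closer look.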
Step 5 – Test Result
Upon completion of steps 1-4, the goal is to determine whether the performance test passed or failed. Depending on its sensitivity configuration, the engine issues the result of the test. The most important aspect of step 5 is calibrating the engine to avoid false alerts. This only happens over time: multiple runs are required to make sure the automation framework gives the expected results. An analysis framework like this saves significant time on analyzing performance test results, usually a time-consuming task, before they are signed off.
How should scripting be automated?
Contrary to what most people might believe, the challenge in DevOps is keeping up with the speed of development. As previously mentioned, automation is required to improve performance engineering and keep performance tests aligned with development changes. In a microservices architecture, most APIs are simple SOAP/REST calls that can easily be scripted. We can achieve this by creating the following:
Test Data: Test data is essential for scripts to run. The performance engineer needs to ensure that naming-convention rules are set. Most microservices use the same data, for example test users, product numbers, and locales that are shared across all application services. Naming the files appropriately simplifies the automation process by making it easy to pick up the right data file.
Test Scenario: Test scenarios should be predefined if we want to automate pipeline scripting for microservices. It is important to understand your applications before setting the duration and the number of stages your test will run. Scenarios should be fixed depending on the change, and a risk assessment framework should decide the length of the test.
Microservice URL: This should be easy to construct from the application code or to extract from an existing source such as Swagger definitions. Once the URL is known, the call can easily be scripted with automation.
Service Demand: How will the automation know how much load should be sent to the microservice under test? The demand should be decided during the planning of the microservice and documented in a location that your script automation can pick up. The engine can then send the right load against your microservice, and your test will be scripted and added to the pipeline automatically.
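Putting the four ingredients together, here is a sketch of how a pipeline test definition might be assembled automatically. All field names, the URL, and the data-file convention are hypothetical illustrations.

```python
import json

def build_pipeline_test(service: str, base_url: str,
                        plan: dict, data_file: str) -> dict:
    """Combine the microservice URL, the documented demand, the
    predefined scenario, and the naming-convention data file into a
    runnable test definition for the pipeline."""
    return {
        "target": f"{base_url}/{service}",    # microservice URL
        "load_rps": plan["demand_rps"],       # service demand from planning docs
        "duration_s": plan["duration_s"],     # length set by risk assessment
        "stages": plan.get("stages", 1),      # scenario shape
        "test_data": data_file,               # e.g. "<service>_users.csv"
    }

test = build_pipeline_test(
    "checkout", "https://api.example.test",
    {"demand_rps": 50, "duration_s": 600, "stages": 3},
    "checkout_users.csv",
)
print(json.dumps(test, indent=2))
```

Because every field is looked up rather than hand-written, adding a new microservice to the pipeline reduces to documenting its plan and following the data-file naming convention.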
Software engineers will have to make an initial effort to automate performance; nevertheless, automation will give them a substantial boost once it is complete.
How can false positives be reduced, ensuring good results consistency?
Figure 6. The recommended framework to improve automation in DevOps
Ensuring that your automation always works, without many false positives and with consistent results, can be challenging. To achieve this, companies need to follow the framework shown in Figure 6 and create a process around the whole automation framework: how and when should the framework be calibrated, and what else should be followed during this process? Having a process is important for the business to grow and use the framework correctly. The right set of testing tools must also be chosen and configured so that engineers can use them efficiently. Next comes ensuring that the environment is dedicated and that test data are consistent. Performance engineers should plan their Non-Functional Requirements (NFRs) and map them to their tests; setting up the right environments with realistic, consistent data is vital. Proper performance reporting helps engineers feed back into the process and ensure that results are consistent and correct. Finally, every team member should be aware of their role and responsibilities and understand their importance in this process.
Conclusion – Pipelines are automated, but what about resilience?
Once analysis and scripting for the pipelines are fully automated, performance engineers should have plenty of time to work on the bigger picture. Performance testing the microservices individually indicates that each performs well; once they are load tested and signed off using the principles of the automation framework above, performance engineers should also ensure resilience, so the application performs excellently when all microservices talk to each other. This is where an integrated performance test is not just recommended but a necessity. Ensuring that the whole application performs well under continuous peak load is critical for the profitability of the business, and the only way to find that out is through integrated load tests. Performance engineers will have to collect all the scripts, organize them, and run them in parallel to ensure the application is robust and resilient. Integration is the final, yet most important, step to ensure good performance of your application.
For this blog post, Vasilis used some infographics from CAPACITAS. Check out their blog posts for further information:
- The Seven Pillars of Software Performance
- A Guide to Performance Engineering in Continuous Integration
Learn More about the Performance Advisory Council
Want to see the full conversation? Check out the presentation here.