Context-Driven Performance Testing
There was an interesting discussion after my presentation at the Performance Advisory Council (PAC) about what continuous performance testing solves and what it doesn’t. It became obvious to me that performance testing in general, and specific performance testing techniques in particular, should be considered in full context – including environments, products, teams, issues, goals, budgets, timeframes, risks, etc. The question is not which technique is better – the question is which technique (or which combination of techniques) to use in a particular case (or, in more traditional wording, what the performance testing strategy should be).
The term context-driven seems a great fit here – in its classical form as described at Context-Driven-Testing (in the functional testing community, where it was introduced, it became a loaded and politicized term – but all the original founding principles make perfect sense to me for performance testing).
Traditional load testing (optimized for the waterfall software development process) was focused, basically, on one context – pre-release, production-like – so the goal was to make the load and the system as similar to production as possible. Well, with some variations – such as stress, spike, uptime/endurance/longevity, and other kinds of performance testing, still mainly based on realistic workloads.
I recall that when I shared a very good white paper, Rapid Bottleneck Identification – A Better Way to do Load Testing (here dated 2005 – I guess it was around that time), I was slammed by one renowned expert for encouraging a wrong way to do load testing. Well, maybe the name should have been modified to include the context – A Better Way to do Load Testing for Bottleneck Identification – but indeed a complex realistic workload is not optimal for many performance engineering tasks we have earlier in the development lifecycle (which are significantly more important nowadays, as we indeed can do performance testing early).
Drastic changes in the industry in recent years have significantly expanded the performance testing horizon – agile development and cloud computing probably the most. I attempted to summarize the changes in Reinventing Performance Testing. Basically, instead of one single way of doing performance testing (with all others considered rather exotic), we have a full spectrum of different tests which can be done at different moments – so deciding what and when to test became a very non-trivial task, heavily depending on the context.
For example, let’s just consider the environmental aspects: options nowadays include traditional internal (and external) labs; cloud as ‘Infrastructure as a Service’ (IaaS), when some parts of the system or everything are deployed there; and cloud as ‘Software as a Service’ (SaaS), when vendors provide a load testing service. There are advantages and disadvantages to each model. Depending on the specific goals and the systems to test, one deployment model may be preferred over another.
For example, to see the effect of a performance improvement (performance optimization), using an isolated lab environment may be a better option, as it reveals even small variations introduced by a change. To load test the whole production environment end-to-end, just to make sure that the system will handle the load without any major issues, testing from the cloud or a service may be more appropriate. To create a production-like test environment without going bankrupt, moving everything to the cloud for periodic performance testing may be a solution.
For comprehensive performance testing, you probably need to use several approaches – for example, lab testing (for performance optimization, to get reproducible results) and distributed, realistic outside testing (to check real-life issues you can’t simulate in the lab). Limiting yourself to one approach limits the risks you will mitigate. It is important to consider this when selecting tools – if a tool doesn’t support all the approaches you need, you may end up using different tools, probably introducing noticeable overheads.
The topic was also covered by Ramya Ramalinga Moorthy in another PAC presentation: Continuous (Early) vs System Level Performance Tests. However, I guess Ramya’s post was focused on highlighting the differences between these two groups of tests, explicitly ignoring further details – while we have many more subtleties underneath. Actually, continuous and early performance tests may be quite different tests – so we can group them together only to contrast them with traditional load testing. Likewise, both continuous and early tests may be system-level (in agile development we are supposed to have a working system in each iteration – so theoretically the system is available early, although with limited functionality).
The purpose of continuous performance testing is, basically, regression performance testing: checking that no unexpected performance degradations happened between tests (and verifying expected performance changes against the established baseline). It may start early (although that may be a bigger challenge in the very early stages) – and probably should continue for as long as any changes happen to the system. The tests may be on a component level or on a system level (considering that not all functionality of the system is available in the beginning). Theoretically, they may even be full-scale, system-level, realistic tests – but that doesn’t make sense in most contexts.
For continuous performance testing we rather want short, limited-scale, and fully reproducible tests (which means minimal randomness) – so if the results are different, we know it is due to a system change. For full-scale system-level tests that check whether the system handles the expected load, we are more concerned with making sure that the workload and the system are as close to real life as possible – and less concerned with small variations in performance results. It doesn’t mean one is better than the other – they are different tests mitigating different performance risks. There is some overlap between them, as they both target performance risks – but continuous testing usually doesn’t test the system’s limits, and full-scale realistic tests are not good for tracking differences between builds.
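To make the “fully reproducible, minimal randomness” idea concrete, here is a minimal sketch of what a continuous regression check can look like: compare a build’s measured response times against the established baseline and fail the build when the median degrades beyond a tolerance. All names, numbers, and the 10% threshold below are illustrative assumptions of mine, not any real tool’s API.

```python
# Hypothetical continuous performance regression check: flag a build
# whose median response time grew more than `tolerance` over baseline.
from statistics import median

def check_regression(baseline_ms, current_ms, tolerance=0.10):
    """Return (passed, ratio) comparing medians of two response-time samples."""
    ratio = median(current_ms) / median(baseline_ms)
    return ratio <= 1.0 + tolerance, ratio

# Illustrative data (milliseconds): a stable build and a degraded one.
baseline = [120, 125, 118, 122, 130]
stable   = [121, 126, 119, 124, 128]
slower   = [150, 160, 148, 155, 162]

print(check_regression(baseline, stable)[0])   # True: within tolerance
print(check_regression(baseline, slower)[0])   # False: clear degradation
```

In practice the tolerance, the statistic (median vs. high percentiles), and the policy for updating the baseline are themselves context-dependent decisions – which is exactly the point.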
Actually, there are many other testing techniques, some of which don’t even mention testing (or testers). For example, I still wonder how chaos engineering (popularized by Netflix) differs from [one way of] reliability testing in production (involving, of course, a lot of performance aspects).
Moreover, performance testing is not the only way to mitigate performance risks – there are other approaches too, and the dynamics of their usage are changing with time – see, for example, Performance Engineering and Load Testing: A Changing Dynamic. So the art of performance engineering is to find the best strategy for combining different performance tests and other approaches, optimizing the risk mitigation/cost ratio for, of course, the specific context.