As always, it was great to join the PAC online meet-up and experience the mighty collective wisdom of such a big group of experts. Many thanks must once again go out to the Neotys team for arranging these events – I think they genuinely help to drive the performance testing community forward.
As usual, there were too many points to note them all, and it is really worth listening to the recording if you weren’t able to join live. But here are a few highlights that stood out to me. These are just from memory, so apologies if I misunderstood or twisted anyone’s points – send the Bitcoin to the usual address and I will be happy to update this 🙂
What was the Neotys PAC meet-up topic?
Ganesh Kale had suggested the session topic, explaining that on many projects he is still challenged to persuade the business sponsors that performance is important, and that it is not always easy to provide hard facts to show this. Talking about the risk of performance failure is useful to some extent, but it depends on the project and the people. It would be great if we could understand how much the application is “worth” – for example, the cost of downtime – and calculate what percentage of that is due to performance issues. With the tech stack we have now, is it possible to automate this process and prove the value of performance once and for all?
Setting the foundations
Luca Chiabrera responded about how we show the tangible impact to customers. Load testing is about risk reduction, and we need to collect the stats to prove how risk is reduced. We have basic metrics like performance incidents and availability, but it would be nice to prove revenue-driving metrics like conversion rate. Your performance maturity determines where your focus should lie; it takes greater maturity to reach the point where you can take advantage of revenue-related metrics.
Luca’s point was nicely made. On many occasions, the performance engineer is an external consultant or freelancer, so naturally we have to convince the customer of the value we bring. But even when we are on staff, we are often part of a centralized team providing services to different projects, or we simply need to make the case to the managers to spend money and time on performance. Either way, it is still essential to prove that it’s worthwhile.
But Henrik was keen to discuss whether a technical solution is possible, and to discover if anyone is using an automated solution (APM, analytics or something similar) to collect useful metrics that would prove the benefit of our work. Jonathon Wright jumped in to talk about some cool-sounding projects he has been involved in, where they used a technique called “dark canary” (dark launching combined with canary rollouts): directing a small group of users to the new features and harvesting metrics on their behavior from Google Analytics and other sources to determine the benefits of those features. I made a mental note that I really must try to get some work in the retail space, or anywhere that I will be encouraged to do all this cool stuff!
However, it seemed that Jonathon was the only one on the call who is working actively in this area. Maybe this reflects the reality that most companies either don’t yet have the know-how or desire to collect all the metrics, or are not letting performance engineers join the party.
Famously, Etsy did blue-green deployments to test things, such as including the cost of shipping in the product cost rather than having it separate. They would then test this feature on a small group of customers to measure the value. Alexander Podelko pointed out that they did not actually do much “performance testing” per se, as they relied on rapid feedback from production to guide them. But I still consider that a good example, as all these things are still performance engineering in my eyes, even if running tests in the classic style was not a major part of it.
Andrea Gallo asked us who really has access to the low-level data that would allow us to match up our performance data with real-life business outcomes like revenue. It is rare to get access at that level, because performance engineers typically work deep in the technical stack, far from the business data.
Andrea summarized the position that I think most of us are in now: the “magic dashboard” that will translate performance results into business-value metrics does not yet exist.
The magic dashboard
What would such a dashboard look like?
We need to choose the right metrics to show direct financial gains.
I can think of three major categories – revenue, throughput and resource usage – where we can prove the immediate value of performance gains by translating them into dollars.
- Revenue is straightforward to track for some companies – e-commerce, internet/tech companies – where customer clicks lead to cash, but it’s more difficult to prove for many applications.
- Maybe the rest of us can get value out of throughput, by which I mean either speed of processing or productivity of staff. For example, if your company has any kind of workflow system, call center software, or any situation where humans work through a queue of business processes, then speeding things up can directly increase productivity. One call center I know about needed to hire 30 additional staff after seeing an increase in response times.
- Now that so many systems are hosted in the cloud, there is a direct cost saving by reducing the amount of hardware needed to run applications — something mentioned by several people in the discussion as an obvious target for performance optimization.
- In some cases we can add a fourth item, SLA compliance, to the list, if your company is at risk of penalties for slow responses. If none of the other points apply to your situation, picking out key SLAs is the next best thing. If the business mandates, for example, that payment processing time is the most important thing for them, then even though we can’t show direct financial value, there is still great value for them if we can prove it is getting faster.
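As a rough illustration, the three direct-value categories can be turned into a back-of-envelope dollar figure. This is only a sketch: every number and function name below is an invented assumption, not real data from any project mentioned above.

```python
# Hypothetical back-of-envelope model translating performance gains
# into dollars for the three direct-value categories (revenue,
# throughput, resource usage). All figures are illustrative only.

def revenue_gain(monthly_revenue, conversion_uplift_pct):
    """Extra revenue if faster pages lift the conversion rate."""
    return monthly_revenue * conversion_uplift_pct / 100

def throughput_gain(staff_count, cost_per_head, productivity_uplift_pct):
    """Staffing cost avoided when each person processes more items."""
    return staff_count * cost_per_head * productivity_uplift_pct / 100

def cloud_saving(monthly_cloud_bill, capacity_freed_pct):
    """Hosting cost saved by needing fewer instances."""
    return monthly_cloud_bill * capacity_freed_pct / 100

total = (revenue_gain(2_000_000, 0.5)        # 0.5% conversion uplift
         + throughput_gain(200, 4_000, 3)    # 200 agents, 3% faster
         + cloud_saving(50_000, 20))         # 20% capacity freed
print(f"Estimated monthly value of the tuning work: ${total:,.0f}")
```

Even a crude model like this gives the business a number to argue with, which is usually more productive than arguing about response times in milliseconds.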
How to collect and display the metrics?
The most powerful case is where we can collect metrics from production, such as from an A/B test, then display both sets of metrics in real time and prove that A or B leads to better results. But that is not possible in many cases.
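As a toy example of what such a comparison boils down to, here is a sketch comparing conversion rates between two variants. The figures are invented, and a real analysis would also test statistical significance before claiming a winner.

```python
# Toy A/B-style comparison: conversion rate for the current build (A)
# vs the performance-tuned build (B). All numbers are made up.

def conversion_rate(orders, sessions):
    """Fraction of sessions that ended in a purchase."""
    return orders / sessions

rate_a = conversion_rate(orders=480, sessions=24_000)  # current build
rate_b = conversion_rate(orders=540, sessions=24_000)  # tuned build
uplift = (rate_b - rate_a) / rate_a
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  uplift: {uplift:.1%}")
```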
Alternatively, we can compare different versions in production over time and track the outcome, but usually it’s hard to prove that different results were due to a specific change.
When you are limited to the test environment, you can prove the result of performance tuning, but the actual cost gains are extrapolated, not real. In that case you have to do a lot of groundwork to establish how your test environment relates to production in different circumstances (in other words, scalability testing so that you understand in detail how the environment performs in all different configurations). But once everyone accepts the relationship between test and production, then proving the value of performance changes can be possible in test, and you can then run controlled tuning exercises to your heart’s content. Just remember to keep the feedback loop from production back to your tests, because the relationship between production and test is constantly changing and needs to be maintained.
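The extrapolation step can be sketched in a few lines. This is a hypothetical model, assuming earlier scalability testing has established that relative gains observed in the test environment carry over to production; the numbers are illustrative, not real.

```python
# Sketch of extrapolating a tuning result from the test environment
# to production. Assumes prior scalability testing has shown that
# relative gains in test transfer to production at the current
# calibration; all figures are illustrative.

def extrapolate_saving(test_cpu_before, test_cpu_after, prod_hourly_cost):
    """Estimate production cost saved from a CPU reduction seen in test."""
    reduction = (test_cpu_before - test_cpu_after) / test_cpu_before
    # The relative reduction is assumed to transfer to production scale;
    # this only holds while the test/production calibration stays current.
    return prod_hourly_cost * reduction

# Tuning cut test-rig CPU from 80% to 60% at the same load, and the
# affected production tier costs $120/hour to run:
saving = extrapolate_saving(80, 60, prod_hourly_cost=120)
print(f"Estimated production saving: ${saving:.2f}/hour")
```

The comment in the middle is the important part: the moment the calibration between test and production goes stale, the extrapolated number stops being defensible.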
Agreeing what you want to monitor and designing how to test it can be tough. Collecting the metrics and building the dashboard should hopefully be the easy part, given the tools and techniques we have available now. Even if you are not blessed with the advanced APM and analytics tools, creating a dashboard is easy with open source tools like Grafana. You may have to create scripts to pull out the data you need or work with project teams to do it. The incentive is there to help you – so if developers need to add extra logging or tag their headers, it’s up to you to make the case persuasively. Showing one or two small gains can kick-start things and get people on board, especially if the gains are coming without the need for large investment. In most companies you have to bring in new things slowly over time to avoid resistance.
One million years ago…
As Ian Molyneaux pointed out, the problem of persuading the business has been around as long as testing has. We used to use “ROI analysis” to prove the value of something, and quite a bit of time was spent working out different metrics that could prove the benefit of performance testing.
The problem back then was that we didn’t have a lot of data, so often the answers were arrived at by filling in questionnaires, speaking to people and extracting data manually with queries.
There are various things you can track which show indirect revenue gains – from the number of performance issues in production and the time spent investigating/supporting them, to the time spent on productive (creating new things) vs. defensive (fixing, supporting) work, or even customer satisfaction.
Nowadays many of these metrics are available in systems ranging from incident management, to APM and analytics tools, to timesheet systems. Most could be harvested by connecting to their APIs, so there is an intriguing possibility of pulling out the metrics automatically and tracking them over time. But I haven’t yet encountered an organization with the determination to make it happen.
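As a toy illustration of the productive-vs-defensive split mentioned above, here is a sketch that classifies timesheet-style rows and tracks the ratio month by month. The rows are hard-coded invented data; in practice they would come from whatever incident-management or timesheet API your organization exposes.

```python
# Minimal sketch of tracking an indirect metric over time: the share
# of hours spent on defensive (fixing, supporting) vs productive
# (creating) work. The records below are invented for illustration;
# in reality they would be pulled from an incident/timesheet API.
from collections import defaultdict

records = [
    {"month": "2020-01", "hours": 120, "type": "feature"},
    {"month": "2020-01", "hours": 60,  "type": "incident"},
    {"month": "2020-02", "hours": 140, "type": "feature"},
    {"month": "2020-02", "hours": 30,  "type": "incident"},
]

DEFENSIVE = {"incident", "bugfix", "support"}

def defensive_ratio(rows):
    """Return {month: fraction of hours spent on defensive work}."""
    totals = defaultdict(lambda: [0, 0])  # month -> [defensive, all]
    for r in rows:
        totals[r["month"]][1] += r["hours"]
        if r["type"] in DEFENSIVE:
            totals[r["month"]][0] += r["hours"]
    return {m: d / total for m, (d, total) in sorted(totals.items())}

for month, ratio in defensive_ratio(records).items():
    print(f"{month}: {ratio:.0%} of hours spent on defensive work")
```

A falling defensive ratio over the months after a performance push is exactly the kind of trend line that makes sense to managers without any translation.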
Stand on the shoulders of giants
Alexander Podelko provided us with some background references that are essential reading. We should all thank Alexander for all the great work he does to educate people about the background of performance engineering and spread the knowledge of the fundamentals. His blogs and papers are always thought-provoking. If you haven’t read them, then please do:
Business Case for Performance: https://alexanderpodelko.com/blog/2016/01/10/business-case-for-performance/
Magic Numbers or Psychological Thresholds: https://calendar.perfplanet.com/2019/magic-numbers-or-psychological-thresholds/
www.WPOstats.com, which collects case studies and evidence of the impact of web performance optimization.
Alex also recommended Time is Money: The Business Value of Web Performance by Tammy Everts, who was something of a hero of mine a few years ago when she wrote a great series of blog posts for SOASTA.
I will add one last link to an article from Alex which is a particular favourite of mine – not related to business value, but read it anyway: https://www.neotys.com/blog/neotyspac-forgotten-art-performance-modeling-alexander-podelko/
In summary – collected tips from the Neotys PAC experts
- Put the business first. Be sure to have a business reason for everything you do. (Luca)
- Learn to speak in language that is important to your management. (Leandro)
- Consider comparing your customer against the competition – managers should be worried if their site is slower than the competition. (Henrik)
- Cloud resources cost “real money,” which is directly related to efficiency. (Alexander)
- When discussing performance with the business, start at the top – “What is the impact on you if this bit breaks?” (Scott)
- Ask yourself “so what?” about each action you are taking. How is it going to impact the business? (Andrea)
- The benefit tends to be long term – not days or weeks, but months and years – so keep tracking and review monthly; over a year you will see the benefit of fewer and shorter outages and faster key response times. We performance engineers are only part of a big team that makes it all happen – we can’t claim all the credit, but the project is aware of the part we play.
Post-script: Make a real difference in the world
Jonathon made a call for volunteers to help with crowd testing and rollout of a COVID-19 mobile app that he is involved with. It is a chance to make a real difference at this time.
See https://www.coronavirus.fo for details.
Learn More about the Performance Advisory Council