The role of the performance engineer has evolved steadily as their duties have changed over time. Once limited to observing the effects users had on a computational system, they now help design the very systems whose use they monitor.
When the web began its growth, the first performance needs were simple: finding out what traffic was passing between a site and a client. LoadRunner was part of the initial wave of widely used tools. It could simulate thousands of concurrent users and measure system behavior and performance under the resulting load.
At its heart, LoadRunner is a mechanical and repetitive system. The simulated actions do not change as the load varies. The results it produces, though available for multiple networking protocols, are relatively unsophisticated because the user-simulation method is unsophisticated. But at the time, it did exactly what performance engineers needed: it could saturate a single server under load.
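That core idea of many identical, scripted "virtual users" hitting one target at once can be sketched in a few lines of Python. This is an illustration of the technique, not LoadRunner's actual machinery; the user count, repeat count, and the pluggable `action` (which in practice would be an HTTP request against the server under test) are all assumptions for the sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def virtual_user(action, repeats):
    """One scripted user: the action never varies, only the load does."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        ok = action()  # in a real test: issue one HTTP request, return success
        timings.append(time.perf_counter() - start if ok else None)
    return timings

def run_load_test(action, users=50, repeats=20):
    """Run many identical virtual users concurrently against one target."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(lambda _i: virtual_user(action, repeats),
                                range(users)))
    flat = [t for user in results for t in user]
    # Report how many actions succeeded and how many failed under load.
    return sum(t is not None for t in flat), sum(t is None for t in flat)
```

The point of the mechanical repetition is visible in the structure: every virtual user runs the same script, so the only variable being studied is the concurrency level itself.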
Things changed for performance engineers in the early 2000s because the structure of what they were measuring forced the change. Web applications began their ascendancy; the client program in a web app runs in a browser. Online sales sites are one example of web apps that grew tremendously in this period.
Testing these kinds of programs became considerably more complex as they grew both in the absolute file sizes that needed to be transferred and in the resources involved in their execution. Raw user counts alone would no longer accurately reflect the actual loads the computing resources could expect.
Table 1: Size of Amazon front page in bytes, by year
Performance engineers then began to build test suites that could simulate multiple users through numerous framework-dependent worker nodes. Load cases began to be reproduced on more than mere quantity: different load profiles could be replayed to see what effects each one caused. The complexity of this richer kind of user simulation grew mightily, and the complexity of the tasks facing performance engineers followed suit.
Tools could rate a page's design against what were considered best-practice rules for page construction. Analyzing the elements of a page and what was required to serve them (for example, flagging an excess of external scripts or oversized images) made front-ends more efficient and reduced their friction.
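A toy version of such element analysis can be written with Python's standard-library HTML parser: scan the markup, count the resources a browser would have to fetch, and compare the counts against rules. The thresholds below are illustrative assumptions, not the rules of any real auditing tool.

```python
from html.parser import HTMLParser

class ElementAnalyzer(HTMLParser):
    """Count the page elements that trigger extra network fetches."""
    def __init__(self):
        super().__init__()
        self.counts = {"script": 0, "img": 0, "link": 0}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1

def analyze(html, max_scripts=10, max_images=30):
    """Return element counts plus any best-practice rules the page breaks."""
    parser = ElementAnalyzer()
    parser.feed(html)
    findings = []
    if parser.counts["script"] > max_scripts:
        findings.append("too many scripts")
    if parser.counts["img"] > max_images:
        findings.append("too many images")
    return parser.counts, findings
```

Real tools of the era applied dozens of such rules (caching headers, compression, request counts), but the shape of the check is the same: inventory the page, then grade the inventory.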
The Rise of the Back-end
Performance engineers eventually had to include the back-end of a system in their work, not just the front-end. Bottlenecks could arise in back-end components with the same or greater magnitude than anything the front-end alone could cause. Still, front-end optimization dominated the earlier work of performance engineers because it reliably improved throughput.
The balance between the front- and back-end components of a system evened out over time. The performance a front-end client could show depended on the back-end serving it efficiently. The performance engineer had to find and fix bottlenecks wherever they originated.
Into the Cloud
Where the back-end was located proved to be a moving target. When datacenters with clusters of servers were eventually replaced by cloud machines that could be physically located anywhere, the direct mapping of which hardware item was causing which effect was broken.
The cloud and virtualization forced performance engineers to improve the models they used to do their work. They had to produce reports that captured more complex scenarios around overall system performance, because the complexity introduced by cloud computing and virtualization opened up more ways to analyze it.
Performance engineers found that analyzing the logs of the various platforms was a meaningful way to gain insight into what was going on; in some ways, it was the only way to see what had happened after a test. It became more and more common for load cases to be generated for a test (usually from a cloud-based service) that simulated geographically disparate, distributed demand. Examining the logs made during such a test yields actionable insights that would not otherwise have been visible.
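The kind of after-the-fact log work described here can be sketched simply: group a test's log entries by the region the simulated demand came from and summarize latency per region. The "timestamp region latency_ms" line format is an invented assumption for the sketch; real platform logs vary widely.

```python
from collections import defaultdict

def per_region_latency(log_lines):
    """Average latency by region, exposing geographic differences
    that a single aggregate number would hide."""
    by_region = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed entries rather than abort the analysis
        _timestamp, region, latency_ms = parts
        try:
            by_region[region].append(float(latency_ms))
        except ValueError:
            continue
    return {region: sum(v) / len(v) for region, v in by_region.items()}
```

Run against the logs of a geographically distributed test, a summary like this can show, for instance, that one region's users saw triple the latency of another's, which no single-site test would have revealed.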
Of course, obvious problems, such as the networks connecting program components becoming saturated as demand grew, might be analyzed with a dedicated tool rather than straight log analysis. The test might also be run on a cluster that allowed capacity to be monitored in real time as the demand scaled upward.
Those concerned with performance also had to define what the performance criteria would be. Perhaps the overall test target was a cluster's throughput; they had to know which specific measures would indicate that throughput in a meaningful way. It might not be the speed of bits down a particular wire. It might end up being something like effective latency as seen at the user endpoint.
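Such a user-endpoint criterion is commonly expressed as a latency percentile rather than an average, since a few slow outliers are what users actually feel. A minimal sketch, using the nearest-rank method; the p95 budget of 250 ms is an illustrative assumption, not a standard.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def meets_latency_goal(samples_ms, p95_budget_ms=250):
    """Pass/fail against an endpoint-latency criterion: 95% of user
    requests must complete within the budget."""
    return percentile(samples_ms, 95) <= p95_budget_ms
```

The design choice matters: a mean can look healthy while one user in twenty waits seconds, which is exactly the kind of effect a raw wire-speed number hides.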
All Parts Matter
The performance engineer had to understand the overall computational situation in order to evaluate it. They had to derive what needed to be measured to achieve a specific goal. To do that, they had to be involved in all parts of the computational stack, not just one particular segment, and to consider every part of the process that ultimately produced results.
Sometimes those parts could be well hidden. An API under test could give wildly varying results depending on the load. One API could overload another if the two were constructed separately and never federated. The ability to detect these kinds of coupled system behaviors using data stored in different logs has become the new norm in performance work.
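One way to surface that hidden coupling is to line up the two services' logs on a shared time axis and look for minutes where a call spike in one coincides with errors in the other. The per-minute dictionaries and the call limit below are invented assumptions for the sketch; real correlation work uses whatever the platforms' logs actually record.

```python
def correlate_overload(calls_a_per_min, errors_b_per_min, call_limit=1000):
    """Return the minutes where service A's call volume spiked while
    service B logged errors at the same time -- a hint that A is
    overloading B through an un-federated dependency."""
    suspect = []
    # Only minutes present in both logs can be correlated.
    for minute in sorted(set(calls_a_per_min) & set(errors_b_per_min)):
        if calls_a_per_min[minute] > call_limit and errors_b_per_min[minute] > 0:
            suspect.append(minute)
    return suspect
```

Neither service's own dashboard would flag this: A is merely busy and B is merely failing. The signal only appears when the two logs are read together.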
Performance started out as something defined by data going from one machine to another and was confined to limited areas. Over the last 20 years, performance evaluation has grown far broader in scope, reaching all the edges of the computational universe and having to deal with all of them. The people responsible for performance have had to grow and change with that scope, learning how to integrate the entire process. By doing so, they have become uniquely qualified to deal with every aspect of design, because they deal with all of its effects.
Performance engineers have evolved over the years as the challenges they face have morphed into far more complex situations. Anyone concerned with overall computational performance, such as test practitioners and developers, can and should take advantage of these time-tested approaches and methods to move beyond a simple pass/fail mindset to the broader scope of vision that performance engineering brings with it.
Larry Loeb has written for many of the last century’s dominant “dead tree” computer magazines including BYTE Magazine (Consulting Editor) and the launch of WebWeek (Senior Editor). Additional works to his credit include a seven-year engagement with IBM DeveloperWorks and a book on the Secure Electronic Transaction (SET) Internet Protocol. His latest entry, “Hack Proofing XML,” takes its name based on what he felt was the commercially acceptable thing to do.
Larry’s been online since UUCP “bang” (where the world seemed to exist relative to DEC VAX) and has served as editor of the Macintosh Exchange on BIX and the VARBusiness Exchange. He lives in Lexington, Kentucky and can be found on LinkedIn here, or on Twitter at @larryloeb.