The Importance of Performance Testing Machine Learning Models

One of the biggest successes in modern site reliability engineering has been to take something that used to be hard and make it easy. This is particularly true of machine learning, which by nature is not only data intensive but can also require vast amounts of processing power. In the past, a data scientist wrote a bunch of analytic code in a statistical language such as R. Then he or she deployed the code to high-powered servers, loaded the data, and waited for results. While workable, this was time-consuming, even if your computing targets were optimized virtual machines. Deploying and running a set of machine learning algorithms could take hours, if not days. Performance testing machine learning models is essential.

Enter Kubernetes, and the next thing you know the code is wrapped up in containers that are designed to run in parallel, scaling up or down on demand. The result is tens or even hundreds of containers running the same code simultaneously. Machine learning processing that used to take hours now takes minutes. The code can also be updated faster.

Data scientists were in heaven. After all, what’s not to like? Not only could they have faster processing turnaround of the machine learning models, but they could also alter code quickly for improved analysis.

What followed? Data scientists took advantage of the new opportunities and started to tweak their code to improve the models. Then the unanticipated happened. The actual cost of running the models increased while the benefits gained were marginal. Why? The models were never subjected to adequate performance testing.

This might seem like a techno-fairytale, but sadly, unanticipated cost increases associated with machine learning processing happen all too often. Nobody sets out to waste money. Unforeseen costs are the result of two factors:

  1. Developers not having a clear understanding of the impact of altering machine learning models relative to data consumption.
  2. The absence of performance testing as an intrinsic part of the machine learning release process.

Understanding the Cost of Refactoring a Machine Learning Model

Every time a new dimension is added into the machine learning model, you’ll need to process more data. In doing so, it’s going to cost you time or money. It’s going to slow down processing time, or you’ll need to spend more money on additional computing resources to support the increased workload.

Imagine this scenario. You have a machine learning model designed to identify prospective retail investors for a posh mutual fund. The current model, in production, takes into account residential zip codes for identifying high net worth individuals. However, new information is uncovered, revealing that a significant number of current customers live near exclusive golf courses. Given this insight, the data scientist responsible for refining the machine learning code refactors it using a geolocation map of prospective users to golf course membership, assigning a “prospective customer ranking index” accordingly.

The programming required to perform the refactoring is vanilla. To support the revision, additional data about golf courses is required. The data scientist identifies a useful data source and refactors the code incorporating the new golf course logic into the existing code base. She runs a trial test of the new machine learning model on her desktop machine using a few thousand rows from the newly identified course data. The model works as expected, and the code is released.

Nearly a day later, the data scientist receives an email from SysOps informing her that, because of the recent update, the number of pods required in her cluster needs to increase to 300% (which requires manager approval). The classification algorithm responsible for ranking a golfer as likely to invest in the mutual fund now takes 45 seconds instead of 15, which falls well short of the threshold for real-time suggestions or async email batch queue processing. It turns out that not only does it take more time to get through the millions of rows of data the new machine learning model requires, but the actual time needed to process one row has also increased significantly (despite the machine learning classification process). The sad fact is that an anticipated improvement to the underlying algorithm increased cost overall while providing marginal benefit.
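The arithmetic behind that email can be sketched in a few lines. This is a back-of-the-envelope model, not anything taken from a real cluster: it assumes pods scale linearly and that throughput is simply pods divided by seconds per item. The baseline of 10 pods is a made-up number for illustration; only the 15 and 45 second figures come from the scenario.

```python
def pods_to_hold_throughput(baseline_pods: int, old_seconds: int, new_seconds: int) -> int:
    """Pods needed to keep throughput constant after a per-item slowdown.

    Assumes ideal linear scaling: throughput = pods / seconds_per_item.
    """
    # Ceiling division on integers, so a fractional pod rounds up.
    return -(-(baseline_pods * new_seconds) // old_seconds)


def percent_of_baseline(new_pods: int, baseline_pods: int) -> int:
    """New pod count expressed as a whole percentage of the baseline."""
    return new_pods * 100 // baseline_pods


BASELINE_PODS = 10  # hypothetical; the scenario does not state a pod count

if __name__ == "__main__":
    needed = pods_to_hold_throughput(BASELINE_PODS, old_seconds=15, new_seconds=45)
    print(needed, "pods,", percent_of_baseline(needed, BASELINE_PODS), "% of baseline")
```

Under these assumptions, a threefold slowdown per item means three times the pods just to stand still. Any growth in the number of rows comes on top of that.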

The moral of this story is that even the simplest code change can have unintended, significant consequences. This holds for machine learning models and for modern commercial applications operating at web scale. The only difference is that the Facebooks, Googles, and Amazons of the world got the message years ago, so they made performance testing essential to the release process. Many companies that adopt ML internally or outsource to “AI-enabled offerings” generously provided by these behemoths are just now getting the wakeup call about how susceptible non-deterministic systems are to cost-incurring performance issues.

Avoid Performance Testing Machine Learning Models at Your Own Risk

Desktop performance rarely maps to performance at scale. To get an accurate picture of production-level performance, you need to be able to exert a significant load on the system. While a single desktop computer is well suited for programming, it’s not equipped to create or support the degree of load you find in production. In many organizations, performance testing of machine learning models never goes beyond the desktop. When things go wrong, the all too common response is, “but it worked on my machine.” The rest is left up to the Finance Department when paying the invoice.

On the other hand, some organizations realize the limitations of testing on the desktop and will put monitors in place on their production systems to detect anomalies. This is useful from a systems administration standpoint, but production monitoring does not curtail cost increases. To accurately anticipate the actual operational costs of a given machine learning model before release, you need to exert a significant load. This is best done within the boundaries of a well-designed performance test conducted in a testing environment that emulates production as closely as possible.
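To make "exerting load" concrete, here is a minimal sketch of a concurrent load run in Python. It is an illustration only: `fake_predict` is a hypothetical stand-in for a call to a deployed model, and a thread pool on one machine is a stand-in for a real load testing tool. The point is the shape of the measurement (many simultaneous requests, latency percentiles, throughput) rather than the specific numbers it prints.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(predict, row):
    """Run one prediction and return how long it took, in seconds."""
    start = time.perf_counter()
    predict(row)
    return time.perf_counter() - start


def run_load_test(predict, rows, concurrency):
    """Send every row through `predict` using `concurrency` parallel workers."""
    rows = list(rows)
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda row: timed_call(predict, row), rows))
    wall_seconds = time.perf_counter() - started
    # 99 cut points; index 49 is the median, index 94 the 95th percentile.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "requests": len(latencies),
        "wall_seconds": wall_seconds,
        "p50_seconds": cuts[49],
        "p95_seconds": cuts[94],
        "throughput_per_second": len(latencies) / wall_seconds,
    }


def fake_predict(row):
    """Hypothetical model call; the sleep imitates network and inference time."""
    time.sleep(0.002)
    return row % 2


if __name__ == "__main__":
    print(run_load_test(fake_predict, range(200), concurrency=20))
```

Running the same test against the current model and the candidate model, with production-sized data, gives a like-for-like comparison before anything is released.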

Such performance testing requires that particular attention be paid to load exertion, which is typically done using test tools that support parallel processing, such as those provided by Neotys. The performance testing should also become part of an overall continuous integration/continuous deployment (CI/CD) process. Automating performance testing of machine learning models under CI/CD ensures that nothing falls through the cracks.
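One way to wire a performance check into a pipeline is a small gate that compares the candidate model's measured latency against the current baseline and fails the build when the regression is too large. The sketch below is an assumption-laden example, not a prescribed method: the function names and the 10% tolerance are hypothetical, and the latency figures would come from whatever load tool the pipeline uses.

```python
from typing import Tuple


def latency_gate(baseline_seconds: float, candidate_seconds: float,
                 max_ratio: float = 1.10) -> Tuple[bool, str]:
    """Pass only if the candidate is at most `max_ratio` times the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    ratio = candidate_seconds / baseline_seconds
    passed = ratio <= max_ratio
    verdict = "PASS" if passed else "FAIL"
    message = (f"{verdict}: candidate is {ratio:.2f}x the baseline "
               f"(allowed {max_ratio:.2f}x)")
    return passed, message


def main(baseline_seconds: float, candidate_seconds: float) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 stops it."""
    passed, message = latency_gate(baseline_seconds, candidate_seconds)
    print(message)
    return 0 if passed else 1


# A CI step would typically end with something like:
#     raise SystemExit(main(baseline_seconds, candidate_seconds))
```

With the figures from the golf course scenario, `main(15.0, 45.0)` returns 1, so the release would have been stopped before the bill arrived.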

Whether your organization is trying to move beyond desktop-based performance testing of machine learning models or away from relying on system monitors to report trouble after the fact, you need to establish performance testing of machine learning models as an essential part of the overall release process.

Putting it All Together

Few organizations aim to create machine learning models that incur avoidable expenses. However, many do so, mostly because of a lack of awareness about the real cost of computing. Machine learning is an essential component of modern software development. Applying appropriate performance testing to all aspects of the machine learning release process makes it cost effective as well.


Paul Bruce
Sr. Performance Engineer

Paul’s expertise includes cloud management, API design and experience, continuous testing (at scale), and organizational learning frameworks. He writes, listens, and teaches about software delivery patterns in enterprises and critical industries around the world.
