This section complements "Minimize the number of validations." Both validations and variable extractions are resource-intensive, all the more so when the response or header they operate on is large (more than 100 KB).
Avoid validations and variable extractions wherever possible, particularly when they operate on large inputs. When a validation or variable extraction is unavoidable, make sure that the technique it uses is optimized.
A variable extractor extracts a string from the server response to a request and assigns that string to a variable. The extractor can use an XPath expression that points to a portion of an XML document. However, extracting data from XML responses with XPath is a CPU-intensive operation, particularly when the responses are large.
It is strongly advised to check the accuracy of all variable extractors, avoid XPath expressions, and favor conventional variable extractors whenever possible.
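The difference between the two kinds of extractor can be sketched in Java. This is an illustration under stated assumptions, not the product's implementation: ExtractorComparison, extractWithXPath, and extractBetween are hypothetical names, and extractBetween assumes that a conventional extractor works by searching for a left and a right delimiter string. The XPath version uses the JDK's javax.xml.parsers and javax.xml.xpath APIs, which must parse the entire response into a DOM tree before the expression is evaluated, so its cost grows with the size of the whole response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ExtractorComparison {

    // XPath extraction: the whole response is parsed into a DOM tree first,
    // so the cost depends on the size of the response, not of the value.
    static String extractWithXPath(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    // Hypothetical delimiter-based extraction (assumed to resemble a
    // "conventional" extractor): find the left delimiter, then the right
    // one, and return the text between them. No tree is built.
    static String extractBetween(String text, String left, String right) {
        int start = text.indexOf(left);
        if (start < 0) {
            return null;
        }
        start += left.length();
        int end = text.indexOf(right, start);
        return end < 0 ? null : text.substring(start, end);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical response body.
        String response = "<order><id>42</id><status>shipped</status></order>";
        System.out.println(extractWithXPath(response, "/order/status"));
        System.out.println(extractBetween(response, "<status>", "</status>"));
    }
}
```

Both calls print "shipped". The delimiter scan avoids building a tree, but it depends on the exact text of the markup, which is one more reason to check each extractor for accuracy against real responses.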
Although regular expressions are powerful tools, they are a frequent cause of performance issues. In most cases, a slow regular expression can be dramatically improved by optimizing it for efficiency. Optimizing a regular expression centers on minimizing backtracking and the number of steps the regex engine takes to match or fail.
At its heart, the regex engine uses backtracking. There are usually several ways to apply a regular expression to a given string, and the engine tries them in turn, exhausting every possibility before it declares failure. To better understand backtracking, consider the following example.
The regular expression is "sc(ored|ared|oring)x" and the input string is "scared". First, the engine looks for "sc" and finds it immediately as the first two characters of the input string. It then tries to match "ored" starting from the third character. That does not match, so it goes back to the third character and tries "ared". This does match, so it moves forward and tries to match "x". Finding no match there, it goes back again to the third character and tries "oring". This does not match either, so the engine abandons the first starting position, moves to the second character of the input string, and tries to match "sc" from there. It repeats this at each following position, and upon reaching the end of the input string, it declares failure.
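The walkthrough above can be reproduced with a small tracing model. This is a simplified sketch, not how java.util.regex is implemented: the AlternationTrace class and its find helper are hypothetical and only handle the shape prefix(alt1|alt2|...)suffix. The model tries each start position, then each alternative in order, and records every attempt so the backtracking is visible; main cross-checks the verdict against the real java.util.regex engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AlternationTrace {

    // Simplified backtracking model for prefix(alt1|alt2|...)suffix.
    // Every attempt is appended to the log.
    static boolean find(String prefix, String[] alternatives, String suffix,
                        String input, List<String> log) {
        for (int start = 0; start < input.length(); start++) {
            if (!input.startsWith(prefix, start)) {
                log.add("start " + start + ": \"" + prefix + "\" fails");
                continue; // move on to the next start position
            }
            int afterPrefix = start + prefix.length();
            for (String alt : alternatives) {
                if (!input.startsWith(alt, afterPrefix)) {
                    log.add("start " + start + ": \"" + alt + "\" fails");
                    continue; // back to the same position, next alternative
                }
                int afterAlt = afterPrefix + alt.length();
                if (input.startsWith(suffix, afterAlt)) {
                    log.add("start " + start + ": match");
                    return true;
                }
                log.add("start " + start + ": \"" + alt + "\" matches but \""
                        + suffix + "\" fails, backtracking");
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean matched = find("sc", new String[] {"ored", "ared", "oring"}, "x", "scared", log);
        log.forEach(System.out::println);
        System.out.println("model matched: " + matched);
        // Cross-check with the real engine.
        System.out.println("java.util.regex matched: "
                + Pattern.compile("sc(ored|ared|oring)x").matcher("scared").find());
    }
}
```

Running it logs eight attempts: three alternatives at the first position (including the "ared" match that fails on "x"), then a failed "sc" at each of the five remaining positions. Both verdicts are false.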
An important part of optimizing a regular expression is minimizing the amount of backtracking it does. Because of backtracking, regular expressions in real-world application scenarios can take a long time to match completely. Worse, the engine often takes much longer to declare that a regular expression does not match an input string than to find a successful match, because it must exhaust every possibility before giving up. This is an important fact to remember.
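The asymmetry between success and failure can be made measurable with a step-counting model. The sketch below assumes a hypothetical mini-syntax (literal characters, ".", and a greedy "c*") and a hypothetical FailureCost class; it is a simplified backtracking matcher, not java.util.regex, whose internal optimizations differ. A "step" here is one call to matchHere, used only as a rough proxy for engine work.

```java
public class FailureCost {
    private final String regex; // hypothetical mini-syntax: literals, '.', greedy "c*"
    private final String text;
    private long steps;

    FailureCost(String regex, String text) {
        this.regex = regex;
        this.text = text;
    }

    private static boolean charMatches(char p, char c) {
        return p == '.' || p == c;
    }

    // Tries to match regex[p..] at text[t..]; every call counts as one step.
    private boolean matchHere(int p, int t) {
        steps++;
        if (p == regex.length()) {
            return true;
        }
        char pc = regex.charAt(p);
        if (p + 1 < regex.length() && regex.charAt(p + 1) == '*') {
            // Greedy star: consume the longest possible run first...
            int end = t;
            while (end < text.length() && charMatches(pc, text.charAt(end))) {
                end++;
            }
            // ...then give characters back one at a time (backtracking).
            for (int i = end; i >= t; i--) {
                if (matchHere(p + 2, i)) {
                    return true;
                }
            }
            return false;
        }
        return t < text.length() && charMatches(pc, text.charAt(t)) && matchHere(p + 1, t + 1);
    }

    // Unanchored search, like Matcher.find(): try every start position.
    boolean find() {
        for (int start = 0; start <= text.length(); start++) {
            if (matchHere(0, start)) {
                return true;
            }
        }
        return false;
    }

    long steps() {
        return steps;
    }

    public static void main(String[] args) {
        FailureCost hit = new FailureCost(".*x", "a".repeat(999) + "x");
        FailureCost miss = new FailureCost(".*x", "a".repeat(1000));
        System.out.println("match: " + hit.find() + " in " + hit.steps() + " steps");
        System.out.println("match: " + miss.find() + " in " + miss.steps() + " steps");
    }
}
```

On these 1000-character inputs the successful search finishes in 4 steps, while the failing one takes 502502 steps: for every start position, the greedy ".*" runs to the end and then gives back every character before the engine can move on.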
Understanding and optimizing greedy and reluctant quantifiers in a regular expression are essential. A greedy quantifier such as "*" or "+" first tries to match as many characters of the input string as possible, even if this leaves too few characters to match the rest of the regular expression. When this happens, the greedy quantifier backtracks, giving back characters until an overall match is found or until there are no more characters to give back. A reluctant (or lazy) quantifier such as "*?" or "+?", on the other hand, first tries to match as few characters as possible and consumes more only when the rest of the expression fails to match. For example, on the input string "abcab", the greedy expression "a.*b" matches the whole string "abcab", whereas the reluctant expression "a.*?b" matches only "ab".
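The practical effect on variable extraction can be shown with java.util.regex, which supports both greedy ".*" and reluctant ".*?". In this sketch the GreedyVsReluctant class, its firstGroup helper, and the sample response fragment are hypothetical; the point is that the same pattern shape extracts a different value depending on the quantifier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical response fragment with two values in the same tag.
        String response = "<b>one</b><b>two</b>";
        // Greedy: ".*" first runs to the end of the input, then gives back
        // characters until the LAST "</b>" can match.
        System.out.println(firstGroup("<b>(.*)</b>", response));  // one</b><b>two
        // Reluctant: ".*?" grows one character at a time and stops at the
        // FIRST "</b>".
        System.out.println(firstGroup("<b>(.*?)</b>", response)); // one
    }
}
```

Besides capturing the wrong value, the greedy form scans to the end of the response before backtracking, which matters when responses are large.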
To optimize a sub-expression like ".*a":