#NeotysPAC – Databases Done Right, by Srivalli Aparna

 [By Srivalli Aparna]

Typically, performance testing is an attempt to simulate production like load conditions on the System Under Test to assess its performance attributes such as user experience, stability and scalability.

However, one of the overlooked aspects in this exercise is the significance of a production-like test environment for obtaining realistic performance results. Performance testers often compare the hardware specs between “Test” and “Production” servers to verify if the test environment is set up “production-like” – but don’t take it any further.

In my Neotys PAC presentation, I explored how it’s not just hardware specs that matter when it comes to setting up a production like test environment, it’s the software configuration as well that matters.

Some examples of software configuration include connection pool size, JVM heap size, or garbage collection algorithm. Each of these play a key role in the performance of the application.
In this talk, I have solely focused on the database tier and presented some tips on how to set up a production like database environment for the performance testing to avoid any false performance

“The exact same query can perform very differently in two different environments, in spite of having same hardware specs”

Databases are smart

Every database has a component called an optimiser which performs a lot of analysis to determine the most efficient way to execute a SQL statement.
During this process, optimiser makes a lot of crucial decisions such as selecting the right join method to combine two or more tables, choosing the best order to join those tables, choosing the type of physical access method to retrieve data from a table (example using an index vs full table scan).
It is important for us, as performance testers, to understand how this works, so that we don’t accidently influence the optimiser’s decisions and distort the load test results.

Factors influencing query performance

In my presentation I have touched upon various factors that could change the way a query performs from one environment to the other:

  • Data volumes
  • Nature of test data
  • Optimiser stats
  • Database parameters
Data Volumes

The more data present in a database, the more work the database must do to perform any read or write operations. The performance of a query running against a table with 100 records can be very different from that of 100,000 records.

It’s always a good practice to make sure that the data volumes in the performance test environment are
close to that of production. This can be done by:

  • Comparing the schema sizes between the two environments
  • Verifying how recent the test data is
  • Or by running a simple “Select Count…” query on the key tables

So, what to do if the numbers are not close enough?
There are a couple of options here:
1) Get a copy of data from the production and import it into the test environment
2) If that is not possible, then creating our own test data.

The second option is less recommended as it could lead to other issues as we will see in a moment.

Nature of test data

This factor is relevant when we are artificially pumping data into the test database. It’s not always the quantity of test data but quality matters too.
For example, in an insurance system, whether a policy has 5 claims against it or 100 claims could impact the number of records fetched. During SQL execution, the number of records returned by an interim database operation will have a huge impact on the amount of work to be done by the subsequent operations. In database terms, this is referred to as “cardinality” and it is important to not skew this while doing any data population activities.

The best practice while artificially injecting load test data is to keep an eye on:

  • The number of nulls in the key columns
  • The number of distinct values
  • Highest and lowest values in the numeric columns
  • Distribution of values (skewness in the data)
Optimiser stats

Optimiser statistics are a collection of data that describe more details about the database and the objects in the database. These statistics are used by the query optimizer while choosing the best execution plan for each SQL statement.
Missing/stale statistics feed insufficient/incorrect information to the optimiser and often result in poor execution plans. Therefore, it is important to make sure that the automated stats collection job is set up in the performance test environment. If required, manually collect the stats after any bulk test data preparation.

Database parameters

Every database has parameters configured to be executed at initialization. Some of these parameters have a considerable impact on the query performance.
For instance, in spite of having the same Physical memory in two different environments, if the database memory parameters are set differently, we will see completely different query performance. In such a situation, the performance test results may not mean much in terms of expected performance in the production environment.

It is worth doing a quick check to compare the key parameters, if not all of them. The key ones are:

  • Memory parameters
  • Parallel processing parameters
  • Cursor parameters

This is by no means an exhaustive list, but hopefully it makes us think in the right direction and enables us conduct better performance testing.

Learn More about Srivalli Aparna’s presentation

Do you want to know more about this event? See Srivalli’s presentation here.

Leave a Reply

Your email address will not be published. Required fields are marked *