Beyond creating indexes – Part 2

Indexes are not the only alternative when optimizing queries!

On part 2 of this blog post series we are going to focus on sub queries/views/common table expressions and what can be done to avoid unnecessary work from the database server.

Views = sub queries = common table expressions (CTE)

MySQL/MariaDB calls views, common table expressions and sub queries in the WHERE clause, derived tables: it’s a fair name as they can all be used like one. The only difference is that the dataset is actually the result of a query or expression.

The database server uses two main strategies to deal with derived tables: it looks for opportunities to merge it with the outer query or, it stores the result on a temporary table so the outer query can read from it.

Merge vs Temporary table

The MySQL optimizer (the server component that decides how a query should be resolved) will check which strategy is feasible and apply the most optimal one (to the best of its knowledge). The merge approach is usually better as the tables in the subquery/view/CTE are joined and treated as any other table in the query. Take a look at the example below:

The server realizes that the sub query is just a SELECT from table products that it is then joined with orders in the outer query. You can see in this case that executing SELECT id, description FROM products and storing the result in a temporary table could be expensive (will probably end up in a full table or index scan) and kind of pointless. Instead, the server process the query in the following way:

It merges the sub query expression into the outer query, resulting in a straight join between orders and products

As we will see later, not all sub queries/views/CTEs can be merged. In these cases, the only option is executing the SQL in the sub query/view/CTE and storing the result in temp table. This is called materialization.

To confirm if a derived table is being materialized (and thus, not merged), you can use use EXPLAIN FORMAT=JSON. You will see a node in the output called materialized_from_subquery which will include the operations required to satisfy the derived table (tables access, joins, group by, etc).

Unfortunately, for MariaDB, the EXPLAIN output is not that clear. When materialization happens you will see a temporary_table node.

On the other hand, if the view was merged, you will see that the tables contained in the sub query / view / CTE are not under a materialized_from_subquery section and are joined with the outer query tables instead.

Condition push down

MySQL / MariaDB developers are always looking for ways to make the optimizer smarter so it can detect optimization opportunities automatically. If the server has to materialize the derived table (store the result in a temp table) , it could move any conditions in the outer query affecting the derived table into the sub query/view/CTE execution. The goal is to avoid retrieving data that will not be required by the outer query anyway.

Look at this basic example:

When the server materializes the view during execution, it will be computing totals for all flights. But the outer query is just interested in flight with ID 1332. You can see how much execution time and resources could be saved if the view only knew this. This is where condition push down saves the day and moves the flight_id= 1332 condition into the view query. The server ends up running this query when materializing the view:

Derived table condition push down was introduced in MariaDB version 10.2 and MySQL version 8.0.22

Common performance problems and ideas to prevent them

When the following operations are present in our sub query /view / CTE, the server will not be able to merge it with the outer query.

  • Aggregate functions or window functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
  • DISTINCT
  • GROUP BY
  • HAVING
  • LIMIT
  • UNION / UNION ALL
    Full list here

If you combine this with the query defining the derived table reading, joining and/or sorting too many rows, you get one of the most common problems around derived tables. What can we do?

  • First thing to ask ourselves: is there a chance of moving / removing any of the clauses forcing materialization from the sub query / view / CTE? I know it’s probably hard, specially when dealing with views that are being used for other purposes. If it is a sub query or a CTE, the chances of reshaping the SQL are higher.
  • If we can’t escape materialization, it is worth performing a query tuning exercise over the sub query / view / CTE itself (in isolation) to see where the bottleneck is.

This could happen either because:

  1. The server version is previous to 8.0.22 (if running MySQL) or previous to 10.2 (when working with MariaDB)
  2. Or because there are operations blocking this optimization. Below is a list of the most common ones:
  • Before 8.0.29, the use of UNION
  • LIMIT
  • Conditions containing sub queries
  • If a materialized derived table is a CTE, conditions are not pushed down to it if it is referenced multiple times
    Full list here

If the problem is the server version, you can implement “sort of” a manual condition push down by adding variables to the sub query and outer query conditions, and setting those values with each execution case. If you want me to create a detailed post about this technique, please leave a comment below!

In most cases, the ORDER BY clause in a derived table is ignored. However, there is an automatic optimization that will actually propagate the ORDER BY to the outer query. This will happen if all these conditions are true:

  • The outer query is not grouped or aggregated.
  • The outer query does not specify DISTINCTHAVING, or ORDER BY
  • The outer query has this sub query / view / CTE as the only source in the FROM clause

Again, EXPLAIN FORMAT=JSON is your friend and will tell you is this is happening (a.k.a you will see an ORDER BY at the top of the output affecting the entire query result and not just the sub query). If you would like more details about this, please leave me a message below!

As opposed to what many people believes, CTEs that can’t be merged are materialized once for every query they are used in.

If what you are looking is to amortize the materialization of a derived table across multiple executions, create a temporary table explicitly by issuing:

CREATE TEMPORARY TABLE AS query

You can then use it as a regular table for the duration of the session. The server will drop the temporary table automatically when the session is closed.

Bonus tip: you can actually add indexes to the temporary table (yeah, I know, it’s against the blog series tittle, but not the main recommendation and too good not to share!). Example:

A special word about Views..

CREATE VIEW supports an ALGORITHM parameter which can be used to enforce materialization or merging, in case it is feasible.

If you don’t provide a value, it will default to favor merging when possible. More details about this clause here.


Leave a Reply

Discover more from query-optimization.com

Subscribe now to keep reading and get access to the full archive.

Continue reading