
On part 2 of this blog post series we are going to focus on sub queries/views/common table expressions and what can be done to avoid unnecessary work from the database server.
Views = sub queries = common table expressions (CTE)
MySQL/MariaDB calls views, common table expressions and sub queries in the WHERE clause, derived tables: it’s a fair name as they can all be used like one. The only difference is that the dataset is actually the result of a query or expression.
The database server uses two main strategies to deal with derived tables: it looks for opportunities to merge it with the outer query or, it stores the result on a temporary table so the outer query can read from it.
Merge vs Temporary table
The MySQL optimizer (the server component that decides how a query should be resolved) will check which strategy is feasible and apply the most optimal one (to the best of its knowledge). The merge approach is usually better as the tables in the subquery/view/CTE are joined and treated as any other table in the query. Take a look at the example below:
SELECT *
FROM orders JOIN (SELECT id,description FROM products) AS prods ON orders.product_id = prod.id
WHERE orders.date > '2023-08-01'
The server realizes that the sub query is just a SELECT from table products that it is then joined with orders in the outer query. You can see in this case that executing SELECT id, description FROM products and storing the result in a temporary table could be expensive (will probably end up in a full table or index scan) and kind of pointless. Instead, the server process the query in the following way:
SELECT orders.*, products.description
FROM orders JOIN products
ON orders.prod_id = products.id
WHERE orders.date > '2023-08-01';
It merges the sub query expression into the outer query, resulting in a straight join between orders and products
As we will see later, not all sub queries/views/CTEs can be merged. In these cases, the only option is executing the SQL in the sub query/view/CTE and storing the result in temp table. This is called materialization.
How do we know which approach was used for a query?
To confirm if a derived table is being materialized (and thus, not merged), you can use use EXPLAIN FORMAT=JSON. You will see a node in the output called materialized_from_subquery which will include the operations required to satisfy the derived table (tables access, joins, group by, etc).
Unfortunately, for MariaDB, the EXPLAIN output is not that clear. When materialization happens you will see a temporary_table node.
On the other hand, if the view was merged, you will see that the tables contained in the sub query / view / CTE are not under a materialized_from_subquery section and are joined with the outer query tables instead.
Condition push down
MySQL / MariaDB developers are always looking for ways to make the optimizer smarter so it can detect optimization opportunities automatically. If the server has to materialize the derived table (store the result in a temp table) , it could move any conditions in the outer query affecting the derived table into the sub query/view/CTE execution. The goal is to avoid retrieving data that will not be required by the outer query anyway.
Look at this basic example:
CREATE VIEW flight_totals
select
flight_id,
SUM(amount) as total
from booking
where booking_date BETWEEN '2023-08-01' AND '2023-08-15'
GROUP BY flight_id;
SELECT total FROM flight_totals WHERE flight_id= 1332;
When the server materializes the view during execution, it will be computing totals for all flights. But the outer query is just interested in flight with ID 1332. You can see how much execution time and resources could be saved if the view only knew this. This is where condition push down saves the day and moves the flight_id= 1332 condition into the view query. The server ends up running this query when materializing the view:
SELECT
flight_id,
SUM(amount) as total
FROM booking
WHERE booking_date BETWEEN '2023-08-01' AND '2023-08-15' AND flight_id= 1332 <<< Outer query condition!
GROUP BY flight_id;
Derived table condition push down was introduced in MariaDB version 10.2 and MySQL version 8.0.22
Common performance problems and ideas to prevent them
Expensive derived tables that can’t be merged
When the following operations are present in our sub query /view / CTE, the server will not be able to merge it with the outer query.
- Aggregate functions or window functions (
SUM(), MIN(), MAX(), COUNT(), and so forth) DISTINCTGROUP BYHAVINGLIMITUNION / UNION ALL
Full list here
If you combine this with the query defining the derived table reading, joining and/or sorting too many rows, you get one of the most common problems around derived tables. What can we do?
- First thing to ask ourselves: is there a chance of moving / removing any of the clauses forcing materialization from the sub query / view / CTE? I know it’s probably hard, specially when dealing with views that are being used for other purposes. If it is a sub query or a CTE, the chances of reshaping the SQL are higher.
- If we can’t escape materialization, it is worth performing a query tuning exercise over the sub query / view / CTE itself (in isolation) to see where the bottleneck is.
Outer query conditions could not be pushed down
This could happen either because:
- The server version is previous to 8.0.22 (if running MySQL) or previous to 10.2 (when working with MariaDB)
- Or because there are operations blocking this optimization. Below is a list of the most common ones:
- Before 8.0.29, the use of
UNION LIMIT- Conditions containing sub queries
- If a materialized derived table is a CTE, conditions are not pushed down to it if it is referenced multiple times
Full list here
If the problem is the server version, you can implement “sort of” a manual condition push down by adding variables to the sub query and outer query conditions, and setting those values with each execution case. If you want me to create a detailed post about this technique, please leave a comment below!
ORDER BY from derived tables are propagated to the outer query unnecessarily
In most cases, the ORDER BY clause in a derived table is ignored. However, there is an automatic optimization that will actually propagate the ORDER BY to the outer query. This will happen if all these conditions are true:
- The outer query is not grouped or aggregated.
- The outer query does not specify
DISTINCT,HAVING, orORDER BY - The outer query has this sub query / view / CTE as the only source in the
FROMclause
Again, EXPLAIN FORMAT=JSON is your friend and will tell you is this is happening (a.k.a you will see an ORDER BY at the top of the output affecting the entire query result and not just the sub query). If you would like more details about this, please leave me a message below!
Using CTE in more than one query
As opposed to what many people believes, CTEs that can’t be merged are materialized once for every query they are used in.
If what you are looking is to amortize the materialization of a derived table across multiple executions, create a temporary table explicitly by issuing:
CREATE TEMPORARY TABLE AS query
You can then use it as a regular table for the duration of the session. The server will drop the temporary table automatically when the session is closed.
Bonus tip: you can actually add indexes to the temporary table (yeah, I know, it’s against the blog series tittle, but not the main recommendation and too good not to share!). Example:
CREATE TEMPORARY TABLE tempData
(
data_id BIGINT,
date DATETIME,
event VARCHAR(100),
......
INDEX idx_date(date)
);
A special word about Views..
CREATE VIEW supports an ALGORITHM parameter which can be used to enforce materialization or merging, in case it is feasible.
If you don’t provide a value, it will default to favor merging when possible. More details about this clause here.
By the way, we built a tool that helps you detecting performance problems around sub queries among many others.. Try for free!

Leave a Reply