Beyond creating indexes

On the last post of this blog series, we are going to explore how partitioning can be used to speed up rows lookup and to keep our large tables size under control

Disclaimer

This is not a typical post covering all things partitioning. We are going to be discussing partitioning, but from a query optimization standpoint. For more details about partitioning types, commands and limitations check the following links:

A brief overview

Partitioning is a database server feature that allows you to split a table into smaller pieces, based on a condition. This condition involves table columns and can have different forms (columns matching a given value, within a range, a set of values, etc). The column or combination of columns by which the table will be divided is called partitioning key.

_{^{Table inventory is partitioned by the WH column}}

The purpose of partitioning is breaking up a table so:

Data can be found faster
Table maintenance is simplified

There are, however, two gotchas with partitioning, being the latter specific to MySQL/MariaDB:

For partitioning to improve data lookup, you should include the partitioning key on your queries
Because indexes are also split with the table, there is no easy way for MySQL to enforce uniqueness across partitions. Thus, all primary and unique keys for a table should include the partitioning key.

Partitioning and data lookup

We said that queries should include the partition key to enjoy the benefits of partitioning in data lookup. This is because the server will rely on it to only read the partitions that match the conditions specified for the partitioning key. This is called partition pruning and becomes an initial data filtering step, performed before standard indexes are leveraged.

The partition key is key

When choosing the partitioning key for data lookup performance, we need to think of a column (or combination of them) that will benefit most of our core queries. At the same time, it should split that table into a relatively manageable amount of partitions.

Considering the example in the diagram above, if your inventory queries are frequently including a warehouse ID, and you have millions of products in each location, partitioning can easily discard 2/3 of the table.

Furthermore, you can still leverage any available indexed columns for that remaining 1/3 of the table for the selected warehouse!

Partitioning VS composite indexes

Someone could argue that partition pruning + index lookup is equivalent to perform an index scan on a composite index which includes the partitioning key. Consider the following:

With partitioning, you get more flexibility when creating indexes: the partitioning key can always be leveraged, regardless of the index selected by the server. You don’t need to include the partition key on any index or consider its position.
Partition pruning is instantaneous: the server will just ignore the partitions that don’t match the partitioning key condition. There is no scan operation involved
Partitioning keeps indexes smaller as you don’t need to include the partitioning key. Because the index key is smaller, you effectively read less data
Starting with MySQL 8.0.17, partitions can be scanned in parallel

Partition partitions (sub partitions)

You can take the partitioning data filtering benefits one step further and “partition” a partition. You should choose another column that might be beneficial for your queries. Take a look at the following example:

CREATE TABLE purchases (id INT, purchased DATE,
    PRIMARY KEY (id,purchased))
    PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( MONTH(purchased) )
    SUBPARTITIONS 2 (
         PARTITION p0 VALUES LESS THAN (1990),
         PARTITION p1 VALUES LESS THAN (2000),
         PARTITION p2 VALUES LESS THAN MAXVALUE
    );

We are first partitioning the table based on the decade (YEAR(purchased)) a purchased was made, to then split that decade into a sub-partition for each month.

A caveat with sub partitions is that only RANGE and LIST partitions can be “subpartitioned”, and only HASH and KEY partitioning can be used in this case.

Partitioning for data life cycle management

Another good use of partitioning is to simplify data archival / deletion. Because partitioning separates the data at the file level, we can easily get rid of all data matching a given partition without having to use DELETE (and thus, resorting to expensive scan operations)

Because TRUNCATE is a table-level operation and partitions behave like smaller tables, data deletion is instantaneous. Imagine the following scenario:

“We only want to keep data for the last two quarters online and remove anything older. Below is the current table structure”

CREATE TABLE payments (id BIGINT, payment_date DATETIME, amount DECIMAL(13,2), PRIMARY KEY (ID));

Let’s use partitioning to simplify this quarterly data cleanup task. Because we need a date reference, we are going to use payment_date for the partitioning key. We need to:

Include the column selected as the partitioning key into the primary key (because of the limitation mentioned above)
Partition in such a way partitions are organized based on Qs

ALTER TABLE payments 
DROP PRIMARY KEY,
ADD PRIMARY KEY (id,payment_date) 
PARTITION BY LIST (MONTH(payment_date)) 
( 
PARTITION q1 VALUES IN (1,2,3), 
PARTITION q2 VALUES IN (4,5,6), 
PARTITION q3 VALUES IN (7,8,9), 
PARTITION q4 VALUES IN (10,11,12)
);

That’s it. To be completely fair, that is going to be an expensive, blocking operation, specially on huge tables. Builtin Online DDL only supports reducing lock to write operations.

In this cases it is probably better to resort to pt-online-schema-change, but you get the idea.

Now, to remove all data for Q1 you just need to:

ALTER TABLE payments TRUNCATE PARTITION q1;

Potential locking by sessions aside, the data purge is instantaneous as MySQL is just removing a file under the hood.

Wrapping up

Because partitioning separates data on a table at the file level, it can be leveraged to speed up data lookup and data archival. The server will only focus on the relevant partition or partitions, based on the conditions provided for the columns that make the partitioning key.

Partition pruning is done before indexes are accessed, so it effectively constitutes an additional filtering operation and a very efficient one.

Partitioning advantages are more evident on very big, core tables such as customers, purchases, products, orders, etc. It is wise to partition tables early in the design process, specially when we know they are going to grow massively and when partition keys are evident.

By the way, we built a tool that helps you detecting performance problems. Try for free!

Beyond creating indexes – Part 3B

Disclaimer

A brief overview

Partitioning and data lookup

The partition key is key

Partitioning VS composite indexes

Partition partitions (sub partitions)

Partitioning for data life cycle management

Wrapping up

Share this:

Like this:

Leave a ReplyCancel reply

Discover more from query-optimization.com