Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough

MySQL - Query optimization

Parent Feed: 

Queries are the centerpiece of MySQL and they have high optimization potential (in conjunction with indexes). This is specially true for big databases (whatever big means). Modern PHP frameworks tend to execute dozens of queries. Thus, as a first step, it is required to know what the slow queries are. A built-in solution for that is the MySQL slow query log. This can either be activated in my.cnf or dynamically with the --slow_query_log option. In both cases, long_query_time should be reduced to an appropriate value. Most Linux distributions come up with a default value of 1 or more seconds. But this turns out too slow for web applications as you want to achieve an overall response time of a few hundreds of milliseconds. So depending on your needs of performance choose a value of 0.1 or 0.01 seconds.

SQL consists of 2 different types of queries: Those who belong to the Data definition language (DDL) and those who are working with data (Data manipulation language, DML). DDL queries have usually no performance implications. But there is an exception of this rule of thumb: ALTER TABLE statements can be very time-consuming, if a table contains millions of records and uses (unique) indexes. We will cover a practice in a minute. DML queries again can be divided into INSERT statements and other CRUD statements (SELECT, UPDATE and DELETE) on the other hand. Those statements can be optimized with several techniques. Most of this blog post will address this type of statement.

Optimizing ALTER TABLE statements

Imagine you have an accounts table with millions of records and you want to extend it with a field for a phone number. A direct execution of ALTER TABLE would certainly lead to major load. The trick is to avoid index ad-hoc re-calculation. Hence, we drop all indexes and copy the table to an extra table and perform structural changes there.

1. Set innodb_buffer_pool_size appropriately (Be aware: For performing structural changes, a high buffer pool size can speed up things; Being live however, a high size will lead to memory shortages)

2. (Optional) Backup the database

3. Drop all indexes except primary key and foreign keys

DROP index ... 

4. Copy the table and apply structural changes. Use a similar name, for example with suffix '_new'.

CREATE TABLE IF NOT EXISTS `Accounts_new`

  id` int(11) NOT NULL AUTO_INCREMENT,

  `email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,

  `city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,

  PRIMARY KEY (`id`),

  ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;

  ALTER TABLE `Accounts_new` ADD `phone` VARCHAR(255 ) NOT NULL;

5. Copy data with INSERT INTO ... SELECT. Just select the columns that are used in the new table.

INSERT INTO Accounts_new SELECT `id`, `email`,  `city`, null FROM Accounts;

6. Rename the table. In case of used foreign keys disable the foreign key check.

SET foreign_key_checks = 0;

  DROP TABLE Accounts;

  ALTER TABLE Accounts_new RENAME Accounts;

  SET foreign_key_checks = 1;

7. Create all indexes including foreign keys.

CREATE index ...

Two steps require major efforts. First, copying all the data to the new table will take some time; Second, rebuilding all indexes can last a long time (it depends on the number of indexes and whether they are unique or not).

Optimizing insertions

INSERT queries should be merged, if possible. A single query that creates 10 rows is faster than 10 sole queries. However, this technique has its limits, especially, if MySQL runs out of memory. If you want to import a whole database, then you can switch off some consistency checks, for example foreign_key_checks=0, unique_checks=0. Moreover, autocommit=0 can also help.

Optimizing SELECT statements

SELECT, UPDATE and DELETE statements have one thing in common: It is the way they filter results (with the WHERE clause). This can turn out as a complex task, especially for big tables. Big means tables having a row count from 100 000. Tables having more than one million rows should definitely be included into query optimization. For the sake of simplicity, we concentrate on SELECT queries. It is the most frequently used case anyway.

1) Use EXPLAIN

If you want to optimize your query, you should know how MySQL executes it. You can use EXPLAIN to get the query execution plan. With MySQL Version 5.6 it is possible to use explain for insert, update and delete statements.

EXPLAIN SELECT * FROM Users WHERE uid = 1; 

The result contains several useful informations about the query:

column description
select_type  Is the query a simple query (primary) or is it a compounded query (join or subquery)?
type This is extremely important for joins or subqueries: How is this query joined? The best types are: const, ref, eq_ref. Worse types are: range, index, all. Attention: do not mix up index with ref/eq_ref! For further informations, please visit the MySQL docs.
possible keys A list of indexes which could be used to optimize the speed of the query.
key The used index
key_len The length of the index. Shorter indexes tend to perform better.
ref Which column is used for the index scan?
rows Estimated number of rows that have to be compared with the filter criteria. This number should be as low as possible.
extra Additional information about the query. Attention: Do not mix up Using index with ref/eq_ref!

If the query is a simple query (i.e. no joins or subqueries are used), then EXPLAIN will return a single line where select_type is set to SIMPLE. To get a good performance, it is important to use an existing index. This is the case when type is equal to ref and possible_keys and key suggest an index.

If joins are used, the returned result will contain a line per table. Joining tables should always be done by a foreign key comparison. In this case the type of an EXPLAIN is eq_ref. Avoid to leave out foreign keys. Try to avoid joins on different attribute types, for instance a varchar field and an integer field. This will make MySQL do a lot of type conversions which is simply not good.

2) Use existing indexes

Indexes are ordered by (at least) an attribute by design. Thus, they can be applied to queries which are filtering by this attribute, either as exact filter (WHERE x = 'y') or as range query (WHERE timestamp >= 123). Indexes are not applicable if you use any function in the WHERE clause, for instance WHERE SUBSTR(name, 0, 4) = 'Alex'. The following list shows which WHERE clauses can be handled by indexes:

WHERE x = 'y' 
WHERE timestamp >= 123
WHERE timestamp BETWEEN 123 AND 456
WHERE name LIKE ('Ale%')
WHERE name LIKE ('%Ale%')  
WHERE SUBSTR(name, 0, 4) = 'Alex'

If you have more than one filter criterion in the query, your index should include all used columns as well. Imagine you have the following indexes: name_IDX, firstname_IDX, firstname_name_IDX and name_firstname_IDX. Then the query

# Using composite indexes

SELECT * FROM Users WHERE firstname = 'Alex' AND name = 'Saal'

... could be optimized with firstname_IDX, firstname_name_IDX but not with name_firstname_IDX because of the order of the columns! The order has to be the same in the query as well as in the index. It is like using a telephone book. A telephone book is ordered by last name, then by first name. It is much more easy to first look for all persons with the desired last name and have a list with only a few persons. It does not make sense at all to browse the whole telephone book looking for people with a wanted first name and then comparing the last name in step 2.

Keeping this image in mind: It is always good to have a selective index. You can use an index which includes a gender of a customer. But this reduces the data set only by a half. Instead, it is much more pleasant to have an index like e-mail address or a unique index like Social Security Number. Be selective! As a rule of thumb, there are 3 levels of selectivity:

• Primary key or unique key (best; those clauses will return a single row immediately)

• An index matching the WHERE clause, or a prefix index (useful for text fields)

• No key or index is applicable (worst)

Furthermore, firstname_name_IDX matches better than firstname_IDX and will be preferred by MySQL. Note that firstname_name_IDX can also be used for queries like

# Filtering the first name

SELECT * FROM Users WHERE firstname = 'Alex'

It is therefore neither necessary nor recommended having both indexes created simultaneously.

The indexes are always read from left to the right. If you have an index containing multiple columns - index names (column_firstname, column_familyname) - the order of your filters in the query should follow the same order. Otherwise the index can not be used. So if you filter without using the first column (column_firstname is not used in the query) in the index, but assuming that the filter is also used by just filtering for the second column (column_familyname) in the index, the index is not used. Therefore it is sometimes better to add a second index using just the second column. Check the statement by using EXPLAIN to check which index is used or not. For examples see the chapter about table indexes below.

3) Or Statements

The mysql query optimizer can not use indexes if the OR statement is used, so try to avoid OR statements!

4) Optimization of GROUP BY/ORDER BY queries

Sometimes you are facing queries that aggregate or sort rows:

# GROUP BY / ORDER BY

SELECT role, count(*) FROM Users WHERE registration_ts > 140000000 GROUP BY role;

SELECT id, username FROM Users WHERE registration_ts > 140000000 ORDER BY username;

What MySQL does is:

1. Selection of Users by WHERE registration_ts > 140000000,

2. Order results of step 1 (no matter if GROUP BY role or ORDER BY username is used)

3. Projection to the desired columns by SELECT role or SELECT id, username

The hardest step is sorting. This is where indexes can help a lot. They contain a sorted list of records dependent to their definition. This is extremely helpful in particular if you have a lot of data in that table (Complexity of sorting algorithms is O(n*log(n))). How to define the index to optimize this query? Choose first the column filtered in the WHERE clause, then those in GROUP BY/ORDER BY (in the same order as in the query!). If it is possible to add the columns of SELECT to the index (after the columns of GROUP BY/ORDER BY) to gain some performance (this technique is called covering index). It is not always reasonable to use covering indexes: If the whole index gets too big, then you probably won't gain any time.

Extending the example of a telephone book: It is helpful, if you have requests like "Tell me how many persons have the last name 'Smith'" (This is a GROUP BY) or "Give me a list of all persons ordered by last name and first name" (ORDER BY).

In the previous example use the following indexes:

• registration_role_IDX for the GROUP BY statement

• registration_username_IDX for the ORDER BY statement

5) Usage of subqueries

When it comes to complex queries, MySQL (especially before 5.6) is optimized for using JOIN statements. However, in some cases a subquery can be more efficient if you use both GROUP BY and ORDER BY on different tables. In that case, an index cannot be used, if you join the tables. Defining a main query and subquery avoids this problem, as each query acts on its own table and is able to use any available index.

# Case A: Query as INNER JOIN

SELECT

    a.id AS account_id,

    p.id AS product_id,

    TRIM(SUBSTRING(p.name, 1, 30)) AS product_name,

    COUNT(*) AS count

FROM Accounts a

INNER JOIN Orders o ON a.id = o.account_id

INNER JOIN Products p ON p.id = o.product_id

GROUP BY p.id

ORDER BY a.id

# Case B: Subquery

SELECT account_id, product_id, product_name, count

FROM (SELECT

    a.id AS account_id,

    p.id AS product_id,

    TRIM(SUBSTRING(p.name, 1, 30)) AS product_name,

    COUNT(*) AS count

  FROM Accounts a

  INNER JOIN Orders o ON a.id = o.account_id

  INNER JOIN Products p ON p.id = o.product_id

  GROUP BY p.id) as product

ORDER BY account_id

In that case, the query has been split up to an outer query and a subquery (line 2-10). Case A would make MySQL create a temporary table and use filesort. Case B can avoid that. It depends on the size of each table, which way is superior.

Original Post: 

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web