Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough

MySQL - Setup

Parent Feed: 

When setting up a MySQL Server there are a lot of things to consider. Most requirements depend on the intended usage of the system.

Beginning with the hardware requirements, this series of blogposts consists of our own experience, operating several hundred MySQL instances as well as a recent workshop our team attended. Before you now start optimizing your MySQL instance, consider the following: Standards are more important in multi-instance setups than individual performance tuning. Standards allow for easier debugging and give you a better handling for security related incidents. We strongly recommend the use of configuration management tools, like puppet or Chef for environments that are supposed to be consistent over all your systems.

The MySQL default settings, as of version 5.6 are great - and apart from larger systems, hardly any changes are necessary. For a standard LAMP stack, hosting a blog or a small website, not a lot of optimization is needed. Todays CMS come with sufficient configurations and tampering with the back end itself can cause more harm than it will help optimizing performance. These articles are mostly interesting, if you are developing your own applications. This first article focuses on the design of the underlying platform. It makes an estimated 10%-20% of the possible performance tuning. Nevertheless, if your application is poorly designed or it is not optimized, it will become quite expensive to try and fix the system the hardware-way. So, the first step is always to ensure a well structured and optimized application. We will address this in a future article.


Hardware / Operation system

Here are some general hardware considerations, viable for most setups:

CPU

MySQL does NOT scale a single query to multi-cores. (one query = one core). So it is better for your instance to have less - but faster - cores than vice versa. Try to avoid multi-CPU (note: multiple physical CPU, not cores), systems with a lot of ram. Usually CPU's have a direct bus to their RAM-banks. If RAM on a bank held be another CPU needs to be addressed, there will be overhead. In high performance setups rather go for a hexa- or octa-core than two quad-core CPU.

RAM

RAM has become a cheap resource these days. Use as much RAM as possible but it should match the database-size and have some extra space for the operating system and the application if existent on the same machine. If the database is served mostly or completely from RAM, it tends to have fast read speeds and only needs to write to disk, not to read from them, in most cases. This will be a very huge performance optimization, because operations in the RAM are easily 100 times faster than an SSD.

HDD

If you are using conventional, magnetic storage, faster turning disks are preferred. Try to go for 15k/RPM+. SSD or solid state drives are obviously even better than that and outperform every other storage medium at random IOPS (input output per seconds). For the RAID setup, go for RAID10 (better) or RAID1 (cheaper) but avoid RAID5/6 etc because the performance is bad at random writes. The less read/write operations from disk the better the performance.

File system

If you are not sure about the file system, go with the recommended type of your operating system. In general that means: Stay with ext4. In the very unlikely event of file system problem, the most common will make debugging a lot easier and in the long term, this outweighs the possible performance gain of other file systems by far.

I/O scheduler

The standard scheduler on a lot of operation systems is CFQ (Completely Fair Queuing). For databases, consider switching to NOOP or Deadline. Find out the scheduler your system is using

cat /sys/block/sda/queue/scheduler

This will give you an output like

noop [deadline] cfq

where the used scheduler is shown in brackets. As of kernel 2.6.10 it is possible to change the scheduler at runtime. You will need root permissions to do so.

echo noop > /sys/block/sda/queue/scheduler

to set your scheduler to noop. This setting is not persistent and will be reset at reboot. To make a setting permanent, it needs to be given to the kernel as a parameter at boot time.

elevator=<scheduler>

needs to be added, where <scheduler> needs to be noop or deadline.


Storage engine

If starting a new database, especially with MySQL 5.6, InnoDB will be be the standard engine. For already running projects, there is a way to convert MyISAM tables to InnoDB. This is usually not recommended as the performance gain will only impact very high performance databases and there is always a risk involved.

General Setup

Operation system

The decision for the operation system is one of the less important. At Cocomore we stay with the Ubuntu LTS versions. As mentioned in the beginning, systems that are setup the same way make configuration and debugging way easier and improve stability. The problem with LTS systems is that they oftentimes only offer relatively old packages. The default on Ubuntu 14.04, released in April 2014 is MySQL 5,5.

There are a couple of ways to install newer versions. As of February 2015, the newest stable MySQL version ist 5.6. This can be installed from the Ubuntu universe repository. Oracle themselves offer a repository for MySQL as well, you might need to login first, though. We recommend using the latest stable release, as of February 2015, this is version 5.6.


Database Replication

There are two possibilities:
  • statement based - In MySQL 5.1.4 and earlier, binary logging and replication used this format exclusively.
  • row based

Statement based replication stores the SQL statement in the binary log and will be executed on the slave server. This has the disadvantage that for example time updates etc. will also be executed as said within the statement. Because of this, there is a chance of inconsistency between Master and slave because the time or random statement will lead to different results on the different servers.

Row base replication is superior because only the results of a statement (the changed rows), will be stored in the binary logs and then the appropriate rows will be changed to the new result on the slave. But this requires much more disk space, because now not just the sql statement is stored any more but the whole rows which are changed.

The replication method can be changed on the fly, but we recommend the row based replication to avoid inconsistency between Master and Slave. Diskspace became very cheap, so there is no need to use the statement based replication.

As of MySQL 5.1.8, the server can change the binary logging format in real time according to the type of event using mixed-format logging. When the mixed format is in effect, statement-based logging is used by default, but automatically switches to row-based logging in particular cases.

High availability solutions

If your database goes down, your application will not work anymore. Therefore we recommend a high availability setup for your system if your application should stay online although there are some troubles in the background. There are some possibilities to ensure high availability for your database. The simple ones are easier and less expensive as an gallera cluster for example. It depends on the worth of your application how mush time and mone you want to spend to keep your application always online.


Master - Slave setup

  • sell also Master - Slave setup
  • simple asynchronous replication
  • in case of failure, switch to the slave and recover the master
  • switching can be done manually or automatically for example using Heartbeat

Active / passive cluster

  • same as before using a master - slave replication
  • using a vitual IP address which points to the master
  • underlying system is a SAN or DRBD
  • Heartbeat is switching the virtual IP address to the other system (slave) in case of error

mysql_ha_active_passive_cluster.jpg

Gallera cluster

  • see also Gallera cluster
  • Gallera is a cluster system of multiple environments with multiple masters which are all synchronized
  • there is a load-balancer switching the load to the different master databases
  • Gallera is synchronizing the multiple master databases, if statements are executed
  • works as long as the majority of the masters is online and connected to Gallera, therefore it has to be always an odd-numbered amount of Master databases
  • If one Master db crashes the others are still online and the crashed one can be repaired

mysql_ha_gallera_cluster.jpg

Conclusion

The first step for optimization is always your application itself. But there are also other possibilities to tune up your system by choosing the right hardware, the right operations system (OS), the correct file system and the best storage engine of MySQL itself for your application. With newer MySQL Versions new features but also performance improvements are delivered. Currently MySQL Version 5.6 (February 2015) is a very good way to deal with.

To ensure that your application is not going down if your database goes down there are possibilities to ensure high availability using database replications and different setups. It depends on your budget which solutions can be used, but always consider all possibilities of improvements (Hardware, DB tuning, replication or application tuning) and the estimated effect, so you can choose the improvement with the best cost-value ratio.

If you have questions or you need help with your application, please do not hesitate to contact us.

In our next post we will provide you some informations how to improve your MySQL settings in the my.cnf itself.

Other MySQL blog posts

Original Post: 

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web