Oct 24 2015

Configuring Drupal with a remote database is possible and in some cases recommended.

Our EK management tools application is a particularly good case where this setup is very helpful. EK manages the Drupal system configuration database and the content database separately. In other words, the database installed by Drupal 8 and the database where all EK custom modules save their data are different.

This configuration brings a few benefits:

  • security: if system data is compromised (e.g. user login information), the content data, which is of high value for the organization, has less chance of being affected. This is reinforced by the fact that the two databases are physically and geographically separated;
  • backup management: you can run a better backup management cycle with separated databases;
  • updates management: when updating Drupal, having a configuration database distinct from the one where the content is stored makes the update more efficient and faster;
  • as a vendor, you can keep control of the service by retaining control over the system configuration without interfering with the customer's content and information.

The diagram below shows how we can implement an EK installation with a remote database:

remote Drupal database setup

One important point is to keep the connection between the main server (this can be the customer's server, for instance) and the configuration server secured with encrypted communication (SSL).

A good solution (which we are experimenting with now) comes with the Amazon RDS service. The service can supply on-demand access to MySQL databases (or any other database engine) and provides the necessary encryption keys.
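
On the Drupal side, the split can look roughly like the following second connection in settings.php (Drupal 8). This is only a sketch: the connection key 'ek', the host and the certificate path are hypothetical, and it assumes that the database settings accept a 'pdo' array of driver options for the encrypted connection.

$databases['ek']['default'] = array(
  'driver' => 'mysql',
  // content database on the remote server (names are examples)
  'database' => 'ek_content',
  'username' => 'ek_user',
  'password' => 'secret',
  'host' => 'content-db.example.com',
  'port' => '3306',
  'prefix' => '',
  // PDO driver options to force an encrypted (SSL) connection
  'pdo' => array(
    \PDO::MYSQL_ATTR_SSL_CA => '/etc/mysql/ssl/ca-bundle.pem',
  ),
);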

Jun 30 2015

Queries are the centerpiece of MySQL and they have high optimization potential (in conjunction with indexes). This is especially true for big databases (whatever big means). Modern PHP frameworks tend to execute dozens of queries per page. Thus, as a first step, you need to know what the slow queries are. A built-in solution for that is the MySQL slow query log. This can either be activated in my.cnf or dynamically with the --slow_query_log option. In both cases, long_query_time should be reduced to an appropriate value. Most Linux distributions ship with a default value of 1 second or more, but this turns out to be too slow for web applications, where you want an overall response time of a few hundred milliseconds. So, depending on your performance needs, choose a value of 0.1 or 0.01 seconds.
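
For example, the slow query log can be switched on at runtime like this (the file path and the threshold are examples, and SUPER privileges are required):

# Enable the slow query log at runtime
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';
SET GLOBAL long_query_time = 0.1;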

SQL consists of two different types of queries: those that belong to the Data Definition Language (DDL) and those that work with data (Data Manipulation Language, DML). DDL queries usually have no performance implications, but there is an exception to this rule of thumb: ALTER TABLE statements can be very time-consuming if a table contains millions of records and uses (unique) indexes. We will cover a practical approach in a minute. DML queries can in turn be divided into INSERT statements on the one hand and the other CRUD statements (SELECT, UPDATE and DELETE) on the other. Those statements can be optimized with several techniques, and most of this blog post addresses this type of statement.

Optimizing ALTER TABLE statements

Imagine you have an accounts table with millions of records and you want to extend it with a field for a phone number. A direct execution of ALTER TABLE would certainly lead to major load. The trick is to avoid ad-hoc index recalculation. Hence, we drop all indexes, copy the table to an extra table and perform the structural changes there.

  1. Set innodb_buffer_pool_size appropriately (Be aware: For performing structural changes, a high buffer pool size can speed up things; Being live however, a high size will lead to memory shortages)
  2. (Optional) Backup the database
  3. Drop all indexes except primary key and foreign keys
    DROP index ...
  4. Copy the table and apply structural changes. Use a similar name, for example with suffix '_new'.

    CREATE TABLE IF NOT EXISTS `Accounts_new` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      `city` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;

    ALTER TABLE `Accounts_new` ADD `phone` VARCHAR(255) NOT NULL;

  5. Copy data with INSERT INTO ... SELECT. Just select the columns that are used in the new table.
    INSERT INTO Accounts_new SELECT `id`, `email`,  `city`, null FROM Accounts;
  6. Rename the table. In case of used foreign keys disable the foreign key check.

    SET foreign_key_checks = 0;
      DROP TABLE Accounts;
      ALTER TABLE Accounts_new RENAME Accounts;
      SET foreign_key_checks = 1;

  7. Create all indexes including foreign keys.
    CREATE index ...

Two steps require major effort. First, copying all the data to the new table will take some time; second, rebuilding all indexes can take a long time (it depends on the number of indexes and whether they are unique or not).


Optimizing insertions

INSERT queries should be merged if possible. A single query that creates 10 rows is faster than 10 separate queries. However, this technique has its limits, especially if MySQL runs out of memory. If you want to import a whole database, you can also switch off some consistency checks, for example with foreign_key_checks=0 and unique_checks=0. Moreover, autocommit=0 can help as well.
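
As a small sketch (table and column names are borrowed from the Accounts example above, the values are invented):

# One multi-row INSERT instead of several single-row INSERTs
INSERT INTO Accounts (email, city) VALUES
  ('a@example.com', 'Berlin'),
  ('b@example.com', 'Hamburg'),
  ('c@example.com', 'Frankfurt');

# For a large import, relax the checks for this session and commit once at the end
SET foreign_key_checks = 0;
SET unique_checks = 0;
SET autocommit = 0;
# ... run the import statements here ...
COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;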


Optimizing SELECT statements

SELECT, UPDATE and DELETE statements have one thing in common: the way they filter results (the WHERE clause). This can turn out to be a complex task, especially for big tables. Big means tables with a row count upwards of 100,000; tables with more than one million rows should definitely be included in query optimization. For the sake of simplicity, we concentrate on SELECT queries, as they are the most frequent case anyway.


1) Use EXPLAIN

If you want to optimize your query, you should know how MySQL executes it. You can use EXPLAIN to get the query execution plan. As of MySQL 5.6 it is also possible to use EXPLAIN for INSERT, UPDATE and DELETE statements.

EXPLAIN SELECT * FROM Users WHERE uid = 1;

The result contains several useful pieces of information about the query:


  • select_type: Is the query a simple query (primary) or a compounded query (join or subquery)?
  • type: Extremely important for joins or subqueries: how is this table joined? The best types are const, ref and eq_ref. Worse types are range, index and all. Attention: do not mix up index with ref/eq_ref! For further information, please visit the MySQL docs.
  • possible_keys: A list of indexes which could be used to optimize the speed of the query.
  • key: The index that is actually used.
  • key_len: The length of the index. Shorter indexes tend to perform better.
  • ref: Which column is used for the index scan?
  • rows: Estimated number of rows that have to be compared with the filter criteria. This number should be as low as possible.
  • Extra: Additional information about the query. Attention: do not mix up Using index with ref/eq_ref!

MySQL docs: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html

If the query is a simple query (i.e. no joins or subqueries are used), then EXPLAIN will return a single line where select_type is set to SIMPLE. To get good performance, it is important that an existing index is used. This is the case when type is equal to ref and possible_keys and key point to an index.

If joins are used, the returned result will contain one line per table. Joining tables should always be done via a foreign key comparison; in this case the type in the EXPLAIN output is eq_ref. Avoid leaving out foreign keys. Also try to avoid joins on columns of different types, for instance a varchar field and an integer field: this makes MySQL perform a lot of type conversions, which is simply not good.


2) Use existing indexes

Indexes are ordered by (at least) one attribute by design. Thus, they can be applied to queries that filter by this attribute, either as an exact filter (WHERE x = 'y') or as a range query (WHERE timestamp >= 123). Indexes are not applicable if you apply a function to the column in the WHERE clause, for instance WHERE SUBSTR(name, 0, 4) = 'Alex'. The following list shows which WHERE clauses can be handled by indexes:

  • WHERE x = 'y' (index usable)
  • WHERE timestamp >= 123 (index usable)
  • WHERE timestamp BETWEEN 123 AND 456 (index usable)
  • WHERE name LIKE ('Ale%') (index usable)
  • WHERE name LIKE ('%Ale%') (index not usable)
  • WHERE SUBSTR(name, 0, 4) = 'Alex' (index not usable)

If you have more than one filter criterion in the query, your index should include all used columns as well. Imagine you have the following indexes: name_IDX, firstname_IDX, firstname_name_IDX and name_firstname_IDX. Then the query

# Using composite indexes
SELECT * FROM Users WHERE firstname = 'Alex' AND name = 'Saal'

... could be optimized with firstname_IDX or firstname_name_IDX, but not with name_firstname_IDX, because of the order of the columns! The order has to be the same in the query as well as in the index. It is like using a telephone book: a telephone book is ordered by last name, then by first name. It is much easier to first look up all persons with the desired last name and end up with a short list of candidates. It does not make sense at all to browse the whole telephone book looking for people with a wanted first name and then comparing the last name in a second step.

Keeping this image in mind: it is always good to have a selective index. You could use an index on the gender of a customer, but that only cuts the data set roughly in half. It is much better to index a highly selective column such as the e-mail address, or a unique value such as a Social Security Number. Be selective! As a rule of thumb, there are 3 levels of selectivity:

  • Primary key or unique key (best; those clauses will return a single row immediately)
  • An index matching the WHERE clause, or a prefix index (useful for text fields)
  • No key or index is applicable (worst)

Furthermore, firstname_name_IDX matches better than firstname_IDX and will be preferred by MySQL. Note that firstname_name_IDX can also be used for queries like

# Filtering the first name
SELECT * FROM Users WHERE firstname = 'Alex'

It is therefore neither necessary nor recommended to have both indexes at the same time.

Indexes are always read from left to right. If you have an index containing multiple columns, for example (column_firstname, column_familyname), your query has to filter on the leading column for the index to be used. If you filter only on the second column (column_familyname) and not on the first one (column_firstname), the index cannot be used. In that case it is sometimes better to add a second index on just the second column. Check the statement with EXPLAIN to see which index is used or not; a small sketch follows below.
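
A minimal sketch (index and column names follow the examples above; verify the plan on your own data):

# Create the composite index (if it does not exist yet) and check that MySQL picks it up
CREATE INDEX firstname_name_IDX ON Users (firstname, name);

EXPLAIN SELECT * FROM Users WHERE firstname = 'Alex' AND name = 'Saal';
# key should show firstname_name_IDX and type should be ref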


3) Or Statements

The MySQL query optimizer often cannot use indexes when OR conditions are involved, so try to avoid OR statements! Two common workarounds are sketched below.
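
Both rewrites are only sketches (the column names are assumed from the Users examples above): an IN list for an OR on the same column, and a UNION so that each branch can use its own index when the OR spans different columns.

# OR on the same column: rewrite as IN
SELECT * FROM Users WHERE role = 'editor' OR role = 'admin';
SELECT * FROM Users WHERE role IN ('editor', 'admin');

# OR across different columns: rewrite as UNION
SELECT * FROM Users WHERE name = 'Saal'
UNION
SELECT * FROM Users WHERE firstname = 'Alex';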


4) Optimization of GROUP BY/ORDER BY queries

Sometimes you are facing queries that aggregate or sort rows:

# GROUP BY / ORDER BY
SELECT role, count(*) FROM Users WHERE registration_ts > 140000000 GROUP BY role;
SELECT id, username FROM Users WHERE registration_ts > 140000000 ORDER BY username;

What MySQL does is:

  1. Selection of Users by WHERE registration_ts > 140000000,
  2. Order results of step 1 (no matter if GROUP BY role or ORDER BY username is used)
  3. Projection to the desired columns by SELECT role or SELECT id, username

The hardest step is the sorting. This is where indexes can help a lot: they contain a list of records sorted according to their definition. This is extremely helpful in particular if you have a lot of data in that table (the complexity of sorting algorithms is O(n*log(n))). How do you define the index to optimize such a query? Choose first the column filtered in the WHERE clause, then those in GROUP BY/ORDER BY (in the same order as in the query!). If possible, add the columns of the SELECT to the index (after the columns of GROUP BY/ORDER BY) to gain some more performance; this technique is called a covering index. It is not always reasonable to use covering indexes: if the whole index gets too big, you probably won't gain any time.

Extending the example of a telephone book: It is helpful, if you have requests like "Tell me how many persons have the last name 'Smith'" (This is a GROUP BY) or "Give me a list of all persons ordered by last name and first name" (ORDER BY).

In the previous example, use the following indexes (possible definitions are sketched after the list):

  • registration_role_IDX for the GROUP BY statement
  • registration_username_IDX for the ORDER BY statement
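
Possible definitions for these indexes, following the column order suggested above (the names match the list; adjust them to your schema):

CREATE INDEX registration_role_IDX ON Users (registration_ts, role);
CREATE INDEX registration_username_IDX ON Users (registration_ts, username);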

5) Usage of subqueries

When it comes to complex queries, MySQL (especially before 5.6) is optimized for JOIN statements. However, in some cases a subquery can be more efficient, for example if you use both GROUP BY and ORDER BY on different tables. In that case an index cannot be used if you simply join the tables. Defining a main query and a subquery avoids this problem, as each query acts on its own table and is able to use any available index.

# Case A: Query as INNER JOIN
SELECT
    a.id AS account_id,
    p.id AS product_id,
    TRIM(SUBSTRING(p.name, 1, 30)) AS product_name,
    COUNT(*) AS count
FROM Accounts a
INNER JOIN Orders o ON a.id = o.account_id
INNER JOIN Products p ON p.id = o.product_id
GROUP BY p.id
ORDER BY a.id

# Case B: Subquery
SELECT account_id, product_id, product_name, count
FROM (SELECT
    a.id AS account_id,
    p.id AS product_id,
    TRIM(SUBSTRING(p.name, 1, 30)) AS product_name,
    COUNT(*) AS count
  FROM Accounts a
  INNER JOIN Orders o ON a.id = o.account_id
  INNER JOIN Products p ON p.id = o.product_id
  GROUP BY p.id) as product
ORDER BY account_id

In that case, the query has been split into an outer query and a subquery. Case A would make MySQL create a temporary table and use filesort; Case B can avoid that. Which way is superior depends on the size of each table.


Jun 05 2015

Backups are very important for every application, especially if a lot of data is stored in your database. For a website with few updates it is not so important to do backups very frequently: you can just take last week's backup to restore the site, and if there were only one or two updates, you can add them manually afterwards. But if you run a community site with user-generated content and a lot of input, the topic of backup & recovery becomes a lot more important and also more complex. If the last backup is from last night, you have to consider all the updates that were made in the meantime. Because you don't know what the users have entered, it is impossible to add these changes manually afterwards. That is why you need a backup strategy that also covers storing all updates made between two full backups.

There are four basic methods for backing up and recovering a MySQL database; all other methods are based on these four. They fall into two categories: physical and logical backups.

Physical backups

Storing the binary MySQL files

With this method the actual MySQL database files for the tables, in which all data is physically stored on the hard disk, are copied to a safe location. If a restore is needed, the files can be copied back to the server.

Backup

service mysql stop;
cp -R /var/lib/mysql/database_name target
service mysql start;


Recovery

service mysql stop;
cp -R /path/to/backup/database_name /var/lib/mysql/database_name
service mysql start;

Advantages:

  • fast
  • easy for backup
  • multiple files, if one is broken, hopefully just this table is lost, not the whole database

Disadvantages:
  • takes a lot of disk space, since all the indexes etc. are copied too
  • the database has to be switched off for a certain time during the backup
  • restoring can become a little complex
  • you need special permissions on the operating system

LVM snapshot

Linux provides a Logical Volume Manager (LVM) (http://en.wikipedia.org/wiki/Logical_Volume_Manager_(Linux)), which is a layer for managing the filesystem. LVM provides the ability to create snapshots of any logical volume, so you can create a backup of the volume which can easily be used for recovery in the future. This is one of the best and easiest solutions: it is very fast, very easy and the potential error level is very low. You don't have to take your database or application offline, there will be no locks on the tables, and you get a stable snapshot of the current state.

Backup

# create the snapshot
lvcreate -l100%FREE -s -n mysql-backup /data/databases

# create a mount point and mount the volume/snapshot
mkdir -p /mnt/snapshot
mount /data/databases/mysql-backup /mnt/snapshot

# do the backup and copy it to a separate system
tar -cf /tmp/dbbackup_YYYYMMDD_His.tar /mnt/snapshot
scp /tmp/dbbackup_YYYYMMDD_His.tar ip:/path/to/backups/

# now remove the mount and the snapshot
umount /mnt/snapshot
lvremove /data/databases/mysql-backup


Recovery

# copy the backup back to your server
scp ip:/path/to/backups/dbbackup_YYYYMMDD_His.tar /tmp/

# stop the database
service mysql stop;
# remove the old database files
rm -R /var/lib/mysql/database_name

# copy the backup
cp /tmp/dbbackup_YYYYMMDD_His.tar .

# unpack the files
tar xvf dbbackup_YYYYMMDD_His.tar

# restart the database
service mysql start;

Advantages:

  • very fast
  • easy
  • no stop of the database, no lock on tables

Disadvantages:
  • LVM needed
  • root access to the operating system may be needed

Logical backups

mysqldump

A MySQL dump is a common backup strategy. This is a logical backup, meaning the structure and the content of the database are exported into a file in a special format, using MySQL syntax, that stores all relevant information needed to rebuild the database. Normally there is a statement to create the database again, statements to rebuild the tables and their structure, and then statements to import the data into the tables themselves. All this information is stored in one file which can be copied to a safe location. When a restore is needed, the file can be imported and the database will be rebuilt from the information stored in the file.

Backup

# create the dump
mysqldump --add-drop-database -u Username -pPassword database_name > dump_database_name_YYYMMDD_His.mysql


Recovery

# drop the old table and insert the backup dump data
mysql -u Username -pPassword database_name < dump_database_name_YYYMMDD_His.mysql


Advantages:
  • Very easy
  • Can be done by users without special permissions on the operating system, esp. root permissions
  • The indexes are not stored, so this backup does not use as much disk space as a physical file backup
  • You can look into the backup and also search in it; data manipulation is also possible if something has to be changed

Disadvantages:
  • Slower than a physical backup
  • Slower in recovery, because everything has to be imported first and then the indexes have to be rebuilt
  • One file; if it is broken, recovery from this backup is not possible

Hints:

  • --lock-all-tables: locks all tables during creation of the dump, so the application cannot access them; this avoids data inconsistencies!
  • --single-transaction: the whole dump is executed as one transaction, so the application can still read and write the database while the dump is made
  • --master-data: set the location of the master in a database replication, so the slave knows the position and where to start with the replication
  • --add-drop-database: add a DROP DATABASE statement before each CREATE DATABASE statement

Innobackup

There are special tools to create a innobackup, see also http://dev.mysql.com/doc/mysql-enterprise-backup/3.8/en/ihb-meb-compatib....
This is a special case for databases where the InnoDB storage engine is used. As InnoDB became the default storage engine and MyISAM will be removed in the future, this is also a very common way to create a database backup. It is nearly the same as a normal MySQL dump but also takes the special features of InnoDB, like foreign key constraints, into account. There is the MySQL Enterprise Backup (MEB) (http://www.mysql.com/products/enterprise/backup.html) for creating InnoDB backups, but it costs money.

There is also an open source tool from Percona named xtrabackup - http://www.percona.com/software/percona-xtrabackup. There is a free version available as well as an enterprise edition which also costs money. As Percona (http://www.percona.com) offers a lot of useful tools around MySQL, this may also be a good addition to your MySQL toolkit. There are also other tools from Percona which help to improve your daily life with MySQL.

Hints

Master-Slave replication for backups and avoiding downtimes

A special hint at this point: if possible, use a master-slave replication and build the backups on the slave, so the main system is not affected and the application's performance does not suffer. It is also a good setup to avoid long downtimes of your application: if one server crashes you can switch your application to the other and it will stay online. In the meantime you can repair the broken system and then restore the old setup. So, if the master breaks, you can switch to the slave and it becomes the master. If the slave fails, only the read requests of your application have to be routed to the master.

Fromdual Bman

Fromdual.com offers the tool fromdual_bman (http://fromdual.com/mysql-backup-manager-mysql_bman) which is a wrapper for seven different backup methods (the ones mentioned above and combinations of them).

Backup location

A backup is always better than no backup. But if the backup is stored on the same logical disk of the server your website also runs on, you may get into trouble: if the disk crashes, your website is offline and you also lose your backup. In this case it is impossible to restore your website on another server and bring it back online. So, always save your backups on another logical volume or on another server. If the data is very important, also consider saving your backups in multiple locations, possibly in other data centers. In case of fire or something similar, your backup can then be fetched from somewhere else and recovery can run in another data center.

Uuuups Queries - Accident on database

So-called uuuups queries are queries where a wrong query was accidentally executed in a production system. This mostly happens because somebody executes the query manually by accident; there are multiple reasons why this can happen, for example mixing up consoles.

Time is the key, so you have to act immediately!

Stop the database and your application immediately! Set your application to maintenance mode!

Two possibilities:
  1. no database replication

    You can only reimport the last backup, whenever it was made. Hopefully it was made not long ago, maybe last night. This way you lose all changes between your last backup and the time of the uuuups query. Or, if you know the changes that were made in this time, you can fix them manually or by writing a script which applies the changes for you.

  2. a running database replication
    You can reimport the last backup. By using a replication you automatically get the binary logs where MySQL writes the database changes to be executed on the slave server. You can use these binary logs for a point-in-time recovery, which means you can recover everything up to the execution of the wrong query. After the import of the dump you can replay the MySQL binary logs, which contain either the statements (statement-based replication) or the changed rows (row-based replication). Because all changes are stored in these files, you get back everything that was done between your last backup and the time the uuuups query was executed. Do not forget to prevent the uuuups query from being executed again, as it is also in the binary logs! You can inspect the binary logs with the mysqlbinlog tool (http://dev.mysql.com/doc/refman/5.0/en/mysqlbinlog.html) and skip or delete the uuuups query from the replay. This has to be done on both servers, the master and the slave. As soon as one of the servers (use the master first) is recovered, you can enable your application again using just this first server. After that you can recover the slave server and restart the replication. If you are an experienced user you can also recover the master and the slave together, so both systems are back at nearly the same time. But do not mix up the systems, or you have to start from the beginning and your application stays offline even longer. A small sketch of a point-in-time recovery follows after this list.
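
A minimal sketch of such a point-in-time recovery; the dump name, binary log file, date and position are placeholders that have to be taken from your own setup:

# 1. restore the last full dump
mysql -u root -p database_name < dump_database_name_YYYYMMDD_His.mysql

# 2. replay the binary log up to just before the uuuups query
mysqlbinlog --stop-datetime="2015-06-05 10:14:59" /var/log/mysql/mysql-bin.000042 | mysql -u root -p

# 3. after skipping the bad statement, continue from just behind it
mysqlbinlog --start-position=107642 /var/log/mysql/mysql-bin.000042 | mysql -u root -p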

We hope these hints help you in your daily life with MySQL. There are also other posts about MySQL; since database replication was mentioned a lot in this post, the post about MySQL setup may also be interesting for you.


May 12 2015

When setting up a MySQL Server there are a lot of things to consider. Most requirements depend on the intended usage of the system.

Beginning with the hardware requirements, this series of blog posts draws on our own experience operating several hundred MySQL instances, as well as on a recent workshop our team attended. Before you start optimizing your MySQL instance, consider the following: standards are more important in multi-instance setups than individual performance tuning. Standards allow for easier debugging and give you a better handle on security-related incidents. We strongly recommend the use of configuration management tools like Puppet or Chef for environments that are supposed to be consistent across all your systems.

The MySQL default settings as of version 5.6 are great, and apart from larger systems, hardly any changes are necessary. For a standard LAMP stack hosting a blog or a small website, not a lot of optimization is needed. Today's CMSs come with sufficient configurations, and tampering with the back end itself can cause more harm than it helps performance. These articles are mostly interesting if you are developing your own applications. This first article focuses on the design of the underlying platform, which accounts for an estimated 10-20% of the possible performance tuning. Nevertheless, if your application is poorly designed or not optimized, it will become quite expensive to try and fix the system the hardware way. So, the first step is always to ensure a well-structured and optimized application. We will address this in a future article.


Hardware / Operation system

Here are some general hardware considerations, viable for most setups:

CPU

MySQL does NOT scale a single query across multiple cores (one query = one core). So it is better for your instance to have fewer but faster cores than vice versa. Try to avoid multi-CPU systems (note: multiple physical CPUs, not cores) with a lot of RAM. CPUs usually have a direct bus to their own RAM banks; if RAM on a bank held by another CPU needs to be addressed, there will be overhead. In high-performance setups, rather go for a hexa- or octa-core than two quad-core CPUs.

RAM

RAM has become a cheap resource these days. Use as much RAM as possible; it should match the database size and leave some extra space for the operating system and the application, if it lives on the same machine. If the database is served mostly or completely from RAM, it tends to have fast read speeds and in most cases only needs to write to disk, not read from it. This is a huge performance optimization, because operations in RAM are easily 100 times faster than on an SSD.

HDD

If you are using conventional magnetic storage, faster-spinning disks are preferred; try to go for 15k RPM or more. SSDs (solid state drives) are obviously even better and outperform every other storage medium at random IOPS (input/output operations per second). For the RAID setup, go for RAID10 (better) or RAID1 (cheaper), but avoid RAID5/6 etc. because their performance is bad at random writes. The fewer read/write operations hit the disk, the better the performance.

File system

If you are not sure about the file system, go with the type recommended for your operating system. In general that means: stay with ext4. In the unlikely event of a file system problem, using the most common file system will make debugging a lot easier, and in the long term this outweighs the possible performance gain of other file systems by far.

I/O scheduler

The standard scheduler on a lot of operating systems is CFQ (Completely Fair Queuing). For databases, consider switching to NOOP or Deadline. Find out which scheduler your system is using:

cat /sys/block/sda/queue/scheduler

This will give you an output like

noop [deadline] cfq

where the scheduler in use is shown in brackets. As of kernel 2.6.10 it is possible to change the scheduler at runtime; you will need root permissions to do so. Run

echo noop > /sys/block/sda/queue/scheduler

to set your scheduler to noop. This setting is not persistent and will be reset at reboot. To make a setting permanent, it needs to be given to the kernel as a parameter at boot time.

elevator=<scheduler>

needs to be added, where <scheduler> is either noop or deadline.
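
On Ubuntu/Debian with GRUB2, for example, this can be done roughly as follows (the file and variable names are the distribution defaults; treat this as a sketch and adapt it to your boot loader):

# /etc/default/grub: append the elevator parameter to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"

# then rebuild the grub configuration and reboot
update-grub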


Storage engine

If you are starting a new database, especially with MySQL 5.6, InnoDB will be the standard engine. For already running projects there is a way to convert MyISAM tables to InnoDB, but this is usually not recommended, as the performance gain only matters for very high-performance databases and there is always a risk involved.

General Setup

Operation system

The decision for the operating system is one of the less important ones. At Cocomore we stay with the Ubuntu LTS versions. As mentioned in the beginning, systems that are set up the same way make configuration and debugging much easier and improve stability. The problem with LTS systems is that they often only offer relatively old packages: the default on Ubuntu 14.04, released in April 2014, is MySQL 5.5.

There are a couple of ways to install newer versions. As of February 2015, the newest stable MySQL version is 5.6. It can be installed from the Ubuntu universe repository. Oracle themselves offer a repository for MySQL as well; you might need to log in first, though. We recommend using the latest stable release, which as of February 2015 is version 5.6.


Database Replication

There are two possibilities:
  • statement based - In MySQL 5.1.4 and earlier, binary logging and replication used this format exclusively.
  • row based

Statement-based replication stores the SQL statement in the binary log, and the statement is then executed on the slave server. This has the disadvantage that, for example, statements containing time functions or random values are re-evaluated on the slave. Because of this, there is a chance of inconsistency between master and slave, as such statements can lead to different results on the different servers.

Row-based replication is superior in this respect, because only the results of a statement (the changed rows) are stored in the binary logs, and the corresponding rows are then changed to the new values on the slave. But this requires much more disk space, because not just the SQL statement is stored any more but the whole rows that are changed.

The replication method can be changed on the fly, but we recommend row-based replication to avoid inconsistency between master and slave. Disk space has become very cheap, so there is no need to use statement-based replication.

As of MySQL 5.1.8, the server can change the binary logging format in real time according to the type of event using mixed-format logging. When the mixed format is in effect, statement-based logging is used by default, but automatically switches to row-based logging in particular cases.

High availability solutions

If your database goes down, your application will not work anymore. We therefore recommend a high-availability setup if your application should stay online even when there is trouble in the background. There are several ways to ensure high availability for your database; the simple ones are easier and less expensive than a Galera cluster, for example. How much time and money you want to spend to keep your application always online depends on how much your application is worth to you.


Master - Slave setup

  • see also Master - Slave setup
  • simple asynchronous replication
  • in case of failure, switch to the slave and recover the master
  • switching can be done manually or automatically for example using Heartbeat

Active / passive cluster

  • same as before using a master - slave replication
  • using a virtual IP address which points to the master
  • underlying system is a SAN or DRBD
  • Heartbeat is switching the virtual IP address to the other system (slave) in case of error

mysql_ha_active_passive_cluster.jpg

Galera cluster

  • see also Galera cluster
  • Galera is a cluster system of multiple environments with multiple masters which are all synchronized
  • a load balancer distributes the load across the different master databases
  • Galera synchronizes the multiple master databases whenever statements are executed
  • it works as long as the majority of the masters is online and connected to Galera; therefore there always has to be an odd number of master databases
  • If one Master db crashes the others are still online and the crashed one can be repaired

mysql_ha_gallera_cluster.jpg

Conclusion

The first step for optimization is always your application itself. But there are also other ways to tune up your system: choosing the right hardware, the right operating system (OS), the correct file system and the best MySQL storage engine for your application. Newer MySQL versions deliver new features as well as performance improvements; currently (February 2015), MySQL 5.6 is a very good version to work with.

To ensure that your application does not go down when your database goes down, there are ways to achieve high availability using database replication and different setups. Which solution you can use depends on your budget, but always weigh all possible improvements (hardware, DB tuning, replication or application tuning) against their estimated effect, so you can choose the improvement with the best cost-value ratio.

If you have questions or you need help with your application, please do not hesitate to contact us.

In our next post we will provide some information on how to improve your MySQL settings in my.cnf itself.


Oct 07 2014

For Drupal we have a number of different methods to back up the database. Having a backup strategy is one of the easiest things to do, but it is often overlooked. Perform regular backups to keep your sanity when disaster hits. Implement a backup strategy for daily, weekly and monthly backups and look cool to the client. I am going to discuss 3 different ways to back up your database.

  1. backup and migrate
  2. drush and mysqldump
  3. automysqlbackup

Use the Backup and Migrate drupal module

Backup and migrate module (https://drupal.org/project/backup_migrate) is perhaps the most common way to keep your db safe. It counts more than 290,000 installs and there are many reasons for this. It supports encryption, rotation and the option to create a site archive complete with files + db. Also, it supports multiple destinations:

  • local filesystem backup
  • backup and email
  • backup to ftp
  • backup to cloud (AWS S3, rackspace, dropbox)
  • NodeSquirrel (A backup service by the makers of backup and migrate module)

The downside is that it relies on Drupal cron to execute the backup task, so it puts some load on Apache. For bigger sites this may be an issue.

Backup a drupal database with drush or mysqldump

Drush is the Swiss army knife for Drupal. Among other things, it can dump the database of a site with a simple command:

drush sql-dump --result-file=site-stg.sql --gzip

Execute this command outside of the Drupal directory, or the backups will be accessible by anyone:

drush -r /drupal/root sql-dump --result-file=site-stg.sql --gzip

If --result-file is provided with no value, a date-based filename will be created under the ~/drush-backups directory. If drush is not available on the server, you can still use the mysqldump command to get a db dump:

mysqldump -udb_user -pdb_pass drupal_db > drupal.sql

Or generate a compressed archive:

mysqldump -udb_user -p drupal_db | gzip -9 > drupal.sql.gz

Another option is to use drush and get the database directly from a remote server. This requires drush installed on both servers and your drush aliases working. Execute this to get the dump from the remote server:

drush @client-site sql-dump > client-site.sql

Or perform the same procedure with mysqldump:

ssh [email protected] "mysqldump -udb_user -pdb_pass drupal_db > site.sql; gzip site.sql"
scp [email protected]:site.sql.gz .

Combine any of the above commands with some bash scripting to obtain a dynamic filename, add it to cron and perform daily backups automatically:

drush @client-site sql-dump > `date +%Y-%m-%d`.sql
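
For instance, a possible daily cron entry (the paths, the alias target and the 7-day retention are assumptions; note that percent signs have to be escaped in a crontab):

30 2 * * * drush -r /drupal/root sql-dump --result-file=/backups/site-`date +\%Y-\%m-\%d`.sql --gzip && find /backups -name '*.sql.gz' -mtime +7 -delete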

Backup up with automysqlbackup

Automysqlbackup is a sourceforge script that performs automated backups (http://sourceforge.net/projects/automysqlbackup/). Automysqlbackup is my favorite method because it is a set and forget solution. It can automatically backup new databases and will rotate the archives. Automysqlbackup provides:

  • backup rotation with daily, weekly and monthly
  • backup encryption

Automysqlbackup cannot back up to a remote location unless you use a network drive. It can, however, email you the dump. Are you using some other method to keep your sites safe? I would love to hear about it in the comments below.

Mar 31 2014

I'm working on a Drupal application that stores data in separate mysql databases, and syncs some of the data to CouchDB with node.js scripts.

The extra mysql dbs are 16+ GB and it's neither practical nor necessary to keep them locally, since I only want to read the latest data in local development.

Wouldn't it be cool if my local development Drupal sites could read from the remote database servers?

In some cases you can just use the connection you find in the remote site's settings.php:

'otherdb' => 'mysqli://username:[email protected]/dbname'

(note: it's a Drupal 6 site so that's why you don't see an array - I will give a Drupal 7 example below)

However, there's often a twist: I must create a SSH tunnel to connect to this particular db server.

First, you need to have configured and installed SSH keys on your local and remote machines.

Then fire up your terminal and create the SSH tunnel to forward the remote mysql port to a local port. This technique is based on information clearly presented by RevSys (quick example) and noah.org (more details).

ssh -L [local port]:[db host]:[remote port] [ssh-username]@[remote host] -N

NOTES:

  1. -N tells ssh that you don't want to run a remote command; you only want to forward ports.
  2. use a different port for your tunnel [local port] than the one you normally use for mysql; for example, if you connect to mysql locally on the default port 3306, use 3307 (or any other unused port) for your tunnel. You should use the correct [remote port], which is typically 3306; you can check whether it is different by looking at settings.php on the remote site.
  3. Keep this connection alive as long as you need to connect to the remote database.

ssh -L 3307:[db host]:3306 [ssh-username]@[remote host] -N

Then you can test your connection (in a different terminal instance):

mysql -u[dbuser] -p -P 3307 -h 127.0.0.1

Here is the connection in settings.php for Drupal 6:

What's cool is that you can mix local and remote databases. For example, I want to use a local copy of the Drupal database, which is smaller and easier to sync, and read the data from the second (and third, in my case) remote dbs.

$db_url = array(
  // local Drupal database (credentials are placeholders)
  'default' => 'mysqli://local-dbuser:password@localhost/local-dbname',
  // remote database, reached through the SSH tunnel on port 3307
  'otherdb' => 'mysqli://username:password@127.0.0.1:3307/dbname',
);

You can also connect Drupal to the default remote database, but it makes sense to use a local instance for development.

And in Drupal 7:

$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'local-dbname',
  'username' => 'local-dbuser',
  'password' => 'password',
  'host' => 'localhost',
  'prefix' => '',
);

$databases['otherdb']['default'] = array(
  'driver' => 'mysql',
  'database' => 'dbname',
  'username' => 'username',
  'password' => 'password',
  'host' => '127.0.0.1',
  'port' => '3307',
  'prefix' => '',
);

WARNING: 

If the db user for the remote db has all privileges, your application may alter the remote database.

Therefore, you should create a "read-only" user for the remote database, which will prevent you from altering it. A possible set of statements is sketched below.
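
For example (user name and password are placeholders; run this on the remote MySQL server and tighten the host pattern if you can):

-- read-only account for the remote database
CREATE USER 'drupal_ro'@'%' IDENTIFIED BY 'choose-a-password';
GRANT SELECT ON dbname.* TO 'drupal_ro'@'%';
FLUSH PRIVILEGES;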

THINK

Apr 22 2013

One of the suboptimal techniques that developers often use is a query that retrieves the entire content of a table, without any conditions or filters.

For example:

SELECT * FROM table_name ORDER BY column_name;

This is acceptable if there are not too many rows in the table, and there is only one call per page view to that function.

However, things start to get out of control when developers do not take into account the frequency of these calls.

Here is an example to illustrate the problem:

A client had a high load average (around 5 or 6) on their server, which had around 400 logged-in users at peak hours. The server was somewhat fragile, with any little thing, such as a traffic influx or a misbehaved crawler, causing the load to go over 12.

This was due to using an older version of the Keyword Links module.

This old version had the following code:

This caused certain keywords to be replaced when a node is being displayed:

function keyword_link_nodeapi(&$node, $op, $teaser, $page) {
  if ($op == 'view' && ...
    $node->content['body']['#value'] = keyword_link_replace(...);
  }
}

And this caused keyword replacement for each comment as well.

function keyword_link_comment(&$a1, $op) {
  if ($op == 'view') {
    $a1->comment = keyword_link_replace($a1->comment);
   }
 }

The function that replaced the content with keywords was as follows:

 
function keyword_link_replace($content) {
  $result = db_query("SELECT * FROM {keyword_link} ORDER BY wid ASC");
  while ($keyword = db_fetch_object($result)) {
    ...
    $content = preg_replace($regex, $url, $content, $limit);
  }
  return $content;
}

This executes the query every time and iterates through the result set, replacing words.

Now, let us see how many rows there are in the table.

mysql> SELECT COUNT(*) FROM keyword_link;
+----------+
| count(*) |
+----------+
|     2897 |
+----------+
1 row in set (0.00 sec)

Wow! That is a relatively large number.

And Eureka! That is it! The query was re-executed every time the replace function was called!
This means that in a listing of 50 nodes, there would be 50 queries!

And even worse, for a node with tens or hundreds of comments, there would be tens or hundreds of queries as well!

Solution

The solution here was to upgrade to the latest release of the module, which has eliminated the replacement of keywords for comments.

But a better solution, preserving the functionality for comments, is a combined two-fold solution:

Use memcache as the cache layer

By using memcache, we avoid going to the database for any caching. It is always a good idea in general to have that, except for simple or low traffic sites.

However, on its own, this is not enough.

Static caching for cache_get() result

By statically caching the results of the query, or of cache_get(), those operations are executed once per page view, and not 51 times for a node displaying 50 comments. This is feasible if the size of the dataset is not too huge. For example, for this site, the size was around 1.3 MB for the three fields used from that table, which fits in memory without issues for each PHP process.

This is the outline for the code:

function keyword_link_replace($content) {
  static $static_data;

  if (!isset($static_data)) {
    if (($cache = cache_get('keyword_link_data')) &&
      !empty($cache->data)) {
      $static_data = $cache->data;

      foreach($cache->data as $keyword) {
        replace_keyword($keyword, $content);
      }
    }
    else {
      $result = db_query("SELECT * FROM {keyword_link} ORDER BY wid ASC");

      $data = array();

      while ($keyword = db_fetch_object($result)) {
        $data[] = $keyword;

        replace_keyword($keyword, $content);
      }

      $static_data = $data;

      cache_set('keyword_link_data', $data, 'cache');
    }
  }
  else {
    foreach($static_data as $keyword) {
      replace_keyword($keyword, $content);
    }
  }

  return $content;
}

You can download the full Drupal 6.x version with the fixes from here.

What a difference a query makes

The change was done at 22:00 in the daily graphs, and 6 days before the monthly graph was taken. You can see the difference in that the load average is lower, ranging between 1.8 and 2.4 for most of the day, with occasional spikes above 3. This is far better than the load of 5 or 6 before the fix. Also, the amount of data retrieved from MySQL is halved.

As you will notice, no change was seen in the number of SQL queries. This is probably because of the effect of MySQL's query cache: since all the queries were the same for the same page, it served the result from the query cache and did not have to re-execute the query tens or hundreds of times per page. Even though the query cache saved us from re-executing the query, there is still overhead in getting that data from MySQL's cache to the application, and that consumed CPU cycles.

Faster Node displays

And because we are processing less data and doing less regular expression replacement, node display for nodes that have lots of comments has improved. For a node that had hundreds of comments, with 50 comments shown per page, the total page load time was 8,068 milliseconds.

The breakdown was as follows:

keyword_link_replace() 51 calls totalling 2,429 ms
preg_replace() 147,992 calls totalling 1,087 ms
mysql_fetch_object() 150,455 calls totalling 537 ms
db_fetch_object() 150,455 calls totalling 415 ms
mysql_query() 1,479 calls totalling 393 ms
unserialize() 149,656 calls totalling 339 ms

A total of 5,254 milliseconds processing keywords in comments only.

After eliminating the calls to hook_comment() in the module, the total load time for the node was 3,122 milliseconds.

Conclusion

So, always look at the size of your dataset, as well as the frequency of resource intensive or slow operations. They could be bottlenecks for your application.

Mar 25 2013

I was working on a Drupal migration project today using the Migrate module where I needed to import only select user roles from the source (Drupal 6) database.

The Migrate module allows for a custom query to select only the user roles that need to be imported. In my case, the two roles I wanted to import had role IDs of 4 and 6. So, how do I write a query using the Drupal Database API to do this? Turns out there's a pretty elegant answer. Rather than writing something like:

SELECT * FROM role r WHERE rid=4 OR rid=6;

The proper way of writing the select query is:

$query = parent::query();
$ored = db_or();
$ored
->condition('rid', 4)
->condition('rid', 6);
$query->condition($ored);

Note the elegant db_or() function that returns a DatabaseCondition object. Add the two conditions to this object and they're automagically "or"ed. An alternative with the same result is sketched below.
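
For this particular case (two values on the same column), an IN condition should produce an equivalent filter:

$query = parent::query();
// same filter expressed as an IN condition on rid
$query->condition('rid', array(4, 6), 'IN');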


Mar 19 2013

For sites that have lots of slow queries, disk access is often the bottleneck. For these slow queries, MySQL writes temporary tables to disk, populates them with intermediate results, then queries them again for the final result.

We all know that the disk is the slowest part in a computer, because it is limited by being mechanical, rather than electronic. One way of mitigating this is to tell MySQL to use memory rather than disk for temporary tables.

This is often done by creating either a RAM disk or the easier-to-use tmpfs. Both are a portion of the server's RAM made to emulate a disk, with slightly different details: a RAM disk has a file system on it that can be ext3 or anything else, while tmpfs is its own file system type.

Since memory access is much faster than disk access, this improves performance and decreases load on the server by avoiding pile-up bottlenecks on the disks.

We describe here methods to achieve this goal.
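
Before (and after) making any change, you can check how many temporary tables actually end up on disk by comparing Created_tmp_disk_tables to Created_tmp_tables in the output of:

SHOW GLOBAL STATUS LIKE 'Created_tmp%';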

Method 1: Using an existing tmpfs directory

Rather than creating a new ram disk or tmpfs mount, we first search for one that is already on your server.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
...
tmpfs           1.6G  260K  1.6G   1% /run
...

This tells us that the /run filesystem is of type tmpfs (a temporary file system) and has 1.6 GB allocated for it.

# mount
...
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
...

On Ubuntu 12.04 LTS, the directory /run/mysqld already exists and is allocated to a tmpfs with sufficient space for temporary files.

Save yourself some grief and do not try to create your custom directory under /run (e.g. mysqlslow), because it will not survive reboots, and MySQL will not start after a reboot.

So, all we need to do is tell MySQL to use this directory.

To do this, create a file called /etc/mysql/conf.d/local.cnf. By using this file, and not editing /etc/mysql/my.cnf, we avoid having Ubuntu updates overwrite our changes.

Add this to the file:

[mysqld]
tmpdir = /run/mysqld

Then restart MySQL

service mysql restart

Then make sure that the new value is now in effect:

# mysql
mysql> SHOW VARIABLES LIKE 'tmpdir';
+---------------+-------------+
| Variable_name | Value       |
+---------------+-------------+
| tmpdir        | /run/mysqld |
+---------------+-------------+

Method 2: Creating a new tmpfs directory

If you are not running Ubuntu 12.04 LTS, then you may not have a ready made RAM disk that you can use, and you have to create one.

Here are the steps to create the tmpfs directory:

Create the tmp directory

# mkdir -p /var/mysqltmp

Set permissions

# chown mysql:mysql /var/mysqltmp

Determine mysql user id

# id mysql

Edit /etc/fstab

And add the following line, replacing your specific mysql user id and group id instead of the 105 and 114 below:

tmpfs /var/mysqltmp tmpfs rw,gid=105,uid=114,size=256M,nr_inodes=10k,mode=0700 0 0

Mount the new tmpfs partition

# mount -a

Change your MySQL configuration

# vi /etc/mysql/conf.d/local.cnf 

Change, or add the following line:

tmpdir = /var/mysqltmp

Restart MySQL

/etc/init.d/mysql restart

Or, for Ubuntu 12.04:

service mysql restart

How much of a difference does it make?

How much of a difference can you expect from moving MySQL's temporary files from disk to RAM? Significant, if your server has lots of slow queries.

Here are the graphs from a site that was suffering considerably because of a large number of logged in users (averaging 250 at peak hours, and exceeding 400 at times), and some other factors.

Using a RAM disk made a noticeable difference.

CPU usage. Note how much iowait (magenta) before and after the change:

And how many slow queries per second before and after the change:

The server's load is less too:

The input/output operations per second on sda (where /tmp is, which was the destination for slow queries before the change):

Feb 22 2013

Simplifying Wordpress and Drupal configuration

At last year's Drupalcon in Denver there was an excellent session called Delivering Drupal. It had to do with the oftentimes painful process of deploying a website to web servers. This was a huge deep-dive session that went into the vast underbelly of devops and production server deployment. There were a ton of great nuggets and I recommend watching the session recording for serious web developers.

The most effective takeaway for me was the manipulation of the settings files for your Drupal site, which was only briefly covered but not demonstrated. The seed of the idea that Sam Boyer presented got me wondering how to streamline my site deployment with Git. I was using Git for my Drupal sites, but not effectively for easy site deployment. Here are the details of what I changed for new sites that I build. This can be applied to Wordpress as well, which I'll demonstrate after Drupal.

Why would I want to do this?

When you push your site to production you won't have to update a database connection string after the first time.  When you develop locally you won't have to update database connections, either.

Streamlining settings files in Drupal

Drupal has the following settings file for your site:

sites/yourdomain.com/settings.php

This becomes a read-only file when your site is set up and is difficult to edit. It's a pain editing it to run a local site for development. Not to mention that if you include it in your Git repository, it's flagged as modified whenever you change it locally.

Instead, let's go ahead and create two new files:

sites/yourdomain.com/settings.local.php
sites/yourdomain.com/settings.production.php

Add the following to your .gitignore file in the site root:

sites/yourdomain.com/settings.local.php

This will put settings.php and settings.production.php under version control, while your local settings.local.php file is not.  With this in place, remove the $databases array from settings.php.  At the bottom of settings.php, insert the following:

$settingsDirectory = dirname(__FILE__) . '/';
if(file_exists($settingsDirectory . 'settings.local.php')){
    require_once($settingsDirectory . 'settings.local.php');
}else{
    require_once($settingsDirectory . 'settings.production.php');
}

This code tells Drupal to include the local settings file if it exists, and if it doesn't it will include the production settings file.  Since settings.local.php is not in Git, when you push your code to production you won't have to mess with the settings file at all.  Your next step is to populate the settings.local.php and settings.production.php files with your database configuration.  Here's my settings.local.php with database credentials obscured.  The production file looks identical but with the production database server defined:

<?php
    $databases['default']['default'] = array(
      'driver' => 'mysql',
      'database' => 'drupal_site_db',
      'username' => 'db_user',
      'password' => 'db_user_password',
      'host' => 'localhost',
      'prefix' => '',
    );

Streamlining settings files in Wordpress

Wordpress has a similar process to Drupal, but the settings files are a bit different.  The config file for Wordpress is the following in site root:

wp-config.php

Go ahead and create two new files:

wp-config.local.php
wp-config.production.php

Add the following to your .gitignore file in the site root:

wp-config.local.php

This will make it so wp-config.php and wp-config.production.php are under version control when you create your Git repository, but wp-config.local.php is not.  The local config will not be present when you push your site to production.  Next, open the Wordpress wp-config.php and remove the defined DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_CHARSET, and DB_COLLATE variables.  Insert the following in their place:

/** Absolute path to the WordPress directory. */
if ( !defined('ABSPATH') ) {
    define('ABSPATH', dirname(__FILE__) . '/');
}
if(file_exists(ABSPATH  . 'wp-config.local.php')){
    require_once(ABSPATH  . 'wp-config.local.php');
}else{
    require_once(ABSPATH . 'wp-config.production.php');
}

This code tells Wordpress to include the local settings file if it exists, and if it doesn't it will include the production settings file. Your next step is to populate the wp-config.local.php and wp-config.production.php files with your database configuration.  Here's my wp-config.local.php with database credentials obscured.  The production file looks identical but with the production database server defined:

<?php
// ** MySQL settings - You can get this info from your web host ** //
 
/** The name of the database for WordPress */
define('DB_NAME', 'db_name');
 
/** MySQL database username */
define('DB_USER', 'db_user');
 
/** MySQL database password */
define('DB_PASSWORD', 'db_user_password');
 
/** MySQL hostname */
define('DB_HOST', 'localhost');
 
/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');
 
/** The Database Collate type. Don't change this if in doubt. */
define('DB_COLLATE', '');

What's next?

Now that you're all set up to deploy easily to production with Git and Wordpress or Drupal, the next step is to actually get your database updated from local to production.  This is a topic for another post, but I've created my own set of Unix shell scripts to simplify this task greatly.  If you're ambitious, go grab my MySQL Loaders scripts that I've put on Github.

Dec 13 2012
Dec 13

I ran into a situation in which I created a new Input Format in Drupal 6 on a site that had a lot of content. This input format was similar to the default Filtered HTML input format, however I wanted it to have a few different options.

I wanted all anonymous users to only be able to access the Filtered HTML input format, but authenticated users (or users with specific roles) to be able to access this new input format. This part was easy with Drupal's administration interface; the problem came when I wanted to change the input format of the body field to the new input format for all the posts on this site.

I really did not want to go into each post one by one, scroll down to the body field, and change the input format, so I came up with a quick database solution using MySQL.

The node body field data is stored in the node_revisions table in the MySQL database. In this table there is a column called "format" that contains the id of the input format being used on that post. I only wanted to change this for a few content types, not every content type on the site, and only for content that was published (leaving the unpublished content alone).

The first step was to look up the input format in the filter_formats table to get the correct new format id to use. Depending on the type of database viewer you are using (PHPMyAdmin, MySQL Workbench, Command Line MySQL, etc) this might be slightly different. You can probably click on the database table to see the records or you can run

SELECT * FROM filter_formats;

This will give you a list of the defined input formats with their IDs and names.

In my case the format ID I needed was 4. Now I needed to construct a MySQL UPDATE query to go into the node_revisions table and update all the published posts of the specific content types I needed. I also only wanted to change the ones that were set to Filtered HTML (leaving any other formats unchanged).

The query I constructed looked like:

UPDATE node_revisions AS nr
INNER JOIN node AS n ON nr.nid = n.nid
SET nr.format = 4
WHERE n.status = 1 AND n.type IN ('blog', 'content_2') AND nr.format = 1

Note: Be careful with this. Anytime you are editing the database manually you risk messing things up. I would suggest making a backup prior to doing this if you are unsure of what you are doing. Also, if you are using revisioning (which I was not in this example), this will cause some inconsistencies, as this change should probably be registered as a new revision. In that case you might want to look into writing a Drupal script in PHP to load the nodes in question, change the format, and then execute a node_save (a rough sketch of that approach is below). The MySQL route just happened to be easier for me.
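
For reference, here's a minimal sketch of that node_save() approach, assuming the Drupal 6 core API and the same criteria as the UPDATE query above (published nodes of type 'blog' or 'content_2' currently using format 1). You could run it via drush php-script or a small custom script once Drupal is bootstrapped:

<?php
// Minimal sketch, mirroring the criteria of the UPDATE query above (Drupal 6 API).
$result = db_query("SELECT n.nid FROM {node} n
  INNER JOIN {node_revisions} nr ON n.vid = nr.vid
  WHERE n.status = 1 AND n.type IN ('blog', 'content_2') AND nr.format = 1");
while ($row = db_fetch_object($result)) {
  $node = node_load($row->nid, NULL, TRUE); // TRUE resets the static cache on large runs.
  $node->format = 4;                        // The new input format id.
  $node->revision = TRUE;                   // Register the change as a new revision.
  node_save($node);
}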

Nov 28 2012
Nov 28

This video looks at the basics of working with MySQL from the command line. We get into the mysql environment and look at databases, tables and fields. We cover creating and deleting databases, creating a user, and querying within a particular database.

Nov 19 2012
Nov 19

The site is ScienceScape.org, a repository of scientific research going back to the 19th century, down to the latest biotechnology and cancer research.

Update: You can watch a video of the presentation on Vimeo.

Oct 15 2012
Oct 15

We recently did a Drupal Performance Assessment project for a very large site, with a massive database. The database takes over 174GB, and houses a total of over 707 million rows.

Exporting and importing a database of that size is challenging in many ways.

We had to use different tools and techniques than the standard mysqldump utility, which was causing the import to exceed 14 days for this database. We were able to cut that down to less than 4 days.

While we plan to talk about this site in detail at Drupal Camp Toronto, and publish an article on it, this article is about a very useful byproduct.

Maatkit mk-parallel-dump

We first considered the use of Maatkit parallel dump. However, this is deprecated and is no longer supported in Percona Toolkit, the successor to Maatkit.

Mydumper and Myloader

We then found the right tool: Mydumper.

Mydumper is a very useful tool for faster backups for many sites with smaller database sizes. The tool is written by current and past MySQL/Oracle staff.

The trick for this tool is that it relies on doing dumps and imports in parallel using multiple threads, rather than using a single thread, which is slow and inefficient.

Compiling and Installing Mydumper from Source

Although Mydumper is available as a package in the repository for Ubuntu Server 12.04 LTS Precise Pangolin, it does not work, and causes a segmentation fault error.

For this reason, we have to build it from source. However, a patch is also needed in order for it to compile correctly.

This is described in detail in the following steps:

Install the required packages for compiling from source

sudo aptitude install cmake g++ patch libglib2.0-dev \
  libmysqlclient-dev zlib1g-dev libpcre3-dev

Download Mydumper

You need to download Mydumper 0.5.1:

wget http://goo.gl/dVhJD

Extract the archive

Then extract the archive in a directory:

mv dVhJD mydumper-0.5.1.tar.gz 
tar xzf mydumper-0.5.1.tar.gz 

Change to the directory where you extracted the source:

cd mydumper-0.5.1/

Get and apply the patch

Then get the patch from Launchpad issue #1002291.

wget http://goo.gl/eUMx6

Now patch the source with what you just downloaded, as follows:

patch -p1 < eUMx6

Compile the source

Then you compile the source. Do not forget the "." after cmake!

cmake .
make

Install

Finally, you need to install the binaries to /usr/local/bin:

sudo make install

Using Mydumper

As we said earlier, Mydumper works by having multiple threads dumping multiple tables, or parts of tables, concurrently.

The program has many options, but the important ones are:

--database

Specifies the name of the database to dump

--threads

The number of threads to use. Generally set this to the number of CPUs in the server, but it can be more in some cases.

--rows

The number of rows per "chunk", before creating another chunk for the same table. We used 500,000, and got good results, but it depends on how many rows your tables have.

You can use this script for backup.

#!/bin/sh

DIR_BACKUP=/backup-directory
DB_NAME=database_name
DB_HOST=localhost   # mydumper's --host option below expects this to be set
DB_USER=user
DB_PASS=password

DOW=`date "+%a"`
DB_DUMP=$DIR_BACKUP/dbdump.$DB_NAME.$DOW

mydumper \
  --database=$DB_NAME \
  --host=$DB_HOST \
  --user=$DB_USER \
  --password=$DB_PASS \
  --outputdir=$DB_DUMP \
  --rows=500000 \
  --compress \
  --build-empty-files \
  --threads=2 \
  --compress-protocol

Save that file to /usr/local/bin/dbdump.sh

Make it executable using:

chmod +x /usr/local/bin/dbdump.sh

Create a crontab job to run it daily, say at 4:10 am local server time.

10 04 * * * /usr/local/bin/dbdump.sh

Description of Mydumper's output data

Mydumper does not output to a single file, but rather to a set of files in a directory. The --outputdir option specifies the name of the directory to use.

The output has two parts:

Schema

For each table in the database, a file containing the CREATE TABLE statement will be created. It will be named:

dbname.tablename-schema.sql.gz

You could manipulate this schema and gain some advantages: for example, uncompress the file and edit the table creation statement to remove all indexes other than the primary index. This would make the loading of huge tables much faster than with indexes.

Remember to do a CREATE INDEX for each dropped index after the data has been loaded.

Data

For each table whose number of rows exceeds the --rows parameter, you will have one or more files called:

dbname.tablename.0000n.sql.gz

Where "n" starts at 0 and goes up to however many chunks were needed for that table.

Import/restore using Myloader

You don't need to know the internals of how Mydumper stores its data in order to do a restore/import. The Myloader tool does that for you.

If you are importing on a machine that has more CPUs than the machine you dumped on, you can use a higher number for the --threads parameter, for example:

myloader \
  --database=$DB_NAME \
  --directory=$DB_DUMP \
  --queries-per-transaction=50000 \
  --threads=6 \
  --compress-protocol \
  --verbose=3

So, what improvement can we see? Here are the details.

Before: Using mysqldump:

For a site that has 5.8 million rows in one table, and 1.39 million in another, there was a noticeable improvement when using Mydumper instead of mysqldump.

11:00:01 - Start dump of database to .../dbdump.live.Sat.sql.gz
11:08:33 - Successfully dumped database live

So, it takes 8.5 minutes for the dump to complete.

After: Using Mydumper:

After using Mydumper with --rows=500000, the backup was much faster:

11:00:01 - Start dump of database to .../dbdump.live.Mon
11:00:43 - Finished dumping the database live

Only 42 seconds!

Other approaches for faster backups

For sites that do not have large tables that warrant the use of Mydumper, there are other approaches that can help.

For example, you can exclude the data from certain tables with transient content, such as watchdog, sessions, and all the cache tables.

This partial code snippet uses some shell script wizardry to exclude such tables with transient data, and only include their table creation statements, but not their content:

  #!/bin/sh

  # Tables to backup structure, but not data
  DB_NO_DATA="/^(watchdog|sessions|cache(_.+)?)$/"

  ...

  TABLES=`mysql -u... -p... -B -N -e 'SHOW TABLES;' $DB_NAME`
  for TABLE in $TABLES
  do
    RESULT=`echo $TABLE | awk "$DB_NO_DATA"`
    if [ "$RESULT" != "" ]; then
      # Structure only
      OPTS=" --no-data "
    else
      # Default, structure AND data
      OPTS=""
    fi
    mysqldump $OPTS -u... -p... $DB_NAME $TABLE >> $DB_DUMP
    if [ $? != 0 ]; then
      echo "ERROR: Failed in dumping database $DB_NAME"
    fi
  done

  # Compress the backup
  bzip2 $DB_DUMP
  mv $DB_DUMP.bz2 $DB_DUMP
  echo "Successful dump for database $DB_NAME"

A similar approach for excluding contents of tables with transient content is outlined in this article, if you are using Capistrano.

A different way that relies, like Mydumper, on doing parallel dump and restore, is to use Lullabot's MySQL Parallel, which uses a shell script wrapper around GNU Parallel, and Parallel bzip2. This is of course much better than plain mysqldump, but will not parallelize portions of tables. So a site with a huge table will do other tables quickly, but will be slow for the huge table since all of it will be done in a single thread.

Other use cases for Mydumper/Myloader

More information on how we used MyDumper/MyLoader to cut down the export/import of a massive Drupal database can be found in our presentation: Huge! Drupal site with 381 modules and 174GB MySQL database and 200 million row tables at Drupal Camp Toronto in November 2012.

Further Information

You can find more information about Mydumper in the following articles:

Update: Added link to presentation that contains MyDumper/MyLoader info.

Jul 26 2012
Jul 26

The instructions still need some work. I did some updating but haven't tried using it with a clean install yet. After reading this it sounds like there are some bigger changes. I've also been trying to switch from MacPorts to Homebrew so that'll also mean some changes to this.

Install XCode

Install XCode from the App Store. Run Xcode and open its Preferences (⌘+,), select the Downloads tab and then the Components sub-tab. Click the Install button on the Command Line Tools component.

Install MacPorts

Become root

To follow these instructions you need to be running as the root user using the default sh shell. If you've got administrator permissions you can open up a Terminal window and switch users using the sudo command then provide your password.

user@mac:~% sudo su
Password:
sh-3.2#

Install MySQL

Use port to install MySQL:

/opt/local/bin/port install mysql55-server

You'll need to create the databases:

sudo -u _mysql /opt/local/lib/mysql55/bin/mysql_install_db

Let launchd know it should start MySQL at startup.

/opt/local/bin/port load mysql55-server

Secure the server and set a new admin password:

/opt/local/lib/mysql55/bin/mysql_secure_installation

Create a configuration file:

cp /opt/local/share/mysql55/support-files/my-large.cnf /etc/my.cnf

Edit /etc/my.cnf using your editor of choice and make the following changes to the [mysqld]:

  • Change the maximum packet size to 16M:

    max_allowed_packet = 16M

  • Enable network access by ensuring the first line is commented out, and add the second line to limit access to localhost:

    #skip-networking
    bind-address = 127.0.0.1

Restart MySQL to have the settings changes take effect:

port unload mysql55-server
port load mysql55-server

A last, optional, step is to create some symlinks for the executables so they're in the path:

ln -s /opt/local/lib/mysql55/bin/mysql /opt/local/bin/mysql
ln -s /opt/local/lib/mysql55/bin/mysqldump /opt/local/bin/mysqldump
ln -s /opt/local/lib/mysql55/bin/mysqlimport /opt/local/bin/mysqlimport

PHP

You need to create a php.ini file:

if ( ! test -e /private/etc/php.ini ) ; then cp /private/etc/php.ini.default /private/etc/php.ini; fi

Now open /private/etc/php.ini and set the correct location for MySQL's socket by finding:

mysqli.default_socket = /var/mysql/mysql.sock

And changing it to:

mysqli.default_socket = /opt/local/var/run/mysql5/mysqld.sock

Repeat for both mysql.default_socket and pdo_mysql.default_socket.

While you're editing php.ini you might as well set the timezone to avoid warnings. Locate the date.timezone setting, uncomment it (by removing the semi-colon at the beginning of the line), and fill in the appropriate timezone:

date.timezone = America/New_York

Enable PHP by opening /private/etc/apache2/httpd.conf in the editor of your choice and making the following changes.

  • Uncomment this line:

    #LoadModule php5_module        libexec/apache2/libphp5.so

  • Find and change this one:

        DirectoryIndex index.html

    To this:

        DirectoryIndex index.php index.html

Then restart Apache:

apachectl graceful

Install PEAR / PECL

I scratched my head for a while on this one before finding this setup guide.

php /usr/lib/php/install-pear-nozlib.phar

Then add this line to your php.ini:

include_path = ".:/usr/lib/php/pear"

Now you can update the channels and upgrade the packages:

pear channel-update pear.php.net
pecl channel-update pecl.php.net
pear upgrade-all

Drush

If you're doing anything with Drupal you'll find Drush to be indispensable.

pear channel-discover pear.drush.org
pear install drush/drush

Memcache

You don't need this to run Drupal but I use it on production servers and I want to try to match the setup.

Use port to install and start memcached:

/opt/local/bin/port install memcached
/opt/local/bin/port load memcached

Since pecl won't let us pass --with-libmemcached-dir=/opt/local to the configure script, a simple work around is to just add some symlinks:

ln -s /opt/local/include/libmemcached /usr/include/
ln -s /opt/local/include/libmemcached-1.0 /usr/include/
ln -s /opt/local/include/libhashkit /usr/include/
ln -s /opt/local/include/libhashkit-1.0 /usr/include/
ln -s /opt/local/lib/libmemcached.dylib /usr/lib/
ln -s /opt/local/lib/libhashkit.dylib /usr/lib/

Then we can install the module:

pecl install memcached

You'll need to edit your /etc/php.ini and add the following line:

extension=memcached.so

If you want to clean up the symlinks (which will prevent pecl upgrade from being able to upgrade the module) here's how you do it:

unlink /usr/include/libmemcached
unlink /usr/include/libmemcached-1.0
unlink /usr/include/libhashkit
unlink /usr/include/libhashkit-1.0
unlink /usr/lib/libmemcached.dylib
unlink /usr/lib/libhashkit.dylib

XDebug

This is also optional, but I find it's very handy to use with MacGDBp to debug those tricky issues.  It's also nice to use with webgrind for profiling.

Use pecl to install XDebug:

pecl install xdebug

You'll need to edit your /etc/php.ini uncomment the following line:

zend_extension="/usr/lib/php/extensions/no-debug-non-zts-20090626/xdebug.so"

Then add this one:

xdebug.profiler_enable_trigger = 1


This lets you enable the profiler on demand by appending XDEBUG_PROFILE=1 to the query string of a URL.

My VirtualHost Setup

I like being able to have multiple Drupal sites a few keystrokes away so I create virtual hosts for d5, d6 and d7 using the following procedure.

Edit /etc/apache2/users/amorton.conf and add a VirtualHost to the Apache config:

# This should really be in httpd.conf but i'm keeping it simple by doing it here:
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName d7
    DocumentRoot /Users/amorton/Sites/d7
    <Directory /Users/amorton/Sites/d7>
        AllowOverride All
        Allow from all
    </Directory>
</VirtualHost>

<VirtualHost *:80>
    ServerName d8
    DocumentRoot /Users/amorton/Sites/d8
    <Directory /Users/amorton/Sites/d8>
        AllowOverride All
        Allow from all
    </Directory>
</VirtualHost>

Obviously you'd want to replace amorton with your username.

Add entries to the /private/etc/hosts file:

127.0.0.1       d7
127.0.0.1       d8

Jul 09 2012
Jul 09

Since having moved to Mac from Linux around 1 year ago I have been using MAMP to develop with Drupal. I actually bought the PRO version and my favorite MAMP features are:

  • Easy creation of Virtual Hosts
  • Easy switching between PHP 5.2 and 5.3

That said, there are some major annoyances with MAMP:

  • PHP versions hardly ever get updated (even with paid PRO version), latest available is 5.3.6
  • PHPMyAdmin version included is very outdated and missing out on some nice features
  • MySQL version is stuck at 5.1
  • PHP requires a workaround to work in the terminal
  • Built-in PEAR doesn't play nice

So, after reading about VirtualDocumentRoot in this article I decided to take the plunge and try replacing my MAMP setup with the built-in Apache and PHP and using Homebrew to complement whatever was missing. Since I hardly use Drupal 6 anymore, I figured I can live without PHP 5.2.

What started out as a "I wonder if I can get this to work" 10-minute hobby thing turned out into a full-day task and after finally getting everything to work I'm really satisfied with the results. Since I ran into a bunch of pitfalls along the way I thought I'd document my findings for future reference and also to share with others thinking about doing the same thing.

Despite being fully aware that we can actually and effortlessly build a full LAMP stack using Homebrew, I figured since OSX ships with Apache and PHP let's use them!

Step 1 - Backup

This one is a no-brainer. Backup all your stuff, especially MySQL databases as you will have to import them manually later on.

Step 2 - Enable built-in Apache

Open MAMP and stop all running servers but don't uninstall MAMP just yet! Ok, now go to the "Tools" menu and click on "Enable built-in Apache server". Next, go to "System Preferences -> Sharing" and enable the "Web Sharing" checkbox.

Update for Mountain Lion: The "Web Sharing" option doesn't exist anymore! In order to get things working check out the excellent instructions in this article.

If you're having issues with the checkbox not staying on, don't worry, it's a simple fix: replace your /etc/apache2/httpd.conf with the default version:

# Backup your current httpd.conf (or remove it altogether)
sudo mv /etc/apache2/httpd.conf /etc/apache2/httpd.conf.bak

# Copy over the default settings
sudo cp /etc/apache2/httpd.conf.default /etc/apache2/httpd.conf

Step 3 - Enable built-in PHP

This one is easy: edit your new httpd.conf file and uncomment the following line:

#LoadModule php5_module        libexec/apache2/libphp5.so

Step 4 - Install MySQL

Just download MySQL and install it! You can then go to "System Preferences -> MySQL" to start the server and set it up to start up automatically.

Now, all we need to do is add the mysql directory to our PATH and we're good. You can set this up either in your ~/.bashrc or ~/.profile (note that if you have a .bashrc then your .profile won't get used, so pick one or the other):

export PATH="/usr/local/mysql/bin:$PATH"

Step 5 - Install PHPMyAdmin

This is an optional but highly-recommended step. I'm not going to bother rewriting this as there's a great article here that gives precise and up-to-date instructions. After following the instructions you should be able to access PHPMyAdmin at http://localhost/~username/phpmyadmin.

In order to persist your settings in PHPMyAdmin, you will need to do the following:

  • Create a "pma" user and give it a password "pmapass" (or whatever)
  • Import the "create_tables.sql" file found in your PHPMyAdmin's example folder
  • Grant full access to the "pma" user for the "phpmyadmin" database
  • Now edit your config.inc.php and add the following:
// Setup user and pass.
$cfg['Servers'][$i]['controluser'] = 'pma';
$cfg['Servers'][$i]['controlpass'] = 'pmapass'; // Replace this with your password.

// Setup database.
$cfg['Servers'][$i]['pmadb'] = 'phpmyadmin';

// Setup tables.
$cfg['Servers'][$i]['bookmarktable'] = 'pma_bookmark';
$cfg['Servers'][$i]['relation'] = 'pma_relation';
$cfg['Servers'][$i]['table_info'] = 'pma_table_info';
$cfg['Servers'][$i]['table_coords'] = 'pma_table_coords';
$cfg['Servers'][$i]['pdf_pages'] = 'pma_pdf_pages';
$cfg['Servers'][$i]['column_info'] = 'pma_column_info';
$cfg['Servers'][$i]['history'] = 'pma_history';
$cfg['Servers'][$i]['designer_coords'] = 'pma_designer_coords';
$cfg['Servers'][$i]['tracking'] = 'pma_tracking';
$cfg['Servers'][$i]['userconfig'] = 'pma_userconfig';
$cfg['Servers'][$i]['hide_db'] = 'information_schema';
$cfg['Servers'][$i]['recent'] = 'pma_recent';
$cfg['Servers'][$i]['table_uiprefs'] = 'pma_table_uiprefs';

// For good measure, make it shut up about Mcrypt.
$cfg['McryptDisableWarning'] = true;

At this point everything should be ok, but I couldn't get it to work. I found that logging into PHPMyAdmin as "pma" and then logging out made everything work. I won't bother to try and understand this, just happy it worked.

Step 6 - Install Homebrew

Ok, now things start to get fun! All you need to do to install Homebrew can be found in the very nice installation instructions.

Once Homebrew is installed, run brew doctor from your terminal and fix any errors that appear; it's pretty straightforward. If you are getting a weird error about "Cowardly refusing to continue at this point /", don't worry. I had to update my XCode version and install the Command Line Tools from within the XCode preferences to make brew work without complaining.

Step 7 - Configure VirtualDocumentRoot

Once again, there's a great article on this already so I won't bother rewriting this step.

Are you still with me? Don't forget to add Google DNS or Open DNS or something else in your network preferences after 127.0.0.1 or you won't be able to access anything outside your local environment.

At this point you should probably test out accessing http://localhost/~username in your browser and make sure everything works. Also create a "test.dev" folder in your Sites directory and make sure you can access it via http://test.dev.

If, like myself, you received some access denied errors, you might have to adjust the Directory setting added in your httpd.conf:

# Replace "alex" with your own username.
<Directory "/Users/alex/Sites">
  Options FollowSymLinks Indexes
  AllowOverride All
  Order deny,allow
  RewriteEngine on
  RewriteBase /
</Directory>

Also, be warned, if you start getting errors navigating to any path except the front page in a Drupal site, you might need to edit your .htaccess file and uncomment the RewriteBase / line.

Step 8 - Install PEAR

This is an optional, but highly recommended step. First, download the installer:

# If you have wget installed (hint: use Homebrew)
wget http://pear.php.net/go-pear.phar
 
# If not
curl -0 http://pear.php.net/go-pear.phar > go-pear.phar

Now, just run the phar archive:

sudo php -d detect_unicode=0 go-pear.phar

For whatever reason PEAR wants to install itself in your home folder by default; I suggest moving it to "/usr/local/pear".

After installing, make sure that the PEAR include path has been added to the end of your /etc/php.ini:

;***** Added by go-pear
include_path=".:/usr/local/pear/share/pear"
;*****

(If you don't have an /etc/php.ini file, copy it over from /etc/php.ini.default)

You will also have to add the PEAR path to your PATH in your .bashrc or .profile (see step 4).

export PATH="/usr/local/pear/bin:$PATH"

Step 9 - Install Drush

With PEAR available, installing Drush is a breeze. I personally prefer to install Drush using PEAR because it's easier to manage and update:

sudo pear channel-discover pear.drush.org
sudo pear install drush/drush

If you skipped Step 8 and don't have PEAR installed, don't worry, there's plenty of information in the README to get Drush up and running.

Step 10 - Install XDebug

Ok, we are almost there! The last thing we really need to get a fully-functional development environment is XDebug. Unfortunately, despite many articles mentioning installation via Homebrew, I couldn't find the formula, so let's compile and install it ourselves.

Updated: There's an XDebug brew formula in @josegonzalez's homebrew-php. Just follow the steps in Step 11 but replace "xhprof" with "xdebug".

Once again, there's a great article for this. If you get an "autoconf" error you might need to install it using Homebrew:

brew install autoconf

Now phpize should work. Don't worry if it gives you a warning, just follow the steps in the article and it'll work anyway!

Now we should make a couple tweaks to our php.ini to take advantage of XDebug:

# General PHP error settings.
error_reporting = E_ALL | E_STRICT # This is nice for development.
display_errors = On # Make sure we can see errors.
log_errors = On # Enable the error log.
html_errors = On # This is crucial for XDebug

# XDebug specific settings.
[xdebug]
zend_extension="/usr/lib/php/extensions/no-debug-non-zts-20090626/xdebug.so" # Enable the XDebug extension.
# The following are sensible defaults, alter as needed.
xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_mode=req
xdebug.remote_host=127.0.0.1
xdebug.remote_port=9000

Afterwards just restart Apache and run php -v or open a phpinfo() script and you will see that XDebug appears after the "Zend Engine" version.

Step 11 - Install XHProf

This step is optional and I had forgotten about it until Nick reminded me, so props to him! :)

XHProf is a PHP profiler which integrates nicely with the Devel Drupal module and here's how to install it using Homebrew:

# Tap into the php formulae by @josegonzalez.
brew tap josegonzalez/php
brew install php53-xhprof
# This will automatically install the PCRE dependency.

Now all we need to do is add it to our php.ini and we're golden:

[xhprof]
extension="/usr/local/Cellar/php53-xhprof/0.9.2/xhprof.so" # Make sure this is the path that homebrew prints when it's done installing XHProf.
;This is the directory that XHProf stores its profile runs in.
xhprof.output_dir=/tmp

Now restart apache and look for "xhprof" in your phpinfo() to confirm all went well.

Bonus Steps - APC, Intl and UploadProgress

If you added josegonzalez's excellent homebrew-php (step 11), you can easily install some extra useful extensions:

# Recommended for Drupal
brew install php53-uploadprogress

# Recommended for Symfony2
brew install php53-intl
brew install php53-apc

# Don't forget to add them all to your php.ini, as per brew's post-install notes!

Step 12 - Cleanup

Ok, we're done! Now go to PHPMyAdmin and re-create your databases and users and also import the backups you made from MAMP's version of PHPMyAdmin. Assuming you followed all the steps successfully, it is now safe to completely uninstall MAMP.

Good for you! :) Now you can go look over the shoulder of a coworker that uses MAMP, smirk and shake your head in disapproval.

May 31 2012
May 31


Wouldn't it be great if there was an easy way to access php.net or other documentation offline or on a plane?

UPDATE: Sadly, as this blog post went to press, two important updates came out that change the usefulness of this blog post. Dash is now ad-supported, and secondly, it ships with a Drupal DocSet available for download, so that's one fewer step you have to perform to have all the docs that matter to you in Dash.

There's a free-as-in-beer application for Mac OS X called Dash (available on the Mac App Store at http://itunes.apple.com/us/app/dash/id458034879?ls=1&mt=12). Dash is a nice-looking documentation browser with several useful features, such as the ability to query it with a custom URL string (dash://YOURQUERY), which lends itself for use in tools like Alfred.

Dash can also download additional documentation sets for many open source technologies, including MySQL, PHP, and jQuery. It can be handy to search through the latest PHP API documentation no matter what kind of connection you're on, like so:

Dash - Documentation

In addition, Dash also has the ability to browse any API documentation that you have installed through XCode onto your system. (In fact, any files in DocSet format that are located inside the ~/Library/Developer/Shared/Documentation/DocSets directory can be read by Dash.)

In addition to the freely available DocSets that are available for major open-source technologies, it's easy to make your own DocSets using doxygen. I went ahead and made a DocSet for Drupal 7.x using doxygen. Not every method that's available at api.drupal.org is here, but it's a great start, especially if you want a single offline app where you can query offline documentation.

  1. Unzip the file
  2. Move org.drupal.docset to ~/Library/Developer/Shared/Documentation/DocSets/
  3. Launch Dash and start searching, like so.
Dash - Documentation


May 22 2012
May 22

Relational databases are able to store, with minimal fuss, pretty much any data entities you throw at them. For the more complex cases – particularly cases involving hierarchical data – they offer many-to-many relationships. Querying many-to-many relationships is usually quite easy: you perform a series of SQL joins in your query; and you retrieve a result set containing the combination of your joined tables, in denormalised form (i.e. with the data from some of your tables being duplicated in the result set).

A denormalised query result is quite adequate, if you plan to process the result set further – as is very often the case, e.g. when the result set is subsequently prepared for output to HTML / XML, or when the result set is used to populate data structures (objects / arrays / dictionaries / etc) in programming memory. But what if you want to export the result set directly to a flat format, such as a single CSV file? In this case, denormalised form is not ideal. It would be much better, if we could aggregate all that many-to-many data into a single result set containing no duplicate data, and if we could do that within a single SQL query.

This article presents an example of how to write such a query in MySQL – that is, a query that's able to aggregate complex many-to-many relationships, into a result set that can be exported directly to a single CSV file, with no additional processing necessary.

Example: a lil' Bio database

For this article, I've whipped up a simple little schema for a biographical database. The database contains, first and foremost, people. Each person has, as his/her core data: a person ID; a first name; a last name; and an e-mail address. Each person also optionally has some additional bio data, including: bio text; date of birth; and gender. Additionally, each person may have zero or more: profile pictures (with each picture consisting of a filepath, nothing else); web links (with each link consisting of a title and a URL); and tags (with each tag having a name, existing in a separate tags table, and being linked to people via a joining table). For the purposes of the example, we don't need anything more complex than that.

Here's the SQL to create the example schema:

CREATE TABLE person (
  pid int(10) unsigned NOT NULL AUTO_INCREMENT,
  firstname varchar(255) NOT NULL,
  lastname varchar(255) NOT NULL,
  email varchar(255) NOT NULL,
  PRIMARY KEY (pid),
  UNIQUE KEY email (email),
  UNIQUE KEY firstname_lastname (firstname(100), lastname(100))
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

CREATE TABLE tag (
  tid int(10) unsigned NOT NULL AUTO_INCREMENT,
  tagname varchar(255) NOT NULL,
  PRIMARY KEY (tid),
  UNIQUE KEY tagname (tagname)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;

CREATE TABLE person_bio (
  pid int(10) unsigned NOT NULL,
  bio text NOT NULL,
  birthdate varchar(255) NOT NULL DEFAULT '',
  gender varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (pid),
  FULLTEXT KEY bio (bio)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE person_pic (
  pid int(10) unsigned NOT NULL,
  pic_filepath varchar(255) NOT NULL,
  PRIMARY KEY (pid, pic_filepath)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE person_link (
  pid int(10) unsigned NOT NULL,
  link_title varchar(255) NOT NULL DEFAULT '',
  link_url varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (pid, link_url),
  KEY link_title (link_title)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE person_tag (
  pid int(10) unsigned NOT NULL,
  tid int(10) unsigned NOT NULL,
  PRIMARY KEY (pid, tid)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

And here's the SQL to insert some sample data into the schema:

-- Example e-mail addresses; each one must be unique because of the UNIQUE KEY on email.
INSERT INTO person (firstname, lastname, email) VALUES ('Pete', 'Wilson', 'pete@example.com');
INSERT INTO person (firstname, lastname, email) VALUES ('Sarah', 'Smith', 'sarah@example.com');
INSERT INTO person (firstname, lastname, email) VALUES ('Jane', 'Burke', 'jane@example.com');

INSERT INTO tag (tagname) VALUES ('awesome');
INSERT INTO tag (tagname) VALUES ('fantabulous');
INSERT INTO tag (tagname) VALUES ('sensational');
INSERT INTO tag (tagname) VALUES ('mind-boggling');
INSERT INTO tag (tagname) VALUES ('dazzling');
INSERT INTO tag (tagname) VALUES ('terrific');

INSERT INTO person_bio (pid, bio, birthdate, gender) VALUES (1, 'Great dude, loves elephants and tricycles, is really into coriander.', '1965-04-24', 'male');
INSERT INTO person_bio (pid, bio, birthdate, gender) VALUES (2, 'Eccentric and eclectic collector of phoenix wings. Winner of the 2003 International Small Elbows Award.', '1982-07-20', 'female');
INSERT INTO person_bio (pid, bio, birthdate, gender) VALUES (3, 'Has purply-grey eyes. Prefers to only go out on Wednesdays.', '1990-11-06', 'female');

INSERT INTO person_pic (pid, pic_filepath) VALUES (1, 'files/person_pic/pete1.jpg');
INSERT INTO person_pic (pid, pic_filepath) VALUES (1, 'files/person_pic/pete2.jpg');
INSERT INTO person_pic (pid, pic_filepath) VALUES (1, 'files/person_pic/pete3.jpg');
INSERT INTO person_pic (pid, pic_filepath) VALUES (3, 'files/person_pic/jane_on_wednesday.jpg');

INSERT INTO person_link (pid, link_title, link_url) VALUES (2, 'The Great Blog of Sarah', 'http://www.omgphoenixwingsaresocool.com/');
INSERT INTO person_link (pid, link_title, link_url) VALUES (3, 'Catch Jane on Blablablabook', 'http://www.blablablabook.com/janepurplygrey');
INSERT INTO person_link (pid, link_title, link_url) VALUES (3, 'Jane ranting about Thursdays', 'http://www.janepurplygrey.com/thursdaysarelame/');

INSERT INTO person_tag (pid, tid) VALUES (1, 3);
INSERT INTO person_tag (pid, tid) VALUES (1, 4);
INSERT INTO person_tag (pid, tid) VALUES (1, 5);
INSERT INTO person_tag (pid, tid) VALUES (1, 6);
INSERT INTO person_tag (pid, tid) VALUES (2, 2);

Querying for direct CSV export

If we were building, for example, a simple web app to output a list of all the people in this database (along with all their biographical data), querying this database would be quite straightforward. Most likely, our first step would be to query the one-to-one data: i.e. query the main 'person' table, join on the 'bio' table, and loop through the results (in a server-side language, such as PHP). The easiest way to get at the rest of the data, in such a case, would be to then query each of the many-to-many relationships (i.e. user's pictures; user's links; user's tags) in separate SQL statements, and to execute each of those queries once for each user being processed.

In that scenario, we'd be writing four different SQL queries, and we'd be executing SQL numerous times: we'd execute the main query once, and we'd execute each of the three secondary queries, once for each user in the database. So, with the sample data provided here, we'd be executing SQL 1 + (3 x 3) = 10 times.
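
To make the cost concrete, here's a hypothetical sketch of that "one query per relationship, per person" approach in plain PHP/PDO; the connection details are made up, but the table and column names come from the schema above:

<?php
// Naive approach: 1 main query, plus 3 extra queries for every person returned.
$pdo = new PDO('mysql:host=localhost;dbname=bio_example', 'someuser', 'somepass');

$people = $pdo->query(
  'SELECT p.pid, p.firstname, p.lastname, p.email, pb.bio, pb.birthdate, pb.gender
   FROM person p LEFT JOIN person_bio pb ON p.pid = pb.pid'
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($people as &$person) {
  $pics = $pdo->prepare('SELECT pic_filepath FROM person_pic WHERE pid = ?');
  $pics->execute(array($person['pid']));
  $person['pics'] = $pics->fetchAll(PDO::FETCH_COLUMN);

  $links = $pdo->prepare('SELECT link_title, link_url FROM person_link WHERE pid = ?');
  $links->execute(array($person['pid']));
  $person['links'] = $links->fetchAll(PDO::FETCH_ASSOC);

  $tags = $pdo->prepare('SELECT t.tagname FROM person_tag pt JOIN tag t ON pt.tid = t.tid WHERE pt.pid = ?');
  $tags->execute(array($person['pid']));
  $person['tags'] = $tags->fetchAll(PDO::FETCH_COLUMN);
}
unset($person);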

Alternatively, we could write a single query which joins together all of the three many-to-many relationships in one go, and our web app could then just loop through a single result set. However, this result set would potentially contain a lot of duplicate data, as well as a lot of NULL data. So, the web app's server-side code would require extra logic, in order to deal with this messy result set effectively.

In our case, neither of the above solutions is adequate. We can't afford to write four separate queries, and to perform 10 query executions. We don't want a single result set that contains duplicate data and/or excessive NULL data. We want a single query, that produces a single result set, containing one person per row, and with all the many-to-many data for each person aggregated into that person's single row.

Here's the magic SQL that can make our miracle happen:

SELECT        person_base.pid,
              person_base.firstname,
              person_base.lastname,
              person_base.email,
              IFNULL(person_base.bio, '') AS bio,
              IFNULL(person_base.birthdate, '') AS birthdate,
              IFNULL(person_base.gender, '') AS gender,
              IFNULL(pic_join.val, '') AS pics,
              IFNULL(link_join.val, '') AS links,
              IFNULL(tag_join.val, '') AS tags
FROM (
    SELECT        p.pid,
                  p.firstname,
                  p.lastname,
                  p.email,
                  IFNULL(pb.bio, '') AS bio,
                  IFNULL(pb.birthdate, '') AS birthdate,
                  IFNULL(pb.gender, '') AS gender
    FROM          person p
    LEFT JOIN     person_bio pb
    ON            p.pid = pb.pid
) AS person_base
LEFT JOIN (
    SELECT        join_tbl.pid,
                  IFNULL(
                      GROUP_CONCAT(
                          DISTINCT CAST(join_tbl.pic_filepath AS CHAR)
                          SEPARATOR ';;'
                      ),
                      ''
                  ) AS val
    FROM          person_pic join_tbl
    GROUP BY      join_tbl.pid
) AS pic_join
ON            person_base.pid = pic_join.pid
LEFT JOIN (
    SELECT        join_tbl.pid,
                  IFNULL(
                      GROUP_CONCAT(
                          DISTINCT CONCAT(
                              CAST(join_tbl.link_title AS CHAR),
                              '::',
                              CAST(join_tbl.link_url AS CHAR)
                          )
                          SEPARATOR ';;'
                      ),
                      ''
                  ) AS val
    FROM          person_link join_tbl
    GROUP BY      join_tbl.pid
) AS link_join
ON            person_base.pid = link_join.pid
LEFT JOIN (
    SELECT        join_tbl.pid,
                  IFNULL(
                      GROUP_CONCAT(
                          DISTINCT CAST(t.tagname AS CHAR)
                          SEPARATOR ';;'
                      ),
                      ''
                  ) AS val
    FROM          person_tag join_tbl
    LEFT JOIN     tag t
    ON            join_tbl.tid = t.tid
    GROUP BY      join_tbl.pid
) AS tag_join
ON            person_base.pid = tag_join.pid
ORDER BY      lastname ASC,
              firstname ASC;

If you run this in a MySQL admin tool that supports exporting query results directly to CSV (such as phpMyAdmin), then there's no more fancy work needed on your part. Just click 'Export -> CSV', and you'll have your results looking like this:

pid,firstname,lastname,email,bio,birthdate,gender,pics,links,tags
3,Jane,Burke,jane@example.com,Has purply-grey eyes. Prefers to only go out on Wednesdays.,1990-11-06,female,files/person_pic/jane_on_wednesday.jpg,Catch Jane on Blablablabook::http://www.blablablabook.com/janepurplygrey;;Jane ranting about Thursdays::http://www.janepurplygrey.com/thursdaysarelame/,
2,Sarah,Smith,sarah@example.com,Eccentric and eclectic collector of phoenix wings. Winner of the 2003 International Small Elbows Award.,1982-07-20,female,,The Great Blog of Sarah::http://www.omgphoenixwingsaresocool.com/,fantabulous
1,Pete,Wilson,pete@example.com,Great dude, loves elephants and tricycles, is really into coriander.,1965-04-24,male,files/person_pic/pete1.jpg;;files/person_pic/pete2.jpg;;files/person_pic/pete3.jpg,,sensational;;mind-boggling;;dazzling;;terrific

The query explained

The most important feature of this query is that it takes advantage of MySQL's ability to perform subqueries. What we're actually doing is performing four separate queries: one query on the main person table (which joins to the person_bio table), and one on each of the three many-to-many elements of a person's bio. We're then joining these four queries, and selecting data from all of their result sets, in the parent query.

The magic function in this query is MySQL's GROUP_CONCAT() function. It basically allows us to join together the results of a particular field, using a delimiter string, much like the join() array-to-string function in many programming languages (e.g. PHP's implode() function). In this example, I've used two semicolons (;;) as the delimiter string.

In the case of person_link in this example, each row of this data has two fields ('link title' and 'link URL'); so, I've concatenated the two fields together (separated by a double-colon (::) string), before letting GROUP_CONCAT() work its wonders.

The case of person_tags is also interesting, as it demonstrates performing an additional join within the many-to-many subquery, and returning data from that joined table (i.e. the tag name) as the result value. So, all up, each of the many-to-many relationships in this example is a slightly different scenario: person_pic is the basic case of a single field within the many-to-many data; person_link is the case of more than one field within the many-to-many data; and person_tags is the case of an additional one-to-many join, on top of the many-to-many join.
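
If you do end up consuming this result set in code rather than exporting it straight to CSV, splitting the aggregated columns back apart is trivial. Here's a minimal PHP sketch, assuming $row is one fetched result row and using the ';;' and '::' delimiters chosen above:

<?php
// Split the aggregated columns back into arrays.
$pics = ($row['pics'] !== '') ? explode(';;', $row['pics']) : array();
$tags = ($row['tags'] !== '') ? explode(';;', $row['tags']) : array();

$links = array();
if ($row['links'] !== '') {
  foreach (explode(';;', $row['links']) as $link) {
    list($title, $url) = explode('::', $link, 2);
    $links[] = array('title' => $title, 'url' => $url);
  }
}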

Final remarks

Note that although this query depends on several MySQL-specific features, most of those features are available in a fairly equivalent form, in most other major database systems. Subqueries vary quite little between the DBMSes that support them. And it's possible to achieve GROUP_CONCAT() functionality in PostgreSQL, in Oracle, and even in SQLite.

It should also be noted that it would be possible to achieve the same result (i.e. the same end CSV output), using 10 SQL query executions and a whole lot of PHP (or other) glue code. However, taking that route would involve more code (spread over four queries and numerous lines of procedural glue code), and it would invariably suffer worse performance (although I make no guarantees as to the performance of my example query, I haven't benchmarked it with particularly large data sets).

This querying trick was originally written in order to export data from a Drupal MySQL database, to a flat CSV file. The many-to-many relationships were referring to field tables, as defined by Drupal's Field API. I made the variable names within the subqueries as generic as possible (e.g. join_tbl, val), because I needed to copy the subqueries numerous times (for each of the numerous field data tables I was dealing with), and I wanted to make as few changes as possible on each copy.

The trick is particularly well-suited to Drupal Field API data (known in Drupal 6 and earlier as 'CCK data'). However, I realised that it could come in useful with any database schema where a "flattening" of many-to-many fields is needed, in order to perform a CSV export with a single query. Let me know if you end up adopting this trick for schemas of your own.

May 01 2012
May 01

Devopsdays Mountain View sold out in a short 3 hours .. but there are other events that will breathe devops this summer.
DrupalCon in Munich will be one of them ..

Some of you might have noticed that I'm co-chairing the devops track for DrupalCon Munich.
The CFP is open till the 11th of this month and we are still actively looking for speakers.

We're trying to bridge the gap between Drupal developers and the people that put their code into production, at scale.
We also want to enhance the knowledge of the infrastructure components Drupal developers depend on.

We're looking for talks on culture (both success stories and failures) and on automation.
We're specifically looking for people talking about Drupal deployments, e.g. using tools like Capistrano, Chef, and Puppet.
We want to hear where Continuous Integration fits in your deployment: do you do Continuous Delivery of a Drupal environment?
And how do you test? Yes, we'd like to hear a lot about testing: performance tests, security tests, application tests and so on.
... Or have you solved the content vs code vs config deployment problem yet?

How are you measuring and monitoring these deployments, and adding metrics to them so you can get good visibility on both the system and user actions of your platform? Have you built fancy dashboards showing your whole organisation the current state of your deployment?

We're also looking for people talking about introducing different data backends (NoSQL), scaling different search backends, and building your own CDN using smart filesystem setups.
Or making smart use of existing backends, such as tuning and scaling MySQL, memcached and others.

So let's make it clear to the community that Drupal people do care about their code after they've committed it to source control!

Please submit your talks here

Feb 23 2012
Feb 23

One of our clients came to us with a performance issue on their Drupal 6 multi-site installation. Views were taking ages to save, the admin pages seemed unnecessarily sluggish, and clearing the cache put the site in danger of going down. They reported that the issue was most noticeable in their state-of-the-art hosting environment, yet was not reproducible on a local copy of the site — a baffling scenario as their 8 web heads and 2 database servers were mostly idle while the site struggled along.

Our performance analysis revealed two major issues. After implementing fixes, we saw the average time to save a Drupal view drop from 2 minutes 20 seconds to 4.6 seconds — a massive improvement. Likewise, the time to load the homepage on a warm cache improved from 2.3 seconds to 621 milliseconds. The two bugs that accounted for these huge gains turned out to be very interesting:

1. Intermediary system causes MySQL queries to slow down

Simple queries that are well indexed and cached can see significant lag when their packets are delivered through an intermediary. This actually has nothing to do with Drupal, as it is reproducible from the MySQL command line utility. (It's probably a bug in the MySQL libraries, but we're not entirely sure.) It could also be a problem with the intermediary, but we've reproduced it in two fairly different systems: F5's load balancer proxy and VMWare Fusion's network drivers/stack.

For example:

SELECT cid, data, created, expire, serialized FROM cache_menu WHERE cid IN (x)

A query like this one should execute in a millisecond or less. In our client’s case, however, we found that 40ms was being added to the query time. The strange part is that this extra delay only occurred when the size of the data payload returned was above a certain threshold, so most of the time, similar queries returned quickly, but around 10–20 of these simple queries had 40ms or more added to their execution time, resulting in significant slowdowns.

We briefly debugged the MySQL process and found it to be waiting for data. Unfortunately, we didn’t pursue this much further as the simple workaround was apparent: reroute the MySQL traffic directly to the database instead of through the F5 load balancer. The net change from applying this simple modification is that the time to save a view was reduced to 25.3 seconds.

2. Database prefixing isn’t designed to scale as the number of prefixes increases

Drupal can be configured to share a database with another system or Drupal install. To do this, it uses a function called db_prefix_tables() to add prefixes to table names so they don’t collide with other applications’ table names. Our client was using the table prefixing system to allow various sub-sites to share data such as views and nodes, and thus they had 151 entries in the db_prefixes list.

The problem is that db_prefix_tables() is not well optimized for this implementation edge case. It will run an internal PHP function called strtr() (string token replace) for each prefix, on every database query string. In our case, saving a view executed over 9200 queries, meaning strtr() was called more than 1.4 million times!

We created a fix using preg_replace_callback() which resulted in both the number of calls and execution time dropping dramatically. Our view could now be saved in a mere 10.3 seconds. The patch is awaiting review in the Drupal issue queue, and there’s a patch for Pressflow 6, too, in case someone needs it before it lands in core.
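
To illustrate the general idea (this is a simplified, hypothetical sketch, not the actual core patch; it ignores Drupal's global $db_prefix handling and uses a closure for brevity), the table name inside each {placeholder} is resolved in a single regex pass against the prefix map, instead of running strtr() once per configured prefix on every query string:

<?php
// Simplified sketch: resolve every {table} token in one preg_replace_callback() pass.
function example_prefix_tables($sql, array $prefixes) {
  $default = isset($prefixes['default']) ? $prefixes['default'] : '';
  $callback = function ($matches) use ($prefixes, $default) {
    $table = $matches[1];
    $prefix = isset($prefixes[$table]) ? $prefixes[$table] : $default;
    return $prefix . $table;
  };
  return preg_replace_callback('/\{([A-Za-z0-9_]+)\}/', $callback, $sql);
}

// Example: only tables listed in the map get a special prefix.
$prefixes = array('default' => '', 'node' => 'shared_', 'views_view' => 'shared_');
echo example_prefix_tables("SELECT nid FROM {node} n INNER JOIN {users} u ON n.uid = u.uid", $prefixes);
// Prints: SELECT nid FROM shared_node n INNER JOIN users u ON n.uid = u.uid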

The final tweaks included disabling the administration menu and the query logger. At that point, we finally reached a much more palatable 4.6 seconds for saving a view — still not as fast as it could be, but given the large number of views in the shared system, a respectable figure.

Feb 15 2012
Feb 15

I was halfway done adding some info on how to set up pecl/pear to my guide to running Drupal 6 on OS X 10.6 before I realized I'd been running Lion for almost nine months. So it seemed like a good excuse to update it for Lion. These might be a little wonky since I did an upgrade rather than a clean install, so if you notice anything please drop me a line.

Note: I'll save you the trouble of commenting: I am familiar with MAMP but would rather punch myself in the face than use it. If you'd like to use it, go right ahead, but I'm going to continue to compile my own so I know where everything ends up.

Install XCode

Install MacPorts

Become root

To follow these instructions you need to be running as the root user using the default sh shell. If you've got administrator permissions you can open up a Terminal window and switch users using the sudo command then provide your password.

user@mac:~% sudo su
Password:
sh-3.2#

Install MySQL

Use port to install MySQL:

/opt/local/bin/port install mysql5-server

You'll need to create the databases:

/opt/local/bin/mysql_install_db5 --user=mysql

Let launchd know it should start MySQL at startup.

/opt/local/bin/port load mysql5-server

Secure the server and set a new admin password:

/opt/local/bin/mysql_secure_installation5

Create a configuration file:

cp /opt/local/share/mysql5/mysql/my-large.cnf /etc/my.cnf

Edit /etc/my.cnf using your editor of choice and make the following changes to the [mysqld]:

  • Change the maximum packet size to 16M:

    max_allowed_packet = 16M

  • Enable network access by ensuring the first line is commented out, and add the second line to limit access to localhost:

    #skip-networking
    bind-address = 127.0.0.1

Restart MySQL to have the settings changes take effect:

/opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper restart

A last, optional, step is to create symlinks so the mysql5 executable can be invoked as mysql and mysqldump5 as mysqldump:

ln -s /opt/local/bin/mysql5 /opt/local/bin/mysql
ln -s /opt/local/bin/mysqldump5 /opt/local/bin/mysqldump

PHP

You need to create a php.ini file:

if ( ! test -e /private/etc/php.ini ) ; then cp /private/etc/php.ini.default /private/etc/php.ini; fi

Now open /private/etc/php.ini and set the correct location for MySQL's socket by finding:

mysqli.default_socket = /var/mysql/mysql.sock

And changing it to:

mysqli.default_socket = /opt/local/var/run/mysql5/mysqld.sock

Repeat for both mysql.default_socket and pdo_mysql.default_socket.

While you're editing php.ini you might as well set the timezone to avoid warnings. Locate the date.timezone setting, uncomment it (by removing the semi-colon at the beginning of the line), and fill in the appropriate timezone:

date.timezone = America/New_York

Enable PHP by opening /private/etc/apache2/httpd.conf in the editor of your choice and making the following changes.

  • Uncomment this line:

    #LoadModule php5_module        libexec/apache2/libphp5.so

  • Find and change this one:

        DirectoryIndex index.html

    To this:

        DirectoryIndex index.php index.html

Then restart Apache:

apachectl graceful

Install PEAR / PECL

I scratched my head for a while on this one before finding this setup guide.

php /usr/lib/php/install-pear-nozlib.phar

Then add this line to your php.ini:

include_path = ".:/usr/lib/php/pear"

Now you can update the channels and upgrade the packages:

pear channel-update pear.php.net
pecl channel-update pecl.php.net
pear upgrade-all

Memcache

You don't need this to run Drupal but I use it on production servers and I want to try to match the setup.

Use port to install and start memcached:

/opt/local/bin/port install memcached
/opt/local/bin/port load memcached

Since pecl won't let us pass --with-libmemcached-dir=/opt/local to the configure script, a simple work around is to just add some symlinks:

ln -s /opt/local/include/libmemcached /usr/include/
ln -s /opt/local/include/libmemcached-1.0 /usr/include/
ln -s /opt/local/include/libhashkit /usr/include/
ln -s /opt/local/include/libhashkit-1.0 /usr/include/
ln -s /opt/local/lib/libmemcached.dylib /usr/lib/
ln -s /opt/local/lib/libhashkit.dylib /usr/lib/

Then we can install the module:

pecl install memcached

You'll need to edit your /etc/php.ini and add the following line:

extension=memcached.so

If you want to clean up the symlinks (which will prevent pecl upgrade from being able to upgrade the module) here's how you do it:

unlink /usr/include/libmemcached
unlink /usr/include/libmemcached-1.0
unlink /usr/include/libhashkit
unlink /usr/include/libhashkit-1.0
unlink /usr/lib/libmemcached.dylib
unlink /usr/lib/libhashkit.dylib

XDebug

This is also optional, but I find it's very handy to use with MacGDBp to debug those tricky issues.

Use pecl to install XDebug:

pecl install xdebug

You'll need to edit your /etc/php.ini and add the following lines:

zend_extension="/usr/lib/php/extensions/no-debug-non-zts-20090626/xdebug.so"
xdebug.profiler_enable_trigger = 1

My VirtualHost Setup

I like being able to have multiple Drupal sites a few keystrokes away so I create virtual hosts for d5, d6 and d7 using the following procedure.

Edit /etc/apache2/users/amorton.conf and add a VirtualHost to the Apache config:

# This should really be in httpd.conf but i'm keeping it simple by doing it here:
NameVirtualHost *:80

<VirtualHost *:80>
    ServerName d6
    DocumentRoot /Users/amorton/Sites/d6
    <Directory /Users/amorton/Sites/d6>
        AllowOverride All
    </Directory>
</VirtualHost>

<VirtualHost *:80>
    ServerName d7
    DocumentRoot /Users/amorton/Sites/d7
    <Directory /Users/amorton/Sites/d7>
        AllowOverride All
    </Directory>
</VirtualHost>

Obviously you'd want to replace amorton with your username.

Add entries to the /private/etc/hosts file:

127.0.0.1       d6
127.0.0.1       d7

Feb 01 2012
Feb 01

In this video we set up a LAMP stack web server for local development on an Ubuntu desktop (version 11.10). We will walk through installing and using tasksel to get the web server installed, then we'll add phpmyadmin to the mix. Once we have the server up and running, we'll look at where the web root is and how to add websites. Finally we wrap up with a quick look at how to start, stop, and restart our server when needed.

Jan 24 2012
Jan 24

This video shows you how to download and do the initial setup of MAMP, which is Apache, MySQL and PHP for Macintosh. It shows some basic configuration tweaks to change the port from 8888 to the default of 80 so that you can just visit the localhost in the browser and get your Drupal installation to appear. It also provides a general orientation to MAMP, and some other initial configuration setting changes.

Jan 24 2012
Jan 24

This video walks through the process of downloading and doing the initial configuration of WampServer, which is Apache, MySQL and PHP for Windows. It shows how to turn it on and off, as well as how to get files to show up from the localhost server.

Jan 09 2012
Jan 09

Drush of the matter

I'm a Front End Drupal Developer / Themer / Site Builder and admittedly have been slow in adopting tools like Drush and Git for my workflow. Not because I didn't want to but only that I am very right-brained, so the command line / Terminal to say the least has not been my friend. However, I know these tools increase productivity immensely and I have slowly been learning some skills here. Thanks to a few other Drupal Developers in the community helping me out over Skype and email, and reading the issue queues on drupal.org, I now at least have a working knowledge and have seen the light and advantages of using these tools. In particular, David Lanier for helping me see the light with Git on a project we worked on together and recently, Ben Buckman for showing me the power of Drush and Git. These two guys know their Drupal stuff. I'd like to think my days of FTPing a module update up to the server are in the past but this won't happen overnight as I need to retool a lot of my sites for this.

Workflow

My new workflow basically is to develop locally using MAMP Pro 2.0 for Mac OS X Lion. (I won't say much about Lion except for the fact that I don't love it. I think it's a brilliant OS but it seems to run slowly on my Mac Pro Quad Core with 12 GB of RAM, even with a clean install.) My local Drupal dev site is in Git and I commit and push any changes to GitHub. I then pull those changes to the production server directly from GitHub. I have even played around with creating branches, like a modules-update branch for example. I think I also want to have a branch for theming as well. The idea is that I pull a given updated branch to the server and, if all goes well, I merge and then pull the master branch.

The 'Drush' of the matter


Well, here's the crux of the matter. Recently I ran into an issue with a local Drupal 6 dev site when trying to use Drush. When I input a Drush command I would get some MySQL errors and then page.tpl.php would print out right in the OS X Terminal. I could not even run drush sql-connect successfully, which told me that Drush was not even seeing MySQL.

The error was:

  1. mysqli_query() expects parameter 1 to be mysqli, null given database.mysqli.inc:114 [warning]

  2. mysqli_errno() expects parameter 1 to be mysqli, null given database.mysqli.inc:128 [warning]

  3. (and then page.tpl.php proceeds to print out in terminal...)


It was the oddest thing. Searching for the error didn't really yield any good fix; I found others who were getting this error, but nothing concrete to go on, and the error was fairly vague. The other weird part was that Drush was working fine on my other local dev sites, just not this one. The problem site in question was using the Domain Access module, so I figured it was something with my local config, either in the hosts file or in MAMP Pro itself. I read endless posts on drupal.org of possible fixes for Drush not working, but still no joy. I even filed my own issue as well. I created a drushrc.php file in the webroot in /sites/all/drush/ and specified my local site: $options['l'] = 'mysite.drupal'; but that did not help to fix it either. It seemed like I spent days trying to get this to work, and finally I moved on.

The issue reveals itself

Yesterday, I installed a new Drupal 7 site I am working on for a client via Drush and all seemed to go well. However, after I started using Drush on this new site to download and install some third party modules, though it was more or less working, I was getting a little error. Part of the error was:

  1. No such file or directory

  2. (trying to connect via unix:///var/mysql/mysql.sock)

  3. [warning]


Hmm, interesting: Drush is looking for MySQL where it normally would be had I installed it natively, not where MAMP has it placed. I used my Google-fu and came up with a very interesting issue on drupal.org in regard to MAMP, and in particular MAMP v2, being used in combination with Drush. It turns out that if you are using MAMP you need to tell Drush to look for MySQL in the MAMP directory and not in /var/mysql/.

The fix

The fix was to simply open Terminal and execute these two commands:

  1. sudo mkdir /var/mysql

  2. sudo ln -s /Applications/MAMP/tmp/mysql/mysql.sock /var/mysql/mysql.sock


Basically this creates a symbolic link to point Drush to the proper location of MySQL, which in my case was in MAMP. Not only did this fix the new Drupal 7 Drush error, but it also fixed that old Drupal 6 Domain Access site's Drush error. The part that baffles me is why I didn't get this same error on said Drupal 6 site; if I had, I probably would have fixed this much sooner.

Moral of the story?

Ultimately I found there was a patch made to the Drush README documentation file. Had I thoroughly read the documentation from the beginning, I would not have had to go through much of this. On the other hand, my right-brained creative side often prevents me from doing so, and that's both painful and exhilarating at the same time. Ask my wife, Elise; she will be the first one to say that I never read the manual for anything.

etc...

Next up, learn how to use the Features Module and make this web site a responsive design but hopefully with less hair pulling as I don't have much left...

Resources for this post

Dec 22 2011
Dec 22

I see a lot of people coming by #centos and similar channels asking for help when they’re experiencing a problem with their Linux system. It amazes me how many people describe their problem, and then say something along the lines of, “and I disabled SELinux...”. Most of the time SELinux has nothing to do with the problem, and if SELinux is the cause of the problem, why would you throw out the extra security by disabling it completely rather than configuring it to work with your application? This may have made sense in the Fedora 3 days when selinux settings and tools weren’t quite as fleshed out, but the tools and the default SELinux policy have come a long way since then, and it’s very worthwhile to spend a little time to understand how to configure SELinux instead of reflexively disabling it. In this post, I’m going to describe some useful tools for SELinux and walk through how to configure SELinux to work when setting up a Drupal web site using a local memcached server and a remote MySQL database server -- a pretty common setup for sites which receive a fair amount of traffic.

This is by no means a comprehensive guide to SELinux; there are many of those already!
http://wiki.centos.org/HowTos/SELinux
http://fedoraproject.org/wiki/SELinux/Understanding
http://fedoraproject.org/wiki/SELinux/Troubleshooting

Too Long; Didn’t Read Version

If you’re in a hurry to figure out how to configure SELinux for this particular type of setup, on CentOS 6, you should be able to use the following two commands to get things working with SELinux:
# setsebool -P httpd_can_network_connect_db 1
# setsebool -P httpd_can_network_memcache 1

Note that if you have files existing somewhere on your server and you move them to the webroot rather than untar them there directly, you may end up with SELinux file contexts set incorrectly on them which will likely deny access to apache to read those files. If you are having a related problem, you’ll see something like this in your /var/log/audit/audit.log:
type=AVC msg=audit(1324359816.779:66): avc: denied { getattr } for pid=3872 comm="httpd" path="/var/www/html/index.php" dev=dm-0 ino=549169 scontext=root:system_r:httpd_t:s0 tcontext=root:object_r:user_home_t:s0 tclass=file

You can solve this by resetting the webroot to its default file context using the restorecon command:
# restorecon -rv /var/www/html

Server Overview

I’m going to start with a CentOS 6 system configured with SELinux in targeted mode, which is the default configuration. I’m going to be using httpd, memcached, and PHP from the CentOS base repos, though the configuration wouldn’t change if you were to use the IUS PHP packages. MySQL will be running on a remote server which gives improved performance, but means a bit of additional SELinux configuration to allow httpd to talk to a remote MySQL server. I’ll be using Drupal 7 in this example, though this should apply to Drupal 6 as well without any changes.

Initial Setup

Here we will set up some prerequisites for the website. If you already have a website set up, you can skip this section.

We will be using tools such as audit2allow which is part of the policycoreutils-python package. I believe this is typically installed by default, but if you did a minimal install you may not have it.
# yum install policycoreutils-python

Install the needed apache httpd, php, and memcached packages:
# yum install php php-pecl-apc php-mbstring php-mysql php-pecl-memcache php-gd php-xml httpd memcached

Start up memcached. The CentOS 6 default configuration for memcached only listens on 127.0.0.1, which is great for our testing purposes. The default of 64M of RAM may not be enough for a production server, but for this test it will be plenty. We’ll just start up the service without changing any configuration values:
# service memcached start

Start up httpd. You may have already configured Apache for your needs; if not, the default config should be enough for the site we’ll be testing.
# service httpd start

If you are using a firewall, then you need to allow at least port 80 through so that you can access the website -- I won’t get into that configuration here.

Install Drupal. I’ll be using the latest Drupal 7 version (7.9 as of this writing). Direct link: http://ftp.drupal.org/files/projects/drupal-7.9.tar.gz
Download the tarball, and expand it to the apache web root. I also use the --strip-components=1 argument to strip off the top level directory, otherwise it would expand into /var/www/html/drupal-7.9/
# tar zxf drupal-7.9.tar.gz -C /var/www/html --strip-components=1

Also, we need to get the Drupal site ready for install by creating a settings.php file writable by apache and a default files directory which apache can write to.
# cd /var/www/html/sites/default/
# cp default.settings.php settings.php
# chgrp apache settings.php && chmod 660 settings.php
# install -d -m 775 -g apache files

Set up a database and a database user for Drupal on your MySQL server. That would look something like this:
mysql> CREATE DATABASE drupal;
mysql> GRANT ALL ON drupal.* TO [email protected] IDENTIFIED BY 'somepassword';

Test this out by using the mysql command line tool on the web host.
# mysql -u drupal_rw -p -h drupal

That should connect you to the remote MySQL server. Be sure that is working before you proceed.

Now for the Fun Stuff

If you visit your new Drupal site at http://your-hostname-here, you’ll be presented with the Drupal installation page. Click ahead a few times, setup your DB info on the Database Configuration page -- you need to expand “Advanced Options” to get to the hostname field since it assumes localhost. When you click the button to proceed, you’ll probably get an unexpected error that it can’t connect to your database -- this is SELinux doing its best to protect you!

Allowing httpd to Connect to a Remote Database

So what just happened? We know the database was set up properly to allow access from the remote web host, but Drupal is complaining that it can’t connect. First, you can look in /var/log/audit/audit.log, which is where SELinux will log access denials. If you grep for ‘httpd’ in the log, you’ll see something like the following:
# grep httpd /var/log/audit/audit.log
type=AVC msg=audit(1322708342.967:16804): avc: denied { name_connect } for pid=2724 comm="httpd" dest=3306 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket

That is telling you, in SELinux gibberish, that the httpd process was denied access to connect to a remote MySQL port. For a better explanation of the denial and some potential fixes, we can use the ‘audit2why’ utility:
# grep httpd /var/log/audit/audit.log | audit2why
type=AVC msg=audit(1322708342.967:16804): avc: denied { name_connect } for pid=2724 comm="httpd" dest=3306 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:mysqld_port_t:s0 tclass=tcp_socket

Was caused by:
One of the following booleans was set incorrectly.
Description:
Allow HTTPD scripts and modules to connect to the network using TCP.

Allow access by executing:
# setsebool -P httpd_can_network_connect 1
Description:
Allow HTTPD scripts and modules to connect to databases over the network.

Allow access by executing:
# setsebool -P httpd_can_network_connect_db 1

audit2why will analyze the denial message you give it and potentially explain ways to correct it if it is something you would like to allow. In this case, there are two built in SELinux boolean settings that could be enabled for this to work. One of them, httpd_can_network_connect, will allow httpd to connect to anything on the network. This might be useful in some cases, but is not very specific. The better option in this case is to enable httpd_can_network_connect_db which limits httpd generated network connections to only database traffic. Run the following command to enable that setting:
# setsebool -P httpd_can_network_connect_db 1

It will take a few seconds and not output anything. Once that completes, go back to the Drupal install page, verify the database connection info, and click on the button to continue. Now it should connect to the database successfully and proceed through the installation. Once it finishes, you can disable apache write access to the settings.php file:
# chmod 640 /var/www/html/sites/default/settings.php

Then fill out the rest of the information to complete the installation.

Allowing httpd to connect to a memcached server

Now we want to setup Drupal to use memcached instead of storing cache information in MySQL. You’ll need to download and install the Drupal memcache module available here: http://drupal.org/project/memcache
Install that into your Drupal installation, and add the appropriate entries into settings.php. For this site, I did that with the following:
# mkdir /var/www/html/sites/default/modules
# tar zxf memcache-7.x-1.0-rc2.tar.gz -C /var/www/html/sites/default/modules

Then edit settings.php and add the following two lines:
$conf['cache_backends'][] = 'sites/default/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
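The module defaults to a memcached instance on 127.0.0.1:11211, which matches the memcached we started earlier, so nothing more is strictly needed. If your memcached ever runs elsewhere, the module also lets you list servers explicitly in settings.php; a sketch (the host and port here are just placeholders):

$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');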

Now if you reload your site in your web browser, you’ll likely see a bunch of memcache errors -- just what you wanted! I bet it’s SELinux at it again! Check out /var/log/audit/audit.log again and you’ll see something like:
type=AVC msg=audit(1322710172.987:16882): avc: denied { name_connect } for pid=2721 comm="httpd" dest=11211 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:memcache_port_t:s0 tclass=tcp_socket

That’s very similar to the last message, but this one is for a memcache port. What does audit2why have to say?
# grep -m 1 memcache /var/log/audit/audit.log | audit2why
type=AVC msg=audit(1322710172.796:16830): avc: denied { name_connect } for pid=2721 comm="httpd" dest=11211 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:memcache_port_t:s0 tclass=tcp_socket

Was caused by:
One of the following booleans was set incorrectly.
Description:
Allow httpd to act as a relay

Allow access by executing:
# setsebool -P httpd_can_network_relay 1
Description:
Allow httpd to connect to memcache server

Allow access by executing:
# setsebool -P httpd_can_network_memcache 1
Description:
Allow HTTPD scripts and modules to connect to the network using TCP.

Allow access by executing:
# setsebool -P httpd_can_network_connect 1

Again, audit2why gives us a number of options to fix this. The best bet is to go with the smallest and most precise change for our needs. In this case there's another perfect fit: httpd_can_network_memcache. Enable that boolean with the following command:
# setsebool -P httpd_can_network_memcache 1

Success! Now httpd can talk to memcache. Reload your site a couple of times and you should no longer see any memcache errors. You can be sure that Drupal is caching in memcache by connecting to the memcache CLI (telnet localhost 11211) and typing ‘stats’. You should see some number greater than 0 for ‘get_hits’ and for ‘bytes’.
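For reference, that check looks roughly like this (the numbers below are made up; yours just need to be greater than zero):

# telnet localhost 11211
stats
...
STAT get_hits 1482
STAT bytes 215736
...
quit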

What are all these booleans anyway?

Now we’ve used a couple SELinux booleans to allow httpd to connect to memcached and MySQL. You can see a full list of booleans which you can control by using the command ‘getsebool -a’. They are basically a preset way for you to allow/deny certain pre-defined access controls.

Restoring default file contexts

As I mentioned briefly in the ‘TL;DR’ section, another common problem people experience is with file contexts. If you follow my instructions exactly, you won’t have this problem because we untar the Drupal files directly into the webroot, so they will inherit the default file context for /var/www/html. If, however, you were to untar the files in your home directory, and then use ‘mv’ or ‘cp’ to place them in /var/www/html, they will maintain the user_home_t context which apache won’t be able to read by default. If this is happening to you, you will see the file denials logged in /var/log/audit/audit.log -- something like this:
type=AVC msg=audit(1324359816.779:66): avc: denied { getattr } for pid=3872 comm="httpd" path="/var/www/html/index.php" dev=dm-0 ino=549169 scontext=root:system_r:httpd_t:s0 tcontext=root:object_r:user_home_t:s0 tclass=file

The solution in this case is to use restorecon to reset the file contexts back to normal:
# restorecon -rv /var/www/html

Update: It was noted that I should also mention another tool for debugging audit messages, 'sealert'. This is provided in the setroubleshoot-server package and will also read in the audit log, similar to what I described with audit2why.
# sealert -a /var/log/audit/audit.log

Dec 09 2011
Dec 09

Drupal can power any site from the lowliest blog to the highest-traffic corporate dot-com. Come learn about the high end of the spectrum with this comparison of techniques for scaling your site to hundreds of thousands or millions of page views an hour. This Do it with Drupal session with Nate Haug will cover software that you need to make Drupal run at its best, as well as software that acts as a front-end cache (a.k.a. reverse-proxy cache) that you can put in front of your site to offload the majority of the processing work. This talk will cover the following software and architectural concepts:

  • Configuring Apache and PHP
  • MySQL Configuration (with Master/Slave setups)
  • Using Memcache to reduce database load and speed up the site
  • Using Varnish to serve up anonymous content lightning fast
  • Hardware overview for high-availability setups
  • Considering nginx (instead of Apache) for high amounts of authenticated traffic
Aug 04 2011
Aug 04

So the scenario is this: you notice that you are no longer able to login on your Drupal site:

  • the {users} table entry for your account looks just fine
  • the login and access timestamps on your account are just a few seconds old, matching your latest attempt to login
  • you reset your password in the DB, just in case, and it still does not work
  • the telltale is that Drupal will not even tell you your login failed: it actually works, as the {users} table shows, and yet you are not logged in

Can you guess what has been going wrong, and how to fix it?

Actually, in such cases, and at least on most sites without high-end technologies like Memcached or MongoDB sessions, there is one other table you should be looking at, and that is the {sessions} table. If, like most Drupal sites, you are running on MySQL, chances are that your {sessions} table has crashed. This is most common in Drupal versions up to 6, which use the MyISAM engine by default, typically unchanged on smaller sites, whereas Drupal 7 defaults to InnoDB, and larger sites will also have installed Drupal 6 on InnoDB too.

What is happening in such a case is that Drupal actually updates your login entry into the {users} table, but then when the page ends, PHP tries to invoke the Drupal DB-based session handler to write back information about your session into the {sessions} table, and fails because the table is crashed. If you look at your web server logs, you will then notice tons of messages like this (typical Apache: /var/log/apache2/error.log):

[Thu Aug 04 09:34:27 2011] [error] [client (IP masked)] PHP Warning:  Table './(site masked)/sessions' is marked as crashed and last (automatic?) repair failed\nquery: UPDATE sessions SET uid = 0, cache = 0, hostname = '(IP masked)', session = 'language|s:2:\\"en\\";', timestamp = 1312443267 WHERE sid = '1c3a573a10ff949d31c193c42365c7d7' in /(path masked)/includes/database.mysql.inc on line 174, referer: http://(site masked)/user

These happen when Drupal tries to update a session entry. And then there are tons of similar messages on SELECT queries instead of UPDATE, for Drupal's attempts at locating an existing session for the user.
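If you want to confirm the diagnosis before repairing anything, you can also ask MySQL directly (add your table prefix if you use one):

mysql> CHECK TABLE sessions;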

The solution is then obvious: ssh to your server and run something like:

mysqlcheck -rev (your database) sessions

Do not forget to add the DB prefix if you are using one, like this:

mysqlcheck -rev mysingledb site1_sessions

This should restore your table to sanity. If you do not have access to a command line, as can often happen with smaller sites on shared hosting, phpMyAdmin includes an option to "repair" a table, which does essentially the same thing.

One point to notice about this is that it will lock the table during the entire repair process, so your site will be frozen in the meantime.

Afterwards, you might want to consider taking a backup of your site, then converting it to InnoDB. Do not forget that this means a different configuration in /etc/mysql/my.cnf to attain good performance: you will likely want to reduce the memory space allocated to MyISAM and increase the InnoDB buffer pool, for a start; then follow one of the InnoDB optimization tutorials available everywhere.
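The conversion itself is just a per-table ALTER, run for {sessions} and any other MyISAM tables; note that it locks and rebuilds the table, so do it during a quiet period:

ALTER TABLE sessions ENGINE=InnoDB;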

Oh, and if you are still running a Drupal 5 site, it is high time to upgrade: Drupal 5 has been out of support for over six months now! Time to rebuild on Drupal 7!

Jan 06 2011
Jan 06

Congratulations to the Drupal community for getting version 7.0 released! This is a major milestone and an excellent reason to celebrate!

If you want to give Drupal 7 a try without having to install anything, I've now updated my Drupal 7 appliances on SuSE Studio to the latest release. The appliance is based on openSUSE Linux 11.3 and is available in two variants:

  • A text-mode only appliance to which you connect using your local web browser via the network.
  • A GUI version that starts up the Firefox browser in a minimalistic GNOME desktop to perform the installation locally. Remote network access is available, too.
The database backend is MySQL 5.1, with the InnoDB plugin and strict mode enabled. phpMyAdmin has been added to support web-based administration of the MySQL server. You can access it via http://localhost/phpMyAdmin/. I also added drush, the Drupal command line shell and scripting interface and some additional packages (yast2-http-server, bind-utils, php5-ctype, patch). I also bumped up the appliance's version number to 7.0.0, to match the version number of Drupal included.

The appliance is available in various formats:

  • A live raw disk image, ready to be written to an USB stick or flash drive
  • A live ISO image, to be burned on a CD or used in a virtual machine
  • A hard disk image, to be dumped on a hard disk drive
  • Various virtual disk formats, e.g. OVF, VMWare/VirtualBox/KVM and Xen
Please see the installation instructions provided on the download pages for details on how to use the various image formats.

So congratulations to the Drupal developer community for reaching this goal and thanks to SuSE/Novell/Attachmate for providing the infrastructure for creating such appliances. I also would like to especially thank Richard Bos for the testing and many suggestions for improvement of these appliances!

Jan 04 2011
Jan 04

When doing development work, from time to time it is handy to be able to look up documentation. Bookmarking manuals helps, but often you still need to search for the function you're after. Firefox, and possibly other browsers (not Chrome or Chromium), allows you to set up a keyword bookmark linked to a search.

I've setup a few search bookmarks for development resources. This is how I've done it:

  1. Select Bookmarks > Organise Bookmarks... from the menu.
  2. Right click on the bookmarks menu (or any folder) on the left pane
  3. Select New Bookmark... from the context menu
  4. Fill in the information for the bookmark, as in the example after this list; the important piece is the keyword, which is what allows us to search.
  5. Click save and return to the browser
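For instance, my Drupal 7 API bookmark looks roughly like this; the exact search URL is my assumption of what api.drupal.org accepts, and %s is the placeholder Firefox replaces with whatever you type after the keyword:

Name:     Drupal 7 API
Location: http://api.drupal.org/api/search/7/%s
Keyword:  dapi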

Now when we want to search the Drupal 7 API, we can just type "dapi" followed by our search term into the location bar:

Example Drupal API search in location bar

Now we should see the appropriate page from the Drupal API documentation.

Example Drupal API page

The same method can be used for other PHP web app developer resources; here are some which I'm using.

  • I've found Google to be the best resource for getting help with JavaScript

I could have implemented this using OpenSearch plugins, but that requires changing the search provider every time I want to look for something. By using keyword bookmarks I just type the keyword and the search term into the location bar.

Feel free to share your keyword bookmarks in the comments.

Dec 29 2010
Dec 29

For some time now I've been working on a Drupal site that consists mainly of scraped content from a proprietary, ASP based CMS from the late nineties. The Simple HTML Dom Parser, used from within a drush script, has been invaluable. It made scraping the old site content and importing it as Drupal nodes a relative breeze. (No access to the database used by the CMS, boo!)

Part of setting up the new site is importing users and their content profile nodes from a different Drupal site, that was setup a year or two ago to manage an event.

I had hoped there would be a way for me to export these users and their profile nodes from one Drupal to the other, but though I found modules to export one or the other, I might still end up with profile nodes that were no longer related to their users. Of course, that's pretty useless.

When I remembered I was also supposed to add all these users to a simplenews newsletter, the proverbial light bulb came on.

A spot of theory

I have previously done some bulk simplenews subscriptions by selecting Drupal data from one database table into another. In theory, that works fine across different databases as well. By inserting selected data from one Drupal's users table into another (avoiding uid clashes) I should be able to copy all users, keeping a predictable uid, so I could then do the same for their profile nodes.

First off, export the profile content type to the new Drupal site using the CCK content export module.

Then grab the schema of all the tables I would need to transpose data to. These should be the same as the tables the data will come from. In my case the tables were: users, content_type_profile, node and node_revisions. The easiest way to grab the schema is via the mysql command line tool, using this query:

MySQL [source]> SHOW CREATE TABLE users\G

The \G means the data is shown in a single column, not in a column per field, which wouldn't fit across the terminal. Just repeat this for the other tables that need to be transposed and keep the output in a text editor window or something. Basically, this is just so you can easily copy and paste the field names.

At the bottom of each schema, you'll see a line like this one:

) ENGINE=InnoDB AUTO_INCREMENT=6221 DEFAULT CHARSET=utf8

The one from the node_revisions table is the important one. The AUTO_INCREMENT value tells us the next id that will be assigned on the primary auto_increment field in that table. In this case that is the vid field. It's important, because in all likelihood there is already a vid in this table for each value below the AUTO_INCREMENT value.

If you simply copy the node data across, chances are some node.nid or node_revisions.vid will clash, causing the copy to break and leaving you with a cleanup to do. They do need to be included in the copy, though, because they are the way the user, her profile node and its CCK fields are linked together. If I simply copied all the other table columns and had the database assign new uids, nids and vids, I'd have a mess of information without its relational links intact.

My table has at most 6220 node revisions, so if I start with 10000 as the nid and vid for the copied nodes, there will never be a clash. Because I already had some users on the new Drupal, I decided to restart all copied user uid from 10000 as well. The really easy way to do this is by adding 10000 to the uid, nid or vid field.

At the end of the day the user.uid, node.nid and node_revisions.vid are just automatically assigned numbers, so if a chunk of them between 6220 and 10000 are not used, that hardly matters.

Run some queries

Time to migrate some data. In the following examples I execute all queries on the target database. All source data is coming from the Drupal tables in the oldevent database. These queries all need to be run by a MySQL user with INSERT privileges on the target Drupal database and at least SELECT privileges on the source Drupal database.

Note: for each of the queries to be run, it is important that you copy and paste the field names in the SELECT sub-query in the correct order.

First, copy the users and give them a predictable, non-clashing new user.uid. I exclude user id 1, as it already exists on the new Drupal site. I knew the other users' names would not clash, so I did not need to limit the query any further.

MySQL [source]> INSERT INTO users (SELECT uid+10000, name, pass, mail, mode, sort, threshold, theme, signature, signature_format, created, access, login, status, timezone, language, picture, init, data, timezone_name FROM oldevent.users WHERE oldevent.users.uid > 1);

Now copy all profile node CCK fields, making sure the nid and vid will not clash.

MySQL [source]> INSERT INTO content_type_profile (SELECT vid+10000, nid+10000, field_profile_firstname_value, field_profile_surname_value, field_profile_position_value, field_profile_organisation_value, field_profile_phone_value FROM oldevent.content_type_profile);

Note that if you have multi-value fields in your users' profiles, you will also need to perform this for all content_field_profile_* tables.

Next, copy across all profile nodes:

MySQL [source]> INSERT INTO node (SELECT nid+10000, vid+10000, type, language, title, uid+10000, status, created, changed, comment, promote, moderate, sticky, tnid, translate FROM oldevent.node WHERE oldevent.node.type='profile');

And finally, copy all profile node revisions across as well:

MySQL [source]> INSERT INTO node_revisions (SELECT nid+10000, vid+10000, uid+10000, title, body, teaser, log, timestamp, format FROM oldevent.node_revisions WHERE oldevent.node_revisions.nid IN (SELECT nid FROM oldevent.node WHERE oldevent.node.type='profile'));

And that's it. I did not need to copy any of the users' roles across, so at this point I was done. I migrated just under 1500 users and all revisions of their profiles from one Drupal to another in approximately two minutes.

Inevitable caveat!

If your users are set up with profile pictures, you'll need to make sure the users.picture field points at the right directory under files on the new Drupal site. You can do a quick search & replace query to update the path using REPLACE().

MySQL [source]> UPDATE users SET picture=REPLACE(picture, 'old-sub-string', 'new-sub-string');

Note that the picture filename will not match the user's uid anymore. If you use a tool like mmv or a shell script to rename the picture files, be sure to also update the field in the database. (Hint: generate a new file path via CONCAT(), using the user.uid field; see the sketch below.)
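A sketch of that kind of query, assuming the default pictures directory and .jpg files (both assumptions; adjust the path and extension to your setup):

MySQL [source]> UPDATE users SET picture = CONCAT('sites/default/files/pictures/picture-', uid, '.jpg') WHERE uid > 10000;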

Dec 23 2010
Dec 23

Drush + Bash + Cron: Database Backup Goals

  • Scan the sites directory of a given Drupal install
  • Find all multisite folders/symlinks
  • For each multisite:
  • Use Drush to clear the cache - we don't want the cache tables bloating up the MySQL dump file
  • Use Drush to delete the watchdog logs - we don't want the watchdog table bloating up the MySQL dump file
  • Use Drush to back up the database to a pre-assigned folder
  • Use tar to compress and timestamp the Drush-generated SQL file
  • Set up crontab to run the above commands periodically as a bash file

Assumptions and Instructions

You will need to adjust the Bash file if any of these are not the same on your server

  • Drupal is installed in /var/www/html/drupal
  • Multisites are set up in the /var/www/html/drupal/sites folder
  • Backup folder exists in /var/www/backup/sqldumps
  • Drush is already installed in /root/drush/drush. If drush is not installed follow this Drush installation guide
  • AWK is already installed; if not, type: sudo yum install gawk

Drush Backup BASH file

Copy and paste the code below into a new bash file, ideally in your /root home folder, and make it executable (see the chmod command after the script).

#!/bin/bash
#
 
# Adjust to match your system settings
DRUSH=/root/drush/drush
ECHO=/bin/echo
FIND=/usr/bin/find
AWK=/bin/awk
 
# Adjust to match your system settings
docroot=/var/www/html/drupal
backup_dir=/var/www/backup/sqldumps
 
multisites=$1
 
START_TIME=$(date +%Y%m%d%H%M);
 
# Add all multisites for a given docroot into a list. Detects all web addresses that are a directory (or symlink) not named all or default and not ending in .local.
if [ "${multisites}" = "all" ];then
        # If multisites are folders change -type d
        # If multisites are symlinks change -type l
        # Adjust $8 to match your docroot; $8 needs to be the name of the multisite folder/symlink
        multisites_list="`$FIND ${docroot}/sites/* -type l -prune | $AWK -F \/ '$8!="all" && $8!="default" && $8!~/\.local$/ { print $8 }'`"
else
        multisites_list=$multisites
fi
 
 
# Must be in the docroot directory before proceeding.
cd $docroot
 
for multisite in $multisites_list
do
        # Echo to the screen the current task.
        $ECHO
        $ECHO "##############################################################"
        $ECHO "Backing up ${multisite}"
        $ECHO
        $ECHO
 
        # Clear Drupal cache
        $DRUSH -y -u 1 -l "${multisite}" cc all
 
        # Truncate Watchdog
        $DRUSH -y -u 1 -l "${multisite}" wd-del all
 
        # SQL Dump DB
        $DRUSH -u 1 -l "${multisite}" sql-dump --result-file="${backup_dir}"/"${multisite}".sql
 
        # Compress the SQL Dump
        tar -czv -f "${backup_dir}"/"${START_TIME}"-"${multisite}".tar.gz -C "${backup_dir}"/ "${multisite}".sql
 
        # Delete original SQL Dump
        rm -f "${backup_dir}"/"${multisite}".sql
 
        $ECHO
        $ECHO
        $ECHO "Finished backing up ${multisite}"
        $ECHO
        $ECHO "##############################################################"
 
done
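
Before wiring the script up to cron, make it executable (assuming you saved it as /root/drush_backup_db.sh):

chmod +x /root/drush_backup_db.sh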

Setup Crontab

Assuming the bash file containing the code above is saved as /root/drush_backup_db.sh, you can set up a crontab for the root user.

crontab -e
1 1 * * * /root/drush_backup_db.sh

Dec 03 2010
Dec 03

The station's website will be built using Drupal, an extremely powerful, open-source content management system written in PHP.

Drupal uses some PHP functions that require the installation of additional ports. You'll need:

  • devel/php5-pcre - Perl regular expressions.
  • textproc/php5-xml - XML parsing.
  • textproc/php5-simplexml - Simple XML.
  • databases/php5-mysqli - MySQL support for PHP.
  • www/php5-session - Session support.
  • ftp/php5-curl - cURL support.
  • graphics/php5-gd - Image handling. Optional, some modules need it.
  • converters/php5-mbstring - Unicode support. Optional, but Drupal prefers that it be installed.

Create the database

Create Drupal's database:

$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3 to server version: 5.1.11-beta

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE DATABASE drupal;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON drupal.* TO [email protected] IDENTIFIED BY 'a secret password';
Query OK, 0 rows affected (0.00 sec)

mysql> quit;
Bye

Download

Check out Drupal with Drush and give the web server permission to read it:

$ cd /usr/local/www
$ drush dl drupal-6.x --drupal-project-rename
$ chown -R root:www drupal

Now that we've created the webroot directory, we need to (re)start Apache:

$ /usr/local/etc/rc.d/apache22 restart
Performing sanity check on apache22 configuration:
Syntax OK
apache22 not running? (check /var/run/httpd.pid).
Performing sanity check on apache22 configuration:
Syntax OK
Starting apache22.

Create Drupal's files directory:

$ cd /usr/local/www/drupal/sites/default
$ mkdir files
$ chmod g+w files/

Since Drupal 6 does the rest of the setup via the web, we need to create a configuration file and temporarily allow the web user to modify it:

$ cp default.settings.php settings.php
$ chmod g+w settings.php

Setup

Open up your browser and point it to your webserver. You should be presented with a wizard that will walk you through the setup. One step will ask for your database name, user and password:

You will also be prompted for credentials for an admin user.

Once the installation is complete, remove the write access from Drupal's configuration file with the following command:

$ chmod g-w /usr/local/www/drupal/sites/default/settings.php

Cron jobs

Drupal has different tasks that need to be run periodically. To do so, it provides a script named cron.php that is run via the web server. We can use FreeBSD's cron program to automate the process of running the script.

Add a cron task to the web server's user account:

$ crontab -e -u www

And add the following line to the file to run the script once an hour. Remember to change the URL:
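A minimal sketch of such an entry, assuming the site is reachable at http://example.com/ and using FreeBSD's fetch to request the page quietly once an hour:

0 * * * * /usr/bin/fetch -q -o /dev/null http://example.com/cron.php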
Nov 01 2010
Nov 01

FOSDEM banner

It's that time of the year again — the nice folks at FOSDEM have granted us a developer room at their upcoming conference (February 5+6 2011 in Brussels, Belgium)!

As usual there were more applications than they were able to accommodate, so we are very grateful for this opportunity for collaboration. Titled "MySQL and Friends", our room next year will be H.2213 with a capacity of 100 seats. It will be at our disposal on Saturday 5th, from 13:00 till 19:00. Like last year, we would like to set up a schedule of talks related to the MySQL server and the various projects that surround it. Each talk will last 20 minutes, plus 5 minutes of Q&A and a 5 minute break for switching speakers, giving us 12 slots in total to fill with excellent tech talks. Take a look at this year's schedule for some examples! I hope we can assemble an even more exciting and interesting schedule for next year.

Quoting from my last year's call for papers:

We are looking for covering a wide range of topics that attract both MySQL DBAs as well as application developers that work with MySQL as their database of choice. Are you developing a new storage engine or other plugin? Do you want to share your experiences and best practices in administering or deploying MySQL servers? Did you develop a new method to scale a MySQL setup? Let us and the audience know about it! You can submit your talk proposal via this submission form.

The deadline for turning in your proposal is Sunday, 26th of December, 2010, after which there will be a voting and rating period to identify the most interesting and attractive topics.

Please check the FOSDEM 2011 information page on the MySQL Forge Wiki for more details and don't hesitate to contact me directly, if you have any questions or suggestions. I look forward to your proposals!

Oct 25 2010
Oct 25

Over the weekend I updated my Drupal 7 test appliance in SUSE Studio to the Drupal 7.0-beta2 release, which was released on Oct. 23rd. I also added phpMyAdmin upon a user request, to provide a web-based method to work with the MySQL instance, if needed.

In addition to the lightweight "headless" appliance (which can only be accessed and configured via a remote network connection), I've now also created a GUI-based version. This appliance starts a minimal GNOME desktop and a Mozilla Firefox browser, which in turn opens the Drupal installation page by default. I hope you will find this useful if you want to toy around and test Drupal 7 without having to go through the entire OS and LAMP stack configuration yourself. In fact, you can even test this appliance via the recently added test drive option from right out of your web browser!

The appliance is now also available in OVF format. SuSE Studio now also builds Amazon EC2 images, which don't seem to be available for download from the SUSE Gallery yet. I assume this is a recent addition to the continuously improving SUSE Studio functionality, hopefully these images will be made available soon.

Oct 15 2010
Oct 15

Apple OS X comes with Apache and PHP built in, but they need some tweaking to work. It also does not come with MySQL. Because of this, many developers have chosen to use MacPorts, Homebrew, or MAMP to install new binaries for Apache, PHP, and MySQL. However, doing this means your system would have multiple copies of Apache and PHP on your machine, and could create conflicts depending on how your built-in tools are configured. This tutorial will show you how to get the built-in versions of Apache and PHP running with an easy-to-install version of MySQL.

Additionally, it will show how to set up and install Drush, and phpMyAdmin. Setting up a new website is as easy as editing a single text file and adding a line to your /etc/hosts file. All of our changes will avoid editing default configuration files, either those provided by Apple in OS X or from downloaded packages, so that future updates will not break our customizations.

Apache

  1. Apple uses what appears to be the FreeBSD "version" of Apache 2.2, and includes various sample files. One of the sample files is a virtual hosts file that we will copy to our ~/Sites folder for easy editing. Launch the terminal and get started:

    $ cp /etc/apache2/extra/httpd-vhosts.conf ~/Sites

  2. In order to prevent future major OS updates from breaking our configuration, we'll be editing as few existing files as possible. Luckily, by default, OS X's httpd.conf file (the main configuration file for Apache) includes all *.conf files in another folder, /etc/apache2/other:

    $ tail -1 /etc/apache2/httpd.conf
    Include /private/etc/apache2/other/*.conf

  3. We then create a symbolic link in /etc/apache2/other to our newly-copied httpd-vhosts.conf from our ~/Sites folder:

    $ sudo ln -s ~/Sites/httpd-vhosts.conf /etc/apache2/other

  4. The ~/Sites/httpd-vhosts.conf is now where we will add all new virtual hosts. We will place our site root folders in ~/Sites and define them in this file. But first, we need to make some changes (or, download this edited httpd-vhosts.conf file and change fname in /Users/fname to the appropriate path for your system). Begin by adding this text to the top of the file, which will enable the ~/Sites folder and subfolders to be accessed by Apache (again, replace /Users/fname with your appropriate path for home directory):

    # Ensure all of the VirtualHost directories are listed below
    <Directory "/Users/fname/Sites/">
        Options Indexes FollowSymLinks MultiViews
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>

  5. Add these lines anywhere in the file to ensure that http://localhost still serves the default folder:

    # For localhost in the OS X default location
    <VirtualHost *:80>
        ServerName localhost
        DocumentRoot /Library/WebServer/Documents
    </VirtualHost>

  6. We will cover setting up a Virtual Host later, so let's reload our Apache settings to activate our new configuration:

    $ sudo apachectl graceful

    Or, go to the Sharing Preference Pane and uncheck/[re]check the Web Sharing item to stop and start Apache.

PHP

  1. As of OS X 10.6.6, OS X ships with PHP 5.3.3, but it is not enabled in Apache by default. By looking at /etc/apache2/httpd.conf, you can see that the line to load in the PHP module is commented out:

    $ grep php /etc/apache2/httpd.conf
    #LoadModule php5_module libexec/apache2/libphp5.so

  2. Since we are trying to avoid editing default configuration files where possible, we can add this information, along with the configuration to allow *.php files to run, without the comment at the beginning of the line, in the same folder that we put the symlink for httpd-vhosts.conf:

    $ sudo sh -c "cat > /etc/apache2/other/php5-loadmodule.conf <<'EOF'
    LoadModule php5_module libexec/apache2/libphp5.so
    EOF"

  3. PHP on OS X does not come with a configuration file at php.ini and PHP runs on its defaults. There is, however, a sample file ready to be copied and edited. The lines below will copy the sample file to /etc/php.ini (optionally, download the edited file and insert at /etc/php.ini) and add some developer-friendly changes to your configuration.

    $ sudo cp -a /etc/php.ini.default /etc/php.ini

    $ sudo sh -c "cat >> /etc/php.ini <<'EOF'

    ;;
    ;; User customizations below
    ;;

    ; Original - memory_limit = 128M
    memory_limit = 196M
    ; Original - post_max_size = 8M
    post_max_size = 200M
    ; Original - upload_max_filesize = 2M
    upload_max_filesize = 100M
    ; Original - default_socket_timeout = 60
    default_socket_timeout = 600
    ; Original - max_execution_time = 30
    max_execution_time = 300
    ; Original - max_input_time = 60
    max_input_time = 600
    ; Original - log_errors = off
    log_errors = on
    ; Original - display_errors = Off
    display_errors = on
    ; Original - display_startup_errors = Off
    display_startup_errors = on
    ; Original - error_reporting = E_ALL & ~E_DEPRECATED
    error_reporting = E_ALL & ~E_DEPRECATED & ~E_NOTICE
    EOF"

  4. OS X's PHP does ship with pear and pecl, but they are not in the $PATH, nor are there binaries to add or symlink to your path. The first two commands below perform a channel-update for pear and again for pecl, the next two add an alias for each for your current session, and the last one adds the aliases permanently to your local environment (you may get PHP Notices that appear to be errors, but they are simply notices; you can change error_reporting in php.ini to silence them):

    $ sudo php /usr/lib/php/pearcmd.php channel-update pear.php.net
    $ sudo php /usr/lib/php/peclcmd.php channel-update pecl.php.net
    $ alias pear="php /usr/lib/php/pearcmd.php"
    $ alias pecl="php /usr/lib/php/peclcmd.php"
    $ cat >> ~/.bashrc <<'EOF'

    alias pear="php /usr/lib/php/pearcmd.php"
    alias pecl="php /usr/lib/php/peclcmd.php"
    EOF

  5. We do a lot of Drupal development, and a commonly-added PHP PECL module for Drupal is uploadprogress. With the new alias we just set, adding it is easy. Build the PECL package, add it to php.ini, and reload the PHP configuration into Apache:

    Note: PECL modules require a compiler such as gcc to build. Install Xcode to build PECL packages such as uploadprogress.

    $ pecl install uploadprogress


    $ sudo sh -c "cat >> /etc/php.ini <<'EOF'

    ; Enable PECL extension uploadprogress
    extension=uploadprogress.so
    EOF"


    $ sudo apachectl graceful

MySQL

MySQL is the only thing needed to run a "LAMP" website like Drupal or WordPress that does not ship with OS X. MySQL/Sun/Oracle provides pre-built MySQL binaries for OS X that don't conflict with other system packages, and even provides a nice System Preferences pane for starting/stopping the process and a checkbox to start MySQL at boot.

  1. Go to the download site for OS X pre-built binaries at: http://dev.mysql.com/downloads/mysql/index.html#macosx-dmg and choose the most appropriate type for your system. For systems running Snow Leopard, choose Mac OS X ver. 10.6 (x86, 64-bit), DMG Archive (on the following screen, you can scroll down for the download links to avoid having to create a user and password).

  2. Open the downloaded disk image and begin by installing the mysql-5.1.51-osx10.6-x86_64.pkg file (or named similarly to the version number you downloaded):

  3. Once the Installer script is completed, install the Preference Pane by double-clicking on MySQL.prefPane. If you are not sure where to install the preference pane, choose "Install for this user only":


  4. The preference pane is now installed and you can check the box to have MySQL start on boot or start and stop it on demand with this button. To access this screen again, load System Preferences. Click "Start MySQL Server" now:

  5. The MySQL installer installed its files to /usr/local/mysql, and the binaries are in /usr/local/mysql/bin which is not in the $PATH of any user by default. Rather than edit the $PATH shell variable, we add symbolic links in /usr/local/bin (a location already in $PATH) to a few MySQL binaries. Add more or omit as desired. This is an optional step, but will make tasks like using Drush or performing a mysqldump in the command line much easier:

    $ cd /usr/local/bin
    $ ln -s /usr/local/mysql/bin/mysql
    $ ln -s /usr/local/mysql/bin/mysqladmin
    $ ln -s /usr/local/mysql/bin/mysqldump
    $ ln -s /usr/local/mysql/support-files/mysql.server

  6. Next we'll run mysql_secure_installation to set the root user's password and other setup-related tasks. This script ships with MySQL and only needs to be run once, and it should be run with sudo. As you run it, you can accept all other defaults after you set the root user's password:

    $ sudo /usr/local/mysql/bin/mysql_secure_installation

  7. If you were to create a simple PHP file containing "phpinfo();" to view the PHP information, you would see that the Apple-built PHP looks for the MySQL socket at /var/mysql/mysql.sock:

    The problem is that the MySQL binary provided by Oracle puts the socket file in /tmp, which is the standard location on most systems. Other tutorials have recommended rebuilding PHP from source, or changing where PHP looks for the socket file via /etc/php.ini, or changing where MySQL places it when starting via /etc/my.cnf, but it is far easier and more fool-proof to create a symbolic link for /tmp/mysql.sock in the location OS X's PHP is looking for it. Since this keeps us from changing defaults, it ensures that a PHP update (via Apple) or a MySQL update (via Oracle) will not have an impact on functionality:

    $ sudo mkdir /var/mysql
    $ sudo chown _mysql:wheel /var/mysql
    $ sudo ln -s /tmp/mysql.sock /var/mysql

  8. After installing MySQL, several sample my.cnf files are created but none is placed in /etc/my.cnf, meaning that MySQL will always load in the default configuration. Since we will be using MySQL for local development, we will start with a "small" configuration file and make a few changes to increase the max_allowed_packet variable under [mysqld] and [mysqldump] (optionally, use the contents of this sample file in /etc/my.cnf). Later on, we can edit /etc/my.cnf to make other changes as desired:

    $ sudo cp /usr/local/mysql/support-files/my-small.cnf /etc/my.cnf
    $ sudo sed -i "" 's/max_allowed_packet = 16M/max_allowed_packet = 2G/g' /etc/my.cnf
    $ sudo sed -i "" "/\[mysqld\]/ a\\`echo -e '\n\r'`max_allowed_packet = 2G`echo -e '\n\r'`" /etc/my.cnf

  9. To load in these changes, go back to the MySQL System Preferences pane, and restart the server by pressing "Stop MySQL Server" followed by "Start MySQL Server." Or, you can do this on the command line (if you added this symlink as shown above):

    $ sudo mysql.server restart

    Shutting down MySQL
    ... SUCCESS!
    Starting MySQL
    ... SUCCESS!

phpMyAdmin

At this point, the full "MAMP stack" (Macintosh OS X, Apache, MySQL, PHP) is ready, but adding phpMyAdmin will make administering your MySQL install easier.

  1. phpMyAdmin will complain that the PHP extension 'php-mcrypt' is not available. Since you will not likely be allowing anyone other than yourself to use phpMyAdmin, you could ignore it, but if you want to build the extension, it is possible to build it as a shared object instead of having to rebuild PHP from source. Follow Michael Gracie's guide here (for his Step 2, grab the PHP source that matches "$ php --version" on your system): http://michaelgracie.com/2009/09/23/plugging-mcrypt-into-php-on-mac-os-x-snow-leopard-10.6.1/ (or, if you prefer Homebrew: http://blog.rogeriopvl.com/archives/php-mcrypt-in-snow-leopard-with-homebrew/). The following files will be added:
    • /usr/local/lib/libmcrypt
    • /usr/local/include/mcrypt.h
    • /usr/local/include/mutils/mcrypt.h
    • /usr/local/bin/libmcrypt-config
    • /usr/local/lib/libmcrypt.la
    • /usr/local/lib/libmcrypt.4.4.8.dylib
    • /usr/local/lib/libmcrypt.4.dylib
    • /usr/local/lib/libmcrypt.dylib
    • /usr/local/share/aclocal/libmcrypt.m4
    • /usr/local/man/man3/mcrypt.3
    • /usr/lib/php/extensions/no-debug-non-zts-20090626/mcrypt.so
  2. The following lines will install the latest phpMyAdmin to /Library/WebServer/Documents and set up config.inc.php. Afterwards, you will be able to access phpMyAdmin at http://localhost/phpMyAdmin:
    $ cd /Library/WebServer/Documents
    $ curl -s -L -O http://downloads.sourceforge.net/project/phpmyadmin/phpMyAdmin/`curl -s -L http://www.phpmyadmin.net/home_page/index.php | grep -m1 dlname |awk -F'Download ' '{print $2}'|cut -d "<" -f1`/phpMyAdmin-`curl -s -L http://www.phpmyadmin.net/home_page/index.php | grep -m1 dlname |awk -F'Download ' '{print $2}'|cut -d "<" -f1`-english.tar.gz
    $ tar zxpf `ls -1 phpMyAdmin*tar.gz|sort|tail -1`
    $ rm `ls -1 phpMyAdmin*tar.gz|sort|tail -1`
    $ mv `ls -d1 phpMyAdmin-*|sort|tail -1` phpMyAdmin
    $ cd phpMyAdmin
    $ cp config.sample.inc.php config.inc.php
    $ sed -i "" "s/blowfish_secret\'\] = \'/blowfish_secret\'\] = \'`cat /dev/urandom | strings | grep -o '[[:alnum:]]' | head -n 50 | tr -d '\n'`/" config.inc.php

Drush

Drush is a helpful tool that provides a "Drupal shell" to speed up your Drupal development.

  1. Run the following in the Terminal to get Drush 4.2 installed in /usr/local and add drush to the $PATH:

    $ svn co http://subversible.svn.beanstalkapp.com/modules/drush/tags/DRUPAL-7--4-2/ /usr/local/drush
    $ cd /tmp
    $ pear download Console_Table
    $ tar zxvf `ls Console_Table-*.tgz` `ls Console_Table-*.tgz | sed -e 's/\.[a-z]\{3\}$//'`/Table.php
    $ mv `ls Console_Table-*.tgz | sed -e 's/\.[a-z]\{3\}$//'`/Table.php /usr/local/drush/includes/table.inc
    $ rm -fr Console_Table-*
    $ ln -s /usr/local/drush/drush /usr/local/bin/drush

Virtual Hosts

Now that all of the components to run a website locally are in place, it only takes a few changes to ~/Sites/httpd-vhosts.conf to get a new local site up and running.

  1. Your website code should be in a directory in ~/Sites. For the purposes of this example, the webroot will be at ~/Sites/thewebsite.

  2. You need to choose a "domain name" for your local site that you will use to access it in your browser. For the example, we will use "myproject.local". The first thing you need to do is add a line to /etc/hosts to direct your "domain name" to your local system:

    $ sudo sh -c "cat >> /etc/hosts <<'EOF'
    127.0.0.1 myproject.local
    EOF"

  3. If you have been following the rest of this guide and used a copy of /etc/apache2/extra/httpd-vhosts.conf in ~/Sites/httpd-vhosts.conf, you can delete the first of the two example Virtual Hosts and edit the second one. You can also delete the ServerAdmin line as it is not necessary.

  4. Change the ServerName to myproject.local (or whatever you entered into /etc/hosts) so Apache will respond when you visit http://myproject.local in a browser.

  5. Change the DocumentRoot to the path of your webroot, which in this example is /Users/fname/Sites/thewebsite.

  6. It's highly recommended to set a CustomLog and an ErrorLog for each virtual host; otherwise all logged output for every site ends up in the same files. Change the line starting with CustomLog from "dummy-host2.example.com-access_log" to "myproject.local-access_log", and for ErrorLog change "dummy-host2.example.com-error_log" to "myproject.local-error_log".

  7. Save the file and reload the Apache configuration as shown in Step 6 under the Apache section, and visit http://myproject.local in your browser.

Going forward, all you have to do to add a new website is duplicate the <VirtualHost *:80> ... </VirtualHost> section underneath your existing configuration and edit it, add the new hostname to /etc/hosts, and reload Apache.
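
For example, adding a second site could look something like the sketch below. The host name and webroot are placeholders, and the log paths simply mirror the dummy-host entries in the stock httpd-vhosts.conf; adjust everything to match your own setup:

    $ cat >> ~/Sites/httpd-vhosts.conf <<'EOF'
    <VirtualHost *:80>
        ServerName anotherproject.local
        DocumentRoot "/Users/fname/Sites/anotherwebsite"
        CustomLog "/private/var/log/apache2/anotherproject.local-access_log" common
        ErrorLog "/private/var/log/apache2/anotherproject.local-error_log"
    </VirtualHost>
    EOF
    $ sudo sh -c "echo '127.0.0.1 anotherproject.local' >> /etc/hosts"

Then reload Apache as in Step 7 and visit http://anotherproject.local in your browser.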

Oct 11 2010
Oct 11

We recently stumbled on a performance issue on a big project. Loading time was really high on some pages, and we soon identified a couple of queries slowing down the site. I'd like to share this as a piece of advice: you shouldn't trust Views + CCK completely when it comes to performance. The popular suite of modules is aimed at easing the process of building out a site's architecture, so you don't have to write your queries by hand all the time (and potentially rewrite them two or three times for every change in the project). As much as it is time-saving, it also needs some attention and analysis when you're dealing with complex queries on huge datasets.

Our little case demonstrates this. We have a node table with ~50K nodes and a users table with ~10K users. We used Content Profile for flexible customization of user profiles, and at some point we wanted to pull a page of nodes with author information for each of them. In short, the view produced:

SELECT many fields
FROM node node
INNER JOIN users users ON node.uid = users.uid
LEFT JOIN node node_users ON users.uid = node_users.uid AND node_users.type = "content profile name"
  a couple of other JOINs
WHERE (node.type in ('node type')) AND (node.status <> 0)
ORDER BY node_created DESC

I've simplified it to the important part. Even if you remove the other joins and fields, you will still get a VERY SLOW query. If you check how many rows were examined, you'll find astronomical values (> 1M), all of that just to select a dozen nodes. The trouble is those two JOINs, and particularly the last one. Too many joins make the intermediate result set really large, consuming a lot of memory and taking a lot of time to filter. I don't know much about MySQL's inner workings, but experience bears this out.

How do we deal with it, then? We don't want to lose flexibility and speed while developing, but we can't afford 12 seconds (which is what one of our queries took) to load a page, right?

First of all, we need to do the analysis. Turn on the slow query log in MySQL and find the view causing the problem. After that, try removing JOINs one by one to see which one is really slowing things down. Content Profile is a frequent culprit, because it joins the node table a second time just to filter out a very small subset of it.
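
A quick way to see those row estimates is to copy the query out of the Views preview and run it through EXPLAIN from the command line. A minimal sketch, where the database name, content type, and profile type are placeholders for whatever your view actually generates:

    $ mysql -u root -p drupal_db -e "
        EXPLAIN SELECT node.nid
        FROM node node
        INNER JOIN users users ON node.uid = users.uid
        LEFT JOIN node node_users ON users.uid = node_users.uid AND node_users.type = 'profile'
        WHERE node.type IN ('story') AND node.status <> 0
        ORDER BY node.created DESC\G"

The rows column of the output shows roughly how many rows MySQL expects to examine for each table in the join.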

After identifying the trouble, you can use the solution I used (it boosted the speed up to 10x in our case): hook_views_query_alter()

/**
* Implementation of hook_views_query_alter()
*/
function epd_tweaks_views_query_alter(&$view, &$query) {
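  // For each table Views has queued, look for joins against {node} that carry
  // extra conditions (e.g. the Content Profile type filter) and move those
  // conditions into a pre-filtered derived table, so the JOIN only touches a
  // small subset of {node} instead of the whole table.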

  foreach ($query->table_queue as $table => &$table_info) {
    if ($table_info['table'] == 'node' && !empty($table_info['join']->extra)) {
      $extra = $table_info['join']->extra;
      unset($table_info['join']->extra);
      $filters = array();
      foreach ($extra as $filter) {
        if (!is_numeric($filter['value'])) {
          $filter['value'] = '"' . $filter['value'] . '"';
        }
        $filters[] = ' ' . $filter['field'] . ' = ' . $filter['value'];
      }

      $table_info['join']->table = '(SELECT * FROM {node} WHERE ' . implode(' AND ', $filters) . ')';
    }
  }
}

This code checks the query for LEFT JOINs with extra conditions, like the one causing the trouble above, and substitutes for the 'node' table a subquery with those conditions already applied (and removed from the JOIN itself, obviously). With this, the subquery runs first, and the JOIN only touches its small result set. As an additional bonus, the subquery can be cached, because it is very simple and has a small result!

Finally, the speed is back. Be sure to check the results to see that they're still consistent. The Views preview is a good place to verify that the query is being properly altered and that the query time is really reduced.

Sep 17 2010
Sep 17

The Drupal community just released another alpha of its upcoming Drupal 7 version, to shake out the remaining bugs and to encourage more users to test it.

If you would like to give it a try but you don't have a free server handy, how about using a virtual machine instead? Using the fabulous SuSE Studio, I've created an appliance based on openSUSE 11.3, Drupal 7.0-alpha7 and MySQL 5.1 with the InnoDB plugin and strict mode enabled (both the SQL mode and the InnoDB strict mode). Using this configuration helps to ensure that Drupal works well with the current version of MySQL/InnoDB and does not use any "questionable" SQL statements. This might be especially interesting for additional modules - Drupal core did not reveal any problems under strict mode so far.

You can download disk images for VMware/VirtualBox/KVM or Xen from the SUSE Gallery (free login required). Just boot the appliance in your virtualization application of choice, choose your keyboard layout, and step through the network configuration and time zone selection. Once the appliance has booted up and the login: prompt has appeared, point your web browser to the appliance's IP address to start the Drupal installation/configuration. MySQL has been pre-configured: there is an empty database named "drupal" and a user "drupal" with the same password to access it. You just need to enter this information in the Drupal database configuration dialogue during the installation. Anything else can be configured to your liking.
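
If you want to confirm the strict settings once the appliance is up, something along these lines should work from a shell on the appliance, using the pre-configured drupal/drupal account mentioned above (reading these variables needs no special privileges):

    $ mysql -u drupal -pdrupal -e "SELECT @@global.sql_mode; SHOW GLOBAL VARIABLES LIKE 'innodb_strict_mode';"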

After you have finished the installation, you can toy around with a fresh Drupal 7 installation! Install additional modules, change the themes, add content. And make sure to report all bugs that you run into while doing so! Have fun.

Aug 18 2010
Aug 18

Unless you've been living under a rock, by now you've heard about the case that is certain to keep the armchair lawyers busy for years to come: Oracle vs. Google. It's already been dissected elsewhere, but in a nutshell: Sun owned their GPL-licensed Java virtual machine, and various patents on it; Google wrote their own JVM for the Android platform, Dalvik; Oracle bought Sun; Oracle uses those patents to sue Google over their JVM; Hilarity ensues.

So what? How does that affect us, as PHP and Drupal developers? Well it doesn't... except indirectly via another product that Oracle bought as part of Sun: MySQL.

Let's be clear about one thing: Software patents are bad. Period. At this point pretty much everyone who knows the subject agrees on that, except for some of those who hold lots of software patents (and even some of those agree that the patent system is broken).

So for one mega-corp to bring a patent suit, about a Free Software project no less, against another mega-corp is something of a problem. Whether or not Oracle has a valid case under current law, and it's possible that it does, it exemplifies that current law is broken in this regard.

Predictably, Oracle has been getting some backlash. Some have been trying to read the tea leaves and figure out what Oracle hopes to gain from this move, as they had to have known there would be negative PR. There have even been some influential developers stating that Oracle needs to be hurt as punishment, and suggesting that because PHP and MySQL are joined at the hip, we as PHP developers can and should reject MySQL and drop MySQL support in our projects to send a clear message to Oracle.

Now, I won't go that far. For one, Oracle is a corporation, and like any publicly traded corporation it is doing what its accountants believe is in its best short-term financial interests. It's not being evil; it's just doing what it is legally required to do. For another, Oracle, like Sun before it, is a massive company. The Java and MySQL divisions are totally separate, and I would not at all be surprised if the MySQL division didn't even hear about the case until they saw it on the news. Oracle also has a not-too-shabby track record of supporting open source itself, and with the acquisition of Sun it is one of the world's largest open source contributors.

Then too, PHP developers abandoning MySQL is a fool's errand. MySQL is too widely deployed, the integration too deep. MySQL itself is just as good a product today as it was a week ago. And it's not even feasible, given the number of projects that are hard-coded to MySQL (from its SQL flavor through to lots of mysql_query() or mysqli_query() calls all over the code base). It would also mean dropping support for the non-corporate MySQL forks, such as MariaDB, that have appeared since MySQL ceased to be an independent company. You don't want to hurt an innocent bystander.

Besides, that wouldn't hurt Oracle. Oracle doesn't care about random open source projects using MySQL. They care about support contracts; that's where they make their money. Open source projects using MySQL are, from a business perspective, just a means to an end. It's not PHP projects migrating away from MySQL that would get Oracle to sit up and listen; it's paying customers migrating off of MySQL, or really, off of Oracle's eponymous database. But they can't do that as long as the PHP projects they depend on are hard-coded to MySQL, as noted above.

Oracle vs. Google was not in any way the reason that Drupal 7 included a massive push for greater DB independence. However, it does illustrate the benefit of avoiding vendor lock-in, even when you're an Open Source project. Controlling your own destiny means not having a "hard dependency" on anyone; not on a particular web host, a particular commercial company, a particular 3rd party product, even a particular 3rd party open source project like MySQL.

Drupal 6 ran on one and a half databases: MySQL and PostgreSQL (sort of). Drupal 7 aims to run on six: MySQL, PostgreSQL, and SQLite in core; MS SQL and Oracle in contrib; and we've recently confirmed that the MySQL driver works with MariaDB, too. And it's designed to be extended via contrib, so other databases are possible in the future as well. Yes, that forced API changes and increased the complexity of the system, no question. The vendor independence that bought us, and the hundreds of thousands of down-stream users of Drupal 7, is worth that investment.

If you're a Drupal developer incensed by Oracle's actions, don't get mad; get coding. There are still some outstanding issues in Drupal 7 on PostgreSQL and SQLite, and we are not releasing Drupal 7 until we can say that we support four databases out of the box with a straight face. If you maintain a contrib module, make certain that you're not doing anything MySQL-specific. Make sure your module works on PostgreSQL and SQLite at least, so that your users have the freedom to choose their database. It's all about the freedom to choose, and freedom from vendor lock-in.

If you work on another PHP project, make sure your project has good database independence, too. Avoid ext/mysql (you should be doing that anyway) and ext/mysqli and use PDO instead, which makes cross-DB support easier. If you know C, help with PDO itself; really, it desperately needs some love. If you don't want to write your own DB abstraction layer (really, it's not easy) there are plenty to choose from already. But whatever you do, don't force your users into being tied to one specific database; not just MySQL, but don't force them into PostgreSQL either; nor Oracle, nor SQLite, nor MS SQL.

If you want to send a message to Oracle about software freedom, write software that is Free. Free as in speech, free of patent encumbrances, free of vendor lock-in, even to another open source project. Give your users the freedom to choose, and to send what messages they will.

I'll see you in the issue queues.
