Upgrade Your Drupal Skills

We trained 1,000+ Drupal Developers over the last decade.

See Advanced Courses NAH, I know Enough

scaling drupal step one B - nfs vs rsync

Parent Feed: 

i got some good feedback on my dedicated data server step towards scaling. kris buytaert in his everything is a freaking dns problem blog points out that nfs creates an unnecessary choke point. he may very well have a point.

having said that, i have run the suggested configuration in a multi-web-server, high-traffic production setting for 6 months without a glitch, and feedback on his blog gives example of other large sites doing the same thing. for even larger configurations, or if you just prefer, you might consider another method of synchronizing files between your web servers.

kris suggests rsync as a solution, and although luc stroobant points out the delete problem, i still think it's a good, simple solution. see the diagram above.

the delete problem is that you can't simply use the --delete flag on rsync. since in an x->y synchronization, a delete on node x looks just like an addition to node y.

i speculate that you can partly mitigate this issue with some careful scripting, using a source-of-truth file server to which you first pull only additions from the source nodes, and then do another run over the nodes with the delete flag (to remove any newly deleted files from your source-of-truth). unfortunately you can't do the delete run on a live site (due to timing problems if additions happen after your first pass and before your --delete pass), but you can do this as a regularly scheduled maintenance task when your directories are not in flux.

i include a bash script below to illustrate the point. i haven't tested this script, or the theory in general. so if you plan to use it, be careful.

you could call this script from cron on your data server. you could do this, say, every 5 minutes for a smallish deployment. even though that this causes a 5 minute delay in file propagation, the use of sticky sessions ensures that users will see files that they create immediately, even if there is a slight delay for others. additionally, you could schedule it with the -d flag during system downtime.

the viability of this approach depends on many factors including how quickly an uploaded file must be available for everyone and how many files you have to synchronize. this clearly depends on your application.

synchronizeFiles -- a bash script to keep your drupal web server's files directory synchronized

#!/bin/bash

# synchronizeFiles -- a bash script to keep your drupal web server's files directory
#                     synchronized - http://www.johnandcailin.com

# bail if anything fails
set -e

# don't synchronize deletes by default
syncDeletes=false

sourceServers="192.168.1.24 192.168.1.25"
sourceDir="/var/www/drupal/files"
sourceUser="www-data"
targetDir="/var/drupalFiles"

# function to print a usage message and bail
usageAndBail()
{
   echo "Usage syncronizeFiles [OPTION]"
   echo "     -d       synchronize deletes too (ONLY use when directory contents are static)"
   exit 1;
}

# process command line args
while getopts hd o
do     case "$o" in
        d)     syncDeletes=true;;
        h)     usageAndBail;;
        [?])   usageAndBail;;
       esac
done

# do initial addition only schronization run from sourceServers to targetServer
for sourceServer in ${sourceServers}
do
   echo "bi directionally syncing files between ${sourceServer} and local"

   # pull any new files to the target
   rsync -a ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}/..

   # push any new files back to the source
   rsync -a ${targetDir} ${sourceUser}@${sourceServer}:${sourceDir}/..
done

# synchronize deletes (only use if directory contents are static)
if test ${syncDeletes} = "true"
then
   for sourceServer in ${sourceServers}
   do
      echo "DELETE syncing files from ${sourceServer} to ${targetDir}"

      # pull any new files to the target, deleting from the source of truth if necessary
      rsync -a --delete ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}
   done
fi

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Author: 
Original Post: 

About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web

Evolving Web