May 12 2014

Most high-traffic or complex Drupal sites use Apache Solr as their search engine. It is much faster and more scalable than Drupal's core Search module.

In this article, we describe one of many ways to get a working Apache Solr installation for use with Drupal 7.x on Ubuntu Server 12.04 LTS. The technique described should work with Ubuntu 14.04 LTS as well.

In a later article (now published), we describe how to install other versions of Solr using the Ubuntu/Debian way.

Objectives

For this article, we focus on having an installation of Apache Solr with the following objectives:

  • Use the latest stable version of Apache Solr
  • Fewest software dependencies, i.e. no Tomcat server, no full JDK, and no separately installed Jetty
  • Least amount of necessary complexity
  • Least amount of software to install and maintain
  • A secure installation

This installation can be done on the same host that runs Drupal, if it has enough memory and CPU, or it can be on the database server. However, it is best if Solr is on a separate server dedicated for search, with enough memory and CPU.

Installing Java

We start by installing the Java Runtime Environment, and choose the headless server variant, i.e. without any GUI components.

sudo aptitude update
sudo aptitude install default-jre-headless

Downloading Apache Solr

Second, we need to download the latest stable version of Apache Solr from a mirror near you. At the time of writing this article, it is 4.7.2. You can find the closest mirror to you at Apache's mirror list.

cd /tmp
wget http://apache.mirror.rafal.ca/lucene/solr/4.7.2/solr-4.7.2.tgz

Extracting Apache Solr

Next we extract the archive, while still in the /tmp directory.

tar -xzf solr-4.7.2.tgz

Moving to the installation directory

We choose to install Solr in /opt, because that directory is intended for software that is not installed from Ubuntu's repositories via the apt dependency management system, and is not tracked for security updates by Ubuntu.

sudo mv /tmp/solr-4.7.2 /opt/solr

Creating a "core"

Apache Solr can serve multiple sites, each served by a "core". We will start with one core, simply called "drupal".

cd /opt/solr/example/solr
sudo mv collection1 drupal


Now edit the file ./drupal/core.properties and change the name= to drupal, like so:

name=drupal

Copying the Drupal schema and Solr configuration

We now have to copy the Drupal Solr configuration into Solr. Assuming your site is installed in /var/www, these commands achieve the task:

cd /opt/solr/example/solr/drupal/conf
sudo cp /var/www/sites/all/modules/contrib/apachesolr/solr-conf/solr-4.x/* .

Then edit the file /opt/solr/example/solr/drupal/conf/solrconfig.xml, and comment out or delete the following section:

<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>

Setting Apache Solr Authentication, using Jetty

By default, a Solr installation listens on the public Ethernet interface of a server, and has no protection whatsoever. Attackers can access Solr, and change its settings remotely. To prevent this, we set password authentication using the embedded Jetty that comes with Solr. This syntax is for Apache Solr 4.x. Earlier versions use a different syntax.

The following settings work well for a single-core install, i.e. search for a single Drupal installation. If you want multi-core Solr, i.e. for many sites, then you will want to fine-tune this to add different roles to different cores (a rough sketch follows the realm.properties step below).

Then edit the file: /opt/solr/example/etc/jetty.xml, and add this section:

<!-- ======= Securing Solr ===== -->
<Call name="addBean">
  <Arg>
    <New class="org.eclipse.jetty.security.HashLoginService">
      <Set name="name">Solr</Set>
      <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      <Set name="refreshInterval">0</Set>
    </New>
  </Arg>
</Call>

Then edit the file: /opt/solr/example/etc/webdefault.xml, and add this section:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>

Next, create a new file named /opt/solr/example/etc/realm.properties, and add the following line to it:

user_name: password, search-role


Note that "search-role" must match what you put in webdefault.xml above.

Instead of "user_name", use the user name that will be used for logging in to Solr. Also, replace "password" with a strong, hard-to-guess password.
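
If you go the multi-core route mentioned earlier, a rough, untested sketch is to define one user and role per core in realm.properties (the names and passwords below are placeholders):

site1_user: site1_password, site1-role
site2_user: site2_password, site2-role

and then, in webdefault.xml, add one <security-constraint> block per core, for example with <url-pattern>/site1/*</url-pattern> paired with <role-name>site1-role</role-name>, and the site2 equivalents in a second block.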

Finally, make sure that the file containing the password is not world-readable.

chmod 640 /opt/solr/example/etc/realm.properties

Changing File Ownership

We then create a user for solr.

sudo useradd -d /opt/solr -M -s /dev/null -U solr

And finally change ownership of the directory to solr

sudo chown -R solr:solr /opt/solr

Automatically starting Solr

Now you need Solr to start automatically when the server is rebooted. To do this, download the attached file, and copy it to /etc/init.d

sudo cp solr-init.d.sh.txt /etc/init.d/solr
sudo chmod 755 /etc/init.d/solr

And now tell Linux to start it automatically.

sudo update-rc.d solr start 95 2 3 4 5 .

For now, start Solr manually.

sudo /etc/init.d/solr start

Now Solr is up and running.

Verify that it is running by accessing the following URL:

http://x.x.x.x:8983/solr/


Replace x.x.x.x by the IP address of the server that is running Solr.
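
You can also verify from the command line, now that authentication is required (substitute your own user name, password, and IP address):

curl -u user_name:password "http://x.x.x.x:8983/solr/drupal/admin/ping"

This should return a small response with a status of OK, assuming the Drupal Solr configuration you copied includes the standard /admin/ping handler.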

You can also view the logs at:

tail -f /opt/solr/example/logs/solr.log

Configuring Drupal's Apache Solr module

After you have successfully installed, configured, and started Solr, you should configure your Drupal site to talk to the Solr server. First, go to admin/config/search/apachesolr/settings/solr/edit on your site, and enter the information for your Solr server. The URL should be of the following form:

http://user:password@x.x.x.x:8983/solr/drupal

Now you can proceed to reindex your site, by sending all the content to Solr.
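
If you use Drush, the Apache Solr Search Integration module also provides commands for this; exact names vary by module version, but the flow is roughly to mark everything for reindexing and then index it:

drush solr-mark-all
drush solr-index

Alternatively, just run cron repeatedly until the whole site has been sent to Solr.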

Removing Solr

If you ever want to cleanly remove the Apache Solr instance that you installed using the above instructions, use the sequence of commands below:

sudo /etc/init.d/solr stop

sudo update-rc.d solr disable

sudo update-rc.d solr remove

sudo rm /etc/init.d/solr

sudo userdel solr

sudo rm -rf /opt/solr

sudo aptitude purge default-jre-headless

Attachment: solr-init.d.sh_.txt (1.23 KB)
Oct 21 2013
Automating new dev sites for new branches

Over the past few years we've moved away from using Subversion (SVN) for version control and we're now using Git for all of our projects.  Git brings us a lot more power, but because of its different approach there are some challenges as well. 

Git has powerful branching, and this opens up the opportunity to start a new branch for each new ticket/feature/client-request/bug-fix. There are several different branching strategies: Git Flow is common for large ongoing projects, while we use a more streamlined workflow. This is great for client flexibility: a new feature can be released to production immediately after it's been approved, or you can choose to bundle several features together to reduce the time spent running deployments. Regardless of the branching model you choose, you will run into the issue where stakeholders need to review and approve a branch (and maybe send it back to developers for refinement) before it gets merged in. If you've got several branches open at once, that means you need several different dev sites for the review process to happen. For simple tasks on simple sites you might be able to get away with just one dev site and manually check out different branches at different times, but that won't work for any new feature that requires database additions or changes.

Another trend in web development over the past few years has been to automate as many of the boring and repetitive tasks as possible. So we've created a Drush command called Site Clone that can do it all with just a few keystrokes:

  1. Copies the codebase (excluding the files directory) with rsync to a new location.
  2. Creates a new git branch (optional).
  3. Creates a new /sites directory and settings.php file.
  4. Creates a new files directory.
  5. Copies the database.
  6. Writes database connection info to a global config file.

It also does thorough validation on the input parameters (about 20 different validations for everything from checking that the destination directory is writable, to ensuring that the name of the new database is valid, to ensuring that the new domain can be resolved).

Here's an example of how it's run:

drush advo-site-clone --destination-domain=test.cf.local --destination-db-name=test --git-branch=test
---VALIDATION---
no errors

---SUMMARY---
Source path         : /Users/dave/Sites/cf
Source site         : sites/cf.local
Source DB name      : cf
Destination path    : /Users/dave/Sites/test.cf.local
Destination site    : sites/test.cf.local
Destination DB name : test
New Git branch      : test

Do you really want to continue? (y/n): y
Starting rsync...                                             [status]
Rsync was successful.                                         [success]
Creating Git branch...                                        [status]
Switched to a new branch 'test'
Git branch created.                                           [success]
Created sites directory.                                      [success]
Updated settings.php.                                         [success]
Created files directory.                                      [success]
Starting DB copy...                                           [status]
Copied DB.                                                    [success]
Complete                                                      [success]

There are a few other things that we needed to put in place in order to get this working smoothly. We've set up DNS wildcards so that requests to a third-level subdomain end up where we want them to.  We've configured Apache with a VirtualDocumentRoot so that requests to new subdomains get routed to the appropriate webroot.  Finally we've also made some changes to our project management tool so that everyone knows which dev site to look at for each ticket.
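
For reference, the Apache piece of this can be a single wildcard virtual host using mod_vhost_alias; the webroot path below is only illustrative:

# Route foo.bar.advomatic.com to /var/www/branches/foo
<VirtualHost *:80>
  ServerAlias *.bar.advomatic.com
  UseCanonicalName Off
  VirtualDocumentRoot /var/www/branches/%1
</VirtualHost>

Here %1 is the first label of the hostname, so each branch subdomain lands in its own cloned webroot.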

Once you've got all the pieces of the puzzle you'll be able to have a workflow something like:

  1. Stakeholder requests a new feature (let's call it foo) for their site (let's call it bar.com).
  2. Developer clones an existing dev site (bar.advomatic.com) into a new dev site (foo.bar.advomatic.com) and creates a new branch (foo).
  3. Developer implements the request.
  4. Stakeholder reviews on the branch dev site (foo.bar.advomatic.com). Return to #3 if necessary.
  5. Merge branch foo + deploy.
  6. @todo decommission the branch site (foo.bar.advomatic.com).

Currently that last step has to be done manually, but we should create a corresponding script to clean up the codebase, files, database, and branches.
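
A minimal clean-up sketch, reusing the names from the example above (the path, database name, and branch are placeholders for whatever the clone created), might look like:

# Remove the cloned codebase and files directory
rm -rf /Users/dave/Sites/test.cf.local

# Drop the cloned database
mysql -e "DROP DATABASE test;"

# Delete the branch locally and on the remote once it has been merged
git branch -d test
git push origin --delete test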

Automate all the things!

Oct 07 2013

Using a Reverse Proxy and/or a Content Delivery Network (CDN) has become common practice for Drupal and other Content Management Systems.

One inconvenient aspect of this is that your web server no longer sees the correct client IP address, and neither does your application. The IP address they see is that of the machine the reverse proxy is running on.

In Drupal, there is code in core that tries to work around this by looking up the IP address in the X-Forwarded-For HTTP header (HTTP_X_FORWARDED_FOR in PHP), or in a custom header that you can configure.

For example, this would be in the settings.php of a server that runs Varnish on the same box.

$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');

There is also this setting for Drupal 7.x in case your CDN puts the IP address in some other custom header:

// CloudFlare CDN
$conf['reverse_proxy_header'] = 'HTTP_CF_CONNECTING_IP';

That fixes the application, but what about the web server?

But, even if you solve this at the application level (e.g. Drupal, or WordPress), there is still the issue that your web server is not logging the correct IP address. For example, you can't analyze the logs to know which countries your users are coming from, or identify DDoS attacks.

Apache RPAF module

There is an easy solution to this though: the Reverse Proxy Add Forward (RPAF) Apache module.

This module extracts the correct client IP address and uses it for Apache's logs, and it also hands the correct client IP address to PHP in $_SERVER['REMOTE_ADDR'].

To install RPAF on Ubuntu 12.04 or later, use the command:

aptitude install libapache2-mod-rpaf

If you run the reverse proxy (e.g. Varnish) on same server as your web server and application, and do not use a CDN, then there is no need to do anything more.

However, if you run the reverse proxy on another server, then you need to change the RPAFproxy_ips line to include the IP addresses of those servers. For example, these would be the addresses of your Varnish servers that front end Drupal and are themselves front ended by the CDN.

You do this by editing the file /etc/apache2/mods-enabled/rpaf.conf.

For example:

RPAFproxy_ips 10.0.0.3 10.0.0.4 10.0.0.5

CDN Client IP Header

If you are using a CDN, then you need to find out what HTTP header the CDN uses to put the client IP address, and modify RPAF's configuration accordingly.

For example, for CloudFlare, the header is CF-Connecting-IP

So, you need to edit the above file, and add the following line:

RPAFheader CF-Connecting-IP
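
Putting it together, a complete /etc/apache2/mods-enabled/rpaf.conf for a CloudFlare-plus-Varnish setup might look roughly like this (the proxy IP addresses are examples):

<IfModule mod_rpaf.c>
    RPAFenable On
    RPAFsethostname On
    RPAFproxy_ips 127.0.0.1 10.0.0.3 10.0.0.4 10.0.0.5
    RPAFheader CF-Connecting-IP
</IfModule>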

Drupal Reverse Proxy settings no longer needed

And finally, you don't need any of the above Reverse Proxy configuration in settings.php.

// $conf['reverse_proxy'] = TRUE;
// $conf['reverse_proxy_addresses'] = array('127.0.0.1');
// $conf['reverse_proxy_header'] = 'HTTP_CF_CONNECTING_IP';

Now, you have correct client IP addresses in Apache's logs, and inside Drupal as well.

What If RPAF Does Not Work?

If you have RPAF front ended directly by a CDN, without Varnish, then RPAF may not work, for reasons that are as yet unknown.

To overcome this, you have several other options.

Apache mod_remoteip

There is a small Apache module called mod_remoteip. This basically does the same thing as RPAF, but with simpler configuration.

Use the download link and save the source as a file named apache-2.2-mod_remoteip.c, then build and install it with apxs2:

apxs2 -i -a -c apache-2.2-mod_remoteip.c

This should create the module's .so file in Apache's modules directory. It should also add the LoadModule directive in mods-available/remoteip.load, which should look like so:

LoadModule remoteip_module modules/mod_remoteip.so

Now add the RemoteIPHeader directive in a new file called mods-available/remoteip.conf

RemoteIPHeader X-Forwarded-For

If you are using CloudFlare CDN then you use:

RemoteIPHeader CF-Connecting-IP

Now, enable the module:

a2enmod remoteip

Then restart Apache:

service apache2 restart

If this does not work, then you can still do it using the next set of tricks:

Apache Access Log and Drupal Reverse Proxy Settings

We can force Apache to log the correct client IP address to the access log by adding this to the virtual host entry for your site (e.g. /etc/apache2/sites-enabled/example.com):

LogFormat "%{CF-Connecting-IP}i %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" proxied

This takes the CF-Connecting-IP header from CloudFlare and logs it instead of the connecting IP address, which would be that of the proxy, not the originating client.

Then, under the "VirtualHost" stanza, you add this to use the custom proxied format you created above:

CustomLog ${APACHE_LOG_DIR}/access-example.com.log proxied

Then you need to enable the Drupal reverse proxy setting in settings.php:

$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_header'] = 'HTTP_CF_CONNECTING_IP';

You don't need to add the reverse_proxy_addresses variable, because for CloudFlare there are too many of them.

Aug 20 2013

TL;DR

  • Performance matters for all websites
  2. Performance is not just (80%) frontend
  3. SPDY kills 80% of your frontend problems

What
In the Drupal and broader web community, there is a lot of attention towards the performance of websites.

While "performance" is a very complex topic on its' own, let us in this posting define it as the speed of the website and the process to optimize the speed of the website (or better broader, the experience of the speed by the user as performance.

Why
This attention to speed is there for two good reasons. On the one hand, sites are getting bigger and hence slower: databases grow with more content, and the codebase gains new modules and features. On the other hand, more money is being made with websites, even if you are not selling goods or running ads.

Given that most sites run on the same hardware for years, this results in slower websites, leading to a lower pagerank, less traffic, fewer pages per visit, and lower conversion rates. And in the end, if you have a business case for your website, lower profits. Bottom line: if you make money online, you are losing some of it due to a slow website.
UFO's
When it comes to speed there are many parameters to take into account; it is not "just" the average page loading time. First of all, the average is a rather useless metric without taking the standard deviation into account. But apart from that, it comes down to what a "page" is.

  • A page can be just the HTML file (can be done in 50ms)
  • A page can be the complete webpage with all the elements (for many sites around the 10-second mark)
  • A page can be the complete webpage with all elements including third-party content. Hint: did you know that for displaying the Facebook Like button, more Javascript is downloaded than the entire jQuery/backbone/bootstrap app of this website, non-cacheable!
  • And a page can be anything "above the fold"

Moon Retro future
And then there are more interesting metrics than these, for example the time to first byte from a technological point of view. But it is not just a technical point of view: there is a website one visits every day that optimizes its renderable HTML to fit within 1500 bytes.
So ranging from "first byte to glass" to "round trip time", there are many elements to take into account when measuring the speed of a website. And that is the main point: web performance is not just for the frontenders like many think, and not just for the backenders like some of them hope, but for all the people who control elements in the chain involved in that speed, all the way down to the networking guys (m/f) in the basement (hint, sysadmins: INITCWND has a huge performance impact!). Speed should be at the core of your team, not just with those who enable gzip compression, aggregate the Javascript, or make the sprites.

Steve Souders (the webperformance guru) once stated in his golden rule that 80-90% of the end-user response time is spent on the frontend.

Speedy to the rescue?
This 80% might be a matter of debate in the case of a logged-in user in a CMS. But even if it is true, this 80% can be reduced by 80% with SPDY.
SPDY is an open protocol introduced by Google to overcome the problems with HTTP (up to 1.1, including pipelining, defined in 1999!) and the absence of HTTP/2.0. It speeds up HTTP by using one connection between the client and the server for all the elements in the page served by that server. Originally only built into Chrome, many browsers now support this protocol, which will be the basis of HTTP/2.0. Think about it and read about it: a complete webpage with all its elements -regardless of minifying and sprites- served in one stream, with only one TCP handshake and one DNS request. Most of the rules of traditional web performance optimization (CSS aggregation, preloading, prefetching, offloading elements to a different host, cookie-free domains), all this wisdom is gone, even false, with one simple install. 80% of the 80% gone with SPDY; now one can focus on the hard part: the database and the codebase. :-)

The downside of SPDY, however, is that it is hard to troubleshoot and not yet available in all browsers. It is hard to troubleshoot since most implementations use SSL, and the protocol is multiplexed and compressed by default and not made to be read by humans, unlike HTTP/1.x. There are some tools that make it possible to test SPDY, but most if not all of the tools you use every day, like ab, curl, and wget, will fail to use SPDY and fall back, as defined in the protocol, to HTTP/1.1.

Measure
So can we test to see if SPDY is really faster and how much faster?
Yes, see Evaluating the Performance of SPDY-Enabled Web Servers (a Drupal site :-)
SPDY performance

So: more users, fewer errors under load, and a lower page load time. What is there not to like about SPDY?

Drupal
That is why I would love Drupal.org to run with SPDY; see the issue at d.o/2046731. I really do hope that the infra team will find some time to test this and, once it is accepted, install it on the production server.

Performance as a Service
One of the projects I have been active in lately is ProjectPAAS (bonus points if you find the easteregg on the site :-) ). ProjectPAAS is a startup that will test a Drupal site, measure 100+ metrics, analyse the data, and give the developer an opinionated report on what to change to get better performance. If you like these images around the retro-future theme, be sure to check out the flickr page, like us on facebook, follow us on twitter, but most of all, see the moodboard on pinterest.

Pinterest itself is doing some good work when it comes to performance as well. Not just speed but also the perception of speed.

Pinterest lazyloading with color
Pinterest does lazyload images, but it also displays the prominent color as the background of a cell before the image is loaded, giving the user a sense of what is to come. For background on this, see webdistortion.

Congratulations, you just saved 0.4 seconds
If you are lazyloading images to give your users faster results, be sure to check out this module we made: lazypaas, currently a sandbox project awaiting approval. It extracts the dominant (most used) color of an image and fills the box where the image will be placed with that color. And if you use it and have done a code review, please help it become a full Drupal project.

From 80% to 100%
Lazyloading like this leads to a better user experience. Because even when 80% of the end-user response time is spent on the frontend, 100% of the time is spent in the client, most often the browser. That is the only place where performance should be measured and the only place where performance matters. Hence, all elements that deliver this speed should be optimized, including the webserver and the browser.

Now say this fast after me: SPDY FTW. :-)

Dec 24 2012
;;;;;;;;;;;;;;;;;;;;;
; FPM Configuration ;
;;;;;;;;;;;;;;;;;;;;;

; All relative paths in this configuration file are relative to PHP's install
; prefix (/usr). This prefix can be dynamically changed by using the
; '-p' argument from the command line.

; Include one or more files. If glob(3) exists, it is used to include a bunch of
; files from a glob(3) pattern. This directive can be used everywhere in the
; file.
; Relative path can also be used. They will be prefixed by:
;  - the global prefix if it's been set (-p argument)
;  - /usr otherwise
;include=etc/fpm.d/*.conf

;;;;;;;;;;;;;;;;;;
; Global Options ;
;;;;;;;;;;;;;;;;;;

[global]
; Pid file
; Note: the default prefix is /usr/var
; Default Value: none
pid = /var/run/php-fpm.pid

; Error log file
; Note: the default prefix is /usr/var
; Default Value: log/php-fpm.log
error_log = /var/log/php-fpm/php-fpm.log

; Log level
; Possible Values: alert, error, warning, notice, debug
; Default Value: notice
log_level = error

; If this number of child processes exit with SIGSEGV or SIGBUS within the time
; interval set by emergency_restart_interval then FPM will restart. A value
; of '0' means 'Off'.
; Default Value: 0
;emergency_restart_threshold = 0

; Interval of time used by emergency_restart_interval to determine when
; a graceful restart will be initiated.  This can be useful to work around
; accidental corruptions in an accelerator's shared memory.
; Available Units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;emergency_restart_interval = 0

; Time limit for child processes to wait for a reaction on signals from master.
; Available units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;process_control_timeout = 0

; Send FPM to background. Set to 'no' to keep FPM in foreground for debugging.
; Default Value: yes
;daemonize = yes

; Set open file descriptor rlimit for the master process.
; Default Value: system defined value
;rlimit_files = 1024

; Set max core size rlimit for the master process.
; Possible Values: 'unlimited' or an integer greater or equal to 0
; Default Value: system defined value
;rlimit_core = 0

;;;;;;;;;;;;;;;;;;;;
; Pool Definitions ;
;;;;;;;;;;;;;;;;;;;;

; Multiple pools of child processes may be started with different listening
; ports and different management options.  The name of the pool will be
; used in logs and stats. There is no limitation on the number of pools which
; FPM can handle. Your system will tell you anyway :)

; Start a new pool named 'www'.
; the variable $pool can be used in any directive and will be replaced by the
; pool name ('www' here)
[www]

; Per pool prefix
; It only applies on the following directives:
; - 'slowlog'
; - 'listen' (unixsocket)
; - 'chroot'
; - 'chdir'
; - 'php_values'
; - 'php_admin_values'
; When not set, the global prefix (or /usr) applies instead.
; Note: This directive can also be relative to the global prefix.
; Default Value: none
;prefix = /path/to/pools/$pool

; The address on which to accept FastCGI requests.
; Valid syntaxes are:
;   'ip.add.re.ss:port'    - to listen on a TCP socket to a specific address on
;                            a specific port;
;   'port'                 - to listen on a TCP socket to all addresses on a
;                            specific port;
;   '/path/to/unix/socket' - to listen on a unix socket.
; Note: This value is mandatory.
listen = /tmp/phpfpm.sock

; Set listen(2) backlog. A value of '-1' means unlimited.
; Default Value: 128 (-1 on FreeBSD and OpenBSD)
;listen.backlog = -1

; List of ipv4 addresses of FastCGI clients which are allowed to connect.
; Equivalent to the FCGI_WEB_SERVER_ADDRS environment variable in the original
; PHP FCGI (5.2.2+). Makes sense only with a tcp listening socket. Each address
; must be separated by a comma. If this value is left blank, connections will be
; accepted from any ip address.
; Default Value: any
;listen.allowed_clients = 127.0.0.1

; Set permissions for unix socket, if one is used. In Linux, read/write
; permissions must be set in order to allow connections from a web server. Many
; BSD-derived systems allow connections regardless of permissions.
; Default Values: user and group are set as the running user
;                 mode is set to 0666
;listen.owner = nobody
;listen.group = nobody
;listen.mode = 0666

; Unix user/group of processes
; Note: The user is mandatory. If the group is not set, the default user's group
;       will be used.
user = nobody
group = nobody

; Choose how the process manager will control the number of child processes.
; Possible Values:
;   static  - a fixed number (pm.max_children) of child processes;
;   dynamic - the number of child processes are set dynamically based on the
;             following directives:
;             pm.max_children      - the maximum number of children that can
;                                    be alive at the same time.
;             pm.start_servers     - the number of children created on startup.
;             pm.min_spare_servers - the minimum number of children in 'idle'
;                                    state (waiting to process). If the number
;                                    of 'idle' processes is less than this
;                                    number then some children will be created.
;             pm.max_spare_servers - the maximum number of children in 'idle'
;                                    state (waiting to process). If the number
;                                    of 'idle' processes is greater than this
;                                    number then some children will be killed.
; Note: This value is mandatory.
pm = dynamic

; The number of child processes to be created when pm is set to 'static' and the
; maximum number of child processes to be created when pm is set to 'dynamic'.
; This value sets the limit on the number of simultaneous requests that will be
; served. Equivalent to the ApacheMaxClients directive with mpm_prefork.
; Equivalent to the PHP_FCGI_CHILDREN environment variable in the original PHP
; CGI.
; Note: Used when pm is set to either 'static' or 'dynamic'
; Note: This value is mandatory.
pm.max_children = 50

; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 20

; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 5

; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 35

; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
; Default Value: 0
pm.max_requests = 500

; The URI to view the FPM status page. If this value is not set, no URI will be
; recognized as a status page. By default, the status page shows the following
; information:
;   accepted conn        - the number of request accepted by the pool;
;   pool                 - the name of the pool;
;   process manager      - static or dynamic;
;   idle processes       - the number of idle processes;
;   active processes     - the number of active processes;
;   total processes      - the number of idle + active processes.
;   max children reached - number of times, the process limit has been reached,
;                          when pm tries to start more children (works only for
;                          pm 'dynamic')
; The values of 'idle processes', 'active processes' and 'total processes' are
; updated each second. The value of 'accepted conn' is updated in real time.
; Example output:
;   accepted conn:        12073
;   pool:                 www
;   process manager:      static
;   idle processes:       35
;   active processes:     65
;   total processes:      100
;   max children reached: 1
; By default the status page output is formatted as text/plain. Passing either
; 'html', 'xml' or 'json' as a query string will return the corresponding output
; syntax. Example:
;   http://www.foo.bar/status
;   http://www.foo.bar/status?json
;   http://www.foo.bar/status?html
;   http://www.foo.bar/status?xml
;
; Note: The value must start with a leading slash (/). The value can be
;       anything, but it may not be a good idea to use the .php extension or it
;       may conflict with a real PHP file.
; Default Value: not set
;pm.status_path = /status

; The ping URI to call the monitoring page of FPM. If this value is not set, no
; URI will be recognized as a ping page. This could be used to test from outside
; that FPM is alive and responding, or to
; - create a graph of FPM availability (rrd or such);
; - remove a server from a group if it is not responding (load balancing);
; - trigger alerts for the operating team (24/7).
; Note: The value must start with a leading slash (/). The value can be
;       anything, but it may not be a good idea to use the .php extension or it
;       may conflict with a real PHP file.
; Default Value: not set
;ping.path = /ping

; This directive may be used to customize the response of a ping request. The
; response is formatted as text/plain with a 200 response code.
; Default Value: pong
;ping.response = pong

; The access log file
; Default: not set
;access.log = log/$pool.access.log

; The access log format.
; The following syntax is allowed
;  %%: the '%' character
;  %C: %CPU used by the request
;      it can accept the following format:
;      - %{user}C for user CPU only
;      - %{system}C for system CPU only
;      - %{total}C  for user + system CPU (default)
;  %d: time taken to serve the request
;      it can accept the following format:
;      - %{seconds}d (default)
;      - %{miliseconds}d
;      - %{mili}d
;      - %{microseconds}d
;      - %{micro}d
;  %e: an environment variable (same as $_ENV or $_SERVER)
;      it must be associated with embraces to specify the name of the env
;      variable. Some examples:
;      - server specifics like: %{REQUEST_METHOD}e or %{SERVER_PROTOCOL}e
;      - HTTP headers like: %{HTTP_HOST}e or %{HTTP_USER_AGENT}e
;  %f: script filename
;  %l: content-length of the request (for POST request only)
;  %m: request method
;  %M: peak of memory allocated by PHP
;      it can accept the following format:
;      - %{bytes}M (default)
;      - %{kilobytes}M
;      - %{kilo}M
;      - %{megabytes}M
;      - %{mega}M
;  %n: pool name
;  %o: output header
;      it must be associated with embraces to specify the name of the header:
;      - %{Content-Type}o
;      - %{X-Powered-By}o
;      - %{Transfert-Encoding}o
;      - ....
;  %p: PID of the child that serviced the request
;  %P: PID of the parent of the child that serviced the request
;  %q: the query string
;  %Q: the '?' character if query string exists
;  %r: the request URI (without the query string, see %q and %Q)
;  %R: remote IP address
;  %s: status (response code)
;  %t: server time the request was received
;      it can accept a strftime(3) format:
;      %d/%b/%Y:%H:%M:%S %z (default)
;  %T: time the log has been written (the request has finished)
;      it can accept a strftime(3) format:
;      %d/%b/%Y:%H:%M:%S %z (default)
;  %u: remote user
;
; Default: "%R - %u %t \"%m %r\" %s"
;access.format = %R - %u %t "%m %r%Q%q" %s %f %{mili}d %{kilo}M %C%%

; The timeout for serving a single request after which the worker process will
; be killed. This option should be used when the 'max_execution_time' ini option
; does not stop script execution for some reason. A value of '0' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_terminate_timeout = 0

; The timeout for serving a single request after which a PHP backtrace will be
; dumped to the 'slowlog' file. A value of '0s' means 'off'.
; Available units: s(econds)(default), m(inutes), h(ours), or d(ays)
; Default Value: 0
;request_slowlog_timeout = 0

; The log file for slow requests
; Default Value: not set
; Note: slowlog is mandatory if request_slowlog_timeout is set
;slowlog = log/$pool.log.slow

; Set open file descriptor rlimit.
; Default Value: system defined value
;rlimit_files = 1024

; Set max core size rlimit.
; Possible Values: 'unlimited' or an integer greater or equal to 0
; Default Value: system defined value
;rlimit_core = 0

; Chroot to this directory at the start. This value must be defined as an
; absolute path. When this value is not set, chroot is not used.
; Note: you can prefix with '$prefix' to chroot to the pool prefix or one
; of its subdirectories. If the pool prefix is not set, the global prefix
; will be used instead.
; Note: chrooting is a great security feature and should be used whenever
;       possible. However, all PHP paths will be relative to the chroot
;       (error_log, sessions.save_path, ...).
; Default Value: not set
;chroot =

; Chdir to this directory at the start.
; Note: relative path can be used.
; Default Value: current directory or / when chroot
;chdir = /var/www

; Redirect worker stdout and stderr into main error log. If not set, stdout and
; stderr will be redirected to /dev/null according to FastCGI specs.
; Note: on a highly loaded environment, this can cause some delay in the page
; process time (several ms).
; Default Value: no
catch_workers_output = yes

; Pass environment variables like LD_LIBRARY_PATH. All $VARIABLEs are taken from
; the current environment.
; Default Value: clean env
;env[HOSTNAME] = $HOSTNAME
;env[PATH] = /usr/local/bin:/usr/bin:/bin
;env[TMP] = /tmp
;env[TMPDIR] = /tmp
;env[TEMP] = /tmp

; Additional php.ini defines, specific to this pool of workers. These settings
; overwrite the values previously defined in the php.ini. The directives are the
; same as the PHP SAPI:
;   php_value/php_flag             - you can set classic ini defines which can
;                                    be overwritten from PHP call 'ini_set'.
;   php_admin_value/php_admin_flag - these directives won't be overwritten by
;                                     PHP call 'ini_set'
; For php_*flag, valid values are on, off, 1, 0, true, false, yes or no.

; Defining 'extension' will load the corresponding shared extension from
; extension_dir. Defining 'disable_functions' or 'disable_classes' will not
; overwrite previously defined php.ini values, but will append the new value
; instead.

; Note: path INI options can be relative and will be expanded with the prefix
; (pool, global or /usr)

; Default Value: nothing is defined by default except the values in php.ini and
;                specified at startup with the -d argument
;php_admin_value[sendmail_path] = /usr/sbin/sendmail -t -i -f [email protected]
;php_flag[display_errors] = off
;php_admin_value[error_log] = /var/log/fpm-php.www.log
;php_admin_flag[log_errors] = on
;php_admin_value[memory_limit] = 32M

Oct 29 2012

But this time, it was different. Modules were not to blame.

While inspecting a site that had several performance problems for a client, we noticed that memory usage was very high. From the "top" command, the RES (resident set) field was 159 MB, far more than it should be.

We narrowed down the problem to a view that is in a block that is visible on most pages of the site.

But the puzzling part is that the view was configured to return only 5 rows. It did not make sense for it to use that much memory.

However, when we traced the query, it was like so:

SELECT node.nid, ....
FROM node
INNER JOIN ...
ORDER BY ... DESC

No LIMIT clause was in the query!

When executing the query manually, we found that it returned 35,254 rows, with 5 columns each!

Using the script at the end of this article, we were able to measure memory usage at different steps. We inserted a views embed in the script and measured memory usage:

Before boot               0.63 MB
Boot time               445.1 ms
After Boot               51.74 MB
Boot Peak                51.80 MB
Module count            136
After query              66.37 MB
Query Peak               67.37 MB
After fetch              78.96 MB
Fetch Peak              148.69 MB

So, indeed, that view was the cause of the inflated memory usage! With memory jumping from 67 MB to 148 MB.

In this case, it turns out that the module "views_php" was definitely the culprit. Once it was disabled, the query no longer had that huge memory footprint.

Here are the results after disabling views_php:

Before boot               0.63MB
Boot time               427.1 ms
After Boot               51.68MB
Boot Peak                51.74MB
Module count            135
After query              66.31MB
Query Peak               67.31MB
After fetch              74.71MB
Fetch Peak               74.89MB

A more reasonable 75MB.

We did not dig further, but it could be that because a field of type "Global: PHP" was used, views wanted to return the entire data set and then apply the PHP to it, rather than add a LIMIT to the query before executing it.
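
If you cannot simply disable the offending module, one possible workaround (a sketch only, with a placeholder module and view name) is to force a limit onto the query from a small custom module:

<?php
/**
 * Implements hook_views_query_alter().
 */
function mymodule_views_query_alter(&$view, &$query) {
  // Cap the runaway view at the number of rows it actually displays.
  if ($view->name == 'your_view') {
    $query->set_limit(5);
  }
}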

So, watch out for those blocks that are shown on many web pages.

Baseline Memory Usage

As a general comparative reference, here are some baseline figures. These are worst-case scenarios, and assume APC is off, or that the measurement is running from the command line, where APC is disabled or non-persistent. The figures would be lower under Apache with APC enabled.

These figures will vary from site to site, and they depend on many factors. For example, what modules are enabled in Apache, what modules are enabled in PHP, ...etc.

Drupal 6 with 73 modules

Before boot:     0.63 MB
After boot:     22.52 MB
Peak memory:    22.52 MB

Drupal 7 site, with 105 modules

Before boot:     0.63 MB
After boot:     57.03 MB
Peak memory:    58.39 MB

Drupal 7 site, with 134 modules

Before boot:     0.63 MB
After boot:     58.79 MB
Peak memory:    60.28 MB

Drupal 6 site, with 381 modules

Before boot:     0.63 MB
After boot:     66.02 MB

Drupal 7 site, pristine default install, 29 modules

Now compare all the above to a pristine Drupal 7 install, which has 29 core modules installed.

Before boot     0.63 MB
Boot time     227.40 ms
After Boot     20.03 MB
Boot Peak      20.07 MB
Module count   29

Effect of APC on boot memory footprint

To see how much APC, and other opcode caches, improves these figures, compare the following:

First access after Apache restarted, for a Drupal 6 site:

Before boot     0.63 MB
Boot time     802.5 ms
After Boot     62.85 MB
Boot Peak      63.11 MB
Module count   210

Subsequent accesses, with APC caching the code:

Before boot     0.61 MB
Boot time     163.80 ms
After Boot     17.24 MB
Boot Peak      18.41 MB
Module count   210

Also, for a default Drupal 7 install, with 29 core modules. Compare to the above figures for the same site without APC.

Before boot     0.61 MB
Boot time      60.4 ms
After Boot      3.36 MB
Boot Peak       3.41 MB
Module count   29

A marked improvement! Not only in bootup time, but also a reduced memory footprint.

So always install APC, and configure it correctly on your site.
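
As a starting point, an APC configuration along these lines (the shared memory size is only an example and should be sized to fit your codebase) can go in your apc.ini or php.ini:

; Enable the opcode cache
apc.enabled=1
; Shared memory for cached opcodes; older APC versions expect a plain number of megabytes, e.g. 96
apc.shm_size=96M
; On production you can set apc.stat=0 for extra speed, but then you must clear the cache on every deployment
apc.stat=1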

Memory Measurement Script

Here is a script to measure your memory usage. You can run it from the command line as:

$ cd /your/document_root
$ php mem.php

Add whatever part you think is causing memory usage to skyrocket in place of the commented-out section, and you can see how much is being used.

You can also add HTML line breaks to the print statement, and run it from a browser to see the effect of APC code caching as well.

<?php

define('DRUPAL_ROOT', getcwd());

function measure() {
  list($usec, $sec) = explode(" ", microtime());
  $secs = (float)$usec + (float)$sec;
  return $secs * 1000;
}

function fmt_mem($number) {
  return number_format($number/1024/1024, 2) . "MB";
}

$results = array();
$results['Before boot'] =  fmt_mem(memory_get_usage());

$start = measure();
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$results['Boot time'] = number_format((measure() - $start), 1) . " ms";
  
$results['After Boot'] = fmt_mem(memory_get_usage());
$results['Boot Peak '] = fmt_mem(memory_get_peak_usage());
  
$count = count(module_list());
$results['Module count'] = $count;

// Testing a view using the standard method
/*
$view = views_embed_view('your_view', 'block');
*/

// Testing a view in two steps
/*
$view = views_get_view('your_view');

$results['After query'] = fmt_mem(memory_get_usage());
$results['Query Peak '] = fmt_mem(memory_get_peak_usage());

$output = $view->preview('block');
*/

$results['After fetch'] = fmt_mem(memory_get_usage());
$results['Fetch Peak '] = fmt_mem(memory_get_peak_usage());

foreach($results as $k => $v) {
  print $k . "\t\t" . $v . "\n";
}
Jul 09 2012

Since having moved to Mac from Linux around 1 year ago I have been using MAMP to develop with Drupal. I actually bought the PRO version and my favorite MAMP features are:

  • Easy creation of Virtual Hosts
  • Easy switching between PHP 5.2 and 5.3

That said, there are some major annoyances with MAMP:

  • PHP versions hardly ever get updated (even with paid PRO version), latest available is 5.3.6
  • PHPMyAdmin version included is very outdated and missing out on some nice features
  • MySQL version is stuck at 5.1
  • PHP requires a workaround to work in the terminal
  • Built-in PEAR doesn't play nice

So, after reading about VirtualDocumentRoot in this article I decided to take the plunge and try replacing my MAMP setup with the built-in Apache and PHP and using Homebrew to complement whatever was missing. Since I hardly use Drupal 6 anymore, I figured I can live without PHP 5.2.

What started out as a "I wonder if I can get this to work" 10-minute hobby thing turned into a full-day task, and after finally getting everything to work I'm really satisfied with the results. Since I ran into a bunch of pitfalls along the way, I thought I'd document my findings for future reference and also share them with others thinking about doing the same thing.

Despite being fully aware that we can actually and effortlessly build a full LAMP stack using Homebrew, I figured since OSX ships with Apache and PHP let's use them!

Step 1 - Backup

This one is a no-brainer. Backup all your stuff, especially MySQL databases as you will have to import them manually later on.

Step 2 - Enable built-in Apache

Open MAMP and stop all running servers but don't uninstall MAMP just yet! Ok, now go to the "Tools" menu and click on "Enable built-in Apache server". Next, go to "System Preferences -> Sharing" and enable the "Web Sharing" checkbox.

Update for Mountain Lion: The "Web Sharing" option doesn't exist anymore! In order to get things working check out the excellent instructions in this article.

If you're having issues with the checkbox not staying on, don't worry, it's a simple fix: replace your /etc/apache2/httpd.conf with the default version:

# Backup your current httpd.conf (or remove it altogether)
sudo mv /etc/apache2/httpd.conf /etc/apache2/httpd.conf.bak
 
# Copy over the default settings
sudo cp /etc/apache2/httpd.conf.default /etc/apache2/httpd.conf

Step 3 - Enable built-in PHP

This one is easy, edit your new httpd.conf file and uncomment the following line:

#LoadModule php5_module        libexec/apache2/libphp5.so

Step 4 - Install MySQL

Just download MySQL and install it! You can then go to "System Preferences -> MySQL" to start the server and set it up to start up automatically.

Now, all we need to do is add the mysql directory to our PATH and we're good. You can set this up either in your ~/.bashrc or ~/.profile: (know that if you have a .bashrc then your .profile won't get used, so pick one or the other)

export PATH="/usr/local/mysql/bin:$PATH"

Step 5 - Install PHPMyAdmin

This is an optional but highly-recommended step. I'm not going to bother rewriting this as there's a great article here that gives precise and up-to-date instructions. After following the instructions you should be able to access PHPMyAdmin at http://localhost/~username/phpmyadmin.

In order to persist your settings in PHPMyAdmin, you will need to do the following:

  • Create a "pma" user and give it a password "pmapass" (or whatever)
  • Import the "create_tables.sql" file found in your PHPMyAdmin's example folder
  • Grant full access to the "pma" user for the "phpmyadmin" database
  • Now edit your config.inc.php and add the following:
// Setup user and pass.
$cfg['Servers'][$i]['controluser'] = 'pma';
$cfg['Servers'][$i]['controlpass'] = 'pmapass'; // Replace this with your password.
 
// Setup database.
$cfg['Servers'][$i]['pmadb'] = 'phpmyadmin';
 
// Setup tables.
$cfg['Servers'][$i]['bookmarktable'] = 'pma_bookmark';
$cfg['Servers'][$i]['relation'] = 'pma_relation';
$cfg['Servers'][$i]['table_info'] = 'pma_table_info';
$cfg['Servers'][$i]['table_coords'] = 'pma_table_coords';
$cfg['Servers'][$i]['pdf_pages'] = 'pma_pdf_pages';
$cfg['Servers'][$i]['column_info'] = 'pma_column_info';
$cfg['Servers'][$i]['history'] = 'pma_history';
$cfg['Servers'][$i]['designer_coords'] = 'pma_designer_coords';
$cfg['Servers'][$i]['tracking'] = 'pma_tracking';
$cfg['Servers'][$i]['userconfig'] = 'pma_userconfig';
$cfg['Servers'][$i]['hide_db'] = 'information_schema';
$cfg['Servers'][$i]['recent'] = 'pma_recent';
$cfg['Servers'][$i]['table_uiprefs'] = 'pma_table_uiprefs';
 
// For good measure, make it shut up about Mcrypt.
$cfg['McryptDisableWarning'] = true;

At this point everything should be ok, but I couldn't get it to work. I found that logging into PHPMyAdmin as "pma" and then logging out made everything work. I won't bother to try and understand this, just happy it worked.

Step 6 - Install Homebrew

Ok, now things start to get fun! All you need to do to install Homebrew can be found in the very nice installation instructions.

Once Homebrew is installed, run brew doctor from your terminal and fix any errors that appear, it's pretty straightforward. If you are getting a weird error about "Cowardly refusing to continue at this point /", don't worry. I had to update my XCode version and install the Command Line Tools from within the XCode preferences to make brew work without complaining.

Step 7 - Configure VirtualDocumentRoot

Once again, there's a great article on this already so I won't bother rewriting this step.

Are you still with me? Don't forget to add Google DNS or Open DNS or something else in your network preferences after 127.0.0.1 or you won't be able to access anything outside your local environment.

At this point you should probably test out accessing http://localhost/~username in your browser and make sure everything works. Also create a "test.dev" folder in your Sites directory and make sure you can access it via http://test.dev.
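
The gist of what ends up in httpd.conf is something like the following (the username and the .dev suffix are whatever you chose, and mod_vhost_alias must be enabled):

<VirtualHost *:80>
  ServerAlias *.dev
  UseCanonicalName Off
  VirtualDocumentRoot /Users/alex/Sites/%0
</VirtualHost>

Here %0 is the full hostname, so test.dev is served from /Users/alex/Sites/test.dev.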

If, like myself, you received some access denied errors, you might have to complement the Directory setting added in your httpd.conf:

# Obviously replace "alex" with your username
<Directory "/Users/alex/Sites">
  Options FollowSymLinks Indexes
  AllowOverride All
  Order deny,allow
  RewriteEngine on
  RewriteBase /
</Directory>

Also, be warned, if you start getting errors navigating to any path except the frontpage in a Drupal site, you might need to edit your .htaccess file and uncomment the RewriteBase / line.

Step 8 - Install PEAR

This is an optional, but highly recommended step. First, download the installer:

# If you have wget installed (hint: use Homebrew)
wget http://pear.php.net/go-pear.phar
 
# If not
curl -O http://pear.php.net/go-pear.phar

Now, just run the phar archive:

sudo php -d detect_unicode=0 go-pear.phar

For whatever reason PEAR wants to install itself in your home folder by default, I suggest moving it to "/usr/local/pear".

After installing, make sure that the PEAR include path has been added to the end of your /etc/php.ini:

;***** Added by go-pear
include_path=".:/usr/local/pear/share/pear"
;*****

(If you don't have an /etc/php.ini file, copy it over from /etc/php.ini.default)

You will also have to add the PEAR path to your PATH in your .bashrc or .profile (see step 4).

export PATH="/usr/local/pear/bin:$PATH"

Step 9 - Install Drush

With PEAR available, installing Drush is a breeze. I personally prefer to install Drush using PEAR because it's easier to manage and update:

sudo pear channel-discover pear.drush.org
sudo pear install drush/drush

If you skipped Step 8 and don't have PEAR installed, don't worry, there's plenty of information in the README to get Drush up and running.

Step 10 - Install XDebug

Ok, we are almost there! The last thing we really need to get a fully-functional development environment is XDebug. Unfortunately, despite many articles mentioning installation via Homebrew, I couldn't find the formula, so let's compile and install it ourselves.

Updated: There's an XDebug brew formula in @josegonzalez's homebrew-php. Just follow the steps in Step 11 but replace "xhprof" with "xdebug".

Once again, there's a great article for this. If you get an "autoconf" error you might need to install it using Homebrew:

brew install autoconf

Now phpize should work. Don't worry if it gives you a warning, just follow the steps in the article and it'll work anyway!

Now we should make a couple tweaks to our php.ini to take advantage of XDebug:

# General PHP error settings.
error_reporting = E_ALL | E_STRICT # This is nice for development.
display_errors = On # Make sure we can see errors.
log_errors = On # Enable the error log.
html_errors = On # This is crucial for XDebug.
 
# XDebug specific settings.
[xdebug]
zend_extension="/usr/lib/php/extensions/no-debug-non-zts-20090626/xdebug.so" # Enable the XDebug extension.
# The following are sensible defaults, alter as needed.
xdebug.remote_enable=1
xdebug.remote_handler=dbgp
xdebug.remote_mode=req
xdebug.remote_host=127.0.0.1
xdebug.remote_port=9000

Afterwards just restart Apache and run php -v or open a phpinfo() script and you will see that XDebug appears after the "Zend Engine" version.

Step 11 - Install XHProf

This step is optional and I had forgotten about it until Nick reminded me, so props to him! :)

XHProf is a PHP profiler which integrates nicely with the Devel Drupal module and here's how to install it using Homebrew:

# Tap into the php formulae by @josegonzalez.
brew tap josegonzalez/php
brew install php53-xhprof
# This will automatically install the PCRE dependency.

Now all we need to do is add it to our php.ini and we're golden:

[xhprof]
extension="/usr/local/Cellar/php53-xhprof/0.9.2/xhprof.so" # Make sure this is the path that Homebrew prints when it's done installing XHProf.
; This is the directory that XHProf stores its profile runs in.
xhprof.output_dir=/tmp

Now restart Apache and look for "xhprof" in your phpinfo() to confirm all went well.

Bonus Steps - APC, Intl and UploadProgress

If you added josegonzalez's excellent homebrew-php (step 11), you can easily install some extra useful extensions:

# Recommended for Drupal
brew install php53-uploadprogress
 
# Recommended for Symfony2
brew install php53-intl
brew install php53-apc
 
# Don't forget to add them all to your php.ini, as per brew's post-install notes!

Step 12 - Cleanup

Ok, we're done! Now go to PHPMyAdmin and re-create your databases and users and also import the backups you made from MAMP's version of PHPMyAdmin. Assuming you followed all the steps successfully, it is now safe to completely uninstall MAMP.

Good for you! :) Now you can go look over the shoulder of a coworker that uses MAMP, smirk and shake your head in disapproval.

Feb 01 2012

In this video we set up a LAMP stack web server for local development on an Ubuntu desktop (version 11.10). We will walk through installing and using tasksel to get the web server installed, then we'll add phpmyadmin to the mix. Once we have the server up and running, we'll look at where the web root is and how to add websites. Finally we wrap up with a quick look at how to start, stop, and restart our server when needed.
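
For reference, the equivalent commands on Ubuntu 11.10 are roughly the following:

sudo apt-get update
sudo apt-get install tasksel
sudo tasksel install lamp-server
sudo apt-get install phpmyadmin

# Start, stop, or restart the web server when needed
sudo service apache2 stop
sudo service apache2 start
sudo service apache2 restart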

Jan 24 2012

This video shows you how to download and do the initial setup of MAMP, which is Apache, MySQL and PHP for Macintosh. It shows some basic configuration tweaks to change the port from 8888 to the default of 80 so that you can just visit the localhost in the browser and get your Drupal installation to appear. It also provides a general orientation to MAMP, and some other initial configuration setting changes.

Jan 24 2012

This video walks through the process of downloading and doing the initial configuration of WampServer, which is Apache, MySQL and PHP for Windows. It shows how to turn it on and off, as well as how to get files to show up from the localhost server.

Dec 09 2011

Drupal can power any site from the lowliest blog to the highest-traffic corporate dot-com. Come learn about the high-end of the spectrum with this comparison of techniques for scaling your site to hundreds of thousands or millions of page views an hour. This Do it with Drupal session with Nate Haug will cover software that you need to make Drupal run at its best, as well as software that acts as a front-end cache (a.k.a Reverse-Proxy Cache) that you can put in-front of your site to offload the majority of the processing work. This talk will cover the following software and architectural concepts:

  • Configuring Apache and PHP
  • MySQL Configuration (with Master/Slave setups)
  • Using Memcache to reduce database load and speed up the site
  • Using Varnish to serve up anonymous content lightning fast
  • Hardware overview for high-availability setups
  • Considering nginx (instead of Apache) for high amounts of authenticated traffic
Oct 14 2011
Tom
Oct 14

Recently I posted an Apache configuration file that allows you to detect mobile devices and pass a query parameter to the back-end that informs your web application what type of device is accessing your website, without the need for device detection in your back-end (e.g. Drupal).

The main goals for that script were:

  • 1 URL for both the mobile and desktop site
  • Allow for caching within a web application framework or, more specifically, allow Boost to be used.
  • Pass a "device" query parameter to the back-end framework to identify which device is accessing the site

I recently also made a second .htaccess with a slightly different goal:

  • Separate URLs for different device groups
  • Allow for caching within a web application framework or, more specifically, allow Boost to be used.

This is the result:

 ### BOOST + Mobile Customization ###
 
# Put this in your .htaccess file (if you use Boost, put it before the Boost directives.)
 
  # By default we assume that the accessing device is a desktop browser
  RewriteRule .* - [E=device:www]
 
  # If the user agent matches our definition of a high end (mh) or low end (ml) device,
  # we overwrite the %{VAR:device} variable.
  # These user agent expressions should be adjusted according to your requirements.
  RewriteCond %{HTTP_USER_AGENT} ^.*(iphone|android).*$ [NC]
  RewriteRule .* - [E=device:mh]
  RewriteCond %{HTTP_USER_AGENT} ^.*(nokia|BlackBerry).*$ [NC]
  RewriteRule .* - [E=device:ml]
  # above rules can be extended with multiple device categories
 
  # We allow users to override the device detection. 
  # (See next rules where a cookie will be set). 
  # Here we detect if a cookie is present and we use the cookie value.
  # If there is a cookie set, overwrite the %{VAR:device} variable with that value
  RewriteCond %{HTTP_COOKIE} ^.*device=(mh|ml|www)$
  RewriteRule .* - [E=device:%1]
 
  # We can provide device detection overwrite links by creating url's
  # that contain the device query parameter. 
  # If this is present we set the device parameter and set a cookie.
 RewriteCond %{QUERY_STRING} ^.*device=(mh|ml|www).*$
  RewriteRule .* - [E=device:%1,CO=device:%1:.boost.local:1000:/]
 
  # Do the necessary checks if the subdomain equals the device.
  # If so skip the Redirect rule, otherwise we have a redirection loop
  RewriteCond %{ENV:device} ^mh$
  RewriteCond %{HTTP_HOST}  ^mh\.boost\.local
  RewriteRule .* - [S=3]
 
  RewriteCond %{ENV:device} ^ml$
  RewriteCond %{HTTP_HOST}  ^ml\.boost\.local
  RewriteRule .* - [S=2]
 
  RewriteCond %{ENV:device} ^www$
  RewriteCond %{HTTP_HOST}  ^www\.boost\.local
  RewriteRule .* - [S=1]
 
  # Finally we redirect if needed to the right subdomain
  RewriteCond %{ENV:device} ^.*(mh|ml|www).*$
  RewriteRule ^(.*)$ http://%1.domain.com/$1 [L,R]

Feel free to experiment with this!
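One way to experiment is to fake the User-Agent with curl and watch where the rules send you. The hostnames below are the .boost.local / domain.com placeholders from the snippet; substitute your own:

# A high end device should be redirected to the mh. subdomain
curl -s -I -A "iphone" http://www.boost.local/ | grep -i location

# A low end device should be redirected to the ml. subdomain
curl -s -I -A "nokia" http://www.boost.local/ | grep -i location

# Forcing the device group via the query parameter should also set the device cookie
curl -s -I "http://www.boost.local/?device=www" | grep -i -e location -e set-cookie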


Sep 10 2011
Tom
Sep 10

From time to time I get questions about caching strategies and mobile devices. Especially when you want to create different responses in the back end based on the device AND you have strong caching strategies, things get very tricky.

An example scenario in Drupal (but can relate to any CMS) is where you boost performance by storing the generated pages as static files (see Boost).

These solutions rely heavily on Apache in a way that Apache delivers the static files in case these files exist. Otherwise it passes the request to the CMS.

In order to provide a way to serve different responses to mobile devices, I created the following Apache script:

  # Detect if a "device" query parameter is set in the request. 
  # If so, the value for that attribute forces the requesting device type.
  # If the parameter is available, set a cookie to remember for which device pages have to be generated in subsequent requests.
  # Assume three categories:
  #  - mobile-high (high end mobile devices)
  #  - mobile-low (low end mobile devices)
  #  - desktop (desktop browser)
  RewriteCond %{QUERY_STRING} ^.*device=(mobile-high|mobile-low|desktop).*$
  RewriteRule .* - [CO=device:%1:.yourdomain.tld:1000:/,E=device:%1,S=2]
  # now skip 2 RewriteRules till after the user agent detection
 
  # If there is no "device" attribute, search for a cookie.
  # If the cookie is available add ?device=<device-type> to the request
  RewriteCond %{HTTP_COOKIE} ^.*device=(.*)$
  RewriteRule (.*) $1?device=%1  [E=device:%1,S=2,QSA]
  #now skip till after the user agent detection
 
  # If no cookie or device attribute are present, check the user agent by simple user agent string matching (android and iphone only here)
  RewriteCond %{HTTP_USER_AGENT} ^.*(iphone|android).*$ [NC]
  RewriteRule (.*) $1?device=mobile-high [E=device:mobile-high,QSA,S=1]
 
  # Detect the user agent for lower end phones (nokia, older blackberries, ...)
  RewriteCond %{HTTP_USER_AGENT} ^.*(nokia|BlackBerry).*$ [NC]
  RewriteRule (.*) $1?device=mobile-low [E=device:mobile-low,QSA]

This script makes sure that every request has an extra "device" query parameter that defines the requesting device (desktop/mobile-high/mobile-low). This information can be used by the back end or any caching mechanism!

I also added a %{ENV:device} variable so that within the Apache script more logic can be placed based on the device.
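A quick way to see the cookie part of this in action from the command line (yourdomain.tld is the placeholder from the script; the internal ?device rewrite itself is only visible to the back end or in the rewrite log):

# Forcing a device type via the query parameter should answer with a Set-Cookie: device=... header
curl -s -I "http://www.yourdomain.tld/?device=mobile-high" | grep -i set-cookie

# On subsequent requests the cookie drives the rewrite, no query parameter needed
curl -s -I -H "Cookie: device=mobile-high" http://www.yourdomain.tld/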

Applying to the Boost module

The above snippet works together with Boost by replacing %{server_name} by %{ENV:device}/%{server_name} in the Boost apache configuration.

 ### BOOST START ###
  AddDefaultCharset utf-8
  <FilesMatch "(\.html|\.html\.gz)$">
    <IfModule mod_headers.c>
      Header set Expires "Sun, 19 Nov 1978 05:00:00 GMT"
      Header set Cache-Control "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
    </IfModule>
  </FilesMatch>
  <IfModule mod_mime.c>
    AddCharset utf-8 .html
    AddCharset utf-8 .css
    AddCharset utf-8 .js
    AddEncoding gzip .gz
  </IfModule>
  <FilesMatch "(\.html|\.html\.gz)$">
    ForceType text/html
  </FilesMatch>
  <FilesMatch "(\.js|\.js\.gz)$">
    ForceType text/javascript
  </FilesMatch>
  <FilesMatch "(\.css|\.css\.gz)$">
    ForceType text/css
  </FilesMatch>
 
  # Gzip Cookie Test
  RewriteRule boost-gzip-cookie-test\.html  cache/perm/boost-gzip-cookie-test\.html\.gz [L,T=text/html]
 
  # GZIP - Cached css & js files
  RewriteCond %{HTTP_COOKIE} !(boost-gzip)
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=2]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.css\.gz -s
  RewriteRule .* cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.css\.gz [L,QSA,T=text/css]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.js\.gz -s
  RewriteRule .* cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.js\.gz [L,QSA,T=text/javascript]
 
  # NORMAL - Cached css & js files
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.css -s
  RewriteRule .* cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.css [L,QSA,T=text/css]
  RewriteCond %{DOCUMENT_ROOT}/cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.js -s
  RewriteRule .* cache/perm/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_\.js [L,QSA,T=text/javascript]
 
  # Caching for anonymous users
  # Skip boost IF not get request OR uri has wrong dir OR cookie is set OR request came from this server OR https request
  RewriteCond %{REQUEST_METHOD} !^(GET|HEAD)$ [OR]
  RewriteCond %{REQUEST_URI} (^/(admin|cache|misc|modules|sites|system|openid|themes|node/add))|(/(comment/reply|edit|user|user/(login|password|register))$) [OR]
  RewriteCond %{HTTP_COOKIE} DRUPAL_UID [OR]
  RewriteCond %{HTTP:Pragma} no-cache [OR]
  RewriteCond %{HTTP:Cache-Control} no-cache [OR]
  RewriteCond %{HTTPS} on
  RewriteRule .* - [S=3]
 
  # GZIP
  RewriteCond %{HTTP_COOKIE} !(boost-gzip)
  RewriteCond %{HTTP:Accept-encoding} !gzip
  RewriteRule .* - [S=1]
  RewriteCond %{DOCUMENT_ROOT}/cache/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz -s
  RewriteRule .* cache/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html\.gz [L,T=text/html]
 
  # NORMAL
  RewriteCond %{DOCUMENT_ROOT}/cache/normal/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html -s
  RewriteRule .* cache/normal/%{ENV:device}/%{SERVER_NAME}%{REQUEST_URI}_%{QUERY_STRING}\.html [L,T=text/html]
 
  ### BOOST END ###

In your settings.php you should do the following:

/**
 * Theme array for the mobile categories. The key equals the device category
 * supported by the site.
 * Changing this array configures the mobile theme for a certain device group
 */
$mobile_themes = array('mobile-low' => 'bluemarine', 'mobile-high' => 'pushbutton');
/**
 * Check if the device argument is one of the valid devices (this is just an extra check)
 */
if(isset($_GET['device']) && array_key_exists($_GET['device'], $mobile_themes)){
        $boost_device = $_GET['device'];
        $conf['theme_default'] = $mobile_themes[$_GET['device']];  // set the theme
}
else {
        $boost_device = 'desktop'; 
}
 
 
 
// We configure the Boost folders to support multiple devices. 
// This corresponds to the %{ENV:device}/%{SERVER_NAME} string in the apache script
$conf['boost_normal_dir'] = 'normal/' . $boost_device ;
$conf['boost_gzip_dir'] = 'normal/' . $boost_device;
$conf['boost_perm_normal_dir'] = 'perm/' . $boost_device;
$conf['boost_perm_gz_dir'] = 'perm/' . $boost_device;
 
/**
 * End configuration for boost 
 */

Hope this gives some inspiration!


Jan 11 2011
Jan 11

Recently I started to notice a very high number of LRU Nuked Objects on my websites, which was essentially wiping the entire Varnish Cache. I run Varnish with a 4GB file cache and site content is mostly served by an external "Poor Man CDN". So, in theory, my site content should not be anything near 4GB; however, Varnish Cache was running out of memory and "Nuking" cached objects.

Sizing your cache

Here is what Varnish Cache man pages have to say:

Watch the n_lru_nuked counter with varnishstat or some other tool. If you have a lot of LRU activity then your cache is evicting objects due to space constraints and you should consider increasing the size of the cache.

Either I can increase the Varnish Cache size or be a little smarter in handling the resource.
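Checking the counter itself is a one-liner, for example:

# One-shot dump of the counters; a growing n_lru_nuked means the cache is evicting due to lack of space
varnishstat -1 | grep n_lru_nuked

# Or keep an eye on nuking together with the hit/miss counters
varnishstat -1 | egrep 'n_lru_nuked|cache_hit|cache_miss'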

I am using Drupal's Compress Page feature along with the Aggregate JS and CSS features. However, Drupal will only compress the HTML output, which can result in quite large aggregated CSS and JS files. If Varnish is caching the uncompressed output, this will result in considerably more memory usage.

A much better and more effective solution is to let Apache's mod_deflate handle the compression instead of Drupal. Drupal compression is often a preferred choice because Drupal compresses once and then caches the result, compared to Apache compressing the files on every request. However, if you are using Varnish, which will handle the cache element, Apache would only have to do the hard work once. The added advantage is that you have granular control over which files are getting compressed.
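A minimal sketch of that change, assuming a Debian/Ubuntu style Apache layout (adjust paths, module loading, and MIME types to your setup), plus a quick check that responses really come back compressed:

# Enable mod_deflate (on other layouts, load the module in httpd.conf instead)
sudo a2enmod deflate

# Then, in the vhost or a conf file, compress only the text types you care about, e.g.:
#   AddOutputFilterByType DEFLATE text/html text/css application/javascript text/javascript
# ...and turn off Drupal's "Compress cached pages" setting so pages aren't compressed twice.
sudo /etc/init.d/apache2 restart

# Verify (replace example.com with your site): the response should carry "Content-Encoding: gzip"
curl -s -D - -o /dev/null -H "Accept-Encoding: gzip" http://example.com/ | grep -i content-encoding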

So, did that make a difference?

(Graphs: LRU nuked objects vs expired items, Varnish Cache memory usage, number of objects in the cache, cache hit rate, and bandwidth usage.)

Conclusion:

  • Varnish Cache now uses less memory
  • Handles more cached objects
  • No nuking cached items
  • Higher cache hit rate
  • Site is using less bandwidth to serve the same content.

Simplez ;)

Oct 15 2010
Oct 15

Apple OS X comes with Apache and PHP built in, but they need some tweaking to work. It also does not come with MySQL. Because of this, many developers have chosen to use MacPorts, Homebrew, or MAMP to install new binaries for Apache, PHP, and MySQL. However, doing this means your system would have multiple copies of Apache and PHP on your machine, and could create conflicts depending on how your built-in tools are configured. This tutorial will show you how to get the built-in versions of Apache and PHP running with an easy-to-install version of MySQL.

Additionally, it will show how to set up and install Drush, and phpMyAdmin. Setting up a new website is as easy as editing a single text file and adding a line to your /etc/hosts file. All of our changes will avoid editing default configuration files, either those provided by Apple in OS X or from downloaded packages, so that future updates will not break our customizations.

Apache

  1. Apple uses what appears to be the FreeBSD "version" of Apache 2.2, and includes various sample files. One of the sample files is a virtual hosts file that we will copy to our ~/Sites folder for easy editing. Launch the terminal and get started:

    $ cp /etc/apache2/extra/httpd-vhosts.conf ~/Sites

  2. In order to prevent future major OS updates from breaking our configuration, we'll be editing as few existing files as possible. Luckily, by default, OS X's httpd.conf file (the main configuration file for Apache) includes all *.conf files in another folder, /etc/apache2/other:

    $ tail -1 /etc/apache2/httpd.conf
    Include /private/etc/apache2/other/*.conf

  3. We then create a symbolic link in /etc/apache2/other to our newly-copied httpd-vhosts.conf from our ~/Sites folder:

    $ sudo ln -s ~/Sites/httpd-vhosts.conf /etc/apache2/other

  4. The ~/Sites/httpd-vhosts.conf is now where we will add all new virtual hosts. We will place our site root folders in ~/Sites and define them in this file. But first, we need to make some changes (or, download this edited httpd-vhosts.conf file and change fname in /Users/fname to the appropriate path for your system). Begin by adding this text to the top of the file, which will enable the ~/Sites folder and subfolders to be accessed by Apache (again, replace /Users/fname with your appropriate path for home directory):

    # Ensure all of the VirtualHost directories are listed below
    <Directory "/Users/fname/Sites/">
        Options Indexes FollowSymLinks MultiViews
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>

  5. Add these lines anywhere in the file to ensure that http://localhost still serves the default folder:

    # For localhost in the OS X default location
    <VirtualHost *:80>
        ServerName localhost
        DocumentRoot /Library/WebServer/Documents
    </VirtualHost>

  6. We will cover setting up a Virtual Host later, so let's reload our Apache settings to activate our new configuration:

    $ sudo apachectl graceful

    Or, go to the Sharing Preference Pane and uncheck/[re]check the Web Sharing item to stop and start Apache.

PHP

  1. As of OS X 10.6.6, OS X ships with PHP 5.3.3, but it is not enabled in Apache by default. By looking at /etc/apache2/httpd.conf, you can see that the line to load in the PHP module is commented out:

    $ grep php /etc/apache2/httpd.conf
    #LoadModule php5_module libexec/apache2/libphp5.so

  2. Since we are trying to avoid editing default configuration files where possible, we can add this information, along with the configuration to allow *.php files to run, without the comment at the beginning of the line, in the same folder that we put the symlink for httpd-vhosts.conf:

    $ sudo sh -c "cat > /etc/apache2/other/php5-loadmodule.conf <<'EOF'
    LoadModule php5_module libexec/apache2/libphp5.so
    EOF"

  3. PHP on OS X does not come with a configuration file at php.ini and PHP runs on its defaults. There is, however, a sample file ready to be copied and edited. The lines below will copy the sample file to /etc/php.ini (optionally, download the edited file and insert at /etc/php.ini) and add some developer-friendly changes to your configuration.

    $ sudo cp -a /etc/php.ini.default /etc/php.ini

    $ sudo sh -c "cat >> /etc/php.ini <<'EOF'

    ;;
    ;; User customizations below
    ;;

    ; Original - memory_limit = 128M
    memory_limit = 196M
    ; Original - post_max_size = 8M
    post_max_size = 200M
    ; Original - upload_max_filesize = 2M
    upload_max_filesize = 100M
    ; Original - default_socket_timeout = 60
    default_socket_timeout = 600
    ; Original - max_execution_time = 30
    max_execution_time = 300
    ; Original - max_input_time = 60
    max_input_time = 600
    ; Original - log_errors = off
    log_errors = on
    ; Original - display_errors = Off
    display_errors = on
    ; Original - display_startup_errors = Off
    display_startup_errors = on
    ; Original - error_reporting = E_ALL & ~E_DEPRECATED
    error_reporting = E_ALL & ~E_DEPRECATED & ~E_NOTICE
    EOF"

  4. OS X's PHP does ship with pear and pecl, but they are not in the $PATH, nor are there binaries to add or symlink to your path. The first two commands will perform a channel-update for pear and again for pecl, the second two will add an alias for each for your current session, and the last two will add the aliases permanently to your local environment (you may get PHP Notices that appear to be errors but they are simply notices. You can change error_reporting in php.ini to silence the Notices):

    $ sudo php /usr/lib/php/pearcmd.php channel-update pear.php.net
    $ sudo php /usr/lib/php/peclcmd.php channel-update pecl.php.net
    $ alias pear="php /usr/lib/php/pearcmd.php"
    $ alias pecl="php /usr/lib/php/peclcmd.php"
    $ cat >> ~/.bashrc <<'EOF'

    alias pear="php /usr/lib/php/pearcmd.php"
    alias pecl="php /usr/lib/php/peclcmd.php"
    EOF

We do a lot of Drupal development, and a commonly-added PHP PECL module for Drupal is uploadprogress. With the new alias we just set, adding it is easy. Build the PECL package, add it to php.ini, and reload the PHP configuration into Apache:

    Note: PECL modules require a compiler such as gcc to build. Install Xcode to build PECL packages such as uploadprogress.

    $ pecl install uploadprogress


    $ sudo sh -c "cat >> /etc/php.ini <<'EOF'

    ; Enable PECL extension uploadprogress
    extension=uploadprogress.so
    EOF"


    $ sudo apachectl graceful

MySQL

MySQL is the only thing needed to run a "LAMP" website like Drupal or WordPress that doesn't ship with OS X. MySQL/Sun/Oracle provides pre-built MySQL binaries for OS X that don't conflict with other system packages, and even provides a nice System Preferences pane for starting/stopping the process and a checkbox to start MySQL at boot.

  1. Go to the download site for OS X pre-built binaries at: http://dev.mysql.com/downloads/mysql/index.html#macosx-dmg and choose the most appropriate type for your system. For systems running Snow Leopard, choose Mac OS X ver. 10.6 (x86, 64-bit), DMG Archive (on the following screen, you can scroll down for the download links to avoid having to create a user and password).

  2. Open the downloaded disk image and begin by installing the mysql-5.1.51-osx10.6-x86_64.pkg file (or named similarly to the version number you downloaded):

  3. Once the Installer script is completed, install the Preference Pane by double-clicking on MySQL.prefPane. If you are not sure where to install the preference pane, choose "Install for this user only":

  4. The preference pane is now installed and you can check the box to have MySQL start on boot or start and stop it on demand with this button. To access this screen again, load System Preferences. Click "Start MySQL Server" now:

  5. The MySQL installer installed its files to /usr/local/mysql, and the binaries are in /usr/local/mysql/bin which is not in the $PATH of any user by default. Rather than edit the $PATH shell variable, we add symbolic links in /usr/local/bin (a location already in $PATH) to a few MySQL binaries. Add more or omit as desired. This is an optional step, but will make tasks like using Drush or performing a mysqldump in the command line much easier:

    $ cd /usr/local/bin
    $ ln -s /usr/local/mysql/bin/mysql
    $ ln -s /usr/local/mysql/bin/mysqladmin
    $ ln -s /usr/local/mysql/bin/mysqldump
    $ ln -s /usr/local/mysql/support-files/mysql.server

  6. Next we'll run mysql_secure_installation to set the root user's password and other setup-related tasks. This script ships with MySQL and only needs to be run once, and it should be run with sudo. As you run it, you can accept all other defaults after you set the root user's password:

    $ sudo /usr/local/mysql/bin/mysql_secure_installation

  7. If you were to create a simple PHP file containing "phpinfo();" to view the PHP information, you would see that the Apple-built PHP looks for the MySQL socket at /var/mysql/mysql.sock:

    The problem is that the MySQL binary provided by Oracle puts the socket file in /tmp, which is the standard location on most systems. Other tutorials have recommended rebuilding PHP from source, or changing where PHP looks for the socket file via /etc/php.ini, or changing where MySQL places it when starting via /etc/my.cnf, but it is far easier and more fool-proof to create a symbolic link for /tmp/mysql.sock in the location OS X's PHP is looking for it. Since this keeps us from changing defaults, it ensures that a PHP update (via Apple) or a MySQL update (via Oracle) will not have an impact on functionality (a quick check of the socket and configuration is sketched after this list):

    $ sudo mkdir /var/mysql
    $ sudo chown _mysql:wheel /var/mysql
    $ sudo ln -s /tmp/mysql.sock /var/mysql

  8. After installing MySQL, several sample my.cnf files are created but none is placed in /etc/my.cnf, meaning that MySQL will always load in the default configuration. Since we will be using MySQL for local development, we will start with a "small" configuration file and make a few changes to increase the max_allowed_packet variable under [mysqld] and [mysqldump] (optionally, use the contents of this sample file in /etc/my.cnf). Later on, we can edit /etc/my.cnf to make other changes as desired:

    $ sudo cp /usr/local/mysql/support-files/my-small.cnf /etc/my.cnf
    $ sudo sed -i "" 's/max_allowed_packet = 16M/max_allowed_packet = 2G/g' /etc/my.cnf
    $ sudo sed -i "" "/\[mysqld\]/ a\\`echo -e '\n\r'`max_allowed_packet = 2G`echo -e '\n\r'`" /etc/my.cnf

  9. To load in these changes, go back to the MySQL System Preferences pane, and restart the server by pressing "Stop MySQL Server" followed by "Start MySQL Server." Or, you can do this on the command line (if you added this symlink as shown above):

    $ sudo mysql.server restart

    Shutting down MySQL
    ... SUCCESS!
    Starting MySQL
    ... SUCCESS!
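A couple of read-only checks for steps 7 and 8 (mysqladmin will prompt for the root password you set earlier):

    $ php -i | grep mysql.default_socket
    $ ls -l /var/mysql/mysql.sock
    $ mysqladmin --socket=/var/mysql/mysql.sock -u root -p status
    $ mysql -u root -p -e "SHOW VARIABLES LIKE 'max_allowed_packet';"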

phpMyAdmin

At this point, the full "MAMP stack" (Macintosh OS X, Apache, MySQL, PHP) is ready, but adding phpMyAdmin will make administering your MySQL install easier.

phpMyAdmin will complain that the PHP extension 'php-mcrypt' is not available. Since you will not likely be allowing anyone other than yourself to use phpMyAdmin, you could ignore it, but if you want to build the extension, it is possible to build it as a shared object instead of having to rebuild PHP from source. Follow Michael Gracie's guide here (for his Step 2, grab the PHP source that matches "$ php --version" on your system): http://michaelgracie.com/2009/09/23/plugging-mcrypt-into-php-on-mac-os-x-snow-leopard-10.6.1/ (or, if you prefer Homebrew: http://blog.rogeriopvl.com/archives/php-mcrypt-in-snow-leopard-with-homebrew/ ). The following files will be added:
    • /usr/local/lib/libmcrypt
    • /usr/local/include/mcrypt.h
    • /usr/local/include/mutils/mcrypt.h
    • /usr/local/bin/libmcrypt-config
    • /usr/local/lib/libmcrypt.la
    • /usr/local/lib/libmcrypt.4.4.8.dylib
    • /usr/local/lib/libmcrypt.4.dylib
    • /usr/local/lib/libmcrypt.dylib
    • /usr/local/share/aclocal/libmcrypt.m4
    • /usr/local/man/man3/mcrypt.3
    • /usr/lib/php/extensions/no-debug-non-zts-20090626/mcrypt.so
  2. The following lines will install the latest phpMyAdmin to /Library/WebServer/Documents and set up config.inc.php. Afterwards, you will be able to access phpMyAdmin at http://localhost/phpMyAdmin:
    $ cd /Library/WebServer/Documents
    $ curl -s -L -O http://downloads.sourceforge.net/project/phpmyadmin/phpMyAdmin/`curl -s -L http://www.phpmyadmin.net/home_page/index.php | grep -m1 dlname |awk -F'Download ' '{print $2}'|cut -d "<" -f1`/phpMyAdmin-`curl -s -L http://www.phpmyadmin.net/home_page/index.php | grep -m1 dlname |awk -F'Download ' '{print $2}'|cut -d "<" -f1`-english.tar.gz
    $ tar zxpf `ls -1 phpMyAdmin*tar.gz|sort|tail -1`
    $ rm `ls -1 phpMyAdmin*tar.gz|sort|tail -1`
    $ mv `ls -d1 phpMyAdmin-*|sort|tail -1` phpMyAdmin
    $ cd phpMyAdmin
    $ cp config.sample.inc.php config.inc.php
    $ sed -i "" "s/blowfish_secret\'\] = \'/blowfish_secret\'\] = \'`cat /dev/urandom | strings | grep -o '[[:alnum:]]' | head -n 50 | tr -d '\n'`/" config.inc.php

Drush

Drush is a helpful tool that provides a "Drupal shell" to speed up your Drupal development.

  1. Run the following in the Terminal to get Drush 4.2 installed in /usr/local and add drush to the $PATH:

    $ svn co http://subversible.svn.beanstalkapp.com/modules/drush/tags/DRUPAL-7--4-2/ /usr/local/drush
    $ cd /tmp
    $ pear download Console_Table
    $ tar zxvf `ls Console_Table-*.tgz` `ls Console_Table-*.tgz | sed -e 's/\.[a-z]\{3\}$//'`/Table.php
    $ mv `ls Console_Table-*.tgz | sed -e 's/\.[a-z]\{3\}$//'`/Table.php /usr/local/drush/includes/table.inc
    $ rm -fr Console_Table-*
    $ ln -s /usr/local/drush/drush /usr/local/bin/drush
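    To confirm the install worked, Drush can report on itself and, from inside any Drupal docroot, on the site it finds there. A quick check might be:

    $ drush --version
    $ cd ~/Sites/thewebsite    # or any other Drupal docroot
    $ drush status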

Virtual Hosts

Now that all of the components to run a website locally are in place, it only takes a few changes to ~/Sites/httpd-vhosts.conf to get a new local site up and running.

  1. Your website code should be in a directory in ~/Sites. For the purposes of this example, the webroot will be at ~/Sites/thewebsite.

  2. You need to choose a "domain name" for your local site that you will use to access it in your browser. For the example, we will use "myproject.local". The first thing you need to do is add a line to /etc/hosts to direct your "domain name" to your local system:

    $ sudo sh -c "cat >> /etc/hosts <<'EOF'
    127.0.0.1 myproject.local
    EOF"

  3. If you have been following the rest of this guide and used a copy of /etc/apache2/extra/httpd-vhosts.conf in ~/Sites/httpd-vhosts.conf, you can delete the first of the two example Virtual Hosts and edit the second one. You can also delete the ServerAdmin line as it is not necessary.

  4. Change the ServerName to myproject.local (or whatever you entered into /etc/hosts) so Apache will respond when you visit http://myproject.local in a browser.

  5. Change the DocumentRoot to the path of your webroot, which in this example is /Users/fname/Sites/thewebsite

  6. It's highly recommended to set a CustomLog and an ErrorLog for each virtual host, otherwise all logged output will be in the same file for all sites. Change the line starting with CustomLog from "dummy-host2.example.com-access_log" to "myproject.local-access_log", and for ErrorLog change "dummy-host2.example.com-error_log" to "myproject.local-error_log"

  7. Save the file and reload the Apache configuration as shown in Step 6 under the Apache section, and visit http://myproject.local in your browser.

Going forward, all you have to do to add a new website is duplicate the <VirtualHost> ... </VirtualHost> section underneath your existing configuration, make your edits, add a new line to /etc/hosts, and reload Apache.
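Before reloading, it can save some head-scratching to let Apache check the syntax and show which virtual hosts it actually picked up:

    $ apachectl configtest
    $ apachectl -S
    $ sudo apachectl graceful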

Sep 25 2010
Sep 25

After having run the janaksingh.com website on Drupal 6 with Apache + PHP + MySQL, I wanted to move to Pressflow so I could harness the added advantages of Varnish.

This is not an in-depth installation guide or a discussion about Varnish or Pressflow, but quick setup commands for my own reference. Please see links at the end if you wish to explore Varnish or Pressflow setup in greater depth.

I wanted:
Varnish > APC > Apache > Pressflow setup..

Here is what I did:

Install Apache

yum install httpd
For a detailed installation tutorial, please see: http://janaksingh.com/blog/drupal-development-osx-vmware-fusion-and-cent...

Configure Apache to run on port 8080

sudo nano /etc/httpd/conf/httpd.conf
Edit the httpd.conf file to make Apache listen on port 8080:

Listen 8080
NameVirtualHost *:8080

Reconfigure all my VHosts to run on port 8080

nano /etc/httpd/conf.d/1_varnish.conf
<VirtualHost *:8080>
 DocumentRoot /var/www/html/varnished
 ServerName varnished.local
 
 ErrorLog /var/log/httpd/varnished_com_error_log
 LogFormat Combined
 TransferLog /var/log/httpd/varnished_com_access_log
</VirtualHost>

Disable proxy_ajp file

mv /etc/httpd/conf.d/proxy_ajp.conf /etc/httpd/conf.d/proxy_ajp.conf_dis

Install EPEL Repo

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm

Install Varnish

Varnish is bundled in EPEL Repo
yum install varnish

Auto start varnish

sudo /sbin/chkconfig --level 2345 varnish on

Configure Varnish to run on port 80

Varnish setup and config

nano /etc/sysconfig/varnish
DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/varnished.vcl \
             -u varnish -g varnish \
             -s file,/var/lib/varnish/varnish_storage.bin,1G"

Setup Varnish VCL File for Pressflow

Use example VCL config file from https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow and modify to your needs

nano /etc/varnish/varnished.vcl
backend default {
  .host = "127.0.0.1";
  .port = "8080";
  .connect_timeout = 600s;
  .first_byte_timeout = 600s;
  .between_bytes_timeout = 600s;
}
 
sub vcl_recv {
  if (req.request != "GET" &&
    req.request != "HEAD" &&
    req.request != "PUT" &&
    req.request != "POST" &&
    req.request != "TRACE" &&
    req.request != "OPTIONS" &&
    req.request != "DELETE") {
      /* Non-RFC2616 or CONNECT which is weird. */
      return (pipe);
  }
 
  if (req.request != "GET" && req.request != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
  }
 
  // Remove has_js and Google Analytics cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+)=[^;]*", "");
 
  // To users: if you have additional cookies being set by your system (e.g.
  // from a javascript analytics file or similar) you will need to add VCL
  // at this point to strip these cookies from the req object, otherwise
  // Varnish will not cache the response. This is safe for cookies that your
  // backend (Drupal) doesn't process.
  //
  // Again, the common example is an analytics or other Javascript add-on.
  // You should do this here, before the other cookie stuff, or by adding
  // to the regular-expression above.
 
 
  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
  // Remove empty cookies.
  if (req.http.Cookie ~ "^\s*$") {
    unset req.http.Cookie;
  }
 
  if (req.http.Authorization || req.http.Cookie) {
    /* Not cacheable by default */
    return (pass);
  }
 
  // Skip the Varnish cache for install, update, and cron
  if (req.url ~ "install\.php|update\.php|cron\.php") {
    return (pass);
  }
 
  // Normalize the Accept-Encoding header
  // as per: http://varnish-cache.org/wiki/FAQ/Compression
  if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
      # No point in compressing these
      remove req.http.Accept-Encoding;
    }
    elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    }
    else {
      # Unknown or deflate algorithm
      remove req.http.Accept-Encoding;
    }
  }
 
  // Let's have a little grace
  set req.grace = 30s;
 
  return (lookup);
}
 
 sub vcl_hash {
   if (req.http.Cookie) {
     set req.hash += req.http.Cookie;
   }
 }
 
 // Strip any cookies before an image/js/css is inserted into cache.
 sub vcl_fetch {
   if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
     // This is for Varnish 2.0; replace obj with beresp if you're running
     // Varnish 2.1 or later.
     unset obj.http.set-cookie;
   }
 }
 
 sub vcl_error {
   // Let's deliver a friendlier error page.
   // You can customize this as you wish.
   set obj.http.Content-Type = "text/html; charset=utf-8";
   synthetic {"
   <?xml version="1.0" encoding="utf-8"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html>
     <head>
       <title>"} obj.status " " obj.response {"</title>
       <style type="text/css">
       #page {width: 400px; padding: 10px; margin: 20px auto; border: 1px solid black; background-color: #FFF;}
       p {margin-left:20px;}
       body {background-color: #DDD; margin: auto;}
       </style>
     </head>
     <body>
     <div id="page">
     <h1>Page Could Not Be Loaded</h1>
     <p>We're very sorry, but the page could not be loaded properly. This should be fixed very soon, and we apologize for any inconvenience.</p>
     <hr />     <h4>Debug Info:</h4>
     <pre>
 Status: "} obj.status {"
 Response: "} obj.response {"
 XID: "} req.xid {"
 </pre>
       <address><a href="http://www.varnish-cache.org/">Varnish</a></address>
       </div>
     </body>
    </html>
    "};
    deliver;
 }

Restart Apache and Varnish

/etc/init.d/httpd restart
/etc/init.d/varnish restart
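To confirm the chain is working, a couple of curl requests against each port usually tells the story (hostname as configured above):

# Apache answering directly on 8080
curl -s -I http://varnished.local:8080/

# Varnish answering on 80: look for X-Varnish / Via headers, and on a repeat
# request for the same page an Age header greater than 0 (a cache hit)
curl -s -I http://varnished.local/ | egrep -i 'x-varnish|via|age:'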

Further reading:

Configure Varnish for Pressflow
Modules that break caching, and how to fix them
Using Pressflow behind a reverse proxy cache
Example Varnish VCL for a Drupal / Pressflow site
Varnish configuration to only cache for non-logged in users
Varnish best practices
Drupal guide to Managing site performance

Jul 30 2010
krs
Jul 30

This is an Apache Directive that I've never had to use before, but it came in very handy for a very specific problem.

There was already an apache redirect (RewriteRule + RewriteCond) in place, but the destination URL was case sensitive! That's not normally a problem, but it was for an ad server, and the variables were coming in as uppercase, but needed to be lowercase after the redirect. Bad programming on the part of the ad server in my opinion, but we're not going to let that stop us! :)

RewriteMap to the rescue!

First off, the actual directive is a lot like a function definition, and it can only go in a config file or vhost; it's not allowed in a .htaccess file. Luckily the one we want to use is built in, so we just make it available with:

RewriteMap lc int:tolower

This makes the "lc" function is available in our rewrite rules. We start off with the condition and basic rule ...

# Send ads to new ad server
RewriteCond %{HTTP_HOST} ^old\.ad\.server\.com$ [NC]
RewriteRule ^.*site=([^/]+)/.*area=([^/]+)/.*$ http://newad.server.net/ad/$1/$2 [R,L,NC]

To use the RewriteMap function in your RewriteRule just change the $N replacement to ${lc:$N}, and you're all set. So for example $1 becomes ${lc:$1}. In our example above, the new RewriteRule looks like this:

RewriteRule ^.*site=([^/]+)/.*area=([^/]+)/.*$ http://newad.server.net/ad/${lc:$1}/${lc:$2} [R,L,NC]
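A quick way to confirm the map is doing its job is to request the old URL with uppercase values and look at the Location header of the redirect (URL shape follows the example pattern above):

curl -s -I "http://old.ad.server.com/x/site=MySite/x/area=Sports/x" | grep -i location
# Expect something like: Location: http://newad.server.net/ad/mysite/sports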

Jul 07 2009
Jul 07

Acquia Launches Cloud-based Solr Search Indexing

Posted by Jeffrey Scott -TypeHost Web Development | Tuesday, July 7th, 2009

Acquia, the start-up company founded by Dries Buytaert, the lead developer and founder of Drupal, has announced that they are now providing paid search indexing for Drupal sites on a subscription basis aimed at enterprise sites. Similar to Mollom, Acquia’s anti-spam software for CMS platforms, Acquia Search will also work for those running other open source software like WordPress, Joomla, TYPO3, etc., as well as sites with proprietary code. Acquia Search is based on Apache Lucene and Solr, and essentially works by having Acquia index your site’s content on their servers and then send it, with encryption, on demand to serve user queries via an integrated Acquia Search module. According to the announcement, Acquia is using Solr server farms on Amazon EC2 to power this cloud architecture.

Many people have complained about Drupal’s core search functionality over the years, but Solr and Lucene require a Java environment that most people are not equipped to manage with their existing IT architecture, staff, or budget. So Acquia is offering these search functionalities as SaaS, or Software as a Service, on a remote-hosted, pre-configured basis. If you want to do it yourself, see:
http://drupal.org/project/apachesolr

Reference: http://en.wikipedia.org/wiki/Solr

According to Dries:

“Acquia Search is included for no additional cost in every Acquia Network subscription. Basic and Professional subscribers have one ‘search slice’ and Enterprise subscribers have five ‘search slices’. A slice includes the processing power to index your site, to do index updates, to store your index, and to process your site visitors’ search queries. Each slice includes 10MB of indexing space – enough for a site with between 1,000 and 2,000 nodes. Customers who exceed the level included with their subscription may purchase additional slices. A ten-slice extension package costs an additional $1,000/year, and will cover an additional 10,000 – 20,000 nodes in an index of 100MB. For my personal blog, which has about 900 nodes at the time of this writing, a Basic Acquia Network subscription ($349 USD/year) would give me all the benefits of Acquia Search, plus all the other Acquia Network services.”1

Put in this perspective, most Drupal users likely won’t be switching to Acquia Search anytime soon. But, for the most part… they have little need to. For small sites or social networks, Drupal’s core search is going to be generally sufficient. Drupal will index your site automatically on cron runs, and keep this index of keywords and nodes in a table of your MySQL database. If you are working a lot with taxonomy and CCK fields, then Faceted Search is a recommended choice: http://drupal.org/project/faceted_search

I have used Faceted Search on a number of sites and it is excellent for building a custom search engine around your site’s own custom vocabularies, hierarchies, and site structures. Faceted Search is also important in a number of Semantic Web integrations working with RDF data and other micro-tags attached to data fields. Acquia Search is designed to work in this way as well as to facilitate the number crunching involved when high traffic sites with extremely large databases of content need to sift through search archives quickly to return results from user queries. Consider the example of Drupal.org in this context – Acquia Search is the solution to managing over 500,000 nodes and millions of search queries on an extremely active site.

“Reality is that for a certain class of websites — like intranets or e-commerce websites — search can be the most important feature of the entire site. Faceted search can really increase your conversions if you have an e-commerce website, or can really boost the productivity of your employees if you have a large intranet. For those organizations, Drupal’s built in search is simply not adequate. We invested in search because we believe that for many of these sites, enterprise-grade search is a requirement… The search module shipped with Drupal core has its purpose and target audience. It isn’t right for everyone, just as Acquia Search is not for everyone. Both are important, not just for the Drupal community at large, but also for many of Acquia’s own customers. Regardless, there is no question that we need to keep investing and improving Drupal’s built-in search.”2

In summary, Acquia Search is mostly targeted at enterprise level Drupal users with extremely large databases and high traffic, and is a cloud based solution that should not only speed up the rate of return on results, it should also improve the quality of the material returned based on faceted keywords & vocabularies. For those using Acquia’s personal or small business subscription accounts, the new search should appear as an additional “free bonus” with your monthly package of services. For users, even on a small site, the efficiency of faceted search may make information more accessible for visitors.

To learn more, visit: http://buytaert.net/acquia-search-benefits-for-visitors

  1. http://buytaert.net/acquia-search-available-commercially [?]
  2. http://buytaert.net/acquia-search-versus-drupal-search [?]


Jun 16 2009
Jun 16

In honor of this week's Open Source Bridge conference, as well as in recognition of the role that open source software has played in the development of our business, we're pleased to announce that today, June 16, 2009, Code Sorcery Workshop is offering any open source contributor a free license to Meerkat, our SSH tunnel management application. We are also giving away a $250 gift certificate to the legendary Powell's Books. Read on for the details.

If you'd like a free copy of Meerkat, just leave a comment on this post linking to an open source project that you've worked on with a brief mention of what you did. It could be coding, but doesn't have to be -- it could also be documentation, helping new users, anything that contributes to the common good of the project. We'll collect all the info and send each contributor a full, unrestricted license to Meerkat, a $19.95 USD value.

However, if you'd like to instead try for the $250 USD gift certificate to Powell's Books, a purchase of Meerkat will make you eligible for this drawing. Just register Meerkat today and you will automatically be entered for the drawing. The winner will be announced in a followup post.

In both cases, you must take action by midnight Pacific Daylight Time tonight to qualify.

Meerkat is an application that adds a lot of Mac-specific value to SSH, an open source tool that ships with every Mac (as OpenSSH). And Macs themselves are built on a ton of open source software such as Apache, Postfix, CUPS, Perl, PHP, Python, Ruby, sudo, unzip, zlib, and many others. You can read more about Apple's commitment to open source as well as open source releases pertaining to Mac OS X.

I began knowingly using open source software in the mid-90s and started contributing by releasing my own projects on freshmeat in late 1999. I've always looked for ways to contribute to open source projects when I can, whether it's by bug fixes, new feature patches, documentation, or just community help. Most recently, I've been involved with the Drupal content management system.

Open source is the lifeblood of the internet. So many of the tools that we take for granted everyday have been developed in this way, by generous folks giving their time for the greater good. I am extremely thankful for the many ways that open source has enabled me to teach myself a lot of what I know today about technology, to provide economical solutions for clients who need it, and to make software better and better by degrees.

So, here's to open source!


Aug 26 2008
Aug 26

a while ago i posted some performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud.

amazon have just released ebs, the final piece of technology that makes their ec2 platform really viable for running lamp stacks such as drupal.

ebs, the "elastic block store", provides sophisticated storage for your database instance, with features including:

  • high io throughput
  • data replication
  • large storage capacity
  • hot backups using snapshots
  • instance type portability e.g. quickly swapping your database hardware for a bigger machine.

amazon also have a great mysql on ebs tutorial on their developer connection.

let me know if you've given this a go. it looks like a great platform.

Jul 17 2008
Jul 17

I like to develop against local virtual hosts when I work with Drupal. Here is the apache httpd.conf configuration that has served me the best so far. The first VirtualHost is the default .. .simply replicate the second VirtualHost entry and edit accordingly. A simple edit of /etc/hosts to enter the virtual host name that matches the virtual host entries in the httpd.conf and a quick reload by apache and you are good to go. I particularly enjoy the independent logging for each site.

This is likely a no-brainer for the server admins out there but I remember a time when this type of information would have been useful to me ... so here it is. This is not a production level configuration. Have suggestions for improving it?

httpd.conf

NameVirtualHost *
ServerName 127.0.0.1
<VirtualHost *>
  ServerAdmin [email protected]
  DocumentRoot /var/www/
  <Directory />
    Options FollowSymLinks
    AllowOverride all
  </Directory>
  <Directory /var/www/>
    Options All Indexes FollowSymLinks MultiViews
    AllowOverride all
    Order allow,deny
    allow from all
    RewriteEngine on
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
  </Directory>
  ErrorLog /var/log/apache2/error.log
  LogLevel warn
  CustomLog /var/log/apache2/access.log combined
  ServerSignature Off
</VirtualHost>

<VirtualHost *>
  ServerName mysite
  DocumentRoot /var/www/mysite/trunk/htdocs
  ErrorLog /var/log/apache2/mysite_error.log
  CustomLog /var/log/apache2/mysite_access.log combined
</VirtualHost>

/etc/hosts

127.0.0.1     localhost.localdomain localhost mysite mysite1 mysite2

Apr 15 2008
Apr 15
recently i posted some encouraging performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud. while the performance was encouraging, the suitability of this environment for running lamp stacks was not. ec2 had some fundamental issues including a lack of static ip addresses and no viable persistent storage mechanism.

amazon are quickly rectifying these problems, and recently announced elastic ip addresses; a "static" ip address that you own and can dynamically point at any of your instances.

today amazon indicated that persistent storage will soon be available. they claim that this storage will:

  • behave like raw, unformatted hard drives or block devices
  • be significantly more durable than the local disks within an amazon ec2 instance
  • support snapshots backed up to S3
  • support volumes ranging in size from 1GB to 1TB
  • allow the attachment of multiple volumes to a single instance
  • allow high throughput, low latency access from amazon ec2
  • support applications including relational databases, distributed file systems and hadoop processing clusters using amazon ec2

if this works as advertised, it will make ec2 a wonderful platform for your lamp application. amazon promise public availability of this service later this year.

Feb 29 2008
Feb 29

The sections of this tutorial are as follows:

  • Brief background on ApacheBench and performance of Drupal Anonymous vs Authenticated Users
  • Accessing the cookie containing the Drupal session ID
  • Passing the session cookie to ApacheBench
  • Verifying that the Drupal installation being benchmarked recognizes the cookie and is serving authenticated pages

    ApacheBench, also known as 'ab', is a command line program bundled with the Apache Web Server that measures the performance of web servers by making HTTP requests to a user-specified URL. ApacheBench displays statistical information, such as the number of requests served per second and the amount of time taken to serve those requests, that is useful for evaluating (benchmarking) and tuning the performance of a webserver. ApacheBench does a decent job of simulating different types and levels of load on a server.

    Many Drupal websites make use of Drupal's core caching functionality, which typically improves performance for anonymous users (users who are not-logged in) by a factor of 10 times, measured in requests served per second. The cache system achieves this performance improvement by storing the results of certain database queries in designated cache tables, allowing Drupal to avoid repeating expensive or frequently run database queries. Drupal does not cache page content for authenticated users because Drupal's permissions system (and node_access restrictions) can show different content depending on a user's role and because some elements of the page are user-specific. Since authenticated users are typically served non-cached pages, requests served to them are more resource intensive and therefore take longer to conduct than requests served to anonymous users.

   By default, ApacheBench requests pages as one or multiple concurrent anonymous users. Benchmarks of performance for anonymous users can be useful for a website that most users will access anonymously, but these data do not accurately describe the performance of websites that serve a significant percentage of their total pages served to authenticated users (as most community sites tend to do). By providing the Drupal session cookie information, ApacheBench can perform benchmarking tests as an authenticated Drupal user.

Accessing the cookie containing the session ID:

Note: These instructions are for the Mozilla Firefox browser.

1) Log into your Drupal installation using the account that you'd like ApacheBench to use when conducting its tests.

2) Open the Firefox preferences menu and browse to the 'Privacy' tab.

3) Click the 'Show Cookies' button.

4) Browse the list of sites to find the site that you have just logged into.

5) Highlight the cookie associated with your site that has a 'Cookie Name' value beginning with 'SESS'. There should be only one of these cookies. If you have two cookies with names beginning with 'SESS' for the same Drupal website, you should clear both of these cookies (which will log you out of your Drupal site) and log back in to ensure that you are specifying the correct cookie in ApacheBench.

6) You'll see that your site's SESS cookie has both 'Name' and 'Content' values. You're going to copy each of these values individually and pass them inside of a command line parameter to ab. Note: Do not log out of your Drupal site before testing with ab, as doing so will invalidate the cookie information you have just copied!

Passing the session cookie to ApacheBench

   The 'C' parameter is used with ApacheBench to specify a cookie. This parameter is case sensitive, and should not be confused with 'c' (lowercase) which specifies the number of concurrent requests ApacheBench will make. You'll describe your site's cookie using the following format:

-C 'NAME=CONTENT'

There are no spaces around the equals sign. Below is a completed example specifying that 400 requests will be made (-n 400) with 10 simultaneous requests (-c 10):

ab -n 400 -c 10 -C 'SESS5a2cb35dfce80bc682796cc451440ac0=d45b85c5be4c651ea2ad520947082e04' http://example.com/

Verifying that Drupal accepts the cookie and is serving pages to an authenticated user

    It's important to verify that Drupal recognizes and accepts the cookie that corresponds to your session. Once, while performing a set of ab authenticated Drupal user benchmarks, I adjusted a Drupal cache setting that increased the number of requests served per second for authenticated users from 5 to 60. I thought, “Wow! I've made an enormous improvement!” What I had actually done was switch from Drupal core session handling to Memcache's session handling, which caused Drupal to not recognize my cookie. Drupal was actually serving anonymous pages, which accounted for the dramatic “performance improvement.” To avoid misunderstandings like this one, we'll use ApacheBench's verbose mode to confirm that Drupal is serving authenticated pages.

   In our theme directory, we'll add a PHP snippet to page.tpl.php below the HEAD tag, that will print, inside an HTML comment tag, the username and uid (user id) of the user currently viewing the page.

<?php global $user; print 'This is user: '. $user->name .' : UID: '. $user->uid; ?>

Then, we'll run ApacheBench with a verbosity level of 4 (parameter -v4), allowing us to view the HTTP response code and html that the server is sending to ab.
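For example, a short run piped through grep makes the marker easy to spot (the session cookie value is of course your own, and the URL is your site's):

ab -n 5 -c 1 -v 4 -C 'SESS5a2cb35dfce80bc682796cc451440ac0=d45b85c5be4c651ea2ad520947082e04' http://example.com/ | grep 'This is user'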

    Viewing the content of pages served to ab is also useful in determining the nature of failed responses and investigating suspiciously high numbers of requests served per second. Apache identifies certain types of failed requests by serving them with an unsuccessful (non-200) HTTP response code. (For more about HTTP response codes\status codes, see the W3's "Status Code Definitions" .) ApacheBench recognizes HTTP response codes and reports the number of responses that are marked as successful and unsuccessful by Apache. A reportedly high number of requests successfully served per second can indicate good server performance, but it's important to distinguish pages served with a 200 response code from those that are truly served successfully. In some situations, Apache is unaware of serious problems that occur in other parts of the server stack and will serve pages with an HTTP response code of 200 (successful) despite the fact these pages contain only a PHP fatal error. In these cases, Apache is “successfully” serving the PHP fatal error message.

    Drupal correctly specifies non-200 HTTP response codes when it is unable to connect to the database. Viewing the content of the benchmarked page allows the person performing the testing to view the specific error message that the site is displaying. This is useful in identifying that the problem is, for example, that the MySQL max_user_connections value has been reached, as opposed to other database connection errors, such as an Access Denied error.

It's worth noting that while ab provides useful information about a server's performance, it does not predict exactly how a server will perform under load from actual users.

Questions? Comments? Corrections? I'm happy to hear from you.

Jan 28 2008
Jan 28
amazon's elastic compute cloud, "ec2", provides a flexible and scalable hosting option for applications. while ec2 is not inherently suited for running application stacks with relational databases such as lamp, it does provide many advantages over traditional hosting solutions.

in this article we get a sense of lamp performance on ec2 by running a series of benchmarks on the drupal cms system. these benchmarks establish read throughput numbers for logged-in and logged-out users, for each of amazon's hardware classes.

we also look at op-code caching, and gauge its performance benefit in cpu-bound lamp deployments.

the elastic compute cloud

amazon uses xen based virtualization technology to implement ec2. the cloud makes provisioning a machine as easy as executing a simple script command. when you are through with the machine, you simply terminate it and pay only for the hours that you've used.

ec2 provides three types of virtual hardware that you can instantiate. these are summarized in the table below.

machine type          hourly cost   memory   cpu units   platform
small instance        $0.10         1.7 GB   1           32-bit platform
large instance        $0.40         7.5 GB   4           64-bit platform
extra large instance  $0.80         15 GB    8           64-bit platform

note: one compute unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

target deployments

to keep things relatively simple, the target deployment for our load test is basic; the full lamp stack runs on a single server. this is step zero in the five deployment steps that i outlined in an open-source infrastructure for high-traffic drupal sites.

our benchmark

our benchmark consists of a base drupal install, with 5,000 users and 50,000 nodes of content-type "page". nodes are an even distribution of 3 sizes, 1K, 3K and 22K. the total database size is 500Mb.

during the test, 10 threads read nodes continually over a 5 minute period. 5 threads operate logged-in. the other 5 threads operate anonymously (logged-out). each thread reads nodes randomly from the pool of 50,000 available.

this test is a "maximum" throughput test. it creates enough load to utilize all of the critical server resource (cpu in this case). the throughput and response times are measured at that load. tests to measure performance under varying load conditions would also be very interesting, but are outside the scope of this article.

the tests are designed to benchmark the lamp stack, rather than weighting it towards apache. consequently they do not load external resources. that is, external images, css javascript files etc are not loaded, only the initial text/html page. this effectively simulates drupal running with an external content server or cdn.

the benchmark runs in apache jmeter. jmeter runs on a dedicated small-instance on ec2.

benchmarking is done with op-code caching on and off. since our tests are cpu bound, op-code caching makes a significant difference to php's cpu consumption.
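
as an aside, here is one way to toggle apc between runs; the file path is an assumption for a debian etch style php5 install with apc configured in its own ini file containing an apc.enabled line, so adjust it to match your setup:

# turn apc off for the un-cached runs (path and ini layout are assumptions; check where your apc.ini lives)
sudo sed -i 's/^apc.enabled=.*/apc.enabled=0/' /etc/php5/apache2/conf.d/apc.ini
sudo /etc/init.d/apache2 restart

# turn it back on for the cached runs
sudo sed -i 's/^apc.enabled=.*/apc.enabled=1/' /etc/php5/apache2/conf.d/apc.ini
sudo /etc/init.d/apache2 restart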

our testing environment

the tests use a debian etch xen instance, running on ec2. this instance is installed with:
  • MySQL: 5.0.32
  • PHP: 5.2.0-8
  • Apache: 2.2.3
  • APC: 3.0.16
  • Debian Etch
  • Linux kernel: 2.6.16-xenU

the tests use a default drupal installation. drupal's caching mode is set to "normal". no performance tuning was done on apache, mysql or php.

the results

all the tests ran without error. each of the tests resulted in the server running at close to 100% cpu capacity. the tests typically reached steady state within 30s. throughputs gained via jmeter were sanity checked for accuracy against the http and mysql logs. the raw results of the tests are shown in the table below.

instance   apc?   logged-in throughput   logged-in response   logged-out throughput   logged-out response
small      off    194                    1.50                 664                     0.45
large      off    639                    0.46                 2,703                   0.11
xlarge     off    1,360                  0.20                 3,741                   0.08
small      on     905                    0.30                 3,838                   0.07
large      on     3,106                  0.10                 8,033                   0.04
xlarge     on     4,653                  0.06                 12,548                  0.02

note: response times are in seconds, throughputs are in pages per minute

the results - throughput

the throughput of the system was significantly higher for the larger instance types. throughput for the logged-in threads was consistently 3x lower than the logged-out threads. this is almost certainly due to the drupal cache (set to "normal").

throughput was also increased by about 4x with the use of the apc op-code cache.


the results - response times

the average response times were good in all the tests. the slowest tests yielded average times of 1.5s. again, response times were significantly better on the better hardware and reduced further by the use of apc.


conclusions

drupal systems perform very well on amazon ec2, even with a simple single machine deployment. the larger hardware types perform significantly better, producing up to 12,500 pages per minute. this could be increased significantly by clustering as outlined here.

the apc op-code cache increases performance by a factor of roughly 4x.

these results are directly applicable to other cpu bound lamp application stacks. more consideration should be given to applications bound on other external resources, such as database queries. for example, in a database bound system, drupal's built-in cache would improve performance more significantly, creating a bigger divergence in logged-out vs logged-in throughput and response times.

although performance is good on ec2, i'm not recommending that you rush out and deploy your lamp application there. there are significant challenges in doing so and ec2 is still in beta at the time of writing (Jan 08). it's not for the faint-of-heart. i'll follow up in a later blog with more details on recommended configurations.


Jan 19 2008
Jan 19
i recently posted an introductory article on using jmeter to load test your drupal application. if you've read this article and are curious about how to build a more sophisticated test that mimics realistic load on your site, read on.

the previous article showed you how to set up jmeter and create a basic test. to produce a more realistic test you should simulate "real world" use of your site. this typically involves simulating logged-in and logged-out users browsing and creating content. jmeter has some great functionality to help you do this.

as usual, all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications. also, although i'm discussing drupal testing, the method below really applies to any web application. if you aren't already familiar with jmeter, i'd strongly recommend that you read my first post before this one.

an overview

the http protocol exchanges for realistic tests are quite complex, and painful to manually replicate. jmeter kindly includes http-proxy functionality, that allows you to "record" browser based actions, which can be used to form the basis of your test. after recording, you can manually edit these actions to sculpt your test precisely.

our test - browsers and creators

as an example, let's create a test with two test groups: creators and browsers. creators are users that arrive at the site, stay logged out, browse a few pages, create a page and then leave. browsers, are less motivated individuals. they arrive at the site, log in, browse some content and then leave.

setting up the test - simulating creators

to create our test, fire up jmeter and do the following.

create a thread group. call it "creators". add a "http request defaults" object to the thread group. check the "retrieve all embedded resources from html files" box.

add a cookie manager to the thread group of type "compatibility". add an "http proxy server" to the workbench, as follows:


modify the "content-type filter" to "text/html". your jmeter-proxy should now look like:


navigate in your browser to the start of your test e.g. your home page. clear your cookies (using the clear private data setting). open up the "connection settings" option in firefox preferences and specify a manual proxy configuration of localhost, port 8080. this should look like:


note: you can also do this using internet explorer. in ie7 go to the "connections" tab of the internet options dialog. click the "lan settings" button, and setup your proxy.

start the jmeter-proxy. record your test by performing actions in your browser: (a) browse to two pages and (b) create a page. you should see your test "writing itself". that should feel good.

now stop the jmeter-proxy. your test should look similar to:


setting up the test - simulating browsers

create another thread group above the first. call it browsers. again, add a "http request defaults" object to the thread group. check the "retrieve all embedded resources from html files" box.

add a cookie manager to the thread group of type "compatibility". start the jmeter-proxy again. record your test by performing actions: (a) login and then (b) browse three pages. your test should look like:


stop the jmeter-proxy. undo the firefox proxy.

setting up the test - cleaning up

you can now clean up the test as you see fit. i'd recommend:
  • change the number of threads and iterations on both thread-groups to simulate the load that you care about.
  • modify the login to happen only once on a thread. see the diagram below.


and optionally:

  • rename items to be more meaningful.
  • insert sensible timers between requests.
  • insert assertions to verify results.
  • add listeners to each thread group. i recommend a "graph results" and a "view results tree" listener.

your final test should look like the one below. note that i didn't clutter the example with assertion and timers:


running your test

you should now be ready to run your test. as usual, click through to the detailed results in the tree to verify that your test is doing something sensible. ideally you should do this automatically with assertions. your results should look like:


notes

the test examples that i chose intentionally avoided logged-in users creating content. you'll probably want these users to create content, but you'll likely get tripped up by drupal's form token validation, designed to block spammers and increase security. modifying the test to work around this is beyond the scope of this article, and probably not the best way to solve the problem. if someone knows of a nice clean way to disable this in drupal temporarily, perhaps they could comment on this article.

Jan 14 2008
Jan 14
there are many things that you can do to improve your drupal application's scalability, some of which we discussed in the recent scaling drupal - an open-source infrastructure for high-traffic drupal sites article.

when making scalability modifications to your system, it's important to quantify their effect, since some changes may have no effect or even decrease your scalability. the value of advertised scalability techniques often depends greatly on your particular application and network infrastructure, sometimes creating additional complexity with little benefit.

apache jmeter is a great tool to simulate load on your system and measure performance under that load. in this article, i demonstrate how to setup a testing environment, create a simple test and evaluate the results.

as usual, all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications. also, although i'm discussing drupal testing, the method below really applies to any web application.

the testing environment

you should install and run the jmeter code on a server that has good resources and high-bandwidth, low-latency network access to your application server or load balancer. the maximum load that you can simulate is clearly constrained by these parameters, as is the accuracy of your timing results. therefore, for very large deployments you may need to run multiple non-gui jmeter instances on several test machines, but for most of us a simple one test-machine configuration will suffice; i recently simulated over 12K pageviews/minute from a modest single-core server that wasn't close to capacity.

jmeter has a great graphical interface that allows you to define, run and analyze your tests visually. a convenient way to run this is to ssh to the jmeter test machine using x forwarding, from a machine running an x server. this should be as simple as issuing the command:

$ ssh -X testmachine.example.com

note, you'll need a minimal x install on your server for this. you can get one with:

$ sudo apt-get install xserver-xorg-core xorg

and then running the jmeter gui from that ssh session. jmeter should now appear on your local display, but run on the test machine itself. if you are having problems with this, skip to troubleshooting at the end of this article. this setup is good for testing a remote deployment. you can also run the gui on windows.

x forwarding can become unbearably slow once your test is running, if the test saturates your test server's network connection. if so, you might consider defining the test using the gui and running it on the command line. read more about remote testing on the apache site, and on command line jmeter later in this article.

setting up the test server - download and install java

jmeter is a 100% java implementation, so you'll need a functional java runtime install.

if you don't have java 1.4 or later, then you should start by installing it. to do so, make sure you've got a line in /etc/apt/sources.list like this:

deb http://ftp.debian.org/debian/ etch main contrib non-free

if you don't, then add it and do an apt-get update. once you've done this, do:

$ sudo apt-get install sun-java5-jre

installation on vista is as easy as downloading the latest zip from http://jakarta.apache.org/site/downloads/downloads_jmeter.cgi, unzipping it and running jmeter.bat. please don't infer that i'm condoning or suggesting the use of windows vista ;)

setting up the test server - download and install jmeter

next, download the latest stable version of jmeter from the jmeter download page, for example:

$ wget http://apache.mirrors.tds.net/jakarta/jmeter/binaries/jakarta-jmeter-2.3.1.tgz

and then install it:

$ tar xvfz jakarta-jmeter-2.3.1.tgz

you should now be able to run it by:

$ cd ./jakarta-jmeter-2.3.1/bin
$ ./jmeter

if you are having problems running jmeter, see the troubleshooting section at the end of this article.

setting up a basic test

jmeter is a very full featured testing application. we'll scratch the surface of its functionality and set up a fairly simplistic load test. you may want to do something a bit more sophisticated, but this will at least get you started.

to create the basic test, run jmeter as described above. the first step is to create a "thread group" object. you'll use this object to define the simulated number of users (threads) and the duration of the test. right mouse click the test plan node and select:
add -> thread group

specify the load that you'll exert on your system, for example, pick 10 users (threads) and a loop count (how many times each thread will execute your test). you can optionally modify the ramp up period e.g. a 10s ramp up in this example would create one new user every second.

now add a sampler by right mouse clicking the new thread group and choosing:
add -> sampler -> http request. make sure to check the box "retrieve all embedded resources from html files", to properly simulate a full page load.

now add a listener to view the detailed results of your requests. the "results tree" is a good choice. add this to your thread group by selecting: add -> listener -> view results tree. note that after you run your test, you can select a particular request in the left panel and then select the "response data" tab on the right, to verify that you are getting a sensible response from your server, as shown below.

finally let's add another listener to graph our result data. choose:
add -> listener -> graph results. this produces a graph similar to the graph on the right.

if you want to create a more sophisticated test, you'll probably want to create realistic use scenarios, including multiple requests spaced out using timers, data creation by logged in users etc. you'll probably want to verify results with assertions. all of this is relatively easy, and you can read more on apache's site about creating a test plan. you can get information on login examples and cookie support here. you can also read the follow up to this blog: load test your drupal application scalability with apache jmeter: part two

running your test

controlling your test is now a simple matter of choosing the menu items: run -> start, run -> stop, run -> clear all etc. it's very intuitive. while your test is running, you can select the results graph, and watch the throughput and performance statistics change as your test progresses.

if you'd like to run your test in non-gui mode, you can run jmeter on the command line as follows:

$ jmeter --nongui --testfile basicTest.jmx --logfile /tmp/results.jtl

this would run a test defined in file basicTest.jmx, and output the results of the test in a file called /tmp/results.jtl. once the test is complete, you could, for example, copy the results file locally and run jmeter to visually inspect and analyse the results, with:

$ jmeter --testfile basicTest.jmx

or just run jmeter as normal and then open your test.

you may then use the listener of choice (e.g. "graph results") to open your results file and display the results.

interpreting your drupal results

most production sites run with drupal's built-in caching turned on. you can look at your performance setting in the administration page at: http://www.example.com/admin/settings/performance. this caching makes a tremendous difference to throughput, but when users are logged in, they bypass this cache.

therefore, to get a realistic idea of your site performance, it's a good idea to calibrate your system with caching on and caching off, and linearly interpolate the results to get a true idea of your maximum throughput. for example, if your throughput is 1,000 views per minute with caching and 100 without, and at any given point in time 50% of your users are logged in, you could estimate your throughput at (1,000 + 100) / 2 = 550 views per minute.
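
here's a tiny illustrative script, not part of jmeter, that does the same weighted interpolation for an arbitrary logged-in percentage (the numbers are just the ones from the example above):

#!/bin/bash
# estimate blended throughput from a cached (anonymous) and an uncached (logged-in) measurement
cachedThroughput=1000     # pages/min measured with drupal's cache effective
uncachedThroughput=100    # pages/min measured with the cache bypassed
loggedInPercent=50        # share of users logged in at any given time

estimate=$(( (cachedThroughput * (100 - loggedInPercent) + uncachedThroughput * loggedInPercent) / 100 ))
echo "estimated throughput: ${estimate} views per minute"

running it with the values above prints an estimate of 550 views per minute, matching the calculation by hand.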

alternatively, you could build a more sophisticated test that simulates close-to-realistic site access including logged-in sessions. clearly, the more work you put into your load tests, the more accurate your results will be. see the followup article for details on building a more sophisticated test.

an example test - would a static file server or cdn help your application?

jmeter allows you to easily estimate the effect of configuration changes, sometimes without actually making the changes. recently i read robert douglass' interesting article on using lighttpd as a static file server for drupal, and i was curious how much of a difference that would make.

simply un-checking the "retrieve all embedded resources from html files" on the http request allowed me to simulate all the static resources coming from another (infinitely fast) server.

for my (image intensive) application the results were significant, about a 3x increase in throughput. clearly the real number depends on many factors including the static resources (images, flash etc) used by your application and the ratio of first time to repeat users of your site (repeat users have your content cached). it seems fair to say that this technique would significantly improve throughput for most sites and presumably page performance would be significantly improved too, especially if the static resources were cdn hosted.

troubleshooting your jmeter install

if you are having problems with your jmeter install, then:

make sure that the java version you are running is compatible i.e. 1.4 or later, by:

$ java -version
java version "1.5.0_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)

make sure that you have all the dependencies installed. if you get the error "cannot load awt toolkit: gnu.java.awt.peer.gtk.gtktoolkit", you might have to install the gcj runtime library. do this as follows:

$ sudo apt-get install libgcj7-awt

if jmeter hangs or stalls, you probably don't have the right java version installed or on your path.

if you're still having problems, take a look in the application log file jmeter.log for clues. this gets created in the directory that you run jmeter in.

if you are having problems getting x forwarding to work, make sure that it is enabled in your sshd config file e.g. /etc/ssh/sshd_config. you should have a line like:

X11Forwarding yes

if you change this, don't forget to restart the ssh daemon.
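
on debian etch, for example, that's just:

$ sudo /etc/init.d/ssh restart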

further reading

if you'd like to build a more sophisticated test, take a look at my next blog: load test your drupal application scalability with apache jmeter: part two.


thanks to curtis (serverjockey) hilger for introducing me to jmeter.

Dec 18 2007
Dec 18
#!/bin/bash

# guardian - a script to watch over application system dependencies, restarting things
#            as necessary:  http://www.johnandcailin.com/john
#
#            this script assumes that at, logger, sed and wget are available on the path.
#            it assumes that it has permissions to kill and restart daemons including
#            mysql and apache.
#           
#            Version: 1.0:    Created
#                     1.1:    Updated logfileCheck() not to assume that files are rotated
#                             on restart.

checkInterval=10                         # MINUTES to wait between checks

# some general settings
batchMode=false                          # was this invoked by a batch job
terminateGuardian=false                  # should the guardian be terminated

# setting for logging (syslog)
loggerArgs=""                            # what extra arguments to the logger to use
loggerTag="guardian"                     # the tag for our log statements

# the at queue to use. use "g" for guardian. this queue must not be used by another
# application for this user.
atQueue="g"

# the name of the file containing the checks to run
checkFile="./checks"

# function to print a usage message and bail
usageAndBail()
{
   cat << EOT
Usage:guardian [OPTION]...
Run a guardian to watch over processes. Currently this supports apache and mysql. Other
processes can be added by simple modifications to the script. Invoking the guardian will run
an instance of this script every n minutes until the guardian is shutdown with the -t option.
Attempting to re-invoke a running guardian has no effect.

All activity (debug, warning, critical) is logged to the local0 facility on syslog.

The checks are listed in a checkfile, for example:

   #check type, daemonName, executableName, checkSource, checkParameters
   logfileCheck, apache2,       apache2,        /var/log/apache2/mainlog, "segmentation fault"

This checkfile specifies a periodic check of apache's mainlog for a string containing
"segmentation fault", restarting the apache2 process if it fails.

This script should be run on each host running the service(s) to be watched.

  -i        set the check interval to MINUTES
  -c        use the specified check file
  -b        batch mode. don't write to stderr ever
  -t        terminate the guardian
  -h        print this help

Examples:
To run a guardian every 10 minutes using checks in "./myCheckFile"
$ guardian -c ./myCheckFile -i 10

EOT

   exit 1;
}

# parse the command line arguments (i and c each take a param)
while getopts i:c:hbt o
do     case "$o" in
        i)     checkInterval="$OPTARG";;
        c)     checkFile="$OPTARG";;
        h)     usageAndBail;;
        t)     terminateGuardian=true;;
        b)     batchMode=true;;        # never manually pass in this argument
        [?])   usageAndBail
       esac
done

# only output logging to standard error running from the command line
if test ${batchMode} = "false"
then
   loggerArgs="-s"
fi

# setup logging subsystem. using syslog via logger
logCritical="logger -t ${loggerTag} ${loggerArgs} -p local0.crit"
logWarning="logger -t ${loggerTag} ${loggerArgs} -p local0.warning"
logDebug="logger -t ${loggerTag} ${loggerArgs} -p local0.debug"

# delete all outstanding at jobs
deleteAllAtJobs ()
{
   for job in `atq -q ${atQueue} | cut -f1`
   do
      atrm ${job}
   done
}

# are we to terminate the guardian?
if test ${terminateGuardian} = "true"
then
   deleteAllAtJobs

   ${logDebug} "TERMINATING on user request"
   exit 0
fi

# check to see if a guardian job is already scheduled, return 0 if they are, 1 if not.
isGuardianAlreadyRunning ()
{
   # if there are one or more jobs running in our 'at' queue, then we are running
   numJobs=`atq -q ${atQueue} | wc -l`
   if test ${numJobs} -ge 1
   then
      return 0
   else
      return 1
   fi
}

# make sure that there isn't already an instance of the guardian running
# only do this for user initiated invocations.
if test ${batchMode} = "false"
then
   if isGuardianAlreadyRunning
   then
      ${logDebug} "guardian invoked but already running. doing nothing."
      exit 0
   fi
fi

# get the nth comma separated token from the line, trimming whitespace
# usage getToken line tokenNum
getToken ()
{
   line=$1
   tokenNum=$2

   # get the nth comma separated token from the line, removing whitespace
   token=`echo ${line} | cut -f${tokenNum} -d, | sed 's/^[ \t]*//;s/[ \t]*$//'`
}

# check http. get a page and look for a string in the result.
# usage: httpCheck sourceUrl checkString
httpCheck ()
{
   sourceUrl=$1
   checkString=$2

   wget -O - --quiet ${sourceUrl} | egrep -i "${checkString}" > /dev/null 2>&1
   httpCheckResult=$?
   if test ${httpCheckResult} -eq 0
   then
      ${logDebug} "PASS: found \"${checkString}\" in ${sourceUrl}"
   else
      ${logWarning} "FAIL: could NOT LOCATE \"${checkString}\" in ${sourceUrl}"
   fi

   return ${httpCheckResult}
}

# check to make sure that mysql is running
# usage: mysqlCheck connectString query
mysqlCheck ()
{
   connectString=$1
   query=$2

   # get the connect params from the connectString
   userAndPassword=`echo ${connectString} | sed "s/.*\/\/\(.*\)@.*/\1/"`
   mysqlUser=`echo ${userAndPassword} | cut -f1 -d:`
   mysqlPassword=`echo ${userAndPassword} | cut -f2 -d:`
   mysqlHost=`echo ${connectString} | sed "s/.*@\(.*\)\/.*/\1/"`
   mySqlDatabase=`echo ${connectString} | sed "s/.*@\(.*\)/\1/" | cut -f2 -d\/`

   mysql -e "${query}" --user=${mysqlUser} --host=${mysqlHost} --password=${mysqlPassword} --database=${mySqlDatabase} > /dev/null 2>&1
   mysqlCheckResult=$?
   if test ${mysqlCheckResult} -eq 0
   then
      ${logDebug} "PASS: executed \"${query}\" in ${mysqlHost}"
   else
      ${logWarning} "FAIL: could NOT EXECUTE \"${query}\" in database ${mySqlDatabase} on ${mysqlHost}"
   fi

   return ${mysqlCheckResult}
}

# check to make sure that a logfile is clean of critical errors
# usage: logfileCheck errorString logFile
logfileCheck ()
{
   logFile=$1
   errorString=$2
   logfileCheckResult=0
   marker="__guardian marker__"
   mark="${marker}: `date`"

   # make sure that the logfile exists
   test -r ${logFile} || { ${logCritical} "logfile (${logFile}) is not readable. CRITICAL GUARDIAN ERROR."; exit 1; }

   # see if we have a marker in the log file
   grep "${marker}" ${logFile} > /dev/null 2>&1
   if test $? -eq 1
   then
      # there is no marker, therefore we haven't seen this logfile before. add the
      # marker and consider this check passed
      echo ${mark} >> ${logFile}
      ${logDebug} "PASS: new logfile"
      return 0
   fi

   # pull out the "active" section of the logfile, i.e. the section between the
   # last run of the guardian and now, i.e. between the marker and the end of the file

   # get the last marker line number
   lastMarkerLineNumber=`grep -n "__guard" ${logFile} | cut -f1 -d: | tail -1`

   # grab the active section
   activeSection=`cat ${logFile} | sed -n "${lastMarkerLineNumber},$ p"`

   # check for the regexs the logFile's active section
   echo ${activeSection} | egrep -i "${errorString}" > /dev/null 2>&1
   if test $? -eq 1
   then
      ${logDebug} "PASS: logfile (${logFile}) clean: line ${lastMarkerLineNumber} to EOF"
   else
      ${logWarning} "FAIL: logfile (${logFile}) CONTAINS CRITICAL ERRORS"
      logfileCheckResult=1
   fi

   # mark the newly checked section of the file
   echo ${mark} >> ${logFile}

   return ${logfileCheckResult}
}

# restart daemon, not taking no for an answer
# usage: restartDaemon executableName initdName
restartDaemon ()
{
   executableName=$1
   initdName=$2
   restartScript="/etc/init.d/${initdName}"

   # make sure that the daemon executable is there
   test -x ${restartScript} || { ${logCritical} "restart script (${restartScript}) is not executable. CRITICAL GUARDIAN ERROR."; exit 1; }

   # try a polite stop
   ${restartScript} stop > /dev/null

   # get medieval on its ass
   pkill -x ${executableName} ; sleep 2 ; pkill -9 -x ${executableName} ; sleep 2

   # restart the daemon
   ${restartScript} start > /dev/null

   if test $? -ne 0
   then
      ${logCritical} "failed to restart daemon (${executableName}): CRITICAL GUARDIAN ERROR."
      exit 1
   else
      ${logDebug} "daemon (${executableName}) restarted."
   fi
}

#
# things look good, let's do our checks and then schedule a new one
#

# make sure that the checkFile exists
test -r ${checkFile} || { ${logCritical} "checkfile (${checkFile}) is not readable. CRITICAL GUARDIAN ERROR."; exit 1; }

# loop through each of the daemons that need to be managed
for daemon in `cat ${checkFile} | egrep -v "^#.*" | cut -f2 -d, |  sed 's/^[ \t]*//;s/[ \t]*$//' | sort -u`
do
   # execute all the checks for the daemon in question
   cat ${checkFile} | egrep -v "^#.*" | while read line
   do
      getToken "${line}" 2 ; daemonName=${token}

      if test ${daemonName} = ${daemon}
      then
         # get the check definition
         getToken "${line}" 1 ; checkType=${token}
         getToken "${line}" 3 ; executableName=${token}
         getToken "${line}" 4 ; checkSource=${token}
         getToken "${line}" 5 ; checkParams=${token}

         # remove quotes
         checkSourceQuoteless=`echo ${checkSource} | sed "s/\"//g"`
         checkParamsQuoteless=`echo ${checkParams} | sed "s/\"//g"`

         # call the appropriate handler for the check
         ${checkType} "${checkSourceQuoteless}" "${checkParamsQuoteless}"

         if test $? -ne 0
         then
            ${logCritical} "CRITICAL PROBLEMS with deamon (${daemonName}), RESTARTING."
            restartDaemon ${executableName} ${daemonName}
         fi
      fi
   done
done

# delete all at jobs (race conditions)
deleteAllAtJobs

# schedule a new instance of this sucker
${logDebug} "scheduling another check to run in ${checkInterval} minutes"
at -q ${atQueue} now + ${checkInterval} minutes > /dev/null 2>&1 << EOT
$0 $* -b
EOT
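
as a usage illustration (this isn't from the original script's documentation; the hostnames, credentials and strings below are placeholders), a checkfile exercising all three check types might look like this:

#check type,   daemonName, executableName, checkSource,                                 checkParameters
httpCheck,     apache2,    apache2,        http://localhost/,                           "powered by drupal"
mysqlCheck,    mysql,      mysqld,         mysql://drupal:password@localhost/drupaldb,  "select 1"
logfileCheck,  apache2,    apache2,        /var/log/apache2/error.log,                  "segmentation fault"

and you would start the guardian against it with a 10 minute check interval:

$ ./guardian -c ./checks -i 10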

Nov 15 2007
Nov 15

if you've set up a clustered drupal deployment (see scaling drupal step three - using heartbeat to implement a redundant load balancer), a good next step is to scale your database tier.

in this article i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:
  • scalability: good
  • redundancy: fair
  • ease of setup: poor

servers

in this example, i use:

  • web server: drupal-lb1.mydomain.com (192.168.1.24)
  • data server: drupal-data-server1.mydomain.com (192.168.1.26)
  • data server: drupal-data-server2.mydomain.com (192.168.1.27)
  • data server: drupal-data-server3.mydomain.com (192.168.1.28)
  • mysql load balancer: mysql-balance-1.mydomain.com (192.168.1.94)

first steps first - optimizing your database and application

the first step to scaling your database tier should include identifying problem queries (those taking most of the resources), and optimizing them. optimizing may mean reducing the volume of the queries by modifying your application, or increasing their performance using standard database optimization techniques such as building appropriate indexes. the devel module is a great way to find problem queries and functions.

another important consideration is the optimization of the database itself, by enabling and optimizing the query cache, tuning database parameters such as the maximum number of connections etc. using appropriate hardware for your database is also a huge factor in database performance, especially the disk io system. a large raid 1+0 array for example, may do wonders for your throughput, especially combined with a generous amount of system memory available for disk caching. for more on mysql optimization, take a look at the great o'reilly book by jeremy zawodny and derek balling on high performance mysql.

when it's time to scale out rather than up

you can only (and should only) go so far scaling up. at some point you need to scale out. ideally, you want a database clustering solution that allows you to do exactly that. that is, add nodes to your database tier, completely transparently to your application, giving you linear scalability gains with each additional node. mysql cluster promises exactly this. it doesn't offer full transparency however, due to limitations introduced by the ndb storage engine required by mysql cluster. having said that, the technology looks extremely promising and i'm interested if anyone has got a drupal application running successfully on this platform. you can read more on mysql clustering on the mysql cluster website or in the mysql clustering book by alex davies and harrison fisk.

less glamorous alternatives to mysql cluster

without the magic of mysql cluster, we've still got some, admittedly less glamorous, alternatives. one is to use a traditional mysql database cluster, where all writes go to a single master and reads are distributed across several read-only-nodes. the master updates the read-only-nodes using replication.

an alternative is to segment read and write requests by role, thereby partitioning the data into segments, each one resident on a dedicated database.

these two approaches are illustrated below:

there are some significant pitfalls to both approaches:

  • the traditional clustering approach, introduces a replication lag i.e. it takes a non-trivial amount of time, especially under load, for writes to make it back to the read-only-nodes. this may not be problematic for very specific applications, but is problematic in the general case
  • the traditional clustering approach scales only reads, not writes, since each write has to be made to each node.
  • in traditional clustering the total effective size of your memory cache is the size of a single node (since the same data is cached on each node), whereas with segmentation it's the sum of the nodes.
  • in traditional clustering each node has the same hardware optimization pattern, whereas with segmentation, it can be customized according to the role it's playing.
  • the segmentation approach reduces the redundancy of the system, since theoretically a failure of any of the nodes takes your "database" off line. in practice, you may have segments that are non essential e.g. logging. you can, of course, cluster your segments, but this introduces the replication lag issue.
  • the segmentation approach relies on a thorough understanding of the application, and the relative projected load on each segment to do properly.
  • the segmentation approach is fundamentally very limited, since there are a limited number of segments for a typical application.

more thoughts on database segmentation

from one perspective, the use of memcache is a database segmentation technique i.e. it takes part of the load on the database (from caching) and segments this into a specialized and optionally distributed caching "database". there is a detailed step-by-step guide on lullabot on doing this on debian etch with drupal's memcache module.

you can continue this approach on other areas of your database, dedicating several databases to different roles. for example, if one of the functions of your database is to serve as a log, why not segment all log activity onto a single database? clearly, it's important that your segments are distinct i.e. that applications don't need joins or transactions between segments. you may have auxiliary applications that do need complex joins between segments e.g. reporting. this can be easily solved by warehousing the data back into a single database that serves this auxiliary application specifically.

while i'm not suggesting that the next step in your scaling exercise should necessarily be segmentation (this clearly depends on your application and preferences), we're going to explore the idea anyway. it's my blog after all :)

what segmentation technologies to use?

there are several open source tools that you can use to build a segmentation infrastructure. sqlrelay is a popular database-agnostic proxying tool that can be used for this purpose. mysql proxy is, as the name suggests, a mysql specific proxying tool.

in this article i focus on mysql proxy. sqlrelay (partly due to its more general purpose nature) is somewhat difficult to configure, and inherently less flexible than mysql proxy. mysql proxy on the other hand is quick to set up and use. it has a simple, elegant and flexible architecture that allows for a full range of proxying applications, from trivial to uber-complex.

more on mysql proxy

jan kneschke's brainchild, mysql proxy is a lightweight daemon that sits between your client application (apache/modphp/drupal in our case) and the database. the proxy allows you to perform just about any transformation on the traffic, including segmentation. the proxy allows you to hook into 3 actions; connect, query and result. you can do whatever you want to in these steps, manipulating data and performing actions using lua scripts. lua is a fully featured scripting language, designed for high performance. clearly a key consideration in this application. don't worry too much about aFsc (another scripting language). it's easy to pick up. it's powerful and intuitive.

even if you don't intend to segment your databases, you might consider a proxy configuration for other reasons including logging, filtering, redundancy, timing and analysis and query modification. for example, using mysql proxy to implement a hot standby database (replicated) would be trivial.

the mysql site states clearly (as of 09Nov2007); "MySQL Proxy is currently an Alpha release and should not be used within production environments". Feeling lucky?

a word of warning

the techniques described below, including the overall method and the use of mysql proxy, are intended to stimulate discussion. they are not intended to represent a valid production configuration. i've explored this technique purely in an experimental manner. in my example below i segment cache queries to a specific database. i don't mean to imply that this is a better alternative to memcache. it isn't. anyway, i'd love to hear your thoughts on the general approach.

don't panic, you don't really need this many servers

before you get yourself into a panic over the number of boxes i've drawn in the diagram, please bear in mind that this is a canonical network. in reality you could use the same physical hardware for both loadbalancers, or, even better, you could use xen to create this canonical layout and, over time, deploy virtual servers on physical hardware as load necessitated.

down to business - set up and test a basic mysql proxy

o.k., enough of the chatter. let's get down to business and setup a mysql proxy server. first, download and install the latest version of mysql proxy from http://dev.mysql.com/downloads/mysql-proxy/index.html.

tar xvfz mysql-proxy-0.6.0-linux-debian3.1-x86.tar.gz

make sure that your mysql load balancer can access the database on your data server i.e. on your data server, run mysql and enter:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE
TEMPORARY TABLES, LOCK TABLES
ON drupaldb.*
TO drupal@'192.168.1.94' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

check that your load balancer can access the database on your data server i.e. on your load balancer do:

# mysql -e "select * from users limit 1" --host=192.168.1.26 --user=drupal --password=password drupaldb

now do a quick test of the proxy, run the proxy server, pointing to your drupal database server:

./mysql-proxy --proxy-backend-addresses=192.168.1.26 &

and test the proxy:

echo "select * from users" |  mysql --host=127.0.0.1 --port=4040 --user=drupal --password=password drupaldb

now change your drupal install to point at the load balancer, rather than your data server directly i.e. edit your settings.php on your webserver(s) and point your drupal install to the mysql load balancer, rather than at your database server:

$db_url = 'mysql://drupal:password@192.168.1.94:4040/drupaldb';

asking mysql proxy to segment your database traffic

the best way to segment a drupal database depends on many factors, including the modules you use and the custom extensions that you have. it's beyond the scope of this exercise to discuss segmentation specifics, but, as an example, i've segmented the database into 3 segments: a cache server, a log server and a general server (everything else).

to get started segmenting, create two additional database instances (drupal-data-server2, drupal-data-server3), each with a copy of the data from drupal-data-server1. make sure that you GRANT the mysql load balancer permission to access each database as described above.

you'll now want to start up your proxy server, pointing to these instances. below, i give an example of a bash script that does this. it starts up the cluster and executes several sql statements, each one bound for a different member of the cluster, to ensure that the whole cluster has started properly. note that you'd also want to build something similar as a health check, to ensure that the servers keep functioning properly, stopping the cluster (proxy) as soon as a problem is detected.

here's the source for runProxy.sh:

:
BASE_DIR=/home/john
BIN_DIR=${BASE_DIR}/mysql-proxy/sbin

# kill the server if it's running
pkill -f mysql-proxy

# make sure any old proxy instance is dead before firing up the new one
sleep 1

# run the proxy server in the background
${BIN_DIR}/mysql-proxy \
--proxy-backend-addresses=192.168.1.26:3306 \
--proxy-backend-addresses=192.168.1.27:3306 \
--proxy-backend-addresses=192.168.1.28:3306 \
--proxy-lua-script=${BASE_DIR}/databaseSegment.lua &

# give the server a chance to start
sleep 1

# prime the pumps!
# execute some sql statements to make sure that the proxy is running properly
# i.e. that it can establish a connection to the range of servers in question
# and bail if anything fails
for sqlStatement in \
   "select cid FROM cache limit 1" \
   "select nid FROM history limit 1" \
   "select name FROM variable limit 1"
do
   echo "testing query: ${sqlStatement}"
   echo ${sqlStatement} |  mysql --host=127.0.0.1 --port=4040 \
       --user=drupal --password=password drupaldb || { echo "${sqlStatement}: failed (is that server up?)"; exit 1; }
done

you'll notice that this script references databaseSegment.lua. this is a script that uses a little regex magic to map queries to servers. again, the actual queries being mapped serve as examples to illustrate the point, but you'll get the idea. jan has a nice r/w splitting example that can be easily modified to create databaseSegment.lua.

most of the complexity in jan's code is around load balancing (least connections) and connection pooling within the proxy itself. jan points out (and i agree) that this functionality should be made available in a generic load-balancing lua module. i really like the idea of having this in lua scripts to allow others to easily extend it, for example, by adding a round robin alternative. keep an eye on his blog for developments. anyway, for now, let's modify his example and add some defines and a method to do the mapping:

local CACHE_SERVER = 1
local LOG_SERVER = 2
local GENERAL_SERVER = 3

-- select a server to use based on the query text, this will return one of
-- CACHE_SERVER, LOG_SERVER or GENERAL_SERVER
function choose_server(query_text)
   local cache_server_strings = { "FROM cache", "UPDATE cache",
                                  "INTO cache", "LOCK TABLES cache"}
   local log_server_strings =   { "FROM history", "UPDATE history",
                                  "INTO history" , "LOCK TABLES history",
                                  "FROM watchdog", "UPDATE watchdog",
                                  "INTO watchdog", "LOCK TABLES watchdog" }

   local server_table = { [CACHE_SERVER] = cache_server_strings,
                          [LOG_SERVER] = log_server_strings }

   -- default to the general server
   local server_to_use = GENERAL_SERVER

   -- find a server registered for this query_text in the server_table
   for i=1, #server_table do
      for j=1, #server_table[i] do
         if string.find(query_text, server_table[i][j])
         then
            server_to_use = i
            break
         end
      end
   end

   return server_to_use
end

and then call this in read_query:

-- pick a server to use
proxy.connection.backend_ndx = choose_server(query_text)

test your application

now test your application. a good way to see the queries hitting your database servers is to (temporarily) enable full logging on each of them and watch the log. edit /etc/mysql/my.cnf and set:

# Be aware that this log type is a performance killer.
log             = /var/log/mysql/mysql.log

and then:

# tail -f /var/log/mysql/mysql.log

further work

to develop this idea further:
  • someone with better drupal knowledge than me could define a good segmentation structure for a typical drupal application, with the query fragments associated with each segment.
  • additionally, the scripts could handle exceptional situations better e.g. a regular health check for the proxy (a minimal sketch of such a check follows this list).
  • clearly we've introduced another single-point-of-failure in the database load balancer. the earlier discussion of heartbeat applies here.
  • it would be wonderful to bypass all this nonsense and get drupal running on a mysql cluster. i'd love to hear if you've tried it and how it went.
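
to make the health check idea concrete, here's a minimal sketch; the credentials and queries mirror runProxy.sh above and are illustrative only, not a production configuration. you would run it from cron every few minutes:

#!/bin/bash
# proxyHealthCheck.sh - minimal cron-driven health check for the mysql proxy above.
# if any priming query fails, log it and stop the proxy so clients fail fast
# rather than silently hitting a broken segment.

for sqlStatement in \
   "select cid FROM cache limit 1" \
   "select nid FROM history limit 1" \
   "select name FROM variable limit 1"
do
   if ! echo ${sqlStatement} | mysql --host=127.0.0.1 --port=4040 \
        --user=drupal --password=password drupaldb > /dev/null 2>&1
   then
      logger -t proxyHealthCheck -p local0.crit "proxy check failed: ${sqlStatement}"
      pkill -f mysql-proxy
      exit 1
   fi
done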

Nov 11 2007
Nov 11

i got some good feedback on my dedicated data server step towards scaling. kris buytaert in his everything is a freaking dns problem blog points out that nfs creates an unnecessary choke point. he may very well have a point.

having said that, i have run the suggested configuration in a multi-web-server, high-traffic production setting for 6 months without a glitch, and feedback on his blog gives examples of other large sites doing the same thing. for even larger configurations, or if you just prefer, you might consider another method of synchronizing files between your web servers.

kris suggests rsync as a solution, and although luc stroobant points out the delete problem, i still think it's a good, simple solution. see the diagram above.

the delete problem is that you can't simply use the --delete flag on rsync, since in an x->y synchronization, a delete on node x looks just like an addition to node y.

i speculate that you can partly mitigate this issue with some careful scripting, using a source-of-truth file server to which you first pull only additions from the source nodes, and then do another run over the nodes with the delete flag (to remove any newly deleted files from your source-of-truth). unfortunately you can't do the delete run on a live site (due to timing problems if additions happen after your first pass and before your --delete pass), but you can do this as a regularly scheduled maintenance task when your directories are not in flux.

i include a bash script below to illustrate the point. i haven't tested this script, or the theory in general. so if you plan to use it, be careful.

you could call this script from cron on your data server. you could do this, say, every 5 minutes for a smallish deployment. even though this causes a 5 minute delay in file propagation, the use of sticky sessions ensures that users will see files that they create immediately, even if there is a slight delay for others. additionally, you could schedule it with the -d flag during system downtime.
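
for example, a crontab on the data server might look like this (the script path and log file are assumptions; adjust to taste):

# every 5 minutes: addition-only sync
*/5 * * * * /usr/local/bin/synchronizeFiles >> /var/log/synchronizeFiles.log 2>&1

# sunday at 4am, during a quiet maintenance window: sync deletes too
0 4 * * 0 /usr/local/bin/synchronizeFiles -d >> /var/log/synchronizeFiles.log 2>&1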

the viability of this approach depends on many factors including how quickly an uploaded file must be available for everyone and how many files you have to synchronize. this clearly depends on your application.

synchronizeFiles -- a bash script to keep your drupal web server's files directory synchronized

#!/bin/bash

# synchronizeFiles -- a bash script to keep your drupal web server's files directory
#                     synchronized - http://www.johnandcailin.com

# bail if anything fails
set -e

# don't synchronize deletes by default
syncDeletes=false

sourceServers="192.168.1.24 192.168.1.25"
sourceDir="/var/www/drupal/files"
sourceUser="www-data"
targetDir="/var/drupalFiles"

# function to print a usage message and bail
usageAndBail()
{
   echo "Usage syncronizeFiles [OPTION]"
   echo "     -d       synchronize deletes too (ONLY use when directory contents are static)"
   exit 1;
}

# process command line args
while getopts hd o
do     case "$o" in
        d)     syncDeletes=true;;
        h)     usageAndBail;;
        [?])   usageAndBail;;
       esac
done

# do initial addition-only synchronization run from sourceServers to targetServer
for sourceServer in ${sourceServers}
do
   echo "bi directionally syncing files between ${sourceServer} and local"

   # pull any new files to the target
   rsync -a ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}/..

   # push any new files back to the source
   rsync -a ${targetDir} ${sourceUser}@${sourceServer}:${sourceDir}/..
done

# synchronize deletes (only use if directory contents are static)
if test ${syncDeletes} = "true"
then
   for sourceServer in ${sourceServers}
   do
      echo "DELETE syncing files from ${sourceServer} to ${targetDir}"

      # pull any new files to the target, deleting from the source of truth if necessary
      rsync -a --delete ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}
   done
fi

Nov 10 2007
Nov 10
if you felt a waft of cold air when you read the recent highly critical drupal security announcement on arbitrary code execution using install.php, you were right. your bum was hanging squarely out of the window, and you should probably consider beefing up your security.

drupal's default exposure of files like install.php and cron.php presents inherent security risks, for both denial-of-service and intrusion. combine this with critical administrative functionality available to the world, protected only by user defined passwords, broadcast over the internet in clear-text, and you've got potential for some real problems.

fortunately, there are some easy and practical things you can do to tighten things up.

step one: block the outside world from your sensitive pages

one easy way to tighten up your security is to simply block access to your sensitive pages from anyone outside your local network. this can be done using apache's mod_rewrite. for example, you could block access to any administrative page by adding the following into your .htaccess file in your drupal directory (the one containing sites, scripts, modules etc.). the example only allows access from IPs in the range 192.*.*.* or 200.*.*.*:

<IfModule mod_rewrite.c>
  RewriteEngine on

  # Allow only internal access to admin
  RewriteCond %{REMOTE_ADDR} !^(192|200)\..*$
  RewriteRule   ^admin/.*  - [F]
  [...]
</IfModule>
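
if the rules seem to have no effect, the usual suspects are mod_rewrite not being enabled, or your vhost not allowing .htaccess overrides (AllowOverride) for the drupal directory. on debian, for example:

a2enmod rewrite
/etc/init.d/apache2 reload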

step two: tunnel into your server for administrative access

now that you've locked yourself out of your server for remote administrative access, you'd better figure out how to get back in. SOCKS-proxy and ssh-tunneling to the rescue! assuming that your server is running an ssh server, set up an ssh tunnel (from the machine you are browsing on) to your server as follows:

ssh -D 9999 username@yourserver.mydomain.com

now go to your favorite browser and proxy your traffic through a local ssh SOCKS proxy e.g. on firefox 2.0 on windoze do the following:
  1. select the tools->options (edit->preferences on linux) menu
  2. go to the "connections" section of the "network" tab, click "settings"
  3. set the SOCKS host to localhost port 9999
now simply navigate to your site and administer, safe in the knowledge that not only is your site's soft-underbelly restricted to local users, but all your traffic (including your precious admin password) is encrypted in transit.

your bum should be feeling warmer already.
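
if you want to double-check both halves of this setup, a couple of curl requests will do it; the URL is a placeholder and the --socks5-hostname option assumes a reasonably recent curl:

# from an outside machine, apache should refuse admin pages outright (403 forbidden)
curl -s -o /dev/null -w "%{http_code}\n" http://www.mydomain.com/admin/settings

# through the ssh SOCKS tunnel, the request reaches drupal instead of being blocked by apache;
# you should see drupal markup rather than apache's short "Forbidden" page
curl -s --socks5-hostname localhost:9999 http://www.mydomain.com/admin/settings | head -n 20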

some more rules

some other rules that you might want to consider include (RewriteCond omitted for brevity):

# allow only internal access to node editing
RewriteRule   ^node/.*/edit.*  - [F]

# allow only internal access to sensitive pages
RewriteRule   ^update.php  - [F]
RewriteRule   ^cron.php  - [F]
RewriteRule   ^install.php  - [F]

debugging

can't get your rewrite rules to work? shock! ... consider adding this to your vhost configuration (e.g. /etc/apache2/sites-available/default) to see what (the hell) is going on.

RewriteLog /var/log/apache2/vhost.rewrite.txt
RewriteLogLevel 3

thanks

thanks to curtis (madman) hilger and paul (windows is not your friend) lathrop for help with this.

Oct 29 2007
Oct 29
the authors of drupal have paid considerable attention to performance and scalability. consequently even a default install running on modest hardware can easily handle the demands of a small website. my four year old pc in my garage, running a full lamp install, will happily serve up 50,000 page views in a day, providing solid end-user performance without breaking a sweat.

when the time comes for scalability - moving out of the garage

if you are lucky, eventually the time comes when you need to service more users than your system can handle. your initial steps should clearly focus on getting the most out of the built-in drupal optimization functionality, considering drupal performance modules, optimizing your php (including considering op-code caching) and working on database performance. John VanDyk and Matt Westgate have an excellent chapter on this subject in their new book, "pro drupal development".

once these steps are exhausted, inevitably you'll start looking at your hardware and network deployment.

a well designed deployment will not only increase your scalability, but will also enhance your redundancy by removing single points of failure. implemented properly, an unmodified drupal install can run on this new deployment, blissfully unaware of the clustering, routing and caching going on behind the scenes.

incremental steps towards scalability

in this article, i outline a step-by-step process for incrementally scaling your deployment, from a simple single-node drupal install running all components of the system, all the way to a load balanced, multi node system with database level optimization and clustering.

since you almost certainly don't want to jump straight from your single node system to the mother of all redundant clustered systems in one step, i've broken this down into 5 incremental steps, each one building on the last. each step along the way is a perfectly viable deployment.

tasty recipes

i give full step-by-step recipes for each deployment, that with a decent working knowledge of linux, should allow you to get a working system up and running. my examples are for apache2, mysql5 and drupal5 on debian etch, but may still be useful for other versions / flavors.

note that these aren't battle-hardened production configurations, but rather illustrative minimal configurations that you can take and iterate to serve your specific needs.

the 5 deployment configurations

the table below outlines the properties of each of the suggested configurations:

                                   step 0   step 1   step 2   step 3   step 4   step 5
separate web and db                no       yes      yes      yes      yes      yes
clustered web tier                 no       no       yes      yes      yes      yes
redundant load balancer            no       no       no       yes      yes      yes
db optimization and segmentation   no       no       no       no       yes      yes
clustered db                       no       no       no       no       no       yes
scalability                        poor-    poor     fair     fair     good     great
redundancy                         poor-    poor-    fair     good     fair     great
setup ease                         great    good     good     fair     poor     poor-
in step 0, i outline how to install drupal, mysql and apache to get a basic drupal install up-and-running on a single node. i also go over some of the basic configuration steps that you'll probably want to follow, including cron scheduling, enabling clean urls, setting up a virtual host etc.
in step 1, i go over a good first step to scaling drupal; creating a dedicated data server. by "dedicated data server" i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment.
in step 2, i go over how to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment.
in step 3, i discuss clustering your load balancer. one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the loadbalancer scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy.

in steps 4 and 5, i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling.


the holy grail of drupal database scaling might very well be a drupal deployment on mysql cluster. if you've tried this, plan to try this or have opinions on the feasibility of an ndb "port" of drupal, i'd love to hear it.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Oct 21 2007
Oct 21

if you've set up a clustered drupal deployment (see scaling drupal step two - sticky load balancing with apache mod_proxy), a good next step is to cluster your load balancer.

one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the load balancer's scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:

scalability: fair
redundancy: good
ease of setup: fair

servers

in this example, i use:

web server: drupal-lb1.mydomain.com (192.168.1.24)
web server: drupal-lb2.mydomain.com (192.168.1.25)
data server: drupal-data-server1.mydomain.com (192.168.1.26)
load balancer: apache-balance-1.mydomain.com (192.168.1.34)
load balancer: apache-balance-2.mydomain.com (192.168.1.35)
load balancer cluster: apache-balance-cluster.mydomain.com (192.168.1.51)

network diagram


install and setup heartbeat on load balancers

set up two load balancers, apache-balance-1 and apache-balance-2, as described in my previous blog. install heartbeat and its dependencies on both:

# apt-get install heartbeat-2

configure /etc/ha.d/ha.cf (see http://www.linux-ha.org/GettingStarted for more info) identically on each of the load balancers as follows:

logfile /var/log/ha-log
bcast eth0
keepalive 2
warntime 10
deadtime 30
initdead 120
udpport 694
auto_failback yes
node apache-balance-1
node apache-balance-2
uuidfrom nodename
respawn hacluster /usr/lib/heartbeat/ipfail

note:
  • the nodenames must be the output of uname -n
  • i had problems with my nodenames flapping; using uuidfrom fixed this.

configure /etc/ha.d/haresources. this must be identical on apache-balance-1 and apache-balance-2. really! this file should look like:

apache-balance-1 192.168.1.51 apache2

note:
  • apache-balance-1 here refers to the "preferred" host for the service
  • 192.168.1.51 is the vip that your load balancer will appear to be on
  • apache2 here refers specifically to the name of the script in the directory /etc/init.d

configure /etc/ha.d/authkeys on both load balancers. if you're paranoid, see more secure options in "configuring authkeys" here. this file should look like:

auth 2
2 crc

set authkeys permissions on both load balancers:

# chmod 600 /etc/ha.d/authkeys

configure apache to listen on the vip. edit /etc/apache2/ports.conf on both load balancers:

Listen 192.168.1.51:80

note: after this change, apache won't start on the load balancer in question, unless it has the vip. relax. that's as it should be.

theoretically you should configure each load balancer to stop apache2 starting on boot. this allows the ha daemon to take full control of starting and stopping apache. in practice i didn't need to. you might want to.
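if you do decide to take apache2 out of the boot sequence, one way to do it on debian (a small aside of mine, not part of the original recipe) is:

# update-rc.d -f apache2 remove

update-rc.d apache2 defaults puts it back if you change your mind.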

restart the ha daemons and test

restart the ha daemon on both load balancers and test:

# /etc/init.d/heartbeat restart

keep an eye on the apache and heartbeat logfiles on both servers to see what is going on when you shut either load balancer down.

# tail -f /var/log/apache2/access.log
# tail -f /var/log/ha-log
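it's also handy to know which load balancer currently holds the vip. heartbeat brings it up as an extra address on eth0, so a quick check on either balancer (a suggestion of mine; interface naming may differ on your boxes) is:

# ifconfig | grep 192.168.1.51

the node that owns the vip will show the address; the standby won't.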

final word

this is a fairly simplistic configuration. there is more you can do on detecting abnormal situations and failing over. for more information, visit http://www.linux-ha.org

Oct 21 2007
Oct 21

if you've set up your drupal deployment with a separate database and web (drupal) server (see scaling drupal step one - a dedicated data server), a good next step is to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

one way to do this is to use a dedicated web server running apache2 and mod_proxy / mod_proxy_balancer to load balance your drupal servers.

deployment overview

this table summarizes the characteristics of this deployment choice:

scalability: fair
redundancy: fair
ease of setup: fair

servers

in this example, i use:

web server: drupal-lb1.mydomain.com (192.168.1.24)
web server: drupal-lb2.mydomain.com (192.168.1.25)
data server: drupal-data-server1.mydomain.com (192.168.1.26)
load balancer: apache-balance-1.mydomain.com (192.168.1.34)

network diagram


load balancer setup: install and enable apache and proxy_balancer

create a dedicated server for load balancing. install apache2 (apt-get install apache2) and then enable mod_proxy_balancer and mod_proxy_http (and their dependencies):

# a2enmod proxy_balancer
# a2enmod proxy_http

configure mod_proxy in mods-available/proxy.conf (and make sure the proxy module itself is enabled, e.g. a2enmod proxy). note that i'm leaving ProxyRequests off since we're only using the ProxyPass and ProxyPassReverse directives. this keeps the server secure from spammers trying to use your proxy to send email.

<IfModule mod_proxy.c>
        # set ProxyRequests off since we're only using the ProxyPass and ProxyPassReverse
        # directives. this keeps the server secure from
        # spammers trying to use your proxy to send email.

        ProxyRequests Off

        <Proxy *>
                AddDefaultCharset off
                Order deny,allow
                Allow from all
                #Allow from .example.com
        </Proxy>

        # Enable/disable the handling of HTTP/1.1 "Via:" headers.
        # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
        # Set to one of: Off | On | Full | Block

        ProxyVia On
</IfModule>

configure mod_proxy and mod_proxy_balancer

mod_proxy and mod_proxy_balancer serve as a very functional load balancer. however, mod_proxy_balancer makes slightly unfortunate assumptions about the format of the cookie that you'll use for sticky session handling. one way to work around this is to create your own session cookie (very easy with apache). the examples below describe how to do this.

first create a virtual host or use the default (/etc/apache2/sites-available/default) and add this configuration to it:

<Location /balancer-manager>
SetHandler balancer-manager

Order Deny,Allow
Deny from all
Allow from 192.168
</Location>

<Proxy balancer://mycluster>
  # cluster member 1
  BalancerMember http://drupal-lb1.mydomain.com:80 route=lb1

  # cluster member 2
  BalancerMember http://drupal-lb2.mydomain.com:80 route=lb2
</Proxy>

ProxyPass /balancer-manager !
ProxyPass / balancer://mycluster/ lbmethod=byrequests stickysession=BALANCEID
ProxyPassReverse / http://drupal-lb1.mydomain.com/
ProxyPassReverse / http://drupal-lb2.mydomain.com/

note:
  • i'm allowing access to the balancer manager (the web UI) from any IP matching 192.168.*.*
  • i'm load balancing between 2 servers (drupal-lb1.mydomain.com, drupal-lb2.mydomain.com) on port 80
  • i'm defining two routes for these servers called lb1 and lb2
  • i'm excluding (!) the balancer-manager directory from the ProxyPass to allow access to the manager ui on the load balancing server
  • i'm expecting a cookie called BALANCEID to be available to manage sticky sessions
  • this is a simplistic load balancing configuration. apache has many options to control timeouts, server loading, failover etc. too much to cover but read more in the apache documentation

configure the web (drupal) servers to write a session cookie

on each of the web (drupal) servers, add this code to your vhost configuration:

RewriteEngine On
RewriteRule .* - [CO=BALANCEID:balancer.lb1:.mydomain.com]

making sure to specify the correct route e.g. lb1 on drupal-lb1.mydomain.com etc.
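for example, on drupal-lb2.mydomain.com the equivalent snippet (mirroring the lb1 rule above) would be:

RewriteEngine On
RewriteRule .* - [CO=BALANCEID:balancer.lb2:.mydomain.com]

remember that mod_rewrite needs to be enabled on each web server (a2enmod rewrite) for these rules to take effect.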

you also probably want to setup your cookie domain properly in drupal, i.e. modify drupal/sites/default/settings.php as follows:

# $cookie_domain = 'example.com';
$cookie_domain = 'mydomain.com';

important urls

useful urls for testing are your drupal site through the load balancer (http://apache-balance-1.mydomain.com/) and the balancer manager ui (http://apache-balance-1.mydomain.com/balancer-manager, as configured above).

the balancer manager

the mod_proxy_balancer ui enables point-and-click update of balancer members.

the balancer manager allows you to dynamically change the balance factor of a particular member, change its route or put it in offline mode.

debugging

to debug your configuration it's useful to turn up apache's debugging level on your apache load balancer by adding this to your vhost configuration:

LogLevel debug

this will produce some very useful debugging output (/var/log/apache2/error.log) from the proxying and balancing code.
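besides the logs, a quick request from the command line (assuming you have wget handy; any http client will do) shows whether the balancer is answering and proxying at all:

# wget -S -O /dev/null http://apache-balance-1.mydomain.com/

watch the backend access logs while you do this to see which web server actually handled the request.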

firefox's cookie viewer (tools->options->privacy->show cookies) is also useful to view and manipulate your cookies.

if you plan to experiment with bringing servers up and down to test them being added and removed from the cluster you should consider setting the "connection pool worker retry timeout" to a value lower than the default 60s. you could set them to e.g. 10s by changing your configuration to the one below. a 10s timeout allows for quicker test cycles.

BalancerMember http://drupal-lb1.mydomain.com:80 route=lb1 retry=10
BalancerMember http://drupal-lb2.mydomain.com:80 route=lb2 retry=10

next steps

one single-point-of-failure in this deployment is the apache load balancer. consider clustering your load balancer with scaling drupal step three - using heartbeat to implement a redundant load balancer

Oct 13 2007
Oct 13

if you've already installed drupal on a single node (see easy-peasy-lemon-squeezy drupal installation on linux), a good first step to scaling a drupal install is to create a dedicated data server. by dedicated data server i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment. here's how you can do it. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:

scalability: poor
redundancy: poor
ease of setup: good

servers

in this example, i use:

web server: drupal-lb1.mydomain.com (192.168.1.24)
data server: drupal-data-server1.mydomain.com (192.168.1.26)

update

  • the recipe below uses nfs; you might want to consider using rsync as an alternative. see the discussion in step one B -- john, 11 nov 2007
  • if you plan to run with a single webserver for a while, you can skip the nfs / rsync malarkey until step 2 -- john, 11 nov 2007

data server: setup mysql and prepare it for remote access

install mysql. i use mysql5. you'll need to enable this for remote access. edit /etc/mysql/my.cnf and change the bind address to your local server address e.g.

# bind-address          = 127.0.0.1
bind-address            = 192.168.1.26

now allow access to your database from your web (drupal) servers. run mysql and do:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE
TEMPORARY TABLES, LOCK TABLES
ON drupaldb.*
TO 'drupal'@'drupal-lb1.mydomain.com' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

restart mysql:

# /etc/init.d/mysql restart
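a quick check that mysql is now listening on the lan address rather than just loopback (netstat comes with debian's net-tools):

# netstat -ltn | grep 3306

you should see 192.168.1.26:3306 in the output.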

data server: setup a shared nfs partition

moving drupal's file data (node attachments etc) to an nfs server does two things. it allows you to manage all your important data on a single server, simplifying backups etc. it also paves the way for web server clustering, where clearly it doesn't make sense to write these files onto random web servers in the cluster.

install the nfs server:

apt-get install nfs-kernel-server

make a directory to share:

# mkdir -p /var/drupalSharedFiles/files
# chown www-data.www-data /var/drupalSharedFiles/files

share this directory using nfs by adding this line to /etc/exports:

/var/drupalSharedFiles 192.168.1.0/24(rw,sync)

now share it:

# exportfs -a
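you can confirm the export took effect with showmount (it ships with the nfs packages installed above):

# showmount -e localhost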

web (drupal) server: install drupal and point it to your data server

install drupal on your web server (see the drupal installation recipe for specifics). make sure that it can connect to your database server. you can verify database connectivity using:

# mysql -e "select * from users limit 1" --user=drupal --host=drupal-data-server1.mydomain.com --password=password drupaldb

if this doesn't work, go back to the instructions above and make sure you did your binding and granting properly.
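one quick way to double-check the grant on the data server itself (assuming you have mysql root access there):

# mysql -u root -p -e "SHOW GRANTS FOR 'drupal'@'drupal-lb1.mydomain.com'"

you should see the GRANT statement issued earlier echoed back.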

now edit your sites/default/settings.php to point it to your new data server e.g.

$db_url = 'mysql://drupal:password@drupal-data-server1.mydomain.com/drupaldb';

hit your drupal site and make sure that it sees the new database.

next, mount the area for shared files. add this to the /etc/fstab file:

192.168.1.26:/var/drupalSharedFiles /var/drupalSharedFiles nfs rw,hard,intr,async,users,noatime 0 0

make sure that you've got the portmapper installed on the client; you'll need this for a performant nfs mount. if you don't have it:

# apt-get install portmap

and do a:

# mount -a
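a quick way to confirm the share actually mounted:

# df -h /var/drupalSharedFiles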

next, change the local files directory to point to your remote one. cd to your drupal directory (probably /var/www/drupal) and:

# cp -a files/.htaccess /var/drupalSharedFiles/files
# rm files/.htaccess ; rmdir files
# ln -s /var/drupalSharedFiles/files/ .

it's a good idea to verify that drupal is happy with this new arrangement by visiting the status report, e.g. by hitting http://drupal-lb1.mydomain.com/drupal/?q=admin/logs/status and making sure that it sees your nfs area as writable. you can also just upload an attachment and see what happens.
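you can also test writability from the shell before involving drupal (a quick sanity check using the paths from this example; it assumes www-data has the same uid on both machines, which it does on a stock debian install):

# su www-data -s /bin/sh -c "touch /var/drupalSharedFiles/files/nfs-test && rm /var/drupalSharedFiles/files/nfs-test"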

you should be all set.

next steps

got all that working? want more scalability and redundancy? consider clustering your drupal servers with step two - sticky load balancing with apache mod proxy

Oct 13 2007
Oct 13

installing drupal is pretty easy, but it's even easier if you have a step by step guide. i've written one that will produce a basic working configuration with drupal5 on debian etch with php5, mysql5 and apache2. it might be a help on other configurations too. see the scalability overview for related articles.

my examples assume that you are bold (or stupid) enough to do this as root. if you are a scaredy-cat or just plain sensible, make sure you use sudo. there. i said it.

this recipe assumes that you want to install everything on the same server. if you want to install your database on another server, skip the database install below and instead follow the instructions for a separate data server. i recommend just following the instructions below and then moving off the data server later, to keep things simple.

install the dependencies

i used php5, mysql5 and apache2. i don't know what you've got installed. here's a set of dependencies that worked for me:

ii  apache2.2-common       2.2.3-3.3     Next generation, scalable, extendable web se
ii  libapache2-mod-php5    4.4.4-8       server-side, HTML-embedded scripting languag
ii  libmysqlclient15off    5.0.32-3      mysql database client library
ii  mysql-client-5.0       5.0.32-3      mysql database client binaries
ii  mysql-common           5.0.32-3      mysql database common files (e.g. /etc/mysql
ii  mysql-server           5.0.32-3      mysql database server (meta package dependin
ii  mysql-server-5.0       5.0.32-3      mysql database server binaries
ii  php5-common            4.4.4-8       Common files for packages built from the php
ii  php5-mysql             4.4.4-8       MySQL module for php5

in addition, to get the graphics stuff working, you'll want to:

# apt-get install php5-gd

download and extract drupal

go to http://www.drupal.org and download the latest install, e.g. http://ftp.drupal.org/files/projects/drupal-5.3.tar.gz. now cd to /var/www and wget and untar it, e.g.:

# wget http://ftp.drupal.org/files/projects/drupal-5.3.tar.gz
# tar xvfz drupal-5.3.tar.gz
# mv drupal-5.3 drupal

at this point you should have a directory called /var/www/drupal with lots of good stuff in it.

set the permissions on these files appropriately:

# chown -R www-data.www-data /var/www/drupal

fancy footwork

i had to do the following to get my install to work. you might too. don't do this if you don't need to. edit /etc/php5/apache2/php.ini and uncomment the line extension=mysql.so

setup mysql

in my example, i created a database called drupaldb, a user called drupal and a password of 'password'. you might want to too.

to create the db, as root do:

# mysqladmin -p create drupaldb

just hit return at the pwd prompt.

now set up permissions on the db. as root do:

# mysql
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE
TEMPORARY TABLES, LOCK TABLES
ON drupaldb.*
TO 'drupal'@'localhost'  IDENTIFIED BY 'password';

FLUSH PRIVILEGES;

point drupal at your new database

edit the settings file /var/www/drupal/sites/default/settings.php to point drupal at your new database:

$db_url = 'mysql://drupal:password@localhost/drupaldb';

finish the setup with the web tool

go to your browser and hit http://localhost/drupal, you will get redirected to http://localhost/drupal/install.php?profile=default, the install and db config page. follow the instructions. type in user=drupal, database=drupaldb, password=password. go on to create the initial (super admin) user and you are good to go.

creating a cron entry (optional)

you'll also want to add a cron entry (you'll need wget installed, apt-get install wget) like the following. it doesn't matter what user. you could put it in the root crontab (crontab -e as root):

45 * * * * /usr/bin/wget -O - -q http://localhost/drupal/cron.php > /dev/null 2>&1

hitting cron.php allows drupal to do its housekeeping, including keeping your site content-index up-to-date and trimming database logs.

enabling clean urls in apache (optional)

it's nice to enable drupal's clean urls. first, allow apache to honor the .htaccess in the drupal directory by adding the following to your file in /etc/apache2/sites-available/, e.g. /etc/apache2/sites-available/default:

<Directory /var/www/drupal>
   Options -Indexes +FollowSymLinks +MultiViews
   AllowOverride All
   Order allow,deny
   allow from all
</Directory>

now make sure that mod-rewrite is enabled:

# a2enmod rewrite

don't forget to reload/restart apache:

# /etc/init.d/apache2 force-reload

now go to the URL http://localhost/drupal/?q=admin/settings/clean-urls, take the test (hopefully you'll pass) and, when you do, turn on clean urls.

memory limits (optional, but might save you some pain)

i had a problem with drupal administration (and some other) pages intermittently showing as completely blank. this is often a php memory issue. check your apache error log (/var/log/apache2/error.log) to know for sure. adding the line:

ini_set('memory_limit', '12M');

to the settings file /var/www/drupal/sites/default/settings.php will fix the issue. you might want to add more than 12M.
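if you'd rather raise the limit globally than per site, the same setting lives in php.ini (the path used earlier in this recipe); find the current value, bump it there and restart apache:

# grep -n memory_limit /etc/php5/apache2/php.ini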

setting up an apache virtual host (optional)

it's nice to set up an apache virtual host for your drupal site. this allows you to create custom logging, remove the /drupal/ from your urls and nicely encapsulate the directives for drupal. here's how you can do it.

create a virtual host specification in /etc/apache2/sites-available called myserver.mydomain.com that looks something like this:

<VirtualHost *>
   ServerName myserver.mydomain.com
   DocumentRoot /var/www/drupal
        <Directory />
                Options -Indexes +FollowSymLinks +MultiViews
                AllowOverride All
                Order allow,deny
                allow from all
        </Directory>
        ErrorLog /var/log/apache2/error.log

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn

        CustomLog /var/log/apache2/access.log combined
</VirtualHost>

now go ahead and enable the new site:

# a2ensite myserver.mydomain.com

you'll probably want to disable the default site:

# a2dissite default
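before restarting, it's worth confirming that the new vhost parses cleanly:

# apache2ctl configtest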

and then restart apache:

# /etc/init.d/apache2 restart

sounds like a lot of work?

if you can't be bothered doing all this configuration and you've got xen set up, ask me nicely and i'll send you a pre-configured xen machine with this base install on it. if you don't have a xen configuration, consider setting one up. i've got another quick and easy recipe for xen on linux.

scalability and redundancy

this recipe produces an installation that is neither scalable nor redundant. for information on solving this, see my scalability blog: scaling drupal - an open-source infrastructure for high-traffic drupal sites

May 03 2006
May 03

One of the great features of Drupal is its ability to run any number of sites from one base installation, a feature generally referred to as multisites. Creating a new site is just a matter of creating a settings.php file and (optionally) a database to go with your new site. That's it. More importantly, there's no need to set up complicated Apache virtual hosts, which are a wonderful feature of Apache, but can be very tricky and tedious, especially if you're setting up a large number of subsites.

The catch is that all of your sites then share a single Apache access log, which makes per-site log analysis awkward. No worries, there is a solution.

Create a new LogFormat

Copy the LogFormat of your choice, prepend the HTTP host field, and give it a name:

LogFormat "%{Host}i %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" vcombined 

Get the script

Next, download the attached script (split-logfile) and store it somewhere like /usr/bin (don't forget to chmod 755 that baby!)
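For example, assuming you dropped it into /usr/bin:

chmod 755 /usr/bin/split-logfile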

Now, tell apache to pipe logfiles to your script rather than writing them directly to disk:

CustomLog "| /usr/bin/split-logfile" vcombined 

Restart Apache

/etc/rc.d/init.d/httpd restart

That's it.

Naturally, you may have to modify split-logfile if you don't store your logfiles in the default location.
