Sep 28 2012

After decompressing for a couple of weeks from Drupalcon Munich, some observations have filtered to the surface. This was my 10th Drupalcon, which is hard to believe. I've seen changes to the demographics of the convention that I think are worth sharing.

Our age

Back in 2007 in Barcelona, the average age of the convention goer really seemed to be about 28. We had some attendees who were older (like me) and some who were younger, but by and large we were looking at a group of folks who were not at the beginning of their careers, but certainly not elder statesmen either.

Five years later, the average age seems to be around 36-37. Again, some younger and some older, but the fascinating thing is that, as a group, we seem to be aging faster than one year per year: eight or nine years in five. This seems to indicate that we are attracting professionals who are further along in their careers.

Is this good or bad?

Well, it cuts both ways. Older community members bring stability and wisdom to the project. Still, like any population, you want enough young folks coming into the fold to replace those who leave. Attrition is a killer, and open source software projects like ours can suffer from the equivalent of "HR attrition". Attrition in our community occurs when:

  1. People burn out and stop contributing
  2. People move to another technology
  3. People die

This is natural and to be expected. The challenge is replacing those individuals. They cover the full gamut of workers, ranging from coders, to themers, to site builders, to product managers, to project managers, to business developers, to executives.

If we are attracting more mature people to our community, we need to do more to attract younger people into the community. This is critical for the sustainability of the project as a whole.

Our gender

We're doing pretty well on this count. Back in 2007 in Barcelona, the breakdown seemed to be about 3% women and 97% men. The Drupalchix Meetup at the Con in Barcelona was about twelve people. In other words, women were so underrepresented it was almost absurd. According to Geek Feminism, in 2007 open source had only 1.5% representation from women. The technology industry as a whole has 10-30% representation by women.

At the time of Drupalcon Munich, the percentage of women was up to 17% in the Drupal project. The "t-shirt report" in Munich showed 79% men and 11% women in attendance. This isn't scientific, but it shows that no fewer than 11% of attendees were women. This is still not good enough. We need to continue to attract women to the project.

Our growth

If Drupalcons are a slice of the community as a whole, then we have flatlined in our growth. The last couple of American Drupalcons have had very little growth in the number of attendees. In Europe, there continues to be modest growth. Some have argued that this is due to venue size, but my sense, given the shape of the ticket countdown curve, is that we just couldn't sustain larger numbers.

[Charts: Drupalcon Attendance; Growth in Drupalcon Attendance]

This points back to our need to more effectively recruit new people.

Recently, on the Drupal Marketing Group, I wrote about our need to diversify: as a community, we need to embrace all our cultures. By extension we need to recruit more youth. We need to continue to be inclusive of women. We need to reflect, demographically, the population we want to serve. It should be diverse in age, color, and gender. There is still a lot of work to be done.

Nov 11 2011

As DrupalCon Denver draws nearer each week, the excitement in the community grows. At Trellon, we're not immune to that excitement, and we're proud to be a Platinum sponsor of Drupalcon Denver. We're looking forward to seeing members of the Drupal community there, both old friends and new. We're excited about the opportunity that Drupalcons give us not only to learn more about the direction of Drupal, but to help shape it. We're eager to learn what new and exciting things people around the world have been doing with Drupal. We want to hear about the great things you've built with Drupal, and we'd be happy to talk about what we've been up to lately, too.

We're also excited about sharing things that we've learned with you. As a result, we've put together proposals for a wide range of sessions.

Until midnight on Monday, November 14, you can head over to the Drupalcon site and vote for any of these that you'd like to attend.

We are already looking forward to meeting all of you in person at DrupalCon Denver. Until Denver!

Aug 25 2011

Note: I am hosting a BoF at DrupalCon London about this: Join us in Room 333 on Thursday 25th August from 11:00 - 12:00 (second half).

Introduction

Varnish is a fast, really fast reverse proxy and a dream for every web developer. It transparently caches images, CSS / JavaScript files and content pages, and delivers them blazingly fast without much CPU usage. Apache, on the other hand - the most widely used web server - can be a real bottleneck and consume a lot of CPU even when just serving static HTML pages.

So that sounds like the perfect solution for our old Drupal 6 site here, right? (Or our new Drupal 7 site.)

We just add Varnish and the site is fast (for anonymous users) ...

The perfect solution?

But if you do just that you'll be severely disappointed, because Varnish does not work with Drupal 6 out of the box and, even with Drupal 7, you can run into problems with contrib modules. You also need to install an extra Varnish module and learn VCL. And if a module uses $_SESSION for anonymous users, you have to track it down and fix it, because by default Varnish will not cache a page if it sees any cookie. The reason is that Varnish can't know whether the output depends on the cookie, and for Drupal's SESSION cookie it really does (logged-in users see different content from logged-out ones). As a result, those pages are not cached at all, and on a stock (non-Pressflow) Drupal installation that is true for every page.
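
The default behavior described above comes from Varnish's built-in request handling. As a simplified excerpt, assuming Varnish 3.x syntax (other versions name some of these actions differently), the relevant part looks roughly like this:

```vcl
sub vcl_recv {
    # Simplified excerpt of the built-in logic (Varnish 3.x syntax assumed).
    if (req.http.Authorization || req.http.Cookie) {
        # Any cookie at all makes the request uncacheable by default,
        # so it is passed straight through to the backend.
        return (pass);
    }
    return (lookup);
}
```

Because a stock Drupal 6 hands every visitor a session cookie, every request ends up in the `pass` branch.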

So Varnish is just for experts then? Okay, we go with just Boost then and forget about Varnish. Boost just takes a simple installation and some .htaccess changes to get up and running. And we'll just add more Apache machines to take the load. (10 machines should suffice - no?)

Not any longer! Worry no more: Here comes the ultimate drop-in Varnish configuration (based on the recent Lullabot configuration) that you can just add. With minimal changes, it'll work out of the box.

That means that if you have Boost running successfully and can change your Varnish configuration (and install Varnish on some server), you can run Varnish, too.

How to Boost your site with Varnish

And here are the very simple steps to upgrade your site from Boost to Boosted Varnish.

1. Download the Varnish configuration here: http://www.trellon.com/sites/default/files/boosted-varnish.vcl_.txt
2. Install and configure Boost (follow README.txt or see the documentation on the Boost project page)
3. Set Boost to aggressively set its Boost cookie
4. Set up Apache to listen on port 8080
5. Set up Varnish to listen on port 80
6. Replace default.vcl with boosted-varnish.vcl
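
For steps 4 and 6, Varnish has to know where Apache now lives. A minimal sketch of the backend declaration, assuming Apache runs on the same machine as Varnish (the 127.0.0.1 address is an assumption; boosted-varnish.vcl may already contain an equivalent block that only needs adjusting):

```vcl
# Assumption: Apache listens on the same host, on port 8080 (step 4).
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
```

The port Varnish itself listens on (step 5) is not set in VCL but with the -a option of varnishd, for example -a :80. How that option is passed depends on your distribution's startup script.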

Now we need to tweak the configuration a little:

There is a field in Boost where you can configure pages that should not be boosted. We want to make sure those pages don't cache in Varnish either.

In Boost this will just be a list like:

user
user/*
my-special-page

In Varnish we have to translate this to a regexp. Find the following block in the configuration and adapt it:

##### BOOST CONFIG: Change this to your needs
       # Boost rules from boost configuration
       if (!(req.url ~ "^/(user$|user/|my-special-page)")) {
         unset req.http.Cookie;
       }
##### END BOOST CONFIG: Change this to your needs
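
As a hedged illustration of the translation, here is the same block for a longer exclusion list. The paths cart and checkout/* are hypothetical and only show the pattern: an exact path gets a trailing $, and a trailing /* becomes a plain / prefix.

```vcl
# Hypothetical Boost exclusion list:
#   user
#   user/*
#   cart
#   checkout/*
#   my-special-page
if (!(req.url ~ "^/(user$|user/|cart$|checkout/|my-special-page)")) {
    # Not an excluded path: drop the cookies so the page can be cached.
    unset req.http.Cookie;
}
```

Two things to keep in mind: an entry without $ (like my-special-page here) also matches longer URLs that merely start with it, and an entry with $ does not match the same path followed by a query string.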

And that's it. Now Varnish will cache all boosted pages for at least one hour and work exactly like Boost - only much faster and much more scalable.

On one site we worked on, a page request took 4s under high load, and we brought this down to 0.17s.

The only caveat to be aware of here is that pages are cached for at least one hour, so there is an hour of delay until content appears for anonymous users. But this can be set to 5 min, too, and you'll still profit from the Varnish caching. In general this setting is similar to the Minimum Cache Lifetime setting found in Pressflow.

The code line to change in boosted-varnish.vcl is:

##### MINIMUM CACHE LIFETIME: Change this to your needs
    # Set how long Varnish will keep it
    set beresp.ttl = 1h;
##### END MINIMUM CACHE LIFETIME: Change this to your needs

Even 5 min of minimum caching time gives tremendous scalability improvements.
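
For the 5 min variant, the line between the MINIMUM CACHE LIFETIME markers would change as follows (a fragment, assuming nothing else in the configuration overrides the TTL):

```vcl
# Keep cached objects for five minutes instead of one hour.
set beresp.ttl = 5m;
```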

In fact, with this technique I can instantly make any site on the internet that runs Boost much, much faster: I just set the backend to the site's IP, set the hostname in the VCL, and my IP address serves those pages. So you could even share one Varnish server instance for all of your sites and those of your friends, too. I experimented with EC2 micro instances and it worked, but for any serious site you should get at least a small one. I'll save the details for another blog post, though, if there is interest in exploring this further.
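
A rough sketch of what "set the backend to the IP, set the hostname" could look like, assuming Varnish 3.x syntax. The backend name, the address 192.0.2.10 and the hostname www.example.com are placeholders, not values from the actual setup:

```vcl
# Hypothetical remote site running Boost.
backend remote_site {
    .host = "192.0.2.10";   # placeholder address
    .port = "80";
}

sub vcl_recv {
    # Ask the remote server for the right virtual host,
    # whatever name the client used to reach this Varnish.
    set req.http.Host = "www.example.com";
    set req.backend = remote_site;
}
```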

How and Why it works

The idea of this configuration is quite simple.

Boost is a solution that works well with many, many contrib modules out of the box. With Varnish you need to use Pressflow or Drupal 7, and you need to make sure no contrib modules open sessions needlessly, which can be quite a hassle to track down. (Check out varnish_debug to make this task easier: http://drupal.org/sandbox/Fabianx/1259074)

But Boost's behavior and rules can be emulated in Varnish: wherever Boost would serve a static HTML page, Varnish could just as well serve a static object out of its cache.

And the property that is distinguishing between boosted and non-boosted pages is the DRUPAL_UID cookie set by Boost.

The cookies (and thus the anonymous SESSION) are removed whenever Boost would have served a static HTML page. In that case Drupal would never have seen those cookies in the first place, so we can safely remove them.

The second thing to prevent Drupal from needlessly creating session after session is a very simple rule:

If a SESSION was not sent to the webserver, do not send a SESSION to the client. If a SESSION was sent to the webserver, return the SESSION to the client. So SESSION cookies will only be set on pages that are excluded from caching in Varnish, like the user/login pages or POST requests (e.g. forms). As Drupal has the pre-existing SESSION cookie, it does not need to create a new SESSION.
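
In VCL this rule could be sketched as below. This is not taken from boosted-varnish.vcl; it assumes Varnish 3.x syntax and that Drupal's session cookie name contains the string SESS:

```vcl
sub vcl_fetch {
    # No SESSION cookie reached the backend (it was never sent, or it was
    # stripped earlier because the page is boosted): do not let the
    # response start a new session on the client.
    if (!(req.http.Cookie ~ "SESS")) {
        unset beresp.http.Set-Cookie;
    }
    # Otherwise the Set-Cookie header, if any, is left untouched.
}
```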

To summarize those rules in a logic scheme:

# Logic is:
#
# * Assume: A cookie is set (we add __varnish=1 to client request to make this always true)
# * If boosted URL -> unset cookie
# * If backend not healthy -> unset cookie
# * If graphic or CSS file -> unset cookie
#
# Backend response:
#
# * If no (SESSION) cookie was sent in, don't allow a cookie to go out, because
#   this would overwrite the current SESSION.
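
The request half of this scheme could be sketched in VCL roughly as follows. This is not the contents of boosted-varnish.vcl: it assumes Varnish 3.x syntax and reuses the example exclusion regexp from the BOOST CONFIG block, and the GET and DRUPAL_UID checks as well as the file-extension list are assumptions about how a boosted request is recognized.

```vcl
sub vcl_recv {
    # Assume: a cookie is set. Adding __varnish=1 makes this always true.
    if (req.http.Cookie) {
        set req.http.Cookie = "__varnish=1; " + req.http.Cookie;
    } else {
        set req.http.Cookie = "__varnish=1";
    }

    # Boosted URL -> unset cookie.
    # Assumption: "boosted" means a GET request without the DRUPAL_UID
    # cookie on a path that is not in the Boost exclusion list.
    if (req.request == "GET" &&
        !(req.http.Cookie ~ "DRUPAL_UID") &&
        !(req.url ~ "^/(user$|user/|my-special-page)")) {
        unset req.http.Cookie;
    }

    # Backend not healthy -> unset cookie, so a cached copy can be used.
    if (!req.backend.healthy) {
        unset req.http.Cookie;
    }

    # Graphic or CSS file -> unset cookie (extension list is illustrative).
    if (req.url ~ "\.(png|gif|jpe?g|ico|css|js)(\?.*)?$") {
        unset req.http.Cookie;
    }
}
```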

Why Boost and Varnish?

Now the question that could come up is: If I have Varnish, why would I need Boost anymore?

Boost has some very advanced expiration features, which can be used to keep an up-to-date permanent cache of the site's pages on disk.

This can help pre-warm the Varnish cache in case of a Varnish restart. But as it turns out, you can use the stock .htaccess and Boosted Varnish will still work - as long as the DRUPAL_UID cookie is set. It might be possible as further work to just write a contrib module doing exactly that.

But Boost can also be really helpful in this special configuration as you can set your Varnish cache to a minimum lifetime of - for example - 10 min. And instead of Drupal being hit every 10 min, Apache is just happily serving the static HTML page Boost had created until it expires.

The advantage of that is:

If there are 1000 requests to your front page, Apache is hit just once, and Varnish then serves the cached page to all 1000 clients. So instead of Apache having to serve 1000 page requests, it only has to serve one every 10 min. Multiply that by page assets like images, CSS and JS files and you get some big savings in traffic going to Apache.

Conclusion

Varnish is a great technology, but it has been difficult to configure and there are lots of caveats to think of (especially with Drupal 6). This blog post introduced a new technique called Boosted Varnish, which lets Varnish work with every page that is served by the Boost module by temporarily adding it to the active working set of Varnish and frequently fetching it back from the permanent Boost cache on disk. It is not aimed at those who already run high-performance Drupal sites on the Mercury stack, but at those who use Boost and want to make their site faster by adding Varnish in front of it without having to worry about Varnish specifics.

I created a sandbox project for tracking any issues related to the configuration on:

Have fun with the configuration and I am happy to hear from you or see you tomorrow at my BoF session at DrupalCon London!
