Apr 04 2019

DrupalCon 2019 is heading to Seattle this year and there’s no shortage of exciting sessions and great networking events on the schedule. We can’t wait to hear from some of the experts out in the Drupalverse next week, and we wanted to share with you a few of the sessions we’re most excited about.

Adam is looking forward to:

Government Summit on Monday, April 8th

“I’m looking forward to hearing what other digital offices are doing to improve constituents’ interactions with government so that we can bring some of their insights to the work our agencies are doing. I’m also excited to present on some of the civic tech projects we have been doing at MassGovDigital so that we can get feedback and new ideas from our peers.”

Bryan is looking forward to:

1. Introduction to Decoupled Drupal with Gatsby and React

Time: Wednesday, April 10th from 1:45 pm to 2:15 pm

Room: 6B | Level 6

“We’re using Gatsby and React today to power Search.mass.gov and the state’s budget website, and Drupal for Mass.gov. Can’t wait to learn about Decoupled Drupal with Gatsby. I wonder if this could be the right recipe to help us make the leap!”

2. Why Will JSON API go into Core?

Time: Wednesday, April 10th from 2:30 pm to 3:00 pm

Room: 612 | Level 6

“Making data available in machine-readable formats via web services is critical to open data and to publish-once / single-source-of-truth editorial workflows. I’m grateful to Wim Leers and Mateu Aguilo Bosch for their important thought leadership and contributions in this space, and eager to learn how Mass.gov can best maximize our use of JSON API moving forward.”

I (Julia) am looking forward to:

1. Personalizing the Teach for America applicant journey

Time: Wednesday, April 10th from 1:00 pm to 1:30 pm

Room: 607 | Level 6

“I am really interested in learning from Teach for America how they implemented personalization and integrated across applications to bring applicants a consistent look, feel, and experience when applying for a Teach for America position. We have created Mayflower, Massachusetts government’s design system, and we want to learn what a single sign-on for different government services might look like and how we might use personalization to improve the experience constituents have when interacting with Massachusetts government digitally.”

2. Devsigners and Unicorns

Time: Wednesday, April 10th from 4:00 pm to 4:30 pm

Room: 612 | Level 6

“I’m hoping to hear if Chris Strahl has any ‘best practices’ and ways for project managers to leverage the unique multi-skill abilities that Devsigners and unicorns possess while continuing to encourage a balanced workload for their team. This balancing act could lead to better development and design products for Massachusetts constituents, and I’d love to make that happen with his advice!”

Melissa is looking forward to:

1. DevOps: Why, How, and What

Time: Wednesday, April 10th from 1:45 pm to 2:15 pm

Room: 602–604 | Level 6

“Rob Bayliss and Kelly Albrecht will use a survey they released, as well as some other important approaches, to elaborate on why DevOps is so crucial to technological strategy. I took the survey back in November of 2018, and I want to see the results. Based on those results, this presentation will help me identify whether any changes should be made to our process to better serve constituents.”

2. Advanced Automated Visual Testing

Time: Thursday, April 11th from 2:30 pm to 3:00 pm

Room: 608 | Level 6

“In this session Shweta Sharma will speak to which visual testing tools are currently out there and compare them. I am excited to gain more insight into how automated visual testing fits into faster releases so we can identify any gotchas and improve our releases for Mass.gov users.

P.S. Watch a presentation I gave at this year’s NerdSummit in Boston, and stay tuned for a blog post on some automation tools we used at MassGovDigital coming out soon!”

Lastly, we really hope to see you at our presentations!

We hope to see old friends and make new ones at DrupalCon 2019, so be sure to say hi to Bryan, Adam, Melissa, Lisa, Moshe, or me when you see us. We will be at booth 321 (across from the VIP lounge) on Thursday giving interviews and chatting about technology in Massachusetts. We hope you’ll stop by!

Aug 17 2018

In my previous blog post on managing microsites with Drupal 8 I promised to write something further and fuller about designing web APIs. This is less directly about Drupal 8, but I will comment on how to implement the recommendations here in Drupal 8.

These are the things that I take time to think about when building a web API.

Design the thing

As a developer, it’s all too easy, and too tempting, to just jump right into coding something. It’s certainly a weakness I suffer from and that I have to cope with.

Before putting the proverbial pen to paper, though, it’s really important to understand why we’re building an API in the first place. What are the problems we’re trying to solve? What do the users need or want?

With regard to building an API, that means thinking about the consumers of the data provided by your API. If you’re building a decoupled CMS, the main user is the frontend system. In other circumstances it may also mean other websites, embedded widgets, apps on mobile devices, and so on. Whatever it is, due consideration needs to be given to the needs of those consumers.

That means understanding your user’s needs, examining the patterns of behaviour of those users, and ultimately translating those into a design.

Sound like familiar language? Yes, that’s the language of visual designers and user experience specialists. In my books, I’d suggest that means you would do well to work closely with specialist design colleagues when designing and building an API.

Your web API needs to be designed: needs; behaviours; analysis; patterns; traits; design; feedback; improve.

Be an artisan with your API

Take time. Research. Think. Plan. Design.

Beware, Drupal

When you’re working with Drupal, it is too easy to jump over the design step. Drupal does so much out of the box that it’s too easy to start coding without thinking properly about what we’re coding.

The availability bias when you’re a specialist Drupal developer, having it as the go-to toolkit, is that we think about the solutions to the problems (if we’ve even got as far as articulating the problems) in a Drupally way. For instance, since Drupal has a menu system it’s easy to think about navigation in a decoupled CMS system in terms of the way Drupal handles the menu system, which prevents you from thinking about other ways of handling navigation.

The same is true with Drupal 8’s support for REST. Drupal 8 core includes REST resources for most entities in a Drupal installation. That’s very useful. But it can also make you lazy: you just use these core RESTful API endpoints for nodes or comments or whatever, with all the guff they include, without even thinking about whether they’re appropriate, useful or formatted appropriately.

That goes also for REST exports from Views. They can be useful, giving you a quick way of creating a RESTful API endpoint. The problem is, though, that it also confines you to working with the way Views works and what it can produce. You may find that a problem if you want to support optionally requesting additional objects to be embedded in the response, for instance (see below).

Resist the temptation! Instead, take the time to think from the other end first.

I’ll return to the question of designing your API below, but first we need to talk about documentation, since designing and documenting your API can be part of the same process.

Documentation

Yeah, I know. Most devs find this just the dullest thing in the world to write. With a web API, though, it’s incredibly important. If you want people to actually be able to use your API, they need to know how to work with it. It’s horrible trying to work with an undocumented or under-documented API.

So, what should go into the documentation for a web API? Here are some pointers.

The basics:

API reference

Yeah, this is probably what everyone thinks of when they think of documentation for a web API, but it is in fact only part of the documentation—maybe the most important part, but only part.

There are plenty of good blog posts and descriptions of what your API reference should include, so there’s no need for me to reiterate that here.

The most important thing to say, though, is that, beyond identifying resource paths, actions and parameters, your reference should describe in full both what the request should look like and what the response will look like.

Mock server

It is incredibly helpful to include a mock server with your API documentation. Preferably, your mock server will handle the documented requests and responses of each resource.

This will help those building apps and tools that will consume your API to get up-and-running quickly.

For gold stars and a round of applause:

Tutorials, guides, cookbooks

If your API gets to be any substantial scale then the developers who use your API will find it incredibly useful to have some tutorials and guides included in your documentation.

These should cover common tasks, or how to work with specific sections of your API. A guide to ‘best practices’ with your API may be appropriate to help people make the most out of your API.

Check out the guides in MailChimp’s API documentation for a good example. Twitter’s API docs ‘best practice’ section are great as well.

Quick start

One invaluable guide is the ‘getting started’ or ‘quick start’ guide. This can often be just a single page, with a succinct summary of the things you need to do to get going.

The YouTube API ‘getting started’ page is a useful example.

Useful tools

There are lots of useful tools out there to help you get started when you document your API design. Here are some suggestions.

API Blueprint is an open-source high-level API design language that is very useful for writing your documentation. The language is similar to Markdown, so it’s easy to work with. There are a number of SaaS tools offering services based on API Blueprint. One that I really like is Apiary.io (though they’ve recently been bought by Oracle, so who knows where that’ll take them), but there are others, like Gelato.

You might also consider Read the Docs and daux.io amongst others. There’s also the Open API Initiative, which is ‘focused on creating, evolving and promoting a vendor neutral API Description Format,’ though the initiative is ‘based on the Swagger Specification.’ Open API is an initiative of Swagger.io, and they have a list of tools and frameworks using the specification. The OpenAPI specification is on GitHub.

Whatever you use, your documentation should (probably) end up in a public location so that other developers can use it. (An exception might be for an API used in a secure decoupled system.)

Keep it simple

So, let’s return more directly to the question of designing your web API.

An important rule of thumb for me is to ‘keep it simple, stupid.’ There is no need to include anything more in the resources of your API than is necessary.

I say this as a long-time Drupal developer, knowing full well that we have a superpower in overcomplicating things, all those extra divs and classes all over the markup, all those huge arrays.

This is still true in the core REST resources of Drupal 8. For example, when GETting the core Content resource for node 10 (/node/10?_format=json), the response gives us …

{
  "nid": [
    {
      "value": "10"
    }
  ],
  "uuid": [
    {
      "value": "6bfe02da-b1d7-4f9b-a77a-c346b23fd0b3"
    }
  ],
  "vid": [
    {
      "value": "11"
    }
  ],
  …
}

Each of those fields is an array containing an array that contains the name:value pair as its only entry. Whew! Exhausting. An array within an array, when there’s only one level-1 array? Really? Maybe we could render that a little more simply as …

{
  "nid": "10",
  "uuid": "6bfe02da-b1d7-4f9b-a77a-c346b23fd0b3",
  "vid": "11",
  …
}

… which might help our API’s consuming applications to parse and use the JSON data more easily. Like I said above, I’d suggest that just using the core entity REST resources isn’t often the place to start.

The simplicity mantra should pervade your API design. Include only the data that is needed for the consuming apps. Pare it down, so it’s as easy to read as possible.

As a result, when you come to build that API in your Drupal 8 backend system, it will demand a good discipline of you: not simply throwing into the API resource responses whatever is easiest, but rather what is best.
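
In Drupal 8, one way to get that discipline is to write a small custom REST resource plugin rather than exposing the full core entity resource. Here is a minimal sketch of what that could look like; the module name (my_api), the path and the chosen fields are assumptions for illustration, and the resource would still need to be enabled and permissioned like any other REST resource:

<?php

namespace Drupal\my_api\Plugin\rest\resource;

use Drupal\node\Entity\Node;
use Drupal\rest\Plugin\ResourceBase;
use Drupal\rest\ResourceResponse;
use Symfony\Component\HttpKernel\Exception\NotFoundHttpException;

/**
 * A deliberately lean article resource.
 *
 * @RestResource(
 *   id = "my_api_article",
 *   label = @Translation("Article (lean)"),
 *   uri_paths = {
 *     "canonical" = "/api/v1/articles/{id}"
 *   }
 * )
 */
class ArticleResource extends ResourceBase {

  /**
   * Responds to GET requests with a pared-down representation.
   */
  public function get($id) {
    $node = Node::load($id);
    if (!$node || $node->bundle() !== 'article') {
      throw new NotFoundHttpException('Article not found.');
    }
    // Only the fields the consuming apps actually need, flattened.
    $data = [
      'id' => (int) $node->id(),
      'title' => $node->label(),
      'created' => $node->getCreatedTime(),
    ];
    $response = new ResourceResponse($data);
    // Invalidate cached responses when the node changes.
    $response->addCacheableDependency($node);
    return $response;
  }

}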

What’s in a name?

This is true in particular when it comes to your naming conventions and API resource paths.

Don’t just add root-level endpoints ad infinitum. Use well-structured paths for your resources, where the depth of the path elements makes sense together. The result should be that your resources are explorable via a browser address bar. E.g.

GET /articles/5/comments/19

… makes intuitive sense as a path: get comment 19 on article 5.

On the other hand, don’t just add depth to your resource paths unnecessarily. Separating things out with some logic will help make things intelligible for developers using your API. E.g.

GET /articles/comments

Umm? What’s that? The comments on articles — why would I want that? However …

GET /comments?contenttypes=articles

… is more obvious — a path to get comments, with a content types filter. Obvious. It also suggests we might be able to filter content types with a comma-separated list of types—nice!

Find a straightforward naming convention. Make the names of resource endpoints and data fields obvious and sensible at first glance.

Overall, make the way you name things simple, intuitive and consistent. If the title field of a data object in your resources is called ‘title’ in one place, ‘name’ in others and ‘label’ in still others, for instance, then it adds unnecessary complexity for writing reusable code.

Easy peasy, lemon squeezy

When designing your web API, it needs to be simple to use and work with. Help users to get just what they want from your API.

Support limiting response fields

You’ll make developers smile if you provide a way of limiting the fields that are returned in a response. You don’t always want to get everything from a resource. Being able to choose exactly what you want can help speed up usage of an API.

For example, consider supporting a fields parameter that could be used like this:

GET /articles/5?fields=id,title,created
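
Supporting that needn’t be complicated. A minimal sketch, assuming a Drupal 8 controller or resource method that already has the Symfony request object and a flattened $record array; the whitelist of field names is invented for illustration:

// Honour ?fields=id,title,created against a whitelist of exposable fields.
$allowed = ['id', 'title', 'created', 'body'];
$requested = array_filter(explode(',', $request->query->get('fields', '')));
$fields = $requested ? array_intersect($requested, $allowed) : $allowed;
// Keep only the requested keys of the full record.
$data = array_intersect_key($record, array_flip($fields));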

Support auto-loading related resources

The opposite might also be important, being able to load extra resources in the same request. If a request can combine related resources then fewer requests will need to be made, which again will help speed up using an API.

Supporting an embed query parameter could give you this. For example:

GET /articles/5?embed=author.name,author.picture,author.created

… would enable users to also load the article author’s name, their picture and the date their account was created. Note the dot syntax, which might be useful.
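
Parsing that dot syntax is mostly string handling. A rough sketch, again assuming the Symfony request object is to hand; what you then do with the resulting map (loading the author entity, say) is up to your resource:

// Turn ?embed=author.name,author.picture,author.created into a nested map.
$embeds = [];
foreach (array_filter(explode(',', $request->query->get('embed', ''))) as $path) {
  list($relation, $field) = array_pad(explode('.', $path, 2), 2, NULL);
  $embeds[$relation][] = $field;
}
// $embeds is now ['author' => ['name', 'picture', 'created']].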

Flexible formats

Another way of making it easy for users is to support flexibility in the format of the data in the response. JSON is usually what people want to handle, but some do still prefer to use XML.

There’s also the problem that JSON has no support for hyperlinks, the building blocks of the web, which is a curiosity as the W3C admit. There are JSON protocol variants that attempt to address this, like HAL and JSON-LD, but I refer you to a fuller discussion of JSON and hypermedia and some useful resources on hypermedia and APIs from Javier Cervantes at this point.

Keep it steady, Eddy

When designing your API, you should expect it to have a certain lifetime. In fact, it’s bound to last long enough to need changing and improving. But what do you do about rolling out those changes?

Your devs will need the flexibility to change things, especially if they find bugs, and they’ll get frustrated if they can’t adapt the API to make improvements.

Your users need reliability and stability, though, and they’ll get frustrated if the API keeps changing and their consumer app dies without warning.

So, from the start, include versioning.

A pretty sensible approach is to use a path element to specify the version number. E.g.

GET /api/v1/articles/5

You could use a query parameter instead, of course, though since query parameters are optional that would mean that without the version parameter your API would return the latest version. Consumers who’d inadvertently missed including the version in their requests would be vulnerable to changes making their app die, which might result in some flame support emails.

Support that thing

Make sure there’s a way for your users to let you know when they have problems, they find a bug, or whatever.

If it’s an internal API, like with a decoupled CMS and frontend, then that is probably your bug tracker.

If it’s a public API, then you’ll need some public way for people to contact you. If you host your repository on e.g. GitHub then there’s support for issues baked in.

Respond.

Giant lists of bugs that never get addressed are soul-crushing.

Some other things to consider

Authentication and security

You’ll probably want to add some authentication to your API. You shouldn’t rely on cookies or sessions, as your API should be stateless. Instead, by using SSL (you’re using SSL, right? yes, you’re using SSL), you can implement a token-based authentication approach.

However, where a token approach is inappropriate, OAuth 2 (with SSL) is probably the best way to go. Here’s some further discussion on API security and authentication, if you’d like to read in more depth.

Caching

HTTP has a caching mechanism built in — woot! Just add some response headers and do some validation on request headers and it’s there.

I’ll point you elsewhere to read more about the two key approaches, ETag and Last-Modified.
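
To give a flavour of the ETag approach, here is a framework-agnostic PHP sketch; in Drupal you would set the same headers on the response object rather than calling header() directly, and $payload stands in for your serialized response body:

// Derive a validator from the response body and offer it to the client.
$etag = '"' . md5($payload) . '"';
header('ETag: ' . $etag);
// If the client already holds this version, skip the body entirely.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && $_SERVER['HTTP_IF_NONE_MATCH'] === $etag) {
  http_response_code(304);
  exit;
}
header('Content-Type: application/json');
echo $payload;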

Use HTTP status codes

HTTP defines lots of meaningful status codes that can be returned in your API responses. By using them appropriately, your API consumers can respond accordingly.

Useful errors

If a request has an error, don’t just return an error code. Your API should provide a useful error message in a format with which the consumer can work. You should use fields in your error message in the same way that a valid response does.
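
For example, a structured 404 might look something like this sketch, returned from a controller or resource method. The error fields here are illustrative; the point is that they are consistent and machine-readable. JsonResponse ships with the Symfony HttpFoundation component included in Drupal 8:

// A hypothetical error shape: a stable code plus a human-readable message.
return new \Symfony\Component\HttpFoundation\JsonResponse([
  'error' => [
    'code' => 'article_not_found',
    'message' => 'No article exists with id 99.',
  ],
], 404);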

Healthy API design

In summary, when building an API it’s not healthy to just jump in and start writing the code for the API from a specification. Neither is it healthy to just rely on the default resources of CMS tools like Drupal 8. APIs always need to be tailor-made for the task.

APIs need to be designed.

If you can make your web API simple to understand and adopt, easy to work with, incorporating plenty of flexibility, if it’s stable and reliable and well-supported, then you’re well on your way to being the proud owner of a healthy API.

Aug 17 2018

There are lots of situations in which you need to run a series of microsites for your business or organisation — running a marketing campaign; launching a new product or service; promoting an event; and so on. When you’re working with Drupal, though, what options do you have for running your microsites? In this article I review and evaluate the options in Drupal 8, make a recommendation and build a proof of concept.


So, I want to run some microsites …

A client brought me an interesting problem recently, something they need to solve for their production Drupal site. They are an international humanitarian agency who, alongside their main production website, want to run some microsites for a number of their public campaigns. Although they could run them on the main site, they’ve found too many limitations in trying to do that. Campaign teams, frustrated with the lack of flexibility and slow protocols for getting changes made to support their bespoke needs, have often gone off with their small budget and dynamic team to create something quick that fits their campaign with Squarespace or Wordpress or something.

That made the campaigners really happy. But when the campaign or event lapsed, the campaign site quickly got out of date and went unloved; the campaign team moved on, no one could remember how to log into the system, and it became abandoned.

Hearing this story was so familiar — the same thing often happened when I was a senior developer at Oxfam International.

So, they said, could something be done about it? What, if anything, could be done with Drupal to help campaigners get their microsites running? What would give them the fast, bespoke solution to their microsite problems, whilst still keeping all the content well-managed and being able to share that content with the main site or other microsites?

I scratched my chin and had a think.

How about Drupal multisites?

Since some of its earliest versions, Drupal has included a feature for multi-sites — running several sites from a single codebase installation, sharing the core system, contributed and custom modules and themes. Each multisite has its own database, its own settings and configuration, its own content, and so on. Ideally, it also means updates can be done once.

So, multisites could be an option. Many people find them to be a real workhorse for their context, and often they are right on the money.
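
For reference, the wiring behind a multisite is little more than a directory per site under sites/ plus a hostname map in sites/sites.php. A minimal sketch, with invented hostnames and directory names:

// sites/sites.php: map incoming hostnames to site directories.
// Each directory (sites/campaign_one, sites/campaign_two) holds its own
// settings.php, files directory and so on.
$sites['campaign-one.example.com'] = 'campaign_one';
$sites['campaign-two.example.com'] = 'campaign_two';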

Why use multisites

The Drupal.org documentation for multisites includes a simple rule-of-thumb for when to multisite:

As a general rule on whether to use multisite installs or not you can say:

- If the sites are similar in functionality (use same modules or use the same drupal distribution) do it.

- If the functionality is different don’t use multisite.

(DrupalCon Austin [June 2014] held an interesting debate on Drupal multi-sites, its pros and cons, gotchas and suggestions, which is available on YouTube.)

There are several compelling reasons to use them.

First, having a single codebase to maintain is a huge plus. Forked codebases can soon become orphaned, and unloved codebases become fraught with problems too quickly.

Second, multisites often mean there is also a single hosting platform to maintain, which is also a major advantage.

That can often mean, thirdly, that multisite installations can make better use of resources, both the server resources and financial, personnel or other physical resources. For example, since multi-sites share the same core and modules, that code need only go into the opcode cache once, saving server resources.

Caveat: is the end of multisites support on the horizon?

It should be noted that a proposal has been made to deprecate support for multisites in Drupal, with a view to removing it in the future.

The basic argument for this is that it’s an ‘old skool’ way of thinking about handling multiple sites. Git and Composer create practices and codebase structures that point in other directions.

The modern approach to multi-site is: git — Same code, different sites. Under your control. And well-maintainable.

There are a number of positive reactions to that proposal, which are variations on a theme:

+1. Multisite is a historical oddity at this point and I’d never tell anyone to use it.

But there are many more negative reactions, which largely go along these sorts of lines:

-1. Multisite has been a workhorse for a ton of Drupal sites and is well established in our code.

In that light, Drupal’s multi-site feature is likely to stay around for a while.

Classic problems with Drupal multisites …

It’s not all a bed of roses, though. There are some classic sticking points when working with Drupal multisites.

First off, handling traffic. One site’s traffic spike can be another site’s nightmare when the hosting resources are all hogged by The New York Times tweeting a link to a page on a site of yours; one site’s ‘BEST DAY EVA!’ can be the worst of times for all the rest.

The load on your database server may also be an issue. Multisites often use a single database server, and heavy load or slow queries in one DB can impact the performance of others. This might even be caused in the normal running of your Drupal sites, such as when running cron.

Running updates often causes headaches. When you update code, you’re updating all your sites at once. That means the updates are deployed, well, instantly across all your sites, but if they need update processes to run, such as updating the database, that can throw unexpected problems or errors.

And the worst of the worst: a small piece of poorly written, inadequately reviewed or tested code mysteriously jumps itself onto production — that never happens, right? No one ever lets that happen, do they? *ahem* — and takes down all your sites at once! It’s just an urban myth, a story to scare the children with at night, right? Never happens.

… and how to mitigate them

There are of course a number of ways to foresee these things happening and be ready for them.

On the performance questions, with smaller demands you can just ride it out — sites on the same hosting platform are fairly tolerant of resources being shared around, and the spare capacity is there for times just like these.

For larger performance demands, handling the pressure is a challenge in any hosting set-up, dedicated hosting just as much as shared. With modern cloud infrastructure, the option of scaling up your infrastructure or spinning up a new cluster when you’re experiencing ongoing heavy demand is much easier than in the past, especially if you plan for it as a possibility.

The next set of mitigations are all about best practice.

For starters, test, test, test. Don’t let any code onto production that hasn’t been tested thoroughly.

Have a solid release process that you always follow. If possible, include dev, staging and quality assurance stages. This should give you lots of points to catch things before they’re released onto your production sites.

Automate all the things. There are lots of ways of automating things to ensure they run consistently and quickly too, from shell scripts up to continuous integration tools. Use them.

And finally, be intelligent. With code changes that need database updates, for example, design your code so that it can be deployed to handle an interval before the database is updated. Or, with important but more volatile updates, be smart about choosing the time of day and week that you deploy it. Don’t ever push something out at 5pm on a Friday afternoon if you want to stay friends with your colleagues, your customers and your family.

Well, yes, in short, kinda. You could run microsites using Drupal’s multi-site feature. Things would work fine, though of course you’d have all the problems described above and have to take the mitigating actions.

However, it wouldn’t solve all the needs described above without some smart thinking. Plus, I’d suggest that you would also have some other problems to solve.

First, multisites all use different databases (sharing databases and tables is possible with Drupal multisites, but really unadvisable!) so the need of a single place for managing all the site content wouldn’t really be satisfied. The way around that would involve using web services, posting and pulling content from one site to another.

Neither would we have a unified search. There are fairly straightforward ways around that, using a tool like Apache Solr. The sites would need to share an index, with each document in the index including a site field, and there’s a contrib module that does that already (although no Drupal 8 version yet).

Lastly, and maybe more pertinently, you would still have all the ‘Drupalisms’ to live with. First of those is the visual design layer, the public user’s interface for the sites, what gets called the ‘theme layer’ in Drupal lingo. Many designers really dislike Drupal’s theme layer, and would really prefer to work with the pure frontend tools they use in other contexts. Drupal 8 has made major strides forward with the theme layer so it’s not as tough for designers as it once was, it’s true, but many (most?) frontend specialists would still rather not work with it.

Some influential Drupal figures consider multisites ‘not enterprise grade’, and opinions like that are worth considering if your situation is enterprise scale.

Other approaches with Drupal

There are a few other ways of supporting microsites with Drupal that might be worth considering.

Domain Access

The Domain Access project was created to support just this kind of functionality. The project overview says as much:

The Domain Access project is a suite of modules that provide tools for running a group of affiliated sites from one Drupal installation and a single shared database. The module allows you to share users, content, and configurations across a group of sites.

This might work. However, this approach shares many of the same problems as the core multisites described above, with one additional one: everything is in one database.

Our experience of using it, and this is echoed by others too, is that with a small number of very similar sites Domain Access can work well. With a larger number of fairly different sites, it’s a right pain and actually makes things quite difficult, requiring lots of complicated custom code.

Organic Groups

The Organic Groups suite of modules could be a solution for building microsites. The project allows users to create a ‘group’ within a Drupal site. The group can have its own users, admins, content, menus, even its own visual design. However, it would need every microsite to sit internally, within the main site, so it does not solve the need to support external sites on their own domains. So, not really the perfect fit.

Best practice: with Git

I quoted above from @sun in the discussion on deprecating multisite support about the modern best practice:

The modern approach to multi-site is: git — Same code, different sites. Under your control. And well-maintainable.

This is certainly my standard recommendation and will give you many advantages: independence of sites for performance, design, etc; single codebase to maintain (though you’ll have a challenge developing and maintaining the variations you’ll want or need for each microsite); better control over updates; and so on.

You might even look at writing an install profile to make a full distribution, though with Drupal 8 there is less of a need to do this. With Drupal 8, I’d advocate that you use Drupal Composer to build your site and just export your full site config into your repo (being careful to keep sensitive settings out of the repo with your .gitignore file).
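
If you go the config-in-the-repo route, pointing Drupal 8 at a sync directory that lives alongside the codebase is a one-line change in settings.php; the path below is just an example:

// settings.php: keep exported configuration in the repo, outside the web root,
// so that config exports and imports read and write there.
$config_directories[CONFIG_SYNC_DIRECTORY] = '../config/sync';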

Or you might also consider using Aegir to manage your multiple sites — use Drupal to deploy Drupal, if that’s not too much Inception.

Microsites and Drupal

So if multisites could work but would be a bit of a pain, the other Drupal approaches are even less appealing, and you’d rather not keep multiplying Drupal installations, how else could we do microsites with Drupal?

Well, there are two major moves in modern web development that might help here: RESTful web services, and decoupled CMS architectures (a.k.a. ‘headless’ CMS). My proposal for managing microsites in Drupal 8 depends on both these ideas:

  • Treat your Drupal site as a pure content management system (CMS) — a content hub that allows authors, editors and administrators to create, update and manage the content for which they’re responsible, but doesn’t have any meaningful frontend presentation layer to it.
  • Present the data of the content in the hub CMS via a RESTful API.
  • Implement a separate frontend for the visual presentation layer that communicates with the content hub CMS via the API.

There need be no limit to the number of frontends that use the CMS’s API (though practically you may limit access with firewalls, CORS or some other means) so you could power a primary public site, other sub-sites, native mobile apps or even another CMS or two, each potentially with their own visual design. The limit is your own imagination and situation.

RESTful web services and Drupal 8

A new addition to Drupal 8 is the RESTful Web Services API. REST resources can be exposed to allow other things to talk to/consume/feed a Drupal site. Many core entities have REST resources, but it is also fairly easy to build custom REST resources. (There are a number of interesting web services contrib projects that are worth considering, such as the GraphQL project that presents a GraphQL schema, and the RELAXed Web Services project that extends the core REST resources.)

Design your own web services API

The freedom to build custom REST resources in Drupal 8 allows a lot of freedom in designing a RESTful API.

In a forthcoming blog post I’ll write more about designing an API. For now, all I need to say is you need to actually design your API. Don’t simply use the out-of-the-box Drupal core REST resources — think about the RESTful API that would best serve the frontend you want to have.

My heartfelt recommendation is you do this, designing your API, using the skills of those who’re best at designing things — your designers. They understand best what your users want to do on your sites, will be able to describe what data they want for the frontend (content with/without markup, etc.) and help you design the API that is most appropriate to your needs.

There are some API design golden rules and best practices that you should consider. Also I’d recommend using an API design tool like Apiary.io or Swagger.io. They’re invaluable for many reasons, not least of which is the lovely documentation they generate and mock data servers they include that can help frontend devs get going quickly.

Decoupled frontend

With the content hub now presenting the managed content as RESTful data, we just need a standalone frontend system to present your website to your users: one for your primary site, and one for each of your microsites. Your frontend specialists can then work with the right tools for the task.

There are several advantages to consciously uncoupling the content management and the frontend.

Freedom: frontend specialists are free to implement the user experience with native tools that are built for the job.

Performance: everything in this architecture can be streamlined. The CMS simply presents the content data. The frontend focuses on the display logic.

Experience: the website can respond to users in real time, communicating back and forth with the CMS to give real-time interactions in the browser.

Future proof: it becomes much easier to replace any part of the system as you require, such as redesigning the website without re-building the CMS.

Microsites in Drupal 8

So, how might we do this practically in Drupal 8? Here’s how I tackled it.

First, I thought about designing a quick prototype API that could be used to describe microsites and their content. I used Apiary.io to design it, and you can view the API at docs.campaignmicrosites.apiary.io.

The final part is the standalone frontend tool. For this I used React to build my frontend app, but there are obviously plenty of other options depending on what the frontend needs to do. React worked for me because I just wanted the view layer, but Angular or Ember could be more appropriate if the frontend needed to be a more substantial app. You’d need to evaluate the frontend options carefully.

I’m not a frontend specialist, so my prototyping code is pretty fugly. Despite that, we’re able to serve two microsites simultaneously on different URLs, with a different theme, just by switching the campaign ID in the API request.

Bingo!

Deploying to production

There’s a few things I might do to deploy this to a production system.

Secure serving

As a good internet citizen, I’d want to put everything on SSL.

Frontend deployment

To deploy the frontend, I’d be looking at options to run the apps on a NodeJS server so that most of the scripts can run server side.

I’d probably want to put an Nginx instance in front of it, for SSL termination, caching static assets and reverse proxy.

Use Drupal multisites ;-P

I think there is actually a neat way of using Drupal’s multi-sites feature here: use a different domain for the RESTful API. For example:

Editorial interface: hub.yourdomain.com
API interface: api.yourdomain.com

Both of these point to your Drupal codebase but you can then handle requests differently on each domain. For example, you might add an authentication provider that checks the domain to give you some access control, so there’s no access to the editorial interface on the API subdomain, and none to the API on the editorial domain.
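
One rough shape that domain-based gate could take is sketched below. The article’s suggestion is an authentication provider; this sketch uses a plain request event subscriber instead, which is a blunter instrument but easier to show in a few lines. The module name, hostname and path prefix are assumptions, and the class would still need registering as a tagged event_subscriber service in the module’s services.yml:

<?php

namespace Drupal\my_api_access\EventSubscriber;

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\GetResponseEvent;
use Symfony\Component\HttpKernel\Exception\AccessDeniedHttpException;
use Symfony\Component\HttpKernel\KernelEvents;

/**
 * Serves only API paths on the API domain, and no API paths elsewhere.
 */
class DomainGateSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents() {
    return [KernelEvents::REQUEST => ['onRequest', 100]];
  }

  public function onRequest(GetResponseEvent $event) {
    $request = $event->getRequest();
    $is_api_host = $request->getHost() === 'api.yourdomain.com';
    $is_api_path = strpos($request->getPathInfo(), '/api/') === 0;
    // The two must match: API paths on the API host, everything else on the hub.
    if ($is_api_host !== $is_api_path) {
      throw new AccessDeniedHttpException();
    }
  }

}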

Caching etc.

This would then allow you to do some smart things with caches and other parts of your hosting stack, offloading much of the pressure on the codebase to the caching architecture and removing the actions of editorial staff from affecting the RESTful API’s performance.

Databases

It might also be possible to configure GET requests to only use a slave database, which could be useful for performance — though may be more hassle than it’s worth. POST, PUT, PATCH and DELETE requests would still need to go to the master.
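
Drupal 8 lets you declare a replica connection in settings.php; a hedged sketch is below. Note that defining the replica is only half the job: core won’t automatically route GET traffic to it, so your own code (or a contrib solution) still has to ask for the replica connection for read queries. Credentials and hostnames are placeholders:

// settings.php: declare a read-only replica alongside the default connection.
$databases['default']['replica'][] = [
  'database' => 'drupal',
  'username' => 'readonly',
  'password' => 'replace-me',
  'host' => 'db-replica.internal',
  'driver' => 'mysql',
  'prefix' => '',
];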

In summary

This prototype worked really well for me and I was very happy with the results, and it gave me something very interesting to discuss with the client.

The advances made in Drupal 8 to operate with current standard web practices are good news for developers and for web projects big and small. For this prototype, the particular improvements with providing RESTful resources means that I was able to create a decoupled Drupal system to support a main website and unlimited microsites in an amazingly short space of time.

… and something to take away

If you’re interested in following up this thought experiment with my Drupal 8 prototype, I’ve put the code into a repo in GitHub:

Just …

$ git clone git@github.com:ConvivioTeam/Convivio-ContentHub.git {some_empty_directory}
$ cd {some_empty_directory}
$ composer install

… and you’re away.

(My React code is shamefully dirty, so I’m not prepared to share that at moment. ;-) I may tidy it up in the future and share it here.)

Jun 02 2015

In April 2015, NASA unveiled a brand new look and user experience for NASA.gov. This release revealed a site modernized to 1) work across all devices and screen sizes (responsive web design), 2) eliminate visual clutter, and 3) highlight the continuous flow of news updates, images, and videos.

With its latest site version, NASA—already an established leader in the digital space—has reached even higher heights by being one of the first federal sites to use a “headless” Drupal approach. Though this model was used when the site was initially migrated to Drupal in 2013, this most recent deployment rounded out the endeavor by using the Services module to provide a REST interface, and ember.js for the client-side, front-end framework.

Implementing a “headless” Drupal approach prepares NASA for the future of content management systems (CMS) by:

  1. Leveraging the strength and flexibility of Drupal’s back-end to easily architect content models and ingest content from other sources. As examples:

  • Our team created the concept of an “ubernode”, a content type which homogenizes fields across historically varied content types (e.g., features, images, press releases, etc.). Implementing an “ubernode” enables easy integration of content in web services feeds, allowing developers to seamlessly pull multiple content types into a single, “latest news” feed. This approach also provides a foundation for the agency to truly embrace the “Create Once, Publish Everywhere” philosophy of content development and syndication to multiple channels, including mobile applications, GovDelivery, iTunes, and other third party applications.

  • Additionally, the team harnessed Drupal’s power to integrate with other content stores and applications, successfully ingesting content from blogs.nasa.gov, svs.gsfc.nasa.gov, earthobservatory.nasa.gov, www.spc.noaa.gov, etc., and aggregating the sourced content for publication.

  2. Optimizing the front-end by building with a client-side, front-end framework, as opposed to a theme. For this task, our team chose ember.js, distinguished by both its maturity as a framework and its emphasis on convention over configuration. Ember embraces model-view-controller (MVC), and also excels at performance by batching updates to the document object model (DOM) and bindings.

In another stride toward maximizing “Headless” Drupal’s massive potential, we configured the site so that JSON feed records are published to an Amazon S3 bucket as an origin for a content delivery network (CDN), ultimately allowing for a high-security, high-performance, and highly available site.

Below is an example of how the technology stack which we implemented works:

Using ember.js, the NASA.gov home page requests a list of nodes of the latest content to display. Drupal provides this list as a JSON feed of nodes:

Ember then retrieves specific content for each node. Again, Drupal provides this content as a JSON response stored on Amazon S3:

Finally, Ember distributes these results into the individual items for the home page:

The result?

A NASA.gov architected for the future. It is worth noting that upgrading to Drupal 8 can be done without reconfiguring the ember front-end. Further, migrating to another front-end framework (such as Angular or Backbone) does not require modification of the Drupal CMS.

Feb 11 2012
Beer and developer conferences go hand in hand.

A few weeks ago I presented “CDNs made simple fast and cheap” at the Drupal Downunder conference in Melbourne Australia.

The talk covered:

  • the importance of good client side performance,
  • how a CDN works,
  • recommended CDN providers (from an Australian’s perspective),
  • a demonstration of how to set up a CDN and
  • a summary of the results (better YSlow score and page download times).


Setting up a CDN is very easy to do and cost effective. If you want your users to have the best online experience then there is nothing stopping you!

The CDN presentation is available as PDF slides and a video.

Thanks to my employer PreviousNext who kindly sponsored my trip to Melbourne. Hats off to Wim Leers for contributing the CDN module.


Nov 18 2011

Hiring Drupal developers is difficult. Hiring great Drupal developers in the current market often feels close to impossible. They are highly sought after and most of the people on the market, in all honesty, aren’t very good.

I’ve put together a list of the best Drupal interview questions that I’ve used over the years to screen Drupal candidates. Hopefully you’ll find them useful.

CMS developers are an unusual breed. They come from a wide variety of backgrounds, often stumbling into programming. While a non-CS background can bring valuable perspective, if you need a strong generalist on your project you should take extra care to make sure that you’re getting the skill set you require.

As you are probably well aware, hiring is very time consuming. To be effective you should aim to spend 80% of your time talking to great candidates. To do this, you need an efficient initial screen to quickly weed out unsuitable candidates. Ideally you’ll do your screening online or on the phone.

Your screen should test for most of the skills that you’ll need to see exhibited on the job, including: web basics, OO PHP knowledge, solid coding technique, system design, CSS/HTML, SQL and Drupal expertise.

After the screen, don’t forget to look at your candidate’s commits to Drupal core or module functionality. In my experience, great Drupal developers have a track record of regular contributions to the project.

May 20 2011

Last night I made a presentation on the “Business of Drupal” to the Sydney Drupal users meetup. The talk covered the subject areas of scalable jobs and wild randomness, basic business models in the software industry, the GPL, eight business models for Drupal in increasing order of scalability, ways developers can deepen their skills and a round-up of how various organisations in the Drupal community are structuring the way they do business. I have just uploaded the slides to the talk.

For those wanting a little bit more detail without going to the slides, I’ll reproduce some of the content here.

Drupal business models in increasing order of scalability

1. Employment

  • Employment at Drupal shop or company
  • Income limited by salary (skill, experience)
  • Non scalable
  • Very regular

2. Pure services

  • Contractors, Drupal shops, F2F training
  • eg. Cross Functional, PreviousNext
  • Income limited by incoming jobs (supply) and staff
  • Non scalable due to staffing requirements
  • Variable regularity, no subscriptions

3. GPL products with services

  • Distribution owners, module authors
  • eg. Phase2, Ubercart
  • Income limited by product popularity and staff
  • Non scalable due to staffing requirements
  • Variable regularity

4. Drupal hosting platform

  • Drupal hosting
  • eg. Acquia Dev Cloud, Managed Cloud, Chapter Three Pantheon, Omega8cc Aegir
  • Overhead of maintaining platform – Aegir
  • Scalable
  • Regular

5. Drupal as a service (DaaS)

  • Drupal running as a SaaS
  • eg. Drupal Gardens, Buzzr, wordpress.com
  • Overhead of maintaining platform
  • Scalable
  • Regular

6. Software as a service (Saas)

  • Service accessed via bridge module.
  • eg. Mollom, Acquia Solr
  • Overhead of maintaining platform
  • Scalable
  • Regular

7. Products with some non GPL code

  • Themes
  • eg. Top Notch Themes
  • Overhead of developing product
  • Scalable
  • Irregular
  • Problem: Is the main IP in the code or the images?

8. Products with all non GPL code

  • Online training, documentation, books
  • eg. Lullabot drupalize.me
  • Overhead of developing product
  • Scalable (online training)
  • (Ir)regular

Possible areas of specialisation for service providers

  • Data migration: Data is like wine, code like fish
  • Theming: Where are the themers?
  • Custom module development
  • Project scoping
  • Verticals: distros
  • Server admin, deployment (?)
  • Performance (?)

The main takeaway idea from the talk was that working in non-scalable areas such as full time employment is a safe option which will yield good results so long as you have skill and apply yourself. However, exposing yourself a little to some “wild randomness” in the form of scalable ventures (startups, SaaS, distros) could be a worthwhile pursuit if you are successful.

Jan 12 2011

The concluding article in a four-part series on mobile sites in Drupal. Previous articles have discussed the groundwork, setup and code required for setting up a mobile site in Drupal. It’s now time to reflect on a few of the challenges thrown up and the best way forward.

Gaps

Given the above discussion there are a couple of missing pieces to the mobile jigsaw puzzle as I see it.

Device vs Theme

There should not necessarily be a one-to-one mapping from device (mobile, desktop) to a theme. This certainly is the pattern within much of Drupal (theme blocks, theme caching strategies). The pattern is achieved by making sure the theme_default conf variable is defined in settings.php; the theme is the switch for customisation. However, if this assumption holds we will never see single “responsive web design” themes developed for Drupal, as they rely on a single theme serving multiple devices.

Global variable device

It’s important to have a device variable easily available to all code. The best approach would be to set a well-known variable in conf in settings.php. This could be based on a header extension (x-device), a regex on the UA, a cookie, the URL or a combination of them. The important thing is that it is an agreed-upon variable. This variable is then available for all conditional tests in modules as well as to Context. Both the mobile and desktop versions of the site could be driven by a single progressively enhanced theme.
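
A hedged sketch of that idea in settings.php follows; this era of Drupal reads variable overrides from $conf, and the user-agent regex and theme names are purely illustrative:

// settings.php: derive a single, site-wide 'device' flag that modules,
// Context conditions and themes can all rely on.
$is_mobile = isset($_SERVER['HTTP_USER_AGENT'])
  && preg_match('/mobile|android|iphone|ipod|blackberry/i', $_SERVER['HTTP_USER_AGENT']);
$conf['device'] = $is_mobile ? 'mobile' : 'desktop';
// Optionally drive the default theme from the same flag.
// $conf['theme_default'] = $is_mobile ? 'my_mobile_theme' : 'my_desktop_theme';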

Context

The Blocks UI is dated and on the way out, to be replaced by Context or other solutions. Context works well for controlling blocks, but it does have trouble supporting theme-based configurations.

In the issue queue for Context there has been some discussion around using Theme for a condition. IMO it would be great if Context could support a “conf variable” condition so that it would be possible to add conditions to check for a “device” global variable set in settings.php. It would then be possible to trigger Block reactions based on this variable. This would free up the possibility of a single theme supporting both sites.

Module loading

Being able to control the loading of modules would be a helpful addition. This would allow for turning off whole slabs of functionality not needed for mobile, providing a much better solution than mopping up HTML, JS and CSS in the theme layer. This would require changes to core, so I can’t see it happening. In the meantime we have Mobile Tools and permissions.

Better caching

Upgrading caching layers to handle a “device” prefix to the key would enable a single domain to be used for serving the site. Boost is heading down this path already. There are other solutions available for Varnish.

Progressive themes

And finally we need some themes to experiment with responsive web design. From a practical perspective, my project had some slight annoyances because I was using two very different base themes: Fusion and Nokia Mobile. Translating my mobile subtheme across from the Desktop version was beset with a number of issues mainly to do with regions and bespoke features within the themes. If theme designers could supply a mobile counterpart for their desktop themes life would be easier. Even better if some themes were designed with progressive enhancement in mind.

Write up guidelines for mobile themes: a brief discussion on implementing a mobile theme for Zen.

Up and running

If you want to get a mobile site up and running today then my recommendations are:

  • Think about mobile first and the core actions and data for your users.
  • Set up two sites, either multisite or with a settings.php conditional. Caching should work well with this.
  • Put User Agent mod_rewrite testing in Apache to handle redirects.
  • Some reverse proxy foo can serve one canonical domain if you know how.
  • KISS and be frugal with your module choices. Mobile Tools is a good option.
  • Select your mobile theme carefully and ideally match it up with the desktop theme.
  • Spend time on tweaking content in a custom module. Be a firefighter for a while :( .
  • Test.

I think there is a kernel of another idea in these articles as well and that is for a full adoption of a mobile first strategy for building websites in Drupal. With some small changes in mindset and code outlined above it should be relatively easy to do. This would allow the development of progressively enhanced themes, served from a single domain. The information architecture of Drupal would be improved significantly because we need only one site, one theme, are more RESTful, just as scalable, with simpler CSS, simpler SSL and simpler DNS and no duplicate content issues. Nirvana.

Resources

Some bedtime grazing…

  • Mobile Group: GDO group.
  • Training video (217M): “The first part of the training, which is an overview of basic theory of building mobile-accessible websites, is followed by a practical, hands-on component that steps through the practice of making mobile websites with the popular Drupal framework.”
  • Making Mobile Websites with Drupal (Drupal Everywhere): an interesting discussion, though not all solutions would be considered best practice.
  • mobiForge: “The world’s largest independent mobile development community”.
  • Programming the Mobile Web [Maximiliano Firtman]: book from O’Reilly, 2010.
  • Mobile Design and Development: Practical Concepts and Techniques for Creating Mobile Sites and Web Apps [Brian Fling]: another book from O’Reilly, 2009.
Jan 12 2011

Previous articles have discussed the conceptual groundwork and setup of mobile sites in Drupal. It’s now time to look at a number of themes and modules which will help you in your endeavours. We’ll also cover a number of custom fixes which can be made to the HTML, CSS and JS in your site.

Mobile themes

Funnily enough, the selection of the mobile theme is looking to be one of the least important technical considerations with the whole mobile site. It’s one area where I am not in the best position to comment on the various merits of themes as I haven’t really tested them all. I went with Nokia Mobile because it seemed to have a solid base, being based on code developed by Nokia. That said, I did have to make a number of changes to it to get it to work with my site. Be prepared to get your hands dirty with page.tpl etc. The Adaptivetheme Mobile theme looks quite promising; being a sub theme itself it would naturally fit well with a desktop theme derived from the same base.

  • Nokia Mobile: “Provides different presentation layers to best serve basic devices and high-end smartphones.”
  • Mobile Garland: Garland-inspired, mobile-optimized Drupal theme intended to be used with a mobile optimization module, Mobile Plugin.
  • Adaptivetheme Mobile: Hurrah! A mobile sub theme. “Adaptivetheme Mobile is a subtheme for Adaptivetheme. It is designed purely to build Drupal mobile themes for mobile web devices (for mobile websites).”
  • Mobile: “Intended to return only clean HTML with no styling (images and styling in content is maintained).”
  • .mobi: “Display a mobile, portable format.”

Mobile modules

There are a lot of options available to you when it comes to deploying modules to help you with your task. I am very much of the mind that modules should only be deployed if they are fit for the task and don’t introduce too much overhead or code you don’t understand. My aim is to keep things as “pure” as possible. In many cases you may be better writing your own custom code if you feel comfortable doing that.

Many tutorials recommend going with Domain Access and Mobile Tools with Browscap. It is a combination which could work well for you. However, I ended up not deploying any of these modules, choosing to go my own way. I’ll walk through each of the modules, their main features and why I went the way I did. It basically boiled down to the fact that Apache, settings.php and my own custom tweaks got me most of the way there.

Domain Access

Domain Access is a popular suite of modules which can be used to manage (sub) domains for a (mobile) site. It is exceedingly well documented and structured. It looks to be a high quality module which supports a lot of functionality. Many mobile tutorials speak highly of it and recommend it for mobile sites.

Knowing relatively little about the module I reviewed its main features to see what it had to offer a mobile installation. From my quick review I have been unable to find anything compelling for the problem set I was facing. That said, if your mobile site is to diverge significantly from the desktop site you may find that some of the customisation features quite useful. There may well be stuff that I am missing and would be happy to be enlightened. The relevant features are as follows:

  • Domain Access: The core module allows for (sub) domains to be registered. This is really just the basic setup for the modules. In order for this to work your DNS and VirtualHosts need to be set up as you normally would for a multisite. ie. each domain pointing to the IP of your Drupal installation.
  • Domain Alias: It is possible to define domain aliases for each registered (sub) domain, eg. www.example.com -> example.com. Alternatively, this result could be achieved by adding some aliases in your VirtualHost section in Apache.
  • Domain Theme: Allows you to define a theme for each (sub) domain. Alternatively, if you were using a multisite setup (or some conditional logic) you could set the default theme in settings.php.
  • Domain Config: Offers an extensive set of site configuration options including email, slogan, mission, footer, frontpage, anon user, admin theme, time, cache, menu mappings. Most of these tweaks can be achieved by traditional means. Conf variables can be overridden in settings.php. Custom menu blocks can be placed into regions.
  • Domain Source: Source domain for linking to content. This ensured that some links are rewritten to point to main site. In a mobile setup you would want the site to operate as normal (no rewriting). The duplicate content can be fixed with canonical URL link in the head.

Mobile Tools

Mobile Tools is a popular module which has a bunch of handy utility features most mobile sites could use.

  • Detection of UA and redirect based on Browscap and WURFL: Possible to map user agents to themes, and more sophisticated if Browscap or WURFL is used. This redirection should be taking place outside of PHP, so I am a fan of doing this in Apache rewrite rules or maybe even a caching/reverse proxy layer. This alternative approach has been discussed above.
  • Block telling users of the mobile site: Helpful but easy to add manually.
  • Panels integration: No doubt very helpful if you are using Panels, as a Panel owns a path and that’s it. This could be a deal breaker, so for some sites this integration could be essential. Personally, I stuck to a very simple design so Panels wasn’t an issue for me.
  • Permissions integration: Mobile roles can be used to turn off aspects of the site based on permissions. This is a really good idea and a neat way to turn stuff off.
  • Change number of nodes on homepage: Helpful but could be done with a different view exposed as a block.
Drupal Support for Mobile Devices [Rachel Scott] Overview of the Mobile Tools module with screenshots. Mobilize Your Drupal Site with Mobile Tools Overview of the Mobile Tools module describing multisite setup.

Mobile Plugin

Wide range of features. Tackles some practical issues such as word breaks, scaling images, video embedding and filtering JS. Does device detection and provides its own mobile theme. Unfortunately the documentation specifies that “Page caching must be off to support multiple themes!”. This warning would put most people off. Does this apply even if two sites are being used?

Browscap

A popular module which returns capabilities based on user agent. The module will fetch updates to a database of browser user agents. Integrates with Mobile Tools.

WURFL

“The WURFL module helps you in detecting the device capabilities of the mobile device visiting your website.” Integrates with Mobile Tools. Knowing the capabilities of a device at a very fine level of granularity could be helpful if you are into eking out every enhancement you can. The question is whether you need this level of control.

Module code vs Theme code

Adding a mobile version of your site will make you think about code duplication issues. If you have custom logic in your theme for the desktop site then there is a pretty good chance that a large chunk will be copied across to the mobile site. Bad news. Much of what makes it into themes is not 100% concerned with presentation. It’s hard to draw a line, but if the code is required for both mobile and desktop then it is a good candidate for being factored out into a more central place such as a module. Less code means less work for you in the future. If you do have custom code in template.php then take a look through it and see what can be moved.
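As a minimal sketch of the idea (the module and function names here are hypothetical examples, not from the original site), a helper that both themes need moves out of template.php into a shared custom module, and each theme’s templates simply call it:

/**
 * In a shared custom module (eg. mw_base.module) rather than template.php.
 * Both the desktop and mobile themes can call this from their templates,
 * so the logic only lives in one place.
 */
function mw_base_author_display_name($node) {
  return !empty($node->name) ? check_plain($node->name) : t('Anonymous');
}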

Custom content

Not all changes can be made in the theming layer; it will be necessary to change and optimise the content served.

Custom blocks

Managing block configuration (region, order, title, paths, permissions, etc) is a right royal pain in the you know where, especially if you have a lot of blocks and you need to deploy across dev, staging and production. Going into the blocks admin interface and moving stuff around, editing, saving and repeating gets old real quick. Configuration concerns such as this have largely been overcome through code from Development Seed: Features holds logic and configuration for grouped functionality, and it works nicely together with Context, which allows blocks to be positioned according to an overarching context. Cool. Context could be the answer we are looking for. It certainly is for a normal desktop site.

However, when it comes to configuring blocks for a mobile site, Context only knows about the current theme. This is a known issue for Context. There is another module, called Features Extra, which possibly offers a way to capture config info for blocks, however it too suffers with themes. AFAICT it still isn’t possible to capture block config with multiple themes. Bummer. I’d be interested to know if there are solutions here.

In the meantime you can manually configure blocks the old school way but it really isn’t ideal.

Custom modules

This is one area I was unable to nail as well. In a few places it would have been very handy if I could have turned off a module dynamically to make the mobile site a bit simpler, eg. Colorbox or Admin Menu. AFAICT there is no way to do this. Tracing the calls during bootstrap, I see that module_load_all() is called very late in the procedure at _drupal_bootstrap_full(). module_load_all() calls module_list(), which gets all active modules. It would be great if module_list() could look to conf variables to respect a stop filter of modules. Not going to happen I know, but it would be handy.

This is where the permissions integration in Mobile Tools could really shine. Instead of disabling a module you could control the operation of a module via permissions. Most modules should have permissions limitations on functionality and so can be turned off for the mobile site.

One way to work around this is to mop up the HTML/JS/CSS in the theme layer. This approach is ugly, error prone and brittle, but does hold some promise. You will find recipes similar to the following around the traps:

/**
 * Implementation of hook_preprocess_page().
 */
function mw_mobile_preprocess_page(&$vars) {
  if (!mw_mobile_is_mobile()) { return; }
  // Strips out JS and CSS for a path.
  // See http://www.mediacurrent.com/blogs/remove-or-replace-jscss-page
  // WARNING: The code below messes up jQuery Update even when no scripts are
  // replaced. Use at own risk.
  $remove_css = array(
    //'colorbox' => array('/styles/default/colorbox_default_style.css'),
  );
  $remove_js = array(
    //'colorbox' => array('/js/colorbox.js', '/styles/default/colorbox_default_style.js'),
  );
  // JS
  $scripts = drupal_add_js();
  if (!empty($vars['scripts'])) {
    foreach ($remove_js as $module => $paths) {
      foreach ($paths as $path) {
        $module_path = drupal_get_path('module', $module);
        unset($scripts['module'][$module_path . $path]);
      }
    }
    $vars['scripts'] = drupal_get_js('header', $scripts);
  }
  // CSS
  $css = drupal_add_css();
  if (!empty($vars['styles'])) {
    foreach ($remove_css as $module => $paths) {
      foreach ($paths as $path) {
        $module_path = drupal_get_path('module', $module);
        unset($css['all']['module'][$module_path . $path]);
      }
    }
    $vars['styles'] = drupal_get_css($css);
  }
}

In the end I gave up on going down this path because I was running into a problem with jQuery not being updated, leading to JS errors on the page. It was too brittle for me to trust.

For me, the take away is that you are pretty much stuck with using the same modules if you are sharing the database. You just have to be aware of this when designing the site. The only way to work around it is to place some conditional logic into your own custom modules which checks for the site being mobile. If you are using contrib modules then things will be trickier.
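As a minimal sketch of that pattern, here is a hypothetical custom module (mw_gallery is a made-up example) which only attaches its heavy JavaScript for desktop visitors:

/**
 * Implementation of hook_nodeapi().
 * Hypothetical module: skip the gallery JS when serving the mobile site.
 */
function mw_gallery_nodeapi(&$node, $op) {
  $mobi = module_exists('mw_mobile') && mw_mobile_is_mobile();
  if ($op == 'view' && !$mobi) {
    // Only desktop visitors get the extra behaviour.
    drupal_add_js(drupal_get_path('module', 'mw_gallery') . '/gallery.js');
  }
}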

You may want custom primary and secondary links for the mobile site. If you really have thought mobile first then maybe the menus will be the same :) but there’s a good chance they will be pared down for the mobile site. It’s not possible to easily define two sets of primary menus, one for mobile and one for desktop. However, Mobile Tools offers a way to map primary/secondary menus to other menus. There are two other options if you don’t want to install Mobile Tools:

  • Define different menus (eg. Primary Mobile) and drop them into desired region using Blocks. Comment out the primary links in page.tpl.
  • Programmatically set the links in a custom module

In the end I just programmed these menus in code in my mw_mobile module because the menus had some logic in them for login/logout links:

/**
 * Implementation of hook_preprocess_page().
 */
function mw_mobile_preprocess_page(&$vars) {
  if (!mw_mobile_is_mobile()) { return; }
  // Completely hijack the primary menu and set it from code. This allows
  // the primary menu to be managed in Features for the desktop site. We just
  // need to override it here.
  $vars['primary_links'] = array();
  $vars['primary_links']['blah'] = array(
    'title' => t('Blah'),
    'attributes' => array('title' => 'Blah.'),
    'href' => 'blah',
  );
  // etc
}

Custom Views

This section really gets back to the “mobile first” and “responsive web design” concepts we discussed earlier. Views are very powerful and there is a strong temptation to make them as sexy as possible, displaying images, extra content, edit links, star ratings and the like. Step back and take a look at what you are doing. It may be possible to design a simple display which works well in both mobile and desktop.

Often you really do want to display rich tabular information in the desktop version of the site. In these cases you shouldn’t compromise – you’ll need to create different versions. In these cases progressive enhancement doesn’t really cut it as you want to return more content, not just tweak the presentation.

If it is a View Block which is giving you grief then just make a mobile version and use that instead. Use the block system to place different blocks for the different themes.

If it is a View Page then you could be in trouble as the View takes hold of the path and it is difficult to customise that on a per site/theme basis. One solution is to expose the View as a Block (with a mobile version) and then place that block on a Page (node) or menu path you have made. In this case the page is acting like poor man’s Panels. A bit ugly but it works.

If you are lucky you might be able to define a custom formatter which just returns a blank (or a simple version) if the site is mobile.

A final alternative is to define View templates which have some conditional logic in them. This is possibly the purest way but I think it could become a maintenance issue. We are trying to minimise duplication and effort – creating new files with extra display logic is best avoided.
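For illustration, here is a minimal sketch of such a conditional template, assuming a Views 2 row-style template; the override filename and field names are hypothetical:

<?php
// views-view-fields--frontpage.tpl.php (hypothetical override filename).
// Skip the heavy image field when serving the mobile site.
$mobi = module_exists('mw_mobile') && mw_mobile_is_mobile();
?>
<div class="views-row-inner">
  <?php print $fields['title']->content; ?>
  <?php if (!$mobi): ?>
    <?php print $fields['field_image_fid']->content; ?>
  <?php endif; ?>
</div>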

Custom Panels

I’ll come clean here and own up to not having caught the Panels bug just yet, being content to limit my degrees of freedom. Yes, I am that boring :) Anyway, Panels faces a similar problem to Views Pages in that panels are tied to a path which isn’t scoped by a theme (as Blocks are). In this case, Mobile Tools could be quite helpful in showing different layouts for a Panel.

Custom Variables

Drupal can be configured with a whole bunch of variables, many of which are available for editing in the site configuration part of the site. Fire up phpMyAdmin and browse the variable table to get an idea of what is available. These variables are site wide and as such will apply equally to the mobile and desktop versions in our multisite setup. It is possible to override these variables for the mobile site by tweaking settings.php. We have already seen this in action for the default theme. You can do it for other things as well. Mobile Tools offers an interface for this but you can do it manually. I have found that only a small number of rarely changed variables need to be tweaked, so settings.php is a viable option.

$conf = array(
  'theme_default' => 'mw_nokia_mobile',
  'googleanalytics_account' => 'UA-xxxxxxxx-2',
);

Output tweaks

Mobile devices have special requirements, not all of which can be handled by the theme templates alone. The metadata and content of the site may need some special work. The Rethinking the Mobile Web slideshow above noted that we need to adjust and compress content to make it palatable for mobile. This is where a lot of that nasty work happens. You’ll probably only run into these issues after testing the site for real. No doubt you will have your own special set of problems to deal with. The Mobile Plugin module (http://drupal.org/project/mobileplugin) plugs some of these holes.

ImageCache

You probably have a bunch of ImageCache presets defined for your site, which may or may not be captured in Features config. These presets may be outputting images at a size which is too big for the mobile site. Anything wider than 100px is probably too big. You are aiming to be frugal with bandwidth as well as screen real estate. Time to get shrinking those images. See the hook_preprocess_page code below.
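If you keep presets in code, a smaller mobile preset might look something like the following. This is only a sketch, assuming ImageCache 6.x-2.x’s default-presets hook; the module name (mw_base) is an example and the preset name matches the mapping used in the code further down:

/**
 * Implementation of hook_imagecache_default_presets().
 * Defines a small 75x75 preset for use on the mobile site.
 */
function mw_base_imagecache_default_presets() {
  $presets = array();
  $presets['mobile_thumbnail'] = array(
    'presetname' => 'mobile_thumbnail',
    'actions' => array(
      array(
        'weight' => '0',
        'module' => 'imagecache',
        'action' => 'imagecache_scale_and_crop',
        'data' => array('width' => '75', 'height' => '75'),
      ),
    ),
  );
  return $presets;
}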

Secure Login

If you are using the Secure Login module, you may run into difficulties with the multisite setup. The way I had Secure Login configured was to specify the URL to redirect to. This URL is of course for the desktop version of the site, so your mobile users will be routed to the desktop site after they log in. They may not notice it if URL redirection is working for mobile users, but we want to minimise redirects such as this.

It is possible to leave the Secure Login URL blank and then it will apparently use the base_url defined in settings.php. This would be a sensible approach; however, I was having all sorts of path difficulties with ImageCache when I specified these URLs. I don’t know why. Anyway, the easiest solution for me was to stick with the hardcoded URL for Secure Login and then fix it up afterwards in code.

/**
 * Implementation of hook_preprocess_page().
 */
function mw_nokia_mobile_preprocess_page(&$vars) {
  // ImageCache images are shrunk to mobile size.
  $fix_regions = array('content', 'right');
  foreach ($fix_regions as $fix_region) {
    _mw_nokia_mobile_fix_imagecache($vars[$fix_region]);
  }
  // Secure Login will target the URL you entered into the site config.
  // There might be a better way to fix this but we just string replace here.
  _mw_nokia_mobile_fix_secure_login_form($vars['content']);
}

/**
 * Secure Login hardcodes the URL entered in the config: securelogin_baseurl.
 * This will be the desktop version of the site. We need to change it to the
 * mobile version. There isn’t an obvious way to do this via code, unless you
 * write your own hook_form_alter but that would usurp the function of
 * securelogin. So we just mop up afterwards. These login pages will be
 * cached anyway.
 * eg. https://example.com -> https://m.example.com
 *
 * NB: It MIGHT be possible to leave out securelogin_baseurl in the config and
 * manually set the base_url in the settings.php for the site. However, when
 * I did settings like this in the past I ran into problems... can’t remember
 * what they were now... So this might be a solution which would avoid the need
 * for this code.
 */
function _mw_nokia_mobile_fix_secure_login_form(&$text) {
  if (!module_exists('securelogin')) {
    return;
  }
  $sec_url = variable_get('securelogin_baseurl', '');
  if (empty($sec_url)) {
    return;
  }
  $new_url = str_replace('https://', 'https://m.', $sec_url);
  $pre = '<form action="';
  $paths = array('/user/login', '/user/register');
  foreach ($paths as $path) {
    $search = $pre . $sec_url . $path;
    $replace = $pre . $new_url . $path;
    $text = str_replace($search, $replace, $text);
  }
}

/**
 * Map ImageCache presets to smaller presets. This is VERY UGLY because you
 * need to correct for the width and height as well. Sorry to have impinged
 * upon your senses!
 * Adapted from http://groups.drupal.org/node/50678#comment-227203
 */
function _mw_nokia_mobile_fix_imagecache(&$text) {
  if (!module_exists('imagecache')) {
    return;
  }
  // Mappings. Ignore: slider_perview, slider_thumbnail.
  // Old preset, old width, old height, new preset, new width, new height.
  $mappings = array(
    array('thumbnail_small', '83', '83', 'mobile_thumbnail', '75', '75'),
    array('thumbnail', '100', '100', 'mobile_thumbnail', '75', '75'), // thumbnail last
  );
  // Fix.
  $file_url = base_path() . file_directory_path();
  foreach ($mappings as $mapping) {
    list($old, $old_w, $old_h, $new, $new_w, $new_h) = $mapping;
    $old_path = $file_url . '/imagecache/' . $old;
    $new_path = $file_url . '/imagecache/' . $new;
    $old_class_size = 'imagecache-' . $old . '" width="' . $old_w . '" height="' . $old_h . '"';
    $new_class_size = 'imagecache-' . $new . '" width="' . $new_w . '" height="' . $new_h . '"';
    $text = str_replace($old_path, $new_path, $text);
    $text = str_replace($old_class_size, $new_class_size, $text);
  }
}

Search snippets URLs

I’m not sure if the following applies to ordinary Drupal search but it certainly does with Apache Solr Search. The URLs for the individual search results were coming back with the fully qualified URL pointing to the desktop site. This was solved by a bit of mopping up in a base feature, mw_base.

function phptemplate_apachesolr_search_snippets($doc, $snippets) {
  // Mobile site?
  $mobi = module_exists('mw_mobile') && mw_mobile_is_mobile();
  $url = $doc->url;
  if ($mobi) {
    $url = str_replace('http://', 'http://m.', $url);
  }
  // The remainder of the override (rendering the snippets using $url) has
  // been elided here; it follows the original theme function.
}

GMap

The combination of Location and GMap is a very popular one on Drupal sites. The GMap module currently targets version 2 of the Google Maps API. Version 3 offers a bunch of new features for mobile devices.

Google Maps JavaScript API V3 “The Google Maps Javascript API lets you embed Google Maps in your own web pages. Version 3 of this API is especially designed to be faster and more applicable to mobile devices, as well as traditional desktop browser applications.”

For now users of GMap are stuck on v2 but there is active development in GMap to bring the module up to support v3.

WYSIWYG

WYSIWYG textareas do not display properly on some mobile devices. You need to turn them off.

WYSIWYG on mobile devices Discussion on WYSIWYG issue queue regarding the difficulties faced on a variety of devices. End conclusion appears to be that you need to turn it off for best results.

How do you turn off WYSIWYG? After a bit of poking around I worked out that setting the ‘format’ element for the textarea to an empty array was the way to do it. The following code in your mobile module will do the trick for comment and node bodies. If you have other forms which need fixing then you’ll need to do a bit of debugging to suss out what the form id and element key are.

/**
 * Implementation of hook_form_alter().
 */
function mw_mobile_form_alter(&$form, $form_state, $form_id) {
  if (!mw_mobile_is_mobile()) { return; }
  // Turn off WYSIWYG for textareas. You need to manually find the form
  // with WYSIWYG and then work out its id and where 'format' is.
  //print $form['#id'] . ' ';
  $no_wysiwyg = array(
    'comment-form' => 'comment_filter',
    'node-form' => 'body_field',
  );
  $id = $form['#id'];
  if (array_key_exists($id, $no_wysiwyg)) {
    //print_r($form);
    $form[$no_wysiwyg[$id]]['format'] = array();
  }
}

Tabs

The primary and secondary tabs which appear at the top of the page tend to take up a fair amount of horizontal space and will often be the element which causes horizontal scrolling. These tabs can easily be restyled to display as a traditional list. You can also update page.tpl and move the tabs to the bottom of the page so they don’t detract from the main content.
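A minimal page.tpl.php sketch of that second option; the surrounding markup is illustrative only and will differ per theme:

<?php // page.tpl.php excerpt for the mobile theme: tabs printed after the content. ?>
<div id="content">
  <?php print $content; ?>
</div>
<?php if ($tabs): ?>
  <div class="tabs-bottom"><?php print $tabs; ?></div>
<?php endif; ?>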

Flash

Flash is not going to work on iPhones, iPads and many other devices. It’s also heavy and resource intensive. As such it shouldn’t really be used for content or navigation on the mobile site. The exception might be to show video content; however, even in this case there might be better workarounds.

Suckerfish

Suckerfish provides dropdown menus which can take up a lot of room. The hover metaphor doesn’t work for touch devices. Best avoided.

Make sure that links are big enough to be clickable: large, with enough whitespace around key navigation links.

YouTube

Mobile devices such as the iPhone and iPad may have a special app to handle YouTube videos natively, so clicking on a link is preferable to displaying an embedded video.

Advertising

Yes – ads can be optimised for mobile as well.

Google Mobile Ads “In May 2010 Google acquired AdMob, a leading mobile advertising network that developed innovative mobile-specific ad units and solutions, to significantly enhance our mobile display ad services for publishers.”

Testing

Testing an afterthought? Never :) The fact is that a lot of the hard work is in getting over the config hurdles. Once the mobile site is up and running you are going to uncover a bunch of things you never dreamed about. Here’s a quick checklist of things to look out for:

  • UA testing and redirect working.
  • Boost/Drupal Core page caching working.
  • SSL login working OK.
  • Basic functionality and navigation operational.
  • Theming is up to scratch.
  • Site fulfills its goals.
Jan 12 2011
Jan 12

A previous article covered some basic groundwork for mobile sites in Drupal. This article goes on to look at different ways to setup a mobile site in Drupal. It covers single site, multisite, single site with settings.php tweak and the Domain Access module. Caching strategies, redirect rules and other server side settings are also discussed.

RESTful design of URLS

REST defines the architecture of the World Wide Web. One of the principles of REST is that a single URI represents a resource and that resource is conceptually different from the representations returned to the client.

Representational State Transfer “Representational State Transfer (REST) is a style of software architecture for distributed hypermedia systems such as the World Wide Web. The term Representational State Transfer was introduced and defined in 2000 by Roy Fielding in his doctoral dissertation.[1][2] Fielding is one of the principal authors of the Hypertext Transfer Protocol (HTTP) specification versions 1.0 and 1.1”

Here’s the passage from Roy Fielding’s thesis (emphasis added) which discusses the differences between resource and representation:

“This abstract definition of a resource enables key features of the Web architecture. First, it provides generality by encompassing many sources of information without artificially distinguishing them by type or implementation. Second, it allows late binding of the reference to a representation, enabling content negotiation to take place based on characteristics of the request. Finally, it allows an author to reference the concept rather than some singular representation of that concept, thus removing the need to change all existing links whenever the representation changes (assuming the author used the right identifier).”

A resource is named by a URI. The server chooses the best representation to provide to the client based on headers sent by the client. In this case we are looking at the User Agent.

If we were to follow RESTful principles then the mobile site should indeed be served from the same domain as the desktop site, ie. one resource, different representations. In this scenario the HTML returned to the mobile client is just a different representation to that provided to the desktop client. This is a natural way to design a web app as it means that there is only one “canonical” URI for the resource with no chance of nasty duplicate content issues. From an SEO point of view this is desirable. However…

Caching, the fly in the ointment

We’ve just seen that serving different representations from a single URI is a good thing from many perspectives: mobile first, progressive enhancement, REST and SEO. However, there is one reason why we may decide to go down the path of using two domains instead of one: caching.

Caching mechanisms, such as Drupal core page caching and Boost, use the fully qualified domain name of a URI to determine caching keys. This allows the cache to quickly serve content to different clients without knowing the criteria which decide the representation received by the client, ie. the cache just has to know about the URI, it doesn’t need to decipher the user agent. Currently, if different representations are served for the same resource then the cache will likely become populated with a mix of different representations, leading to chaos. For this reason it is generally accepted that having a separate mobile site on a sub domain is a good way to go, ie. we would have two sites, such as example.com and m.example.com.

Cache by theme for mobile sites mikeytown2 offering some great advice on Apache and Boost rules. .htaccess Mobile Browser Redirect User Agent processing in Apache to redirect to mobile.

Some users have solved the caching problem AND managed to serve different representations from the same URI. Going mobile with a news site that Just Works describes how browser detection can be done in the caching layer, in this case Squid, before redirecting the request invisibly to another domain. This is the perfect setup as RESTful principles are maintained and the site is scalable. Hats off. Unfortunately not everyone is running a reverse proxy which allows for this kind of setup. A request looks like this:

  1. mobile client does GET http://example.com/about,
  2. Squid (port 80) looks at User Agent, determines device and sends to http://m.example.com/about,
  3. Boost finds “about” in /cache/normal/m.example.com/ -> Static HTML returned OR,
  4. Drupal serves from multisite -> Dynamic HTML returned.

mikeytown2 claims that it should be easy enough to add some logic into the Boost rules based on user agent; he just needs to know what the user agents are. So there is a good chance that Boost users will be able to serve both mobile and desktop versions from the one URI space. From my understanding of the proposed approach it looks like a single domain will be all that is required.

  1. mobile client does GET http://example.com/about,
  2. Boost looks at User Agent, determines device and uses a different “mobile” device rather than “normal”,
  3. Boost finds “about” in /cache/mobile/example.com/ -> Static HTML returned OR,
  4. Drupal serves from single site -> Dynamic HTML returned.

A slightly different approach has been described in Mobile Detection with Varnish and Drupal, where Varnish sets a header which can then be read in the webserver or Drupal. This is a neat approach as it means that device detection logic needn’t be repeated in settings.php. The flow described by Morten Fangel is as follows (a settings.php sketch follows the list):

  1. mobile client does GET http://example.com/about,
  2. Varnish looks at the User Agent, determines the device and appends it to the hash for the cache key,
  3. Varnish also sets an X-Device header for the device,
  4. Varnish finds “about” in cache -> Static HTML returned OR,
  5. Drupal serves from single site -> Dynamic HTML returned.
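As a minimal settings.php sketch of how Drupal might consume that header downstream (the X-Device header name and its values are assumptions based on the flow above, not a documented API):

// settings.php: trust the device classification done upstream by Varnish.
if (isset($_SERVER['HTTP_X_DEVICE']) && $_SERVER['HTTP_X_DEVICE'] == 'mobile') {
  $conf['theme_default'] = 'mw_nokia_mobile';
}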

Assuming you don’t have Squid, Varnish or a patched Boost to hand you will probably have a setup as follows:

  1. mobile client does GET http://example.com/about,
  2. Apache rewrite looks at the User Agent, determines the device and redirects to http://m.example.com/about,
  3. Drupal Core or Boost finds “about” in cache -> Static HTML returned OR,
  4. Drupal serves from multisite -> Dynamic HTML returned.

Sub domain vs Different Domain

If you are going to use a separate site to host the mobile site then you are free to choose whatever domain you like, eg. example.mobi. However, it is generally recommended to stick with using a sub domain of the desktop site. This confuses users less and it is possible to share cookies across sites on the same domain.

Different theme

As discussed in the previous article, it is possible to serve the same default theme to both mobile and desktop sites and then progressively enhance the desktop site with some extra CSS. The method proposed in Rethinking the Mobile Web at slide 106 is:

<link href='default.css' type='text/css' rel='stylesheet'
media='screen' />
<link href='desktop.css' type='text/css' rel='stylesheet'
media='screen and (min-device-width:1024px) and (max-width:989px)' />

This is a very cool way to design a site as it keeps things very simple. Mobile comes first and then comes the progressive enhancement. However, this isn’t a pattern which is adopted by most Drupal themes, where the presumption is for the desktop theme. If we did take this approach it would preclude us from using the majority of themes designed for Drupal so far. Given this, I would say that our Drupal site will support two separate themes, one for desktop and one for mobile. The general approach is to use a multisite setup, define a desktop theme as default in the GUI and then override that theme via a tweak in settings.php for the mobile site.

Multisite setup

Assume we are using two domains due to caching requirements. How do we serve this content? Drupal does have a multisite feature built in where a single Drupal “platform” can support many different site instances. These sites can share all data, no data or partial data, depending on how they are set up in settings.php. In the case of a mobile site we would want to share all data between the sites.

One possible setup is to create a directory for the desktop and mobile versions under sites/

sites/

  • all/
    • modules/
      • contrib/
      • custom/
        • mw_mobile/
    • themes/
      • base_desktop/
      • base_mobile/
      • mw_desktop/
      • mw_mobile/
  • default/
  • example.com/
    • settings.php
  • m.example.com/
    • settings.php

The only trick to get this to work is to manually set the default theme for the mobile site in the sites/m.example.com/settings.php file. For every page request, the config in settings.php will override the default variables defined in the variables table in the database.

$conf = array(
  'theme_default' => 'mw_nokia_mobile',
);

If you manually set a value like this you won’t be able to change it in the UI, naturally enough. Make sure the theme is active in the GUI.

Alternative 1: Single site with settings.php logic

The above multisite setup will work; however, there is something wrong with it. It will stop you from hosting a true multisite setup where the sites share code but have different databases. This may not worry you if you are only hosting a single site on the platform, but it could be important if you want multisites. Imagine a site for Company X served on example.com and Company Y on example.net. You couldn’t use multisites with the above setup because of the reliance on shared files in default/files.

However, you can achieve a very similar effect with a single site by using a bit of conditional logic in settings.php for example.com and example.net. The idea is to set the theme based on the domain, meaning only a single site is needed to support desktop and mobile. Add this to sites/example.com/settings.php:

$parts = explode('.', $_SERVER['HTTP_HOST']);
if ($parts[0] == 'm') {
  $conf = array(
    'theme_default' => 'company_a_mobile',
  );
}

You could then support mobile with a pure sites setup with shared code and different databases/files. This is a good way to go.

sites/

  • all/
    • modules/
      • contrib/
      • custom/
    • themes/
      • base_desktop/
      • base_mobile/
  • default/
    • files/ -> empty
  • example.com/ -> company A
    • files
    • modules
    • themes
      • company_a_mobile
      • company_a_desktop
    • settings.php -> with conditional setting of default theme
  • example.net/ -> company B
    • files
    • modules
    • themes
      • company_b_mobile
      • company_b_desktop
    • settings.php -> with conditional setting of default theme
multi site a) standard theme, site b) mobile theme – same code and same tables? Discussion of multisite setups.

Alternative 2: Domain Access

The Domain Access module, discussed later, can set a lot of this up for you including sub domains, domain aliases and themes. You may prefer to use it for convenience, especially if you like configuring stuff in a GUI rather than settings.php or custom modules.

Mobile global variable

Modules are going to want to access a global variable which tells them the device accessing the site: mobile or desktop. There are a variety of ways to do this, some set the variable early, others late:

  1. Custom “X-Device” header set in a reverse proxy
  2. Conf variable set in settings.php
  3. Global variable set by a module during hook_init()
  4. API function offered by a module

It is possible to do this through the use of hook_init() in a custom module. I tried this but ran into problems with timing and module weight. Sometimes you will want the mobile module to be heavy, sometimes light :) In the end I went with an “API” function in my mw_mobile module which stores a static variable. It should be pretty fast and not too cumbersome. Other contrib modules take a similar approach.

/**
 * An API function in a custom module.
 * Efficiently returns whether the site is mobile.
 * Other modules should call it as follows:
 * $mobi = module_exists('mw_mobile') && mw_mobile_is_mobile();
 */
function mw_mobile_is_mobile() {
  static $out;
  if (isset($out)) {
    return $out;
  }
  // Set and return.
  if (substr($_SERVER['SERVER_NAME'], 0, 2) == 'm.') {
    $out = TRUE;
  }
  else {
    $out = FALSE;
  }
  return $out;
}

This approach is perhaps not the best. It may be better to set a global variable very early in the bootstrap process, in settings.php, so that it can be reliably used by all other Drupal code.
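A rough sketch of that variant, assuming the sub domain convention used throughout this article (the global variable name is a made-up example):

// In settings.php: decide once, very early, whether this request is mobile.
// Both $conf overrides and custom modules can then rely on the global.
global $mw_is_mobile;
$mw_is_mobile = (strpos($_SERVER['HTTP_HOST'], 'm.') === 0);
if ($mw_is_mobile) {
  $conf['theme_default'] = 'mw_nokia_mobile';
}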

Cross site authentication

It is possible to set cookies up so that they will be sent no matter what the sub domain. In settings.php uncomment the $cookie_domain variable and set it to the domain, excluding the sub domain. Please note that this will not work if you are using different domains.

$cookie_domain = 'example.com';

Redirecting the user to mobile

When a mobile user hits the desktop version of the site you want them to be redirected to the mobile site. There are at least three ways to do this:

  • PHP
  • JS
  • Apache

The first inclination may be to go with PHP, as after all, we are PHP developers. However, this has the shortcoming of requiring that Drupal be bootstrapped before the PHP can be run, destroying the chance to safely cache the page for anonymous users. It’s slow and ineffective, so doing it in PHP isn’t really an option. This is the approach some of the mobile modules take but I think it’s something to be avoided.

You could of course do a client side check in Javascript for the client’s user agent. This will allow for caching but has the downside of forcing a full page download. Also, not every client will have JS enabled. Not really an option.

The final option of doing it in Apache (or your webserver) is the only viable alternative. I went with a recipe similar to the following in my .htaccess:

# Mobile: force mobile clients across to the mobile site
RewriteCond %{HTTP_HOST} !^m\.(.*)$
RewriteCond %{HTTP_USER_AGENT} !ipad [NC]
RewriteCond %{HTTP_ACCEPT} "text/vnd.wap.wml|application/vnd.wap.xhtml+xml" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "acs|alav|alca|amoi|audi|aste|avan|benq|bird|blac|blaz|brew|cell|cldc|cmd-" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "dang|doco|erics|hipt|inno|ipaq|java|jigs|kddi|keji|leno|lg-c|lg-d|lg-g|lge-" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "maui|maxo|midp|mits|mmef|mobi|mot-|moto|mwbp|nec-|newt|noki|opwv" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "palm|pana|pant|pdxg|phil|play|pluc|port|prox|qtek|qwap|sage|sams|sany" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "sch-|sec-|send|seri|sgh-|shar|sie-|siem|smal|smar|sony|sph-|symb|t-mo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "teli|tim-|tosh|tsm-|upg1|upsi|vk-v|voda|w3cs|wap-|wapa|wapi" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "wapp|wapr|webc|winw|winw|xda|xda-" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "up.browser|up.link|windowssce|iemobile|mini|mmp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "symbian|midp|wap|phone|pocket|mobile|pda|psp" [NC]
RewriteCond %{HTTP_USER_AGENT} !macintosh [NC]
RewriteRule ^(.*)$ http://m.%{HTTP_HOST}/$1 [L,R=302]

.htaccess Mobile Browser Redirect Outlines the approach taken above.

SSL issues

If you are using SSL on the desktop version of your site then you have a couple of extra hurdles to jump in order to get it working on the mobile site.

Firstly, as it isn’t possible to set up SSL for two different domains on the same IP address, you will probably need to rent a new IP address for the mobile version of the site. Sort this out with your ISP. It should cost you between $1 and $3 a month for another IP. They may have instructions for setting up A records, static routing for your IP addresses, etc.

Secondly, you will also need to sort out another certificate for the mobile site. You could purchase a wildcard certificate for the domain and all sub domains. These cost a fair bit more but will save you from buying a new cert for each sub domain. However, it is probably cheapest to get another cert for the mobile site along with the new IP. You will then need to install the certificate on your server and tweak your site config for the mobile site. This certainly is one of the pains of having two separate sites.

PositiveSSL from Comodo $10 pa. Gandi Free first year then 12 Euro pa.

Custom version of robots.txt

A corollary of having a shared database and file system with a multisite install is that you can’t have custom versions of key files such as robots.txt, which sits in the root of your Drupal platform. In the simple case I don’t believe there is any need for a different version; however, if you do need to support different versions then you can do it with a bit of .htaccess magic. Place the following code under the mobile redirect rule. Just be sure to add a robots.txt file at sites/%{HTTP_HOST}/files/robots.txt for each domain.

# robots.txt: solve multisite problem of only one robots.txt
# redirects to file in /sites/files/robots.txt
RewriteRule ^robots\.txt$ sites/%{HTTP_HOST}/files/robots.txt [L]

multi-site robots.txt GDO discussion of this approach.

Duplicate content, Canonical URLs and robots.txt

You now have two sites where the mobile site replicates all the content of the desktop site. This is a major issue as search engines such as Google will treat it as duplicate content, leading to declines in ranking. We need to sort this out. Google came up with the concept of a canonical URL which can be defined in a link element in the head of the HTML page. In our case the link points back to the desktop site.

<link rel="canonical" href="http://example.com/about" />

Specify your canonical Google documentation on how to define a canonical URL.

We need every page in the mobile site to support this tag. This can be set in your mobile module:

/**
 * Implementation of hook_init().
 */
function mw_mobile_init() {
  if (!mw_mobile_is_mobile()) { return; }
  // Add a canonical URL back to the main site. We just strip "m." from the
  // domain. We also change https to http. This allows us to use a standard
  // robots.txt, ie. no need to noindex the whole of the mobile site.
  $atts = array(
    'rel' => 'canonical',
    'href' => str_replace('://m.', '://', _mw_mobile_url(FALSE)),
  );
  drupal_add_link($atts);
}

/**
 * Current URL, considers https.
 * See http://www.webcheatsheet.com/PHP/get_current_page_url.php
 */
function _mw_mobile_url($honour_https = TRUE) {
  $u = 'http';
  if ($honour_https && isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') {
    $u .= 's';
  }
  $u .= '://';
  if ($_SERVER['SERVER_PORT'] != '80') {
    $u .= $_SERVER['SERVER_NAME'] . ':' . $_SERVER['SERVER_PORT'] . $_SERVER['REQUEST_URI'];
  }
  else {
    $u .= $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'];
  }
  return $u;
}

The final thing to resolve is whether to set “noindex” on the mobile site. This is definitely an area where there is some confusion on the web. After sniffing around I came to the conclusion that it is OK to allow Google to index the mobile site, so long as the canonical links have been specified. This means that any page rank given to the mobile site will flow to the desktop site and you won’t be punished for duplicate content.

The outcome is that you can go with the same robots.txt for both sites, ie. robots are free to index the mobile site. There is no need to specify a different robots.txt for mobile. You want the same stuff indexed for the mobile as you do with the desktop.

The one exception to this would be the files/ directory. A recent core update (6.20) allowed files/ to be indexed. Fair enough, you want your public images to be indexed. However, you could make the case that files/ shouldn’t be indexed on the mobile site, given that there is no way to specify a canonical link for these binary files. So you may well want to support a different robots.txt for each site by blocking access to files on the mobile site. This is a very minor issue and probably not worth worrying about.

Jan 12 2011
Jan 12

Recently I was involved in a project to create a mobile version of a Drupal site. During the process I had to overcome a number of problems – problems which pretty much everyone designing a mobile site will have to solve. This series of articles covers a bit of theory, the practical problems and the various decisions I made. Hopefully it will help some of you out. The solutions offered may not be the best or only solutions by any means – in many cases you need to decide what is best for your site or setup.

Get amped for mobile

In the past it has been easy to not even consider mobile when designing a site or an application. However, there are so many compelling reasons these days to treat your mobile site as a key component of your web strategy. Mobile devices are becoming increasingly popular, and more of your users will be accessing your site through a phone or tablet. They will expect a good experience from your site or else they will not return. Watching a user attempt to browse your unoptimised desktop app on a mobile device is an embarrassing experience for you and a frustrating one for the user.

Drupal in a tablet world [Dries Buytaert] Dries opens up a discussion about the future of Drupal for mobile and tablets. “This all begs the question: in a world of tablets and mobile handheld devices, what do we need to do so that Drupal will be a go-to platform? How can Drupal contribute so as to be a player in the ever-expanding ecosystem of tablets and mobile phones?” Everything you ever wanted to know about mobile, but were afraid to ask [Tomi T Ahonen] A long article which covers the myths and misconceptions about mobile. Very useful for getting your head around just what a remarkable medium it is.

There are many reasons for choosing to develop a mobile web site as opposed to a “native app”. This doesn’t have to be an either/or decision of course; however, developing a mobile site first is probably a sensible decision. Firstly, the fractured nature (programming language, feature sets, application stores) of the “native apps” space across the various mobile platforms (iPhone, Android, Blackberry, etc) makes development of separate apps an expensive and time consuming process. WebKit and HTML5 offer a lot of promise for developing richer clients which approach the features provided by native apps. It is interesting to note that Google and Apple continue to support WebKit even though they have their own native app platforms. Secondly, the chances are that you are going to have more visitors hitting your site after searching Google than will download a custom app. Optimising this user experience from Google makes sense.

Design philosophy

Before jumping into the nitty gritty of tweaking Drupal to accommodate a mobile site, it’s worth putting in a bit of groundwork. Planning for and designing a mobile website should involve taking a fresh approach to your site or application. By focussing on mobile, and its inherent limitations and features, we can concentrate on what is truly important to users. I thoroughly recommend that you take 10 minutes and flick through the following slideshow:

Rethinking the Mobile Web by Yiibu Slideshow covering the old and new ways of doing mobile. “Progressive enhancement” and “Responsive web design” techniques discussed.

Of particular interest to us is Slide 43 which covers a soup of considerations we face when designing a mobile site. These are some of the thorny issues which crop up:

  • device detection,
  • content adaption,
  • multiple sites,
  • device capabilities,
  • doctypes,
  • multiple templates.

There are a lot of issues to deal with and this article will deal with many of them. There is no silver bullet to solving a lot of this stuff. However, we will go into the exercise armed with some solid principles which should serve us well. Slide 125 lays it out for us:

  • mobile first,
  • well structured meaningful markup,
  • progressively enhance,
  • adapt content,
  • compress content.

Drupal as a platform is pretty good at well structured markup, adaptable content and compressed content. Two themes worth pondering a little further at this point are “mobile first” and “progressive enhancement” as they will help us think about the mobile site in a functional way rather than just as a cut down version of the desktop site.

Mobile first

Mobile has traditionally been considered an afterthought, if at all. However the landscape has shifted and it is now vitally important. Luke Wroblewski, a strong proponent of the approach, identifies three key reasons:

  1. Mobile is exploding
  2. Mobile forces you to focus
  3. Mobile extends your capabilities

Focussing on the mobile platform forces the developer to prioritize the important things. What are the key actions and data for your application? Strip everything away and focus on just that. The end result will be an app which is simpler, easier to understand, just as functional and much better. This brings development back from flashy extraneous eye candy to a functional site which is designed to serve the user.

Some questions you might ponder. Why are your users accessing the site? What are the likely actions they want to perform? What information do they want to access? What’s the simplest way to do it? Do I need that navigation? Are those icons distracting? Is the page fast to load? Do I need a better algorithm for finding nearest locations…

Mobile First [Luke Wroblewski] Three reasons why Web applications should be designed for mobile first instead. Mobile First [Luke Wroblewski] “In this presentation, Luke Wroblewski will dig into the three key reasons to consider mobile first: mobile is seeing explosive growth; mobile forces you to focus; and mobile extends your capabilities.” Barcelona: Mobile First Eric Schmidt (Google) talks mobile. Google encourages developers to work on mobile first before desktop. “The phone is no longer the phone, it’s your alter ego. It’s the extension of everything we are. It doesn’t think as well as we do, but it has a better memory…. This is the time for us, now is the time for us to get behind this. … We understand that the new rule is mobile first.”

Progressive enhancement

One of the big conceptual advancements in web development in the last seven years has been a move away from “graceful degradation” towards progressive enhancement. Coined by Steve Champeon (of hesketh mailing list fame), progressive enhancement describes a process of starting with the basics and then building up functionality for richer clients. That is, instead of designing for a rich client and taking away bits and pieces for simpler devices, we start with semantic HTML and then progressively add CSS, Javascript and other goodies which improve the content.

Drupal has been very good in this regard. There is a strong emphasis on producing semantic HTML with no inline styles or Javascript. Inline images for presentation are avoided. Drupal pages rock when viewed with no stylesheet whatsoever. They are pretty pure. The ultimate expression of this is the Stark theme in Drupal 7 which shows the quality of the HTML produced by Drupal.

However, there is a very strong bias towards designing for desktop sites. Many recent developments in theming revolve around areas such as 960 grid designs. Further, the use of jQuery as a default Javascript library means that many modules and themes make use of it in presenting a page. This isn’t bad on its own but it does present problems when it isn’t easy to turn off. In short, Drupal is in a good position as far as progressive enhancement is concerned but there are a few hurdles to jump.

Inclusive Web Design For the Future with Progressive Enhancement [Champeon and Finck] The original presentation at SXSW 2003. Graceful degradation versus progressive enhancement [Christian Heilmann] Article from Opera with practical examples.

Responsive Web Design

Another meme which has emerged recently is that of Responsive Web Design. Ethan Marcotte wrote a short article for A List Apart coining the phrase which basically describes fluid layouts which work well on a variety of screen sizes. By using floated divs, which are relatively narrow, layouts can adapt to the screen size. The secret sauce here is the use of media queries to serve progressively enhanced CSS to clients with more screen real estate. The “mobile” site is the “default” site suitable for the most basic of clients and the “desktop” site is the enhancement.

Fluid grids, flexible images, and media queries are the three technical ingredients for responsive web design, but it also requires a different way of thinking. Rather than quarantining our content into disparate, device-specific experiences, we can use media queries to progressively enhance our work within different viewing contexts.

This approach is counter to the traditional way of theming in Drupal where the presumption is for the desktop site. Currently there aren’t any themes that I am aware of which take this approach. Anyone? It possibly is an area for experimentation in the future. So while we won’t necessarily be able to immediately make use of the mechanics of the technique (media queries), the basic principle of using floated divs in areas such as Views is certainly one we can make good use of.

Responsive Web Design [Ethan Marcotte] A Discusses fluid grids and media queries. Outlines a possible future where the default version of the site looks good in all browsers but can be progressively enhanced using extra stylesheets for bigger screens. Responsive Web Design Book [Ethan Marcotte] Learn how to think beyond the desktop and craft beautiful designs that anticipate and respond to your users’ needs. Media Queries “A media query consists of a media type and zero or more expressions that check for the conditions of particular media features. Among the media features that can be used in media queries are ‘width’, ‘height’, and ‘color’. By using media queries, presentations can be tailored to a specific range of output devices without changing the content itself. “ Responsive Web Layout – in anticipation of the Mobile Event [nodiac] One of the few mentions of responsive web design on GDO. Responsive web design – Drupal theming Slideshow which covers the details of media queries with some good advice on being practical within the confines of Drupal.

Zen of mobile

In the spirit of the Zen of Python here is my Zen of Mobile. Consider these things when sketching out a design for your mobile site. Import this :)

  • simple is better than complex,
  • fast is better than slow,
  • sparse is better than busy,
  • fewer options are better than more,
  • most important to least important,
  • every element must add,
  • every page to satisfy a single aim.

Some more practical pointers after looking around at a few decent mobile sites:

  • banner to be frugal on vertical space,
  • search is crucial gateway to site,
  • main content first,
  • single “focus” image for the page is OK,
  • other ancillary images are distracting,
  • lists are good,
  • primary nav is available but not necessarily at top of page,
  • collapsing divs are good to compress blocks to be readily consumed on a single screen,
  • minimise clicks by displaying simple form on that page if user likely to want to take action,
  • minimise typing if possible,
  • homepage gateway to user paths with focus on single key task.
Mobile Web Design Trends For 2009 Some good tips and screenshots from popular, well designed mobile sites. A Study of Trends in Mobile Design Stats on different approaches taken by various websites. http://www.w3.org/TR/mobile-bp/ [W3C] Best practices to follow when designing for mobile devices.

Location aware

Mobile devices have extra capabilities over the traditional desktop environment, making them more immediate and powerful. The Mobile First section mentioned that mobile devices extend your capabilities beyond the desktop and that apps should take advantage of this. The most obvious example is the ability to be location aware. The massive dollop of cream that can go on top of the mobile cake is knowing where your users are. This then enables you to pre-empt many of the tasks the user might wish to carry out at that point in time: where is the nearest store, best deals in the area, directions, etc. From a functional perspective we can better guess what they want to do.

Modern mobile clients offer APIs which allow developers to ask the user for access to their location. If the user agrees this data can be used to customise their experience. This may mean running custom queries on the DB using lon/lat or querying other APIs such as Google Places or other geo services. Want to carve out a niche for yourself? This could be just the area. I’ll leave that for another article :)

Supported OS Platforms & Widget Frameworks Outlines the various platforms and the plugins/frameworks/apis to access geo location info from the client.

Preparation

As you will be testing the site initially in your web browser, it is a good idea to set it up to resemble a mobile browser. If you are using Firefox to test the site there are a couple of add-ons, both developed by Chris Pederick, which make your life easier.

User Agent Switcher Allows you to switch your user agent across to iPhone or any other agent you care to define. This is great for testing that your redirect rules are working. Web Developer Tools Has a “resize” feature where you can define widths for 240, 320 and 480 to see what the page looks like on narrow screens.

Finally, have a couple of target devices around so that you can give the designs a final once over in the wild.

Aug 25 2010
Aug 25

Drupal 7 will ship with RDFa built into the core. This will enable thousands of Drupal websites to publish semantic markup alongside the usual HTML, enabling robots such as Google and other aggregators to extract machine understandable information from the page. This is a big leap forward for Drupal and promises to bring it to the forefront of semantic web endeavours.

This deck of slides was prepared in June 2010 and presented at Drupal Camp Sydney 2010 as well as the Semantic Web Meetup in Sydney.

Drupal and the Semantic Web slides

Apr 20 2010
Apr 20
We've recently started using Drupal 6 at Digg.com for all our content needs. So far so good. Everything from our jobs page, to our site tour, to the Open Source site we launched three days ago, is managed via Drupal. Read more about our use of Drupal on the Digg Blog.
Apr 19 2010
Apr 19

There's been a couple of days of public mourning and misguided anger here in Europe, with lots of emotional outbursts by drupalistas on Twitter etc. Now however, there seems to have been a slight shift in mindset, and I'd like to take this moment to present the top five alternative activities that can help cheer things up, and – in fact – help you take full advantage of this unexpected empty slot in your calendar.

1. Simulate DrupalCon: meet up, network & party

When you think about it, it’s kind of obvious: if we can’t go to DrupalCon, we’ll have to make [the spirit of] DrupalCon come to us! Already, there are DrupalVolCon events planned in Belgium, London and Paris. Crazy things can and will happen when crowds of slightly frustrated people with empty calendars come together.

There is also the possibility to participate in DrupalCon remotely in a more practical sense: follow the live stream which hopefully will cover keynotes and some sessions.

2. Contribute!

This is the perfect time to finally do those contributions to the Drupal project that you've postponed due to heavy work load. Submit a patch for that feature you'd like to see implemented, help fix those critical Drupal 7 bugs, or help improve the documentation.

3. Work on your pet project

Most of us have one or two really great ideas tucked away somewhere in the back of our heads. Now's the time to make things happen! Depending on what your project is, it may end up becoming new Drupal modules, but remember: most of the time someone has already done almost the same thing as you, so be sure to research the contrib module repository first, and try and build on what already exists! If you need to discuss your ideas with someone, #drupal on IRC is your friend!

4. Get back on track

This could also be your chance to get your professional life in better working order. Doing boring stuff may not be what you need to cheer up now, but it means you don't have to do it later. Use the unexpected free time to get back on track with things like:

  • Working on your email backlog.
  • Doing accounting/budgeting, etc.
  • Finding new planning tools, backup solutions etc.
  • Redecorating your office.
  • Learning new skills from books, tutorials, blogs etc.
  • Reading a book!

5. Get yourself a kitten

There's nothing like a cute little kitten to cheer things up. If all else fails, find one and give it a new home! Spend a couple of days with it at home and you'll be best friends before the end of DrupalCon.

This may sound far-fetched, but it's exactly what Sofia at SthlmConnection did yesterday. So far it seems to work great!

I myself plan to do several of the things above during the next couple of days, including hanging out with cats.

Good luck!

Feb 11 2010
Feb 11

Uriverse, a Drupal-based website, was released in January 2010. Much of the data in Uriverse comes from an import from DBpedia, a semantic version of Wikipedia. Uriverse contains over 13M nodes in 90 languages, covering around 3M primary subjects. This article is a case study of how the import was done and the challenges faced.

Drupal proves itself to be a flexible system which can handle large amounts of data so long as some bottlenecks are worked around and the hardware, particularly RAM, is sufficient to handle database indexes. The large data set was tedious to load but in the end the value added by Drupal made it worth it.

Motivation

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. We hope this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways, and that it might inspire new mechanisms for navigating, linking and improving the encyclopaedia itself.

Over the years I have had an interest in the semantic web and linked data in particular. The DBpedia project had always impressed me because it was an effort which allowed Wikipedia to be used as a hub of subjects for use in the semantic web. Wikipedia represents most of the commonly discussed subjects of our times, created and edited from the ground up by real people. It therefore forms a practical basis from which to build out relationships to other data sets. If you are looking for subjects to represent the things you want to talk about, DBpedia is a good place to start. There is an increasing momentum around it as the linked data meme starts to spread.

The Linked Open Data Cloud

Not only has the DBpedia project formalized these subjects, it has extracted a large amount of information which was codified in the somewhat unstructured wikitext syntax used by Wikipedia. The knowledge within Wikipedia has been made explicit in a way that can be used (browsed, queried and linked) easily by other systems. The DBpedia data set therefore provides a convenient store for import into a CMS such as Drupal.

Choosing to import DBpedia into a content management system is not necessarily a natural thing to do. The RDF data model is very simple and flexible and allows for all kinds of data structures which may not fit well within a system such as Drupal. It may seem like I was attempting to get the worms back into the can after it had been opened. At times it felt that way :) However, there was a good deal of regularity and structure within DBpedia. The provision of an ontology and a “strict” infobox mapping in version 3.4 made the importation process possible. Whilst not everything could be imported, most of DBpedia made it in. Concessions had to be made along the way, and the places where Drupal proved to be too inflexible are indicative of areas which could be improved in Drupal. More on that later.

I chose Drupal as the target platform because I had been impressed with the flexibility I had seen from the system. Unlike other popular blogging platforms/CMSs, Drupal has a decent way of specifying a schema through the Content Construction Kit (CCK). I believed that it was possible to mimic the basic subject-predicate-object structure of RDF. Drupal supports different content types with custom properties (strings, dates, integers, floats) and relationships between objects (node references, node referrer). It also has a relatively strong category system baked in which can be used for filtering. Drupal also offers a lot of other attractive features apart from the data modeling: users, permissions, themes, maps, ratings, timelines, data views, friendly URLs, SEO, etc. Utilizing these features was the carrot to get the data into the system.
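
To make the mapping concrete, here is a minimal sketch (Drupal 6 era PHP, assuming a fully bootstrapped site) of how a single subject-predicate-object statement could be saved programmatically. The content type and field names are hypothetical stand-ins for fields that would already have been defined through the CCK admin UI.

// Sketch only: represent "Kevin Bacon -- birthPlace --> Philadelphia" with a
// CCK text field (literal property) and a node reference (object property).
// 'person', 'field_birthdate' and 'field_birthplace' are hypothetical names.
$place_nid = 123; // nid of an existing "Place" node (assumption)

$node = new stdClass();
$node->type     = 'person';        // content type = base class from the ontology
$node->title    = 'Kevin Bacon';   // subject label
$node->language = 'en';
$node->uid      = 1;
$node->status   = 1;

// Literal property stored in a CCK text field.
$node->field_birthdate[0]['value'] = '1958-07-08';

// Object property stored as a CCK node reference to the "Place" node.
$node->field_birthplace[0]['nid'] = $place_nid;

node_save($node);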

The DBpedia data set

Version 3.4 of DBpedia is based upon a September 2009 dump from Wikipedia. Here’s a quick list of some of the data set’s statistics:

  • 2.9 million things
  • 282,000 persons
  • 339,000 places
  • 88,000 music albums
  • 44,000 films
  • 15,000 video games
  • 119,000 organizations
  • 130,000 species
  • 4,400 diseases
  • 91 different languages
  • 807,000 links to images
  • 3,840,000 links to external web pages
  • 415,000 Wikipedia categories

The data set is provided in two main formats: N-Triples and CSV. The N-Triples format is suitable for loading into RDF stores with RDF software such as ARC. The CSV format was handy for quickly loading data into a relational database such as MySQL. In a data set as big as DBpedia it is essential that you do things the quickest way, lest you spend weeks importing the data. Here are some very rough comparisons of import speed. NB: these are from memory and are very rough estimates.

Method and approximate throughput (triples/sec):

  • N-Triples into ARC: 100
  • MySQL inserts from a select on another table: 1,000
  • CSV, LOAD DATA INFILE (same disk): 10,000
  • CSV, LOAD DATA INFILE (different disk): 100,000

Obviously if you are importing 100s of millions of triples these comparisons are important. (I can’t quite believe the last result but that is what I remember it to be.) Put your dumps and DB on different disks when doing imports! Mind you, adding the indexes after the data has been imported is still a very time consuming process requiring data to be copied to disk so the super fast results are a bit misleading.
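
As a rough sketch of the fast path, this is what loading one DBpedia CSV dump into a MySQL staging table from PHP can look like; the file path, table and column names are hypothetical, and the load is done before any indexes are added.

// Sketch: bulk-load a DBpedia CSV dump into a staging table. File, table and
// column names are hypothetical. Keep the dump on a different disk to the
// database if you can, and add indexes only after the load has finished.
$db = mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('drupal6', $db);

$sql = "LOAD DATA INFILE '/dumps/dbpedia_labels.csv'
        INTO TABLE staging_labels
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        (uri, language, label)";
mysql_query($sql, $db) or die(mysql_error($db));

mysql_query('ALTER TABLE staging_labels ADD INDEX idx_uri (uri)', $db);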

Mapping to Drupal structures

The DBpedia data set can be divided into two parts: infobox and non-infobox. Infobox data includes all of the varied properties which are applied to each class as defined in the ontology. As such, it presents a substantial data modeling exercise to fit it into Drupal. Much of my time was spent analyzing the class hierarchy and properties to work out the sweet spot as to how content types and properties would be allocated. The non-infobox data is much more predictable and easier to import in standard ways.

DBpedia → Drupal:

  • Resources → Nodes: natural mapping; a Wikipedia article maps to a DBpedia subject, which maps to a Drupal node.
  • Classes → Taxonomy: the class hierarchy in the ontology maps cleanly to a taxonomy.
  • Classes → Content Types: each DBpedia class belongs to a base super class which forms the content type, e.g. actors and politicians are both Persons; Person is the base class and therefore becomes the natural candidate for the content type in Drupal.
  • Categories → Nodes: Wikipedia categories are messy (irregular, loops) and better suited to being a node.
  • Language Labels → Translations: a base English node with translations handles the subject plus different language labels for DBpedia well, if not perfectly.
  • URI → URL Alias: the URL in Wikipedia maps to DBpedia, which maps to a Drupal URL alias.
  • Article Labels → Node title: in various languages.
  • Long Abstract → Node content: in various languages.
  • Short Abstract → Node teaser: in various languages.
  • Wikipage → CCK Link: applied to all translations.
  • Homepage → CCK Link: applied to all translations.
  • Image → CCK Link: applied to the English version only.
  • Instance Type: applied to the English version only.
  • Redirect → Node with CCK Node Reference: different names/spellings redirect to the English version.
  • Disambiguation → Node with CCK Node Reference: points to various English versions.
  • Page Links → CCK Node Reference: untyped links between nodes.
  • External Links → Ignored: too many to import.
  • Geo → CCK Location: applied to the English version only.
  • Person data (Names) → CCK Strings: applied to people and organizations.
  • Infobox (strict) → Various CCK fields: the DBpedia ontology specifies which classes have which properties.

Content Types, CCK and Importation Concessions

There were a number of areas where there was not a clear mapping between DBpedia and the features offered by CCK. The notes below refer to Drupal 6, and do not consider the Fields in Core initiative in Drupal 7.

Property Explosion

Drupal handles object properties through a system known as the Content Construction Kit (CCK). CCK offers a handy interface for defining a schema for various content types. Each object (node) is an instance of a single class (content type). Each node therefore has the properties of its content type.

Those of you familiar with the inner workings of CCK in Drupal 6 will understand the idiosyncrasies of the way CCK works behind the scenes. On the backend things work as expected up to a certain level and then they get a bit complicated. Properties for a content type are grouped together in a single database table, as you would expect. There are two exceptions to this rule. Firstly, if the property can have multiple values, it is stored in a separate table. This too is natural enough. Secondly, if two content types share a property then it is split out into its own table. This is a bit strange and can catch you unaware if you aren’t expecting it. However, it is sensible enough as it allows easy queries across content types on a single property.

Things become tricky when you have (i) lots of “multi” properties or (ii) lots of “shared” properties. Drupal needs to issue a new query on a different table to get the data back for each such property. This is alright for most sites but has the potential to be a worry in the case of DBpedia, where there is a massive number of different relationships and, in some cases, relatively few instances of those relationships, i.e. the data is not very dense. We are potentially talking about hundreds of properties which would need to be retrieved.

Unfortunately, in these cases it makes sense to pick only the properties where you get the most bang for your buck. Which “shared” properties are shared amongst the most types? Which “multi” properties have the most instances? Which “single” properties have the most density down the rows? Along the way we had to be pragmatic and pick and choose the properties we would support. At the end of the day this wasn’t a limitation of CCK; rather, sensible decisions were made to stop the database exploding into thousands of tables with very little data in them.

I don’t see an easy way to solve the table “explosion” problem short of moving to an RDF datastore such as that implemented by ARC. For data sets which have demands similar to DBpedia it makes sense to have something such as an RDF store in the backend. This conclusion is incredibly ironic given the lengths I have gone to to get the data out of DBpedia and into Drupal. All was not lost however, as the majority of the data made it in and is usable within Drupal using standard techniques.

Interestingly this very issue seems to have plagued the backend design process for Drupal 7. According to the DADS Final Report (4th Feb 2009), “CCK 2 for Drupal will drop its variable schema and use the multi-value schema style for all fields.” Hmmm. I haven’t checked out Drupal 7 in this much detail but if this is true then the table explosion problem is going to be worse in Drupal 7. I’m not abreast of current developments here so I can’t comment further.

No sub classing properties

Drupal doesn’t allow for sub-classing content types. Each content type exists as its own base class. This means that we have to define base content types with all of the properties of the contained subclasses. The ontology in DBpedia can be up to four levels deep: Eurovision Song Contest Entry is the deepest, preceded by Song, Musical Work, Work and finally Thing (a catch-all at the root). This of course leads to base content types with many properties which will be null for the instance in question. The database tables would become very wide and have a relatively low density of information.

The Taxonomy system does allow us to partially work around the sub classing problem. A class hierarchy maps nicely to a hierarchy of Terms in a Taxonomy. Further, multiple Terms can be applied to a Node making it possible to specify different classes for a node. It doesn’t cover the difficulty of property storage however.

No Multiple inheritance

When it comes to data modeling there are different ways to handle typing. The most simplistic and limited way is to allow instances to have a single class. This is the way Drupal currently works with content types and CCK. Each node belongs to a single content type and has the properties defined by CCK for that node. A more flexible way of modeling data allows for multiple inheritance where an instance can have more than one class.

Where an instance did straddle two base classes it was impossible to carry data for both types. This query shows all “people” who are “works” as well. I think these cases can be put down to DBpedia being a bit too promiscuous in the infoboxes it processes for each article. This isn’t a strong argument for multiple inheritance because the data is probably erroneous, however, it does demonstrate an area where modeling could be more flexible.

Compound Types

In some cases a compound type was required. For example, image data had three components: a thumbnail link, a depiction link and a copyright info link. All three of these should have been considered one unit, however this is not possible through the standard CCK interface, which handles atomistic primitives. Because these image properties applied to multiple content types, the end outcome was that they were represented by different database tables. It was very frustrating to know three queries were being issued when one (or none) would suffice. It is possible to define your own custom datatypes through a module but this is a fairly high barrier to jump.

Interfaces: A possible solution

Freebase is a collaboratively edited system of open data which is similar to DBpedia in many respects. The main difference is that Freebase allows users to create and edit content according to certain schemas. One of the very impressive aspects of the Freebase system is its ability to support multiple types, or co-types, for an object. From the Freebase data modeling guide we have:

A novel aspect of the Metaweb system is that instances may have multiple types. A single topic such as “Kevin Bacon” may have multiple types such as Person, Film Actor, TV Actor, Musical Artist and others. Since no single type could encapsulate such diversity, multiple types are required to hold all properties to fully describe Kevin Bacon and his life.

This approach has the advantage of grouping properties in a table allowing for fast retrieval and querying, as well as allowing for flexibility and sensible design. I believe that Drupal could benefit from the Freebase approach. The current content type + CCK model could remain in place AND be augmented by an interface system which allowed for grouping of properties for various types. To take the Kevin Bacon example, “Kevin Bacon” would be a Person content type with the base properties of Birthday and Deathday. “Kevin Bacon” would then have the FilmActor, TVActor and MusicalArtist interfaces which could be represented by separate tables on the backend. I believe that this offers good flexibility for those desiring a powerful system whilst maintaining simplicity for those who just need base types. It also solves a lot of the hand wringing which goes with the way some CCK tables are formed.

Import into Drupal

As a newcomer to Drupal, by far the most disappointing aspect of the system was the lack of a clear and easy to understand API. I assumed that there would be a nice object abstraction which I could use to populate the system. I gradually came to understand that most development was done by examining the excretions of print_r() to determine what data was available at that particular moment. Where was the interface to content types, CCK and nodes? How could I create these programmatically? There were times where I stopped and paused and considered a framework which was cleaner and more lightweight. The rich functionality was the thing that kept me though.

The large size of the data set pretty much dictated that it needed to be imported in the most efficient way possible. If I accepted a 1 second overhead for a save/update I would be waiting at least four months for the data to load. So, notwithstanding the state of the API in Drupal 6, a straight database import was the order of the day. After a bit of reverse engineering and mucking around I had a few PHP/SQL scripts which could insert content pretty quickly, with insertion rates of around 1000 rows a second.

The import process followed these simplified steps for the various pieces of DBpedia.

  • Import DBpedia data from CSV format into MySQL.
  • Create a staging_node table with columns: title, language, teaser, content, url_alias, nid.
  • DBpedia data populated into staging_node and cleaned.
  • staging_node data copied into Drupal database.

The process was therefore semi-automated with a number of scripts. It still took a few weeks to run through from start to finish. Importing 75M page links was the final time-consuming process. I could only get a throughput of under 200 rows a second, as a url_alias-to-nid lookup was required. This part of the process took around 7 days. Not something I want to repeat.
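
As a simplified sketch of the final copy step (using the plain "INSERT INTO ... SELECT" approach that also appears in the list below), the staging rows can be pushed into the core Drupal 6 tables roughly as follows. The column lists are trimmed to the essentials, nids are assumed to have been pre-assigned in staging_node, and 'page' stands in for whatever content type a row actually belongs to.

// Simplified sketch of the staging_node -> Drupal copy (Drupal 6 schema).
// Reuses the pre-assigned nid as the vid; the real node/node_revisions
// tables have further columns that need sensible defaults.
$now = time();

db_query("INSERT INTO {node} (nid, vid, type, language, title, uid, status, created, changed)
          SELECT nid, nid, 'page', language, title, 1, 1, %d, %d FROM staging_node", $now, $now);

db_query("INSERT INTO {node_revisions} (nid, vid, uid, title, body, teaser, log, timestamp, format)
          SELECT nid, nid, 1, title, content, teaser, '', %d, 1 FROM staging_node", $now);

// Friendly URLs: node/123 -> the alias held in staging_node.
db_query("INSERT INTO {url_alias} (src, dst, language)
          SELECT CONCAT('node/', nid), url_alias, language FROM staging_node");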

During the import I realized a few things about MySQL techniques which may come in handy for other people.

  • Loading data into MySQL direct from file, with no index is very fast.
  • It’s even faster if the source file is on a different disk to the DB.
  • The overhead from very heavy/large select queries can be minimized by using mysql_unbuffered_query (see the sketch after this list). The connection can be fragile though, so keep an internal counter so you know where the process got up to before it died. You can’t write to the table you are reading from, and it is locked for other operations.
  • Sometimes pure SQL is a good way to go: INSERT INTO SELECT
  • Sometimes joining is no good and running a SELECT for each row is best (if caching and keys are working).
  • SHOW FULL PROCESSLIST is your friend.
  • Sorting is your enemy.
  • Dumping temp tables to disk must be avoided. Try tweaking your conf. You want to hear the cooling fans humming rather than the disk ticking away.
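
A sketch of the mysql_unbuffered_query pattern mentioned above; the connection details, table, columns and progress file are hypothetical.

// Sketch: stream a huge result set with mysql_unbuffered_query instead of
// buffering millions of rows in PHP memory. The table being read is locked,
// so write results elsewhere, and keep a counter so a crashed run can be
// resumed from roughly the right place.
$db = mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('drupal6', $db);

$result  = mysql_unbuffered_query('SELECT nid, title FROM staging_node', $db);
$counter = 0;

while ($row = mysql_fetch_assoc($result)) {
  // ... process the row (write to a different table or to a file) ...
  $counter++;
  if ($counter % 100000 == 0) {
    file_put_contents('/tmp/import_progress.txt', $counter);
  }
}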

Drupal Features

Characteristics of the rich DBpedia data set provided a good foundation for using the following Drupal features:

  • Image thumbnails pointing to Wikipedia are displayed in the content, expanding to the full depiction via Thickbox.
  • Geo coordinates for nodes are displayed in a Google Map via the Geo module.
  • Geo coordinates are used with the Flickr API to pull back Creative Commons images for places from Flickr.
  • FiveStar ratings used on People, Places, Organizations and Music Genres.
  • Views provided top rated lists of the most popular things.
  • Simile Timeline widget used to display chronology of events.
  • Solr used as search engine and filters on language and class.
  • Solr recommends “More like this” based on Node content.
  • Filtered Views provide lookups for titles, firstname, lastname and geo coordinates
  • Various node properties allow for sorted list Views of richest people etc.

Performance Shortcomings

OK. So the data is in Drupal – were there any problems when it came to running the site? Yes, I ran into a few challenges along the way. Some were fixed, others worked around and others still remain a thorn in our side.

Database Indexes

The database is the most pressing concern when it comes to performance on a big site. If simple queries run slowly then the site will not function acceptably even for low traffic. The most important area is to ensure that indexes for key tables have been loaded into the key buffer. Let’s look at a couple of simple selects with and without a primed key buffer.

Query timings with and without the key buffer primed:

  • select sql_no_cache title from node where nid=1000000; takes 0.05s with no key buffer and 0.00s with the key buffer primed.
  • select sql_no_cache dst from url_alias where src='node/1000000'; takes 0.02s with no key buffer and 0.00s with the key buffer primed.

If these indexes aren’t in RAM then the most basic of lookups will take a long time, i.e. 0.07s. If you are on a page with a view with 50 nodes to look up then just getting the title and path out will take (0.07s * 50) 3.5 seconds. Note that this doesn’t include all the other processing Drupal must do. This is completely unacceptable, and so it is mandatory to get these indexes into RAM. I recommend putting the following SQL into a file and running it every time MySQL is started up, using the init-file variable in my.cnf.


USE drupal6;
LOAD INDEX INTO CACHE node;
LOAD INDEX INTO CACHE node_revisions;
LOAD INDEX INTO CACHE url_alias;
LOAD INDEX INTO CACHE term_data;
LOAD INDEX INTO CACHE term_node;

On massive sites you probably won’t be able to get all of the node indexes into the key buffer, even if you have been generous in its allocation (up to 50% of RAM). In this case I resorted to running a query which seems to get the node nids into the buffer whilst leaving out all the other indexes which aren’t used as much. It takes a while to run but does the trick.


select count(n.nid), count(n2.nid) from node n inner join node n2 on n.nid=n2.nid;

In the best case scenario we would all have RAM (money) to burn but unfortunately that’s generally not the case.

Core and Contributed modules

There are a few areas in Drupal which bogged down when handling huge data sets. As I developed the site I took note of problematic areas. Most of these areas are probably well known so we’ll just mention them briefly.

  • In general a node load is very heavy. Lazy loading of CCK properties would make the system much faster when CCK fields don’t have to be loaded. It would also mean the API could be used when speed is an issue, i.e. when processing millions of nodes at once. During import and update, the solution is to work directly with the database. Just for a laugh I tried setting the status to 0 for a large node, but the resulting 404 still tried to load the whole node and then died.
  • Editing a node with many multi properties is all but impossible. The edit page size is massive and RAM/CPU is hammered. Solution is not to edit! Real solution is to page multi properties in node edit.
  • Viewing a page with many multi properties requires that the properties be paged. Looking up all those nodes gets slow quickly even with fast database. Solution is to use a module such as CCK Pager.
  • Viewing the content list is not possible as the SQL query relies on a join to the users table. This query kills the database. Solution is to make a small hack to core to stop this join. If users were looked up with separate queries then this would be a better solution.
  • Search is impossible from many angles. Queries for indexing are very slow. Database not designed to handle such large amounts of data. Solution is to let Solr handle search.
  • Solr search design is generally good, with data held in apachesolr_search_node. However, Solr indexing can be a drain if you exclude node types from search. The preparatory query to return the nodes will inner join to node and lead to a very slow query. It returns after a while (70s for me) so you can live with it. Definitely not something you want to be doing regularly on a production server as CPU goes to 100%. Solution is (i) to replicate node type data in apachesolr_search_node or (ii) get the excluded nodes anyway and ignore them. The first option is best.
  • Taxonomy pages fail when there are many nodes in a category. SQL query is very slow. Solution is to let Solr show Taxonomy pages.
  • Strangely, the Search module was still running even when Solr was handling search. Core had to be hacked to turn it off. There must be a better way but I couldn’t see it. Make sure the “search” tables aren’t populated when using Solr.
  • Views queries can be slow with no extra indexes on content tables.
  • Views queries use left joins which can be slow. Inner joins exclude a lot of rows you don’t need. Solution is to rewrite the Views SQL with a hook (see the sketch after this list).
  • Displaying nodes or teasers in Views can be very RAM intensive if the displayed nodes are very heavy with CCK fields. Solution is to use fields.
  • The Voting API module stores data in an inefficient way leading to some hairy indexes on the table and some equally hairy queries in Views when trying to display ratings. Be careful.
  • CCK node referrers datatype has a mandatory sort order specified. This kills query performance for large result sets.
  • XML Sitemap has some queries which return massive data sets which need to be dumped to disk. I know this module is in the process of being reworked.
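
For the Views join issue flagged above, this is the shape of the hook that can be used with Views 2 on Drupal 6; the module name, view name and table alias are hypothetical, and the correct alias has to be read out of the particular view's query.

/**
 * Implementation of hook_views_query_alter() (Views 2 / Drupal 6).
 *
 * Sketch only: force an INNER JOIN for one relationship on a specific view so
 * the database can discard unwanted rows early. 'richest_people' and
 * 'node_data_field_networth' are hypothetical names.
 */
function mymodule_views_query_alter(&$view, &$query) {
  if ($view->name == 'richest_people' && isset($query->table_queue['node_data_field_networth'])) {
    $query->table_queue['node_data_field_networth']['join']->type = 'INNER';
  }
}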

Solr: Some stats

Uriverse uses Solr as its search engine and it is a component which has performed remarkably well. The filtering capabilities of Solr are excellent and the speed at which results come back are very impressive, even for large corpuses. Since it runs as a service over HTTP it is possible to deploy it to a second server to reduce load on the web server/DB box.

It takes a while to index 10M articles (categories, redirects, disambiguations, photos and pages were excluded from the 13M) and it is a task not suited to the standard Drupal cron. A custom script was written to build the index so that it could hammer away almost constantly without upsetting the other tasks cron performs. Initially the script was written to be a long-running process which would potentially run for months. However, a memory leak in Drupal meant that this was not possible. After 10,000 nodes RAM became too much for PHP and the script died. The solution was to limit the number of nodes processed each time. The script now processes around 8,000 nodes every 30 minutes. It therefore takes around a month to build the index.
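
The indexing script itself is not reproduced here, but its shape is roughly the following: a standalone PHP script run from the Drupal root by cron, which bootstraps the site, handles a fixed batch of nodes and then exits so that PHP's memory is released between runs. The {solr_index_queue} tracking table and the uriverse_solr_index_node() helper are hypothetical stand-ins for however the indexing itself is wired up.

// Rough sketch of a cron-driven batch indexer (run from the Drupal root).
// Indexes a fixed number of nodes and exits; cron starts a fresh PHP process
// for the next batch, sidestepping the memory growth of a long-running run.
define('BATCH_SIZE', 8000);

require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$result = db_query_range("SELECT nid FROM {solr_index_queue} WHERE indexed = 0 ORDER BY nid", 0, BATCH_SIZE);

while ($row = db_fetch_object($result)) {
  uriverse_solr_index_node($row->nid);  // hypothetical: builds and posts the Solr document
  db_query("UPDATE {solr_index_queue} SET indexed = 1 WHERE nid = %d", $row->nid);
}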

On the web there are quite a few articles regarding memory usage with Solr. The JVM needs to be given enough room when started. These reports had me concerned because I am running 32 bit machines with only 3G at my disposal. Would an index of 10M articles run in a JVM limited to around 2G? What size would the index be on disk? These are the numbers for the 7883304 articles currently in the index:

  • Disk: 47.5 GB total, 6.0 KB per article
  • RAM: 1.5 GB total, 198 B per article

Obviously these numbers are dependent on the average size of the title, body, fields and taxonomy. RAM is also affected by the number of facets, sorting and server load. I have therefore been very conservative in what I have indexed and have turned off sorting in the interface. It looks like the current RAM allocation will be sufficient.

Performance

Most Drupal performance best practices (opcode cache, aggregation, compression, Boost, Expires, database indexes, views and block caches) have been followed to get the most out of the site. A CDN has not been deployed because Uriverse doesn’t serve many images from its own server. The images that are served have an expires header and all other images come from Wikipedia and Flickr which probably have their own CDN solutions in place. Further optimizations would include going with MPM Worker + fcgid or Nginx. This will be required if traffic picks up and MaxClients is reached.

Two problematic areas remain. The first is the amount of RAM available for the database indexes. It would be nice to be able to increase that one day. Ordinary node page build times are respectable all things considered, so this is not such a big issue. The second problematic area is some of the queries in Views. A bit more research is required here, but it is likely that some Views will have to be dumped if they are hitting the disk with temp tables. Sometimes it’s easiest to forgo some functionality to maintain the health of the server.

Conclusion

All up the project has taken longer than expected – more than a few months. Most of the time was spent wrangling the data rather than fighting Drupal, although there were quite a few issues to work through, this being my first serious Drupal project. If I had known of the pain I would suffer from having to massage and prepare the data, as well as the patience required to babysit the import process over days and weeks, then I probably wouldn’t have commenced the project. That said, I am pleased with the outcome now that it is all in. I am able to leverage the power of many contributed modules to bring the data to life. There is a great sense of satisfaction in seeing Solr return results from 10M articles in 90 languages, as well as the pretty theme, Google Maps, Thickbox, Simile timelines and FiveStar ratings. I am humbled by the efforts of all Drupal contributors over the years. What I am left with now is a good platform which will form the basis of future data aggregation efforts.

Appendices

Class hierarchy

Feb 08 2010
Feb 08
To do list

Time to revisit the different types of Drupal sites to see where gains can be made. What type of site do you have? This quick reference recaps the previous articles and lists the areas where different types of Drupal sites can improve performance.

All Sites

  • Get the best server for your budget and requirements.
  • Enable CSS and JS optimization in Drupal
  • Enable compression in Drupal
  • Enable Drupal page cache and consider Boost
  • Install APC if available
  • Ensure no slow queries from rogue modules
  • Tune MySQL for decent query cache and key buffer
  • Optimize file size where possible

Server: Low resources

  • Boost stops PHP load and Bootstrap
  • Sensible module selection
  • Avoid node load in views lists
  • Smaller JVMs possibly if running Solr
  • Nginx smaller than Apache
  • mod_fcgid has smaller footprint over mod_php

Server: Farm

  • Split off Solr
  • Split off DB server, watch the latency
  • With Cache Router select Memcache over APC for shared pools
  • Master + slaves for DB
  • Load balancing across web servers

Size: Many Nodes

  • Buy more RAM for database indexes
  • Index columns, especially for views
  • Thoroughly check slow queries
  • Warm up database
  • Swap in Solr for search
  • Solr to handle taxonomy pages

Activity: Many requests

  • Boost or
  • Pressflow and Varnish
  • Nginx over Apache
  • InnoDB on cache tables

Users: Mainly logged in

  • View/Block caching
  • CacheRouter (APC or Memcache)

Contention: Many Writes

  • InnoDB
  • Watchdog to file

Content: Heavy

  • Optimized files
  • Well positioned server
  • CDN

Functionality: Rich

  • Well behaved modules
  • Not too many modules
  • View/Block caching

Page browsing: Dispersed

  • Boost over Varnish if RAM is tight

Audience: Dispersed

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.

Feb 07 2010
Feb 07
Slow

The time for a page to render in a user’s browser is made up of two factors. The first is the time it takes to build the page on the server. The second is the time it takes to send and render the page with all the contained components. This guide has mainly been concerned with the former (how to get the most from your server); however, it is estimated that 80% to 90% of the total time is taken up by the second phase.

It’s no good to serve a cached page in the blink of an eye if there are countless included files which need to be requested and many large images which need to be transported across the globe. Optimizing page rendering time can make a noticeable difference to the user and is the cream on the cake of a well optimized site. It is therefore important to consider and optimize this final leg of the journey.

  • Improving Drupal’s page loading performance: Wim Leers covers all the bases on how to improve loading performance.
  • High Performance Web Sites: Essential Knowledge for Front-End Engineers: Steve Souders, Chief Performance Yahoo! and author of the YSlow extension, covers the Yahoo! recommendations in this book.
  • Even Faster Web Sites: Performance Best Practices for Web Developers: another Steve Souders book covering JavaScript (AJAX), the network (image compression, chunked encoding) and the browser (CSS selectors, etc.).

It is worthwhile reviewing Yahoo’s YSlow recommendations to see all of the optimizations which are possible. We cover selected areas where the default Drupal install can be improved upon.

Combined Files

The Out of The Box section covered the inbuilt CSS and JS aggregation and file compression. The use of “combined files” is a significant factor in Drupal’s relatively good score in the YSlow tests. Make sure you have this enabled.

All sites: Enable CSS and JS aggregation.
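
If you prefer to flip these switches from code (for example in an install profile or update hook) rather than from the Performance admin page, the Drupal 6 settings behind the checkboxes are plain variables:

// Drupal 6: the aggregation switches behind admin/settings/performance.
variable_set('preprocess_css', 1);  // "Optimize CSS files"
variable_set('preprocess_js', 1);   // "Optimize JavaScript files"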

CSS Sprites

CSS image sprites are another method of cutting down the number of requests. This approach combines a number of smaller images into one large one, which is then selectively displayed to the user through the use of background offsets in CSS. It is a useful approach for things such as small icons, which can have a relatively large amount of HTTP overhead for each request. Something for the theme designers to consider.

Custom designs: Use CSS sprites if appropriate.

  • CSS Sprites: Image Slicing’s Kiss of Death: an overview of how CSS sprites work and how they can be used.
  • A lesson in the usefulness of CSS sprite generators: covers commonly used sprite generators.

Using a content delivery network is the number two recommended best practice.

A content delivery network (CDN) is a collection of web servers distributed across multiple locations to deliver content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.
http://developer.yahoo.com/performance/rules.html#cdn

Of all the CDN web services SimpleCDN seems to be getting positive press amongst Drupal folks as it is simple and cheap. It offers the “origin pull” Mirror Buckets service which will serve content from 3.9 cents to 1.9 cents per GB. At this price you will probably be saving money on your bandwidth costs as well as serving content faster.

The CDN integration module is the recommended module to use for integration with content delivery networks as it supports “origin pull” as well as push methods. It supports content delivery for all CSS, JS and image files (including ImageCache).

High traffic, geographically dispersed: use CDN

  • CDN integration module: Wim Leers’ fully featured module which integrates with a wide range of CDN servers.
  • SimpleCDN module: Simple CDN re-writes the URL of certain website elements (which can be extended using plugins) for use with a CDN Mirror service.
  • Drupal CDN integration: easier, more flexible and faster!: slides covering the advantages of CDNs and possible implementations.
  • mod_cdn: an Apache2 module which shows some promise but not much info is available for it with regards to Drupal.
  • Best Drupal CDN module?: a Drupal Groups discussion.

On a related note, many sites can benefit from judicious placement of the server if traffic tends to come from one place and no CDN is being used. Sites based out of the US may find the proximity of a site hosted in their area worth the extra cost of hosting.

When a file is served by a web server an “Expires” header can be sent back to the client telling it that the content being sent will expire at a certain date in the future and that the content may be cached until that time. This speeds up page rendering because the client doesn’t have to send a GET request to see if the file has been modified.

By default the .htaccess file in the root of Drupal contains rules which set a two-week expiry for all files (CSS, JS, PNG, JPG, GIF) except for HTML, which is considered to be dynamic and therefore not cacheable.


# Requires mod_expires to be enabled.

# Enable expirations.
ExpiresActive On
# Cache all files for 2 weeks after access (A).
ExpiresDefault A1209600
# Do not cache dynamically generated pages.
ExpiresByType text/html A1

The Expires header will not be generated unless you have mod_expires enabled in Apache. To make sure it is enabled in Apache2 run the following as admin.


# a2enmod expires
# /etc/init.d/apache2 restart

Ensuring this is enabled will elevate your YSlow score by about 10 points or so.

All sites: Configure Apache correctly for fewer requests.

You can enable Gzip compression in the performance area of admin. Alternatively, you could configure Apache to do it.

All Sites: Enable Gzip compression

Binary files do not shrink significantly after Gzip compression. Gains can be made by ensuring that rich media such as images, audio and video are (i) targeted for the correct display resolution and (ii) have an appropriate amount of lossy compression applied. Since these files will generally only be downloaded once they do not benefit from caching in the client and so care must be taken to ensure that they are as small as reasonably possible.

All Sites: Compress binary files

Pngcrush: an optimizer for PNG (Portable Network Graphics) files. It can be run from the command line in an MSDOS window, or from a UNIX or Linux command line.

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.

Feb 06 2010
Feb 06
Blue Tape

Benchmarking a system is a reliable way to compare one setup with another and is particularly helpful when comparing different server configurations. We cover a few simple ways to benchmark a Drupal website.

A performant system is not just one which is fast for a single request. You also need to consider how the system performs under stress (many requests) and how stable the system is (memory). Benchmarking with tools such as ab allows you to stress the server with many concurrent requests to replicate traffic when a site is being slashdotted. With a more customised setup they can also be used in more sophisticated ways to mimic traffic across a whole site.

There is documentation covering the tools of the trade, including Apache Bench (ab) and SIEGE.

ab is the most commonly used benchmarking tool in the community. It shows you how many requests per second your site is capable of serving. Concurrency can be set to 1 to get end-to-end speed results, or increased to get a more realistic load for your site. Look to the “failed requests” and “requests per second” results.

In order to test the speed of a single page, turn off page caching and run ab with concurrency of one to get a baseline.

ab -n 1000 -c 1 http://drupal6/node/1

To check scalability, turn on the page cache and ramp up concurrent connections (10 to 50) to see how much the server can handle. You should also make sure keep-alives are turned on (-k) as this leads to a more realistic result for a typical web browser. At higher concurrency levels making new connections can be a bottleneck. Also, set compression headers (-H) as most clients will support this feature.

ab -n 1000 -c 10 -k -H 'Accept-Encoding: gzip,deflate' http://drupal6/node/1

  • Testing with ab and simple changes you can make within Drupal.
  • Covers server-side tools and walks through ab options and use.
  • Demonstrates how to pull out the current session id and how to pass that to ab so that authenticated users can be tested.
  • An illustrative discussion where different Drupal setups are benchmarked with ab.

JMeter is a Java desktop app designed to test function and performance. It is the preferred testing tool of many administrators.

  • A Perl script which runs a JMeter test on Drupal and provides graphs.
  • Some scripts to get you started testing with JMeter.

Benchmarking is essential if you wish to have an objective comparison between different setups. However, it is not the final measurement with regards to performance. Remember that page rendering times are what are important for users and that too needs to be optimized. Also, benchmarks tend to be artificial in the sense that they often measure unrealistic situations. Will all of your requests be for one anonymous page only? Maybe in the Slashdot situation but there are other considerations obviously. Finally, it is easy to focus intently on the number, especially when it comes to caching scores, and forget that minor differences may not make so much of a difference to real life scenarios. Don’t forget the logged in user.

This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.

Aug 18 2009
Aug 18

both the 5.x and 6.x versions are now available for download on github. sorry, i just can't do CVS anymore. to download:

  1. start by going here: http://github.com/cailinanne/log4drupal
  2. then click the all tags drop-down and choose the appropriate version
  3. then click the download button

a full description of the module is available here

available versions

log4drupal_5.x-2.0

This is the stable version for Drupal 5.x. Note - if you're upgrading from a previous version, after install, just visit the log4drupal admin page, and make sure your options are okay. Minor changes were made to the admin options.

The major change from previous versions is the new ability to automatically recursively print out arguments. For example :

log_debug("This message will recursively print out the contents of the node variable", $node);

log4drupal_6.x-2.0

a bug fix to the version described here. allows proper operation if you choose the Path relative to Drupal root option for Filename precision.

Mar 30 2009
Mar 30

drupal 6 included an upgrade to the built in logging functionality (watchdog). drupal 6 exposes a new hook, hook_watchdog which modules may implement to log Drupal events to custom destinations. it also includes two implementations, the dblog module which logs to the watchdog table, and the syslog module which logs to syslog.
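
as a rough illustration of the new hook (not code taken from log4drupal), a minimal hook_watchdog implementation for drupal 6 looks something like this; the module name and log file path are made up.

/**
 * implementation of hook_watchdog() (drupal 6). rough illustration only:
 * writes every watchdog entry to a flat file. $log is the array drupal
 * passes to all hook_watchdog implementations.
 */
function mylogger_watchdog($log) {
  $line = sprintf("[%s] [%s] %s\n",
    date('H:i:s m/d/y', $log['timestamp']),
    $log['type'],
    strip_tags(t($log['message'], $log['variables'])));
  file_put_contents('/var/log/drupal.log', $line, FILE_APPEND);
}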

with these upgrades, log4drupal is a less critical addition to a drupal install, and i hesitated before providing a drupal 6 upgrade. however, eventually i decided that log4drupal is still a useful addition to a drupal development environment, as it provides the following features still not provided by the upgraded drupal 6 watchdog implementation:

  • a java-style stacktrace including file and line numbers, showing the path of execution
  • automatic recursive printing of all variables passed to the log methods
  • ability to change the logging level on the fly

in addition, the drupal 6 version of log4drupal includes the following upgrades from the drupal 5 version

  • all messages sent to the watchdog method are also output via log4drupal
  • severity levels have been expanded to conform to RFC 3164
  • the log module is now loaded during the drupal bootstrap phase so that messages may be added within hook_boot implementations.

you may download the drupal 6 version here. see below for general information on what this module is about and how it works.

what is log4drupal?

log4drupal is a simple api that writes messages to a log file. each message is tagged with a particular log priority level (e.g., debug, info, warn, error or emergency) and you may also set the overall log threshold for your system. only messages with a priority level above your system threshold are actually printed to your log file. the system threshold may be changed at any time, using the log4drupal administrative interface. you may also specify a level above which a stack trace will be appended to each message.

installation

don't forget to read the included README.txt file. before enabling this module you must install the Pear Log package on your server.

module options

the module options are very similar to the drupal 5 version, and are shown below.

log4drupal admin screen

examples

log4drupal is best explained by example. suppose you had a module example.module containing the following code:

function example_user($op, &$edit, &$account, $category = NULL) {

  log_debug("The example_user method has been called for operation $op for account", $account);
 
  if($account->uid == 1) {
    log_warn("That nasty super-user is creeping around again!!");
  }
 
}

if your logging level is set to debug and your stacktrace level is set to warn, your log file will contain the following messages. notice that the $account object is automatically rendered recursively.

[10:29:19 03/30/09] [debug] [example.module:5] The example_user method has been called for operation load for account
stdClass Object
(
    [uid] => 1
    [name] => admin
    [pass] => 350c27da74479768b5402673ce
    [mail] => [email protected]
    [mode] => 0
    [sort] => 0
    [threshold] => 0
    [theme] =>
    [signature] =>
    [created] => 1237827294
    [access] => 1238434111
    [login] => 1238429857
    [status] => 1
    [timezone] =>
    [language] =>
    [picture] =>
    [init] => [email protected]
    [data] => a:0:{}
    [roles] => Array
        (
            [2] => authenticated user
        )

)
[10:29:19 03/30/09] [warning] [example.module:8] That nasty super-user is creeping around again!!
  at /var/www/drupal/sites/all/modules/example/example.module:8
  at /var/www/drupal/modules/user/user.module:22
  at /var/www/drupal/modules/user/user.module:183
  at /var/www/drupal/modules/user/user.module:1125
  at /var/www/drupal/includes/menu.inc:410
  at /var/www/drupal/includes/menu.inc:653
  at /var/www/drupal/includes/menu.inc:1010
  at /var/www/drupal/includes/menu.inc:999
  at /var/www/drupal/includes/menu.inc:948
  at /var/www/drupal/includes/menu.inc:719
  at /var/www/drupal/modules/user/user.module:736
  at /var/www/drupal/includes/module.inc:450
  at /var/www/drupal/modules/block/block.module:473
  at /var/www/drupal/includes/theme.inc:1571
  at /var/www/drupal/includes/theme.inc:617
  at /var/www/drupal/includes/theme.inc:1765
  at /var/www/drupal/includes/theme.inc:658
  at /var/www/drupal/index.php:36

if your logging level is set to warn and your stacktrace level is set to error, your log file will contain the following messages. notice that the debug message is no longer printed at all, and the warn message no longer includes a stacktrace.

[10:40:06 03/30/09] [warning] [example.module:8] That nasty super-user is creeping around again!!

all watchdog messages will also appear in your log4drupal log file. for example, as long as your logging level is set to notice or below, you will see the following message in your log file each time a user logs in

[11:02:43 03/30/09] [notice] [user.module:1368] Session opened for admin.

suggestions for new features always welcome.

Mar 26 2009
Mar 26

installing drupal is pretty easy, but it's even easier if you have a step by step guide. i've written one that will produce a basic working configuration with drupal6 on debian lenny with php5, mysql5 and apache2.

all commands that follow assume that you are the root user.

let's get started!

install the dependencies

# apt-get install mysql-server
# apt-get install apache2
# apt-get install php5
# apt-get install php5-mysql
# apt-get install php5-gd

there aren't many options given when installing those packages. you may set a root password for mysql if you like (or not - it doesn't matter). next, restart apache to make it aware of the php installation.

# /etc/init.d/apache2 restart

verify your base apache install

if you've configured DNS with your hosting provider properly, when you go to your browser and type in http://www.example.com you should see the message "It Works!". if you don't, stop here and find somebody to help you with DNS and apache before continuing with these instructions.

download and extract drupal

start with the drupal homepage, and find the Download Drupal 6.x link. In the code below, you'll need to replace the 6.X with the version you are actually downloading.

# cd /tmp
# wget http://ftp.drupal.org/files/projects/drupal-6.X.tar.gz
# gunzip drupal-6.X.tar.gz
# tar -xvf drupal-6.X.tar

here these instructions differ slightly from those provided with your drupal install. the packaged instructions suggest putting all the drupal directories directly inside /var/www. i prefer to contain them within a /drupal directory. if you are running several sub-domains on this apache server, this is a preferable set-up.

below, we move all the drupal files (including the hidden .htaccess) to /var/www/drupal and set various permissions appropriately

# mkdir /var/www/drupal
# mv drupal-6.10/* drupal-6.10/.htaccess /var/www/drupal/
# cd /var/www/drupal
# mv sites/default/default.settings.php sites/default/settings.php
# chmod o+w sites/default/settings.php
# chmod o+w sites/default
# chown -R www-data.www-data /var/www/drupal

set-up mysql

in this example, we create a database called drupaldb, a user called drupal with password lemon

first, create the drupaldb database.

# mysqladmin -p create drupaldb

next, create the drupal mysql user and set the permissions and password appropriately. note - the drupal user is a mysql user - not a linux shell user.

# mysql -p
mysql> GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER, CREATE
TEMPORARY TABLES, LOCK TABLES ON drupaldb.* TO 'drupal'@'localhost'  IDENTIFIED BY 'lemon';
mysql> FLUSH PRIVILEGES;
mysql> quit;

finally, tell drupal what you did

# cd /var/www/drupal
# vi sites/default/settings.php

find the line that starts with $db_url and change it to

$db_url = 'mysql://drupal:lemon@localhost/drupaldb';

fire up drupal

go to your browser and type http://www.example.com/drupal/index.php. you will be redirected to http://www.example.com/drupal/install.php?profile=default. (note: if you are NOT automatically redirected to install.php, and instead get a page full of SQL errors, just proceed manually to install.php). follow the instructions to set up your super-user account.

enable clean urls in apache (optional)

add the following to /etc/apache2/sites-available/default just above the closing </VirtualHost> tag.

<Directory /var/www/drupal>
   Options -Indexes +FollowSymLinks +MultiViews
   AllowOverride All
   Order allow,deny
   allow from all
</Directory>

make sure that mod-rewrite is enabled, and then restart apache.

# a2enmod rewrite
# /etc/init.d/apache2 restart

now go to the URL http://www.example.com/drupal/?q=admin/settings/clean-urls, take the test (hopefully you'll pass) and, when you do, turn on clean urls.

set up an apache virtual host (optional)

it's nice to set up an apache virtual host for your drupal site. this allows you to create custom logging, remove the /drupal/ from your urls and nicely encapsulate the directives for drupal. here's how you can do it.

create a file in /etc/apache2/sites-available called www.example.com that looks something like this:

<VirtualHost *>
   ServerName www.example.com
   DocumentRoot /var/www/drupal
        <Directory />
                Options -Indexes +FollowSymLinks +MultiViews
                AllowOverride All
                Order allow,deny
                allow from all
        </Directory>
        ErrorLog /var/log/apache2/error.log

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn

        CustomLog /var/log/apache2/access.log combined
</VirtualHost>

now, check the file called default in this same directory. make sure the top two lines say the following

NameVirtualHost *
<VirtualHost *>

finally, enable your new virtual host and restart apache one more time.

# a2ensite www.example.com
# /etc/init.d/apache2 restart

go to your browser and visit http://www.example.com. if you still see the apache default page ("It Works!") instead of your drupal home page, just delete the default index.html file

# rm /var/www/index.html

Jan 05 2009
Jan 05

traffic to a website can be divided into four major sources: direct, paid, organic and referrals. unsurprisingly, google analytics segments the traffic sources reports accordingly.

there is, however, a small catch. the ever-growing popularity of search engines has led to an odd use case: users who use a search engine to search for exactly your domain name, instead of simply typing www.mydomain.com into their web browser. these users have just reached your site via an "organic search" and google analytics will classify them accordingly.

technically this is correct, but semantically it's troubling. the users who have reached your site by typing "mydomain" into Google have far more in common with the users that entered www.mydomain.com into their URL bar and far less in common with those users that reached your site by typing "my optimized search term" into Google. and the population of these users is not small - on one of the commercial drupal sites that i maintain these "mydomain" Google searchers account for over one third of the supposedly organic traffic.

before the release of google analytics advanced segments, one could estimate the volume of "True Organic" pageviews by starting with the organic search volume, then using the keyword report to subtract all the "mydomain" keywords (mydomain, mydomain.com, and, my personal favorite www.mydomain.com).

thankfully, advanced segments now gives us an easy way to create a "True Direct" and "True Organic" segment - in which all the "mydomain" organic searches have been removed from the organic segment, and stuck in the direct segment instead.

to define the "True Organic" segment, simply create an advanced segment in which the medium is organic, but the keywords does not contain your domain name (see diagram 1, below).

defining the "True Direct" segment is slightly more complicated. to do this we create an advanced segment in which either the source is direct or the medium is organic and there either is no keyword (users typing www.mydomain.com into their web browser) or the keyword contains your domain name. see diagram 2, below.

once these segments have been created, you can apply them to your traffic sources reports - and finally get the real answer!

diagram 1 : defining a true organic segment

diagram 2 : defining a true direct segment

Aug 26 2008
Aug 26

a while ago i posted some performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud.

amazon have just released ebs, the final piece of technology that makes their ec2 platform really viable for running lamp stacks such as drupal.

ebs, the "elastic block store", provides sophisticated storage for your database instance, with features including:

  • high io throughput
  • data replication
  • large storage capacity
  • hot backups using snapshots
  • instance type portability e.g. quickly swapping your database hardware for a bigger machine.

amazon also have a great mysql on ebs tutorial on their developer connection.

let me know if you've given this a go. it looks like a great platform.

Apr 15 2008
Apr 15
recently i posted some encouraging performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud. while the performance was encouraging, the suitability of this environment for running lamp stacks was not. ec2 had some fundamental issues including a lack of static ip addresses and no viable persistent storage mechanism.

amazon are quickly rectifying these problems, and recently announced elastic ip addresses: a "static" ip address that you own and can dynamically point at any of your instances.

today amazon indicated that persistent storage will soon be available. they claim that this storage will:

  • behave like raw, unformatted hard drives or block devices
  • be significantly more durable than the local disks within an amazon ec2 instance
  • support snapshots backed up to S3
  • support volumes ranging in size from 1GB to 1TB
  • allow the attachment of multiple volumes to a single instance
  • allow high throughput, low latency access from amazon ec2
  • support applications including relational databases, distributed file systems and hadoop processing clusters using amazon ec2

if this works as advertised, it will make ec2 a wonderful platform for your lamp application. amazon promise public availability of this service later this year.

Mar 31 2008
Mar 31

three weeks ago, zicasso.com launched a drupal-powered free personalized online travel service that aims to connect travelers to a global network of quality, pre-screened travel companies. unlike many internet travel sites which provide cheap fares or packages, zicasso is targeted for busy, discerning travelers who want to plan and book complex trips (the ones with multiple destination stops or activities).

zicasso was favorably reviewed in popular web publications including: pc magazine, techcrunch, ars technica and the san jose business journal.

zicasso chose to build their application using the open-source cms drupal, to leverage the wide array of web2.0 functionality provided by the open source community.

the application was rapidly constructed by a small development team led by cailin nelson and jenny dickinson. the team took advantage of "core" drupal modules including cck, panels, views, imagecache, workflow and actions.

using drupal, the team leveraged existing frameworks for much of the development including user authentication and management, workflow, security, administration, content management, blogging, tagging, text indexing and searching, sophisticated form flows, access control, search engine integration and optimization and image manipulation and management.

initial load testing demonstrated high application scalability, even with a fairly simple production deployment architecture.

Jan 28 2008
Jan 28
amazon's elastic compute cloud, "ec2", provides a flexible and scalable hosting option for applications. while ec2 is not inherently suited for running application stacks with relational databases such as lamp, it does provide many advantages over traditional hosting solutions.

in this article we get a sense of lamp performance on ec2 by running a series of benchmarks on the drupal cms system. these benchmarks establish read throughput numbers for logged-in and logged-out users, for each of amazon's hardware classes.

we also look at op-code caching, and gauge its performance benefit in cpu-bound lamp deployments.

the elastic compute cloud

amazon uses xen based virtualization technology to implement ec2. the cloud makes provisioning a machine as easy as executing a simple script command. when you are through with the machine, you simply terminate it and pay only for the hours that you've used.

ec2 provides three types of virtual hardware that you can instantiate. these are summarized in the table below.

machine type           hourly cost   memory   cpu units   platform
small instance         $0.10         1.7 GB   1           32-bit
large instance         $0.40         7.5 GB   4           64-bit
extra large instance   $0.80         15 GB    8           64-bit

note: one compute unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

target deployments

to keep things relatively simple, the target deployment for our load test is basic; the full lamp stack runs on a single server. this is step zero in the five deployment steps that i outlined in an open-source infrastructure for high-traffic drupal sites.

our benchmark

our benchmark consists of a base drupal install, with 5,000 users and 50,000 nodes of content-type "page". nodes are an even distribution of 3 sizes: 1K, 3K and 22K. the total database size is 500MB.

during the test, 10 threads read nodes continually over a 5 minute period. 5 threads operate logged-in. the other 5 threads operate anonymously (logged-out). each thread reads nodes randomly from the pool of 50,000 available.

this test is a "maximum" throughput test. it creates enough load to utilize all of the critical server resource (cpu in this case). the throughput and response times are measured at that load. tests to measure performance under varying load conditions would also be very interesting, but are outside the scope of this article.

the tests are designed to benchmark the lamp stack, rather than weighting it towards apache. consequently they do not load external resources. that is, external images, css and javascript files are not loaded, only the initial text/html page. this effectively simulates drupal running with an external content server or cdn.

the benchmark runs in apache jmeter. jmeter runs on a dedicated small-instance on ec2.

benchmarking is done with op-code caching on and off. since our tests are cpu bound, op-code caching makes a significant difference to php's cpu consumption.
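
if you'd like to reproduce the apc setup on a debian etch box, something along these lines should work (a rough sketch; package names and the conf.d location may vary with your install):

# install the build dependencies and apc itself via pecl
$ sudo apt-get install php5-dev php-pear build-essential
$ sudo pecl install apc

# enable the extension and restart apache
$ echo "extension=apc.so" | sudo tee /etc/php5/apache2/conf.d/apc.ini
$ sudo /etc/init.d/apache2 restart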

our testing environment

the tests use a debian etch xen instance, running on ec2. this instance is installed with:
  • MySQL: 5.0.32
  • PHP: 5.2.0-8
  • Apache: 2.2.3
  • APC: 3.0.16
  • Debian Etch
  • Linux kernel: 2.6.16-xenU

the tests use a default drupal installation. drupal's caching mode is set to "normal". no performance tuning was done on apache, mysql or php.

the results

all the tests ran without error. each of the tests resulted in the server running at close to 100% cpu capacity. the tests typically reached steady state within 30s. throughputs gained via jmeter were sanity checked for accuracy against the http and mysql logs. the raw results of the tests are shown in the table below.

instance   apc?   logged-in throughput   logged-in response   logged-out throughput   logged-out response
small      off    194                    1.50                 664                     0.45
large      off    639                    0.46                 2,703                   0.11
xlarge     off    1,360                  0.20                 3,741                   0.08
small      on     905                    0.30                 3,838                   0.07
large      on     3,106                  0.10                 8,033                   0.04
xlarge     on     4,653                  0.06                 12,548                  0.02

note: response times are in seconds, throughputs are in pages per minute

the results - throughput

the throughput of the system was significantly higher for the larger instance types. throughput for the logged-in threads was consistently 3x lower than the logged-out threads. this is almost certainly due to the drupal cache (set to "normal").

throughput was also increased by about 4x with the use of the apc op-code cache.


the results - response times

the average response times were good in all the tests. the slowest tests yielded average times of 1.5s. again, response times were significantly better on the better hardware and reduced further by the use of apc.


conclusions

drupal systems perform very well on amazon ec2, even with a simple single machine deployment. the larger hardware types perform significantly better, producing up to 12,500 pages per minute. this could be increased significantly by clustering as outlined here.

the apc op-code cache increases performance by a factor of roughly 4x.

these results are directly applicable to other cpu bound lamp application stacks. more consideration should be given to applications bound on other external resources, such as database queries. for example, in a database bound system, drupal's built-in cache would improve performance more significantly, creating a bigger divergence in logged-out vs logged-in throughput and response times.

although performance is good on ec2, i'm not recommending that you rush out and deploy your lamp application there. there are significant challenges in doing so and ec2 is still in beta at the time of writing (Jan 08). it's not for the faint-of-heart. i'll follow up in a later blog with more details on recommended configurations.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.

Jan 19 2008
Jan 19
i recently posted an introductory article on using jmeter to load test your drupal application. if you've read this article and are curious about how to build a more sophisticated test that mimics realistic load on your site, read on.

the previous article showed you how to set up jmeter and create a basic test. to produce a more realistic test you should simulate "real world" use of your site. this typically involves simulating logged-in and logged-out users browsing and creating content. jmeter has some great functionality to help you do this.

as usual, all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications. also, although i'm discussing drupal testing, the method below really applies to any web application. if you aren't already familiar with jmeter, i'd strongly recommend that you read my first post before this one.

an overview

the http protocol exchanges for realistic tests are quite complex, and painful to replicate manually. jmeter kindly includes http-proxy functionality that allows you to "record" browser-based actions, which can then form the basis of your test. after recording, you can manually edit these actions to sculpt your test precisely.

our test - browsers and creators

as an example, let's create a test with two test groups: creators and browsers. creators are users that arrive at the site, stay logged out, browse a few pages, create a page and then leave. browsers are less motivated individuals. they arrive at the site, log in, browse some content and then leave.

setting up the test - simulating creators

to create our test, fire up jmeter and do the following.

create a thread group. call it "creators". add a "http request defaults" object to the thread group. check the "retrieve all embedded resources from html files" box.

add a cookie manager to the thread group of type "compatibility". add an "http proxy server" to the workbench, as follows:


modify the "content-type filter" to "text/html". your jmeter-proxy should now look like:


navigate in your browser to the start of your test e.g. your home page. clear your cookies (using the clear private data setting). open up the "connection settings option" in firefox preferences and specify a manual proxy configuration of localhost, port 8080. this should look like:


note: you can also do this using internet explorer. in ie7 go to the "connections" tab of the internet options dialog. click the "lan settings" button, and setup your proxy.

start the jmeter-proxy. record your test by performing actions in your browser: (a) browse to two pages and (b) create a page. you should see your test "writing itself". that should feel good.

now stop the jmeter-proxy. your test should look similar to:


setting up the test - simulating browsers

create another thread group above the first. call it browsers. again, add a "http request defaults" object to the thread group. check the "retrieve all embedded resources from html files" box.

add a cookie manager to the thread group of type "compatibility". start the jmeter-proxy again. record your test by performing actions: (a) login and then (b) browse three pages. your test should look like:


stop the jmeter-proxy. undo the firefox proxy.

setting up the test - cleaning up

you can now clean up the test as you see fit. i'd recommend:
  • change the number of threads and iterations on both thread-groups to simulate the load that you care about.
  • modify the login to happen only once on a thread. see the diagram below.


and optionally:

  • rename items to be more meaningful.
  • insert sensible timers between requests.
  • insert assertions to verify results.
  • add listeners to each thread group. i recommend a "graph results" and a "view results tree" listener.

your final test should look like the one below. note that i didn't clutter the example with assertions and timers:


running your test

you should now be ready to run your test. as usual, click through to the detailed results in the tree to verify that your test is doing something sensible. ideally you should do this automatically with assertions. your results should look like:


notes

the test examples that i chose intentionally avoided logged-in users creating content. you'll probably want these users to create content, but you'll likely get tripped up by drupal's form token validation, designed to block spammers and increase security. modifying the test to work around this is beyond the scope of this article, and probably not the best way to solve the problem. if someone knows of a nice clean way to disable this in drupal temporarily, perhaps they could comment on this article.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Jan 14 2008
Jan 14
there are many things that you can do to improve your drupal application's scalability, some of which we discussed in the recent scaling drupal - an open-source infrastructure for high-traffic drupal sites article.

when making scalability modifications to your system, it's important to quantify their effect, since some changes may have no effect or even decrease your scalability. the value of advertised scalability techniques often depends greatly on your particular application and network infrastructure, sometimes creating additional complexity with little benefit.

apache jmeter is a great tool to simulate load on your system and measure performance under that load. in this article, i demonstrate how to set up a testing environment, create a simple test and evaluate the results.

as usual, all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications. also, although i'm discussing drupal testing, the method below really applies to any web application.

the testing environment

you should install and run the jmeter code on a server that has good resources and high-bandwidth, low-latency network access to your application server or load balancer. the maximum load that you can simulate is clearly constrained by these parameters, as is the accuracy of your timing results. therefore, for very large deployments you may need to run multiple non-gui jmeter instances on several test machines, but for most of us a simple one test-machine configuration will suffice; i recently simulated over 12K pageviews/minute from a modest single-core server that wasn't close to capacity.

jmeter has a great graphical interface that allows you to define, run and analyze your tests visually. a convenient way to run this is to ssh to the jmeter test machine using x forwarding, from a machine running an x server. this should be as simple as issuing the command:

$ ssh -X testmachine.example.com

note, you'll need a minimal x install on your server for this. you can get one with:

$ sudo apt-get install xserver-xorg-core xorg

and then running the jmeter gui from that ssh session. jmeter should now appear on your local display, but run on the test machine itself. if you are having problems with this, skip to troubleshooting at the end of this article. this setup is good for testing a remote deployment. you can also run the gui on windows.

x forwarding can become unbearably slow once your test is running, if the test saturates your test server's network connection. if so, you might consider defining the test using the gui and running it on the command line. read more about remote testing on the apache site, and on command line jmeter later in this article.

setting up the test server - download and install java

jmeter is a 100% java implementation, so you'll need a functional java runtime install.

if you don't have java 1.4 or later, then you should start by installing it. to do so, make sure you've got a line in /etc/apt/sources.list like this:

deb http://ftp.debian.org/debian/ etch main contrib non-free

if you don't, then add it and do an apt-get update. once you've done this, do:

$ sudo apt-get install sun-java5-jre

installation on vista is as easy as downloading and installing the latest zip from http://jakarta.apache.org/site/downloads/downloads_jmeter.cgi, unzipping it and running jmeter.bat. please don't infer that i'm condoning or suggesting the use of windows vista ;)

setting up the test server - download and install jmeter

next, download the latest stable version of jmeter from the jmeter download page, for example:

$ wget http://apache.mirrors.tds.net/jakarta/jmeter/binaries/jakarta-jmeter-2.3.1.tgz

and then install it:

$ tar xvfz jakarta-jmeter-2.3.1.tgz

you should now be able to run it by:

$ cd ./jakarta-jmeter-2.3.1/bin
$ ./jmeter

if you are having problems running jmeter, see the troubleshooting section at the end of this article.

setting up a basic test

jmeter is a very full featured testing application. we'll scratch the surface of its functionality and set up a fairly simplistic load test. you may want to do something a bit more sophisticated, but this will at least get you started.

to create the basic test, run jmeter as described above. the first step is to create a "thread group" object. you'll use this object to define the simulated number of users (threads) and the duration of the test. right mouse click the test plan node and select:
add -> thread group

specify the load that you'll exert on your system, for example, pick 10 users (threads) and a loop count (how many times each thread will execute your test). you can optionally modify the ramp up period e.g. a 10s ramp up in this example would create one new user every second.

now add a sampler by right mouse clicking the new thread group and choosing:
add -> sampler -> http request. make sure to check the box "retrieve all embedded resources from html files", to properly simulate a full page load.

now add a listener to view the detailed results of your requests. the "results tree" is a good choice. add this to your thread group by selecting: add -> listener -> view results tree. note that after you run your test, you can select a particular request in the left panel and then select the "response data" tab on the right, to verify that you are getting a sensible response from your server, as shown below.

finally let's add another listener to graph our result data. choose:
add -> listener -> graph results. this produces a graph similar to the graph on the right.

if you want to create a more sophisticated test, you'll probably want to create realistic use scenarios, including multiple requests spaced out using timers, data creation by logged in users etc. you'll probably want to verify results with assertions. all of this is relatively easy, and you can read more on apache's site about creating a test plan. you can get information on login examples and cookie support here. you can also read the follow up to this blog: load test your drupal application scalability with apache jmeter: part two

running your test

controlling your test is now a simple matter of choosing the menu items: run -> start, run -> stop, run -> clear all etc. it's very intuitive. while your test is running, you can select the results graph, and watch the throughput and performance statistics change as your test progresses.

if you'd like to run your test in non-gui mode, you can run jmeter on the command line as follows:

$ jmeter --nongui --testfile basicTest.jmx --logfile /tmp/results.jtl

this would run a test defined in file basicTest.jmx, and output the results of the test in a file called /tmp/results.jtl. once the test is complete, you could, for example, copy the results file locally and run jmeter to visually inspect and analyse the results, with:

$ jmeter --testfile basicTest.jmx

or just run jmeter as normal and then open your test.

you may then use the listener of choice (e.g. "graph results") to open your results file and display the results.

interpreting your drupal results

most production sites run with drupal's built-in caching turned on. you can look at your performance setting in the administration page at: http://www.example.com/admin/settings/performance. this caching makes a tremendous difference to throughput, but when users are logged in, they bypass this cache.

therefore, to get a realistic idea of your site performance, it's a good idea to calibrate your system with caching on and caching off, and linearly interpolate the results to get a true idea of your maximum throughput. for example, if your throughput is 1,000 views per minute with caching, and 100 without caching and at any given point in time 50% of your users are logged in, you could estimate your throughput at (1000 + 100) / 2 = 550, that is 550 views per minute.
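
if you want to play with that interpolation, here's a trivial shell sketch of the calculation (the numbers are the ones from the example above):

# blended throughput = (fraction logged out * cached rate) + (fraction logged in * uncached rate)
$ cached=1000; uncached=100; logged_in=0.5
$ echo "$cached $uncached $logged_in" | awk '{print (1-$3)*$1 + $3*$2}'
550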

alternatively, you could build a more sophisticated test that simulates close-to-realistic site access including logged-in sessions. clearly, the more work you put into your load tests, the more accurate your results will be. see the followup article for details on building a more sophisticated test.

an example test - would a static file server or cdn help your application?

jmeter allows you to easily estimate the effect of configuration changes, sometimes without actually making the changes. recently i read robert douglass' interesting article on using lighttpd as a static file server for drupal, and i was curious how much of a difference that would make.

simply un-checking the "retrieve all embedded resources from html files" on the http request allowed me to simulate all the static resources coming from another (infinitely fast) server.

for my (image intensive) application the results were significant, about a 3x increase in throughput. clearly the real number depends on many factors including the static resources (images, flash etc) used by your application and the ratio of first time to repeat users of your site (repeat users have your content cached). it seems fair to say that this technique would significantly improve throughput for most sites and presumably page performance would be significantly improved too, especially if the static resources were cdn hosted.

troubleshooting your jmeter install

if you are having problems with your jmeter install, then:

make sure that the java version you are running is compatible i.e. 1.4 or later, by:

$ java -version
java version "1.5.0_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)

make sure that you have all the dependencies installed. if you get the error "cannot load awt toolkit: gnu.java.awt.peer.gtk.gtktoolkit", you might have to install the gcj runtime library. do this as follows:

$ sudo apt-get install libgcj7-awt

if jmeter hangs or stalls, you probably don't have the right java version installed or on your path.

if you're still having problems, take a look in the application log file jmeter.log for clues. this gets created in the directory that you run jmeter in.

if you are having problems getting x forwarding to work, make sure that it is enabled in your sshd config file e.g. /etc/ssh/sshd_config. you should have a line like:

X11Forwarding yes

if you change this, don't forget to restart the ssh daemon.
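
on debian etch that restart is typically just:

$ sudo /etc/init.d/ssh restart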

further reading

if you'd like to build a more sophisticated test, take a look at my next blog: load test your drupal application scalability with apache jmeter: part two.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.

thanks to curtis (serverjockey) hilger for introducing me to jmeter.

Dec 19 2007
Dec 19
css has vastly improved the quality of html markup on the web. however, given its complexity, it has some astounding deficiencies.

one of the biggest problems is the lack of constants. how many times have you wanted to code something like this? light_grey = #CCC. instead you are forced to repeat #CCC in your css. this quickly creates difficult-to-maintain and difficult-to-read code.

an elegant solution to the problem is to use a general purpose preprocessor like m4. m4 gives you a full range of preprocessing capability, from simple constants to sophisticated macros.

traditional css

consider the following example css:

.codeblock code { font:95% monospace; color: #444;}
div.codeblock {
   padding: 10px;
   border: 1px solid #888;
}

applying some m4 macros to this code not only makes the css more maintainable, it also makes it more readable by increasing its semantic quality.

below, we add constants: mid_grey, dark_grey and std_padding.

the same css with m4

.codeblock code { font:95% monospace; color: dark_grey;}
div.codeblock {
   padding: std_padding;
   border: 1px solid mid_grey;
}

trying it out

if you'd like to give this a try, m4 is usually available as a standard package e.g. on debian style linux flavors do:

# apt-get install m4 m4-doc

now copy the following code into a file called example.css:

changequote(^,^)dnl                change quotes to something safe
changecom(^/*^, ^*/^)dnl           change comments to css style

define(dark_grey, ^#444^)dnl       define a dark grey color
define(mid_grey, ^#888^)dnl        define a middle grey color
define(std_padding, ^10px^)dnl     define the standard padding

.codeblock code { font:95% monospace; color: dark_grey;}
div.codeblock {
   padding: std_padding;
   border: 1px solid mid_grey;
}

and now run the preprocessor:

$ m4 example.css

you should see output like:

.codeblock code { font:95% monospace; color: #444;}
div.codeblock {
   padding: 10px;
   border: 1px solid #888;
}

notes on the example:
  • dnl tells m4 to "discard to next line" i.e. ignore everything after it. useful for comments.
  • i use changequote and changecom to change quoting and commenting characters to be more css compliant than the defaults.

using include statements

in practice, you'll often want to use your definitions in several css files. to do this, place your definitions into an external file. in our example, we split our code into two files, definitions.m4 and example.css as follows:

definitions.m4

define(dark_grey, ^#444^)dnl       define a dark grey color
define(mid_grey, ^#888^)dnl        define a middle grey color
define(std_padding, ^10px^)dnl     define the standard padding

example.css

changequote(^,^)dnl                change quotes to something safe
changecom(^/*^, ^*/^)dnl           change comments to css style
include(^definitions.m4^)dnl       include the definitions file

.codeblock code {font:95% monospace; color: dark_grey;}
div.codeblock {
   padding: std_padding;
   border: 1px solid mid_grey;
}

note: my choice of filenames and extensions (definitions.m4, example.css) is arbitrary.

once you've split the files, you can run the preprocessor as before:

$ m4 example.css

further thoughts

this article describes a tiny subset of the power of the m4 language. for more information, take a look at the gnu manual.

one thing that i don't discuss here is integrating a preprocessor into your development / build environment. more on that later.
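
as a small taste of what that could look like, one approach is to keep the preprocessed sources under a different extension (say example.css.m4 - the naming convention here is just a suggestion) and regenerate the real css files with a one-liner as part of your build:

# regenerate every css file from its .m4 source
$ for f in *.css.m4; do m4 "$f" > "${f%.m4}"; done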

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Dec 18 2007
Dec 18
#!/bin/bash

# guardian - a script to watch over application system dependencies, restarting things
#            as necessary:  http://www.johnandcailin.com/john
#
#            this script assumes that at, logger, sed and wget are available on the path.
#            it assumes that it has permissions to kill and restart daemons including
#            mysql and apache.
#           
#            Version: 1.0:    Created
#                     1.1:    Updated logfileCheck() not to assume that files are rotated
#                             on restart.

checkInterval=10                         # MINUTES to wait between checks

# some general settings
batchMode=false                          # was this invoked by a batch job
terminateGuardian=false                  # should the guardian be terminated

# setting for logging (syslog)
loggerArgs=""                            # what extra arguments to the logger to use
loggerTag="guardian"                     # the tag for our log statements

# the at queue to use. use "g" for guardian. this queue must not be used by another
# application for this user.
atQueue="g"

# the name of the file containing the checks to run
checkFile="./checks"

# function to print a usage message and bail
usageAndBail()
{
   cat << EOT
Usage: guardian [OPTION]...
Run a guardian to watch over processes. Currently this supports apache and mysql. Other
processes can be added by simple modifications to the script. Invoking the guardian will run
an instance of this script every n minutes until the guardian is shutdown with the -t option.
Attempting to re-invoke a running guardian has no effect.

All activity (debug, warning, critical) is logged to the local0 facility on syslog.

The checks are listed in a checkfile, for example:

   #check type, daemonName, executableName, checkSource, checkParameters
   logfileCheck, apache2,       apache2,        /var/log/apache2/mainlog, "segmentation fault"

This checkfile specifies a periodic check of apache's mainlog for a string containing
"segmentation fault", restarting the apache2 process if it fails.

This script should be run on each host running the service(s) to be watched.

  -i        set the check interval to MINUTES
  -c        use the specified check file
  -b        batch mode. don't write to stderr ever
  -t        terminate the guardian
  -h        print this help

Examples:
To run a guardian every 10 minutes using checks in "./myCheckFile"
$ guardian -c ./myCheckFile -i 10

EOT

   exit 1;
}

# parse the command line arguments (i and c each take a param)
while getopts i:c:hbt o
do     case "$o" in
        i)     checkInterval="$OPTARG";;
        c)     checkFile="$OPTARG";;
        h)     usageAndBail;;
        t)     terminateGuardian=true;;
        b)     batchMode=true;;        # never manually pass in this argument
        [?])   usageAndBail
       esac
done

# only output logging to standard error running from the command line
if test ${batchMode} = "false"
then
   loggerArgs="-s"
fi

# setup logging subsystem. using syslog via logger
logCritical="logger -t ${loggerTag} ${loggerArgs} -p local0.crit"
logWarning="logger -t ${loggerTag} ${loggerArgs} -p local0.warning"
logDebug="logger -t ${loggerTag} ${loggerArgs} -p local0.debug"

# delete all outstanding at jobs
deleteAllAtJobs ()
{
   for job in `atq -q ${atQueue} | cut -f1`
   do
      atrm ${job}
   done
}

# are we to terminate the guardian?
if test ${terminateGuardian} = "true"
then
   deleteAllAtJobs

   ${logDebug} "TERMINATING on user request"
   exit 0
fi

# check to see if a guardian job is already scheduled, return 0 if they are, 1 if not.
isGuardianAlreadyRunning ()
{
   # if there are one or more jobs running in our 'at' queue, then we are running
   numJobs=`atq -q ${atQueue} | wc -l`
   if test ${numJobs} -ge 1
   then
      return 0
   else
      return 1
   fi
}

# make sure that there isn't already an instance of the guardian running
# only do this for user initiated invocations.
if test ${batchMode} = "false"
then
   if isGuardianAlreadyRunning
   then
      ${logDebug} "guardian invoked but already running. doing nothing."
      exit 0
   fi
fi

# get the nth comma separated token from the line, trimming whitespace
# usage getToken line tokenNum
getToken ()
{
   line=$1
   tokenNum=$2

   # get the nth comma separated token from the line, removing whitespace
   token=`echo ${line} | cut -f${tokenNum} -d, | sed 's/^[ \t]*//;s/[ \t]*$//'`
}

# check http. get a page and look for a string in the result.
# usage: httpCheck sourceUrl checkString
httpCheck ()
{
   sourceUrl=$1
   checkString=$2

   wget -O - --quiet ${sourceUrl} | egrep -i "${checkString}" > /dev/null 2>&1
   httpCheckResult=$?
   if test ${httpCheckResult} -eq 0
   then
      ${logDebug} "PASS: found \"${checkString}\" in ${sourceUrl}"
   else
      ${logWarning} "FAIL: could NOT LOCATE \"${checkString}\" in ${sourceUrl}"
   fi

   return ${httpCheckResult}
}

# check to make sure that mysql is running
# usage: mysqlCheck connectString query
mysqlCheck ()
{
   connectString=$1
   query=$2

   # get the connect params from the connectString
   userAndPassword=`echo ${connectString} | sed "s/.*\/\/\(.*\)@.*/\1/"`
   mysqlUser=`echo ${userAndPassword} | cut -f1 -d:`
   mysqlPassword=`echo ${userAndPassword} | cut -f2 -d:`
   mysqlHost=`echo ${connectString} | sed "s/.*@\(.*\)\/.*/\1/"`
   mySqlDatabase=`echo ${connectString} | sed "s/.*@\(.*\)/\1/" | cut -f2 -d\/`

   mysql -e "${query}" --user=${mysqlUser} --host=${mysqlHost} --password=${mysqlPassword} --database=${mySqlDatabase} > /dev/null 2>&1
   mysqlCheckResult=$?
   if test ${mysqlCheckResult} -eq 0
   then
      ${logDebug} "PASS: executed \"${query}\" in ${mysqlHost}"
   else
      ${logWarning} "FAIL: could NOT EXECUTE \"${query}\" in database ${mySqlDatabase} on ${mysqlHost}"
   fi

   return ${mysqlCheckResult}
}

# check to make sure that a logfile is clean of critical errors
# usage: logfileCheck errorString logFile
logfileCheck ()
{
   logFile=$1
   errorString=$2
   logfileCheckResult=0
   marker="__guardian marker__"
   mark="${marker}: `date`"

   # make sure that the logfile exists
   test -r ${logFile} || { ${logCritical} "logfile (${logFile}) is not readable. CRITICAL GUARDIAN ERROR."; exit 1; }

   # see if we have a marker in the log file
   grep "${marker}" ${logFile} > /dev/null 2>&1
   if test $? -eq 1
   then
      # there is no marker, therefore we haven't seen this logfile before. add the
      # marker and consider this check passed
      echo ${mark} >> ${logFile}
      ${logDebug} "PASS: new logfile"
      return 0
   fi

   # pull out the "active" section of the logfile, i.e. the section between the
   # last run of the guardian and now i.e. between the marker and the end of the file

   # get the last marker line number
   lastMarkerLineNumber=`grep -n "__guard" ${logFile} | cut -f1 -d: | tail -1`

   # grab the active section
   activeSection=`cat ${logFile} | sed -n "${lastMarkerLineNumber},$ p"`

   # check for the regexes in the logFile's active section
   echo ${activeSection} | egrep -i "${errorString}" > /dev/null 2>&1
   if test $? -eq 1
   then
      ${logDebug} "PASS: logfile (${logFile}) clean: line ${lastMarkerLineNumber} to EOF"
   else
      ${logWarning} "FAIL: logfile (${logFile}) CONTAINS CRITICAL ERRORS"
      logfileCheckResult=1
   fi

   # mark the newly checked section of the file
   echo ${mark} >> ${logFile}

   return ${logfileCheckResult}
}

# restart daemon, not taking no for an answer
# usage: restartDaemon executableName, initdName
restartDaemon ()
{
   executableName=$1
   initdName=$2
   restartScript="/etc/init.d/${initdName}"

   # make sure that the daemon executable is there
   test -x ${restartScript} || { ${logCritical} "restart script (${restartScript}) is not executable. CRITICAL GUARDIAN ERROR."; exit 1; }

   # try a polite stop
   ${restartScript} stop > /dev/null

   # get medieval on its ass
   pkill -x ${executableName} ; sleep 2 ; pkill -9 -x ${executableName} ; sleep 2

   # restart the deamon
   ${restartScript} start > /dev/null

   if test $? -ne 0
   then
      ${logCritical} "failed to restart daemon (${executableName}): CRITICAL GUARDIAN ERROR."
      exit 1
   else
      ${logDebug} "daemon (${executableName}) restarted."
   fi
}

#
# things look good, let's do our checks and then schedule a new one
#

# make sure that the checkFile exists
test -r ${checkFile} || { ${logCritical} "checkfile (${checkFile}) is not readable. CRITICAL GUARDIAN ERROR."; exit 1; }

# loop through each of the daemons that need to be managed
for daemon in `cat ${checkFile} | egrep -v "^#.*" | cut -f2 -d, |  sed 's/^[ \t]*//;s/[ \t]*$//' | sort -u`
do
   # execute all the checks for the daemon in question
   cat ${checkFile} | egrep -v "^#.*" | while read line
   do
      getToken "${line}" 2 ; daemonName=${token}

      if test ${daemonName} = ${daemon}
      then
         # get the check definition
         getToken "${line}" 1 ; checkType=${token}
         getToken "${line}" 3 ; executableName=${token}
         getToken "${line}" 4 ; checkSource=${token}
         getToken "${line}" 5 ; checkParams=${token}

         # remove quotes
         checkSourceQuoteless=`echo ${checkSource} | sed "s/\"//g"`
         checkParamsQuoteless=`echo ${checkParams} | sed "s/\"//g"`

         # call the appropriate handler for the check
         ${checkType} "${checkSourceQuoteless}" "${checkParamsQuoteless}"

         if test $? -ne 0
         then
            ${logCritical} "CRITICAL PROBLEMS with deamon (${daemonName}), RESTARTING."
            restartDaemon ${executableName} ${daemonName}
         fi
      fi
   done
done

# delete all at jobs (race conditions)
deleteAllAtJobs

# schedule a new instance of this sucker
${logDebug} "scheduling another check to run in ${checkInterval} minutes"
at -q ${atQueue} now + ${checkInterval} minutes > /dev/null 2>&1 << EOT
$0 $* -b
EOT
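
to make the checkfile format more concrete, here's an illustrative checks file based on the usage notes above (hostnames, credentials and check strings are placeholders, not recommendations):

#check type,   daemonName, executableName, checkSource,                                   checkParameters
httpCheck,     apache2,    apache2,        http://localhost/,                             "powered by drupal"
mysqlCheck,    mysql,      mysqld,         mysql://drupaluser:secret@localhost/drupaldb,  "select 1"
logfileCheck,  apache2,    apache2,        /var/log/apache2/mainlog,                      "segmentation fault"

you would then run a guardian against it every 10 minutes with:

$ ./guardian -c ./checks -i 10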

Dec 11 2007
Dec 11

the blessing and curse of cck is the ability to quickly create very complex node types within drupal. it doesn't take very long before the input form for a complex node type has become unmanageably long, requiring your user to do a lot of scrolling to get to the bottom of the form. the obvious solution is to break your form into multiple pages, but there is no easy way to do this. there do exist two proposed solutions to this, the cck wizard module and a drupal handbook entry. however, the well-intentioned cck wizard module doesn't seem to work, and the example code in the drupal handbook becomes tedious to repeat for each content type. to fill the void, i bring you cck witch.

cck witch is based on the same premise as the handbook entry: the most natural way to divide a cck form into pages is to use field groups. from there, however, cck witch diverges, taking a relatively lazy, yet effective approach to the problem of multi page forms: on every page we render the entire form, but then simply hide the fields and errors that do not belong to the current step. it also offers an additional feature: when the form is complete and the node is rendered, an individual edit link is provided for each step - allowing the user to update the information only for a particular page in the form, without having to step through the entire wizard again.

if you've now read enough to be curious to see the goods, then please, be my guest and skip straight to the live demo.

the demo

in the demo, you will walk through a three step multi page cck form that invites you to specify your dream house. before proceeding to the next step in the form, the user must complete the required fields on the previous steps. on all steps other than the first step, the user may go back and edit their data for the previous step.

when the form is complete and the node is viewed, we add an edit link inside each field group. clicking this link allows the user to edit only the fields within that group, rather than requiring the user to step through the entire form again.

disclaimer

be warned, this is a pre-alpha release. also, this wizardly wonder is not meant for the drupal novitiate. before using it you must

  • patch drupal core, adding 4 lines to the form_set_error function.
  • override two form related theme functions
  • follow a simple set of conventions when configuring your cck content type

manual

step zero - download cck witch

get your copy of this pre-alpha release here

step one - patch drupal core

in forms.inc, replace the form_set_error function with the following. this exposes an option to remove errors from the list. it also stops drupal from adding form errors to the drupal message list. do not perform this step without also performing step two. if you do, all form errors will mysteriously vanish.

function form_set_error($name = NULL, $message = '', $remove = FALSE) {
  static $form = array();
 
  if(!$remove) {
    // Set a form error
    if (isset($name) && !isset($form[$name])) {
      $form[$name] = $message;
    }
  }
  else {
    // Remove a form error
    if (isset($name) && isset($form[$name])) {
      unset($form[$name]); 
    }
  }
 
  return $form;
}

step two - override two form theme functions

next, you need to override the theme function for a form element to display the form error messages inline in your form, instead of in a big blob of messages at the top. this is a nice thing to do regardless of whether or not you want multi page cck forms. do this by overriding the theme_form_element method and then adding the following in the location of your choice. (right at the bottom, immediately before the closing div will do fine.)

  if($element['#parents']) {
    $form_element_error = form_get_error($element);
  }

  if ($form_element_error && $element['#type'] != 'radio') {
    $output .=' <div class="form-error">' . $form_element_error . "</div>\n"; 
  }

and, if you want all the buttons at the bottom of the form to line up nicely, override the theme_node_form method with the following

function theme_node_form($form) {
  $output = "\n<div class=\"node-form \">\n";
  $output .= "  <div class=\"standard\">\n";
  $output .= drupal_render($form);
  $output .= "  </div>\n";
  $output .= "</div>\n";
  return $output;
}

step three - configure your cck content type

when configuring your cck content type, create one group per page. you must name the groups "step 1", "step 2", etc. also, you must visit the display fields tab and submit the form there. you don't have to change anything, just submit the form. (this is clearly a cck bug, but we'll just work around it for now.)

see below for an example configuration:

step four - configure cck witch

finally, visit admin -> content -> multi-page form settings and set the number of pages for each cck content type. the cck witch module will only interact with those content types where the number of pages is greater than one.

future improvements

  • currently, cck witch presumes that your content type does not include a body field. complex cck node types rarely do. handling the body field is easy; it's just not obvious to me which page the body should appear on.
  • if there are other, non-cck options on your form (for example, the administrative meta tags or menu settings) these currently appear on all pages of the form. you can set them whenever you please. possibly, these should all be moved to an implied final page in the flow?
Nov 29 2007
Nov 29

using the term "content management system" to describe the drupal cms understates its full potential. i prefer to consider drupal a web-application development-system, particularly suitable for content-heavy projects.

what are the fantastic four?

drupal's application development potential is provided in large-part by a set of "core" modules that dovetail to provide an application platform that other modules and applications build on. these modules have become a de-facto standard: drupal's fantastic four. our superheroes are cck, views, panels and cck field types and widgets. if you are considering using drupal to build a website of any sophistication, you can't overlook these. note that cck field types and widgets isn't a real module, but rather a set of related modules.

flying with the four

getting a feel for how these modules work and interact isn't trivial, so i'll give you a brief introduction to the super-powers of each of them, and then take you step-by-step through an example, with enough detail that you can easily get it working on your system. or, if you want to see a professional implementation built on the same principles, check out the zicasso photo competition.

meet our heros

the content construction kit or, as it's more commonly referred to, cck, provides point-and-click attribute extensibility to drupal's content-types. for example, if your site is about photography, you could define a type of page on your site called "photograph" and then add typed attributes to it, shutter-speed (integer), flash (boolean) etc. cck then automagically creates forms for you (or your users) to create and edit these types of pages, providing suitable validation, gui controls etc.

the cck fieldtype modules each define a new type of field that can be used in your cck content types. one example is the imagefield module, allowing your cck types to have fields of type image. this allows your "photograph" page to contain the actual photograph itself. there are many more types that you can find in the cck modules download area.

the views module allows simple point and click definition of lists of drupal nodes, including your cck nodes. you can control not only what is in the list, but how the list is displayed, including sorting, pagination etc. these lists can be conveniently displayed as blocks, full blown pages or even rss feeds. for example, you could define a list of photographs that had been highly rated by users on your photography site.

the panels module allows you to create pages divided into sections, each section containing a node, block, view or any custom content. so without any knowledge of html or css you can create complicated and powerful layouts. for example, you could create a page with two views, one showing a list of recently submitted photographs and one showing a list of highly ranked photographs. this module is currently undergoing a huge facelift and panels2 is in alpha at the time of writing.

an example

to illustrate how the fantastic four can be put to good use, let's continue with our photography theme and create a simple photo-competition application. this application (shown to the right) allows the creation of a simple photo competition entry using a form. the main page shows two lists, one of recent entries and one of "featured" entries. the application also has a detail page for each photograph where anonymous users can leave comments.

step one - install the modules

i'm going to assume that you've got a basic drupal install up-and-running. if you haven't, please refer to one of my previous blogs, easy-peasy-lemon-squeezy drupal installation on linux. once you've done this, you should install 6 modules: cck, views, panels2, imagefield, email field and imagecache. on linux, you can do this as follows. cd to your drupal directory (the one containing cron.php etc.), create the directory sites/all/modules if necessary, and download the modules:

# wget http://ftp.drupal.org/files/projects/panels-5.x-2.0-alpha14.tar.gz \
http://ftp.drupal.org/files/projects/views-5.x-1.6.tar.gz \
http://ftp.drupal.org/files/projects/cck-5.x-1.6-1.tar.gz \
http://ftp.drupal.org/files/projects/imagefield-5.x-1.1.tar.gz \
http://ftp.drupal.org/files/projects/imagecache-5.x-1.3.tar.gz \
http://ftp.drupal.org/files/projects/email-5.x-1.x-dev.tar.gz

then unzip them and set the permissions properly:

# for file in *.gz; do tar xvfz $file; done
# chown -R www-data.www-data *

now go to the administrative interface, http://example.com/drupal/admin/build/modules and enable the modules in question.

finally, go to http://example.com/drupal/admin/user/access and grant access to the panels and views module features to the role you are using e.g. "access all views" to "authenticated user" and "administer views" to your "developer" or "admin" roles. also grant "post comments without approval" and "post comments" and "access comments" to the anonymous user.

note we're using the alpha panels version, panels2. it's not quite ready for prime time, but it's hard to resist. it kicks ass.

step two - create a new content type

now it's time to create a new content type. navigate to the content types page at http://example.com/drupal/admin/content/types, and create the "photo competition entry" as shown below.

now let's add two new custom fields to our photo competition type: email and photograph. these fields make use of the new cck field type modules we just installed.

create the email field as follows:

create the photograph field as follows:

now go to http://example.com/drupal/admin/user/access and allow anonymous users to "create photo_entry content" and "edit own photo_entry content"

step three - setting our themes

because i'm bored with garland, let's change the default theme to "minnelli" at http://example.com/drupal/admin/build/themes, and set the administration theme at http://example.com/drupal/admin/settings/admin back to garland.

step four - create some content

now that we've defined our new content type, we can go ahead and create some new content. navigate to http://[...]/node/add/photo-entry and fill out a few entries. you can see your new create form in action, complete with validation (shown to the right).

it's best to do this as the anonymous user to see the usual user experience. it's convenient to stay logged in as admin and use another browser e.g. internet explorer (bleah) for your regular (anonymous) user.

step five - configure imagecache

the imagecache module allows you to define an arbitrarily large number of image transformations (presets) including scaling, resizing and cropping. let's define two transformations. the first, preview, creates a 200px wide scaled-down preview. the second, thumbnail, is slightly more complex, and creates a square image, 120px by 120px, that is a scaled, centered crop of the original. rockin.

create the thumbnail preset as follows:

create the preview preset as follows:

you should now be able to test your presets with the content you created e.g. if you uploaded an image called myImage.jpg, you can view your transformed images at:
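
(the exact urls depend on your file system path settings, but for a default install they should look something like http://example.com/drupal/files/imagecache/thumbnail/myImage.jpg and http://example.com/drupal/files/imagecache/preview/myImage.jpg.)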

step six - create our views

the views module allows you to create lists of nodes. we're going to create two views:
  1. recent_photo_entries, a list of the five most recently submitted entries. the list shows a thumbnail of the image and the email address of the creator.
  2. featured_images, a list of the two most recently commented on images. this list shows a preview of the image, the image title and the email address of the creator.

create the recent view as follows:

create the featured view as follows:

step seven - create the panel page

the last step is to create the panel page to host our content and views. go to http://example.com/drupal/admin/panels/panel-page and create a new "two column stacked" layout, as shown below:

put custom content in the top panel, your recent view in the left panel and the featured view in the right panel. for the views, be careful to select a "view type" of block.

the following image shows the custom content you should create in the top panel:

the final image shows the configuration screen for the recent view (left panel). the right panel is very similar:

finally go to the "site information" administrative section: http://example.com/drupal/admin/settings/site-information and set your new panel as the home page i.e. put "photo-competition" in the default front page box.

you are done and your site should look something like:

further work

there is a lot that you could do to enhance this example, for example:
  • installing the jrating or fivestar module and allowing users to vote on photographs using a nice javascript control.
  • creating a view that implements an rss feed for photo competition entries.
  • using css to style your views and nodes.

check out a professional drupal photo competition based on these same principles at zicasso

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 15 2007
Nov 15

if you've setup a clustered drupal deployment (see scaling drupal step three - using heartbeat to implement a redundant load balancer), a good next-step, is to scale your database tier.

in this article i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

deployment overview

this table summarizes the characteristics of this deployment choice:

scalability: good
redundancy: fair
ease of setup: poor

servers

in this example, i use:

web server            drupal-lb1.mydomain.com            192.168.1.24
data server           drupal-data-server1.mydomain.com   192.168.1.26
data server           drupal-data-server2.mydomain.com   192.168.1.27
data server           drupal-data-server3.mydomain.com   192.168.1.28
mysql load balancer   mysql-balance-1.mydomain.com       192.168.1.94

first steps first - optimizing your database and application

the first step to scaling your database tier should include identifying problem queries (those taking most of the resources), and optimizing them. optimizing may mean reducing the volume of the queries by modifying your application, or increasing their performance using standard database optimization techniques such as building appropriate indexes. the devel module is a great way to find problem queries and functions.

another important consideration is the optimization of the database itself, by enabling and optimizing the query cache, tuning database parameters such as the maximum number of connections etc. using appropriate hardware for your database is also a huge factor in database performance, especially the disk io system. a large raid 1+0 array for example, may do wonders for your throughput, especially combined with a generous amount of system memory available for disk caching. for more on mysql optimization, take a look at the great o'reilly book by jeremy zawodny and derek balling on high performance mysql.
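
as a starting point, you can inspect the relevant settings from the mysql client and then tune them in my.cnf. a quick sketch (which variables matter most depends on your workload):

# see whether the query cache is enabled and how big it is
$ mysql -e "show variables like 'query_cache%'"

# see the current connection limit
$ mysql -e "show variables like 'max_connections'"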

when it's time to scale out rather than up

you can only (and should only) go so far scaling up. at some point you need to scale out. ideally, you want a database clustering solution that allows you to do exactly that. that is, add nodes to your database tier, completely transparently to your application, giving you linear scalability gains with each additional node. mysql cluster promises exactly this. it doesn't offer full transparency however, due to limitations introduced by the ndb storage engine required by mysql cluster. having said that, the technology looks extremely promising and i'm interested if anyone has got a drupal application running successfully on this platform. you can read more on mysql clustering on the mysql cluster website or in the mysql clustering book by alex davies and harrison fisk.

less glamorous alternatives to mysql cluster

without the magic of mysql cluster, we've still got some, admittedly less glamorous, alternatives. one is to use a traditional mysql database cluster, where all writes go to a single master and reads are distributed across several read-only-nodes. the master updates the read-only-nodes using replication.

an alternative is to segment read and write requests by role, thereby partitioning the data into segments, each one resident on a dedicated database.

these two approaches are illustrated below:

there are some significant pitfalls to both approaches:

  • the traditional clustering approach, introduces a replication lag i.e. it takes a non-trivial amount of time, especially under load, for writes to make it back to the read-only-nodes. this may not be problematic for very specific applications, but is problematic in the general case
  • the traditional clustering approach scales only reads, not writes, since each write has to be made to each node.
  • in traditional clustering the total effective size of your memory cache is the size of a single node (since the same data is cached on each node), whereas with segmentation it's the sum of the nodes.
  • in traditional clustering each node has the same hardware optimization pattern, whereas with segmentation, it can be customized according to the role it's playing.
  • the segmentation approach reduces the redundancy of the system, since theoretically a failure of any of the nodes takes your "database" off line. in practice, you may have segments that are non essential e.g. logging. you can, of course, cluster your segments, but this introduces the replication lag issue.
  • the segmentation approach relies on a thorough understanding of the application, and of the relative projected load on each segment, to do properly.
  • the segmentation approach is fundamentally very limited, since there are a limited number of segments for a typical application.

more thoughts on database segmentation

from one perspective, the use of memcache is a database segmentation technique i.e. it takes part of the load on the database (from caching) and segments this into a specialized and optionally distributed caching "database". there is a detailed step-by-step guide on lullabot for doing this on debian etch using the drupal memcache module.

you can continue this approach on other areas of your database, dedicating several databases to different roles. for example, if one of the functions of your database is to serve as a log, why not segment all log activity onto a single database? clearly, it's important that your segments are distinct i.e. that applications don't need joins or transactions between segments. you may have auxiliary applications that do need complex joins between segments e.g. reporting. this can be easily solved by warehousing the data back into a single database to serve specifically this auxiliary application (warehousing in this case).

i'm not suggesting that the next step in your scaling exercise should necessarily be segmentation; that clearly depends on your application and preferences. we're going to explore the idea anyway. it's my blog after all :)

what segmentation technologies to use?

there are several open source tools that you can use to build a segmentation infrastructure. sqlrelay is a popular database-agnostic proxying tool that can be used for this purpose. mysql proxy is, as the name suggests, a mysql-specific proxying tool.

in this article i focus on mysql proxy. sqlrelay (partly due to its more general-purpose nature) is somewhat difficult to configure, and inherently less flexible than mysql proxy. mysql proxy, on the other hand, is quick to set up and use. it has a simple, elegant and flexible architecture that allows for a full range of proxying applications, from trivial to uber-complex.

more on mysql proxy

jan kneschke's brainchild, mysql proxy is a lightweight daemon that sits between your client application (apache/modphp/drupal in our case) and the database. the proxy allows you to perform just about any transformation on the traffic, including segmentation. the proxy allows you to hook into 3 actions: connect, query and result. you can do whatever you want to in these steps, manipulating data and performing actions using lua scripts. lua is a fully featured scripting language, designed for high performance, clearly a key consideration in this application. don't worry too much about having to learn yet another scripting language: lua is easy to pick up, and it's powerful and intuitive.

even if you don't intend to segment your databases, you might consider a proxy configuration for other reasons including logging, filtering, redundancy, timing and analysis and query modification. for example, using mysql proxy to implement a hot standby database (replicated) would be trivial.

the mysql site states clearly (as of 09Nov2007): "MySQL Proxy is currently an Alpha release and should not be used within production environments". feeling lucky?

a word of warning

the techniques described below, including the overall method and the use of mysql proxy, are intended to stimulate discussion. they are not intended to represent a valid production configuration. i've explored this technique purely in an experimental manner. in my example below i segment cache queries to a specific database. i don't mean to imply that this is a better alternative to memcache. it isn't. anyway, i'd love to hear your thoughts on the general approach.

don't panic, you don't really need this many servers

before you get yourself into a panic over the number of boxes i've drawn in the diagram, please bear in mind that this is a canonical network. in reality you could use the same physical hardware for both load balancers, or, even better, you could use xen to create this canonical layout and, over time, deploy virtual servers on physical hardware as load necessitates.

down to business - set up and test a basic mysql proxy

o.k., enough of the chatter. let's get down to business and set up a mysql proxy server. first, download and install the latest version of mysql proxy from http://dev.mysql.com/downloads/mysql-proxy/index.html.

tar xvfz mysql-proxy-0.6.0-linux-debian3.1-x86.tar.gz

make sure that your mysql load balancer can access the database on your data server i.e. on your data server, run mysql and enter:

GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, INDEX, ALTER,
      CREATE TEMPORARY TABLES, LOCK TABLES
ON drupaldb.*
TO drupal@'192.168.1.94' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;

check that your load balancer can access the database on your data server i.e. on your load balancer do:

# mysql -e "select * from users limit 1" --host=192.168.1.26 --user=drupal --password=password drupaldb

now do a quick test of the proxy. run the proxy server, pointing it at your drupal database server:

./mysql-proxy --proxy-backend-addresses=192.168.1.26 &

and test the proxy:

echo "select * from users" |  mysql --host=127.0.0.1 --port=4040 --user=drupal --password=password drupaldb

now change your drupal install to point at the load balancer rather than directly at your data server, i.e. edit settings.php on your web server(s) so that $db_url points at the mysql load balancer:

$db_url = 'mysql://drupal:password@192.168.1.94:4040/drupaldb';

asking mysql proxy to segment your database traffic

the best way to segment a drupal database depends on many factors, including the modules you use and the custom extensions that you have. it's beyond the scope of this exercise to discuss segmentation specifics, but, as an example, i've segmented the database into 3 segments: a cache server, a log server and a general server (everything else).

to get started segmenting, create two additional database instances (drupal-data-server2, drupal-data-server3), each with a copy of the data from your original data server. make sure that you GRANT the mysql load balancer permission to access each database as described above.

you'll now want to start up your proxy server, pointing to these instances. below, i give an example of a bash script that does this. it starts up the cluster and executes several sql statements, each one bound for a different member of the cluster, to ensure that the whole cluster has started properly. note that you'd also want to build something similar as a health check, to ensure that the nodes keep functioning properly, stopping the cluster (proxy) as soon as a problem is detected.

here's the source for runProxy.sh:

#!/bin/bash
BASE_DIR=/home/john
BIN_DIR=${BASE_DIR}/mysql-proxy/sbin

# kill the server if it's running
pkill -f mysql-proxy

# make sure any old proxy instance is dead before firing up the new one
sleep 1

# run the proxy server in the background
${BIN_DIR}/mysql-proxy \
--proxy-backend-addresses=192.168.1.26:3306 \
--proxy-backend-addresses=192.168.1.27:3306 \
--proxy-backend-addresses=192.168.1.28:3306 \
--proxy-lua-script=${BASE_DIR}/databaseSegment.lua &

# give the server a chance to start
sleep 1

# prime the pumps!
# execute some sql statements to make sure that the proxy is running properly
# i.e. that it can establish a connection to the range of servers in question
# and bail if anything fails
for sqlStatement in \
   "select cid FROM cache limit 1" \
   "select nid FROM history limit 1" \
   "select name FROM variable limit 1"
do
   echo "testing query: ${sqlStatement}"
   echo ${sqlStatement} |  mysql --host=127.0.0.1 --port=4040 \
       --user=drupal --password=password drupaldb || { echo "${sqlStatement}: failed (is that server up?)"; exit 1; }
done

you'll notice that this script references databaseSegment.lua; this is a lua script that uses a little pattern-matching magic to map queries to servers. again, the actual queries being mapped serve as examples to illustrate the point, but you'll get the idea. jan has a nice r/w splitting example that can be easily modified to create databaseSegment.lua.

most of the complexity in jan's code is around load balancing (least connections) and connection pooling within the proxy itself. jan points out (and i agree) that this functionality should be made available in a generic load-balancing lua module. i really like the idea of having this in lua scripts to allow others to easily extend it, for example, by adding a round-robin alternative. keep an eye on his blog for developments. anyway, for now, let's modify his example, adding some defines and a function to do the mapping:

local CACHE_SERVER = 1
local LOG_SERVER = 2
local GENERAL_SERVER = 3

-- select a server to use based on the query text, this will return one of
-- CACHE_SERVER, LOG_SERVER or GENERAL_SERVER
function choose_server(query_text)
   local cache_server_strings = { "FROM cache", "UPDATE cache",
                                  "INTO cache", "LOCK TABLES cache"}
   local log_server_strings =   { "FROM history", "UPDATE history",
                                  "INTO history" , "LOCK TABLES history",
                                  "FROM watchdog", "UPDATE watchdog",
                                  "INTO watchdog", "LOCK TABLES watchdog" }

   local server_table = { [CACHE_SERVER] = cache_server_strings,
                          [LOG_SERVER] = log_server_strings }

   -- default to the general server
   local server_to_use = GENERAL_SERVER

   -- find a server registered for this query_text in the server_table
   for i=1, #server_table do
      for j=1, #server_table[i] do
         if string.find(query_text, server_table[i][j])
         then
            server_to_use = i
            break
         end
      end
   end

   return server_to_use
end

and then call this in read_query(), where query_text is the query string extracted from the incoming packet (in jan's example, the packet payload after the leading command byte):

-- pick a server to use
proxy.connection.backend_ndx = choose_server(query_text)

test your application

now test your application. a good way to see the queries hitting your database servers is to (temporarily) enable full query logging on each of them and watch the log. edit /etc/mysql/my.cnf and set:

# Be aware that this log type is a performance killer.
log             = /var/log/mysql/mysql.log

and then:

# tail -f /var/log/mysql/mysql.log

further work

to develop this idea further:
  • someone with better drupal knowledge than me could define a good segmentation structure for a typical drupal application, with the query fragments associated with each segment.
  • additionally, the scripts could handle exceptional situations better e.g. a regular health check for the proxy.
  • clearly we've introduced another single-point-of-failure in the database load balancer. the earlier discussion of heartbeat applies here.
  • it would be wonderful to bypass all this nonsense and get drupal running on a mysql cluster. i'd love to hear if you've tried it and how it went.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 13 2007
Nov 13

UPDATE: for the drupal 6 version, please go here.

if your career as a developer has included a stay in the j2ee world, then when you arrived at drupal one of your initial questions was "where's the log file?". eventually, someone told you about the watchdog table. you decided to try that for about five minutes, and then were reduced to using a combination of <pre> and print_r to scrawl debug data across your web browser.

when you tired of that, you learned a little php, did a little web research and discovered the PEAR log package and debug_backtrace(). the former is comfortably reminiscent of good old log4j and the latter finally gave you the stacktrace you'd been yearning for. still, separately, neither gave you quite what you were looking for: a log file in which every entry includes the filename and line number from which the log message originated. put them together though, and you've got log4drupal.

log4drupal is a simple api that writes messages to a log file. each message is tagged with a particular log priority level (debug, info, warn, error or emergency) and you may also set the overall log threshold for your system. only messages with a priority level at or above your system threshold are actually printed to your log file. the system threshold may be changed at any time, using the log4drupal administrative interface. you may also specify whether or not a full stack trace is included with every message. by default, a stack trace is included for messages with a priority of error and above. the administrative options are illustrated below:

log4drupal admin screen

now, on to the examples. suppose you had the following ridiculous block of code.

  $i = 0;
  while($i <= $user->profile_age) {
    log_debug("The user at least $i years old");
    $i++;
  }
 
  log_info("The user is $user->profile_age years old");
 
  if($user->profile_age < 2) {
    log_warn("User may be too young");
  }

  if($user->profile_age == 1) {
    log_error("Security violation, user much too young!");
  }

if your log threshold is set to debug then all messages will be shown in your log file as follows:

[20:23:23 11/12/07] [debug] [example.module:47] The user at least 0 years old
[20:23:23 11/12/07] [debug] [example.module:47] The user at least 1 years old
[20:23:23 11/12/07] [info] [example.module:51] The user is 1 years old
[20:23:23 11/12/07] [warning] [example.module:54] User may be too young
[20:23:23 11/12/07] [error] [example.module:57] Security violation, user much too young!
  at /var/www/drupal/sites/all/modules/example/example.module:57
  at /var/www/drupal/sites/all/modules/example/example.module:71
  at /var/www/drupal/includes/module.inc:406
  at /var/www/drupal/modules/node/node.module:692
  at /var/www/drupal/modules/node/node.module:779
  at /var/www/drupal/modules/node/node.module:2462
  at /var/www/drupal/includes/menu.inc:418
  at /var/www/drupal/index.php:15

if your log threshold is set to warning then only the warning and error messages will be shown.

[20:27:52 11/12/07] [warning] [example.module:54] User may be too young
[20:27:52 11/12/07] [error] [example.module:57] Security violation, user much too young!
  at /var/www/drupal/sites/all/modules/example/example.module:57
  at /var/www/drupal/sites/all/modules/example/example.module:71
  at /var/www/drupal/includes/module.inc:406
  at /var/www/drupal/modules/node/node.module:692
  at /var/www/drupal/modules/node/node.module:779
  at /var/www/drupal/modules/node/node.module:2462
  at /var/www/drupal/includes/menu.inc:418
  at /var/www/drupal/index.php:15

you may download and test a copy of log4drupal here. suggestions for improvement or additional features are welcome. future improvements i've been thinking about include:

  1. integration with watchdog
  2. automatic recursive printing of any complex type messages

it's worth noting that all logging comes with a performance cost. i haven't done any serious calculations yet, but here is some ballpark data. on an unloaded server, with an average page load time of around 1.5 seconds, it takes about 0.3 milliseconds to print out one message. it takes about 0.008 milliseconds to not print out a message that is below your current system threshold.

if people are interested, i'll add this as a module to drupal.org.

thanks to a former colleague, alex levine, for the original inspiration.

Nov 11 2007
Nov 11

i got some good feedback on my dedicated data server step towards scaling. kris buytaert in his everything is a freaking dns problem blog points out that nfs creates an unnecessary choke point. he may very well have a point.

having said that, i have run the suggested configuration in a multi-web-server, high-traffic production setting for 6 months without a glitch, and feedback on his blog gives examples of other large sites doing the same thing. for even larger configurations, or if you just prefer, you might consider another method of synchronizing files between your web servers.

kris suggests rsync as a solution, and although luc stroobant points out the delete problem, i still think it's a good, simple solution. see the diagram above.

the delete problem is that you can't simply use the --delete flag on rsync, since in an x->y synchronization, a delete on node x looks just like an addition on node y.

i speculate that you can partly mitigate this issue with some careful scripting, using a source-of-truth file server to which you first pull only additions from the source nodes, and then do another run over the nodes with the delete flag (to remove any newly deleted files from your source-of-truth). unfortunately you can't do the delete run on a live site (due to timing problems if additions happen after your first pass and before your --delete pass), but you can do this as a regularly scheduled maintenance task when your directories are not in flux.

i include a bash script below to illustrate the point. i haven't tested this script, or the theory in general. so if you plan to use it, be careful.

you could call this script from cron on your data server. you could do this, say, every 5 minutes for a smallish deployment. even though this causes a 5 minute delay in file propagation, the use of sticky sessions ensures that users will see files that they create immediately, even if there is a slight delay for others. additionally, you could schedule it with the -d flag during system downtime.

the viability of this approach depends on many factors including how quickly an uploaded file must be available for everyone and how many files you have to synchronize. this clearly depends on your application.

synchronizeFiles -- a bash script to keep your drupal web server's files directory synchronized

#!/bin/bash

# synchronizeFiles -- a bash script to keep your drupal web server's files directory
#                     synchronized - http://www.johnandcailin.com

# bail if anything fails
set -e

# don't synchronize deletes by default
syncDeletes=false

sourceServers="192.168.1.24 192.168.1.25"
sourceDir="/var/www/drupal/files"
sourceUser="www-data"
# note: the last path component must match the basename of sourceDir ("files")
# so that the rsync invocations below land in the right place
targetDir="/var/drupalFiles/files"

# function to print a usage message and bail
usageAndBail()
{
   echo "Usage syncronizeFiles [OPTION]"
   echo "     -d       synchronize deletes too (ONLY use when directory contents are static)"
   exit 1;
}

# process command line args
while getopts hd o
do     case "$o" in
        d)     syncDeletes=true;;
        h)     usageAndBail;;
        [?])   usageAndBail;;
       esac
done

# do an initial addition-only synchronization run between the sourceServers and the local targetDir
for sourceServer in ${sourceServers}
do
   echo "bi-directionally syncing files between ${sourceServer} and local"

   # pull any new files to the target
   rsync -a ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}/..

   # push any new files back to the source
   rsync -a ${targetDir} ${sourceUser}@${sourceServer}:${sourceDir}/..
done

# synchronize deletes (only use if directory contents are static)
if test ${syncDeletes} = "true"
then
   for sourceServer in ${sourceServers}
   do
      echo "DELETE syncing files from ${sourceServer} to ${targetDir}"

      # pull any new files to the target, deleting from the source of truth if necessary
      rsync -a --delete ${sourceUser}@${sourceServer}:${sourceDir} ${targetDir}/..
   done
fi

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 10 2007
Nov 10
if you felt a waft of cold air when you read the recent highly critical drupal security announcement on arbitrary code execution using install.php, you were right. your bum was hanging squarely out of the window, and you should probably consider beefing up your security.

drupal's default exposure of files like install.php and cron.php presents inherent security risks, for both denial-of-service and intrusion. combine this with critical administrative functionality available to the world, protected only by user-defined passwords, broadcast over the internet in clear text, and you've got potential for some real problems.

fortunately, there are some easy and practical things you can do to tighten things up.

step one: block the outside world from your sensitive pages

one easy way to tighten up your security is to simply block access to your sensitive pages from anyone outside your local network. this can be done using apache's mod_rewrite. for example, you could block access to any administrative page by adding the following into your .htaccess file in your drupal directory (the one containing sites, scripts, modules etc.). the example only allows access from IPs in the range 192.*.*.* or 200.*.*.*:

<IfModule mod_rewrite.c>
  RewriteEngine on

  # Allow only internal access to admin
  RewriteCond %{REMOTE_ADDR} !^(192|200)\..*$
  RewriteRule   ^admin/.*  - [F]
  [...]
</IfModule>

step two: tunnel into your server for administrative access

now that you've locked yourself out of your server for remote administrative access, you'd better figure out how to get back in. SOCKS-proxy and ssh-tunneling to the rescue! assuming that your server is running an ssh server, set up an ssh tunnel (from the machine you are browsing on) to your server as follows:

ssh -D 9999 yourname@yourserver.com

now go to your favorite browser and proxy your traffic through a local ssh SOCKS proxy e.g. on firefox 2.0 on windoze do the following:
  1. select the tools->options (edit->preferences on linux) menu
  2. go to the "connections" section of the "network" tab, click "settings"
  3. set the SOCKS host to localhost port 9999
now simply navigate to your site and administer, safe in the knowledge that not only is your site's soft-underbelly restricted to local users, but all your traffic (including your precious admin password) is encrypted in transit.

your bum should be feeling warmer already.

some more rules

some other rules that you might want to consider include (RewriteCond omitted for brevity):

# allow only internal access to node editing
RewriteRule   ^node/.*/edit.*  - [F]

# allow only internal access to sensitive pages
RewriteRule   ^update.php  - [F]
RewriteRule   ^cron.php  - [F]
RewriteRule   ^install.php  - [F]

debugging

can't get your rewrite rules to work? shock! ... consider adding this to your vhost configuration (e.g. /etc/apache2/sites-available/default) to see what (the hell) is going on.

RewriteLog /var/log/apache2/vhost.rewrite.txt
RewriteLogLevel 3

thanks

thanks to curtis (madman) hilger and paul (windows is not your friend) lathrop for help with this.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Nov 07 2007
Nov 07

don't get me wrong, i'm a happy customer of the drupal hovertip module. everything worked out of the box, and i've enjoyed using it to cram even more pictures into my website. however, the included default css leaves a little to be desired for the following reasons:

  1. it's too specific. it assigns a very particular look and feel to your tooltips, complete with background colors, fixed widths and font sizes. sure, in theory, you can override all that in your theme css. but if css specificity is not your thing, you're going to be tearing your hair out trying to figure out how to do it.
  2. the ui element chosen to indicate "hover here" is non-standard. the "hover here" directive is admittedly fairly new, but the emerging standard seems to be the dashed-underline (certainly not the italic font used in the drupal hovertip module).
  3. the clicktip css does not work on ie6. the link to close the clicktip has mysteriously gone missing.

you can download a more generic, flexible version of the necessary hovertip module css that solves all these issues here. here are some examples of how to use it.

hovertips

a hovertip causes a floating div to appear, just below and to the right of your cursor. the floating div can contain anything you like.

the simplest example of a hovertip might contain some plain text. in this example, and all that follow, the supporting html is shown in a code block. notice that you need to assign the floating div a background color. the default background color is transparent.

the simplest example of a hovertip might contain some <span hovertip="text">plain text</span>. 

<div class="hovertip" id="text" style="background-color:#DDD;">
<p>These are some explanatory words</p>
</div>

a more entertaining hovertip might reveal a picture.

a more entertaining hovertip might reveal a  <span hovertip="picture">picture</span>.

<div class="hovertip" id="picture" style="background-color:#FFF;" >
<img src="http://gallery.johnandcailin.com/d/9144-2/ava+ladybug+109.JPG">
</div>

and finally, a hovertip may also contain a link

<p>and finally, a hovertip may also contain a <span hovertip="link">link</span></p>

<div class="hovertip" id="link" style="background-color:#DDD">
Visit our <a href="http://www.johnandcailin.com/tech">tech blog</a>
</div>

clicktips

a clicktip causes a previously invisible div to suddenly reveal itself. the clicktip div comes with a close link that makes the clicktip disappear again.

here is a clicktip that contains some text

<p>here is a clicktip that contains some <span clicktip="text">text</span></p>

<div class="clicktip" id="text" style="background-color:#DDD;padding:5px;">
<p>These are some explanatory words</p>
</div>

Oct 29 2007
Oct 29
the authors of drupal have paid considerable attention to performance and scalability. consequently, even a default install running on modest hardware can easily handle the demands of a small website. my four-year-old pc in my garage, running a full lamp install, will happily serve up 50,000 page views in a day, providing solid end-user performance without breaking a sweat.

when the time comes for scalability: moving out of the garage

if you are lucky, eventually the time comes when you need to service more users than your system can handle. your initial steps should clearly focus on getting the most out of the built-in drupal optimization functionality, considering drupal performance modules, optimizing your php (including considering op-code caching) and working on database performance. John VanDyk and Matt Westgate have an excellent chapter on this subject in their new book, "pro drupal development".
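
to make "built-in drupal optimization functionality" slightly more concrete, the two settings that usually give the biggest early win are the page cache (for anonymous traffic) and css aggregation, both available on drupal 5's performance settings page. the snippet below just flips them on programmatically and is only a sketch; most people will simply use the admin screen:

// equivalent to ticking the boxes on the performance settings page
variable_set('cache', CACHE_NORMAL);   // serve cached pages to anonymous users
variable_set('preprocess_css', 1);     // aggregate css files into a single download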

once these steps are exhausted, inevitably you'll start looking at your hardware and network deployment.

a well designed deployment will not only increase your scalability, but will also enhance your redundancy by removing single points of failure. implemented properly, an unmodified drupal install can run on this new deployment, blissfully unaware of the clustering, routing and caching going on behind the scenes.

incremental steps towards scalability

in this article, i outline a step-by-step process for incrementally scaling your deployment, from a simple single-node drupal install running all components of the system, all the way to a load-balanced, multi-node system with database-level optimization and clustering.

since you almost certainly don't want to jump straight from your single node system to the mother of all redundant clustered systems in one step, i've broken this down into 5 incremental steps, each one building on the last. each step along the way is a perfectly viable deployment.

tasty recipes

i give full step-by-step recipes for each deployment that, with a decent working knowledge of linux, should allow you to get a working system up and running. my examples are for apache2, mysql5 and drupal5 on debian etch, but may still be useful for other versions / flavors.

note that these aren't battle-hardened production configurations, but rather illustrative minimal configurations that you can take and iterate to serve your specific needs.

the 5 deployment configurations

the table below outlines the properties of each of the suggested configurations:

                                    step 0   step 1   step 2   step 3   step 4   step 5
  separate web and db               no       yes      yes      yes      yes      yes
  clustered web tier                no       no       yes      yes      yes      yes
  redundant load balancer           no       no       no       yes      yes      yes
  db optimization and segmentation  no       no       no       no       yes      yes
  clustered db                      no       no       no       no       no       yes
  scalability                       poor-    poor     fair     fair     good     great
  redundancy                        poor-    poor-    fair     good     fair     great
  setup ease                        great    good     good     fair     poor     poor-

in step 0, i outline how to install drupal, mysql and apache to get a basic drupal install up-and-running on a single node. i also go over some of the basic configuration steps that you'll probably want to follow, including cron scheduling, enabling clean urls, setting up a virtual host etc.
in step 1, i go over a good first step to scaling drupal: creating a dedicated data server. by "dedicated data server" i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment.
in step 2, i go over how to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment.
in step 3, i discuss clustering your load balancer. one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the load balancer's scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy.

in this article i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling.


the holy grail of drupal database scaling might very well be a drupal deployment on mysql cluster. if you've tried this, plan to try this or have opinions on the feasibility of an ndb "port" of drupal, i'd love to hear it.

tech blog

if you found this article useful, and you are interested in other articles on linux, drupal, scaling, performance and LAMP applications, consider subscribing to my technical blog.
Oct 29 2007
Oct 29

out of the box, the views module allows you to specify access to a view according to user role. this is a critical feature, but sometimes it's not enough. for example, sometimes you may want the view access to depend on the arguments to the view.

specifically, let's suppose that we have implemented facebook-style threaded mail, and we want to use a view to display all the messages in a thread. the thread id is an argument passed to the view. we only wish to allow the view to be accessed by one of the authors of the thread, or users with the 'administer messages' permission.

here's a three-step approach to resolving this dilemma:

step one. create a new access hook in the views module

right after

  // Administrator privileges
  if (user_access('access all views', $account)) {
    return TRUE;
  }

add

  // Call a hook that lets a module define access permissions for the view
  $access_func = "views_access_$view->name";
  if (function_exists($access_func)) {
    return $access_func($view);
  }

step two. implement your new hook

if your view is called message_thread, then implement a views_access_message_thread($view) function.
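
here's a rough sketch of what that function might look like for the message_thread example. the {message_thread_authors} table, its columns, and the use of arg(2) to recover the thread id (assuming the view lives at view/message/$arg, as set up in step three) are all assumptions made for the sake of illustration:

function views_access_message_thread($view) {
  global $user;

  // users with the blanket permission can always see the thread
  if (user_access('administer messages')) {
    return TRUE;
  }

  // recover the thread id from the url, e.g. view/message/10
  $thread_id = (int) arg(2);

  // hypothetical table mapping threads to their authors
  $count = db_result(db_query(
    "SELECT COUNT(*) FROM {message_thread_authors} WHERE thread_id = %d AND uid = %d",
    $thread_id, $user->uid));

  return $count > 0;
}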

step three. force views to NOT cache the access control settings for this view

okay, this part is a little hokey. the easiest way to do this is to tell the views module that your view has inline arguments. when you are defining the URL for your view in the view's settings, explicitly include the arguments, even if they occur at the end of the URL.

for example, if your page URL is view/message and then you are passing the thread id as an argument, define the page URL as view/message/$arg.

if you don't perform this step, then the views module will evaluate the access control for view/message/10 for a user, cache that result, and use that result for a subsequent request to view/message/34.

Oct 29 2007
Oct 29

previously, we discussed implementing all of the node hooks for CCK content types except hook_access. unfortunately, there is no access op for hook_nodeapi. adding this to drupal core is the topic of much discussion on drupal.org. so far a resolution to the issue has failed to be included in drupal 5 and drupal 6, and is now on deck for consideration in drupal 7.

this is a complicated issue, and the experts are debating with good cause. in the meantime though, if you need to move on, here's what you can do.

  • install this patch to node.module
  • you now have an access op exposed in hook_nodeapi

one reason that the debate on this topic is dragging on is that the drupal developers are concerned that access control is already too complicated, and this addition will simply make drupal access control incomprehensible. this is a valid point, and to use this patch properly you do need to understand the access control order of evaluation. when determining whether or not a user may view a node, here are the questions drupal asks:

  1. does the user have the 'administer nodes' permission? if yes, always return true
  2. does the user have the 'access content' permission? if no, always return false
  3. invoke the new hook_nodeapi access methods.
    1. if no hook_nodeapi implementation returned an explicit opinion on the matter, keep going.
    2. otherwise, if any hook_nodeapi implementation returned true, return true
  4. invoke the hook_access method, if any. (note, there may be only one of these!)
    1. if hook_access returned no explicit opinion, keep going
    2. if an opinion was returned, return that
  5. now check what the node_access table has to say. if no opinion, keep going
  6. is the user the author of the node? if yes, return true
  7. give up, return false

phew, that's a complicated flow of execution. are there any easy guidelines we can draw from this? yes . . .

one downside of granting access control to hook_nodeapi is that there may now be multiple modules with an opinion on the matter. this forces the drupal core developer to make a choice as to what to do when there are multiple, conflicting answers. in this patch, they have chosen to allow positive responses to dominate over negative responses. i'm personally not convinced they will stick with this decision, so, in the meantime, if you're using this patch, try and stick to a convention in which you implement only one hook_nodeapi access control method per content type. in doing this, you're simply allowing your CCK content types to function like any other content type, rather than opening a huge kettle of access control worms.
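
to make that convention concrete, here's a minimal sketch of a single per-content-type access implementation. be warned that the op name ('access') and the extra arguments (assumed here to be the operation being performed and the account being checked) depend entirely on the patch you applied, so verify against the patched node.module before using anything like this:

function example_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // assumption: $op is 'access', $a3 is the operation ('view', 'update', ...)
  // and $a4 is the account being checked -- confirm against your patch
  if ($op == 'access' && $node->type == 'private_note' && $a3 == 'view') {
    $account = $a4;
    // only the node's author may view this cck content type; returning
    // nothing for other cases leaves the decision to the rest of the
    // access chain described above
    return $account->uid == $node->uid;
  }
}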
