Sep 28 2012

One year ago I listened to Allen Wirfs-Brock of the Mozilla Foundation deliver the closing keynote for StrangeLoop 2011. Wirfs-Brock's central claim was jarring. Keep in mind, this is a conference whose attendee list is dominated by language designers, database architects, PhDs, and people whose credentials make the term "senior" seem like a gaping understatement. Yet in front of this crowd, Wirfs-Brock unabashedly coronated JavaScript the new king of programming languages.

I did not buy it. But a year later, I'm changing my mind.

I like JavaScript. It was, in all honesty, the first language I felt really comfortable in. I learned with the web. I started programming in 1995, at age 16. My summer internship led me into C, Java, Perl, and JavaScript all at once. JavaScript was my favorite, doubtless because its bindings to the browser made results more gratifying. "Look, ma! I put a message in the status bar!"

But as I matured as a programmer, I looked back on those heady experiments as pretend-programming with a toy language. JavaScript had a firm place in my constellation of programming languages: It was for tricking out web pages.

A decade and a half later, I found myself at StrangeLoop hearing an otherwise credible source claim that JavaScript is the new C. Really? You can imagine my skepticism.

Since hearing Wirfs-Brock a year ago, several things have changed for me. First, within weeks of StrangeLoop 2011, I began writing Node.js code. Second, I read some of the technical papers from Google on the V8 engine (start here), and then read some of the ECMA proposals for the next version of JavaScript. Finally, I wrote an application framework for Node.js -- always a great opportunity to stretch one's grasp of a language and an environment.

I know it's the same language that I used to pop up alert dialogs in my pimply high-school years, but it feels different now. Did it grow up, or did I?

On September 25, StrangeLoop 2012 concluded. And who should deliver the closing keynote? None other than Brendan Eich, the father of JavaScript.

Eich is disarming, funny, and intensely intelligent. Quick to point out the work of others, he portrays the JavaScript community as a vibrant group of intelligent individuals who, differences aside, have the best interests of the language users in mind. His presentation began with a humorous history of the mistakes of JavaScript, then moved to upcoming features and standards work, and concluded with a look at some of the more exciting JavaScript projects.

It didn't come across as a sales pitch; it came across as an acceptance speech, an oath of office. "I hereby solemnly swear that JavaScript will do right by you."

And by far the most interesting aspect of this presentation was Eich's promotion of JavaScript as the replacement for the VM. Here is the text of one of Eich's slides:

JavaScript > bytecode

  • Dynamic typing = no verification
  • Type inference = delayed optimization
  • Would byte code compress as well?
  • Bytecode standardization would suck
  • Bytecode versioning would suck more
  • Low-level byte code is future-hostile
  • Many humans like writing JavaScript

Oh yes he did! CoffeeScript, ClojureScript, Dart… Eich enthusiastically champions building languages that compile (or transcode) to JavaScript. (Did you know there's a project to rebuild the JVM in JavaScript? The Doppio project was also presented at StrangeLoop 2012.)

On the server; on the desktop; on mobile devices -- JavaScript already is pervasive. And if you've convinced a theater full of language lawyers, scientists, CTOs, and architects that JavaScript is the new C, you've won. Wirfs-Brock and Eich won.

Hail to the king, baby.

Jun 20 2012

Pronto.js is a high-performance asynchronous application framework designed to make it simple to chain components together to build sophisticated application logic. It's the JavaScript equivalent of the PHP Fortissimo framework.

One characteristic that makes both Pronto.js and Fortissimo stand apart is that they provide an alternative to the MVC pattern. They use the Chain-of-Command pattern, which takes a route name and maps it to a series of "commands", each of which is responsible for a different part of the processing. Well-written commands become highly reusable, which makes application development rapid and yet still reliable.

When you build an application, the components get chained together into routes with code like this (Pronto.js):

register.route('search')
      .does(InitializeSearchService, 'initialization')
      .does(QueryRemoteSearchService, 'do-search')
        .uses('query').from('get:q')
      .does(SearchTheme, 'format-search-results')
        .uses('searchResults').from('cxt:do-search')
    ;

(Fortissimo code looks similar: $register->route('search')->does(/*…*/)) The simple example above registers the route search to a series of commands that each perform part of the overall task of running a search and formatting the response.

Commands (InitializeSearchService, QueryRemoteSearchService, and so on) are short pieces of object-oriented code (prototypes in JS, classes in PHP) that take predefined input, perform a simple task, and then return data. My typical command is around 20 lines of code.
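
To give a feel for the shape of a command, here's a minimal sketch in the spirit of Pronto.js. The execute() signature and the done() callback are illustrative assumptions, not the framework's documented API, and the fake results stand in for a real search backend.

// A hypothetical Pronto.js-style command: take declared input, do one job, hand back data.
function QueryRemoteSearchService() {}

QueryRemoteSearchService.prototype.execute = function(context, params) {
  var query = params.query || ''; // mapped from 'get:q' in the route definition
  // Pretend remote call; a real command would query an actual search service.
  var results = [
    { title: 'First match for ' + query },
    { title: 'Second match for ' + query }
  ];
  // Hand results back so later commands can read them from the context (e.g. 'cxt:do-search').
  this.done(results);
};

module.exports = QueryRemoteSearchService;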

I know this is just a brief teaser. We're working on several cool applications built on these technologies. With Pronto.js, we've been able to integrate with a wide variety of NPM packages, while Fortissimo and the Symfony (and other) PHP libraries can be easily combined. In the future, I'll blog some more about Fortissimo and Pronto.js.

Jun 20 2012

MongoDB's query language is good at extracting whole documents or whole elements of a document, but on its own it can't pull specific items from deeply embedded arrays, calculate relationships between data points, or compute aggregates. To do that, MongoDB uses an implementation of the MapReduce methodology: it iterates over the dataset and extracts the desired data points. Unlike SQL joins in relational databases, which essentially create a massive combined dataset and then extract pieces of it, MapReduce iterates over each document in the set, "reducing" the data piecemeal to the desired results. The name was popularized by Google, which needed to scale beyond SQL to index the web. Imagine trying to build the data structure for Facebook, with near-instantaneous calculation of the significance of every friend's friend's friend's posts, with SQL, and you see why MapReduce makes sense.

I've been using MongoDB for two years, but only in the last few months started using MapReduce heavily. MongoDB is also introducing a new Aggregation framework in 2.1 that is supposed to simplify many operations that previously needed MapReduce. However, the latest stable release as of this writing is still 2.0.6, so Aggregation isn't officially ready for prime time (and I haven't used it yet).

This post is not meant to substitute for the copious documentation and examples you can find across the web. Even after reading those, it took me some time to wrap my head around the concepts, so I want to explain them as I came to understand them.

The Steps

A MapReduce operation consists of a map, a reduce, and optionally a finalize function. Key to understanding MapReduce is understanding what each of these functions iterates over.

Map

First, map runs for every document retrieved in the initial query passed to the operation. If you have 1000 documents and pass an empty query object, it will run 1000 times.

Inside your map function, you emit a key-value pair, where the key is whatever you want to group by (_id, author, category, etc.), and the value contains whatever pieces of the document you want to pass along. The function doesn't return anything, because you can emit multiple key-value pairs per map call, but a function can only return one result.

The purpose of map is to extract small pieces of data from each document. For example, if you're counting articles per author, you could emit the author as the key and the number 1 as the value, to be summed in the next step.
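
For instance, a map function for that per-author count might look like this minimal sketch (the author field name is an assumption about the document shape):

// Runs once per document; `this` is the current document.
var map = function() {
  // Group by author and contribute 1 toward that author's total.
  emit(this.author, 1);
};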

Reduce

The reduce function then receives each key emitted from map, along with an array of all the values emitted for that key. Its purpose is to reduce multiple values-per-key to a single value-per-key. At the end of each iteration of your reduce function, you return (not emit this time) a single value.

The number of times reduce runs for a given operation isn't easy to predict. (I asked about it on Stack Overflow and the consensus so far is that there's no simple formula.) Essentially, reduce runs as many times as it needs to, until each key appears only once. If you emit each key only once, reduce never runs. If you emit most keys once but one special key twice, reduce will run once, getting (special key, [ value, value ]).

A rule of thumb with reduce is that the returned value's structure has to be the same as the structure emitted from map. If you emit an object as the value from map, every key in that object has to be present in the object returned from reduce, and vice-versa. If you emit an integer from map, return an integer from reduce, and so on. The basic reason is that (as noted above) reduce shouldn't be necessary if a key only appears once. The results of an entire map-reduce operation, run back through the same operation, should return the same results (that way huge operations can be sharded and map/reduced many times). And the output of any given reduce function, plugged back into reduce (as a single-item array), needs to return the same value that went in. (In CS lingo, reduce has to be idempotent. The documentation explains this in more technical detail.)

Here's a simple JS test, using Node.js' assertion API, to verify this. To use it, have your mapReduce module export its map, reduce, and (optional) finalize functions so a separate test script can import and test them:

// this should export the map, reduce, [finalize] functions passed to MongoDB.
var mr = require('./mapreduce-query');
var assert = require('assert');

// override emit() to capture emitted key-value pairs locally
var emitted = [];

// (in global scope so map can access it)
global.emit = function(key, val) {
  emitted.push({key:key, value:val});
};

// reduce input should be same as output for a single object.
// dummyItems can be fake documents or documents loaded from the DB.
mr.map.call(dummyItems[0]);

var reduceRes = mr.reduce(emitted[0].key, [ emitted[0].value ]);
assert.deepEqual(reduceRes, emitted[0].value, 'reduce is idempotent');

A simple MapReduce example is to count the number of posts per author. So in map you could emit('author name', 1) for each document, then in reduce loop over each value and add it to a total. Make sure reduce is adding the actual number in the value, not just 1, because that won't be idempotent. Similarly, you can't just return values.length and assume each value represents 1 document.
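
Concretely, a reduce for that example might look like the following sketch, which sums the emitted values themselves rather than counting array entries, so re-reducing partial results gives the same answer:

// Receives one key and an array of values emitted (or previously reduced) for it.
var reduce = function(key, values) {
  var total = 0;
  values.forEach(function(v) {
    total += v; // add the value itself, not 1, so partial results re-reduce correctly
  });
  return total;
};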

Finalize

Now you have a single reduced value per key, which gets run through the finalize function once per key.

To understand finalize, consider that this is essentially the same as not having a finalize function at all:

var finalize = function(key, value) {
  return value;
}

finalize is not necessary in every MapReduce operation, but it's very useful, for example, for calculating averages. You can't calculate the average in reduce because it can run multiple times per key, so each iteration doesn't have enough data to calculate with.
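
For example, here's a sketch of an average-per-key calculation (the category and price fields are assumptions about the document shape): map and reduce only accumulate a count and a total, and finalize does the division once per key.

var map = function() {
  emit(this.category, { count: 1, total: this.price });
};

var reduce = function(key, values) {
  var out = { count: 0, total: 0 };
  values.forEach(function(v) {
    out.count += v.count; // same structure in and out, so re-reducing stays correct
    out.total += v.total;
  });
  return out;
};

var finalize = function(key, value) {
  value.average = value.total / value.count; // safe here: runs exactly once per key
  return value;
};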

The final results returned from the operation will have one value per key, as returned from finalize if it exists, or from reduce if finalize doesn't exist.

MapReduce in PHP and Drupal

The MongoDB library for PHP does not include any special functions for MapReduce. MapReduce operations can be run as a generic database command, but that takes a lot of code. I found a MongoDB-MapReduce-PHP library on Github which makes it easier. It works, but hasn't been updated in two years, so I forked the library and created my own version with what I think are some improvements.

The original library by infynyxx created an abstract class XMongoCollection that was meant to be sub-classed for every collection. I found it more useful to make XMongoCollection directly instantiable, as an extended replacement for the basic MongoCollection class. I added a mapReduceData method which returns the data from the MapReduce operation. For my Drupal application, I added a mapReduceDrupal method which wraps the results and error handling in Drupal API functions.

I could then load every collection with XMongoCollection and run mapReduce operations on it directly, like any other query. Note that the actual functions passed to MongoDB are still written in Javascript. For example:

// (this should be statically cached in a separate function)
$mongo = new Mongo($server_name);      // connection
$mongodb = $mongo->selectDB($db_name); // MongoDB instance
 
// use the new XMongoCollection class. make it available with an __autoloader.
$collection = new XMongoCollection($mongodb, $collection_name);
 
$map = <<<MAP
  function() {
    // doc is 'this'
    emit(this.category, 1);
  }
MAP;
 
$reduce = <<<REDUCE
  function(key, vals) {
    // `variable` (passed via setScope below) is available here if needed
    var total = 0;
    vals.forEach(function(v) { total += v; });
    return total;
  }
REDUCE;
 
$mr = new MongoMapReduce($map, $reduce, array( /* limit initial document set with a query here */ ));
 
// optionally pass variables to the functions. (e.g. to apply user-specified filters)
$mr->setScope(array('variable' => $variable));
 
// 2nd param becomes the temporary collection name, so tmp_mapreduce_example. 
// (This is a little messy and could be improved. Stated limitation of v1.8+ not supporting "inline" results is not entirely clear.)
// 3rd param is $collapse_value, see code
$result = $collection->mapReduceData($mr, 'example', FALSE);

MapReduce in Node.js

 
// Using the native MongoDB driver. dbName, mongoHost, mongoPort, collectionName,
// and the map/reduce/finalize functions are assumed to be defined elsewhere.
var mongodb = require('mongodb');

var db = new mongodb.Db(dbName, new mongodb.Server(mongoHost, mongoPort, {}));
db.open(function(error, dbClient) {
  if (error) throw error;
  dbClient.collection(collectionName, function(err, collection) {
    collection.mapReduce(map, reduce, {
        out: { inline : 1 },
        query: { ... },      // limit the initial set (optional)
        finalize: finalize,  // function (optional)
        verbose: true        // include stats
      },
      function(error, results, stats) {   // stats provided by verbose
        // ...
      }
    );
  });
});

This is mostly similar to the mongo shell syntax, except that in the shell the results are returned directly from the mapReduce call, while in Node.js they are passed (asynchronously) to the callback.
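
For comparison, a minimal shell invocation might look like this sketch (assuming the same map, reduce, and finalize functions are defined in the shell, and a collection named books):

// In the mongo shell, the result document comes back synchronously.
var res = db.books.mapReduce(map, reduce, {
  out: { inline: 1 },
  finalize: finalize
});
printjson(res.results);  // with inline output, the reduced data lives on `results`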

MapReduce in Mongoose

Mongoose is a modeling layer on top of the native MongoDB Node.js driver, and as of the latest 2.x release it does not have its own support for MapReduce. (It's supposed to be coming in 3.x.) But the underlying collection is still accessible:

var mongoose = require('mongoose');

var db = mongoose.connect('mongodb://dbHost/dbName');
// (db.connection.db is the native MongoDB driver)

// build a model (`Book` is a schema object defined elsewhere);
// the model is called 'Book' but the collection is 'books'
mongoose.model('Book', Book, 'books');

...

var Book = db.model('Book');
Book.collection.mapReduce(...);

(I actually think this is a case of Mongoose being better without its own abstraction on top of the existing driver, so I hope the new release doesn't make it more complex.)

In sum

I initially found MapReduce very confusing, so hopefully this helps clarify rather than increase the confusion. Please write in the comments below if I've misstated or mixed up anything above.

Apr 29 2012

Drupal's basic content unit is a "node," and to build a single node (or to perform any other Drupal activity), the codebase has to be bootstrapped, and everything needed to respond to the request (configuration, database and cache connections, etc) has to be initialized and loaded into memory from scratch. Then node_load runs through the NodeAPI hooks, multiple database queries are run, and the node is built into a single PHP object.

This is fine if your web application runs entirely through Drupal, and always will, but what if you want to move toward a more flexible Service-oriented architecture (SOA), and share your content (and users) with other applications? For example, build a mobile app with a Node.js backend like LinkedIn did; or calculate analytics for business intelligence; or have customer service reps talk to your customers in real-time; or integrate with a ticketing system; or do anything else that doesn't play to Drupal's content-publishing strengths. Maybe you just want to make your data (which is the core of your business, not the server stack) technology-agnostic. Maybe you want to migrate a legacy Drupal application to a different system, but the cost of refactoring all the business logic is prohibitive; with an SOA you could change the calculation and get the best of both worlds.

The traditional way of doing this was setting up a web service in Drupal using something like the Services module. External applications could request data over HTTP, and Drupal would respond in JSON. Each request has to wait for Drupal to bootstrap, which uses a lot of memory (every enterprise Drupal site I've ever seen has been bogged down by legacy code that runs on every request), so it's slow and doesn't scale well. Rather than relieving some load from Drupal's LAMP stack by building a separate application, you're just adding more load to both apps. To spread the load, you have to keep adding PHP/Apache/MySQL instances horizontally. Every module added to Drupal compounds the latency of Drupal's hook architecture (running thousands of function_exists calls, for example), so the stakeholders involved in changing the Drupal app have to include the users of every secondary application requesting the data. With a Drupal-Services approach, other apps will always be second-class citizens, dependent on the legacy system, which doesn't allow the "loose coupling" principle of SOA.

I've been shifting my own work from Drupal to Node.js over the last year, but I still have large Drupal applications (such as Antiques Near Me) which can't be easily moved away, and frankly don't need to be for most use cases. Overall, I tend to think of Drupal as a legacy system, burdened by too much cruft and inconsistent architecture, and no longer the best platform for most applications. I've been giving a lot of thought to ways to keep these apps future-proof without rebuilding all the parts that work well as-is.

That led me to build what I've called the "Drupal Liberator". It consists of a Drupal module and a Node.js app, and uses Redis (a very fast key-value store) for a middleman queue and MongoDB for the final storage. Here's how it works:

  • When a node (or user, or other entity type) is saved in Drupal, the module encodes it to JSON (a cross-platform format that's also native to Node.js and MongoDB), and puts it, along with metadata (an md5 checksum of the JSON, timestamp, etc), into a Redis hash (a simple key-value object, containing the metadata and the object as a JSON string). It also notifies a Redis pub/sub channel of the new hash key. (This uses 13KB of additional memory and 2ms of time for Drupal on the first node, and 1KB/1ms for subsequent node saves on the same request. If Redis is down, Drupal goes on as usual.)

  • The Node.js app, running completely independently of Drupal, is listening to the pub/sub channel. When it's pinged with a hash key, it retrieves the hash, JSON.parse's the string into a native object, possibly alters it a little (e.g., adding the checksum and timestamp into the object), and saves it into MongoDB (which also speaks JSON natively). The data type (node, user, etc) and other information in the metadata directs where it's saved. Under normal conditions, this whole process from node_save to MongoDB takes less than a second. If it were to bottleneck at some point in the flow, the Node.js app runs asynchronously, not blocking or straining Drupal in any way. (A minimal sketch of this listener appears after this list.)

  • For redundancy, the Node.js app also polls the hash namespace every few minutes. If any part of the mechanism breaks at any time, or to catch up when first installing it, the timestamp and checksum stored in each saved object allow the two systems to easily find the last synchronized item and continue synchronizing from there.
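
Here's a minimal sketch of that listener. The channel name, hash field names (data, md5, timestamp), database and collection names, and the nid key are all assumptions for illustration; the actual module uses its own conventions.

var redis = require('redis');
var mongodb = require('mongodb');

var sub = redis.createClient();    // pub/sub subscriber
var store = redis.createClient();  // regular client for reading the hashes
var db = new mongodb.Db('liberated', new mongodb.Server('localhost', 27017, {}));

db.open(function(err, client) {
  if (err) throw err;
  client.collection('nodes', function(err, nodes) {
    sub.subscribe('liberator');
    // Each published message is assumed to be the key of a Redis hash
    // holding the JSON-encoded entity plus its metadata.
    sub.on('message', function(channel, hashKey) {
      store.hgetall(hashKey, function(err, hash) {
        if (err || !hash) return;
        var doc = JSON.parse(hash.data);  // the entity itself
        doc._md5 = hash.md5;              // carry the checksum along
        doc._timestamp = hash.timestamp;  // and the timestamp, for later catch-up syncs
        nodes.update({ nid: doc.nid }, doc, { upsert: true, safe: true }, function(err) {
          if (err) console.error(err);
        });
      });
    });
  });
});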

The result is a read-only clone of the data, synchronized almost instantaneously with MongoDB. Individual nodes can be loaded without bootstrapping Drupal (or touching Apache-MySql-PHP at all), as fully-built objects. New apps utilizing the data can be built in any framework or language. The whole Drupal site could go down and the data needed for the other applications would still be usable. Complex queries (for node retrieval or aggregate statistics) that would otherwise require enormous SQL joins can be built using MapReduce and run without affecting the Drupal database.

One example of a simple use case this enables: Utilize the CMS backend to edit your content, but publish it using a thin MongoDB layer and client-side templates. (And outsource comments and other user-write interactions to a service like Disqus.) Suddenly your content displays much faster and under higher traffic with less server capacity, and you don't have to worry about Varnish or your Drupal site being "Slashdotted".
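
As a rough illustration of that thin layer (not part of the Liberator itself; the database, collection, and field names are assumptions), an Express route could serve the mirrored nodes straight from MongoDB with no Drupal bootstrap at all:

var express = require('express');
var mongodb = require('mongodb');

var app = express();
var db = new mongodb.Db('liberated', new mongodb.Server('localhost', 27017, {}));

db.open(function(err, client) {
  if (err) throw err;
  client.collection('nodes', function(err, nodes) {
    // Return a mirrored node as JSON for client-side templates to render.
    app.get('/node/:nid', function(req, res) {
      nodes.findOne({ nid: parseInt(req.params.nid, 10) }, function(err, node) {
        if (err || !node) return res.send(404);
        res.json(node);
      });
    });
    app.listen(3000);
  });
});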

A few caveats worth mentioning: First, it's read-only. If a separate app wants to modify the data in any way (and maintain data integrity across systems), it has to communicate with Drupal, or a synchronization bridge has to be built in the other direction. (This could be the logical next step in developing this approach, and truly make Drupal a co-equal player in an SOA.)

Second, you could have Drupal write to MongoDB directly and cut out the middlemen. (And indeed that might make more sense in a lot of cases.) But I built this with the premise of an already strained Drupal site, where adding another database connection would slow it down even further. This aims to put as little additional load on Drupal as possible, with the "Liberator" acting itself as an independent app.

Third, if all you needed was instant node retrieval - for example, if your app could query MySql for node ID's, but didn't want to bootstrap Drupal to build the node objects - you could leave them in Redis and take Node.js and MongoDB out of the picture.

I've just started exploring the potential of where this can go, so I've run this mostly as a proof-of-concept so far (successfully). I'm also not releasing the code at this stage: If you want to adopt this approach to evolve your Drupal system to a service-oriented architecture, I am available as a consultant to help you do so. I've started building separate apps in Node.js that tie into Drupal sites with Ajax and found the speed and flexibility very liberating. There's also a world of non-Drupal developers who can help you leverage your data, if it could be easily liberated. I see this as opening a whole new set of doors for where legacy Drupal sites can go.

Apr 23 2012

Let's face it: data is invaluable, integrations are key, and turning that information into something that shows up on a person's browser, often thousands of miles away, in a matter of seconds is a modern miracle. However, if your site isn't aesthetically appealing, nobody will stick around to see all the good stuff. This is where Cascading Style Sheets (CSS) come into play, and why tools such as LESS, which make CSS easier to develop and revise, have seen widespread adoption in web development.

CSS is an incredibly powerful tool, allowing web developers and designers to alter the entire look and feel of a website with a few simple style rules. Something as simple as: 

a, a:link, a:visited {
  text-decoration: none;
  color: red;
  font-style: italic;
}

can change the appearance of every link (the "a" tag) on every page of your site. As powerful and flexible as CSS is, though, values such as colors, backgrounds, and dimensions quickly become repetitive and hard to reuse. This is where CSS preprocessors such as LESS come to the rescue.

LESS is More

LESS was born out of perceived shortcomings of the SASS project, and provides more options for developers. LESS can be set up with a Node.js server that generates new CSS files from LESS files as they are changed. You can also include a LESS JavaScript library that performs just-in-time compilation of LESS code in the browser. Another option, available in CMS environments such as Drupal and WordPress, is a plugin that compiles LESS code on the fly on the server, so there is neither a slowdown in the user's browser nor a need to configure a Node.js installation (which is not for the faint of heart). On to some contrived examples to demonstrate what LESS brings to the table.

Variables

For example, say the design calls for the same color to be used in several places that don't lend themselves to sharing a single rule in pure CSS, as in this example where the color is used for text in one rule and as the background in the other:

h1.title {
  font-size: 1.5em;
  color: #339966;
}
...
h2.title {
  font-weight: bold;
  padding: 10px;
  background: #339966;
  color: white;
}

Now if you decide that the shade of green shown above needs to be changed everywhere in the site, with pure CSS you have to search and replace that value across all your stylesheets, making sure you don't accidentally change the color where it still needs to be used. With LESS, you can declare a variable instead:

@title-color: #339966;

h1.title {
  font-size: 1.5em;
  color: @title-color;
}
h2.title {
  font-weight: bold;
  padding: 10px;
  background: @title-color;
  color: white;
}

Nesting

The LESS syntax also allows for more logical grouping of CSS rules. Say for instance you had markup for a listing that had CSS like the following:

#list {
  margin: 20px 0;
}
#list .list-item {
  color: black;
}
#list .list-item.odd {
  background: #eee;
}
#list .list-item .list-item-title {
  font-size : 1.25em;
  font-weight: bold;
}
#list .list-item .list-item-content {
  font-size: 0.75em;
  margin: 0 0 0 20px;
}

These same CSS rules could be written in LESS as:

#list {
  margin: 20px 0;
  .list-item {
    color: black;
    &.odd {
      background: #eee;
    }
    .list-item-title {
      font-size : 1.25em;
      font-weight: bold;
    }
    .list-item-content {
      font-size: 0.75em;
      margin: 0 0 0 20px;
    }
  }
}

Notice how child items can logically be nested, and the selectors don’t require the parents to be specified once nested. Multiple classes and pseudo-classes can even be treated as child selector elements that modify a main class using the “&” notation. 

Mixins

Yet another powerful feature available in LESS is the concept of "mixins," which allow you to reuse CSS "fragments" in multiple rules, including parameterized values (example borrowed from the LESS site):

.rounded-corners (@radius: 5px) {
  border-radius: @radius;
  -webkit-border-radius: @radius;
  -moz-border-radius: @radius;
}

#header {
  .rounded-corners;
}
#footer {
  .rounded-corners(10px);
}

The CSS fragment declared as ".rounded-corners" is reused in the header and footer declarations with different corner radii.

Functions & Operators

LESS also allows functions and operations to be applied to various CSS values such as dimensions and colors, which are too detailed to get into in this intro. Suffice it to say there are numerous examples and a full function reference available on the LESS site.

In summary, if you are looking for a more logical and flexible way to build CSS, look into LESS. It extends CSS so that legacy stylesheets keep working as always, while giving developers and designers a way to organize their rules more sensibly and to modify rules and values en masse, on either the server or in the browser.


Mar 12 2012

We've got our sights set on some pretty exciting sessions at the 2012 DrupalCon in Denver. I made sure to poll the team on their top picks for technical, government, business, and design sessions, and I have them for you here…

Design Talks We're Excited to See

Business & Strategy Talks We'll Hear

Government Talks We'll Definitely Catch

Technical Talks We Wouldn't Miss

I'm definitely looking forward to seeing some old friends in Denver. In the meantime, if you have any recommendations for sessions we should see, leave your suggestions below.


Nov 29 2011

Drupal has the option of outputting its watchdog logs to syslog, the file-based core Unix logging mechanism. The log in most cases lives at /var/log/messages, and Drupal's logs get mixed in with all the others, so you need to cat /var/log/messages | grep drupal to filter.

But then you still have a big text file that's hard to parse. This is probably a "solved problem" many times over, but recently I had to parse the file specifically for 404'd URLs, and decided to do it (partly out of convenience but mostly to learn how) using Node.js (as a scripting language). JavaScript is much easier than Bash for simple text parsing.

I put the code in a Gist, node.js script to parse Drupal logs in linux syslog (and find distinct 404'd URLs). The last few lines of URL filtering can be changed for any other specific use case you might have for reading the logs out of syslog. (This could also be used for reading non-Drupal syslogs, but the script maps fields to keys like "URL" that wouldn't apply then.)

Note the comment at the top: to run it you'll need Node.js and 2 NPM modules as dependencies. Then take your filtered log (using the grep method above), pass it as a parameter, and read the output on screen or redirect it with > to another file.
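
For those who don't want to open the Gist, here's a rough sketch of the general approach (not the actual script; the pipe-delimited field positions depend on how your syslog entries are formatted, so treat the indexes as assumptions):

// Usage: node parse-syslog.js filtered-drupal.log
var fs = require('fs');

var lines = fs.readFileSync(process.argv[2], 'utf8').split('\n');
var notFound = {};

lines.forEach(function(line) {
  if (line.indexOf('drupal') === -1) return;
  // Drupal's syslog entries are pipe-delimited after the syslog prefix.
  var parts = line.split('|');
  if (parts.length < 5) return;
  var type = parts[2];  // e.g. 'page not found'
  var url = parts[4];   // the requested URL
  if (type === 'page not found') {
    notFound[url] = (notFound[url] || 0) + 1;
  }
});

// Print each distinct 404'd URL with its count.
Object.keys(notFound).forEach(function(url) {
  console.log(notFound[url] + '\t' + url);
});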

Oct 12 2011

Recently, for fun and learning, I built a group chatroom feature for Drupal 6.x. I've been learning and using Node.js and Backbone.js the past few months and building a chatroom seemed like a great project to stretch my skills.

I've recently pronounced it "finished" and the code is available on Github. There are a few obscure bugs left but by and large, it's plenty stable for those wanting a chatroom on Drupal 6.x.

The feature should work with any site using Spaces and Organic Groups. The demo site I set up, for example, uses a default installation of Open Atrium.

Technology stack

The chatroom is built using a now fairly standard set of technologies. For the backend, I used Node.js, Redis, and MySQL. I used Socket.io for sending the chat messages between clients and the server. I used Brunch to build the frontend. Brunch bundles together a number of really nice tools for building single-page Javascript apps including Coffeescript, Backbone.js, Underscore.js, Stitch, and Eco.
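
To give a flavor of the Socket.io piece, here's an illustrative server-side sketch (not the project's actual code; the event and room names are assumptions):

var http = require('http');
var socketio = require('socket.io');

var server = http.createServer();
var io = socketio.listen(server);

io.sockets.on('connection', function(socket) {
  // A client announces which group chatroom it belongs to.
  socket.on('join', function(roomId) {
    socket.join(roomId);
  });
  // Relay each chat message to everyone else in the same room.
  socket.on('message', function(data) {
    socket.broadcast.to(data.roomId).emit('message', data);
  });
});

server.listen(8080);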

A few conclusions

Backbone.js rocks. It makes creating highly interactive, responsive interfaces almost trivial while keeping your code neatly organized. It's a very neat round-up of the best patterns for creating Javascript applications.

Hand-rolling a way to securely connect Drupal and Node.js was a pain--probably the hardest part of building the feature. Use the Node.js Integration module if you're on Drupal 7.

Redis is really impressive. It has one of the shallowest learning curves of any technology I've used. I was up and running with it in perhaps 15 minutes. Add that it's incredibly fast and you have a very handy tool to add to your toolset.

Note: the demo site that was linked from here is now off-line.

Oct 11 2011

Zivtech's Senior Developer, Howard Tyson, recently conducted a Node.js webinar with our Drupal partner, Acquia.

The webinar and description are below:

Bring real time interactivity to Drupal with Node.js integration:

Drupal is a powerful, flexible platform for building applications, but not something that handles realtime notifications easily. Node.js is a breath of fresh air in the open source web server landscape. It makes it easy to write applications that handle thousands of open connections at the same time.

The Nodejs module integrates Drupal with Node.js, allowing for the best of both worlds. Realtime chat, push notifications and help desk functionality can all be easily added to your Drupal site via the Nodejs module, without the usual scalability and performance issues associated with these technologies on the LAMP stack.

This webinar addresses:

  • Why realtime?
  • Why use Node.js?
  • How the Nodejs module integrates Drupal and Node.js
  • Current features of the Nodejs module
  • Where the Nodejs module is going

Howard Tyson, Senior Developer at Zivtech, has been developing Drupal-powered websites since 2006. Howard contributes to Drupal and co-maintains the Nodejs module and Version Control API, among others.



May 04 2010

I'm actually posting this as a question. If you're looking for the answer, sorry I don't have it yet.

How can we reasonably handle large file uploads? I'm talking in the >100MB range; YouTube, for instance, now supports 2GB files, and this will become increasingly the norm. I don't think that most servers are up to that yet, particularly if you need an application to scale.

Elephant on a Bike

Currently, using PHP, you need to set memory_limit to more than twice the upload_max_filesize, which as you can see would be prohibitive in the example of 2GB uploads; you'd need to set your PHP memory to >4GB (adding the buffer of 64M or whatever you need to run Drupal). EDIT: Looks like I was incorrect in my assumption; if you're not going to process the file, you don't need a huge memory footprint just to handle the raw uploads. Thanks Nate and Jamie!

Even if you manage to have that kind of resource available, you can probably expect things to splode with concurrent uploads...

So I spent some time yesterday looking at SWFUpload (module here), as I'd misunderstood its claims. Yes, it handles large file uploads (from the browser's standpoint), but you still need to set PHP memory accordingly. Not suitable for what I'm looking for, but it is a really nice way to handle multiple uploads. WARNING: I also learned from experience and much head-scratching that it doesn't work if you have Apache authentication on your server...

Now I'm looking at node.js as a possibility. This looks really great, and might do the job. Basically, it lets you write the server itself in JavaScript. Yes, you heard that right. Turns out that as JS has evolved, it's turned into a really tight language, and it should be quite suitable for concurrent tasks.
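
To illustrate why node.js is appealing here, a minimal sketch of streaming an upload straight to disk, so memory use stays roughly constant no matter how large the file is (a real handler would also parse multipart form data; the path and port are arbitrary):

var http = require('http');
var fs = require('fs');

http.createServer(function(req, res) {
  if (req.method === 'POST') {
    // Write the request body to disk chunk by chunk instead of buffering it all.
    var out = fs.createWriteStream('/tmp/upload-' + Date.now());
    req.pipe(out);
    req.on('end', function() {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('Upload received\n');
    });
  } else {
    res.writeHead(405);
    res.end();
  }
}).listen(8124);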

Sorry if you came to this post looking for answers; I've simply postulated more questions. But I'm hoping that someone with more experience with this issue might be able to comment, and we'll all benefit from it. Additionally, this might turn out to be a handy addition to the Media suite, perhaps as a fancy stream wrapper for handling large files? I'll definitely follow up when I figure out how best to tackle this.

Thanks,
Aaron
