Aug 31 2010

Drupal 7 introduces a number of new APIs, and with those comes new jargon. Untangling it can be daunting. I'll try to briefly sketch what is what and relate it to terminology outside of the Drupal world.

This post is a work in progress; I am regularly revising the text based on feedback.

Note: revised paragraph order and wording, clarified wording of definitions, added references, corrected glitches.

Entities

An entity is an abstract data type in Drupal. An entity type is required to implement the entity interfaces as defined in the core Entity API. It is a generalisation of the node, comment, user and taxonomy term types of previous versions of Drupal. In Drupal 7, only the R from CRUD is specified in core, but work is underway to fill the gap in the next versions. There is a contributed Entity API module, which aims to do it. Entities can have multiple bundles.

Bundles

Bundles are groups of fields. A bundle is a specialisation, an instance of an entity. If node is an entity, then an article is a bundle. This allows distinguishing the kinds and types within the system: kinds classify types, types classify values (concrete objects).

Fields

Fields are the incarnation of the CCK data type of the same name, moved to Drupal core. What are they? The answer can be found by reading the Field API documentation. In essence, a field is a primitive Drupal data type implementing the interface described in those pages. Fields have an associated schema, formatter(s), widget(s) and settings. Example field types are file field, text field, etc. Modules can define new field types.

Field Instances

A Field Instance is a concrete specialisation of a field type. It captures the field type together with its configuration - schema, formatter and widget. Field instances are attached to bundles.
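
One way to keep the terms apart is to model them as plain data structures. The sketch below is an analogy in Python, not Drupal's actual API: every class and attribute name in it (EntityType, Bundle, FieldType, FieldInstance, widget, formatter) is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FieldType:
    """A field type, e.g. 'text' or 'file' (illustrative names)."""
    name: str


@dataclass
class FieldInstance:
    """A field type plus its configuration, attached to one bundle."""
    field_type: FieldType
    label: str
    widget: str
    formatter: str


@dataclass
class Bundle:
    """A named group of field instances, e.g. 'article'."""
    name: str
    instances: list = field(default_factory=list)


@dataclass
class EntityType:
    """An entity type such as 'node'; it can have several bundles."""
    name: str
    bundles: dict = field(default_factory=dict)

    def add_bundle(self, bundle: Bundle) -> None:
        self.bundles[bundle.name] = bundle


text = FieldType("text")
node = EntityType("node")
article = Bundle("article")
article.instances.append(
    FieldInstance(text, label="Body", widget="textarea", formatter="default")
)
node.add_bundle(article)
node.add_bundle(Bundle("page"))

# List the bundles of the 'node' entity type and the fields attached to each.
for bundle in node.bundles.values():
    print(bundle.name, [inst.label for inst in bundle.instances])
```

The same field type can back many instances, each with its own label, widget and formatter, which is the distinction the definitions above are drawing.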

Fieldable objects

The term is sometimes used for entities and bundles.

This concludes the quick jargon tour. And lastly, something to ponder: can a field be an entity?

References

  1. Field API
  2. Field API datastructures
  3. Entity API contributed module. Provides Entity CRUD and metadata
  4. Drupal 7 Entities And Fields-Transitioning To D7 presentation slides from Copenhagen
  5. Drupal 7 Entities And Fields-Transitioning To D7 video of the presentation
  6. How Drupal 7 fields are changing the way you write modules
Oct 28 2009

In part 1 I reported the results of a micro-benchmark designed to compare the performance of plain PHP includes against includes via streams from a SQLite database. In this post I extend the test cases with two more: a no-includes case doing the same work, to serve as a base case, and includes from a MySQL database. The MySQL code is the same as in the SQLite case; only the connector differs. You can find the code attached at the end of the previous post.

Overall, bearing in mind random factors like netbook load (CPU and IO), the differences between the test cases are insignificant on the test machine, an Acer Aspire One netbook.

Please let me know if you decide to run these benchmarks. I will be especially interested to see what the differences are.

These results are encouraging enough to merit developing a more substantial test. I'm interested in benchmarking Drupal. What scenarios would you suggest?

The benchmark

The benchmarking code is this small Haskell script, which uses the criterion library to gather and process the statistics and to draw the nice plots below.

import Criterion.Main (defaultMain, bench, bgroup)
import System.Cmd (system)

main = defaultMain [
    bgroup "php includes" [
               bench "none" $ system "./clean1.php"
            ,  bench "standard/clean" $ system "./clean.php.txt"
            ,  bench "standard/mixed" $ system "./non-stream1.php"
            ,  bench "streams/sqlite" $ system "./stream1.php.txt"
            ,  bench "streams/mysql" $ system "./stream_mysql.php"
            ]
   ]

I've run it with 200 samples:

./bench -tpng:552x368 -kpng:552x368  -s 200

The benchmark results

Base case

The base case uses no include files, but simulates the work by looping 20 times through a print statement.

benchmarking php includes/none
collecting 200 samples, 2 iterations each, in estimated 21.72279 s
bootstrapping with 100000 resamples
mean: 70.03678 ms, lb 60.15143 ms, ub 94.78743 ms, ci 0.950
std dev: 113.6834 ms, lb 63.42229 ms, ub 189.0330 ms, ci 0.950
found 6 outliers among 200 samples (3.0%)
  4 (2.0%) high severe
variance introduced by outliers: 16.000%
variance is moderately inflated by outliers

The outliers are probably the result of the combination of an underpowered system and a number of other applications running at the time of the test.


Clean includes

This test case includes 20 php files from the current directory.

benchmarking php includes/standard/clean
collecting 200 samples, 2 iterations each, in estimated 21.10023 s
bootstrapping with 100000 resamples
mean: 60.50757 ms, lb 58.28822 ms, ub 65.57988 ms, ci 0.950
std dev: 22.93855 ms, lb 9.368661 ms, ub 40.26889 ms, ci 0.950
found 28 outliers among 200 samples (14.0%)
  13 (6.5%) high mild
  15 (7.5%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers

The difference from the base case should be insignificant, but it is probably affected by random system load. It does hint that the differences displayed previously are not significant.
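
A quick way to eyeball this is to check whether the 95% confidence intervals of the two means overlap. The sketch below copies the lower and upper bounds from the criterion output for the base case and the clean includes case; the helper name intervals_overlap is my own, and interval overlap is only a rough heuristic, not a proper significance test.

```python
def intervals_overlap(a, b):
    """Return True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals of the mean, in milliseconds, copied from the
# criterion output for the two cases discussed here.
base_case = (60.15143, 94.78743)       # php includes/none
clean_includes = (58.28822, 65.57988)  # php includes/standard/clean

print(intervals_overlap(base_case, clean_includes))  # True
```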


Mixed includes

This test case does the same work as the previous one, but includes the stub code for the database includes.

benchmarking php includes/standard/mixed
collecting 200 samples, 1 iterations each, in estimated 134.8954 s
bootstrapping with 100000 resamples
mean: 70.91094 ms, lb 61.86253 ms, ub 101.4884 ms, ci 0.950
std dev: 106.7305 ms, lb 31.32543 ms, ub 240.4033 ms, ci 0.950
found 20 outliers among 200 samples (10.0%)
  11 (5.5%) high mild
  9 (4.5%) high severe
variance introduced by outliers: 14.000%
variance is moderately inflated by outliers

Again, since the variance is moderately affected by the outliers we should take this result with a pinch of salt.


Sqlite streams

This test includes 20 scripts from an sqlite3 database.

benchmarking php includes/streams/sqlite
collecting 200 samples, 1 iterations each, in estimated 13.31139 s
bootstrapping with 100000 resamples
mean: 79.69618 ms, lb 73.51617 ms, ub 98.44982 ms, ci 0.950
std dev: 69.63256 ms, lb 16.12973 ms, ub 150.9165 ms, ci 0.950
found 28 outliers among 200 samples (14.0%)
  14 (7.0%) high mild
  14 (7.0%) high severe
variance introduced by outliers: 6.000%
variance is slightly inflated by outliers


MySQL streams

This test includes 20 scripts from a local MySQL database.

benchmarking php includes/streams/mysql
collecting 200 samples, 1 iterations each, in estimated 13.17959 s
bootstrapping with 100000 resamples
mean: 70.95483 ms, lb 69.74397 ms, ub 72.55464 ms, ci 0.950
std dev: 10.04377 ms, lb 8.195368 ms, ub 12.52959 ms, ci 0.950
found 19 outliers among 200 samples (9.5%)
  11 (5.5%) high mild
  8 (4.0%) high severe
variance introduced by outliers: 0.500%
variance is unaffected by outliers

The difference with sqlite3 for this benchmark is insignificant.


Conclusion

It becomes clearer that the performance difference is relatively insignificant, even in micro-benchmarks like these, which are designed to highlight it by being unfair to the streams version. If you add a significant amount of code and actually do something with it, like a Drupal site would, the difference won't be noticeable. Still, this hypothesis needs testing.

Oct 24 2009

During the Drupal plugin/update manager discussions I had an aha moment. One of those weird and wonderful ideas came back to me. What if most of the code lived in the db? One would be able to arrange the co-habitation of several concurrent versions of the same website relatively easily. Backups would mean just a database backup.

Funnily enough, this can help two opposite (scale-wise) types of users - the bottom end, cheapest or free hosting ones and the load balanced crowd.

Why "back"? Well... I have had this idea ever since user streams appeared in PHP, version 4.3 or thereabouts, but it just nestled cosily in the back of my mind, waiting for love, the shy little thing.

The problem

Ok. So what is this about? Since PHP allows you to write stream wrappers, and include* and require* can use arbitrary streams to load code, one should be able to put the code in a database, load it and execute it. The biggest obvious downside is that it is probably slow. How slow?

I decided to benchmark it. I've prepared a micro-benchmark to test the idea and to see how significant the difference in performance would be. Note that since this is mostly an IO-bound task, the difference in performance will show up mostly as higher response times rather than CPU load. Bear in mind that the benchmarks were performed on a tiny Acer Aspire One netbook with 512MB RAM and its standard SSD drive.

The benchmark

I've prepared three different small programs. The first just includes 20 PHP files. The second includes the same 20 PHP files, but also contains the streams code, to have a similar parsing time profile. The third includes the same code from sqlite3 via streams. The files are attached to this post; if you want to run them yourselves, just rename them and assign the appropriate permissions.

I've used the criterion Haskell library to gather and process the statistics for me and to draw the nice plots below.

The Haskell program is simple. It just declares and executes the three benchmarks:


import Criterion.Main (defaultMain, bench, bgroup)
import System.Cmd (system)

main = defaultMain [
    bgroup "php includes" [
               bench "standard/clean" $ system "./clean.php"
            ,  bench "standard/mixed" $ system "./non-stream1.php"
            ,  bench "streams" $ system "./stream1.php"
            ]
   ]

To compile use


ghc --make bench

The streams

I've written a barebones TestStream class adhering to the streams API, registered it as a stream wrapper, and call include_once 20 times. Each include has one print statement, à la hello world.
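
To make the idea concrete without the PHP specifics, here is a minimal sketch of the same pattern in Python: source code is stored as rows in a SQLite table, fetched by name and executed at most once. This is an analogue under stated assumptions, not the TestStream class from the attachments; the table layout, the include_once helper and the db:// label are all invented for the example.

```python
import sqlite3

# An in-memory database stands in for the SQLite file used in the post.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scripts (name TEXT PRIMARY KEY, source TEXT)")
for n in range(20):
    db.execute(
        "INSERT INTO scripts VALUES (?, ?)",
        (f"inc{n}", f"output.append('hello world {n}')"),
    )

loaded = set()   # names already included, to mimic include_once
output = []      # the stored snippets append here instead of printing


def include_once(name):
    """Fetch source by name from the database and execute it at most once."""
    if name in loaded:
        return False
    row = db.execute(
        "SELECT source FROM scripts WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no script named {name!r}")
    exec(compile(row[0], f"db://{name}", "exec"), {"output": output})
    loaded.add(name)
    return True


for n in range(20):
    include_once(f"inc{n}")
include_once("inc0")  # the second call is a no-op

print(len(output))  # 20
```

Each include costs one query plus a compile step, which is where the extra time over a plain file include would come from.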

The non-stream versions

The base case "standard/clean" just includes the 20 files. The "standard/mixed" includes the 20 files and has a useless copy of the TestStream class to bulk up the code to judge the significance of the parsing overhead.

The benchmark results

Standard Clean


benchmarking php includes/standard/clean
collecting 100 samples, 2 iterations each, in estimated 12.13241 s
bootstrapping with 100000 resamples
mean: 58.12652 ms, lb 57.14786 ms, ub 60.15813 ms, ci 0.950
std dev: 6.912029 ms, lb 4.108045 ms, ub 13.29588 ms, ci 0.950
found 6 outliers among 100 samples (6.0%)
  2 (2.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers


Standard mixed


benchmarking php includes/standard/mixed
collecting 100 samples, 2 iterations each, in estimated 11.08999 s
bootstrapping with 100000 resamples
mean: 58.86753 ms, lb 57.81748 ms, ub 60.82246 ms, ci 0.950
std dev: 7.118014 ms, lb 4.625828 ms, ub 12.58350 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  5 (5.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers


Streams


benchmarking php includes/streams
collecting 100 samples, 2 iterations each, in estimated 14.42270 s
bootstrapping with 100000 resamples
mean: 76.48482 ms, lb 74.66795 ms, ub 78.86988 ms, ci 0.950
std dev: 10.60164 ms, lb 8.515426 ms, ub 13.80536 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  7 (7.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers


Conclusions

As expected, the streams code is slower; it adds around 1ms per include file. If you compare the probability density estimates, you will see that there is a small, albeit probably insignificant, overlap between the standard and stream versions. The results suggest that in larger programs the effect will be far less significant. The results are encouraging. This technique definitely merits further investigation: running it with MySQL, the most widespread database deployed alongside PHP, and, if time permits, against a patched version of Drupal.
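
The per-include figure follows directly from the two means reported above. A small worked calculation, using only numbers from the criterion output (the variable names are mine):

```python
# Mean run times in milliseconds, copied from the criterion output above.
mean_clean_ms = 58.12652    # standard/clean
mean_streams_ms = 76.48482  # streams
includes = 20               # files included per run

overhead_per_include_ms = (mean_streams_ms - mean_clean_ms) / includes
print(f"{overhead_per_include_ms:.2f} ms per include")  # 0.92 ms per include
```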

Oct 08 2009

The London Drupal community has thrown a bid into the hat to host DrupalCon 2011. It is a strong bid from a strong team for one of the best cities in the world. So just vote in the poll. The preferred venue is Southbank Centre. It tickles me silly to just compare that to a leg of lamb. Don't ask. Ok. Fine, I won't be that cruel. Just imagine a pub and food - that's one memory from the first Drupal meet in London. And now Southbank...

There are community polls out on g.d.o for DrupalCon2010 and DrupalCon2011 in Europe. These polls do matter, but are not deal breakers. They are there to reflect the community preference.

And just one last note - I would love Berlin for 2010. I love that city, especially East Berlin. Sorry Copenhagen =(

Oct 02 2009

This website is built on Drupal - a content management system extraordinaire. Until the start of October 2009 it was based on a modded version 4.6, but I upgraded it to the latest release, as part of my 'looking for a job and pimping my assets' project. Now it is just a vanilla Drupal with a custom theme, a variation of the theme I created for the South Wales Linux User Group, as currently seen on http://beta.swlug.org

the new look theme

It is based around a grid "theory", to enforce some regularity and rhythm. The base for maintaining the vertical rhythm is 12px font/18px line height. The horizontal grid is organised around 6px (0.5em) units and is currently split into 12 columns. This gives enough room to play with in the future. It is a simple two (8/4) column design.

The lines are always a multiple of 18px (1.5em). Since there are a lot of corner cases to contend with and not much time to do it in, I've compromised on relative units and am using pixels instead. For the theme to be submitted, the line-x classes will need readjusting to use ems.
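
The conversion from pixels to ems is simple arithmetic on the 12px/18px base. The sketch below generates em-based rules for the line-x classes; it assumes, hypothetically, that each line-N class just sets a height of N lines, which may not match what the real stylesheet does.

```python
FONT_SIZE_PX = 12    # base font size from the post
LINE_HEIGHT_PX = 18  # base line height from the post


def px_to_em(px, font_size_px=FONT_SIZE_PX):
    """Convert a pixel length to ems relative to the base font size."""
    return px / font_size_px


def line_class_rules(max_lines):
    """Build one CSS rule per line-N class, with heights expressed in ems."""
    rules = []
    for n in range(1, max_lines + 1):
        height_em = px_to_em(n * LINE_HEIGHT_PX)
        rules.append(f".line-{n} {{ height: {height_em:g}em; }}")
    return rules


for rule in line_class_rules(4):
    print(rule)
```

With these numbers the 6px horizontal unit comes out as 0.5em and one line as 1.5em, matching the figures quoted above.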

The colours are simple: black, red, yellow, blue and gray, chosen to be not in your face but with a few spots of brightness. It should be fairly accessible, although that wasn't a goal for this personal website. If I am to submit the theme to http://drupal.org, there are a couple of amendments required.

I wanted a simple fresh look, to be done in a day, and to focus on my real priorities. I hope it achieves that. Time will tell.

Check out the html source and the stylesheets for details.

Apr 22 2009

I've been involved in the organisation of a number of scientific conferences over the past few years, for example 4m/ICOMM 2009 (submissions) and I*PROMS 2009 (submissions). The conference sites are usually powered by Drupal, the submissions by OpenConf. Managing the submission and peer review of papers is a chore and, unfortunately, as is often the case, software gets in your way. So here I try to skim over some of the experiences I had with OpenConf - the good and the bad.

I've tested indico, which looks like a very good system, but it is heavy and over-engineered, especially if you want to run a couple of events, and not hundreds or thousands. It might be good for organisations managing a big number of events, but not for us.

After going through a number of trials I've settled on OpenConf. It was simple. It fitted the current web infrastructure - LAMP. I reckoned I could eventually integrate it, or simply transfer some of the data into Drupal, and maybe even benefit from code reuse. I ended up being both pleasantly surprised and not so.

It was an interesting experience. OpenConf is a strangely written piece of software. It is quite hackish, in a bad way. Inside it uses quite a lot of cryptic names, has virtually no code documentation or useful comments, etc. A lot of 'bad style' code. The system won't win a beauty or security contest - that's for sure.

What won me over, and I will probably use it again, is that it is trivially moldable to my requirements. Let's give a few examples.

Styling it

Well, it has very little significant markup present. Minimal, considering some of the monstrosities I've seen over the years. To personalise the look of the submission pages I had to modify three nearly empty files - the header and footer php scripts for some limited wrapping markup and the openconf css file. That's all. Ok, that was sufficient to modify the overall styling, so that it is consistent with the conference 'mother sites'.

Workflow

Conference systems are not automatically adaptable to your workflow. You usually adapt your workflow to your system. They are not unique in this respect. A lot of enterprise CRM, workflow and whatnot systems force you to adopt what they consider good practices, but that is a rant for another time - it makes good business for a lot of people. OpenConf is no exception - it presumes a workflow, and it even uses terminology which was alien to us.

The first thing to change was the terminology. An afternoon of reading code and testing resulted in a handful of scripts to replace advocates for theme chairs and some such. Annoying, sure, but not that hard.

A sequence of happy coincidences helped with other problems, for example how to check whether authors revised their papers after review and, if not, send them a reminder email. I was prepared to check the file modification times and filter by date. Doable, but it would result in some strange looking SQL queries. Instead, due to publisher requirements, I ended up checking for file types, and getting the list of forgetful paper authors that way.

Funnily enough, the email.php file contains both the best and the worst of the code in OpenConf. The declaration of emails and recipient kinds is fairly declarative: just an array of definitions with stuff like titles and SQL queries in there. Based on your choice and PHP name magic you get the appropriate template, for which the appropriate list of recipients is pulled from the db. Nice. To make matters even better, it is a long flat file of if .. else .. statements with a few function declarations in the middle. It took me a while to get used to that. But for all its ugly insides, there is something good - it is easy to add to and modify the markup. No over-engineered templates - the system doesn't really need that. These cosmetic changes are surprisingly important, since they helped me improve the interface and differentiate between emails for positive, negative and other outcomes - no wrong emails afterwards.

Random addons

For one of the last conferences I had to add scripts to make all paper authors reviewers, do custom reports, etc. It was a couple of days of work, mostly testing and reading code. And only one file to modify - the one where I had to add a link to the new functionality. All that required very little modification. Just add the new functionality.

The end is nigh

To wrap it up: even badly written software can end up being more useful in practice than a number of well written, carefully designed systems. I have a feeling that this is the story of a lot of PHP projects, and of the language itself. Could it be that beauty is in the eye of the beholder? Or maybe the authors know something I miss - because the software does do the job, maybe not brilliantly, but well enough to be re-used again and again.

What makes OpenConf so moldable? Probably the PHP "component" architecture, that is, each different kind of page has a different entry point PHP script, with shared includes for code reuse. This meant I wouldn't break more than one page at a time, which in turn led to task based modifications, which in turn made my life easier, despite the occasional surprises.

Oct 17 2008
HAI
IM IN UR BUCKETS MAKING UP FORMATS
GIMME FEEDS
   I CAN HAS FEED DRUPAL PLANET
      ITZ AT http://drupal.org/planet/rss.xml
      INVISIBLE METADATA
      LOL
   I CAN HAS FEED DRUPAL GROUP POOL
      ITZ AT http://api.flickr.com/services/feeds/[email protected]&lang=en-us&format=lol
      INVISIBLE METADATA
      LOL
   I CAN HAS FEED VERTICE'S PHOTOSTREAM
      ITZ AT http://api.flickr.com/services/feeds/[email protected]&lang=en-us&format=lol
      INVISIBLE METADATA
      LOL
I IS BORED
KTHXBYE.

And the world's first lolfeed library

Jun 20 2008

There is a discussion going on in PHP land about introducing closures and lambda functions; there was even a discussion on haskell-cafe about it (chx strikes again ;). About time, I would say. Having this functionality is a bonus. Having it implemented badly or half-arsed is going to do more damage than good. This is a short summary of what I understand from the RFC and what I think about it.

Anonymous functions aka lambda functions

The proposed syntax is

function & (parameters) { body }

No automatic variable capture. That is, no variables from the parent scope are visible in the lambda function. So far so good; this is consistent with the default behaviour of PHP functions - you need to use global (in the process of becoming deprecated) or $GLOBALS to gain access to the global variables.

Closures and the newly proposed lexical keyword

That is something I'm not sure about. Although it is consistent with the current usage of global, in combination with lambda functions it seems like redundant syntactic noise. The only explanation for me is that it keeps the runtime simpler, by requiring explicit imports. Fair enough, although this verbosity does bug me a lot. It imports the variable by reference, similar to global. The consistency is a good thing. But consider the following trivial example:

for( $i = 1;  $i < 10;  $i++ ) 
  $a[] = function() { lexical $i; print $i;};
....
//What will it print? ($a holds indices 0..8)
$a[8]();  

As I understand the proposal and php (I might be wrong) the result of all of the $a[...]() functions will be 10. Which is definitely not what is intended - we can't refer to the same code in different contexts. But this is one of the most common uses for closures - capture the current state for later use! Erm. Yeah, sure, you could possibly work around it with explicit copy, but why? You lose update, true, but when do you actually need update? Which case is more 'natural'?

Can't one do something like:

$i = 1;
//highly illegal, i'll get locked up for this
$lambda = function ( $i &= $i ) { $i = 10; };

when you want to be able to update the parent variable? True, &= is ugly. It will make the closures patch far more intrusive. It is true that the same trick - using defaults for arguments in a lambda - will work the other way around. The code below is based on the RFC:

$i = 1;
for( $i = 1; $i < 10; $i++ ) {
  $lambdas['ref'][] = function () { lexical $i; print $i; };
  $lambdas['copy'][] = function ( $i = $i ) { print $i; };
}
//prints 10 - every 'ref' lambda sees the final value of $i
$lambdas['ref'][3]();
//prints 4 - index 3 was created when $i was 4
$lambdas['copy'][3]();

But what is the more common use, especially given existing examples from other languages, and what is more natural in PHP? Least surprise?
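
For comparison with another language, Python happens to exhibit both behaviours: a closure looks up the captured variable when it is called (reference-like), while a default argument copies the value when the function is defined. The sketch below mirrors the loop bounds used here; it illustrates the two semantics in Python only and says nothing about how the RFC would actually behave.

```python
by_ref = []
by_copy = []

i = 1
while i < 10:  # same bounds as the PHP for loop
    # Late binding: the body looks up i when the function is called.
    by_ref.append(lambda: i)
    # Default argument: the current value of i is copied at definition time.
    by_copy.append(lambda i=i: i)
    i += 1

print(by_ref[3]())   # 10, the value of i after the loop has finished
print(by_copy[3]())  # 4, the value of i when index 3 was created
```

The late binding version is a well known source of surprise for Python newcomers, which is one data point for the least surprise question.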

The proposed interaction with classes reinforces my feeling of a half-baked proposal. Why automatically import $this, while not doing it for other variables? IMHO using references rather than copies is not thought through properly. Why do you need to make $this automatic? Why do you need to use the class/object part of the scope, and make the anonymous function essentially an anonymous method, while not importing the whole scope and pruning only the used parts at compile or run time? Why not something like:


class someclass {
   function one( ) {
       return function () { lexical $this; ... }
   }
   function another( ) {
       //errrhhm - this might cause problems... 
       //since the inner $this is a copy, not a reference...
       return function ( $this = $this ) { ... }
   }
}

Will it cause too much trouble to alpha convert a lambda, and leave it as simple as possible? Turning

$i = 1;
$x = function( $y ) { lexical $i; ... }

at compile time into the equivalent

$i = 1;
function lambda_xxxxgenname( $y, $i = $i ) { lexical $i; }

Unfortunately, this implies copy semantics for lexical and will have unexpected effects when variable function arguments are used.

The problem probably lies with how do you do it in a way so that it doesn't change existing php semantics. I don't grok the php internals enough to actually do it, but I feel these questions are important.

OK, it is an RFC, so now is the time to discuss. It is obvious that there will be a need for a compromise, since changing too much will make a major mess in an already delicate runtime.

As a side note, proper lambdas will help clean up the phptemplate trickery in drupal. Now, it does some hairy acrobatic acts to achieve somewhat similar functionality.

update: added the side by side comparison code for the copy versus reference(using the current rfc) semantics (not sure about the default arguments part)

And a challenge - can you define the Y combinator in php with lambdas & closures?
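
Without spoiling the PHP version, here is what the answer looks like in Python, as a reference point. Because Python evaluates arguments eagerly, the sketch uses the eta-expanded variant usually called the Z combinator rather than the textbook Y; the names Z, recur and factorial are just for the example.

```python
# Z combinator: the eta-expanded form of Y that works under eager evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial written without naming itself; 'recur' is supplied by Z.
factorial = Z(lambda recur: lambda n: 1 if n == 0 else n * recur(n - 1))

print(factorial(5))  # 120
```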

Jul 16 2007

What helped immensely of all these first days was The Drupal Cookbook (for new drupallers). It is a great resource, indeed. The only problem is (I think) that if this is your first CMS, you will find you need to be like an intellectual Shiva, learning on the go new concepts, new ways of thinking about the site structure, a whole new vocabulary to learn and, at the same time, get your hands dirty with the nits and bolts of the actual site.
http://drupalfordummies.blogspot.com/

And you gain super cow powers. Nuff said ;)

Kudos to the docteam. If my sister says it helps, she means it - otherwise I would have heard her, you saved my head :)

Jun 21 2007

Lately I was looking into how to reduce the spam traffic to this website. Not just comment spam, but various harvesters and other nasties. They steal too much http bandwidth.

.htaccess methods are tempting, but they have a huge disadvantage - they are static. DNS blacklists can be used to dynamically query 'is this ip a known threat?'. One such list is provided by Project Honey Pot. They have an Apache module in beta that implements it. If you don't have that option, or want a bit more dynamism, you can do the checks from your own PHP script.

In Drupal there is already an httpbl module, but I decided not to use it. It looked easier to just insert the checks in index.php. The other benefit is that I can interfere before the Drupal bootstrap has even started. The downside - none of the goodies provided by the module. I used a modified version of the script provided by planet ozh.

My modifications add a random link to various traps and do a few other custom niceties. Otherwise you can just add require_once "httpbl.php" before all other code in index.php. This will ensure that nothing else gets processed if you are hit by a bot.
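
For readers who have not used a DNS blacklist before, the check boils down to building a special host name and interpreting the address it resolves to. The sketch below shows those two steps in Python without doing the lookup itself. The name layout (access key, reversed IP octets, zone) and the meaning of the answer octets (days since last activity, threat score, visitor type) reflect my understanding of the http:BL service and should be checked against its documentation; the access key and the sample answer are placeholders.

```python
def httpbl_query_name(access_key, ip, zone="dnsbl.httpbl.org"):
    """Build the DNS name to look up: key, reversed IPv4 octets, then zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address")
    return ".".join([access_key] + octets[::-1] + [zone])


def decode_httpbl_answer(answer):
    """Split a 127.x.y.z answer into days, threat score and visitor type."""
    first, days, threat, visitor_type = (int(p) for p in answer.split("."))
    if first != 127:
        raise ValueError("not a valid http:BL answer")
    return {"days": days, "threat": threat, "type": visitor_type}


# 'abcdefghijkl' is a placeholder access key, not a real one.
print(httpbl_query_name("abcdefghijkl", "68.186.149.178"))
# A made-up answer, only to show the decoding.
print(decode_httpbl_answer("127.1.25.4"))
```

A real script would resolve the generated name, treat a failed lookup as "not listed", and block or log based on the decoded fields.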

Interestingly enough, half an hour(ish) after doing this I got:
2007-06-21 :: 05-04-40 :: BLOCKED 68.186.149.178 :: 5 :: 18 :: 2 :: /comment/reply/126 :: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Avant Browser [avantbrowser.com]; Hotbar 4.4.5.0)
2007-06-21 :: 05-04-49 :: BLOCKED 68.186.149.178 :: 5 :: 18 :: 2 :: / :: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Avant Browser [avantbrowser.com]; Hotbar 4.4.5.0)
2007-06-21 :: 05-04-55 :: BLOCKED 68.186.149.178 :: 5 :: 18 :: 2 :: / :: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Avant Browser [avantbrowser.com]; Hotbar 4.4.5.0)
2007-06-21 :: 05-06-47 :: BLOCKED 58.225.246.205 :: 5 :: 5 :: 51 :: /comment/reply/215 :: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
2007-06-21 :: 05-07-31 :: BLOCKED 211.109.26.212 :: 5 :: 5 :: 50 :: /comment/reply/215 :: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
2007-06-21 :: 05-13-56 :: BLOCKED 76.111.216.245 :: 5 :: 5 :: 28 :: /comment/reply/211 :: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
2007-06-21 :: 05-30-05 :: BLOCKED 222.221.254.163 :: 5 :: 51 :: 1 :: /comment/reply/238 :: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2007-06-21 :: 05-30-08 :: BLOCKED 200.210.47.199 :: 5 :: 41 :: 2 :: /comment/reply/238 :: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
2007-06-21 :: 05-39-19 :: BLOCKED 72.232.83.82 :: 5 :: 19 :: 17 :: /15.05.2007/man_i_just_have_to_link_to_this/ :: Fzywenob odwvlxrh mdpxegr
2007-06-21 :: 05-39-24 :: BLOCKED 201.25.52.10 :: 5 :: 29 :: 1 :: /comment/reply/220 :: User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

bastards. all I can say.

While this is not going to stop spam, I hope it will at least reduce it a bit.

Jun 13 2007

There are quite a few interesting computer science artefacts in Drupal. I'm going to try to highlight a few of them. Sorry, this is unfinished, and might not be finished ever, but I'm just trying to capture a snapshot of confused meandering thoughts.

Very late, dynamic binding

Late binding came to popular life with the advent of object oriented languages. Essentially it means binding of values, for example functions, to names at object creation, as opposed to compile or link time. In Drupal, this can happen at any time; moreover, it is algorithmic - that is, you can change at runtime what is to be executed at a specific control point. The hook system is one of the ways to do it.
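
A minimal sketch of this kind of very late binding, written in Python rather than PHP: implementations are looked up by a "module_hook" name each time the hook is invoked, so what runs at a control point can change while the program is running. The registry, the decorator and the module names are invented for the example and are not Drupal's implementation.

```python
# Registry of hook implementations, keyed by "<module>_<hook>".
# The module names and the registry itself are illustrative.
implementations = {}
enabled_modules = ["blog", "stats"]


def implement(module, hook):
    """Decorator that registers a function as module's implementation of hook."""
    def register(func):
        implementations[f"{module}_{hook}"] = func
        return func
    return register


def invoke_all(hook, *args):
    """Call every enabled module's implementation, resolved at call time."""
    results = []
    for module in enabled_modules:
        func = implementations.get(f"{module}_{hook}")
        if func is not None:
            results.append(func(*args))
    return results


@implement("blog", "greet")
def blog_greet(name):
    return f"blog says hi to {name}"


first = invoke_all("greet", "world")
print(first)


# Binding can change while the program runs: a new implementation appears
# and the same control point now executes different code.
@implement("stats", "greet")
def stats_greet(name):
    return f"stats counted {name}"


second = invoke_all("greet", "world")
print(second)
```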

Algorithmic type derivatives

A related subject - type extensions in Drupal. First of all, the types: that is, the types of objects Drupal deals with, such as the base PHP types, user, node, taxonomy, term, file, Drupal arrays, ...

Type derivatives are algorithmic and context sensitive, and generally work by rewriting as opposed to extending. Let's take a look at the richest Drupal type - node.

Node has a default structure. A trimmed version - node id, title, body. Of these, the only one actually required is the node id, but it can be omitted, since it is an automatic, system attribute of a node object.

You could implement derived node types, all they need to do is to obey the node protocol - required and maybe optional node hooks. Alternatively you can alias a node type. For example look at page and story in drupal core v5.

There is a 'type alteration' mechanism, hook nodeapi. It allows you to dynamically modify a node object at different times of its life - load, insert, delete, view, .... Generally it is used as a type extension mechanism, but can be used to actually perform any kind of transform of the type object. Not type safe, but generally any transform will do, as long as you preserve the node protocol. The transform is non-deterministic, since you can't guarantee the order of execution.

Annotated arrays transformation

A variation on the theme is the structured arrays, as advocated by form api, now spreading to areas removed from form processing. They are a kind of annotated graph, processed by a transformer. In the presence of standard processing elements and extensible callbacks (annotations), this structure starts resembling an AST of an aspect oriented language.

The structured arrays allow you to add a lot of flexibility to how you structure your object abstractions in Drupal. For example, look at the Drupal form api. You can conditionally alter the structure of the form array, change its functionality and how it should be presented. You can attach different validation functions, introduce new form element types, ...
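
As a rough illustration of the pattern, here is a Python sketch of an annotated nested structure walked by a small transformer, with an alter step that changes both the structure and the callbacks before rendering. It follows the convention of prefixing annotation keys with '#', but the '#render' key, the callback names and the alter function are assumptions made for the example, not form api's real properties.

```python
def render(element):
    """Render a nested dict: '#'-prefixed keys are annotations, others children."""
    callback = element.get("#render", render_children)
    return callback(element)


def render_children(element):
    children = [v for k, v in element.items() if not k.startswith("#")]
    return "".join(render(child) for child in children)


def render_textfield(element):
    return f'<input type="text" value="{element["#value"]}">'


def render_fieldset(element):
    return f"<fieldset>{render_children(element)}</fieldset>"


form = {
    "#render": render_fieldset,
    "name": {"#render": render_textfield, "#value": "anon"},
}


def alter_form(form):
    """An 'alter' step: conditionally add a child and swap a callback."""
    form["email"] = {"#render": render_textfield, "#value": ""}
    form["#render"] = render_children  # drop the fieldset wrapper


before = render(form)
alter_form(form)
after = render(form)
print(before)
print(after)
```

Because the alter step is just a function over data, any number of modules can rewrite the same structure, which is also where the ordering problems mentioned earlier come from.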

The combination of these tools leads to a fairly unpredictable (unless you accept the danger of non-determinism, and code around it; just remember - it is not checked), but quite interesting and fairly elegant framework for creating websites. Collaborative or otherwise.

Jun 05 2007

The state of affairs with files and file handling is strange.

  • Files are not first class content in core. It is disputable whether they should be or not. I have no clue about the answer either; basically I'm sitting on the fence on that.
  • There is a file api, which provides basic file operations. It is quite improved in D6, and I hope that hook file will get into core as well.
  • The only file related module in core provides attachments, similar to email attachments, and with various filter modules you can embed them inside your node's content. This is a very specific, text document oriented need, and is very useful.
  • There are loads of contrib modules and ways you can deal with files in the larger drupal universe.
  • There is a lack of modules tackling what I would call document management.

So what do I mean by document management?

Where stuff falls short currently in Drupal

There are decent file system reflection interfaces in contrib, unfortunately they are not up to speed with update over the web or having revision/version control.

There are file-node/field style modules, but they don't provide decent directory overviews, expected UI, etc...

It is hard to commit to contrib modules for the long term, since you don't know how long they will survive in the wild.

Where Drupal hits home

The direction of the fileapi is (IMO) right. The evolution towards different file handling backends is both sensible and good. The distinction in HEAD between upload module and general drupal file db data is a good direction, I hope. The hook file patch idea is very good; hopefully it will get in before code freeze.

All of the above do set up a base framework which, while not providing document management, does provide consistent extensible file handling.

When all that is settled we can think about how to satisfy the various diverse needs - filesystem based directory structure, file dupes handling, revision handling, permissions, webdav, ftp integration, ....

Until then I'm meeting some of my users' day-to-day needs with my docs module. Well, a shameless plug, but I'm not sorry for it :) It is mainly a UI project, not committing to any of the hard decisions outlined above. That one should be a 'clean up the contrib' style project, merging the efforts of a bunch of interested developers.

Jun 01 2007
Jun 01

The second alpha of the docs module for drupal is out. I've been doing a bit of code cleanup, and I'm moderately happy with it.

There are a couple of unfortunate consequences of the reliance on upload.module, for example the node edit. While the maintenance is simple, which is paramount, docs doesn't provide a mechanism to upload a revised version of a file. That is due to the way upload.module has its revisions logic. It is possible to do it, but I decided to leave that out completely, opting for creating an add-on module for that. The logic is simple - not everyone will need it, so keep the core docs module as simple as possible. It is complicated enough, without doing anything really special.

Give it a go. The packaged docs 5.x-1.0-alpha2 is available from the docs module project pages.

May 29 2007
May 29

This is for John, mainly, but anyone else read on, if you fancy it. Expect me to follow my website motto. This post is a patchwork of thoughts coming from going through actions module, some of those can be damaging or tedious or both.

I had a look at actions, it looks good. Can be cleaned up, but it is better to see what needs to mature further. The api reminds me of one of the evaluation models for lambda calculus (don't ask which or what - no clue, I'm bad with those things, if ain't in me bookmarks it ain't in me head).

The primitives, that is php coded actions, are atomic and defined in some (environment) context, something like action($context, $arguments). Conditionals can be defined, for D7 for example, as cond($context, $pred, $then, $else). Since we are passing the context explicitly, nothing stops us implementing action closures: not on atomic actions, but on the upper level, by adding something like a procedure, or abstraction, or whatever you like to call it, which has very simple 'stack' semantics. In the form abs($context, $arguments) body, the arguments get added to the context. Effectively something like this might happen (the meaning of the following, not how you would write it):

abs($context, $a, $b, $c) {
  an_action( $context, abs( $context + $a + $b + $c, ... ), ... )
}

Which means that an_action gets passed a context (which is defined when?) and the new abs(...), some new non-atomic action which has an extended context/environment.
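A runnable sketch of that 'stack' semantics may help. This is not the actions module API: sketch_log_action() and sketch_abs() are hypothetical names, and the context is assumed to be a plain array.

```php
<?php
// An atomic action: receives an explicit context plus its own arguments
// and returns the (possibly extended) context.
function sketch_log_action($context, $arguments) {
  $context['log'][] = $arguments['message'] . ' by ' . $context['user'];
  return $context;
}

// A non-atomic "abstraction": its arguments are pushed into the context
// before the body (a list of actions) runs, like a very simple stack frame.
function sketch_abs($context, $arguments, $body) {
  // Array union: entries in $arguments shadow the outer context entries.
  $extended = $arguments + $context;
  foreach ($body as $step) {
    list($action, $action_args) = $step;
    $extended = $action($extended, $action_args);
  }
  return $extended;
}

$outer = array('user' => 'admin', 'log' => array());
$inner = sketch_abs(
  $outer,
  array('user' => 'john'),
  array(array('sketch_log_action', array('message' => 'node saved')))
);
```

Because the arrays are copied, $outer is untouched after the call. That is lexical rather than dynamic scoping, which is one way of avoiding the messiness mentioned in the text.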

It is a bit awkward. Essentially context == environment in cs literature. The problem is that it can become very messy, especially since the current implementation allows something very close to dynamic scoping, which can be very awkward indeed. Very powerful, but awkward.

For D7 - (or D6, if you can do it), I would suggest using only context + action arguments, and for node actions, for example, stuffing the node object inside the context, essentially this would mean that the node is part of the action definition, or alternative phrasing(s) could be the node is in the action's scope, node action. Similarly comment, etc... The context can be (is already?) the omnipresent drupal 'structured' array. BTW, it's not accidental that this sounds like OO. It nearly is.

In the drupal array we can add #action and maybe #actions, so that drupal_builder calls those actions, stuffing the element into the context. This is the simplest way of doing it the 'new way', I think. Then we will end up having two axes - time (load, alter, submit, ....) and element (whatever that element is :) ). Hooks reinvented.
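Purely as a sketch of that proposal: #action is not an existing Drupal property, and sketch_build() is a hypothetical stand-in for the builder, not a real function. The context here carries both axes, the time and the element.

```php
<?php
// A hypothetical action. It reads the time and the element from the context
// and hands back the context with the (possibly changed) element.
function sketch_shout_title($context) {
  if ($context['time'] == 'view') {
    $context['element']['#title'] = strtoupper($context['element']['#title']);
  }
  return $context;
}

// Hypothetical builder: walks the structured array and, for every element
// carrying '#action', stuffs the element into the context and calls the action.
function sketch_build(&$element, $context) {
  if (isset($element['#action'])) {
    $context['element'] = $element;
    $result = call_user_func($element['#action'], $context);
    $element = $result['element'];
  }
  foreach (array_keys($element) as $key) {
    // Keys starting with '#' are annotations, everything else is a child.
    if (substr((string) $key, 0, 1) != '#' && is_array($element[$key])) {
      sketch_build($element[$key], $context);
    }
  }
}

$page = array(
  '#title' => 'front page',
  'block' => array(
    '#title' => 'recent posts',
    '#action' => 'sketch_shout_title',
  ),
);

$loaded = $page;
sketch_build($loaded, array('time' => 'load'));

$viewed = $page;
sketch_build($viewed, array('time' => 'view'));
```

The same action is attached once and decides for itself at which time it does anything.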

There are similarities with the callbacks as used in the structured array in form_api, and now more extensively in D6. I don't think there should be much difference between a callback and an action. Just that actions can be manipulated by third party modules, like workflow, using a UI. Ultimately an action should be indistinguishable from a callback. There is still a way to go to reach it, but hey, there is time.

By the way, on a bit different note. We are building a tree with annotations, the said structured drupal array, as used in drupal_alter. The order of execution of the callbacks per element, including drupal/form_alter, is unknown. There is no protection against 'conflict of interest', that is one callback modifying the array in the same place where another wants to modify. Some form of sequencing might be nice. Since all callbacks are already encapsulated, some omitted by default because they are always executed, i.e. part of the workflow, implementing a 'monadic' treatment will be simple. Well, not really, since we don't have any convention, or an idea of how we declare sequencing. The 'traditional' weights are a no go, since they carry little information and don't solve the problem, just delay it. I don't have the answers, just the questions.

How do you define a guaranteed sequence of callbacks while keeping the API simple, not descending into clutter, and staying reasonably fast?

Jan 15 2007
Jan 15

After 8 months of development we are ready to release Drupal 5.0 to the world. Today is also Drupal's 6th birthday, so the timing could not be more perfect. Drupal 4.0 was released in 2002 and finally we feel confident to increase the major version number from 4 to 5.

http://drupal.org/drupal-5.0

There are improvements in the install, the administrative UI ...

It is a bit scary under the hood.

Garland is simply gorgeous.

Nov 14 2006
Nov 14

The final result, as voted for by judges from The Open Source Collective, MySQL, the Eclipse Foundation, and 16,000 users on http://www.PacktPub.com saw a tie for first place between Joomla! and Drupal. In the event of a tie, a fourth independent judge would be brought in. This was Apoorv Durga who is a member of CM Pros and runs his own blog http://apoorv.info/ on portals and content management. This crucial vote ended up with Joomla! triumphing over Drupal by one point.

The final result was as follows:

1. Joomla!- $5,000
2. Drupal - $3,000
3. Plone - $2,000

from http://www.packtpub.com/article/open-source-content-management-system-award-winner-announced

As chx says - $3,000 buys a server.

What is more interesting is what the criteria for choice/points were. This was a survey of ~16,000 users.

Oh, yes, Joomla tribe - well done :).
update:
on drupal.org
on joomla.org

apology: sorry for the occasional spelling mistake. I might upgrade to ff2.0.

Nov 07 2006
Nov 07

FSFE (http://fsfeurope.org) started drm.info - a collaborative information platform. Collaborative + information => Drupal. Obviously the designers/developers thought so.

While on the topic of DRM - you could check (yes, you Apple fans and iTunes addicts, I'm looking at you, and I'm not blinking) the Defective by Design campaign. Looks like it has been drupal fueled as well.

If you are inspired, feeling lazy yet subversive, you could tag appropriate products, that is those encouraging/using DRM (Digital Restrictions Management), with the defectivebydesign tag on amazon.

If you are not that lazy, you could make noise, spread the word, basically do something.

Nov 01 2006
Nov 01

Sweet. Really. There are not too many big changes, but funnily enough they are felt more than previous betas/pre-releases.

First you are struck with the really nice, simple, stylish, cute, add your own epithet default theme - Garland. You can modify the colour scheme online. So palette builders enjoy and share.

Kudos to the reorganised admin pages and everyone involved in that. At first glance it is all right.

The infamous cck and views leave a lot of traces in core.

All I can say is that it took me between 5 and 10 minutes to set up a basic group blog site. That includes database creation magic, (php) file uploads, user registration ...

Time for theming, upgrading and code porting I suppose. Just where could I get it from?

Aug 17 2006
Aug 17

Ernest has done quite a lot of experiments with his collaborative editor. Early on I advised him to focus on the communication/collision detection/... part as opposed to a new really fancy editor. In the ideal world his code should be mixed in with an editor of your choice, but that is a utopian fantasy.

At the moment the demo code is focused around a special node, but Ernest is working on removing the specialisation to eventually handle any form.

If you are reading this please try it (links follow) and give him comments, ideas, etc... Don't kill him though - the learning curve of the drupal apis is steepish. Help with ideas and bug reports works better.

the drupal groups page
how to test the module page: a bit outdated, but the major points are still valid (hint: look for the usernames)

Jun 15 2006
Jun 15
Vertice sign
Originally uploaded by bmann.

Well, well....That is a surprise. Adrian must be chuffed :).


Well done Boris. Have you spotted a dikini plaque somewhere? I'll pay in beer.



Jun 14 2006
Jun 14

The Drupal way in programming is 60% about hooks, 10% about nodeapi and the rest is ingenuity. That's all right, but one fact has always troubled me: how much effort is spent on discovering which module implements the hook I need. Fair enough, the old caching/memoizing trick does the job of ensuring that the cpu time is not expensive. Yeah, but still, its outward simplicity might be, just might be, implemented in a more elegant fashion.

Drupal hooks implement a nearly Aspect Oriented API. The cross-cut is the point where the hook is invoked. aspect(cross-cut) === hook. The different aspects are declared as moduleName_hook. The disadvantage is that these are indiscriminate. Hooks are always fired. Pros - simplicity, cons - wasted cpu cycles.

I wonder: is it possible to achieve a similarly simple syntax, but with the added advantage of not firing unnecessary hook calls?

Problem (1st iteration)

We need to call all functions interested in this particular execution point. The current context and the run history should determine which functions are actually interested in this particular point/label/crosscut.

So what if:

.....
hook('view', 'a_callback');
.....
//in a code piece far away
invoke('view', .... );

In invoke() we can have something similar to the current code

function invoke($hook, $args) {
  ...
  // call every callback registered for this hook point
  foreach ($hooks[$hook] as $call) {
    $call($args);
  }
  ...
}

How to implement hook(...)?

function hook($hook,$callback) {
invoke( $hook, $callback, 'add');
}

We would need an unhook() though, so that the interest is fully managed.
function unhook($hook,$callback) {
invoke( $hook, $callback, 'remove');
}

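And finally the modified invoke:
function invoke($hook, $args = NULL, $op = 'run') {
  static $hooks = array();

  if ($op == 'add') {
    // here $args is the callback name, used as key so it can be removed later
    //memory waste, but can do for now
    $hooks[$hook][$args] = $args;
    return;
  }
  elseif ($op == 'remove') {
    unset($hooks[$hook][$args]);
    return;
  }

  // default: run every callback registered for this hook point
  if (!empty($hooks[$hook])) {
    foreach ($hooks[$hook] as $call) {
      $call($args);
    }
  }
}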

Order please

Partial order that is. It would be handy to be able to have before($hook) and after($hook) calls. This will allow manipulating the sequence of calls, without modifying the grand plan request workflow.

before($hook) adds the callback before the hook point. The order of the other 'before' callbacks for this hook is unknown.

after($hook) does the same, but after the execution of the hook

The arguments of the callbacks added with before and after are the same as for hook for a given cross-cut/hook point

The implementation is similar to hook

function before($hook, $callback) {
  invoke($hook, $callback, 'add-before');
}

function not_before($hook, $callback) {
  invoke($hook, $callback, 'remove-before');
}

function after($hook, $callback) {
  invoke($hook, $callback, 'add-after');
}

function not_after($hook, $callback) {
  invoke($hook, $callback, 'remove-after');
}
....
//in invoke, the rest is similar
if ($op == 'add-before') {
  $hooks[$hook]['before'][$args] = $args;
  return;
}
....

An interesting 'theoretical' consequence is that this is a kind of higher-order functions based implementation of aspect style programming in an imperative language. That is, if you are into such words. A more interesting thing to notice is that the $var() syntax allows you to code wanted patterns and execute them later on by simply passing a new $var to the pattern function. This technique parallels lisp and scheme style macros; you just can't easily manipulate them.
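A tiny sketch of the $var() point, with a made-up pattern function sketch_each(): the pattern is written once and the function to plug in is chosen at call time by name.

```php
<?php
// The "pattern": apply whichever function name is passed in to every item.
function sketch_each($callback, $items) {
  $out = array();
  foreach ($items as $item) {
    // $var() call: the function to run is picked at run time.
    $out[] = $callback($item);
  }
  return $out;
}

// Same pattern, two different behaviours, nothing rewritten.
$shouted = sketch_each('strtoupper', array('hook', 'invoke'));
$lengths = sketch_each('strlen', array('hook', 'invoke'));
```

The hook registry in this post uses the same trick: it stores callback names and only calls them when the hook point is reached.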

Second iteration or cleaning up the nest

There are a few problems with the above code. It doesn't reflect faithfully the problem, that is the treatment of arguments and results. It is too verbose and can do with some optimisation.

Let's start with the model. The type of an aspect function is ($args→$result). We are not interested in the actual shape of either, but in their correct treatment. It suffices to say that the only requirement for both is that they are the same for each separate hook. The shape of the aspect computation is (∥($args→$result))→(∥($args→$result))→(∥($args→$result)), where ∥ denotes parallel or unordered behaviour of a bunch of functions. The last remark is important, since we are introducing order or synchronisation with the before and after operations, but that order is only partial, relevant to the bigger picture. In each phase the order of execution should remain unknown. This doesn't break the abstraction boundaries, which is good. This also means that the functions in each phase must not change $args and must not overwrite the same part of $result. This restriction won't be enforced, for both simplicity and performance; it should be tested with unit tests in real life code. Failing to obey this discipline will cause unexpected behaviour.
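The kind of unit test meant here could look roughly like the following. All names are hypothetical, and the callbacks are assumed to take ($args, &$result). The check is crude: it only compares one order with its reverse, so it can miss conflicts among three or more callbacks.

```php
<?php
// Two well-behaved callbacks: each writes to its own key of $result.
function sketch_title($args, &$result) { $result['title'] = $args['title']; }
function sketch_links($args, &$result) { $result['links'] = array('edit'); }
// A badly behaved one: it overwrites a key another callback owns.
function sketch_clobber($args, &$result) { $result['title'] = 'clobbered'; }

// Runs one phase: every callback gets the same $args and a shared $result.
function sketch_run_phase($callbacks, $args) {
  $result = array();
  foreach ($callbacks as $cb) {
    $cb($args, $result);
  }
  return $result;
}

// Run in the given order and in reverse; a phase that obeys the discipline
// must produce the same key/value pairs both times.
function sketch_order_independent($callbacks, $args) {
  $forward = sketch_run_phase($callbacks, $args);
  $backward = sketch_run_phase(array_reverse($callbacks), $args);
  return $forward == $backward;
}

$args = array('title' => 'Hello');
$ok = sketch_order_independent(array('sketch_title', 'sketch_links'), $args);
$bad = sketch_order_independent(array('sketch_title', 'sketch_clobber'), $args);
```

The == comparison on arrays ignores key order, which is exactly what is wanted here: only the content of $result counts.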

Let's change the code then. What is the shape of a typical before function?
function a_before($args, &$result) {
....
}
As it happens the shapes of the at and after functions are the same. We are enforcing a partial order of evaluation and enhancing a function, not changing its signature.
The evaluator can be written out as looping over each of the different phases:
....
// callbacks are stored as array keys, so iterate over the keys
foreach (array_keys($hooks[$hook]['before']) as $cb) {
  $cb($args, $result);
}

foreach (array_keys($hooks[$hook]['at']) as $cb) {
  $cb($args, $result);
}

foreach (array_keys($hooks[$hook]['after']) as $cb) {
  $cb($args, $result);
}
return $result;
....

This code can be optimised by parametrising the hook selection, add and remove operations, so that we end up with the following short hook invoke function:
function invoke($hook, &$args, $op = 'run', $phase = '') {
  static $hooks = array();

  switch ($op) {
    case 'run':
      $result = NULL;
      foreach (array('before', 'at', 'after') as $phase) {
        // nothing registered for this phase
        if (empty($hooks[$hook][$phase])) {
          continue;
        }
        // callbacks are stored as array keys
        foreach (array_keys($hooks[$hook][$phase]) as $cb) {
          $cb($args, $result);
        }
      }
      return $result;

    case 'add':
      // here $args carries the callback name
      $hooks[$hook][$phase][$args] = true;
      break;
    case 'remove':
      unset($hooks[$hook][$phase][$args]);
      break;
  }
}

We can sweeten the syntax by adding explicit hook_before, after, at and the respective unhook functions, to hide the message passing notation of the above function. For example
function hook_after( $hook, $cb ) {
invoke( $hook, $cb, 'add', 'after');
}
function unhook_before( $hook, $cb ) {
invoke( $hook, $cb, 'remove', 'before');
}

Adding honey to the pud

The invoke function does too many things, which is unfortunate. It is a natural 'object' (purists of any kind, don't go for me, that is not flame bait). We can use php's object oriented features to make this a bit more abstract and enforce separation of concerns, rather than a static and a switch.

class hook_namespace {
  var $hooks = array();

  function run($hook, $args) {
    $result = NULL;
    foreach (array('before', 'at', 'after') as $phase) {
      // nothing registered for this phase
      if (empty($this->hooks[$hook][$phase])) {
        continue;
      }
      // callbacks are stored as array keys
      foreach (array_keys($this->hooks[$hook][$phase]) as $cb) {
        $cb($args, $result);
      }
    }
    return $result;
  }

  function hook($hook, $phase, $cb) {
    $this->hooks[$hook][$phase][$cb] = true;
  }
  function unhook($hook, $phase, $cb) {
    unset($this->hooks[$hook][$phase][$cb]);
  }
}

While I could have gone overboard abstracting further and further, the above class is good enough for my personal taste. It is concise. It does what it says on the box. Ok, fair enough, there are bits to keep in your head, but that is (hopefully) all right. The last two versions practically implement the drupal hook wiring without the broadcast effect of trying out all possible hook definitions. On an abstract level it is a tad more powerful. It has the disadvantage that you need to hook up at runtime, not definition time.
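For a feel of how such a registry is used, here is a sketch that runs on its own. It restates the idea in a compact class with a hypothetical name (sketch_hooks), and the three callbacks are invented for the example, each taking ($args, &$result).

```php
<?php
// Compact restatement of the registry idea so this sketch runs on its own.
class sketch_hooks {
  var $hooks = array();

  function hook($hook, $phase, $cb) {
    $this->hooks[$hook][$phase][$cb] = true;
  }

  function unhook($hook, $phase, $cb) {
    unset($this->hooks[$hook][$phase][$cb]);
  }

  function run($hook, $args) {
    $result = array();
    foreach (array('before', 'at', 'after') as $phase) {
      if (empty($this->hooks[$hook][$phase])) {
        continue;
      }
      // Callbacks are stored as array keys.
      foreach (array_keys($this->hooks[$hook][$phase]) as $cb) {
        $cb($args, $result);
      }
    }
    return $result;
  }
}

function sketch_prepare($args, &$result) { $result[] = 'prepare'; }
function sketch_render($args, &$result) { $result[] = 'render ' . $args['nid']; }
function sketch_audit($args, &$result) { $result[] = 'audit'; }

$registry = new sketch_hooks();
// Registration order is deliberately scrambled; phases still run in order.
$registry->hook('view', 'after', 'sketch_audit');
$registry->hook('view', 'at', 'sketch_render');
$registry->hook('view', 'before', 'sketch_prepare');

$first = $registry->run('view', array('nid' => 7));

// Interest is fully managed: once unhooked, the callback is never called.
$registry->unhook('view', 'after', 'sketch_audit');
$second = $registry->run('view', array('nid' => 7));
```

Only the callbacks that registered an interest are ever called, which is the whole point compared to the broadcast style of module_hook lookups.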

If php had more abstract features we could have done more. With a proper macro facility, we could have shifted the work further towards compile time, now there are extra operations to be performed.

To killes - you are more right than you ever thought. The above code is monadic for all practical purposes. Of course in haskell it will look very different, but hey, it is a step.

Update: I realised that it is better to name the class hook_namespace, since the instance of that class provides an encapsulation of hooks in a single scope. It's a bit verbose, but at least a better name. This has the dubious benefit of renaming add and remove to hook and unhook.

Attachment Size aspect.php.txt 1.71 KB
May 24 2006
May 24

What a good occasion. After so much time spent on Google SOC (the bit about Drupal), after dealing with students, mentors, and other creatures. After the results are out, you finish it off with a birthday.

Well done, Happy Birthday Rob. (true or not it is your birthday)

May 23 2006
May 23

I've updated the code. Now the scripts can convert a module into a new module and tpl file.

The script can produce a _widget hook from the old module file.
usage
bash$ ./mparse.php filename option
for example try:
bash$ ./mparse.php path/to/drupal/modules/node.module --all

update: fixed several bugs, so a new file is attached

Attachment Size mparse.tgz 3.46 KB
May 22 2006
May 22

I kind of finished the parser bit of the transformer routines for splitting off the theme functions from the drupal modules. It ended up a bigger thing than anticipated. The parser doesn't yet handle functions returning references.

At the moment only a parser and a pretty printer exist. It can be modified to enforce appropriate style and is not drupal specific.

I'll be adding the module_templates() functions generator this evening and tomorrow and maybe a specific split driver, so you can do the generation of the name.module, name.tpl.php in one pass, rather than running separate programs.

I attach a working version as of this morning for your amusement. Warning: ugly code.

Attachment Size mparse.tgz 2.36 KB
May 12 2006
May 12


Alaa - a blogger, an activist, a drupal contributor and more importantly just a man was jailed for being on a demo. Disgusting.

While that doesn't surprise me, things like that happen everywhere, not just in countries with regimes similar to Mubarak's, it still pisses me off.

I can't stand the response of the 'powers of the day' to bully you into submission, if you dare not to support them.

If you happen to be in London on 13/5 and fancy making noise in front of the Egyptian embassy just join the protest.

Oh, yes have you heard the carpet bombers for peace, here's the Egypt version?

May 09 2006
May 09

I've just uploaded into the bryght svn a basic filter, actually a couple of them (I was lazy). They parse a php file, and output (leave out in the case of the no_filter), all functions starting with a given prefix.

I've done it in order to be able to play with separate theme_xxx files.

usage:
vlado:~$ no_filter function_prefix php_file
read the php_file, print it to stdout, leaving out all functions starting with function prefix.

vlado:~$ yes_filter function_prefix php_file
read the php_file, print it to stdout, leaving out all functions not starting with function prefix.

both filters know about associated comments, etc...
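The filters themselves are not shown here. As a rough idea of the first step such a filter needs, the sketch below finds the functions with a given prefix using PHP's tokenizer. It is an assumption about one possible approach, not the code in the svn: sketch_functions_with_prefix() is a made-up name, and it only lists the names rather than printing or dropping the function bodies and comments.

```php
<?php
// Returns the names of the functions declared in $source whose name starts
// with $prefix. Only a first step: a real filter would also need to track
// the matching braces and the preceding comments.
function sketch_functions_with_prefix($source, $prefix) {
  $names = array();
  $tokens = token_get_all($source);
  $count = count($tokens);
  for ($i = 0; $i < $count; $i++) {
    if (is_array($tokens[$i]) && $tokens[$i][0] == T_FUNCTION) {
      // Skip whitespace (and a possible '&') to reach the function name.
      for ($j = $i + 1; $j < $count; $j++) {
        if (is_array($tokens[$j]) && $tokens[$j][0] == T_STRING) {
          if (strpos($tokens[$j][1], $prefix) === 0) {
            $names[] = $tokens[$j][1];
          }
          break;
        }
        if ($tokens[$j] === '(') {
          // No name before the argument list: an anonymous function.
          break;
        }
      }
    }
  }
  return $names;
}

$source = "<?php\nfunction theme_item() {}\nfunction node_load() {}\nfunction &theme_page() {}\n";
$found = sketch_functions_with_prefix($source, 'theme_');
```

Working on tokens rather than on regular expressions keeps strings and comments that merely mention "function theme_" from being picked up.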

May 04 2006
May 04

Funny things these summer of code applications. Some of them come as very well thought out projects, others are more like cloudy ideas.

If you want to submit a proposal, and want to be successful, please put a little effort into spec-ing it properly. Tangible goals are a very good thing. Good language is not really important, just put them in bullets if you fancy it that way.

If unsure, get to #drupal on irc.freenode.net, and ask. Mention SOC and project proposal, and you'll get attention and tlc. That is if you are applying for a drupal project; for others, well, don't expect much help in that channel.

Good luck

May 02 2006
May 02

Google accepts student applications for the summer of code 2006. Drupal has interesting projects for you to choose from. If you are a student anywhere, interested in php and/or drupal development, and want to spend your summer coding and earning some money, apply. You might get accepted. If none of the proposed projects interests you, you can suggest your own. Come on, be brave. You can get a rock-star status in a part of the net.

May 02 2006
May 02

Drupal 4.7 is out. It features a lot of genetic improvements. Features support for a lot of hot features, but more importantly, it started the slimming trend.

I'm delighted that although there are loads of additional functionality, the modules are getting slimmer. Probably due to forms api. I hope this trend will continue.

Apr 28 2006
Apr 28

I finally got around to changing the look of this blog. It is a new css-based design. It might get heavyish on some machines+browsers, but to be honest I couldn't care less about it.

This design is kind of a proof for the themability of drupal. It took me about 20 minutes to convert this from a plain html with embedded stylesheet to page.tpl.php+css.

If I can do it, any monkey should be able to.

Update: thanks to rkerr & |gatsby| I've figured out that I have a slight problem in IE - unreadable. The bloody boxes and png transparency problems. Now it should be much better.

Thanks guys
