Micro-benchmarking PHP stream includes from a database vs. standard includes

During the Drupal plugin/update manager discussions I had an aha moment: one of those weird and wonderful ideas came back to me. What if most of the code lived in the database? One would be able to arrange the cohabitation of several concurrent versions of the same website relatively easily, and backups would simply mean database backups.

Funnily enough, this could help two opposite (scale-wise) types of users: at the bottom end, those on the cheapest or free hosting, and at the top, the load-balanced crowd.

Why "back"? Well... I've had this idea ever since user streams appeared in PHP, version 4.3 or thereabouts, but it just nestled cosily in the back of my mind, waiting for love, the shy little thing.

The problem

OK, so what is this about? Since PHP allows you to write stream wrappers, and include* and require* can load code from arbitrary streams, one should be able to put the code in a database, load it, and execute it. The biggest obvious downside is that it is probably slow. But by how much?
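To make the mechanism concrete, here is a minimal sketch (the `DemoStream` class and the `demo://` scheme are purely my illustration, not the benchmark code): a class implementing the streams API is registered as a wrapper, and include happily pulls code through it.

```php
<?php
// Minimal illustration: include'ing PHP source served by a userland
// stream wrapper. Names (DemoStream, demo://) are illustrative only.
class DemoStream
{
    public $context;                 // populated by PHP; must be declared
    private $src = '<?php return "loaded via stream";';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        return true;                 // every demo:// path maps to $src
    }

    public function stream_read($count)
    {
        $chunk = substr($this->src, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()  { return $this->pos >= strlen($this->src); }
    public function stream_stat() { return ['size' => strlen($this->src)]; }
    public function stream_close() {}
}

stream_wrapper_register('demo', 'DemoStream');
$msg = include 'demo://anything.php';   // include returns the file's return value
print $msg . "\n";
```

Note that wrappers registered via `stream_wrapper_register` count as local, so `include` accepts them even with `allow_url_include` switched off.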

I decided to benchmark it. I've prepared a micro-benchmark to test the idea and to see how significant the difference in performance would be. One should note that since this is mostly an I/O-bound task, the difference in performance will show up mostly as higher response times rather than CPU load. Bear in mind that the benchmarks were performed on a tiny Acer Aspire One netbook with 512 MB RAM and its standard SSD drive.

The benchmark

I've prepared three different small programs. The first just includes 20 PHP files. The second includes the same 20 PHP files, but also contains the streams code, to give it a similar parsing-time profile. The third includes the same code from SQLite3 via streams. The files are attached to this post; if you want to run them yourself, just rename them and assign the appropriate permissions.

I've used the criterion Haskell library to gather and process the statistics for me and to draw the nice plots below.

The Haskell program is simple. It just declares and executes the three benchmarks:


import Criterion.Main (defaultMain, bench, bgroup)
import System.Cmd (system)

main = defaultMain [
    bgroup "php includes" [
               bench "standard/clean" $ system "./clean.php"
            ,  bench "standard/mixed" $ system "./non-stream1.php"
            ,  bench "streams" $ system "./stream1.php"
            ]
   ]

To compile, use:


ghc --make bench

The streams

I've written a barebones TestStream class adhering to the streams API, registered it as a stream wrapper, and performed 20 include_once calls. Each included file contains a single hello-world style print statement.
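The attached class isn't reproduced here, but an SQLite3-backed wrapper in the same spirit might look as follows. The table layout, the `dbcode://` scheme, the file path and the stored contents are my assumptions, not the attached benchmark code:

```php
<?php
// Sketch of a TestStream-style wrapper that serves PHP source out of an
// SQLite3 database. Schema, scheme name and file contents are assumptions.

// Seed a throwaway database with one "file".
$db = new SQLite3('/tmp/dbcode-demo.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, src TEXT)');
$ins = $db->prepare('INSERT OR REPLACE INTO files (name, src) VALUES (:n, :s)');
$ins->bindValue(':n', 'hello01.php');
$ins->bindValue(':s', '<?php print "hello world\n";');
$ins->execute();
$db->close();

class TestStream
{
    public $context;
    private $data = '';
    private $pos = 0;

    public function stream_open($path, $mode, $options, &$opened_path)
    {
        // dbcode://hello01.php -> look up "hello01.php" in the files table.
        $name = substr($path, strlen('dbcode://'));
        $db = new SQLite3('/tmp/dbcode-demo.sqlite', SQLITE3_OPEN_READONLY);
        $q = $db->prepare('SELECT src FROM files WHERE name = :n');
        $q->bindValue(':n', $name);
        $row = $q->execute()->fetchArray(SQLITE3_ASSOC);
        $db->close();
        if ($row === false) {
            return false;
        }
        $this->data = $row['src'];
        return true;
    }

    public function stream_read($count)
    {
        $chunk = substr($this->data, $this->pos, $count);
        $this->pos += strlen($chunk);
        return $chunk;
    }

    public function stream_eof()  { return $this->pos >= strlen($this->data); }
    public function stream_stat() { return ['size' => strlen($this->data)]; }
    public function stream_close() {}
}

stream_wrapper_register('dbcode', 'TestStream');
include_once 'dbcode://hello01.php';
```

Opening a fresh database connection per include is deliberately naive; the benchmarked version could just as well share one handle, which would only improve the stream numbers below.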

The non-stream versions

The base case, "standard/clean", just includes the 20 files. The "standard/mixed" version includes the same 20 files but also carries a useless copy of the TestStream class, to bulk up the code and gauge the significance of the parsing overhead.

The benchmark results

Standard clean


benchmarking php includes/standard/clean
collecting 100 samples, 2 iterations each, in estimated 12.13241 s
bootstrapping with 100000 resamples
mean: 58.12652 ms, lb 57.14786 ms, ub 60.15813 ms, ci 0.950
std dev: 6.912029 ms, lb 4.108045 ms, ub 13.29588 ms, ci 0.950
found 6 outliers among 100 samples (6.0%)
  2 (2.0%) high mild
  4 (4.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers


Standard mixed


benchmarking php includes/standard/mixed
collecting 100 samples, 2 iterations each, in estimated 11.08999 s
bootstrapping with 100000 resamples
mean: 58.86753 ms, lb 57.81748 ms, ub 60.82246 ms, ci 0.950
std dev: 7.118014 ms, lb 4.625828 ms, ub 12.58350 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  5 (5.0%) high mild
  3 (3.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers


Streams


benchmarking php includes/streams
collecting 100 samples, 2 iterations each, in estimated 14.42270 s
bootstrapping with 100000 resamples
mean: 76.48482 ms, lb 74.66795 ms, ub 78.86988 ms, ci 0.950
std dev: 10.60164 ms, lb 8.515426 ms, ub 13.80536 ms, ci 0.950
found 8 outliers among 100 samples (8.0%)
  7 (7.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 1.000%
variance is unaffected by outliers


Conclusions

As expected, the streams code is slower: it adds around 1 ms per included file ((76.48 ms - 58.13 ms) / 20 ≈ 0.9 ms). If you compare the probability density estimates, you will see that there is a small, albeit probably insignificant, overlap between the standard and stream versions. This suggests that in larger programs the relative effect will be far less significant. The results are encouraging, and the technique definitely merits further investigation: running it against MySQL, the database most widely deployed alongside PHP, and, if time permits, against a patched version of Drupal.
