Use Google Analytics Instead of the Statistics Module

I recently created a module that uses the Google Analytics API to capture the top ten nodes of various content types by day, week, and all time. This is a great option for any site that relies on caching and therefore can’t use the Statistics module.

The module depends on the google_analytics_api module, whose google_analytics_api_report_data() function makes capturing the data easy. Here is example code for building a report:

```php
<?php
// GA requires both dates; default to today for each.
if (empty($start_date)) {
  $start_date = date('Y-m-d');
}
if (empty($end_date)) {
  // GA takes dates only (no H:i:s). Early in the day (before noon),
  // consider also including the previous day so the report isn't nearly empty.
  $end_date = date('Y-m-d');
}

// Top pages by visits, most visited first.
$dimensions = array('pagePath');
$metrics = array('visits');
$sort_metric = array('-visits');
// Only count paths containing /blog/ or /article/.
$filter = 'pagePath =@ /blog/ || pagePath =@ /article/';
$start_index = 1;
// Fetch more rows than the final top ten so unwanted nodes can be dropped.
$max_results = 20;

// Construct request array.
$request = array(
  '#dimensions' => $dimensions,
  '#metrics' => $metrics,
  '#sort_metric' => $sort_metric,
  '#filter' => $filter,
  '#start_date' => $start_date,
  '#end_date' => $end_date,
  '#start_index' => $start_index,
  '#max_results' => $max_results,
);

try {
  $entries = google_analytics_api_report_data($request);
}
catch (Exception $e) {
  // This snippet is meant to run inside a function; return the error text.
  return $e->getMessage();
}
```

By default, today’s date is used for both the start and end dates, giving today’s top content. GA requires both a start and an end date, so to get all-time results, you will need to set the start date to the date your site first started using GA.
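To cover the day, week, and all-time reports from one place, the dates could be computed as in the sketch below. The helper name ga_report_dates(), the period names, and the $ga_start_date parameter are hypothetical; pass the date your site started sending data to GA, since GA needs an explicit start date for an all-time report.

```php
<?php
// Hypothetical helper: returns array(start_date, end_date) in GA's Y-m-d format.
// $now is a Unix timestamp; $ga_start_date is the first day your site used GA.
date_default_timezone_set('UTC');

function ga_report_dates($period, $now, $ga_start_date) {
  $end_date = date('Y-m-d', $now);
  switch ($period) {
    case 'day':
      // Today only.
      $start_date = $end_date;
      break;
    case 'week':
      // The last seven days, including today.
      $start_date = date('Y-m-d', strtotime('-6 days', $now));
      break;
    case 'all':
      // "All time" means starting from the first day of tracking.
      $start_date = $ga_start_date;
      break;
    default:
      throw new InvalidArgumentException("Unknown period: $period");
  }
  return array($start_date, $end_date);
}
```

For example, `list($start_date, $end_date) = ga_report_dates('week', time(), '2009-03-01');` would fill in the two variables used by the report code. Passing the timestamp in, rather than calling time() inside the helper, keeps it easy to test.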

To get the top content, sorted from most to least popular, set the dimensions variable to “pagePath” with either a “visits” metric (for unique page views) or a “pageviews” metric (for all views). The sort_metric variable is set to “-visits” (or “-pageviews”) to sort from most visits to fewest; the “-” prefix tells Google Analytics to sort the results in descending order.
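Switching between the two metrics only changes the metric name and its sort key, so the request could be built by a small function like the sketch below. The function name is hypothetical; the “#”-prefixed keys simply mirror the request array in the example above.

```php
<?php
// Hypothetical helper: builds a top-content request for "visits" or "pageviews".
// The keys mirror the request array passed to google_analytics_api_report_data().
function ga_top_content_request($metric, $start_date, $end_date, $filter, $max_results = 20) {
  if (!in_array($metric, array('visits', 'pageviews'), TRUE)) {
    throw new InvalidArgumentException("Unsupported metric: $metric");
  }
  return array(
    '#dimensions' => array('pagePath'),
    '#metrics' => array($metric),
    // A leading "-" sorts in descending order (most popular first).
    '#sort_metric' => array('-' . $metric),
    '#filter' => $filter,
    '#start_date' => $start_date,
    '#end_date' => $end_date,
    '#start_index' => 1,
    '#max_results' => $max_results,
  );
}
```

Deriving the sort key from the metric name keeps the two from drifting apart if you change one and forget the other.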

Since I want only blogs and articles, I set the filter to match only paths that contain “/blog/” or “/article/”. Unfortunately, path matching is the only way to filter by node type, so it’s a good idea to use Pathauto to give each node type a distinctive path, and to write some code that prevents other node types from using the path you are targeting.
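Both halves of this advice can be sketched in plain PHP. ga_path_filter() reproduces the filter syntax used in the post, where “=@” means “contains”; alias_conflicts() is one possible check for whether a node’s alias uses a path reserved for another type, which you might call from your own node validation code. Both function names and the $reserved map are hypothetical. Note that Drupal aliases are stored without a leading slash while GA page paths have one, so the check adds it before matching.

```php
<?php
// Hypothetical helper: OR together "contains" clauses for each path fragment,
// producing e.g. "pagePath =@ /blog/ || pagePath =@ /article/".
function ga_path_filter(array $fragments) {
  $clauses = array();
  foreach ($fragments as $fragment) {
    $clauses[] = 'pagePath =@ ' . $fragment;
  }
  return implode(' || ', $clauses);
}

// Hypothetical guard: TRUE if $alias contains a fragment reserved for a
// different node type. $reserved maps node type => path fragment.
function alias_conflicts(array $reserved, $type, $alias) {
  // Match GA's page paths, which start with "/".
  $path = '/' . ltrim($alias, '/');
  foreach ($reserved as $owner => $fragment) {
    if ($owner !== $type && strpos($path, $fragment) !== FALSE) {
      return TRUE;
    }
  }
  return FALSE;
}
```

Keeping the type-to-fragment map in one place means the report filter and the alias check can never disagree about which paths belong to which type.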

In my case, there were also specific CCK fields I needed to use to filter out additional nodes. If you know ahead of time that this will happen, you can inject something into the path of nodes whose CCK fields mark them for exclusion, and filter them out when retrieving the report. Otherwise, you will have to do what I did: retrieve more results than the final report needs (note that $max_results is set to 20, even though this will eventually be a top ten list), filter out the unwanted nodes with a database query, and then unset the remaining excess.
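The final trimming step might look like the sketch below, assuming you have already turned the GA entries into a path => visits array in sorted order and collected the paths to exclude from your own database query. The function name and both input shapes are hypothetical.

```php
<?php
// Hypothetical helper: drop excluded paths from an over-fetched, already
// sorted list and keep at most $limit rows; everything after that is excess.
function ga_trim_top_list(array $rows, array $excluded_paths, $limit = 10) {
  // Flip for constant-time lookups by path.
  $excluded = array_flip($excluded_paths);
  $kept = array();
  foreach ($rows as $path => $visits) {
    if (isset($excluded[$path])) {
      continue;
    }
    $kept[$path] = $visits;
    if (count($kept) >= $limit) {
      break;
    }
  }
  return $kept;
}
```

Because GA already returns the rows sorted, stopping at the limit preserves the ranking; if too many rows are excluded, the list simply comes back shorter, which is a sign to raise $max_results.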

One other catch with using Google Analytics in place of Statistics is that it does not work well with cron. The report runs through cron when you call cron.php manually, but I couldn't find a way to make it work automatically, even with various spoofing methods. The request finishes without errors, but GA returns no data.

Cache variables can save the day here! Wrap the code above as follows, so the GA request only runs on a page load when no cached report exists:

```php
<?php
if ($cache = cache_get('ga_stats', 'cache_content')) {
  $stats = $cache->data;
}
else {
  $stats = array();
  // GA code from above goes here.
  if (!empty($entries)) {
    foreach ($entries as $entry) {
      $metrics = $entry->getMetrics();
      // Append one row per entry; assigning to $stats['visits'] would
      // overwrite the previous entry on every iteration.
      $stats[] = array('visits' => $metrics['visits']);
      // Grab any other data you want here.
    }
  }
  if (!empty($stats)) {
    cache_set('ga_stats', $stats, 'cache_content', CACHE_TEMPORARY);
  }
}
```

Replace ga_stats above with whatever cache ID you want. You can also cache stats for individual pages under their own IDs if you want to study specific pages in detail. You may also want to replace cache_content with a different cache table, such as a custom one created by your own module.
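One way to keep per-period and per-page entries apart is to build the cache ID from its parts, as in the sketch below. The “ga_stats:” prefix and the helper name are hypothetical.

```php
<?php
// Hypothetical helper: builds IDs like "ga_stats:week" or "ga_stats:day:/blog/a".
function ga_stats_cid($period, $path = NULL) {
  $cid = 'ga_stats:' . $period;
  if ($path !== NULL) {
    // Per-page entry for studying a single page's stats.
    $cid .= ':' . $path;
  }
  return $cid;
}
```

The result would replace the literal 'ga_stats' in both the cache_get() and cache_set() calls. In Drupal 6, cache_set() also accepts a Unix timestamp as its expiry argument instead of CACHE_TEMPORARY, for example time() + 3600 to keep a report for at least an hour.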

This is only the beginning of what you can do with Google Analytics. If you plan your pages and URLs well, you can capture almost any data you want, even link clicks and page exits. Both the google_analytics_api module and the underlying reporting API offer plenty of options.
