May 11 2023

Light on the path 

The main challenge of complex projects is calibrating their level of complexity. Without the resources to carry out a prior audit, decisions end up being made blindly. Having accurate and up-to-date information increases efficiency in both management and development. It allows teams to:

  • Objectively quantify the complexity of the project and properly plan its execution by establishing the necessary resources and the most suitable strategy.
  • Make informed decisions during development. For example, in a migration where 20 different paragraph types have been identified, we can find out whether some of them are barely used and are candidates for exclusion from the migration.
  • Evaluate the progress of a project and detect early whether it is becoming excessively complex.
  • Detect possible improvements.
  • Ease onboarding for new developers.

So far, the advantages of audits are clear, but at Metadrop we have encountered some difficulties in their execution:

  • The need for technical profiles to carry out the audit.
  • The investment of time that these profiles require.
  • The poor maintainability and extensibility of the process: it is difficult to incorporate new reports into semi-manual audits.

In summary, it is impossible to audit in an agile way.

The Xray Audit module was created to overcome these limitations.

Xray Audit, or auditing at the click of a button

The Xray Audit module overcomes the drawbacks of semi-manual audits: it allows instant report generation by any profile, whether developers, project managers or administrators.

In order to facilitate the evolution of the module and simplify the process of implementing new reports, we have developed the module on Drupal's plugin system. This way, the developer can focus on the code that generates the report rather than on its integration with the module.

While we continue to expand the module, let's look at the different types of reports that Xray Audit currently offers.

Content entity reports

Since Drupal 8, the Entity system has been central. In fact, the community often compares site building through configuration to playing with LEGO: we build an entity type (node), which can have different bundles (page, article), to which we can add fields and for which we can configure different ways of displaying the entity (displays).

What it can be useful for:

  • Calibrate project complexity and identify risks.
  • Know the number of components that need to be refactored.
  • Identify underused or duplicate content types that can be eliminated.
  • Detect custom entities that may not be immediately apparent.

Types of content entities

This report shows a list of all existing content entities, including those that belong to contrib and custom modules.

Content entities: types and number

These reports provide information about the number of entities in the database, grouped by type and language. For example, the number of Article nodes created so far.

Available reports 

  • Number of elements grouped by type, and by type and language
    • Nodes
    • Paragraphs
    • Taxonomies
  • Number of media items grouped by type
  • Users and roles
    • Number of users per role
    • Roles
    • Permissions assigned to each role.
    • User activity. It shows the number of users who have logged in within the last 3 months, between 3 and 6 months ago, and more than 6 months ago.

Display reports

What it can be useful for:

  • Establish strategies to increase performance.
  • Identify and categorize components in refactoring work.
  • Understand relationships between content entities.
  • Determine the use of field formatters and image styles.

Displays

These types of reports generate information about the configuration of displays for content entities: the fields that are shown and the formatter applied to each field.

In this image we see the different displays configured for a node type called "recipe" (we generated these reports on a Drupal site with the Umami profile installed).

  • In the first column (fields), all recipe fields that exist at the database level are listed, and in the following columns, the fields configured for each display, as well as computed fields, are shown. The latter are fields that do not exist as such in the database but are calculated on the fly, for example the links field.
  • Media image is an entity reference field associated with media entities of type Image. In the card display, we can see that it is configured to be displayed using the responsive_3x2 display.
  • We can consult the media entity report to find out how that entity is displayed. On the other hand, in the teaser display, which usually shows the basic data of a content item, this field is not displayed.

Report on Drupal modules

This section contains a report on the installed Drupal modules, the projects they belong to, and whether they are being used, that is, enabled.

This is done by reviewing the exported configuration files, not just by checking whether modules are enabled in the environment and site where Xray Audit is running.

This is especially interesting in complex projects that use Configuration Split as a tool for managing different configuration contexts. For example, a multisite where there is a config split for each site type (profile), as well as for each environment and site.

The report allows us to know which modules are enabled globally (through the core.extension file) or those that are enabled by a specific config split. For example, in the image, we can see how the Xray Audit module is only enabled in the local environment and not globally.

What it's useful for

  • Identifying installed modules that can be removed because they are not enabled in any context. The "Project" column warns us of those modules that are submodules of other modules.
  • Detecting inconsistencies in config split configuration.
  • Of course, knowing which modules are being used at a glance.

How to extend it

The development of the module has been based on Drupal's plugin system to facilitate the integration of new reports. Developers can find instructions on how to do this in the README.

XrayAuditTaskPlugin is the main plugin that will integrate the code that generates the report with the module.

/**
 * @XrayAuditTaskPlugin(
 *   id = "queries_data_media",
 *   label = @Translation("Data about medias"),
 *   description = @Translation("Queries execute on database to get reports about medias."),
 *   group = "queries_data",
 *   sort = 3,
 *   operations = {
 *     "media_types" = {
 *       "label" = "Media types",
 *       "description" = "Media types.",
 *       "dependencies" = {"media"}
 *     },
 *   },
 *   dependencies = {"media"}
 * )
 */

Parameters:

  • id: unique identifier.
  • label: name.
  • description: a brief description of the reports generated by this plugin.
  • group: the type of report to which it belongs.
  • sort: position it occupies on the page where the list of available reports is displayed.
  • operations: each operation generates a specific report, so a plugin can generate one or several reports. The parameters are:
    • index: unique identifier of the operation. This element is used to construct the URL of the report.
    • label: name of the operation.
    • description: brief description of the report.
    • dependencies: a list of modules that the report depends on. For example, if the report is going to extract data about the Paragraph entity, the Paragraphs module must be enabled. If the dependency is not met, the operation is simply not shown to the user.
  • dependencies: it is also possible to define dependencies at the XrayAuditTaskPlugin level and not just at the operation level.

Finally, you will need to override at least these two methods:

  • getDataResultOperation: executes the code that retrieves the data used to generate the report. This data must be returned as an array.
  • buildDataRenderArray: receives the data generated by the previous method and returns a render array.

Conclusion

Having an x-ray of the project you are working on is fundamental during planning and development. But for it to be truly useful, it must be possible to take this x-ray at any point in the project, by non-technical profiles, without requiring additional time investment, and with the information displayed in a centralized way. The Xray Audit module covers these needs.

At the moment, the module has the reports we have discussed in this article. In the not too distant future, we want to implement the following functionalities:

  • Ability to preview the displays of different entities.
  • List of blocks (block_content).
  • List and use of Crops and image styles.
  • List of active views, displays, and places where they are used.
  • List of active modules used for page construction and site navigability, such as Paragraphs, Layout Builder, and Page Manager. The goal would be to quickly identify which site building strategy or strategies are being applied.
  • Core status.
  • List of active and inactive Webforms and places where they are displayed.

Your suggestions, proposals, and contributions are welcome ;)

May 10 2023

Every website needs a host, and a fantastic website on a mismatched hosting platform can become a terrible website. You've spent a lot of time and money on your website (or websites). Deciding where to host should not be an afterthought. 

Complex websites with content management, media management, and authenticated users have more complex hosting requirements than simple static websites. If your project warrants a CMS like Drupal, you need to ensure your hosting platform matches.

Here are some questions to ask to ensure you choose the appropriate home for your investment.

What are your goals and priorities? 

Or alternatively, why is your current hosting solution not working? Where is it falling short? There are several things to evaluate:

  • Security
  • Performance and reliability
  • Price
  • Deployment workflows
  • Consistency with updates of the underlying software
  • Management tools
  • Customer support

But you can't just list the things you want. You must work to prioritize them. Otherwise, you'll have stakeholders that want all of these things equally, to the maximum measure. To prioritize, however, you must have a clear view of your goals.

If you had the choice between a cheap solution or a more secure solution, which one would you go for? It depends. If you're a financial institution, you'll want to prioritize security. If you're a marketing agency, you don't want an insecure website, but you probably care more about performance and price.

You must also know what each of these criteria means for your organization. How do you define security? Is it based on certifications, like SOC2 audits? Is it based on specific hosting features, like proactive protection against common security holes? Is it based not on capabilities but on who owns what layers of the stack? A Drupal site owner has much greater responsibility for the security of a Drupal site on AWS or Linode than hosting with a managed provider.

Your definition of performance may differ. Are you seeking to host one high-performing website or a network of lower-traffic websites? Is your traffic relatively even throughout the year, or are there days when your site sees 10x or even 1000x the traffic?

If you are hosting one large website and need top performance, what will it take to reach that goal? It can be hard to scale out one big website. One provider might be able to do it well, but the cost might be prohibitive.

If you have lots of smaller websites and are optimizing for costs, that means shared resources. This is fine as long as the sites don't get much traffic. But what happens if one of those websites runs a big, viral promotion and gets a traffic surge? Will all the other sites go down?

Don't forget organizational politics. A stakeholder with lots of authority may have something against a particular hosting provider. One of your top goals may just be to remove yourself from your current vendor, whatever the cost. These political goals might be hidden and come out of hiding at the most inconvenient times, so be sure to dig deep. For example, we had one client who would not use anything that used AWS under the hood. This automatically eliminated certain providers. The earlier you get this information, the better.

What are your in-house capabilities? And are they available?

Your options are limited based on your in-house resources and their availability. You might have the expertise on staff to manage your own hosting, spin up servers, secure them, and maintain them. 

But will they have the dedicated time to do this? Are there other priorities that could pull them away from these duties? Go back to your performance goals. If the website goes down in the middle of the night, do you need someone to take that call? If you did an in-house solution, would you need to hire another person, or would you need to contract with a company for first-tier support?

Your answers to these will limit or expand your potential hosting options before you even evaluate them. For example, if you don't have anyone to monitor servers and have limited Drupal expertise on staff, you'll be restricted to managed Drupal hosting. Then again, perhaps your site has a limited or narrow audience, so weekend downtime doesn't matter.

Go back to your goals and priorities. If you need to hire talent to meet those goals, consider that during your evaluation. These costs and expectations should be transparent.

In some cases, you might not even have the in-house expertise to do the evaluation itself. That's ok. You can hire plenty of experts and consultants to give you an honest recommendation. We've helped many organizations with these same questions.

Which hosting providers do you want to evaluate?

You have your goals and have prioritized them. You have honestly assessed your capabilities. Now, it's time to choose who you want to evaluate. Don't make this decision lightly. Every provider you put on this list increases the time you will need to assess them honestly.

You probably already have an idea of where you want to host. For Drupal hosting, Pantheon and Acquia might be on your list. If you have the resources and capabilities, having your own data center might be one of the options. Server providers like Linode and DigitalOcean can be good options, as are cloud services like AWS and Google Cloud.

Once you have developed your list of options, start evaluating based on your already established criteria.

Do you really need the extras the hosting provider is offering?

Hosting providers get you in the door with hosting and then want to upsell you with other services: personalization, digital asset management, managed updates, and more. Many hosting companies want to be technical MarTech companies because that's where the money is.

You might need or want their extra services. Maybe one of their extra offerings is a big positive for choosing them. Or maybe not. Again, go back to your goals and priorities. Most organizations just want reliable hosting. Evaluate based on that, not the bells and whistles that start blaring in front of your face.

You've done a lot of work thinking through what you need, so don't deviate from it in the final hour.

Keeping track

Have each evaluator give scores independently for the same criteria. Don't use the numbers to choose a winner, but to determine points where evaluators disagree or where further research is needed. A simple spreadsheet with line items and scores of 1-3 works well. Give each person their own copy so they can work independently without being influenced by other scores. It also helps to determine yes/no criteria as quickly as possible because that can help rule out providers before diving too deep.

Finally, keep track of the gaps each shortlisted provider has. If your request is significant enough, they may be willing to prioritize missing features on their roadmap. If they promise to have a desired feature in three months, that could add an asterisk to your deliberations.

However you keep track of your evaluations, you'll want to keep it open and transparent. Make it clear how you have come to your decisions, and be ready to explain your rationale. Writing up a summary document for those who don't want to dive into spreadsheets is a good idea.

If all this sounds daunting, we can help you through this process. We help organizations every day uncover requirements, set goals, rank priorities, and come to a good decision.

May 05 2023

Randall Quesada Angulo, Backend Engineer

Randall is an engineer and a graduate of the University of Costa Rica.

Maybe you are interested in getting involved in the Drupal world, but you’re a little intimidated by the technical complexity of the platform. Don’t worry!

Drupal is a fantastic platform for building scalable websites, but keep in mind that sometimes Drupal can be an indomitable horse that we will tame over time, so don't get too wrapped up in it.

Drupal is an open-source content management system (CMS). You can install a lot of modules (or plugins, if you use another CMS like WordPress) to increase the core functionalities and adapt your site to your needs.

Why Drupal?

Some of Drupal's great qualities are its stability, commerce distributions, security, SEO friendliness, multilanguage capabilities, and responsiveness, among others.

Requirements

  • Lando
  • PHP 8
    • Mac
    • Linux: apt install php
  • Composer
  • NVM
  • Docker

Composer

As Drupal’s documentation mentions, “Composer is a tool for dependency management in PHP. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. Drupal uses Composer to manage the various libraries which it depends on. Modules can also use Composer to include third-party libraries. Drupal site builds can use Composer to manage the various modules that make up the site.”

Here are some links to documents that may be useful:

Drupal Core

You may have seen the term "Drupal Core," but what is that? Drupal Core is all the components and features that make up Drupal out of the box, including core modules and core themes. It's Drupal in its most basic form, but you can also find distributions, which are packages of Drupal Core plus contributed modules.

Drupal distributions

A Drupal distribution is a set of preconfigured modules and templates designed to quickly build websites with complex functionality.

There are some distributions such as:

  • Sous: A starter project for Drupal with a generated theme based on the Emulsify Design System. This distribution can be very useful for anyone who wants to create a project with a completely custom theme while taking advantage of everything Emulsify offers.
  • Varbase
  • Panopoly
  • Presto!
  • Thunder
  • 1,400+ distributions

There are many distributions out there to explore.

Contributed modules

Contributed modules are modules that members of the Drupal community create to make our work easier. Since Drupal is an open-source CMS, the community is involved in creating new modules, fixing bugs in those modules, and adding new functionality. So if you find a bug in a module you are using, report it and create a patch, or see if someone has already fixed the problem for you.

Let's create your first Drupal site in a local environment. Here are the steps:

  1. Go to the Drupal 10 release page. Note: We are going to create a Drupal 10 site. You can select older versions, but Drupal 10 is the latest.
  2. Create a directory in your local environment where you want to put your site.
  3. Copy the code you find on the release page (step 1). Example:
    composer create-project drupal/recommended-project:10.0.0 "drupal10"
  4. Enter the created directory: cd drupal10/
  5. Now use Lando to start your Drupal site with Docker:
    1. lando init, answering the prompts:
      1. Source: current working directory
      2. Recipe: drupal10
      3. Webroot: web
      4. App name: Drupal 10
    2. lando start
  6. Select one of the site URLs that Lando prints when it finishes starting.
  7. Now your Drupal site is ready.

How can you install a new feature (module) on your Drupal site?

You can go to the Module project page, where you can find all the modules created by the community. You can filter by version or search by keywords.

For example:

1. Go to the Admin Toolbar project page. Note: admin_toolbar is a module that lets us move through Drupal's administration more easily, since the toolbar gives us direct access to configuration, content, and more.

2. At the root of your project, run the Composer command (check that the module is compatible with Drupal 10): lando composer require 'drupal/admin_toolbar:^3.3'

Drupal 10 Composer command

3. You have to use drush to enable the module: lando drush en [module_machine_name]. Example: lando drush en admin_toolbar. Note: If you want to see what drush commands exist, check out all the commands.

4. Now your module is enabled. Sometimes you have to clear the cache to see the changes on your site, and you have to use a drush command for that: lando drush cr.

Drupal web hosting

But where should you publish your site? There are some free and paid options to consider. The free options are a bit limited; however, trying and exploring the platforms can be very enriching.

If I had to pick from the options mentioned above, I would choose Acquia and Platform.sh. They are easy to manage, intuitive, and have interfaces that are easy to explore. Both provide a launcher that you install in your computer's terminal to run drush commands against the environment you want.

Thank you very much for visiting the blog. Be sure to browse our other content, where we discuss other development issues, UX, UI design, product strategy, and more.

If you have any questions, suggestions, or ideas about your Drupal 10 project, you can let us know by sending a message in the contact box below.


Apr 28 2023

Thanks to everyone who helped make MidCamp great!

MidCamp 2023 is a wrap, and we couldn't have done it without all of you. Thanks to our volunteers, organizers, contributors, venue hosts, sponsors, speakers, and of course, attendees for making this year's camp a success. 

Replay the Fun

Find all of the sessions you missed, share your own session around, and spread the word. Videos can be watched on MidCamp's YouTube channel or on drupal.tv.

Please note, professionally edited captions will be made available within two weeks.

Share Your Feedback

If you didn't fill it out during camp, please fill out our quick survey. We really value your feedback on any part of your camp experience, and our organizer team works hard to take as much of it as possible into account for next year.

Also, don’t forget to rate any sessions you attended (these can be found on each session node).

And now announcing... MidCamp 2024!

Mark your calendars: next year's MidCamp will take place in late March 2024. The final dates will be announced soon.

Explore other Upcoming Drupal Events

Need more Drupal Events to tide you over to next year? Head over to the Drupal Community Events page!

One Last Thanks

MidCamp wouldn't be possible without our amazing sponsors. See the entire list of this year's sponsors on the site, and think about adding your name to the list next year. (Next year's prospectus coming soon!)

Keep the madness going all year by joining in the MidCamp Slack! We look forward to seeing you at MidCamp 2024.

Apr 25 2023

MidCamp week is upon us and there’s so much happening! Our last post laid out the schedule for the week, and we have a few hot-off-the-press updates:

Health & Safety

Please exercise COVID-safety precautions before and on your way to camp. 

MidCamp takes the safety of its sponsors, speakers, volunteers, and participants very seriously. Please review our Health & Safety Policy before attending. We've added a section detailing what to do if you become sick or test positive, and we request that you anonymously report your case.

Apr 20 2023

Drupal 10 was released in December 2022. If you're a current Drupal 9 user, you may be strategizing your website's Drupal 9 to 10 migration. Luckily, the Drupal 9 to 10 migration is being heralded as the easiest upgrade in Drupal's history. That's because Drupal 10 is backward-compatible with Drupal 9 and is not a major overhaul of the core system. But the planning and development process still requires time and attention to ensure the migration goes smoothly.

This article takes a deep dive into the Drupal migration process and how your organization's marketing team can provide support along the way.

Fast facts about the Drupal 9 to 10 migration

Brush up on your Drupal release history with these fast facts:

  • Drupal 9 was released in June 2020. At the time of its release, Drupal 9 offered an easier upgrade than ever because it built upon features released for Drupal 8.
  • Support for Drupal 9 ends in November 2023. There won't be any new security releases after this time, and no new functionality will be added.
  • Drupal 10 was released on December 14, 2022. New features will only be added to Drupal 10 from now on, so if you're looking to use Drupal for the first time, it's recommended to start with Drupal 10.

What's new in Drupal 10?

As mentioned, Drupal 10 builds on innovations released as part of Drupal 9. You won't encounter an entirely new structure and system to get used to when switching to Drupal 10.

That being said, Drupal 10 offers the following updated features to improve the user experience:

  • A new default administrative theme and frontend theme: The default Claro administration theme and the Olivero frontend theme offer an accessible, user-friendly, modern experience for website administrators and visitors alike.
  • Updates to the CKEditor embedded text editor from V4 to V5: This change facilitates a more modern editing experience and more intuitive authoring.
  • Upgrade from the PHP framework Symfony 4 to Symfony 6: Symfony 6, together with PHP 8.1 as the new minimum PHP version, results in a more secure, performant PHP framework.
  • Modernization of JavaScript: Drupal 10 switches out the large jQuery library for smaller, better-performing solutions.

Specific technologies aside, Drupal 10 focuses on accessibility, modernization, and user-friendliness for website visitors and editors alike.

Steps to upgrade from Drupal 9 to 10

If you upgraded your site from Drupal 7 to 9 in the past, you know that migration required transferring all of your data to a new Drupal 9 website. The Drupal 9 to 10 migration is similar to the upgrade from Drupal 8 to 9, so it won't be such a significant undertaking.

However, you can still take a few measures to set your website up for success. Follow these steps to prepare your Drupal 9 site for the migration:

  • Drupal version: Upgrade to Drupal 9.4.4 or later. Upgrade paths for core updates made before 9.4 have been removed, so you must be on at least Drupal 9.4.4 to use the data upgrade path from CKEditor 4 to CKEditor 5.
  • Rector: Run Drupal Rector on custom modules and themes. Drupal Rector scans code to look for deprecated functions and helps guide developers to upgrade them.
  • CKEditor: Upgrade to CKEditor 5. CKEditor 4 reaches its end-of-life at the end of 2023. Check out Drupal's step-by-step instructions for upgrading to CKEditor 5 to stay up to date.
  • PHP: Check your version of PHP. Drupal 10 requires PHP 8.1 or higher, so you may need to update PHP. Log into your website's hosting account and check the settings in your control panel to verify which version of PHP you're using.
  • Modules and themes: Check your modules and themes. Not all modules and themes from Drupal 9 will be compatible with Drupal 10. If your site uses a module or theme that was removed from Drupal core, download the contributed project version before migrating to Drupal 10.
  • Test: Update to Drupal 10 and test your site. Run automated code tests using tools like Drupal Rector or PHPStan. Also, conduct manual testing to ensure that everything is working as expected. Check your forms, links, page navigation, and other site elements to note any user-experience issues.

Hassle-free upgrade

Ideally, upgrading your website from Drupal 9 to 10 shouldn't be a hassle. You can make the already simple process even easier with the tips I've provided. You can take these steps yourself, let your web developer handle it, or work with an external Drupal developer.

This article was adapted from a previous article by Jim Birch, Drupal engineering manager at Kanopi Studios.

Apr 19 2023

Pierce Lamb

This is the third post in a three part series on creating a reusable ML pipeline that is initiated with a single config file and five user-defined functions. The pipeline is finetuning-based for the purposes of classification, runs on distributed GPUs on AWS Sagemaker and uses Huggingface Transformers, Accelerate, Datasets & Evaluate, PyTorch, wandb and more.

This post originally appeared on VISO Trust’s Blog

This post will cover the training and testing (inference) steps. These are the core steps in a ML pipeline where a model is hyper-parameter tuned and the test set is used to measure performance. If you have landed on this post first, check out the first post in the series detailing the pipeline setup and the second post detailing the data steps.

Training and Tuning

The reason I have combined Training and Tuning into one section is that Tuning is just a set of training jobs where performance is incrementally improved by changing hyperparameters. As such, under the covers, the two types of jobs call the same code. As we have previously, let's first look at perform_training() and perform_tuning() to see how the code interacts with Sagemaker.

Zooming into perform_training(), we encounter the first bit of backend code that handles a use case we have not yet discussed: comparing two models. If you recall in part one, one of the motivations for creating this pipeline was to rapidly test multiple Document Understanding models and compare performance between them. As such, the pipeline is built to handle, in a single experiment, multiple models being passed in the settings.ini file the experimenter defines. In fact, the MODEL_NAMES parameter from this file can accept one or many model names, the latter implying that the experimenter wants to run a comparison job. A comparison job has no impact on Data Reconciliation or Data Preparation; we want these steps to be isomorphic to a single model job as the idea is that n models get trained and tested on the exact same snapshot of training data. With that preamble, perform_training() looks like this:

The loop here is iterating over either a list with n model names or a list with a single model name. For each model name, an Estimator() is constructed and .fit() is called which kicks off a training job on Sagemaker. get_estimator_kwargs() will look familiar to anyone who has trained on Sagemaker already:
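
In sketch form, the loop and estimator construction might look roughly like this (a hedged sketch: the config fields and helper calls follow the description in this section, everything else is illustrative):

```python
# Hypothetical sketch, not the pipeline's exact code.
from sagemaker.estimator import Estimator

def perform_training(config):
    for model_name in config.model_names:           # one name, or several for a comparison job
        kwargs = get_estimator_kwargs(config)       # image_uri, role, instance type/count, ...
        if config.use_distributed:
            # extra distributed-training parameters go here, plus the
            # Sagemaker Debugger opt-out discussed below
            kwargs["debugger_hook_config"] = False
        estimator = Estimator(hyperparameters=config.hyperparameters[model_name], **kwargs)
        # kicks off the Training job; the channel is downloaded to the instance's local disk
        estimator.fit({"training": config.training_data_uri})
```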

Settings are extracted from the config we discussed in the first post in the series, the most important of which is config.docker_image_path. As a refresher, this is the ECR URL of the training image the experimenter created during setup; it is shared between Sagemaker Processor, Training, and Tuning jobs and contains all needed dependencies. Next, perform_training checks a boolean from the settings.ini file, USE_DISTRIBUTED, which defines whether or not the experimenter expects distributed GPU training to occur. If so, it sets some extra Estimator parameters, which are largely inspired by the _distribution_configuration function from the sagemaker-sdk.

I will digress for a moment here to talk about one such parameter, namely, an environment variable called USE_SMDEBUG. SMDEBUG refers to a debugging tool called Sagemaker Debugger. For reasons I cannot explain, and which have not been answered by AWSlabs, this tool is on by default, and distributed training would not work for some models, producing mysterious exception traces. It only became obvious to me when carefully examining the traces and seeing that it was code in smdebug that was ultimately throwing. Furthermore, there are a variety of ways to turn off smdebug, for instance passing 'debugger_hook_config': False as done above, or environment={'USE_SMDEBUG': 0}. However, these methods only work on Training jobs. Again, for reasons I cannot explain, the only way to turn off SMDEBUG on Tuning jobs is to set the env var inside the docker container being used: ENV USE_SMDEBUG="0"; the other methods explained above somehow never make it to a Tuning job's constituent Training jobs. An unfortunate side effect is that this makes it difficult for an experimenter to configure this environment variable. At any rate, hopefully AWSlabs fixes this and/or makes smdebug exceptions more user friendly.

The call to .fit() makes the actual call to the AWS API. The config.training_data_uri parameter specifies the S3 URI of the encoded training data from the Data Preparation step; the training instance will download this data to local disk before it executes where it can be easily accessed by multiple GPU processes. How does the job know what code to execute? That is specified in the base docker container which is extended by the experimenter:

These environment variables are used by the sagemaker-training library to kick off the training script. At this point we would dive into train.py, but since it is also used by a Tuning job, let's look at how we kick off a Tuning job. The beginning of a Tuning job is nearly identical to a Training job:

But now, instead of calling .fit(), we need to set up a few more parameters a Tuning job requires. A Tuning job requires a set of constant hyperparameters and tunable hyperparameters. As such, here an example of what an experimenter might write in the settings.ini file to represent this:

Here the constants will not change between tuning jobs, but the tunable parameters will start with guesses, and those guesses will get better as jobs complete. The -> and , are syntax I've chosen; in this context -> stands for an interval while , stands for categorical options. Having seen this, the next piece of the Tuning job setup should make sense:
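
As an illustration, strings in that format could be parsed into Sagemaker parameter ranges along these lines (a hedged sketch; the helper name is not from the actual pipeline):

```python
from sagemaker.tuner import CategoricalParameter, ContinuousParameter

def parse_tunable_hyperparameters(raw: dict) -> dict:
    """'1e-5->1e-4' becomes a continuous interval; '16,32,64' becomes categorical options."""
    ranges = {}
    for name, value in raw.items():
        if "->" in value:
            low, high = value.split("->")
            ranges[name] = ContinuousParameter(float(low), float(high))
        else:
            ranges[name] = CategoricalParameter(value.split(","))
    return ranges
```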

Now we have our dict of tunable parameters we can pass to the HyperparameterTuner object:

This should look somewhat familiar to what we just did for Training with a few extra parameters. So far, the HyperparameterTuner object takes the constructed Estimator() object that will be re-used for each constituent Training job and the tunable hyperparameters we just discussed. A Tuning job needs to measure a metric in order to decide if one set of hyperparameters are better than another. objective_metric_name is the name of that metric. This value is also used in the metric_definitions parameter which explicitly defines how the HyperparameterTuner job can extract the objective metric value from the logs for comparison. To make this more concrete, this is how these values are defined in an example settings.ini file:

Finally, the max_jobs parameter defines how many total Training jobs will constitute the Tuning job and max_parallel_jobs defines how many can run in parallel at a given time. Like the Estimator in the Training job, we call fit() to actually kick off the Tuning job and pass it the training_data_uri like we did previously. With this in place, we can now look at train.py and see what executes when a Training or Tuning job is executed.
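
Putting those pieces together, the tuner setup might be sketched like this (config field names are illustrative assumptions):

```python
from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=estimator,                                 # reused by every constituent Training job
    objective_metric_name=config.objective_metric_name,  # the metric name the training script logs
    hyperparameter_ranges=tunable_params,                # e.g. output of the parser sketched above
    metric_definitions=[{
        "Name": config.objective_metric_name,
        "Regex": config.objective_metric_regex,          # how to pull the value out of the job logs
    }],
    objective_type="Maximize",
    max_jobs=config.max_jobs,
    max_parallel_jobs=config.max_parallel_jobs,
)
tuner.fit({"training": config.training_data_uri})
```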

The goal of train.py is to fine tune a loaded model using a set of distributed GPUs, compute a number of metrics, determine which is the best model, extract that model’s state_dict, convert that model to torchscript, and save these files along with a number of graphs to S3. Huggingface’s Accelerate, Evaluate and Transformers libraries are all used to greatly simplify this process. Before continuing, I have to give a brief shoutout to the Accelerate devs who were extremely responsive while I was building this pipeline.

Note that in a distributed setting, every GPU process is going to execute this same train.py file. While much of the coordination can be handed off to Accelerate, it is helpful to keep that in mind while working inside it. Diving a level deeper, train.py is going to:

  • Read hyperparameters and determine if the running job is a tuning job, training job or comparison job
  • Determine if gradient accumulation will be utilized
  • Construct the `Accelerator()` object which handles distribution
  • Initialize wandb trackers
  • Load split training data and create `Dataloader()`s for training and validation
  • Set up an optimizer with learning rate scheduling
  • Execute a training and validation loop, computing metrics and storing metric histories and determining what the best model was
  • Plot curves for metrics
  • Extract the curves, statistics and best model from the loops
  • Write all of this data to S3

We start by reading the passed hyperparameters and setting a few values that can be used throughout the training process:

_tuning_objective_metric is a hyperparameter set by Sagemaker that allows us to easily differentiate between Training and Tuning jobs. As we've mentioned before, the run_num is an important setting that allows us to organize our results and version our models in production so they easily connect back to training runs. Finally, job_type_str allows us to further organize our runs as training/tuning and comparison jobs.

Next we determine if gradient accumulation is needed. Briefly, gradient accumulation allows us to set batch sizes that are larger than what the GPUs we’re running on can store in memory:
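
For illustration, the check can be as simple as the following (the hyperparameter keys here are assumptions, not the pipeline's actual names):

```python
import os

# effective_batch_size: the batch size the experimenter asked for
# per_device_batch_size: what actually fits in one GPU's memory
world_size = int(os.environ.get("WORLD_SIZE", 1))
gradient_accumulation_steps = max(
    1,
    hyperparameters["effective_batch_size"]
    // (hyperparameters["per_device_batch_size"] * world_size),
)
```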

Control now moves to setting up the Accelerator() object which is the tool for managing distributed processing:

Here we encounter a core concept in Accelerate, is_main_process. This boolean provides a simple way to execute code on only one of the distributed processes. This is helpful if we want to run code as if we were on a single process, for instance to store a history of metrics as the training loop executes. We use this boolean to set up wandb so we can easily log metrics to it. Additionally, accelerator.print() behaves like if accelerator.is_main_process: print(...); it ensures a statement is printed only once.
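
A minimal sketch of that setup, assuming wandb is the configured tracker (the project name and tracked metrics are illustrative):

```python
from accelerate import Accelerator

accelerator = Accelerator(
    gradient_accumulation_steps=gradient_accumulation_steps,
    log_with="wandb",                             # accelerator.log() forwards metrics to wandb
)
accelerator.init_trackers(project_name="doc-classification", config=hyperparameters)

accelerator.print(f"Starting run {run_num}")      # printed once, not once per GPU process
if accelerator.is_main_process:
    # metric histories kept on a single process so graphs are plotted once
    metric_history = {"train_loss": [], "val_accuracy": [], "val_f1": []}
```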

Recall that we passed config.training_data_uri to the .fit() call for both Training and Tuning jobs. This downloads all of the training data to the Sagemaker instance’s local disk. Thus, we can use Datasets load_from_disk() function to load this data. Note in the following code SAGEMAKER_LOCAL_TRAINING_DIR is just the path to the dir that data is downloaded to.

Each process loads the dataset, id2label file, metrics and creates dataloaders. Note the use of Huggingface’s evaluate library to load metrics; these can be used in tandem with Accelerate to make metric tracking simple during distributed training. We will see shortly how Accelerator provides one simple function to handle distributed training.

In this code block, we first call the user-defined function load_model to receive the loaded model defined however the experimenter would like. Thus far, this function has typically looked like a call to a Transformers from_pretrained() function, though this is not enforced.

A common learning rate optimizer is created and used to create a learning rate scheduler. Finally, we encounter another core concept in Accelerator, namely, wait_for_everyone(). This function guarantees that all processes have made it to this point before proceeding to the next line of code. It must be called before the prepare() function which prepares all of the values we’ve created thus far for training (in our case, distributed training). wait_for_everyone() is used regularly in Accelerator code; for example, it is nice to have when ensuring that all GPUs have completed the training loop. After the prepare() step, the code enters a function to perform the training and validation loop. Next, we will look at how Accelerator works inside that loop.
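
Sketched out, the setup described above might look like this (split names, hyperparameter keys, and the AdamW/linear-warmup choice are assumptions, and the saved dataset is assumed to already be in torch format):

```python
import evaluate
import torch
from datasets import load_from_disk
from torch.utils.data import DataLoader
from transformers import get_linear_schedule_with_warmup

dataset = load_from_disk(SAGEMAKER_LOCAL_TRAINING_DIR)      # downloaded locally by .fit()
train_dl = DataLoader(dataset["train"], batch_size=hyperparameters["batch_size"], shuffle=True)
val_dl = DataLoader(dataset["validation"], batch_size=hyperparameters["batch_size"])

f1_metric = evaluate.load("f1")
acc_metric = evaluate.load("accuracy")

model = load_model()                                        # user-defined function
optimizer = torch.optim.AdamW(model.parameters(), lr=hyperparameters["learning_rate"])
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0,
    num_training_steps=len(train_dl) * hyperparameters["epochs"])

accelerator.wait_for_everyone()
model, optimizer, train_dl, val_dl, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dl, val_dl, lr_scheduler)
```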

At the start of the loop, we initialize a number of values to track throughout training. Here we use is_main_process again to create a single version of metric histories which we will use to plot graphs. In this example, we are only tracking training loss, validation accuracy and f1, but any number of metrics could be tracked here. Next, we enter the loop, set the model in train() mode and enter the train() function:

As execution enters a batch, it first needs to check if we’re running a comparison job. If so, it needs to extract the appropriate parameters for the current model’s forward() function. If you recall, for comparison jobs, in the Data Preparation step we combined all inputs in the same pyarrow format, but prepended with the model_name (e.g. longformer_input_ids). get_model_specific_batch() just returns those parameters of the batch that match the current model_name.

Next, we encounter with accelerator.accumulate(model), a context manager recently added to Accelerate that manages gradient accumulation. This simple wrapper reduces gradient accumulation to a single line. Underneath that manager, backpropagation should look familiar to readers who have written ML code before; the one big difference is calling accelerator.backward(loss) instead of loss.backward().
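
A hedged sketch of that inner loop (the signature is illustrative; get_model_specific_batch and accelerator come from the surrounding code):

```python
def train(model, train_dl, optimizer, lr_scheduler, accelerator, model_name, is_comparison):
    total_loss = 0.0
    for batch in train_dl:
        if is_comparison:
            # keep only this model's columns, e.g. longformer_input_ids -> input_ids
            batch = get_model_specific_batch(batch, model_name)
        with accelerator.accumulate(model):        # gradient accumulation in one line
            outputs = model(**batch)
            loss = outputs.loss
            accelerator.backward(loss)             # instead of loss.backward()
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
        total_loss += loss.detach().float()
    return total_loss / len(train_dl)
```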

Upon completing a training batch, execution sets the model in .eval() mode and moves into the validation loop:

Here we encounter another key accelerate function, gather_for_metrics(). This recently added function makes it much easier to gather predictions in a distributed setting so they can be used to calculate metrics. We pass the returned values to the f1_metric and acc_metric objects we created earlier using the Evaluate library. The validation loop then computes the scores and returns them.
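
Roughly, and again only as a sketch (the metric averaging and column names are assumptions):

```python
import torch

def validate(model, val_dl, accelerator, f1_metric, acc_metric, model_name, is_comparison):
    for batch in val_dl:
        if is_comparison:
            batch = get_model_specific_batch(batch, model_name)
        with torch.no_grad():
            outputs = model(**batch)
        preds = outputs.logits.argmax(dim=-1)
        # collect predictions and labels from every GPU process before scoring
        preds, refs = accelerator.gather_for_metrics((preds, batch["labels"]))
        f1_metric.add_batch(predictions=preds, references=refs)
        acc_metric.add_batch(predictions=preds, references=refs)
    f1 = f1_metric.compute(average="weighted")["f1"]
    accuracy = acc_metric.compute()["accuracy"]
    return f1, accuracy
```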

After sending the batch through training and validation, we perform tracking on the values we initialized at the beginning:

Since the main process holds the references to our history-tracking data structures, we use is_main_process to append the new values. accelerator.log links up with the init_trackers call we made earlier: .log sends these values to the tracker we initialized, and in our case wandb creates graphs out of them. Finally, we use the F1 score to determine the best model over time.
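
In sketch form (variable names are illustrative):

```python
if accelerator.is_main_process:
    metric_history["train_loss"].append(train_loss)
    metric_history["val_f1"].append(val_f1)
    metric_history["val_accuracy"].append(val_acc)

# forwarded to the wandb tracker initialized earlier
accelerator.log({"train_loss": train_loss, "val_f1": val_f1, "val_accuracy": val_acc},
                step=epoch)

if val_f1 > best_f1:          # F1 decides which checkpoint is "best"
    best_f1 = val_f1
    best_model = model
```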

After the training and validation loop is done, we execute:

We start by ensuring that all processes have completed the training/validation loop and then call unwrap_model to extract the model from its distributed containers. Since the main process contains our metric histories, we use it to plot curves for each metric and calculate model statistics; we then return out the best model, curves and statistics.

Now that the training/validation loops are complete and we’ve determined a best model, we need to convert that best model to torchscript and save all the returned files to S3.

Here we call end_training since we are using wandb and use is_main_process since we no longer need distribution. accelerator.save() is the correct way to save the model to disk, but we need to convert it to torchscript to mirror production as closely as possible. Briefly, Torchscript is a way of converting a python-based model into a serializable, production-friendly format that need not have a python dependency. As such, when testing inference on an unseen test set, it is best to test on the model that would be in production. One way to convert a model is to call torch.jit.trace passing it the model and a sample instance which is how we’ve implemented the conversion:

First, we take the best model and put it in CPU and evaluation mode. We then grab a sample instance out of the training data. Next, we encounter another user-defined function ordered_input_keys(). If you recall, this function returns the parameter names for a model’s forward() function in the correct order. It probably didn’t make sense earlier why this function was needed, but now it should: the example_inputs parameter of torch.jit.trace takes a tuple of input values which must match the exact parameter ordering of the forward() function.

Now, if we’re running a comparison job, then ordered_input_keys() is going to return a dictionary of OrderedDict’s with keys based on each model’s name. Thus, we test for this scenario and use the same get_model_specific_batch() function we used during training to extract a sample instance for the current model being converted.

Next, we iterate the ordered input keys and call .unsqueeze(0) on each parameter of the sample instance. The reason is that the forward() function expects a batch size as the first dimension of the input data; .unsqueeze(0) adds a dimension of 1 to the tensors representing each parameter's data.

Now we are ready to run the trace, passing the model, the example inputs and setting two parameters to false. The strict parameter controls whether or not you want the tracer to record mutable containers. By turning this off, you can allow, for example, your outputs = model(**batch) to remain a dict instead of a tuple. But you must be sure that the mutable containers used in your model aren’t actually mutated. check_trace checks that the same inputs run through the traced code produce the same outputs; in our case, leaving this True was producing odd errors, likely because of some internal non-deterministic operations, so we set it to False. Again, the ultimate test of the performance of the model is the inference step which we will be discussing next.
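
Condensed into a sketch (assuming the dataset is already in torch format; local paths and variable names are illustrative):

```python
import os
import torch

best_model = accelerator.unwrap_model(best_model).to("cpu").eval()

sample = dataset["train"][0]                      # one training instance
ordered_keys = ordered_input_keys()               # user-defined: forward() parameter order
if config.is_comparison:
    ordered_keys = ordered_keys[model_name]       # dict of OrderedDicts keyed by model name
    sample = get_model_specific_batch(sample, model_name)

# add a batch dimension of 1 so the trace matches what forward() expects
example_inputs = tuple(sample[key].unsqueeze(0) for key in ordered_keys)

traced = torch.jit.trace(best_model, example_inputs, strict=False, check_trace=False)
traced.save(os.path.join(local_dir, "model.pt"))  # then uploaded to S3
```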

Finally, we save the traced model to local disk so it can be uploaded to s3. The final step of the train.py file is to upload all of these generated files to S3. In the case of a tuning job, we only retain the generated files from the run with the best objective metric score:

And with that, we have completed discussing the training/tuning step of the ML Pipeline. Next, we will look at the inference step where we load the torchscript model, perform inference on the unseen test set and collect statistics.

Inference

In the Training/Tuning step, we convert our best model into torchscript which means it can easily run on the CPU or multi-CPU environment. This enables us to hijack a Sagemaker Processor instance to perform our inference job. Like the previous sections, we will first look at how an inference job is initiated. Because we can use a Processor instance, it is identical to our Data Preparation step except for pointing it at our /test/ data and our inference.py file.

Refer to the Data Preparation section of the second post to learn more about Processor/ScriptProcessor jobs. Note the differences: input_source_dir points at /test/ and `code` points at inference.py. Since these are so similar, we will move on to the inference.py file.

We've discussed repeatedly the importance of run_num and how it is used to identify not only the current experiment while training but also the current model in production (so a production model can be linked back to a training experiment). inference.py uses the experiment parent directory to find the test data and the run_num to find the correct trained model.

inference.py starts by downloading the id2label file so we can translate between model predictions and human-readable predictions:

Recall from previous sections that the ML pipeline is capable of running comparison jobs (n models trained and tested on the same dataset). Inference is the step where comparison really shines, allowing you to compare performance on identical data. In the next code block, we will load n models to prepare for inference. Recall that if a single model was trained, it is passed as a list with a single value:

This loop iterates the model names, downloads/loads the torchscript converted model and initializes statistics tracking for each. Let’s take a look at each inner function:

This function constructs the S3 path where the .pt file lives and downloads it. It then calls torch.jit.load and sets the model to eval mode, ready for inference. init_model_stats initializes the values we will track per model and per label, which gives us the raw counts we can use to build statistics:

And init_metrics() simply loads the metrics we used earlier in the training step:
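
As a sketch (the S3 path handling and helper names are illustrative; they follow the parent-dir/run_num convention described above):

```python
import os

import boto3
import evaluate
import torch

def load_torchscript_model(bucket, key, local_dir):
    """Download a traced .pt file from S3 and load it for CPU inference."""
    local_path = os.path.join(local_dir, os.path.basename(key))
    boto3.client("s3").download_file(bucket, key, local_path)
    model = torch.jit.load(local_path)
    model.eval()
    return model

def init_metrics():
    """The same metrics we used during training, tracked per model."""
    return {"f1": evaluate.load("f1"), "accuracy": evaluate.load("accuracy")}
```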

Next, we get the test data from the Data Preparation step:

With the models and data loaded, we are now ready to run inference:

The inference code will use config.is_comparison repeatedly to execute code specific to comparison jobs. It starts by initializing statistics specifically for comparisons, which we will skip for now. Next, it enters the main loop, which iterates through each instance of unseen test data. The ground truth label is extracted and execution enters the inner loop over the model names (in the case of one model this is just a list with a single entry). If is_comparison is true, the data specific to the current model is extracted using the same function used in Training (get_model_specific_batch). The instance is then prepared for the forward() function using the same technique we used in convert_to_torchscript: each value gets .unsqueeze(0) called on it in order to add a batch size of 1 as the first dimension of the tensor.

We then grab the currently loaded model and pass the instance to it. We extract the most confident prediction from the returned logits by calling argmax(-1). Now let’s look at the remainder of the loop (note this begins inside the inner loop):

We take the prediction produced by the model and pass it and the ground truth to our accuracy and f1 metrics. We then increment the counters we initialized at the beginning:
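
Put together, the loop body might be sketched like this (the dict keys, counter structure, and logits access are assumptions based on the description above):

```python
import torch

for instance in test_dataset:
    label = instance["label"]                              # ground truth
    for model_name in model_names:
        batch = (get_model_specific_batch(instance, model_name)
                 if config.is_comparison else instance)
        # batch dimension of 1, mirroring the torchscript trace inputs
        inputs = tuple(batch[key].unsqueeze(0) for key in ordered_keys[model_name])
        with torch.no_grad():
            outputs = models[model_name](*inputs)
        prediction = outputs["logits"].argmax(-1).item()   # most confident class

        metrics[model_name]["f1"].add_batch(predictions=[prediction], references=[label])
        metrics[model_name]["accuracy"].add_batch(predictions=[prediction], references=[label])
        stats[model_name][label]["total"] += 1
        if prediction == label:
            stats[model_name][label]["correct"] += 1
```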

If inference.py is running a comparison job, we then add counts to the structure we initialized earlier; we will skip over these calls and jump to process_statistics which occurs after the inference code has finished looping:

This function looks intimidating, but all it is doing is calculating the F1 score and Accuracy per label, sorting the results by F1 score descending, calculating the overall F1 and Accuracy and uploading the results to S3 under the correct parent dir and run_num.

If you've followed the ML Pipeline blogs up to this point, it is worth revisiting the folder structure, laid out in the first blog post, that is built on S3 as the entire pipeline executes:

This folder structure recurs for every machine learning experiment, containing everything one would need to quickly understand the experiment or reproduce it and link an experiment to what is in production.

Prima facie, it seems like a simple part of the overall pipeline, but I believe it is one of the most important: imbuing each experiment with desirable properties like navigability, readability, reproducibility, versioning and more.

If you've been following these blogs up to this point, then you've been on quite a journey. I hope they provide some guidance in setting up your own ML Pipeline. As we continue to modify ours, we will post on blog-worthy topics, so stay tuned. You can check out the first two posts in the series here: Part One: Setup, Part Two: Data Steps.

Apr 19 2023

Pierce Lamb

This is the second post in a three part series on creating a reusable ML pipeline that is initiated with a single config file and five user-defined functions. The pipeline is finetuning-based for the purposes of classification, runs on distributed GPUs on AWS Sagemaker and uses Huggingface Transformers, Accelerate, Datasets & Evaluate, PyTorch, wandb and more.

This post originally appeared on VISO Trust’s Blog

This post will cover the two data steps, data reconciliation and data preparation. These are common steps in a ML process where data is collected, cleaned and encoded the way a model will expect. If you have landed on this post first, check out the first post in the series detailing the pipeline setup. You can also jump to the third post in the series detailing training and testing.

Data Reconciliation

Of all the pipeline steps, Data Reconciliation is the one most likely to be customized to your specific use case. It is the starting point for collecting, cleaning, and filtering the training data that will compose your experiment and getting it onto S3. In our case, the raw training data already exists as flat files on S3, while the labels required for supervised training live in a production database. This is, in fact, why I called it 'Data Reconciliation': the production database labels are being reconciled with the flat files on S3.

As it is unlikely the reader has the exact same setup, I will try to highlight some of the reusable parts of Data Reconciliation without getting too deep into our specific flavor of it. Recall that a major architectural decision in the pipeline is a separate set of training data for every experiment; the goal of this step, then, is to collect the raw data, clean it, and copy it to the bucket and folder on S3 where this experiment's storage will reside (e.g. EXP-3333-longformer/data/reconciled_artifacts).

I’ll create a distinction here between ‘artifacts’ and ‘files’ to better understand what follows. For every ‘artifact’ uploaded into our system, tens of ‘files’ are created that represent data and analysis about the given ‘artifact.’ As such, our raw data is composed of these sets of files per uniquely identified artifact.

The first step in Data Reconciliation is to collect all of the raw data. In our case, this means authenticating to a read replica of the production database and running a query that returns artifact identifiers along with their ground truth classification labels. We then collect all of the S3 file paths on the production instance of S3, keyed by the same artifact GUID identifier.

Data Reconciliation knows which S3 file paths to collect via a settings.ini value passed by the experimenter called FILES_FROM_PROD. For example, imagine each artifact has a file called raw_text.json; the experimenter would pass FILES_FROM_PROD=raw_text.json and Data Reconciliation would find the S3 path to every raw_text.json file in the production S3 bucket.

Using the artifact identifiers (GUIDs), we then filter the production database results such that both datasets contain the exact same artifact identifiers and drop duplicates using the file hash. At this point the labels and S3 paths to the flat files are now reconciled; the actual files and the label just need to be copied to the correct experiment directory.

Before that copying begins, note that we now have unique insight into the training data for this experiment. Using the filtered database results, we can discover exactly the labels that will be trained on, and the instance count per label:
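
A minimal sketch of that (the label column name is an illustrative assumption):

```python
import json

label_counts = df["classification_label"].value_counts().to_dict()

with open("unique_labels_and_counts.json", "w") as f:
    json.dump(label_counts, f, indent=2)
# then copied to <experiment>/data/unique_labels_and_counts.json on S3
```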

Where df is a pandas dataframe of the filtered database results. Now every experiment has a unique_labels_and_counts.json in its /data folder that the experimenter can interrogate to see which labels, and their counts, are associated with this training data set.

At this point, we encounter our first user-defined function. process_func is an optional function that runs after Data Reconciliation has copied the files for every artifact identifier; it gives the experimenter the opportunity to execute arbitrary code for each artifact identifier. As an example, when we go to train, we need access to the ground truth labels extracted from the production database. process_func gives us the ability to create an additional file per artifact, say, ground_truth_label.json, that contains this label. Furthermore, if one's model requires additional files to train on, for example an image of a given page, that additional file can be created here, per artifact. Because it is optional, the user may choose not to define it; thus:
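
Conceptually, something like the following (the module name is illustrative; the argument list matches the description of process_func later in this post):

```python
# process_func is optional, so fall back to a no-op when it is not defined.
try:
    from experiment_code import process_func      # user-supplied module, illustrative name
except ImportError:
    process_func = None

def run_process_func(row, existing_files, experiment_path, overwrite):
    if process_func is not None:
        process_func(row, existing_files, experiment_path, overwrite)
```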

Now that we have our reconciled data and our process_func, we have to copy data from the production S3 bucket into our experiment S3 directory. This can easily occur in parallel, so we utilize multiprocessing to kick it off as a parallel process:

This function gets the df we discussed earlier, the experiment bucket, the dict of artifact identifier (GUID) to list of desired file paths (raw_training_data_paths), the parent experiment dir (s3_artifact_path), the number of parallel processes (either a config value or multiprocessing.cpu_count()), the process_func, and a boolean that determines whether or not to overwrite.

First, it uses the same function that created raw_training_data_paths, except pointed at the experiment bucket and with EXP-3333-longformer/data/reconciled_artifacts/ as a filter. This gives us a dict of what training data already exists for the experiment, so that if Data Reconciliation failed and was restarted, we don’t copy the same data again. Next, it splits the reconciled data across processes and, for each split, spawns a process that calls the add_to_research_experiment function. Let’s take a look at that function:
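
What follows is a sketch consistent with the description below; the GUID column name and the exact shape of existing_artifacts are assumptions:

def add_to_research_experiment(df_chunk, existing_artifacts, raw_training_data_paths,
                               experiment_bucket, s3_artifact_path, process_func, reload):
    """Copy each artifact in this chunk into the experiment directory, scenario by scenario."""
    for _, row in df_chunk.iterrows():  # chunks are small, so direct iteration is acceptable
        guid = row["artifact_guid"]     # assumed column name
        prod_paths = raw_training_data_paths[guid]
        existing = existing_artifacts.get(guid, [])
        existing_names = {path.split("/")[-1] for path in existing}

        if reload:
            # Scenario 1: overwriting requested, copy everything for this artifact
            copy_to_s3(row, prod_paths, experiment_bucket, s3_artifact_path, process_func, reload)
        elif existing and len(existing_names) < len(prod_paths):
            # Scenario 2: the artifact exists but has additional files to add
            missing = [p for p in prod_paths if p.split("/")[-1] not in existing_names]
            copy_to_s3(row, missing, experiment_bucket, s3_artifact_path, process_func, reload)
        elif not existing:
            # Scenario 3: the artifact does not exist in the experiment yet
            copy_to_s3(row, prod_paths, experiment_bucket, s3_artifact_path, process_func, reload)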

The parameters to this function should be fairly straightforward given our discussion of copy_s3_data_in_parallel. The function iterates the data frame chunk directly, checking for three different copying scenarios. I am aware that iterating a data frame directly is generally frowned upon in favor of a vectorized approach; in our case, these chunks are fairly small, so it is not something we worry about. For each artifact, the function checks, first, whether overwriting (reload) was set to true; second, whether the artifact already exists in the experiment but has additional files that need to be added; and finally, whether it does not exist at all. In each case it calls a further function that copies the correct set of files. Next, let’s take a look at copy_to_s3:
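
A sketch of copy_to_s3 along those lines; PROD_BUCKET and the GUID column name are assumptions:

import boto3

s3_client = boto3.client("s3")
PROD_BUCKET = "my-prod-bucket"  # assumed; would come from settings.ini in practice

def copy_to_s3(row, file_paths, experiment_bucket, s3_artifact_path, process_func, reload):
    """Copy one artifact's files from the prod bucket, then hand off to the optional hook."""
    guid = row["artifact_guid"]  # assumed column name
    existing_files = []
    for path in file_paths:
        file_name = path.split("/")[-1]
        destination_key = f"{s3_artifact_path}/{guid}/{file_name}"
        s3_client.copy_object(
            CopySource={"Bucket": PROD_BUCKET, "Key": path},
            Bucket=experiment_bucket,
            Key=destination_key,
        )
        existing_files.append(destination_key)

    # What the experimenter's process_func receives, if one was defined
    if process_func is not None:
        process_func(row, existing_files, f"{s3_artifact_path}/{guid}", reload)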

This function is straightforward and nicely shows what gets passed to process_func if the user has defined it: the row from the df representing the current artifact, the files that exist for the artifact after copying, the experiment path, and the overwriting boolean. This gives the experimenter a lot of flexibility in what they can do per artifact.

The final step of Data Reconciliation is a validation step where we use the config value FILES_ON_RESEARCH to validate that each artifact has the files it needs for training. The reason we can’t just use the earlier FILES_FROM_PROD value is that new files may have been created in process_func, so FILES_ON_RESEARCH might look like raw_text.json, page_01.png, for example. This validation step is meant to provide some assurance that when we move on to Data Preparation, each artifact will have every file it needs and we don’t need to write code to handle missing files. So after all of our parallel processing completes, validate_data_was_created runs, which we will view in partial stub form:
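
A partial stub in that spirit; collect_experiment_file_paths and the GUID column name are assumed helpers, not the pipeline's actual names:

def validate_data_was_created(df, files_from_prod, files_on_research,
                              experiment_dir, process_func):
    """Partial stub: verify every artifact ended up with every file required for training."""
    existing = collect_experiment_file_paths(experiment_dir)  # assumed helper: GUID -> [s3 keys]

    for _, row in df.iterrows():
        guid = row["artifact_guid"]  # assumed column name
        present = {key.split("/")[-1] for key in existing.get(guid, [])}
        missing = [name for name in files_on_research if name not in present]

        for file_name in missing:
            if file_name in files_from_prod:
                ...  # re-copy the file from the prod S3 bucket
            else:
                ...  # re-run process_func for this artifact to regenerate the file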

This function takes the full df, the list of desired files defined by FILES_FROM_PROD, the list of desired files that should be in the experiment (FILES_ON_RESEARCH), the experiment directory (EXP-3333-longformer/data/reconciled_artifacts/) and the user-defined process_func. It collects all the existing file paths for the given experiment and iterates them, popping file names off FILES_ON_RESEARCH to check that they exist for each artifact. If files are missing, it determines whether they are FILES_FROM_PROD files, in which case it retrieves them from the prod S3 bucket, or process_func files, in which case it re-runs process_func to generate them. Once this step is complete, we can have high confidence that all of our raw training data files exist for each artifact. As such, we can move on to Data Preparation.

Data Preparation

The data preparation step is meant to take the raw training files for the experiment and encode them so they are prepared to be input into a model’s forward() function. For this task, we will utilize the HuggingFace Datasets library and specifically its powerful map() function. This is also the first task that will utilize Sagemaker, specifically Sagemaker Processor jobs.

Let’s start by taking a look at how the Processor job is constructed and called. First, we utilize the Sagemaker Python SDK’s ScriptProcessor class. This allows us to run an arbitrary script on a Processor instance. Creating the ScriptProcessor object will look like:
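
Roughly, and with the config attribute names shown here treated as illustrative rather than the pipeline's actual settings, the construction might look like:

from sagemaker.processing import ScriptProcessor

# config is the parsed settings.ini object; attribute names below are assumptions
script_processor = ScriptProcessor(
    role=config.sagemaker_role_arn,                  # IAM role the job assumes
    image_uri=config.docker_image_path,              # the experiment's ECR image
    command=["python3"],                             # how the script passed to run() is executed
    instance_type=config.processor_instance_type,    # e.g. "ml.m5.4xlarge"
    instance_count=config.processor_instance_count,
    base_job_name=f"{config.s3_parent_dir}-data-preparation",
)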

As you can see, this construction is basically defined by config values. Arguably the most important is config.docker_image_path. This carefully constructed docker image, which we spoke about in the first post in this series, is re-used among all Sagemaker jobs (Processor/Training/Tuning). We spoke in the first post about how an experimenter extends a base image that contains all common dependencies like CUDA-enabled PyTorch, transformers, datasets, accelerate, numpy, etc., and adds any of their model-specific dependencies. That base image also contains lines that allow it to run on these different Sagemaker instance types; we’ll discuss one now and more during our discussion of training:

Sagemaker Training/Tuning jobs always look in the /opt/ml/code directory for custom dependencies while Processor jobs look in /opt/ml/processing. These lines copy all of our ML pipeline code into these directories to ensure that all custom dependencies are available in either type of job. Now if we jump back over to where we constructed the ScriptProcessor object, this is how we kick off the job:
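
The kickoff might look something like the following; the config attribute names and local paths are assumptions:

from sagemaker.processing import ProcessingInput, ProcessingOutput

reconciled_s3_uri = (
    f"s3://{config.experiment_bucket}/{config.s3_parent_dir}/data/reconciled_artifacts/"
)

script_processor.run(
    code="src/preprocessing/data_preparation.py",        # the script the Processor job executes
    inputs=[
        ProcessingInput(
            source=reconciled_s3_uri,                     # copied to local disk before the script runs
            destination=config.sagemaker_local_data_dir,  # SAGEMAKER_LOCAL_DATA_DIR
        )
    ],
    outputs=[
        ProcessingOutput(
            source=config.sagemaker_local_output_dir,     # assumed; not used by data_preparation.py yet
            destination=f"s3://{config.experiment_bucket}/{config.s3_parent_dir}/data/prepared_data/",
        )
    ],
)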

One feature of Processor jobs that is easy to miss is that before the script is executed, Sagemaker copies everything from the S3 URI provided in the source param onto local disk at the destination path. Building your script around this fact will give you huge performance benefits, which we will discuss more later on. Another important point that may not be immediately obvious is that the command param combined with the code param is basically like defining an ENTRYPOINT for the Processor job. While it’s not exactly accurate, you can imagine these params creating this command in the container:

ENTRYPOINT ["python3", "/opt/ml/code/src/preprocessing/data_preparation.py"]

So the code above constructs the S3 URI to the reconciled artifacts we created in the Data Reconciliation step and passes it in the source param, and the Processor job copies all of this data to local disk before it kicks off. SAGEMAKER_LOCAL_DATA_DIR defines where that data will be copied and is specified in data_preparation.py so the path can be used there as well. Processor jobs can output data, which is why I’ve defined outputs, but for now the data_preparation.py script is not utilizing this feature. Now that we’ve discussed how it is kicked off, we can take a look at encoding data in data_preparation.py.

The first step in encoding is to define the S3 directory where data will be saved and to get the label file we produced during Data Reconciliation. We read a config value, ENCODED_DATA_DIR, to get the encoded data dir. The value will typically be full_dataset, but it gives the experimenter the ability to produce smaller test datasets if desired (e.g. partial_dataset). So the full path will look like:

encoded_data_dir = f"{config.s3_parent_dir}/data/prepared_data/{config.encoded_data_dir}"

Or EXP-3333-longformer/data/prepared_data/full_dataset

Next, we get the unique_labels_and_counts.json file we uploaded during Data Reconciliation as our ground truth for supervised learning. We give the experimenter the ability to modify the ground truth here through some basic knobs: IGNORED_LABELS and NUM_LABELS_THRESHOLD; I could imagine a number of other options here. These knobs are self-explanatory:
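
In spirit, applying those knobs might look like this (the function and parameter names are illustrative):

def filter_labels(labels_and_counts, ignored_labels, num_labels_threshold):
    """Apply the IGNORED_LABELS and NUM_LABELS_THRESHOLD knobs to the ground truth."""
    return {
        label: count
        for label, count in labels_and_counts.items()
        if label not in ignored_labels and count >= num_labels_threshold
    }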

After modifying the labels the way the experimenter wants, execution moves on to the get_artifact_paths function. This function gets the paths on local disk that raw training data was copied to and returns them in a format that the Huggingface Datasets library will expect:
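
A hedged sketch of get_artifact_paths as described in the next few paragraphs, assuming one subdirectory per artifact GUID on local disk:

import os
from typing import Dict, List

def get_artifact_paths(local_data_dir: str, model_input_files: List[str]) -> Dict[str, List[str]]:
    """Build {file_name: [path, path, ...]} from the raw data copied to local disk."""
    artifact_paths = {file_name: [] for file_name in model_input_files}

    for guid in os.listdir(local_data_dir):            # one subdirectory per artifact GUID
        artifact_dir = os.path.join(local_data_dir, guid)
        if not os.path.isdir(artifact_dir):
            continue
        file_names = set(os.listdir(artifact_dir))
        if not all(name in file_names for name in model_input_files):
            continue                                   # skip artifacts missing a required input file
        for file_name in model_input_files:
            artifact_paths[file_name].append(os.path.join(artifact_dir, file_name))

    return artifact_paths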

get_artifact_paths is called using the same path we passed to Processor.run() to define where data should be copied, along with the results of the MODEL_INPUT_FILES config param. Following our example, this value would simply be [raw_text.json]. A datasets.arrow_dataset.Dataset is eventually going to expect data formatted so that each row constitutes an instance of training data and each column represents the path to the needed input file. In our case it would look like:

This would be easy to represent in pandas, but since we’d prefer to not depend on pandas and will utilize Dataset.from_dict(), get_artifact_paths represents this structure using the file names as keys and lists to contain the paths.

Execution then enters the directory defined in SAGEMAKER_LOCAL_DATA_DIR and extracts the list of subdirs which, in our case, are GUIDs for each artifact. It iterates these subdirs, collecting the filenames for all files that are children of each subdir. It then uses the passed MODEL_INPUT_FILES to validate that each needed file is there and adds it to the artifact_paths dict. We now have a dict that is ready for Datasets processing.

Control now moves to a get_encoded_data() function that will kick off datasets.arrow_dataset.Dataset.map(), which is a very powerful abstraction for encoding datasets. get_encoded_data is intended to set up the map() function for parallel processing of raw training data encoding and is the main part of the Data Preparation step:

This function sets up the mapper, executes it, splits the returned encoded data and saves the split, encoded data to S3. The function takes the get_artifact_paths data we just generated (as data), a list of just the labels from unique_labels_and_counts.json, a few directory paths and the number of parallel processes to spin up. It starts by generating two label dicts in handle_labels, label2id.json and id2label.json, which will be used downstream to convert between the integer values predicted by the model and the actual string labels.

Next, one of our user-defined functions, get_dataset_features, is called. As you may have noticed from the hints in the Datasets classpaths, Datasets uses PyArrow as the backend for writing and reading data. PyArrow needs to enforce a schema for the data it writes and reads; get_dataset_features allows the experimenter to write that schema. This function returns a Datasets Features object which packages up this schema for the backend. Following our Longformer example, this function might look like:
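
A hedged version for a Longformer-style classifier; the exact feature types are assumptions:

from datasets import ClassLabel, Features, Sequence, Value

def get_dataset_features(labels):
    """Schema for the encoded dataset; keys mirror the model's forward() parameters."""
    return Features(
        {
            "input_ids": Sequence(Value("int32")),
            "attention_mask": Sequence(Value("int8")),
            "labels": ClassLabel(names=labels),   # mapped to integer ids by the Datasets backend
        }
    )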

The keys here represent the parameters the Longformer forward() function will expect when performing the forward pass. Now that we have these features, we can call Dataset.from_dict() on our get_artifact_paths data and we are fully ready for the mapper. The mapper has a variety of options, but the core concept is applying a function to every instance of training data that encodes and returns it. Let’s take a closer look at the call in Data Preparation:
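
The call might look roughly like this; the variable names follow the surrounding discussion and are otherwise assumptions:

from datasets import Dataset

# data: the dict built by get_artifact_paths; labels: the label list;
# features: the Features object from get_dataset_features; num_processes: a config value
dataset = Dataset.from_dict(data)

encoded_dataset = dataset.map(
    preprocess_data,                                      # executed per batch, in each worker process
    fn_kwargs={"labels": labels, "features": features},   # extra arguments for preprocess_data
    batched=True,                                         # batches let preprocess_data drop filtered instances
    features=features,                                    # schema PyArrow writes and reads with
    remove_columns=dataset.column_names,                  # drop the raw file-path columns after encoding
    num_proc=num_processes,                               # parallel worker processes
)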

Here we pass the function we want to execute per instance, preprocess_data; fn_kwargs allows us to specify additional parameters we want to pass to that function; batched means that preprocess_data will receive batches of data instead of single instances, which allows us to perform additional filtering; features is the Features object we retrieved from get_dataset_features; remove_columns drops the original path columns so they aren’t carried into the encoded output; and num_proc sets the number of processes to run in parallel.

With this in place, we can take a look at def preprocess_data which is executed by each process in parallel:
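
A sketch of preprocess_data consistent with that description; encode_data is the user-defined function discussed next:

def preprocess_data(batch, labels, features):
    """Encode a batch of raw-file paths into model inputs, dropping filtered instances."""
    lengths = {len(paths) for paths in batch.values()}
    assert len(lengths) == 1, "every column in the batch must have the same length"
    batch_length = lengths.pop()

    encoded_batch = {feature_name: [] for feature_name in features}
    for index in range(batch_length):
        single_instance = {file_name: batch[file_name][index] for file_name in batch}
        encoded = encode_data(single_instance, labels)   # user-defined; may return None
        if encoded is None:
            continue                                     # filtered out by the experimenter
        for feature_name in features:
            encoded_batch[feature_name].append(encoded[feature_name])

    return encoded_batch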

The function first validates that each column of data has the exact same length and returns that length so it can be iterated over. It then iterates the batch, constructing a single instance and passing it to another user-defined function, encode_data. encode_data gives the experimenter the ability to define exactly how a single training instance is encoded with the option of returning None if additional filtering is desired. For instance, say we were using a Huggingface Transformers Tokenizer to encode; a single_instance here represents the file paths to the data we need, so we would get that data, say, in a variable called text_content and call something like this:
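
For example, with the checkpoint name as an assumption:

from transformers import AutoTokenizer

# Built once at module level so it isn't re-constructed on every call to encode_data
TOKENIZER = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")  # assumed checkpoint

encoded_input = TOKENIZER(
    text_content,            # the raw text read from the artifact's raw_text.json
    truncation=True,
    padding="max_length",
    max_length=4096,         # Longformer's maximum sequence length
)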

Here TOKENIZER is defined as a constant outside the function so it’s not re-constructed each time the function is called. If we continue following preprocess_data, we can see that it simply skips any single_instance for which encode_data returns None. Finally, the encoded input is returned to the mapper in the correct Features format.

I’m going to skip looking at get_train_valid_test_split(), but suffice it to say that it uses Datasets’ built-in dataset.train_test_split() to split the data using percentages and writes a metadata file that shows the experimenter the counts of each split and their associated labels.

And with that, Data Preparation is complete. Recall from the beginning that this will run as a ScriptProcessor job on a Sagemaker Processor instance. These instances tend to have lots of vCPUs and can really take advantage of the parallel processing we’re doing in the mapper. The encoded data will end up on S3, ready to be downloaded by a Training or Tuning job, which is discussed in the third post in this series. You can jump to the first and third posts via these links: Part One: Setup, Part Three: Training and Inference.

Apr 19 2023
Apr 19

Or rather, creating a reusable ML Pipeline initiated by a single config file and five user-defined functions that performs classification, is finetuning-based, is distributed-first, runs on AWS Sagemaker, uses Huggingface Transformers, Accelerate, Datasets & Evaluate, PyTorch, wandb and more.

This post originally appeared on VISO Trust’s Blog

This is the introductory post in a three part series. To jump to the other posts, check out Creating a ML Pipeline Part 2: The Data Steps or Creating a ML Pipeline Part 3: Training and Inference

Introduction

On the Data & Machine Learning team at VISO Trust, one of our core goals is to provide Document Intelligence to our auditor team. Every document that passes through the system is subject to collection, parsing, reformatting, analysis, reporting and more. Part of that intelligence is automatically determining what type of document has been uploaded into the system. Knowing what type of document has entered the system allows us to perform specialized analysis on that document.

The task of labeling or classifying a thing is a traditional use of machine learning; however, classifying an entire document — which, for us, can be up to 300+ pages — is on the bleeding edge of machine learning research. At the time of this writing, researchers are racing to use the advances in Deep Learning, and specifically in Transformers, to classify documents. In fact, at the outset of this task, I performed some research on the space with keywords like “Document Classification/Intelligence/Representation” and came across nearly 30 different papers that use Deep Learning and were published between 2020 and 2022. Those familiar with the space will recognize names like LayoutLM/v2/v3, TiLT/LiLT, SelfDoc, StructuralLM, Longformer/Reformer/Performer/Linformer, UDOP and many more.

This result convinced me that trying a multitude of these models would be a better use of our time than trying to decide which was the best among them. As such, I decided to pick one and use the experience of fine-tuning it as a proof-of-concept to build a reusable ML pipeline the rest of my team could use. The goal was to reduce the time to perform an experiment from weeks to a day or two. This would allow us to experiment with many of the models quickly to decide which are the best for our use case.

The result of this work was an interface where an experimenter writes a single config file and five user defined functions that kick off data reconciliation, data preparation, training or tuning and inference testing automatically.

When I set out on that proof-of-concept (pre-ML Pipeline), it took over a month to collect and clean the data, prepare the model, perform inference and get everything working on Sagemaker using distribution. Since building the ML Pipeline, we’ve used it repeatedly to quickly experiment with new models, retrain existing models on new data, and compare the performance of multiple models. The time required to perform a new experiment is about half a day to a day on average. This has enabled us to iterate incredibly fast, getting models in production in our Document Intelligence platform quickly.

What follows is a description of the above Pipeline; I hope that it will save you from some of the multi-day pitfalls I encountered building it.

ML Experiment Setup

An important architectural decision we made at the beginning was to keep experiments isolated and easily reproducible. Every time an experiment is performed, it has its own set of raw data, encoded data, docker files, model files, inference test results, etc. This makes it easy to trace a given experiment across repos, S3, and metrics tools, and to see where it came from once it is in production. However, one trade-off worth noting is that training data is copied separately for every experiment; for some orgs this simply may be infeasible and a more centralized solution is necessary. With that said, what follows is the process of creating an experiment.

An experiment is created in an experiments repo and tied to a ticket (e.g. JIRA) like EXP-3333-longformer. This name will follow the experiment across services; for us, all storage occurs on S3, so in the experiment's bucket, objects will be saved under the EXP-3333-longformer parent directory. Furthermore, in wandb (our tracker), the top level group name will be EXP-3333-longformer.

Next, example stubbed files are copied in and modified to the particulars of the experiment. This includes the config file and user-defined function stubs mentioned above. Also included are two docker files: one represents the dependencies required to run the pipeline, the other represents the dependencies required to run the different stages on AWS Sagemaker (data preparation, training or tuning, and inference). Both of these docker files are made simple by extending from base docker files maintained in the ML pipeline library; the intent is that they only need to include extra libraries required by the experiment. This follows the convention established by AWS’s Deep Learning Containers (DLCs) and, in fact, our base sagemaker container starts by extending one of these DLCs.

There is an important trade off here: we use one monolithic container to run three different steps on Sagemaker. We preferred a simpler setup for experimenters (one dockerfile) versus having to create a different container per Sagemaker step. The downside is that for a given step, the container will likely contain some unnecessary dependencies which make it larger. Let’s look at an example to solidify this.

In our base Sagemaker container, we extend:

FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04

This gives us pytorch 1.10.2 with cuda 11.3 bindings, transformers 4.17, python 3.8 and ubuntu all ready to run on the GPU. You can see available DLCs here. We then add sagemaker-training, accelerate, evaluate, datasets and wandb. Now when an experimenter goes to extend this image, they only need to worry about any extra dependencies their model might need. For example, a model might depend on detectron2 which is an unlikely dependency among other experiments. So the experimenter would only need to think about extending the base sagemaker container and installing detectron2 and be done worrying about dependencies.

With the base docker containers in place, the files needed for the start of an experiment would look like:

In brief, these files are:

  • settings.ini: A single (gitignored) configuration file that takes all settings for every step of the ML pipeline (copied into the dockerfiles)
  • sagemaker.Dockerfile: Extends the base training container discussed above and adds any extra model dependencies. In many cases the base container itself will suffice.
  • run.Dockerfile: Extends the base run container discussed above and adds any extra run dependencies the experimenter needs. In many cases the base container itself will suffice.
  • run.sh: A shell script that builds and runs run.Dockerfile.
  • build_and_push.sh: A shell script that builds and pushes sagemaker.Dockerfile to ECR.
  • user_defined_funcs.py: Contains the five user defined functions that will be called by the ML pipeline at various stages (copied into the dockerfiles). We will discuss these in detail later.

These files represent the necessary and sufficient requirements for an experimenter to run an experiment on the ML pipeline. As we discuss the ML pipeline, we will examine these files in more detail. Before that discussion, however, let’s look at the interface on S3 and wandb. Assume that we’ve set up and run the experiment as shown above. The resulting directories on S3 will look like:

The run_number will increment with each subsequent run of the experiment. This run number will be replicated in wandb and also prefixed to any deployed endpoint for production so the exact run of the experiment can be traced through training, metrics collection and production. Finally, let’s look at the resulting wandb structure:

I hope that getting a feel for the interface of the experimenter will make it easier to understand the pipeline itself.

The ML pipeline

The ML pipeline will (eventually) expose some generics that specific use cases can extend to modify the pipeline for their purposes. Since it was recently developed in the context of one use case, we will discuss it in that context; however, below I will show what it might look like with multiple:

Let’s focus in on ml_pipeline:

The environment folder will house the files for building the base containers we spoke of earlier, one for running the framework and one for any code that executes on Sagemaker (preprocessing, training/tuning, inference). These are named using the same conventions as AWS DLCs so it is simple to create multiple versions of them with different dependencies. We will ignore the test folder for the remainder of this blog.

The lib directory houses our implementation of the ML pipeline. Let’s zoom in again on just that directory.

Let’s start with run_framework.py since that will give us an eagle eye view of what is going on. The skeleton of run_framework will look like this:
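
A hedged skeleton, with the module paths and config attribute names as assumptions:

import importlib

from lib.config import MLPipelineConfig   # assumed module path for the config object

def run_framework():
    """Parse the experiment's settings.ini, validate it, then run the requested steps."""
    config = MLPipelineConfig()

    # Import the use case's user-defined functions from an experimenter-set config value,
    # e.g. USE_CASE="document_classification"
    use_case = importlib.import_module(f"use_cases.{config.use_case}.user_defined_funcs")

    # Fail fast on misconfigurations before any expensive step starts
    validate_config(config, use_case)

    if config.run_reconciliation:
        run_data_reconciliation(config, use_case.process_func)
    if config.run_preparation:
        run_data_preparation(config)
    if config.run_training:
        run_training(config)
    if config.run_tuning:
        run_tuning(config)
    if config.run_inference:
        run_inference(config)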

The settings.ini file a user defines for an experiment will be copied into the same dir (BASE_PACKAGE_PATH) inside each docker container and parsed into an object called MLPipelineConfig(). In our case, we chose to use Python Decouple to handle config management. In this config file, the initial settings are: RUN_RECONCILIATION/PREPARATION/TRAINING/TUNING/INFERENCE so the pipeline is flexible to exactly what an experimenter is looking for. These values constitute the conditionals above.

Note the importlib line. This line allows us to import use-case specific functions and pass them into the steps (shown here is just data reconciliation) using an experimenter-set config value for use case.

The moment the config file is parsed, we want to run validation to identify misconfigurations now instead of in the middle of training. Without getting into too much detail on the validation step, here is what the function might look like:
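
A sketch of that validation, using inspect to catch stubbed user-defined functions and boto3 to check for an existing RUN_NUM; the exact checks and attribute names are assumptions:

import inspect

import boto3

def validate_config(config, use_case_module):
    """Fail fast on misconfigurations before any expensive step starts."""
    _validate_funcs(use_case_module)
    _validate_run_num(config)

def _validate_funcs(use_case_module):
    # Every user-defined function must exist and must not still be the `pass` stub
    for name, func in inspect.getmembers(use_case_module, inspect.isfunction):
        if inspect.getsource(func).strip().endswith("pass"):
            raise ValueError(f"{name} is still a stub; implement it before running.")

def _validate_run_num(config):
    # Throw if the settings.ini-defined RUN_NUM already exists on S3 for this experiment
    response = boto3.client("s3").list_objects_v2(
        Bucket=config.experiment_bucket,
        Prefix=f"{config.s3_parent_dir}/{config.run_num}/",
        MaxKeys=1,
    )
    if response.get("KeyCount", 0) > 0:
        raise ValueError(f"RUN_NUM {config.run_num} already exists on S3 for this experiment.")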

The _validate_funcs function ensures that functions with those definitions exist and that they are not defined as pass (i.e. a user has created them and defined them). The user_defined_funcs.py file above simply defines them as pass, so a user must overwrite these to execute a valid run. _validate_run_num throws an exception if the settings.ini-defined RUN_NUM already exists on s3. This saves us from common pitfalls that could occur an hour into a training run.

We’ve gotten to the point now where we can look at each pipeline step in detail. You can jump to the second and third post via these links: Part Two: The Data Steps, Part Three: Training and Inference.

Apr 18 2023
Apr 18

In the PHP language, autoloading is a way to automatically include class files of a project in your code. Say you had a complex object-oriented PHP project with more than a hundred PHP classes. You'd need to ensure all your classes were loaded before using them. This article aims to help you understand the what, why, and how of autoloading, along with namespaces and the use keyword, in PHP.

What is autoloading?

In a complex PHP project, you're probably using hundreds of classes. Without autoloading, you'd likely have to include every class manually. Your code would look like this:

This is tedious at best and unmaintainable at worst.

What if, instead, you could have PHP automatically load class files when you need it? You can, with autoloading.

PHP autoloading 101

It only takes two steps to create an autoloader.

  1. Write a function that looks for files that need to be included.
  2. Register that function with the spl_autoload_register() core PHP function.

Here's how to do that for the above example:

There you go. You no longer have to manually require_once every single class file in the project. Instead, with your autoloader, the system automatically requires a file as its class is used.

For a better understanding of what's going on here, walk through the exact steps in the above code:

  1. The function my_custom_autoloader expects one parameter called $class_name. Given a class name, the function looks for a file with that name and loads that file.
     
  2. The spl_autoload_register() function in PHP expects one callable parameter. A callable parameter can be many things, such as a function name, class method, or even an anonymous function. In this case, it's a function named my_custom_autoloader.
     
  3. The code is therefore able to instantiate a class named SomeClass1 without first having required its PHP file.

So what happens when this script is run?

  1. PHP realizes that there's not yet a class named SomeClass1 loaded, so it executes registered autoloaders.
     
  2. PHP executes the custom autoload function (my_custom_autoloader), and it passes in the string SomeClass1 as the value for $class_name.
     
  3. The custom function builds the expected path as $file = __DIR__.'/includes/SomeClass1.php';, checks that the file exists (file_exists()), and then (as long as the file is found) loads it with require_once __DIR__.'/includes/SomeClass1.php';. As a result, the class's PHP file is automatically loaded.

Huzzah! You now have a very simple autoloader that automatically loads class files as those classes are instantiated for the first time. In a moderately sized project, you've saved yourself from writing hundreds of lines of code.

What are PHP namespaces?

Namespaces are a way to encapsulate like functionalities or properties. An easy (and practical) analog is an operating system's directory structure. The file foo.txt can exist in both the directory /home/greg and in /home/other, but two copies of foo.txt cannot coexist in the same directory.

In addition, to access the foo.txt file outside of the /home/greg directory, you must prepend the directory name to the file name using the directory separator to get /home/greg/foo.txt.

You define a namespace at the top of a PHP file using the namespace keyword:

In the above example, I've encapsulated the do_something() function within the Jonathan namespace. Most importantly, this means it doesn't conflict with functions of the same name in the global scope.

For example, say you have the above code in its own file named jonathan-stuff.php. In a separate file, you have this:

No conflict. You have two functions named do_something(), and they are able to co-exist with one another.

Now all you have to do is figure out how to access the namespaced methods. This is done with a syntax very similar to a directory structure, with backslashes:

This code executes the function named do_something() residing within the Jonathan namespace.

This method is also (and more commonly) used with classes. For example:

<?php

namespace Jonathan;

class SomeClass { }

This can be instantiated like so:

With namespaces, very large projects can contain many classes that share the same name without any conflicts. Pretty sweet, right?

What problems do namespaces solve?

To see the benefits namespaces provide, you have only to look back in time to a PHP without namespaces. Before PHP version 5.3, you couldn't encapsulate classes, so they were always at risk of conflicting with another class of the same name. It was (and still is, to some degree) not uncommon to prefix class names:

As you can imagine, the larger the code base, the more classes, and the longer the prefixes. Don't be surprised if you open an old PHP project some time and find a class name more than 60 characters long, like:

What's the difference between writing a long class name like that and writing a long class name like \Jonathan\SomeEntity\SomeBundle\SomeComponent\Validator? That's a great question, and the answer lies in the ease of using that class more than once in a given context. Imagine you had to make use of a long class name multiple times within a single PHP file. Currently, you have two ways of doing this.

Without namespaces:

Oof, that's a lot of typing. Here it is with a namespace:

Elsewhere in the code:

That certainly isn't much better. Luckily, there's a third way. You can leverage the use keyword to pull in a namespace.

The use keyword

The use keyword imports a given namespace into the current context. This allows you to make use of its contents without having to refer to its full path every time you use it.

Now you can do this:

Aside from encapsulation, importing is the real power of namespaces.

Now that you have an idea of what both autoloading and namespaces are, you can combine them to create a reliable means of organizing your project files.

PSR-4: The standard for PHP autoloading and namespaces

PHP Standard Recommendation (PSR) 4 is a commonly used pattern for organizing a PHP project so that the namespace for a class matches the relative file path to the file of that class.

For example, you're working in a project that makes use of PSR-4 and you have a namespaced class called \Jonathan\SomeBundle\Validator();. You can be sure the file for that class can be found in this relative location in the file system: /Jonathan/SomeBundle/Validator.php.

Just to drive this point home, here are more examples of where a PHP file exists for a class within a project making use of PSR-4:

  • Namespace and class: \Project\Fields\Email\Validator()
    • File location: /Project/Fields/Email/Validator.php
       
  • Namespace and class: \Acme\QueryBuilder\Where
    • File location: /Acme/QueryBuilder/Where.php
       
  • Namespace and class: \MyFirstProject\Entity\EventEmitter
    • File location: /MyFirstProject/Entity/EventEmitter.php

This isn't actually 100% accurate. Each component of a project has its own relative root, but don't discount this information: Knowing that PSR-4 implies the file location of a class helps you easily find any class within a large project.

How does PSR-4 work?

PSR-4 works because it is paired with an autoloader function that maps namespaces to file paths. Take a look at one example of a PSR-4 autoloader function:

Now assume you've just instantiated the new \Foo\Bar\Baz\Bug(); class.

  1. PHP executes the autoloader with the $class parameter using the string value $class = "\Foo\Bar\Baz\Bug".
     
  2. Use str_replace() to change all backslashes into forward slashes (like most directory structures use), turning the namespace into a directory path.
     
  3. Look for the existence of that file in the location /src/Foo/Bar/Baz/Bug.php.
     
  4. If the file is found, load it.

In other words, you change Foo\Bar\Baz\Bug to /src/Foo/Bar/Baz/Bug.php then locate that file.

Composer and autoloading

Composer is a command-line PHP package manager. You may have seen a project with a composer.json file in its root directory. This file tells Composer about the project, including the project's dependencies.

Here's an example of a simple composer.json file:

{
    "name": "jonathan/example",
    "description": "This is an example composer.json file",
    "require": {
        "twig/twig": "^1.24"
    }
}

This project is named "jonathan/example" and has one dependency: the Twig templating engine (any version from 1.24 up to, but not including, 2.0).

With Composer installed, you can use the JSON file to download the project's dependencies. In doing so, Composer generates an autoload.php file that automatically handles autoloading the classes in all of your dependencies.

(Jonathan Daggerheart, CC BY-SA 4.0)

If you include this new file in a project, all classes within your dependency are automatically loaded, as needed.

PSR makes PHP better

Because of the PSR-4 standard and its widespread adoption, Composer can generate an autoloader that automatically handles loading your dependencies as you instantiate them within your project. The next time you write PHP code, keep namespaces and autoloading in mind.

Apr 17 2023
Apr 17

Drupal 7 is going end-of-life eventually. While this had originally been scheduled for a date that passed a couple of years ago, support was extended and may well end in November of this year - although that may still get extended again for another year or two. That's why there are plenty of discussions in and around the Drupal community about what should be done with all those still existing and mostly deeply necessary Drupal 7 sites. The options are:

  • Upgrading to Drupal 10
  • Staying on Drupal 7
  • Switching to another platform, e.g. Backdrop, WordPress or others

Why is the first option certainly the best?

Let's face it: every Drupal site is hosted on that platform for good reasons - different reasons for each of them. While the rich feature set is important to most, the performance, scalability and security are must-haves, and not only for enterprise websites. A web presence of any size and purpose benefits from those aspects and many others; this blog post is not about to enumerate them all. But we know for sure that hundreds of thousands of Drupal 7 sites haven't moved away from the platform even after all these years, simply because Drupal is what suits them best. That's why moving away from Drupal is rarely a great idea.

So, why are so many still holding back on updating to the modern Drupal framework?

One out of these three reasons is always brought up - sometimes even all of them at once:

  • Missing resources, either budget or people power
  • Complexity of the modern technology
  • Missing functionality due to not yet updated modules

There is a lot to say about the resource constraints. While I can't speak for every individual case out there, what's very common these days is an unfortunate focus on short-term indicators. In the context of an upgrade to Drupal 10, that approach misses out on important gains, the composable architecture of the modern Drupal platform in particular. In other words, the initial jump might be more challenging, but from then on, the platform is maintainable and easily upgradable, even across major updates, for decades to come. Drupal has demonstrated this with two major updates already, from 8 to 9 and then 9 to 10. Not only has the Drupal community kept up an easy upgrade path, their experience helps them make it even easier for each future generation of Drupal.

Having said that, the same infrastructure paradigm shift is the reason why composer-based dependency management is often perceived to be complicated, where in reality the opposite is true. I'm not saying there isn't another learning curve ahead of us. There is one, and there always will be - not only in this post-Drupal-7 era, but everywhere in life. And that's a good thing, because if we didn't progress, we would be falling behind continuously. Not only that, the technical debt of our old technology will cause hidden costs over and over again. In other words, not having a budget for a Drupal 7 to 10 update makes me wonder where the budget for the continued use of either outdated or, in the case of switching platforms, less capable technology should come from. While the ongoing effort for running the modern Drupal platform declines immediately, the cost of not upgrading increases exponentially for as long as the decision is not taken.

This debate is difficult, I know. And I feel sorry for alluding to this, as it may be challenging for many. Therefore, let's switch gears and have a look into the third reason why so many Drupal 7 sites haven't updated yet: missing functionality due to not yet updated modules.

Does Drupal 10 provide everything needed?

Drupal core has never been faster, it has never been more stable, and it has never been easier to get started with, all while being the most user-friendly Drupal we've ever seen. And that describes just the status quo. So many initiatives are working hard, day and night, to move Drupal forward on all technical levels. So, yes, Drupal 10 is ready for prime time.

However, there is probably not a single Drupal site that works without additional modules that are not part of its core package. And from the perspective of a Drupal 7 site owner or maintainer, it is likely that a number of the modules used on Drupal 7 don't seem to be available for Drupal 10. Even worse, it looks as if some of those modules haven't even tried, or have officially declared that they won't ever upgrade to modern Drupal.

As a Drupal service provider, we at LakeDrops have been in that situation with many customer projects over the past years as well. What we've learned, though, is that there is always a solution when upgrading. Sometimes it's not the same module in Drupal 10 that used to do the job back in the old days. Just like with spring-cleaning, moving to a modern technical platform comes with some re-structuring and re-thinking of how certain tasks should be done.

This is particularly true for the ECA module, which we've helped to architect, develop and continue to maintain, together with a growing team of Drupal community members. It's an event-driven no-code solution which allows configuring the execution of any Drupal-provided action under configurable conditions. Hence the name ECA: it stands for Events - Conditions - Actions. Very much like the famous Rules module in Drupal 7, ECA allows the site builder to configure the behaviour of their Drupal site in literally all areas without having to hire programmers. And ECA does so much more that we even receive "love letters" from formerly frustrated Drupal 7 users who couldn't seem to find what they needed in Drupal 10, until they found ECA. More on that in a minute.

How can ECA help to upgrade your Drupal 7 site?

The Rules module is one of, if not the, most popular Drupal 7 modules. As of this writing in spring 2023, over 150,000 Drupal sites are using it, of which more than 90% are still on Drupal 7. Without judging, those users don't feel comfortable with the Drupal 9 or Drupal 10 version of that module and have therefore been locked into their Drupal 7 environment. But that's not all: a significant number of mostly lesser-known modules are no longer maintained and don't seem to have alternative solutions.

With ECA, all Drupal 7 users can shift their showstoppers aside and get onto Drupal 10. It's why we started the ECA project in the first place a couple of years ago. Most of our Drupal 7 customers couldn't afford to stay on Drupal 7, and we had to provide a solution to move forward. And ECA has stood up not only for us and our customers; it is growing in popularity in seemingly all areas where Drupal is being utilized. It's in production for huge and extremely complex enterprise-grade web applications as well as for large and medium commercial websites, online shops, intranets and portals. ECA also loves the small ones, believe it or not, as it is the solution for all those "tiny" requirements on personal blogs or other small websites as much as for the challenging tasks.

No doubt, while ECA provides the site owners with access to all the power of Drupal from within the admin interface, it has proven to be solid, maintainable, yet non-intrusive to the rest of the application or website. But is all that technical excellence convincing enough to get off the island and come to Drupal 10? No, there needs to be more.

Right from the beginning, the ECA team has been transparent, approachable and welcoming to other maintainers and users alike. This has led to a flourishing community of people who are interested in, working with, developing for, or otherwise getting involved with ECA. As already mentioned, there are so many users who have already managed to upgrade from Drupal 7 to 10 or are right in the middle of that process. When reading their comments, e.g. in the #ECA channel of Drupal Slack, it becomes obvious how much burden ECA has taken off their shoulders.

Are you next?

Apr 17 2023
Apr 17

Nowadays, NGOs and nonprofits need as much exposure as they can get. The days of local advertisements are beginning to wane. We are now in the digital age. The era of online and the Internet.

It’s almost impossible to overstate the impact the Internet has had on the modern world. Of the many things the Internet has allowed people to do, one of them is the ability for a message to reach more people than ever thought possible, from places all around the world. Any organization that wants to put its name out there and be heard now has to create a website.


Maximum exposure may not be the only reason nonprofits or NGOs want a website, however. Recently, we see a shift by many big organizations to have a larger online presence. Many companies offer their services through their main website and communicate with their patrons on social media. While getting your message out there may be important, the way users engage with the website and the services it provides are equally crucial factors for any organization.

The prospective organization looking to make the transition to the digital age, or those who might already have a site but want to utilize it to their full potential, may find the abundance of CMSs to choose from dizzying. Organizations should, however, consider using Drupal for their websites, and here are just a couple reasons why.


Cost

The cost of building and maintaining a website can vary wildly. It depends on what platform you use, what the server costs are, and whether or not you hire professional help to design and maintain your web page. Due to their nature, it may be in a nonprofit or NGO’s best interest to find the most cost-effective way to build a website. While a simple drag-and-drop pre-built theme may look pretty and be the cheapest option available, these often lack the features under the hood that nonprofits and NGOs require for a really polished, professional user experience.

If you’re looking to keep costs down, you will be pleased to know that Drupal is license-free and open source, so the often exorbitant licensing fees that other CMS applications ask for simply aren’t present. In addition, all modules and themes found on Drupal’s website, and several of those that aren’t, are free to use for web development.

Other CMS vendors may attach additional costs for each new server, for ongoing maintenance, and for each module a prospective buyer would like to purchase, and, on top of all this, charge a monthly or annual licensing fee to use their system. Drupal’s suite of features and modules, built with the idea of community-supported, open-source software in mind, makes it an attractive option for organizations that want a CMS that won’t put a hole in their wallet.

Easy and Ready to go

Designing websites can take a while to do, and that includes both the visual design and the underlying code that makes the site run. The longer it takes to get a website up and running, the more it can eat up an organization’s precious budget, especially when working with a professional web development agency.

Sometimes it may not even be a budget issue, but rather an issue of time. A site may need to be put up as quickly as possible on the heels of a natural disaster or some event that needs attention. Drupal can help you get that website online fast.

Drupal comes with a number of features that allow users to get a website out there doing what they want right out of the box. Themes and website builder kits such as YG Charity and OpenAid are built with cause-driven organizations in mind, allowing users to quickly set up a site with features expected of a professional website, such as blog integration, image galleries, team profile pages, and testimonials.

Some organizations have even developed their own website starter kits for any new chapters they have springing up. The YMCA, which has member organizations scattered around the world, maintains Open Y, a digital distribution platform shared by the founding YMCAs in order to help fledgling YMCAs develop their own websites.

True to Drupal’s commitment to modularity and the nature of open-source software in general, these kits can simply be used on their own or used as templates to build upon for more customized looks and features that better benefit the particular message you want. Bits and pieces can also be borrowed from elements of this kit to help a web developer build their own features, allowing for much quicker development and deployment of said features. Such kits allow smaller NGOs, who may not have as many resources as other organizations, to develop websites and let their causes be heard.

Get a Free Consultation on Boosting Your Donations.

Scalability

Whether your nonprofit is small or large, Drupal has the tools to benefit websites of all shapes and sizes.

As previously mentioned, smaller organizations can benefit from Drupal’s low barrier to entry thanks to the ease of setting up a website quickly, but what about larger organizations that have more complex needs? Drupal is equipped to handle them as well.

Perhaps your organization is thinking of launching a website for a particular campaign while maintaining its own separate main website. Drupal has the ability to connect your main site to any further sites you may want to launch in the future.

Organizations such as the Great Ormond Street Hospital Charity have had great success using modules such as Organic Groups to launch and maintain several different websites for their multiple campaigns.

Drupal not only allowed them to extend their reach but also helped them handle what came with that growth. Under Drupal, their site is able to handle surges in user traffic, avoiding the lag and crashes that would have soured the user experience at the moment it was most crucial: during a spike in visitors.

Stories of organizations such as UNRWA showcase that Drupal’s flexibility means that it is able to handle any size website, no matter what their needs may be. Shameless plug: the UNRWA site is a Vardot project, so if you liked it, feel free to shoot us a message—we’ll be more than happy to accommodate you!


While launching other websites may seem daunting, especially once different content editors and site administrators get in the mix, this CMS supports editor-friendly features, such as role permissions, editing authorizations and the ability to tag and categorize site content. Drupal’s interface streamlines the editing of content, while editing authorizations keep content editors from changing something they aren’t supposed to and causing confusion.

With a little UI configuration, Drupal offers a nice, comfortable experience for your editors so they can provide a pleasant experience to those browsing your website.


Robust Features

Drupal, being an open-source CMS, has a wide array of features supported by a dedicated community. As we’ve already shown, many of these features are useful to a nonprofit or an NGO, but with such a vast suite of modules, there are many more that you may find interesting. It depends on what you want to do with your website.

Perhaps you want your message to reach more people. Drupal’s website contains multiple responsive themes that make your site look and feel great for those browsing it on their phones.

Perhaps you’d like to reach not just mobile users, but people from all over the globe? Drupal supports translating content so site visitors can read it in their own local language, making sure the message of your cause reaches far without the added complexity of maintaining multiple alternate-language websites. Furthermore, this CMS can change the language of the system interface itself, allowing ease of access to the people working on the website, wherever they may come from.

Perhaps you want your organization to receive donations online. Drupal modules like Payment are simple to install and allow you to securely accept donations through your website through multiple payment gateways like PayPal and credit cards.

If you want to show the progress towards a donation goal, there are multiple modules that integrate a donation thermometer onto your webpage to show your site visitors how close you are to hitting that goal.

Drupal is also great for SEO, for when you want your organization to be boosted in the search rankings to attract more traffic and get your message spread wider. There are a number of modules that help you do this. Pathauto automatically generates SEO-friendly URLs for your web pages. SEO Checklist is a to-do list of best optimization practices, checking your site for what you have already done and telling you what to do to have a fully optimized website.

These are only just a few of the Drupal modules available that fit the basic needs of a nonprofit or an NGO. With proper development using this CMS, you can make a website tailored to your cause and message with the features you need.

Conclusion

Web presence is an important thing in today’s world. An enormous portion of the population is on the Internet now. Thus, it becomes important for nonprofits and NGOs to put themselves into cyberspace to get their message out there and be noticed.

Just having a page on the Internet is not enough, however. SEO, site features, the overall user experience, the look and feel of the website—these are all important factors to maintaining a successful website, and these things need to be great whether you’re a big organization or a small one.

Right now, Drupal is used to power multiple global and local nonprofits and NGO websites (such as the UNRWA website developed by Vardot). With its broad community support and a flexible base system built to fit custom needs, Drupal offers all kinds of tools to build and benefit an organization’s website.

Are you an NGO looking to increase its online presence? Feel free to reach out to us, and we’ll be more than happy to help!

11.5+ Million USD Processed Through the UNHCR, UN Refugee Agency's Drupal Fundraising Platform Developed by Vardot.

Message us through our Contact Us page, or via email at [email protected].

Apr 17 2023
Apr 17

Despite all the disruption and technological advances of the past few years, the structure of higher education institutions and the state of academic curricula remain closer to what they were during the early stages of industrialization.

In 2020, we are well into the digital era. Students face a set of challenges unique to their age, and technology plays a significant role, both as part of the challenge and as an opportunity to overcome it.

Can schools and universities afford to operate using the existing model? What technologies will be needed? How will higher education experiences change in the coming years?

One fact is undeniable: higher education should focus on the direct needs of students and the demands of an increasingly digital market in the future.


1. Flexible Learning Experiences

A flexible learning experience is essential these days, and not just because of unexpected force majeure circumstances such as the global COVID-19 outbreak.

New challenges have arisen due to the shift in the digital economy's demand for new skill sets and capabilities. The job market has witnessed a sharp rise in demand for terms like "digital", "innovation", "data science", and "information" in job descriptions and titles.

The current curriculum being taught is fast becoming irrelevant and seems rigid to those seeking to learn a new in-demand skill or enhance their professional status. As a result, learning centers that provide focused online workshops and certifications are increasingly becoming a popular and cheaper alternative thanks to the flexibility, focus, and accessibility they have on offer for learners.


“The learner, the learning provider and the employer all are speaking different languages that don’t interconnect,” said Michelle Weise, chief innovation officer at the Strada Institute for the Future of Work.

Universities and schools should consider a subscription-based model where students pay a monthly fee to attend the modules and courses they prefer instead of attending 9-hour school days. Learners can complete their selected courses at their own pace and enjoy convenient accessibility.

Students should be able to access specific learning resources, materials, libraries, labs, and digital learning activities at any time from anywhere across all devices.

Flexibility and accessibility are standard requirements for any digital experience and should be at the forefront when considering an enterprise-level digital project. But it's easier said than done - identifying and investing in the right IT infrastructure will be essential to support a comprehensive digital learning experience for your students.

2. Remote and Distance Learning

The days of strict daily attendance hours and pre-fixed classroom schedules are over.

We were inevitably going to rely more heavily on remote and distance learning in the coming years; however, the recent outbreak of COVID-19 across the globe has accelerated this process.

Many institutions have realized the importance of aligning their strategic business needs with the ideal mix of technologies required to ensure their digital growth and future.

We specialize in providing higher education institutions with comprehensive digital transformation solutions based on Drupal and we witnessed a sharp increase in both verbal interest and demand from schools, universities, and even teachers for our services.

Schools and universities have immediately reallocated their resources and budgets to prioritize remote and distance learning. Unfortunately, not all schools and universities are technologically equipped to embrace full-fledged remote learning as the norm.

Almost every industry has been disrupted by the outbreak of the Coronavirus, but schools and universities have been hit the hardest because they had to reckon with the fact that their entire business model and operations have to change. Do they even need a campus anymore?

Drupal 9 Transforms the University of Doha for Science and Technology (UDST) Digital Experience

3. Immersive Learning Experiences

There is no doubt that flexibility, accessibility, and maximum immersion will be critical in appealing to students around the world. 

This has led over 70% of the top 100 universities around the world to adopt Drupal, which has enabled them to build a truly digital campus and an immersive learning experience for their students and faculty. Drupal has enabled these institutions to seamlessly integrate essential cutting-edge solutions that complement and support their digital transformation.

Essential integrations such as VR.

VR implementation in learning is on the rise and enables any school or university around the world to offer classes to a global audience of interested pupils. For example, a student from Beijing can attend a French language class at a middle school based in San Francisco through VR.

Some universities have introduced A.I. into their learning experiences as well.

Through A.I., students can learn from a robot teacher. This teacher is capable of grading, checking spelling and grammar, and so on, and also monitors the performance of students to identify gaps in knowledge and weaknesses, which in turn can be analyzed by academic advisors to further help their students improve their grades and performance.

Several interesting experiments are being run by Georgia Tech's Center for 21st Century Universities (a Drupal 8 platform) to identify which solutions work and which don't, trying to find the right balance for the involvement of A.I. in education and learning experiences.

Who Leads Digital Transformation?

Transformation is inevitable but it is often misguided due to a lack of education (pun unintended) and awareness regarding the technologies that will be essential to elevate a university or school into a modern higher education experience.

Academics and higher education institutions must realize that technology alone is not the answer. It is only the tool.

Top management and leadership at academic institutions must drive the change and digital transformation based on facts and solid requirements in a bid to identify which technologies will be essential to build the foundation for an evolving interconnected web of digital learning assets and platforms.

Some have even adopted the wrong tools - technologies that do not match their digital transformation requirements or support their objectives.

Technology alone won’t be the answer to the existential challenges that schools and universities face. The culture of teaching and learning must also change. The real question that needs to be addressed is: what good or value is higher education supposed to deliver?

Addressing the needs of students already living in a digital environment and a digital marketplace will be paramount and Coronavirus might just be that disruptor that forces us to accelerate the digital transformation process.

One thing is for sure; it will force higher education institutions and academics to go back to the roots of education: storytelling, research, and development.

In the meantime, we will continue to provide our assistance to universities and schools in identifying the appropriate technologies needed to develop the ideal digital learning experience for their students.


Apr 17 2023
Apr 17

Just because I share content online doesn't mean I want to share control over it.

My website is a perfect example of what I mean. I take photos nearly everywhere I go: To date, I have more than 10,000 photos uploaded to my Drupal site. Using something like Instagram might be easier, but my photos are precious to me, which is why I feel so strongly about preserving the open web.

There are many reasons proprietary platforms don't meet my requirements for sharing. First, I like to own my data. If you think back to early social media sites like MySpace, they infamously lost massive amounts of user data. Artists lost their music. People lost their photos. This sort of thing still happens on Facebook and other social media sites.

[ Related read: How to switch from Twitter to Mastodon ]

Second, I don't like how proprietary platforms limit my creative freedom. Websites built on proprietary platforms often use templates, which makes them all look the same. Similar trends are happening in the design world — minimalist design has led to the death of detail. From doorbells to templated websites, unique design elements have been eliminated in favor of safe neutrals in an attempt to please everyone. This trend strips the personality right out from under websites, which used to be highly personal. Remember when people's early internet homepages actually felt like their digital homes?

Finally, I don't like how proprietary platforms treat my friends and family. People get tracked every time you upload a photo to a social media site. They're more like social monetization sites that make you scroll for hours, only to target you with ads. The problem got so bad that even Apple added tracking protection to WebKit in its Safari browser. Now, many social sites have their own in-app browsers as a workaround, so they can still track you.

I want the open web to win. That's why the ongoing enhancements to Drupal are such good news.

A world where good software wins

A few decades ago, the web was about getting information. Today, it's ingrained in every aspect of daily life. People can pay bills, work, socialize, get healthcare—and still get information. As more and more activities take place on the web, there's an even greater responsibility to ensure that the open web is inclusive of every person and accounts for everyone's safety. When people are excluded from accessing online experiences, they're cut out of rewarding careers, independent lifestyles, and the social interactions and friendships that bring people together. That's why good software and open source projects like Drupal are so important to protecting and growing the open web.

Good software is open, flexible, pro-privacy, secure, and doesn't lock you in. Good software lets you control your own code and your own data. It lets you prioritize what's important to you, whether that's accessibility, privacy, or something else. You're fully in control of your own destiny.

Good software also cares about end users. To that end, the Drupal community has been working on making software that solves people's problems and brings even more users into the community. This will be our priority now and in the future. We want to build a world where good software wins.

Opening up Drupal to more users with improved composability

Many people and organizations around the world want to build ambitious web experiences that are secure, scalable, and mindful of privacy—differentiated experiences that lead with accessibility. Drupal is for them, in part because of its composability.

Composability is one of the hottest trends in the technology market right now. Composability starts with software that's modular and made up of components. With more than 40,000 modules, Drupal meets that requirement. However, there's so much more to composability than an architecture for developers. Composability is a new way of doing business.

It's about providing low-code or no-code tools that enable business users to participate in digital experience building. Even non-technical users can do this from Drupal's user interface. Layout Builder, a visual design tool, helps content editors and site builders easily and quickly create visual layouts for displaying various content types. Composability is also about decoupling the front end from the back end and allowing users to push content and data to different touchpoints on the web. Drupal has been investing in headless and decoupled architectures for almost a decade and now ships with headless capabilities out of the box.
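
As a rough sketch of what that decoupling looks like in practice, the snippet below pulls article titles from Drupal core's JSON:API endpoint into a TypeScript front end. The site URL and the "article" content type are assumptions for illustration, not a specific implementation.

```typescript
// decoupled-articles.ts — minimal sketch of a decoupled front end reading Drupal content.
// Assumes Drupal core's JSON:API module is enabled and an "article" content type exists.
type JsonApiArticle = {
  id: string;
  attributes: { title: string; created: string };
};

async function fetchArticles(baseUrl: string): Promise<JsonApiArticle[]> {
  // JSON:API exposes content at /jsonapi/{entity_type}/{bundle} by default.
  const res = await fetch(`${baseUrl}/jsonapi/node/article?page[limit]=5`, {
    headers: { Accept: 'application/vnd.api+json' },
  });
  if (!res.ok) {
    throw new Error(`JSON:API request failed with status ${res.status}`);
  }
  const body = (await res.json()) as { data: JsonApiArticle[] };
  return body.data;
}

// The same content could feed a static site generator, a mobile app, or any other touchpoint.
fetchArticles('https://example.com')
  .then((articles) => articles.forEach((a) => console.log(a.attributes.title)))
  .catch((err) => console.error(err));
```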

Finally, because composability offers limitless possibilities to combine modules, it can easily get complicated. You need a good way to search for modules and update sites to manage dependencies and versions. Drupal uses a tool called Project Browser to make it easier for site builders to browse innovative modules built by the community. Distributions and recipes allow users to bundle modules and configurations together for reusable, prepackaged business solutions.

If you haven't checked out Drupal in a while, I recommend you see it for yourself. The latest version, Drupal 10, recently shipped with even more innovations including:

  • Modernized front-end experience (Olivero theme)
  • Modernized back-end experience (Claro theme)
  • An improved content editing experience (CKEditor 5)
  • Improved developer experience built on Symfony 6.2 with support for PHP 8.2
  • A new theme generator
  • And more

Building a better web for the future

The launch of Drupal 10 comes at a good time. There's so much turmoil happening within the top social networking sites. If nothing else, it's a good reminder that preserving and promoting an open web is more important than ever before. Open source, the IndieWeb, the Fediverse, decentralized social media platforms, and even RSS are all seeing newfound appreciation and adoption as people realize the drawbacks of communicating and collaborating over proprietary platforms.

An open web means opportunity for all. And open source gives everyone the freedom to understand how their software works, to collectively improve it, and to build the web better for future generations. The main focus of Drupal 10 was to bring even more site builders to Drupal. In that way, Drupal will help extend the open web's reach and protect its long-term well-being for years to come.

Apr 11 2023
Apr 11

Back in 2021, we created a Zoocha-made starter theme based on Storybook with the purpose of having a ready-to-use/out-of-the-box theme. Here, any front-end Drupal developer has immediate access to the basic structure and elements needed to start the development of a new project.

However, upon the initial review, we decided that our theme needed a further push in the right direction.

First things first: What is Storybook and what are its benefits?

Storybook is a front-end tool for the production of design systems with standalone components and pages. It has its own UI, meaning that administrators, editors or anyone else with access is able to view all of the isolated elements of the site.

This starter theme is also based on the Atomic design methodology.

The system differentiates between 5 levels:

  1. Atoms
  2. Molecules
  3. Organisms
  4. Templates
  5. Pages
Atomic design methodology

In a practical sense, this allows us to define and style each element of a site (e.g. headings, paragraphs, buttons, links) from an “atomic” level. More complex site elements (e.g. menus, cards, lists) are then built by reusing those atoms as much as possible. This way, each individual element remains consistent (as we try to avoid overriding styles) while giving us the power to affect the whole system with one quick change.
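
To make the atomic structure concrete, here is a minimal sketch of what an atom-level story can look like in Storybook's Component Story Format. The Button markup, class names, and story file are hypothetical and purely illustrative; they are not taken from the starter theme itself.

```typescript
// Button.stories.ts — a hypothetical "atom" story in Component Story Format (CSF 3).
import type { Meta, StoryObj } from '@storybook/html';

type ButtonArgs = { label: string; variant: 'primary' | 'secondary' };

// Stand-in render function; a Drupal theme would typically render a Twig template here instead.
const createButton = ({ label, variant }: ButtonArgs): HTMLButtonElement => {
  const btn = document.createElement('button');
  btn.type = 'button';
  btn.className = `btn btn--${variant}`;
  btn.textContent = label;
  return btn;
};

const meta: Meta<ButtonArgs> = {
  title: 'Atoms/Button',
  render: (args) => createButton(args),
  // These argTypes are what the Controls add-on exposes as a live "sandbox".
  argTypes: {
    label: { control: 'text' },
    variant: { control: 'radio', options: ['primary', 'secondary'] },
  },
};
export default meta;

type Story = StoryObj<ButtonArgs>;

// Each named export is one variation of the atom, reusable by molecules and organisms.
export const Primary: Story = { args: { label: 'Read more', variant: 'primary' } };
export const Secondary: Story = { args: { label: 'Cancel', variant: 'secondary' } };
```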

Simultaneously, the starter theme allows us to define smaller stylesheets, reducing the lines of code. This benefits not only the overall performance of the site but also its carbon footprint.

So, how are we continuing its development?

First of all, we want this project to be accessible to all of our Front-end developers, whether they are senior members or new developers.

We have carefully lined up the goals that we would like to achieve from this project. With these goals as the end product, we first created and structured a board detailing the tasks that would need to be completed in order to reach them. Subsequently, we produced the corresponding tasks, which included all of the details and information the developers might need.

Some of our main achievements have been:

  • Improving ALL of our existing elements, not only from a structural perspective but from an accessibility perspective.
  • Creating more useful atoms/molecules/organisms that are commonly used on similar sites.
  • Creating a script that allows us to install the theme from scratch. This will allow us to use the corresponding project's name and decide whether we want to install the full version or a lighter one.
  • Installing and setting up add-ons such as:
    • Controls: a “sandbox” where anyone with access can play with all the variables available for each element (button texts, text alignment, icons, width sizes, etc). By activating or modifying any of the possible variables, the user can, in real time, see the output of those changes. For example: 
Controls
  • Docs: this generates a page of documentation/information to overview and improve understanding of how an element works, as well as its variations and how the markup is built for each version.
Docs

As we encountered bugs using the theme in several existing projects, we were able to log these in the project board for review and resolution.

Additionally, any improvements thought of by the developers can be proposed and investigated by using the board. By doing so, we can ensure that the theme is always active and up-to-date.

We have also provided an updated internal document, which outlines how to set up the theme, how to use it, and any other “how-tos” that would be essential for the efficient utilisation of the board.

How does this add value to our project processes?

Having a Zoocha-made starter theme, while actively keeping it up-to-date and making good use of it on our projects, has positive results for both the company and our clients.

  • The more robust and complete the theme is, the fewer initial steps an FE developer needs when setting it up in a new project. Therefore, we see a reduction in the estimates for the initial set-up, as well as avoiding repeatedly setting up the basic elements used on every site. With slight branding changes, the project will use the starter theme basics with the brand-specific style applied.
  • This ensures the site is consistent and helps to reduce the number of bugs or regressions a project could face.
  • Our theme passes all accessibility standards and provides a useful tool to verify this on any new elements created at a project level.
Accessibility Standards
  • We have improved the documentation within Storybook so that any developer is able to much more efficiently understand the way each element works and any existing variations, as well as having the ability to quickly copy and paste the markup from each version.
  • By encouraging all of our FE team members to collaborate on the starter theme in the way they see fit, not only does the theme progress in a positive manner, but it also builds a healthy relationship within the team, as we work together on our ideas and suggestions.
  • This also extends to our newer team members, which we find is one of the most important ways to help them feel part of the team, as well as getting to grips with the way that Storybook works.
Apr 06 2023
Apr 06

What is the difference between Drupal 7 and Drupal 10?

The biggest difference is that Drupal 10 has “backward compatibility”. Drupal 7 does not have this. In other words, Drupal 10 is able to use modules, customisations and data originally created for Drupal 8, with some minor alterations.

Because deprecated code has been removed, Drupal 10’s code is clean and the platform itself is nimble, which results in excellent website performance.

Drupal 10 is user friendly, easy-to-use, versatile and scalable. Some more technical differences include a new theme engine called Twig (introduced in Drupal 8), which replaces PHPTemplate in Drupal 7.

Because Drupal 10 requires an up-to-date hosting environment with the most recent PHP, database engine, or key-value store, it is faster than Drupal 7.

Content modelling has been simplified, so Drupal 10 is great for content-heavy web applications. CKEditor, a new text editor, provides users with many WYSIWYG editing features that were previously only available through extensions.

CKEditor 5 is available in Drupal 10, and looks amazing! Other features are responsive images, improved multilingual capabilities, JSON:API and more modules out-of-the-box.

Apr 06 2023
Apr 06

A stable version of Recipes is yet to be released in 2023, but this initiative is a prominent part of Drupal 10.


Drupal 10 provides a powerful platform for building websites and applications. It offers various ways of site-building, including profiles, distributions, and now recipes. 

As part of Drupal's strategic initiatives, site builders and developers can greatly benefit from the improvements to be provided by Recipes.

Drupal 10 recipes are expected to provide more flexibility and ease of use to site builders and developers, allowing them to create custom solutions that meet their specific needs. Although a stable version of recipes is yet to be released in 2023, this initiative is a prominent part of Drupal 10 features. 

This blog will help you understand how recipes are different from profiles and distributions and how they are a way forward in Drupal site-building. 

Understanding Profiles & Distributions


Profiles and distributions are often confused, but they are not the same thing. Drupal vanilla, or the basic Drupal installation, is relatively bare and lacks many of the essential features required to create a full-fledged website.

Profiles and distributions are pre-configured packages that contain a set of modules, themes, and configurations that can be used to create a specific type of website.


Distributions are built on top of Drupal and provide a use-case-specific package. They include a pre-selected set of modules, themes, and configurations that are designed to fulfill a particular use case.

For instance, a media and publishing distribution will include modules like feed, carousel banner, facet search, or similar features specific to media websites. 

Distributions are a great way to get started quickly and provide a solid foundation to build on top of.

Profiles, on the other hand, are subsets of distributions and are included in Drupal core. Drupal core comes with three installation profiles: Standard, Minimal, and Demo (Umami). 

Installation profiles determine the set of modules, themes, and configurations that are included in a distribution. 

For an e-commerce site, one could use the standard installation profile as a base and add additional modules and themes to customize the site.

Drupal vanilla is bare and lacks any pre-configured settings or features. Profile and Distribution are similar concepts in Drupal, but serve different purposes.

Enter recipes!

What are Drupal Recipes?


Recipes are a modular approach to site-building in Drupal. They are small use cases that can be easily combined or customized to create a unique solution. Recipes are like microservices that can be plugged in and played as needed.

Recipes are modular building blocks that allow developers to create custom site features quickly and efficiently.

A distribution is a use case that customizes Drupal to fulfill a specific need. Unlike distributions, recipes do not use installation profiles and can be tweaked at any point in the site-building process.

Installation profiles are part of Drupal core. These profiles determine which set of modules, themes, and configurations are installed during the initial setup of your Drupal site. 

The real work behind any distribution happens here, as installation profiles are responsible for setting up the initial site structure.

A Profile is a type of distribution that provides a more focused set of features for a specific use case. Profiles can be thought of as smaller, more specific distributions that cater to particular needs.


To illustrate the differences between these site-building methods, let's consider an example. Suppose you want to build a news website that includes features such as a feed, carousel banner, and facet search. You could use a pre-built news distribution that includes these features out of the box. 

However, if you need to make further customizations, you would need to modify the installation profile or distribution, which could be time-consuming and complicated.

Alternatively, you could use an installation profile such as Standard and then install the necessary modules manually. This approach provides more flexibility, but it requires more effort and expertise to set up. 

Finally, you could use a recipe approach and install each required module and configure them individually. This approach provides the most flexibility but requires the most effort to set up.

Steps to install Drupal recipe

Drupal recipe installation steps


 

Why Recipes?

One of the primary objectives of the Recipes initiative is to overcome the challenges site maintainers and developers face with distributions, and to:

  • Allow users to install multiple Drupal recipes on the same project, unlike the current scenario where selecting a distribution like OpenSocial prohibits the installation of another distribution like Commerce Kickstart or Thunder. This limitation will be eliminated, and multiple Drupal recipes can be installed on the same site.
     
  • Install a recipe at any point in a project's life cycle, which is currently not feasible. For instance, if a user wants to incorporate community collaboration tools in their site after a few years of using standard Drupal, they can do so without any impediment.
     
  • Simplify the process of maintaining the multisite architecture. This initiative aims to ensure that any changes made do not create additional challenges in this regard.
     
  • Make updating easier, which is currently a challenging task because every existing site is in a different state; the Update Helper module developed by a few distributions will be integrated into core.
     
  • Make it easy for Drupal recipes to provide demo content, which is currently done in different ways such as importing from CSV or using custom modules; functionality will be provided in core to enable Drupal recipes to ship demo content.

What Drupal recipes are not

Drupal recipes have certain limitations, such as:

Wrapping Up

In conclusion, Drupal provides several site-building methods that allow users to create custom solutions to their specific needs. Profiles, distributions, and recipes are all powerful ways that can help you build your Drupal site efficiently and effectively. 

Drupal 10 recipes are an exciting addition to the Drupal ecosystem and will help make building websites and applications faster and more efficient than ever before. 

As a leader in the open-source community, OpenSense Labs has helped numerous enterprises transform their digital presence with our expert Drupal services and solutions. From custom Drupal development to UX design, we have the experience and expertise to help your organization succeed in the digital landscape.

Don't miss out on the opportunity to partner with a trusted and experienced team. Contact us today at [email protected] to learn more about how we can help you achieve your digital goals. 

Apr 05 2023
Apr 05
When you create a list using Views there’s a good chance you’ll add a filter to it. You could filter the list by content type, published status (if it’s published or unpublished), by author and more. If you click on “Add” in the “Filter criteria” section you can see all the available filters. The one […]
Mar 31 2023
Mar 31

To communicate effectively, we need to know who is responding to our content and how. A key part of this is collecting data on our website, known as website analytics. 

For a long time, Google Analytics has been the de-facto analytics tool. Like many Google products, it’s free and ubiquitous. The growing concern over surveillance and big tech hegemony, however, is causing many nonprofits to rethink their relationship to Google and their data collection practices.

Google announced in 2021 that it would retire its old Google Analytics tool (Universal Analytics) in favor of a new, rewritten Google Analytics tool (Google Analytics 4). There is no way to bring your UA data into the new GA 4 platform. While GA 4 is more powerful, that power comes with increased complexity and a steep learning curve. 

Organizations are understandably questioning whether GA 4, a tool designed primarily for ecommerce websites, is a good fit for their nonprofit websites. There are several established and well-designed analytics tools that can be used instead of Google Analytics. 

To help you assess which analytics tool is right for you, here are some guiding questions, along with a summary of the differences between the top analytics options.

Questions to Ask When Choosing an Analytics Tool

Are you currently using, or planning on using, Google Ads?

 If you use Google Ads to drive traffic to your website, you should strongly consider using Google Analytics 4. Naturally, Google Analytics is well suited to bring in data from Google Ad campaigns.

Do you run multiple web properties that need their data aggregated?

If you have multiple websites or apps and you would like to monitor users’ activity between those properties in a single dashboard, Google Analytics 4 is well-suited to do so. If you have multiple web properties but are ok tracking the data for each separately, a simpler analytics tool is sufficient.

How often, and for what reasons, do you check your analytics?

If you check your analytics frequently (i.e., weekly or monthly) and create sophisticated reports for stakeholders, the steep learning curve of Google Analytics 4 might be worthwhile. If you use analytics on a monthly to quarterly basis for more high-level monitoring of traffic, a more lightweight solution is probably best.

How important is data privacy for you and your constituents?

Google Analytics 4 is still not compliant with the European Union's data protection law, the General Data Protection Regulation (GDPR). Plus, Google has a notorious track record for violating users' privacy. If you share information and services that are under attack by authorities and/or bad actors (e.g., reproductive health, gender-affirming care, legal aid, mental health services), then a privacy-focused analytics tool could be best for you and your site visitors.

Website Analytics Tools Comparison

Now, for a preview and breakdown of the differences between the top analytics tools out there. 

CMS-specific Analytics Tools

If your analytics reporting needs are simple, most content management systems have their own analytics tool you can use for free.

Drupal Statistics

 

If you have a Drupal website, you actually already have an analytics tool built in! It is called the Statistics module. It will take some light configuration by a developer to get it up and running for you.

Advantages
  • Free
  • Easy-to-use
  • Built into the website
  • GDPR-compliant
  • 100% Data Ownership
Disadvantages
  • Drupal websites only
  • Only tracks pageviews
  • Site traffic statistics are less accurate than third-party analytics tools

WP Statistics

 

If you have a WordPress website, WP Statistics is a plugin with a free tier. Like the Drupal Statistics module, it requires some light configuration by a developer.

Advantages
  • Free
  • Easy-to-use
  • GDPR-compliant
  • 100% data ownership
  • Tracks pageviews, traffic sources, visitor location and search terms
Disadvantages
  • WordPress websites only
  • Lacks goal setting and monitoring and other advanced features

Third-party Analytics Tools

CMS-specific analytics tools tend to be a bit more limited. If you’re looking for an analytics tool that is more feature-rich there are many third-party tools you can integrate into your website.

Fathom 

 

Fathom is a well-designed, privacy-respecting analytics tool. It starts at $14/month and is a great tool for people who want something easy to use that also has some tracking and reporting features beyond the basics.

Advantages
  • Easy-to-use
  • GDPR-compliant
  • 100% data ownership
  • Can set and monitor goals
  • CSV data exports
Disadvantages

Google Analytics 4

 

Google Analytics 4 is a widely-adopted analytics tool backed by the might and power of Google. You can easily find online courses to help you learn it. Many people will be using it, so you will be able to ask questions of and learn from peers and colleagues in the nonprofit tech space. The learning curve, however, is steep.

Advantages
  • Free 
  • Most popular analytics tool
  • Robust analytics tracking and reporting
Disadvantages
  • Not GDPR-compliant
  • Data owned by Google
  • Steep learning curve

Matomo

 

Matomo is an open-source analytics tool. It is free if you install and host the software yourself. You can also pay Matomo to host it for you (starting at $23/month). 

Of the Google Analytics alternatives, it has the most tracking and reporting features available. This also means you should budget for a developer to configure it to your liking, and for some training for your team on using it.

Advantages
  • Free (if self-hosted)
  • GDPR-compliant
  • 100% data ownership
  • Can set and monitor goals
  • Can export reports
  • Other analytics features
Disadvantages
  • Intermediate ease-of-use
  • Requires self-hosting or paying for hosting

Plausible

 

Like Fathom, Plausible is a lightweight and privacy-respecting analytics tool. It starts at $9/month. 

It doesn’t have quite as many features as Fathom, but it’s comparable. It does have a Google Analytics importer, which is great for nonprofits who previously used Google Analytics and want to make the switch while keeping their historical data.

Advantages
  • Easy-to-use
  • GDPR-compliant
  • 100% data ownership
  • Can set and monitor goals
  • Can import your Google Analytics UA data
  • Can export reports
  • Other analytics features
Disadvantages

React Retro Counter

The React Retro Hit Counter by Joshua Comeau.

 

If you yearn for a simpler time you can always go for a retro page counter. This one is free, lightweight and carries the cachet of the early ‘90s revival so many of us are enjoying. It does require a quick installation by a developer.

Advantages
  • Free
  • Easy-to-use
  • GDPR-compliant
  • 100% data ownership
  • Hip retro vibe
Disadvantages
  • Only counts pages

Conclusion

While Google Analytics used to be the gold standard for website analytics, GA 4 is not nearly as user-friendly. And Google’s track record around privacy isn’t great. If GA 4 is more than you need, this is a great opportunity to consider alternatives. It’s best to first get clear on what data is helpful to track. Then choose the tool that can collect that information and aligns well with your team’s values and skill sets. You might be surprised by the tool that fits best. 

Mar 30 2023
Mar 30

Drupal has come a long way since its inception as a content management system (CMS) in 2001. Over the years, Drupal has continued to evolve and improve, positioning itself as a top choice for organisations looking to build a dynamic and engaging online presence. 

One of the most significant changes in Drupal's evolution has been its focus on becoming more user-friendly for content editors. In this blog, we’ll explore some of the biggest changes that have occurred from Drupal changing its positioning to being more user-focused.


Improved User Interface

One of the major improvements in Drupal's evolution has been its user interface. Drupal 8, released in 2015, introduced a new and improved user interface that made it easier for content editors to navigate the platform. The new user interface was designed to be more intuitive, with a cleaner layout and more streamlined workflows. Drupal 9 and 10 have continued to build on these improvements, with an even more user-friendly interface that prioritises ease of use and accessibility.

Streamlined Content Creation

Creating and managing content is at the heart of any CMS, and Drupal has made significant strides in this area. Drupal 8 streamlined content creation with in-place editing and a new WYSIWYG (what you see is what you get) editor. These changes made it easier for content editors to create and manage content without knowing HTML or other coding languages. Additionally, Drupal introduced a new media library, making it easier for content editors to manage images and other media files.

Enhanced Accessibility

Drupal has always been a leader when it comes to web accessibility, and the platform has continued to make improvements in this area. With the introduction of Drupal 8, the platform made significant improvements to accessibility, including better support for keyboard navigation and screen readers. Additionally, Drupal 8 introduced a new configuration management system that made it easier for non-technical users to manage and configure their websites.

Better SEO Capabilities

Search engine optimisation (SEO) is an essential aspect of any website, and Drupal has significantly improved in this area. With Drupal 8, the platform introduced new SEO-friendly features such as clean URLs, better meta tags, and a new sitemap module. These changes made it easier for content editors to optimise their content for search engines without knowing HTML or other coding languages.

Enhanced Security

Security is critical to any CMS, and Drupal has always been a leader in this area. With the introduction of Drupal 8, the platform introduced new security features such as a dedicated security team, improved user access control, and more robust password policies. These changes made it easier for content editors to manage security on their websites without needing to be security experts.

A Top Choice

Since Drupal 8, the focus has shifted away from primarily serving what developers want and now considers the needs of website managers and content editors. This shift has driven significant advancements in making the platform more user-friendly for non-technical users. 

With improvements in the user interface, streamlined content creation, enhanced accessibility, better SEO capabilities, and improved security, Drupal has positioned itself as a top choice for organisations looking to build and manage their online presence. 

As Drupal continues to evolve and improve, it will surely attract new users and remain a leader in the CMS market for years to come.

Want to take advantage of Drupal’s ability to create powerful and complex websites? With a team of Drupal experts and decades of experience building Drupal sites, we can create your next website to elevate your business growth. Contact our team today to discuss your needs.

Mar 29 2023
Mar 29

Content teams want the flexibility to publish content creatively. They want landing pages to be dynamic and reflect the vision they have inside their heads. Organizations want to encourage brand and writing-style consistency (sometimes across a whole network of websites). They also want to ensure their content is maintainable and meets web accessibility and security standards.

How do we marry these desires together? One way is with proper guardrails for the authoring experience. Putting guardrails in place can help keep content within certain parameters without feeling too restrictive for authors and editors.

Types of guardrails

What makes a good guardrail? For one, guardrails are there in case of emergencies. You don't want to make a habit of bumping into them because that would ruin the paint job on your car. Ideally, they aren't noticed unless people are looking for them. Guardrails are there to guide people along the way they are already going while offering protection if something starts to go wrong.

The content-authoring experience of a website should be akin to driving on a well-planned and maintained roadway. Here are different types of guardrails you can use:

  • Use good labels, help text, and task-based navigation. Similar to clear signage on the road, you don't want your editors guessing what goes in a field or what to do next. Labels should follow the voice and tone outlined in your organization's content style guide. Help text should provide clear examples of expected content for each field. Menus and other links in the user interface should be clear and contextual to the user and the task at hand. Don't confuse authors with extra options they might not even have access to.
  • Plan out the right amount of space. Have you ever driven down a road without a shoulder, where the lane feels too narrow, and you feel like the tires are about to fall off the edge? Don't make your authors feel like that while they are entering content. They need ample space to write what they need while following voice, tone, and style guidelines. But they also need those lines painted somewhere.
  • Provide the ability to write drafts, save progress, and view previews. Authors should not feel like they need to speed. Enabling them to enter content at their own pace, knowing that their progress won't be lost or published until it is ready, contributes to a sense of safety.
  • Fix functionality bugs or user experience (UX) obstacles. Know the feeling of driving over a road with potholes and rough, uneven pavement? That's what content authors and editors feel like when they encounter functionality bugs and bad UX. Test your authoring forms and processes rigorously and with real content to ensure all components work as expected.
  • Optimize for page performance. Consider using automated image optimization so that pages on your site are performant and load quickly for visitors. A visitor trying to read an article that is slowly loading 50MB of pictures can feel like being stuck behind a garbage truck on a one-way street.
  • Limit external code in content. Put restrictions on what authors can put into WYSIWYG fields, like third-party embedded code or JavaScript snippets, for site security. Authors usually don't want to deal with raw code anyway, so talk to them about why they are trying to insert such things to figure out a better solution to their needs.
  • Prevent accessibility issues stemming from content entry. Like taking a driver's education class to get a driver's license, authors need editor training to understand and follow the rules of the road. Training can help authors feel comfortable behind the wheel by letting them become familiar with the different forms and authoring tools they will use in the system. Training is an opportunity to give guidance on how to correctly structure content with headings and list markup and to include well-written alt text for non-decorative images.

Structured content

We advocate for structured content, which requires planning and organizing content into discrete fields per thing, even for a simple content type like an article. Instead of one large field to hold everything, we recommend structuring content piece by piece based on what it is and how it will be used to convey information across the site or content system. For example, on an article content type, we would typically start with an article title field, a published date field, an author name field, and an article body field at minimum.

For certain use cases, the content can be broken down even more. Some content may be better served with discrete month, day, and year fields instead of a full date field that contains all three pieces, or with individual fields for a person's title, first name, last name, and suffix instead of a single name field. 

But you don't want to go too far. Otherwise, you risk complicating content entry for your content team. Over-structuring content can make content entry more tedious than is necessary and introduce complexity that impacts the site implementation and maintenance. However, content systems that aren't set up with enough structure can negatively impact how your content does in search results, which prevents site visitors from finding your content.
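
As an illustration of that trade-off, here is a minimal sketch of the article fields described above, plus the finer-grained name structure, expressed as hypothetical TypeScript interfaces; the field names are illustrative, not a prescribed CMS schema.

```typescript
// Structured content: discrete, typed fields instead of one undifferentiated blob.
// Field names are illustrative only.
interface Article {
  title: string;
  publishedDate: string; // e.g. "2023-03-29" (ISO 8601)
  authorName: string;
  body: string;          // long-form rich text
}

// A finer-grained breakdown, worth the extra fields only when a real use case needs them.
interface PersonName {
  honorific?: string; // "Dr.", "Prof."
  firstName: string;
  lastName: string;
  suffix?: string;    // "Jr.", "PhD"
}

// Example: the structured pieces can be recombined for different outputs (listing, RSS, search).
const teaser = (a: Article): string => `${a.title} (${a.publishedDate}) by ${a.authorName}`;
```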

Authoring flexibility vs. rigidity

Finding the right balance between content authoring flexibility and rigidity is difficult because the perfect balance in one system can be lopsided in another one. It depends upon the CMS and the authors who use it.

Too rigid

If the authoring experience is too rigid, authors fight against the system's constraints and feel frustrated, defeated, or disenfranchised. For example, authors may not be able to complete tasks they are asked to do, like posting an alert to the homepage, if only the site administrator can do that. This is especially true for authors who previously enjoyed a lot of authoring freedom.

Too flexible

If the authoring experience is too flexible, the integrity of the design and content system can be compromised and the content's message is lost. The content system becomes difficult to maintain as the quantity of content expands without enough structure or oversight. Content quality becomes inconsistent due to too many options and, ironically, is inflexible to future innovations. Authors have a wide unmarked road without lanes or signage, but this also means they have no clear way to get to their destination. This can be a negative experience for authors who are used to doing something one way and one way only. They can become overwhelmed with too many options to do what they need to do.

Finding the right balance

You don't want narrow, rigid guardrails because they hinder creativity and frustrate content teams. Authors can feel like they are part of an assembly line instead of an important part of the creative process. You also risk content getting locked into dated trends. A content authoring system with guardrails works best when

  • the authoring functionality matches the author's expectations and level of experience, 
  • it enables authors to complete the tasks they need to, and 
  • it allows for content creativity while avoiding chaos.

Media asset management is an example that illustrates the importance of finding the right balance. Content managers often want to ensure that all media assets added to the site, like images or videos, meet rigorous content, style, copyright, and resolution quality standards. They may also want to encourage content authors to reuse images or videos already in the media library to reinforce branding with approved imagery. 

However, for authors to find images in the library to use, the image and video assets need to be created with metadata about them included. The metadata enables browsing and filtering on aspects of the media asset, like what is pictured, the image aspect ratio, file size, or the year the image was taken. For this to be possible, authors need to completely fill out the metadata fields each time they add a media asset to the library. By accounting for this additional work, you can evaluate if media asset reuse will benefit your organization or create more burden on the authors than the value it adds.

How do you find the right amount of guardrail structure to guide your content authoring experience? First, embrace the idea that it may need to change over time. The balance of authoring flexibility and constraints will need to adjust as your organization's content maturity level increases and new content goals are set. To get started, meet with people at your organization to talk about your current content system.

Interviews

Interview site stakeholders and content authors to learn more about the current content process. Ask questions to figure out the current state of your content, what your organization's goals for the content are, who the audience you want to reach with your content is, and what the practical limitations on the content creation lifecycle are. Some questions to ask in these interviews:

  • Who is the primary audience of the site?
  • What are the most important kinds of information currently on the website?
  • What does the current content authoring process look like? Can you talk through it? 
  • Who creates and manages content? Is it a particular role held by one person or a team?
  • What are some pain points you have when authoring content?

You may uncover that the content system does not match the skill level of most content authors. There may be competing priorities for target audiences and content messaging. These valuable findings can inform which guardrails to put in place or even remove. 

Even with guardrails, some authors will need guidance and training on using the content system. Other authors will be limited in what they can achieve themselves, even though they are capable of doing more. Achieving that balance is necessary to deliver content that consistently meets your organization's content guidelines.

Saying "no" to requests for more flexibility

Sometimes authors will come to you asking for more flexibility. The current content structure isn't working for them; they feel they can't realize some of their goals. They want you to take down some of the guardrails.

First, you need to get to the root of their need. Gently asking "why" will allow you to understand the request better. There are three possible outcomes:

  • The flexibility they want is aligned with content goals and will provide value, but it requires development work. Discuss what the feature requirements are, and prioritize the work with your development team.
  • The flexibility they want is already achievable within the existing system. If this is the case, you need to surface it to content authors more clearly by including it in the content training, doing a demonstration, and writing documentation. 
  • The flexibility they want goes against the stated goals for your organization's content. This doesn't mean a hard "no." It just means you must facilitate further discussions to reach a resolution.

However, rejecting a request is harder if your organization has no unifying mission or goals set for its content.

The importance of a clear destination

It's easy to end up driving in circles if you don't know where you are going. How do you prioritize anything related to content if you don't have clear goals to work towards? You are left to personal preferences. Any change, any guardrail, any attempt at more or less content system flexibility is easier to evaluate if there is a central and shared mission.

The American Booksellers Association's new IndieCommerce™ e-commerce platform had a clear mission driving all interface decisions: to provide "the tools for indie bookstores to create unique, content-rich, and easy-to-operate, fully transactional, e-commerce-enabled websites." 

In practical terms, they wanted to allow for content creativity while still having guardrails to enforce standards. This mission was the through-line for the entire project. It allowed them to dedicate resources to the platform's user experience. This focus resulted in providing a custom administration dashboard, task-based pages for managing the bookstore site and commerce settings, and flexibility to style a site's look and feel while meeting WCAG AA. Their mission prioritized strategy and development work that enhanced the user experience and ensured that the experience was maintained with comprehensive testing and iterative feedback from start to finish.

Having a mission in place makes your decisions around guardrails much easier. You can prioritize work that furthers the mission, which gives you clarity about what the content authoring experience needs.

Conclusion

With a commitment to optimizing the content authoring experience, organizational agreement on a mission, and willingness to talk to people about content authoring, you can establish the right balance between flexibility and rigidity in your content system. Setting up guardrails that fit your content process empowers content authors to bring their content to life while ensuring that content meets brand and style guidelines. Research, test, iterate, and repeat.

If you'd like help finding the right balance for your content teams, we can work with you to create an authoring experience that makes content authors happy and gets your content to the right audience.

Mar 29 2023
Mar 29

According to Mozilla, Information Communications Technology (ICT) is expected to emit more carbon by 2025 than any single country besides China, India, and the United States. We tend not to think of the physical scale of the internet, but it is a massive machine. It is critical that we consider the energy that is consumed to both run the internet and allow for its exponential growth.

It is estimated that digital technology today uses between 5% and 9% of global electricity. This estimate is particularly concerning as only a quarter of our electricity comes from renewable sources. Demand for electrical infrastructure is increasing as fossil fuels transition out of consumer and industrial uses.

There are also carbon implications for building and disposing of digital devices. Electronics are not generally designed for longevity, repair, or recycling. Digital tools consume rare minerals and water, and e-waste is a growing problem.

I will explore these aspects of web sustainability and others in this article. While my focus is on Drupal, these general principles apply to most of the web, particularly open source tools and ways to leverage the work of these communities. Likewise, I will also provide practical steps that people can take to reduce the environmental footprint of their sites.

1. Servers, networks, and power

Most of us do not see the scale of the internet's physical infrastructure. It happens at the other end of a thin fiber optic network. We don't see the thousands of server racks within the huge climate-controlled warehouses that run our websites. We don't see the physical infrastructure behind each hop that our internet packets take as they travel at the speed of light to our laptops and mobile devices.

We generally only consider the devices in our homes and offices—those used daily and those at the back of our closets. Powering our devices typically results in CO2 emissions, but this is just one part of its physical impact on the real world. We need to think more about the effects of building these devices. We should look to extend the lives of our devices and see that the components can be effectively reused or recycled.

2. Loading web pages

Beyond the data centers and servers, we need to consider the costs of just using the web. We know that the median web page weight between 2012 and 2022 has increased by 221%. Many sites now depend on leveraging third-party JavaScript tools, which impacts performance. We've also seen many libraries become bloated with code that isn't actually used to deliver the content.

[ Also read 5 open source tips to reduce waste in web design ]

There have been real advances in modern media formats but slow adoption. Image formats like WebP and AVIF offer dramatic improvements, but we often just use a lower-resolution image instead. SVGs offer extensive enhancements in semantics and scalability but are rarely used to their full extent. The defaults are not set for performance, and people usually default to doing the easiest thing. This impacts customer experience but also has a significant impact on CO2 emissions.

3. Computer usage

Another aspect worth considering is the human effort in using and building our digital tools. A faster web experience allows users to accomplish their tasks and move on quickly. Ensuring a website is performant is important, but it must also be optimized for a good user experience. Content must be available for everyone, regardless of whether they have a disability.

Building digital tools takes time, and complex tools often require more time to develop and maintain. Most modern websites include code that depends on several software libraries and multiple people or teams. Teams using open source software can be more efficient as they are not reinventing the wheel. A great example is the Drupal CMS, which drives over a million websites. Working with Drupal to deliver complex sites allows designers and developers time to focus on meeting customer needs rather than building basic form components. It is hard work to make interfaces simple, and we all benefit when our teams can stand on the shoulders of giants. It is important to remember that our time building digital tools also consumes energy and resources.

Digital teams that care about sustainability need to be conscious of the end use of the technology, not simply the environmental impact of the tool itself.

4. Government website contributions

Based on size alone, digital government initiatives have an outsized role in reducing CO2 contributions. Often, government sites are slow to load, and it is difficult to find information or finish the intended task. They are usually built with legacy proprietary tools that do not have a common navigation or reliable search tool. We know that fossil fuels power most US Federal government websites.

The US Web Design System (USWDS) is an initiative to improve the performance and accessibility of government websites. However, more can be done, such as tracking site-wide performance through tools like Lighthouse Parade, which uses Google Lighthouse to evaluate if pages follow best practices.

The UK is leading digital government by highlighting the importance of sustainability and demonstrating how to do this effectively. The UK government has defined a strategic approach to sustainable ICT and provided practical guidance on what government agencies should do.

The UK's Department for Environment, Food & Rural Affairs describes how their work is aligned with the UN's Sustainable Development Goals. The Ministry of Defence also discusses how this fits in with their goals for a circular economy.

5. What about Drupal?

Drupal is just one of many content management systems (CMS). Drupal is open source and drives over a million websites, comprising 1-2% of the web. Drupal can easily manage hundreds of authors and complex permission systems. It is popular with government, education, and large business organizations. It also has been a leader in web accessibility for over a decade. It is also a versatile platform that can leverage a headless JavaScript presentation layer like GatsbyJS or Eleventy.

Drupal can be very performant, but this is often overlooked. Drupal sites leveraging GatsbyJS are usually very fast because it converts the content into static HTML. Drupal can also optimize images for both screen size and bytes. Drupal can even convert images that authors upload to more modern formats like AVIF and WebP. Content Delivery Networks (CDNs) can also help ensure that cached pages are served more quickly. CSS/JS aggregation has been incorporated into Core to improve performance, but many other elements require a site builder to set up before adding content.

6. Practical steps for governments to create more sustainable websites

Governments can do a lot to reduce the CO2 footprint of their websites. Using a CDN and leveraging a JavaScript front-end like Gatsby will improve performance. Government design systems like the USWDS should be established by default to share common CSS, JavaScript, and other design assets. Securely sharing optimized digital assets will mean citizens must download fewer files when accessing different sites.

Sites should be created to focus on user tasks. More than any other class of websites, government sites should be designed to be fast and functional. By prioritizing user experience, many steps in online processes can be eliminated, and users can more quickly exchange the required information with government agencies and service providers. Agencies should build on existing design systems to support tools like dark mode effectively. Dark mode is one means to reduce energy consumption by citizens using government sites and extend device battery life.

Having government sites hosted in green data centers powered by renewable energy is a huge plus. Government agencies should aim to support data centers that actively minimize adverse environmental impacts. Supporting companies that work to extend the life of their hardware and effectively reuse or recycle their components is key.

It is also important to effectively manage the back-end infrastructure. Database optimization can dramatically reduce load times, as can Redis or Memcached.

In development sprints, consider using tools like Ecograder, WebsiteCarbon.com, and the Green Web Foundation to assess where there is room for improvement.

The Sustainable Web Manifesto and the Strategies section of SustainableWebDesign.org have good resources worth considering. Podcasts like Environment Variables and Green I/O provide a wealth of information and help developers keep up-to-date with best practices.

While these might appear to be small steps, making our web more sustainable can be achieved through collective action and tangible changes. We must prioritize web sustainability and work together to create a more sustainable digital future.

We must remember that even small changes can have a huge impact when scaled up millions or billions of times.

Mar 23 2023
Mar 23

In order to create a flawless product, it needs to undergo rigorous quality assurance and testing.

Black box testing is one of the preliminary testing techniques used to understand the fundamental behavioral output of a system or application. It aims to assess aspects of the application's functioning, such as usability, response time, and reliability, and to categorize the outcomes as expected or unexpected.

A powerful testing technique, it exercises a system end-to-end.

This blog will help you understand black box testing in detail, including its various techniques and the tools used.

What is Black Box Testing?

Black box testing is a software testing technique where testers do not have access to the internal code or structure of the system being tested. Instead, testers focus on the software from the perspective of an end-user, testing for input/output behavior, usability, and software functionality. It helps to ensure that the software meets the user's requirements, and also helps to identify potential bugs and errors that could harm the functionality of the software. This type of testing is crucial in ensuring that software is reliable and of good quality for end-users.

Let's understand this with an example: suppose you are testing a website's search functionality. You know that users should be able to enter a search term and receive a list of results related to that term. You do not know how the search algorithm works, but you can test its functionality by entering different search terms and observing the results.
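As a rough illustration, a black box test only drives inputs and checks outputs, never the internals. The sketch below uses PHPUnit with a hypothetical search() function standing in for the real system; all names are illustrative, not taken from a specific project.

<?php

use PHPUnit\Framework\TestCase;

// Hypothetical stand-in for the system under test. In a real black box test
// this would be an HTTP request or a call to the application's public API.
function search(string $term): array {
  $pages = ['Drupal accessibility', 'Drupal migrations', 'Gardening tips'];
  return array_values(array_filter(
    $pages,
    fn (string $title): bool => stripos($title, $term) !== FALSE
  ));
}

class SearchFunctionalityTest extends TestCase {

  public function testResultsRelateToTheSearchTerm(): void {
    foreach (search('Drupal') as $result) {
      $this->assertStringContainsStringIgnoringCase('Drupal', $result);
    }
  }

  public function testUnknownTermReturnsNoResults(): void {
    $this->assertSame([], search('nonexistent-term'));
  }

}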

Black Box Functional Testing Technique

Black box testing has various techniques that are used to test the functionality of an application. It is important to understand each of these techniques in order to decide which one is right for your project. Let's take a look at some of the most commonly used black box testing techniques:

  • ECP technique

Equivalence class partitioning (ECP) is a software testing technique that helps to identify test cases by dividing input data into equivalent classes. The goal of ECP is to reduce the number of test cases needed to achieve maximum test coverage while still providing effective testing.

The basic premise of ECP is that input data can be divided into different categories or classes based on their equivalence. For example, if a system accepts input values in the range of 50 to 90, input values can be divided into the following equivalence classes:

  • Valid input values - Input values within the range of 50 to 90 are considered valid input values and belong to this equivalence class.
  • Invalid input values - Input values outside the range of 50 to 90 are considered invalid input values and belong to this equivalence class.
  • Null input values - Input values that are empty or null are considered null input values and belong to this equivalence class.

    ECP Technique

By dividing input data into these equivalence classes, testers can identify a set of representative test cases that can effectively test the system. For example, a test case can be created for each equivalence class to ensure that the system handles each type of input correctly.
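To make this concrete, here is a minimal PHPUnit-style sketch with one representative test case per equivalence class for the 50-to-90 range described above (the validator logic and all names are hypothetical):

<?php

use PHPUnit\Framework\TestCase;

class RangeValidatorEcpTest extends TestCase {

  public function testOneRepresentativeValuePerEquivalenceClass(): void {
    // One representative input per equivalence class, with the expected result.
    $cases = [
      'valid value inside 50-90' => [70, TRUE],
      'invalid value outside 50-90' => [42, FALSE],
      'null / empty input' => [NULL, FALSE],
    ];

    foreach ($cases as $label => [$input, $expected]) {
      // Hypothetical system under test: accepts integers from 50 to 90 only.
      $isValid = is_int($input) && $input >= 50 && $input <= 90;
      $this->assertSame($expected, $isValid, $label);
    }
  }

}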

An equivalence class represents or defines a set of valid or invalid states for each input after it has been validated.
The requirements and specifications of the software serve as the basis for this technique. The benefit of this strategy is that it reduces an effectively infinite set of inputs to a finite number of test cases, which helps to shorten the testing period. It is appropriate for use at every stage of the testing procedure.

Let's look at an example. Consider a feature of a software application that accepts a 10-digit mobile phone number.

Example of the ECP technique:

  • Invalid test case 1: DIGITS >= 11 (e.g. 98472622191)
  • Invalid test case 2: DIGITS <= 9 (e.g. 984543985)
  • Valid test case: DIGITS = 10 (e.g. 9991456234 or 9893451483)

With a 10-digit mobile number as the example, we can observe that there is one valid partition and two invalid partitions. Both values in the valid partition behave the same way: the user is redirected to the next page.
The two invalid partitions cover erroneous values of 9 or fewer digits and of 11 or more digits. When these invalid values are applied, both invalid partitions behave similarly and the user is forwarded to the error page.

The example above shows that only three test cases are needed, which is consistent with the equivalence partitioning principle: the technique aims to reduce the number of test cases.

Benefits of the ECP testing technique

There are several benefits to using the ECP testing technique:

  • Increased accuracy: ECP can detect errors that might be missed by other testing techniques, increasing the overall accuracy of the testing process.
  • Easy to implement: The ECP testing technique is not difficult to implement, and it can be used with a variety of platforms and software.
  • Improved efficiency: ECP can save time and effort by quickly identifying invalid input and reducing the need for manual testing.
  • Cost-effective: As compared to other testing methods, ECP is a cost-effective solution for software testing.
  • Reduction of production issues: ECP testing helps to identify issues early on in the software development process, reducing production issues and making it easier to fix problems before they become costly mistakes.

Overall, the ECP testing technique is a powerful tool for detecting errors and improving software quality.

  • Boundary Value Analysis Technique 

Boundary value analysis is a software testing technique that focuses on testing the input values at the boundary or edge of the acceptable input range for a system or application. It is a type of black box testing that helps to identify errors or defects in the software that might be caused by boundary conditions.
The basic premise of boundary value analysis is that errors often occur at the extreme boundaries of the input values, rather than in the middle of the input range. By testing these boundary values, testers can identify potential errors and improve the quality of the software.

For example, let's consider a system that accepts input values in the range of 1 to 100. To perform boundary value analysis on this system, the tester would focus on testing the following input values:

  • Minimum value- Testing the input value of 1, which is the minimum value in the acceptable range, helps to ensure that the system handles the smallest input value correctly.
  • Maximum value- Testing the input value of 100, which is the maximum value in the acceptable range, helps to ensure that the system handles the largest input value correctly.
  • Values just below the minimum- Testing input values just below the minimum value, such as 0 or -1, helps to ensure that the system handles values outside the acceptable range correctly and provides appropriate error messages.
  • Values just above the maximum- Testing input values just above the maximum value, such as 101 or 1000, helps to ensure that the system handles values outside the acceptable range correctly and provides appropriate error messages.
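As an illustration, the same 1-to-100 example can be written as a handful of boundary checks; the validator below is a hypothetical stand-in for the real system.

<?php

// Hypothetical system under test: accepts values from 1 to 100 inclusive.
function isAcceptedValue(int $value): bool {
  return $value >= 1 && $value <= 100;
}

// Boundary values: just below the minimum, the minimum itself,
// the maximum itself, and just above the maximum.
$boundaryCases = [
  0 => FALSE,
  1 => TRUE,
  100 => TRUE,
  101 => FALSE,
];

foreach ($boundaryCases as $input => $expected) {
  assert(isAcceptedValue($input) === $expected);
}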
     
  • Decision Table Technique

The Decision Table Testing software testing approach is used to examine how the system responds to various input combinations. This methodical technique tabulates the various input combinations and the resulting system behavior.

Because the decision table records causes and their corresponding effects for thorough test coverage, it is also known as a cause-effect table. Decision table testing is frequently used when two or more inputs have a logical relationship.

There are multiple rules in the table for a single decision. A decision table's rules can be created by simply inserting AND between conditions.

In the example below, you will see how different input combinations produce different results. Here "AND" is denoted by the circumflex sign (^), Y stands for "Yes", N stands for "No", and R1 to R4 stand for the different rules for particular combinations of inputs and outputs.

The following are the major rules that can be extracted from the table:

  • R1 = If (working-day = Y) ^ (holiday = N) ^ (Rainy-day = Y) Then, Go to the office. 
  • R2 = If (working-day = Y) ^ (holiday = N) ^ (Rainy-day = N) Then, Go to the office.
  • R3 = If (working-day = N) ^ (holiday = Y) ^ (Rainy-day = Y) Then, Watch TV. 
  • R4 = If (working-day = N) ^ (holiday = Y) ^ (Rainy-day = N) Then, Go to picnic.

As the diagram below shows, there is no need to check the weather condition in R1 and R2. If the day is a working day, whether it is sunny or rainy, the decision is to go to the office.
Decision table Technique Example

Example of Decision Table Technique
So the outcome is the same for Outlook = Rainy and Outlook = Sunny. The following rules are the optimized versions of the previous rules R1 and R2.

  • R1 optimized: If (Day = Working) Then Go To Office 
  • R2 optimized: If (Day = Working) Then Go To Office 

The refinement/optimization step produces rules that are effective, efficient, and accurate.
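For illustration, the rules above can be translated almost directly into code. The sketch below is hypothetical, but it shows why the rainy-day condition disappears in the optimized rules: once we know it is a working day, the weather no longer changes the outcome.

<?php

// Decision logic derived from the table (illustrative only).
function decide(bool $workingDay, bool $holiday, bool $rainyDay): string {
  // R1 + R2 optimized: any working day means going to the office.
  if ($workingDay && !$holiday) {
    return 'Go to the office';
  }
  // R3: holiday and rainy.
  if ($holiday && $rainyDay) {
    return 'Watch TV';
  }
  // R4: holiday and not rainy.
  return 'Go to picnic';
}

// Exercising each rule of the table.
assert(decide(TRUE, FALSE, TRUE) === 'Go to the office');   // R1
assert(decide(TRUE, FALSE, FALSE) === 'Go to the office');  // R2
assert(decide(FALSE, TRUE, TRUE) === 'Watch TV');           // R3
assert(decide(FALSE, TRUE, FALSE) === 'Go to picnic');      // R4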

  • State Transition Table Technique 

State transition testing is used when some aspect of the system can be described using a 'finite state machine'.

This simply means that the system can exist in a finite number of states, and the transitions between them are determined by the machine's rules. This is the model that the system and its tests are based on. A finite state system is any system in which the output varies depending on what has happened previously. A state diagram is a common representation of a finite state system.

If you ask for ₹100 from a bank ATM, you will be given cash. You may later make the same request but be denied the funds (because your balance is insufficient).

This refusal happens because the balance in your bank account has dropped from sufficient to cover the withdrawal to insufficient. The earlier withdrawal is most likely what caused your account to change state.

A state diagram can depict a model from the perspective of the system, an account, or a customer.

A state transition model is made up of four basic components:

  • the states that the software may occupy (open/closed or funded/insufficient funds)
  • the transitions from one state to another (not all transitions are allowed)
  • the events that cause a transition (closing a file or withdrawing money)
  • the actions that result from a transition (an error message or being given your cash)

It is important to note that in any given state, one event can only cause one action, but the same event from a different state can cause a different action and a different end state.

An example of entering a Personal Identification Number (PIN) into a bank account is shown above.

The states are represented by circles, transitions by lines with arrows, and events by text near the transitions.

The state diagram depicts seven states but only four events (card inserted, enter a PIN, valid PIN, and invalid PIN).

There would also be a return from the 'Eat card' state to the initial state. 

There would be a 'cancel' option from 'wait for PIN' and the three tries, which would also reset the card to its initial state and eject it. The 'access account' state would mark the start of a new state diagram displaying the valid transactions that could now be performed on the account.
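To show the idea in code, here is a minimal sketch of a PIN-entry state machine; the state and event names are illustrative and deliberately simplified compared to the full diagram.

<?php

class PinStateMachine {

  private string $state = 'wait for card';
  private int $failedAttempts = 0;

  public function getState(): string {
    return $this->state;
  }

  public function handle(string $event): void {
    if ($this->state === 'wait for card' && $event === 'card inserted') {
      $this->state = 'wait for PIN';
    }
    elseif ($this->state === 'wait for PIN' && $event === 'valid PIN') {
      $this->state = 'access account';
    }
    elseif ($this->state === 'wait for PIN' && $event === 'invalid PIN') {
      $this->failedAttempts++;
      // After the third failed try the card is retained ('eat card').
      $this->state = $this->failedAttempts >= 3 ? 'eat card' : 'wait for PIN';
    }
  }

}

// The same event ('invalid PIN') leads to different end states depending
// on the current state, i.e. on how many attempts have already failed.
$atm = new PinStateMachine();
$atm->handle('card inserted');
$atm->handle('invalid PIN');
$atm->handle('invalid PIN');
$atm->handle('invalid PIN');
assert($atm->getState() === 'eat card');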

  • Use Case Testing Technique

Use case testing is a black box functional testing technique used to identify test cases that cover the system from start to finish, based on how the system is used. The team uses this technique to create test scenarios that exercise the entire software, function by function, from start to finish.

It is a graphical representation of business requirements that describes how the end user will interact with the software or application. The use cases provide us with all of the possible techniques for how the end-user will use the application, as shown in the image below:

The image above shows a sample of a use case with a condition related to the customer requirement specification (CRS).

We have six different features for the software's module P.

And in this case, the Admin has access to all six features, the paid user has access to three features, and the Free user has no access to any of the features.

For the Paid user, the various conditions are as follows:

  • Pre-condition → Admin must be generated
  • Action → Log in as a Paid user
  • Post-condition → 3 features must be present

And for the Free user, the different conditions would be as below:

  • Pre-condition → Free user must be generated
  • Action → Log in as a Free user
  • Post-condition → no features
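As a rough sketch, this pre-condition / action / post-condition structure maps naturally onto simple assertions; the loginAs() helper and the feature counts below are hypothetical and only mirror the example above.

<?php

// Hypothetical action: logging in returns the number of features visible
// to that role in module P.
function loginAs(string $role): int {
  $featuresByRole = [
    'Admin' => 6,
    'Paid user' => 3,
    'Free user' => 0,
  ];
  return $featuresByRole[$role] ?? 0;
}

// Post-conditions taken from the use case.
assert(loginAs('Admin') === 6);
assert(loginAs('Paid user') === 3);
assert(loginAs('Free user') === 0);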

Who writes the use case?

The client supplies the customer requirement specification (CRS) for the application, and the development team drafts the use case in accordance with the CRS and then sends the use case to the client for review.
Explains Software Development Lifecycle

After the client’s approval, developers design and code the software, and the testing team writes test plans, and test cases for various software features. 

Benefits of test design techniques

There are several benefits of test design techniques. Let's discuss them briefly.

  • Efficient use of time and resources: Test design techniques help testers to identify the most important and relevant test cases that need to be executed. This makes the testing process more efficient and saves time and resources.
  • Improved test coverage: By using various test design techniques, testers can ensure that all the important features and functionality of the software are thoroughly tested. This improves test coverage and reduces the likelihood of defects being missed.
  • Better defect detection: Test design techniques help testers to identify potential defects early in the testing process. This allows developers to fix the issues before they become more difficult and costly to resolve.
  • Increased test effectiveness: Test design techniques allow testers to design tests that are more effective in identifying defects. This leads to higher-quality software and improved customer satisfaction.
  • Consistent testing: Test design techniques provide a structured approach to designing tests that ensure that each test is executed consistently.

Black Box Testing Vs White Box Testing

While both black box and white box testing help to ensure a flawless end product, it's important to understand the underlying difference between the two.

  • Who performs it: black box testing is performed by software testers; white box testing is performed by software developers.
  • Implementation knowledge: not required for black box testing; required for white box testing.
  • Approach: black box testing treats the software as a black box, meaning the tester focuses only on the software's functionality and does not consider its internal structure, code, or design; white box testing examines the software's internal code, design, and architecture.
  • Coding knowledge: not necessary for black box testing; a must for white box testing.
  • Perspective: black box testing checks the software from the end user's perspective; white box testing focuses on testing the entire system, not just the user-facing side.

To sum up 

Black box testing is used to find errors in the system without peering into the actual code. As mentioned above, it’s an efficient way to test larger code segments. This type of testing is often used to verify the quality and reliability of a system or application, by focusing on the user’s view of the system. 

With emerging technological trends you need a partner that makes sure your website is innovative and user-friendly. At OpenSenseLabs, we help enterprises provide a better digital experience. Contact us at [email protected] and let our experts help you out.

Mar 23 2023
Mar 23

Just like land, air, and water are meant for everyone, the web was designed to work for all people and remove any hindrance, irrespective of people's surroundings and capabilities. Yet the fact that web standards do not include everyone has turned disability into a barrier, creating quite a paradox.

graphics for web accessibility with an ear, brain, eye, mouth, and heart

Before completing this blog, my ignorance led me to believe that web accessibility was limited to 'accessibility only for people with disabilities'. Another thing I was coaxed into believing was that it is almost synonymous with visibility issues. But it is as much for a person with auditory disabilities as it is for a person with cognitive or neurological disabilities. However, I realized I was not the only one associating such wrong notions with disabilities and web accessibility.

Lack of awareness and taboos associated with disabilities often mislead us.

To ensure that people with disabilities have equal and inclusive access to the resources on the web, governments and agencies follow certain guidelines that establish accessibility for all without any bias.

What are Web Accessibility Standards and why do they matter?

The Web Content Accessibility Guidelines (WCAG) explain how web content can be made more accessible to people. Here the word "content" refers to any and every kind of information on a web page, such as text (including headings and captions), images, sounds, code, and markup - anything that defines the layout and framework.

“WCAG is developed through the World Wide Web Consortium process with a goal of providing a single shared standard for web content accessibility that meets the needs of individuals, organizations, and governments internationally.”

Take examples from physical infrastructure, like ramps and digital vision signboards, which can be used by anyone; in a similar fashion, web accessibility is for everyone.

When you go out at noon, the level of contrast can be as much of an issue for a person with 6/6 vision as it can be for a person with visibility issues. Older people face problems due to changing abilities with age, as do people with "temporary disabilities" such as a broken arm or lost glasses. Thus, web accessibility standards not only ensure justice for people with disabilities but are inclusive for all.

According to the Convention on the Rights of Persons with Disabilities by the United Nations, enjoying equal human rights is a fundamental freedom. To ensure the dignity of people with disability is not a subject of ridicule, governments across the globe signed a treaty for easy web accessibility. 

How does Drupal help?

A person may face an issue either when building a website or when using it. The WCAG ensures that the guidelines are followed in both cases. The World Wide Web Consortium (W3C) guidelines are divided into two sets: ATAG 2.0 and WCAG 2.0. The Authoring Tool Accessibility Guidelines (ATAG 2.0) address authoring tools, while the Web Content Accessibility Guidelines (WCAG 2.0) address web content and are used by developers, authoring tools, and accessibility evaluation tools.

Drupal conforms to both sets of guidelines. The initiative started with Drupal 7 accessibility, and the community has been committed to ensuring accessibility for all ever since.

What Drupal does...

The community has an accessibility team which works to identify barriers at both the code level and the awareness level in order to resolve them. For people using assistive technologies to browse the web, Drupal is built to encourage and support semantic markup (which now comes out of the box).

The improvements are meant for both the visitor and the administrator, and include:

  • Color contrast and intensity
  • Drag and Drop functionality
  • Adding skip navigation to core themes
  • Image handling
  • Form labeling
  • Search engine form and presentation
  • Removing duplicate or null tags
  • Accessibility for Developers

Modules For Accessibility

Following are some of the Drupal modules which will assist you in keeping up with the accessibility standards. 

  1. Automatic Alt text
    The basic principle at work here is the idea of easy perceivability. Any and every piece of information should be presented in such a way that it is easily perceivable by the user. Non-text information like images and video therefore needs a text description of its content so that screen readers can read it.

    Logo of automatic alt text module by Microsoft

    The Automatic Alt text module automatically generates an alternative text for images when no alt text has been provided by the user. This module works great for the websites and portals with user-generated content where the users may even not be aware of the purpose and importance of the Alternative text. 

    It describes the content of the image in one sentence but it doesn’t provide face recognition. 
     

  2. Block ARIA Landmark Roles
    Inspired by Block Class, Block ARIA Landmark Roles adds additional elements to the block configuration forms that allow users to assign an ARIA landmark role to a block.
     
  3. CKEditor Abbreviation
    The CKEditor Abbreviation module adds a button to CKEditor which helps in inserting and editing abbreviations in a given text. If an existing abbr tag is selected, the context menu also contains a link to edit the abbreviation.

    The abbr tag defines an abbreviation or an acronym in the content. Marking up abbreviations can give useful information to browsers and translation systems, and can help search engines.
     

  4. CKEditor Accessibility Checker
    The CKEditor Accessibility Checker module enables the Accessibility Checker plugin in your WYSIWYG editor. The plugin lets you inspect the accessibility level of the content being created and immediately solve any accessibility issues that are found.
     
  5. High Contrast
    On April 13, 2011, Joseph Dolson published the article "Web Accessibility: 10 Common Developer Mistakes", stating the most common mistakes related to web accessibility and noting that most of the issues have "more to do with a failure to understand what constitutes accessible content than with a failure to understand the technology".

    In most of the surveys, poor contrast level is often cited as the most commonly overlooked feature by the developers.

    an example of Drupal high contrast

    The High Contrast module provides a quick solution that allows the user to switch between the active theme and a high contrast version of it, helping them get around the problem.

  6. htmLawed
    According to the "Ten Common Accessibility Problems" an article by Roger Hudson, failure to use HTML header elements appropriately is one of the key accessibility issues. 

    The htmLawed module utilizes the htmLawed PHP library to limit and filter HTML for consistency with site administrator policy and standards and for security. Use of the htmLawed library allows for highly customizable control of HTML markup.

  7. Style Switcher
    The Style Switcher module takes the fuss out of creating themes or building sites with alternate stylesheets. Most accessibility issues are confronted at the theming level. With this module, themers can provide a theme with alternate stylesheets, and site builders can add other alternate stylesheets right in the admin section to bring the site under the right accessibility guidelines. Allowing special styling of some parts of the site, the module presents all those styles as a block with links, so any site user is able to choose the style of the site they prefer.

  8. Text Resize
    This is the handiest feature, giving end users the autonomy to resize the text to suit their eyesight. The Text Resize module provides end users with a block that can be used to quickly change the font size of text on your Drupal site.

    an example of text resize block

    It includes two buttons that can increase and decrease the size of the printed text on the page.

  9. Accessibility
    A module for developers, the Accessibility module gives you a list of available accessibility tests, most of which are aligned with one or more guidelines like WCAG 2.0 or Section 508.

    It immediately informs the site maintainer about a missing "alt" attribute on an image, or whether headers are used appropriately. Further, each test can be customized to fit your site's specific challenges, and you can customize the messages users see for each test, so that you can provide tips on fixing accessibility problems within the context of your site's editing environment.

Drupal Features for Accessibility

Other than the modules that can assist you to overcome web compatibility issues, here is a list of top Drupal features for easier web accessibility. 

  1. Semantics in the Core
    When an assistive device scans a web page for information, it extracts the data about the Document Object Model (DOM), or the HTML structure of the page. No further information is read by the screen reader.

    Often these assistive devices only allow a user to select to read the headings on the page or only the links. It prioritizes according to the hierarchy in which the headings and links are presented making browsing easier for users of assistive devices. 

    Drupal 8 is based on HTML5. Presenting new and better semantic elements, HTML5 was, in fact, one of the five major initiatives outlined for Drupal 8 development. It allows theme developers to control where to use the new semantic elements and to opt out entirely if they so choose.

    When we compose semantically correct HTML, we are telling the browser and the assistive technology what type of content it is dealing with and how that information relates to other content. By doing this, assistive technology can carry out its job far more easily, because it has a structure it can work with.
     
  2. Aural Alerts
    Often page updates are expressed visually through color changes and animations. But listening to a site is a very different experience from seeing it, therefore, Drupal provides a method called “Drupal.announce()”. This helps make page updates obvious in a non-visual manner. This method creates an aria-live element on the page.

    This also lets the user know of any alert box appearing along with providing instructions to screen reader users about the tone as well. Text attached to the page is read by the assistive technologies. Drupal.announce accepts a string to be read by an audio UA. 
     

  3. Controlled Tab Order
    Accessibility issues also crop up when users navigate the web with different input devices. Not every user uses a mouse to navigate a website. The TabbingManager in Drupal is an excellent way to direct both non-visual and non-mouse users to the most important elements on the page in a logical order. It thus permits more control when exploring complex UIs.

    The tabbing manager helps in defining explicit tab order. It also allows elements besides links and form to receive keyboard focus. Without breaking the tab order it places the elements in a logical navigation flow as if it were a link on the page.
     

  4. Accessible Inline Form Errors
    It is important to provide the necessary feedback to users about the results of their form submission, both when it succeeds and when it does not. This includes inline feedback that is typically provided after form submission.

    Notifications have to be concise and clear. The error message, in particular, should be easy to understand and provide simple instructions on how the situation can be resolved. And in case of successful submission, a message to confirm would do. 

    Drupal forms have become considerably more accessible thanks to the addition of accessible inline form errors. It is now easier for everyone to identify what errors they might have made when filling in a web form.

  5. Fieldsets
    Fieldset labels are utilized as a mechanism for grouping related sections of forms. An effectively implemented fieldset gives a visual outline around the grouped form fields. This can be extremely valuable for people with cognitive disabilities, as it effectively breaks the form into subsections, making it easier to understand.

    Drupal now uses fieldsets for radios and checkboxes in the Form API, which helps to further improve forms in Drupal.

Conclusion

However good the features Drupal offers, in the end it is up to organizations to strategize and build their websites and applications around web accessibility.

We ensure that our different teams and interaction work together in order to make the Web more accessible to people with disabilities. At OpenSense Labs we design and develop the web technologies to ensure universal accessibility. Connect with us at [email protected] to make the web a better place. 

Mar 17 2023
Mar 17

Nowadays, software has become crucial for the functioning of large publishers. It supports the process of creating, publishing and distributing content, and also allows monitoring and analyzing users and market data. In the following article, we would like to introduce you to one of the available tools that improve the process of creating a website and increase the quality of daily work on content. This solution is the Thunder distribution based on Drupal CMS.

What is Thunder?

Thunder is an open source content management system aimed at professional publishers. The tool is also one of Drupal's free distributions, i.e. its version enriched with additional modules and extensions, which are available out of the box and are targeted at facilitating user work in specific aspects.

In the case of Thunder, we are dealing with a tool for all kinds of publishers. Both small and large information portals, publisher websites, and even blogs can benefit from its functionalities. Popular magazines such as Elle, InStyle, and Playboy use it in everyday work. Further down the article, we'll present details about the distribution itself and some of its most interesting and useful options.

Popularity and authors

Currently, over 800 websites report using Thunder, and the distribution itself is regularly developed and supported by the authors and users. As a result, the stability and community support for this solution are at least at a satisfactory level.

The author of Thunder is Hubert Burda Media - a German media group that has been developing this project since 2016 (the first version was released in January 2017). Their experience allowed them to tailor a tool to the needs of the industry they are members of. Thunder was designed to solve real problems and facilitate the daily work of other publishing or media companies.

Thunder download and installation

Thunder as a project is available at: https://www.drupal.org/project/thunder and we can find complete installation instructions in the documentation.

To install Thunder, we need a server with access to PHP, a database and Composer. The article with tips on how to generate a local development environment will help us prepare these elements.

The latest version, Thunder 6, which we recommend, is based on Drupal 9 and therefore shares its system requirements. These include at least PHP 7.3 (although version 8 is recommended) and at least Apache 2.4.7. In the case of the database, the values will vary depending on which database we decide to use. We can find a full list of system requirements in Drupal's documentation.

Once we deal with the necessary preparation, the distribution installation requires only two commands:

1. Project creation and installation

Typing a specific command with project creation is an important part of Thunder installation.


2. Quick start

Quick start is another important command which allows users to start a project in Thunder CMS.


And that's basically it. After following these steps, we have our first Thunder instance on the local environment ready for work.

We recommend delving deeper into the above-mentioned installation documentation available on Drupal’s website. There we’ll find more details and additional information that will help us launch a new project.

Thunder CMS - review and functionalities

As we mentioned above, Thunder is a distribution aimed at publishers. Its main and most commonly used functionality will therefore be the article creation window. We'll go through the process of adding content, indicating the elements that streamline and improve our work. We'll take up the topic in two parts: article creation and additional functions, in order to separate these aspects from each other.

Article creation

To Drupal users, this window may seem both familiar and foreign. The longer we look at this screen, the more we'll be surprised by the solutions not accessible in the standard version of Drupal. Let's go through all sections to see what possibilities Thunder offers us.

BASIS

In Thunder CMS article section, we can fill in the Basis field with Title, SEO Title, and tags.

Source: Thunder CMS

In addition to the standard Title field to complete the title, there are also several new features here.

Channel

The Channel field allows us to assign an article to one of the main channels. The list of available channels can be configured and extended at: /admin/structure/taxonomy/manage/channel/overview

The Channel field in Thunder CMS is used to assign an article to one of the leading channels.


This function allows us to organize the content and its purpose. In the example above, we see a standard division into events and messages. This type of solution enables us to easily and effectively distribute the content within specific channels. On our test web page, this helped us create separate subpages presenting content from these two categories.

News and Events are our example categories that present different types of content in Thunder CMS.


SEO Title

This is the title that isn't visible to the user but is read by robots crawling and indexing our website. Its quality and compliance with certain rules are crucial to strengthen the web page’s position in Google or Bing search engines.

This title is also used to automatically generate the address of our article, so it's a good idea to keep it in mind and include the keywords for our content here.

This field is also enriched with a "validity indicator" oscillating between the colors: red (bad), yellow (correct), and green (good). The dynamic bar illustrates whether we stick to the set rules, such as the length of the title. This indicator updates automatically when filling out the title, so there's no need to refresh the web page.

SEO Title field in Thunder CMS has a validity indicator that helps determine the length of the title


Tags

These are the keywords that allow us to group the content. This is one of the integral elements of contemporary content creation. Thunder CMS treats this matter seriously and proposes a simple but complementary way to generate and add tags. The Tags field lets us choose the predefined tags and create new ones on the fly.

All the tags we defined are available here: /admin/structure/taxonomy/manage/tags/overview. Here we can edit, remove and add new ones.

Example of creating tags in Thunder CMS:

In Thunder CMS we can conveniently group key terms and organize Tags by using drag-and-drop handles. Field Edit terms in Thunder CMS let us create and re-edit Tags, adding extra information.


In addition to the name itself, tags may also contain additional, extensive information thanks to the use of Paragraphs, which we present later in the article.

In this way, we can easily search for the prepared tag and add it to our article.

Tags prepared in Thunder CMS, e.g. key terms, can be easily searched and added to our article.


That's not all, though. If a tag is missing from the list, we don't have to leave the article editing window in order to add it. We just enter a new expression in the Tags field, and the right tag will be created in the background.

Thunder CMS provides a convenient option to complete tags from the article creation window.  The new tag added in the article creation window in CMS is also visible on complete list of tags.


TEASER

Teaser text

This is an introductory text that aims to familiarize the user with the topic of our article. It usually displays at the very beginning of the content and is separated from the rest of the post. Teaser text is also used as a teaser on article listing pages and when sharing the content within channels such as Facebook and Google.

Teaser text is an introduction to the article, intended to encourage users to read the whole post.


Image

Thunder CMS provides the ability to easily add and edit graphics.

First of all, we can add a photo in two ways:

  • We choose a photo from among the files already added to the system. Filtering by status and the ability to search by name help here. It's also a good way to use graphics prepared and processed earlier by a graphic designer and added to the system for later convenient reuse. This creates the opportunity to build our own media collection, which can be used many times in different ways, without having to fill up disk space with the same images.
     
Thunder CMS allows us to select ready graphics for the article from the existing medium gallery.
  • We import photos. Here we can upload a photo from our computer.
     
Thunder CMS lets us import graphic files from our computer disk and add them to the media library.


However, the possibilities don't end with adding a photo. Each image can be described by using a number of fields, such as:

  • name: the main name of the photo, by which we'll be able to search for it in the future,
  • tags: created exactly on the same principle as described above,
  • alternative text: photo description used by screen readers for blind people, and also important for the website's SEO,
  • title: title of our photo,
  • description: description of the photo and its content,
  • credits: author and source, if the image doesn’t belong to us,
  • expires: date indicating when the picture will no longer be valid - a field used for more complex cases, such as purchasing rights to a photo for a specific period.
     
Every image in Thunder CMS can be described by e.g. name, tags, alternative text, title and credits.


An additional feature in Thunder CMS, invisible at first glance, is the ability to select a "point of focus" in the photo, symbolized by the cross icon. By clicking anywhere on the graphic, we can choose its most important point. This is used when framing and cropping the photo. By indicating this point, we can be sure that regardless of how the image is displayed, the most crucial element will always be visible.

Point of focus helps choose the most important fragment of the image while adding it to an article.


Under the thumbnail image, we can also find a magnifier icon, which when clicked will show us how the photo will be displayed in various cases:

Magnifying glass icon allows us to see how image will be shown in sources like Facebook or Gallery.


PARAGRAPHS

Paragraphs are a key functionality used to build an article. It's a system for creating content using separate blocks (paragraphs), each of which can be one of many types of content. Once added, paragraphs can be freely edited and reordered, like laying "blocks" of various types.

Paragraphs allow us to build and edit an article by organizing blocks with many types of content.


The basic paragraphs built into Thunder CMS, from which we may build an article, are:

1. Text

The fundamental tool for writing the content itself. With the support of the extremely popular CKEditor module it becomes an extensive editor that meets even complex requirements.

CKEditor module in Thunder CMS is an essential tool that lets us conveniently write the content.


For more advanced users, it's also possible to edit content directly in the HTML code field:

In CKEditor module in Thunder CMS, we can also type content in html source window.


2. Image

The option of adding and editing a photo that works on exactly the same principle as we described above in the TEASER section.

3. Gallery

It allows creating photo galleries. The process itself is very simple and similar to what was presented above in the image section. The only difference here is the ability to add many photos at once.

An example of adding a photo gallery in Thunder CMS:

In Thunder CMS, users can create their own photo galleries and add many images at once.


The gallery created in this way will be displayed on the web page in the form of a slider, with the possibility to switch photos, as in the picture below:

Images added to the photo gallery in Thunder CMS are displayed in an attractive slide form.


4. Instagram

In this field, we can provide the URL of an Instagram post to embed on the website. Unfortunately, using this option requires additional work from us. For security reasons and due to the requirements arising from Meta's policy, authentication is necessary. We can do this by completing the configuration: /admin/config/media/instagram-settings.

Instagram settings in Thunder CMS allow us to embed an image from this channel on the website.


It's required to create an appropriate account here to obtain the indicated data. We can find full configuration instructions on the official Facebook documentation web page.

5. Twitter

The field for embedding Twitter posts. Unlike Instagram, it works straight away and doesn't require any additional actions.

Twitter settings in Thunder CMS let us embed a post on our web page without complicated actions.


6. Pinterest

As with Twitter, in this field we embed a link to a Pinterest post.

Pinterest settings in Thunder CMS enable post embedding on a website, such as pictures and videos.


7. Video

As with the photo editor, we have the ability to select a previously added movie from the media library or create a new video.

With the video editor in Thunder CMS, we can conveniently add any video to the website. In the Thunder CMS editor, we can select a video from the media library or upload a new one.


When adding a new video, we can also insert it using a link from portals such as YouTube, Vimeo, or TikTok. Such a movie, depending on the selected source, is embedded on the web page with the appropriate player.

8. Link

This field lets us insert a link with an optional anchor text. It should be displayed as:

Thunder CMS has a field to add a link (URL) with a text anchor.


9. Quote

This option allows for creating a quote.

The Quote field in the CMS editor allows users to add a quote to an article on the page.


Note that we mentioned above only those paragraphs that are built directly into the Thunder distribution. They fulfill most of the basic needs arising from creating an article. However, the system itself doesn't limit us to using these options only.

For more advanced users or developers, Thunder CMS makes it possible to build custom paragraphs that meet any requirements. Thus, this tool in the right hands is extremely powerful, and the number of possibilities is virtually unlimited.

Publication options

Another important element of any content creation tool is the ability to configure and manage the publication. Thunder CMS also provides us with extensive and adjustable functions here.

What catches our eye is the publication bar "fixed" to the bottom of the screen.

The CMS editor has a convenient publication bar in the article creation window for easy management.


With its help, we're able to save or delete the article at any time, as well as change its status to one of the following:

  • draft: rough version, not visible to users,
  • unpublished: finished but unpublished article, not visible to users,
  • published: published article available to everyone.

Another vital element is the side menu for more complex operations.

The side menu in Thunder CMS editor is used to complete important information about the article.


Here we can find information about the date of the last saved version of the document, the possibility of creating a new revision during saving, or leaving information when making such a revision.

Let's stop for a moment to take a look at the concept of revision. What does it mean? Thunder, by default, provides versioning for every change in our article. Using the REVISIONS item in the menu, we are able to view all saved versions of our document:

The CMS versions the changes to the article, and the saved versions are available in the Revisions section. We can compare every version of the article in the Revisions section of Thunder CMS.


It allows us to compare the differences between versions and restore previous versions.

With Thunder CMS, the user doesn’t have to worry that any changes to the article will be lost.


This is a handy and simple solution ensuring that no changes will be lost, and in case of any mistake, it'll be easy to restore the last correct version of our website.

Among the available options of our sidebar, we can also find:

  • Meta tags: an extensive tool enabling customization of the default meta values of our website. A beneficial and comprehensive solution for SEO specialists.
Meta Tags is another important section in CMS editor, which values especially for SEO.
  • Simple XML Sitemap: the configuration concerning the page's presence within the sitemap. We can decide here whether the article is to be indexed and what priority it should have. And yes, Thunder includes an automatically generated XML sitemap by default.
In the Simple Sitemap section, we can decide if the article has to be indexed and in which order.
  • URL alias: as we mentioned above, the alias of our website is automatically generated based on the SEO title, but to leave us complete freedom and configurability, Thunder’s creators also allow editing the alias from this position.
In the URL Alias field in the editor, we can complete and configure a new alias for our web page.
  • URL redirects: enables creating and managing the redirections on our website.
In the URL Redirects field users have the ability to create and manage redirects on the website.
  • Scheduling options: an extremely useful option that allows scheduling the article publication. From here, we can also set the end date of the publication - this option can be helpful, for example, in the case of a sponsored article, which is to be displayed on our website only for a certain time period.
 Thanks to scheduling options, we can plan the publication of our article for any suitable date.
  • Authoring information: fully editable information about the author and the creation date of the article.
The section with authoring information in CMS shows the author and the date of article creation.


This concludes our adventure with the article creation window. It’s an essential part of Thunder CMS and the place where editors and content creators spend the most time. The comprehensive solution proposed by Thunder is one of the best on the market because it combines ease of use with the complexity of possibilities.

Additional functionalities

In addition to the core Thunder’s functionality, i.e. the editor, with which we spend most of our time, this system also has a number of other useful elements. We would like to present some of these and show you how to use them.

Mobile preview

The creators of Thunder are aware that we live in a world dominated by mobile devices. Therefore, they provide us with a content management system which allows us to check whether articles display properly on smartphones and tablets.

When logging in as an administrator, we can find a phone icon in the admin bar anywhere on the web page. Clicking it allows us to select the model of a mobile device for simulation. As a result, our website will go into mobile version inspection mode, visible only to us. It's a great and simple tool that enables finding any irregularities on our web page in the blink of an eye.

Mobile preview in CMS Editor lets us see how the article presents itself on smartphones or tablets.


Liveblog

The name of this module already describes its use. Liveblog allows us to create dynamic, real-time changing articles. It's an ideal solution for reporting sports events or dynamically evolving crisis situations. There are many ways to use it, and we're sure that already while reading this paragraph, you'll come up with at least a few new ones.

Demo Content and Guided Tour

By installing these additional modules (they are already included with the system, we only need to turn them on), we get Thunder with basic configuration and sample content. This allows us to get used to the system faster and understand specific dependencies. All screenshots in this article come from Demo Content. In addition, the admin bar is enriched with the Tour icon, and after clicking it, we're guided through the possibilities and functionalities of Thunder. It's a great way to start the adventure with this system.

Guided Tour in Thunder CMS makes us try out the content management system step by step.

CMS Thunder for the Zawsze Pomorze news magazine

One of our projects created with Thunder is the website of the Pomeranian regional portal Zawsze Pomorze. The client wanted an easy-to-use, yet a sufficiently extensive system that would allow many journalists to work on several articles at the same time and to manage the publication efficiently.

The website includes an extensive category system, allows for leaving comments, creating "live" articles, and has a complex system of adding and editing sections for individual categories on the home page. The layout can be dynamically edited without any technical knowledge. The system also includes a window for supporting the authors with donations, visible directly on the article page.

Thunder CMS review – summary

From the developers’ perspective, we have to admit that working on projects using Thunder is extremely pleasant. The number of possibilities this tool provides out of the box meets most of the requirements. As programmers, we can create and develop CMS systems for media and publishers, focusing only on individual needs and solutions. This greatly shortens the development process and allows building even large-sized websites in a relatively short time.

From the publisher's point of view, it's also a very decent system that meets many needs. It maintains the perfect balance between the number of possibilities and their simplicity of use - we are never overwhelmed with a large number of often redundant options. The reduced development time also allows investing in additional functionalities, as the very core on which Thunder is based is a robust and comprehensive solution.

Mar 10 2023
Mar 10

By default, a standard Drupal installation already comes with a predefined block type that is used when you add a custom block to a region: the Basic block. It only has a title and a body field. Drupal block types work quite similarly to content types: they allow you to create a particular type of block by adding and managing fields, the form display, and display modes.

Out of the box, Drupal offers several ways to use a Twig template to theme your custom block. However, when you want to create a Twig template that applies to all blocks of a specific block type, you need a few lines of code.

To enable templates for custom block types, we use theme suggestions. Assuming you have a custom Drupal theme called "mycustomtheme", place the following snippet in your mycustomtheme.theme file:

/**
 * Implements hook_theme_suggestions_HOOK_alter() for block templates.
 */
function mycustomtheme_theme_suggestions_block_alter(array &$suggestions, array $variables) {
  // Add a suggestion based on the custom block type (bundle).
  if (isset($variables['elements']['content']['#block_content'])) {
    array_splice($suggestions, 1, 0, 'block__type__' . $variables['elements']['content']['#block_content']->bundle());
  }
}

Now you can use block--type--my-block-type.html.twig to theme every custom block of the block type you have created.

Happy theming!

Mar 07 2023
Mar 07

6. Links are hard

Particularly links with numeric IDs.

Links can point anywhere, so everything we just said about dependencies goes out of the window! Unfortunately, node IDs have a habit of appearing publicly even if we try to prevent it. Then they become a leaky abstraction.

We tend to run all our HTML text fields through a process plugin and try to clean up as much as possible. Sometimes we can establish if there’s an alias in D7 and use that.
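For illustration, the kind of process plugin we mean looks roughly like this; the plugin ID, the alias lookup, and the rewriting rule are all project-specific placeholders, not something shipped with core.

<?php

namespace Drupal\my_migration\Plugin\migrate\process;

use Drupal\migrate\MigrateExecutableInterface;
use Drupal\migrate\ProcessPluginBase;
use Drupal\migrate\Row;

/**
 * Cleans up legacy /node/123 links inside migrated HTML text (sketch).
 *
 * @MigrateProcessPlugin(
 *   id = "legacy_link_cleanup"
 * )
 */
class LegacyLinkCleanup extends ProcessPluginBase {

  /**
   * {@inheritdoc}
   */
  public function transform($value, MigrateExecutableInterface $migrate_executable, Row $row, $destination_property) {
    // Map of legacy node IDs to their D7 aliases. In a real project this
    // would come from a lookup against the source database.
    $aliases = $this->configuration['aliases'] ?? [];

    return preg_replace_callback('#href="/node/(\d+)"#', function (array $matches) use ($aliases) {
      // Prefer the old alias when we know it; otherwise leave the link as is.
      $target = $aliases[$matches[1]] ?? '/node/' . $matches[1];
      return 'href="' . $target . '"';
    }, $value);
  }

}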

The worst scenario is if you are migrating content with numeric links, and you combine migrated and new content. We’ve started experimenting with reserving a block of IDs for legacy content, like this:

  ALTER TABLE node AUTO_INCREMENT 100000;

We tried this on our most recent migration project, and it worked beautifully. It also helped during the UAT process, as there was never any question about whether we were looking at the same item of content on the legacy and migrated sites.

Mar 07 2023
Mar 07

3. Derivatives

Scaling your configurations can be made simple by using derivatives.

When you have a large number of migrations, with many variations and discrepancies between sites that have slowly diverged over time, you may need to define a base configuration to cover the commonalities and then create a separate migration file for each variation which extends it. This allows you to run each of the individual migrations covering all the variations, but as the project scales and the number of variations increases, so does the number of YAML files – it can all get quite tricky to manage! This is where derivatives are extremely useful.

Derivatives provide a simple way to expand a single plugin so that it can represent itself as multiple plugins, and they are used in other parts of Drupal, not just in migrations. To implement derivatives, you just need to add an extra line to your YAML file that points to the deriver plugin class you want to use. Your YAML file then becomes the definition of a base plugin, and it's up to the deriver to take this base definition and generate lots of copies.
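For illustration, a deriver is just a small PHP class. The sketch below uses hypothetical names and a hard-coded site list; in practice the list would be read from configuration or discovered from the source.

<?php

namespace Drupal\my_migration\Plugin\migrate;

use Drupal\Component\Plugin\Derivative\DeriverBase;

/**
 * Creates one derived migration per site in a multisite (sketch).
 */
class SiteMigrationDeriver extends DeriverBase {

  /**
   * {@inheritdoc}
   */
  public function getDerivativeDefinitions($base_plugin_definition) {
    foreach (['site1', 'site2', 'site3'] as $site) {
      $definition = $base_plugin_definition;
      // Each derivative tweaks the shared base definition, for example by
      // pointing the source plugin at a different database connection key.
      $definition['source']['key'] = 'migrate_' . $site;
      $this->derivatives[$site] = $definition;
    }
    return $this->derivatives;
  }

}

The base migration YAML then only needs its extra deriver line pointing at this class, and Drupal exposes the derived migrations under names like node_page:site1 and node_page:site2.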

The end result is that we end up with lots of migrations that operate and behave the same as normal non-derivative migrations, but rather than multiple YAML files to maintain covering all of these migrations, there is just one base YAML file.

This is a huge time saver when it comes to creating, or later tweaking the migrations. You can just edit the variations in one central location, and avoid having to edit multiple YAML files when you need to change something later.

The derived migration machine names are a bit different from what you may be used to. They take the format of the base identifier, colon and derivative name (base_identifier:derivative_name). For example, if you have a migration for the “page” content type and create a derivative for each site in a multisite, then the derived migrations could have names like “node_page:site1”, “node_page:site2” and so on.

Similarly a set of derived taxonomy migrations could be created as “taxonomy_term:tags”, “taxonomy_term:product_types”, etc. Other than this slightly different naming structure they are exactly the same. This article has the essence of how to actually use derivers: “Migrations can now be derived and executed after a specific other migration.”

Mar 06 2023
Mar 06

Working in the front end of Drupal can be difficult and, at times, confusing. Template files, stylesheets, scripts, assets, and business logic are often scattered throughout big code bases. On top of that, Drupal requires you to know about several drupalisms, like attaching libraries to put CSS and JS on a page. For front-end developers to succeed in a system like this, they need to understand many of Drupal's internals and its render pipeline.

Looking at other stacks in our industry, we observed that many try to bring all the related code as close as possible. Many of them also work with the concept of components. The essence of components is to make UI elements self-contained and reusable, and while to some extent we can do that in Drupal, we think we can create a better solution.

That is why we wanted to bring that solution to Drupal Core. Recently, the merge request proposing this solution as an experimental module was merged. This article goes over why we think Drupal needs Single Directory Components and why we think this is so exciting.

The goals of SDC

Our primary objective is to simplify the front-end development workflow and improve the maintainability of custom, Core, and contrib themes. In other words, we want to make life easier for Drupal front-end developers and lower the barrier of entry for front-end developers new to Drupal.

For that, we will:

  • Reduce the steps required to output HTML, CSS, and JS in a Drupal page.
  • Define explicit component APIs, and provide a way to replace a component that a module or a theme provides.

This is important because it will vastly improve the day-to-day of front-end developers. In particular, we aim for these secondary goals.

  • HTML markup in base components can be changed without breaking backward compatibility (BC).
  • CSS and JS for a component are scoped and automatically attached to the component and can be changed without breaking BC.
  • Any module and theme can provide components and can be overridden within your theme.
  • All the code necessary to render a component is in a single directory.
  • Components declare their props and slots explicitly. Component props and slots are the API of the component. Most frameworks and standards also use this pattern, so it will be familiar.
  • Rendering a component in Twig uses the familiar include/embed/extends syntax.
  • Facilitate the implementation of component libraries and design systems.
  • Provide an optional way to document components.

Note that all this is an addition to the current theme system. All of our work is encapsulated in a module by the name of sdc. You can choose not to use single directory components (either by uninstalling the module or just by not using its functionality). The theme system will continue to work exactly the same.

History

Whenever SDC (or CL Components) comes up, we get the same question: "Isn't that what UI Patterns has been doing since 2017?"

The answer is yes! UI Patterns paved the way for many of us. However, we did not start with UI Patterns for the proposal of SDC. The main reasons for that are:

  1. UI Patterns is much bigger than we can hope to get into Core. We share their vision and would love to see site builder integrations for components in Drupal Core one day. However, experience tells us that smaller modules are more likely to be accepted in Core.
  2. The UI Patterns concepts were spot on six years ago. Our understanding of components in other technologies and frameworks has changed what we think components should be.

In the end, we decided to start from scratch with a smaller scope, with the goal of creating something that UI Patterns can use someday.

We started this initiative because many of us have several custom implementations with the concept of Drupal components. See the comments in the Drupal.org issue in the vein of "We also do this!" Standardizing on the bare bones in Core will allow extending modules and themes to flourish. Most importantly, these modules and themes will be able to work together.

Architectural decisions

The initial team, which included Lauri Eskola, Mike Herchel, and Mateu Aguiló Bosch, met regularly to discuss the technical architecture, principles, and goals of SDC. Here are some of the fundamental architectural decisions we landed on:

Decision #1: All component code in one directory

As we have learned from other JavaScript and server-side frameworks, components must be self-contained. The concepts of reproducibility and portability are at their core. We believe that putting components in a directory without any other ties to the site will help implement those concepts. You can take a component directory and copy and paste it to another project, tweaking it along the way without a problem. Additionally, once a developer has identified they need to work with a given component (bug fixes, new features, improvements, etc.), finding the source code to modify will be very easy.

Decision #2: Components are YML plugins

We decided that components should be plugins because Drupal needs to discover components, and we needed to cache the component definitions. Annotated classes were a non-starter because we wanted to lower the barrier for front-end developers new to Drupal. We believe that annotated PHP classes fall more in the realm of back-end developers. While there are many file formats for the component definition for us to choose from, we decided to stay as close as possible to existing Drupal patterns. For this reason, components will be discovered if they are in a directory (at any depth) inside of my_theme/components (or my_module/components) and if they contain a my-component.component.yml.

The alternative we considered more seriously was using Front Matter inside the component's Twig template. Ultimately we discarded the idea because we wanted to stay close to existing patterns. We also wanted to keep the possibility open for multiple variant templates with a single component definition.

Decision #3: Auto-generated libraries

We believe this is a significant perk of using SDC. We anticipate that most components will need to have CSS and JS associated. SDC will detect my-component.css and my-component.js to generate and attach a Drupal library on the fly. This means you can forget about writing and attaching libraries in Drupal. We do this to lower the barrier of entry for front-end developers new to Drupal. If you are not satisfied with the defaults, you can tweak the auto-generated library (inside of the component directory).

Decision #4: Descriptive component API

Early in the development cycle, we decided we wanted component definitions to contain the schema for their props. This is very common in other technology stacks: some use TypeScript, others use prop types, etc. We decided to use JSON Schema. Even though Drupal Core already contains a different language to declare schemas (a modified version of Kwalify), we went with JSON Schema instead. JSON Schema is the industry's most popular choice for validating JSON and YAML data structures, while Kwalify has dropped in popularity since it was chosen for Drupal 8 nearly 11 years ago. This is why, in the trade-off between Drupal familiarity and industry familiarity, we favor the latter. We did this to lower the barrier of entry for front-end developers new to Drupal.

The schemas for props and slots are optional in components provided by your themes. They can be made required by adding enforce_sdc_schemas: true to your theme info file. If your components contain schemas, the data Drupal passes to them will be validated in your development environment. Suppose the component receives unexpected data (a string that is too short, a boolean where a string was expected, a null that appears when it was not expected, ...). In that case, a descriptive error will tell you early on, so the bug does not make it to production.
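For reference, here is a minimal sketch of where that flag lives; the theme name and the other keys are placeholders, with enforce_sdc_schemas being the relevant line:

# my_theme.info.yml (hypothetical theme; only illustrative keys shown)
name: My Theme
type: theme
core_version_requirement: ^10
base theme: olivero
# Require every component this theme provides to declare its prop and slot
# schemas so that the data passed to them can be validated.
enforce_sdc_schemas: true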

Schemas are also the key to defining the component API and, therefore, assessing compatibility between components. As you'll see below, you can only replace an existing component with a compatible one. Moreover, we anticipate prop schemas will be instrumental in providing automatic component library integrations (like Storybook), auto-generating component examples, and facilitating automated visual regression testing.

Decision #5: Embedded with native Twig tools

To print a component, you use native Twig methods: the include function, the include tag, the embed tag, and the extends tag. SDC integrates deeply with Twig to ensure compatibility with potential other future methods as well.

In SDC, we make a distinction between Drupal templates and component templates. Drupal templates have filenames like field--node--title.html.twig and are the templates the theme system in Drupal uses to render all Drupal constructs (entities, blocks, theme functions, render elements, forms, etc.). By using name suggestions and applying specificity, you make Drupal use your template. After Drupal picks up your Drupal template, you start examining the variables available in the template to produce the HTML you want.

On the other hand, component templates have filenames like my-component.twig. You make Drupal use your component by including them in your Drupal templates. You can think of components as if you took part of field--node--title.html.twig with all of its JS and CSS and moved it to another reusable location, so you can document them, put them in a component library, develop them in isolation, etc.

In the end, you still need the specificity dance with Drupal templates. SDC does not replace Drupal templates. But, if you use SDC, your Drupal templates will be short and filled with embed and include.

Decision #6: Replaceable components

Imagine a Drupal module that renders a form element. It uses a Drupal template that includes several components. To theme and style this form element to match your needs, you can override its template or replace any of those components. The level of effort is similar in this case.

Consider now a base theme that declares a super-button component. Your theme, which extends the base theme, makes heavy use of this component in all of its Drupal templates, leveraging the code reuse that SDC brings. To theme the pages containing super-button to match your needs, you'll need to override many templates or replace a single component. The level of effort is not remotely similar.

This is why we decided that components need to be replaceable. You cannot replace part of a component; components are replaced atomically. In our example, you would copy, paste, and tweak super-button from the base theme into your custom theme. The API of the replacing component needs to be compatible with the API of the replaced component; otherwise, bugs might happen. Both components must define their props schema for a replacement to be possible.
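As a sketch of what that replacement can look like: the replacing component in your custom theme declares which component it replaces in its *.component.yml. The IDs and props below are placeholders, and the replaces key reflects the experimental module's documentation at the time of writing, so treat the exact syntax as an assumption:

# my_theme/components/super-button/super-button.component.yml (hypothetical sketch)
name: Super button
status: stable
# Replace the base theme's component atomically. The exact key name is an
# assumption; the schema of this component must stay compatible with the
# schema of the component it replaces.
replaces: 'base_theme:super-button'
props:
  type: object
  properties:
    label:
      type: string
      title: Label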

Example of working with SDC

Let's imagine you are working on theming links for your project. Your requirements include styling the links, tracking clicks for an analytics platform, and showing an icon if the URL is external. You decide to use SDC, so you scaffold a component using Drush (after installing the CL Generator module). You may end up with the following (you'll want to use your custom theme instead of Olivero):

After the initial scaffold, you will work on the generated files to finalize the props schema, add documentation to the README.md, include the SVG icon, and implement the actual component. Once you are done, it might resemble something like this.

web/core/themes/olivero/components
└── tracked-link
    ├── img
    |   └── external.svg
    ├── README.md
    ├── thumbnail.png
    ├── tracked-link.component.yml
    ├── tracked-link.css
    ├── tracked-link.js
    └── tracked-link.twig

Below is an example implementation. Be aware that, since this is for example purposes only, it may contain bugs.

# tracked-link.component.yml
'$schema': 'https://git.drupalcode.org/project/drupal/-/raw/10.1.x/core/modules/sdc/src/metadata.schema.json'
name: Tracked Link
status: stable
description: This component produces an anchor tag with basic styles and tracking JS.
libraryDependencies:
  - core/once
props:
  type: object
  properties:
    attributes:
      type: Drupal\Core\Template\Attribute
      title: Attributes
    href:
      type: string
      title: Href
      examples:
        - https://example.org
    text:
      type: string
      title: Text
      examples:
        - Click me!

Note how we added an attributes prop. Its type accepts a PHP class name, an enhancement we made on top of the JSON Schema specification.

# tracked-link.twig
{# We compute if the URL is external in Twig, so we can avoid passing a #}
{# parameter **every** time we use this component #}
{% if href matches '^/[^/]' %}
  {% set external = false %}
{% else %}
  {% set external = true %}
{% endif %}

<a {{ attributes.addClass('tracked-link') }} href="{{ href }}">
  {{ text }}
  {% if external %}
    {{ source(componentMeta.path ~ '/img/external.svg') }}
  {% endif %}
</a>
If a component receives an attributes prop of type Drupal\Core\Template\Attribute, it will be augmented with component-specific attributes. If no attributes prop is passed to the component, one will be created containing the component-specific attributes. Finally, if the attributes prop exists but is not of that type, it will be left alone.

/* tracked-link.css */

.tracked-link {
  color: #0a6eb4;
  text-decoration: none;
  padding: 0.2em 0.4em;
  transition: color .1s, background-color .1s;
}
.tracked-link:hover {
  color: white;
  background-color: #0a6eb4;
}
.tracked-link svg {
  margin-left: 0.4em;
}

Components that make use of attributes receive a default data attribute with the plugin ID. In this case data-component-id="olivero:tracked-link". We could leverage that to target our styles, but in this example, we preferred using a class of our choice.

// tracked-link.js
(function (Drupal, once) {
  const track = (event) => fetch(`https://example.org/tracker?href=${event.target.getAttribute('href')}`);

  Drupal.behaviors.tracked_link = {
    attach: function attach(context) {
      once('tracked-link-processed', '.tracked-link', context)
        .forEach(
          element => element.addEventListener('click', track)
        );
    },
    detach: function detach(context) {
      // Remove the once marker so the element can be processed again if it
      // gets re-attached, and unbind the click handler.
      once.remove('tracked-link-processed', '.tracked-link', context)
        .forEach(element => element.removeEventListener('click', track));
    }
  };
})(Drupal, once);

 With this, our component is done. Now we need to put it in a Drupal template. Let's inspect the HTML from a Drupal page to find a link, and there we'll find template suggestions to use. Let's say that our links can be themed with field--node--field-link.html.twig. To use our component, we can use include because our component does not have slots.

# field--node--field-link.html.twig

{% for item in items %}
  {{ include('olivero:tracked-link', {
    attributes: item.attributes,
    href: item.content.url,
    text: item.content.label
  }, with_context = false) }}
{% endfor %}


Single Directory Components were merged into Drupal core 10.1.x. This means that they will be available for all Drupal sites running Drupal 10.1 as an experimental module.

However, we are not done yet. We have a roadmap to make SDC stable. We are also preparing a sprint at DrupalCon Pittsburgh 2023 for anyone to collaborate. And we have plans for exciting contributed modules that will make use of this new technology.

Note: This article has been updated to reflect the latest updates to the module leading to core inclusion.

Mar 01 2023
Mar 01

There are two choices when it comes to building a website. You can choose an open source platform like Drupal or WordPress, or a proprietary platform overseen by a company like Adobe or Microsoft. How do you know which is best for your website?

Things to consider:

  • How much user support will I get?

  • Which is better for security?

  • Is the cost within budget?

For organizations with limited budgets, the choice is either an open source site or something less flexible like Wix or Squarespace – the cost attached to a proprietary platform might be out of reach. However, for a large enterprise organization, both approaches have pros and cons worth addressing.

Proprietary platforms can be attractive to many large organizations for several reasons. In addition to promising great platforms customized to the client's business needs, proprietary arrangements typically offer full hosting plans. The company behind the CMS handles all updates, upgrades, security issues, and bugs – often 24/7.

While proprietary software comes with a high price point, there's a sense of justification behind it: at least you get what you pay for.

It's worth noting, though, that many of the world's biggest corporate brands use Drupal as their CMS of choice, including General Electric, Tesla, IBM, Paramount Global, United Airlines, and the Royal Family. The Government of Australia operates on Drupal, as does the Government of Ontario, the Canadian Security Intelligence Service (CSIS), several US state governments, and countless other government agencies around the world.

So, why do organizations that have large budgets for web development opt for an open source platform, despite the supposed advantages touted by proprietary providers?

The answers are numerous, ranging from a need for financial accountability to the supportive nature of the Drupal community. These factors more than make up for any potential shortcomings of the open source model.

This article runs through some popular myths around proprietary and open source platforms that continue to influence decision making.

Myth #1: Proprietary platforms provide better user support

One of the main selling points of proprietary platforms is that their vendors promise 24/7 client support should anything go wrong with the site, or if you need anything customized. This 24/7 support comes at a cost. For institutions concerned about sudden emergencies, this is obviously an appealing offering that for many justifies the price tag.

What proprietary vendors won't tell you, however, is that open source platforms like Drupal provide much of the same service (typically in tandem with an agency and an infrastructure partner like Acquia or Pantheon). This is provided at no cost through their networks of volunteers and sponsored contributors.

Drupal, for example, is supported by a global community of hundreds of thousands of contributors who work collaboratively to address technical issues and improve the platform.

In the Drupal world, when you find a bug and create a report within the community, the response — while not necessarily instantaneous — is typically fast. While mission-critical sites like government platforms will need to pay somebody to be available for 24/7 support, this broader community support is of enormous benefit to all Drupal users.

Proprietary platforms do have counterparts to this type of community, but they're oftentimes much smaller. Sitecore, for example, advertises that it has a community of 20,000 developers. This is a drop in the bucket compared to the scope of the Drupal developer community.

Myth #2: Proprietary is more secure than open source

This is a stubborn myth — understandably. Open source code, by its nature, is publicly available to anyone, including individuals with malicious intent. In contrast, proprietary platforms keep their codebases under lock and key. The for-profit nature of proprietary vendors gives them a greater (financial) incentive to track down and neutralize bad actors.

The unpopular truth is that proprietary platforms are every bit as vulnerable to attacks as their open source counterparts — if not more so.

For one thing, most security breaches don't come from hackers scouring source code for weak spots, but from avoidable human lapses such as failures to follow security guidelines, improper software setup, use of easy passwords, lack of data validation processes, and absence of data encryption techniques. These lapses are no less likely to occur on a proprietary platform than they are on an open source one.

Paradoxically, the open source nature of platforms like Drupal is actually more of a help than a liability when it comes to cybersecurity. Open source code means that anyone with the know-how can search for and identify vulnerabilities. And with an army of over a million developers contributing behind the scenes, it's safe to say that Drupal takes its security very seriously. Proprietary vendors, by contrast, are limited in this capacity by their cybersecurity staffing numbers.

Myth #3: Proprietary costs more, so you get more value

It's widely believed that when you opt for a less expensive product — in this case, an open source website — you're either settling for a "less-good" product or setting yourself up for additional costs down the road in the form of upgrades and modifications. Proprietary websites may cost more at the outset, but at least you know you're getting something of real quality and the costs are predictable.

In truth, there is no difference in quality between open source and proprietary websites. It all depends on the quality of workmanship that goes into building the sites. And while any website project is vulnerable to budget overruns, proprietary platforms are actually more prone to them than open source ones.

When you opt for a proprietary platform, you automatically commit to paying for a license. This may be a one-time cost or a recurring subscription fee. In many cases, proprietary providers charge on a "per-seat" basis, meaning that the larger your team gets, the more expensive maintaining your website becomes. An open source site, by contrast, costs nothing beyond what you spend on design, and is in fact much more predictable from a cost standpoint.

This is of particular importance to governments, whose website development and renewal costs are publicly available and subject to intense media scrutiny. The Government of Canada faced negative press after it hired Adobe to restructure a vast swath of federal websites under the Canada.ca URL. A project originally valued at $1.54 million in 2015 had by the following year ballooned to $9.2 million. While details were scant, some of this budget overrun was attributed to costs due to additional staffing requirements. Cue impending doom music.

Websites built on open source platforms like Drupal aren't cheap to develop, but the costs are almost always more predictable. And when it's the taxpayers who are footing the bill, this is a major advantage.

Bonus: Open source = wider talent base

If you're a large government organization with complex web needs, chances are you'll be looking to hire in-house developers. From this standpoint, it makes much more sense to opt for an open source web platform in terms of available talent. The magnitude of the Drupal community relative to, say, Sitecore, means that your LinkedIn search is far more likely to turn up Drupal specialists in your area than Sitecore experts.

Similar disparities exist when it comes to providing your staff with training. Drupal training is widely available and affordable (hint: Evolving Web offers customized training). Becoming a licensed developer for something run by Adobe, by contrast, is a much more complex and expensive undertaking.

Why Drupal specifically?

I've touted Drupal extensively throughout this post, as Evolving Web is the home of many Drupal trainers, developers, and experts. However, it's far from the only open source CMS option out there. WordPress remains the world's most popular CMS platform, used by some 43% of the world's websites.

Drupal does, however, stand out from the pack in a number of important ways. The Drupal platform simply has more features and is a lot more supportive of customization than most of its open source competitors. This is perhaps less of a big deal if you're a small business or organization with a narrow area of focus. But government websites are generally complex, high-traffic undertakings responsible for disseminating a wide range of content to a diverse array of audiences.

Other cool government sites are using it

Evolving Web recently redesigned the official website for the City of Hamilton. As the main online hub for Canada's ninth largest municipal area, serving some 800,000 people, the City of Hamilton website caters to a wide range of audiences, from residents and local business people to tourists and foreign investors. Its services run the gamut, enabling residents to plan public transit use, pay property taxes, find employment, apply for a marriage license, and get information on recreational activities, among many other options.

The City of Hamilton site exemplifies many of Drupal's strengths. Like many government websites, it encompasses vast swaths of data and resources and is subject to considerable surges in traffic, both of which Drupal is well equipped to handle. The site revamp also involved corralling various third-party services (including the recreation sign-up and council meeting scheduler) and a half-dozen websites that existed outside of Drupal. This required creative solutions of the sort that the Drupal community excels at developing.

Drupal upholds accessibility standards

A further advantage of Drupal for government websites is that its publishing platform, along with all of its other features and services, is designed to be fully accessible in accordance with WCAG standards. Drupal's default settings ensure accurate interpretation of text by screen readers and provide accessible color contrast and intensity recommendations. Drupal also generates accessible pictures and forms and incorporates skip navigation in its core themes.

You are in good company

All this attests to the strengths of the open source model — and of Drupal in particular — underpinned as it is by an army of over a million contributors. Thanks to this, the platform is in a constant state of improvement and innovation, of which every single Drupal user is a beneficiary.

Join the club

At Evolving Web, we specialize in helping organizations harness their online presence with open source platforms like Drupal and WordPress. Let's keep in touch!

Join our inner circle and sign up for our newsletter, where you'll get insider content and hear more about upcoming training programs, webinars and events.

Feb 27 2023
Feb 27

Since Drupal 10 has been released recently, we decided to cover ins and outs of the new Drupal core version. You might have already read our blog post on the advantages of the Claro admin theme, which is available to administrators and content managers of Drupal 10 websites.

We have good news for those who develop user interfaces in Drupal, too: you will get a new default front-end theme. Drupal 9.1 brought the Olivero theme in as an experiment, and it is here to stay in the latest version of Drupal. In this post, we explain the philosophy at the heart of Olivero and why it makes Drupal even more attractive.

Olivero

What Was Wrong With the Old Theme?

The front-end theme available in Drupal core has been updated for the first time in 11 years — this is the time CMS users spent with the Bartik theme. During DrupalCon Amsterdam 2019, the team working on Olivero gave credit to the old theme but rightfully mentioned that it no longer spoke to web design trends and Drupal’s improvements.

For instance, graphical elements, gradients, and drop shadows in Bartik lost their novelty, while the interface did not support the Layout Builder, secondary dropdown menus, embedded media, and other features added to the CMS since Drupal 7.

Drupal devotees, including our web studio, talk a lot about flexible contemporary modules for any task as well as other benefits of the CMS. But imagine someone who sees Drupal for the first time and judges it by its outdated theme. Will they share our enthusiasm in 2023? We don’t think so.

The first discussion of Olivero took place in the lobby of the hotel hosting DrupalCon Seattle 2019. This casual conversation among Drupal enthusiasts turned into a strategic initiative. Three main requirements for the new out-of-the-box front-end theme were defined:

  • Drupal theme design should feel modern and age well in the next several years.
  • The theme should be able to support new Drupal features.    
  • It should comply with the design accessibility standards described in WCAG AA.

Feel free to thank all contributors who helped Olivero come to life. These are the participants of the conversation in the hotel, stakeholders, and the most prominent contributors to the project: Putra Bonaccorsi (proeung), Mike Herchel (mherchel), Angie Byron (webchick), Lauri Eskola (lauriii), Dries Buytaert (dries), Cristina Chumillas (ckrina), Gábor Hojtsy (gábor-hojtsy), Jen Witkowski (jwitkowski79), Jared Ponchot (jponch), Kat Shaw (katannshaw), and Matthew Tift (mtift).

Feb 25 2023
Feb 25

Let's define the scope and goals of our project to upgrade this very website to Drupal 10.

Essentially, that's it: we want to upgrade this website to Drupal 10 so that we can benefit from security releases etc.
At the moment we want to do so with the minimum of effort, so I don't want to be writing lots and lots of code or fundamentally changing how the site works, but I am up for simplifying things if it gets us to a point where we have less code to maintain.

Since Drupal 9, major version upgrades now take this basic form:

  • Update your code to be fully compatible with the latest release of the current major version, removing all deprecations: hard.
  • Upgrade to the new version of Drupal: easy!

I'm going to install and use the fantastic Upgrade Status module to get a detailed handle on what we need to change, upgrade and rewrite to get the site working in Drupal 9, but ready for Drupal 10. We'll use that as a basis to decide the best plan for each component and go from there.

Upgrade status - First pass

We have previously composer require'd the Upgrade Status module into our codebase, so after enabling it and running the report, here are the major findings that concern us for this series:

Environment

  • We'll need to upgrade to PHP 8.x, the site is currently running on PHP 7.4.
  • We're using deprecated or obsolete modules that come with core and will be removed in Drupal 10. This is a rather scarily long list for us:
    • CKEditor
    • Color
    • RDF
    • Seven
    • Stable

But other than that, we're good to go from an environment point of view.

Contrib projects

Upgrade status breaks the list of contributed projects down into a few sections; those are:

  • Projects that need an upgrade that might make them Drupal 10 compatible:
    • Better exposed filters
    • Components
    • Disqus
    • Advanced link
    • Entity browser
    • jQuery UI Slider
    • Scheduler
    • Simple XML Sitemap
    • Twig Tweak
    • Webform
  • Projects that don't have Drupal 10 releases yet, so either require patches or work to get them to Drupal 10:
    • Entity Embed
    • jQuery UI Sortable
    • Kraken
    • Markdown
    • Social media share
    • Term Reference Change
    • Unified Twig Extensions
    • Video Embed HTML5
    • Weight
  • Projects that are already compatible with Drupal 10. I'll not list those, but there are plenty, and it's great to see community support for Drupal 10.

Custom code

Upgrade status will scan your code and tell you if there are problems it can spot that will stop the code working with Drupal 10. This is static analysis, so it isn't perfect, but it is a really good start.
We have a few custom modules doing very specific things on our site, but we also have a custom theme doing quite a lot of custom things, and that's where the bulk of the issues the scanner found are, so we're going to need to set aside some time for that.

Simplifications

This site was built in the early Drupal 8 days, and we've not actually made too many changes since, specifically when we upgraded to Drupal 9 we basically did the smallest amount of work to get it there. How you'd typically handle media on a Drupal site has fundamentally changed since we built this site, in that you'd likely use the core Media module and add entity reference fields to your entities rather than adding image/file fields directly. However, we never had that luxury and never got around to changing our approach to use the core Media framework.

So we're going to allow ourselves a bit of scope creep to do this 'sub-project', given that the benefit is that we'll be able to remove a bunch of modules (entity browser, file browser, etc.). That means we won't need to upgrade those modules, and our dependencies will be better supported, since they'll be in Drupal core. It's no slight against those modules; it's just that we don't need the functionality they bring for our site today.

The scope/plan

So roughly the scope/plan is shaping up to be:

  1. Convert our file/image fields to core media, and remove entity browser, file browser, etc.
  2. Update our custom code
  3. Evaluate the remaining upgradeable contrib projects to see if we can remove them, and if not, upgrade them.
  4. Evaluate the remaining non-upgradeable contrib projects to see if we can remove them, and if not, work with maintainers to get them upgraded.
  5. Handle the core modules that have been marked as deprecated or obsolete.
  6. Upgrade the PHP version we use to run the site
  7. Get the site running in tip-top condition with the latest Drupal 9 etc.
  8. Do the Drupal 10 upgrade.

Then we'll have a shiny Drupal 10 install, ready for the next few years of security patching.

Feb 23 2023
Feb 23

Editor’s Note -- This article was formerly listed as the Top 10 Websites Built with Drupal, and based on TopDrops.org. That site has since stopped updating, so we decided to pivot towards a new kind of value for our readers: the most surprising examples of Drupal-run sites.

Some of the world’s most influential businesses and organizations run their websites using Drupal: General Electric, eBay, The Economist, etc.

A good number of the groups using the CMS might come as a surprise, however, and they prove its reliability for creating powerful and noteworthy sites. We checked the web to bring you our list of the top Drupal websites.

For a list of Drupal’s 10 best sites, read on.

Learn Why Varbase CMS Is the Best Multilingual Enterprise-Grade Drupal Website Builder

Source: Entertainment News

Entertainment Weekly (a.k.a. EW) is an American publication, owned by Time Inc. The famous magazine covers various media entertainment niches from cinema, television, music, theater, books, and pop culture.

EW is renowned as a pioneer in covering all things Hollywood, from the latest films and trends to the high-octane lives of its celebrities. EW reports on television ratings, movie grosses, production costs, and even concert ticket sales. Their in-depth content is among the top resources for the world’s favorite shows, producers, showrunners, and more.

The platform is definitely one of the top Drupal sites and the most popular one ever! Ew.com received approximately 10 million organic visits per month in 2021, according to Seranking.com.

Source: Tesla.com

You’ve certainly heard a thing or two about Tesla's fleet of self-driving cars, Cybertrucks, extremely fast and powerful electric Roadsters, or even the Tesla Powerwall: a giant battery providing homes with storage options for clean energy.

Tesla is indeed one of the world’s most talked-about companies today, as it is a pioneer of electric vehicle manufacturing and clean energy.

A billion-dollar company's website ought to be super neat, clean, and highly effective at showcasing its products. This was accomplished with the power of Drupal CMS.

We’re big fans of their homepage in particular, and we recommend you check it out. It’s exactly what a future tech company’s website should look like!

Source: NCAA.com

College sports in the United States are big business. The National Collegiate Athletic Association (NCAA) is a non-profit association that regulates athletic competitions for 1,281 institutions, hosts conferences, and manages related organizations across the United States.

In 2014, the NCAA generated nearly a billion dollars in revenue—80 to 90% of which was thanks to the Men's Division I Basketball Tournament.

Their website is a functional mix of sports journalism and sales. Not only do they post schedules, analysis, and video coverage, but they also market their team merchandise hosted on the secondary site, shopncaasports.com.

 

Source: MINT.com

Mint.com is a free web-based personal financial management service that connects to over 16,000 US and Canadian financial institutions and self-reports having hundreds of thousands of users. Mint's primary service allows users to track bank, credit card, investment, and loan transactions and balances, all through a single user interface, as well as create personal budgets and goals.

In 2009, it was acquired by Intuit, the makers of Quicken and TurboTax. Judging by the look and feel of their site, that merger came with a bump in digital marketing expertise: Mint.com is simple, clean, and makes user acquisition easy.

Source: australia.gov.au

The Australian government leans on Drupal to power its website: a sprawling information resource for citizens, visitors, and entrepreneurs. The site hosts over 3,000 distinct pages covering topics from healthcare and culture to career opportunities and travel suggestions. The website even goes the extra mile by linking to local news and social media channels.

Australia.gov.au is a great example of how well Drupal government websites can organize and present information. The site is designed like an inverted funnel, with the homepage offering a selection of categories that branch into more specific topics the deeper you dive.

Source: LeFigaro.com

Founded in 1826, Le Figaro is the oldest national newspaper in France. It is the second-largest national newspaper in France after Le Parisien and before Le Monde and is part of Le Figaro Group, whose publications include TV Magazine and Evene.

The site delivers a variety of features that naturally belong on the website of a leading periodical. Page load speed is stellar despite being packed with feeds, media items, and a live video pop-up on the bottom corner of the screen.


The Emmy Awards are a group of American awards dedicated to recognizing the best of U.S. television, from its actors and directors to its engineers, musicians, and humanitarian philanthropists. Their website covers featurettes on notable happenings and personalities surrounding television around the world (though naturally centered on America), as well as event schedules and videos of events and commentary.

Their site is dense in terms of content but doesn't sacrifice smoothness in presentation, just what you might expect from a showbiz powerhouse like the Television Academy.

While there are many options to choose from regarding themes for your website's content, here's our list of recommended Drupal themes that enable an effective and engaging digital experience.

Source: Keap.com

Keap offers a client management service and automation platform (Infusionsoft) for small businesses. Their products are aimed at streamlining the customer lifecycle, facilitating customer relationship management, marketing automation, lead capture, and e-commerce.

Based in Chandler, Arizona, USA, Keap is one of the fastest growing private companies in the region, adding 240 jobs between 2012 and 2013, and also receiving $54 million in venture capital from Goldman Sachs in early 2013.


ABS-CBN News and Current Affairs is the news division of the ABS-CBN Corporation, a Philippine media conglomerate. It’s headquartered in the Philippines, and has news bureaus in North America, Europe, Asia Pacific, and the Middle East, making it the largest and the most comprehensive news outlet when it comes to local and international newsgathering in the island nation.

Their website is supported by Drupal, which allows them to deliver news in real-time, connect across various social media platforms, and encourage community discussion through a login system for news readers to set up profiles and engage in discussions.

NASA, as we all know, is the American government’s flagship agency for its civilian space program and its aeronautics and aerospace research. They stand at the forefront of many of the world’s latest discoveries in physics, astronomy, and engineering, and their website is a haven for the world’s science enthusiasts.

Their site hosts information about past and present space missions, ultra-high-definition photos and videos of the cosmos, and download links to a nearly endless number of apps and learning resources for those looking to learn more about the universe we inhabit. It’s a shining example of Drupal CMS used to present stunning information and elevate the user’s experience.

Honorable Mentions

As of 2022, the following two major platforms were revamped and enhanced as digital experiences with Drupal 8:

Alkhaleej News & Media

Al Khaleej is one of the largest news platforms in the MENA region. The magazine publishes dozens of news pieces daily, all of which include heavy media items such as videos, audio, and images. Vardot was tasked with building their website and making sure it would not slow down or crash under the pressure of heavy media and hundreds of thousands of visitors. We managed to do this with the help of the Drupal site builder Varbase, and thanks to this project, Vardot was named a finalist for the Acquia 2021 Engage Awards.

FIND MORE IN 'AL KHALEEJ' CASE STUDY

States Newsroom

States Newsroom is a non-profit network of over 25 affiliate sites covering state reporting in America. With headquarters in Washington and multiple local teams totaling over 95 staff, the site provides high-quality, non-partisan state reporting with a progressive editorial approach. In order to maintain their ad-free and impartial reporting standards, their philanthropic business model relies on transparent donations from individuals or grants but not corporations or governments.

Do you agree with our list of top 10 Drupal websites in the world? If you don't or see better websites out there worth mentioning... let us know in the comments!

CMS Buyers Guide

Need help choosing the ideal CMS?

Download our free CMS Buyers Guide!

Feb 22 2023
Feb 22

It happens often. Early in a project, a team member will offhandedly say something like "the site is slow," "my computer is slow," or "Docker is slow." Those all may be true. But it takes specialized knowledge to track down the root cause of performance issues. This article covers the same tips we share with our team to help solve workstation performance problems.

We'll cover mostly macOS, but the general framework is the same for any operating system. On Windows, use Task Manager instead of Activity Monitor, and on Linux, use htop or bpytop.

When a computer is slow, the best way to solve it is to:

  1. Identify what programs are using the most resources.
  2. Identify what types of resources are in contention.
    • CPU: Your computer's processor. If it's at 100%, programs must wait their turn to do something.
    • Memory: Where information being used by programs is stored. For example, this article is literally in memory as you view it. Your browser will keep the page in memory even after navigating away, so it's quick to come back if you hit the Back button.
    • Disk: Much slower than memory, but also much bigger. And it's persistent across restarts. For example, when restarting, macOS has to load everything from the disk because memory is cleared on a restart.
    • Network: The combination of Wifi and internet. If something is slow, but the three items above are all good, this is often the issue. For example, when updating ddev to a new version, the slowest part is almost always downloading it from the internet.
    • GPU: Graphics processing. This mostly matters for designers and those who use graphics applications like Figma, which are 3D-accelerated.

Using Activity Monitor

Activity Monitor is useful for seeing what your computer is doing. The easiest way to open it is to open Spotlight (command-space) and search for it.

When you open Activity Monitor for the first time, you will see a window like this:

Below is the memory tab. The most important is "Memory Pressure." Once it goes yellow, you're likely to start noticing slowness.

Disk is also helpful. For example, disk activity will increase significantly when pulling a new Docker image as Docker uncompresses the download.

Finally, the Network tab. If you change Packets to Data, you may find this window easier to think about. You can do this at the bottom.

In the Window menu, there is a CPU History option. It creates a floating window with a graph for each CPU core. Use this window to see what happens when you're doing something in an application. For Intel users, the number of cores shown is usually doubled because of Hyper-Threading: the computer this image was taken from has 8 "real" CPU cores, but the processor exposes them as 16. (You can see if your Mac uses an Intel processor by opening the Apple menu at the top left and selecting About This Mac.)

iStat Menus

We often find Activity Monitor is too busy to glance at for a quick diagnosis. When something is slow, it helps to know what's going on right now and see a quick representation of CPU, Memory, Disk, and Network all at once. iStat Menus is a go-to app for this, shipping with several useful menubar widgets.

Here's an example set of widgets from left to right:

  1. CPU temperature. On Intel, when this gets high (90° C+ typically), your processor makes itself slower to keep cool. Apple Silicon processors will throttle too, but it's much harder to trigger in day-to-day work.
  2. Network/internet.
  3. Disk - the lights go solid when the disk is being read (blue) or written to (red).
  4. Memory use.
  5. CPU use. The i means it's using integrated graphics, not dedicated graphics, which uses more battery. Apple Silicon computers only have one graphics chip and always show a D, which can be hidden.

As an example, let's tell PHPStorm to "invalidate caches and restart." This will cause lots of CPU and disk activity as it reindexes the loaded Drupal project. We'll click on different sections of the iStat Menus menubar to bring up more detailed windows.

The following two screenshots show the CPU being used, but not fully. That's because PHPStorm is waiting on the disk to read files to index.

A little bit later, we can see the CPU applet nearly filling up. Even typing this article is slightly laggy.

 Fixing problems

My CPU is constantly at 100%

If possible, quit the programs using the CPU you don't actively need. If you see high CPU use from a program you don't recognize, ask about it. Many macOS system programs like WindowServer (used to draw the screen) and kernel_task (used to talk to hardware) can't be quit. But, you may find that closing a browser tab with lots of visuals (like ads) reduces other processes significantly.

My memory is full

You either need to quit programs or upgrade your computer. If you see something oddly using a lot of memory, it may indicate a bug in that program.

My disk is constantly being used

Like the above, you need to quit programs you aren't currently using. Try to keep at least 10-20% of your disk free. Flash disks (SSD drives, which most computers have nowadays) need free space to operate effectively, and a disk that is 90% or more full will be significantly slower. When buying a new computer, remember that larger disks will be faster than smaller ones.

Downloads are slow, or calls are lagging

Rule out Wifi as a problem by connecting your computer directly to your router with a cable. Also, remember that your internet connection is shared with every other device in your home. If the problem is caused by other devices using your home internet connection, check if your router supports Active Queue Management. At the least, if you determine this is the issue, you know buying a new computer won't fix it.

Security scanners can often cause performance issues

Sophos, Bitdefender, and Windows Defender on Windows cause a computer to be horrifically slow, especially for developers. Our projects comprise tens of thousands of small files, the worst case for security scanners. Developers may find they have to exclude project directories from scanning or disable third-party antivirus entirely.

Docker is slow on macOS

Lullabot has standardized on ddev for local environments. We recommend all macOS users globally enable mutagen for good performance. For a deep dive into why Docker is slow on macOS, see Paolo Mainardi's excellent article Docker on MacOS is slow and how to fix it. We've found that colima, ddev, and mutagen generally offer native-like performance beyond the initial code sync.
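For reference, the global toggle roughly looks like this in ddev's global configuration file; the key name has changed across ddev releases, so treat it as an assumption and check your ddev version's documentation:

# ~/.ddev/global_config.yaml (sketch; key name may vary between ddev versions)
# Sync project files with mutagen for near-native filesystem performance
# on macOS instead of the slower default bind mounts.
mutagen_enabled: true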

Applications on Apple Silicon are not as fast as expected

If you use Migration Assistant to migrate to a new Mac, it will copy applications over, including those written for older Intel processors. These applications will run fine but are less fast than native Apple Silicon apps. Use the "Kind" column in Activity Monitor to search for Intel apps and upgrade or replace them. Many apps, notably Electron apps like Slack, don't ship universal binaries and require downloading a new copy manually to switch architectures. Likewise, Homebrew should be reinstalled from scratch in /opt/homebrew to get native apps. Finally, Docker apps like colima may need to be reconfigured to use a native VM instead of an Intel VM.

Conclusion

We hope this guide is as helpful for you as it has been for our team. When developing websites, wasted time can mean wasted money, and a slow computer can waste a lot of time. Having an efficient local development environment ensures we provide the most value.

Feb 16 2023
Feb 16

Matt Kleve gets three front-end developers together who have been working hard on Drupal core and are excited about the great things newly released in Drupal 10.

Feb 15 2023
Feb 15

Join us TOMORROW, Thursday, February 16 at 1pm ET / 10am PT, for our regularly scheduled call to chat about all things Drupal and nonprofits. (Convert to your local time zone.)

No pre-defined topics on the agenda this month, so join us for an informal chat about anything at the intersection of Drupal and nonprofits.  Got something specific on your mind? Feel free to share ahead of time in our collaborative Google doc: https://nten.org/drupal/notes!

All nonprofit Drupal devs and users, regardless of experience level, are always welcome on this call.

This free call is sponsored by NTEN.org and open to everyone. 

  • Join the call: https://us02web.zoom.us/j/81817469653

    • Meeting ID: 818 1746 9653
      Passcode: 551681

    • One tap mobile:
      +16699006833,,81817469653# US (San Jose)
      +13462487799,,81817469653# US (Houston)

    • Dial by your location:
      +1 669 900 6833 US (San Jose)
      +1 346 248 7799 US (Houston)
      +1 253 215 8782 US (Tacoma)
      +1 929 205 6099 US (New York)
      +1 301 715 8592 US (Washington DC)
      +1 312 626 6799 US (Chicago)

    • Find your local number: https://us02web.zoom.us/u/kpV1o65N

  • Follow along on Google Docs: https://nten.org/drupal/notes

View notes of previous months' calls.

Feb 09 2023
Feb 09

We work with Pantheon a lot. We love their platform and how it allows us to focus on developing Drupal projects while leaving the system administration to them. We have a reliable CI pipeline that allows us to develop in GitHub and push a production-ready artifact to Pantheon’s downstream repository - it’s a great developer experience. We’re happy with this portion of our workflow, but once our work is merged to the main branch and deployed to the dev environment on Pantheon, things begin to get a little more dicey. Deploying to test and live seems like it should be the easiest part, since Pantheon has their drag & drop UI that everyone reading this is probably already familiar with. The issues that we bump into tend to come when configuration changes are made directly to a production environment.

How we used to deploy

First, let’s take a look at how we have historically deployed to these environments (a sketch of the corresponding Quicksilver configuration follows the list):

  1. Deploy production-ready code to target environment by using Pantheon’s drag & drop UI.
  2. Use a Quicksilver script to run drush cim
  3. Use a Quicksilver script to run database updates using drush updb
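For context, steps 2 and 3 were wired up as Quicksilver hooks in pantheon.yml, roughly along these lines (a simplified sketch; the script paths and descriptions are placeholders rather than our actual scripts):

# pantheon.yml (simplified sketch; script paths are hypothetical)
api_version: 1
workflows:
  deploy:
    after:
      # Step 2: import configuration (drush cim) once code is deployed.
      - type: webphp
        description: Import configuration
        script: private/scripts/config_import.php
      # Step 3: run database updates (drush updb).
      - type: webphp
        description: Run database updates
        script: private/scripts/db_updates.php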

This workflow is great, but it makes the big assumption that there are no config overrides on the target environment. Sure, we like to imagine that our code is the only source of truth for production configuration, but that is not always the case. Sometimes, there’s a legitimate reason for a client to make a quick change to production config. When we deploy to an environment with overridden configuration using the above workflow, the client’s configuration changes will get reverted unless the developer catches the overridden config prior to initiating deployment. While there are many approaches that we as developers can and should take to help prevent configuration overrides on production - like setting up appropriate roles, using config_ignore for certain special cases, and core’s config_exclude_modules settings - they can still happen from time to time.

We’ve had a lot of success using Pantheon’s Quicksilver Hooks to automate our deployment steps (seen above), but what are we to do when we deploy to an environment that has overridden configuration? Should we not import our new configuration? Or should we blindly import our config changes and revert the existing overrides? Clearly, neither option is ideal. Along with this dilemma, relying solely on Quicksilver hooks presented a few other challenges that we wanted to improve on:

  • Reporting: Unless you are running terminus workflow:watch or looking at terminus workflow:info:logs for every deployment, it’s not clear what’s actually taking place during a deployment.
  • Lack of clarity: Without reading about them in a project’s docs or checking the project’s pantheon.yml, a developer initiating a deployment may not even be aware that Quicksilver hooks exist and are going to execute on the target environment!
  • Inflexible: Quicksilver hooks do the same thing every deployment, and don’t ask questions. Without resorting to something like keywords in commit messages, there’s no way that a step can be skipped or altered per-deployment.
  • Lack of an escape hatch: Once a deployment is initiated, there’s no pre-flight check that can give the option to abort.

Our new approach

These are the reasons that we started investigating a new method to handle deployments to test and live on Pantheon, and in order to address them we created a few hard requirements:

  • As a developer, I should be able to abort a deployment if there are configuration overrides on the target environment.
  • As a developer, I should be able to easily know what steps are executed during a deployment. There should be no surprises.
  • As a developer, I should be able to easily see logs from all deployments initiated by our team.
  • As a developer, I should be able to update our deployment workflow in one place across all of our projects. (This one was a nice-to-have.)

As a developer, I should be able to abort a deployment if there are configuration overrides on the target environment.

To start, we looked at how we could create a deployment process that could self-abort if there were configuration overrides. This was our highest priority requirement. We needed to avoid blindly reverting existing configuration changes that had been made on production. Since telling our development team to “just check config on prod prior to deployment” was not an acceptable solution for us, we created a new terminus plugin to help with this: lastcallmedia/terminus-safe-deploy. This plugin adds a new command terminus safe-deploy:deploy SITE.ENV that will run through all of the steps of our traditional deployment (along with a few more optional ones). Before initiating the deployment on Pantheon, the plugin will check for overridden configuration on the target environment and abort if it finds any. If the --force-deploy flag is set it will still check for overridden configuration and output what it finds, and then continue the deployment.

As a developer, I should be able to easily know what steps are executed during a deployment. There should be no surprises.

We added several other flags to the safe-deploy:deploy command that would allow us to explicitly state which operations we wanted to perform during a deployment:

  • --with-cim: triggers config imports post-deploy
  • --with-updates: triggers DB updates post-deploy
  • --clear-env-caches clears the target environment CDN and Redis caches. This is something that we didn’t typically include in our Quicksilver scripts, but we saw value in making it easily accessible for the times that we needed it.

Each flag must be added so we make the conscious decision to include it as a part of our deployment.

As a developer, I should be able to easily see logs from all deployments initiated by our team.

We preferred not to rely on terminus workflow:info:logs to see what happened for each deployment. Our developers were already visiting the GitHub repository to review and merge pull requests, so GitHub seemed like the perfect place to initiate our deployments and store the logs as well. We decided to use GitHub Actions to trigger the deployments. We use their workflow_dispatch event to initiate our deployments manually, and as a bonus they provide an interface for workflow inputs, which we can correlate to the flags on the terminus command. We also included the ability to post success/failure messages to Slack, with links to each job so the team can easily see when deployments are run, if they pass/fail, and have a link to view the logs without having to search.

To use Slack alerts the command accepts a --slack-alert flag and a --slack-url argument (or a SLACK_URL can be set as an environment variable).

As a developer, I should be able to update our deployment workflow in one place across all of our projects.

This was a bonus requirement that we’re really excited about. GitHub actions allows the reuse of workflows from public repositories in other workflows, so we built a single terminus-safe-deploy workflow, and we are able to reference it from individual project workflows as seen in this gist. This lets us merge changes into the workflow (like updating the docker image used, or adding another step if needed) without having to update each individual project’s workflow files. In the example above, we are calling the workflow from the main branch but you can reference specific tags or commits if you prefer to prevent changes to the workflow for a particular project.
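To give a feel for the shape of such a caller workflow, here is a minimal, hypothetical sketch that triggers manually and runs the terminus command directly in a job step. The real setup calls the reusable workflow from the gist instead; the input name, the omitted terminus and plugin installation steps, and the SSH key handling are assumptions, while the secrets and variables match the ones described in the setup section below.

# .github/workflows/pantheon_deploy.yml (hypothetical, simplified sketch;
# the real setup calls the shared reusable workflow instead)
name: Deploy to Pantheon
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target Pantheon environment (test or live)'
        required: true
        default: 'test'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # Assumption: earlier steps (omitted) install terminus, install the
      # lastcallmedia/terminus-safe-deploy plugin, and load the SSH key from
      # secrets.PANTHEON_PRIVATE_SSH_KEY so Drush commands can run.
      - name: Authenticate terminus
        run: terminus auth:login --machine-token="${{ secrets.TERMINUS_MACHINE_TOKEN }}"
      - name: Safe deploy
        run: >
          terminus safe-deploy:deploy "${{ vars.PANTHEON_SITE_NAME }}.${{ inputs.environment }}"
          --with-cim --with-updates --slack-alert --slack-url="${{ vars.SLACK_URL }}"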

The End Result

Initialization of a deployment from GitHub Actions

The time spent on this investigation was well worth the effort for our team. As we remove the Quicksilver hooks from our projects and replace them with GitHub Actions workflows, we feel a new sense of confidence, knowing that if a deployment to test or prod would impact overridden configuration, it will abort itself unless we explicitly tell it to continue. Having a user interface that lets us explicitly choose which steps run (with the most common options set by default) gives us the control we wanted over these deployments while staying as simple as the drag-and-drop UI. An added benefit of this approach is that it doesn’t require any institutional knowledge: if another team gets involved, or the client pushes code but isn’t familiar with GitHub Actions, there’s no harm in using the drag-and-drop UI within Pantheon, and they don’t have to worry about unexpected operations running in the background once their code is deployed to the target environment.

Setting it up yourself

We chose to implement this as a single terminus command that gets invoked by a reusable GitHub Actions workflow to keep setup easy. Adding this deployment workflow to a project takes just a few steps (a CLI sketch for the secrets and variables follows the list):

  1. Copy the contents of this gist workflow file to .github/workflows/pantheon_deploy.yml
  2. Add required secrets to your repository:
    • Visit https://github.com/{ORGANIZATION}/{REPO_NAME}/settings/secrets/actions
    • Add:
      1. PANTHEON_PRIVATE_SSH_KEY: A private key associated with an account that has access to run Drush commands on Pantheon.
      2. TERMINUS_MACHINE_TOKEN: A Pantheon machine token associated with a user who has access to deploy to the current project on Pantheon
    • Note: Since we use this workflow across multiple projects at LCM, we store these as organization secrets. This makes the secrets available to any repositories we specify, and we only have to create one set.
  3. Add required actions variables:
    • Visit https://github.com/{ORGANIZATION}/{REPO_NAME}/settings/variables/actions
    • Add:
      1. PANTHEON_SITE_NAME: The machine name of the Pantheon site. This is the name used in the Pantheon-specific domains such as https://dev-{SITE_NAME}.pantheonsite.io
      2. SLACK_URL (optional): A webhook URL provided by Slack that messages can be posted to, which sends them to your channel of choice. Talk to your Slack admin to set one up if needed.
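
If you prefer the command line to the web UI, the same secrets and variables can be added with the GitHub CLI. This is only a sketch: it assumes a recent gh release (Actions variables require roughly 2.31+), is run from a clone of the project repository, and every value shown is a placeholder.

    # Repository secrets
    gh secret set PANTHEON_PRIVATE_SSH_KEY < ~/.ssh/pantheon_deploy_key
    gh secret set TERMINUS_MACHINE_TOKEN --body "$TERMINUS_MACHINE_TOKEN"

    # Actions variables
    gh variable set PANTHEON_SITE_NAME --body "my-site"
    gh variable set SLACK_URL --body "https://hooks.slack.com/services/T000/B000/XXXX"

    # Organization-level secret shared with selected repositories
    gh secret set TERMINUS_MACHINE_TOKEN --org my-org --repos "repo-one,repo-two" --body "$TERMINUS_MACHINE_TOKEN"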

One last thing

We still love Quicksilver hooks. We continue to use them for other types of background tasks such as creating deployment markers in New Relic and notifying Slack when certain operations are performed. They offer great functionality, but we prefer to keep our mission-critical deployment steps elsewhere.

Feb 08 2023
Feb 08

What is currently the default Drupal theme?

Starting from Drupal 7, website builders and content managers worked with the Seven theme, which remained the default Drupal admin theme for over a decade. Over the years, web development and web design moved on to fresh ideas and standards that the Seven theme no longer met, so a change was overdue.

Releasing a new admin page design was not an end in itself. The typical Internet user, like the typical user of content publishing products such as Drupal, does not have an engineering or computer science degree. They are ordinary people who want to share something with the world: thoughts, important information, goods, or services. This is why the designers and developers of Drupal core focused their efforts on those users.

How Claro became the default admin theme in Drupal 10

Site developers got the experimental Claro theme with Drupal 8.8. It was created and fine-tuned by Drupal enthusiasts who banded together within the Drupal Admin UI and JavaScript Modernization initiative. They aimed to improve the admin panel for website builders and content creators using the best practices of interface design and JavaScript available at the time.

The work on the admin dashboard design started by consulting administrators and content managers who deal with the Drupal admin panel every day. The study revealed a need to simplify the user interface, access to documentation, and media file management, and to get rid of developer jargon. Summed up, the feedback was that everybody wanted a convenient tool that allows flexible customization for their specific needs.

In April 2022, it was announced that Claro was stable (i.e. the theme had passed internal testing with no critical errors remaining) and would ship as a stable theme in Drupal core starting from version 9.4.

You can express your gratitude to the members of the Drupal Admin UX User Study team for the work they have done. Here are their accounts on drupal.org: Sarah Lowe, Michelle Jackson, Cristina Chumillas, Antonella Severo, and Roy Scholten.

Claro administrative interface

New features

Here is the complete list of redesigned interface elements.

Base:

  • Typography
  • Color palette
  • Iconography
  • Layers and surfaces
  • Spacing and sizing

Individual components:

  • Buttons and dropdowns
  • Form fields
  • Selects
  • Basic form controls
  • Tables
  • Messages
  • Tags or Entity reference
  • Breadcrumb and page title
  • Navigation list
  • Details and accordions

Pages:

  • Node edit form
  • Content list
  • Admin forms

Now let us look at some particular features.

State-of-the-art UI based on accessibility design requirements

The age-old question is: Drupal or WordPress? The Drupal Admin UX User Study showed that WordPress was the winner in terms of usability. Perhaps one day, admin UI improvements that take the Web Content Accessibility Guidelines (WCAG) into account will let Drupal rank among the most usable CMSs.

For the sake of readability, a 16px font size was selected as the main size. Other sizes were chosen based on the modular scale.

Feb 06 2023
Feb 06

Views is a powerful module that allows you to create all sorts of components. It can be used to build something simple, such as a list of articles, or something complex, such as a carousel or even an embedded map.

The Views UI can be intimidating if you’re new to Drupal, but as you use the module, you’ll find bits of functionality hidden deep in the interface.

One feature I want to discuss, which may not be evident at first, is the ability to add Twig code into a view’s fields.

Adding Twig code into a field allows you to change the field’s output dynamically, which can be helpful under certain circumstances.

To add Twig code or some HTML into a field, click on the field, expand “Rewrite results”, check “Override the output of this field with custom text” and add your code into the text area.
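
For instance, the following rewrite outputs the title as a link only when another field in the row has a value. This is just a sketch: field_external_url is a placeholder, and only fields added to the view before the rewritten field are available as {{ }} tokens.

    {% if field_external_url %}
      <a href="{{ field_external_url }}">{{ title }}</a>
    {% else %}
      {{ title }}
    {% endif %}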

Feb 01 2023
Feb 01

The release of Drupal 10 was highly anticipated by the Drupal community, and it finally launched in December 2022. This latest version of the content management system brings several new features and functional improvements that make content creation and management easier while also improving SEO and driving conversions.

In this blog, we'll highlight the key benefits of Drupal 10 for marketers and website managers.

Drupal 10 - What You Need To Know

Improved Text Editor

Image Source: CKEditor.com - https://ckeditor.com/blog/drupal-and-ckeditor-taking-content-editing-to-the-next-level/image01.png

Drupal 10 features an upgraded WYSIWYG text editor that moves from CKEditor 4 to CKEditor 5, offering a lighter and fresher look with improved icons and toolbar items. This new text editor is designed to make life easier for content editors.

Sleek Backend Admin Theme

Image source: Drupal.org Claro - https://www.drupal.org/files/claro_node_add.png

Drupal 10 includes the Claro backend admin theme, offering a significant upgrade in user experience and giving Drupal a modern look. For those looking for even more polish, there is also Gin, a contributed admin theme that extends Claro.

Layout Builder

The Layout Builder module allows you to customize page layouts using a drag-and-drop interface. You can customize a single page or create a template for specific content types.

Improved Media Management

Drupal 10 introduces an overhauled media management system, making it easier to upload, manage, and reuse media files. The media library makes it easier to find and use assets.

Ease of Use

Image source: DriesNote DrupalCon Global

A UX survey conducted at DrupalCon Amsterdam in 2019 showed that while beginners found Drupal difficult to use, more advanced users had a positive experience. As a result, DrupalCon 2020 focused on improving the user experience for new users. The layout builder, Claro admin theme, and media management system have been bundled together for a more user-friendly experience, and are enabled by default in Drupal 10.

New Content Moderation System

Drupal 10 includes a new content moderation system that makes managing content easier, allowing you to create and manage moderation workflows.

Improved Performance

Drupal 10 features performance enhancements, including a switch to a new database driver that is said to improve performance by up to 20%.

Enhanced Security

Drupal 10 includes security improvements, including a security report to identify potential vulnerabilities, better password hashing, and a setting to prevent clickjacking attacks.

Drupal 10 Migration

If you're setting up Drupal 10 for the first time, congratulations! For those upgrading from an earlier version, here's a quick guide to help you through the process.

Upgrading from Drupal 7:

  • Full site migration to Drupal 9 or 10 is required.
  • Use the Upgrade Status module to check for compatible releases, then use the Migrate module suite to migrate content and configuration manually (see the command sketch after this list).
  • Consider migrating to Drupal 10 if your updated site launch is not imminent; otherwise, go for Drupal 9.
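
A rough sketch of that contrib-assisted path, assuming a Composer-managed Drupal 9/10 target site with the legacy Drupal 7 database defined in settings.php under the key migrate (the Drush commands come from the Migrate Upgrade and Migrate Tools projects):

    # Add the migration helper modules to the new site.
    composer require drupal/migrate_upgrade drupal/migrate_plus drupal/migrate_tools
    drush en -y migrate_upgrade migrate_plus migrate_tools

    # Generate migrations from the legacy database, then review and run them.
    drush migrate:upgrade --legacy-db-key=migrate --configure-only
    drush migrate:status
    drush migrate:import --all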

Upgrading from Drupal 8:

  • Drupal 8 reached end-of-life on 2 November 2021, so there's no direct upgrade path to Drupal 10.
  • Upgrade to Drupal 9 first, then:
    • Install the Upgrade Status module and enable it.
    • Scan modules for Drupal 9 compatibility and update as needed.
    • Update Drupal core to Drupal 9.

Upgrading from Drupal 9:

Follow these steps (a command sketch follows the list):

  • Install and enable the Upgrade Status module for an environment readiness check.
  • Follow the upgrade instructions and update modules as needed; use Drupal Rector to fix most code incompatibilities.
  • Update Drupal core to Drupal 10.
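
A condensed sketch of those steps for a Composer-managed site built on drupal/core-recommended (the version constraints and layout are assumptions, not the only valid setup):

    # Check readiness, then review the report at /admin/reports/upgrade-status.
    composer require --dev drupal/upgrade_status
    drush en -y upgrade_status

    # Once contrib and custom code are ready (Drupal Rector can fix most
    # deprecated code in custom modules), update core and run the updates.
    composer require 'drupal/core-recommended:^10' 'drupal/core-composer-scaffold:^10' \
      'drupal/core-project-message:^10' --update-with-all-dependencies
    drush updatedb -y
    drush cache:rebuild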

If you're looking to upgrade your website to Drupal 10, contact our team of dedicated Drupal developers for expert support.


About Drupal Sun

Drupal Sun is an Evolving Web project. It allows you to:

  • Do full-text search on all the articles in Drupal Planet (thanks to Apache Solr)
  • Facet based on tags, author, or feed
  • Flip through articles quickly (with j/k or arrow keys) to find what you're interested in
  • View the entire article text inline, or in the context of the site where it was created

See the blog post at Evolving Web
