Aug 23 2019

As enterprises look for ways to stay ahead of the curve in the growing digital age, machine learning is giving them the boost they need to deliver a seamless digital customer experience.

Machine learning algorithms can transform your Drupal website into an interactive CMS, generating relevant service recommendations targeted at each individual customer's needs by understanding their behavioural patterns.


A machine-learning-integrated Drupal website ensures effortless content management and publishing, enables better targeting, and empowers your enterprise to craft personalized experiences for your customers. It automates customer service tasks and frees up your customer support teams, which in turn improves RoI.

However, with various big names competing in the market, let's look at how Amazon's machine learning stands out among them and provides customised offerings when integrated with Drupal.

Benefits of Integrating AWS Machine Learning with Drupal

AWS offers one of the widest sets of machine learning services, including pre-trained AI services for computer vision, language, recommendations, and forecasting. These capabilities are built on a comprehensive cloud platform and are optimized without compromising security. Let's look at the host of advantages it offers when integrated with Drupal.

Search Functionality

One of the major problems with searching on a website is the reliance on exact keywords. If the content uses a related keyword, you will not be able to find it without typing the precise term.

This problem can be solved by using machine learning to train the search algorithm to look for synonyms and display related results. Search can be further improved by automatically ranking or filtering results based on each user's past reads, click-through rate, and similar signals.

Amazon CloudSearch is designed to help users improve the search capabilities of their applications and services by setting up a scalable, low-latency search domain that can handle high throughput.
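As a minimal sketch, a search domain can be created and queried with the AWS CLI; the domain name and endpoints below are placeholders, and the indexed documents would be exported from Drupal:

aws cloudsearch create-domain --domain-name drupal-search

aws cloudsearchdomain upload-documents \
  --endpoint-url https://doc-drupal-search-xxxx.us-east-1.cloudsearch.amazonaws.com \
  --content-type application/json --documents articles.json

aws cloudsearchdomain search \
  --endpoint-url https://search-drupal-search-xxxx.us-east-1.cloudsearch.amazonaws.com \
  --search-query "decoupled drupal"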

Image Captioning

Amazon Machine Learning can automatically generate relevant captions for all images on the website by analyzing the image content. The admin can configure whether the captions should be added automatically or only after manual approval, saving a lot of time for the content curators and administrators of the website.

Amazon Rekognition can search across several images to find content within them and helps segregate them almost effortlessly, with minimal human interaction.
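For illustration, a single AWS CLI call can return labels for an uploaded image; the bucket and object key here are placeholders:

aws rekognition detect-labels \
  --image '{"S3Object":{"Bucket":"my-drupal-assets","Name":"images/team-photo.jpg"}}' \
  --max-labels 5 --min-confidence 80

A custom Drupal module could then map the returned labels to an image's alt text or caption field, either automatically or queued for editorial approval.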

Website Personalization

Machine learning ensures users get to view tailored content as per their favourite reads and searches: each visitor is assigned a unique identifier (UID), and their behaviour on the website (clicks, searches, favourite reads, etc.) is tracked to deliver a personalized web experience.

Machine learning analyzes the data connected with the user’s UID and provides personalized website content.

Amazon Personalize is a machine learning service which makes it easy for developers to create individualized recommendations for their customers. It can save up to 60% of the time needed to set up and tune the infrastructure for machine learning models, compared to building your own environment.
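Once a Personalize campaign has been trained on the tracked behaviour, fetching recommendations for a given Drupal UID is a single call; the campaign ARN and user ID below are placeholders:

aws personalize-runtime get-recommendations \
  --campaign-arn arn:aws:personalize:us-east-1:123456789012:campaign/drupal-articles \
  --user-id "42" --num-results 5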

Amazon Comprehend is another natural language processing (NLP) service that uses machine learning to find insights and relationships in text. It can identify the dominant topics and key phrases across your content, which makes recommendations easier. So, when you're trying to add tags to an article, instead of searching through all possible options, it lets you see suggested tags that sync up with the topic.
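As a rough sketch, the key phrases that could feed such tag suggestions can be pulled straight from an article's body text:

aws comprehend detect-key-phrases --language-code en \
  --text "Decoupled Drupal lets you serve content to web, mobile and IoT channels from a single CMS."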

Vulnerability Scanning

A website is always exposed to potential threats, with the risk of losing confidential customer data.

Using machine learning, Drupal-based websites can be made more secure and resilient to data loss by automatically scanning themselves for vulnerabilities and notifying the administrator about them. This gives websites a great advantage and also helps them save the extra cost of using external software for this purpose.

Amazon Inspector is an automated security assessment service which helps improve the security and compliance of websites deployed on AWS by assessing them for exposure, vulnerabilities, and deviations from best practices.
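With Inspector, assessment runs can be kicked off and their findings listed from the CLI; the ARNs here are placeholders for your own assessment template and run:

aws inspector start-assessment-run \
  --assessment-template-arn arn:aws:inspector:us-east-1:123456789012:target/0-abcd1234/template/0-efgh5678

aws inspector list-findings \
  --assessment-run-arns arn:aws:inspector:us-east-1:123456789012:target/0-abcd1234/template/0-efgh5678/run/0-ijkl9012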

Voice-Based operations

With machine learning, it's possible to control and navigate your website using your voice. With Drupal standing by its commitment to accessibility, integrating it with Amazon Machine Learning features promotes inclusion and makes web content more accessible to people.

Amazon Transcribe is an automatic speech recognition (ASR) service. When integrated with a Drupal website, it benefits the media industry with live subtitling of news or shows, helps video game companies stream transcriptions for hearing-impaired players, enables stenography in courtrooms and lets lawyers make legal annotations on top of live transcripts, and improves business productivity by capturing meeting notes through real-time transcription.
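For a pre-recorded asset, a transcription job can be started and polled like this; the job name and S3 URI are placeholders:

aws transcribe start-transcription-job \
  --transcription-job-name drupal-podcast-episode-12 \
  --language-code en-US --media-format mp3 \
  --media MediaFileUri=s3://my-drupal-assets/podcasts/episode-12.mp3

aws transcribe get-transcription-job --transcription-job-name drupal-podcast-episode-12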

The future of websites looks interesting: users are predicted to benefit from seamless experiences built on data and behaviour analysis. The benefits of integrating Amazon Machine Learning with Drupal clearly give it an advantage over other CMSs and pave the way for a brighter future and a better roadmap.

Srijan has certified AWS professionals and expertise across AWS competencies. Contact us to get the conversation started.

Aug 05 2019

Using the cloud is about leveraging its agility, among other benefits. For a Drupal-powered website, the right service provider can shape how well the website performs, which in turn affects business revenue.

When a robust server infrastructure such as AWS backs the most advanced CMS, Drupal, it accelerates the website's performance, security, and availability.

But why AWS, and what benefits does it offer over others? Let's dive deep to understand how it proves to be the best solution for hosting your Drupal websites.

Points To Consider For Hosting Drupal Websites

The following are the points to keep in mind while considering providers for hosting your pure or headless Drupal website.

Better Server Infrastructure: A Drupal-specialised cloud hosting provider should offer a server infrastructure that is specifically optimized for running Drupal websites the way they were designed to run.

Better Speed: It should help optimise the Drupal website to run faster and should be able to use caching tools such as Memcached, Varnish, etc.

Better Support: The provider should offer better hosting support with the right knowledge of a Drupal website.

Better Security and Compatibility: The hosting provider should be able to provide security notifications, server-wide security patches, and even pre-emptive server upgrades to handle nuances in upcoming Drupal versions.

Why not a traditional server method?

There are two ways of hosting a Drupal website via traditional server setups: 

  • a shared hosting server, where multiple websites run on the same server
  • or  a dedicated Virtual Private Server (VPS) per website. 

However, there are disadvantages to this approach, which are:

  1. With a lot of non-redundant, single-instance services running on the same server, if any component crashes the entire site can go offline.
  2. Being non-scalable, this setup does not scale up or down automatically; it requires manual intervention to change the hardware configuration, and an unexpected traffic boost may cause the server to go down.
  3. The setup constantly runs at full power, irrespective of usage, causing wastage of resources and money.

Hosting Drupal on AWS

Amazon Web Services (AWS) is a pioneer of the cloud hosting industry, providing advanced server infrastructure that has proven to be highly secure and reliable.
With serverless computing, developers can focus on their core product instead of worrying about managing and operating servers or runtimes, either in the cloud or on-premises. It eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning. It enables you to build modern applications with increased agility and lower total cost of ownership and time-to-market.

With serverless computing being the fastest-growing cloud trend, at an annual growth rate of 75% and expected to be adopted at an even higher rate, let's understand the significance of the AWS components in the Virtual Private Cloud (VPC). Each of these components shows why AWS is the right choice for hosting pure or headless Drupal websites.

Architecture diagram showcasing Drupal hosting on AWS

  •  Restrict connection: NAT Gateway

A Network Address Translation (NAT) gateway enables instances in a private subnet to connect to the internet or other AWS services. The private instances in the private subnet are therefore not exposed via the Internet gateway; instead, all outbound traffic is routed via the NAT gateway.

The gateway ensures that the site will always remain up and running. AWS takes over the responsibility of its maintenance.

  • Restrict access: Bastion Host
A bastion host protects the system by restricting access to backend systems in protected or sensitive network segments. Its benefit is that it minimises the chances of a potential security attack.
  • Database: AWS Aurora

The Aurora database provides invaluable reliability and scalability, better performance and response times. With fast failover capabilities and storage durability, it minimizes technical obstacles.

  • Upload content: Amazon S3

With Amazon S3, you can store, retrieve, and protect any amount of data at any time in a scalable storage bucket. You can recover lost data easily, pay only for the storage you actually use, protect data from unauthorized access, and upload and download your data with SSL encryption (see the sketch after this list).

  • Memcached/Redis: AWS ElastiCache
ElastiCache is a web service that makes it easy to set up, manage, and scale a distributed in-memory data store in the cloud.
  • Edge Caching: AWS CloudFront
CloudFront is an AWS content delivery network that provides a globally distributed network of proxy servers, caching content close to consumers to improve download speeds.
  • Web servers: Amazon EC2

Amazon EC2 is a web service that provides secure, resizable compute capacity in the cloud.

  • DNS: Amazon Route 53
Amazon Route 53 effectively connects user requests to infrastructure running in AWS and can also be used to route users to infrastructure outside of AWS.
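As a rough illustration of how the storage and CDN pieces fit together, a Drupal public files directory can be pushed to S3 and fronted by CloudFront with the AWS CLI; the bucket name and paths are placeholders:

aws s3 mb s3://my-drupal-files
aws s3 sync sites/default/files s3://my-drupal-files/public

aws cloudfront create-distribution \
  --origin-domain-name my-drupal-files.s3.amazonaws.com \
  --default-root-object index.html

In practice, a module such as S3 File System would handle the ongoing reads and writes; the commands above only cover the initial transfer and distribution setup.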

Benefits of Hosting Drupal Website on AWS

Let’s look at the advantages of AWS for hosting pure or headless Drupal websites.

High Performing Hosting Environment

The kind of performance you want from your server depends upon the type of Drupal website you are building. A simple website with a decent amount of traffic can work well on a limited shared host platform. However, for a fairly complex interactive Drupal site, a typical shared hosting solution might not be feasible. 

Instead, opt for AWS, which provides the server capacity you need and bills you as per your usage.

Improved Access To Server Environment 

A shared hosting environment restricts users from gaining full control and limits their ability to change configurations for Apache or PHP, and there might be caps on bandwidth and file storage. These limitations are lifted only when you're willing to pay a higher premium for advanced-level access and hosting services.

This is not true with AWS, which gives you direct control over your server instances, with permissions to SSH or use its interface control panel to adjust settings.

Control over the infrastructure 

Infrastructure needs rarely remain constant and are bound to change with time. Adding or removing hosting resources might prove difficult or even impossible, and you could end up paying for unused resources.

However, opting for AWS lets you pay only for the services you use and shut them off easily when you no longer need them. On-demand virtual hosting and a wide variety of services and hardware types make AWS convenient for anyone and everyone.

No Long-term Commitments

If you are hosting a website to gauge the performance and responsiveness, you probably would not want to allocate a whole bunch of machines and resources for a testing project which might be over within a week or so.

The convenience of AWS on-demand instances means that you can spin up a new server in a matter of minutes, and shut it down (without any further financial cost) in just as much time.

No Physical Hardware Maintenance

The advantage of using virtual resources is that you avoid having to buy and maintain physical hardware.

Going with virtually hosted servers on AWS helps you focus on your core competency of creating Drupal websites and frees you from dealing with data center operations.

Why Choose Srijan?

Srijan's team of AWS professionals can help you migrate your website to the AWS cloud. With expertise in building Drupal-optimised hosting environments on AWS for reliable enterprise-level hosting, we can help you implement various AWS capabilities as per your enterprise's requirements. Drop us a line and let our experts explore how you can get the best of AWS.

Jul 26 2019

The expanding data landscape is feeding the demand for higher operational agility. This calls for a more responsive, reliable IT infrastructure that doesn't rake up millions, minimizes delays and downtime, and improves security while making infrastructure more agile.

Compared with the capacity constraints and unpredictable pricing models elsewhere, AWS offers workloads suited to growing infrastructure needs with a host of services (IaaS, PaaS, SaaS) for Drupal enterprises.

Here’s how you can run your Drupal up to 40% cheaper.

Keeping the Business Innovative and Agile

Demands for performance, scalability, and agility have never been higher, and business success and growth depend on them. At the same time, the changing landscape is forcing businesses to opt for lower costs, greater use of cloud resources, and better customer service.

While these changes have implications for infrastructure, compute, and networking resources, they also impact storage. 

Lack of enough database storage, for example, can adversely impact application performance. Fast-growing applications may need more storage than expected or immediate storage resources.

 


The continuous need for speed and efficiency is driving businesses to opt for the storage-as-a-service (STaaS) model. But there is more to it when it comes to the benefits. Businesses get:

  • Better insights at a reasonable cost: With a highly scalable, low-cost environment capable of handling massive data volume and velocity, organizations can choose between the two available models (CapEx and OpEx) for more predictable costs.
  • Better collaboration: Cloud-based business solutions accelerate innovation, delivering business analytics at the point of impact and enabling collaboration by creating and linking business networks.
  • Innovation and variety of solutions: Forward-thinking enterprises adopt STaaS to speed up business innovation, improve overall data-centre efficiency, and achieve integrated, innovative business results.
  • Proven results: Organizations achieve their desired business outcomes by improving the responsiveness of their IT infrastructure without increasing risk or cost.

Capacity and storage issues can hinder your business agility. 

In order to avoid such challenges in the future, Drupal-powered enterprises need to constantly understand and adapt to the changing landscapes.


While Drupal helps balance the rapid data growth, the right cloud storage solution needs to offer security and robust scalability without constraining the budget and prepare IT and marketing for what comes next.

Run your Drupal 40% cheaper

Choosing the right technology is crucial to avoid equipment failures and the costs of upgrading hardware. Small and medium enterprises and non-profits especially need sustainable solutions that meet future needs and keep operations running without overcommitting budgets today.

Finding the perfect match, organizations such as Azim Premji Foundation, Georgia Technical Authority, UCAS, and the Department of Homeland Security (USA) are powered by Drupal and supported by AWS.

Enterprises need sustainable solutions without overcommitting budgets today.

AWS offers cloud web hosting solutions that provide businesses, non-profits, and governmental organizations with low-cost ways to deliver their websites and web applications.

The pay-as-you-go approach lets you pay only for the individual services you need, for as long as you use them, without requiring long-term contracts or complex licensing.

Similar to how you pay for utilities like water and electricity.  

You only pay for the services you consume, and once you stop using them, there are no additional costs or termination fees.
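One practical consequence is that your actual month-to-month consumption is visible per service, so budgets can track reality. For example, with the AWS CLI and Cost Explorer (the dates below are placeholders):

aws ce get-cost-and-usage \
  --time-period Start=2019-07-01,End=2019-08-01 \
  --granularity MONTHLY --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE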

The pricing models give your enterprise the flexibility to grow your business unencumbered by IT.

  • Pay-as-you-go

With AWS you only pay for what you use, helping your organization remain agile, responsive, and always able to meet scale demands. This allows you to adapt easily to changing business needs without overcommitting budgets, improves your responsiveness to change, and reduces the risk of overprovisioning or missing capacity.

By paying for services on an as-needed basis, you can redirect your focus to innovation and invention, reducing procurement complexity and enabling your business to be fully elastic.

  • Save when you reserve

By using reserved capacity, organizations can minimize risks, more predictably manage budgets, and comply with policies that require longer-term commitments.

For certain services like Amazon EC2 and Amazon RDS, enterprises can invest in reserved capacity. With Reserved Instances, you can save up to 75% over equivalent on-demand capacity.


When you buy Reserved Instances, the larger the upfront payment, the greater the discount.
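For EC2, the available reserved capacity options and their pricing can be browsed directly from the CLI; the instance type and filters below are only an example:

aws ec2 describe-reserved-instances-offerings \
  --instance-type t3.medium \
  --product-description "Linux/UNIX" \
  --offering-type "All Upfront"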

  • Pay less by using more

AWS also provides volume-based discounts, so organizations save more as usage increases. For services such as S3 and data transfer OUT from EC2, pricing is tiered: the more you use, the less you pay per GB.

In addition, data transfer IN is always free of charge.

As a result, as your AWS usage needs increase, you benefit from the economies of scale that allow you to increase adoption and keep costs under control.

As your organization evolves, AWS also gives you options to acquire services that help you address your business needs. For example, AWS' storage services portfolio offers options to help you lower pricing based on how frequently you access data and the performance you need when retrieving it.


To optimize the savings, choosing the right combinations of storage solutions can help reduce costs while preserving performance, security and durability.


Case Study: Reducing cost & improving operational efficiency for Drupal application with AWS

Our client, a legal firm, helps provide jurisdictions and litigants simple, seamless, and secure access to the record of legal proceedings. They built a SaaS-based workflow management application on Drupal to manage and track digital recordings of legal proceedings and transcripts, including appeals, for the stakeholders.

The goal was to build a robust, cloud-based server to effectively handle the processing and access to a large volume of text, audio and video files.

Since the business model depended on frictionless uploading and downloading of text and media files, an AWS cloud-based server emerged as the unanimous choice.

Business benefits

  • Simplified integration of the client's Drupal application with AWS S3, to enable flexible, cloud-native storage
  • As a result of going all-in into the AWS Cloud, the client reduced costs by 40% and increased operational performance by 30-40%
  • Dynamic storage and pay-as-you-go pricing enabled the client to leverage a highly cost-effective cloud-storage solution

Read the complete case study: Cloud-Native Storage for Drupal Application with AWS.

Get no-cost expert guidance

Designed to help you solve common problems and build faster, Amazon Web Services provides a comprehensive suite of solutions to secure and run your sophisticated and scalable applications.

Srijan is an AWS Advanced Consulting Partner. Schedule a meeting with our experts, with no cost and no sales pitch, and get started with your cloud journey.

Aug 23 2018

Background

For our two main websites (QSR & FNF), we send out a monthly e-letter to about 20k+ subscribers each, and we have it set up so that it goes through Amazon SES and all email notifications (deliveries, bounces, unsubs, complaints, etc.) get posted back to the site. Our custom Drupal module receives each notification and updates several places so that we can track bounce rates and delivery rates, and opt out people who complain or unsubscribe. So our site bogs down (at least for authenticated traffic) when we send the 40k e-letters, because these notifications bypass all of the layers of caching in order to make those database updates.

Inspiration

Decoupled Drupal is a major mind-shift for me. QSR was our first Drupal (6) site back in 2010 and over the last 8 years, we have written over 40 custom modules to do things big (lead generation, circulation, etc.) and small (user surveys, etc.).

The advantage is that it's one place for your users to go for all the tools they need. The disadvantage, though, is that your server resources are shared, which probably takes away from the higher priority of serving your users.

There's also something to be said about splitting a feature off into an environment where you're free to choose the best tech for the job, which might not necessarily be Drupal.

Setup

First, this article was a big help in getting things set up. I ended up using a different table schema with just 4 fields: event_id (the SNS event MessageId, which is also my primary key), the source (so I can gather items based on the site), a processed boolean flag, and the message itself as stringified JSON.
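For reference, a table along those lines could be created with the AWS CLI roughly like this (the table name and capacity numbers here are just placeholders):

aws dynamodb create-table \
  --table-name ses-notifications \
  --attribute-definitions AttributeName=event_id,AttributeType=S \
  --key-schema AttributeName=event_id,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=25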

One thing to keep in mind is that SNS posts its event differently to HTTP(S) than it does for Lambda, so you cannot rely on your HTTP(S) examples as test cases. I have a (sanitized) captured example here.

Finally, the easy/cool bit is changing the SNS subscription from your HTTP(S) endpoint to your Lambda function. You don't even have to program a subscription confirmation for that - it just works.

Next Steps

So I went live with this without testing an actual load scenario. Big mistake! Once the SNS messages came flying in, Lambda reported a ton of errors and DynamoDB's default write capacity caused a lot of throttling. So while Lambda can scale from input dynamically, what it does with the output can wreak havoc. I would highly recommend you do some load testing based on your situation. You can set up a test run to send 40k emails to a random assortment of AWS SES' test addresses. I ended up having to scramble my Lambda and DynamoDB configurations to bump up the timeout max and enable auto-scaling for writes. I ended up losing a lot of tracking data because my Lambda function didn't fail properly and SNS thought everything was OK and didn't try again. :(
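One way to enable that write auto-scaling from the CLI is to register the table's write capacity as a scalable target; the table name and limits below are placeholders:

aws application-autoscaling register-scalable-target \
  --service-namespace dynamodb \
  --resource-id "table/ses-notifications" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --min-capacity 5 --max-capacity 200

A target-tracking scaling policy on the DynamoDBWriteCapacityUtilization metric (set via aws application-autoscaling put-scaling-policy) then completes the setup.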

After I get that fixed and more bulletproof, my next step is to write a cron job to gather any unprocessed messages that belong to the site and process them. I'll write a follow-up post when I'm done with that.

And once I'm proud of my Lambda function, I'll post that, too. Update: Here it is.

Conclusion

So the tradeoff is that my reporting is not real-time and there are some AWS costs, but this frees up our web server to do what it should be doing best: serving content to our readers.
Nov 16 2015

We've recently begun moving to Amazon Web Services for hosting; however, we still need to authenticate through ITS, who handle the central SSO authentication services for Virginia.edu. In previous posts we looked at Pubcookie aka Netbadge, but Pubcookie is getting pretty long in the tooth (its last release was back in 2010) and we are running Ubuntu 14 with Apache 2, so integrating Pubcookie was going to be a PITA. It was time to look at Shibboleth, an Internet2 SSO standard that works with SAML, is markedly more modern than Pubcookie, and allows federated logins between institutions, etc.

A special thanks to Steve Losen who put up with way more banal questions than anyone should have to deal with… that said, he’s the man :)

Anyhow, ITS does a fine job of documenting the basics: http://its.virginia.edu/netbadge/unixdevelopers.html. Since we're using Ubuntu, the only real difference is that we used apt-get.

Here’s the entire install from base Ubuntu 14

apt-get install apache2 mysql-server php5 php-pear php5-mysql php5-ldap libapache2-mod-shib2 shibboleth-sp2-schemas drush sendmail ntp

Apache Set up

On the Apache2 side we enabled some modules and the default SSL site:

a2enmod ldap rewrite  shib2 ssl
a2ensite default-ssl.conf

Back on the Apache2 side, here's our default SSL virtual host:

<IfModule mod_ssl.c>
<VirtualHost _default_:443>
ServerAdmin [email protected]
ServerName bioconnector.virginia.edu:443
DocumentRoot /some_web_directory/bioconnector.virginia.edu
<Directory /some_web_directory/dev.bioconnector.virginia.edu>
AllowOverride All
</Directory>

SSLEngine on

SSLCertificateFile /somewheresafe/biocon_hsl.crt
SSLCertificateKeyFile /somewheresafe/biocon_hsl.key

<Location />
AuthType shibboleth
ShibRequestSetting requireSession 0 ##This part meant that creating a session is possible, not required
require shibboleth
</Location>

</VirtualHost>
</IfModule>

The Location block is important; if you don't have it in the Apache conf, you'll need it in an .htaccess in the Drupal directory space.

Shibboleth Config

The Shibboleth side confused me for a hot minute.

We used shib-keygen, as noted in the documentation, to create keys for Shibboleth, and ultimately the relevant part of our /etc/shibboleth/shibboleth2.xml looked like this:

<ApplicationDefaults entityID="https://www.bioconnector.virginia.edu/shibboleth"
REMOTE_USER="eppn uid persistent-id targeted-id">

<Sessions lifetime="28800" timeout="3600" relayState="ss:mem"
checkAddress="false" handlerSSL="true" cookieProps="https">
<!-- we went with SSL required - so change handlerSSL to true and cookieProps to https -->

<SSO entityID="urn:mace:incommon:virginia.edu">
SAML2 SAML1
</SSO>
<!-- this is the production value, we started out with the testing config - ITS provides this in their documentation -->

<MetadataProvider type="XML" file="UVAmetadata.xml" />
<!-- Once things are working you should be able to find this at https://www.your-virginia-website.edu/Shibboleth/Metadata - it's a file you download from ITS = RTFM -->
<AttributeExtractor type="XML" validate="true" reloadChanges="false" path="attribute-map.xml"/>
<!-- attribute-map.xml is the only other file you're going to need to touch -->

<CredentialResolver type="File" key="sp-key.pem" certificate="sp-cert.pem"/>
<!-- these are the keys generated with shib-keygen -->
<Handler type="Session" Location="/Session" showAttributeValues="true"/>
<!-- During debug we used https://www.bioconnector.virginia.edu/Shibboleth.sso/Session with the showAttributeValues="true" setting on to see what was coming across from the UVa Shibboleth IdP -->

/etc/shibboleth/attribute-map.xml looked like this

<Attribute name="urn:mace:dir:attribute-def:eduPersonPrincipalName" id="eppn">
<AttributeDecoder xsi:type="ScopedAttributeDecoder"/>
</Attribute>

<Attribute name="urn:mace:dir:attribute-def:eduPersonScopedAffiliation" id="affiliation">
<AttributeDecoder xsi:type="ScopedAttributeDecoder" caseSensitive="false"/>
</Attribute>
<Attribute name="urn:oid:1.3.6.1.4.1.5923.1.1.1.9" id="affiliation">
<AttributeDecoder xsi:type="ScopedAttributeDecoder" caseSensitive="false"/>
</Attribute>

<Attribute name="urn:mace:dir:attribute-def:eduPersonAffiliation" id="unscoped-affiliation">
<AttributeDecoder xsi:type="StringAttributeDecoder" caseSensitive="false"/>
</Attribute>
<Attribute name="urn:oid:1.3.6.1.4.1.5923.1.1.1.1" id="unscoped-affiliation">
<AttributeDecoder xsi:type="StringAttributeDecoder" caseSensitive="false"/>
</Attribute>

<Attribute name="urn:mace:dir:attribute-def:eduPersonEntitlement" id="entitlement"/>
<Attribute name="urn:oid:1.3.6.1.4.1.5923.1.1.1.7" id="entitlement"/>

<Attribute name="urn:mace:dir:attribute-def:eduPersonTargetedID" id="targeted-id">
<AttributeDecoder xsi:type="ScopedAttributeDecoder"/>
</Attribute>

<Attribute name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10" id="persistent-id">
<AttributeDecoder xsi:type="NameIDAttributeDecoder" formatter="$NameQualifier!$SPNameQualifier!$Name" defaultQualifiers="true"/>
</Attribute>

<!-- Fourth, the SAML 2.0 NameID Format: -->
<Attribute name="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent" id="persistent-id">
<AttributeDecoder xsi:type="NameIDAttributeDecoder" formatter="$NameQualifier!$SPNameQualifier!$Name" defaultQualifiers="true"/>
</Attribute>
<Attribute name="urn:oid:1.3.6.1.4.1.5923.1.1.1.6" id="eduPersonPrincipalName"/>
<Attribute name="urn:oid:0.9.2342.19200300.100.1.1" id="uid"/>
</Attributes>

Those two attributes at the end (eduPersonPrincipalName and uid) are important; they're going to be the bits that we pipe into Drupal.

For debugging we used the following URL, https://www.bioconnector.virginia.edu/Shibboleth.sso/Session, to see what was coming across. Once it was all good, we got a response that looks like this:

Miscellaneous
Session Expiration (barring inactivity): 479 minute(s)
Client Address: 137.54.59.201
SSO Protocol: urn:oasis:names:tc:SAML:2.0:protocol
Identity Provider: urn:mace:incommon:virginia.edu
Authentication Time: 2015-11-16T15:35:39.118Z
Authentication Context Class: urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport
Authentication Context Decl: (none)

Attributes
affiliation: [email protected];[email protected];[email protected]
eduPersonPrincipalName: [email protected]
uid: adp6j
unscoped-affiliation: member;staff;employee

The uid and eduPersonPrincipalName variables are the pieces we needed to get Drupal to set up a session for us.

Lastly the Drupal bit

The Drupal side of this is pretty straightforward.

We installed Drupal as usual and grabbed the shib_auth module.
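If you're using Drush, grabbing and enabling the module is something like the following (a sketch, assuming Drush is already installed on the server):

drush dl shib_auth
drush en -y shib_auth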

[Screenshots: shib_auth module basic configuration]

and on the Advanced Tab

[Screenshots: shib_auth module Advanced tab configuration]
Oct 06 2015

Docker is reinventing the way we package and deploy our applications, bringing new challenges to hosting. In this blog post I will provide a recipe for logging your Docker packaged applications.

Goals

Going into this I had 2 major goals:

  • Zero remote console logins - What is the number one reason for a developer or sysadmin to log in to the console on a remote environment? To inspect the logs. If we expose our logs via an API, we immediately cut out the vast majority of our remote logins.
  • Aggregation - I personally despise having to log into multiple hosts and inspect the logs for a clustered service. It can result in you being "grumpy" at the situation well before you start on the task you are meant to do.
    If we have all our logs in one place, we don't have to worry about having to access multiple machines to analyse data.

Components

The following are the components which make up a standard logging pipeline.

Storage

I have started with the most core piece of the puzzle: the storage.
This component is in charge of:

  • Receiving the logs
  • Reliably storing them for a retention period (e.g. 3 months)
  • Exposing them via an interface (API / UI)

Some open source options:

  • Logstash (more specifically the ELK stack)
  • Graylog

Some services where you can get this right out of the box:

  • Loggly
  • AWS CloudWatch Logs
  • Papertrail

These services don't require you to run Docker container based hosting. You can run these right now on your existing infrastructure.

However, they do become a key component when hosting Docker-based infrastructure because we are constantly rolling out new containers in place of the old ones.

Collector

This is an extremely simple service tasked with collecting all the logs and pushing them to the remote service.

Don't mistake simple for unimportant, though. I highly recommend you set up monitoring for this component.

Visualiser

On most occasions the "storage" component provides an interface for interacting with the logged data.

In addition, we can also write applications that consume the "storage" component's API and provide a command-line experience.

Implementation

So how do we implement these components in a Docker hosting world? The key to our implementation is the Docker API.

In the below example we have:

  • Started a container with an "echo" command
  • Queried the Docker logs API via the Docker CLI application
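A minimal sketch of those two steps (the container name here is arbitrary):

docker run --name hello-logs alpine echo "Hello from inside the container"
docker logs hello-logs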

What this means is that we can pick up all the logs for a service IF the services inside the container print to STDOUT instead of logging to a file.

With this in mind, we developed the following logs pipeline, and open sourced some of the components:

Conclusion

I feel like we have achieved a lot by doing this.

Here are some takeaways:

  • The logs pipeline is generic and not Drupal specific
  • We didn't reinvent the wheel on how logs are shipped to remote services.
  • Some interesting projects were built along the way which can be used standalone.
Jan 28 2015

As the largest bicycling club in the country with more than 16,000 active members and a substantially larger community across the Puget Sound, Cascade Bicycle Club requires serious performance from its website. For most of the year, Cascade.org serves a modest number of web users as it furthers the organization’s mission of “improving lives through bicycling.”

But a few days each year, Cascade opens registration for its major sponsored rides, which results in a series of massive spikes in traffic. Cascade.org has in the past struggled to keep up with demand during these spikes. During the 2014 registration period for example, site traffic peaked at 1,022 concurrent users and >1,000 transactions processed within an hour. The site stayed up, but the single web server seriously struggled to stay on its feet.

In preparation for this year’s event registrations, we implemented horizontal scaling at the web server level as the next logical step forward in keeping pace with Cascade’s members. What is horizontal scaling, you might ask? Let me explain.

[Ed Note: This post gets very technical, very quickly.]

Overview

We had already set up hosting for the site in the Amazon cloud, so our job was to build out the new architecture there, including new Amazon Machine Images (AMIs) along with an Autoscale Group and Scaling Policies.

Here is a diagram of the architecture we ended up with. I’ll touch on most of these pieces below.

[Architecture diagram of the autoscaling setup on AWS]

Web Servers as Cattle, Not Pets

I'm not the biggest fan of this metaphor, but it's catchy: the fundamental mental shift when moving to automatic scaling is to stop thinking of the servers as named and coddled pets, and instead treat them as identical and ephemeral cogs: a herd of cattle, if you will.

In our case, multiple web server instances are running at a given time, and more may be added or taken away automatically at any given time. We don’t know their IP addresses or hostnames without looking them up (which we can do either via the AWS console, or via AWS CLI — a very handy tool for managing AWS services from the command line).

The load balancer is configured to enable connection draining. When the autoscaling group triggers an instance removal, the load balancer will stop sending new traffic, but will finish serving any requests in progress before the instance is destroyed. This, coupled with sticky sessions, helps alleviate concerns about disrupting transactions in progress.

The AMI for the “cattle” web servers (3) is similar to our old single-server configuration, running Nginx and PHP tuned for Drupal. It’s actually a bit smaller of an instance size than the old server, though — since additional servers are automatically thrown into the application as needed based on load on the existing servers — and has some additional configuration that I’ll discuss below.

As you can see in the diagram, we still have many “pets” too. In addition to the surrounding infrastructure like our code repository (8) and continuous integration (7) servers, at AWS we have a “utility” server (9) used for hosting our development environment and some of our supporting scripts, as well as a single RDS instance (4) and a single EC2 instance used as a Memcache and Solr server (6). We also have an S3 instance for managing our static files (5) — more on that later.

Handling Mail

One potential whammy we caught late in the process was handling mail sent from the application. Since the IP of the given web server instance from which mail is sent will not match the SPF record for the domain (IP addresses authorized to send mail), the mail could be flagged as spam or mail from the domain could be blacklisted.

We were already running Mandrill for Drupal’s transactional mail, so to avoid this problem, we configured our web server AMI to have Postfix route all mail through the Mandrill service. Amazon Simple Email Service could also have been used for this purpose.

Static File Management

With our infrastructure in place, the main change at the application level is the way Drupal interacts with the file system. With multiple web servers, we can no longer read and write from the local file system for managing static files like images and other assets uploaded by site editors. A content delivery network or networked file system share lets us offload static files from the local file system to a centralized resource.

In our case, we used Drupal’s S3 File System module to manage our static files in an Amazon S3 bucket. S3FS adds a new “Amazon Simple Storage Service” file system option and stream wrapper. Core and contributed modules, as well as file fields, are configured to use this file system. The AWS CLI provided an easy way to initially transfer static files to the S3 bucket, and iteratively synch new files to the bucket as we tested and proceeded towards launch of the new system.

In addition to static files, special care has to be taken with aggregated CSS and Javascript files. Drupal’s core aggregation can’t be used, as it will write the aggregated files to the local file system. Options (which we’re still investigating) include a combination of contributed modules (Advanced CSS/JS Aggregation + CDN seems like it might do the trick), or Grunt tasks to do the aggregation outside of Drupal during application build (as described in Justin Slattery’s excellent write-up).

In the case of Cascade, we also had to deal with complications from CiviCRM, which stubbornly wants to write to the local file system. Thankfully, these are primarily cache files that Civi doesn’t mind duplicating across webservers.

Drush & Cron

We want a stable, centralized host from which to run cron jobs (which we obviously don’t want to execute on each server) and Drush commands, so one of our “pets” is a small EC2 instance that we maintain for this purpose, along with a few other administrative tasks.

Drush commands can be run against the application from anywhere via Drush aliases, which requires knowing the hostname of one of the running server instances. This can be achieved most easily by using AWS CLI. Something like the bash command below will return the running instances (where ‘webpool’ is an arbitrary tag assigned to our autoscaling group):

aws ec2 describe-instances --filters "Name=tag-key,Values=webpool" | grep ^INSTANCE | awk '{print $14}' | grep 'compute.amazonaws.com'

We wrote a simple bash script, update-alias.sh, to update the ‘remote-host’ value in our Drush alias file with the hostname of the last running server instance.
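A rough sketch of what such a script might look like (the alias file path and tag name are illustrative, not our exact setup):

#!/bin/bash
# update-alias.sh - point the Drush alias at one of the running webpool instances
HOST=$(aws ec2 describe-instances \
  --filters "Name=tag-key,Values=webpool" "Name=instance-state-name,Values=running" \
  --query 'Reservations[0].Instances[0].PublicDnsName' --output text)
# rewrite the remote-host entry in the Drush alias file
sed -i "s|'remote-host' => '[^']*'|'remote-host' => '$HOST'|" ~/.drush/cascade.aliases.drushrc.php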

Our cron jobs execute update-alias.sh, and then the application (both Drupal and CiviCRM) cron jobs.

Deployment and Scaling Workflows

Our webserver AMI includes a script, bootstrap.sh, that either builds the application from scratch — cloning the code repository, creating placeholder directories, symlinking to environment-specific settings files — or updates the application if it already exists — updating the code repository and doing some cleanup.

A separate script, deploy-to-autoscale.sh, collects all of the running instances similar to update-alias.sh as described above, and executes bootstrap.sh on each instance.

With those two utilities, our continuous integration/deployment process is straightforward. When code changes are pushed to our Git repository, we trigger a job on our Jenkins server that essentially just executes deploy-to-autoscale.sh. We run update-alias.sh to update our Drush alias, clear the application cache via Drush, tag our repository with the Jenkins build ID, and we’re done.

For the autoscaling itself, our current policy is to spin up two new server instances when CPU utilization across the pool of instances reaches 75% for 90 seconds or more. New server instances simply run bootstrap.sh to provision the application before they’re added to the webserver pool.

There’s a 300-second grace time between additional autoscale operations to prevent a stampede of new cattle. Machines are destroyed when CPU usage falls beneath 20% across the pool. They’re removed one at a time for a more gradual decrease in capacity than the swift ramp-up that fits the profile of traffic.
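For reference, the scale-up half of such a policy can be expressed with the AWS CLI roughly as follows (group and policy names are illustrative); the returned policy ARN is then wired to a CloudWatch alarm on the group's average CPUUtilization:

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name webpool \
  --policy-name scale-up-on-high-cpu \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 2 \
  --cooldown 300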

More Butts on Bikes

With this new architecture, we’ve taken a huge step toward one of Cascade’s overarching goals: getting “more butts on bikes”! We’re still tuning and tweaking a bit, but the application has handled this year’s registration period flawlessly so far, and Cascade is confident in its ability to handle the expected — and unexpected — traffic spikes in the future.

Our performant web application for Cascade Bicycle Club means an easier registration process, leaving them to focus on what really matters: improving lives through bicycling.


Feb 09 2013

In this blog I will walk you through setting up a Drupal project on an EC2 micro instance of AWS and setting up FTP access to your Drupal instance. Before this, of course, you have to register with AWS, which is straightforward.

So, what we will cover in this blog:

1. Choosing an OS and assigning some security rules to our instance.

2. How to access our instance and play around with it.

3. Setting up LAMP on our AWS micro instance.

4. Setting up FTP on the AWS micro instance.

5. Managing your Drupal project over an FTP connection using FileZilla.

Once you are registered with AWS, you have to log in to your account. Among the AWS services, click on the EC2 link; EC2 is essentially a virtual server in the cloud. It will redirect you to the EC2 dashboard, where you can manage your instances. Now, to create a new instance, follow the steps below:

1. Click on Launch Instance, select the Classic Wizard, and click Continue.

2. Now you have to select an Amazon Machine Image (AMI) from one of the tabbed lists below by clicking its Select button.

3. Let's say we select Ubuntu 12.04 LTS.

4. Next, leave the default settings except for the instance type: select Micro Instance, because it's free to use, and click Continue.

5. Next, again leave the default settings and click Continue.

6. Next, again leave the default settings and click Continue.

7. Now you have to give your key a name and value. I recommend giving the key a name that makes sense for your project, then click Continue.

8. Now select Create a New Key Pair, as we are new and don't have an existing key pair. Give your key pair file a name that makes sense for your project. Download your keypair.pem file and save it somewhere safe, because we will need this file later.

9. Next, select Create a New Security Group. Here we will assign some security rules and enable HTTP, SSH, and FTP connections to our instance. The HTTP port is 80 and the SSH port is 22; for FTP, select Custom TCP Rule and give the port range 21-22. For the source, you can give any IP range you need, or just leave the default for now.

10. Click Continue and launch the instance. It will take AWS some time to get your instance running.

Now our EC2 micro instance is running. You can check it from your dashboard.

Now we set up LAMP on our Ubuntu 12.04 LTS instance. For this, we access the instance from the terminal and set up LAMP there. Below are the steps to access your Ubuntu 12.04 LTS instance and set up LAMP on it.

1. Open your terminal, go to the directory where you stored your key pair (.pem) file, then run this command: ssh -i file_name.pem ubuntu@[your instance's public DNS]

2. It will give you an Ubuntu prompt in your terminal; you are now logged in to your Ubuntu machine and can do anything there. The main thing is that you have to run all commands that touch the system directories with "sudo" or as the root user.

3. In order to set up LAMP, we will install three packages. Run these commands from the terminal:

  • sudo apt-get install apache2

  • sudo apt-get install mysql-server

  • sudo apt-get install php5 php-pear php5-mysql php5-suhosin

That's it, your LAMP environment is ready. You can check it by navigating to your instance URL (which looks something like ec2-43-23-32.compute-1.amazonaws.com) in a browser. It will show Apache's default "It works!" page or a similar message.

So far we have our LAMP environment running on our EC2 instance. Now we install Drupal 7 on it. On the Ubuntu instance we don't have any FTP client or server yet, so we can either use SCP to copy the Drupal tarball from our local machine, or use the wget utility to download Drupal from its URL. Below are the steps to install Drupal 7:

1. cd /var/www/

2. wget http://ftp.drupal.org/files/projects/drupal-7.12.tar.gz

3. tar xvf drupal-7.12.tar.gz

4. mv drupal-7.12 drupal

Below is a link which tells you how to install Drupal on Linux. Just follow all those steps.

http://drupal.org/documentation/install/developers
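The database step from that guide boils down to creating a MySQL database and user for Drupal, roughly like this (the database name, user, and password are placeholders):

mysql -u root -p -e "CREATE DATABASE drupal CHARACTER SET utf8 COLLATE utf8_general_ci; GRANT ALL PRIVILEGES ON drupal.* TO 'drupaluser'@'localhost' IDENTIFIED BY 'choose-a-password'; FLUSH PRIVILEGES;"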

When you are done with your MySQL database and Drupal configuration, just browse to a link like, for example:

http://ec2-43-23-32.compute-1.amazonaws.com/drupal/

It will take you to your Drupal site.

Now we will see how to set up FTP for the Drupal instance. Why do we need FTP? In order to work on Drupal we have to use many modules, themes, and libraries, so we have to upload those things to the site. We could achieve this with SCP, but that is more difficult because everything has to be done via the command line. Here we will see how to set up FileZilla to connect to the Drupal site on AWS EC2. Below are the steps to set up FileZilla as an SFTP client.

1. Install FileZilla on your local system: sudo apt-get install filezilla

2. Now open FileZilla and click on File > Site Manager.

3. Enter the details of your site here:

  • Host - ec2-43-23-32.compute-1.amazonaws.com

  • Port - 22

  • Protocol - SFTP (SSH File Transfer Protocol)

  • Logon Type - Normal

  • User - ubuntu

  • Password - the ubuntu user's password. Then click OK; don't click Connect this time.

4. Now click on Edit > Settings > SFTP and add a key file. Navigate to your .pem file here. It will ask you to convert the .pem file; just select OK. Now FileZilla has your instance credentials and everything is ready to connect to your Drupal instance.

5. Now click on File > Site Manager > Connect.

Now you can transfer files from your local machine to the Drupal instance.

I hope you enjoyed this blog. Please feel free to comment and send queries to me.

References:

https://aws.amazon.com/documentation/

http://library.linode.com/lamp-guides/ubuntu-12.04-precise-pangolin

Thanks for reading this draft. I appreciate your help and contribution.

Sep 26 2011

Recently I decided to check out Amazon CloudFront to use as a CDN. I had a delightful experience with it. It was easy to set up with Drupal and could be configured in just a few steps.

CloudFront is a pull-only CDN, as opposed to a push CDN. A pull CDN grabs the files from your site on demand. The only thing you have to do to configure a pull CDN is switch out the base URL of the files you want to serve to the CDN base URL. When a file gets requested, the CDN checks if it has it already; if not, it downloads it from your server and serves it to the client.

In addition to the easy setup, there is Amazon CloudFront's cost. It is really, really cheap: about 12 cents per gigabyte of transfer.

Step 1: Create a CloudFront distribution

I am not going to walk through the sign-up process for Amazon Web Services. It is pretty straightforward and you can find it here. Once you have an account, you need to sign in to the AWS Management Console. The link is found on the aws.amazon.com website at the top right-hand side.

Next you will get the Management Console for all AWS services. Click on the CloudFront tab.

Next click the 'Create Distribution' button in the upper left hand side.

You will be prompted with a small configuration screen. Click on 'Custom Origin', enter your website domain name in the 'Origin DNS Name' field, and click 'Continue'. NOTE: This is the most basic setup. It can definitely be more complex, but for most users this configuration will work.

On the next configuration screen you can leave everything blank. You can optionally configure a CNAME, which allows you to control the URL of your CDN. For example, if you wanted everything to come from cdn.example.com you would enter that in, but be aware that you need to configure your DNS appropriately. Again, for most setups you can leave this blank. Click 'Continue'.

Finally you will be brought to the review page, hit 'Create Distribution'.

Next, make note of the 'Domain Name' that was assigned to you. It should be something.cloudfront.net. It may take a few minutes for the distribution to create itself, but while that is going on, you can install the CDN module.

Step 2: Install and configure the Drupal CDN module

Now go to drupal.org and download and enable the CDN module.
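If you prefer Drush, downloading and enabling the module is something like this (a sketch, assuming Drush is installed on the server):

drush dl cdn
drush en -y cdn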

Once enabled, there are only two simple configurations that you need to do. Go to /admin/config/development/cdn and click on the Details tab. In the 'CDN mapping' textarea, paste in the URL you got from CloudFront (remember to put http:// in front of it) and click Save.

Lastly, click the 'General' tab at the top right hand corner and check the box to enable CDN.

That's it! Now all files on your site will be pulled automatically to the CDN and served to the world from the closest location.

NOTE: This will cause almost all files to be pulled from the CDN, even JavaScript files. If you have JavaScript files that are making AJAX requests, they will not work, because you cannot do cross-domain AJAX calls. So make sure to blacklist those files. You can also choose to blacklist other files in the CDN configuration.

Mar 29 2011

Amazon AWS + Drupal

(Some familiarity with Amazon AWS is assumed.)

I have always wanted to set up a high-performance Drupal site on AWS EC2. There are several advantages to running your website (or web application) on AWS. Amazon EC2 creates and provisions virtual Linux (or Windows) servers for you and charges you an hourly rate for usage.

With AWS, it becomes easy to distribute and share the Drupal image with others. And of course it is much easier to scale and is definitely cheaper. You can have different virtual servers running the search engine, database, and application servers, so they all scale independently of each other.

With the introduction of Micro instances and better yet, free micro instances, the barrier to entry for a new user has really dropped. 

I assume you have or can create an Amazon AWS account and use their management console. These aspects are very well covered in Amazon's site and I will not get into the details of creating an account, etc. Amazon has done a great job of creating the documentation and tutorials for getting started.

I will show how to:

1. Setup a LAMP stack on Ubuntu

2. Setup Drupal on the LAMP stack

3. How to install phpmyadmin

4. Configure the database to reside in the EBS instead of the ephemeral instance storage.

1. Setup a LAMP stack on Ubuntu:

I used a 64-bit Ubuntu image for my purpose. Amazon provides both 32-bit and 64-bit micro instances, but I wanted to start with 64-bit, because their larger servers are only 64-bit and I can use the same image to scale up to larger Amazon servers. I used the Ubuntu image as my base image. This image is available in the US West region only. (Images are unique to regions and you can get similar images for the region you want to use.)

Once your AWS account is set up, sign in to the Amazon AWS console. Click on the EC2 tab and check the region you are running in. If you want to run in US West, select it and click on Launch Instance. The popup following that will allow you to select an image. Search for the image ID ami-01772744. Click on Start and continue with the default options. You will have to select a key pair and security group. Make sure that ports 80 and 22 are open in the security group you want to use. Port 80 will allow HTTP access and port 22 will allow SSH connectivity to the server.

You also have to know the location of Amazon's private key file (.pem) for the key pair. The Ubuntu server takes a few minutes to start and become available.

From the command line on your local machine type:

ssh -i [path to key file]/[key file name].pem ubuntu@[your server's public DNS]

The part following the @ has to be replaced by your server's public DNS name that Amazon provides on the console. Note there is no root user and all commands will work through sudo. Ubuntu does this to avoid any root user logins. You can access all the administrative functionality using sudo.

BTW, if the command above does not read and execute due to permission problem, you might want to first run:

        chmod 600 [path to key file]/[key file name].pem

Once connected to the remote server console (your first big milestone BTW), you can create a password for the ubuntu user by typing in (optional):

sudo passwd ubuntu

If you want to enable SSH access via passwords so you don't require the .pem file every time you can do the following:

 edit /etc/ssh/sshd_config to have

PasswordAuthentication yes

Restart the SSH daemon:

sudo service ssh restart
OR 
sudo /etc/init.d/ssh restart

Now with the basic logistics in place, let's set up the LAMP stack on this Ubuntu instance. I found this to be simpler than what I had expected. Write down any usernames and passwords you create from this point on

sudo tasksel install lamp-server

Drupal will need the rewrite functionality to be able to provide clean URLs, so run the command:

sudo a2enmod rewrite

That's it. Your LAMP stack is set up.

Go to http://[your public dns] and you should see some output from Apache.

BTW, what I also find really useful is to create some shortcuts in the .profile file. For example, instead of typing ls -al I can then type la, and since I make spelling mistakes while typing sudo, I can point sodu to sudo as well. To do this, edit the /home/ubuntu/.profile file:

sudo vim /home/ubuntu/.profile

Add the lines:

alias la='ls -al'
alias sodu='sudo'

2. Setup Drupal on the LAMP stack

Setting up Drupal on the LAMP stack is usually just a one-line command, after which we will need to perform some basic configuration:

        sudo apt-get install drupal6

Edit the file /etc/apache2/sites-enabled/000-default and make the change so that DocumentRoot is now as follows:

        DocumentRoot /usr/share/drupal6

You can install Drupal anywhere and just point the DocumentRoot to that location. Also comment out the block that starts with 

        <Directory />

Also edit the file /etc/apache2/conf.d/drupal6.conf and comment out the line 

        Alias /drupal6 /usr/share/drupal6

Restart Apache so the above configuration changes are reflected correctly:

        sudo service apache2 restart

Now go to http://[your public dns]/install.php and voila, you are in business.

3. Setup phpmyadmin:

To access the database through phpMyAdmin, you will need to install phpMyAdmin and access the URL of the application. Again, this is optional; you can access all the SQL functionality from the command line as well. Installing phpMyAdmin is trivial:

        sudo apt-get install phpmyadmin

And you are done. Follow the install options if any.

Go to the phpMyAdmin application via:

http://[your public dns]/phpmyadmin

The user name is usually root.

4. Configure the database to reside in the EBS instead of the ephemeral instance storage:

Amazon's instances are ephemeral, and the storage on the instance is ephemeral as well. That is, if the instance is shut down, the data on it goes away. Now that is not a very desirable configuration. However, Amazon allows you to mount persistent storage on top of the instance. You can mount any number of drives of up to 1 TB each, and you can choose the size of the mounted drive at instance startup time.
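For illustration, creating, attaching, and mounting such an EBS volume with today's AWS CLI looks roughly like this (the size, IDs, device name, and mount point are placeholders; the device may show up as /dev/xvdf on the instance):

aws ec2 create-volume --size 20 --availability-zone us-west-1a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf

sudo mkfs -t ext4 /dev/xvdf
sudo mkdir /vol
sudo mount /dev/xvdf /vol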

Essentially, there will already be a mounted drive which you can find by typing:

        mount

Then on the mounted drive you can create the corresponding directories for logs, DB files, and lib.

You link the directories on the mounted drive to the directories on your instance. The set of commands is as follows (here the EBS volume is assumed to be mounted at /vol):

Shut down MySQL first:
        sudo /etc/init.d/mysql stop

And then create the folders and link them:

sudo mkdir /vol/etc /vol/lib /vol/log
sudo mv /etc/mysql     /vol/etc/
sudo mv /var/lib/mysql /vol/lib/
sudo mv /var/log/mysql /vol/log/

sudo mkdir /etc/mysql
sudo mkdir /var/lib/mysql
sudo mkdir /var/log/mysql

echo "/vol/etc/mysql /etc/mysql     none bind" | sudo tee -a /etc/fstab
sudo mount /etc/mysql

echo "/vol/lib/mysql /var/lib/mysql none bind" | sudo tee -a /etc/fstab
sudo mount /var/lib/mysql

echo "/vol/log/mysql /var/log/mysql none bind" | sudo tee -a /etc/fstab
sudo mount /var/log/mysql

sudo /etc/init.d/mysql start

So, in summary, we saw how to set up the LAMP server, install Drupal, and make sure the DB runs on persistent storage. There is still work to harden the image and to create an image from the instance, and that will be covered in a subsequent blog.
