Change.org is the world’s largest social-change platform with more than 130 million users in 196 countries. As a certified B-corporation—a new kind of company using the power of business for social good—the company’s mission is to empower people everywhere to create the change they want to see. As an open platform, anyone can start a campaign and immediately mobilize hundreds of people locally or hundreds of thousands around the world to create change, from stopping bullying in schools to ending acid attacks in India. Change.org is headquartered in San Francisco, and investors include Bill Gates, Richard Branson, Arianna Huffington, and the founders of LinkedIn, Yahoo, Twitter, and eBay.
As Change.org has grown rapidly over the past several years, especially internationally, it has sought to push out new site features more regularly. However, the organization lacked the elasticity to do that using its former managed-cloud infrastructure. “Our data science team in particular wanted to iterate more rapidly, but we had some issues with our former environment that sometimes made it challenging to quickly add new resources,” says Vijay Ramesh, lead data engineer for data science at Change.org. “We also needed the agility to do iterations using very different types of machine topologies for machine learning. For example, we wanted to use machines with many cores, while also using smaller machines with parallel processes.”
Change.org also relies heavily on continuous integration (CI)—a development practice that requires the integration of code into a repository several times each day—to ensure high-quality code. But the organization used a homegrown CI system that was delaying new feature releases. “Each of the CI builds was taking up to one hour to finish, which is too long,” says Ramesh. “That made it more difficult to quickly test new features and get them deployed.” And the system also required too much hands-on maintenance. “Several of our engineers spent a few hours a day maintaining the CI workflow, but that is not our core competency,” Ramesh says. “We want to focus all our time on developing new features.”
The organization also needed to more easily scale its website platform to deal with traffic spikes. “We see massive traffic bursts when we’re mentioned in the media, for example, but it wasn’t easy for us to bring up server nodes in real time to support those bursts,” says Ramesh. “It required manual intervention on our part, and it easily took well over an hour to get new resources up and into production to respond to the spikes.”
Why Amazon Web Services
Determined to find a more agile and scalable web platform, the Change.org data science team decided to move some of its machine learning resources to Amazon Web Services (AWS). “We liked the flexibility and scalability of AWS,” says Ramesh. The data science team started using Amazon Elastic Compute Cloud (Amazon EC2) instances in production for email targeting and batch pipeline work.
Next, Change.org set up two 16-terabyte Amazon Redshift clusters, both containing all of the company’s relational and events data. The first is a production business intelligence (BI) cluster that streams customer data from a MySQL database into Redshift. The second cluster, used for offline analytics and R&D, collects experimental and log event data from Amazon Simple Storage Service (Amazon S3) and loads it into Redshift; it is kept up to date via the AWS Data Pipeline service. “We are very experiment-driven with new features and changes to the site, so we set up a system for our engineers to work on anything they want to without worrying about it affecting production,” says Ramesh. Additionally, the organization’s business analysts can create reports and executive dashboards for analyzing the event stream data in Redshift, and many of the organization’s email tools query Redshift.
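The case study does not detail the load mechanics, but a typical S3-to-Redshift ingest of the kind described uses Redshift’s COPY command to bulk-load files from a bucket. The sketch below is illustrative only; the table, bucket path, and IAM role names are hypothetical, not Change.org’s actual configuration.

```python
# Hypothetical sketch of bulk-loading S3 event logs into Redshift with COPY.
# All identifiers (table, bucket, IAM role) are illustrative placeholders.

def build_copy_statement(table: str, s3_path: str, iam_role: str) -> str:
    """Return a Redshift COPY command that bulk-loads gzipped JSON from S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS JSON 'auto' GZIP;"
    )

sql = build_copy_statement(
    table="events.page_views",
    s3_path="s3://example-bucket/logs/2016/05/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftLoadRole",
)
print(sql)
```

In practice a statement like this would be executed against the cluster by a scheduled pipeline activity, which matches the “kept up to date via AWS Data Pipeline” description above.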
Although Change.org still keeps some of its resources in its on-premises environment, it is increasingly using on-demand AWS resources and is in the process of migrating its complete production environment to AWS.
Change.org also chose to use Solano CI, a hosted CI solution. Solano CI runs completely on AWS, using Elastic Load Balancing to route incoming traffic, Amazon S3 for data storage, and Amazon Relational Database Service (Amazon RDS) to run its database. Solano CI is a scalable system designed to reduce testing time for developers while offloading the maintenance of the CI system and optimizing AWS instance usage, significantly reducing TCO. “Since implementing Solano CI, we have been able to completely shut down our homegrown CI system,” says Ramesh.
By taking advantage of the on-demand elasticity of AWS, Change.org can develop new features more quickly. “Our data science team can iterate very quickly because of the elasticity of AWS,” says Ramesh. “For example, when we build recommendation engines, the computational demands vary greatly depending on the type of algorithm we use or how much data is involved. But we can do it all on the fly with AWS, using only the resources we need based on what we’re doing for that project. As a result, we’re not requisitioning servers or having to plan ahead.”
The organization can also rely on its Solano CI automated testing environment to reduce test time and roll out new features into production faster. Since switching to Solano CI, the organization has cut its average build time from one hour to 15 minutes via Solano CI’s auto-parallelization. “We can push more CI builds through each day, which ultimately means we can push more changes to the site on a daily basis,” says Ramesh. “We are delivering value faster to our end users because of Solano CI.” In addition, Change.org can identify defects faster because it can run more tests with Solano CI. He says, “By detecting defects more quickly, we can ensure higher-quality code and ultimately produce better features.”
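The fourfold build-time reduction comes from running the test suite in parallel rather than serially. The sketch below illustrates the general idea behind that kind of auto-parallelization; the sharding scheme, test names, and timings are hypothetical and do not represent Solano CI’s internals.

```python
# Illustrative sketch of why parallelizing a CI build cuts wall-clock time:
# split the suite across workers so each runs a fraction of the tests.
# Worker count and test names are hypothetical, not Solano CI specifics.

def partition(tests: list, workers: int) -> list:
    """Round-robin tests across workers to balance load evenly."""
    buckets = [[] for _ in range(workers)]
    for i, test in enumerate(tests):
        buckets[i % workers].append(test)
    return buckets

# 60 roughly one-minute tests: ~60 min serially, ~15 min across 4 workers.
suite = [f"test_{i}" for i in range(60)]
shards = partition(suite, workers=4)
print([len(s) for s in shards])  # each worker runs 15 tests
```

A hosted CI service additionally learns per-test timings over successive runs, so shards can be balanced by duration rather than by count.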
Change.org also has the scalability it needs to support traffic spikes. “We can much more easily respond to traffic bursts on our website by using AWS,” says Ramesh. “As we see more traffic, new background servers automatically come online. And as the traffic subsides, those servers go offline again. It’s very dependable and fast. And because it doesn’t require manual intervention from our engineers, it doesn’t take an hour for us to spin up new resources. It’s instantaneous.”
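The “servers automatically come online” behavior Ramesh describes is what an EC2 Auto Scaling policy provides. As a minimal sketch only (the group name, adjustment size, and cooldown below are hypothetical, not Change.org’s configuration), a classic scale-out policy looks like this:

```python
# Illustrative sketch of an EC2 Auto Scaling scale-out policy of the kind
# described: add capacity when traffic spikes, let it drain as traffic
# subsides. All names and values here are hypothetical placeholders.

def scale_out_policy_params(group_name: str, add_instances: int) -> dict:
    """Parameters for a simple scaling policy that adds instances."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "scale-out-on-traffic-burst",
        "AdjustmentType": "ChangeInCapacity",  # add a fixed instance count
        "ScalingAdjustment": add_instances,
        "Cooldown": 300,  # seconds to wait before another scaling action
    }

params = scale_out_policy_params("web-frontend-asg", add_instances=2)
# In a real account the policy would be applied with, e.g.:
#   boto3.client("autoscaling").put_scaling_policy(**params)
# and triggered by a CloudWatch alarm on request count or CPU utilization.
print(params["PolicyName"])
```

A matching scale-in policy with a negative adjustment handles the “servers go offline again” half of the behavior.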
Additionally, the Solano CI environment is designed to scale, so Change.org can better support increases in test build requirements. “Sometimes, when we’re about to release major features and we’re approaching a deadline, we will have an unusually high number of builds we need to run,” says Ramesh. “Solano CI scales to support those needs because it benefits from the scalability of AWS.”
And because its CI system is now based in the cloud, Change.org developers no longer need to spend their time on maintenance tasks. Solano CI intelligently and automatically optimizes tests, so Change.org developers do not have to manage a CI system, configure virtual machines, or make sure the CI nodes are functioning correctly. “Our developers don’t have to spend time worrying about maintaining our CI system,” says Ramesh. “They can rely on Solano CI to do everything, so they can instead focus on what they do best: building great tools for our website that bring the most value to our end users across the globe.”
Change.org plans to greatly expand its use of AWS once it completes its migration to the environment. “We are very excited about the fact that AWS has multiple regions and availability zones worldwide, which is something we didn’t have with our previous solution,” says Ramesh. “Being able to build out a website architecture that will enable us to respond to global traffic is going to be a huge advantage for us. We have grown a lot internationally over the past few years, and we will be able to accelerate that even more by using AWS. We will have more redundancy and resiliency for our global users, and that opens up a whole new world for us in terms of our future growth.”
“By using AWS, we will have more redundancy and resiliency for our global users, and that opens up a whole new world for us in terms of our future growth.”
Vijay Ramesh, Lead Data Engineer