Benchling is a San Francisco–based life science software company that provides an R&D software platform. The platform helps scientists design, organize, and share their experiments from start to finish, and it combines workflow management tools with molecular biology design and analysis tools. Today, Benchling has thousands of customers, including biotech and pharmaceutical firms, as well as academic and government labs.
CRISPR is a breakthrough technique that researchers use to modify parts of a genome with extreme precision. Because scientists worldwide use CRISPR to build disease models and screen for drug targets, Benchling needs to ensure its platform is fast. “We want to enable searches across hundreds of genomes as quickly as possible, so researchers can design better experiments with less effort,” says Vineet Gopal, an engineering manager at Benchling. Using the Benchling platform, scientists can quickly identify possible accidental (off-target) matches and select the best candidate sequences for their CRISPR experiments. Benchling currently supports CRISPR workflows for more than 100 organisms and is continuously adding more.
However, the company found it difficult to support more users and genomes with its previous IT environment, which used several servers to process CRISPR search tasks, with each server storing the genomes on disk. A single CRISPR search was split into several subtasks and parallelized across the servers. Each subtask read the corresponding genome region from disk, conducted the search, and returned results to the user. “The searches took about 30 seconds, but we wanted to get that down to a few seconds,” Gopal says. “Even though these searches aren’t very frequent, they need to happen quickly, and we can’t afford to spin up an instance every time someone searches.”
Benchling also wanted a faster, easier way to scale its solution. “We already support hundreds of thousands of CRISPR searches every month, but we couldn’t spin up new servers fast enough if demand was high,” says Gopal. “We want to grow the platform to scale to hundreds of genomes and millions of searches per month.”
Additionally, the organization needed to reduce costs. “Off hours, when fewer people were using the platform, we were burning thousands of dollars in maintenance,” says Gopal. “We didn’t want to pay for servers that weren’t being utilized.”
Why Amazon Web Services
Benchling had already been using the Amazon Web Services (AWS) Cloud, taking advantage of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3) to support its platform. The company was also using a combination of AWS CloudFormation and Amazon Virtual Private Cloud (Amazon VPC) to make it easy to spin up database clones for customers.
To increase scalability while reducing costs, Benchling rebuilt its CRISPR search on a serverless architecture, using AWS Lambda to run code without needing to provision and manage servers. The organization’s goal was to split a CRISPR search across several AWS Lambda tasks.
To start the process, a Benchling web server receives a researcher’s request to conduct a CRISPR search on a specific genome. The web server then splits the search over that genome into smaller tasks and invokes the AWS Lambda function once for each task. Each invocation downloads the genome data stored in Amazon S3 and performs the query; the results are then combined and returned to the user. “Using AWS Lambda, CRISPR searches are easily parallelized by splitting the genome into smaller tasks using a custom genome search algorithm,” says Gopal. “This approach solves our dynamic scaling challenge, because we no longer need to maintain several servers to perform searches.”
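The split, search, and combine flow described above can be illustrated with a small Python sketch. It is not Benchling's custom genome search algorithm, which the company has not published. Every name here (split_into_regions, search_region, parallel_search) is hypothetical, the genome is an in-memory string rather than data fetched from Amazon S3, and a local thread pool stands in for the per-task AWS Lambda invocations. Matching is a simple mismatch count against the guide sequence, chosen only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_regions(genome_length, region_size, overlap):
    """Return (start, end) windows covering the genome.

    Consecutive windows overlap by `overlap` bases so that a match
    spanning a window boundary is still seen by one of the tasks.
    """
    if region_size <= overlap:
        raise ValueError("region_size must be larger than overlap")
    regions = []
    start = 0
    while start < genome_length:
        end = min(start + region_size, genome_length)
        regions.append((start, end))
        if end == genome_length:
            break
        start = end - overlap
    return regions


def search_region(genome, guide, start, end, max_mismatches):
    """Stand-in for one Lambda task: scan genome[start:end] for the guide.

    Returns (position, mismatch_count) pairs, with positions given
    relative to the whole genome.
    """
    k = len(guide)
    hits = []
    for i in range(start, end - k + 1):
        mismatches = sum(a != b for a, b in zip(genome[i:i + k], guide))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


def parallel_search(genome, guide, region_size, max_mismatches=0):
    """Fan the search out over regions, then combine the partial results."""
    # An overlap of len(guide) - 1 means every start position is
    # checked by exactly one task.
    regions = split_into_regions(len(genome), region_size, len(guide) - 1)
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda r: search_region(genome, guide, r[0], r[1], max_mismatches),
            regions,
        )
        return sorted(hit for part in partials for hit in part)


if __name__ == "__main__":
    toy_genome = "ACGTACGTTTACGAACGT"
    print(parallel_search(toy_genome, "ACGT", region_size=6))
```

In a real deployment each call to search_region would presumably be replaced by an invocation of the Lambda function, and the web server would merge the responses in the same way. The overlap between regions is the design point worth noting: without it, a candidate site that straddles two tasks would be missed.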
Researchers using the Benchling platform for CRISPR searches are now able to get their results faster than ever. “By using AWS Lambda, we’ve cut CRISPR search times by 90 percent and scaled to hundreds of genomes,” Gopal says. “By making this analysis faster, scientists using our platform can spend more time focusing on science. That’s what we want to do—enable faster searches so their research isn’t hindered by performance limitations. The industry standard for these searches is minutes to hours, and we’ve been able to improve that tremendously.”
Benchling now has the ability to quickly grow its solution without having to provision additional server instances. “Our platform currently supports over 100 genomes, and we’re getting new requests every week,” says Gopal. “With the built-in scalability we get using AWS, we can easily support those requests. And as we add more genomes, we don’t have to worry about relying on disks to store each instance. The AWS Lambda approach completely solved our dynamic scaling issue.”
In addition, the company is saving money by using the AWS Cloud to support CRISPR searches. “Because we don’t have to provision compute resources ourselves, and we’re not paying for instances we’re not using, we are saving thousands of dollars every month by relying on AWS Lambda and the rest of the AWS services we use,” says Gopal. “As a startup, that’s a big benefit.”
Because the organization’s engineers do not have to spend time maintaining and provisioning servers, they can put more effort into developing new software features. “At a small company like ours, time is the most important resource,” says Gopal. “By using AWS, we can give our engineers more time. Instead of worrying about the solution architecture and managing servers, they are freed up to focus on new projects and initiatives that can help grow the company.”
Using AWS Lambda and additional AWS services, Benchling will continue to help scientific researchers discover cures for diseases. “Our focus is on making the research experience great for our users, whether they’re at universities or pharmaceutical companies,” says Gopal. “Using AWS, we know we can optimize their CRISPR search experience and help them get results as quickly as possible.”