Launched in San Francisco in 2007, Scribd is an Internet company that provides social publishing and reading services. Scribd converts documents into a Web format readable on Scribd.com. Scribd members can share documents across the web, mobile devices, and social platforms such as Facebook and Twitter.
From the beginning, Scribd used Amazon Web Services (AWS) to store and process the documents uploaded to its site. Jared Friedman, co-founder of Scribd, states, “It was by far the easiest way to scale our web storage to meet customer demand.” Scribd uses Amazon Simple Storage Service (Amazon S3) to host original and converted document assets and Amazon Elastic Compute Cloud (Amazon EC2) to convert the documents to Web-readable HTML, and it also uses the AWS SDK for Ruby.
Why Amazon Web Services
“We talked with an account manager at AWS, who listened to our business problem and helped us design a large batch job using Amazon EC2 Spot Instances,” says Friedman. Scribd runs a scalable grid of slave nodes that processes files, and a master node that controls them. The master node scales the grid in response to changing demand, and can purchase On-Demand or Spot Instances, depending on the time sensitivity of requests. Scribd estimates that it saved 63%, or $10,500, by using Spot Instances for the batch conversion instead of On-Demand Instances for this particular job.
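The master node’s purchasing decision can be illustrated with a minimal sketch. The case study does not describe Scribd’s actual implementation, so everything here is hypothetical: the names `ConversionRequest`, `purchase_option`, `target_grid_size`, `plan`, and `implied_costs`, the per-node throughput, and the rule that a single time-sensitivity flag decides between On-Demand and Spot capacity. The sketch makes no AWS API calls. The 2,000-node cap in the usage example comes from the Spot Instance count reported for the batch job. The `implied_costs` helper back-computes the job’s approximate On-Demand and Spot costs from the reported 63% and $10,500 savings figures.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ConversionRequest:
    """Hypothetical unit of work: one document waiting to be converted."""
    doc_id: str
    time_sensitive: bool  # e.g. a fresh upload a reader is waiting on


def purchase_option(request: ConversionRequest) -> str:
    """Choose an EC2 purchase option for one request (assumed policy).

    Time-sensitive work goes to On-Demand capacity; deferrable batch
    work goes to cheaper Spot capacity, which can be interrupted.
    """
    return "on-demand" if request.time_sensitive else "spot"


def target_grid_size(queued: int, docs_per_node: int, max_nodes: int) -> int:
    """Worker nodes needed to drain a queue, capped at max_nodes."""
    if queued <= 0:
        return 0
    return min(max_nodes, math.ceil(queued / docs_per_node))


def plan(requests: list[ConversionRequest], docs_per_node: int,
         max_nodes: int) -> dict[str, int]:
    """Split queued requests by purchase option and size each node pool."""
    counts = {"on-demand": 0, "spot": 0}
    for request in requests:
        counts[purchase_option(request)] += 1
    return {option: target_grid_size(n, docs_per_node, max_nodes)
            for option, n in counts.items()}


def implied_costs(savings: float, savings_fraction: float) -> tuple[float, float]:
    """Back out (on_demand_cost, spot_cost) from a savings amount and fraction."""
    on_demand = savings / savings_fraction
    return on_demand, on_demand - savings


if __name__ == "__main__":
    # 1,000 queued documents, every tenth one time-sensitive.
    queue = [ConversionRequest(f"doc-{i}", time_sensitive=(i % 10 == 0))
             for i in range(1000)]
    print(plan(queue, docs_per_node=50, max_nodes=2000))
    # prints {'on-demand': 2, 'spot': 18}

    on_demand, spot = implied_costs(10_500, 0.63)
    print(round(on_demand), round(spot))
    # prints 16667 6167
```

Read this way, a 63% saving worth $10,500 implies the job would have cost roughly $16,700 on On-Demand Instances versus roughly $6,200 on Spot Instances. Because Scribd’s figures are estimates, these derived totals are approximate as well.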
Scribd used up to 2,000 Spot Instances at a time to run the batch-processing job. “We only had to write a couple of really small scripts,” says Friedman. “We were able to move from On-Demand to Spot Instances in a couple of hours, coffee breaks included.”
The team called on AWS Support when it ran into issues midway through the process because software was overloading the instances. “Helpful technical people looked through our account data, explained the issue, and helped us understand how to fix it,” says Friedman. “Thanks to AWS, we were able to get the job done ahead of time and under budget, at an amazingly low cost.”
In the future, Scribd plans to make greater use of Spot Instances, which Friedman describes as a great way to save money on Amazon EC2. He continues, “AWS has supported our business for five years, allowing us to easily handle our millions of users.” After the release of Amazon Glacier, Scribd quickly took advantage of the low-cost, durable storage service to back up all of its data, including files that previously had no backups. Scribd found Amazon Glacier particularly helpful for database snapshots and log files. Friedman comments, “Amazon Glacier’s extremely low prices have made it economically viable for us to do far more comprehensive backups than we were previously able to do.”