SoundCloud offers a platform for artists, bands, podcasters, and others to create their own recordings or upload existing sounds. Sounds can be shared privately with friends or publicly on blogs, websites, and social networks, as well as through apps for iPhone, Android, and other platforms. SoundCloud was launched in 2008 and has headquarters in Berlin and offices in the U.S. and Europe.
SoundCloud operates worldwide, enabling users to upload 12 hours of audio material to its platform every minute. Each audio file must be transcoded and stored in multiple formats. The company also logs and analyzes billions of events daily to better understand user behavior and continually optimize its service. It currently stores 2.5 PB of data.
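To put the upload rate in perspective, the quoted figure of 12 hours of audio per minute can be converted into hourly and daily volumes. The following is a back-of-the-envelope sketch in Python; the function and constant names are illustrative, and the only input is the rate stated above.

```python
# Back-of-the-envelope conversion of the upload rate quoted in the text.
HOURS_UPLOADED_PER_MINUTE = 12  # figure from the case study


def uploaded_hours(minutes: int, rate: float = HOURS_UPLOADED_PER_MINUTE) -> float:
    """Return the hours of audio uploaded over a wall-clock span of `minutes`."""
    return rate * minutes


hours_per_hour = uploaded_hours(60)        # audio hours arriving per wall-clock hour
hours_per_day = uploaded_hours(24 * 60)    # audio hours arriving per day

# Ratio of audio duration to wall-clock time: 12 hours = 720 minutes per minute.
realtime_factor = HOURS_UPLOADED_PER_MINUTE * 60

print(f"{hours_per_hour} hours of audio per hour")
print(f"{hours_per_day} hours of audio per day")
print(f"audio arrives at {realtime_factor}x real time")
```

Under this rate, audio arrives 720 times faster than it could be played back, which is why each upload has to be transcoded in parallel rather than sequentially.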
With a global audience that is active 24 hours a day, the company experiences highly inconsistent load profiles. High load spikes occur when users in Europe and the U.S. use the platform at the same time. “If our storage crashed, that would be the end of SoundCloud,” says Alexander Grosse, VP of Engineering at SoundCloud. “We need to be able to focus on our platform’s core functionality.”
Why Amazon Web Services
SoundCloud decided to use Amazon Web Services (AWS) to store and process the massive data sets its users upload each day. “We needed a solution that could scale to meet our needs and was easy to run, and AWS fit the bill,” Grosse says.
SoundCloud uses a combination of Amazon Simple Storage Service (Amazon S3) and Amazon Glacier as its storage solution. The audio files are placed in Amazon S3 and distributed from there via the SoundCloud website. All files are also copied to Amazon Glacier, to ensure that the data is available at all times, even in the event of a disaster. The company currently stores 2.5 PB of data on Amazon Glacier.
Through the combined use of Amazon S3 and Amazon Glacier, SoundCloud is able to store large data volumes securely without additional operational overhead. “We don't need to worry about storage. AWS lets us sleep well at night,” Grosse says.
SoundCloud handles the load spikes caused by worldwide usage with a transcoding cluster of up to 300 Amazon Elastic Compute Cloud (Amazon EC2) instances. “Spikes in transcoding loads are distributed throughout the day,” Grosse says. “AWS is perfect for applications with variable runtime behavior.”
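The idea of a cluster that grows with load up to a fixed ceiling can be sketched as a simple sizing rule. This is a minimal illustration, not SoundCloud's actual scaling logic: the backlog metric, the `jobs_per_instance` value, and the function name are all assumptions made for the example. Only the upper bound of 300 instances comes from the text.

```python
import math

MAX_INSTANCES = 300  # upper bound on the transcoding cluster, from the case study


def desired_instances(pending_jobs: int,
                      jobs_per_instance: int = 10,   # assumed capacity per instance
                      min_instances: int = 0,
                      max_instances: int = MAX_INSTANCES) -> int:
    """Size the cluster to the backlog, clamped to [min_instances, max_instances]."""
    if pending_jobs < 0 or jobs_per_instance <= 0:
        raise ValueError("pending_jobs must be >= 0 and jobs_per_instance > 0")
    needed = math.ceil(pending_jobs / jobs_per_instance)
    return max(min_instances, min(needed, max_instances))


# Hypothetical backlog sizes: quiet period, moderate load, and a spike.
for backlog in (0, 25, 10_000):
    print(backlog, "pending jobs ->", desired_instances(backlog), "instances")
```

A rule of this shape lets capacity follow demand during spikes and fall back when the backlog clears, so instances are only paid for while they are needed.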
To log and evaluate billions of user interactions, SoundCloud starts up an Amazon Redshift cluster, which enables its Data Science Team to quickly test ideas and gain insights into user behavior. SoundCloud can then scale the Redshift cluster as data volumes grow, without any increase in operational complexity.
Secure long-term storage is the key to SoundCloud's success. By offloading storage and transcoding functionality to AWS, SoundCloud can focus entirely on its own platform's core functionality, better serving customers and building the business.
Using AWS means that SoundCloud can serve variable load profiles without needing to buy and manage its own servers. Because servers are immediately available on Amazon EC2, SoundCloud can avoid opportunity costs and adjust server capacity at any time to meet current requirements.
The company also uses AWS for test and development environments. “Whenever we need a server right away, we use AWS,” Grosse says. “We were able to launch a complete Cassandra cluster on Amazon EC2 in only a few hours. We needed it for six months, and when we were through, we could shut it down without incurring any additional costs.” The company plans to expand its use of AWS going forward.
“It pays off to start right away in the AWS Cloud,” Grosse says. “Then you can set your architecture up in a more modular and scalable fashion right from the start.”