Every day, tens of millions of prospective home buyers, sellers, and renters, in addition to agents and property managers, use the Zillow website to browse home and apartment listings, shop for mortgages, and find information about 110 million homes across the U.S. The popular site is owned by Zillow Group, which houses a portfolio of the largest real estate and home-related brands online. In addition to Zillow, Zillow Group operates Trulia, HotPads, and StreetEasy.
Zillow processes more than 3 million new images daily, including listing photos, profile pictures for lenders and agents, and home-project images on the Zillow Digs site. “We receive 17,000 image requests per second during peak traffic from both desktop and mobile clients,” says Nick Michal, Unix systems engineering manager for Zillow Group.
As Zillow grew in popularity and agents began posting higher resolution photos with their listings, the company’s legacy imaging system could not keep up with the demand. The system lived in hosted data centers, with images downloaded out of one queue, stored on a network-attached storage (NAS) device in the pyramidal TIFF format, and served to a content delivery network (CDN) out of a local Squid service. “Doing things this way was expensive, and we relied on our CDN to have a high cache hit rate. If that rate didn’t happen, we couldn’t serve images effectively. We were running close to capacity every day,” says Michal.
Zillow was also struggling with image-processing performance issues, because some images were manually uploaded while others came from bulk feeds to be downloaded. The rate of new images coming from the bulk feeds was often unpredictable, and some image sources allowed for much faster downloading with more concurrency than other sources. As a result, a slow or problematic image at the front of the queue would hold up the download of other images. “We can’t have bandwidth issues on Zillow, because if users go to the site and an image isn’t there, they won’t look at the listing,” says Feroze Daud, senior software development engineer, Zillow Group. “That would be frustrating for users.”
In addition, the tool Zillow was using for processing images for storage in the pyramidal TIFF format was aging and not easily extensible. “It was not easy for us to put in image-quality improvements like removing solid color borders,” Daud says. Disaster recovery was another major concern. “With everything hosted in one data center, there was risk,” says Michal.
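The border-removal improvement Daud mentions can be illustrated with a minimal sketch. It assumes an image is represented as a plain list of pixel rows and treats the top-left pixel's color as the border color; the `trim_solid_border` function is hypothetical and is not Zillow's implementation.

```python
def trim_solid_border(pixels):
    """Crop edge rows and columns that consist only of the top-left pixel's color.

    `pixels` is a non-empty list of equal-length rows (assumed representation).
    """
    border = pixels[0][0]
    # Rows and columns that contain at least one non-border pixel.
    rows = [i for i, row in enumerate(pixels) if any(p != border for p in row)]
    if not rows:
        return pixels  # the whole image is one color; nothing to crop
    cols = [
        j for j in range(len(pixels[0]))
        if any(row[j] != border for row in pixels)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]


image = [
    [0, 0, 0, 0],
    [0, 5, 7, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
]
trimmed = trim_solid_border(image)
```

A production pipeline would operate on decoded image data and would likely tolerate small color variations caused by compression, which this sketch does not attempt.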
Why Amazon Web Services
To solve its image system’s scalability, performance, and disaster recovery challenges, Zillow decided to move to a cloud-based infrastructure. “From an overall cost and ease-of-management perspective, the cloud made a lot of sense,” says Michal. After evaluating various cloud technologies, Zillow chose Amazon Web Services (AWS). “AWS had been around the longest and was dominant in the cloud space,” says Michal. “Also, a number of the companies we had purchased were already using AWS.”
The company migrated its image hosting and distribution from a physical colocation facility to AWS, using Amazon Elastic Compute Cloud (Amazon EC2) instances and Amazon Simple Storage Service (Amazon S3) for image object storage. Currently, Zillow stores close to 100 TB of data in Amazon S3, including 300 million images and more than 1 billion objects. “Maintaining an object count in the billions doesn’t work so well on a traditional file system,” Michal says. “We would have to split those objects across many file systems, which would be a management nightmare. The scalability of Amazon S3 seemed like the right technology for us.”
Zillow also began using AWS Elastic Beanstalk, a service for deploying and scaling web applications and services. Developers can upload code to Elastic Beanstalk, which then automatically handles the deployment, from capacity provisioning, load balancing, and auto-scaling to application health monitoring. The company is using an Elastic Beanstalk worker environment to run the Python Imaging Library with custom code. “Because we ingest data in a haphazard way, running feeds that dump a ton of work into the system all at once, we need to scale up our suite of image converters,” says Daud. “AWS Elastic Beanstalk is the simplest way to do that, as opposed to having a bunch of static instances or trying to write our own auto-scaling configuration.”
The organization then moved the majority of its CDN workload from Akamai to Amazon CloudFront, a content-delivery web service that distributes Zillow website content closer to users. “AWS CloudFront is considerably less expensive than Akamai, and it also integrates nicely with Amazon S3,” says Michal. Zillow is also using Amazon CloudWatch for monitoring some of its cloud resources.
Zillow uses a download server (DLS) in its data center to manage image download requests coming from listing feeds, and it uses a REST API running on AWS Elastic Beanstalk as a front-end service in the cloud for the DLS. This service takes each image download request and puts it into a per-feed Amazon Simple Queue Service (Amazon SQS) queue. “Using SQS, we have a queueing system without having to support the infrastructure ourselves,” says Michal.
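The idea of one queue per feed can be illustrated with a minimal in-memory sketch. It uses Python's standard library as a stand-in for Amazon SQS and does not call the SQS API; the `FeedQueues` class, the feed names, and the URLs are all hypothetical.

```python
from collections import defaultdict, deque


class FeedQueues:
    """In-memory stand-in for one queue per feed (hypothetical, not the SQS API)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def enqueue(self, feed_id, image_url):
        # Each feed gets its own FIFO queue, so feeds never share a backlog.
        self._queues[feed_id].append(image_url)

    def dequeue(self, feed_id):
        # Return the next request for this feed, or None if none is waiting.
        queue = self._queues.get(feed_id)
        return queue.popleft() if queue else None

    def depth(self, feed_id):
        queue = self._queues.get(feed_id)
        return len(queue) if queue else 0


queues = FeedQueues()
queues.enqueue("slow-feed", "http://example.com/a.jpg")
queues.enqueue("fast-feed", "http://example.com/b.jpg")
queues.enqueue("fast-feed", "http://example.com/c.jpg")

# The fast feed can be drained without waiting on the slow feed's request.
first_fast = queues.dequeue("fast-feed")
```

Because each feed has its own backlog, a slow or problematic image in one feed no longer sits in front of requests from every other feed, which was the failure mode of the single legacy queue.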
A throttled downloader controls the rate and concurrency at which Zillow downloads images for each feed source, allowing the company to take advantage of image providers that support fast downloading while not overwhelming providers that do not. If the image download succeeds, Zillow writes the original image into Amazon S3, to be used in image processing.
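A per-feed concurrency cap of this kind might be sketched as follows. This is an illustration under stated assumptions, not Zillow's downloader: the `FeedThrottle` class, the `fake_fetch` placeholder, the feed names, and the limits of 8 and 2 are all hypothetical, and the sketch covers only concurrency, not download rate.

```python
import asyncio


class FeedThrottle:
    """Caps concurrent downloads for one feed (hypothetical helper)."""

    def __init__(self, max_concurrency):
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self.active = 0
        self.peak = 0

    async def download(self, url, fetch):
        # Wait for a free slot before starting the download.
        async with self._semaphore:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await fetch(url)
            finally:
                self.active -= 1


async def fake_fetch(url):
    # Placeholder for a real HTTP download; sleeps briefly and returns bytes.
    await asyncio.sleep(0.01)
    return b"image-bytes-for-" + url.encode()


async def run_feeds():
    # Assumed limits: one feed tolerates 8 parallel downloads, the other 2.
    throttles = {"fast-feed": FeedThrottle(8), "slow-feed": FeedThrottle(2)}
    tasks = [
        throttles[feed].download(f"{feed}/{i}.jpg", fake_fetch)
        for feed in throttles
        for i in range(10)
    ]
    results = await asyncio.gather(*tasks)
    return throttles, results


throttles, results = asyncio.run(run_feeds())
```

Each feed holds its own semaphore, so a provider that supports many parallel connections is not held to the limit chosen for a provider that supports few. A rate limit could be layered on in the same per-feed object.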
For image processing, Zillow takes the original images stored in Amazon S3, applies various image-quality improvements, and generates a standard set of sizes for each image. All images are served out of Amazon S3 and cached in Amazon CloudFront. The company serves an average of 15,000 images per second.
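Generating a standard set of sizes comes down to simple arithmetic on the original dimensions. The sketch below computes target dimensions that preserve aspect ratio and never upscale; the widths in `STANDARD_WIDTHS` and the `scaled_sizes` function are assumptions for illustration, since the source does not list the sizes Zillow produces.

```python
# Hypothetical target widths in pixels; the source does not list Zillow's sizes.
STANDARD_WIDTHS = (200, 400, 800, 1600)


def scaled_sizes(width, height, target_widths=STANDARD_WIDTHS):
    """Return (width, height) pairs that keep the aspect ratio without upscaling."""
    sizes = []
    for target in target_widths:
        if target > width:
            # Skip sizes larger than the original to avoid upscaling.
            continue
        sizes.append((target, max(1, round(height * target / width))))
    return sizes


sizes = scaled_sizes(1200, 900)
```

For a 1200 by 900 original, this yields three sizes and skips the 1600-pixel width. In an imaging library, each pair would then be passed to the resize call that produces the stored variant.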
Using AWS, Zillow can deliver a better experience for prospective home buyers and renters, real estate agents, and other site visitors. “By moving to AWS, we no longer have to worry about cache flushes or capacity issues. We have the scalability and performance we need to deliver high-quality real estate images, which is so important to the Zillow user experience,” says Daud. Zillow can scale image downloading and processing to handle varying levels of incoming images throughout the day. And because image downloads from each feed source are now independent, Zillow can take advantage of sources that support high bandwidth and concurrency while also throttling back for sources that do not. Additionally, Amazon S3 gives the company near-unlimited object storage, removing the need to order and install more servers or drives to increase capacity.
Using Amazon CloudFront along with Amazon S3, Zillow is more confident in its imaging system’s performance. “We have a lot more bandwidth than we did before, so we don’t have to even think about it anymore,” says Michal. “And we definitely don’t worry about S3 running out of capacity.”
Zillow has also reduced operating costs by migrating its image processing and delivery system to AWS. “Using Amazon CloudFront, we are paying less than half the amount per month we previously paid for our CDN,” says Michal. “We don’t have to spend money on forklift upgrades to NAS devices anymore.”
The company has increased the availability of its imaging system by using Amazon S3 and Amazon CloudFront. “With S3, we have objects replicated three ways within a region, so even if an availability zone goes down, traffic could still be served to users with no development effort on our side,” says Michal.
Disaster recovery has also improved. “We can definitely capitalize on the geographic distribution of AWS,” says Michal. “There are a number of AWS regions and availability zones in those regions, so we can not only generate dynamic content closer to users, but we can also improve our disaster recovery capabilities.”
Zillow is now more agile when it comes to responding to scalability needs. “I can spin up Amazon EC2 instances any time I want to do a major application version change, or I can create a new Amazon CloudFront distribution with a few clicks,” says Daud. “Overall, we can move faster as a result of AWS.”
The organization also has better visibility into system performance. “We had some image-processing delays, and the initial versions of our cloud apps didn’t expose enough metrics, so we couldn’t figure out which component of the pipeline was causing the delay,” says Michal. “After we started using Amazon CloudWatch to track latencies, we had a much clearer picture of the causes, and we took action to eliminate them.”
Zillow will continue to look for additional opportunities to move services to the cloud. “When we first migrated to AWS, CloudFront was still relatively new, and we thought we were taking a risk,” says Michal. “However, it’s proven itself. In the future, for new projects and services, we will be considering AWS. Our experience has been great.”