BuildFax offers a unique service for the housing industry: a centralized database of building and permit information. Housing professionals can access construction history, contractor records, and inspection information for millions of properties. The BuildFax U.S. Property History Database aggregates this information from municipalities and counties across the country. Customers can then obtain detailed information about a property or structure via the BuildFax Structure PROFILE, which is generated from the company’s website.
BuildFax customers range from individual buyers and home inspectors to some of the largest insurance companies in the world. Regardless of size or industry sector (insurance, construction, environmental consulting, appraisal, real estate analysis, and banking), customers rely on the detailed BuildFax Structure PROFILE to gain insight into a property’s history, or life story. For example, the life story helps lenders validate property values before backing loans, thereby preserving capital and preventing financial loss. Home inspectors might use the BuildFax Structure PROFILE to learn about repairs or construction completed on a property. Having this information in advance enables inspectors to perform more thorough inspections in less time.
Why Amazon Web Services
BuildFax switched to Amazon Web Services (AWS) when its previous infrastructure could no longer handle demand. The AWS solution is a collection of products and services that delivers information to customers quickly, reliably, and securely. Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing, and Amazon Simple Storage Service (Amazon S3) deliver all content to customers, using a series of Web application servers, Solr full-text index servers, and MySQL servers through RightScale. While this particular AWS configuration may not be unique, BuildFax processes incoming building permit data in a somewhat unusual way.
Joe Masters Emison, VP of Research and Development, explains: “Every dataset has to be mapped by a human from the original database layout to the standard layout database; then the mapping and the original data are sent as input to a series of scripts that deliver the data in the standard layout database layout as output. We launch an Amazon EC2 instance for each mapping that we need to process. Most mappings complete running within 2-3 hours, and we have run as many as 100 at once, which would have been a nightmare to schedule without the ability to launch an unlimited number of virtual machines.”
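The mapping step Emison describes can be illustrated with a minimal Python sketch. It assumes a mapping is simply a table from standard field names to source column names. The standard fields, the source column names, and the `apply_mapping` helper are all hypothetical, since the actual BuildFax layout and scripts are not described here.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical standard layout; the real BuildFax schema is not given in the text.
STANDARD_FIELDS = ("permit_number", "street_address", "issue_date", "work_description")


def apply_mapping(
    rows: Iterable[Dict[str, str]], mapping: Dict[str, str]
) -> Iterator[Dict[str, str]]:
    """Yield each source row re-keyed into the standard layout.

    `mapping` goes from standard field name to source column name.
    Standard fields with no mapped source column come out as empty strings.
    """
    unknown = set(mapping) - set(STANDARD_FIELDS)
    if unknown:
        raise ValueError(f"mapping targets unknown standard fields: {sorted(unknown)}")
    for row in rows:
        yield {
            field: row.get(mapping[field], "").strip() if field in mapping else ""
            for field in STANDARD_FIELDS
        }


# One dataset from an imaginary building department, plus its human-written mapping.
source_rows = [{"PERMIT_NO": " B-1001 ", "ADDR": "12 Oak St", "ISSUED": "2009-04-02"}]
county_mapping = {
    "permit_number": "PERMIT_NO",
    "street_address": "ADDR",
    "issue_date": "ISSUED",
}
standardized = list(apply_mapping(source_rows, county_mapping))
```

Because each dataset carries its own mapping and shares no state with the others, a job of this shape can run on its own machine, which is consistent with launching one Amazon EC2 instance per mapping.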
BuildFax delivers all of its data by address, so it is imperative to run address correction. To complicate matters, address information is sometimes quite sparse, perhaps including only a street number, street name, and county, with no city, state, or ZIP code. The company therefore has to cycle through possible values (all ZIP codes in a county, all cities in a county, and so on) and mark possible matches. This process requires running 750 million different address combinations, which would be virtually impossible without Amazon Elastic MapReduce (Amazon EMR). Using up to 80 Amazon EMR instances per dataset, the company keeps runtimes under 3 hours.
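A small Python sketch shows the idea of cycling through possible values for a sparse address. It is an illustration under stated assumptions, not BuildFax's implementation: the lookup tables, the `Candidate` type, and the exact-match check against a set of known addresses are all hypothetical stand-ins for whatever reference data and matching logic the company uses.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Set


class Candidate(NamedTuple):
    number: str
    street: str
    city: str
    zip_code: str


def candidate_addresses(
    number: str,
    street: str,
    county: str,
    cities_by_county: Dict[str, List[str]],
    zips_by_county: Dict[str, List[str]],
) -> List[Candidate]:
    """Expand a sparse address into every city/ZIP combination in its county."""
    cities = cities_by_county.get(county, [])
    zips = zips_by_county.get(county, [])
    return [Candidate(number, street, city, z) for city, z in product(cities, zips)]


def possible_matches(
    candidates: List[Candidate], known_addresses: Set[Candidate]
) -> List[Candidate]:
    """Keep only the candidates that appear in the reference set."""
    return [c for c in candidates if c in known_addresses]


# Made-up reference data for a single imaginary county.
cities_by_county = {"Example County": ["Springfield", "Shelbyville"]}
zips_by_county = {"Example County": ["00001", "00002"]}
known_addresses = {Candidate("12", "Oak St", "Shelbyville", "00002")}

candidates = candidate_addresses(
    "12", "Oak St", "Example County", cities_by_county, zips_by_county
)
matches = possible_matches(candidates, known_addresses)
```

The number of candidates grows as the product of the possible values for each missing field, which is how a modest number of sparse addresses can turn into hundreds of millions of combinations. Each sparse address expands independently of the others, so work of this shape splits naturally across many workers.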
For running mappings and launching Amazon EMR, it's quite easy to have a single set of MySQL servers set up with replication through RightScale, using SSL and MySQL 5.1 and striping over Amazon EBS. However, because the input data comes in many different forms (data files, text, database dumps that are rarely from MySQL, and PDF files), BuildFax needs the ability to process each dataset in an isolated environment. The strategy is similar to the way mappings are processed, in that each "loading" of data files into MySQL gets its own Amazon EC2 instance. The company is able to load about 50% of the data automatically by identifying file types; the other 50% requires manual loading. Both kinds of loading happen on separate instances. To accomplish this effectively, BuildFax uses Amazon S3 to store all incoming data files.
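Routing incoming files by type might look like the following Python sketch. The text says only that identifying file types allows about half of the data to be loaded automatically. The detection rules, the `AUTO_LOADABLE` set, and the function names below are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: delimited files and SQL dumps load automatically, everything else
# goes to a person. The text does not say which types BuildFax automates.
AUTO_LOADABLE = {"csv", "tsv", "sql"}
KNOWN_SUFFIXES = {"csv", "tsv", "sql", "txt"}


def detect_file_type(name: str, head: bytes) -> str:
    """Guess a file type from its leading bytes, then from its extension."""
    if head.startswith(b"%PDF-"):  # PDF files begin with this signature
        return "pdf"
    suffix = Path(name).suffix.lower().lstrip(".")
    return suffix if suffix in KNOWN_SUFFIXES else "unknown"


def route(name: str, head: bytes) -> str:
    """Return 'automatic' or 'manual' for one incoming file."""
    return "automatic" if detect_file_type(name, head) in AUTO_LOADABLE else "manual"


# A few made-up incoming files: (file name, first bytes of the file).
incoming = [
    ("permits.CSV", b"id,address\n"),
    ("scan.dat", b"%PDF-1.4\n"),
    ("notes.txt", b"free-form text"),
]
decisions = {name: route(name, head) for name, head in incoming}
```

Checking the leading bytes before the extension guards against mislabeled files, such as a PDF saved with an unrelated suffix.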
Emison explains that “no one has been able to aggregate building permit data to date because it is in thousands of different file formats stored in thousands of different locations. We have solved the problem of how to generalize and scale the collection and processing of this information, and so we can provide building permit data, which previously has been locked inside building departments, to the world.” BuildFax has solved the aggregation problem, and AWS delivers the results to the world, so that mortgage underwriters, insurers, home inspectors, and appraisers get a much better understanding of what changes have been made to structures over time.