San Francisco-based AlphaSense is a financial technology startup that provides a unique financial search engine. Using proprietary linguistic search technology, the AlphaSense search engine parses topics, concepts, and ideas semantically and finds valuable pieces of investment information from within millions of documents in seconds, all with one simple search. The company’s clients use advanced textual search capabilities within AlphaSense to investigate potential investment ideas, allowing them to find in seconds what would take them hours or days to find with other tools. They conduct company research quickly and efficiently and find information, investment ideas, and themes that others miss.
To provide its clients with in-depth insights into investment possibilities, AlphaSense regularly indexes millions of documents from all over the world. Documents like public company filings and conference call transcripts are indexed at a highly granular level using the company’s semantic tagging technology, which allows AlphaSense users to rapidly narrow and pinpoint specific search results.
With each passing day, more and more content is added to AlphaSense’s database. The company re-indexes its database regularly so that new insights and queries can be applied to older data; it currently stores hundreds of millions of documents. But re-indexing its data store previously took a week or more to complete, due to the depth of natural language processing and language analysis needed and the massive amount of content involved. The computational power and time needed were rapidly becoming cost-prohibitive.
Why Amazon Web Services
AlphaSense began using Amazon Web Services (AWS) in 2010, when it launched its software platform. Although the company considered other cloud providers, AWS was the clear frontrunner for AlphaSense co-founder and CTO Raj Neervannan. “AWS delivers highly scalable, securable, and cost-effective computational resources, and those were huge drivers for me,” he says. “My focus should be on the business and the specific innovations and technologies our customers need, instead of running a data center. AWS provides AlphaSense with a world-class data center using state-of-the-art surveillance and multi-factor access control systems, designed to provide maximum availability with minimum disruptions to operations.”
Having predictable costs and the flexibility to release software updates quickly and easily were two other important drivers. AWS seemed to be the optimal platform for enabling the quick capture and mining of content.
So AlphaSense began using Amazon Elastic Compute Cloud (Amazon EC2) with Amazon Elastic Block Store (Amazon EBS) volumes for search engine storage. Amazon Simple Storage Service (Amazon S3) stores an increasingly weighty data set. Amazon Relational Database Service (Amazon RDS) provides critical high-performance, structured data access, and the company also uses Amazon Elastic MapReduce (Amazon EMR) to speed up search indexing. The company adds to its database every day, using Amazon EC2 Reserved Instances to provide its clients with up-to-the-minute data and insights while keeping costs low.
AlphaSense optimized its re-indexing process by parallelizing the code, which allowed it to deploy high-performance, distributed, and fault-tolerant computing clusters that could use multiple instance types on Amazon EC2 Spot Instances, including small and micro instances. AlphaSense used on the order of 500 Spot Instances at a time to run the indexing job, which completed in less than 2 hours. The company optimized its bid price for Spot Instances by monitoring the price history for those small and micro instances and running processes when it was least expensive to do so. AlphaSense also designed its code so that instances could be restarted without losing overall throughput or processing integrity. This allowed the company to tune its bid price further, driving costs down even more.
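The two ideas in the paragraph above, picking a cheap time window from price history and making work safe to resume after an instance is reclaimed, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `cheapest_start_hour` and `process_pending`, the hourly price samples, and the in-memory `done` set (a real cluster would need a durable, shared record of completed work). It does not call any AWS API and is not AlphaSense's actual code.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical (hour_of_day, price_per_instance_hour) samples; not real AWS data.
PriceSample = Tuple[int, float]


def cheapest_start_hour(history: List[PriceSample], job_hours: int) -> int:
    """Return the start hour whose job_hours-long window has the lowest total
    average price. Windows that include an hour with no samples are skipped."""
    if not history:
        raise ValueError("price history is empty")

    # Average the observed prices for each hour of the day.
    by_hour: Dict[int, List[float]] = {}
    for hour, price in history:
        by_hour.setdefault(hour, []).append(price)
    avg = {hour: sum(prices) / len(prices) for hour, prices in by_hour.items()}

    def window_cost(start: int) -> float:
        hours = [(start + i) % 24 for i in range(job_hours)]
        if any(h not in avg for h in hours):
            return float("inf")  # incomplete data for this window
        return sum(avg[h] for h in hours)

    return min(sorted(avg), key=window_cost)


def process_pending(
    doc_ids: Iterable[str],
    done: Set[str],
    index_doc: Callable[[str], None],
    budget: Optional[int] = None,
) -> int:
    """Index every document not already in `done` and return how many were
    processed in this call. Calling it again after an interruption resumes
    where the previous call stopped, because finished IDs are skipped."""
    processed = 0
    for doc_id in doc_ids:
        if doc_id in done:
            continue  # already indexed by an earlier run
        if budget is not None and processed >= budget:
            break  # stand-in for the instance being reclaimed mid-job
        index_doc(doc_id)
        done.add(doc_id)  # record completion only after the work succeeds
        processed += 1
    return processed


if __name__ == "__main__":
    sample_history = [(0, 0.05), (1, 0.02), (2, 0.02), (3, 0.06)]
    print("start at hour", cheapest_start_hour(sample_history, job_hours=2))

    docs = ["a", "b", "c", "d"]
    finished: Set[str] = set()
    indexed: List[str] = []
    process_pending(docs, finished, indexed.append, budget=2)  # interrupted run
    process_pending(docs, finished, indexed.append)            # resumed run
    print("indexed in order:", indexed)
```

Because each document is marked complete only after it has been indexed, an interrupted run loses at most the document in flight, which is what makes a low bid price tolerable in this kind of design.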
Security is another critical area for AlphaSense. “Information security is extremely important to us,” said Neervannan. “With the fine-grained IAM policies, Amazon VPCs, Server Side Encryption and SSL, we are able to lock down instances, access, and data, and manage the transfer of information from our servers in a very secure manner.”
Now that AlphaSense is using AWS, it can re-process its entire database in two hours rather than a week, for a cost of $80 rather than $5,000. Using Amazon EC2 Spot Instances has helped the company reduce both costs and developer time—engineers no longer have to monitor a weeklong process. “Using Amazon EC2 Spot Instances for batch processing has helped significantly reduce costs, and that gives us a huge advantage,” Neervannan says.
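To put those figures in proportion, the short Python calculation below works out the improvement ratios. The cost and time inputs come from the case study; the implied per-instance-hour rate is only a rough estimate, because it assumes exactly 500 instances running for exactly 2 hours, whereas the text says "on the order of 500" and "less than 2 hours".

```python
HOURS_PER_WEEK = 7 * 24  # the old re-indexing run took "a week or more"


def improvement(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


cost_ratio = improvement(5000, 80)           # dollars per full re-index
time_ratio = improvement(HOURS_PER_WEEK, 2)  # hours per full re-index

# Rough assumption: 500 instances * 2 hours = 1,000 instance-hours.
implied_rate = 80 / (500 * 2)                # dollars per instance-hour

print(f"cost reduced {cost_ratio:.1f}x, time reduced {time_ratio:.0f}x")
print(f"implied rate: about ${implied_rate:.2f} per instance-hour")
```

Under these assumptions the re-index is about 62.5 times cheaper and 84 times faster, at roughly $0.08 per instance-hour.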
AWS has enabled the startup to react quickly and provide real-time updates to its customers. “At any given point in time, our data is as up to date as we can make it. From an aggregate perspective, customers are getting a much more in-depth look at what’s going on—and that’s all enabled by AWS,” Neervannan says.
Neervannan cites flexibility as another benefit. “One of the major advantages is the incredible flexibility we get with AWS,” Neervannan continued. “We can focus on what we do best: software innovation. With AWS, when we deploy new services or large amounts of content, we don’t have to be concerned about breaking the bank or hitting an infrastructure limit. By using AWS, we’ve architected for security, removed many technical and cost boundaries, and enabled our team to focus on building new products for our customers.”
Running on AWS has been a positive experience for both AlphaSense and its investors. “Our CEO and board of directors were excited to hear of the savings in time and expense,” said Neervannan. “And by using AWS, we were able to seamlessly upgrade our infrastructure for secure computing and big data processing.”