Wednesday, February 22, 2012

Financial Clouds and the Embarrassingly Parallel Computing Problem

This past week I had the opportunity to attend a small conference put on by PiCloud called QuantCloud 2012.  For those of you who don't know what a quant is, I would recommend reading Michael Lewis's book The Big Short. Essentially, "quant" is short for quantitative analyst. These people specialize in applying mathematical techniques to financial investment.  Typically, people doing work as quants have PhDs in mathematics. Many of the mortgage-backed instruments that caused the huge economic problems in 2008 were created by quants.  (If you are looking for more information on quants, Mark Joshi wrote an excellent paper titled On Becoming a Quant.)

So I am not a quant (although over the years I have worked with several large banking customers on the hardware they use). What piqued my interest in something I was pretty sure was going to be over my head?  In working with trading firms such as Nasdaq and Bank of New York, I had learned a little bit about how their systems work, and when I heard the conference would cover how quants could take advantage of the cloud, it sparked some interest.

The Typical Environment
 
When you look at organizations that do trading of any type, their data has a few common characteristics.  First of all, the transactions are small; second, there are a lot of them (millions); and third, they are extremely sensitive to latency.  What this typically means is that they are looking for every way to eliminate latency from their systems.  Over the years they have experimented with InfiniBand connectivity, flash drives in servers, running entire 64 GB databases in server RAM, and other tricks just to reduce latency.   In my mind, cloud and trading didn't compute because of the large amounts of data and the extremely low tolerance for latency.

The Big Challenges

What I learned is that these companies that trade in securities have several big problems.  First, they need to create algorithms that run through massive numbers of options and scenarios in order to identify a trend that will give them a financial advantage.  This means that the quants are continuously creating scenarios that crawl through millions of records to come up with a strategy.  Second, in order for a strategy to work, they typically have very little time to create and run the scenarios.  Ken Elkabany, the CEO of PiCloud, probably put it best when he stated in his presentation that quants "have an embarrassingly parallel computing problem".  If they were to run their scenarios on their own IT infrastructure, they might not be able to complete the calculations in time.
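To make the "embarrassingly parallel" point concrete, here is a small sketch of my own (not something shown at the conference): thousands of independent scenario evaluations, none of which needs to talk to any other, so they can be handed out to as many workers as you happen to have. The evaluate_scenario function is a hypothetical toy stand-in for a real pricing or backtest run.

    import random
    from multiprocessing import Pool

    def evaluate_scenario(seed):
        """Toy stand-in for one independent scenario run (hypothetical --
        a real run would crawl through market data, not random numbers)."""
        rng = random.Random(seed)
        pnl = sum(rng.gauss(0, 1) for _ in range(100000))
        return seed, pnl

    if __name__ == "__main__":
        scenarios = range(1000)  # hypothetical number of scenarios
        # No scenario depends on any other, so adding workers (or cloud
        # nodes) speeds the whole batch up almost linearly -- that is
        # what makes the problem "embarrassingly parallel".
        pool = Pool()
        results = pool.map(evaluate_scenario, scenarios)
        pool.close()
        pool.join()
        print("best scenario:", max(results, key=lambda r: r[1]))

The catch, of course, is that the number of local cores in that Pool is fixed, which is exactly the limitation the rest of this post is about.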

There are also some other factors putting pressure on them to look at other ways of getting the results they need:

1.  There is downward pressure on margins as more companies do trading based on quantitative analysis, and a lot of the larger companies are noticing that their income has gone down significantly.
2.  It is becoming harder to find undervalued investments.
3.  Markets are more volatile.
4.  Increased regulation and scrutiny have been applied to what they do, based on the problems that occurred in 2008 with mortgage-backed securities.

What this really means is that the money companies were using to buy the infrastructure for this analysis is becoming harder to get as the big banks and brokerage houses look to trim expenses.

A Solution Using the Cloud and Platform as a Service

This is where the cloud comes in.  As I mentioned before, latency is a concern for these companies, but only when they are trading.  When they are doing analysis of the data, they have "an embarrassingly parallel computing problem": they need to do lots of computations really quickly. The cloud becomes a good option for them because, instead of setting up an environment with thousands of nodes to run the calculations, they can rely on the cloud to provide the cycles and pay only for what they use while the calculations are running.  The underlying infrastructure is managed by the PaaS provider, which could use its own resources or an IaaS vendor for the infrastructure.

There is one problem with this scenario.  If you use the cloud, you have to worry about creating and cloning those environments in the cloud vendor's environment.  This is where PaaS (Platform as a Service) providers come in.  A PaaS provider such as PiCloud writes an interface that abstracts the whole concept of servers away from the programming interface.  In the case of PiCloud, developers change a couple of lines of code in their scripts to call cloud resources instead of local resources, as sketched below.  The environment is already set up by the PaaS provider, so all you have to do is wait for the results.
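To give a rough sense of what "a couple of lines of code" means here, the sketch below takes the local scenario runner from earlier and hands it to PiCloud instead. It is based on PiCloud's Python client as I understood it from the presentation (cloud.map to queue jobs, cloud.result to collect them); treat the exact calls and arguments as an approximation rather than a reference.

    import cloud  # PiCloud's Python client library

    # Local version (from the earlier sketch):
    #   results = pool.map(evaluate_scenario, scenarios)
    # evaluate_scenario and scenarios are assumed to be defined as above.

    # Cloud version: roughly the same one-liner, but the work is shipped
    # off to PiCloud's workers instead of running on local cores.
    jids = cloud.map(evaluate_scenario, scenarios)  # queue the jobs
    results = cloud.result(jids)                    # block until they finish
    print("best scenario:", max(results, key=lambda r: r[1]))

The appeal is that the rest of the script stays the same; the PaaS provider decides how many machines to spin up behind that one call.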

There are some gotchas to this approach if you have to move large chunks of data into the cloud while doing your analysis, and some institutions may have concerns about moving their data out into the cloud, but it provides an interesting option for solving the embarrassingly parallel computing problem.
