We’ve been working on a realtime P/L update service which pushes row updates out to grids displayed on client desktops.  It has been live at a number of clients for a wee while now, and at more and more of them as the upgrade cycles roll along.  The core of the problem boils down to this toy:

  1. On a client desktop, a grid with one row showing price-derived calculations for one ticker.
  2. New prices for the same one ticker arrive at a rate of P per second.
  3. The server sends out new rows at a rate of R per second.
  4. The client (receiving the R rows per second) updates the grid row G times per second.

Of course the client grid wouldn’t really update one row so frequently, so take this with a grain of salt.

Supposing the server is able to update and send a row quicker than the time between price updates, and supposing the client is able to receive and update its grid quicker than the time between arriving rows, then, if we let the server work like a rabid hamster, R = P and G = P.  I.e. if 1000 prices arrived within one second (and none after), we’d send out 1000 rows in that second, and the grid would be updated 1000 times in that second.  Perhaps server and client CPU would each be running at 50% during this time, and the Data Latency (the time between a price arriving at the server and being displayed on the grid) would be pretty small (2/P + network/etc. latency).  At the end of that second, both CPUs would be idle.

Suppose however that CPU on the server is partly in use, so that only 25% is available (instead of the 50% we want).  If we kept to the same policy of outputting a new row for every price that arrives, it would now take us 2 seconds to send all 1000 messages (overall CPU work = “% x Time” being constant), meaning that by the end of it, the last price is 1 second old (latency = 1s) by the time it ends up in the client grid.
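
As a back-of-the-envelope check on those numbers, here is a tiny Python sketch (the rates and CPU figures are just the toy’s assumed values, nothing measured from the real system):

  def drain_time(n_prices, send_rate):
      """Seconds needed to send one row per price at the given send rate."""
      return n_prices / send_rate

  # Unconstrained case: 1000 prices arrive within one second and the server can
  # push 1000 rows/s (the toy's 50%-CPU figure), so rows keep pace with prices.
  full_rate = 1000.0
  print(drain_time(1000, full_rate))          # 1.0 second, no backlog builds up

  # Constrained case: only 25% CPU is available instead of 50%, so the effective
  # send rate halves ("% x Time" of work stays constant).
  constrained_rate = full_rate * 0.25 / 0.50  # 500 rows/s
  total = drain_time(1000, constrained_rate)  # 2.0 seconds to drain the backlog
  print(total - 1.0)                          # last price arrived at t = 1s, so it is 1s old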

Of course you’ll see in our toy that we were silly to try and process all 1000 messages, as they are all for the same ticker and row.  We could probably have processed only 10 of them (every 100th, discarding the rest) and sent only 10 rows during that one second, updating the grid 10 times; the user would probably be perfectly fine with that, and we would have used 1/100th of the CPU from before.

Suppose we implement a policy where we process at most 10 prices per second and discard the older prices.  This means the actual incoming price rate could vary from 10 up to 1000 prices per second (or beyond) and we wouldn’t even care.  Instead of having CPU go up and down at the whim of the incoming price rate, we have restricted CPU to a more comfortable, steady ride.  Effectively we have added a ‘mechanical suspension’ to our execution.

  • Mechanical Suspension: decouple output and resource consumption from variable input rates.

Effectively this means discarding a lot of obsolete prices: each new price overwrites the previous one, and only the latest price is used when the work is actually done.  In our toy example we can go further and add suspension to the row-sending from the server – e.g. no more than 5 row updates broadcast per second.  Then we could add suspension to the messages arriving at the client, and again to the writes to the grid.
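
One way to picture that suspension is a ‘latest value’ cell that newer prices overwrite, drained by a consumer running at a fixed cadence.  A minimal Python sketch (the class and function names are made up, and the hard-coded per-second cadence is purely illustrative):

  import threading
  import time

  class LatestValueCell:
      """Holds only the most recent price; older, obsolete prices are overwritten."""
      def __init__(self):
          self._lock = threading.Lock()
          self._value = None

      def put(self, value):
          with self._lock:
              self._value = value          # discard whatever was there before

      def take(self):
          with self._lock:
              value, self._value = self._value, None
              return value

  def suspended_consumer(cell, handle, max_per_second, stop_event):
      """Drain the cell at a fixed cadence, however fast prices arrive."""
      interval = 1.0 / max_per_second
      while not stop_event.is_set():
          price = cell.take()
          if price is not None:
              handle(price)                # e.g. recalculate the row and send it
          time.sleep(interval)             # CPU now tracks the cadence, not the price rate

However hard put() is hammered, handle() runs at most max_per_second times a second and always sees the freshest price.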

So, if “||” means suspension, we have:

  • Prices || Server Price Execution || Server Row Sends || Client Row Msgs || Grid Updates

So the rates (and CPU/resource consumption) of things on the right are insulated from fluctuations in the rates of things on the left, ensuring a smoother ride.
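
Continuing the sketch from above (reusing LatestValueCell and suspended_consumer, with invented per-stage rates and stub handlers), each “||” becomes a cell drained at its own cadence:

  price_cell, row_cell, grid_cell = LatestValueCell(), LatestValueCell(), LatestValueCell()
  stop = threading.Event()

  def recalc_row(price):                   # server: price -> recalculated P/L row
      row_cell.put({"ticker": "ABC", "px": price})

  def send_row(row):                       # server: row -> network (stubbed as a direct put)
      grid_cell.put(row)

  def paint_grid(row):                     # client: row -> grid repaint
      print("grid shows:", row)

  for cell, handler, rate in [(price_cell, recalc_row, 10),
                              (row_cell, send_row, 5),
                              (grid_cell, paint_grid, 5)]:
      threading.Thread(target=suspended_consumer,
                       args=(cell, handler, rate, stop),
                       daemon=True).start()

  for i in range(1000):                    # hammer the front of the pipeline...
      price_cell.put(100 + i / 100.0)
  time.sleep(2)                            # ...and the grid still repaints only a handful of times
  stop.set()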

The two critical measures you want to monitor in this kind of push scenario are CPU (or whatever the bottleneck resources are) and Latency – how long it takes a price change to get through the system.  A liberal use of performance counters throughout will turn the black box inside out.  Take advantage of points of configurability to allow the CPU vs. Latency tradeoff to be optimized on a case-by-case basis.  The suspension also means the system is robust under periods of low CPU availability: the end result is only higher latency for the duration of the resource downtime, not ever-growing backlogged queues with lots of work to do when CPU comes back to life.
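
For the latency half of that monitoring, a minimal sketch (assuming each price is stamped with an arrival time as it enters the server; the counter class here is a stand-in for a real performance counter):

  import time

  class LatencyCounter:
      """Toy performance counter tracking the latest and worst data latency observed."""
      def __init__(self):
          self.last = 0.0
          self.worst = 0.0

      def record(self, arrival_time):
          latency = time.monotonic() - arrival_time
          self.last = latency
          self.worst = max(self.worst, latency)

  data_latency = LatencyCounter()

  price = {"px": 101.5, "arrived": time.monotonic()}   # stamp at the first hop
  # ... the price flows through the suspended stages ...
  data_latency.record(price["arrived"])                # record at the last hop (the grid update)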

In the example I described imposing specific per-second rates at various stages – this is only for illustration; we did not use such specific limits.  How you want to implement suspension depends on what you want your threads to be doing and how you want to allocate them to work.  But that is another story.