Speed metric for Backtesting software: CPU clocks per bar

Posted: Sat Jun 23, 2007 9:31 am
by Mark Johnson
I would like to suggest a way to compare the speed of different backtesting programs. (How much faster is Blox than System Writer Plus? And so forth.) My approach attempts to factor out the raw hardware speed of the computers used to run the tests, so you can run the different backtesting programs on different computers if that is more convenient.

The figure of merit I propose is
  • CPU clock ticks per bar
where a "bar" is a day (if using end-of-day data), or an hour (if using hourly-bar data), or a minute (if using 1-minute-bar data). Very fast and efficient software will use fewer CPU clock ticks per bar, while sluggish and inefficient software will use more CPU clock ticks per bar.

The fastest imaginable software would somehow manage to process all the rules of your system (Portfolio Manager, Entries, Exits, Money Manager, and statistics) in one single CPU instruction that took just 1 CPU clock tick to execute ("the ultimate CISC", a bit of an inside joke). So the theoretical limit, the best "CPU clocks per bar" imaginable, is 1.0 (or 1/N if you can bring yourself to imagine an N-core ultimate CISC CPU). Actual real-life CPUs running actual real-life software will need far more CPU clocks per bar, because it takes quite a lot of atomic CPU instructions to accomplish all the work of running a system.

I recommend running approximately the same trading system on all backtesters that you're measuring and comparing, so they are all doing approximately the same amount of "work". The triple EMA crossover system is a decent choice here.

I also recommend running an optimization that steps parameters through several values, so that the entire run takes at least ten minutes. (That way, if you make a one- or two-second error in timing the run, its effect will be negligible.)
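To put a number on "negligible": the metric is proportional to the measured runtime, so a stopwatch error carries straight through as the same percentage error in clocks per bar. A minimal sketch, assuming a two-second stopwatch error (the function name is just for illustration):

```python
def timing_error_percent(error_seconds, runtime_seconds):
    """Percentage error in the metric caused by a stopwatch error."""
    return 100.0 * error_seconds / runtime_seconds

# Two-second error on a ten-minute run versus a ten-second run.
long_run = timing_error_percent(2, 10 * 60)
short_run = timing_error_percent(2, 10)

print(f"ten-minute run: {long_run:.2f}% error")
print(f"ten-second run: {short_run:.2f}% error")
```

On the ten-minute run the error is about a third of one percent; on a ten-second run the same slip would be 20 percent.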

I did the measurement just now on Blox Builder 2.1.21 on my Latitude D820 laptop, which is a dual core machine running at 1.95 GHz, with 2.0 gigabytes of RAM. Here are the data:
  • portfolio of 30 markets
  • 11.0 years tested
  • 82636 daily bars (wrote a little Update Indicators blok that counted them)
  • manual double-check: 30 mkts * 11 years * approx 250 days/year = 82500, which agrees with the actual count very well
  • Triple MA system
  • Optimization run that included 900 individual tests
  • Stopwatch measured runtime of the optimization run: 17 minutes + 47 seconds
The calculations are pretty simple:
Total # seconds = 47 + (17*60) = 1067 seconds
Total # Clock ticks = (1067 seconds) * (1.95 billion clocks per sec) = 2081 billion ticks
Total # bars = (82636 bars/test) * (900 tests) = 74,372,400 bars
Clock ticks per bar = (2081 billion ticks) / (74.372 million bars) = 27,980 clock ticks per bar
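
For anyone who wants to repeat the arithmetic on their own measurements, here is a minimal Python sketch of the same calculation. The function and variable names are my own choices for illustration; the input numbers are the ones from the run above.

```python
def clock_ticks_per_bar(runtime_seconds, clock_hz, bars_per_test, num_tests):
    """CPU clock ticks per bar for one optimization run.

    The number of cores is deliberately not an input.
    """
    total_ticks = runtime_seconds * clock_hz
    total_bars = bars_per_test * num_tests
    return total_ticks / total_bars

# Measurements from the Blox run described above.
runtime_seconds = 17 * 60 + 47      # stopwatch: 17 min 47 s
clock_hz = 1.95e9                   # 1.95 GHz
bars_per_test = 82636               # counted daily bars
num_tests = 900                     # tests in the optimization run

# Sanity check on the bar count: 30 markets * 11 years * ~250 days/year.
estimated_bars = 30 * 11 * 250

ticks = clock_ticks_per_bar(runtime_seconds, clock_hz, bars_per_test, num_tests)
print(f"estimated bars per test: {estimated_bars}")
print(f"clock ticks per bar: {ticks:,.0f}")
```

Without rounding the intermediate values this comes out at about 27,976, a few ticks below the 27,980 above, which used the rounded figures of 2081 billion ticks and 74.372 million bars.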

In summary, Blox takes about twenty-eight thousand clock ticks to process each bar of data when running the Triple MA system. It's a simple way to measure software speed, giving results in intuitive units that are easy to remember: 28K clocks per bar, bada boom.

Posted: Sat Jun 23, 2007 1:21 pm
by RedRock
Things that make ya go - hmm/\/\/\

A caveat could be the quality of the results, in scope and (presumed) accuracy. Pure speed that yields fewer meaningful measures is not necessarily better. So, quality of test results should be factored in as well. No?

Posted: Tue Jun 26, 2007 7:08 am
by mojojojo
You have to be careful with multiple-core systems. You can't assume that all cores are actively processing data for the tests. There have been benchmark tests showing that multiple-core systems vary significantly in how many cores are effectively used. There was a recent article in Linux Magazine that tested an 8-core system. I know that most people here won't be using that, but it's a good example. The effective cores in the test ranged from 3 to almost 8 (they averaged the results of multiple runs), depending on the test used.

Posted: Tue Jun 26, 2007 8:25 am
by Mark Johnson
mojojojo wrote: You have to be careful with multiple-core systems. You can't assume that all cores are actively processing data for the tests.
Absolutely true (at today's state of the software art, anyway). That's why I made the intentional choice to leave the number of CPU cores out of the speed metric calculations. If software AAA were able to use all eight cores but software BBB only used one core, this metric assigns an 8X benefit to software AAA. Deliberately!
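
A small sketch of that 8X effect, using made-up numbers: the clock speed, bar count, and runtimes below are hypothetical, and software AAA is assumed to scale perfectly across eight cores, which real programs rarely do. The point is only that the metric uses wall-clock time and never divides by the core count.

```python
def ticks_per_bar_wall_clock(wall_seconds, clock_hz, total_bars):
    """Clock ticks per bar from wall-clock time; core count is ignored."""
    return wall_seconds * clock_hz / total_bars

# Hypothetical test setup, identical for both programs.
clock_hz = 2.0e9            # assumed 2 GHz, 8-core machine
total_bars = 1_000_000      # assumed amount of work

# Hypothetical runtimes: BBB uses one core, AAA keeps all eight busy.
bbb_seconds = 80.0
aaa_seconds = bbb_seconds / 8   # assumes perfect 8-way scaling

bbb = ticks_per_bar_wall_clock(bbb_seconds, clock_hz, total_bars)
aaa = ticks_per_bar_wall_clock(aaa_seconds, clock_hz, total_bars)

print(f"BBB: {bbb:,.0f} clocks per bar")
print(f"AAA: {aaa:,.0f} clocks per bar")
print(f"AAA advantage: {bbb / aaa:.1f}X")
```

With the partial scaling mojojojo mentions (say 3 effective cores out of 8), the advantage would shrink accordingly, and the metric would report that too.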

Today, it is the application programmer who determines how well or how poorly a backtesting simulator exploits a multicore CPU. So, today, I think The Right Thing To Do is credit this speedup to the application.

Eventually, some day (long after C.A.R. Hoare and the other multiprocessor pioneers are dead and buried), software environments will let applications automatically and transparently exploit hardware parallelism, with no extra "help" from the application programmer. When that happy day arrives, this speed metric will become obsolete. Until then, I recommend that we give credit to the application if it is able to keep many cores busy at one time.