C++ Platform Design - Speed vs. Complexity

Discussions about custom-built testing platforms written in C, C++, or Java.
Post Reply
Lquestfree
Full Member
Full Member
Posts: 15
Joined: Sun Apr 20, 2003 4:10 pm

Post by Lquestfree » Mon May 12, 2003 9:36 am

Forum Mgmnt wrote:The reason I ended up coding my own testing platform in C++ is that I hate to have limitations imposed on my testing. I also like to have really fast tools. I don't like to wait; I want testing to be fairly interactive, not test, wait, wait, wait, think of new test, test, wait, wait...

Hi Forum Mgmnt,

I fully agree with what you said in the above post. It would be terrible to 'paint oneself into a corner' from the outset by using a 'crippled' testing program.

I am currently in the design phase of a C++ trading test platform, but I am not exactly an experienced developer when it comes to rich-client, performance-driven programs. Do you have any suggestions for the initial design of a high-powered testing engine? The major issue I'm confronting right now is whether to go for a clean OO design where all the trading concepts are expressed as objects (e.g. candle, trade, portfolio, indicator, etc.) OR a procedural approach using a mix of multi-dimensional arrays, pointers, etc. There seems to be a tension between clarity of code and maintainability vs. raw speed. Any tips or suggestions with regard to designing for performance would be greatly appreciated. Thanks.

J

PS. Great forum with high signal to noise!

Forum Mgmnt
Roundtable Knight
Roundtable Knight
Posts: 1842
Joined: Tue Apr 15, 2003 11:02 am
Contact:

Post by Forum Mgmnt » Mon May 12, 2003 11:10 am

I don't have the conventional perspective when it comes to the "Speed versus Elegant Design" debate. Most people think there is a clear tradeoff here. I don't.

As a practical matter, speed is usually a matter of a few routines here and there being coded well. Most of the time, speed problems come from poorly coded algorithms.

I always try to come up with an optimal design first. You can optimize the pieces later if they end up being slow. Don't assume something will be slow unless you've had past experience with it.

I don't find that the object-oriented approach slows me down compared to an equivalent procedural approach. Since I can understand the object-oriented design better, it is more malleable, so I can more easily implement speed optimizations if they become necessary. A complex, difficult-to-change program won't be faster over time if you can't easily change it.

One of the keys is to know where to draw the line between objects and more conventional data structures. For example, I use an object for each instrument (stock, commodity, etc.). This object is responsible for all the data associated with an instrument. I don't use objects for each day of data associated with that object. I use a dynamically allocated array of more conventional structs.
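That split between objects and plain structs might look something like this sketch (the class and member names are my own illustration, not VeriTrader's actual code):

```cpp
#include <string>
#include <vector>

// Plain struct for one day of data: cheap to allocate in bulk,
// no per-bar object overhead.
struct Bar {
    float open, high, low, close;
    float volume;
};

// One object per instrument; it owns a contiguous array of bars.
class Instrument {
public:
    explicit Instrument(std::string symbol) : symbol_(std::move(symbol)) {}

    void addBar(const Bar& bar) { bars_.push_back(bar); }
    const Bar& bar(std::size_t day) const { return bars_[day]; }
    std::size_t barCount() const { return bars_.size(); }
    const std::string& symbol() const { return symbol_; }

private:
    std::string symbol_;
    std::vector<Bar> bars_;  // contiguous, cache-friendly storage
};
```

The object gives you a clean interface at the instrument level, while the per-day data stays a flat array with no per-element overhead.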

One of the greatest speedups is loading the entire set of daily data for all instruments into memory before doing any testing. It is much faster to read an entire file sequentially than to read it in small chunks. The disk is the greatest bottleneck in a computer system.
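A sketch of that load-everything-up-front idea, assuming a hypothetical binary file format that is simply a packed array of bar records:

```cpp
#include <cstdio>
#include <vector>

struct Bar {
    float open, high, low, close;
    float volume;
};

// Read an entire binary file of Bar records in one sequential pass,
// rather than seeking and reading bar by bar.
std::vector<Bar> loadAllBars(const char* path) {
    std::vector<Bar> bars;
    FILE* f = std::fopen(path, "rb");
    if (!f) return bars;

    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    bars.resize(static_cast<std::size_t>(size) / sizeof(Bar));
    // One bulk read; resize down if the file was shorter than expected.
    std::size_t read = std::fread(bars.data(), sizeof(Bar), bars.size(), f);
    bars.resize(read);
    std::fclose(f);
    return bars;
}
```

One `fread` of the whole file lets the OS stream the data sequentially, which is where the speedup over small-chunk reads comes from.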

I recommend you go with the clean design. You can add speed optimizations to a clean design. It is really hard to improve the design of a hard to understand and manage code base.

- Forum Mgmnt

P.S. It would require a book to adequately address this subject. There are a few okay books out there, but nothing really great.

Lquestfree
Full Member
Full Member
Posts: 15
Joined: Sun Apr 20, 2003 4:10 pm

Post by Lquestfree » Tue May 13, 2003 11:10 pm

Thank you, Forum Mgmnt, for the sound advice. I fully agree that being too speed-obsessed during the design phase is actually counterproductive. I'll go for the clean design, then run actual performance benchmarks to find the bottlenecks, and then tweak those areas.

Forum Mgmnt
Roundtable Knight
Roundtable Knight
Posts: 1842
Joined: Tue Apr 15, 2003 11:02 am
Contact:

Post by Forum Mgmnt » Wed May 14, 2003 9:04 am

Yes, you've got the right approach. Design first for design's sake: don't implement designs you know will perform badly, but don't over-optimize for performance in the design either.

Get things running and then figure out what is slow. If you have a good design you can change most anything pretty easily.

I remember having this point driven home by one of the engineers who worked for me at Borealis, Chris. We were discussing a part of the database subsystem that would be used all the time and could be a performance bottleneck if it was slow. I was proposing a more complicated design, and Chris wanted a simpler but slower one.

When Chris suggested that we implement his suggestion and if it was not fast enough we could then do mine, I agreed. Since the overall system was well-designed, the change in design would be isolated to the code in question and did not affect the interface used by the rest of the system.

It ended up not being a bottleneck at all. His design took a day or two, mine would have taken two weeks. Chris' design was better for the purpose.

Mine would have been 10 to 20 times faster but that time was much better spent working on something that really needed to be sped up.

TradingCoach
Roundtable Knight
Roundtable Knight
Posts: 176
Joined: Thu Apr 17, 2003 9:52 am
Location: Sacramento, CA
Contact:

c++ versus vb

Post by TradingCoach » Wed May 14, 2003 10:56 am

Forum Mgmnt,
I will first admit that I am not a Windows programmer, so asking questions here is not a problem for me. Without prying into the secrets of VeriTrader, I'd like to ask what backend database or file management VeriTrader uses. I imagine it has some hash/flat-file mechanism to manage positions or portfolios (not a relational database).
Additionally, what limitations do you see in using VB instead of VC++?
My understanding is that you can do everything in VB that you can do in C++...
The application could be 80 percent VB for the UI, with the speed-intensive parts in C/C++. Where you would gain speed is in the design of the backend file/record management (where using a relational db can slow you down).
-Andras

TradingCoach
Roundtable Knight
Roundtable Knight
Posts: 176
Joined: Thu Apr 17, 2003 9:52 am
Location: Sacramento, CA
Contact:

I only would use a db for reporting and statistical output

Post by TradingCoach » Wed May 14, 2003 11:50 am

like Crystal Reports. I agree that at run time an RDBMS would only slow you down, but when you want to make sense of the results, a db and some slick reporting tools come in handy.
Also, if you run multiple tests and want to save results from one to the next, I like the old-fashioned data (buy/sell stats, portfolio stats, etc.).

Forum Mgmnt
Roundtable Knight
Roundtable Knight
Posts: 1842
Joined: Tue Apr 15, 2003 11:02 am
Contact:

Post by Forum Mgmnt » Wed May 14, 2003 1:33 pm

Hiramhon, Andras,

Using a relational database for storing anything to do with trading would take several orders of magnitude more time than using memory, which is what VeriTrader does.

We generally store about 24 bytes per daily data bar per market. We read them all into memory and then process each day's trading just as a real trader and real broker would. Only totals are retained in memory; the rest is output to the results files (trade log, equity log, etc.) in real time, so there is no storage overhead. Other information (i.e. about open positions, risk, etc.) is kept in memory as needed, so the storage overhead is minimal.
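As an illustration of the 24-bytes-per-bar figure, a layout like the following comes to exactly 24 bytes on typical platforms (the field choices here are my assumption, not VeriTrader's actual format):

```cpp
#include <cstddef>
#include <cstdint>

// One daily bar in 24 bytes (illustrative layout): a 4-byte date plus
// four 4-byte prices plus a 4-byte volume, all 4-byte aligned, so no
// padding is inserted.
struct DailyBar {
    std::int32_t date;   // e.g. 20030514 as YYYYMMDD
    float open;
    float high;
    float low;
    float close;
    float volume;
};

// Memory needed to hold a whole data set: e.g. ~2,500 trading days
// (10 years) for 1,000 markets is 2500 * 1000 * 24 = 60 MB.
constexpr std::size_t bytesFor(std::size_t days, std::size_t markets) {
    return days * markets * sizeof(DailyBar);
}
```

At 60 MB for a thousand markets over ten years, the whole set fits comfortably in RAM on the hardware discussed here, which is what makes the load-everything approach practical.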

If you are trading thousands of stocks, keeping this information around would require several Gigs of RAM.

This makes testing very fast. For example, on one of our machines, a 1.7 GHz AMD Athlon with 1 GB of RAM, it takes 1.5 seconds to read our basket of 22 "Turtle Futures" and 11 seconds to run a 10-year test. The same test on one of the 2.2 GHz servers takes about the same time. This makes it practical to test thousands of different parameter combinations every day.

Over time, taking summary information and placing it into an RDBMS does make sense, but the information I would find useful is information about the tests themselves, rather than the data for the futures, the trades, or the equity for a particular test run.

This would let one examine the various tests and results run over a longer period of time. Even here, I tend to run many, many tests that find nothing of value and a few that find new, interesting ideas and concepts. I can usually keep these in my head or in another storage mechanism, like a textual description in a Microsoft Word document.
Last edited by Forum Mgmnt on Wed May 14, 2003 9:39 pm, edited 1 time in total.

Forum Mgmnt
Roundtable Knight
Roundtable Knight
Posts: 1842
Joined: Tue Apr 15, 2003 11:02 am
Contact:

Post by Forum Mgmnt » Wed May 14, 2003 9:37 pm

Yep, that was the math we did when we looked at the design.

As computers have gotten faster and faster, the spread between the speed of memory and disk access has widened considerably.

Disks have more than 5,000 times the capacity, but some of the performance characteristics (access times, platter rotation, etc.) have improved only 10 to 20 times.

Computer performance has improved 1,000 to 5,000 times over the last 20 years, versus that 10-to-20-times improvement for disk speed. Disk speed is still measured in thousandths of a second; microprocessor and memory speeds in millionths or billionths. So it is even more important now to avoid using the disk whenever possible.

Back when I was first writing these harnesses, it used to take hours to run a test I can now do in less than 15 seconds. We were happy to get our first 5 MB hard drive because it let us run tests 20 times faster, though a single test would still take hours.

The fun part comes when you do multiple tests using different stepped parameter values. Since all the data is loaded into memory once, you only need to do the trading computations, not the data loading, for each test after the first one.
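That load-once, test-many pattern could be sketched like this (the trivial breakout count is a hypothetical stand-in for a full system test; the point is that every parameter value reuses the bars already in memory):

```cpp
#include <vector>

struct Bar {
    float open, high, low, close, volume;
};

// Hypothetical test: count closes above the prior N-day high.
// Stands in for a real system test; it only reads already-loaded bars.
int runTest(const std::vector<Bar>& bars, int lookback) {
    int signals = 0;
    for (std::size_t i = lookback; i < bars.size(); ++i) {
        float highest = bars[i - 1].high;
        for (int k = 2; k <= lookback; ++k)
            if (bars[i - k].high > highest) highest = bars[i - k].high;
        if (bars[i].close > highest) ++signals;  // breakout above N-day high
    }
    return signals;
}

// Load the data once, then sweep a stepped parameter against the same
// in-memory bars: no disk access after the first test.
std::vector<int> sweep(const std::vector<Bar>& bars, int from, int to) {
    std::vector<int> results;
    for (int lookback = from; lookback <= to; ++lookback)
        results.push_back(runTest(bars, lookback));
    return results;
}
```

Each additional parameter value costs only the trading computation, which is why stepped-parameter runs get dramatically cheaper after the first test.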

We'll be implementing a distributed system fairly soon so that tests can be uploaded to run on a remote server: you just submit a test to the server, and the results are returned when the test is finished. This way the local machine is used only for analysis and interpretation of the results.

Kvadrik
Full Member
Full Member
Posts: 15
Joined: Sat Jun 19, 2004 3:08 am
Location: USA - Israel
Contact:

Post by Kvadrik » Mon Aug 02, 2004 8:38 am

Forum Mgmnt wrote: I also like to have really fast tools. I don't like to wait; I want testing to be fairly interactive, not test, wait, wait, wait, think of new test, test, wait, wait...

Dear Forum Mgmnt,

You are absolutely right. I use a C++ strategy-oriented API developed in Strategy Tuner System. For technical information, see the Strategy Runner API C++ Programming Reference: Callbacks, Trading Functions, Alert Functions, and others.

chrisuu
Contributor
Contributor
Posts: 4
Joined: Thu Nov 04, 2004 6:33 pm

Other approaches to custom platforms

Post by chrisuu » Thu Nov 11, 2004 11:30 am

I'm in the early stages of developing a platform suitable for applying the Turtle rules (both a dry run for rules testing and eventual real use). I'm using a SQL (MySQL) database to retain inventory, instrument details, market data (OHLC and derived values as a time series), and currency conversion rates. I'm using Python both for original code and to implement logic presented in other languages (e.g. C or FORTRAN). For scratch work I'm using the OpenOffice spreadsheet.

I'm not sure the issue of speed makes too much sense in determining the choice of certain tools, and especially the choice of programming language. The typical modern scripting language, once compiled into the runtime system's native byte code isn't all that slow, and often ends up leaning on the same system libraries that code compiled from C or C++ into native code would too. The database is nice because there already exists an API to facilitate addressing arbitrary bits of data, and an appropriately indexed database makes accessing arbitrary parts of your data store with somewhat novel queries very fast.

Optimizing the platform's software for speed to accelerate the development and testing cycle may be somewhat misguided, in that a language like Python, with its impressive brevity and lack of a compile-link cycle, means less turn-around time associated with implementing the incremental changes to be tested. It's possible to link software compiled from C/C++ to Python code, so even where extreme optimization is required, that ability isn't lost with a "scripting language."

One other consideration which has guided my choice of tools is portability: everything I've done thus far could be moved from the Windows XP system I'm working on now to a Macintosh or a UNIX/Linux machine with absolutely no porting effort, except for perhaps correcting line termination characters in a few text files.

ksberg
Roundtable Knight
Roundtable Knight
Posts: 208
Joined: Fri Jan 23, 2004 1:39 am
Location: San Diego

Re: Other approaches to custom platforms

Post by ksberg » Fri Nov 12, 2004 3:45 am

chrisuu wrote: ... I'm not sure the issue of speed makes too much sense in determining the choice of certain tools, and especially the choice of programming language. ...
The issue of speed and choice of tools are always relative to what you're trying to do. In this particular forum thread, we have at least the following: (a) developing the trading platform, (b) developing trading code for the trading platform, (c) developing trading ideas through experimentation, and (d) executing trading signals via the platform. Choice of language can affect speed of (a)(b) and even (c). Most modern languages and tools are probably up to the task of efficiently delivering (d) given a decent design. For those of us spending time on building our own platforms ...

Code:

Personal Time = W1*(a) + W2*(b) + W3*(c) + W4*(d)
Where a, b, c, and d are the activities mentioned above and W1, W2, W3, W4 are time-weighted factors in terms of productivity.

In the above formula, the point made about Python is that W1 is small in comparison to, say, C++. I really do believe this is something not to be overlooked. I measured my own W1 for Smalltalk, C++, and Java, being equally versed in all three. My personal productivity in Smalltalk was 10x that of C++ (and at the time I was doing mainline C++ code for ORB internals, cross-compiling to 10 different platforms, so it's fair to say I knew the ins and outs of the language and compilers extremely well). Around that time, my W1 for Java was about 2-3x that of C++. My current measures have changed slightly, mostly due to OSS and readily integrated libraries. This gives Java a greater lead than in the past; I'd say it is currently gaining on the productivity I experienced with Smalltalk (and with Eclipse I get no compile/edit/link cycle).

Another case in point for libraries: last night I went looking for convolution code, specifically something that would drive a kernel for optimization smoothing. Lo and behold, it's supported natively in Java 2D. So I can implement my trading convolution routines easily in Java and get OpenGL hardware acceleration for free! How long would it take me to do the same thing in Python? Smalltalk? C++? (Answer: way too long, especially for accelerated hardware.)

Chris, I would also ask why you believe you need a database. Is this an ODS? Datacube? Transactional store? The db can help or hinder depending on what type of analysis you're doing. Much of my analysis depends on a very dynamic environment (position sizing, equity models, risk models, in-place optimization) and is the result of running different tests, not examining static data, which implies I want to minimize both W2 and W3. W2 addresses how quickly I can substitute one idea for another, or insert a different approach. With W3, I'm looking to exercise that dynamic environment through iteration and analysis. For this the platform itself needs to be fast and flexible. I find that database access isn't going to cut it against hundreds of thousands of test runs. So while W1 (development time factor) is important, W2 (trading code factor) and W3 (trading ideas and experimentation factor) are very important to me as well. (BTW: thanks for the tuning inspiration, Forum Mgmnt; the platform now clocks as high as 22 portfolios/second, 10 yrs, 10 mkts, IBM T41p Centrino.)

It all comes back to this: the choice of tools depends on what you want to do. And lowering complexity, to me, means lowering all the factors that multiply my time. I can't say what your holistic equation or time factors are, but you might want to consider them as part of your selection process.

Cheers,

Kevin

verec
Roundtable Knight
Roundtable Knight
Posts: 162
Joined: Mon Jun 09, 2003 7:04 pm
Location: London, UK
Contact:

Post by verec » Sat Nov 13, 2004 8:46 am

Ksberg's points on time are very convincing.

There is another dimension that he alluded to that must not be overlooked either: the level of abstraction.

This has nothing to do with time, but with what it is that you will be able to achieve using language/tools/libraries X, Y, or Z.

I started C++ more than 15 years ago, and I wouldn't consider it for new projects. The reason is that, very often, you find yourself in the situation of designing a power plant to supply current to the drill that will allow you to dig the hole in which to plug the screw ... all that because the problem at hand was to hang a painting on the wall! :!:

To me C++ looks like a failed attempt to play with the grown-ups: "we have the efficiency (C), they have the power (Smalltalk), let's have both (C++)". Well, not quite...

For any new project, I would consider only high-level languages/tools/libraries (Java/Python/Ruby/Smalltalk/CLOS/...), and only recode in C -- or in assembly if I have to! -- the bits in ksberg's (d) category, where execution speed really matters.

Human languages help express ideas. Computer languages are supposed to help you make them work. The bigger the gap between the two, the harder it becomes to really achieve that. As the disconnect grows between what you want to do and how those languages force you to do it, you start losing sight of your initial goals, get side-tracked into irrelevant issues (why, oh why, would you still need to deal with memory allocation by hand in the 21st century?), and, even more importantly, can't deal with higher-level concepts because they overwhelm your seven-ideas-at-a-time human brain.

If you want to "stand on the shoulders of a giant", be sure to pick the right giant to start with :)

ksberg
Roundtable Knight
Roundtable Knight
Posts: 208
Joined: Fri Jan 23, 2004 1:39 am
Location: San Diego

Building power plants

Post by ksberg » Sat Nov 13, 2004 11:30 am

verec wrote:There is another dimension that he alluded to that must not be overlooked either: the level of abstraction.

This has nothing to do with time, but with what it is that you will be able to achieve using language/tools/libraries X, Y, or Z.

I started C++ more than 15 years ago, and I wouldn't consider it for new projects. The reason is that, very often, you find yourself in the situation of designing a power plant to supply current to the drill that will allow you to dig the hole in which to plug the screw ... all that because the problem at hand was to hang a painting on the wall! :!:
ROFLMAO! Good one! :lol:

I think you hit the nail on the head. All the W time factors I mention are influenced by complexity and abstraction. It's also why we have coding platforms like TradeStation EasyLanguage and VeriTrader Basic in the first place. These are designed precisely so you don't have to go build the power plant!

Which brings me back to the question: what is it you want to do? I think a custom platform is useful when it enables you to do what the commercial solutions do not. Otherwise it is much, much cheaper to buy than to build. Thing is, when you discover something beyond the bounds of commercial offerings, how much effort does it take to bring a custom-coded effort to an equivalent usable level? Lots! The bar is very high these days. So most often it makes sense to supplement what the commercial packages do. Even then you'll want some base functionality in common with that platform.

At any rate, if you have chosen to build a custom platform, it pays to assess what you want to do, the availability of components that actually help you get where you're going, the complexity and level of abstraction offered by the language, and the potential time impacts. Rarely is it just a matter of choosing the language.

Cheers,

Kevin

chrisuu
Contributor
Contributor
Posts: 4
Joined: Thu Nov 04, 2004 6:33 pm

Post by chrisuu » Sat Nov 13, 2004 5:37 pm

Regarding the use of a SQL database over ASCII data files:

I view the use of an SQL database as a pre-optimization decision. So far, database transactions haven't been a time problem. The biggest advantage in a database so far has been that I don't have to deal with code to manage and manipulate data files. Another advantage has been that my code can't seriously damage my data set, as each transaction can be reversed.

The largest time problem I've encountered so far has been with some experimental code to calculate the fractal correlation dimension from daily and weekly closing prices. If this module proves useful, it will probably be rewritten in C.

Question: How do you deal with the problem of continuity in stocks? Do the commercial systems handle this problem well?

For example, besides the simple problem of survivorship (sometimes a security simply ceases to exist), there is the problem of a short-term symbol name change (e.g. while delisting is pending or following a reverse stock split), and there is the problem of one stock being converted to x shares of another security.

My present approach to market data management and security definitions is such that my table of security definitions reflects the current market, and my market data table reflects each historic day as it was then. To be able to walk forward through time for simulation purposes, I imagine I will need to have something like a "continuity table" -- one that describes a translation of one security into another beginning with a specified date. For example, a record that states "effective 12-Nov-04, NASDAQ common stock CTTY is known as CTTYD", or "effective 12-Nov-04, NYSE common stock FOO is converted to 1.25 shares of BAR."
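A minimal version of that continuity table could be sketched in C++ like this (the record fields and the resolve logic are illustrative; the symbols come from the examples above):

```cpp
#include <map>
#include <string>
#include <utility>

// One continuity record: effective the given date, the old symbol
// becomes newSymbol at the given share ratio (1.0 for a plain rename).
struct ContinuityRecord {
    int effectiveDate;      // YYYYMMDD
    std::string newSymbol;
    double shareRatio;      // e.g. 1.25 shares of newSymbol per old share
};

// Resolve a symbol as of a simulation date, following chains of
// renames/conversions and accumulating the share ratio.
std::pair<std::string, double>
resolve(const std::map<std::string, ContinuityRecord>& table,
        std::string symbol, int asOfDate) {
    double ratio = 1.0;
    auto it = table.find(symbol);
    while (it != table.end() && it->second.effectiveDate <= asOfDate) {
        ratio *= it->second.shareRatio;
        symbol = it->second.newSymbol;
        it = table.find(symbol);
    }
    return {symbol, ratio};
}
```

Walking forward in the simulation, each position's symbol is resolved against the table for the current date, so a rename or conversion is applied only once its effective date is reached.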

Any thoughts?

ksberg
Roundtable Knight
Roundtable Knight
Posts: 208
Joined: Fri Jan 23, 2004 1:39 am
Location: San Diego

Post by ksberg » Sat Nov 13, 2004 6:48 pm

chrisuu wrote:Question: How do you deal with the problem of continuity in stocks? Do the commercial systems handle this problem well?

For example, besides the simple problem of survivorship (sometimes a security simply ceases to exist), there is the problem of a short-term symbol name change (e.g. while delisting is pending or following a reverse stock split), and there is the problem of one stock being converted to x shares of another security.
Good question Chris. I don't know of a stock data vendor that deals with survivorship issues, other than dropping symbols. Some vendors offer split dates and ratios. For those people with market selection algorithms this would seem to be an important part of back testing in order to avoid survivor bias. Does anyone know if this is handled by CSI's Unfair Advantage? Any other data sources that retain symbol changes and merger info?

From a system's perspective, I use a data vendor to deal with updates. This works great for futures and forex, but a portfolio stock system might want to consider managing the survivor issue as part of the overall strategy.

Cheers,

Kevin

DrHendricks
Roundtable Fellow
Roundtable Fellow
Posts: 81
Joined: Tue May 13, 2003 12:21 am

CSI survivor data

Post by DrHendricks » Sun Nov 14, 2004 9:05 pm

I know that on opening stockfact2.adm, one of the first symbols I found was Dr. Pepper, so I believe it's possible to include delisted symbols. I found this on the CSI API help page:
4.1.1.3 Func: FindMarketNumber


FindMarketNumber searches an internal database (cdbfacts.adm) for any exact match to the symbol. If a match is found, the MarketNumber property is set to the result and the return value is the MarketNumber. If no match is found, the value -1 is set instead and the return value is non-positive.

Starting with version 2.0 of UA, the IsStock property is used to determine whether you are asking for a stock symbol lookup. UA has settings to determine whether inactive stocks are to be included in this search. There are sometimes stocks with no data, and this can trip up the API programmer.

The Factsheet property in UA is used to determine if the function is supposed to include delisted stocks in the search.

David

chrisuu
Contributor
Contributor
Posts: 4
Joined: Thu Nov 04, 2004 6:33 pm

value of a database

Post by chrisuu » Sat Nov 20, 2004 11:22 am

Until very recently, I was thinking in terms of a mechanical trading platform written primarily for long-term operation and maintenance, and back-testing speed was a relatively low priority. I've since decided that continuous back-testing may be necessary in a system for trading stock, so speed actually does matter. Something about foolish consistencies and hobgoblins...

In that context, I've been shifting to C++ for the core system and retaining Python only for daily data acquisition. For new data, I'm experimenting with a pair of daily price data files: one for OHLCV, one for adjustments (e.g. splits, reverse splits, delistings, acquisitions, etc.). I'm still searching for high-quality historical data.

Post Reply