Console.WriteLine("How Do YOU Simulate?: Part II");
Posted: Tue Dec 28, 2004 10:54 am
It is interesting to see such a large number of forum members who develop and work with their own custom simulation applications.
For those who are in this group, here is a poll of your language of choice. I left some languages out and included an "other" category. I also included some for a few laughs - hopefully no one actually codes trading applications in those.
Posted: Tue Dec 28, 2004 5:53 pm
It's PERL without the "a". Check www.perl.org for more info.
Posted: Tue Feb 01, 2005 1:22 pm
I use C# for my backtesting and simulation, and I'm interested in hearing how others have designed their systems. I'm using an event model to channel data through the system. The data raises buy/sell signals, which are fed into a trade management system (TMS); the price data is also fed into the TMS via events.
This also allows me to run live data through the same system and generate signals in real time, which are then fed into an IRC chat room as well as a window on my screen.
What is everyone else doing?
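The event model described in this post could be sketched roughly as follows. This is a minimal illustration in Python rather than the poster's C# code; the names `EventSource`, `UpCloseStrategy` and `TradeManager`, and the toy "close above previous close" rule, are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Bar:
    symbol: str
    close: float


class EventSource:
    """Minimal publish/subscribe hub: each handler is called once per published item."""

    def __init__(self) -> None:
        self._handlers: List[Callable] = []

    def subscribe(self, handler: Callable) -> None:
        self._handlers.append(handler)

    def publish(self, item) -> None:
        for handler in self._handlers:
            handler(item)


class UpCloseStrategy:
    """Hypothetical strategy: raises a "buy" signal when the close exceeds the previous close."""

    def __init__(self) -> None:
        self.signals = EventSource()
        self._last_close = None

    def on_bar(self, bar: Bar) -> None:
        if self._last_close is not None and bar.close > self._last_close:
            self.signals.publish(("buy", bar.symbol, bar.close))
        self._last_close = bar.close


class TradeManager:
    """Stand-in for the TMS: receives both prices and signals via events."""

    def __init__(self) -> None:
        self.last_price = {}
        self.orders = []

    def on_bar(self, bar: Bar) -> None:
        self.last_price[bar.symbol] = bar.close

    def on_signal(self, signal) -> None:
        self.orders.append(signal)


def run(bars):
    feed = EventSource()
    strategy = UpCloseStrategy()
    tms = TradeManager()
    feed.subscribe(tms.on_bar)                 # prices go to the TMS first
    feed.subscribe(strategy.on_bar)            # and then to the strategy
    strategy.signals.subscribe(tms.on_signal)  # signals go to the TMS
    for bar in bars:                           # historical or live bars use the same path
        feed.publish(bar)
    return tms
```

Because the feed only knows about subscribers, the same wiring would serve a backtest (bars read from storage) or a live run (bars arriving from a data feed).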
Posted: Tue Feb 01, 2005 6:57 pm
While the event system sounds great for "real time", I'm not sure I would want to generate 60,000 events each and every time I'd like to test 30 markets over 10 years ... (200 days * 30 markets * 10 years)...
Loading the data from disk in one go and then processing 30 arrays of 10 years of price data each (as opposed to calling an "event method" on each tick) would seem to imply a much faster turnaround time between runs ... even in C#
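The arithmetic above (200 days * 30 markets * 10 years = 60,000 bars) and the two approaches being compared can be sketched like this. It is only a rough Python illustration with synthetic placeholder data; the function names are hypothetical, and any timing it produces says nothing about C#.

```python
import time

TRADING_DAYS = 200   # per year, the figure used in the post
MARKETS = 30
YEARS = 10
TOTAL_BARS = TRADING_DAYS * MARKETS * YEARS  # 60,000


def make_data():
    """Synthetic closes: one list per market (placeholder values, not real prices)."""
    bars_per_market = TRADING_DAYS * YEARS
    return [[float(day % 50) for day in range(bars_per_market)] for _ in range(MARKETS)]


def run_with_callbacks(data):
    """Call a handler once per bar, as an event model would."""
    total = 0.0

    def on_bar(close):
        nonlocal total
        total += close

    for series in data:
        for close in series:
            on_bar(close)
    return total


def run_with_arrays(data):
    """Process each market's whole series in one pass."""
    return sum(sum(series) for series in data)


def timed(fn, data):
    """Return (result, elapsed seconds) for one run of fn over the data."""
    start = time.perf_counter()
    result = fn(data)
    return result, time.perf_counter() - start
```

Calling `timed(run_with_callbacks, data)` and `timed(run_with_arrays, data)` on the same data gives a quick way to measure the per-event overhead on your own machine instead of guessing at it.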
Posted: Wed Feb 02, 2005 2:26 am
My initial speed comparisons showed no delay from using events as opposed to looping directly over the records from the database. I agree, though, that you have to consider speed carefully when designing the underlying system so that you don't hurt it.
The system that I've designed is aimed at day trading, so the database holds 1 minute bars. The event model allows me to easily convert those 1 minute bars into any timeframe that I want and also lets me insert any object into the data path for further processing.
From what you've said, it looks like your system mainly tests daily data?
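Converting 1 minute bars into a longer timeframe might look something like the sketch below. It is written in Python for illustration only; the `Bar` layout and the "minutes since session start" timestamp are assumptions, not the poster's actual design.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Bar:
    minute: int   # minutes since session start (hypothetical timestamp format)
    open: float
    high: float
    low: float
    close: float


def aggregate(bars: Iterable[Bar], period: int) -> Iterator[Bar]:
    """Combine consecutive 1-minute bars into bars of `period` minutes."""
    current = None
    for bar in bars:
        bucket = bar.minute // period * period  # start minute of the enclosing bar
        if current is not None and current.minute != bucket:
            yield current                       # previous bucket is complete
            current = None
        if current is None:
            current = Bar(bucket, bar.open, bar.high, bar.low, bar.close)
        else:
            current.high = max(current.high, bar.high)
            current.low = min(current.low, bar.low)
            current.close = bar.close
    if current is not None:
        yield current                           # flush the last, possibly partial, bar
```

Since `aggregate` is a generator, it can sit in the data path like any other stage, for example `aggregate(one_minute_bars, 5)` feeding a strategy that expects 5 minute bars.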
Posted: Wed Feb 02, 2005 4:55 pm
I'm using C and trying to keep it as simple as possible. I originally tried both Perl and Ruby, but they were far too slow, even when I coded the inner loops in C. In fact, writing the whole thing in C is easier than I thought it would be, and it's blazing fast.
I'm currently testing the US stock markets, and I load ten years of history into memory. I'm using a ramdisk so loading the db takes only a few seconds per run. I sometimes write out results to a text file which I load into R (a free statistics package) to look at charts and do further analysis.
I'm using CSI's Unfair Advantage, and I wrote a little Perl script to write out the entire contents of the UA database to a much smaller and more efficient binary file. The 10 years of US stock history (about 50k stocks) takes up under 600 MB.
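As an illustration of the compact binary file idea, here is a small sketch using fixed-width records. It uses Python's `struct` module rather than the Perl script mentioned above, and the record layout (date, OHLC, volume) is purely an assumption; the post does not say what the actual file format is.

```python
import struct

# Hypothetical fixed-width record: date as a YYYYMMDD integer, then OHLC as
# 32-bit floats and volume as an unsigned 32-bit integer (24 bytes in total).
RECORD = struct.Struct("<IffffI")


def write_bars(path, bars):
    """Write (date, open, high, low, close, volume) tuples as packed records."""
    with open(path, "wb") as f:
        for bar in bars:
            f.write(RECORD.pack(*bar))


def read_bars(path):
    """Read the whole file in one go and unpack it record by record."""
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, offset)
            for offset in range(0, len(data), RECORD.size)]
```

With this layout each daily bar costs 24 bytes. Note that 32-bit floats round most prices slightly, so a real format might store scaled integers instead.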
Posted: Thu Feb 03, 2005 2:45 am
IMO people coding simulation systems tend to spend too much time worrying about speed and not enough time thinking about what they are testing. Although speed is an issue and inefficient algorithms should be avoided, there are ways of getting around this by doing batch testing, using threads and using multiple machines.
For example, my testing system has a batch mode:
1. I select a batch of tests to run (sometimes a few thousand tests on 1 minute data over several years - this amounts to 1.4 million records of OHLC data being processed for each test).
2. I kick off the batch process.
3. The tester "picks up" a number of tests and starts running them - each one in a different thread.
4. The tester runs the first few tests in 1 thread, then tries 2 threads, then 3, and so on, measuring the average time it takes to complete a test. From that it determines the optimum number of threads (and hence simultaneous tests) to run.
5. If I kick off another tester on another machine on the network it can also pick up tests that have been put into the testing queue. If this is a faster machine with multiple processors then it will probably optimize itself to run more tests simultaneously than the first machine. This way I can utilize the processors of idle computers on the network.
6. I also run these batch tests overnight.
7. I can add more tests to the testing queue from any machine and the next available machine will pick up the next test from the queue.
8. All the tests store full results but a single aggregate result is placed in a database for a summary of which types of parameters, symbols and strategies performed the best.
Hope this helps improve your design.
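The "testers pick up tests from a queue" part of the steps above can be sketched as below. This is a simplified, single-machine Python illustration: `run_test` is a placeholder for a real backtest, and the in-process `queue.Queue` is an assumption. Sharing the queue between machines would need some shared store (for example a database table), which is not shown.

```python
import queue
import threading


def run_test(params):
    """Placeholder for one backtest; returns a single aggregate result."""
    fast, slow = params
    return {"params": params, "score": slow - fast}


def worker(tests, results, lock):
    """Pick up tests until the queue is empty."""
    while True:
        try:
            params = tests.get_nowait()
        except queue.Empty:
            return
        result = run_test(params)
        with lock:                      # results list is shared between threads
            results.append(result)


def run_batch(all_params, n_threads):
    """Queue every test, then let n_threads workers drain the queue."""
    tests = queue.Queue()
    for params in all_params:
        tests.put(params)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(tests, results, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A faster worker simply takes more tests off the queue, which is what lets machines of different speeds share one batch without any explicit scheduling.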
Posted: Thu Feb 03, 2005 5:55 am
Guy, it seems to me that backtesting is one area in which focusing on speed is certainly warranted. In an application in which the inner loop can be run millions or tens of millions of times, you don't want to be wasteful.
Load balancing on a network is certainly a valid way to increase processing power, assuming you have a network of computers at your disposal, but it does not replace the need to be efficient.
Batch testing is useful, but it also limits your ability to retest new ideas (which could arguably be a good thing). I didn't understand your use of threads, though. On a single CPU machine, how would launching more than one thread improve the performance of a CPU-intensive application? I would think using threads in this case would only increase overhead.
Posted: Thu Feb 03, 2005 8:58 am
I don't disagree with being efficient. I just get the impression that a lot of time is spent on gaining an extra few seconds rather than thinking about the problems at hand. There are some obvious concepts that you need to test and understand, such as using doubles instead of decimal and that sort of thing.
What I've noticed on a single processor machine is that there is still an optimal number of threads to run, which is almost always more than 1 unless the machine is particularly slow, in which case the optimum is 1. I know that on one of my machines (P4 1.8 GHz) the optimum number of threads is 3. This reduces the average time per test to 66% of the time taken with 1 thread, even though the CPU is maxed out with any number of threads.
Exactly why this is the case I'm not sure - perhaps someone reading this can offer a suggestion...
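The "try 1 thread, then 2, then 3 and measure" procedure could be sketched as follows. This is a Python illustration under a loud assumption: `fake_test` includes a short sleep to stand in for time spent waiting (for example on a database read), which may or may not be what happens in the tester described above. Also note that in standard CPython, threads do not run CPU-bound code in parallel, so only the waiting part overlaps here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_test(_):
    """Stand-in for one backtest: a short wait plus some arithmetic (both assumed)."""
    time.sleep(0.01)
    return sum(i * i for i in range(1000))


def average_time_per_test(n_threads, n_tests=12):
    """Run n_tests tests on n_threads threads and return the average wall time per test."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(fake_test, range(n_tests)))
    return (time.perf_counter() - start) / n_tests


def find_optimum(max_threads=4):
    """Try 1, 2, 3, ... threads and keep the count with the lowest average time."""
    timings = {n: average_time_per_test(n) for n in range(1, max_threads + 1)}
    return min(timings, key=timings.get), timings
```

Running `find_optimum()` returns the best thread count together with the measured averages, so the numbers can be inspected rather than taken on faith.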