Interesting test results (for a budding tester at least)

Jimmy
Senior Member
Posts: 39
Joined: Wed Apr 16, 2003 11:31 am

Post by Jimmy » Fri Nov 07, 2003 1:34 pm

In testing a set of parameters recently for a medium-term trend following system, I was interested to see the robustness of the system tested over various time periods. Little did I know that I would end up learning an interesting lesson from this little exercise. Please read on to see if you have seen the same in your testing or if this is an aberration and I am just plain loony.

OK, here are the steps I took. First, I optimized the parameter set over a 20 year time period with a basket of 22 commodities. Actually, it was 20.75 years (1/83 through 9/03). I took care not to pick the absolute best parameters, but chose the ones that were in the middle of a smooth range. The results from the test were encouraging, with annual returns just above 80%, MAR slightly above 2.00 and max DD around 38%. The actual test statistics aren’t as important as the comparison of the numbers throughout this test.
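For readers new to the statistic: MAR is commonly computed as the compound annual return divided by the maximum drawdown, so the roughly 80% return and 38% drawdown quoted above give about 0.80 / 0.38 ≈ 2.1. Here is a minimal sketch of that calculation from an equity curve. The function names and the list-of-floats equity format are assumptions for illustration, not anything taken from the testing software used in this thread.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def cagr(equity, years):
    """Compound annual growth rate over the given number of years."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def mar_ratio(equity, years):
    """Annualized return divided by maximum drawdown."""
    dd = max_drawdown(equity)
    if dd == 0:
        return float("inf")  # no drawdown at all in the sample
    return cagr(equity, years) / dd
```

Because the drawdown sits in the denominator, two tests with the same return can have very different MAR values, which is why it is a handy single number for comparing runs.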

Now to test for robustness, I re-ran the tests with the same system parameters, moving the start date forward in increments of 1 year. For example, my next test period was 1/84-9/03, then 1/85-9/03, and so on, with my last test period being 1/03-9/03. The results seemed pretty good and somewhat robust, with the MAR ranging from 1.01 to 3.04 with an average of 2.00. The distribution was skewed to the right, which I consider to be a good thing. I did notice lower returns and DDs for the tests that started within the last 5 years.
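The start-date sweep described above could be scripted roughly as follows. This is only a sketch: `run_backtest` is a hypothetical callable you would supply yourself (it is not a Trading Blox function), assumed to take a start year and a parameter set and return the MAR for the test running from that year to the end of the data.

```python
from statistics import mean


def sweep_start_years(run_backtest, first_year, last_year, params):
    """Re-run the same parameters with the start date moved forward a year at a time."""
    results = {}
    for year in range(first_year, last_year + 1):
        # run_backtest is a user-supplied (hypothetical) function returning MAR
        results[year] = run_backtest(start_year=year, params=params)
    return results


def summarize(results):
    """Range and average of the statistic across all start years."""
    values = list(results.values())
    return {"min": min(values), "max": max(values), "mean": mean(values)}
```

Calling `sweep_start_years(run_backtest, 1983, 2003, params)` would produce one result per start year, which `summarize` reduces to the kind of range and average quoted in the post.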

This prompted me to take it one step further and re-optimize the parameters for the last 5 years (10/98-9/03) to compare the results from the 20 year optimization. After getting my new “optimal”
[Attachment: Breakout System - Time Period-10-98_9-03(5)_31941_image003.PNG]

enigma
Roundtable Fellow
Posts: 99
Joined: Tue Apr 29, 2003 6:56 am
Location: UK

Post by enigma » Sun Nov 09, 2003 6:21 am

Jimmy, correct me if I'm wrong, but it seems like you've tested your system on the same data you optimised over for your 20yr test, which is why the results look so good. You would need to test on out-of-sample data for a better gauge of robustness and realism.
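A minimal sketch of the out-of-sample idea: split the history at a cutoff date, optimise only on the earlier part, and judge the chosen parameters on the later part that the optimiser never saw. The `(date, close)` tuple format and the function name are assumptions made for illustration.

```python
from datetime import date


def split_by_date(bars, cutoff):
    """Split (date, close) bars into in-sample (before cutoff) and out-of-sample (from cutoff on)."""
    in_sample = [bar for bar in bars if bar[0] < cutoff]
    out_of_sample = [bar for bar in bars if bar[0] >= cutoff]
    return in_sample, out_of_sample


# Hypothetical usage: optimise on in_sample only, then run the chosen
# parameters once on out_of_sample and compare the statistics.
bars = [
    (date(1997, 6, 2), 101.0),
    (date(1998, 9, 30), 104.5),
    (date(1998, 10, 1), 103.0),
    (date(2003, 9, 30), 140.0),
]
in_sample, out_of_sample = split_by_date(bars, date(1998, 10, 1))
```

The key discipline is that nothing from the out-of-sample part feeds back into parameter selection; once it does, it is in-sample data too.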

Louis

Jimmy
Senior Member
Posts: 39
Joined: Wed Apr 16, 2003 11:31 am

Post by Jimmy » Sun Nov 09, 2003 9:43 am

Louis,

Thank you for the catch! A friend noted the same issue to me a day after I posted my message. I was so quick to press the "submit" button after I saw the results that I didn't stop to meditate over why the results were the way they were...human error on my part. :oops:

Thanks again,
Jimmy

William
Roundtable Knight
Posts: 238
Joined: Sun May 04, 2003 4:41 pm
Location: Manhattan, New York

Robustness

Post by William » Sun Nov 09, 2003 6:53 pm

Enigma,

Why do you feel that using different time period sets is the best way to do things? I am asking because I battle with it myself.

One example would be that in testing your system during the 1980 - 1990 period, based on all of the events and price moves during that time, you found that it created an apple. However, in the next ten years, you find your system parameters create an orange, based on different price moves and shocks. Why not create a system that is tested on both at the same time (the 20+ year set) and is therefore able to handle both the apple and the orange market environments?

Another way to look at it, to me, is that using only 10 years is like trying to work out what your typical day looks like from a fragment of it. Suppose I were given only the first 10 hours of your day and had to fill in the rest from those. Obviously your routine changes throughout the day; what you eat and do in the first 10 hours can differ dramatically from the next 14. Based on that data, I might assume that you work all day long and eat Cheerios for breakfast and dinner. I would rather have the first 20 hours and take an educated guess about the last 4 to figure out what your 24-hour daily regimen is like, as opposed to having only the first or last 10-hour set and guessing about the rest...

For testing, I think I would rather have the entire set of 20+ years, in which a variety of different situations happened, and tailor-fit a system that can consistently handle all of those different situations. Kind of find the middle of the road for the past 20+ years.


In the end, backtesting is optimizing. I think we try to limit ourselves and our system by not seeking the best results because we don't want to overfit. But what's worse: creating a system that did extremely well and hoping that the future will be somewhat close to the past (based on finding rounded neighborhoods and avoiding steep surrounding drop-offs)? Or creating a system based on suboptimal parameters for fear of over-tailoring, or hoping that the market shifts towards your currently inferior settings (Dennis)? Or, in this case, using less data and assuming that the next ten years will be the same as the past ten, realizing they are not, and then trying to find a happy medium and/or still being left scratching your head wondering if the next ten years will be more like 1980 - 1990 or 1990 - 2003 or something totally different?
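To make the "neighborhoods and steep drop-offs" idea concrete, here is one possible sketch. It scores each parameter value by the worst result in its immediate neighborhood rather than by its own result, so an isolated spike loses to a flat plateau. The worst-neighbor rule, the function name and the sample numbers are all assumptions chosen for illustration; it is one of several reasonable ways to do this.

```python
def pick_plateau_index(scores, radius=1):
    """Index whose neighborhood (index +/- radius) has the best worst-case score."""
    def worst_neighbor(i):
        lo = max(0, i - radius)
        hi = min(len(scores), i + radius + 1)
        return min(scores[lo:hi])

    return max(range(len(scores)), key=worst_neighbor)


# Hypothetical MAR values for seven consecutive settings of one parameter.
scores = [1.0, 1.1, 5.0, 0.9, 2.0, 2.1, 2.0]
raw_best = max(range(len(scores)), key=scores.__getitem__)  # the isolated spike
robust_best = pick_plateau_index(scores)                    # inside the flat region
```

The spike at the third setting has the best raw score but sits next to the worst result in the list, so the plateau rule passes over it in favor of the flatter region at the end.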

Curious as to your thoughts....

Kiwi
Roundtable Knight
Posts: 513
Joined: Wed Apr 16, 2003 1:18 am
Location: Nowhere near

Post by Kiwi » Sun Nov 09, 2003 7:49 pm

William,

You will get a detailed explanation if you read some of the recommended books on system testing but the basic issue comes down to this.

If you develop it to fit your whole data set you achieve just that. A fit (read curve fit) for the whole data set.

If, as Louis suggested, you develop something that works in all of periods A, B and C and then it is still good in periods D and E, you have a much better chance that it is not a curve fit to your data set. If it then works in real time going forward you can start to feel pretty good about your process and the likelihood that the system will make you money rather than leave you broke :(
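The A/B/C then D/E procedure might be sketched like this. It assumes a hypothetical `evaluate(params, period)` callable that returns a score such as MAR for one parameter set on one period; the selection rule used here (best worst-case score across the development periods) is just one illustrative choice.

```python
def pick_and_validate(candidates, evaluate, dev_periods, holdout_periods):
    """Choose parameters on the development periods only, then score them on held-out periods."""
    # Prefer the candidate that holds up in every development period,
    # not the one with the single highest score.
    best = max(
        candidates,
        key=lambda params: min(evaluate(params, period) for period in dev_periods),
    )
    # The holdout periods play no part in the selection above.
    holdout_scores = {period: evaluate(best, period) for period in holdout_periods}
    return best, holdout_scores
```

If the holdout scores are far below the development scores, that is a hint the parameters were fitted to the development data rather than to something persistent in the markets.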

John

William
Roundtable Knight
Posts: 238
Joined: Sun May 04, 2003 4:41 pm
Location: Manhattan, New York

Post by William » Mon Nov 10, 2003 7:24 am

Kiwi, thanks. I have read some of those books and I will go back and re-read them. I understand what that testing method is attempting to avoid as well as achieve. Has anyone read studies that compared how the two methods of backtesting performed going forward?
