Robust Optimization

shakyamuni · Post by **shakyamuni** » Mon Sep 20, 2004 12:45 am

How do you fellows optimize a strategy with several parameters across a broad range of parameter values and ensure the robustness of the parameter set that you ultimately choose to trade?

Below, I provide a little example as a survey and I clarify the discussion I am attempting to foster...

Visually, the problem is easy to see in two dimensions. I have attached an example chart of a mystery parameter against the strategy's MAR ratio. I'd like to know what value for "parameter #1" that you gentlemen consider to be the "robust optimal" and what specifically makes you give that response. If you feel that you need more information to make a decision on this parameter, I'd like to hear what additional information you require. (Obviously, this is somewhat subjective and varies based on the intentions of the individual.)

The problem becomes a lot more complicated when dealing with systems of several parameters. Even using brute force optimization and having a clearly defined target function (desired return series characteristics), the problem of selecting the "most robust" strategy can be difficult. If anyone would like to share experiences or suggestions, I would love to receive them.

I think I saw c.f. once say "not all parameters are created equal." I agree with this statement. How does one incorporate this into a robust optimization procedure?

Lastly, do you use subjective analysis (such as looking at graphical output as I give in the example) to make decisions, or is everything done in code and the results checked after the optimal parameter set is already chosen? In either case, what do you think are the strengths and weaknesses of your approach?

Forum Mgmnt · Post by **Forum Mgmnt** » Mon Sep 20, 2004 8:55 am

This is a good topic for discussion as it addresses one of the key concepts in trading system analysis.

I said a bit on this subject at: http://www.tradingblox.com/articles/opt ... aradox.htm

but it would clearly be useful to cover this topic more completely.

I have attached a jpg version of the graph so we can see it without having to download.

I would trade parameter value 11 as it is at the center of the best values, and doesn't have any cliffs or large changes nearby.

The actual maximum 16 is a great example of what the typical system vendor would give you. However, the fact that this is a spike is indicative of two things:

1) The high value was probably due to a small number of large winners that this parameter value just happened to capture perfectly.

2) There is a very high likelihood that you will not duplicate these results in actual trading.

It also happens to be surrounded by areas of much lower value than the areas which surround my choice, number 11.

Another important point is that if I were not happy with the values that surrounded 11, I would not trade the system. It doesn't matter what the best tests say, it matters what the likely results will be. In this case, I think you need to be prepared for the performance of 18 if you traded 16, or for the performance of 9 if you traded 11.
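One way to put the "be prepared for the neighbours" rule into numbers is to score each parameter value by the worst result among itself and its immediate neighbours. The Python sketch below does that. The MAR values are made up for illustration (they are not read off the attached chart), and `worst_neighbor_score` is a hypothetical helper name.

```python
def worst_neighbor_score(results, radius=1):
    """Map each parameter value to the lowest result within +/- radius steps."""
    params = sorted(results)
    scores = {}
    for i, p in enumerate(params):
        lo = max(0, i - radius)
        hi = min(len(params), i + radius + 1)
        scores[p] = min(results[q] for q in params[lo:hi])
    return scores

# Hypothetical MAR by parameter value: a broad hump near 11, a lone spike at 16.
mar = {8: 0.6, 9: 0.9, 10: 1.1, 11: 1.2, 12: 1.15, 13: 1.0,
       14: 0.7, 15: 0.3, 16: 2.4, 17: 0.3, 18: 0.4}

scores = worst_neighbor_score(mar)
raw_best = max(mar, key=mar.get)           # the spike
robust_best = max(scores, key=scores.get)  # the value with the best worst case
print(raw_best, robust_best)
```

With these made-up numbers the raw maximum sits on the spike, while the worst-neighbour score prefers the middle of the hump. Widening `radius` is a stricter test of the same idea.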

- Forum Mgmnt

TC · Post by TC » Mon Sep 20, 2004 11:20 am

Broadly interpreted, robustness means that future system performance will correspond with your test results. How close this fit needs to be to satisfy your personal definition of robustness naturally varies from one individual to another.

It is therefore important to first define robustness in terms that have meaning to you before looking for optimum parameter values. For example, one trader might consider his system robust if its performance was within +/- 20% of its test results. Would you?

We always face a range of potential outcomes when trading. We know that the future will be the same as the past, only different, and we are willing to accept the probability that the future won't depart so far from the past that it causes losses. Values of 11 or 12 are the best from the chart in that respect.

I believe it pays in the long term to be risk averse and therefore would prefer a parameter value that sacrifices a bit of performance if it demonstrates a lower probability of falling apart, i.e. its nearest neighbour values taper off gradually on either side.

If you remove the outlier trades from your results, those statistical aberrations that are unlikely to repeat often, you'll probably find that value 16 on your chart will disappear. Look at the list of trades, observe them visually on a chart, try to determine how often those patterns are likely to be repeated.
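A rough way to check how much a result leans on a handful of trades is to drop the largest winners and see how much of the profit goes with them. The sketch below is only an illustration: the trade list is invented, the function names are hypothetical, and it assumes total profit is positive.

```python
def without_top_winners(trade_pnl, k=2):
    """Return the trades in their original order with the k largest winners removed."""
    by_size = sorted(range(len(trade_pnl)), key=lambda i: trade_pnl[i], reverse=True)
    keep = sorted(by_size[k:])
    return [trade_pnl[i] for i in keep]

def dependence_on_winners(trade_pnl, k=2):
    """Share of total profit that disappears when the k largest winners are removed.

    Assumes the total profit is positive; a value above 1.0 means the system
    loses money without those trades.
    """
    total = sum(trade_pnl)
    if total <= 0:
        raise ValueError("total profit must be positive")
    return (total - sum(without_top_winners(trade_pnl, k))) / total

# Hypothetical closed-trade P&L in time order
pnl = [-1, -1, 10, -1, 2, -1, 8, -1]
print(without_top_winners(pnl, k=2))
print(dependence_on_winners(pnl, k=2))
```

Running the same check for each parameter value would show whether a spike owes its height to one or two trades.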

Also, try different risk:reward criteria and see if the same parameter values keep cropping up. If they do, this indicates robustness. If you have an alternate out-of-sample (OOS) data set, does your strategy produce similar results or do they change significantly? If possible, run the analysis on the same markets but a different time frame. The objective is to determine how sensitive the parameter values are to changes both in markets and over time. Again, robust values should give similar results across different markets and time periods.

Tom

ksberg · Post by **ksberg** » Mon Sep 20, 2004 11:29 am

shakyamuni wrote: If you feel that you need more information to make a decision on this parameter, I'd like to hear what additional information you require. (Obviously, this is somewhat subjective and varies based on the intentions of the individual.)

What is the timeline of the given tests? MAR as a measure is more sensitive to changes the smaller the time period. In fact, I would prefer to optimize around something other than MAR, since a single event determines the metric value.

I would like to see the graph over multiple periods, even if those periods overlap. My preference would lean toward walk-forward distribution to derive optimal values.

Cheers,

Kevin

Asterix · Post by **Asterix** » Mon Sep 20, 2004 3:33 pm

I've put a lot of thought into this particular question. How can you tell if your system will produce similar results in the future?

Just like there is no holy grail trading system, there does not appear to be a definitive answer to this question. But there may be some clues as to what to look for in your testing results.

In my opinion, the question of robustness is very closely related to the topic of stationarity. Since market prices are non-stationary over time, you have to expect some changes in your testing results too. If these changes are big, your system may not be very robust. If they are small, then your chances of having a robust system should also be better.

I've thought of one way to test your results that may be beneficial. Take the closed out profit and loss numbers for each trade ordered sequentially and divide them in two. Plot the cumulative distribution function (CDF) of the first half versus the 2nd half and look at a plot of both CDFs. If they are similar (they don't have to be exact), then there is a good chance that the results haven't changed much from the first half to the 2nd half.

You can divide the data into 3rds, 4ths, 5ths, etc. and repeat the same. At some point the CDFs will begin to diverge from each other. At this point you'll know your timeframe is too short - something changes during each period.

This same method is also applicable when comparing out-of-sample results. If the CDF for the out-of-sample data is very similar to the CDF for the data used for optimization, you can be more confident in the overall robustness of the system. (Note: you need a sufficient number of samples in your out-of-sample data for this test to be valid.)
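The comparison can also be reduced to a single number. The sketch below builds the empirical CDF of each chunk of trades and reports the largest vertical gap between them, which is the two-sample Kolmogorov-Smirnov statistic. That statistic is a stand-in chosen here for the visual comparison described above, not something the post prescribes, and the trade list is invented.

```python
def ecdf(sample, x):
    """Fraction of observations in sample that are <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def max_cdf_gap(a, b):
    """Largest vertical distance between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

def split_gaps(trade_pnl, pieces=2):
    """Split sequential trade P&L into equal pieces and compare each piece to the first."""
    size = len(trade_pnl) // pieces
    chunks = [trade_pnl[i * size:(i + 1) * size] for i in range(pieces)]
    return [max_cdf_gap(chunks[0], c) for c in chunks[1:]]

# Hypothetical closed-trade P&L in time order
pnl = [-1.0, 2.5, -0.8, -1.2, 4.0, -0.9, -1.1, 3.0, -1.0, 2.8, -0.7, -1.3]
print(split_gaps(pnl, pieces=2))
print(split_gaps(pnl, pieces=3))
```

Small gaps suggest the halves look alike. With only a few trades per chunk the gap is noisy, which matches the caveat about needing enough samples.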

MorganSys · Post by **MorganSys** » Mon Sep 20, 2004 4:16 pm

What I would need to know:

-How many trades
-Distribution of returns within each bar
-Temporal stability of the distribution within each bar
-Distribution of returns in underlying asset
-Price time series of underlying asset (if only one)

wksutton · Post by **wksutton** » Mon Sep 20, 2004 6:35 pm

Good question, shakyamuni. Although the data you provide are really inadequate, as several replies have pointed out, given the graph as it is we can still make some judgements. Except for 16, the graph resembles a somewhat skewed normal distribution, and I don't think anyone concerned with robusticity would trade 16, or even believe it's "real." One way to approach this would be to take an average of each set of three, and trade the median. This would suggest trading 12. Why not an average of five? Well, this would indicate trading 14, and I don't think anyone would want to do that. So three rather than five is another subjective judgement. As c.f. pointed out, though, I wouldn't trade 12 unless 14 was an acceptable outcome. If it were not, I would trade 11 for the reasons c.f. gave.
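The averaging idea is easy to try in code. The sketch below takes a centered moving average over each set of neighbouring values and picks the center of the best set. The MAR numbers are invented (not taken from the chart), and skipping the edges where a full window is unavailable is an assumption of this sketch.

```python
from statistics import mean

def centered_average(results, window):
    """Average each value with its neighbours; edges without a full window are skipped."""
    params = sorted(results)
    half = window // 2
    out = {}
    for i in range(half, len(params) - half):
        out[params[i]] = mean(results[q] for q in params[i - half:i + half + 1])
    return out

# Hypothetical MAR by parameter value: a broad hump near 11, a lone spike at 16.
mar = {8: 0.6, 9: 0.9, 10: 1.1, 11: 1.2, 12: 1.15, 13: 1.0,
       14: 0.7, 15: 0.3, 16: 2.4, 17: 0.3, 18: 0.4}

avg3 = centered_average(mar, window=3)
avg5 = centered_average(mar, window=5)
print(max(avg3, key=avg3.get), max(avg5, key=avg5.get))
```

With these numbers the wider window lets the spike leak into the average of values two steps away, so the two window sizes disagree. That is the kind of subjective choice described above.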

Wesley

Hiramhon · Post by **Hiramhon** » Mon Sep 20, 2004 7:48 pm

Unfortunately the histogram isn't a realistic example. Usually there are 3 or 4 parameters being systematically varied, and 2 or 3 output results. For the Turtle system, imagine varying the entry breakout #days, the exit breakout #days, and the failsafe breakout #days. The output results are MAR and Sharpe Ratio.

Now you've got a four dimensional graph (entrydays, exitdays, failsafedays, MAR). Actually you've got two of them: one 4D graph for MAR and another for Sharpe. You'll need to invent a way of finding the tall plateaus that doesn't involve human vision -- human vision is no good in four dimensions. It just gets worse and worse, the more variables you introduce.

Imagine combining 3 systems: Turtle, Bollinger Countertrend, and Triple MA, where you're optimizing 2 parameters each. Visually hunting for a "robust optimum" requires plotting, and looking at, a seven dimensional graph. Good luck with that.

ksberg · Post by **ksberg** » Tue Sep 21, 2004 2:04 am

Hiramhon wrote: Unfortunately the histogram isn't a realistic example. Usually there are 3 or 4 parameters being systematically varied, and 2 or 3 output results. For the Turtle system, imagine varying the entry breakout #days, the exit breakout #days, and the failsafe breakout #days. The output results are MAR and Sharpe Ratio.

Now you've got a four dimensional graph (entrydays, exitdays, failsafedays, MAR). Actually you've got two of them: one 4D graph for MAR and another for Sharpe. You'll need to invent a way of finding the tall plateaus that doesn't involve human vision -- human vision is no good in four dimensions. It just gets worse and worse, the more variables you introduce.

What made your example seemingly untenable was the 3 or 4 parameters, not the histogram. Why? Note that what you say is true with or without using histograms. ... Actually, there are ways to both visualize and mathematically locate local maxima in multiple dimensions. These techniques have been applied in multiple disciplines for some time. As for visualization, imagine how CAT scans are able to take multiple slices of a 3 dimensional body. Plus, modern techniques can reconstruct and "fly through" the composite. I've seen slice-and-dice equivalents for other data and there's no reason it can't be done for trading.

The reason I suggest (and use) histograms is non-stationarity. If I take and re-run an example like shakyamuni's on a different period, I can almost guarantee a different graph. So, using one trial and one graph, I would always be picking a non-optimal point. It's the same problem experienced with Optimal-f (your optimization point is always changing).

Instead I want to test and arrive at a point as suggested by Asterix: the resulting CDFs should be stable over multiple sets. This step can be done mechanically in any number of dimensions. Afterwards, I can slice and dice redux visual dimensions to my heart's content.

For grins, I'll post a couple graphs I have handy of entry parameter variance for a Turtle-like system ("Gamera"). They represent distributions from a narrow-band daily walk-forward portfolio of 5 markets over 15 years ... a few hundred thousand portfolio variations (these puppies take a while to generate, even at 5-10 portfolios a second, which is why the small test portfolio).

Interpretation: 65% of all results for the given parameter fall between the upper and lower blue bands (similar to 1 StdDev), while 95% of all results for the given parameter fall between upper and lower magenta bands (similar to 2 StdDevs). On the CAGR graph, note that the 20-period entry is consistently the highest return for this system, and also yields the highest high and highest low of any distribution. On the MaxDD graph, note how wide the variation is at the 55-period entry (both bands). The chances of picking a wrong parameter to minimize drawdown are very high without this knowledge.

Cheers,

Kevin

Forum Mgmnt · Post by **Forum Mgmnt** » Tue Sep 21, 2004 8:35 am

Kevin,

Cool!

That's exactly the analysis which 2.0 will have in a new parameter analysis tab. I call these graphs "Parameter Sensitivity Graphs".

I was hoping to keep this feature a secret till release but since you already do it, there's not much point. I haven't even told the Beta Testers about this yet.

This was the analysis I first proposed to Ed Seykota three years ago when he proposed 3D surface maps for parameter analysis. I didn't see how 3D graphs were the solution to the problem in a multi-dimensional space and proposed the solution you outline. He didn't really understand what I was talking about when I first described the concept, as I probably wasn't clear enough, but you obviously do.

Constructing these charts is pretty easy, you just build a distribution of whatever goodness measure you wish to plot for each distinct value of every stepping parameter in the test. When the test runs are done, you simply calculate the standard deviations and average for each parameter and then plot lines around those points.
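A minimal sketch of that construction in Python, assuming each test run is a dict holding the stepped parameter values and a goodness measure. The field names, the sample runs, and the use of population standard deviation with one-standard-deviation bands are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def sensitivity_bands(runs, param, measure):
    """Group test runs by one stepped parameter and summarise the measure per value."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[param]].append(run[measure])
    bands = {}
    for value, results in sorted(groups.items()):
        avg, sd = mean(results), pstdev(results)
        bands[value] = {"avg": avg, "lower": avg - sd, "upper": avg + sd,
                        "min": min(results), "max": max(results)}
    return bands

# Hypothetical multi-parameter test output
runs = [
    {"entry": 20, "exit": 10, "mar": 1.0},
    {"entry": 20, "exit": 15, "mar": 1.2},
    {"entry": 55, "exit": 10, "mar": 0.4},
    {"entry": 55, "exit": 15, "mar": 1.4},
]

for value, band in sensitivity_bands(runs, "entry", "mar").items():
    print(value, band)
```

Plotting `avg`, `lower` and `upper` against the parameter value gives the middle line and bands. Calling the function once per stepped parameter covers any number of dimensions.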

Another measure which comes from this which I call "Parameter Sensitivity", is the standard deviation of the average parameter values divided by the average value across all parameter values. To get this standard deviation value, use the standard deviation of the average for each distinct parameter value, i.e. essentially the standard deviation of just the middle line of your graph.

So for a measure of MAR, if a parameter has an average MAR of 1.2 and a standard deviation of 0.12, the value of the measure would be 10%. You'd say that the MAR exhibited a 10% sensitivity to parameter X.
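As a sketch, the measure can be computed from the middle line of the graph alone. This assumes population standard deviation and takes "the average value across all parameter values" to mean the average of the per-value averages; the post does not pin down either choice. The two input averages are invented so that they reproduce the 1.2 and 0.12 of the example.

```python
from statistics import mean, pstdev

def parameter_sensitivity(avg_by_value):
    """Standard deviation of the per-value averages divided by their overall average."""
    averages = list(avg_by_value.values())
    return pstdev(averages) / mean(averages)

# Hypothetical average MAR for two values of parameter X:
# overall average 1.2, standard deviation 0.12.
avg_mar = {10: 1.08, 20: 1.32}
print(f"{parameter_sensitivity(avg_mar):.0%}")
```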

This can help differentiate the parameters that really drive the system results from those that don't have much effect.

For the 2.0 feature, I'm going to do a set of shaded bands around the average as the midpoint with an additional line for minima and maxima. These graphs will be generated automatically for each parameter that is set to step during the multi-parameter test run.

It's pretty easy to spot curve-fit values using this type of analysis. It's also much easier to determine those values which are likely to be stable.

- Forum Mgmnt

shakyamuni · Post by **shakyamuni** » Tue Sep 21, 2004 12:08 pm

Kevin and c.f. have an interesting idea with these "Parameter Sensitivity Graphs."

Asterix · Post by **Asterix** » Tue Sep 21, 2004 1:55 pm

I think this thread is hitting on one of the most important factors in developing a trading system. How can you tell if the system will work in the future?

Adding the Parameter Sensitivity analysis to Veritrader is, in my opinion, a BIG plus for this package. My one criticism of the commercial apps for developing trading systems is that they do not provide very good tools for doing serious statistical analysis of the results. This can be a deadly trap for a naive trader. It is very easy to be fooled into thinking that you've developed a profitable system if you haven't used the right tools to analyze it.

Spending $2000 to $3000 on a software app is nothing compared to the money you can lose trading futures with a bad system.

Kevin - you've illustrated a very good method for using 2D graphs that show the essence of how changes in the parameters affect the overall system.

One suggestion here. (Maybe you've already done this.) In your example you've shown that the 20 period entry shows the best overall performance. I've found that it is also a good idea to do a finer sub-division around what looks like the best value to make sure you don't see non-linear behavior in nearby input values.

In other words, re-run the same analysis between the values of maybe 15 to 25 using finer increments. (It looks like you used an increment of 5 in your posted example.) You should still see smooth changes in the data - not big spikes that go in different directions.

I found this extra step can sometimes uncover a weakness that doesn't show up during the first pass thru the data. More work, more time, but better than losing your shirt.

Forum Mgmnt · Post by **Forum Mgmnt** » Tue Sep 21, 2004 3:20 pm

I agree that checking out fine resolution values around the proposed optimal values is a good idea. This reminds me of the one other robustness checking feature that I think will be useful.

What I call the "Automatic Robustness Check":

Here's the basic idea: take a proposed set of parameter values and run 100s or 1,000s of iterations where you randomly vary the parameter values around those values. Then generate output similar to a monte-carlo simulation where you show the range of drawdowns, equity curves, CAGR%, etc. for each of the simulated tests.

The part I haven't quite nailed down yet is the mechanism for automatically determining how much to vary each of the parameters. If you did a two pass test, we could use the first pass to determine the range of sensitivity for each parameter as a function of how the optimal parameter changes over time. Perhaps take a two or three year timeslice, auto-optimize the parameter over that timeframe, and then take the standard deviation of all the optimum values. This standard deviation would be used to adjust the range of variation for each parameter.

During the second pass, or the "analysis phase" of the Automatic Robustness Check, we could then give a wider variation to parameters that varied more over time during the first pass.
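The analysis phase might look roughly like the Python sketch below. It is not how VeriTrader does it: the Gaussian jitter, the rounding to whole days, the `toy_backtest` stand-in and all parameter names are assumptions. The `spread` argument plays the role of the per-parameter standard deviations that the first pass would supply.

```python
import random
from statistics import mean

def robustness_check(backtest, center, spread, iterations=1000, seed=0):
    """Re-run backtest with each parameter jittered around its proposed value."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(iterations):
        # Parameters with a larger first-pass spread get a wider variation.
        trial = {name: round(rng.gauss(value, spread[name]))
                 for name, value in center.items()}
        outcomes.append(backtest(**trial))
    outcomes.sort()
    return {"worst": outcomes[0],
            "median": outcomes[len(outcomes) // 2],
            "mean": mean(outcomes),
            "best": outcomes[-1]}

def toy_backtest(entry_days, exit_days):
    """Stand-in for a real system test: a smooth hill peaking at entry=20, exit=10."""
    return 1.5 - 0.01 * (entry_days - 20) ** 2 - 0.02 * (exit_days - 10) ** 2

summary = robustness_check(
    toy_backtest,
    center={"entry_days": 20, "exit_days": 10},
    spread={"entry_days": 3.0, "exit_days": 1.5},  # hypothetical first-pass std devs
)
print(summary)
```

A real version would return full distributions of drawdown, CAGR% and so on rather than four summary numbers. The fixed seed only makes the sketch repeatable.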

Any other ideas here would be greatly appreciated. If I can nail this down sufficiently, I'd like to include this in VeriTrader 2.0.

I think that something like this would go a long, long way towards helping new traders avoid hyper-optimized bogus systems.

- Forum Mgmnt

TK · Post by TK » Tue Sep 21, 2004 3:58 pm

ksberg wrote: For grins, I'll post a couple graphs I have handy of entry parameter variance for a Turtle-like system ("Gamera"). They represent distributions from a narrow-band daily walk-forward portfolio of 5 markets over 15 years ... a few hundred thousand portfolio variations (these puppies take a while to generate, even at 5-10 portfolios a second, which is why the small test portfolio).

TO MODERATOR: If my question below fits NEWBIE area, feel free to move it there.

Kevin,

I am probably a slow learner and feel a bit lost with your example and procedure for checking parameter robustness.

Let's say a system has 3 parameters and for each one you want to test ranges of values from 10 to 20. Your portfolio consists of 3 markets and you have 10 years of data. What is your step-by-step procedure for picking the best (robust) parameter values? I would appreciate your help on this one.

Forum Mgmnt,

How will VeriTrader 2.0 help answer questions like the one above? With regard to choosing robust parameter values, I like the idea of Incremental Stepping and choosing a value from the middle of a smooth range of values (as described in the VT Demo Tutorial). How is "Parameter Sensitivity" different from this method and what new concepts does it introduce or what other questions does it help answer that Incremental Stepping doesn't? Or do the two interact and complement each other? I'm a bit confused, sorry.

TK

Asterix · Post by **Asterix** » Tue Sep 21, 2004 5:29 pm

I think Automated Robustness Checking is another great idea that would put Veritrader way ahead of the other toolkits. This feature would go a long way to expose potential problems in the trading system by thoroughly exercising the system over a range of values that are close to the optimum inputs.

FYI for all. I have a lot of experience developing engineering simulation software where 90% of the code is numeric computation. I've found that using randomized inputs is a very good way to test code for numeric errors - overflow, divide by zero, iterations that fail to converge, etc.

When working with engineering computations, the input data is often continuous - you aren't restricted to integer values. An input parameter that can vary from 1 to 3 can have valid values of 1.1, 1.01, 1.001, 1.0001 ... 2.9, 2.99, 2.999 ... 3.0. Obviously you can't test an infinite range of inputs, so Monte Carlo variation of the inputs is the next best thing. You are more likely to uncover a bug using this method than by using input values with discrete differences like 1.0, 1.1, 1.2 ...

Although we'd be testing to uncover a different type of problem, the same approach will work well to evaluate a trading system for robustness.

I don't know if you need to do two passes. You can let the user decide on the value of epsilon which is the +/- value by which you want to vary the particular parameter. When you analyze the results, you may be able to tell whether or not you've made epsilon too big or too small.

Also, there should be a reasonably linear relationship between the value of epsilon and the spread of data in the results. A larger value of epsilon should produce a spread that varies somewhat proportionately to a smaller value of epsilon. If a small change in epsilon produces a big change in the results, that is a clue that the system may not be robust.
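To illustrate the epsilon idea, the sketch below varies one parameter uniformly within +/- epsilon of its center and measures the spread of the results for several epsilons. Both "systems" are invented toy functions: one changes smoothly around its optimum, the other is a narrow spike. Uniform variation and population standard deviation as the spread measure are assumptions.

```python
import random
from statistics import pstdev

def spread_for_epsilon(backtest, center, epsilon, iterations=2000, seed=0):
    """Standard deviation of results when the parameter is varied within +/- epsilon."""
    rng = random.Random(seed)
    results = [backtest(center + rng.uniform(-epsilon, epsilon))
               for _ in range(iterations)]
    return pstdev(results)

def smooth_system(p):
    """Toy result that falls off gradually around p = 20."""
    return 1.5 - 0.01 * (p - 20) ** 2

def spiky_system(p):
    """Toy result that is only good within half a step of p = 20."""
    return 1.5 if abs(p - 20) < 0.5 else 0.5

for eps in (0.4, 1.0, 2.0, 4.0):
    print(eps,
          round(spread_for_epsilon(smooth_system, 20, eps), 4),
          round(spread_for_epsilon(spiky_system, 20, eps), 4))
```

For the smooth toy system the spread grows gradually as epsilon widens. For the spiky one it jumps as soon as epsilon reaches past the edge of the spike, which is the warning sign described above.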

Comments?

shakyamuni · Post by **shakyamuni** » Tue Sep 21, 2004 10:41 pm

Forum Mgmnt, I like the idea of the "Automated Robustness Checking."

Forum Mgmnt · Post by **Forum Mgmnt** » Wed Sep 22, 2004 8:45 am

"How do I know that this 'proposed set of parameter values' is within the optimal robust range in the first place? What if this parameter set ends up looking robust and performs well but there is some other set elsewhere in the parameter set that is superior?"

Ah, now you get to the real point of all of this, "Automatic Optimization" where what you are looking for is NOT the peak of the test, but the best Robust Peak.

If we have a mechanism for coming up with automated robustness checking and some numeric robustness measure suited to individual tastes, then we can do automatic optimization in stages like:

1) Auto-optimize the Parameters to find some candidate peaks using genetic algorithms or advancing granularity iteration.

2) Run Automatic Robustness Checks on those peaks

3) Use this information for another genetic optimization pass where we use the result of the Automatic Robustness Check as the goodness measure rather than the raw input.

I'd see the goodness measure being a Block Basic function where you had access to a standard set of measures like MAR, CAGR%, etc. You would be able to use simple weightings like:

Code:

goodness = 0.5 * (test.CAGRPercent / 50%) + 0.5 * test.MAR

but you could also do more sophisticated measures that used conditional logic for thresholds e.g. ignore CAGR% above 50% returns, or return 0 if the drawdown exceeds a certain threshold, etc.

Code:

' Constrain the return to 50%
returnRate = test.CAGRPercent
IF returnRate > 50% THEN returnRate = 50%

' Discard tests with drawdowns greater than 30%
IF test.MaxDrawdown > 30% THEN
    goodness = 0.0
ELSE
    goodness = 0.5 * (returnRate / 50%) + 0.5 * (test.MAR / 3.0)
ENDIF
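For readers who want to experiment outside the testing package, here is a rough Python equivalent of the second goodness function. It uses the capped return in the weighted sum, which is what the "Constrain the return to 50%" comment appears to intend. The function signature and default thresholds are assumptions based on the numbers in the post, with percentages passed as plain numbers (50.0 for 50%).

```python
def goodness(cagr_percent, mar, max_drawdown_percent,
             cagr_cap=50.0, mar_scale=3.0, drawdown_limit=30.0):
    """Weighted goodness score: half capped CAGR%, half scaled MAR."""
    # Discard tests whose drawdown exceeds the limit.
    if max_drawdown_percent > drawdown_limit:
        return 0.0
    # Ignore any return above the cap.
    capped = min(cagr_percent, cagr_cap)
    return 0.5 * (capped / cagr_cap) + 0.5 * (mar / mar_scale)

print(goodness(80.0, 1.5, 25.0))   # return capped at 50%
print(goodness(40.0, 2.0, 35.0))   # drawdown too deep, scores zero
```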

So the result of the Automatic Optimization would be the parameter set that was most likely to be at the center of parameter excellence, i.e. the set which represented those parameters which when varied randomly on average returned the best value for the Automatic Robustness Check using the goodness function.

That's really what I want for my own testing. I come up with an idea and then the computer automatically tells me the best robust performance I can expect with that system, probably plotted as a probability distribution for each of my favorite individual measures as well as my custom goodness function.

For this to work well, the robustness check ought to be comprehensive and include things like varying the start date, scrambling of the data inputs à la Monte-Carlo simulations, as well as the parameter robustness checks outlined above in my earlier post to this topic.

A lot of smart people are spending too much time on the mechanics of this stuff rather than on coming up with better trading ideas. This is a huge waste of potential.

I prefer to keep my mind on innovation and thinking rather than doing rote process, so this is the direction I'm taking VeriTrader because I'd much rather have a computer doing this stuff by itself, provided I can be confident in the decisions it makes along the way.

- Forum Mgmnt

Asterix · Post by **Asterix** » Wed Sep 22, 2004 2:34 pm

Wow!

I'd like to say I'm really impressed at the brain power on this thread.

Forum Mgmnt, if you can implement anything close to what you suggested in your last post, Veritrader will be head and shoulders above any other system trading development toolkit.

One question, on your last post you suggested as a 3rd step:

3) Use this information for another genetic optimization pass where we use the result of the Automatic Robustness Check as the goodness measure rather than the raw input.

Wouldn't the Auto Optimization just return the same parameters as you've identified with your Automatic Robustness Check? I'm wondering if you would be in a circular iteration at this point.

ksberg · Post by **ksberg** » Wed Sep 22, 2004 2:45 pm

TK wrote: I am probably a slow learner and feel a bit lost with your example and procedure for checking parameter robustness.

Let's say a system has 3 parameters and for each one you want to test ranges of values from 10 to 20. Your portfolio consists of 3 markets and you have 10 years of data. What is your step-by-step procedure for picking the best (robust) parameter values? I would appreciate your help on this one.

TK, sorry if I sped through this. There were two separate parts of my post: one dealing with multiple parameters, and the second showing an example of distribution around single parameter sensitivity. Given that I was talking about how one might construct and deal with multiple parameter distributions, I can't give you a step by step on that process (yet).

Let me share what I currently do. First off, I like to run wide-band optimization on single parameters. By this I mean that step size is high, and I'm looking for general patterns of local maxima stability. As Asterix suggested, I often zero in on a smaller segment and use a tighter step size afterwards. I will also compare 2 parameters by creating 3D heat maps, generally after finding some stability in each of the parameters. Using the same parameter ranges, I will then vary the time window and examine the 3D graph. Again, I'm looking for stable maxima, as discussed, except across multiple time frames. I will repeat this, rotating in other parameters (since I can only visualize at most 2 in 3D). The parameter choices are derived from mutually stable areas.

What I was suggesting with distributions is to collapse and speed up the entire process. The bands represent outcomes from multiple trials, like I'm doing manually, but for a single parameter. The concept and steps are as follows:

1. Pick a parameter from the range (e.g. range[10, 20], param=10)
2. Run the test on a smaller time window (e.g. 5 years)
3. Record the maximum for that parameter and time window
4. Slide the window forward
5. Repeat steps 2 through 4 until finished with period (e.g. 20 years)
6. Collect all the outcomes for what is maximal for that parameter, and create a distribution
7. Figure middle value and outliers for the distribution (median and bands)
8. Choose the next parameter, and repeat 1-7
9. Graph the entire result
10. Pick value that is best metric (e.g. highest CAGR) within stable bands
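The steps above can be sketched in Python as follows. This is a simplified illustration, not Kevin's actual code: `toy_backtest` is an invented stand-in for a real test run, the bands are taken at the 5th and 95th percentiles, "stable" is read as "band narrower than a chosen width", and step 3 is read as recording the metric for each window.

```python
from statistics import median, quantiles

def walk_forward_distribution(backtest, values, first_year, last_year, window=5):
    """Steps 1-8: for each parameter value, collect the metric over sliding windows."""
    summary = {}
    for value in values:
        outcomes = [backtest(value, start, start + window)
                    for start in range(first_year, last_year - window + 1)]
        cuts = quantiles(outcomes, n=20)  # cut points at 5%, 10%, ..., 95%
        summary[value] = {"median": median(outcomes),
                          "low": cuts[0], "high": cuts[-1]}
    return summary

def pick(summary, max_width):
    """Step 10: best median among values whose band is no wider than max_width."""
    stable = {v: s for v, s in summary.items()
              if s["high"] - s["low"] <= max_width}
    if not stable:
        return None
    return max(stable, key=lambda v: stable[v]["median"])

def toy_backtest(entry_days, start_year, end_year):
    """Hypothetical CAGR-like result for one window; wobbles more for longer entries."""
    wobble = ((start_year * 7) % 5 - 2) * 0.01 * (entry_days / 20)
    return 0.20 - 0.0004 * (entry_days - 20) ** 2 + wobble

summary = walk_forward_distribution(toy_backtest, range(10, 21), 1985, 2005)
for value, stats in summary.items():   # step 9: these rows are what gets graphed
    print(value, stats)
print("pick:", pick(summary, max_width=0.05))
```

With 3 parameters the same loop would run once per parameter, holding the others at their current choices, which is the manual rotation described earlier in this post.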

Hope that helps,

Kevin

TK · Post by TK » Wed Sep 22, 2004 3:09 pm

Thanks, Kevin. It's much clearer now. I appreciate it.

Regards,
TK
