Quantification of system robustness

Discussions about the testing and simulation of mechanical trading systems using historical data and other methods. Trading Blox Customers should post Trading Blox specific questions in the Customer Support forum.
Jez Liberty
Roundtable Knight
Posts: 123
Joined: Tue Nov 03, 2009 8:49 am
Location: London
Contact:

Post by Jez Liberty »

Thanks nodoodahs. And yes, I am taking all the comments in! ;-)

I personally really liked Taleb's Fooled by Randomness. I actually quite enjoyed the "rather arrogant" tone he takes (personal bias here: I might be a bit bitter about the "system", and it is reassuring to find some authority criticizing it).
I found The Black Swan a bit more heavy-going, but I picked it up again and really enjoyed it from the second part onwards - with the discrediting of the bell curve.

What I really enjoyed is The (Mis)behavior of Markets by Mandelbrot. You can see that many of Taleb's ideas seem to be derived from it. Even if you are not into fractals, the first part is a great presentation of "orthodox" financial theory and a great debunking of it. A really great read - I'd recommend it.
The second part is more Mandelbrot's work in progress on applying fractals to financial market theory - an interesting part too.
alp
Roundtable Knight
Posts: 309
Joined: Mon Aug 27, 2007 8:09 pm

Re: Quantification of system robustness

Post by alp »

nodoodahs wrote:[...]trend followers are fond of the long-option-like positive skew of this system type, but when the return profile is based on rare events, such as the once- or twice-a-decade trade, what can we really, STATISTICALLY, say about these returns? Anyone in the biz knows that the rarer the event, the harder it is to pin down its odds ... and "outlier-removed" performance may be a measure of robustness.
nodoodahs, how do you pick the outliers? Unusual R-multiple trades or best-performing years? Isn't trend following all about keeping on doing business to get the outliers?
nodoodahs wrote:So what if you have a bunch of non-robust systems? If you have systems that aren't robust, but their cycles of relative outperformance are uncorrelated or negatively correlated, you could trade the equity curves of your systems using moving averages and have a system of systems (which is, in fact, a system).
Looks interesting, at least in theory. In this case, I assume, one would treat the trading systems themselves as the trading instruments. As such, it seems that the problem of looking for the mythical "robustness" would be compounded by the assumption that the underlying systems will continue to perform as they did in the past.
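
For concreteness, here is a minimal sketch of the "trade the equity curve with a moving average" idea nodoodahs describes. All data, parameter values and function names below are hypothetical illustrations, not anyone's actual system.

Code: Select all
import numpy as np

def equity_curve_filter(daily_returns, ma_window=50):
    """Trade a system only while its equity curve sits above its own simple
    moving average; stand aside (earn 0) otherwise.

    daily_returns : 1-D array of the system's daily returns
    ma_window     : lookback of the moving average on the equity curve (assumed)
    """
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    filtered = np.zeros_like(equity)
    for t in range(ma_window, len(equity)):
        ma = equity[t - ma_window:t].mean()
        # yesterday's equity vs. its moving average decides today's exposure
        if equity[t - 1] > ma:
            filtered[t] = daily_returns[t]
    return filtered

# toy usage: random returns standing in for two uncorrelated systems
rng = np.random.default_rng(0)
sys_a = rng.normal(0.0004, 0.01, 2500)
sys_b = rng.normal(0.0004, 0.01, 2500)
combined = 0.5 * (equity_curve_filter(sys_a) + equity_curve_filter(sys_b))
print("combined cumulative return:", np.prod(1 + combined) - 1)

The only point of the sketch is the mechanism: each system is switched off while its own equity curve is below its moving average, and the equal-weight blend of the filtered streams is the "system of systems".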
alp
Roundtable Knight
Posts: 309
Joined: Mon Aug 27, 2007 8:09 pm

Post by alp »

The main points about living with uncertainty (the "Fourth Quadrant"):
1) Avoid Optimization, Learn to Love Redundancy. [...] Only fools (such as Banks) optimize, not realizing that a simple model error can blow through their capital (as it just did). [...] Biological systems—those that survived millions of years—include huge redundancies. Just consider why we like sexual encounters (so redundant to do it so often!). Historically populations tended to produce around 4-12 children to get to the historical average of ~2 survivors to adulthood.

Option-theoretic analysis: redundancy is like long an option. You certainly pay for it, but it may be necessary for survival.

2) Avoid prediction of remote payoffs—though not necessarily ordinary ones. Payoffs from remote parts of the distribution are more difficult to predict than closer parts.

A general principle is that, while in the first three quadrants you can use the best model you can find, this is dangerous in the fourth quadrant: no model should be better than just any model.

3) Beware the "atypicality" of remote events. There is a sucker's method called "scenario analysis" and "stress testing"—usually based on the past (or some "make sense" theory). Yet I show in the appendix how past shortfalls do not predict subsequent shortfalls. Likewise, "prediction markets" are for fools. They might work for a binary election, but not in the Fourth Quadrant. [...]

4) Time. It takes much, much longer for a time series in the Fourth Quadrant to reveal its property. At the worst, we don't know how long. [...] Things that have worked for a long time are preferable—they are more likely to have reached their ergodic states.

[...]

6) Metrics. Conventional metrics based on type 1 randomness don't work. Words like "standard deviation" are not stable and do not measure anything in the Fourth Quadrant. The same goes for "linear regression" (the errors are in the fourth quadrant), "Sharpe ratio", Markowitz optimal portfolio, ANOVA shmnamova, Least square, etc. Literally anything mechanistically pulled out of a statistical textbook.

My problem is that people can both accept the role of rare events, agree with me, and still use these metrics, which is leading me to test if this is a psychological disorder.

The technical appendix shows why these metrics fail: they are based on "variance"/"standard deviation", terms invented years ago when we had no computers. One way I can prove that anything linked to standard deviation is a facade of knowledge: there is a measure called Kurtosis that indicates departure from "Normality". It is very, very unstable and marred with huge sampling error: 70-90% of the Kurtosis in Oil, SP500, Silver, UK interest rates, Nikkei, US deposit rates, sugar, and the dollar/yen currency rate comes from 1 day in the past 40 years, reminiscent of figure 3. This means that no sample will ever deliver the true variance. It also tells us that anyone using "variance" or "standard deviation" (or worse, making models that make us take decisions based on them) in the fourth quadrant is incompetent.

7) Where is the skewness? Clearly the Fourth Quadrant can present left or right skewness. If we suspect right-skewness, the true mean is more likely to be underestimated by measurement of past realizations, and the total potential is likewise poorly gauged. A biotech company (usually) faces positive uncertainty, a bank faces almost exclusively negative shocks. I call that in my new project "concave" or "convex" to model error.

8 ) Do not confuse absence of volatility with absence of risks. Recall how the conventional metric of using volatility as an indicator of stability has fooled Bernanke—as well as the banking system.

[...]

9) Beware presentations of risk numbers. Not only do we have mathematical problems, but risk perception is subject to framing issues that are acute in the Fourth Quadrant. Dan Goldstein and I are running a program of experiments in the psychology of uncertainty and finding that the perception of rare events is subject to severe framing distortions: people are aggressive with risks that hit them "once every thirty years" but not if they are told that the risk happens with a "3% a year" occurrence. Furthermore, it appears that risk representations are not neutral: they cause risk taking even when they are known to be unreliable.
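
To put a rough number on point 6, here is a small sketch with simulated data (my illustration, not Taleb's calculation) of how a single extreme day can account for most of the measured kurtosis of a return series:

Code: Select all
import numpy as np

rng = np.random.default_rng(42)
# roughly 40 years of daily returns: ordinary noise plus one crash-sized day
returns = rng.normal(0.0, 0.01, 10_000)
returns[5_000] = -0.20          # a single hypothetical extreme day

def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

def kurtosis_share_of_worst_day(x):
    """Fraction of the fourth-moment sum contributed by the single largest
    deviation: the instability Taleb describes."""
    z = (x - x.mean()) / x.std()
    fourth = z ** 4
    return fourth.max() / fourth.sum()

print("excess kurtosis:", round(excess_kurtosis(returns), 1))
print("share of kurtosis from one day:", round(kurtosis_share_of_worst_day(returns), 2))

With these made-up numbers one day contributes on the order of 80% of the sample kurtosis, which is the same flavour of instability as the 70-90% figures quoted above.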
nodoodahs
Roundtable Knight
Posts: 218
Joined: Wed Aug 09, 2006 4:01 pm

Re: Quantification of system robustness

Post by nodoodahs »

alp wrote: nodoodahs, how do you pick the outliers? Unusual R-multiple trades or best-performing years? Isn't trend following all about keeping on doing business to get the outliers?
Good ol' box-plot analysis?

Rare events are harder to get good statistical odds for. If 80% of the profits come from 20% of the trades, that's one thing. If, on the other hand, 80% of the profits come from 1% of the trades, that's another thing, and it may say something about the robustness of your system.
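
As a rough illustration of both points, here is a hedged sketch on made-up trade P&Ls: the classic 1.5 * IQR box-plot rule to flag outlier trades, plus a check of how much of the profit comes from the top 1% of trades.

Code: Select all
import numpy as np

def boxplot_outliers(trade_pnls, k=1.5):
    """Flag trades outside the classic box-plot fences (k * IQR beyond the quartiles)."""
    pnls = np.asarray(trade_pnls, dtype=float)
    q1, q3 = np.percentile(pnls, [25, 75])
    iqr = q3 - q1
    return pnls[(pnls < q1 - k * iqr) | (pnls > q3 + k * iqr)]

def profit_concentration(trade_pnls, top_fraction=0.01):
    """Share of total profit coming from the best `top_fraction` of trades."""
    pnls = np.sort(np.asarray(trade_pnls, dtype=float))[::-1]
    n_top = max(1, int(len(pnls) * top_fraction))
    total = pnls[pnls > 0].sum()
    return pnls[:n_top].sum() / total if total > 0 else np.nan

# hypothetical trend-following R-multiples: many small losses, a few big wins
rng = np.random.default_rng(7)
trades = np.concatenate([rng.normal(-0.3, 0.5, 950), rng.normal(8.0, 3.0, 50)])
print("outlier trades flagged:", len(boxplot_outliers(trades)))
print("profit share of top 1% of trades:", round(profit_concentration(trades, 0.01), 2))

If the second number is close to 1, most of the edge rests on a handful of trades, which is exactly the 80%-from-1% situation described above.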
rabidric
Roundtable Knight
Posts: 243
Joined: Mon Jan 09, 2006 7:45 am

Post by rabidric »

What an excellent thread. I particularly agree with alp's comments.

For me I think that there is a danger of trying too hard to pull philosophical ideals out of all this.
In the end, it all boils down to a judgement exercise with no fixed constraints, but instead a bunch of evolved guidelines that we each carry, built up from our experience.
This is the true "large-sample" data set we should be trying to build, i.e. test, deploy, re-test, re-deploy, learn, change assumptions, test, etc., as much as possible - not agonizing too much over the ultimate market dataset.

I believe in getting pragmatic. It may be possible to arrive at a one-size-fits-all model that works better than any other model over eternity...
...but if it happens to suck* much worse than a bunch of other models in the current decade, then I will not trade it.

The answers to typical questions like "how many years of backtest data to include" are usually fairly broad yet simple once you realize that, beyond a point (which you have to judge using experience, etc.), it won't make much difference compared to other factors.

*the SUCK quotient cannot be explained, you have to see it for yourself
AFJ Garner
Roundtable Knight
Posts: 2071
Joined: Fri Apr 25, 2003 3:33 pm
Location: London
Contact:

Post by AFJ Garner »

rabidric wrote: For me I think that there is a danger of trying too hard to pull philosophical ideals out of all this. In the end, it all boils down to a judgement exercise with no fixed constraints, but instead a bunch of evolved guidelines that we each carry, built up from our experience.
I very much agree. Nor do I think any high degree of statistical or mathematical expertise is required in all this beyond some commonsense appreciation of such basics as adequate sample size and so forth. Which is fortunate, since I do not possess any.

Much of it boils down to a lot of common sense, a lot of hard work, testing and observation... and personal judgement - prediction, if you like.

And of course you have to keep your guard up and keep right on researching and thinking. And be willing to accept you are wrong when the evidence points that way.
alp
Roundtable Knight
Posts: 309
Joined: Mon Aug 27, 2007 8:09 pm

Post by alp »

AFJ Garner wrote:I very much agree. Nor do I think any high degree of statistical or mathematical expertise is required in all this beyond some commonsense appreciation of such basics as adequate sample size and so forth. Which is fortunate, since I do not possess any.
Same here. :) That's why we can hopefully count on nodoodahs to explain what Taleb means by "[...]Avoid Optimization, Learn to Love Redundancy. [...] Historically populations tended to produce around 4-12 children to get to the historical average of ~2 survivors to adulthood. Option-theoretic analysis: redundancy is like long an option. You certainly pay for it, but it may be necessary for survival. [...] Avoid prediction of remote payoffs—though not necessarily ordinary ones. Payoffs from remote parts of the distribution are more difficult to predict than closer parts."

AFJ Garner wrote:And of course you have to keep your guard up and keep right on researching and thinking. And be willing to accept you are wrong when the evidence points that way.
Keeping on researching, looking for evidence, and admitting one might be wrong can be almost humanly impossible.
alp
Roundtable Knight
Posts: 309
Joined: Mon Aug 27, 2007 8:09 pm

Post by alp »

A quote from the infamous chapter "What the Professionals Have Done" by Ralph Vince in The Handbook of Portfolio Mathematics:
[...]The concept of using an array of parameter values is also rather widely thought to help alleviate the problems of what parameter values to use in the future, based on historical testing. The thinking is that it is difficult to try to pinpoint what the very best parameter will be in the future. Since most of these systems are robust in terms of giving some level of positive performance against a wide spectrum of parameter values, by using an array of parameter values, fund managers hope to avoid selecting what in the future will be an outlier parameter value in terms of poor return. By using an array of parameter values, they tend to come in more toward what the typical parameter performance was in the past—which is acceptable performance.

Parameter optimization tends to be fraught with lots of questions at all levels. Though the concept of parameter optimization is, in effect, inescapable in this business, my experience as an observer here is that there is not much to it. Essentially, most people optimize over the entire data set they can get their hands on, and look at the results for parameter values over this period with an eye toward robustness and trying to pick that parameter value that, though not necessarily the peak one, is the one in a range of parameter values where all have performed reasonably well. Additionally, they tend to break down the long history of these runs into thirds, sometimes fourths. For example, if they have a 28-year history, they will look at things in 7-year increments—again, with the same criteria. Of note here is the question of whether they use the same parameter values from one market to another. Typically, they do not, using a different set of parameter values for different markets, though this is not universal. Additionally, as for how frequently they optimize and reestablish parameters, this too seems to be all over the board. Some do so annually, some do it considerably more frequently than that. Ultimately, this divergence in operations also seems to have little effect on performance or correlation with one another. There has been a trend in recent years to capture the characteristics of each individual market’s prices, then use those characteristics to generate new, fictitious data for these markets based on those characteristics. This is an area that seems to hold great promise. The notion of adding to a winning position, or pyramiding, is almost completely unseen among the larger fund managers.[...]
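
As a bare-bones illustration of the "array of parameter values" approach Vince describes, here is a sketch using a toy moving-average system on simulated prices. The system, the parameter set and the equal weighting are assumptions for illustration, not his.

Code: Select all
import numpy as np

def ma_system_returns(prices, lookback):
    """Daily returns of a toy system: long while price is above its
    `lookback`-day simple moving average, flat otherwise."""
    prices = np.asarray(prices, dtype=float)
    rets = np.diff(prices) / prices[:-1]
    pos = np.zeros(len(rets))
    for t in range(lookback, len(prices) - 1):
        ma = prices[t - lookback + 1:t + 1].mean()
        pos[t] = 1.0 if prices[t] > ma else 0.0   # position at t earns the t -> t+1 return
    return pos * rets

# instead of picking one "best" lookback, run an array of them and average
rng = np.random.default_rng(1)
prices = 100 * np.cumprod(1 + rng.normal(0.0003, 0.01, 2000))
lookbacks = [20, 40, 60, 80, 100, 120]            # assumed parameter array
per_param = np.array([ma_system_returns(prices, lb) for lb in lookbacks])
blended = per_param.mean(axis=0)                  # equal weight across parameter values
print("summed daily return per lookback:", {lb: round(r.sum(), 3) for lb, r in zip(lookbacks, per_param)})
print("blended parameter-array summed return:", round(blended.sum(), 3))

The blended result tends to land near the typical single-parameter result, which is the "come in more toward what the typical parameter performance was" effect described in the quote.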