RedRock wrote: Yes. Less is more in the future, usually.
Useful search terms include "bias-variance tradeoff", "regularization", "overfitting", "Bayesian estimation", and "ridge regression".
In machine learning there are often two levels of fitting a model to data, which requires the data to be split into three lots, only *one* of which is used to directly fit the model's parameters. At the lowest level you optimize the model's parameters. At a higher level you optimize the structure of the model, including the number of active parameters.
Code: Select all
for number of active parameters (NAP) from 1 by 1 until NAP greater than X
    for various parameter values
        measure performance of model with these parameters
    end
    pick the best parameters for this NAP
end
pick the best model (i.e. the best NAP)
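To make the two-level loop concrete, here is a minimal runnable sketch in Python. It is not from the original post: the toy data, the use of polynomial degree as the "number of active parameters", and closed-form ridge regression for the low-level fit are all my own illustrative choices. The 50:30:20 split follows the paragraph below.

Code: Select all
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy sine curve standing in for whatever you are really modelling.
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + rng.normal(0, 0.3, x.size)

# 50:30:20 split into train / validation / test sets.
idx = rng.permutation(x.size)
tr, va, te = idx[:100], idx[100:160], idx[160:]

def design(x, degree):
    # Polynomial features; 'degree' plays the role of the number of active parameters.
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    # Closed-form ridge regression: the low-level parameter fit.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

best = None
for degree in range(1, 11):                        # structure: number of active parameters
    for lam in (1e-3, 1e-2, 1e-1, 1.0, 10.0):      # structure: regularization strength
        w = ridge_fit(design(x[tr], degree), y[tr], lam)    # low level: fit on train set
        err = mse(design(x[va], degree), y[va], w)          # high level: score on validation set
        if best is None or err < best[0]:
            best = (err, degree, lam, w)

err, degree, lam, w = best
test_err = mse(design(x[te], degree), y[te], w)             # final check on the test set
print(f"chosen degree={degree}, lambda={lam}, validation MSE={err:.4f}, test MSE={test_err:.4f}")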
Typically you split the data into three sets. One set (the training set) is used to optimize the low level. The second set (the validation set) is used to optimize the high level. The third set (the test set) is used to get an idea of what the real performance is likely to be. You might split the data among the sets 50:30:20.
It is a bit tricky to split the data. Because market data are serially correlated, it is usually better to split along time spans rather than individual observations. One approach is to break the data up into slabs of months, with, say, a mean length of 12 months and a standard deviation of 6 months (a sketch of this follows).
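Here is one way the slab-based split could look in Python; the mean/SD block lengths and the 50:30:20 weights are the ones mentioned above, while the function name and the label encoding are made up for illustration.

Code: Select all
import numpy as np

rng = np.random.default_rng(1)

def random_time_blocks(n_months, mean_len=12, sd_len=6, weights=(0.5, 0.3, 0.2)):
    # Partition n_months of history into contiguous slabs whose lengths are drawn
    # with mean 12 months and SD 6 months, then assign each slab at random to
    # train (0), validation (1) or test (2) with roughly 50:30:20 weights.
    labels = np.empty(n_months, dtype=int)
    start = 0
    while start < n_months:
        length = max(1, int(round(rng.normal(mean_len, sd_len))))
        end = min(start + length, n_months)
        labels[start:end] = rng.choice(3, p=weights)
        start = end
    return labels

labels = random_time_blocks(110 * 12)                  # e.g. 110 years of monthly data
print(np.bincount(labels, minlength=3) / labels.size)  # roughly 0.5, 0.3, 0.2 on average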
You can perhaps squeeze more out of the data by running the whole process many times, with a different random partition of the data each time. The output is a distribution of outcomes. This will give you a sense of how stable your model-building process is and how reliable the predicted performances are. Warning: this process is likely to be enlightening but depressing at the same time.
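Continuing the toy example from the sketch above, this is roughly what re-running the whole procedure over many random partitions could look like; again, the data and model are stand-ins, and the interesting output is the spread of the test-set results.

Code: Select all
import numpy as np

rng = np.random.default_rng(3)

# Toy data as in the earlier sketch (a stand-in for real returns data).
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + rng.normal(0, 0.3, x.size)

def design(x, degree):
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

test_errors = []
for trial in range(100):                    # a different random partition each time
    idx = rng.permutation(x.size)
    tr, va, te = idx[:100], idx[100:160], idx[160:]
    best = None
    for degree in range(1, 11):
        for lam in (1e-3, 1e-2, 1e-1, 1.0, 10.0):
            w = ridge_fit(design(x[tr], degree), y[tr], lam)
            err = mse(design(x[va], degree), y[va], w)
            if best is None or err < best[0]:
                best = (err, degree, lam, w)
    _, degree, lam, w = best
    test_errors.append(mse(design(x[te], degree), y[te], w))

test_errors = np.array(test_errors)
print(f"test MSE over 100 partitions: mean={test_errors.mean():.4f}, sd={test_errors.std():.4f}")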
Some learning technologies, such as Support Vector Machines, inherently deal with the bias-variance tradeoff. Others have parameters that allow you to control it. Often the controlling factor is called 'lambda': effectively you apply a cost or fee to the parameters in your model, either per additional parameter or on their size (as in ridge regression).
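For concreteness, the penalised objective in ridge regression (the standard textbook form, not anything specific to this post) looks like this, where the second term is the lambda "fee" on the parameters:

Code: Select all
\min_{w}\ \sum_{i=1}^{n} \bigl(y_i - w^{\top} x_i\bigr)^2 \;+\; \lambda \sum_{j=1}^{p} w_j^{2}

Larger lambda pushes the fitted parameters towards zero, which increases bias but reduces variance; lambda = 0 gives you back ordinary least squares.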
See for example (all this material requires a fair bit of mathematics unfortunately):
http://www.ml-class.org/course/resource ... -materials
or
http://cs229.stanford.edu/materials.html
or
(lectures from above notes)
http://www.youtube.com/watch?v=UzxYlbK2 ... ure=relmfu
or
http://www.youtube.com/user/mathematica ... A0D2E8FFBA
As I mentioned in an earlier posting, the Vapnik-Chervonenkis (VC) theorems guarantee that your optimized model will be probably close to correct, subject to its bias, and based on certain assumptions.
That is to say, a model's biases - its assumptions and limitations - create a source of errors that no amount of data can overcome. On top of that, VC theory says that there is random error that can be reduced to any required level given enough data of the right kind.
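For the curious, one standard form of the VC bound (the textbook statement, not something from this post) says that, with probability at least 1 - eta,

Code: Select all
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\; \sqrt{\frac{h\,\bigl(\ln(2n/h) + 1\bigr) - \ln(\eta/4)}{n}}

where n is the number of (independent, identically distributed) training examples and h is the VC dimension of the model class. The square-root term shrinks as n grows, which is the random error that more data can reduce; the bias of the chosen model class sits outside the bound entirely.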
The caveat about the right kind of data is the killer. The requirement is that the data you have is drawn at random from the target distribution. This means that if the structure of returns is changing, or if your data is not representative, then your results can be way off.
For example let's say you develop a stock market timing model based on US and Australian data for the last 110 years, as I did. These are two of the best-performing stock markets during that time. And it was a time when overall economic growth in the world was very high.
If you then attempt to apply this model to the future, it may not work so well for average stock markets, or in a hypothetical future world where economic growth is sluggish.