A friend loaned me the book Black Swan by Taleb. It reinforces that the risk in markets is much greater than a normal distribution would suggest. I did some research to verify this myself. Using IBM daily close prices from 1962 to 2012 from Yahoo! Finance, I first plotted percentage price changes in a histogram (see attached). It looks pretty similar to a normal distribution, i.e., a bell curve, although the upward bias of the stock market is evident in the much greater number of 1% to 2% moves as opposed to -1% to -2% moves. Curiously, the average of the price changes is slightly negative, which indicates there must be a large number of -1% to 0% price changes.
I then calculated the standard deviation (sigma) of the price changes, which equals almost 2%. In a normal distribution 68.2% of observations occur within 1 sigma of the mean, so if IBM price changes are normally distributed, then 68.2% of price changes should lie between -2% and 2%. In fact, 83.7% of IBM's price changes are within 1 sigma of the mean. This indicates that the distribution of IBM's price changes exhibits positive kurtosis, which appears as a taller, narrower hump than the hump of the normal distribution. It also seems to indicate that IBM is in fact LESS risky than a hypothetical investment that exhibits a normal distribution.
However, when you look at much larger moves, you find that they are much more probable in IBM's price distribution than in a normal one. For example, the probability of a greater than 5-sigma move on any given day for a normally distributed hypothetical investment is 1 in 1.74 million. The historical probability of a similar move, that is, up or down more than 10% in one day, by IBM? About 1 in 500. For a greater than 6-sigma move? Normal: 1 in 506 million. IBM: 1 in 900. Historically, IBM was 562,000 times more likely to exhibit a greater than 6-sigma move on any given day than a normal distribution would suggest.
This greater frequency of large moves in IBM's price distribution is described as "fat tails" to the left and right of the hump. For example, you see that IBM closed up more than 15% nine times throughout the data period, which is a far, FAR greater frequency than expected if IBM prices were normally distributed. You could say that these bigger moves pose more risk for the trader and that the fat tails in IBM's price distribution make IBM much more risky than a hypothetical, normally distributed investment.
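For anyone who wants to check the normal-distribution odds quoted above, here is a minimal Python sketch. It only covers the theoretical (normal) side, not the IBM data, and the function name is my own choice for illustration.

```python
import math

def two_sided_normal_tail(k: float) -> float:
    """Probability that a normal variable lands more than k sigma from its mean (either side)."""
    # For a standard normal Z, P(|Z| > k) = erfc(k / sqrt(2))
    return math.erfc(k / math.sqrt(2))

for k in (1, 3, 5, 6):
    p = two_sided_normal_tail(k)
    print(f"> {k}-sigma: p = {p:.3e}, about 1 in {1 / p:,.0f}")
```

Dividing the 6-sigma odds by the observed 1 in 900 gives roughly the 562,000 multiple mentioned above.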
So how does this impact system design? For one, any statistical measurements that assume that price changes are normally distributed and that assign a probability to a particular degree of move grossly underestimate the actual risk if that move is more than 3-sigma. More broadly, the lesson here is that before utilizing tools that assume a normal distribution, make sure what you are measuring is ACTUALLY normally distributed; otherwise you have to take your probability estimates with a grain of salt.
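One rough way to run that check is to compute the two numbers used in this post, the share of observations within 1 sigma and the excess kurtosis, and compare them with what a normal distribution gives (about 68% and 0). The sketch below uses made-up data rather than IBM prices: `normality_report` is a hypothetical helper, and the "mixed" series (mostly quiet days with an occasional day that is 5x more volatile) is just an assumption to produce fat tails.

```python
import random
import statistics

def normality_report(changes):
    """Return (share within 1 sigma of the mean, excess kurtosis) for a list of price changes."""
    mu = statistics.fmean(changes)
    sigma = statistics.pstdev(changes)
    within = sum(abs(x - mu) <= sigma for x in changes) / len(changes)
    fourth_moment = sum((x - mu) ** 4 for x in changes) / len(changes)
    excess_kurtosis = fourth_moment / sigma ** 4 - 3.0  # 0 for a normal distribution
    return within, excess_kurtosis

rng = random.Random(42)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
# Hypothetical fat-tailed series: 10% of days are drawn with 5x the volatility
mixed = [rng.gauss(0.0, 5.0 if rng.random() < 0.10 else 1.0) for _ in range(50_000)]

print("gaussian:", normality_report(gaussian))
print("mixed:   ", normality_report(mixed))
```

The mixed series should show the same pattern as the IBM histogram: more observations than expected inside 1 sigma, together with a large positive excess kurtosis.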
Still learning. Need to learn more about what tools you can use to measure probability and determine statistical significance for non-normally distributed data series.
Kurtosis and fat tails in IBM's price changes
Attachment: IBM Histogram.jpg
Your studies will quickly lead you to the mighty central limit theorem, which gives the surprising result that the sum of N independent random variables tends towards the normal (aka Gaussian) distribution (as N gets larger and larger), regardless of the distributions of the independent random variables themselves. They could even have the (empirical) distribution you observed for IBM stock price changes, as long as they are independent.
Here's a nice introduction from the University of California at Santa Cruz: pdf file. See just above equation (19), where they say:
This amazing result is the reason why the normal (Gaussian) distribution plays such an important role in statistics.
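To make "sum of N independent random variables" concrete, here is a small simulation sketch. Each trial draws N numbers independently and adds them up, and the distribution of those totals is what the theorem is about. The exponential distribution is only an assumption chosen because it is clearly skewed and non-normal; the function names are hypothetical.

```python
import random
import statistics

def share_within_one_sigma(values):
    """Share of values within one standard deviation of their mean (about 0.683 if normal)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(v - mu) <= sigma for v in values) / len(values)

def simulate_sums(n_terms, trials, rng):
    """Each trial adds up n_terms independent draws from a skewed exponential distribution."""
    return [sum(rng.expovariate(1.0) for _ in range(n_terms)) for _ in range(trials)]

rng = random.Random(7)
for n_terms in (1, 5, 50):
    sums = simulate_sums(n_terms, 10_000, rng)
    print(n_terms, round(share_within_one_sigma(sums), 3))
```

With a single term the share within 1 sigma sits well above the normal value, and it should drift towards roughly 68% as more terms are added to each sum.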
Your comments raise a few questions for me. What do you mean by the "sum" of independent random variables? Is the central limit theorem saying that any independent random variable will tend toward a normal distribution as N grows larger?
Thanks for the PDF, but unfortunately I understood very little of the math. I need to gets me some familiarity in math speak.

sluggo wrote: This amazing result is the reason why the normal (Gaussian) distribution plays such an important role in statistics.
It can be useful to be fully aware that not every base distribution converges to a normal distribution. It converges if the base distribution has finite variance, and even some rare distributions with infinite variance can converge. Some of these base distributions that do not converge to a normal distribution seem to occur in finance.
Even when a base distribution does converge, the center converges first and the tails only later. The tails can converge quite slowly, too slowly to be of practical significance. The width of the area that looks like a normal distribution generally grows only with the square root of N. Nassim Taleb has written copiously on this point.
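A rough way to see the slow tails is to sum draws from a fat-tailed distribution that still has finite variance and count how often the sum lands beyond 4 sigma. The sketch below is an illustration under assumptions of my own: a Student-t distribution with 3 degrees of freedom as the base distribution, 20 terms per sum, and hypothetical function names.

```python
import math
import random

def student_t3(rng):
    """One draw from a Student-t distribution with 3 degrees of freedom (variance 3)."""
    chi2 = rng.gammavariate(1.5, 2.0)  # chi-square with 3 degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)

def tail_frequency(n_terms, k_sigma, trials, rng):
    """Share of trials in which the sum of n_terms draws lands more than k_sigma from zero."""
    sigma_of_sum = math.sqrt(3.0 * n_terms)  # true standard deviation of the sum
    hits = 0
    for _ in range(trials):
        total = sum(student_t3(rng) for _ in range(n_terms))
        if abs(total) > k_sigma * sigma_of_sum:
            hits += 1
    return hits / trials

rng = random.Random(11)
observed = tail_frequency(n_terms=20, k_sigma=4.0, trials=10_000, rng=rng)
normal = math.erfc(4.0 / math.sqrt(2.0))  # two-sided 4-sigma probability under a normal
print(f"observed {observed:.5f} vs normal {normal:.5f}")
```

Even after adding 20 independent terms, the 4-sigma frequency should still come out well above the normal figure, which is the practical point about tails converging late.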
Wow, this Khan Academy website is a goldmine. Thanks D.
So, just because the means and sums of independent random variables converge to a normal distribution, this doesn't vindicate the use of statistics that rely on a normal distribution for a raw distribution that is non-normal, right? If I was trading the average or sum of prices then maybe...
Feeling my way through this.
I'll add that you can also look into parameter-free or empirically based modelling so as to reduce Gaussian assumptions in your model building and analysis.
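As a minimal sketch of what the empirical approach can look like, the functions below read tail probabilities and quantiles straight off the observed sample instead of deriving them from a fitted normal. The function names and the ten sample values are hypothetical, purely for illustration.

```python
import math

def empirical_tail_probability(changes, threshold):
    """Share of observations whose absolute move exceeds threshold (no distribution assumed)."""
    return sum(abs(x) > threshold for x in changes) / len(changes)

def empirical_quantile(changes, q):
    """q-th quantile read directly off the sorted sample (nearest-rank method)."""
    ordered = sorted(changes)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

# Hypothetical daily price changes, expressed as fractions (0.01 = 1%)
changes = [-0.12, -0.03, -0.01, 0.00, 0.01, 0.01, 0.02, 0.02, 0.04, 0.11]
print(empirical_tail_probability(changes, 0.10))  # share of days with a move beyond 10%
print(empirical_quantile(changes, 0.05))          # 5% worst-case daily change in the sample
```

The obvious limitation is that a sample can only report moves that have already happened, so the estimates are only as good as the history behind them.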
Regarding the CLT: it is fantastic, yes. But keep in mind the assumptions behind it, particularly the requirement that the N random variables being summed are independent for the sum to converge to a normal distribution. In reality, they are not always so independent.
Those assumptions can and will fail at the worst time(s) in financial markets.
That being said, there is certainly a HUGE amount of wisdom gained by understanding the CLT.
sunyata wrote: Could you expound on that a little, squaredQ? What knowledge do you think there is to gain in understanding the CLT as it relates to trading?
Aside from the limitations mentioned above, it is also a very useful conclusion that allows you to model and evaluate traditional statistical properties of estimators (like SE(mean(rtns)), SE(variance(rtns)), confidence intervals, etc.) that require normality in the distributions. By only drawing conclusions from one non-normal sample, as you did (in your excellent work), there are limits to what can be said (statistically) about your resulting conclusions.
But there are other very useful conclusions about the property (applied towards multiple assets, trading systems, etc.) as well that I won't go into here. Keep in mind, there are many different versions of the CLT as well.
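To illustrate the SE(mean(rtns)) idea, here is a minimal sketch of a normal-theory confidence interval for the mean return. It leans on the CLT, so it assumes the returns are independent with finite variance. The function name, the sample returns, and the choice of z = 1.96 (roughly a 95% interval) are my own assumptions.

```python
import math
import statistics

def mean_confidence_interval(returns, z=1.96):
    """Mean, standard error, and a normal-theory confidence interval for the mean."""
    n = len(returns)
    mean = statistics.fmean(returns)
    se = statistics.stdev(returns) / math.sqrt(n)  # SE(mean) = sample std dev / sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)

# Hypothetical returns, for illustration only
rtns = [0.01, -0.02, 0.015, 0.00, -0.005, 0.02, -0.01, 0.005]
mean, se, (low, high) = mean_confidence_interval(rtns)
print(f"mean {mean:.5f}, SE {se:.5f}, interval ({low:.5f}, {high:.5f})")
```

With only a handful of observations, or with strongly dependent returns, the interval is more of a rough guide than a guarantee.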