Main Question:
How can we statistically assess whether a change to a strategy results in a significant improvement and can/should be applied to any portfolio?
There are probably many ways to test this, but I would be interested if anyone with a bit more math (statistics) knowledge than me could tell me whether the following seems statistically correct or has obvious mistakes. More details are in the attached Excel file.
Background:
Stock trading system, Simulation over 21 years. Portfolio of 1015 S&P stocks including delisted.
We want to assess whether a change to the system results in a significant improvement to the MAR. What we change in the system doesn't really matter; in this case it is about the methodology used. Note: In the example, the "change" is moving from a strategy with no rebalancing between entry and exit to a strategy with ongoing rebalancing based on position size and risk. We use the MAR ratio as the relevant output, but we could use any output.
Initial simulation:
Includes the entire portfolio (1015 stocks) over 21 years and results in a significant improvement in MAR from "before" (0.34) to "after" (0.51) the change.
Main Issue:
Maybe a small number of stocks drive the improvement? How to statistically check if the change to our strategy results in an improvement across a large set of portfolios? Again, I am sure there are many ways, but is the following a correct application of statistics?
Step 1.
We generate a number of Random Portfolios (in the example 20 portfolios), each portfolio consists of 200 randomly selected stocks out of the entire "population" of 1015 stocks.
Step 2.
We run the "before" and "after" simulations on each of the 20 portfolios, i.e. 40 simulations in total (giving 40 MAR observations).
Step 3.
We calculate the difference in MAR from "before" to "after" the change for each portfolio (i.e. 20 data points of difference in MAR). We then calculate the mean and standard deviation of this sample. In the example, the sample mean is +0.028 MAR (7.1%), i.e. MAR improves on average by 7.1%, with a standard deviation of 0.063 MAR (~15%).
Step 4.
Calculate the 95% confidence interval using the t-statistic (is this correctly applied...?). In the example, I arrive at a margin of error of 7.2%, giving a confidence interval of -0.1% to +14.3%.
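For anyone who wants to reproduce Steps 1 to 4 outside Excel, here is a minimal Python sketch. It is only an illustration: the ticker names are placeholders, the function names are made up for this example, and the t critical values are hardcoded from a standard t table rather than computed. It plugs in the summary statistics quoted above instead of real backtest output.

```python
import math
import random

# Two-sided 95% critical values of Student's t from standard tables,
# keyed by degrees of freedom (n - 1). Only the cases used here are listed.
T_CRIT_95 = {19: 2.093, 29: 2.045}

def random_portfolios(universe, n_portfolios, size, seed=0):
    """Step 1: draw portfolios of `size` distinct stocks from the universe."""
    rng = random.Random(seed)
    return [rng.sample(universe, size) for _ in range(n_portfolios)]

def paired_t_interval(mean_diff, sd_diff, n):
    """Steps 3-4: 95% confidence interval for the mean of n paired differences."""
    margin = T_CRIT_95[n - 1] * sd_diff / math.sqrt(n)
    return mean_diff - margin, mean_diff + margin

# Placeholder tickers standing in for the 1015-stock universe.
universe = [f"STOCK{i}" for i in range(1015)]
portfolios = random_portfolios(universe, n_portfolios=20, size=200)

# Summary statistics quoted in the post, in MAR units.
low, high = paired_t_interval(mean_diff=0.028, sd_diff=0.063, n=20)
print(f"95% CI for the change in MAR: {low:+.4f} to {high:+.4f}")
```

With these inputs the lower bound comes out slightly below zero. Note that the t interval assumes the 20 differences are independent draws; portfolios of 200 stocks sampled from the same 1015 will overlap and share the same 21-year period, so the real interval may be wider than this calculation suggests.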
Conclusion:
With 95% confidence, the change to the system results in an improvement in MAR of between -0.1% (no improvement!) and +14.3%. In other words, with 97.5% confidence the improvement is greater than -0.1%, and the change can (should) be applied to any portfolio, or at least to any random portfolio within the universe of 1015 S&P stocks over the 21-year period.
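The same conclusion can be cross-checked by looking at the t-statistic directly. The sketch below is an illustration only: it uses the summary statistics quoted in Step 3, hardcodes critical values for 19 degrees of freedom from a standard t table, and assumes the 20 differences are independent.

```python
import math

# Summary statistics quoted in the post (MAR units, 20 portfolios).
mean_diff, sd_diff, n = 0.028, 0.063, 20

# Critical values of Student's t with 19 degrees of freedom, from standard tables.
T_TWO_SIDED_95 = 2.093  # leaves 2.5% in each tail
T_ONE_SIDED_95 = 1.729  # leaves 5% in the upper tail

# t-statistic for the null hypothesis "mean difference is zero".
t_stat = mean_diff / (sd_diff / math.sqrt(n))

print(f"t = {t_stat:.3f}")
print("two-sided 95% interval excludes zero:", t_stat > T_TWO_SIDED_95)
print("one-sided 5% test finds an improvement:", t_stat > T_ONE_SIDED_95)
```

The distinction matters here because the result is borderline: the t-statistic sits between the two thresholds, so the answer depends on whether the question is "is there any change?" (two-sided) or "is there an improvement?" (one-sided).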
Views much appreciated!
Confidence Interval around a system change
Attachment: Confidence Interval around a change to a system.xlsx
Re: Confidence Interval around a system change
Hi,
I use the Z score (as shown on your last sheet) to compare the before and after results.
( I am not a statistical whiz, so I keep it simple. )
One observation:
MAR is an inadequate measure because it uses the single largest DD.
This can force you to reject an otherwise desirable solution due to an outlier.
The Sortino Ratio (semi-deviation of DDs) is more stable and less sensitive to outliers.
If you use the Sharpe Ratio as the measure, the minimum hedge fund expectation is
AR / AV > 1
where AR = Average daily return; AV = Average daily volatility
AR / AV >= 3 is "possible" according to Perry Kaufman "but unusually good"
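To make the comparison between these measures concrete, here is a rough Python sketch. The function names and the toy return series are invented for illustration. MAR is taken as compound annual growth divided by maximum drawdown, AR / AV as defined above, and the Sortino-style ratio uses the conventional downside deviation of returns below a target of zero, which may not be exactly the drawdown-based variant meant above.

```python
import math
import statistics

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

def mar_ratio(returns, periods_per_year=252):
    """Compound annual growth rate divided by the maximum drawdown."""
    total = math.prod(1.0 + r for r in returns)
    cagr = total ** (periods_per_year / len(returns)) - 1.0
    return cagr / max_drawdown(returns)

def ar_over_av(returns):
    """Average daily return divided by daily volatility (AR / AV)."""
    return statistics.mean(returns) / statistics.stdev(returns)

def sortino_like(returns, target=0.0):
    """Mean excess return divided by the downside deviation below `target`."""
    downside_sq = [min(0.0, r - target) ** 2 for r in returns]
    return (statistics.mean(returns) - target) / math.sqrt(statistics.mean(downside_sq))

# Toy data: one year of small alternating daily returns, then the same
# series with a single -10% day inserted to act as the outlier.
base = [0.002, -0.001] * 126
shock = list(base)
shock[100] = -0.10

for name, series in (("base", base), ("with one shock day", shock)):
    print(name, mar_ratio(series), ar_over_av(series), sortino_like(series))
```

Running it on your own daily returns, with and without the worst day, shows how much each measure depends on a single event.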
Leslie
Re: Confidence Interval around a system change
Thanks for your input. After speaking to people with more statistics expertise, I am implementing some changes, which I might post here later.
Having looked into your comment on MAR, I believe it is fine to use. There might be some outliers, but as long as the simulation output we decide to look at (MAR, Sharpe, ARR...) can be treated as normally distributed (which we can check using a chi-square test, for example) and we have a sufficient number of data points (i.e. test results) from random portfolios, we can apply a confidence interval calculation to assess whether the system change results in a meaningful improvement.
We need to run tests before and after a system change on a meaningful number (30+) of randomly generated portfolios.
In any case, if one has a preferred metric (other than MAR for example), the same calculations can be applied. I would be surprised if you come to a different conclusion unless the system change results in a relatively minor improvement (in which case it's questionable if you want to implement it anyway).
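As a rough illustration of the normality check mentioned above, here is a small chi-square goodness-of-fit sketch in Python. Everything in it is an assumption made for the example: the function name, the choice of six equal-probability bins, the hardcoded 5% critical value (7.815 for 3 degrees of freedom, i.e. 6 bins minus 1 minus 2 estimated parameters), and the synthetic data standing in for real MAR differences.

```python
import random
import statistics

CHI2_CRIT_95_DF3 = 7.815  # 5% critical value, chi-square with 3 degrees of freedom

def chi_square_normality(data, n_bins=6):
    """Chi-square goodness-of-fit statistic against a normal fitted to `data`.

    Bins are chosen to have equal probability under the fitted normal,
    so every bin expects len(data) / n_bins observations.
    """
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    edges = [fitted.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    observed = [0] * n_bins
    for x in data:
        # The bin index is the number of edges lying strictly below x.
        observed[sum(x > e for e in edges)] += 1
    expected = len(data) / n_bins
    return sum((o - expected) ** 2 / expected for o in observed)

# Synthetic stand-in for 30 MAR differences; replace with real test results.
rng = random.Random(1)
diffs = [rng.gauss(0.028, 0.063) for _ in range(30)]

stat = chi_square_normality(diffs)
print(f"chi-square = {stat:.2f}, reject normality at 5%: {stat > CHI2_CRIT_95_DF3}")
```

A statistic below the critical value only means normality is not rejected; with around 30 data points the test has limited power, so it is worth looking at a histogram of the differences as well.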