# Equity Curve Analysis: A Simple Statistical Test

**Note:** One of our readers pointed out a more sophisticated method that applies to this article: running a regression, which is briefly touched upon at http://en.wikipedia.org/wiki/T_Test

One method of determining when a trading strategy is breaking down is to run a statistical test. When I use the term “breaking down,” I am referring to the recent profitability of the strategy being significantly different from its historical average. However, it may also refer to a strategy becoming more volatile than average relative to its profit per trade. Analyzing both the “mean” and the “variance” of a trading strategy in the most recent period versus its historical average is important in equity curve analysis. Another possible area of investigation is the recent trade win% versus the historical win%.

For this post we will look at the first issue: the possibility that the recent average profitability is significantly different from the historical average. We will therefore assume that the trading strategy still has the same variance, or volatility, as measured historically. Enter the two-sample t-test, which is used to determine whether two samples from the same trading strategy are similar or statistically different. (Strictly speaking, a “paired” t-test needs matched observations and equal sample sizes, so the independent two-sample version is the one that fits here.) In this case I have used the equation that allows unequal sample sizes, because in most cases traders would run a 5 or 10 year backtest and would want to evaluate a strategy that is currently undergoing some form of drawdown or deterioration (i.e. you aren’t going to wait 5 or 10 years before you test again!):

From Wikipedia: http://en.wikipedia.org/wiki/T_Test

Unequal sample sizes, equal variance

This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) The *t* statistic to test whether the means are different can be calculated as follows:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{X_1X_2} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$S_{X_1X_2} = \sqrt{\frac{(n_1 - 1)S_{X_1}^2 + (n_2 - 1)S_{X_2}^2}{n_1 + n_2 - 2}}$$

Note that the formulae above are generalizations of the case where both samples have equal sizes (substitute *n* for *n*_{1} and *n*_{2} and you’ll see).

$S_{X_1X_2}$ is an estimator of the common standard deviation of the two samples: it is defined in this way so that its square is an unbiased estimator of the common variance whether or not the population means are the same. Here $\bar{X}_1$ and $\bar{X}_2$ are the sample means, and $S_{X_1}^2$ and $S_{X_2}^2$ are the unbiased sample variances of the two groups. In these formulae, *n* = number of participants, 1 = group one, 2 = group two. *n* − 1 is the number of degrees of freedom for either group, and the total sample size minus two (that is, *n*_{1} + *n*_{2} − 2) is the total number of degrees of freedom, which is used in significance testing.
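To make the pooled-variance formula concrete, here is a minimal Python sketch that compares a short list of recent trades against a longer backtest history. The function name `pooled_t_statistic` and the trade figures are made up purely for illustration; they are not from any real strategy, and this is only one way the calculation could be set up.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic assuming equal variances (pooled estimate)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)
    # statistics.variance uses the n - 1 (unbiased) denominator.
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    dof = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two variances.
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / dof)
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, dof


# Hypothetical per-trade profits (fixed bet size), for illustration only.
historical_trades = [120, -80, 95, 60, -40, 150, -30, 75, 110, -55, 90, 45]
recent_trades = [-60, 20, -90, 35, -45, -70]

t_stat, dof = pooled_t_statistic(recent_trades, historical_trades)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A negative *t* here means the recent trades average less than the historical ones. The statistic would then be compared against a t distribution with *n*_{1} + *n*_{2} − 2 degrees of freedom to judge significance.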

Note that this can also be applied to determine whether, say, two mean-reversion systems like DV2 and RSI2 are significantly different from each other.

There are some issues with the t-test, and to me the assumption of normality (i.e. that the results of a given trading strategy are normally distributed) is certainly one of them. This flawed assumption has caused quants many problems in the past. The issue most hazardous to your account is the delay that may be introduced by waiting too long to turn a strategy off. One adjustment that can be made is to use some form of trailing stop or gradual reduction of exposure to a strategy even before it is “statistically significantly different than average.”

Other issues concern forms of conditional bias introduced by backtesting during only one “regime,” such as an uptrend. This will naturally screw up the test, as trading strategies often behave very differently across various “regimes.” Knowing in advance that certain strategies perform poorly when a regime change occurs gives you “early warning” to stop using them and thus helps you to avoid drawdowns. For example, shorting after two up days, which typically performs well, should be expected to perform worse at the onset of a new “uptrend,” perhaps defined by the 200-day MA or similar. Common sense in this regard can be a more proactive form of risk management than the reactive management achieved by trading the equity curve. It stands to reason, for example, that when the market is 3 standard deviations below its mean it will eventually snap back, and thus reducing exposure to a shorting strategy at that point is probably better than waiting for a drawdown to occur before cutting bait. This is also why the mean-reversion principles of overbought and oversold are valuable.

Thanks for this clear explanation of how we can apply stats analysis in trading.

I still have to learn much more about stats and how to apply them in my automated trading strategy analysis – but from my initial investigation I got the impression that standard “stats” are not very relevant for trading because of their assumptions on distribution types (by that I mean parametric statistics, which are confirmatory data analysis – i.e. they assume a hypothesis first and run a test to disprove it).

Student’s T-test assumes normal distribution – which clearly seems inappropriate with the typical fat-tails that markets throw at us.

Do you know if there are other types of stats that could give better results (i.e. robust stats, non-parametric stats, exploratory data analysis, etc.)?

Hi Jez, you are correct, and thanks for the kind words. I mentioned this drawback at the end of the article. I personally do not use this specific test, but it is a good start. In the article on Wikipedia they mention the following: “To relax the normality assumption, a non-parametric alternative to the t-test can be used, at a cost of lower statistical power. The usual choices for non-parametric location tests are the Mann–Whitney U test for independent samples, and the binomial test or the Wilcoxon signed-rank test for paired samples.”
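For readers who want to see what the Mann-Whitney U test in that quote looks like in practice, here is a rough Python sketch. It counts, over every pair of observations, how often a value from one sample beats a value from the other, then uses a normal approximation for the p-value. The function name is hypothetical, and the approximation (no tie or continuity correction) is a simplifying assumption that is only reasonable for moderately large samples.

```python
import math


def mann_whitney_u(sample1, sample2):
    """U statistic for sample1, z-score, and two-sided p-value (normal approx.)."""
    n1, n2 = len(sample1), len(sample2)
    # Count pairs where the sample1 value beats the sample2 value; ties count half.
    u1 = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in sample1
        for y in sample2
    )
    mean_u = n1 * n2 / 2.0
    # Simplification: no tie correction and no continuity correction.
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, z, p_two_sided


# Hypothetical per-trade profits, for illustration only.
historical_trades = [120, -80, 95, 60, -40, 150, -30, 75, 110, -55, 90, 45]
recent_trades = [-60, 20, -90, 35, -45, -70]

u, z, p = mann_whitney_u(recent_trades, historical_trades)
print(f"U = {u:.1f}, z = {z:.2f}, p = {p:.3f}")
```

Because the test works on ranks rather than raw values, a single outsized win or loss has far less influence than it would on the t-test.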

cheers

dv

That’s where I realise I need to buy a stats book! 😉

Do you use other – more appropriate tests?

well, as we shall see in the next article, sometimes all you need is a simple technical indicator :o)

To be honest Jez, I’m sure you are more sophisticated than you are letting on, but I have found in the course of testing everything that some simple stats give you 80-90% of the results of very onerous and complicated procedures. Some simplifications do even better than sophisticated procedures. I think it is worthwhile to have a good understanding, but simplicity in trading is far more desirable, and I always tend to look for the most robust and simplest approach, whether for trading the equity curve or for trading strategies.

best

dv

David,

I DO feel way behind in my stats knowledge even though I think I can grasp some concepts. So I did go back and found your post (I think) that you mention in your comment and I am going to start with this..

The Adaptive Time Machine: The Importance of Statistical Filters

https://cssanalytics.wordpress.com/2009/09/16/the-adaptive-time-machine-the-importance-of-statistical-filters/

Do you have any recommendations for learning stats applicable to trading strategies testing – from basic to more sophisticated? thanks

That’s my favorite post :o) Actually, that method is fairly easy to apply and very durable. Try to ensure that you don’t count periods where the strategy is in cash, and use trades only.

t = (average trade result / standard deviation of trade results) x the square root of the number of trades

look for values >1.6 or <-1.6 as a basic guideline.
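As a quick illustration of that formula, a Python sketch might look like the following. The function names and trade numbers are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the comment does not say which version is meant.

```python
import math
import statistics


def trade_t_score(trade_results):
    """t = mean / standard deviation * sqrt(number of trades)."""
    n = len(trade_results)
    mean = statistics.fmean(trade_results)
    # Assumption: sample standard deviation with the n - 1 denominator.
    sd = statistics.stdev(trade_results)
    return mean / sd * math.sqrt(n)


def is_notable(t_score, threshold=1.6):
    """Apply the rough +/-1.6 guideline to a t-score."""
    return abs(t_score) > threshold


# Hypothetical results per trade; periods spent in cash are simply not listed.
trades = [0.8, -0.4, 1.2, 0.3, -0.6, 0.9, 0.5, -0.2, 1.1, 0.4]
t = trade_t_score(trades)
print(f"t = {t:.2f}, notable: {is_notable(t)}")
```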

Most basic econometrics books are suitable; if that is too heavy, then just get a basic stats book. Use a program like MATLAB or SPSS to do the calculations so all you have to do is select and interpret.

cheers

dv

Thanks for your post.

I wonder if it is better to apply the t-test to trade profits or to the time series of daily profits. Maybe both are worth analyzing.

Yes, you are correct, and it is a good suggestion. A regression is a better application than this specific approach. Of course, it is a little more complicated to explain in this blog. Your mention of “profits” is also important, because the equity curve is distorted by compounding and it is best to use a fixed-bet methodology when applying this approach.

cheers

dv

I just thought I’d drop a quick note, which I hope won’t sound adversarial. When you test multiple strategies on the same data set (as you’re obviously doing), you are by definition engaging in data mining (see, e.g., White’s definition in his 2000 paper in Econometrica, “A Reality Check For Data Snooping”). Presenting the performance of any strategy on historical data (and I should add, in the absence of transaction cost and slippage estimates) is positively *not* meaningful. The readers of this blog should understand that this is not the way to assess performance. Pairwise testing, like the one you’re performing here, won’t help you, and that’s not taking into account the inadequacy of the t-test for this application.

In order to have believable results, you have to present the results of a multiple testing procedure in which you are including *all* the strategies you have formulated. By multiple testing, I don’t mean a simple Bonferroni correction, because it is known to be excessively conservative. But, at least, you should present the results of either White’s test described above, or of Benjamini-Hochberg (which is by now a standard test), or of the more recent and powerful tests devised by Romano and coauthors (e.g., Stepwise multiple testing as formalized data snooping by Romano and Wolf).
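Of the procedures named above, Benjamini-Hochberg is simple enough to sketch in a few lines of Python. The sketch below takes one p-value per strategy and marks which strategies survive at a chosen false discovery rate. The function name and the p-values are hypothetical, and the procedure's usual guarantee assumes independent or positively dependent tests, which strategies tested on the same data may not satisfy.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis ranked at or below largest_k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per tested strategy, for illustration only.
strategy_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(strategy_p_values, alpha=0.05))
```

The key point for backtesting is that every strategy that was tried goes into the list, not just the ones that looked good.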

gappy, not offended in the least. Here is where we agree: in my first post on the t-stat a while back I put a note at the very top recommending White’s Reality Check, de-trending etc., and I have also mentioned Monte Carlo randomization etc. These methods are WAY beyond the scope of what most traders can practically execute, hence the difference between what is optimal and reality. Clearly, as I have indicated, these are NOT the methods I use, which are proprietary. As for commission costs etc., that is a separate but relevant issue.

Here is where I disagree: there IS strategy momentum, and a proper backtest CAN be worthwhile and informative. To the extent that the simple test presented shows deterioration, this can keep people from continuing to use a strategy which may have broken down. I have obviously tested this, otherwise I wouldn’t have presented it. My goal is to at least raise some awareness of even the simplest of methods, even if they are not ideal, versus just making a judgement call. I really appreciate your comments, and if you would like to display the technical method you prefer for higher-level readers I would be happy to link to it.

best

dv

Unless you are an existing user of MATLAB or SPSS, you might want to try R instead.

You have very nicely brought out the concept, but the limitations affect it so strongly that its ultimate suitability in all circumstances needs to be specified and brought out.