# CAGR and Performance Measurement

CAGR, or the compound annual growth rate, often gets a bad rap in performance evaluation. In truth, for systems without leverage, CAGR is one of the best-performing optimization goal functions in terms of out-of-sample performance: better than the Sharpe ratio, based on our own testing across multiple markets. This observation is consistent with several backtests showing that a ROC-based (i.e., CAGR) relative strength strategy tends to outperform more complex risk-adjusted measures. Furthermore, high-CAGR systems often have a built-in bias toward rewarding simplicity: it is very difficult to achieve a high CAGR with multiple rules that winnow down the observation set.

Putting aside concerns about the lack of a risk-adjusted component, the CAGR measure itself is fraught with instability. Any experienced system developer who has tried Monte Carlo testing will recognize that CAGR can vary widely depending on the window of measurement. Shift the start of a 10-year backtest even 10-20 days forward or backward and the CAGR can vary by as much as 3-5%, depending on the system. This is a consequence of daily measurement intervals and compounding. The implication of this noise embedded in CAGR measurement is that qualitative comparisons of strategies can draw incorrect conclusions, and walk-forward testing will not produce the desired results. A more robust way to measure CAGR is to use an average of smaller compounded-return samples, such as the monthly CAGR. This measurement is considerably smoother and will produce similar results regardless of your start and end dates. While the method introduces lag, using a measure of the delta or acceleration in the moving average of CAGR can produce the best of both worlds. I will leave these suggestions for researchers to investigate.
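The two measurements can be sketched with synthetic data. The return series, the window shifts, and the 21-trading-day "month" are all assumptions for illustration, not a definitive implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily log returns for a ~10-year backtest (2520 trading days)
daily_log = rng.normal(0.0004, 0.01, 2520)

def cagr(log_returns, periods_per_year=252):
    # Point-estimate CAGR over the whole window
    total = np.exp(log_returns.sum())
    years = len(log_returns) / periods_per_year
    return total ** (1 / years) - 1

def mean_monthly_cagr(log_returns, days_per_month=21):
    # Average of monthly compounded returns, annualized
    n_months = len(log_returns) // days_per_month
    monthly = [np.exp(log_returns[i * days_per_month:(i + 1) * days_per_month].sum()) - 1
               for i in range(n_months)]
    return (1 + np.mean(monthly)) ** 12 - 1

# Shift the start date by 0-20 days and compare the spread of each measure
point = [cagr(daily_log[s:]) for s in range(0, 21, 5)]
smooth = [mean_monthly_cagr(daily_log[s:]) for s in range(0, 21, 5)]
print("point-estimate CAGR spread:", max(point) - min(point))
print("mean-monthly CAGR spread:  ", max(smooth) - min(smooth))
```

On a real equity curve, the spread of the point-estimate CAGR across shifted windows gives a quick sense of how much of a reported CAGR is start-date luck.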

Interesting. While I completely believe that you have solid and copious empirical insight into this approach, I can’t say that I can readily identify a theoretical basis for its validity (the preference for a moving average of monthly CAGR over a rolling “regular” CAGR).

That the lag is not an issue is an observation I can recognize as supported by the literature, and therefore agree with. Perhaps it empirically doesn’t matter because, for momentum (ROC) systems, the immediately preceding month tends to have a mean-reversal quality (which is why Jegadeesh uses t-1 to t-7 for his six-month momentum formation period instead of t to t-6).

Empiricism is observation. Observation is the basis for science. Even one as wretchedly adaptive as ours. Thanks for sharing your observations. Now, how to make a theory fit it…

One question though… do you expect this observation to have a similar impact on the Ulcer Performance Index and other CAGR-based measures of performance, such as the Sharpe ratio (risk-adjustment concerns stated previously aside)? And by extension, should these measures be modified for the purpose of comparing systems?

Thanks,

Carl

hi carl, some excellent comments, though I hope we are on the same page in the sense that we are discussing primarily performance measurement of strategies rather than relative strength algorithms. one comment I might make is that the flaw in a CAGR figure based on a single measurement is that it is very sensitive to start and end dates. in contrast, a normalized measure of CAGR is more robust and much less sensitive. thus, if we compare two strategies, one with a 10% cagr and one with an 8% cagr, in the first case we might make an error in strategy selection because the first strategy might simply have had a more favorable start/end window, while with a normalized measure we would not make such an error. this minimizes the time variation in the edge measurement, which permits more realistic conclusions.

As for your last point, other measures can be similarly adjusted.

best

david

Yes, we’re on the same page. My mean-reversion observation would be more appropriate to an instrument’s performance than to a system’s performance, as hopefully the system has taken the MR into account.

I’m wondering whether, for ranking system performance, “bracketing” would not be more appropriate. After ranking systems for performance, take the top bracket and use other measures (e.g., average percent gain, MDD, etc.) to pick the best within the bracket. Heck, it may make sense to use brackets for the differentiators as well. After all, ranking systems to the nth degree is likely an exercise in false precision, since history will not exactly repeat and will at best only rhyme.
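A minimal sketch of that bracketing idea. The system statistics and the 2% bracket width are made up for illustration; here the tie-breaker within the bracket is maximum drawdown:

```python
# Hypothetical backtest summaries: (name, CAGR, max drawdown)
systems = [
    ("A", 0.12, 0.25),
    ("B", 0.11, 0.15),
    ("C", 0.10, 0.10),
    ("D", 0.06, 0.08),
]

def pick_by_bracket(systems, bracket=0.02):
    # Keep every system whose CAGR is within `bracket` of the best,
    # then choose the one with the smallest max drawdown.
    best_cagr = max(s[1] for s in systems)
    top = [s for s in systems if best_cagr - s[1] <= bracket]
    return min(top, key=lambda s: s[2])

# A, B and C all land in the top bracket; C wins on drawdown
print(pick_by_bracket(systems))
```

Note that D, despite having the smallest drawdown of all, never gets considered: the bracket filters on the primary measure first, which is the point of the scheme.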

The CAGR is proportional to the average daily log return (since log returns, unlike simple returns, are additive). Perhaps a Sharpe ratio computed from log returns could be used instead of the usual Sharpe computed from simple returns.

(If w(t) is the wealth at the end of day t, I define the log return as ln(w(t)/w(t-1)) and the simple return as w(t)/w(t-1) – 1.)
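With those definitions, the log-return Sharpe is a one-line change from the usual one. The synthetic wealth series below is an assumption for illustration, and the risk-free rate is omitted for brevity:

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    # Annualized Sharpe ratio (risk-free rate omitted for brevity)
    return np.mean(returns) / np.std(returns, ddof=1) * np.sqrt(periods_per_year)

rng = np.random.default_rng(1)
# Hypothetical daily wealth path w(t), starting at 1.0
wealth = np.insert(np.cumprod(1 + rng.normal(0.0005, 0.01, 2520)), 0, 1.0)

simple = wealth[1:] / wealth[:-1] - 1        # w(t)/w(t-1) - 1
log_ret = np.log(wealth[1:] / wealth[:-1])   # ln(w(t)/w(t-1))

print("Sharpe (simple returns):", sharpe(simple))
print("Sharpe (log returns):   ", sharpe(log_ret))
```

For small daily returns the two versions are close (a log return is roughly the simple return minus half its square), so the distinction matters most for volatile or leveraged equity curves.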