Momentum strategies generate a lot of hype, and deservedly so: momentum is the “premier market anomaly,” praise heaped on it by no less a skeptic than Eugene Fama himself. For those who do not know Fama, he happens to be both a founder and an ardent proponent of the so-called “Efficient Markets Hypothesis.” In financial circles, the belief in momentum as a legitimate market anomaly is held with a fervor rivaling organized religion. Doubt its existence and you are treated as a quack or relegated to amateur status among the experienced.
But any real scientist worth their salt should always ask “why?”, if only to gain a better understanding of the phenomenon. This is not just an academic question; it is also a practical matter for those who trade with real money. A deeper analysis of the drivers of momentum performance, and of the conditions in which it can exist, can reveal the potential for superior strategies. Several landmark papers shed light on this issue, but they have no doubt been forgotten or ignored because of their technical nature. Two examples are Lo and MacKinlay (“When Are Contrarian Profits Due to Stock Market Overreaction?”) and Conrad and Kaul (“An Anatomy of Trading Strategies”). The arguments and evidence in these articles help reconcile how Mr. Fama can believe in Efficient Markets while also considering momentum a legitimate anomaly. This isn’t a quirk born of quantum physics, but rather the implication of some basic math, demonstrated conclusively using simulated financial data.
In a previous post, I presented some ideas and tests for identifying superior universes for momentum strategies. A simple, naive method of finding the best-performing universes through brute force shows promise, but it has pitfalls because it does not capture the drivers of momentum performance. So let’s begin by inverting the basic math introduced by Lo and MacKinlay, which describes how favorable a particular universe is for contrarian or mean-reversion strategies. Since momentum is the polar opposite of contrarian, what is good for one is bad for the other. The table below shows the three ingredients that affect momentum performance:
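For reference, here is one way to write Lo and MacKinlay’s (1990) decomposition for a momentum portfolio. The portfolio weights each asset by its past return relative to the equal-weighted average, which is their contrarian weighting with the sign flipped. The notation (lag k, autocovariance matrix Γ_k, vector of ones ι) is our own. The exact scaling behind the momentum scores in the tables may differ, so treat this as a sketch of the structure rather than the precise formula used.

```latex
$$
\begin{aligned}
w_{i,t} &= \frac{1}{N}\left(R_{i,t-k} - R_{m,t-k}\right), \qquad
R_{m,t} = \frac{1}{N}\sum_{i=1}^{N} R_{i,t}, \qquad
\mu_m = \frac{1}{N}\sum_{i=1}^{N} \mu_i \\
\Gamma_k &= \mathbb{E}\!\left[(R_{t-k} - \mu)(R_t - \mu)^{\top}\right] \\
\mathbb{E}[\pi_t] &=
\underbrace{\frac{N-1}{N^2}\operatorname{tr}(\Gamma_k)}_{\text{time series predictability}}
\;-\;
\underbrace{\frac{1}{N^2}\left(\iota^{\top}\Gamma_k\,\iota - \operatorname{tr}(\Gamma_k)\right)}_{\text{lead/lag relationships}}
\;+\;
\underbrace{\frac{1}{N}\sum_{i=1}^{N}\left(\mu_i - \mu_m\right)^2}_{\text{dispersion in means}}
\end{aligned}
$$
```

Flipping the sign of all three terms recovers the expected profit of Lo and MacKinlay’s contrarian portfolio.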
The first factor, time series predictability, relates to how predictable, or “autocorrelated,” an asset or group of assets is: whether high (low) past returns predict high (low) future returns. If a universe contains highly predictable assets, then a momentum strategy will be better able to exploit measurements of past performance. The second factor, dispersion in mean returns, relates to whether a group of assets has average returns that are likely to be materially different from one another. A heterogeneous universe, such as one containing diverse asset classes, will have different sources of returns, and hence greater dispersion, than a homogeneous universe such as the sectors within a stock index. The final factor, lead/lag relationships, measures the strength of any tendency for certain assets or stocks to lead or lag one another. This tendency can occur, for example, between large liquid stocks and small illiquid stocks. In this case, a positive relationship would imply that if, say, Coke went up today, then a smaller cola company would go up tomorrow. This is good for contrarian strategies, which would buy the smaller cola company and short Coke, but obviously bad for momentum strategies. That is why this factor is negatively related to momentum profits. In summary, the equation shows that a “momentum score” can be formulated by adding the time series predictability factor and the dispersion in means factor, and subtracting the lead/lag relationship factor.
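As a rough illustration, the three components can be estimated from a table of historical returns. This is a minimal sketch that assumes the Lo and MacKinlay form: own autocovariances for predictability, cross-autocovariances for lead/lag, and the cross-sectional variance of mean returns for dispersion. The function name `momentum_score`, the lag `k`, and the row-per-period input layout are illustrative assumptions, not the exact calculation behind the tables.

```python
def momentum_score(returns, k=1):
    """Lo-MacKinlay style decomposition of expected momentum profit (a sketch).

    returns: list of rows, one per period, each row holding N asset returns.
    k: lag between the formation-period return and the holding-period return.
    """
    T, N = len(returns), len(returns[0])
    mu = [sum(row[i] for row in returns) / T for i in range(N)]

    # gamma[i][j] is the sample covariance of R_{i,t-k} with R_{j,t}
    pairs = T - k
    gamma = [[0.0] * N for _ in range(N)]
    for t in range(k, T):
        past, now = returns[t - k], returns[t]
        for i in range(N):
            for j in range(N):
                gamma[i][j] += (past[i] - mu[i]) * (now[j] - mu[j]) / pairs

    trace = sum(gamma[i][i] for i in range(N))
    total = sum(sum(row) for row in gamma)
    mu_m = sum(mu) / N

    predictability = (N - 1) / N ** 2 * trace            # own autocovariances
    lead_lag = (total - trace) / N ** 2                  # cross-autocovariances
    dispersion = sum((m - mu_m) ** 2 for m in mu) / N    # spread of mean returns
    return {
        "predictability": predictability,
        "lead_lag": lead_lag,
        "dispersion": dispersion,
        "score": predictability - lead_lag + dispersion,
    }


# Hypothetical data: two assets that trend for five periods and then reverse together
rows = [[0.01, -0.01]] * 5 + [[-0.01, 0.01]] * 5
print(momentum_score(rows))
```

Each component is returned separately so that a universe’s score can be broken down by contribution, as in the tables.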
Let’s work through a tangible example to show how the math matches up with intuition. I calculate a momentum score using the last five years of data for both a diverse asset class universe (SPY, DBC, GLD, TLT, IEF, RWX, IYR, EEM, EWJ, IEV) and a sector universe (XLE, XLU, XLP, XLB, XLV, XLF, XLK, XLY, XLI). Note that the last five years cover a bull market, which could easily obscure comparisons based only on backtesting momentum strategy performance on each universe. The momentum score (higher is better) is broken down by each factor’s contribution in the tables for the two universes.
The asset class universe clearly scores higher than the sector universe for momentum strategies. This certainly jibes with intuition and with empirical performance. What is more interesting is which factor contributes most to the difference between the two universes. The dispersion in means, or variation in cross-sectional average returns, is by far the biggest factor separating an asset class universe from a sector universe. The other two factors practically cancel each other out. This makes sense, since most sector returns share a dominant common factor: the return of the stock market, say the S&P500. When the market is up (or down), most sectors are up (or down) to a greater or lesser extent. In an asset class universe, by contrast, there can be a lot more variation: stocks could be up, bonds could be down and commodities could be up. The variation in performance is far more substantial. Note that variation in performance, or dispersion in means, is not equivalent to correlation, which measures the scaled co-movement of shorter-term returns. A universe with low cross-correlations is not a good proxy for this effect. To better demonstrate the effect of adding variation, let’s look at how adding different assets individually to the sector universe changes the momentum score:
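A toy example helps show why correlation and dispersion in means are different things. The return series below are hypothetical. Asset `b` shares every shock with `a` but has a higher drift, so the two are perfectly correlated yet their means differ. Asset `c` is perfectly negatively correlated with `a` but has the same (zero) average return, so it adds no dispersion at all.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def correlation(x, y):
    """Pearson correlation of two equal-length return series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_dispersion(*series):
    """Cross-sectional variance of the assets' average returns."""
    means = [mean(s) for s in series]
    m = mean(means)
    return mean([(x - m) ** 2 for x in means])


a = [0.01, -0.02, 0.015, -0.005, 0.0]   # hypothetical daily returns, mean ~0
b = [x + 0.002 for x in a]              # same shocks, higher drift
c = [-x for x in a]                     # opposite shocks, same (zero) drift

print(correlation(a, b), mean_dispersion(a, b))  # correlation ~ +1, dispersion > 0
print(correlation(a, c), mean_dispersion(a, c))  # correlation ~ -1, dispersion ~ 0
```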
Simply adding long-term bonds (TLT) nearly doubles the momentum score versus the baseline of using just the sectors. On the flip side, adding the dominant common factor, the S&P500 (SPY), reduces the momentum score versus the baseline. Adding gold (GLD) actually does better than adding 10-year/intermediate Treasurys (IEF), which are typically used as the bond component in most portfolios, even though IEF’s correlation is far more negative than GLD’s. This analysis can provide some very interesting and sometimes counter-intuitive insights (though most make intuitive sense). More practically, it can be used to build good universes for momentum strategies, or for any other strategy that derives a large chunk of its returns from the momentum effect. In the next post we will clarify why Mr. Fama can believe both in efficient markets and in momentum as an anomaly, and we will also provide some interesting implications and further analysis.
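The “add one asset at a time” comparison can be written as a small loop. In this sketch `score_fn` is any universe score, for example a Lo and MacKinlay style momentum score. So that the block runs on its own, the usage plugs in a dispersion-only stand-in (`dispersion_score`), which is a simplification and not the full score. The tickers `S1`, `S2`, `MKT` and `DIVERSIFIER` are hypothetical. `MKT` mimics adding the common factor: its mean sits right at the cross-sectional average, so it lowers dispersion.

```python
def marginal_scores(baseline, candidates, score_fn):
    """Change in universe score from adding each candidate (alone) to the baseline.

    baseline, candidates: dicts mapping ticker -> list of periodic returns.
    score_fn: callable taking such a dict and returning a float (higher is better).
    """
    base = score_fn(baseline)
    deltas = {}
    for name, series in candidates.items():
        universe = dict(baseline)
        universe[name] = series
        deltas[name] = score_fn(universe) - base
    # Best additions first
    return dict(sorted(deltas.items(), key=lambda kv: kv[1], reverse=True))


def dispersion_score(universe):
    """Stand-in score: cross-sectional variance of mean returns only."""
    means = [sum(s) / len(s) for s in universe.values()]
    m = sum(means) / len(means)
    return sum((x - m) ** 2 for x in means) / len(means)


baseline = {"S1": [0.0, 0.0], "S2": [0.002, 0.002]}
candidates = {"MKT": [0.001, 0.001], "DIVERSIFIER": [0.01, 0.01]}
print(marginal_scores(baseline, candidates, dispersion_score))
```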
It is well established that the momentum effect is robust across individual stocks and broad asset classes. However, one of the biggest implementation issues at the strategy level is choosing a universe to trade. For example, one might choose a broad index such as the S&P500 for an individual stock momentum strategy, but is that the best choice for maximizing returns? Or if we wanted to build an asset allocation strategy with momentum, which assets should we include or exclude, and why? These issues are rarely, if ever, addressed in either academic papers or the blogosphere. As a result, the choice of universe can artificially inflate results through data mining (finding the best universe in hindsight before presenting the final backtest), or it can be too arbitrary and hence sub-optimal from a strategy development standpoint.
There are good reasons to believe that certain asset universes are likely to be superior to others. In a subsequent post, I will attempt to decompose mathematically what makes a universe particularly well-suited for momentum strategies. But for now, let’s discuss some obvious factors that may drive momentum strategy performance:
1) Universe heterogeneity/homogeneity: it stands to reason that an investment universe composed of six different large-cap ETFs will not lead to desirable results because the universe is too similar (homogeneous). In contrast, choosing different sectors, styles, or even asset classes should provide opportunities to find good-performing assets when other assets in the universe are not doing as well.
2) The number of assets in the universe: other things being equal, fewer assets lead to fewer opportunities.
3) Co-integration/mean-reversion: choosing a universe of co-integrated assets, such as Coke and Pepsi or Exxon Mobil and the Energy Sector ETF, will probably result in negative momentum performance, since deviations from a common mean will eventually revert rather than continue.
This is not a complete description of the factors that drive momentum performance, but rather a list that is likely to make logical sense to most investment professionals.
Since there are good reasons to believe that some universes are simply better than others, it makes sense to develop a heuristic for universe selection to improve the performance of momentum strategies. One logical method for choosing the universe for trading/backtesting is to select the best universes on a walk-forward basis rather than in hindsight. In other words, we first choose a momentum strategy, for example selecting the top asset by 60-day return. At each time step, we then use a much longer window (say 756 days or more) to backtest that strategy on every possible subset of the chosen universe, scoring each subset with a performance metric such as CAGR. We then select the top n (or top %) of universes by that metric and apply the momentum strategy to these universes to determine the assets to trade at each rebalance.
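The walk-forward selection step might look roughly like the following. This is a minimal sketch that makes several assumptions: `prices` is a dict of daily price lists, the strategy holds only the single top asset between rebalances, a year is 252 trading days, and transaction costs are ignored. The final allocation gives each selected universe an equal vote for its own top asset, which is an assumed aggregation rule. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def top_asset(prices, universe, t, lookback):
    """Asset in `universe` with the highest trailing `lookback`-day return at day t."""
    return max(universe, key=lambda a: prices[a][t] / prices[a][t - lookback])


def momentum_cagr(prices, universe, start, end, lookback=60, hold=21):
    """CAGR from holding the top momentum asset, rebalanced every `hold` days."""
    equity = 1.0
    for t in range(start, end, hold):
        pick = top_asset(prices, universe, t, lookback)
        exit_t = min(t + hold, end)
        equity *= prices[pick][exit_t] / prices[pick][t]
    years = (end - start) / 252  # assumes 252 trading days per year
    return equity ** (1 / years) - 1


def best_universes(prices, t, window=756, lookback=60, top_frac=0.10, min_size=2):
    """Rank every subset of assets by trailing momentum CAGR over [t - window, t]."""
    assets = sorted(prices)
    subsets = [c for n in range(min_size, len(assets) + 1)
               for c in combinations(assets, n)]
    ranked = sorted(subsets,
                    key=lambda u: momentum_cagr(prices, u, t - window, t, lookback),
                    reverse=True)
    keep = max(1, round(top_frac * len(ranked)))
    return ranked[:keep]


def universe_selection_weights(prices, t, **kwargs):
    """Each selected universe casts an equal vote for its own top momentum asset."""
    lookback = kwargs.get("lookback", 60)
    chosen = best_universes(prices, t, **kwargs)
    votes = Counter(top_asset(prices, u, t, lookback) for u in chosen)
    return {asset: n / len(chosen) for asset, n in votes.items()}


# Hypothetical prices: one trending, one flat and one declining asset
days = 900
prices = {
    "UP": [100 * 1.001 ** d for d in range(days)],
    "FLAT": [100.0] * days,
    "DOWN": [100 * 0.999 ** d for d in range(days)],
}
print(universe_selection_weights(prices, t=850, top_frac=0.5))
```

Note that day `t` must be at least `window + lookback` so that the first trailing-return calculation has data.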
A simple example would be to use the nine different sectors in the S&P500 (the sector SPDRs). Perhaps some sectors are better suited to a momentum strategy than using all nine? To test this, one might use every subset of two or more assets (between 2 and 9 in this case), which results in 502 different momentum portfolios. This highlights a key difficulty with the approach: the computational burden grows exponentially with universe size. Suppose we use a 60-day momentum strategy that selects the top sector by 60-day return and rebalances monthly. Looking back 756 trading days (3 years), we test all 502 universes and select the top 10% of universes by the CAGR of the momentum strategy. Then, at each rebalance, we choose the top asset by 60-day momentum from each of the universes in the top 10%. The goal of this strategy, which we’ll call momentum with universe selection, is to enhance returns and risk-adjusted returns versus using all assets in the universe. The results of this walk-forward strategy are presented below:
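The 502 figure, and the exponential growth, can be checked directly. Counting all subsets of at least two assets gives 2^n − n − 1 (every subset minus the empty set and the n singletons), so the 10-asset universe below already implies 1013 candidate universes. The helper name `universe_subsets` is hypothetical.

```python
from itertools import combinations


def universe_subsets(assets, min_size=2):
    """All subsets of `assets` with at least `min_size` members."""
    return [combo for n in range(min_size, len(assets) + 1)
            for combo in combinations(assets, n)]


sectors = ["XLE", "XLU", "XLP", "XLB", "XLV", "XLF", "XLK", "XLY", "XLI"]
print(len(universe_subsets(sectors)))  # 502

# Closed form for min_size=2: all subsets minus the empty set and the singletons
for n in (9, 10, 20, 30):
    print(n, 2 ** n - n - 1)
```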
It appears that universe selection substantially enhances the performance of a basic momentum strategy. Both returns and risk-adjusted returns improve with rolling universe selection. Clearly, some subsets of sectors are better suited to a switching strategy than the full set. What about asset classes? Does the same effect hold true? We chose a 10-asset universe that we have used before for testing Adaptive Asset Allocation: S&P500/SPY, Real Estate/IYR, Gold/GLD, Long-Term Treasurys/TLT, Commodities/DBC, 10-year Treasurys/IEF, Emerging Markets/EEM, Europe/IEV, International Real Estate/RWX, Japan/EWJ. The results of this walk-forward strategy are presented below:
Once again, the returns and risk-adjusted returns are much higher when employing universe selection. The differences are highly significant in this case. Clearly there are subsets of asset classes that are superior to using the entire universe.
This approach to universe selection is not without flaws, however, and the reasons will be clarified in a subsequent post. Still, it is reasonably practical as long as the backtest lookback window (756 days in the above example) is much larger than the momentum lookback window (60 days in the above example). Furthermore, the backtest lookback window would ideally cover a full market cycle. Shorter windows could end up choosing only the best performers during, say, a bull market, which would lead to a biased universe. In addition, it helps to choose a reasonable number or percentage of the top universes, such as the top 5, the top 10, or the top 10% used in the examples above. That helps mitigate the effect of data-mining too many different combinations and ending up with a universe that simply performed well by chance. It also improves the reliability of out-of-sample performance.
According to the best hedge funds, success in the quantitative field is a “research war.” Building or maintaining an edge in the highly competitive world of financial markets requires constant innovation. Creativity and re-invention are equally important on the product and business side of finance. The late Ronald Reagan once said that the “status quo” is a sad reflection of our troubles. We must constantly strive to look at problems differently and accept that change is the only constant. The late Steve Jobs said it best:
David Aronson is considered by many serious quants to be one of the first authors to rigorously address the subject of data-mining bias in trading system development. His popular book “Evidence-Based Technical Analysis” is a must-read for system developers. What most people do not know is that David has been a pioneer in applying machine learning to financial markets for over four decades. Over that time he has become an expert in developing predictive models using trading indicators.
Recently he released the book “Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments” as a companion to the TSSB software, which implements all of the applications found in the book. The TSSB software is incredibly powerful, and the book does a good job of explaining the myriad applications that are possible. After reading the book, it was readily apparent to me that this software was the product of hundreds or even thousands of hours of meticulous work that could only be shaped by a lifetime of experience with machine learning. I recently had the chance to speak with David at length about a variety of topics:
What was the motivation behind TSSB?
TSSB was initially created as internal software for Hood River Research, for our consulting work applying predictive analytics to financial markets. The bulk of our consulting work has been developing performance-boosting signal filters for existing trading systems. No available software dealt successfully with the data mining bias and overfitting problems associated with applying machine learning to financial market data. We decided to sell a guided tutorial for its use (Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments) to raise funds for its further development. TSSB is made available for free.
What is it that got you interested in predictive modelling versus using traditional trading systems?
I started as a stockbroker with Merrill Lynch in the 1970s. I wanted to promote the money management services of Netcom, a CTA, but Merrill wouldn’t permit that at the time. So I left to go out on my own and began analyzing the performance of all CTAs registered with the CFTC as of 1978. I started reading about statistical pattern recognition (what is now known as predictive analytics) after a prospect of mine from the aerospace industry suggested it might be valuable to apply to financial markets. Only two CTAs in my survey were using such methods, so I thought there would be an edge in trading systems based on this approach. But in the late 1970s, affordable computers were not quite up to the task. A precursor to Hood River Research was Raden Research Group. We developed an early predictive analytics software platform for financial market data called Prism. The software used a machine learning technique called kernel regression (GRNN), which predated the use of neural networks and the publication of papers on neural nets in the 1980s. However, like neural networks, some of the early methods tended to overfit the data, and few appreciated the statistical inference issues involved. Later I joined forces with Dr. Timothy Masters, a statistician, and together we developed TSSB.
Why do you think conventional technical analysis is flawed from a statistical or theoretical standpoint?
The quality of the indicators used as inputs to a predictive model or a trading system is very important. Even the best conventional technical indicators contain only a small amount of predictive information. The vast majority is noise. Thus the task is to model that tiny amount of useful information in each indicator and combine it with the useful information in other indicators. Rules defined by a human analyst often miss potentially useful but subtle information.
Consistency is also an issue: experts are not consistent in their interpretation of multi-variable data, even when presented with exactly the same information on separate occasions. Models, however they are developed, are by definition always consistent. I would also highlight that there is ample peer-reviewed research demonstrating that humans lack the configural thinking abilities needed to integrate multiple variables simultaneously, except under the most ideal conditions. In contrast, this is a task easily handled by quantitative models.
You wrote the book “Evidence-Based Technical Analysis.” What are the challenges of identifying potentially profitable technical trading rules using conventional, or even state-of-the-art, statistical significance tests alone?
Standard statistical significance tests are fine when evaluating a single hypothesis. In the context of developing a trading system, this would be the case when the developer predefines all indicators, parameter values, rules, etc., and these are never tweaked and retested. The challenge lies in trying to evaluate trading systems “discovered” after many variants of the system have been tested and the best-performing one is selected. This search, often called data mining, renders standard significance tests useless. Data mining is not a bad thing in and of itself. We all do it, either manually or in an automated fashion. The error is in failing to realize that specialized evaluation methods are required.
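A small Monte Carlo experiment makes the data-mining problem concrete. This is a generic illustration of selection bias under stated assumptions (a driftless normal market and coin-flip long/short rules). It is not TSSB’s Permutation Training or any specific published test. The idea: repeat the entire “test many variants, keep the best” search on data where no rule has an edge, and see how good the winner looks by chance alone. A single pre-specified rule with a t-statistic of 2.2 would have a one-sided normal p-value of about 0.014. Compared against the best-of-50 null distribution, the same t-statistic is far less impressive. The function names and parameters are hypothetical.

```python
import math
import random


def t_stat(xs):
    """One-sample t-statistic of the mean of xs against zero."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return m / (sd / math.sqrt(n))


def best_random_rule_t(n_rules, n_days, rng):
    """Best t-stat among `n_rules` random long/short rules on a driftless market."""
    market = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    best = -math.inf
    for _ in range(n_rules):
        positions = [rng.choice((-1, 1)) for _ in range(n_days)]
        best = max(best, t_stat([p * r for p, r in zip(positions, market)]))
    return best


def selection_null(n_rules, n_days=100, n_trials=100, seed=0):
    """Distribution of the winning t-stat when no rule has any real edge."""
    rng = random.Random(seed)
    return [best_random_rule_t(n_rules, n_days, rng) for _ in range(n_trials)]


null = selection_null(n_rules=50)
observed_t = 2.2  # hypothetical t-stat of the best of 50 tested variants
p_value = sum(t >= observed_t for t in null) / len(null)
print(p_value)
```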
Another issue worth pointing out is that standard predictive modeling methods are guided by criteria based on minimizing prediction errors, such as mean squared error, and these are not optimal for predictive models intended for trading financial markets. A model can have poor error reduction across the entire range of its forecasts and still be profitable for trading, because its extreme forecasts carry useful information. It is more appropriate to use financial measures such as the profit factor, all of which are included as objective functions within TSSB.
A third issue is the multiple hypothesis problem encountered when building systems. Typically there is a search for the best indicators from a large initial set of candidates, a search for the best values of various tuning parameters, and perhaps even a search for the best systems to include in a portfolio of trading systems. These searches are typically conducted via guided search, where what is learned at step N is used to guide what is searched at step N+1. Standard approaches to this problem, such as White’s Reality Check and the one I discussed in Evidence-Based Technical Analysis (Wiley 2006), fail for guided search. Genetic algorithms and genetic programming, and in fact all forms of machine learning that build multi-indicator trading systems, use guided search. One of the unique features of the TSSB software is Permutation Training, which does work for guided-search machine learning.
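A hand-built toy example shows how a model can lose on mean squared error and still be worth trading. In the hypothetical data below, the small forecasts have the wrong sign, so the model’s MSE is worse than simply forecasting zero. The extreme forecasts, however, are mostly right, so trading only when the forecast exceeds a threshold yields a profit factor (gross profits divided by gross losses) above 1. The threshold rule and all names are illustrative assumptions, not TSSB’s objective functions.

```python
def mse(forecasts, outcomes):
    """Mean squared prediction error."""
    return sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / len(outcomes)


def profit_factor(trade_returns):
    """Gross profits divided by gross losses."""
    gains = sum(r for r in trade_returns if r > 0)
    losses = -sum(r for r in trade_returns if r < 0)
    return float("inf") if losses == 0 else gains / losses


def trade_extremes(forecasts, outcomes, threshold):
    """Long when forecast > threshold, short when forecast < -threshold, else flat."""
    return [y if f > 0 else -y
            for f, y in zip(forecasts, outcomes) if abs(f) > threshold]


# Hypothetical data: small forecasts have the wrong sign, extremes are mostly right
forecasts = [0.5, -0.4, 0.3, -0.6, 2.0, -2.0, 1.5]
outcomes = [-0.5, 0.4, -0.3, 0.6, 1.0, -1.0, -0.5]

print(mse(forecasts, outcomes), mse([0.0] * len(outcomes), outcomes))
print(profit_factor(trade_extremes(forecasts, outcomes, threshold=1.0)))  # 4.0
```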
Which methods used by most quantitative research analysts are potentially the most dangerous or least likely to work, based on your research? Which methods used by most technical analysis gurus are potentially the most dangerous or least likely to work?
Now that statistical tools are so easy to use and there is so much free code (e.g., R), there is a lot of overfitting and a lot of backtests that look great but don’t generalize to out-of-sample data going forward. Because empirical research on financial markets has only one set of historical data, it is easy to abuse almost any methodology, including walk-forward testing. Using software such as TSSB makes these issues easier to avoid. That said, there is no substitute for common sense or logic in selecting indicators or building intelligent model architecture. In my opinion, the way to differentiate yourself and uncover real opportunities currently lies in the clever engineering of new features, such as better indicators.
Why are the TSSB indicators superior to the conventional indicators that most traders tend to look at? What advantages do the TSSB indicators have that are unique?
Many of the indicators in the TSSB indicator library, which number over 100, have been transformed or re-scaled for consistency across markets. This is crucial for cross-sectional analysis. Some use non-linear fitting methods on the underlying variables to produce unique outputs. We have also included a wide variety of unique indicators, such as Morlet wavelets, some proprietary third-party indicators such as FTI (the Follow-Through Index developed by Khalsa), and some published indicators that we found unique or valuable, such as Kritzman’s financial turbulence indicator.
Thank you, David.
I wrote a paper with a colleague, Jason Teed, for the NAAIM competition. The concept was to apply basic machine-learning algorithms to generate adaptive portfolio allocations from traditional inputs such as returns, volatility and correlations. The seminal works on Adaptive Asset Allocation (Butler, Philbrick, Gordillo) focused on creating allocations that adapted to changing historical inputs over time. In contrast, our paper on Adaptive Portfolio Allocations (APA) focuses on how to adaptively integrate these changing inputs rather than relying on an established theoretical framework. The paper can be found here: Adaptive Portfolio Allocations Paper. Many other interesting papers were submitted to the NAAIM competition, and the rest of them can be found here. APA’s method for integrating these portfolio inputs into a final set of portfolio weights is not theory- or model-driven like MPT. Instead, it learns how the inputs interact to produce optimal portfolios from a Sharpe ratio perspective. The results show that a traditional mean-variance/Markowitz/MPT framework under-performs this simple framework in terms of maximizing the Sharpe ratio. The data further imply that traditional MPT, because of how it generates portfolio weights, makes far too many trades and takes on too many extreme positions. This occurs because the inputs, especially the returns, are very noisy and may also exhibit non-linear or counter-intuitive relationships. In contrast, by learning how the inputs map historically to optimal portfolios at the asset level, the resulting allocations drift in a more stable manner over time. The simple learning framework proposed here can be substantially extended with a more elegant framework to produce results superior to those in the paper. For simplicity, the methodology for multi-asset portfolios was limited to an aggregation of pairwise portfolio allocations.
The paper didn’t win (or even place, for that matter), but like many contributions on this blog it was designed to inspire new ideas rather than sell cookie-cutter solutions or sound too official or academic. At the end of the day, there is no simple ABC recipe or trading system that can survive indefinitely in ever-changing markets. No amount of rigor, simulation, or sophistication is ever going to change that. As such, the hope was to provide insight into how to harness a truly adaptive approach to the challenging task of making portfolio allocations.
In the last post on Probabilistic Momentum, we introduced a simple method to transform a standard momentum strategy into a probability distribution in order to create confidence thresholds for trading. The spreadsheet used to replicate this method can be found here. This framework is intellectually superior to a binary comparison between two assets, because the tracking error of choosing one versus the other is not symmetric across momentum opportunities. The opportunity cost of choosing one asset versus another is embedded in this framework. Using a confidence threshold greater than 50% helps standardize the risk of momentum decisions across different pairings (for example, using momentum with stocks and bonds is riskier than with, say, the S&P500 and the Nasdaq).
The same concept can be used to create an absolute momentum methodology, a concept introduced by Gary Antonacci of Optimal Momentum in a paper here. For those who are not familiar, the general idea is to use the relative momentum between a target asset (say, equities) and some low-risk asset such as T-bills or short-term Treasurys (cash) to generate switching decisions between the target and cash. This can be used instead of applying a simple moving average strategy to the underlying asset. Here we can apply Probabilistic Momentum to a target asset and a short-term Treasury ETF such as SHY to create a Probabilistic Absolute Momentum (PAM) strategy. I created an example with the Nasdaq (QQQ) and 1-3 year Treasurys (SHY), using the longest period for which both had history available (roughly 2800 bars). I chose 60% as the confidence threshold for switching between QQQ and SHY, and a momentum lookback window of 120 days. We did not assume any trading costs in this case; including them would favor PAM even more, since it trades far less often. Here is a chart of the historical transitions using the probabilistic approach (PAM) versus a simple absolute momentum approach:
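A minimal sketch of the idea, under assumptions: the probability that asset A is outperforming asset B is taken from the t-statistic of their daily return differences over the lookback window. A switch happens only when that probability clears the confidence threshold (for example 60% to move into A, and 40% or lower to move back to B). The original spreadsheet may use a Student-t distribution. Here we assume a normal approximation, which is close for lookbacks of 60 to 120 days. The names `momentum_probability` and `next_holding` are hypothetical.

```python
import math


def momentum_probability(returns_a, returns_b):
    """Approximate probability that A's mean return exceeds B's over the window."""
    diff = [a - b for a, b in zip(returns_a, returns_b)]
    n = len(diff)
    mean = sum(diff) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diff) / (n - 1))
    if sd == 0:
        return 1.0 if mean > 0 else 0.0 if mean < 0 else 0.5
    t_stat = mean / sd * math.sqrt(n)
    # Normal CDF of the t-statistic (assumption); a Student-t CDF with n - 1
    # degrees of freedom would be the more exact choice for short windows.
    return 0.5 * (1 + math.erf(t_stat / math.sqrt(2)))


def next_holding(current, prob_a, threshold=0.60):
    """Switch between "A" and "B" only when confidence clears the threshold."""
    if current == "B" and prob_a >= threshold:
        return "A"
    if current == "A" and prob_a <= 1 - threshold:
        return "B"
    return current


a = [0.002, 0.001, 0.003, -0.001, 0.002]  # hypothetical daily returns
b = [0.001, 0.001, 0.001, 0.000, 0.001]
p = momentum_probability(a, b)
print(round(p, 3), next_holding("B", p))
```

Probabilities between 40% and 60% leave the current holding unchanged. That band is what suppresses whipsaws when the two assets are running close together.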
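The trade-count comparison can be sketched as follows. Assumptions: decisions are made daily on the trailing 120 daily returns, and both strategies start in cash. The “simple” absolute momentum rule is taken to be “hold the target whenever its average return beats cash,” which is equivalent to a probability above 50%. PAM uses a 60%/40% band. The return series in the usage lines are randomly generated stand-ins, not QQQ or SHY data, and the function names are hypothetical.

```python
import math
import random


def prob_outperform(target, cash):
    """Normal-approximation probability that the target's mean return beats cash."""
    diff = [a - b for a, b in zip(target, cash)]
    n = len(diff)
    m = sum(diff) / n
    sd = math.sqrt(sum((d - m) ** 2 for d in diff) / (n - 1))
    if sd == 0:
        return 1.0 if m > 0 else 0.0 if m < 0 else 0.5
    return 0.5 * (1 + math.erf(m / sd * math.sqrt(n) / math.sqrt(2)))


def count_trades(target, cash, lookback=120, threshold=0.60):
    """Return (simple_trades, pam_trades); both strategies start in cash."""
    simple_in = pam_in = False
    simple_trades = pam_trades = 0
    for t in range(lookback, len(target)):
        p = prob_outperform(target[t - lookback:t], cash[t - lookback:t])
        new_simple = p > 0.5  # simple absolute momentum: any edge over cash
        if new_simple != simple_in:
            simple_in, simple_trades = new_simple, simple_trades + 1
        if not pam_in and p >= threshold:    # enter only with enough confidence
            pam_in, pam_trades = True, pam_trades + 1
        elif pam_in and p <= 1 - threshold:  # exit only with enough confidence
            pam_in, pam_trades = False, pam_trades + 1
    return simple_trades, pam_trades


rng = random.Random(42)
target = [rng.gauss(0.0004, 0.015) for _ in range(2800)]  # hypothetical target returns
cash = [0.0001] * 2800                                     # hypothetical cash returns
print(count_trades(target, cash))
```

With both strategies starting in cash, PAM can never trade more often than the simple rule. Every PAM entry requires the simple rule to be invested, and every PAM exit requires it to be in cash, so the simple rule must have switched at least once between any two PAM trades.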
Here is the performance breakdown of applying this strategy:
Here we see that Probabilistic Absolute Momentum reduces the number of trades by over 80%, from 121 to 23. Raw performance improves by almost 2%, and the Sharpe ratio improves by roughly 15%. More importantly, from a psychological standpoint, PAM is much easier to use and stick with, whether as a discretionary trader or as a quantitative portfolio manager. It eliminates many of the costly whipsaws that result from trying to switch between being invested and being in cash. It also makes it easier to overlay an absolute momentum strategy on a standard momentum strategy, since there is less interference from the cash decision.
The spreadsheet below is missing a minus sign in the formula which is highlighted below in red. The formula in cell F3 should read: