## Error-Adjusted Momentum Redux

James Picerno of Capital Spectator recently did a good review of Error-Adjusted Momentum in his post “A Momentum-Based Trading Signal with Strategic Value”. The Capital Spectator blog is rich with great content covering a diverse range of subjects from economics to asset allocation and investment strategy. Picerno has published numerous books, but my favorite is Dynamic Asset Allocation, which has a handy place on my bookshelf. It is a good review of the case for a tactical approach to portfolio management.

To add some new ideas on the error-adjusted momentum strategy, I would suggest readers experiment with multiple time windows (i.e. the averaging period) and error lookbacks, as well as data points with different frequencies (intraday, daily or even weekly), and aggregate their signals to increase robustness. Risk or volatility can be substituted for, or used alongside, the error adjustment. The general concept of standardizing returns in some way to account for changing variance/error creates an effective non-linear filter that is a superior substitute for an adaptive moving average. In contrast, a typical adaptive moving average approach attempts to vary the lookback window (make the moving average faster or slower) as a function of some indicator. Academic studies on moving averages show that this type of approach demonstrates little success with a wide range of time series data outside of financial markets.

I have personally tried virtually every method I could find with an adaptive moving average framework and have had no material success. Part of the problem is that shifting to shorter-term moving averages increases standard error because you are using less data. Furthermore, by ignoring older data and shifting to a shorter window, you assume that there is no memory from changes in the dynamics of the time series. The success of volatility forecasting methods demonstrates in part that the influence of changes in the time series decays over time rather than all at once. The error-adjusted momentum approach is a nonlinear filter, and in general this class of methods tends to work better in my experience with financial time series. This particular filter permits a sufficient lookback window for averaging to achieve a good estimate (from a statistical sample size perspective) and retains information from dynamics that have evolved over time. The key is that it simultaneously manages to emphasize/de-emphasize portions of the data set based on the observed error (or some other metric). Substituting a weighted moving average in place of a simple moving average in the filter can also better capture the path dependence of changes in error.

As with any approach there are many different ways to apply the same concept, and readers are encouraged to experiment. The caveat is that it is better to use multiple approaches in an ensemble than to select the very best approach: the more things we try via experimentation (especially if there is no logical theory/hypothesis attached to it), the greater the risk of data-mining. A favorite quote from one of the good blogs that I follow, Volatility Made Simple, says it best: “the concepts being exploited are much more important than the specific parameters chosen. All sets of parameters will, over the long-term, rise or fall together based on the success or failure of the core concept.”

I prefer presenting new tools and concepts, but I know that there are a lot of readers that would like to see how they can be applied to creating strategies. So here is a very simple strategy that applies Percentile Channels from the last post to a tactical asset allocation strategy. The strategy starts with only 4 diversified asset classes:

Equities– VTI (or SPY)

Real Estate– IYR (or ICF)

Corporate Bonds– LQD

Commodities– DBC

For cash we will use SHY

Here are the rules:

1) Use 60, 120, 180 and 252-day percentile channels, corresponding to 3, 6, 9 and 12 months in the momentum literature (4 separate systems). Each system uses a .75 long entry and a .25 exit threshold: a long is triggered above .75 and held until price exits below .25 (just like in the previous post)

2) If the indicator shows that you should be in cash, hold SHY

3) Use 20-day historical volatility for risk parity position-sizing among active assets (no leverage is used). The position size for asset A is 1/volatility of asset A divided by the sum of 1/volatility for all assets.

4) Rebalance monthly
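The position-sizing rule in 3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the volatility numbers are hypothetical, volatility is taken to be the sample standard deviation of the last 20 daily returns, and the sketch does not cover how the four channel systems are combined or how weight is moved to SHY.

```python
import statistics

def historical_vol(returns, window=20):
    """Sample standard deviation of the last `window` daily returns (assumed definition)."""
    return statistics.stdev(returns[-window:])

def inverse_vol_weights(vols):
    """Map {asset: volatility} to weights proportional to 1/volatility.

    The weights sum to 1, so no leverage is used.
    """
    inverse = {asset: 1.0 / vol for asset, vol in vols.items()}
    total = sum(inverse.values())
    return {asset: value / total for asset, value in inverse.items()}

# Hypothetical 20-day volatilities for three active assets (made-up numbers).
vols = {"VTI": 0.02, "IYR": 0.04, "LQD": 0.01}
weights = inverse_vol_weights(vols)
print(weights)
```

The least volatile asset receives the largest weight, which is the intent of risk parity sizing: each position contributes a roughly similar amount of volatility to the portfolio.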

Here are the results for this simple strategy:

This is a very consistent strategy which is more notable for its low maximum drawdown and high Sharpe ratio (near 2) than its sexy returns. Of course there are many alternatives to “spice” this up by varying the allocation among instruments, changing instruments or using leverage. I wanted to keep the asset list short and simple, and I chose corporate bonds since they provide some of the defensive characteristics of Treasuries but with higher yields and arguably lower systematic risk (no sovereign risk). Substituting the 10-year Treasury with IEF instead of corporate bonds produces nearly identical results (1.9 Sharpe, 11.8% CAGR, 5.8% max drawdown). There were better combinations of asset classes and parameters, but this compact list seemed manageable for a self-directed investor without a large portfolio. This is not the ultimate strategy by any means, but shows how to use percentile channels to produce a viable approach to tactical asset allocation.

## Percentile Channels: A New Twist On a Trend-Following Favorite

One of the most widely used trend-following approaches is Donchian Channels, which were popularized by the famous “Turtle Traders.” In fact, it was the subject of Donchian Channels that started my collaboration with Corey Rittenhouse with the popular post Percent Exposure Donchian Channel Method. One of the original turtle systems used a 55-day Donchian channel that bought at new 55-day highs and sold at new 55-day lows. This system, along with many other popular systems, suffered an erosion in profitability as other people copied the same approach. What has often fascinated me is how one might go about front-running such systems to achieve superior profitability. While I was thinking about this concept, I theorized that entering prior to new highs or lows might create an early entry that would be sufficient to avoid false breakouts induced by system traders. As an alternative one could use Percentile Channels, which function the same as Donchian Channels but use a specified percentile of price over the lookback window instead of the maximum or minimum. Below is a picture comparing percentile channels to Donchian channels:

For a fun experiment I decided to run a test using the Commodity Index (DBC- extended with index data) as a rough proxy for a trend-follower’s portfolio with Donchian Channels versus Percentile Channels. The original 55-day Donchian Channel is used to trade long or short on new highs/lows, versus a 55-day Percentile Channel with a 75th and 25th percentile threshold.
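For readers who want to try the comparison, here is a minimal Python sketch of both channel types. It rests on assumptions that are not spelled out in the post: the channel is built from the prior `window` prices (excluding the current one), percentiles use linear interpolation, and a position is held until the opposite channel is crossed. The function names and the toy price series are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated percentile of `values` for q between 0 and 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def channel(prices, window, upper_q=0.75, lower_q=0.25):
    """Upper and lower channel values from the last `window` prices.

    upper_q=1.0 and lower_q=0.0 reproduce a Donchian channel (max and min).
    """
    recent = prices[-window:]
    return percentile(recent, upper_q), percentile(recent, lower_q)

def channel_positions(prices, window, upper_q=0.75, lower_q=0.25):
    """+1 (long) above the upper channel, -1 (short) below the lower channel,
    otherwise keep the previous position. Flat until the first signal."""
    positions, state = [], 0
    for t, price in enumerate(prices):
        if t >= window:
            upper, lower = channel(prices[t - window:t], window, upper_q, lower_q)
            if price > upper:
                state = 1
            elif price < lower:
                state = -1
        positions.append(state)
    return positions

# Toy price path (made up): a pullback just below the old high, then a breakout.
prices = [1, 2, 3, 4, 5, 4.5, 6]
percentile_pos = channel_positions(prices, window=5)                     # [0, 0, 0, 0, 0, 1, 1]
donchian_pos = channel_positions(prices, 5, upper_q=1.0, lower_q=0.0)    # [0, 0, 0, 0, 0, 0, 1]
```

On this toy series the percentile channel goes long one bar before the Donchian channel, which is the earlier-entry effect described above. A short window is used only to keep the example readable; the test in the post uses 55 days.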

The results from 1995-2014 are presented below:

Interestingly enough, the percentile channels help to revive a broken system with earlier entries. Another turtle system, perhaps the most famous, used the 20-day Donchian Channel. For added robustness, let’s see how percentile channels might revive this long-broken system:

While this isn’t a perfect proxy for a futures/trend-following portfolio, the results show that it is possible to revive old systems based on new highs and lows using a less restrictive percentile channel approach. This leads to earlier entries that avoid the noise generated from competing signals. Regardless, percentile channels are just another tool for trend-following and can create a wider range of support/resistance type systems by varying the chosen entry/exit threshold.

## Distance Weighted Moving Averages (DWMA and IDWMA)

The distance weighted moving average is another nonlinear filter that provides the basis for further research and exploration. In its traditional form, a distance weighted moving average (DWMA) is designed to be a robust version of a moving average to reduce the impact of outliers. Here is the calculation from the Encyclopedia of Math:

Notice in the example above that “12” is clearly an outlier relative to the other data points and is therefore assigned less weight in the final average. The advantage of this approach over simple trimming or winsorization (omitting or capping outliers once they are identified) is that all of the data is used and no arbitrary threshold needs to be specified. This is especially valuable for multi-dimensional data. By squaring the distance values in the calculation of the DWMA instead of simply taking the absolute value, it is possible to make the average even more insensitive to outliers.

Notice that this concept can also be reversed to emphasize outliers or simply larger data points. This can be done by using the distances themselves as weights rather than inverting them. This can be called an “inverse distance moving average” or IDWMA, and is useful in situations where you want to ignore small moves in a time series which can be considered “white noise” and instead make the average more responsive to breakouts. Furthermore, this method may prove more valuable for use in volatility calculations where sensitivity to risk is important. The chart below shows how these different moving averages respond to a fictitious time series with outliers:
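A rough Python sketch of both averages over a single window is shown here. It assumes a reading of the calculation in which each point's weight is the inverse of its summed distance to every other point in the window (DWMA), or that summed distance itself (IDWMA). The function names, the `power` parameter and the sample window are hypothetical and are not taken from the Encyclopedia of Math example.

```python
def distance_weights(values, power=1, invert=True):
    """Normalized weights based on each point's summed distance to all points.

    invert=True  -> DWMA weights (far-away points get less weight)
    invert=False -> IDWMA weights (far-away points get more weight)
    power=2 squares the distances, as suggested in the text.
    """
    dists = [sum(abs(x - y) ** power for y in values) for x in values]
    if any(d == 0 for d in dists):
        # A zero total distance means every value is identical: use equal weights.
        return [1.0 / len(values)] * len(values)
    raw = [1.0 / d for d in dists] if invert else dists
    total = sum(raw)
    return [w / total for w in raw]

def dwma(values, power=1):
    """Distance weighted average of one window (robust to outliers)."""
    weights = distance_weights(values, power, invert=True)
    return sum(w * x for w, x in zip(weights, values))

def idwma(values, power=1):
    """Inverse variant: emphasizes the points farthest from the rest."""
    weights = distance_weights(values, power, invert=False)
    return sum(w * x for w, x in zip(weights, values))

# Made-up window containing one obvious outlier.
window = [1.0, 2.0, 3.0, 2.0, 12.0]
sma = sum(window) / len(window)
print(dwma(window), sma, idwma(window))
```

On this window the DWMA lands below the simple average and the IDWMA above it, because the outlier is down-weighted in the first case and up-weighted in the second. Applying either function to successive rolling windows gives the moving average version.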

Notice that the DWMA is the least sensitive to the price moves and large jumps, while the IDWMA is the most sensitive. The SMA response falls between the DWMA and IDWMA. The key is that none of these moving averages is superior to the others per se; rather, each is valuable for different applications and can perform better or worse on different time series. With that statement, let’s look at some practical examples. My preference is typically to use returns rather than prices, so in this case we will look at applying the different moving average variations (the DWMA, IDWMA and SMA) to two different time series: the S&P500 and Gold. Traders and investors readily acknowledge that the S&P500 is fairly noisy, especially in the short-term. In contrast, Gold tends to be unpredictable using long-term measurements, but large moves tend to be predictable in the short-term. Here is the performance using a 10-day moving average with the different variations from 1995 to present. The rules are long if the average is above zero and cash if it is below (no interest on cash is assumed in this case):

Consistent with anecdotal observation, the DWMA performs the best on the S&P500 by filtering out large noisy or mean-reverting price movements. The IDWMA in contrast performs the worst because it distorts the average by emphasizing these moves. But the pattern is completely different with Gold. In this case the IDWMA benefits from highlighting these large (and apparently useful) trend signals, while the DWMA performs the worst. In both cases the SMA has middling performance.

One of the disadvantages of a distance weighted moving average is that the calculation ignores the position in time of each data point. An outlier is less relevant if it occurred, for example, over 60 days ago than if it occurred today. This aspect can be addressed through clever manipulation of the calculation. However, the main takeaway is that it is possible to use different weighting schemes for a moving average for different time series and achieve potentially superior results. Perhaps an adaptive approach would yield good results. Furthermore, careful thought should go into the appropriate moving average calculation for different types of applications. For example, you may wish to use the DWMA instead of the median to calculate correlations, which can be badly distorted by outliers. Perhaps using a DWMA for performance or trade statistics makes sense as well. As mentioned earlier, using an IDWMA is helpful for volatility-based calculations in many cases. Consider this a very simple tool to add to your quant toolbox.

## NLV2: Capturing the 3rd Derivative

In the previous post, I introduced the concept of a non-linear filter that combines volatility and acceleration. However, this is just one configuration to leverage the concept of a non-linear filter. Using a traditional volatility calculation assigns each data point an equal weight, when in practice some data points should logically have more weight than others. To capture different weighting functions, one could use multiple indicators to weight data points in the volatility calculation to make it more responsive to incoming market data. Using acceleration was an interesting idea to reduce lag and quickly capture changes in volatility. Preliminary analysis showed some promise in this regard. Acceleration is the 2nd derivative, so an interesting question is whether the 3rd derivative, or the velocity of acceleration, can produce even better results. I created a new framework to capture the non-linear weighting that is much simpler to understand and implement:

A) Calculate the rolling series of the square of the daily log returns minus their average return

B) Calculate the rolling series of the absolute value of the first difference in log returns (acceleration/error)

C) Calculate the rolling series of the absolute value of the first difference in B (the absolute acceleration/error of log returns); this is the 3rd derivative or velocity of acceleration.

D) Weight each daily value in A by the current day’s C value divided by the sum of C values over the last 10 days

E) Find the sum of the values in D; this is NLV2
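The steps above can be sketched in Python as follows. This is one possible reading rather than a definitive implementation: the function names are hypothetical, the average return in step A is taken over the same window as the weights, the output is a weighted variance (no square root or annualization is applied), and the equal-weight fallback for a window with no change in acceleration is an added assumption.

```python
import math

def log_returns(prices):
    """Daily log returns from a list of prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def nlv2(returns, window=10):
    """Weighted variance of the last `window` returns, weighted by the
    absolute change in absolute acceleration (steps A to E)."""
    if len(returns) < window + 2:
        raise ValueError("need at least window + 2 returns")
    recent = returns[-window:]
    mean = sum(recent) / window
    # Step A: squared deviation of each return from the window average.
    a = [(r - mean) ** 2 for r in recent]
    # Step B: absolute first difference of returns (acceleration/error).
    b = [abs(y - x) for x, y in zip(returns, returns[1:])]
    # Step C: absolute first difference of B (velocity of acceleration).
    c = [abs(y - x) for x, y in zip(b, b[1:])]
    c_recent = c[-window:]  # aligned with the last `window` returns
    total = sum(c_recent)
    if total == 0:
        # Assumption: fall back to equal weights if C is zero everywhere.
        return sum(a) / window
    # Steps D and E: weight each A value by C / sum(C), then sum.
    return sum(ai * ci / total for ai, ci in zip(a, c_recent))

# Tiny made-up example with a 3-day window so the arithmetic is easy to follow.
example = nlv2([0.0, 0.0, 0.03, 0.0, 0.0], window=3)
```

Two extra returns are needed before the window starts because each differencing step shortens the series by one observation.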

Here is how NLV2 performs on the S&P500 (SPY) versus the other methods previously presented:

The profile of this method is very different from the others, and while it hasn’t performed as well comparatively in recent years, it has been the best performer over the entire period that was tested. While other people may dismiss things that have underperformed recently, my own research suggests that this is a mistake: many systems and methods mean-revert around some long-term average. Since this method has fewer moving parts than NLV, that makes it inherently more desirable and perhaps more durable. In either case the point of presenting this method is not to evaluate performance or suggest that it is a superior weighting scheme. It is to present an alternative way to look at the data. Clearly different derivatives of log returns carry different pieces of information, and combining these into a calibrated forecast model or a non-linear filter may add value above and beyond the standard volatility formulation.

## 1,000,000 And Counting…….

Yesterday this blog reached the milestone of having over 1,000,000 page views. Sure, it is about equivalent to a popular James Altucher post, but to me it is a major milestone, and I would like to thank all of the readers who have supported this blog over the last few years- even when I took long breaks from posting to spend time working at my various day jobs. Writing this blog has been a labor of love so to speak. It has also given me back a lot more than all the free intellectual property and concepts or ideas I have shared. I have met so many different talented people, and been exposed to work in many different and interesting areas of the investment industry.

When people ask me why I would ever freely share my ideas instead of keeping things to myself and maximizing their market value, I tell them that if I had to do it all over, I would absolutely do it again. We are entering a new era where sharing and collaboration will build (and already have built) more value than guarding and monopolizing information. The collective and the team will triumph over the individual. Gone are the days of worshipping gurus on CNBC, and instead we have a marketplace where smart people with access to technology can create quantitative systems that can potentially put to shame some of the best human portfolio managers. But to be able to build a sustainable edge requires a strong and open research culture, and effectively integrating the diverse talents of multiple individuals.

I hope to continue to share my own learning process and random musings as long as I can. And since I am still somewhat young and somewhat foolish, I have plenty of time and potential to improve. If there are topics that you find especially interesting that I have not yet covered, or not covered in enough detail, please leave me some ideas in the comments section. As always, I would encourage anyone who has done work with some of the ideas presented to share it with me, as I would be more than happy to post it on this blog.

best,

David

## Combining Acceleration and Volatility into a Non-Linear Filter (NLV)

The last two posts presented a novel way of incorporating acceleration as an alternative measure of risk. The preliminary results and also intuition demonstrate that it deserves consideration as another piece of information that can be used to forecast risk. While I posed the question as to whether acceleration was a “better” indicator than volatility, **the more useful question should be whether we can combine the two into perhaps a better indicator than either in isolation**. Traditional volatility is obviously more widely used, and is critical for solving traditional portfolio optimization. Therefore, it is a logical choice as a baseline indicator.

Linear filters such as moving averages and regression generate output that is a linear function of input. Non-linear filters in contrast generate output that is non-linear with respect to input. An example of a non-linear filter would be polynomial regression, or even the humble median. The goal of non-linear filters is to create a superior means of weighting data to create more accurate output. By using a non-linear filter it is possible to substantially reduce lag and increase responsiveness. Since volatility is highly predictable, it stands to reason that we would like to reduce lag and increase responsiveness as much as possible to generate superior results.
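To make the distinction concrete, here is a small Python check (an illustration with made-up numbers, not part of the original analysis) of the superposition property. For a linear filter such as the simple average, filtering the sum of two inputs gives the sum of the filtered outputs; for the median it does not.

```python
import statistics

# Two made-up input windows (hypothetical values for illustration only).
x = [1.0, 2.0, 9.0]
y = [9.0, 2.0, 1.0]
combined = [a + b for a, b in zip(x, y)]  # element-wise sum: [10.0, 4.0, 10.0]

# A simple average is linear: averaging the sum equals summing the averages.
mean_of_sum = statistics.mean(combined)
sum_of_means = statistics.mean(x) + statistics.mean(y)

# The median is non-linear: the same identity fails on this input.
median_of_sum = statistics.median(combined)
sum_of_medians = statistics.median(x) + statistics.median(y)

print(mean_of_sum, sum_of_means)      # 8.0 8.0
print(median_of_sum, sum_of_medians)  # 10.0 4.0
```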

So how would we create a volatility measure that incorporates acceleration? The answer is that we need to dynamically weight each squared deviation from the average as a function of the magnitude of acceleration- where greater absolute acceleration should generate an exponentially higher weighting on each data point. Here is how it is calculated for a 10-day NLV. (Don’t panic, I will post a spreadsheet in the next post):

A) Calculate the rolling series of the square of the daily log returns minus their average return

B) Calculate the rolling series of the absolute value of the first difference in log returns (acceleration/error)

C) Take the current day’s value using B and divide it by some optional average such as 20-days to get a relative value for acceleration

D) Raise C to the exponent of 3 or some constant of your choosing- this is the rolling series of relative acceleration constants

E) Weight each daily value in A by the current day’s D value divided by the sum of D values over the last 10 days

F) Sum the weighted values from E and divide by the sum of the weights (each D value divided by the sum of D values); this is the NLV and is analogous to the computation of a traditional weighted moving average
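Until the spreadsheet is available, the steps above can be sketched in Python as follows. This is one interpretation and not a definitive implementation: the function name and parameter names are hypothetical, the 20-day average in step C is taken as a trailing average of B ending on each day, the result is a weighted variance rather than an annualized volatility, and the handling of zero denominators is an added assumption.

```python
def nlv(returns, window=10, norm_window=20, exponent=3):
    """Acceleration-weighted variance of the last `window` returns (steps A to F)."""
    if len(returns) < window + norm_window:
        raise ValueError("need at least window + norm_window returns")
    recent = returns[-window:]
    mean = sum(recent) / window
    # Step A: squared deviation of each return from the window average.
    a = [(r - mean) ** 2 for r in recent]
    # Step B: absolute first difference of returns; b[i] belongs to returns[i + 1].
    b = [abs(y - x) for x, y in zip(returns, returns[1:])]
    d = []
    for k in range(len(b) - window, len(b)):
        # Step C: acceleration relative to its trailing average.
        avg_b = sum(b[k - norm_window + 1:k + 1]) / norm_window
        c = b[k] / avg_b if avg_b > 0 else 1.0  # assumption: neutral weight if flat
        # Step D: raise to an exponent so large accelerations dominate.
        d.append(c ** exponent)
    total = sum(d)
    if total == 0:
        # Assumption: fall back to equal weights if every weight is zero.
        return sum(a) / window
    # Steps E and F: weighted sum of A divided by the sum of the weights.
    return sum(ai * di for ai, di in zip(a, d)) / total

# Tiny made-up example with short windows so the arithmetic is easy to check.
sample = [0.0, 0.0, 0.03, 0.0, 0.0]
weighted = nlv(sample, window=3, norm_window=2)
equal = nlv(sample, window=3, norm_window=2, exponent=0)
```

Setting `exponent=0` makes every weight equal to 1, which collapses the calculation to an ordinary equal-weighted variance; that gives a convenient baseline when judging how much the acceleration weighting changes the estimate.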

And now for the punchline–here are the results versus the different alternative measures presented in the last post:

The concept shows promise as a hybrid measure of volatility that incorporates acceleration. The general calculation can be applied many different ways but this method is fairly intuitive and generic. In the next post I will show how to make this non-linear filter even more responsive to changes in volatility.