The distance weighted moving average is another nonlinear filter that provides the basis for further research and exploration. In its traditional form, a distance weighted moving average (DWMA) is designed to be a robust version of a moving average to reduce the impact of outliers. Here is the calculation from the Encyclopedia of Math:
Notice in the example above that “12” is clearly an outlier relative to the other data points and is therefore assigned less weight in the final average. The advantage of this approach over simple winsorization (omitting identified outliers from the calculation) is that all of the data is used and no arbitrary threshold needs to be specified. This is especially valuable for multi-dimensional data. By squaring the distance values in the calculation of the DWMA instead of simply taking the absolute value, it is possible to make the average even more insensitive to outliers. Notice that this concept can also be reversed to emphasize outliers or simply larger data points. This can be done by using the distance weights directly rather than inverting the distances as fractions. This can be called an “inverse distance weighted moving average” or IDWMA, and is useful in situations where you want to ignore small moves in a time series that can be considered “white noise” and instead make the average more responsive to breakouts. Furthermore, this method may prove more valuable for use in volatility calculations where sensitivity to risk is important. The chart below shows how these different moving averages respond to a fictitious time series with outliers:
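To make the weighting concrete, here is a minimal Python sketch of both averages, assuming the standard definition in which each point's "distance" is its total absolute distance to the other points in the window. The function names are illustrative, not from the original post.

```python
# A minimal sketch of both averages, assuming each point's distance is its
# total absolute distance to the other points in the window.

def dwma(window, power=1):
    """Distance-weighted MA: points far from the rest get LESS weight.
    power=2 squares the distances for even more outlier resistance."""
    dist = [sum(abs(x - y) for y in window) for x in window]
    weights = [1.0 / d ** power if d > 0 else 1.0 for d in dist]
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)

def idwma(window):
    """Inverse distance-weighted MA: points far from the rest get MORE weight."""
    dist = [sum(abs(x - y) for y in window) for x in window]
    return sum(d * x for d, x in zip(dist, window)) / sum(dist)

data = [1, 2, 3, 2, 12]        # "12" is the outlier
print(dwma(data))              # pulled toward the cluster of small values
print(idwma(data))             # pulled toward the outlier
print(sum(data) / len(data))   # simple average (4.0) for comparison
```

Swapping `power=2` into `dwma` implements the squared-distance variant mentioned above, which discounts the outlier even more aggressively.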
Notice that the DWMA is the least sensitive to the price moves and large jumps, while the IDWMA is the most sensitive, and the SMA response falls in between the two. The key point is that neither moving average is superior per se; rather, each is valuable for different applications and can perform better or worse on different time series. With that said, let’s look at some practical examples. My preference is typically to use returns rather than prices, so in this case we will apply the different moving average variations- the DWMA, IDWMA and SMA- to two different time series: the S&P500 and Gold. Traders and investors readily acknowledge that the S&P500 is fairly noisy, especially in the short term. In contrast, Gold tends to be unpredictable using long-term measurements, but its large moves tend to be predictable in the short term. Here is the performance using a 10-day moving average with the different variations from 1995 to present. The rules are long if the average is above zero and cash if it is below (no interest on cash is assumed in this case):
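For readers who want to replicate the test, the rule above can be sketched as follows. This is a simplified illustration- it assumes a plain list of daily returns, ignores transaction costs, and lets you swap in any of the averaging functions:

```python
# Toy backtest of the rule: long when the trailing 10-day average of returns
# is above zero, otherwise cash (earning nothing). `avg_fn` can be a simple
# mean, a DWMA, or an IDWMA; the inputs here are illustrative.

def backtest(returns, avg_fn, lookback=10):
    value = 1.0
    equity = [value]
    for t in range(lookback, len(returns)):
        signal = avg_fn(returns[t - lookback:t])  # uses PRIOR returns only (no lookahead)
        if signal > 0:                            # long if the average is above zero
            value *= 1.0 + returns[t]
        equity.append(value)                      # cash leg earns no interest
    return equity

sma = lambda w: sum(w) / len(w)
curve = backtest([0.01] * 20, sma)                # toy input: 20 days of +1% returns
```

Passing `dwma` or `idwma` in place of `sma` reproduces the comparison described in the text on any return series you supply.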
Consistent with anecdotal observation, the DWMA performs the best on the S&P500 by filtering out large noisy or mean-reverting price movements. The IDWMA, in contrast, performs the worst because it distorts the average by emphasizing these moves. But the pattern is completely different with Gold: in this case the IDWMA benefits from highlighting these large (and apparently useful) trend signals, while the DWMA performs the worst. In both cases the SMA has middling performance. One of the disadvantages of a distance weighted moving average is that the calculation ignores the position in time of each data point. An outlier is less relevant if it occurred, for example, over 60 days ago rather than today. This aspect can be addressed through clever manipulation of the calculation. However, the main takeaway is that it is possible to use different weighting schemes for a moving average on different time series and achieve potentially superior results. Perhaps an adaptive approach would yield good results. Furthermore, careful thought should go into the appropriate moving average calculation for different types of applications. For example, you may wish to use the DWMA instead of the median to calculate correlations- which can be badly distorted by outliers. Perhaps using a DWMA for performance or trade statistics makes sense as well. As mentioned earlier, using an IDWMA is helpful for volatility-based calculations in many cases. Consider this a very simple tool to add to your quant toolbox.
In the previous post, I introduced the concept of a non-linear filter that combines volatility and acceleration. However, this is just one configuration that leverages the concept of a non-linear filter. A traditional volatility calculation assigns each data point an equal weight, when in practice some data points should logically carry more weight than others. To capture different weighting functions, one could use multiple indicators to weight the data points in the volatility calculation and make it more responsive to incoming market data. Using acceleration was an interesting way to reduce lag and quickly capture changes in volatility, and preliminary analysis showed some promise in this regard. Acceleration is the 2nd derivative, so an interesting question is whether the 3rd derivative- the velocity of acceleration- can produce even better results. I created a new framework to capture the non-linear weighting that is much simpler to understand and implement:
A) Calculate the rolling series of the square of the daily log returns minus their average return
B) Calculate the rolling series of the absolute value of the first difference in log returns (acceleration/error)
C) Calculate the rolling series of the absolute value of the first difference in B (the absolute acceleration/error of log returns)- this is the 3rd derivative, or velocity of acceleration
D) Weight each daily value in A by the current day’s C value divided by the sum of C values over the last 10 days-
E) Find the sum of the values in D- this is NLV2
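The lettered steps can be sketched in Python as follows. This is a simplified illustration over a 10-day window; the exact alignment of the rolling series is my own assumption, not the author's spreadsheet.

```python
import math

# Sketch of NLV2: squared return deviations (A) weighted by the "velocity of
# acceleration" (C), per the lettered steps above. Alignment is illustrative.

def nlv2(prices, n=10):
    r = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]   # daily log returns
    b = [abs(r[i] - r[i - 1]) for i in range(1, len(r))]            # B: |acceleration|
    c = [abs(b[i] - b[i - 1]) for i in range(1, len(b))]            # C: |velocity of acceleration|
    r = r[2:]                                                       # align returns with C
    mean_r = sum(r[-n:]) / n
    a = [(x - mean_r) ** 2 for x in r[-n:]]                         # A: squared deviations
    w_raw = c[-n:]
    total = sum(w_raw)
    w = [x / total for x in w_raw] if total > 0 else [1.0 / n] * n  # D: C-based weights
    return sum(wi * ai for wi, ai in zip(w, a))                     # E: NLV2
```

For a perfectly steady trend the measure collapses toward zero, while choppy return series inflate the weights on the most erratic days.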
Here is how NLV2 performs on the S&P500 (SPY) versus the other methods previously presented:
The profile of this method is very different from the others, and while it hasn’t performed as well comparatively in recent years, it was the best performer over the entire period tested. While some may dismiss methods that have underperformed recently, my own research suggests that this is a mistake- many systems and methods mean-revert around some long-term average. Since this method has fewer moving parts than NLV, it is inherently more desirable and perhaps more durable. In either case, the point of presenting this method is not to evaluate performance or to suggest that it is a superior weighting scheme. It is to present an alternative way to look at the data- clearly, different derivatives of log returns carry different pieces of information, and combining these into a calibrated forecast model or a non-linear filter may add value above and beyond the standard volatility formulation.
Yesterday this blog reached the milestone of having over 1,000,000 page views. Sure, it is about equivalent to a popular James Altucher post, but to me it is a major milestone, and I would like to thank all of the readers who have supported this blog over the last few years- even when I took long breaks from posting to spend time working at my various day jobs. Writing this blog has been a labor of love so to speak. It has also given me back a lot more than all the free intellectual property and concepts or ideas I have shared. I have met so many different talented people, and been exposed to work in many different and interesting areas of the investment industry.
When people ask me why I would ever freely share my ideas instead of keeping things to myself and maximizing their market value, I tell them that if I had to do it all over, I would absolutely do it again. We are entering a new era where sharing and collaboration will build (and already have built) more value than guarding and monopolizing information. The collective and the team will triumph over the individual. Gone are the days of worshipping gurus on CNBC; instead we have a marketplace where smart people with access to technology can create quantitative systems that can potentially put to shame some of the best human portfolio managers. But building a sustainable edge requires a strong and open research culture, and effectively integrating the diverse talents of multiple individuals.
I hope to continue to share my own learning process and random musings as long as I can. And since I am still somewhat young and somewhat foolish, I have plenty of time and potential to improve. If there are topics that you find especially interesting that I have not yet covered- or not covered in enough detail- please leave me some ideas in the comments section. As always, I encourage anyone experimenting with the ideas presented here to share their work with me, as I would be more than happy to post it on this blog.
The last two posts presented a novel way of incorporating acceleration as an alternative measure of risk. Both the preliminary results and intuition suggest that it deserves consideration as another piece of information that can be used to forecast risk. While I posed the question as to whether acceleration was a “better” indicator than volatility, the more useful question is whether we can combine the two into an indicator that is perhaps better than either in isolation. Traditional volatility is obviously more widely used, and is critical for solving traditional portfolio optimization. Therefore, it is a logical choice as a baseline indicator.
Linear filters such as moving averages and regression generate output that is a linear function of their input. Non-linear filters, in contrast, generate output that is non-linear with respect to the input. An example of a non-linear filter would be polynomial regression, or even the humble median. The goal of a non-linear filter is to create a superior means of weighting data to produce more accurate output. By using a non-linear filter it is possible to substantially reduce lag and increase responsiveness. Since volatility is highly predictable, it stands to reason that we would like to reduce lag and increase responsiveness as much as possible to generate superior results.
So how would we create a volatility measure that incorporates acceleration? The answer is that we need to dynamically weight each squared deviation from the average as a function of the magnitude of acceleration- where greater absolute acceleration should generate an exponentially higher weighting on each data point. Here is how it is calculated for a 10-day NLV. (Don’t panic, I will post a spreadsheet in the next post):
A) Calculate the rolling series of the square of the daily log returns minus their average return
B) Calculate the rolling series of the absolute value of the first difference in log returns (acceleration/error)
C) Take the current day’s value using B and divide it by some optional average such as 20-days to get a relative value for acceleration
D) Raise C to the exponent of 3 or some constant of your choosing- this is the rolling series of relative acceleration constants
E) Weight each daily value in A by the current day’s D value divided by the sum of D values over the last 10 days-
F) Divide the sum of the values in E by the sum of the weighting constants (so that the weights effectively sum to one)- this is the NLV and is analogous to the computation of a traditional weighted moving average
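The recipe above can be sketched in Python as follows, using a 10-day window, a 20-day baseline for the relative acceleration, and a cube exponent. The exact alignment of the rolling series is my own assumption, not the author's spreadsheet.

```python
import math

# Sketch of NLV: squared return deviations (A) weighted by relative
# acceleration raised to an exponent (C/D), per the lettered steps above.

def nlv(prices, n=10, base=20, expo=3):
    r = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]  # daily log returns
    b = [abs(r[i] - r[i - 1]) for i in range(1, len(r))]           # B: |acceleration|
    d = []
    for i in range(len(b) - n, len(b)):
        window = b[max(0, i - base):i]                             # ~20-day baseline
        avg_b = sum(window) / len(window)
        rel = b[i] / avg_b if avg_b > 0 else 1.0                   # C: relative acceleration
        d.append(rel ** expo)                                      # D: raised to the exponent
    mean_r = sum(r[-n:]) / n
    a = [(x - mean_r) ** 2 for x in r[-n:]]                        # A: squared deviations
    return sum(di * ai for di, ai in zip(d, a)) / sum(d)           # E/F: weighted average
```

Days with large relative acceleration thus dominate the weighted average, which is what makes the filter non-linear relative to an equally weighted variance.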
And now for the punchline–here are the results versus the different alternative measures presented in the last post:
The concept shows promise as a hybrid measure of volatility that incorporates acceleration. The general calculation can be applied many different ways but this method is fairly intuitive and generic. In the next post I will show how to make this non-linear filter even more responsive to changes in volatility.
In the last post, I introduced a different method for accounting for risk or uncertainty based on the acceleration in daily returns. Another way to think about this measure is that it takes into account the average error in using yesterday’s return to forecast today’s return. An application using errors to create better trend-following indicators was discussed in “Error-Adjusted Momentum.” A lot of questions and comments on the last post related to the specific calculation of the volatility of acceleration (VOA). The answer is that it is a general concept, and the specific calculation can be done many different ways. But to narrow things down, I will present one way that I think is the most stable and discuss how this concept can be extended further.
VOA = average of: | ln(pt/pt-1) - ln(pt-1/pt-2) |, ... , | ln(pt-n+1/pt-n) - ln(pt-n/pt-n-1) |
Essentially the VOA is the average of the absolute value of the first difference of daily log returns. Here is a spreadsheet picture to help clarify this basic calculation:
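In code form, the same calculation looks like this (illustrative; `prices` is assumed to be a list of daily closes):

```python
import math

# VOA: the n-day average of the absolute first difference of daily log
# returns, per the formula above.

def voa(prices, n=10):
    r = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]  # log returns
    diffs = [abs(r[i] - r[i - 1]) for i in range(1, len(r))]       # |first difference|
    return sum(diffs[-n:]) / n                                     # n-day average
```

A perfectly steady trend produces a VOA of zero, while a series that whipsaws day to day produces a large value even if its net drift is flat.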
Here is an alternative framework called “Forecast VOA” that incorporates the change in VOA between today and yesterday to give the indicator even less lag than the original VOA:
Forecast VOA (F-VOA) = VOA(t) + k * (VOA(t) - VOA(t-1))
where “k” is a constant between 0 and 1 that is selected either manually or via optimization. The lag does not need to be 1 day, but can be any chosen number of days backward. The constant k and the number of lags can be incorporated as an independent or recursive process, similar to an EMA, where the previously solved measure is fed back into the next time step. In the following example I used the Forecast VOA on the S&P500 (SPY) with a lag of one and a “k” value equal to one. I used a 10-day lookback for the baseline VOA and did not use recursion. The target VOA was set at 1%, where the position size is calculated as the % target divided by VOA. To make a comparison to standard volatility position-sizing, I used a 10-day lookback and computed the 20-day rolling hedge ratio between 10-day VOA and 10-day volatility on a walk-forward basis and multiplied that by the % target to normalize. Here are the results:
Both the Forecast VOA and the standard VOA outperform standard volatility on both a return and a risk-adjusted (Sharpe) basis. Readers will find that the standard VOA typically outperforms standard volatility across a range of lookbacks, from short to long-term, on the S&P500. However, measures shorter than 10 days are typically too noisy to use since they contain first differences. The Forecast VOA seems to outperform VOA, with the caveat that at the very least one would need to solve for the parameter “k” to calibrate the model. More ambitious formulations would solve for both “k” and the optimal lag, and possibly include other terms in the equation. At this point, further research needs to be done on other instruments, as suggested by one of the readers- but a framework like F-VOA permits adaptation that can enhance performance and consistency across markets. The VOA framework is one step in the direction of looking at alternative and possibly better measures of volatility. However, there are some very sophisticated econometric methods such as GARCH for forecasting volatility that readers are strongly encouraged to explore as well.
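The F-VOA position sizing described above can be sketched as follows, with lag 1, k = 1, and a 1% daily target. The hedge-ratio normalization step is omitted for brevity, and the helper names are illustrative.

```python
import math

# Sketch of F-VOA position sizing: F-VOA = VOA(t) + k*(VOA(t) - VOA(t-1)),
# position = % target / F-VOA. voa() is the n-day average absolute first
# difference of daily log returns.

def voa(prices, n=10):
    r = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    diffs = [abs(r[i] - r[i - 1]) for i in range(1, len(r))]
    return sum(diffs[-n:]) / n

def fvoa_position(prices, n=10, k=1.0, target=0.01):
    v_today = voa(prices, n)
    v_prior = voa(prices[:-1], n)            # VOA lagged by one day
    f = v_today + k * (v_today - v_prior)    # Forecast VOA
    return target / f if f > 0 else 0.0      # position size = % target / F-VOA
```

Because the forecast term extrapolates the latest change in VOA, position size shrinks faster when uncertainty is rising and recovers faster when it is falling.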
Momentum is one of the most popular subjects in the financial media. From a physics perspective, it is analogous to a measure of average velocity (the rate of change of distance with respect to time). While measures of instantaneous velocity (the derivative of distance with respect to time) are more accurate, they are rarely used in practice. One topic that is rarely discussed is the concept of acceleration- the rate of change in velocity (instantaneous acceleration is the derivative of velocity with respect to time). A simple method to calculate velocity and acceleration is to take the first difference of a price or return series. The table below shows the difference between velocity and acceleration using this simple method of first differences of price, although returns could also be used:
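In code, the first-difference proxies look like this (the price series is purely illustrative):

```python
# First-difference proxies described above: velocity as the change in price,
# acceleration as the change in velocity.

prices = [100, 102, 105, 107, 106]
velocity = [p1 - p0 for p0, p1 in zip(prices, prices[1:])]            # [2, 3, 2, -1]
acceleration = [v1 - v0 for v0, v1 in zip(velocity, velocity[1:])]    # [1, -1, -3]
```

Note how the final day's drop flips the sign of velocity, while acceleration flags the change in direction a step earlier as momentum decays.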
Acceleration is perhaps a more interesting avenue for quantitative research because it signals changes in momentum. Given two stocks with the same momentum, a parabolic rise tends to be riskier than a short-term deceleration in performance. Furthermore, a stock that rises consistently without much change in velocity tends to be less risky than one that constantly changes direction. Unfortunately, volatility on its own tends to penalize fluctuations around a constant slope, which may reflect the normal ebbs and flows of a persistent trend. For trend-following, traders want a risk measure that is able to capture the stability of the trend. Various risk measures have been promoted as superior alternatives, such as downside standard deviation or conditional value at risk, but their downside- to excuse the pun- is that they fail to reflect risk that can be created on the upside of the distribution. Furthermore, by focusing on the downside of the distribution you have fewer samples to work with. Ideally an alternative risk metric would be bi-directional and indifferent to up or down returns. One such measure is the volatility of acceleration- the volatility of the change in returns, using the first difference between today’s return and yesterday’s return as a proxy. The following charts of artificially generated time series data show the difference between the volatility of price differences (rather than returns) and the volatility of acceleration, to give readers a sense for how the two measures differ. In this case, both time series start and end at the same price but have different profiles:
In the first chart, the trend is very consistent and both measures are very close- and considering that the volatility of acceleration tends to be roughly 50% greater than volatility on average, it is actually indicating lower risk than volatility does. In the second chart, the trend is highly inconsistent with several false moves up and down. In this case both volatility and the volatility of acceleration show an increase in risk versus the first time series. However, the volatility of acceleration indicates that risk is significantly higher than volatility does- even adjusting for scale- which makes intuitive sense. Essentially, the volatility of acceleration is boosted by the changes in the direction of momentum. What is even more interesting is that when we apply this volatility of acceleration measure to real time series data, we see that it tends to lead volatility- in other words, it tends to provide an early warning signal for risk. The chart below shows the 10-day volatility on the S&P500 (SPY) versus the 10-day volatility of acceleration during the financial crisis in 2008:
Notice that the volatility of acceleration spikes well in advance of the standard volatility readings. Obviously this is a valuable feature for good risk management. But does this translate to practical use in trading systems? As it turns out, it does, and there are a lot of different ways to apply this concept. One simple method is to look at standard volatility position sizing. In this case we use the same 10-day measure for both and a 1% daily target risk (1.5% for the volatility of acceleration to reflect the difference in scale):
Using the volatility of acceleration measure results in consistently superior returns over time and also higher risk-adjusted returns (a higher Sharpe ratio). Creative exploration could yield numerous areas of fruitful research. One additional application is to create an alternative to the popular Sharpe ratio: by calculating the annualized volatility of acceleration, dividing it by 1.5 to adjust for scale, and using it in place of volatility in the denominator, one can compute what I like to call the “return-to-uncertainty ratio,” which may improve the evaluation of trading strategies.
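One way to sketch the ratio is below. Annualizing the VOA with the usual square-root-of-time rule is my own assumption, not something specified above, and `daily_returns` is assumed to be a plain list of simple daily returns.

```python
import math

# Sketch of the "return-to-uncertainty ratio": annualized return divided by
# annualized volatility of acceleration scaled down by 1.5 (the rough average
# ratio of VOA to volatility noted earlier). Annualization via sqrt(252) is
# an assumption.

def return_to_uncertainty(daily_returns, trading_days=252):
    ann_return = sum(daily_returns) / len(daily_returns) * trading_days
    diffs = [abs(daily_returns[i] - daily_returns[i - 1])
             for i in range(1, len(daily_returns))]
    ann_voa = sum(diffs) / len(diffs) * math.sqrt(trading_days)  # daily VOA, annualized
    return ann_return / (ann_voa / 1.5)
```

A strategy with choppy, direction-flipping returns is penalized here even when its conventional volatility looks tame.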
In the last post, I presented a schematic of how Cluster Random Subspace (CRSM) would work for portfolio optimization. But the concept can be extended to prediction and classification. Obviously “Random Forests” could incorporate this concept to build superior decision trees with fewer samples- but I would caution that decision trees tend to have poor performance for the prediction of financial time series (due to binary thresholding and over-fitting). Standard multiple regression, however, has been a workhorse in finance simply because it is more robust and less prone to over-fitting than other machine-learning approaches. One of the challenges in regression is that it breaks down when you have a lot of variables to choose from and some of them are highly correlated. There are many established ways of dealing with this problem (PCA, stepwise regression, etc.), but CRSM remains an excellent candidate since it can be used to form a robust ensemble forecast from a large group of predictors. It does not address the initial choice of variables, but at least it can automatically handle a large group of candidates that may contain some highly correlated variables. CRSM Regression is a good way to proceed when you have a lot of possible indicators but no preconceived ideas for constructing a good model. Here is a process diagram for one of many possible methods to apply CRSM Regression:
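As a complement to the diagram, here is a stripped-down sketch of the random subspace part of the idea: fit many small OLS models on random subsets of the predictors and average their forecasts. The clustering step (sampling across correlation clusters rather than uniformly) is omitted here, and the function name and defaults are illustrative.

```python
import numpy as np

# Random-subspace regression ensemble: each model sees only a random subset
# of the columns of X; the final forecast is the average across models.

def rsm_regression_forecast(X, y, x_new, n_models=100, subspace=3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    preds = []
    for _ in range(n_models):
        cols = rng.choice(p, size=min(subspace, p), replace=False)  # random subspace
        Xs = np.column_stack([np.ones(n), X[:, cols]])              # add intercept
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)               # OLS on the subset
        preds.append(beta[0] + x_new[cols] @ beta[1:])              # this model's forecast
    return float(np.mean(preds))                                    # ensemble average
```

Because each small model only ever sees a few predictors at a time, highly correlated candidates rarely enter the same regression, which is what keeps the ensemble stable when the full design matrix would be ill-conditioned.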