FTCA2

The strength of FTCA is both speed and simplicity. One weakness, however, is that cluster membership at each step is determined by a threshold against a single asset only (either HC or LC). Asset relationships can be complex, and there is no assurance that every member of a cluster has a correlation to every other member that is higher than the threshold. This can lead to fewer clusters and potentially incorrect cluster membership assignments. To improve upon these weaknesses, FTCA2 uses the same baseline method but applies the correlation threshold to ALL current cluster members rather than just to HC or LC: the average correlation to the current cluster members is always calculated to decide whether an asset joins. It also considers candidate assets in order of their correlation to the current cluster members, closest first.

The pseudo-code is presented below:

 

While there are assets that have not been assigned to a cluster

  • If only one asset remaining then
    • Add a new cluster
    • Only member is the remaining asset
  • Else
    • Find the asset with the Highest Average Correlation (HC) to all assets that have not yet been assigned to a Cluster
    • Find the asset with the Lowest Average Correlation (LC) to all assets that have not yet been assigned to a Cluster
    • If Correlation between HC and LC > Threshold
      • Add a new Cluster made of HC and LC
      • Try adding each of the remaining assets that have not yet been assigned to a Cluster, in order of highest correlation to the current cluster; an asset is added if its average correlation to the current cluster members is > the average correlation of the current cluster
    • Else
      • Add a Cluster made of HC
        • Try adding each of the remaining assets that have not yet been assigned to a Cluster, in order of highest correlation to the current cluster; an asset is added if its average correlation to the current cluster members is > the average correlation of the current cluster
      • Add a Cluster made of LC
        • Try adding each of the remaining assets that have not yet been assigned to a Cluster, in order of highest correlation to the current cluster; an asset is added if its average correlation to the current cluster members is > the average correlation of the current cluster
    • End if
  • End if

End While
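
To make the modification concrete, here is a rough Python/pandas sketch of one reading of the pseudo-code above. The function name, the pandas interface, the once-only ordering of candidates against the seed cluster, and the fallback to the threshold while a cluster has only one member are my assumptions rather than part of the original specification.

```python
import pandas as pd

def ftca2(corr: pd.DataFrame, threshold: float = 0.5):
    """Cluster assets given a correlation matrix `corr` (asset labels as index/columns)."""
    unassigned = list(corr.index)
    clusters = []

    def grow(seed, exclude=()):
        # Try adding unassigned assets in order of highest average correlation to the
        # seed cluster; an asset joins if its average correlation to ALL current members
        # exceeds the cluster's own average pairwise correlation (the threshold is used
        # while the cluster has only one member).
        cluster = list(seed)
        candidates = sorted(
            (a for a in unassigned if a not in cluster and a not in exclude),
            key=lambda a: corr.loc[a, cluster].mean(),
            reverse=True,
        )
        for a in candidates:
            n = len(cluster)
            if n > 1:
                block = corr.loc[cluster, cluster].values
                bar = (block.sum() - n) / (n * (n - 1))  # mean off-diagonal correlation
            else:
                bar = threshold
            if corr.loc[a, cluster].mean() > bar:
                cluster.append(a)
        for a in cluster:
            unassigned.remove(a)
        clusters.append(cluster)

    while unassigned:
        if len(unassigned) == 1:
            clusters.append([unassigned.pop()])
            break
        sub = corr.loc[unassigned, unassigned]
        avg = sub.mean()                      # average correlation to the unassigned universe
        hc, lc = avg.idxmax(), avg.idxmin()   # highest / lowest average correlation assets
        if corr.loc[hc, lc] > threshold:
            grow([hc, lc])
        else:
            grow([hc], exclude=(lc,))         # reserve LC so it seeds its own cluster
            grow([lc])
    return clusters
```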

Fast Threshold Clustering Algorithm (FTCA)

cluster image

Often it can be surprisingly difficult to improve upon a simple and time-tested recipe. During the summer of 2011, I worked with Corey Rittenhouse to develop algorithms for grouping asset classes. At the time, I did not have any familiarity with the “clustering” algorithms that are often used in data mining research. The first algorithm we created resulted from a desire to simplify the complex problem of grouping assets into very few steps, and also to make it computationally simple. As it turns out, ignorance was bliss. The Fast Threshold Clustering Algorithm (FTCA) has many desirable properties that traditional clustering algorithms do not: 1) it produces fairly stable clusters, 2) it is fast and deterministic, and 3) it is easy to understand.

When Michael Kapler and I conducted cluster research for our Cluster Risk Parity portfolio allocation approach using modern clustering methods, one of the biggest issues we both saw was that the resulting clusters changed too frequently, creating excessive turnover. Furthermore, highly correlated datasets such as the Dow 30 had more clusters than logic or rationale would tend to dictate. This results from the fact that most clustering algorithms function like an optimization routine that seeks to maximize inter-cluster dissimilarity and intra-cluster similarity. This can mean that clusters change because of very small shifts in the correlation matrix that are more likely to be a function of noise. Threshold clustering, by comparison, uses a logical correlation threshold to proxy “similar” versus “dissimilar.” In FTCA, I initially used a correlation threshold of .5 (approximately the level of statistical significance) to separate similar from dissimilar assets. FTCA works similarly to the Minimum Correlation Algorithm in that it uses the average correlation of each asset to all other assets as a means of determining how closely or distantly related an asset is to the chosen universe of assets. A graphic of how FTCA creates clusters is presented below:

Fast Threshold Clustering

The pseudocode for FTCA is presented below:

While there are assets that have not been assigned to a cluster

  • If only one asset remaining then
    • Add a new cluster
    • Only member is the remaining asset
  • Else
    • Find the asset with the Highest Average Correlation (HC) to all assets that have not yet been assigned to a Cluster
    • Find the asset with the Lowest Average Correlation (LC) to all assets that have not yet been assigned to a Cluster
    • If Correlation between HC and LC > Threshold
      • Add a new Cluster made of HC and LC
      • Add to the Cluster all other assets that have not yet been assigned to a Cluster and have an Average Correlation to HC and LC > Threshold
    • Else
      • Add a Cluster made of HC
        • Add to the Cluster all other assets that have not yet been assigned to a Cluster and have a Correlation to HC > Threshold
      • Add a Cluster made of LC
        • Add to the Cluster all other assets that have not yet been assigned to a Cluster and have a Correlation to LC > Threshold
    • End if
  • End if

End While
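
For reference, a minimal Python/pandas sketch of the FTCA pseudo-code is shown below; the function name and interface are illustrative, and `corr` is assumed to be a pandas DataFrame of pairwise correlations.

```python
import pandas as pd

def ftca(corr: pd.DataFrame, threshold: float = 0.5):
    """Return FTCA clusters as lists of asset names from a correlation matrix."""
    unassigned = list(corr.index)
    clusters = []
    while unassigned:
        if len(unassigned) == 1:
            clusters.append([unassigned.pop()])
            break
        sub = corr.loc[unassigned, unassigned]
        avg = sub.mean()                      # average correlation to all unassigned assets
        hc, lc = avg.idxmax(), avg.idxmin()   # highest / lowest average correlation assets
        if corr.loc[hc, lc] > threshold:
            cluster = [hc, lc]
            for a in unassigned:
                # join if the average correlation to HC and LC clears the threshold
                if a not in cluster and corr.loc[a, [hc, lc]].mean() > threshold:
                    cluster.append(a)
            clusters.append(cluster)
            unassigned = [a for a in unassigned if a not in cluster]
        else:
            taken = {hc, lc}
            for seed in (hc, lc):
                cluster = [seed]
                for a in unassigned:
                    # join if the pairwise correlation to the seed clears the threshold
                    if a not in taken and corr.loc[a, seed] > threshold:
                        cluster.append(a)
                        taken.add(a)
                clusters.append(cluster)
            unassigned = [a for a in unassigned if a not in taken]
    return clusters
```

In keeping with the tests below, the input could be something like `ftca(returns.iloc[-252:].corr())` for a 252-day correlation lookback.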

It is interesting to look at the historical clusters using FTCA with a threshold of .5 on previously used datasets such as the 8 major liquid ETFs (GLD, TLT, SPY, IWM, QQQ, EFA, EEM, IYR) and the 9 sector Spyders (S&P 500 sectors: XLP, XLV, XLU, XLK, XLI, XLE, XLB, XLY, XLF). The data was updated through May of 2013, and the charts below show the last 12 months of clusters generated using a 252-day lookback for correlation with monthly rebalancing. First, the 8 major liquid ETFs:

ETF8 FTCA

Notice that the clusters are very logical and do not change at all within the 12-month period. The clusters essentially represent Gold, Bonds and Equities (note that a 60-month period shows very little change as well). Now let's take a look at the clusters generated on the famously noisy S&P 500 sectors:

FTCA Spyders

Again, the last 12 months of clustering show very little change in cluster membership. Most of the time the sectors are intuitively considered to be one cluster, while occasionally the utilities sector shows a lack of correlation to the rest of the group. The choice of threshold will change the number and stability of the clusters, with higher thresholds producing more clusters and greater changes in membership than lower thresholds. As much as I have learned about very sophisticated clustering methods in the last year, I am often drawn back to the simplicity and practicality of FTCA. From a portfolio management standpoint, it also makes using clusters far more practical for tactical asset allocation or for implementing cluster risk parity.

Shrinkage: A Simple Composite Model Performs the Best

shrinkage

In the last two posts we discussed using an adaptive shrinkage approach and also introduced the average correlation shrinkage model. The real question is: which shrinkage method, across a wide variety of different models, works best? In backtesting across multiple universes, from stocks to asset classes and even futures, Michael Kapler of Systematic Investor calculated a composite score for each shrinkage model based upon the following criteria: portfolio turnover, Sharpe ratio, volatility, and diversification as measured by the Composite Diversification Indicator (CDI). Lower turnover was preferred, a higher Sharpe ratio was obviously better, lower volatility was better, and higher diversification was considered better. The backtests and code can be found here.

The models considered were as follows:

models

 

The best performing shrinkage model can be implemented by virtually anyone with a minimum of Excel skills: it is the simple average of the sample correlation matrix, the anchored correlation matrix (all history), and the average correlation shrinkage model. This produced the best blend of characteristics desirable for money managers. The logic is simple: the anchored correlation matrix provides a long-term memory of inter-asset relationships, the sample provides a short-term/current memory, and the average correlation shrinkage assumes that the average correlation of an asset to all other assets provides a more stable short-term/current estimate than the sample. This is a good example of how simple implementations can trump sophisticated ones as long as the concepts are sound. As a generality, this is my preferred approach whenever possible because it is easier to implement in real life, easier to de-bug, and easier to understand and explain. Another interesting result from the rankings is that the ensemble approaches to shrinkage performed better. Again, this makes sense. The adaptive shrinkage model (best Sharpe) performed poorly by comparison, especially when considering turnover as a factor. It is possible that using only a 252-day window, or using only the Sharpe ratio as an objective criterion, was suboptimal. Readers are encouraged to experiment with other approaches (we did investigate some methods that showed a lot of promise).
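
To make the recipe concrete, here is a rough Python sketch of the composite, assuming `returns` is a pandas DataFrame of daily returns; the helper names and the 252-day sample window are illustrative assumptions rather than the exact settings used in the backtests.

```python
import numpy as np
import pandas as pd

def acs_matrix(corr: pd.DataFrame) -> pd.DataFrame:
    """Average Correlation Shrinkage target: each pairwise entry is the mean of
    the two assets' average correlations to all other assets."""
    n = len(corr)
    avg = (corr.sum(axis=1) - 1.0) / (n - 1)      # average correlation to the other assets
    target = (avg.values[:, None] + avg.values[None, :]) / 2.0
    np.fill_diagonal(target, 1.0)
    return pd.DataFrame(target, index=corr.index, columns=corr.columns)

def composite_correlation(returns: pd.DataFrame, window: int = 252) -> pd.DataFrame:
    sample = returns.iloc[-window:].corr()        # short-term / current memory
    anchored = returns.corr()                     # long-term memory (all history)
    acs = acs_matrix(sample)                      # smoothed short-term estimate
    return (sample + anchored + acs) / 3.0        # simple equal-weight average
```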

Finally, it is important to recognize that shrinkage is not a magic bullet regardless of which approach is used. The results are better than, but not worlds apart from, using just the sample correlation. There is a practical limit to what can be achieved by using a lower-variance estimate of the correlation matrix with shrinkage. More accurate predictors of correlation are required to achieve greater gains in performance.

Average Correlation Shrinkage (ACS)

In the last post on Adaptive Shrinkage, the Average Correlation Shrinkage (ACS) model was introduced and compared with other standard shrinkage models. A spreadsheet demonstrating how to implement ACS can be found here: average correlation shrinkage. This method is meant to be an alternative shrinkage model that can be blended with or used in place of standard models. One of the most popular models is the “Constant Correlation Model,” which assumes that there is a constant correlation between assets. The strength of this model is that it is a very simple and well-structured estimator. The weakness is that it is too rigid, and its performance depends on the number of similarly correlated versus uncorrelated assets in the universe. Average Correlation Shrinkage proposes that a good estimator for an asset’s pairwise correlation is the average of all of its pairwise correlations to other assets. For any pair of assets, their new pairwise correlation is the average of their respective average correlations to all other assets. This makes intuitive sense, and the average is less sensitive to errors than a single correlation estimate. It is also less restrictive than assuming that all correlations are the same.

The graphic below depicts the sample versus the average correlation matrix:

average correlation shrinkage

 

As you can see, the average correlation matrix tends to pull down the correlations of assets that have high correlations and to raise the correlations of assets that have low correlations. One can blend the average correlation matrix with the original sample correlation matrix in some proportion using the following formula:

adjusted correlation matrix = w * (average correlation matrix) + (1 - w) * (sample correlation matrix)

By using this shrinkage method to adjust the correlation inputs for optimization, the resulting weights are less extreme towards the assets with low correlations, and assets with high correlations have a better chance of being included in the final portfolio. Like all shrinkage methods, this is meant to be a sort of compromise between the sample estimate and a structured estimator.
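
A minimal NumPy sketch of the calculation and the blend above, using w = 0.5 as in the example that follows; `sample` is assumed to be an n x n correlation matrix and the function name is mine.

```python
import numpy as np

def acs_blend(sample: np.ndarray, w: float = 0.5) -> np.ndarray:
    n = sample.shape[0]
    avg = (sample.sum(axis=1) - 1.0) / (n - 1)   # each asset's average correlation to the others
    acs = (avg[:, None] + avg[None, :]) / 2.0    # pairwise average of the two assets' averages
    np.fill_diagonal(acs, 1.0)
    adjusted = w * acs + (1.0 - w) * sample      # adjusted correlation matrix (formula above)
    np.fill_diagonal(adjusted, 1.0)
    return adjusted
```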

Here is a graphic depicting the final adjusted correlation matrix using a shrinkage factor (w) of .5:

adjusted correlation matrix

 

 

Adaptive Shrinkage

The covariance matrix can be quite tricky to model accurately due to the curse of dimensionality. One approach to improving estimation is to use “shrinkage” to a well-structured estimator. The general idea is that a compromise between a logical/theoretical estimator and a sample estimator will yield better results than either method. It is analogous to saying that in the summer, the temperature in many states is likely to be 80 degrees and that you will blend your weather forecast estimate with this baseline number in some proportion to reduce forecast error.

Here are two good articles worth reading as a primer:

Honey, I Shrunk the Sample Covariance Matrix – Ledoit/Wolf

Shrinkage: simpler is better – Benninga

Michael Kapler of Systematic Investor and I tested a wide variety of different shrinkage approaches at the beginning of the year on numerous datasets. Michael is perhaps the most talented and experienced person I have ever worked with (both from a quant and also from a programming standpoint), and it is always a pleasure to get his input. Two interesting ideas evolved from the research: 1) the Average Correlation Shrinkage Model (ACS): using the average correlation of each asset to all other assets, as in the Minimum Correlation Algorithm, is a logical shrinkage model that produces very competitive performance and is simpler to implement than most other models (spreadsheet to follow in the next post); 2) Adaptive Shrinkage: this chooses the “best” model from a number of different shrinkage models based on which version delivered the best historical Sharpe ratio for minimum variance allocation.

Adaptive Shrinkage makes a lot of sense, since the appropriate shrinkage estimator differs depending on the composition of the asset universe. For example, a universe with one bond and fifty stocks will perform better with a different shrinkage estimator than one with all stocks or with multiple diverse asset classes. In addition, there may be one model, for example a composite of different estimators, that consistently performs better than others. In our testing, we chose to define success as the out-of-sample Sharpe ratio attained by minimizing variance using a given estimator. This makes more sense than minimizing volatility, which is often used in the literature to evaluate different shrinkage approaches. A higher Sharpe ratio implies that you could achieve lower volatility than a sample-based minimum variance portfolio (assuming its volatility happens to be higher) by holding some proportion of your portfolio in cash. However, the objective function for Adaptive Shrinkage could be anything that you would like to achieve: for example, minimum turnover might also be an objective, or some combination of minimum turnover and maximum Sharpe ratio. Here are some of the different shrinkage estimators that we tested. Note that “Average David” refers to ACS/Average Correlation Shrinkage:

* S = sample covariance matrix (no shrinkage)
* A50 = 50% average.david + 50% sample

* S_SA_A = 1/3 * [average + sample + sample.anchored]

* A_S = average.david and sample using Ledoit and Wolf math
* D_S = diagonal and sample using Ledoit and Wolf math
* CC_S = constant correlation and sample using Ledoit and Wolf math
* SI_S = single index and sample using Ledoit and Wolf math
* SI2_S = two.parameter covariance matrix and sample using Ledoit and Wolf math
* AVG_S = average and sample using Ledoit and Wolf math, where average = 1/5 * (average.david + diagonal + constant correlation + single index + two.parameter covariance matrix)

* A = average.david
* D = diagonal
* CC = constant correlation
* SI = single index
* SI2 = two.parameter covariance matrix
* AVG = average = 1/5 * (average.david + diagonal + constant correlation + single index + two.parameter covariance matrix)

* Best Sharpe = Adaptive Shrinkage: invest all capital into the method (S or SA or A) that has the best Sharpe ratio over the last 252 days

We used a 60-day parameter to compute the variance/covariance matrix, and 252 days as the lookback to find the shrinkage method with the best Sharpe ratio. The report/results can be found here: all60 comprehensive. Interestingly enough, Adaptive Shrinkage/Best Sharpe produced the highest Sharpe ratio on almost all datasets. This demonstrates a promising method to potentially improve upon a standard shrinkage approach, and it also removes the need to determine which model to use as the base estimator. Readers can draw their own conclusions from this extensive report. I would generalize by saying that most shrinkage estimators produce similar performance, and that combinations of estimators seem to perform better than single estimators. Shrinkage does not deliver substantial improvements in performance versus just using the sample estimator in these tests. Finally, the Average Correlation Shrinkage model is very competitive with, if not superior to, most estimators, and it delivers lower turnover as well. This is true of many of the different variants that use ACS.
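
For illustration, here is a simplified sketch of the “Best Sharpe” selection step. It assumes daily rebalancing of an unconstrained minimum variance portfolio and candidate estimators supplied as functions mapping a window of returns to a covariance matrix; these details and the helper names are assumptions, not the exact backtest code.

```python
import numpy as np
import pandas as pd

def min_var_weights(cov: np.ndarray) -> np.ndarray:
    """Unconstrained minimum variance weights, proportional to inverse(cov) @ ones."""
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return w / w.sum()

def trailing_sharpe(returns: pd.DataFrame, estimator, window: int = 60,
                    lookback: int = 252) -> float:
    """Annualized Sharpe ratio of a daily-rebalanced min-var portfolio over the
    last `lookback` days, estimating covariance on a rolling `window` of returns."""
    pnl = []
    for t in range(len(returns) - lookback, len(returns)):
        cov = estimator(returns.iloc[t - window:t])   # requires len(returns) >= lookback + window
        w = min_var_weights(cov)
        pnl.append(float(returns.iloc[t] @ w))
    pnl = np.asarray(pnl)
    return pnl.mean() / pnl.std() * np.sqrt(252)

def best_estimator(returns: pd.DataFrame, estimators: dict) -> str:
    """Pick the candidate whose trailing min-var Sharpe ratio is highest."""
    scores = {name: trailing_sharpe(returns, est) for name, est in estimators.items()}
    return max(scores, key=scores.get)
```

Candidates could then be passed in as, for example, `{"S": lambda r: r.cov().values, ...}` along with whichever shrunk versions are being compared.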

Additional RSO Backtests

Another blogger worth following is Michael Guan from Systematic Edge. Michael writes frequently about different methods for asset allocation. Some of his posts, for example on Principal Components Analysis (PCA), are very comprehensive and well worth reading. Recently he wrote a post showing some additional tests of Random Subspace Optimization (RSO) with maximum Sharpe optimization (long only) on various universes, with a step parameter test for “k.” The results are interesting and support the following conclusions: 1) RSO is a promising method to increase return and reduce risk, and 2) the choice of “k” is important to get the best results. I will present a few more ideas on extending RSO to follow.

RSO MVO vs Standard MVO Backtest Comparison

In a previous post I introduced Random Subspace Optimization (RSO) as a method to reduce dimensionality and improve performance versus standard optimization methods. The concept is theoretically sound and is traditionally applied in machine learning to improve classification accuracy, so it makes sense that it would be useful for portfolio optimization. To test this method, I used a very naive/simplistic RSO model in which one selects “k” subspaces from the universe, runs classic mean-variance optimization (MVO) on each of “s” samples, and averages the portfolio weights found across all of the samples to produce a final portfolio. The MVO was run unconstrained (longs and shorts permitted) to reduce computation time, since there is a closed-form solution. Two datasets were used: the first is an 8-ETF universe used in previous studies for the Minimum Correlation and Minimum Variance Algorithms, and the second is the S&P sector Spyder ETFs. Here are the parameters and the results:

rso comp

 

On these two universes, with this set of parameters, RSO mean-variance was a clear winner in terms of both returns and risk-adjusted returns, and the results are even more compelling when you factor in the lower average exposure that results from averaging across 100 portfolios. Turnover is also more stable, which is to be expected because of the averaging process. Results were best in these two cases when k<=3, but virtually all values of k outperformed the baseline. The choice of k is certainly a bit clunky (as in nearest neighbour analysis), and it needs to be either optimized or considered in relation to the number of assets in the universe. The averaging process across portfolios is also naive: it doesn’t care whether the objective function is high or low for a given portfolio. There are a lot of ways to improve upon this baseline RSO version. I haven’t done extensive testing at this point, but theory and preliminary results suggest a modest improvement over baseline MVO (and other types of optimization). RSO is not a magic bullet per se, but in this case it appears better able to handle noisy datasets at the very least, where the matrix inversion used within typical unconstrained MVO can be unstable.
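
For readers who want to experiment, here is a rough Python sketch of the naive RSO mean-variance procedure described above. The parameter defaults and the scaling of the closed-form weights to unit gross exposure are my assumptions.

```python
import numpy as np
import pandas as pd

def mvo_unconstrained(mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Closed-form unconstrained mean-variance weights, proportional to inv(cov) @ mu."""
    w = np.linalg.solve(cov, mu)
    return w / np.abs(w).sum()                # scale to unit gross exposure (an assumption)

def rso_mvo(returns: pd.DataFrame, k: int = 3, s: int = 100, seed: int = 0) -> pd.Series:
    """Average unconstrained MVO weights across `s` random k-asset subsets."""
    rng = np.random.default_rng(seed)
    assets = np.asarray(returns.columns)
    weights = pd.Series(0.0, index=returns.columns)
    for _ in range(s):
        subset = list(rng.choice(assets, size=k, replace=False))
        sub = returns[subset]
        w = mvo_unconstrained(sub.mean().values, sub.cov().values)
        weights.loc[subset] += w
    return weights / s                        # final portfolio = average of the sampled portfolios
```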