Fast Threshold Clustering Algorithm (FTCA)

November 26, 2013

Often it can be surprisingly difficult to improve upon a simple and time-tested recipe. During the summer of 2011, I worked with Corey Rittenhouse to develop algorithms for grouping asset classes. At the time, I did not have any familiarity with “clustering” algorithms that are often used in data mining research. The first algorithm that was created resulted from a desire to simplify the complex problem of grouping assets with very few steps, and also to make it computationally simple. As it turns out, ignorance was bliss. The Fast Threshold Clustering Algorithm (FTCA) has many desirable properties that traditional clustering algorithms do not: 1) it produces fairly stable clusters 2) it is fast and deterministic 3) it is easy to understand. When Michael Kapler and I conducted cluster research for our Cluster Risk Parity portfolio allocation approach with modern clustering methods, one of the biggest issues we both saw was that the resulting clusters changed too frequently– creating excessive turnover. Furthermore, highly correlated datasets such as the Dow 30, had more clusters than logic or rationale would tend to dictate. This results from the fact that most cluster algorithms function like an optimization routine that seeks to maximize inter-cluster dissimilarity and intra-cluster similarity. This can mean that clusters will change because of very small changes in the correlation matrix that are more likely to be a function of noise. Threshold clustering by comparison uses a logical correlation threshold to proxy “similar” versus “dissimilar.” In FTCA, I initially used a correlation threshold of .5 (approximately the level of statistical significance) to separate similar from dissimilar assets. The FTCA works similar to the Minimum Correlation Algorithm in that it uses the average correlation of each asset to all other assets as a means of determining how closely or distantly related an asset is to the universe of assets chosen. A graphic of how the FTCA creates clusters is presented below:

The pseudocode for FTCA is presented below:

While there are assets that have not been assigned to a cluster

If only one asset remaining then
- Add a new cluster
- Only member is the remaining asset
Else
- Find the asset with the Highest Average Correlation (HC) to all assets not yet been assigned to a Cluster
- Find the asset with the Lowest Average Correlation (LC) to all assets not yet assigned to a Cluster
- If Correlation between HC and LC > Threshold
  - Add a new Cluster made of HC and LC
  - Add to Cluster all other assets that have yet been assigned to a Cluster and have an Average Correlation to HC and LC > Threshold
- Else
  - Add a Cluster made of HC
    - Add to Cluster all other assets that have yet been assigned to a Cluster and have a Correlation to HC > Threshold
  - Add a Cluster made of LC
    - Add to Cluster all other assets that have yet been assigned to a Cluster and have Correlation to LC > Threshold
- End if
End if

End While

It is interesting to look at the historical clusters using FTCA with a threshold of .5 on previously used datasets such as the 8 major liquid ETFs (GLD,TLT,SPY,IWM,QQQ,EFA,EEM,IYR), and the 9 sector Spyders (S&P500 sectors: XLP,XLV,XLU,XLK,XLI,XLE,XLB,XLY,XLF). The database was updated until May of 2013 and shows the historical allocations/last 12 months of clusters generated using a 252-day lookback for correlation with monthly rebalancing. First the 8 major liquid ETFs:

Notice that the clusters are very logical and do not change once within the 12 month period. The clusters essentially represent Gold, Bonds and Equities (note that a 60 month period shows very little change as well). Now lets take a look at the clusters generated on the famously noisy S&P500 sectors:

Again, the last 12 months of clustering shows very little change in cluster membership. Most of the time, the sectors are intuitively considered to be one cluster, while occasionally the utilities sector shows a lack of correlation to the rest of the group. The choice of threshold will change the number and stability of the clusters- with higher thresholds showing more clusters and a greater change in membership than lower thresholds. As much as I have learned about very sophisticated clustering methods in the last year, I often am drawn back to the simplicity and practicality of FTCA. From a portfolio management standpoint, it makes using clusters far more practical as well for tactical asset allocation or implementing cluster risk parity.

14 Comments leave one →

Pierre permalink

November 26, 2013 5:47 pm

Hi David,
Thanks again for sharing ! This algo seems to produce awesome results compared to “standard” ones. Great contribution !
Best,
Pierre

Reply
- Paolo permalink
  
  July 29, 2014 11:02 am
  
  Hi Pierre,
  
  I was wondering whether you can share the code behind your shiny app at http://glimmer.rstudio.com/heuristica/clusters/
  
  I’d like to test it on different sets of Securities/asset classes.
  
  Best,
  Paolo
  
  Reply
david varadi permalink*

November 26, 2013 9:51 pm

hi Pierre, good to hear. happy to share. i assume you tried it with ERC or mincorr etc.
best
david

Reply
- Pierre permalink
  
  November 27, 2013 3:55 am
  
  Hi David,
  I tried it with a large sample of products, a threshold of 0.9 (not very aggressive), risk parity between clusters and equal weight inside them.
  Aside from reduced turnover, negative outliers seem less frequent and results more linear than with other clustering algos.
  Best,
  Pierre
  
  Reply
stefan permalink

December 1, 2013 6:37 pm

Hi David,
Thank you for yet another simple but extremely powerful algorithm. I managed to write it in AmiBroker, my platform of choice, in about 80 lines. Now I have to test how effective it is in 2 basic portfolio management systems.

You mentioned using clusters for RP and TAA.
To test in a RP system, I will probably try assigning weights inversely proportional with the asset & cluster volatility.

However, I am very interested in TAA and clusters, which you mentioned above. Anything about them together would be greatly appreciated.
TIA.

Stefan

Reply
- david varadi permalink*
  
  December 2, 2013 5:14 pm
  
  hi stephan, thank you- if you are interested in sharing the code as open source for readers I will be happy to post it or link to it. I will probably discuss how to implement cluster risk parity with FTCA and other applications over the next few blogs. In general for TAA, one easy approach is to just treat the cluster as a single asset for use in either momentum, absolute momentum etc.
  best
  david
  
  Reply
  - stefan permalink
    
    December 3, 2013 2:25 am
    
    David, thank you for the reply. I will try the suggested approach for TAA. Using each cluster as an “asset class” for momentum calculations is a great idea, and is indeed what I was looking for, to leverage the resulting clusters. The next step is to find the optimal way of including the cluster momentum as a factor in the final weights calculation.
    
    About the clusters code: I would say that the resulting code is quite tight with AB, but not trivial, as it has to generate different clusters sets at each bar included in the backtest. I appreciate the offer to post it and I’ll consider that.
  - Artem permalink
    
    December 19, 2013 2:09 pm
    
    Hi, David im new in programming (just a few years of studing). Im living in Russia, so dont blame me for my english =). I implemented your FTCA alghorithm in simple .dll at C# and can share it with anyone who want it. I have one simple question, why it’s called Fast? Because i don see it’s perfect name for alghorithm who use a bunch of time on over 100 000 of data and simple correlation function. Is there some trick’s to make this algho work on millions of data? I thinked about it, and wanted to implement matrix of all correlation values, but 1.000.000^2 floats ~ 100 GB of memory, and +1 element will cost around of million calculations…
  - david varadi permalink*
    
    January 6, 2014 12:46 pm
    
    artem, sorry for such a late response— this is not a fast algorithm for ultra-large data sets like you are describing since all matrix values need to be computed in advance. i called it fast, because it is deterministic and faster than many existing cluster algorithms (ie hiearchical clustering). if you would like to share something that would be much appreciated and i would be happy to cite you and create a link to any material/blog you may have. please send me an email at dmvaradi@gmail.com
    best
    david

	david varadi on Trend Strength Index (TSI…
	david varadi on Business Cycle Sector Tim…
	david varadi on Business Cycle Sector Tim…
	david varadi on Business Cycle Sector Tim…
	Scott on Business Cycle Sector Tim…

new concepts in quantitative research

Fast Threshold Clustering Algorithm (FTCA)

Trackbacks

Leave a comment Cancel reply

Follow Blog via Email

Search

Minimum Correlation Algorithm

Recent Posts

Recent Comments

Blogroll

new concepts in quantitative research

Fast Threshold Clustering Algorithm (FTCA)

Share this:

Related

Trackbacks

Leave a comment Cancel reply

Follow Blog via Email

Search

Minimum Correlation Algorithm

Recent Posts

Recent Comments

Blogroll