Probabilistic Absolute Momentum (PAM)

In the last post on Probabilistic Momentum we introduced a simple method for transforming a standard momentum strategy into a probability distribution in order to create confidence thresholds for trading. The spreadsheet used to replicate this method can be found here. This framework is intellectually superior to a binary comparison between two assets because the tracking error of choosing one versus the other is not symmetric across momentum opportunities. The opportunity cost of choosing one asset versus another is embedded in this framework, and using a confidence threshold that is greater than 50% will help to standardize the risk of momentum decisions across different pairings (for example, using momentum with stocks and bonds is riskier than with, say, the S&P500 and the Nasdaq).

The same concept can be used to create an absolute momentum methodology. This concept was introduced by Gary Antonacci of Optimal Momentum in a paper here. The general idea, for those who are not familiar, is that you can use the relative momentum between a target asset (say, equities) and some low-risk asset such as T-bills or short-term Treasurys (cash) to generate switching decisions between the target and cash. This can be used instead of applying a simple moving average strategy to the underlying asset. We can apply the same Probabilistic Momentum approach, pairing a short-term Treasury ETF such as SHY with some target asset, to create a Probabilistic Absolute Momentum strategy (PAM). In this case, I created an example with the Nasdaq (QQQ) and 1-3 year Treasurys (SHY) and used the maximum period of time when both had history available (roughly 2800 bars). I chose 60% as the confidence threshold to switch between QQQ and SHY. The momentum lookback window chosen was 120 days. We did not assume any trading costs in this case, but including them would favor PAM even more. Here is a chart of the historical transitions of using the probabilistic approach (PAM) versus a simple absolute momentum approach:



Here is the performance breakdown of applying this strategy:



Here we see that Probabilistic Absolute Momentum reduces the number of trades by over 80%, from 121 to 23. The raw performance is improved by almost 2%, and the Sharpe ratio is improved by roughly 15%. More importantly, from a psychological standpoint PAM is much easier to use and stick with, whether as a discretionary trader or as a quantitative portfolio manager. It eliminates a lot of the costly whipsaws that result from trying to switch between being invested and being in cash. It also makes it easier to overlay an absolute momentum strategy on a standard momentum strategy since there is less interference from the cash decision.

Probabilistic Momentum Spreadsheet

In the last post, I introduced the concept of viewing momentum as a probability of one asset outperforming the other, rather than as a binary decision driven by whichever return is greater between a pair of assets. This method incorporates the joint distribution between the two assets, factoring in their variance and covariance. The difference between the two mean returns is compared to the tracking error between the two assets to compute the information ratio. This ratio is then converted to a probability via the t-distribution to provide a more intelligent confidence-based buffer to avoid costly switching. A good article by Corey Hoffstein at Newfound discusses a related concept here. Many readers have inquired about a spreadsheet example for probabilistic momentum, which can be found here: probabilistic momentum worksheet.
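As a rough illustration of the calculation described above, here is a minimal Python sketch that computes the probability that asset A outperforms asset B over a lookback window. The function names (`t_cdf`, `probabilistic_momentum`) and the toy returns are hypothetical and are not taken from the spreadsheet. The t-distribution CDF is approximated with Simpson's rule so that the example needs only the standard library; a statistics library routine would normally be used instead. The information ratio is passed to the t-distribution as the text describes, with degrees of freedom equal to the number of periods minus one.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=2000):
    """Student-t CDF via Simpson's rule on the density.

    A stdlib-only stand-in for a library CDF routine; steps must be even.
    """
    if x == 0:
        return 0.5
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    upper = abs(x)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    half = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + half if x > 0 else 0.5 - half


def probabilistic_momentum(returns_a, returns_b):
    """Probability that A outperforms B over the lookback window."""
    diff = [a - b for a, b in zip(returns_a, returns_b)]
    tracking_error = stdev(diff)  # sample std dev of the return difference
    info_ratio = mean(diff) / tracking_error
    return t_cdf(info_ratio, df=len(diff) - 1)


# Toy daily returns (made up for illustration only)
a = [0.010, -0.004, 0.007, 0.003, -0.002, 0.006]
b = [0.002, 0.001, -0.001, 0.002, 0.000, 0.001]
print(round(probabilistic_momentum(a, b), 3))
```

By construction the probability for B versus A is one minus the probability for A versus B, which matches the relationship stated in the original post.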

Are Simple Momentum Strategies Too Dumb? Introducing Probabilistic Momentum


Momentum remains the most cherished and frequently used strategy for tactical investors and quantitative systems. Empirical support for momentum as a ubiquitous anomaly across global financial markets is virtually iron-clad, supported by even the most skeptical high priests of academic finance. Simple momentum strategies seek to buy the best performers by comparing the average or compound return between two assets or a group of assets. The beauty of this approach is its inherent simplicity: from a quantitative standpoint this increases the chances that a strategy will be robust and work in the future. The downside to this approach is that it does not capture some important pieces of information, and this omission can: 1) lead to incorrect preferences, 2) make the system more susceptible to random noise, and 3) dramatically magnify trading costs.

Consider the picture of the two horses above. If we are watching a horse race and trying to determine which horse is going to take the lead over some time interval (say the next 10 seconds), our simplest strategy is to pick the horse that is currently winning. As those of you who have observed a horse race will know, two horses that are close will frequently shift positions in taking the lead. Sometimes they will alternate (negatively correlated) and other times they will accelerate and slow down at the same time (correlated). Certain horses tend to be less consistent and are prone to bursts of speed followed by a more measured pace (high volatility), while others are very steady (low volatility). Depending on how the horses are paired together, it may be difficult to accurately pick which one will lead just by simple momentum alone. Intuitively, the human eye can notice when one horse leads the other with a consistent performance: despite occasional shifts in position, these shifts are small and the leading horse is clearly gaining a significant lead. Ultimately, we must acknowledge that to determine whether one horse or one stock is outperforming the other, we need to capture the relationship between the two and also their relative noise, in addition to a simple measure of distance versus time.

In terms of investing, what we really want to know is how to determine the probability or confidence that one asset is going to outperform the other. Surely if the odds of outperformance are only 51% for example, this is not much better than flipping a coin. It is unlikely that two assets are statistically different from one another in that context. But how do we determine such a probability as it relates to momentum? Suppose we have assets A and B. We want to determine the probability that A will outperform B. This implies that B will serve as an index or benchmark to A. In standard finance curriculum, we know that the Information Ratio is an easy way to capture the relative returns in relation to the risk versus some benchmark. It is calculated as:

Information Ratio (IR) = (Rp - Ri) / Sp-i


Where Rp= return on the portfolio or asset in question and

Ri= return on the index or benchmark

Sp-i= the tracking error of the portfolio versus the benchmark

The next question is how do we translate this to a probability? Typically one would use a normal distribution to find the probability using the information ratio (IR) as an input. However, the normal distribution is only appropriate with a large sample size. For smaller sample sizes that are prevalent with momentum lookbacks it is more appropriate to use a t-distribution. Thus

Probabilistic Momentum (A vs B)= Tconf (IR)

Probabilistic Momentum (B vs A)= 1-Probabilistic Momentum (A vs B)

This number for A vs B is subtracted from 1 if the information ratio is positive and kept as is if the information ratio is negative. The degrees of freedom are equal to the number of periods in the lookback minus one. In one neat calculation we have compressed the momentum measurement into a probability, one that incorporates the correlation and relative volatility of the two assets as well as their momentum. This allows us to make more intelligent momentum trades while also avoiding excessive transaction costs. The next aspect of probabilistic momentum is to make use of the concept of hysteresis. Since markets are noisy, it is difficult to tell whether one asset is outperforming the other. One effective filter is to avoid switching while the probability sits between two boundaries. This implies switching assets only when the confidence that one will outperform the other is greater than a certain threshold. For example, if I specify a confidence level of 60%, I will switch into an asset only when it has a 60% probability of outperforming the other. This leaves a buffer zone of 20% (2 x (60% - 50%)) in which no switch is made, which filters out noise. The result is a smooth transition from one asset to the other. Let’s first look at how probabilistic momentum appears versus a simple momentum scheme that uses just the relative return to make the switch between assets.
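To make the hysteresis rule concrete before turning to the chart, here is a small Python sketch. It assumes a series of "probability that A outperforms B" values is already available; the function names and the probability values are hypothetical toy inputs, not the SPY/TLT data. A threshold of 0.50 reduces to switching whenever the lead changes, which serves here as a stand-in for simple momentum.

```python
def hysteresis_positions(prob_a_beats_b, threshold=0.60, start="A"):
    """Hold the current asset until the other one clears the threshold."""
    holding = start
    positions = []
    for p in prob_a_beats_b:
        if p >= threshold:
            holding = "A"
        elif 1.0 - p >= threshold:
            holding = "B"
        # otherwise p is inside the buffer zone: keep the current holding
        positions.append(holding)
    return positions


def count_switches(positions):
    """Number of times the held asset changes from one period to the next."""
    return sum(1 for prev, cur in zip(positions, positions[1:]) if prev != cur)


# Toy probabilities (made up for illustration only)
probs = [0.55, 0.48, 0.53, 0.62, 0.47, 0.41, 0.38, 0.52]

with_buffer = hysteresis_positions(probs, threshold=0.60, start="B")
no_buffer = hysteresis_positions(probs, threshold=0.50, start="B")

print(with_buffer, count_switches(with_buffer))
print(no_buffer, count_switches(no_buffer))
```

With the 60% threshold, readings such as 0.55 or 0.47 fall inside the 20% buffer zone and leave the position unchanged, so the toy series produces fewer switches than the 50% version.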

Probabilistic Momentum 1


Notice that the switches between SPY and TLT (S&P500 and Treasurys) using probabilistic momentum are much smoother than those using simple momentum. The timing of the trades also appears superior in many cases. Now let’s look at a backtest of probabilistic momentum with a 60-day lookback and a confidence level of 60% versus a simple momentum system, both applied to SPY and TLT.

Probabilistic Momentum 2


As you can see, using probabilistic momentum manages to: 1) increase return, 2) dramatically reduce turnover, and 3) increase the Sharpe ratio of return to risk. This is accomplished gross of trading costs; comparisons net of a reasonable trading cost are even more compelling. From a client standpoint, there is no question that fewer trades (especially avoiding insignificant trades that fail to capture the right direction) are also highly appealing, putting aside the obvious tax implications of more frequent trading. Is this concept robust? On average across a wide range of pairs and time frames the answer is yes. For example, here is a broad sample of lookbacks for SPY vs TLT:

Probabilistic Momentum


In this example, probabilistic momentum outperforms simple momentum over virtually all lookbacks, with an incredible edge of over 2% CAGR. Turnover is reduced by an average of almost 70%. The Sharpe ratio is on average roughly 0.13 higher for probabilistic versus simple. While this comparison is by no means conclusive, it shows the power of using this approach. There are a few caveats: 1) The threshold for confidence is a parameter that needs to be determined, although most values work well. Using larger thresholds creates greater lag and fewer trades, and vice versa, and this tradeoff needs to be weighed. As a guide, for shorter lookbacks under 30 days a larger threshold (75%, or as high as 95% for very short time frames) is more appropriate. For longer lookbacks a confidence level between 55% and 75% tends to work better. 2) The trendier one asset is versus the other, the smaller the advantage of using a large confidence level; this makes sense since perfect prediction would imply no filter to switch. 3) Distribution assumptions: this is a long and boring story for another day.

This method of probabilistic momentum has a lot of potential extensions and applications. It also requires some additional complexity to integrate into a multi-asset context. But it is conceptually and theoretically appealing, and preliminary testing shows that even in its raw form there is substantial added value especially when transaction costs are factored in.

NAAIM Wagner Award 2014

To all readers, I would encourage you to consider entering into this year’s NAAIM Wagner Award. The prestigious prize has been awarded to several well-recognized industry practitioners in the past and has served to boost the career of many new entrants in the field. Details for submission can be found below:

The National Association of Active Investment Managers, or NAAIM, is holding its sixth annual Wagner Award competition for advancements in active investment management.  The competition is open to readers like yours who are academic faculty and graduate students, investment advisors, analysts and other financial professionals.

The first place winner will receive a $10,000 prize, plus an invite to present their winning paper at NAAIM’s national conference in May (free conference attendance, domestic air travel and hotel accommodations will also be provided).  Second and third place winners could also be eligible for monetary prizes of $3,000 and $1,000 respectively.

To find out more about NAAIM or to apply for the Wagner Award competition, visit NAAIM’s website,  and look for the Wagner Award page in the Resources section.


For more information:


To download the application:



All the best,

Greg Morris, NAAIM 2014 Wagner Award Chairman

Tel. 888-261-0787


The Essence of Being A Quant


During the holidays, a person gets a chance to reflect on more philosophical matters. In my case, one question that stood out was how to define the essence and importance of the quant profession to investment management. I began to realize that the term itself, and even the profession, is poorly defined and articulated even within the industry. The first time I was asked what a “quant” was, I simply explained that they were number-crunchers that created systems for trading money. The reality is not far off from this simplistic explanation. But having read and heard a lot of different explanations (and accusations!), I would say the essence of being a quant is to translate any possible unit of usable information into financial forecasts, algorithms, or rules-based systems for trading; let’s call this broad class simply “trading models.” The information includes (but is definitely not limited to): 1) fundamental data 2) technical data 3) economic data 4) news 5) weather 6) anything else that might be considered useful or predictive. The analytical tools of the trade have become highly cross-disciplinary and come from a wide variety of fields such as (but not limited to): 1) math 2) statistics 3) physics 4) economics 5) linguistics 6) psychology 7) biology. A lot of the common methods used across fields fall in the now burgeoning interdisciplinary field of “data mining and analysis.”

A quant is simply interested in creating a quantifiable method for developing trading models. If a concept is not quantifiable it is often because it is either 1) not clearly defined or 2) simply not testable. These principles are generally respected by all scientists regardless of discipline. There is truthfully no area that should not be of interest to a quant; there are just areas that are more or less fruitful or simply worth prioritizing. Since financial price data for many time series are of high quality and readily available with long histories, this is a natural area of priority. Why then do quants seem to frown upon or pick on technical analysis, which makes use of the same data? The answer is that most of the original technical analysis literature and media falls into the two categories identified as being difficult to quantify. Often the concepts and ideas are highly elaborate, with a wide range of extenuating circumstances where the same conclusion would hold a different meaning. This implies a highly complex decision tree approach (assuming all of the nuances can be identified or articulated). The downside to believing in traditional technical analysis is twofold: 1) a lack of statistical robustness 2) the flawed assumption that markets are stationary. We can rely on gravity, but we cannot rely on any measurable financial phenomenon to always follow the same set of laws or rules, since such phenomena arise in a dynamic ecosystem of different market players: asset managers, central banks, and governments constantly try to influence or anticipate the actions of each other. While that may sound harsh, it does not mean that we should abandon using technical indicators or ideas. It simply means that indicators or ideas represent testable inputs into a trading model that can and should be monitored or modified as conditions change.

What about the die-hard fundamental analysis approach? They are more similar to traditional quants (thanks to value investors for example that often create quantitative rules of thumb that are easy to test) and tend to use statistical analysis or some form of quantitative application in portfolio management regardless of their level of discretionary judgment.  However, they are also guilty of some of the same flaws as technical analysts because they often rely on concepts that are  either not observable  or not testable from a practical standpoint (and hence not quantifiable). For example, if a portfolio manager talks about the importance of having a meeting with management and assessing their true intentions for making corporate decisions– this is not really testable for a quant. Neither is the leap of foresight that an investor has about whether a product that has never been sold will be popular. The downside to believing in a purely fundamentalist approach is that the relative value of the insights that they claim are important is very difficult to assess or measure. Regardless of how important these individuals claim (and rationally product foresight is potentially a real skill) their qualitative or intuitive insights are, they must be separated from the style or factor exposure that is quantifiable (that is taken on either intentionally or unintentionally) to determine some baseline of usefulness. For example if a portfolio manager claims to buy large cap value stocks with high quality balance sheets, but uses additional “judgment factors” to narrow down their list for the portfolio, their performance should be benchmarked against a stock screen or factor model that approximates their approach. This gives some insight as to how much positive or negative value has been added by their added judgment. In many cases this value tends to be negative–which calls into question the utility in paying a portfolio manager such exorbitant compensation.

In truth the quant is as much a threat to the classic portfolio manager role as the machine is to human labor. A quant can manufacture investment approaches that are far cheaper, more disciplined, have greater scale, and are more reliable. The more advanced and creative the quant is, the more information can be quantified, and the more approaches that can be replicated. A quant’s best friend is therefore the computer programmer who performs the very important task of creating the automated model. Unlike a machine, once a model is created, it can and should be frequently improved and monitored. How this process is done distinguishes the professional from the amateur quant. A professional quant will make sensible and robust improvements that will improve a model’s prospects for dealing with uncertainty, regardless of what the model performance is in the short-term, whether it is good or bad. The amateur quant will make cosmetic improvements to backtests of the model by primarily tweaking parameters in hindsight, or simply make adjustments based on short-term performance that would have caught the latest rally or avoided the latest decline. Here are a few other key differences between the pros and the amateurs when it comes to quant:

The professional starts with a simple and clear idea, and then increases complexity gradually to suit the problem within an elegant framework. The amateur tries to incorporate everything in an awkward framework at the same time.

The professional will seek to use logic and theory that is as general and durable as possible to guide model development or improvement. The amateur strives to create exceptional backtests and will be too data-driven with model development and refinement decisions.

The professional takes a long time to slowly but earnestly improve a model, fully aware that it can never be perfect. The amateur either wants to complete the model and proceed to trading immediately, or in the opposite context they are so afraid of failure that they 1) always find something that could go wrong 2) are overly skeptical and easily persuaded by peer hyperbole over fact 3) are addicted to finding new avenues to test.

Finally, and perhaps most importantly towards impacting performance: the professional will not give in easily to outside influences (management, clients etc) to make adjustments for ill-advised reasons unless there is a long-term or clear business case for doing so (the typical trades that the model takes on are perhaps difficult to execute in practice). The amateur will buckle to any pressure or negative feedback and try to please everyone.

If the last paragraph sounds arrogant, I would be happy to admit being guilty of one or more of such amateur mistakes earlier in my own career. But this is much like the path of development for any field of expertise. In truth, almost any “professional” quant learns these lessons, whether through peer instruction or the “school of hard knocks”. But one of the benefits of experience and honest self-assessment is that you can learn how to lean in the right direction. Without being honest with yourself, you can never get better. To end on a sympathetically qualitative note, it is useful to think of a professional quant as also sharing the qualities of a martial artist: a quant must also have solid control of their own mind and emotions as they relate to working with trading models to be able to rise to the highest level. When practiced well, this is not a frenetic and purposeful state, but rather what appears to be a focused but almost detached state where the decisions are more important than the outcomes. One really good idea in a relaxed moment is superior to a hundred hours of determined exploration. The benefit of such a state tends to be good and consistent outcomes with regard to model performance. Ironically, too much focus and energy invested in the outcome (performance) has the opposite effect. The psychology and emotional maturity of a quant can be as important as, or more important than, their inherent talent or knowledge in driving investment performance. Of course this hypothesis is (and should be) subject to quantitative examination.

Free Web-Based Cluster Application Using FTCA

One of the most useful ways to understand clusters is to work with a visual application and see different cluster groupings over time. In a previous post, I introduced the Fast Threshold Clustering Algorithm (FTCA) as a simple and intuitive method of forming asset clusters. A fellow quant researcher and long-time reader, Pierre Chretien, was kind enough to build an FTCA Shiny web application in R in collaboration with Michael Kapler of Systematic Investor. It shows the use of clustering with FTCA on a broad range of asset classes that have been back-extended using index data. Users can experiment with varying different parameters: the time period at cluster formation (plot date), the correlation threshold for cluster formation, and the lookback period for the correlation calculation. Another useful output is the return and volatility for each asset at the time of cluster formation as well as a transition map of the historical cluster memberships for each asset. Two images of the application are presented below. Very cool stuff and nice work! I personally enjoyed using it to see the impact of different parameters and also the historical clusters over different time periods.

The link to the application can be found here:

cluster web application

cluster application1

cluster application2






The strength of FTCA is both speed and simplicity. One weakness of FTCA, however, is that cluster membership is determined by a threshold to one asset only at each step (either HC or LC). Asset relationships can be complex, and there is no assurance that all members of a cluster have a correlation to each other member that is higher than the threshold. This can lead to fewer clusters, and potentially incorrect cluster membership assignments. To address these weaknesses, FTCA2 uses the same baseline method but computes the correlation threshold to ALL current cluster members rather than just to HC or LC. In this case, the average correlation to current cluster members is always calculated to determine the threshold. It also selects assets in order of the closest correlation to the current cluster members.

The pseudo-code is presented below:


While there are assets that have not been assigned to a cluster

  • If only one asset remaining then
    • Add a new cluster
    • Only member is the remaining asset
  • Else
    • Find the asset with the Highest Average Correlation (HC) to all assets not yet assigned to a Cluster
    • Find the asset with the Lowest Average Correlation (LC) to all assets not yet assigned to a Cluster
    • If Correlation between HC and LC > Threshold
      • Add a new Cluster made of HC and LC
      • Try adding each of the remaining assets that have not yet been assigned to a Cluster in order of highest correlation to the current cluster if correlation of the asset is > the average correlation of the current cluster.
    • Else
      • Add a Cluster made of HC
        • Try adding each of the remaining assets that have not yet been assigned to a Cluster in order of highest correlation to the current cluster if correlation of the asset is > the average correlation of the current cluster.
      • Add a Cluster made of LC
        • Try adding each of the remaining assets that have not yet been assigned to a Cluster in order of highest correlation to the current cluster if correlation of the asset is > the average correlation of the current cluster.
    • End if
  • End if

End While
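The "try adding" step is the part that differs from the original FTCA, so here is a hedged Python sketch of just that step. It reflects one reading of the pseudo-code: candidates are ranked by their average correlation to the current cluster members, the ranking is refreshed after each addition, and an asset joins only if that average exceeds the threshold. The function name `grow_cluster` and the toy correlation matrix are hypothetical.

```python
def grow_cluster(corr, seed, candidates, threshold=0.5):
    """Greedily extend a cluster using an FTCA2-style membership test.

    corr       : square list of lists of pairwise correlations
    seed       : indices already in the cluster, e.g. [hc, lc] or [hc]
    candidates : indices not yet assigned to any cluster
    Returns (cluster, leftover_candidates).
    """
    cluster = list(seed)
    remaining = list(candidates)

    def avg_to_cluster(i):
        return sum(corr[i][j] for j in cluster) / len(cluster)

    while remaining:
        best = max(remaining, key=avg_to_cluster)  # closest asset first
        if avg_to_cluster(best) <= threshold:
            break  # the closest candidate fails, so every other one fails too
        cluster.append(best)
        remaining.remove(best)
    return cluster, remaining


# Toy correlations (made up): asset 2 is close to asset 0 but not to asset 1
corr = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.2, 0.1],
    [0.7, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
print(grow_cluster(corr, seed=[0], candidates=[1, 2, 3]))
```

In this toy case asset 2 has a 0.7 correlation to the seed, which would pass a test against the seed alone, but its average correlation to the cluster falls below 0.5 once asset 1 has joined, so it is left for a later cluster.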

Fast Threshold Clustering Algorithm (FTCA)

cluster image

Often it can be surprisingly difficult to improve upon a simple and time-tested recipe. During the summer of 2011, I worked with Corey Rittenhouse to develop algorithms for grouping asset classes. At the time, I did not have any familiarity with the “clustering” algorithms that are often used in data mining research. The first algorithm that was created resulted from a desire to simplify the complex problem of grouping assets using very few steps, and also to make it computationally simple. As it turns out, ignorance was bliss. The Fast Threshold Clustering Algorithm (FTCA) has many desirable properties that traditional clustering algorithms do not: 1) it produces fairly stable clusters 2) it is fast and deterministic 3) it is easy to understand. When Michael Kapler and I conducted cluster research for our Cluster Risk Parity portfolio allocation approach with modern clustering methods, one of the biggest issues we both saw was that the resulting clusters changed too frequently, creating excessive turnover. Furthermore, highly correlated datasets such as the Dow 30 had more clusters than logic or rationale would tend to dictate. This results from the fact that most cluster algorithms function like an optimization routine that seeks to maximize inter-cluster dissimilarity and intra-cluster similarity. This can mean that clusters will change because of very small changes in the correlation matrix that are more likely to be a function of noise. Threshold clustering by comparison uses a logical correlation threshold to proxy “similar” versus “dissimilar.” In FTCA, I initially used a correlation threshold of 0.5 (approximately the level of statistical significance) to separate similar from dissimilar assets. FTCA works similarly to the Minimum Correlation Algorithm in that it uses the average correlation of each asset to all other assets as a means of determining how closely or distantly related an asset is to the universe of assets chosen.

A graphic of how the FTCA creates clusters is presented below:

Fast Threshold Clustering

The pseudocode for FTCA is presented below:

While there are assets that have not been assigned to a cluster

  • If only one asset remaining then
    • Add a new cluster
    • Only member is the remaining asset
  • Else
    • Find the asset with the Highest Average Correlation (HC) to all assets not yet assigned to a Cluster
    • Find the asset with the Lowest Average Correlation (LC) to all assets not yet assigned to a Cluster
    • If Correlation between HC and LC > Threshold
      • Add a new Cluster made of HC and LC
      • Add to Cluster all other assets that have not yet been assigned to a Cluster and have an Average Correlation to HC and LC > Threshold
    • Else
      • Add a Cluster made of HC
        • Add to Cluster all other assets that have not yet been assigned to a Cluster and have a Correlation to HC > Threshold
      • Add a Cluster made of LC
        • Add to Cluster all other assets that have not yet been assigned to a Cluster and have a Correlation to LC > Threshold
    • End if
  • End if

End While
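For readers who prefer code to pseudocode, the steps above can be sketched in plain Python as follows. This is an illustrative translation rather than the original implementation: the function name `ftca`, the tie-breaking (LC is chosen among assets other than HC), and the toy correlation values are all assumptions.

```python
def ftca(corr, names, threshold=0.5):
    """Fast Threshold Clustering on a correlation matrix.

    corr  : square list of lists, corr[i][j] = correlation of assets i and j
    names : asset labels in the same order as corr
    """
    unassigned = list(range(len(names)))
    clusters = []

    def avg_corr(i, pool):
        others = [j for j in pool if j != i]
        return sum(corr[i][j] for j in others) / len(others)

    while unassigned:
        if len(unassigned) == 1:
            clusters.append([unassigned.pop()])
            continue
        hc = max(unassigned, key=lambda i: avg_corr(i, unassigned))
        # Assumption: pick LC among the other assets so that HC != LC on ties
        lc = min((i for i in unassigned if i != hc),
                 key=lambda i: avg_corr(i, unassigned))
        rest = [i for i in unassigned if i not in (hc, lc)]
        if corr[hc][lc] > threshold:
            # One cluster: HC, LC and assets close to both on average
            members = [hc, lc] + [
                i for i in rest
                if (corr[i][hc] + corr[i][lc]) / 2 > threshold
            ]
            new_clusters = [members]
        else:
            # Two clusters: one built around HC, then one around LC
            hc_members = [hc] + [i for i in rest if corr[i][hc] > threshold]
            rest = [i for i in rest if i not in hc_members]
            lc_members = [lc] + [i for i in rest if corr[i][lc] > threshold]
            new_clusters = [hc_members, lc_members]
        for members in new_clusters:
            clusters.append(members)
            unassigned = [i for i in unassigned if i not in members]
    return [[names[i] for i in members] for members in clusters]


# Toy correlations (made up for illustration, not estimated from data)
names = ["SPY", "QQQ", "TLT", "GLD"]
corr = [
    [1.0, 0.9, -0.3, 0.1],
    [0.9, 1.0, -0.2, 0.05],
    [-0.3, -0.2, 1.0, 0.2],
    [0.1, 0.05, 0.2, 1.0],
]
print(ftca(corr, names, threshold=0.5))
```

The routine has no random initialization and no iterative optimization, which is the source of the speed and determinism mentioned earlier.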

It is interesting to look at the historical clusters using FTCA with a threshold of .5 on previously used datasets such as the 8 major liquid ETFs (GLD,TLT,SPY,IWM,QQQ,EFA,EEM,IYR), and the 9 sector Spyders (S&P500 sectors: XLP,XLV,XLU,XLK,XLI,XLE,XLB,XLY,XLF). The database was updated until May of 2013 and shows the historical allocations/last 12 months of clusters generated using a 252-day lookback for correlation with monthly rebalancing. First the 8 major liquid ETFs:


Notice that the clusters are very logical and do not change even once within the 12-month period. The clusters essentially represent Gold, Bonds and Equities (note that a 60-month period shows very little change as well). Now let’s take a look at the clusters generated on the famously noisy S&P500 sectors:

FTCA Spyders

Again, the last 12 months of clustering shows very little change in cluster membership. Most of the time, the sectors are intuitively considered to be one cluster, while occasionally the utilities sector shows a lack of correlation to the rest of the group. The choice of threshold will change the number and stability of the clusters- with higher thresholds showing more clusters and a greater change in membership than lower thresholds. As much as I have learned about very sophisticated clustering methods in the last year, I often am drawn back to the simplicity and practicality of FTCA.  From a portfolio management standpoint, it makes using clusters far more practical as well for tactical asset allocation or implementing cluster risk parity.

Shrinkage: A Simple Composite Model Performs the Best


In the last two posts we discussed using an adaptive shrinkage approach, and also introduced the average correlation shrinkage model. The real question is: which shrinkage method, across a wide variety of different models, works best? In backtesting across multiple universes from stocks to asset classes and even futures, Michael Kapler of Systematic Investor calculated a composite score for each shrinkage model based upon the following criteria: Portfolio Turnover, Sharpe Ratio, Volatility, and Diversification using the Composite Diversification Indicator (CDI). Lower turnover was preferred, a higher Sharpe ratio was better, lower volatility was better, and higher diversification was considered better. The backtests and code can be found here.

The models considered were as follows:



The best performing shrinkage model can be implemented by virtually anyone with a minimum of Excel skills: it is the simple average of the sample correlation matrix, the anchored correlation matrix (all history), and the average correlation shrinkage model. This produced the best blend of characteristics that would be desirable for money managers. The logic is simple: the anchored correlation matrix provides a long-term memory of inter-asset relationships, the sample provides a short-term/current memory, and the average correlation shrinkage assumes that the average correlation of an asset to all other assets provides a more stable short-term/current estimate than the sample. This is a good example of how simple implementations can trump sophisticated ones as long as the concepts are sound. As a generality, this is my preferred approach whenever possible because it is easier to implement in real life, easier to debug, and easier to understand and explain. Another interesting result from the rankings is that the ensemble approaches to shrinkage models performed better. Again this makes sense. The adaptive shrinkage model (best Sharpe) performed poorly by comparison, especially when considering turnover as a factor. It is possible that using only a 252-day window, or using only Sharpe as an objective criterion, was suboptimal. Readers are encouraged to experiment with other approaches (we did investigate some methods that showed a lot of promise).
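Outside of a spreadsheet, the composite is just an element-wise average of three matrices. The short Python sketch below assumes the three estimates have already been computed elsewhere; the function name `composite_correlation` and the numbers are hypothetical, and the average correlation shrinkage estimate in particular is simply taken as an input here rather than derived.

```python
def composite_correlation(sample, anchored, avg_corr_shrunk):
    """Element-wise simple average of three correlation matrix estimates."""
    n = len(sample)
    return [
        [(sample[i][j] + anchored[i][j] + avg_corr_shrunk[i][j]) / 3.0
         for j in range(n)]
        for i in range(n)
    ]


# Toy 2x2 inputs (made up for illustration only)
sample = [[1.0, 0.9], [0.9, 1.0]]             # recent lookback window
anchored = [[1.0, 0.6], [0.6, 1.0]]           # all available history
avg_corr_shrunk = [[1.0, 0.75], [0.75, 1.0]]  # assumed precomputed estimate

for row in composite_correlation(sample, anchored, avg_corr_shrunk):
    print(row)
```

Because each input has ones on the diagonal and is symmetric, the average keeps both properties.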

Finally it is important to recognize that shrinkage is not a magic bullet regardless of which approach was used. The results are better but not worlds apart from using just the sample correlation. There is a practical limit to what can be achieved using a lower variance estimate of the correlation matrix with shrinkage. More accurate predictors for correlations are required to achieve greater gains in performance.