Measuring and Combining Edges (Part 1)
In this post we will consider the theory behind measuring and combining edges from different variables and/or entire systems. As an all-purpose guide I would recommend following Rob Hanna’s Quantifiable Edges http://quantifiableedges.blogspot.com/ because there are a lot of practical examples of measuring and combining edges, and many of the subtleties to consider, on both his blog and more importantly in his subscriber letter. In terms of measuring edges, Michael at MarketSci often does a good job of looking at these issues from a more balanced perspective http://marketsci.wordpress.com/ that combines practitioner insight with academic rigor. The subject of measuring and combining edges in financial market data is rarely covered in textbooks, and it requires experience and judgment to either make qualitative conclusions or apply the right statistical methods. The best academic treatment of the subject is David Aronson’s Evidence-Based Technical Analysis http://www.evidencebasedta.com/.
The truth is that economists, quants, and academics are often quite guilty of poor or inappropriate analysis in this area, and it is very hard for practitioners or other academics to see why. The guise of a rigorous white paper with dense mathematics almost always seems more convincing than the dull musings of a trader in a community forum or a financial blogger who puts up a 300-word post. Sadly, appearances deceive. I have read academic papers that are pure garbage and snippets of incomplete research on a web page that are pure brilliance. Of course, more often than not the reverse is true, which is what drives the general stereotype. The philosophy of the researcher in question is also not a panacea: neither the proud skeptics nor the naive optimists are suited to the task of measuring and combining edges in financial data. The skeptics will overlook the time-varying nature of edges (a good example is the daily follow-through effect) and look at arbitrary time intervals for analysis whose edges may also be uniquely time-varying. They also tend to rely too much on linear regression, and focus too much on binary measures of effect significance using statistics that are sometimes unsuitable for the data. Furthermore, these skeptics are overly enthusiastic about robustness analysis across assets or parameters, as if this were the only reliable prerequisite for an exploitable edge. The naive optimists are short on analytical rigor, and fail to view any edges that they find within the broader context of an effect. They think that they are finding unique edges that are often mere artifacts of the data, with too little statistical power to draw any conclusions from. The optimists also fail to see the time-varying nature of edges, and fail to perform even simple analysis using medians, quartiles, or deciles. The optimists make too many mistakes to list, and are the most frequent casualties of real-life trading in financial markets.
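Since the paragraph above mentions simple analysis using medians and deciles, here is a minimal sketch of what that looks like in practice. The data is purely synthetic (a made-up indicator with a weak, noisy relationship to forward returns, not any particular market or study), but the technique is the one described: bin the indicator into deciles and examine the median forward return in each bin rather than fitting a single line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: an indicator with a weak, noisy
# relationship to next-day returns (purely illustrative).
n = 5000
indicator = rng.normal(size=n)
fwd_return = 0.001 * indicator + rng.normal(scale=0.01, size=n)

# Assign each observation to a decile of the indicator (0..9)
# using the 10%..90% quantiles as bin edges.
edges = np.quantile(indicator, np.linspace(0.1, 0.9, 9))
deciles = np.digitize(indicator, edges)

# Median forward return per decile -- a robust, non-parametric view
# of the relationship that a single regression slope would obscure,
# including any non-linearity across the distribution.
for d in range(10):
    mask = deciles == d
    print(f"decile {d}: n={mask.sum():4d}  "
          f"median fwd ret={np.median(fwd_return[mask]):+.5f}")
```

The payoff of this view is that non-linear or one-sided effects (an edge only in the extreme deciles, say) show up immediately, whereas a regression would average them away.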
Financial data is very noisy, which makes it very difficult to use classical forecasting techniques like regression. Such analysis relies on determining the slope of the relationship, as well as the intercept of the line that best describes the fit. While the slope can often be estimated reliably enough in financial data to be usable, the y-intercept estimate adds considerable error when the noise content is high (the scatterplot points are all over the place). In practice it is more useful to focus on the beta/slope and omit the y-intercept altogether. Furthermore, trying to extrapolate the slope relationship as a scalar function is also fraught with error, since significant non-linearities can exist in different areas of the variable’s distribution. It is safest to simply treat the slope or correlation coefficient between two variables as indicating either a positive or a negative relationship. Looking at the strength of this relationship in terms of win percentage, or more importantly the equity curve, is far superior to looking at it through the lens of classic regression output. The r-squared of the equity curve based on the predicted direction of a given market is preferable to a variable’s power to explain variation in the raw returns (the scatterplot points). Tracking this curve is highly important for detecting whether there has been a clear and discernible change in the relationship. When such a change is missed, one of two inappropriate conclusions is typically drawn: 1) the variable has an edge, even though that edge no longer persists in the recent part of the data set; or 2) the variable does not have an edge, even though the edge is clear and discernible from a more recent starting point versus the entire data set.
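The contrast above can be made concrete with a sketch. Using synthetic data (the signal and its parameters are my own invention, not from any real market), compare the r-squared of the raw scatterplot regression with the r-squared of a straight-line fit to the cumulative equity curve of a sign-only strategy that ignores magnitude and intercept entirely:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical signal whose *direction* carries a small edge,
# buried under large per-observation noise.
n = 2000
signal = rng.normal(size=n)
returns = 0.0005 * np.sign(signal) + rng.normal(scale=0.01, size=n)

# Classic regression r-squared on the raw scatterplot: tiny, because
# noise dominates each individual observation.
r_scatter = np.corrcoef(signal, returns)[0, 1] ** 2

# Sign-only strategy: long when the signal is positive, short when
# negative -- the "positive or negative relationship" view.
equity = np.cumsum(np.sign(signal) * returns)

# r-squared of a straight-line fit to the equity curve: measures how
# steadily the directional edge accumulates over time.
t = np.arange(n)
r_equity = np.corrcoef(t, equity)[0, 1] ** 2

print(f"scatterplot r-squared:  {r_scatter:.4f}")
print(f"equity-curve r-squared: {r_equity:.4f}")
```

The scatterplot r-squared is nearly zero even though the directional edge is real and compounds into a visibly trending equity curve; this is exactly why judging an edge by classic regression output can be misleading.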
Unfortunately, there is no concrete guide to determine how long the measurement period for analysis should be, or whether such a choice is even effective in real life. Whether an academic or practitioner who believes that an edge still exists despite a temporary deterioration is actually making a correct assessment depends on the nature of the effect itself as much as on the nature of the analysis. Effects can possess the same properties as time-series data: they can be random, mean-reverting, or trending. Only after this is accounted for can one attempt to determine whether an effect has lost its edge or is merely taking a temporary respite. At this point one must have a strong understanding of the confidence intervals of a given effect, as well as the standard deviation of the confidence figure itself, which can be quite significant in the short term with few observations. Lastly, the effect may be moderated by a third, confounding variable that may yield very clear results while the effect’s proxy variable (the current variable) shows inconsistencies.
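To illustrate why short-term assessments of a time-varying edge are so treacherous, here is a sketch using invented trade outcomes whose true win rate slowly decays. The rolling win rate is shown with a normal-approximation binomial confidence interval; with only a window's worth of observations the interval is wide enough that "the edge is gone" and "noise" are hard to tell apart:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical trade outcomes: wins (1) and losses (0) where the true
# win rate decays from 60% to 48% over time -- a time-varying edge
# (illustrative only, not real trading data).
n = 1000
true_p = np.linspace(0.60, 0.48, n)
wins = (rng.random(n) < true_p).astype(float)

# Rolling win rate over a fixed window, with a 95% normal-approximation
# binomial confidence interval around the estimate.
window = 100
for end in (200, 600, 1000):
    w = wins[end - window:end]
    p_hat = w.mean()
    se = np.sqrt(p_hat * (1 - p_hat) / window)  # std error of the win rate
    lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
    print(f"trades {end - window}-{end}: "
          f"win rate {p_hat:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With a 100-trade window the confidence interval spans roughly twenty percentage points, so a decay of a few points in the true win rate sits well inside the noise band; this is the "standard deviation of the confidence figure itself" problem the paragraph describes.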