Since the mid-1990s, factor-based exchange-traded funds have experienced spectacular growth. By mid-2016, these funds had about $1.35 trillion under management, accounting for about 10% of the market capitalization of U.S.-traded securities.
At its most basic level, factor-based investing is simply about defining, and then systematically following, a set of rules that produce diversified portfolios. An example of factor-based investing is a value strategy: buying cheap (low valuation) assets and selling expensive (high valuation) assets.
A problem with factor-based investing is that smart people with even smarter computers can find factors that have worked in the past but are not real—they are the product of randomness and selection bias (referred to as data snooping, or data mining).
The problem of data mining is compounded when researchers snoop without first having a theory to explain the finding they expect—or hope—to find. Without a logical explanation for an outcome, one should not have confidence in its predictive ability.
The Problem Of P-Hacking
“P-hacking” refers to the practice of reanalyzing data in many different ways until you get a desired result. For most studies, statistical significance is defined as a “p-value” less than 0.05, meaning that a difference as large as the one observed would arise by chance less than 1 time in 20. That may seem like a high hurdle to clear to prove that a difference is real. However, what if 20 comparisons are done and only the one that “looks” significant is presented?
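The arithmetic behind that question is worth seeing. If no real effect exists, each comparison still has a 5% chance of clearing the p < 0.05 bar, so across 20 independent comparisons the chance that at least one "looks" significant is 1 - 0.95^20, or about 64%. A quick simulation sketch, using the fact that p-values are uniformly distributed when there is no real effect:

```python
import random

def chance_of_false_positive(n_comparisons=20, alpha=0.05, trials=100_000):
    """Under the null (no real effect), each comparison's p-value is
    uniformly distributed on [0, 1]. Count how often at least one of
    n_comparisons falls below alpha purely by chance."""
    random.seed(0)
    hits = sum(
        any(random.random() < alpha for _ in range(n_comparisons))
        for _ in range(trials)
    )
    return hits / trials

analytic = 1 - (1 - 0.05) ** 20          # about 0.64
simulated = chance_of_false_positive()   # close to the analytic value
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

In other words, a researcher who tries 20 independent strategies on random data should expect, about two times out of three, to find at least one that clears the conventional significance bar.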
The problem of data mining, or p-hacking, is so acute that professor John Cochrane famously said that financial academics and practitioners have created a “zoo of factors.” For example, a May 11, 2017, article in the Wall Street Journal states: “Most of the supposed market anomalies academics have identified don’t exist, or are too small to matter.”
In their 2014 paper “Long-Term Capital Budgeting,” authors Yaron Levi and Ivo Welch examined 600 factors from both the academic and practitioner literature. And authors Campbell Harvey (past editor of The Journal of Finance), Yan Liu and Heqing Zhu, in their paper “…and the Cross-Section of Expected Returns,” which was published in the January 2016 issue of the Review of Financial Studies, reported that 59 new factors were discovered between 2010 and 2012 alone.
Kewei Hou, Chen Xue and Lu Zhang contribute to the literature on anomalies and market efficiency with their May 2017 paper “Replicating Anomalies.” They conducted the largest replication of the entire anomalies literature, compiling a data library with 447 anomaly variables.
The list includes 57, 68, 38, 79, 103 and 102 variables from the momentum, value-versus-growth, investment, profitability, intangibles and trading frictions categories, respectively. To control for microcaps, defined as stocks smaller than the 20th percentile of market equity for New York Stock Exchange (NYSE) stocks, they formed testing deciles with NYSE breakpoints and value-weighted returns. They treated an anomaly as a replication success if the average return of its high-minus-low decile was significant at the 5% level (t ≥ 1.96).
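Their replication criterion is simple to sketch: compute the t-statistic of the average monthly high-minus-low return (mean divided by its standard error) and compare it to 1.96. A minimal sketch, with a hypothetical return series invented purely for illustration:

```python
import math
import random

def t_stat(returns):
    """t-statistic of the average return: mean / standard error."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical: 600 months of high-minus-low decile returns,
# averaging 0.3% per month with 3% monthly volatility
random.seed(0)
hml = [random.gauss(0.003, 0.03) for _ in range(600)]
t = t_stat(hml)
print(f"t = {t:.2f}; replication success at 5% level: {abs(t) >= 1.96}")
```

Note how the sample length matters: the same average premium measured over a shorter window produces a smaller t-statistic, which is one reason apparently strong anomalies can fail to replicate.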
Following is a summary of their findings:
- P-hacking is widespread in the anomalies literature.
- Of the 447 anomalies, 286 (64%) are insignificant at the 5% level.
- Imposing the t-statistic cutoff of 3 proposed by Harvey, Liu and Zhu in their aforementioned paper “...and the Cross-Section of Expected Returns” raises the number of insignificant anomalies to 380 (85%).
- Even with anomalies that show statistical significance, their magnitudes are often much lower than those reported in the original articles. This is consistent with the finding of R. David McLean and Jeffrey Pontiff, authors of the 2016 study “Does Academic Research Destroy Stock Return Predictability?” that, on average, premiums decay about one-third post-publication.
- Controlling for the q-factor model (market beta, size, profitability and investment), 115 (71%) of the 161 significant anomalies have insignificant alphas (t < 2); raising the hurdle to t < 3 leaves 150 alphas (93%) insignificant.
- The biggest casualty is the liquidity literature. In the trading frictions category, which contains mostly liquidity variables, 95 of 102 variables (93%) are insignificant.
- The distress anomaly is virtually nonexistent in their replication.
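The alpha test in the findings above can also be sketched in miniature: regress an anomaly's high-minus-low returns on factor returns and see whether the intercept (the alpha, the part the factors cannot explain) survives. A minimal one-factor sketch with hypothetical numbers (the actual q-factor model regresses on four factors at once):

```python
def ols_alpha(y, x):
    """One-factor OLS: regress anomaly returns y on factor returns x.
    Returns (alpha, beta); alpha is the average return left
    unexplained by exposure to the factor."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    alpha = my - beta * mx
    return alpha, beta

# Hypothetical monthly returns: anomaly = 0.2% + 0.5 * factor, exactly
factor = [0.01, -0.02, 0.03, 0.00, -0.01, 0.02]
anomaly = [0.002 + 0.5 * f for f in factor]
alpha, beta = ols_alpha(anomaly, factor)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}")  # alpha = 0.0020, beta = 0.50
```

When an anomaly's returns are mostly explained by its loadings on the factors (a large beta, a small alpha), it is a repackaging of known premiums rather than a new one, which is exactly what Hou, Xue and Zhang found for most of the 161 survivors.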
Microcaps Skew Results
Hou, Xue and Zhang ask: “Why does our replication differ so much from original studies?” Their answer is in one word—microcaps—which represent only about 3% of the total market capitalization of the NYSE-AMEX-Nasdaq universe, but account for about 60% of the number of stocks. They note that “microcaps not only have the highest equal-weighted returns, but also the largest cross-sectional standard deviations in returns and anomaly variables among microcaps, small stocks, and big stocks.”
Many studies overweight microcaps by using equal-weighted returns, often together with NYSE-AMEX-Nasdaq breakpoints, in portfolio sorts. Further, the authors add: “Due to high transaction costs and illiquidity, anomalies in microcaps are unlikely to be exploitable in practice.”
I would add that this doesn’t necessarily make the information useless, as long-only investors can improve their outcomes by avoiding (screening out) the short legs of anomalies. In that manner, they avoid both the high trading costs and the high costs of shorting.
Hou, Xue and Zhang concluded that their evidence suggests that capital markets are more efficient than previously reported. Their findings also help explain why many anomalies documented in the academic literature seem to disappear, reverse or weaken post-publication. Another explanation is that if there is no logical reason for the anomaly to exist, and there are no limits to arbitrage, sophisticated investors trade to correct mispricings, eliminating the anomaly.
Hou, Xue and Zhang offered suggestions to reduce the risk of p-hacking, including more out-of-sample testing (many studies examine only U.S. data) and providing economic explanations (either risk-based or behavioral-based) for the presence of a factor’s premium.
These are among the issues Andrew Berkin, director of research at Bridgeway Capital Management, and I address in “Your Complete Guide to Factor-Based Investing: The Way Smart Money Invests Today,” which was published in October 2016.
To address the issues raised in “Replicating Anomalies” and to bring clarity out of complexity and opaqueness, we provide the evidence demonstrating that, within the “factor zoo,” you need only a handful of factors to invest in the same fashion as legendary investors like Warren Buffett. And we show you how to do it in a low-cost, tax-efficient way.
To minimize the risk of p-hacking and to address the concerns we have been discussing, we provide specific criteria, each of which must be met, for a factor to be considered. To start, the factor must provide explanatory power to portfolio returns and have delivered a premium (higher returns). Additionally, it must be:
- Persistent: It holds across long periods of time and different economic regimes, minimizing the risk that the finding isn’t just a lucky outcome specific to one short period of time.
- Pervasive: It holds across countries, regions, sectors and even asset classes, minimizing the risks of p-hacking.
- Robust: It holds for various definitions (for example, there is a value premium whether it is measured by price-to-book, earnings, cash flow or sales); it’s not dependent on one formulation that might have been the result of data snooping.
- Investable: It holds up not just on paper but also after considering actual implementation issues, such as trading costs. In other words, we answer the question: Even if we believe the factor is real, can a practical investor really make money from it after costs?
- Intuitive: There are logical risk-based or behavioral-based explanations for its premium and why it should continue to exist.
The good news is that, among all the factors in the zoo, we show that you need to focus only on the eight that meet our criteria: beta, size, value, momentum, profitability, quality, term and carry.
What about all those other factors?
Some have not passed the test of time, fading away after their discovery, perhaps because of data mining or random outcomes. Or perhaps the factors worked only for a special period, regime or narrow band of securities. And many factors have explanatory power that is already well captured by the factors we recommend. In other words, they are variations on a common theme (e.g., the many definitions of value).
If you are considering or are already engaged in factor-based investing, I offer these words of caution from the conclusion of our book:
“First, as we have discussed, all factors—including the ones we have recommended—have experienced long periods of underperformance. So, before investing, be sure that you believe strongly in the rationale behind the factor and the reasons why you trust it will persist in the long run. Without this strong belief, it is unlikely that you will be able to maintain discipline during the inevitable long periods of underperformance. And discipline is one of the keys to being a successful investor. Finally, because there is no way to know which factors will deliver premiums in the future, we recommend that you build a portfolio broadly diversified across them. Remember, it has been said that diversification is the only free lunch in investing. Thus, we suggest you eat as much of it as you can!”
Larry Swedroe is the director of research for The BAM Alliance, a community of more than 140 independent registered investment advisors throughout the country.