Therefore, the best way to compare indexes from different providers is to do so on a risk-adjusted basis, i.e., determine their respective alphas. While indexes are beta investments by definition—not alpha investments—given their varying returns and construction methodologies, we would expect some to outperform others on a risk-adjusted basis. This “alpha” component of seven different index methodologies (and almost as many providers)—including the Dow Jones, Dow Jones Wilshire (rebranded as the Dow Jones Total Stock Market Indexes as of March 31, 2009), Morningstar, MSCI, Standard & Poor’s, Standard & Poor’s Pure and Russell index families—will be explored in this paper, where alpha is determined using a four-factor regression model (i.e., Carhart model) consisting of the three Fama/French factors and momentum.
Investors choose to invest in an index, or really an investment that tracks an index such as a mutual fund or ETF, in order to capture the return associated with that market exposure without the variability (and often costs) associated with active management. While the major index providers have similar methodologies for their domestic equity indexes (see Appendix I for a summary of the methodologies for the index providers included in the study), there are differences among them. These differences impact the performance and risk attributes for each index, yet make it difficult for the average investor to compare the relative strengths and weaknesses of each strategy.
As a shortcut, many investors simply seek out the most well-known index for investing purposes. For example, according to the 2009 Investment Company Fact Book, 58 percent of all assets invested in domestic equity index mutual funds were tracking the S&P 500, despite the fact that many other indexes exist with similar market exposures. A better approach would be to see which indexes actually outperform on a risk-adjusted basis, yet little research has been devoted to this topic. While one may expect that indexes would not generate alpha using traditional risk-adjusted measures (i.e., four-factor alpha), the research conducted for this paper suggests otherwise.
Index Investing Today
As of June 30, 2008, more than 70 percent of assets in index mutual funds and ETFs invested within the nine domestic equity style boxes (defined as Investment Category by Morningstar) were invested in the large blend category, followed by 7 percent in large growth and 5 percent in large value (making the total large-cap allocation approximately 82 percent). While it is not surprising that the majority of assets are invested in large cap, given that it is generally defined as the largest 70 percent of securities based on market capitalization, it is somewhat surprising that such a large portion is invested in a single style: large blend.
Figure 1 includes the rolling three-year annualized performance for the large blend indexes from each of the six different index methodologies (S&P uses the same blend methodology for both its regular and pure indexes, so the return for the S&P 500 has only been included once) from July 1997 to June 2009. Note that rolling three-year periods were selected because the regression analysis in the following section is based on rolling historical three-year periods (i.e., 36 months).
As shown in Figure 1, while the rolling annualized three-year returns for the large blend indexes varied across providers, the returns were relatively similar, although significant differences did exist at varying points in time. The maximum range in three-year returns during the entire test period for the six large blend indexes was the three-year period ending September 2001, where the Morningstar large blend index outperformed the MSCI large blend index by 7.51 percent (per year, +6.70 percent vs. -0.81 percent, respectively), while the minimum range was in September 2000, where the Dow Jones large blend outperformed the S&P large blend index by 1.28 percent (per year, +17.72 percent vs. +16.44 percent, respectively).
Figure 2 includes the annualized returns of the indexes for each style from July 1997 to June 2009, or a 12-year period. Note that these returns were calculated by compounding the monthly returns obtained from Morningstar Direct, based on the same values used to create Figure 1.
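The return calculations behind Figures 1 and 2 can be sketched as follows. This is a minimal illustration, assuming monthly returns are expressed as decimals; the function names and the sample series are hypothetical and are not taken from the paper or from Morningstar Direct.

```python
def annualized_return(monthly_returns):
    """Compound monthly returns (as decimals) and express the result per year."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    years = len(monthly_returns) / 12.0
    return growth ** (1.0 / years) - 1.0


def rolling_annualized(monthly_returns, window=36):
    """Annualized return for every consecutive `window`-month period."""
    return [
        annualized_return(monthly_returns[i:i + window])
        for i in range(len(monthly_returns) - window + 1)
    ]


# Hypothetical series: 1 percent per month for 48 months.
sample = [0.01] * 48
rolling = rolling_annualized(sample)   # one value per 36-month window
full_period = annualized_return(sample)
```

Applied to the full 144-month history of an index, `annualized_return` would give the kind of 12-year figure reported in Figure 2, while `rolling_annualized` would give the series plotted in Figure 1.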
The annualized performance differences may not appear large among the indexes in Figure 2, but they are material given the time period (12 years). For example, the annualized performance difference between the best-performing large-cap blend index (Morningstar at 2.78 percent) and the worst-performing large-cap blend index (MSCI at 0.15 percent) may be only 2.63 percent, but over 12 years this would result in a difference of approximately 36 percent (with the investment in the Morningstar large-cap blend being 36 percent larger, ignoring contributions). What is less clear, though, is what the true “alpha” of the strategies is after accounting for their varying market exposures. Using the previous example, it may be that the outperformance of the Morningstar large blend index over the MSCI large blend index is entirely due to the Morningstar index having a greater market exposure (i.e., a higher market beta); once this is adjusted for, the difference (or relative alpha) could become negative. This is what will be explored in the analysis section of the paper.
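The 36 percent figure in the example above follows from compounding the two annualized returns over 12 years. A short check, using only the two returns quoted in the text (the function name is hypothetical):

```python
def terminal_wealth_ratio(return_a, return_b, years):
    """Ratio of ending wealth from compounding return_a versus return_b."""
    return ((1.0 + return_a) / (1.0 + return_b)) ** years


# Morningstar large blend (2.78 percent) versus MSCI large blend (0.15 percent).
ratio = terminal_wealth_ratio(0.0278, 0.0015, 12)
gap_percent = (ratio - 1.0) * 100.0  # roughly 36 percent
```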
While it is impossible to know which index group (or really which methodology) will outperform on a risk-adjusted basis in the future, a review of the historical risk-adjusted attributes of each methodology should provide insight as to which methodology does a better job capturing outperformance relative to its market exposure. To determine the “alpha” or risk-adjusted outperformance for each index methodology, a four-factor (i.e., Carhart) regression analysis is performed using the three Fama/French factors, as well as momentum. All data for the beta factors, as well as the risk-free rate, was obtained from Kenneth French’s Web site, and all return data for the indexes was obtained from Morningstar Direct.
The excess return of the index (which is defined as the return of the index for the month minus the risk-free rate for the month) is regressed against a market beta factor (defined as the return on the market minus the risk-free rate), a value factor (or HML, defined as the return on value stocks minus the return on growth stocks), a size factor (or SMB, defined as the return on small stocks minus the return on big stocks), and a momentum factor (based on the six value-weight portfolios formed on size and prior return, the average return on the two high prior-return portfolios minus the average return on the two low prior-return portfolios). The four-factor regression equation is:
$$R_{index} - R_f = \alpha_{index} + \beta_{index}(R_{market} - R_f) + \beta_{SMB}(SMB) + \beta_{HML}(HML) + \beta_{MOM}(MOM) + \varepsilon_{asset}$$
where $R_{index}$ is the return on the index, $R_f$ is the risk-free rate, $\alpha_{index}$ is the alpha of the index, $\beta_{index}$ is the index’s beta with respect to the market, $R_{market}$ is the return of the market, $\beta_{SMB}$ is the index’s beta with respect to the “size” factor (SMB), $\beta_{HML}$ is the index’s beta with respect to the “value” factor (HML), $\beta_{MOM}$ is the index’s beta with respect to the “momentum” factor (MOM), and $\varepsilon_{asset}$ is the error term. All monthly alpha estimates are annualized for comparative purposes. For those readers not familiar with the four-factor regression approach, see Fama and French [1993] and Carhart [1997].
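The regression described above can be sketched in a few lines. This is an illustrative ordinary least squares fit, not the paper's actual code: the function names are hypothetical, the data are synthetic, and annualizing the monthly alpha by multiplying by 12 is an assumption, since the paper does not state its annualization method.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def four_factor_fit(index_ret, rf, mkt_rf, smb, hml, mom):
    """OLS of index excess returns on an intercept and the four factors."""
    y = [r - f for r, f in zip(index_ret, rf)]  # excess return of the index
    rows = [[1.0, a, b, c, d] for a, b, c, d in zip(mkt_rf, smb, hml, mom)]
    k = 5
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear(xtx, xty)  # normal equations: (X'X) b = X'y
    names = ["alpha", "beta_mkt", "beta_smb", "beta_hml", "beta_mom"]
    return dict(zip(names, coef))


# Synthetic 36-month example with known coefficients and no noise.
rng = random.Random(0)
n = 36
rf = [0.003] * n
mkt_rf = [rng.gauss(0.005, 0.04) for _ in range(n)]
smb = [rng.gauss(0.002, 0.03) for _ in range(n)]
hml = [rng.gauss(0.003, 0.03) for _ in range(n)]
mom = [rng.gauss(0.006, 0.04) for _ in range(n)]
index_ret = [
    rf[t] + 0.001 + 1.0 * mkt_rf[t] + 0.2 * smb[t] - 0.1 * hml[t] + 0.05 * mom[t]
    for t in range(n)
]

fit = four_factor_fit(index_ret, rf, mkt_rf, smb, hml, mom)
annualized_alpha = 12 * fit["alpha"]  # assumption: simple multiplication by 12
```

In practice the factor series and risk-free rate would come from Kenneth French's data library and the index returns from a data vendor; a statistics package would also report the standard errors needed for t statistics, which this sketch omits.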
Cremers, Petajisto and Zitzewitz [2008] have noted that the standard Fama-French (three-factor) and Carhart (four-factor) regression models can produce statistically significant nonzero alphas for passive indexes, primarily from the disproportionate weight the Fama-French factors place on small value stocks (which have performed well). While Cremers et al. introduce regression factors that outperform standard models in their paper, the traditional four-factor estimates are used for this research, due to their widespread use and acceptance. While the reader may contend that an index (i.e., a broad, well-diversified and passive portfolio) should not have an alpha component by definition, using a method that is widely employed to determine alpha for active managers (the four-factor Carhart approach with the traditional Fama-French factors) can in fact generate one.
For the analysis, regressions are based on rolling three-year periods, which consist of 36 months of historical data. Rolling periods are used versus a single period to account for potential changing market exposures of the indexes over time, as well as to make the analysis less dependent on the period studied. For example, if an index methodology did very well the first and last months of the test look-back period, it may appear that it generated alpha during the entire study, despite the fact it dramatically under-performed the months in between. Also, the average implied holding period for equity mutual funds is approximately three years based on a current redemption rate of 30 percent per year [ICI 2009], which makes the rolling three-year regression method more relevant to how investors actually invest in equity mutual funds.
Seven different index methodologies are considered for the analysis: Dow Jones, Dow Jones Wilshire, Morningstar, MSCI, Standard & Poor’s, Standard & Poor’s Pure and Russell, with the actual underlying tested indexes listed in Appendix II. The time period for the analysis is from July 1997 until June 2009, which is the longest period for which data was available for the different indexes (all nine domestic styles for each of the seven different providers). Using the same period for all methodologies allows for a more direct comparison than using the entire period of data available for each index. The total number of three-year test periods is 109.
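The count of 109 test periods follows from the sample length: July 1997 through June 2009 is 144 months, and a 36-month window can start in any of 144 - 36 + 1 months. A small sketch of that arithmetic (function names are hypothetical):

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months from start to end, counting both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def rolling_window_bounds(n_months, window=36):
    """(first, last) zero-based month offsets for each rolling window."""
    return [(s, s + window - 1) for s in range(n_months - window + 1)]


n_months = months_inclusive(1997, 7, 2009, 6)   # 144 months
windows = rolling_window_bounds(n_months)       # 109 windows
```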
The average four-factor regression alphas for each of the indexes for each style are included in Figure 3, as well as the average alpha across styles, the standard deviation of alphas across styles and the t statistic of the average alpha across styles for each methodology. Information on the weighted outperformance is also included, where the respective alphas are weighted by the total net assets invested in all passive index funds and ETFs as of June 30, 2009. This number reflects how investors actually invest in index funds at the total asset level, versus the simple average that is used for statistical significance purposes for each methodology.
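The summary statistics described above can be illustrated with a short sketch. The inputs below are made-up numbers, not values from Figure 3, and the t statistic formula (mean divided by the sample standard deviation over the square root of the number of styles) is an assumption, since the paper does not spell out how it was computed.

```python
import math
import statistics


def summarize_alphas(alphas, assets):
    """Simple average, dispersion, t statistic and asset-weighted average of style alphas."""
    mean_alpha = statistics.mean(alphas)
    sd_alpha = statistics.stdev(alphas)  # sample standard deviation across styles
    t_stat = mean_alpha / (sd_alpha / math.sqrt(len(alphas)))  # assumed formula
    weighted = sum(a * w for a, w in zip(alphas, assets)) / sum(assets)
    return {
        "mean": mean_alpha,
        "stdev": sd_alpha,
        "t_stat": t_stat,
        "asset_weighted": weighted,
    }


# Hypothetical alphas (percent) for three styles and the assets tracking each.
summary = summarize_alphas([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
```

The asset-weighted figure leans toward whichever style holds the most money, which is why a single heavily used index can dominate a methodology's weighted alpha.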
Among the seven methodologies, five had positive average alphas (Dow Jones, Dow Jones Wilshire, Morningstar, S&P and S&P Pure), and S&P’s Pure methodology had the highest alpha, at 1.16 percent, although only Morningstar had an average alpha that was statistically significant (with an average alpha of 0.74 percent and a t statistic of 2.05). On a weighted basis, five methodologies had positive alphas: Dow Jones Wilshire, Morningstar, S&P, S&P Pure and Russell, with Morningstar having the highest weighted alpha, of 1.12 percent, which could largely be attributed to the alpha of its large blend index (1.32 percent).
The range of outperformance decreases on a risk-adjusted basis (Figure 3) when compared with the raw outperformance (Figure 2), to 3.57 percent from 4.28 percent, respectively. There were also some changes in relative outperformance when viewed on a risk-adjusted basis. For example, over the 12-year test period the Dow Jones Wilshire Small Growth Index outperformed the Dow Jones Small Growth Index by 0.09 percent (on an annualized basis, 2.75 percent and 2.66 percent, respectively); however, on a risk-adjusted basis, the Dow Jones Small Growth Index outperformed the Dow Jones Wilshire Small Growth Index by 0.89 percent (on an annualized basis, 0.02 percent and -0.87 percent, respectively).
The respective alpha estimates for the various indexes were quite consistent during the test period, both on a relative and absolute basis. Figure 4 provides an example; it includes the rolling three-year four-factor regression alphas for the large blend indexes included in the analysis.
As shown in the graph, while the absolute numbers fluctuate over time, the relative rankings change very little during the test period. In the aggregate, when viewed at the ranked index level, Dow Jones, Dow Jones Wilshire, Morningstar, S&P and S&P Pure tended to have relatively consistent rankings that were slightly above average, while MSCI and Russell had rankings that tended to be significantly below average (they also were the two methodologies with negative average alphas). The persistence in alpha should not be that surprising, given the fact the factor estimates for the indexes were relatively constant over time (they are indexes, after all). Combined, these findings suggest that some methodologies are likely to persistently generate positive or negative alphas relative to their peers in the future.
There are a number of important takeaways from the analysis. First, while the S&P methodology had a positive average alpha across its nine indexes, the alpha for the S&P 500 was negative (-0.07 percent, although not statistically significant). This has important implications, because the vast majority of large blend assets that are indexed are invested in a product that attempts to replicate the S&P 500. The only large blend index with a statistically significant positive alpha was the Morningstar Large Core Index (with an average alpha of 1.32 percent and a t statistic of 7.82), and the only other index with a positive alpha for large blend was the Russell 1000 (with an alpha of 0.17 percent and a t statistic of 1.90). Investors looking for positive risk-adjusted returns in the large blend space would appear to be best off investing in these two methodologies.
Second, there can be a tremendous amount of variance (i.e., a high standard deviation) among the alpha estimates across the categories within a methodology, with S&P and Russell having the highest alpha standard deviation and Dow Jones Wilshire and Morningstar the lowest. This is important when considering the fact that some investors choose to index certain styles and not others, although they generally prefer to utilize a provider’s entire suite of indexes (e.g., use all Russell) versus combining different methodologies. For example, an investor would have fared relatively poorly if they had used large-cap active managers and indexed small cap with Russell-based index funds; however, they would have done much better had they done the reverse. The ideal index methodology for implementation purposes across all styles would be one with a positive alpha and a low standard deviation, attributes found in both the Dow Jones and Morningstar methodologies.
Third, different investors have different goals, and the goal can have a dramatic impact on the “ideal” index. For example, while an investor would typically like to invest in an index family that generates positive risk-adjusted alpha, an active manager would typically like an index that generates a negative risk-adjusted alpha, since it should be an easier benchmark to outperform. Interestingly, the most popular benchmarking methodology, Russell, had the second-lowest alpha among the methodologies tested (with an average of -15 bps and a t statistic of -0.23; only MSCI’s was lower, and MSCI specifically builds its indexes to “better reflect the investment process of asset managers”). This suggests, ignoring the potential qualitative benefits/aspects of Russell’s methodology, that Russell is an easier “hurdle” to overcome than most of the other indexes studied.
The analysis conducted for this paper introduces a simple methodology to determine the optimal indexes with which to invest, both at the individual style level and the overall methodology level, after controlling for risk. Four-factor alphas varied considerably across providers during the time period tested. Five methodologies had positive average alphas (Dow Jones, Dow Jones Wilshire, Morningstar, S&P and S&P Pure), and while S&P Pure had the highest average alpha at 1.16 percent, only Morningstar’s average alpha was statistically significant (with an average of 0.74 percent with a t statistic of 2.07). Morningstar also had the highest weighted alpha, of 1.12 percent, based on how monies were invested in index mutual funds and ETFs as of June 30, 2009 (although this was largely a result of the alpha of its large blend index).
The S&P 500 had a negative alpha (-0.07 percent, although not statistically significant), which is important given the large amount of assets that track it (58 percent of all assets in domestic equity index mutual funds). Russell, arguably the most common index for benchmarking purposes, had the second-lowest average alpha across methodologies (-15 bps, although not statistically significant), suggesting that it represents a relatively low hurdle for active managers to overcome compared with the other methodologies considered for the analysis. In closing, the results of this study suggest that some index providers do, in fact, generate alpha, both on an absolute basis and relative to their peers.
Appendix I: Index Provider Methodologies
Dow Jones: The Dow Jones U.S. Index and its subindexes are constructed and maintained according to a transparent, rules-based methodology. The indexes are weighted based on float-adjusted market capitalization and are calculated in real time. They are rebalanced quarterly (style indexes semiannually), and in addition are reviewed on an ongoing basis to account for mergers, acquisitions and other extraordinary events affecting index components. The large-cap and mid-cap indexes measure the top 70 percent and next 20 percent of stocks by market capitalization. The small-cap index represents the next 5 percent of stocks, excluding the smallest companies based on market capitalization and turnover. The Dow Jones U.S. Style Indexes measure growth stocks and value stocks. Companies determined to be style-neutral are excluded from the indexes. The style classifications are determined using a multifactor model that accounts for projected price-to-earnings ratio (P/E), projected earnings growth, price-to-book ratio, dividend yield, trailing P/E and trailing earnings growth. (www.djindexes.com)
Dow Jones Wilshire: Dow Jones Wilshire U.S. Style Indexes are constructed by separating the Dow Jones Wilshire 5000 universe of stocks into four capitalization groups using full market capitalization and then splitting the capitalization groups into growth and value stocks. The resulting 10 indexes are float-adjusted and market-capitalization weighted. Instead of 12 subindexes there are 10 style benchmarks because the smallest capitalization group, microcap stocks, is not split into growth and value. Large cap is defined as the 750 largest stocks by market capitalization, small cap is the next 1,750 largest stocks from 751 to 2,500, mid cap is a combination of 500 large and small stocks from the 501st largest to the 1,000th largest, and micro cap is all stocks in the bottom half of the Dow Jones Wilshire 5000 Index (below the 2,501st largest). The Dow Jones Wilshire style methodology uses six intuitive fundamentals to define a company as growth or value: next year’s price-to-earnings ratio, forecast long-term earnings growth, price-to-book ratio, dividend yield, trailing revenue growth for the previous five years, and trailing earnings growth for the previous 21 quarters. The Dow Jones Wilshire Indexes were rebranded as the Dow Jones Total Stock Market Indexes as of March 31, 2009, following the termination of the joint venture agreement between Dow Jones Indexes and Wilshire. (www.djindexes.com)
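The rank-based size breakpoints described for Dow Jones Wilshire can be sketched as a simple lookup. Note that mid cap overlaps both large and small cap, so a rank can map to more than one group. The function name is hypothetical, and treating micro cap as every rank above 2,500 is an assumption about how the boundary is drawn.

```python
def wilshire_size_groups(rank):
    """Size groups for a stock given its market-cap rank (1 = largest)."""
    groups = set()
    if 1 <= rank <= 750:
        groups.add("large")
    if 751 <= rank <= 2500:
        groups.add("small")
    if 501 <= rank <= 1000:
        groups.add("mid")  # straddles the large/small boundary
    if rank > 2500:
        groups.add("micro")  # assumption: starts right after small cap
    return groups


example_large_mid = wilshire_size_groups(600)
example_small_mid = wilshire_size_groups(800)
example_micro = wilshire_size_groups(3000)
```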
Morningstar: Large cap is defined as the largest 70 percent of investable securities by free-float market capitalization, mid cap is the next 20 percent by market capitalization (70th to 90th percentile), and small cap is the next 7 percent (90th to 97th percentile). Within each capitalization class, index constituents are assigned to one of three style orientations—value, growth or core—based on the stock’s overall style score. A stock’s value orientation and growth orientation are measured separately using related but different variables. Value factors: price/projected earnings (50.0 percent), price/book (12.5 percent) price/sales (12.5 percent), price/cash flow (12.5 percent), dividend yield (12.5 percent). Growth factors: long-term projected earnings growth (50.0 percent), historical earnings growth (12.5 percent), sales growth (12.5 percent), cash flow growth (12.5 percent), book value growth (12.5 percent). Morningstar rebalances constituent shares and weights of its indexes quarterly in March, June, September and December (on the Monday following the third Friday). Immediate rebalancing occurs if two constituents merge or a company’s free-float changes by 10 percent or more. The indexes are reconstituted twice annually, in June and December. (www.morningstar.com)
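The capitalization bands described for Morningstar (70, 90 and 97 percent of cumulative free-float market capitalization) can be illustrated as follows. This is a sketch under stated assumptions: the function name and sample values are hypothetical, and assigning a stock according to the cumulative share reached before it is added is one plausible boundary rule, not Morningstar's documented procedure.

```python
def classify_by_cumulative_cap(float_caps):
    """Label each stock by the cumulative float-cap share of all larger stocks."""
    total = sum(float_caps)
    order = sorted(range(len(float_caps)), key=lambda i: float_caps[i], reverse=True)
    labels = [None] * len(float_caps)
    running = 0
    for i in order:
        share_before = running / total  # share held by larger stocks
        if share_before < 0.70:
            labels[i] = "large"
        elif share_before < 0.90:
            labels[i] = "mid"
        elif share_before < 0.97:
            labels[i] = "small"
        else:
            labels[i] = "unclassified"  # below the small-cap band
        running += float_caps[i]
    return labels


# Hypothetical float-adjusted market caps summing to 100.
labels = classify_by_cumulative_cap([50, 20, 10, 10, 5, 3, 2])
```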
MSCI: MSCI’s domestic indices are subsets of the MSCI US Investable Market 2500, which are the Large Cap 300, Mid Cap 450 and Small Cap 1750 indexes. Market capitalization is based on a free-float adjustment. Indexes are reviewed quarterly and rebalanced semiannually. MSCI employs a “buffer zone” approach among size and value/growth dimensions to reduce turnover and to better reflect the investment process of asset managers. Eight different variables (three for value and five for growth) are used to better represent value and growth styles. Value attributes are: book value to price ratio, 12-months forward earnings to price ratio, and dividend yield. Growth attributes are: long-term forward earnings per share (EPS) growth rate, short-term forward EPS growth rate, current internal growth rate, long-term historical EPS growth trend, long-term historical sales per share growth trend. (www.mscibarra.com)
Standard & Poor’s: Standard & Poor’s U.S. indexes are maintained by the U.S. Index Committee, which meets monthly and comprises eight full-time professional members of Standard & Poor’s staff. Unadjusted market capitalization of $3 billion or more for the S&P 500 (approximately 75 percent of U.S. equities), $750 million to $3.3 billion for the S&P Mid Cap 400 (approximately 7 percent of U.S. equities), and $200 million to $1.0 billion for the S&P Small Cap 600 (approximately 3 percent of U.S. equities). The market cap of a potential addition to an index is looked at in the context of its short- and medium-term historical trends, as well as those of its industry. Adequate liquidity and reasonable price—the ratio of annual dollar value traded to market capitalization—should be 0.3 or greater. Various domicile requirements; public float of at least 50 percent of the stock; rules to minimize turnover. Changes to the U.S. indexes are made as needed, with no annual or semiannual reconstitution.
The Style index series divides the complete market capitalization of each parent index approximately equally into growth and value indexes. This series covers all stocks in the parent index universe, and uses the conventional, cost-efficient market-cap-weighting scheme. The style indexes measure growth and value along two separate dimensions, with three factors used to measure growth and four factors used to measure value. Growth factors: five-year earnings per share growth, five-year sales per share growth rate and five-year internal growth rate (IGR). Value factors: book value to price ratio, cash flow to price ratio, sales to price ratio, and dividend yield. A growth score for each company is computed as the average of the standardized values of the three growth factors. Similarly, a value score for each company is computed as the average of the standardized values of the four value factors.
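The scoring step described above, an average of standardized factor values, can be sketched as follows. Standardizing as a z-score across companies with the sample standard deviation is an assumption, since the text only says "standardized values"; the function names and sample data are hypothetical.

```python
import statistics


def standardize(values):
    """Z-scores of one factor across companies (assumed standardization)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def style_scores(factor_columns):
    """Average the standardized factor values for each company.

    factor_columns holds one list per factor, each with one value per company.
    """
    z_columns = [standardize(col) for col in factor_columns]
    n_companies = len(factor_columns[0])
    return [
        sum(z[i] for z in z_columns) / len(z_columns)
        for i in range(n_companies)
    ]


# Hypothetical growth factors for three companies.
growth_factors = [
    [1.0, 2.0, 3.0],     # five-year earnings per share growth
    [10.0, 20.0, 30.0],  # five-year sales per share growth
    [0.5, 1.0, 1.5],     # five-year internal growth rate
]
growth_scores = style_scores(growth_factors)
```

The same function would apply to the four value factors to produce a value score.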
Style Index Series: This series divides the complete market capitalization of each parent index approximately equally into growth and value indexes while limiting the number of stocks that overlap between them. This series is exhaustive (i.e., covering all stocks in the parent index universe) and uses the conventional, cost-efficient, market-capitalization-weighting scheme.
Pure Style Index Series: The pure style index series identifies approximately one-third of the parent index’s market capitalization as pure growth and one-third as pure value. There are no overlapping stocks, and these indexes do not have the size bias induced by market-capitalization weighting; rather, stocks are weighted in proportion to their relative style attractiveness. (http://www2.standardandpoors.com/)
Russell: U.S. common stocks are ranked from largest to smallest based on free-float market capitalization at each annual reconstitution date, which is May 31. The largest 1,000 stocks become the Russell 1000 Index, the smallest 800 stocks in the Russell 1000 become the Russell Mid Cap Index and the next largest 2,000 stocks (after the largest 1,000 stocks) become the Russell 2000 Index. Style is determined by ranking each stock by two variables: the book to price ratio and the I/B/E/S forecast long-term growth mean. The variables are combined to create a composite value score (CVS) for each stock. The stocks are then ranked by their CVS, and a nonlinear probability algorithm is applied to the distribution to determine style membership weights. Roughly 70 percent are classified as all value or all growth and 30 percent are weighted proportionately to both value and growth. (www.russell.com)
Appendix II: Benchmark Indices
|Style|Russell Index|
|Large Growth|Russell 1000 Growth|
|Large Blend|Russell 1000|
|Large Value|Russell 1000 Value|
|Mid-Cap Growth|Russell Mid Cap Growth|
|Mid-Cap Blend|Russell Mid Cap|
|Mid-Cap Value|Russell Mid Cap Value|
|Small Growth|Russell 2000 Growth|
|Small Blend|Russell 2000|
|Small Value|Russell 2000 Value|
References
Investment Company Institute. 2009. “2009 Investment Company Fact Book.” http://www.icifactbook.org/.
Carhart, Mark M. 1997. “On Persistence in Mutual Fund Performance,” Journal of Finance, vol. 52: No. 1, 57-82.
Cremers, Martijn, Antti Petajisto, and Eric Zitzewitz. 2008. “Should Benchmark Indices Have Alpha? Revisiting Performance Evaluation,” Working paper version July 31, 2008.
Fama, Eugene F., and Kenneth R. French. 1993. “Common Risk Factors in the Returns on Bonds and Stocks,” Journal of Financial Economics, vol. 33: 3-53.
French, Kenneth R., http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
Israelsen, Craig. 2007. “Variance Among Indexes,” Journal of Indexes, May/June: 26-29