The growth, at least, comes as no surprise. These “traders’ funds” are designed to deliver some multiple of the daily return of different benchmark indexes: either 300%, 200%, -100%, -200% or -300%, depending on the product. With global equity markets crashing and market volatility sky-high, any product that helped investors hedge their portfolios—or profit on the downside—was sure to be a hit, as these were. ProShares, the leading provider of leveraged and inverse ETFs, was the fastest-growing ETF company in the world in 2008, with assets under management rising from $9.7 billion to $20.5 billion.
As 2008 wound to a close, however, concerns arose about the performance of these funds. Tom Eidelman summed up the problems in his Jan. 12, 2009 article in Barron’s magazine (“One-day Wonders”):
Suppose you had predicted—correctly, as it turned out—that the Chinese economy would slow following last summer’s Beijing Olympics, causing China’s stock markets to tumble. Also suppose that, to profit from your insight, you had invested in the ProShares UltraShort FTSE/Xinhua China 25, a leveraged exchange-traded fund (ticker: FXP) designed to go up by as much as twice the percentage that the FTSE/Xinhua China 25 Index falls on a given day.
When Chinese stocks crashed by 34% over the following four months, shouldn’t you have reaped a gaudy return around 68%? Not exactly. In fact, you would have lost 56%.
Take real estate. The Dow Jones U.S. Real Estate Index had a terrible year in 2008, falling 40.07%. The ProShares UltraShort Real Estate ETF (NYSE Arca: SRS) might have seemed like a smart way to play it. Its goal is to deliver -200 percent of the daily return of that index. But instead of rising 80 percent in 2008, as you might expect, SRS actually closed the year down 50 percent.
Figure 1 highlights the five most surprising examples of full-year 2008 returns for leveraged and inverse ETFs.
For an investor, being caught in one of these situations must have been hugely frustrating. Making the right call about the direction of the market is difficult. If you predicted one of these markets was going to fall, and then bought an ETF that promised to deliver -200% of the return of the index it tracked, you would have expected to earn a mint. To end up losing money … and in some cases, significant amounts of money … must have been infuriating.
It’s important to note, however, that these results were not created by a “flaw” in the funds. These funds largely delivered on their stated objective, which is to provide -200% of the daily return of their benchmark indexes. The problem is that -200% of the daily return of an index is very different from -200% of the long-term return.
The difference between daily and long-term returns is well-documented in the literature surrounding leveraged and inverse funds. It all stems from the basic math of compounding.
Suppose you have an index that starts out with a value of 100. You also have a product that’s designed to provide 200 percent of the upside exposure of that index. That product starts out with a value of $100.
On Day 1, the index rises 10 percent to 110, and the product rises 20 percent to $120. Perfect. But on Day 2, the index falls 10 percent to 99, while the product falls 20 percent to $96. After just two days, the problem is obvious: The index is down 1 percent, and the leveraged product is down 4 percent.
|Daily Change||Index||Investment (200%)|
|Day 1 – Start||100||$100|
|Day 2 – Start||110||$120|
|Day 2 – Finish||99||$96|
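The two-day example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `compound` and its parameters are made up for this example, and it ignores fees, financing costs and any real-world fund mechanics.

```python
def compound(daily_returns, multiple=1.0, start=100.0):
    """Compound a series of daily returns, applying `multiple` to each day's move."""
    value = start
    for r in daily_returns:
        value *= 1.0 + multiple * r
    return value


# Day 1: index +10 percent; Day 2: index -10 percent (the example in the text).
index_moves = [0.10, -0.10]

index_end = compound(index_moves)              # 100 -> 110 -> 99
fund_end = compound(index_moves, multiple=2)   # 100 -> 120 -> 96

print(round(index_end, 2), round(fund_end, 2))
```

The index finishes 1 percent below its starting value, while the product that doubles each daily move finishes 4 percent below, rather than the 2 percent a simple doubling of the two-day return would suggest.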
You can play with the numbers to make funny things happen. Suppose, for instance, that the index rose 20 percent on the first day, fell 25 percent on the second day and rose 15 percent on the third day.
|Daily Change||Index||Investment (200%)|
|Day 1 – Start||100||$100|
|Day 2 – Start||120||$140|
|Day 3 – Start||90||$70|
|Day 3 – Finish||103.5||$91|
In this case, the index ends Day 3 up 3.5 percent, but the leveraged investment closes down 9 percent.
Flip the numbers around and you can figure out what happened to funds like FXP and SRS. Compounding reared its ugly head, and what should have been up, ended up being down.
In other words, the eventual returns on leveraged funds like these are “path-dependent.” Where the underlying index ends the year matters less than how it got there. The more a market jumps around from day to day, the greater the eventual divergence between a leveraged or leveraged inverse ETF and the relevant multiple of the index return is likely to be.
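Path dependence is easy to see with a toy comparison. The sketch below uses two hypothetical two-day paths, invented for this illustration and not taken from any actual index, that both leave the index down 19 percent. It then applies a -200 percent daily multiple to each path. The function name `fund_return` is likewise an assumption of this example, and fees and financing are ignored.

```python
def fund_return(daily_returns, multiple):
    """Cumulative return of a product that delivers `multiple` x each daily index return."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + multiple * r
    return value - 1.0


# Two hypothetical paths with the same end point: the index loses 19 percent on both.
calm = [-0.10, -0.10]      # steady decline
choppy = [0.25, -0.352]    # sharp rally, then a sharper fall

for name, path in (("calm", calm), ("choppy", choppy)):
    index_ret = fund_return(path, 1)
    inverse_2x = fund_return(path, -2)
    simple_2x = -2 * index_ret  # the "linear" result many investors expect
    print(f"{name}: index {index_ret:+.1%}, -2x fund {inverse_2x:+.1%}, simple -2x {simple_2x:+.1%}")
```

On the calm path the inverse product gains about 44 percent, beating the simple 38 percent target. On the choppy path it loses about 15 percent even though the index fell by the same amount, which is the kind of reversal seen in FXP and SRS.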
To their credit, ProShares and other providers of leveraged and inverse ETFs warned investors about this very phenomenon on their Web sites. The ProShares Web site, for instance, notes:
Like most leveraged and short funds, ProShares are designed to provide a positive or negative multiple (e.g., 200 percent, -200 percent) of an index’s performance on a daily basis (before fees and expenses). Generally, these funds have achieved their daily objective with a high degree of accuracy and consistency.
However, ProShares and other leveraged or short funds with daily objectives are unlikely to provide a simple multiple (e.g., 2x, -2x) of an index’s performance over periods longer than one day.
It goes on to say that, “[i]n general, periods of high index volatility will cause the effect of compounding to be more pronounced, while lower index volatility will produce a more muted effect.”
2008 was a period of historic volatility, and that caused the performance of leveraged and inverse ETFs to diverge significantly from a simple multiple of the index. The more volatile the index, the larger that divergence was over time.
The fact that the divergence is easy to explain, however, doesn’t make it any more palatable for investors, and a media furor erupted in 2009 surrounding the unusual performance of these funds. Paul Justice, CFA, of Morningstar, captured the zeitgeist in a Jan. 22, 2009 article titled, “Warning: Leveraged and Inverse ETFs Kill Portfolios.” In the piece, Justice tells investors that these funds are “appropriate only for less than 1% of the investing community.”
He goes on to say, “If you’re hell-bent on using leverage for any period of time longer than a day, you’d be better off using a margin account in almost any real-world scenario.”
The indignation is justified, but it is also too simple. Just as these ETFs do not provide exactly 200 percent or -200 percent of the long-term returns of a benchmark index, neither do they stray wildly from those targets in most situations once a single day has passed.
The leveraged and inverse ETFs have a number of advantages over a traditional margin account. For instance, you cannot lose more than your original investment in the fund. In addition, you never face margin calls, which you do face in margin accounts.
The question, therefore, is simple: How long can you really hold these funds and still expect to earn returns that stay reasonably close to the linear leveraged or inverse results many investors desire?
To answer that question, this study examines the historical performance of three of the first leveraged and inverse ETFs to launch in the U.S. These funds, all offered by ProShares, are designed to provide 200 percent, -100 percent and -200 percent of the daily return of one of the oldest and most established market indexes in the world: the Dow Jones industrial average.
The funds are:
- ProShares Ultra Dow30 (NYSE Arca: DDM)
- ProShares Short Dow30 (NYSE Arca: DOG)
- ProShares UltraShort Dow30 (NYSE Arca: DXD)
The study compares the performance of these funds versus their benchmark indexes over the following periods: one day, one week (five trading days), one month (21 trading days), one quarter (63 trading days) and one year (251 trading days).
For each period, the study compares the performance of the ETF versus the comparable linear performance of its benchmark, adjusted for the leverage factor. For instance, when looking at the ProShares Ultra Dow30 ETF, the study compares the performance of the ETF with the following returns: 200 percent of the one-day return, 200 percent of the one-week return, 200 percent of the one-month return, etc.
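As a rough sketch of that comparison, the calculation might look like the following. The exact formula used in the study is not spelled out, so treating tracking error as the fund's period return minus the leverage factor times the index's period return is an assumption here, as are the function and parameter names.

```python
def period_return(start_value, end_value):
    """Simple return over a holding period."""
    return end_value / start_value - 1.0


def tracking_error(nav_start, nav_end, index_start, index_end, multiple):
    """Fund return minus the 'linear' target (multiple x index return) for the same period.

    Positive means the fund beat its linear target; negative means it fell short.
    """
    expected = multiple * period_return(index_start, index_end)
    actual = period_return(nav_start, nav_end)
    return actual - expected


# Hypothetical inputs taken from the earlier two-day illustration:
# the index goes from 100 to 99 while a 200 percent product goes from $100 to $96.
error = tracking_error(100.0, 96.0, 100.0, 99.0, multiple=2)
print(f"{error:+.2%}")
```

In that illustration the linear target is -2 percent and the product returns -4 percent, so the tracking error is roughly -2 percentage points.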
The study uses the net asset value of the ETF, adjusted for distributions, when making performance calculations. The choice to use NAV rather than share price was made because, in the early days of the study, liquidity was limited in these ETFs, causing share prices to occasionally deviate from the NAV.
The index returns used in this study are price returns, not total returns. This creates a small positive skew in the results, meaning that tracking error in the funds appears more positive than it otherwise would have.
The study started at the earliest date that all three funds were trading and data was available (July 13, 2006) and ended on Dec. 15, 2008. The study evaluates the time periods on a rolling basis: 611 one-day periods, 607 five-day periods, 591 one-month periods, 549 one-quarter periods and 361 one-year periods.
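The period counts follow directly from the rolling-window arithmetic: a series of N one-day periods contains N - k + 1 overlapping windows of k trading days. The short check below is a sketch of that arithmetic only; the function name is made up for this example.

```python
def rolling_period_count(n_one_day_periods, window_days):
    """Number of overlapping windows of `window_days` trading days in the sample."""
    return n_one_day_periods - window_days + 1


windows = [
    ("one day", 1),
    ("one week", 5),
    ("one month", 21),
    ("one quarter", 63),
    ("one year", 251),
]

counts = {label: rolling_period_count(611, days) for label, days in windows}
for label, count in counts.items():
    print(f"{label}: {count} periods")
```

Starting from 611 one-day periods, this reproduces the 607, 591, 549 and 361 periods cited for the longer windows. Because the windows overlap heavily, neighboring observations are far from independent, which is worth bearing in mind when reading the longer-horizon results.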
On a one-day basis, the funds did a very good job tracking their benchmarks, as shown in Figure 2. In the chart, a “positive” tracking error indicates that the fund outperformed its expected return, and a “negative” tracking error indicates that the fund underperformed its expected return.
The largest tracking error on both an average and absolute basis was in the leveraged-inverse product, DXD. The smallest average error was in the simple inverse product, DOG.
Interestingly, a plurality of tracking errors was positive, particularly for the inverse funds. This may be because of the structure of the funds. These funds achieve their exposure in the market by buying options, futures or swaps. In each case, they only have to put up a portion of the value of their positions; the remaining cash can be invested in Treasuries, earning interest. As a result, they have a small but positive daily return. All else being equal, this will give them positive tracking error against the benchmarks used in this study.
The tracking error throughout this study tended to have fat tails; i.e., most days produced little or no tracking error, while a few anomalous days had high errors. In Figure 3, the colored boxes indicate where 95 percent of all tracking error results landed. Outlier results are indicated by associated black lines.
Figure 3 emphasizes that the vast majority of tracking error for all three funds was very small: less than 0.13 percent on an absolute basis. To put these figures in perspective, consider that 95 percent of the one-day returns for the Dow Jones industrial average during the period studied ranged from 3.1 percent to -3.2 percent.
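One simple way to build a band of this kind is to sort the observations and trim the most extreme 2.5 percent from each end. The article does not say how its 95 percent ranges were computed, so the method below is an assumption, the function name is hypothetical, and the input is synthetic placeholder data rather than anything from the study.

```python
def central_band(values, coverage=0.95):
    """Return (low, high) bounds after trimming an equal share of observations from each tail."""
    ordered = sorted(values)
    n = len(ordered)
    tail = int(n * (1.0 - coverage) / 2.0)  # observations dropped from each end
    return ordered[tail], ordered[n - 1 - tail]


# Synthetic placeholder values (NOT study data): 201 evenly spaced "errors" from -100 to 100.
sample_errors = list(range(-100, 101))

low, high = central_band(sample_errors)
print(low, high)
```

With real tracking errors in place of the placeholder list, the two bounds would correspond to the edges of the boxes, and anything outside them to the outlier lines.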
In sum, these funds follow through on their promise of delivering 200%, -100% and -200% of the daily return of their benchmark indexes.
As expected, tracking errors widen as you move to longer time windows. The results of the tracking error study for a one-week holding period are shown in Figure 4. Although the vast majority of results are still close to their target, the tails are fatter, with DXD in particular posting a handful of unusual results, including one that was more than 8 percent off its benchmark index. As a reminder, this does not mean the product was working incorrectly; it’s simply due to the nature of leveraged and inverse returns when measured for periods longer than one day.
But while DXD—and to a lesser extent, DDM and DOG—experienced a few “bad” results, in most cases, the funds continued to perform well. As is the case for all longer time periods, DOG and DXD exhibit higher tracking error than DDM, suggesting it is harder to deliver accurate inverse returns than it is to deliver accurate leveraged returns.
The larger tracking errors shown in Figure 5 should be measured against the larger average move in the Dow over the one-week time frames. Ninety-five percent of the one-week returns for the Dow during the period studied fell between 4.39 percent and -5.70 percent. The largest one-week return was 16.91 percent, with the largest decline being -18.16 percent.
Given those results, the tracking errors on both DDM and DOG seem quite acceptable. The vast majority of returns for these two funds fell within 1 percent of their expected result. DXD had more trouble, with returns straying as much as 2.78 percent from the linear result. But even DXD’s tracking bands were tolerable, given the relatively large moves in the Dow itself.
The trend of increased tracking error continued in the one-month analysis. The average error grew significantly, and the fat tails got fatter, particularly for DXD, which missed by as much as 16 percent on both the positive and negative side of the equation, as shown in Figure 6.
The majority of tracking error was still positive, but the problems of large negative tracking error became more apparent. DXD, for instance, saw 37 cases (roughly 6 percent of the 591 one-month periods) where it trailed its benchmark by more than 5 percent.
Figure 7 shows how far the margin of error grew at the one-month interval. Even for DDM, the double-leverage ETF, the 95 percent interval has expanded to show tracking errors as large as -2.83 percent. And for DXD, the tracking error has become truly significant: The 95 percent intervals stretched as low as -11.82 percent.
These larger tracking errors must be compared with the larger moves in the Dow, of course. The largest one-month gain for the Dow during the studied period was 7.90 percent, with the largest one-month drop hitting -26.63 percent. The 95 percent interval extends from a 5.86 percent return on the upside to a -15.37 percent return on the downside.
Measured against moves of that size, the returns of DDM look quite good. DOG’s performance has wider errors, but they are still relatively contained given the broader movements in the Dow. DXD’s tracking error grows substantially, however. While we might have expected DXD’s tracking error to be 2X as large as DOG’s error (after all, it’s tracking -200 percent of the returns of the Dow), the downside error bands were nearly 3X as large on DXD as they were on DOG. While DXD tracked its linear return well in many, many scenarios, there were clearly a significant number of instances where the tracking error was wider.
Expanding out to look at quarterly returns, the results grew more extreme. DXD in particular exhibited a number of large errors, particularly on the downside, with negative tracking error extending as far as -30 percent. DOG also showed a sharp downward tail of negative tracking error, while DDM performed much better, as shown in Figure 8.
Large errors remained when tightening to a 95 percent return band, as shown in Figure 9. DDM remained relatively tightly confined to its index, staying within 3 percent of the index in 95 percent of all cases. DOG did slightly worse, with a range that extended to -5.17 percent on the downside. But on DXD, the misses were huge, stretching all the way from 4.74 percent to -20.80 percent.
Again, these returns must be understood in the context of the larger returns of the index itself. Quarterly returns on the Dow for the period ranged from 13.49 percent on the upside to -35.05 percent on the downside. Ninety-five percent intervals stretch from 10.68 percent on the upside to -25.51 percent on the downside.
At those levels, the returns of DDM and DOG are reasonable. But DXD remains an outlier, with large potential misses threatening returns.
When the study is extended out to one year, an interesting shift occurs. Not only do the outlier tracking errors grow significantly, but the skew of results shifts. As Figure 10 shows, the fat tails in the one-year analysis are primarily on the positive side, particularly for DXD, which significantly beat its index in a number of different scenarios.
This is shown again in the 95 percent confidence intervals, where the DOG ETF in particular shows no positive error at all in the 95 percent view, and both DXD and DDM skew sharply higher. In all instances, however, the bounds for error are large.
Of course, the Dow showed great variability of returns during the time studied. On a one-year basis, the Dow delivered returns as high as 30 percent and as low as -41.82 percent. The confidence interval stretches from 22.13 percent to -37.12 percent.
Even measured against the larger returns of the index, the tracking error shown by the funds is quite large. DXD in particular shows a wide variation in tracking error.
An important trend emerges for the inverse funds if you dig into when and where tracking error appears. For all the periods studied except one-year, tracking error grows larger … and more negative … as the expected return grows.
Figure 12 showcases this point. It compares the tracking error of DXD to the fund’s expected return. The tracking error stayed generally flat-to-positive for all but the most extreme positive expected returns. Once those expected returns went parabolic—above, say, 20 percent—the tracking error got much more volatile and was generally negative.
This held true for all time periods studied except for one-year. In the one-year returns, tracking errors tended to be positive overall, and did not have the negative skew during large expected returns.
Errors Skewed Toward Late-2008
It is important to note that this study took place over a period of historic volatility, particularly in the latter half of 2008. In fact, the vast majority of the large tracking errors recorded during this study took place in the latter half of 2008. Had the study only extended through mid-2008, tracking errors would have been much smaller overall.
Figure 13 shows quarterly tracking errors for all three funds as they occurred in time. As shown, the majority of large tracking errors occurred during the tail end of the study, when the chaos of 2008 started to impact returns. Unfortunately, when those large tracking errors appeared, they tended to appear on the negative side of the ledger. It is no surprise that this is when public concern about these products began to develop.
How Long Can You Hold ProShares ETFs?
The results of the study are clear. As you move further away from the targeted one-day time period, tracking error on these funds grows. The problem is substantially more acute for the leveraged-inverse fund (DXD) than it is for the straight leveraged (DDM) or straight inverse (DOG) funds.
In most market conditions, the funds stuck close to the simple long-term leveraged or inverse return of their index. But in the most volatile of markets, significant negative tracking error developed in some of the cases studied.