There’s enough data there to fill the River Styx. Like Charon, ancient Greece’s mythical ferryman to the underworld, I’ll guide you through it.

The first critical test is goodness of fit, which will indicate whether it is fair to compare the fund to our MSCI benchmark. Goodness of fit measures co-movement: the frequency with which the benchmark and the fund both gain (or lose) value on the same day. A reading of 1.00, or 100 percent, is as good as it gets.
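This co-movement measure is easy to compute by hand. The sketch below follows the definition above, counting the share of days on which both series moved in the same direction; the daily return figures are made up for illustration (a flat day is counted on the gain side here, an arbitrary choice):

```python
def co_movement(fund_returns, benchmark_returns):
    """Fraction of trading days on which the fund and the benchmark
    both gained or both lost value (zero counts as a gain here)."""
    same_sign = sum(
        1 for f, b in zip(fund_returns, benchmark_returns)
        if (f >= 0) == (b >= 0)
    )
    return same_sign / len(fund_returns)

fund = [0.4, -0.2, 0.1, 0.3, -0.5]    # hypothetical daily % returns
bench = [0.3, -0.1, 0.2, -0.1, -0.4]

# Fund and benchmark move together on 4 of these 5 days.
print(round(co_movement(fund, bench), 2))  # 0.8
```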

We can see that these funds fit our benchmark well, with all but three of our 28 tests syncing with the MSCI USA Large Cap Index on at least 90 percent of trading days, and with three-fourths of our test funds hitting 95 percent co-movement. The MSCI USA Large Cap Index is a fair and well-fitting benchmark for these 11 funds.

Now we can look at our first measure of risk: beta. When a fund has the same risk level as a benchmark, its beta equals 1.00. Higher betas mean more volatility relative to the benchmark. More volatility means more risk.
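Beta is conventionally estimated as the covariance of the fund's returns with the benchmark's returns, divided by the variance of the benchmark's returns. A minimal sketch, using hypothetical daily returns:

```python
def beta(fund_returns, benchmark_returns):
    """Estimate beta as cov(fund, benchmark) / var(benchmark)."""
    n = len(fund_returns)
    mean_f = sum(fund_returns) / n
    mean_b = sum(benchmark_returns) / n
    cov = sum((f - mean_f) * (b - mean_b)
              for f, b in zip(fund_returns, benchmark_returns)) / n
    var = sum((b - mean_b) ** 2 for b in benchmark_returns) / n
    return cov / var

fund = [1.2, -0.6, 0.9, -1.1, 0.8]    # hypothetical daily % returns
bench = [1.0, -0.5, 0.7, -0.9, 0.6]

# This fund swings a bit more than its benchmark: beta above 1.00.
print(round(beta(fund, bench), 2))  # 1.23
```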

Look again at the tables above. The funds are sorted by returns, but they might as well be sorted by betas: except in the three-year table, the ordering would be nearly the same. More risk equals greater returns in a rising market. No magic here.

When goodness of fit is high, you can multiply the benchmark’s returns by the fund’s beta to find the fund’s predicted return. The difference between the actual return and the predicted return is called alpha. But be careful with alphas, because, like any statistic, they carry a margin of error.
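The recipe above reduces to two lines of arithmetic. The beta and return figures in this sketch are hypothetical:

```python
def alpha(actual_return, benchmark_return, beta):
    """Alpha = actual return minus the beta-predicted return."""
    predicted = beta * benchmark_return
    return actual_return - predicted

# A fund with a beta of 1.10 against a benchmark that returned 8 percent
# is predicted to return 8.8 percent. If it actually returned 10 percent,
# its alpha is 1.2 percentage points.
print(round(alpha(10.0, 8.0, 1.10), 2))  # 1.2
```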

Most of the time, alphas don’t mean anything, because their error bands are wide enough that an alpha of zero, or even of the opposite sign, is within the margin of error. If you don’t think margins of error matter, you must have forgotten the 2012 presidential election.

Statisticians look at the margin of error to determine the likelihood of a result being non-random. The term of art is “significance.” An alpha’s significance is the probability that its error bars don’t include zero.

When statisticians argue over which level of significance to use, they’re debating how wide to draw the error bars. Most require the error bars to be about two standard errors wide (one standard error is the standard deviation divided by the square root of the sample size), for a 95 percent probability that zero is not within the error bars. Generous statisticians allow for 90 percent significance to confirm excess returns. That’s 1.66 standard errors.
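The check described above can be sketched directly: compute the standard error of the daily alphas, draw bars of the chosen width around the mean, and ask whether zero falls outside them. The daily alpha series here are hypothetical:

```python
import math

def is_significant(daily_alphas, width=1.96):
    """True when the mean alpha sits more than `width` standard
    errors from zero (1.96 ~ 95 percent, 1.66 ~ 90 percent)."""
    n = len(daily_alphas)
    mean = sum(daily_alphas) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in daily_alphas) / (n - 1))
    se = sd / math.sqrt(n)        # standard error: sd / sqrt(sample size)
    return abs(mean) > width * se

# A steady, consistently positive alpha clears the bar...
steady = [0.1, 0.2, 0.15, 0.05, 0.18, 0.12, 0.09, 0.2, 0.11, 0.14]
print(is_significant(steady))   # True

# ...while a noisy alpha of the same length does not.
noisy = [0.3, -0.25, 0.1, -0.2, 0.15, -0.1, 0.2, -0.15, 0.05, -0.05]
print(is_significant(noisy))    # False
```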

Anything inside the error bars is noise, and not statistically different from zero.