I’m a bit miffed by the way I tend to confuse myself about how many degrees of freedom apply to a particular test. Is there a general approach for determining this?
Yes.
Every time you calculate a statistic you lose a degree of freedom. Say you have a sample of 500 giraffes and you want to compute their average height. (Capturing all of the giraffes in the world is too difficult, so you’ll use your sample and infer from that.) You calculate the mean height of the giraffes in your sample as 5.62 meters.
If you grabbed another sample of 500 giraffes with the intention of calculating their mean height – and you’re constrained to get the same mean as in your first sample (that’s the key constraint, and the explanation of this whole degrees-of-freedom thing) – then 499 of those giraffes can be any height whatsoever (they can vary freely), but the 500th one is constrained: its value must be the correct number to give you a mean height of 5.62 meters. By calculating the mean, you have lost one degree of freedom.
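The giraffe argument can be sketched in a few lines of Python. This is only an illustration: the random seed, the 4 to 7 meter range for the freely chosen heights, and the helper name `last_value_for_mean` are assumptions made up for the example, not part of the original post.

```python
import random

def last_value_for_mean(free_values, target_mean, n):
    """Return the single value that forces n values to average target_mean."""
    if len(free_values) != n - 1:
        raise ValueError("expected exactly n - 1 freely chosen values")
    # n * mean is the required total; subtract what has already been chosen.
    return n * target_mean - sum(free_values)

random.seed(0)        # assumption: fixed seed so the run is repeatable
n = 500
target_mean = 5.62    # mean height from the first sample, in meters

# 499 heights vary freely (hypothetical range of 4 to 7 meters).
free_heights = [random.uniform(4.0, 7.0) for _ in range(n - 1)]

# The 500th height is not free: the constraint on the mean pins it down.
last_height = last_value_for_mean(free_heights, target_mean, n)

heights = free_heights + [last_height]
sample_mean = sum(heights) / n
print(f"forced 500th height: {last_height:.3f} m, mean: {sample_mean:.4f} m")
```

The forced value need not be a realistic giraffe height; the point is only that, once the mean is fixed, the last value is determined by the other 499.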
If you’re doing a linear regression, for example, then you’ll calculate a number of statistics; specifically, an intercept and a number of slope coefficients. For every such statistic you calculate, you lose a degree of freedom. If you have a sample of 500 (x, y) data points and you calculate a slope and an intercept, then grab another sample of 500 (x, y) data points, 498 of the _y_s can vary freely, but the last two must be specific values to get that same slope and intercept; you’ve lost two degrees of freedom.
In general, for linear regression where you have k input variables, so you compute k slopes and one intercept, you lose k + 1 degrees of freedom: you’ll have n – k – 1 degrees of freedom with n data points.
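Here is a rough Python sketch of the same idea for a one-variable regression. Everything specific in it is an assumption chosen for illustration: the target intercept 1.5 and slope 0.8, the x values, the range of the free y values, and the helper names `ols_fit` and `force_last_two`. It picks 498 y values freely, then solves the two least-squares normal equations for the last two.

```python
import random

def ols_fit(xs, ys):
    """Ordinary least squares for one input variable: (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def force_last_two(xs, free_ys, intercept, slope):
    """Solve for the last two y values so the fit equals (intercept, slope)."""
    n = len(xs)
    # Normal equations that the complete sample must satisfy.
    total_y = n * intercept + slope * sum(xs)
    total_xy = intercept * sum(xs) + slope * sum(x * x for x in xs)
    # What is left over for the two constrained points.
    rhs1 = total_y - sum(free_ys)
    rhs2 = total_xy - sum(x * y for x, y in zip(xs, free_ys))
    xa, xb = xs[-2], xs[-1]
    # ya + yb = rhs1 and xa*ya + xb*yb = rhs2 (requires xa != xb).
    yb = (rhs2 - xa * rhs1) / (xb - xa)
    ya = rhs1 - yb
    return ya, yb

random.seed(1)                      # assumption: fixed seed for repeatability
n, k = 500, 1
target_intercept, target_slope = 1.5, 0.8   # hypothetical "first sample" fit

# Hypothetical x values; the last two are placed far apart for stability.
xs = [random.uniform(0.0, 10.0) for _ in range(n - 2)] + [0.0, 10.0]
free_ys = [random.uniform(-100.0, 100.0) for _ in range(n - 2)]  # vary freely

ya, yb = force_last_two(xs, free_ys, target_intercept, target_slope)
fit_intercept, fit_slope = ols_fit(xs, free_ys + [ya, yb])

degrees_of_freedom = n - k - 1
print(f"fit: intercept={fit_intercept:.6f}, slope={fit_slope:.6f}")
print(f"degrees of freedom: {degrees_of_freedom}")
```

Two estimated coefficients pin down two of the y values, which is the n – k – 1 count with k = 1.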
Thank you so very much! If I did have 500 giraffes, I’d give you a couple.
Magician, thanks for the complete answer as always. That’s how I understood degrees of freedom as well. Then why, in the CFAI (R.13, Q#13), do they say the following:
“In a correctly specified regression, the residuals must be serially uncorrelated. We have 108 observations, so the standard error of the autocorrelation is 1/√T, or in this case 1/√108 = 0.0962. The t-statistic for each lag is significant at the 0.01 level. We would have to modify the model specification before continuing with the analysis.”
I know you don’t have the books, but all they are doing is looking at whether the autocorrelations are significant. We are given a table with autocorrelations, standard errors, and the t-stat. The equation is: “Δln(Salesₜ) = 2.7108 + 0.3987 Δln(Salesₜ₋₁) + εₜ.” This is an AR(1) model, so shouldn’t df = 108 - p - 1, i.e. 108 - 1 - 1 = 106?
Any insight would be greatly appreciated. Quant is far from being my strength.
For what it’s worth on this topic, a related question I had was why we divide by n-1 when calculating some statistics rather than by n. In linear regression, we end up using n-2 in many calculations or really, n-k-1 degrees of freedom as stated above. Khan Academy has an excellent video on why we divide by n-1, rather than n. I’m sure this n-k-1 degrees of freedom is all related. It would be nice if Khan Academy would present videos on multiple linear regression and time series so we could all understand the nuances of this lesson.
If, for example, you’re referring to dividing by n – 1 when computing a sample variance rather than dividing by n (as we do for a population variance), then you’re exactly right: it’s a degrees-of-freedom thing.
Whenever I teach quants, I refer to that statistic by its full, proper name: the _bias-adjusted_ sample variance. Adding the words “bias-adjusted” helps candidates remember that they have to . . . well . . . make an adjustment: divide by n – 1 instead of dividing by n. And the name is quite suggestive: the reason that we make the adjustment is to remove the bias from the statistic. If we divided by n, then the expected value of the sample variance would not equal the population variance, but if we divide by n – 1, then the expected value of the (bias-adjusted) sample variance _does equal_ the population variance. And what causes that bias? Using the mean calculated from the sample instead of using the mean of the population. (Of course, we don’t know the mean of the population, so we’re forced to use the sample mean. But we understand that and adjust for it.) And why does that bias manifest itself as n – 1 vs n? Degrees of freedom: we lost one degree of freedom when we calculated (then used) the sample mean.
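The bias can be checked exactly on a toy case. The sketch below is an illustration only: the five-value population, the sample size of 3, and sampling with replacement are assumptions picked so that every possible sample can be enumerated. It averages both versions of the sample variance over all equally likely samples and compares them with the population variance.

```python
from fractions import Fraction
from itertools import product

def sum_sq_dev(values, center):
    """Sum of squared deviations of values from center, as an exact fraction."""
    return sum((Fraction(v) - center) ** 2 for v in values)

population = [1, 2, 3, 4, 6]   # hypothetical tiny population
n = 3                          # sample size (drawn with replacement)

pop_mean = Fraction(sum(population), len(population))
pop_var = sum_sq_dev(population, pop_mean) / len(population)

# Every possible ordered sample of size n; each is equally likely.
samples = list(product(population, repeat=n))

div_n = []
div_n_minus_1 = []
for sample in samples:
    sample_mean = Fraction(sum(sample), n)   # uses the SAMPLE mean
    ss = sum_sq_dev(sample, sample_mean)
    div_n.append(ss / n)                     # unadjusted version
    div_n_minus_1.append(ss / (n - 1))       # bias-adjusted version

avg_div_n = sum(div_n) / len(samples)
avg_div_n_minus_1 = sum(div_n_minus_1) / len(samples)

print("population variance:       ", pop_var)
print("average when dividing by n:", avg_div_n)
print("average when dividing by n-1:", avg_div_n_minus_1)
```

Dividing by n comes out low by exactly the factor (n – 1)/n, while dividing by n – 1 matches the population variance on average.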
The number of degrees of freedom _for the model_ is 108 – 1 – 1 = 106. That’s not the same as the number used in the calculation of the standard error of the autocorrelation; that calculation always uses n (108 in this case), not n – k – 1. I frankly don’t remember enough from my graduate class in statistics to explain why the standard error for the autocorrelation is 1/√n (and a cursory internet search didn’t find anything remotely easy to paraphrase here), so I suggest that you simply take this one on faith. The fact is that computing the standard error for autocorrelation is quite a different beast from computing the standard error for the model; maybe remembering that will be sufficient for you to get the numbers right on the exam. I hope so. Sorry.
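To keep the two numbers apart, here is a small Python sketch of both calculations for the 108-observation example. The function names and the residual autocorrelation of 0.30 are made up for illustration; the actual autocorrelations are in the CFAI table, which isn’t reproduced in this thread.

```python
import math

def autocorrelation_standard_error(num_observations):
    """Standard error of a residual autocorrelation: 1 / sqrt(T)."""
    return 1.0 / math.sqrt(num_observations)

def regression_degrees_of_freedom(num_observations, num_slopes):
    """Degrees of freedom for the model: n - k - 1."""
    return num_observations - num_slopes - 1

T = 108
se = autocorrelation_standard_error(T)
model_df = regression_degrees_of_freedom(T, num_slopes=1)  # AR(1): one slope

# Hypothetical autocorrelation, NOT taken from the curriculum table.
hypothetical_autocorrelation = 0.30
t_stat = hypothetical_autocorrelation / se

print(f"standard error = {se:.4f}")   # 0.0962, matching the quoted passage
print(f"model degrees of freedom = {model_df}")
print(f"t-statistic for the hypothetical lag = {t_stat:.2f}")
```

The standard error uses all 108 observations, while the model’s degrees of freedom drop to 106.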
I always include the intercept and then I count the number of rows in the ANOVA table!
Or the number of independent variables including b0 (the intercept). Does this work? lol.
Yes: one slope coefficient per independent variable, plus one intercept.
Mark my words: I will pass this exam this year, and it will be because of you, S2000magician!
You’re too kind.
Your hard work will have something to do with it as well.