Heteroskedasticity - why are standard errors usually too small?

I’ve looked at other heteroskedasticity threads here but they didn’t answer my question. Maybe one has, but I haven’t found it.

Why does heteroskedasticity cause standard errors to (usually) be too small? The notes just say that it does, but not why. I understand that it typically reduces the standard error and results in false significance, but I don’t know why it results in a deflated standard error.

In the simplest terms:

Heteroskedasticity refers to the phenomenon in which the error term becomes larger as the independent variable increases.

So it underestimates the standard error and thus overstates the t-stat (you reject more often, which increases the Type I error rate).
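A quick sketch of why an understated standard error inflates the test statistic (generic t-test notation, my own illustration, not curriculum-specific):

```latex
t = \frac{\hat{\beta}}{\operatorname{se}(\hat{\beta})}
```

Hold the estimate fixed: if se(β̂) is understated, |t| is overstated, so it clears the critical value more often than the nominal significance level would suggest.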

While the error term might become larger as the independent variable increases, that’s only one example; it may become larger as the independent variable decreases, or as it gets closer to 6, or multiples of 13, or any one of an infinitude of other possibilities. And that’s only conditional heteroskedasticity. For unconditional heteroskedasticity, it will get larger (or smaller) with no pattern.
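If it helps to see that distinction in code, here is a minimal sketch (my own illustration; all names and numbers are made up for the example) that generates both flavors of heteroskedastic errors:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0, 10, n)

# Conditional heteroskedasticity: the error spread is a function of x.
# (Here it grows with x, but it could just as well shrink with x, peak
# near x = 6, and so on -- any dependence on x counts.)
eps_conditional = rng.normal(0, 0.5 + 0.3 * x)

# Unconditional heteroskedasticity: the spread varies across
# observations, but with no relationship to x at all.
sigmas = rng.uniform(0.5, 3.5, n)
eps_unconditional = rng.normal(0, sigmas)

y_conditional = 1.0 + 2.0 * x + eps_conditional
y_unconditional = 1.0 + 2.0 * x + eps_unconditional
```

Plotting either set of residuals against x makes the difference obvious: the first fans out as x grows, the second is noisy everywhere with no pattern.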

Clearly it’s because the aberrant errors are usually too big; i.e., uncharacteristically big compared to the rest of the errors.

Why they’d likely be too big rather than too small is beyond my ken; maybe it’s an entropy argument. Or something.

Nevertheless, that’s what the statisticians figure happens.

Ok, that was the heart of my question. I figured they should be too big and too small in equal proportions… but it sounds like this is one of those things where, at this point, I don’t need to understand why, I just need to know that it is that way?

Another quick question that’s close to the same topic. Am I correct in my thinking that to calculate the standard error we take the square root of the MSE, where MSE = SSE/(n − k − 1)? That would make the standard error a measure of the unexplained variation around the regression line… I think?

I don’t think it’s important for the curriculum, but the link below may help:

http://chrisauld.com/2012/10/31/the-intuition-of-robust-standard-errors/

Summarizing:

The upshot is this: if you have heteroskedasticity but the variance of your errors is independent of the covariates, you can safely ignore it, and if you calculate robust standard errors anyway they will be very similar to OLS standard errors. However, if the variance of your error terms tends to be higher when x is far from its mean, OLS standard errors will tend to be biased down, and robust standard errors will tend to be larger than OLS standard errors. In the opposite case, in which the variance of the error terms tends to be lower when x is far from its mean, OLS standard errors will tend to be too large, and robust standard errors will tend to be smaller than OLS standard errors. With real data it is common, but not universal, for the variance of the error to be higher when x is far from its mean, which explains the result that robust standard errors are typically larger than OLS standard errors in economic applications.
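To make that concrete, here is a small Monte Carlo sketch (my own; the sandwich estimator is hand-rolled in numpy so nothing is hidden in a library) in which the error spread is highest when x is far from its mean. The naive OLS standard error should come out below the true sampling variability of the slope, while the robust (HC0) standard error should track it closely:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 5000
x = rng.uniform(-3, 3, n)
X = np.column_stack([np.ones(n), x])       # design matrix with intercept
XtX_inv = np.linalg.inv(X.T @ X)

# Error spread grows with the distance of x from its mean.
sigma = 0.5 + 1.5 * np.abs(x - x.mean())

slopes, ols_se, robust_se = [], [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(0, sigma)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Naive OLS: one pooled error variance, s^2 = SSE / (n - k - 1), k = 1.
    s2 = resid @ resid / (n - 2)
    ols_se.append(np.sqrt(s2 * XtX_inv[1, 1]))

    # HC0 sandwich: each observation keeps its own squared residual.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    robust_se.append(np.sqrt(cov[1, 1]))

    slopes.append(beta[1])

print("true sampling SD of the slope:", np.std(slopes).round(4))
print("average naive OLS SE:        ", np.mean(ols_se).round(4))
print("average robust (HC0) SE:     ", np.mean(robust_se).round(4))
```

Flip the variance pattern (e.g. sigma = 2.0 - 0.5 * np.abs(x - x.mean())) and the naive OLS standard error comes out too large instead, exactly as the quoted summary says.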

Yes, the square root of the MSE = SEE (standard error of estimate).

The SEE is just measuring the degree of variability of actual Y vs estimated Y.
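In symbols, just restating the above:

```latex
\text{MSE} = \frac{\text{SSE}}{n - k - 1}
           = \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{n - k - 1},
\qquad
\text{SEE} = \sqrt{\text{MSE}}
```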

Not quite; heteroskedasticity is when the variance of the error terms is non-constant.

Conditional heteroskedasticity: the error variance is related to the level of the independent variables.

Unconditional heteroskedasticity: the error variance is not related to the level of the independent variables, and therefore causes no major problems for the regression.

I’m clarifying because differentiating between the two could easily be an exam question.

Ok, after reading your post and skimming the article… I still have one question. It’s unimportant for the test, I’m sure, but I hate having these little loose ends that I don’t understand.

What causes the standard errors to be biased down when the errors increase with increasing x? And naturally what causes the opposite: why are standard errors biased up when the errors decrease with increasing x? So far all I’ve seen is that people know this is the case, but not necessarily why.
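For what it’s worth, here is one way to see the mechanism (a sketch for simple regression with one x; this is my own algebra, not something from the curriculum). The true variance of the slope estimator weights each observation’s error variance σᵢ² by that observation’s squared distance from x̄, while the OLS formula replaces every σᵢ² with a single pooled estimate σ̂²:

```latex
\operatorname{Var}\left( \hat{\beta}_1 \right)
  = \frac{\sum_i (x_i - \bar{x})^2 \, \sigma_i^2}
         {\left[ \sum_i (x_i - \bar{x})^2 \right]^2},
\qquad
\widehat{\operatorname{Var}}_{\text{OLS}}\left( \hat{\beta}_1 \right)
  = \frac{\hat{\sigma}^2}{\sum_i (x_i - \bar{x})^2}
```

If the large σᵢ² sit exactly where the weights (xᵢ − x̄)² are large, the true variance exceeds what the pooled formula reports, so the OLS standard error is biased down; when the large σᵢ² sit where the weights are small, the pooled formula overstates the variance and the OLS standard error is biased up.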