Violations of the assumptions of multiple regression

I understand that with conditional heteroskedasticity and serial correlation, standard errors are underestimated, which causes too many Type I errors.

But why does multicollinearity have the opposite effect…
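To make the first part of my question concrete, here's a small Monte Carlo sketch I put together (the data, numbers, and variable names are my own, not from any curriculum). The error spread depends on x, so the conventional OLS standard error understates the truth and a nominal 5% t-test of a true null rejects noticeably more than 5% of the time:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
rejections = 0
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    # True slope is zero, but the error spread depends on x (heteroskedasticity),
    # with the largest errors at the high-leverage extremes of x
    y = rng.normal(0, 0.1 + 3 * np.abs(x - 0.5))
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = (resid ** 2).sum() / (n - 2)                 # conventional error variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])   # conventional SE of the slope
    if abs(beta[1] / se) > 1.96:                      # nominal 5% two-sided test
        rejections += 1
rate = rejections / reps
print(rate)  # well above the nominal 0.05
```

The rejection rate comes out well above 5% because the usual SE formula assumes constant error variance.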

See this other post I made and let me know if it helps. The first half is more directly related to your question. The latter part of the post explains why you can have nonsignificant t-tests but a significant F-test, and why this signals that you likely have issues with multicollinearity.

"

First, the coefficients are unreliable for practical interpretation, but the model will still have predictive power (i.e. the signs of the estimates may differ from what you expect, but the model’s predictive power is not necessarily hindered by multicollinearity).

In multiple regression, there are diagnostic statistics called variance inflation factors (VIFs). These are calculated as VIF_i = 1/(1 - R²_i), where R²_i is from the regression of X_i on all the other independent variables. Therefore, if X_i is highly correlated with the remaining independent variables, R²_i will be high and the VIF will be large. The square root of the VIF is how many times LARGER the standard error is due to the multicollinearity. Short of going through a derivation of the least squares estimates, this is the easiest explanation I can use to show how the standard errors are inflated by multicollinearity.
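To put that formula in code, here's a minimal numpy sketch (the simulated data and the function name `vif` are my own, just for illustration):

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R2_i), where R2_i is the R-squared from
    regressing column i of X on all the other columns."""
    n, k = X.shape
    factors = []
    for i in range(k):
        y = X[:, i]
        Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2))
    return factors

# Two regressors with a population R-squared of about 0.8 between them,
# so each VIF should come out near 1 / (1 - 0.8) = 5, meaning standard
# errors roughly sqrt(5) ≈ 2.2 times larger than without the collinearity.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = x1 + 0.5 * rng.normal(size=1000)
vifs = vif(np.column_stack([x1, x2]))
print([round(v, 2) for v in vifs])
```

With only two regressors, each VIF is the same 1/(1 - r²) from their pairwise correlation.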

Now for your second post:

If the standard errors are inflated on the t-tests (to a high degree), it is likely that the coefficients will all show non-significance, based on how the t-statistic is calculated. However, the F-statistic (test for joint significance) is essentially comparing the complete model at hand to a model where only the intercept (y-bar) is used for prediction. It basically tells us how much better our model does compared to using the average as the model (after accounting for degrees of freedom). If it is large enough (p-value less than alpha level), at least ONE of the terms in the model is statistically different from zero (the null hypothesis is C1 = C2 = … = Ck = 0, where Ci is the ith coefficient; the alternative is that at least one Ci is nonzero).

So now, if the F-test says at least one variable is statistically useful, but all the t-tests say nothing is statistically useful, we have _apparently_ contradictory results. Given what we know about inflated variances and, therefore, standard errors, we can say that multicollinearity is likely the cause. Also, the adjusted R-squared sort of gets at the same info: if it is high, the model can explain a large proportion of the sample variation in the DV, which would contradict the nonsignificant t-tests.
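You can see this pattern numerically with a small numpy sketch (simulated data and names are my own): two nearly collinear regressors produce a large F-statistic alongside badly inflated coefficient standard errors, so the individual t-statistics are typically small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.02 * rng.normal(size=n)     # x2 is nearly a copy of x1
y = 1 + x1 + x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
k = X.shape[1] - 1                      # number of slope coefficients
s2 = (resid ** 2).sum() / (n - k - 1)
cov = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))[1:]          # slope SEs, inflated by collinearity
t_stats = beta[1:] / se

ssr = ((X @ beta - y.mean()) ** 2).sum()
sse = (resid ** 2).sum()
F = (ssr / k) / (sse / (n - k - 1))     # test of joint significance

print("t-stats:", t_stats, "F:", F)
```

Here F is large (the model clearly beats the mean), while each slope SE is many times what it would be without the collinearity, which is why the individual t-tests tend to come out nonsignificant.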

Hope this helps!"

Edited because I forgot a word a long time ago… The result isn’t contradictory because the F-test and t-test answer different questions.

Many thanks. It’s helpful!

Glad to help! Also note, when I said standard error, I was referring to the standard error on a regression coefficient (instead of standard error of the regression).

Thanks!

If the F-test is significant and one of the t-tests is significant: no multicollinearity.

is the above statement correct?

Nope. Multicollinearity isn’t binary in most situations; it’s a matter of degree. A problematic level of multicollinearity is what most people mean when they reference it (but many people who reference it don’t actually know whether it’s a problem for their purposes or how to address it).

The situation you provided could arise because that particular variable has little multicollinearity with the other variables, while the other variables are highly collinear. In other words, X1 might not be related to X2 and X3, so its t-test is fine and its beta estimate is okay, but X2 and X3 might be highly collinear with each other.
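Here's a quick numpy sketch of that exact pattern (simulated data; the names are my own): X1 is unrelated to the others, so its VIF sits near 1, while X2 and X3 nearly duplicate each other and get very large VIFs.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)                 # unrelated to x2 and x3
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)      # x3 nearly duplicates x2
X = np.column_stack([x1, x2, x3])

def vif_of(i):
    """VIF of column i: regress it on the other columns."""
    y = X[:, i]
    Z = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ b
    r2 = 1 - (r ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1 / (1 - r2)

vifs = [vif_of(i) for i in range(3)]
print([round(v, 1) for v in vifs])  # x1's VIF near 1; x2's and x3's much larger
```

So X1's coefficient and t-test are fine even though the regression as a whole has a serious collinearity problem between X2 and X3.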

Also, I edited my old post on here for clarity; I left out a word. To summarize what I corrected: a significant F-test with no significant t-tests appears contradictory, but it is not actually contradictory.

But in one of the topic tests, even though only one t-test is significant, they concluded there was no multicollinearity.

A few things to note: first, the thread title says violation of assumptions. Just keep in mind that multicollinearity isn’t a violation of the regression assumptions (unless it is perfect collinearity).

To your actual question: I haven’t seen the question and answer choices, so there are some possibilities. The first is that the correct answer is the most likely or best answer choice (it isn’t always a totally accurate statement; it’s just better than the others). In the context of the CFA curriculum, they do a pretty poor job of covering multicollinearity (and some other regression topics). If there are only two independent variables, then it’s pretty unlikely that you have problematic MC if the t-test is significant or the pairwise correlation between the IVs is low. If there are more than two independent variables, they’ve possibly written a poor question (which they’ve done in the past).

It’s possible that the other two answer choices are very incorrect, and the “correct” choice is reasonable in certain cases.

Lastly, you asked if it’s necessarily true, and I’ve said it isn’t. This doesn’t necessarily contradict what they’ve said (I haven’t seen what they wrote, and again, multicollinearity is a spectrum, not black and white).

How many independent variables are in the regression?