I came across a quant question which wanted us to solve for the correlation between the actual and predicted Y values of a regression. First they solved for R², and then they took its square root to get the answer for the correlation between the actual and predicted Y values. I'm confused by this logic. I understand what R² is: the percentage of variation in the dependent variable explained by the independent variables. But why take the square root to get to correlation? Looking to understand the logic behind this…
R² is the percentage of variation in the dependent variable explained by the independent variables, and for a linear regression (with an intercept) the squared correlation between the actual and predicted values equals R²: r² = R², so r = √R².
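A quick sketch in plain Python to see the identity numerically. The data points here are made up for illustration; the code fits a one-variable OLS line, computes R² from the sums of squares, and checks that it matches the squared correlation between actual and fitted values:

```python
# Sketch: for OLS with an intercept, corr(y, y_hat)^2 == R^2.
# Illustrative made-up data, not from any exam question.
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST."""
    my = mean(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst

def correlation(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
a, b = ols_fit(x, y)
y_hat = [a + b * xi for xi in x]

r = correlation(y, y_hat)
# The squared correlation and R^2 agree to floating-point precision.
assert abs(r ** 2 - r_squared(y, y_hat)) < 1e-9
```

So given R² from a regression output, √R² is the correlation between the actual and fitted Y values (for a positive-slope simple regression it also equals the correlation between X and Y).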
Follow-up question: I'm going over the concept of what happens to the reliability of the test if you remove an independent variable. The effect is that the regression coefficients and the error term become unreliable. Can you explain this? If we remove one of the two independent variables and the one we removed is correlated with the other one, does this cause an issue because the error term has to reflect that correlation?
OR is it an issue if the removed independent variable is correlated with the dependent variable?
If you remove an independent variable that explains the dependent variable, its effect gets absorbed into the error term. And if the omitted variable is also correlated with the remaining independent variable, the error term is then correlated with that regressor, so its coefficient estimate picks up part of the omitted variable's effect and becomes biased (omitted variable bias). Not good.
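A small simulation makes the bias concrete. This is an illustrative sketch with made-up parameters: the true model is y = 1 + 2·x1 + 3·x2 + noise, with x2 correlated with x1. Regressing y on x1 alone, the slope estimate lands well above the true direct effect of 2, because x1 is soaking up x2's contribution:

```python
# Sketch of omitted variable bias (made-up simulated data).
import random
from statistics import mean

random.seed(42)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is correlated with x1 (x2 = 0.8*x1 + small noise)
x2 = [0.8 * a + random.gauss(0, 0.3) for a in x1]
# True model: y = 1 + 2*x1 + 3*x2 + noise
y = [1 + 2 * a + 3 * b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

def slope(x, y):
    """OLS slope of a one-variable regression."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Omit x2 and regress y on x1 alone: the estimate is roughly
# 2 + 3*0.8 = 4.4, far from the true direct effect of 2.
b_short = slope(x1, y)
```

Nothing in the code is broken statistically; the bias comes entirely from the omitted variable being correlated with both y and the remaining regressor. If the omitted variable were uncorrelated with x1, the slope on x1 would stay unbiased (you'd only lose precision).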
If two independent variables are correlated, then you have a multicollinearity problem. You'll have to remove one of the independent variables because they are essentially supplying the same info. Removing one will probably not drastically reduce the R-squared.
If some independent variables are correlated you might have the problem of multicollinearity, it’s not definite. According to the curriculum (and therefore for the scope of the exam), they like to tell you to drop one of the correlated independent variables, but in real life, it’s more complex than the book makes it seem.
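To see the "same info" point numerically, here's a sketch with made-up data where x2 is nearly a copy of x1. It fits the two-variable model via the normal equations (on centered data, so the intercept drops out) and the one-variable model, and compares R² with and without the redundant regressor:

```python
# Sketch: with near-collinear regressors, dropping one barely moves R^2.
# Illustrative simulated data, not from the curriculum.
import random
from statistics import mean

random.seed(0)
n = 400
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]   # nearly collinear with x1
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def center(v):
    m = mean(v)
    return [vi - m for vi in v]

def r2_two(x1, x2, y):
    """R^2 of y ~ x1 + x2, solved via 2x2 normal equations on centered data."""
    x1c, x2c, yc = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1c)
    s22 = sum(a * a for a in x2c)
    s12 = sum(a * b for a, b in zip(x1c, x2c))
    s1y = sum(a * b for a, b in zip(x1c, yc))
    s2y = sum(a * b for a, b in zip(x2c, yc))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    y_hat = [b1 * a + b2 * b for a, b in zip(x1c, x2c)]
    sse = sum((c - f) ** 2 for c, f in zip(yc, y_hat))
    sst = sum(c * c for c in yc)
    return 1 - sse / sst

def r2_one(x, y):
    """R^2 of y ~ x."""
    xc, yc = center(x), center(y)
    b = sum(a * c for a, c in zip(xc, yc)) / sum(a * a for a in xc)
    sse = sum((c - b * a) ** 2 for a, c in zip(xc, yc))
    sst = sum(c * c for c in yc)
    return 1 - sse / sst

r2_full = r2_two(x1, x2, y)   # both regressors
r2_x1 = r2_one(x1, y)         # drop the near-duplicate x2
# The two R^2 values are nearly identical: x2 adds almost no new info.
assert r2_full - r2_x1 < 0.05
```

The fit barely changes, which is exactly why the curriculum's quick fix is to drop one of the correlated variables. In real work you'd look at standard errors, variance inflation factors, or respecify the model, since the individual coefficients under multicollinearity are unstable even though the overall fit looks fine.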
For the exam, follow the curriculum. If you plan to use any of this in real life, you’d be better served to pick up an actual statistics book.