I came across a quant question which wanted us to solve for the correlation between the actual and predicted Y values of a regression. First they solved for R², and then they took its square root to get the answer for the correlation between the actual and predicted Y values. I'm confused by this logic. I understand what R² is: the percentage of variation in the dependent variable explained by the independent variables. But why take the square root to get to correlation? Looking to understand the logic behind this…
R² is the percentage of variation in the dependent variable explained by the independent variables, and for a linear regression (with an intercept) the squared correlation between the actual and predicted values equals R²: r² = R², so r = √R².
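A quick sketch in plain Python to see the identity numerically. The data points here are made up for illustration; the code fits a one-variable OLS line, computes R² from the sums of squares, and checks that it matches the squared correlation between actual and fitted values:

```python
# Sketch: for OLS with an intercept, corr(y, y_hat)^2 == R^2.
# Illustrative made-up data, not from any exam question.
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST."""
    my = mean(y)
    sse = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst

def correlation(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
a, b = ols_fit(x, y)
y_hat = [a + b * xi for xi in x]

r = correlation(y, y_hat)
# The squared correlation and R^2 agree to floating-point precision.
assert abs(r ** 2 - r_squared(y, y_hat)) < 1e-9
```

So given R² from a regression output, √R² is the correlation between the actual and fitted Y values (for a positive-slope simple regression it also equals the correlation between X and Y).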
Follow-up question: I'm going over the concept of what happens to the reliability of the test if you remove an independent variable. The effect is that the regression coefficients and the error term become unreliable. Can you explain this? If we remove one of the two independent variables and the one we removed is correlated with the other one, does this cause an issue because the error term has to reflect that correlation?
OR is it an issue if the removed independent variable is correlated with the dependent variable?
If you remove an independent variable that explains the dependent variable, its effect gets absorbed into the error term. And if the omitted variable is also correlated with the remaining independent variable, the error term is then correlated with that regressor, so its coefficient estimate picks up part of the omitted variable's effect and becomes biased (omitted variable bias). Not good.
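A small simulation makes the bias concrete. This is an illustrative sketch with made-up parameters: the true model is y = 1 + 2·x1 + 3·x2 + noise, with x2 correlated with x1. Regressing y on x1 alone, the slope estimate lands well above the true direct effect of 2, because x1 is soaking up x2's contribution:

```python
# Sketch of omitted variable bias (made-up simulated data).
import random
from statistics import mean

random.seed(42)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
# x2 is correlated with x1 (x2 = 0.8*x1 + small noise)
x2 = [0.8 * a + random.gauss(0, 0.3) for a in x1]
# True model: y = 1 + 2*x1 + 3*x2 + noise
y = [1 + 2 * a + 3 * b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

def slope(x, y):
    """OLS slope of a one-variable regression."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Omit x2 and regress y on x1 alone: the estimate is roughly
# 2 + 3*0.8 = 4.4, far from the true direct effect of 2.
b_short = slope(x1, y)
```

Nothing in the code is broken statistically; the bias comes entirely from the omitted variable being correlated with both y and the remaining regressor. If the omitted variable were uncorrelated with x1, the slope on x1 would stay unbiased (you'd only lose precision).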
If two independent variables are correlated, then you have a multicollinearity problem. You'll have to remove one of the independent variables because they are essentially supplying the same info. Removing one will probably not drastically reduce the R-squared.
If some independent variables are correlated you might have the problem of multicollinearity, it’s not definite. According to the curriculum (and therefore for the scope of the exam), they like to tell you to drop one of the correlated independent variables, but in real life, it’s more complex than the book makes it seem.
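To see the "same info" point numerically, here's a sketch with made-up data where x2 is nearly a copy of x1. It fits the two-variable model via the normal equations (on centered data, so the intercept drops out) and the one-variable model, and compares R² with and without the redundant regressor:

```python
# Sketch: with near-collinear regressors, dropping one barely moves R^2.
# Illustrative simulated data, not from the curriculum.
import random
from statistics import mean

random.seed(0)
n = 400
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]   # nearly collinear with x1
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def center(v):
    m = mean(v)
    return [vi - m for vi in v]

def r2_two(x1, x2, y):
    """R^2 of y ~ x1 + x2, solved via 2x2 normal equations on centered data."""
    x1c, x2c, yc = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1c)
    s22 = sum(a * a for a in x2c)
    s12 = sum(a * b for a, b in zip(x1c, x2c))
    s1y = sum(a * b for a, b in zip(x1c, yc))
    s2y = sum(a * b for a, b in zip(x2c, yc))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    y_hat = [b1 * a + b2 * b for a, b in zip(x1c, x2c)]
    sse = sum((c - f) ** 2 for c, f in zip(yc, y_hat))
    sst = sum(c * c for c in yc)
    return 1 - sse / sst

def r2_one(x, y):
    """R^2 of y ~ x."""
    xc, yc = center(x), center(y)
    b = sum(a * c for a, c in zip(xc, yc)) / sum(a * a for a in xc)
    sse = sum((c - b * a) ** 2 for a, c in zip(xc, yc))
    sst = sum(c * c for c in yc)
    return 1 - sse / sst

r2_full = r2_two(x1, x2, y)   # both regressors
r2_x1 = r2_one(x1, y)         # drop the near-duplicate x2
# The two R^2 values are nearly identical: x2 adds almost no new info.
assert r2_full - r2_x1 < 0.05
```

The fit barely changes, which is exactly why the curriculum's quick fix is to drop one of the correlated variables. In real work you'd look at standard errors, variance inflation factors, or respecify the model, since the individual coefficients under multicollinearity are unstable even though the overall fit looks fine.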
For the exam, follow the curriculum. If you plan to use any of this in real life, you’d be better served to pick up an actual statistics book.