Can anyone explain why the following expression gives the slope of a linear regression?

Cov(X,Y) / Var(X)

There's no derivation or intuitive explanation in the text, and I'd like to get a better sense of why this is the slope.
You can actually find the derivation in most, if not all, econometrics textbooks. You can refer to this link to get an idea of how it is derived: http://www.yongyoon.net/ecmetrics201/ols_derivation.pdf

In short, it involves minimizing the sum of squared errors and solving a pair of simultaneous equations. The result is:

Beta1 hat (the slope estimator) = Σ(Xi - X bar)(Yi - Y bar) / Σ(Xi - X bar)^2

Take note: Cov(X,Y) = Σ(Xi - X bar)(Yi - Y bar) / n and Var(X) = Σ(Xi - X bar)^2 / n, so when you divide one by the other the n's cancel and you get exactly the slope formula above.

Hope this helps.
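If it helps to see it numerically, here's a quick sketch in Python (made-up data, nothing from any textbook) showing that Cov(X,Y)/Var(X) matches what a least-squares fit returns:

```python
import numpy as np

# Hypothetical data just for illustration; any X, Y pair works.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = 2.0 * X + rng.normal(size=100)

Xbar, Ybar = X.mean(), Y.mean()

# Slope via the covariance formula. The 1/n in Cov and Var cancels,
# so sums of cross-products and squared deviations are enough.
slope_cov = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)

# Slope via numpy's least-squares polynomial fit, for comparison.
slope_ols = np.polyfit(X, Y, 1)[0]

print(slope_cov, slope_ols)  # agree to floating-point precision
```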
The derivation goes as follows: starting from our regression equation, write the sum of squared error terms in summation notation, then take the first-order partial derivative of that sum with respect to the slope parameter and with respect to the intercept term. Setting both equal to zero gives two equations in two unknowns. Solving for the two unknowns gives this equation for the slope parameter. Intuitively speaking: I'll leave this to someone else.
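For anyone who wants the algebra written out, here's a sketch of those steps in LaTeX (my notation, not from any particular textbook):

```latex
\begin{aligned}
&\text{Minimize } S(b_0, b_1) = \sum_i (Y_i - b_0 - b_1 X_i)^2: \\
&\frac{\partial S}{\partial b_0} = -2\sum_i (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; b_0 = \bar{Y} - b_1 \bar{X} \\
&\frac{\partial S}{\partial b_1} = -2\sum_i X_i (Y_i - b_0 - b_1 X_i) = 0 \\
&\text{Substitute } b_0 = \bar{Y} - b_1 \bar{X} \text{ into the second condition:} \\
&\textstyle\sum_i X_i (Y_i - \bar{Y}) = b_1 \sum_i X_i (X_i - \bar{X}) \\
&\text{Since } \textstyle\sum_i \bar{X}(Y_i - \bar{Y}) = 0 \text{ and }
  \sum_i \bar{X}(X_i - \bar{X}) = 0: \\
&b_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}
     = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
\end{aligned}
```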
wyantjs Wrote:
-------------------------------------------------------
> The derivation goes as follows: starting from our
> regression equation, write the sum of squared error
> terms in summation notation, then take the
> first-order partial derivative of that sum with
> respect to the slope parameter and with respect to
> the intercept term. Setting both equal to zero gives
> two equations in two unknowns. Solving for the two
> unknowns gives this equation for the slope parameter.
> Intuitively speaking: I'll leave this to someone else.

Yep, you are perfectly right, wyantjs. =) Up till now I can't figure out whether there is an intuitive way of explaining it, though my gut feeling is that there isn't. Just sheer brutal algebra (or linear algebra when you are doing it for MLR).
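On the MLR point: the same minimization in matrix form gives beta hat = (X'X)^(-1) X'y. A minimal Python sketch of that (invented data, and using a linear solve rather than an explicit inverse):

```python
import numpy as np

# Multiple regression: the normal equations X'X beta = X'y in matrix form.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n),           # intercept column
                     rng.normal(size=n),   # regressor 1
                     rng.normal(size=n)])  # regressor 2
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# Solve the normal equations directly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```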
Var(X) = Cov(X,X), therefore Slope = Cov(X,Y) / Cov(X,X). You have a common term in the numerator and denominator, namely (X - Xbar). If you cancel this term you are left with (Y - Ybar) / (X - Xbar). The regression line always runs through the point of means (Xbar, Ybar), so this is just a "rise over run" ratio (i.e., the slope of a straight line).
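The point-of-means part is easy to verify numerically, at least. A small Python sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=50)
Y = 3.0 + 1.5 * X + rng.normal(size=50)

# Slope = Cov(X,Y)/Var(X); intercept chosen so the line hits the means.
slope = np.cov(X, Y, ddof=0)[0, 1] / np.var(X)
intercept = Y.mean() - slope * X.mean()

# The fitted line evaluated at Xbar recovers Ybar exactly:
print(intercept + slope * X.mean(), Y.mean())
```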
^ That's a good try at an intuitive explanation, but it doesn't work for me. In particular, that cancellation is bogus: you can't cancel a factor inside a sum, so (X - Xbar) doesn't drop out the way it would in a plain fraction. I think the problem with an intuitive explanation is that the criterion for choosing the regression coefficients isn't intuitive. We are getting the coefficients by minimizing the sum of squared residuals. Hmm… does that seem intuitive to anyone? For instance, minimizing the sum of the absolute values of the residuals seems just as reasonable, as does minimizing the perpendicular distance of each point from the regression line. In fact, to make this intuitive you have to look at it in a pretty advanced way, in terms of sub-space projections or something.
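For what it's worth, the projection view can at least be checked numerically: the OLS residuals come out orthogonal to every column of X, which is exactly the pair of first-order conditions from the derivation above. A Python sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Least-squares fit: the fitted values X @ beta_hat are the orthogonal
# projection of y onto the column space of X.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Orthogonality check: X'e = 0 up to floating point.
print(X.T @ residuals)  # ~ [0, 0]
```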