Could anyone please explain the definition of NUMBER OF DEGREES OF FREEDOM?
For me, this definition is so hard to understand.
Suppose that you have a sample of 500 giraffes and you calculate the average of their heights to be 4.7 meters.
Having calculated their mean height, you now have a sample with only 499 degrees of freedom.
“Why?”, you ask?
Because if you get another sample of 500 giraffes, only 499 of their heights can vary independently, given that the average has to be 4.7 meters. The calculation of the average has imposed a restriction on the data set, using up one degree of freedom.
If you had also calculated the (sample) standard deviation to be 0.4 meters, then you would have used up another degree of freedom, leaving only 498 degrees of freedom.
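To make the "only 499 can vary" idea concrete, here is a rough Python sketch. The heights are made-up random numbers, and the function name last_value_given_mean is just something I picked for illustration. It draws 499 heights freely and shows that the 500th is then forced if the average has to be 4.7 meters.

```python
import random


def last_value_given_mean(free_values, target_mean, n):
    """Return the single remaining value forced by the mean constraint.

    If n values must average target_mean, their total must be
    n * target_mean, so the last value is whatever is left over.
    """
    return n * target_mean - sum(free_values)


random.seed(0)          # fixed seed so the run is repeatable
n = 500
target_mean = 4.7       # meters, as in the giraffe example

# 499 heights chosen freely (hypothetical values, not real data)
free_heights = [random.gauss(4.7, 0.4) for _ in range(n - 1)]

# The 500th height is not free: the mean constraint determines it
last_height = last_value_given_mean(free_heights, target_mean, n)

heights = free_heights + [last_height]
sample_mean = sum(heights) / n

print("forced last height:", round(last_height, 3))
print("sample mean:", round(sample_mean, 6))
```

Whatever the first 499 values are, the last one is forced to make the total come out right. That is the sense in which fixing the mean uses up one degree of freedom.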
i. Could you please explain why the denominator is equal to n-1 in the sample variance equation, while the denominator is equal to n in the population variance equation?
ii. Using n-1 rather than n is said to give a more accurate estimate of the population variance. Could you please explain why?
If you divide by n - 1, then the expected value of the sample variance is the population variance; if you divide by n, then the expected value of the sample variance is smaller than the population variance.
You lose one degree of freedom because you’re measuring deviations from the sample mean, which is itself estimated from the data, rather than from the (unknown) population mean.
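If you want to check the n versus n - 1 claim for yourself, a small simulation along these lines should do it. This is only a sketch under my own assumptions: a normal population with standard deviation 2 (so the true variance is 4), samples of size 5, and 20000 repetitions. None of those numbers come from the question.

```python
import random


def sum_of_squared_deviations(sample):
    """Sum of squared deviations of the sample from its own mean."""
    sample_mean = sum(sample) / len(sample)
    return sum((x - sample_mean) ** 2 for x in sample)


random.seed(42)         # fixed seed so the run is repeatable
POP_SD = 2.0            # assumed population sd, so true variance is 4.0
N = 5                   # assumed sample size
TRIALS = 20000          # assumed number of repeated samples

total_div_n = 0.0
total_div_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0.0, POP_SD) for _ in range(N)]
    ss = sum_of_squared_deviations(sample)
    total_div_n += ss / N                # divide by n
    total_div_n_minus_1 += ss / (N - 1)  # divide by n - 1

avg_biased = total_div_n / TRIALS
avg_unbiased = total_div_n_minus_1 / TRIALS

print("true variance:", POP_SD ** 2)
print("average when dividing by n:", round(avg_biased, 3))
print("average when dividing by n - 1:", round(avg_unbiased, 3))
```

With samples of size 5, dividing by n should average out near (n - 1)/n times the true variance, i.e. about 3.2 here, while dividing by n - 1 should average out near 4.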
If you remember simultaneous equations, you need the same number of equations as unknown variables to calculate the values.
E.g. 2x = 10 (one variable, one equation).
x + y = 10
x - y = 2 (two equations, two variables).
Now assume we are estimating the variables from samples, so we are not getting precise answers.
If I am estimating one value, i.e. the mean, the very minimum I need is one data item. Everything else is extra, giving me more data for a better estimate: more degrees of freedom.
Note that in the tables, as the dof increases, the degree of uncertainty in our analysis declines.
Maybe not the correct statistical explanation (I have no idea) but this is how I explain it to myself.
Here is my understanding.
-) First, look at the population data: the observations of the population are independent.
-) Second, look at the SAMPLE data (i)
+) When calculating the sample mean to estimate the population mean, all n observations are independent, or free to choose. The divisor is n.
+) When calculating the sample variance to estimate the population variance, only (n-1) observations are independent once the sample mean is fixed. The divisor is (n-1) → to make it consistent with (i)