You’re confusing the concept with the formula for the calculation. The expectation of [Ra - E(Ra)][Rb - E(Rb)] is calculated as 1/n*{[Ra - E(Ra)][Rb - E(Rb)]}. So you don’t need to divide the expectation by any denominator, otherwise you’ll end up with n^2 or (n-1)^2 in the calculation.
Moreover, the denominator in the calculation represents the degrees of freedom. When calculating the variance or covariance of a population, the degrees of freedom equal the population size N. When calculating the variance or covariance of a sample, you have already “used” one degree of freedom by calculating the sample mean, which itself goes into the calculation. Therefore, for an unbiased estimate, you need to divide by n-1 (i.e. the remaining degrees of freedom: the sample size minus the one degree of freedom “used” for the sample mean).
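To make the N versus n-1 point concrete, here is a minimal sketch in Python. The function name covariance, the sample flag and the two return series are made up for illustration; they are not from any of the linked pages.

```python
from statistics import mean

def covariance(ra, rb, sample=False):
    """Covariance of two equal-length series.

    sample=False divides by N (population);
    sample=True divides by n - 1 (one degree of freedom went into the mean).
    """
    n = len(ra)
    mean_a, mean_b = mean(ra), mean(rb)
    # Sum of the cross-products of the deviations from the means.
    cross_sum = sum((a - mean_a) * (b - mean_b) for a, b in zip(ra, rb))
    return cross_sum / (n - 1 if sample else n)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
ra = [1.0, 2.0, 3.0]
rb = [2.0, 4.0, 9.0]

population_cov = covariance(ra, rb)            # cross_sum / 3
sample_cov = covariance(ra, rb, sample=True)   # cross_sum / 2
```

The sum of cross-products is the same in both calls; only the denominator changes.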
Why do we divide by N when calculating the covariance of a population? In your first paragraph you meant that probability weighted covariance does not require it. Then why is this division required by population covariance? Won’t it cause the same problem we’re trying to avoid in the first paragraph?
You divide by N, because you sum N times the cross-product of the differences, so you end up with N times the covariance (I have not written the formula entirely correctly above - it should read 1/N * sum from 1 to N of the cross products of the differences).
What I meant above is that E{[Ra - E(Ra)][Rb - E(Rb)]} = 1/N * sum from 1 to N of [Ra - E(Ra)][Rb - E(Rb)]. You are asking why not divide E{[Ra - E(Ra)][Rb - E(Rb)]} by N (or n-1). Simply, because then the calculation will be equal to 1/N^2 * sum from 1 to N of [Ra - E(Ra)][Rb - E(Rb)], as you are in practice dividing twice by N.
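A quick way to see the "dividing twice" problem, as a sketch with made-up numbers (the helper name expectation and the data are assumptions for illustration, and every outcome is taken to be equally likely):

```python
def expectation(values):
    """Equal-weight expectation: the 1/N factor is already inside."""
    return sum(values) / len(values)

# Hypothetical population of three equally likely outcomes.
ra = [1.0, 2.0, 3.0]
rb = [2.0, 4.0, 9.0]
n = len(ra)

e_a, e_b = expectation(ra), expectation(rb)
cross = [(a - e_a) * (b - e_b) for a, b in zip(ra, rb)]

cov = expectation(cross)   # sum(cross) / N, this is the covariance
divided_twice = cov / n    # sum(cross) / N**2, too small by a factor of N
```

Taking the expectation already performs the division by N, so dividing the result by N again shrinks it by another factor of N.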
The links you posted do not contradict each other. When there’s a probability weight, you apply the respective probability to each cross-product. In the other case the “probability” is the same for each scenario (1/N), so you divide the entire sum of cross-products by N (or by n-1 for a sample).
Isn’t all covariance calculated as E{[Ra - E(Ra)][Rb - E(Rb)]}?
Why is it that some of them have N, n-1 or no denominator? I understand what you mean in the second paragraph you wrote, but I don’t understand what got us to the second paragraph.
It was a sample, so we divide by n-1. But under what scenario do we divide by 1 (i.e. no denominator)? If it were a population, we would be dividing by N. But the no denominator part makes no sense.
Then, the other link has a fancy matrix that doesn’t divide by anything. I am having a hard time making the distinction of when to divide or not.
When you divide by N (or n-1) you are basically giving each cross-product the same weight (1/N, or 1/(n-1)) - e.g. if the population is 3, you can divide each cross-product by 3 (or multiply it by 1/3, about 0.33) and sum them (or sum them first and then divide by 3). If you have a probability-weighted calculation, you multiply each cross-product by its own probability, and the probabilities already sum to 1. You cannot then divide the whole sum by a single number, as each cross-product has to be weighted differently.
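The same idea as a small Python sketch. The function name weighted_covariance, the data and the probabilities are all hypothetical; the only assumption is that the probabilities sum to 1.

```python
def weighted_covariance(ra, rb, probs):
    """Probability-weighted covariance; probs are assumed to sum to 1."""
    # Probability-weighted expected values of each series.
    e_a = sum(p * a for p, a in zip(probs, ra))
    e_b = sum(p * b for p, b in zip(probs, rb))
    # Each cross-product gets its own weight; there is no overall denominator.
    return sum(p * (a - e_a) * (b - e_b) for p, a, b in zip(probs, ra, rb))

# Hypothetical scenarios.
ra = [1.0, 2.0, 3.0]
rb = [2.0, 4.0, 9.0]

# Equal probabilities of 1/3 each: same result as summing and dividing by N = 3.
equal_cov = weighted_covariance(ra, rb, [1 / 3, 1 / 3, 1 / 3])

# Unequal probabilities: each scenario is weighted differently.
unequal_cov = weighted_covariance(ra, rb, [0.5, 0.25, 0.25])
```

With equal probabilities this collapses to the "divide by N" population formula, which is why the version with no visible denominator and the version with N are the same calculation written two ways.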