T-test with same variances

Hello everyone, I am using Schweser notes to prepare for the exam. I have a question on LOS 11h. It gives the formula for the t-test when the two populations have the same variance. So my understanding was s1 = s2, but that is not the case. The notes say s1 and s2 are the standard deviations of sample 1 and sample 2. So they are the standard deviations of the two individual random samples, while the population variances are still assumed to be the same? Am I understanding this right?

I have been researching sample standard deviations online. It seems the sample standard deviation is an estimator computed from a random sample pulled from the whole population, so it would be different for each sample, right? Which one would you choose in the real world? In the examples in the notes it is always given.

Help and many thanks!

Sophie

I notice this question has been outstanding for a while, so I will give it a try.

It has been a little while since I read this section (and I should go back to it soon), but my understanding is that when you want to test whether there is a difference between the means of two populations, and the population variances are assumed to be equal, you use the formula you are referring to.

However, the formula uses the sample variance of the first and second samples, not the population variances. So s1 (from the first sample) is generally not the same as s2 (from the second sample), even though the population variances are assumed to be the same.

And as I understand it, you only need the sample variances and the number of observations in each sample to determine the t-statistic.

Think of it as pulling two random samples, possibly with different numbers of observations in each. Even though the population variances are the same, the sample variances will most likely be different.

For the second question: with large enough samples, the sample standard deviations should be similar to one another (never 100% identical, though), so once you determine the sample size etc., you only need to run one sample to get a good estimate of the population.
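To see this concretely, here is a small simulation (standard-library Python, with made-up population parameters) showing that sample standard deviations from the same population vary a lot for small samples and settle down as n grows:

```python
import random
import statistics

random.seed(42)

# One normal population with a known SD of 2 (a hypothetical choice).
# Draw many samples of each size and look at how spread out the
# resulting sample-SD estimates are.
def sd_estimates(n, reps=200):
    return [statistics.stdev([random.gauss(0, 2) for _ in range(n)])
            for _ in range(reps)]

spreads = {}
for n in (5, 30, 500):
    ests = sd_estimates(n)
    spreads[n] = max(ests) - min(ests)
    print(f"n={n:4d}: sample SDs range over {spreads[n]:.2f}")
```

The range of estimates shrinks sharply as n grows, which is the sense in which "large enough samples" give similar sample standard deviations.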

Someone with more knowledge on this, please feel free to add/correct my answer.

Thank you so much!

For the second question, is the criterion for “large” the 30 that has been used for the t-test with unknown population variances and the central limit theorem? If that’s the case, everything makes sense to me now. That’s the missing piece in my statistics question puzzle!

Again, thanks a lot!

Sophie

According to the book, a large enough sample size is 30, so at least for the exam that should be accurate. However, I cannot speak for the real world. I bet there are a number of detailed rules for determining an appropriate sample size.
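As a rough illustration of the n = 30 rule of thumb, this sketch (standard library only; the exponential population is a hypothetical choice) draws means of samples of size 30 from a strongly skewed population and checks that the distribution of those means is far more symmetric, as the central limit theorem predicts:

```python
import random
import statistics

random.seed(0)

def skewness(xs):
    """Simple sample skewness: third central moment over SD cubed."""
    m = statistics.mean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

# A strongly right-skewed population: exponential with mean 1
# (theoretical skewness of 2).
population_draws = [random.expovariate(1.0) for _ in range(5000)]

# Means of samples of size 30 from that same population: by the CLT
# these should look much closer to normal (skewness near 0).
sample_means = [statistics.mean(random.expovariate(1.0) for _ in range(30))
                for _ in range(2000)]

pop_skew = skewness(population_draws)
mean_skew = skewness(sample_means)
print(f"skewness of raw draws:    {pop_skew:.2f}")
print(f"skewness of means (n=30): {mean_skew:.2f}")
```

Even with a population this skewed, n = 30 already pulls the sampling distribution of the mean most of the way toward normal, which is why the rule of thumb works for the exam.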

Probably outside the scope of the exam… Outliers alone wouldn’t be a reason not to use the t-test in practice. Most stats packages use t-tests irrespective of the sample size (think of a regression output and the t-tests for the beta estimates; have you ever seen a z-stat for those? probably not). You can appeal to the central limit theorem in the case of a reasonably large sample.

From a technical perspective, though, the t-test is intended for small samples drawn from a normal distribution. Bootstrapping is a reasonable approach (and can get you CIs), but it’s far easier to use a nonparametric equivalent (the Wilcoxon rank-sum test for independent samples, for example). If the general information is the same between the nonparametric and parametric technique (the independent two-sample t-test, in this case), the outliers likely have little effect on the overall picture, and you can worry less that the assumptions of the t-test are violated.

A big problem with a small sample size is that it’s very hard to assess normality through any method: you simply don’t have enough data. Past research might be useful to get an idea of the population distribution, however, so this could be helpful in assessing the appropriateness of a given test. Overall, I wouldn’t make a blanket statement that the t-test is not correct with outliers: it might be incorrect, but it might not be…
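As a sketch of the percentile bootstrap mentioned above (the data, which include one outlier, and the helper name are invented for illustration; this is standard-library Python only):

```python
import random
import statistics

random.seed(1)

# A small sample with one outlier -- the situation where a plain
# t-interval is questionable.  (Hypothetical data.)
sample = [9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2, 14.5]

def bootstrap_ci_mean(data, reps=5000, alpha=0.05):
    """Percentile bootstrap CI for the mean: resample with replacement,
    recompute the mean each time, and take the middle (1 - alpha)
    fraction of the resampled means."""
    means = sorted(
        statistics.mean(random.choices(data, k=len(data)))
        for _ in range(reps)
    )
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

lo, hi = bootstrap_ci_mean(sample)
print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```

No normality assumption is needed here, which is why bootstrapping comes up as an alternative for small samples with outliers.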

My statement was not a blanket statement. The excerpt below has my response -

Book - Understandable Statistics by Brase & Brase, 11th Edition, Page 432, Last Paragraph -

“For example, if the sample size is small and the sample shows extreme outliers or extreme lack of symmetry, use of the Student’s t distribution is inappropriate.”

Thanks

Phil

There are some key words in there, like “extreme”, that make the statement safer, but it’s still a blanket statement. It also looks like it’s an introductory (undergraduate) text. Introductory texts don’t tend to cover the details of actually using the methods taught; rather, they go for the black and white situations to give students a foundation. Yes, it might be inappropriate from a black and white stance, without considering just how “violated” some of the assumptions are.

For example, in practice, people (who don’t have a solid stats background) often use normality hypothesis tests to decide whether they need a parametric or nonparametric technique. But these tests are extremely sensitive to nonnormality, and many parametric techniques are robust enough to handle moderate nonnormality; in other words, they take a very black and white approach to something that isn’t clear cut. These people use the cookbook approach of “not normal, therefore nonparametric” when really it isn’t the best way of doing things. A person with experience probably wouldn’t use the formal hypothesis test, because of how easily it picks up the slightest nonnormality in the data, which often has no practical consequence. However, many introductory books will tell you to use these normality tests because they’re trying to build a basic foundation, which is different from real-world practice, where you use judgment and less sensitive methods in assessing normality.

(Keep in mind, I’m not saying to disregard the issue of normality, but it’s important to recognize that statistics isn’t a cookbook, as many beginner texts can imply.)

To the original point: you can always do a nonparametric technique, but if you get the same general picture with a nonparametric technique as you get with the parametric, it would indicate that the “extreme” outliers or “extreme” asymmetry isn’t as big of a problem as you would expect.

I’ve seen all of this first hand in graduate courses and from a PhD statistician, so I’m not pulling it from thin air. Hopefully that illustrates where I’m coming from. You probably could find this on the internet with some searching, but I don’t think it’s something you really need a citation for: it’s a logical result that the parametric technique is probably okay to use when it’s giving the same information as the nonparametric one. If the parametric test were truly inappropriate, you would get pretty different results (which is why you could do both to see how it shakes out).

Please provide a reference for your proposition, e.g. the statistics book you used in your graduate courses.

Thanks

Phil

We’ll talk about a few points before I flip through some books or find online references for you. (FYI-- I’m sure those authors know the material in their book, especially from a mathematical perspective-- I’m not questioning that. However, I am saying you should consider the fact that they are teaching to an undergraduate demographic (likely non-statistics or non-mathematics majors). In these courses, the material becomes more black and white and more high level to accommodate the lack of experience and general unfamiliarity with real world use of the methods).

  1. Your initial post did not include either of the terms “extreme asymmetry” or “extreme outliers”; you merely said, “If your sample seems to have outliers, t-test is not correct.” I replied to this by saying that outliers (as you described them) alone aren’t a sufficient reason to deem the t-test inappropriate. There are several sources that state that the t-test is robust to moderate departures from normality (if you insist, I’ll provide a couple).

Quick aside: assume you have a sample of size 20 and the data are from a normal distribution. Let’s also assume that we don’t have knowledge that the data are from a normal distribution, but it is a fact. A random sample from this population would, in the long run, turn up about 1 outlier out of the 20 observations (because roughly 5% of the area under the curve is in the “suspect outlier and outlier” range). Now, we have an outlier (or heck, let’s say we got 2 outliers out of 20), and we shouldn’t use a t-test according to your rule. However, if we use our judgment to look at the data, we might see that they don’t appear to deviate from normal too much. Additionally, we could run the t-test and the nonparametric equivalent (assuming the assumptions for the NP technique are reasonable, too). If they both lead us to the same conclusion, it means we can worry less about the effect of those outliers on our inference, in which case the t-test isn’t necessarily inappropriate. Remember that one point of assessing the appropriateness of a test is to make sure our inferences are reliable and we can answer the question posed to us.
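The arithmetic in this aside is easy to check with a quick simulation (standard-library Python; “outlier” here simply means a point beyond 2 standard deviations of a known mean, an assumption made purely for illustration):

```python
import random

random.seed(7)

# How often does a sample of 20 from a *perfectly normal* population
# contain at least one point beyond 2 standard deviations?
# (Using the known mu=0, sigma=1 for simplicity.)
reps = 10000
hits = 0
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(20)]
    if any(abs(x) > 2 for x in sample):
        hits += 1

rate = hits / reps
print(f"{rate:.0%} of perfectly normal samples had an 'outlier'")
```

Roughly 60% of samples from an exactly normal population show at least one such point, so "I see an outlier, therefore no t-test" would reject the t-test for most samples where it is, in fact, perfectly valid.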

  2. After you provided a source, I did say that their use of the word “extreme” makes the statement more palatable, and it is a very different statement than the one you initially made. So, I will say again that their statement, from the book, indicates very severe departures from normality. I’m not disagreeing with their statement in a black and white sense. Furthermore, if you have a large sample with outliers, this still might not be a problem, due to the fact that many parametric statistical tests are robust (our inferences are okay even when the assumptions aren’t exactly satisfied).

  3. Let’s use a more concrete example. Imagine that we want to compare the mean exam score of students in population A to students in population B (unpaired/independent observations). We want to see which group’s mean (a measure of central tendency) is larger. Also suppose that we have reason to believe the exam scores are from normal distributions, but we examine our samples (small, say 20 in each group) and we see a couple of outliers. We go ahead and use our t-test anyway, and we reject Ho to conclude that the mean exam score of population A is larger than population B’s mean score. However, we are curious whether we really should have used the parametric t-test, because we had small samples with some outliers. Investigating further, we run a nonparametric equivalent, the Wilcoxon rank-sum test (Mann-Whitney U test) for independent samples. Essentially, this test lets us determine if population A is “shifted to the right” of population B (i.e. the central tendency of exam scores in A is larger than in B, similar to our t-test conclusion, but it’s not for the mean this time). We use our NP test and conclude that population A is right-shifted relative to B (A tends to have larger exam scores than B). This would indicate that the outliers in our sample don’t have that much influence on the result of our t-test (indicating that we had a reliable inference from the t-test). However, if we didn’t reject Ho (in the NP test) to conclude that population A values tend to be larger than population B values, then we might say that the outliers gave the t-test a questionable significance (by pulling the means in certain directions, because the outliers were “too extreme”).
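The scenario in point 3 can be sketched in code. This is a minimal standard-library version: the pooled t-statistic, plus a Wilcoxon rank-sum statistic with a normal approximation. The exam scores are invented (group A includes a couple of high outliers), and the rank computation assumes no tied values:

```python
import math
import statistics

def pooled_t(a, b):
    """Two-sample pooled-variance t-statistic."""
    n1, n2 = len(a), len(b)
    sp_sq = ((n1 - 1) * statistics.variance(a) +
             (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        sp_sq * (1 / n1 + 1 / n2))

def rank_sum_z(a, b):
    """Wilcoxon rank-sum statistic, standardized with the normal
    approximation (no tie correction; assumes all values distinct)."""
    combined = sorted(a + b)
    ranks = {v: i + 1 for i, v in enumerate(combined)}
    w = sum(ranks[v] for v in a)          # rank sum of group A
    n1, n2 = len(a), len(b)
    mu = n1 * (n1 + n2 + 1) / 2           # mean of W under Ho
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma

# Hypothetical exam scores; group A has two unusually high values
scores_a = [78, 82, 85, 79, 88, 84, 81, 99, 77, 97]
scores_b = [70, 74, 72, 68, 75, 71, 69, 73, 76, 67]

t = pooled_t(scores_a, scores_b)
z = rank_sum_z(scores_a, scores_b)
print(f"pooled t-statistic: {t:.2f}")
print(f"rank-sum z-statistic: {z:.2f}")
# Both statistics are large and positive, so both tests point to
# group A scoring higher: the outliers aren't driving the t-test.
```

If, instead, the rank-based statistic had been unremarkable while the t-statistic was significant, that disagreement would be the warning sign that the outliers were pulling the mean around.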

Think about these points, in particular point 3. It should now make sense why outliers don’t necessarily make the t-test inappropriate (as you initially stated). I think with a bit of rumination, the examples will give you the “aha!” moment.

Dear Friend,

  1. I just asked for one simple thing: please provide the reference (the name of the book).

  2. You should also read what I wrote before and after the following statement: “If your sample seems to have outliers, t-test is not correct.” The statements are interconnected.

This happens quite often nowadays, especially in the political arena. One individual says something. Another person takes a line out of it, blows it out of proportion, and derives a totally new meaning from that line.

Thanks

Phil

Dear Pal,

I tried one (even simpler) thing before I consider looking through a book or two. The thought experiment doesn’t require a big leap in logic when you remember the foundations of these methods. Again, I’ll ask you to try it and let me know what your thoughts are on that example.

  1. This is what you wrote before and after that statement: “In real world, you rarely know the mean and standard deviation of a population. E.g. - What is the mean and standard deviation of Blue Whale’s length? You almost always use sample mean and sample standard deviation. However, you do not, blindly, start to use t-test. You have to check the distribution of your sample. If your sample seems to have outliers, t-test is not correct. Bootstrap method is used when sample has outliers and you can not get a large enough sample.”

I recall saying that I don’t disagree with the authors in a technical, black and white sense, but my point was more about practical application. I also agreed that bootstrapping could be used, but I thought that a nonparametric technique would be useful in allowing you to see if the t-test (or some other parametric method) is really inappropriate due to outliers or violated assumptions (this comes from experience working with someone far more qualified than either of us). Only later did you give a reference whose wording implied a more severe situation (I hope you see the difference between an outlier and an “extreme” outlier). Furthermore, the rest of the statements in that block of text don’t change the accuracy of your statement, which is why I was able to quote directly the one sentence that mattered. This has nothing to do with blowing it out of proportion, and I don’t believe I took your statement out of context. So, I ask you for clarification: do you believe that the t-test is uniformly incorrect when you have outliers (this is the essence of your statement)?

  2. If you read my prior post, I mentioned that some of this information was disseminated to me (on several occasions) by an expert on the matter. I didn’t read it in a textbook, but it might be in one (it wouldn’t surprise me). I thought we could try the thought experiment before I decide to flip through old textbooks or spend time looking for references. I might do that now if I find some time. To be clear, so we get it right: what is it exactly that you want a reference for in this case? There are quite a few things I can find, but I want to make sure to give you the information you’re looking for.

  3. Again, I’m quite sorry you feel that I’ve misinterpreted what you meant by “If your sample seems to have outliers, t-test is not correct.” I read your words as you wrote them.