You only want to buy the machine if you have “reasonable assurance” that the machine can find success at least 1.0% of the time.
You run the machine through 10,000 trials and find that it has 120 successes, or a 1.2% success rate. Of course it’s possible this is a fluke. Can we be confident the machine meets the required specs? (allow the typical 5% window for a type II error)
Depending on your perspective (or whoever is asking the question), you could use either a Bayesian technique or a Frequentist technique to answer this.
The Frequentist technique would only let you reject or fail to reject the null hypothesis that the success rate is less than or equal to 1% against an alternative hypothesis that the success rate is greater than 1%. This won't have an associated probability telling you the chance that you're right or wrong (i.e., it can't tell you there is a 0.95 probability of being correct after rejecting H0).
A Bayesian technique would allow you to say "there is a 0.95 probability that we've correctly rejected the null" (more or less).
It sounds like you’re trying to make the statement that a Bayesian technique would allow.
Your sample variance is n*p*(1-p) = (10,000)(0.012)(0.988) = 118.56.
Your sample standard deviation is sqrt(variance) = sqrt(118.56) = 10.88.
You have a high sample size, so you can use the z-test. You want 95% confidence that the population mean is above 1.0%.
Since you are finding whether or not your machine's success rate is greater than 1.0%, you use a one-sided z-test (i.e., you are using all your statistical ammunition on one side). So, we are going to use a one-sided z-test with 95% confidence. Find the z-score on the normal table that lines up with 0.05 probability; it turns out to be -1.645 (which you should probably memorize; it's the same critical value as a two-sided test at the 10% level).
Your z-score is (100 - 120)/10.88 = -1.838
Since -1.838 is outside our 95% confidence range, we can say with 95% confidence that the machine exceeds 1% effectiveness.
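For anyone who wants to check the arithmetic, here's a minimal Python sketch (variable names are mine) that reproduces the numbers above, including the one-sided critical value:

```python
from math import sqrt
from scipy.stats import norm

n = 10_000                     # trials
successes = 120                # observed successes
p_hat = successes / n          # 0.012 observed success rate

variance = n * p_hat * (1 - p_hat)   # n*p*(1-p) = 118.56
sd = sqrt(variance)                  # ~10.89

z_crit = norm.ppf(0.05)              # one-sided 5% critical value, ~ -1.645

# z-score exactly as computed in the post above: (100 - 120) / sd
z = (n * 0.01 - successes) / sd      # ~ -1.84

print(variance, sd, z_crit, z)
```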
Bayesian techniques get hard with a sample size that large. You would be looking at a combinatorial term with 10,000 in it and 120 terms to multiply and divide by, which would be pretty unmanageable in a test environment. Even with a computer, that's a pretty spicy meatball.
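For what it's worth, if you're willing to assume a conjugate Beta prior (a flat Beta(1,1) here, which is my assumption, not something given in the question), the posterior is closed form and a computer handles it without grinding through the combinatorics directly:

```python
from scipy.stats import beta

n, successes = 10_000, 120

# Posterior under an assumed flat Beta(1, 1) prior: Beta(1 + successes, 1 + failures)
posterior = beta(1 + successes, 1 + (n - successes))

# Posterior probability that the true success rate exceeds 1%
print(posterior.sf(0.01))
```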
You accidentally used the variance to compute the z-score, and the order should be reversed. The formula is:
(sample mean - mu)/SD = (120 - 100)/10.88
2) Question:
You use the sample standard deviation to compute the z-score. Shouldn't we use the population (null) figure here, i.e., variance = 10,000 * 0.01 * (1 - 0.01) = 99, so SD = sqrt(99) = 9.95?
The question that this type of test asks is: given the observations we made in the sample (namely, that the sample mean of 120 is not exactly the same as the theoretical 100), is it possible that the difference is due to sampling error and not significant? Now, if you have a sample with a very large variance, you might end up failing to conclude that your mean is significantly different, just because your large variance is making the z-score very small.
Darn it… Stupid multi-tasking. I have updated the equation above to reflect the right answer (though I am pretty sure it should be the test value minus the sample mean, not the other way around).
The way I read the question is that we do not know the true population mean or variance and are trying to test whether or not the true population mean is above 1% with a confidence of 95%. I guess you could turn it around and say something like: given a known population mean of 100, what is the probability that a sample result will be 120 or greater? In that case, yes, you would use the population mean and reverse the sample and population means.
I thought about it some more and I think what we actually need to do here is to compute the standard error. We want to know whether the sample mean that we observed in our sample (120) is significantly different (never mind the actual wording in the question) from the population mean. For this type of test, we need to look at the sampling distribution (not the sample distribution). We want to know, given the distribution of possible sample means, how likely it is to observe a mean that is this far away from the population mean (which, by the way, brings us back to S2000's original hint).
Imagine we run these 10,000-trial samples a number of times, say 30 times; then these 30 means should be normally distributed around the population mean (100 in our case). Now the question we ask is: how likely is it to observe a mean of 120 given this normal distribution?
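A quick simulation (my own sketch, assuming the true success rate really is 1%) makes the sampling-distribution idea concrete: run the 10,000-trial experiment many times and look at the spread of the resulting success counts.

```python
import numpy as np

rng = np.random.default_rng(42)

n_trials = 10_000        # Bernoulli trials per experiment
true_p = 0.01            # assumed population success rate
n_experiments = 10_000   # repeat the whole 10,000-trial experiment many times

# number of successes in each repetition
counts = rng.binomial(n_trials, true_p, size=n_experiments)

print(counts.mean())            # close to 100
print(counts.std())             # close to sqrt(10,000 * 0.01 * 0.99) ~ 9.95
print((counts >= 120).mean())   # share of experiments with 120 or more successes
```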
Interesting approach. Frankly, I think we would need to see the actual question verbiage to determine what they are looking for. The question statement above seemed to indicate to me that we were looking for reasonable assurance that what the machine actually produces on a consistent basis (i.e., the population mean) is above 1%, which makes me believe we are testing the population mean, not the distribution of possible sample means.
I think you're going to need a standard error. You're trying to make a statement regarding the true success rate (really a proportion, but you could say mean successes per 100/1,000/10,000). In making statements about parameters, you need to look to the sampling distribution. They aren't asking you to describe the sample at hand, which would only require the sample standard deviation.
The original post also says allow 5% for a Type II error, so this doesn’t indicate a 95% level of confidence. It indicates a power of 95%…unless the original post should have said 5% for a Type I error…
As for calculating the test statistic, you subtract the null value from your sample value (x-bar minus mu; p-hat minus p0; so on and so forth). This should make intuitive sense with the question: how likely are you to see something at least as extreme as 1.2%, given that the true rate is 1%? The distribution is centered at 1%, and 1.2% is to the right of center (larger). We should have a positive test statistic for this, since we're seeing how far above the center 1.2% lies and we want to know the probability of getting something at least that far above the center.
This, in itself, should direct you to use a standard error. You’ve essentially defined the p-value in the context of this problem.
Generically, a p-value is the probability of obtaining a test result at least as extreme as the current one, assuming the null is true. (This is also what S2000 was leading towards, if I’m not mistaken.)
You don't need to rephrase the question to "reverse the sample and population" values. The test statistic numerator is calculated as (observation - expectation) = (value estimated from sample - assumed parameter value) = (0.012 - 0.010).
Edited to maybe leave something open for discussion…
Assuming you meant 5% for a Type I error, here’s some output from a Stat package…what do you think?
Hypothesis Test - One Proportion

Sample Size: 10000
Successes: 120
Proportion: 0.01200

Null Hypothesis: P = 0.01
Alternative Hyp: P > 0.01

Difference: 0.00200
Standard Error: 0.00109
Z (uncorrected): 2.01, P 0.0222
Z (corrected): 1.96, P 0.0250

Method / 95% Confidence Interval:
Simple Asymptotic: (0.00987, 0.01413)
Simple Asymptotic with CC: (0.00982, 0.01418)
Wilson Score: (0.01005, 0.01433)
Wilson Score with CC: (0.01000, 0.01438)

Notes on C.I.:
1) CC means continuity correction.
2) Wilson Score method with CC is the preferred method, particularly for small samples or for proportions close to 0 or 1.
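If anyone wants to reproduce the uncorrected and corrected Z values above without the stat package, here's a short sketch; the only assumption is the usual null-based standard error sqrt(p0*(1-p0)/n):

```python
from math import sqrt
from scipy.stats import norm

n, successes = 10_000, 120
p_hat = successes / n            # 0.012
p0 = 0.01                        # null proportion

se = sqrt(p0 * (1 - p0) / n)     # standard error under H0, ~0.00109

z = (p_hat - p0) / se            # uncorrected, ~2.01
p_value = norm.sf(z)             # one-sided P(> 0.01), ~0.022

z_cc = (p_hat - p0 - 1 / (2 * n)) / se   # with continuity correction, ~1.96
p_value_cc = norm.sf(z_cc)               # ~0.025

print(z, p_value, z_cc, p_value_cc)
```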
I thought about this thing some more (it really sent me down the rabbit hole) and I retract my conjecture about the necessity of using the standard error; I believe JSD's interpretation is the right one. I don't think we should look at the 10,000 simulations as part of a sample, but rather as the number of trials of the entire binomial distribution. That is, we have a binomial distribution with two parameters, namely n and p. The original two-choice (success vs. failure) situation is nothing other than a Bernoulli trial with two possible outcomes. It only becomes a binomial distribution by introducing the number of observations.
That being said, we are given a distribution with mean = n*p = 10,000 * 0.012 = 120 and variance = n*p*(1-p) = 10,000 * 0.012 * 0.988 = 118.56, or sd = 10.89 (as JSD already computed), where we use the p that we observe in the process because this is the relevant distribution. Now, given this information, the question we ask is (and this is the one S2000 already asked from the beginning, though this time regarding the 1%): what is the probability of observing an outcome of (at least) p = 0.01 (100 out of 10,000) or larger if the mean is p = 0.012?
Answer: This boils down to simply computing the Z-score (again as JSD already did above):
Z = (x - mu)/sd = (100 - 120)/10.89 = -1.84
If we look this up in the z-table, we find the probability of this value (or lower) is 0.0329, or 3.29%. Now, since we are interested in the probability of being above 1%, we compute the complement, i.e., 1 - 0.0329 = 0.9671, or 96.71%. Thus we know that in 96.71% of all cases we have a success rate of 1% or above. What do you guys think?
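In code, that line of reasoning looks like this (just reproducing the arithmetic above, which uses the observed p = 0.012 for both the center and the spread):

```python
from math import sqrt
from scipy.stats import norm

n = 10_000
p_obs = 0.012

mean = n * p_obs                        # 120
sd = sqrt(n * p_obs * (1 - p_obs))      # ~10.89

z = (100 - mean) / sd                   # ~ -1.84
prob_below_100 = norm.cdf(z)            # ~0.033
print(z, prob_below_100, 1 - prob_below_100)   # last value ~0.967
```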
I certainly do not discount what tickersu is saying, and he definitely seems to know what he is talking about. However, I just can't replicate the results he seems to be getting. It appears he is using the test against a single mean set forth in Section 3.1 of Reading 11. When I try to calculate that test, I get a very large t-value, which appears to be a consequence of the very large sample size.
Question to mlwl8521 - Can you let us know where the question came from and how the authors took it on?
I’m conducting a test for a single proportion, not for the mean number of outcomes. I don’t believe the CFA curriculum explicitly covers hypothesis testing (or CIs) for proportions.
The way I read this question was that a sample of 10,000 Bernoulli trials was taken. This would be a binomial sample of n = 1 (since a binomial random variable is the sum of a bunch of Bernoulli trials). In other words, I viewed it as a single observation of a binomial random variable where the number of Bernoulli trials in each binomial observation is 10,000, and we observed 120 successes (this is one of many possible values in the binomial distribution for the variable, but it's all we sampled).
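Under that reading, you can also skip the normal approximation entirely and compute the exact binomial tail under the null (a sketch, assuming the null rate of 1%):

```python
from scipy.stats import binom

n, successes, p0 = 10_000, 120, 0.01

# P(X >= 120) when X ~ Binomial(10,000, 0.01); sf(k) returns P(X > k)
p_value_exact = binom.sf(successes - 1, n, p0)
print(p_value_exact)
```

That should land in the same neighborhood as the z-test p-values quoted earlier in the thread.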