Power for one-sample test
If we have a sample of size n and we reject the one-sample null hypothesis that μ = μ0, then the power of the one-tailed t-test is equal to 1 − β where

β = F(tcrit)

Here tcrit is the critical value of the test, namely tcrit = T.INV(1 − α, df) in Excel, and F(x) is the cumulative distribution function of the noncentral t distribution T(df, δ). The noncentrality parameter takes the value δ = d√n where d is Cohen's effect size

d = (μ − μ0)/σ

and μ and σ are the population mean and standard deviation.

If the test is a two-tailed test then

β = F(tcrit) − F(−tcrit)

where now tcrit = T.INV(1 − α/2, df).
Note that the degrees of freedom is df = n − 1.
Example 1: Calculate the power for a one-sample, two-tailed t-test with null hypothesis H0: μ = 5 to detect an effect of size d = .4 using a sample of size n = 20.
The result is shown in Figure 1.
Figure 1 – Power of a one-sample t-test
Here we used the Real Statistics function NT_DIST. The Real Statistics Resource Pack also supplies the following function to calculate the power of a one-sample t-test.
Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:
T1_POWER(d, n, tails, α, iter, prec) = the power of a one-sample t-test when d = Cohen’s effect size, n = the sample size, tails = # of tails: 1 or 2 (default), α = alpha (default = .05), iter = the maximum number of terms from the infinite sum (default 1000) and prec = the maximum amount of error acceptable in the estimate of the infinite sum unless the iteration limit is reached first (default = 0.000000000001).
For Example 1, T1_POWER(.4, 20) = 0.396994. Note that the power of the one-tailed test yields the value T1_POWER(.4, 20, 1) = 0.531814, which as expected is higher than the power of the two-tailed test.
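T1_POWER is specific to the Real Statistics Resource Pack. For readers working outside Excel, the same calculation can be sketched with any noncentral t implementation; here is a minimal Python version using scipy (the function name t1_power is mine, not part of any library):

```python
from math import sqrt
from scipy.stats import t, nct

def t1_power(d, n, tails=2, a=0.05):
    """Power of a one-sample (or paired) t-test.
    d = Cohen's effect size, n = sample size, tails = 1 or 2, a = alpha."""
    df = n - 1
    delta = d * sqrt(n)                      # noncentrality parameter
    if tails == 1:
        tcrit = t.ppf(1 - a, df)             # one-tailed critical value
        return 1 - nct.cdf(tcrit, df, delta)
    tcrit = t.ppf(1 - a / 2, df)             # two-tailed critical value
    return (1 - nct.cdf(tcrit, df, delta)) + nct.cdf(-tcrit, df, delta)
```

For Example 1, t1_power(.4, 20) gives approximately 0.3970 and t1_power(.4, 20, 1) approximately 0.5318, in line with the T1_POWER values quoted above.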
Power for paired-sample test
The paired sample test is identical to the one-sample t-test on the difference between the pairs. If the two random variables are x1, with mean μ1, and x2, with mean μ2, and the standard deviation of x1 − x2 is σ, then power is calculated as in the one-sample case where the noncentrality parameter takes the value δ = d√n (with n = the number of pairs) and d is Cohen’s effect size:

d = (μ1 − μ2)/σ
Example 2: Calculate the power for a paired sample, two-tailed t-test to detect an effect of size d = .4 using a sample of size n = 20.
The answer is the same as that for Example 1, namely 39.7%.
Example 3: Calculate the power for a paired sample, two-tailed t-test where we have two samples of size 20, and we know that the mean and standard deviation of the first sample are 10 and 8, the mean and standard deviation of the second sample are 15 and 3, and the correlation coefficient between the two samples is .6.
The power is 89% as shown in Figure 2.
Figure 2 – Power of a paired sample t-test
Based on the definition of correlation and Property 6b of Correlation Basic Concepts,

σ² = σ1² + σ2² − 2ρ·σ1·σ2

where σ1, σ2 are the standard deviations of the two samples and ρ is their correlation. For Example 3, this means that

σ = √(8² + 3² − 2(.6)(8)(3)) = √44.2 = 6.648

We can now calculate the effect size d as follows:

d = |μ1 − μ2|/σ = |10 − 15|/6.648 = .752
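The arithmetic for Example 3 can be checked end-to-end; here is a short sketch in Python, using scipy for the noncentral t distribution (variable names are mine):

```python
from math import sqrt
from scipy.stats import t, nct

mu1, sd1 = 10, 8      # first sample
mu2, sd2 = 15, 3      # second sample
rho, n = 0.6, 20      # correlation and number of pairs

# SD of the paired differences: sigma^2 = sd1^2 + sd2^2 - 2*rho*sd1*sd2
sigma = sqrt(sd1**2 + sd2**2 - 2 * rho * sd1 * sd2)   # ≈ 6.648
d = abs(mu1 - mu2) / sigma                            # ≈ 0.752

# power of the two-tailed paired t-test via the noncentral t distribution
df, delta = n - 1, d * sqrt(n)
tcrit = t.ppf(1 - 0.05 / 2, df)
power = (1 - nct.cdf(tcrit, df, delta)) + nct.cdf(-tcrit, df, delta)  # ≈ 0.89
```

This reproduces the 89% figure quoted for Example 3.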
Power for independent-samples test
If we have two independent samples of size n, and we reject the two-sample null hypothesis that μ1 = μ2, then the power of the one-tailed test is equal to 1 − β where

β = F(tcrit)

Here tcrit = T.INV(1 − α, df), F(x) is the cumulative distribution function of the noncentral t distribution T(df, δ), df = 2n − 2 and the noncentrality parameter takes the value δ = d√(n/2) where d is Cohen’s effect size

d = (μ1 − μ2)/σ

assuming that the two populations have the same standard deviation σ (homogeneity of variances).

If the test is a two-tailed test then

β = F(tcrit) − F(−tcrit)

where now tcrit = T.INV(1 − α/2, df).

If the two samples have different sizes, say n1 and n2, then the degrees of freedom are, as usual, df = n1 + n2 − 2, but the noncentrality parameter takes the value δ = d√(n/2) where n is the harmonic mean of n1 and n2 (see Measures of Central Tendency).
Example 4: Calculate the power for a two-sample, two-tailed t-test with null hypothesis μ1 = μ2 to detect an effect of size d = .4 using two independent samples of size 10 and 20.
The power is 16.9% as shown in Figure 3.
Figure 3 – Power of a two-sample t-test
As for the one-sample case, we can use the following function to obtain the same result.
Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:
T2_POWER(d, n1, n2, tails, α, iter, prec) = the power of a two-sample t-test when d = Cohen’s effect size, n1 and n2 = the sample sizes (if n2 is omitted or set to 0, then n2 is considered to be equal to n1), tails = # of tails: 1 or 2 (default), α = alpha (default = .05), iter = the maximum number of terms from the infinite sum (default 1000) and prec = the maximum amount of error acceptable in the estimate of the infinite sum unless the iteration limit is reached first (default = 0.000000000001).
For Example 4, T2_POWER(.4, 10, 20) = 0.169497.
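As with the one-sample case, the two-sample calculation can be reproduced outside Excel. A minimal Python sketch using scipy (t2_power is my own name, mirroring but not identical to the Real Statistics function):

```python
from math import sqrt
from scipy.stats import t, nct

def t2_power(d, n1, n2=None, tails=2, a=0.05):
    """Power of a two-independent-sample t-test (equal variances assumed).
    When the sizes differ, n is the harmonic mean of n1 and n2."""
    if not n2:
        n2 = n1
    df = n1 + n2 - 2
    nh = 2 * n1 * n2 / (n1 + n2)     # harmonic mean of the sample sizes
    delta = d * sqrt(nh / 2)         # noncentrality parameter
    if tails == 1:
        tcrit = t.ppf(1 - a, df)
        return 1 - nct.cdf(tcrit, df, delta)
    tcrit = t.ppf(1 - a / 2, df)
    return (1 - nct.cdf(tcrit, df, delta)) + nct.cdf(-tcrit, df, delta)
```

For Example 4, t2_power(.4, 10, 20) returns approximately 0.1695, in line with T2_POWER(.4, 10, 20) = 0.169497.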
Hi Charles,
Your remarks on power calculations have been extremely helpful to me. I have a somewhat hypothetical question. It is common, at least in my field, to report the significance level of a comparison, i.e. alpha, the type I error rate. But calculating the effect size, and subsequently the power, permits an estimation of beta, the type II error rate, as well. Why is this not reported along with the alpha level, or should it be? Are the two estimates not independent, or is there some other reason? Thanks in advance for your reply, and for your very helpful website!
Mike
Generally, statistical power is used to calculate sample size prior to collecting data. Post-hoc calculation of power is not that useful.
Charles
Hi Charles,
I am trying to calculate statistical power in a stand-alone program (*not* in Excel) using the equation Za - Zb = d*sqrt(n), where Za and Zb are the Z-score values of alpha and beta, respectively, d is the effect size, and n is the number of pairs in paired sample data. I believe this equation applies to a one-tailed comparison. Can you tell me the appropriate equation for a two-tailed calculation?
Thanks very much!
The formula Za - Zb = d*sqrt(n) means that Zb = Za - d*sqrt(n), and so you can compute b = F(Za - d*sqrt(n)), where F(x) is the standard normal cumulative distribution function at x, which is NORMSDIST(x) in Excel. Note too that Za is the critical value for the right tail, which in Excel is NORMSINV(1-a). Thus the power of the test = 1 - b = 1 - F(Za - d*sqrt(n)) = F(d*sqrt(n) - Za).
For the two-tailed version, we need to use the critical value at a/2, i.e. alpha/2, which is NORMSINV(1-a/2) in Excel. I will call this Za/2. The power of the two-tailed test = F(d*sqrt(n)-Za/2) + F(-d*sqrt(n)-Za/2).
This is how the Real Statistics NORM1_POWER function is implemented.
Charles
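The normal-approximation power calculation described above can be sketched in Python using the standard library's NormalDist in place of NORMSDIST/NORMSINV (the function name norm_power is mine; the actual NORM1_POWER implementation may differ in details):

```python
from math import sqrt
from statistics import NormalDist

def norm_power(d, n, tails=2, a=0.05):
    """Normal-approximation power: d = effect size, n = sample size (# of pairs)."""
    nd = NormalDist()                    # standard normal
    if tails == 1:
        za = nd.inv_cdf(1 - a)           # NORMSINV(1-a)
        return nd.cdf(d * sqrt(n) - za)
    za2 = nd.inv_cdf(1 - a / 2)          # NORMSINV(1-a/2)
    return nd.cdf(d * sqrt(n) - za2) + nd.cdf(-d * sqrt(n) - za2)
```

For d = .4 and n = 20 this gives about 0.56 (one-tailed) and 0.43 (two-tailed), slightly higher than the exact t-based values because the normal distribution only approximates the t distribution.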
Hi Charles !
I just discovered your website and it’s extremely helpful, packed with very clear and complete information so, thanks a lot for all of this!
Just like Mike here, I’ve been trying to reproduce sample size and power calculations in MATLAB (so not in Excel), and some of the built-in functions are very limited, so I’m trying to write “homemade” functions.
Could you help me determine the best way to calculate power for independent samples, for both one- and two-tailed comparisons?
I have tried this but it seems off:
SD = sqrt( ((N1-1)*SD1^2 + (N2-1)*SD2^2) / (N1+N2-2) ); % pooled SD
ES = abs(mu1-mu2)/SD; % effect size
ZA = norminv(1-a);
P = normcdf(ES * sqrt(harmmean([N1,N2])) - ZA);
two-tailed version:
ZA = norminv(1-a/2);
P = normcdf(ES * sqrt(harmmean([N1,N2])) - ZA) + normcdf(-ES * sqrt(harmmean([N1,N2])) - ZA);
Based on your Example 2, I get P = 0.43 (two-tailed) and P = 0.56 (one-tailed).
As in the paired t-test, it’s close but not quite the same values as you have calculated with Real Statistics, and I’m not sure what to make of it, as I’m no mathematician…
Thanks in advance and once again thanks for your amazing website !
Thibault
Thibault,
Thanks for your kind words about Real Statistics.
It seems that ZA in your calculation is based on the normal distribution, which only approximates the t distribution, while the calculations shown on the Real Statistics website are based on the t distribution.
Charles
Hey Charles,
that was absolutely right, and I corrected my code as such, which solved the issue for one-sample t-test!
However I still get power values way higher than yours for the two-sample independent t-test. The issue seems to be due to the P equation, because everything else is correct…
df = (N1+N2) - 2;
SD = sqrt( ((N1-1)*SD1^2 + (N2-1)*SD2^2) / df );
ES = abs(mu1-mu2)/SD;
one-tailed:
ZA = tinv(1-a, df);
P = tcdf(ES * sqrt(harmmean([N1,N2])) - ZA, df);
two-tailed:
ZA = tinv(1-a/2, df);
P = tcdf(ES * sqrt(harmmean([N1,N2])) - ZA, df) + tcdf(-ES * sqrt(harmmean([N1,N2])) - ZA, df);
From Example 4, I end up with P = 0.406 (one-tailed) or P = 0.281 (two-tailed).
I’ve been toying with it for the last 24 hours, so if you have any more input it would be quite helpful!
Thanks again
Are you using the non-central t distribution or the ordinary t distribution?
Charles
Greetings,
I have a power analysis problem that doesn’t seem to fit the usual independent, two-sample t-test model. I have a set of nine independent chemical concentrations from stormwater at a location before a physical treatment was installed. The treatment was a filtering system designed to remove toxins in the stormwater. After the treatment was installed, an additional set of five concentrations were measured. The two sets were compared using a typical independent two-sample t-test to determine any effect of the physical treatment. The tests were one-tailed, as the client wanted to know if the treatment was reducing the levels of the chemicals in the stormwater. Of course, the results varied by analyte. The client now wants to know how many more post-installation samples need to be taken for better analytical power (e.g., if we take six more samples, can we see a 20% reduction?). The problem I have is that the usual techniques for two-sample t-test power analysis seem to assume one can add more data to each of the two samples. That can’t be done here with the pre-installation data – that period is over. I’d appreciate any advice you could supply on how to answer the client’s question.
Hello Peter,
When you ask “if we take six more samples, can we see a 20% reduction?”, what are you trying to “reduce”? It can’t be the statistical power.
Charles
Hello Charles,
The concentrations of various analytes. The client hopes to show that the installed physical treatment has lowered average concentrations found in the stormwater measured during the pre-construction period by 20%. It is a “before and after” comparison.
Peter
Hello Peter,
This is not the same as statistical power. In any case, perhaps you can use a paired t-test for a before and after analysis. If the assumptions of this test are not met, then a signed-ranks test is probably the best test to use.
Charles
I found my error. Please delete my prior comment – Thank you!
I’m trying to calculate the power of a two-tailed, two-sample t-test.
I’ve input your formulas, but I’m getting a different value for beta.
Help?
Values = https://i.imgur.com/pkSU3Sr.png
Formulas = https://i.imgur.com/EMm2OYq.png
Charles,
In Figure 3 (Cell AU11), why does the formula multiply the alpha value by 2 (i.e. AS4*2) for a 1-tailed test? This results in an alpha level of 0.10. Would you please explain?
Thanks for all the good work that you’re doing.
Tuba
Hi Tuba,
The formulas TINV and T.INV.2T are for a two-tailed t-test, and so to get a one-tailed test you need to double the alpha value.
Charles
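To make the doubling concrete: Excel's T.INV.2T(p, df) returns the two-tailed critical value, i.e. the upper p/2 quantile, so passing 2α yields the one-tailed critical value. A quick check of the equivalence in Python with scipy (mirroring, not calling, the Excel functions):

```python
from scipy.stats import t

df, a = 28, 0.05
one_tailed_crit = t.ppf(1 - a, df)          # upper-tail critical value, ≈ 1.701
# Excel's T.INV.2T(p, df) corresponds to t.ppf(1 - p/2, df) here,
# so doubling alpha turns the two-tailed formula into a one-tailed one:
two_tailed_of_doubled_alpha = t.ppf(1 - (2 * a) / 2, df)
assert abs(one_tailed_crit - two_tailed_of_doubled_alpha) < 1e-12
```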
Could someone please refer me to an online calculator for estimating statistical power for detecting significance
- if the effect size is 0.5
- where Group 1 consists of 58 marijuana users
- and Group 2 consists of 193 non-marijuana users
I want to compare the respective means of the 2 groups for a continuous variable that can have values between 0 and 10.
If there is no online calculator, can someone give me a formula for this computation?
Thank you.
Brenda,
The Real Statistics Statistical Power and Sample Size data analysis tool can be used for this calculation. See
https://real-statistics.com/hypothesis-testing/real-statistics-power-data-analysis-tool/
Charles
Dear Charles,
I would like your help in clarifying some doubts about the correct interpretation of the relationships among sample size, statistical power and effect size.
In a real case, given two samples of independent data with known sizes, I can run my t-test, obtain some value for the effect size, and then compute the value of beta for this t-test.
Anyway, referring to your Example 4, I could also use the Excel Goal Seek capability to compute which value of d will give a desired value of beta. For instance, to obtain power = 80%, I get d = 1.124. This should mean that the t-test cannot detect a difference between means below 1.124*SD (SD = pooled standard deviation) if we want to keep the power of the test at least at 80%.
But even if formally correct, this statement seems to me to be statistical nonsense.
What is your opinion in this regard? Do you think that in practice it is meaningful to set n1, n2, alpha, beta and then see which would be the effect size?
I hope to have been clear enough in my question.
Thank you very much for your comments
Piero
Dear Charles,
I am trying to recalculate a t-test’s power using standard Excel commands, and am a bit confused about the F-distribution you use to calculate t_crit’s probability. Shouldn’t the non-central F-distribution be used, with three parameters: (df1, df2, ncp)?
Kind regards,
Peter
Peter,
You don’t need the noncentral F distribution to calculate the power of the t test.
The F function that you see on the webpage is the cumulative distribution function of the t distribution.
Charles
Dear Charles,
So you mean the non-central t-distribution?
The cumulative distribution only takes one df, not two as indicated by the F function on your webpage.
(And to clear up my confusion: F here then designates “primitive function” or “antiderivative”, as opposed to “F-distribution”?)
Regards,
Peter
Peter,
1. No, the ordinary t distribution.
2. F(x) is the cdf (cumulative distribution function). See the following webpage:
https://real-statistics.com/probability-functions/continuous-probability-distributions/
Charles
…so where does the ncp that you calculated come in, then? The ordinary t-distribution takes the arguments t, df, and TRUE or FALSE for the cumulative distribution.
Hello Charles,
Is the noncentrality parameter actually the same as the t value? In that case, should this method return the same power values as the “classical” approach you describe under “One Sample T Test”?
Also, is the noncentral t distribution always symmetric?
Many thanks in advance,
Fred
Fred,
1. The noncentrality parameter is not the same as the t value.
2. The noncentral t distribution is not symmetric.
See the following webpage
Noncentral t distribution
Charles
Dear Charles,
Mean ± SD: A = 6.0 ± 2.6 (n=169); B = 4.5 ± 2.3 (n=172).
Student t = 5.645, Welch t = 5.639
Cohen’s d = 0.43
T2_POWER returns 98%, but there is a problem with the upper limit of the CI: 51% – 95%.
NCP(LL) = 0.214
NCP(UL)=0.4
Where is the error?
Sergey,
How did you calculate the upper limit of 95%? How did you calculate NCP(LL) and NCP(UL)?
Charles
Dear Charles,
NCP as explained in Figure 5 of “Confidence Intervals for Effect Size and Power”
NCP(LL) = NT_NCP(1-alpha, df, t)/SQRT(N) = NT_NCP(0.95, 339, 5.645)/SQRT(341) = 0.214
NCP(UL) = NT_NCP (alpha, df, t)/SQRT(N) = NT_NCP(0.05, 339, 5.645)/SQRT(341) = 0.4
Then
LL = T2_POWER(NCP(LL), n1, n2, tails, alpha) = T2_POWER(0.214, 169, 172, 2, 0.05) = 51%
UL = T2_POWER(NCP(UL), n1, n2, tails, alpha) = T2_POWER(0.4, 169, 172, 2, 0.05) = 95%
P.S. Sorry for the summer delay.
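For anyone reproducing NT_NCP outside Excel: it inverts the noncentral t cdf in its noncentrality parameter, i.e. it finds the δ for which the cdf at t equals p. A sketch of that inversion in Python (nt_ncp is my own name; the Real Statistics implementation may differ):

```python
from math import sqrt
from scipy.optimize import brentq
from scipy.stats import nct

def nt_ncp(p, df, tval, lo=-50, hi=50):
    """Return the noncentrality delta with P(T'_{df,delta} <= tval) = p."""
    # nct.cdf is decreasing in delta, so the bracket [lo, hi] spans a sign change
    return brentq(lambda delta: nct.cdf(tval, df, delta) - p, lo, hi)
```

With the numbers above, nt_ncp(0.95, 339, 5.645)/sqrt(341) comes out near 0.214 and nt_ncp(0.05, 339, 5.645)/sqrt(341) near 0.4, matching the quoted NCP(LL) and NCP(UL).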
Sergey,
Can you send me an Excel file with your calculations. This will make it easier for me to follow what you have done and try to identify any errors. You can find my email address at Contact Us.
Charles
Hi,
Thank you for the site.
Iris,
You are very welcome. I hope that you find it useful.
Charles
Dr. Zaiontz,
I am working my way through the Real-Statistics web site and am finding the site interesting and informative.
I have encountered a slight technical glitch. In the section on Student’s t-Distribution, under Statistical Power of the t-Tests, two images are not displaying (image7308 and image7310). The image numbers are shown, but not the images. All the other images on the page and in the previous sections on Basics and Distributions display properly.
I do not know if the problem is at the web site end or at my computer end. I have Windows XP, and I have tried viewing the page with both Chrome and Mozilla Firefox, with the same result.
I have one request of a different nature. Would you consider adding a section on Experimental Design? I think it would be a good fit and in the spirit of the rest of the web site.
Thank you for providing the web site, and for any help you can provide in viewing these images,
Yours truly,
Robert Kazmierczak
Robert,
Thanks for identifying that two images were missing from the referenced webpage. I have now added these images.
I agree with your suggestion of adding a webpage on Experimental Design. Given other commitments this won’t happen right away, but I will add such a webpage as soon as I can.
Charles
Charles:
I don’t understand why I have to correct Cohen’s d (effect size) and n (sample size) to get the power for a paired sample t-test. In your Example 2 (Figure 2) you use the initial values n = 40 and d = .4. But you correct them later: n = 20 (say n_new = 20), and calculate a new Cohen’s d (say d_new = .752071) using a “rho” variable whose meaning I don’t understand.
Could you please explain why I have to correct the initial value of Cohen’s d (d_new = f(d)) and the initial value of n (n_new = n/2)? And what is “rho”? Is rho = 1 − d? Why do I have to use those formulas to correct Cohen’s d?
Thank you.
William Agurto.
Charles:
Your example #1 also confuses me: why do you correct the initial value of n? The initial value is n = 40; the new value (for calculations) is n_new = 20.
Thank you.
William Agurto.
William,
The initial value of 40 is wrong. It should be 20. Thanks for catching this mistake, I have now corrected it on the website.
Charles
William,
Sorry for the confusion. Two examples got conflated and some of the information was not included. I will correct this tomorrow. Once again thanks for catching this mistake.
Charles
William,
I have now corrected the example on the webpage. Hopefully it is easier to understand now.
Charles
Charles:
Now your examples and figures are absolutely clear!
Thank you very much.
William Agurto.