Statistical Power of the t tests

Power for one-sample test

Suppose we have a sample of size n and we are testing the one-sample null hypothesis that μ = μ0. Then the power of the one-tailed t-test is equal to 1 − β where

β = T_{df,δ}(t_crit)

Here t_crit = t_{1−α,df} is the critical value of the central t distribution with df degrees of freedom, and T_{df,δ} is the cumulative distribution function of the noncentral t distribution with df degrees of freedom and noncentrality parameter δ.

The noncentrality parameter takes the value δ = d\sqrt{n} where d is Cohen’s effect size

d = (μ − μ0)/σ

and μ and σ are the population mean and standard deviation.

If the test is a two-tailed test then

β = T_{df,δ}(t_crit) − T_{df,δ}(−t_crit)

where now t_crit = t_{1−α/2,df}.

Note that the degrees of freedom are df = n − 1.

Example 1: Calculate the power for a one-sample, two-tailed t-test with null hypothesis H0: μ = 5 to detect an effect of size d = .4 using a sample of size n = 20.

The result is shown in Figure 1.

Figure 1 – Power of a one-sample t-test

In Figure 1 we used the Real Statistics function NT_DIST. The Real Statistics Resource Pack also supplies the following function, which calculates the power of a one-sample t-test directly.

Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:

T1_POWER(d, n, tails, α, iter, prec) = the power of a one-sample t-test where d = Cohen’s effect size, n = the sample size, tails = # of tails: 1 or 2 (default), α = alpha (default = .05), iter = the maximum number of terms from the infinite sum (default 1000) and prec = the maximum amount of error acceptable in the estimate of the infinite sum unless the iteration limit is reached first (default = 0.000000000001).

For Example 1, T1_POWER(.4, 20) = 0.396994. Note that the one-tailed test yields T1_POWER(.4, 20, 1) = 0.531814, which, as expected, is higher than the power of the two-tailed test.
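
If you want to reproduce this calculation outside of Excel, the same numbers follow directly from the noncentral t distribution described above. The following is a minimal sketch assuming Python with scipy (not part of the original article); any tool that exposes the noncentral t cdf and the central t inverse will give the same result.

import numpy as np
from scipy import stats

def t1_power(d, n, tails=2, alpha=0.05):
    # Power of a one-sample (or paired) t-test for Cohen's effect size d and sample size n
    df = n - 1
    delta = d * np.sqrt(n)                       # noncentrality parameter
    if tails == 1:
        t_crit = stats.t.ppf(1 - alpha, df)      # right-tail critical value
        beta = stats.nct.cdf(t_crit, df, delta)
    else:
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        beta = stats.nct.cdf(t_crit, df, delta) - stats.nct.cdf(-t_crit, df, delta)
    return 1 - beta

print(t1_power(0.4, 20))     # two-tailed: ~0.397, matching T1_POWER(.4, 20)
print(t1_power(0.4, 20, 1))  # one-tailed: ~0.532, matching T1_POWER(.4, 20, 1)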

Power for paired-sample test

The paired-sample test is identical to the one-sample t-test applied to the differences between the pairs. If the two random variables are x1, with mean μ1, and x2, with mean μ2, and the standard deviation of x1 − x2 is σ, then the power is calculated as in the one-sample case, where the noncentrality parameter takes the value δ = d\sqrt{n} and d is Cohen’s effect size:

d = (μ1 − μ2)/σ

Example 2: Calculate the power for a paired sample, two-tailed t-test to detect an effect of size d = .4 using a sample of size n = 20.

The answer is the same as that for Example 1, namely 39.7%.

Example 3: Calculate the power for a paired sample, two-tailed t-test where we have two samples of size 20 and we know that the mean and standard deviation of the first sample are 10 and 8, the mean and standard deviation of the second sample are 15 and 3, and the correlation coefficient between the two samples is .6.

The power is 89% as shown in Figure 2.

Figure 2 – Power of a paired sample t-test

Based on the definition of correlation and Property 6b of Correlation Basic Concepts,

cov(x1, x2) = ρ·σ1·σ2

var(x1 − x2) = var(x1) + var(x2) − 2·cov(x1, x2)

For Example 3, this means that

cov(x1, x2) = (.6)(8)(3) = 14.4

var(x1 − x2) = 8² + 3² − 2(14.4) = 64 + 9 − 28.8 = 44.2

σ = \sqrt{44.2} = 6.648

We can now calculate the effect size d as follows:

d = |μ1 − μ2|/σ = |10 − 15|/6.648 = 0.752
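
As a check on this derivation, here is a short sketch of the same calculation, again assuming Python with scipy (an assumption, not part of the original article). Since the paired test reduces to the one-sample case, the power is computed from the noncentral t distribution with δ = d\sqrt{n} and df = n − 1.

import numpy as np
from scipy import stats

# Example 3 inputs
n = 20
mu1, sd1 = 10, 8
mu2, sd2 = 15, 3
rho = 0.6

# standard deviation of the paired differences (Property 6b)
sigma = np.sqrt(sd1**2 + sd2**2 - 2 * rho * sd1 * sd2)   # sqrt(44.2) ~ 6.648
d = abs(mu1 - mu2) / sigma                                # ~0.752

# two-tailed power via the noncentral t distribution
df = n - 1
delta = d * np.sqrt(n)
t_crit = stats.t.ppf(0.975, df)
power = 1 - (stats.nct.cdf(t_crit, df, delta) - stats.nct.cdf(-t_crit, df, delta))
print(power)   # ~0.89, as reported in Figure 2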

Power for independent-samples test

If we have two independent samples of size n and we are testing the two-sample null hypothesis that μ1 = μ2, then the power of the one-tailed test is equal to 1 − β where

β = T_{df,δ}(t_crit)

with t_crit = t_{1−α,df}. Here df = 2n − 2 and the noncentrality parameter takes the value δ = d\sqrt{n/2} where d is Cohen’s effect size

d = (μ1 − μ2)/σ

assuming that the two populations have the same standard deviation σ (homogeneity of variances).

If the test is a two-tailed test then

β = T_{df,δ}(t_crit) − T_{df,δ}(−t_crit)

where now t_crit = t_{1−α/2,df}.

If the two samples have different sizes, say n1 and n2, then the degrees of freedom are, as usual, n1 + n2 − 2, but the noncentrality parameter takes the value δ = d\sqrt{n/2} where n is the harmonic mean of n1 and n2 (see Measures of Central Tendency).

Example 4: Calculate the power for a two-sample, two-tailed t-test with null hypothesis μ1 = μ2 to detect an effect of size d = .4 using two independent samples of size 10 and 20.

The power is 16.9% as shown in Figure 3.

Figure 3 – Power of a two-sample t-test

As for the one-sample case, we can use the following function to obtain the same result.

Real Statistics Function: The following function is provided in the Real Statistics Resource Pack:

T2_POWER(d, n1, n2, tails, α, iter, prec) = the power of a two-sample t-test where d = Cohen’s effect size, n1 and n2 = the sample sizes (if n2 is omitted or set to 0, then n2 is considered to be equal to n1), tails = # of tails: 1 or 2 (default), α = alpha (default = .05), iter = the maximum number of terms from the infinite sum (default 1000) and prec = the maximum amount of error acceptable in the estimate of the infinite sum unless the iteration limit is reached first (default = 0.000000000001).

For Example 4, T2_POWER(.4, 10, 20) = 0.169497.
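
The same result can be reproduced outside of Excel. The sketch below is a minimal illustration assuming Python with scipy (not part of the original article); it uses df = n1 + n2 − 2 and the harmonic mean of the sample sizes in the noncentrality parameter, as described above.

import numpy as np
from scipy import stats

def t2_power(d, n1, n2, tails=2, alpha=0.05):
    # Power of a two-sample t-test (equal variances assumed) for Cohen's effect size d
    df = n1 + n2 - 2
    n_h = 2 * n1 * n2 / (n1 + n2)             # harmonic mean of n1 and n2
    delta = d * np.sqrt(n_h / 2)              # noncentrality parameter
    if tails == 1:
        t_crit = stats.t.ppf(1 - alpha, df)
        beta = stats.nct.cdf(t_crit, df, delta)
    else:
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        beta = stats.nct.cdf(t_crit, df, delta) - stats.nct.cdf(-t_crit, df, delta)
    return 1 - beta

print(t2_power(0.4, 10, 20))   # ~0.169, matching T2_POWER(.4, 10, 20)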

41 thoughts on “Statistical Power of the t tests”

  1. Hi Charles,

    Your remarks on power calculations have been extremely helpful to me. I have a somewhat hypothetical question. It is common — at least in my field — to report the significance level of the alpha, or type I error comparison. But calculating effect size, and subsequently the power permits an estimation of the beta, or type II error level as well. Why is this not reported along with the alpha level — or, should it be? Are the two estimates not independent, or some other reason? Thanks in advance for your reply, and for your very helpful website!

    Mike

  2. Hi Charles,

    I am trying to calculate statistical power in a stand-alone program (*not* in Excel) using the equation Za-Zb=d*sqrt(n), where Za and Zb are Z score values of alpha and beta, respectively, d is effect size, and n is # of pairs, on paired sample data. I believe this equation applies to a one-tailed comparison. Can you tell me the appropriate equation for a two-tailed calculation?

    Thanks very much!

    • The formula Za-Zb=d*sqrt(n) means that Zb = Za - d*sqrt(n), and so you can compute b = F(Za-d*sqrt(n)), where F(x) is the standard normal cumulative distribution function at x, which is NORMSDIST(x) in Excel. Note too that Za is the critical value for the right tail, which in Excel is NORMSINV(1-a). Thus the power of the test = 1-b = 1-F(Za-d*sqrt(n)) = F(d*sqrt(n)-Za).
      For the two-tailed version, we need to use the critical value at a/2, i.e. alpha/2, which is NORMSINV(1-a/2) in Excel. I will call this Za/2. The power of the two-tailed test = F(d*sqrt(n)-Za/2) + F(-d*sqrt(n)-Za/2).
      This is how the Real Statistics NORM1_POWER function is implemented.
      Charles
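
      A minimal sketch of the calculation described above, assuming Python with scipy in place of Excel (norm.ppf plays the role of NORMSINV and norm.cdf the role of NORMSDIST):

      import numpy as np
      from scipy import stats

      def norm1_power(d, n, tails=1, alpha=0.05):
          # critical value: Za for one tail, Za/2 (i.e. at alpha/2) for two tails
          z_crit = stats.norm.ppf(1 - alpha / tails)
          power = stats.norm.cdf(d * np.sqrt(n) - z_crit)
          if tails == 2:
              power += stats.norm.cdf(-d * np.sqrt(n) - z_crit)
          return power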

      • Hi Charles!
        I just discovered your website and it’s extremely helpful, packed with very clear and complete information so, thanks a lot for all of this!

        Just like Mike here, I’ve been trying to transpose sample sizes and power calculations in MatLab (so not in Excel), and some implemented functions are very limited so I’m trying to include “homemade” functions.

        Could you help me determine the best way to calculate power for independent samples, for either one- or two-tailed comparisons?

        I have tried this but it seems off:
        SD = sqrt( (((N1-1)*SD1^2) + ((N2-1)*SD2^2)) / (N1+N2-2) );
        d = abs(mu1-mu2)/SD; % effect size
        ZA = norminv(1 - a);
        P = normcdf(d * sqrt( harmmean([N1,N2]) ) - ZA);

        two-tailed version:
        ZA = norminv(1-a/2);
        P = normcdf(d * sqrt(harmmean([N1,N2])) - ZA) + normcdf(-d * sqrt(harmmean([N1,N2])) - ZA);

        Based on your Example 2, I get P=0.43 (two-tailed) and P=0.56 (one-tailed).
        As in the paired t-test, these are close to but not quite the same as the values you calculated with Real Statistics, and I’m not sure what to make of it, as I’m no mathematician…

        Thanks in advance and once again thanks for your amazing website !
        Thibault

        • Thibault,
          Thanks for your kind words about Real Statistics.
          It seems that ZA in your calculation is based on the normal distribution, which only approximates the t distribution, while the calculations shown on the Real Statistics website are based on the t distribution.
          Charles

          • Hey Charles,
            that was absolutely right, and I corrected my code as such, which solved the issue for the one-sample t-test!
            However, I still get power values way higher than yours with the two-sample independent t-test. The issue seems to be due to the P equation, because everything else is correct…

            df = (N1+N2) - 2;
            SD = sqrt( (((N1-1)*SD1^2) + ((N2-1)*SD2^2)) / df );
            ES = abs(mu1-mu2)/SD;

            % one-tailed:
            ZA = tinv(1 - a, df);
            P = tcdf(ES * sqrt( harmmean([N1,N2]) ) - ZA, df);

            % two-tailed:
            ZA = tinv( (1-a)/2, df);
            P = tcdf(ES * sqrt( harmmean([N1,N2]) ) - ZA, df) + tcdf(-ES * sqrt( harmmean([N1,N2]) ) - ZA, df);

            From Example 4, I end up with P = 0.406 (one-tailed) or P = 0.281 (two-tailed).

            I’ve been toying with it for the last 24 hours, so if you have any more input it would be quite helpful!
            Thanks again

  3. Greetings,
    I have a power analysis problem that doesn’t seem to fit the usual independent, two-sample t-test model. I have a set of nine independent chemical concentrations from stormwater at a location before a physical treatment was installed. The treatment was a filtering system designed to remove toxins in the stormwater. After the treatment was installed, an additional set of five concentrations were measured. The two sets were compared using a typical independent two-sample t-test to determine any effect of the physical treatment. The tests were one-tailed as the client wanted to know if the treatment was reducing the levels of the chemicals in the stormwater. Of course, the results varied by analyte. The client now wants to know how many more post-installation samples need to be taken for better analytical power (e.g., if we take six more samples, can we see a 20% reduction?). The problem I have is that the usual techniques for two-sample t-test power analysis seem to assume one can add more data to each of the two samples. That can’t be done here with the pre-installation data – that period is over. I’d appreciate any advice you could supply on how to answer the client’s question.

    • Hello Peter,
      When you ask “if we take six more samples, can we see a 20% reduction?”, what are you trying to “reduce”? It can’t be the statistical power.
      Charles

      • Hello Charles,
        The concentrations of various analytes. The client hopes to show that the installed physical treatment has lowered average concentrations found in the stormwater measured during the pre-construction period by 20%. It is a “before and after” comparison.
        Peter

        • Hello Peter,
          This is not the same as statistical power. In any case, perhaps you can use a paired t-test for a before and after analysis. If the assumptions of this test are not met, then a signed-ranks test is probably the best test to use.
          Charles

  4. Charles,

    In Figure 3 (Cell AU11), why does the formula multiply the alpha value by 2 (i.e., AS4*2) for a 1-tailed test? This results in an alpha level of 0.10. Would you please explain?

    Thanks for all the good work that you’re doing.

    Tuba

  5. Could someone please refer me to an online calculator for estimating statistical power for detecting significance
    -if the effect size is 0.5
    -where Group 1 consists of 58 marijuana users
    -Group 2 consists of 193 non-marijuana users

    I want to compare the respective means of the 2 groups for a continuous variable that can have values between 0 and 10.

    If there is no online calculator, can someone give me a formula for this computation?

    Thank you.

  6. Dear Charles,

    I would like your help in clarifying some doubts about the correct interpretation of the relationships among sample size, statistical power and effect size.
    In fact, in a real case, given two samples of independent data with known sizes,
    I can do my t-test, I will obtain some value for effect size and then
    I will compute which is the value of beta for this t-test.

    Anyway, by referring to your Example 4, I could also use the Excel Goal Seek capability
    to compute which value of d will give a desired value of beta.
    For instance, to obtain a power=80%, I get d=1.124. This should mean that the t-test cannot detect a difference between means below 1.124*SD (SD=pooled standard deviation),
    if we want to keep the power of the test at least at 80%.
    But even if formally correct, this statement seems to me a statistical non-sense.

    What is your opinion in this regard? Do you think that in practice it is meaningful
    to set n1, n2, alpha, beta and then see what the effect size would be?

    I hope to have been clear enough in my question.

    Thank you very much for your comments
    Piero

  7. Dear Charles,

    I am trying to recalculate a t-test’s power using standard Excel commands, and am a bit confused about the F-distribution you use to calculate t_crit’s probability. Shouldn’t the non-central F-distribution be used, with three parameters (df1, df2, ncp)?

    Kind regards,

    Peter

  8. Hello Charles,

    Is the noncentrality parameter actually the same as the t value? In that case, should this method return the same power values as the “classical” approach you describe under “One Sample T Test”?
    Also, is the noncentral t distribution always symmetric?
    Many thanks in advance,
    Fred

  9. Dear Charles,
    Mean± SD: A=6.0± 2.6 (n=169); B=4.5± 2.3 (n=172).
    Student t=5.645, Welch t=5.639
    Cohen d = 0.43
    T2_power returns 98% but there is a problem with the upper limit of CI: 51% – 95%.
    NCP(LL) = 0.214
    NCP(UL)=0.4
    Where is the error?

      • Dear Charles,
        NCP as explained in Figure 5 of “Confidence Intervals for Effect Size and Power”
        NCP(LL) = NT_NCP(1-alpha, df, t)/SQRT(N) = NT_NCP(0.95, 339, 5.645)/SQRT(341) = 0.214
        NCP(UL) = NT_NCP (alpha, df, t)/SQRT(N) = NT_NCP(0.05, 339, 5.645)/SQRT(341) = 0.4
        Then
        LL = T2_POWER(NCP(LL), n1, n2, tails, alpha) = T2_POWER(0.214, 169, 172, 2, 0.05) = 51%
        UL = T2_POWER(NCP(UL), n1, n2, tails, alpha) = T2_POWER(0.4, 169, 172, 2, 0.05) = 95%
        P.S. Sorry for the summer delay.

        • Sergey,
          Can you send me an Excel file with your calculations. This will make it easier for me to follow what you have done and try to identify any errors. You can find my email address at Contact Us.
          Charles

  10. Dr. Zaiontz,

    I am working my way through the Real-Statistics web site and am finding the site interesting and informative.

    I have encountered a slight technical glitch. In the section on Student’s t-Distribution, under Statistical Power of the t-Tests, two images are not displaying (image7308 and image7310). The image numbers are shown, but not the images. All the other images on the page and in the previous sections on Basics and Distributions display properly.

    I do not know if the problem is at the web site end or at my computer end. I have Windows XP, and I have tried viewing the page with both Chrome and Mozilla Firefox, with the same result.

    I have one request of a different nature. Would you consider adding a section on Experimental Design? I think it would be a good fit and in the spirit of the rest of the web site.

    Thank you for providing the web site, and for any help you can provide in viewing these images,

    Yours truly,
    Robert Kazmierczak

    • Robert,

      Thanks for identifying that two images were missing from the referenced webpage. I have now added these images.

      I agree with your suggestion of adding a webpage on Experimental Design. Given other commitments this won’t happen right away, but I will add such a webpage as soon as I can.

      Charles

  11. Charles:

    I don’t understand why I have to correct Cohen’s d (effect size) and n (sample size) to get the power for a paired sample t-test. In your example #2 (Figure 2) you use the initial values n=40 and d=.4. But you correct them later: n=20 (say that n_new=20), and calculate a new Cohen’s d (say that Cohen’s d_new=.752071) using a “ro” variable whose meaning I don’t understand.
    Could you please explain why I have to correct the initial value of Cohen’s d (Cohen’s d_new = f(Cohen’s d)) and the initial value of n (n_new=n/2)? And what is “ro”? Is ro=1-d? Why do I have to use those formulas to correct Cohen’s d?

    Thank you.

    William Agurto.

    • Charles:

      Your example #1 also confuses me: why do you correct the initial value of n? The initial value is n=40; the new value (for calculations) is n_new=20.

      Thank you.

      William Agurto.

      • William,
        The initial value of 40 is wrong. It should be 20. Thanks for catching this mistake, I have now corrected it on the website.
        Charles

    • William,
      Sorry for the confusion. Two examples got conflated and some of the information was not included. I will correct this tomorrow. Once again thanks for catching this mistake.
      Charles

