Statistical Power and Sample Size

Objective

As described in Null Hypothesis Testing, beta (β) is the acceptable level of type II error, i.e. the probability that the null hypothesis is not rejected even though it is false. Statistical power is 1 – β, and so represents the probability that the null hypothesis is correctly rejected (when it is false), or in other words it is the probability that an effect is correctly identified. 

We now show how to estimate the power of a statistical test and address the related problem of determining the minimum sample size needed for the test to achieve a desired power.

Power Curve

Example 1: Suppose bolts are being manufactured using a process for which it is known that the length of the bolts follows a normal distribution with a standard deviation of 12 mm. The manufacturer wants to check that the mean length of their bolts is 60 mm, and so takes a sample of 110 bolts and uses a one-tailed test with α = .05 (i.e. H0: µ ≤ 60). What is the probability of a type II error if the actual mean length is 62.5 mm?

Since n = 110 and σ = 12, the standard error is $\frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{110}} = 1.144$. Let x = the length of a bolt. The null hypothesis is rejected provided the sample mean is greater than the critical value $x_{crit}$ = NORM.INV(1 – α, μ0, s.e.) = NORM.INV(.95, 60, 1.144) = 61.88.

Now suppose that the actual mean is 62.5. The situation is illustrated in Figure 1, where the curve on the left represents the normal curve being tested with a mean of μ0 = 60, and the normal curve on the right represents the real distribution with a mean of μ1 = 62.5.

Figure 1 – Statistical power

Since

$$\beta = P(\bar{x} \le x_{crit} \mid \mu = 62.5)$$

we have β = NORM.DIST(61.88, 62.5, 1.144, TRUE) = .295, and so power = 1 – β = .705.
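
For readers who prefer to verify these numbers outside of Excel, here is a minimal Python sketch of the same calculation (SciPy is assumed; NORM.INV corresponds to scipy.stats.norm.ppf and NORM.DIST with cumulative = TRUE to scipy.stats.norm.cdf):

    from math import sqrt
    from scipy.stats import norm

    mu0, mu1 = 60, 62.5      # hypothesized and actual mean length (mm)
    sigma, n = 12, 110       # population standard deviation and sample size
    alpha = 0.05             # one-tailed significance level

    se = sigma / sqrt(n)                             # standard error ≈ 1.144
    x_crit = norm.ppf(1 - alpha, loc=mu0, scale=se)  # critical value ≈ 61.88
    beta = norm.cdf(x_crit, loc=mu1, scale=se)       # type II error ≈ .295
    print("power =", 1 - beta)                       # ≈ .705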

We can repeat this calculation for a range of values of μ1 to obtain the table and graph of the power values shown in Figure 2.

Figure 2 – Power curve for Example 1
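
A table and chart like the one in Figure 2 can be reproduced by looping over candidate values of μ1; the sketch below continues the Python approach above (the grid of means is illustrative, not taken from the worksheet):

    import numpy as np
    from scipy.stats import norm

    mu0, sigma, n, alpha = 60, 12, 110, 0.05
    se = sigma / np.sqrt(n)
    x_crit = norm.ppf(1 - alpha, loc=mu0, scale=se)

    mu1_grid = np.arange(62.0, 64.6, 0.25)                # illustrative actual means
    power = 1 - norm.cdf(x_crit, loc=mu1_grid, scale=se)  # power at each mean
    for m, p in zip(mu1_grid, power):
        print(f"mu1 = {m:.2f}   power = {p:.3f}")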

Example

Example 2: For the data in Example 1, answer the following questions:

  1. What is the power of the test for detecting a standardized effect of size .2?
  2. What is the smallest effect size (and corresponding mean) that can be detected with power .80?
  3. What sample size is required to detect an effect size of .2 with power .80?

Using Goal Seek

Determine power

a) As described in Standardized Effect Size, we use the following measure of effect size:

$$d = \frac{\mu_1 - \mu_0}{\sigma}$$

Assuming μ1 ≥ 60, we see that μ1 = μ0 + dσ = 60 + (.2)(12) = 62.4. As in Example 1, it follows that

$\beta = P(\bar{x} \le x_{crit} \mid \mu = 62.4)$ = NORM.DIST(61.88, 62.4, 1.144, TRUE) = .325

and so power = 1 – β = .675.
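
The same result can be checked in Python by converting the effect size to an actual mean, mirroring the worksheet in Figure 3 (a sketch, assuming SciPy):

    from math import sqrt
    from scipy.stats import norm

    mu0, sigma, n, alpha, d = 60, 12, 110, 0.05, 0.2
    se = sigma / sqrt(n)
    mu1 = mu0 + d * sigma                            # 62.4
    x_crit = norm.ppf(1 - alpha, loc=mu0, scale=se)  # ≈ 61.88
    beta = norm.cdf(x_crit, loc=mu1, scale=se)       # ≈ .325
    print("power =", 1 - beta)                       # ≈ .675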

We summarize these calculations in the following worksheet:

Figure 3 – Determining power based on effect and sample size

Determine effect size

b) We utilize Excel’s Goal Seek capability to answer the second question. Referring to the worksheet in Figure 3, we select Data > Forecast | What-If Analysis > Goal Seek (or Data > Data Tools | What-If Analysis > Goal Seek). In the dialog box that appears, enter the values shown in Figure 4.

Figure 4 – Goal Seek dialog box

Here we are requesting that Excel find the value of cell B9 (the effect size) that produces a value of .8 for cell B12 (the power). The Set cell field in the dialog box must point to a cell that contains a formula, the To value field must contain a numeric value, and the By changing cell field must point to a cell that contains a value (possibly blank) rather than a formula. After clicking on the OK button, a Goal Seek Status dialog box appears and the worksheet from Figure 3 changes to that shown in Figure 5.

Figure 5 – Determining detectable effect size for specified power

Note that the values of some of the cells have changed to the values necessary to obtain the target power of .80. In particular, we see that the Effect size (cell B9) now contains the value 0.23691 and the Actual mean contains the value 62.84. You must click on the OK button in the Goal Seek Status dialog box to lock in these new values (or Cancel to return to the original worksheet values).

Thus, the smallest effect size that can be detected with power of 80% is .23691, with a corresponding mean of 62.84.
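
Outside of Excel, Goal Seek's role can be played by a numerical root-finder. The sketch below uses scipy.optimize.brentq (a substitute for Goal Seek, not part of the worksheet) to find the effect size whose power is .80; the bracketing interval [0.01, 1.0] is an assumption chosen for this example.

    from math import sqrt
    from scipy.stats import norm
    from scipy.optimize import brentq

    mu0, sigma, n, alpha, target = 60, 12, 110, 0.05, 0.80
    se = sigma / sqrt(n)
    x_crit = norm.ppf(1 - alpha, loc=mu0, scale=se)

    def power(d):
        """Power of the one-tailed test when the true mean is mu0 + d*sigma."""
        return 1 - norm.cdf(x_crit, loc=mu0 + d * sigma, scale=se)

    d = brentq(lambda d: power(d) - target, 0.01, 1.0)      # ≈ 0.237
    print("effect size =", d, " mean =", mu0 + d * sigma)   # mean ≈ 62.84

Any small difference from the 0.23691 shown in Figure 5 reflects Goal Seek's convergence tolerance; both approaches give an effect size of about .237 and a mean of about 62.84.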

Determine sample size

c) We again use Excel’s Goal Seek capability to answer the third question. Referring to the worksheet in Figure 3 (making sure to reset the effect size in cell B9 to .2), we now enter the values shown in the dialog box that appears on the right side of Figure 6.

Figure 6 – Using Goal Seek to determine minimum sample size

After clicking on the OK button, the worksheet changes to that shown in Figure 7.

Figure 7 – Sample size requirement for Example 2

In particular, note that the sample size value in cell B6 changes to 154.486. The required sample size is therefore 155 (rounding up).
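
The minimum sample size can be found the same way by treating n as the unknown; again, scipy.optimize.brentq stands in for Goal Seek, and the search interval [2, 10000] is an illustrative assumption.

    from math import sqrt, ceil
    from scipy.stats import norm
    from scipy.optimize import brentq

    mu0, sigma, alpha, d, target = 60, 12, 0.05, 0.2, 0.80
    mu1 = mu0 + d * sigma

    def power(n):
        """Power of the one-tailed test for a (possibly fractional) sample size n."""
        se = sigma / sqrt(n)
        x_crit = norm.ppf(1 - alpha, loc=mu0, scale=se)
        return 1 - norm.cdf(x_crit, loc=mu1, scale=se)

    n = brentq(lambda n: power(n) - target, 2, 10000)   # ≈ 154.6
    print("required sample size =", ceil(n))            # 155

The fractional value differs slightly from the 154.486 reported by Goal Seek (which stops within its convergence tolerance), but both lead to the same rounded-up sample size of 155.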

Alternative Approach

We next show how to use the standard normal distribution to accomplish the same power calculations. We can use this approach for both one-tailed and two-tailed tests.

Determine statistical power

As we see from Figure 3,

$$x_{crit} = \text{NORM.INV}\!\left(1-\alpha,\ \mu_0,\ \frac{\sigma}{\sqrt{n}}\right)$$

and so

$$z_{crit} = \frac{x_{crit}-\mu_0}{\sigma/\sqrt{n}} = \text{NORM.S.INV}(1-\alpha)$$

It now follows that

$$x_{crit} = \mu_0 + z_{crit}\cdot\frac{\sigma}{\sqrt{n}}$$

Also from Figure 3, we see that

$$\beta = P(\bar{x} \le x_{crit} \mid \mu = \mu_1) = \Phi\!\left(\frac{x_{crit}-\mu_1}{\sigma/\sqrt{n}}\right) = \Phi\!\left(z_{crit} - d\sqrt{n}\right)$$

$$\text{power} = 1-\beta = 1-\Phi\!\left(z_{crit} - d\sqrt{n}\right)$$

where Φ(z) = NORM.S.DIST(z, TRUE), i.e. the cdf of the standard normal distribution, and d is Cohen’s effect size (see Standardized Effect Size). The value $d\sqrt{n}$ is called the non-centrality parameter.

Thus, we now have an alternative way of addressing Example 2 (a), as shown in Figure 8.

Figure 8 – Determining power for a given effect size
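
In code, this closed-form expression is a one-liner; the sketch below reproduces the power for Example 2(a) from z_crit, d, and n alone (NORM.S.INV corresponds to scipy.stats.norm.ppf and Φ to scipy.stats.norm.cdf):

    from math import sqrt
    from scipy.stats import norm

    alpha, d, n = 0.05, 0.2, 110
    z_crit = norm.ppf(1 - alpha)                 # NORM.S.INV(1 - alpha) ≈ 1.645
    power = 1 - norm.cdf(z_crit - d * sqrt(n))   # 1 - Φ(z_crit - d√n) ≈ .675
    print(power)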

Determine effect size and sample size

As we saw above,

$$\text{power} = 1-\beta = 1-\Phi\!\left(z_{crit} - d\sqrt{n}\right)$$

and so

$$\beta = \Phi\!\left(z_{crit} - d\sqrt{n}\right) \quad\text{i.e.}\quad z_\beta = z_{crit} - d\sqrt{n}$$

Solving for d, we get

$$d = \frac{z_{crit} - z_\beta}{\sqrt{n}}$$

where $z_\beta$ = NORM.S.INV(β). Solving for n, we get

$$n = \left(\frac{z_{crit} - z_\beta}{d}\right)^2$$

These formulas provide an alternative way of addressing Example 2 (b) and (c), as shown in Figures 9 and 10. Note that this approach doesn’t require the Goal Seek capability.

Figure 9 – Determining detectable effect size for specified power

Figure 10 – Determining sample size for a given effect size
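
The closed-form expressions for d and n can be evaluated directly in the same way, with no need for a root-finder; here is a short sketch for Example 2(b) and (c):

    from math import sqrt, ceil
    from scipy.stats import norm

    alpha, beta, n, d = 0.05, 0.20, 110, 0.2     # beta = 1 - desired power
    z_crit = norm.ppf(1 - alpha)                 # ≈ 1.645
    z_beta = norm.ppf(beta)                      # NORM.S.INV(beta) ≈ -0.842

    d_min = (z_crit - z_beta) / sqrt(n)          # smallest detectable effect ≈ 0.237
    n_min = ((z_crit - z_beta) / d) ** 2         # required sample size ≈ 154.6
    print(d_min, ceil(n_min))                    # ≈ 0.237, 155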

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.


33 thoughts on “Statistical Power and Sample Size”

  1. Hello Charles,
    I would like to perform analysis as shown in figure 3, but I wish to look for an effect size in the negative direction (mu1 less than mu0). How should I proceed? I’m sorry if you’ve already answered this question, but I looked through the comments and I’m still not sure how to go about it.

  2. Dear Charles

    Hope you are doing well. Could you please clarify the following?

    In the real scenario, we are using multi-stage sampling (for example, Probability Proportional to Size techniques at the first stage and cluster sampling techniques at the second stage). So in many cases, each element of the sample does not have the same overall probability of selection (unless an equal number of elements is chosen in each cluster at the second stage of sample selection).
    My question: if this is the case (each element of the sample does not have the same overall probability of selection),
    • Shall we do the Z or t-test for this selected sample?

    My second query

    In real surveys, we face difficulties estimating confidence intervals for population parameters when using multi-stage sampling (for example, Probability Proportional to Size techniques at the first stage and cluster sampling techniques at the second stage), because it is hard to calculate the standard error. Could you please suggest or circulate a good guide that describes the equations for calculating the estimators (sample mean, sample proportion, sample total) and their variances in multi-stage sampling (for example, PPS at the first stage and cluster/stratified sampling at the second stage)?

    To overcome this problem, shall we use the self-weighting technique?

    Whenever possible, clusters should be chosen with probability-proportional-to-size in sample surveys at the first stage.

    A second point is that, if an equal number of elements is chosen in each cluster at the second stage of sample selection, the end result will be a sample in which each element has the same overall probability of selection, i.e. the sample is self-weighting.

    Then we can estimate the population parameter and its variance using the same technique as simple random sampling without replacement, since each element of the sample has an equal overall probability of selection.

    In this case, shall we use the Z or t-test for this sample?

  3. Dear Sir

    Hope you are doing well. I want to ask for a clarification when your time permits; please throw some light on it.

    Which is the best way to estimate the (population) parameter?

    1. Calculate the required sample size by defining the Z-score (95%, 1.96), the error (for example 0.03), and p (say .5 for the maximum sample size), then estimate the sample statistic (for example, the sample proportion). Then we say the calculated sample proportion is an unbiased estimator of the population proportion, and with 95% confidence the population proportion lies within plus or minus 0.03 (the value used for calculating the sample size) of the sample proportion. That is,

    p − 0.03 ≤ P ≤ p + 0.03

    Or

    We take a small sample (say 40, without calculating the sample size statistically) due to limitations, but use sampling techniques (SRS, cluster, etc.) while selecting the sample, then calculate the sample proportion and its variance (using statistical techniques). Finally, we say the population proportion P lies within p ± Z·SE(p). That is,

    p − Z·SE(p) ≤ P ≤ p + Z·SE(p)

    Please clarify it, when your time permits.

  4. Charles,
    “Figure 1–Statistical power” helped me understand type II error for a 1-tail comparison. This calculation requires an alternative distribution with a mean of mu1.
    I know the type II error (beta) can be calculated for a 2-tail comparison, but I don’t understand what it means. Is the null distribution compared to two alternative distributions having means of mu1 and -mu1? I would really appreciate an illustrated example of finding beta for a 2-tail t-test analysis.

  5. Thanks for this.

    I think for figure 2, the values of μ1 are from “61.8819676776998 to 64.4” rather than “μ1 ≥ 62.5”. I found this confusing until I realised this..

  6. Hi Charles,

    I have a more philosophical question.
    As I understand the “standardized effect size” concept, it takes the change you want to identify and normalizes it to a measure of detectability.
    I don’t understand why it is interesting.

    You wrote “Since it is standardized we can compare the effects across different studies with different variables”. Can you please give an interesting example?

    In your example you expect the length of the bolt to be 60 mm, and maybe 60±1 is okay while anything more or less can’t be sold in the shops, or 60±1.66%.

    So I’m interested in the power of the test to identify bolts longer than 61 or shorter than 59, not the power of the test to identify Cohen’s d = 0.2.

    Maybe in your example you changed machines and now you compare the effect on the average length to the effect on the average diameter?

    I hope it is okay I’m asking many questions.

    Thanks a lot,
    Oren

    • Oren,
      1. The effect size is interesting since you want to quantify the effect (small, large, etc.). E.g. if you are measuring the effectiveness of a new drug for curing a type of cancer, you prefer a bigger effect size than that of the currently used drug or a placebo.
      When you do statistical analysis often you are looking to see whether an effect is statistically significant (using the p-value). This just means that the effect size is different from zero (or some other predesignated value), not whether you should care about the effect. Also as the sample size gets larger and larger it is very likely that you will see a significant result — even if the effect size is very small.
      2. You need to look at the literature in your field to see interesting examples of comparisons of effect sizes across different studies, but the example I gave above indicates what to look for. If the effect size of the currently used treatment over a placebo is .30 and your new treatment has an effect size of .70 over a placebo, this will be interesting.
      3. Bolt size (i.e. mean bolt size) can easily be mapped into an effect size
      Charles

  7. Hello Charles,
    why is the SEM, and not the SD, used in the first formula when you calculate the length for alpha = 0.05? According to the normal distribution, 95% of the data are within the mean ± 2 SD, so it should be =NORM.INV(0.95, 60, 12).

    • I am not exactly sure which is the first formula that you are referring to, but if it is the effect size formula, then Cohen’s d uses the standard deviation and not the standard error. d does not depend on the sample size.
      Charles

        • Vendula,
          Yes, for this problem, the appropriate value for the standard deviation for a sample of size 110 is 1.144, which is the standard error for the sample. 12 represents the standard deviation of the population.
          Charles

  8. Hi Charles,

    I’m interested in how you’d compute beta for observed values that aren’t greater than Xcrit.

    For example, if the observed value were 60.5 (less than Xcrit), would beta be equal to NORMDIST(61.88, 60.5, 1.144, TRUE) = 0.886148, with beta getting higher the smaller the number gets?

    Also, if you were doing a right tail test and the observed value was less than Xcrit, such as NORMDIST(58.12, 58, 1.144, TRUE) = 0.5412.

    Thanks

  9. Hello,

    If I have one sample with a mean of 1000 and SEM (standard error) of 60 and another sample with a mean of 800 and SEM of 70, how would I calculate the statistical power for comparing these two samples?

    Thank you

  10. Hello Charles,

    I need assistance with how to plug in the numbers for the Statistical Power and Sample Size option. I will be running a logistic regression. I have all the data, but am unsure as to what I input.

    Any insight you have would be great! Thank you.

    • Angela, sorry but the Statistical Power and Sample Size data analysis tool supports linear regression but does not yet support logistic regression.
      Charles

  11. Hi Charles

    I am doing an evaluation research survey. Kindly tell me how to decide the sample size (with the formula) for rural and urban areas for a study on immunization coverage, where the previous coverage evaluation survey indicates rural coverage of 50% and urban coverage of 68%. Is it OK to do it with the formula n = 4 pq /L?

    • I haven’t enough information to answer your question. Which statistical test are you using? What does pq/L abbreviate?
      Charles

  12. I am just about to conduct a survey in Ghana on informal sector workers. The Ghanaian economy is about 84% informal and over 14 million Ghanaians are currently working. How do I get the right sample size (using power sampling) for the whole country? Thanks.

    • George,
      The sample size required depends on the type of statistical test that you are going to use. You need to identify the test that you will use (or that you are considering using) before you can estimate the sample size.
      Charles

  13. How should the formula be amended when μ0 > μ1? It looks to me as though there will be no difference in which is subtracted from which, since from the critical value point of view μ1 + z*σ = μ0 + z*σ. Thus, one should simply swap them.
    Do I understand correctly? I would be glad for help.
    Thank you in advance,
    Paul
