Normal Approximation to Binomial Distribution

We now show how the binomial distribution is related to the normal distribution.

Properties

Property 1: If x is a random variable with distribution B(n, p), then for sufficiently large n, the following random variable has approximately a standard normal distribution:

z = (x – μ) / σ

where
μ = np and σ = √(np(1 – p))

Proof: Click here for a proof of Property 1, which requires knowledge of calculus.

Corollary 1: Provided n is large enough, N(μ, σ²) is a good approximation for B(n, p) where μ = np and σ² = np(1 – p).
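
As a rough numerical illustration of Property 1, the sketch below computes the exact probability that the standardized binomial variable z = (x – np)/√(np(1 – p)) is at most 1 and compares it with the standard normal value at the same point. The function name standardized_cdf, the choice p = .25 and the sample sizes are illustrative assumptions, not part of the original text.

```python
from math import comb, sqrt
from statistics import NormalDist


def standardized_cdf(n, p, z):
    """Exact P((x - mu) / sigma <= z) for x ~ B(n, p)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    total = 0.0
    for k in range(n + 1):
        # Only add the binomial probabilities whose standardized value is <= z
        if (k - mu) / sigma <= z:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


phi_1 = NormalDist().cdf(1.0)  # standard normal cdf at z = 1
for n in (20, 100, 400):
    exact = standardized_cdf(n, 0.25, 1.0)
    print(f"n={n:4d}  binomial={exact:.4f}  normal={phi_1:.4f}")
```

Because the binomial distribution is discrete, the gap need not shrink smoothly from one n to the next, but it should be much smaller at n = 400 than at n = 20.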

Observation: We generally consider the normal distribution to be a pretty good approximation for the binomial distribution when np ≥ 5 and n(1 – p) ≥ 5.  For values of p close to .5, the number 5 on the right side of these inequalities may be reduced somewhat. For more extreme values of p (especially for p < .1 or p > .9) the value 5 may need to be increased.
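
The rule of thumb in this observation can be sketched as a small check. The function name normal_approx_ok and the threshold parameter are hypothetical; the default of 5 comes from the observation, and the parameter lets you lower or raise it for values of p near .5 or near the extremes.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb: both np and n(1 - p) should be at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(normal_approx_ok(20, 0.25))                 # np = 5, n(1 - p) = 15
print(normal_approx_ok(30, 0.05))                 # np is about 1.5
print(normal_approx_ok(20, 0.25, threshold=10))   # stricter cutoff
```

When np or n(1 – p) sits exactly on the cutoff, floating-point rounding can tip the comparison either way, so treat borderline results with care.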

Example

Example 1: What is the normal distribution approximation for the binomial distribution where n = 20 and p = .25 (i.e. the binomial distribution displayed in Figure 1 of Binomial Distribution)?

As in Corollary 1, define the following parameters:

μ = np = 20(.25) = 5

σ² = np(1 – p) = 20(.25)(.75) = 3.75

σ = √3.75 ≈ 1.94

Since np = 5 ≥ 5 and n(1 – p) = 15 ≥ 5, based on Corollary 1 we can conclude that B(20, .25) ~ N(5, 3.75).

Graphical comparison

We now graph both distributions (the binomial pmf and the normal pdf) to see visually how close they are to each other:

Figure 1 – Binomial vs. normal distribution
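
For readers without the chart, the following sketch tabulates the same comparison numerically for n = 20 and p = .25: the binomial probability at each k next to the N(5, 3.75) density evaluated at k. It uses only the Python standard library; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.25
mu = n * p                      # 5
sigma = sqrt(n * p * (1 - p))   # sqrt(3.75)
normal = NormalDist(mu, sigma)

# Binomial probability and normal density at each integer k = 0, ..., n
binom_pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
normal_pdf = [normal.pdf(k) for k in range(n + 1)]

print(" k  binomial    normal")
for k in range(n + 1):
    print(f"{k:2d}  {binom_pmf[k]:8.4f}  {normal_pdf[k]:8.4f}")

max_gap = max(abs(b - g) for b, g in zip(binom_pmf, normal_pdf))
print(f"largest absolute difference: {max_gap:.4f}")
```

The two columns track each other closely near the mean. The binomial is slightly skewed to the right, which the symmetric normal curve cannot reproduce.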

28 thoughts on “Normal Approximation to Binomial Distribution”

  1. Hi Charles,
    If np = 5, we can still use the normal distribution to answer the question, right?
    For example, this one:
    A sample of 25 pigs follows a binomial distribution in which 20% of the pigs are black (p = 20%). Show that it is appropriate to use the normal approximation. Can you also help me calculate 1) the probability that at least 5 of them are black, and 2) the probability that the number of black pigs is more than 3 and less than or equal to 7?

    Thanks!

  2. Hello Charles,

    I am wondering if a tennis match can be investigated using the binomial, normal or Poisson distribution. I am trying to investigate statistics in tennis, but I don't really know whether it would work or which method I should use. Thank you!

  3. Dear Sir,

    Can you please let me know how to graph both pdf's, binomial vs. normal? I am not able to graph them in Excel. Please help, Sir.

    With Best Regards,

    Sahil Goyal

    • Sorry, but I am not sure that I understand your question. For the binomial distribution, you need to use positive integer values and so I don’t know what you mean by p(-1.9 […]

        • Hello Kasun,
          I assume that you are referring to some example that is not found on this webpage. Since np < 5, the normal distribution approximation is suspect, but since 4.2 is close to 5, the approximation may still be useful.

          Charles

  4. Suppose we approximate a discrete distribution by the standard normal distribution and do not use a continuity correction factor. Let X be a random variable with the discrete distribution, and Y be a random variable with the standard normal distribution. Can we say that Pr[X ≥ x] is always greater than or equal to its approximation by the standard normal distribution (since we have not used the continuity correction factor)?

  5. Hi Charles,
    I have to estimate sales for the following situation. A salesman plans to visit 60 homes. 80 percent of the time there is somebody at home. When someone is home, 60 percent of the time it is a female; 20 percent of the females make a purchase, with a mean of $22 and SD of $5. Males purchase 10 percent of the time, with a mean of $40 and SD of $10. The question is: what is the total amount of revenue from those 60 visits?

    Thanks for your consideration.

    • Fara,

      The exact amount of revenues from the 60 visits will vary, but you can calculate the expected (i.e. average) revenues.

      There are 60 homes and 80% of the time someone is at home, i.e. 60*.8 = 48 visits have potential for sales. Since in 60% of these cases a female is at home, you expect that in 48*.6 = 28.8 visits a female is at home. Since 20% of these visits result in a sale, 28.8*.2 = 5.76 of these visits result in a sale. Since the mean purchase is $22, we expect 5.76*22 = $126.72 in sales. The range of values for this revenue depends on the standard deviation, but the expected value doesn’t depend on the standard deviation.

      Now in 48 – 28.8 = 19.2 visits a male (and not a female) is at home. Since 10% of these result in a sale, 19.2*.1 = 1.92 of these visits result in a sale. Since the expected value of each such sale is $40, we expect 1.92*40 = $76.80 in sales. If I didn’t make an arithmetic mistake, then the total expected revenue is therefore $126.72 + $76.80 = $203.52.

      If you want the range of possible revenue and not just the mean, then you would need to take the standard deviations into account.
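
A minimal sketch of the expected-value arithmetic above, using only the figures given in the question. The variable names are illustrative, and the standard deviations are deliberately unused because they do not affect the mean.

```python
visits = 60
p_home = 0.8            # someone is at home
p_female = 0.6          # a female is at home, given that someone is home
p_buy_female, mean_female = 0.2, 22.0
p_buy_male, mean_male = 0.1, 40.0

at_home = visits * p_home                              # visits with potential for sales
female_sales = at_home * p_female * p_buy_female       # expected number of sales to females
male_sales = at_home * (1 - p_female) * p_buy_male     # expected number of sales to males

expected_revenue = female_sales * mean_female + male_sales * mean_male
print(f"expected revenue: ${expected_revenue:.2f}")
```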

      Charles

  6. Hi Charles,
    What if we have to find the purchases based on the assumptions below?
    We plan to visit 60 homes with an 80 percent success rate. 60 percent of the time it is a female, and among females 20 percent are going to purchase, with a mean of $22 and SD of $5. 40 percent are men, and among men 10 percent are going to buy, with a mean of $40 and SD of $10.
    Thank you

  7. Thank you for the clear explanations!
    I was wondering if there is a standard (peer reviewed?) reference for the observation that the normal distribution is a good approximation for the binomial distribution when n > 10 and .4 < p < .6, or when n > 30 and .1 < p < .9. That would be very helpful!

    • Sonya,
      I was not able to find the reference to this, but I have now checked the statement against real data. When n > 10 and .4 < p < .6, the approximation is pretty good. However, for p near .1 or .9, the approximations weren't that good for n near 30. I have now changed the wording on the referenced webpage.

      Charles
