Poisson Distribution

Basic Concepts

Definition 1: The Poisson distribution has a probability distribution function (pdf) given by

Poisson distribution pdf

The parameter μ is often replaced by the symbol λ. A chart of the pdf of the Poisson distribution for λ = 3 is shown in Figure 1.

Figure 1 – Poisson Distribution

Observation: Some key statistical properties of the Poisson distribution are:

  • Mean = µ
  • Variance = µ
  • Skewness = 1 /\! \sqrt{\mu}
  • Kurtosis = 1/µ

Excel Function: Excel provides the following function for the Poisson distribution:

POISSON.DIST(x, μ, cum) = the probability density function value  for the Poisson distribution with mean μ if cum = FALSE, and the corresponding cumulative probability distribution value  if cum = TRUE.

Versions of Excel prior to 2010 do not support this function. Versions prior to Excel 2010 support the POISSON function, which is equivalent to POISSON.DIST.

Real Statistics Function: Excel doesn’t provide a worksheet function for the inverse of the Poisson distribution. Instead, you can use the following function provided by the Real Statistics Resource Pack.

POISSON_INV(p, μ) = smallest integer x such that POISSON(x, μ, TRUE) ≥ p

Note that the maximum value of x is 1,024,000,000. A value higher than this produces an error.

Poisson Process

If the average number of occurrences of a particular event in an hour (or some other unit of time) is μ and the arrival times are random without any tendency to bunch up (i.e. the assumptions for what is called a Poisson process) then the probability of x events occurring in an hour is given by

Poisson distribution pdf

Example 1: A large department store sells on average 100 MP3 players a week. Assuming that purchases are as described in the above observation, what is the probability that the store will have to turn away potential buyers before the end if they stock 120 players? How many MP3 players should the store stock in order to make sure that it has a 99% probability of being able to supply a week’s demand?

The probability that they will sell ≤ 120 MP3 players in a week is

POISSON(120, 100, TRUE) = 0.977331

Thus, the answer to the first problem is 1 – 0.977331 = 0.022669, or about 2.3%. We can answer the second question by using successive approximations until we arrive at the correct answer. E.g. we could try x = 130, which is higher than 120. The cumulative Poisson is 0.998293, which is too high. We then pick x = 125 (halfway between 120 and 130). This yields 0.993202, which is a little too high, and so we try 123. This yields 0.988756, which a little too low, and so we finally arrive at 124, which has cumulative Poisson distribution of 0.991226.

Alternatively, you can arrive at the same answer (124) by using the Real Statistics formula =POISSON_INV(0.99,100).

Confidence Intervals

The 1–α confidence interval for the mean based on x events occurring (in a unit of time) is given by

image074x

whereimage075x

For Excel 2007, χ2p,df  = CHIINV(1−p,df).

See Chi-square Distribution for more details about the CHISQ.INV and CHIINV functions.

Example 2: Suppose the number of radioactive particles that hits a screen per second follows a Poisson process and suppose that 5 hits occurred in one second, find the 95% confidence interval for the mean number of hits per second.

Figure 2 shows the confidence intervals for various values of x and α.

Poisson confidence interval

Figure 2 – Confidence intervals for the Poisson mean

The requested confidence interval is

1.623486 ≤ μ ≤ 11.66833

as calculated by the formulas in cells C9 and D9:

=CHISQ.INV(B9/2,2*A9)/2

=CHISQ.INV.RT(B9/2,2*(A9+1))/2

Note that CHISQ.INV(p,0) = #NUM! for any value of p, and so we cannot use this formula to calculate the lower bound when x = 0 (cell C4). In any case, this value is zero.

Relationship with Binomial and Normal Distributions

Property 1: If the probability p of success on a single trial approaches 0 while the number of trials n approaches infinity while the value of np stays fixed, then the binomial distribution B(n, p) approaches the Poisson distribution with mean μ = np.

Click here for the proof of this property.

Observation: Based on Property 1 the Poisson distribution can be used to estimate the binomial distribution when n ≥ 50 and p ≤ .01, preferably with np ≤ 5.

Example 3: A company produces high-precision bolts so that the probability of a defect is .05%. In a sample of 4,000 units, what is the probability of having more than 3 defects?

The probability is 14.3%. We obtain this probability by using the distribution B(4000, .0005), as follows:

1 – BINOMDIST(3, 4000, .0005, TRUE) = 1 – 0.857169 = 0.142831

We can also use the Poisson approximation as follows:

μ = np = 4000(.0005) = 2

1 – POISSON(3, 2, TRUE) = 1 – 0.857123 = 0.142877

As you can see the approximation is quite accurate.

Observation: The Poisson distribution can be approximated by the normal distribution, as shown in the following property.

Property 2: For n sufficiently large (usually n ≥ 20), if x has a Poisson distribution with mean μ, then x ~ N(μ, μ), i.e. a normal distribution with mean μ and variance μ.

Test for a Poisson Distribution

The index of dispersion of a data set or distribution is the variance divided by the mean.

Since the mean and variance of a Poisson distribution are equal, data that conform to a Poisson distribution must have an index of dispersion approximately equal to 1. We can use this fact to test whether a data set has a Poisson distribution, as described in Goodness of Fit.

In fact in Goodness of Fit, we also show how to use the chi-square goodness-of-fit test to determine whether a data set follows a Poisson distribution.

Difference between Two Poisson Distributions

If x and y are two independent Poisson distributed random variables, then x – y has a Skellam distribution as described at Skellam Distribution.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

Reference

Wikipedia (2012) Poisson distribution
https://en.wikipedia.org/wiki/Poisson_distribution

NIST (2012) Poisson distribution
https://www.itl.nist.gov/div898/handbook/eda/section3/eda366j.htm

83 thoughts on “Poisson Distribution”

  1. Given a set of datapoints, is it possible to approximate the values for a Poisson distribution equation in excel?

    Reply
    • Hello Axel,
      The graph specifies the data points. You can then determine the lambda parameter as the mean of the data points (as described in my previous response).
      Charles

      Reply
  2. for Poisson distributed samples, how do I compare two independent sample means for significance testing ?
    For example: (50 events in 500 seconds vs. 39 events in 300 seconds).

    How do i know if my duration is sufficient to count such rates ?
    Is there any option for this in excel or R.

    Reply
    • Hello Evan,
      I don’t know what you mean by win rates. I also don’t know what you mean by multivariate poisson numbers.
      Can you clarify your comment?
      Charles

      Reply
  3. How would I find a certain number of occurrences or more? (in excel)
    I know that to find for example 6 occurrences or less the function would be =POISSON.DIST (6,5.3,TRUE)
    *where the mean is 5.3

    Reply
  4. Hi Charles

    At the beginning of the article you mention kurtosis and Skew are key parameters of the Poisson Distribution. However, You never use Kurtosis or Skew anywhere in your excel example for calculating the cumulative probability distribution function.
    Is there anywhere you input skew and kurtosis to get the accurate cumulative probability function for the data set?

    Thank you

    Reply
  5. The number of hit can be addressed simultaneously on website is 35K and the average hit per min is 15K. the peak in last one year hit data is observed as 28K. Each hit is served in 5 sec on website. Need your assistance to calculate the number of maximum hit in next one year at any movement of time.

    Reply
  6. Hi Charles,

    I am an English teacher and I’ve been trying to come up with a model to calculate the probability of students getting a determined score on a test based on a series of scores they previously obtained. I work with TOEFL and IELTS preparation courses, so we have students take the same test format over and over again.

    Do you believe I would be able to use the poisson distribution to calculate that in Excel?

    Thank you in advance! Cheers!

    Reply
      • Hi Charles,

        Thanks for your reply!

        That’s a very good question. Actually, I have no idea. As a Language teacher, I am not very good at Maths, you know. However, I have been searching for a way to calculate the probability in that case and the Poisson Distribution seemed to be suitable. I saw an example on another blog where it was used to predict soccer game scores based on a set of results from previous matches, for example.

        Although I know that when it comes to test taking things are quite complex and not so predictable, since the practice tests students take are pretty much alike, I would like to use this model to show them that just by retaking the same test – without studying more before that – their chances won’t improve much. Students often overestimate their chances, what sometimes makes them neglect their studies.

        Anyway, long story short, maybe this model isn’t indeed suitable for that. For me, it was almost just like guessing that it would be. Thanks for your attention and help!

        Reply
        • Hello Fabio,
          Which of the following are you trying to study?
          1. the student takes the same exact test many times without studying between?
          2. the student takes the same test twice without studying between?
          3. as for #1 except that it is not the same exact test but a standard test that is supposed to measure the same thing (such as the SAT)?
          4. as for #2 except that it is not the same exact test but a standard test that is supposed to measure the same thing (such as the SAT)?
          Charles

          Reply
          • Hi Charles,

            Alternatives 3 e 4 would best represent the scenarios I usually deal with (especially the 3rd one). The tests are not exactly the same in terms of content; however, the test structure and exercise types are very similar. I often work with TOEFL and IELTS exams.

            Many thanks!

  7. hi Charles,
    In the finance community, an audit sampling method (monetary unit method) is using a confidence factor published by the American Institute of Certified Public Accountants. For example, for confidence level of 63% wherein the ratio of the expected misstatement to its tolerable misstatement is 50%, the confidence factor is 2.32. Similarly for a confidence level of 50%, the confidence factor is 1.37. How is this derived? The AICPA has disclosed that these statistical tables are based on poisson distribution.
    Shirley

    Reply
  8. Hi Charles,
    Good Day!
    I have calculated the Poisson inverse cumulative from my data using the formula
    1-Poisson(x,u,True). Is this the same as the inverse cumulative frequency distribution of the data?
    I have read some papers that the result is similar. But I have not found any way to solve or calculate the inverse cumulative frequency distribution of the data.

    Thank you.

    Reply
    • Hello,
      The POISSON_INV function given on this webpage calculates the inverse cumulative frequency distribution of the data.
      I don’t think that 1-Poisson(x,u,True) gives equivalent results.
      Charles

      Reply
  9. I want to determine poisson distribution with kolmogorov smirnov with excell and R. How to calculate manual of poisson distribution in excell by using kolmogorov-smirnov? And if i use software R : ks.test(data,ecdf(data)). Is it true?

    Reply
  10. A company publishes statistics concerning car quality. The initial quality score measures the number of problems per new car sold. For one​ year, Car A had 1.451.45 problems per car. Let the random variable X be equal to the number of problems with a newly purchased model A car. Complete​ (a) and​ (b) below.

    A) If you purchased a model A​ car, what is the probability that the new car will have zero​ problems?
    B) If you purchased a model A​ car, what is the probability that the new car more than 3 or fewer problems?

    Reply
    • Hasten,
      Assuming that the assumptions for the Poisson distribution are satisfied, these sorts of problems are described on the referenced webpage. Here your mean is 1451.45.
      Charles

      Reply
  11. Hi Charles

    Need help Poisson Distributiin.

    Recently my team assignment selected Hotel beds unoccupied with 50 counts and our expected number=mean= lambda. However, my lambda = 34.36 that unable fit me to do the calculation.

    My lecturer told to reduce lambda. My I know how to reduce the lambda for average of 34.36 for beds.

    I found Lambda only able calculate not more than 10.

    I hope to heard from you.

    Thank you

    Reply
      • Hi Charles, by using excel to calculate the probability. Wat is =1-poisson.dist(X,Mean,True/False) VS =poisson.dist(,Mean,True/False)? When to use 1-Poisson?

        Reply
        • Romanee,
          If F(a) = the probably that x <= a, then 1 - F(a) = the probability that x > a.
          Thus, =poisson.dist(X,Mean,True) is the probability that the Poisson distribution is less than or equal to X and
          =1-poisson.dist(X,Mean,True) is the probability that the Poisson distribution is greater than X.
          Note that this is not true if True is replaced by False.
          Charles

          Reply
  12. Hi Charles,

    If my question is
    A sample of 32 hotels show that the mean of numbers of guests is 55, and I would like to find a P(100<=x<=150), is it correct my formula is =POISSON(150,55,TRUE)-POISSON(99,55,TRUE) ?

    Hope you can assist me in this matter.
    Thanks a lot.

    Reply
      • Do I need to check whether the data is poisson distribution?
        Or I need to testing the normal distribution using QQ plot…and etc?

        Thanks Charles.

        Reply
        • Lena,
          1. If you know that the data should follow a Poisson distribution on theoretical grounds (e.g. there is a Poisson process), then you should be ok. One quick check to see whether data follows a Poisson process is to see whether the mean is roughly equal to the variance (as described on the website).
          2. If you believe the data follows a Poisson distribution, then there is no reason to test for a normal distribution
          Charles

          Reply
          • 1. Does it possible that the data set simultaneously follow a Poisson distribution and normal distribution?

            2. If it follow a Poisson distribution, how to I find the P(X=O) ? When mean= 0.36667, SD=0.4901.

            Hope you can assist me on this, thanks Charles.

    • As Poisson distribution is approximately normal, so can I use normal distribution as well in calculate the probability of discrete random variable?

      Reply
      • Lena,
        For n sufficiently large (usually n ≥ 20), if x has a Poisson distribution with mean μ, then x has an approximately normal distribution with mean μ and variance μ.
        If n is sufficiently large, then, yes, you can use the normal distribution.
        Charles

        Reply
  13. Dear Charles,
    You state on this page that “The index of dispersion of a data set or distribution is the mean divided by the variance.” Isn’t the definition of the index of dispersion “the variance divided by the mean”?
    Eugene.

    Reply
    • Eugene,
      Yes, you are correct. I have just changed the webpage to correct this error.
      Thanks for your help in making the website more accurate.
      Charles

      Reply
  14. Hi Charles,

    Please help me with this problem as i do not understand how the standard deviation affects the processing time.
    An analyst in the statistics office of a government department in Wellington
    requests for reports. On average the office receives 19 requests per (40 hour) week, with the arrival pattern resembling a Poisson process. Once working on a request the processing time averages 2.0 hours with a standard deviation of 1.0 hours. After the request is processed the report is automatically emailed to the requester.

    (a) What is the utilisation of the analyst?
    (b) What is the average time a request waits before processing begins (Tq)?

    Are my answers correct for (a) 0.2375 and (b) 0.6557?

    Reply
  15. I came across an article on the number of accidents per # man hours worked.
    The article did not detail how the adjusted accident was derived. Neither did it specify how the UCL, Avg and LCL was derived based on the adjusted accident.
    Need help to determine adjusted accident, UCL, Avg and LCL.

    # man hours Accidents Adjusted accident
    287287 4 3.843
    270828 8 8.228
    298531 4 3.630
    271726 4 4.149
    280771 7 6.961
    282547 9 8.905
    226415 18 20.685
    226410 2 2.917
    253482 2 2.429
    270376 4 4.176
    283216 15 14.843
    278348 5 5.018
    296736 5 4.633
    308166 19 17.731
    298507 9 8.466
    295858 5 4.651
    307692 5 4.414
    273973 4 4.104
    293849 4 3.718

    Analysis was performed using adjusted accident with UCL=16.794, Avg=7.000, LCL=1.206

    Reply
  16. Hy Charles..
    I have to make an assignment on “fitting a poison distribution to observed data”
    So tell me how many numbres of data will be suitable. ???
    I shall be
    very thankful to you.

    Reply
    • Zarish,
      Sorry, but I don’t know how many numbers is suitable. In fact, first you need to define what “suitable” means.
      Perhaps someone else has some insight into this issue.
      Charles

      Reply
  17. Hi.

    How can I calculate this:

    A fire brigade A is on average called out 5 times a day. 1 call takes on 1hour for the brigade. a) What is the probability that the men can rest at least 2 hours between the calls. b) What is the probability that the brigade B has to be sent out because the brigade A is still on its duty.

    The answes should be a) 0.535 and b) 0.188

    I would be really glad if you could help me…

    Reply
    • Sorry Anti, but this looks like a school assignment and I don’t want to answer such questions. I am happy to clarify concepts, but not provide such specific answers.
      Charles

      Reply
  18. Hi Charles,

    I am doing a project on diagnostic errors in medicine. I want to show the number of errors per month over a 6 month period. Overall, for 200 events I have about 150 with errors and 50 without errors. How do I use poisson distribution in this case? I do know in each month how many errors occur. Do we need this data? I am using Excel, so what will be the formula?

    Reply
    • Mayukh,
      You know that the probability of an error is 3/4. Now you need to know how many events occur in a month, and from this it is easy to calculate the average number of errors per month. If however you want to know the probability that you will have say m errors in the month (instead of the average number), then you can use the approach described on the referenced webpage.
      Charles

      Reply
  19. If a cumulative probability distribution function is the probability that a random variable will have a value less than or equal to X, what would be the function name for the opposite(he probability that a random variable will have a value grater than or equal to X, or even just greater than X)?

    Reply
  20. Say a user can check out a maximum of 2000 licenses at a time. Multiple users can check out one or more licenses each hour for a product but not more than 2000. The product owner has log that shows data for each hour in the year (365 days). For each hour, the log contains the highest number (quantity) of licenses checked out in that hour. Using this log of every hour over the past year, how would you determine for the next year if 2000 licenses is enough based on this data using a Poisson distribution function in excel? What would be the average be? Note that in many cases, the licenses checked out one hour is the same licenses check out by the same person in a previous hour(the license continues to be check out by the same user over multiple hours or days). Do you believe this is a Poisson Distribution? Also, how would you calculate the mean?

    Reply
    • Able,

      Please explain the premise more clearly. If a user checks out 2,000 in one hour, can he/she check out another 2,000 licenses in the next hour? (i.e. are these like checking out books in a library?) If you have say 10 users can they each check out 2,000 licenses in an hour (for a total of 20,000 licenses) or are they restricted to 2,000 licenses in aggregate?

      What criterion are you using for determining whether “2000 licenses is enough based on this data using a Poisson distribution function in excel?” Do you mean that (1) in no hour is more than 2,000 licenses checked out or (2) the probability of more than 2,000 licenses being checked out is zero or (3) something else?

      Charles

      Reply
      • Charles,
        In response to you comments:

        1)If a user checks out 2000 licences one hour, in the next hour either their will be 0 licenses checkout out or the same 2000 licenses that were checked out in the previous hour(s). It all depends if a users software job has ended or not and the user does not require any more licenses.

        2)If you have 10 users they are they restricted to 2,000 licenses in aggregate per hour.

        3)I am looking for probability of more than 2,000 licenses being checked out is zero. I am also looking for the lowest license count where the probability of more than x number licenses being in use is 0.

        Reply
        • It appears that you are assuming an “arrival time” (i.e. checking out of licenses) that follows a Poisson distribution, but you haven’t said anything about the “service time” (i.e. for how long the license is checked out). If we assume, for example that the arrival time follows a Poisson distribution with mean m and a service time whose average is say n, then we have a typical queueing model.

          Is this is what you are interested in?

          Charles

          Reply
          • Charles,

            I do not have information regarding the service time, just the arrival time. Can Poisson or perhaps another statistical method be used to determine the probability of more than 2,000 licenses being checked out is zero with just the arrival time information?

  21. say i have a sign up rate of about 2 per day
    and each signup lasts for exactly 30 days
    what is the probability that I will have at least a hundred people signed up at once?

    Reply
    • Assuming that you are looking for the probability that at least 100 people are signed up in any specific 30 day period, then the result should be =1-POISSON.DIST(99,60,TRUE), which has a value of 1.48E-06, a very small number.
      Charles

      Reply
  22. Hi.May I ask a question?
    When n (from 5 to 10 and 20)increases what happens on the probability distribution graph?(binomial, poisson and normal)

    Reply
    • It really depends on what happens with the other parameters.

      Binomial: If the other parameters in the BINOMDIST function are held constant then the cumulative distribution values decrease (e.g. compare BINOMDIST(4,n,.7,TRUE) for n = 5, 10, 20.

      Poisson: If you assume that the mean of the distribution = np, then the cumulative distribution values decrease (e.g. compare POISSON(2,np,TRUE) where p = .5 for n = 5, 10, 20.

      Normal: It really depends on how you are going to use n since NORMDIST doesn’t directly use n.

      Charles

      Reply
  23. Hi Charles,

    Many thanks for the explanation, i’m trying to find a way to apply this to sports betting using Excel, i’ve managed to locate a Poisson template and i’m wondering if there are any know ways of viewing football prediction results

    Reply
    • I don’t know any way of applying this to sports betting, but perhaps someone else in the community can help.
      Charles

      Reply
  24. Hi Charles,

    I wonder if it would be correct to use poisson distribution to determine safety stock for a product when the product has the following demand pattern for the last 12 months:

    3 580
    0
    0
    1 135
    1 000
    363
    2 175
    72
    620
    92
    228
    373

    The reason behind this pattern is that one of the customers is buying given product in batches.
    Thank you in advance.

    Reply
  25. Excellent resource!

    I have used it to sucessfully simulate a poisson process (counts of a radioactive material) and has helped me to make a worksheet to obtain detection tresholds and minimun detectable activity (they are dependant of sigma) for the contaminations sensors i use.

    Keep the good work!

    Reply
  26. The average number of signals sent from a station is 3 per day which do not reach properly to the another station . Find the probability that the signals which are sent in a day but not reached properly is at least 3
    Please help me to solve this problem ….thanks in advance

    Reply

Leave a Comment