Proportion Distribution and Testing

If x is a random variable with binomial distribution B(n, p) then the random variable y = x/n is said to have a proportion distribution.
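
As a quick numerical illustration of this definition, the sketch below builds the distribution of y = x/n directly from the binomial probabilities and checks that its mean is p and its variance is p(1 − p)/n. The function name and the values n = 20 and p = 0.3 are illustrative assumptions, not figures from this page.

```python
from math import comb

def proportion_distribution(n, p):
    """Return (value, probability) pairs for y = x/n, where x ~ B(n, p)."""
    return [(k / n, comb(n, k) * p**k * (1 - p)**(n - k)) for k in range(n + 1)]

n, p = 20, 0.3  # illustrative values (assumption)
dist = proportion_distribution(n, p)

# Mean and variance of y computed from the probabilities themselves
mean = sum(y * prob for y, prob in dist)
variance = sum((y - mean) ** 2 * prob for y, prob in dist)

print(f"mean     = {mean:.6f}  (p = {p})")
print(f"variance = {variance:.6f}  (p(1-p)/n = {p * (1 - p) / n:.6f})")
```

The probabilities are those of the binomial distribution; only the values are rescaled by 1/n, which is why the mean np becomes p and the variance np(1 − p) becomes p(1 − p)/n.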

References

Saylor Academy (2012) The sample proportion. Introductory Statistics
https://saylordotorg.github.io/text_introductory-statistics/s10-03-the-sample-proportion.html

Wikipedia (2012) Population proportion
https://en.wikipedia.org/wiki/Population_proportion

20 thoughts on “Proportion Distribution and Testing”

  1. Hi
    What is the formula for the standard deviation over 1100 in Excel, please?

    Waiting for your answer with many thanks

    Regards
    Luma

  2. Can you please try to solve this? The department conducted a study a number of years ago that showed that the proportion of cars tested which failed to meet the state pollution standard was 37%. The department would like to be able to say that the cars have improved since then. In a more recent sample of 100 cars, the proportion not meeting the standard was 28%. Are the cars better at meeting the standards than they used to be? Clearly state the null and alternative hypotheses. Perform the test at a 99% level of confidence.
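
One way to set this problem up is as a one-sample, lower-tail test of H0: π = 0.37 against H1: π < 0.37, using the normal approximation to the proportion distribution (reasonable here since 100 × 0.37 and 100 × 0.63 are both well above 5). The sketch below is only one possible reading of the question; the function name is hypothetical and the choice of a one-tailed z-test is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z_test(p_hat, p0, n):
    """Lower-tail z-test of H0: pi = p0 against H1: pi < p0 (normal approximation)."""
    se = sqrt(p0 * (1 - p0) / n)   # standard error of the proportion under H0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)  # P(Z <= z), the lower tail
    return z, p_value

# Figures quoted in the question
z, p_value = one_sample_proportion_z_test(p_hat=0.28, p0=0.37, n=100)
alpha = 0.01  # corresponds to the 99% confidence level

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these figures the p-value comes out at roughly 0.03, which is above 0.01, so under these assumptions the sample would not be enough to claim an improvement at the 99% level.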

  3. The common proportion pi is calculated with denominator 200 = n_1 + n_2. But as Sun Kim pointed out above, in this example, I think the denominator for the standard deviation estimate should be 100, not 200, because the estimate of the common variance (when pi_1 = pi_2 = pi) is:

    pi*(1 – pi)*(1/n_1 + 1/n_2) (and not pi*(1 – pi)/(n_1 + n_2))

    and when n_2 = n_1 = n, we have

    2*pi*(1 – pi)/n

    Am I wrong?

    • Hi Hakan,
      You are correct. Thanks for catching this error. I have now corrected the error on the webpage.
      Apologies too to Sun Kim since I overlooked her comment.
      Charles
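
The point raised in this thread can be checked numerically. The sketch below computes the pooled standard error with the (1/n_1 + 1/n_2) factor and compares it with the version that divides by n_1 + n_2. The counts used (60 and 45 successes out of 100 each) are hypothetical, since the data of the example under discussion are not reproduced in these comments.

```python
from math import isclose, sqrt

def pooled_standard_error(x1, n1, x2, n2):
    """Standard error of p1 - p2 under H0: pi_1 = pi_2, using the pooled estimate."""
    pi = (x1 + x2) / (n1 + n2)  # common proportion, denominator n1 + n2
    return sqrt(pi * (1 - pi) * (1 / n1 + 1 / n2))

# Hypothetical counts with equal sample sizes n1 = n2 = 100
x1, n1, x2, n2 = 60, 100, 45, 100
pi = (x1 + x2) / (n1 + n2)

se = pooled_standard_error(x1, n1, x2, n2)
se_equal_n = sqrt(2 * pi * (1 - pi) / n1)           # simplification when n1 = n2 = n
se_wrong = sqrt(pi * (1 - pi) / (n1 + n2))          # the version questioned above

print(f"pooled SE            = {se:.5f}")
print(f"sqrt(2*pi*(1-pi)/n)  = {se_equal_n:.5f}")
print(f"dividing by n1 + n2  = {se_wrong:.5f}")
```

With equal sample sizes the first two values coincide, and the third is exactly half of them, which would double the test statistic.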

  4. The number of credit card holders of a bank in two different cities (city X and city Y) settling their excess withdrawal amounts in time without attracting interest follows a binomial distribution. The manager (collections) of the bank feels that the proportion of such credit card holders in city X is not different from the proportion of such credit card holders in city Y. To test his intuition, a sample of 200 credit card holders is taken from city X, and it is found that 160 of them are settling their excess withdrawal amount in time without attracting interest. Similarly, a sample of 180 credit card holders is taken from city Y, and it is found that 50 of them are settling their excess withdrawal amount in time without attracting interest. Check the intuition of the manager at a significance level of 0.05.

    Using the binomial distribution model, the test of whether the bank manager's intuition is correct is shown below.
    H0: π1 = π2

    City X = 200 credit card holders, 160 of whom settle their excess withdrawal in time
    City Y = 180 credit card holders, 50 of whom settle their excess withdrawal in time

    Significance level = 0.05

    City X: 80% of customers settle their excess withdrawal in time (160/200 = 0.8)
    City Y: about 28% of customers settle their excess withdrawal in time (50/180 ≈ 0.278)
    Under H0 we estimate the common proportion from the sample, namely 160 + 50 = 210 successes out of 380, i.e. π ≈ 0.55

    The observed value of x – y is .80 – .278 = .522, with standard error sqrt(.553 × .447 × (1/200 + 1/180)) ≈ .051, and so we have (two-tail test):

    z ≈ 10.2, P-value < 0.001 < 0.05
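
The figures in this comment can be rechecked from the raw counts. The sketch below runs a two-tailed, pooled two-sample z-test for proportions on 160 out of 200 and 50 out of 180; the function name is hypothetical, and the normal approximation is an assumption that the comment does not spell out.

```python
from math import erfc, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tailed z-test of H0: pi_1 = pi_2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed p-value 2 * P(Z > |z|); erfc stays accurate in the far tail
    p_value = erfc(abs(z) / sqrt(2))
    return p1, p2, pooled, z, p_value

p1, p2, pooled, z, p_value = two_proportion_z_test(160, 200, 50, 180)

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, pooled = {pooled:.3f}")
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Run on these counts, the test statistic is a little above 10 and the p-value is vanishingly small, so under these assumptions the hypothesis of equal proportions would be rejected at the 0.05 level.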

  5. How do I tackle this case study?

    Question for CASE STUDY 1
    The number of credit card holders of a bank in two different cities (city X and city Y) settling their excess withdrawal amounts in time without attracting interest follows a binomial distribution. The manager (collections) of the bank feels that the proportion of such credit card holders in city X is not different from the proportion of such credit card holders in city Y. To test his intuition, a sample of 200 credit card holders is taken from city X, and it is found that 160 of them are settling their excess withdrawal amount in time without attracting interest. Similarly, a sample of 180 credit card holders is taken from city Y, and it is found that 50 of them are settling their excess withdrawal amount in time without attracting interest. Check the intuition of the manager at a significance level of 0.05.

  6. Charles,
    Would you review the standard deviation calculation for Example 4? As the sample size is the same, the denominator should have been 100 instead of 200 to have the correct sample SD.

    -Sun

    • Hi Sun,
      Apologies for overlooking your comment from a long time ago. Hakan just brought up the same issue.
      You are correct that the denominator should be 100. I have belatedly corrected this error. Thanks for bringing this to my attention and sorry that I didn’t see it earlier.
      Charles

  7. Hi Charles,

    I have a question concerning the two-sample hypothesis test.
    I want to conduct a power analysis in order to determine the sample size needed to compare two different proportions. I think it would be difficult for me to satisfy the condition [ ni πi ≥ 5 and ni (1 – πi) ≥ 5 ], since πi is very close to 0 (on the order of 0.00008).

    By searching a little, I found that we could use Fisher’s exact test, but I don’t know how to conduct a power analysis for this test. Could you recommend a book or a reference that explains how to calculate the power for different tests?

    Best Regards,

    Khibox

  8. Sir,
    Is it fair to say, formally, that each of the 600 people asked, X1, X2, ..., X600, is a proportionally distributed random variable with mean p and variance p(1-p)/n, so that
    by the CLT:

    X = (X1 + X2 + ... + X600)/600

    is normally distributed with mean equal to the pop. mean and variance equal to the pop. variance?
    Thanks.
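
One way to explore this question numerically is to model each response Xi as a 0/1 variable with success probability p, which is an assumption of this sketch and not something stated in the comment. Under that model each Xi has mean p and variance p(1 − p), while their average has mean p and variance p(1 − p)/600 and is only approximately normal. The value p = 0.4 is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 600, 0.4  # p = 0.4 is a hypothetical value chosen for illustration

# Exact distribution of the count S = X1 + ... + Xn, where each Xi is 0 or 1
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Mean and variance of the sample proportion S/n
mean = sum((k / n) * pmf[k] for k in range(n + 1))
variance = sum((k / n - mean) ** 2 * pmf[k] for k in range(n + 1))

# P(216 <= S <= 264), i.e. the proportion lies within 0.04 of p
lo, hi = 216, 264
exact = sum(pmf[lo:hi + 1])

# Normal approximation with the same mean and variance (continuity-corrected)
normal = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
approx = normal.cdf((hi + 0.5) / n) - normal.cdf((lo - 0.5) / n)

print(f"mean = {mean:.6f}, variance = {variance:.8f}")
print(f"exact probability = {exact:.4f}, normal approximation = {approx:.4f}")
```

The exact and approximate probabilities should be close but not identical, which is the sense in which the CLT applies here.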

  9. Hi Charles,

    I have a question about Example 2.

    To find the confidence interval at 95% I used the Excel equation =CONFIDENCE(0.025, 0.01505, 1100) and got the value 0.00102.

    I’m not sure why this formula doesn’t return the same value as your calculation, since the two should be equivalent. How did I set up the CONFIDENCE formula incorrectly?

    • Jonathan,
      I see two problems with the formula =CONFIDENCE(0.025, 0.01505, 1100)
      1. You need to use .05 instead of .05/2 = .025
      2. This formula uses the standard deviation and not the standard error as the second argument. The standard error is then calculated from the standard deviation and the sample size (the third argument).
      Charles
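
To make the two points above concrete, the sketch below mimics Excel's CONFIDENCE(alpha, standard_dev, size), which returns the two-sided normal critical value for alpha multiplied by standard_dev/sqrt(size). The Python function name is hypothetical, and treating 0.01505 as the standard error from Example 2 is an assumption based on the reply above.

```python
from math import isclose, sqrt
from statistics import NormalDist

def excel_confidence(alpha, standard_dev, size):
    """Mimic Excel's CONFIDENCE: z critical value times standard_dev / sqrt(size)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z_crit * standard_dev / sqrt(size)

se, n = 0.01505, 1100  # 0.01505 assumed to be the standard error, not the standard deviation

# The formula from the question: alpha halved and the standard error divided by sqrt(n) again
as_entered = excel_confidence(0.025, se, n)

# Using alpha = 0.05 and converting the standard error back to a standard deviation
corrected = excel_confidence(0.05, se * sqrt(n), n)

# Same margin computed directly as critical value times standard error
direct = NormalDist().inv_cdf(0.975) * se

print(f"as entered = {as_entered:.5f}")
print(f"corrected  = {corrected:.5f}")
print(f"direct     = {direct:.5f}")
```

The first value matches the 0.00102 reported in the question, while the corrected call agrees with the direct calculation of the critical value times the standard error.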

