If x is a random variable with binomial distribution B(n, p) then the random variable y = x/n is said to have a proportion distribution.
Topics
- Basic concepts
- One-sample hypothesis testing
- Two-sample hypothesis testing
- Cohen’s h effect size
- Confidence intervals
- Real Statistics support
References
Saylor Academy (2012) The sample proportion. Introductory Statistics
https://saylordotorg.github.io/text_introductory-statistics/s10-03-the-sample-proportion.html
Wikipedia (2012) Population proportion
https://en.wikipedia.org/wiki/Population_proportion
Hi
What is the formula for the standard deviation over 1100 in Excel, please?
Waiting for your answer with many thanks
Regards
Luma
Sorry, but I don’t understand your question.
Charles
Can you please try to solve this? The department conducted a study a number of years ago that showed that the proportion of cars tested which failed to meet the state pollution standard was 37%. The department would like to be able to say that the cars have improved since then. In a more recent sample of 100 cars, the proportion not meeting the standard was 28%. Are the cars better at meeting the standards than they used to be? Clearly state the null and alternative hypotheses. Perform the test at a 99% level of confidence.
Hello Nikita,
What do you think the null and alternative hypotheses should be?
What test should you use?
Charles
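One possible reading of the question above can be sketched in Python. The sketch assumes H0: p = 0.37 against the one-sided alternative H1: p < 0.37, uses the normal approximation to the binomial, and takes alpha = 0.01 for the 99% level. The function name and these choices are illustrative assumptions, not the method prescribed on this webpage.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(p_hat, p0, n):
    """Return (z, lower-tail p-value) for H0: p = p0 vs H1: p < p0.

    Uses the normal approximation, so it assumes n*p0 and n*(1-p0)
    are both reasonably large (here 37 and 63).
    """
    se = sqrt(p0 * (1 - p0) / n)      # standard error under H0
    z = (p_hat - p0) / se             # standardized observed proportion
    return z, NormalDist().cdf(z)     # P(Z <= z) for the lower tail


# Figures from the question: 37% historical failure rate,
# 28% failures in a recent sample of 100 cars.
z, p_value = one_sample_proportion_z(p_hat=0.28, p0=0.37, n=100)
alpha = 0.01  # assumed significance level for "99% confidence"

print(f"z = {z:.3f}, one-tailed p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

Under these assumptions z is roughly -1.86 and the one-tailed p-value is roughly 0.03, which is above 0.01, so the sample would not be enough to claim an improvement at the 99% level.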
The common proportion pi is calculated with denominator 200 = n_1 + n_2. But as Sun Kim pointed out above, in this example, I think the denominator for the standard deviation estimate should be 100, not 200, because the estimate of the common variance (when pi_1 = pi_2 = pi) is:
pi*(1 – pi)*(1/n_1 + 1/n_2) (and not pi*(1 – pi)/(n_1 + n_2))
and when n_2 = n_1 = n, we have
2*pi*(1 – pi)/n
Am I wrong?
Hi Hakan,
You are correct. Thanks for catching this error. I have now corrected the error on the webpage.
Apologies too to Sun Kim since I overlooked her comment.
Charles
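The point raised by Hakan can be checked numerically. The sketch below compares the pooled variance pi*(1 - pi)*(1/n_1 + 1/n_2) with the expression pi*(1 - pi)/(n_1 + n_2) for n_1 = n_2 = 100. The value pi = 0.5 is a hypothetical placeholder chosen for illustration, not the figure from Example 4.

```python
from math import isclose, sqrt


def pooled_variance(pi, n1, n2):
    """Variance of the difference of two sample proportions when pi_1 = pi_2 = pi."""
    return pi * (1 - pi) * (1 / n1 + 1 / n2)


n = 100    # equal sample sizes, as in the discussion above
pi = 0.5   # hypothetical common proportion, for illustration only

correct = pooled_variance(pi, n, n)
equal_n_form = 2 * pi * (1 - pi) / n      # simplified form when n_1 = n_2 = n
mistaken = pi * (1 - pi) / (n + n)        # denominator n_1 + n_2 = 200

print(isclose(correct, equal_n_form))     # the two correct forms agree
print(sqrt(correct), sqrt(mistaken))      # standard deviations under each formula
```

With equal sample sizes the mistaken variance is a quarter of the correct one, so the standard deviation comes out half as large as it should and the test statistic is inflated by a factor of two.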
The number of credit card holders of a bank in two different cities (city X and city Y) settling their excess withdrawal amounts in time without attracting interest follows a binomial distribution. The manager (collections) of the bank feels that the proportion of such credit card holders in city X is not different from the proportion of such credit card holders in city Y. To test his intuition, a sample of 200 credit card holders is taken from city X, and it is found that 160 of them are settling their excess withdrawal amount in time without attracting interest. Similarly, a sample of 180 credit card holders is taken from city Y, and it is found that 50 of them are settling their excess withdrawal amount in time without attracting interest. Check the intuition of the sales manager at a significance level of 0.05.
My use of the binomial distribution model to determine whether the bank manager's intuition is correct is shown below.
H0: π1 = π2
City X = 200 credit card holders, 160 customers settling excess withdrawals in time
City Y = 180 credit card holders, 50 customers settling excess withdrawals in time
Significance level = 0.05
City X: 80% of customers settle their excess withdrawals in time (160/200 = 0.8)
City Y: about 30% of customers settle their excess withdrawals in time (50/180 ≈ 0.3)
We estimate the common proportion π from the sample, namely, 160 + 50 = 210 successes out of 380, i.e. π = 0.55
The observed value of x – y is .80 – .30 = .50, and so we have (two-tail test):
P-value = 0.036 < 0.05
What is your question?
Charles
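For readers who want to check the figures in the comment above, here is a minimal sketch of a two-sample test of proportions using the pooled estimate and the normal approximation. The function name is hypothetical, and the sketch is one common form of the test, not necessarily the exact procedure used in Example 4 on this webpage.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-tailed p-value) for H0: pi_1 = pi_2.

    Uses the pooled proportion to estimate the common variance
    pi*(1 - pi)*(1/n1 + 1/n2) under the null hypothesis.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z|/sqrt(2)) equals 2*(1 - Phi(|z|)) and stays accurate
    # for very small tail probabilities.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# City X: 160 of 200; city Y: 50 of 180 (figures from the comment).
z, p_value = two_proportion_z_test(160, 200, 50, 180)
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.3g}")
```

With these inputs z comes out at about 10.2, so the two-tailed p-value is many orders of magnitude smaller than the 0.036 quoted in the comment. Either way the null hypothesis of equal proportions is rejected at the 0.05 level.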
How do I tackle this case study?
Question for CASE STUDY 1
The number of credit card holders of a bank in two different cities (city X and city Y) settling their excess withdrawal amounts in time without attracting interest follows a binomial distribution. The manager (collections) of the bank feels that the proportion of such credit card holders in city X is not different from the proportion of such credit card holders in city Y. To test his intuition, a sample of 200 credit card holders is taken from city X, and it is found that 160 of them are settling their excess withdrawal amount in time without attracting interest. Similarly, a sample of 180 credit card holders is taken from city Y, and it is found that 50 of them are settling their excess withdrawal amount in time without attracting interest. Check the intuition of the sales manager at a significance level of 0.05.
The problem is similar to Example 4 on this webpage.
Charles
Charles,
Would you review the standard deviation calculation for Example 4? As the sample size is the same, the denominator should have been 100 instead of 200 to have the correct sample SD.
-Sun
Hi Sun,
Apologies for overlooking your comment from a long time ago. Hakan just brought up the same issue.
You are correct that the denominator should be 100. I have belatedly corrected this error. Thanks for bringing this to my attention and sorry that I didn’t see it earlier.
Charles
This is a two-sample hypothesis test. One option is to use the approach described for Example 4 on this webpage.
Charles
Hi Charles,
I have a question concerning the two sample hypothesis test.
I want to conduct a power analysis to determine the sample size needed to compare two different proportions. I think it would be difficult for me to satisfy the condition [ ni πi ≥ 5 and ni (1 – πi) ≥ 5 ] since πi is very close to 0 (on the order of 0.00008).
By searching a little, I found that we could use Fisher’s exact test, but I don’t know how to conduct a power analysis for this test. Could you recommend a book or reference that explains how to calculate power for different tests?
Best Regards,
Khibox
Khibox,
One approach is to estimate the power using the chi-square test. I show how to estimate the effect size using this approach on the website. This approach is described at
http://www.biostathandbook.com/fishers.html
Another approach is to use Monte Carlo simulation. This is described at
https://stats.stackexchange.com/questions/133441/computing-the-power-of-fishers-exact-test-in-r
https://stats.stackexchange.com/questions/35940/simulation-of-logistic-regression-power-analysis-designed-experiments/35994#35994
Charles
Hi Charles,
Thank you for the quick reply. I will take a look at these references.
I appreciate your work and effort in this amazing website.
Best Regards,
Khibox
Sir,
Is it fair to say, formally, that each of the 600 people asked, X1, X2, ..., X600, is a proportionally distributed random variable with mean p and variance p(1-p)/n, so that
by the CLT:
X = (X1 + X2 + ... + X600)/600
is normally distributed with mean equal to the pop. mean and variance equal to the pop. variance?
Thanks.
Guero,
Sorry but I don’t understand what X1, X2,..,X600 and p are.
Charles
Hi Charles,
I have a question about Example 2.
To find the confidence interval at 95% I used the Excel equation =CONFIDENCE(0.025, 0.01505, 1100) and got the value 0.00102.
I’m not sure why this formula doesn’t return the same value as your calculation, since the two should be equivalent, or how I set up the CONFIDENCE formula incorrectly.
Jonathan,
I see two problems with the formula =CONFIDENCE(0.025, 0.01505, 1100)
1. You need to use .05 instead of .05/2 = .025
2. This formula uses the standard deviation and not the standard error as the second argument. The standard error is then calculated from the standard deviation and the sample size (the third argument).
Charles
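The two problems listed above can be illustrated with a small Python stand-in for Excel's CONFIDENCE(alpha, standard_dev, size), which returns z_crit * standard_dev / sqrt(size). The function name excel_confidence is hypothetical, and the sketch assumes that 0.01505 in the question is a standard error, not a standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def excel_confidence(alpha, standard_dev, size):
    """Mimic Excel's CONFIDENCE: half-width of a (1 - alpha) normal interval.

    The second argument is a standard deviation; the function divides
    it by sqrt(size) itself to obtain the standard error.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * standard_dev / sqrt(size)


se = 0.01505   # assumed to be the standard error from the question
n = 1100

# The formula as entered in the question: alpha halved, and the
# standard error passed where a standard deviation is expected.
as_entered = excel_confidence(0.025, se, n)

# Corrected call: alpha = 0.05, and the standard error converted back
# to a standard deviation before being passed in.
corrected = excel_confidence(0.05, se * sqrt(n), n)

# Equivalent direct calculation: critical value times standard error.
direct = NormalDist().inv_cdf(0.975) * se

print(f"{as_entered:.5f} {corrected:.5f} {direct:.5f}")
```

The call as entered reproduces the 0.00102 reported in the question, while the corrected call and the direct calculation agree at about 0.0295.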