One-Sample t-Test

Assumptions

The t distribution provides a good way to perform one-sample tests on the mean when the population variance is not known provided the population is normal or the sample is sufficiently large so that the Central Limit Theorem applies (see Properties 1 and 2 of Basic Concepts of t Distribution).

It turns out that the one-sample t-test is quite robust to moderate violations of normality. In particular, the test provides good results even when the population is not normal or the sample size is small, provided that the sample is reasonably symmetrically distributed about the sample mean. This can be determined by graphing the data. The following are indications of symmetry:

  • The boxplot is relatively symmetrical; i.e. the median is in the center of the box and the whiskers extend equally in each direction
  • The histogram looks symmetrical
  • The mean is approximately equal to the median
  • The coefficient of skewness is relatively small

The impact of non-normality is less for a two-tailed test than for a one-tailed test and for higher alpha values than for lower alpha values.

The other assumption for the t-test is that we have a random sample. If, for example, we are interested in the mean cholesterol level of a population, then our sample must consist of the cholesterol levels of people chosen at random.  We can’t use the t-test for a sample consisting of cholesterol levels for the same person at different points in time.

One-tailed t-test

Example 1: A weight reduction program claims to be effective in treating obesity. To test this claim 12 people were put on the program and the number of pounds of weight gain/loss was recorded for each person after two years, as shown in columns A and B of Figure 1. Can we conclude that the program is effective?

One sample t test

Figure 1 – One sample t-test

A negative value in column B indicates that the subject gained weight. We judge the program to be effective if there is some weight loss at the 95% significance level. Usually, we conduct a two-tailed test since there is a risk that the program might actually result in weight gain rather than loss, but for this example, we will conduct a one-tailed test (perhaps because we have evidence, e.g. from an earlier study, that overall weight gain is unlikely). Thus our null hypothesis is:

H0: μ ≤ 0; i.e. the program is not effective

Testing Assumptions

From the box plot in Figure 2, we see that the data is quite symmetric and so we use the t-test even though the sample is small.

Box plot symmetry test

Figure 2 – Box plot for sample data

Test Results

Column E of Figure 1 contains all the formulas required to carry out the t-test. Since Excel only displays the values of these formulas, we show each of the formulas (in text format) in column G so that you can see how the calculations are performed.

We see that n = 12, = 4.67, s = 11.15 and s.e. = s\sqrt{n} = 3.22. We now conclude that

with df = 11 degrees of freedom.

Since p-value = T.DIST.RT(t, df) = T.DIST.RT(1.45, 11) = .088 > .05 = α, the null hypothesis is not rejected. This means there is an 8.8% probability of achieving a value for t this high assuming that the null hypothesis is true, and since 8.8% > 5% we can’t reject the null hypothesis.

The same conclusion is reached since

tcrit = T.INV(1-α, df) = T.INV(.95, 11) = 1.80 > 1.45 = tobs

Note that if we had used the normal distribution for the hypothesis testing as described in Sampling Distributions we would have gotten the following result:

NORM.DIST(x, μ, σ, TRUE) = NORM.DIST(4.67, 0, 11.15, TRUE) = .926 < .95 = 1 – α

This would again show that the null hypothesis can’t be rejected. We see that the probability of being in the critical range is .074 compared to .088 in the t-distribution case. In fact, the large sample test (via the normal distribution) is not as accurate as the small sample t-test.

Two-tailed t-test

Example 2: A school board wanted to see if reading test scores have changed in the past 30 years by testing a random sample of 40 students to see whether there is a significant change from the average score of 78 thirty years ago. The scores in the sample are shown in Figure 3.

Figure 3 – Random sample data

Based on this data, can we claim that the reading scores have changed in the past 30 years?

We will test the two-sided null hypothesis:

H0: µ = 78

Normality Assumption

From Excel’s Histogram data analysis tool, we see that the data is reasonably symmetric. This is confirmed by Excel’s Descriptive Statistics data analysis tool since the mean and median are approximately equal and the skewness is close to zero (see Figure 4). This justifies the use of a  t-test.

Symmetry testing

Figure 4 – Testing for symmetry

Actually, we can use the QQ-Plot and Shapiro-Wilk options of the Real Statistics Descriptive Statistics and Normality data analysis tool, as shown in Figure 5, to demonstrate directly that the data satisfy the normality assumption.

Figure 5 – Testing for normality

Data Analysis Tool

This time we will use a Real Statistics data analysis tool to perform the analysis.

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides a data analysis tool called T Tests and Non-parametric Equivalents, which provides access to the t-test for one sample, two independent samples, and paired samples, as well as the non-parametric equivalent tests (Mann-Whitney and Wilcoxon Signed-Ranks tests).

For Example 2, enter Ctrl-m and select T Tests and Non-parametric Equivalents from the menu (or from the Misc tab when using the Multipage interface). A dialog box will appear as displayed in Figure 6.

T-Test dialog box

Figure 6 – Dialog box for T Tests and Non-parametric Equivalents

Enter A5:D14 in the Input Range and 78 for the Hypothetical Mean/Median, unclick Column headings included with data, and choose the One sample and T test options. When you click on the OK button, the output shown in Figure 7 is displayed.

t test one sample

Figure 7 – Real Statistics one-sample t-test

Analysis

From the figure, we see that

image687

with n – 1 = 39 degrees of freedom.

Thus, p-value = T.DIST.2T(ABS(t), df) = T.DIST.2T(3.66, 39) = .00074 < .05 = α, and so we reject the null hypothesis and conclude there is a significant change (i.e. reduction) in the test scores.

Alternatively we can calculate, tcrit = T.INV.2T(α, df) = T.INV.2T(.05, 39) = 2.02. Since |tobs| = 3.66 > 2.02  once again we reject the null hypothesis.

Note that in Excel 2007 we calculate  p-value = TDIST(t, df, 2) = TDIST(3.66, 39, 2) = .00074 < .05 = α.

The input data for the one-sample t-test can have missing data, indicated by empty cells or cells with non-numeric data. Such cells will be ignored in the analysis.

Worksheet Function

As described in Paired T-Test and Two-Sample T-Test, Excel provides a T.TEST function that supports paired-sample and two-sample t-tests, but not the one-sample t-test. The following Real Statistics function fills in this gap.

Real Statistics Function: The Real Statistics Resource Pack provides the following function:

T1_TEST (R1, hyp, tails) = the p-value of the one-sample t-test for the data in array R1 based on the hypothetical mean hyp (default 0) where tails = 1 or 2 (default).

For Example 2, the formula T1_TEST(A5:D14, 78, 2) will output the same value shown in cell Q56 of Figure 5, namely p-value = .000737.

Confidence interval

As described in Confidence Intervals for Sampling Distributions, we can define the confidence interval associated with the t distribution as

observed mean ± tcrit ⋅ std error

Example 3: Calculate the 95% confidence interval for Example 2

meanobs ± tcrit ⋅ std err = 68.40 ± 2.02 ⋅ 2.62 = 68.40 ± 5.30

This yields a 95% confidence interval of (63.10, 73.70). Since the interval doesn’t contain the hypothetical mean of 78, once again we are justified in rejecting the null hypothesis (with 95% confidence).

Note that the endpoints of the confidence interval, 63.10 and 73.70, are displayed in cells S56 and T56 of Figure 7.

Excel Function

Versions of Excel starting with Excel 2010 provide the following function to calculate the confidence interval for the t distribution.

CONFIDENCE.T(α, s, n) = k such that ( – k,  + k) is the confidence interval of the sample mean; i.e. CONFIDENCE.T(α, s, n) = tcrit ∙ std error, where n = sample size, s = sample standard deviation and 100(1 – α) is the confidence percentage.

For Example 2, CONFIDENCE.T(.05,16.57,40) = 5.30 which yields a 95% confidence interval of (68.40 – 5.30, 68.40 + 5.30) = (63.10, 73.70). The same result was obtained from the T Test and Non-parametric Equivalents data analysis tool, as shown in range D56:T56 of Figure 5.

Observation

Excel’s Descriptive Statistics data analysis tool has an option for generating the confidence interval for a sample or collection of samples using the t distribution. Referring to Figure 2 of Descriptive Statistics Tools, to choose this option click on the Confidence Interval for Mean checkbox and specify the confidence percentage (i.e. 1 – α) if you want to override the default of 95%. E.g. from Figure 4, we see that the 95% confidence interval value in the Descriptive Statistics output for the data in Example 2 is 5.299731.

Real Statistics Functions

The Real Statistics Resource Pack provides the following functions:

STDERR(R1) = standard error of the data in the range R1 = STDEV(R1) / SQRT(COUNT(R1))

CONFIDENCE_T(α, s, n) – equivalent to CONFIDENCE.T(α, s, n); useful for Excel 2007 and earlier versions of Excel

T_CONF(R1, α) = k such that ( – k + k) is the 1 – α confidence interval of the sample mean for the data in range R1 based on the t distribution

T_LOWER(R1, α) = the lower end,  – k, of the 1 – α confidence interval of the sample mean for the data in range R1 based on the t distribution

T_UPPER(R1, α) = the upper end,  + k, of the 1 – α confidence interval of the sample mean for the data in range R1 based on the t distribution

If α is omitted it defaults to .05. These functions ignore any empty or non-numeric cells.

One-tailed Confidence Interval

Example 4: Calculate the one-tailed 95% confidence interval for Example 1.

Since the null hypothesis is H0: μ ≤ 0, the one-tailed confidence interval for the population mean takes the form

(s.e.tcrit, ∞)

where tcrit is the one-tailed critical value  From Figure 1, we see that

s.e.tcrit4.66667 – 3.220048 ⋅ 1.795885 = -1.110616

Since 0 lies in the confidence interval (-1.11616, ∞), we again conclude that there isn’t sufficient evidence to reject the null hypothesis.

Note that if the null hypothesis were H0: μ ≥ 0, then the one-tailed confidence interval would be

(-∞, s.e.tcrit)

Prediction interval

Suppose that the mean and standard deviation of a random sample of size n are  and s, respectively. We can then estimate the mean of a new sample of size m to also be . The precision of this prediction would then be

Prediction interval

This is called the 1 – α prediction interval.

Example 5: Calculate the 95% prediction interval of one additional data element randomly selected from the population from Example 2.

From Figure 7, we see that  = 68.40, s2 = 274.6051, n = 40, and tcrit = 2.0227 (two-tailed with α = .05). Thus the 95% prediction interval for one additional data element is

Prediction interval calculation

and so the prediction interval is (34.47, 102.33). Note that the prediction interval for the mean of another random sample with 40 elements would be (60.91, 75.89), which would be a narrower interval. In any case, this is wider than the 95% confidence interval of (63.10, 73.70). In fact, it is easy to see that the prediction interval is always larger than the confidence interval.

Effect size

Click here for a description of Cohen’s d and Hedges’s g effect size measures for the one-sample t-test. In addition, an estimate of the confidence interval for Cohen’s d is provided.

Statistical Power and Sample Size

Click here for a description of how to calculate the power of a t-test using the same approach as we did in Power of a Sample for the normal distribution. We also show how to estimate the minimum sample size required for a one-sample t-test.

In Statistical Power of the t Tests we show another way of computing statistical power using the noncentral t distribution.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

Nelson, S. L. and Nelson, E. C. (2016) How to use the t-test data analysis tool in Excel
https://www.dummies.com/article/technology/software/microsoft-products/excel/how-to-use-the-t-test-data-analysis-tool-in-excel-152093/

Zar. J. H. (2010) Biostatistical analysis 5th Ed. Pearson
https://bayesmath.com/wp-content/uploads/2021/05/Jerrold-H.-Zar-Biostatistical-Analysis-5th-Edition-Prentice-Hall-2009.pdf

167 thoughts on “One-Sample t-Test”

  1. Dear Sir,
    One more question from my side. Please clarify it.
    If Population is not normal, and σ known, but sample size is small (n=30 or n=30 or n=30), Use test statistic Z
    3.2: If Population is not Normal, σ known, Sample size is small (n=30 or n<30), Use test statistic T
    Please let me know this concept is right?

    Reply
  2. Thanks for this.

    I did a short educational program with 20 students and had a five question pre test, and post test. I want to do a test with the overall average of the scores pre and post, and then an average of the score for each question. Is the method the same as the t-test paired two samples for means? Meaning, I would need to type the scores for each student pre and post overall, and then individually for each question?

    Reply
  3. Dear Sir for comparing mean (one or two sample/s) or for non normal population that variance is known or unknown , we can use test statistic Z provided sample size n is more than or equal to 30, as per the central limit theorem, ( we can replace s instead of sigma, if sigma is unknown).

    Is it right?

    if sample size is small (n<30), population variance is known or unknown of the non-normal distribution, shall we have only way to do non parametric method to compare the mean?

    Please give answer to these questions. I am earnestly waiting for the answer

    Reply
    • Vlavan,
      The statement in the first paragraph of your comment is generally correct. The number 30 is an estimate.
      When the variance is not known it is better to use the t test. This works for small or large samples. It is better since the test explicitly uses the sample variance and. It is certainly better for small samples.
      The t test is also relatively robust to violations of normality. When the sample is clearly not normal (e.g. heavily skewed), then you might consider using a nonparametric test.
      Charles

      Reply
  4. Sorry Charles, last question 🙂

    In Figure 14 above, why are the two t-distribution not the same? For H0 and H1, we need a t(n-1) distribution centered on the origin, don’t we?
    Again, many thanks in advance!
    Fred

    Reply
  5. Hello again Charles,
    For a 2-tailed test you seem to be doubling the alpha value, for instance if alpha=0.05 for a one tail test, you would use 0.10, so 0.05 in both tails. I have also seen people use alpha=0.025 in each tail (so 5% in total). Is there any best practice for that?
    Many thanks!
    Fred

    Reply
  6. Hello Every One,
    I have a got a vale which is a mean of 5 observations. I have lost the data of number of observations (5).
    Is there any formula in excel through which I could get the values of expected observations by specifying the standard deviations.
    Regards

    Rizwan Ali Ansari
    email: rizwans.ansari@gmail.com

    Reply
  7. Hello,

    I have one column containing a 1 or a 0 according to the respondent’s sex.
    A second column has the data about the IQ of the respondent.
    How do I run a t-test between the boys and the girls?
    I don’t want to have to create new columns with boys only or girls only.
    Thanks!
    Mark

    Reply
    • Mark,
      Suppose that the range with the 0s and 1s is A1:A100 and the range with the IQs is B1:B100, then you can conduct a two-tailed t-test (assuming heterogeneous variances) using the array formula: =TTEST(IF(A1:A100=0,B1:B100,””),IF(A1:A100=1,B1:B100,””),2,3).
      Charles

      Reply
  8. Hello sir I m doing Final year MBA and Sir my final year project Topic is Long Run Performance of Right Issues and Its Effect on Share Performance, so Sir can u please suggest me which test and statistical tools i m suppose to conduct to fulfill my Objectives
    with the data i have collected is 10 firms offering Right issues, i have the firm’s 60 months Volume traded and Adj close data and for my second objective i have data of firm’s share price 20 days before Right issue offer and 20 days after right issue offer
    so sir can u please help me out

    Reply
  9. Thank for your prompt reply. Here I will try to describe the challenge I encountered. As I mentioned last time the dependent variable of my topic is “market share” which I couldn’t collect data on, because the subject company is single. So I developed a questinnaire (with 5 point likert scale, Interval data type) having questions like: 1. Product Quality has a significant effect in improving market share of the subject company. 2. Increasing product variety has a significant effect on improving market share of the subject company. 3. Good product packaging has a significant effect on improving market share of the subject company. And the like…… By the way the independent variables are 4Ps(product, price, promotion, and place) of the marketing mix. I planned to use a one sample t-test, to compaire a hypothesised mean to the population mean. Would yuo please give me advise on whether and how I can do this?

    Reply
    • If I understand correctly, you are going to do a number of one-sample t tests. In each t test you will compare the mean score from the questionnaire against some fixed score. Is this your approach and what do you plan to use for the fixed scores?

      Regarding how to conduct the one-sample t test, this is described on the referenced webpage, and on further examples elsewhere on the website.

      Charles

      Reply
  10. Hi, sir. I am a final year MBA student. My thesis topic is “the effect of marketing mix elements on market share” since the subject company is single, I couldn’t collect the dependent variable. So I used 17 questions of 4 independent variables with 17 sub variables. Would you please tell me how I can use a one sample t-test?

    Reply
  11. Hello Charles,

    I have a data set of returns and I want to know if the data is significant to tell whether the mean is less than 0. Would I use t.dist.rt or t.dist? In your example one above, I don’t understand why you are using t.dist.rt rather than t.dist.

    Also, let’s say that I can reject the null that the data’s mean is less than 0. How would I then test whether or not the mean is greater than 0? Is it simply using the opposite test? IE, if I use a t.dist for less than 0, would I then just run it using t.dist.rt for mean greater than o?

    Thank you!

    Reply
    • Austin,
      I am not sure which of the examples on the referenced webpage you are referring to. In any case, T.DIST.RT is just 1 – T.DIST, and in some sense it really doesn’t matter which function you use. The more important issue is which tail you use, the left or the right.
      Charles

      Reply
  12. Hi,

    Can I ask what is the difference between one sample t-test and one-sample chi-square test? Is this the case that t-test works on parametric variables and chi-square test works on nonparametric variables?

    Thanks!

    Reply
    • Sichao,
      The tests are quite different, as you can see by looking at the details of each test on the Real Statistics website.
      I don’t know what you mean by parametric and nonparametric variable, but the t test is a parametric test, while chi-square is a non-parametric test. The non-parametric version of the t test is not the chi-square test, but the Mann-Whitney test.
      Charles

      Reply
  13. Hi Charles
    Thanks for your very useful info.

    Here’s a question for you: Is it possible to carry out a t-test with n=1 sample size?

    I have a known population (n=52, mean 0.25, std dev 0.20), and a sample of one (value 1.25). I’d like to test whether that single sample is statistically different from the population. Can this be done?

    Many thanks

    Reply
    • Michael,
      In general you can perform testing with very small samples, but you shouldn’t expect too much from the results. In the case of a t-test, it is impossible to do this test with a sample with one element since df = n-1 = 1-1 = 0.
      Charles

      Reply
    • You need to provide more information before I can answer your question. Also, by plot test do you mean a test plot for normality?
      Charles

      Reply
  14. Hi Charles,

    You use this method for the 1 sample test, but the method based on the non central t distribution for the 2 sample tests.

    Is it possible to use either method on either type of test? If not, why?

    Thanks,

    Jonathan

    Reply
      • Thank you Charles.

        I’m currently trying to determine the Beta and Statistical power for all the examples here in order to hone my intuition for these things.

        I tried to calculate the Beta/Power for example 1 on this page with this data for a 2-tailed test:

        C.I: 7.0872 (2.20098*3.22)
        Xcrit: 7.0872 (0 + C.I.)
        T: 0.7517 (7.0872-4.66/3.22)
        Beta: 0.46799 – TDIST(0.7517, 11, 2)
        Power: 0.532 (1-Beta)

        However, if I use T1_Power or the RealStats tool the Power of this test is 0.26304.

        As far as I can see I’m using the method listed on this page correctly so I’m not sure why my results are different if you can use either method interchangeable. (I’m assuming T1_Power uses the NCT method to determine the Power).

        Reply
        • Jonathan,

          My calculations are different from yours. I have done this quickly and so perhaps I have made a mistake, but I get the following:

          t-crit 1.795884819
          x-crit 5.782829113
          mean 4.666666667
          t 0.346629505
          beta 0.632295153
          power 0.367704847

          This not the same as the answer using the noncentral T distribution, although it is closer than the value you calculated. I can’t recall whether the assumptions are different and so the values may not be identical.

          Charles

          Reply
  15. Hi Charles,

    For example 1, you list the Tcrit value as 2.200 in the excel diagram, but since it’s a 1-tailed test shouldn’t it be 1.80 since you need to do 2*P? (You use the 1.80 value later on in your explanation).

    Reply
    • Jonathan,
      You are 100% correct. For some reason, the examples workbook contains the correct calculations, but the website doesn’t. I have just corrected the webpage.
      Thanks for catching this error. Your help is most appreciated.
      Charles

      Reply
  16. Dear sir i have a data for 5 years relating to growth of deposits in bank i have to find weather growth is significant or not how it is possible
    1. 702.79
    2. 753.55
    3. 758.56
    4. 887.32
    5. 1108.73
    please help me in this regard i will be thankful to u

    Reply
    • Mustaq,
      You need to better define what you mean by growth. Do you mean that the values increase or do you mean that they increase exponentially, etc. etc.? Once you have determined which model of growth you are using then you can use the appropriate statistical test. E.g. if you are looking at exponential growth then you can use an exponential regression model, as described on the webpage Exponential Regression.
      Charles

      Reply
  17. Dear Charles,

    I am writing my bachelor thesis in finance. I am testing whether an active trading strategy is better than a buy & hold strategy. Both strategies have the same stocks but the active strategy buys a stock when you have a buy signal from a technical indicator “the moving average” and vice versa, sell the stock when there is a sell signal. After a sell signal, you buy a treasury bill with a risk free rate until a new buy signal.

    The buy and hold strategy just buys the stocks on the first day and keeps it until the last day. I want to test if the active strategy is better.

    I have 212 stocks with daily closing prices during 20 years and have calculated the monthly returns. I am thinking about comparing the difference between the monthly returns by just calculating the difference for each month and then put all differences in a one sample t-test assuming the mean to be 0.

    Am I doing this correct?

    Best regards, Per Forsberg from Stockholm, Sweden

    Reply
    • Dear Per,
      It sounds like you have two factors: the 212 stocks and the 240 monthly returns. Perhaps you are combining the returns on all 212 stocks each month. In this case you could use the t test on the differences of the 240 monthly returns. Stock prices from one month to another aren’t completely independent (i.e. there is autocorrelation), but since you are taking the differences perhaps this approach is valid.
      Charles

      Reply
  18. Hi Charles,

    I have posted this question on another section of your website, i am reposting here since it is affecting all your real statistics functions. When I call the dialog box thorugh Ctrl+m, and try to run for example a t-statistic on one of your real statistics examples, say weight loss, i get the following error:

    —————————
    Microsoft Excel
    —————————
    Alpha must be a number between 0 and .5

    I have tried all values within that range, including the 0.05 default option to no success.
    Can you please help me debug this issue so i can use your tools.

    Best
    Dimitris

    Reply
    • The usual reason for this error is that you are using comma instead of period as the decimal symbol. In this case, you need to enter 0,05 instead of 0.05. If this what is already written in the field just re/enter the value manually.

      Charles

      Reply
      • Dear Charles,

        I have the same problem as Dimitris. I tried all values within that rage, I put as 0.05 as 0,05, but it still doesn’t work. Could you please help to fix this bug?

        Reply
        • Victoria,
          Did you try re-entering 0.05 as 0.05 (i.e. manually writing the exact same value)?
          What language does your Excel use?
          I will try to finally fix this problem which has plagued me from the beginning, but for now please try the above approach and see whether it works.
          Charles

          Reply
          • Hi Charles,

            I also have the same problem. I use the South African/Afrikaans locale in Excel.

            However, your solution (to use a comma) worked in my case. It would be great, however, it the problem could be fixed permanently, so that I don’t have to enter values manually.

            Thanks for your help!
            Gerhard

          • Gerhard,
            I plan to introduce what I hope to be a solution in the next release of the software.
            Charles

  19. In the Calculation of Power section, you have cell I24 = .870985. Using your numbers and even putting your numbers directly into the TDist formula (=TDIST(1.159795,23,1)), I get the same answer every time: 0.129015. I can’t work it out so it gets what you got (.870985). How are you getting that??
    (For those pics, did you use the GETFORMULA UDF to show the formulas? Because for at least I24, the formula you are showing isn’t the formula that is being used to calculate that cell.)

    Reply
    • John,

      The formula used is T.DIST(1.159795,23,TRUE) not =TDIST(1.59795,23,1). T.DIST is the Excel 2010/2013/2016 version of the TDIST function. Actually, I used the Real Statistics function T_DIST, which is identical to T.DIST except that it is available even for versions of Excel prior to Excel 2010.

      Note that =TDIST(1.59795,23,1 is equivalent to =1-T.DIST(1.159795,23,TRUE)

      Generally I use the Real Statistics function FTEXT to show a formula as text. Sometimes, I use an Excel 2007 formula, but show the Excel 2010 or Real Statistics version of the same formula.

      Charles

      Reply
  20. I am confused about something: 1) When the population variance (sigma-squared) is known, we can use Cohen’s d as an estimate of effect size where d = abs (sample mean – population mean)/ Isn’t sigma the symbol for population standard deviation?? Above is talking about population variance (?))

    2) I think you need more symbols to explain these. Like this: “When the population variance (sigma-squared) is unknown, we can use the sample standard deviation (s) as an estimate of the population standard deviation (sigma)”

    Reply
    • John,
      If the population variance is known then the population standard deviation is also known since this is the square root of the variance. Sigma is the symbol for the population standard deviation.
      Charles

      Reply
  21. thank you very much. the service you given through this forum is priceless for the student like me who learn statistics. great service

    Reply
  22. Dear charles,

    Again I back due to problem of reliability test of my data. Before doing Hotelling t test I did Cronbach alpha test for reliability and got negative value for it. then, still can I use Hotelling t test.
    as you mentioned, only the normality test is sufficient for t test ? what can I do if I got negative Cronbach alpha value (value is greater than 0.70).

    Nirosha

    Reply
    • You should be able to use Hotelling’s test no matter what value you got from Cronbach’s alpha, but if your questionnaire is unreliable, you might question the value of any results you get as a result of the questionnaire.

      A negative value for Cronbach’s alpha can mean that your questionnaire is testing more than one thing. If this isn’t your intention then you may need to reword some of your questions or remove one or more questions. You can use Cronbach’s alpha with one question removed to determine whether this will be helpful.

      Charles

      Reply
  23. Thanks, I will briefly explain my problem. I selected 30 Agricultural extension officers to measure their information link with other actors in the sectors such as farmers, research officers, etc.
    and Used seven variables to measure their perception towards the information link with others. ex:
    I have frequent link with Research officers , I have update meeting once a month with research officers … etc and licket scale range highly significant link (+2) ; to (-2) no any significant link.

    finally, I tried to test overall one hypothesis using all theses variables
    and H1: I have significant information link with other actors of the network.

    can i made such hypothesis using multiple variables?
    because, when i run one sample t test, I got t values for each variables and some are significant and some are not.
    then, I do not know how to test above hypothesis using each individual value

    is their any methods to take value of total variables and test my above hypothesis.
    I am sorry , I am very poor in statistical analysis and looking forward hearing from you

    Reply
  24. Hi, I am struggling to use my licket scale data with t- test, I want to test attitudes of AI officers using seven variables with 5 point licket scale. can I use one sample t test to test hypothesis of there is a significant effects of AI officers attitudes towards relationship with farmers.

    how I test one hypothesis using seven variables. I got t value for each variables.
    How I test my hypothesis

    thanks

    Reply
  25. Dear Charles,

    I have data from 82 respondents on a survey that measures service quality. The survey has two sections: Expectations and Perceptions, scores for both sections are in the form of likert responses. I want to show that there is significant difference between expectations scores and perceptions scores. I have seen from previous literature that researchers have made use of parametric tests like the t-tests to prove this significance in their research. However I am unable to identify what kind of t-test exactly to use and how to use it for my research.
    Could you please advise on what kind of t-test i can use to prove the significance of difference between the expectation and perception scores.
    Thanks

    Reply
    • Also, each section has 22 statements, so 22 statements under expectations and 22 statements under perceptions measuring the same 22 variables in both sections

      Reply
    • If the same 82 respondents fill in both parts of the survey, then the appropriate t test is a paired samples t test.

      Since you are using a Likert scale, there is some question as to whether the normality assumption of the t test will be met. If not, you should use the nonparametric version of the t test, namely Wilcoxon signed ranks.

      You also mention that each section has 22 statements, in which case the choice of test depends on which scores you are comparing. If, for example, you add up the 22 Likert scores for each section (for each respondent), then you will get a score which can be considered to be continuous, in which case, it is likely that the t test assumptions will be met and so you can use the t test.

      Charles

      Reply
      • Thank you for this Charles.
        On another note, can I please ask you to clarify my problem with a reliability test that I carried out for this survey. The cronbach’s alpha values were in the range of 0.73 to 0.90 for the pilot tests on 10 responses, deeming the scale to be reliable. However, when I ran a reliability test post survey with 82 responses, the cronbach’s alpha’s ranged from 0.57 to 0.83. What inferences can I make about this for my research.

        Thanks

        Reply
  26. I am quite new to statistical analysis and to the latest version of Excel, as I last did number crunching in the days of Windows 95. I have recently had to relearn all this stuff and was expected to use Minitab, which I dislike. There is so much rubbish on the internet, and many students have to spend hours ploughing through it to find anyhting of value. This site is incredible. The quality of the resources is superb, the style of writing, the layout, the exposition, all first rate. I don’t know how you do it. Keep up the good work.

    Reply
  27. Hi Charles, Thank you very much for this article, very useful!

    Have a question and would like to know your opinion:

    For my business, I am measuring on-time performance. I have a data of deliveries and want to set up the right transit time for planning. At the moment, I am caring only if truck is late. Would you advise me on distribution and test should I use?
    about 150 trucks are in data set, mean – 24, st dev – 10, confidence level needed – 85%

    Thank you a lot!

    Reply
      • Charles,

        I need to set up the value that would capture of 85% of arrivals.
        I digged into the data and it appeared that there are 2 picks which can be easily described: if warehouse didn’t manage to do it on time, it arrives to our store the next day. So there is a gap in between.
        Really have no idea how to measure.

        Thank you!

        Michael

        Reply
        • Michael,
          Unfortunately, this doesn’t tell me what hypothesis you are trying to test. I don’t understand the problem you are trying to solve well information to give you any advice.
          Charles

          Reply
          • As I understand it will be H0 (Hyp. 0) – that hyp. mean is < than real mean with confidence level 85%. Though I am not very sure about it…

  28. Hi Sir,

    How different should the variances be to abandon Student’s T-test and go for Welch’s? I got samples with different variance differences (some around 20, some 3). Also, if the variances do not differ by much, is it OK to do Student’s T-test despite the large sample size?

    Meera.

    Reply
    • Meera,
      I tend to always use the t test with unequal variances instead of the t test with equal variances since if the variance are pretty close then the two tests give almost the same results anyway. In my experience unless the variances are really different (say one is more than 4 times the other), the tests give almost the same p-values.
      I don’t see any reason not to use the t test with a large sample size.
      Charles

      Reply
  29. Dear Sir,

    Should I opt for Welch’s t-test instead of Student’s since my N is way more than 30 (sample size=384)? Also, is it OK to calculate ANOVA with samples of unequal length?

    Regards,
    Meera.

    Reply
    • Meera,

      If by Welch’s t test you mean the t test with unequal variances, then the fact that N is large is not the determining factor for using Welch’s t test. You should use Welch’s t test when the variances of the two samples are very different.

      Yes, you can use ANOVA with samples of unequal size. The websitre shows you how to do this. When samples are of unequal size, the test is less robust to violations of the assumptions (esp. unequal variances).

      Charles

      Reply
  30. Dear Sir,

    I’m back seeking more help. I’m studying the errors made by non-native speakers of English in my state. Did a survey of sample size 384. The mean error = 12.43229, Standard deviation =5.397572. I wanna say that the sample mean can be applied to the population. Am thinking about a single sample t-test. But there has been no previous study, so i got no population mean 2 apply. What should be my null hypothesis? I hv read the ones in examples 1 & 2 but they are different from my situation. Plz do help.

    Regards,
    Meera.

    Reply
    • Meera,
      As you have said, unless you are comparing to some other study or result, there is no t test to perform.
      It sounds like mean error = 12.43229 and Standard deviation =5.397572 captures the situation.
      Charles

      Reply
  31. Hi Mr. Charles,
    I’m doing a thesis regarding the significant decrease of index value in stock market during August for the so-called ‘Ghost Month Effect’. In my methodology, I test the monthly data for 20 years(1994-2013) and recorded if there is decrease that occurred for a particular month and afterwards made a table summarizing the total no. of times(out of 20 years) that a particular month had.

    However, before counting the no. of declines I first assured that the decline that happened is significant using one-sample t-test(one tailed).I used the t-test for the difference of the daily data for each month and below is a sample result.
    May 1995: Mean of the sample=14.5952381, t-statistic=1.908295131, t-critical=1.724718243
    June 1995:Mean of the sample=-0.281904762
    , t-statistic=1-0.065582771, t-critical=1.724718243

    Ho=0
    Ha or < the t-critical regardless of the sign of the sample mean. Hope you can enlighten me.

    Thanks,
    MJ

    Reply
    • MJ,

      Your random variable seems to be x = the difference of the daily stock market index (for a given month). I don’t know what H0=0 means. Does this mean that the mean value of x is 0? This would indicate that if you have one big decline in the month it could have as much weight as a lot of small increases in the month (as opposed to testing whether on average half the days are declines and half are increases). I don’t understand what any of this would mean towards proving (or disproving) your original hypothesis that there is a decline in the indices in August.

      A simple approach could be to use the t-test (or Mann-Whitney if the assumptions don’t hold) to see whether there is a significant difference between the index on the first of the month and last day of the month for 20 years, comparing say July with August. I can think of other approaches that could be suitable as well.

      Charles

      Reply
  32. Charles can you please explain me why do we use one sample T-test. what exactly it is used for. Can i use it for likert scale. How to interpret the result of One sample T-test

    Reply
    • As explained on the referenced webpage, the one sample t test is used to determine whether the mean of a population is significantly different from some fixed value. Examples 1 and 2 on the referenced webpage give examples of when the test is used. The data should be continuous, but the test can be used with ordered discrete data (such as Likert scale) provided the other test assumptions are met.
      Charles

      Reply
  33. Hi Charles,
    I am pretty good with finding the mean and standard deviations, but I am struggling to answer this question for class. Could you please help me.

    The vast majority of the world uses a 95% confidence in building confidence intervals. Give your opinion on why 95% confidence is so commonplace. Justify your response.
    Construct a hypothetical 95% confidence interval for a hypothetical case of your choosing. Use your own unique choice of mean, standard deviation, and sample size to calculate the confidence interval. Select one (1) option provided below and analyze what will happen to your confidence interval based on the option you selected:
    The confidence changes to 90%.
    The confidence changes to 99%.
    The sample size is cut in half.
    The sample size is doubled.
    The sample size is tripled.

    Reply
    • Daniel,
      I would prefer not to do your homework assignment for class. You should do this yourself.
      I will give you a hint though. Try answering the questions using an actual example. You can use one of the examples on the referenced webpage.
      Charles

      Reply
  34. Helloo, I have to do statistical analyses about this hypothesis
    Using social network havw positive inpact on modeling altruistic behavior in adolescents
    but i dont know which analyses to use could you give me some coments about this.

    Reply
  35. Hi,
    I have a question about the confidence interval. I ran a test on a set of sample data which produced the following data:
    Average = 0.74%
    Degrees of freedom = 117
    Standard Deviation = 0.0219
    Standard Error = 0.00203
    Assuming that the total average is zero (\mu) , I get the statistics:
    t-Value = 3.651
    p-Value = 0.00068 (using excel T.DIST with probability density function)

    Now if I want to create a confidence interval for the average the following month, within 95% (the alpha in this case is 0.05 which would reject the hypothesis, since p-Value < alpha). But doesn't this mean that I can create a confidence interval of 0.99932 making my alpha 0.00068 (which would not be rejected)?

    That is, choosing a larger confident interval will make the hypothesis to be not rejected? Or have I gotten it wrong?

    Reply
    • Hi Daniel,
      Rejecting the null hypothesis with a p-value of .00068 does not mean that the confidence interval is .99932 (whatever that means). The confidence interval is (.74-.00203*d, .74+.00203*d) where d = TINV(.05,117).
      Charles

      Reply
  36. Dear Charles,

    Thank you very much for your information. I am currently using T-test to test my result.
    Now I am not sure if I did it correct. I have a set of data as one sample set. In this sample set, i have 1000 variables either ‘0’ or ‘1’ from my testing behaviors. (1 means behavior happened, 0 means behavior does not exist.) For example, it, in this 1000 numbers, has 700 times that show 1. Thus, observed behaviors happened 70%. Then I used T-test to find significance of this 70%. Is it appropriated to use T-test for this purpose? What can my hypothesis be?

    Thank you very much.

    Reply
    • It doesn’t seem likely that you would want to use a one-sample t test in this situation. With just 0’s and 1’s the data is not normally distributed (one of the assumptions for the t test). If you let me know what hypothesis would you like to test, we can decide which test to use.
      Charles

      Reply
      • Hi Charles,
        Thank you very much.
        I have a set of alternatives in my data where each data has x-value and y-value. Then I would like to compare trade-off behavior between them. If a pairwise alternative has trade-off relationship then it will keep result as ‘1’ otherwise ‘0’. Thus, I would like to compare than if one pairwise shows 30% trade-off and another pairwise has 70%. Are they both significant in percentage? Like can we say that 30% is important as well as 70% possibility. Thus, 30% trade-off should be considered and be able to represent data also.

        Reply
  37. Dear Charles,
    The lectures and answers were very helpful.
    Just a final verification from someone with very little knowledge on stat.
    One can use T-test to compare a sample mean to be significantly the same as that of the population mean when the population mean is unknown?How does one set the null hypothesis for this?
    Thank you very much

    Reply
    • The one sample t test is used when the population standard deviation (not the mean) is unknown. The t test tests whether the population mean has some specified value (often called the hypothetical mean). Examples 1 and 2 on the referenced webpage show how to set the null hypothesis for this test.
      Charles

      Reply
  38. I have done some calculations based on your guidelines. Can you tell me how do say that whether my hypothesis for t value is accepted or rejected. Based on the power what should be my analysis. Thanks, Jay

    Mean 3.330985915
    Mode 4
    Std Dev 1.372071122
    Sample 142
    size 0.241230874
    tails 1
    Hyp Mean 3
    std error 0.115141651
    df 141
    t-stat 2.874597622
    P value 0.002336352
    alpha 0.05

    Power

    t-crit 1.655732288
    x-crit 3.190643749
    real mean 3.330985915
    t -1.218865335
    beta 0.887534708
    Power 0.112465292

    Reply
    • Jay,

      Using the figures that you have written (I haven’t tried to verify them), I conclude:

      Since p-value = 0.00234 < .05 = alpha, you reject the null hypothesis Power of 11.2% is quite low (assuming that the calculation is correct). You shoud consider using a larger sample. Charles

      Reply
  39. Dear Charles,

    I am using t test for one sample. My sample size is 142. While going through your info, I find it difficult to arrive at alternate formula for t_dist that you have given. Can you help me in the following i) Is it correct to use t test – one sample for my study having sample size of 142 ii) How do I use Power for interpretation (final result). Please advise. Thanks. Jay

    Reply
    • Jay,
      I can’t find an alternative formula for t_dist on the referenced page. Perhaps I missed it. What are you referring to?
      I believe that I have responded to the rest of your comment in my response of yesterday.
      Charles

      Reply
  40. Dear charles.
    thank you for the materials, but i still don’t understand something about the P value calculation why can we use T-Distri to calculate the P value please explain me more easily

    Reply
    • Dear Traore,
      You can use the t distribution to calculate the p-value via the function TDIST or T.DIST, as shown in Figure 1 of the referenced page.
      Charles

      Reply
  41. Sir Charles,
    is it correct to apply t-test to a mean of a Likert (1-7 score) responses?
    I explain better, there is a set of responses with n=60 and for Q1 (i.e. n=1 Q1=2; n=2 Q1=1; n=3 Q1=7; n=i Q1=x; mean Q1=4.52), and I apply t-test to Q1 mean.
    I’m interested to understand if Q1 result is reliable.
    Thanks.

    Reply
    • Stefano,
      The one sample t test is used to determine whether the population mean has a value m based on data in your sample. I can’t tell from your example if this is what you are trying to test. You used the word “reliable”. Perhaps you are trying to test for reliability, in which case you might want to use Cronbach’s alpha: see the following webpage for more details: https://real-statistics.com/reliability/cronbachs-alpha/.
      Charles

      Reply

Leave a Comment