Hotelling’s T-square Test Additional Topics

Simultaneous Confidence Intervals

Since we know there is a significant difference between drug and placebo in treating at least one of the 3 symptoms, we would like to identify which symptoms are different.

Example 1: For the data from Example 2 of Hotelling’s T² for Independent Samples, determine for which symptoms the drug is significantly different from the placebo.

As we did in the one-sample and paired-sample cases, we now seek confidence intervals for each of the symptoms. Once again, we consider both the simultaneous 95% confidence intervals and the Bonferroni 95% confidence intervals.

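To determine the simultaneous 95% confidence intervals, we note (as in the one-sample case) that the 1 – α confidence hyper-ellipse for the population mean difference vector μ = μX – μY is given by

$$T^2 \le \frac{k(n-2)}{n-k-1} F_{crit}$$

where T² is as in Definition 1 of Hotelling’s T² for Independent Samples, n = nX + nY, k is the number of variables, and F_crit is the critical value of F(k, n – k – 1) at significance level α. Thus we are looking for values of μX – μY which fall within the hyper-ellipse given by

$$\frac{n_X n_Y}{n_X + n_Y} (\bar{X} - \bar{Y} - \mu)^T S^{-1} (\bar{X} - \bar{Y} - \mu) \le \frac{k(n-2)}{n-k-1} F_{crit}$$

where S is the pooled sample covariance matrix.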

From the 1 – α confidence hyper-ellipse, we can also calculate simultaneous confidence intervals for any linear combination of the means of the individual random variables. For example, for the linear combination

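$$c = \sum_{i=1}^{k} c_i \mu_i$$

(where μi = μX,i – μY,i), the simultaneous 1 – α confidence interval is given by the expression

$$\sum_{i=1}^{k} c_i (\bar{x}_i - \bar{y}_i) \pm \sqrt{\frac{k(n-2)}{n-k-1} F_{crit} \left(\frac{1}{n_X} + \frac{1}{n_Y}\right) \sum_{i=1}^{k} \sum_{j=1}^{k} c_i c_j s_{ij}}$$

where the pooled covariance matrix S = [sij], the x̄i and ȳi are the ith components of the two sample mean vectors, n = nX + nY, and F_crit is the critical value of F(k, n – k – 1) at significance level α.

For the case where c = μi the simultaneous 1 – α confidence interval is given by the expression

$$(\bar{x}_i - \bar{y}_i) \pm \sqrt{\frac{k(n-2)}{n-k-1} F_{crit} \left(\frac{1}{n_X} + \frac{1}{n_Y}\right) s_{ii}}$$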

The simultaneous confidence intervals for Example 1 are as shown in Figure 1.

Figure 1 – Simultaneous 95% confidence intervals

Since 0 lies in the confidence intervals for both Pressure and Aches, we conclude there is no significant difference between the drug and placebo for these symptoms.

Since the endpoints of the confidence interval for Fever are both negative, we conclude that patients taking the drug have significantly less fever than those who take the placebo.
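The simultaneous interval for a linear combination can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pooled_covariance` and `simultaneous_ci` are hypothetical, the data are made up (they are not the Example 1 data), and the critical value `f_crit` is supplied by the caller, for instance from Excel's F.INV.RT(α, k, n – k – 1).

```python
from math import sqrt


def pooled_covariance(x, y):
    """Pooled sample covariance matrix S = [s_ij] of two samples.

    x and y are lists of rows (one row per subject, one column per variable).
    """
    k = len(x[0])

    def scatter(rows):
        # Sum of cross-products of deviations from the column means
        m = [sum(r[i] for r in rows) / len(rows) for i in range(k)]
        return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows)
                 for j in range(k)] for i in range(k)]

    a, b = scatter(x), scatter(y)
    dof = len(x) + len(y) - 2
    return [[(a[i][j] + b[i][j]) / dof for j in range(k)] for i in range(k)]


def simultaneous_ci(x, y, c, f_crit):
    """Simultaneous 1 - alpha interval for sum_i c_i * (mu_X,i - mu_Y,i).

    f_crit is the upper-alpha critical value of F(k, n - k - 1),
    passed in by the caller (an assumption of this sketch).
    """
    n_x, n_y = len(x), len(y)
    n, k = n_x + n_y, len(c)
    if n - k - 1 <= 0:
        raise ValueError("need n_X + n_Y > k + 1")
    s = pooled_covariance(x, y)
    diff = [sum(r[i] for r in x) / n_x - sum(r[i] for r in y) / n_y
            for i in range(k)]
    center = sum(c[i] * diff[i] for i in range(k))
    # Quadratic form sum_i sum_j c_i c_j s_ij
    quad = sum(c[i] * c[j] * s[i][j] for i in range(k) for j in range(k))
    half = sqrt(k * (n - 2) / (n - k - 1) * f_crit
                * (1 / n_x + 1 / n_y) * quad)
    return center - half, center + half


# Hypothetical two-variable data, three subjects per group
drug = [[1.0, 2.0], [2.0, 3.0], [3.0, 7.0]]
placebo = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
# c = [1, 0] picks out the first variable alone; 9.55 is an illustrative F_crit
print(simultaneous_ci(drug, placebo, [1, 0], f_crit=9.55))
```

Setting c to a unit vector reproduces the single-variable interval, while other choices of c (such as [1, -1]) give intervals for contrasts between variables.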

Bonferroni Confidence Intervals

As in the one-sample case, if we are only interested in looking at single variables and not linear combinations, we will be better off using Bonferroni confidence intervals since these intervals will tend to be narrower. We now turn our attention to this analysis and use the following formulas for the 1 – α confidence intervals:

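$$(\bar{x}_i - \bar{y}_i) \pm t_{crit} \cdot s_i \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}$$

where s_i is the pooled standard deviation of the ith variable and t_crit is the two-tailed critical value of the t distribution with nX + nY – 2 degrees of freedom, based on a significance level of α/k. The relevant calculations are given in Figure 2.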

Figure 2 – Bonferroni 95% confidence intervals

The confidence intervals in Figure 2 are narrower than those in Figure 1, but the results are similar.
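The Bonferroni calculation can be sketched in Python as follows. The function name `bonferroni_cis` and the data are hypothetical, and `t_crit` is assumed to be supplied by the caller as the two-tailed critical value at level α/k with nX + nY – 2 degrees of freedom (for example from Excel's T.INV.2T). The pooled standard deviation and standard error follow the usual two-sample pooled formulas.

```python
from math import sqrt
from statistics import mean, variance


def bonferroni_cis(x, y, t_crit):
    """Bonferroni intervals for each mean difference mu_X,i - mu_Y,i.

    x and y are lists of rows (one row per subject, one column per variable).
    t_crit is passed in by the caller (an assumption of this sketch).
    """
    n_x, n_y = len(x), len(y)
    intervals = []
    # zip(*rows) transposes the data so we iterate over variables (columns)
    for xi, yi in zip(zip(*x), zip(*y)):
        # Pooled standard deviation of this variable
        s_i = sqrt(((n_x - 1) * variance(xi) + (n_y - 1) * variance(yi))
                   / (n_x + n_y - 2))
        # Standard error of the difference in means
        se_i = s_i * sqrt(1 / n_x + 1 / n_y)
        diff = mean(xi) - mean(yi)
        intervals.append((diff - t_crit * se_i, diff + t_crit * se_i))
    return intervals


# Hypothetical two-variable data, three subjects per group
drug = [[1.0, 2.0], [2.0, 3.0], [3.0, 7.0]]
placebo = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
# 3.5 is an illustrative value standing in for the real critical value
for low, high in bonferroni_cis(drug, placebo, t_crit=3.5):
    print(round(low, 3), round(high, 3))
```

Because the multiplier depends only on α/k and the degrees of freedom, it is typically smaller than the simultaneous multiplier, which is why these intervals tend to be narrower.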

Effect size

The Mahalanobis Distance can be used as a measure of effect size, where

$$D = \sqrt{(\bar{X} - \bar{Y})^T S^{-1} (\bar{X} - \bar{Y})}$$

and S is the pooled sample covariance matrix.

For Example 1 this is

image9060
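A minimal sketch of this effect size in Python, assuming NumPy is available. The function name `mahalanobis_effect_size` and the data are hypothetical (not the Example 1 data), and the sketch takes D to be the square root of the quadratic form in the difference of the sample mean vectors, using the pooled covariance matrix.

```python
import numpy as np


def mahalanobis_effect_size(x, y):
    """Mahalanobis distance between two sample mean vectors.

    x and y are 2-D arrays (rows = subjects, columns = variables).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = len(x), len(y)
    # Pooled sample covariance matrix S
    s_x = np.atleast_2d(np.cov(x, rowvar=False))
    s_y = np.atleast_2d(np.cov(y, rowvar=False))
    s = ((n_x - 1) * s_x + (n_y - 1) * s_y) / (n_x + n_y - 2)
    d = x.mean(axis=0) - y.mean(axis=0)
    # Solve S z = d rather than inverting S explicitly
    return float(np.sqrt(d @ np.linalg.solve(s, d)))


# Hypothetical two-variable data, three subjects per group
drug = [[1.0, 2.0], [2.0, 3.0], [3.0, 7.0]]
placebo = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
print(mahalanobis_effect_size(drug, placebo))
```

With this definition D² equals T² multiplied by (nX + nY)/(nX nY), so the effect size can also be read off from an existing T² value.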

Assumptions

Hypothesis testing using the T² statistic for two independent random vectors X and Y is based on the following assumptions:

  1. All the observations in the X sample have a common population mean vector μX, and all the observations in the Y sample have a common population mean vector μY
  2. X and Y have a common population covariance matrix Σ
  3. X and Y are multivariate normally distributed
  4. Each of the samples is drawn randomly and independently

Normality

That X and Y are multivariate normally distributed implies that each variable in X and Y is normally distributed (in practice, at least roughly symmetric). This can be checked as described in Testing for Normality and Symmetry (box plots, QQ plots, histograms, etc.). You can also produce a scatter diagram for each pair of variables in X and each pair of variables in Y. If the random vectors are multivariate normally distributed then each plot should look roughly like an ellipse. These checks are necessary but not sufficient to show that X and Y are multivariate normally distributed, but they may be the best you will be able to do. Fortunately, Hotelling’s T-square test is relatively robust to violations of normality.

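Also, if nX and nY are sufficiently large, then by the Multivariate Central Limit Theorem the sample mean vectors are approximately multivariate normally distributed, and so the test can be used even when the normality assumption is not strictly met.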

Common covariance matrix

In the univariate case for two-sample hypothesis testing of the means, the t-test can be used provided the variances of the two samples are not too different, especially if the sample sizes are equal.

Similarly in the multivariate case, Hotelling’s T-square test can be used provided nX = nY and the sample covariance matrices don’t look too terribly different.

We can use Box’s Test to check the null hypothesis that the two population covariance matrices are equal. The caution here is that this test is very sensitive to violations of normality (even though Hotelling’s T-square test is not very sensitive to such violations). For Example 1 of Hotelling’s T² for Independent Samples, Box’s test yields the results shown in Figure 3.

Figure 3 – Box’s test

Since p-value > α = .001, we cannot reject the null hypothesis that the covariance matrices are equal. See Box’s Test for more details.
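For readers who want to reproduce a test of this kind outside Excel, here is a minimal Python sketch of Box's M statistic for two samples, assuming NumPy is available. It uses the standard chi-square approximation, which is an assumption of this sketch and may differ from the approximation behind Figure 3. The function name `box_m` and the data are hypothetical.

```python
import numpy as np


def box_m(x, y):
    """Box's M statistic with its chi-square approximation for two samples.

    Returns (M, chi-square statistic, degrees of freedom).
    """
    samples = [np.asarray(x, dtype=float), np.asarray(y, dtype=float)]
    k = samples[0].shape[1]           # number of variables
    g = len(samples)                  # number of groups (2 here)
    dofs = [len(s) - 1 for s in samples]
    covs = [np.atleast_2d(np.cov(s, rowvar=False)) for s in samples]
    total_dof = sum(dofs)             # n - g
    pooled = sum(d * c for d, c in zip(dofs, covs)) / total_dof

    def logdet(a):
        # Log of the absolute determinant, computed stably
        return np.linalg.slogdet(a)[1]

    m = total_dof * logdet(pooled) - sum(
        d * logdet(c) for d, c in zip(dofs, covs))
    # Correction factor for the chi-square approximation
    c = ((2 * k ** 2 + 3 * k - 1) / (6 * (k + 1) * (g - 1))
         * (sum(1 / d for d in dofs) - 1 / total_dof))
    df = k * (k + 1) * (g - 1) // 2
    return float(m), float(m * (1 - c)), df


# Hypothetical two-variable data, four subjects per group
group_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 7.0], [5.0, 1.0]]
group_b = [[0.0, 0.0], [4.0, 1.0], [8.0, 9.0], [1.0, 20.0]]
print(box_m(group_a, group_b))
```

The p-value is the upper tail of the chi-square distribution with the returned degrees of freedom, which could be obtained with `scipy.stats.chi2.sf(chi2_stat, df)` if SciPy is installed.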

15 thoughts on “Hotelling’s T-square Test Additional Topics”

    • Hello David,
      I looked into Hotelling’s T-square test. Indeed the number of rows can’t be less than the number of dependent variables. They can be equal though except in the case where there is one dependent variable (equivalent to a t-test).
      Charles

    • David,
      For the two independent samples test with n1 and n2 rows in the two samples, the requirement is n1+n2 > k where k = the number of dependent variables.
      Charles

  1. Hi Charles,

    My question is related to simultaneous and Bonferroni CIs. Can you explain why one would be interested in all linear combinations (simultaneous) and contrast that with why one would be interested in only single variables? I’m assuming it’s related to alpha-inflation.

    Thank you,

  2. Hello ,
    I’m going to use the T-square test for independent samples for my final year project.
    Can you tell me what scale is suitable for my questionnaire for this analysis?

  3. Hello. Awesome web.

    In Figure 1, how do you calculate “s” and “se”? I’m trying in Excel and I don’t reach the same values. I’m writing something wrong.

    • Marco,
      For Fever s_1 (cell N29) is calculated by =SQRT(((N19-1)*N27+(N20-1)*N28)/(N19+N20-2))
      se_1 (cell N30) is calculated by =N29*SQRT(1/N19+1/N20)
      Charles

  4. Dear Charles,

    I’m a PhD student and I have some problems with my data. To sum up, I have one factor with two groups and 2 variables for each one. After removing the incomplete cases and non-numeric data, the first group has 72 cases and the second 57. The Box test has a significant outcome, so I reject the null hypothesis of equal covariance matrices. Anyway, I tried a MANOVA analysis and a T2-Hotelling test. In both tests I have a significant outcome, but since the simultaneous and Bonferroni intervals are different, with MANOVA the factor accounts for the difference observed in the mean vectors, whereas in the T2-Hotelling test the two dependent variables are ok. Why?
    How should I interpret the results? If I focus on the Hotelling outcome, I find significant differences but not due to these variables, whereas if I focus on MANOVA, the same variables play a role in the significant difference.

    Thanks in advance and congratulations for the really useful website.
