Objective
We now provide proofs of the properties presented in Multivariate Regression Basic Concepts.
Proofs (part 1)
Property 1:
$$B = (X^T X)^{-1} X^T Y$$
Proof: By univariate regression properties
$$B = [B_1 \;\; B_2 \;\; \cdots \;\; B_m] = [(X^T X)^{-1} X^T Y_1 \;\; (X^T X)^{-1} X^T Y_2 \;\; \cdots \;\; (X^T X)^{-1} X^T Y_m]$$
$$= (X^T X)^{-1} X^T [Y_1 \;\; Y_2 \;\; \cdots \;\; Y_m] = (X^T X)^{-1} X^T Y$$
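Although not part of the proof, a short numerical check can make Property 1 concrete. The following Python/NumPy sketch uses arbitrary simulated X and Y (the sizes n, k, m and the random data are illustrative assumptions, not values from the article): it fits each response column separately and compares the stacked coefficients with the multivariate formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 3, 2                                            # observations, predictors, responses
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])    # design matrix with intercept column
Y = rng.normal(size=(n, m))                                   # arbitrary multivariate response

# Multivariate formula: B = (X^T X)^{-1} X^T Y
B = np.linalg.solve(X.T @ X, X.T @ Y)

# Univariate fits, one response column at a time, stacked as columns
B_cols = np.column_stack([np.linalg.solve(X.T @ X, X.T @ Y[:, p]) for p in range(m)])

print(np.allclose(B, B_cols))                                 # True: the two agree
```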
Property 2: B minimizes the trace
$$\mathrm{Tr}\big((Y - XB)^T (Y - XB)\big)$$
Proof: The m × m SSCP matrix S
$$S = (Y - XB)^T (Y - XB)$$
has diagonal terms which are non-negative scalars of the form
$$s_{pp} = (Y_p - XB_p)^T (Y_p - XB_p) = \sum_{i=1}^n \Big(y_{ip} - \sum_j b_{jp} x_{ij}\Big)^2$$
i.e. the residual sum of squares of the univariate regression of $Y_p$ on X. Now
$$\mathrm{Tr}\big((Y - XB)^T (Y - XB)\big) = \mathrm{Tr}(S) = \sum_{p=1}^m s_{pp}$$
Since the values $b_{jp}$ minimize each term in the above sum, they also minimize the sum.
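A similar sketch (again with simulated X and Y; all numbers are illustrative) shows Property 2 in action: random perturbations of the least-squares B never decrease the trace of the SSCP matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = rng.normal(size=(n, m))
B = np.linalg.solve(X.T @ X, X.T @ Y)                 # least-squares coefficient matrix

def trace_sscp(B_try):
    """Tr((Y - X B)^T (Y - X B)) for a candidate coefficient matrix."""
    R = Y - X @ B_try
    return np.trace(R.T @ R)

base = trace_sscp(B)
perturbed = [trace_sscp(B + 0.1 * rng.normal(size=B.shape)) for _ in range(200)]
print(all(p >= base for p in perturbed))              # True: no perturbation does better
```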
Property 3:
$$E[\varepsilon] = 0$$
Proof: This is a consequence of the fact that $E[\varepsilon_p] = 0$ for all p.
Property 4: B is an unbiased estimator of β; i.e. $E[B] = \beta$
Proof: By Property 1
$$B = (X^T X)^{-1} X^T Y$$
But
$$Y = X\beta + \varepsilon$$
Thus
$$B = (X^T X)^{-1} X^T Y = (X^T X)^{-1} X^T (X\beta + \varepsilon)$$
$$= (X^T X)^{-1} X^T X\beta + (X^T X)^{-1} X^T \varepsilon$$
$$= (X^T X)^{-1} (X^T X)\beta + (X^T X)^{-1} X^T \varepsilon$$
$$= \beta + (X^T X)^{-1} X^T \varepsilon$$
Thus
$$E[B] = E[\beta + (X^T X)^{-1} X^T \varepsilon] = E[\beta] + E[(X^T X)^{-1} X^T \varepsilon]$$
$$= \beta + (X^T X)^{-1} X^T E[\varepsilon] = \beta + 0 = \beta$$
since $E[\varepsilon] = 0$ by Property 3.
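The unbiasedness in Property 4 can be illustrated by simulation. In the sketch below, β, Σ, the design X and the number of repetitions are arbitrary illustrative choices: averaging B over many generated data sets should approach β.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 40, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design with intercept
beta = np.array([[1.0, -2.0], [0.5, 0.0], [-1.0, 3.0]])      # (k+1) x m parameter matrix
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])                   # error covariance across responses

reps = 5000
B_sum = np.zeros_like(beta)
for _ in range(reps):
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)  # rows iid with covariance Sigma
    Y = X @ beta + eps
    B_sum += np.linalg.solve(X.T @ X, X.T @ Y)

print(np.round(B_sum / reps - beta, 2))                      # approximately the zero matrix
```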
Property 5:
$$\mathrm{cov}(B_p, B_q) = \sigma_{pq} (X^T X)^{-1}$$
Proof: Using univariate regression properties
$$B_p = (X^T X)^{-1} X^T Y_p = (X^T X)^{-1} X^T (X\beta_p + \varepsilon_p)$$
$$= (X^T X)^{-1} X^T X\beta_p + (X^T X)^{-1} X^T \varepsilon_p = \beta_p + (X^T X)^{-1} X^T \varepsilon_p$$
Thus
$$B_p = \beta_p + (X^T X)^{-1} X^T \varepsilon_p$$
and so
$$B_p - E[B_p] = \beta_p + (X^T X)^{-1} X^T \varepsilon_p - \beta_p = (X^T X)^{-1} X^T \varepsilon_p$$
Similarly
$$B_q - E[B_q] = (X^T X)^{-1} X^T \varepsilon_q$$
Hence
$$\mathrm{cov}(B_p, B_q) = E\Big[\big((X^T X)^{-1} X^T \varepsilon_p\big)\big((X^T X)^{-1} X^T \varepsilon_q\big)^T\Big]$$
$$= E\big[(X^T X)^{-1} X^T \varepsilon_p \varepsilon_q^T X (X^T X)^{-1}\big] = (X^T X)^{-1} X^T E[\varepsilon_p \varepsilon_q^T] X (X^T X)^{-1}$$
$$= (X^T X)^{-1} X^T (\sigma_{pq} I) X (X^T X)^{-1}$$
The last equality is a result of the fact that $E[\varepsilon_p \varepsilon_q^T] = E[(\varepsilon_p - E[\varepsilon_p])(\varepsilon_q - E[\varepsilon_q])^T] = \mathrm{cov}(\varepsilon_p, \varepsilon_q) = \sigma_{pq} I$ since $E[\varepsilon_p] = E[\varepsilon_q] = 0$. Finally,
$$\mathrm{cov}(B_p, B_q) = (X^T X)^{-1} X^T (\sigma_{pq} I) X (X^T X)^{-1}$$
$$= \sigma_{pq} (X^T X)^{-1} (X^T X)(X^T X)^{-1} = \sigma_{pq} (X^T X)^{-1}$$
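Property 5 can be checked the same way. The sketch below (illustrative β, Σ, design X and repetition count) compares the empirical covariance between the coefficient vectors $B_1$ and $B_2$ across simulated data sets with $\sigma_{12}(X^T X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 40, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.zeros((k + 1, m))                                  # true coefficients (zero for simplicity)
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])                   # sigma_12 = 0.6

reps = 20000
B1 = np.empty((reps, k + 1))
B2 = np.empty((reps, k + 1))
for r in range(reps):
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    Y = X @ beta + eps
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    B1[r], B2[r] = B[:, 0], B[:, 1]

# empirical cov(B_1, B_2) vs. the theoretical sigma_12 (X^T X)^{-1}
emp = (B1 - B1.mean(axis=0)).T @ (B2 - B2.mean(axis=0)) / reps
theo = Sigma[0, 1] * np.linalg.inv(X.T @ X)
print(np.abs(emp - theo).max())                              # small (sampling noise only)
```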
Property 6:
$$E[E_p] = 0$$
Proof: Here $E_p$ is the pth column of $E = [e_{ip}]$.
$$E[E_p] = E[Y_p - XB_p] = E[Y_p] - XE[B_p] = E[Y_p] - X\beta_p$$
The last equality results from the fact that $B_p$ is an unbiased estimator of $\beta_p$.
But
$$Y_p = X\beta_p + \varepsilon_p$$
and so
$$E[Y_p] = E[X\beta_p + \varepsilon_p] = E[X\beta_p] + E[\varepsilon_p] = X\beta_p + 0 = X\beta_p$$
Putting everything together, we have
$$E[E_p] = E[Y_p] - X\beta_p = X\beta_p - X\beta_p = 0$$
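Property 6 can likewise be seen in simulation (all parameters below are arbitrary illustrative choices): averaging the residual column $E_p$ over many generated data sets gives a vector near zero.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, m = 30, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.ones((k + 1, m))
Sigma = np.eye(m)

reps = 5000
Ep_sum = np.zeros(n)
for _ in range(reps):
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    Y = X @ beta + eps
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    Ep_sum += (Y - X @ B)[:, 0]                      # first column of the residual matrix E

print(np.abs(Ep_sum / reps).max())                   # near zero (sampling noise only)
```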
Hat matrix properties
We now present some properties of the hat matrix
$$H = X(X^T X)^{-1} X^T$$
Property A: H is symmetric
Proof:
$$H^T = \big(X(X^T X)^{-1} X^T\big)^T = X\big((X^T X)^{-1}\big)^T X^T = X\big((X^T X)^T\big)^{-1} X^T = X(X^T X)^{-1} X^T = H$$
Property B: H is idempotent
Proof:
$$H^2 = \big(X(X^T X)^{-1} X^T\big)^2 = \big(X(X^T X)^{-1} X^T\big)\big(X(X^T X)^{-1} X^T\big)$$
$$= X(X^T X)^{-1} (X^T X)(X^T X)^{-1} X^T = X(X^T X)^{-1} X^T = H$$
Property C: I – H is symmetric and idempotent
Proof: The result follows from Properties A and B since
$$(I - H)^T = I^T - H^T = I - H$$
$$(I - H)^2 = (I - H)(I - H) = I - 2H + H^2 = I - 2H + H = I - H$$
Property D: From Property C, it follows that
$$(I - H)^T (I - H) = I - H$$
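These hat-matrix properties are easy to confirm numerically. The sketch below builds H from an arbitrary simulated design (the sizes and data are illustrative) and checks symmetry, idempotency and Property D directly.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
H = X @ np.linalg.inv(X.T @ X) @ X.T                 # hat matrix

print(np.allclose(H, H.T))                           # Property A: symmetric
print(np.allclose(H @ H, H))                         # Property B: idempotent
M = np.eye(n) - H
print(np.allclose(M.T @ M, M))                       # Property D: (I - H)^T (I - H) = I - H
```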
Proofs (part 2)
Property 7:
$$E[E_p^T E_q] = \sigma_{pq} \, df_{Res}$$
Proof: Here $df_{Res} = n - k - 1$. First we note that
$$E_p^T E_q = (Y_p - XB_p)^T (Y_q - XB_q) = (Y_p - HY_p)^T (Y_q - HY_q)$$
$$= \big((I - H)Y_p\big)^T \big((I - H)Y_q\big) = Y_p^T (I - H)^T (I - H) Y_q = Y_p^T (I - H) Y_q$$
The last equality follows from Property D. Next, since $(I - H)X = X - X(X^T X)^{-1} X^T X = X - X = 0$, substituting $Y_p = X\beta_p + \varepsilon_p$ and $Y_q = X\beta_q + \varepsilon_q$ into the right-hand side leaves only
$$E_p^T E_q = \varepsilon_p^T (I - H) \varepsilon_q$$
Since this is a scalar, it equals its own trace, and so by the cyclic property of the trace and the fact (as in the proof of Property 5) that $E[\varepsilon_q \varepsilon_p^T] = \sigma_{qp} I = \sigma_{pq} I$,
$$E[E_p^T E_q] = E\big[\mathrm{Tr}\big((I - H)\varepsilon_q \varepsilon_p^T\big)\big] = \mathrm{Tr}\big((I - H) E[\varepsilon_q \varepsilon_p^T]\big) = \mathrm{Tr}\big((I - H)\,\sigma_{pq} I\big) = \sigma_{pq} \, \mathrm{Tr}(I - H)$$
Finally, $\mathrm{Tr}(I - H) = n - \mathrm{Tr}\big(X(X^T X)^{-1} X^T\big) = n - \mathrm{Tr}\big((X^T X)^{-1} X^T X\big) = n - (k + 1) = df_{Res}$, and so
$$E[E_p^T E_q] = \sigma_{pq} \, df_{Res}$$
Property 8: $SSE/df_{Res}$ is an unbiased estimator of $\Sigma$; i.e. $E[SSE] = E[E^T E] = df_{Res} \, \Sigma$
Proof: This is a consequence of Property 7: the (p, q) entry of $E[E^T E]$ is $E[E_p^T E_q] = \sigma_{pq} \, df_{Res}$, which is the (p, q) entry of $df_{Res} \, \Sigma$.
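Properties 7 and 8 can be illustrated jointly by simulation (illustrative β, Σ, design X and repetition count): averaging $E^T E$ over many generated data sets and dividing by $df_{Res}$ should approximate Σ.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, m = 40, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.ones((k + 1, m))
Sigma = np.array([[1.0, 0.4], [0.4, 1.5]])
df_res = n - k - 1

reps = 10000
sse_sum = np.zeros((m, m))
for _ in range(reps):
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    Y = X @ beta + eps
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    E = Y - X @ B                                    # residual matrix
    sse_sum += E.T @ E                               # SSE = E^T E

print(np.round(sse_sum / reps / df_res, 2))          # approximately Sigma
```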
Property 9:
$$\mathrm{cov}(B_p, E_q) = 0 \qquad \mathrm{cov}(B, E) = 0$$
Proof: The proof of the first assertion is similar to that for Property 7. The second assertion follows from the first.
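A final sketch (again with arbitrary simulated parameters) for Property 9: the sample cross-covariance between the entries of $B_p$ and $E_q$ across repeated data sets is near zero, matching $\mathrm{cov}(B_p, E_q) = 0$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k, m = 30, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.zeros((k + 1, m))
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])

reps = 20000
Bp = np.empty((reps, k + 1))                         # coefficient column B_p (p = 1)
Eq = np.empty((reps, n))                             # residual column E_q (q = 2)
for r in range(reps):
    eps = rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    Y = X @ beta + eps
    B = np.linalg.solve(X.T @ X, X.T @ Y)
    E = Y - X @ B
    Bp[r], Eq[r] = B[:, 0], E[:, 1]

cross_cov = (Bp - Bp.mean(axis=0)).T @ (Eq - Eq.mean(axis=0)) / reps   # (k+1) x n matrix
print(np.abs(cross_cov).max())                       # near zero (sampling noise only)
```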
