Multiple Regression Basic Concepts

Definition 1: If y is a dependent variable (aka the response variable) and x1, …, xk are independent variables (aka predictor variables), then the multiple regression model provides a prediction of y from the xi of the form

General multiple regression model:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$

where β0 + β1x1 + … + βkxk is the deterministic portion of the model and ε is the random error. We further assume that for any given values of the xi, the random error ε is normally and independently distributed with mean zero.

In practice, we will build the multiple regression model from sample data using the least-squares method. We, therefore, assume that we have a sample consisting of n observations whose ith observation is xi1, …, xik, yi. Thus

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$$
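To make the model concrete, here is a minimal simulation sketch in Python (NumPy). The sample size, coefficients, and error standard deviation are illustrative values chosen for this sketch, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumed for this sketch)
n, k = 100, 3                            # n observations, k predictors
beta = np.array([2.0, 0.5, -1.0, 3.0])   # beta_0, beta_1, ..., beta_k
sigma = 1.5                              # standard deviation of epsilon

X = rng.normal(size=(n, k))              # sample values x_i1, ..., x_ik
eps = rng.normal(0.0, sigma, size=n)     # errors: independent N(0, sigma^2)

# y_i = beta_0 + beta_1*x_i1 + ... + beta_k*x_ik + eps_i
y = beta[0] + X @ beta[1:] + eps
```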

Assumptions: The multiple regression model is based on the following assumptions:

  1. Linearity: The mean E[y] of the dependent variable y can be expressed as a linear combination of the independent variables x1, …, xk.
  2. Independence: Observations yi are selected independently and randomly from the population.
  3. Normality: Observations yi are normally distributed.
  4. Homogeneity of variances: Observations yi have the same variance.

These assumptions can be expressed in terms of the error random variables:

  1. Linearity: The εi have a mean of 0.
  2. Independence: The εi are independent.
  3. Normality: The εi are normally distributed.
  4. Homogeneity of variances: The εi have the same variance σ².

These requirements are the same as for the simple linear regression model described in Regression Analysis. The main difference is that instead of requiring that the variables (or the error term based on these variables) have a bivariate normal distribution, we now require that they have a multivariate normal distribution (i.e. normality in k+1 dimensions). See Multivariate Normal Distribution for more details. Also, see Residuals for further discussion about residuals.

Based on assumptions 1, 3, and 4, we know that εi ∼ N(0, σ²) for all i. Based on assumption 2, we know that cov(εi, εj) = 0 for all i ≠ j.
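Continuing the simulation above, here is a rough empirical check that the generated errors behave as these assumptions require (sample diagnostics only, not formal tests):

```python
print(eps.mean())                       # near 0: errors have mean zero
print(eps.var(ddof=1))                  # near sigma**2: common variance
print(np.corrcoef(eps[:-1], eps[1:]))   # off-diagonals near 0: no serial correlation
```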

Observation: Our goal is to find estimates b0, b1, …, bk of the unknown parameters β0, β1, …, βk. We do this using the least-squares method.

The least-squares method generates coefficients bj such that for all i

$$y_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik} + e_i$$

where we have sought to minimize the value $\sum\limits_{i=1}^n e_i^2$.
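A minimal sketch of this fit, continuing the simulated data above and using NumPy's built-in least-squares solver (which minimizes the same sum of squared errors):

```python
# Augment X with a column of ones so that b_0 acts as the intercept
X1 = np.column_stack([np.ones(n), X])

# np.linalg.lstsq returns the b that minimizes the sum of squared errors
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(b)   # estimates b_0, b_1, ..., b_k; should be close to beta
```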

Defining ŷi to be the y value predicted by the model for the sample data xi1, …, xik, i.e.

$$\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik}$$

we see that the ith error term for the model is given by

$$e_i = y_i - \hat{y}_i$$
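With the fitted coefficients from the sketch above, the predicted values and error terms follow directly:

```python
y_hat = X1 @ b        # predicted values: yhat_i = b_0 + b_1*x_i1 + ... + b_k*x_ik
e = y - y_hat         # residuals: e_i = y_i - yhat_i
print(np.sum(e**2))   # the minimized sum of squared errors
```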

As we see in Multiple Regression using Matrices, the results presented in Linear Regression extend to multiple regression.
