Multiple Regression Basic Concepts

Definition 1: If y is a dependent variable (aka the response variable) and x1, …, xk are independent variables (aka predictor variables), then the multiple regression model provides a prediction of y from the xi of the form

General multiple regression model:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$

where β0 + β1x1 + … + βkxk is the deterministic portion of the model and ε is the random error. We further assume that for any given values of the xi, the random error ε is normally and independently distributed with mean zero.

In practice, we will build the multiple regression model from sample data using the least-squares method. We, therefore, assume that we have a sample consisting of n observations whose ith observation is xi1, …, xik, yi. Thus

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$$
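To make the model concrete, here is a minimal simulation sketch in Python (NumPy). The sample size, coefficients, and error standard deviation are illustrative values chosen for this sketch, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumed for this sketch)
n, k = 100, 3                            # n observations, k predictors
beta = np.array([2.0, 0.5, -1.0, 3.0])   # beta_0, beta_1, ..., beta_k
sigma = 1.5                              # standard deviation of epsilon

X = rng.normal(size=(n, k))              # sample values x_i1, ..., x_ik
eps = rng.normal(0.0, sigma, size=n)     # errors: independent N(0, sigma^2)

# y_i = beta_0 + beta_1*x_i1 + ... + beta_k*x_ik + eps_i
y = beta[0] + X @ beta[1:] + eps
```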

Assumptions: The multiple regression model is based on the following assumptions:

  1. Linearity: The mean E[y] of the dependent variable y can be expressed as a linear combination of the independent variables x1, …, xk.
  2. Independence: Observations yi are selected independently and randomly from the population.
  3. Normality: Observations yi are normally distributed.
  4. Homogeneity of variances: Observations yi have the same variance.

These assumptions can be expressed in terms of the error random variables:

  1. Linearity: The εi have a mean of 0.
  2. Independence: The εi are independent.
  3. Normality: The εi are normally distributed.
  4. Homogeneity of variances: The εi have the same variance σ².

These requirements are the same as for the simple linear regression model described in Regression Analysis. The main difference is that instead of requiring that the variables (or the error term based on these variables) have a bivariate normal distribution, we now require that they have a multivariate normal distribution (i.e. normality in k+1 dimensions). See Multivariate Normal Distribution for more details. Also, see Residuals for further discussion about residuals.

Based on assumptions 1, 3, and 4, we know that εi ∼ N(0, σ²) for all i. Based on assumption 2, we know that cov(εi, εj) = 0 for all i ≠ j.
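Continuing the simulation above, here is a rough empirical check that the generated errors behave as these assumptions require (sample diagnostics only, not formal tests):

```python
print(eps.mean())                       # near 0: errors have mean zero
print(eps.var(ddof=1))                  # near sigma**2: common variance
print(np.corrcoef(eps[:-1], eps[1:]))   # off-diagonals near 0: no serial correlation
```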

Observation: Our goal is to find estimates b0, b1, …, bk of the unknown parameters β0, β1, …, βk. We do this using the least-squares method.

The least-squares method generates coefficients bj such that for all i

$$y_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik} + e_i$$

where we have sought to minimize the value $\sum\limits_{i=1}^n e_i^2$.
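A minimal sketch of this fit, continuing the simulated data above and using NumPy's built-in least-squares solver (which minimizes the same sum of squared errors):

```python
# Augment X with a column of ones so that b_0 acts as the intercept
X1 = np.column_stack([np.ones(n), X])

# np.linalg.lstsq returns the b that minimizes the sum of squared errors
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(b)   # estimates b_0, b_1, ..., b_k; should be close to beta
```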

Defining ŷi to be the y value predicted by the model for the sample data xi1, …, xik, i.e.

$$\hat{y}_i = b_0 + b_1 x_{i1} + \cdots + b_k x_{ik}$$

we see that the ith error term for the model is given by

$$e_i = y_i - \hat{y}_i$$
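With the fitted coefficients from the sketch above, the predicted values and error terms follow directly:

```python
y_hat = X1 @ b        # predicted values: yhat_i = b_0 + b_1*x_i1 + ... + b_k*x_ik
e = y - y_hat         # residuals: e_i = y_i - yhat_i
print(np.sum(e**2))   # the minimized sum of squared errors
```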

As we see in Multiple Regression using Matrices, the results presented in Linear Regression extend to multiple regression.
