Basic Concepts of Factor Analysis

Objective

In this model, we consider k independent variables x1, …, xk and observed data for each of these variables. Our objective is to identify m factors y1, …, ym, preferably with m much smaller than k, that explain the observed data more succinctly.

Defining terms

Definition 1: Let X = [xi] be a random k × 1 column vector where each xi represents an observable trait, and let μ = [μi] be the k × 1 column vector of the population means, so that E[xi] = μi. Let Y = [yi] be an m × 1 vector of unobserved common factors where m ≤ k. These factors play a role similar to the principal components in Principal Component Analysis.

We next suppose that each xi can be represented as a linear combination of the factors as follows:

$$x_i = \beta_{i0} + \beta_{i1} y_1 + \cdots + \beta_{im} y_m + \varepsilon_i = \beta_{i0} + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$

where the εi are the components that are not explained by the linear relationship. We further assume that the mean of each εi is 0 and the factors are independent with mean 0 and variance 1. We can consider the above equations to be a series of regression equations.

The coefficient βij is called the loading of the ith variable on the jth factor, and εi is called the specific factor for the ith variable. Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.

Define the communality of variable xi to be $\varphi_i = \sum_{j=1}^m \beta_{ij}^2$ (the portion of the variance of xi explained by the common factors), and let $\phi_i = \mathrm{var}(\varepsilon_i)$ (the specific variance) and $\sigma_i^2 = \mathrm{var}(x_i)$.
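To make the model concrete, here is a minimal NumPy simulation sketch (not part of the original article); the loading matrix, means, and specific variances are made up purely for illustration. It generates data according to the regression equations above and checks that the sample covariance matrix of the xi has the structure derived in the next section, namely ββ^T plus a diagonal matrix of the specific variances ϕi.

```python
import numpy as np

# Made-up parameters, purely illustrative: k = 4 observed variables, m = 2 factors
beta = np.array([[0.8, 0.1],
                 [0.7, 0.2],
                 [0.1, 0.9],
                 [0.2, 0.6]])           # k x m loadings beta_ij
mu = np.array([10.0, 5.0, 3.0, 7.0])    # means mu_i
phi = np.array([0.3, 0.4, 0.2, 0.5])    # specific variances phi_i = var(eps_i)

rng = np.random.default_rng(0)
n = 100_000
y = rng.standard_normal((n, 2))                   # factors: independent, mean 0, variance 1
eps = rng.standard_normal((n, 4)) * np.sqrt(phi)  # specific factors: mean 0, variance phi_i
x = mu + y @ beta.T + eps                         # x_i = mu_i + sum_j beta_ij * y_j + eps_i

# Sample covariance of the x's vs. the model-implied covariance beta beta^T + diag(phi)
print(np.round(np.cov(x, rowvar=False), 2))
print(np.round(beta @ beta.T + np.diag(phi), 2))
```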

Key Properties

Since
$$\mu_i = E[x_i] = E\Bigl[\beta_{i0} + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i\Bigr] = \beta_{i0} + \sum_{j=1}^m \beta_{ij} E[y_j] + E[\varepsilon_i] = \beta_{i0} + 0 + 0 = \beta_{i0}$$
it follows that the intercept term βi0 = μi, and so the regression equations can be expressed as

$$x_i = \mu_i + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$
or equivalently
$$x_i - \mu_i = \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$

From the assumptions stated above it also follows that:

E[xi] = μi for all i
E[εi] = 0 for all i (the specific factors are presumed to be random with mean 0)
E[yj] = 0 and var(yj) = 1 for all j

cov(yi, yj) = 0 if i ≠ j
cov(εi, εj) = 0 if i ≠ j
cov(yi, εj) = 0 for all i, j

From Property A of Correlation Advanced and Property 3 of Basic Concepts of Correlation, we get the following:

$$\mathrm{cov}(x_i, y_j) = \beta_{ij}$$
$$\mathrm{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \mathrm{var}(\varepsilon_i)$$
$$\mathrm{cov}(x_i, x_j) = \sum_{t=1}^m \beta_{it}\beta_{jt} \quad \text{for } i \ne j$$
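For instance, the first of these identities can be verified directly from the stated assumptions (the factors have variance 1, are uncorrelated with each other, and are uncorrelated with the specific factors):

$$\mathrm{cov}(x_i, y_j) = \mathrm{cov}\Bigl(\mu_i + \sum_{t=1}^m \beta_{it} y_t + \varepsilon_i,\; y_j\Bigr) = \sum_{t=1}^m \beta_{it}\,\mathrm{cov}(y_t, y_j) + \mathrm{cov}(\varepsilon_i, y_j) = \beta_{ij}\,\mathrm{var}(y_j) = \beta_{ij}$$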

From these equivalences, it follows that the population covariance matrix Σ for X has the form

$$\Sigma = \beta\beta^T + \phi$$

where $\phi$ is the k × k diagonal matrix with $\phi_i$ in the ith position on the diagonal.
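In particular, the diagonal entries of Σ restate, in the notation introduced above, that the variance of each observed variable splits into its communality plus its specific variance:

$$\sigma_i^2 = \varphi_i + \phi_i$$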

Eigenvalues

Let λ1 ≥ … ≥ λk be the eigenvalues of Σ with corresponding unit eigenvectors γ1, …, γk, where each eigenvector γj = [γij] is a k × 1 column vector whose ith entry is γij. Now define the k × k matrix β = [βij] such that $\beta_{ij} = \sqrt{\lambda_j}\,\gamma_{ij}$ for all 1 ≤ i, j ≤ k. As observed in Linear Algebra Background, all the eigenvalues of Σ are non-negative, and so the βij are well defined (see Property 8 of Positive Definite Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem), it follows that

$$\Sigma = \sum_{j=1}^k \lambda_j \gamma_j \gamma_j^T = \beta\beta^T$$

As usual, we will approximate the population covariance matrix Σ by the sample covariance matrix S (for a given random sample). Using the above logic, it follows that

$$S = \sum_{j=1}^k \lambda_j C_j C_j^T = LL^T$$

where λ1 ≥ … ≥ λk are the eigenvalues of S (a slight abuse of notation since these are not the same as the eigenvalues of Σ) with corresponding unit eigenvectors C1, …, Ck, and L = [bij] is the k × k matrix such that $b_{ij} = \sqrt{\lambda_j}\,c_{ij}$, where cij is the ith entry of Cj. Keeping only the first m columns of L (those corresponding to the m largest eigenvalues) gives the loadings for an m-factor approximation of S.
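As a concrete illustration (not from the original article), the following NumPy sketch carries out this construction on a small made-up 3 × 3 sample covariance matrix S: it extracts the eigenvalues and unit eigenvectors, forms L with b_ij = √λ_j c_ij, verifies that LL^T = S, and shows the approximation obtained by keeping only the first m columns of L.

```python
import numpy as np

# Made-up 3 x 3 sample covariance matrix S, purely for illustration
S = np.array([[2.0, 0.8, 0.6],
              [0.8, 1.5, 0.5],
              [0.6, 0.5, 1.0]])

lam, C = np.linalg.eigh(S)          # eigenvalues (ascending) and unit eigenvectors (columns)
lam, C = lam[::-1], C[:, ::-1]      # reorder so that lambda_1 >= ... >= lambda_k
L = C * np.sqrt(lam)                # b_ij = sqrt(lambda_j) * c_ij

print(np.allclose(L @ L.T, S))      # True: S = L L^T (spectral decomposition)

m = 2                               # keep the first m columns for an m-factor approximation
print(np.round(L[:, :m] @ L[:, :m].T, 3))
```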

As we saw previously,
$$x_i = \mu_i + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$
or equivalently
$$x_i - \mu_i = \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i$$

The sample versions of these are
$$x_i = \bar{x}_i + \sum_{j=1}^m b_{ij} y_j + \varepsilon_i$$
or equivalently
$$x_i - \bar{x}_i = \sum_{j=1}^m b_{ij} y_j + \varepsilon_i$$

We have also seen previously that

$$\mathrm{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \mathrm{var}(\varepsilon_i)$$

The sample version is therefore

$$s_i^2 = \sum_{j=1}^m b_{ij}^2 + \mathrm{var}(\varepsilon_i)$$

and so the specific variance $\phi_i$ can be estimated by
$$s_i^2 - \sum_{j=1}^m b_{ij}^2$$
Similarly, the communality $\varphi_i$ can be estimated by
$$\sum_{j=1}^m b_{ij}^2$$
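As a quick check of these two estimates, here is a short sketch using the same made-up covariance matrix as in the sketch above (all numbers are purely illustrative):

```python
import numpy as np

# Same made-up 3 x 3 sample covariance matrix S as in the earlier sketch
S = np.array([[2.0, 0.8, 0.6],
              [0.8, 1.5, 0.5],
              [0.6, 0.5, 1.0]])

lam, C = np.linalg.eigh(S)
lam, C = lam[::-1], C[:, ::-1]            # lambda_1 >= ... >= lambda_k
L = C * np.sqrt(lam)                      # b_ij = sqrt(lambda_j) * c_ij

m = 1                                     # number of retained factors
B = L[:, :m]

communality = (B ** 2).sum(axis=1)        # estimated communality: sum_j b_ij^2
specific_var = np.diag(S) - communality   # estimated specific variance: s_i^2 - sum_j b_ij^2
print(np.round(communality, 3))
print(np.round(specific_var, 3))
```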

16 thoughts on “Basic Concepts of Factor Analysis”

  1. Dear Charles, I have been your fan / user for a long time and I'm very happy that you share your knowledge and the useful Excel add-in; it has saved my working life many times! I have a question concerning FA and PCA: could you summarize the differences between the two analyses? I mean not in mathematical terms but in the interpretation of the results that the two analyses produce (i.e. starting from the same dataset). It would help to better understand which one to use and/or how to discuss the meaning of the results. Many thanks again

  2. Charles,
    One last question: if I want to create an index that will cover a few years, is it correct to do it in this way:
    1) run PCA on every year separately;
    2) get the weights for my variables from the eigenvectors of the first principal component for each year;
    3) average all the weights across the time series and then apply them to each year to get fixed weights for every year and thus obtain comparable indexes.
    Or is it not possible to use PCA in this way?
    Thank you!
    Maria

    • Hi Maria,
      1. You can run PCA for every year, but what purpose this will serve depends on why you want to run PCA in the first place.
      2 and 3. Not sure why you would want to do this.
      Sorry if my answers are not very helpful, but it is not clear to me what your objective is and so I am unable to determine whether PCA is the way to address this objective.
      Charles

  3. Hello Charles,
    Thank you very much for your hard work, it's greatly appreciated. Could you please tell me if I need to normalize the data before performing factor analysis or PCA with the Real Statistics pack? I'm trying to construct a composite index and some of the variables don't fit the normal distribution even after performing a Box-Cox transformation…
    Maria

  4. Also, I think there might be a typo in:
    Observation: Since μi = E[xi] = E[βi0 + \sum_{j=1}^m \beta_{ij} yi + εi] = E[βi0] + \sum_{j=1}^m \beta_{ij} E[yi] + E[εi]

    The sum should be over the yj (not yi)?

    Thanks!
    Fred

  5. Hello again Charles,
    When you say: “Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.”
    Shouldn’t it be (m+1) to account for the constant term β0?
    Thanks!
    Fred

  6. Hi Charles,

    Thanks for the wonderful material. I've gone through the factor analysis concepts multiple times. However, I am not able to reconcile the following concepts. In PCA, we are trying to express y as a linear combination of x. Here x is known and the betas and y are unknown. By trying to maximize the variance of y subject to some constraints, we are able to solve for the betas. Once we know beta, we can calculate y. However, in factor analysis we are doing the converse, i.e. trying to express x as a linear combination of y. Yet here too the solution we get is the same as that in PCA; that is, the decomposition of the covariance matrix of X gives us the betas. How can the matrix beta be the same in both situations?

    Would appreciate if you could provide some explanation.

    Regards,

    • Piyush,
      I am sorry, but I have not had the time to study your comment in any detail. Can you give me a specific example where the two beta matrices are the same?
      Charles

      • I think the confusion comes from the notation using x and y in both. For PCA, the x's and y's are all observed variables. Y is just a linear combination of the x's, and the loadings are derived straight from the covariance matrix of the x's. The eigenvectors indicate a new set of axes and we project the data onto them.

        I’m still trying to wrap my head around how we get the regression-like loadings in factor analysis when we don’t observe the latent predictors! I’ve seen PCA and EFA on the same data report very different eigenvalues. Which matrices are being used?

        I do suggest changing the notation for the latent variables in your factor analysis articles to something other than y, perhaps f for "factor" or one of the Greek letters used in the literature (like theta, eta, or xi).

  7. Hi, I understand the analogy that bij, the loading, is a piece of information that resides in the reduced model (I've only read the PCA material so far), but I don't understand where cov(xi,yj) and cov(xi,xj) come from
    (I thought i and j are different dimensions, namely k and m)
    …and I do not understand what I have to substitute for them to verify the relationship, say in the population version.

    Can you please provide an illustration? Otherwise, the site has been a great experience so far, and I'm sure it will stay that way until I consume it (what I can).

    • Regarding the first type of covariance, i and j take any value from 1 to k. Thus there are k x k different versions of cov(xi,xj).

      Regarding the second type of covariance, i takes any value from 1 to k and j takes any value from 1 to m. Thus there are k x m different versions of cov(xi,yj).

      Charles

