Objective
In this model, we consider k independent variables x1, …, xk and observed data for each of these variables. Our objective is to identify m factors y1, …, ym, where m ≤ k and preferably m is as small as possible, that explain the observed data more succinctly.
Defining terms
Definition 1: Let X = [xi] be a random k × 1 column vector where each xi represents an observable trait, and let μ = [μi] be the k × 1 column vector of the population means. Thus E[xi] = μi. Let Y = [yi] be an m × 1 vector of unobserved common factors where m ≤ k. These factors play a role similar to the principal components in Principal Component Analysis.
We next suppose that each xi can be represented as a linear combination of the factors as follows:

x_i = \beta_{i0} + \beta_{i1} y_1 + \beta_{i2} y_2 + \cdots + \beta_{im} y_m + \varepsilon_i
where the εi are the components that are not explained by the linear relationship. We further assume that the mean of each εi is 0 and the factors are independent with mean 0 and variance 1. We can consider the above equations to be a series of regression equations.
The coefficient βij is called the loading of the ith variable on the jth factor. The term εi is called the specific factor for the ith variable. Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.
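In matrix form, this is just a restatement of the k regression equations above (the symbol β0 for the k × 1 vector of intercepts is introduced here only for convenience and is not part of the definitions above):

X = \beta_0 + \beta Y + \varepsilon, \qquad \beta_0 = [\beta_{i0}], \quad \beta = [\beta_{ij}], \quad \varepsilon = [\varepsilon_i]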
Define the communality of variable xi to be \phi_i = \sum_{j=1}^m \beta_{ij}^2, and let \psi_i = \mathrm{var}(\varepsilon_i) (the specific variance) and \sigma_i^2 = \mathrm{var}(x_i).
Key Properties
Since

\mu_i = E[x_i] = E\Big[\beta_{i0} + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i\Big] = E[\beta_{i0}] + \sum_{j=1}^m \beta_{ij} E[y_j] + E[\varepsilon_i] = \beta_{i0} + 0 + 0 = \beta_{i0}

it follows that the intercept term βi0 = μi, and so the regression equations can be expressed as

x_i - \mu_i = \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i
From the assumptions stated above it also follows that:
E[xi] = μi for all i
E[εi] = 0 for all i (the specific factors are presumed to be random with mean 0)
cov(yi, yj) = 0 if i ≠ j
cov(εi, εj) = 0 if i ≠ j
cov(yi, εj) = 0 for all i, j
From Property A of Correlation Advanced and Property 3 of Basic Concepts of Correlation, we get the following:

\mathrm{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \psi_i = \phi_i + \psi_i

\mathrm{cov}(x_i, x_j) = \sum_{h=1}^m \beta_{ih}\beta_{jh} \quad \text{for } i \ne j

\mathrm{cov}(x_i, y_j) = \beta_{ij}
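As a quick check (a one-line derivation from the assumptions listed above, not part of the original text), the last identity follows directly from the model:

\mathrm{cov}(x_i, y_j) = \mathrm{cov}\Big(\sum_{h=1}^m \beta_{ih} y_h + \varepsilon_i,\; y_j\Big) = \sum_{h=1}^m \beta_{ih}\,\mathrm{cov}(y_h, y_j) + \mathrm{cov}(\varepsilon_i, y_j) = \beta_{ij}\,\mathrm{var}(y_j) = \beta_{ij}

since cov(yh, yj) = 0 for h ≠ j, var(yj) = 1 and cov(εi, yj) = 0.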
From these equivalences, it follows that the population covariance matrix Σ for X has the form

\Sigma = \beta\beta^T + \Psi

where Ψ is the k × k diagonal matrix with ψi in the ith position on the diagonal.
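A small numeric sketch of this structure may help; the loading and variance values below are made up purely for illustration (numpy assumed):

```python
import numpy as np

# Hypothetical loadings for k = 3 observed variables on m = 2 factors
beta = np.array([[0.8, 0.3],
                 [0.7, 0.4],
                 [0.2, 0.9]])
Psi = np.diag([0.25, 0.20, 0.15])   # specific variances psi_i on the diagonal

Sigma = beta @ beta.T + Psi         # population covariance matrix: Sigma = beta beta^T + Psi

communality = (beta ** 2).sum(axis=1)                           # phi_i = sum_j beta_ij^2
print(np.allclose(np.diag(Sigma), communality + np.diag(Psi)))  # True: var(x_i) = phi_i + psi_i
```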
Eigenvalues
Let λ1 ≥ … ≥ λk be the eigenvalues of Σ with corresponding unit eigenvectors γ1, …, γk, where each eigenvector γj = [γij] is a k × 1 column vector. Now define the k × k matrix β = [βij] such that \beta_{ij} = \sqrt{\lambda_j}\,\gamma_{ij} for all 1 ≤ i, j ≤ k. As observed in Linear Algebra Background, all the eigenvalues of Σ are non-negative, and so the βij are well defined (see Property 8 of Positive Definite Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem), it follows that

\Sigma = \beta\beta^T
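Spelled out, with Γ the k × k matrix whose jth column is γj and Λ = diag(λ1, …, λk) (standard notation, not used elsewhere on this page):

\Sigma = \Gamma \Lambda \Gamma^T = \big(\Gamma \Lambda^{1/2}\big)\big(\Gamma \Lambda^{1/2}\big)^T = \beta\beta^T \qquad \text{where } \beta = \Gamma\Lambda^{1/2}, \;\; \beta_{ij} = \sqrt{\lambda_j}\,\gamma_{ij}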
As usual, we will approximate the population covariance matrix Σ by the sample covariance matrix S (for a given random sample). Using the above logic, it follows that

S = LL^T

where λ1 ≥ … ≥ λk are the eigenvalues of S (a slight abuse of notation since these are not the same as the eigenvalues of Σ) with corresponding unit eigenvectors C1, …, Ck, where Cj = [cij], and L = [bij] is the k × k matrix such that b_{ij} = \sqrt{\lambda_j}\,c_{ij}.
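A minimal numpy sketch of this sample factorization (the data are randomly generated just for illustration; numpy.linalg.eigh returns eigenvalues in ascending order, so the columns are reversed to get λ1 ≥ … ≥ λk):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # hypothetical sample: n = 100 observations of k = 4 variables

S = np.cov(X, rowvar=False)            # sample covariance matrix S (k x k)

eigvals, eigvecs = np.linalg.eigh(S)   # unit eigenvectors as columns, eigenvalues ascending
order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= ... >= lambda_k
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

L = eigvecs * np.sqrt(eigvals)         # b_ij = sqrt(lambda_j) * c_ij (scales each column)
print(np.allclose(S, L @ L.T))         # True: S = L L^T
```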
As we saw previously

\mathrm{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \psi_i \qquad\qquad \mathrm{cov}(x_i, x_j) = \sum_{h=1}^m \beta_{ih}\beta_{jh} \quad (i \ne j)

The sample versions of these are

s_i^2 = \sum_{j=1}^k b_{ij}^2 \qquad\qquad s_{ij} = \sum_{h=1}^k b_{ih} b_{jh}
We have also seen previously that

\mathrm{cov}(x_i, y_j) = \beta_{ij}

The sample version is therefore

\widehat{\mathrm{cov}}(x_i, y_j) = b_{ij}
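Continuing in the same spirit, the sketch below keeps only the first m columns of L as loading estimates and treats the leftover variance as specific variance; this is a sketch of the usual next step in this kind of factor extraction, not a quote of the text above, and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # same hypothetical sample as in the previous sketch
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
L = eigvecs * np.sqrt(eigvals)                # full k x k loading matrix, as before

m = 2                                         # number of factors to retain
B = L[:, :m]                                  # estimated loadings: first m columns of L
communality = (B ** 2).sum(axis=1)            # estimated phi_i = sum of b_ij^2 over retained factors
specific_var = np.diag(S) - communality       # estimated psi_i = s_i^2 - phi_i
residual = S - (B @ B.T + np.diag(specific_var))
print(np.abs(residual).max())                 # small when the first m factors reproduce S well
```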
Comments

Dear Charles, I've been a fan and user of yours for a long time and I'm very happy that you share your knowledge and the useful Excel add-in; it has saved my working life many times! I have a question concerning FA and PCA: could you summarize the differences between the two analyses? I mean not in mathematical terms, but in the interpretation of the results that the two analyses produce (i.e., starting from the same dataset). It would help me better understand which one to use and/or how to discuss the meaning of the results. Many thanks again.
I am quite pleased that you have gained value from the Real Statistics website.
Regarding the difference between FA and PCA, perhaps the following will help:
https://www.theanalysisfactor.com/the-fundamental-difference-between-principal-component-analysis-and-factor-analysis/
Charles
Charles,
One last question: if I want to create an index that will cover a few years, is it correct to do it this way:
1) run PCA on every year separately;
2) get the weights for my variables from the eigenvectors of the first principal component for each year;
3) find an average of all the weights in the time series and then apply them to each year to get fixed weights for every year, thus obtaining comparable indexes.
Or is it not possible to use PCA in this way?
Thank you!
Maria
Hi Maria,
1. You can run PCA for every year, but what purpose this will serve depends on why you want to run PCA in the first place.
2 and 3. Not sure why you would want to do this.
Sorry if my answers are not very helpful, but it is not clear to me what your objective is and so I am unable to determine whether PCA is the way to address this objective.
Charles
Hello Charles,
Thank you very much for your hard work, it’s greatly appreciated. Could you please tell me if I need to normalize the data before performing factor analysis or PCA with the Real Statistics pack? I’m trying to construct a composite index and some of the variables don’t fit the normal distribution even after performing a Box-Cox transformation…
Maria
Maria,
The data doesn’t need to be normally distributed.
Charles
Thank you, Charles!
One more question: should the data be cleared of outliers for FA/PCA to be done correctly?
Maria
Also, I think there might be a typo in:
Observation: Since μi = E[xi] = E[βi0 + \sum_{j=1}^m \beta_{ij} yi + εi] = E[βi0] + \sum_{j=1}^m \beta_{ij} E[yi] + E[εi]
The sum should be over the yj (not yi)?
Thanks!
Fred
Hello again Charles,
When you say: “Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.”
Shouldn’t it be (m+1) to account for the constant term β0?
Thanks!
Fred
Hi Charles,
Thanks for the wonderful material. I’ve gone through the factor analysis concepts multiple times. However, I am not able to reconcile the following. In PCA, we are trying to express y as a linear combination of x. Here x is known and the betas and y are unknown. By trying to maximize the variance of y subject to some constraints, we are able to solve for the betas. Once we know beta, we can calculate y. However, in factor analysis we are doing the converse, i.e. trying to express x as a linear combination of y. Yet here too the solution we get is the same as in PCA; that is, the decomposition of the covariance matrix of X gives us the betas. How can the matrix beta be the same in both situations?
Would appreciate if you could provide some explanation.
Regards,
Piyush,
I am sorry, but I have not had the time to study your comment in any detail. Can you give me a specific example where the two beta matrices are the same?
Charles
I think the confusion comes from the notation using x and y in both. For PCA, x and y are all observed variables: y is just a linear combination of the x’s, and we know the loadings, derived straight from the covariance matrix of the x’s. The eigenvectors indicate a new set of axes and we project the data onto them.
I’m still trying to wrap my head around how we get the regression-like loadings in factor analysis when we don’t observe the latent predictors! I’ve seen PCA and EFA on the same data report very different eigenvalues. Which matrices are being used?
I do suggest changing the notation for the latent variables in your factor analysis articles to something other than y. Perhaps f for “factor” or some other greek letter used in literature (like theta, eta, xi).
Hi Brian,
1. Why is using both x and y confusing?
2. Re your comment about regression-like loadings, is this addressed on the following webpage?
https://real-statistics.com/multivariate-statistics/factor-analysis/factor-scores/
3. You ask “Which matrices are being used?” in the last sentence of the second paragraph of your comment, but I don’t understand which matrices you are referring to.
Charles
Hi, I understand the analogy that bij, the loading, is a piece of information that resides in the reduced model (I’ve only read the PCA material so far), but I don’t understand where cov(xi, yj) and cov(xi, xj) come from (I thought i and j ranged over different dimensions, namely k and m), and I do not understand what I have to substitute into them to verify the relationship, say in the population version.
Can you please provide an illustration? Otherwise the site has been a great experience so far, and I’m sure it will stay that way until I’ve consumed it (what I can).
Regarding the first type of covariance, i and j each take any value from 1 to k. Thus there are k × k different versions of cov(xi, xj).
Regarding the second type of covariance, i takes any value from 1 to k and j takes any value from 1 to m. Thus there are k × m different versions of cov(xi, yj).
Charles