Objective
In this model, we consider k independent variables x1, …, xk and observed data for each of these variables. Our objective is to identify m factors y1, …, ym, where m ≤ k and preferably m is as small as possible, that explain the observed data more succinctly.
Defining terms
Definition 1: Let X = [xi] be a random k × 1 column vector where each xi represents an observable trait, and let μ = [μi] be the k × 1 column vector of the population means. Thus E[xi] = μi. Let Y = [yi] be an m × 1 vector of unobserved common factors where m ≤ k. These factors play a role similar to the principal components in Principal Component Analysis.
We next suppose that each xi can be represented as a linear combination of the factors as follows:

x_i = \beta_{i0} + \beta_{i1} y_1 + \beta_{i2} y_2 + \cdots + \beta_{im} y_m + \varepsilon_i
where the εi are the components that are not explained by the linear relationship. We further assume that the mean of each εi is 0 and the factors are independent with mean 0 and variance 1. We can consider the above equations to be a series of regression equations.
The coefficient βij is called the loading of the ith variable on the jth factor. The term εi is called the specific factor for the ith variable. Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.
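In matrix form, this is just a restatement of the k regression equations above (the symbol β0 for the k × 1 vector of intercepts is introduced here only for convenience and is not part of the definitions above):

X = \beta_0 + \beta Y + \varepsilon, \qquad \beta_0 = [\beta_{i0}], \quad \beta = [\beta_{ij}], \quad \varepsilon = [\varepsilon_i]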
Define the communality of variable xi to be \phi_i = \sum_{j=1}^m \beta_{ij}^2, and let \psi_i = \mathrm{var}(\varepsilon_i) (the specific variance) and \sigma_i^2 = \mathrm{var}(x_i).
Key Properties
Since

\mu_i = E[x_i] = E\Big[\beta_{i0} + \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i\Big] = E[\beta_{i0}] + \sum_{j=1}^m \beta_{ij} E[y_j] + E[\varepsilon_i] = \beta_{i0} + 0 + 0 = \beta_{i0}

it follows that the intercept term βi0 = μi, and so the regression equations can be expressed as

x_i - \mu_i = \sum_{j=1}^m \beta_{ij} y_j + \varepsilon_i
From the assumptions stated above it also follows that:
E[xi] = μi for all i
E[εi] = 0 for all i (the specific factors are presumed to be random with mean 0)
cov(yi, yj) = 0 if i ≠ j
cov(εi, εj) = 0 if i ≠ j
cov(yi, εj) = 0 for all i, j
From Property A of Correlation Advanced and Property 3 of Basic Concepts of Correlation, we get the following:

\mathrm{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \psi_i = \phi_i + \psi_i

\mathrm{cov}(x_i, x_j) = \sum_{h=1}^m \beta_{ih}\beta_{jh} \quad \text{for } i \ne j

\mathrm{cov}(x_i, y_j) = \beta_{ij}
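As a quick check (a one-line derivation from the assumptions listed above, not part of the original text), the last identity follows directly from the model:

\mathrm{cov}(x_i, y_j) = \mathrm{cov}\Big(\sum_{h=1}^m \beta_{ih} y_h + \varepsilon_i,\; y_j\Big) = \sum_{h=1}^m \beta_{ih}\,\mathrm{cov}(y_h, y_j) + \mathrm{cov}(\varepsilon_i, y_j) = \beta_{ij}\,\mathrm{var}(y_j) = \beta_{ij}

since cov(yh, yj) = 0 for h ≠ j, var(yj) = 1 and cov(εi, yj) = 0.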
From these equivalences, it follows that the population covariance matrix Σ for X has the form

\Sigma = \beta\beta^T + \Psi

where Ψ is the k × k diagonal matrix with ψi in the ith position on the diagonal.
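A small numeric sketch of this structure may help; the loading and variance values below are made up purely for illustration (numpy assumed):

```python
import numpy as np

# Hypothetical loadings for k = 3 observed variables on m = 2 factors
beta = np.array([[0.8, 0.3],
                 [0.7, 0.4],
                 [0.2, 0.9]])
Psi = np.diag([0.25, 0.20, 0.15])   # specific variances psi_i on the diagonal

Sigma = beta @ beta.T + Psi         # population covariance matrix: Sigma = beta beta^T + Psi

communality = (beta ** 2).sum(axis=1)                           # phi_i = sum_j beta_ij^2
print(np.allclose(np.diag(Sigma), communality + np.diag(Psi)))  # True: var(x_i) = phi_i + psi_i
```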
Eigenvalues
Let λ1 ≥ … ≥ λk be the eigenvalues of Σ with corresponding unit eigenvectors γ1, …, γk, where each eigenvector γj = [γij] is a k × 1 column vector. Now define the k × k matrix β = [βij] such that \beta_{ij} = \sqrt{\lambda_j}\,\gamma_{ij} for all 1 ≤ i, j ≤ k. As observed in Linear Algebra Background, all the eigenvalues of Σ are non-negative, and so the βij are well defined (see Property 8 of Positive Definite Matrices). By Theorem 1 of Linear Algebra Background (Spectral Decomposition Theorem), it follows that

\Sigma = \beta\beta^T
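Spelled out, with Γ the k × k matrix whose jth column is γj and Λ = diag(λ1, …, λk) (standard notation, not used elsewhere on this page):

\Sigma = \Gamma \Lambda \Gamma^T = \big(\Gamma \Lambda^{1/2}\big)\big(\Gamma \Lambda^{1/2}\big)^T = \beta\beta^T \qquad \text{where } \beta = \Gamma\Lambda^{1/2}, \;\; \beta_{ij} = \sqrt{\lambda_j}\,\gamma_{ij}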
As usual, we will approximate the population covariance matrix Σ by the sample covariance matrix S (for a given random sample). Using the above logic, it follows that

S = LL^T

where λ1 ≥ … ≥ λk are the eigenvalues of S (a slight abuse of notation since these are not the same as the eigenvalues of Σ) with corresponding unit eigenvectors C1, …, Ck, where Cj = [cij], and L = [bij] is the k × k matrix such that b_{ij} = \sqrt{\lambda_j}\,c_{ij}.
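A minimal numpy sketch of this sample factorization (the data are randomly generated just for illustration; numpy.linalg.eigh returns eigenvalues in ascending order, so the columns are reversed to get λ1 ≥ … ≥ λk):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # hypothetical sample: n = 100 observations of k = 4 variables

S = np.cov(X, rowvar=False)            # sample covariance matrix S (k x k)

eigvals, eigvecs = np.linalg.eigh(S)   # unit eigenvectors as columns, eigenvalues ascending
order = np.argsort(eigvals)[::-1]      # reorder so that lambda_1 >= ... >= lambda_k
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

L = eigvecs * np.sqrt(eigvals)         # b_ij = sqrt(lambda_j) * c_ij (scales each column)
print(np.allclose(S, L @ L.T))         # True: S = L L^T
```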
As we saw previously

\mathrm{var}(x_i) = \sum_{j=1}^m \beta_{ij}^2 + \psi_i \qquad\qquad \mathrm{cov}(x_i, x_j) = \sum_{h=1}^m \beta_{ih}\beta_{jh} \quad (i \ne j)

The sample versions of these are

s_i^2 = \sum_{j=1}^k b_{ij}^2 \qquad\qquad s_{ij} = \sum_{h=1}^k b_{ih} b_{jh}
We have also seen previously that

\mathrm{cov}(x_i, y_j) = \beta_{ij}

The sample version is therefore

\widehat{\mathrm{cov}}(x_i, y_j) = b_{ij}
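Continuing in the same spirit, the sketch below keeps only the first m columns of L as loading estimates and treats the leftover variance as specific variance; this is a sketch of the usual next step in this kind of factor extraction, not a quote of the text above, and the variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # same hypothetical sample as in the previous sketch
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
L = eigvecs * np.sqrt(eigvals)                # full k x k loading matrix, as before

m = 2                                         # number of factors to retain
B = L[:, :m]                                  # estimated loadings: first m columns of L
communality = (B ** 2).sum(axis=1)            # estimated phi_i = sum of b_ij^2 over retained factors
specific_var = np.diag(S) - communality       # estimated psi_i = s_i^2 - phi_i
residual = S - (B @ B.T + np.diag(specific_var))
print(np.abs(residual).max())                 # small when the first m factors reproduce S well
```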
Comments

Dear Charles, I've been a fan and user of yours for a long time and I'm very happy that you share your knowledge and the useful Excel add-in; it has saved my working life many times! I have a question concerning FA and PCA: could you summarize the differences between the two analyses? I mean not in mathematical terms, but in the interpretation of the results that the two analyses produce (i.e., starting from the same dataset). It would help me better understand which one to use and/or how to discuss the meaning of the results. Many thanks again.
I am quite pleased that you have gained value from the Real Statistics website.
Regarding the difference between FA and PCA, perhaps the following will help:
https://www.theanalysisfactor.com/the-fundamental-difference-between-principal-component-analysis-and-factor-analysis/
Charles
Charles,
One last question: if I want to create an index that will cover a few years, is it correct to do it this way:
1) run PCA on every year separately;
2) get the weights for my variables from the eigenvectors of the first principal component for each year;
3) find an average of all the weights in the time series and then apply them to each year to get fixed weights for every year, thus obtaining comparable indexes.
Or is it not possible to use PCA in this way?
Thank you!
Maria
Hi Maria,
1. You can run PCA for every year, but what purpose this will serve depends on why you want to run PCA in the first place.
2 and 3. Not sure why you would want to do this.
Sorry if my answers are not very helpful, but it is not clear to me what your objective is and so I am unable to determine whether PCA is the way to address this objective.
Charles
Hello Charles,
Thank you very much for your hard work, it’s greatly appreciated. Could you please tell me if I need to normalize the data before performing factor analysis or PCA with the Real Statistics pack? I’m trying to construct a composite index and some of the variables don’t fit the normal distribution even after performing a Box-Cox transformation…
Maria
Maria,
The data doesn’t need to be normally distributed.
Charles
Thank you, Charles!
One more question: should the data be cleared of outliers for FA/PCA to be done correctly?
Maria
Also, I think there might be a typo in:
Observation: Since μi = E[xi] = E[βi0 + \sum_{j=1}^m \beta_{ij} yi + εi] = E[βi0] + \sum_{j=1}^m \beta_{ij} E[yi] + E[εi]
The sum should be over the yj (not yi)?
Thanks!
Fred
Hello again Charles,
When you say: “Let β = [βij] be the k × m matrix of loading factors and let ε = [εi] be the k × 1 column vector of specific factors.”
Shouldn’t it be (m+1) to account for the constant term β0?
Thanks!
Fred
Hi Charles,
Thanks for the wonderful material. I’ve gone through the factor analysis concepts multiple times. However, I am not able to reconcile the following. In PCA, we are trying to express y as a linear combination of x. Here x is known and the betas and y are unknown. By trying to maximize the variance of y subject to some constraints, we are able to solve for the betas. Once we know beta, we can calculate y. However, in factor analysis we are doing the converse, i.e. trying to express x as a linear combination of y. Yet here too the solution we get is the same as in PCA; that is, the decomposition of the covariance matrix of X gives us the betas. How can the matrix beta be the same in both situations?
Would appreciate if you could provide some explanation.
Regards,
Piyush,
I am sorry, but I have not had the time to study your comment in any detail. Can you give me a specific example where the two beta matrices are the same?
Charles
I think the confusion comes from the notation using x and y in both. For PCA, x and y are all observed variables: y is just a linear combination of the x’s, and we know the loadings, derived straight from the covariance matrix of the x’s. The eigenvectors indicate a new set of axes and we project the data onto them.
I’m still trying to wrap my head around how we get the regression-like loadings in factor analysis when we don’t observe the latent predictors! I’ve seen PCA and EFA on the same data report very different eigenvalues. Which matrices are being used?
I do suggest changing the notation for the latent variables in your factor analysis articles to something other than y. Perhaps f for “factor” or some other greek letter used in literature (like theta, eta, xi).
Hi Brian,
1. Why is using both x and y confusing?
2. Re your comment about regression-like loadings, is this addressed on the following webpage?
https://real-statistics.com/multivariate-statistics/factor-analysis/factor-scores/
3. You ask “Which matrices are being used?” in the last sentence of the second paragraph of your comment, but I don’t understand which matrices you are referring to.
Charles
Hi, I understand the analogy that bij, the loading, is a piece of information that resides in the reduced model (I’ve only read the PCA material so far), but I don’t understand where cov(xi, yj) and cov(xi, xj) come from (I thought i and j ranged over different dimensions, namely k and m), and I do not understand what I have to substitute into them to verify the relationship, say in the population version.
Can you please provide an illustration? Otherwise the site has been a great experience so far, and I’m sure it will stay that way until I’ve consumed it (what I can).
Regarding the first type of covariance, i and j each take any value from 1 to k. Thus there are k × k different versions of cov(xi, xj).
Regarding the second type of covariance, i takes any value from 1 to k and j takes any value from 1 to m. Thus there are k × m different versions of cov(xi, yj).
Charles