Exploratory factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of a smaller number of common underlying dimensions. This involves finding a way of condensing the information contained in some of the original variables into a smaller set of implicit variables (called factors) with a minimum loss of information.
For example, suppose you would like to test the observation that customer satisfaction is based on product knowledge, communication skills and people skills. You develop a new questionnaire about customer satisfaction with 30 questions: 10 concerning product knowledge, 10 concerning communication skills and 10 concerning people skills. Before using the questionnaire on your sample, you pretest it on a group of people similar to those who will be completing your survey.
You perform a factor analysis to see if there are really these three factors. If there are, you will be able to create three separate scales, by summing the items on each dimension.
Factor analysis is based on a correlation table. If there are k items in the study (e.g. k questions in the above example) then the correlation table has k × k entries of the form r_ij, where each r_ij is the correlation coefficient between item i and item j. The main diagonal consists of entries with value 1.
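As a minimal sketch of the correlation table just described, the following Python snippet builds a k × k table from simulated responses and checks the properties mentioned above. The data, the sizes and the variable names are assumptions made only for illustration; they are not the example used on this site.

```python
import numpy as np

# Hypothetical data: 50 respondents answering k = 6 items (simulated, not real survey data)
rng = np.random.default_rng(0)
n_respondents, k = 50, 6
scores = rng.normal(size=(n_respondents, k))

# Rows are respondents and columns are items, so rowvar=False correlates the items
R = np.corrcoef(scores, rowvar=False)

print(R.shape)                       # the table has k x k entries
print(np.allclose(np.diag(R), 1.0))  # main diagonal entries are 1
print(np.allclose(R, R.T))           # the correlation of i with j equals that of j with i
```

With real questionnaire data, `scores` would instead hold one row per respondent and one column per question.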
Closely related to factor analysis is principal component analysis, which creates a picture of the relationships between the variables that is useful in identifying common factors.
Factor analysis is based on various concepts from Linear Algebra, in particular eigenvalues, eigenvectors, orthogonal matrices, and the spectral theorem. We review these concepts first before explaining how principal component analysis and factor analysis work.
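To make these linear algebra concepts a little more concrete, here is a small Python sketch that decomposes a correlation matrix into eigenvalues and eigenvectors. The 3 × 3 matrix is hypothetical (values chosen only for illustration), and the sketch uses NumPy rather than the Excel approach used elsewhere on this site.

```python
import numpy as np

# Hypothetical 3 x 3 correlation matrix (values chosen only for illustration)
R = np.array([
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])

# eigh is intended for symmetric matrices; eigenvalues are returned in ascending order
eigenvalues, V = np.linalg.eigh(R)

# The columns of V are eigenvectors, and V is an orthogonal matrix: V^T V = I
is_orthogonal = np.allclose(V.T @ V, np.eye(3))

# Spectral theorem for a symmetric matrix: R = V diag(eigenvalues) V^T
reconstructed = V @ np.diag(eigenvalues) @ V.T
is_reconstructed = np.allclose(reconstructed, R)

# The eigenvalues of a correlation matrix sum to its trace, i.e. the number of variables
total = eigenvalues.sum()

print(is_orthogonal, is_reconstructed, total)
```

The eigenvalues indicate how much of the total variance lies along each eigenvector, which is why they play a role in deciding how many components or factors to keep.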
Topics
- Linear Algebra Background
- Principal Component Analysis (PCA)
- Basic Concepts of Factor Analysis
- Factor Extraction
- Determining the Number of Factors to Retain
- Rotation
- Factor Scores
- Validity of Correlation Matrix and Sample Size
- Principal Axis Method of Factor Extraction
- Real Statistics Functions and Data Analysis Tools
To illustrate factor analysis we will use an example; a complete description of this example is given on a separate page.
References
Johnson, R. A., Wichern, D. W. (2007) Applied multivariate statistical analysis. 6th Ed. Pearson
https://www.webpages.uidaho.edu/~stevel/519/Applied%20Multivariate%20Statistical%20Analysis%20by%20Johnson%20and%20Wichern.pdf
Rencher, A. C., Christensen, W. F. (2012) Methods of multivariate analysis. 3rd Ed. Wiley
http://ndl.ethernet.edu.et/bitstream/123456789/27185/1/Alvin%20C.%20Rencher_2012.pdf
Greetings Charles,
Does RealStats offer factor analysis for categorical data?
Thanks
Tracy
Tracy,
Real Statistics doesn’t explicitly provide Factor Analysis for categorical data, but according to
https://stats.oarc.ucla.edu/stata/faq/how-can-i-perform-a-factor-analysis-with-categorical-or-categorical-and-continuous-variables/
this can be done using a polychoric correlation matrix. Real Statistics does provide an ability to calculate the polychoric correlation, as described at
https://real-statistics.com/correlation/polychoric-correlation/
Charles
Correction: my data is in two parts. One set of variables is binary: I coded the open-ended responses and assigned a binary code for whether or not the respondent mentioned a given reason for intending to use a new product. The other set is scale data: a 5-point scale of importance for features being considered for the new product. What would be the best technique to use for factor analysis, and also for some sort of correlation/regression analysis? I am trying to understand which features to include in the new product and what will influence usage intent, in order to uncover benefits (positive marketing attributes) and barriers (resistance to use/trial). Thanks! Most of what I’ve done so far has not yielded useful results. In the factor analysis most loadings are negative; in the ordinal regression most p-values are insignificant or, if significant, the “best” features are negative. My overall p-value (E-205!!) and percent correct (55%) seem to be very strong. I also tried multiple linear regression, and the results are very similar to the ordinal results except that the significant ones are positive, but I know it’s not the right statistic to use. What am I doing wrong? Any advice would be very much appreciated!
Hello Tracy,
I don’t really know what you are doing wrong if anything.
One note: p-value = E-205 should be interpreted as p-value = 0 (or some very small positive value).
Charles
I have to find fraudulent sellers in a vast list of sellers. Can I use factor analysis for this? If yes, how?
It really depends on the data, but factor analysis could be used for this purpose. See the webpage for how to do this.
Charles
Suppose I run a factor analysis of three constructs. The rotated matrix indicates five principal components. Some of the variables in construct 1 (factor 1) move to factor 2, while the others form an unknown factor. Can I continue with the analysis?
I am sorry, but I don’t completely understand the issue that you are describing. My guess is that you can continue with the analysis.
Charles
Respected sir,
I have done a survey to find the influence of 57 factors on an issue. There are 52 respondents, who gave their answers on a fuzzy (linguistic) scale, and I have converted the resulting triangular fuzzy numbers to aggregate (defuzzified) values.
I tried the Cronbach’s alpha test and got the following results:
k 57
sumvar 1.494146092
var 21.53858833
alpha -46.81785658
What does this indicate? Can you please guide me?
Sample data is as follows:
RESPONDENTS FACTOR1 FACTOR2 FACTOR3
1 0.924211376 0.924211376 0.777281588
2 0.924211376 0.924211376 0.777281588
3 0.777281588 0.777281588 0.540061725
4 0.924211376 0.924211376 0.924211376
5 0.777281588 0.777281588 0.777281588
6 0.777281588 0.777281588 0.777281588
7 0.777281588 0.777281588 0.540061725
8 0.924211376 0.777281588 0.924211376
9 0.777281588 0.777281588 0.777281588
What is your question?
Charles
Milind,
If the value of Cronbach’s alpha is -46.8 then you probably made an error in the calculation. If it is -46.8% then this is a very poor result, indicating that you are probably testing multiple concepts. In that case, you would need to calculate a value of Cronbach’s alpha for each concept. You can use factor analysis to determine what these concepts (i.e. the factors) are. This would map the 57 variables (what you called factors) into a much smaller number of hidden factors (each one corresponding to a concept) and furthermore help you determine which of these variables (probably questions in a questionnaire) map into which factor.
Charles
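As a rough check of the calculation discussed above, here is a minimal Python sketch of the standard Cronbach’s alpha formula. It assumes that `sumvar` in the question is the sum of the 57 item variances and that `var` is the variance of the respondents’ total scores; the comment does not state this explicitly, and the function name is only illustrative. Under that reading the formula gives roughly 0.947 rather than -46.8, which would be consistent with an error in the original calculation.

```python
def cronbach_alpha(k, sum_item_variances, total_variance):
    """Cronbach's alpha from the number of items k, the sum of the item
    variances, and the variance of the respondents' total scores."""
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Values quoted in the question above; how they are interpreted here is an assumption
alpha = cronbach_alpha(k=57, sum_item_variances=1.494146092, total_variance=21.53858833)
print(round(alpha, 3))
```

The same function can be applied separately to the items belonging to each factor once the factors have been identified.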
Dear sir
We are collecting data on factors affecting facility location.
The responses are on a linguistic scale, which is finally defuzzified to get a single number.
52 respondents have given responses for 57 factors.
For example:
For a factor like “power supply” we got 52 responses on a linguistic scale.
Similarly, we got responses for all the other factors too.
We want to validate the data that has been collected.
Kindly advise.
thanks
Milind,
You can use factor analysis to reduce the number of variables under consideration. If after using factor analysis you settle on three factors, then you should calculate three separate Cronbach’s Alphas, one for each factor.
Charles
I am using Minitab 18, and the factor loadings, coefficients and scores have been generated. How do I use these to do regression or MANOVA?
Your factor scores are arrived at by summing the factor scores for all factors on each variable. How do I arrive at an aggregate factor score for one factor?
SK,
I don’t use Minitab and so I can’t comment on this.
In Excel you can calculate the factor scores as described on the following webpage:
Factor Scores
Charles
Hi Charles,
I have 3 indices: USA Finance index, USA country index and Finance sector index.
How can I calculate the factor loadings for USA country index and Finance sector index?
I have a correlation matrix from the indices via taking the log of the differences of the series and creating a correlation matrix.
Can you show me in Excel how I can create factor loadings for the country and sector indices via PCA, MLE and regression, please?
Thanks
Joey,
The Real Statistics website shows how to do this. You can also use the Real Statistics software to automate much of the process.
Do you have some specific questions about the process?
Charles
How to use factor analysis for “Project on effectiveness of Recruitment Process Outsourcing at safeducate”
Vedant,
Sorry, but I am not familiar with this project and so can’t tell you how to use factor analysis for this project.
Charles
I want to know the reliability of the responses on 3 statements, and determine which ones to reject, retain or revise. Can factor analysis be a tool for this? How?
You may want to consider Cronbach’s alpha.
Charles
I need help with a factor analysis of a small table of variances in commodities/prices and volumes. Can anyone help me? I have never done one before in Excel.
Thx
Andrew
Dr Charles!
What method should be used when you have categorical variables?
How do you compute a polychoric correlation matrix with a Likert scale?
Thank you.
Bentabet,
I have described how to deal with categorical variables in the context of regression, but I don’t know how much sense this makes in the context of factor analysis. More important, though, is how to deal with ordinal data (such as Likert scales). Here polychoric correlation may be used.
I plan to add a webpage to the Real Statistics website explaining how to calculate the polychoric correlation coefficient (and from this you can create matrices of these coefficients). This should be available in the next couple of days.
Charles
Excellent and very useful
Dear Dr, good morning. How could I do a discriminant analysis with Real Statistics? Or does the package not have that tool?
Many thanks
Gerardo,
Sorry, but this capability is not yet in the package.
Charles
OK Dr, thanks
I am very grateful. Please, what do you mean by underlying dimensions?
Each hidden factor is considered to be a dimension.
I have data from 44 people for a pilot study (23 questions). When I run a reliability test (Cronbach’s alpha) the value is 0.856, but when I do the validity test using the principal component method or any other method it gives a value of 0.56.
Is it a bad score? And could you please advise if there is a way to improve this score in the actual research, based on the pilot study?
Charan,
A score of 0.56 is usually considered not to be great, while 0.856 is considered to be very good.
One of the reasons to do factor analysis is to identify the underlying concepts being studied. If, for example, you identify 3 such underlying concepts (i.e. the factors), you would map the original 23 questions into the 3 factors. You would then calculate three values for Cronbach’s alpha, one for the questions corresponding to factor 1 and separate scores for the questions corresponding to factors 2 and 3. You would usually expect the three separate scores to be higher than the one score based on all the questions.
Charles
When discussing findings from a factor analysis in a narrative-style report, do I need to report any statistics other than the Cronbach’s alpha score?
E.g. I am stating that:
I have found that out of the 8 scale variables used to measure x there are 3 themes a, b, c (Cronbach’s alpha = xxx)
Lola,
This really depends on the specific research and why you are using factor analysis. Once you have used factor analysis to identify the three themes, you can calculate Cronbach’s alpha for each of the three themes to determine the reliability of a questionnaire. You can also use the factor loadings to do all sorts of analyses (regression, ANOVA, etc.) if these are appropriate for your research.
Charles
Yes I will be following on with using these newly identified factors for further analysis.
But my guidelines state I should back up my discussion with statistical evidence but this is supposed to be written as a commercial piece of research whereby the reader is not intended to be a pure stats expert, so how do I demonstrate the accuracy of my factor analysis in this case?
Lola,
Demonstrating the accuracy of your factor analysis won’t necessarily be easy to do to an audience that has no knowledge of statistics. Various indicators are given on the website, but it may be a challenge explaining these to a non-technical audience, especially since even for experts this is not a clear-cut thing.
Charles
Cronbach’s alpha has some limitations, so it might be worth running Guttman’s Lambda if you find that your Cronbach’s alphas are short of the .7 you’re looking for. Cronbach’s alpha tends to underestimate reliability. It would be worth reporting the findings of Guttman’s Lambda (2).
Thanks Christopher,
I will be adding Guttman’s Lambda in one of the next releases of the software.
Charles
Actually, the number of respondents is 700 and the number of attributes is 10.
Kumar,
Are you saying that there are 700 respondents and the number of different attributes is 10?
Charles
I have asked respondents to rank attributes, say A, B, C, D.
Rank 1 is for first preference, rank 2 for second preference, and so on. So each respondent gives 4 ranks, e.g. A, C, B, D as 2, 3, 1, 4.
Now, using these data, can I run factor analysis to find the underlying dimensions of these attributes?
You can certainly run Factor Analysis in this case, as long as the assumptions are met.
Charles
Thank you so much. It’s very helpful
Charles,
Thanks for creating this very informative website. I have a question about factor analysis. I have developed an open-ended survey. Once I get the answers, I would group them according to common ideas. Would I then rank them in number order to run a factor analysis?
Sorry Philip, but you haven’t given me enough information to be able to answer your question.
Charles
The survey deals with the declining participation rate of African-Americans in baseball. It is very open-ended, with some surveys being actual interviews. Would these answers be grouped by common words/themes and then numbered to create a factor analysis similar to your example?
Philip,
If you can figure out how to give a value to the open-ended questions, then I would imagine that you could use Factor Analysis.
Charles
Hi Charles,
I am looking at the possibility of grouping vehicle makes (the variable) using their monthly depreciation %. This is based on my assumption that the depreciation % of certain makes might have the same structure, so that they can be grouped together. Can factor analysis look at the time series at the same time?
In this case, what is the right data structure to use for the analysis?
For example:
Should I put the make in the columns and the model and month of depreciation valuation in the rows, with the depreciation percentage inside the table?
Do you have any examples of forecasting/predicting future events using factor analysis?
Sorry, but I don’t have such an example.
Charles
Thank you so much for making this website, great information and great statistical tools!
Wow, this series was really helpful for me, too. I was away from my SPSS program and needed to run an analysis with Excel. Wasn’t sure how, until I came across this page. Thank you.