Factor Analysis

Exploratory factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of a smaller number of common underlying dimensions. This involves finding a way of condensing the information contained in some of the original variables into a smaller set of implicit variables (called factors) with a minimum loss of information.

For example, suppose you would like to test the observation that customer satisfaction is based on product knowledge, communication skills and people skills. You develop a new questionnaire about customer satisfaction with 30 questions: 10 concerning product knowledge, 10 concerning communication skills and 10 concerning people skills. Before using the questionnaire on your sample, you pretest it on a group of people similar to those who will be completing your survey.

You perform a factor analysis to see whether these three factors really exist. If they do, you will be able to create three separate scales by summing the items on each dimension.
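
As a small illustration of building scales by summing items, here is a minimal Python sketch. The assignment of question numbers to dimensions is an assumption (the text gives only the counts, 10 questions per dimension), and the respondent's scores are made up.

```python
# Assumed layout (not stated in the text): questions 1-10 cover product
# knowledge, 11-20 communication skills, 21-30 people skills.
SCALES = {
    "product_knowledge": range(0, 10),
    "communication_skills": range(10, 20),
    "people_skills": range(20, 30),
}

def scale_scores(responses):
    """Sum one respondent's 30 item scores into one score per scale."""
    return {name: sum(responses[i] for i in idx) for name, idx in SCALES.items()}

# Made-up respondent: 4 on every product item, 3 on every communication
# item and 5 on every people item.
respondent = [4] * 10 + [3] * 10 + [5] * 10
print(scale_scores(respondent))
# {'product_knowledge': 40, 'communication_skills': 30, 'people_skills': 50}
```

In practice the mapping from questions to scales would come from the factor analysis itself, i.e. from which items load on which factor.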

Factor analysis is based on a correlation table. If there are k items in the study (e.g. k questions in the above example) then the correlation table has k × k entries of the form r_ij, where each r_ij is the correlation coefficient between item i and item j. The main diagonal consists of entries with value 1.
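
The correlation table can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names and the response data are made up, and each entry is the ordinary Pearson correlation coefficient.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(items):
    """items: k lists, one per item. Returns the k x k table of r_ij."""
    k = len(items)
    # The diagonal is set to exactly 1: an item correlates perfectly with itself.
    return [[1.0 if i == j else pearson(items[i], items[j]) for j in range(k)]
            for i in range(k)]

# Made-up responses of 6 people to k = 3 items.
items = [
    [4, 5, 3, 2, 4, 1],
    [5, 5, 3, 1, 4, 2],
    [2, 1, 4, 5, 2, 4],
]
R = correlation_matrix(items)
for row in R:
    print([round(r, 3) for r in row])
```

The table is symmetric (r_ij = r_ji), so only the entries above or below the main diagonal carry distinct information.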

Closely related to factor analysis is principal component analysis, which creates a picture of the relationships between the variables useful in identifying common factors.

Factor analysis is based on various concepts from Linear Algebra, in particular eigenvalues, eigenvectors, orthogonal matrices, and the spectral theorem. We review these concepts first before explaining how principal component analysis and factor analysis work.
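
To make the role of eigenvalues and eigenvectors concrete, the following Python sketch approximates the largest eigenvalue of a small correlation matrix and its unit eigenvector by power iteration, then scales the eigenvector by the square root of the eigenvalue to get loadings on the first component (the principal component method of extraction). The matrix values and function names are made up for illustration, and the approach is an assumption rather than the procedure used on this site. In practice a library routine that returns all eigenpairs at once would normally be used.

```python
from math import sqrt

def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dominant_eigenpair(A, iterations=500):
    """Power iteration: approximate the largest eigenvalue of a symmetric
    matrix and a corresponding unit eigenvector."""
    v = [1.0] * len(A)
    for _ in range(iterations):
        w = mat_vec(A, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Av (v has unit length) estimates the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v

# Made-up 3 x 3 correlation matrix.
R = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]
lam, v = dominant_eigenpair(R)

# Loadings of the three items on the first component: sqrt(eigenvalue) * eigenvector.
loadings = [sqrt(lam) * x for x in v]
print(round(lam, 3), [round(x, 3) for x in loadings])
```

Because a correlation matrix is symmetric, the spectral theorem guarantees that its eigenvalues are real and that its eigenvectors can be chosen to be mutually orthogonal, which is what allows the matrix to be decomposed one component at a time.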

To illustrate factor analysis we will use an example, which is described in full on a separate page.

54 thoughts on “Factor Analysis”

    • Tracy,
      Real Statistics doesn’t explicitly provide Factor Analysis for categorical data, but according to
      https://stats.oarc.ucla.edu/stata/faq/how-can-i-perform-a-factor-analysis-with-categorical-or-categorical-and-continuous-variables/
      this can be done using a polychoric correlation matrix. Real Statistics does provide an ability to calculate the polychoric correlation, as described at
      https://real-statistics.com/correlation/polychoric-correlation/
      Charles

    • Correction: my data is in two parts. One set of variables is binary: I coded the open-ended responses and assigned a binary code for whether or not the respondent mentioned a given reason for intended use of a new product. The other set is scale data: a 5-point scale of importance for features being considered for the new product. What would be the best technique to use for factor analysis, and also for some sort of correlation/regression analysis? I am trying to understand which features to include in the new product and what will influence usage intent, in order to uncover benefits (positive marketing attributes) and barriers (resistance to use/trial). Thanks! Most of what I’ve done so far has not yielded useful results. In the factor analysis, most loadings are negative; in the ordinal regression, most features have insignificant p-values or, if significant, the “best” features are negative. My overall p-values (E-205!!) and Correct numbers (55%) seem to be very strong. I also tried multiple linear regression, and the results are very similar to the ordinal ones except that the significant ones are positive, but I know it’s not the right statistic to use. What am I doing wrong? Any advice would be very much appreciated!

      • Hello Tracy,
        I don’t really know what you are doing wrong, if anything.
        One note: a p-value shown as E-205 is on the order of 10^-205, i.e. a very small positive value that is effectively 0.
        Charles

  1. Suppose I run a factor analysis on three constructs. The rotated matrix indicates five principal components. Some of the variables in construct 1 (factor 1) move to factor 2, while the others form an unknown factor. Can I continue with the analysis?

  2. Respected sir,
    I have done a survey to find the influence of 57 factors on an issue. There are 52 respondents who have given me answers on a fuzzy (linguistic) scale, and I have converted the data, obtained as triangular fuzzy numbers, to aggregate numbers (defuzzified values).
    I tried the Cronbach’s alpha test and got the following results:
    k 57
    sumvar 1.494146092
    var 21.53858833
    alpha -46.81785658
    What does this indicate? Can you please guide me?

    • Sample data is as follows:

      RESPONDENTS  FACTOR1      FACTOR2      FACTOR3
      1            0.924211376  0.924211376  0.777281588
      2            0.924211376  0.924211376  0.777281588
      3            0.777281588  0.777281588  0.540061725
      4            0.924211376  0.924211376  0.924211376
      5            0.777281588  0.777281588  0.777281588
      6            0.777281588  0.777281588  0.777281588
      7            0.777281588  0.777281588  0.540061725
      8            0.924211376  0.777281588  0.924211376
      9            0.777281588  0.777281588  0.777281588

    • Milind,
      If the value of Cronbach’s alpha is -46.8 then you probably made an error in the calculation. If it is -46.8% then this is a very poor result, indicating that you are probably testing multiple concepts. In that case, you would need to calculate a value of Cronbach’s alpha for each concept. You can use factor analysis to determine what these concepts (i.e. the factors) are. This would map the 57 variables (what you called factors) into a much smaller number of hidden factors (each one corresponding to a concept) and furthermore help you determine which of these variables (probably questions in a questionnaire) map into which factor.
      Charles
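
As a rough check on the calculation, here is a minimal Python sketch of the usual Cronbach's alpha formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total score). It assumes that "sumvar" in the question is the sum of the item variances and "var" is the variance of the respondents' total scores; the function names are made up for illustration. Under that reading the reported figures give an alpha of roughly 0.947 rather than -46.8, which would be consistent with an error in the original calculation.

```python
def variance(values):
    """Sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(k, sum_item_variances, total_variance):
    """Cronbach's alpha from summary statistics."""
    return k / (k - 1) * (1 - sum_item_variances / total_variance)

def cronbach_alpha_from_rows(rows):
    """rows: one list of k item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    totals = [sum(row) for row in rows]
    return cronbach_alpha(k, sum(item_vars), variance(totals))

# Figures from the question, read as (assumption): number of items,
# sum of item variances, variance of the total score.
alpha = cronbach_alpha(57, 1.494146092, 21.53858833)
print(round(alpha, 3))  # 0.947
```

`cronbach_alpha_from_rows` applies the same formula directly to a respondents-by-items table, which avoids transcription errors in the intermediate sums.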

      • Dear sir,
        We are collecting data on factors affecting facility location.
        The responses are on a linguistic scale, which is finally defuzzified to get a single number.
        52 respondents have given responses for 57 factors.
        For example, for a factor like “power supply” we got 52 responses on a linguistic scale.
        Similarly, we got responses for all the other factors.
        We want to validate the data that has been collected.
        Kindly advise.
        Thanks

        • Milind,
          You can use factor analysis to reduce the number of variables under consideration. If after using factor analysis you settle on three factors, then you should calculate three separate Cronbach’s Alphas, one for each factor.
          Charles

  3. I am using Minitab 18, and the factor loadings, coefficients and scores are generated. How do I use these to do regression or MANOVA?
    Your factor scores are arrived at by summing the factor scores for all factors on each variable. How do I arrive at an aggregate factor score for one factor?

  4. Hi Charles,
    I have 3 indices: USA Finance index, USA country index and Finance sector index.
    How can I calculate the factor loadings for USA country index and Finance sector index?
    I have a correlation matrix for the indices, created by taking the log of the differences of the series.
    Can you show me in Excel how I can create factor loadings for the country and sector indices via PCA, MLE and regression, please?
    Thanks

    • Joey,
      The Real Statistics website shows how to do this. You can also use the Real Statistics software to automate much of the process.
      Do you have some specific questions about the process?
      Charles

  5. How can I use factor analysis for a “Project on effectiveness of Recruitment Process Outsourcing at safeducate”?

  6. I want to know the reliability of the responses to 3 statements, and determine which ones to reject, retain or revise. Can factor analysis be a tool for this? How?

  7. I need help with a factor analysis of a small table of variances in commodities/prices and volumes. Can anyone help me? I have never done one before in Excel.
    Thx
    Andrew

  8. Dr Charles!
    What method should be used when you have categorical variables?
    How do you compute a polychoric correlation matrix with a Likert scale?
    Thank you.

    • Bentabet,
      I have described how to deal with categorical variables in the context of regression, but I don’t know how much sense this makes in the context of factor analysis. More important, though, is how to deal with ordinal data (such as Likert scales). Here polychoric correlation may be used.
      I plan to add a webpage to the Real Statistics website explaining how to calculate the polychoric correlation coefficient (and from this you can create matrices of these coefficients). This should be available in the next couple of days.
      Charles

  9. Dear Dr, good morning. How can I perform a discriminant analysis with Real Statistics? Or does the package not have that tool?

    Many thanks

  10. I have data from 44 people for a pilot study (23 questions). When I run a reliability test (Cronbach’s alpha) the value is 0.856, but when I do the validity test using the principal component method or any other method it gives a value of 0.56.

    Is it a bad score? And could you please advise if there is a way to improve this score in the actual research based on the pilot study?

    • Charan,

      Generally 0.56 is not considered to be a great score, while .856 is considered to be a very good score.

      One of the reasons that you do factor analysis is to identify the underlying concepts being studied. If, for example, you identify 3 such underlying concepts (i.e. the factors), you would map the original 23 questions into the 3 factors. You would then calculate three values for Cronbach’s alpha, one for the questions corresponding to factor 1 and separate scores for the questions corresponding to factors 2 and 3. You would usually expect the three separate scores to be higher than the one score based on all the questions.

      Charles

  11. When discussing findings from a factor analysis in a narrative-style report, do I need to report any statistics other than the Cronbach’s alpha score?
    E.g. I am stating that:
    I have found that out of the 8 scale variables used to measure x there are 3 themes a, b, c (Cronbach’s alpha = xxx)

    • Lola,
      This really depends on the specific research and why you are using Factor Analysis. Once you have used factor analysis to identify the three themes you can calculate Cronbach’s alpha for each of the three themes to determine the reliability of a questionnaire. You can also use the factor loadings to do all sorts of analyses (regression, ANOVA, etc.) if these are appropriate for your research.
      Charles

      • Yes, I will be following on by using these newly identified factors for further analysis.

        My guidelines state that I should back up my discussion with statistical evidence, but this is supposed to be written as a commercial piece of research whose reader is not expected to be a statistics expert. How do I demonstrate the accuracy of my factor analysis in this case?

        • Lola,
          Demonstrating the accuracy of your factor analysis won’t necessarily be easy to do to an audience that has no knowledge of statistics. Various indicators are given on the website, but it may be a challenge explaining these to a non-technical audience, especially since even for experts this is not a clear-cut thing.
          Charles

      • Cronbach’s alpha has some limitations, so it might be worth running Guttman’s Lambda if you find that your Cronbach’s alphas are short of the .7 you’re looking for. Cronbach’s alpha tends to underestimate reliability. It would be worth reporting the findings of Guttman’s Lambda (2).

  12. I have asked respondents to rank attributes, say A, B, C, D.
    Rank 1 is for first preference, rank 2 for second preference, and so on. So each respondent gives 4 ranks, e.g. A, C, B, D as 2, 3, 1, 4.
    Using these data, can I run a factor analysis to find the underlying dimensions of these attributes?

  13. Charles,

    Thanks for creating this very informative website. I have a question about factor analysis. I have developed an open-ended survey. Once I get the answers, I would group them according to common ideas. Would I then rank them in number order to run a factor analysis?

      • The survey deals with the declining participation rate of African-Americans in baseball. It is very open-ended, with some surveys being actual interviews. Would these answers be grouped by common words/themes and then numbered to create a factor analysis similar to your example?

        • Philip,
          If you can figure out how to give a value to the open-ended questions, then I would imagine that you could use Factor Analysis.
          Charles

  14. Hi Charles,

    I am looking at the possibility of grouping vehicle makes (the variable) by using their monthly depreciation %. This is based on my assumption that the depreciation % of certain makes might have the same structure, so that they can be grouped together. Can factor analysis look at the time series at the same time?
    In this case, what is the right data structure that I should employ in order to do the analysis?
    For example:
    Should I put the make in the columns and the model and month of depreciation valuation in the rows, with the depreciation percentage inside the table?

  15. Wow, this series was really helpful for me, too. I was away from my SPSS program and needed to run an analysis with Excel. Wasn’t sure how, until I came across this page. Thank you.
