FACTOR ANALYSIS

Factor analysis is a statistical technique that is used to determine the extent to which a group of measures share common variance. Factor analysis is sometimes termed a "data reduction" technique because the method is frequently used to extract a few underlying components (or factors) from a large initial set of observed variables. It is extensively used in psychological research concerned with the construction of scales intended to measure attitudes, perceptions, motivations, and so forth. Business-related applications are numerous and examples include the development of scales used to measure customer satisfaction with products and employee work attitudes. Factor analysis, however, has applicability outside of the realm of psychological research. It may be used, for example, by financial analysts to identify groups of stocks in which prices fluctuate in similar ways. And factor analysis often plays a crucial role in establishing the validity of employment tests and performance appraisal methods, thus helping a firm defend itself against employment discrimination charges.

There are many different methods of factor analysis and the underlying mathematical theory is quite complex. The basic elements of factor analysis, though, are relatively simple to understand. An example of the use of factor analysis might involve research designed to construct a scale of employee job satisfaction. Initially, a researcher or consultant may assemble a large set of questionnaire items that seem to be related to job satisfaction. These items will generally be presented to subjects along with some type of numeric or verbal scale.

The job satisfaction questionnaire may include several dozen such questions. What is really of interest, however, are employee views regarding underlying dimensions of job satisfaction. Typically, there are only a few such dimensions, which are psychological states that cannot be directly measured. Such dimensions are called "factors" and factor analysis is used to assess them indirectly.

In this case, the basic factor analysis model assumes that employee responses to each of the job satisfaction items in the questionnaire can be condensed into one or more underlying factors. Sometimes the researcher will have some expectation as to the number of factors, although such an assumption is not necessary. The factors are assumed to be related to the score of each item on the questionnaire in a linear manner. Suppose that all of the answers to the job satisfaction items are derived from two underlying factors, plus some random element (perhaps due to measurement error). If so, then a respondent's answer to a particular item could be decomposed into those basic factors according to the following equation:

$$\text{score}_i = a_{1i}\,\text{factor}_1 + a_{2i}\,\text{factor}_2 + \text{random}$$

where: $\text{score}_i$ = subject's score on questionnaire item $i$; $a_{1i}$ = coefficient relating factor 1 to score $i$; $\text{factor}_1$ = value of the first factor for the subject; $a_{2i}$ = coefficient relating factor 2 to score $i$; $\text{factor}_2$ = value of the second factor for the subject; random = random error.
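As a minimal sketch of the linear model described above, the following Python function adds up the two factor contributions and the random element for a single item. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def item_score(a1, a2, factor1, factor2, random_error=0.0):
    """Score on one item: loading-weighted sum of two factors plus random error."""
    return a1 * factor1 + a2 * factor2 + random_error


# Hypothetical respondent: high on factor 1, somewhat low on factor 2.
# The item is assumed to load strongly on factor 1 and weakly on factor 2.
score = item_score(a1=0.8, a2=0.1, factor1=1.0, factor2=-0.5, random_error=0.0)
print(round(score, 4))
```

Because this item loads mainly on the first factor, the respondent's standing on that factor dominates the resulting score.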

The underlying factors can be thought of as the subject's true feelings with respect to his or her job. The researcher, however, has only the subject's answers to specific questions regarding the characteristics of the employee's job. Since these are measured with some error, there is also a random component to the observed value. Separate equations can be written for all of the items in the questionnaire. For each item, the coefficients ($a_{1i}$ and $a_{2i}$) will probably be different. These coefficients are usually referred to as "factor loadings." While the factors cannot be measured directly, it is possible to estimate factor loadings indirectly. Estimates of factor loadings are derived from the matrix containing the intercorrelations of all of the observed scores for a large number of subjects. Each subject in the sample answers all of the questionnaire items and the matrix contains all possible correlations between pairs of these items. The mathematics of this process is beyond the scope of this article (though it is discussed in the references included at the end). Most general statistical programs, such as the Statistical Package for the Social Sciences (SPSS), SYSTAT, and SAS, perform factor analysis.
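The matrix of intercorrelations that serves as the starting point can be sketched in a few lines of Python. The response data below are invented for illustration (five subjects, three items on a 1 to 5 scale), and the helper names are hypothetical; a real study would use a much larger sample.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(responses):
    """responses: one row per subject, one column per questionnaire item."""
    columns = list(zip(*responses))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Hypothetical answers from five subjects to three items.
responses = [
    [4, 5, 2],
    [2, 3, 4],
    [5, 5, 1],
    [3, 3, 3],
    [1, 2, 5],
]
R = correlation_matrix(responses)
for row in R:
    print([round(r, 3) for r in row])
```

The result is symmetric with ones on the diagonal, since each item correlates perfectly with itself.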

Methods of factor analysis can be grouped into two fundamentally different approaches. In the case of "exploratory factor analysis," the researcher may have only a vague idea at best as to how many factors are included in the set of variables being studied. In addition, he or she may not have strong expectations as to which observed variables will be associated with which factors. The exploratory approach is thus used to gain an understanding of how variables might be related and to discover the likely underlying factor structure. More recently, researchers have developed a more sophisticated approach called "confirmatory factor analysis." In confirmatory factor analysis, the researcher has expectations as to the true number of underlying factors and can apply different types of tests to determine if the hypothesized structure is correct. In practice, researchers often intermingle exploratory and confirmatory techniques.

In "exploratory factor analysis," the researcher is first confronted with the problem of determining the optimal number of factors to extract from the matrix of correlations among the observed variables (such as responses to questionnaire items). Various criteria are available, but the most common is based on the increment in common variance explained by extracting an additional factor. Exploratory methods typically define factors as uncorrelated to one another; the first factor extracted by the method will explain the greatest amount of common variance in the set of observed items, the second factor will be the uncorrelated (or "orthogonal") factor that explains the next largest component of common variance, and so forth. Factor extraction continues until what the researcher believes is an optimal number of factors has been extracted. The maximum number of factors that can be extracted is equal to the number of observed variables, but usually the actual number is much smaller. A widely used stopping point in factor extraction is the point at which the "eigenvalue" (a mathematical term related to the matrix manipulation methods employed to extract factors) of a factor to be extracted is no greater than one. This suggests that the factor explains no more variance among the variables than that explained by a single variable. Such factors are often associated strongly with only one observed variable and are likely explaining only the measurement errors in observed variables. Other considerations, however, including the researcher's professional judgment, enter into the picture. Unlike many other statistical techniques, there are no strong tests of statistical significance that can be used in exploratory factor analysis. Often researchers extract different numbers of factors and compare the results, choosing the outcome that is most theoretically reasonable.
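The eigenvalue stopping rule described above can be illustrated with a small sketch. It assumes that factors are extracted from the eigenvalues of the correlation matrix (as in principal components extraction, one of several possible methods) and uses a basic Jacobi rotation routine written for this example. The correlation matrix is hypothetical: items 1 and 2 correlate 0.6, items 3 and 4 correlate 0.5, and the two pairs are unrelated.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlation matrix for four questionnaire items.
R = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
eigenvalues = jacobi_eigenvalues(R)
retained = sum(1 for value in eigenvalues if value > 1.0)
print([round(v, 3) for v in eigenvalues], retained)
```

Here two eigenvalues exceed one, so the rule would suggest extracting two factors. As the text notes, the rule is a guide and not a substitute for the researcher's judgment.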

Factor analysis methods derive factor loadings in such a way that groups of variables that are highly interrelated tend to load on the same factor. Loadings are in the range of -1.0 to 1.0 and the higher the absolute value of a loading, the more closely linked an observed item is to a factor.
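One way to see where loadings come from is the principal components approach, in which the loadings on the first factor equal the leading eigenvector of the correlation matrix scaled by the square root of its eigenvalue. The sketch below assumes that extraction method (other methods exist and give somewhat different values) and finds the leading eigenvector by simple power iteration. The correlation matrix is hypothetical.

```python
import math


def first_factor_loadings(R, iterations=200):
    """Loadings on the first principal factor of a correlation matrix R."""
    n = len(R)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v.
    Rv = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Rv[i] for i in range(n))
    return [math.sqrt(eigenvalue) * x for x in v], eigenvalue


# Hypothetical matrix: three items that all correlate 0.5 with one another.
R = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
loadings, eigenvalue = first_factor_loadings(R)
print([round(x, 4) for x in loadings], round(eigenvalue, 4))
```

Because the three items are equally interrelated, they load equally on the single dominant factor, and every loading falls inside the range of -1.0 to 1.0.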

The observed pattern of factor loadings is generally used as a way of interpreting the underlying factors. This pattern will depend on a number of decisions made by the researcher, including the number of factors extracted and the factor extraction methods specified. Factor loadings of greater than .70 (or less than -.70) are often considered indicative of a close association between a factor and an observed item. In the hypothetical example, relatively strong factor loadings are highlighted. Factor 1 appears to be associated mainly with the economic aspects of the job, while Factor 2 is associated mainly with the job's more intrinsic qualities. This may lead the researcher to conclude that there are two principal dimensions to employee job satisfaction and name the factors accordingly.
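The .70 rule of thumb can be applied mechanically, as in the following sketch. The item names and loadings are invented for this illustration and are not taken from any actual study; the function name is likewise hypothetical.

```python
# Hypothetical loadings: item -> (loading on factor 1, loading on factor 2).
loadings = {
    "pay": (0.85, 0.10),
    "benefits": (0.78, 0.22),
    "interesting work": (0.15, 0.81),
    "monotony of tasks": (0.08, -0.76),
    "level of stress": (0.20, 0.15),
}


def strong_items(loadings, threshold=0.70):
    """Group items by the factor(s) on which their absolute loading exceeds the threshold."""
    n_factors = len(next(iter(loadings.values())))
    result = {factor + 1: [] for factor in range(n_factors)}
    for item, values in loadings.items():
        for factor, value in enumerate(values):
            if abs(value) > threshold:
                result[factor + 1].append(item)
    return result


groups = strong_items(loadings)
print(groups)
```

The absolute value matters: a loading of -.76 marks as close an association as one of .76, only in the opposite direction. An item such as "level of stress" that clears the threshold on neither factor appears in no group.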

It is also possible to estimate the proportion of variance in an observed variable that is explained by underlying factors. This is called an item's communality. Of course, some variables may fall through the cracks. This is the case with "level of stress," which loads weakly on both factors. The communality of the stress variable will therefore also be low. The researcher might conclude that stress represents an independent dimension and exclude it from the analysis.
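When the factors are uncorrelated, an item's communality is the sum of its squared loadings across the factors. The sketch below assumes orthogonal factors and uses invented loadings for two items; the values are hypothetical and serve only to show the arithmetic.

```python
def communality(item_loadings):
    """Proportion of an item's variance explained by orthogonal factors."""
    return sum(value ** 2 for value in item_loadings)


# Hypothetical loadings on (factor 1, factor 2).
pay = (0.85, 0.10)
level_of_stress = (0.20, 0.15)

pay_communality = communality(pay)
stress_communality = communality(level_of_stress)
print(round(pay_communality, 4), round(stress_communality, 4))
```

With these illustrative numbers, most of the variance in the pay item is shared with the factors, while the stress item shares very little, which is what weak loadings on every factor imply.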

"Confirmatory factor analysis" is a much different process. Here, the researcher generally has an idea both as to the number of underlying factors and the factor structure. This means he or she specifies which of the observed variables are associated with which of the hypothesized factors. The factors themselves may be correlated. If certain mathematical conditions are met, then the values of all of the factor loadings identified can be estimated; no subsequent rotation of the factor loadings is required. Confirmatory factor analysis methods also allow for various tests of statistical significance, both for the model as a whole and for individual parameters, such as factor loadings. Thus tests of statistical signifigance exist for individual factor loadings (as for coefficients in multiple regression analysis) and several different overall fit statistics are used to assess the general adequacy of the model.

In using confirmatory factor analysis, the researcher must posit one or more specific models, defining the number of factors and which variables are hypothesized to be determined by (i.e., to have loadings on) which factors. Unlike exploratory factor analysis, in which all observed variables are assumed to load on all factors, many or all of the observed variables in confirmatory factor analysis will generally be hypothesized to load on fewer than the specified number of factors, often on only one of the factors (i.e., the loadings of an observed variable on all of the other factors are, in effect, fixed to a value of zero). Once a model is specified, its factor loadings and other parameters (e.g., error variances) can be estimated. This is accomplished via a mathematical process in which the estimated loadings and other parameters are used to compute a predicted correlation matrix for the observed variables. With a predicted correlation matrix, the parameters are systematically updated via an iterative process until the difference between the actual and predicted correlation matrices is minimized given the hypothesized structure of the model, at which point the results are said to "converge."
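The predicted correlation matrix can be sketched for a small hypothetical model with four standardized items and two correlated factors. Each off-diagonal element is the product of the two items' loadings and the relevant factor correlation; the diagonal is one because the error variance absorbs whatever the factors leave unexplained. The sum of squared residuals used here is an assumed, simplified discrepancy measure (estimation programs commonly use maximum likelihood instead), and all numbers are invented.

```python
def implied_correlations(loadings, factor_corr):
    """Predicted item correlations from loadings (p x m) and factor correlations (m x m)."""
    p, m = len(loadings), len(factor_corr)
    implied = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i == j:
                implied[i][j] = 1.0  # explained variance plus error variance
            else:
                implied[i][j] = sum(
                    loadings[i][k] * factor_corr[k][l] * loadings[j][l]
                    for k in range(m)
                    for l in range(m)
                )
    return implied


def squared_discrepancy(observed, implied):
    """Sum of squared differences over the unique off-diagonal elements."""
    p = len(observed)
    return sum((observed[i][j] - implied[i][j]) ** 2 for i in range(p) for j in range(i))


# Hypothetical model: items 1-2 load only on factor 1, items 3-4 only on factor 2.
loadings = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.6]]
factor_corr = [[1.0, 0.3], [0.3, 1.0]]
predicted = implied_correlations(loadings, factor_corr)

# Hypothetical observed correlations to compare against.
observed = [
    [1.00, 0.55, 0.20, 0.15],
    [0.55, 1.00, 0.18, 0.12],
    [0.20, 0.18, 1.00, 0.56],
    [0.15, 0.12, 0.56, 1.00],
]
misfit = squared_discrepancy(observed, predicted)
print(round(predicted[0][1], 3), round(predicted[0][2], 3), round(misfit, 5))
```

An estimation routine would adjust the loadings and the factor correlation repeatedly, recomputing the predicted matrix each time, until a discrepancy of this kind stops decreasing.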

There are several goodness-of-fit statistics for the overall model, with the basic statistic following a "chi-square distribution." The chi-square test can be used to determine if the model as a whole reasonably reproduces the observed correlation matrix. If so, then the researcher would conclude that the analysis tends to confirm the validity of the hypothesized model (assuming the values of the factor loadings are also of the expected signs). Since the chi-square test is highly sensitive to the sample size, however, it often leads to rejection of models, particularly in large samples. For this reason, other criteria, such as the "goodness-of-fit index," are often used to assess these models. Such alternative criteria are not sensitive to sample size, but they also do not have sampling distributions that can be used to perform tests of statistical significance. Thus, such assessments are rather judgmental, though there is general agreement as to appropriate values for these indexes to achieve in order to confirm a model.
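The sensitivity of the chi-square test to sample size follows from how the statistic is commonly formed: the minimized discrepancy value is multiplied by the sample size minus one. The sketch below assumes that form and a hypothetical six-item, two-factor model with 13 free parameters (six loadings, six error variances, one factor correlation). The discrepancy value of 0.02 is invented; 15.507 is the tabled .05 critical value for 8 degrees of freedom.

```python
def degrees_of_freedom(n_items, n_free_parameters):
    """Unique variances and covariances minus freely estimated parameters."""
    return n_items * (n_items + 1) // 2 - n_free_parameters


def chi_square_statistic(min_discrepancy, sample_size):
    """Model chi-square, assuming the (N - 1) * F form."""
    return (sample_size - 1) * min_discrepancy


df = degrees_of_freedom(n_items=6, n_free_parameters=13)
CRITICAL_05_DF8 = 15.507  # tabled chi-square critical value, alpha = .05, df = 8

# Same hypothetical degree of misfit, two different sample sizes.
small_sample = chi_square_statistic(0.02, 101)
large_sample = chi_square_statistic(0.02, 1001)
print(df, round(small_sample, 3), round(large_sample, 3))
print(small_sample > CRITICAL_05_DF8, large_sample > CRITICAL_05_DF8)
```

With identical misfit, the model is retained in the sample of 101 but rejected in the sample of 1,001, which is why researchers supplement the chi-square test with other indexes.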

It is also possible to use confirmatory factor analysis to compare two different models in order to determine which one fits better. For example, a researcher may wish to determine if a one-factor model of job satisfaction, in which all observed indicators of satisfaction load on a single factor, is inferior to a two-factor model, in which indicators of satisfaction with economic aspects of the job load only on one factor and indicators of satisfaction with intrinsic aspects load only on a second factor. Both models would be estimated and the one with the better measures of fit would be preferred.
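One common way to carry out such a comparison, when the simpler model is a restricted version of the more complex one, is a chi-square difference test. This is an assumed technique for the purpose of illustration; the text above speaks only of comparing measures of fit. The chi-square values below are hypothetical, and 3.841 is the tabled .05 critical value for 1 degree of freedom.

```python
def chi_square_difference(chi2_restricted, df_restricted, chi2_general, df_general):
    """Difference in chi-square and degrees of freedom between nested models."""
    return chi2_restricted - chi2_general, df_restricted - df_general


CRITICAL_05_DF1 = 3.841  # tabled chi-square critical value, alpha = .05, df = 1

# Hypothetical fit results for six items.
one_factor = {"chi2": 48.3, "df": 9}   # all items load on a single factor
two_factor = {"chi2": 12.1, "df": 8}   # economic and intrinsic factors

delta_chi2, delta_df = chi_square_difference(
    one_factor["chi2"], one_factor["df"], two_factor["chi2"], two_factor["df"]
)
prefer_two_factor = delta_df == 1 and delta_chi2 > CRITICAL_05_DF1
print(round(delta_chi2, 1), delta_df, prefer_two_factor)
```

In this invented case the two-factor model fits significantly better, so the added factor would be judged worthwhile.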

In practice, exploratory and confirmatory factor analysis are often used jointly. Initial work with a scale might involve exploratory factor analysis, to discern likely patterns in the interrelationships among the observed variables. Once those are determined, then confirmatory factor analysis might be used, typically with another sample of cases, to test the expected relationships. Sometimes researchers will start with a confirmatory approach and discover their anticipated model does not work. This may give rise to further exploratory work to determine what relationships might exist among the data.

Factor analysis continues to be a central tool for discerning the validity and reliability of scales that might be used in business decisions. Once a scale is validated in this manner, the items can be added together to form an estimate of the scale. Sometimes researchers will extract weightings, called "factor score coefficients," as a by-product of factor analysis; these coefficients weight each observed variable by its relative importance on a given factor in calculating the estimated value of that factor. Continuing refinements in confirmatory factor analysis techniques should make this type of factor analysis especially important as an analytical tool in coming years.
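The two ways of scoring a validated scale can be contrasted in a short sketch. The item responses and factor score coefficients are hypothetical, and the weighted version assumes the item scores have already been standardized, as is typical when such coefficients are applied.

```python
def summed_scale(item_scores):
    """Simple scale score: the unweighted sum of the items."""
    return sum(item_scores)


def weighted_factor_score(standardized_scores, coefficients):
    """Estimated factor value: items weighted by factor score coefficients."""
    return sum(c * x for c, x in zip(coefficients, standardized_scores))


# Hypothetical raw responses to three items on a 1-5 scale.
raw_total = summed_scale([4, 5, 3])

# Hypothetical standardized responses and factor score coefficients.
estimate = weighted_factor_score([1.0, -0.5, 0.0], [0.5, 0.3, 0.2])
print(raw_total, round(estimate, 4))
```

The unweighted sum treats every item as equally informative, while the weighted estimate gives more influence to the items most closely tied to the factor.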

[ John J. Lawler ]

Kim, Jae-on, and Charles W. Mueller. Factor Analysis: Statistical Methods and Practical Issues. Beverly Hills, CA: Sage Publications, 1978.

Kline, Paul. An Easy Guide to Factor Analysis. New York: Routledge, 1994.

Long, J. Scott. Confirmatory Factor Analysis: A Preface to LISREL. Beverly Hills, CA: Sage Publications, 1983.

McDonald, Roderick P. Factor Analysis and Related Methods. Hillsdale, NJ: Lawrence Erlbaum Associates, 1985.