# NONPARAMETRIC STATISTICS

Many statistical methods are valid only for populations with a specified distribution of values, such as the normal distribution. In practice, this assumption is not always reasonable. The need for techniques that can be applied across a wide range of parent-population distributions led to the development of distribution-free, or nonparametric, statistical methods. Instead of parameters such as means and variances and their estimators, these methods use ranks and other measures of relative magnitude; hence the term "nonparametric." Distribution-free procedures do not require the underlying populations to be normal or to have any other particular mathematical form. They do rest on certain assumptions (for example, independent observations or, for some tests, continuous or symmetric distributions), but because these assumptions are much weaker, nonparametric tests remain valid across a broad range of population distributions.

Nonparametric counterparts to standard statistical tests are typically used in two situations: when a violation of an assumption that would invalidate the standard test is suspected, or when the response variable of interest cannot be measured numerically but can be ranked. Studies of preference in marketing are a common example of the second situation. In a taste test, for instance, a consumer may be asked to rank a new product among several current brands in order of preference. While the consumer probably has a preference for each product, the strength of that preference is difficult to measure. The best solution may be to have the consumer rank the products (1 = best, 2 = second best, and so forth). The resulting set of ranks cannot appropriately be analyzed with standard procedures that assume interval-scale measurements. Thus, nonparametric methods are often employed when measurements are available only on a nominal (categorical) or ordinal (rank) scale.

## NONPARAMETRIC TESTS AND PROCEDURES

Sums of ranks are the primary tools of nonparametric statistics. When comparing samples, the statistician ranks the observations (for example, ranking the pooled observations from all samples together) and then computes statistics based on those ranks rather than on the raw data. Most statistical software packages can compute a wide array of nonparametric procedures. The most common rank-based tests are the Wilcoxon rank sum test, used to compare two populations from independent samples, and the Wilcoxon signed-rank test, used in paired-difference experiments.
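The two Wilcoxon statistics can be computed in a few lines. The following is a minimal Python sketch using only the standard library; the function names (`midranks`, `rank_sum`, `signed_rank`) are hypothetical. It assumes tied values receive the average of the tied positions (midranks) and drops zero paired differences, which is one common convention. It computes only the test statistics, not p-values, which require tables or an approximation.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position in a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Wilcoxon rank sum W: total rank of sample x within the pooled sample."""
    pooled_ranks = midranks(list(x) + list(y))
    return sum(pooled_ranks[: len(x)])


def signed_rank(before, after):
    """Wilcoxon signed-rank W+: sum of the ranks of |d| over positive d = after - before."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    abs_ranks = midranks([abs(d) for d in diffs])
    return sum(r for r, d in zip(abs_ranks, diffs) if d > 0)


# Hypothetical toy data, for illustration only.
print(rank_sum([1.1, 2.5, 3.0], [2.0, 4.2, 5.1]))       # 8.0
print(signed_rank([10, 12, 15, 11], [12, 11, 19, 11]))  # 5.0
```

Under the pooled ranking, the two samples' rank sums always add up to N(N + 1)/2, so only one of them needs to be reported.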

For independent samples, the Mann-Whitney U test is equivalent to the Wilcoxon rank sum test: its U statistic is a linear function of the rank sum, so the two tests reach the same conclusion. Its extension to more than two independent samples is the Kruskal-Wallis test, which is used in place of one-way analysis of variance to compare the locations of several populations in a completely randomized design. The Friedman test (sometimes called Friedman's two-way analysis of variance by ranks) plays the corresponding role in randomized block designs, ranking the observations separately within each block. The Kolmogorov-Smirnov two-sample test compares the two samples' empirical cumulative distribution functions and is sensitive to differences of any kind, not only differences in location.
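As a sketch of how these statistics are formed, the Python code below (standard library only; all function names are hypothetical) computes the Mann-Whitney U from the Wilcoxon rank sum W of the first sample via U = W - n1(n1 + 1)/2, the Kruskal-Wallis H statistic without the usual correction for ties, and the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between the two empirical distribution functions. P-values are not computed.

```python
from bisect import bisect_right


def midranks(values):
    """1-based ranks with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U for sample x, derived from its Wilcoxon rank sum in the pooled sample."""
    w = sum(midranks(list(x) + list(y))[: len(x)])
    return w - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction): 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1)."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # rank sum of this group
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


def ks_two_sample_d(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:  # both step functions change only at observed values
        gap = abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
        d = max(d, gap)
    return d


# Hypothetical toy samples, for illustration only.
a, b = [1.1, 2.5, 3.0], [2.0, 4.2, 5.1]
print(mann_whitney_u(a, b), mann_whitney_u(b, a))  # 2.0 7.0 (they sum to 3 * 3)
```

The two U values always sum to n1 * n2, which is one quick check that the ranking was done correctly.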

For one-sample problems, the Kolmogorov-Smirnov test compares a sample's empirical distribution function with a fully specified reference distribution, such as a normal or uniform distribution with given parameters, and so is sensitive to differences in both shape and location. The Lilliefors test adapts this procedure to test normality when the mean and standard deviation are estimated from the data: the sample values are standardized (centered at zero by subtracting the sample mean, then divided by the sample standard deviation) and compared with the standard normal distribution. The Wald-Wolfowitz runs test detects nonrandom serial patterns in a sequence of values. For measuring association between two variables, Spearman's rank correlation coefficient (a nonparametric alternative to Pearson's product-moment correlation coefficient), Kendall's tau, and the Goodman-Kruskal gamma statistic can be used with ordinal (non-interval) data.
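The rank-correlation measures and the runs count reduce to simple computations. The Python sketch below (standard library only; function names are hypothetical) computes Spearman's rho as Pearson's correlation applied to midranks, Kendall's tau in its simplest tau-a form, the Goodman-Kruskal gamma from the same counts of concordant and discordant pairs, and the number of runs on which the Wald-Wolfowitz test is based. Tau-a makes no adjustment for ties; the tau-b variant, which does, is the one commonly reported by software when ties occur.

```python
import math


def midranks(values):
    """1-based ranks with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def concordance_counts(x, y):
    """Concordant and discordant pair counts; pairs tied in x or y count as neither."""
    concordant = discordant = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            s = (x[j] - x[i]) * (y[j] - y[i])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return concordant, discordant


def kendall_tau_a(x, y):
    c, d = concordance_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)  # divide by the number of all pairs


def goodman_kruskal_gamma(x, y):
    c, d = concordance_counts(x, y)
    return (c - d) / (c + d)  # divide by untied pairs only


def count_runs(seq):
    """Number of maximal blocks of identical consecutive symbols."""
    if len(seq) == 0:
        return 0
    return 1 + sum(1 for prev, cur in zip(seq, seq[1:]) if cur != prev)


x, y = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]  # hypothetical ordinal scores
print(round(spearman_rho(x, y), 3), kendall_tau_a(x, y), goodman_kruskal_gamma(x, y))  # 0.8 0.6 0.6
print(count_runs("AABBBAB"))  # 4
```

Tau-a and gamma coincide here because the toy data contain no ties; with ties, gamma discards the tied pairs and so tends to be larger in magnitude.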

## NONPARAMETRIC REGRESSION AND SMOOTHING

A regression curve describes a general relationship between one or more predictor variables X and a response variable Y. Parametric regression procedures specify a functional form (such as a straight line with unknown slope and intercept) for the relationship between Y and X. In nonparametric regression, neither the functional form of the regression curve nor the error distribution is prespecified. Instead, the curve is estimated by smoothing, often with kernel smoothers, which estimate the curve at each point as a weighted average of nearby responses, with weights that decline with distance. This yields highly flexible methods for presenting and analyzing data. Nonparametric smoothing can, for instance, capture kinks in a nonlinear relationship such as that between heating-oil use and temperature when projecting sales. Smoothing methods can also be used, without reference to a specific parametric model, to diagnose outliers and to analyze missing data.
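The weighted-average idea behind kernel smoothing can be sketched directly. The Python code below implements one standard kernel smoother, the Nadaraya-Watson estimator with a Gaussian kernel; the choice of kernel, the function names, the toy data, and the bandwidth value are assumptions for illustration. In practice the bandwidth is usually chosen by a data-driven rule such as cross-validation.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys, with weights centered at x0."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; use a larger bandwidth")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Hypothetical data, flat up to x = 5 and rising afterward; no parametric form is assumed.
xs = [float(i) for i in range(11)]
ys = [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 3.0, 5.1, 6.9, 9.0, 11.1]
curve = [nadaraya_watson(x0, xs, ys, bandwidth=0.8) for x0 in xs]
```

Smaller bandwidths make the estimate follow the data more closely; larger ones average over more points and flatten features such as the kink.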

## THE MISUSE OF NONPARAMETRIC STATISTICS

Some researchers fall back on nonparametric procedures as a substitute for collecting good data. Nonparametric analyses, however, involve explicit assumptions, just as parametric tests do, and in most cases these procedures were designed for data that were categorical or ranked in the first place. Rank-based nonparametric methods may also discard much of the information contained in the observed data. Data that violate the distributional assumptions of standard probability models can often be handled more effectively by transforming them with logarithms, roots, or powers than by ranking them.
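A toy illustration of the information-loss point: ranking maps very different samples to the same ranks, whereas a logarithmic transformation keeps the ordering and can be inverted to recover the original values. The Python sketch below uses hypothetical data and a hypothetical helper `ranks`; it assumes strictly positive values with no ties, since the logarithm is undefined otherwise.

```python
import math


def ranks(values):
    """1-based ranks, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


mild = [2.0, 3.0, 4.0, 5.0, 6.0]
skewed = [2.0, 3.0, 4.0, 5.0, 600.0]  # same order, one extreme value

print(ranks(mild) == ranks(skewed))  # True: the ranks cannot tell these apart

logged = [math.log(v) for v in skewed]
print(ranks(logged) == ranks(skewed))  # True: the log keeps the ordering...
recovered = [math.exp(v) for v in logged]
print(all(math.isclose(r, v) for r, v in zip(recovered, skewed)))  # True: ...and is invertible
```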

[Thomas E. Love, updated by Kevin J. Murphy]
