The analysis of errors in business and economic statistics involves identifying and gauging the gap between a statistical estimation and the underlying reality the statistic is intended to portray. Put simply, most statistics are assumed to be affected by one or more forms of error, whether it be from faulty sampling, inexact mathematical modeling of complex phenomena, random events that skew measurements, or other sources. Rather than dismissing such statistics because they're prone to errors, statistical analysts—a vast group including economists, statisticians, business forecasters, and many others—use various mathematical tools to predict the reliability of statistics, as well as to adjust statistics so that the distance between estimate and reality is minimized.
Statistical errors commonly result from imprecise measurement. For example, in survey research the sample of people surveyed may not be truly representative of the broader group. Or even if the respondents are representative, the survey may be constructed arbitrarily and thus filter the responses to fit a model that doesn't reflect real-life circumstances. Error analysis is concerned with preventing, or at least reducing, such measurement errors.
Some aspects of error analysis originated with astronomers who had difficulty accurately charting the paths of comets. The crude nature of telescopes and measuring equipment suggested that the comets took highly erratic paths, tacking north, then south, then north again as they traversed the sky.
Early in the 19th century, mathematicians developed a solution to these measurement errors in the least squares criterion, a method for determining the true path of a comet by factoring out errors to the greatest extent possible. Once a theoretical path was charted, astronomers knew where to look for the elusive objects and could dismiss other objects that were not the ones in question.
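The least squares criterion can be sketched in a few lines. The "sightings" below are invented numbers, and the fitted straight line stands in for the comet's predicted path.

```python
# Least squares sketch: fit a straight "path" y = a + b*x to noisy sightings.
# The coefficients minimize the sum of squared deviations between the
# observations and the fitted line. All numbers here are invented.

def least_squares_line(xs, ys):
    """Return (intercept a, slope b) minimizing sum((y - (a + b*x))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Noisy "sightings" scattered around a true path y = 1 + 2*x.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
a, b = least_squares_line(xs, ys)
print(a, b)  # close to the true intercept 1 and slope 2
```

Even with sizable scatter in the individual observations, the fitted slope and intercept land close to the true values, which is exactly what made the method useful to the astronomers.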
But even at this stage, astronomers were "eyeballing" their predictions. Although the predicted paths had improved, astronomers still had difficulty training their telescopes accurately. They needed a way to define the spread surrounding the predicted path, the band within which the object was most likely to be spotted.
Again, a mathematical solution was used. A statistical analysis of the deviations of sightings from the path was employed to determine the dispersion of observations—or width of the band. Observations were charted, showing their deviation from the predicted path. In most cases, these formed a classic bell-shaped curve from which other information could be derived.
This charting showed that an object should appear within a specific distance from the predicted path, and was more likely to be spotted closer to that path. It indicated not errors in the path of the comet (we assume comets do not make turns while barreling through space), but rather errors in the accuracy of the astronomers' measurements. (One notable exception occurs when observed objects do, in fact, change course in space, suggesting the presence of another unseen but gravitationally significant object. This is how the planet Neptune was discovered.)
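The dispersion calculation can be sketched as follows. The deviations are invented, and the two-standard-deviation band assumes the deviations follow the roughly bell-shaped (normal) pattern described above.

```python
# Sketch: measure the spread of sightings around a predicted path.
# The standard deviation of the deviations gives the width of the band
# in which the object is most likely to be spotted. Deviations invented.
import math

deviations = [-0.3, 0.1, -0.1, 0.4, 0.0, -0.2, 0.2, -0.4, 0.3, 0.0]
n = len(deviations)
mean = sum(deviations) / n
variance = sum((d - mean) ** 2 for d in deviations) / (n - 1)
sigma = math.sqrt(variance)

# If the deviations are roughly bell-shaped, about 95 percent of future
# sightings should fall within 2*sigma of the predicted path.
band = 2 * sigma
print(sigma, band)
```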
Another type of error may arise from poorly structured parameters, or the different facets considered relevant to a problem. For example, rocket scientists concerned with delivering a nuclear warhead to a specific location may consider a series of factors, including trajectories and speed, rotation of the Earth, gravity, and the weight of the warhead. But hundreds of other factors also may influence the re-entry path of a warhead from space, including atmospheric winds and humidity, the shape of the warhead, even the position of the moon. If these factors are not considered, errors will arise that may cause the warhead to land several miles from its computed impact site, frustrating or completely negating the effect of the warhead, e.g., failing to destroy the target.
Unfortunately, economics is a science rife with measurement error. How can an analyst accurately gauge highly intangible concepts like value, utility, and demand? Worse yet, economic data are extremely complex, encompassing dozens of individual considerations by millions of people. An attempt to simplify this information has led to a division of economics into often contradictory approaches.
The English political philosopher Jeremy Bentham is credited with conceiving the idea of measuring utility in his 1789 book, An Introduction to the Principles of Morals and Legislation. Bentham suggested that the utility of public policies could be measured by summing numerical "degrees of good tendency" determined for each individual in the population. But the subjective nature of this approach left tremendous room for measurement error.
The Italian economist Vilfredo Pareto sidestepped the metaphysical problem of measuring utility when he theorized, in his 1906 Manual of Political Economy, that degrees of utility could be derived from observations of tangibles such as quantity, price, and income. This, he maintained, would greatly reduce the degree of error in measuring utility.
However, it was Ragnar Frisch who first developed a formula for the quantification of errors in variables in 1934. Frisch suggested a relationship wherein a dependent variable y is determined by an independent variable x multiplied by a coefficient β, plus a "random disturbance," or error term ε:

y = βx + ε
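A minimal sketch of this relationship, assuming an invented coefficient β = 2.0: simulate observations of y = βx + ε and then recover the coefficient from the data by least squares.

```python
# Sketch of the relationship y = beta*x + epsilon: simulate data with a
# known coefficient and a random disturbance, then recover beta from the
# observations. beta = 2.0 is an invented value for illustration.
import random

random.seed(0)
beta = 2.0
xs = [float(i) for i in range(1, 101)]
# epsilon: random disturbance with mean zero
ys = [beta * x + random.gauss(0.0, 1.0) for x in xs]

# Least squares estimate of beta (no intercept): sum(x*y) / sum(x*x)
beta_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(beta_hat)  # close to 2.0; the gap reflects the error term
```

The estimate never equals the true coefficient exactly; the residual gap is precisely the kind of error that errors-in-variables analysis seeks to quantify.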
There are several sources for the introduction of errors into an analysis. Errors in variables may occur when the spread of data about the mean is especially wide or unusually distributed. In addition, errors may result from regressions that are heteroskedastic (where the variance of the error term is not constant), from a correlation between the error term ε and one of the explanatory variables, or from cases where there is no real relationship between the variables at all.
Even when error can't be avoided, statistical methods can be used to estimate the likelihood that a given set of statistics is accurate, regardless of whether it has been adjusted for error. The two most common indicators of statistical accuracy are the confidence interval (often expressed as a margin of error) and the confidence level. These related measurements tell users of statistics, respectively, (1) the range of possible variation around the reported figures, and (2) how certain it is, in percentage terms, that the actual figures (i.e., the real world) fall within this range. Some researchers abbreviate the two confidence indicators by referring to an "X percent confidence interval," but it should be noted that the percentage refers to the confidence level, and that the interval isn't actually a percentage but a range of potential variation between the two outermost possibilities in the data.
This might be expressed as follows: there is 95 percent probability (confidence level) that the market for Product X contains between 1.5 million and 1.8 million eligible buyers (confidence interval). Such a statement might come as a footnote to an estimate that the market size is 1.65 million customers. The size of the interval (or room for variation) and the strength of the confidence level are directly related. Thus for a higher confidence level, say 99 percent, the interval would have to be wider; in this make-believe example, perhaps between 1.2 million and 2.0 million eligible customers.
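A hypothetical calculation of such an interval, using the normal approximation: the sample values, in millions of eligible buyers, are invented, and for so small a sample a t multiplier would be slightly more exact than 1.96.

```python
# Sketch: a 95 percent confidence interval for an estimated market size.
# With the normal approximation, the interval is roughly
# (mean - 1.96*SE, mean + 1.96*SE), where SE is the standard error of the
# mean. For 99 percent confidence, widen the multiplier to 2.576.
import math

sample = [1.5, 1.7, 1.6, 1.8, 1.65, 1.55, 1.75, 1.6, 1.7, 1.65]  # millions
n = len(sample)
mean = sum(sample) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
se = sd / math.sqrt(n)

low, high = mean - 1.96 * se, mean + 1.96 * se          # 95 percent level
low99, high99 = mean - 2.576 * se, mean + 2.576 * se    # 99 percent: wider
print(round(low, 3), round(high, 3))
```

Note that the 99 percent interval is necessarily wider than the 95 percent interval, illustrating the direct relationship between confidence level and interval size described above.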
Since confidence levels and intervals indicate the statistical probability of accuracy, it is also possible to restate them as the probability of inaccuracy. At the 95 percent level, the likelihood of inaccuracy is simply 5 percent, or a 1 in 20 chance that reality lies somewhere outside the range of values given in the confidence interval. Obviously, as confidence levels decrease, the probability of error in the statistics increases. In business and economic applications, the 95 percent level is the most widely used; it is less common to see statistics expressed at the 99 percent confidence level.
Error analysis is often misapplied when results are misinterpreted and changes are introduced that increase the number and magnitude of errors. Problems of this sort are common in economics, where attempted solutions only exacerbate errors until the parameters of the situation can be restated.
One famous demonstration of how error analysis can be misapplied was devised by W. Edwards Deming. In this model, a supervisor measured the occurrence of faulty products produced by two workers on a theoretical assembly line. Deming asked two participants in his seminar to blindly draw 10 beads from a shoebox filled with 800 white beads, representing perfect products, and 200 red beads, representing defective products.
Of the 10 beads, one "worker" drew four that were red, while the second drew only one. The supervisor assumed that the first worker produced a higher rate of faulty products, 40 percent. Meanwhile, because only 10 percent of the second worker's products were "faulty," the supervisor assumed that his defect rate was lower. The supervisor punished the first worker for his poor performance by docking his wages, and rewarded the second for his good performance by giving him a bonus.
The test was then repeated. This time, the first worker drew no red beads, and the second drew three. Now the first worker's defect rate had fallen to zero while the second worker's had risen to 30 percent. The supervisor assumed that his punishment of the first worker compelled that worker to do better. Meanwhile, the second worker had grown lazy from his bonus, causing him to do worse. So now the first worker was rewarded for his improvement and the second punished.
After several dozen iterations of this contest, it became apparent that each worker's bonuses roughly equaled his pay reductions; his good performance was almost exactly cancelled out by his poor performance. More importantly, each worker's average defect rate was quite close to 20 percent. In fact, the workers' performance had absolutely nothing to do with the rate of defective products. Because the workers were drawing beads from a box that contained exactly 20 percent red beads, they would draw, on average, two red beads per contest.
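The bead experiment can be simulated directly. Over many repetitions, each "worker" converges toward the box's built-in 20 percent defect rate, regardless of any rewards or punishments.

```python
# Sketch of Deming's bead experiment: two "workers" blindly draw 10 beads
# from a box of 800 white and 200 red beads. Over many rounds, each
# worker's average defect count converges toward 2 red beads per draw,
# i.e., the box's 20 percent red share.
import random

random.seed(42)
box = ["red"] * 200 + ["white"] * 800

def average_defects(rounds=1000, per_draw=10):
    """Average number of red beads per draw of `per_draw` beads."""
    total_red = 0
    for _ in range(rounds):
        draw = random.sample(box, per_draw)
        total_red += sum(1 for bead in draw if bead == "red")
    return total_red / rounds

worker1 = average_defects()
worker2 = average_defects()
print(worker1, worker2)  # both near 2.0 red beads per draw
```

Any single round may look like a 0 percent or a 40 percent defect rate, which is exactly the random variation the supervisor mistook for differences in worker performance.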
What Deming illustrates is the pointlessness of reward systems where the fault rate is determined by some factor other than the workers' performance. If the supervisor ignores the possibility that he should be looking elsewhere for the cause of product defects, he may introduce solutions that worsen and further obscure the nature of the errors.

In dealing with highly complex economic problems, where the relationships between dozens of variables are not known, it is common for policy makers to measure the strength of these relationships by purposely adjusting certain variables to see what happens. While seemingly haphazard, this approach is highly practical because hypotheses may be tested through actual application. When the relationships between variables can be quantified, the analysis may be extended to the identification and dynamics of errors in models.
SEE ALSO: Statistical Analysis for Management
[John Simley]