In its broadest sense, queuing theory is the study of contention for the use of a shared, but limited, resource. It is comprised of models and formulas that describe the relationships between service requests, congestion, and delay.
Queuing theory may be extended to cover a wide variety of contention situations, such as how customer check-out lines form (and how they can be minimized), how many calls a telephone switch can handle, how many computer users can share a mainframe, and how many doors an office building should have. More generally, queuing theory is used in business settings primarily in operations management and research problems such as production scheduling, logistics/distribution, and computer network management. These are diverse applications, but their solutions all involve the same dynamics.
The nature of the queue is one of cost shifting and burden averaging. A provider of some service whose resources are limited may serve only a small number of people at a time. Any number of people beyond that are obliged to wait their turn.
Assuming everyone's time is worth something, those who must wait for service are expending a valuable possession—their time. By waiting in line, the service provider is ensured that none of his resources will stand idle. In effect, the waiting customer is forced to pay in time for the privilege of being served, shifting costs from the service provider to the customer.
In a post office, where there is usually one line but several clerks, the next person to be served is the one who has stood longest in line. The burden of the wait is shared by all those in line—the larger the line, the longer the average wait.
The burden is less equally shared in a grocery store, where each clerk has a line. If one line happens to have five simple orders, and another has five patrons with large orders, coupons, and fruit to weigh, the simple line will probably move much faster. Those lucky enough to be in that line will be served before those in the complex line. Thus, grocery shoppers will not share the burden of the wait as equally as in the post office.
The question for the service provider is simple: how to provide good service. Indeed, the highest level of service would be achieved by providing resources equal to the number of patrons; a cashier for every shopper. However, this would be extremely costly and logistically impractical; dozens of cashiers would stand idle between orders.
To minimize costs, the manager may provide only one cashier, forcing everyone into a long, slow-moving line. Customers tiring of the wait would be likely to abandon their groceries and begin shopping at a new store. The question arises: what is an acceptable level of service at an acceptable cost to the provider.
These examples seem simple, but the questions they raise extend far beyond the average grocery store. Queuing theory is the basis for traffic management—the maintenance of smooth traffic flow, keeping congestion and bottlenecks to a minimum. Once the nature of the traffic flow is understood, solutions may be offered to ease the demands on a system, thereby increasing its efficiency and lowering the costs of operating it.
The first to develop a viable queuing theory was the French mathematician S.D. Poisson (1781-1840). Poisson created a distribution function to describe the probability of a prescribed outcome after repeated iterations of independent trials. Because Poisson used a statistical approach, the distributions he used could be applied to any situation where excessive demands are made on a limited resource.
The most important application of queuing theory occurred during the late 1800s, when telephone companies were faced with the problem of how many operators to place on duty at a given time. At the time, all calls were switched manually by an operator who physically connected a wire to a switchboard. Each customer required the operator only for the few seconds it took to relay directions and have the plug inserted and the time recorded. After the call was set up, the operator was free to accept another call. The problem for an early telephone traffic engineer was how many switchboards should be set up in an area.
Beyond that, supervisors were faced with the problem of how many operators to keep on the boards. Too many, and most operators would remain idle for minutes at a time. Too few, and operators would be overwhelmed by service requests, perhaps never catching up until additional help was added.
Often, callers who were unable to gain an operator's attention simply hung up in frustration and, suspecting it was a busy time for the operators, would wait several minutes before trying again. Others stayed on the line, waiting their turn to talk to the operator. Yet others would call repeatedly, hoping the operator would be sufficiently annoyed by repeated calls to serve them next.
These behavioral discrepancies caused problems for traffic engineers because they affected the level of demand for service from an operator. A call turned away was lost, not to come back until much later, and was effectively out of the system. Callers who held were more predictable, while repeat callers only increased demands on the system by appearing as several requests. Poisson's formula was meant only for the latter situation.
Because few callers acted as aggressively as the Poisson formula assumed, systems were often overengineered, resulting in a substantial waste of resources. Operator offices were equipped with 24 switchboards when they never used more than 20.
A Danish mathematician named A.K. Erlang developed a different approach to traffic engineering based on Poisson' s work. He established formulas for calls that are abandoned (called Erlang-B) and for those that are held until service is granted (Erlang-C).
The Poisson and two Erlang formulas made some basic assumptions about the system. Because user behavior is unpredictable, the types of calls that are received are assumed to be randomly distributed. That is, unusual calling characteristics in one period are likely to be normalized over time yielding a more normal distribution.
The most basic model of a queue is known as the M/M/l system. This notation means that items (persons, data packets, product units) arrive in the queue following a random Poisson distribution (also called Markovian, yielding the first "M"), that the distribution of service intervals is first-in/first-out and exponential (second "M"), and that the queue processes or services only one item at a time (numeral "1"). This model corresponds to an ordinary queue, such as a grocery store line, where customers arrive at random times, are served one at a time in the order they come (no one receives priority), and one server is handling everyone. Without further information, this model also implicitly assumes there is no upper limit on the system capacity (at least none that can be specified) or on the population of items that may enter the queue, two assumptions that are of course erroneous in many real-life applications. An M/M/I queue with capacity constraint c and population constraint p would be represented by the notation M/M/l/c/p.
Based on average arrival rates and average service rates, formulas describing the M/M/I model, or any other queuing model, can be used to calculate important system measurements such as capacity utilization, average waiting times for servicing, or the average number of items in the queue at a given time. Using the M/M/I model, for example, we can determine the average customer-processing speed necessary for a grocery store to maintain its promise that no more than three persons will wait in a given grocery line at any time (assuming that each line is an independent M/M/l queue). To maintain an average of approximately three customers in the system at any given time, the store would need to check out 1.33 customers for every one who arrived per unit of time. If, for the sake of simplicity, we assume one customer arrives per minute, this would mean an average wait of just over three minutes per customer. At this level, the store is using approximately 75 percent of its server (cashier) time, or capacity.
If the store wanted to save on staffing dollars by increasing its capacity usage to 90 percent, its effective rate of service would have to slow to an effective rate of 1.1 customers a minute. Of course, this doesn't mean that employees would work more slowly, but that the system as a whole (which would require a more elaborate model to fully articulate) is processing the same number of customers at lower staffing levels. At the 90 percent utilization level, there would be an average of nine customers in the queue at any time, and the average wait would be about 9 minutes. In effect, a 20 percent gain in the store's utilization rate would cost the average customer a threefold increase in waiting time.
A different—and more accurate—model of the store's queuing system would yield different values, but this example illustrates how models may be applied to real-life problems. Other queuing models vary in the arrival patterns, servicing patterns, number of servers, and other constraints. Examples of other arrival and servicing patterns include deterministic patterns (e.g., D/D/1), where events occur at known rather than random intervals, and general patterns (e.g., G/GIl), where events may be random or deterministic. These patterns may be used in different combinations in the same model, as in M/D/I or D/G/I. The order of processing/servicing can also vary. In the M/M/I it is assumed that the order is first-in, first-out (FIFO), but any model may specify alternatives such as last-in, first-out (LIFO), serve in random order (SIRO), and various kinds of priority orders where certain items in the queue are processed sooner than others based on some criteria.
The most unusual recurring period is the "busy hour," that provides a pattern upon which the system should be engineered. For example, if a system receives its highest number of calls between 9 and 10 a.m., the office should be equipped with enough switchboards to handle that level of requests. The issue of how many operators to assign depends on calling patterns from one hour to the next.
What is interesting about the Poisson and Erlang formulas is that the relationship between operators and congestion is not parallel. For example, assume that 10 operators are inundated by 30 percent more calls than they usually handle. A supervisor calls in an eleventh operator and, even though the rate of incoming service requests remains constant, the backlog will gradually fall. After the backlog is eliminated, the eleventh operator may actually force others to go idle for extended periods.
We might assume that 10 operators handling 130 percent of their normal volume would require 13 operators. In fact, the addition of only one is more than enough to resolve the problem. This is because repeated calls are disposed of, and the aggregate wait of everyone holding (which grows geometrically) is reduced one factor at a time. The backlog simply cannot regenerate itself fast enough.
Put differently, 11 operators may be able to dispose of service requests at a faster rate than they are coming in. It may take a few minutes to eliminate the backlog, but the backlog will decline eventually.
Consider the situation in a grocery store where there are five lines open and 12 people in each line. The addition of only one extra cashier will quickly reduce the lines to one or two people, even though the same number of people are entering checkout lines. When the backlog is eliminated, the sixth cashier may be taken off and put on some other job.
As well as a system may be engineered, unusual nonrandom disturbances can cause the system to collapse in spectacular fashion. This was demonstrated by a problem with the New York water system during the 1950s. Engineers discovered that water pressure dropped significantly—and for firemen, perilously—during a period of hours every Sunday evening. A study revealed an unusual culprit: Milton Berle.
The comedian hosted an immensely popular weekly television show every Sunday that was watched by nearly everyone with a set. When the show went to a commercial break, tens of thousands of people, having finished dinner, retreated to their bathrooms at the same time.
With thousands of toilets being flushed within minutes of each other, sewers were inundated. More importantly, toilet tanks were refilling, each consuming two or three gallons of fresh water. The coordinated demand for water in a brief period of time virtually eliminated water pressure. In fact, some toilets took a half hour to refill, and water pressure took hours to recover.
Serious consideration was given to canceling the show. The solution, however, was relatively simple. The addition of only a few more water towers was sufficient to maintain adequate water pressure. In essence, the system was reengineered to handle more demanding peaks.
This situation may be repeated in a telephone system when everyone is motivated to place a call at the same time. During the 1989 San Francisco earthquake, vast numbers of people in the metropolitan area attempted to make a call at the same time—immediately after the quake subsided—hoping to learn whether friends and relatives were safe. Although the switching systems were automated, they were completely unable to handle the volume of requests for dial tone.
Only a small percentage of calls (enough to meet the capacity of the system) were allowed to go through. Indeed, radio and television reporters urged people to stay off the lines so that emergency calls could be handled.
There was no need to reengineer the system because the occurrence of earthquakes, while random, are not consistently repeated. It would be uneconomic to engineer the telephone network for peak usages that occur only once every decade or so. As a result, every earthquake yields a temporary breakdown in the telephone network.
Other slightly less offensive instances occur every time a radio host offers a prize to "caller number x." Telephone companies and public officials have convinced many radio stations to discontinue the practice.
The most relevant and promising future applications of queuing theory are likely to occur in the areas of computer science and manufacturing systems. In computer science, queuing is a necessary consideration in contention for processing resources. Given that the highest-cost component in advanced computation is processing power, systems are moving increasingly toward network solutions in which processing power is distributed. The result of this trend toward greater distribution is that systems will contend for access to a network and to diverse processors and files.
In manufacturing, queuing is a necessary element of flexible systems in which factors of production may be continually adjusted to handle periodic increases in demand for manufacturing capacity. Excess capacity in periods of low demand may be converted into other forms of working capital, rather than be forced to spent those periods as idle, nonproductive assets.
The concept of flexible manufacturing systems is very interesting. Consider that today a company such as Boeing endures long periods of low demand for its commercial aircraft (a result of cycles in the air transportation industry), but must quickly tool up for expanded production when demand rises. The company must alternately open and mothball millions of dollars worth of manufacturing capacity (and hire and lay off thousands of workers) through every demand cycle.
During periods of low demand, floor space, machinery and inventories remain tied up. If, on the other hand, demand flows could be better managed, Boeing could convert these assets to more productive applications. The primary focus of queuing theory on flexible manufacturing remains centered on machine reliability and depreciation and processing and cycle times.
[ John Simley ]
Gross, Donald, and Carl M. Harris. Fundamentals of Queuing Theory. New York: John Wiley & Sons, 1998.
Hall, Ranaolph W. Queueing Methods: For Services and Manufacturing. Englewood Cliffs, NJ: Prentice Hall, 1991.