
Goeller on Telecom Traffic

A First Course in
Telephone Traffic Engineering

Chapter 4: Erlang C and Queuing. Lost Calls Delayed

What Is a Queue?

When people wait in line to be served, they are standing in a queue. They stand in a queue because there are not enough servers to be sure that at least one is available when somebody requests service. There is nothing wrong with this. Servers are often expensive, and must be kept busy if they are to earn their keep. If people are standing in line, as soon as one is served, the next is in position to keep the server busy.

When it comes to telephone systems, queuing is often advertised as the greatest thing since sliced bread, guaranteed to save vast sums of money on long distance calling. One must not be carried away by such claims. Queuing is a tool, but it is only one of three used in network design. With proper design, a good deal of money can often be saved, and queuing is just one of the ways to save it.

Actually, there are three kinds of queuing: retries, off-hook or hold-on queuing, and call-back queuing. With retries, you just keep trying until you get through. With hold-on queuing, you stand there clutching the phone, listening to commercials on hold, and when the WATS line comes free and it's your turn, you are connected through. Finally, with call-back queuing, you place your order and hang up. When the appropriate facility comes free and you are next in line, the system calls back; upon your answer, the call is put through to the called party.

Retries can be used with any system, and without any special training for the callers. However, the most aggressive callers get the best service and only a relatively small improvement in circuit occupancy is possible. Further, under overload conditions, the number of retries can increase dramatically; since the system control has to do just about as much work for an unsuccessful attempt as for one that works, it is easy to overload the system. This is particularly true with small PBXs that use a single microprocessor; an 8080 or 6800 can only perform so many operations per second.

Hold-on queuing is easy from the system design standpoint because the caller is kept on the phone for the convenience of the system. He does not make countless retries, and he does not have to be called back and gotten on the line. When the trunk comes free, the system connects him through. Typically, hold-on queuing is limited to something on the order of 30 to 45 seconds. Then the user is either given overflow tone or else is taken via some more expensive facility. Because the average wait in queue tends to be so short, little improvement in facility loading occurs.

Call-back queuing can permit waits of several minutes; thus, it can be helpful in increasing the loading of expensive facilities, particularly if they are in small groups. However, call-back may go astray because hunting or call-forwarding in the PBX may deliver the call-back to someone else, the PBX may wipe the user out of the queue without letting him know (some PBXs do this if you make or receive a call while waiting for call back), or the user may simply be unavailable by the time his turn comes up.

Very short and very long queues are not particularly effective. With regard to short queues, it was mentioned in Chapter 1 that the average wait for the first person in line is the average holding time of the call divided by the number of servers in the group. For 2 FX lines and 6-minute calls, this comes out to 3 minutes. If the queue is half a minute long, the odds are one in six of getting a facility before the queue times out.
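The arithmetic for the short queue can be sketched in a few lines of Python. This is only an illustration, not one of the programs on the disk. The function names are invented for the example, and the exponential formula rests on an assumption not stated above: that the wait for the first trunk to come free is exponentially distributed.

```python
import math

def first_in_line_wait(holding_time_min, trunks):
    """Average wait for the caller at the head of the queue, in minutes."""
    return holding_time_min / trunks

def chance_before_timeout(holding_time_min, trunks, timeout_min):
    """Chance that a trunk comes free before the queue times out.

    Assumption (ours, for illustration): exponentially distributed
    holding times, so the wait for the next free trunk is exponential.
    """
    mean_wait = first_in_line_wait(holding_time_min, trunks)
    return 1.0 - math.exp(-timeout_min / mean_wait)

wait = first_in_line_wait(6.0, 2)            # 2 FX lines, 6-minute calls
rough = 0.5 / wait                           # rule of thumb: queue length / mean wait
exact = chance_before_timeout(6.0, 2, 0.5)   # under the exponential assumption
print(f"mean wait {wait} min, rough odds {rough:.3f}, exponential odds {exact:.3f}")
```

Under that assumption the odds come out a little under one in six, so the rule of thumb is close enough for planning purposes.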

For long queues, with waits of ten minutes or so, the trunk group is running just about full; since a server of any sort cannot do more than one hour's work in one hour, there is little to be gained in terms of circuit occupancy by delaying the caller any longer. Thus practical queuing systems hold people in queue for perhaps one to five minutes, and then give the caller a chance to go via some more expensive facility.

Running Erlang C

There are two programs on the disk for Erlang C: ERL-C1 and ERL-C2. Let's call up ERL-C1 first and see what it gives us. Just as with ERL-B, the program asks us for offered traffic. One of the principal assumptions of Erlang C is that nobody drops out of the queue, and callers use the system for one full holding time once they are served. This is the "lost calls delayed" assumption. Offered and carried traffic are, by definition, the same. We know people drop out of the queue when the wait gets long, but Erlang C works surprisingly well, all things considered.

Let's give the program the same amount of traffic we used in the last chapter: 14.63 Erlangs. The computer broods for a few seconds, and then pumps out a screen-full of data, shown in Printout 4-1. Table III in Appendix II shows the same sort of information. When we get it, what do we have? Well, N, as before, is the number of trunks in the group, and P, as usual, is the probability that exactly N trunks in the group are busy. Here, this turns out to be the probability that a call, arriving at random, will go into the queue. If we have fewer than N calls in the system, there will naturally be no queue.

The next four columns, headed D1, D2, Q1 and Q2 are measures of Grade of Service. Let's start with Q1 and Q2 first, because they are easier to explain. Q1 tells us the average number of calls waiting in queue over the entire hour. But P is less than 100%, so we know that for some portion of an hour, there will be no queue at all. Q2 allows for this. It is the average number of calls in the queue during just that part of the hour that all trunks are busy. Because we have the same number of calls waiting for service, but we now recognize that they are waiting only over a part of the hour, Q2 is always bigger than Q1.
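For readers who want to see where P, Q1 and Q2 come from, here is a minimal Python sketch. It is not the ERL-C1 program itself. It uses the standard textbook Erlang C results, and it assumes that the Q1 and Q2 columns report the usual average queue lengths; the function names are invented for the example.

```python
def erlang_c(traffic, trunks):
    """Probability that an arriving call has to queue (Erlang C)."""
    if traffic >= trunks:
        raise ValueError("Erlang C needs more trunks than Erlangs offered")
    inv_b = 1.0  # reciprocal of Erlang B blocking with zero trunks
    for n in range(1, trunks + 1):
        inv_b = 1.0 + inv_b * n / traffic  # Erlang B recursion, one trunk at a time
    b = 1.0 / inv_b
    rho = traffic / trunks  # occupancy per trunk
    return b / (1.0 - rho * (1.0 - b))

def queue_lengths(traffic, trunks):
    """Return (P, Q1, Q2) as the columns are described above."""
    p = erlang_c(traffic, trunks)
    q2 = traffic / (trunks - traffic)  # average queue while all trunks are busy
    q1 = p * q2                        # the same queue averaged over the whole hour
    return p, q1, q2

p, q1, q2 = queue_lengths(14.63, 15)
print(f"P = {p:.2f}, Q1 = {q1:.1f}, Q2 = {q2:.1f}")
```

Since Q1 is just P times Q2 and P is less than one, Q2 comes out larger than Q1, as the text says.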

Now, let's look at D1 and D2. What are they? Well, D1 is the average wait in queue for all calls, and D2 is the average wait in queue for only those calls that get into the queue in the first place. Note the similarity with Q1 and Q2. But what are the units in which D1 and D2 are measured? They are not CCS, minutes or hours. They are "holding times." D1 and D2 measure average delay in terms of how long, on the average, people talk when they finally get a circuit. D1 is used to plot the dashed curves in Fig. 3-1. A glance back at that figure shows that a D1 of 0.1, or about 30 seconds on a 5-minute call, gives about the same occupancy as B.15, while a D1 of 0.5 or 1.0 gives appreciably better occupancy on small trunk groups.

We see that 14.63 Erlangs offered to 15 trunks produces a D1 of 2.41 holding times. If we know that the average holding time is 5.35 minutes from other measurements, then we know that, on the average, a call arriving at random can expect to wait for 12.89 minutes, or about 12 minutes and 53 seconds. Some calls, however, will luck out and will arrive just when random traffic fluctuations have let the queue clear out and leave a trunk or two free. This small proportion of calls will not be subject to any delay. Thus the delay prorated over just the calls delayed is 2.7 holding times, or about 14.45 minutes.

Adding just one more trunk does wonders for service. D1 drops to 0.47 holding times (2.51 minutes), and D2 becomes 0.73 (3.91 minutes). Both delays are much shorter, but we see the ratio of D2 to D1 increase greatly. That's because, of course, a much smaller proportion of calls get into queue in the first place.
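The D1 and D2 figures for 15 and 16 trunks can be reproduced with a short Python sketch. Again, this is an illustration built on the standard Erlang C delay formulas, not the ERL-C1 program; the function names are invented, and the 5.35-minute holding time is the one quoted above.

```python
def erlang_c(traffic, trunks):
    """Probability that an arriving call has to queue (Erlang C)."""
    if traffic >= trunks:
        raise ValueError("Erlang C needs more trunks than Erlangs offered")
    inv_b = 1.0  # reciprocal of Erlang B blocking with zero trunks
    for n in range(1, trunks + 1):
        inv_b = 1.0 + inv_b * n / traffic
    b = 1.0 / inv_b
    rho = traffic / trunks
    return b / (1.0 - rho * (1.0 - b))

def delays(traffic, trunks, holding_time=1.0):
    """Return (D1, D2) in the units of holding_time (default: holding times)."""
    d2 = holding_time / (trunks - traffic)  # average wait of the calls that queue
    d1 = erlang_c(traffic, trunks) * d2     # average wait over all calls
    return d1, d2

for trunks in (15, 16):
    d1, d2 = delays(14.63, trunks)
    d1_min, d2_min = delays(14.63, trunks, holding_time=5.35)
    print(f"N={trunks}: D1={d1:.2f} ({d1_min:.2f} min), D2={d2:.2f} ({d2_min:.2f} min)")
```

Because D1 is P times D2, the ratio of D2 to D1 is simply 1/P, which is why it grows as trunks are added and fewer calls have to queue.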

The remaining columns in the display, P8, P4, P2, P1 and PP, are simply the probability that a call coming in at random will be delayed more than an eighth, quarter, half, one or two holding times. With 15 trunks, we see that the probability of a wait longer than two holding times (PP) is 0.43; if we add one trunk, the probability of delay drops to 0.04.
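A sketch of how such delay probabilities can be computed follows. It assumes the usual Erlang C conditions (exponential holding times, calls served in order of arrival), under which the chance of waiting longer than t holding times is P times exp(-(N - A)t). The function names are invented, and the column labels are taken from the paragraph above.

```python
import math

def erlang_c(traffic, trunks):
    """Probability that an arriving call has to queue (Erlang C)."""
    if traffic >= trunks:
        raise ValueError("Erlang C needs more trunks than Erlangs offered")
    inv_b = 1.0  # reciprocal of Erlang B blocking with zero trunks
    for n in range(1, trunks + 1):
        inv_b = 1.0 + inv_b * n / traffic
    b = 1.0 / inv_b
    rho = traffic / trunks
    return b / (1.0 - rho * (1.0 - b))

def prob_delay_exceeds(traffic, trunks, holding_times):
    """Chance a random call waits longer than the given number of holding times."""
    p = erlang_c(traffic, trunks)
    return p * math.exp(-(trunks - traffic) * holding_times)

# Column labels as described in the text, with thresholds in holding times.
columns = {"P8": 0.125, "P4": 0.25, "P2": 0.5, "P1": 1.0, "PP": 2.0}
for trunks in (15, 16):
    row = {name: round(prob_delay_exceeds(14.63, trunks, t), 2)
           for name, t in columns.items()}
    print(trunks, row)
```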

Some Problems with Queuing

D1 and D2 help us get a feel for a problem that shows up in queuing situations and other places as well. The standard way of defining delay is in terms of holding times. But ten 5-minute calls and five 10-minute calls represent the same amount of total traffic in Erlangs (0.83). If we think about this, we see that delay in the queue, as experienced by the individual caller, is only partially a function of the total amount of traffic, but it is a direct function of holding time. Thus 10-minute calls will produce twice the delay of 5-minute calls.

Let's try just this problem. Run ERL-C1, and give it 0.83 Erlangs (five 10-minute or ten 5-minute calls). We immediately see that with a single trunk, D1 will be 4.88 holding times and D2 will be 5.88. For 5-minute calls, we multiply these values by 5 minutes per holding time to get better than 24 and 29 minutes delay, respectively. For 10-minute calls, we multiply by 10 minutes rather than 5, and get delays of about 49 and 59 minutes. Thus we see that for the same amount of traffic, a queued caller has to wait twice as long when calls are twice as long. If we used two trunks, D1 and D2 of 0.21 and 0.85 holding times become 1.05 and 4.25 minutes for 5-minute calls, and 2.1 and 8.5 minutes for 10-minute calls. This is a considerable improvement in service, but, with the same traffic and twice as many trunks, our occupancy has been cut in half.
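The same experiment can be sketched in Python for anyone without the disk at hand. This is an illustration using the standard Erlang C formulas, with invented function names; it is not the ERL-C1 program. Small differences from the figures above come from rounding D1 and D2 before multiplying.

```python
def erlang_c(traffic, trunks):
    """Probability that an arriving call has to queue (Erlang C)."""
    if traffic >= trunks:
        raise ValueError("Erlang C needs more trunks than Erlangs offered")
    inv_b = 1.0  # reciprocal of Erlang B blocking with zero trunks
    for n in range(1, trunks + 1):
        inv_b = 1.0 + inv_b * n / traffic
    b = 1.0 / inv_b
    rho = traffic / trunks
    return b / (1.0 - rho * (1.0 - b))

def delays(traffic, trunks):
    """Return (D1, D2) in holding times."""
    d2 = 1.0 / (trunks - traffic)
    return erlang_c(traffic, trunks) * d2, d2

traffic = 0.83  # ten 5-minute calls or five 10-minute calls per hour
for trunks in (1, 2):
    d1, d2 = delays(traffic, trunks)
    occupancy = traffic / trunks
    for call_minutes in (5, 10):
        print(f"N={trunks}, {call_minutes}-min calls: "
              f"D1={d1 * call_minutes:.1f} min, D2={d2 * call_minutes:.1f} min, "
              f"occupancy={occupancy:.2f}")
```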

Long holding times make for long delay in queue, and people react accordingly. It turns out that the harder it is for people to get a trunk, the longer they keep it once they have it. Thus a long wait in a queue tends to make for longer holding times, which increases the delay in queue. Without some sort of countermeasures, queuing tends to be self-defeating.

Another problem with queuing involves "tandem" systems, systems where connections are established through several trunks, one after another. We might, for instance, use a tie-trunk to go from PBX A to PBX B, and then use an FX line to get into a Central Office remote from PBX B (and, presumably, even more remote from PBX A). If we are queued for access to the tie-trunk and then, when our call arrives at PBX B, we are again queued for access to the FX line, the tie-trunk will be occupied with a lot of non-productive holding time (the time we are waiting in queue for the FX line to come free). This increases the occupancy of the tie-trunk and thus tends to reduce its cost per minute, but it gives an incorrect picture of the costs involved. It does no good to tie up a facility with dead waiting time; business is not being transacted. Further, other people are also not using the tie-trunk to transact business. Thus, one should not, in general, queue for facilities used in tandem connections.

There is one more problem. One should not queue on two-way facilities such as FX lines and tie-trunks. Consider the first of two basic situations you can encounter: one PBX is smart and can queue while the machine on the other end is dumb and cannot. We queue to get higher facility occupancy. We get the most occupancy when an ATB condition obtains. When we do not have ATB, we do not have a queuing situation. Thus, it may be said that the purpose of queuing is to maintain ATB.

Now, assume we have ATB on our trunk group. Both ends have calls to place. The smart end has them stacked neatly in a queue, while the other end simply experiences users retrying over and over. As soon as a trunk comes free, the smart end pounces on it and sets up the call. People retrying haven't got a ghost of a chance of showing up just when the trunk comes free and before the smart end reseizes with electronic speeds. Thus the retrying callers are frozen out.

Now, let us consider the second variation: both ends are smart switches and have queuing in effect. As soon as a trunk comes free, both ends pounce on it and connect signaling equipment to tell the far end what they want. This produces a condition known as glare. Both senders want to talk, neither wants to listen, and, depending on the hand-shaking routines in effect, they can have quite a bit of trouble figuring out what to do next. It follows that one should not, at the present state of the art, queue on two-way facilities.

In the future, when Common Channel Interoffice Signaling (CCIS) is more widely available, these problems will be greatly reduced. If one has a CCIS data link between smart PBX control computers, they can talk to each other and work out some sort of compromise. But at present, most PBXs and older COs communicate with each other over the same trunk that will later be used for human conversation. The protocols used (wink start and delay dial, in particular) are based on telegraph techniques in use prior to World War I, and leave much to be desired. In particular, the signal that tells the distant switch that a trunk has been seized (off hook) is exactly the same as the one used to acknowledge seizure from the distant end. Tricky timing is used, but there are other, better ways to detect and deal with glare. For instance, off-hook could still be used as a seizure signal, but dial tone could be used to acknowledge seizure for machines, just as it is for people. Unfortunately, this idea is almost impossible to sell.

More on Delay

Having delay expressed in holding times is useful for theoretical purposes, and for printing out paper tables. But computers can make the modification to "real" units such as seconds or minutes far better and easier than people can. Thus ERL-C2 is provided. It requests the average holding time in addition to the offered traffic, and provides D1 and D2 in seconds. Further, the columns over on the right hand side give the probabilities that a call, arriving at random, will be delayed longer than 10, 30 and 60 seconds, and 5 and 10 minutes. Run the program for 6 Erlangs and 5-minute calls and then for 10-minute calls, and see how the delays change (or look at Printout 4-2). For different average holding times, different delays appear. When using ERL-C2, the "real" units are quite convenient. However, they depend on a separate measurement of holding time for their accuracy. Assuming a holding time that is too long or too short can lead to problems.
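The conversion to "real" units is easy to sketch in Python. This is only an illustration of the kind of calculation described, not the ERL-C2 program, and it makes no claim to match Printout 4-2 digit for digit. It assumes the usual Erlang C conditions (exponential holding times, calls served in order of arrival); the function names are invented.

```python
import math

def erlang_c(traffic, trunks):
    """Probability that an arriving call has to queue (Erlang C)."""
    if traffic >= trunks:
        raise ValueError("Erlang C needs more trunks than Erlangs offered")
    inv_b = 1.0  # reciprocal of Erlang B blocking with zero trunks
    for n in range(1, trunks + 1):
        inv_b = 1.0 + inv_b * n / traffic
    b = 1.0 / inv_b
    rho = traffic / trunks
    return b / (1.0 - rho * (1.0 - b))

def delay_row(traffic, trunks, holding_s, thresholds_s=(10, 30, 60, 300, 600)):
    """Return D1 and D2 in seconds, plus the chance of waiting past each threshold."""
    p = erlang_c(traffic, trunks)
    d2 = holding_s / (trunks - traffic)
    d1 = p * d2
    tails = [p * math.exp(-t / d2) for t in thresholds_s]
    return d1, d2, tails

for holding_s in (300, 600):  # 5-minute and 10-minute calls
    d1, d2, tails = delay_row(6.0, 7, holding_s)
    print(f"{holding_s // 60}-min calls: D1={d1:.0f} s, D2={d2:.0f} s, "
          + ", ".join(f"{t:.2f}" for t in tails))
```

The probability of queuing at all does not change with holding time, but every delay doubles when the calls are twice as long.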

Both ERL-C1 and ERL-C2 print out their information starting with N being the next larger integer than the offered traffic in Erlangs. Why? The reason lies in one of the basic assumptions made in the derivation of Erlang C: nobody drops out of the queue. Although this is an accurate model for cars on the George Washington or Golden Gate bridges, it is hardly true for telephone callers. But this is the assumption. Now, if the offered traffic equals the carried traffic, it follows that one must have at least as many trunks as there are hours of use per hour. If one has fewer trunks, the queue will get longer and longer, and the condition of "statistical equilibrium" cannot be met. If calls come in faster than they go out, hour after hour, even a queue can't help you.

Let's go one step further. Let's assume our traffic is 5.5 Erlangs. With six trunks handling 5-minute calls, we will have a D1 of less than 8 minutes (469 seconds), and we can handle the load. But now, suppose we have an unusually busy afternoon so that our busy hour traffic is 10% heavier than usual. Normally, one might not think of a 10% overload as being much of a problem. But adding 0.55 Erlangs to the 5.5 for which we designed takes us beyond 6 Erlangs, and, as a result, the program tells us we need 7 trunks minimum.

It is instructive to run ERL-C1 or ERL-C2 for 5.5, 5.6, 5.7 Erlangs, etc., and watch D1 and D2 shoot up. It also illustrates a very important fact: the fuller you run a trunk group during average times, so that you can get good economies, the more subject you are to the troubles produced by overload. When designing a system, it always pays to check what happens with a 5, 10 and 20% overload. If you use some kind of an "average" busy hour, you can expect some "busy" busy hours to exceed it as a matter of arithmetic. If you don't know what to expect, you haven't really designed your network. Note that overloading is not limited to queuing systems. There are other ways to pack trunk groups full, and when you do, you have less reserve space available to help soak up excess traffic under overload.
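An overload check of this kind can be sketched in Python as follows. The code is an illustration, not ERL-C1 or ERL-C2. The function names are invented, and reporting an infinite delay when the traffic reaches or exceeds the number of trunks is a choice made for this sketch, reflecting the point that the queue never reaches equilibrium.

```python
import math

def min_trunks(traffic):
    """Smallest group Erlang C can handle: the next integer above the traffic."""
    return math.floor(traffic) + 1

def d1_seconds(traffic, trunks, holding_s):
    """Average delay over all calls, in seconds; infinite if the group is swamped."""
    if traffic >= trunks:
        return math.inf  # no statistical equilibrium: the queue just keeps growing
    inv_b = 1.0  # reciprocal of Erlang B blocking with zero trunks
    for n in range(1, trunks + 1):
        inv_b = 1.0 + inv_b * n / traffic
    b = 1.0 / inv_b
    p = b / (1.0 - (traffic / trunks) * (1.0 - b))  # Erlang C probability of delay
    return p * holding_s / (trunks - traffic)

design_traffic, trunks, holding_s = 5.5, 6, 300
for traffic in (5.5, 5.6, 5.7, 5.8, 5.9):
    print(f"{traffic} Erlangs: D1 = {d1_seconds(traffic, trunks, holding_s):.0f} s")

for overload in (0.05, 0.10, 0.20):
    traffic = design_traffic * (1.0 + overload)
    print(f"{overload:.0%} overload: {traffic:.2f} Erlangs, "
          f"D1 = {d1_seconds(traffic, trunks, holding_s):.0f} s, "
          f"minimum trunks = {min_trunks(traffic)}")
```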

Queuing produces higher occupancy and, as a result, it can sometimes produce a lower cost per minute of use. Indeed, we almost always queue to get lower cost per minute. But when we use specialized carriers or toll, cost per minute is independent of circuit occupancy. What then?

Then, of course, there is usually no point in queuing. So it becomes obvious that the way in which facility costs are billed has a major impact on how we design networks. There are no abstract principles of traffic theory that produce ideal solutions all by themselves. We must understand how pricing techniques and traffic interact to produce optimization. The next chapter introduces some of the ideas that relate cost and traffic.

Summary

The Erlang C formulas assume offered = carried traffic; that is, nobody drops out of the queue. A call stays in the queue until served, and then uses the facility for one holding time. Unfortunately, humans delayed in a queue tend to keep the trunk longer when they finally get it. Because delay is based on average holding time, queuing tends to be self-defeating. Queuing can greatly improve occupancy for small trunk groups (see Fig. 3-1) but does little for larger groups which run naturally at higher occupancies. No matter how clever the queuing algorithm in modern telephone equipment, it cannot put 6 Erlangs into 5 trunks in one hour.



Copyright 2006 Lee Goeller. All Rights Reserved.