Issue dated - 12th July 2004


Bandwidth in speech intelligibility

YUGAL SHARMA, country manager, Polycom India, on the history of bandwidth in telephony

BANDWIDTH is a much-used and much-abused explanation for the success or failure of many technology applications. Speech in telephony is one such application. Of all the elements that affect the intelligibility of speech in telephony, bandwidth has proven to be one of the most critical: critical enough to compensate for other deficiencies, such as noise and reverberation, that hamper contemporary speech communication systems. I will attempt to deal with this issue in two parts, tracing the path that has led to the evolution of telephony and the bottlenecks that must be removed to accelerate improvements in its quality.

Some progress has been made in reducing telephony’s deficiencies in the years since the first transcontinental phone call in 1915, as many sciences have come together and enabled a better understanding of the causes and solutions to these problems.

Early advances

Acoustics, physics, chemistry and electronics have facilitated major advances in the design of the telephone instrument, with new designs for the mouthpiece and earpiece alone producing a 10 dB frequency improvement by 1940. Similar improvements brought closer control of the gain of these elements (early experiments required the talker to tap the carbon microphone to loosen the granules inside). As the telephone evolved, anti-sidetone circuits were added so the talker could better judge his own loudness. The network added echo suppression and, later, digital echo cancellation to reduce the far-end echo that became more troublesome as long-distance calls became routine.

However, in the last sixty years, little progress has been made in the amount of audio bandwidth that can be carried by the telephone network. Early telephone connections were not intentionally limited, but were constrained by the characteristics of the transducers (which convert between sound and electrical signals) and the equipment then available. Intelligibility research was commonly conducted with frequencies extending from 4 KHz to 8 KHz (and sometimes beyond), but the telephone network was expected to carry signals only to about 3 KHz into the 1930s, and to about 3.5 KHz with the first multiple-channel carrier systems. With standardisation, and the codification of digital telephony in G.711, the upper frequency limit of the telephone network is now commonly accepted to be about 3.3 KHz at best. The last pre-divestiture Bell PSTN tests in 1984 showed significant roll-off at 3.2 KHz for short and medium connections, dropping to 2.7 KHz in long-distance connections. At the low end of the spectrum, the telephone network carries frequencies no lower than 220 Hz, and most commonly only as far down as 280 or 300 Hz.

In contrast to this telephone performance, we find FM radio and television spanning 30 Hz to 15 KHz, CD audio covering 20 Hz to 20 KHz, professional and audiophile audio 20 Hz to above 22 KHz, and AM radio extending up to 5 KHz.
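To put these figures side by side, here is a minimal sketch that computes the width of each pass-band both in hertz and in octaves. The band edges are taken from the figures quoted above; the telephone entry assumes the most common 300 Hz lower limit and the 3.3 KHz upper limit, and the names `BANDS`, `width_hz` and `width_octaves` are purely illustrative.

```python
import math

# Pass-bands quoted in the article, as (low edge, high edge) in Hz.
# The telephone entry assumes the common 300 Hz floor and 3.3 kHz ceiling.
BANDS = {
    "telephone": (300.0, 3300.0),
    "FM radio / TV": (30.0, 15000.0),
    "CD audio": (20.0, 20000.0),
}


def width_hz(band):
    """Width of a band in hertz."""
    low, high = band
    return high - low


def width_octaves(band):
    """Width of a band in octaves (each octave is a doubling of frequency)."""
    low, high = band
    return math.log2(high / low)


for name, band in BANDS.items():
    print(f"{name:15s} {width_hz(band):8.0f} Hz  {width_octaves(band):5.2f} octaves")
```

Measured in octaves, which is closer to how the ear perceives pitch, the telephone band under these assumptions spans roughly three and a half octaves against nearly nine for FM radio and about ten for CD audio.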

Bandwidth and intelligibility

Crandall noted in 1917, “It is possible to identify most words in a given context without taking note of the vowels...the consonants are the determining factors in...articulation.”

“Take him to the map” has a very different meaning from “take him to the mat,” and a handyman may waste a lot of time fixing a “faucet” when the faulty component was actually the “soffit.” Pole, bole, coal, dole, foal, goal, told, hole, molt, mold, noel, bold, yo, roll, colt, sole, dolt, sold, toll, bolt, vole, gold, shoal, and troll all share the same vowel sound, only differing in the consonants with which it is coupled. Consonant sounds have this critical role in most languages, including French, German, Italian, Polish, Russian and Japanese. And overall, more than half of all phonemes are consonants.

This critical role of consonants in speech presents a serious challenge for the telephone network, because the energy in consonant sounds is carried predominantly in the higher frequencies, often beyond the telephone’s bandwidth altogether. While most of the average energy in English speech is in the vowels, which lie below 3 KHz, the most critical elements of speech, the consonants, lie above. The difference between “f” and “s” is found entirely in the frequencies above the 3.3 KHz telephone bandwidth. For example, the burst of high-frequency sound that distinguishes the “s” in “sailing” from the “f” in “failing” occurs between 4 KHz and 14 KHz. When these frequencies are removed, no cue remains as to what has been said.

This makes a conventional telephone incapable of conveying the difference between “my cousin is sailing in college” and “my cousin is failing in college” without the analysis of additional contextual information (knowing whether my cousin sails frequently, for example).
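The “sailing” versus “failing” case can be illustrated with a rough calculation of how much of the distinguishing 4 KHz to 14 KHz burst fits inside a given channel. This is only a sketch: it measures overlap in hertz, which is a crude stand-in for how the burst's energy is actually distributed, and the `WIDEBAND` channel with its 300 Hz lower edge is a hypothetical chosen for illustration.

```python
def surviving_fraction(cue_band, pass_band):
    """Fraction of cue_band's width (in Hz) that lies inside pass_band."""
    cue_lo, cue_hi = cue_band
    pass_lo, pass_hi = pass_band
    overlap = min(cue_hi, pass_hi) - max(cue_lo, pass_lo)
    return max(0.0, overlap) / (cue_hi - cue_lo)


S_F_CUE = (4000.0, 14000.0)   # burst separating "s" from "f" (from the text)
TELEPHONE = (300.0, 3300.0)   # conventional telephone pass-band (from the text)
WIDEBAND = (300.0, 7000.0)    # hypothetical 7 kHz channel; low edge is an assumption

print(surviving_fraction(S_F_CUE, TELEPHONE))
print(surviving_fraction(S_F_CUE, WIDEBAND))
```

Under these assumptions none of the cue band falls inside the conventional telephone channel, while a 7 KHz channel retains about three-tenths of it.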

The challenge we face

Overall, two-thirds of the frequencies to which the human ear is most sensitive, and 80 percent of the frequencies in which speech occurs, are beyond the capabilities of the public telephone network. The human ear is most sensitive at 3.3 KHz, just where the telephone network cuts off.

Consonants are formed as non-voiced clicks, puffs, breaths, etc. They are created not from the vocal cords but by colliding, snapping and hissing through combinations of tongue, cheeks, teeth and so on. While “formants”, used in some speech analysis, are useful in examining vowels and long, voiced sounds, we see that they have very little to do with those elements of speech that carry so much of its information, the consonants. Intelligibility of speech decreases with decreasing bandwidth. For single syllables, 3.3 KHz bandwidth yields an accuracy of only 75 percent, as opposed to over 95 percent with 7 KHz bandwidth. This loss of intelligibility is compounded when sounds are combined in sentences.
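One way to see how the loss compounds in sentences is to ask how likely it is that every syllable in a run is heard correctly. The sketch below makes a deliberately simple assumption that is not in the source: each syllable is identified independently, with no help from context. It uses the 75 percent figure for 3.3 KHz and takes 95 percent as a floor for 7 KHz; the function name and the ten-syllable run are illustrative.

```python
def all_correct_probability(per_syllable_accuracy, syllables):
    """Chance that every syllable in a run is heard correctly,
    assuming (as a simplification) that errors are independent."""
    return per_syllable_accuracy ** syllables


RUN_LENGTH = 10  # illustrative number of syllables in a short sentence

for label, accuracy in (("3.3 kHz", 0.75), ("7 kHz", 0.95)):
    p = all_correct_probability(accuracy, RUN_LENGTH)
    print(f"{label}: {p:.1%} chance of hearing all {RUN_LENGTH} syllables correctly")
```

Real listeners do better than this because context fills in many gaps, but the calculation suggests why a modest per-syllable difference becomes a large difference over a whole sentence.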

The human mind is not conscious of confusion this frequently because the brain has some ability to compensate. When a sound is not clear, the brain attempts to examine the context of the sound. However, when presented with a continual string of such verbal puzzles as a meeting progresses, the listener is distracted. Too much of the listener’s time is spent unravelling what was said instead of understanding what was meant.

What else affects speech accuracy?

There are additional aspects of business conferencing that interact with audio bandwidth. Reverberation, which comes from the natural reflections occurring in any room, magnifies the degrading effect of limited bandwidth. This is an important issue in business telephony because group teleconferences are usually held in meeting rooms, which are reverberant spaces. This problem is also magnified as the talker moves farther from the microphone, or when the microphone is pointed away from the talker, because a larger proportion of the total received sound is reverberant rather than direct.

Increasing bandwidth is very effective at counteracting this problem. In one test, word accuracy in a reverberant space increased from 52 to 80 percent when the available bandwidth was raised from 4 to 8 KHz.

The expansion of global business has increased the importance of accurate telephone communication among talkers who have different native languages or dialects. Understanding accented speech can be much more difficult than understanding native speech, both because of the accent itself and because grammar, pronunciation, and even word selection are often quite different from what the listener expects. A Korean speaker of English, for example, will commonly substitute “p” for “f” (“faint” becomes “paint,” “coffee” becomes “copy”). A Turk may insert extra syllables (“stone” becomes “istone” or “sitone”). Even a speaker in London, referring to a cigarette container as a “fag packet” (notice all the consonants?), may leave his American listener completely perplexed.

Because of these substitutions, it is no longer safe to assume that an unclear word can be deduced from its grammatical context. Hence the increased accuracy that derives from increasing speech bandwidth is more critical when speech is accented.

Whispered and soft speech carry proportionately more of their energy at high frequencies. While the long-term average energy at 7 KHz in normal speech is roughly 40 dB below that at 600 Hz, in whispered speech it is almost flat, dropping only 10 dB over these three octaves. Hence, in whispers, even the vowels are much less intelligible with telephone bandwidth. A person with a cold, or who is growing hoarse, will have more difficulty being understood both because they have proportionately less energy within the telephone band, and because they are probably speaking more softly.
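The decibel figures above can be turned into slopes and ratios with a few lines of arithmetic. This sketch uses only the 600 Hz, 7 KHz, 40 dB and 10 dB values from the text; the function names are illustrative. Computed exactly, the span from 600 Hz to 7 KHz is a little over three octaves, about 3.5.

```python
import math

LOW_HZ = 600.0    # reference frequency from the text
HIGH_HZ = 7000.0  # upper frequency from the text


def octaves_between(low_hz, high_hz):
    """Number of octaves (doublings) between two frequencies."""
    return math.log2(high_hz / low_hz)


def db_per_octave(total_drop_db, low_hz, high_hz):
    """Average spectral slope in dB per octave over the span."""
    return total_drop_db / octaves_between(low_hz, high_hz)


def power_ratio(drop_db):
    """Energy (power) ratio corresponding to a drop expressed in dB."""
    return 10 ** (drop_db / 10)


for label, drop_db in (("normal speech", 40.0), ("whispered speech", 10.0)):
    slope = db_per_octave(drop_db, LOW_HZ, HIGH_HZ)
    ratio = power_ratio(drop_db)
    print(f"{label}: about {slope:.1f} dB/octave, energy ratio {ratio:.0f}:1")
```

On these numbers normal speech falls away at roughly 11 dB per octave, leaving one ten-thousandth of the energy at 7 KHz, whereas whispered speech falls at under 3 dB per octave and retains a tenth.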

One more factor for consideration is that the telephone removes important frequencies both above and below its pass-band. In general, the telephone’s elimination of frequencies below 250 Hz is responsible for much of the ‘unreality’ and loss of comfort that we hear in telephonic speech, the sense that the talker is not really present.

It is clear that extending telephone bandwidth to 7 KHz and beyond can markedly reduce fatigue, improve concentration, and increase intelligibility. This improvement is even more significant in real-world room situations, where the sound is often degraded by reverberation, projector or air-conditioner noise, accented speech, and other acoustic problems that are encountered in business telephony. Additionally, extending telephone bandwidth below 300 Hz brings a significant increase in presence and realism.

In his 1938 paper discussing the bandwidth of the telephone system, AT&T’s Inglis noted, “Frequency limitation is essentially an economic one, subject to change as conditions change.” Here in the twenty-first century, economics and conditions have changed as Inglis predicted, and modern telephony is now in a position to deliver on the promises of wider bandwidth and clearer speech.

The author may be contacted at yugal.sharma@polycom.com



© Copyright 2003: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in
Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.