Introduction to audio measurements and terms

W4NEQ
Christopher Scott
Bowling Green, KY



Author's note:  This is a bit of a work in progress, and is considerably lacking in continuity and polish. In time I hope to finish it. Please forgive me.
Having worked as a radio broadcast engineer for over 30 years, I've invested a lot of time fine-tuning AM and FM radio stations for best sonic quality. Conducting measurements which quantify this performance has played a big part. It is amusing to see the colorful descriptions so often used by "audiophiles" when describing certain qualities. Too often these same folks, despite their exuberance, are completely unfamiliar with basic audio terms and quantifiers, the measurement of which forms the basis of comparisons. There are some who will argue that sonic quality can best be judged with ears alone; indeed, in the case of transducers such as microphones, headphones, and loudspeakers, some experts will agree. In the electronics realm, however, analyzing and comparing input and output waveforms of the device under test (DUT) has always been the accepted method.
Recently there seems to be some interest in certain quarters in applying broadcast-style audio treatments to SSB (and AM) transmission for communications, in the hope of achieving really "good audio", which, although itself a subjective moving target, can in fact be quantified to some degree. Once the quality "bottlenecks" are identified, they can often be improved. This note hopes to break down some of the mystery involved and investigate some of these bottlenecks.
Despite great strides in audio quality by digital techniques, audio is, in fact, analog. At the source as well as the final transducer this will always be true. For the last 75 years there have been essentially three measurements which quantify audio quality, and they remain critically important. These are: 1. Frequency response, 2. Distortion (waveform linearity), and 3. Signal-to-noise ratio or dynamic range.
The frequency response of an audio system is perhaps what is most noticed by people when judging quality. It is simply the bandwidth, usually described by upper and lower frequency limits, often at the points where the response has fallen to half power, or -3 dB. "CD" quality is often cited as the benchmark of excellent quality, and its frequency response is approximately, with practical equipment, 20 Hz to 20 kHz at -3 dB. Analog FM broadcast audio, when properly adjusted, is arguably excellent quality. For many years the FCC required annual audio proof-of-performance testing, and specified the limits at 50 Hz to 15 kHz. In my experience, critical listeners using blind test methods can just barely perceive the difference between this and "CD" grade frequency response. Figure 1 shows a frequency response graph of a popular dynamic microphone - far from "flat" response.
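The link between "half power" and -3 dB can be checked with a few lines of Python. This is only a sketch; the function names are my own and are not taken from any measurement package.

```python
import math

def power_ratio_to_db(ratio):
    """Convert a power ratio to decibels (10 * log10)."""
    return 10.0 * math.log10(ratio)

def voltage_ratio_to_db(ratio):
    """Convert a voltage (amplitude) ratio to decibels (20 * log10)."""
    return 20.0 * math.log10(ratio)

# Half power expressed in dB.
half_power_db = power_ratio_to_db(0.5)

# Half power corresponds to a voltage ratio of 1/sqrt(2), about 0.707,
# so the voltage form of the formula gives the same answer.
half_power_voltage_db = voltage_ratio_to_db(1.0 / math.sqrt(2.0))

print(f"half power:          {half_power_db:.2f} dB")
print(f"0.707 voltage ratio: {half_power_voltage_db:.2f} dB")
```

Both lines print about -3.01 dB, which is why the band edges are quoted as the "-3 dB points" whether the response is plotted in power or in voltage.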

AM broadcast frequency response is generally considered 50 Hz to 8 kHz, with the upper limit artificially imposed by audio filtering designed to limit occupied bandwidth. This is actually not bad-sounding frequency response for music, but very few receivers allow the higher frequencies, e.g. above 3-4 kHz, to pass. The resulting 50 Hz to 3.5 kHz is tolerable for speech, but comparatively poor for music. SSB is, depending upon the filters involved, 300 Hz to 2600 Hz, which is the bare minimum for intelligible speech.
Distortion can take many forms, and the definition has many variations. But the most basic measurement is total harmonic distortion (THD), which is measured in percent. Due to the nature of the usual test methods, which null out the test tone and measure the residual harmonic energy, the measurement is more precisely THD+N, or THD plus residual noise. Distortion can best be thought of as waveform non-linearity. This may be best understood by looking at the classic example - a case where an audio stage is overdriven to the point where it is said to "clip" the waveform. Imagine a sine wave where the extreme top and bottom are flattened - the audio stage was incapable of faithfully following the excursion beyond its power supply rails, and reached the "clip point". This process generates odd harmonic energy, predominantly third harmonic, where there was none before. Where the sine wave was formerly a clean, pure tone at its fundamental frequency only, now with third harmonic energy added, it sounds gritty, or has a bit of an edge. The extent of these undesirable artifacts is determined by how much of the sine wave was clipped. Another important distortion test is IM, or intermodulation distortion. Two pure and distinct tones are injected into the DUT, with the analyzer measuring the levels of the sum and difference frequencies. In a perfectly linear amplifier, these tones will not mix, and no sum and difference products are produced.
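Both effects can be reproduced numerically. The sketch below uses plain Python and a single-bin DFT rather than a real analyzer. The record length, the clip level of 0.8, the tone positions, and the second-order coefficient of 0.1 are hypothetical values picked only for illustration. Note that it computes THD from the harmonics alone, not the THD+N that a nulling analyzer reads.

```python
import math

def bin_amplitude(samples, k):
    """Amplitude of the component with k cycles per record (single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def thd_percent(samples, fundamental_bin=1, max_harmonic=9):
    """THD: RMS sum of harmonics 2..max_harmonic relative to the fundamental."""
    fundamental = bin_amplitude(samples, fundamental_bin)
    harmonics = [bin_amplitude(samples, h * fundamental_bin)
                 for h in range(2, max_harmonic + 1)]
    return 100.0 * math.sqrt(sum(a * a for a in harmonics)) / fundamental

N = 1024          # samples per record (assumed)
CLIP_LEVEL = 0.8  # hypothetical "rail", as a fraction of the sine peak

# One full cycle of a pure sine, then the same wave with flattened peaks.
sine = [math.sin(2 * math.pi * i / N) for i in range(N)]
clipped = [max(-CLIP_LEVEL, min(CLIP_LEVEL, s)) for s in sine]

thd_clean = thd_percent(sine)
thd_clipped = thd_percent(clipped)
second = bin_amplitude(clipped, 2)
third = bin_amplitude(clipped, 3)
fifth = bin_amplitude(clipped, 5)

# Two-tone IM test through a mildly non-linear stage: y = x + K2 * x^2.
F1, F2 = 50, 57   # tone positions in cycles per record (hypothetical)
K2 = 0.1          # hypothetical second-order coefficient
two_tone = [0.5 * math.sin(2 * math.pi * F1 * i / N)
            + 0.5 * math.sin(2 * math.pi * F2 * i / N) for i in range(N)]
nonlinear = [x + K2 * x * x for x in two_tone]

im_difference = bin_amplitude(nonlinear, F2 - F1)  # difference product
im_sum = bin_amplitude(nonlinear, F1 + F2)         # sum product
linear_difference = bin_amplitude(two_tone, F2 - F1)

print(f"THD clean sine:   {thd_clean:.6f} %")
print(f"THD clipped sine: {thd_clipped:.2f} %")
print(f"harmonics 2/3/5:  {second:.6f} {third:.4f} {fifth:.4f}")
print(f"IM products (difference, sum): {im_difference:.4f} {im_sum:.4f}")
print(f"difference product, linear stage: {linear_difference:.6f}")
```

With symmetrical clipping the second harmonic stays at essentially zero while the third is the largest distortion product. In the two-tone case the sum and difference products appear only after the signal passes through the non-linear stage.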
Dynamic range is closely related to signal-to-noise ratio, which is the third independent quality factor. Simply put, it represents the ratio of the loudest sound to the noise floor. This noise floor may be residual white noise internally generated in the electronics, hum, or other undesirable background noise. Traditionally an operating level is established with a constant tone corresponding to "0 VU." With actual program material, brief peaks would actually exceed this point by 10-20 dB, so headroom above this point is required in order to avoid clipping. Usually a maximum level is reached in the electronics where a specified amount of harmonic distortion is measured - usually 1-3%. This point defines the clip point. Quantifying this involves noting the absolute output level of the clip point (or operating level), removing the test tone while keeping the input normally terminated, and measuring the residual voltage, typically using a "weighting" filter which emulates the sensitivity of the human ear. The ratio of the clip point voltage to the noise floor voltage, expressed in dB, defines the total dynamic range that is possible. The signal-to-noise ratio is the same measurement between the operating level and the noise floor. Therefore, dynamic range equals signal-to-noise ratio plus headroom. In recent years, with peak meters replacing VU meters, and with everything referenced to the absolute digital clip point in dBFS (dB relative to full scale), the two terms have blurred together. What's important is to use the same definition when comparing specifications - is headroom included in the numbers, or is the spec really total available dynamic range?
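The relationship "dynamic range equals signal-to-noise ratio plus headroom" falls straight out of the dB arithmetic. Here is a small sketch; the three voltage readings are made-up round numbers, not measurements of any real device.

```python
import math

def db_from_voltage_ratio(v_upper, v_lower):
    """Express the ratio of two voltages in dB."""
    return 20.0 * math.log10(v_upper / v_lower)

# Hypothetical RMS readings in volts, chosen only for illustration.
clip_point_v = 7.75        # output where THD reaches the chosen limit
operating_level_v = 0.775  # output with the "0 VU" reference tone applied
noise_floor_v = 0.000775   # residual with the tone removed, input terminated

dynamic_range_db = db_from_voltage_ratio(clip_point_v, noise_floor_v)
snr_db = db_from_voltage_ratio(operating_level_v, noise_floor_v)
headroom_db = db_from_voltage_ratio(clip_point_v, operating_level_v)

print(f"signal-to-noise ratio: {snr_db:.1f} dB")
print(f"headroom:              {headroom_db:.1f} dB")
print(f"dynamic range:         {dynamic_range_db:.1f} dB")
```

With these example readings the signal-to-noise ratio is 60 dB, the headroom 20 dB, and the dynamic range 80 dB. A spec sheet quoting "80 dB" and one quoting "60 dB" could therefore describe the very same device.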
Quality grades.

People have different tastes in music and tonal quality. Some believe that sound reproduction is best at 20% distortion as long as it's loud. I disagree. The job of the audio engineer is to faithfully reproduce the original sound. Beyond this, certain "enhancements" can be added based upon taste, but the benchmark for quality comparison must always be the degree of faithful reproduction achieved. The greater the artifacts produced in the reproduction process, the poorer the quality. If the signal-to-noise ratio (s/n) is infinite, the distortion is 0%, and the frequency response is flat from 20 Hz to 20 kHz, humans will perceive it to be exactly like the original live performance. In the real world, however, this is rarely achieved. Agreement between experts about what constitutes "high fidelity" is equally rare. It is important to understand that frequency response, distortion, and noise measurements are largely independent of each other. A recording which is wonderful in two categories can still be terrible quality because of poor performance in the third. Rather than attempt a definition, I shall describe some examples of real-world audio, and provide some typical quality numbers.

Despite what some LP enthusiasts claim, true "CD" audio is excellent quality. From a good consumer-grade machine, I've measured about 93 dB dynamic range (96 dB is the theoretical maximum with 16-bit words describing each sample - at 6 dB, a doubling of voltage, for each bit). The frequency response is typically 20 Hz to 20 kHz within 0.3 dB - essentially perfect - and the THD (total harmonic distortion) measured at 1000 Hz is less than 0.05%. This is not to say that every CD and player will achieve this, but the medium is capable of it.
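The 96 dB figure follows from the "6 dB per bit" rule mentioned above, and is easy to verify. This is a sketch of that simple rule only; the function names are my own.

```python
import math

def db_per_bit():
    """Each added bit doubles the largest representable voltage."""
    return 20.0 * math.log10(2.0)

def theoretical_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20.0 * math.log10(2 ** bits)

print(f"per bit: {db_per_bit():.2f} dB")
for bits in (8, 16, 24):
    print(f"{bits:2d} bits: {theoretical_dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives about 96.3 dB, so a measured 93 dB is within a few dB of what the word length allows.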

The next quality grade we shall explore is that associated with computer sound card recording and playback. With 16 bits, it can be true CD quality as previously described. But this is only achieved with high quality sound cards, usually in the $200 - $1,200 price class. Sound cards are in fact the major limiting factor. Even though all else is the same, including bit count and sample rate, many consumer-style cards are very poor performers. I've seen $150 cards with published specifications of 93 dB s/n (they really mean dynamic range) actually measure 65 dB on playback and 55 dB on record. When the manufacturer was contacted, it seemed their marketing department had "calculated" the performance specifications. Frequency response and THD were also worse than published.

On the other hand, there are excellent sound cards available which really do measure close to the 16 bit theoretical limit.  Digital Audio Labs and Audioscience both appear to have audio engineers on staff who've actually measured their product.  These cards and other high quality units do achieve true CD quality.
Sample rate describes how many times per second the waveform is sampled (quantified) so that the audio can later be reconstructed. Common rates range from 11 to 48 kHz. It's important to understand that (when using good sound cards) this quality determinant affects primarily just frequency response - signal-to-noise and distortion are largely unaffected, being mostly determined by the number of bits per word. The Nyquist rate, which is the minimum sample rate required, is simply double the maximum frequency response desired. For example, 20 kHz frequency response is achieved with a sample rate of 40 kHz. In the real world, however, imperfect filters are required, and some additional safety margin above the Nyquist rate is needed - about 10%. So the CD process uses 44.1 kHz as the sample rate.
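The doubling rule and the roughly 10% margin used for CD can be worked through as follows. This is a minimal sketch; the function names are hypothetical.

```python
def minimum_sample_rate_hz(max_audio_hz):
    """Lowest sample rate that can represent audio up to max_audio_hz."""
    return 2.0 * max_audio_hz

def highest_representable_hz(sample_rate_hz):
    """Highest audio frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0

def margin_percent(sample_rate_hz, max_audio_hz):
    """How far the chosen sample rate sits above the bare minimum."""
    minimum = minimum_sample_rate_hz(max_audio_hz)
    return 100.0 * (sample_rate_hz - minimum) / minimum

cd_rate_hz = 44100.0
audio_limit_hz = 20000.0

print(f"minimum rate for 20 kHz audio: {minimum_sample_rate_hz(audio_limit_hz):.0f} Hz")
print(f"CD margin above the minimum:   {margin_percent(cd_rate_hz, audio_limit_hz):.2f} %")
print(f"highest frequency at 44.1 kHz: {highest_representable_hz(cd_rate_hz):.0f} Hz")
```

The margin works out to 10.25%, leaving the band between 20 kHz and 22.05 kHz for the practical filters to roll off.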

There are now abundant lossy data compression algorithms (not to be confused with dynamic range compression, which increases average loudness), such as MPEG two and three, AAC, etc., which trade many things for reduced audio file size or required data throughput. Most people find the resulting audio quality acceptably good when these data reductions are done in moderation. This is, however, a very murky area where intermodulation distortion is increased and, at least dynamically, frequency response is often decreased. In addition, some background sounds which are believed to be masked by louder sounds may completely disappear. From a purist's viewpoint these degradations are awful, but the trade-offs are nevertheless made to allow use of certain limited (digital) bandwidth systems. In the author's opinion, one cannot speak of high fidelity when including lossy compression.

