Speak Freely for Windows

Compression modes

If you're talking to another user on the same high-speed local area network, or you're one of the lucky few with a high bandwidth connection to the Internet backbone, there's no need to bother compressing audio. The data rate of 8000 bytes per second is modest compared to other Internet applications such as file transfer and accessing graphics-intensive pages on the World-Wide Web.

The rest of us, faced with a bottleneck of anywhere from 14,400 to 65,536 bits per second between our machine and the rest of the world, have to find a way to squeeze 8000 bytes per second into a communications channel with a capacity between 1440 and 6500 bytes per second. Speak Freely provides a variety of compression modes, each with different trade-offs among efficiency of compression, loss of fidelity in the compression process, and the amount of computation required to compress and decompress. Speak Freely's built-in performance benchmark may help you determine which modes are suitable based on the performance of your computer.

Compression options

Compression is selected by checking one or more of the compression items on the Options menu. The chosen compression mode(s) apply to all sound transmitted to open connections: sound files as well as live audio. Compression modes cannot be changed while you're transmitting live audio; click the mouse in each transmitting connection window to pause transmission, change the compression mode, then click or double click to resume transmission.

If no compression is selected, Speak Freely requires your network to reliably transmit 8000 characters per second. If it's slower than that, the person you're talking to will hear pauses in the sound they receive and sound will be lost. Most local area networks, unless extremely heavily loaded, have no difficulty transmitting data at this rate--in fact, most are capable of speeds on the order of a million characters per second. It's when you leave your local network and venture into the worldwide Internet that compression becomes crucial. Very few Internet users today have connections faster than 64 kilobits per second, and many are using dial-up modem lines at 14.4 or 28.8 kilobits per second.

For asynchronous serial communication, the data rate in bytes per second is about one tenth the speed in bits per second (each 8-bit byte is typically framed by a start bit and a stop bit), so it's clear that even a 64 Kb line, at roughly 6400 bytes per second, can't transmit uncompressed sound at 8000 bytes per second. Speak Freely provides various forms of compression which can be selected independently or in combination to reduce the data rate.
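
The one-tenth rule of thumb above can be written down as a minimal Python sketch. The function names are hypothetical and are not part of Speak Freely; the factor of 10 is the approximation given in the text, not a measured value.

```python
UNCOMPRESSED_RATE = 8000  # bytes per second of uncompressed audio, from the text

def usable_bytes_per_second(line_bits_per_second):
    """Approximate throughput of an asynchronous serial line.

    Assumes about 10 line bits are spent per byte, as the text states.
    """
    return line_bits_per_second // 10

def can_carry(line_bits_per_second, required_bytes_per_second=UNCOMPRESSED_RATE):
    """True if the line can sustain the required audio data rate."""
    return usable_bytes_per_second(line_bits_per_second) >= required_bytes_per_second

for speed in (14400, 28800, 64000):
    print(speed, usable_bytes_per_second(speed), can_carry(speed))
```

None of the three line speeds can carry uncompressed audio, which is why the compression modes described below matter.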

Simple compression discards every other sample and thereby halves the data rate to 4000 bytes per second, within the capability of a 64 Kb connection. On the receiving end, the elided samples are synthesised by averaging adjacent samples. Simple compression requires very little CPU time but it substantially degrades sound quality--high frequency components are lost and weird sampling aliasing can occur. Still, voice is generally intelligible and it's certainly better than random pauses and lost sound.
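
As an illustration only, the idea behind Simple compression might be sketched in Python as follows. This is not Speak Freely's actual code: it assumes the audio is a list of linear integer samples, and the choice to repeat the final sample at the end of a buffer is an assumption made here.

```python
def simple_compress(samples):
    """Sender side: keep every other sample (indices 0, 2, 4, ...)."""
    return samples[::2]

def simple_decompress(kept):
    """Receiver side: re-create each dropped sample as the average
    of the two kept samples on either side of it."""
    out = []
    for i, sample in enumerate(kept):
        out.append(sample)
        # Assumption: at the end of the buffer there is no right-hand
        # neighbour, so the last kept sample is simply repeated.
        neighbour = kept[i + 1] if i + 1 < len(kept) else sample
        out.append((sample + neighbour) // 2)
    return out

original = [0, 10, 20, 30, 40, 50, 60, 70]
sent = simple_compress(original)      # half as many samples
restored = simple_decompress(sent)    # same length as the original
print(sent, restored)
```

A smooth ramp like the one above survives almost intact, but any detail that changes faster than every second sample is lost, which is the high-frequency degradation the text describes.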

GSM compression employs the algorithm GSM (Global System for Mobile Communications) telephones use to reduce the data rate by a factor of almost five with little degradation of voice-grade audio. Enabling this option reduces the data rate from 8000 bytes per second to 1650 bytes per second, which renders a connection by 28.8 Kb modem usable. The catch is that GSM encoding is a very complicated process and, if your computer isn't fast enough, it won't be able to keep up with the audio coming in. (Decoding requires only about half the computation of encoding.) To use GSM compression, you'll need a fast 486, Pentium, or later generation processor. Thus, a slower network connection increases the demand on your computer.

ADPCM compression uses Adaptive Differential Pulse Code Modulation to halve the data rate to 4000 bytes per second. The degree of compression is identical to that achieved by Simple compression, but the loss in fidelity is much less; for voice grade audio, it's barely perceptible. ADPCM encoding and decoding require more computation than Simple compression but enormously less than GSM; if your computer is too slow for GSM and the compression achieved by ADPCM is adequate for your network link, it's the best choice.

You can combine Simple and either GSM or ADPCM compression. The CPU requirement is only slightly greater than for GSM or ADPCM compression alone and the sound quality is about the same as for Simple compression. Simple and GSM compression combined yield a data rate of 825 bytes per second, which a 14.4 Kb network link can handle. Simple and ADPCM compression together yield a data rate of 2000 bytes per second, within the capability of a 28.8 Kb link.

LPC compression uses Linear Predictive Coding to reduce the data rate by more than a factor of 12. This achieves the greatest degree of compression of any of the available options but, like GSM, it is extremely computationally intense. LPC requires many calculations to be done in floating point; if your machine does not have a math coprocessor, it will almost certainly be unable to do LPC compression and decompression in real time. LPC compression is extremely sensitive to high frequency noise and clipping caused by setting the audio input level too high. If you hear frequent bursts of loud static, try reducing the gain on the microphone or speaking further away from it. Also, try to avoid the pops that result from talking directly into the mike; they also create bursts of noise. Finally, users with high pitched voices may not be able to use LPC compression at all: it just loses too much high-frequency information. If GSM is a cellular phone, think of LPC as a shortwave radio. It doesn't always work, you have to be careful to get the best results, and even in the best of circumstances there will be some noise and distortion. But, like shortwave, it lets you communicate (or at least try) when nothing else will work. If your network link is so slow that none of the other forms of compression are usable, give it a try.

LPC-10 compression uses a different form of Linear Predictive Coding, as specified by United States Department of Defense Federal Standard 1015 / NATO-STANAG-4198, republished as Federal Information Processing Standards Publication 137 (FIPS Pub 137). LPC-10 compression encodes real-time audio into a 2400 bit per second stream. Even accounting for the additional information required to transfer audio packets over the network, LPC-10 compresses audio to only 346 bytes per second--a factor of more than 23 to 1. Audio fidelity in LPC-10 compression is less than that of GSM compression, but entirely adequate for voice-grade communications. As with the LPC compression mode described above, try to avoid driving the audio input into clipping with overly loud signals, and eliminate hum and background noise which can interfere with the compression process. The principal disadvantage of LPC-10 compression is that it is extraordinarily computationally intense, and does most of its calculations in floating point. A math coprocessor (or on-chip floating point unit as found in 486DX and Pentium processors) is absolutely required to run LPC-10 compression in real time, and slower machines may not be able to use LPC-10 even if equipped with a math coprocessor.

The extreme degree of compression achieved by LPC-10, encoding audio into much less bandwidth than the typical Internet link, allows Speak Freely, when LPC-10 compression is selected, to offer an optional Robust Transmission mode. By default, Speak Freely sends a single copy of each sound packet to the site you're connected to. In Robust Transmission mode, two, three, or four copies of every sound packet are sent, each containing a sequence number that allows the recipient to discard duplicate or out-of-sequence packets. If the Internet link between you and the person you're talking to is congested and you're experiencing drop-outs, Robust Transmission mode may substantially improve the quality of the connection. You can run 2X (two copies of every packet) on a link as slow as 9600 baud. With a 14.4 Kb modem, you can run 2X, 3X, or 4X (although 4X is close to the capacity of a 14.4 Kb line and you may have trouble if other simultaneous traffic is occurring on the line). With a 28.8 Kb or faster line, all robust transmission options are available. Duplicating packets more than four times does not improve reliability of the connection and only wastes bandwidth, so replication is limited to four copies.
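
The duplicate-and-discard scheme can be sketched in Python under some loudly stated assumptions. The real packet layout is not described here, so a packet is modelled as a hypothetical (sequence number, payload) pair, sequence numbers are assumed never to wrap around, and the function names are invented for illustration.

```python
LPC10_RATE = 346  # bytes per second for one copy of the stream, from the text

def replicate(payloads, copies):
    """Sender side: emit each packet `copies` times with the same sequence number."""
    if not 1 <= copies <= 4:
        raise ValueError("copies must be between 1 and 4")
    for seq, payload in enumerate(payloads):
        for _ in range(copies):
            yield seq, payload

def deduplicate(received):
    """Receiver side: accept a packet only if it is newer than the last one
    accepted, so duplicates and out-of-sequence packets are discarded."""
    last_seq = -1
    for seq, payload in received:
        if seq > last_seq:
            last_seq = seq
            yield payload

def robust_rate(copies):
    """Bytes per second needed when every packet is sent `copies` times."""
    return LPC10_RATE * copies

sent = list(replicate(["a", "b", "c"], 2))
# Simulate a congested link that loses the first copy of "b".
arrived = [p for i, p in enumerate(sent) if i != 2]
print(list(deduplicate(arrived)))
print(robust_rate(4))
```

With four copies the stream needs 1384 bytes per second, just under the roughly 1440 bytes per second of a 14.4 Kb line, which matches the caution about 4X given above.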

Only one of the compression modes GSM, ADPCM, LPC, and LPC-10 may be selected at once. Choosing any of them turns off a previously selected mode.

The following table summarises the compression options available.

              Bytes per   Bits per     Need fast   Sound
Compression    second      second        CPU?     fidelity
----------------------------------------------------------
No compression  8000       80000         No        Best
Simple          4000       40000         No        Poor
ADPCM           4000       40000         No        Good
Simple + ADPCM  2000       20000         No        Lousy
GSM             1650       16500         Yes       Good
Simple + GSM     825        8250         Yes       Lousy
LPC              650        6500         Yes       Depends
LPC-10           346        3460       Extremely   Okay
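
As a rough aid to reading the table, the following Python sketch lists which modes fit a given line speed. The byte rates are copied from the table; the helper name is hypothetical, and the test ignores CPU speed and any other traffic on the line, so treat the result as a starting point rather than a guarantee.

```python
# (mode name, bytes per second), in the same order as the table above
MODES = [
    ("No compression", 8000),
    ("Simple", 4000),
    ("ADPCM", 4000),
    ("Simple + ADPCM", 2000),
    ("GSM", 1650),
    ("Simple + GSM", 825),
    ("LPC", 650),
    ("LPC-10", 346),
]

def modes_that_fit(line_bits_per_second):
    """Return the modes whose data rate fits the line.

    Assumes about 10 line bits per byte, the same rule of thumb
    used for the bits-per-second column of the table.
    """
    capacity = line_bits_per_second // 10
    return [name for name, rate in MODES if rate <= capacity]

for speed in (14400, 28800, 64000):
    print(speed, modes_that_fit(speed))
```

The first mode in each returned list is the one with the highest data rate that still fits; whether it also has the best fidelity depends on the fidelity column, since Simple and ADPCM share a rate but not a quality.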

You can experiment to determine which settings work best by connecting to an echo server which returns any sound you send to it after a 10 second delay.