JR's UC Voice: When an "A" is not an "A"

Saturday, October 31, 2009

Why do some audio codecs need so much more data than others? Consider this:

Text files and image files take completely different approaches to how they save information. A text file doesn’t save how your words look; it saves what the words are. An image file is the opposite: it preserves the look of things, and doesn’t care what they represent.

If a text file only needs to distinguish among 256 different characters (and that gives you ten alphabets’ worth of special characters), then each letter needs eight bits. One byte per character. That’s pretty efficient, but it only works because we’ve limited its abilities: it’s only allowed to store text, not pictures, and we have to agree beforehand on what those 256 characters are.
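The eight-bit figure is just the number of yes/no choices needed to pick one item out of 256. A minimal Python sketch of that relationship (the function name `bits_per_symbol` is my own, for illustration only):

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest whole number of bits that can distinguish alphabet_size symbols."""
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be at least 1")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)), with no float rounding.
    return (alphabet_size - 1).bit_length()

print(bits_per_symbol(256))  # 8 bits: one byte per character
print(bits_per_symbol(26))   # 5 bits would cover a bare 26-letter alphabet
```

The smaller the agreed-upon alphabet, the fewer bits each symbol costs, which is the whole trade being described here.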

Image files, on the other hand, only assume that the source is an image, any image. A good example is the .jpg format (JPEG stands for “Joint Photographic Experts Group,” the party animals who first agreed on that standard). It starts with an arbitrary image, a checkerboard of any number of pixels, and then follows an algorithm to boil that down as best it can by removing redundancies. But it’s always limited by its inability to make many simplifying assumptions. It can’t take the easy road that a .txt file can, just assuming the source is a letter; it could be a fondue pot.

Let’s try a simple case: I want to send the letter “A” somewhere.

Text file

Here’s your eight bits: 01000001 in binary. The people behind ASCII (the American Standard Code for Information Interchange) agreed on this lookup table many decades ago. It’s just another kind of Morse code.
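If you want to check that bit pattern yourself, here is a small sketch using only Python built-ins (the variable names are mine):

```python
letter = "A"

code_point = ord(letter)            # position of "A" in the lookup table: 65
bits = format(code_point, "08b")    # the same number written as eight binary digits
encoded = letter.encode("ascii")    # what would actually be stored or sent

print(bits)          # 01000001
print(len(encoded))  # 1 (a single byte)

# The receiver only needs the same lookup table to get the letter back.
decoded = chr(int(bits, 2))
print(decoded)       # A
```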

Jpg file

Since I want to send a clear picture of the letter “A,” I’ll start with an image space about 160 x 180 pixels. That’s 28,800 pixels for a really clear letter (see how nice it looks?). Then, depending on how aggressive I’m willing to be, I can boil that down anywhere from 2x to maybe 20x. Add overhead for arbitrary color (three colors, remember), and this one-byte letter is now thousands of bytes.
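Here is a back-of-the-envelope version of that arithmetic in Python. It is only a sketch under my own assumptions: three bytes per pixel (one per color), a flat compression ratio, and no file-header overhead. Real JPEG sizes depend heavily on the image and the quality setting.

```python
WIDTH, HEIGHT = 160, 180
BYTES_PER_PIXEL = 3  # assumption: one byte each for red, green, and blue

def estimated_size(width: int, height: int, compression_ratio: int) -> int:
    """Rough compressed size in bytes, ignoring headers and other overhead."""
    raw_bytes = width * height * BYTES_PER_PIXEL
    return raw_bytes // compression_ratio

pixels = WIDTH * HEIGHT
raw_bytes = pixels * BYTES_PER_PIXEL
print(pixels)     # 28800 pixels
print(raw_bytes)  # 86400 bytes before any compression

for ratio in (2, 20):
    print(ratio, estimated_size(WIDTH, HEIGHT, ratio))
# 2 43200
# 20 4320
```

Either way, the picture of the letter costs thousands of times more than the one-byte text version.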

What did I get in return for all this extra space? I can send any picture I want.

Speech codecs are like that too - they offer the same tradeoffs. I’ll talk about that next.

Footnote: Text coding is a slippery slope. The move to multilingual documentation has pushed a transition to double-byte coding. 256 choices are no longer enough; we now need 256^2, or 65,536 choices, to handle the possibilities of Chinese, Farsi, and all the rest. See what happened? By growing the number of choices, we grow the “codebook” and so make the coding more versatile but less efficient. This is a form of what is called, in its most general form, “vector coding.” Vector coding can be used for image compression, and then the codebook can dynamically adapt to the content of the image. It’s more efficient (fewer bits) but more complex (more calculation).
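To see the bigger codebook cost in action, here is a small Python sketch. The footnote doesn’t name a particular encoding, so I’m assuming Latin-1 as a stand-in for the 256-entry codebook and UTF-16 as a stand-in for the double-byte one (UTF-16 actually needs four bytes for some rarer characters, which this sketch ignores).

```python
single_byte_entries = 256
double_byte_entries = single_byte_entries ** 2
print(double_byte_entries)  # 65536

# The same letter "A" under each codebook (encodings are my stand-ins, see above).
a_single = "A".encode("latin-1")
a_double = "A".encode("utf-16-le")
print(len(a_single), len(a_double))  # 1 2

# A Chinese character fits the big codebook but not the small one.
zhong_double = "中".encode("utf-16-le")
print(len(zhong_double))  # 2

try:
    "中".encode("latin-1")
    fits_in_single_byte = True
except UnicodeEncodeError:
    fits_in_single_byte = False
print(fits_in_single_byte)  # False
```

So plain English text doubles in size just so that the same scheme can also carry everyone else’s characters: more versatile, less efficient.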

LIFE IS HD ALREADY. LET'S FINISH THE JOB!

The continued move toward bringing our tools and media together, including the sharpening of sound and picture fidelity to match our own senses, is revolutionizing human telecommunications. On this whiteboard, some insights and proposals on why it's happening and where it's going.

What is HD Voice?

What is UC?

Where do we find it?

Who's doing it?

How do open standards relate?

Why does it matter?

How do you pronounce "VoIP?"

Is it another gimmick?

Why's it taken so long?

Is it like HD Video?

Who owns “Telepresence?”

How does it work?

How do phones work?

Can I hook an Asterisk server to my hot tub?

...and any others that come to mind

WHO and WHY

I'm Jeffrey Rodman, co-founder and CTO of Polycom, for 26 years a world leader in telepresence, videoconferencing, and incidentally the company that also invented that extraordinary triangular speakerphone twenty years ago.

Through our whole multimedia history, we have been applying and extending the principles of transparency and excellence to give the absolute best and most reliable connections among people who have no choice but to connect at a distance. Those who need the best in telehuman communications, with no problems and no mistakes, come to Polycom.

Legal disclaimer: You already know what I'm going to say here. The contents of this blog are entirely my own creation, except where they come from elsewhere, in which case they're either attributed, quoted, referenced, accidentally non-attributed, or subconsciously linked without clear volition on my part. While it's my opinion that Polycom is doing, and has done, some great world-changing things, Polycom Corporation has no (zip, sifar, zero, sifuri, nada, صفر, ninguno, 零, cero, nul, нуль, 0, squat) responsibility for any missteps, slurs, accidentals, lies or damned lies that may exist within the boundaries of this blog. I will try to have none of the above except in the context of a musical or statistical discussion, but I am also not responsible for any results (especially bad ones) that may come from you taking my words too seriously. All the normal "hey, I'm not responsible for anything" words that you'll find in common disclaimers apply here too. Oh, and any really weaselly words you have ever heard drooling out of a lawyer, especially if they allowed someone to dodge some richly earned retribution, also apply here, as I have a particular fondness for well-crafted weaselly words. That's an art, really.
