Most extant audio compression studies seem to be listening tests by ears I do not trust, or approximate waveform commentaries by minds and eyes I similarly distrust. I have grown tired of ignoring these and of murmuring vague recollections of weak tests I once performed to personally validate ogg over mp3, and have thus recorded a lazy afternoon's tinkering, the intent being more to cultivate interest than to inculcate. Sections are organized as follows:

| orgh | motivation | sounds | waveform | composite | Zahlen | concussion |
More than anything, this quick excursion was to staunch doubts that I am not listening to my music in the best way possible, and also to prove what my ears gauge. I conclude by vanquishing long-standing doubts, proving ogg superiority irrefutably, sheepishly admitting to caveats, and solving world hunger by reinstating cannibalism.
(Note that all files are kept small to avoid bandwidth demolition)
Rather than intelligently selecting a sound sample which exercised the compression mechanisms, I used a convoluted two-month-long delirium of approximately one million archaic video games to determine which snapshot of glory should be disseminated for its historical importance--hence the source audio is the background music to the first boss (and others) of the Super Nintendo shooter classic Gradius 3.
The ogg and mp3 encoders were oggenc 1.0.1 and lame 3.96 (if I were a redundant, pretentious jerk, I would say 'respectively' here) respectively, with only default settings, for reasons having more to do with laziness than with any scientific sandbox principle.
Annoyingly, I noticed during the next step that both the ogg and wav had 647,680 samples (multiple of 128), whereas the mp3 snapped to the least greater multiple of 512 (647,480). I was not previously aware of any such limitation, and I doubt it's lame's fault (look it up for me, dear).
A cute way to study sound visually is by using a Fourier Transform, which deconstructs the wave into component (sine) frequencies. For a violin playing a single note, for instance, this method will prominently display a set reflecting the root pitch and the immediate overtone series (violins partially owe their sound to an incredible affinity to the harmonic series).
Basically any program providing such functionality uses the Fast Fourier Transform, which is the efficient decomposition method utilized by spectromatic, a short, convenient program which generates PNGs from wav input.
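To make the decomposition concrete, here is a minimal sketch of a naive discrete Fourier transform in Ruby. It is not the Fast Fourier Transform and has nothing to do with how spectromatic is implemented; the made-up "note" (a fundamental at bin 4 with weaker overtones at bins 8 and 12) and every name in it are assumptions chosen purely for illustration.

```ruby
# Naive discrete Fourier transform: O(n^2), fine for a toy signal.
# Returns the magnitude of each frequency bin up to the Nyquist limit.
def dft_magnitudes(samples)
  n = samples.length
  (0..n / 2).map do |k|
    re = 0.0
    im = 0.0
    samples.each_with_index do |x, t|
      angle = 2.0 * Math::PI * k * t / n
      re += x * Math.cos(angle)
      im -= x * Math.sin(angle)
    end
    Math.sqrt(re * re + im * im)
  end
end

# Hypothetical "note": a fundamental at bin 4 plus two weaker overtones
# at integer multiples of it (bins 8 and 12).
N = 64
signal = Array.new(N) do |t|
  phase = 2.0 * Math::PI * t / N
  1.0 * Math.sin(4 * phase) + 0.5 * Math.sin(8 * phase) + 0.25 * Math.sin(12 * phase)
end

mags = dft_magnitudes(signal)
# Keep only the bins that carry real energy; the rest is rounding noise.
peaks = mags.each_index.select { |k| mags[k] > 1.0 }
puts "peaks at bins: #{peaks.inspect}"
```

The bins that survive the threshold are the fundamental and its overtones, which is the "set reflecting the root pitch and the immediate overtone series" in miniature.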
Laziness demanded I use the author's suggested settings; the too-large-to-offer images are rather dark, and frequency is represented horizontally with time taking the vertical axis. The png for the wav was 1024x3259, so I selected two areas, each 96x160, and then did a simple nearest neighbor 2x scale. The first originates at (0,1547), where there is much activity; the other, at (361,393), is from a tamer part of the spectrum.
[Images: spectrogram crops in three columns (ogg, wav, mp3); first row is the busy region, second row the tamer region.]
mp3 seems to be screwing up pretty badly to me already, but let's not content ourselves with idle speculation.
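For the curious, the crop and nearest neighbor 2x scale described above amount to very little code. The following is a rough sketch on a toy image held as an array of rows; the helper names `crop` and `scale_nearest` are hypothetical, and the real work was done in an image editor, not with this script.

```ruby
# An image here is just an array of rows, each row an array of pixels.
def crop(image, x, y, width, height)
  image[y, height].map { |row| row[x, width] }
end

# Nearest neighbor upscale by an integer factor: every source pixel
# becomes a factor-by-factor block of identical pixels.
def scale_nearest(image, factor)
  image.flat_map do |row|
    wide = row.flat_map { |px| [px] * factor }
    Array.new(factor) { wide.dup }
  end
end

# Toy 4x3 "image" with one number per pixel.
image = [
  [1,  2,  3,  4],
  [5,  6,  7,  8],
  [9, 10, 11, 12]
]

region = crop(image, 1, 1, 2, 2)   # 2x2 area whose origin is (1,1)
scaled = scale_nearest(region, 2)  # 4x4 result
scaled.each { |row| puts row.join(" ") }
```

Nearest neighbor invents no new pixel values, which is why it suits this sort of inspection: what you see at 2x is exactly what the spectrogram contained.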
So now the (short-lived) fun starts. I used the gimp to place a copy of the source audio's png under that of the ogg and mp3, and then set the compositing method to difference. I then cropped out the same areas used above, which appear here in the same order:
[Images: difference composites in two columns (ogg - wav, mp3 - wav); first row is the busy region, second row the tamer region.]
The first pair implicates mp3 rather clearly as the victim (tweak your screen if you can't see anything--I'd rather not put things more out of context by enhancing it), with the ogg exhibiting only telltale noise. The second set is basically black, so I used the gimp to stretch their brightness and contrast:
[Images: contrast-stretched difference composites of the tamer region in two columns (ogg - wav, mp3 - wav).]
The main purpose of this final mutilation is to point out that ogg's aberration is much more structured, whereas mp3 has a hint of the same shape, but is mostly noise.
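The two image operations used in this section can be sketched in a few lines. This assumes that "difference" compositing is the per-channel absolute difference, and it substitutes a plain linear stretch for the gimp's brightness and contrast tool, so treat it as an approximation of the idea and not a reproduction of the images. The pixel values are invented.

```ruby
# Per-channel absolute difference of two same-sized RGB images,
# the arithmetic behind a "difference" layer mode.
def difference(a, b)
  a.zip(b).map do |row_a, row_b|
    row_a.zip(row_b).map do |pa, pb|
      pa.zip(pb).map { |ca, cb| (ca - cb).abs }
    end
  end
end

# Linear stretch (an assumed stand-in for brightness/contrast):
# scale everything so the brightest channel value becomes 255.
def stretch(image)
  peak = image.flatten.max
  return image if peak.zero?
  image.map { |row| row.map { |px| px.map { |c| (c * 255.0 / peak).round } } }
end

# Invented one-row images, each pixel an [r, g, b] triple.
wav_px = [[[10, 10, 10], [200, 100, 50]]]
ogg_px = [[[12,  9, 10], [198, 100, 53]]]

diff = difference(ogg_px, wav_px)
puts diff.inspect
puts stretch(diff).inspect
```

A nearly black difference image, as in the second pair above, is simply one where every channel value is small; stretching makes the structure visible without changing its shape.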
Idiotically placed as the final bit of evidence, I wrote a 30-line Ruby script (sorry about the length, I felt like having error handling) to try to somehow quantify the inaccuracy. The verdict already being clear, and not wanting to waste too much more time, I used a simple loop to accumulate the integer difference of the red, green, and blue values at each pixel for both images provided as input.
There are a number of problems with this approach. Firstly, since it lacks any recognition of patterns, it incorrectly measures drift as complete failure--for instance, were the source image alternating white and black rows, and the algorithm to manage a perfect snapshot, albeit shifted down one pixel, it would rank as the complete opposite. Another error is that the distribution is not uniform--a ridiculously disproportionate number of images receive a close rating rather than a far one. The solutions to the first are complicated, and to the second trivial, but unfortunately I haven't found the key to the third problem: my aforementioned laziness. Luckily, the distribution is monotonic and decreasing, so there can be at least a modicum of faith in the result.
For example, here is the result of comparing the image against itself and its negation. Standard output receives a running calculation for the duration of the loop; provided here is the final state:
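The actual script is not reproduced here, so the following is only a guess at its core: a minimal sketch that sums absolute per-channel differences over in-memory pixel arrays and normalizes against the worst case of 255 per channel. The normalization, the method names, and the sample pixels are all assumptions; the real script reads PNG files and may scale its percentage differently.

```ruby
MAX_CHANNEL = 255

# Sum of absolute red, green, and blue differences over every pixel.
def total_difference(a, b)
  total = 0
  a.each_with_index do |row, y|
    row.each_with_index do |px, x|
      px.each_with_index { |c, i| total += (c - b[y][x][i]).abs }
    end
  end
  total
end

# Assumed normalization: 100% means identical, 0% means every
# channel of every pixel is as far away as it can be.
def similarity_percent(a, b)
  pixels = a.length * a.first.length
  worst = pixels * 3 * MAX_CHANNEL
  100.0 * (1.0 - total_difference(a, b).to_f / worst)
end

# Invented one-row images, each pixel an [r, g, b] triple.
image   = [[[0, 0, 0], [255, 255, 255]]]
inverse = [[[255, 255, 255], [0, 0, 0]]]
close   = [[[0, 0, 3], [255, 250, 255]]]

puts similarity_percent(image, image)
puts similarity_percent(image, inverse)
puts similarity_percent(image, close)
```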
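The drift problem is easy to reproduce. Below is a small sketch of the alternating-rows case using one brightness value per pixel; the image size is arbitrary, and wrapping the last row around to the top is an assumption made only to keep both images the same height.

```ruby
WHITE = 255
BLACK = 0

# Alternating white and black rows, 4 pixels wide and 6 rows tall.
source = Array.new(6) { |y| Array.new(4, y.even? ? WHITE : BLACK) }

# The "perfect but drifted" copy: the same rows shifted down one pixel
# (the last row wraps to the top so the sizes still match).
shifted = source.rotate(-1)

# Accumulate the absolute difference at each pixel.
total = source.zip(shifted).sum do |row_a, row_b|
  row_a.zip(row_b).sum { |a, b| (a - b).abs }
end
worst = source.length * source.first.length * WHITE

puts "difference: #{total} of a possible #{worst}"
```

Every white pixel lands on a black one and vice versa, so a one-pixel drift scores exactly as badly as a photographic negative would.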
> ./img_cmp.rb gradius3_bm1.png gradius3_bm1.png
3259/3259 [0 | 100.0%]
> ./img_cmp.rb gradius3_bm1.png gradius3_bm1_inv.png
3259/3259 [208191465154 | 68.2689790383142%]

(The first value increments to reflect progress; the second is the image height; the third is the total difference between corresponding pixels; finally comes a percentage representing how this value corresponds to the total possible distance.)
Now to actually use it:
> ./img_cmp.rb gradius3_bm1.png gradius3_bm1_ogg.png
3259/3259 [974550168 | 99.851466188654%]
> ./img_cmp.rb gradius3_bm1.png gradius3_bm1_mp3.png
3259/3259 [1242699085 | 99.8105968912509%]

So the mp3 is approximately 28% less accurate than the ogg (its total difference is about 1.28 times as large), though the ogg is 26% smaller and the bitrate is (ironically) 28% lower. What's nice, however, is how close they both are to lossless.
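The 28% figure follows directly from the two totals printed by the script. A quick check of the arithmetic, using only those numbers (the variable names are mine):

```ruby
# Total differences reported against the wav's spectrogram.
ogg_diff = 974_550_168
mp3_diff = 1_242_699_085

# How much more difference the mp3 accumulates, relative to the ogg.
ratio = mp3_diff.to_f / ogg_diff
extra_percent = (ratio - 1.0) * 100.0

puts format("mp3 total is %.3f times the ogg total (%.1f%% more)", ratio, extra_percent)
```

Note that this compares distances from the original, not the similarity percentages themselves, which differ by only a few hundredths of a point.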
Just for fun, let's see how similar the ogg and mp3 are:
> ./img_cmp.rb gradius3_bm1_ogg.png gradius3_bm1_mp3.png
3259/3259 [1476131671 | 99.7750188031961%]
Limitations with these tests:
Hopefully I'll revisit this eventually, as the presented material is no more than a slightly elaborated stream of consciousness. I can't say what will cause this event; maybe a new medium, some new gadget, a new format, etc. When I do come back, I will most likely simply correct the script, extending it to automate encoding of different tracks to various formats at differing bitrates, and pump out a table. Now go play Gradius.