Lossless and lossy audio formats for music.

This page is now pretty old.

You might be interested in the more-recent audio comparison article:

Big audio formats comparison, 2006.

It tests six lossless formats, and five lossy formats.

An article in Personal Computer World magazine by Gordon Laing [no longer online] got me thinking. It argued that creating a huge collection of MP3 music files is folly, because MP3 is a lossy format that has already been left behind technologically, and that the wise choice is to build a collection using the lossless Wave (WAV) format.

Which is the best lossless format?

Lossless? Lossy? What?

Most people think that MP3 just means 'music file'. In fact, MP3 means MPEG Audio Layer 3, and is only one way of converting music into digital files. There are many audio formats, and almost all of them compress the audio data so that it takes up less digital space: that is, less room on your hard drive, or less space on your portable music player.

Audio compression comes in two forms: lossless compression, and lossy compression.

The MP3 format is one that uses lossy compression. This means that it loses some of the audio information found in the original to make the compressed file much smaller. The information that lossy compression loses is the information deemed least important to the file. In music, this tends to be the very high and very low frequencies that are not considered to add as much to the music as the range of frequencies in between.

Many audio formats use lossless compression. This means that they retain every bit of information that is found in the original, so nothing is lost at all. Because of this, lossless compression cannot make the compressed file as small as it would be using lossy compression. However, lossless compression means that you get a smaller file without losing any information, and so is the only method that can be used when absolute fidelity is required.
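To make "retain every bit" concrete, here is a minimal Python sketch of a lossless round trip. It uses the general-purpose zlib compressor from the Python standard library as a stand-in, not an audio codec such as FLAC, and the test tone is made up for the illustration. The point is only that the restored data is byte-for-byte identical to the original while the compressed copy is smaller.

```python
import math
import struct
import zlib

# Illustration only: one second of a synthetic 441 Hz tone, 16-bit mono at 44.1 kHz.
# One period is exactly 100 samples, so it is built once and repeated 441 times.
period = [int(20000 * math.sin(2 * math.pi * n / 100)) for n in range(100)]
samples = period * 441
pcm = struct.pack("<%dh" % len(samples), *samples)  # little-endian 16-bit PCM

packed = zlib.compress(pcm, 9)       # lossless compression (zlib, not FLAC)
restored = zlib.decompress(packed)   # decompression

print("original bytes:  ", len(pcm))
print("compressed bytes:", len(packed))
print("identical after round trip:", restored == pcm)
```

Real music is far less repetitive than this tone, so a real lossless codec will not shrink it anywhere near as dramatically.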

What's wrong with MP3 as a format?

Nothing. But MP3 is already being replaced by other lossy formats that claim to offer better sound quality while creating smaller files.

If you create a collection of your music in digital form, you may end up with a huge number of files in MP3 format, and then realise that nobody uses MP3 files anymore. You would then have to start again and rip all your music again in the latest format.

The advantage to creating your collection using a lossless compression format is that each file will be identical, in terms of information, to the original. The music stored in a lossless audio file will be exactly the same as the music stored on the CD (or other audio source) you created the file from.

Why don't people always use lossless audio formats?

Even though lossless audio is a perfect copy of the original, a file created with lossless compression will not be as small as a file created with lossy compression. So if you have a limited amount of data storage on, for example, your portable music player, smaller files will mean you can fit more files on your player storage. That is, using lossy compression may reduce the quality of the music slightly, but it allows you to take a greater number of music tracks (or albums) on the bus to work.

When is lossless audio useful?

Lossless audio files are great for archiving your music. A while ago this would have been possible only for a small number of audio tracks, or for professionals who could afford a lot of storage devices.

However, hard drives are now so large that it is possible to store your entire CD, tape, and vinyl collections in the form of lossless audio, and still have space left to run your operating system, word processor, and games.

Because lossless audio files are an exact copy of the original source (in terms of the information, that is, the music, that they hold), you can use software to process any of those lossless files into a smaller, lossy copy. So if the MP3 format stops being the standard, you can just delete all your MP3s, and use software to create lossy copies in a different format, using the archive of lossless files you have built up.

Can't I just convert my MP3 files to another format if necessary?

Yes, you can convert MP3 files into any other format that you can find software for. But because MP3s are created with lossy compression, the information they contain about the music is not a perfect copy of the original. So you would be working from an imperfect source. Even if the format you were converting to allowed better audio quality than MP3, your converted files would not be able to make use of this extra quality, because you would be working from an MP3 file. Conversion and compression can only ever make quality stay the same or get worse; they can never make quality improve.
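The idea that conversion can never restore lost quality can be shown with a toy model. The sketch below is not how MP3 works; it simply throws away the low bits of each 16-bit sample as a stand-in for a lossy step. The function names and sample values are invented for the illustration.

```python
def lossy_encode(samples, keep_bits):
    """Toy lossy step: keep only the top keep_bits of each 16-bit sample."""
    shift = 16 - keep_bits
    return [s >> shift for s in samples]


def decode(codes, keep_bits):
    """Scale the kept bits back up to the 16-bit range; the dropped bits are gone."""
    shift = 16 - keep_bits
    return [c << shift for c in codes]


def total_error(reference, candidate):
    """Sum of absolute differences between two sample lists."""
    return sum(abs(a - b) for a, b in zip(reference, candidate))


original = [12345, -2001, 777, 30000, -15]

# First generation: a lossy copy made from the original.
first = decode(lossy_encode(original, 8), 8)

# "Converting" the lossy copy into a perfect container changes nothing:
# the copy is stored exactly, but it is still an exact copy of a damaged source.
converted = list(first)

# Second generation: a different lossy step applied to the lossy copy.
second = decode(lossy_encode(first, 6), 6)

print("error after first lossy copy: ", total_error(original, first))
print("error after lossless convert: ", total_error(original, converted))
print("error after second lossy copy:", total_error(original, second))
```

In this toy model the error never shrinks from one step to the next, which is the behaviour described above: the same or worse, never better.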

The only way to get better quality would be to delete all your MP3 files and start all over again, creating new files from the original audio source, be it CD or vinyl or whatever.

Keeping an archive of your music in a lossless audio format would mean that you could batch-process those lossless files to produce a collection of music files in lossy format suitable for portable players. Using a lossless audio file is as good as using the actual source; but creating audio files automatically (using suitable software) from an entire archive of existing lossless audio files is a lot quicker than copying from the original source (e.g. ripping from CD) one source at a time.

What lossless audio formats are available?

The original lossless audio format was, logically, the original digital audio format: Wave format - the WAV file. Most people who have been using PCs for a while will have come across WAV files. They are a lossless audio format. In fact, typically WAV files do not even compress the data digitally, so the files are enormous. This has made them fairly useless for any serious quantity of music, until recently.

Gordon Laing (of PCW) rightly points out that with hard drives being as huge as they are these days, most people could in fact store their whole music collection on a new hard drive, even using uncompressed Wave format.

But it seems a shame to use an uncompressed format when there are so many lossless compressed formats. Even if you have got a huge hard drive, it makes sense to create the smallest files you can. Transferring files from one drive to another will take less time, for instance, if the total amount of data is smaller. And why waste drive space you don't have to?

One lossless audio format that is designed to compress the data is FLAC (Free Lossless Audio Codec). This audio format seems to have quite a lot of support, both software and hardware, and is already being used by several artists and music distributors as a way of offering high-fidelity music files.

Another lossless compressed format that has fans is Monkey's Audio. There seems to be no hardware support for this format, but several software applications support plug-ins that allow Monkey's Audio files to play.

Both FLAC and Monkey's Audio are open source, and free for anyone to use. Both offer tagging (metadata), like the ID3v1 or ID3v2 tags that MP3 files offer, to store useful information about the music in each track, such as track number, track name, artist, album, and year of release.

I decided to rip a few albums to see how FLAC and Monkey's Audio compared: against each other, against lossy formats, and against the uncompressed Wave format.

Test method.

All of the results below refer to ripping each CD on the same setup:

All ripping was done using the superb dBpowerAMP Music Converter, where the only settings that changed between each CD were the compression type and compression settings.

Originally, FLAC and Monkey's Audio files were going to be created using the maximum compression levels available in dMC (dBpowerAMP Music Converter). However, FLAC produced a ridiculous result using the 'high' setting: it took 53 minutes 59 seconds to rip one album, and produced a file bigger than the one produced by the 'medium' setting. Consequently, FLAC was set to 'medium', and Monkey's Audio was set to 'normal'.

It should be noted, though, that Monkey's Audio produced a sensible result in the 'extra high' setting, and also offers a 'high' setting. The 'normal' setting was chosen, though, to make the results comparable to the FLAC results.

For comparison with lossy audio formats, each CD was ripped to MP3 (using the LAME codec at 256 kbit/s constant bit rate encoding) and to the open source Ogg Vorbis format (using 256 kbit/s variable bit rate encoding). Both were set for 16-bit 44.1 kHz stereo.

To allow a comparison with uncompressed audio, each CD was also ripped to Wave format (16-bit 44.1 kHz stereo, considered CD quality).
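These settings fix the data rate, so the expected file sizes can be estimated from running time alone. The sketch below is a rough calculation, assuming that "256 kbit" means 256,000 bits per second and ignoring file headers and tags. The function names are invented for the illustration. The running time used is the 1h13m07s listed for Stankonia in this article.

```python
def pcm_bytes(seconds, sample_rate=44100, bits=16, channels=2):
    """Bytes of uncompressed PCM audio for a given duration."""
    return seconds * sample_rate * (bits // 8) * channels


def cbr_bytes(seconds, kbit_per_s=256):
    """Bytes of constant bit rate audio, assuming 1 kbit = 1000 bits."""
    return seconds * kbit_per_s * 1000 // 8


stankonia_seconds = 1 * 3600 + 13 * 60 + 7   # 1h13m07s

print("CD-quality PCM bytes per second:", pcm_bytes(1))           # 176400
print("estimated Wave size:", pcm_bytes(stankonia_seconds))       # 773866800
print("estimated 256 kbit/s size:", cbr_bytes(stankonia_seconds)) # 140384000
```

Both estimates land within about half a percent of the measured Stankonia sizes for Wave and MP3 in the results. The small gap is plausibly down to rounding in the quoted running time plus file headers and tags.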

Album selection.

Six albums from my collection were ripped. It is not a large sample size, nor is it representative of any typical music collection, as everyone has different tastes when it comes to music. However, the music range is different enough to offer some understanding of the features of each type of audio format.

Stankonia by Outkast.
A hip-hop album full of rapping, funky tunes, and several non-musical interludes (BREAK!). 24 tracks, 1h13m07s total time.
Elephant by The White Stripes.
A rock album full of energy and lo-fi, grinding guitar. 14 tracks, 49m47s total time.
as heard on radio soulwax pt.2 by 2 many dj's.
A non-stop mix disc of indescribably varied music that never stops for breath. 30 tracks, 1h01m11s total time.
Resident Evil soundtrack by various artists, from the movie Resident Evil (2002).
Mostly dark, grinding, heavy guitar. The last four tracks are the score of the movie itself, by Marilyn Manson, without vocals, but full of wrenching ambient effects and inhuman guitar. 20 tracks, 1h12m27s total time.
Loud by Timo Maas.
House music, bouncing bass, bouncy rhythms, smooth vocal samples. 14 tracks, 1h05m38s total time.
Hail to the Thief by Radiohead.
Thoughtful tunes that glide from soft to panicked, quiet to soaring, guitars always present. 14 tracks, 56m29s total time.

Results.

Stankonia
Format.           Output size, bytes.   Ripping time, minutes seconds.
Wave              775,789,440           5m27s
FLAC              513,904,545           6m58s
Monkey's Audio    495,419,947           7m18s
MP3               140,795,904           11m53s
Ogg Vorbis        133,066,417           13m30s

Elephant
Format.           Output size, bytes.   Ripping time, minutes seconds.
Wave              528,172,792           4m19s
FLAC              322,829,357           5m23s
Monkey's Audio    308,940,151           5m43s
MP3               95,848,448            8m52s
Ogg Vorbis        80,526,633            9m33s

as heard on radio soulwax pt.2
Format.           Output size, bytes.   Ripping time, minutes seconds.
Wave              650,094,120           4m35s
FLAC              484,558,590           5m49s
Monkey's Audio    467,737,164           6m11s
MP3               119,859,491           10m08s
Ogg Vorbis        128,357,420           11m25s

Resident Evil soundtrack
Format.           Output size, bytes.   Ripping time, minutes seconds.
Wave              768,509,824           5m32s
FLAC              537,464,226           7m02s
Monkey's Audio    519,937,208           7m24s
MP3               139,468,800           11m58s
Ogg Vorbis        137,557,220           13m13s

Loud
Format.           Output size, bytes.   Ripping time, minutes seconds.
Wave              696,204,376           5m03s
FLAC              432,831,615           6m22s
Monkey's Audio    412,589,914           6m43s
MP3               126,334,976           10m50s
Ogg Vorbis        122,821,818           12m20s

Hail to the Thief
Format.           Output size, bytes.   Ripping time, minutes seconds.
Wave              599,019,736           4m36s
FLAC              391,721,051           5m50s
Monkey's Audio    371,117,694           6m02s
MP3               108,701,696           9m41s
Ogg Vorbis        107,228,378           10m39s

Evaluation of results.

In every case (but one), size goes down and time goes up as we step through the sequence: Wave, FLAC, Monkey's Audio, MP3, Ogg Vorbis. The only exception is that Ogg Vorbis actually produced a larger file than MP3 for the album by 2 many dj's. This is possibly because that album is a mix disc that does not slow down once, so there is less chance to benefit from Ogg Vorbis' variable rate encoding (which uses fewer bits to describe less energetic music).

Building a spreadsheet (using OpenOffice.org 1.1.0) with all the data to show averages, we get the following:

Format.           Average of encoded album size (as % of Wave size).
Wave              100%
FLAC              66.57%
Monkey's Audio    63.86%
MP3               18.20%
Ogg Vorbis        17.60%
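The averages in this table can be reproduced without a spreadsheet. The Python sketch below assumes that each figure is the mean of the six per-album ratios (encoded size divided by Wave size), which is the reading that matches the published percentages. The byte counts are copied from the results above, in the same album order.

```python
# Output sizes in bytes, one entry per album, in the order the albums are listed.
sizes = {
    "Wave": [775789440, 528172792, 650094120, 768509824, 696204376, 599019736],
    "FLAC": [513904545, 322829357, 484558590, 537464226, 432831615, 391721051],
    "Monkey's Audio": [495419947, 308940151, 467737164, 519937208, 412589914, 371117694],
    "MP3": [140795904, 95848448, 119859491, 139468800, 126334976, 108701696],
    "Ogg Vorbis": [133066417, 80526633, 128357420, 137557220, 122821818, 107228378],
}


def average_percent_of_wave(fmt):
    """Mean of per-album (encoded size / Wave size), expressed as a percentage."""
    ratios = [enc / wav for enc, wav in zip(sizes[fmt], sizes["Wave"])]
    return 100 * sum(ratios) / len(ratios)


for fmt in sizes:
    print(f"{fmt:15s} {average_percent_of_wave(fmt):6.2f}%")
```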

Clearly, the lossless FLAC and Monkey's Audio cannot compress music anywhere near as small as the lossy formats MP3 and Ogg Vorbis. Furthermore, the MP3 and Ogg Vorbis files created in this test were 256Kbit/sec, and most people seem to create MP3 files using 128Kbit/sec or 160Kbit/sec, which would give even smaller files. For portable storage, lossy formats are very clearly going to give you much more music to carry around at any one time.

But the question here is one of lossless audio, for archiving your music collection. One of the benefits of ripping to a lossless format seems to be that doing so will take much less time than ripping to a lossy format. Looking at average times, we see:

Format.           Average of album ripping time (as multiple of Wave time).
Wave              1.000
FLAC              1.266
Monkey's Audio    1.332
MP3               2.143
Ogg Vorbis        2.388
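The timing averages can be checked the same way. This sketch parses the "5m27s" style times from the results into seconds, and again assumes that each figure is the mean of the six per-album ratios against the Wave time. The helper names are invented for the illustration.

```python
import re

# Ripping times per album, in the order the albums are listed in the results.
times = {
    "Wave": ["5m27s", "4m19s", "4m35s", "5m32s", "5m03s", "4m36s"],
    "FLAC": ["6m58s", "5m23s", "5m49s", "7m02s", "6m22s", "5m50s"],
    "Monkey's Audio": ["7m18s", "5m43s", "6m11s", "7m24s", "6m43s", "6m02s"],
    "MP3": ["11m53s", "8m52s", "10m08s", "11m58s", "10m50s", "9m41s"],
    "Ogg Vorbis": ["13m30s", "9m33s", "11m25s", "13m13s", "12m20s", "10m39s"],
}


def to_seconds(text):
    """Convert a string such as '5m27s' into a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


def average_multiple_of_wave(fmt):
    """Mean of per-album (ripping time / Wave ripping time)."""
    wave = [to_seconds(t) for t in times["Wave"]]
    other = [to_seconds(t) for t in times[fmt]]
    return sum(o / w for o, w in zip(other, wave)) / len(wave)


for fmt in times:
    print(f"{fmt:15s} {average_multiple_of_wave(fmt):.3f}")
```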

Ripping CDs to a lossy format will take at least 50% longer than ripping to a lossless format. So archiving your CD collection in a lossless format will take less time than doing so in a lossy format. If you decide you need your collection in a lossy format too, for a portable player for instance, then you can use software (such as dMC) to mass-process your lossless audio archive, giving you a second archive in the lossy format of your choice. Doing this will be much quicker than ripping CDs one after the other, especially as the software can just change directories automatically to get through the entire collection, rather than you manually switching CD for each album. (Use the dMC File Selector to convert files from several directories in one go.)

For instance, using dMC to process all of the FLAC files from the album Stankonia, to produce MP3 files using LAME 256Kbit constant rate, took 8m46s. The MP3 files produced had all the correct tag data. At the end of the process was a collection of the lossless FLAC files that form the archive, and a set of lossy MP3 files that are suitable for portable music players.

If we add the 6m58s that it took to rip the original CD to FLAC files to the 8m46s that it took to process those FLAC files into a set of MP3 files, we get a total time of 15m44s (not including any CD-inserting/removing time). This is 3m51s longer than the 11m53s that it took to rip the original CD straight to MP3 files (again not including any CD-inserting/removing time). So, for about a third more time than ripping to MP3 format alone, we could add both FLAC format and MP3 format to our library.
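The arithmetic in this comparison is easy to check in a few lines of Python. The three times are the Stankonia figures quoted in this article. The variable names are invented for the illustration.

```python
def seconds(minutes, secs):
    """Convert minutes and seconds into a total number of seconds."""
    return minutes * 60 + secs


def as_text(total):
    """Format a number of seconds as, for example, '15m44s'."""
    return f"{total // 60}m{total % 60:02d}s"


rip_cd_to_flac = seconds(6, 58)    # ripping the CD to FLAC
flac_to_mp3 = seconds(8, 46)       # converting the FLAC files to MP3
rip_cd_to_mp3 = seconds(11, 53)    # ripping the CD straight to MP3

combined = rip_cd_to_flac + flac_to_mp3
extra = combined - rip_cd_to_mp3
extra_percent = 100 * extra / rip_cd_to_mp3

print("FLAC then MP3:", as_text(combined))
print("extra time over MP3 alone:", as_text(extra))
print(f"extra time as a percentage: {extra_percent:.1f}%")
```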

Click this link for a PDF version of the audio comparison data spreadsheet. The spreadsheet does offer some dubious statistical calculations to get a 99% confidence interval of album file size, and album ripping time. However, it has been years since I did statistics, and I may have used the calculation wrongly. Furthermore, the sample size is so small, and so unrepresentative of any 'population' of music that these figures can only be considered a barely-better-than-arbitrary estimate of compression performance range.

Conclusion.

So Gordon Laing is a wise man. It definitely makes sense, in my opinion, to rip your CDs to a lossless format if you have a big hard drive, and if you don't want to rip your CDs all over again each time a new format is necessary.

The question still remains though: which lossless format is best for ripping all your CDs to, to create an archive?

As usual, there is no answer. It's a matter of personal preference. Using the uncompressed Wave format is quickest, but will produce bigger files. Using FLAC will take about a quarter longer, but will reduce the size of files by about a third. Using Monkey's Audio will take about a third longer than Wave, but will reduce the size of the files by a bit more than FLAC. It all depends on which you value more: time or space.

Of course, just because ripping to uncompressed Wave format saves you more time at first, does not mean it will necessarily save you time in the long run. If you have to transfer all of your files from one machine to another, or from one drive to another, the larger size of the Wave files will mean they take longer to transfer. This may become a serious point if you do a lot of transferring.

In my eyes, FLAC offers the best mix of performance and features. The difference in compression size between it and Monkey's Audio is not huge. Even comparing FLAC's 'medium' setting (used for the tests above) to Monkey's 'extra high' setting (not used for the above tests) does not produce a difference of more than a few percent. Also, FLAC is very much open source, using an OSI-approved licence, so it should remain free and available without restriction. And FLAC seems to have been implemented in hardware devices (such as the Squeezebox, the Rio Karma, and the PhatBox) whereas Monkey's Audio has not. This could mean that you decide you don't need lossy audio formats at all, and simply keep all of your music in FLAC format.

As for actual sound quality, I found little difference between any of the formats used in this test. With my speakers, with my ears, each format sounded identical. That's not a reason not to use a lossless format for your archive, but it does mean there's no need to rush away from MP3 just yet. But if you are about to start ripping your prized music collection to digital form, you may as well use a lossless format, so you know that the file represents the original exactly.

Footnote: Evidence that FLAC is lossless.

Some people have expressed a preference for Wave format because they aren't convinced that any compression can really be lossless. (This belief is understandable, but very much incorrect.) To provide evidence that FLAC really is lossless, I ripped several varied tracks to Wave format, then converted the Wave files to FLAC format, then converted the FLAC files to Wave format, and compared the before and after Wave audio files.

For variety, the tracks came from four genres: metal, hip-hop, jungle, and a melodic vocal track which I can't fit into any convenient genre. The tracks are:

For file comparison, I used an MD5 checksum utility called MD5summer by Luke Pascoe. This reads through all the bits of the file and uses them to calculate a fixed-length hash string. A good hash algorithm, such as MD5, should rarely-if-ever produce the same hash string for two different files of the same size. In fact, there are more than ten-to-the-power-thirty-eight possible MD5 hash string results, so the chances of any two files having the same MD5 hash result are pretty tiny. Changing even a single bit of the target file will change the MD5 checksum quite significantly. So the MD5 checksum is a fairly safe way of checking that two files are digitally the same, bit-for-bit.

To examine the result of lossless compression, each wave file was examined with MD5summer and the checksum noted in the table, in the "MD5 of original" column. Then the wave file was converted to FLAC format. Then the FLAC file was converted to a wave file with the same quality settings as the original. The restored wave file was then examined with MD5summer and the checksum noted in the table, in the "MD5 of restored" column. If the FLAC compression really is lossless, then the restored wave file will be exactly the same as the original wave file, and the checksums will match up exactly.
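The comparison step can also be scripted. The sketch below uses Python's standard hashlib module rather than MD5summer, and it covers only the checksum comparison, not the Wave-to-FLAC conversion itself. The temporary files and their contents are stand-ins invented for the illustration: two identical files play the part of the original and restored Wave files, and a third file with one bit flipped shows how a tiny change alters the checksum.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large audio files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_temp(data):
    """Write bytes to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Stand-in data, not real audio.
original_data = bytes(range(256)) * 1000
damaged_data = bytearray(original_data)
damaged_data[0] ^= 0x01  # flip a single bit

original_path = write_temp(original_data)
restored_path = write_temp(original_data)
damaged_path = write_temp(bytes(damaged_data))

same = md5_of_file(original_path) == md5_of_file(restored_path)
different = md5_of_file(original_path) != md5_of_file(damaged_path)
print("original matches restored:", same)
print("one flipped bit changes the checksum:", different)

for path in (original_path, restored_path, damaged_path):
    os.remove(path)
```

To repeat the test described here, point md5_of_file at the original Wave file and at the Wave file decoded from FLAC, and compare the two strings.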

The results of the process were:

Track title.                              Size (bytes).   MD5 of original.                   MD5 of restored.
I, Zombie                                 37,338,044      c6ee718f28d63738283367950000bcbb   c6ee718f28d63738283367950000bcbb
More Human Than Human                     47,409,308      0ef7f1f8b806147bb99f2625b1038df9   0ef7f1f8b806147bb99f2625b1038df9
Woo Hah!! Got You All In Check            47,940,860      80a336fa2c94c47831e258b8de821c0c   80a336fa2c94c47831e258b8de821c0c
Put Your Hands Where My Eyes Could See    34,473,308      45e73f70d36149187509a68c93ba89b6   45e73f70d36149187509a68c93ba89b6
Original Nuttah                           42,535,964      9444b043e5fee2321de80ed5979c7bda   9444b043e5fee2321de80ed5979c7bda
Open Heart Zoo                            56,640,908      7c7ce13f20840dad50706b0158fb1afc   7c7ce13f20840dad50706b0158fb1afc

For each track, the MD5 checksum of the Wave file converted from FLAC matches exactly the MD5 checksum of the original Wave file. The file sizes match exactly, too. This is very strong evidence that FLAC really is lossless. You can convert from Wave to FLAC, and from FLAC to Wave, with no loss of digital fidelity.

Update

August 2009: It's been over five years since I first spoke to Gordon Laing, and since then he's been won over by the benefits of FLAC over uncompressed Wave. Take a look at his new Camera Labs - PC hardware pages which have a strong focus on audio and media hardware that lets you get serious sound quality out of your PC and digital music collection.