This sort of thing happens quite a lot to me:
That’s output from a FLAC-checking program called AudioTester. It means that an audio file is now corrupt or damaged in some way -and I can promise you that it didn’t start life that way, since my ripping program is set to verify its output before claiming it’s worked. Somewhere along the line, for whatever reason, a bit of “bitrot” appears to have crept in -and, as I say, that’s happened to quite a few of my 40,000+ FLAC files over the years.
My usual response in the past has been to re-rip the affected tracks, of course… but finding the right CD amongst so many candidates can be a right pain. So I wondered whether there was an alternative, and naturally there is. It’s called ‘parity information’: at the time of ripping the CD, you also generate a parity file that can be used to repair files if they are later found to be defective. It’s a bit like RAID-5, but at the file level, rather than the storage device level.
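If the RAID-5 comparison is a bit abstract, here is a minimal Python sketch of the simplest kind of parity, a plain XOR across equal-sized blocks. To be clear, this is not MultiPar's actual algorithm, and the block contents and the `xor_blocks` name are made up for illustration. It only shows the principle: keep one extra block and you can rebuild any single block that goes missing.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three made-up "data blocks" of equal size (illustrative only).
data = [b"Schu", b"bert", b"FLAC"]

# The parity block is simply the XOR of all the data blocks.
parity = xor_blocks(data)

# Pretend block 1 has rotted away: XOR-ing the survivors with the
# parity block gives the missing block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Simple XOR parity like this can only cope with one lost block per set. A tool that lets you choose a redundancy percentage has to use a more capable scheme, but the idea of computing missing data from what survives is the same.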
To generate the parity information, I use a program called MultiPar. It’s only a 500KB or so zip download. (Its official download page takes you to pages written in Japanese, and although there are instructions on how to click in the right place, it’s all a bit weird and I prefer having my own copy available from the first link, should I need it in future. Happily, it’s GPL licensed, so I can redistribute it without drama.)
The help and documentation for MultiPar is not that great, so let me give you a quick walk-through. Here’s a set of music tracks I ripped earlier:
Now, I’ve tested all those FLACs with AudioTester, so I know they’re good and I want to be able to keep them in that state. So, I download the MultiPar zip file and unzip it somewhere; then I can launch the MultiPar.exe executable (in other words, there’s no installation routine to go through; just unzip and launch):
You click [Browse] to point it to the directory containing all the ripped files. Notice it’s found the ‘folder.jpg’, not just the FLAC files: this thing computes parity data for any files it’s pointed at, not just audio files.
Then you use the Redundancy slider to set how much parity data you want to collect. In my case, I’ve set it to about 10% -meaning that the parity file will be about 1/10th the size of the original FLAC files. This also affects how much damage can subsequently be repaired: if the redundancy is 50%, then you’ll be able to cope with about half your data being corrupted. There’s no ‘right’ answer, just a trade-off between file size and anticipated damage amounts. In my case, my FLACs only usually get corrupted in one or two blocks at most, so 10% is probably ample.
Next, I click the [Create] button, and MultiPar scans all the files in the directory and generates parity data for them:
At the end of the process, you acquire a couple of new files in the chosen directory:
You’ll notice one of them is very small (just 21KB in this case), and the other is quite big: about 8.6MB here. The first just describes what files have been scanned to create the parity file; the second is the parity file itself. You’ll also note that 8,674KB is just about 10% of the sum total of the 6 FLAC and 1 JPG files that have been scanned (84,584KB by my maths). That’s the result of me selecting 10% redundancy earlier.
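The sums are easy enough to check. Here is a quick Python sketch using the sizes quoted above; the `parity_estimate` function is my own illustrative helper, and it makes the simplifying assumption that the parity file is just the source size multiplied by the redundancy percentage, ignoring any overhead the tool adds.

```python
def parity_estimate(source_size, redundancy_pct):
    """Rough parity size: source size times the redundancy percentage."""
    return source_size * redundancy_pct / 100

source_kb = 84_584        # the 6 FLACs plus folder.jpg, as quoted above
actual_parity_kb = 8_674  # size of the large parity file, as quoted above

estimate_kb = parity_estimate(source_kb, 10)
actual_pct = 100 * actual_parity_kb / source_kb

print(f"Estimated parity size: {estimate_kb:.0f}KB")
print(f"Actual redundancy:     {actual_pct:.2f}%")
```

The real file comes out a touch over 10%, which is what I’d expect if the parity file carries a little bookkeeping on top of the raw parity data, though that explanation is a guess on my part.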
All I now need to do is to repeat this collection process for each CD rip folder in my collection -the nice thing about MultiPar being that it’s quite happy to traverse an entire directory tree, complete with lots of per-album subdirectories, generating per-album parity files as it goes. It will take a while, but given my CD rips currently occupy about 750GB, for a modest investment of a further 75GB, I can protect all my music files from damage, so long as that damage doesn’t extend to more than about 10% of the data in any one album folder. Of course, the time to calculate the parity data is after a nice, fresh, validated rip… it’s no good me collecting parity data for files which have already been damaged!
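With that many album folders, it helps to know which ones still lack parity data. Here is a small Python sketch that walks a directory tree and lists every folder containing FLAC files but no parity file. It assumes the parity files carry a `.par2` extension (adjust `parity_ext` if yours differ), and the function and folder names are my own inventions for illustration. It doesn’t drive MultiPar at all; it just tells you where you still need to point it.

```python
import os
import tempfile

def folders_missing_parity(root, audio_ext=".flac", parity_ext=".par2"):
    """Return folders under root that hold audio files but no parity file.

    The ".par2" default is an assumption about how the parity files are named.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        lower = [name.lower() for name in filenames]
        has_audio = any(name.endswith(audio_ext) for name in lower)
        has_parity = any(name.endswith(parity_ext) for name in lower)
        if has_audio and not has_parity:
            missing.append(dirpath)
    return sorted(missing)

# Demo against a throwaway tree of empty files (hypothetical album names).
with tempfile.TemporaryDirectory() as root:
    layout = {
        "protected": ["01.flac", "album.par2"],
        "unprotected": ["01.flac", "folder.jpg"],
        "artwork-only": ["folder.jpg"],
    }
    for album, files in layout.items():
        os.makedirs(os.path.join(root, album))
        for name in files:
            open(os.path.join(root, album, name), "wb").close()
    todo = [os.path.basename(path) for path in folders_missing_parity(root)]

print(todo)
```

In real use you would call `folders_missing_parity` on the top of your music tree instead of the temporary directory.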
Let’s pursue this example though. Suppose I were now to damage one of the 6 music files you see above… how would my prior collection of parity data help? Well, first, let’s do some damage: I’ll use the tiny (and free!) hex editor XVI32 to open up track 3 of the previous rips:
Obviously, a hexadecimal representation of audio data doesn’t make a lot of sense! But let’s just concentrate on that line I’ve highlighted: 5CAA2C. You’ll note it contains all sorts of hex-y data. Let me edit that to be a set of zeroes:
Hit the save button after that, and I’ve now got a FLAC file that’s had a small part of its internals zapped! You can tell it’s damaged, because the AudioTester program doesn’t like it any more:
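If you’d rather not use a hex editor for this sort of test, the same kind of damage can be done with a few lines of Python. This is only a sketch: `zero_bytes` is my own illustrative helper, the file name is hypothetical, and the demo runs against a throwaway file rather than a real rip. If you try it on real music, use a copy!

```python
import os
import tempfile

def zero_bytes(path, offset, length):
    """Overwrite `length` bytes starting at `offset` with zeroes, in place."""
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        if offset + length > f.tell():
            raise ValueError("range runs past the end of the file")
        f.seek(offset)
        f.write(bytes(length))  # bytes(n) is n zero bytes

# Demo on a small scratch file (hypothetical name, not a real FLAC).
with tempfile.TemporaryDirectory() as tmp:
    victim = os.path.join(tmp, "track03-copy.bin")
    with open(victim, "wb") as f:
        f.write(bytes(range(1, 65)))  # 64 non-zero bytes
    zero_bytes(victim, offset=0x20, length=16)
    with open(victim, "rb") as f:
        damaged = f.read()

print(damaged[0x20:0x30] == bytes(16))
```

For the example above, the offset would be `0x5CAA2C`; how many bytes make up one row in the hex editor depends on how it’s displaying the file, so the length is left as a parameter.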
So, can my earlier-generated parity information help me recover from this Frame Checksum error? Sure: all I have to do is to right-click on the smaller of the two parity files I mentioned before and ask to open it with MultiPar:
Note how the program correctly identifies track 3 as being ‘damaged’ -but, since only 0.2% of blocks are damaged, the 10% parity data is more than enough to permit a repair. (Thumbs.db is also declared to be ‘missing’, and the fact that it’s displayed in red indicates it’s actually a file that wasn’t there when the parity data was calculated -not surprisingly, as it’s a Windows system-generated file. Its presence in that dialog is therefore irrelevant).
So all I have to do is click that [Repair] button on the lower-right, let the MultiPar program compute the lost data from its parity data files and, Lo!
…now it reports track 3 to be ‘repaired’. But is it really? The only way to check is to re-open it in XVI32 and check that row 5CAA2C once again:
…and sure enough, there are now proper values for the cells in that row, rather than the series of zeroes I’d saved earlier.
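Eyeballing one row in a hex editor works for a demonstration like this, but a checksum comparison covers the whole file. Here is a Python sketch of that idea: record a SHA-256 digest while the file is known to be good, then compare it after the repair. The helper name and the scratch file are my own illustrative choices, and the ‘damage’ and ‘repair’ steps are simulated by hand here rather than done by MultiPar.

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "track.bin")  # hypothetical scratch file
    original = bytes(range(256)) * 4
    with open(path, "wb") as f:
        f.write(original)
    before = sha256_of(path)        # digest of the known-good file

    with open(path, "r+b") as f:    # simulate damage: zero out 16 bytes
        f.seek(100)
        f.write(bytes(16))
    after_damage = sha256_of(path)

    with open(path, "wb") as f:     # simulate a successful repair
        f.write(original)
    after_repair = sha256_of(path)

print(before != after_damage, before == after_repair)
```

If the digest after the repair matches the one taken from the good file, every byte is back where it should be, not just the row that happened to be on screen.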
Again, I stress that the parity data allows MultiPar to put back any data in any type of file. It’s not editing or manipulating audio data particularly -it’s just able to work out, via parity calculations, what the blocks of binary data in a file ought to read, compared to what it’s actually reading and thus to set it back to what it ought to be. So it’s quite a useful way of protecting any set of data -such as photographs, video, e-books, PDFs. Anything, in fact, where the deep-down integrity of the data matters to you (and you don’t have a file system -like ZFS- doing this sort of parity checking for you automatically).
If only I’d thought of doing this at the time I originally ripped my CDs! There is nothing that can now rescue the bad bit of Schubert I started this blog with (except a good backup, of course). But at least I can now prevent any new nasties creeping into my recordings over time, for a modest expenditure of disk space.