No, I did not just develop a passing interest in data recovery as a leisure activity. Presumably like everyone else, I learnt about it because it was necessary.
A year or two ago my friend’s computer was packing up, and he asked me to extract and store all their files for him until he got hold of a new one. These files were basically the whole history of their life in photo, movie, word document etc. form – it totalled about 3.9GB. We were pretty successful about extracting the files from his dying machine by booting into Linux and copying them onto my iRiver. I’ve been keeping hold of them ever since, with a copy on my iRiver and another on my home machine to be sure.
During the fairly disastrous upgrade of my machine from Fedora Core 3 to Ubuntu Dapper Drake that I did last week (of which, probably, more later) I managed to delete all the little scripts I keep in my home directory to do useful things. (They represent quite a lot of collected wisdom, and losing them has been annoying, but anyway.) One of those scripts called other scripts to delete old stuff I wasn’t interested in from my iRiver, and synchronise my local copy of my iRiver’s contents with the machine itself.
Obviously, I made a small mistake in re-writing that script, which meant that it started deleting everything older than 2 weeks old off my entire home directory. Since I had gone off to bed while this was running, I was extremely lucky that it hit a read-only file fairly soon so I didn’t lose much, but what I did lose was the zip file into which I had put all my friend’s files.
Of course, my synchronisation program worked like a dream, running as scheduled that night and merrily deleting the zip from my iRiver too. Lesson 1: Don’t let your backup program delete files automatically.
If it had been my own data I would have been gutted, but the thought that I had lost someone else’s lifetime collections of photos and sentimental things was pretty horrible.
So I began several long nights of investigating data recovery. I assumed it would be easier to get the files off my Linux ext3 partition, since it is well documented and hackable, but in fact, I learned lesson number 2: You can’t recover deleted files off an ext3 partition. That is probably not strictly true, but in practice, it is. The reason is that when you delete a file in ext3 it blanks the record of it from somewhere, so all that’s left is the actual data with no index entry. This makes it pretty much impossible to recover anything.
I tried lots of different things, including using grep and strings to search the device directly, and debugfs which appears only to be useful for ext2. The closest I got was a program called foremost which knows what the start and end of each type of file looks like, and searches the raw data for things matching the type of file you are looking for. Foremost seems really cool, and it found lots of things that looked like zip files, but almost none of them actually were valid zips, and it didn’t find anything big enough to be the file I wanted anyway.
So much for my Linux drive. Lesson 3: Consider ext2 for a backup partition. It is apparently easy to undelete deleted files on ext2. On the other hand, if the computer crashes while writing to an ext2 drive, you are much more likely to have corrupted data than if you use ext3. Quelle dommage.
I find it quite pleasing that the fact that I had a backup meant not that my files were backed up, which would of course be too easy for a person who has recently been branded Master of Pain (Receiving) by his colleagues, but it did mean I had another deleted file on a different filesystem which I could attempt to recover.
My iRiver (which, incidentally, I love even more after this incident than I did before) uses a FAT32 filesystem, which everyone agrees is rubbish, but which has the highly relevant advantage of being simple and thus less difficult to recover data from.
I made a copy of the entire disk by doing dd /dev/sd1 > myhd.raw and then I was free to attempt recovery on this raw file without fear that I would overwrite something on the real disk.
I hoped it would be easy to get the file back off FAT32, but it turned out to be difficult as well, mainly because the file was so huge. I tried lots of different programs, most of which crashed unceremoniously, or couldn’t find my file. Lesson 4: Don’t give up. Many of these programs were lying to me – telling me to give up. Had it been my own data, I would have given up. We are several days into my nightly trial now. I was tired.
Then, like a shining beacon of professionalism and respect for data, over the horizon came Autopsy and its underlying Sleuth Kit tools. I installed them, and started the server by typing autopsy and then pointed my web browser at http://localhost:9999 and followed the instructions. It could see my file! I told it to recover it, and my browser said it was downloading the file.
And then it failed. I tried again. It failed.
So I tried with wget. It failed. I tried with curl. It failed.
I realised the problem – the file was over 2GB, which is not supported in some downloading software! I tried with wget again, after confirming that it supported large files. It failed again.
Lesson 5, which really I should have figured out ages ago is: Don’t use large files. Especially not large zip files containing all your precious data, obfuscating the files from your recorvery program.
Now, I knew autopsy was running on my local machine and then providing the file I needed as a download, so I knew I could access the file directly by using the tools autopsy was using underneath.
Autopsy, may the Lord bless you for making those tools accessible separately.
Autopsy had told me the inode number of the file I wanted, so a quick read of the icat tool’s man page showed me what to do. I ran the command and ended up with a 3.9GB file!
So I unzipped it.
Obviously, it failed.
Full of optimism, I assumed that this really was my file, but that it had been corrupted slightly by having some of its data over-written. I started looking for zip file recovery programs.
To cut a long story short, there are a lot of zip file recovery programs for Windows and Linux, and some of the Windows ones appear to work using Wine, but in the end (which I should have thought of from the beginning) the one that actually worked, rather than tantalising you with the names of the files and then either crashing or charging you money to recover them, is the one written by the guy who invented the zip file. The trial version of PKZip for Windows, running on a Windows machine successfully opened and repaired my damaged archive, with this summary message:
Extracted 3,151 files
Skipped 0 files
62 errors/warnings
Yes, out of 3,151 files, 62 were damaged, and the rest were fine.
It was a long journey, but we made it. Finally, lesson 6: If it’s important, have 2 backups.