A Quick and desperate introduction to data recovery

No, I did not just develop a passing interest in data recovery as a leisure activity. Presumably like everyone else, I learnt about it because it was necessary.

A year or two ago my friend’s computer was packing up, and he asked me to extract and store all their files for him until he got hold of a new one. These files were basically the whole history of their life in photo, movie, Word document etc. form, about 3.9GB in total. We were pretty successful at extracting the files from his dying machine by booting into Linux and copying them onto my iRiver. I’ve been keeping hold of them ever since, with a copy on my iRiver and another on my home machine to be sure.

During the fairly disastrous upgrade of my machine from Fedora Core 3 to Ubuntu Dapper Drake that I did last week (of which, probably, more later) I managed to delete all the little scripts I keep in my home directory to do useful things. (They represent quite a lot of collected wisdom, and losing them has been annoying, but anyway.) One of those scripts called other scripts to delete old stuff I wasn’t interested in from my iRiver, and synchronise my local copy of my iRiver’s contents with the machine itself.

Obviously, I made a small mistake in re-writing that script, which meant that it started deleting everything more than 2 weeks old from my entire home directory. Since I had gone off to bed while this was running, I was extremely lucky that it hit a read-only file fairly soon, so I didn’t lose much, but what I did lose was the zip file into which I had put all my friend’s files.

Of course, my synchronisation program worked like a dream, running as scheduled that night and merrily deleting the zip from my iRiver too. Lesson 1: Don’t let your backup program delete files automatically.
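For what it's worth, here is a minimal sketch of how a cleanup script could follow Lesson 1. It is not the script from this story: the function name prune_old and its arguments are made up for illustration. It assumes a POSIX shell and find, only prints what it would delete unless explicitly told otherwise, and refuses to run on / or on the home directory itself.

```shell
#!/bin/sh
# prune_old DIR DAYS [delete]
# Hypothetical helper (not the original script): lists regular files under DIR
# whose modification time is more than DAYS whole days in the past. They are
# only removed when the third argument is the literal word "delete".
prune_old() {
    dir=$1
    days=$2
    mode=${3:-dry-run}

    # Refuse obviously dangerous targets: empty path, /, or the home directory.
    case "$dir" in
        "" | / | "${HOME:-/}")
            echo "refusing to prune '$dir'" >&2
            return 1
            ;;
    esac
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }

    if [ "$mode" = "delete" ]; then
        find "$dir" -type f -mtime +"$days" -exec rm -f -- {} +
    else
        # Default is a dry run: print the candidates and touch nothing.
        find "$dir" -type f -mtime +"$days" -print
    fi
}
```

Running prune_old with just a directory and a number of days shows the list of candidates; nothing is removed until the word delete is added by hand after reading that list.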

If it had been my own data I would have been gutted, but the thought that I had lost someone else’s lifetime collections of photos and sentimental things was pretty horrible.

So I began several long nights of investigating data recovery. I assumed it would be easier to get the files off my Linux ext3 partition, since it is well documented and hackable, but in fact, I learned lesson number 2: You can’t recover deleted files off an ext3 partition. That is probably not strictly true, but in practice, it is. The reason is that when you delete a file in ext3, it zeroes the block pointers in the file’s inode, so the data is still on the disk but nothing records which blocks belonged to the file or in what order. This makes it pretty much impossible to recover anything.

I tried lots of different things, including using grep and strings to search the device directly, and debugfs which appears only to be useful for ext2. The closest I got was a program called foremost which knows what the start and end of each type of file looks like, and searches the raw data for things matching the type of file you are looking for. Foremost seems really cool, and it found lots of things that looked like zip files, but almost none of them actually were valid zips, and it didn’t find anything big enough to be the file I wanted anyway.
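To give a rough idea of the signature-searching approach, here is a sketch that lists where zip local file headers (the four bytes "PK" followed by 0x03 0x04) appear in a raw image. It is not how foremost itself is implemented; the function name zip_header_offsets is made up, and the -a, -b and -o options are assumed to behave as they do in GNU grep. A hit is only a candidate: those four bytes can turn up by chance, and a fragmented file will not sit in one contiguous run after its header, which fits the experience of finding many things that looked like zips but were not.

```shell
#!/bin/sh
# zip_header_offsets FILE
# Prints the byte offset of every "PK\003\004" signature (the start of a zip
# local file header) found in FILE, one offset per line.
# Assumes GNU grep for the -a, -b and -o options.
zip_header_offsets() {
    sig=$(printf 'PK\003\004')
    # -a: treat binary data as text, -b: print byte offsets,
    # -o: report each match rather than each line, -F: fixed string.
    LC_ALL=C grep -a -b -o -F -e "$sig" "$1" | cut -d: -f1
}
```

Pointing it at an image file prints one number per candidate header, which can then be inspected by hand or fed to another tool.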

So much for my Linux drive. Lesson 3: Consider ext2 for a backup partition. It is apparently easy to undelete deleted files on ext2. On the other hand, if the computer crashes while writing to an ext2 drive, you are much more likely to have corrupted data than if you use ext3. Quel dommage.

I find it quite pleasing that having a backup did not mean my files were backed up, which would of course be too easy for a person who has recently been branded Master of Pain (Receiving) by his colleagues, but did mean I had another deleted file on a different filesystem which I could attempt to recover.

My iRiver (which, incidentally, I love even more after this incident than I did before) uses a FAT32 filesystem, which everyone agrees is rubbish, but which has the highly relevant advantage of being simple and thus less difficult to recover data from.

I made a copy of the entire disk by doing dd if=/dev/sd1 of=myhd.raw and then I was free to attempt recovery on this raw file without fear that I would overwrite something on the real disk.
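A slightly more careful version of the imaging step might look like the sketch below. The function name image_and_verify is hypothetical, and the verification and read-only steps are additions for illustration, not something done in this story: it copies the source with dd, re-reads the source to compare it byte for byte with the copy, and then removes write permission from the image.

```shell
#!/bin/sh
# image_and_verify SOURCE IMAGE
# SOURCE is the device (or file) to rescue, IMAGE the copy to create.
# Hypothetical helper for illustration only.
image_and_verify() {
    src=$1
    img=$2

    # Refuse to overwrite an existing image by accident.
    [ -e "$img" ] && { echo "already exists: $img" >&2; return 1; }

    # Copy the source block by block; dd's transfer report is discarded.
    dd if="$src" of="$img" bs=4096 2>/dev/null || return 1

    # Re-read the source and compare it byte for byte with the image.
    cmp -s "$src" "$img" || { echo "image differs from source" >&2; return 1; }

    # Drop write permission to guard against accidental writes to the image.
    chmod a-w "$img"
}
```

The device should not be mounted read-write while this runs, otherwise the comparison can fail simply because the source changed in between.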

I hoped it would be easy to get the file back off FAT32, but it turned out to be difficult as well, mainly because the file was so huge. I tried lots of different programs, most of which crashed unceremoniously, or couldn’t find my file. Lesson 4: Don’t give up. Many of these programs were lying to me, telling me to give up. Had it been my own data, I would have given up. By now I was several days into my nightly trial, and I was tired.

Then, like a shining beacon of professionalism and respect for data, over the horizon came Autopsy and its underlying Sleuth Kit tools. I installed them, and started the server by typing autopsy and then pointed my web browser at http://localhost:9999 and followed the instructions. It could see my file! I told it to recover it, and my browser said it was downloading the file.

And then it failed. I tried again. It failed.

So I tried with wget. It failed. I tried with curl. It failed.

I realised the problem – the file was over 2GB, which is not supported in some downloading software! I tried with wget again, after confirming that it supported large files. It failed again.
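The 2GB figure is not arbitrary. A program that keeps file sizes or offsets in a signed 32-bit integer cannot count past 2^31 - 1 = 2147483647 bytes. A small sketch of that check, with a made-up function name and with 3900000000 standing in for the "about 3.9GB" file (the exact byte count is not recorded here):

```shell
#!/bin/sh
# Largest value a signed 32-bit integer can hold: 2^31 - 1 bytes.
MAX_32BIT_OFFSET=2147483647

# needs_large_file_support BYTES
# Succeeds (exit status 0) if a file of BYTES bytes is too big for a tool
# that stores sizes and offsets in a signed 32-bit integer.
needs_large_file_support() {
    [ "$1" -gt "$MAX_32BIT_OFFSET" ]
}

# Rough stand-in for the 3.9GB archive; the real size is an assumption.
if needs_large_file_support 3900000000; then
    echo "needs a tool built with large file support"
fi
```

A typical CD-sized file of 700000000 bytes passes under the limit, which is why this kind of failure tends to show up only with unusually big archives.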

Lesson 5, which really I should have figured out ages ago, is: Don’t use large files. Especially not large zip files containing all your precious data, obfuscating the files from your recovery program.

Now, I knew autopsy was running on my local machine and then providing the file I needed as a download, so I knew I could access the file directly by using the tools autopsy was using underneath.

Autopsy, may the Lord bless you for making those tools accessible separately.

Autopsy had told me the inode number of the file I wanted, so a quick read of the icat tool’s man page showed me what to do. I ran the command and ended up with a 3.9GB file!
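The exact command is not recorded here, so the following is only a sketch of what such an icat call can look like. The function name recover_inode and its arguments are hypothetical. The -r option is, as far as I know, how the icat man page spells "use file recovery techniques if the file is deleted", but check the man page for your version of the Sleuth Kit before relying on it.

```shell
#!/bin/sh
# recover_inode IMAGE INODE OUTPUT
# Writes the contents of INODE in filesystem image IMAGE to the file OUTPUT.
# Hypothetical wrapper; flags should be checked against your icat man page.
recover_inode() {
    img=$1
    inode=$2
    out=$3

    command -v icat >/dev/null 2>&1 || { echo "icat not found" >&2; return 1; }

    # Never overwrite an earlier recovery attempt.
    [ -e "$out" ] && { echo "already exists: $out" >&2; return 1; }

    # -r asks icat to use its recovery techniques for a deleted file.
    icat -r "$img" "$inode" > "$out"
}
```

The output should go to a different disk from the one being recovered, or at least be read from an image file rather than the live device, so that the recovered data cannot overwrite the blocks it is being read from.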

So I unzipped it.

Obviously, it failed.

Full of optimism, I assumed that this really was my file, but that it had been corrupted slightly by having some of its data over-written. I started looking for zip file recovery programs.

To cut a long story short, there are a lot of zip file recovery programs for Windows and Linux, and some of the Windows ones appear to work using Wine. Most of them tantalise you with the names of the files and then either crash or charge you money to recover them. In the end, the one that actually worked (which I should have thought of from the beginning) is the one written by the guy who invented the zip file. The trial version of PKZip for Windows, running on a Windows machine, successfully opened and repaired my damaged archive, with this summary message:

Extracted 3,151 files

Skipped 0 files

62 errors/warnings

Yes, out of 3,151 files, 62 were damaged, and the rest were fine.

It was a long journey, but we made it. Finally, lesson 6: If it’s important, have 2 backups.

3 thoughts on “A Quick and desperate introduction to data recovery”

Grant says:

February 3, 2008 at 8:53 pm

Brilliant write up, and much sympathy, and congratulations!

The funny thing is, I have messed up more files when backing up than if I had left them alone!

I’m currently looking for 6hrs of work I deleted accidentally after migrating a machine. Probably will take me more time to recover them!

My mistake was to use a GUI copying program in Linux, not as root! (should have used the CL!) when moving files.

Now I have to try and recover some files from an ext3 partition. I also foolishly didn’t partition off my data directory, which would have saved me some of the aggro.

Wish me luck!

Andy Balaam says:

February 3, 2008 at 9:17 pm

Hi Grant, good luck!

I’m afraid you’re going to need it – ext3 seems to be extremely difficult.

My fingers are crossed for you.

zlatan24 says:

July 30, 2008 at 6:51 am

There is a fine tool, recovery zip, which is used to recover files from corrupted ZIP archives and to recover encrypted data. It works with password-protected ZIP archives and with ZIP archives larger than 4 GB, recovers data from corrupted media (floppy disks, compact disks, Zip drives and others), and works with ZIP archives via the LAN. Its use of several independent algorithms makes it possible to recover as much useful information from a corrupted archive as possible.
