A quick and desperate introduction to data recovery

No, I did not just develop a passing interest in data recovery as a leisure activity. Presumably like everyone else, I learnt about it because it was necessary.

A year or two ago my friend’s computer was packing up, and he asked me to extract and store all his files for him until he got hold of a new one. These files were basically the whole history of his life in photo, movie, Word document etc. form – about 3.9GB in total. We were pretty successful in extracting the files from his dying machine by booting into Linux and copying them onto my iRiver. I’ve been keeping hold of them ever since, with a copy on my iRiver and another on my home machine to be sure.

During the fairly disastrous upgrade of my machine from Fedora Core 3 to Ubuntu Dapper Drake that I did last week (of which, probably, more later) I managed to delete all the little scripts I keep in my home directory to do useful things. (They represent quite a lot of collected wisdom, and losing them has been annoying, but anyway.) One of those scripts called other scripts to delete old stuff I wasn’t interested in from my iRiver, and synchronise my local copy of my iRiver’s contents with the machine itself.

Obviously, I made a small mistake in re-writing that script, which meant that it started deleting everything more than 2 weeks old from my entire home directory. Since I had gone off to bed while this was running, I was extremely lucky that it hit a read-only file fairly soon, so I didn’t lose much, but what I did lose was the zip file into which I had put all my friend’s files.

Of course, my synchronisation program worked like a dream, running as scheduled that night and merrily deleting the zip from my iRiver too. Lesson 1: Don’t let your backup program delete files automatically.

If it had been my own data I would have been gutted, but the thought that I had lost someone else’s lifetime collections of photos and sentimental things was pretty horrible.

So I began several long nights of investigating data recovery. I assumed it would be easier to get the files off my Linux ext3 partition, since it is well documented and hackable, but in fact I learned lesson number 2: You can’t recover deleted files off an ext3 partition. That is probably not strictly true, but in practice, it is. The reason is that when you delete a file, ext3 zeroes the block pointers in the file’s inode, so the data itself is still on the disk but nothing records which blocks belonged to the file. This makes it pretty much impossible to recover anything.

I tried lots of different things, including using grep and strings to search the device directly, and debugfs which appears only to be useful for ext2. The closest I got was a program called foremost which knows what the start and end of each type of file looks like, and searches the raw data for things matching the type of file you are looking for. Foremost seems really cool, and it found lots of things that looked like zip files, but almost none of them actually were valid zips, and it didn’t find anything big enough to be the file I wanted anyway.
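To give a feel for what signature-based searching involves, here is a minimal Java sketch of the idea. It is not how foremost itself is implemented; the class name CarveSketch and the method findAll are made up for illustration, and it scans a small in-memory byte array rather than a real disk image. The two byte patterns are the standard zip markers: "PK" followed by 03 04 for a local file header and by 05 06 for the end-of-central-directory record.

```java
import java.util.ArrayList;
import java.util.List;

public class CarveSketch {

    // Every entry inside a zip archive starts with this local file header signature.
    static final byte[] ZIP_LOCAL_HEADER = { 0x50, 0x4B, 0x03, 0x04 };

    // A zip archive normally finishes with an end-of-central-directory record.
    static final byte[] ZIP_END_OF_CENTRAL_DIR = { 0x50, 0x4B, 0x05, 0x06 };

    /** Hypothetical helper: return every offset at which the signature occurs in data. */
    static List<Integer> findAll(byte[] data, byte[] signature) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i + signature.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < signature.length; j++) {
                if (data[i + j] != signature[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // A tiny stand-in for raw disk data: junk, a "start" marker, junk, an "end" marker.
        byte[] fake = { 0, 0, 0x50, 0x4B, 0x03, 0x04, 7, 7, 0x50, 0x4B, 0x05, 0x06, 0 };
        System.out.println("possible zip starts: " + findAll(fake, ZIP_LOCAL_HEADER));
        System.out.println("possible zip ends:   " + findAll(fake, ZIP_END_OF_CENTRAL_DIR));
    }
}
```

Since every file stored inside a zip carries its own local header, a scan like this turns up a great many candidate "starts" on a disk that has held even one large archive, which may be part of why so many of the hits were not valid zips.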

So much for my Linux drive. Lesson 3: Consider ext2 for a backup partition. It is apparently easy to undelete deleted files on ext2. On the other hand, if the computer crashes while writing to an ext2 drive, you are much more likely to have corrupted data than if you use ext3. Quel dommage.

I find it quite pleasing that having a backup did not mean my files were backed up, which would of course be too easy for a person who has recently been branded Master of Pain (Receiving) by his colleagues. It did mean, though, that I had another deleted file, on a different filesystem, which I could attempt to recover.

My iRiver (which, incidentally, I love even more after this incident than I did before) uses a FAT32 filesystem, which everyone agrees is rubbish, but which has the highly relevant advantage of being simple and thus less difficult to recover data from.

I made a copy of the entire disk by doing dd if=/dev/sd1 of=myhd.raw and then I was free to attempt recovery on this raw file without fear that I would overwrite something on the real disk.

I hoped it would be easy to get the file back off FAT32, but it turned out to be difficult as well, mainly because the file was so huge. I tried lots of different programs, most of which crashed unceremoniously, or couldn’t find my file. Lesson 4: Don’t give up. Many of these programs were lying to me – telling me to give up. Had it been my own data, I would have given up. We were several days into my nightly trials by now, and I was tired.

Then, like a shining beacon of professionalism and respect for data, over the horizon came Autopsy and its underlying Sleuth Kit tools. I installed them, and started the server by typing autopsy and then pointed my web browser at http://localhost:9999 and followed the instructions. It could see my file! I told it to recover it, and my browser said it was downloading the file.

And then it failed. I tried again. It failed.

So I tried with wget. It failed. I tried with curl. It failed.

I realised the problem – the file was over 2GB, which is not supported in some downloading software! I tried with wget again, after confirming that it supported large files. It failed again.
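One common reason for a 2GB ceiling is that a program keeps file sizes or offsets in a signed 32-bit integer, which tops out at 2,147,483,647. The Java sketch below shows what happens to a size of roughly 3.9GB when it is squeezed into such a variable. It is an illustration of the general limit, not a diagnosis of what went wrong in my downloads; the class name TwoGigSketch, the method asSigned32 and the round figure of 3,900,000,000 bytes are assumptions made for the example.

```java
public class TwoGigSketch {

    /** Hypothetical helper: what a size looks like once stored in a signed 32-bit int. */
    static int asSigned32(long sizeInBytes) {
        return (int) sizeInBytes; // keeps only the low 32 bits
    }

    public static void main(String[] args) {
        long size = 3_900_000_000L; // assumed stand-in for a file of about 3.9GB

        System.out.println("largest signed 32-bit value: " + Integer.MAX_VALUE);
        System.out.println("real size in bytes:          " + size);
        System.out.println("size as a 32-bit int:        " + asSigned32(size));
        System.out.println("fits in 32 bits?             " + (size <= Integer.MAX_VALUE));
    }
}
```

A size that wraps round to a negative number is the sort of thing that makes a download stop early or refuse to start, which is why tools need explicit large file support.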

Lesson 5, which really I should have figured out ages ago, is: Don’t use large files. Especially not large zip files containing all your precious data, obfuscating the files from your recovery program.

Now, I knew autopsy was running on my local machine and then providing the file I needed as a download, so I knew I could access the file directly by using the tools autopsy was using underneath.

Autopsy, may the Lord bless you for making those tools accessible separately.

Autopsy had told me the inode number of the file I wanted, so a quick read of the icat tool’s man page showed me what to do. I ran the command and ended up with a 3.9GB file!

So I unzipped it.

Obviously, it failed.

Full of optimism, I assumed that this really was my file, but that it had been corrupted slightly by having some of its data over-written. I started looking for zip file recovery programs.

To cut a long story short, there are a lot of zip file recovery programs for Windows and Linux, and some of the Windows ones appear to work using Wine. Most of them tantalise you with the names of the files and then either crash or charge you money to recover them. In the end, the one that actually worked (and which I should have thought of from the beginning) was the one written by the guy who invented the zip file. The trial version of PKZip for Windows, running on a Windows machine, successfully opened and repaired my damaged archive, with this summary message:

Extracted 3,151 files

Skipped 0 files

62 errors/warnings

Yes, out of 3,151 files, 62 were damaged, and the rest were fine.

It was a long journey, but we made it. Finally, lesson 6: If it’s important, have 2 backups.

FreeGuide fan mail

Really encouraged to receive this:

Hello Mr Balaam,

A quick email to thank you for giving the world the brilliant Freeguide.
(I saw in your blog that you have a high volume of email - no reply needed).

I tried Digiguide for a while, loved the principle, found the interface too cluttered. Went back to downloading pages from here and there, and dipping into the Radio Times timeline, which is not quite customisable enough.

I happen to live in a place where I get an unusual mix of channels - UK free satellite + UK terrestrial +Irish terrestrial. I'd never found a single listings service which covers the options.

Then, by chance, I found Freeguide, and I love it. I couldn't ask for more (except, perhaps, for customisable colours for program categories :-) ) Incidentally, one of your blog entries wondered how many painless installs there currently were. Here's one more to be counted (Win2K, Sun Java v5 Update 6)

I don't know much about software, but I'm sure this must have taken a vast amount of your time. Thank you for every moment of it. This program will be useful every day, and I'll be passing it on at every chance I get.

Best wishes
Nick Nixon

Over-engineering gone mad

Picture the scene: you are writing a Java application and you’re trying to do things right, so you use the java.util.logging code to do your logging. To create a logger you do this:

Logger log = Logger.getLogger( "name" );

Everything goes fine until you want to set the logging level from a command-line argument. You set the logging level like so:

log.setLevel( lev );

Where lev is, e.g. Level.FINE.

When you run your program, none of your log messages appear except the ones that appeared before (SEVERE, WARNING, INFO). Nothing appears to have changed.

You read some API docs, and find out that each Logger has a list of handlers. So you call getHandlers() on your Logger.

It doesn’t have any.

You take a breath.

You read some more, muttering about Sun engineers having spent too much time at university, and find out that there is a hierarchy of Loggers containing other Loggers.

You simultaneously wonder how that could be useful, and whether some stupid thing is happening here to do with that.

You try log.getParent() and find that it is not null. Eureka! You’ve got it now:

log.getParent().setLevel( lev );

You hastily recompile, palms sweaty with finally cracking a problem that should not have taken this long.

Nothing has changed. You still don’t get your log messages.

OK, that’s fine: you know about handlers now, so it’s a simple change – we’ll do this instead:

log.getParent().getHandlers()[0].setLevel( lev );

You recompile, trying not to get your hopes up.

Nothing.

You consider re-writing your application (which has been in development over 5 years) in any other language. You decide to try one more thing before giving up. What if you need to tell both the Logger and the handler what level to use? It would be pretty awkward, but not outside the realms of possibility.

log.getParent().setLevel( lev );
log.getParent().getHandlers()[0].setLevel( lev );

Not even very hopeful now, you recompile.

Nothing.

This is the moment where you go and find anyone who will understand (or failing that anyone at all) and tell them about it. They wipe the spittle from their faces and appear keen to leave.

You will not be defeated by this. You try every combination you can think of. You start looking into how to log a bug with Sun. You curse and curse again.

A very long time later, you know you should have given up, and you’re trying things just out of bloody-mindedness, and you stumble across this:

log.setLevel( lev );
log.getParent().setLevel( lev );
log.getParent().getHandlers()[0].setLevel( lev );

Yes, that’s right. To set the log level of the default logger in Java you have to learn about handlers and the logger hierarchy, and you have to set the level in three places.

THREE PLACES.

You swear never to feel superior again when people start having “language wars” about which programming language is better. They are not all the same.

You engage in a relaxing pastime, such as travelling to Sun’s development centre and punching the person who “designed” that interface in the face.
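For anyone who wants to skip the journey, here is the final incantation wrapped up as a helper. Treat it as a sketch: the class name LogLevelSketch and the method setLevelEverywhere are made up for illustration, and it assumes the default configuration, where the console handler hangs off the root logger. Rather than trusting that the handler sits at index 0, it walks up the parent chain and sets the level on every logger and handler it meets.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelSketch {

    /** Hypothetical helper: apply one level to a logger, its ancestors and all their handlers. */
    static void setLevelEverywhere(Logger log, Level lev) {
        // getParent() returns null once we go past the root logger, which ends the loop.
        for (Logger current = log; current != null; current = current.getParent()) {
            current.setLevel(lev);
            for (Handler handler : current.getHandlers()) {
                handler.setLevel(lev);
            }
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("name");
        setLevelEverywhere(log, Level.FINE);

        // With the default setup this should now be printed by the root console handler.
        log.fine("a FINE message");
    }
}
```

Going by the java.util.logging documentation, a record is filtered first by the level of the logger you call and then by the level of each handler it reaches, so setting the parent logger's level as well is likely harmless rather than strictly required. The helper sets it anyway to mirror the three lines above.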

Keyboard layout change in Ubuntu

I’ve had this problem a couple of times on my Ubuntu machine. I installed it with a US keyboard layout, and then changed it to UK later (for my user). Even when I select the correct keyboard model (Generic 105-key (Intl) PC) and layout (United Kingdom International), some keys still act weirdly (e.g. pressing " does nothing, and if I press it twice I get a strange "-like symbol). I fixed it by running:

sudo xmodmap /usr/share/xmodmap/xmodmap.uk

FreeGuide source layout (and SVN migration)

Christian and others have pointed out lots of problems with the source code layout in FreeGuide. The biggest problem is that each plugin has its own source tree, which makes it very difficult to set up in an IDE like Eclipse. I knew the layout needed reorganising, and the other night I couldn’t sleep and I felt like making it happen. Since SVN is so much better than CVS for re-arranging source code (because you can move files and keep their history), and because SourceForge are now offering SVN, I decided the right thing to do would be to migrate to SVN before making the changes.

So far, I have managed to get all the source code to build from the command line (instructions here: freeguide-tv.sourceforge.net/dev/index.php/Build_from_SVN) and I’m working on making the Ant build work again, although there’s still a bit to do there.

My aim is for it to be extremely easy to start developing FreeGuide: just download or check out the source code, fire up your editor or IDE, compile and run. That means FreeGuide needs to handle loading plugins from either JARs or directories (without being told which to do), and it needs to know where to look for its lib and doc directories if they are not specified.
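One way the "JAR or directory, without being told which" part might work is to lean on java.net.URLClassLoader, which treats any URL ending in "/" as a directory of classes and anything else as a JAR. The sketch below is only an illustration of that idea and is not FreeGuide's actual plugin loader; the class name PluginLocationSketch and the method loaderFor are hypothetical.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

public class PluginLocationSketch {

    /** Hypothetical helper: build a class loader for a plugin stored as a JAR or a directory. */
    static URLClassLoader loaderFor(File jarOrDir) {
        try {
            // File.toURI() appends a trailing slash when the path is an existing directory,
            // which is how URLClassLoader tells directories and JARs apart.
            URL url = jarOrDir.toURI().toURL();
            return new URLClassLoader(new URL[] { url },
                    PluginLocationSketch.class.getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable plugin location: " + jarOrDir, e);
        }
    }

    public static void main(String[] args) {
        // Pass any mix of plugin JARs and plugin directories on the command line.
        for (String location : args) {
            URLClassLoader loader = loaderFor(new File(location));
            System.out.println(location + " -> " + loader.getURLs()[0]);
        }
    }
}
```

With something like this, the calling code never needs to ask whether a plugin is packaged or sitting in a build output directory, which is what makes running straight from an IDE painless.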

Soon, it will Just Work. Then hopefully this will encourage more developers to get involved.

Meanwhile, I’ve decided that if I want to build up some momentum, I need to balance my limited time between code and email, instead of just fire-fighting the email all the time. Some individual users may not get their questions answered, which I hate, but in the end it would be worse if the project lost interest and faded away. I need to inject some excitement!