When you’re used to the world of Windows or OS X, Linux can seem a little unforgiving. Not only does command-line access hand over the complete keys to the manor to any unwitting user with access to the administrator’s account, there’s rarely a safety net should things go wrong. Despite advances in most Linux desktops (where the ubiquitous Trashcan safely buffers deleted files), you get no such protection from most system-specific configuration, installation and maintenance tools. And while it’s rare for anything to go wrong without your direct input, some accidents do happen, especially if you enjoy tinkering with the latest distro release each month. But this being Linux, there’s plenty you can do to dig yourself out of a hole, which is why it’s always a good idea to have a repair-worthy distribution close to hand when performing configuration and installation tasks.
One of the best developments in recent years has been the Live CD. These offer a fully functional Linux installation that runs from an optical drive. If you’ve got enough memory, you can even install new packages to the RAM disk just as you would when completing a standard installation. This makes a recent release of a Live CD-based installer like Ubuntu Jaunty the perfect system recovery tool. Not only does it include every package you might require, but because it runs from the optical drive rather than the hard drive, your data isn’t touched and there’s no chance files will be overwritten without direct input. It’s the obvious place to start when you get stuck.
Booting Linux
Probably the most common problem is when the Linux boot menu disappears or gets corrupted. The most likely reason for this is that a shared Windows installation has re-stamped its authority over your disk’s master boot record, overwriting the Grub boot loader with its own system-launching code. In these cases, you need to boot into a different Linux environment, either off a Live CD or from any other Linux booting media you can get hold of. The distribution you choose will also need to have Grub installed.
With Ubuntu, open the Terminal from the Accessories menu and type sudo grub. This will launch the boot loader with administrator privileges. From the ‘grub’ command prompt, type find /boot/grub/stage1. This Grub function searches every compatible drive attached to your system for the ‘/boot/grub/stage1’ file, which is used to launch the operating system. When the file is detected, it’ll output the drive and partition number of your lost installation using the format (hd1,0). Your output will look different, but it’s the drive number followed by the partition number of the Linux partition that you’re looking for. Grub should only be installed on a single partition on a single drive, so you shouldn’t find more than one version of the file.
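If you want to reuse the result of ‘find’ in a script, the drive and partition numbers can be pulled apart in the shell. This is only a minimal sketch, assuming the output has the (hdX,Y) form described above; parse_grub_device is a hypothetical helper name, not something that ships with Grub.

```shell
#!/bin/sh
# Split a Grub legacy device string such as "(hd1,0)" into
# "drive partition". Prints nothing if the input has another shape.
parse_grub_device() {
    printf '%s\n' "$1" |
        sed -n 's/^ *(hd\([0-9][0-9]*\),\([0-9][0-9]*\)) *$/\1 \2/p'
}

# Example: the string reported by 'find /boot/grub/stage1'.
parse_grub_device "(hd1,0)"
```

The first number is the drive and the second is the partition, both counted from zero.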
To restore the Grub bootloader to your drive, type root (hd1,0), swapping the drive and partition number for the output you found with the previous ‘find’ command. Then type setup (hd1), swapping ‘hd1’ for your drive number, and type quit to leave the Grub prompt. After a reboot, you should find that your Linux partition and booting ability have been restored. The only potential problem is that this process could overwrite a Windows bootloader, and if Windows was installed after the original Grub installation, it won’t launch from the boot menu.
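The two Grub commands can also be prepared in advance and checked before anything is written to disk. The sketch below only prints the commands; grub_setup_commands is a hypothetical helper, and the drive and partition numbers are whatever ‘find’ reported on your system.

```shell
#!/bin/sh
# Print the Grub legacy commands that reinstall the boot loader.
# $1 = drive number, $2 = partition number, both as reported by 'find'.
grub_setup_commands() {
    printf 'root (hd%s,%s)\n' "$1" "$2"
    printf 'setup (hd%s)\n' "$1"
    printf 'quit\n'
}

# Preview the commands for (hd1,0); nothing is written to any disk here.
grub_setup_commands 1 0
```

Once the preview looks right, the same output could be fed to Grub legacy’s batch mode with grub_setup_commands 1 0 | sudo grub --batch, assuming your build of Grub supports that option.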
Booting Windows
Fortunately, adding Windows to your Grub menu is easy enough, and it’s straightforward to add any other OSes you want to boot from your system if you’ve got an example entry to work from. The key to the boot menu is a file called ‘/boot/grub/menu.lst’, and if you open this in a text editor, you should see that the formatting is relatively easy to understand. For instance, here’s a typical entry for booting a Windows installation off the second partition of the first drive (Grub counts both drives and partitions from zero):
title Microsoft Windows
root (hd0,1)
makeactive
chainloader +1
You can make Grub remember this Windows boot entry as the default selection by adding a line that has ‘savedefault’ as the only word, provided the ‘default’ line near the top of ‘menu.lst’ is set to ‘default saved’. Adding Linux entries isn’t quite so easy, as you need to get the path to both the ‘initrd.img’ (RAM disk) and ‘vmlinuz’ (kernel) files correct, according to how they sit on the Linux filesystem. The best way is to copy and paste an existing entry and change the paths accordingly.
Older versions of Grub won’t support newer filesystem types. ext4, for instance, is a major upgrade to the standard Linux filesystem and it needs a specially modified version of Grub to boot from it. This could cause a problem if you installed a new Linux distro using ext4 alongside an older one that manages the boot menu with its older version of Grub. The only option in this case is to upgrade Grub, either manually or through a distribution that ships with the modified Grub, such as Ubuntu Jaunty.
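To show the shape of a Linux entry, here is a small sketch that prints one in the same ‘menu.lst’ format. Everything passed to it is illustrative: linux_menu_entry is a hypothetical helper, and the title, Grub root, kernel and RAM disk file names and root device are assumptions that must be replaced with the values from your own ‘/boot’ directory.

```shell
#!/bin/sh
# Print a Grub legacy menu.lst stanza for a Linux installation.
# Arguments: title, Grub root, kernel path, initrd path, Linux root device.
linux_menu_entry() {
    printf 'title %s\n' "$1"
    printf 'root %s\n' "$2"
    printf 'kernel %s root=%s ro\n' "$3" "$5"
    printf 'initrd %s\n' "$4"
}

# Hypothetical values; check the real file names under /boot first.
linux_menu_entry "Ubuntu" "(hd0,2)" \
    /boot/vmlinuz-2.6.28-11-generic \
    /boot/initrd.img-2.6.28-11-generic \
    /dev/sda3
```

The ‘kernel’ and ‘initrd’ paths are read relative to the partition named on the ‘root’ line, which is why they have to match the layout of that filesystem exactly.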
Restoring the MBR
If you ever need to reinstate the Microsoft Windows bootloader onto your disk’s master boot record (MBR), you can use the Windows rescue disk: ‘fdisk /mbr’ on DOS-based versions of Windows, or ‘fixmbr’ from the Recovery Console on Windows XP. However, there’s also a handy open-source utility called ‘ms-sys’ that performs the process from a Linux installation. With the tool installed, typing ms-sys -w /dev/hda will create a Microsoft MBR on the first drive (on systems that name their drives ‘sd’ rather than ‘hd’, that will be /dev/sda). But this may leave you with the opposite problem to the one we started with if Linux is on the same drive. To resolve it, you’ll need to reinstall Grub to get back to your Linux desktop.
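Because both ms-sys and Grub overwrite the first sector of the disk, it may be worth saving a copy of that sector before changing it. The sketch below assumes a classic 512-byte MBR and demonstrates on a scratch file rather than a real disk; backup_mbr is a hypothetical helper built on the standard ‘dd’ command.

```shell
#!/bin/sh
# Copy the first 512-byte sector (the MBR) of a disk or image to a file.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Demonstration on a scratch image, so no real disk is read or written.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=512 count=8 2>/dev/null
backup_mbr "$scratch" "$scratch.mbr"
ls -l "$scratch.mbr"
rm -f "$scratch" "$scratch.mbr"
```

On a real system the first argument would be the whole-disk device, such as /dev/sda, and the command would need administrator privileges. Bear in mind that this sector also holds the partition table, so copying the whole backup back restores the old table along with the boot code.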
Even if you can’t get to your Linux desktop, if you can get to the Grub boot menu then there’s still lots you can do to troubleshoot an installation, whether that boot menu is off a Live CD or a standard installation. Press [Escape] when you see the boot menu, and ‘e’ on the line causing you problems, and you can now edit each entry on the fly. These are the same lines we were editing in the ‘menu.lst’ file, and you can edit in-place options like the root partition for the operating system or the locations of the RAM disk image and kernel. Finally, rather than pressing ‘e’ for edit mode, try accessing the same command-line we used to search for the missing Linux installation. Just press ‘c’ to be dropped to the prompt. ‘Find’ is just one of around 30 commands that you can use to fix problems on your hard drives, list directories and even examine the contents of text files (the ‘cat’ command). For more details on what’s available and how to use it, type help.
Fixing a partition table
Another situation that initially appears to be catastrophic but can be resolved without data loss is if you happen to destroy the partition table of one of your disks. This is the kind of error that could render an entire disk’s worth of information useless, and it will also prevent your PC booting. But partition tables are stored independently of the data on the disk, and there are ways that you can rebuild them.
You might think that it’s difficult to destroy your partition table, but it’s surprisingly easy to do by accident. The most common cause is your PC being forced to restart while resizing a partition. You might find that the entire partition table is corrupt, despite the fact that no other partitions on the drive were touched by the process. Another likely scenario is that the wrong device name is used while installing Linux onto an external USB device with a command like ‘dd’, resulting in your principal hard drive becoming the unintended destination for a write command. This can happen from Windows installations too, but Linux can fix both.
The command you need to use is called testdisk. This is one of the most useful commands we’ve ever had to use in an emergency, though it’s not installed on many Live CDs by default. With Ubuntu Jaunty, you’ll need to install it from the Live CD environment using the package manager. After this is done, you should type sudo testdisk on the command line. If you don’t use the sudo command to run with administrator privileges, testdisk will ask for your password when the main page first appears. Before you get to that step, though, you’ll need to let the app know whether you want to create a log file or not. The correct answer is ‘Create’, but most people skip this stage and move straight on to the repair. Before you select the ‘No Log’ option, just remember that a log file can really help if testdisk fails or makes the problem worse. It’s the only way that you’ll know how far along the repair procedure managed to get before it stopped, and where any fatal errors might have occurred.
Writing the new table
After choosing whether or not to create a log file, the next screen you’ll see will list the storage devices attached to your computer. The size of each disk should be correct, along with the unique identifier for the drive at the end of the line. Use the cursor keys to select the drive that you want to repair and press [Enter]. The screen that now appears is the most important, because you need to give testdisk some indication of the type of drive partition used on your system. In the vast majority of cases, this is going to be the first option – an Intel/PC partition. If you’re using a system other than this, then there’s a good chance you’ll already know what it is. You may be using the ultra-new EFI standard, for example, and this can be selected from the list.
After pressing [Enter] again, you’ll see a page that has another list of options. You need to choose the first one to analyse the contents of the drive. This will first display the registered partition details, if possible, before allowing you to perform a quick search for the table configuration within the data on the drive. There’s also an exception for Vista-based partitions, as these are handled slightly differently. If the search is successful, you will see the list of partitions discovered on your drive. If not, you’ll be presented with the option to perform a deeper search, but we’ve never found this necessary on a normal Intel Linux system.
From the page that lists the discovered partitions, make sure that the general parameters are correct (one of them should be labelled as bootable, for example) and press [Enter]. From this final list of partitions, select ‘Write’ to make the list of partitions you can see on the screen permanent. After a system reboot, you should find your drive fully restored, although there’s a chance you might need to reinstall the Grub bootloader.
Back up your data
Before you start messing with your drives in an attempt to rebuild a working system, make a copy of the data on the drive. There are many ways of making a backup in Linux, but the easiest is to use the ‘dd’ command. This makes a bit-for-bit copy of what’s on your drive, creating the Linux equivalent of a disc image. This means that you can work on this image to restore lost or deleted files without even touching the original disk. The ext3grep command we talk about in the ‘Restore deleted files’ box can use the output of ‘dd’ as its raw input, for example.
Unlike ext3grep, ‘dd’ has a simple syntax. Just execute the command with a source followed by a destination: dd if=/dev/sda1 of=sda1_image.bin, for instance. The caveat is that ‘dd’ will do exactly as you ask, overwriting anything it finds without any pleasantries, so swapping the source and destination or mistyping a device name is a big cause of overwritten boot blocks and MBRs on Linux. The command can also be unnerving to use because there’s next to no output to tell you what it’s doing, and it can take a while if you leave the block size at its small default value. Sometimes the only way you can make sure ‘dd’ is making a copy is to check that the activity light is flashing.
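One way to take some of the risk out of ‘dd’ is to wrap it in a function that refuses to overwrite an existing destination and then checks the copy. The sketch below is only an illustration run against a scratch file; image_device is a hypothetical helper, and the 64KB block size is an arbitrary choice that is simply larger than the default.

```shell
#!/bin/sh
# Image a source device or file, refusing to clobber an existing target,
# then verify the copy byte for byte with cmp.
image_device() {
    src="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    dd if="$src" of="$dest" bs=65536 2>/dev/null || return 1
    cmp -s "$src" "$dest"
}

# Demonstration on a scratch file standing in for /dev/sda1.
src=$(mktemp)
printf 'example data\n' > "$src"
image_device "$src" "$src.img" && echo "image verified"
image_device "$src" "$src.img" || echo "second run refused"
rm -f "$src" "$src.img"
```

On a real partition the verification step is only meaningful if nothing is writing to the source while the copy is made, which is another reason to work from a Live CD with the partition unmounted.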
Solving Error 18
One of the more problematic Grub errors is Error 18, which Grub reports as ‘Selected cylinder exceeds maximum supported by BIOS’. It’s a throwback to a time when BIOSes couldn’t detect the size of a large drive properly, and it appears when the files Grub needs sit beyond the part of the disk the BIOS can read. There are a couple of solutions to this problem. One is to try changing the drive order in the BIOS; the other is to create a smaller root Linux partition near the start of the drive, as partitions over 500MB in size have been reported to cause the problem with certain BIOSes.
Restore deleted files
Thanks to the way modern Linux filesystems like ext3 use a journal to document file management, there’s no ‘undelete’ command that can simply restore lost and accidentally deleted files. Instead, you’re often left to trawl through the raw blocks of data accessible through the device nodes on your filesystem. But that hasn’t stopped some developers from trying to build one. One developer was particularly flummoxed when he accidentally deleted his home directory. Almost a year of work went with a careless execution of the ‘rm -rf’ command. But rather than spending the next few weeks lamenting its loss at a local bar, he spent them creating a tool to restore all those lost files. That tool is called ext3grep, and it’s about as close to an undelete command as we Linux users will ever get. But be warned: the price you pay for file resurrection is complexity.
Ext3grep is likely to be hosted on your distribution’s package repository. Before using it, you’ll need to make sure no processes are accessing files on the partition that held the file you want to recover. It might be easier to simply reboot to a fail-safe or administration mode, or even a Live CD if you need to get at the root filesystem. You then need to use ‘ext3grep’ to search for the missing file. The easiest method of recovering a file called ‘test.odt’ that sat in the top-level directory of ‘/dev/sda1’ is to type ext3grep /dev/sda1 --restore-file test.odt. The path you give is relative to the root of the partition and has no leading slash. Ext3grep will then search through each block of the device looking for directories, before diving in and looking for references to your file. If it can be found, it will be placed in the RESTORED_FILES folder.