Tricks for Gluing Linux Boxes


Tricks for Installing Glued Linux on Problematic Boxes

Overview

You have a PC you want to install Glued Linux on, you follow the standard OIT procedure, and it doesn't work. Unfortunately, the broad hardware support required on the PC side (because of the wide range of hardware that people and vendors put in PCs, and the rapid pace of development of PC hardware) means this is unlikely to be a rare occurrence.

While Glue staff do an excellent job of assisting with these issues when they pop up, they have a lot of responsibilities, and I at least can get impatient, so the following are some tricks to help resolve the issue without assistance from Glue staff. I assume lab manager rights, as these are required to add entries to the Glue config tree, which is the first step towards gluing a machine.

Much of this was learned and tested while struggling to build a Glue image on a Dell Optiplex GX280. The NIC on that machine uses the Tigon (tg3) driver, but needed a newer version of the driver than the Glue netboot image had, although updated versions of REL3 supported it. There were further complications with the USB keyboard and mouse. Not a pleasant install.

A year and a half later, another Dell Optiplex GX280 came along, and it not only had most of the same issues, but added a couple of new wrinkles (and another section or two to this document).

If you are considering purchasing a Dell Optiplex GX280 to install Glue's Linux image on, my advice to you is that Dell makes many other nice machines :)

Preliminaries

We assume you have already tried the standard Glue install and it failed. These instructions deal with the failures, not with standard installs. As such, we assume the config tree entries are already set up, etc.

You should also have a bootable Linux on floppy or CD available, preferably one recent enough that you can boot the problematic system from the removable media and access the network. If you can't boot, get something more recent.

If you can boot but the NIC is not recognized, you can also try getting a more recent version of the disk. Obviously, Glue will not work on your box until there is a Linux driver for it, and any assistance you can give the Glue staff in tracking down the driver will be appreciated and will likely speed up the Gluing process. At a minimum, you should attempt to determine exactly what the problematic hardware is, using lspci or similar commands and observing the messages from bootup.
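As a rough illustration of narrowing down the hardware, the sketch below filters lspci-style output to the lines that usually matter for an install (network and disk controllers). The function name list_key_hardware and the keyword list are my own choices, not anything Glue provides, and the keywords may need adjusting for your machine.

```shell
# Filter lspci-style output (read from stdin) down to network and
# disk controllers. The keyword list is a guess at what is relevant.
list_key_hardware() {
    grep -i -E 'ethernet|network|ide interface|sata|scsi|raid'
}

# Typical use on the booted rescue system (not executed here):
#   lspci | list_key_hardware
#   dmesg | grep -i -E 'eth[0-9]|tg3|ata|hd[a-z]|sd[a-z]'
```

The vendor and chip names on the surviving lines are what you would pass on to Glue staff or search for when looking for a driver.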

Bypassing the netboot

The normal Glue install procedure for any platform involves booting off the network using one of Glue's standard netboot images. This boot image then walks you through the install process.

This obviously is problematic if the netboot image (which Glue does not update very frequently if they can avoid it) does not recognize your card. This is particularly painful if the current, updated, regular Glue image will recognize it, but your hardware is just too new for the netboot image.

In this case, if you have a bootable Linux on a floppy or CD which recognizes the hard drive and the NIC, and which supports NFS, you might be able to install the Glue image on the machine without going through the netboot process. This is not guaranteed, but I have done so a few times using the System Rescue CD, a Gentoo-based live CD toolkit, with the following procedure (which should be easily adaptable to other boot CDs, etc.):

  1. Boot from your bootable media that recognizes the hard drive, NIC card, etc.
  2. Ensure that some essential filesystems are mounted read-write (like / and /tmp), and remount if necessary.
  3. Configure your NIC if necessary. Make sure that the IP address in use exists in the Glue config tree with the GLUE_unix netgroup (otherwise you won't be allowed to NFS mount the install directory). If you began this by trying a netboot, and the dhcpd server is still responding to this machine's MAC address, the boot media may have already set this up via DHCP. You should still check all the settings. If necessary, configure the interface manually.
    ifconfig eth0 MYIPADDRESS netmask MYNETMASK broadcast MYBROADCAST
    You should also check that the gateway is properly set up with the route command, and if necessary configure it manually.
    route add default gw MYGATEWAY
    You should probably add the campus nameservers to /etc/resolv.conf as well.
  4. Check if portmap is running (e.g. rpcinfo -p localhost should return something sensible), and if not start it (run portmap).
  5. NFS mount the install root. This currently is done with the command
    mount pinehurst.umd.edu:/export/software/upgrade/rel30 MNTPNT
    where MNTPNT is someplace where you can mount this install directory on the booted image. I generally use /mnt/temp1 on the System Rescue CD. The above NFS path is currently valid, but if it stops working you may need to look at /tftpboot/linux/rel30/pxelinux.cfg/default for the current path.
  6. You need to chroot to the install directory, using something like
    chroot MNTPNT /bin/tcsh
    where you can choose your shell of choice (provided it is in MNTPNT/bin). You may get some errors from lockd about not being able to monitor the NFS host; these seem to be safe to ignore. Verify the chroot took by looking for the file /sbin/glueinstall; if it does not exist, something is seriously wrong and you will not be able to proceed.
  7. To avoid errors when running glueinstall, you need to ensure the proc filesystem is properly mounted, and /tmp is writable. The mount command will likely give bogus information, so I normally just run
    mount -t proc proc /proc
    mount -t tmpfs tmpfs /tmp
  8. Run /sbin/glueinstall and follow the instructions as for a normal install. If you use the advanced partitioning feature, you will likely want to install a new kernel from the shell prompt it gives you. If not, you still probably want to install a new kernel (if the netboot image does not support your hardware, the kernel being installed probably does not either). This can be done by booting to single-user mode or by booting locally from your boot media, and it can probably be done before rebooting, because the reboot at the end of the glueinstall script will likely fail when run in a chrooted environment. (You should be able to exit, umount the hard drives, and reboot manually without problems.)
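
Steps 4 through 7 above can be strung together roughly as follows. This is only a sketch under assumptions: the helper names run and bypass_netboot are invented here, the network is assumed to be configured already, and by default the script only prints what it would do (set DRY_RUN=0 to actually execute, as root, on the rescue system).

```shell
# Print commands by default; set DRY_RUN=0 to really run them.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bypass_netboot() {
    mntpnt=$1      # e.g. /mnt/temp1 on the System Rescue CD
    nfs_root=$2    # e.g. pinehurst.umd.edu:/export/software/upgrade/rel30

    # Step 4: start portmap only if rpcinfo cannot reach one.
    run sh -c 'rpcinfo -p localhost >/dev/null 2>&1 || portmap'

    # Step 5: NFS mount the install root.
    run mkdir -p "$mntpnt"
    run mount "$nfs_root" "$mntpnt"

    # Sanity check from step 6 (skipped in dry-run mode).
    if [ "$DRY_RUN" != 1 ] && [ ! -x "$mntpnt/sbin/glueinstall" ]; then
        echo "no /sbin/glueinstall under $mntpnt; cannot proceed" >&2
        return 1
    fi

    # Steps 7 and 8 are typed by hand inside the chrooted shell.
    echo "# inside the chroot, run:"
    echo "#   mount -t proc proc /proc"
    echo "#   mount -t tmpfs tmpfs /tmp"
    echo "#   /sbin/glueinstall"

    # Step 6: chroot into the install root.
    run chroot "$mntpnt" /bin/tcsh
}
```

Calling bypass_netboot /mnt/temp1 pinehurst.umd.edu:/export/software/upgrade/rel30 in dry-run mode prints the sequence so you can review it before running anything for real.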

Grub boot loader issues

I find that with the netboot bypass method described above, the boot loader almost never installs properly. The best, fastest course seems to be to just assume grub did not install properly and reinstall it after glueinstall is done.

Although there is likely a version of grub on your boot CD, etc., I find it is best to use the version of grub on the OS you just installed. So from a prompt on your boot CD system, issue the commands to mount and chroot into the newly installed root, and mount the usr, tmp, and proc filesystems. I am not sure if all of that is necessary, but it seems to suffice.
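
A sketch of what that mount-and-chroot sequence might look like is given here. The device names and mount point in the usage note are hypothetical placeholders for your own partition layout, the function name chroot_commands is invented, and mounting /tmp as tmpfs is borrowed from the netboot bypass steps rather than known to be required. The function only prints the commands so they can be reviewed first.

```shell
# Print (do not run) the commands to mount the freshly installed
# system and chroot into it. All three arguments are placeholders.
chroot_commands() {
    root_dev=$1    # partition holding the installed /
    usr_dev=$2     # partition holding the installed /usr
    target=$3      # where to mount the installed root

    echo "mkdir -p $target"
    echo "mount $root_dev $target"
    echo "mount $usr_dev $target/usr"
    echo "chroot $target /bin/sh"
    # The remaining two are typed inside the chrooted shell.
    echo "mount -t proc proc /proc"
    echo "mount -t tmpfs tmpfs /tmp"
}
```

For example, chroot_commands /dev/hda5 /dev/hda6 /mnt/newroot prints the six commands for a layout with root on hda5 and usr on hda6.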

From this chrooted shell, we then install grub. I find the native grub install seems to work best. Basically, you find the newly installed root partition in grub's naming convention, set it as your root, and install grub. The following example installs grub to the MBR using (hd0,4) as the root partition. More advanced setups are probably possible; read the grub info pages for details.


grub> find /boot/grub/grub.conf
(hd0,4)
grub> root (hd0,4)
Filesystem type is ext2fs; partition type 0x83
grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage15" exists... yes
Running "embed /boot/grub/e2fs_stage15 (hd0)"... 16 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+16 p (hd0,4)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
Done
grub> quit
where the lines beginning with grub> are your input and the other lines are output from grub. The exact text may vary; the important point is that the response to the find command should match the argument you give to root. The argument to the setup command should not contain a partition number (e.g. the ",4") if you want to install to the MBR.
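
The partition name grub reports can be cross-checked against the Linux device name of the root partition. The following is a minimal sketch, assuming the BIOS drive order matches the Linux device lettering (hda or sda being the first disk), which is not guaranteed on every machine. The helper name linux_to_grub is made up for illustration.

```shell
# Convert a Linux disk or partition name such as /dev/hda5 into
# GRUB (legacy) notation such as (hd0,4). Assumes drive letter a
# maps to hd0, b to hd1, and so on.
linux_to_grub() {
    dev=${1#/dev/}                               # hda5
    letter=$(printf '%s' "$dev" | cut -c3)       # a
    part=$(printf '%s' "$dev" | cut -c4-)        # 5 (may be empty)
    drive=$(( $(printf '%d' "'$letter") - 97 ))  # a -> 0
    if [ -n "$part" ]; then
        echo "(hd${drive},$((part - 1)))"        # grub counts from 0
    else
        echo "(hd${drive})"
    fi
}

# A non-interactive version of the session above, using GRUB
# legacy's batch mode (not executed here):
#   printf 'root %s\nsetup (hd0)\nquit\n' \
#       "$(linux_to_grub /dev/hda5)" | grub --batch
```

If the result disagrees with what find printed, trust find: it reflects how grub actually sees the disks.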

Upgrading the kernel before system is up

If the netboot kernel is too old to support the hardware on your system, chances are the kernel in the tarball image it puts on your system is too old as well. Thus, you probably want to upgrade your system to the latest kernel if you bypassed netbooting. The age of the kernel is not generally a problem on a "normal" install: if the netboot kernel is sufficient to recognize the NIC and hard drive, and maybe a few other critical pieces of hardware, you can just boot the system and everything will get updated in the initial rdist. But if the hard drive or NIC is not recognized, the system will not boot far enough to do the rdist until after things are updated, a nice catch-22 situation.

If you got as far as bypassing the netboot process, a manual kernel update is fairly straightforward. You need to have the system up under a kernel that recognizes the NIC and the hard drive; this can usually be done before rebooting after glueinstall has finished during the netboot bypass install, or by booting to the System Rescue CD or similar boot media. Booting to single-user mode or to the netboot maintenance mode probably will not work (and if they do, you probably want to just boot fully and let rdist do the kernel update).

On another Glued Linux box of the same architecture, create a tarball of the kernel and related files, e.g.

tar -cf /tmp/kernel_update.tar /boot/glue/System.map-VER
tar -uf /tmp/kernel_update.tar /boot/glue/vmlinuz-VER
tar -uf /tmp/kernel_update.tar /boot/glue/initrd-VER.img
tar -uf /tmp/kernel_update.tar /lib/modules/VER
tar -uf /tmp/kernel_update.tar /usr/vice/etc/modload/libafs-VER.o
tar -uf /tmp/kernel_update.tar /usr/vice/etc/modload/libafs-VER.mp.o
where VER is the latest kernel version (e.g. 2.4.21-27.0.2.ELsmp). This will probably be about 35-40 MB.
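
To avoid retyping VER six times, the same file list can be generated from a single argument. This is just a convenience sketch; the function names kernel_files and make_kernel_tarball are mine, and the paths are exactly the ones listed above.

```shell
# Print the files that make up kernel version $1, one per line.
kernel_files() {
    ver=$1
    printf '%s\n' \
        "/boot/glue/System.map-$ver" \
        "/boot/glue/vmlinuz-$ver" \
        "/boot/glue/initrd-$ver.img" \
        "/lib/modules/$ver" \
        "/usr/vice/etc/modload/libafs-$ver.o" \
        "/usr/vice/etc/modload/libafs-$ver.mp.o"
}

# Build the tarball, refusing to continue if anything is missing.
# $2 is the output file (defaults to /tmp/kernel_update.tar).
make_kernel_tarball() {
    ver=$1
    out=${2:-/tmp/kernel_update.tar}
    for f in $(kernel_files "$ver"); do
        [ -e "$f" ] || { echo "missing: $f" >&2; return 1; }
    done
    tar -cf "$out" $(kernel_files "$ver")
}
```

On the donor machine, make_kernel_tarball 2.4.21-27.0.2.ELsmp would produce the same archive as the six tar commands, and it fails early if the version string is mistyped.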

On the system you are installing, make sure the root (/) partition of the installed image is mounted, and the /usr partition is mounted in its normal position below that. Bring the kernel_update.tar or whatever you called it over to this system (usually scp is the best option).

Cd to where the root partition of the installed image is mounted, and run tar -xf path_to/kernel_update.tar.

Cd to /boot/glue directory of the installed image's root partition, and modify the symbolic links System.map-current, vmlinuz-current, and initrd-current.img to the new versions you just untarred here.
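
The unpack and relink steps might be scripted along these lines. It is a sketch under assumptions: the function name install_kernel_update is invented, the tarball path must be absolute (because the function changes directory before unpacking), and I am assuming the -current links are plain relative symlinks within /boot/glue.

```shell
# Unpack a kernel tarball under the installed root and repoint the
# -current symlinks. Arguments: mounted root, absolute tarball path,
# kernel version string.
install_kernel_update() {
    root=$1
    tarball=$2
    ver=$3

    # Member names in the tarball are relative to /, so unpack
    # from the top of the installed root.
    ( cd "$root" && tar -xf "$tarball" ) || return 1

    # Repoint the links; the subshell keeps our own cwd unchanged.
    (
        cd "$root/boot/glue" || exit 1
        ln -sfn "System.map-$ver" System.map-current &&
        ln -sfn "vmlinuz-$ver" vmlinuz-current &&
        ln -sfn "initrd-$ver.img" initrd-current.img
    )
}
```

For example, install_kernel_update /mnt/newroot /tmp/kernel_update.tar 2.4.21-27.0.2.ELsmp, where /mnt/newroot is a hypothetical mount point for the installed root.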

SATA hard drive issues

SATA hard drives are nice: cheaper than SCSI and faster than IDE. They also have limited support under 2.4 Linux kernels. In particular, they tend not to be supported well on some live CDs (e.g. System Rescue CD).

Your best bet is to see if the BIOS has an option to have the SATA controller run in combination or legacy mode, in which case the SATA drive should appear as a normal ATA drive to the system. This is usually not the default setting, as it degrades performance. After Glue is installed, you can reboot, change the SATA setting in the BIOS, and see if everything still works. If not, you may need to find out whether the SATA controller is supported in Linux at all, and whether it is supported in Red Hat Enterprise Linux v3 (Linux kernel 2.4) kernels.
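
One quick way to see how the booted kernel presents the drive is to list the whole-disk entries in /proc/partitions. This is a sketch based on my understanding that a drive in legacy mode typically shows up as hdX, while one handled by a native SATA driver typically shows up as sdX; the function name list_disks is invented.

```shell
# List whole disks (hda, sda, ...) from a partitions table.
# $1 is the file to read; it defaults to /proc/partitions.
list_disks() {
    awk '$4 ~ /^(hd|sd)[a-z]+$/ { print $4 }' "${1:-/proc/partitions}"
}
```

If list_disks prints nothing on the rescue system, the kernel has not recognized the hard drive in the current BIOS mode.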

I have run into some weird errors with some Philips DVD/CD burner/reader combos, which would produce read errors when booting from CD with SATA in combination mode. That same CD would boot fine (apart from not recognizing the SATA hard drive) with SATA in normal, non-legacy mode. Another CD (Knoppix) would fail with read errors in either SATA mode, although it worked fine on other systems. Diagnostics claimed the DVD/CD device was fine. Swapping in an older CD-ROM drive seemed to resolve the issue, at least enough to allow me to boot with the hard drive (in legacy SATA mode) and NIC recognized. You could also try booting from a USB CD-ROM (you may need to disable the internal CD-ROM in the BIOS), but the Linux boot CDs I tried did not like that setup and the boot would fail, losing the CD-ROM at some point.

Serial consoles

I have had problems installing Glued Linux on a Dell Optiplex GX280 because of the USB-only (USB v2 only?) keyboard: apparently at some point during the initial bootup the keyboard stops working until Kudzu can configure it. This would not be a problem, except that Kudzu wants a keypress to know whether it should try to configure the keyboard or anything else (usually including the NIC), and you cannot give it one because the keyboard is not functioning. You cannot even ssh in and run kudzu manually, because the NIC has not been configured yet.

The best way I have figured out to get around this is to boot the linux box with a serial console. Ian Burrel has a good discussion on setting up the system to use a serial console. Basically, for a short term use to get around USB keyboard or similar problems,

  1. Connect the first serial port of the newly installed box to another system.
  2. Start kermit, hyperterm, or whatever terminal program you want to use on the second system to listen on that line at 9600 baud.
  3. Boot the newly installed system, and at the grub prompt select edit. (In my problematic installs, the keyboard was still functioning at this point; if yours is not, you may need to boot to a rescue CD and modify grub.conf to use the serial console as well.) Edit the kernel line of the proper kernel and add console=tty0 console=ttyS0,9600 to the end of that line. Type b to boot this line.
  4. You should start seeing the boot messages on the second system, and you should now be able to respond to kudzu to configure things.
After the system boots, you can try rebooting without the console and see if keyboard, etc. functions.
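
If the keyboard is dead even at the grub prompt, the same console arguments can be added to grub.conf from a rescue CD. Here is a minimal sketch, assuming kernel lines in grub.conf start with the word kernel (possibly indented). The function name add_serial_console is hypothetical, and it writes the modified file to standard output rather than editing in place, so you can inspect the result first.

```shell
# Append the serial console arguments to every kernel line of the
# given grub.conf and print the result to stdout. Running it twice
# on its own output would append the arguments twice.
add_serial_console() {
    sed -e '/^[[:space:]]*kernel[[:space:]]/ s/[[:space:]]*$/ console=tty0 console=ttyS0,9600/' "$1"
}
```

A typical use would be add_serial_console grub.conf > grub.conf.new, followed by a diff of the two files before moving the new one into place.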

USB mice

My experience is that systems with USB mice need some tweaking to work properly under Glue. Basically, PS/2 style mice generally use the device /dev/mouse, whereas USB mice generally use the device /dev/input/mice. The standard Glue XF86Config file is set up for PS/2 mice, and generally needs the following tweaks to work properly with USB mice:

Search for the InputDevice section in /etc/X11/XF86Config corresponding to Mouse0. Change:
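
Going only by the device names mentioned above, the core of the change can be sketched as a sed substitution on the Device option. This is an assumption-laden sketch rather than the exact edit: the Protocol option may need changing as well, and the helper name use_usb_mouse is invented for illustration.

```shell
# Print a copy of the given XF86Config with the PS/2 mouse device
# replaced by the USB one. Only the device path is touched.
use_usb_mouse() {
    sed 's|"/dev/mouse"|"/dev/input/mice"|' "$1"
}
```

As with any config edit, write the output to a new file, compare it with the original, and keep a backup of /etc/X11/XF86Config before replacing it.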

