TECHNICAL: Patching Linux machines… to reboot or not to reboot?

Howdy all,

I keep seeing comments from a lot of people on various boards/sites that Linux is so much better than Windows because you “never” have to reboot Linux machines, and therefore Linux must be superior to Windows!

This kinda creeps me out. When I log in to a Linux box that has not been rebooted for 800 days, I often slap my face. Why? Oh boy, here is a list of things:

1. Services. Will they start up automatically after the next reboot?
2. Broken updates. Will the machine boot at all? (Before you yell at me and tell me that updates never break anything: I have seen my share of errors after updates ;)
3. Hardware. Will the machine POST after the next reboot? Is everything else OK?
4. Glibc updates. Sometimes these contain some pretty critical fixes that should really be activated. Glibc is loaded when an application starts, so a running process keeps using the old version until it is restarted; you really should restart all applications on the server anyway so they pick up the update. Please read the changelog for Glibc updates the next time you are updating.
5. fstab and other configuration files. Are you 100% sure those are all up to date and all your filesystems are correctly defined there?
6. Kernel updates. Sure, you could use Ksplice (or not: Oracle just bought them and they are not taking any new customers!). Kernel updates often include security updates as well as critical bugfixes. Please go through the changelogs and make sure you are not affected by the changes before saying that you do not need to reboot into the new kernel.
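
For point 4, here is a rough sketch of how you could spot processes that are still running against a library that has since been replaced on disk. It relies on the fact that Linux marks an unlinked file in /proc/<pid>/maps with a trailing "(deleted)". The function names are my own invention for illustration, the pattern only catches paths without spaces that contain ".so", and you need root to see every process. Treat it as a starting point, not a finished tool.

```python
import os
import re

# A mapped file that was unlinked (for example, replaced by a package
# update) shows up in /proc/<pid>/maps with a trailing " (deleted)".
DELETED_LIB = re.compile(r"(/\S*\.so\S*) \(deleted\)$")


def deleted_libs(maps_text):
    """Return the set of shared-library paths mapped but deleted on disk."""
    found = set()
    for line in maps_text.splitlines():
        match = DELETED_LIB.search(line)
        if match:
            found.add(match.group(1))
    return found


def processes_needing_restart(proc_root="/proc"):
    """Map each PID to the deleted libraries it still has loaded."""
    stale = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return stale  # no /proc here (not Linux?), nothing to report
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        maps_path = os.path.join(proc_root, entry, "maps")
        try:
            with open(maps_path, errors="replace") as handle:
                libs = deleted_libs(handle.read())
        except OSError:
            continue  # process exited, or we lack permission
        if libs:
            stale[int(entry)] = libs
    return stale


if __name__ == "__main__":
    for pid, libs in sorted(processes_needing_restart().items()):
        print(pid, ", ".join(sorted(libs)))
```

If this prints half of your process list after a Glibc update, restarting services one by one is probably more work than a scheduled reboot.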

And there are probably more reasons. Those are just some of the things that popped into my head.

I am pretty sure most of you are rolling your eyes and calling me stupid. But I am telling you: those things happen. And you don’t really want to have to deal with them after a catastrophic power failure in your datacenter (or in any other catastrophic situation).

If your system is so critical that you can never do any maintenance on it, then you might have a huge design flaw in your infrastructure. You could minimize the downtime with an HA cluster. However, you will always have *some* downtime unless your application has some kind of active/active ability (some kind of load-balancing setup with a smart backend etc).
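
To make the active/active idea a bit more concrete, here is a minimal sketch of a rolling reboot: take one node out of the pool, reboot it, check it, put it back, then move on to the next. Everything here is hypothetical. `reboot` and `is_healthy` stand in for whatever your load balancer and monitoring actually provide, and `min_in_service` is just my name for "how many nodes must keep serving".

```python
def rolling_reboot(nodes, reboot, is_healthy, min_in_service=1):
    """Reboot nodes one at a time, keeping min_in_service nodes serving.

    reboot and is_healthy are caller-supplied callables (hypothetical
    hooks into your own tooling); each receives a node name.
    """
    in_service = set(nodes)
    for node in nodes:
        # Refuse to continue if draining this node would leave too few.
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError("not enough capacity to drain %s" % node)
        in_service.discard(node)  # drain: stop sending traffic here
        reboot(node)
        if not is_healthy(node):
            # Stop the rollout; the remaining nodes keep serving.
            raise RuntimeError("%s did not come back healthy" % node)
        in_service.add(node)      # node takes traffic again
    return in_service
```

With a single node the function refuses to start, which is exactly the point: without a second node to carry the load, a reboot means downtime.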

Make sure you can schedule downtime for patches (and look into firmware updates as well). Getting dragged into stupid “Linux is better than Windows!” fights and deciding not to reboot because of your pride is, well, stupid 😉

Just my 2 cents….