Uptime Nostalgia

When I first started here at SingleHop, over two years ago, one of my first tasks was to overhaul our aging DNS system. We had been creaking along using zone files, but their number was quickly growing out of control. Also, we rely heavily on automation here, and using scripts to automatically edit files can be quite error-prone.

So, back in April of 2008, we migrated to BIND-DLZ and moved all of our DNS records into MySQL databases. In May, I released a DNS zone parsing script you can use to do the same thing.

This system has been incredibly stable. After a few initial kinks, we haven't even had to think about our authoritative DNS servers at all. During our recent internal DB overhaul, however, we discovered that the version of MySQL hosting the DNS records was too old, and it couldn't be updated because our DNS servers were still running CentOS 4.8. We were going to have to migrate to an entirely new operating system, and doing that with the least downtime meant setting up new servers.

Today, we finished setting up the new DNS servers and switching our infrastructure over to using them. When it came time to decommission our old nameservers, I decided to see just how long they had been running:

[root@ns1 ~]# uptime
16:55:01 up 593 days, 03:12, 2 users, load average: 0.11, 0.06, 0.04
---
[root@ns2 ~]# uptime
16:53:45 up 626 days, 10:40, 3 users, load average: 0.10, 0.08, 0.09

Like many Unix nerds, I tend to get obsessed with uptime. For instance, my own personal server has been up for 99 days, and even my laptop hasn't rebooted in over a month. So it was with a heavy heart that I destroyed the epic 626-day uptime on ns2 when I shut it down today.

These long-running servers got me thinking: how many of our other internal servers have been running for a really long time? It turns out that having servers that run for years isn't unusual here at SingleHop. For instance, here are the uptimes from a few of them.

Our Quick Reaction monitoring server:
20:50:11 up 578 days, 1:30, 2 users, load average: 1.06, 1.25, 1.36

Our customer database server:
20:33:42 up 647 days, 17:35, 1 user, load average: 0.29, 0.53, 0.63

The singlehop.com web server:
20:53:00 up 317 days, 12:40, 4 users, load average: 0.05, 0.07, 0.07

However, the champion among all the servers I checked today is one of our oldest backup servers:

backup02:~# uptime
 20:44:48 up 899 days,  6:41,  1 user,  load average: 0.05, 0.02, 0.07

Considering that SingleHop is not even three years old, a two-and-a-half-year uptime is pretty impressive indeed.
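If you want to check the "two and a half years" arithmetic yourself, here is a rough Python sketch. It only understands the "up N days, HH:MM" form of uptime output shown in this post, and the names in it (UPTIME_LINES, uptime_days, longest_running) are made up for illustration; this is not a tool we actually run.

```python
import re

# Uptime lines copied from this post; the keys are just informal host labels.
UPTIME_LINES = {
    "ns1": "16:55:01 up 593 days, 03:12, 2 users, load average: 0.11, 0.06, 0.04",
    "ns2": "16:53:45 up 626 days, 10:40, 3 users, load average: 0.10, 0.08, 0.09",
    "backup02": " 20:44:48 up 899 days,  6:41,  1 user,  load average: 0.05, 0.02, 0.07",
}

# Assumption: only the "up N days, HH:MM" format is handled. Other formats
# that uptime can print (e.g. for machines up less than a day) are not.
UPTIME_RE = re.compile(r"up\s+(\d+)\s+days?,\s+(\d+):(\d+)")


def uptime_days(line):
    """Return the uptime in fractional days parsed from one uptime line."""
    match = UPTIME_RE.search(line)
    if match is None:
        raise ValueError("unrecognized uptime format: %r" % line)
    days, hours, minutes = (int(part) for part in match.groups())
    return days + hours / 24 + minutes / 1440


def longest_running(lines):
    """Return the label of the host with the largest parsed uptime."""
    return max(lines, key=lambda host: uptime_days(lines[host]))


if __name__ == "__main__":
    for host, line in UPTIME_LINES.items():
        days = uptime_days(line)
        # 365.25 days per year is an approximation that averages in leap years.
        print(f"{host}: {days:.1f} days (~{days / 365.25:.2f} years)")
    print("champion:", longest_running(UPTIME_LINES))
```

Run against the three lines above, it should pick backup02 as the champion, at a little under two and a half years.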

All of this is possible thanks to the magic of Linux. Software updates happen automatically, and the computer does not need to be rebooted to take advantage of them -- only the affected applications need to be restarted. And, thanks to our Ksplice service, we don't even have to reboot our servers to do kernel updates! We can just keep our infrastructure humming without a care, for years at a time!

So, what's YOUR longest-running server? Share your story on SingleHop's community forum: http://community.singlehop.com/general-discussion-introduce-yourself/403-what-your-longest-running-server-we-just-reluctantly-rebooted-899-one.html#post2409