[SSLUG] Report: Results of the Linux uptime survey (fwd)
Hi all,
Last week (I think) Winfried Truemper asked on COLA about the stability of
Linux. I took part in the survey, and he has sent me the results.
I am forwarding them to SSLUG. Stability must surely be an important
criterion for `industry'. Note that the maximum uptime is 351 days,
which has to be called impressive. Unfortunately, very few machines took
part in the survey, so the statistics are uncertain.
Regards
Kneth
+-------------------------------------------------------------+
| Kenneth Geisshirt Ph.D.-student |
| Dept. of Life Sciences and Chemistry Roskilde University |
+-------------------------------------------------------------+
| Linux - the choice of the GNU generation |
+-------------------------------------------------------------+
---------- Forwarded message ----------
Date: Thu, 21 Aug 1997 00:17:39 +0200 (MET DST)
From: Winfried Truemper <winni@xpilot.org>
To: Winfried Truemper <winni@shop.de>
Subject: Report: Results of the Linux uptime survey
[ Before I post the results in c.o.l.a. I wanted to inform all
participants. Strategic comments are welcome. -Winfried ]
Thanks to everybody who took part in the uptime survey of Linux machines.
Currently I'm developing a WWW form for easier and standardized data
acquisition. Please submit any future reports about uptimes only through
this form; it will be announced separately in comp.os.linux.announce.
Here are the results as of August 20th, 1997. The numbers come first;
explanations follow below.
Number of machines analysed: 61 (submitted by 50 people)
Machines with high uptime: 31 (reaching uptimes >= 100 days)
of them running 1.2: 14 (= all 1.2 machines)
The above is quite impressive: over 50% of the Linux machines surveyed ran
for more than 100 days without an error that brought the system down. All
machines are used in production environments and heavily stressed in various
respects (networking, number-crunching, graphics). A sample of 61 analysed
machines is not a sound statistical base, though.
Curious: all machines running 1.2 reached a high uptime; IMHO
this shows the consistent quality of Linux. It could be misleading for
people new to Linux, though: please note that stock Linux 1.2 has a number
of security bugs and shouldn't be used in new installations. Support for
Linux 1.2 will vanish in the foreseeable future. Patches for 1.2.13 are
still available from http://trishul.sci.gu.edu.au/~tony/linux/patches.html,
though.
Highest uptime: 351 days (followed by 341, 311, 295, 286, 249, 229, 228)
Average uptime: 44 days (between clustered reboots)
Average time of observation: 209 days
I'm sure there are machines with an even higher uptime. The machine with
the highest reported uptime is running 1.2.13 and acts as a POP server for
a very large company (not in the computer business).
Overall reboots: 620
Reboots on the same day: 324
Days with reboots: 296 (= clustered reboots)
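
The clustering behind these three numbers can be sketched in a few lines
of Python: reboots that fall on the same calendar day are counted as one
cluster. This is only a minimal sketch; the function name
`count_clustered_reboots` and the sample timestamps are hypothetical and
are not taken from the survey data.

```python
from datetime import datetime


def count_clustered_reboots(reboot_times):
    """Return (overall reboots, days with at least one reboot)."""
    # Reboots on the same calendar day collapse into one cluster.
    days_with_reboots = {t.date() for t in reboot_times}
    return len(reboot_times), len(days_with_reboots)


# Hypothetical sample: three reboots during one evening of upgrades,
# plus a single reboot months later.
sample = [
    datetime(1997, 3, 1, 18, 5),
    datetime(1997, 3, 1, 18, 40),
    datetime(1997, 3, 1, 19, 10),
    datetime(1997, 6, 12, 9, 0),
]

overall, clustered = count_clustered_reboots(sample)
same_day_repeats = overall - clustered
print(overall, clustered, same_day_repeats)
```

With the survey figures the same relation holds: 620 overall reboots
minus 296 days with reboots leaves 324 reboots on the same day.
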
Jonathan Larmour wrote about the frequency of reboots:
It may seem like a lot, but if you actually look closely at it, the
majority of reboots are in a cluster. This is normally when I play
around with upgrades out of hours, to reduce noticed
downtime. These sometimes don't work quite right, so I need to
reboot again.
Therefore the average uptime was computed as the sum of all uptime days
(13180) divided by the number of clustered reboots (296). I believe this
algorithm is fair because only a minority of participants had high
availability (minimizing outage time) in mind. Instead, they rebooted Linux
frequently after hardware upgrades or distribution updates to check the
modified setup.
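
As a worked example of this computation, the following Python sketch
divides the total uptime days by the clustered reboots. The figures are
the ones quoted in this report; the function name and the "naive" variant
(dividing by all 620 reboots) are assumptions added here only for
comparison and were not part of the survey's method.

```python
def average_uptime_days(total_uptime_days, reboot_count):
    """Average uptime in days between reboots."""
    return total_uptime_days / reboot_count


TOTAL_UPTIME_DAYS = 13180   # sum of all uptime days (from the report)
CLUSTERED_REBOOTS = 296     # days with reboots (from the report)
OVERALL_REBOOTS = 620       # every single reboot (from the report)

clustered_avg = average_uptime_days(TOTAL_UPTIME_DAYS, CLUSTERED_REBOOTS)
# Comparison only (not the survey's method): count every reboot separately.
naive_avg = average_uptime_days(TOTAL_UPTIME_DAYS, OVERALL_REBOOTS)

print(f"clustered: {clustered_avg:.1f} days, naive: {naive_avg:.1f} days")
```

The clustered figure comes out at about 44.5 days, which the report gives
as 44; counting every reboot separately would roughly halve it.
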
Additional comments from Larry Doolittle:
A plot of reboot frequency vs. time out of the box might be
interesting in a larger context, though. It does take a finite
amount of time fiddling with new equipment to get things "right".
A nice part of Linux is that you can buy a VA research machine that
has been pre-fiddled with, so you can bypass that time and effort.
Even if you do it yourself, the additional effort to set up another
server (running on identical hardware) is small, whereas with (e.g.)
NT, additional licenses are expensive.
And in fact, in many cases there is a period of many reboots within a range
of 3 days after installation or hardware upgrades. My personal impression,
though, is that the reboot frequency depends on the experience of the
administrator. That seems plausible, but the collected data is incomplete
in this respect and allows no correlation.
The reasons for downtimes were:
  Overall reboots:        620   100%
  Unknown reason:         421    67%

  For known reason:       199   100%   (percentages below are relative
                                        to the known reasons)
    Upgrades              102    51%
      Hardware             38    19%
      Software             21    11%
      Kernel               43    21%
    Failures               75    38%
      Power                38    19%
      Hardware              4     2%
      Software             22    11%
      Kernel               11     6%
    Moving machine         11     6%
    Other                  10     5%
Conclusion: never hook your computer to the same circuit as your coffee
machine (downtimes around 9am), your desk lamp, your refrigerator, etc.
A UPS (uninterruptible power supply) may help in cases of short power
outages (e.g. in rural areas). Secure the main power switch against your
baby and your knee (oops!), though.
Software failures which required (?) a reboot were mainly caused by
the file servers nfsd and netatalk. In my experience, it is sufficient to
unload and reload the kernel appletalk module to get netatalk working again.
"nfsd" can be killed with "killall -KILL nfsd" and restarted without
rebooting the whole system (hint).