Author: nfsong (Library, Here I Come)
Board: PCSH91_305
Title: Five Years Of Linux Kernel Benchmarks: 2.6.12 Through 2.6.37
Time: Sat Nov 6 17:40:10 2010
http://www.phoronix.com/scan.php?page=article&item=linux_2612_2637&num=1
While we have conducted studies related to the Linux kernel performance in
the past such as benchmarking up to twelve kernel releases, going out the
door this morning are the results from the largest-ever Linux kernel
comparison conducted at Phoronix, and very likely the largest ever of its
kind regardless of source. Every major Linux kernel release from Linux
2.6.12, which was released in mid-2005, up through the latest Linux 2.6.37
development code was tested. This represents the past five years of the Linux
kernel and shows how the performance has evolved over the past 25 stable
kernel releases and the most recent 2.6.37 development kernel.
Benchmarking 26 kernels was no easy feat with running nearly two dozen tests
each time and each test being run multiple times (usually three to five times
as a minimum). Fortunately, the Phoronix Test Suite combined with an Intel
Core i7 "Gulftown" made this process much faster, easier, and more reliable
than what would otherwise have been possible. A huge thank you goes
out to Intel for supplying Phoronix with the Intel Core i7 970, which is
their 32nm Gulftown processor with six physical cores plus Hyper-Threading to
provide a total of 12 threads. The Core i7 970 has 12MB of L3 cache and is
clocked at 3.20GHz while having a maximum turbo frequency of 3.46GHz. This is
one very fast desktop processor as shown in our Intel Core i7 970 Linux
review and more recently within our LLVMpipe Scaling On Gulftown article
where the performance of this Intel LGA-1366 CPU was looked at when running
Gallium3D's LLVMpipe when enabling 1/2/3/4/5/6/12-threads. While the i7-970
is very fast, it's also very expensive at approximately $900 USD (NewEgg.com
and Amazon.com), but it allowed this major Linux kernel comparison to happen
in just under a week of constant testing, which is significantly less time
than it would have required if using one of the less powerful Intel or AMD
CPUs.
With all of this kernel benchmarking being carried out by the Phoronix Test
Suite, particularly when using the latest Phoronix Test Suite 3.0 "Iveland"
work, benchmarking all of these kernels was not so tedious, and it ensured our
kernel test results were automated, easily reproducible, and statistically
significant, with the tests being carried out multiple times along with other
measures and recent advancements. As this is also being done
with an Iveland snapshot, the new Phoronix Test Suite graphs are being
utilized, which provide error bars on graphs where relevant. The set of tests
we ran on every major kernel release from Linux 2.6.12 through Linux 2.6.37
Git included GnuPG, Gcrypt, OpenSSL, NASA NAS Parallel Benchmarks, TTSIOD 3D
Renderer, C-Ray, Crafty, MAFFT, Himeno, John The Ripper, LAME MP3, 7-Zip,
BYTE, Loopback TCP Network Performance, timed Apache compilation, Apache
web-server, PostMark, FS-Mark, IOzone, Threaded I/O Tester, and PBZIP2.
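The article does not say how the Phoronix Test Suite derives its error bars,
so the following is only a minimal sketch of one common approach: take the
mean of the repeated runs and report the standard error alongside it. The
function name and the sample timings are hypothetical and are not taken from
the test results.

```python
import math
import statistics

def summarize_runs(samples):
    """Return (mean, standard error) for repeated runs of one test."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(samples)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, std_err

# Hypothetical timings (seconds) for three runs of one test on one kernel.
runs = [9.8, 9.9, 10.0]
mean, std_err = summarize_runs(runs)
print(f"{mean:.2f} s +/- {std_err:.2f} s")
```

With only three to five runs per test, the standard error is a rough guide
rather than a precise confidence interval.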
In order to go back all the way to the Linux 2.6.12 kernel, which puts us to
the era of Ubuntu 5.10, SuSE 9.3, Fedora Core 4, and Mandrake 2006
distributions, we decided to go with Fedora Core 4 as the base operating
system. However, in order to run Fedora Core 4 on the Core i7 970 desktop
system rather than 2005-era hardware where this benchmarking process would
have easily taken weeks, we ran all tests from within a Fedora Core 4 virtual
machine. The Core i7 970 host system was running Ubuntu 10.10 64-bit with the
Linux 2.6.35 kernel on an ASRock X58 SuperComputer motherboard, 3GB of DDR3
system memory, an NVIDIA GeForce GTX 460, and a 64GB OCZ Vertex SSD. The
64-bit FC4 virtual machine with its stock packages and installation options
(aside from disabling SELinux) was using KVM virtualization and had access to
all 12 processor threads, 2GB of system memory, and a 32GB disk image. Every
major Linux kernel release was built from source while following the same
kernel configuration options, build procedure, and user-space. We wanted to
go back even further than the Linux 2.6.12 kernel, but our GCC4 build was
having issues compiling some of these even older kernel releases. This 64-bit
FC4 virtual machine, set up for our Linux kernel benchmarking, was the only
active process running on the Gulftown test system.
For some perspective before sharing the results, here is a brief recap of
some of the advancements in each major Linux kernel release over the past
five years, which is condensed from the information found at
KernelNewbies.org.
Linux 2.6.12 [17 June 2005]: The Linux 2.6.12 kernel introduced page-out
throttling, multi-level security for SELinux, address space randomization,
cpusets, I/O barrier support for serial ATA devices, IPv6 support no longer
being marked experimental, hot-pluggable parallel ports, and device mapper
multi-path support.
Linux 2.6.13 [29 August 2005]: Noteworthy features for the Linux 2.6.13
kernel included execute-in-place support, i386 CPU hot-plugging support,
voluntary preemption, inotify, an improved CFQ IO scheduler, DRM support for
VIA Unichrome chipsets, ACL support for NFSv3, kexec and kdump integration,
and a new driver for trusted computing / TPM.
Linux 2.6.14 [27 October 2005]: Introduced in the Linux 2.6.14 kernel was a
NUMA-aware SLAB allocator, SELinux memory improvements, spin-lock
consolidation, detection support for soft-lockups, FUSE (File-system in
User-Space) integration, and initial ATI Radeon R300 3D support.
Linux 2.6.15 [3 January 2006]: The Linux 2.6.15 kernel presented many VFS
changes, page-table scalability improvements, demand faulting for huge pages,
cooperating processes for the anticipatory I/O scheduler, NTFS file-system
write support, and ATA pass-thru with the libata driver. For some
perspective, this is the kernel that was used by Fedora Core 5, Gentoo
2006.0, Ubuntu 6.06 LTS, and others.
Linux 2.6.16 [20 March 2006]: The Linux 2.6.16 kernel release introduced
Oracle's OCFS2 clustering file-system, support for the CELL processor,
support for moving the physical location of pages between nodes on NUMA
systems, high-resolution timers, the configfs file-system, and a number of
driver updates. SuSE 10.1 and Mandriva 2007 picked up this release.
Linux 2.6.17 [17 June 2006]: The third official Linux kernel release of 2006
brought support for Sun's Niagara CPUs, the Broadcom bcm43xx WiFi driver, the
splice I/O mechanism, a scheduler domain for optimizing CPU scheduling on
multi-core systems, block queue I/O tracing, RAID5 re-shaping support, and
performance improvements to EXT3 by mapping multiple blocks at once.
Linux 2.6.18 [20 September 2006]: Some of the Linux 2.6.18 features included
the Lockdep kernel lock validator, SMPnice, swapless page migration, a major
libata driver update with support for features like NCQ (Native Command
Queuing) and hot-plugging, the default I/O scheduler becoming CFQ, and a
variety of new drivers. Fedora Core 6 and PCLinuxOS 2007 shipped with this
kernel.
Linux 2.6.19 [29 November 2006]: The Linux 2.6.19 kernel was the last release
of 2006 and it brought eCryptfs, the first experimental snapshot of EXT4 in
the mainline Linux kernel, physical CPU hot-plug and memory hot-add on
x86_64, support for building x86 kernels with GCC stack protection, vectored
asynchronous I/O, IDE PATA drivers using libata, and the usual
collection of driver updates. This kernel made its way into Ubuntu 7.04, the
Feisty Fawn.
Linux 2.6.20 [5 February 2007]: The first 2007 kernel release for Linux
brought Sony PlayStation 3 support, KVM virtualization, i386
para-virtualization, x86 re-locatable kernel support, I/O accounting, and a
generic HID layer.
Linux 2.6.21 [25 April 2007]: New to the Linux kernel in April of 2007 were
the Virtual Machine Interface (VMI), KVM updates to support
para-virtualization and live migration along with a stable user-space
interface and CPU hot-plug support, and a tick-less kernel (dynticks).
Linux 2.6.22 [8 July 2007]: The release just after the 2007 US Independence
Day was marked by the introduction of the new SLUB allocator, a new wireless
stack, a new IEEE-1394 Firewire stack, and a variety of new Linux hardware
drivers.
This kernel was found with Mandriva Linux 2008 and openSUSE 10.3.
Linux 2.6.23 [9 October 2007]: Those recovering from Oktoberfest in 2007 had
the new CFS process scheduler to look at along with on-demand read-ahead,
LGuest virtualization, a partial merge of Xen virtualization, KVM SMP guests
and speed improvements, and improvements to the experimental EXT4 support.
Fedora 8 shipped Linux 2.6.23.
Linux 2.6.24 [24 January 2008]: Bringing in 2008 was the Linux 2.6.24 kernel
with tick-less kernel support for x86_64, CFS improvements,
anti-fragmentation patches, USB authorization, and x86_32/x86_64 arch
reunification. Ubuntu 8.04 LTS and openSUSE 11.0 used this kernel.
Linux 2.6.25 [17 April 2008]: Some of the highlights for the Linux 2.6.25
kernel included the introduction of the memory resource controller, real-time
group scheduling, RCU pre-emption support, better process memory usage
management, Latencytop support, and more EXT4 file-system updates.
Linux 2.6.26 [13 July 2008]: New in the Linux kernel world in the summer of
'08 were KVM virtualization support on IA64/PowerPC/S390 architectures,
much-improved Linux web-camera support, wireless mesh networking 802.11s
draft support, x86 PAT support, minor updates to EXT3/EXT4, and various other
work.
Linux 2.6.27 [9 October 2008]: This kernel release introduced delayed
allocation support for the EXT4 file-system for improved disk performance,
block layer data integrity support, multi-queue networking support, MMIOtrace
support, support for external firmware, and support for up to 4096 CPUs on
Linux x86. Ubuntu 8.10 used this kernel.
Linux 2.6.28 [25 December 2008]: Just before the end of 2008 came the Linux
2.6.28 kernel, which marked the EXT4 file-system as stable, integrated GEM
(the Graphics Execution Manager), developed by Intel for in-kernel GPU memory
management and a pre-requisite for kernel mode-setting (KMS), and brought
memory management scalability improvements.
Linux 2.6.29 [23 March 2009]: The first Linux kernel release of 2009 brought
support for Intel kernel mode-setting, experimental support for the Btrfs
file-system, SquashFS integration, initial WiMax support, eCryptfs file-name
encryption, and a no journal mode for EXT4. Linux 2.6.29 was also the first
kernel to introduce staging drivers and this release was represented by the
Tuz mascot while Tux took a short holiday.
Linux 2.6.30 [9 June 2009]: Prominent features of this summer 2009 kernel
update included the NILFS2 and EXOFS file-systems, IEEE 802.11w support,
Tomoyo, LZMA/BZIP2 kernel image compression support, and the integrity
management architecture.
Linux 2.6.31 [9 September 2009]: Just before going off to Oktoberfest 2009
was the Linux 2.6.31 kernel release with initial USB 3.0 support, ATI Radeon
kernel mode-setting support, improved desktop interactivity when the system
is under memory pressure, kmemcheck / kmemleak integration, and integration
of the kernel performance counters infrastructure.
Linux 2.6.32 [3 December 2009]: The last kernel release of 2009 brought Btrfs
file-system improvements, memory de-duplication support, ATI Radeon R600/R700
DRM 3D and kernel mode-setting support, a low-latency mode for the CFQ
scheduler, tracing improvements, and run-time power management.
Linux 2.6.33 [24 February 2010]: Finally getting to this year's kernels we
have Linux 2.6.33. The Linux 2.6.33 kernel introduced the Nouveau driver for
finally having open-source DRM/KMS support for NVIDIA graphics processors
within the mainline Linux kernel, support for Xen PV-on-HVM guests,
swappable KSM pages, memory compressed swapping via Compcache, the KMS
page-flipping ioctl, the VMware Virtual GPU driver, and Google Android
support being dropped from the mainline Linux kernel.
Linux 2.6.34 [16 May 2010]: In May of this year we had the Linux 2.6.34
kernel release that brought Btrfs file-system updates, asynchronous
suspend/resume support within the power management code, and basic GPU
switching support.
Linux 2.6.35 [1 August 2010]: Before ending out the summer was the 2.6.35
kernel with support for transparent spreading of incoming network load across
all available CPU cores, direct I/O support for the Btrfs file-system, an
experimental journal mode for the XFS file-system, Intel VA-API H.264/VC1
video acceleration support, ATI Radeon power management, and memory
compaction.
Linux 2.6.36 [20 October 2010]: Coming out just last month was the Linux
2.6.36 kernel with KMS+KDB integration, concurrency-managed work-queues,
Intel Intelligent Power Sharing support, improved VM-related desktop
responsiveness, improved open-source graphics, and AppArmor integration.
Linux 2.6.37 [Unreleased]: For the Linux 2.6.37 kernel testing we used a Git
snapshot of the Linus 2.6 Git tree as of 2010-10-31 (one day short of the
Linux 2.6.37-rc1 kernel release), since this kernel will not be released for
a few months. Some of the new features of the Linux 2.6.37 kernel include a
number of DRM improvements, an Intel Poulsbo driver, Broadcom's open-source
802.11n WiFi driver, various core improvements, and the final elimination of
the Big Kernel Lock (BKL) from the core kernel code.
The number of features introduced into the Linux kernel in just the past five
years, and how they have matured, is quite amazing, but let us now see how
the performance has changed.
Starting with the GnuPG test profile that measures the time to encrypt a 1GB
file, the performance is relatively flat and uninteresting from Linux 2.6.12
through Linux 2.6.29, but with Linux 2.6.30 its performance takes a rather
interesting dive. For around 17 consecutive kernel releases at least, it took
just under 10 seconds for this file encryption task to be carried out within
the 12-threaded KVM-powered virtual machine atop an SSD, but with the Linux
2.6.30 kernel it regressed to about 17 seconds, and that is where it has
stayed up through the very latest Linux 2.6.37 code. Some of the changes
noted in the Linux 2.6.30 kernel change-log for the EXT3 file-system, which
was used throughout the testing process, included replace-on-rename
heuristics for the data=writeback mode, replace-on-truncate heuristics for
data=writeback, and using WRITE_SYNC for commits caused by fsync(). For those
wondering which commit actually caused this measurable and still-outstanding
kernel regression: since we can automatically bisect regressions in the Linux
kernel, we will end up reporting it, as discussed at the end of this
article.
When looking at the Gcrypt library performance with the CAMELLIA256-ECB
cipher, there was virtually no change in performance over the past 26 Linux
kernels that were benchmarked.
The OpenSSL performance is also not impacted by any changes to the Linux
kernel in the past five years at least with our 64-bit multi-core virtual
machine.
The IS.C test within NASA's NPB software was not impacted by the kernel
upgrades too much between releases, but the Linux 2.6.29 kernel was the
fastest by 10% over the slowest kernel, which was Linux 2.6.12.
With the TTSIOD 3D renderer, which is entirely CPU-based, the frame-rate only
fluctuated by a few FPS with the Linux 2.6.30 kernel being the highest point,
but the past seven kernel releases have all been within a one-frame
difference.
C-Ray, which is an always-interesting multi-threaded ray-tracing test, was
somewhat prone to changes in the Linux kernel. The fastest that C-Ray ran
over the course of the 26 tested Linux kernels was with Linux 2.6.18 where
its time was averaged to 87 seconds while the slowest speed was with the
Linux 2.6.30 kernel at 100 seconds. Since the 2.6.30 release, each succeeding
kernel release has slowly worked its way to being faster.
Crafty, an open-source chess engine, has become marginally faster as the
Linux kernel has matured. The Linux 2.6.37 kernel Git code is about 6% faster
than it was running with the Linux 2.6.12 kernel back in 2005.
With the MAFFT multiple-sequence alignment test that deals with a molecular
biology workload, its performance has been virtually unchanged by the past
five years of Linux kernel development.
Over the course of five years of Linux kernel development, the Himeno Poisson
Pressure Solver application is running 9.5% faster on this virtualized setup
running on an Intel Core i7 970.
The Blowfish performance with John The Ripper is up by 5.7% when comparing
the Linux 2.6.12 and Linux 2.6.37 Git versions.
The LAME MP3 encoding performance went virtually unchanged.
The 7-Zip compression benchmark was particularly volatile to changes in the
Linux kernel. With the Linux 2.6.12 kernel 7-Zip reported 17844 MIPS while it
peaked in the Linux 2.6.23 release with 19376 MIPS and ended out at 17206
MIPS, with several vicissitudes to its performance along the way.
The Dhrystone 2 performance was not interesting in terms of kernel changes
causing a performance impact.
Looking at the time to transfer 10GB via the TCP network loop-back interface
was interesting. The performance actually regressed a fair amount within the
Linux kernel over the past few releases. Google made some improvements in the
Linux 2.6.36 kernel to the networking stack, which may partially account for
the improvement we noted in this release (from 55 seconds down to 44
seconds), but with the Linux 2.6.37 kernel (44 seconds) it's still noticeably
behind where this test was running in the Linux 2.6.12 kernel (26 seconds)
and in the Linux 2.6.20 kernel (13 seconds) where it performed the best.
Due to changes in the file-system or another sub-system (this is another test
where the results will likely be bisected), the time it took to compile
Apache dropped along the way and is doing rather well in recent versions of
the Linux kernel.
The Apache benchmark is the start of our file-system-focused benchmarks,
which have always proved to be particularly interesting when comparing
versions of the Linux kernel. The Apache web-serving performance was impacted
greatly between Linux kernel releases as the EXT3 file-system and other parts
of the Linux kernel were altered. With the Linux 2.6.12 kernel, Fedora Core
4 with the Apache server was handling 8290 requests per second while it
peaked in the Linux 2.6.27 kernel at an impressive 17132 requests per second
and has receded in almost every release since. The stable Linux 2.6.36 kernel
was handling 7014 requests per second on average and with the latest Linux
2.6.37 unstable kernel it looks to be up to about 7841 requests per second.
While we know the EXT4 and Btrfs performance is near-constantly changing as
both file-systems mature and new features are added to improve data
integrity and introduce other capabilities, it is interesting that the mature
EXT3 file-system continues to churn as well.
Fortunately, with the PostMark test, the sustained transactions per second
has improved over time after falling to a low point during the Linux 2.6.20
series of releases. With the Linux 2.6.36 kernel we're at 2673 TPS (and 2847
TPS with the Git 2.6.37 code) whereas in the Linux 2.6.12 kernel it was at a
mere 1450 TPS and fell to less than 1000 TPS between the 2.6.20 and 2.6.30
releases.
While it was a bumpy ride along the way, the FS-Mark test, when dealing with
1000 1MB files, is also in a better position with the latest Linux 2.6.37
kernel (there is a nice boost between 2.6.36 and 2.6.37) than where it was
when we began our tests on Linux 2.6.12. Between the Linux 2.6.12 and 2.6.21
releases the FS-Mark performance went from 45MB/s to 67MB/s and currently
it's at 56MB/s.
For looking at the EXT3 Linux disk performance when dealing with large file
reads and writes we have the reliable IOzone. When using IOzone for a 4GB
write with a 64KB block size, the performance has degraded over time. There
are noticeable drops between the Linux 2.6.15 and 2.6.16 kernels, again
between Linux 2.6.21 and 2.6.22, and then the most significant one between
the 2.6.28 and 2.6.29 kernels. Again, though, this is something that can be
automatically bisected by the Phoronix Test Suite. When running on the Linux
2.6.12 kernel with this EXT3-based virtual machine atop an OCZ Vertex SSD,
the write speed was 167MB/s, peaked at 176MB/s with the Linux 2.6.14 kernel,
and now with the Linux 2.6.37 kernel it's around 129MB/s.
The 4GB read performance with IOzone using the same block size was not as
impacted during the testing process, but there still was some fluctuation.
The read performance started out at 194MB/s with the Linux 2.6.12 kernel,
then peaked again with Linux 2.6.14 at 209MB/s, and by the time of hitting
the Linux 2.6.37 kernel we have reached 204MB/s.
We used the Threaded I/O Tester to carry out eight threads of 64MB random
writes. While the numbers were volatile along the way, with the Linux 2.6.37
kernel the threaded random write performance in this configuration was at
8.15MB/s whereas in the Linux 2.6.12 kernel it was 5.83MB/s.
Looking at the time to compress a 256MB file using Parallel BZIP2
compression, the performance fluctuated by less than one second the entire
time as PBZIP2 performance is largely CPU-bound and the kernel had little
influence on this program's performance.
When this testing began last week, it really was unknown what to expect since
we have never carried out a Linux kernel performance comparison at this scale
before nor have we ever seen such a report. Some users and developers that we
talked to beforehand bet the performance of the Linux kernel would drop over
the past five years as the Linux kernel gets increasingly large and more cruft
is added, but in most areas, this is actually not the case. However, at the same
time, in few areas did the Linux kernel performance actually increase over
the 26 versions of the Linux 2.6 kernel that were benchmarked.
As the Linux kernel aged, the performance improved in some areas like with
John The Ripper, Himeno, code compilation performance, PostMark, FS-Mark, and
the Threaded I/O Tester. At least this was the case with the x86_64 Linux
kernel when running a KVM-virtualized copy of Fedora Core 4. Where the Linux
kernel is left being slower at this point is with GnuPG, Loopback TCP Network
Performance, IOzone, and in other areas on a more minuscule scale. For many
of the application benchmarks the Linux kernel advancements caused little
performance change, at least for a multi-core x86_64 system running in a
virtual machine.
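As a rough worked example of the changes summarized above, the sketch below
computes the relative change between the Linux 2.6.12 and Linux 2.6.37 Git
figures quoted in the text. Only numbers stated in the article are used; the
helper name and the table layout are illustrative assumptions, and the 7-Zip
end figure is assumed to refer to the 2.6.37 snapshot.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# (test, unit, Linux 2.6.12 value, Linux 2.6.37 Git value, higher_is_better)
# All figures are the ones quoted in the article text.
results = [
    ("7-Zip", "MIPS", 17844, 17206, True),
    ("Apache", "requests/s", 8290, 7841, True),
    ("PostMark", "TPS", 1450, 2847, True),
    ("IOzone 4GB write", "MB/s", 167, 129, True),
    ("IOzone 4GB read", "MB/s", 194, 204, True),
    ("Threaded I/O random write", "MB/s", 5.83, 8.15, True),
    ("Loopback TCP 10GB", "seconds", 26, 44, False),
]

for name, unit, old, new, higher_is_better in results:
    change = percent_change(old, new)
    # A positive change is an improvement only when higher values are better.
    improved = (change > 0) == higher_is_better
    verdict = "improved" if improved else "regressed"
    print(f"{name:26s} {old:>8} -> {new:>8} {unit:11s} {change:+6.1f}%  {verdict}")
```

This only compares the two endpoints, so it hides the peaks and dips in
between, such as the Apache peak at Linux 2.6.27.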
Fortunately, such large comparisons may be obsolete in the future: since the
introduction of Phoromatic, we have been monitoring the Linux kernel
performance on a daily basis for nearly a year, so organically we will have
these historical measurements in real-time going forward across multiple
systems.
Going forward, this kernel test farm and its capabilities will only continue
to grow. We also do benchmarks on a daily basis for the latest Ubuntu
packages as do other independent organizations leveraging our software for
their specific needs.
For some of these regressions, we will be reporting on their precise causes
in a future article; after all, we are funded by advertising page views plus
premium subscriptions, PayPal tips (also in the form of Augustiner beer at
Oktoberfest), and affiliate shopping links. One of the
many Phoronix Test Suite capabilities is the ability to automatically bisect
a Git tree (and other revision control systems) looking for performance and
functional regressions, which we have already used to find performance
regressions within the Linux kernel down to the precise commit in an
automated manner. Combined with the other Phoronix Test Suite and Phoromatic
features that we will make available to the public and to professional
customers in the near future, these modular capabilities become a powerful
utility for locating such issues with little manual intervention on the
user's part.
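The idea behind bisecting a regression can be sketched as a binary search
over an ordered commit history. This is not the Phoronix Test Suite's
implementation; it is a minimal illustration that assumes the first commit is
known good, the last is known bad, and the regression persists once
introduced. The commit names and the is_bad() check are hypothetical; in
practice that check would build the kernel, boot it, run the benchmark, and
compare the result against a threshold.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search an ordered commit list for the first regressed commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once the regression appears it stays in every later commit.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # regression is at mid or earlier
        else:
            good = mid     # regression is after mid
    return commits[bad]

# Hypothetical history: 1000 commits between a good and a bad release,
# with the regression introduced at commit index 617.
commits = [f"c{i:04d}" for i in range(1000)]
tested = []

def is_bad(commit):
    tested.append(commit)  # record each build-and-benchmark run
    return int(commit[1:]) >= 617

culprit = first_bad_commit(commits, is_bad)
print(culprit, "found after", len(tested), "test runs")
```

Because each step halves the remaining range, the number of kernels that must
be built and benchmarked grows only logarithmically with the number of
commits between two releases.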
Again, special thanks go out to Intel for their support and supplying the
Intel Core i7 970 "Gulftown" that made benchmarking the past 26 Linux kernels
possible in a timely fashion, especially when it comes to bisecting these
regressions with building even more kernels. With the Core i7 970 and a
solid-state drive, it is possible to build the Linux kernel in about four
minutes or less. We are also in the process of doing a similar type of
performance comparison, but on the compiler side with GCC, DragonEgg, and
LLVM/Clang, to see how the performance has evolved as they play a much
greater role in the performance of user-space programs. These compiler
results will be out next week for the Core i7 and other systems.
--
※ Origin: PTT BBS (ptt.cc)
◆ From: 218.161.49.172