Linux Scheduler Statistics

Please note: schedstats is present without a patch in -mm kernels after 2.6.8-rc3-mm1, and in mainline kernels 2.6.9 and later. Earlier versions require a patch.

This patch introduces some scheduler statistics with (so far as I can tell) undetectable impact. It may make it easier to objectively measure changes in the scheduler. It's very scheduler specific, so patches which modify the scheduler may well require this patch to be modified. By the same token, it should be easily extensible should new areas of the scheduler benefit from statistics gathering.

Starting with the patch for 2.5.65, this became a config option (default on, since you took the trouble to apply the patch). In kernels where it is already included, the default is off and you'll need to enable it.

This patch provides a solid framework for modifying or expanding the statistics collected. A per-cpu data structure of counters minimizes the performance impact in what is otherwise a highly critical and sensitive piece of code. The data structures are not just per-cpu, but cache-aligned as well. Testing has not indicated any significant difference in benchmark performance.

Modifications are also made to /proc/<pid>/stat to include scheduling information on a per-process level (in mainline kernels this per-task information appears in a separate /proc/<pid>/schedstat file). The simple program latency.c (see below) can take advantage of these counters to report scheduling effects on individual processes.
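For illustration, here is a minimal sketch of reading those per-task counters. It assumes the mainline three-field /proc/<pid>/schedstat layout (time spent running on the cpu, time spent waiting on a runqueue, number of timeslices run); the helper name is hypothetical and patched-kernel layouts may differ:

```python
def parse_task_schedstat(text):
    """Split the three per-task counters from a /proc/<pid>/schedstat line.

    Mainline kernels report: time spent running on the cpu, time spent
    waiting on a runqueue, and the number of timeslices run on this cpu.
    """
    run_time, wait_time, timeslices = (int(f) for f in text.split())
    return {"run_time": run_time,
            "wait_time": wait_time,
            "timeslices": timeslices}

if __name__ == "__main__":
    # Sample contents as /proc/self/schedstat might read on a kernel
    # with schedstats enabled (values are made up for illustration).
    sample = "123456789 23456789 150"
    print(parse_task_schedstat(sample))
```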

The patch creates a new entry in /proc (schedstat) which yields lines of monotonically increasing counters. The meaning of the various numbers can vary with each version of the patch. Indeed, different versions may have wholly different formats, so full descriptions for each version are available through the links in the table at the bottom of this page. Here's sample output from the latest version.

version 9
timestamp 207856
cpu0 0 0 0 0 69732 1780 117831 668 1 3 17 54469 951 52798 3298 3233 408 14035 240112 196933 71512 1048 2785 0 1 91 91
domain0 000f000f 95641 852 1881 93760 0 562 0 0 562 0 46410 0 36468 9937 5 0 0 2919 40
domain1 ffffffff 1020 525 624 477 0 9 3 4 6 0 0 0 0 0 0 408 408 0 53
cpu1 0 0 0 0 49321 1831 75724 570 1 1 6 34062 1008 32588 3129 3097 371 12690 236210 60357 51152 1301 2044 0 1 82 101
domain0 000f000f 91447 1441 2726 88721 0 741 0 0 741 0 24653 0 23531 1121 1 0 0 2677 8
domain1 ffffffff 1110 578 719 517 0 24 8 8 16 0 0 0 0 0 0 371 371 0 20
cpu2 0 0 0 0 57367 1797 89904 227 1 3 22 41185 963 39842 2964 2964 411 12267 213158 49455 59164 1424 1963 1 0 79 136
domain0 000f000f 96404 1587 2986 93418 0 665 0 0 665 0 30818 0 29558 1260 0 0 0 2405 9
domain1 ffffffff 1127 610 750 493 0 9 1 2 7 0 0 0 0 0 0 411 411 0 24
cpu3 0 0 0 0 55128 1720 86438 64 1 1 16 40123 859 38652 2648 2648 305 11129 205927 58689 56848 1364 1939 0 1 94 112
domain0 000f000f 98108 1421 2770 95338 0 603 0 0 603 0 29683 0 28878 802 3 0 0 2838 12
domain1 ffffffff 1126 596 759 516 0 14 5 5 9 0 0 0 0 0 0 305 305 0 20
cpu4 0 0 0 0 29240 2096 44581 959 0 5 0 19722 395 19085 1356 1356 58 7080 260187 62320 31336 428 1197 0 0 30 63
domain0 00f000f0 79421 1076 1495 77926 0 797 0 0 797 0 13273 0 12586 686 1 0 0 3743 7
domain1 ffffffff 599 235 273 355 0 12 2 2 10 0 0 0 0 0 0 58 58 0 42
cpu5 0 0 0 0 31150 1879 47965 573 0 1 2 21726 397 20973 1264 1264 77 6889 240847 142521 33029 492 1193 0 0 30 55
domain0 00f000f0 84077 1190 1672 82405 0 769 0 0 769 0 14965 0 14932 32 1 0 0 3381 4
domain1 ffffffff 613 209 240 394 0 7 0 0 7 0 0 0 0 0 0 77 77 0 26
cpu6 0 0 0 0 29308 2061 45267 340 0 3 6 20842 379 19879 1115 1115 73 5868 249404 123786 31369 574 984 0 0 31 50
domain0 00f000f0 79128 1245 1812 77316 0 886 0 0 886 0 13928 0 13901 26 1 0 0 3188 4
domain1 ffffffff 623 259 302 357 0 4 0 0 4 0 0 0 0 0 0 73 73 0 34
cpu7 0 0 0 0 25945 1908 40619 102 0 1 0 18027 358 17474 926 926 69 5559 246965 121576 27853 533 909 0 1 28 49
domain0 00f000f0 79082 1203 1724 77358 0 872 0 0 872 0 12795 0 12746 49 0 0 0 3338 5
domain1 ffffffff 539 170 223 357 0 9 1 1 8 0 0 0 0 0 0 69 69 0 37

A typical use of this file is to take a snapshot of it before and after a benchmark run, or periodically during a run if one wishes finer granularity, and then use a program to format the changes between snapshots. I wrote a perl program for the first version, updated with each subsequent version, which formats the results. Like the output above, these scripts are version-dependent, but the tool will tell you if it's looking at a version it can't handle.
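That before-and-after bookkeeping can be sketched in a few lines. This is not the perl tool mentioned above, just an illustrative Python snippet; it handles only the cpuN lines and skips the domainN lines for brevity, and relies only on the counters being cumulative:

```python
def parse_schedstat(text):
    """Map each cpuN line of a /proc/schedstat snapshot to its counters.

    The version/timestamp header lines and the domainN lines are
    skipped for brevity.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("cpu"):
            counters[fields[0]] = [int(f) for f in fields[1:]]
    return counters


def schedstat_delta(before, after):
    """Per-cpu, field-by-field change between two parsed snapshots."""
    return {cpu: [a - b for b, a in zip(before[cpu], vals)]
            for cpu, vals in after.items() if cpu in before}


if __name__ == "__main__":
    # Two abbreviated snapshots; real cpuN lines carry many more fields.
    snap1 = "version 9\ntimestamp 100\ncpu0 10 20 30"
    snap2 = "version 9\ntimestamp 200\ncpu0 14 26 39"
    # Since the counters only ever increase, the difference is exactly
    # the activity that occurred between the two snapshots.
    print(schedstat_delta(parse_schedstat(snap1), parse_schedstat(snap2)))
```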

Although this is an internal web page, most of these same patches are being offered externally. Feel free to redistribute if you wish -- it was developed entirely from scratch by me here at IBM.

SLES9 patches being offered are for internal use only and are not being offered outside the company, because they have not been cleared by OSSC.

Latency counts were inspired by prototypes from Bill Irwin (wli@holomorphy.com). Converted to use seq_file by Steve Hemminger (OSDL).

Schedstats patch                                Version of output   Auxiliary program(s)
2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5        version 4           stats-4, latency.c
2.6.7, 2.6.8.1                                  version 9           stats-9, latency.c
2.6.9, 2.6.10, 2.6.11.*, SLES9 SP1, SLES9 SP3   version 10          stats-10, latency.c
2.6.12.*                                        version 11          stats-11, latency.c
2.6.13.* through 2.6.19.*                       version 12          stats-12, latency.c
2.6.20.* through 2.6.29.*                       version 14          stats-14 (not yet available), latency.c
2.6.30.*, 2.6.31.*, 2.6.32.*                    version 15          stats-15 (not yet available), latency.c

Schedstat patches for other kernel versions are available.


Questions to ricklind@us.ibm.com. Last updated Feb 15, 2010.