Linux Scheduler Statistics
/proc/schedstat, format version 3


This version first appeared alongside a patch which makes schedstat a config option for the kernel. It should go without saying that if you turn that config option off, /proc/schedstat does not appear at all. Also, some fields will be zero unless certain other config options are enabled. These are noted below.

If you have scripts for version 2, note that the only difference in version 3 is that a new field was inserted before field 7. (Yes, inserted, not appended.) Porting scripts or programs that read this file therefore consists of mapping every reference to (old) fields 7-20 onto (new) fields 8-21, plus handling the new meaning of field 7.

The new field was inserted rather than appended only because schedstat is not widely in use yet, so the number of tools affected should be small, and insertion allowed all the schedule() statistics to be kept together organizationally.
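The mapping described above is mechanical. A minimal Python sketch (the function name is made up for illustration; it assumes a script that refers to counters by their 1-based field number):

```python
# Hypothetical helper for porting a version-2 schedstat script to version 3.
# Old fields 1-6 keep their numbers; old fields 7-20 become new fields 8-21.
def v2_field_to_v3(field):
    """Map a 1-based version-2 field number to its version-3 field number."""
    return field if field <= 6 else field + 1

# Field 6 is unchanged; old field 7 is now field 8; old field 20 is now 21.
assert v2_field_to_v3(6) == 6
assert v2_field_to_v3(7) == 8
assert v2_field_to_v3(20) == 21
```

Note that new field 7 itself has no version-2 counterpart and must be handled separately.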

Format for version 3 of schedstat:

tag 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
tag is cpuN or totals.

NOTE: In the sched_yield() statistics, the active queue is considered empty if it has only one process in it, since obviously the process calling sched_yield() is that process.
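Each line is simply a tag followed by 21 whitespace-separated counters, so parsing is a matter of splitting on whitespace. A minimal Python sketch (the sample line and its values are made up; real values come from the file):

```python
# Sketch of parsing one version-3 /proc/schedstat line.
# The sample line below is illustrative; counter values are not realistic.
sample = "cpu0 " + " ".join(str(n) for n in range(21))

def parse_schedstat_line(line):
    """Return (tag, counters) where counters[i] holds field i+1."""
    parts = line.split()
    tag, fields = parts[0], [int(x) for x in parts[1:]]
    if len(fields) != 21:
        raise ValueError("expected 21 fields for schedstat format version 3")
    return tag, fields

tag, fields = parse_schedstat_line(sample)
# fields[0] is field 1; fields[20] is field 21.
```

The same function applies to the "totals" line, since it has the same shape as the cpuN lines.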

First four are sched_yield() statistics (fields 1-4):

  1. # of times both the active and the expired queue were empty
  2. # of times just the active queue was empty
  3. # of times just the expired queue was empty
  4. # of times sched_yield() was called

Next four are schedule() statistics (fields 5-8):

  1. # of times the active queue had at least one other process on it.
  2. # of times we switched to the expired queue and reused it
  3. # of processors that still had more than one process runnable when another processor switched to the idle task (requires CONFIG_SMP)
  4. # of times schedule() was called

Next seven are statistics dealing with load_balance() (fields 9-15; requires CONFIG_SMP):

  1. # of times load_balance() was called at an idle tick
  2. # of times load_balance() was called at a busy tick
  3. # of times load_balance() was called from schedule()
  4. # of times load_balance() was called
  5. sum of imbalances discovered (if any) with each call to load_balance()
  6. # of times load_balance() was called when we did not find a "busiest" queue
  7. # of times load_balance() was called from balance_node() (requires CONFIG_NUMA)

Next four are statistics dealing with pull_task() (fields 16-19; requires CONFIG_SMP):

  1. # of times pull_task() moved a task to this cpu
  2. # of times pull_task() stole a task from this cpu
  3. # of times pull_task() moved a task to this cpu from another node (requires CONFIG_NUMA)
  4. # of times pull_task() stole a task from this cpu for another node (requires CONFIG_NUMA)

Last two are statistics dealing with balance_node() (fields 20-21; requires CONFIG_SMP and CONFIG_NUMA):

  1. # of times balance_node() was called
  2. # of times balance_node() was called at an idle tick
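Taken together, the counters support simple derived metrics, for example the fraction of sched_yield() calls that found both queues empty (field 1 over field 4) or the average imbalance discovered per load_balance() call (field 13 over field 12). A sketch, using made-up counter values (`fields[i]` holds field i+1, as a parsed line would provide):

```python
# Sketch of derived metrics from one cpu's version-3 counters.
# The values below are invented purely for illustration.
fields = [0] * 21
fields[0] = 10     # field 1:  sched_yield() found both queues empty
fields[3] = 50     # field 4:  total sched_yield() calls
fields[11] = 200   # field 12: total load_balance() calls
fields[12] = 400   # field 13: sum of imbalances discovered

# Fraction of sched_yield() calls where both queues were empty.
yield_both_empty = fields[0] / fields[3] if fields[3] else 0.0

# Average imbalance found per load_balance() call.
avg_imbalance = fields[12] / fields[11] if fields[11] else 0.0
```

Since the counters only ever increase, meaningful rates are usually computed from the difference between two samples taken some interval apart, rather than from the raw totals.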


Questions to ricklind@us.ibm.com