The CPU scheduler is the most performance-critical component of the kernel. It decides which process runs, for how long, and in what order — making these decisions thousands of times per second while trying to simultaneously maximise throughput, minimise latency, ensure fairness, and reduce energy consumption.

The Scheduling Problem

At any moment, dozens or hundreds of processes compete for CPU time. The scheduler must balance competing goals that are often in direct tension:

  • Throughput — complete as many jobs as possible per unit time.
  • Latency — respond to interactive tasks (keystrokes, UI events) quickly.
  • Fairness — no process should wait forever.
  • Energy — on mobile devices, idle CPUs should enter low-power states quickly.

No single algorithm optimises all four simultaneously. Every scheduler makes explicit trade-offs.

Round-Robin: Simple and Fair

The simplest preemptive scheduler. Each process gets a fixed time quantum — typically 10–20ms. When the quantum expires, the scheduler forcibly preempts the running process and moves to the next in a circular queue.

Round-robin is fair in the sense that every process eventually gets CPU time, but it is oblivious to priority, workload type, and how long processes have been waiting. Interactive tasks (which block frequently on I/O) and CPU-bound batch jobs are treated identically.

Priority Scheduling

Each process is assigned a priority level. The scheduler always runs the highest-priority runnable process. Problems: low-priority processes can starve indefinitely if higher-priority ones keep arriving. The fix — ageing — gradually increases a process's priority the longer it waits, eventually guaranteeing it runs.

The Completely Fair Scheduler (CFS)

Linux's default scheduler since kernel 2.6.23. CFS tracks each process's virtual runtime — a weighted measure of how long it has actually run on the CPU. The scheduler always picks the process with the lowest virtual runtime — i.e., the one that has been most deprived of CPU time relative to its share.

CFS stores all runnable processes in a red-black tree, keyed by virtual runtime. The process with the lowest virtual runtime is always the leftmost node — retrievable in O(log n) time. This gives CFS near-perfect fairness without explicit priority queues or complex bookkeeping.

BASH
# Lower nice value = higher priority (range: -20 to 19) $ nice -n -5 ./high_priority_task # more CPU time $ nice -n 15 python3 background_job.py # less CPU time # See the scheduling policy of the current shell $ chrt -p $$ # Inspect CFS scheduling stats for a process $ cat /proc/$(pgrep firefox)/sched

Real-Time Scheduling

Some tasks — audio processing, industrial control systems, kernel interrupt handlers — cannot tolerate scheduling unpredictability. Linux provides two real-time scheduling policies:

  • SCHED_FIFO — run until blocked or explicitly yielded. Highest priority wins absolutely.
  • SCHED_RR — round-robin among real-time processes of the same priority.

Real-time processes preempt all normal (CFS) processes. Misusing them can starve the entire system, which is why setting real-time priority requires CAP_SYS_NICE capability.

The scheduler must work in microseconds, make decisions with incomplete information, and never make the wrong choice for too long. It is perhaps the most difficult piece of systems software to get right.