Virtualizing the CPU

Processes

A process is the primitive underlying abstraction of a running program. An operating system keeps a list of all processes, along with various information about each: its state, register context, parent, open files, and the trapframe for the current interrupt, among other details.

Each process is isolated from the others. The CPU and memory are virtualized by the OS so that each process can pretend it has unlimited access to these resources; in reality each gets only a small slice, with the CPU shared in time and memory shared in space among all processes and the OS itself.

Sys calls

  • fork is used to create a new clone of the current process: all of its memory (heap allocations, stack data, static data, etc.) is copied into a new process.
  • exec is used to replace the current process image with a new, potentially entirely different program.

Used together, these allow the arbitrary creation of new processes, and also enable useful things like shell piping and redirection. The C stdlib functions of the same name ultimately use assembly, calling conventions for where arguments and return values go, and a trap instruction to make the kernel-mode sys call and collect the result back in user mode.
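The fork/exec/wait pattern can be sketched in Python, whose `os` module exposes thin wrappers over these sys calls (a minimal sketch for a Unix system; `/bin/echo` and the argument values are illustrative assumptions):

```python
import os

def run(program, argv):
    """Fork, exec a different program in the child, and wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with an entirely different one.
        os.execv(program, argv)  # does not return on success
    # Parent: block until the forked child completes.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

code = run("/bin/echo", ["echo", "hello from the child"])
```

This mirrors what a shell does for every command you type: fork itself, exec the command in the child, and wait for it to finish.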

Other notable sys calls are wait (wait for forked child process to complete) and pipe (set up a byte stream from one process to another).
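A pipe plus a fork is the same mechanism the shell uses for `a | b`. A toy sketch (again via Python's `os` wrappers, on a Unix system):

```python
import os

r, w = os.pipe()      # r: read end, w: write end of the byte stream
pid = os.fork()
if pid == 0:
    os.close(r)       # child only writes
    os.write(w, b"data through the pipe")
    os._exit(0)
os.close(w)           # parent only reads
data = os.read(r, 1024)
os.close(r)
os.waitpid(pid, 0)    # reap the child
```

A real shell would additionally use dup2 to wire the pipe ends onto stdin/stdout before exec'ing the two commands.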

Limited Direct Execution

LDE is a mechanism used to achieve time sharing of the CPU. The "direct execution" part just means that processes run directly on the CPU. The "limited" part means a couple things:

Restricted Operations

Processes run in user mode, which has fewer privileges than kernel mode: a user-mode process cannot access arbitrary memory regions or perform I/O directly.

  • Instead, at boot time the OS installs a trap table; it informs the hardware of where trap handlers exist in memory. Then, until the next reboot, the hardware knows exactly what code to jump to for sys calls, events, etc.
  • Processes in user mode perform such privileged actions via sys calls into the OS; the OS traps into kernel mode for the duration of the sys call, then returns from the trap back into user mode with the result for the calling process.

Switching Processes

Often, when a process issues a system call for I/O, it also issues a system call named yield, which tells the OS that the CPU can be given to other processes. This is a cooperative approach, and many early OSes simply assumed that processes would all cooperate in good faith.
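The cooperative approach can be mimicked with Python generators, where each `yield` stands in for a yield sys call (a toy model for intuition, not how real schedulers are built; process names and step counts are made up):

```python
def proc(name, steps):
    """A toy 'process' that voluntarily gives up the CPU after each step."""
    for i in range(steps):
        # ... do some work, then yield the CPU back to the scheduler ...
        yield f"{name} step {i}"

def cooperative_schedule(procs):
    """Run each process until it yields, then move on to the next."""
    trace = []
    while procs:
        p = procs.pop(0)
        try:
            trace.append(next(p))
            procs.append(p)    # still runnable; goes to the back of the queue
        except StopIteration:
            pass               # process finished; drop it
    return trace

trace = cooperative_schedule([proc("A", 2), proc("B", 2)])
# A and B interleave because each yields after every step
```

The flaw is visible in the model itself: a process that never yields would keep `next(p)` from ever returning, starving everyone else.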

But what about bad-faith processes, or bugs? We don't want a process that spins in an infinite loop to be able to monopolize the CPU forever. Modern OSes therefore use a non-cooperative approach: at boot time the OS configures a timer device to raise an interrupt at a regular interval, and informs the hardware of what code to run when that interrupt fires. The handler puts the OS back in control of the CPU, where it can make decisions such as continuing or killing the process, switching to another, etc.
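The timer-interrupt idea can be approximated in user space with a POSIX interval timer (Python's `signal.setitimer`, Unix only); the signal handler plays the role of the OS regaining control on each tick. The 10 ms interval and the loop duration are arbitrary illustrative choices:

```python
import signal
import time

ticks = []

def on_tick(signum, frame):
    # The "OS" gets control back here on every timer interrupt.
    ticks.append(time.monotonic())

signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # fire every 10 ms

# "User code" spins in a loop; the timer interrupts it periodically anyway.
deadline = time.monotonic() + 0.05
while time.monotonic() < deadline:
    pass

signal.setitimer(signal.ITIMER_REAL, 0, 0)        # disarm the timer
```

Even though the loop never yields, the handler still runs several times: control is taken from it, not given up by it.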

Speaking of which, another mechanism necessary for such an interrupt is the context switch: the OS saves all of the registers, program counter, etc. (essentially the entire CPU state of the running process), then loads the previously saved context of the process it wants to switch to.

Scheduling: Multi Level Feedback Queue

When designing a scheduling policy, fairness (jobs getting equal CPU time) is generally at odds with response time: CPU-intensive jobs at the same priority as interactive jobs make the interactive ones take much longer. MLFQ is a scheduling policy that does a good job of balancing these constraints:

  1. There are multiple queues ordered by priority.
  2. When a job gets submitted, it starts in the highest priority queue. Once it uses up its CPU allotment for a given level, it moves to the next lower priority queue.
  3. Given jobs A and B, if A's priority is higher than B's, only A runs, not B. If their priorities are equal, they run in round-robin.
  4. Every once in a while, all jobs get bumped back to the highest priority queue (this prevents starvation).
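The rules above can be exercised with a toy simulation (all parameters, such as 3 queue levels and an allotment of 2 slices per level, are made-up illustrative values):

```python
from collections import deque

NUM_LEVELS = 3   # number of priority queues (assumed)
ALLOTMENT = 2    # time slices a job may use per level (assumed)

def mlfq_run(jobs, boost_every=None):
    """jobs: dict of name -> total slices needed. Returns the execution trace."""
    queues = [deque() for _ in range(NUM_LEVELS)]
    remaining = dict(jobs)
    used = {name: 0 for name in jobs}    # slices used at the current level
    level = {name: 0 for name in jobs}
    for name in jobs:
        queues[0].append(name)           # Rule 2: new jobs start at the top
    trace, t = [], 0
    while any(queues):
        if boost_every and t and t % boost_every == 0:
            for q in queues[1:]:         # Rule 4: periodic priority boost
                while q:
                    name = q.popleft()
                    level[name], used[name] = 0, 0
                    queues[0].append(name)
        q = next(q for q in queues if q)  # Rule 3: highest non-empty priority
        name = q.popleft()
        trace.append(name)
        remaining[name] -= 1
        used[name] += 1
        t += 1
        if remaining[name] == 0:
            continue                      # job finished; never re-queued
        if used[name] >= ALLOTMENT and level[name] < NUM_LEVELS - 1:
            level[name] += 1              # Rule 2: demote after the allotment
            used[name] = 0
        queues[level[name]].append(name)
    return trace

trace = mlfq_run({"A": 4, "B": 2})
```

With these inputs, A and B round-robin at the top level until their allotments run out; A then finishes at a lower priority, which is exactly the behavior that keeps short interactive jobs responsive.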

Scheduling: Proportional Share

An alternative strategy is to ensure that processes (or users, or process groups) get a proportional share of the CPU. This is done via a ticket mechanism: first distribute tickets to processes in proportion to the share each should receive. Then:

  1. Lottery: draw a random number in [0, num_global_tickets) and run the process whose ticket range contains the draw. This is simple and efficient and does well in most cases.
  2. Stride Scheduling: give each process a stride inversely proportional to its number of tickets; on every scheduling decision, run the process with the lowest pass value, then advance its pass by its stride. This achieves proportional share precisely, without randomness, but requires keeping more state and doing more computation.
  3. CFS: a more widely used fair-share scheduler. It exposes parameters like sched_latency and min_granularity to stay flexible in many scenarios, lets users set a nice level per process to modify priorities as desired, and uses a red-black tree data structure to keep track of runnable processes very efficiently.
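The lottery draw in (1) can be sketched in a few lines (the 75/25 ticket split and seed are illustrative assumptions):

```python
import random

def lottery_pick(tickets, rng):
    """tickets: dict of name -> ticket count. Returns the winning process."""
    total = sum(tickets.values())
    winner = rng.randrange(total)    # draw in [0, total)
    counter = 0
    for name, count in tickets.items():
        counter += count             # walk the ticket ranges in order
        if winner < counter:
            return name

rng = random.Random(0)               # seeded for reproducibility
tickets = {"A": 75, "B": 25}
runs = [lottery_pick(tickets, rng) for _ in range(10_000)]
share_a = runs.count("A") / len(runs)
```

Over many draws, A's observed share converges on its 75% ticket share; this probabilistic convergence is exactly what stride scheduling replaces with deterministic bookkeeping.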