Lecture 2: Architectural Support for Operating Systems

Geoffrey M. Voelker
Administrivia

- **Project 0**
  - Due 10/9 11:59pm, done individually
- **Homework #1**
  - Due 10/15
  - Submit via Gradescope (entry code on Piazza)
- **Project groups**
  - We will use a Google form to collect group members
  - Just need one submission per group
  - Fill out even if you are working alone
- **Lab hours**
  - Posted as a Google calendar (Canvas, Piazza, course page)
Why Start With Architecture?

• Operating system functionality fundamentally depends upon the architectural features of the computer
  ♦ Key goals are to enforce protection and resource sharing
  ♦ If done well, applications can be oblivious to HW details
  ♦ Unfortunately for us, the OS is left holding the bag

• Architectural support can greatly simplify – or complicate – OS tasks
  ♦ Early PC operating systems (DOS, MacOS) lacked virtual memory in part because the architecture did not support it
  ♦ Early Sun 1 computers used two M68000 CPUs to implement virtual memory (M68000 did not have VM hardware support)
Types of Arch Support

- Manipulating privileged machine state
  - Protected instructions
  - Manipulate device registers
  - Manage memory protection (e.g., TLB entries)
- Generating and handling “events”
  - System calls, interrupts, exceptions, etc.
  - Respond to external events
  - CPU requires software intervention to handle a fault or trap
- Mechanisms to handle concurrency, synchronization
  - Interrupts, atomic instructions
Protected Instructions

- A subset of instructions of every CPU is restricted to use only by the OS
  - Known as protected (privileged) instructions
- Only the operating system can …
  - Directly access I/O devices (disks, printers, etc.)
    - Security, fairness (why?)
  - Manipulate memory management state
    - Page table pointers, page protection, TLB management, etc.
  - Manipulate protected control registers
    - Kernel mode, interrupt level
  - Halt instruction
    - Protection (why?)
INVLPG—Invalidate TLB Entries

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Instruction</th>
<th>Op/En</th>
<th>64-Bit Mode</th>
<th>Compat/Leg Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>0F 01/7</td>
<td>INVLPG m</td>
<td>M</td>
<td>Valid</td>
<td>Valid</td>
<td>Invalidate TLB entries for page containing m.</td>
</tr>
</tbody>
</table>

**NOTES:**
* See the IA-32 Architecture Compatibility section below.

**Instruction Operand Encoding**

<table>
<thead>
<tr>
<th>Op/En</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th>
</tr>
</thead>
<tbody>
<tr>
<td>M</td>
<td>ModRM:r/m (r)</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
</tr>
</tbody>
</table>

**Description**

Invalidate any translation lookaside buffer (TLB) entries specified with the source operand. The source operand is a memory address. The processor determines the page that contains that address and flushes all TLB entries for that page.

The INVLPG instruction is a privileged instruction. When the processor is running in protected mode, the CPL must be 0 to execute this instruction.

The INVLPG instruction normally flushes TLB entries only for the specified page; however, in some cases, it may flush more entries, even the entire TLB. The instruction is guaranteed to invalidate only TLB entries associated with the current PCID. (If PCIDs are disabled — CR4.PCIDE = 0 — the current PCID is 000H.) The instruction also invalidates any global TLB entries for the specified page, regardless of PCID.

For more details on operations that flush the TLB, see “MOV—Move to/from Control Registers” and Section 4.10.4.1, “Operations that Invalidate TLBs and Paging-Structure Caches,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.

This instruction’s operation is the same in all non-64-bit modes. It also operates the same in 64-bit mode, except if the memory address is in non-canonical form. In this case, INVLPG is the same as a NOP.

**IA-32 Architecture Compatibility**

The INVLPG instruction is implementation dependent, and its function may be implemented differently on different processors. The INVLPG instruction is included in all IA-32 processors.
OS Protection

• How do we know if a protected instruction can execute?
  ♦ Architecture must support (at least) two modes of operation: kernel mode and user mode
    » VAX, x86 support four modes; earlier architectures even more
    » Why? Protect the OS from itself (software engineering)
  ♦ Mode is indicated by a status bit in a protected control register
  ♦ User programs execute in user mode
  ♦ OS executes in kernel, privileged mode (OS == “kernel”)

• Protected instructions only execute in kernel mode
  ♦ CPU checks mode bit when protected instruction executes
  ♦ Setting mode bit must be a protected instruction
  ♦ Attempts to execute in user mode are detected and prevented
Memory Protection

- OS must be able to protect programs from each other
- OS must protect itself from user programs
- May or may not protect user programs from OS
  - Raises question of whether programs should trust the OS
  - Untrusted operating systems? (Intel SGX)
- Memory management hardware provides memory protection mechanisms
  - Page table pointers, page protection, segmentation, TLB
- Manipulating memory management hardware uses protected (privileged) operations
Events

- An event is an unnatural change in control flow
  - Events immediately stop current execution
  - Changes mode, context (machine state), or both
- The kernel defines a handler for each event type
  - Event handlers always execute in kernel mode
  - The specific types of events are defined by the machine
- Once the system is booted, all entry to the kernel occurs as the result of an event
  - In effect, the operating system is one big event handler
  - OS only executes in reaction to events
Categorizing Events

- Two kinds of events, **interrupts** and **exceptions**
- Interrupts are caused by an external event
  - Device finishes I/O, timer expires, etc.
  - Analogy: Receiving a phone call
- Exceptions are caused by executing instructions
  - CPU requires software intervention to handle a fault or trap
- Two reasons for events, **unexpected** and **deliberate**
- Unexpected events are, well, unexpected
  - What is an example?
- Deliberate events are scheduled by OS or application
  - Why would this be useful?
Categorizing Events (2)

- This gives us a convenient table:

<table>
<thead>
<tr>
<th></th>
<th>Unexpected</th>
<th>Deliberate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exceptions (sync)</td>
<td>fault</td>
<td>syscall trap</td>
</tr>
<tr>
<td>Interrupts (async)</td>
<td>interrupt</td>
<td>software interrupt</td>
</tr>
</tbody>
</table>

- Terms may vary by OSes, CPU architectures
- Software interrupt
  - Asynchronous system trap (AST), asynchronous or deferred procedure call (APC or DPC)
  - Used by the OS to defer work until all hardware interrupts have been handled
Faults

- Hardware detects and reports *exceptional* conditions
  - Page fault, unaligned access, divide by zero
- Upon exception, hardware *faults* (verb)
  - Must save state (PC, regs, mode, etc.) so that the faulting process can be restarted
- Modern OSes use VM faults for many functions
  - Debugging, end-of-stack, garbage collection, copy-on-write
- Fault exceptions are a performance optimization
  - Could detect faults by inserting extra instructions into code (think array bounds checking), but at a significant performance penalty
Handling Faults (Recovery)

- Some faults are handled by “fixing” the exceptional condition and returning to the faulting context
  - Page faults cause the OS to bring missing pages into memory
  - Fault handler resets PC of faulting context to re-execute instruction that caused the page fault
- Some faults are handled by notifying the process
  - Fault handler changes the saved context to transfer control to a user-mode handler on return from fault
  - Handler must be registered with OS
  - Unix signals or Win user-mode Async Procedure Calls (APCs)
    » SIGALRM, SIGHUP, SIGTERM, SIGSEGV, etc.
Handling Faults (Termination)

• The kernel may handle unrecoverable faults by killing the user process
  ♦ Program fault with no registered handler
  ♦ Halt process, write process state to file, destroy process
  ♦ In Unix, the default action for many signals (e.g., SIGSEGV)

• What about faults in the kernel?
  ♦ Dereference NULL, divide by zero, undefined instruction
  ♦ These faults considered fatal, operating system crashes
  ♦ Unix panic, Windows “Blue screen of death”
    » Kernel is halted, state dumped to a core file, machine locked up
(Live Demo)
System Calls

- For a user program to do something “privileged” (e.g., I/O) it must call an OS procedure
  - Known as crossing the protection boundary, or protected procedure call, or protected control transfer
- CPU ISA provides a system call instruction that:
  - Causes an exception, which vectors to a kernel handler
  - Passes a parameter determining the system routine to call
  - Saves caller state (PC, regs, mode) so it can be restored
  - Returning from system call restores this state
- Requires architectural support to:
  - Restore saved state, reset mode, resume execution
**INT n/INTO/INT 3—Call to Interrupt Procedure**

<table>
<thead>
<tr>
<th>Opcode</th>
<th>Instruction</th>
<th>Op/En</th>
<th>64-Bit Mode</th>
<th>Compat/Leg Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>CC</td>
<td>INT 3</td>
<td>NP</td>
<td>Valid</td>
<td>Valid</td>
<td>Interrupt 3—trap to debugger.</td>
</tr>
<tr>
<td>CD ib</td>
<td>INT imm8</td>
<td>I</td>
<td>Valid</td>
<td>Valid</td>
<td>Interrupt vector specified by immediate byte.</td>
</tr>
<tr>
<td>CE</td>
<td>INTO</td>
<td>NP</td>
<td>Invalid</td>
<td>Valid</td>
<td>Interrupt 4—if overflow flag is 1.</td>
</tr>
</tbody>
</table>

**Instruction Operand Encoding**

<table>
<thead>
<tr>
<th>Op/En</th>
<th>Operand 1</th>
<th>Operand 2</th>
<th>Operand 3</th>
<th>Operand 4</th>
</tr>
</thead>
<tbody>
<tr>
<td>NP</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
</tr>
<tr>
<td>I</td>
<td>imm8</td>
<td>NA</td>
<td>NA</td>
<td>NA</td>
</tr>
</tbody>
</table>

**Description**

The INT n instruction generates a call to the interrupt or exception handler specified with the destination operand (see the section titled “Interrupts and Exceptions” in Chapter 6 of the *Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1*). The destination operand specifies a vector from 0 to 255, encoded as an 8-bit unsigned immediate value. Each vector provides an index to a gate descriptor in the IDT. The first 32 vectors are reserved by Intel for system use. Some of these vectors are used for internally generated exceptions.

The INT n instruction is the general mnemonic for executing a software-generated call to an interrupt handler. The INTO instruction is a special mnemonic for calling overflow exception (#OF), exception 4. The overflow interrupt checks the OF flag in the EFLAGS register and calls the overflow interrupt handler if the OF flag is set to 1. (The INTO instruction cannot be used in 64-bit mode.)

The INT 3 instruction generates a special one byte opcode (CC) that is intended for calling the debug exception handler. (This one byte form is valuable because it can be used to replace the first byte of any instruction with a breakpoint, including other one byte instructions, without over-writing other code). To further support its function as a debug breakpoint, the interrupt generated with the CC opcode also differs from the regular software interrupts as follows:
Thumb Instruction Details

A6.7.136 SVC (formerly SWI)


Use it as a call to an operating system to provide a service.

Encoding T1: All versions of the Thumb ISA.

\[
\begin{array}{|c|c|c|c|c|c|c|c|c|}
\hline
15 & 14 & 13 & 12 & 11 & 10 & 9 & 8 & 7\;\cdots\;0 \\
\hline
1 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & \text{imm8} \\
\hline
\end{array}
\]

imm32 = ZeroExtend(imm8, 32);

// imm32 is for assembly/disassembly, and is ignored by hardware. SVC handlers in some
// systems interpret imm8 in software, for example to determine the required service.
Nachos (test/start.s)

```asm
/* ---------------------------------------------
 * System call stubs:
 * Assembly language assist to make system calls to the Nachos kernel.
 * There is one stub per system call, that places the code for the
 * system call into register r2, and leaves the arguments to the
 * system call alone (in other words, arg1 is in r4, arg2 is
 * in r5, arg3 is in r6, arg4 is in r7)
 * 
 * The return value is in r2. This follows the standard C calling
 * convention on the MIPS.
 * ---------------------------------------------*/

#define SYSCALLSTUB(name, number) \
.globl name ; \
.ent name ; \
name: \
    addiu $2,$0,number ; \
    syscall ; \
    j $31 ; \
.end name
```
System Call

- User level: Firefox calls read()
- Trap to kernel mode, save state
- Trap handler finds the read handler in the vector table
- read() kernel routine runs
- Restore state, return to user level, resume execution
Introduction

System calls are the services provided by the Linux kernel. C programs usually reach them through functions in libc, which provides wrappers for many system calls. Section 2 of the manual pages documents system calls; for an overview, run "man 2 intro" in a shell.

It is also possible to invoke a system call directly with the syscall() function, which is declared in <unistd.h>. Each system call has a number, defined as a SYS_* constant in <sys/syscall.h> (also reachable as <syscall.h>). Internally, on 32-bit x86 (i386) Linux, a system call is traditionally invoked with software interrupt 0x80 to transfer control to the kernel. The system call table is defined in the Linux kernel source file "arch/i386/kernel/entry.S".

System Call Example

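```c
#define _GNU_SOURCE         /* make glibc declare syscall() in <unistd.h> */
#include <stdio.h>
#include <sys/syscall.h>    /* SYS_getpid and the other system call numbers */
#include <sys/types.h>      /* pid_t */
#include <unistd.h>         /* syscall(), getpid() */

int main(void) {
    long ID1, ID2;

    /* Direct system call through the generic syscall() function.
     * SYS_getpid is system call number 20 on 32-bit x86 (i386);
     * the number differs on other architectures (e.g., x86-64). */
    ID1 = syscall(SYS_getpid);
    printf("syscall(SYS_getpid)=%ld\n", ID1);

    /* The same system call through its libc wrapper. */
    ID2 = (long)getpid();
    printf("getpid()=%ld\n", ID2);

    return 0;
}
```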

System Call Quick Reference

<table>
<thead>
<tr>
<th>No. (i386)</th>
<th>Func Name</th>
<th>Description</th>
<th>Source</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>exit</td>
<td>terminate the current process</td>
<td>kernel/exit.c</td>
</tr>
<tr>
<td>2</td>
<td>fork</td>
<td>create a child process</td>
<td>arch/i386/kernel/process.c</td>
</tr>
<tr>
<td>3</td>
<td>read</td>
<td>read from a file descriptor</td>
<td>fs/read_write.c</td>
</tr>
<tr>
<td>4</td>
<td>write</td>
<td>write to a file descriptor</td>
<td>fs/read_write.c</td>
</tr>
<tr>
<td>5</td>
<td>open</td>
<td>open a file or device</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>6</td>
<td>close</td>
<td>close a file descriptor</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>7</td>
<td>waitpid</td>
<td>wait for process termination</td>
<td>kernel/exit.c</td>
</tr>
<tr>
<td>8</td>
<td>creat</td>
<td>create a file or device (&quot;man 2 open&quot; for information)</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>9</td>
<td>link</td>
<td>make a new name for a file</td>
<td>fs/namei.c</td>
</tr>
<tr>
<td>10</td>
<td>unlink</td>
<td>delete a name and possibly the file it refers to</td>
<td>fs/namei.c</td>
</tr>
<tr>
<td>11</td>
<td>execve</td>
<td>execute program</td>
<td>arch/i386/kernel/process.c</td>
</tr>
<tr>
<td>12</td>
<td>chdir</td>
<td>change working directory</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>13</td>
<td>time</td>
<td>get time in seconds</td>
<td>kernel/time.c</td>
</tr>
<tr>
<td>14</td>
<td>mknod</td>
<td>create a special or ordinary file</td>
<td>fs/namei.c</td>
</tr>
<tr>
<td>15</td>
<td>chmod</td>
<td>change permissions of a file</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>16</td>
<td>lchown</td>
<td>change ownership of a file</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>18</td>
<td>stat</td>
<td>get file status</td>
<td>fs/stat.c</td>
</tr>
<tr>
<td>19</td>
<td>lseek</td>
<td>reposition read/write file offset</td>
<td>fs/read_write.c</td>
</tr>
<tr>
<td>20</td>
<td>getpid</td>
<td>get process identification</td>
<td>kernel/sched.c</td>
</tr>
<tr>
<td>21</td>
<td>mount</td>
<td>mount filesystems</td>
<td>fs/mount.c</td>
</tr>
<tr>
<td>22</td>
<td>umount</td>
<td>unmount filesystems</td>
<td>fs/mount.c</td>
</tr>
<tr>
<td>23</td>
<td>setuid</td>
<td>set real user ID</td>
<td>kernel/sys.c</td>
</tr>
<tr>
<td>24</td>
<td>getuid</td>
<td>get real user ID</td>
<td>kernel/sys.c</td>
</tr>
<tr>
<td>25</td>
<td>stime</td>
<td>set system time and date</td>
<td>kernel/time.c</td>
</tr>
<tr>
<td>26</td>
<td>ptrace</td>
<td>allows a parent process to control the execution of a child process</td>
<td>arch/i386/kernel/ptrace.c</td>
</tr>
<tr>
<td>27</td>
<td>alarm</td>
<td>set an alarm clock for delivery of a signal</td>
<td>kernel/sched.c</td>
</tr>
<tr>
<td>28</td>
<td>fstat</td>
<td>get file status</td>
<td>fs/stat.c</td>
</tr>
<tr>
<td>29</td>
<td>pause</td>
<td>suspend process until signal</td>
<td>arch/i386/kernel/sys_i386.c</td>
</tr>
<tr>
<td>30</td>
<td>utime</td>
<td>set file access and modification times</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>33</td>
<td>access</td>
<td>check user's permissions for a file</td>
<td>fs/open.c</td>
</tr>
<tr>
<td>34</td>
<td>nice</td>
<td>change process priority</td>
<td>kernel/sched.c</td>
</tr>
<tr>
<td>36</td>
<td>sync</td>
<td>update the super block</td>
<td>fs/buffer.c</td>
</tr>
<tr>
<td>37</td>
<td>kill</td>
<td>send signal to a process</td>
<td>kernel/signal.c</td>
</tr>
<tr>
<td>38</td>
<td>rename</td>
<td>change the name or location of a file</td>
<td>fs/namei.c</td>
</tr>
<tr>
<td>39</td>
<td>mkdir</td>
<td>create a directory</td>
<td>fs/namei.c</td>
</tr>
<tr>
<td>40</td>
<td>rmdir</td>
<td>remove a directory</td>
<td>fs/namei.c</td>
</tr>
<tr>
<td>41</td>
<td>dup</td>
<td>duplicate an open file descriptor</td>
<td>fs/fcntl.c</td>
</tr>
<tr>
<td>42</td>
<td>pipe</td>
<td>create an interprocess channel</td>
<td>arch/i386/kernel/process.c</td>
</tr>
<tr>
<td>43</td>
<td>times</td>
<td>get process times</td>
<td>kernel/sys.c</td>
</tr>
<tr>
<td>45</td>
<td>brk</td>
<td>change the amount of space allocated for the calling process's data segment</td>
<td>mm/mmap.c</td>
</tr>
<tr>
<td>46</td>
<td>setgid</td>
<td>set real group ID</td>
<td>kernel/sys.c</td>
</tr>
<tr>
<td>47</td>
<td>getgid</td>
<td>get real group ID</td>
<td>kernel/sys.c</td>
</tr>
<tr>
<td>48</td>
<td>signal</td>
<td>ANSI C signal handling</td>
<td>kernel/signal.c</td>
</tr>
<tr>
<td>49</td>
<td>geteuid</td>
<td>get effective user ID</td>
<td>kernel/sched.c</td>
</tr>
<tr>
<td>50</td>
<td>getegid</td>
<td>get effective group ID</td>
<td>kernel/sched.c</td>
</tr>
</tbody>
</table>
Referencing OS Data

- Processes and the OS are in different address spaces
  - How can the OS return references to kernel data structures?
- A naming issue
  - Use integer object handles or descriptors
    - Unix file descriptors, Windows HANDLEs
    - Only meaningful as parameters to other system calls
  - Also called capabilities (more later when we do protection)
Interrupts

• Interrupts signal asynchronous events
  ♦ I/O hardware interrupts
  ♦ Software and hardware timers

• Interrupts on modern CPUs are precise
  ♦ CPU transfers control only on instruction boundaries
Timer

- The timer is critical for an operating system
- It is the fallback mechanism by which the OS reclaims control over the machine
  - Timer is set to generate an interrupt after a period of time
    » Setting timer is a privileged instruction
  - When timer expires, generates an interrupt
  - Handled by kernel, which controls what runs next
    » Basis for OS scheduler (more later…)
- Prevents infinite loops
  - OS can always regain control from erroneous or malicious programs that try to hog CPU
- Also used for time-based functions (e.g., `sleep`)
I/O Control

• I/O issues
  ♦ Initiating an I/O
  ♦ Completing an I/O

• Initiating an I/O
  ♦ Special instructions
  ♦ Memory-mapped I/O
    » Device registers mapped into address space
    » Writing to address sends data to I/O device
I/O Completion

- Interrupts are the basis for asynchronous I/O
  - OS initiates I/O
  - Device operates independently of rest of machine
  - Device sends an interrupt signal to CPU when done
  - OS maintains a vector table containing a list of addresses of kernel routines to handle various events
  - CPU looks up the kernel address indexed by the interrupt number and transfers control to that routine
- If you have ever installed early versions of Windows, you now know what IRQs are for
Synchronization

- Interrupts cause difficult problems
  - An interrupt can occur at any time
  - A handler can execute that interferes with code that was interrupted
- OS must be able to synchronize concurrent execution
- Need to guarantee that short instruction sequences execute atomically
  - Disable interrupts – turn off interrupts before sequence, execute sequence, turn interrupts back on
  - Special atomic instructions – read/modify/write a memory address atomically
    » XCHG instruction on x86
Summary

• Protection
  ◆ User/kernel modes
  ◆ Protected instructions

• System calls
  ◆ Used by user-level processes to access OS functions
  ◆ Access what is “in” the OS

• Exceptions
  ◆ Unexpected event during execution (e.g., divide by zero)

• Interrupts
  ◆ Timer, I/O
Next Time...

- Read Chapters 4-6 (Processes)
- Start on Project 0