Consolidated lecture notes for CS422 as taught in the Spring 2007 semester at Yale by Jim Aspnes.

Contents

  1. OperatingSystemsOverview
  2. What is an operating system?
  3. History
  4. What abstractions does an OS provide?
  5. Recurring themes
  6. OperatingSystemStructure
  7. Hardware
  8. CPU operation
  9. I/O
  10. BIOS
  11. Memory
  12. BootProcess
  13. Boot process: general
  14. PC architecture
  15. IntelAssemblyProgramming
  16. What you should read instead of this page
  17. Real mode x86 programming
    1. Opcodes and operands
  18. Processes
  19. Processes
  20. Multitasking
    1. Option 1: Cooperative multitasking
      1. Mechanism
    2. Option 2: Pre-emptive multitasking
      1. Mechanism
      2. Gotchas
  21. Scheduling
  22. Threads
  23. Why use threads?
  24. Thread creation: user view
  25. User threads vs kernel threads
  26. Why not use threads?
  27. ConcurrencyControl
  28. The need for concurrency control
    1. Example: Updating the web hits counter
    2. Race conditions
  29. Mutual exclusion
    1. Naive approach
    2. Another naive approach
    3. Peterson's algorithm
    4. Preventing pre-emption
    5. Hardware support for locking
  30. Spinlocks
  31. Semaphores
    1. Applications
    2. Implementation
  32. Monitors
  33. Condition variables
  34. Deadlock detection and avoidance
  35. Deadlock
  36. What is deadlock?
    1. Example: too little memory
    2. Example: I/O
    3. Example: bidirectional pipe
  37. Processes and resources
  38. Necessary conditions
  39. Resource-allocation graphs
  40. Preventing deadlock
  41. Avoiding deadlock
    1. The Banker's algorithm
  42. Dealing with deadlock
  43. ProcessorScheduling
  44. Processor scheduling: basics
    1. CPU burst cycle
    2. Types of jobs
    3. Performance measures
  45. When scheduling happens
  46. Algorithms
    1. First-come first-served
    2. Round robin
    3. Shortest-job-first
    4. Priority scheduling
    5. Multilevel queue scheduling
    6. Multilevel feedback-queue scheduling
  47. Multiple processors
  48. Algorithms in practice
  49. Evaluation
  50. InterProcessCommunication
  51. Motivation
  52. Shared memory
  53. Message passing
    1. Interface
      1. Channels and naming
      2. Sending a message
      3. Receiving a message
    2. Implementation
      1. Using shared memory
      2. Across a network
    3. Exceptions
  54. Remote procedure calls
  55. Remote method invocation
  56. Efficiency issues
  57. MemoryManagement
  58. MemoryLayout
  59. Fixed addresses chosen by the programmer
  60. Fixed address chosen by the linker and/or loader
  61. Dynamic linking
  62. Dynamic loading
  63. MemoryProtection
  64. Segmentation approach
  65. Paging
  66. Software memory protection
  67. Paging
  68. Basic idea
  69. Translation Lookaside Buffer
  70. Page table tricks
  71. Page table structures
  72. VirtualMemory
  73. Terminology
  74. Handling a page fault
  75. What can we page?
  76. Performance
  77. Page replacement strategies
  78. Buffering strategies
  79. Other virtual memory tricks
    1. Shared pages
    2. Copy-on-write
    3. Memory-mapped files
  80. Bad outcomes
  81. InputOutput
  82. I/O devices (user view)
  83. I/O devices (hardware view)
    1. Controllers
    2. Input and output instructions
    3. Memory-mapped I/O
    4. Hybrid approach
    5. Direct Memory Access (DMA)
  84. A typical I/O operation from start to finish
  85. I/O system architecture
    1. Device driver architecture
      1. What a device driver module looks like
      2. Loading a device driver
      3. Risks
    2. Device-independent I/O components
  86. Example: console/keyboard driver
    1. Output
    2. Input
  87. Example: disk drivers
  88. BlockDevices
  89. The problem with mass storage
  90. Hard disk structure
  91. Bad blocks and error correction
  92. Disk scheduling
  93. Block device driver implementation
  94. Buffering, caching, and prefetching
  95. Network-attached storage
  96. Removable storage
  97. Multiple disks
  98. Other block devices
  99. FileSystems
  100. Interface
    1. Files
      1. Structure
      2. Naming
      3. Metadata
      4. Operations
      5. Access methods
    2. Directories
    3. Mount points
    4. Special files
    5. Special file systems
      1. /proc
      2. Archival file systems
      3. Distributed file systems
      4. Unions
      5. Encrypted filesystems
    6. Access control
  101. Implementation
    1. Disk layout
      1. Block size
      2. Tracking blocks in a file
      3. Tracking free blocks
      4. Directories
      5. Physical layout
    2. Pathname translation
    3. Caching
  102. Consistency checking
    1. How inconsistencies arise
    2. Recovering from inconsistencies
    3. Preventing inconsistencies
  103. More info
  104. LogStructuredFilesystem
  105. The first step: journaling
  106. Ditching the main filesystem
    1. The inode map
    2. Checkpoints
  107. Space recovery
    1. Segment summary data
    2. Data compaction
    3. Which segments to clean?
  108. Performance
  109. Why aren't all filesystems log-structured?
  110. Networking
  111. Local-area vs wide-area networks
    1. LANs
    2. WANs
  112. Addressing: IP
  113. UDP and TCP
    1. TCP: more details
      1. Connection setup
      2. Receive window
      3. Retransmission control
      4. Shutting down a connection
  114. Higher-level protocols
  115. OS implications
  116. Socket interface
  117. Effect of failures
  118. NetworkFileSystems
  119. Naming issues
  120. Caching and consistency
  121. Stateful vs stateless servers
  122. Data representation
  123. Case study: NFS version 2
  124. DistributedSystems
  125. Examples of distributed systems
  126. Distributed coordination
  127. Timestamps
  128. Distributed mutual exclusion
  129. Distributed transactions
  130. Agreement protocols
  131. Paxos
  132. The Paxos algorithm
  133. Informal analysis: how information flows between rounds
  134. Safety properties
  135. Learning the results
  136. Liveness properties
  137. ComputerSecurity
  138. Goals of computer security
  139. Protection
    1. Principle of least privilege
    2. Users, roles, and groups
    3. Protection domains
    4. Access matrices
      1. Who controls the access matrix?
      2. Copy rights
      3. Confinement
    5. Implementation
      1. Access control lists
      2. Capabilities
  140. Implementation
    1. Authentication
      1. Something you know
      2. Something you have
      3. Something you are
      4. Two-factor authentication
    2. Authorization
    3. Enforcement
    4. Intrusion detection and recovery
  141. Practical attacks and defenses
    1. Attacks on individual machines
      1. Buffer overflow and other code injection exploits
      2. Viruses
      3. Trojan horses
      4. Worms
    2. Network attacks
      1. Replay attacks
      2. Man-in-the-middle attacks
  142. How much of a problem is security?
  143. Virtualization
  144. A brief history of operating sytems
  145. Why virtualize?
  146. Virtualization techniques
    1. Emulation
    2. Hypervisors
      1. Using breakpoints
      2. Using code rewriting
      3. Using paravirtualization
      4. Using additional CPU support
  147. Applications
  148. UsingBochs
  149. Basic use
  150. Debugging
    1. Breakpoints and breakpoint gotchas
  151. Alternatives
  152. UsingSubversion
  153. Basic concepts
  154. Subversion commands: basics
    1. svn co [url]
    2. svn up
    3. svn commit
  155. Getting information
    1. svn log
    2. svn status
    3. svn diff
    4. svn cat
  156. File operations
    1. svn add [file]
    2. svn mkdir [directory]
    3. svn rm [file]
    4. svn cp [source] [destination]
    5. svn mv [source] [destination]
  157. Fixing things
    1. svn revert [file]
    2. svn resolved [file]

1. OperatingSystemsOverview

2. What is an operating system?

  • Difficult legal question in Europe and in Clinton-era US
  • We can consider several definitions:
    User's view
    Software that came with my computer that I can't get rid of.
    Textbook view
    Software that hides details of the underlying hardware, mediates access to resources, and enforces security policies.
    System programmer's view
    Anything that runs in ring 0.
  • We will mostly adopt the programmer's view.
  • Essential idea is that OS provides an abstraction layer on top of the bare hardware.
    • Typical modern system: Hardware -> BIOS -> kernel -> libraries -> user programs.

    • Where to draw the boundary? E.g., do libraries fold into the OS?
    • Natural boundary is the system call.
      • Modern CPUs have different protection levels or rings (4 on Intel CPUs, but nobody uses anything but ring 0 and ring 3).
        • Kernel code (ring 0) has full access to hardware.
        • User code (ring > 0) doesn't.

      • If user code tries to execute a privileged instruction, it traps to kernel code.
        • On IA32, uses same mechanism as for hardware interrupts:
          • CPU switches to ring 0.
          • Various magic happens in the background (memory system switches context, interrupts may be disabled).
          • IP jumps to new location based on interrupt vector table in low memory.
            • => kernel is whatever the interrupt table sends you to?

        • Kernel can now decide what to do in response to the "illegal instruction"
          • Userspace process is doing something wrong => kill it!

          • Or maybe it just needs some help from the kernel => help it.

            • This is the system call case.
            • Eventually return using specialized opcode (iret on IA32)

            • int/iret acts like very expensive context-switching procedure call.

3. History

  • In the beginning: no OS
    • Program raw hardware with a soldering iron (still used in 23rd century according to Star Trek!)
    • Eventually graduate to punch cards/paper tape/magnetic tape.
    • Model is that a batch-mode program gets complete control over the entire machine.
  • IBM virtual machines
    • Build one machine that pretends to be many machines.
    • Each individual machine still acts exactly like the classic IBM 1401 (or whatever) that you programmed with punch cards.
      • Includes virtual punch card readers, magnetic tapes, etc. with bug-for-bug compatibility.
      • We actually haven't gotten very far from this.
    • Most programming is still done in batch mode.
  • Timesharing
    • Build abstract processes that run on an idealized virtual machine.
    • Works nicely for interactive systems with many users.
    • Enabling economic change: cheap computers and expensive users.
      • Happened first in well-funded research laboratories e.g. Xerox PARC, Bell Labs, MIT AI Lab.
      • Now we all do it.
  • Distributed systems
    • Get thousands or millions of machines to work together.
    • Enabling economic change: cheaper to buy many small computers than one big one; cheap networks.
    • Dates back to 1960's parallel machines like ILIAC, now shows up as peer-to-peer systems and server farms.
    • We mostly won't talk about these.

4. What abstractions does an OS provide?

  • Hide ugly details of hardware interfaces.
    • Most hardware ships with software drivers.
    • This means hardware designers don't have to provide nice interfaces since driver can paper it over.
      • This is usually a good thing!
      • Software is easier to change than hardware
        • Can upgrade capabilities by upgrading drivers.
        • Can fix bugs without burning new ICs.
      • Software is cheaper than hardware: can exploit brainpower of the CPU
        • Example: WinModems

        • But sometimes we want the speedup we get from hardware (e.g. graphics cards)
    • OS goal: provide a standardized interface to device drivers.
      • New picture: hardware -> BIOS + device drives -> kernel -> libraries -> user programs.

    • Secondary goal: convince hardware designers to write drivers for your OS
      • Works best if you own a large chunk of the OS market.
      • Obstacle to new OS development: even experimental OS's have to pretend to be Windows or Linux to device drivers!
  • Pretend we have more resources than we really have.
    • Process abstraction
      • Yes, you only bought 1 CPU (or 2, or next year 8), but we'll pretend you have infinitely many.
      • This works by time-sharing CPU(s) between processes.

        • Cooperative multitasking: polite processes execute yield system call, kernel switches to another process.

          • This lets a runaway process freeze your system!
          • But it's easy to implement.
          • Used in practice in pre-NT Windows, pre-OSX MacOS, even today in many embedded OSs (e.g. PalmOS).
        • Pre-emptive multitasking: kernel switches to another process whether you like it or not!

          • No runaway processes
          • But requires more work in kernel and in processes since they have to deal with concurrency issues.

      • Why this works: most programs have "bursty" demand on CPU.
        • If you are waiting 1010 clock cycles for the user to click the mouse, why not let somebody else use the CPU?

        • Same thing for 108-cycle disk operations.

        • With good scheduling, nobody will notice we are doing this.
      • Further design decisions:
        • How to choose which processes to schedule? E.g. real-time, priorities, fair-share: many possible policies.
        • How much context to store for each process?
          • E.g. threads vs processes: does a process get its own address space in addition to its own virtual CPU?
          • Some historical oddities that persist for convenience like current working directories.
        • How to manage processes?
    • Virtual memory abstraction
      • We'll take small fast memory and a big slow disk and combine them to get big fast virtual memory.
        • Only works because most memory isn't used very often.
        • Usually requires hardware support (memory manager)
      • Bonus: each process gets its own address space
        • Security payoff: bad process can't scribble on good process
        • Compiler payoff: don't need to write position-independent code
  • I/O abstractions
    • Next step up from hardware drivers: make all your devices look the same
    • Keyboards, printers, console display, the mouse, many USB devices: all streams of bytes.
    • Disks, flash drives, remote network storage: all blocks of bytes.
      • Old days: all stacks of 80-column EBCDIC punch cards
    • Goal is to simplify programmers' lives.
      • Why in kernel and not in libraries? Easier to change kernel for new hardware since we have to bring in device drivers anyway.
  • Filesystem abstractions
    • Present big-bag-of-bits as structured collection of files.
      • File is typically a big-bag-of-bits.
      • But there is a tree structure organizer files into directories/folders.
      • Files may also have metadata.
        • What it is called.
        • Who has writes to the file.
        • When it was created.
        • What program to use to read it.
      • Goal is to provide enforced common language for application program developers.
      • Unix approach: represent things that aren't files in the filesystem anyway
        • E.g. /dev/tty, /proc/cpuinfo, /proc/self/cmdline.

        • Allows reuse of same interface everywhere.
      • Things to think about:
        • Why not do this at the library level?
        • Why a tree structure when databases dumped this design for tables?
          • Note that attempts to bring in tables (e.g. in Windows) have generally failed.
        • Why unstructured files?
          • Old days: all stacks of 80-column EBCDIC punch cards
          • Windows: text vs binary files
          • These seem to mostly be holdovers.
  • Security
    • Any time we provide an abstraction, we can enforce limits on what you can do with it
    • Process limits
    • File permissions
    • I/O permissions
    • In some systems, very fine-grained control, SECRET/TOP SECRET enforcement, etc.
  • Other things that get shoved into the kernel for performance.
    • E.g. window systems

5. Recurring themes

  • Suffering
    • OS development happens without all the nice tools an OS gives you.
      • This makes kernel programming harder.
        • Have to use low-level languages and assembly.
        • Debugging tools are limited.
    • You are also running on the bare hardware.
      • Mistakes are harshly punished.
  • Modularity
    • OS is a large software system like any other => we can't build it unless we can divide up the work

    • Layered approach
      • Build up from core abstractions adding new functionality at each layer
        • E.g.: BIOS -> device drivers -> memory management -> process management -> I/O management -> network abstractions -> filesystem

        • Getting the order of the layers right can be tricky: what if I want to swap VM out to files?
    • Microkernel approach
      • We like processes so much we'll push everything we can out into userspace.
      • Advantage: can restart your filesystem daemon without rebooting the whole machine.
      • Disadvantage: where are you loading your filesystem daemon from?
      • Very active area of research in 1980's and 1990's, but then most people gave up because of bad performance.
      • (But still used in some niches and performance has been getting better.)
    • Exokernel approach
      • Provide minimal partitioning of resources and then push everything into the libraries.
      • Generally not seen in the wild.
      • Main selling point: cool name.
    • Monolithic/modular kernel approach
      • Kernel is one giant program.
      • But use OO techniques to modularize it.
        • E.g. standard device driver interface with dynamic loading.
      • Advantage is that internal calls in the kernel are cheap procedure calls instead of expensive context switches.
      • Disadvantage is that one bad module can destroy your entire system.
    • Virtualization
      • Run an entire OS inside a process
      • Rationale: protects system against an OS you don't trust
      • Possible other rationale
        • Systems are now being built from multiple processes
        • => need a second level of process abstraction to encapsulate whole systems

        • E.g. virtual WWW servers.
        • Possibly a sign that process hierarchy is too flat: analogy to 1960's BASIC/FORTRAN subroutines that couldn't be recursive.
  • Security
    • Why separate address spaces/file permissions/etc.?
      • Can't trust users, they make mistakes.
      • Can't trust programmers, they make mistakes that execute at 1010 instructions per second.

      • Can't trust poorly-socialized script kiddies, they don't share your agenda.
      • Good fences contain the damage (but note most OSs don't really try to protect against malicious users).
    • Why reboot?
      • Bugs produce errors that accumulate over time.
      • Restarting processes from scratch resets to a known state.
      • (Same reason humans aren't immortal: easier to manufacture new humans than repair the old ones.)
      • (Also same reason people throw away their virus-infested PCs rather than disinfect them.)
    • Trade-off: "A computer is only secure when it's unplugged."
      • Too much security makes users bypass it.
      • Too much security destroys your work.
  • Performance
    • Famous old SGI story
      • Suppose we install a kernel on 108 machines.

      • Every day each machine wastes 1 second of a user's life.
      • 108 machines times 103 days = 1011 wasted seconds = 30 wasted lifetimes.

      • => bad coders are worse than serial killers.

    • Less of an issue over time as machines get faster
    • But defining constraint on early OS development (also why PC OS's recapitulated mainframe OS's 20-25 years later)
    • Question to ask: what features of my OS are solutions to performance issues that are no longer relevant?
  • Consistency
    • Backwards-compatibility is a huge issue, especially for commercial OSs.
      • Free OS users tolerate disasters as part of the whole hair-shirt mentality :-]
      • Microsoft story
        • When SimCity broke because Windows closed the use-after-free loophole, users blamed Windows and not SimCity

        • So Windows now recognizes SimCity and runs a special compatibility version of malloc/free

    • If your API is too complicated or too restrictive, natural selection comes into play.
    • Sometimes crummy interfaces persist through lock-in or survival of the least inadequate.
      • E.g. NFS, X11, sockets, and other de facto sub-standards.
    • Hard to guess what users want => safe OS research makes existing interfaces 10% faster.

    • But you become famous for new approaches that catch on.
  • Separation between policy and mechanism
    • We can't predict exactly what scheduling/security/etc. policies users will want.
    • So build the OS so we can change the policy later.
    • Modularity helps here just as in regular programming.

6. OperatingSystemStructure

Description of how a typical OS is organized, with emphasis on x86.

7. Hardware

  • One or more CPUs
  • Various devices
  • Memory controller
    • Sits between CPU/devices and physical memory
    • May implement virtual memory
  • Interrupt controller
    • Sits between devices and CPU(s)
    • Chooses how and when to divert CPU from normal computation

8. CPU operation

Normal operation
  • Fetch next instruction based on IP register
  • Do what it says
Interrupts
  • Device signals some event
  • Interrupt controller kicks CPU
  • CPU calls interrupt handler
    • Disables further interrupts
    • Pushes current state onto the stack
    • (In protected mode) switches to ring 0
    • Jumps to interrupt handler at standard location (new CS:IP stored at physical address 0x04*(interrupt number) for x86 real mode)

    • Interrupt handler does its thing
    • Returns (using iret on x86)

      • Pops state and re-enables interrupts
Traps
  • Like interrupts, except something bad happened (division by 0, overflow, segmentation faults, etc.)
  • Mechanism is exactly the same on x86 except for interrupt number
System calls
  • Simulate interrupt using int instruction (old x86 method) or sysenter (newer x86's).

  • Typically just one trap number is used for all system calls (0x2e in Windows, 0x80 in Linux).
    • Further dispatch in the kernel through handler table indexed by syscall number (typically passed in AX register).
  • Details of system call are stored in registers.
  • Return value(s) come back in registers.
  • Note that system calls are expected to change register state, unlike interrupts and traps which are expected to be invisible to the running program.
BIOS calls
  • Like system calls, only jumping into ROM supplied with your motherboard.
    • Standard locations and interface dating back to 1981 IBM PC.
    • See BIOS_Interrupt_Calls for a sketchy list.

    • INT 13 for disk access (int $13 in AT&T syntax)

  • Mostly used in real mode by DOS.
  • Modern OSs bypass BIOS for most operations.
    • Except power management and other whizzy system control features that are closely tied to the motherboard hardware.

9. I/O

  • Interrupts don't provide much information beyond "look at me! look at me!"
  • Data is shoved back and forth through memory.
  • For early part of the course we'll let the BIOS deal with this.

10. BIOS

Stands for Basic Input/Output System. It's what comes built-in to your motherboard and what is called first at boot time. See BootProcess. It may also provide a primitive core OS that manages simple devices (although most modern OSs bypass this interface).

11. Memory

OS memory layout is pretty much the same as in a process. We have executable code (the text segment), preinitialized data (the data segment), dynamically-allocated data (the heap), and a stack.

In x86 real mode these segments are typically pointed to by segment registers, e.g. CS for the code segment, DS for the data segment, SS for the stack segment. See IntelAssemblyProgramming.

Certain physical addresses are reserved in IBM PC architecture. This puts constraints on where the OS can operate in real mode.

A typical setup is:

00000-003FF

Interrupt-vector

00400-01000

Buffers and system stuff

01000-90000

Kernel memory

90000-9FFFF

Kernel stack

A0000-

System ROM

B8000-

Video RAM

FFFF0-FFFFF

BIOS entry point

Important point: Physical addresses above FFFFF are inaccessible in real mode. Physical addresses above A0000 (640K) are here-there-be-monsters territory, not usable by the OS.


CategoryOperatingSystemsNotes

12. BootProcess

13. Boot process: general

To boot an OS we start with the bare hardware and whatever it provides, and load in increasingly large pieces of the OS until we have our system fully up and running. Typically this may involve several stages:

  1. Load up or have pre-stored some initial bootstrap routine.
  2. Bootstrap routine finds and initializes storage devices.
  3. Bootstrap routine reads boot loader from some boot device (e.g. a hard drive, CD-ROM, or USB key; or in the bad old days: punch cards, paper tape, cassette tapes).

  4. Boot loader reads kernel from boot device.
  5. Kernel initializes rest of hardware.
  6. Kernel loads user-space startup code.
  7. Startup code brings up user system.

14. PC architecture

On an x86 PC, the initial bootstrap is handled by the BIOS, which lives in ROM wired onto the motherboard. The BIOS expects to run in real mode (simulating a vintage 1981 8088-based PC), so that's what the system comes up in. The details of the process are as follows:

  1. BIOS setup.
    1. On reset, the CPU switches to real mode and jumps to FFFF0. (Really FFFFFFF0, but address gets truncated.) This is the BIOS entry point.
    2. BIOS executes Power On Self Test (POST).
    3. BIOS calls initialization routines provided in ROM by video card (at 0xC0000) and disk controller (0xC8000).
    4. Memory test.
    5. Look for I/O devices and assign them interrupt vectors.
    6. Look for bootable drive.
      • Boot sector at (0,0,1) tagged with 55 AA marker.
    7. Load boot sector to address 0000:7c00.
    8. Jump to boot sector.
  2. Boot sector.
    1. Initialize stack.
      • SS gets 9000
      • SP gets FFFE (for 16-bit mode)
    2. Load kernel to some standard location, e.g. 0000:1000
    3. Jump to kernel.

The rest of the setup is up to the kernel, and is likely to include switching to protected mode, setting up memory management and interrupt handling, further device initialization, initializing the filesystem, loading user-space programs, etc.


CategoryOperatingSystemsNotes

15. IntelAssemblyProgramming

You will need to learn some IA-32 assembly language programming for CS422. This document starts with pointers to IA-32 assembly language documentation and then continues with some specific details that might be more directly relevant to the course.

16. What you should read instead of this page

My recommendation is to start with Kai Li's notes on IA-32 programming at http://www.cs.princeton.edu/courses/archive/fall06/cos318/docs/pc-arch.html.

Then move on to the

Official IA32 Intel architecture software developer's manuals:

One trap in all of this is that gas, the GNU assembler, uses a different syntax for assembly language from the Intel style. See the gas manual for an extensive discussion of this.

A less authoritative guide to x86 assembly written in gas syntax can be found at http://en.wikibooks.org/wiki/X86_Assembly.

Two helpful gcc tricks:

  1. You can find out what the C compiler turns a given chunk of C code to using gcc -S file.c.

  2. You can make use of assembly-language code inside C code using the asm syntax in gcc. (Note that you may need to provide extra directives to the assembler if you are coding for unusual targets, e.g. 16-bit mode boot loaders should include asm(".code16gcc"); at the top of every C file). With sufficiently clever use of this feature you can keep most of your code in C and use assembly only for very specific low-level tasks (like manipulating segment registers, calling BIOS routines, or executing special-purpose instructions that never show up as a result of normal C code like int or iret). See the gcc documentation for more on using the asm mechanism.

http://devpit.org/wiki/Compiler_Options_for_Creating_Odd_Binaries has some nice discussion of how to generate unusual binaries using gcc and ld.

17. Real mode x86 programming

For the first few assignments you will be working in real mode, which is the x86 architectures mode that emulates a vintage 1976 8086 CPU. The main advantage of real mode is that you have a flat 20-bit address space running from 0x00000 to 0xFFFFFF and no memory management or protection issues to worry about. The disadvantage is that you only have 16-bit address registers to address this 20-bit space.

The trick that Intel's engineers came up with to handle this problem was to us segmented addressing. In addition to the four 16-bit data registers AX, BX, CX, and DX and the four 16-bit address registers BP, SP, DI, and SI there are four 16-bit segment registers (later extended to six) CS, DS, ES, and SS. Addresses in real mode are obtained by combining a 16-bit segment with a 16-bit offset by the rule 0x10*segment+offset. This operation is commonly written with a colon, so for example the physical address of the stack is SS:SP = 0x10*SS+SP.

17.1. Opcodes and operands

Instructions typically operate one or two registers, immediate values (i.e. constants), or memory locations. An instruction is written as an opcode followed by its operands separated by commas.

Perhaps the most useful opcode is mov, equivalent to an assignment. It comes in several flavors depending on the size of the value you are moving: movb = 1 byte, movw = 2 bytes, movl = 4 bytes. If you don't add the size tag the assembler picks one based on the size of the destination operand. In AT&T syntax as used in gas the first operand is the source, the second is the destination (this is backwards from Intel syntax).

Here is movw conjugated with various addressing modes:

    movw $4, %ax     # copy the constant 0x04 into AX
    movw 4, %ax      # copy the contents of memory location DS:0x04 into AX
    movw %ax, %bx    # copy the contents of AX to BX
    movw (%si), %ax  # copy the contents of memory location pointed to by SI to AX
    movw 4(%bp), %ax # copy the contents of location at SS:BP+4 to AX
    movw %ax, 12(%es:%si)  # copy AX to location ES:SI+12

Note that most of the time we don't bother specifying the segment register, but instead take the default: CS for instructions, SS for the stack (push, pop, anything using BP or SP), and DS for everything else except string instruction destinations, which use ES. But we can always specify the segment register explicitly as in the last example.

Arithmetic operations follow the pattern for mov, e.g.

    addw %ax, %bx    # add AX to BX (in C: bx += ax)
    incw 4(%bp)      # increment *(SS:BP+4)
    cmpw %cx, %dx    # compare CX to DX; like subtraction but throws away result

Control flow is handled by jump instructions. Targets are labels which are followed by a colon (think goto in C):

loop:
    jmp loop         # very fast loop

Conditional jumps are often more useful:

    movw $0, %ax
loop16:
    incw %ax
    cmpw %ax, $10
    jle loop16       # jump if A <= 0x10

Two specialized jumps are call and ret, which are used for procedure calls and returns. These push or pop the IP register on the stack as appropriate. The int and iret instructions are slightly more complicated variants of these used for simulating interrupts; we'll run into these more later.

See the documentation for many more instructions.


18. Processes

Processes are the abstraction that lets us divide up the CPU and other system resources between different programs. Historically, the idea of a process grew out of the separate virtual machines of classic mainframe OS architectures. Over time, some of the features traditionally associated with a process (such as a separate address space) have been shed, giving rise to a newer notion of a thread, which slices up the CPU without dividing up other resources. Curiously, this is almost the reverse of what happened with subroutines, which started out only affecting the instruction pointer in many early programming languages (BASIC, FORTRAN) but later added separate address spaces (local variables, objects). We'll follow the historical pattern by talking about processes first.

19. Processes

A process typically encompasses a bunch of things bundled together:

  • In user space:
    1. Access to the CPU!
    2. An address space, including:
      1. The running program.
      2. Pre-initialized data.
      3. A heap for dynamic allocation.
      4. A stack.
  • In kernel space:
    1. Pointers to various system resources used by the process (e.g. Unix file descriptors).
    2. Odd but useful historical legacies like the current working directory.

    3. Security information: What privileges the processes has, what user it is running on behalf of.
    4. Internal bookkeeping stuff: resource tracking, address-space management.

In order to implement processes, we need to solve two problems:

  1. How do we divide up the CPU between processes?
  2. How do we keep all the other bits of processes separate?

The first problem can be further divided into building a mechanism that lets the CPU switch between processes (multitasking) and choosing how to decide which process it should run next (scheduling). (This is an example of the split between mechanism and policy that recurs throughout OS design and development.) The second problem is a question of allocating the right data structures, which we'll talk about below, and setting up the memory system to give each process its own address space, which we'll defer to VirtualMemory.

20. Multitasking

Here's a picture of a single process A running along doing its stuff:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Here are two processes A and Be running along in parallel (say on a system with multiple CPUs):

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

And here are A and B sharing a single CPU:

AAAAAAAA             AAAAAAA                AAAAAA       
        BBBBBBBBBBBBB       BBBBBBBBBBBBBBBB      BBBBBBB

The trick here is that as long as A and B don't have the ability to detect that they've been shunted aside, they won't be able to tell the difference between this picture and the previous picture. So when designing a multitasking system, we just have to make sure that we clean up anything that B does when it's running so that A doesn't notice and vice versa.

We also have to figure out how to switch the CPU's instruction pointer from A's code to B's code. Fortunately this is not a new problem: We do it all the time when executing a procedure call. The difference is between executing a procedure call and doing a context switch between processes is that (a) we generally don't expect to come back, and (b) the "calling" process doesn't necessarily know that the context switch is happening. But since we already have mechanisms for calling other code, we can exploit this to build context switches. There are two typical ways this is done, depending on whether we expect our processes to cooperate or not.

20.1. Option 1: Cooperative multitasking

Here we implement a context-switch as a first-class operation triggered by the running process, typically through a yield system call. So we might write code like

#include <myfancyos.h>

int
main(int argc, char **argv)
{
    while(1) {
        check_for_input();
        do_something_about_it();
        yield();
    }

    exit();
}

(Note that exit will also be a system call or a library wrapper to a system call.)

20.1.1. Mechanism

Here most of the code just runs like standard C code. But when we execute yield, several strange things happen under the hood:

  1. The CPU state is saved. Typically this involves pushing the IP and process registers onto the stack. On an x86 machine, this could be done using the CALL instruction to save the IP (automatically done as part of the yield procedure call if it is implemented as a library procedure; for software interrupts we rely on similar behavior in INT) and the PUSHAD instruction to push all the registers. For other architectures that don't provide such convenient tools the process may be more complicated, but is usually a simple matter of programming. The last step is to save the stack pointer itself in the process control block—the data structure that holds all the extra junk the kernel has to keep track of for the process.

  2. Any other process state that shouldn't be clobbered is saved. At minimum this involves switching a pointer in the kernel to a new process control block. It may also involve talking to the memory management hardware to switch to a new address space and deal with any effects on caches.

As the automobile service manuals say, "installation is the reverse of removal":

  1. Copy the saved state of the new process to wherever it needs to go; switch the PCB pointer and update memory management if necessary.
  2. Restore the stack pointer and starting popping: POPAD, IRET or somesuch.

You should be wondering here how yield knows which processes to switch to; we'll talk about this under Scheduling below. You should also notice that even if we don't implement any other separation between processes, we at minimum have to have separate stacks. This is the main thing that separates a multi-threaded process from a standard single-threaded one.

Note that these steps are generally very architecture-dependent. Some CPU architectures (including the x86) include features for hardware context switching, where state changes can be handled by switching a pointer built into the CPU. Since these features tends to be even more architecture dependent and also depend on the OS agreeing to use the CPU's ideas of what to put in a PCB, they tend not to be used much in OS's aiming for portability. But they are handy if you are coding for a single architecture (like our friends in Redmond or the builders of some embedded systems) and really care about performance.

The advantage of cooperative multitasking is that the implementation is not very hard and you can yield only when it is safe to do so. The disadvantage is that you have to remember to yield a lot, which can produce filibusters if the processes want to be rude. This not only causes trouble with this code snippet:

   1     while(1);

But it also annoys anybody using the same machine as this fine program:

   1     while(1) {
   2         compute_wind_tunnel_simulation_without_yielding();
   3         yield();
   4     }

Here the on-screen mouse pointer is likely to move very jerkily (if at all), since if we can only switch contexts when yielding we don't get to run our window system very often. The solution is to interrupt these rude processes whether they like it or not.

20.2. Option 2: Pre-emptive multitasking

Pre-emption is when the kernel yields on behalf of a process rather than making the process do it. For the most part, this is a good thing: like running a garbage collector instead of malloc and free or using virtual memory instead of paging by hand, we are giving up a little bit of control to the underlying system in return for getting rid of a lot of headaches. Unfortunately in each of these cases we create new headaches. The particular new headache we get with pre-emption is that we have to take concurrency seriously.

20.2.1. Mechanism

Typically we still allow voluntary yielding, although this operation is usually hidden under some other blocking system call (e.g. read on data that isn't available yet). But we now also have incoming interrupts from that may produce a context switch between processes. The mechanism for switching out a process is now:

  1. Interrupt triggers switch to ring 0 interrupt handler. The interrupt might be from an I/O device but much of the time will be triggered by a timer every fixed number of milliseconds.
  2. Interrupt handler calls kernel-internal yield routine.
  3. Kernel-internal routine saves process state as before and jumps to the scheduler.

The mechanism for switching a process back in is essentially unchanged.

20.2.2. Gotchas

With pre-emptive multitasking, some other process may sneak in and break things while I'm not looking. With separate address spaces (or just separate data structures) this is not a problem. In a threading model with a common address space, this can lead to all sorts of nasty inconsistencies. This is particularly likely to come up inside the kernel, where a common approach is to have one thread per user process sharing a common set of core data structures.

The simplest approach to preventing bad outcomes of unwanted concurrency is to prevent the concurrency, usually through some sort of locking or critical section mechanism. A lock marks a data structure as inaccessible to anybody but the lock-holder. This requires politeness, and in a reasonable implementation also requires interaction with the scheduler so that processes don't spin waiting for a lock to become available. A critical section enforces that no other thread takes the CPU while it is running. This only requires turning off interrupts (CLI on x86). We'll talk about how to use these and how not to use them later.

21. Scheduling

We've deferred the question of how to choose which process to switch to. This is generally the job of a scheduler, which might range from anywhere from a couple of lines of code inside yield in a simple kernel to a full-blown user-space process in a microkernel-based design. However we set up the scheduling mechanism, we can separately consider the issue of scheduling policy. Here the goal is to balance between (a) keeping the kernel simple and understandable, and (b) not annoying the user. We also have to keep track of which processes are blocked waiting for locks or slow I/O operations.

Here are some common policies:

Round-robin
Processes move to the end of a queue when pre-empted (or unblocked). The next process to run is whatever is at the head of the queue. This has the advantage of speed and simplicity, but it doesn't give much control over resource allocation.
Random
Instead of putting the runnable processes in queue, put them in a bag and draw the next one to run at random. Theoretically this is approximately equivalent to round-robin assuming your random number generator is reasonably good. OS implementers hate it since an unlucky process may starve.
Priorities
Use a priority queue. High-priority processes take precedence over lower-priority processes. Priorities can be assigned either by the user (e.g. the update-the-mouse-pointer-now-so-the-user-doesn't-get-annoyed process beats the wind tunnel simulator) or by the system based on past performance (CPU hogs that have a history of getting pre-empted get lower priority that processes that yield the CPU quickly). This is the strategy used in Linux, which even has a particularly fast (constant-time!) priority queue implementation.
Real-time scheduling
Here some processes have hard constraints on when they run next or how often they run (think airplane control systems). The scheduler runs an algorithm that enforces these constraints if possible.
Fair-share scheduling
Each user gets a fixed slice of the CPU that is further divided between the user's processes using some other scheduling mechanism.
Let the user do it

This is the common microkernel approach alluded to earlier. The kernel's scheduler policy is to always run the designated user-space scheduler process, which can delegate its control of the CPU to specific other processes.

There are trade-offs in choosing any of these policies between simplicity, efficiency, performance, predictability, and user control. Which factors are the most important depends on the application. A typical modern OS will provide some hybrid policy whose default has been observed to work for most applications but which can be overridden by the user when necessary.

An orthogonal issue of scheduling policy is the quantum or minimum time that we schedule a process for before pre-empting it. With a small quantum we pre-empty a lot (which costs time). With a big quantum the mouse pointer stops moving. Consumer OSs will typically use a fixed quantum of about 10ms, which is less than most humans will notice and comparable to disk-access delays. Real-time embedded OSs may use quanta measured in microseconds that vary depending on the real-time constraints on other processes. Again these are policy trade-offs that depend on the application.


CategoryOperatingSystemsNotes

22. Threads

A thread abstracts the normal flow of control in a program: the sequence of addresses that the IP passes through. In a multi-threaded program, we have the illusion of multiple flows of control corresponding to multiple IPs. Typically this is implemented using time-sharing, where a single IP is switched between different threads. But (usually with OS support) it may also be implemented using multiple CPU cores.

Usually when we talk about threads we are assuming multiple tasks in the same address space; a single thread isolated in its own address space generally goes by the name of a process (see Processes). Some systems (e.g. Linux) use task to refer to either a thread or a process, the relevant property being that each task has its own IP and its own stack.

23. Why use threads?

Multithreaded programs are notoriously difficult to write and debug, since having multiple threads running concurrently with unpredictable timing makes a program's behavior itself unpredictable unless the programmer is very careful. So why would we both with threads? There are basically four reasons (SGG also give four, but they're a slightly different four):

  1. You bought a machine with more than one CPU core, and you'd like to use all of them. If you can rewrite your program to perform non-interfering tasks in separate threads, your program will run faster.
  2. Your program is naturally structured to allow parallel computation, and it's easiest to write it this way. There are two very common design patterns that lend themselves to parallel execution: producer-consumer pipelines (e.g. tr A-Z a-z | tr -cs "a-z'" '\n' | sort | uniq -c | sort -nr, a five-stage pipeline that finds the most common words in its input when run on Unix), and server programs (e.g. a web server handling multiple simultaneous requests).

  3. You want to handle multiple streams of interaction at once that share common data, but you don't want one of them to slow down the others. For example, in your GUI you don't want to wait for the user to type a key before updating the clock, and conversely the user doesn't want to wait for some excessively-complicated web page to render before typing a key. (Also the web server above, which doesn't want to get stuck reading half an HTTP request.)
  4. You wanted to spawn 10,000 processes, but since the difference between them was only a few kilobytes it made sense to share the common bits so your computer doesn't burn up. To a certain extent this can be handled with a good shared library implementation, but even exploiting shared libraries it's usually cheaper (both in time and memory) to spawn a thread than a whole new process.

24. Thread creation: user view

Typically the user will have some sort of thread library. On POSIX systems, the standard thread library is Pthreads. A typical call to create a thread in Pthreads looks like this:

   1     pthread_attr_init(&attributes);
   2     pthread_create(&thread_id, &attributes, function_to_run, function_argument);

Here thread_id is a retval that returns a handle to the thread (which can be used to do further nasty things to it), attributes is a set of thread options (set to a default value by pthread_attr_init), and function_to_run, and function_argument are the standard C-style closure consisting of a function and a void * argument. Pthreads also provides procedures for waiting for threads to finish (pthread_join), and various other useful tools like mutexes.

In object-oriented languages, threads are often implemented by inheritance from some standard thread class (e.g. Thread in Java or threading.Thread in Python). Here a thread is created by creating a new object of the appropriate class and kicking it to life with some standard run() method.

For more examples see SGG §4.3.

25. User threads vs kernel threads

Processes are typically implemented as kernel threads—threads that are implemented in and supported directly by the kernel. We can also have user threads, which don't necessarily require kernel support. For example, it's not hard to hack up a primitive threading system within a user process pretty much any modern C environment by allocating some extra stacks using malloc and abusing setjmp and longjmp to switch between them (note that this will require some machine-dependent bits to get right). Similarly, Java threads in early JVM implementations ran entirely inside the Java interpreter with no interference from or knowledge of the underlying OS. But in many cases it is helpful to have OS support for user threads. There are three basic models:

  1. Many-to-one: The simplest model as described above. A user-space thread library manages many user threads on top of a single kernel thread. The advantage is that it doesn't require kernel support. The disadvantage is that since the kernel doesn't know about user threads it can't schedule them, at most one user thread can run at a time (since the thread library can't get at more than one CPU), and any action that blocks the single kernel thread (e.g. a blocking read system call) blocks all the user threads.

  2. One-to-one: The kernel allocates one kernel thread for each user thread, e.g. using the clone system call in Linux. This gives essentially all the advantages might want from threads: better scheduling, concurrency across multiple CPU cores, no blocking between user threads. One slight disadvantage is that requiring kernel involvement increases costs, and may put a limit on how many user threads a process can have. Most widely-used OSs use this model.

  3. Many-to-many: A process may have many user threads and many kernel threads, with user threads mapped to the pool of kernel threads according to various possible rules. This combines the good features of the previous two models at the cost of additional complexity.

Here's something you don't see: nested threads, where one user thread in a many-to-one model may itself be divided into further threads. The closest a standard OS will get to this is running a virtual machine, although some microkernels have the machinery to support this. Exercise: What would nested threads actually look like, and why would or wouldn't you want to support them?

26. Why not use threads?

A program that uses threads will (in general) be nondeterministic: its behavior will change from one execution to the next. This makes debugging using the standard approach of making small changes and seeing if they improve things much more difficult—how do you know if the program is really fixed or just decided to behave differently this time? It also causes trouble when two (or more) threads share data structures without using some sort of ConcurrencyControl.


CategoryOperatingSystemsNotes

27. ConcurrencyControl

Concurrency co