Reading and Schedule

Below are the schedule and reading list for this course.

Date Reading Lead
1/3 Course Introduction and the History of Virtualization
Yiying
Slides
1/5 Background and Virtualization Overview
Comet Book Chapter on Virtual Machine Monitors
Additional Readings

  1. Formal Requirements for Virtualizable Third Generation Architectures (Comm ACM 1974)
  2. Disco: Running Commodity Operating Systems on Scalable Multiprocessors (TOCS'97)
  3. Scale and Performance in the Denali Isolation Kernel

Yiying
Slides
Annotated Slides
1/10 Virtualizing CPU
A Comparison of Software and Hardware Techniques for x86 Virtualization (ASPLOS'06)
Questions

  1. Why is x86 un-virtualizable with trap-and-emulate? Give one example.
  2. How are jump instructions translated?
  3. With hardware virtualization extensions (e.g., Intel VT), do we still need binary translation? Why or why not?

Additional Readings

  1. The Evolution of an x86 Virtual Machine Monitor
  2. Software Techniques for Avoiding Hardware Virtualization Exits
  3. Embra: Fast and Flexible Machine Simulation
  4. Fast Dynamic Binary Translation for the Kernel
  5. Enabling Intel Virtualization Technology Features and Benefits

Yiying
Slides
Annotated Slides
1/12 Virtualizing Memory
The first three pages of Performance Evaluation of Intel EPT Hardware Assist and at least the first four sections of Memory Resource Management in VMware ESX Server (OSDI'02)
Questions

  1. List at least one pro and one con each for software MMU and hardware MMU.
  2. What is the double paging problem and what caused it?
  3. What is the benefit of keeping a "hint" entry for each scanned (but unshared) page, as compared to not maintaining anything for the page?

Additional Readings

  1. Difference Engine: Harnessing Memory Redundancy in Virtual Machines

Yiying
Slides
Annotated Slides
1/19 Virtualizing I/O
First three sections of virtio: Towards a De-Facto Standard For Virtual I/O Devices and
first three sections of High Performance Network Virtualization with SR-IOV and
Network Virtualization Overview
Questions

  1. Is virtio a full virtualization or a paravirtualization technique? What's its main benefit?
  2. List at least one limitation of SR-IOV.
  3. What are the similarities and differences between network virtualization and traditional server virtualization?

Additional Readings

  1. vIC: Interrupt Coalescing for Virtual Machine Storage Device IO
  2. ELI: Bare-Metal Performance for I/O Virtualization
  3. Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor (ATC'01)
  4. Network Virtualization in Multi-tenant Datacenters (NSDI'14)
  5. The Design and Implementation of Open vSwitch (NSDI'15)

Yiying
Slides
Annotated Slides
1/24 Container Basics
Understanding and Hardening Linux Containers (mainly Ch 2 to Ch 5; you can ignore many of the details in these chapters. Read Ch 1 for more background on virtualization. Read other chapters if you are interested in security.)
Quiz 1
Questions

  1. What types of isolation do Linux containers achieve?
  2. Can one Linux container affect the performance of another Linux container on the same machine (i.e., performance isolation)? Why or why not?
  3. Why do you think containers are less "secure" than virtual machines?

Additional Readings

  1. LXC/LXD
  2. Docker
  3. Understanding Security Implications of Using Containers in the Cloud
  4. Container Security: Issues, Challenges, and the Road Ahead
  5. Slacker: Fast Distribution with Lazy Docker Containers

Yiying
Slides
1/26 Kubernetes and gVisor
Kubernetes and gVisor
Questions

  1. What is a Kubernetes Pod? How do you think it is useful in container orchestration?
  2. What does Kubernetes use etcd for? Why is having a consistent, atomic key-value store important for Kubernetes' control plane?
  3. Vulnerabilities in the Linux kernel make it unsafe for containers to call Linux system calls. How does gVisor solve this problem?

Additional Readings

  1. Borg, Omega, and Kubernetes (Google)
  2. The True Cost of Containing: A gVisor Case Study
  3. Container Isolation at Scale (Introducing gVisor) - Dawn Chen & Zhengyu He, Google
  4. Nabla Containers

Yiying
Slides
1/31 Serverless Computing
Pages 3 to 8 of Cloud Programming Simplified: A Berkeley View on Serverless Computing and a brief read of Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider (ATC'20)
Questions

  1. Today's serverless functions are stateless. How do you think different functions can share data and communicate?
  2. Can you think of any security threats of serverless computing? Bonus points if you can outline a real threat/attack.
  3. Can you think of any other ways to reduce or avoid cold starts in serverless computing (other than what the ATC'20 paper discusses)?

Additional Readings

  1. Amazon Lambda
  2. Google Cloud Functions
  3. Azure Functions
  4. Serverless Computing: Current Trends and Open Problems
  5. Serverless Workflows with Durable Functions and Netherite
  6. Serverless Computing: One Step Forward, Two Steps Back

Yiying
Slides
Mohammad Shahrad (Guest Speaker)
Slides
2/2 Serverless Computing 2
Pocket: Elastic Ephemeral Storage for Serverless Analytics (OSDI'18)
Questions

  1. Why isn't using existing in-memory key-value stores such as Redis and Memcached a good option for storing ephemeral data in serverless computing?
  2. How does Pocket balance storage load?
  3. Do you think Pocket solves all the problems of managing state in serverless computing? If not, what do you think are the remaining problems?

Additional Readings

  1. Occupy the Cloud: Distributed Computing for the 99% (PyWren)
  2. Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads
  3. SAND: Towards High-Performance Serverless Computing
  4. A Case for Serverless Machine Learning
  5. Archipelago: A Scalable Low-Latency Serverless Platform
  6. Cloudburst: Stateful Functions-as-a-Service

Yiying
Slides
2/7 LibraryOS
Unikernels: Library Operating Systems for the Cloud (ASPLOS'13)
Questions

  1. Name one benefit and one drawback of compiling a single-image VM.
  2. A unikernel runs in a single address space. Give one example of how this design helps improve performance.
  3. Comparing gVisor and Unikernels, which one do you think is more secure and which is more lightweight?

Additional Readings

  1. Unikernels as Processes
  2. Unikernels are unfit for production
  3. Rethinking the Library OS from the Top-Down
  4. Mirage OS
  5. Nabla Containers
  6. ClickOS and the Art of Network Function Virtualization
  7. Libra: a library operating system for a JVM in a virtualized execution environment
  8. Exokernel: an operating system architecture for application-level resource management
  9. Dune: Safe User-level Access to Privileged CPU Features (OSDI'12)

Yiying
Slides
2/9 Amazon Firecracker
Firecracker: Lightweight Virtualization for Serverless Applications (NSDI'20)
Questions

  1. What is the benefit of Firecracker over gVisor in terms of the specific goals Amazon has for their cloud production environments?
  2. What mechanism(s) allow Firecracker to run thousands of MicroVMs on the same machine (with 10x-20x oversubscription rate)?
  3. Why do you think Firecracker (when deployed to power AWS Lambda) runs one process (one slot) in one MicroVM?

Additional Readings

  1. Amazon Firecracker Git repo
  2. Kata Containers

Yiying
Slides
2/14 Para-Virtualization
Xen and the Art of Virtualization (SOSP'03)
Quiz 2
Questions

  1. Why can Xen allow guest OS system call handlers to be accessed directly (without any ring-0 Xen involvement) but not the guest page fault handler?
  2. What's the benefit of using asynchronous event notifications from Xen to a VM?
  3. What goals of Xen are not valid or less valid in today's cloud environments?

Additional Readings

  1. Understanding Full Virtualization, Paravirtualization, and Hardware Assist
  2. Safe Hardware Access with the Xen Virtual Machine Monitor
  3. Optimizing Network Virtualization in Xen
  4. Measuring CPU Overhead for I/O Processing in the Xen Virtual Machine Monitor
  5. Breaking Up is Hard to Do: Security and Functionality in a Commodity Hypervisor (SOSP'11)

Yiying
Slides
2/16 KVM and QEMU
kvm: the Linux Virtual Machine Monitor,
and QEMU, a Fast and Portable Dynamic Translator (It's OK to not fully understand Section 2)
Questions

  1. What is the implication of KVM forwarding I/O requests to the user space?
  2. What is the benefit of QEMU first translating the source instructions (guest) into micro-operations implemented in C and their compiled object files and then translating the object files into the target instructions (host)?
  3. Can you think of some good use cases for QEMU+KVM?

Additional Readings

  1. KVM Documentation

Yiying
Slides
2/23 Security
When Virtual is Harder than Real: Security Challenges in Virtual Machine Based Computing Environments (HotOS'05)
and Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds (CCS'09)
Questions

  1. Can you think of a drawback of enforcing security mechanisms at the hypervisor level (compared to at the guest OS or above)?
  2. When a zone and/or an instance type is used more frequently (i.e., has higher load from more tenants), do you think the co-location attack would become easier or harder? Why?
  3. Do you think a similar co-location attack exists with serverless computing (i.e., one function attacking another function on the same physical machine)? Does serverless computing make such attacks harder or easier, and why?

Additional Readings

  1. Secure Container Isolation: Problem Statement & Solution Space
  2. When Virtual Is Better Than Real (HotOS'01)
  3. Secure Pods: Sandboxing workloads in Kubernetes
  4. TrustVisor: Efficient TCB Reduction and Attestation
  5. SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes (SOSP'07)
  6. Breaking Up is Hard to Do: Security and Functionality in a Commodity Hypervisor (SOSP'11)
  7. InkTag: Secure Applications on an Untrusted Operating System (ASPLOS'13)
  8. Overshadow: A Virtualization-Based Approach to Retrofitting Protection in Commodity Operating Systems
  9. VirtuOS: An Operating System with Kernel Virtualization
  10. SCONE: Secure Linux Containers with Intel SGX
  11. Understanding Security Implications of Using Containers in the Cloud (ATC'17)
  12. Container Security: Issues, Challenges, and the Road Ahead

Ayush Agarwal
Slides
Yiying
Slides
2/28 Virtualizing non-CPU Processors (Accelerators)
GPU Virtualization on VMware’s Hosted I/O Architecture
and Do OS abstractions make sense on FPGAs? (OSDI'20)
Questions

  1. More than ten years after this paper, GPUs are used more for running machine-learning workloads and in cloud environments. For such uses, which virtualization techniques outlined in Section 3 do you think work better?
  2. Is space sharing or time sharing harder (e.g., bigger performance overhead, harder to implement, etc.) on an FPGA? Why?
  3. Does the idea of "API remoting" apply to FPGAs? For example, can we let VMs call APIs that have their implementation in an FPGA? Is that a good idea, and how is it different from API remoting on GPUs?

Additional Readings

  1. AvA: Accelerated Virtualization of Accelerators
  2. A Full GPU Virtualization Solution with Mediated Pass-Through (ATC'14)
  3. Sharing, Protection and Compatibility for Reconfigurable Fabric with AmorphOS (OSDI'18)
  4. Accelerating & Optimizing HPC/ML on vSphere Leveraging NVIDIA GPU (2019/02 talk)
  5. GPUvm: Why Not Virtualizing GPUs at the Hypervisor? (ATC'14)
  6. PTask: Operating System Abstractions To Manage GPUs as Compute Devices (SOSP'11)

Sitan Liu
Slides
Yiying
Slides
3/2 New Cloud Infrastructure
Amazon Nitro (esp. the video talk on that page)
Questions

  1. With Amazon Nitro, virtualization functions are mostly offloaded to hardware. Do we still need a hypervisor (or an OS)? Can everything just run in user space and interact with Nitro cards directly?
  2. Can you think of a drawback of offloading tasks to hardware (i.e., Nitro's approach)?

Additional Readings

  1. Intel Unveils Infrastructure Processing Unit

Yiying
Slides
3/7 Course Summary
Hints for Computer System Design - Butler Lampson
Quiz 3
Questions

Read the "Hints for Computer System Design" paper and summarize what you have learned over the course. Feel free to add any other comments you have about the course.

Yiying
Slides
3/9 Project Presentations