"Exokernel: An Operating System Architecture for Application-Level Resource Management"

Problem

How can we design an OS that improves the performance of standard applications while also giving applications the flexibility to customize resource management for better performance?

Approach

extensibility: the application knows best what resource management it needs, so it should make those decisions whenever possible (end-to-end argument)
minimalist: the kernel's job is to protect resources, not to manage them ("separate protection from resource management")
challenge: identify the core abstractions for different resources
Thin kernel, fat OS libraries

Background: extensibility

There are five approaches to extensibility (including Exokernel):

OS per application

Example: Fluke
Idea:
- Hypervisor provides resource management and isolation
- Additional guest-OS layers redundant and unnecessary
- Collapse guest OS and application into the same domain (typically by compiling the OS and app into a single binary)
Pros:
- Fast (the same advantage unikernels provide)
Cons:
- Multiple applications cannot easily co-exist
- Kernels are fragile and hard to modify

Microkernels

Examples: Hydra, Mach
Idea:
- Minimal OS core to manage hardware
- Higher level abstractions in user space
- IPC is the fundamental cross-domain primitive
Pros:
- Fault isolation
- Better extensibility
Cons:
- Slow (kernel crossings)
- Limited extensibility (may make it easier for OS developers to extend the system, but not users)

Virtual machines

Examples: VM370, Disco, VMware, Xen
Ideas:
- Different apps need different OSes
- Figure out how to run more than one OS at a time
Pros:
- Low-level interface ("ideal" by the exokernel's standard)
Cons:
- "emulate" machine vs. "export" resources (e.g., need to emulate "privileged" instructions)
- Poor IPC (traditionally): machines are isolated from each other
- Hide resource management from the guest OS

Download untrusted code into kernel

Examples: Spin, Vino
Ideas:
- OS provides extensibility interfaces
- Apps provide extensions that execute in kernel mode
Pros:
- Extensibility: applications can customize kernel behavior
Cons:
- Still working with same OS structure
- Only extensible within the limits of the extensibility API
- New thicket of isolation and trust issues

System designs

High-level Architecture

Like the previous four approaches, the exokernel is a system architecture focused on extensibility.

[Figure: exokernel architecture]

Top level structure:
- A small monolithic kernel
  - low-level, fixed interface
  - Ideally hardware interface
  - few and simple abstractions
  - extension types: resource state data (page table entries), specialized resource management modules
- Libraries of untrusted resource management routines
  - VM replacement
  - File System
  - IPC
  - ...

Note

Libraries are part of the OS. Historically, the OS was a set of libraries (for math, etc.); this is no longer true today.

Key difference is trust: an application can overwrite a library, jump to a bad address inside it, etc., so the kernel cannot trust the library.
Exokernel borrows liberally from other approaches:
- Like Fluke: make it easy for each app to have custom OS
- Like virtual machines: the exokernel exports a virtual machine, but it differs in
  - Transparency: a traditional VM aims to run unmodified OSes; an exokernel VM aims to support custom OSes
  - Export rather than emulate resource: libOS is aware of multiplexing
- Like Vino, Spin: one mechanism for extensibility is to download untrusted code into kernel
Philosophy
- Traditional OS = protection + abstraction
- Exokernel:
  - Protection = kernel (minimal mechanism) + library (resource sharing policy)
  - Abstraction = library

Exokernel principles

Separate protection and management
- export resources at the lowest level possible while still protecting them (e.g., disk blocks, TLB entries, etc.)
- resource management only at level needed for protection (e.g., allocation, revocation, sharing, tracking of ownership)
- "abstraction (mechanism) is policy": the implementation of abstractions in library operating systems can be simpler and more specialized than in-kernel implementations, because library operating systems need not multiplex a resource among competing applications with widely different demands
Expose allocation: applications allocate resources explicitly
Expose names: use physical names (e.g., physical memory addresses, enabling cache coloring; physical disk locations, exposing disk arm position)
Expose revocation: let apps choose which instances of a resource to give up
Expose information: let applications map in (read-only) internal kernel data structures (e.g., the software TLB, the CPU schedule, etc.)
Exterminate all operating system abstractions (end-to-end)
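The expose-allocation and expose-names principles can be sketched as a kernel that only records who owns each physical page, while a library OS picks pages by physical name to apply its own policy (here, cache coloring). This is a minimal sketch, not the paper's interface: `Exokernel`, `alloc_page`, `libos_alloc_colored`, the 16-page machine, and the "page number modulo 4" color rule are all hypothetical.

```python
NUM_PAGES = 16   # hypothetical machine size
NUM_COLORS = 4   # hypothetical number of cache colors


class Exokernel:
    """Hypothetical kernel: protects physical pages but applies no allocation policy."""

    def __init__(self):
        # owner[p] is the application that owns physical page p, or None if free
        self.owner = [None] * NUM_PAGES

    def free_pages(self):
        # Expose information: applications may see which physical pages are free
        return [p for p, o in enumerate(self.owner) if o is None]

    def alloc_page(self, app, page):
        # Expose allocation + names: the request names one specific physical page
        if self.owner[page] is not None:
            raise PermissionError(f"page {page} is already owned")
        self.owner[page] = app

    def dealloc_page(self, app, page):
        # Protection only: just the owner may free a page
        if self.owner[page] != app:
            raise PermissionError(f"{app} does not own page {page}")
        self.owner[page] = None


def color_of(page):
    # Assumption: a page's cache color is its physical page number modulo NUM_COLORS
    return page % NUM_COLORS


def libos_alloc_colored(kernel, app, color, count):
    """Library-OS policy: grab up to `count` free pages of a single cache color."""
    chosen = [p for p in kernel.free_pages() if color_of(p) == color][:count]
    for p in chosen:
        kernel.alloc_page(app, p)
    return chosen


kernel = Exokernel()
print(libos_alloc_colored(kernel, "A", color=1, count=3))  # [1, 5, 9]
```

The kernel never decides which page an application receives; it only refuses requests for pages that someone else already owns, so the coloring policy lives entirely in the library.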

Key Mechanisms

Secure bindings

Bind at large granularity; access at small granularity (allow kernel to protect resources without understanding them)
Do access check at bind time, not access time (e.g., when loading TLB entry for a page, not when accessing page)
Examples:
- Hardware: TLB
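- Software: software TLB cache
- Download code (e.g., packet filter): type-safe language, sandboxing, interpreters, etc.
- Traditional file system analogy: permissions are checked once at open; later reads and writes use the open file without re-checking them
Challenge: secure bindings vs. Saltzer and Schroeder's "complete mediation" principle (every access should be checked)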
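A secure binding can be sketched as an ownership check performed once when a translation is installed, after which accesses only consult the cached binding. The names `Kernel`, `bind`, `translate`, and `revoke_page` are hypothetical, and a Python dict stands in for the software TLB. `revoke_page` shows the tension with complete mediation: because accesses are no longer checked, revoking a resource must also flush every binding to it.

```python
class Kernel:
    """Hypothetical kernel that protects pages through secure bindings."""

    def __init__(self, page_owner):
        self.page_owner = dict(page_owner)  # physical page -> owning application
        self.stlb = {}                      # software TLB: (app, vpage) -> physical page

    def bind(self, app, vpage, ppage):
        # Bind time: the only place ownership is checked
        if self.page_owner.get(ppage) != app:
            raise PermissionError(f"{app} may not map physical page {ppage}")
        self.stlb[(app, vpage)] = ppage

    def translate(self, app, vpage):
        # Access time: a lookup only, with no ownership check
        if (app, vpage) not in self.stlb:
            raise LookupError(f"no binding for {app}:{vpage}")
        return self.stlb[(app, vpage)]

    def revoke_page(self, ppage):
        # Accesses are not re-checked, so revocation must flush stale bindings
        self.page_owner.pop(ppage, None)
        for key in [k for k, p in self.stlb.items() if p == ppage]:
            del self.stlb[key]


k = Kernel({7: "A", 8: "B"})
k.bind("A", 0, 7)
print(k.translate("A", 0))  # 7
```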

Visible revocation

Transparent revocation (Traditional OS)
- OS decides how many resources to give to apps
- OS chooses what to revoke and takes it
- Still needed when revocation is frequent and must be fast (e.g., address space identifiers (ASIDs))
Notify on revocation (Exokernel)
- abort protocol; repossession vector; scheduler activations
- OS decides how many resources to give to apps
- OS chooses what to revoke, takes it, and tells application (or libOS)
  - Call application handler when taking away page, CPU, etc
  - Application can react: update data structures (e.g., reduce # of threads when CPU goes away; scheduler activations) and decide what page to give up
- Repossess dirty disk blocks (e.g., store them to a "swap server")
- Exception: ASIDs (processor addressing-context identifiers) are best revoked transparently because they are revoked so frequently
Cooperative revocation (Exokernel)
- callbacks
- OS decides how many resources to give to apps
- OS asks application or libOS to give up a resource; libOS/app decides which instance to give up
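Cooperative revocation splits the decision in two: the kernel decides how many pages to reclaim, and the library OS decides which ones. In this minimal sketch the names `LibOS`, `on_revoke`, and `Kernel.revoke` are hypothetical, and the least-recently-used victim policy is an assumption standing in for whatever policy a real libOS would choose.

```python
class LibOS:
    """Hypothetical library OS that keeps its own page-usage bookkeeping."""

    def __init__(self, pages):
        self.last_use = {p: 0 for p in pages}  # page -> last access time

    def touch(self, page, now):
        self.last_use[page] = now

    def on_revoke(self, count):
        # Revocation upcall: the libOS, not the kernel, chooses the victims.
        # Policy (an assumption): least recently used first, ties broken by page number.
        victims = sorted(self.last_use, key=lambda p: (self.last_use[p], p))[:count]
        for p in victims:
            del self.last_use[p]
        return victims


class Kernel:
    def __init__(self):
        self.owner = {}  # physical page -> LibOS

    def grant(self, libos):
        for p in libos.last_use:
            self.owner[p] = libos

    def revoke(self, libos, count):
        # The kernel decides HOW MANY pages to reclaim; the libOS decides WHICH
        returned = libos.on_revoke(count)
        for p in returned:
            if self.owner.get(p) is not libos:
                raise PermissionError(f"page {p} was not owned by this libOS")
            del self.owner[p]
        return returned


lib = LibOS([1, 2, 3, 4])
kernel = Kernel()
kernel.grant(lib)
for page, t in [(1, 5), (2, 3), (3, 9), (4, 1)]:
    lib.touch(page, t)
print(kernel.revoke(lib, 2))  # [4, 2]
```

The kernel still verifies that every returned page really belonged to the libOS, so a buggy or malicious library cannot "give up" someone else's page.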

Abort protocol

When voluntary revocation fails, the kernel takes the resource by force and tells the application what it took away. This lets the library keep its internal state consistent with the resources it actually holds.
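The abort protocol can be sketched as "ask first, then repossess and notify". In the paper the trigger is the library failing to respond within a time limit; here, as a simplifying assumption, a libOS that returns too few pages counts as failing. `LibOS`, `on_repossession`, `revoke`, and the rule that the kernel takes the lowest-numbered pages are all hypothetical.

```python
class LibOS:
    """Hypothetical library OS; cooperative=False models one that ignores requests."""

    def __init__(self, pages, cooperative=True):
        self.pages = set(pages)
        self.cooperative = cooperative

    def on_revoke(self, count):
        if not self.cooperative:
            return []                       # fails to give anything back
        victims = sorted(self.pages)[:count]
        self.pages.difference_update(victims)
        return victims

    def on_repossession(self, repossession_vector):
        # Abort-protocol notification: bring libOS state in line with what was taken
        self.pages.difference_update(repossession_vector)


def revoke(owned, libos, count):
    """Ask the libOS first; if that falls short, repossess by force and tell it.

    `owned` is the kernel's own record of the pages this libOS holds.
    """
    returned = [p for p in libos.on_revoke(count) if p in owned]
    owned.difference_update(returned)
    shortfall = max(count - len(returned), 0)
    # Assumption: the kernel repossesses the lowest-numbered remaining pages
    repossession_vector = sorted(owned)[:shortfall]
    if repossession_vector:
        owned.difference_update(repossession_vector)
        libos.on_repossession(repossession_vector)
    return returned, repossession_vector


owned = {1, 2, 3}
stubborn = LibOS([1, 2, 3], cooperative=False)
print(revoke(owned, stubborn, 2))  # ([], [1, 2])
```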

Capabilities

Encryption-based tokens that prove the right to access a resource
The idea is to help the kernel make access-rights decisions
Pros:
- Simple
- Generic across resources
- Hierarchical: using capabilities to protect resources enables applications to grant access rights to other applications without kernel intervention. Applications can also use "well-known" capabilities to share resources easily
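A self-authenticating capability can be sketched with a keyed MAC: the kernel mints a token over a resource name and a rights string, and checking access means recomputing that token. This swaps in an HMAC from Python's standard `hmac` module for the "encryption-based" token; it is not the paper's exact construction, and `mint`, `check`, `KERNEL_KEY`, and the `"resource|rights"` encoding are hypothetical.

```python
import hashlib
import hmac
import secrets

# Known only to the kernel (assumption: a single key for all capabilities)
KERNEL_KEY = secrets.token_bytes(32)


def mint(resource: str, rights: str) -> bytes:
    """Kernel issues a token binding a resource name to a set of rights."""
    message = f"{resource}|{rights}".encode()
    return hmac.new(KERNEL_KEY, message, hashlib.sha256).digest()


def check(resource: str, rights: str, token: bytes) -> bool:
    """Kernel access check: recompute the tag; no per-resource ACL is needed."""
    return hmac.compare_digest(mint(resource, rights), token)


# Application A receives a capability and hands it to B directly (no kernel call)
cap_for_a = mint("disk-block-42", "read")
cap_for_b = cap_for_a
print(check("disk-block-42", "read", cap_for_b))   # True
print(check("disk-block-42", "write", cap_for_b))  # False
```

Because the resource is just a name inside the token, the same check works for pages, disk blocks, or network endpoints, which is the "generic across resources" point above.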

Others

Wakeup predicates: wake up a process when an arbitrary condition becomes true (checked when the scheduler is looking for something to run)
Buffer cache registry: bind disk blocks to memory pages (applications can share cached pages)
Block state to order writes
UDFs (untrusted deterministic functions)
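Wakeup predicates can be sketched as conditions the scheduler evaluates only when it is looking for a process to run. In the real system the predicate is code downloaded into the kernel in a safe form; here a Python closure stands in for it, and `Scheduler`, `sleep_until`, `pick_next`, and the `shared` dict are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class Scheduler:
    """Hypothetical scheduler that evaluates wakeup predicates while picking a process."""

    def __init__(self):
        self.runnable: List[str] = []
        self.sleeping: Dict[str, Callable[[], bool]] = {}

    def sleep_until(self, name: str, predicate: Callable[[], bool]) -> None:
        self.sleeping[name] = predicate

    def pick_next(self) -> Optional[str]:
        # Predicates are checked only when the scheduler looks for work to run
        for name, predicate in list(self.sleeping.items()):
            if predicate():
                del self.sleeping[name]
                self.runnable.append(name)
        return self.runnable.pop(0) if self.runnable else None


shared = {"packets_waiting": 0}  # stands in for memory the predicate may read
sched = Scheduler()
sched.sleep_until("netd", lambda: shared["packets_waiting"] > 0)
print(sched.pick_next())  # None
shared["packets_waiting"] = 1
print(sched.pick_next())  # netd
```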

Specific Abstractions

Many abstractions must be implemented on an exokernel system: exception handling, page protection/sharing, processor scheduling, fork/exec, VM replacement, network protocols, and the file system. Here, I only summarize the paper's discussion of networking.

Network

Multiplexing the network: packet filter

Idea: load a small piece of code that examines each incoming packet and decides whether it is for me
Implemented by downloading code into the kernel, written in a simple, safe language: no loops, all memory references checked, etc.
Problem: what if I lie and say "yes it is for me" when it isn't?
- Solution: "assume they don't lie"
- Claim: could use a trusted server to load these filters, or check that a new filter never overlaps with an existing one [not likely to solve the problem]
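A packet filter in a loop-free language can be sketched as a conjunction of byte-equality tests at fixed offsets, with every reference bounds-checked. For this restricted form the overlap check from the claim is easy: two filters can accept the same packet unless they demand different values at some byte position. `Demux`, `matches`, `could_overlap`, and the 3-byte header layout (protocol, then port) are hypothetical; real filter languages are richer and their overlap checks are harder.

```python
from typing import Dict, List, Optional, Tuple

# A filter is a conjunction of byte-equality tests: (offset, expected bytes).
# There are no loops or jumps, so evaluating a filter always terminates.
Filter = List[Tuple[int, bytes]]


def matches(flt: Filter, packet: bytes) -> bool:
    for offset, expected in flt:
        end = offset + len(expected)
        if offset < 0 or end > len(packet):  # every memory reference is bounds-checked
            return False
        if packet[offset:end] != expected:
            return False
    return True


def required_bytes(flt: Filter) -> Optional[Dict[int, int]]:
    """Map byte position -> required value; None if the filter contradicts itself."""
    required: Dict[int, int] = {}
    for offset, expected in flt:
        for i, value in enumerate(expected):
            if required.setdefault(offset + i, value) != value:
                return None
    return required


def could_overlap(a: Filter, b: Filter) -> bool:
    """Two such filters can accept one packet unless some byte position conflicts."""
    ra, rb = required_bytes(a), required_bytes(b)
    if ra is None or rb is None:
        return False
    return all(rb.get(pos, value) == value for pos, value in ra.items())


class Demux:
    def __init__(self) -> None:
        self.filters: Dict[str, Filter] = {}

    def install(self, owner: str, flt: Filter) -> None:
        if any(offset < 0 for offset, _ in flt):
            raise ValueError("negative offsets are not allowed")
        # Reject a filter that could claim packets an existing filter accepts
        for other, existing in self.filters.items():
            if could_overlap(flt, existing):
                raise ValueError(f"filter for {owner} overlaps {other}'s filter")
        self.filters[owner] = flt

    def deliver(self, packet: bytes) -> Optional[str]:
        for owner, flt in self.filters.items():
            if matches(flt, packet):
                return owner
        return None


# Hypothetical header: byte 0 = protocol number, bytes 1-2 = destination port
demux = Demux()
demux.install("dns", [(0, b"\x11"), (1, b"\x00\x35")])
demux.install("web", [(0, b"\x06"), (1, b"\x00\x50")])
print(demux.deliver(b"\x11\x00\x35query"))  # dns
```

A "liar" filter such as `[(0, b"\x11")]`, which accepts every packet with that protocol byte, is rejected at install time because it overlaps the existing `dns` filter.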

Application-specific safe handlers (ASH)

Application-level message handlers that are downloaded into the kernel (they can reply to a packet without a context switch)
- Example: the Auspex file server responds to NFS getattr requests in hardware, in the network interface
Pros:
- direct message vectoring: the ASH knows where a message should land in user memory and thus avoids copies
- dynamic integrated layer processing (e.g., compute the checksum as data is copied into the network interface)
- message initiation (fast replies)
- No danger of deadlock
- control initiation (active messages)
In Figure 2 of the paper: without an ASH, the exokernel simply places the message in the application's buffer, and the application handles it later, when it is next scheduled. Because the paper uses a round-robin scheduler, ping latency grows linearly with the number of active processes.
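The linear growth can be illustrated with a simplified round-robin model. This is not the paper's measurement setup and its numbers are hypothetical: slot k runs process k % n_procs for one quantum, a packet waits until the receiver's next slot unless the receiver is already running, and an ASH handles the packet at arrival (its own cost is ignored). `wait_without_ash` and `wait_with_ash` are invented names.

```python
import math


def wait_without_ash(arrival: float, receiver: int, n_procs: int, quantum: float) -> float:
    """Time until `receiver` next runs under round-robin, so it can handle the packet.

    Simplified model (an assumption): slot k = [k*quantum, (k+1)*quantum) runs
    process k % n_procs, and a running receiver handles the packet immediately.
    """
    slot = math.floor(arrival / quantum)
    if slot % n_procs == receiver:
        return 0.0
    slots_ahead = (receiver - slot) % n_procs
    return (slot + slots_ahead) * quantum - arrival


def wait_with_ash(arrival: float, receiver: int, n_procs: int, quantum: float) -> float:
    # The ASH runs in the kernel when the packet arrives, independent of the schedule
    return 0.0


# Worst case: the packet arrives just as the receiver's slot ends
for n in (1, 2, 4, 8):
    print(n, wait_without_ash(10.0, 0, n, 10.0), wait_with_ash(10.0, 0, n, 10.0))
```

With a quantum of 10 time units, the worst-case wait without an ASH is 0, 10, 30, and 70 for 1, 2, 4, and 8 processes, i.e. (n - 1) * quantum, while the ASH wait stays at 0.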