When & Where
The second exam will be held in our regular classroom on Monday, March 30, 2026.
It will take up about half the lecture, starting approximately during the second half of the class period. Please arrive on time and do not plan on coming in just to take the exam. If you arrive after the exam has started, you will not be allowed to take it.
Exam rules
Be sure to arrive on time. If you arrive after the exam starts, you will not be allowed to take it.
This will be a closed book, closed notes exam. Calculators, phones, augmented reality glasses, laptops, and tablets are neither needed nor permitted. If you have these devices, you must turn them off, put them out of sight, and not access them for the duration of the exam.
No other electronic devices are permitted except for hearing aids, pacemakers, electronic nerve stimulators, other implanted medical devices, or electronic watches that function only as timekeeping devices or chronographs.
Bring a couple of pens or pencils with you.
Plan to use a pen only if you are supremely confident in not changing your mind about your answers. . Check here for information about pencils, sharpeners, and the craft of pencil sharpening.
Past exams
You can use my past exams as a guide to what this exam may look like. Some material has changed, so do not worry about questions that appear to relate to topics we have not covered. We covered topics I didn’t cover in past classes and removed some topics that seemed too detailed or are no longer relevant.
Expect around 25 multiple-choice questions. I do not refer to old exams when I come up with a new one, so it is likely that some of the topics that I considered important in past exams will show up on future exams.
Study guide
You are responsible for the material since the last exam (the four lectures and recitations starting from week 5).
The study guide is a concatenation of the study guides from the past lectures. It attempts to cover most of the material you should know. It is not a substitute for the lectures, lecture material, and other reading matter. All the material may not be in the guide. My goal is to put most of the information you need to know a concise with fewer elaborations.
You can also prepare your own guide, which would be a much better way to prepare for the exam!
Topics
Here is a list of topics you are expected to know for the exam. It is extensive, but it should be viewed as a coverage guide rather than a catalog of detailed algorithms or definitions to memorize.
Use it as a checklist. Go through each item and ask yourself whether you understand the concept and could recognize or apply it in context. If not, review the relevant notes.
Because the exam is multiple choice, your focus should be on understanding and being able to reason about the material, not on memorizing or reciting formal definitions.
Topics that you be familiar with and may be on the exam include:
Consensus
-
Partitions, split-brain problem
-
What is quorum?
-
Purpose of replicated state machines
-
What are replicated logs?
-
Understand consensus requirements
-
Validity
-
Agreement
-
Termination (= progress)
-
-
FLP Impossibility Result: what does it say?
-
What is the purpose of Paxos (not the algorithm or phases)?
-
Raft
-
Goal: fault tolerant consensus algorithm
-
Raft quorum
-
States: follower, candidate, leader
-
Terms and term numbers
-
Leader election, timeouts, heartbeat, split votes
-
AppendEntries
-
Log matching, committing entries
-
You don’t need to know about membership changes
-
Coordination Services
-
Goals, replication
-
Chubby
-
Purpose: lock manager, name server, and simple file system
-
Architecture of a Chubby cell: master and replicas
-
Named resources (files, locks)
-
Event notification (& purpose)
-
Whole file reads and writes
-
Don’t try to memorize the API
-
Client cache consistency
-
Consensus for replication
-
-
Zookeeper
-
Persistent vs. Ephemeral nodes
-
You don’t have to know about sequential znodes
-
You don’t hae to know the consistency model
-
You don’t hae to know locks can be constructed
-
-
etcd
-
Similarities with Chubby and Zookeeper (replicas via consensus-based log replication)
-
Key difference: leases are different objects
-
-
What is the purpose of fencing tokens?
Network-Attached Storage
-
Transparency via VFS, mounting filesystems
-
Access models: remote access vs. upload/downloa
-
Semantics: sequential vs. session semantics
-
Stateful vs. stateless design
-
Caching options
-
Case studies
-
NFS (original design)
- Goals
-
You do not need to know the automounter (we did not discuss it in class)
-
UDP vs TCP transport
-
Directory and file access protocol (don’t memorize the list of RPC calls but understand why they exist, what lookup does, why there’s no open)
-
Caching, validation, possible inconsistencies
-
Know that a user-level lock manager was added to NFS but otherwise don’t bother with features of successive versions.
-
-
AFS (original design)
-
Goals
-
Caching model, semantics, callbacks
-
Volumes, uniform global namespace
-
-
Coda
-
Goals
-
Replicated volumes
-
Accessible Volume Storage Group
-
Client modification log
-
Resolution
-
Disconnected operation
-
Reintegration
-
-
SMB
-
Goals
-
Caching model: oplocks/leases (don’t memorize oplocks or leases but know what their purpose is)
-
Microsoft DFS: just know that it adds consistent naming, similar to AFS
-
-
-
Enhancements to NFS and SMB
-
You don’t have to know version numbers (e.g., NFSv4 vs SMB2)
-
You don’t have to knw anything about the SMB3 features
-
Just understand the point of the mechanisms
-
SMB2 – understand these concepts that improved performance
-
Pipelining
-
Compounding
-
-
NFSv4
-
the protocol is now stateful
-
supports better caching (similar to oplocks)
-
compound RPC (compounding)
-
-
Distributed Lookup (Distributed Hash Tables)
-
Purpose: distributed, highly available key-value store
-
What is consistent hashing?
-
Content-Addressable Network
- Multi-dimensional hashes, zones, routing, node insertion
-
Chord
-
Ring, successor nodes
-
How do finger tables make query routing more efficient?
-
-
Amazon Dynamo
-
Understand the goal: distributed, highly available key-value store
-
Functional interface: get, put
-
Consistency model for replication (eventually consistent, but tunable via N, W, R)
-
Goals: incremental scalability, symmetry, decentralization, heterogenous systems
-
Conflict resolution
-
Who does it?
-
Use of vector timestamps for conflict detection
-
-
Partitioning among multiple nodes
-
What is a virtual node and why are they a good idea?
-
Hinted handoff
-
Distributed Lookup: DNS
-
Stub resolver
-
Recursive resolver
-
Authoritative server
-
Caching/consistency
Distributed file systems
-
GFS
-
Design goals, data & metadata
-
chunkservers - what do they do?
-
master - what does it do?
-
Google cluster environment: colocate computation and data on same set of machines
-
operation log
-
why large chunks?
-
writing & replication: data flow vs. control flow
-
You don’t have to know record append and snapshot features
-
You don’t have to know how checksums and chunk version numbers are used
-
-
HDFS
-
HDFS is practically a clone of GFS. Focus on GFS terminology. If I use any HDFS terminology, I will identify the GFS equivalent
-
NameNode = Master
-
DataNode = Chunkserver
-
blocks = chunks
-
Transaction log = Operation log
-
You do not have to know any of the differences between GFS and HDFS (they’re minor)
-
-
Dropbox (from the homework video and notes)
-
Goal
-
How does server traffic load differ from most other sites that store data (facebook, twitter, …)?
-
What’s a metadata server (metaserver)?
-
Why was a notification server added? Polling vs. notifications
-
Distributed transactions
-
Transactions, write-ahead log, commit, abort
-
ACID properties
-
Concurrency control
-
Optimistic vs. pessimistic
-
Read locks vs. write locks
-
Two-phase locking
-
Strong Strict Two-Phase Locking (SS2PL)
-
Private workspace (for OCC)
-
Multi-version concurrency control (MVCC)
-
Snapshot isolation
-
-
Deadlock
-
Wait-for-graph
-
Centralized detection, phantom deadlocks
-
Chandy-Misra-Haas algorithm, probe messages
-
Timestamp-based prevention: wait-die and wound-wait
-
You don’t have to remember the difference between wait-die and wound-wait
-
-
Two-phase commit protocol
-
Understand the need for a log
-
What happens in each of the phases?
-
Problems if coordinator or participants die
-
Uncertain state
-
-
Three-phase commit protocol
-
Precommit phase
-
Supporting coordinator recovery
-
Main problem with 3PC
-
-
Consistency models
-
Linearizability
-
Sequential consistency
-
Causal consistency
-
Eventual consistency
-
-
Scaling via replication and Brewer’s CAP Theorem
-
Consistency, Availability, Partitions
-
PACELC: how does it extend CAP?
-
-
Eventual consistency: BASE approach vs. ACID
Last update: Sat Mar 28 19:55:18 2026