Exam Info
When & Where
The final exam will be held in our regular classroom on Wednesday, December 20, 2023. It will start promptly at 8:00pm and you will have until 10:00pm to complete it.
Exam rules
Be sure to arrive on time. If you arrive after the exam starts, you will not be allowed to take it.
This will be a closed book, closed notes exam. Calculators, phones, augmented reality glasses, laptops, and tablets are neither needed nor permitted. If you have these devices, you must turn them off, put them out of sight, and not access them for the duration of the exam.
No other electronic devices are permitted except for hearing aids, pacemakers, electronic nerve stimulators, other implanted medical devices, or electronic watches that function only as timekeeping devices or chronographs.
Bring a couple of pens or pencils with you. Plan to use a pen only if you are supremely confident in not changing your mind about your answers. . Check here for information about pencils, sharpeners, and the craft of pencil sharpening.
Past exams
The exam will be similar in structure to mid-semester exams but will cover material for the entire course. You can use past exams as a guide to what this exam may look like.
I do not refer to old exams when I come up with a new one, so it is likely that many of the topics that I considered important in past exams will show up on future exams. Some material may have changed, however, so do not worry about questions that appear to relate to topics we have not covered.
Study guide
You are responsible for the material covered throughout the semester.
The study guide is largely a concatenation of the first three study guides. It attempts to cover most of the material you should know but it is not a substitute for the lectures, lecture material, and other reading material. My goal is to put most of the information you need to know in as concise a form as possible, with more elaboration in areas where other coverage may be lacking.
Topics
Topics that you should know and may be on the exam include:
Introduction & Key Concepts
- Distributed system - definition
- Single system image
- Moore’s law
- Horizontal vs. vertical scaling
- Metcalfe’s law
- Multiprocessor vs. multicomputer (distributed system, networked computers)
- Fault tolerance
- Partial failure
- Fault tolerance definition
- Redundancy
- Recoverability
- Availability vs. reliability
- Series vs. parallel failures (understand the impact but you don’t need to memorize the formulas)
- Types of faults
- Fail-stop, halting
- Fail-restart
- Omission
- Timing
- Partition
- Byzantine failure
- Partial failure
- Single point of failure
- Latency: synchronous and asynchronous transmission modes
- State, replicas, caching
- Caching vs. replication
- Stale data
- Consistency
- Global state
- Transparency goals (just a high-level understanding)
- Location
- Migration
- Replication
- Concurrency
- Service models
- Centralized
- Client-server
- Multi-tier (how does it compare with layered architectures?)
- Peer-to-peer
- Hybrid
- You don’t need to know any of the “-as-a-service” models
Networking
Understand the concept of protocols and layering
- I will not ask you to enumerate the OSI or IP protocol stack but you should understand the functions of the data link, network, transport, and application layers.
Data Link Layer
- Local Area Network (LAN)
- Ethernet - service guarantees
- Data link addressing vs. network addressing: MAC address vs. IP address
Network Layer
- Internet Protocol (IP)
- connectionless service
- IP principles: survivable, decentralized design, routing
- delivery guarantees
Transport Layer: TCP & UDP
- Transport endpoints (ports) vs. network endpoints (addresses)
- TCP
- Understand connection-oriented service, full-duplex connections, reliable & in-order data delivery, flow control, and congestion control
- UDP
- Understand connectionless service and UDP’s delivery guarantees
- What are advantages of using UDP?
Clients and servers and services
Protocol encapsulation concept
- Ethernet device driver: encapsulate IP within an ethernet packet
- IP: encapsulates UDP or TCP packets within an IP packet
- TCP/UDP: encapsulates user data within a TCP or UDP packet
Sockets
- Concept
- Operations at the system call level: create, bind, connect, listen, shutdown, communication (know what they do, not what the parameters are)
General principles of the design of the Internet
- Connection of disparate networks via routers
- Unreliable communication
- End-to-end principle
Remote procedure calls
Concept of a remote procedure call (RPC)
Language vs. OS construct
Stub functions (proxies and skeletons)
Marshalling
- Data representation, endianess
- Pointerless representation
- Serialization
- Pass by reference issue
- Explicit versus implicit typing
Semantics of remote procedure calls
- Failure of RPCs
- at most once and at least once semantics
Idempotent vs. non-idempotent functions
Interface definition language (or notation) – purpose
Purpose of an RPC compiler
Purpose of an RPC name service
Boot ID, channel ID (from the homework)
Marshaling and serialization examples
- RPC-specfic solutions: XDR, NDR
- Google Protocol Buffers
- JSON
- XML
Remote procedure calls: case studies
Sun (ONC) RPC
- Service registration and discovery (portmapper)
- Need for Interface definition language, IDL
- Need for RPC compiler, rpcgen
- Need for versioning
DCE RPC (improvements over SUN RPC)
- Cells and cell directory server
- Unique universal ID: UUID
- DCE Network Data Representation: multi-canonical data representation
- Types of semantics supported (from the homework)
- Asynchronous requests and status checking (from the homework)
Microsoft COM+/DCOM/ORPC: high-level concepts
- Object RPC (ORPC): add interface pointer identifier to DCE RPC
- Build local COM object that uses ORPC to talk to server
- Surrogate process (loads objects at server)
- Marshaling: DCE Network Data Representation
- Garbage collection: remote reference counting and pinging
Java RMI (general concepts, service registration and lookup, serialiazable class, remote class)
- rmiregistry
- Remote interface (for remote references)
- Serializable interface (for parameters)
- Garbage collection in RMI: lease-based (dirty/clean)
Python RPyC (general concepts & unique additions)
- Symmetric operation: both sides can invoke functions on the other side
- No name server: have to know host & port
- Client can discover remote operations and generate proxy objects on the fly
- Passing data: by value for simple types, by reference for objects
- Objects passed by reference mean the remote side calls back to access the object or invoke methods
- Support for synchronous & asynchronous calls
Web services (general concepts)
- What is a Service Oriented Architecture (microservices)
- Don’t bother with the subtle distinctions betweeen SOA and microservices
- Benefits of using HTTP for transport
- XML-RPC and SOAP (very general concepts)
- Purpose of WSDL
- REST (general concepts; use of HTTP commands)
- I will not ask about AJAX and XMLHTTPRequest
gRPC (from the homework)
- Design goals
- Capabilities beyond basic RPC: timeouts, streaming (sequence of RPCs), HTTP/2 for transport
Clock synchronization
- What is UTC?
- Clock drift, offset, and jitter
- Drift compensation - linear compensation function
- What’s a time server?
- Cristian’s algorithm
- Understand the goal and formula for Cristian’s algorithm
- Error bounds. Remember that errors are cumulative (additive).
- Berkeley algorithm
- Understand the goal and how to compute time with the Berkeley algorithm
- Role of the leader
- What is a fault tolerant average?
- NTP
- NTP synchronization subnet
- NTP strata
- Offset
- What is a message delay?
- What is jitter?
- You don’t need to memorize the NTP(SNTP) formula but be sure to understand how it gives you the same result as Cristian’s
- You don’t need to know how dispersion and jitter are calculated
- You don’t need to know the NTP message structure or validation tests
- PTP
- Goal of PTP vs. NTP
- Role of master
- What is a best master clock?
- Don’t memorize the formula but understand the reason that there are three messages
Logical clocks
- Causal vs. concurrent events
- Lamport’s algorithm
- Goals
- Happened-before relationship
- Partial ordering
- Algorithm
- Deficiencies
- Remedy for generating unique timestamps from Lamport timestamps
- Vector clocks
- Goals
- Algorithm
- Know how to identify concurrent events by comparing timestamps
- Understand how to represent vector clocks if the group membership is unknown (sets of node:clock values)
Group communication
Unicast, broadcast, multicast
Multicast via broadcasting and diffusion group
Use of a coordinator, hierarchical coordinators
Multicast via publish-subscribe model
I won’t ask about anycast
Closed vs. open groups; hierarchical groups
Understand each of these failure modes:
- Crash failure
- Omission failure
- Byzantine failure
- Partition failure
Reliability of message delivery
- unreliable, best-effort delivery
- reliable: understand what it does
- Understand the concepts of pipelining, cumulative acknowledgments, piggybacked acknowledgments
- Positive vs. negative acknowledgments and feedback implosion
- How can you implement negative acknowledgments?
- Optimizing negative acknowledgments with feedback suppression
- Hierarchical feedback control to handle feedback implosion
- atomic: understand the fault-tolerant nature and the possible need for a persistant log. I will not ask about implementation
Ordering
- Good (consistent) vs. bad ordering
- Sending versus receiving versus delivering messages, hold-back queue
- Global-time order: understand what this means
- total order: understand how to achieve this
- causal (partial) order
- understand what this is
- precedence vector: understand how to use it
- understand when to hold back
- sync order: understand what this means
- Single Source FIFO (SSF): understand how to achieve this
- unordered multicasts
IP multicasting
- IP class D address – what is its purpose?
- You don’t need to know about ethernet multicast or the mapping between Ethernet and IP addresses
- Internet Group Management Protocol (IGMP)
- Purpose (host to edge router)
- Multicast-aware router
- Understand the basic protocol
- You don’t need to memorize the distinctions between v1, v2, and v3 of IGMP but know that v2 added a leave message and v3 added the ability to specify the source of a multicast
- Commands: membership query, membership report, leave group
- Protocol Independent Multicast (PIM)
- IGMP vs. PIM
- PIM Dense Mode multicast: flooding
- PIM Sparse Mode multicast
- Initiation
- Reverse path forwarding
- Rendezvous points
- PIM Dense Mode
- Initiation
- What is pruning?
Virtual Synchrony
- Faults
- Fail silent: fail-stop vs. fail-recover
- Byzantine failures
- Synchronous vs. asynchronous systems
- Two-armies problem: consensus in asynchronous systems
- How does redundancy help achieve scalability and availability?
- State machine replication
- Process group
- Virtual synchrony
- Group view
- View change
- Group Membership Service
- State transfer
- stable vs unstable message
- flushing messages on a view change
Mutual exclusion
- Understand properties: safety, liveness, fairness
- Types: centralized, token-based, contention-based
- Central-server algorithm
- Distributed mutual exclusion algorithms:
- Token ring
- Lamport
- Ricart & Agrawala
- Understand pros and cons of each
Election algorithms
- Bully algorithm
- Ring algorithm
- Chang and Roberts ring algorithm optimizations (I won’t ask you to recite the algorithm but understand what it does)
- Network partition problem (split-brain/segmentation)
Raft Consensus
- Purpose of replicated state machines
- Understand consensus requirements
- Validity
- Uniform agreement
- Integrity
- Termination (= progress)
- Raft
- Goal: fault tolerant consensus algorithm
- Quorum
- States that a server may take
- Terms and term numbers
- Leader election, timeouts, heartbeat, split votes
- RPCs used
- Log matching, committing entries
Network Attached Storage
- File service, file server
- Service models: remote access vs. upload/downloa
- Consistency: sequential semantics, session semantics, ambiguous semantics
- Stateful vs. stateless design
- Caching considerations
- Case studies
NFS
- Goals
- Mounting filesystems
You do not need to know the automounter (we did not discuss it in class)
- UDP vs TCP transport
- Directory and file access protocol (don’t memorize the list of RPC calls but understand why they exist, what lookup does, why there’s no open)
- Caching
- Inconsistencies because of caching & validation
- Know that a user-level lock manager was added to NFS but otherwise don’t bother with features of successive versions.
- Don’t memorize features of NFS v4 but know:
- the protocol is now stateful
- supports better caching (similar to oplocks)
- compound RPC support added
AFS (versions 1,2)
- Goals
- Caching
- Service model, semantics
- Cache coherence (callbacks)
Coda
- Goals
- Replicated volumes
- Accessible Volume Storage Group
- Client modification log
- Resolution
- Disconnected operation
- Reintegration
SMB
- Goals
- High-level protocol (RPC-like)
- Caching model: oplocks/leases (don’t memorize oplocks or leases but know what their purpose is)
- Microsoft DFS: just know that it adds consistent naming, similar to AFS
SMB2 – understand these concepts that improved performance
- Pipelining
- Compounding
- Credit-based flow control
Distributed file systems
Chubby
- Purpose: lock manager, name server, and simple file system
- Architecture of a Chubby cell: master and replicas
- Named locks
- Event notification
- Whole file reads and writes
- Don’t try to memorize the API
- Client cache consistency
- Consensus for replication
What is a parallel file system?
GFS
- Goal
- chunkservers - what do they do?
- master - what does it do?
- Google cluster environment: colocate computation and data on same set of machines
- chunk handle
- operation log
- why large chunks?
- replication: data flow vs. control flow
HDFS
- HDFS is practically a clone of GFS. Focus on GFS terminology. If I use any HDFS terminology, I will identify the GFS equivalent
- NameNode = Master
- DataNode = Chunkserver
- blocks = chunks
- Transaction log = Operation log
- You do not have to know any of the differences between GFS and HDFS (they’re minor)
- Rack-aware distribution - client tries to read from the closest replica
Dropbox
- Goal
- How does server traffic load differ from mosts other sites that store data (facebook, twitter, …)?
- What’s a metadata server (metaserver)?
- Why was a notification server added? Polling vs. notifications
Distributed Lookup (Distributed Hash Tables)
- Purpose: distributed, highly available key-value store
- Central coordinator
- Flooding
- What is an overlay network?
- Back propagation, time to live
- Distributed hash table (DHT)
- What is consistent hashing?
- Content-Addressable Network
- Multi-dimensional hashes, zone, node insertion
- Chord
- Ring, successor nodes
- How do finger tables make query routing more efficient?
- Amazon Dynamo
- Understand the goal: distributed, highly available key-value store
- Functional interface: get, put
- Consistency model for replication (eventually consistent)
- Goals: incremental scalability, symmetry, decentralization, heterogenous systems
- Conflict resolution
- Who does it?
- Use of vector timestamps for conflict detection
- Partitioning among multiple nodes
- What is a virtual node and why are they a good idea?
Domain Name Service (DNS)
- IP addresses versus domain names
- what does the DNS server do?
- I will not ask you about Regional Internet Registries
- I will not ask you about the domain name registry, registry operator, or registar but know that the registrar stores name server addresses for registered domain names.
- How do multiple DNS servers together provide domain name service?
- What is a zone?
- What does a DNS resolver do?
- What are root name servers?
- Iterative vs. recursive name resolution
Distributed transactions
Properties of transactions (ACID)
Nested transactions: locks, private work spaces, write-ahead logs
Two-phase commit protocol
- Understand the need for a log
- What happens in each of the phases?
- Problems if coordinator or participants die
Three-phase commit protocol
- Precommit
- Coordinator recovery
- Understand the conditions of what happens to the protocol when the coordinator crashes (slide 33)
- Problems with 3PC
How are the transaction consensus goals different from consensus algorithms such as Raft?
Scaling via replication and Brewer’s CAP Theorem
- Consistency
- Availability
- Fault tolerance
Eventual consistency: BASE approach vs. ACID
Concurrency control
- Schedules, serial schedule
- Lock manager
- Two-phase locking (2PL)
- What is the purpose? Preserves serializability (“Isolated” in ACID)
- Cascading aborts
- Strong strict two-phase locking (SS2PL)
- Separating read and write locks
- Read locks (shared locks), write locks (exclusive locks), commit locks
- Two-version-based concurrency control
- Timestamp ordering
- Optimistic concurrency control: just the concepts
- Validation phase
- Update phase
- Use of timestamp ordering
- Multiversion Concurrency Control
- Snapshot isolation
- Use of timestamps
Distributed deadlock
- Conditions for deadlock, deadlock resolution
- Centralized detection
- Wait-for graph, Global wait-for graph
- Phantom deadlock (what’s meant by this?)
- Detection: Chandy-Misra-Haas edge chasing algorithm
- Just the high level: a probe message circulates back to the originator
- Prevention: wait-die, wound-wait
- Just the high level: cycles are prevented because transactions can only wait on older (or younger) resources.
- I will not ask you to remember which is wait-die (old can only wait on young) or which is wound-wait (old kills young)
Bigtable
- Tablets, column families, columns, column family versions
- Table splitting
- Master server, tablet servers
- metadata tablets
- Memtable, SSTable
- Role of chubby
- Consistency model
Cassandra
- Type of database, consistency guarantees, wide-column data store
- What is a NoSQL database:
- What is a column?
- How are timestamps used?
- What is a row?
- What is the purpose of a partition key?
- What is the purpose of a clustering key?
- You do not need to know composite keys or compound keys
- How do you locate the machine containing a desired row?
- You do not need to know the commit log
- You do not need to know Bloom filters
- How is replication handled? What is hinted handoff?
- How are conflicts resolved?
- You do not need to know the various consistency levels.
Google Spanner
- Goals of Spanner
- External consistency
- Techniques used: commit protocol, locking, concurrency control
- Snapshots
- TrueTime
- Time masters
- Commit wait
MapReduce
- Master vs. Worker
- shards
- Map worker
- Map function
- Role of key
- Intermediate file
- Partition operation
- Reduce worker
- Shuffle and Sort operations
- Reduce operation
- How is fault tolerance handled?
Bulk Synchronous Parallel & Pregel
- What is the Bulk Synchronous Parallel (BSP) framework?
- What is a superstep?
- What is a barrier? How does communication work in BSP?
- What is the purpose of Pregel?
- Understand that computing is done from the point of view of each vertex
- What does vote to halt mean?
- What is a combiner? How does it make messaging more efficient?
- What is an aggregator?
- What does it mean to modify the topology of a graph?
- What does a Pregel master do?
- What is checkpointing? What happens after the failure of a worker?
Spark
- Goals
- What are Resilient Distributed Datasets (RDDs)?
- Transformations and Actions (you don’t need to remember any of them; just know what they are)
- Cluster manager, Executor, Jobs and Tasks: only understand what they do at a really high level
- Fault tolerance via recomputation and lazy evaluation
- Spark streaming - what’s a DStream?
Kafka
- Queuing vs. publish-subscribe model
- Topics
- Partitioned logs
- Consumers and Producers
- Consumer groups
- Achieving scale
Content Delivery Networks
- Flash crowd problem
- Load balancing, multihoming, mirroring, caching proxies
- Goals: nearest, available, likely
- Use and advantages of an caching overlay network in a CDN
- Mapping system, transport system
- Mapping system and transport system
- Use of dynamic DNS for mapping
- Load shedding
- Pull vs. push CDNs
- Understand CDN benefits: caching, routing, security, analytics, cost
Peer-to-peer content distribution
- BitTorrent
- Goal
- Purpose of a .torrent file
- Role of seed node
- Role of tracker
- Seeders vs. leechers
- Swarm
Security
Goals of confidentiality, integrity, availability
Data, origin, system integrity
Uses for cryptography: confidentiality, authentication, integrity, nonrepudiation
Confidentiality
- Plaintext, ciphertext, cipher, encryption
- Symmetric ciphers
- Communication with symmetric key cryptography
- Public key cryptography: use of two keys
- Communication with public key cryptography
Key exchange - pre-shared keys, trusted third party, Diffie-Hellman, and public keys
- Use of public key cryptography for key exchange
- Diffie-Hellman (know what it is used for and that it relies on a one-way function)
- What is a Diffie-Hellman common key?
- Know that Diffie-Hellman public and private “keys” are not encryption keys
- Long-term (permanent) vs. Ephemeral (session) keys
Hybrid cryptosystem (understand the advantage and how key exchange works)
Integrity
- Understand properties of cryptographic hash functions
- How does a hash function provide a framework for integrity
- Message authentication codes versus digital signatures
Authentication
- Multi-factor authentication
- What are the three factors?
- Reusable passwords: password authentication protocol (PAP)
- Hashed passwords
- Brute-force and dictionary attacks
- Challenge handshake authentication protocol (CHAP)
- Use of a nonce for authentication
- Shared secret
- Time-based One-Time Password (TOTP) algorithm
- Basic concept: password is a function of a shared secret and time of day
- Public key authentication
- How does it work? Understand the use of a nonce (a random string)
- Digital certificates (don’t memorize fields but understand the purpose and how a certificate works)
- Role of a certification authority (CA)
- Passkey authentication
- Understand the principles of transport layer security
OAuth & OpenID Connect
- OAuth
- Purpose of OAuth vs. OpenID Connect
- Role of service provider and consumer
- Role of access tokens: an authorization code is sent back via a browser redirect. Since the browser may not be fully trusted, the server uses the code to contact the service provider to get look up the real permission grant and get the access token.
- OpenID Connect
- Role of an identity provider
- Use of HTTP redirection
- Interaction with OAuth
Clusters
- Definition, single system image
- Clustering types: high performance, high availability, load balancing. storage (just the goals)
- Cluster membership and quorum
- Purpose of quorum disk
- Storage in a clustere system: network file systems (NAS) vs. distributed file systems vs. clustered file systems
- Shared disk, shared nothing, distributed lock manager
- Storage Area Network vs. System Area Network vs. Local Area Network
- Cluster interconnect: understand there’s an overhead to leaving the rack and to leaving the datacenter
- What is a clustered file system? How does it differ from a network (or distributed) file system?
- What is a heartbeat network?
- Active/active vs. active/passive operation
- Failover types: cold, warm, hot; multidirectional, cascading
- IP address takeover (IPAT)
- What is fencing?