These notes cover two related problems in distributed systems: how to move data efficiently between services in real time, and how to deliver that data to millions of end users worldwide. The first half focuses on message queues and stream processing. The second half examines content delivery networks, peer-to-peer distribution, and the practice of executing code at the network edge.
Both halves are about the same underlying challenge: at scale, you cannot route everything through a single point. You need to distribute both the data and the work.
Part 1: Message Queues and Event Streaming
The Problem: Continuous Data Streams
Most distributed system designs treat services as request/response machines. A client asks for something, a server answers. That model works well for databases, file systems, and RPC-based services, but it breaks down when the data is continuous and high-volume.
Consider what LinkedIn was handling in the early 2010s: hundreds of millions of page views per day, user activity events, database change feeds, metrics from thousands of servers, and log data from every service. The volume was not the only challenge. The events needed to reach multiple other services, those services processed data at different rates, and the producers should not need to know anything about who was consuming their output.
The solution is a message broker: a service that sits between producers and consumers, stores messages durably, and lets each side operate at its own pace.
The Publish-Subscribe Model
In a publish-subscribe system, producers (also called publishers) write messages to the broker without knowing who will read them. Consumers (also called subscribers) read from the broker without knowing who wrote the messages. The broker decouples producers from consumers: neither side needs to know anything about the other, they do not need to be running at the same time, and they do not need to operate at the same speed.
The broker stores messages and makes them available to consumers, typically preserving arrival order. This is often described as a queue, and that is a useful starting point, but different systems implement this storage model very differently. RabbitMQ uses traditional queues that discard messages after consumption; Kafka uses a persistent partitioned log that consumers can re-read at any time.
Messages are organized by topic, which is a named category or feed. A producer sends messages to a topic. A consumer subscribes to one or more topics. Different consumers can subscribe to different topics, so a single broker can serve many independent data streams simultaneously.
Delivery Semantics
The vocabulary for describing how reliably a messaging system delivers messages is the same as for RPC, and the reasons are the same: network failures can cause duplicate delivery, and consumers need to handle that correctly.
At-most-once delivery means a message is sent once and not retried. If the consumer is unavailable or a failure occurs, the message is lost. This is the fastest option because no acknowledgment is needed, but it is only acceptable when occasional loss is tolerable: for example, metrics aggregation or low-stakes sensor readings.
At-least-once delivery means the system retries until it receives an acknowledgment, so the message is guaranteed to arrive but may arrive more than once. Consumers must be prepared to handle duplicates, either by ignoring them or by making processing idempotent (producing the same result whether the message is processed once or ten times).
Exactly-once delivery is the hardest guarantee and worth treating with care. In a distributed system, true exactly-once is often not achievable end-to-end without specific coordination between the source, the broker, and the destination. What most systems actually offer is better described as exactly-once effect: the observable outcome is as if each message were processed exactly once, achieved by combining at-least-once delivery with idempotent or transactional processing at the consumer. The guarantee depends on assumptions about the source, the broker, and the sink (the destination system that receives and stores the output) all cooperating. When a system advertises exactly-once semantics, the first question to ask is: under what assumptions, and at what cost?
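The practical upshot for consumers is that at-least-once delivery plus idempotent handling gives you exactly-once effect. Here is a minimal, framework-independent sketch in Python; the message format, the "id" field, and the in-memory store are illustrative, and a real system would record processed IDs durably, ideally in the same transaction as the processing result:

```python
# Hedged sketch: turning at-least-once delivery into exactly-once *effect*
# by deduplicating on a message ID (the "id" field is an assumption here).

processed_ids = set()  # production systems persist this alongside the results

def apply_side_effects(message):
    print("processing", message["payload"])  # stand-in for real work

def handle(message):
    msg_id = message["id"]
    if msg_id in processed_ids:
        return                    # duplicate redelivery: safely ignored
    apply_side_effects(message)
    processed_ids.add(msg_id)     # a retry of this message is now a no-op

handle({"id": "m-1", "payload": "charge $10"})
handle({"id": "m-1", "payload": "charge $10"})  # redelivered duplicate: no effect
```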
RabbitMQ
RabbitMQ is a message broker built around the Advanced Message Queuing Protocol (AMQP). Its design places routing responsibility in the broker itself: the broker actively decides where each message goes based on rules you configure, rather than leaving that logic to consumers.
In RabbitMQ, producers do not write directly to queues. Instead, they publish messages to an exchange, which routes them to one or more queues based on rules. Queues connect to exchanges through bindings; each binding can carry a binding key that the exchange uses when deciding which messages to route to that queue.
There are several exchange types:
- Direct exchange: routes a message to every queue whose binding key exactly matches the message’s routing key. Often used for selective routing or work-queue patterns.
- Fanout exchange: broadcasts a message to every queue bound to the exchange, ignoring routing keys entirely. Used for broadcast scenarios.
- Topic exchange: routes based on pattern matching against routing keys. Routing keys use dot-separated words (for example, sensors.temperature.nyc). The wildcard * matches exactly one word in a position; # matches zero or more words. A subscriber to sensors.# receives all sensor messages regardless of location or type.
- Headers exchange: routes based on message header attributes rather than routing keys.
RabbitMQ supports configurable message durability: a producer can mark messages as persistent, and the broker writes them to disk so they survive a restart. Consumers send an acknowledgment back to the broker when they have successfully processed a message. Until the broker receives that acknowledgment, it will not remove the message from the queue. If the consumer crashes before acknowledging, the broker redelivers the message to another consumer.
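To make this concrete, here is a minimal sketch using the Python pika client against a local RabbitMQ broker; the exchange, queue, and routing-key names are illustrative:

```python
import pika

# Connect to a broker assumed to be running on localhost.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# A topic exchange, a durable queue, and a binding with a wildcard pattern:
# this queue receives every temperature reading, whatever the city.
channel.exchange_declare(exchange="sensors", exchange_type="topic")
channel.queue_declare(queue="temperatures", durable=True)
channel.queue_bind(exchange="sensors", queue="temperatures",
                   routing_key="sensors.temperature.*")

# Publish a persistent message (delivery_mode=2 asks the broker to write it to disk).
channel.basic_publish(
    exchange="sensors",
    routing_key="sensors.temperature.nyc",
    body=b"71F",
    properties=pika.BasicProperties(delivery_mode=2),
)

# Consume with manual acknowledgment: if this callback never acks (for
# example, the consumer crashes), the broker redelivers the message.
def on_message(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="temperatures", on_message_callback=on_message)
channel.start_consuming()
```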
The distinguishing characteristic of RabbitMQ is where routing logic lives: it is entirely in the broker. The broker knows about every exchange, every queue, and every binding rule, and it is responsible for delivering each message to the right destination. Once a consumer acknowledges a message, the broker removes it. This model works well for task queues, job distribution, and RPC-style patterns where you want the broker to handle routing and where each message is processed once and then discarded.
RabbitMQ’s weakness is scale. It is well-suited to flexible routing and work-queue patterns, but it does not have the same scale-out model as Kafka: it cannot distribute a topic’s workload across many servers the way Kafka’s partitioned log does. And because messages are deleted after consumption, RabbitMQ cannot replay history.
Apache Kafka: The Log as an Architectural Primitive
Kafka was developed at LinkedIn and released as open source. It is now one of the most widely deployed distributed systems in production.
The central insight behind Kafka is that a log is the right abstraction for a distributed system. A log is an append-only, totally ordered sequence of records. Every database uses a transaction log internally to record changes. Most modern file systems use a journal. Kafka exposes this log as a standalone, distributed, replicated service that any application can write to and read from.
If every service writes its state changes to a log and every other service reads from it, you have a universal integration point. You can replay events to rebuild state. You can add new consumers without changing producers. You can reconstruct what happened at any point in time. This architectural approach is called event sourcing.
Topics and Partitions
Messages in Kafka are organized into topics. Each topic is a named log. Unlike a traditional queue, Kafka does not delete messages when they are consumed. Messages persist for a configurable retention period (time-based or size-based), and any consumer can read any part of the log at any time.
A topic is divided into partitions. A partition is a log: an ordered sequence of records that grows only by appending. Once a record is written to a partition, it is never modified or overwritten. Each record is assigned a sequential integer, called an offset, that uniquely identifies its position in the partition.
One important detail: total ordering in Kafka is per-partition, not per-topic. A topic with multiple partitions does not guarantee ordering across partitions. If you need all events for a particular entity to be processed in order (say, all transactions for a given bank account), you route them all to the same partition by specifying the account ID as the partition key.
Partitioning is how Kafka achieves scale. Each partition can live on a different broker (server) in the Kafka cluster. A topic with 100 partitions can distribute its data across 100 servers. Producers can choose which partition to write to. The default is round-robin, which distributes load evenly. Alternatively, producers can hash a key to route all records with the same key to the same partition, which guarantees ordering for records that share a key (for example, all events for a particular user ID).
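A short sketch with the kafka-python client shows both choices; the broker address, topic, and keys are illustrative:

```python
from kafka import KafkaProducer

# Assumes a broker at localhost:9092 and the kafka-python package.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",  # wait for all in-sync replicas (see the acks discussion below)
)

# Records that share a key hash to the same partition, so all events for
# account-42 are totally ordered relative to each other.
producer.send("transactions", key=b"account-42", value=b'{"amount": 100}')
producer.send("transactions", key=b"account-42", value=b'{"amount": -25}')

# Records without a key are spread across partitions to balance load.
producer.send("transactions", value=b'{"amount": 7}')

producer.flush()  # block until the broker acknowledges the sends
```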
Consumer Groups and Dual Delivery Models
A consumer group is a set of consumers that collectively consume a topic. Each partition is assigned to exactly one consumer in the group at a time. Messages in that partition are delivered to that consumer only.
This gives you a queuing model: work is distributed among the consumers in the group. If you have ten partitions and five consumers in a group, each consumer handles two partitions. Add more consumers to scale out processing. There is no point in running more consumers in a group than there are partitions: the extra consumers sit idle because every partition is already assigned.
Now suppose a second independent consumer group subscribes to the same topic. Each group gets its own complete copy of every message. This gives you a publish-subscribe model: many independent applications can all read the same stream of events.
Kafka unifies both models. Within a group, messages are distributed (queuing). Across groups, messages are broadcast (pub-sub). You can have dozens of groups all reading the same topic independently without any coordination or configuration changes on the producer side.
Each consumer tracks its own position in the log by recording its current offset for each partition. This means consumers can replay the log from any earlier point: to reprocess events after a bug fix, to initialize a new service with historical data, or simply to re-examine what happened at a specific time.
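A consumer-side sketch with kafka-python, using the same assumed broker and topic as above; the group name is illustrative:

```python
from kafka import KafkaConsumer

# Every consumer started with this group_id shares the topic's partitions;
# a consumer in a *different* group independently receives every message.
consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    group_id="fraud-detector",
    enable_auto_commit=False,      # commit offsets only after processing succeeds
    auto_offset_reset="earliest",  # a brand-new group starts at the oldest retained record
)

for record in consumer:
    print(record.partition, record.offset, record.value)
    consumer.commit()  # persist our position so a restart resumes from here

# To replay history (e.g., after a bug fix), a consumer can rewind with
# consumer.seek_to_beginning() once partitions have been assigned, and
# reprocess the log from offset 0.
```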
Fault Tolerance and Replication
Each partition has a leader and zero or more followers, assigned across different brokers. The leader handles all reads and writes. Followers replicate the data from the leader. If the leader fails, one of the followers is elected as the new leader.
A producer has a choice of durability guarantees, controlled by the acks configuration:
- acks=0: fire and forget. No acknowledgment. Maximum throughput, no durability guarantee.
- acks=1: the leader acknowledges when it has written the message. Fast, but the message can be lost if the leader dies before replication.
- acks=all: the leader acknowledges only after all in-sync replicas (followers that are current with the leader) have written the message. Strongest durability guarantee; a message survives any single-broker failure.
Why Kafka Is Fast Despite Writing to Disk
A common objection to Kafka’s design is that writing everything to disk must be slow. This objection misunderstands disk performance. The key is the difference between sequential and random I/O.
Random disk access requires seeking the read/write head to a new position, which takes milliseconds on a spinning disk. Sequential access involves reading or writing a continuous stream of data, which can be orders of magnitude faster. SSDs are faster but still show a significant performance gap between random and sequential access.
Kafka is designed entirely around sequential I/O. Partitions are append-only logs; writes always go to the end. Reads follow the log in order. Kafka also exploits the OS page cache heavily: modern operating systems cache recently read disk pages in memory, and sequential reads hit the page cache at memory speeds.
The result is that a single Kafka broker can handle millions of messages per second. The sequential I/O design was a deliberate architectural choice, and it is a large part of why Kafka outperforms systems that tried to do the same thing with random-access storage.
Log Compaction
Retention by time or size works well for event streams, but some use cases need a different policy. Consider a topic that tracks the current location of each delivery vehicle. You do not need the full history. Only the most recent location per vehicle matters. Kafka supports log compaction for exactly this case: rather than deleting old messages by age, Kafka retains only the most recent message for each key, removing older records with the same key during background compaction. The result is a compacted log that always contains the latest value per key, no matter how old it is. This makes Kafka useful not just as a transient event pipe but as a durable store of current state.
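A sketch of creating such a compacted topic with kafka-python's admin client; the broker address, topic name, and sizing are illustrative:

```python
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# cleanup.policy=compact tells Kafka to keep only the newest record per key
# instead of deleting records by age.
vehicle_locations = NewTopic(
    name="vehicle-locations",
    num_partitions=6,
    replication_factor=3,
    topic_configs={"cleanup.policy": "compact"},
)
admin.create_topics([vehicle_locations])

# Producers then key each record by vehicle ID; after compaction the topic
# holds the most recent location for every vehicle, however old.
```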
Messaging Systems Compared
| System | Model | Strengths | Weaknesses |
|---|---|---|---|
| Kafka | Durable partitioned log | Replay, huge scale, decoupling | Not a processing framework by itself |
| RabbitMQ | Broker-routed queues | Flexible broker-side routing | No replay, limited scale-out across brokers |
Stream Processing: Making Sense of Data in Motion
A message queue like Kafka moves data. Stream processing frameworks transform, filter, aggregate, and analyze that data as it flows. Batch processing systems like MapReduce process bounded datasets: you have all the data, you run a job, you get results. Stream processing operates on unbounded datasets where new data arrives continuously and you need results in near real-time.
Backpressure
Backpressure is the problem that arises when a producer generates data faster than a consumer or downstream system can handle it. If nothing intervenes, the queue grows without bound until memory or disk is exhausted. Systems address this in three main ways:
- Buffering: absorb bursts in a queue and process them when capacity allows. This is what Kafka’s partitioned log does naturally.
- Dropping: discard messages when the buffer is full. Acceptable only when occasional loss is tolerable.
- Slowing the producer: the consumer or broker sends an explicit signal back to the producer to reduce its rate. This is the strict sense of backpressure and is common in reactive stream frameworks.
Understanding these options matters because ignoring the problem produces systems that work fine under normal load and collapse under peak load.
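The third option is easiest to see with a bounded buffer: when the buffer fills, the producer blocks, which is backpressure in its simplest form. A small sketch using only the Python standard library:

```python
import queue
import threading
import time

# A bounded buffer: put() blocks when 100 items are waiting, which slows
# the producer down to the consumer's rate.
buffer = queue.Queue(maxsize=100)

def consume():
    while True:
        item = buffer.get()
        time.sleep(0.001)   # simulate slow downstream processing
        buffer.task_done()

threading.Thread(target=consume, daemon=True).start()

for i in range(1_000):
    buffer.put(i)           # blocks whenever the consumer falls behind

buffer.join()               # wait until everything has been processed
```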
Event Time vs. Processing Time
The fundamental complication in stream processing is that events have two timestamps: the time the event occurred (event time) and the time the processing system received it (processing time). These are often different. A mobile app might record events while the device is offline and deliver a batch when connectivity is restored. A network blip might delay sensor data by ten seconds. Events can arrive out of order.
If you want to count how many users clicked a button during each five-minute window, should you assign events to windows based on when they happened or when you received them? Using processing time is easy to implement but produces incorrect results when data is delayed. Using event time produces correct results but requires you to decide how long to wait for late arrivals before closing a window.
Windows
A window defines how to group events for aggregation. The three main window types are:
- Tumbling window: fixed-size, non-overlapping intervals. All events from 12:00 to 12:05 form one window, 12:05 to 12:10 form the next. Each event belongs to exactly one window (see the assignment sketch after this list).
- Sliding window: fixed size but slides by a configurable step. A 10-minute window that slides every 2 minutes produces overlapping windows. Events near the boundaries appear in multiple windows.
- Session window: groups events by gaps in activity rather than by fixed time. If a user is active and then goes quiet for more than five minutes, that gap closes the session window. Session sizes vary because user behavior varies.
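Here is a small, framework-independent sketch of how an event timestamp (in seconds) maps to tumbling and sliding windows; the sizes and step are illustrative:

```python
# Window assignment by event time. Windows are identified by their start time.

def tumbling_window(event_time: int, size: int = 300) -> int:
    # Each event belongs to exactly one fixed, non-overlapping window.
    return event_time - (event_time % size)

def sliding_windows(event_time: int, size: int = 600, step: int = 120) -> list[int]:
    # A window of `size` seconds starts every `step` seconds, so one event
    # can fall into several overlapping windows.
    start = event_time - (event_time % step)   # latest window containing the event
    starts = []
    while start > event_time - size:
        starts.append(start)
        start -= step
    return sorted(starts)

print(tumbling_window(720))   # 600: the window covering seconds 600 to 899
print(sliding_windows(720))   # [240, 360, 480, 600, 720]
```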
Watermarks
A watermark is the system’s estimate of how far event time has progressed. Events with timestamps earlier than the watermark are considered unlikely to still arrive, and the system uses the watermark to decide when to close a window and emit results.
A stream processor typically derives the watermark by taking the maximum event timestamp seen so far and subtracting a configured lag. For example, if the maximum event timestamp seen is 12:10 and the configured lag is 10 minutes, the current watermark is 12:00. This means the system will no longer wait for events with timestamps before 12:00. When the watermark advances past the end of a window (say, past 12:05 for a window covering 12:00–12:05), the window closes and its results are emitted.
Setting the lag too small produces incorrect results because events that arrive later than the configured lag are dropped or handled separately. Setting it too large increases result latency and the amount of state the system must hold in memory. The right watermark policy depends on how late your data actually arrives.
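The arithmetic is simple enough to sketch directly; units are seconds, and the class and method names are illustrative:

```python
# Watermark = (largest event timestamp seen so far) - (configured lag).

class WatermarkTracker:
    def __init__(self, lag_seconds: float):
        self.lag = lag_seconds
        self.max_event_time = float("-inf")

    def observe(self, event_time: float) -> None:
        # Late events never move the maximum backward.
        self.max_event_time = max(self.max_event_time, event_time)

    @property
    def watermark(self) -> float:
        return self.max_event_time - self.lag

    def can_close(self, window_end: float) -> bool:
        # A window is finalized once the watermark passes its end.
        return self.watermark >= window_end

wm = WatermarkTracker(lag_seconds=600)
wm.observe(event_time=43_800)            # an event stamped 12:10 (seconds of the day)
print(wm.watermark)                      # 43200, i.e., 12:00
print(wm.can_close(window_end=43_500))   # False: the 12:00-12:05 window stays open
```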
Apache Spark Structured Streaming
Spark Structured Streaming extends the Spark batch processing model to streaming. The central abstraction is the unbounded table: treat the incoming stream as a table that grows over time. You write a query against that table using the same DataFrame or SQL API you would use for batch data. Spark internally handles the continuous execution.
Under the hood, Spark Structured Streaming uses a micro-batch model. Rather than processing one event at a time, it collects events into small batches (by default, triggered as fast as possible or at a configured interval) and runs a batch job on each micro-batch. This amortizes scheduling overhead and achieves high throughput. The tradeoff is latency: results are delayed by the micro-batch interval, typically milliseconds to seconds.
Structured Streaming provides event-time windowing and watermarks. You declare a window over an event-time column and specify a watermark to handle late data. Between micro-batches, Spark maintains intermediate results in memory so that running totals and partial aggregates survive across triggers.
Output modes control how results are written:
- Append mode: only newly completed rows are written to the output. Suitable for windows that do not change once closed.
- Complete mode: the entire result table is rewritten on each trigger. Suitable for aggregations where you always want the current total.
- Update mode: only rows that changed since the last trigger are written.
Exactly-once semantics require two things working together: checkpointing and idempotent output. Spark saves its progress (offsets and state) to a durable location periodically via checkpointing. If a node fails, Spark replays from the last checkpoint, guaranteeing no events are skipped (at-least-once). When the data source supports offset-based replay (Kafka does) and the sink supports idempotent writes or transactional commits, the two together can provide exactly-once semantics end-to-end, but only under those specific conditions.
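Here is a minimal PySpark sketch tying these pieces together, assuming a local Kafka broker with a clicks topic and the Spark Kafka connector on the classpath; the Kafka message timestamp stands in for event time, and the paths and names are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("click-counts").getOrCreate()

# Treat the Kafka topic as an unbounded table of click events.
clicks = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "clicks")
    .load()
    .selectExpr("CAST(value AS STRING) AS user_id", "timestamp")
)

# Count clicks per user in 5-minute event-time windows, waiting up to
# 10 minutes for late data before a window's result is final.
counts = (
    clicks.withWatermark("timestamp", "10 minutes")
    .groupBy(window(col("timestamp"), "5 minutes"), col("user_id"))
    .count()
)

# Update mode emits only rows changed in each micro-batch; the checkpoint
# directory stores offsets and state so a restart resumes without loss.
query = (
    counts.writeStream.outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/click-counts-checkpoint")
    .start()
)
query.awaitTermination()
```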
Structured Streaming’s main advantage is that you write one codebase that runs on both batch and streaming data. If you already know Spark, there is minimal added complexity. The micro-batch model does introduce latency that makes Spark less suitable for use cases that require millisecond response times.
Apache Flink
Spark’s micro-batch model means there is always some latency between an event arriving and a result being emitted. Apache Flink takes a different approach: continuous record-at-a-time processing.
When an event arrives, Flink processes it immediately through a pipeline of operators: functions that filter, transform, aggregate, or join streams. Each operator maintains its own state, and Flink periodically checkpoints that state to durable storage so the pipeline can recover from failures without replaying the entire history.
This makes Flink better suited for applications requiring sub-second response times or precise event-time handling. The tradeoff is operational complexity. Flink has historically been harder to deploy and tune than Spark, though managed cloud offerings have narrowed that gap. Earlier systems like Apache Storm pioneered continuous stream processing but provided at-most-once delivery by default and lacked the event-time and windowing models that Spark and Flink provide.
Stream Processing Systems Compared
| System | Model | Strengths | Weaknesses |
|---|---|---|---|
| Spark Structured Streaming | Micro-batch | Unified batch/stream API, familiar | Higher latency, weaker event-time handling |
| Flink | Record-at-a-time | Low latency, precise event-time model | Higher operational complexity |
Part 2: Content Delivery Networks
The Flash Crowd Problem
In the early days of the web, serving content from a single server was acceptable because traffic was modest (many sites still work this way, like pk.org, which runs off a single Raspberry Pi). As the web grew, a new failure mode emerged: if a popular site published something newsworthy or ran a major software update, millions of users would simultaneously request the same files. The server would be overwhelmed, connections would time out, and the site would effectively go offline for legitimate users. This was called the flash crowd problem.
Apple releasing an iOS update to a billion devices illustrates the problem at its most extreme. Delivering five gigabytes to a billion phones from a single location is physically impossible regardless of server capacity, because the bandwidth requirements would exceed the capacity of any single building’s network connection by orders of magnitude.
The solution is to distribute content across many servers located throughout the world, close to the users who need it.
Approaches Before CDNs
Before content delivery networks existed, operators tried several techniques to handle load:
Browser caching lets a client store recently fetched resources locally. Subsequent requests for the same resource can be served from the local cache, sparing the server. The limitation is that the cache is private: one user’s browser cache does not help any other user.
Caching proxies let an organization place a shared proxy between its users and the internet. When one user fetches a resource, the proxy stores it. If another user at the same organization requests the same resource shortly afterward, the proxy serves it locally. This improves cache hit rates for common resources but only helps users sharing the same proxy.
Load balancing distributes requests across multiple servers behind a single address. This increases server capacity but does not address geographic latency: a user in Tokyo still has to reach servers in New Jersey. It also does nothing for network bandwidth or flash crowds unless you have essentially unlimited server capacity and bandwidth.
Mirroring replicates servers at multiple geographic locations. Users can be directed to the nearest mirror. The difficulty is keeping mirrors synchronized. Any content that changes on the origin must be propagated to every mirror, and synchronization lag means mirrors may briefly serve stale content.
None of these approaches fully solves the flash crowd problem at a global scale.
Large organizations like Google combine load balancing and mirroring: they operate multiple data centers distributed around the world, each with many servers behind a load balancer, so that requests are served from a nearby data center, and no single server is overwhelmed. This works well if you can afford to build and operate global data center infrastructure. For most organizations, the capital and operational costs are prohibitive. CDNs emerged as the solution for everyone else: a shared, globally distributed caching layer that any organization can rent.
Content Delivery Networks
A content delivery network distributes cached copies of content across hundreds or thousands of servers located near the users who request it. When a user requests a file, the CDN serves it from a nearby edge server rather than from the origin. The origin only needs to handle the initial fetch by the CDN and requests for content that changes too frequently to cache.
CDNs offload the static portions of a site, which are often the bulk of the bytes: images, video, CSS files, JavaScript libraries, and downloadable files. The application logic, database queries, and user-specific responses still run on the origin’s servers, but they constitute a small fraction of total traffic.
Push vs. Pull CDNs
Push CDNs require the content provider to explicitly upload content to the CDN’s storage nodes. The provider controls what is distributed and when. This is appropriate for large files (software packages, video assets) where you want to pre-position content before demand hits.
Pull CDNs fetch content from the origin on demand. When a user requests a file that is not in the CDN’s cache, the CDN fetches it from the origin, caches it, and serves it to the user. Subsequent requests for the same file are served from the cache without touching the origin. This is simpler to operate and works well for web assets.
CDN Structure: Edge Servers, Parent Servers, and the Origin
A CDN organizes its infrastructure into tiers. Edge servers are located close to users, often inside ISPs or at internet exchange points. Parent servers sit above the edge tier and cache content that edge servers may not carry to reduce load on the origin. The origin server is the content provider’s actual infrastructure.
When an edge server receives a request it cannot serve from its cache, it queries other edge servers in its region. If they cannot help, it asks its parent server. If the parent does not have the content either, it checks peer parent servers in other regions. Only if all of these fail does the request reach the origin. This tiered distribution dramatically reduces the load on the origin while keeping latency low for users.
CDN Providers
The CDN industry grew directly out of academic research on the flash crowd problem. Akamai was founded in 1998 by researchers at MIT who were working on exactly that problem: how to distribute web content at scale so that no single server could become a bottleneck. Akamai built a global network of caching servers deployed inside ISPs worldwide and sold access to that network as a service. For most of the 2000s it was the dominant CDN, and it remains one of the largest today.
As the market matured, a range of competitors emerged.
- Cloudflare built its CDN around anycast routing with a strong focus on security, and has grown into one of the most widely used CDN and network services providers in the world.
- Amazon CloudFront is deeply integrated with AWS and is the default CDN choice for applications already running on Amazon’s infrastructure.
- Fastly targets developers who need fine-grained control and low-latency edge logic.
- The major cloud providers (Google, Microsoft, and others) all operate their own CDN infrastructure for their own services and offer CDN products to customers.
For most companies, running a CDN from scratch is impractical: the capital cost of deploying thousands of servers inside ISPs worldwide is enormous, and the ongoing relationships with ISPs take years to establish. CDNs are infrastructure that is far more cost-effective to rent than to build. What follows uses Akamai as the primary example because its DNS-based architecture clearly illustrates the key mechanisms, but the concepts apply across providers.
DNS-Based Request Routing
The core mechanism that directs users to the nearest edge server is DNS redirection. The content provider sets up a CNAME record in DNS that points their domain to an Akamai (or other CDN) domain. When a user’s browser resolves the domain name, the request reaches the CDN’s dynamic DNS servers, which return the IP address of an appropriate edge server rather than a fixed address.
Here is a real example using Staples as the content provider:
www.staples.com. IN CNAME www.staples.com.edgekey.net.
www.staples.com.edgekey.net. IN CNAME e6155.a.akamaiedge.net.
e6155.a.akamaiedge.net. IN A 23.201.180.183
The user’s browser thinks it is connecting to www.staples.com. It is actually connecting to an Akamai edge server. The CDN’s dynamic DNS server chose 23.201.180.183 based on the user’s location, the health and load of nearby edge servers, and current network conditions. The goal is to select an edge server inside or topologically close to the user’s own ISP to minimize network hops. A user in London and a user in Los Angeles resolve the same hostname to different IP addresses.
The CDN continuously monitors its servers and the network between them, publishing health and load information to the DNS infrastructure. If a server becomes overloaded, the DNS server stops returning its address. If a server fails, it is removed from rotation within seconds.
TTL (time-to-live) values on CDN DNS responses are deliberately short, often 30 seconds for the final resolution. This allows the CDN to shift traffic rapidly in response to changing conditions or failures.
Akamai’s Mapping System
Akamai’s edge server selection goes beyond simple geographic proximity. Its mapping system factors in:
- User location: derived from the IP address of the DNS resolver.
- Network topology: based on BGP routing tables and traceroute measurements to estimate the actual number of hops and transit time between locations.
- Server load: edge servers report their current load to the monitoring infrastructure.
- Server health: servers that fail health checks are excluded from DNS responses.
- Network performance: the system tracks measured latency and packet loss between regions.
The goal is to return an IP address for an edge server that is close (low round-trip time), available (not overloaded), and likely to have the content cached.
The Multi-Tier Content Lookup
Once the user’s browser has an IP address for an edge server, it sends a standard HTTP request. The lookup proceeds as follows:
- The edge server checks its local cache. If found, it serves the content immediately.
- If not found, the edge server queries other edge servers in the same region.
- If still not found, the edge server asks its parent server.
- The parent server checks its cache and, if needed, queries its peer parent servers.
- As a last resort, the parent server fetches the content from the origin through the CDN’s internal routing network.
This tiered approach means that popular content is served entirely from edge caches. The origin sees only the first request for each piece of content from each region, not the millions of requests that follow.
Caching: What Gets Cached and For How Long
Not all content behaves the same way in a cache.
Static content (images, CSS, JavaScript libraries, downloadable files) changes infrequently. The content provider sets Cache-Control headers indicating how long the CDN should retain each asset. An image that never changes might be cached for a year. A JavaScript file that is updated on each deployment might be cached for a day and versioned by URL so that changes invalidate cached copies automatically.
HTTP Cache-Control header directives relevant to CDNs include the following (a short sketch after the list shows how to inspect them on a live response):
- max-age: how many seconds the response can remain in a cache.
- no-store: do not cache this response under any circumstances.
- no-cache: store it, but revalidate with the origin on each use. (Despite the name, this does not mean “do not cache.”)
- public: any shared cache (including CDN edge servers) may cache this response.
- private: only the user’s browser may cache this; the CDN must not.
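A quick way to see these directives in practice is to fetch an asset and print the cache-related response headers; here is a standard-library-only sketch (the URL is illustrative; the Age header, when a shared cache adds it, reports how long the object has been sitting in that cache):

```python
import urllib.request

url = "https://www.example.com/static/logo.png"  # stand-in for a CDN-served asset

with urllib.request.urlopen(url) as response:
    for name in ("Cache-Control", "Age", "ETag", "Expires"):
        print(name, "=", response.headers.get(name))
```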
Dynamic content (pages that are assembled per-user or updated frequently) is harder to cache. Akamai addresses partially dynamic content with Edge Side Includes (ESI), a markup language that breaks a page into fragments with independent caching rules. A news page might have a masthead that never changes (cache for a week), a headline list that changes hourly (cache for an hour), and a personalized sidebar that must not be cached at all. ESI assembles these fragments at the edge, reducing origin traffic while still providing personalized responses.
Streaming Video and CDN Scale
Streaming video is a special case that has driven more CDN innovation than any other content type, and it illustrates the scale of the delivery problem better than any static example.
Netflix and Amazon Prime Video each have hundreds of millions of subscribers worldwide, generating continuous high-volume traffic around the clock. Unlike a software update that hits in a short burst, video streaming is a sustained load: a subscriber watching a two-hour film is making thousands of individual HTTP requests for small video segments, each of which must be served with low latency to avoid buffering. Multiply this by hundreds of millions of simultaneous viewers, and the bandwidth requirements become staggering.
Certain live events create a stronger version of the flash crowd problem.
- The 2024 Paris Olympics generated 23.5 billion minutes of streaming on NBCUniversal platforms alone, with Peacock running up to 60 simultaneous live event streams and 300 live events in a single day.
- The 2022 FIFA World Cup final drew approximately 1.5 billion viewers worldwide across broadcast and streaming platforms combined.
- The 2026 World Cup, hosted across the United States, Canada, and Mexico starting June 11, is projected to draw over 1.6 billion viewers for the final, with more than 5 billion engaging with the tournament in some form over its 104 matches.
The record for peak concurrent viewers on a single streaming platform is approximately 59 million, set during the 2023 Cricket World Cup final on JioCinema. These numbers cannot be achieved without a global CDN infrastructure.
HTTP Live Streaming, MPEG-DASH, and Adaptive Bitrate
The segment-based delivery model used by modern streaming services comes in two main variants.
- HTTP Live Streaming (HLS) was developed by Apple and remains the standard on Apple devices; iOS Safari does not support alternatives natively.
- MPEG-DASH (Dynamic Adaptive Streaming over HTTP) is the open international standard developed by MPEG as a codec-agnostic alternative not controlled by any single company. Netflix, YouTube, Amazon Prime Video, Disney+, and Hulu all use MPEG-DASH as their primary protocol on Android, desktop browsers, and smart TVs, falling back to HLS on Apple devices.
From a CDN perspective, the two are architecturally identical: both break video into short segments stored as regular HTTP files, both use a manifest file to list available segments and bitrates, and both serve those segments over standard HTTP so any CDN edge server can cache and deliver them without special support.
Adaptive bitrate (ABR) transcoding extends this. The CDN (or the content provider before uploading to the CDN) encodes each segment at multiple bitrates and resolutions. The player monitors its download speed and buffer level, then requests the next segment at the bitrate it can sustain. If network conditions deteriorate, the player automatically steps down to a lower quality. If conditions improve, it steps back up. This is why streaming video degrades gracefully on a slow connection rather than stalling.
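The selection logic a player runs before each segment request is simple to sketch; the bitrate ladder, safety margin, and buffer threshold below are illustrative, and real players also weigh throughput history and buffer trends:

```python
# Pick the highest rendition the measured throughput can sustain, with headroom.

BITRATE_LADDER_KBPS = [235, 750, 1750, 3000, 5800]  # available encodings

def choose_bitrate(throughput_kbps: float, buffer_seconds: float) -> int:
    budget = throughput_kbps * 0.8          # leave headroom against dips
    if buffer_seconds < 5:                  # nearly empty buffer: be conservative
        budget = min(budget, BITRATE_LADDER_KBPS[0] * 2)
    affordable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return affordable[-1] if affordable else BITRATE_LADDER_KBPS[0]

print(choose_bitrate(4000, buffer_seconds=20))   # ~4 Mbps, healthy buffer -> 3000
print(choose_bitrate(4000, buffer_seconds=2))    # same link, low buffer -> 235
```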
For live video, the stream arrives at a CDN entry point, which distributes it to parent servers in multiple regions, which distribute it to edge servers, which serve the segments to users. The live nature means segments cannot be pre-cached; they arrive just ahead of the demand.
There is a rough analogy to IP multicast here. The CDN entry point is like a source, and the parent-to-edge distribution hierarchy is like a multicast delivery tree: the goal in both cases is to avoid sending separate copies of the same live stream across every upstream link. The difference is that Internet video is usually not delivered with end-to-end IP multicast. Instead, modern systems use HTTP-based segment delivery over CDNs, which works over existing web infrastructure. IP multicast is still useful in managed environments such as IPTV, but it is not the general model for public Internet streaming.
CDN Routing: The Overlay Network
When an edge server needs to fetch content from the origin, the path across the public internet is not guaranteed to be optimal. The internet routes traffic based on commercial agreements between ISPs, not based on measured performance. A packet might traverse a suboptimal path because of a peering relationship, not because it is the fastest route.
CDNs address this with an overlay network: an application-level network built on top of the internet. All CDN nodes know about each other. Each node periodically measures latency and packet loss to its peers. When an edge server needs to reach the origin, it consults this performance map and selects the best path through CDN-owned intermediate nodes, bypassing suboptimal public internet routes.
Persistent TCP connections are maintained between nodes in the overlay to avoid reconnection overhead. This also reduces the latency impact of TLS handshakes, since the connection is already established when a request arrives.
Anycast Routing: The Modern Alternative
Akamai’s DNS-based approach has a limitation: routing decisions are made at DNS resolution time, before the browser makes any connection. If conditions change after the DNS response is cached, the user may end up connected to a now-suboptimal server.
Anycast is a different approach, used extensively by Cloudflare and increasingly by other CDN providers. In anycast, many servers worldwide share the same IP address. BGP routing, the protocol that governs how traffic flows between autonomous networks on the internet, naturally routes a packet sent to that IP address to the nearest server advertising that address: specifically, the one reachable via the shortest BGP path. If the nearest server goes down, BGP automatically updates its routing tables and reroutes traffic to the next closest server.
Anycast eliminates the DNS indirection step. The browser connects to the CDN’s IP address directly, and the routing fabric ensures that connection reaches the nearest available node. It also handles failover faster than DNS-based approaches, since BGP routing updates propagate in seconds whereas DNS TTLs can hold stale entries for tens of seconds or longer.
One caveat: anycast works well for short-lived TCP connections because BGP routing is consistent for a given source/destination IP pair, so all packets in a single connection naturally reach the same server. Short HTTP and HTTPS requests work fine. The concern arises with long-lived connections (such as persistent WebSocket sessions): if a BGP routing update occurs mid-connection and reroutes packets to a different node, the connection fails because the new node has no state for the existing session.
DNS-based routing and anycast are not mutually exclusive. Many CDN providers use both: anycast to route users to the nearest point-of-presence, and DNS-based mapping for finer-grained selection within that region or to direct traffic to specialized infrastructure. The distinction between “Akamai uses DNS” and “Cloudflare uses anycast” is a useful contrast, but most large CDNs combine elements of both.
Security Benefits of CDNs
CDNs provide a substantial security benefit that is not obvious from their caching function. By sitting between the internet and the origin server, the CDN absorbs and filters traffic before it reaches the origin.
A distributed denial-of-service (DDoS) attack floods a target with traffic to exhaust its bandwidth or processing capacity. An origin server with a 10 Gbps link is helpless against an attack sending 500 Gbps. A CDN with hundreds of thousands of servers across hundreds of ISPs can absorb and scrub attack traffic at volumes no single organization could defend against. The origin server’s address is generally not public; it is hidden behind the CDN. Attackers cannot reach it directly.
CDNs also typically offer web application firewall (WAF) functionality, bot detection, and TLS termination at the edge. Terminating TLS at an edge server close to the user reduces handshake latency and offloads cryptographic work from the origin.
Peer-to-Peer Content Delivery: BitTorrent
CDNs solve the flash crowd problem by distributing content to many servers and serving users from nearby ones. BitTorrent solves the same problem with a radically different architecture: every downloader becomes an uploader.
BitTorrent is a protocol for peer-to-peer file distribution. A content provider creates a .torrent file that contains metadata about the file being distributed: its name, total size, a hash of each piece (a fixed-size chunk, typically 256 KB to 1 MB), and the address of a tracker. The tracker is a server that maintains a list of peers currently downloading or seeding (uploading) a given file. The full set of peers participating in the distribution of a particular file is called a swarm.
When a new peer joins a swarm:
- It contacts the tracker (or uses DHT-based peer discovery, described below) to get a list of other peers.
- It connects to several peers and exchanges information about which pieces each peer currently has.
- It begins downloading pieces, prioritizing the rarest first: pieces that few peers currently have. This strategy ensures rare pieces are quickly distributed and prevents bottlenecks where many peers are waiting for the same piece (a small selection sketch appears below).
- As it downloads pieces, it simultaneously uploads them to other peers requesting those same pieces.
The key scaling property is that demand also adds supply: every peer that downloads a piece immediately becomes a source for that piece, uploading it to others who need it. A CDN must provision servers in advance to handle peak demand. In a BitTorrent swarm, supply grows automatically as more peers join.
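Rarest-first selection itself is only a few lines: given which pieces each connected peer advertises, download the piece you lack that the fewest peers hold. The data structures below are illustrative, not the wire format:

```python
from collections import Counter

def pick_next_piece(have: set[int], peer_bitfields: list[set[int]]) -> int | None:
    availability = Counter()
    for pieces in peer_bitfields:
        availability.update(pieces)
    # Candidates: pieces we still need that at least one peer can supply.
    candidates = [p for p in availability if p not in have]
    if not candidates:
        return None
    return min(candidates, key=lambda p: availability[p])  # rarest first

# Piece 3 is held by only one peer, so it is requested before pieces 1 and 2.
print(pick_next_piece(have={0}, peer_bitfields=[{0, 1, 2}, {1, 2}, {2, 3}]))
```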
BitTorrent and DHTs
The original BitTorrent design depended on trackers, which were single points of failure. Modern BitTorrent implementations use a distributed hash table (DHT) for trackerless peer discovery, eliminating the need for a central server.
In a DHT, every participating node is assigned a random node ID, and the DHT is organized so that each node is responsible for storing information about files whose hashes are numerically close to its own node ID. This is the same consistent-hashing idea used in Chord: the key space wraps around like a ring, and each node owns the segment of keys closest to it.
To find peers for a given file, a client hashes the file’s identifier to get a key, then routes a lookup through the DHT. Each hop brings the query closer to the node responsible for that key, and that node returns a list of peers currently downloading or seeding the file. The routing takes O(log n) hops for a network of n nodes.
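In simplified form, the responsibility rule is: hash the content identifier, then find the node whose ID is closest to that hash. Real BitTorrent DHTs use the Kademlia XOR distance metric and iterative routing rather than the plain numeric distance sketched here:

```python
import hashlib

def key_for(name: str) -> int:
    # 160-bit key derived from an identifier (illustrative; real clients hash
    # the torrent's info dictionary).
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

def responsible_node(key: int, node_ids: list[int]) -> int:
    # Simplification: the node closest by numeric distance owns the peer list.
    return min(node_ids, key=lambda node_id: abs(node_id - key))

nodes = [key_for(f"node-{i}") for i in range(8)]        # made-up node IDs
content_key = key_for("example-linux-distro.iso")       # made-up content name
print(hex(responsible_node(content_key, nodes)))
```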
The bootstrapping question (how does a brand-new client find any DHT node to start with?) is solved pragmatically. BitTorrent clients ship with a hardcoded list of well-known, long-running DHT nodes. Once a client contacts one of these nodes and receives a response, it learns about neighbors and gradually builds its own routing table. There is no single point of failure: as long as any reachable DHT node exists, a new client can join. Once in the DHT, a client maintains connections to a set of neighbors and periodically refreshes its routing table as nodes come and go.
This makes the swarm fully decentralized. As long as at least one peer with the complete file remains reachable somewhere in the DHT, the file remains discoverable and available.
BitTorrent’s Limitations
BitTorrent is not well-suited for streaming. Because pieces are downloaded out of order (rarest first), the beginning of a file may not be available before the end. Sequential playback requires protocol modifications that prioritize earlier pieces; streaming-oriented clients implement this, but it adds complexity.
BitTorrent also requires that downloaders have reasonable upload bandwidth, which asymmetric residential broadband connections often lack.
Large software distributors have used BitTorrent for bulk distribution. Canonical, for example, officially distributes Ubuntu images via BitTorrent, and CCP used BitTorrent-based transport in the EVE launcher. These deployments are usually managed or hybrid rather than fully open public swarms.
The comparison to CDNs highlights a fundamental design tradeoff: CDNs are centrally operated, commercially available, and predictably performant. BitTorrent is decentralized, requires no infrastructure investment, and scales automatically but is harder to control, less suited to streaming, and depends on community participation.
Edge Computing
CDNs distribute content. Edge computing extends that idea to distribute computation. Instead of the user making a round-trip to a central data center, edge computing runs logic on the CDN node that is already nearby. The motivation is latency: an edge node might be ten milliseconds from a user while a data center is one hundred milliseconds away.
Cloudflare Workers is a widely used example. Workers run JavaScript inside V8 isolates (V8 is Chrome’s JavaScript engine), lightweight sandboxes that start in microseconds, far faster than containers or virtual machines. Requests execute inside isolates, which provide memory isolation between concurrent workers without the overhead of separate processes. Workers can handle tasks like authentication checks, request routing, A/B testing, personalization, and image transformation at the edge. In some cases they return a response directly without contacting the origin at all; in others they modify the request before forwarding it.
The architectural limit is state. Edge nodes can call remote databases, but a round-trip to a central database from an edge node erases much of the latency benefit. Cloudflare addresses this with edge-local storage (a key-value store and Durable Objects for strongly consistent coordination), but complex transactional logic is still better handled at the origin. Edge compute is a complement to the origin server, not a replacement: static content lives in CDN caches, simple dynamic logic runs at the edge, and stateful operations remain centralized.
Summary
The two halves of these notes address the same underlying problem from different angles.
Message queues and stream processing solve the problem of getting data from many producers to many consumers efficiently and reliably. RabbitMQ gives you a broker that actively routes messages to the right queues based on rules you configure, and discards them after consumption. Kafka’s partitioned log gives you durability, replay, and massive scale. Spark Structured Streaming and Flink sit on top of these systems and give you the tools to aggregate and analyze data as it flows, with control over time semantics, windows, and state.
CDNs solve the problem of delivering content to users worldwide at low latency and high availability. The architecture of tiered caching, DNS-based routing, and overlay networks is a direct application of the caching, replication, and routing concepts covered earlier, applied at global scale. BitTorrent shows that the same problem can be addressed with a fully decentralized peer-to-peer architecture. Edge computing is the natural next step: once you have compute resources distributed globally close to users, it makes sense to run some of the application logic there rather than just serving static files.