Public Key Cryptography and Integrity -- Study Guide

Key Distribution Problem

The key distribution problem is the challenge of sharing a secret key securely over an insecure channel. Public key cryptography was developed to solve this.

One-Way Functions

A one-way function is easy to compute in one direction but computationally infeasible to reverse.

Examples: computing g^x mod p is easy, but recovering x from the result (the discrete logarithm problem) is hard. The middle-square method, an early pseudorandom number generator, also illustrates the idea of one-way computation.
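A minimal Python sketch of this asymmetry follows. It uses tiny parameters chosen only for illustration: the prime 2^31 - 1 and the base 7 are assumptions, far too small for real use. The forward direction is a single call to Python's built-in three-argument pow. The reverse direction, with no shortcut available, falls back to trying exponents one by one.

```python
from typing import Optional

P = 2_147_483_647  # 2^31 - 1, a prime (illustrative only, far too small for security)
G = 7              # illustrative base

def forward(x: int) -> int:
    """Easy direction: g^x mod p using fast square-and-multiply (built-in pow)."""
    return pow(G, x, P)

def brute_force_dlog(y: int, limit: int) -> Optional[int]:
    """Hard direction: try exponents 0, 1, 2, ... until g^x mod p == y."""
    value = 1  # g^0
    for x in range(limit):
        if value == y:
            return x
        value = (value * G) % P
    return None

secret_x = 123_456
y = forward(secret_x)                          # instant, even for huge exponents
found = brute_force_dlog(y, limit=1_000_000)   # needs over a hundred thousand steps here
print(found)
```

With real parameter sizes (2048-bit primes), the forward call stays fast while this search becomes hopeless.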

Trapdoor Function

A trapdoor function is a one-way function that becomes easy to invert if you know a secret value (the trapdoor). Public key cryptography relies on building trapdoor functions.

Public Key Cryptography

Public key cryptography uses a pair of keys:

The public key is shared and used for encryption or signature verification.
The private key is kept secret and used for decryption or creating signatures.

RSA

RSA is based on the difficulty of factoring large numbers that are the product of two primes.

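Encryption and decryption both use modular exponentiation: encryption computes c = m^e mod n with the public key (e, n), and decryption computes m = c^d mod n with the private exponent d.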
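A minimal textbook-RSA sketch with deliberately tiny primes (61 and 53, illustrative values only) shows the arithmetic. The private exponent d is the trapdoor: it is easy to compute from the factors p and q, but not from n alone. This needs Python 3.8+ for pow(e, -1, phi).

```python
from math import gcd

p, q = 61, 53                # two small primes (real RSA uses primes of 1024+ bits)
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # requires knowing the factors
e = 17                       # public exponent, must be coprime with phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)          # private exponent: e * d ≡ 1 (mod phi)

m = 65                       # message encoded as an integer smaller than n
c = pow(m, e, n)             # encryption: c = m^e mod n
recovered = pow(c, d, n)     # decryption: m = c^d mod n
print(n, d, c, recovered)    # 3233 2753 2790 65
```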

Elliptic Curve Cryptography (ECC)

ECC is based on the difficulty of solving discrete logarithms on elliptic curves.

ECC achieves security comparable to RSA with much shorter keys (for example, a 256-bit ECC key is roughly comparable to a 3072-bit RSA key).

Why not use public key cryptography for everything?

Public key cryptography is used for key exchange and signatures, but not for bulk encryption, because of:

Performance: it is far slower than symmetric ciphers.
Ciphertext expansion: ciphertexts are larger than the plaintexts they encrypt.
Security: raw (textbook) RSA/ECC is unsafe because mathematical relationships among plaintexts are preserved in the ciphertexts.
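The security point can be seen with a toy textbook-RSA key (the values below come from the tiny primes 61 and 53 and are purely illustrative). Unpadded RSA is multiplicative: multiplying two ciphertexts yields a valid encryption of the product of the plaintexts, so an attacker can manipulate messages without the private key. It is also deterministic, so identical plaintexts produce identical ciphertexts.

```python
# Toy key: n = 61 * 53, e = 17, d = 2753. Never use keys this small.
n, e, d = 3233, 17, 2753

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

m1, m2 = 5, 7
c_product = (encrypt(m1) * encrypt(m2)) % n   # attacker combines two ciphertexts
print(decrypt(c_product))                     # 35, i.e. m1 * m2, created without d

# Determinism leaks equality: same plaintext, same ciphertext.
print(encrypt(42) == encrypt(42))             # True
```

Practical RSA encryption adds randomized padding such as RSA-OAEP to break these relationships.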

Cryptographic Hash Functions

A cryptographic hash function maps input of arbitrary length to a fixed-size output. Its key properties are:

Fixed length: The output size does not depend on input size.
Preimage resistance: Given a hash value, it is hard to find a message that hashes to it.
Second preimage resistance: Given a message, it is hard to find a different message with the same hash.
Collision resistance: It is hard to find any two messages that hash to the same value.
Avalanche effect: Small input changes produce large, unpredictable changes in output.
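Two of these properties are easy to observe with Python's standard hashlib module. The sketch below, using arbitrary example inputs, shows that SHA-256 output length is fixed regardless of input size, and counts how many of the 256 output bits flip when one input character changes.

```python
import hashlib

def sha256_as_int(data: bytes) -> int:
    """Return the SHA-256 digest as an integer so individual bits can be compared."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Fixed length: a 1-byte input and a 1,000,000-byte input give equally long digests.
short_hash = hashlib.sha256(b"a").hexdigest()
long_hash = hashlib.sha256(b"a" * 1_000_000).hexdigest()
print(len(short_hash), len(long_hash))   # 64 64 (hex characters, i.e. 256 bits)

# Avalanche effect: change a single character and count the differing output bits.
h1 = sha256_as_int(b"PAY $100")
h2 = sha256_as_int(b"PAY $900")
flipped = bin(h1 ^ h2).count("1")
print(f"{flipped} of 256 bits differ")   # typically close to half (about 128)
```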

Common hash functions: SHA-1 (obsolete), MD5 (obsolete), SHA-2 (SHA-256, SHA-512), bcrypt (a password-hashing function designed to be slow).

Hash Collisions

A hash collision occurs when two different inputs produce the same hash output. Because hash functions compress large inputs into a fixed-size output, collisions are guaranteed in theory. The question is how likely they are in practice.

Pigeonhole principle: Since there are more possible inputs than outputs, some inputs must map to the same output. This is unavoidable for any fixed-length hash.
Birthday paradox: The probability of finding a collision grows faster than intuition suggests. Instead of needing about N trials to find a collision in an N-element space, it takes only about $\sqrt{N}$.
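The birthday bound can be checked experimentally by weakening a hash on purpose. This sketch truncates SHA-256 to 24 bits (an arbitrary choice so the search finishes quickly), giving N = 2^24 possible outputs, and hashes distinct messages until two of them share a value. The number of attempts lands in the thousands, near $\sqrt{N} = 4096$, rather than near N, which is about 16.7 million.

```python
import hashlib
from itertools import count

BITS = 24  # deliberately tiny output size: N = 2**BITS

def tiny_hash(data: bytes) -> int:
    """First BITS bits of SHA-256: a weak hash with only 2**BITS possible outputs."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - BITS)

def find_collision():
    seen = {}  # hash value -> first message that produced it
    for i in count():
        msg = f"message-{i}".encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # two different inputs, same hash, attempts
        seen[h] = msg

a, b, attempts = find_collision()
print(a, b, attempts)
print(f"sqrt(N) = {2 ** (BITS // 2)}")
```

Each added output bit doubles N but multiplies the expected collision work by only about 1.41, which is why cryptographic hashes need outputs twice as long as the security level they target.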

Cryptographic hashes use large output sizes, making collisions astronomically unlikely. With SHA-256 (256-bit output), a collision is expected after about $2^{128}$ attempts. $2^{128}$ is about $3.4 \times 10^{38}$, an unimaginably large value. To put this in perspective: the odds of winning the Powerball jackpot once are about 1 in $3 \times 10^{8}$. To match the improbability of finding a SHA-256 collision, you’d have to win the Powerball about five times in a row.
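The "about five times" figure can be checked using the same rounded jackpot odds of 1 in $3 \times 10^{8}$, by comparing $2^{128}$ with successive powers of those odds:

$$2^{128} = 340282366920938463463374607431768211456 \approx 3.4 \times 10^{38}$$

$$(3 \times 10^{8})^{4} = 8.1 \times 10^{33} < 3.4 \times 10^{38} < 2.43 \times 10^{42} = (3 \times 10^{8})^{5}$$

$$\frac{\log_{10}(3.4 \times 10^{38})}{\log_{10}(3 \times 10^{8})} \approx \frac{38.53}{8.48} \approx 4.5$$

The improbability therefore falls between four and five consecutive jackpot wins, which the comparison above rounds to five.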

This shows why collisions are considered practically impossible for modern cryptographic hashes, unless the function itself is broken (as with MD5 and SHA-1, where researchers found methods to create collisions).

Message Authentication Code (MAC)

A Message Authentication Code (MAC) is a keyed hash: a function that takes both a message and a secret key as input. Only someone with the secret key can produce or verify the MAC.

This ensures integrity (the message has not been modified) and origin assurance (the message was generated by someone who knows the shared key). Unlike digital signatures, a MAC does not provide proof to outsiders of who created the message, since both sender and receiver share the same key.

Example: Alice and Bob share a secret key. Alice sends Bob the message PAY $100 and its MAC. Bob recomputes the MAC on the message using the shared key. If his result matches Alice’s, he knows the message was not altered and must have come from someone with the key.
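Alice and Bob's exchange can be sketched with Python's standard hmac and hashlib modules, assuming HMAC-SHA256 as the MAC. The key value and the helper names make_tag and verify are hypothetical. Verification uses hmac.compare_digest, which compares in constant time so response timing does not reveal how many bytes of a forged tag were correct.

```python
import hashlib
import hmac

# Hypothetical shared key; a real one would be random, e.g. secrets.token_bytes(32).
shared_key = b"hypothetical shared secret key"

def make_tag(key: bytes, message: bytes) -> bytes:
    """Alice: compute HMAC-SHA256 over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Bob: recompute the tag himself and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

tag = make_tag(shared_key, b"PAY $100")
print(verify(shared_key, b"PAY $100", tag))    # True
print(verify(shared_key, b"PAY $900", tag))    # False: the message was modified
print(verify(b"wrong key", b"PAY $100", tag))  # False: the sender lacked the key
```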

Constructions of MACs:

HMAC: uses a cryptographic hash with the key mixed into the input.
CBC-MAC: uses a block cipher in Cipher Block Chaining mode to produce a keyed hash.
AEAD (Authenticated Encryption with Associated Data): combines encryption and integrity in one step, generating ciphertext and an authentication tag (similar to a MAC). GCM and Poly1305 are common AEAD integrity components.
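The way HMAC mixes the key into the hash can be written out directly. This sketch follows the RFC 2104 construction with SHA-256: the key is hashed if it is longer than the 64-byte block, then zero-padded, XORed with two fixed pad bytes (0x36 and 0x5C), and used in two nested hash passes. The result is checked against Python's built-in hmac module. The function name hmac_sha256 is hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC (RFC 2104): H((K ^ opad) || H((K ^ ipad) || message))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")     # then zero-padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)  # K XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # K XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key, msg = b"shared key", b"PAY $100"
print(hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())  # True
```

The nested structure matters: the simpler H(key || message) is vulnerable to length-extension attacks with hashes such as SHA-256, while HMAC is not.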

Digital Signature

A digital signature is the public-key equivalent of a MAC, but it uses asymmetric keys instead of a shared secret.

Concept:
Signing: the sender creates a hash of the message and encrypts that hash with their private key.
Verification: the receiver decrypts the signature with the sender’s public key and compares it with a newly computed hash of the message. If they match, the message has not been altered and must have come from the claimed sender.
Unlike MACs, digital signatures provide authentication and non-repudiation: only the holder of the private key could have generated the signature, and the sender cannot later deny it.
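The hash-then-sign concept can be sketched with a toy RSA key (n = 61 × 53, illustrative only). Reducing the SHA-256 hash mod n is a simplification forced by the tiny modulus. Real RSA signatures use large keys and a padding scheme such as RSA-PSS, and DSA, ECDSA, and EdDSA work differently. The helper names are hypothetical.

```python
import hashlib

# Toy RSA key pair: public (n, e), private d. Real keys are thousands of bits long.
n, e, d = 3233, 17, 2753

def digest_as_int(message: bytes) -> int:
    """Hash the message, then reduce mod n to fit the toy key (a simplification)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Sender: apply the private exponent to the message hash."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: apply the public exponent and compare with a freshly computed hash."""
    return pow(signature, e, n) == digest_as_int(message)

sig = sign(b"TRANSFER $500")
print(verify(b"TRANSFER $500", sig))  # True
print(verify(b"TRANSFER $900", sig))  # almost certainly False (toy modulus allows rare matches)
```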

Example: Alice sends Bob the message TRANSFER $500 along with its signature. Bob verifies the signature using Alice’s public key. If it checks out, Bob knows the message is intact and was created by Alice.

Digital signature algorithms: In practice, specialized algorithms exist just for signing. Examples include DSA, ECDSA, and EdDSA. These algorithms are applied to the hash of the message rather than the message itself for efficiency.

MACs vs. Signatures

MACs: symmetric, efficient, no non-repudiation.
Signatures: asymmetric, slower, provide authentication and non-repudiation.

X.509 Certificates

An X.509 certificate (often just called a digital certificate) binds a public key to the identity of an individual, organization, or website.

A certificate contains information such as the subject’s name, the subject’s public key, the issuer’s identity, and a validity period.

A Certificate Authority (CA) digitally signs the certificate using its own private key. Anyone who trusts the CA can verify the certificate by checking that signature with the CA’s public key. Certificates provide assurance that a public key truly belongs to the entity it claims.

Example: When you visit https://example.com, your browser receives the site’s X.509 certificate. It verifies that the certificate was signed by a trusted CA. If valid, your browser knows the server’s public key really belongs to that website.

Certificates are essential in TLS, where they allow clients to verify the identity of servers before exchanging keys.

Diffie-Hellman Key Exchange (DHKE)

DHKE allows two parties to establish a shared secret over an insecure channel.

Each party uses a private key and a corresponding public value to derive the shared secret. The algorithm is based on the one-way function a^x mod p.
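A minimal sketch of the exchange, with g playing the role of a in the formula above. The group parameters are toy assumptions (the prime 2^31 - 1 with generator 7). Real deployments use standardized groups of 2048 bits or more, or elliptic curves. An eavesdropper sees p, g, A, and B but would have to solve a discrete logarithm to get either private exponent.

```python
import secrets

# Toy public parameters (illustrative assumption, insecure size).
p = 2_147_483_647   # prime modulus, 2^31 - 1
g = 7               # generator

# Each party picks a private exponent and publishes g^x mod p.
alice_private = secrets.randbelow(p - 2) + 1   # in [1, p-2], never sent
bob_private = secrets.randbelow(p - 2) + 1
A = pow(g, alice_private, p)                   # sent in the clear
B = pow(g, bob_private, p)                     # sent in the clear

# Each side combines its own private key with the other side's public value.
alice_secret = pow(B, alice_private, p)        # (g^b)^a mod p
bob_secret = pow(A, bob_private, p)            # (g^a)^b mod p
print(alice_secret == bob_secret)              # True: both equal g^(ab) mod p
```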

Elliptic Curve Diffie-Hellman (ECDH) uses elliptic curves but behaves the same way.

General-purpose public key algorithms, such as RSA and ECC, can also perform this function (for example, one side can encrypt a random key with the other side's public key). DHKE is usually used instead because generating its public and private keys is much faster and the algorithm computes a common key efficiently.

Hybrid Cryptosystem

A hybrid cryptosystem combines public key cryptography with symmetric key cryptography. Public key methods are slow, so they are typically used only to exchange a session key. Once the session key is established, both sides switch to symmetric encryption (like AES) for the actual communication, which is much faster.

A session key is a temporary symmetric key that secures communication for the duration of one session. When the session ends, the session key is discarded. If an attacker records the traffic, they cannot decrypt it later without the specific session key that was used at the time.

Forward secrecy means that even if a long-term private key is stolen in the future, past communications remain secure. This works because each session uses its own independent session key. The long-term key is only used to authenticate or establish the exchange, not to encrypt bulk data.

An ephemeral key is a temporary public/private key pair generated for a single session (or even a single exchange). Ephemeral keys are often used in protocols like Ephemeral Diffie-Hellman (DHE). The shared secret derived from the ephemeral key exchange is used to create the session key.

Ephemeral keys and session keys are closely related: the ephemeral key exchange produces the session key. In some contexts, the term “ephemeral key” is used interchangeably with “session key,” but more precisely:

Ephemeral key: the temporary public/private key pair used during the key exchange.
Session key: the symmetric key that results from the exchange and is used to encrypt the actual communication.

Example: In TLS with Ephemeral Diffie-Hellman, the server and client each generate ephemeral keys for the exchange. From those, they derive a shared session key. The ephemeral keys are discarded after the session ends, ensuring forward secrecy.
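The ephemeral pattern can be sketched by running a fresh toy Diffie-Hellman exchange for every session. Several simplifications are assumed here. The group is the insecure 2^31 - 1 prime with generator 7. The long-term key that would authenticate the exchange is omitted. A single SHA-256 hash stands in for a proper key derivation function (TLS uses HKDF). The function names are hypothetical.

```python
import hashlib
import secrets

P = 2_147_483_647  # toy prime modulus (illustrative, insecure)
G = 7              # toy generator

def ephemeral_keypair():
    """Generate a fresh private/public pair used for one session only."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def run_session() -> bytes:
    """One handshake: both sides create ephemeral keys and derive a session key."""
    client_priv, client_pub = ephemeral_keypair()
    server_priv, server_pub = ephemeral_keypair()
    client_shared = pow(server_pub, client_priv, P)
    server_shared = pow(client_pub, server_priv, P)
    assert client_shared == server_shared  # both sides agree
    # Simplified key derivation: hash the shared secret (a stand-in for HKDF).
    session_key = hashlib.sha256(client_shared.to_bytes(4, "big")).digest()
    # The ephemeral private keys are discarded when this function returns,
    # so there is nothing long-lived that could later recover this session key.
    return session_key

k1 = run_session()
k2 = run_session()
print(k1 != k2)  # True in practice: each session gets its own independent key
```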

Quantum Attacks

Quantum attacks use algorithms that run on a sufficiently large quantum computer and can break many current public key systems.

RSA is vulnerable because Shor’s algorithm can efficiently factor large integers.
Diffie-Hellman (DHKE) and Elliptic Curve Cryptography (ECC) are vulnerable because Shor’s algorithm can efficiently solve discrete logarithms, which are the basis of these systems. This includes protocols like ECDH and signature schemes like DSA, ECDSA, and EdDSA.
In short, any public key system based on integer factorization or discrete logarithms is broken by a large quantum computer.
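Symmetric cryptography (AES, ChaCha20) and hash-based functions (SHA-2, SHA-3, HMAC) are not broken, but Grover's algorithm can roughly halve their effective security level in bits (for example, a 128-bit key offers only about 64 bits of security against a quantum brute-force search). Doubling key or hash sizes restores security.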

New post-quantum algorithms that resist these quantum attacks have recently been developed, and more continue to be developed (for example, NIST standardized ML-KEM for key establishment and ML-DSA for signatures in 2024).

Transport Layer Security (TLS)

TLS is the protocol that secures communication on the web. It combines several cryptographic tools:

Diffie-Hellman key exchange establishes a shared secret that becomes the basis for session keys.
X.509 certificates authenticate the server (and sometimes the client) by binding identities to public keys.
Public key cryptography verifies the certificates and authenticates the Diffie-Hellman exchange (the server signs its ephemeral D-H value with its private key, and the client checks it).
HMAC is used in the handshake to authenticate transcript messages and, through HKDF, to derive keys from the shared secret.
HKDF (HMAC-based Key Derivation Function) expands the Diffie-Hellman shared secret into multiple secure session keys (a different key for traffic in each direction).
Symmetric cryptography with AEAD (such as AES-GCM or ChaCha20-Poly1305) protects bulk data, providing both confidentiality and integrity for application traffic.
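HKDF itself is short enough to write with the standard hmac module. This sketch follows the generic extract-then-expand design of RFC 5869 with SHA-256. TLS 1.3 wraps it in its own HKDF-Expand-Label function with TLS-specific labels, so the info strings below ("client write key", "server write key") and the all-zero shared secret are hypothetical placeholders. Different info labels yield independent keys from the same secret, which is how each traffic direction gets its own key.

```python
import hashlib
import hmac
import math

HASH_LEN = 32  # SHA-256 output size in bytes

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract: condense input keying material into a pseudorandom key (PRK)."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF cannot produce more than 255 blocks of output")
    okm, block = b"", b""
    for i in range(1, math.ceil(length / HASH_LEN) + 1):
        block = hmac.new(prk, block + info + bytes([i]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

# Hypothetical usage: one Diffie-Hellman shared secret -> one key per direction.
shared_secret = bytes(32)  # placeholder for a real D-H output
prk = hkdf_extract(b"", shared_secret)
client_to_server = hkdf_expand(prk, b"client write key", 16)
server_to_client = hkdf_expand(prk, b"server write key", 16)
print(client_to_server != server_to_client)  # True: different labels, different keys
```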