Final Exam Study Guide

The three-hour study guide for the final exam

Paul Krzyzanowski

December 2024

Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the three hour time window in the title literally.

Last update: Sat Dec 7 23:49:52 EST 2024

Introduction

Computer security is about keeping computers, their programs, and the data they manage “safe.” Specifically, this means safeguarding three areas: confidentiality, integrity, and availability. These three are known as the CIA Triad (no relation to the Central Intelligence Agency).

Confidentiality
Confidentiality means that we do not make a system’s data and its resources (the devices it connects to and its ability to run programs) available to everyone. Only authorized people and processes should have access. Privacy specifies limits on what information can be shared with others while confidentiality provides a means to block access to such information. Privacy is a reason for confidentiality. Someone being able to access a protected file containing your medical records without proper access rights is a violation of confidentiality.
Integrity

Integrity refers to the trustworthiness of a system. This means that everything is as you expect it to be: users are not imposters and processes are running correctly.

  • Data integrity means that the data in a system has not been corrupted.

  • Origin integrity means that the person or system sending a message or creating a file truly is that person and not an imposter.

  • Recipient integrity means that the person or system receiving a message truly is that person and not an imposter.

  • System integrity means that the entire computing system is working properly; that it has not been damaged or subverted. Processes are running the way they are supposed to.

Maintaining integrity means not just defending against intruders who want to modify a program or masquerade as others. It also means protecting the system against accidental damage, such as from user or programmer errors.
Availability
Availability means that the system is available for use and performs properly. A denial of service (DoS) attack may not steal data or damage any files but may cause a system to become unresponsive.

Security is difficult. Software is incredibly complex. Large systems may comprise tens or hundreds of millions of lines of code. Systems as a whole are also complex. We may have a mix of cloud and local resources, third-party libraries, and multiple administrators. If security was easy, we would not have massive security breaches year after year. Microsoft wouldn’t have monthly security updates. There are no magic solutions … but there is a lot that can be done to mitigate the risk of attacks and their resultant damage.

We saw that computer security addressed three areas of concern. The design of security systems also has three goals.

Prevention
Prevention means preventing attackers from violating established security policies. It means that we can implement mechanisms into our hardware, operating systems, and application software that users cannot override – either maliciously or accidentally. Examples of prevention include enforcing access control rules for files and authenticating users with passwords.
Detection
Detection detects and reports security attacks. It is particularly important when prevention mechanisms fail. It is useful because it can identify weaknesses with certain prevention mechanisms. Even if prevention mechanisms are successful, detection mechanisms are useful to let you know that attempted attacks are taking place. An example of detection is notifying an administrator that a new user has been added to the system. Another example is being notified that there have been several consecutive unsuccessful attempts to log in.
Recovery
If a system is compromised, we need to stop the attack and repair any damage to ensure that the system can continue to run correctly and the integrity of data is preserved. Recovery includes forensics, the study of identifying what happened and what was damaged so we can fix it. An example of recovery is restoration from backups.

Security engineering is the task of implementing the necessary mechanisms and defining policies across all the components of the system. Like other engineering disciplines, designing secure systems involves making compromises. A highly secure system will be disconnected from any communication network, sit in an electromagnetically shielded room that is only accessible to trusted users, and run software that has been thoroughly audited. That environment is not acceptable for most of our computing needs. We want to download apps, carry our computers with us, and interact with the world. Even in the ultra-secure example, we still need to be concerned with how we monitor access to the room, who wrote the underlying operating system and compilers, and whether authorized users can be coerced to subvert the system. Systems have to be designed with some idea of who are likely potential attackers and what the threats are. Risk analysis is used to understand the difficulty of an attack on a system, who will be affected, and what the worst thing that can happen is. A threat model is a data flow model (e.g., diagram) that identifies each place where information moves into or out of the software or between subsystems of the program. It allows you to identify areas where the most effort should be placed to secure a system.

Secure systems have two parts to them: mechanisms and policies. A policy is a description of what is or is not allowed. For example, “users must have a password to log into the system” is a policy. Mechanisms are used to implement and enforce policies. An example of a mechanism is the software that requests user IDs and passwords, authenticates the user, and allows entry to the system only if the correct password is used.

A vulnerability is a weakness in the security system. It could be a poorly defined policy, a bribed individual, or a flaw in the underlying mechanism that enforces security. An attack is the exploitation of a vulnerability in a system. An attack vector refers to the specific technique that an attacker uses to exploit a vulnerability. Example attack vectors include phishing, keylogging, and trying common passwords to log onto a system. An attack surface is the sum of possible attack vectors in a system: all the places where an attacker might try to get into the system.

A threat is the potential adversary who may attack the system. Threats may lead to attacks.

Threats fall into four broad categories:

Disclosure: Unauthorized access to data, which covers exposure, interception, interference, and intrusion. This includes stealing data, improperly making data available to others, or snooping on the flow of data.

Deception: Accepting false data as true. This includes masquerading, which is posing as an authorized entity; substitution or insertion, which is the injection of false data or modification of existing data; and repudiation, where someone falsely denies receiving or originating data.

Disruption: Some change that interrupts or prevents the correct operation of the system. This can include maliciously changing the logic of a program, a human error that disables a system, an electrical outage, or a failure in the system due to a bug. It can also refer to any obstruction that hinders the functioning of the system.

Usurpation: Unauthorized control of some part of a system. This includes theft of service as well as any misuse of the system such as tampering or actions that result in the violation of system privileges.

The Internet increases opportunities for attackers. The core protocols of the Internet were designed with decentralization, openness, and interoperability in mind rather than security. Anyone can join the Internet and send messages … and untrustworthy entities can provide routing services. It allows bad actors to hide and to attack from a distance. It also allows attackers to amass asymmetric force: harnessing more resources to attack than the victim has for defense. Even small groups of attackers are capable of mounting Distributed Denial of Service (DDoS) attacks that can overwhelm large companies or government agencies.

Adversaries can range from lone hackers to industrial spies, terrorists, and intelligence agencies. We can consider two dimensions: skill and focus. Regarding focus, attacks are either opportunistic or targeted. Opportunistic attacks are those where the attacker is not out to get you specifically but casts a wide net, trying many systems in the hope of finding a few that have a particular vulnerability that can be exploited. Targeted attacks are those where the attacker targets you specifically. The term script kiddies is used to refer to attackers who lack the skills to craft their own exploits but download malware toolkits to try to find vulnerabilities (e.g., systems with poor or default passwords, hackable cameras). Advanced persistent threats (APT) are highly-skilled, well-funded, and determined (hence, persistent) attackers. They can craft their own exploits, pay millions of dollars for others, and may carry out complex, multi-stage attacks.

We refer to the trusted computing base (TCB) as the collection of hardware and software of a computing system that is critical to ensuring the system’s security. Typically, this is the operating system and system software but also includes the system firmware, bootloader, and any other software that, if attacked, can impact security. If the TCB is compromised, you no longer have assurance that any part of the system is secure. For example, the operating system may be modified to ignore the enforcement of file access permissions. If that happens, you no longer have assurance that any application is accessing files properly.

Cryptography

Cryptography deals with encrypting plaintext using a cipher, also known as an encryption algorithm, to create ciphertext, which is unintelligible to anyone unless they can decrypt the ciphertext. It is a tool that helps build protocols that address:

Authentication:
Showing that the user really is that user.
Integrity:
Validating that the message has not been modified.
Nonrepudiation:
Binding the origin of a message to a user so that she cannot deny creating it.
Confidentiality:
Hiding the contents of a message.

A secret cipher is one where the workings of the cipher must be kept secret. There is no reliance on any key and the secrecy of the cipher is crucial to the value of the algorithm. This has obvious flaws: people in the know leaking the secret, designers coming up with a poor algorithm, and reverse engineering. Schneier’s Law (not a real law), named after Bruce Schneier, a cryptographer and security professional, suggests that anyone can invent a cipher that they will not be able to break, but that doesn’t mean it’s a good one.

For any serious use of encryption, we use well-tested, non-secret algorithms that rely on secret keys. A key is a parameter to a cipher that alters the resulting ciphertext. Knowledge of the key is needed to decrypt the ciphertext. Kerckhoffs’s Principle states that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. We expect algorithms to be publicly known and all security to rest entirely on the secrecy of the key.

A symmetric encryption algorithm uses the same secret key for encryption and decryption.

An alternative to symmetric ciphers is the asymmetric cipher. An asymmetric, or public key, cipher uses two related keys. Data encrypted with one key can only be decrypted with the other key.

Properties of good ciphers

These are the key properties we expect for a cipher to be strong:

  1. For a cipher to be considered good, ciphertext should be indistinguishable from random values.
  2. Given ciphertext, there should be no way to extract the original plaintext or the key that was used to create it except by enumerating all possible keys. This is called a brute-force attack.
  3. The keys used for encryption should be large enough that a brute force attack is not feasible. Each additional bit in a key doubles the number of possible keys and hence doubles the search time.

Stating that the ciphertext should be indistinguishable from random values implies high entropy. Shannon entropy measures the randomness in a system. It quantifies the unpredictability of cryptographic keys and messages, with higher entropy indicating more randomness. Low entropy would allow an attacker to find patterns or some correlation to the original content.
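
As a quick illustration (a sketch, not from the course materials), Shannon entropy over the distribution of byte values in a message, H = -Σ p_i log2(p_i), can be computed in a few lines of Python:

    import math
    from collections import Counter

    # Shannon entropy in bits per byte: H = -sum(p_i * log2(p_i)),
    # where p_i is the frequency of each distinct byte value.
    def entropy(data):
        counts = Counter(data)
        return 0.0 - sum((c / len(data)) * math.log2(c / len(data))
                         for c in counts.values())

    print(entropy(b"aaaaaaaaaaaaaaaa"))   # 0.0 bits/byte: fully predictable
    print(entropy(bytes(range(256))))     # 8.0 bits/byte: uniformly distributed

Ciphertext from a good cipher should measure close to 8 bits per byte, just like random data.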

We expect these properties for a cipher to be useful:

  1. The secrecy of the cipher should be entirely in the key (Kerckhoffs’s principle) – we expect knowledge of the algorithm to be public.

  2. Encryption and decryption should be efficient: we want to encourage the use of secure cryptography where it is needed and not have people avoid it because it slows down data access.

  3. Keys and algorithms should be as simple as possible and operate on any data:

    • There shouldn’t be restrictions on the values of keys, the data that could be encrypted, or how to do the encryption
    • Restrictions on keys make searches easier and will require longer keys.
    • Complex algorithms will increase the likelihood of implementation errors.
    • Restrictions on what can be encrypted will encourage people to not use the algorithm.
  4. The ciphertext should be the same size as the plaintext.

    • You don’t want your effective bandwidth cut in half because the ciphertext is 2x the size of plaintext.
    • However, sometimes we might need to pad the data but that’s a small number of bytes regardless of the input size.
  5. The algorithm has been extensively analyzed

    • We don’t want the latest – we want an algorithm that has been studied carefully for years by many experts.

In addition to formulating the measurement of entropy, Claude Shannon posited that a strong cipher should, ideally, have confusion and diffusion as goals in its operation.

Confusion means that there is no direct correlation between a bit of the key and the resulting ciphertext. Every bit of ciphertext will be impacted by multiple bits of the key. An attacker will not be able to find a connection between a bit of the key and a bit of the ciphertext. This is important in not giving the cryptanalyst hints on what certain bits of the key might be, which would limit the set of possible keys. Confusion hides the relationship between the key and the ciphertext.

Diffusion is the property where the plaintext information is spread throughout the cipher so that a change in one bit of plaintext will change, on average, half of the bits in the ciphertext. Diffusion tries to make the relationship between the plaintext and ciphertext as complicated as possible.

Classic cryptography

Monoalphabetic substitution ciphers

The earliest form of cryptography was the monoalphabetic substitution cipher. In this cipher, each character of plaintext is substituted with a character of ciphertext based on a substitution alphabet (a lookup table). The simplest of these is the Caesar cipher, also known as a shift cipher, in which a plaintext character is replaced with a character that is n positions away in the alphabet. The key is simply the shift value: the number n. Substitution ciphers are vulnerable to frequency analysis attacks, in which an analyst analyzes letter frequencies in ciphertext and substitutes characters with those that occur with the same frequency in natural language text (e.g., if “x” occurs 12% of the time, it’s likely to really be an “e” since “e” occurs in English text approximately 12% of the time while “x” occurs only 0.1% of the time).
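
As a toy illustration (not from the course materials), here is a Caesar cipher in Python; the single number n is the entire key:

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def caesar_encrypt(plaintext, n):
        # Replace each letter with the one n positions later (mod 26).
        table = str.maketrans(ALPHABET, ALPHABET[n:] + ALPHABET[:n])
        return plaintext.lower().translate(table)

    def caesar_decrypt(ciphertext, n):
        return caesar_encrypt(ciphertext, -n % 26)

    print(caesar_encrypt("attack at dawn", 3))    # dwwdfn dw gdzq
    print(caesar_decrypt("dwwdfn dw gdzq", 3))    # attack at dawn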

Polyalphabetic substitution ciphers

Polyalphabetic substitution ciphers were designed to increase resiliency against frequency analysis attacks. Instead of using a single plaintext to ciphertext mapping for the entire message, the substitution alphabet may change periodically. Leon Battista Alberti is credited with creating the first polyalphabetic substitution cipher. In the Alberti cipher (essentially a secret decoder ring), the substitution alphabet changes every n characters as the ring is rotated one position every n characters.

The Vigenère cipher is a grid of Caesar ciphers that uses a repeating key. A repeating key is a key that repeats itself for as long as the message. Each character of the key determines which Caesar cipher (which row of the grid) will be used for the next character of plaintext. The position of the plaintext character identifies the column of the grid. These algorithms are still vulnerable to frequency analysis attacks but require substantially more plaintext since one needs to deduce the key length (or the frequency at which the substitution alphabet changes) and then effectively decode multiple monoalphabetic substitution ciphers.
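
A sketch of the Vigenère scheme in Python, with each key letter selecting the shift (the row of the grid) for its position; the classic ATTACKATDAWN/LEMON example is used to demonstrate it:

    # Toy Vigenère cipher: each key letter selects a Caesar shift.
    # Assumes letters a-z only; still vulnerable to frequency analysis.
    def vigenere(text, key, decrypt=False):
        out = []
        for i, ch in enumerate(text.lower()):
            shift = ord(key[i % len(key)]) - ord('a')   # repeating key
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        return "".join(out)

    c = vigenere("attackatdawn", "lemon")
    print(c)                            # lxfopvefrnhr
    print(vigenere(c, "lemon", True))   # attackatdawn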

One-time Pads

The one-time pad is the only provably secure cipher. It uses a random key that is as long as the plaintext. Each character of plaintext is permuted by a character of the key (e.g., add the characters modulo the size of the alphabet or, in the case of binary data, exclusive-OR the next byte of the text with the next byte of the key). The reason this cryptosystem is not particularly useful is because the key has to be as long as the message, so transporting the key securely becomes a problem. The challenge of sending a message securely is now replaced with the challenge of sending the key securely. The position in the key (pad) must be synchronized at all times. Error recovery from unsynchronized keys is not possible. Finally, for the cipher to be secure, a key must be composed of truly random characters, not ones derived by an algorithmic pseudorandom number generator. The key can never be reused.
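
A sketch of the XOR form over bytes; Python’s secrets module (the operating system’s cryptographic random source) stands in here for the true random source a real pad requires:

    import secrets

    # The key must be truly random, as long as the message, and never reused.
    message = b"meet me at midnight"
    key = secrets.token_bytes(len(message))

    ciphertext = bytes(m ^ k for m, k in zip(message, key))    # encrypt: XOR with key
    recovered  = bytes(c ^ k for c, k in zip(ciphertext, key)) # decrypt: XOR again
    assert recovered == message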

The one-time pad provides perfect secrecy (not to be confused with forward secrecy, also called perfect forward secrecy, which will be discussed later), which means that the ciphertext conveys no information about the content of the plaintext. It has been proved that perfect secrecy can be achieved only if there are as many possible keys as there are possible messages, meaning the key has to be as long as the message.

Stream ciphers

A stream cipher simulates a one-time pad by using a keystream generator to create a set of key bytes that is as long as the message. A keystream generator is a pseudorandom number generator that is seeded, or initialized, with a key that drives the output of all the bytes that the generator spits out. The keystream generator is fully deterministic: the same key will produce the same stream of output bytes each time. Because of this, receivers only need to have the key to be able to decipher a message. However, because the keystream generator does not generate true random numbers, the stream cipher is not a true substitute for a one-time pad. Its strength rests on the strength of the key. A keystream generator will, at some point, reach an internal state that is identical to some previous internal state and produce output that is a repetition of previous output. This also limits the security of a stream cipher but the repetition may not occur for a long time, so stream ciphers can still be useful for many purposes.
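
A toy sketch of the structure (seed a deterministic generator with the key, XOR its output with the data); the linear congruential generator below is nowhere near cryptographically secure and exists only to make the example runnable:

    def keystream(key, length):
        # Deterministic generator: the same key always yields the same bytes.
        state = key
        for _ in range(length):
            state = (1103515245 * state + 12345) % 2**31   # LCG step -- NOT secure
            yield state & 0xff                             # low byte is the key byte

    def stream_crypt(data, key):
        # XOR is its own inverse, so the same call encrypts and decrypts.
        return bytes(b ^ k for b, k in zip(data, keystream(key, len(data))))

    c = stream_crypt(b"hello, world", 42)
    assert stream_crypt(c, 42) == b"hello, world"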

Rotor machines

A rotor machine is an electromechanical device that implements a polyalphabetic substitution cipher. It uses a set of disks (rotors), each of which implements a substitution cipher. The rotors rotate with each character in the style of an odometer: after a complete rotation of one rotor, the next rotor advances one position. Each successive character gets a new substitution alphabet applied to it. The multi-rotor mechanism allows for a huge number of substitution alphabets to be employed before they start repeating when the rotors all reach their starting position. The number of alphabets is c^r, where c is the number of characters in the alphabet and r is the number of rotors (e.g., three 26-character rotors yield 26^3 = 17,576 alphabets).

Transposition ciphers

Instead of substituting one character of plaintext for a character of ciphertext, a transposition cipher scrambles the positions of the plaintext characters. Decryption requires knowing how to unscramble them.

A scytale, also known as a staff cipher, is an ancient implementation of a transposition cipher where text written along a strip of paper is wrapped around a rod and the resulting sequences of text are read horizontally. This is equivalent to entering characters in a two-dimensional matrix horizontally and reading them vertically. Because the number of characters might not be a multiple of the width of the matrix, extra characters might need to be added at the end. This is called padding and is essential for block ciphers, which encrypt chunks of data at a time.
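
A sketch of the scytale as a matrix transposition in Python, including the padding step described above:

    # Write the text row by row into a matrix of the given width,
    # then read it out column by column.
    def scytale_encrypt(text, width, pad='x'):
        if len(text) % width:
            text += pad * (width - len(text) % width)   # pad to a full rectangle
        rows = [text[i:i + width] for i in range(0, len(text), width)]
        return "".join(row[c] for c in range(width) for row in rows)

    def scytale_decrypt(ciphertext, width):
        # The inverse is the same transposition along the other dimension.
        return scytale_encrypt(ciphertext, len(ciphertext) // width)

    c = scytale_encrypt("helpmeiamunderattack", 4)
    print(c)                        # hmmeteeuralinacpadtk
    print(scytale_decrypt(c, 4))    # helpmeiamunderattack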

Block ciphers

Most modern ciphers are block ciphers, meaning that they encrypt a chunk of bits, or block, of plaintext at a time. The same key is used to encrypt each successive block of plaintext.

AES and DES are two popular symmetric block ciphers. Symmetric block ciphers are usually implemented as iterative ciphers. The encryption of each block of plaintext iterates over several rounds. Each round uses a subkey, which is a key generated from the main key via a specific set of bit replications, inversions, and transpositions. The subkey is also known as a round key since it is applied to only one round, or iteration. This subkey determines what happens to the block of plaintext as it goes through a substitution-permutation (SP) network. The SP network, guided by the subkey, flips some bits by doing a substitution, which is a table lookup of an input bit pattern to get an output bit pattern, and a permutation, which is a scrambling of bits in a specific order. The output bytes are fed into the next round, which applies a substitution-permutation step using a different subkey. The process continues for several rounds (16 rounds for DES, 10–14 rounds for AES), and the resulting bytes are the ciphertext for the input block.

The iteration through multiple SP steps creates confusion and diffusion. Confusion means that it is extremely difficult to find any correlation between a bit of the ciphertext with any part of the key or the plaintext. A core component of block ciphers is the s-box, which converts n input bits to m output bits, usually via a table lookup. The purpose of the s-box is to add confusion by altering the relationship between the input and output bits.

Diffusion means that any changes to the plaintext are distributed (diffused) throughout the ciphertext so that, on average, half of the bits of the ciphertext would change if even one bit of plaintext is changed.

Feistel ciphers

A Feistel cipher is a form of block cipher that uses a variation of the SP network in which a block of plaintext is split into two parts. The substitution-permutation round is applied to only one part. That output is then XORed with the other part and the two halves are swapped. At each round, half of the input block remains unchanged. DES, the Data Encryption Standard, is an example of a Feistel cipher. AES, the Advanced Encryption Standard, is not.
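
A toy Feistel network in Python (the round function F is an arbitrary stand-in; real ciphers use s-boxes and permutations):

    def F(half, subkey):
        # Stand-in round function; it does not need to be invertible.
        return (half * 31 + subkey) & 0xffffffff

    def feistel(block, subkeys):
        left, right = block >> 32, block & 0xffffffff   # split the 64-bit block
        for k in subkeys:
            left, right = right, left ^ F(right, k)     # transform one half, swap
        return (right << 32) | left                     # undo the final swap

    subkeys = [0x0f, 0x1e, 0x2d, 0x3c]                  # toy key schedule
    c = feistel(0x0123456789abcdef, subkeys)
    assert feistel(c, subkeys[::-1]) == 0x0123456789abcdef   # decrypt: reverse subkeys

Because only XORs and swaps have to be undone, decryption is the same routine run with the subkeys in reverse order, and F itself never needs an inverse – the main appeal of the Feistel structure.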

DES

Two popular symmetric block ciphers are DES, the Data Encryption Standard, and AES, the Advanced Encryption Standard. DES was adopted as a federal standard in 1976 and is a block cipher based on the Feistel cipher that encrypts 64-bit blocks using a 56-bit key.

DES has been shown to have some minor weaknesses against cryptanalysis. The key can be recovered using 2^47 chosen plaintexts or 2^43 known plaintexts. Note that this is not a practical amount of data to get for a real attack. The real weakness of DES is not the algorithm but its 56-bit key. An exhaustive search requires 2^55 iterations on average (we assume that, on average, the plaintext is recovered halfway through the search). This was a lot for computers in the 1970s but is not much of a challenge for today’s dedicated hardware or distributed efforts.

Triple-DES

Triple-DES (3DES) solves the key size problem of DES and allows DES to use keys up to 168 bits. It does this by applying three layers of encryption:

  1. C' = Encrypt M with key K1
  2. C'' = Decrypt C' with key K2
  3. C = Encrypt C'' with key K3

If K1, K2, and K3 are identical, we have the original DES algorithm since the decryption in the second step cancels out the encryption in the first step. If K1 and K3 are the same, we effectively have a 112-bit key and if all three keys are different, we have a 168-bit key.
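
A sketch of the EDE (encrypt-decrypt-encrypt) construction; des_encrypt and des_decrypt below are trivial stand-ins so the example runs, not real DES:

    # Stand-ins so the sketch runs; real code would use the actual DES cipher.
    MASK64 = 0xffffffffffffffff
    def des_encrypt(block, key): return (block + key) & MASK64
    def des_decrypt(block, key): return (block - key) & MASK64

    def des3_encrypt(block, k1, k2, k3):
        # Encrypt with K1, decrypt with K2, encrypt with K3.
        return des_encrypt(des_decrypt(des_encrypt(block, k1), k2), k3)

    def des3_decrypt(block, k1, k2, k3):
        # Undo the three layers in reverse order.
        return des_decrypt(des_encrypt(des_decrypt(block, k3), k2), k1)

    m = 0x0123456789abcdef
    assert des3_decrypt(des3_encrypt(m, 11, 22, 33), 11, 22, 33) == m
    # With k1 == k2, the first two layers cancel: the result is single DES
    # with k3, which is how 3DES interoperates with legacy DES hardware.
    assert des3_encrypt(m, 11, 11, 33) == des_encrypt(m, 33)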

Cryptanalysis is not effective with 3DES: the three layers of encryption use 48 rounds instead of 16, making it infeasible to reconstruct the substitutions and permutations that take place. A 168-bit key is too long for a brute-force attack. However, DES is relatively slow compared with other symmetric ciphers, such as AES. It was designed with hardware encryption in mind. 3DES is, of course, three times slower than DES.

AES

AES, the Advanced Encryption Standard, was designed as a successor to DES and became a federal government standard in 2002. It uses a larger block size than DES: 128 bits versus DES’s 64 bits, and supports larger key sizes: 128, 192, and 256 bits. Even a 128-bit key is long enough to make brute-force searches infeasible.

No significant academic attacks have been found thus far beyond brute force search. AES is also typically 5–10 times faster in software than 3DES.

Block cipher modes

Electronic Code Book (ECB)

When data is encrypted with a block cipher, it is broken into blocks and each block is encrypted separately. This leads to two problems.

  1. If different encrypted messages contain the same substrings and use the same key, an intruder can deduce that it is the same data.

  2. Secondly, a malicious party can delete, add, or replace blocks (perhaps with blocks that were captured from previous messages).

This basic form of a block cipher is called an electronic code book (ECB). Think of the code book as a database of encrypted content. You can look up a block of plaintext and find the corresponding ciphertext. This is not feasible to implement for arbitrary messages but refers to the historic use of codebooks to convert plaintext messages to ciphertext.

Cipher Block Chaining (CBC)

Cipher block chaining (CBC) addresses these problems. Every block of data is still encrypted with the same key. However, prior to being encrypted, the data block is exclusive-ORed with the previous block of ciphertext. The receiver does the process in reverse: a block of received data is decrypted and then exclusive-ored with the previously-received block of ciphertext to obtain the original data. The very first block is exclusive-ored with a random initialization vector, which must be transmitted to the remote side.

Note that CBC does not make the encryption more secure; it simply makes the result of each block of data dependent on all previous blocks. Because of the random initialization vector, even identical content would appear different in ciphertext. An attacker would not be able to tell if any two blocks of ciphertext refer to identical blocks of plaintext. Because of the chaining, even identical blocks in the same ciphertext will appear vastly different. Moreover, because of this, blocks cannot be meaningfully inserted, swapped, or deleted in the message stream without the decryption failing (producing random-looking garbage).
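
A sketch of CBC chaining with a trivial stand-in block cipher (toy_encrypt below is not secure; it exists only to make the chaining visible and runnable):

    import secrets

    BLOCK = 16  # block size in bytes

    def toy_encrypt(block, key):
        # Stand-in for a real block cipher such as AES -- NOT secure.
        return bytes((b + key[i % len(key)]) % 256 for i, b in enumerate(block))

    def toy_decrypt(block, key):
        return bytes((b - key[i % len(key)]) % 256 for i, b in enumerate(block))

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def cbc_encrypt(blocks, key):
        iv = secrets.token_bytes(BLOCK)          # random IV, sent in the clear
        prev, out = iv, [iv]
        for p in blocks:
            c = toy_encrypt(xor(p, prev), key)   # XOR with previous ciphertext block
            out.append(c)
            prev = c
        return out

    def cbc_decrypt(blocks, key):
        out, prev = [], blocks[0]                # blocks[0] is the IV
        for c in blocks[1:]:
            out.append(xor(toy_decrypt(c, key), prev))
            prev = c
        return out

    key = b"sixteen byte key"
    msg = [b"identical block!", b"identical block!"]
    ct = cbc_encrypt(msg, key)
    assert ct[1] != ct[2]                        # same plaintext, different ciphertext
    assert cbc_decrypt(ct, key) == msg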

Counter mode (CTR)

Counter mode (CTR) also addresses these problems but in a different way. The ciphertext of each block is a function of its position in the message. Encryption starts with a message counter. The counter is incremented for each block of input. Only the counter is encrypted. The resulting ciphertext is then exclusive-ORed with the corresponding block of plaintext, producing a block of message ciphertext. To decrypt, the receiver does the same thing and needs to know the starting value of the counter as well as the key.

An advantage of CTR mode is that each block has no dependence on other blocks and encryption of multiple blocks can be done in parallel.
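
A sketch of CTR mode with the same kind of stand-in block cipher; note that only the counter ever goes through the cipher, so one function both encrypts and decrypts:

    def toy_encrypt(block, key):
        # Stand-in for a real block cipher such as AES -- NOT secure.
        return bytes((b + key[i % len(key)]) % 256 for i, b in enumerate(block))

    def ctr_crypt(blocks, key, counter=0):
        out = []
        for p in blocks:
            pad = toy_encrypt(counter.to_bytes(16, "big"), key)  # encrypt counter only
            out.append(bytes(x ^ y for x, y in zip(p, pad)))     # XOR pad with data
            counter += 1
        return out

    key = b"sixteen byte key"
    msg = [b"identical block!", b"identical block!"]
    ct = ctr_crypt(msg, key)
    assert ct[0] != ct[1]              # the keystream depends on block position
    assert ctr_crypt(ct, key) == msg   # XOR is symmetric: the same call decrypts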

Cryptanalysis

The goal of cryptanalysis is to break codes. Most often, it is to identify some non-random behavior of an algorithm that will give the analyst an advantage over an exhaustive search of the key space.

Differential cryptanalysis seeks to identify non-random behavior by examining how changes in plaintext input affect changes in the output ciphertext. It tries to find whether certain bit patterns are unlikely for certain keys or whether the change in plaintext results in likely changes in the output.

Linear cryptanalysis tries to create equations that attempt to predict the relationships between ciphertext, plaintext, and the key. An equation will never be equivalent to a cipher but any correlation of bit patterns gives the analyst an advantage.

Neither of these methods will break a code directly but may identify keys or data patterns that are more likely than others. This reduces the number of keys that need to be searched.

Public key cryptography

Public key algorithms, also known as asymmetric ciphers, use one key for encryption and another key for decryption. One of these keys is kept private (known only to the creator) and is known as the private key. The corresponding key is generally made visible to others and is known as the public key.

Anything encrypted with the private key can only be decrypted with the public key. This is the basis for digital signatures. Anything that is encrypted with a public key can be decrypted only with the corresponding private key. This is the basis for authentication and covert communication.

Public and private keys are related but, given one of the keys, there is no feasible way of computing the other. They are based on trapdoor functions: one-way functions whose inverse can be computed only if you have extra data – the other key.

RSA public key cryptography

The RSA algorithm is the most popular algorithm for asymmetric cryptography. Its security is based on the difficulty of finding the factors of the product of two large prime numbers. Unlike symmetric ciphers, RSA encryption is a matter of performing arithmetic on large numbers. It is also a block cipher and plaintext is converted to ciphertext by the formula:

c = m^e mod n

where m is a block of plaintext, e is the encryption key, and n is an agreed-upon modulus that is the product of two primes. To decrypt the ciphertext, you need the decryption key, d:

m = c^d mod n

Given the ciphertext c, e, and n, there is no efficient way to compute the inverse to obtain m. Should an attacker find a way to factor n into its two prime factors, however, the attacker would be able to reconstruct the encryption and decryption keys, e and d.
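
The formulas above, worked with the classic toy numbers p=61, q=53 (real moduli are thousands of bits long):

    # Toy RSA with tiny primes; real keys use primes hundreds of digits long.
    p, q = 61, 53
    n = p * q                  # 3233: the public modulus
    phi = (p - 1) * (q - 1)    # 3120
    e = 17                     # public exponent, chosen coprime to phi
    d = pow(e, -1, phi)        # 2753: private exponent (e*d = 1 mod phi)

    m = 65                     # a block of plaintext, represented as a number < n
    c = pow(m, e, n)           # c = m^e mod n
    assert pow(c, d, n) == m   # m = c^d mod n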

Elliptic curve cryptography (ECC)

Elliptic curve cryptography (ECC) is a more recent public key algorithm that is an alternative to RSA. It is based on finding points along a prescribed elliptic curve, which is an equation of the form:

y^2 = x^3 + ax + b

Contrary to its name, elliptic curves have nothing to do with ellipses or conic sections; their plots look like bumpy lines. With elliptic curves, multiplying a point on a given elliptic curve by a number will produce another point on the curve. However, given that result, it is difficult to find what number was used. The security of ECC rests not on our inability to factor numbers but on our inability to compute discrete logarithms in a finite field.

The RSA algorithm is still the most widely used public key algorithm, but ECC has some advantages:

  • ECC can use far shorter keys for the same degree of security. Security comparable to 256-bit AES encryption requires a 512-bit ECC key but a 15,360-bit RSA key.

  • ECC consumes less CPU time and uses less memory than RSA. It is faster for encryption (including signature generation) than RSA but slower for decryption.

  • Generating ECC keys is faster than RSA (but much slower than AES, where a key is just a random number).

On the downside, ECC is more complex to implement and decryption is slower than with RSA. As a standard, ECC was also tainted because the NSA inserted weaknesses into an ECC-based random number generator (Dual_EC_DRBG) that effectively created a backdoor for decrypting content. This has been remedied and ECC is generally considered the preferred choice over RSA for most applications.

Quantum computing

Quantum computers are a markedly different form of computer. Conventional computers store and process information that is represented in bits, with each bit having a distinct value of 0 or 1. Quantum computers use the principles of quantum mechanics, which include superposition and entanglement. Instead of working with bits, quantum computers operate on qubits, which can hold values of “0” and “1” simultaneously via superposition. The superpositions of qubits can be entangled with other objects so that their final outcomes will be mathematically related. A single operation can be carried out on 2^n values simultaneously, where n is the number of qubits in the computer.

While practical quantum computers don’t exist, it’s predicted that certain problems may be solved exponentially faster than with conventional computers. Shor’s algorithm, for instance, will be able to find the prime factors of large integers and compute discrete logarithms far more efficiently than is currently possible.

So far, quantum computers are very much in their infancy, and it is not clear when – or if – large-scale quantum computers that are capable of solving useful problems will be built. It is unlikely that they will be built in the next several years but we expect that they will be built eventually. Shor’s algorithm will be able to crack public-key based systems such as RSA, Elliptic Curve Cryptography, and Diffie-Hellman key exchange. In 2016, the NSA called for a migration to “post-quantum cryptographic algorithms,” and NIST’s standardization effort has narrowed the submissions down to 26 candidates. The goal is to find useful trapdoor functions that do not rely on multiplying large primes, computing exponents, or any other mechanisms that can be attacked by quantum computation. If you are interested in these, you can read the NSA’s report.

Symmetric cryptosystems, such as AES, are not particularly vulnerable to quantum computing since they rely on moving and flipping bits rather than applying mathematical functions to the data. The best potential attacks come via Grover’s algorithm, which yields only a quadratic rather than an exponential speedup in key searches. This halves the effective key length: for instance, a 128-bit key will have the strength of a 64-bit key on a conventional computer. It is easy enough to use a sufficiently long key (256-bit AES keys are currently recommended) so that quantum computing poses no threat to symmetric algorithms.

Secure communication

Symmetric cryptography

Communicating securely with symmetric cryptography is easy. All communicating parties must share the same secret key. Plaintext is encrypted with the secret key to create ciphertext and then transmitted or stored. It can be decrypted by anyone who has the secret key.

Asymmetric cryptography

Communicating securely with asymmetric cryptography is a bit different. Anything encrypted with one key can be decrypted only by the other related key. For Alice to encrypt a message for Bob, she encrypts it with Bob’s public key. Only Bob has the corresponding key that can decrypt the message: Bob’s private key.

Hybrid cryptography

Asymmetric cryptography alleviates the problem of transmitting a key over an insecure channel. However, it is considerably slower than symmetric cryptography. AES, for example, is approximately 1,500 times faster for decryption than RSA and 40 times faster for encryption. AES is also much faster than ECC. Key generation is also far slower with RSA or ECC than it is with symmetric algorithms, where the key is just a random number rather than a set of carefully chosen numbers with specific properties. Moreover, certain keys with RSA may be weaker than others.

Because of these factors, RSA and ECC are almost never used to encrypt large chunks of information. Instead, it is common to use hybrid cryptography, where a public key algorithm is used to encrypt a randomly-generated key that will encrypt the message with a symmetric algorithm. This randomly-generated key is called a session key, since it is generally used for one communication session and then discarded.

Key Exchange

The biggest problem with symmetric cryptography is key distribution. For Alice and Bob to communicate, they must share a secret key that no adversaries can get. However, Alice cannot send the key to Bob since it would be visible to adversaries. She cannot encrypt it because Alice and Bob do not share a key yet.

Diffie-Hellman key exchange

The Diffie-Hellman key exchange algorithm allows two parties to establish a common key without disclosing any information that would allow any other party to compute the same key. Each party generates a private key and a public key. Despite their name, these are not encryption keys; they are just numbers. Diffie-Hellman does not implement public key cryptography. Alice can compute a common key using her private key and Bob’s public key. Bob can compute the same common key by using his private key and Alice’s public key.

Diffie-Hellman uses the one-way function a^b mod c. Its one-wayness is due to our inability to compute the inverse: a discrete logarithm. Anyone may see Alice and Bob’s public keys but will be unable to compute their common key. Although Diffie-Hellman is not a public key encryption algorithm, it behaves like one in the sense that it allows us to exchange keys without having to use a trusted third party.
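
A toy sketch of the exchange; the modulus 23 and generator 5 are illustrative stand-ins for the large standardized values used in practice:

    import secrets

    p, g = 23, 5                         # public parameters (toy values)

    a = secrets.randbelow(p - 2) + 1     # Alice's private key
    b = secrets.randbelow(p - 2) + 1     # Bob's private key

    A = pow(g, a, p)                     # Alice's public key: g^a mod p
    B = pow(g, b, p)                     # Bob's public key:   g^b mod p

    # Each side combines its own private key with the other's public key;
    # both arrive at g^(ab) mod p without ever transmitting it.
    assert pow(B, a, p) == pow(A, b, p)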

Key exchange using public key cryptography

With public key cryptography, there generally isn’t a need for key exchange. As long as both sides can get each other’s public keys from a trusted source, they can encrypt messages using those keys. However, we rarely use public key cryptography for large messages. It can, however, be used to transmit a session key. This use of public key cryptography to transmit a session key that will be used to apply symmetric cryptography to messages is called hybrid cryptography. For Alice to send a key to Bob:

  1. Alice generates a random session key.
  2. She encrypts it with Bob’s public key & sends it to Bob.
  3. Bob decrypts the message using his private key and now has the session key.

Bob is the only one who has Bob’s private key and hence the only one who can decrypt that message and extract the session key. The problem is that anybody can carry out this protocol: Charles can generate a random session key, encrypt it with Bob’s public key, and send it to Bob. For Bob to be convinced that it came from Alice, she can encrypt it with her private key (this is signing the message).

  1. Alice generates a random session key.
  2. She signs it by encrypting the key with her private key.
  3. She encrypts the result with Bob’s public key & sends it to Bob.
  4. Bob decrypts the message using his private key.
  5. Bob decrypts the resulting message with Alice’s public key and gets the session key.

If anybody other than Alice created the message, the result that Bob gets by decrypting it with Alice’s public key will not be a valid key. We can enhance the protocol by using a standalone signature (encrypted hash) so Bob can distinguish a valid key from a bogus one.

Forward secrecy

If an attacker steals, for example, Bob’s private key, he will be able to go through old messages and decrypt old session keys (the start of every message to Bob contained a session key encrypted with his public key). Forward secrecy, also called perfect forward secrecy, is the use of keys and key exchange protocols where the compromise of a key does not compromise past session keys. There is no secret that one can steal that will allow the attacker to decrypt multiple past messages. Note that this is of value for communication sessions but not stored encrypted documents (such as email). You don’t want an attacker to gain any information from a communication session even if a user’s key is compromised. However, the user needs to be able to decrypt her own documents, so they need to rely on a long-term key.

Diffie-Hellman enables forward secrecy. Alice and Bob can each generate a key pair and send their public key to each other. They can then compute a common key that nobody else will know and use that to communicate. Achieving forward secrecy requires single-use (ephemeral) keys. Next time Alice and Bob want to communicate, they will generate a new set of keys and compute a new common key. At no time do we rely on long-term keys, such as Alice’s secret key or RSA private key. Encrypting a session key with a long-term key, such as Bob’s public key, will not achieve forward secrecy. If an attacker ever finds Bob’s private key, she will be able to extract the session key.

Diffie-Hellman is particularly good for achieving forward secrecy because it is efficient to create new key pairs on the fly. RSA or ECC keys can be used as well but key generation is far less efficient. Because of this, RSA and ECC keys tend to be used mainly as long-term keys (e.g., for authentication).

Message Integrity

One-way functions

A one-way function is one that can be computed relatively easily in one direction but there is no known way of computing the inverse function. One-way functions are crucial in a number of cryptographic algorithms, including digital signatures, Diffie-Hellman key exchange, and both RSA and elliptic curve public key cryptography. For Diffie-Hellman and public key cryptography, they ensure that someone cannot generate the corresponding private key when presented with a public key. Key exchange and asymmetric cryptography algorithms rely on a special form of one-way function, called a trapdoor function. This is a function whose inverse is computable if you are provided with extra information, such as a private key that corresponds to the public key that was used to generate the data.

Hash functions

A particularly useful form of a one-way function is the cryptographic hash function. This is a one-way function whose output is always a fixed number of bits for any input. Hash functions are commonly used in programming to construct hash tables, which provide O(1) lookups of keys.

Cryptographic hash functions produce far longer results than those used for hash tables. Common lengths are 224, 256, 384, or 512 bits. Good cryptographic hash functions (e.g., SHA-1, SHA-2, SHA-3) have several properties:

  1. Like all hash functions, they take arbitrary-length input and produce fixed-length output.

  2. Also like all hash functions, they are deterministic; they produce the same result each time when given identical input.

  3. They exhibit pre-image resistance, or hiding. Given a hash H, it should not be feasible to find a message M where H=hash(M).

  4. The output of a hash function should not give any information about any of the input. For example, changing a byte in the message should not cause any predictable change in the hash value.

  5. They are collision resistant. While hash collisions can exist (the number of possible hashes is smaller than the number of possible messages; see the pigeonhole principle), it is not feasible to find any two different messages that hash to the same value. Similarly, it is not feasible to modify the plaintext without changing its resultant hash.

  6. They should be relatively efficient to compute. We would like to use hash functions as message integrity checks and generate them for each message without incurring significant overhead.

The cryptographic hash function is the basis for message authentication codes and digital signatures.

Because of these properties, we have extremely high assurance that a message would no longer hash to the same value if it is modified in any way. The holy grail for an attacker is to be able to construct a message that hashes to the same value as another message. That would allow the attacker to substitute a new message for some original one (for example, redirecting a money transfer). Searching for a collision with a pre-image (known message) is much harder than searching for any two messages that produce the same hash. The birthday paradox tells us that the search for a collision of any two messages is approximately the square root of the complexity of searching for a collision on a specific message. This means that the strength of a hash function for a brute-force collision attack is approximately half the number of bits of the hash. A 256-bit hash function has a strength of approximately 128 bits.

Popular hash functions include SHA-1 (160 bits), SHA-2 (commonly 256 and 512 bits), and SHA-3 (256 and 512 bits).
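
These properties are easy to observe with Python’s hashlib; changing one character of input produces an unrelated digest of the same fixed length:

    import hashlib

    h1 = hashlib.sha256(b"message integrity").hexdigest()
    h2 = hashlib.sha256(b"messagf integrity").hexdigest()   # one character changed

    print(h1)    # 64 hex digits (256 bits), regardless of input length
    print(h2)    # a completely different digest

    # Avalanche effect: roughly half of the 256 output bits differ.
    print(bin(int(h1, 16) ^ int(h2, 16)).count("1"))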

Message Authentication Codes (MACs)

A cryptographic hash helps us ensure message integrity: it serves as a checksum that allows us to determine if a message has been modified. If the message is modified, it no longer hashes to the same value as before. However, if an attacker modifies a message, she may be able to modify the hash value as well. To prevent this, we need a hash that relies on a key for validation. This is a message authentication code, or MAC. Two forms of MACs are hash-based ones and block cipher-based ones:

Hash-based MAC (HMAC):
A hash-based MAC is a specific method for converting regular hash functions into MACs by using a cryptographic hash function, such as SHA-256, to hash the message and the key. Anyone who does not know the key will not be able to recreate the hash. (A short example follows this list.)
Block cipher-based MAC (CBC-MAC):
Recall that cipher block chaining assures us that every encrypted block is a function of all previous blocks. CBC-MAC uses a zero initialization vector and runs through a cipher block chained encryption, discarding all output blocks except for the last one, which becomes the MAC. Any changes to the message will be propagated to that final block and the same encryption cannot be performed by someone without the key. Note that a CBC-MAC still produces a fixed-length result and has all the properties of a hash function.
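
The HMAC construction described above is available directly in Python’s standard library (the key and message here are made up for illustration):

    import hashlib
    import hmac

    key = b"shared secret key"
    message = b"transfer $100 to alice"

    tag = hmac.new(key, message, hashlib.sha256).hexdigest()

    # The verifier recomputes the tag with the shared key; compare_digest
    # does a constant-time comparison to avoid timing side channels.
    check = hmac.new(key, message, hashlib.sha256).hexdigest()
    assert hmac.compare_digest(tag, check)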

Digital signatures

Message authentication codes rely on a shared key. Anybody who possesses the key can modify and re-sign a message. There is no assurance that the action was done by the author of the message. Digital signatures have stronger properties than MACs:

  1. Only you can sign a message but anybody should be able to validate it.
  2. You cannot copy the signature from one message and have it be valid on another message.
  3. An adversary cannot forge a signature, even after inspecting an arbitrary number of signed messages.

Digital signatures require three operations:

  1. Key generation: {private_key, verification_key} := gen_keys(keysize)
  2. Signing: signature := sign(message, private_key)
  3. Validation: isvalid := verify(message, signature, verification_key)

Since we trust hashes to be collision-free, it makes sense to apply the signature to the hash of a message instead of the message itself. This ensures that the signature will be small and of fixed size, makes it easy to embed in hash pointers and other structures, and creates minimal transmission or storage overhead for verification.

There are several commonly used digital signature algorithms:

DSA, the Digital Signature Algorithm
The current NIST standard that generates key pairs that are secure because of the difficulty of computing discrete logarithms.
ECDSA, Elliptic Curve Digital Signature Algorithm
A variant of DSA that uses elliptic curve cryptography
Public key cryptographic algorithms
RSA or Elliptic Curve Cryptography applied to message hashes.

All these algorithms generate public and private key pairs. The first two are not general-purpose encryption algorithms but are designed solely for digital signatures.

We saw how public key cryptography can be used to encrypt messages: Alice encrypts a message using Bob’s public key to ensure that only Bob could decrypt it with his private key. We can also do the reverse: Alice can encrypt a message using her private key. Anyone can decrypt the message using her public key but, in doing so, would know that the message was encrypted by Alice.

A digital signature can be constructed by simply encrypting the hash of a message with the creator’s (signer’s) private key. Alternatively, digital signature algorithms have been created that apply a similar principle: hashing combined with trapdoor functions so that you would use a dedicated set of public/private keys to create and verify the signature. Anyone with the message signer’s public key can decrypt the hash and thus validate the hash against the message. Other parties cannot recreate the signature.
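
A sketch of sign-by-encrypting-the-hash using the toy RSA numbers from earlier; real signature algorithms add padding and other safeguards, so treat this only as an illustration of the principle:

    import hashlib

    n, e, d = 3233, 17, 2753   # toy RSA modulus, public exponent, private exponent

    def sign(message):
        # Hash the message, shrink the hash to fit the toy modulus, then
        # "encrypt" it with the private exponent.
        h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
        return pow(h, d, n)

    def verify(message, signature):
        # "Decrypt" the signature with the public exponent and compare hashes.
        h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
        return pow(signature, e, n) == h

    sig = sign(b"pay bob $5")
    print(verify(b"pay bob $5", sig))     # True
    print(verify(b"pay bob $500", sig))   # False: the hash no longer matches
                                          # (toy-sized modulus, so illustrative only)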

Note that, with a MAC, the recipient or anyone possessing the shared key can create the same MAC. With a digital signature, the signature can only be created by the owner of the private key. Unlike MACs, digital signatures provide non-repudiation – proof of identity. Alice cannot claim that she did not create a signature because nobody but Alice has her private key. Also unlike MACs, anyone can validate a signature since public keys are generally freely distributed. As with MACs, digital signatures also provide proof of integrity, assurance that the original message has not been modified.

Covert and authenticated messaging

We ignored the encryption of a message in the preceding discussion; our interest was assuring integrity. However, there are times when we may want to keep the message secret and validate that it has not been modified. Doing this involves sending a signature of the message along with the encrypted message.

A basic way for Alice to send a signed and encrypted message to Bob is for her to use hybrid cryptography and:

  1. Create a signature of the message. This is a hash of the message encrypted with her private key.
  2. Create a session key for encrypting the message. This is a throw-away key that will not be needed beyond the communication session.
  3. Encrypt the message using the session key. She will use a fast symmetric algorithm to encrypt this message.
  4. Package up the session key for Bob: she encrypts it with Bob’s public key. Since only Bob has the corresponding private key, only Bob will be able to decrypt the session key.
  5. She sends Bob: the encrypted message, encrypted session key, and signature.

Anonymous identities

A signature verification key (e.g., a public key) can be treated as an identity. You possess the corresponding private key and therefore only you can create valid signatures that can be verified with the public key. This identity is anonymous; it is just a bunch of bits. There is nothing that identifies you as the holder of the key. You can simply assert your identity by being the sole person who can generate valid signatures.

Since you can generate an arbitrary number of key pairs, you can create a new identity at any time and create as many different identities as you want. When you no longer need an identity, you can discard your private key for that corresponding public key.

Identity binding: digital certificates

While public keys provide a mechanism for asserting integrity via digital signatures, they are themselves anonymous. We’ve discussed a scenario where Alice uses Bob’s public key but never explained how she can assert that the key really belongs to Bob and was not planted by an adversary. Some form of identity binding of the public key must be implemented for you to know that you really have my public key instead of someone else’s. How does Alice really know that she has Bob’s public key?

X.509 digital certificates provide a way to do this. A certificate is a data structure that contains user information (called a distinguished name) and the user’s public key. This data structure also contains a signature of the certification authority. The signature is created by taking a hash of the rest of the data in the structure and encrypting it with the private key of the certification authority. The certification authority (CA) is responsible for setting policies of how they validate the identity of the person who presents the public key for encapsulation in a certificate.

To validate a certificate, you would hash all the certificate data except for the signature. Then you would decrypt the signature using the public key of the issuer. If the two values match, then you know that the certificate data has not been modified since it has been signed. The challenge is how to get the public key of the issuer. Public keys are stored in certificates, so the issuer would have a certificate containing its public key. This certificate can be signed by yet another issuer. This kind of process is called certificate chaining. For example, Alice can have a certificate issued by the Rutgers CS Department. The Rutgers CS Department’s certificate may be issued by Rutgers University. Rutgers University’s certificate could be issued by the State of New Jersey Certification Authority, and so on. At the very top level, we will have a certificate that is not signed by any higher-level certification authority. A certification authority that is not underneath any other CA is called a root CA. In practice, this type of chaining is rarely used. More commonly, there are hundreds of autonomous certification authorities acting as root CAs that issue certificates to companies, users, and services. The certificates for many of the trusted root CAs are preloaded into operating systems or, in some cases, browsers.

Every certificate has an expiration time (often a year or more in the future). This provides some assurance that even if there is a concerted attack to find a corresponding private key to the public key in the certificate, such a key will not be found until long after the certificate expires. There might be cases where a private key might be leaked or the owner may no longer be trustworthy (for example, an employee leaves a company). In this case, a certificate can be revoked. Each CA publishes a certificate revocation list, or CRL, containing lists of certificates that they have previously issued that should no longer be considered valid. To prevent spoofing the CRL, the list is, of course, signed by the CA. Each certificate contains information on where to obtain revocation information.

The challenge with CRLs is that not everyone may check the certificate revocation list in a timely manner and some systems may accept a certificate not knowing that it was revoked. Some systems, particularly embedded systems, may not even be configured to handle CRLs.

Code signing - protecting code integrity

We have seen how we could use hash functions for message integrity in the form of MACs (message authentication codes, which use a shared key) and digital signatures (which use public and private keys). The same mechanism is employed to sign software: to validate that software has not been modified since it was created by the developer.

The advantages of signing code are that the software can be downloaded from untrusted servers or distributed over untrusted channels and still be validated to be untampered. It also enables us to detect whether malware on our local system has modified the software.

Microsoft Windows, Apple macOS, iOS, and Android all make extensive use of signed software. Signing an application is fundamentally no different than signing any other digital content:

  1. As a software publisher, you create a public/private key pair
  2. You obtain a digital certificate for the public key. In some cases, you need to obtain it from a certification authority (CA) that can certify you as a software publisher.
  3. You create a digital signature of the software that you’re distributing: generate a hash and encrypt it with your private key.
  4. Attach the signature and certificate to the software package. This will enable others to validate the signature.

Prior to installation, the system will validate the certificate and then validate the signature. If the signature does not match the hash of the software package, that indicates that the software has been altered. Signed software usually also supports per-page hashes. Recall demand paging in operating systems: an operating system does not load a program into memory all at once; it loads chunks (pages) only as they are needed. Signed software will often include hashes for each page (typically 4K bytes) and each page will be validated as it is loaded into memory. This avoids the overhead of validating the entire file prior to running each program (e.g., the executable for Adobe Photoshop is over 100 MB) but still enables checking that the contents have not been tampered with on the computer even after the program was installed.
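
A sketch of the per-page hashing idea using hashlib; the 4 KB page size and the dummy executable bytes are assumptions for illustration:

    import hashlib

    PAGE = 4096   # assumed page size in bytes

    def page_hashes(executable):
        # One hash per page; the loader can verify each page as it is
        # demand-paged in, instead of hashing the whole file up front.
        return [hashlib.sha256(executable[i:i + PAGE]).digest()
                for i in range(0, len(executable), PAGE)]

    program = bytes(20_000)            # dummy stand-in for an executable image
    print(len(page_hashes(program)))   # 5 hashes for a 20,000-byte file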

Authentication

Authentication is the process of binding an identity to a user. Note the distinction between authentication and identification. Identification is simply the process of asking you to identify yourself (for example, ask for a login name). Authentication is the process of proving that the identification is correct. Authorization is the process of determining whether the user is permitted to do something.

Authentication factors

The three factors of authentication are:

  1. something you have (such as a key or a card),
  2. something you know (such as a password or PIN),
  3. and something you are (biometrics).

Combining these into a multi-factor authentication scheme can increase security against the chance that any one of the factors is compromised. Multi-factor authentication must use two or more of these factors. Using two passwords, for example, is not sufficient and does not qualify as multi-factor.

Combined authentication and key exchange

Key exchange and authentication using a trusted third party

For two parties to communicate using symmetric ciphers they need to share the same key. The ways of doing this are:

  1. Share the key via some trusted mechanism outside of the network, such as reading it over the phone or sending a flash drive via FedEx.

  2. Send the key using a public key algorithm.

  3. Use a trusted third party.

We will first examine the use of a trusted third party. A trusted third party is a trusted system that has everyone’s key. Hence, only Alice and the trusted party (whom we will call Trent) have Alice’s secret key. Only Bob and Trent have Bob’s secret key.

The simplest way of using a trusted third party is to ask it to come up with a session key and send it to the parties that wish to communicate. For example, Alice sends a message to Trent requesting a session key to communicate with Bob. This message is encrypted with Alice’s secret key so that Trent knows the message could have only come from Alice.

Trent generates a random session key and encrypts it with Alice’s secret key. He also encrypts the same key with Bob’s secret key. Alice gets both keys and passes the one encrypted for Bob to Bob. Now Alice and Bob have a session key that was encrypted with each of their secret keys and they can communicate by encrypting messages with that session key.

This simple scheme is vulnerable to replay attacks. An eavesdropper, Eve, can record messages from Alice to Bob and replay them at a later time. Eve might not be able to decode the messages but she can confuse Bob by sending him seemingly valid encrypted messages.

The second problem is that Bob receives an encrypted session key from Alice but has no idea who is requesting to communicate with him. While Trent authenticated Alice (simply by being able to decrypt her request) and authorized her to talk with Bob (by generating the session key), that information has not been conveyed to Bob.

Needham-Schroeder: nonces

The Needham-Schroeder protocol improves the basic key exchange protocol by adding nonces to messages. A nonce is simply a random string – a random bunch of bits. Alice sends a request to Trent, asking to talk to Bob. This time, the request doesn’t even have to be encrypted. As part of the request, she sends a nonce.

Trent responds with a message that contains:

  • Alice’s ID
  • Bob’s ID
  • the nonce
  • the session key
  • a ticket: a message encrypted for Bob containing Alice’s ID and the same session key

This entire message from Trent is encrypted with Alice’s secret key. Alice can validate that the message is a response to her message because:

  • It is encrypted for her: nobody but Alice and Trent has Alice’s secret key.
  • It contains the same nonce as in her request, so it is not a replay of some earlier message, which would have had a different randomly-generated nonce.

Alice sends the ticket (the message encrypted with Bob’s key) to Bob. He can decrypt it and knows:

  • The message must have been generated by Trent since only Trent and Bob know Bob’s key and thus could construct a meaningful message encrypted with it.
  • He will be communicating with Alice because Trent placed Alice’s ID in that ticket.
  • He has the session key because Trent placed it in the ticket as well; Alice has the same key.

Bob can now communicate with Alice but he will first authenticate Alice to be sure that he’s really communicating with her. He’ll believe it’s Alice if she can prove that she has the session key. To do this, Bob creates another nonce, encrypts it with the session key, and sends it to Alice. Alice decrypts the message, subtracts one from the nonce, encrypts the result, and sends it back to Bob. She just demonstrated that she could decrypt a message using the session key and return back a known modification of the message. Needham-Schroeder is a combined authentication and key exchange protocol.
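
The exchange is easy to see end to end in code. Below is a toy, single-process run of the protocol, using the Fernet symmetric cipher from the pyca/cryptography package to stand in for shared-key encryption; the message layout and field names are made up for illustration.

    import json, secrets
    from cryptography.fernet import Fernet

    alice_key, bob_key = Fernet.generate_key(), Fernet.generate_key()  # Trent knows both

    # 1. Alice -> Trent: request to talk to Bob, with a fresh nonce (in the clear)
    nonce_a = secrets.randbits(64)

    # 2. Trent -> Alice: {nonce, Bob's ID, session key, ticket} encrypted for
    #    Alice, where ticket = {Alice's ID, session key} encrypted for Bob
    session_key = Fernet.generate_key()
    ticket = Fernet(bob_key).encrypt(
        json.dumps({"from": "alice", "key": session_key.decode()}).encode())
    reply = Fernet(alice_key).encrypt(
        json.dumps({"nonce": nonce_a, "to": "bob", "key": session_key.decode(),
                    "ticket": ticket.decode()}).encode())

    # 3. Alice decrypts the reply, checks her nonce, and forwards the ticket to Bob
    msg = json.loads(Fernet(alice_key).decrypt(reply))
    assert msg["nonce"] == nonce_a        # fresh response, not a replay
    alice_session = Fernet(msg["key"].encode())

    opened = json.loads(Fernet(bob_key).decrypt(msg["ticket"].encode()))
    bob_session = Fernet(opened["key"].encode())   # Bob: the ticket names "alice"

    # 4. Bob challenges Alice with a new nonce under the session key;
    #    Alice proves knowledge of the key by returning nonce - 1
    nonce_b = secrets.randbits(64)
    challenge = bob_session.encrypt(str(nonce_b).encode())
    response = alice_session.encrypt(
        str(int(alice_session.decrypt(challenge)) - 1).encode())
    assert int(bob_session.decrypt(response)) == nonce_b - 1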

Denning-Sacco modification: timestamps to avoid key replay

One flaw in the Needham-Schroeder algorithm arises when Alice sends the ticket to Bob. The ticket is encrypted with Bob’s secret key and contains Alice’s ID as well as the session key. If an attacker records a communication session and manages to crack the session key, she can replay the transmission of the ticket to Bob. Bob won’t know that he received that same session key in the past. He will proceed to validate “Alice” by asking her to prove that she indeed knows the session key. In this case Eve, our eavesdropper, does know it; that’s why she replayed the ticket to Bob. Bob completes the authentication and thinks he is talking with Alice when in reality he is talking to Eve.

A fix for this was proposed by Denning & Sacco: add a timestamp to the ticket. When Trent creates the ticket that Alice will give to Bob, it is a message encrypted for Bob and contains Alice’s ID, the session key, and a timestamp.

When Bob receives a ticket, he checks the timestamp. If it is older than some recent time (e.g., a few seconds), Bob will simply discard the ticket, assuming that he is getting a replay attack.

Otway-Rees protocol: session IDs instead of timestamps

A problem with timestamps is that their use relies on all entities having synchronized clocks. If Bob’s clock is significantly off from Trent’s, he may falsely accept or falsely reject a ticket that Alice presents to him. Time synchronization becomes an attack vector for this protocol. If an attacker can change Bob’s concept of time, she may be able to convince Bob to accept an older ticket. To do this, she can create fake NTP (network time protocol) responses to force Bob’s clock to synchronize to a different value or, if Bob is paranoid and uses a GPS receiver to synchronize time, create fake GPS signals.

A way to avoid the replay of the ticket without using timestamps is to add a session ID to each message. The rest of the Otway-Rees protocol differs a bit from Needham-Schroeder but is conceptually very similar.

  1. Alice sends a message to Bob that contains:

    • A session ID
    • Both of their IDs
    • A message encrypted with Alice’s secret key. This encrypted message contains Alice and Bob’s IDs as well as the session ID.
  2. Bob sends Trent a request to communicate with Alice, containing:

    • Alice’s message
    • A message encrypted with Bob’s secret key that also contains the session ID.
  3. Trent now knows that Alice wants to talk to Bob since the session ID is inside her encrypted message and that Bob agrees to talk to Alice since that same session ID is inside his encrypted message.

  4. Trent creates a random session key, encrypts it for Bob, encrypts the same key for Alice, and sends both of those to Bob along with the session ID. Bob keeps his copy and passes the key encrypted for Alice on to her.

The protocol also incorporates nonces to ensure that there is no replay attack on Trent’s response even if an attacker sends a message to Bob with a new session ID and old encrypted session keys (that were cracked by the attacker).

Kerberos

Kerberos is a trusted third-party authentication, authorization, and key exchange protocol using symmetric cryptography and based closely on the Needham-Schroeder protocol with the Denning-Sacco modification (the use of timestamps).

When Alice wants to talk with Bob (they can be users or services), she first needs to ask Kerberos. If access is authorized, Kerberos will send her two messages. One is encrypted with Alice’s secret key and contains the session key for her communication with Bob. The other message is encrypted with Bob’s secret key. Alice cannot read or decode this second message. It is called a ticket (sometimes known as a sealed envelope) and contains the same session key that Alice received, but encrypted for Bob. Alice sends the ticket to Bob. When Bob decrypts it, he knows that the message must have been generated by an entity that knows his secret key: Kerberos. Now that Alice and Bob both have the session key, they can communicate securely by encrypting all traffic with that session key.

To avoid replay attacks, Kerberos places a timestamp in Alice’s response and in the ticket. For Alice to authenticate herself to Bob, she needs to prove that she could extract the session key from the encrypted message Kerberos sent her. She proves this by generating a new timestamp, encrypting it with the session key, and sending it to Bob. Bob now needs to prove to Alice that he can decode messages encrypted with the session key. He takes Alice’s timestamp, adds one (just to permute the value), and sends it back to Alice, encrypted with their session key.

Since your secret key is needed to decrypt every service request you make of Kerberos, you’ll end up typing your password each time you want to access a service. Storing the key in a file to cache it is not a good idea. Kerberos handles this by splitting itself into two components that run the same protocol: the authentication server (AS) and the ticket granting server (TGS). The authentication server handles the initial user request and provides a session key to access the TGS. This session key can be cached for the user’s login session and allows the user to send requests to the TGS without re-entering a password. The TGS is the part of Kerberos that handles requests for services. It also returns two messages to the user: a different session key for the desired service and a ticket that must be provided to that service.

Public key authentication

Public key authentication relies on the use of nonces, similar to the way they were used in the Needham-Schroeder protocol. A nonce is generated on the fly and presented to the other party as a challenge: the other party must prove that it is capable of encrypting the nonce with a specific key that it possesses. The use of a nonce is central to public key authentication.

If Alice wants to authenticate Bob, she needs to have Bob prove that he possesses his private key (private keys are never shared). To do this, Alice generates a nonce (a random bunch of bits) and sends it to Bob, asking him to encrypt it with his private key. If she can decrypt Bob’s response using Bob’s public key and sees the same nonce, she will be convinced that she is talking to Bob because nobody else will have Bob’s private key. Mutual authentication requires that each party authenticate itself to the other: Bob will also have to generate a nonce and ask Alice to encrypt it with her private key.
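
A sketch of this challenge–response using the pyca/cryptography package. Signing plays the role of “encrypting the nonce with the private key” described above; the keys are generated in place for the demonstration, whereas in practice Alice would obtain Bob’s public key from a certificate.

    import secrets
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    bob_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    bob_pub = bob_key.public_key()      # assume Alice already has this

    nonce = secrets.token_bytes(16)     # Alice's challenge to Bob
    proof = bob_key.sign(nonce, padding.PKCS1v15(), hashes.SHA256())

    try:
        bob_pub.verify(proof, nonce, padding.PKCS1v15(), hashes.SHA256())
        print("peer demonstrated possession of Bob's private key")
    except InvalidSignature:
        print("authentication failed")

Mutual authentication simply runs the same exchange in the other direction with Alice’s key pair.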

User interaction

In the next family of protocols, we will look at mechanisms that involve user interaction.

Password Authentication Protocol

The classic authentication method is the use of reusable passwords. This is known as the password authentication protocol, or PAP. The system asks you to identify yourself (login name) and then enter a password. If the password matches the one associated with that login name on the system, you’re authenticated.

Password guessing defenses

To avoid having an adversary carry out a password-guessing attack, we need to make it infeasible to try a large number of passwords. A common approach is to rate-limit guesses: when the system detects an incorrect password, it waits several seconds before allowing the user to try again. Linux, for example, waits about three seconds. After five bad guesses, it terminates and restarts the login process.

Another approach is to completely disallow password guessing after a certain number of failed attempts by locking the account. This is common for some web-based services, such as banks. However, the system has now been made vulnerable to a denial-of-service attack. An attacker may not be able to take your money but may inconvenience you by preventing you from accessing it.

Hashed passwords

One problem with the password authentication protocol is that if someone gets hold of the password file on the system, then they have all the passwords. The common way to thwart this is to store hashes of passwords instead of the passwords themselves. This takes advantage of the one-way property of the hash: anyone who sees the hash still has no way of computing your password.

To authenticate a user, the system simply checks whether hash(password) = stored_hashed_password. If someone gets hold of the password file, they’re still stuck, since they won’t be able to reconstruct the original password from the hash. They’ll have to resort to an exhaustive search (also known as a brute-force search) to find a password that hashes to the value in the file. The hashed file should still be protected from read access by normal users to keep them from performing such a search.

A dictionary attack is an optimization of the search that tests common passwords, including dictionary words, known common passwords, and common letter-number substitutions rather than every possible combination of characters. Moreover, an intruder does not need to perform such search on each hashed password to find the password. Instead, the results of a dictionary search can be stored in a file and later searched to find a corresponding hash in a password file. These are called precomputed hashes. To guard against this, a password is concatenated with a bunch of extra random characters, called salt. These characters make the password substantially longer and would make a table of precomputed hashes insanely huge and impractical. Such a table would need to go far beyond a dictionary list and create hashes of all possible - and long - passwords. The salt is not a secret – it is stored in plaintext in the password file in order to validate a user’s password. Its only function is to make using precomputed hashes impractical and ensure that even identical passwords do not generate the same hashed results. An intruder would have to select one specific hashed password and do a brute-force or dictionary attack on just that password, adding salt to each guess prior to hashing it.
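
A minimal sketch of salted password hashing with Python’s standard library. The text above describes hashing the password together with the salt; production systems use a deliberately slow construction (PBKDF2, bcrypt, scrypt, Argon2), so PBKDF2 is shown here since it ships with Python. The iteration count and record layout are illustrative.

    import hashlib, hmac, os

    def store(password, iterations=200_000):
        salt = os.urandom(16)             # random salt, stored in plaintext
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
        return salt, iterations, digest   # what the password file keeps

    def check(password, record):
        salt, iterations, digest = record
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
        return hmac.compare_digest(candidate, digest)   # constant-time compare

    record = store("correct horse battery staple")
    assert check("correct horse battery staple", record)
    assert not check("123456", record)

Because each user gets a different random salt, identical passwords produce different stored digests, and a precomputed table would have to cover every possible salt value.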

Spraying and Stuffing attacks

Password Spraying:
Password spraying is an attack where an attacker tries a small number of common passwords (e.g., “123456” or “password”) across a large number of accounts. Unlike brute force attacks that target a single account with multiple passwords, password spraying avoids detection by limiting the number of attempts per account, making it harder for security systems to flag the activity.
Credential Stuffing:
Credential stuffing is an attack where attackers use a large list of previously leaked or stolen username and password combinations to try logging into various systems. Since many users reuse passwords across multiple services, attackers rely on these credentials working for other accounts. Automated tools are often used to test the credentials on multiple platforms quickly.

Password recovery options

Passwords are bad. They are not particularly secure. English text has low entropy (approximately 1.2–1.5 bits per character), and passwords are often easy to guess. Password files from some high-profile sites have been leaked, showing just how bad many people are at picking passwords. Over 90% of all user passwords sampled are on a list of the top 1,000 passwords. The most common password is password. People also tend to reuse passwords. If an attacker can get passwords from one place, there is a good chance that many will work with other services.

Despite many people picking bad passwords, people often forget them, especially when they are trying to be good and use different passwords for different accounts. There are several common ways of handling forgotten passwords, none of them great:

Email them:
This used to be a common solution and should be dying off. It requires that the server stores the password, which means it is not stored as a hash. This exposes the risk that anyone seeing your email will see your password.
Reset them:
This is more common but requires authenticating the requestor to avoid a denial of service attack. The common thing to do is to send a password reset link to an email address entered when the account was created. We again have the problem that if someone has access to your mail, they will have access to the password reset link and can create a new password for your account. In both these cases, we have the problem that users may no longer have the same email address. Think of the people who switched from Comcast to get Verizon FiOS and switched their comcast.net addresses to verizon.net (note: avoid using email addresses tied to services or locations that you might change).
Provide hints:
This is common for system logins (e.g. macOS and Windows). However, a good hint may weaken the password or may not help the user.
Ask questions:
It is common for sites to ask questions (“what was your favorite pet’s name?”, “what street did you live on when you were eight years old?”). The answers to many of these questions can often be found through some searching or via social engineering. A more clever thing is to have unpredictable answers (“what was your favorite pet’s name?” “Osnu7$Qbv999”) but that requires storing answers somewhere.
Rely on users to write them down:
This is fine as long as the threat model is electronic-only and you don’t worry about someone physically searching for your passwords.

One-time Passwords

The other problem with reusable passwords is that if a network is insecure, an eavesdropper may sniff the password from the network. A potential intruder may also simply observe the user typing a password. To thwart this, we can turn to one-time passwords. If someone sees you type a password or gets it from the network stream, it won’t matter because that password will be useless for future logins.

There are three forms of one-time passwords:

  1. Sequence-based. Each password is a function of the previous password. S/Key is an example of this.

  2. Challenge-based. A password is a function of a challenge provided by the server. CHAP is an example of this.

  3. Time-based. Each password is a function of the time. TOTP and RSA’s SecurID are examples of this.

Sequence-based: S/Key

S/Key authentication allows the use of one-time passwords by generating a list of passwords with a one-way function: the list is created such that password n is generated as f(password[n-1]), where f is a one-way function. The list of passwords is used backwards. Given a password password[p], an observer cannot compute the next valid password, password[p-1], because that would require computing the inverse of the one-way function, f⁻¹(password[p]), which is infeasible.
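
A toy version of the scheme using Python’s standard library; the seed, chain length, and storage format are made up for the illustration. The server keeps only the last element of the chain, and each login presents the predecessor of whatever value the server currently holds.

    import hashlib

    def skey_chain(seed, n):
        """Build the chain: password[i] = f(password[i-1]), with f = SHA-256."""
        chain, h = [], seed
        for _ in range(n):
            h = hashlib.sha256(h).digest()
            chain.append(h)
        return chain

    chain = skey_chain(b"initial secret", 5)
    stored = chain[-1]                  # the only value the server stores

    for otp in reversed(chain[:-1]):    # passwords are used in reverse order
        assert hashlib.sha256(otp).digest() == stored   # server: f(otp) == stored?
        stored = otp                    # accepted; server now stores this value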

Challenge-based: CHAP

The Challenge-Handshake Authentication Protocol (CHAP) is an authentication protocol that allows a server to authenticate a user without sending a password over the network.

Both the client and server share a secret (essentially a password). A server creates a random bunch of bits (called a nonce) and sends it to the client (user) that wants to authenticate. This is the challenge.

The client identifies itself and sends a response that is the hash of the shared secret combined with the challenge. The server has the same data and can generate its own hash of the same challenge and secret. If the hash matches the one received from the client, the server is convinced that the client knows the shared secret and is therefore legitimate.

An intruder that sees this hash cannot extract the original data. An intruder who sees the challenge cannot create a suitable hashed response without knowing the secret. Note that this technique requires passwords to be accessible at the server and the security rests on the password file remaining secure.
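
A sketch of the exchange with Python’s standard library. The hash construction (SHA-256 over the secret concatenated with the challenge) and the secret are illustrative; the actual CHAP specification uses MD5 over an identifier, the secret, and the challenge.

    import hashlib, hmac, os

    shared_secret = b"per-user secret known to both sides"

    challenge = os.urandom(16)          # server -> client: random nonce

    # client -> server: hash of the shared secret combined with the challenge
    response = hashlib.sha256(shared_secret + challenge).digest()

    # server recomputes the same hash and compares
    expected = hashlib.sha256(shared_secret + challenge).digest()
    print("authenticated" if hmac.compare_digest(response, expected) else "rejected")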

Challenge-based: Passkeys

Passkey authentication is an implementation of public key authentication that is designed to eliminate the use of passwords. A user first logs onto a service via whatever legacy login protocol the service supports: typically a username-password or the additional use of a time-based one-time password or SMS authentication code. After that, the user’s device generates a public-private key pair for that specific service. The public key is sent to the service and associated with the user, much like a password was in the past. Note that the public key is not secret. The private key is stored on the user’s device.

Once passkey authentication is set up, the user logs in by providing their user name. The server generates a random challenge string (at least 16 bytes long) and sends it to the user. The user’s device retrieves the private key for the desired service. This is generally stored securely on the device and unlocked via Face ID, Touch ID, or a local password. None of this information, including the private key, is sent to the server. Using the private key, the device creates a digital signature for the challenge provided by the service and sends the result to the service.

The server looks up the user’s public key, which was registered during enrollment, and verifies the signature against the challenge (that is, decrypts the data sent by the user and sees if it matches a hash of the original challenge string). If the signature is valid, the service is convinced that the other side holds a valid private key that corresponds to the public key that is associated with the user and is, therefore, the legitimate user.

Time-based: TOTP

With the Time-based One Time Password (TOTP) protocol, both sides share a secret key. To authenticate, a user runs the TOTP function to create a one-time password. The TOTP function is a hash:

password := hash(secret_key, time) % 10^password_length

The resultant hash is taken modulo some number that determines the length of the password. A time window of 30 seconds is usually used to provide a reasonably coarse granularity of time that doesn’t put too much stress on the user or requirements for tight clock synchronization. The service, which also knows the secret key and time, can generate the same hash and hence validate the value presented by the user.

TOTP is often used as a second factor (proof that you have some device with the secret configured in it) in addition to a password. The protocol is widely supported by companies such as Amazon, Dropbox, WordPress, Microsoft, and Google.

Hash-based: HOTP

A variation of TOTP is the Hash-based One Time Password (HOTP). As with TOTP, both sides share a secret key, but this time, instead of using the time, they use an incrementing counter.

HOTP is a one-time password algorithm that generates a unique, time-independent password using a shared secret and a counter. Each time the counter is incremented, a new password is created. Like TOTP, this method is often used as a second factor in multi-factor authentication, ensuring that each password can only be used once. Since it’s not time-based, users can enter the code anytime before the next one is generated.
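
The two schemes share one construction, which the sketch below follows: the HMAC-then-dynamic-truncation procedure of RFC 4226, with TOTP simply using the current 30-second time window as the counter (RFC 6238). The secret shown is a stand-in for the value provisioned at enrollment.

    import hashlib, hmac, struct, time

    def hotp(secret, counter, digits=6):
        mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = mac[-1] & 0x0F                          # dynamic truncation
        code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    def totp(secret, period=30, digits=6):
        return hotp(secret, int(time.time()) // period, digits)  # counter = time window

    secret = b"shared secret provisioned at enrollment"
    print(totp(secret))   # both sides compute the same 6-digit code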

Push notifications

Push notifications rely on sending a notification via phone-based SMS messaging (or sometimes email) to validate that a user is in possession of their device (the “something you have” factor). They are often used as a second factor in multi-factor authentication.

Second Factor Authentication with Push Notifications:
This method adds an additional layer of security by sending a push notification to the user’s registered device during login attempts. The user must approve the notification to complete authentication, ensuring that even if credentials are compromised, unauthorized access is prevented.

MFA Fatigue occurs when users are overwhelmed by frequent multi-factor authentication (MFA) requests, leading to careless behavior, such as approving authentication requests without proper verification. Attackers can exploit this by repeatedly sending prompts in hopes that the user will approve one out of frustration or by mistake, granting unauthorized access. This type of fatigue can weaken the security benefits of MFA.

Number Matching Authentication:
Number matching authentication is a technique where the user is presented with a randomly generated number on the device they are logging into, and they must confirm by entering the same number on their second factor device (e.g., mobile phone). This prevents unauthorized approval of login attempts, reducing phishing risks.

Risk-Based Authentication (RBA)

Risk-based authentication dynamically adjusts the level of security required during the login process based on the perceived risk of the request. Factors such as location, device, IP address, or the time of the request are evaluated to determine the risk level. If the system detects unusual activity (e.g., a login attempt from a different country), it may require additional authentication steps, such as multi-factor authentication, to ensure the legitimacy of the access. This approach balances security and user convenience by adapting to the risk profile.

Man-in-the-Middle attacks (Adversary in the Middle)

Authentication protocols can be vulnerable to man-in-the-middle (MitM) attacks. While the traditional term has been MitM, they’re also referred to as Adversary-in-the-Middle (AitM) attacks. In this attack, Alice thinks she is talking to Bob but is really talking to Mike (the man in the middle, an adversary). Mike, in turn, talks to Bob. Any message that Alice sends gets forwarded by Mike to Bob, and Mike forwards any response from Bob back to Alice. This way, Mike allows Alice and Bob to carry out their authentication protocol. Once Bob is convinced he is talking with Alice, Mike can drop Alice and communicate with Bob directly, posing as Alice … or stay around and read their messages, possibly changing them as he sees fit.

The protocols that are immune to this are those where Alice and Bob establish an encrypted channel using trusted keys. For example, with Kerberos, both Alice and Bob get a session key that is encrypted only for each of them. Mike cannot find it even if he intercepts their communications.

With public key cryptography, Mike can take over after Bob is convinced he is talking with Alice. To avoid a man-in-the-middle attack, Alice will have to send Bob a session key. If she uses public key cryptography to do the key exchange, then as long as the message from Alice is signed, Mike will not be able to decrypt the session key or forge a new one.

Biometric authentication

Biometric authentication is the process of identifying a person based on their physical or behavioral characteristics as opposed to their ability to remember a password or their possession of some device. It is the third of the three factors of authentication: something you know, something you have, and something you are.

It is also fundamentally different than the other two factors because it does not deal with data that lends itself to exact comparisons. For instance, sensing the same fingerprint several times will not likely give you identical results each time. The orientation may differ, the pressure and angle of the finger may result in some parts of the fingerprint appearing in one sample but not the other, and dirt, oil, and humidity may alter the image. Biometric authentication relies on pattern recognition and thresholds: we have to determine whether two patterns are close enough to accept them as being the same.

A false acceptance occurs when a pair of different biometric samples (e.g., fingerprints from two different people) is accepted as a match; how often this happens is the false acceptance rate (FAR). A false rejection occurs when a pair of samples from the same person is rejected as a match; its frequency is the false rejection rate (FRR). Based on the properties of the biometric data, the sensor, the feature extraction algorithms, and the comparison algorithms, each biometric device has a characteristic ROC (Receiver Operating Characteristic) curve. The name derives from early work on RADAR and maps the false acceptance rate versus the false rejection rate for a given biometric authentication device. For password authentication, the “curve” would be a single point at the origin: no false accepts and no false rejects. For biometric authentication, which is based on thresholds that determine if the match is “close enough”, we have a curve.

At one end of the curve, we can have an incredibly low false acceptance rate (FAR). This is good as it means we will not have false matches: the enemy stays out. However, it also means the false rejection rate (FRR) will be very high. If you think of a fingerprint biometric, the stringent comparison needed to yield a low FAR means that the algorithm will not be forgiving of a speck of dirt, light pressure, or a finger held at a different angle. We get high security at the expense of inconveniencing legitimate users, who may have to present their fingers repeatedly for sensing, hoping that they will eventually be accepted.

At the other end of the curve, we have a very low false rejection rate (FRR). This is good since it provides convenience to legitimate users. Their biometric data will likely be accepted as legitimate, and they will not have to deal with the frustration of re-sensing their biometric, hoping that their finger is clean, not too greasy, not too dry, and pressed at the right angle with the correct pressure. The trade-off is that it’s more likely that another person’s biometric data will be considered close enough as well and accepted as legitimate.

Numerous biological components can be measured. They include fingerprints, irises, blood vessels on the retina, hand geometry, facial geometry, facial thermographs, and many others. Data such as signatures and voice can also be used, but these often vary significantly with one’s state of mind (your voice changes if you’re tired, ill, or angry). They are behavioral systems rather than purely physical systems, such as your iris patterns, length of your fingers, or fingerprints, and tend to have lower recognition rates. Other behavioral biometrics include keystroke dynamics, mouse use characteristics, gait analysis, and even cognitive tests.

Regardless of which biometric is used, the important thing to make it useful for authentication is to identify the elements that make it distinctive. Most of us have swirls on our fingers. What makes fingerprints differ from finger to finger are the variations in those swirls: ridge endings, bifurcations, enclosures, and other elements beyond a gently sloping curve. These features are called minutiae. The presence of minutiae, their relative distances from each other, and their relative positions allow us to express the unique aspects of a fingerprint as a relatively compact stream of bits rather than a bitmap.

Two important elements of biometrics are robustness and distinctiveness. Robustness means that the biometric data will not change much over time. Your fingerprints will look mostly the same next year and the year after. Your fingers might grow fatter (or thinner) over the years and at some point in the future, you might need to re-register your hand geometry data.

Distinctiveness relates to the differences in the biometric pattern among the population. Distinctiveness is also affected by the precision of a sensor. A finger length sensor will not measure your finger length to the nanometer, so there will be quantized values in the measured data. Moreover, the measurements will need to account for normal hand swelling and shrinking based on temperature and humidity, making the data even less precise. Accounting for these factors, approximately one in a hundred people may have hand measurements similar to yours. A fingerprint sensor may typically detect 40–60 distinct features that can be used for comparing with other sensed fingerprints. An iris scan, on the other hand, will often capture over 250 distinct features, making it far more distinctive and more likely to identify a unique individual.

Some sensed data is difficult to normalize. Here, normalization refers to the ability to align different sensed data to some common orientation. For instance, identical fingers might be presented at different angles to the sensors. The comparison algorithm will have to account for possible rotation when comparing the two patterns. The inability to normalize data makes it difficult to perform efficient searches. There is no good way to search for a specific fingerprint short of performing a comparison against each stored pattern. Data such as iris scans lends itself to normalization, making it easier to find potentially matching patterns without going through an exhaustive search.

In general, the difficulty of normalization and the fact that no two measurements are ever likely to be the same makes biometric data not a good choice for identification. It is difficult, for example, to construct a system that will store hundreds of thousands of fingerprints and allow the user to identify and authenticate themselves by presenting their finger. Such a system will require an exhaustive search through the stored data and each comparison will itself be time-consuming as it will not be a simple bit-by-bit match test. Secondly, fingerprint data is not distinct enough for a population of that size. A more realistic system will use biometrics for verification and have users identify themselves through some other means (e.g., type their login name) and then present their biometric data. In this case, the software will only have to compare the pattern associated with that user.

The biometric authentication process comprises several steps:

  1. Enrollment. Before any authentication can be performed, the system needs to store the user’s biometric data to later use it for comparison. The user will have to present the data to the sensor, distinctive features need to be extracted, and the resulting pattern stored. The system may also validate if the sensed data is of sufficiently high quality or ask the user to repeat the process several times to ensure consistency in the data.

  2. Sensing. The biological component needs to be measured by presenting it to a sensor, a dedicated piece of hardware that can capture the data (e.g., a camera for iris recognition, a capacitive fingerprint sensor). The sensor captures the raw data (e.g., an image).

  3. Feature extraction. This is a signal processing phase where the interesting and distinctive components are extracted from the raw sensed data to create a biometric pattern that can be used for matching. This process involves removing signal noise, discarding sensed data that is not distinctive or not useful for comparisons, and determining whether the resulting values are of sufficiently good quality that it makes sense to use them for comparison. A barely-sensed fingerprint, for instance, may not present enough minutiae to be considered useful.

  4. Pattern matching. The extracted sample is now compared to the stored sample that was obtained during the enrollment phase. Features that match closely will have small distances. Given variations in measurements, it is unlikely that the distance will be zero, which would indicate a perfect match.

  5. Decision. The “distance” between the sensed and stored samples is now evaluated to decide if the match is close enough. The threshold chosen here determines whether the system favors more false rejects or more false accepts.

Security implications

Several security issues relate to biometric authentication.

Sensing
Unlike passwords or encryption keys, biometric systems require sensors to gather the data. The sensor, its connectors, the software that processes sensed data, and the entire software stack around it (operating system, firmware, libraries) must all be trusted and tamper-proof.
Secure communication and storage
The communication path after the data is captured and sensed must also be secure so that attackers will have no ability to replace a stored biometric pattern with one of their own.
Liveness
Much biometric data can be forged. Gummy fingerprints can copy real fingerprints, pictures of faces or eyes can fool cameras into believing they are looking at a real person, and recordings can be used for voice-based authentication systems.
Thresholds
Since biometric data relies on “close-enough” matches, you can never be sure of a certain match. You will need to determine what threshold is good enough and hope that you do not annoy legitimate users too much or make it too easy for the enemy to get authenticated.
Lack of compartmentalization
You have a finite set of biological characteristics to present. Fingerprints and iris scans are the most popular biometric sources. Unlike passwords, where you can have a distinct password for each service, you cannot do this with biometric data.
Theft of biometric data
If someone steals your password, you can create a new one. If someone steals your fingerprint, you have nine fingerprints left and then none. If someone gets a picture of your iris, you have one more left. Once biometric data is compromised, it remains compromised.

Bitcoin & Blockchain

Bitcoin is considered to be the first blockchain-based cryptocurrency and was designed as an open, distributed, public system: there is no authoritative entity and anyone can participate in operating the servers.

With a centralized system, all trust resides in a trusted third party, such as a bank. The system fails if the bank disappears, the banker makes a mistake, or if the banker is corrupt. With Bitcoin, the goal was to create a completely decentralized, distributed system that allows people to manage transactions while preventing opportunities for fraud.

Cryptographic building blocks of Bitcoin

Bitcoin uses a few key cryptographic structures.

Hash pointers

A hash pointer is similar to a traditional pointer, but instead of just containing the address (a reference to) of the next block of data, it also includes a cryptographic hash of the data in the next block. When a hash pointer points to a block of data, it effectively links to the data and provides a way to verify that the data has not been tampered with. If any alteration occurs in the data, the cryptographic hash will change, indicating a discrepancy between the expected hash (stored in the hash pointer) and the hash calculated from the altered data.

This feature of hash pointers is particularly crucial for implementing blockchains and distributed ledgers. In a blockchain, each block contains a hash pointer that points to the previous block, creating a secure, tamper-evident chain of blocks. This structure ensures that if an attacker attempts to alter the data in any block, they would need to alter all subsequent blocks in the chain due to the interconnected hashes, a task that, as we will see, is computationally infeasible due to Bitcoin’s proof-of-work requirements.
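
A minimal hash-pointer chain in Python’s standard library: each block records the hash of its predecessor, so modifying any earlier block breaks every later pointer. The block layout is made up for the illustration.

    import hashlib, json

    def block_hash(block):
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    chain = [{"data": "genesis", "prev": None}]
    for data in ("tx batch 1", "tx batch 2"):
        chain.append({"data": data, "prev": block_hash(chain[-1])})

    def verify(chain):
        return all(chain[i]["prev"] == block_hash(chain[i - 1])
                   for i in range(1, len(chain)))

    assert verify(chain)
    chain[0]["data"] = "tampered"     # alter an early block...
    assert not verify(chain)          # ...and the chain no longer verifies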

Merkle trees

Merkle trees provide a way to efficiently and securely verify the contents of large data sets. A Merkle tree is a binary tree where each leaf node contains the hash of a block of data, and each non-leaf node contains the hash of the concatenation of its child nodes' hashes. At the top level is a single hash, known as the root hash or Merkle root, that represents the entirety of the data within the tree.

The beauty of a Merkle tree lies in its ability to quickly verify whether a specific piece of data is included in the set by traversing the tree of hashes from the target data’s hash up to the Merkle root of the tree.

In a blockchain, each block contains a Merkle tree of all the transactions within that block. This allows for the verification of any single transaction without needing to inspect the entire block.
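
Computing a Merkle root is a short exercise in repeated hashing, sketched below with Python’s standard library: hash the transactions into leaves, then hash adjacent pairs level by level until one value remains. Duplicating the last hash when a level has an odd count follows Bitcoin’s convention; the transaction bytes are placeholders.

    import hashlib

    def sha256(data):
        return hashlib.sha256(data).digest()

    def merkle_root(transactions):
        level = [sha256(tx) for tx in transactions]        # leaf hashes
        while len(level) > 1:
            if len(level) % 2:                             # odd count: repeat last
                level.append(level[-1])
            level = [sha256(level[i] + level[i + 1])       # hash each pair
                     for i in range(0, len(level), 2)]
        return level[0]

    print(merkle_root([b"tx1", b"tx2", b"tx3"]).hex())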

Public key cryptography and digital signatures

Public key cryptography is used in the inputs and outputs of transactions. Each user creates a public key, which can be shared with anyone, and a private key, which is kept secret by the owner. These keys are mathematically related but it is computationally infeasible to deduce the private key from the public key.

Digital signatures use the user’s private key to sign a message, creating a signature that anyone can verify using that user’s public key. Signing is effectively taking a hash of the message and encrypting it with a private key. The signature allows someone to use the corresponding public key to verify the integrity of the message.

The ledger and the Bitcoin network

Here’s a breakdown of Bitcoin and its core security concepts:

Distributed Ledger, Blocks, and Blockchains: Bitcoin relies on a distributed ledger, a public record of all transactions stored in blocks. Each block contains a batch of transactions and is cryptographically linked to the previous block, forming a blockchain. This chain structure prevents altering any block without changing subsequent blocks, securing the transaction history. The ledger is a complete list of every transaction since its creation in January 2009.

There is no concept of a master node or master copies of the ledger. Anyone can download the software and run a Bitcoin node – they all run the same algorithms. New systems get the names of some well-known nodes when they download the software. After connecting to one or more nodes, a Bitcoin node will ask each for a list of known Bitcoin nodes. This creates a peer discovery process that allows a node to get a complete list of other nodes in the network.

User Identification (“Addresses”): A user creates a public/private key pair for every “identity.” Identities are anonymous; Bitcoin users are identified by addresses, which are unique identifiers derived as a hash of a public key. An address allows users to receive Bitcoin but doesn’t reveal personal identity. Unlike public keys, addresses are more concise and user-friendly representations for transactions. Public keys are essential for verifying transactions, while addresses are used for receiving funds.

Transaction Components: Bitcoin transactions involve:

  • Inputs: Sources of funds, referencing outputs of earlier transactions that were paid to this user’s address.
  • Outputs: Destinations for funds, including recipient addresses.
  • Change: Any leftover funds from the inputs that return to the sender.
  • Fee: The miner’s reward for validating the transaction, usually a small portion of the transaction amount.

Double-Spending Problem: Double spending refers to attempting to spend the same Bitcoin in multiple transactions. Bitcoin prevents this by recording transactions in a public ledger, which miners validate to ensure each coin is spent only once.

Merkle Trees in Bitcoin: In the Bitcoin blockchain, Merkle trees help organize transactions within each block. The root of the Merkle tree summarizes all transactions in a block, allowing quick verification of any single transaction to determine if it belongs in the block without examining all others.

Mining and Proof of Work (PoW): Mining is the process of adding new blocks to the blockchain. Miners compete to solve a puzzle, and the first to solve it gets to add the block. This puzzle is the challenge of modifying data in the block header so that the hash of the header will be less than a specified number (e.g., the first 17 bits will be 0). Since the output of a hash cannot be predicted, finding the right value, known as proof of work, requires a large investment of computational resources, making it costly to tamper with the blockchain and recompute PoW values for every block after the tampered block.
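
The search itself is simple to express; it is only the expected number of attempts that makes it costly. A toy version with Python’s standard library is below, using a single SHA-256 and a tiny difficulty so it finishes quickly (Bitcoin uses double SHA-256 over a real block header and a vastly smaller target).

    import hashlib

    def mine(header, difficulty_bits=16):
        """Find a nonce so that hash(header + nonce) has the required leading zero bits."""
        target = 1 << (256 - difficulty_bits)   # the hash must fall below this
        nonce = 0
        while True:
            digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce                    # the proof of work
            nonce += 1

    print("found nonce:", mine(b"block header bytes"))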

Target Hash: In mining, the target hash is a number that defines the difficulty of the cryptographic puzzle. Miners try to find a hash below this target, adjusting their calculations repeatedly until successful. Bitcoin’s difficulty adjustment algorithm ensures block creation occurs roughly every 10 minutes. If mining becomes faster due to increased computational power, the difficulty increases; if slower, it decreases.

In the case where two miners solve a block at nearly the same time, or an attacker tries to present an alternate chain, competing chains (forks) can emerge. Bitcoin’s protocol resolves this by choosing the longest chain, demonstrating the most work, ensuring a single version of the blockchain remains dominant. A 51% attack occurs when a single entity controls over 50% of the network’s mining power, allowing them to potentially reverse transactions, double-spend coins, or block others from confirming transactions, threatening the network’s integrity.

Access Control

Access control mechanisms are fundamental to security, managing how resources are accessed by users, processes, and devices. The protection framework ensures that users or processes interact with resources only as authorized, preventing unauthorized use and potential security breaches. It includes setting policies, authenticating users, managing privileges, and auditing access events.

At its core, access control ensures that authorized users have the necessary permissions to perform specific actions, whether that involves reading, writing, or executing files and applications.

The Role of the Operating System

The operating system (OS) is the gatekeeper for resources like the CPU, memory, files, network connections, and any connected devices. Through access control, the OS not only protects itself from applications but also keeps applications isolated from one another. The Trusted Computing Base (TCB), comprising the OS and supporting hardware, enforces this security by managing which processes can control specific resources and when they can do so.

User Mode and Kernel Mode

User mode and kernel mode are two essential operating states in a computer’s operating system, designed to protect system resources by restricting user applications’ access to hardware and critical functions. They correspond to operating states in a processor.

In user mode, applications have limited privileges, meaning they cannot directly access hardware, certain regions of memory, and privileged instructions. This isolation prevents applications from interfering with each other or the OS. Crashes in user mode are contained to the application itself, preserving overall system stability.

Kernel mode (or supervisor mode) grants the processor full access to system resources and privileged instructions, allowing it to manage memory maps, process scheduling, hardware, and file systems. This mode is critical for core operations but carries risk: a kernel mode crash can compromise the entire system.

Traps, violations, and interrupts switch a process from user mode to kernel mode to safely access system services:

  1. Traps are invoked by explicit instructions and transfer control to a predefined location. A system call is a form of a trap.
  2. Violations occur when a process attempts unauthorized actions, such as accessing an unmapped memory region or a privileged instruction.
  3. Interrupts are hardware signals that demand immediate OS attention, like responding to network or timer events.

Protection Rings

Protection rings define levels of access, with Ring 0 (kernel mode) holding the highest privileges for OS functions. Ring 3 (user mode) is the least privileged, where most applications operate, relying on the OS for resource access. Intermediate rings (Rings 1 and 2) exist but are rarely used today.

Implementing Access Control

A protection domain is a security boundary that defines what resources a process (or user) can access and which actions it can perform on them. Each domain has its own permissions, creating a controlled environment for accessing specific files, devices, or data.

An access control matrix is a grid where rows represent subjects (such as users or processes) and columns represent objects (like files or devices). Each entry in the matrix details the permissions a subject has on an object. Having an operating system implement and manage such a matrix, often with hundreds of users and hundreds of thousands of objects, is not feasible. Practical implementations are simplified through structures like:

Access Control Lists (ACLs):
Assign permissions to objects, allowing certain subjects to perform specific actions.
Capability Lists:
Associate permissions with subjects, detailing actions allowed on particular objects. These structures offer efficient run-time checking but require careful management as objects and permissions change.

A capability list is associated with a subject and details the objects they can access, along with the permitted actions. This approach grants users or processes specific rights, simplifying access control by focusing on what each subject is allowed to do across multiple resources.
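
In code, the two structures are just the two ways of factoring the same matrix, as the sketch below shows: an ACL indexes rights by object, while a capability list indexes them by subject. The names and permissions are made up.

    acl = {                              # column-wise: indexed by object
        "/etc/passwd": {"root": {"read", "write"}, "alice": {"read"}},
        "/home/alice/notes": {"alice": {"read", "write"}},
    }
    capabilities = {                     # row-wise: indexed by subject
        "alice": {"/etc/passwd": {"read"},
                  "/home/alice/notes": {"read", "write"}},
    }

    def acl_allows(subject, obj, right):
        return right in acl.get(obj, {}).get(subject, set())

    def cap_allows(subject, obj, right):
        return right in capabilities.get(subject, {}).get(obj, set())

    assert acl_allows("alice", "/etc/passwd", "read")
    assert not cap_allows("alice", "/etc/passwd", "write")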


Unix (POSIX) Access Controls

When Unix was first developed, it prioritized simplicity and efficiency in its access control model to match the limited resources of early computing environments. All information (metadata) about a file was contained in a fixed-size inode structure in the file system, which does not support an arbitrarily long access control list.

The model provided only three permission categories: owner, group, and others, with read, write, and execute permissions for each. This basic structure was efficient and minimized the storage and processing required to manage access control.

As Unix and its offspring (the POSIX-compliant family of operating systems, including Linux, FreeBSD, NetBSD, macOS) evolved to support more complex access requirements, the three-category model began to show its limitations. Users needed more flexibility to set permissions for specific individuals or groups beyond these fixed categories. Thus, Access Control Lists (ACLs) were introduced to provide finer-grained access control.

In addition to standard permissions, Unix includes the setuid (set user ID) mechanism, which is crucial in managing privileged access. When a file has the setuid bit set, it allows users to execute the file with the file owner’s privileges rather than their own. This is particularly important for allowing users temporary, controlled access to higher privileges without giving them unrestricted access. For example, a program that changes a user’s password needs to modify system files, an operation typically restricted to the root user. By setting the setuid bit on the password-changing program, any user can execute it with root privileges for the limited purpose of password modification, ensuring necessary access without compromising overall system security.
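
The permission bits, including setuid, are visible from Python’s standard library. A quick sketch (the path is just an example; /usr/bin/passwd is commonly a setuid-root program on Linux systems):

    import os, stat

    st = os.stat("/usr/bin/passwd")     # a commonly setuid file on Linux
    mode = st.st_mode

    print("setuid bit set:", bool(mode & stat.S_ISUID))
    print("owner read/write/execute:",
          bool(mode & stat.S_IRUSR), bool(mode & stat.S_IWUSR), bool(mode & stat.S_IXUSR))
    print("symbolic form:", stat.filemode(mode))   # e.g., -rwsr-xr-x: the 's' is setuid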

Principle of Least Privilege

The principle of least privilege states that a user or program should have only the minimum level of access required to perform its tasks. This reduces the risk of accidental or malicious misuse by limiting the potential damage if a component is compromised.

Privilege separation involves dividing a program into distinct components, each with specific privileges and responsibilities. By splitting a program into separate modules with distinct access rights, the program minimizes the risk associated with each component, as only critical parts have high privileges.

A program can be divided into:

  1. Privileged Component: Handles tasks requiring elevated permissions, like accessing sensitive files.
  2. Unprivileged Component: Manages general functions that don’t require special access.

For example, a network server might have an unprivileged part that interacts with users and a privileged part that accesses protected files. Only the privileged component operates with higher permissions, reducing the attack surface if an unprivileged part is exploited.
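
A common POSIX pattern for this separation is to perform the privileged step first and then permanently drop privileges before doing general-purpose work. A sketch is below; it must be started as root for the setuid call to succeed, and the uid shown (65534) is conventionally the unprivileged “nobody” account.

    import os

    # Privileged step: open a root-only resource while we still have privileges.
    fd = os.open("/etc/shadow", os.O_RDONLY)

    # Drop privileges permanently: group first, then user.
    os.setgid(65534)
    os.setuid(65534)

    # From here on, the process can read the already-open descriptor but can
    # no longer reopen the file or regain root.
    data = os.read(fd, 64)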

Access Control Models

  1. Discretionary Access Control (DAC): DAC allows users to manage permissions for the resources they own. Users can assign or revoke permissions, often with the flexibility to share access. It’s common in general-purpose systems, although the approach can introduce risks if users mismanage their permissions.

  2. Mandatory Access Control (MAC): In MAC, policy decisions are centrally managed, restricting user control over permissions. This model limits unauthorized data sharing by setting rigid rules on access based on policies rather than user decisions.

Multi-Level Security (MLS) Systems control access to information based on both the user’s clearance level and the data’s classification level, ensuring that sensitive information is accessible only to authorized users. MLS systems are often used in environments requiring strict confidentiality, such as government or military organizations. One foundational model used in MLS is the Bell-LaPadula (BLP) Model, which is primarily designed to enforce confidentiality.

Multi-Level Security: the Bell-LaPadula (BLP) Model

The Bell-LaPadula (BLP) model is the best-known model of multi-level security (MLS). It was designed with government security clearance levels in mind and focuses on preserving confidentiality – not leaking sensitive information. It controls access based on security clearance levels and the sensitivity of information, creating different “levels” of access. It operates on two main rules:

  1. Simple Security Property (No Read Up): Users cannot read data at a higher security level than their own clearance. For instance, a user with “Secret” clearance cannot access “Top Secret” information. This rule prevents unauthorized users from viewing sensitive data, enforcing confidentiality by limiting upward access.

  2. *-Property or Star Property (No Write Down): Users cannot write data to a lower security level than their own clearance. For instance, a user with “Top Secret” clearance cannot save data to a “Confidential” level. This rule prevents data leaks by blocking the transfer of sensitive information to lower, less secure levels.

Each user and data item in the system is assigned a classification level (e.g., Top Secret, Secret, Confidential, Unclassified). The Bell-LaPadula model uses these levels to control data flow and prevent information from leaking to unauthorized users. By enforcing “No Read Up” and “No Write Down” rules, the model ensures that data only flows in a secure direction, protecting high-sensitivity information from being exposed.
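
Ignoring compartments, the two rules reduce to a pair of comparisons on ordered levels, sketched below (the level names follow the examples above; Biba’s integrity rules, discussed later, are the same checks with the comparisons reversed):

    LEVELS = {"Unclassified": 0, "Confidential": 1, "Secret": 2, "Top Secret": 3}

    def can_read(subject_level, object_level):
        return LEVELS[subject_level] >= LEVELS[object_level]   # no read up

    def can_write(subject_level, object_level):
        return LEVELS[subject_level] <= LEVELS[object_level]   # no write down

    assert can_read("Secret", "Confidential") and not can_read("Secret", "Top Secret")
    assert can_write("Secret", "Top Secret") and not can_write("Secret", "Confidential")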

Multilateral Security and the Lattice Model

Multi-Lateral Security controls access based on compartments or labels rather than hierarchical levels. It’s used in environments where different groups need access to different categories of information. For example, a government might divide information into compartments like “Project A” and “Project B,” accessible only to specific groups or individuals.

This model enforces that users have access only within their label or compartment, preventing cross-access. For instance, a defense team and a finance team might both work on classified projects but should not see each other’s data.

Multi-Lateral Security is usually used as an extension of Multi-Level Security. The Lattice Model provides a graph structure that combines multi-level and multi-lateral security by representing access permissions in a lattice (a directed graph). Each point in the lattice represents a security level or label, and access decisions are made based on whether a user’s clearance level dominates or is dominated by an object’s security label.

In the lattice, levels and compartments combine as nodes (points), with arrows showing permitted flows. For example, if “Secret” dominates “Confidential,” someone with “Secret” clearance can access both levels (assuming they have the same compartment labels too), but the reverse is not true. This model allows both hierarchical and compartmental access control, flexibly managing complex permission schemes.

Type Enforcement Model

The Type Enforcement (TE) model controls access by assigning labels or “types” to both users (subjects) and resources (objects). In this model, each type defines a set of permissions, and only specific subjects (e.g., processes or users) with assigned types can interact with particular objects. The TE model enforces rules based on these types rather than individual user identity or role.

Role-Based Access Control

Role-Based Access Control (RBAC) assigns permissions based on roles within an organization, allowing users access based on their job responsibilities. In RBAC, roles are predefined sets of permissions that users inherit, enabling efficient and scalable access management.

Role-based permissions may look like group permissions but there’s a difference. Roles define what actions a user can perform, based on their responsibilities (e.g., “Manager” role with permission to approve requests). Groups are collections of users, often for administrative or organizational purposes, but don’t inherently define permissions. In RBAC, roles focus on access and permissions tied to duties, while groups are often used to simplify user management.

Biba Integrity Model

The Biba Integrity Model focuses on protecting data integrity, ensuring that information isn’t improperly modified. Unlike the Bell-LaPadula model, which emphasizes confidentiality, Biba enforces rules to prevent unauthorized or untrusted modifications.

  1. Simple Integrity Property (No Read Down): Users cannot read data at a lower integrity level to avoid being influenced by less trustworthy information.
  2. Star Integrity Property (No Write Up): Users cannot write to data at a higher integrity level, preventing low-integrity users from contaminating high-integrity data.

For example, in a financial system, a user with “Low Integrity” clearance (like a trainee) can’t update “High Integrity” data (such as audited financial records), protecting the accuracy and reliability of critical information.

The Chinese Wall Model

The Chinese Wall Model is an access control model designed to prevent conflicts of interest by restricting access based on a user’s previous interactions. It’s often used in financial and consulting environments to avoid leaking sensitive information between competing clients.

  1. Simple Security Property: Users can only access information if they haven’t already accessed data from a competing entity within the same “conflict of interest class.”
  2. Star Property: Users can only write to an object if they have no read access to any conflicting data. This prevents a user from sharing information between competing entities.

For example, a consultant working with Company A in the banking sector cannot access data from Company B, another bank, to avoid conflicts of interest. If they view Company A’s data, they’re automatically restricted from viewing or sharing data from Company B.

Program Hijacking

Program hijacking refers to techniques that can be used to take control of a program and have it do something other than what it was intended to do. One class of techniques uses code injection, in which an adversary manages to add code to the program and change the program’s execution flow to run that code.

The best-known set of attacks are based on buffer overflow. Buffer overflow is the condition where a programmer allocates a chunk of memory (for example, an array of characters) but neglects to check the size of that buffer when moving data into it. Data will spill over into adjacent memory and overwrite whatever is in that memory.

Languages such as C, C++, and assembly language are susceptible to buffer overflows since they have no means of testing array bounds. Hence, the compiler cannot generate code to validate that data is only going into the allocated buffer. For example, when you copy a string using strcpy(char *dest, char *src), you pass the function only source and destination pointers. The strcpy function has no idea how big either of the buffers is.
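As an illustration, the following hypothetical function is vulnerable: strcpy copies until it finds a null byte in the source, regardless of the size of buf:

#include <string.h>

void greet(const char *name)
{
	char buf[128];

	strcpy(buf, name);	/* no bounds check: input longer than 127
				   bytes plus the terminator overwrites
				   adjacent memory */
	/* ... use buf ... */
}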

Stack-based overflows

When a process runs, the operating system’s program loader allocates a region for the executable code and static data (called the text and data segments), a region for the stack, and a region for the heap (used for dynamic memory allocation, such as by malloc).

Just before a program calls a function, it pushes the function’s parameters onto the stack. When the call is made, the return address gets pushed on the stack. On entry to the function that was called, the function pushes the current frame pointer (a register in the CPU) on the stack, which forms a linked list to the previous frame pointer and provides an easy way to revert the stack to where it was before making the function call. The frame pointer register is then set to the current top of the stack. The function then adjusts the stack pointer to make room for its local variables, which live on the stack. This region for the function’s local data is called the stack frame. Ensuring that the stack pointer always points to the top of the stack enables the function to take interrupts or call other functions without overwriting anything useful on the stack. The compiler generates code to reference parameters and local variables as offsets from the current frame pointer register.

Before a function returns, the compiler generates code to:

  • Adjust the stack back to point to where it was before the stack expanded to make room for local variables. This is done by copying the frame pointer to the stack pointer.

  • Restore the previous frame pointer by popping it off the stack (so that local variables for the previous function could be referenced properly).

  • Return from the function. Once the previous frame pointer has been popped off the stack, the stack pointer points to a location on the stack that holds the return address.

Simple stack overflows

Local variables are allocated on the stack and the stack grows downward in memory. Hence, the top of the stack is in lower memory than the start, or bottom, of the stack. If a buffer (e.g., char buf[128]) is defined as a local variable, it will reside on the stack. As the buffer gets filled up, its contents will be written to higher and higher memory addresses. If the buffer overflows, data will be written further down the stack (in higher memory), overwriting the contents of any other variables that were allocated for that function and eventually overwriting the saved frame pointer and the saved return address.

When this happens and the function tries to return, the return address that is read from the stack will contain garbage data, usually a memory address that is not mapped into the program’s memory. As such, the program will crash when the function returns and tries to execute code at that invalid address. This is an availability attack. If we can exploit the fact that a program does not check the bounds of a buffer and overflows the buffer, we can cause a program to crash.

Subverting control flow through a stack overflow

Buffer overflow can be used in a more malicious manner. The buffer itself can be filled with bytes of valid machine code. If the attacker knows the exact size of the buffer, she can write just the right number of bytes to write a new return address into the very same region of memory on the stack that held the return address to the parent function. This new return address points to the start of the buffer that contains the injected code. When the function returns, it will “return” to the new code in the buffer and execute the code at that location.

Off-by-one stack overflows

As we saw, buffer overflow occurs because of programming bugs: the programmer neglected to make sure that the data written to a buffer does not overflow. This often occurs because the programmer used old, unsafe functions that do not allow the programmer to specify limits. Common functions include:

- strcpy(char *dest, char *src)

- strcat(char *dest, char *src)

- sprintf(char *dest, char *format, ...)

Each of these functions has a safer counterpart that accepts a count parameter so that the function will never copy more than count bytes:

- strncpy(char *dest, char *src, size_t count)

- strncat(char *dest, char *src, size_t count)

- snprintf(char *dest, size_t count, char *format, ...)

You’d think this would put an end to buffer overflow problems. However, programmers may miscount or they may choose to write their own functions that do not check array bounds correctly. A common error is an off-by-one error. For example, a programmer may declare a buffer as:

char buf[128];

and then copy into it with:

for (i=0; i <= 128; i++)
    buf[i] = stuff[i];

The programmer inadvertently used a <= comparison instead of <.

With off-by-one bounds checking, there is no way that malicious input can overwrite the return address on the stack: the copy operation would stop before that time. However, if the buffer is the first variable that is allocated on the stack, an off-by-one error can overwrite one byte of the saved frame pointer.

The potential for damage depends very much on what the value of that saved frame pointer was and how the compiler generates code for managing the stack. In the worst case, it could be set to a value that is 255 bytes lower in memory. If the frame pointer is modified, the function will still return normally. However, upon returning, the generated code pops the frame pointer from the stack to restore the saved value of the calling function’s frame pointer, which was corrupted by the buffer overflow. Now the program has a modified frame pointer.

Recall that references to a function’s variables and parameters are expressed as offsets from the current frame pointer. Any references to local variables may now be references to data in the buffer. Moreover, should that function return, it will update its stack pointer to this buffer area and return to an address that the attacker defined.

Heap overflows

Not all data is allocated on the stack: only local variables. Global and static variables are placed in a region of memory right above the executable program. Dynamically allocated memory (e.g., via new or malloc) comes from an area of memory called the heap. In either case, since this memory is not the stack, it does not contain return addresses so there is no ability for a buffer overflow attack to overwrite return addresses.

We aren’t totally safe, however. A buffer overflow will cause data to spill over into higher memory addresses above the buffer that may contain other variables. If the attacker knows the order in which variables are allocated, they could be overwritten. While these overwrites will not change a return address, they can change things such as filenames, lookup tables, or linked lists. Some programs make extensive use of function pointers, which may be stored in global variables or in dynamically-allocated structures such as linked lists on a heap. If a buffer overflow can overwrite a function pointer then it can change the execution of the program: when that function is called, control will be transferred to a location of the attacker’s choosing.
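For example, consider this hypothetical structure, sketched only to show the idea: an overflow of buf spills into the adjacent function pointer, so the attacker chooses where the program jumps when the pointer is used:

#include <string.h>

struct handler {
	char buf[64];
	void (*callback)(void);	/* sits right after buf in memory */
};

void process(struct handler *h, const char *input)
{
	strcpy(h->buf, input);	/* overflow spills into h->callback */
	h->callback();		/* control transfers wherever the attacker wrote */
}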

If we aren’t sure of the exact address at which execution will start, we can fill a buffer with a bunch of NOP (no operation) instructions prior to the injected code. If the processor jumps anywhere in that region of memory, it will happily execute these NOP instructions until it eventually reaches the injected code. This is called a NOP slide, or a landing zone.

Format string attacks with printf

The family of printf functions are commonly used in C and C++ to create formatted output. They accept a format string that defines what will be printed, with % characters representing formatting directives for parameters. For example,

printf("value = %05d\n", v);

will print a string such as

value = 01234

if the value of v is 1234.

Reading arbitrary memory

Occasionally, programs will use a format string that could be modified. For instance, the format string may be a local variable that is a pointer to a string. This local variable may be overwritten by a buffer overflow attack to point to a different string. It is also common, although improper, for a programmer to use printf(s) to print a fixed string s. If s is a string that is generated by the attacker, it may contain unexpected formatting directives.

Note that printf takes a variable number of arguments and matches each % directive in the format string with a parameter. If there are not enough parameters passed to printf, the function does not know that: it assumes they are on the stack and will happily read whatever value is on the stack where it thinks the parameter should be. This gives an attacker the ability to read arbitrarily deep into the stack. For example, with a format string such as:

printf("%08x\n%08x\n%08x\n%08x\n");

printf will expect four parameters, all of which are missing. It will instead read the next four values that are on the top of the stack and print each of those integers as an 8-character-long hexadecimal value prefixed with leading zeros (“%08x\n”).

Writing arbitrary memory

The printf function also contains a somewhat obscure formatting directive: %n. Unlike other % directives that expect to read a parameter and format it, %n instead writes to the address corresponding to that parameter. It writes the number of characters that it has output thus far. For example,

printf("paul%n says hi", &printbytes);

will store the number 4 (strlen("paul")) into the variable printbytes. An attacker who can change the format specifier may be able to write to arbitrary memory.

Each % directive to print a variable causes printf to look for the next variable in the next slot in the stack. Hence, format directives such as %x, %lx, and %llx will cause printf to skip over the length of an int, long, or long long and get the next variable from the following location on the stack. Thus, just like reading the stack, we can skip through any number of bytes on the stack until we get to the location holding the address where we want to modify a value. At that point, we insert a %n directive in the format string, which writes the number of bytes output so far to the address found at that location on the stack.

We can precisely control the value that will be written by specifying how many bytes are output as part of the format string. For example, a format of %.55000x tells printf to output a value that takes up 55,000 characters. By using formats like that for output values, we can change the count that will be written with %n. Remember, we don’t care what printf actually prints; we just want to force the byte count to be a value we care about, such as the address of a function we want to call.

Defense against hijacking attacks

Better programming

Hijacking attacks are the result of sloppy programming: a lack of bounds checking that results in overflows. They can be eliminated if the programmer never uses unsafe functions (e.g., use strncpy instead of strcpy) and is careful about off-by-one errors.
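For example, here is a sketch of safer copying with the counting functions (note that strncpy does not guarantee null termination, so the programmer must terminate explicitly):

#include <stdio.h>
#include <string.h>

void copy_name(const char *user_input)
{
	char buf[128];

	strncpy(buf, user_input, sizeof(buf) - 1);	/* copies at most 127 bytes */
	buf[sizeof(buf) - 1] = '\0';			/* ensure termination */

	/* snprintf always null-terminates and never writes more
	   than sizeof(buf) bytes: */
	snprintf(buf, sizeof(buf), "%s", user_input);
}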

A programmer can use a technique called fuzzing to locate buffer overflow problems. Wherever a string can be provided by the user, the tester enters extremely long strings with well-defined patterns (e.g., “[]$$…”). If the app crashes because a buffer overflow destroyed a return address on the stack, the programmer can then load the core dump into a debugger, identify where the program crashed, and search for a substring of the entered pattern (“[]$”) to identify which buffer was affected.

Buffer overflows can be avoided by using languages with stronger type checking and array bounds checking. Languages such as Java, C#, and Python check array bounds. C and C++ do not. However, it is sometimes difficult to avoid using C or C++.

Tight specification of requirements, coding to those requirements, and constructing tests based on those requirements helps avoid buffer overflow bugs. If input lengths are specified, they are more likely to be coded and checked. Documentation should be explicit, such as “user names longer than 32 bytes must be rejected.”

Data Execution Prevention (DEP)

Buffer overflows affect data areas: either the stack, heap, or static data areas. There is usually no reason for those regions of memory to contain executable code. Hence, it makes sense for the operating system to set the processor’s memory management unit (MMU) to turn off execute permission for memory pages in those regions.

This was not possible with early Intel or AMD processors: their MMU did not support enabling or disabling execute permissions. All memory could contain executable code. That changed in 2004, when Intel and AMD finally added an NX (no-execute) bit to their MMU’s page tables. On Intel architectures, this was called the Execute Disable Bit (XD). Operating system support followed. Windows, Linux, and macOS all currently support DEP.

DEP cannot always be used. Some environments, such as certain LISP interpreters, actually need to execute code on the stack, and some need executable code in their heap (to support dynamic loading, patching, or just-in-time compilation). DEP also does not guard against data modification attacks, such as heap-based overflows or some printf attacks.

DEP attacks

Attackers came up with some clever solutions to defeat DEP. The first of these is called return-to-libc. Buffer overflows still allow us to corrupt the stack; we just cannot execute code on the stack. However, there is already a lot of code sitting in the program and the libraries it uses. Instead of adding code into the buffer, the attacker merely overflows a buffer to create a new return address and parameter list on the stack. When the function returns, it switches control to the new return address. This return address will be an address in the standard C library (libc), which contains functions such as printf, system, and front ends to system calls. Often, all an attacker needs to do is push parameters that point to a string in the buffer containing a command to execute and then “return” to the libc system function, which executes its argument as a shell command.

A more sophisticated variant of return-to-libc is Return Oriented Programming (ROP). Return oriented programming is similar to return-to-libc but realizes that execution can branch to any arbitrary point in any function in any loaded library. The function will execute a series of instructions and eventually return. The attacker will overflow the stack with data that now tells this function where to “return”. Its return can jump to yet another arbitrary point in another library. When that returns, it can – once again – be directed to an address chosen by the intruder that has been placed further down the stack, along with frame pointers, local variables, and parameters.

There are lots and lots of return instructions among all the libraries normally used by programs. Each of these tail ends of a function is called a gadget. It has been demonstrated that using carefully chosen gadgets allows an attacker to push a string of return addresses that will enable the execution of arbitrary algorithms. To make life easier for the attacker, tools have been created that search through libraries and identify useful gadgets. A ROP compiler then allows the attacker to program operations using these gadgets.

Address Space Layout Randomization

Stack overflow attacks require knowing and injecting an address that will be used as a target when a function returns. ROP also requires knowing the addresses of all the entry points of gadgets. Address Space Layout Randomization (ASLR) is a technique that was developed to have the operating system’s program loader pick random starting points for the executable program, static data, heap, stack, and shared libraries. Since code and data reside in different locations each time the program runs, the attacker is not able to program buffer overflows with useful known addresses. For ASLR to work, the program and all libraries must be compiled to use position independent code (PIC), which uses relative offsets instead of absolute memory addresses.

Stack canaries

A stack canary is a compiler technique to ensure that a function will not be allowed to return if a buffer overflow took place that may have clobbered the return address.

At the start of a function, the compiler adds code to generate a random integer (the canary) and push it onto the stack before allocating space for the function’s local variables (the entire region of the stack used by a local function is called a frame). The canary sits between the return address and these variables. If there is a buffer overflow in a local variable that tries to change the return address, that overflow will have to clobber the value of the canary.

The compiler generates code to have the function check that the canary has a valid value before returning. If the value of the canary is not the original value then a buffer overflow occurred and it’s very likely that the return value has been altered.
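Conceptually, the compiler-generated code behaves like the following sketch. This is an illustration, not literal compiler output; __stack_chk_guard and __stack_chk_fail are the names GCC uses for the process-wide guard value and the failure handler:

extern long __stack_chk_guard;		/* random value chosen at program startup */
void __stack_chk_fail(void);		/* reports the overflow and aborts */

void function(void)
{
	long canary = __stack_chk_guard;	/* placed between the saved return
						   address and the local variables */
	char buf[128];

	/* ... function body: an overflow of buf must clobber the canary
	   before it can reach the saved frame pointer or return address ... */

	if (canary != __stack_chk_guard)
		__stack_chk_fail();		/* canary changed: do not return */
}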

However, you may still have a buffer overflow that does not change the value of the canary or the return address. Consider a function that has two local arrays (buffers). They’re both allocated on the stack within the same stack frame. If array A is in lower memory than array B then an overflow in A can affect the contents of B. Depending on the code, that can alter the way the function works. The same thing can happen with scalar variables (non-arrays). For instance, suppose the function allocates space for an integer followed by an array. An overflow in the array can change the value of the integer that’s in higher memory. The canary won’t detect this. Even if the overflow happened to clobber the return value as well, the check is made only when the function is about to return. Meanwhile, it’s possible that the overflow that caused other variables to change also altered the behavior of the function.

Stack canaries cannot fix this problem in general. However, the compiler (which creates the code to generate them and check them) can take steps to ensure that a buffer overflow cannot overwrite non-array variables, such as integers and floats. By allocating arrays first (in higher memory) and then scalar variables, the compiler can make sure that a buffer overflow in an array will not change the value of scalar variables. One array overflowing to another is still a risk, however, but it is most often the scalar variables that contain values that define the control flow of a function.

Intel’s Control-flow Enforcement Technology (CET) introduced a shadow stack alongside the main stack, specifically dedicated to storing return addresses. This shadow stack is safeguarded by a memory protection attribute in the processor’s Memory Management Unit (MMU), preventing unauthorized modifications. Control flow instructions automatically push return addresses onto both the main stack and the shadow stack. When the processor executes a return instruction, it compares the return addresses on both stacks. If they match, execution continues; if not, that indicates a buffer overflow and a fault is generated, enabling the operating system to terminate the process.

Command Injection

We looked at buffer overflow and printf format string attacks that enable the modification of memory contents to change the flow of control in the program and, in the case of buffer overflows, inject executable binary code (machine instructions). Other injection attacks enable you to modify inputs used by command processors, such as interpreted languages or databases. We will now look at these attacks.

SQL Injection

It is common practice to take user input and make it part of a database query. This is particularly popular with web services, which are often front ends for databases. For example, we might ask the user for a login name and password and then create a SQL query:

sprintf(buf,
	"SELECT * from logininfo WHERE username = '%s' AND password = '%s';",
	uname, passwd);

Suppose that the user entered this for a password:

' OR 1=1 --

We end up creating this query string:

SELECT * from logininfo WHERE username = 'paul' AND password = '' OR 1=1 -- ';

The “--” after “1=1” begins a SQL comment, telling the parser to ignore everything else on the line. In SQL, AND has higher precedence than OR, so the query is evaluated as (username = 'paul' AND password = '') OR 1=1. The first condition checks for a null password (which the user probably does not have), but the condition 1=1 is always true. In essence, the user’s “password” turned the query into one that ignores the user’s password and unconditionally validates the user.

Statements such as this can be even more destructive as the user can use semicolons to add multiple statements and perform operations such as dropping (deleting) tables or changing values in the database.

This attack can take place because the programmer blindly allowed user input to become part of the SQL command without validating that the user data does not change the quoting or tokenization of the query. A programmer can avoid the problem by carefully checking the input. Unfortunately, this can be difficult. SQL contains too many words and symbols that may be legitimate in other contexts (such as passwords), and escaping special characters, such as prepending backslashes or doubling single quotes, is error-prone since these escapes differ across database vendors. The safest defense is to use parameterized queries, where user input never becomes part of the query but is brought in as parameters to it. For example, we can write the previous query as:

uname = getResourceString("username");
passwd = getResourceString("password");
query = "SELECT * FROM users WHERE username = @0 AND password = @1";
db.Execute(query, uname, passwd);

A related safe alternative is to use stored procedures. They have the same property that the query statement is not generated from user input and parameters are clearly identified.

While SQL injection is the most common code injection attack, databases are not the only target. Creating executable statements built with user input is common in interpreted languages, such as Shell, Perl, PHP, and Python. Before making user input part of any invocable command, the programmer must be fully aware of parsing rules for that command interpreter.

Shell attacks

The various POSIX shells (sh, csh, ksh, bash, tcsh, zsh) are commonly used as scripting tools for software installation, start-up scripts, and tying together workflows that involve processing data through multiple commands. A few aspects of how many of the shells work and the underlying program execution environment can create attack vectors.

system() and popen() functions

The system and popen functions are part of the standard C library and are commonly used by C programmers to execute shell commands. The system function runs a shell command, while popen also runs a shell command but allows the programmer to capture its output and/or send it input via the returned FILE pointer.

Here we again have the danger of turning improperly-validated data into a command. For example, a program might use a function such as this to send an email alert:

char command[BUFSIZE];
snprintf(command, BUFSIZE, "/usr/bin/mail -s \"system alert\" %s", user);
FILE *fp = popen(command, "w");

In this example, the programmer uses snprintf to create the complete command with the desired user name into a buffer. This incurs the possibility of an injection attack if the user name is not carefully validated. If the attacker had the option to set the user name, she could enter a string such as:

nobody; rm -fr /home/*

which will result in popen running the following command:

sh -c "/usr/bin/mail -s \"system alert\" nobody; rm -fr /home/*"

which is a sequence of commands, the latter of which deletes all user directories.

Other environment variables

The shell PATH environment variable controls how the shell searches for commands. For instance, suppose

PATH=/home/paul/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games

and the user runs the ls command. The shell will search through the PATH sequentially to find an executable file named ls:

/home/paul/bin/ls
/usr/local/bin/ls
/usr/sbin/ls
/usr/bin/ls
/bin/ls
/usr/local/games/ls

If an attacker can either change a user’s PATH environment variable or if one of the paths is publicly writable and appears before the “safe” system directories, then he can add a booby-trapped command in one of those directories. For example, if the user runs the ls command, the shell may pick up a booby-trapped version in the /usr/local/bin directory. Even if a user has trusted locations, such as /bin and /usr/bin foremost in the PATH, an intruder may place a misspelled version of a common command into another directory in the path. The safest remedy is to make sure there are no untrusted directories in PATH.

Some shells allow a user to set an ENV or BASH_ENV variable that contains the name of a file that will be executed as a script whenever a non-interactive shell is started (when a shell script is run, for example). If an attacker can change this variable then arbitrary commands may be added to the start of every shell script.

Shared library environment variables

In the distant past, programs used to be fully linked, meaning that all the code needed to run the program, aside from interactions with the operating system, was part of the executable program. Since so many programs use common libraries, such as the Standard C Library, they are not compiled into the code of an executable but instead are dynamically loaded when needed.

Similar to PATH, LD_LIBRARY_PATH is an environment variable used by the operating system’s program loader that contains a colon-separated list of directories where libraries should be searched. If an attacker can change a user’s LD_LIBRARY_PATH, common library functions can be overwritten with custom versions. The LD_PRELOAD environment variable allows one to explicitly specify shared libraries that contain functions that override standard library functions.

LD_LIBRARY_PATH and LD_PRELOAD will not give an attacker root access but they can be used to change the behavior of a program or to log library interactions. For example, by overwriting standard functions, one may change how a program generates encryption keys, uses random numbers, sets delays in games, reads input, and writes output.

As an example, let’s suppose we have a program that prints random numbers:

#include <time.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char **argv)
{
	int i;

	srand(time(NULL));
	for (i=0; i < 10; i++)
		printf("%d\n", rand()%100);
	return 0;
}

We can compile this via:

$ gcc -o random random.c

When run, we may get output containing 10 random numbers:

$ ./random
9
57
13
1
83
86
45
63
51
5

Let us write a replacement rand function that always returns the same value. We’ll put it in a file called rand.c:

int rand() {
	return 42;
}

We compile it into a shared library named newrandom.so:

$ gcc -shared -fPIC rand.c -o newrandom.so

Now we set the LD_PRELOAD environment variable to this library and run the program:

$ export LD_PRELOAD=$PWD/newrandom.so
$ ./random
42
42
42
42
42
42
42
42
42
42

Note that our program now behaves differently, and we did not have to recompile it or run it differently.

Input sanitization

The important lesson in writing code that uses any user input in forming commands is that of input sanitization. Input must be carefully validated to make sure it conforms to the requirements of the application that uses it and does not try to execute additional commands, escape to a shell, set malicious environment variables, or specify out-of-bounds directories or devices.
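The safest approach is an allow-list: accept only characters known to be harmless and reject everything else, rather than trying to escape dangerous ones. A minimal sketch, assuming a hypothetical policy for user names:

#include <ctype.h>

int valid_username(const char *s)
{
	if (*s == '\0')
		return 0;		/* reject empty names */
	for (; *s; s++)
		if (!isalnum((unsigned char)*s) &&
		    *s != '.' && *s != '_' && *s != '-')
			return 0;	/* reject anything outside the allow-list */
	return 1;
}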

File descriptors

POSIX systems have a convention that programs expect to receive three open file descriptors when they start up:

  • file descriptor 0: standard input
  • file descriptor 1: standard output
  • file descriptor 2: standard error

Functions such as printf, scanf, puts, getc and others expect these file descriptors to be available for input and output. When a program opens a new file, the operating system searches through the file descriptor table and allocates the first available unused file descriptor. Typically this will be file descriptor 3. However, if any of the three standard file descriptors are closed, the operating system will use one of those as an available, unused file descriptor.

The vulnerability lies in the fact that we may have a program running with elevated privileges (e.g., setuid root) that modifies a file that is not accessible to regular users. If that program also happens to write to the user via, say, printf, there is an opportunity to corrupt that file. The attacker simply needs to close the standard output (file descriptor 1) and run the program. When it opens its secret file, it will be given file descriptor 1 and will be able to do its read and write operations on the file. However, whenever the program will print a message to the user, the output will not be seen by the user as it will be directed to what printf assumes is the standard output: file descriptor 1. Printf output will be written onto the secret file, thereby corrupting it.

The shell command (bash, sh, or ksh) for closing the standard output file is an obscure-looking >&-. For example:

./testfile >&-
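A small sketch shows the effect (the filename secret is hypothetical). Run normally, open returns descriptor 3; run with standard output closed, it returns descriptor 1, and everything sent through printf lands in the file:

#include <fcntl.h>
#include <stdio.h>

int main(void)
{
	int fd = open("secret", O_RDWR | O_CREAT, 0600);

	printf("fd = %d\n", fd);	/* 3 normally; 1 if stdout was closed,
					   in which case this text is written
					   into "secret" instead of the screen */
	return 0;
}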

Comprehension Errors

The overwhelming majority of security problems are caused by bugs or misconfigurations. Both often stem from comprehension errors. These are mistakes created when someone – usually the programmer or administrator – does not understand the details and every nuance of what they are doing. Some examples include:

  • Not knowing all possible special characters that need escaping in SQL commands.

  • Not realizing that the standard input, output, or error file descriptors may be closed.

  • Not understanding how access control lists work or how to configure mandatory access control mechanisms such as type enforcement correctly.

If we consider the Windows CreateProcess function, we see it is defined as:

BOOL WINAPI CreateProcess(
  _In_opt_    LPCTSTR               lpApplicationName,
  _Inout_opt_ LPTSTR                lpCommandLine,
  _In_opt_    LPSECURITY_ATTRIBUTES lpProcessAttributes,
  _In_opt_    LPSECURITY_ATTRIBUTES lpThreadAttributes,
  _In_        BOOL                  bInheritHandles,
  _In_        DWORD                 dwCreationFlags,
  _In_opt_    LPVOID                lpEnvironment,
  _In_opt_    LPCTSTR               lpCurrentDirectory,
  _In_        LPSTARTUPINFO         lpStartupInfo,
  _Out_       LPPROCESS_INFORMATION lpProcessInformation);

We have to wonder whether a programmer who does not use this frequently will take the time to understand the ramifications of correctly setting process and thread security attributes, the current directory, environment, inheritance handles, and so on. There’s a good chance that the programmer will just look up an example on places such as github.com or stackoverflow.com and copy something that seems to work, unaware that there may be obscure side effects that compromise security.

As we will see in the following sections, comprehension errors also apply to the proper understanding of things as basic as various ways to express characters.

Directory parsing

Some applications, notably web servers, accept hierarchical filenames from a user but need to ensure that they restrict access only to files within a specific point in the directory tree. For example, a web server may need to ensure that no page requests go outside of /home/httpd/html.

An attacker may try to gain access by using paths that include .. (dot-dot), which is a link to the parent directory. For example, an attacker may try to download a password file by requesting

http://poopybrain.com/../../../etc/passwd

The hope is that the programmer did not implement parsing correctly and might try simply suffixing the user-requested path to a base directory:

"/home/httpd/html/" + "../../../etc/passwd"

to form

/home/httpd/html/../../../etc/passwd

which will retrieve the password file, /etc/passwd.

A programmer may anticipate this and check for dot-dot but has to realize that dot-dot directories can be anywhere in the path. This is also a valid pathname but one that should be rejected for trying to escape to the parent:

http://poopybrain.com/419/notes/../../416/../../../../etc/passwd

Moreover, the programmer cannot just search for .. because that can be a valid part of a filename. All three of these should be accepted:

http://poopybrain.com/419/notes/some..other..stuff/
http://poopybrain.com/419/notes/whatever../
http://poopybrain.com/419/notes/..more.stuff/

Also, extra slashes are perfectly fine in a filename, so this is acceptable:

http://poopybrain.com/419////notes///////..more.stuff/

The programmer should also track where the request is in the hierarchy. If dot-dot doesn’t escape above the base directory, it should most likely be accepted:

http://poopybrain.com/419/notes/../exams/

These are not insurmountable problems but they illustrate that a quick-and-dirty attempt at filename processing may be riddled with bugs.
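Rather than hand-parsing dot-dot, a safer sketch is to let the system canonicalize the name and then verify the prefix. Here, BASE is a hypothetical document root, and realpath resolves “..”, redundant slashes, and symbolic links (note that it requires the file to exist):

#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define BASE "/home/httpd/html"

int path_allowed(const char *requested)
{
	char resolved[PATH_MAX];
	size_t n = strlen(BASE);

	if (realpath(requested, resolved) == NULL)
		return 0;		/* nonexistent or unresolvable path */
	if (strncmp(resolved, BASE, n) != 0)
		return 0;		/* escaped the document root */
	return resolved[n] == '/' || resolved[n] == '\0';	/* reject names like
								   /home/httpd/htmlfoo */
}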

Unicode parsing

If we continue on the example of parsing pathnames in a web server, let us consider a bug in early releases of Microsoft’s IIS (Internet Information Services, their web server). IIS had proper pathname checking to ensure that attempts to get to a parent are blocked:

http://www.poopybrain.com/scripts/../../winnt/system32/cmd.exe

Once the pathname was validated, it was passed to a decode function that decoded any embedded Unicode characters and then processed the request.

The problem with this technique was that plain ASCII characters can also be written as multi-byte Unicode sequences. A “/” can be written in a URL as its hexadecimal value, %2f (decimal 47). It can also be represented as the two-byte Unicode sequence %c0%af.

The reason for this stems from the way Unicode was designed to support compatibility with one-byte ASCII characters. This encoding is called UTF-8. If the first bit of a character is a 0, then we have a one-byte ASCII character (in the range 0..127). However, if the first bit is a 1, we have a multi-byte character. The number of leading 1s determines the number of bytes that the character takes up. If a character starts with 110, we have a two-byte Unicode character.

With a two-byte character, the UTF-8 standard defines a bit pattern of

110a bcde   10fg hijk

The values a–k above represent 11 bits that give us a value in the range 0..2047. The “/” character, 0x2f, is 47 in decimal and 0010 1111 in binary. The value represents offset 47 into the character table (called a codepoint in Unicode parlance). Hence we can represent the “/” as 0x2f or as the two-byte Unicode sequence:

1100 0000   1010 1111

which is the hexadecimal sequence %c0%af. Technically, this is disallowed: the standard states that codepoints less than 128 must be represented as one byte, but the two-byte sequence is accepted by most Unicode parsers. An overlong three-byte sequence can be constructed as well.
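A lax decoder illustrates why: applying the standard two-byte formula to %c0%af yields codepoint 47, which is “/”. A minimal sketch:

#include <stdio.h>

int main(void)
{
	unsigned char b1 = 0xc0, b2 = 0xaf;	/* the %c0%af sequence */
	int codepoint = ((b1 & 0x1f) << 6) | (b2 & 0x3f);

	printf("codepoint = %d (%c)\n", codepoint, codepoint);	/* 47: '/' */
	return 0;
}

A strict decoder must reject this overlong form precisely because codepoint 47 fits in one byte.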

Microsoft’s bug was that the pathname check did not treat %c0%af as equivalent to a /, since that sequence should never be used to represent the character. However, the Unicode parser was happy to translate it, and attackers were able to use this to access any file on a server running IIS. This bug also gave attackers the ability to invoke cmd.exe, the command interpreter, and execute arbitrary commands on the server.

After Microsoft fixed the multi-byte Unicode bug, another problem came up. The parsing of escaped characters was recursive, so if the resultant string looked like a Unicode hexadecimal sequence, it would be re-parsed.

As an example of this, let’s consider the backslash (\), which Microsoft treats as equivalent to a slash (/) in URLs since its native pathname separator is a backslash.

The backslash can be written in a URL in hexadecimal format as %5c. The “%” character can be expressed as %25. The “5” character can be expressed as %35. The “c” character can be expressed as %63. Hence, if the URL parser sees the string %%35c, it would expand the %35 to the character “5”, which would result in %5c, which would then be converted to a \. If the parser sees %25%35%63, it would expand each of the %nn components to get the string %5c, which would then be converted to a \. As a final example, if the parser comes across %255c, it will expand %25 to % to get the string %5c, which would then be converted to a \.

It is not trivial to know what a name relates to but it is clear that all conversions have to be done before the validity of the pathname is checked. As for checking the validity of the pathname in an application, it is error-prone. The operating system itself parses a pathname a component at a time, traversing the directory tree and checking access rights as it goes along. The application is trying to recreate a similar action without actually traversing the file system but rather by just parsing the name and mapping it to a subtree of the file system namespace.

TOCTTOU attacks

TOCTTOU stands for Time of Check to Time of Use. If we have code of the form:

if I am allowed to do something
	then do it

we may be exposing ourselves to a race condition. There is a window of time between the test and the action. If an attacker can change the condition after the check then the action may take place even if the check should have failed.

One example of this is the print spooling program, lpr. It runs as a setuid program with root privileges so that it can copy a file from a user’s directory into a privileged spool directory that serves as a queue of files for printing. Because it runs as root, it can open any file, regardless of permissions. To keep the user honest, it will check access permissions on the file that the user wants to print and then, only if the user has legitimate read access to the file, it will copy it over to the spool directory for printing. An attacker can create a link to a readable file and then run lpr in the background. At the same time, he can change the link to point to a file for which he does not have read access. If the timing is just perfect, the lpr program will check access rights before the file is re-linked but will then copy the file for which the user has no read access.

Another example of the TOCTTOU race condition is the set of temporary filename creation functions (tmpnam, tempnam, mktemp, GetTempFileName, etc.). These functions create a unique filename when they are called, but there is no guarantee that an attacker doesn’t create a file with the same name before that filename is used. If the attacker creates and opens a file with the same name, she will have access to that file for as long as it is open, even if the user’s program changes access permissions for the file later on.

The best defense for the temporary file race condition is to use the mkstemp function, which creates a file based on a template name and opens it as well, avoiding the race condition between checking the uniqueness of the name and opening the file.
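A minimal sketch of the safe pattern (the /tmp/myapp- prefix is hypothetical): mkstemp fills in the XXXXXX, creates the file exclusively, and hands back an already-open descriptor, leaving no window between naming and opening:

#include <stdlib.h>

int make_temp_file(void)
{
	char name[] = "/tmp/myapp-XXXXXX";	/* template; XXXXXX is replaced */
	int fd = mkstemp(name);			/* creates and opens atomically */

	return fd;				/* -1 on failure */
}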

Application confinement

Access control, while essential, is not always sufficient for securing modern systems. Traditional access control mechanisms, modeled on an access matrix, do not address restricting the operations of individual processes. For the most part, they assume that a process has the full authority of the user’s ID under which it executes.

Isolation mechanisms like containers, jails, and namespaces provide mechanisms to restrict the damage that a compromised application may do.

chroot and Jailkits

The chroot command changes the root directory for a process and its children, creating an isolated directory hierarchy where they operate. Often called a “chroot jail,” this environment limits the process’s access to other parts of the file system, improving security by restricting the scope of its operations. However, chroot can only be safely executed by the root user. If ordinary users could create jails, an attacker could populate a jail with an alternate password file and a link to a setuid program that consults it, gain root privileges, and then escape the chroot jail, compromising system security.
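A sketch of how a privileged program typically enters a jail (the directory /jail and the target user ID are hypothetical): change into the jail first so no working directory remains outside it, then drop root so the process cannot simply chroot again to escape:

#include <sys/types.h>
#include <unistd.h>

int enter_jail(uid_t unprivileged_uid)
{
	if (chdir("/jail") != 0)
		return -1;
	if (chroot("/jail") != 0)
		return -1;			/* requires root */
	if (setuid(unprivileged_uid) != 0)
		return -1;			/* drop privileges */
	return 0;
}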

A jailkit helps manage chroot environments by providing tools to set up and manage these restricted areas more securely. Jailkits automate setting up a controlled environment, simplifying tasks like configuring file permissions, setting up shell access, and ensuring that the jailed process has limited capabilities. Jailkits are particularly useful for web hosting environments or isolated testing spaces, where it is necessary to confine a process within a specified directory.

FreeBSD Jails

The FreeBSD jail mechanism builds on the concept of chroot but introduces additional controls, making it more robust for isolating services. Jails not only restrict file system access but also limit network access, user processes, and the permissions of the root user within the jail. This means that even if a process within the jail gains root privileges, its actions are constrained to the jail environment. FreeBSD jails prevent the root user inside the jail from interfering with the host system, providing a more secure and controlled environment compared to chroot.

Linux Application Isolation

Linux provides several isolation mechanisms to securely manage applications, including namespaces, capabilities, and control groups (cgroups):

  1. Namespaces: Namespaces isolate different aspects of the system environment for processes, giving each process a unique view of system resources:
  • IPC (Inter-Process Communication): Isolates communication between processes, restricting shared memory access.
  • Network: Gives each process its own network stack, with separate network interfaces, routing tables, and firewall rules.
  • Mount (File System): Provides isolated file system views, so a process can have its own file hierarchy.
  • PID (Process IDs): Creates isolated process trees, allowing processes to have their own set of process IDs.
  • User/Group IDs: Allows for mapping of user IDs within namespaces, enabling processes to have different user IDs from the host.
  • UTS (host and domain names): Offers independent host and domain names in each namespace.
  2. Capabilities: Linux capabilities break down root privileges into smaller units, allowing processes to execute specific privileged operations without full root access. For example, a process can have network control or file modification privileges without full system control, improving security by reducing the risk of abuse if the process is compromised. Capabilities allow an administrator to grant a process specific elevated privileges, regardless of what user ID that process runs under. Even if it runs as root (user ID 0), it can still have a limited ability to run privileged operations.

  3. Control Groups (cgroups): Cgroups manage resource allocation for processes, limiting CPU, memory, file I/O, and network I/O usage. This prevents processes from monopolizing resources, maintaining system stability and performance.

Containers

Containers are lightweight environments that package an application and its dependencies into isolated user spaces, leveraging namespaces, cgroups, and capabilities for security. This approach allows each container to operate independently on a shared OS, with controlled access to system resources.

Containers separate policy from enforcement by abstracting the application environment, reducing comprehension errors by simplifying dependency management. Unlike virtual machines (VMs), containers share the host OS kernel, making them faster and more resource-efficient since they do not require a full OS for each instance.

Key Components of Containers

  • Namespaces: Isolate processes within their own environments.
  • Cgroups: Control resource allocation to prevent resource overuse.
  • Capabilities: Limit privileged operations, reducing security risks.
  • Copy-on-Write File System: Allows containers to share a base file system while adding unique changes, saving space and improving efficiency.

While containers and virtual machines (VMs) both provide isolation, they differ fundamentally. Containers share the host OS kernel, making them more lightweight and faster to deploy. VMs, in contrast, emulate entire systems, including a separate OS, which consumes more resources but provides stronger isolation due to the OS-level separation.

Despite their benefits, containers can introduce security risks. For example, shared kernel vulnerabilities may allow a compromised container to impact the host. Furthermore, misconfigured capabilities or insecure default settings can lead to privilege escalation attacks. Using up-to-date container images and adhering to strict privilege management can help mitigate these risks.

Virtual Machines

As a general concept, virtualization is the addition of a layer of abstraction to physical devices. With virtual memory, for example, a process has the impression that it owns the entire memory address space. Different processes can all access the same virtual memory location and the memory management unit (MMU) on the processor maps each access to the unique physical memory locations that are assigned to the process.

Process virtual machines present a virtual CPU that allows programs to execute on a processor that does not physically exist. The instructions are interpreted by a program that simulates the architecture of the pseudo-machine. Early pseudo-machines included O-code for BCPL and P-code for Pascal. The most popular pseudo-machine today is the Java Virtual Machine (JVM). This simulated hardware does not even pretend to access the underlying system at a hardware level. Process virtual machines will often allow “special” calls to invoke system functions or provide a simulation of some generic hardware platform.

Operating system virtualization is provided by containers, where a group of processes is presented with the illusion of running on a separate operating system but, in reality, shares the operating system with other groups of processes – they are just not visible to the processes in the container.

System virtual machines allow a physical computer to act like several real machines, with each machine running its own operating system (on a virtual machine) and applications that interact with that operating system. The key to this machine virtualization is not to allow each operating system to have direct access to certain privileged instructions in the processor. These instructions would allow an operating system to directly access I/O ports, MMU settings, the task register, the halt instruction, and other parts of the processor that could interfere with the processor’s behavior and with the other operating systems on the system. Instead, a trap and emulate approach is used. Privileged instructions, as well as system interrupts, are caught by the Virtual Machine Monitor (VMM), also known as a hypervisor. The hypervisor arbitrates access to physical resources and presents a set of virtual device interfaces to each guest operating system (including the memory management unit, I/O ports, disks, and network interfaces). The hypervisor also handles preemption. Just as an operating system may suspend a process to allow another process to run, the hypervisor will suspend an operating system to give other operating systems a chance to run.

The two configurations of virtual machines are hosted virtual machines and native virtual machines. With a hosted virtual machine (also called a type 2 hypervisor), the computer has a primary operating system installed that has access to the raw machine (all devices, memory, and file system). This host operating system does not run in a virtual environment. One or more guest operating systems can then be run on virtual machines. The VMM serves as a proxy, converting requests from the virtual machine into operations that get sent to and executed on the host operating system. A native virtual machine (also called a type 1 hypervisor) is one where there is no “primary” operating system that owns the system hardware. The hypervisor is in charge of access to the devices and provides each operating system drivers for an abstract view of all the devices.

Security implications

Virtual machines (VMs) provide a deep layer of isolation, encapsulating the operating system along with all the applications it runs and the files it needs within a secure environment separate from the physical hardware. Unlike lighter confinement methods such as containers, a compromise within a VM affects only that VM, much like a contained issue on a separate physical machine.

Despite this isolation, VMs can still pose risks if compromised. Malicious entities can exploit VMs to attempt attacks on other systems within the same physical environment, leveraging the shared physical resources. Such scenarios underscore potential vulnerabilities in even well-isolated environments, highlighting the need for vigilant security practices across all layers.

A specific threat in such environments is the creation of covert channels through side-channel attacks. These channels exploit system behaviors like CPU load variations to clandestinely transmit information between VMs, bypassing conventional communication restrictions. This technique reveals how attackers can bridge gaps between highly secure and less secure systems, manipulating physical resource signals to communicate stealthily.

Application Sandboxing

Application sandboxing provides a restricted environment to safely execute potentially harmful software, minimizing system-wide risks. It restricts program operations based on predefined rules, allowing only certain actions within the system.

This mechanism is crucial for running applications from unknown sources and is also extensively used by security researchers to monitor software behavior and detect malware. Sandboxes enforce restrictions on file access, network usage, and other system interactions, offering a fundamental layer of security by controlling application capabilities in a more fine-grained manner than traditional methods like containers or jails.

While mechanisms like jails and containers, which include namespaces, control groups, and capabilities, are great for creating an environment to run services without the overhead of deploying virtual machines, they do not fully address the ability to restrict what normal applications can do.

We want to protect users from their applications: give users the ability to run apps but restrict what those apps can do on a per-app basis, such as opening files only with a certain name or permitting only TCP networking.

Sandboxing is currently supported on a wide variety of platforms at either the kernel or application level. We’ll examine three ways in which they can be built.

1. Application sandboxing via system call interposition & user-level validation

An example of a user-level sandbox is the Janus sandbox. Application sandboxing with Janus involves creating policies to define permissible system calls for each application. Janus uses a kernel module to intercept these calls and sends them to a user-level monitor program that decides whether to allow or block the call based on the configured policy file. Challenges include maintaining system state across processes and handling complex scenarios like network and file operations, pathname parsing, and potential race conditions (TOCTTOU issues).

2. Application sandboxing with integrated OS support

The better alternative to having a user-level process decide on whether to permit system calls is to incorporate policy validation in the kernel. Some operating systems provide kernel support for sandboxing. These include the Android Application Sandbox, the iOS App Sandbox, the macOS sandbox, and AppArmor on Linux. Microsoft introduced the Windows Sandbox in December 2018, but this functions far more like a container than a traditional application sandbox, giving the process an isolated execution environment.

Seccomp-BPF (SECure COMPuting with Berkeley Packet Filters) is a Linux security framework that enables limits on which system calls a process can execute. It uses the Berkeley Packet Filter to evaluate system calls as “packets,” applying rules that govern their execution. Though it doesn’t provide complete isolation on its own, Seccomp is an essential component for constructing robust application sandboxes when combined with other mechanisms like namespaces and control groups.
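As a sketch of the idea, using the libseccomp wrapper library (compile with -lseccomp): everything is denied by default and only the listed system calls are allowed:

#include <seccomp.h>
#include <unistd.h>

int main(void)
{
	scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);	/* default: kill process */

	seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
	seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
	seccomp_load(ctx);					/* filter is now active */

	write(1, "still allowed\n", 14);	/* permitted */
	return 0;				/* exit_group is permitted too */
}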

3. Process virtual machine sandboxes: Java

The Java Virtual Machine (JVM) was designed to run compiled Java applications in a controlled manner on any system regardless of the operating system or hardware architecture. The JVM employs three main components to ensure security:

  1. Bytecode Verifier: It scrutinizes Java bytecode before execution to confirm that it adheres strictly to Java’s specifications and does not attempt unsafe operations such as bypassing access controls or violating array bounds.

  2. Class Loader: This component safeguards against the loading of untrusted classes and ensures the integrity of runtime environments through Address Space Layout Randomization (ASLR), maintaining the security of essential class libraries.

  3. Security Manager: This enforces protection domains that define permissible actions within the JVM. It intercepts calls to sensitive methods, verifying permissions against a security policy, which can restrict actions like file and network access, preventing operations not allowed by the policy.

Building an effective sandbox in Java has proven complex, highlighted by persistent bugs, especially in the underlying C libraries and across different JVM implementations. Moreover, Java’s allowance for native methods can bypass these security mechanisms, introducing potential risks.

Malware

Malware is a term that refers to any malicious software that is unintentionally installed on a computer system. Malware can be distributed in various ways: viruses, worms, unintentional downloads, or Trojan horses. It may spy on user actions and collect information about them (spyware), or present unwanted ads (adware). It may disable components of the system or encrypt files, undoing its damage only if the owner pays money (ransomware). The software may sit dormant and wait for directives from some coordinator (a command and control server) that has assembled an arsenal of hundreds of thousands of computers ready to do its bidding (for example, launch a distributed denial of service, DDoS, attack). Some software might be legitimate but may contain backdoors – undocumented ways to allow an outsider to use that software to perform other operations on your system.

Functions of malware

Malware can perform a variety of functions:

Destruction and denial of service
Wiper malware can delete files or format the entire file system, deleting even the operating system itself. Denial of service (DoS) attacks can flood a network or server with requests to make services unavailable to legitimate users. Other forms of DoS attacks can lock users out of their computers or destroy devices.
Exfiltration
Exfiltration refers to stealing data. Malware can upload confidential files, authentication credentials, and messages. Spyware can track a user’s activity, acquiring browsing history, messages being sent or received, and file access, capture keyboard operations via keyloggers, and record camera and microphone inputs. A side-channel attack exploits unintentional information leaks, such as timing, power consumption, or electromagnetic emissions, to infer sensitive data when direct communication with the system may not be available.
Bots
Bots are processes that are deployed by an attacker and usually sit dormant. The attacked systems are referred to as zombies. These zombies periodically contact a Command & Control (C&C) server that, at the right time, can give them directions for an attack. These directions will often require downloading additional software needed for an attack. Attackers can deploy bots across millions of compromised computers, creating an army of them that is called a botnet. This is instrumental in carrying out distributed denial of service (DDoS) attacks or compute-intensive crypto mining.
Backdoors
A backdoor is a type of malicious code that, once installed, allows an attacker remote access to a computer or network while remaining hidden. This access typically bypasses normal authentication processes, giving attackers the ability to remotely control the affected system, steal sensitive data, or deploy additional malware. For example, a backdoor in a computer system could allow an attacker to remotely execute commands, manipulate files, and monitor user activities without detection and without logging onto the system.
Ransomware
Ransomware is software that will typically lock users out of their systems or encrypt their files, demanding payment to re-enable access or avoid disclosure. It may include running a wiper to delete data permanently if the ransom isn’t paid. There are various forms of ransomware, which include:
  • Crypto ransomware: Denial of service malware that encrypts files or storage devices.
  • Locker ransomware: Denial of service malware that locks users out of their devices.
  • Extortion ransomware: Exfiltrates data to a remote site and threatens to expose it.
  • Double extortion ransomware: Exfiltrates data to a remote site before encrypting it and threatens to disclose it if the ransom isn’t paid.
Adware
Adware is generally non-destructive but is unwanted. It automatically displays or downloads advertising material such as banners or pop-ups when a user is online. It’s often bundled with free software or services, providing revenue to developers while offering the software at no cost to the user. Adware may compromise privacy by tracking user behavior to target ads more effectively.

Malware Infiltration mechanisms

There are various ways in which malware gets onto a system but the mechanisms fall into two categories:

  1. An attacker exploited some vulnerability to enable the malware to be installed.
  2. You installed the malware unknowingly.

Zero-day vulnerabilities refer to software flaws that are unknown to those who would be interested in mitigating the vulnerability, such as the vendor. The term “zero-day” indicates that the developers have zero days to fix the issue because it has already been exploited in the wild. These vulnerabilities are highly sought after by attackers because they are effective until discovered and patched.

Example: If a hacker discovers an unknown vulnerability in a web browser that allows unauthorized administrative access and this flaw is exploited before the developer becomes aware and fixes it, that is a zero-day vulnerability.

N-day vulnerabilities, also known as known vulnerabilities, refer to software flaws that have been publicly disclosed and for which a patch is often available. The “N” in N-day represents the number of days that have elapsed since the vulnerability was disclosed. Unlike zero-day vulnerabilities, N-day vulnerabilities are already known to vendors and cybersecurity professionals, and patches or workarounds are typically developed to mitigate them.

Example: A vulnerability in an operating system that allows elevation of privileges is reported and patched. If attackers exploit this vulnerability after the patch is released, it is considered an N-day vulnerability, as the patch availability makes it “known.”

Zero-click vulnerabilities are security flaws that allow attackers to execute malicious actions on a target device without requiring any interaction from the user. Unlike traditional exploits that rely on actions like clicking a link or opening a file, zero-click exploits can be triggered automatically, such as when the device processes a specially crafted message, email, or network packet. These vulnerabilities are particularly dangerous because they operate stealthily, often leaving no trace and bypassing user awareness, making them ideal for targeted attacks like espionage or data theft.

Worms and viruses

A virus is a type of malware that attaches itself to a legitimate program and requires human interaction, such as running the infected program, to spread and execute its malicious activities.

Conversely, a worm is a standalone malware that self-replicates and spreads independently across networks without the need for attachment to a specific program or human interaction. For example, a worm might exploit vulnerabilities in a network to spread itself, while a virus might spread via email attachments opened by unsuspecting users.

The distinction from a virus is that a worm runs as a standalone process while a virus requires a host program.

The popular use of both terms, worm and virus, has often blurred the distinctions between them. People often refer to any malware as a virus. Their malicious effects can be similar.

Malware components

Key components of malware include:

Infection Mechanism: The method by which malware spreads or inserts itself into a system, such as through email attachments or exploiting vulnerabilities.

Packer: A tool that compresses or encrypts malware to evade detection from anti-virus software, often making it harder to analyze or identify the malware.

Dropper: A small helper program that installs the main malware, often avoiding detection by not containing the malicious code itself.

Payload: The part of malware designed to perform malicious actions, ranging from data theft to system damage.

Trigger: A condition or event that activates the malware’s payload, like a specific date or user action.

File infector viruses

A file infector virus is a type of malware that attaches itself to executable files and spreads by modifying other executable files it can access. When an infected file is launched, the virus is executed, usually performing malicious actions while also seeking other files to infect. This used to be the dominant mechanism for malware propagation in the early days of PCs but is more challenging with systems where users have restricted permissions or where the OS validates the digital signature of applications and drivers.

Infected flash drives

Malware can spread through USB devices in several ways:

  1. Unprotected USB Firmware: Some malware targets the firmware of USB devices, which can be rewritten to include malicious code. When such a compromised device is plugged into any computer, the malware in the firmware can activate, causing the USB device to, for example, behave like a keyboard in addition to a storage device and send keyboard events that invoke a shell and run commands.

  2. USB Drop Attack: This method involves intentionally leaving infected USB drives in public or easily accessible places. Unsuspecting individuals who find and use these drives on their computers inadvertently trigger malware installation.

  3. Malicious Software or Links: USB drives may contain files that, when executed, install malware directly, or they may include links that lead to malicious websites. Opening these files or following these links can initiate the download and installation of harmful software.

Macro viruses

Macro viruses are a type of malware that embed themselves in documents and are executed when the document is opened. They are commonly written in Visual Basic for Applications, targeting Microsoft Office applications. Once activated, they can infect not only the document in which they reside but also other documents, spreading rapidly. These viruses can perform a series of operations from simple annoyances to damaging actions like corrupting files or sending data to third parties.

Even though Microsoft presents a warning about macros, users often explicitly permit them because they believe the content they are accessing is legitimate. Microsoft patched bugs that allowed macros to run without the user’s authorization but, as of 2022, attackers were still finding ways around these barriers.

Social engineering

By far the most common way that malware enters a system is via deception: the legitimate user of the system installed it unknowingly. This uses a social engineering attack to convince the user that it is in his or her interest to install the software. Social engineering is the art of manipulating, influencing, or deceiving a user into taking some action that is not in his/her or the organization’s best interest.

Attackers exploit human psychology rather than technical hacking techniques to infiltrate systems. This can involve phishing emails, pretexting, baiting with infected media, or any form of communication designed to elicit trust, provoke fear, or create urgency, leading individuals to reveal passwords, install malware, or open malicious links.

Any information the attacker can get about a user can help an attacker create a more convincing social attack. The term pretexting refers to using a concocted scenario to contact a user and get additional information (e.g., an attacker can pretend to be a caller from the IT department or a high-level manager from another location to try to extract information; with some rudimentary information, the attacker can mention some employee, department, or project names to sound like a true insider).

Phishing is a type of cyber attack that involves tricking individuals into revealing sensitive information or downloading malware by masquerading as a trustworthy entity in electronic communications, typically through emails that appear to come from reputable sources. Spear phishing is a more targeted version of phishing, where the attacker chooses specific individuals or organizations and tailors the message based on their characteristics, job positions, or other personal information to increase the likelihood of success. This specificity makes spear phishing significantly more effective and dangerous than generic phishing.

Credential stuffing

An attacker may obtain collections of stolen email addresses (or usernames) and passwords. Since people often use the same name and password on multiple systems, this often gives the attacker access to services on other websites on which the user has accounts. Accounts for banking sites are, of course, particularly valuable since they can be a direct conduit for transferring money. This attack is called credential stuffing.

In some situations, such as getting access to a user’s email accounts, an attacker can log onto the systems or services as the owner of the account and install malware, monitor the internal organization, and even send email, disguised as the user (e.g., contact other employees or friends), which becomes a powerful social engineering attack.

Where does malware live?

File infector virus

A file infector virus is a virus that adds itself to an executable program. The virus patches the program so that, upon running, control will flow to the virus code. Ideally, the code will install itself in some unused area of the file so that the file length will remain unchanged. A comparison of file sizes with the same programs on other systems will not reveal anything suspicious. When the virus runs, it will run the infection mechanism to decide whether to install itself in other files. The trigger will then decide whether the payload should be executed. If not, the program will appear to run normally.

Bootloader malware

A bootkit, also known as a boot sector virus, is a type of malware that infects the master boot record (MBR) or similar critical startup sectors of a computer. It loads itself before the operating system starts, giving it high-level control over the system and making it extremely difficult to detect and remove. Bootkits are often used to bypass operating system security measures and provide persistent access to the infected machine, even surviving system reinstalls if the MBR is not specifically cleaned.

JavaScript and PDF files

JavaScript, like Visual Basic, has evolved into a full programming language. Most browsers have had security holes that involve JavaScript. JavaScript can not only modify the content and structure of a web page but can also connect to other sites. This allows any malicious site to leverage your machine. For example, scripts can perform port scans on a range of IP addresses and report any detected unsecured services.

PDF (Portable Document Format) files would seem to be innocent printable documents, incapable of harboring executable code. However, PDF is a complex format that can contain a mix of static and dynamic elements. Dynamic elements may contain JavaScript, dynamic action triggers (e.g., “on open”), and the ability to retrieve “live” data via embedded URLs. As with Visual Basic scripts, PDF readers warn users of dynamic content but, depending on the social engineering around the file, the user may choose to trust the file … or not even pay attention to the warning in yet-another-dialog-box.

Backdoors

A backdoor is software that is designed with some undocumented mechanism to allow someone who knows about it to be able to access the system or specific functions in a way that bypasses proper authentication mechanisms. In many cases, they are not designed for malicious use: they may allow a manufacturer to troubleshoot a device or a software author to push an update. However, if adversarial parties discover the presence of a backdoor, they can use it for malicious purposes.

Trojans

A Trojan horse is a program with two purposes: an overt purpose and a covert one. The overt purpose is what compels the user to get and run the program in the first place. The covert purpose is unknown to the user and is the malicious part of the program.

For example, a script with the name of a common Linux command might be added to a target user’s search path. When the user runs the command, the script is run. That script may, in turn, execute the proper command, leading the user to believe that all is well. As a side effect, the script may create a setuid shell to allow the attacker to impersonate that user, or mail copies of critical data to the attacker. Users install Trojans because they believe they are installing useful software, such as an anti-virus tool (BTW, a lot of downloadable hacker tools contain Trojans: hackers hacking wannabe hackers). The covert side of this software can activate cameras, enable key loggers, or deploy bots for anonymization servers, DDoS attacks, or spam attacks.

Trojans may include programs (games, utilities, anti-malware programs), downloading services, rootkits (see below), and backdoors (described earlier). They appear to perform a useful task that does not raise suspicion on the part of the victim.

A Remote Access Trojan (RAT) is a type of Trojan that gives attackers unauthorized remote control over a victim’s computer, allowing them to perform actions like stealing data, monitoring activity, or installing additional malware. It’s a Trojan with a backdoor.

Rootkits

A rootkit is software that is designed to allow an attacker to access a computer and hide the existence of the software … and sometimes hide the presence of the user on the system.

Historically, a basic rootkit would replace common administration commands (such as ps, ls, find, top, netstat, etc.) with commands that mimic their operation but hide the presence of intruding users, intruding processes, and intruding files. The idea is that a system administrator should be able to examine the system and believe that all is fine and the system is free of malware (or of unknown user accounts).

User mode rootkits
A user mode rootkit involves replacing commands, interposing libraries to intercept calls, and patching commonly-used APIs that might divulge the presence of the malware. A skilled administrator may find unmodified commands or import software to detect the intruding software.
Kernel mode rootkits
A kernel mode rootkit is installed as a kernel module. Being in the kernel gives the rootkit unrestricted access to all system resources and the ability to patch kernel structures and system calls. For example, directory listings from the getdents64 system call may not report any names that match the malware. Commands and libraries can be replaced and not give any indication that malicious software is resident in the system.
Hypervisor rootkits
The most insidious rootkits are hypervisor rootkits. A hypervisor rootkit is a type of rootkit that attacks virtualized environments by targeting the hypervisor layer that controls the virtual machines. By infecting the hypervisor, the rootkit can gain control over all the virtual machines running on the host, enabling it to monitor and manipulate operations on these machines. This level of control makes detection and removal exceptionally challenging, as the rootkit can hide its presence from both the operating system and antivirus programs running on the virtual machines.

Deceptive web sites

Malicious links in phishing attacks often lead users to deceptive websites designed to steal login credentials. Attackers register domain names that closely resemble legitimate ones, often exploiting common typing errors (e.g., “gooogle.com” instead of “google.com”), a technique known as typosquatting, to deceive users into thinking they are visiting legitimate sites.

A related method, combosquatting, uses domains combining legitimate brands with extra terms to create convincing URLs (e.g., “secure-bank-login.com”).

Some sites, even if not inherently malicious, embed harmful ads or links that trick users, such as file-serving or conversion sites with disguised ads resembling download buttons.

Deceptive pop-ups, like fake browser errors, further exploit users by urging them to take harmful actions, such as downloading malware disguised as critical updates. These tactics rely on social engineering to compromise systems without exploiting technical vulnerabilities.

Defenses

Malware was particularly easy to spread on older Windows systems since user accounts, and hence processes, ran with full administrative rights, which made it easy to modify any files on the system and even install kernel drivers. Adding file protection mechanisms, such as a distinction between user and administrator accounts, added a significant layer of protection. However, malware installed by the user would run with that user’s privileges and would have full access to all of that user’s files. Even if some files are read- or write-protected, malware running as the user can change their DAC (discretionary access control) permissions.

Systems then took the approach of warning users when a program wanted to install software or asked for elevated privileges. Social engineering hopes to convince users that they actually want to install the software (or view the document); they will happily grant permissions and install the malware. MAC (mandatory access control) permissions can stop some viruses since the malware will not be able, for instance, to override write permissions on executable files, but macro viruses and users’ own files are still a problem.

In general, however, studies have shown that simply taking away admin rights from users (avoiding privilege escalation) would have mitigated 94% of the 530 Microsoft vulnerabilities reported in 2016, and 100% of the vulnerabilities in Office 2016.

Anti-virus (anti-malware) software

There is no way to recognize all possible viruses. Anti-virus software uses two strategies: signature-based and behavior-based approaches.

With signature-based systems, anti-virus programs look for byte sequences that match those in known malware. Each bit pattern is an excerpt of code from a known virus and is called a signature. A virus signature is simply a set of bytes that make up a portion of the virus and allow scanning software to see whether that virus is embedded in a file. The hope is that the signature is long enough and unique enough that the byte pattern will not occur in legitimate programs. This scanning process is called signature scanning. Lists of signatures (“virus definitions”) have to be updated by the anti-virus software vendor as new viruses are discovered. Signature-based detection is used by most anti-virus products.
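
To make the idea concrete, here is a toy signature scanner: it searches a file for one hard-coded byte pattern. The signature bytes below are invented for illustration; real products index enormous signature databases and use fast multi-pattern algorithms such as Aho-Corasick rather than this naive loop:

    /* Toy signature scanner: does the file contain a known byte pattern? */
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical signature bytes, made up for this example. */
    static const unsigned char signature[] = { 0xEB, 0xFE, 0x90, 0x90, 0xCD, 0x21 };

    static int contains_signature(const unsigned char *buf, size_t len) {
        if (len < sizeof signature)
            return 0;
        for (size_t i = 0; i <= len - sizeof signature; i++)
            if (memcmp(buf + i, signature, sizeof signature) == 0)
                return 1;                     /* pattern found */
        return 0;
    }

    int main(int argc, char *argv[]) {
        static unsigned char buf[1 << 20];    /* scan up to 1 MB for brevity */
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 2; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 2; }
        size_t len = fread(buf, 1, sizeof buf, f);
        fclose(f);
        printf("%s: %s\n", argv[1],
               contains_signature(buf, len) ? "signature found" : "clean");
        return 0;
    }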

A behavior-based system monitors the activities of a process (typically the system calls or standard library calls that it makes). Ideally, sandboxing is employed to ensure that the suspected code runs within a sandbox, or even in an interpreted environment within a sandbox, so that it cannot cause real damage. Behavior-based systems try to perform anomaly detection. If the observed activity is deemed suspicious, the process is terminated and the user alerted. Sandboxed, behavior-based analysis is often run by anti-malware companies to examine what a piece of suspected malware is actually doing and whether it should be considered a virus. A behavior-based system can identify previously-unseen malware, but these systems tend to have higher false positive rates of detection: it is difficult to characterize exactly what set of operations constitutes suspicious behavior.

Malware Countermeasures

Some viruses will take measures to try to defend themselves from anti-virus software.

Signature scanning countermeasures

A common thing to do in malware is to use a packer on the code, unpacking it prior to execution. Packing can be one of several operations:

  • Simply obscure the malware payload by exclusive-oring (xor) it with a repeating byte pattern (exclusive-oring the data with the same byte pattern reconstructs it; see the sketch after this list).
  • Compress the code and then uncompress it upon loading it prior to execution.
  • Encrypt the code and decrypt it prior to execution.
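
As a sketch of the first technique above, the following routine XORs a buffer with a repeating key. Because XOR is its own inverse, the identical routine serves as both the packer and the unpacker stub; the key is arbitrary:

    #include <stddef.h>

    /* XOR "packing": applying the same transformation twice restores the data. */
    void xor_transform(unsigned char *buf, size_t len,
                       const unsigned char *key, size_t keylen) {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= key[i % keylen];   /* (x ^ k) ^ k == x */
    }

Malware stores its payload packed on disk so the payload’s signature never appears in the file; a small stub runs this loop at load time to reconstruct the code, leaving only the stub itself available for signature matching.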

All of these techniques will change the signature of a virus. One can scan for a signature of a compressed version of the virus but there are dozens of compression algorithms around, so the scanning process gets more complicated.

With encryption (xor is a simple form of encryption), only the non-encrypted part of the virus contains the unpacking software (decryption software and the key). A virus scanner will need to match the code for the unpacker component since the key and the encrypted components can change each time the virus propagates itself.

Polymorphic viruses mutate their code each time they run while keeping the algorithm the same. This involves replacing sequences of instructions with functionally-identical ones. For example, one can change additions to subtractions of negative numbers, invert conditional tests and branches, and insert or remove no-op instructions. This thwarts signature scanning software because the byte pattern of the virus is different each time.

Access control countermeasures

Access controls help but do not stop the problem of malware. Containment mechanisms such as containers work well for server software but are usually impractical for user software (e.g., you want Microsoft Word to be able to read documents anywhere in a user’s directories). Application sandboxing is generally far more effective and is a dominant technique used in mobile software.

Trojans, deceptive downloads, and phishing attacks are insidiously difficult to defend against since we are dealing with human nature: users want to install the software or provide the data. They are conditioned to accepting pop-up messages and entering a password. Better detection in browsers and mail clients against suspicious content or URLs helps. However, malware distributors have been known to simply ask a user to rename a file to turn it into one that is recognized by the operating system as an executable file (or a disk image, PDF, or whatever format the malware comes in but may otherwise be filtered by the mail server or web browser).

Sandboxing countermeasures

Viruses are unlikely to get through a sandbox (unless there are vulnerabilities or an improper configuration). However, there are ways in which malware can counter sandboxing:

  1. Vendor examination
    Anti-virus vendors often test software within a tightly configured sandboxed environment so they can detect whether the software is doing anything malicious (e.g., accessing files, devices, or the network in ways it is not supposed to). If they detect malware, they will dig in further and extract a signature so they can update and distribute their list of virus definitions. Viruses can try to get through this examination phase by setting a trigger to keep the virus from immediately performing malicious actions or to stay dormant for the first several invocations. The hope is that the anti-virus vendors will not see anything suspicious and the virus will never be flagged as such by their software.

  2. User configuration (entitlements)
    Virtually all mobile applications, and increasingly more desktop/laptop applications, are run with application sandboxes in place. These may disallow malware from accessing files, devices, or the network. However, it never hurts to ask. The software can simply ask the user to modify the sandbox settings. If social engineering is successful, the user may not even be suspicious and not wonder why a game wants access to contacts or location information.

Network Security

The Internet is designed to interconnect various networks, each potentially using different hardware and protocols, with the Internet Protocol (IP) providing a logical structure atop these physical networks. IP inherently expects unreliability from underlying networks, delegating the task of packet loss detection and retransmission to higher layers like TCP or applications. Communication via IP involves multiple routers and networks, which may compromise security due to their unknown trust levels.

The OSI model helps describe the networking protocol stacks for IP:

  1. Physical Layer: Involves the actual network hardware.
  2. Data Link Layer: Manages protocols for local networks like Ethernet or Wi-Fi.
  3. Network Layer: Handles logical networking and routing across physical networks via IP.
  4. Transport Layer: Manages logical connections, ensuring reliable data transmission through TCP, or provides simpler, unreliable communication via UDP.

Each layer plays a critical role in ensuring data is transmitted securely and efficiently across the internet.

Data link layer

In an Ethernet network, the data link layer is handled by Ethernet transceivers and Ethernet switches. Security was not a consideration in the design of this layer and several fundamental attacks exist at this layer. Wi-Fi also operates at the data link layer and uses the same address structure as Ethernet. It adds encryption on wireless data between the device and the access point. Note that the encryption is not end-to-end between hosts, but only covers the link to the access point.

Switch CAM table overflow

Sniff all data on the local area network (LAN).

A CAM table overflow attack exploits the self-learning mechanism of a network switch, which uses a content addressable memory (CAM) table to map MAC addresses to switch ports for efficient packet forwarding. By flooding the switch with fake MAC addresses, an attacker can overflow the CAM table. Once the table is full, the switch behaves like a hub, broadcasting packets to all ports, thus allowing the attacker to intercept data. To protect against this, port security can be configured to limit the number of MAC addresses allowed on a port, preventing unauthorized devices from overwhelming the CAM table.

VLAN hopping (switch spoofing)

Sniff all data from connected virtual local area networks.

A VLAN hopping attack exploits VLAN (Virtual Local Area Network) configurations to gain unauthorized access to multiple VLANs. VLANs segregate network traffic for enhanced security and efficiency. Since switches can connect to other switches, VLAN trunking, managed via the IEEE 802.1Q standard, allows multiple VLANs to share a single physical connection between switches.

Attackers can perform switch spoofing by emulating a switch, tricking a real switch into thinking it’s connected to another switch. This allows the attacker’s device to receive traffic across all VLANs. Defending against such attacks involves configuring managed switches to restrict trunking to authorized ports.

ARP cache poisoning

Redirect IP packets by changing the IP address to MAC address mapping.

An ARP cache poisoning attack exploits the Address Resolution Protocol (ARP), which is used by the operating system to map IP addresses to MAC addresses. Attackers can respond falsely to ARP queries or send gratuitous ARP responses not associated with a request, claiming their MAC address corresponds to another device’s IP address. This corrupts the ARP caches of devices on the network.

Defenses include Dynamic ARP Inspection on switches, which verifies ARP packets against a trusted list, and static ARP entries to prevent unauthorized changes.

DHCP spoofing

Configure new devices on the LAN with your choice of DNS address, router address, etc.

DHCP spoofing attacks target the Dynamic Host Configuration Protocol (DHCP), which networks use to assign IP addresses and network configuration parameters to devices dynamically.

The attack begins with a DHCP Discover message, which devices broadcast to find DHCP servers. Malicious actors respond to these messages before legitimate servers, directing devices to use attacker-specified DNS or gateway settings. This redirection allows attackers to intercept, manipulate, or block data.

The problem is challenging to mitigate because of the trust placed in network broadcasts and the speed of response. A defense mechanism, DHCP snooping, helps by validating DHCP messages on network switches and blocking unauthorized DHCP offers, thereby safeguarding against malicious server responses.

Network (IP) layer

The Internet Protocol (IP) layer is responsible for getting datagrams (packets) to their destination. It does not provide any guarantees on message ordering or reliable delivery. Datagrams may take different routes through the network and may be dropped by queue overflows in routers.

Source IP address authentication

Anyone can forge the source address of an IP datagram.

One aspect of the design of IP networking is that there is no source IP address authentication. Clients are expected to use their own source IP address but anybody can override this if they have administrative privileges on their system by using a raw sockets interface.

This enables an attacker to forge messages to appear that they come from another system. Any software that authenticates requests based on their IP addresses will be at risk.
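
To illustrate why source addresses cannot be trusted, here is a pared-down Linux sketch (requires root) of the raw sockets interface the text mentions: the process constructs the IP header itself and is free to put any value in the source address field. The addresses shown come from blocks reserved for documentation:

    /* Raw-socket sketch (Linux): the sender fills in the IP header itself,
     * including an arbitrary source address. Shown only to illustrate why
     * IP source addresses cannot be used for authentication. */
    #include <netinet/ip.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        /* IPPROTO_RAW implies IP_HDRINCL: we supply the IP header. */
        int s = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
        if (s < 0)
            return 1;

        unsigned char pkt[sizeof(struct iphdr)] = {0};
        struct iphdr *ip = (struct iphdr *)pkt;
        ip->version  = 4;
        ip->ihl      = 5;
        ip->ttl      = 64;
        ip->protocol = IPPROTO_RAW;
        ip->saddr = inet_addr("198.51.100.1");  /* forged source (example addr) */
        ip->daddr = inet_addr("203.0.113.5");   /* destination (example addr) */
        /* The kernel fills in the total length, ID, and checksum. */

        struct sockaddr_in dst = {0};
        dst.sin_family = AF_INET;
        dst.sin_addr.s_addr = ip->daddr;
        return sendto(s, pkt, sizeof pkt, 0,
                      (struct sockaddr *)&dst, sizeof dst) < 0;
    }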

Anonymous denial of service

The ability to set an arbitrary source address in an IP datagram can be used for anonymous denial of service attacks. If a system sends a datagram that generates an error, the error will be sent back to the source address that was forged in the query. For example, a datagram sent with a small time-to-live, or TTL, value will cause a router that is hit when the TTL reaches zero to respond back with an ICMP (Internet Control Message Protocol) Time to Live exceeded message. Error responses will be sent to the forged source IP address and it is possible to send a vast number of such messages from many machines (by assembling a botnet) across many networks, causing the errors to all target a single system.

Routers

Routers are computers with multiple network links and often with special-purpose hardware to facilitate the rapid movement of packets across interfaces. They run operating systems and have user interfaces for administration. As with many other devices that people don’t treat as “real” computers, there is a danger that routers will have simple or even default passwords. Moreover, owners of routers may not be nearly as diligent in keeping the operating system and other software updated as they are with their computers.

Routers can be subject to some of the same attacks as computers. Denial of service (DoS) attacks can keep the router from doing its job. One way this is done is by sending a flood of ICMP datagrams. The Internet Control Message Protocol is typically used to send routing error messages and updates and a huge volume of these can overwhelm a router. Routers may also have input validation bugs and not handle certain improper datagrams correctly.

Route table poisoning is the modification of the router’s routing table either by breaking into a router or by sending route update datagrams over an unauthenticated protocol.

Transport layer (UDP, TCP)

UDP and TCP are transport layer protocols that allow applications to establish communication channels with each other. Each endpoint of such a channel is identified by a port number (a 16-bit integer that has nothing to do with Ethernet switch ports). The port number allows the operating system to direct traffic to the proper socket.

UDP, the User Datagram Protocol, is stateless, connectionless, and unreliable. As we saw with IP source address forgery, anybody can send UDP messages with forged source IP addresses.

TCP (Transmission Control Protocol) is a stateful, connection-oriented, and reliable protocol used in network communications. Being stateful, TCP keeps track of the connection’s state through sequence numbers, ensuring that packets are ordered correctly and no data is lost. As a connection-oriented protocol, TCP establishes a connection using a three-way handshake process before any data transfer. This handshake involves SYN (synchronize), SYN-ACK (synchronize acknowledgment), and ACK (acknowledgment) packets to synchronize and acknowledge connection establishment.

TCP’s three-way handshake not only establishes a connection but also initializes sequence numbers, which are crucial for ensuring data integrity and order. The process starts when the client sends a SYN packet to the server with a random initial sequence number. The server responds with a SYN-ACK packet, acknowledging the client’s sequence number by adding one, and provides its own random initial sequence number. The client completes the handshake by sending an ACK packet, acknowledging the server’s sequence number. This exchange of sequence numbers sets the foundation for a reliable, ordered data transmission.

TCP’s use of random initial sequence numbers is critical for security. By starting with a random sequence number, TCP mitigates sequence number prediction attacks, where an attacker predicts the sequence numbers of packets to spoof legitimate packets or hijack a session. This randomness helps in maintaining the integrity and security of the data exchange process.

SYN flooding

SYN flooding attacks target the TCP three-way handshake by flooding a server with SYN packets, often from spoofed IP addresses, leading to server resource exhaustion and service unavailability.

SYN cookies defend against SYN flooding attacks by having the server create an initial sequence number that is a cryptographic hash of the source and destination IP addresses and ports, along with a secret number. This allows the server to verify the legitimacy of incoming ACK packets without needing to store state information prematurely, thus preventing resource exhaustion. By encoding this connection-specific information into the sequence number, the server ensures that only clients completing the valid handshake can establish a connection.
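
A conceptual sketch of the idea: the server derives its initial sequence number from the connection’s addresses and ports, a coarse timestamp, and a secret, so no per-SYN state needs to be stored. The hash32() mixer below is a stand-in for a keyed cryptographic hash (real implementations use one; Linux uses SipHash), and details such as encoding the MSS into the cookie are omitted:

    #include <stdint.h>

    /* Stand-in integer mixer -- NOT cryptographic; for illustration only. */
    static uint32_t hash32(uint32_t x) {
        x ^= x >> 16;  x *= 0x7feb352d;
        x ^= x >> 15;  x *= 0x846ca68b;
        x ^= x >> 16;
        return x;
    }

    /* Derive the server's initial sequence number from the 4-tuple,
     * a coarse time slot, and a server secret. */
    uint32_t syn_cookie(uint32_t src_ip, uint32_t dst_ip,
                        uint16_t src_port, uint16_t dst_port,
                        uint32_t minute, uint32_t secret) {
        uint32_t h = hash32(src_ip ^ secret);
        h = hash32(h ^ dst_ip);
        h = hash32(h ^ ((uint32_t)src_port << 16 | dst_port));
        return hash32(h ^ minute);
    }

When the final ACK arrives, the server recomputes syn_cookie() for the current and previous time slots; if one matches the acknowledged sequence number, the handshake was genuine and the connection can be created, even though no per-SYN state was ever stored.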

TCP Reset

A somewhat simple attack is to send a RESET (RST) segment to an open TCP socket. If the sequence number is correct, the connection will close. Hence, the tricky part is getting the correct sequence number to make it look like the RESET is part of the genuine message stream.

Sequence numbers are 32-bit values. The chance of successfully picking the correct sequence number is tiny: 1 in 2^32, or approximately one in four billion. However, many systems will accept a large range of sequence numbers approximately in the correct range to account for the fact that packets may arrive out of order, so they shouldn’t necessarily be rejected just because the sequence number is incorrect. This can reduce the search space tremendously, and an attacker can send a flood of RST packets with varying sequence numbers and a forged source address until the connection is broken.

Routing protocols

Autonomous Systems (ASes) are sets of IP addresses that are under the control of a single network operator. The Border Gateway Protocol (BGP) is the protocol used by external routers at each AS to exchange routing information with each other. BGP enables ASes to determine the best routes for sending network traffic and manage the pathways by which data packets travel across the Internet, thus ensuring efficient and reliable routing.

BGP Hijacking

BGP hijacking, also known as route hijacking, involves maliciously redirecting internet traffic by corrupting the routing tables used by Border Gateway Protocol (BGP). An attacker misleads other networks into believing that the best route to specific IP addresses goes through their malicious system. This can be used to intercept, inspect, or redirect internet traffic to fraudulent sites.

BGP Path Forgery attacks manipulate the Border Gateway Protocol (BGP) by falsely advertising optimal paths to specific network destinations. This type of attack exploits BGP’s trust-based nature, which lacks mechanisms for path verification, leading to traffic being misrouted through the attacker’s network. These actions enable the attacker to intercept or manipulate data traffic.

BGP Prefix Forgery involves malicious actors advertising unauthorized IP prefixes via BGP. By advertising more specific prefixes than those used legitimately, attackers can divert traffic to themselves. BGP favors the most specific route available, making this a particularly effective method for redirecting traffic. This can lead to data interception or denial of service as traffic is misrouted to the attacker’s network.

Two security measures that were added to BGP were RPKI and BGPsec. RPKI (Resource Public Key Infrastructure) enhances BGP security by allowing networks to use public keys and digital signatures to verify that a network is authorized to announce specific IP prefixes, thus preventing invalid route announcements. However, RPKI’s effectiveness is limited by partial adoption and the need for network operators to maintain accurate and up-to-date certificate information.

BGPsec secures BGP by providing cryptographic validation of the entire AS path, not just the origin. This helps prevent path manipulation attacks. The main drawbacks of BGPsec include its increased complexity, higher computational overhead, and slow adoption.

Domain Name System (DNS)

The Domain Name System (DNS) is a hierarchical service that maps Internet domain names to IP addresses. A user’s computer runs the DNS protocol via a program known as a DNS stub resolver. It first checks a local file for specific preconfigured name-to-address mappings. Then it checks its cache of previously-found mappings. Finally, it contacts an external DNS resolver, which is usually located at the ISP or is run as a public service, such as Google Public DNS or OpenDNS.

We trust that the name-to-address mapping is legitimate. Web browsers, for instance, rely on this to enforce their same-origin policy. However, DNS queries and responses are sent using UDP with no authentication or integrity checks. The only check is that each DNS query contains a Query ID (QID). A DNS response must have a matching QID so that the client can match it to the query. These responses can be intercepted and modified or just forged. Malicious responses can return a different IP address that will direct IP traffic to different hosts.
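
The weakness is easy to see from the DNS header layout (RFC 1035): the only field that ties a response to its query is the 16-bit ID, giving an off-path forger just 65,536 values to guess:

    #include <stdint.h>

    /* DNS message header, per RFC 1035 (fields are big-endian on the wire). */
    struct dns_header {
        uint16_t id;        /* query ID: matched by the client, never authenticated */
        uint16_t flags;     /* QR, opcode, AA, TC, RD, RA, rcode */
        uint16_t qdcount;   /* number of questions */
        uint16_t ancount;   /* number of answer records */
        uint16_t nscount;   /* number of authority records */
        uint16_t arcount;   /* number of additional records */
    };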

A solution called DNSsec has been proposed. It is a secure extension to the DNS protocol that provides authenticated requests and responses. However, few sites support it.

Pharming attack

A pharming attack is an attack on the configuration information maintained by a DNS server – either modifying the information used by the local DNS resolver or modifying that of a remote DNS server. By changing the name-to-address mapping, an attacker can cause software to send packets to the wrong system.

The most direct form of a pharming attack is to modify the local hosts file to add a malicious name-to-address mapping. Alternatively, malware may modify the DNS server settings on a system so that it would contact an attacker’s DNS server, which can provide the wrong IP address for certain domain names.

DNS cache poisoning (DNS spoofing attack)

DNS queries first check the local host’s DNS cache to see if the results of a past query have been cached. A DNS cache poisoning attack, also known as DNS spoofing, involves corrupting the DNS cache with false information to redirect users to malicious websites. In the general case, DNS cache poisoning refers to any mechanism where an attacker is able to provide malicious responses to DNS queries, resulting in those responses getting cached locally.

JavaScript on a malicious website can perform a DNS cache poisoning attack. This attack takes advantage of the fact that a DNS response for a subdomain, such as a.bank.com, can contain information about a new DNS server for the entire bank.com domain.

The browser requests access to a legitimate site but with an invalid subdomain, for example, a.bank.com. Because the system will not have the address of a.bank.com cached, it sends a DNS query to an external DNS resolver.

The DNS query includes a query ID (QID), x1. At the same time that the request for a.bank.com is made, JavaScript launches an attacker thread that sends 256 responses with random QIDs (y1, y2, y3, …). Each of these DNS responses tells the resolver that the DNS server for bank.com is at the attacker’s IP address.

If one of these responses happens to have a matching QID, the host system will accept it as truth that all future queries for anything at bank.com should be directed to the name server run by the attacker. If the responses don’t work, the script can try again with a different subdomain, b.bank.com. The attack might take several minutes, but there is a high likelihood that it will eventually succeed.

Summary: An attacker can run a local DNS server that will attempt to provide spoofed DNS responses to legitimate domain name lookup requests. If the query ID numbers of the fake response match those of a legitimate query (trial and error), the victim will get the wrong IP address, which will redirect legitimate requests to an attacker’s service.

DNS Rebinding

Web application security is based on the same-origin policy. Browser scripts can access cookies and other data on pages only if they share the same origin, which is the combination of URI (protocol), host name, and port number. The underlying assumption is that resolving a domain name takes you to the correct server.

The DNS rebinding attack allows JavaScript code on a malicious web page to access private IP addresses in the victim’s network. The attacker configures the DNS entry for a domain name to have a short time to live (TTL). When the victim’s browser visits the page and downloads JavaScript from that site, that JavaScript code is allowed to interact with the domain thanks to the same origin policy. However, right after downloading the script, the attacker can reconfigure the DNS server so that future queries will return an address in the internal network. The JavaScript code can then try to request resources from that system since, as far as the browser is concerned, the origin is the same because the name of the domain has not changed.

Summary: short time-to-live values in DNS allow an attacker to change the address of a domain name so that scripts from that domain can now access resources inside the private network.

Distributed Denial of Service (DDoS) attacks

The purpose of Distributed Denial of Service (DDoS) attacks is to overwhelm a target’s network, server, or application with an excessive amount of traffic or requests, causing it to slow down, crash, or become unavailable to legitimate users. DDoS attacks are typically carried out using a network of compromised devices, called a botnet, and are often used to disrupt business operations, extort victims, or mask other malicious activities like data theft.

Techniques used in denial of service attacks

  1. Exploit asymmetries: Target scenarios where processing requests is more taxing than sending them.
  2. Fake return addresses: Use spoofed addresses to make tracing difficult and avoid managing response traffic.
  3. Response redirection: Set the return address of a request to the target, causing innocent services to bombard the target with responses.
  4. Amplification: Send small queries to services that respond with much larger data, increasing the volume of traffic aimed at the target.
  5. Botnets: Use a network of compromised devices to generate massive attack traffic.

DDoS attacks use multiple systems distributed globally to flood the target with traffic. There are two types of attacks:

  • Volumetric Attacks: Flood the target with massive amounts of data to consume bandwidth.
  • Packet-per-Second Attacks: Overwhelm the processing capacity of network devices with high rates of requests.

Reflection amplification

Reflection amplification DDoS attacks exploit the behavior of UDP-based services, which are connectionless and do not verify the sender’s IP address. Attackers send small requests to publicly accessible servers using a spoofed source IP address that matches the target’s IP. The servers then reflect their responses, which are often much larger than the original requests, to the victim, resulting in an amplified volume of traffic overwhelming the target.

This attack takes advantage of the amplification factor of certain UDP-based protocols, like DNS, NTP (Network Time Protocol), or Memcached, which generate responses significantly larger than the original queries. For example, a 60-byte DNS query can produce a 4,000-byte response – an amplification factor of more than 60×. Since UDP does not require a handshake or connection establishment, the servers have no way of validating the source IP, making them ideal tools for such attacks. This combination of reflection and amplification enables attackers to launch highly disruptive DDoS attacks with minimal effort and resources.

Defensive Strategies

  • Overprovision Bandwidth: Maintain more bandwidth than typically needed to absorb higher traffic volumes.
  • Rate Limiting: Implement limits on traffic rates to prevent overconsumption of resources.
  • Blackhole Routing: Divert and drop traffic identified as malicious.
  • Network Redundancy: Ensure availability with multiple networks.
  • Disable Unnecessary UDP Services: Reduce exposure to UDP-based attacks by disabling irrelevant services.

Virtual Private Networks (VPNs)

Network tunnels serve the purpose of securely transmitting data between different network segments or over the internet by encapsulating the data packets within the protocol of an underlying network. This enables moving data across networks that might not otherwise support the same communication protocols, creating a communication channel over a public network infrastructure. For example, an IP packet on a local area network that is directed to a local IP address at a branch can be encapsulated within an IP packet that is sent to the router at that branch office, which would then extract the packet and route it on the internal network.

A tunnel provides connectivity but not security. A VPN (Virtual Private Network) is created by adding security to a network tunnel. This usually involves encrypting the encapsulated packet and adding a message authentication code (MAC) to ensure that any data transmitted between the endpoints remains confidential and secure from potential eavesdropping or modification. Additionally, VPNs employ authentication methods to verify the identities of the endpoints, further securing the data exchange within the tunnel.

IPsec (Internet Protocol Security) is a set of VPN protocols used to secure Internet communications by authenticating and encrypting each IP packet in a data stream. Communications in IPsec use one of two main protocols: AH (Authentication Header) or ESP (Encapsulating Security Payload).

AH ensures data integrity and authenticity by adding a message authentication code (MAC) to each datagram but does not provide encryption.

ESP provides the same assurance of integrity as AH but also adds encryption in addition to a MAC, ensuring the confidentiality, integrity, and authenticity of data.

IPsec can operate in two modes: Transport and Tunnel. Transport mode encrypts only the payload of the IP packet, leaving the header untouched, and is suitable for end-to-end communication between hosts. Tunnel mode encrypts the entire IP packet and encapsulates it within a new packet, and is used mainly for gateway-to-gateway communications, such as VPNs, where entire packets need to be protected as they traverse untrusted networks.

IPsec supports the use of:

  • HMAC for message authentication.
  • Diffie-Hellman key exchange to create random session keys for HMAC and encryption while assuring forward secrecy.
  • Symmetric encryption of data using AES for the ESP protocol.
  • X.509 digital certificates or pre-shared keys for authentication of endpoints.

Transport Layer Security (TLS)

Virtual Private Networks (VPNs) operate at the network layer to connect entire networks, tunneling all IP traffic without differentiating between specific data streams. This approach does not directly provide application-to-application secure communication. In contrast, Transport Layer Security (TLS), evolved from Secure Sockets Layer (SSL), operates above TCP to provide authentication, integrity, and encryption directly to applications. TLS preserves the sockets interface, allowing developers to implement network security transparently. Applications like web browsers use HTTPS, which incorporates TLS for secure communication over HTTP.

TLS has been designed to provide:

Data encryption
Symmetric cryptography is used to encrypt data.
Key exchange
During the authentication sequence, TLS performs a Diffie-Hellman key exchange so that both sides can obtain random shared session keys. From the common key, TLS uses a pseudorandom generator to create all the keys it needs for encryption and integrity.
Data integrity
Ensure that we can detect whether data in transit has been modified or new data injected. TLS includes an HMAC function based on the SHA-256 hash for each message (see the sketch after this list).
Authentication
TLS authenticates the endpoints prior to sending data. Authentication can be unidirectional (the client may just authenticate the server) or bidirectional (each side authenticates the other). TLS uses public key cryptography and X.509 digital certificates as a trusted binding between a user’s public key and their identity.
Interoperability & evolution
TLS was designed to support different key exchange, encryption, integrity, and authentication protocols. At the start of each session, the two sides negotiate which protocols to use for that session.
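
As a sketch of the integrity mechanism, the following computes an HMAC-SHA256 tag using OpenSSL’s long-standing one-shot HMAC() helper (link with -lcrypto); the key and message here are placeholders. TLS computes a comparable tag over each record so the receiver can detect modification or injection:

    #include <stdio.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    int main(void) {
        const unsigned char key[] = "session-integrity-key";  /* placeholder key */
        const unsigned char msg[] = "record payload";         /* placeholder data */
        unsigned char tag[EVP_MAX_MD_SIZE];
        unsigned int taglen = 0;

        /* One-shot HMAC over the message using SHA-256. */
        HMAC(EVP_sha256(), key, sizeof key - 1,
             msg, sizeof msg - 1, tag, &taglen);

        for (unsigned int i = 0; i < taglen; i++)
            printf("%02x", tag[i]);       /* print the tag in hex */
        printf("\n");
        return 0;
    }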

TLS sub-protocols

TLS operates through two main phases: the handshake protocol and the record protocol.

  1. The handshake protocol (authentication and setup):
    During the handshake protocol, the client authenticates the server using X.509 digital certificates and digital signatures. They then use Ephemeral Diffie-Hellman key exchange to create a common key. This provides forward secrecy to the communication session.

  2. The record protocol (communication):
    Following the handshake, the record protocol encrypts application data using the agreed-upon symmetric encryption algorithm, ensuring confidentiality and using a hashed message authentication code (HMAC) to ensure message integrity as data is transmitted between the server and client.
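
The following sketch shows how TLS preserves the sockets interface: an OpenSSL client (OpenSSL 1.1 or later assumed; link with -lssl -lcrypto) wraps an already-connected TCP socket, runs the handshake protocol with SSL_connect(), and then uses SSL_read()/SSL_write() in place of read()/write() for the record protocol. Error handling and hostname checking are pared down for brevity, and the server address is a documentation placeholder:

    #include <openssl/ssl.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        /* Ordinary TCP connection first. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in srv = {0};
        srv.sin_family = AF_INET;
        srv.sin_port = htons(443);
        srv.sin_addr.s_addr = inet_addr("203.0.113.10"); /* example address */
        if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0)
            return 1;

        SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());
        SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL); /* authenticate server */
        SSL_CTX_set_default_verify_paths(ctx);          /* system CA certificates */

        SSL *ssl = SSL_new(ctx);
        SSL_set_fd(ssl, fd);                 /* wrap the existing socket */
        if (SSL_connect(ssl) != 1)           /* handshake protocol */
            return 1;

        const char req[] = "GET / HTTP/1.0\r\n\r\n";
        SSL_write(ssl, req, strlen(req));    /* record protocol: encrypted + MACed */

        char buf[4096];
        int n = SSL_read(ssl, buf, sizeof buf);
        if (n > 0)
            write(STDOUT_FILENO, buf, n);

        SSL_free(ssl); SSL_CTX_free(ctx); close(fd);
        return 0;
    }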

Firewalls

A firewall protects the junction between an untrusted network (e.g., external Internet) and a trusted network (e.g., internal network). Two approaches to firewalls are packet filtering and proxies. A packet filter, or screening router, determines not only the route of a packet but whether the packet should be dropped based on contents in the IP header, TCP/UDP header, and the interface on which the packet arrived. It is usually implemented inside a border router, also known as the gateway router, that manages traffic flow between the ISP and the user’s network. The basic principle of firewalls is to never allow a direct inbound connection from an originating host on the Internet to an internal host; all traffic must flow through a firewall and be inspected.

The packet filter evaluates a set of rules to determine whether to drop or accept a packet. This set of rules forms an access control list, often called a chain. Strong security follows a default deny model, where packets are dropped unless some rule in the chain specifically permits them.
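
A conceptual sketch of such a chain, with invented rule fields rather than any real firewall’s syntax: rules are evaluated in order, the first match determines the packet’s fate, and a packet that matches nothing is dropped (default deny):

    #include <stdint.h>
    #include <stddef.h>

    enum verdict { DROP, ACCEPT };

    struct pkt  { uint32_t src_ip, dst_ip; uint16_t dst_port; uint8_t proto; };
    struct rule {
        uint32_t src_ip, src_mask;   /* match if (pkt.src_ip & mask) == ip */
        uint16_t dst_port;           /* 0 = any port */
        uint8_t  proto;              /* 0 = any protocol */
        enum verdict action;
    };

    /* Walk the chain in order; the first matching rule wins. */
    enum verdict filter(const struct pkt *p, const struct rule *chain, size_t n) {
        for (size_t i = 0; i < n; i++) {
            const struct rule *r = &chain[i];
            if ((p->src_ip & r->src_mask) != r->src_ip) continue;
            if (r->dst_port && r->dst_port != p->dst_port) continue;
            if (r->proto && r->proto != p->proto) continue;
            return r->action;
        }
        return DROP;                 /* default deny: no rule matched */
    }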

First-generation packet filters implemented stateless inspection. A packet is examined on its own with no context based on previously-seen packets.

Second-generation packet filters track TCP connections and other information from previous connections. These stateful packet inspection (SPI) firewalls allow the router to keep track of outstanding TCP connections. For instance:

  • They can block TCP data traffic if a connection setup did not take place to avoid sequence number prediction attacks.

  • They can track that a connection has been established by a client to a remote server and allow return traffic to that client (which is essential for any interaction by someone inside the network with external services).

  • They can track connectionless UDP and ICMP messages and allow responses to be sent back to clients in the internal network. DNS queries and pings (ICMP echo request/reply messages) are examples of these.

  • They also understand the relationship between packets. For example, when a client establishes an FTP (file transfer protocol) connection to a server on port 21, the server establishes a connection back to the client on a different port when it needs to send data. A simplified sketch of this connection tracking follows the list.
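
Here is that simplified sketch of stateful tracking in Python; the connection table and field names are invented for illustration. Outbound connections are recorded, and inbound packets are accepted only if they match a connection an inside host initiated.

    # Track (src, sport, dst, dport) tuples for connections initiated from inside.
    established = set()

    def outbound(src, sport, dst, dport):
        established.add((src, sport, dst, dport))   # remember the connection

    def inbound_allowed(src, sport, dst, dport) -> bool:
        # Return traffic is allowed only if it matches a recorded
        # connection with the endpoints reversed.
        return (dst, dport, src, sport) in established

    outbound("10.0.0.5", 50000, "93.184.216.34", 443)
    print(inbound_allowed("93.184.216.34", 443, "10.0.0.5", 50000))  # True
    print(inbound_allowed("203.0.113.9", 443, "10.0.0.5", 50001))    # False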

Packet filters traditionally do not look above the transport layer (UDP and TCP protocols and port numbers).

Third-generation packet filters incorporate deep packet inspection (DPI), which allows a firewall to examine application data as well and make decisions based on its contents. Deep packet inspection can validate the protocol of an application as well as check for malicious content such as malformed URLs or other security attacks. DPI is often considered to be part of Intrusion Prevention Systems. Examples are detecting application-layer protocols such as HTTP and then applying application-specific filters, such as checking for suspicious URLs or disallowing the download of certain ActiveX or Java applets.

Deep Packet Inspection (DPI) firewalls evolved to Deep Content Inspection (DCI) firewalls. These use the same concept but are capable of buffering large chunks of data from multiple packets that contain an entire object and acting on it, such as unpacking base64-encoded content from web and email messages and performing a signature analysis for malware.

Application proxies

Application proxies act as intermediaries for specific applications. They inspect and filter traffic at the application layer, ensuring that only valid protocol traffic passes between networks. By validating data exchanges against known protocols, they enhance security by preventing protocol-specific attacks. When running on dual-homed hosts, these proxies benefit from an added layer of isolation; one network interface connects to the public network and the other to the private network, thereby controlling and monitoring all inbound and outbound communication effectively.

DMZs

In a typical firewalled environment using a screened subnet architecture, two distinct subnets are established: the DMZ (demilitarized zone) for externally accessible services like web and mail servers, and another for internal systems shielded from external access. Traffic control and security are enforced by screening routers. The exterior router manages access to the DMZ, filtering incoming traffic to allowed services, while the interior router controls traffic from the DMZ to the internal network, ensuring only necessary communications pass. This setup can be simplified using a single router with detailed filtering rules for each interface to accomplish the same function.

Deperimeterization and zero trust

The trustworthiness of systems on internal networks diminished as people moved their laptops and phones between different environments, users installed arbitrary software on their systems, systems had to access cloud services, remote work became common, and the likelihood of malware getting installed on any computer in a company’s network increased. The breakdown of a secure boundary between a trusted internal and untrusted external network is called deperimeterization.

This shift led to the development of the Zero Trust model, which does not assume internal network traffic is automatically safe. Instead, it enforces strict identity verification and least privilege access for every user and device, regardless of their location relative to the traditional network perimeter.

Host-based firewalls

Firewalls generally intercept all packets entering or leaving a local area network. A host-based firewall, on the other hand, runs on a user’s computer. Unlike network-based firewalls, a host-based firewall can associate network traffic with individual applications. Its goal is to prevent malware from accessing the network. Only approved applications will be allowed to send or receive network data. Host-based firewalls are particularly useful in light of deperimeterization. A concern with host-based firewalls is that if malware manages to gain elevated privileges, it may be able to shut off the firewall or change its rules.

Intrusion detection/prevention systems

An enhancement to screening routers is the use of intrusion detection systems (IDS). Intrusion detection systems are often part of DPI firewalls and try to identify malicious behavior. There are three forms of IDS:

  1. A protocol-based IDS validates specific network protocols for conformance. For example, it can implement a state machine to ensure that messages are sent in the proper sequence, that only valid commands are sent, and that replies match requests.

  2. A signature-based IDS is similar to a PC-based virus checker. It scans the bits of application data in incoming packets to try to discern if there is evidence of “bad data”, which may include malformed URLs, extra-long strings that may trigger buffer overflows, or bit patterns that match known viruses.

  3. An anomaly-based IDS looks for statistical aberrations in network activity. Instead of having predefined patterns, normal behavior is first measured and used as a baseline. An unexpected use of certain protocols, ports, or even amount of data sent to a specific service may trigger a warning.

Anomaly-based detection implies that we know normal behavior and flag any unusual activity as bad. This is difficult since it is hard to characterize what normal behavior is, particularly since normal behavior can change over time and may exhibit random network accesses (e.g., people web surfing to different places). Too many false positives will annoy administrators and lead them to disregard alarms.

A signature-based system employs misuse-based detection. It knows bad behavior: the rules that define invalid packets or invalid application layer data (e.g., ssh root login attempts). Anything else is considered good.
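
At its core, signature-based detection is pattern matching over payload bytes. A toy Python sketch; the signatures are invented examples, far simpler than real IDS rules:

    # Invented example signatures: byte patterns treated as evidence of misuse.
    SIGNATURES = {
        b"../../": "path traversal attempt",
        b"<script>": "possible XSS payload",
        b"\x90" * 16: "NOP sled (possible buffer overflow)",
    }

    def scan_payload(payload: bytes):
        # Report the name of every signature found in the payload.
        return [name for pattern, name in SIGNATURES.items() if pattern in payload]

    print(scan_payload(b"GET /../../etc/passwd HTTP/1.0"))  # ['path traversal attempt']
    print(scan_payload(b"GET /index.html HTTP/1.0"))        # []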

Intrusion Detection Systems (IDS) monitor traffic entering and leaving the network and report any discovered problems. Intrusion Prevention Systems (IPS) serve the same function but are positioned to sit between two networks like a firewall and can actively block traffic that is considered to be a threat or policy violation.

| Type | Description |
|------|-------------|
| Firewall (screening router) | 1st-generation packet filter that filters packets between networks. Blocks/accepts traffic based on IP addresses, ports, and protocols. |
| Stateful inspection firewall | 2nd-generation packet filter. Like a screening router but also takes into account TCP connection state and information from previous connections (e.g., related ports for TCP). |
| Deep Packet Inspection firewall | 3rd-generation packet filter. Examines application-layer protocols. |
| Application proxy | Gateway between two networks for a specific application. Prevents direct connections to the application from outside the network. Responsible for validating the protocol. |
| IDS/IPS | Can usually do what a stateful inspection firewall does, plus examine application-layer data for protocol attacks or malicious content. |
| Host-based firewall | Typically a screening router with per-application awareness. Sometimes includes anti-virus software for application-layer signature checking. |
| Host-based IPS | Typically allows real-time blocking of remote hosts performing suspicious operations (port scanning, ssh logins). |

Web security

Early Web Browsers: Initially, browsers could only deal with static content. Because of this, they weren’t a useful target of attacks and security efforts were mainly directed at server-side attacks through malformed URLs, buffer overflows, and similar vulnerabilities.

Modern Browsers: As browsers evolved, they became more complex, with support for cookies, JavaScript, DOM, CSS, AJAX, WebSockets, and multimedia. All this introduces new security challenges since scripts can communicate over the network, access page contents, and modify them. WebAssembly and Google Native Client (NaCl) enable the execution of sandboxed binary software in browsers, enhancing performance but providing additional challenges in ensuring isolation and proper behavior.

Web security model

The web security model is designed to protect both users and providers of web applications by managing how scripts interact with different web resources. Central to this model is the Same-Origin Policy, which allows scripts running on web pages to only access data from the same site that delivered them.

The term same-origin refers to a policy where two resources are considered to be of the same origin if they have the same scheme (protocol), hostname, and port number. The policy helps to prevent malicious scripts on one site from obtaining access to sensitive data on another site through the user’s browser.
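
The comparison itself is mechanical. A short Python sketch of the (scheme, hostname, port) test using only the standard library:

    from urllib.parse import urlsplit

    def origin(url: str):
        u = urlsplit(url)
        # An origin is the (scheme, hostname, port) triple; fill in default ports.
        port = u.port or {"http": 80, "https": 443}.get(u.scheme)
        return (u.scheme, u.hostname, port)

    print(origin("https://example.com/a") == origin("https://example.com:443/b"))  # True
    print(origin("https://example.com/") == origin("http://example.com/"))         # False: scheme differs
    print(origin("https://example.com/") == origin("https://mail.example.com/"))   # False: host differs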

Under the same-origin policy, each origin has access to common client-side resources that include:

  • Cookies: Key-value data that clients or servers can set. Cookies associated with the origin are sent with each HTTP request.

  • JavaScript namespace: Any functions and variables defined or downloaded into a frame share that frame’s origin.

  • DOM tree: This is the JavaScript definition of the HTML structure of the page.

  • DOM storage: Local key-value storage.

Any JavaScript code downloaded into a frame will execute with the authority of its frame’s origin. For instance, if cnn.com loads a script from jQuery.com, the script runs with the authority of cnn.com.

Passive content, which is non-executable content such as CSS files and images, has no authority. This normally should not matter, since passive content does not contain executable code, but there have been attacks that embedded code in passive content, effectively turning it active.

Cross-origin content

A page may load content from multiple origins. The same-origin policy defines that JavaScript code loaded from anywhere runs with the authority of the frame’s origin. Content from other origins is generally not readable or writable by JavaScript. For example:

  • A frame can load images from other origins but cannot inspect that image.

  • A frame may embed CSS from any origin but cannot inspect the CSS content.

  • A frame can load JavaScript, which executes with the authority of the frame’s origin; if the code is downloaded from a different origin, it is executable but not readable.

Cross-Origin Resource Sharing (CORS) is a security feature that allows web applications running at one origin to request resources from a different origin. CORS provides a way for server administrators to specify who can access their resources and under what conditions. This is done through HTTP headers that tell the browser which sites should be treated as if they share the same origin. For example, when a user downloads a page, a server on example.com can send an HTTP header that contains:

Access-Control-Allow-Origin: http://www.example.com

which tells the browser that the URL http://www.example.com will be treated as the same origin as the frame’s URL (e.g., http://example.com).

Cookies

Cookies are small pieces of data, name-value pairs, sent from a website and stored by the user’s web browser. Every time the user loads the website, the browser sends the relevant cookies back to the server to notify the website of the user’s previous activity.

Cookies serve three primary purposes on the web:

Session Management: Cookies can store login information, shopping cart data, and other details that keep track of user sessions, allowing users to pick up where they left off on previous visits without needing to re-enter information.

Personalization: They store user preferences, such as themes, language settings, and location, to tailor the browsing experience to the user’s needs and preferences.

Tracking: Cookies are used to monitor and analyze user behavior over time, helping websites and advertisers gather insights into browsing habits, which can be used for targeted advertising and optimizing the user experience.

There are two types of cookies based on their lifetime:

Session cookies: These are temporary cookies that remain in your browser’s cookie store only until you close the browser.

Persistent cookies: These remain in your browser’s cookie store for much longer (how long depends on the lifetime specified in the cookie). They are used to remember your preferences within an application and remain on your device after you close your browser.

A browser will handle cookies for multiple web sites (origins) and various parts of a site.

Browsers send and receive cookies but cookies don’t quite use the same concept of an origin. Cookies are bound by a scope that includes the domain and path where they were set. A cookie associated with a specific domain and path will only be sent to the server when a request is made that matches its scope. The domain attribute specifies which domain the cookie belongs to, while the path attribute restricts the cookie to a specific directory. A server at example.com might set a cookie with a path of /blog to ensure that the cookie is only sent when accessing parts of the site within the /blog directory. This provides a degree of isolation that can prevent cookies from being sent across different contexts, which can be important for security and compartmentalization of user sessions and preferences.

Security implications arise because cookies can store sensitive information such as user IDs, passwords, login state, and other personal details that might be exploitable. To enhance security, cookies often incorporate:

  • HttpOnly flag: This makes the cookie inaccessible to client-side scripts, reducing the risk of cross-site scripting (XSS) attacks.

  • Secure flag: This restricts the transmission of cookies to secure (HTTPS) connections, preventing them from being intercepted during the transmission over unsecured networks.
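
All of these attributes travel in the Set-Cookie response header. A representative, made-up example combining the scoping and security attributes discussed above:

Set-Cookie: sessionid=38afes7a8; Domain=example.com; Path=/blog; Secure; HttpOnly

This cookie is sent only to example.com, only for requests under /blog, only over HTTPS, and is invisible to JavaScript.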

First-party cookies are set by the website a user is directly visiting and are used for purposes like remembering login details or preferences for that site.

Third-party cookies are set by domains other than the one the user is visiting, often through embedded content like ads or tracking pixels (see discussion below). When the browser requests the content, the URL of the content and any cookies associated with the site hosting the content are sent along with the request. These cookies enable tracking across multiple sites by assigning a unique identifier to the user, allowing the third-party domain to collect and correlate browsing data, commonly for ad targeting or analytics.

Cross-Site Request Forgery (CSRF)

Cross-Site Request Forgery (CSRF) is an attack that tricks a web browser into executing an unwanted action on a web service where a user is authenticated. An attacker crafts a malicious website or email with requests to a vulnerable web service where the user is already logged in. When the user interacts with the malicious content, the browser makes requests to the application, sending cookies with the user’s credentials, as if the user themselves made the request.

For example, if a user is logged into their banking site and then clicks on a deceptive link that requests a funds transfer, the banking site might process that request as legitimate, since it came from the user’s browser, which sent the user’s cookies for the service. This vulnerability exploits the trust that a web application has in the user’s browser.

There are several defenses against Cross-site request forgery:

  1. CSRF Tokens: Unique, random tokens are embedded in forms or requests and verified by the server to ensure legitimacy (see the sketch after this list).
  2. SameSite Cookies: Restricts cookies from being sent with cross-origin requests, preventing unauthorized actions.
  3. Referer Validation: Checks that requests originate from the expected domain, though less reliable due to potential header stripping.
  4. Re-Authentication: Requires users to re-enter passwords or confirm actions for sensitive tasks.
  5. Session Management: Relying on users to log out, which clears session cookies, making CSRF attempts ineffective.
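
A minimal Python sketch of defense 1, the CSRF token; the session dictionary stands in for whatever session storage a real web framework would provide.

    import secrets
    import hmac

    def issue_token(session: dict) -> str:
        # Generate an unpredictable per-session token and embed it in each form.
        token = secrets.token_urlsafe(32)
        session["csrf_token"] = token
        return token

    def verify_token(session: dict, submitted: str) -> bool:
        # A forged cross-site request cannot know the token, so it fails here.
        expected = session.get("csrf_token", "")
        return hmac.compare_digest(expected, submitted)

    session = {}
    form_field = issue_token(session)          # rendered into the HTML form
    print(verify_token(session, form_field))   # True: legitimate submission
    print(verify_token(session, "guess"))      # False: forged request rejected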

Clickjacking

Clickjacking is a malicious technique of tricking a web user into clicking on something different from what the user perceives, effectively hijacking the clicks meant for another page. This is done by overlaying a transparent iframe over a visually appealing element, such as a video play button or a survey form. The user believes they are interacting with the genuine site, but the click is being routed to a hidden frame, leading to potential unauthorized actions, such as liking a page, sharing personal information, or enabling microphone access.

There are several ways for a web programmer to defend against clickjacking. JavaScript code can be added to a web page to prevent it from being framed: the script checks whether the current window is the topmost window and, if not, forces the page to break out of the frame. Alternatively, an HTTP header (X-Frame-Options, or the frame-ancestors directive of a Content Security Policy) can indicate whether a browser should be allowed to render the page in an iframe.
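
As a sketch of the header-based defense, here is a minimal Python http.server handler (the page body is a placeholder) that sends both the legacy X-Frame-Options header and its Content Security Policy equivalent:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class NoFramingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            # Legacy header: refuse to be rendered inside any frame or iframe.
            self.send_header("X-Frame-Options", "DENY")
            # Modern equivalent via Content Security Policy.
            self.send_header("Content-Security-Policy", "frame-ancestors 'none'")
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<html><body>Not frameable</body></html>")

    HTTPServer(("localhost", 8080), NoFramingHandler).serve_forever()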

Input sanitization problems

Any user input needs to be parsed carefully before it can be made part of a URL, HTML content, or JavaScript. Consider a script that is generated with some in-line data that came from a malicious user:

<script> var x = "untrusted_data"; </script>

The malicious user might define that untrusted_data to be

Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="Bye

The resulting script to set the variable x now becomes

<script> var x = "Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="Bye"; </script>
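
The defense is to encode untrusted data for the context in which it is placed instead of pasting it in verbatim. A short Python sketch using the standard library; the variable names are illustrative:

    import html
    import json

    untrusted = 'Hi"; </script> <h1> Hey, some text! </h1> <script> evil() //'

    # For an HTML body context: escape <, >, &, and quotes.
    safe_html = html.escape(untrusted)

    # For a JavaScript string context: JSON-encode, then break any "</script>"
    # sequence so the HTML parser cannot end the script block early.
    safe_js = json.dumps(untrusted).replace("</", "<\\/")

    print(f"<p>{safe_html}</p>")
    print(f"<script> var x = {safe_js}; </script>")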

Cross-site scripting

Cross-Site Scripting (XSS) is a web security vulnerability that allows attackers to inject malicious scripts into trusted websites. These scripts are executed in the victim’s browser, enabling attackers to steal sensitive information, hijack sessions, or perform other malicious actions. XSS exploits the trust users place in legitimate websites to execute harmful scripts in their browsing context.

There are two main types of XSS attacks: persistent XSS and reflected XSS.

Persistent XSS, also known as stored XSS, occurs when malicious scripts are permanently stored on a server, such as in a database, comment field, or message board. When other users visit the affected page, the script executes in their browsers.

For example, an attacker might post a malicious script in a comment section, and every user viewing that comment would trigger the script, potentially stealing their session cookies or performing unauthorized actions on their behalf. Persistent XSS is particularly dangerous because it can affect multiple users over time due to its stored nature.

Reflected XSS happens when malicious scripts are included in the URL or request parameters and are reflected back by the server in its response. The script executes in the victim’s browser when they click the malicious link or submit data. For example, an attacker might craft a URL like http://example.com?search=<script>alert('XSS')</script>. If the server reflects the input unsanitized in its response, the script will run in the victim’s browser. Reflected XSS typically targets individual users and relies on social engineering to trick victims into interacting with malicious links.

To mitigate XSS attacks, web developers should validate and sanitize all user inputs to prevent scripts from being injected. Output encoding should be applied to user-provided data before displaying it on web pages. Implementing a Content Security Policy (CSP) can help restrict the execution of unauthorized scripts, while setting cookies with the HttpOnly flag protects session cookies from being accessed by malicious scripts.
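
A CSP is delivered as an HTTP response header. An illustrative policy:

Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com

This instructs the browser to execute scripts only from the page's own origin and the listed CDN; an injected inline <script> will not run because inline scripts are not explicitly allowed by the policy.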

Homograph (homoglyph) attacks

Homograph attacks take advantage of characters that look alike to deceive users. For example, the domain “paypaI.com”, where the last letter is a capital ‘I’ instead of a lowercase ‘l’, mimics “paypal.com” and can be used in phishing scams to make users believe they are going to a valid website.

Unicode is a comprehensive system designed to represent over 128,000 characters, covering almost all of the world’s scripts and symbols, including alphabets like Latin, Greek, Cyrillic, and scripts for languages such as Arabic, Hindi, Chinese, and many more, along with emojis and ancient scripts.

Unicode’s design, which allows visually similar or identical characters from different scripts, poses risks for deception attacks. The introduction of IDNs (Internationalized Domain Names) allows the use of Unicode characters in domain names, which has further facilitated deceptive practices by enabling the creation of domain names that visually mimic legitimate ones but use characters from different scripts. For instance, using a Cyrillic ‘a’ instead of the Latin ‘a’ can mislead users into thinking they are accessing a familiar website. The characters look identical but are different to a DNS service.

Websites like “wikipedia.org” can be mimicked using characters from non-Latin scripts, such as Greek or Cyrillic, to create visually indistinguishable yet technically different URLs, misleading users and potentially leading to phishing or other forms of cyber fraud.
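
Browsers cope with such names by converting Unicode labels to an ASCII "punycode" form before DNS lookup. A short Python sketch using the standard idna codec; the encoded form shown in the comment is illustrative:

    latin = "apple.com"        # all Latin letters
    spoof = "\u0430pple.com"   # first letter is CYRILLIC SMALL LETTER A

    print(latin == spoof)        # False: the strings differ even though they look alike
    print(latin.encode("idna"))  # b'apple.com'
    print(spoof.encode("idna"))  # ASCII punycode form, e.g., b'xn--pple-43d.com'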

Tracking via images

The same-origin policy treats images as static content with no authority. It would seem that images should not cause problems. However, an image tag (IMG) can pass parameters to the server, just like any other URL:

<img src="http://evil.com/images/balloons.jpg?extra_information" height="300" width="400"/>

The parameter can be used to notify the server that the image was requested from a specific page. Unlike cookies, which can sometimes be disabled, users generally will not block images from loading.

An image itself can be hidden by setting its size to a single pixel … and even making it invisible:

<img src="https://attacker.com/onebyone.png" height="1" width="1" />

These images are called tracking pixels or spy pixels.

When a browser loads an image:

  • The server that hosts the image is contacted with an HTTP GET request for the content.
  • Any cookies for that server will be sent by the browser.
  • Any extra information that’s part of the image URL will be sent. This information can, for example, identify the website or page that is hosting the content.
  • The server logs the time and IP address that requested the image.
  • The HTTP headers also identify the browser version, operating system, and type of device.

A server can use the information in the image request to identify the specific page and read a cookie to get a unique ID for the user. The ID can be used as a key into an object store or database that records every page the user visited. That enables tracking the user’s visits across different pages.

Steganography

Cryptography’s goal is to hide the contents of a message. Steganography’s goal is to hide the very existence of the message. Classic techniques included the use of invisible ink, writing a message on one’s head and allowing the hair to cover it, microdots, and carefully-clipped newspaper articles that together communicate the message.

A null cipher is one where the actual message is hidden among irrelevant data. For example, the message may comprise the first letter of each word (or each sentence, or every second letter, etc.). Chaffing and winnowing entails the transmission of a bunch of messages, of which only certain ones are legitimate. Each message is signed with a key known only to trusted parties (e.g., a MAC). Intruders can see the messages but can’t validate the signatures to distinguish the valid messages from the bogus ones.

Messages can be embedded into images. There are a couple of ways of hiding a message in an image:

  1. A straightforward method to hide a message in an image is to use the low-order bits of the image, where the user is unlikely to notice slight changes in color. An image is a collection of RGB pixels. Changing the least-significant bits produces no visible change in the image, so the entire message can be encoded by spreading its bits among the least-significant bits of the image (a sketch follows this list).

  2. You can do a similar thing but apply a frequency domain transformation, like JPEG compression does, by using a Discrete Cosine Transform (DCT). The frequency domain maps the image as a collection ranging from high-frequency areas (e.g., “noisy” parts such as leaves, grass, and edges of things) through low-frequency areas (e.g., a clear blue sky). Changes to high frequency areas will mostly be unnoticed by humans: that’s why jpeg compression works. It also means that you can add your message into those areas and then transform it back to the spatial domain. Now your message is spread throughout the higher-frequency parts of the image and can be extracted if you do the DCT again and know where to look for the message.
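
Here is the sketch of method 1, a minimal Python example operating on raw RGB bytes rather than a real image file format:

    def embed(pixels: bytearray, message: bytes) -> bytearray:
        # Spread each bit of the message across the least-significant bits
        # of successive pixel bytes; the visual change is imperceptible.
        out = bytearray(pixels)
        for i, byte in enumerate(message):
            for bit in range(8):
                pos = i * 8 + bit
                out[pos] = (out[pos] & 0xFE) | ((byte >> bit) & 1)
        return out

    def extract(pixels: bytearray, length: int) -> bytes:
        # Reassemble the message from the low-order bit of each byte.
        msg = bytearray(length)
        for i in range(length):
            for bit in range(8):
                msg[i] |= (pixels[i * 8 + bit] & 1) << bit
        return bytes(msg)

    cover = bytearray(range(256)) * 10          # stand-in for raw image data
    stego = embed(cover, b"meet at dawn")
    print(extract(stego, 12))                   # b'meet at dawn'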

Many laser printers embed a serial number and date simply by printing very faint color splotches.

Steganography is closely related to watermarking, and the terms “steganography” and “watermarking” are often used interchangeably.

The primary goal of watermarking is to create an indelible imprint on a message such that an intruder cannot remove or replace the message. It is often used to assert ownership, authenticity, or encode DRM rules. The message may be, but does not have to be, invisible.

The goal of steganography is to allow primarily one-to-one communication while hiding the existence of a message. An intruder – someone who does not know what to look for – cannot even detect the message in the data.

App Integrity

Android enforces integrity through application signing and app verification processes. Developers sign their apps with a private key, and the corresponding public key is used to verify the app’s integrity upon installation. The Google Play Store further reinforces this by vetting apps before they are available for download, ensuring that only those that haven’t been tampered with are accessible to users.

The Android app sandbox

Android supports only a single user and instead uses Linux user IDs to isolate app privileges. Under Android, each app normally runs under a different user ID. Hence, apps are isolated and can only access their own resources. Access requests to other objects involve messages that pass through a gatekeeper, which validates access requests.

Two mechanisms are used to enforce file access permissions:

  1. Linux file permissions: These provide discretionary access control, allowing the owner (and root) to change permissions to allow others access to the files. With this mechanism, an app can decide to share a data file.

  2. SELinux mandatory access control: Certain data and cache directories in Android are protected with the SELinux (Security-Enhanced Linux) mandatory access control (MAC) kernel extension. This ensures that even the owner cannot change access permissions for the files.

Internal storage provides a per-app private directory for files used by each application. External storage (e.g., attached microSD cards or USB devices) is shared among all apps and, of course, may be moved to other computers.

Other protections

The Linux operating system provides per-process memory isolation and address space layout randomization (ASLR). Linux also uses no-execute (NX) protection on stack and heap memory pages if the processor supports it.

The Java compiler provides stack canaries, and its memory management libraries provide some heap overflow protections (checks of backward & forward pointers in dynamically allocated structures).

Android supports whole disk encryption so that if a device is stolen, an attacker will not be able to easily recover file contents even with raw access to the flash file system.

Unlike iOS, Android supports the concurrent execution of multiple apps. It is up to the developer to be frugal with battery life. Apps store their state in persistent memory so they can be stopped and restarted at any time. The ability to stop an app also helps with DoS attacks, since a stopped app is not accepting requests or using system resources.

iOS security

App signing

iOS requires mandatory code signing. Unlike Android, which accepts self-signed certificates, the app package must be signed using an Apple Developer certificate, and apps are available only through the Apple App Store. This does not ensure the trustworthiness of an app but identifies the registered developer and ensures that the app has not been modified after it has been signed.

Runtime protection

Apple’s iOS provides runtime protection via OS-level sandboxing using a kernel-level sandbox. System resources and the kernel are shielded from user apps. The sandbox limits which system calls an app can make and the parameters to system calls. Except through kernel exploits, an app cannot leave its sandbox.

The app sandbox restricts the ability of one app to access another app’s data and resources. Each app has its own sandbox directory. The OS enforces the sandbox and permits access only to files within that directory, as well as restricted access to system preferences, the network, and other resources.

Inter-app communication can take place only through iOS APIs. Code generation by an app is prevented because data memory pages cannot be made executable and executable memory pages are not writable by user processes.

Data protection

All file contents are encrypted with a unique 256-bit AES per-file key, which is generated when the file is created.

This per-file key is encrypted with a class key and is stored along with the file’s metadata, which is part of the file system that describes attributes of the file, such as size, modification time, and access permissions.

The class key is generated from a hardware key in the device and the user’s passcode. Unless the passcode is entered, the class key cannot be created and the file key cannot be decrypted.

The file system’s metadata is also encrypted with a file system key, which is derived directly from the hardware key that is generated when iOS is installed. Keys are stored in Apple’s Secure Enclave, a separate processor with isolated memory that cannot be accessed directly by the main processor. Encrypting the metadata encrypts the entire structure of the file system. Someone who rips out the flash memory from an iOS device and examines it can see neither file contents (they are encrypted with per-file keys) nor information about those files (the metadata is encrypted with the file system key).
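
A conceptual Python sketch of this key hierarchy follows. It is not Apple's actual algorithm; it only mirrors the structure described above, using the third-party cryptography package for AES-GCM and PBKDF2 as a stand-in for the passcode entanglement.

    import os
    import hashlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

    hardware_key = os.urandom(32)   # stand-in for the key fused into the device
    passcode = b"1234"              # the user's passcode

    # Class key: derived from the user's passcode entangled with the hardware key.
    class_key = hashlib.pbkdf2_hmac("sha256", passcode, hardware_key, 100_000)

    # Per-file key: random, generated when the file is created.
    file_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(file_key).encrypt(nonce, b"file contents", None)

    # The per-file key is stored with the file's metadata, wrapped (encrypted)
    # under the class key.
    wrap_nonce = os.urandom(12)
    wrapped_file_key = AESGCM(class_key).encrypt(wrap_nonce, file_key, None)

    # Without the passcode, class_key cannot be recomputed, so the wrapped
    # file key, and therefore the file, cannot be decrypted.
    assert AESGCM(class_key).decrypt(wrap_nonce, wrapped_file_key, None) == file_key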

A hardware AES engine encrypts and decrypts the file as it is written/read on flash memory so file encryption is done transparently and efficiently.

The iOS kernel partition is mounted read-only, so even if an app manages to break out of its sandbox due to some vulnerability and gain root access, it will still not have permission to modify the kernel.

Additional kernel and hardware protection

In addition to the sandbox, iOS also uses address space layout randomization (ASLR) and memory execute protection for stack and heap pages via ARM’s Execute Never (XN) memory page flag.

Hardware support for security

ARM TrustZone worlds

All Android and iOS phones currently use ARM processors. ARM provides a dedicated security module, called TrustZone, that coexists with the normal processor. The hardware is separated into two “worlds”: secure (trusted) and non-secure (non-trusted) worlds. Any software resides in only one of these two worlds and the processor executes in only one world at a time.

Each of these worlds has its own operating system and applications. Android systems run an operating system called Trusty TEE in the secure world and, of course, Linux in the untrusted world.

Logically, you can think of the two worlds as two distinct processors, each running its own operating system with its own data and its own memory. Non-secure applications cannot directly access the memory or registers of secure-world resources. The only way the two worlds can communicate is through a messaging API.

In practice, the hardware creates two virtual cores for each CPU core, managing separate registers and all processing state in each world.

The phone’s operating system and all applications reside in the non-trusted world. Secure components, such as cryptographic keys, signature services, encryption services, and payment services, live in the trusted world. Even the operating system kernel does not have access to any of the code or data in the trusted world. Hence, even if an app carries out a privilege escalation attack and gains root access, it will be unable to access certain security-critical data.

Applications for the trusted world include key management, secure boot, digital rights management, secure payment processing, mobile payments, and biometric authentication.

Apple Secure Enclave

Apple uses modified ARM processors for iPhones and iPads. In 2013, they announced Secure Enclave for their processors. The details are confidential but it appears to be similar in function to ARM’s TrustZone but designed as a physically separate coprocessor. As with TrustZone, the Secure Enclave coprocessor runs its own operating system (a modified L4 microkernel in this case).

The processor has its own secure bootloader and custom software update mechanism. It uses encrypted memory so that anything outside the Secure Enclave cannot access its data. It provides:

  • All cryptographic operations for data protection & key management.
  • Random number generation.
  • Secure key store, including Touch ID (fingerprint) and the Face ID neural network.
  • Data storage for payment processing.

The Secure Enclave maintains the confidentiality and integrity of data even if the iOS kernel has been compromised.



