Final exam study guide

The three-hour study guide for the final exam

Paul Krzyzanowski

Latest update: Fri Oct 2 10:10:51 EDT 2020

Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the final exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the three hour time window in the title literally.


Computer security is about keeping computers, their programs, and the data they manage “safe.” Specifically, this means safeguarding three areas: confidentiality, integrity, and availability. These three are known as the CIA Triad (no relation to the Central Intelligence Agency).

Confidentiality means that we do not make a system’s data and its resources (the devices it connects to and its ability to run programs) available to everyone. Only authorized people and processes should have access. Privacy specifies limits on what information can be shared with others while confidentiality provides a means to block access to such information. Privacy is a reason for confidentiality. Someone being able to access a protected file containing your medical records without proper access rights is a violation of confidentiality.

Integrity refers to the trustworthiness of a system. This means that everything is as you expect it to be: users are not imposters and processes are running correctly.

  • Data integrity means that the data in a system has not been corrupted.

  • Origin integrity means that the person or system sending a message or creating a file truly is that person and not an imposter.

  • Recipient integrity means that the person or system receiving a message truly is that person and not an imposter.

  • System integrity means that the entire computing system is working properly; that it has not been damaged or subverted. Processes are running the way they are supposed to.

Maintaining integrity means not just defending against intruders that want to modify a program or masquerade as others. It also means protecting the system against against accidental damage, such as from user or programmer errors.

Availability means that the system is available for use and performs properly. A denial of service (DoS) attack may not steal data or damage any files but may cause a system to become unresponsive.

Security is difficult. Software is incredibly complex. Large systems may comprise tens or hundreds of millions of lines of code. Systems as a whole are also complex. We may have a mix of cloud and local resources, third-party libraries, and multiple administrators. If security was easy, we would not have massive security breaches year after year. Microsoft wouldn’t have monthly security updates. There are no magic solutions … but there is a lot that can be done to mitigate the risk of attacks and their resultant damage.

We saw that computer security addressed three areas of concern. The design of security systems also has three goals.

Prevention means preventing attackers from violating established security policies. It means that we can implement mechanisms into our hardware, operating systems, and application software that users cannot override – either maliciously or accidentally. Examples of prevention include enforcing access control rules for files and authenticating users with passwords.
Detection detects and reports security attacks. It is particularly important when prevention mechanisms fail. It is useful because it can identify weaknesses with certain prevention mechanisms. Even if prevention mechanisms are successful, detection mechanisms are useful to let you know that attempted attacks are taking place. An example of detection is notifying an administrator that a new user has been added to the system. Another example is being notified that there have been several consecutive unsuccessful attempts to log in.
If a system is compromised, we need to stop the attack and repair any damage to ensure that the system can continue to run correctly and the integrity of data is preserved. Recovery includes forensics, the study of identifying what happened and what was damaged so we can fix it. An example of recovery is restoration from backups.

Security engineering is the task of implementing the necessary mechanisms and defining policies across all the components of the system. Like other engineering disciplines, designing secure systems involves making compromises. A highly secure system will be disconnected from any communication network, sit in an electromagnetically shielded room that is only accessible to trusted users, and run software that has been thoroughly audited. That environment is not acceptable for most of our computing needs. We want to download apps, carry our computers with us, and interact with the world. Even in the ultra-secure example, we still need to be concerned with how we monitor access to the room, who wrote the underlying operating system and compilers, and whether authorized users can be coerced to subvert the system. Systems have to be designed with some idea of who are likely potential attackers and what the threats are. Risk analysis is used to understand the difficulty of an attack on a system, who will be affected, and what the worst thing that can happen is. A threat model is a data flow model (e.g., diagram) that identifies each place where information moves into or out of the software or between subsystems of the program. It allows you to identify areas where the most effort should be placed to secure a system.

Secure systems have two parts to them: mechanisms and policies. A policy is a description of what is or is not allowed. For example, “users must have a password to log into the system” is a policy. Mechanisms* are used to implement and enforce policies. An example of a mechanism is the software that requests user IDs and passwords, authenticates the user, and allows entry to the system only if the correct password is used.

A vulnerability is a weakness in the security system. It could be a poorly defined policy, a bribed individual, or a flaw in the underlying mechanism that enforces security. An attack is the exploitation of a vulnerability in a system. An attack vector refers to the specific technique that an attacker uses to exploit a vulnerability. Example attack vectors include phishing, keylogging, and trying common passwords to log onto a system. An attack surface is the sum of possible attack vectors in a system: all the places where an attacker might try to get into the system.

A threat is the potential adversary who may attack the system. Threats may lead to attacks.

Threats fall into four broad categories:

Disclosure: Unauthorized access to data, which covers exposure, interception, interference, and intrusion. This includes stealing data, improperly making data available to others, or snooping on the flow of data.

Deception: Accepting false data as true. This includes masquerading, which is posing as an authorized entity; substitution or insertion of includes the injection of false data or modification of existing data; repudiation, where someone falsely denies receiving or originating data.

Disruption: Some change that interrupts or prevents the correct operation of the system. This can include maliciously changing the logic of a program, a human error that disables a system, an electrical outage, or a failure in the system due to a bug. It can also refer to any obstruction that hinders the functioning of the system.

Usurpation: Unauthorized control of some part of a system. This includes theft of service as well as any misuse of the system such as tampering or actions that result in the violation of system privileges.

The Internet increases opportunities for attackers. The core protocols of the Internet were designed with decentralization, openness, and interoperability in mind rather than security. Anyone can join the Internet and send messages … and untrustworthy entities can provide routing services. It allows bad actors to hide and to attack from a distance. It also allows attackers to amass asymmetric power: harnessing more resources to attack than the victim has for defense. Even small groups of attackers are capable of mounting Distributed Denial of Service (DDoS) attacks that can overwhelm large companies or government agencies.

Adversaries can range from lone hackers to industrial spies, terrorists, and intelligence agencies. We can consider two dimensions: skill and focus. Regarding focus, attacks are either opportunistic or targeted. Opportunistic attacks are those where the attacker is not out to get you specifically but casts a wide net, trying many systems in the hope of finding a few that have a particular vulnerability that can be exploited. Targeted attacks are those where the attacker targets you specifically. The term script kiddies is used to refer to attackers who lack the skills to craft their own exploits but download malware toolkits to try to find vulnerabilities (e.g., systems with poor or default passwords, hackable cameras). Advanced persistent threats (APT) are highly-skilled, well-funded, and determined (hence, persistent) attackers. They can craft their own exploits, pay millions of dollars for others, and may carry out complex, multi-stage attacks.

We refer to the trusted computing base (TCB) as the collection of hardware and software of a computing system that is critical to ensuring the system’s security. Typically, this is the operating system and system software. If the TCB is compromised, you no longer have assurance that any part of the system is secure. For example. the operating system may be modified to ignore the enforcement of file access permissions. If that happens, you no longer have assurance that any application is accessing files properly.

Access control

See lecture notes

Program Hijacking

Program hijacking refers to techniques that can be used to take control of a program and have it do something other than what it was intended to do. One class of techniques uses code injection, in which an adversary manages to add code to the program and change the program’s execution flow to run that code.

The best-known set of attacks are based on buffer overflow. Buffer overflow is the condition where a programmer allocates a chunk of memory (for example, an array of characters) but neglects to check the size of that buffer when moving data into it. Data will spill over into adjacent memory and overwrite whatever is in that memory.

Languages such as C, C++, and assembler are susceptible to buffer overflows since the language does not have a means of testing array bounds. Hence, the compiler cannot generate code to validate that data is only going into the allocated buffer. For example, when you copy a string using strcpy(char *dest, char *src), you pass the function only source and destination pointers. The strcpy function has no idea how big either of the buffers are.

Stack-based overflows

When a process runs, the operating system’s program loader allocates a region for the executable code and static data (called the text and data segments), a region for the stack, and a region for the heap (used for dynamic memory allocation, such as by malloc).

Just before a program calls a function, it pushes the function’s parameters onto the stack. When the call is made, the return address gets pushed on the stack. On entry to the function that was called, the function pushes the current frame pointer (a register in the CPU) on the stack, which forms a linked list to the previous frame pointer and provides an easy way to revert the stack to where it was before making the function call. The frame pointer register is then set to the current top of the stack. The function then adjusts the stack pointer to make room for hold local variables, which live on the stack. This region for the function’s local data is called the stack frame. Ensuring that the stack pointer is always pointing to the top of the stack enables the function to get interrupts or call other functions without overwriting anything useful on the stack. The compiler generates code to reference parameters and local variables as offsets from the current frame pointer register.

Before a function returns, the compiler generates code to:

  • Adjust the stack back to point to where it was before the stack expanded to make room for local variables. This is done by copying the frame pointer to the stack pointer.

  • Restore the previous frame pointer by popping it off the stack (so that local variables for the previous function could be referenced properly).

  • Return from the function. Once the previous frame pointer has been popped off the stack, the stack pointer points to a location on the stack that holds the return address.

Simple stack overflows

Local variables are allocated on the stack and the stack grows downward in memory. Hence, the top of the stack is in lower memory than the start, or bottom, of the stack. If a buffer (e.g., char buf[128]) is defined as a local variable, it will reside on the stack. As the buffer gets filled up, its contents will be written to higher and higher memory addresses. If the buffer overflows, data will be written further down the stack (in higher memory), overwriting the contents of any other variables that were allocated for that function and eventually overwriting the saved frame pointer and the saved return address.

When this happens and the function tries to return, the return address that is read from the stack will contain garbage data, usually a memory address that is not mapped into the program’s memory. As such, the program will crash when the function returns and tries to execute code at that invalid address. This is an availability attack. If we can exploit the fact that a program does not check the bounds of a buffer and overflows the buffer, we can cause a program to crash.

Subverting control flow through a stack overflow

Buffer overflow can be used in a more malicious manner. The buffer itself can be filled with bytes of valid machine code. If the attacker knows the exact size of the buffer, she can write just the right number of bytes to write a new return address into the very same region of memory on the stack that held the return address to the parent function. This new return address points to the start of the buffer that contains the injected code. When the function returns, it will “return” to the new code in the buffer and execute the code at that location.

Off-by-one stack overflows

As we saw, buffer overflow occurs because of programming bugs: the programmer neglected to make sure that the data written to a buffer does not overflow. This often occurs because the programmer used old, unsafe functions that do not allow the programmer to specify limits. Common functions include:

- strcpy(char *dest, char *src)

- strcat(char *dest, char *src)

- sprintf(char *format, ...)

Each of these functions has a safe counterpart that accepts a count parameter so that the function will never copy more than count number of bytes:

- strcpy(char *dest, char *src, int count)

- strcat(char *dest, char *src, int count)

- sprintf(char *format, int count,  ...)

You’d think this would put an end to buffer overflow problems. However, programmers may miscount or they may choose to write their own functions that do not check array bounds correctly. A common error is an off-by-one error. For example, a programmer may declare a buffer as:

char buf[128];

and then copy into it with:

for (i=0; i <= 128; i++)
    buf[i] = stuff[i];

The programmer inadvertently used a <= comparison instead of <.

With off-by-one bounds checking, there is no way that malicious input can overwrite the return address on the stack: the copy operation would stop before that time. However, if the buffer is the first variable that is allocated on the stack, an off-by-one error can overwrite one byte of the saved frame pointer.

The potential for damage depends very much on what the value of that saved frame pointer was and how the compiler generates code for managing the stack. In the worst case, it could be set up to a value that is 255 bytes lower in memory. If the frame pointer is modified, the function will still return normally. However, upon returning, the compiler pops the frame pointer from the stack to restore the saved value of the calling function’s frame pointer, which was corrupted by the buffer overflow. Now the program has a modified frame pointer.

Recall that references to a function’s variables and parameters are expressed as offsets from the current frame pointer. Any references to local variables may now be references to data in the buffer. Moreover, should that function return, it will update its stack pointer to this buffer area and return to an address that the attacker defined.

Heap overflows

Not all data is allocated on the stack: only local variables. Global and static variables are placed in a region of memory right above the executable program. Dynamically allocated memory (e.g., via new or malloc) comes from an area of memory called the heap. In either case, since this memory is not the stack, it does not contain return addresses so there is no ability for a buffer overflow attack to overwrite return addresses.

We aren’t totally safe, however. A buffer overflow will cause data to spill over into higher memory addresses above the buffer that may contain other variables. If the attacker knows the order in which variables are allocated, they could be overwritten. While these overwrites will not change a return address, they can change things such as filenames, lookup tables, or linked lists. Some programs make extensive use of function pointers, which may be stored in global variables or in dynamically-allocated structures such as linked lists on a heap. If a buffer overflow can overwrite a function pointer then it can change the execution of the program: when that function is called, control will be transferred to a location of the attacker’s choosing.

If we aren’t sure of the exact address at which execution will start, we can fill a buffer with a bunch of NOP (no operation) instructions prior to the injected code. If the processor jumps anywhere in that region of memory, it will happily execute these NOP instructions until it eventually reaches the injected code. This is called a NOP slide, or a landing zone.

Format string attacks with printf

The family of printf functions are commonly used in C and C++ to create formatted output. They accept a format string that defines what will be printed, with % characters representing formatting directives for parameters. For example,

printf("value = %05d\n", v);

Will print a string such as

value = 01234

if the value of v is 1234.

Reading arbitrary memory

Occasionally, programs will use a format string that could be modified. For instance, the format string may be a local variable that is a pointer to a string. This local variable may be overwritten by a buffer overflow attack to point to a different string. It is also common, although improper, for a programmer to use printf(s) to print a fixed string s. If s is a string that is generated by the attacker, it may contain unexpected formatting directives.

Note that printf takes a variable number of arguments and matches each % directive in the format string with a parameter. If there are not enough parameters passed to printf, the function does not know that: it assumes they are on the stack and will happily read whatever value is on the stack where it thinks the parameter should be. This gives an attacker the ability to read arbitrarily deep into the stack. For example, with a format string such as:


printf will expect four parameters, all of which are missing. It will instead read the next four values that are on the top of the stack and print each of those integers as an 8-character-long hexadecimal value prefixed with leading zeros (“%08x\n”).

Writing arbitrary memory

The printf function also contains a somewhat obscure formatting directive: %n. Unlike other % directives that expect to read a parameter and format it, %n instead writes to the address corresponding to that parameter. It writes the number of characters that it has output thus far. For example,

printf(“paul%n says hi”, &printbytes);

will store the number 4 (strlen("paul")) into the variable printbytes. An attacker who can change the format specifier may be able to write to arbitrary memory. Each % directive to print a variable will cause printf to look for the next variable in the next slot in the stack. Hence, format directives such as %x, %lx, %llx will cause printf to skip over the length of an int, long, or long long and get the next variable from the following location on the stack. Thus, just like reading the stack, we can skip through any number of bytes on the stack until we get to the address where we want to modify a value. At that point, we insert a %n directive in the format string, which will modify that address on the stack with the number of bytes that were output. We can precisely control the value that will be written by specifying how many bytes are output as part of the format string. For example, a format of %.55000x tells printf to output a value to take up 55,000 characters. By using formats like that for output values, we can change the count that will be written with %n. Remember, we don’t care what printf actually prints; we just want to force the byte count to be a value we care about, such as the address of a function we want to call.

Defense against hijacking attacks

Better programming

Hijacking attacks are the result of sloppy programming: a lack of bounds checking that results in overflows. They can be eliminated if the programmer never uses unsafe functions (e.g., use strncpy instead of strcpy) and is careful about off-by-one errors.

A programer can use a technique called fuzzing to locate buffer overflow problems. Whenever a string can be provided by the user, the user will enter extremely long strings with well-defined patterns (e.g., “$$$$$$…”). If the app crashes because a buffer overflow destroyed a return address on the stack, the programmer can then load the core dump into a debugger, identify where the program crashed and search for a substring of the entered pattern (“$$$$$”) to identify which buffer was affected.

Buffer overflows can be avoided by using languages with stronger type checking and array bounds checking. Languages such as Java, C#, and Python check array bounds. C and C++ do not. However, it is sometimes difficult to avoid using C or C++.

Tight specification of requirements, coding to those requirements, and constructing tests based on those requirements helps avoid buffer overflow bugs. If input lengths are specified, they are more likely to be coded and checked. Documentation should be explicit, such as "user names longer than 32 bytes must be rejected.”

Data Execution Prevention (DEP)

Buffer overflows affect data areas: either the stack, heap, or static data areas. There is usually no reason that those regions of code should contain executable code. Hence, it makes sense for the operating system to set the processor’s memory management unit (MMU) to turn off execute permission for memory pages in those regions.

This was not possible with early Intel or AMD processors: their MMU did not support enabling or disabling execute permissions. All memory could contain executable code. That changed in 2004, when Intel and AMD finally added an NX (no-execute) bit to their MMU’s page tables. On Intel architectures, this was called the Execute Disable Bit (XD). Operating system support followed. Windows, Linux, and macOS all currently support DEP.

DEP cannot always be used. Some environments, such as some LISP interpreters actually do need execution enabled in their stack and some environments need executable code in their heap section (to support dynamic loading, patching, or just-in-time compilation). DEP also does not guard against data modification attacks, such as heap-based overflows or some printf attacks.

DEP attacks

Attackers came up with some clever solutions to defeat DEP. The first of these is called return-to-libc*. Buffer overflows still allow us to corrupt the stack. We just cannot execute code on the stack. However, there is already a lot of code sitting in the program and the libraries it uses. Instead of adding code into the buffer, the attacker merely overflows a buffer to create a new return address and parameter list on the stack. When the function returns, it switches control to the new return address. This return address will be an address in the standard C library (libc), which contains functions such as printf, system, and front ends to system calls. All that an attacker often needs to do is to push parameters that point to a string in the buffer that contains a command to execute and then “return” to the libc system function, whose function is to execute a parameter as a shell command.

A more sophisticated variant of return-to-libc is Return Oriented Programming (ROP). Return oriented programming is similar to return-to-libc but realizes that execution can branch to any arbitrary point in any function in any loaded library. The function will execute a series of instructions and eventually return. The attacker will overflow the stack with data that now tells this function where to “return”. Its return can jump to yet another arbitrary point in another library. When that returns, it can – once again – be directed to an address chosen by the intruder that has been placed further down the stack, along with frame pointers, local variables, and parameters.

There are lots and lots of return instructions among all the libraries normally used by programs. Each of these tail ends of a function is called a gadget. It has been demonstrated that using carefully chosen gadgets allows an attacker to push a string of return addresses that will enable the execution of arbitrary algorithms. To make life easier for the attacker, tools have been created that search through libraries and identify useful gadgets. A ROP compiler then allows the attacker to program operations using these gadgets.

Address Space Layout Randomization

Stack overflow attacks require knowing and injecting an address that will be used as a target when a function returns. ROP also requires knowing addresses of all the entry points of gadgets. Address Space Layout Randomization (ASLR) is a technique that was developed to have the operating system’s program loader pick random starting points for the executable program, static data, heap, stack, and shared libraries. Since code and data resides in different locations each time the program runs, the attacker is not able to program buffer overflows with useful known addresses. For ASLR to work, the program and all libraries must be compiled to use position independent code (PIC), which uses relative offsets instead of absolute memory addresses.

Stack canaries

A stack canary is a compiler technique to ensure that a function will not be allowed to return if a buffer overflow took place that may have clobbered the return address.

At the start of a function, the compiler adds code to generate a random integer (the canary) and push it onto the stack before allocating space for the function’s local variables (the entire region of the stack used by a local function is called a frame). The canary sits between the return address and these variables. If there is a buffer overflow in a local variable that tries to change the return address, that overflow will have to clobber the value of the canary.

The compiler generates code to have the function check that the canary has a valid value before returning. If the value of the canary is not the original value then a buffer overflow occurred and it’s very likely that the return value has been altered.

However, you may still have a buffer overflow that does not change the value of the canary or the return address. Consider a function that has two local arrays (buffers). They’re both allocated on the stack within the same stack frame. If array A is in lower memory than array B then an overflow in A can affect the contents of B. Depending on the code, that can alter the way the function works. The same thing can happen with scalar variables (non-arrays). For instance, suppose the function allocates space for an integer followed by an array. An overflow in the array can change the value of the integer that’s in higher memory. The canary won’t detect this. Even if the overflow happened to clobber the return value as well, the check is made only when the function is about to return. Meanwhile, it’s possible that the overflow that caused other variables to change also altered the behavior of the function.

Stack canaries cannot fix this problem in general. However, the compiler (which creates the code to generate them and check them) can take steps to ensure that a buffer overflow cannot overwrite non-array variables, such as integers and floats. By allocating arrays first (in higher memory) and then scalar variables, the compiler can make sure that a buffer overflow in an array will not change the value of scalar variables. One array overflowing to another is still a risk, however, but it is most often the scalar variables that contain values that define the control flow of a function.

Command Injection

We looked at buffer overflow and printf format string attacks that enable the modification of memory contents to change the flow of control in the program and, in the case of buffer overflows, inject executable binary code (machine instructions). Other injection attacks enable you to modify inputs used by command processors, such as interpreted languages or databases. We will now look at these attacks.

SQL Injection

It is common practice to take user input and make it part of a database query. This is particularly popular with web services, which are often front ends for databases. For example, we might ask the user for a login name and password and then create a SQL query:

    ”SELECT * from logininfo WHERE username = '%s' AND password = '%s’;",
    uname, passwd);

Suppose that the user entered this for a password:

' OR 1=1 --

We end up creating this query string[1]:

SELECT * from logininfo WHERE username = 'paul' AND password = '' OR 1=1 -- ';

The “--” after “1=1” is a SQL comment, telling it to ignore everything else on the line. In SQL, OR operations have precendence over AND so the query checks for a null password (which the user probably does not have) or the condition 1=1, which is always true. In essence, the user’s “password” turned the query into one that ignores the user’s password and unconditionally validates the user.

Statements such as this can be even more destructive as the user can use semicolons to add multiple statements and perform operations such as dropping (deleting) tables or changing values in the database.

This attack can take place because the programmer blindly allowed user input to become part of the SQL command without validating that the user data does not change the quoting or tokenization of the query. A programmer can avoid the problem by carefully checking the input. Unfortunately, this can be difficult. SQL contains too many words and symbols that may be legitimate in other contexts (such as passwords) and escaping special characters, such as prepending backslashes or escaping single quotes with two quotes can be error prone as these escapes differ for different database vendors. The safest defense is to use parameterized queries, where user input never becomes part of the query but is brought in as parameters to it. For example, we can write the previous query as:

uname = getResourceString("username");
passwd = getResourceString("password");
query = "SELECT * FROM users WHERE username = @0 AND password = @1";
db.Execute(query, uname, passwd);

A related safe alternative is to use stored procedures. They have the same property that the query statement is not generated from user input and parameters are clearly identified.

While SQL injection is the most common code injection attack, databases are not the only target. Creating executable statements built with user input is common in interpreted languages, such as Shell, Perl, PHP, and Python. Before making user input part of any invocable command, the programmer must be fully aware of parsing rules for that command interpreter.

Shell attacks

The various POSIX[2] shells (sh, csh, ksh, bash, tcsh, zsh) are commonly used as scripting tools for software installation, start-up scripts, and tying together workflow that involves processing data through multiple commands. A few aspects of how many of the shells work and the underlying program execution environment can create attack vectors.

system() and popen() functions

Both system and popen functions are part of the Standard C Library and are common functions that C programmers use to execute shell commands. The system function runs a shell command while the popen function also runs the shell command but allows the programmer to capture its output and/or send it input via the returned FILE pointer.

Here we again have the danger of turning improperly-validated data into a command. For example, a program might use a function such as this to send an email alert:

char command[BUFSIZE];
snprintf(command, BUFSIZE, "/usr/bin/mail –s \"system alert\" %s", user);
FILE *fp = popen(command, "w");

In this example, the programmer uses snprintf to create the complete command with the desired user name into a buffer. This incurs the possibility of an injection attack if the user name is not carefully validated. If the attacker had the option to set the user name, she could enter a string such as:

nobody; rm -fr /home/*

which will result in popen running the following command:

sh -c "/usr/bin/mail -s \"system alert\" nobody; rm -fr /home/*"

which is a sequence of commands, the latter of which deletes all user directories.

Other environment variables

The shell PATH environment variable controls how the shell searches for commands. For instance, suppose


and the user runs the ls command. The shell will search through the PATH sequentially to find an executable filenamed ls:


If an attacker can either change a user’s PATH environment variable or if one of the paths is publicly writable and appears before the “safe” system directories, then he can add a booby-trapped command in one of those directories. For example, if the user runs the ls command, the shell may pick up a booby-trapped version in the /usr/local/bin directory. Even if a user has trusted locations, such as /bin and /usr/bin foremost in the PATH, an intruder may place a misspelled version of a common command into another directory in the path. The safest remedy is to make sure there are no untrusted directories in PATH.

Some shells allow a user to set an ENV or BASH_ENV variable that contains the name of a file that will be executed as a script whenever a non-interactive shell is started (when a shell script is run, for example). If an attacker can change this variable then arbitrary commands may be added to the start of every shell script.

Shared library environment variables

In the distant past, programs used to be fully linked, meaning that all the code needed to run the program, aside from interactions with the operating system, was part of the executable program. Since so many programs use common libraries, such as the Standard C Library, they are not compiled into the code of an executable but instead are dynamically loaded when needed.

Similar to PATH, LD_LIBRARY_PATH is an environment variable used by the operating system’s program loader that contains a colon-separated list of directories where libraries should be searched. If an attacker can change a user’s LD_LIBRARY_PATH, common library functions can be overwritten with custom versions. The LD_PRELOAD environment variable allows one to explicitly specify shared libraries that contain functions that override standard library functions.

LD_LIBRARY_PATH and LD_PRELOAD will not give an attacker root access but they can be used to change the behavior of program or to log library interactions. For example, by overwriting standard functions, one may change how a program generates encryption keys, uses random numbers, sets delays in games, reads input, and writes output.

As an example, let’s suppose we have a trial program that checks the current time against a hard-coded expiration time:

#include <time.h>
#include <stdio.h>
#include <stdlib.h>

main(int argc, char **argv)
    unsigned long expiration = 1483228800;
    time_t now;

    /* check software expiration */
    now = time(NULL);
    if (time(NULL) > (time_t)expiration) {
        fprintf(stderr, "This software expired on %s", ctime(&expiration));
        fprintf(stderr, "This time is now %s", ctime(&now));
        fprintf(stderr, "You're good to go: %lu days left in your trial.\n",
    return 0;

When run, we may get output such as:

$ ./testdate
This software expired on Sat Dec 31 19:00:00 2016
This time is now Sun Feb 18 15:50:44 2018

Let us write a replacement time function that always returns a fixed value that is less than the one we test for. We’ll put it in a file called time.c:

unsigned long time() {
    return (unsigned long) 1483000000;

We compile it into a shared library:

gcc -shared -fPIC time.c -o

Now we set LD_PRELOAD and run the program:

$ export LD_PRELOAD=$PWD/
$ ./testdate
You're good to go: 2 days left in your trial.

Note that our program now behaves differently and we never had to recompile it or feed it different data!

Input sanitization

The important lesson in writing code that uses any user input in forming commands is that of input sanitization. Input must be carefully validated to make sure it conforms to the requirements of the application that uses it and does not try to execute additional commands, escape to a shell, set malicious environment variables, or specify out-of-bounds directories or devices.

File descriptors

POSIX systems have a convention that programs expect to receive three open file descriptors when they start up:

  • file descriptor 0: standard input

  • file descriptor 1: standard output

  • file descriptor 2: standard error

Functions such as printf, scanf, puts, getc and others expect these file desciptors to be available for input and output. When a program opens a new file, the operating system searches through the file descriptor table and allocates the first available unused file descriptor. Typically this will be file descriptor 3. However, if any of the three standard file descriptors are closed, the operating system will use one of those as an available, unused file descriptor.

The vulnerability lies in the fact that we may have a program running with elevated privileges (e.g., setuid root) that modifies a file that is not accessible to regular users. If that program also happens to write to the user via, say, printf, there is an opportunity to corrupt that file. The attacker simply needs to close the standard output (file descriptor 1) and run the program. When it opens its secret file, it will be given file descriptor 1 and will be able to do its read and write operations on the file. However, whenever the program will print a message to the user, the output will not be seen by the user as it will be directed to what printf assumes is the standard output: file descriptor 1. Printf output will be written onto the secret file, thereby corrupting it.

The shell command (bash, sh, or ksh) for closing the standard output file is an obscure-looking >&-. For example:

./testfile >&-

Comprehension Errors

The overwhelming majority of security problems are caused by bugs or misconfigurations. Both often stem from comprehension errors. These are mistakes created when someone – usually the programmer or administrator – does not understand the details and every nuance of what they are doing. Some example include:

  • Not knowing all possible special characters that need escaping in SQL commands.

  • Not realizing that the standard input, output, or error file descriptors may be closed.

  • Not understanding how access control lists work or how to configure mandatory access control mechanisms such as type enforcement correctly.

If we consider the Windows CreateProcess function, we see it is defined as:

BOOL WINAPI CreateProcess(
  _In_opt_    LPCTSTR               lpApplicationName,
  _Inout_opt_ LPTSTR                lpCommandLine,
  _In_opt_    LPSECURITY_ATTRIBUTES lpProcessAttributes,
  _In_opt_    LPSECURITY_ATTRIBUTES lpThreadAttributes,
  _In_        BOOL                  bInheritHandles,
  _In_        DWORD                 dwCreationFlags,
  _In_opt_    LPVOID                lpEnvironment,
  _In_opt_    LPCTSTR               lpCurrentDirectory,
  _In_        LPSTARTUPINFO         lpStartupInfo,
  _Out_       LPPROCESS_INFORMATION lpProcessInformation);

We have to wonder whether a programmer who does not use this frequently will take the time to understand the ramifications of correctly setting process and thread security attributes, the current directory, environment, inheritance handles, and so on. There’s a good chance that the programmer will just look up an example on places such as or and copy something that seems to work, unaware that there may be obscure side effects that compromise security.

As we will see in the following sections, comprehension errors also apply to the proper understanding of things as basic as various ways to express characters.

Directory parsing

Some applications, notably web servers, accept hierarchical filenames from a user but need to ensure that they restrict access only to files within a specific point in the directory tree. For example, a web server may need to ensure that no page requests go outside of /home/httpd/html.

An attacker may try to gain access by using paths that include .. (dot-dot), which is a link to the parent directory. For example, an attacker may try to download a password file by requesting

The hope is that the programmer did not implement parsing correctly and might try simply suffixing the user-requested path to a base directory:

"/home/httpd/html/" + "../../../etc/passwd"

to form


which will retrieve the password file, /etc/passwd.

A programmer may anticipate this and check for dot-dot but has to realize that dot-dot directories can be anywhere in the path. This is also a valid pathname but one that should be rejected for trying to escape to the parent:

Moreover, the programmer cannot just search for .. because that can be a valid part of a filename. All three of these should be accepted:

Also, extra slashes are perfectly fine in a filename, so this is acceptable:

The programmer should also track where the request is in the hierarchy. If dot-dot doesn’t escape above the base directory, it should most likely be accepted:

These are not insurmountable problems but they illustrate that a quick-and-dirty attempt at filename processing may be riddled with bugs.

Unicode parsing

If we continue on the example of parsing pathnames in a web server, let us consider a bug in early releases of Microsoft’s IIS (Internet Information Services, their web server). IIS had proper pathname checking to ensure that attempts to get to a parent are blocked:

Once the pathname was validated, it was passed to a decode function that decoded any embedded Unicode characters and then processed the request.

The problem with this technique was that non-international characters (traditional ASCII) could also be written as Unicode characters. A “/” could also be written in HTML as its hexadecimal value, %2f (decimal 47). It could also be represented as the two-byte Unicode sequence %c0%af.

The reason for this stems from the way Unicode was designed to support compatibility with one-byte ASCII characters. This encoding is called UTF–8. If the first bit of a character is a 0, then we have a one-byte ASCII character (in the range 0..127). However, if the first bit is a 1, we have a multi-byte character. The number of leading 1s determine the number of bytes that the character takes up. If a character starts with 110, we have a two-byte Unicode character.

With a two-byte character, the UTF–8 standard defines a bit pattern of

110a bcde   10fg hijk

The values a-k above represent 11 bits that give us a value in the range 0..2047. The “/” character, 0x2f, is 47 in decimal and 0010 1111 in binary. The value represents offset 47 into the character table (called codepoint in Unicode parlance). Hence we can represent the “/” as 0x2f or as the two byte Unicode sequence:

1100 0000   1010 1111

which is the hexadecimal sequence %c0%af. Technically, this is disallowed. The standard states that codepoints less than 128 must be represented as one byte but the two byte sequence is supported by most Unicode parsers. We can also construct a valid three-byte sequence too.

Microsoft’s bug was that they ignored parsing %c0%af as being equivalent to a / because it should not have been used to represent the character. However, the Unicode parser was happy to translate it and attackers were able to use this to access any file in on a server running IIS. This bug also gave attackers the ability to invoke, the command interpreter, and execute any commands on the server.

After Microsoft fixed the multi-byte Unicode bug, another problem came up. The parsing of escaped characters was recursive, so if the resultant string looked like a Unicode hexadecimal sequence, it would be re-parsed.

As an example of this, let’s consider the backslash (\), which Microsoft treats as equivalent to a slash (/) in URLs since their native pathname separator is a backlash[3].

The backslash can be written in a URL in hexadecimal format as %5c. The “%” character can be expressed as %25. The “5” character can be expressed as %35. The “c” character can be expressed as %63. Hence, if the URL parser sees the string %%35c, it would expand the %35 to the character “5”, which would result in %5c, which would then be converted to a \. If the parser sees %25%35%63, it would expand each of the %nn components to get the string %5c, which would then be converted to a \. As a final example, if the parser comes across %255c, it will expand %25 to % to get the string %5c, which would then be converted to a \.

It is not trivial to know what a name relates to but it is clear that all conversions have to be done before the validity of the pathname is checked. As for checking the validity of the pathname in an application, it is error-prone. The operating system itself parses a pathname a component at a time, traversing the directory tree and checking access rights as it goes along. The application is trying to recreate a similar action without actually traversing the file system but rather by just parsing the name and mapping it to a subtree of the file system namespace.

TOCTTOU attacks

TOCTTOU stands for Time of Check to Time of Use. If we have code of the form:

if I am allowed to do something
    then do it

we may be exposing ourselves to a race condition. There is a window of time between the test and the action. If an attacker can change the condition after the check then the action may take place even if the check should have failed.

One example of this is the print spooling program, lpr. It runs as a setuid program with root privileges so that it can copy a file from a user’s directory into a privileged spool directory that serves as a queue of files for printing. Because it runs as root, it can open any file, regardless of permissions. To keep the user honest, it will check access permissions on the file that the user wants to print and then, only if the user has legitimate read access to the file, it will copy it over to the spool directory for printing. An attacker can create a link to a readable file and then run lpr in the background. At the same time, he can change the link to point to a file for which he does not have read access. If the timing is just perfect, the lpr program will check access rights before the file is re-linked but will then copy the file for which the user has no read access.

Another example of the TOCTTOU race condition is the set of temporary filename creation functions (tempnam, tempnam, mktemp, GetTempFileName, etc.). These functions create a unique filename when they are called but there is no guarantee that an attacker doesn’t create a file with the same name before that filename is used. If the attacker creates and opens a file with the same name, she will have access to that file for as long as it is open, even if the user’s program changes access permissions for the file later on.

The best defense for the temporary file race condition is to use the mkstemp function, which creates a file based on a template name and opens it as well, avoiding the race condition between checking the uniqueness of the name and opening the file.

  1. Note that sprintf is vulnerable to buffer overflow. We should use snprintf, which allows one to specify the maximum size of the buffer.  ↩

  2. Unix, Linux, macOS, FreeBSD, NetBSD, OpenBSD, Android, etc.  ↩

  3. the official Unicode name for the slash and backslash characters are solidus and reverse solidus, respectively.  ↩

App confinement

Two lessons we learned from experience are that applications can be compromised and that applications may not always be trusted. Server applications, in particular, such as web servers and mail servers have been compromised over and over again. This is particularly harmful as they often run with elevated privileges and on systems on which normal users do not have accounts. The second category of risk is that we may not always trust an application. We trust our web server to work properly but we cannot necessarily trust that the game we downloaded from some unknown developer will not try to upload our files, destroy our data, or try to change our system configuration. In fact, unless we have the ability to scrutinize the codebase of a service, we will not know for sure if it tries to modify any system settings or writes files to unexpected places.

With this resignation to security in mind, we need to turn our attention to limiting the resources available to an application and making sure that a misbehaving application cannot harm the rest of the system. These are the goals of confinement.

Our initial thoughts to achieving confinement may involve proper use of access controls. For example, we can run server applications as low-privilege users and make sure that we have set proper read/write/execute permissions on files, read/write/search permissions on directories, or even set up role-based policies.

However, access controls usually do not give us the ability to set permissions for “don’t allow access to anything else.” For example, we may want our web server to have access to all files in /home/httpd but nothing outside of that directory. Access controls do not let us express that rule. Instead, we are responsible for changing the protections of every file on the system and making sure it cannot be accessed by “other”. We also have to hope that no users change those permissions. In essence, we must disallow the ability for anyone to make files publicly accessible because we never want our web server to access them. We may be able to use mandatory access control mechanisms if they are available but, depending on the system, we may not be able to restrict access properly either. More likely, we will be at risk of comprehension errors and be likely to make a configuration error, leaving parts of the system vulnerable. To summarize, even if we can get access controls to help, we will not have high assurance that they do.

Access controls also only focus on protecting access to files and devices. A system has other resources, such as CPU time, memory, disk space, and network. We may want to control how much of all of these an application is allowed to use. POSIX systems provide a setrlimit system call that allows one to set limits on certain resources for the current process and its children. These controls include the ability to set file size limits, CPU time limits, various memory size limits, and maximum number of open files.

We also may want to control the network identity for an application. All applications share the same IP address on a system but this may allow a compromised application to exploit address-based access controls. For example, you may be able to connect to or even log into system that believe you are a trusted computer. An exploited application may end up confusing network intrusion detection systems.

Just limiting access through resource limits and file permissions is also insufficient for services that run as root. If an attacker can compromise an app and get root access to execute arbitrary functions, she can change resource limits (just call setrlimit with different values), change any file permissions, and even change the IP address and domain name of the system.

In order to truly confine an application, we would like to create a set of mechanisms that enforce access controls to all of a system’s resources, are easy to use so that we have high assurance in knowing that the proper restrictions are in place, and work with a large class of applications. We can’t quite get all of this yet but we can come close.


The oldest app confinement mechanism is Unix’s chroot system call and command, originally introduced in 1979 in the seventh edition[1]. The chroot system call changes the root directory of the calling process to the directory specified as a parameter.


Sets the root of the file system to /home/httpd/html for the process and any processes it creates. The process cannot see any files outside that subset of the directory tree. This isolation is often called a chroot jail.


If you run chroot, you will likely get an error along the lines of:

# chroot newroot
chroot: failed to run command ‘/bin/bash’: No such file or directory

This is because /bin/bash is not within the root (in this case, the newroot directory). You’ll then create a bin subdirectory and try running chroot again and get the same error:

# mkdir newroot/bin
# ln /bin/bash newroot/bin/bash
# chroot newroot
chroot: failed to run command ‘/bin/bash’: No such file or directory

You’ll find that is also insufficient and that you’ll need to bring in the shared libraries that /bin/bash needs by mounting /lib, /lib64, and /usr/lib within that root just to enable the shell to run. Otherwise, it cannot load the libraries it needs since it cannot see above its root (i.e., outside its jail). To simplify this process, a jailkit simplifies the process of setting up a chroot jail by providing a set of utilities to make it easier to create the desired environment within the jail and populate it with basic accounts, commands, and directories.

Problems with chroot

Chroot only limits access to the file system namespace. It does not restrict access to resources and does not protect the machine’s network identity. Applications that are compromised to give the attacker root access make the entire system vulnerable since the attacker has access to all system calls.

Chroot is available only to administrators. If this was not the case then any user would be able to get root access within the chroot jail. You would: 1. Create a chroot jail 2. Populate it with the shell program and necessary support libraries 3. Link the su command (set user, which allows you to authenticate to become any user) 4. Create password files within the jail with a known password for root. 5. Use the chroot command to enter the jail. 6. Run su root to become the root user. The command will prompt you for a password and validate it against the password file. Since all processes run within the jail, the password file is the one you set up.

You’re still in the jail but you have root access.

Escaping from chroot

If someone manages to compromise an application running inside a chroot jail and become root, they are still in the jail but have access to all system calls. For example, they can send signals to kill all other processes or shut down the system. This would be an attack on availability.

Attaining root access also provides a few ways of escaping the jail. On POSIX systems, all non-networked devices are accessible as files within the filesystem. Even memory is accessible via a file (/dev/mem). An intruder in a jail can create a memory device (character device, major number = 1, minor number = 1):

mknod mem c 1 1

With the memory device, the attacker can patch system memory to change the root directory of the jail. More simply, an attacker can create a block device with the same device numbers as that of the main file system. For example, the root file system on my Linux system is /dev/sda1 with a major number of 8 and a minor number of 1. An attacker can recreate that in the jail:

mknod rootdisk b 8 1

and then mount it as a file system within the jail:

mount -t ext4 rootdisk myroot

Now the attacker, still in the jail, has full access to the entire file system, which is as good as being out of the jail. He can add user accounts, change passwords, delete log files, run any commands, and even reboot the system to get a clean login.

FreeBSD Jails

Chroot was good in confining the namespace of an application but useless against providing security if an application had root access and did nothing to restrict access to other resources.

FreeBSD Jails are an enhancement to the idea of chroot. Jails provide a restricted filesystem namespace, just like chroot does, but also place restrictions on what processes are allowed to do within the jail, including selectively removing privileges from the root user in the jail. For example, processes within a jail may be configured to:

  • Bind only to sockets with a specified IP address and specific ports
  • Communicate only with other processes within the jail and none outside
  • Not be able to load kernel modules, even if root
  • Have restricted access to system calls that include:
    • Ability to create raw network sockets
    • Ability to create devices
    • Modify the network configuration
    • Mount or unmount filesystems

FreeBSD Jails are a huge improvement over chroot since known escapes, such as creating devices and mounting filesystems and even rebooting the system are disallowed. Depending on the application, policies may be coarse. The changed root provides all or nothing access to a part of the file system. This does not make Jails suitable for applications such as a web browser, which may be untrusted but may need access to files outside of the jail. Think about web-based applications such as email, where a user may want to upload or download attachments. Jails also do not prevent malicious apps from accessing the network and trying to attack other machines … or from trying to crash the host operating system. Moreover, FreeBSD Jails is a BSD-only solution. With an estimated 0.95…1.7% share of server deployments, it is a great solution on an operating system that is not that widely used.

Linux namespaces, cgroups, and capabilities

Linux’s answer to FreeBSD Jails was a combination of three elements: control groups, namespaces, and capabilities.

Control groups (cgroups)

Linux control groups, also called cgroups, allow you to allocate resources such as CPU time, system memory, disk bandwidth, network bandwidth, and the ability to monitor resource usage among user-defined groups of processes. This allows, for example, an administrator to allocate a larger share of the processor to a critical server application.

An administrator creates one or more cgroups and assigns resource limits to each of them. Then any application can be assigned to a control group and will not be able to use more than the resource limits configured in that control group. Applications are unaware of these limits. Control groups are organized in a hierarchy similar to processes. Child cgroups inherit some attributes from the parents.

Linux namespaces

Chroot only restricted the filesystem namespace. The filesystem namespace is the best known namespace in the system but not the only one. Linux namespaces Namespaces provide control over how processes are isolated in the following namespaces:

Namespace Description Controls
IPC System V IPC, POSIX message queues Objects created in an IPC namespace are only visible to other processes in that namespace (CLONE_NEWIPC)
Network Network devices, stacks, ports Isolates IP protocol stacks, IP routing tables, firewalls, socket port numbers ( CLONE_NEWNET)
Mount Mount points A set of processes can have their own distinct mount points and view of the file system (CLONE_NEWNS)
PID Process IDs Processes in different PID namespaces can have their process IDs – the child cannot see parent processes or other namespaces (CLONE_NEWPID)
User User & group IDs Per-namespace user/group IDs. Also, you can be root in a namespace but have restricted privileges ( CLONE_NEWUSER )
UTS host name and domain name setting hostname and domainname will not affect rest of the system (CLONE_NEWUTS)
Cgroup control group Sets a new control group for a process (CLONE_NEWCGROUP)

A process can dissociate any or all of these namespaces from its parent via the unshare system call. For example, by unsharing the PID namespace, a process gets a no longer sees other processes and will only see itself and any child processes it creates.

The Linux clone system call is similar to fork in that it creates a new process. However, it allows you to pass flags that will specify which parts of the execution context will be shared with the parent. For example, a cloned process may choose to share memory and open file descriptors, which will make it behave like threads. It can also choose to share – or not – any of the elements of the namespace.


A problem that FreeBSD Jails tackled was that of restricting the power of root inside a Jail. You could be a root user but still disallowed from executing certain system calls. POSIX (Linux) capabilities[2] tackle this issue as well.

Traditionally, Unix systems distinguished privileged versus unprivileged processes. Privileged processes were those that ran with a user ID of 0, called the root user. When running as root, the operating system would allow access to all system calls and all access permission checks were bypassed. You could do anything.

Linux capabilities identify groups of operations, called capabilities, that can be controlled independently on a per-thread basis. The list is somewhat long, 38 groups of controls, and includes capabilities such as:

  • CAP_CHOWN: make arbitrary changes to file UIDs and GIDs
  • CAP_DAC_OVERRIDE: bypass read/write/execute checks
  • CAP_KILL: bypass permission checks for sending signals
  • CAP_NET_ADMIN: network management operations
  • CAP_NET_RAW: allow RAW sockets
  • CAP_SETUID: arbitrary manipulation of process UIDs
  • CAP_SYS_CHROOT: enable chroot

The kernel keeps track of four capability sets for each thread. A capability set is a list of zero or more capabilities. The sets are:

  • Permitted: If a capability is not in this set, the thread or its children can never require that capability. This limits the power of what a process and its children can do.

  • Inheritable: These capabilities will be inherited when a thread calls execve to execute a program (POSIX programs are executed with the same thread; we are not creating a new process)

  • Effective: This is the current set of capabilities that the thread is using. The kernel uses these to perform permission checks.

  • Ambient: This is similar to Inheritable and contains a set of capabilities that are preserved across an execve of a program that is not privileged. If a setuid or setgid program is run, will clear the ambient set. These are created to allow a partial use of root features in a controlled manner. It is useful for user-level device drivers or software that needs a specific privilege (e.g., for certain networking operations).

A child process created via fork (the standard way of creating processes) will inherit copies of its parent’s capability sets following the rules of which capabilities have been marked as inheritable.

A set of capabilities can be assigned to an executable file by the administrator. They are stored as a file’s extended attributes (along with access control lists, checksums, and arbitrary user-defined name-value pairs). When the program runs, the executing process may further restrict the set of capabilities under which it operates if it chooses to do so (for example, after performing an operation that required the capability and knowing that it will no longer need to do so).

From a security point of view, the key concept of capabilities is that they allow us to provide limited elevation of privileges to a process. A process does not need to run as root (user ID 0) but can still be granted very specific privileges. For example, we can grant the ping command the ability to access raw sockets so it can send an ICMP ping message on the network but not have any other administrative powers. The application does not need to run as root and even if an attacker manages to inject code, the opportunities for attack will be restricted.

The Linux combination of cgroups, namespaces, and capabilities provides a powerful set of mechanisms to

  1. Set limits on the system resources (processor, disk, network) that a group of processes will use.

  2. Constrain the namespace, making parts of the filesystem or the existence of other processes or users invisible.

  3. Give restricted privileges to specific applicatiosn so they do not need to run as root.

This enables us to create stronger jails and have a fine degree of control as to what processes are or are not allowed to do in that jail.

While bugs have been found these mechanisms, the more serious problem is that of comprehension. The system has become far, far more complex than it was in the days of chroot. A user has to learn quite a lot to use these mechanisms properly. Failure to understand their behavior fully can create vulnerabilities. For example, namespaces do not prohibit a process from making privileged system calls. They simply limit what a process can see. A process may not be able to send a kill signal to another process only because it does not share the same process ID namespace.

Together with capabilities, namespaces allow a restricted environment that also places limits on the abilities to perform operations even if a process is granted root privileges. This enables ordinary users to create namespaces. You can create a namespace and even create a process running as a root user (UID 0) within that namespace but it will have no capabilities beyond those that were granted to the user; the user ID of 0 gets mapped by the kernel to a non-privileged user.


Software rarely lives as an isolated application. Some software requires multiple applications and most software relies on the installation of other libraries, utilities, and packages. Keeping track of these dependencies can be difficult. Worse yet, updating one shared component can sometimes cause another application to break. What was needed was a way to isolate the installation, execution, and management of multiple software packages that run on the same system.

Various attempts were undertaken to address these problems.

  1. The most basic was to fix problems when they occurred. This required carefully following instructions for installation, update, and configuration of software and extensive testing of all services on the system when anything changed. Should something break, the service would be unavailable until the problems were fixed.

  2. A drastic, but thorough, approach to isolation was to simply run each service on its own computer. That avoids conflicts in library versions and other dependencies. However, it is an expensive solution, is cumbersome, and is often overkill in most environments.

  3. Finally, administrators could deploy virtual machines. This is a technology that allows one to run multiple operating systems on one computer and gives the illusion of services running on distinct systems. However, this is a heavyweight solution. Every service needs its own installation of the operating system and all supporting software for the service as well as standard services (networking, device management, shell, etc.). It is not efficient in terms of CPU, disk, or memory resources – or even administration effort.

Containers are a mechanism that were originally created not for security but to make it easy to package, distribute, relocate, and deploy collections of software. The focus of containers is not to enable end users to install and run their favorite apps but rather for administrators to be able to deploy a variety of services on a system. A container encapsulates all the necessary software for a service, all of its dependencies, and its configuration into one package that can be easily passed around, installed, and removed.

In many ways, a container feels like a virtual machine. Containers provide a service with a private process namespace, its own network interface, and its own set of libraries to avoid problems with incompatible versions used by other software. Containers also allow an administrator to give the service restricted powers even if it runs with root (administrator) privileges. Unlike a virtual machine, however, multiple containers on one system all share the same operating system and kernel modules.

Containers are not a new mechanism. They are implemented using Linux’s control groups, namespaces, and capabilities to provide resource control, isolation, and privilege control, respectively. They also make use of a copy on write file system. This makes it easy to create new containers where the file system can track the changes made by that container over a clean base version of a file system. Containers can also take advantage of AppArmor, which is a Linux kernel module that provides a basic form of mandatory access controls based on the pathnames of files. It allows an administrator to restrict the ability of a program to access specific files even within its file system namespace.

The best-known and first truly popular container framework is Docker. A Docker Image is a file format that creates a package of applications, their supporting libraries, and other needed files. This image can be stored and deployed on many environments. Docker made it easy to deploy containers using git-like commands (docker push, docker commit) and also to perform incremental updates. By using a copy on write file system, Docker images can be kept immutable (read-only) while any changes to the container during its execution are stored separately.

As people found Docker to be useful, the next design goal was to make it easier to manage containers across a network of many computers. This is called container orchestration. There are many solutions for this, including Apache Mesos, Kubernetes, Nomad, and Docker Swarm. The best known of these is kubernetes, which was designed by Google. It coordinates storage of containers, failure of hardware and containers, and dynamic scaling: deploying the container on more machines to handle increased load. Kubernetes is coordination software, not a container system; it uses the Docker framework to run the actual container.

Even though containers were designed to simplify software deployment rather than provide security to services, they do offer several benefits in the area of security:

  • They make use of namespaces, cgroups, and capabilities with restricted capabilities configured by default. This provides isolation among containers.

  • Containers provide a strong separation of policy (defined by the container configuration) from the enforcement mechanism (handled by the operating system).

  • They improve availability by providing the ability to have a watchdog timer monitor the running of applications and restarting them if necessary. With orchestration systems such as Kubernetes, containers can be re-deployed on another system if a computer fails.

  • The environment created by a container is reproducible. The same container can be deployed on multiple systems and tested in different environments. This provides consistency and aids in testing and ensuring that the production deployment matches the one used for development and test. Moreover, it is easy to inspect exactly how a container is configured. This avoids problems encountered by manual installation of components where an administrator may forget to configure something or may install different versions of a required library.

  • While containers add nothing new to security, they help avoid comprehension errors. Even default configurations will provide improved security over the defaults in the operating system and configuring containers is easier than learning and defining the rules for capabilities, control groups, and namespaces. Administrators are more likely to get this right or import containers that are already configured with reasonable restrictions.

Containers are not a security panacea. Because all containers run under the same operating system, any kernel exploits can affect the security of all containers. Similarly, any denial of service attacks, whether affecting the network or monopolizing the processor, will impact all containers on the system. If implemented and configured properly, capabilities, namespaces, and control groups should ensure that privilege escalation cannot take place. However, bugs in the implementation or configuration may create a vulnerability. Finally, one has to be concerned with the integrity of the container itself. Who configured it, who validated the software inside of it, and is there a chance that it may have been modified by an adversary either at the server or in transit?

Virtual Machines

As a general concept, virtualization is the addition of a layer of abstraction to physical devices. With virtual memory, for example, a process has the impression that it owns the entire memory address space. Different processes can all access the same virtual memory location and the memory management unit (MMU) on the processor maps each access to the unique physical memory locations that are assigned to the process.

Process virtual machines present a virtual CPU that allows programs to execute on a processor that does not physically exist. The instructions are interpreted by a program that simulates the architecture of the pseudo machine. Early pseudo-machines included o-code for BCPL and P-code for Pascal. The most popular pseudo-machine today is the Java Virtual Machine (JVM). This simulated hardware does not even pretend to access the underlying system at a hardware level. Process virtual machines will often allow “special” calls to invoke system functions or provide a simulation of some generic hardware platform.

Operating system virtualization is provided by containers, where a group of processes is presented with the illusion of running on a separate operating system but in reality shares the operating system with other groups of processes – they are just not visible to the processes in the container.

System virtual machines allow a physical computer to act like several real machines with each machine running its own operating system (on a virtual machine) and applications that interact with that operating system. The key to this machine virtualization is to not allow each operating system to have direct access to certain privileged instructions in the processor. These instructions would allow an operating system to directly access I/O ports, MMU settings, the task register, the halt instruction and other parts of the processor that could interfere with the processor’s behavior and with the other operating systems on the system. Instead, a trap and emulate approach is used. Privileged instructions, as well as system interrupts, are caught by the Virtual Machine Monitor (VMM), also known as a hypervisor. The hypervisor arbitrates access to physical resources and presents a set of virtual device interfaces to each guest operating system (including the memory management unit, I/O ports, disks, and network interfaces). The hypervisor also handles preemption. Just as an operating system may suspend a process to allow another process to run, the hypervisor will suspend an operating system to give other operating systems a chance to run.

The two configurations of virtual machines are hosted virtual machines and native virtual machines. With a hosted virtual machine (also called a type 2 hypervisor), the computer has a primary operating system installed that has access to the raw machine (all devices, memory, and file system). This host operating system does not run in a virtual environment. One or more guest operating systems can then be run on virtual machines. The VMM serves as a proxy, converting requests from the virtual machine into operations that get sent to and executed on the host operating system. A native virtual machine (also called a type 1 hypervisor) is one where there is no “primary” operating system that owns the system hardware. The hypervisor is in charge of access to the devices and provides each operating system drivers for an abstract view of all the devices.

Security implications

Unlike app confinement mechanisms such as jails, containers, or sandboxes, virtual machines enable isolation all the way through the operating system. A compromised application, even with escalated privileges, can wreak havoc only within the virtual machine. Even compromises to the operating system kernel are limited to that virtual machine. However, a compromised virtual machine is not much different form having a compromised physical machine sitting inside your organization: not desirable and capable of attacking other systems in your environment.

Multiple virtual machines are usually deployed on one physical system. In cases such as cloud services (e.g., such as those provided by Amazon), a single physical system may host virtual machines from different organizations or running applications with different security requirements. If a malicious application on a highly secure system can detect that it is co-resident on a computer that is hosting another operating system and that operating system provides fewer restrictions, the malware may be able to create a covert channel to communicate between the highly secure system with classified data and the more open system. A covert channel is a general term to describe the the ability for processes to communicate via some hidden mechanism when they are forbidden by policy to do so. In this case, the channel can be created via a side channel attack. A side channel is the ability to get or transmit information using some aspects of a system’s behavior, such as changes in power consumption, radio emissions, acoustics, or performance. For example, processes on both systems, even though they are not allowed to send network messages, may create a means of communicating by altering and monitoring system load. The malware on the classified VM can create CPU-intensive task at specific times. Listener software on the unclassified VM can do CPU-intensive tasks at a constant rate and periodically measure their completion times. These completion times may vary based on whether the classified system is doing CPU-intensive work. The variation in completion times creates a means of sending 1s and 0s and hence transmitting a message.

  1. Note that Wikipedia and many other sites refer to this as “Version 7 Unix”. Unix has been under continuous evolution at Bell Labs from 1969 through approximately 1989. As such, it did not have versions. Instead, an updated set of manuals was published periodically. Installations of Unix have been referred to by the editions of their manuals.  ↩

  2. Linux capabilities are not to be confused with the concept of capability lists, which are a form of access control that Linux does not use).  ↩

Application Sandboxing

The goal of an application sandbox is to provide a controlled and restricted environment for code execution. This can be useful for applications that may come from untrustworthy sources, such as games from unknown developers or software downloaded from dubious sites. The program can run with minimal risk of causing widespread damage to the system. Sandboxes are also used by security researchers to observe how software behaves: what the program trying to do and whether it is attempting to access any resources in a manner that is suspicious for the application. This can help identify the presence of malware within a program. The sandbox defines and enforces what an individual application is allowed to do while executing in within its sandbox.

We previously looked at isolation via jails and containers, which use mechanisms that include namespaces, control groups, and capabilities. These constitute a widely-used form of sandboxing. However, these techniques focus on isolating an application (or group of processes) from other processes, restricting access to parts of the file system, and/or providing a separate network stack with a new IP address.

While this is great for running services without the overhead of deploying virtual machines, it does not sufficiently address the basic needs of running normal applications. We want to protect users from their applications: give users the ability to run apps but restrict what those apps can do on a per-app basis.

For example, you may want to make sure that a program accesses only files under your home directory with a suffix of “.txt”, and only for reading, without restricting the entire file system namespace as chroot would do, which would require creating a separate directory structure for shared libraries and other standard components the application may need. As another example, you might want an application to have access only to TCP networking. With a mechanism such as namespaces, we cannot exercise control over the names of files that an application can open or their access modes. Namespaces also do not allow us to control how the application interacts with the network. Capabilities allow us to restrict what a process running with root privileges can do but offer no ability to restrict more fundamental operations, such as denying a process the ability to read a file even if that file has read access enabled. The missing ingredient is rule-based policies to define precisely what system calls an application can invoke – down to the parameters of the system calls of interest.

Instead of building a jail (a container), we will add an extra layer of access control. An application will have the same view of the operating system as any other application but will be restricted in what it can do.

Sandboxing is currently supported on a wide variety of platforms at either the kernel or application level. We will examine four types of application sandboxes:

  1. User-level validation
  2. OS support
  3. Browser-based application sandboxing
  4. The Java sandbox

Note that there are many other sandbox implementations. This is just a representative sampling.

Application sandboxing via system call interposition & user-level validation

Applications interact with their environment via system calls to the operating system. Any interaction that an application needs to do aside from computation, whether legitimate or because it has been compromised, must be done through system calls: accessing files or devices, changing permissions, accessing the network, talking with other processes, etc.

An application sandbox will allow us to create policies that define which system calls are permissible to the application and in what way they can be used.

If the operating system does not provide us with the required support and we do not have the ability to recompile an application to force it to use alternate system call libraries, we can rely on system call interposition to construct a sandbox. System call interposition is the process of intercepting an app’s system calls and performing additional operations. The technique is also called hooking. In the case of a sandbox, it will intercept a system call, inspect its parameters, and decide whether to allow the system call to take place or return an error.

Example: Janus

One example of doing validation at the user level is the Janus sandboxing system, developed at UC Berkeley, originally for SunOS but later ported to Linux. Janus uses a loadable, lightweight, kernel module called mod_janus. The module initializes itself by setting up hooks to redirect system call requests to to itself. A hook is simply a mechanism that redirects an API request somewhere else and allows it to return back for normal processing. For example, a function can be hooked to simply log the fact that it has been called. The Janus kernel module copies the system call table to redirect the vector of calls to the mod_janus.

A user-configured policy file defines the allowable files and network operations for each sandboxed application. Users run applications through a Janus launcher/monitor program, which places the application in the sandbox. The monitor parses the policy file and spawns a child process for the user-specified program. The child process executes the actual application. The parent Janus process serves as the monitor, running a policy engine that receives system call notifications and decides whether to allow or disallow the system call.

Whenever a sandboxed application makes a system call, the call is redirected by the hook in the kernel to the Janus kernel module. The module blocks the thread (it is still waiting for the return from the system call) and signals the user-level Janus process that a system call has been requested. The user-level Janus process’ policy engine then requests all the necessary information about the call (calling process, type of system call, parameters). The policy engine makes a policy decision to determine whether, based on the policy, the process should be permitted to make the system call. If so, the system call is directed back to the operating system. If not, an error code is returned to the application.

Challenges of user-level validation

The biggest challenge with implementing Janus is that the user-level monitor must mirror the state of the operating system. If the child process forks a new process, the Janus monitor must also fork. It needs to keep track of not just network operations but the proper sequencing of the steps in the protocol to ensure that no improper actions are attempted on the network. This is a sequence of socket, bind, connect, read/write, and shutdown system calls. If one fails, chances are that the others should not be allowed to take place. However, the Janus monitor does not have the knowledge of whether a particular system call succeeded or not; approved calls are simply forwarded from the module to the kernel for processing. Failure to handle this correctly may enable attack vectors such as trying to send data on an unconnected socket.

The same applies with file operations. If a file failed to open, read and write operations should not be allowed to take place. Keeping track of state also gets tricky if file descriptors are duplicated (e.g., via the dup2 system call); it is not clear whether any requested file descriptor is a valid one or not.

Pathname parsing of file names has to be handled entirely by the monitor. We earlier examined the complexities of processing "../" sequences in pathnames. Janus has to do this in order to validate any policies on permissible file names or directories. It also has to keep track of relative filenames since the application may change the current directory at any time via the chdir system call. This means Janus needs to intercept chdir requests and process new pathnames within the proper context. Moreover, the application may change its entire namespace if the process calls chroot.

File descriptor can cause additional problems. A process can pass an open file descriptor to another process via UNIX domain sockets, which can then use that file descriptor (via a sendfd and recv_fd set of calls). Janus would be hard-pressed to know that this happened since that would require understanding the intent of the underlying sendmsg system calls and cmsg directives.

In addition to these difficulties, user-level validation suffers from possible TOCTTOU (time-of-check-to-time-of-use) race conditions. The environment present when Janus validates a request may change by the time the request is processed.

Application sandboxing with integrated OS support

The better alternative to having a user-level process decide on whether to permit system calls is to incorporate policy validation in the kernel. Some operating systems provide kernel support for sandboxing. These include the Android Application Sandbox, the iOS App Sandbox, the macOS sandbox, and AppArmor on Linux. Microsoft introduced the Windows Sandbox in December 2018, but this functions far more like a container than a traditional application sandbox, giving the process an isolated execution environment.


Seccomp-BPF, which stands for SECure COMPuting with Berkeley Packet Filters, is a sandboxing framework that is available on Linux systems. It allows the user to attach a system call filter to a process and all of the descendants of that process. Users can enumerate allowable system calls and also allow or disallow access to specific files or network protocols. Seccomp has been a core part of the Android security since the release of Android O in August 2017.

Seccomp uses the Berkeley Packet Filter (BPF) interpreter, which is a framework that was initially created for network socket filtering. With socket filtering, a user can create a filter to allow or disallow certain types of data to come through the socket. Since BPF is a framework that was initially created for sockets, seccomp sends “packets” that represent system calls to the BPF (Berkeley Packet Filter) interpreter. The filter allows the user to define rules that are applied to these system calls. These rules enable the inspection of each system call and its arguments and take subsequent action. Actions include allowing the call to run or not. If the call is not permitted, rules can specify whether an error is returned to the process, a SIGSYS signal is sent, or whether the process gets killed.

Seccomp is not designed to serve as a complete sandbox solution but is a tool for building sandboxes. For further process isolation, it can be used with other components, such as namespaces, capabilities, and control groups. The biggest downside of seccomp is the use of the BPF. BPF is a full interpreter – a processor virtual machine – that supports reading/writing registers, scratch memory operations, arithmetic, and conditional branches. Policies are compiled into BPF instructions before they are loaded into the kernel. It provides a low-level interface and the rules are not simple condition-action definitions. System calls are referenced by numbers, so it is important to check the system architecture in the filter as Linux system call numbers may vary across architectures. Once the user gets past this, the challenge is to apply the principle of least privilege effectively: restrict unnecessary operations but ensure that the program still functions correctly, which includes things like logging errors and other extraneous activities.

The Apple Sandbox

Conceptually, Apple’s sandbox is similar to seccomp in that it is a kernel-level sandbox, although it does not use the Berkeley Packet Filter. The sandbox comprises:

  • User-level library functions for initializing and configuring the sandbox for a process
  • A server process for handling logging from the kernel
  • A kernel extension that uses the TrustedBSD API to enforce sandbox policies
  • A kernel extension that provides support for regular expression pattern matching to enforce the defined policies

An application initializes the sandbox by calling sandbox_init. This function reads a human-friendly policy definition file and converts it into a binary format that is then passed to the kernel. Now the sandbox is initialized. Any function calls that are hooked by the TrustedBSD layer will be passed to the sandbox kernel extension for enforcement. Note that, unlike Janus, all enforcement takes place in the kernel. Enforcement means consulting a list of sandbox rules for the process that made the system call (the policy that was sent to the kernel by sandbox_init). In some cases, the rules may involve regular expression pattern matching, such as those that define filename patterns).

The Apple sandbox helps avoid comprehension errors by providing predefined sandbox profiles (entitlements). Certain resources are restricted by default and a sandboxed app must explicitly ask the user for permission. This includes accessing:

  • the system hardware (camera, microphone, USB)
  • network connections, data from other apps (calendar, contacts)
  • location data, and user files (photos, movies, music, user-specified files)
  • iCloud services

For mobile devices, there are also entitlements for push notifications and Apple Pay/Wallet access.

Once permission is granted, the sandbox policy can be modified for that application. Some basic categories of entitlements include:

  • Restrict file system access: stay within an app container, a group container, any file in the system, or temporary/global places
  • Deny file writing
  • Deny networking
  • Deny process execution

Browser-based Sandboxing: Chromium Native Client (NaCl)

Since the early days of the web, browsers have supported a plug-in architecture, where modules (containing native code) could be loaded into the browser to extend its capabilities. When a page specifies a specific plug-in via an <object> or <embed> element, the requested content is downloaded and the plug-in that is associated with that object type is invoked on that content. Examples of common plug-ins include Adobe Flash, Adobe Reader (for rendering pdf files), and Java, but there are hundreds of others. The challenge with this framework is how to keep the software in a plug-in from doing bad things.

An example of sandboxing designed to address the problem of running code in a plugin is the Chromium Native Client,called NaCl. Chromium is the open source project behind the Google Chrome browser and Chrome OS. The NaCl Browser plug-in designed to allow safe execution of untrusted native code within a browser, unlike JavaScript, which is run through an interpreter. It is built with compute-intensive applications in mind or interactive applications that use the resources of a client, such as games.

NaCl is a user-level sandbox and works by restricting the type of code it can sandbox. It is designed for the safe execution of platform-independent, untrusted native code inside a browser. The motivation was that some browser-based applications will be so compute-intensive that writing them in JavaScript will not be sufficient. These native applications may be interactive and may use various client resources but will need to do so in a controlled and monitored manner.

NaCl supports two categories of code: trusted and untrusted. Trusted code can run without a sandbox. Untrusted code must run inside a sandbox. This code has to be compiled using the NaCl SDK or any compiler that adheres to NaCl’s data alignment rules and instruction restrictions (not all machine instructions can be used). Since applications cannot access resources directly, the code is also linked with special NaCl libraries that provide access to system services, including the file system and network. NaCl includes a GNU-based toolchain that contains custom versions of gcc, binutils, gdb, and common libraries. This toolchain supports 32-bit ARM, 32-bit Intel x86 (IA–32), x86–64, and 32-bit MIPS architectures.

NaCl executes with two sandboxes in place:

  1. The inner sandbox uses Intel’s IA–32 architecture’s segmentation capabilities to isolate memory regions among apps so that even if multiple apps run in the same process space, their memory is still isolated. Before executing an application, the NaCl loader applies static analysis on thecode to ensure that there is no attempt to use privileged instructions or create self-modifying code. It also attempts to detect security defects in the code.

  2. The outer sandbox uses system call interposition to restrict the capabilities of apps at the system call level. Note that this is done completely at the user level via libraries rather than system call hooking.

Process virtual machine sandboxes: Java

A different type of sandbox is the Java Virtual Machine. The Java language was originally designed as a language for web applets, compiled Java programs that would get download and run dynamically upon fetching a web page. As such, confining how those applications run and what they can do was extremely important. Because the author of the application would not know what operating system or hardware architecture a client had, Java would compile to a hypothetical architecture called the Java Virtual Machine (JVM). An interpreter on the client would simulate the JVM and process the instructions in the application. The Java sandbox has three parts to it:

The bytecode verifier verifies Java bytecodes before they are executed. It tries to ensure that the code looks like valid Java byte code with no attempts to circumvent access restrictions, convert data illegally, bypass array bounds, or forge pointers.

The class loader enforces restrictions on whether a program is allowed to load additional classes and that key parts of the runtime environment are not overwritten (e.g., the standard class libraries). The class loader ensures that malicious code does not interfere with trusted code nad ensures that trusted class librares remain accessible and unmodified. It implements ASLR (Address Space Layout Randomization) by randomly laying out Runtime data areas (stacks, bytecodes, heap).

The security manager enforces the protection domain. It defines what actions are safe and which are not; it creates the boundaries of the sandbox and is consulted before any access to a resource is permitted. It is called at the time an application makes a call to specific methods so it can provide run-time verification of whether a program has been given rights to invoke the method, such as file I/O or network access. Any actions not allowed by the security policy result in a SecurityException being thrown. The security manager is the component that allows the user to restrict an application from accessing files or accessing the network, for example.A user can create a security policy file that enumerates what an application can or cannot do.

Java security is deceptively complex. After over twenty years of bugs one hopes that the truly dangerous ones have been fixed. Even though the Java language itself is pretty secure and provides dynamic memory management and array bounds checking, buffer overflows have been found in the underlying C support library, which has been buggy in general. Varying implementations of the JVM environment on different platforms make it unclear how secure any specific client will be. Moreover, Java supports the use of native methods, libraries that you can write in compiled languages such as C that interact with the operating system directly. These bypass the Java sandbox.


Malware is a term that refers to any malicious software that is unintentionally installed on a computer system. Malware can be distributed in various ways: viruses, worms, unintentional downloads, or trojan horses. It may spy on user actions and collect information on them (spyware), or present unwanted ads (adware). It may disable components of the system or encrypt files, undoing its damage if the owner pays money (ransomware). The software may sit dormant and wait for directives from some coordinator (a command and control server), who assembled an arsenal of hundreds of thousands of computers ready to do its bidding (for example, launch a distributed denial of service, DDoS, attack). Some software might be legitimate but may contain backdoors – undocumented ways to allow an outsider to use that software to perform other operations on your system.

Malware Motivation

A saying often paraphrased from Sun Tzu’s The Art of War is “know your enemy.” In the case of malware, it helps to understand why someone would want to install malicious software on your computer. There are numerous reasons. Some are:

Steal account credentials
If an attacker can obtain someone’s login and password credentials on one system, there is a good chance that those same credentials will work on other systems.
An attacker may have an interest in spying on the activities of a particular person. Traditionally, this would have been done through planting covert cameras and recording devices. Now it is often easier to accomplish the same results - and more - by installing software on the target’s computer. Such software is called spyware.
Data theft
An attacker may target a person at a specific company (or a student taking a certain class) in an attempt to exfiltrate data of strategic interest. Alternatively, an attacker may target people anonymously, with the hope of obtaining information of value, such as credit card data or bank account information.
There’s vengeance or vandalism: the attacker may want to destroy a target’s content or devices.
Host services
An attacker may need to harness computing, storage, or network resources. This can help hide the owner’s identity or amass a large collection of computers. An attacker can set up servers to host contraband data (e.g., stolen credit cards, login credentials, illegal material), send spam email on a large scale, mine cryptocurrency for free, or create a botnet for DDoS (distributed denial of service) attacks.
Adware and ad clicking
An attacker may add software to a system or reconfigure a browser or hosts file to present unwanted advertising in the form of pop-up windows or banners. Additionally, the malware may redirect search requests or create click-throughs on ads that the user never wanted.
Finally, the attacker may want money directly. Ransomware installs software to encrypt files that will be (hopefully) decrypted if ransom is paid. The emergence of cryptocurrencies led to a huge increase in ransomware since they enabled anonymous payments.

Another saying paraphrased from The Art of War is “all warfare is based on deception.” This is also useful to consider with malware since it is most often installed willingly by the user of the system via some form of deception rather than through the exploitation of bugs in the system.

Malware Infiltration

There are several ways in which malware gets onto a system.

An attacker can exploit vulnerabilities in system services, particularly network services, to inject code that will download the malware. Zero-day vulnerabilities are particularly useful to attackers. These are bugs that have been discovered but not yet reported to the software vendor or the general public and hence are not fixed. They are typically known only to those who discovered and exploited the vulnerability … or sold the exploit.

As such, an attacker can be confident that the exploit will work on virtually all systems running the software and does not have to rely on targets who were not diligent enough to keep their systems patched. Ideally (for the attacker), the vulnerabilities will allow malware to run with elevated privileges so they can access all parts of a system or conceal itself more effectively.

Related to this are N-day vulnerabilities. N-day vulnerabilities are known vulnerabilities. Because they are known, developers have the opportunity to patch the code with a fix and IT administrators have the ability to apply a patch, shut off services, or put some detection mechanisms in place. However, all this does not happen instantly. Attackers have a period of time — N days — between the time that a vulnerability is disclosed and the time that most systems have been patched to avoid the vulnerability.

Malware might be installed unknowingly via infected removable media, most commonly USB flash drives (in earlier years, it would have been CDs or floppy disks).

Social engineering

By far the most common way that malware enters a system is via deception: the legitimate user of the system installed it unknowingly. This uses a social engineering attack to convince the user that it is in his or her interest to install the software. Social engineering is the art of manipulating, influencing, or deceiving a user into taking some action that is not in his/her or the organization’s best interest. Goal of social engineers is to obtain your trust and get you to divulge information or provide them with some form of access. In computers, social engineering refers to any techniques used by an adversary to trick you into disclosing information, opening a file, downloading an attachment, reading a message, or running a program.

Websites may offer downloads of “security” software, system “cleaner” software, or software “updates,” none of which will do their purported task. An attacker may convince a user to click on a URL in an email attachment or a web page. Software obtained from file sharing services are also excellent venues for distributing malware. A user may try to avoid spending $4,000 for an AutoCAD license or $240/year for an Adobe Illustrator license and turn to a file sharing site to download a patched copy or a crack for the software that bypasses license checks. Quite often, these downloads contain malware instead of the desired software (what do you expect - the user is trying to be a thief downloading software from thieves).

An attacker may search collections of stolen email addresses (or usernames) and passwords. Since people often use the same name and password on multiple systems, this can often give the attacker access to other websites on which the user has accounts. Accounts for banking sites are, of course, particularly valuable since they can be a direct conduit for transferring money.

Given login information about a user, an attacker can log onto the systems or services as the owner of the account and install malware, monitor the internal organization, and even send email (e.g., contact other employees or friends).

Any information the attacker can get about a user can help an attacker create a more convincing social attack. The term pretexting refers to using a concocted scenario to contact a user and get additional information (e.g., an attacker can pretend to be a caller from the IT department or a high-level manager from another location to try to extract information; with some rudimentary information, the attacker can mention some employee, department, or project names to sound like a true insider).

Types of Malware

Worms and viruses

A virus is software that attaches itself to another piece of software. It may also be content, such as scripts inside a Microsoft Word document or PDF file, that will be accessed and hence executed by some software. It may also be an email attachment that contains a document or software with the malware or a link to the malware.

It might even be a modification of the boot loader of a computer or the firmware on a flash drive. The key point is that it does not run as an independent process. A virus may spread automatically by trying to

A virus is executed because another program ran. Viruses are often spread by sharing files or software. On a computer, a virus may replicate itself onto other files or software to maximize its chance of spreading and reduce its chance of being removed.

A worm is conceptually similar in that it can do the same damage to the computer as a virus can. The distinction from a virus is that a worm runs as a standalone process while a virus requires a host program.

Worms and virus are both designed to propagate to other computers, although they may require human intervention to spread. In other cases, they can replicate themselves and spread to other systems automatically, exploiting weaknesses in software on those computers to allow themselves to infiltrate those machines. The popular use of both terms, worm and virus, has often blurred the distinctions between them. People often refer to any malware as a virus.

When using non-legitimate ways of getting into a system or elevating their privileges, attackers often try to find zero-day vulnerabilities. These are vulnerabilities (bugs or configuration errors) that have not been publicly reported, or are newly discovered, and hence are unpatched. They are referred to as “zero-day” because developers have zero days to fix the problem.

Virus and worm components

Viruses and worms contains three components:

Infection mechanism
The infection mechanism is the component of a worm or virus that enables it to spread. It can exploit software vulnerabilities to connect to other systems, patch certain files, or alter start-up scripts.
This is the malicious part of the virus and contains the code that does the actual harm to the system such as uploading personal information or deleting files. In some cases, the payload may be a generic service that contacts a command and control server from which it gets specific instructions on what to do (e.g., mine cryptocurrency, send spam, participate in a DDoS attack).
The trigger, also called a logic bomb, is code that is run whenever a file containing the virus is run. It makes the decision whether the payload should be executed. For example, some viruses may stay dormant until a particular date, number of runs, or upon getting directives from a command and control server.

Malware residence: where does it live?

File infector virus

A file infector virus is a virus that adds itself to an executable program. The virus patches the program so that, upon running, control will flow to the the virus code. Ideally, the code will install itself in some unused area of the file so that the file length will remain unchanged. A comparison of file sizes with the same programs on other systems will not reveal anything suspicious. When the virus runs, it will run the infector to decide whether to install itself on other files. The trigger will then decide whether the payload should be executed. If not, the program will appear to run normally.

Bootloader malware

Bootkits, also known as boot sector viruses, are malware that targets the booting process of a system. The malware has an infector that installs itself in the Master Boot Record (MBR) of a disk. In older BIOS-based PC systems, the first sector of the bootable storage device is read into memory and executed when the system boots, Normally, the code that is loaded is the boot loader that then loads the operating system. By infecting the master boot record, the virus can repeatedly re-infiltrate the operating system or files on the disk even if any malware on the system was previously detected and removed.

Boot sector viruses were common in the early days of PCs when users often booted off floppy disks and shared these disks. The virus would often use DOS commands to install itself onto other disks that it detects. Users on those systems had full administrative rights to modify any part of the system.

These viruses have diminished as attackers found more appealing targets. However, there is no reason that malware that attacks on the bootloader should not be considered to be a continued threat. 2011 saw the emergence of ransomware that modified the boot loader to prevent the operating system from booting unless a ransom was paid. In 2016, Petya Trojan ransomware was deployed, which also infects the MBR and encrypts disk contents.

Infected flash drives

In the early days of PCs, people would share content by passing around floppy disks. This became a means for viruses to spread, which could be planted in either the boot sector or in files. These days, people share USB flash drives the way they used to share floppies.


In earlier Windows systems, Microsoft provided a feature called AutoRun. It was designed to make the CD (and, later, DVD and flash drive) experience better for users, particularly when using CDs for software installation. If the CD contained a file called autorun.inf, Windows would automatically execute a program identified within that file. While this made the experience of figuring out what to do after a CD is inserted easier for most users, it created a horrific security vulnerability: all that an adversary had to do was to get you to insert the media. Moreover, this functionality worked with any removable storage so that inserting a flash drive would automatically run a program defined within autorun.inf on the drive.

Microsoft eventually removed this capability from flash drives but some manufacturers created USB drives that emulated a CD drive to offer the “convenience” of AutoRun. Microsoft ultimately removed this functionality altogether in Windows 7. However, there are still old, unpatched versions of Windows out there that can be exploited with this vulnerability.

A similar problem occurs in the KDE framework. KDE is a desktop environment widely used on Linux systems. Malicious .desktop and .directory files can be created to run malicious code. Whenever the user uses the KDE file viewer to navigate to the directory where these files are stored, the code contained within these files will execute without any user interaction. This problem has not been fixed as of August 2019.

USB Firmware

The more insidious problem with USB flash drives now is unprotected firmware. A USB flash drive is a bunch of memory as well as firmware – embedded software on the chip. The firmware runs when you plug the drive into your computer. It identifies the drive as a USB storage device and manages the transferring of data. You don’t see this firmware and cannot tell if it has been changed. Because the firmware defines what the USB device is, modified firmware on the flash drive could present the drive as a keyboard and send a set of keyboard commands to the host system (for example, commands to open the terminal window and delete files).

A USB device can have multiple profiles associated with it and thus present itself as multiple devices, so the flash drive can tell the computer it is a keyboard but also a flash drive, so the user will still be able to use the device as a storage device. The firmware could also modify file contents as they pass between the USB storage device and host computer. The same attack can be user on other USB devices. For example, an ethernet adapter can redirect network messages to an attacker’s site.

Reprogramming the firmware has not been exploited by malware thus far, at least not in a widespread manner, but the vulnerability has been demonstrated and the source code to do this is freely and readily available.

Data leakage

The most common problem with flash drives is their portability and small size: they are easy to lose and easy to borrow. This makes them vulnerable to data leakage, which is just a fancy term that means some adversary may access your data simply by borrowing your flash drive.

In 2016, researchers at the University of Illinois ran an experiment where they scattered nearly 300 USB drives in public areas through the campus. Each of those drives was loaded with files that, when opened on a network-connected computer, would contact a server to tell it that the drive has been picked up and the file was opened. The results of the study showed that 98% of the drives were picked up and someone opened up at least one file on 45% of them[1]. Because of the risk of malicious firmware, even formatting a drive does not make it safe to use.

Inadvertent program execution

The portability of flash drives makes them a distribution mechanism. Experiments of scattering a number of them in parking lots revealed that many people are all too willing to plug a random drive into their system.

Even without automatic execution capabilities enabled, attackers can use flash drives as a distribution mechanism for malware. The Stuxnet attack exploited a windows bug in rendering shortcut icons where just viewing them in Windows Explorer enabled the execution of arbitrary code. Others have exploited a bug in video playback that allowed code execution. Even something as simple as an HTML file on a drive may direct the target to a website that can launch an attack.

There are many other USB device-based attacks. Take a look here if you’re curious.

Macro viruses

Some applications have support for macros, which allow the user to define a set of commands to avoid repetitive tasks and improve productivity. They are particularly common in text editors but are present in other applications as well, such as Photoshop and Microsoft Word and Excel. In some cases, as with Microsoft Office applications, macros are embedded in the document, which means they can be passed on to other users who access that document. Some macro capabilities are far more powerful than simply defining repetitive commands. Microsoft Office products, for example, provide Visual Basic scripting, which effectively allows users to embed complete programs into their documents. VBScript is based on Visual Basic and provides features that make it easy to access network printers, network files, special folders, user information, and even execute scripts on remote systems.

Scripts in Office documents can spread not only by having the user pass the original infected document around but by modifying the default template file, This will affect every other document on the system. With operating systems providing better access controls and users not running with administrative privileges, embedded scripts are a ripe area for attack. If you can convince somebody to open a document, they will run your program on their machine.

The challenge, of course, is to get a file with a malicious macro to target users and get them to open it. One of the most common techniques is to send it as an email attachment with some inducement to get the user to click on the document. This is an example of social engineering.

One hugely-successful virus that did this was the ILOVEYOU virus from 2000. The subject of the message stated that it is a letter from a secret admirer. The attachment wasn’t even a document; it was a visual basic script. To provide a better user experience, Microsoft would hide file extensions by default (macOS does this too). The file was named LOVE-LETTER-FOR-YOU.TXT.vbs but the .vbs suffix, which indicated that the file was a visual basic script, was hidden from users, so they only saw LOVE-LETTER-FOR-YOU.TXT. Not being aware of when extensions are hidden and when they are not, millions of users assumed they received an innocuous text file and clicked on it. Upon execution, the script would copy itself into various folders, modify and add new entries to the system registry, replace various types of files with copies of itself (targeting music and video files), and try to propagate itself through Internet relay Chat clients as well as email. If that wasn’t enough, it would download a file called WIN-BUGFIX.EXE and execute it. This was not a bug fixing program but rather a program that extracted user passwords and mailed them to the hacker.

The ILOVEYOU virus transmitted itself largely through email to contacts in infected computers, so your “secret admirer” message came from someone you knew and hence you were more likely to click on it. An earlier highly successful virus, Melissa, spread by offering a list of passwords for X-rated web sites. Email-based virus transmission is still a dominant mechanism. Sender headers and links are often disguised to make it look like the content is from a legitimate party.

JavaScript and PDF files

JavaScript, like Visual Basic, has evolved into a full programming language. Most browsers have security holes that involve Javascript. JavaScript can not only modify the content and structure of a web page but can connect to other sites. This allows any malicious site to leverage your machine. For example, systems can perform port scans on a range of IP addresses and report any detected unsecured services.

PDF (Portable Document Format) files, would seem to be innocent printable documents, incapable of harboring executable code. However, PDF is a complex format that can contain a mix of static and dynamic elements. Dynamic elements may contain Javascript, dynamic action triggers (e.g., “on open”), and the ability to retrieve “live” data via embedded URLs. As with Visual Basic scripts, PDF readers warn users of dynamic content but, depending on the social engineering around the file, the user may choose to trust the file … or not even pay attention to the warning in yet-another-dialog-box.


A Trojan horse is a program with two purposes: an overt purpose and a covert one. The overt purpose is what compels the user to get and run the program in the first place. The covert purpose is unknown to the user and is the malicious part of the program.

For example, a script with the name of a common Linux command might be added to a target user’s search path. When the user runs the command, the script is run. That script may, in turn, execute the proper command, leading the user to believe that all is well. As a side effect, the script may create a setuid shell to allow the attacker to impersonate that user or mail copy over some critical data. Users install Trojans because they believe they are installing useful software, such as an anti-virus tool (BTW, a lot of downloadable hacker tools contain Trojans: hackers hacking wannabe hackers). The side-effect of this software can activate cameras, enable key loggers, or deploy bots for anonymization servers, DDoS attacks, or spam attacks.

Trojans may include programs (games, utilities, anti-malware programs), downloading services, rootkits (see next) and backdoors (see next). They appear to perform a useful task that does not raise suspicion on the part of the victim.


A backdoor is software that is designed with some undocumented mechanism to allow someone who knows about it to be able to access the system or specific functions in a way that bypasses proper authentication mechanisms. In many cases, they are not designed for malicious use: they may allow a manufacturer to troubleshoot a device or a software author to push an update. However, if adversarial parties discover the presence of a backdoor, they can use it for malicious purposes.

An old, but famous, example of a backdoor is the sendmail mail delivery server. The author of sendmail wanted to have development access on a production system that had the program installed so that he can continue to improve it. The system administrator refused such access. His next release of sendmail contained a password-protected backdoor that gave him access to the system via the sendmail server. The password was hard-coded in the program and soon became well-known. Robert Morris used the knowledge of this backdoor as one of the mechanisms for his worm to propagate to other systems. More recently, in 2014, some Samsung Galaxy phones were delivered with backdoors that provide remote access to the data on the phone.


A rootkit is software that is designed to allow an attacker to access a computer and hide the existence of the software … and sometimes hide the presence of the user on the system.

Historically, a basic rootkit would replace common administration commands (such as ps, ls, find, top, netstat, etc.) with commands that mimic their operation but hide the presence of intruding users, intruding processes, and intruding files. The idea is that a system administrator should be able to examine the system and believe that all is fine and the system is free of malware (or of unknown user accounts).

User mode rootkits

The rootkit just described is a user mode rootkit and involves replacing commands, intercepting messages, and patching commonly-used APIs that may divulge the presence of the malware. A skilled administrator may find unmodified commands or import software to detect the intruding software.

Kernel mode rootkits

A kernel mode rootkit is installed as a kernel module. Being in the kernel gives the rootkit unrestricted access to all system resources and the ability to patch kernel structures and system calls. For example, directory listings from the getdents64 system call may not report any names that match the malware. Commands and libraries can be replaced and not give any indication that malicious software is resident in the system.

Hypervisor rootkits

The most insidious rootkits are hypervisor rootkits. A hypervisor sits below the operating system and is responsible for translating between virtual device operations from operating systems and the underlying hardware. All I/O flows through the hypervisor. Most computer systems do not run virtual machines and hence have no hypervisor. These systems are prime targets for a hypervisor-based rootkit. Now you can have an environment where the entire operating system can run unmodified - or even be reinstalled - and be unaware that its operations are being intercepted at a lower level. The hypervisor does not have to virtualize all hardware interactions: just the ones it cares about. For example, it might want to grab keyboard events to record passwords and messages.

Hypervisor attacks have not been deployed but have been demonstrated as a proof of concept. The challenge in detecting their presence is that operating systems are unaware if they are running under a hypervisor, so if a malicious hypervisor is installed, the operating system needs to detect that it is running under a hypervisor rather than directly on the computer system. Detection is difficult and often relies on measuring completion times of certain system calls. If they go through a hypervisor, they will take a longer time and the on-chip Time Stamp Counter (TSC), which counts CPU cycles, will show a longer value with a hypervisor in place. An alternative, and far more obscure, method of detection, is the use of an instruction that stores the interrupt descriptor table register (IDTR) into a memory location (the SIDT instruction). The hypervisor changes the register’s value and the instruction can detect that. However, this does not have to take place on a system with only one operating system, so measuring timing differences may still be the more foolproof approach.


If we think back to the goals of malware, one common goal was to extract money: even hackers need to monetize their efforts. An indirect way of accomplishing this was by collecting information to gain access to bank account data, PayPal data, or modifying accounts that may take money, such as eBay accounts. A more direct way of getting money is to demand it from the victim. Ransomware is a relatively new form of malware that locks a computer, keeps it from booting, or encrypts all files on the system. It then asks the suer to pay a ransom (usually via bitcoin) to get a decryption program.

Gathering information

Malware has varying goals. These goals may include spying on user activity, destroying content, assembling a collection of servers, or extracting money from a victim. One common goal is to gather information … or get the user to provide information. Your computer might not have anything of direct value to an adversary, but your PayPal, bank, Amazon, or eBay credentials might be useful.


Phishing is a social engineering attack whose most common purpose is to get personal information from someone, usually login credentials to some service. These are often carried out vie email with similar techniques that are used to spread infected files. A message announcing that your PayPal account is being canceled, that your bank detected a fraudulent transaction, or that FedEx could not deliver a package may prompt the receiver to panic and immediately click on a link in the message, which may result in the browser displaying a site crafted to look like PayPal, the bank, or FedEx and prompt the user for login and password information.

Phishing attacks are surprisingly effective. A 2018 study by Proofpoint found that 52% of all successful phishing emails are clicked on within one hour of being sent.

Spear phishing is a targeted form of phishing. A phishing attack sends the same message to a large set of users, hoping that some percentage of them will be fooled. A spear phishing attack sends a customized message that demonstrates some knowledge of the target, which will usually lead the target to think that the message is legitimate. For example, the 2016 Democratic National Committee (DNC) was facilitated by spear phishing. Targets were sent a message containing links, which is a common URL shortening service that hid the actual underlying URLs. Once clicked, the web site would display what looked like a legitimate Google accounts login page, already pre-populated with the victim’s GMail address.

More recent GMail spear phishing attacks send email to contacts of compromised accounts. The email contains an innocent-looking attachment: a thumbnail image of a document. When the victim clicks on the attachment, a web page that looks like a legitimate Google sign-in page is presented. As soon as the victim enters a name and password, the attackers get the credentials, log into the account, and target people in the victim’s contact list. They use an image of an actual attachment in the victim’s email and an actual subject line to make the email look more legitimate.

A 2017 report by Webroot found that 1.385 million new and unique phishing sites are created each month. Some warning signs that a mail message may be a phishing attack are:

  1. From header: is it from an unknown or suspicious address?

  2. To header: if the message is sent to multiple people, do you recognize any other names on the header?

  3. Date header: if the message purports to be a personal message, was it sent during normal business hours?

  4. Subject header: is the suspicious and is it relevant to your activities?

  5. Message content: is the message a request to click on a link in order to avoid a negative consequence?

  6. Embedded links: are there any links that you are asked to click? If you look at the target of those links, are they misspelled, suspicious, or for a site different from that of the sender?

  7. Attachments: is there an unexpected attachment that you are expected to open, such as a Microsoft Word document or PDF file?

Deceptive web sites

Quite often, malicious links in phishing attacks direct the user to a web site in order to obtain their login credentials. These sites masquerade as legitimate sites. The Proofpoint study mentioned earlier found that for every legitimate website, there are 20 malicious sites that mimic it. This is known as typosquatting. Such sites can be masqueraded banking sites, Google/Microsoft/Apple authentication pages, videoconferencing plugin-software downloads, etc.

File serving sites, including those that host software or those that provide services such as PDF or mp3 conversion are often ad-sponsored. Some of the ads on these sites, however, often look like download links and can trick a user into clicking on the ad instead of the link for the actual content. The


Another way of obtaining information is to snoop on a user’s actions. Keyloggers record everything a victim types and allow a user to extract login names, passwords, and entire messages.

Keyloggers can be implemented in several ways:

Malicious hypervisor
Since a hypervisor provides virtual interfaces for all the resources of a computer, it can capture all keyboard, mouse, and even video data. These attacks are difficult since they rely on the ability to install a hypervisor.
Kernel-based rootkit
All input/output operations go through the operating system kernel. Modifying the kernel allows malicious software to log and upload keystroke data.
System call hooking
Some operating systems provide a system call hooking mechanism that allows data to and from system calls to be intercepted. We saw how this was used to implement sandboxing. Windows enables this without having to install any kernel-level drivers. The SetWindowsHookEx system call can be used to report WH_KEYBOARD and WH_MOUSE events, capturing keyboard and mouse activity.
Browser-based logging
JavaScript can be used to capture onKeyUp() events. These events will be captured for one page but other hacks can be used to create a broader context with embedded pages. Form submission can also be intercepted to get populated form data without having to reassemble key presses into coherent account credentials.
Hardware loggers
Although visible to the user, hardware key loggers can be used for USB-connected keyboards. Some of these have embedded Wi-Fi transceivers that enable an attacker to collect the data from a distance.


Malware was particularly easy to spread on older Windows systems since user accounts, and hence processes, ran with full administrative rights, which made it easy to modify any files on the system and even install kernel drivers. Adding file protection mechanisms, such as a distinction between user and administrator accounts added a significant layer of protection. However, malware installed by the user would run with that user’s privileges and would have full access to all of a user’s files. If any files are read or write protected, the malware can change DAC permissions.

Systems took the approach of warning users if software wanted to install software or asked for elevated privileges. Social engineering hopes to convince users that they actually want to install the software (or view the document). They will happily grant permissions and install the malware. MAC permissions can stop some viruses as they will not be able, for instance, to override write permissions on executable files but macro viruses and the user files are still a problem.

In general, however, studies have shown that by simply taking away admin rights (avoiding privilege escalation) from users, 94% of the 530 Microsoft vulnerabilities that were reported in 2016 could be mitigated and 100% of vulnerabilities in Office 2016 could be mitigated.

Anti-virus (anti-malware) software

There is no way to recognize all possible viruses. Anti-virus software uses two strategies: signature-based and behavior-based approaches.

With signature-based systems, anti-virus programs look for byte sequences that match those in known malware. Each bit pattern is an excerpt of code from a known virus and is called a signature (not to be confused with digital signatures, discussed later in the course). A virus signature is simply a set of bytes that make up a portion of the virus and allow scanning software to see whether that virus is embedded in a file. The hope is that the signature is long enough and unique enough that the byte pattern will not occur in legitimate programs. This scanning process is called signature scanning. Lists of signatures (“virus definitions”) have to be updated by the anti-virus software vendor as new viruses are discovered. Signature-based detection is used by most anti-virus products.

A behavior-based system monitors the activities of a process (typically the system calls or standard library calls that it makes). Ideally, sandboxing is employed, to ensure that the suspected code is run within a sandbox or even in an interpreted environment within a sandbox to ensure that it cannot cause real damage. Behavior-based systems try to perform anomaly detection. If the observed activity is deemed suspicious, the process is terminated and the user alerted. Sandboxed, behavior-based analysis is often run by anti-malware companies to examine what a piece of suspected malware is actually doing and whether it should be considered to be a virus. A behavior-based can identify previously-unseen malware but these systems tend to have higher false positive rates of detection: it is difficult to characterize exactly what set of operations constitute suspicious behavior.

Windows Defender, as an example, makes use of both signature-based scanning as well as behavior-based process monitoring. It uses signature-based scanning on files and behavior-based analysis for running processes. Behavior monitoring includes scanning for suspicious file and registry changes (e.g., ransomware may try to encrypt all the files on a system and a lot of malware may try to modify the registry so that the software runs when the system is rebooted.


Some viruses will take measures to try to defend themselves from anti-virus software.

Signature scanning countermeasures

A common thing to do in malware is to use a packer on the code, unpacking it prior to execution. Packing can be one of several operations:

  • Simply obscure the malware payload by exclusive-oring (xor) with a repeating byte pattern (exclusive-oring the data with the same byte pattern reconstructs it.
  • Compress the code and then uncompress it upon loading it prior to execution.
  • Encrypt the code and decrypt it prior to execution.

All of these techniques will change the signature of a virus. One can scan for a signature of a compressed version of the virus but there are dozens of compression algorithms around, so the scanning process gets more complicated.

With encryption (xor is a simple form of encryption), only the non-encrypted part of the virus contains the unpacking software (decryption software and the key). A virus scanner will need to match the code for the unpacker component since the key and the encrypted components can change each time the virus propagates itself.

Polymorphic viruses mutate their code each time they run while keeping the algorithm the same. This involves replacing sequences of instructions with functionally-identical ones. For example, one can change additions to subtractions of negative numbers, invert conditional tests and branches, and insert or remove no-op instructions. This thwarts signature scanning software because the the byte pattern of the virus is different each time.

Access control countermeasures

Access controls help but do not stop the problem of malware. Containment mechanisms such as containers work well for server software but are usually impractical for user software (e.g., you want Microsoft Word to be able to read documents anywhere in a user’s directories). Application sandboxing is generally far more effective and is a dominant technique used in mobile software.

Trojans, deceptive downloads, and phishing attacks are insidiously difficult to defend against since we are dealing with human nature: users want to install the software or provide the data. They are conditioned to accepting pop-up messages and entering a password. Better detection in browsers & mail clients against suspicious content or URLs helps. However, malware distributors have been known to simply ask a user to rename a file to turn it into one that is recognized by the operating system as an executable file (or a disk image, PDF, or whatever format the malware come in and may otherwise be filtered by the mail server or web browser.

Sandboxing countermeasures

Virusus are unlikely to get through a sandbox (unless there are vulnerabilities or an improper configuration). However, there are areas where malware can address sandboxing:

  1. Vendor examination
    Anti-virus vendors often test software within a tightly configured sandboxed environment so they can detect whether the software is doing anything malicious (e.g., accessing files, devices, or the network in ways it is not supposed to). If they detect that they do have malware, they will dig in further and extract a signature so they can update and distribute their list of virus definitions. Viruses can try to get through this examination phase by setting a trigger to keep the virus from immediately performing malicious actions or to stay dormant for the first several invocations. The hope is that the anti-virus vendors will not see anything suspicious and the virus will never be flagged as such by their software.

  2. User configuration (entitlements)
    Virtually all mobile applications, and increasingly more desktop/laptop applications, are run with application sandboxes in place. These may disallow malware from accessing files, devices, or the network. However, it never hurts to ask. The software can simply ask the user to modify the sandbox settings. If social engineering is successful, the user may not even be suspicious and not wonder why a game wants access to contacts or location information.

  1. Matthew Tischer, Zakir Durumeric, et al., Users Really Do Plug in USB Drives They Find, University of Illinois,  ↩


Cryptography deals with encrypting plaintext using a cipher, also known as an encryption algorithm, to create ciphertext, which is unintelligible to anyone unless they can decrypt the ciphertext. It is a tool that helps build protocols that address:

Showing that the user really is that user.
Validating that the message has not been modified.
Binding the origin of a message to a user so that she cannot deny creating it.
Hiding the contents of a message.

A restricted cipher is one where the workings of the cipher must be kept secret. There is no reliance on any key and the secrecy of the cipher is crucial to the value of the algorithm. This has obvious flaws: people in the know leaking the secret, designers coming up with a poor algorithm, and reverse engineering.

For any serious use of encryption, we use well-tested, non-secret algorithms that rely on secret keys. A key is a parameter to a cipher that alters the resulting ciphertext. Knowledge of the key is needed to decrypt the ciphertext. Kerckhoffs’s Principle states that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. We expect algorithms to be publicly known and all security to rest entirely on the secrecy of the key.

A symmetric encryption algorithm uses the same secret key for encryption and decryption.

An alternative to symmetric ciphers are asymmetric ciphers. An asymmetric, or public key cipher uses two related keys. Data encrypted with one key can only be decrypted with the other key.

Properties of good ciphers

For a cipher to be considered good, ciphertext should be indistinguishable from random values. Given ciphertext, there should be no way to extract the original plaintext or the key that was used to create it except by of enumerating over all possible keys. This is called a brute-force attack. The keys used for encryption should be large enough that a brute force attack is not feasible. Each additional bit in a key doubles the number of possible keys and hence doubles the search time.

Classic cryptography

Monoalphabetic substitution ciphers

The earliest form of cryptography was the monoalphabetic substitution cipher. In this cipher, each character of plaintext is substituted with a character of ciphertext based on a substitution alphabet (a lookup table). The simplest of these is the Caesar cipher, known as a shift cipher, in which a plaintext character is replaced with a character that is n positions away in the alphabet. The key is the simply the the shift value: the number n. Substitution ciphers are vulnerable to frequency analysis attacks, in which an analyst analyzes letter frequencies in ciphertext and substitutes characters with those that occur with the same frequency in natural language text (e.g., if “x” occurs 12% of the time, it’s likely to really be an “e” since “e” occurs in English text approximately 12% of the time while “x” occurs only 0.1% of the time).

Polyalphabetic substitution ciphers

Polyalphabetic substitution ciphers were designed to increase resiliency against frequency analysis attacks. Instead of using a single plaintext to ciphertext mapping for the entire message, the substitution alphabet may change periodically. Leon Battista Alberti is credited with creating the first polyalphabetic substitution cipher. In the Alberti cipher (essentially a secret decoder ring), the substitution alphabet changes every n characters as the ring is rotated one position every n characters.

The Vigenère cipher is a grid of Caesar ciphers that uses a repeating key. A repeating key is a key that repeats itself for as long as the message. Each character of the key determines which Caesar cipher (which row of the grid) will be used for the next character of plaintext. The position of the plaintext character identifies the column of the grid. These algorithms are still vulnerable to frequency analysis attacks but require substantially more plaintext since one needs to deduce the key length (or the frequency at which the substitution alphabet changes) and then effectively decode multiple monoalphabetic substitution ciphers.

One-time Pads

The one-time pad is the only provably secure cipher. It uses a random key that is as long as the plaintext. Each character of plaintext is permuted by a character of ciphertext (e.g., add the characters modulo the size of the alphabet or, in the case of binary data, exclusive-or the next byte of the text with the next byte of the key). The reason this cryptosystem is not particularly useful is because the key has to be as long as the message, so transporting the key securely becomes a problem. The challenge of sending a message securely is now replaced with the challenge of sending the key securely. The position in the key (pad) must by synchronized at all times. Error recovery from unsynchronized keys is not possible. Finally, for the cipher to be secure, a key must be composed of truly random characters, not ones derived by an algorithmic pseudorandom number generator. The key can never be reused.

The one-time pad provides perfect secrecy (not to be confused with forward secrecy, also called perfect forward secrecy, which will be discussed later), which means that the ciphertext conveys no information about the content of the plaintext. It has been proved that perfect secrecy can be achieved only if there are as many possible keys as the plaintext, meaning the key has to be as long as the message. Watch this video for an explanation of perfect secrecy.

Stream ciphers

A stream cipher simulates a one-time pad by using a keystream generator to create a set of key bytes that is as long as the message. A keystream generator is a pseudorandom number generator that is seeded, or initialized, with a key that drives the output of all the bytes that the generator spits out. The keystream generator is fully deterministic: the same key will produce the same stream of output bytes each time. Because of this, receivers only need to have the key to be able to decipher a message. However, because the keystream generator does not generate true random numbers, the stream cipher is not a true substitute for a one-time pad. Its strength rests on the strength of the key. A keystream generator will, at some point, will reach an internal state that is identical to some previous internal state and produce output that is a repetition of previous output. This also limits the security of a stream cipher but the repetition may not occur for a long time, so stream ciphers can still be useful for many purposes.

Rotor machines

A rotor machine is an electromechanical device that implements a polyalphabetic substitution cipher. It uses a set of disks (rotors), each of which implements a substitution cipher. The rotors rotate with each character in the style of an odometer: after a complete rotation of one rotor, the next rotor advances one position. Each successive character gets a new substitution alphabet applied to it. The multi-rotor mechanism allows for a huge number of substitution alphabets to be employed before they start repeating when the rotors all reach their starting position. The number of alphabets is cr, where c is the number of characters in the alphabet and r is the number of rotors.

Transposition ciphers

Instead of substituting one character of plaintext for a character of ciphertext, a transposition cipher scrambles the position of the plaintext characters. Decryption is the knowledge of how to unscramble them.

A scytale, also known as a staff cipher, is an ancient implementation of a transposition cipher where text written along a strip of paper is wrapped around a rod and the resulting sequences of text are read horizontally. This is equivalent to entering characters in a two-dimensional matrix horizontally and reading them vertically. Because the number of characters might not be a multiple of the width of the matrix, extra characters might need to be added at the end. This is called padding and is essential for block ciphers, which encrypt chunks of data at a time.

Block ciphers

Most modern ciphers are block ciphers, meaning that they encrypt a chunk of bits, or block, of plaintext at a time. The same key is used to encrypt each successive block of plaintext.

AES and DES are two popular symmetric block ciphers. Symmetric block ciphers are usually implemented as iterative ciphers. The encryption of each block of plaintext iterates over several rounds. Each round uses a subkey, which is a key generated from the main key via a specific set of bit replications, inversions, and transpositions. The subkey is also known as a round key since it is applied to only one round, or iteration. This subkey determines what happens to the block of plaintext as it goes through a substitution-permutation (SP) network. The SP network, guided by the subkey, flips some bits by doing a substitution, which is a table lookup of an input bit pattern to get an output bit pattern and a permutation, which is a scrambling of bits in a specific order. The output bytes are fed into the next round, which applies a substitution-permutation step onto a different subkey. The process continues for several rounds (16 rounds for DES, 10–14 rounds for AES). and the resulting bytes are the ciphertext for the input block.

The iteration through multiple SP steps creates confusion and diffusion. Confusion means that it is extremely difficult to find any correlation between a bit of the ciphertext with any part of the key or the plaintext. A core component of block ciphers is the s-box, which converts n input bits to m output bits, usually via a table lookup. The purpose of the s-box is to add confusion by altering the relationship between the input and output bits.

Diffusion means that any changes to the plaintext are distributed (diffused) throughout the ciphertext so that, on average, half of the bits of the ciphertext would change if even one bit of plaintext is changed.

Feistel ciphers

A Feistel cipher is a form of block cipher that uses a variation of the SP network where a block plaintext is split into two parts. The substitution-permutation round is applied to only one part. That output is then XORed with the other part and the two halves are swapped. At each round, half of the input block remains unchanged. DES, the Data Encryption Standard, is an example of a Feistel cipher. AES, the Advanced Encryption Standard, is not.


Two popular symmetric block ciphers are DES, the Data Encryption Standard, and AES, the Advanced Encryption Standard. DES was adopted as a federal standard in 1976 and is a block cipher based on the Feistel cipher that encrypts 64-bit blocks using a 56-bit key.

DES has been shown to have some minor weaknesses against cryptanalysis. Key can be recovered using 247 chosen plaintexts or 243 known plaintexts. Note that this is not a practical amount of data to get for a real attack. The real weakness of DES is not the algorithm but but its 56-bit key. An exhaustive search requires 255 iterations on average (we assume that, on average, the plaintext is recovered halfway through the search). This was a lot for computers in the 1970s but is not much for today’s dedicated hardware or distributed efforts.


Triple-DES (3DES) solves the key size problem of DES and allows DES to use keys up to 168 bits. It does this by applying three layers of encryption:

  1. C’ = Encrypt M with key K1
  2. C’’ = Decrypt C’ with key K2
  3. C = Encrypt C’’ with key K3

If K1, K2, and K3 are identical, we have the original DES algorithm since the decryption in the second step cancels out the encryption in the first step. If K1 and K3 are the same, we effectively have a 112-bit key and if all three keys are different, we have a 168-bit key.

Cryptanalysis is not effective with 3DES: the three layers of encryption use 48 rounds instead of 16 making it infeasible to reconstruct the substitutions and permutations that take place. DES is relatively slow compared with other symmetric ciphers, such as AES. It was designed with hardware encryption in mind 3DES is, of course, three times slower than DES.


AES, the Advanced Encryption Standard, was designed as a successor to DES and became a federal government standard in 2002. It uses a larger block size than DES: 128 bits versus DES’s 64 bits and supports larger key sizes: 128, 192, and 256 bits. Even 128 bits is complex enough to prevent brute-force searches.

No significant academic attacks have been found thus far beyond brute force search. AES is also typically 5–10 times faster in software than 3DES.

Block cipher modes

Electronic Code Book (ECB)

When data is encrypted with a block cipher, it is broken into blocks and each block is encrypted separately. This leads to two problems.

  1. If different encrypted messages contain the same substrings and use the same key, an intruder can deduce that it is the same data.

  2. Secondly, a malicious party can delete, add, or replace blocks (perhaps with blocks that were captured from previous messages).

This basic form of a block cipher is called an electronic code book (ECB). Think of the code book as a database of encrypted content. You can look up a block of plaintext and find the corresponding ciphertext. This is not feasible to implement for arbitrary messages but refers to the historic use of codebooks to convert plaintext messages to ciphertext.

Cipher Block Chaining (CBC)

Cipher block chaining (CBC) addresses these problems. Every block of data is still encrypted with the same key. However, prior to being encrypted, the data block is exclusive-ORed with the previous block of ciphertext. The receiver does the process in reverse: a block of received data is decrypted and then exclusive-ored with the previously-received block of ciphertext to obtain the original data. The very first block is exclusive-ored with a random initialization vector, which must be transmitted to the remote side.

Note that CBC does not make the encryption more secure; it simply makes the result of each block of data dependent on all previous previous blocks so that data cannot be meaningfully inserted or deleted in the message stream.

Counter mode (CTR)

Counter mode (CTR) also addresses these problems but in a different way. The ciphertext of each block is a function of its position in the message. Encryption starts with a message counter. The counter is incremented for each block of input. Only the counter is encrypted. The resulting ciphertext is then exclusive-ORed with the corresponding block of plaintext, producing a block of message ciphertext. To decrypt, the receiver does the same thing and needs to know the starting value of the counter as well as the key.

An advantage of CTR mode is that each block has no dependance on other blocks and encryption on multiple blocks can be done in parallel.


The goal of cryptanalysis is break codes. Most often, it is to identify some non-random behavior of an algorithm that will give the analyst an advantage over an exhaustive search of the key space.

Differential cryptanalysis seeks to identify non-random behavior by examining how changes in plaintext input affect changes in the output ciphertext. It tries to find whether certain bit patterns are unlikely for certain keys or whether the change in plaintext results in likely changes in the output.

Linear cryptanalysis tries to create equations that attempt to predict the relationships between ciphertext, plaintext, and the key. An equation will never be equivalent to a cipher but any correlation of bit patterns give the analyst an advantage.

Neither of these methods will break a code directly but may help find keys or data that are more likely are that are unlikely. It reduces the keys that need to be searched.

Public key cryptography

Public key algorithm, also known as asymmetric ciphers, use one key for encryption and another key for decryption. One of these keys is kept private (known only to the creator) and is known as the private key. The corresponding key is generally made visible to others and is known as the public key.

Anything encrypted with the private key can only be decrypted with the public key. This is the basis for digital signatures. Anything that is encrypted with a public key can be encrypted only with the corresponding private key. This is the basis for authentication and covert communication.

Public and private keys are related but, given one of the keys, there is no feasible way of computing the other. They are based on trapdoor functions, which are one-way functions: there is no known way to compute the inverse unless you have extra data: the other key.

RSA public key cryptography

The RSA algorithm is the most popular algorithm for asymmetric cryptography. Its security is based on the difficulty of finding the factors of the product of two large prime numbers. Unlike symmetric ciphers, RSA encryption is a matter of performing arithmetic on large numbers. It is also a block cipher and plaintext is converted to ciphertext by the formula:

c = me mod n

Where m is a block of plaintext, e is the encryption key, and n is an agreed-upon modulus that is the product of two primes. To decrypt the ciphertext, you need the decryption key, d:

m = cd mod n

Given the ciphertext c, e, and n, there is no efficient way to compute the inverse to obtain m. Should an attacker find a way to factor n into its two prime factors, however, the attacker would be able to reconstruct the encryption and decryption keys, e and d.

Elliptic curve cryptography (ECC)

Elliptic curve cryptography (ECC) is a more recent public key algorithm that is an alternative to RSA. It is based on finding points along a prescribed elliptic curve, which is an equation of the form:

y2 = x3 + ax + b

Contrary to its name, elliptic curves have nothing to do with ellipses or conic sections and look like bumpy lines. With elliptic curves, multiplying a point on a given elliptic curve by a number will produce another point on the curve. However, given that result, it is difficult to find what number was used. The security in ECC rests not our inability to factor numbers but our inability to perform discrete logarithms in a finite field.

The RSA algorithm is still the most widely used public key algorithm, but ECC has some advantages:

  • ECC can use far shorter keys for the same degree of security. Security comparable to 256 bit AES encryption requires a 512-bit ECC key but a 15,360-bit RSA key

  • ECC requires less CPU consumption and uses less memory than RSA. It is faster for encryption (including signature generation) than RSA but slower for decryption.

  • Generating ECC keys is faster than RSA (but much slower than AES, where a key is just a random number).

On the downside, ECC is more complex to implement and decryption is slower than with RSA. As a standard, ECC was also tainted because the NSA inserted weaknesses into the ECC random number generator that effectively created a backdoor for decrypting content. This has been remedied and ECC is generally considered the preferred choice over RSA for most applications.

If you are interested, see here for a somewhat easy-to-understand tutorial on ECC.

Quantum computing

Quantum computers are a markedly different form computer. Conventional computers store and process information that is represented in bits, with each bit having a distinct value of 0 or 1. Quantum computers use the principles of quantum mechanics, which include superposition and entanglement. Instead of working with bits, quantum computers operate on qubits, which can hold values of “0” and “1” simultaneously via superposiion. The superpositions of qubits can be entangled with other objects so that their final outcomes will be mathematically related. A single operation can be carried out on 2^n values simultaneously, where n is the number of qubits in the computer.

be solved exponentially faster than with conventional comptuers. Shor’s algorithm, for instance, will be able to find the prime factors of large integers and compute discrete logarithms far more efficiently than is currently possible.

So far, quantum computers are very much in their infancy and it is not clear when – or if – large-scale quantum computers that are capable of solving useful problems will be built. IBM and Google are two companies that are racing to build one. It is unlikely that they will be built in the next several years but we expect that they will be built eventually. Shor’s algorithm will be able to crack public-key based systems such as RSA, Elliptic Curve Cryptography, and Diffie-Hellman key exchange. In 2016, the NSA called for a migration to “post-quantum cryptographic algorithms” and has currently narrowed down the submissions to 26 candidates. The goal is to find useful trapdoor functions that do not rely on multiplying large primes, computing exponents, any other mechanisms that can be attacked by quantum computation. If you are interested in these, you can read the NSA’s report.

Symmetric cryptosystems, such as AES, are not particularly vulnerable to quantum computing since they rely on moving and flipping bits rather than applying mathematical functions on the data. The best potential attacks come via Grover’s algorithm, which yields only a quadratic rather than an exponential speedup in key searches. This will reduce the effective strength of a key by a factor of two. For instance, a 128-bit key will have the strength of a 64-bit key on a conventional computer. It is easy enough to use a sufficiently long key (256-bit AES keys are currently recommended) so that quantum computing poses no threat to symmetric algorithms.

Secure communication

Symmetric cryptography

Communicating securely with symmetric cryptography is easy. All communicating parties must share the same secret key. Plaintext is encrypted with the secret key to create ciphertext and then transmitted or stored. It can be decrypted by anyone who has the secret key.

Asymmetric cryptography

Communicating securely with asymmetric cryptography is a bit different. Anything encrypted with one key can be decrypted only by the other related key. For Alice to encrypt a message for Bob, she encrypts it with Bob’s public key. Only Bob has the corresponding key that can decrypt the message: Bob’s private key.

Hybrid cryptography

Asymmetric cryptography alleviates the problem of transmitting a key over an unsecure channel. However, it is considerably slower than symmetric cryptography. AES, for example, is approximately 1,500 times faster for decryption than RSA and 40 times faster for encryption. AES is also much faster than ECC. Key generation is also far slower with RSA or ECC than it is with symmetric algorithms, where the key is just a random number rather than a set of carefully chosen numbers with specific properties. Moreover, certain keys with RSA may be weaker than others.

Because of these factors, RSA and ECC are almost never used to encrypt large chunks of information. Instead, it is common to use hybrid cryptography, where a public key algorithm is used to encrypt a randomly-generated key that will encrypt the message with a symmetric algorithm. This randomly-generated key is called a session key, since it is generally used for one communication session and then discarded.

Key Exchange

The biggest problem with symmetric cryptography is key distribution. For Alice and Bob to communicate, they must share a secret key that no adversaries can get. However, Alice cannot send the key to Bob since it would be visible to adversaries. She cannot encrypt it because Alice and Bob do not share a key yet.

Key exchange using a trusted third party

For two parties to communicate using symmetric ciphers they need to share the same key. The ways of doing this are:

  1. Share the key via some trusted mechanism outside of the network, such are reading it over the phone or sending a flash drive via FedEx.

  2. Send the key using a public key algorithm.

  3. Use a trusted third party.

We will first examine the use of a trusted third party. A trusted third party is a trusted system that has everyone’s key. Hence, only Alice and the trusted party (whom we will call Trent) have Alice’s secret key. Only Bob and Trent have Bob’s secret key.

The simplest way of using a trusted third party is to ask it to come up with a session key and send it to the parties that wish to communicate. For example, Alice sends a message to Trent requesting a session key to communicate with Bob. This message is encrypted with Alice’s secret key so that Trent knows the message could have only come from Alice.

Trent generates a random session key and encrypts it with Alice’s secret key. He also encrypts the same key with Bob’s secret key. Alice gets both keys and passes the one encrypted for Bob to Bob. Now Alice and Bob have a session key that was encrypted with each of their secret keys and they can communicate by encrypting messages with that session key.

This simple scheme is vulnerable to replay attacks. An eavesdropper, Eve, can record messages from Alice to Bob and replay them at a later time. Eve might not be able to decode the messages but she can confuse Bob by sending him seemingly valid encrypted messages.

The second problem is that Alice sends Trent an encrypted session key but Trent has no idea that Alice is requesting to communicate with him. While Trent authenticated Alice (simply by being able to decrypt her request) and authorized her to talk with Bob (by generating the session key), that information has not been conveyed to Bob.

Needham-Schroeder: nonces

The Needham-Schroeder protocol improves the basic key exchange protocol by adding nonces to messages. A nonce is simply a random string – a random bunch of bits. Alice sends a request to Trent, asking to talk to Bob. This time, it doesn’t have to even be encrypted. As part of the request she sends a nonce.

Trent responds with a message that contains:

  • Alice’s ID
  • Bob’s ID
  • the nonce
  • the session key
  • a ticket: a message encrypted for Bob containing Alice’s ID and the same session key

This entire message from Trent is encrypted with Alice’s secret key. Alice can validate that the message is a response to her message because:

  • It is encrypted for her: nobody but Alice and Trent has Alice’s secret key.
  • It contains the same nonce as in her request, so it is not a replay of some earlier message, which would have had a different randomly-generated nonce.

Alice sends the ticket (the message encrypted with Bob’s key) to Bob. He can decrypt it and knows:

  • The message must have been generated by Trent since only Trent and Bob know Bob’s key and and thus could construct a meaningful message encrypted with Bob’s key.
  • He will be communicating with Alice because Trent placed Alice’s ID in that ticket.
  • The session key since Trent placed that in the ticket as well. Alice has this too.

Bob can now communicate with Alice but he will first authenticate Alice to be sure that he’s really communicating with her. He’ll believe it’s Alice if she can prove that she has the session key. To do this, Bob creates another nonce, encrypts it with the session key, and sends it to Alice. Alice decrypts the message, subtracts one from the nonce, encrypts the result, and sends it back to Bob. She just demonstrated that she could decrypt a message using the session key and return back a known modification of the message. Needham-Schroeder is a combined authentication and key exchange protocol.

Denning-Sacco modification: timestamps to avoid key replay

One flaw in the Needham-Schroeder algorithm is when Alice sends the ticket to Bob. The ticket is encrypted with Bob’s secret key and contains Alice’s ID as well as the session key. If an attacker grabbed a communication session and managed to decrypt the session key, she can replay the transmission of the ticket to Bob. Bob won’t know that he received that same session key in the past. He will proceed to validate “Alice” by asking her to prove that she indeed knows the session key. In this case, Eve, our eavesdropper, does know it; that’s why she sent the ticket to Bob. Bob completes the authentication and thinks he is talking with Alice when in reality he is talking to Eve.

A fix for this was proposed by Denning & Sacco: add a timestamp to the ticket. When Trent creates the ticket that Alice will give to Bob, it is a message encrypted for Bob and contains Alice’s ID, the session key, and a timestamp.

When Bob receives a ticket, he checks the timestamp. If it is older than some recent time (e.g., a few seconds), Bob will simply discard the ticket, assuming that he is getting a replay attack.

Otway-Rees protocol: session IDs instead of timestamps

A problem with timestamps is that their use relies on all entities having synchronized clocks. If Bob’s clock is significantly off from Trent’s, he may falsely accept or falsely reject a ticket that Alice presents to him. Time synchronization becomes an attack vector for this protocol. If an attacker can change Bob’s concept of time, she may be able to convince Bob to accept an older ticket. To do this, she can create fake NTP (network time protocol) responses to force Bob’s clock to synchronize to a different value or, if Bob is paranoid and uses a GPS receiver to synchronize time, create fake GPS signals.

A way to avoid the replay of the ticket without using timestamps is to add a session ID to each message. The rest of the Otway-Rees protocol differs a bit from Needham-Schroeder but is conceptually very similar.

  1. Alice sends a message to Bob that contains:
    • A session ID
    • Aoth of their IDs
    • A message encrypted with Alice’s secret key. This encrypted message contains Alice and Bob’s IDs as well as the session ID.
  2. Bob sends Trent a request to communicate with Alice, containing:
    • Alice’s message
    • A message encrypted with Bob’s secret key that also contains the session ID.
  3. Trent now knows that Alice wants to talk to Bob since the session ID is inside her encrypted message and that Bob agrees to talk to Alice since that same session ID is inside his encrypted message.

  4. Trent creates a random session key encrypted for Bob and the same key encrypted for Alice and sends both of those to Bob, along with the session key.

The protocol also incorporates nonces to ensure that there is no replay attack on Trent’s response even if an attacker sends a message to Bob with a new session ID and old encrypted session keys (that were cracked by the attacker).


Kerberos is a trusted third party authentication, authorization, and key exchange protocol using symmetric cryptography and based closely on the Needham-Schroeder protocol with the Denning-Sacco modification (the use of timestamps).

When Alice wands to talk with Bob (they can be users and services), she first needs to ask Kerberos. If access is authorized, Kerberos will send her two messages. One is encrypted with Alice’s secret key and contains the session key for her communication with Bob. The other message is encrypted with Bob’s secret key. Alice cannot read or decode this second message. It a ticket (sometimes known as a sealed envelope). It contains the same session key that Alice received but is encrypted for Bob. Alice will send that to Bob. When Bob decrypts it, he knows that the message must have been generated by an entity that knows its secret key: Kerberos. Now that Alice and Bob both have the session key, they can communicate securely by encrypting all traffic with that session key.

To avoid replay attacks, Kerberos places a timestamp in Alice’s response and in the ticket. For Alice to authenticate herself to Bob, she needs to prove that she was able to extract the session key from the encrypted message Kerberos sent her. She proves this by generating a new timestamp, encrypting it with the session key, and sending it to Bob. Bob now needs to prove to Alice that he can decode messages encrypted with the session key. He takes Alice’s timestamp, adds one (just to permute the value), and sends it back to Alice, encrypted with their session key.

Since your secret key is needed to decrypt every service request you make of Kerberos, you’ll end up typing your password each time you want to access a service. Storing the key in a file to cache it is not a good idea. Kerberos handles this by splitting itself into two components that run the same protocol: the authentication server (AS) and the ticket granting server (TGS). The authentication server handles the initial user request and provides a session key to access the TGS. This session key can be cached for the user’s login session and allows the user to send requests to the TGS without re-entering a password. The TGS is the part of Kerberos that handles requests for services. It also returns two messages to the user: a different session key for the desired service and a ticket that must be provided to that service.

Diffie-Hellman key exchange

The Diffie-Hellman key exchange algorithm allows two parties to establish a common key without disclosing any information that would allow any other party to compute the same key. Each party generates a private key and a public key. Despite their name, these are not encryption keys; they are just numbers. Diffie-Hellman does not implement public key cryptography. Alice can compute a common key using her private key and Bob’s public key. Bob can compute the same common key by using his private key and Alice’s public key.

Diffie-Hellman uses the one-way function abmod c. Its one-wayness is due to our inability to compute the inverse: a discrete logarithm. Anyone may see Alice and Bob’s public keys but will be unable to compute their common key. Although Diffie-Hellman is not a public key encryption algorithm, it behaves like one in the sense that it allows us to exchange keys without having to use a trusted third party.

Key exchange using public key cryptography

With public key cryptography, there generally isn’t a need for key exchange. As long as both sides can get each other’s public keys from a trusted source, they can encrypt messages using those keys. However, we rarely use public key cryptography for large messages. It can, however, be used to transmit a session key. This use of public key cryptography to transmit a session key that will be used to apply symmetric cryptography to messages is called hybrid cryptography. For Alice to send a key to Bob:

  1. Alice generates a random session key.
  2. She encrypts it with Bob’s public key & sends it to Bob.
  3. Bob decrypts the message using his private key and now has the session key.

Bob is the only one who has Bob’s private key to be able to decrypt that message and extract the session key. A problem with this is that anybody can do this. Charles can generate a random session key, encrypt it with Bob’s public key, and send it to Bob. For Bob to be convinced that it came from Alice, she can encrypt it with her private key (this is signing the message).

  1. Alice generates a random session key.
  2. She signs it by encrypting the key with her private key.
  3. She encrypts the result with Bob’s public key & sends it to Bob.
  4. Bob decrypts the message using his private key.
  5. Bob decrypts the resulting message with Alice’s public key and gets the session key.

If anybody other than Alice created the message, the result that Bob gets by decrypting it with Alice’s public key will not result in a valid key for anyone. We can enhance the protocol by using a standalone signature (encrypted hash) so Bob can identify a valid key from a bogus one.

Forward secrecy

If an attacker steals, for example, Bob’s private key, he will be able to go through old messages and decrypt old session keys (the start of every message to Bob contained a session key encrypted with his public key). Forward secrecy, also called perfect forward secrecy, is the use of keys and key exchange protocols where the compromise of a key does not compromise past session keys. There is no secret that one can steal that will allow the attacker to decrypt multiple past messages. Note that this is of value for communication sessions but not stored encrypted documents (such as email). You don’t want an attacker to gain any information from a communication session even if a user’s key is compromised. However, the user needs to be able to decrypt her own documents, so they need to rely on a long-term key.

Diffie-Hellman enables forward secrecy. Alice and Bob can each generate a key pair and send their public key to each other. They can then compute a common key that nobody else will know and use that to communicate. Achieving forward secrecy requires single-use (ephemeral) keys. Next time Alice and Bob want to communicate, they will generate a new set of keys and compute a new common key. At no time do we rely on long-term keys, such as Alice’s secret key or RSA private key. Encrypting a session key with a long-term key, such as Bob’s public key, will not achieve forward secrecy. If an attacker ever finds Bob’s private key, she will be able to extract the session key.

Difie-Hellman is particularly good for for achieving forward secrecy because it is efficient to create new new key pairs on the fly. RSA or ECC keys can be used as well but key generation is far less efficient. Because of this, RSA and ECC keys tend to be used mainly as long-term keys (e.g., for authentication).

Message Integrity

One-way functions

A one-way function is one that can be computed relatively easily in one direction but there is no known way of computing the inverse function. One-way functions are crucial in a number of cryptographic algorithms, including digital signatures, Diffie-Hellman key exchange, and both RSA and elliptic curve public key cryptography. For Diffie-Hellman and public key cryptography, they ensure that someone cannot generate the corresponding private key when presented with a public key. Key exchange and asymmetric cryptography algorithms rely on a spacial form of one-way function, called a trapdoor function. This is a function whose inverse is computable if you are provided with extra information, such as a private key that corresponds to the public key that was used to generate the data.

Hash functions

A particularly useful form of a one-way function is the cryptographic hash function. This is a one-way function whose output is always a fixed number of bits for any input. Hash functions are commonly used in programming to construct hash tables, which provide O(1) lookups of keys.

Cryptographic hash functions produce far longer results than those used for hash tables. Common lengths are 224, 256, 384, or 512 bits. Good cryptographic hash functions (e.g., SHA–1, SHA–2, SHA–3) have several properties:

  1. Like all hash functions, take arbitrary-length input and produce fixed-length output

  2. Also like all hash functions, they are deterministic; they produce the same result each time when given identical input.

  3. They exhibit pre-image resistance, or hiding. Given a hash H, it should not be feasible to find a message M where H=hash(M).

  4. The output of a hash function should not give any information about any of the input. For example, changing a byte in the message should not cause any predictable change in the hash value.

  5. They are collision resistant. While hash collisions can exist (the number of possible hashes is smaller than than number of possible messages; see the pigeonhole principle), it is not feasible to find any two different messages that hash to the same value. Similarly, it is not feasible to modify the plaintext without changing its resultant hash.

  6. They should be relatively efficient to compute. We would like to use hash functions as message integrity checks and generate them for each message without incurring significant overhead.

The cryptographic hash function is the basis for message authentication codes and digital signatures.

Because of these properties, we have extremely high assurance that a message would no longer hash to the same value if it is modified in any way. The holy grail for an attacker is to be able to construct a message that hashes to the same value as another message. That would allow the attacker to substitute a new message for some original one (for example, redirecting a money transfer). Searching for a collision with a pre-image (known message) is much harder than searching for any two messages that produce the same hash. The birthday paradox tells us that the search for a collision of any two messages is approximately the square root of the complexity of searching for a collision on a specific message. This means that the strength of a hash function for a brute-force collision attack is approximately half the number of bits of the hash. A 256-bit hash function has a strength of approximately 128 bits.

Popular hash functions include SHA–1 (160 bits), SHA–2 (commonly 256 and 512 bits), and SHA–3 (256 and 512 bits).

Message Integrity and Hash Pointers

A cryptographic hash serves as a checksum for a message. If a message has been modified, it will yield a different hash. By associating a hash with a message, we have a basis for managing the integrity of that message: being able to detect if the message gets changed.

Tamper-resistant linked-lists: blockchains

One way of associating a hash with a message is via the use of hash pointers. Pointers are used in data structures to allow one data element to refer to another. In processes, a pointer is a memory location. In distributed systems, a pointer may be an IP address and object identifier. A hash pointer is a tuple that contains a traditional pointer along with the hash of the data element that is being pointed to. It allows us to validate that the information being pointed to has not been modified.

The same structures that use pointers can be modified to use hash pointers and create tamper-evident structures. For example, a linked list can be constructed with each element containing a hash pointer to the next element instead of a pointer.


Adding a new block is easy. You allocate the block, copy the head hash pointer into it (the next pointer), and update the head hash pointer to point to the new block and contain a hash of that block.

If an adversary modifies, say, data block 1, we can detect that. The hash pointer in Data–2 will point to Data–1 but the hash of Data–1 will no longer match the hash in the pointer. For a successful attack, the adversary will also need to modify the hash value in the hash pointer in block 2. That will make the hash pointer in block 3 invalid, so that will need to be changed. The adversary will need to change all the hash pointers leading up to the head of the list. If we’re holding on to the head of the list (e.g., in a variable) so that the adversary cannot modify it, then we will always be able to detect tampering. A linked list using hash pointers is called a blockchain.

Merkle Trees

Another useful structure using hash pointers in place of conventional pointers is a binary tree, called a Merkle tree when implemented with hash pointers.

Merkle Tree
Merkle Tree

Leaf nodes of a Merkle tree contain conventional hash pointers: pointers to the data blocks and the hashes of those data blocks. Non-leaf child nodes contain left and right pointers along with the hash of the two hashes they point to. As with binary trees, Merkle trees give us the advantage of being able to locate data in O(log n) time instead of linear time. More importanlty, we can validate any data in O(log n) time by traversing from the root down to the last hash pointer at the leaf.

Applications of blockchains and Merkle trees

With Merkle trees, the information in a leaf does not generally contain information to help you traverse the tree to search for data; we are not necessarily building a search tree. The purpose of the tree is to make it efficient to manage and validate the integrity of the underlying data. That is, the hash pointer structure is there just to allow you to validate the underlying data rather than search for stuff. If you want to search, you can add extra information to each node – in general, we are not concerned with secrecy and you can build whatever search structures an application needs. Hash pointers are all about helping assess the integrity of data. Structures such as hash-pointer-based linked lists and Merkle trees were designed with peer-to-peer systems in mind where data can come from various untrusted peers. You just need to get the root hash from a trusted place.

The top-level pointer (the root in the case of a tree; the head in the case of linked lists) represents the integrity of the entire set of data. If any data block changes, that top level pointer will allow the user to detect that there has been a change. Therefore, it is important that this value be stored securely and obtained via a trustworthy mechanism.

What a Merkle tree allows you to do is check the integrity of replicated data on a branch-by-branch basis in an efficient manner. Merkle trees are designed for environments where data is replicated among multiple systems and you want each system to be able to validate the integrity of the entire file. This helps in two cases:

  1. You can validate downloaded data without having to wait for the entire set of data to be downloaded.

  2. You can efficiently compare your data with that on another system.

Suppose you have a file and want to check whether any blocks in your version are corrupted with respect to a version on another server. Both you and another system assembled your own structure of hash pointers.

With a linked list of hash pointers, you’d start at the head of the list and compare hashes. If the hashes match, you are confident that your files match. If you have a mismatch, you need to compare the next hash. If it matches what you have then you know that first block has been modified. If it doesn’t then you need to get the hash after that. Ultimately, you may need to traverse the entire list linearly.

With Merkle trees, it becomes easier to find the block (or blocks) that have changed. If the root hash matches, you know that your entire data set matches. If not, you request the left & right hashes and compare those with your tree. If one doesn’t match then you can compare the hashes under that subtree, iterating down the tree until you find the mismatched data block. You do not need to iterate through the entire list of blocks. This is attractive for replicated data sets where we have tens of millions of data blocks, for example, and sending a hash list is not efficient. It is essentially a tree search to find the block that is inconsistent.

Merkle trees are particularly useful for obtaining data from multiple untrusted sources. For example, they are used by Bitcoin and Ethereum servers to migrate copies of the transaction log (the blockchain). They are also used for data replication by a variety of NoSQL databases and by the Git version control system. Given a valid root (top-most) hash pointer, the remaining hash tree can be received from any untrusted source. The receiver can simply validate by checking hashes up to the root. Now that the receiver has the entire tree, any data blocks can be received from untrusted sources as well. Each block can be validated by comparing its hash with the hash at the leaf node of the Merkle tree.

Message Authentication Codes (MACs)

A cryptographic hash helps us ensure message integrity: it serves as a checksum that allows us to determine if a message has been modified. If the message is modified, it no longer hashes to the same value as before. However, if an attacker modifies a message, she may be able to modify the hash value as well. To prevent this, we need a hash that relies on a key for validation. This is a message authentication code, or MAC. Two forms of MACs are hash-based ones and block cipher-based ones:

Hash-based MAC (HMAC):
A hash-based MAC is a specific method for converting regular hash functions into MACs by using a cryptographic hash function, such as SHA–256, to hash the message and the key. Anyone who does not know the key will not be able to recreate the hash.
Block cipher-based MAC (CBC-MAC):
Recall that cipher block chaining assures us that every encrypted block is a function of all previous blocks. CBC-MAC uses a zero initialization vector and runs through a cipher block chained encryption, discarding all output blocks except for the last one, which becomes the MAC. Any changes to the message will be propagated to that final block and the same encryption cannot be performed by someone without the key. Note that a CBC-MAC still produces a fixed-length result and has all The properties of a hash function.

Digital signatures

Message authentication codes rely on a shared key. Anybody who possesses the key can modify and re-sign a message. There is no assurance that the action was done by the author of the message. Digital signatures have stronger properties than MACs:

  1. Only you can sign a message but anybody should be able to validate it.
  2. You cannot copy the signature from one message and have it be valid on another message.
  3. An adversary cannot forge a signature, even after inspecting an arbitrary number of signed messages.

Digital signatures require three operations:

  1. Key generation: {private_key, verification_key } := gen_keys(keysize)
  2. Signing: signature := sign(message, private_key)
  3. Validation: isvalid := verify(message, signature, verification_key)

Since we trust hashes to be collision-free, it makes sense to apply the signature to the hash of a message instead of the message itself. This ensures that the signature will be a small, fixed size and makes it easy to embed in hash pointers and other structures and creates minimal transmission or storage overhead for verification.

There are several commonly-used digital signature algorithms:

DSA, the Digital Signature Algorithm
The current NIST standard that generates key pairs that are secure because of the difficulty of computing discrete logarithms.
ECDSA, Elliptic Curve Digital Signature Algorithm
A variant of DSA that uses elliptic curve cryptography
Public key cryptographic algorithms
RSA or Elliptic Curve Cryptography applied to message hashes.

All these algorithms generate public and private key pairs. The first two are not general-purpose encryption algorithms but are designed solely for digital signatures.

We saw how public key cryptography can be used to encrypt messages: Alice encrypts a message using Bob’s public key to ensure that only Bob could decrypt it with his private key. We can use public key backwards: Alice can encrypt a message using her private key. Anyone can decrypt the message using her public key but, in doing so, would know that the message was encrypted by Alice.

A digital signature can be constructed by simply encrypting the hash of a message with the creator’s (signer’s) private key. Alternatively, digital signature algorithms have been created that apply a similar principle: hashing combined with trapdoor functions so that you would use a dedicated set of public/private keys to create and verify the signature. Anyone who has the message signer’s public key can decrypt the hash and thus validate the hash against the message. Other parties cannot recreate the signature.

Note that, with a MAC, the recipient or anyone in possession of the shared key can create the same MAC. With a digital signature, the signature can only be created by the owner of the private key. Unlike MACs, digital signatures provide non-repudiation – proof of identity. Alice cannot claim that she did not create a signature because nobody but Alice has her private key. Also unlike MACs, anyone can validate a signature since public keys are generally freely distributed. as with MACs, digital signatures also provide proof of integrity, assurance that the original message has not been modified.

Covert and authenticated messaging

We ignored the encryption of a message in the preceding discussion; our interest was assuring integrity. However, there are times when we may want to keep the message secret and validate that it has not been modified. Doing this involves sending a signature of the message along with the encrypted message.

A basic way for Alice to send a signed and encrypted message to Bob is for her to use hybrid cryptography and:

1. Create a signature of the message. This is a hash of the message encrypted with her private key.
2. Create a session key for encrypting the message. This is a throw-away key that will not be needed beyond the communication session.
3. Encrypt the message using the session key. She will use a fast symmetric algorithm to encrypt this message.
4. Package up the session key for Bob: she encrypts it with Bob's public key. Since only Bob has the corresponding private key, only Bob will be able to decrypt the session key.
5. She sends Bob: the encrypted message, encrypted session key, and signature.

Anonymous identities

A signature verification key (e.g., a public key) can be treated as an identity. You possess the corresponding private key and therefore only you can create valid signatures that can be verified with the public key. This identity is anonymous; it is just a bunch of bits. There is nothing that identifies you as the holder of the key. You can simply assert your identity by being the sole person who can generate valid signatures.

Since you can generate an arbitrary number of key pairs, you can create a new identity at any time and create as many different identities as you want. When you no longer need an identity, you can discard your private key for that corresponding public key.

Identity binding: digital certificates

While public keys provide a mechanism for asserting integrity via digital signatures, they are themselves anonymous. We’ve discussed a scenario where Alice uses Bob’s public key but never explained how she can assert that the key really belongs to Bob and was not planted by an adversary. Some form of identity binding of the public key must be implemented for you to know that you really have my public key instead of someone else’s. How does Alice really know that she has Bob’s public key?

X.509 digital certificates provide a way to do this. A certificate is a data structure that contains user information (called a distinguished name) and the user’s public key. This data structure also contains a signature of the certification authority. The signature is created by taking a hash of the rest of the data in the structure and encrypting it with the private key of the certification authority. The certification authority (CA) is responsible for setting policies of how they validate the identity of the person who presents the public key for encapsulation in a certificate.

To validate a certificate, you would hash all the certificate data except for the signature. Then you would decrypt the signature using the public key of the issuer. If the two values match, then you know that the certificate data has not been modified since it has been signed. The challenge is how to get the public key of the issuer. Public keys are stored in certificates, so the issuer would have a certificate containing its public key. This certificate can be signed by yet another issuer. This kind of process is called certificate chaining. For example, Alice can have a certificate issued by the Rutgers CS Department. The Rutgers CS Department’s certificate may be issued by Rutgers University. Rutgers University’s certificate could be issued by the State of New Jersey Certification Authority, and so on. At the very top level, we will have a certificate that is not signed by any higher-level certification authority. A certification authority that is not underneath any other CA is called a root CA. In practice, this type of chaining is rarely used. More commonly, there are hundreds of autonomous certification authorities acting as root CAs that issue certificates to companies, users, and services. The certificates for many of the trusted root CAs are preloaded into operating systems or, in some cases, browsers. See here for Microsoft’s trusted root certificate participants and here for Apple’s trusted root certificates.

Every certificate has an expiration time (often a year or more in the future). This provides some assurance that even if there is a concerted attack to find a corresponding private key to the public key in the certificate, such a key will not be found until long after the certificate expires. There might be cases where a private key might be leaked or the owner may no longer be trustworthy (for example, an employee leaves a company). In this case, a certificate can be revoked. Each CA publishes a certificate revocation list, or CRL, containing lists of certificates that they have previously issued that should no longer be considered valid. To prevent spoofing the CRL, the list is, of course, signed by the CA. Each certificate contains information on where to obtain revocation information.

The challenge with CRLs is the TOCTTOU problem: not everyone may check the certificate revocation list in a timely manner and some systems may accept a certificate not knowing that it was revoked. Some systems, particularly embedded systems, may not even be configured to handle CRLs.


Authentication is the process of binding an identity to a user. Note the distinction between authentication and identification. Identification is simply the process of asking you to identify yourself (for example, ask for a login name). Authentication is the process of proving that the identification is correct. Authorization is the process of determining whether the user is permitted to do something.

Authentication factors

The three factors of authentication are:

  1. something you have (such as a key or a card),
  2. something you know (such as a password or PIN),
  3. and something you are (biometrics).

Combining these into a multi-factor authentication scheme can increase security against the chance that any one of the factors is compromised. Multi-factor authentication must use two or more of these factors. Using two passwords, for example, is not sufficient and does not qualify as multi-factor.

Password Authentication Protocol

The classic authentication method is the use of reusable passwords. This is known as the password authentication protocol, or PAP. The system asks you to identify yourself (login name) and then enter a password. If the password matches that which is associated with the login name on the system then you’re authenticated.

Password guessing defenses

To avoid having an adversary carry out a password guessing attack, we need to make it not feasible to try a large number of passwords. A common approach is to rate-limit guesses. When the system detects an incorrect password, it will wait several seconds before allowing the user to try again. Linux, for example, waits about three seconds. After five bad guesses, it terminates and restarts the login process.

Another approach is to completely disallow password guessing after a certain number of failed attempts by locking the account. This is common for some web-based services, such as banks. However, the system has now been made vulnerable to a denial-of-service attack. An attacker may not be able to take your money but may inconvenience you by disallowing you to access it as well.

Hashed passwords

One problem with the password authentication protocol is that if someone gets hold of the password file on the system, then they have all the passwords. The common way to thwart this is to store hashes of passwords instead of the passwords themselves. This takes advantage of the one-way property of the hash: anyone who sees the hash still has no way of computing your password.

To authenticate a user, the system simply checks if hash(password) = stored_hashed_password. If someone got hold of the password file, they’re still stuck since they won’t be able to reconstruct the original password from the hash. They’ll have to resort to an exhaustive search (also known as a brute-force search) to search for a password that hashes to the value in the file. The hashed file should still be protected from read access by normal users to keep them from performing an exhaustive search.

A dictionary attack is an optimization of the search that tests common passwords, including dictionary words, known common passwords, and common letter-number substitutions rather than every possible combination of characters. Moreover, an intruder does not need to perform such search on each hashed password to find the password. Instead, the results of a dictionary search can be stored in a file and later searched to find a corresponding hash in a password file. These are called precomputed hashes. To guard against this, a password is concatenated with a bunch of extra random characters, called salt. These characters make the password substantially longer and would make a table of precomputed hashes insanely huge and hence not practical. Such a table would need to go far beyond a dictionary list and create hashes of all possible - and long - passwords. The salt is not a secret – it is stored in plaintext in the password file in order to validate a user’s password. Its only function is to make using precomputed hashes impractical and ensure that even identical passwords do not generate the same hashed results. An intruder would have to select one specific hashed password and do a brute-force or dictionary attack on just that password, adding salt to each guess prior to hashing it.

Password recovery options

Passwords are bad. They are not incredibly secure. English text has a low entropy (approximately 1.2–1.5 bits per character) and are often easy to guess. Password files from some high-profile sites have been obtained to validate just how bad a lot of people are at picking passwords. Over 90% of all user passwords sampled are on a list of the top 1,000 passwords. The most common password is password. People also tend to reuse passwords. If an attacker can get passwords from one place, there is a good chance that many of those passwords will work with other services.

Despite many people picking bad passwords, people often forget them, especially when they are trying to be good and use different passwords for different accounts. There are several common ways of handling forgotten passwords, none of them great:

Email them:
This used to be a common solution and is somewhat dying off. It requires that the server is able to get the password (it is not stored as a hash). It exposes the risk that anyone who might see your email will see your password.
Reset them:
This is more common but requires authenticating the requestor to avoid a denial of service attack. The common thing to do is to send a password reset link to an email address that was entered when the account was created. We again have the problem that if someone has access to your mail, they will have access to the password reset link and will be able to create a new password for your account. In both these cases, we have the problem that users may no longer have the same email address. Think of the people who switched from Comcast to get Verizon FiOS and switched their addresses to (note: avoid using email addresses tied to services or locations that you might change).
Provide hints:
This is common for system logins (e.g. macOS and Windows). However, a good hint may weaken the password or may not help the user.
Ask questions:
It is common for sites to ask questions (“what was your favorite pet’s name?”, “what street did you live on when you were eight years old?”). The answers to many of these questions can often be found through some searching or via social engineering. A more clever thing is to have unpredictable answers (“what was your favorite pet’s name?” “Osnu7$Qbv999”) but that requires storing answers somewhere.
Rely on users to write them down:
This is fine as long as the thread model is electronic-only and you don’t worry about someone physically searching for your passwords.

One-time Passwords

The other problem with reusable passwords is that if a network is insecure, an eavesdropper may sniff the password from the network. A potential intruder may also simply observe the user typing a password. To thwart this, we can turn to one-time passwords. If someone sees you type a password or gets it from the network stream, it won’t matter because that password will be useless for future logins.

There are three forms of one-time passwords:

  1. Sequence-based. Each password is a function of the previous password. S/Key is an example of this.

  2. Challenge-based. A password is a function of a challenge provided by the server. CHAP is an example of this.

  3. Time-based. Each password is a function of the time. TOTP and RSA’s SecurID are example of this.

Sequence-based: S/Key

S/Key authentication allows the use of one-time passwords by generating a list via one-way functions. The list is created such that password n is generated as f(password[n–1]), where f is a one-way function. The list of passwords is used backwards. Given a password password[p], it is impossible for an observer to compute the next valid password because a one-way function f makes it improbably difficult to compute the inverse function, f–1(password[p]), to get the next valid password, password[p–1].

Challenge-based: CHAP

The Challenge-Handshake Authentication Protocol (CHAP) is an authentication protocol that allows a server to authenticate a user without sending a password over the network.

Both the client and server share a secret (essentially a password). A server creates a random bunch of bits (called a nonce) and sends it to the client (user) that wants to authenticate. This is the challenge.

The client identifies itself and sends a response that is the hash of the shared secret combined with the challenge. The server has the same data and can generate its own hash of the same challenge and secret. If the hash matches the one received from the client, the server is convinced that the client knows the shared secret and is therefore legitimate.

An intruder that sees this hash cannot extract the original data. An intruder that sees the challenge cannot create a suitable hashed response without knowing the secret. Note that this technique requires passwords to be accessible at the server and the security rests on the password file remaining secure.

Time-based: TOTP and SecurID®

With the Time-based One Time Password (TOTP) protocol, both sides share a secret key. To authenticate, a user runs the TOTP function to create a one-time password. The TOTP function is a hash:

password := hash(secret_key, time) % 10^{password_length}

The resultant hash is taken modulo some number that determines the length of the password. A time window of 30 seconds is usually used to provide a reasonably coarse granularity of time that doesn’t put too much stress on the user or requirements for tight clock synchronization. The service, who also knows the secret key and time, can generate the same hash and hence validate the value presented by the user.

TOTP is often used as a second factor (proof that you have some device with the secret configured in it) in addition to a password. The protocol is widely supported by companies such as Amazon, Dropbox, Wordpress, Microsoft, and Google.

A closely related system is RSA’s SecureID is a two-factor authentication system that generates one-time passwords for response to a user login prompt. It relies on a user password (Personal ID Number, PIN) and a token device (an authenticator card or fob). The token generates a new number every 30 seconds. The number is a function of a seed that is unique for each card and the time of day.

To authenticate to a server, you send a concatenation of your PIN and the number from the token in lieu of a password. A legitimate remote system will have your PIN, the token seed, and the time of day and will be able to compute the same value to validate your password. An intruder would not know your PIN or the token’s seed and will never see it on the network.

Public key authentication

Public key authentication relies on the use of nonces, similar to the way they were used to authenticate users using the Needham-Schroeder protocol. A nonce is is generated on the fly and used to present to the other party as a challenge for them to prove that they are capable of encrypting something with a specific key that they possess. The use of a nonce is central to public key authentication.

If Alice wants to authenticate Bob, she needs to have Bob prove that he possesses his private key (private keys are never shared). To do this, Alice generates a nonce (a random bunch of bits) and sends it to Bob, asking him to encrypt it with his private key. If she can decrypt Bob’s response using Bob’s public key and sees the same nonce, she will be convinced that she is talking to Bob because nobody else will have Bob’s private key. Mutual authentication requires that each party authenticate itself to the other: Bob will also have to generate a nonce and ask Alice to encrypt it with her private key.

FIDO Universal Second Factor (U2F) Authenticators

FIDO U2F is a standard that was developed as an easy-to-use second factor for authentication and was primarily designed with web-based services in mind. It is generally implemented as a USB or Bluetooth fob. JavaScript on the browser calls APIs to communicate with the hardware. Hence, unlike TOTP or SecurID, the user does not have to transcribe any numbers. The entire user experience is:

  1. Enter name and password (this is the first factor).
  2. Insert the U2F key and touch the button. This validates the user’s physical presence and runs the client protocol.
  3. You’re logged in.

U2F is currently supported by Google, Microsoft, Dropbox, GitHub, RSA, and many others as well as popular password managers and web browsers (except Safari, but it is in Apple’s preview releases). Google switched from TOTP to U2F in 2017 and reports that there were no successful phishing attacks against their 85,000+ employees since then.

There are two interactions: registration with a service and authentication.

For registration:

  • The server sends a challenge (a random bunch of bits).
  • The U2F device generates a public/private key pair and creates a data structure containing its ID and its public key. This is called the handle.
  • The device signs a message comprising the handle and the server’s challenge and sends the result to the server.
  • The server verifies the signed data against the public key in the message.
  • The server then stores the key handle and associated public key in its database for future authentication sessions.

For authentication (after the user registered the device):

  • The server sends a challenge along with the user’s key handle (which it stored in its database)
  • The device extracts the private key for the service (the private key never leaves the device)
  • The device signs a data structure containing client info and the challenge & sends it back
  • The server receives the data and verifies it against the stored public key.

U2F perform its interactions over a TLS link, so all communication between the client (browser) and server are encrypted. Note that keys are generated per site, so the authenticator will not use the same key pair for different services. U2F prevents phishing attacks because only the real site authenticate the user with a specific key. Authentication will fail on a fake (phishing) site even if the user was fooled into thinking it was real.

Man-in-the-middle attacks

Authentication protocols can be vulnerable to man-in-the-middle (MITM) attacks. In this attack, Alice thinks she is talking to Bob but is really talking to Mike (the man in the middle, an adversary). Mike, in turn talks to Bob. Any message that Alice sends gets forwarded by Mike to Bob. Mike forwards any response from Bob gets back to Alice. This way, Mike allows Alice and Bob to carry out their authentication protocol. Once Bob is convinced he is talking with Alice, Mike can drop Alice and communicate with Bob directly, posing as Alice … or stay around and read their messages, possibly changing them as he sees fit.

The protocols that are immune to this are those where Alice and Bob establish an encrypted channel using trusted keys. For example, with Kerberos, both Alice and Bob get a session key that is encrypted only for each of them. Mike cannot find it even if he intercepts their communications.

With public key cryptography, Mike can take over after Bob is convinced he is talking with Alice. To avoid a man-in-the-middle attack Alice will have to send Bob a session key. If she uses public key cryptography to do the key exchange, as long as the message from Alice is signed, Mike will not be able to decrypt the session key or forge a new one.

Code signing

We have seen how we could use hash functions for message integrity in the form of MACs (message authentication codes, which use a shared key) and digital signatures (which use public and private keys). The same mechanism is employed to sign software: to validate that software has not been modified since it was created by the developer.

The advantages of signing code are that the software can be downloaded from untrusted servers or distributed over untrusted channels and still be validated to be untampered. It also enables us to detect whether malware on our local system has modified the software.

Microsoft Windows, Apple macOS, iOS, and Android all make extensive use of signed software. Signing an application is fundamentally no different than signing any other digital content:

  1. As a software publisher, you create a public/private key pair
  2. You obtain a digital certificate for the public key. In some cases, you need to obtain it from a certification authority (CA) that can certify you as a software publisher.
  3. You create a digital signature of the software that you’re distributing: generate a hash and encrypt it with your private key.
  4. Attach the signature and certificate to the software package. This will enable others to validate the signature.

Prior to installation, the system will validate the certificate and then validate the signature. If the signature does not match the hash of the software package, that indicates that the software has been altered. Signed software usually also supports per-page hashes. Recall demand paging in operating systems: an operating system does not load a program into memory at once; it only loads chunks (pages) as they are needed. This is called demand paging. Signed software will often include signatures for each page (typically 4K bytes) and each page will be validated as it is loaded into memory. This avoids the overhead of validating the entire file prior to running the program (e.g., the executable for Adobe Photoshop is over 100 MB).

Biometric authentication

Biometric authentication is the process of identifying a person based on their physical or behavioral characteristics as opposed to their ability to remember a password or their possession of some device. It is the third of the three factors of authentication: something you know, something you have, and something you are.

It is also fundamentally different than the other two factors because it does not deal with data that lends itself to exact comparisons. For instance, sensing the same fingerprint several times is not likely to give you identical results each time. The orientation may differ, the pressure and angle of the finger may result in some parts of the fingerprint to appear in one sample but not the other, and dirt, oil, and humidity may alter the image. Biometric authentication relies on statistical pattern recognition: we establish thresholds to determine whether two patterns are close enough to accept as being the same.

A false accept rate (FAR) is when a pair of different biometric samples (e.g., fingerprints from two different people) is accepted as a match. A false reject rate (FRR) is when a pair of identical biometric samples is rejected as a match. Based on the properties of the biometric data, the sensor, the feature extraction algorithms, and the comparison algorithms, each biometric device has a characteristic ROC (Receiver Operating Characteristic) curve. The name derives from early work on RADAR and maps the false accept versus false reject rates for a given biometric authentication device. For password authentication, the “curve” would be a single point at the origin: no false accepts and no false rejects. For biometric authentication, which is based on thresholds that determine if the match is “close enough”, we have a curve.

At one end of the curve, we can have an incredibly low false accept rate (FAR). This is good as it means that we will not have false matches: the enemy stays out. However, it also means that the false reject rate (FRR) will be very high. If you think of a fingerprint biometric, the stringent comparison needed to yield a low FAR means that the algorithm will not be forgiving to a speck of dirt, light pressure, or a finger held at a different angle. We get high security at the expense of inconveniencing legitimate users, you may have to present their finger over and over again for sensing, hoping that it will eventually be accepted.

At the other end of the curve, we have a very low false reject rate (FRR). This is good since it provides convenience to legitimate users. Their biometric data will likely be accepted as legitimate and they will not have to deal with the frustration of re-sensing their biometric, hoping that their finger is clean, not too greasy, not too dry, and pressed at the right angle with the correct pressure. The trade-off is that it’s more likely that another person’s biometric data will be considered close enough as well and accepted as legitimate.

There are numerous biological components that can be measured. They include fingerprints, irises, blood vessels on the retina, hand geometry, facial geometry, facial thermographs, and many others. Data such as signatures and voice can also be used, but these often vary significantly with one’s state of mind (your voice changes if you’re tired, ill, or angry). They are behavioral systems rather than purely physical systems, such as your iris patterns, length of your fingers, or fingerprints, and tend to have lower recognition rates. Other behavioral biometrics include keystroke dynamics, mouse use characteristics, gait analysis, and even cognitive tests.

Regardless of which biometric is used, the important thing to do in order to make it useful for authentication is to identify the elements that make it different. Most of us have swirls on our fingers. What makes fingerprints different from finger to finger are the various variations in those swirls: ridge endings, bifurcations, enclosures, and other elements beyond that of a gently sloping curve. These features are called minutia. The presence of minutia, their relative distances from each other an their relative positions can allow us to express the unique aspect of a fingerprint as a relatively compact stream of bits rather than a bitmap.

Two important elements of biometrics are robustness and distinctiveness. Robustness means that the biometric data will not change much over time. Your fingerprints will look mostly the same next year and the year after. Your fingers might grow fatter (or thinner) over the years and at some point in the future, you might need to re-register your hand geometry data.

Distinctiveness relates to the differences in the biometric pattern among the population. Distinctiveness is also affected by the precision of a sensor. A finger length sensor will not measure your finger length to the nanometer, so there will be quantized values in the measured data. Moreover, the measurements will need to account for normal hand swelling and shrinking based on temperature and humidity, making the data even less precise. Accounting for these factors, approximately one in a hundred people may have hand measurements similar to yours. A fingerprint sensor may typically detect 40–60 distinct features that can be used for comparing with other sensed fingerprints. An iris scan, on the other hand, will often capture over 250 distinct features, making it far more distinctive and more likely to identify a unique individual.

Some sensed data is difficult to normalize. Here, normalization refers to the ability to align different sensed data to some common orientation. For instance, identical fingers might be presented at different angles to the sensors. The comparison algorithm will have to account for possible rotation when comparing the two patterns. The inability to normalize data makes it difficult to perform efficient searches. There is no good way to search for a specific fingerprint short of performing a comparison against each stored pattern. Data such as iris scans lends itself to normalization, making it easier to find potentially matching patterns without going through an exhaustive search.

In general, the difficulty of normalization and the fact that no two measurements are ever likely to be the same makes biometric data not a good choice for identification. It is extremely difficult, for example, to construct a system that will store hundreds of thousands of fingerprints and allow the user to identify and authenticate themselves by presenting their finger. Such a system will require an exhaustive search through the stored data and each comparison will itself be time consuming as it will not be a simple bit-by-bit match test. Secondly, fingerprint data is not distinct enough for a population of that size. A more realistic system will use biometrics for verification and have users identify themselves through some other means (e.g., type their login name) and then present their biometric data. In this case, the software will only have to compare the pattern associated with that user.

The biometric authentication process comprises several steps:

  1. Enrollment. Before any authentication can be performed, the system needs to have stored biometric data of the user that it can use for comparison. The user will have to present the data to the sensor, distinctive features need to be extracted, and the resulting pattern stored. The system may also validate if the sensed data is of sufficiently high quality or ask the user to repeat the process several times to ensure consistency in the data.

  2. Sensing. The biological component needs to be measured by presenting it to a sensor, a dedicated piece of hardware that can capture the data (e.g., a camera for iris recognition, a capacitive fingerprint sensor). The sensor captures the raw data (e.g., an image).

  3. Feature extraction. This is a signal processing phase where the interesting and distinctive components are extracted from the raw sensed data to create a biometric pattern that can be used for matching. This process involves removing signal noise, discarding sensed data that is not distinctive or not useful for comparisons, and determining whether the resulting values are of sufficiently good quality that it makes sense to use them for comparison. A barely-sensed fingerprint, for instance, may not present enough minutia to be considered useful.

  4. Pattern matching. The extracted sample is now compared to the stored sample that was obtained during the enrollment phase. Features that match closely will have small distances. Given variations in measurements, it is unlikely that the distance will be zero, which would indicate a perfect match.

  5. Decision. The “distance” between the sensed and stored samples is now evaluated to decide if the match is close enough. The decision determination decides whether the system favors more false rejects or more false accepts.

Security implications

There are several security issues that relate to biometric authentication.

Unlike passwords or encryption keys, biometric systems require sensors to gather the data. The sensor, its connectors, the software that processes sensed data, and the entire software stack around it (operating system, firmware, libraries) must all be trusted and tamper-proof.
Secure communication and storage
The communication path after the data is captured and sensed must also be secure so that attackers will have no ability to replace a stored biometric pattern with one of their own.
Much biometric data can be forged. Gummy fingerprints can copy real fingerprints, pictures of faces or eyes can fool cameras into believing they are looking at a real person, and recordings can be used for voice-based authentication systems.
Since biometric data relies on “close-enough” matches, you can never be sure of a certain match. You will need to determine what threshold is good enough and hope that you do not annoy legitimate users too much or make it too easy for the enemy to get authenticated.
Lack of compartmentalization
You have a finite set of biological characteristics to present. Fingerprints and iris scans are the most popular biometric sources. Unlike passwords, where you can have distinct passwords for each service, you cannot have this with biometric data.
Theft of biometric data
If someone steals your password, you can create a new one. If someone steals your fingerprint, you have nine fingerprints left and then none. If someone gets a picture of your iris, you have one more left. Once biometric data is compromised, it remains compromised.

Detecting humans

CAPTCHA (Completely Automated Public Turning test to tell Computers and Humans Apart) is not a technique to authenticate users but rather a technique to identify whether a system is interacting with a human being or with automated software. The idea behind it is that humans can recognize highly distorted characters far better than character recognition software can.

CAPTCHA presents an image containing a string of distorted text and asks the user to identify the text. As optical character recognition (OCR) technology improved, this text had to be ever more distorted and often reached a point where legitimate users struggled to decode it. CAPTCHAs were designed to thwart scripts that would, for example, sign up for thousands of logins on a service or buy all tickets to an event. CAPTCHAs do this by having a human solve the CAPTCHA before proceeding.

This was not always successful. CAPTCHAs were susceptible to a form of a man-in-the-middle attack where the distorted image is presented to low-cost (or free) humans whose job is to decipher CAPTCHAs. These are called CAPTCHA farms. Ever-improving OCR technology also made text-based CAPTCHAs susceptible to attack. By 2014, Google found that they could use AI techniques to crack CAPTCHAs with 99.8% accuracy.

An alternative to text-based CAPTCHAs are CAPTCHAs that involve image recognition, such as “select all images that have mountains in them” or "select all squares in an image that have street signs’. A recent variation of CAPTCHA is Google’s No CAPTCHA reCAPTCHA. This simply asks users to check a box stating that I’m not a robot. The JavaScript behind the scenes, however, does several things:

  • It contacts the Google server to perform an “advanced risk analysis”. What this does is not defined but the act of contacting the server causes the web browser to send Google cookies to the server. If you’re logged in to a Google account, your identity is now known to the server via a cookie and the server can look at your past history to determine if you are a threat.

  • By contacting the Google server, the server can also check where the request came from and compare it with its list of known malicious IP addresses known to host bots

  • The JavaScript code monitors the user’s engagement with the CAPTCHA, measuring mouse movements, scroll bar movement, acceleration, and the precise location of clicks. If everything is too perfect then the software will assume it is not dealing with a human being.

The very latest variation of this system is the invisible reCAPTCHA. The user doesn’t even see the checkbox: it is oriented tens of thousands of pixels above the origin, so the JavaScript code is run but the reCAPTCHA frame is out of view. If the server-based risk analysis does not get sufficient information from the Google cookies then it relocates the reCAPTCHA frame back down to a visible part of the screen.

Finally, if the risk analysis part of the system fails, the software presents a CAPTCHA (recognize text on an image) or, for mobile users, a quiz to find matching images.

Bitcoin & Blockchain

Bitcoin was introduced anonymously in 2009 by a person or group named Satoshi Nakamoto and is considered to be the first blockchain-based cryptocurrency. Bitcoin was designed as an open, distributed, public system: there is no authoritative entity and anyone can participate in operating the servers.

Traditional payment systems rely on banks to serve as a trusted third party. If Alice pays $500 to Charles, the bank, acting as a trusted third party, deducts $500 from Alice’s account and adds $500 to Charles’ account. Beyond auditing, there is no need to maintain a log of all transactions; we simply care about account sums. With a centralized system, all trust resides in this trusted third party. The system fails if the bank disappears, the banker makes a mistake, or if the banker is corrupt.

With Bitcoin, the goal was to create a completely decentralized, distributed system that allows people to manage transactions while preventing opportunities for fraud.

The ledger and the Bitcoin network

Bitcoin maintains a complete list of every single transaction since its creation in January 2009. This list of transactions is called the ledger and is stored in a structure called a blockchain. Complete copies of the ledger are replicated at Bitcoin nodes around the world. There is no concept of a master node or master copies of the ledger. All of these systems run the same software. New systems get the names of some well-known nodes when they download the software and DNS query on these nodes returns their IP addresses. After connecting to one or more nodes, a Bitcoin node will ask each for a list of known Bitcoin nodes. This creates a peer discovery process that allows a node to get a complete list of other nodes in the network.

User identities and addresses

We know how to create unforgeable messages: just sign them. If Alice wants to transfer $500 to Charles, she can create a transaction record that describes this transfer and sign it with her private key (e.g., use a digital signature algorithm or create a hash of the transaction and encrypt it with her private key). Bitcoin uses public-private key pairs and digital signatures to sign transactions.

Bitcoin transactions — the movement of bitcoins from one account to another — are associated with public these keys and not users. Users are anonymous. Your identity is your public key and you can use this identity by proving they have the corresponding private key.

There is never any association of your public key with your name. In fact, nothing stops you from creating multiple keys and having identities. The system does not care, or know, what your physical identity is or how many addresses you assigned to yourself. All that matters is that only you have the corresponding private keys to the public keys identified in your transactions so you are the only one who could have created valid signatures for your transactions.

If you can create transactions on behalf of a specific public key, it means that you own the corresponding private key. If you lose that private key, then you can no longer create transactions and hence cannot access your Bitcoin funds. There is nobody to call to recover a lost key since you are solely responsible for storing it!

In its initial deployment, your public key was your Bitcoin identity. If someone wanted to transfer money to you, they would create a transaction where your public key is identified as the recipient of the bitcoin. Bitcoin now identifies recipients by their Bitcoin address. Your Bitcoin address is essentially a hash of your public key (and you will have several addresses if you have several keys). The details of creating an address are a bit cumbersome:

  1. Generate an ECDSA (Elliptic Curve Digital Signature Algorithm) public, private key pair. This serves as your identity and signing key.

  2. Create a SHA–256 hash of the public key

  3. Perform a RIPEMD–160 hash on that

  4. Add a version byte in front of the result

  5. Perform a SHA–256 hash on the result … and another SHA–256 hash on that result

  6. The four bytes of the result are treated as the address checksum. Add these four bytes to the end of the RIPEMD–160 hash from [4]

  7. Convert the bytes to a base–58 string using Base58Check encoding to create a 20-byte printable string. Base58Check encoding adds a checksum to be able to validate that the value is not mistyped. This string is your bitcoin address.

When all is said and done, the address is essentially just a hash of your public key. Since a hash is a one-way function, someone can create (or verify) your address if they are presented with your public key. However, they cannot derive your public key if they have your address.

Bitcoin uses addresses only as destinations; an address can only receive funds. If Bob wants to send bitcoin to Alice, he will identify Alice as the output – the target of the money – by using her address. At some point in the future, Alice can use that money by creating a transaction whose source (input) refers to the transaction where she received the bitcoin. Any bitcoin node can validate this transaction:

  • Alice’s transaction will identify where the money comes from (inputs). Each input contains contains a reference to a past transaction, her public key, and her signature.

  • A node can validate the signature by using Alice’s public key, which is also a field of the input. This proves that someone who owns the private key (Alice) that corresponds to that public key created the transaction.

  • That transaction input contains a reference to an older transaction where the output – Alice’s address – is identified as the output of the bitcoin. Given the address, we cannot derive the public key but now we have both.

  • A bitcoin node can hash Alice’s public key (from the input) to create the address and see that it is the same as in the referenced old transaction (the output). That way, it validates that the older transaction indeed gives money to Alice.

User transactions (moving coins)

A transaction contains inputs and outputs. Inputs identify where the bitcoin comes from and outputs identify where to whom it is being transferred. If Alice wants to send bitcoin to Bob, she creates a message that is a bitcoin transaction and sends it to one or more bitcoin nodes. When a node receives a message, it will forward the transaction to its peers (other nodes it knows about). Typically, within approximately five seconds, every bitcoin node on the network will have a copy of the transaction and can process it.

The bitcoin network is not a database. It is build around a ledger, the list of all transactions. There are no user accounts that can be queried. In her transaction, Alice needs to provide one or more links to previous transactions that will add up to at least the required amount of bitcoin that she’s sending. These links to earlier transactions are called inputs. Each input is an ID of an earlier transaction. Inputs are outputs of previous transactions.

When a bitcoin node receives a transaction, it performs several checks:

  1. The signature of each input is validated by checking it against the public key in the transaction. This ensures that it was created by someone who has the private key that corresponds to the public key.

  2. It hashes the public key in the transaction to create the address, which will be matched against the output addresses in the inputs.

  3. The transactions listed in the inputs are validated to make sure that those transactions have not been used by any other transaction. This ensures there will be no double spending.

  4. Finally, it makes sure that there is a sufficient quantity of bitcoin output by those input transactions.

A bitcoin transaction contains:

One or more inputs:
Each input identifies transactions where coins come from. These are references to past transactions. Each input also contains a signature and a public key that corresponds to the private key that was used to create the signature. A user may have multiple identities (keys) and reference past transactions that were directed to different addresses that belong to the user.
Destination address & amount – who the money goes to. This is simply the recipient’s bitcoin address.

Change: : The transaction owner’s address & bitcoin amount. Every input must be completely spent Any excess is generated as another output to the owner of the transaction.

Transaction fee (anywhere from 10¢ to a few $ per transaction). There is a limited amount of space (about 1 MB) in a block. A transaction is about 250 bytes. To get your transaction processed quickly, you need to outbid others.

Blocks and the blockchain

Transactions are sent to all the participating servers. Each system keeps a complete copy of the entire ledger, which records all transactions from the very first one. Currently the bitcoin ledger is about 250 GB.

Transactions are grouped into a block. A block is just a partial list of transactions. When a server is ready to do so, it can add the block to the ledger, forming a linked list of blocks that comprise the blockchain. In Bitcoin, a block contains ten minutes worth of transactions, all of which are considered to be concurrent.

Every ten minutes, a new block of transactions is added to the blockchain. A block is approximately a megabyte in size and holds around 4,000 transactions. To make it easy to locate a specific transaction within a block, the blocks are stored in a Merkle tree. This is a binary tree of hash pointers and makes it easy not just to locate a desired transaction but to validate that it has not been tampered by validating the chain of hashes along the path.

Securing the Block

A critically important part of the Bitcoin blockchain is to make sure that blocks in the blockchain have not been modified. We explored the basic concept of a blockchain earlier. Each block contains a hash pointer to the previous block in the chain. A hash pointer not only points to the previous block but also contains a SHA–256[1] hash of that block. This creates a tamper-proof structure. If the contents of any block are modified (accidentally or maliciously), the hash pointer that points to that block will no longer be valid (the hashes won’t match).

To make a change to a block, an attacker will need to modify all the hash pointers from the most recent block back to the block that was changed. One way to prevent such a modification could have been to use signed hash pointers to ensure an attacker cannot change their values. However, that would require someone to be in charge of signing these pointers and there is no central authority in Bitcoin; anyone can participate in building the blockchain. We need a different way to protect blocks from modification.

Proof of Work

Bitcoin makes the addition of a new block – or modification of a block in a blockchain – difficult by creating a puzzle that needs to be solved before the block can be added to the blockchain. By having a node solve a sufficiently difficult puzzle, there will be only a tiny chance that two or more nodes will propose adding a block to the chain at the same time.

This puzzle is called the Proof of Work and is an idea that has been adapted from an earlier system called hashcash. Proof of Work requires computing a hash of three components, hash(B, A, W) where:

  • B = block of transactions (which includes the hash pointer to the previous block)
  • A = address (i.e., hash of the public key) of the owner of the server doing the computation
  • W = the Proof of Work number

When servers are ready to commit a block of transactions onto the chain, they each compute this hash, trying various values of W until the hash result has a specific pre-defined property. The property they are searching for is a hash value that is less than some given number. Currently, it’s a value that requires the leading 74 bits of the 256-bit hash to all be 0s). The property changes over time to ensure that the puzzle never gets too easy regardless of how many nodes are in the network or how fast processors get.

Recall that one property of a cryptographic hash function is the inability to deduce any of the input by looking at the output. Hence, we have no idea what values of W will yield a hash with the desired properties. Servers have to try trillions of values with the hope that they will get lucky and find a value that yields the desired hash. This process of searching for W is called mining.

When a server finds a value of W that yields the desired hash, it advertises that value to the entire set of bitcoin servers. Upon receiving this message, it is trivial for a server to validate the proof of work by simply computing hash(B, A, W) with the W sent in the message and checking the resultant value. The servers then add the block, which contains the Proof of Work number and the winner’s address, onto the blockchain.

Bitcoin’s mining difficulty is adjusted every 2,016 blocks, which corresponds to approximately 14 days, to keep the average rate at which blocks are added to the blockchain at 10 minutes. This allows the network to handle changes in the number of miners participating in computing the proof work.

Double Spending and modifying past transactions

A major concern with decentralized cryptocurrency systems is double spending. Double spending refers to sending the same funds (or tokens) to multiple parties: Alice sends $500 to Charles and $500 to David but only has $500 in her account. Bitcoin deals with this by having every server maintain the complete ledger, so Alice’s entire list of transactions can be validated before a new one is accepted.

Alice may decide to go back to an older transaction and modify it. For example, she might change change the transaction that sent bitcoin to Charles into one that sends money to David – or simply delete the fact that she paid Charles the full amount.

To do this, she would need to compute a new proof of work value for that block so the block hash will be valid. Since Bitcoin uses hash pointers, each block contains a hash pointer to the previous (earlier) block. Alice would thus need to compute new proof of work values for all newer blocks in the chain so that her modified version of the entire blockchain is valid. She ends up making a competing blockchain.

Recomputing the proof of work numbers is a computationally intensive process. Because of the requirement to generate the Proof of Work for each block, a malicious participant will not be able to catch up with the cumulative work of all the other participants. Because of errors or the rare instances where multiple nodes compute the proof of work concurrently, even honest participants may, on occasion, end up building a competing blockchain. Bitcoin’s policy is that the longest chain in the network is the correct one. The length of the chain is the chain’s score and the highest-scoring chain will be considered the correct one by the servers. A participant is obligated to update its chain with a higher-scoring one if it gets notice of a higher-scoring chain from another system. If it doesn’t update and insists on propagating its chain as the official one, its chain will simply be ignored by others.

51% Attack

Let us go back to the example of Alice maliciously modifying a past transaction. In addition to the work of modifying the existing blockchain, Alice will also need to process new transactions that are steadily arriving, and making the blockchain get longer as new blocks get added to it. She needs to change the existing blockchain and also compute proof of work values for new blocks faster than everyone else in the network so that she would have the longest valid chain and hence a high score.

If she can do this then her chain becomes the official version of the blockchain and everyone updates their copy. This is called a 51% attack. To even have a chance of succeeding, Alice would need more computing power than the reset of the systems in the Bitcoin network combined. Back in 2017, The Economist estimated that “bitcoin miners now have 13,000 times more combined number-crunching power than the world’s 500 biggest supercomputers,” so it is not feasible for even a nation-state attacker to harness sufficient power to carry out this attack on a popular cryptocurrency network such as Bitcoin. Blockchain works only because of the assumption that the majority of participants are honest … or at least not conspiring together to modify the same transactions.

Even if someone tried to do this attack, they’d likely only be able to modify transactions in very recent history – in the past few blocks of the blockchain. This is why For this reason, transactions further back in the blockchain are considered to be more secure.

Committing Transactions

Because of the chain structure, it requires more work to modify older transactions (more blocks = more proof of work computations). Modifying only the most recent block is not hugely challenging. Hence, the further back a transaction is in the blockchain, the less likely it is that anyone can amass the computing power to change it and create a competing blockchain.

A transaction is considered confirmed after some number, N, additional blocks are added to the chain. The value of N is up to the party receiving the transaction - a level of comfort. The higher the number, the deeper the transaction is in the blockchain and the harder it is to alter. Bitcoin recommends N=1 for low-value transactions (payments under $1,000; this enables them to be confirmed quickly), N=3 for deposits and mid-value transactions, and N=6 for large payments (e.g., $10k…$1M). Even larger values of N could be used for extremely large payments.


Why would servers spend a huge amount of computation, which translates to huge investments in computing power and electricity, just to find a value that produces a hash with a certain property? To provide an incentive, the system rewards the first server (the miner) that advertises a successful Proof of Work number by depositing a certain number of Bitcoins into their account. To avoid rewarding false blockchains as well as to encourage continued mining efforts, the miner is rewarded only after 99 additional blocks have been added to the ledger.

The reward for computing a proof of work has been designed to decrease over time:

  • 50 bitcoins for the first 4 years since 2008
  • 25 bitcoins from 2012–2015
  • 12.5 bitcoins from block #420,000 July 9, 2016 – 2019
  • 6.25 bitcoins at block #630,000 – around May 24, 2020

Eventually, the reward will reach zero and there will be a maximum of around 21 million bitcoins in circulation. However, recall that each transaction has a fee associated with it. Whoever solves the puzzle first and gets a confirmed block into the blockchain will also reap the sum of all the transaction fees in that block.


Bitcoin has been designed to operate as a large-scale, global, fully decentralized network. Anybody can download the software and operate a bitcoin node. All you need is sufficient storage to store the blockchain. There are currently over 9,000 reachable full nodes spread across 99 countries. It is estimated that there are over 100,000 total nodes, including those that are be running old versions of software or are not always reachable. In this sense, Bitcoin is truly decentralized. Note that there are different types of nodes. The nodes we discussed serve as full nodes. They maintain an entire copy of the blockchain and accept transactions. Light nodes are similar but store only a part of the blockchain, talking to a full node parent if they need to access other blocks.

Not everyone who operates a bitcoin node does mining (proof of work computation). Mining is incredibly time energy intensive. To make money on mining, one needs to buy dedicated ASIC mining hardware that is highly optimized to compute SHA–256 hashes. Conventional computers will cost more in energy than they will earn in bitcoin rewards. Because of this, mining tends to be concentrated among a far smaller number of players. It is not as decentralized as much as one would like.

Bitcoin software is open source but there is only a small set of trusted developers. The software effort is inspectable but not really decentralized. Bugs have been fixed but many nodes still run old and buggy versions. Bitcoin transactions cannot be undone even if they were created by buggy nodes or via compromised keys.

  1. SHA–256 is the SHA–2 family of hash functions that produces a 256-bit output. The SHA–2 family also includes HA–224, SHA–256, SHA–384, and SHA–512.  ↩

Network Security

The Internet was designed to support the interconnection of multiple networks, each of which may use different underlying networking hardware and protocols. The Internet Protocol, IP, is a logical network built on top of these physical networks. IP assumes that the underlying networks do not provide reliable communication. It is up to higher layers of the IP software stack (either TCP or thee application) to to detect lost packets. Individual networks under IP are connected by routers, which are computing elements that are each connected to multiple networks. They receive packets on one network and forward them onto another network to get them to their destination. A packet from your computer will often flow through multiple networks and multiple routers that you know nothing about on its way to its destination. This poses security risks since you do not know of the trustworthiness of the routers and networks.

Networking protocol stacks are usually described using the OSI layered model. For IP, the layers are:

  1. Physical. Represents the actual hardware.

  2. Data Link. The protocol for the local network, typically Ethernet (802.1) or Wi-Fi (802.11). Ethernet and Wi-Fi use the same addressing scheme and were designed to be bridged together to form a single local area network.

  3. Network. The protocol for creating a single logical network and routing packets across physical networks. The Internet Protocol (IP) is responsible for this.

  4. Transport. The transport layer is responsible for creating logical software endpoints (ports) so that one application can send a stream of data to another via an operating system’s sockets interface. TCP uses sequence numbers, acknowledgement numbers, and retransmission to provide applications with a reliable, connection-oriented, bidirectional communication channel. UDP does not provide reliability and simply sends a packet to a given destination host and port.

Higher layers of the protocol stack are handled by applications and the libraries they use.

Data link layer

In an Ethernet network, the data link layer is handled by Ethernet transceivers and Ethernet switches. Security was not a consideration in the design of this layer and several fundamental attacks exist at this layer. Wi-Fi also operates at the data link layer and added encryption on wireless data between the device and access point. Note that the encryption is not end-to-end, between hosts, but ends at the access point.

Switch CAM table overflow

Sniff all data on the local area network (LAN).

Ethernet frames are delivered based on their 48-bit MAC[1] address. IP address are meaningless to ethernet transceivers and to switches since IP is handled at higher levels of the network stack. Ethernet was originally designed as a bus-based shared network; all devices on the LAN shared the same wire. Any system could see all the traffic on the Ethernet. This resulted in increased congestion as more hosts were added to the local network.

Ethernet switches alleviated this problem by using a dedicated cable between each host and the switch. The switch routes an ethernet frame only to the Ethernet port (the connector on the switch) that is connected to the system that contains the desired destination address.

Unlike routers, switches are not programmed with routes. Instead, they learn which computers are on which switch ports by looking at the source MAC addresses of incoming ethernet frames. An incoming frame indicates that the system with that source address is connected to that switch port.

To implement this, a switch contains a switch table (a MAC address table). This table contains entries for known MAC addresses and their interface. The switch then uses forwarding and filtering:

When a frame arrives for some destination address D, the switch looks up D in the switch table to find the interface. If D is in the table and on a different port than that of the incoming frame, the switch forwards the frame to that interface, queueing it if necessary.

If D is not found in the table, then the switch assumes it has not yet learned what port that address is associated with, so it forwards the frame to ALL interfaces.

This procedure makes the switch self-learning: the switch table is empty initially and gets populated as the switch inspects source addresses.

A switch has to support extremely rapid lookups in the switch table. For this reason, the table is implemented using content addressable memory (CAM, also known as associative memory). CAM is expensive and switch tables are fixed-size and not huge. The switch will delete less-frequently used entries if it needs to make room for new ones.

The CAM table overflow attack exploits the limited size of this CAM-based switch table. The attacker sends bogus Ethernet frames with random source MAC addresses. Each newly-received address will displace an entry in the switch table, eventually filling up the table. With the CAM table full, legitimate traffic will be broadcast to all links A host on any port can now see all traffic. The CAM table overflow attack turns a switch into a hub.

Countermeasures for CAM table attacks require the use of managed switches that support port security. These switches allow you to limit the number of addresses the table will hold for each switch port.

VLAN hopping (switch spoofing)

Sniff all data from connected virtual local area networks.

One use of local area networks is to isolate broadcast traffic from other groups of systems. Related users can all be placed on a single LAN. However, users may be relocated within an office office and switches may be used inefficiently. Virtual Local Area Networks (VLANs) create multiple logical LANs over a single physical switch infrastructure. The network administrator can assign each port on a switch to a specific VLAN. Each VLAN is a separate broadcast domain so that each VLAN acts like a truly independent local area network. Users belonging to one VLAN will not see any traffic from the other; it has to be routed through an IP router.

Switches may be extended by cascading them with other switches: an ethernet cable from one switch simply connects to another switch. With VLANs, the connection between switches forms a VLAN trunk and carries traffic from all VLANs to the other switch. To support this behavior, an extended Ethernet frame format was created for theEthernet frames sent on this link since each frame now needs to identify the VLAN from which it originated.

A VLAN hopping attack employs switch spoofing: an attacker’s computer identifies itself as a switch with a trunk connection. It then receives traffic on all VLANs.

Defending against this attack requires a managed switch where an administrator can disable unused ports and associate them with some unused VLAN. Disable auto-trunking also needs to be disabled so that each port cannot become a trunk. Instead, trunk ports need to be configured explicitly.

ARP cache poisoning

Redirect IP packets by changing the IP address to MAC address mapping.

Recall that IP is a logical network that sits on top of physical networks. If we are on an Ethernet network and need to send an IP datagram, that IP datagram needs to be encapsulated in an Ethernet frame. The Ethernet frame needs to contain a destination MAC address that corresponds to the destination machine (or router, if the destination address is on a different LAN). For an operating system to send an IP packet, therefore, it needs to figure out what MAC address corresponds to a given IP address.

There is no relationship between an IP and Ethernet MAC address. To find the MAC address when given an IP address, a system uses the Address Resolution Protocol, ARP. The sending computer creates an Ethernet frame that contains an ARP message with the IP address it wants to query. This ARP message is then broadcast: all network adapters on the LAN receive the message. If some system receives this message and sees that its IP address matches the address in the query, it sends back an ARP response. The response identifies the MAC address of the system that owns that IP address.

To avoid the overhead of doing this query each time the system needs to use the IP address, the operating system maintains an ARP cache that stores recently used addresses. Moreover, hosts cache any ARP replies they see, even if they did not originate them. This is done on the assumption that many systems use the same set of IP addresses and the overhead of making an ARP query is substantial. Along the same lines, a computer can send an ARP response even if nobody sent a request. This is called a gratuitious ARP and is often sent by computers when they start up as a way to give other systems on the LAN the IP:MAC address mapping without them having to ask for it at a later time.

Note that there is no way to authenticate that a response is legitimate. The asking host does not have any idea of what MAC address is associated with the IP address. Hence, it cannot tell whether a host that responds really has that IP address or is an imposter.

An ARP cache poisoning attack is one where an attacker creates fake ARP responses that contain the attacker’s MAC address and the target’s IP address. This will direct any traffic meant for the target to the attacker. It enables man-in-the-middle or denial of service attacks since the real host will not be receiving any IP traffic.

There are several defenses against ARP cache poisoning. One defense is to ignore replies that are not associated with requests. However, you need to hope that the reply you get is a legitimate one since an attacker may respond more quickly or perhaps launch a denial of service attack against the legitimate host and then respond.

Another defense is to give up on ARP broadcasts and simply use static ARP entries. This works but can be an administrative nightmare since someone will have to keep the list of IP and MAC address mappings and the addition of new machines to the environment.

Finally, one can enable something called Dynamic ARP Inspection. This essentially builds a local ARP table by using DHCP (Dynamic Host Configuration Protocol) Snooping data as well as static ARP entries. Any ARP responses will be validated against DHCP Snooping database information or static ARP entries. This assumes that the environment uses DHCP instead of fixed IP address assignments.

DHCP spoofing

Configure new devices on the LAN with your choice of DNS address, router address, etc.

When a computer joins a network, it needs to be configured for using the Internet Protocol (IP) on that network. This can be done automatically via DHCP, the Dynamic Host Configuration Protocol. It is used in practically every LAN environment and is particularly useful where computers (including phones) join and leave the network regularly, such as Wi-Fi hotspots.

A computer that joins a new network broadcasts a DHCP Discover message. A DHCP server on the network picks up this request and sends back a response that contains configuration information for this new computer on the network:

  • IP address – the address given to the system
  • Subnet mask – which bits of the IP address identify the local area network
  • Default router – gateway to which all non-local datagrams will be routed
  • DNS servers – servers that tell you the IP address for a domain name
  • Lease time – how long the configuration is valid

As with ARP, we have the problem that a computer does not know where to go to for this information and has to rely on a broadcast query, hoping that it gets a legitimate response.

With DHCP Spoofing, any system can pretend to be a DHCP server and spoof responses that would normally be sent by a valid DHCP server. This imposter can provide the new system with a legitimate IP address but with false addresses for the gateway (default router) and DNS servers. The result is that the imposter can field DNS requests, which convert domain names to IP addresses and can also redirect any traffic that leaves the local area network from the new machine.

As with ARP cache poisoning, the attacker may launch a denial of service attack against the legitimate DHCP server to keep it from responding or at least delay its responses. If the legitimate server sends its response after the imposter, the new host will simply ignore the response.

There aren’t many defenses against DHCP spoofing. Some switches (such as those by Cisco and Juniper) support DHCP snooping. This allows an administrator to configure specific switch ports as “trusted” or “untrusted." Only specific machines, those on trusted ports, will be permitted to send DHCP responses. Any other DHCP responses will be dropped. The switch will also use DHCP data to track client behavior to ensure that hosts use only the IP address assigned to them and that hosts do not generate fake ARP responses

Network (IP) layer

The Internet Protocol (IP) layer is responsible for getting datagrams (packets) to their destination. It does not provide any guarantees on message ordering or reliable delivery. Datagrams may take different routes through the network and may be dropped by queue overflows in routers.

Source IP address authentication

Anyone can impersonate an IP datagram.

One fundamental problem with IP communication is that there is absolutely no source IP address authentication. Clients are expected to use their own source IP address but anybody can override this if they have administrative privileges on their system by using a raw sockets interface.

This enables one to forge messages to appear that they come from another system. Any software that authenticates requests based on their IP addresses will be at risk.

Anonymous denial of service

The ability to set an arbitrary source address in an IP datagram can be used for anonymous denial of service attacks. If a system sends a datagram that generates an error, the error will be sent back to the source address that was forged in the query. For example, a datagram sent with a small time-to-live, or TTL, value will cause a router that is hit when the TTL reaches zero to respond back with an ICMP (Internet Control Message Protocol) Time to Live exceeded message. Error responses will be sent to the forged source IP address and it is possible to send a vast number of such messages from many machines (by assembling a botnet) across many networks, causing the errors to all target a single system.


Routers are nothing more than computers with multiple network links and often with special purpose hardware to facilitate the rapid movement of packets across interfaces. They run operating systems and have user interfaces for administration. As with many other devices that people don’t treat as “real” computers, there is a danger that they routers will have simple or even default passwords. Moreover, owners of routers may not be nearly as diligent in keeping the operating system and other software updated as they are with their computers.

Routers can be subject to some of the same attacks as computers. Denial of service (DoS) attacks can keep the router from doing its job. One way this is done is by sending a flood of ICMP datagrams. The Internet Control Message Protocol is typically used to send routing error messages and updates and a huge volume of these can overwhelm a router. Routers may also have input validation bugs and not handle certain improper datagrams correctly.

Route table poisoning is the modification of the router’s routing table either by breaking into a router or by sending route update datagrams over an unauthenticated protocol.

Transport layer (UDP, TCP)

UDP and TCP are transport layer protocols that allow applications to establish communication channels with each other. Each endpoint of such a channel is identified by a port number (a 16-bit integer that has nothing to do with Ethernet switch ports). The port number allows the operating system to direct traffic to the proper socket. Hence, both TCP and UDP packets contain not only source and destination addresses but also source and destination ports.

UDP, the User Datagram Protocol, is stateless, connectionless, and unreliable.

As we saw with IP source address forgery, anybody can send UDP messages with forged source IP addresses.

TCP, the Transmission Control Protocol, is stateful, connection-oriented, and reliable. Every packet contains a sequence number (byte offset) and the operating system assembles received packets into their correct order. The receiver also sends acknowledgements so that any missing packets are retransmitted.

To handle in-order, reliable communication, TCP needs to establish state at both endpoints. It does this through a connection setup process that comprises a three-way handshake.

  1. SYN: Client sends a SYN segment The client selects a random initial sequence number (client_isn).

  2. SYN/ACK: Server sends a SYN/ACK The server receives the SYN segment and knows that a client wants to connect to it. It allocates memory to store connection state and to hold out-of-order segments. The server generates an initial sequence number (server_isn) for its side of the data stream. This is also a random number. The response also contains an acknowledgement with the value client_isn+1.

  3. ACK: Client sends a final acknowledgement The client acknowledges receipt of the SYN/ACK message by sending a final ACK message that contains an acknowledgement number of server_isn+1.

Note that the initial sequence numbers are random rather than starting at zero as one might expect. There are two reasons for this.

The primary reason is that message delivery times on an IP network are unpredictable and it is possible that a recently-closed connection may receive delayed messages, confusing the server on the state of that connection.

The security-sensitive reason is that if sequence numbers were predictable then it would be quite easy to launch a sequence number prediction attack where an attacker would be able to guess at likely sequence numbers on a connection and send masqueraded packets that will appear to be part of the data stream. Random sequence numbers do not make the problem go away but make it more challenging to launch the attack, particularly if the attacker does not have the ability to see traffic on the network.

SYN flooding

In the second step of the three-way handshake, the server is informed that a client would like to connect and allocates memory to manage this connection. Given that kernel memory is a finite resource, the operating system will allocate only a finite number of TCP buffers in its TCP queue. After that, it will refuse to accept any new connections.

In the SYN flooding attack, the attacker sends a large number of SYN segments to the target. These SYN messages contain a forged source address of an unreachable host, so the target’s SYN/ACK responses never get delivered anywhere. The handshake is never completed but the operating system has allocated resources for this connection. Depending on the operating system, it might be a minute or much longer before it times out on waiting for a response and cleans up these pending connections. Meanwhile, all TCP buffers have been allocated and the operating system refuses to accept any more TCP connections, even if they are from a legitimate source.

SYN flooding attacks cannot be prevented completely. One way of lessening their impact is the use of SYN cookies. With SYN cookies, the server does not allocate buffers & TCP state when a SYN segment is received. It responds with a SYN/ACK and creates an initial sequence number that is a hash of several known values:

hash(src_addr, dest_addr, src_port, dest_port, SECRET)

The “SECRET” is not shared with anyone; it is local to the operating system. When (if) the final ACK comes back from the client, the server needs to validate the acknowledgement number. Normally this requires comparing the number to the stored server initial sequence number plus 1. We did not allocate space to store this value but we can recompute the number by re-generating the hash, adding one, and comparing it to the acknowledgement number in the message. If it is valid, the kernel believes it was not the victim of a SYN flooding attack and allocate resources necessary for managing the connection.

TCP Reset

A somewhat simple attack is to send a RESET (RST) segment to an open TCP socket. If the server sequence number is correct then the connection will close. Hence, the tricky part is getting the correct sequence number to make it look like the RESET is part of the genuine message stream.

Sequence numbers are 32 bit values. The chance of successfully picking the correct sequence number is tiny: 1 in 232, or approximately one in four billion. However, many systems will accept a large range of sequence numbers that are approximately in the correct range to account for the fact that packets may arrive out of orders so they shouldn’t necessarily be rejected just because the sequence number is not exactly correct. This can reduce the search space tremendously and an attacker can send a flood of RST packets with varying sequence numbers and a forged source address until the connection is broken.

Routing protocols

The Internet was designed to connect multiple independently-managed networks, each of which may use different hardware. Routers connect local area networks as well as wide area networks together. A collection of consecutive IP addresses (most significant bits, called prefixes) as well as the underlying routers and network infrastructure, all managed as one administrative entity, is called an Autonomous System (AS). For example, the part of the Internet managed by Comcast constitutes an autonomous system (Comcast actually has 42 in different regions). The networks managed by Verizon happen to constitute a few autonomous systems as well. For purposes of our discussion, think of an AS ISPs or large data centers such as Google or Amazon (BTW, Rutgers is an Autonomous System: AS46).

The routers within an autonomous system need to share routing information so that those routers can route packets efficiently toward their destination. An Interior Gateway Protocol is used within an autonomous system. The most common is OSPF, Open Shortest Path First. While security issues exist within autonomous system, we will turn our attention to the sharing of information between autonomous systems.

Routers that are connected to routers in other ASes use an Exterior Gateway Protocol (EGP) called the Border Gateway Protocol, or BGP. With BGP, each autonomous system exchanges routing and reachability information with the autonomous systems with which it connects. For example, Comcast can tell Verizon what parts of the Internet it can reach. BGP uses a distance vector routing algorithm to enable the routers to determine the most efficient path to use to send packets that are destined for other networks. Unless an administrator explicitly configures a route, BGP will pick the shortest route.

BGP Hijacking

So what are the security problems with BGP? Edge routers use BGP to send route advertisements to routers they are connected to on neighboring autonomous systems. An advertisement is a list of IP address prefixes the AS can reach (shorter prefixes mean a bigger range of addresses) and the distance (number of hops) to each group of systems.

These are TCP messages with no authentication, integrity checks, or encryption. With BGP hijacking, a malicious party that has access to the network link or a connected router can inject advertisements for arbitrary routes. This information will propagate throughout the Internet and can cause routers throughout the Internet to send IP datagrams to the attacker, with the belief that is the shortest path to the destination.

A BGP attack can be used for eavesdropping (direct network traffic to a specific network by telling everyone that you’re offering a really short path) or a denial of service (DoS) attack (make parts of the network unreachable by redirecting traffic and then dropping it. There are currently close to 33,000 autonomous systems and most have multiple administrators. We live in the hope that none are malicious and that all routers are properly configured and properly secured.

It is difficult to change BGP since tens of thousands of independent entities use it worldwide. Two partial solutions to this problem emerged. the Resource Public Key Infrastructure (RPKI) framework simply has each AS get an X.509 digital certificate from a trusted entity (the Regional Internet Registry). Each AS signs its list of route advertisements with its private key and any other AS can validate that list of advertisements using the AS’s certificate.

A related solution is BGPsec, which is still a draft standard. Instead of signing an individual AS’s routes, every BGP message between ASes is signed.

Both solutions require every single AS to employ this solution. If some AS is willing to accept untrusted route advertisements but will relay them to other ASes as signed messages then integrity is meaningless. Moreover, most BGP hijacking incidents took place because legitimate system administrators misconfigured route advertisements either accidentally or on purpose. They were not the actions of attackers that hacked into a router.

A high profile BGP attack occurred against YouTube in 2008. Pakistan Telecom received a censorship order from the telecommunications ministry to block YouTube traffic to the country. The company sent spoofed BGP messages claiming to be the best route for the range of IP addresses used by YouTube. It used a longer address prefix than the one advertised by YouTube (longer prefix = fewer addresses). Because the longer prefix was deemed to be more specific, BGP gave it a higher priority. Within minutes, routers worldwide were directing their YouTube requests to Pakistan Telecom, which would simply drop them.

Domain Name System (DNS)

The Domain Name System (DNS) is a Hierarchical service that maps Internet domain names to IP addresses. A user’s computer runs the DNS protocol via a program known as a DNS stub resolver. It first checks a local file for specific preconfigured name-to-address mappings. Then it checks its cache of previously-found mappings. Finally, it contacts an external DNS resolver, which is usually located at the ISP or is run as a public service, such as Google Public DNS or OpenDNS.

We trust that the name-to-address mapping is legitimate. Web browsers, for instance, rely on this to enforce their same-origin policy. However, DNS queries and responses are sent using UDP with no authentication or integrity checks. The only check is that each DNS query contains a Query ID (QID). A DNS response must have a matching QID so that the client can match it to the query. These responses can be intercepted and modified or just forged. Malicious responses can return a different IP address that will direct IP traffic to different hosts

A solution called DNSsec has been proposed. It is a secure extension to the DNS protocol that provide authenticated requests & responses. However, few sites support it.

Pharming attack

A pharming attack is an attack on the configuration information maintained by a DNS server –either modifying the information used by the local DNS resolver or modifying that of a remote DNS server. By changing the name to IP address mapping, an attacker can cause software to send packets to the wrong system.

The most direct form of a pharming attack is to modify the local hosts file. This is the file (/etc/hosts on Linux, BSD, and macOS systems; c:\Windows\System32\Drivers\etc\hosts on Windows) that contains mappings between domain names and IP addresses. If an entry is found here, the system will not bother checking a remote DNS server.

Alternatively, malware may modify the DNS server settings on a system so that it would contact an attacker’s DNS server, which can provide the wrong IP address for certain domain names.

DNS cache poisoning (DNS spoofing attack)

DNS queries first check the local host’s DNS cache to see if the results of a past query have been cached. This yields a huge improvement in performance since a network query can be avoided. If the cached name-to-address mapping is invalid, then the wrong IP address is returned. Modifying this cached mapping is called DNS cache poisoning, also known as DNS spoofing. In the general case, DNS cache poisoning refers to any mechanism where an attacker is able to provide malicious responses to DNS queries. One way that DNS cache poisoning is done is via JavaScript on a malicious website.

The browser requests access to a legitimate site. For example, Because the system does not have the address of cached, it sends a DNS query to an external DNS resolver using the DNS protocol. The query includes a query ID (QID) x1. At the same time that the request for is made, JavaScript launches an attacker thread that sends 256 responses with random QIDs (y1. y2, …}. Each of these DNS responses tells the server that the DNS server for is at the attacker’s IP address. If one of these responses happens to have a matching QUD, the host system will accept it as truth that all future queries for anything at should be directed to the name server for, which is run by the attacker. If the responses don’t work, the script can try again with a different sub-domain, It might take many minutes, but there is a high likelihood that the attack will eventually succeed.

There are two defenses against this attack but they both require non-standard actions that will need to be coded into the system. One is to randomize the source port number of the query. Since the attacker does not get to see the query, it will not know where to send the bogus responses. There are 216 (65,536) ports to try. The second defense is to force all DNS queries to be issued twice. The attacker will have to guess the 32-bit query ID twice in a row and the chances of doing that successfully are infinitesimally small.

Summary: An attacker can run a local DNS server that will attempt to provide spoofed DNS responses to legitimate domain name lookup requests. If the query ID numbers of the fake response match those of a legitimate query (trial and error), the victim will get the wrong IP address, which will redirect legitimate requests to an attacker’s service.

DNS Rebinding

Web application security is based on the same-origin policy. Browser scripts can access cookies and other data on pages only if they share the same origin, which is the combination of URI (protocol), host name, and port number. The underlying assumption is that resolving a domain name takes you to the correct server.

The DNS rebinding attack allows JavaScript code on a malicious web page to access private IP addresses in the victim’s network. The attacker configures the DNS entry for a domain name to have a short time to live (TTL). When the victim’s browser visits the page and downloads JavaScript from that site, that JavaScript code is allowed to interact with the domain thanks to the same origin policy. However, right after downloading the script, the attacker can reconfigure the DNS server so that future queries will return an address in the internal network. The JavaScript code can then try to request resources from that system since, as far as the browser is concerned, the origin is the same because the name of the domain has not changed.

Summary: short time-to-live values in DNS allow an attacker to change the address of a domain name so that scripts from that domain can now access resources inside the private network.

DNS amplification attack

We have seen how source address spoofing can be used to carry out an anonymous denial of service (DoS) attack. Ideally, to overload a system, the attacker would like to send a small amount of data that would create a large response that would be sent to the target. This is called amplification. An obvious method would be to send a URL request over HTTP that will cause the server to respond with a large page reply. However, this does not work as HTTP uses TCP and the target would not have the TCP session established. DNS happens to be a UDP-based service. DNS amplification uses a collection of compromised systems that will carry out the attack (a botnet). Each system will send a small DNS query using a forged source address. These systems can contact their own ISP’s DNS servers since the goal is not to overwhelm any DNS server. The query asks for “ANY”, a request for all known information about the DNS zone. Each such query will cause the DNS server to send back a far larger reply.

  1. MAC = Media Access Control_ and refers to the hardware address of the Ethernet device.  ↩

Virtual Private Networks (VPNs)

Suppose we want to connect two local area networks in geoagraphically-separated areas together. For instance, we might have a company with locations in New York and in San Francisco. One way of doing this is to get a dedicated private network link between the two points. Many phone companies and network providers offer a private line service but it can be extremely expensive and is not feasible in many circumstances, such as if one of your endpoints is in the Amazon cloud rather than at your physical location.o

Instead, we can use the public Internet to communicate between the two locations. Our two subnets will often have private IP addresses (such as 192.168.x.x), which are not routable over the public internet. To overcome this, we can use a technique called tunneling. Tunneling is the process of encapsulating an IP datagram within another IP datagram. An IP datagram in one subnet (a local area network in one of our locations) that is destined to an address on the remote subnet will be directed to a gateway router. There, it will be treated as payload (data) and packaged within an IP datagram whose destination is the IP address of the gateway router at our other location. This datagram is now routed over the public Internet. The source and destination addresses of this outer datagram are the gateway routers at both sides.

IP networking relies on store-and-forward routing. Network data passes through routers, which are often unknown and may be untrustworthy. We have seen that routes may be altered to pass data through malicious hosts or directed to malicious hosts that accept packets destined for the legitimate host. Even with TCP connections, data can be modified or redirected and sessions can be hijacked. We also saw that there is no source authentication on IP packets: a host can place any address it would like as the source. What we would like is the ability to communicate securely, with the assurance that our traffic cannot be modified and that we are truly communicating with the correct endpoints.

Virtual private networks (VPNs) take the concept of tunneling and safeguard the encapsulated data by adding a MAC (message authentication code) so that we can detect if the data is modified and encrytion so that others cannot read the data. This way, VPNs allow separate local area networks to communicate securely over the public Internet.

IPsec is a popular VPN protocol that is really a set of two protocols.

  1. The IPsec Authentication Header (AH) is an IPsec protocol that does not encrypt data but simply affixes a message authentication code to each datagram. It ensures the integrity of the each datagram.

  2. The Encapsulating Security Payload (ESP), which provides integrity checks and also encryts the payload, ensuring secrecy.

IPsec can operate in tunnel mode or transport mode. In both cases, IPsec communciates at the same layer as the Internet Protocol. That is, it is not used by applications to communciate with one another but rather by routers or operating systems to direct an entire stream of traffic.

Tunnel mode VPNs provide network-to-network or host-to-network communication. The communication takes place between either two VPN-aware gateway routers or from a host to a VPN-aware router. The entire datagram is treated like payload and encapsulated within a datagram that is sent over the Internet to the remote gateway. That gateway receives this VPN datagram, extracts the payload, and routes it on the internal network where it makes its way to the target system.

Transport mode VPNs provide communication between two hosts. In this case, the IP header is not modified but data is protected. Note that, unlike transport layer security (TLS), which we examine later, setting up a transport mode VPN will protect all data streams between the two hosts. Applications are unaware that a VPN is in place.

Authentication Header (AH)

The Authentication Header (AH) protocol guarantees the integrity and authenticity of IP packets. AH adds an extra chunk of data (the authentication header) with a MAC to the IP datagram. Anyone with knowledge of the key can create the MAC or verify it. This ensures message integrity since an attacker will not be able to modify message contents and have the HMAC remain valid. Attackers will also not be able to forge messages because they will not know the key needed to create a valid MAC. Every AH also has a sequence number that is incremented for each datagram that is transmitted, ensuring that messages are not inserted, deleted, or replayed.

Hence, IPsec AH protects messages from tampering, forged addresses, and replay attacks.

Encapsulating Security Payload (ESP)

The Encapsulating Security Payload (ESP) provides the same integrity assurance but also adds encryption to the payload to ensure confidentiality. Data is encrypted with a symmetric cipher (usually AES).

IPsec cryptographic algorithms


An IPsec session begins with authenticating the endpoints. IPsec supports the use of X.509 digital certificates or the use of pre-shared keys. Digital certificates contain the site’s public key and allow us to validate the identity of the certificate if we trust the issuer (the certification authority, or CA). We authenticate by proving that can take a nonce that the other side encrypted with our public key and decrypt it using our private key. A pre-shared key means that both sides configured a static shared secret key ahead of time. We prove that we have the key in a similar manner: one side creates a nonce and asks the other side to encrypt it and send the results. THen the other side does the same thing.

Key exchange

HMAC message authentication codes and encryption algorithms both require the use of secret keys. IPsec uses Diffie-Hellman to create random shared session keys. Diffie-Hellman makes it quick to generate a public-private key pair that is needed to derive a common key ao there is no dependence on long-term keys, assuring forward secrecy.


In IPsec ESP, the payload is encrypted using either AES-CBC or 3DES-CBC. CBC is cipher-block chaining, which has the property that the ciphertext of each datagram is dependent on all previous datagrams, ensuring that datagrams cannot be substituted from old messages.


IPsec uses HMAC, a form of a message authentication code that uses a cryptographic hash function and a shared secret key. It supports either SHA–1 or SHA–2 hash functions.

IPsec Authentication Header mode is rarely used since the overhead of encrypting data these days is quite low and ESP provides both encryption in addition to authentication and integrity.

Transport Layer Security (TLS)

Virtual Private Networks were designed to operate at the network layer. They were designed to connect networks together. Even with transport mode connectivity, they tunnel all IP traffic and do not differentiate one data stream from another. They do not solve the problem of an application needing authenticated, tamper-proof, and encrypted communications to another application.

Secure Sockets Layer (SSL) was created as a layer of software above TCP that provides authentication, integrity, and encrypted communication while preserving the abstraction of a sockets interface to applications. An application sets up an SSL session to a service. After that, it simply sends and receives data over a socket just like it would with the normal sockets-based API that operating systems provide. The programmer does not have to think about network security. As SSL evolved, it morphed into a new version called TLS, Transport Layer Security. While SSL is commonly used in conversation, all current implementations are TLS.

Any TCP-based application that may not have addressed network security can be security-enhanced by simply using TLS. For example, the standard email protocols, SMTP, POP, and IMAP, all have TLS-secured interfaces. Web browsers use HTTP, the Hypertext Transfer Protocol, and also support HTTPS, which is the exact same protocol but uses A TLS connection.

TLS has been designed to provide:

Data encryption
Symmetric cryptography is used to encrypt data.
Data integrity
Ensure that we can detect if data in transit has not been modified. TLS includes a MAC with transmitted data.
TLS provides mechanisms to authenticate the endpoints prior to sending data. Authentication is optional and can be unidirectional (the client may just authenticate the server), unidirectional (each side authenticates the other), or none (in which case we just exchange keys but do not validate identities).
Key exchange
After authentication, TLS performs a key exchange so that both sides can obtain random shared session keys. TLS creates separate keys for each direction of communication (encryption keys for client-to-server and server-to-client data streams) and separate keys for data integrity (MAC keys for client-to-server and server-to-client streams).
Interoperability & evolution
TLS was designed to support many different key exchange, encryption, integrity, & authentication protocols. The start of each session enables the protocol to negotiate what protocols to use for the session.

TLS sub-protocols

These features are implemented in two sub-protocols within TLS:

1. Authentication and key exchange
Authentication uses public key cryptography with X.509 certificates to authenticate a system. Both the client and server can present their X.509 digital certificates. TLS validates the signature of the certificate and uses nonce-based public key authentication to validate that each party has the corresponding private key.

Key exchange supports several options. Ephemeral Diffie-Hellman key exchange is the most common since it supports the efficient generation of shared keys and there is no long-term key storage, providing forward secrecy. TLS can accommodate other key exchange techniques as well, including DIffie-Hellman with static keys, RSA public key-based key exchange, and pre-shared static keys.

2. Communication
Data encryption uses symmetric cryptography and supports a variety of algorithms, including AES GCM, AES CBC, ARIA (GCM/CBC), and ChaCha20. AES is the Advanced Encryption Standard. CBC is cipher block chaining, which makes each ciphertext block a function of the preceding one. GCM is Galois/Counter Mode, an alternative to CBC that encrypts an incrementing counter and exclusive-ors it with a block of plaintext. ARIA is a South Korean standard encryption algorithm that is similar to AES. ChaCha20 is an encryption algorithm that is generally more efficient than AES on low-end processors.

Data integrity is provided by a message authentication code (MAC) that is attached to each block of data. TLS allows the choice of several, including HMAC-MD5, HMAC-SHA1, HMAC-SHA256/384, and Poly1305.

TLS protocol

The steps in a TLS session are:

  1. The client connects to the server and sends information about its version and the ciphers it supports. It is up to the client and server to negotiate for the ones they will use.

  2. The server responds with its X.509 certificate, the protocol version and ciphers it is willing to use, and, possibly, a request for a client certificate.

  3. The client validates the integrity of the server’s certificate by validating the signature of the certificate. If the client trusts the server’s CA, the client can validate the authenticity of the server. Otherwise, the client can simply validate that the server owns the corresponding private key.

  4. The client generates random session keys and sends them to the server via a Diffie-Hellman key exchange.

  5. Optionally, the client responds with its certificate.

  6. If the client responds with its certificate, the server validates the certificate and the client.

  7. The client and server can now exchange data. Each message is first compressed and then encrypted with a symmetric algorithm. An HMAC (hash MAC) for the message is also sent to allow the other side to validate message integrity.

TLS is widely used and generally considered secure if strong cryptography is used. Its biggest problem was a man-in-the-middle attack where the attacker would be able to send a message to renegotiate the protocol and choose one that disables encryption. Another attack was a denial-of-service attack where an attacker initiates a TLS connection but keeps requesting a regeneration of the encryption key, using up the server’s resources in the process. Both of these have been fixed.

Unidirectional vs. mutual authentication

TLS supports mutual authentication. To implement authentication, the server sends the client its X.509 digital certificate so the client can authenticate the server by having the server prove it knows the private key. TLS also supports mutual authentication: the client will send its X.509 certificate to the server so the server can authenticate the client.

One notable aspect of TLS sessions is that, in most cases, only the server will present a certificate. Hence, the server will not authenticate or know the identity of the client. Client-side certificates have been problematic. Generating keys and obtaining trustworthy certificates is not an easy process for users. A user would have to install the certificate and the corresponding private key on every system she uses. This would not be practical for shared systems. Moreover, if a client did have a certificate, any server can request it during TLS connection setup, thus obtaining the identity of the client. This could be desirable for legitimate banking transactions but not for sites where a user would like to remain anonymous. We generally rely on other authentication mechanisms, such as the password authentication protocol, but carry them out over TLS’s secure communication channel.


A firewall protects the junction between an untrusted network (e.g., external Internet) and a trusted network (e.g., internal network). Two approaches to firewalling are packet filtering and proxies. A packet filter, or screening router, determines not only the route of a packet but whether the packet should be dropped based on contents in the IP header, TCP/UDP header, and the interface on which the packet arrived. It is usually implemented inside a border router, also known as the gateway router that manages the flow of traffic between the ISP and internal network. The basic principle of firewalls is to never have a direct inbound connection from the originating host from the Internet to an internal host; all traffic must flow through a firewall and be inspected.

The packet filter evaluates a set of rules to determine whether to drop or accept a packet. This set of rules forms an access control list, often called a chain. Strong security follows a default deny model, where packets are dropped unless some rule in the chain specifically permits them.

First-generation packet filters implemented stateless inspection. A packet is examined on its own with no context based on previously-seen packets.

Second-generation packet filters track TCP connections and other information from previous connections. These stateful packet inspection (SPI) firewalls allow the router to keep track of outstanding TCP connections. For instance:

  • They can block TCP data traffic if a connection setup did not take place to avoid sequence number prediction attacks.

  • They can track that a connection has been established by a client to a remote server and allow return traffic to that client (which is essential for any interaction by someone inside the network with external services).

  • They can track connectionless UDP and ICMP messages and allow responses to be sent back to clients in the internal network. DNS queries and pings (ICMP echo-reply messages) are examples of these.

  • They also and understand the relationship between packets. For example, when a client establishes an FTP (file transfer protocol) connection to a server on port 21, the server establishes a connection back to the client on a different port when it needs to send data.

Packet filters traditionally do not look above the transport layer (UDP and TCP protocols and port numbers). Third-generation packet filters incorporate deep packet inspection (DPI), which allows a firewall to examine application data as well and make decisions based on its contents. Deep packet inspection can validate the protocol of an application as well as check for malicious content such as malformed URLs or other security attacks. DPI is generally considered to be part of Intrusion Prevention Systems. Examples are detecting application-layer protocols such as HTTP and then applying application-specific filters, such as checking for suspicious URLs or disallowing the download of certain ActiveX or Java applets.

Deep Packet Inspection (DPI) firewalls evolved to Deep Content Inspection (DCI) firewalls. These use the same concept but are capable of buffering large chunks of data from multiple packets that contain an entire object and acting on it, such as unpacking base64-encoded content from web and email messages and performing a signature analysis for malware.

Application proxies

An application proxy is software that presents the same protocol to the outside network as the application for which it is a proxy. For example, a mail server proxy will listen on port 25 and understand SMTP, the Simple Mail Transfer Protocol. The primary job of the proxy is to validate the application protocol and thus guard against protocol attacks (extra commands, bad arguments) that may exploit bugs in the service. Valid requests are then regenerated by the proxy to the real application that is running on another server and is not accessible from the outside network.

Application proxies are usually installed on dual-homed hosts. This is a term for a system that has two “homes”, or network interfaces: one for the external network and another for the internal network. Traffic does not pass between the two networks. The proxy is the only one that can communicate with the internal network. Unlike DPI, a proxy may modify the data stream, such as stripping headers or modifying machine names. It may also restructure the commands in the protocol used to communicate with the actual servers (that is, it does not have to relay everything that it receives).


A typical firewalled environment is a screened subnet architecture, with a separate subnet for systems that run externally-accessible services (such as web servers and mail servers) and another one for internal systems that do not offer services and should not be accessed from the outside. The subnet that contains externally-accessible services is called the DMZ (demilitarized zone). The DMZ contains all the hosts that may be offering services to the external network (usually the Internet). Machines on the internal network are not accessible from the Internet. All machines within an organization will be either in the DMZ or in the internal network.

Both subnets will be protected by screening routers. They will ensure that no packet from the outside network is permitted into the inside network. Logically, we can view our setup as containing two screening routers:

  1. The exterior router allows IP packets only to the machines/ports in the DMZ that are offering valid services. It would also reject any packets that are masqueraded to appear to come from the internal network.

  2. The interior router allows packets to only come from designated machines in the DMZ that need to access services in the internal network. Any packets not targeting the appropriate services in the internal network will be rejected. Both routers will generally allow traffic to flow from the internal network to the Internet, although an organization may block certain services (ports) or force users to use a proxy (for web access, for example).

Note that the two screening routers may be easily replaced with a single router since filtering rules can specify interfaces. Each rule can thus state whether an interface is the DMZ, internal network, or Internet (ISP).

Host-based firewalls

Firewalls generally intercept all packets entering or leaving a local area network. A host-based firewall, on the other hand, runs on a user’s computer. Unlike network-based firewalls, a host-based firewall can associate network traffic with individual applications. Its goal is to prevent malware from accessing the network. Only approved applications will be allowed to send or receive network data. Host-based firewalls are particularly useful in light of deperimiterization: the boundaries of external and internal networks are sometimes fuzzy as people connect their mobile devices to different networks and import data on flash drives. A concern with host-based firewalls is that if malware manages to get elevated privileges, it may be able to shut off the firewall or change its rules.

Intrusion detection/prevention systems

An enhancement to screening routers is the use of intrusion detection systems (IDS). Intrusion detection systems are often parts of DPI firewalls and try to identify malicious behavior. There are three forms of IDS:

  1. A protocol-based IDS validates specific network protocols for conformance. For example, it can implement a state machine to ensure that messages are sent in the proper sequence, that only valid commands are sent, and that replies match requests.

  2. A signature-based IDS is similar to a PC-based virus checker. It scans the bits of application data in incoming packets to try to discern if there is evidence of “bad data”, which may include malformed URLs, extra-long strings that may trigger buffer overflows, or bit patterns that match known viruses.

  3. An anomaly-based IDS looks for statistical aberrations in network activity. Instead of having predefined patterns, normal behavior is first measured and used as a baseline. An unexpected use of certain protocols, ports, or even amount of data sent to a specific service may trigger a warning.

Anomaly-based detection implies that we know normal behavior and flag any unusual activity as bad. This is difficult since it is hard to characterize what normal behavior is, particularly since normal behavior can change over time and may exhibit random network accesses (e.g., people web surfing to different places). Too many false positives will annoy administrators and lead them to disregard alarms.

A signature-based system employs misuse-based detection. It knows bad behavior: the rules that define invalid packets or invalid application layer data (e.g., ssh root login attempts). Anything else is considered good.

Intrusion Detection Systems (IDS) monitor traffic entering and leaving the network and report any discovered problems. Intrusion Prevention Systems (IPS) serve the same function but are positioned to sit between two networks like a firewall and can actively block traffic that is considered to be a threat or policy violation.

Type Description
Firewall (screening router) 1st generation packet filter that filters packets between networks. Blocks/accepts traffic based on IP addresses, ports, protocols
Stateful inspection firewall 2nd generation packet filter. Like a screening router but also takes into account TCP connection state and information from previous connections (e.g., related ports for TCP)
Deep Packet Inspection firewall 3rd generation packet filter. Examines application-layer protocols
Application proxy Gateway between two networks for a specific application. Prevents direct connections to the application from outside the network. Responsible for validating the protocol
IDS/IPS Can usually do what a stateful inspection firewall does + examine application-layer data for protocol attacks or malicious content
Host-based firewall Typically screening router with per-application awareness. Sometimes includes anti-virus software for application-layer signature checking
Host-based IPS Typically allows real-time blocking of remote hosts performing suspicious operations (port scanning, ssh logins)

Web security

When the web browser was first created, it was relatively simple: it parsed static content for display and presented it to the user. The content could contain links to other pages. As such, the browser was not an interesting security target. Any dynamic modification of pages was done on servers and all security attacks were focused on those servers. These attacks included malformed URLs, buffer overflows, root paths, and unicode attacks.

The situation is vastly different now. Browsers have become insanely complex:

  • Built-in JavaScript to execute arbitrary downloaded code

  • The Document Object Model (DOM), which allows JavaScript code to change the content and appearance of a web page.

  • XMLHttpRequest, which enables JavaScript to make HTTP requests back to the server and fetch content asynchronously.

  • WebSockets, which provide a more direct link between client and server without the need to send HTTP requests.

  • Multimedia support; HTML5 added direct support for <audio>, <video>~, and <track>~ tags, as well as MediaStream recording of both audio and video and even speech recognition and synthesis (with the Chrome browser, for now).

  • Access to on-device sensors, including geolocation and tilt

  • the NaCl framework on Chrome, providing the Ability to run native apps in a sandbox within the browser

The model evolved from simple page presentation to that of running an application. All these features provide a broader attack surface. The fact that many features are relatively new and more are being developed increases the likelihood of more bugs and therefore more security holes. Many browser features are complex and developers won’t always pay attention to every detail of the specs (see This leads to an environment where certain less-common uses of a feature may have bugs or security holes on certain browsers.

Multiple sources

Traditional software is installed as a single application. The application may use external libraries, but these are linked in by the author and tested. Web apps, on the other hand, dynamically load components from different places. These include fonts, images, scripts, and video as well as embedded iFrames that embed HTML documents within each other. The JavaScript code may issue XMLHttpRequests to yet additional sites.

One security concern is that of software stability. If you import JavaScript from several different places, will your page still display correctly and work properly in the future as those scripts are updated and web standards change? Do those scripts attempt to do anything malicious? Might they be modified by their author to do something malicious in the future?

Then there’s the question of how elements on a page should be allowed to interact. Can some analytics code access JavaScript variables that come from a script downloaded from on the same web page? The scripts came from different places the page author selected them for the page, so maybe it’s ok for them to interact. Can analytics scripts interact with event handlers? If the author wanted to measure mouse movements and keystrokes, perhaps it’s ok for a downloaded script to use the event handler. How about embedded frames? To the user, the content within a frame looks like it is part of the rest of the page. Should scripts work any differently?

Frames and iFrames

A browser window may contain a collection of documents from different sources. Each document is rendered inside a frame. In the most basic case, there is just one frame: the document window. A frame is a rigid division that is part of a frameset, a collection of frames. Frames are not officially supported in HTML, the latest version of HTML, but many browsers still support them. An iFrame is a floating inline frame that moves with the surrounding content. iFrames are supported. When we talk about frames, we will be talking about the frames created with an iFrame tag.

Frames are generally invisible to users and are used to delegate screen area to content from another source. A very basic goal of browser security is to isolate visits to separate pages in distinct windows or tabs. If you visit and in two separate tabs, the address bar will identify each of them and they will not share information. Alternatively, may have frames within it (e.g., to show ads from other sites, so may be a frame within Here, too, we would like the browser to provide isolation between and even though is not visible as a distinct site to the user.

Same-origin policy

The security model used by web browsers is the same-origin policy. A browser permits scripts in one page to access data in a second page only if both pages have the same origin. An origin is defined to be the URI scheme (http vs. https), the hostname, and the port. For example


have the same origin since they both use http, both use port 80 (the default http port since none is specified), and the same hostname ( If any of those components were different, the origin would not be the same. For instance, is not the same hostname as

Under the same-origin policy, each origin has access to common client-side resources that include:

  • Cookies: Key-value data that clients or servers can set. Cookies associated with the origin are sent with each http request.

  • JavaScript namespace: Any functions and variables defined or downloaded into a frame share that frame’s origin.

  • DOM tree: This is the JavaScript definition of the HTML structure of the page.

  • DOM storage: Local key-value storage.

Each frame gets the origin of its URL. Many pages will have just one frame: the browser window. Other pages may embed other frames. Each of those embedded frames will not have the origin of the outer frame but rather the URL of the frame contents. Any JavaScript code downloaded into a frame will execute with the authority of its frame’s origin. For instance, if loads JavaScript from, the script runs with the authority of Passive content, which is non-executable content such as CSS files and images, has no authority. This normally should not matter as passive content does not contain executable code but there have been attacks in the past that had code in passive content and made that passive content turn active.

Cross-origin content

As we saw, it is common for a page to load content from multiple origins. The same-origin policy states that JavaScript code from anywhere runs with the authority of the frame’s origin. Content from other origins is generally not readable or writable by JavaScript.

A frame can load images from other origins but cannot inspect that image. However, it can infer the size of the image by examining the changes to surrounding elements after it is rendered.

A frame may embed CSS (cascading stylesheets) from any origin but cannot inspect the CSS content. However, JavaScript in the frame can discover what the stylesheet does by creating new DOM nodes (e.g., a heading tag) and see how the styling changes.

A frame can load JavaScript, which executes with the authority of the frame’s origin. If the source is downloaded from another origin, it is executable but not readable. However, one can use JavaScript’s toString method to decompile the function and get a string representation of the function’s declaration.

All these restrictions are somewhat ineffective anyway since a curious user can download any of that content directly (e.g., via the curl command) and inspect it.

Cross-Origin Resource Sharing (CORS)

Even though content may be loaded from different origins, browsers restrict cross-origin HTTP requests that are initiated from scripts (e.g., via XMLHttpRequest or Fetch). This can be problematic at times since sites such as and are considered distinct origins, as are and

Cross-Origin Resource Sharing (CORS) was created to allow web servers to specify cross-domain access permission. This will allow scripts on a page to issue HTTP requests to approved sites. It also allows access to Web Fonts, inspectable images, and access to stylesheets. CORS is enabled by an HTTP header that states allowable origins. For example,


means that the URL will be treated as the same origin as the frame’s URL.


Cookies are name-value sets that are designed to maintain state between a web browser and a server. Cookies are sent to the server along with HTTP requests and servers may send back cookies with a response. Uses for cookies include storing a session ID that identifies your browsing session to the server (including a reference to your shopping cart or partially-completed form), storing shopping cart contents directly, or tracking which pages you visited on the site in the past (tracking cookies). Cookies are also used to store authentication information so you can be logged into a page automatically upon visiting it (authentication cookies).

Now the question is: which cookies should be sent to a server when a browser makes an HTTP request? Cookies don’t quite use the same concept of an origin. The scope of a cookie is defined by its domain and path. Unlike origins, the scheme (http or https) is ignored by default, as is the port. The path is the path under the root URL, which is ignored for determining origins but used with cookies. Unless otherwise defined by the server, the default domain and path are those of the frame that made the request.

A client cannot set cookies for a different domain. A server, however, can specify top-level or deeper domains. Setting a cookie for a domain will cause that cookie to be sent whenever or any domain under is accessed (e.g., For the cookie to be accepted by the browser, the domain must include the origin domain of the frame. For example, if you are on the page, your browser will accept a cookie for but will not accept a cookie for

Cookies often contain user names, complete authentication information, or shopping cart contents. If malicious code running on the web page could access those cookies, it could modify your cart, get your login credentials, or even modify cookies related to cloud-based services to have your documents or email get stored to a different account. This is a very real problem and two safeguards were put in place:

A server can tag a cookie with an HttpOnly flag. This will not allow scripts on the page to access the cookie, so it is useful to keep scripts from modifying or reading user identities or session state.

HTTP messages are sent via TCP. Nothing is encrypted. An attacker that has access to the data stream (e.g., a man in the middle or a packet sniffer) can freely read or even modify cookies. A Secure flag was added to cookies to specify that they can be sent only over an HTTPS connection.:

Set-Cookie: username=paul; path=/; HttpOnly; Secure

If a user is making requests via HTTP, Secure cookies will not be transmitted.

Cross-Site Request Forgery (XSRF)

Cross-site request forgery is an attack that sends unauthorized requests from a user that the web server trusts. Let’s consider an example. You previously logged into Netflix. Because of that, the Netflix server sent an authentication cookie to your browser; you will not have to log in the time you visit Now you go to another website that contains a malicious link or JavaScript code to access a URL. The URL is:

By hitting this link on this other website, the attacker added Plan 9 from Outer Space to your movie queue (this attack really worked with Netflix but has been fixed). This may be a minor annoyance but the same attack could create more malicious outcomes. If, instead of Netflix, the attack could take place against an e-commerce site that accepted your credentials but allows the attacker to add a different shipping address on the URL. More dangerously, a banking site may use your stored credentials and account number. Going to the malicious website may enable the attacker to request a funds transfer to another account:*amount=1000000&to_account=417824919

Note that the attack works because of how cookies work. You visited a random website but inadvertently requested another site. Your browser dutifully sends and HTTP GET request to that site with the URL specified in the link and also sends all the cookies for that site. The attacker never steals your cookies and does not intercept any traffic. The attack is simply the creation of a URL that makes it look like you requested some action.

There are several defenses against Cross-site request forgery:

  • The server can validate the Referer header on the request. This will tell it whether the request came via a link or directly from a user (or from a link on a trusted site).

  • The server can require some unique token to be present in the request. For instance, visiting might cause the Netflix server to return a token that will need to be passed to any successive URL. An attacker will not be able to create a static URL on her site that will contain this random token.

  • The interaction with the server can use HTTP POST requests instead GET requests, placing all parameters into the body of the request rather than in the URL. State information can be passed via hidden input fields instead of cookies.


Clickjacking is a deception attack where the attacker overlays an image to have the user think that he is clicking some legitimate link or image but is really requesting something else. For example, a site may present a “win a free iPad” image. However, malicious JavaScript in the page can place an invisible frame over this image that contains a link. Nothing is displayed to obstruct the “win a free iPad” image but when a user clicks on it, the link that is processed is the one in the invisible frame. This malicious link could download malware, change security settings for the Flash plug-in, or redirect the user to a page containing malware or a phishing attack.

A defense for clickjacking is to use defensive JavaScript in the legitimate code to check that the content is at the topmost layer:

window.self ==

If it isn’t then it means the content is obstructed, possibly by an invisible clickjacking attack. Another defense is to have the server send an X-Frame-Options HTTP header to instruct the browser to not allow framing from other domains.

Screen sharing

HTML5, the latest standard for HTML, added a screen-sharing API. Normally, no cross-origin communication is permitted between client and server. The screen-sharing API violates this. If a user grants screen-sharing permission to a frame, the frame can take a screenshot of the entire display (the entire monitor, all windows, and the browser). It can also get screenshots of pages hidden by tabs in a browser.

This is not a security hole and there are no exploits (yet) to enable screen sharing without the user’s explicit opt-in. However, it is a risk because the user might not be aware of the scope or duration of screen sharing. If you believe that you are sharing one browser window, you may be surprised to discover that the server was examining all your screen content.

Input sanitization

In the past we saw how user input that becomes a part of database queries or commands can alter those commands and, in many cases, enable an attacker to add arbitrary queries or commands. The same applies to URLs, HTML source, and JavaScript. Any user input needs to be parsed carefully before it can be made part of a URL, HTML content, or JavaScript. Consider a script that is generated with some in-line data that came from a malicious user:

<script> var x = "untrusted_data"; </script>

The malicious user might define that untrusted_data to be

Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="bye

The resulting script to set the variable x now becomes

<script> var x = "Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x=bye"; </script>

Cross-site scripting

Cross-site Scripting (XSS) is a code injection attack that allows the attacker to inject client-side scripts into web pages. It can be used to bypass the same-origin policy and other access controls. Cross-site scripting has been one of the most popular browser attacks.

The attack may be carried out in two ways: a URL that a user clicks on and gets back a page with the malicious code and by going to a page that contains user content that may include scripts.

In a Reflected XSS attack, all malicious content is in a page request, typically a link that an unsuspecting user will click on. The server will accept the request without sanitizing the user input and present a page in response. This page will include that original content. A common example is a search page that will display the search string before presenting the results (or a “not found” message). Another example is an invalid login request that will return with the name of the user and a “not found” message. Consider a case where the search string or the login name is not just a bunch of characters but text to a script. The server treats it as a string, does the query, cannot find the result, and sends back a page that contains that string, which is now processed as inline JavaScript code.<script>malicious_code(…) </script>

In a Persistent XSS attack, user input is stored at a site and later presented to other users. Consider online forums or comment sections for news postings and blogs. If you enter inline JavaScript as a comment, it will be placed into the page that the server constructs for any future people who view the article. The victim will not even have to click a link to run the malicious payload.

Cross-site scripting is a problem of input sanitization. Servers will need to parse input that is expected to be a string to ensure that it does not contain embedded HTML or JavaScript. The problem is more difficult with HTML because of its support for encoded characters. A parser will need to check not only for “script” but also for “%3cscript%3e”. As we saw earlier, there may be several acceptable Unicode encodings for the same character.

Cross-site scripting, by executing arbitrary JavaScript code can:

  • Access cookies related to that website
  • Hijack a session
  • Create arbitrary HTTP requests with arbitrary content via XMLHtttpRequest
  • Make arbitrary modifications to the HTML document by modifying the DOM
  • Install keyloggers
  • Download malware – or run JavaScript ransomware
  • Try phishing by manipulating the DOM and adding a fake login page

The main defense against cross-site scripting is to sanitize all input. Some web frameworks do this automatically. For instance, Django templates allow the author to specify where generated-content is inserted (e.g., <b> hello, {{name}} </b>) and performs the necessary sanitization to ensure it does not modify the HTML or add JavaScript.

Other defenses are:

  • Use a less-expressive markup language for user input, such as markdown if you want to give users the ability to enter rich text. However, input sanitization is still needed to ensure there are no HTML or JavaScript escapes

  • Employ a form of privilege separation by placing untrusted content inside a frame with a different origin. For example, user comments may be placed in a separate domain. This does not stop XSS damage but limits it to the domain.

  • Use the Content Security Policy (CSP). The content security policy was designed to defend agains XSS and clickjacking attacks. It allows website owners to tell clients what content is allowed, whether inline code is permitted, and whether the origin should be redefined to be unique.

SQL injection

We previously saw that SQL injection is an issue in any software that uses user input as part of the the SQL query. The same applies to browsers. Many web services have databases behind them and links often contain queries mixed with user input. If input is not properly sanitized, it can alter the SQL query to modify the database, force a user authentication, or return the wrong data.

GIFAR attack

The GIFAR attack is a way to embed malicious code into an image file. Sites that allow user-uploadable pictures are vulnerable. GIFAR is a pseudo-concatenation of GIF and JAR.

Java applets are sent as JAR files. A Java JAR file is essentially a zip file, a popular format for compressing and archiving multiple files. In Jar files, the header that contains information about the content is stored at the end of the file.

GIF files are lossless image files. The header in GIF files, as with most other file formats, is stored at the beginning of the file.

GIF and JAR files can be combined together to create a GIFAR file. Because the GIF header is at the beginning of the file, the browser believes it is an image and opens it as such, trusts its content, unaware that it contains code. Meanwhile the Java virtual machine (JVM) recognizes the JAR part of the file, which is run as an applet in the victim’s browser.

An attacker can use cross-site scripting to inject a request to invoke the applet (<applet archive:"myimage.gif">), which will cause it to run in the context of the origin (the server that hosted it). Because the code is run as a Java applet rather than JavaScript, it bypasses the “no authority” restriction the browser imposes on JavaScript in images.

HTML image tag vulnerability

We saw that the same-origin policy treats images as static content with no authority. It would seem that images should not cause problems (ignoring the now-patched GIFAR vulnerability that allowed images to inject Java applets). However, an image tag (IMG) can pass parameters to the server, just like any other URL:

<img src="" height="300" width="400"/>

This can be used to notify the server that the image was requested from specific content. The web server will also know the IP address that sent the request. The image itself can be practically hidden by setting its size to a single pixel:

<img src="..." height="1" width="1" />

This is sometimes done to track messages sent to user. If I send you HTML-formatted mail that contains a one-pixel image, you will not notice the image but my server will be notified that the image was downloaded. If the IMG tag contained some text to identify that this is related to the mail message I sent you, I will now know that you read the message.

Images can also be used for social engineering: to disguise a site by appropriating logos from well-known brands or adding certification logos.

Mixed HTTP and HTTPS content

A web page that was served via HTTPS might contain a reference to a URL, such as a script, that specifies HTTP content:

<script src=""> </script>

The browser would follow the scheme in the URL and download that content via HTTP rather than over the secure link. An active network attacker can hijack that session and modify the content. A safer approach is to not specify the scheme for scripts, which would cause them to be served over the same protocol as their embedding frame.

<script src="//"> </script>

Some browsers give warning of mixed content but the risks and knowledge of what really is going on might not be clear to users.

Extended Validation Certificates

TLS establishes a secure communication link between a client and server. For the authentication to be meaningful, the user must be convinced that the server’s X.509 certificate truly belongs to the entity that is identified in the certificate. Would you trust a certificate issued the Rubber Ducky Cert Shack? Even legitimate issuers such as Symantec offer varying levels of validating a certificate owner’s identity.

The lowest level of identity assurance for organizations is a domain validated certificate. To validate the user, the certificate authority will validate that some contact at that domain approves the request. This is usually done through email. It does not prove that the user has legal authority to act on behalf of the company nor is there any validation of the company. They require consent of the domain owner but do not try to validate who that owner is. They offer only incrementally more identity binding than self-signed certificates.

With extended validation (EV) certificates, the certificate authority uses a more rigorous, human-driven validation process. The legal and physical presence of the organization is validated. Then, the organization is contacted through a verified phone number and both the contact and the contact’s supervisor must confirm the authenticity of the request.

An extended validation certificate contains the usual data in a certificate (public key, issuer, organization, …) but must also contain a government-registered serial number and a physical address of the organization.

Domain validated certificate
Domain validated certificate
Extended validation certificate
Extended validation certificate

An attacker could get a low-level certificate and set up a web site. Targets would go to it, see the lock icon on their browser’s address bar that indicates an SSL connection, and feel secure. This led users to a false sense of security: the connection is encrypted but there is no reason to believe that there is validity to the organization on the other side.

Modern browsers identify and validate EV certificates. Once validated, the browser presents an enhanced security indicator that identifies the certificate owner.

Browser status bar

Most browsers offer an option to display a status bar that shows the URL of a link before you click it. This bar is trivial to spoof by adding an onclick attribute to the link that invokes JavaScript to take the page to a different link. In this example, hovering over the PayPal link will show a link to, which appears to be a legitimate PayPal login page. Clicking on that link, however, will take the user to

<a href=""
    onclick="this.href = '';">PayPal</a>

Mobile device security

What makes mobile devices unique?

In many ways, mobile devices should not be different from laptops or other computer systems. They run operating systems that are derived from those those systems, run multiple apps, and connect to the network. There are differences, however, that make them more attractive targets to attackers.


Several user factors make phones different from most computing devices:

  • Mobile users often do not think of their phones as real computers. They may not have the same level of paranoia that malware may get in or their activities may be monitored.

  • Users tend to install a lot more apps on their phones than they do on their computers. These apps are more likely to be from unknown software vendors than those installed on computers.

  • Social engineering may work more easily on phones. People are often in distracted environments when using their phones and may not pay attention to realize they are experiencing a phishing attack.

  • Phones are small. Users may be less likely to notice some security indicators, such as an EV certificate indicator. It is also easier to lose the phone … or have it stolen.

  • A lot of phones are protected with bad PINs. Four-digit PINs still dominate and, as with passwords, people tend to pick bad ones – or at least common ones. In fact, four PINs (1234, 1111, 0000, 1212, 7777) account for over 20% of PINs chosen by users.

  • While phones have safeguards to protect resources that apps can access users may grant app permission requests without thinking: they will just click through during installation to get the app up and running.


Phones have many sensors built into them: GSM, Wi-Fi, Bluetooth, and NFC radios as well as a GPS, microphone, camera. 6-axis gyroscope and accelerometer, and even barometer. These sensors can enable attackers to monitor the world around you: identify where you are and whether you are moving. They can record conversations and even capture video. The sensors are so sensitive that it has been demonstrated that a phone on a desk next to a keyboard can pick up vibrations from a user typing on the neighboring keyboard. This led to a word recovery rate of 80%.


There are a lot of mobile apps. Currently, there are about 2.6 million Android apps and 2.2 million iOS apps. Most of these apps are written by unknown, and hence untrusted, parties. We would be be wary of downloading many of these on our PCs but think nothing of doing so on our phones. We place our trust in several areas:

  • The testing & approval process by Google (automated) and Apple (automated + manual)
  • The ability of the operating system to sandbox an application
  • The operating system’s requirement of users granting permissions to access certain resources.

This trust may be misplaced as the approval process is far from foolproof. Overtly misadvertised or malicious apps can be detected but it is impossible to discern what a program will do in the future. Sandboxes have been broken in the past and users may be too happy to grant permissions to apps. Moreover, apps often ask for more permissions than they use. For example, a security researcher surveyed flashlight apps available for Android and discovered that, of the 937 apps surveyed, the majority requested an average of 25 permissions per app.

Most apps do not get security updates. There is little economic incentive for a developer to support existing apps, particularly if newer ones have been deployed.


Mobile phones are comparable to desktop systems in complexity. In some cases, they may even be more complex. This points to the fact that, like all large systems, they will have bugs and some of these bugs will be security sensitive. For instance, in late March, 2017, Apple released an upgrade for iOS, stating that they fixed over 80 security flaws. This is almost 10 years after the release of the iPhone. You can be certain there are many more flaws lurking in the system and more will be added as new features are introduced.

Because of bugs in the system, malicious apps may be able to get root privileges. If they do, they can install rootkits, enabling long-term control while concealing their presence. A lot of malicious iOS apps, for instance, gained root privileges by exploiting heap overflow vulnerabilities.

Unlike desktop systems and laptops, phones enforce a single user environment. Although PCs are usually used as single-user systems, they support multiple user accounts and run a general-purpose timesharing operating system. Mobile devices are more carefully tuned to the single-user environment.


Mobile devices are are threats to personal privacy as well as at risk of traditional security violations. Personal privacy threats include identifying users and user location, accessing the camera and microphone, and leaking personal data from the phone over the network. Additional threats include traditional phishing attacks, malware installation, malicious Android intents (messages to other apps or services), and overly-broad access to system resources and sensors.

Android security

Android was conceived as an operating system for network-connected smart mobile devices. The company was acquired by Google in 2005 and the engineering effort shifted to developing it as a Linux-based operating system platform that would be provided for free to third-party phone manufacturers. Google would make money from services and apps.

Applications on Android had specific needs, some of which have not been priorities in the design of desktop operating systems. These include:

The platform should ensure that the app is not modified between the time of its creation and its installation by users.
Each app needs private data and app components as well as the private data need to be protected from other apps.
Apps may need access to shared storage and devices, including the network. This includes the on-device file system, external storage, communication interfaces, and the various sensors available on phones and other smart devices.
Inter-app services
An app needs to be able to send messages to other apps to interact with services – but only when it is permitted to do so. This affects the design of apps. Desktop apps have a single launching point and generally run as single, monolithic process. Android apps, on the other hand, contain multiple independent components that include activities (user-facing parts), services (background components), and content providers (data access components).
The previous needs are general and apply to other mobile platforms, such as iOS. Since Android was designed as an operating system for a variety of hardware, apps should be able to run on different hardware architectures. A decision was made that apps should be distributed in a portable, architecture-neutral format. Apps run under the ART (Android Runtime) virtual machine, which is a variant of of the Java virtual machine (JVM). The original intention in Android was that apps would be written only in Java but it soon became clear that support for native (C and C++) code was needed and Google introduced the Native Development Kit to support this.

App package

An Android app is distributed as a package in an APK format, which is a single zip-format compressed file that contains several components:

Code for the user-visible component - the interface. The code component includes compiled code and needed resource files, such as strings, images, UI layouts, and data.
Code background component - services that the app may offer to other programs
Content provider
A database for whatever persistent data the app needs to store. It implements APIs to provide access to structured data. For instance, user contacts will be stored in a content provider.
Broadcast receiver
Mailbox for received messages
Package manifest (META-INF)
This contains items needed to validate the origin of the package and that it has not been tampered: - Signed list of hashes - Application creator’s certificate
Application manifest
This enumerates what makes up the application and includes: - Name (e.g., com.example.myapp) - Components list - Device requirements - Intents: interfaces the app supports to activate services & activities - Permissions: access to services this app requires (e.g., android.permission.SEND_SMS) - Permissions other apps need to access services this app provides

App Integrity

All Android apps must be signed by the developer. This allows both developers and users to know that the application will be installed without modifications on their device. On installation, the Android Package Manager verifies the signature.

Applications do not have to be signed by any central authorities. Developers can use a self-signed certificate and Android does not perform verification of CAs (certification authorities) in the certificate.

Prior to distribution, the contents of the APK package are hashed and signed by the developer and then the signature along with the developer’s certificate is inserted into the APK.

The app package is verified prior to installation by creating a hash and validating it against the signature in the package. If the app is distributed through the Google Play store, Google performs the same checks.

For additional protection, Google optionally supports a feature called Google Play Protect. This validates the app before it is downloaded and checks the user’s device for potential malware. It warns the user about malware or any apps that violate Google’s Unwanted Software Policy, such as apps that hide or misrepresent information.

The Android app sandbox

Android relies on process sandboxing for most of its security. Android is based on Linux, which is a multi-user operating system. Under Linux, each user has a distinct user ID and all apps run by that user run with the privileges of the user (ignoring setUID apps). This allows any one app full access to all user data.

User IDs

Android supports only a single user and uses Linux user IDs for isolating app privileges. Under Android, each app normally runs under a different user ID. Hence, apps are isolated and can only access their resources. Access requests to other objects involve messages that pass through a gatekeeper, which validates access requests.

Core Android services also run in a similar manner, with each service running under its own unique user ID. For example:

user id service
1001 Telephony
1002 Bluetooth
1003 Graphics
1004 Input devices
1005 Audio

Related apps may share the same Linux user ID if a sharedUserID attribute is set to the same domain for two or more applications as long as those apps are also signed by the same certificate. This would allow these related apps to share files and they can be configured to even share the same Dalvik virtual machine.

File permissions

Two mechanisms are used to enforce file access permissions:

  1. Linux file permissions These provide discretionary access control, allowing the owner (and root) to change permissions to allow others access to the files. With this mechanism, an app can decide to share a data file.

  2. SELinux mandatory access control Certain data and cache directories in Android are protected with the SELinux (Security-Enhanced Linux) mandatory access control (MAC) kernel extension. This ensures that even the owner cannot change access permissions for the files.

Internal storage provides a per-app private directory for files used by each application. External storage (e.g., attached microSD cards or USB devices) is shared among all apps and, of course, may be moved to other computers.


Android apps communicate with system services, between app components, and with other apps via intents. Intents are messaging objects. An intent is a message that contains: - requested action - data being sent to the action - the component and app that should handle the intent

Intents are declarations of app capabilities and the messaging format used to communicate with an app. They identify app components and how those components are started (e.g., foreground or background and what the entry points are).

Intents allow an app to:

  • Start a service (background task)
  • Start an activity (start a user-facing foreground task, such as a camera or phone)
  • Deliver notifications to one or more apps (broadcasts)

An app lists its available intents in its app manifest and these intents are registered when the app is installed. The intents form the list of services that the app exposes to other applications. If several apps register the same intent, the user selects which app should be launched. For example, you may have multiple browsers installed and are asked to pick the one that should be associated with implicit intents to display a URL.

Intents may explicit or implicit. With explicit intents, the app identifies the target component in the intent. That is, the intent is sent to a specific, named app. With implicit intents, the app asks Android to find a component based on the type of data being sent. For example, sending a URL to display a web page can be an explicit intent that will cause Android to open the default web browser.

Intents are used to invoke system services as well as services available on any installed apps. Common examples of intents are: add a calendar event, set an alarm, take a photo & return it, view a contact, add a contact, show a location on a map, retrieve a file, or initiate a phone call.

Intents pass through & are validated by Android’s gatekeeper.


An app manifest also contains permissions, which can specify which apps or services are allowed access to the app’s services and whether a user needs to be prompted to grant permission. Permissions determine whether one app is allowed to access another app’s component.

Apps need permissions to access any services, which include:

  • System resources: logs, battery levels, …
  • System interfaces: Internet, Bluetooth, send SMS, send email, …
  • Sensitive data: SMS messages, contacts, email, …
  • Any app-defined services

Every service, whether a normal app or a system service is assigned a protection level that determines who may be able to access it:

Permission Type
Normal this is the default; there is no danger to the users or system if this service is accessed
Dangerous Access can compromise the system or privacy. The user has to approve access during installation or runtime
Signature Access granted only if the app signed by the same developer & contains the same certificate. This allows related apps to share services (e.g., Microsoft Office apps)
SignatureOrSystem Similar to signature_ but access will be granted if a system application is requesting it

The application manifest file defines the type of permission and the service that is associated with permission name.

Permissions are managed in two forms:

Permission text strings
These are enforced by Android middleware. Sensitive resources such as the phone are only accessible via APIs and access is mediated through these APIs.
Linux group IDs
Group permissions are enforced by Linux file access checks. For efficiency, networking and file access operations do not go through APIs but directly to Linux. This includes access to Bluetooth, Wi-Fi, and external storage. To be able to access resources, the app needs to be a member of the group that corresponds to the resource. Android dynamically adds user IDs to various groups based on what permissions are granted to them.

Other protections

The Linux operating system provides per-process memory isolation and address space layout randomization (ASLR). Linux also uses no-execute (NX) protection on stack and heap memory pages if the processor supports it

The Java compiler provides provides stack canaries, and its memory management libraries provide some heap overflow protections (checks of backward & forward pointers in dynamically allocated structures).

Android supports whole disk encryption so that if a device is stolen, an attacker will not be able to easily recover file contents even with raw access to the flash file system.

Unlike iOS, Android supports the concurrent execution of multiple apps. It is up to the developer to think about being frugal with battery life. Apps store state their state in persistent memory so they can be stopped and restarted at any time. This ability to stop an app also helps with DoS attacks as the app is not accepting requests or using system resources.

Security concerns

An app can probe whether another app has specific permissions by specifying a permission with an intent method call to that app. This can help an attacker identify a target app. Receivers need to be able to handle malicious intents, even for actions they do not expect to handle and for data that might not make sense for the action.

Apps may also exploit permissions re-delegation. An app, not having a certain permission, may be able gain those privileges by communicating through another app. If a public component does not explicitly have an access permission listed in its manifest definition, Android permits any app to access it. For example, the Power Control Widget (a default Android widget) allows third-party apps to change protected system settings without requesting permissions to control those settings. This is done by presenting the user with a pop-up interface to control power-related settings. A malicious app can send a fake intent to the Power Control Widget while simulating the pressure of the widget button to switch power-related settings. It is effectively simulating a user’s actions on the screen.

By using external storage, apps can exercise permissions avoidance. By default, all apps have access to external storage. Many apps store data in external storage without specifying any protection, allowing other apps to access that data.

Another way permissions avoidance is used is that Android intents allow opening some system apps without requiring permission to do so. These apps include the camera, SMS, contact list, and browser. For instance, opening a browser via an intent can be dangerous since it enables data transmission, receiving remote commands, and even downloading files without user intervention.

iOS security

App signing

iOS requires mandatory code signing. Unlike Android, which accepts self-signed certificates, the app package must be signed using an Apple Developer certificate and apps are only available for This does not ensure trustworthiness of an app but identifies the registered developer and ensures that the app has not been modified after it has been signed.

Runtime protection

Apple’s iOS provides runtime protection via OS-level sandboxing. System resources and the kernel are shielded from user apps. The sandbox also limits which system calls an app can call Except through kernel exploits, an app cannot leave its sandbox.

The app sandbox restricts the ability of one app to access another app’s data and resources. Each app has its own sandbox directory. The OS enforces the sandbox and permits access only to files within that directory, as well as restricted access to to system preferences, the network, and other resources.

Inter-app communication can take place only through iOS APIs. Code generation by an app is prevented because data memory pages cannot be made executable and executable memory pages are not writable by user processes.

Data protection

All file contents are encrypted with a unique 256-bit AES per-file key, which is generated when the file is created.

This per-file key is encrypted with a class key and is stored along with the file’s metadata, which is the part of the file system that describes attributes of the file, such as size, modification time, and access permissions.

The class key is generated from a hardware key in the device and the user’s passcode. Unless the passcode is entered, the class key cannot be created and the file key cannot be decrypted.

The file system’s metadata is also encrypted. A file system key is used for this, which is derived directly from the hardware key, which is generated when iOS is installed. Keys are stored in Apple’s Secure Enclave, a separate processor and isolated memory that cannot be accessed directly by the main processor. Encrypting metadata encrypts the entire structure of the file system. Someone who rips out the flash memory from an iOS device and examines it will not be able to see neither file contents (they are encrypted with per-file keys) nor information about those files (the metadata is encrypted with a file system key).

A hardware AES engine encrypts and decrypts the file as it is written/read on flash memory so file encryption is done transparently and efficiently.

The iOS kernel partition is mounted read-only, so even if an app manages to break out of its sandbox due to some vulnerability and gain root access, it will still not have permission to modify the kernel.


The iOS sandbox restricts apps from accessing files stored by other apps or making changes to device settings. Each app is given a unique home directory for its files. System files and resources are shielded from the user’s apps

Unlike Android, where each app is assigned a unique user ID, apps under iOS run as a non-privileged user “mobile.”

The iOS framework grants entitlements to apps. These are digitally-signed key-value pairs that are granted to an app to allow access to specific services. It is essentially a capability. If you have an entitlement, then you can access a service.

Kernel protection

In addition to the sandbox, iOS also uses address space layout randomization (ASLR) and memory execute protection for stack and heap pages via ARM’s Execute Never (XN) memory page flag.

Masque attacks

While Apple normally expects users to install apps only from its App Store, users need to be able to deploy pre-production apps to friendly parties for testing and enterprises may need to deploy in-house apps to their employees. Apple supports a Developer Enterprise Program to create and distribute such in-house apps. This mechanism has been used to replace existing apps with private versions. The vulnerability has been patched.

iOS has been hit several times with masque attacks. While there have been various forms of these, the basic attack is to get users to install malicious apps that have been created with the same bundle identifier as some exiting legitimate app. This malicious app replaces the legitimate app and masquerades as that app. Since Apple will not host an app with a duplicate bundle identifier, the installation of these apps has to bypass the App Store. Enterprise provisioning is used to get users to install this. which typically requires the user going to a URL that redirects the user to an XML manifest file hosted on a server. The ability to launch this attack is somewhat limited as the user will generally need to have an enterprise certificate installed to make the installation seamless.

Web apps

Both iOS and Android have full web browsers that can be used to access web applications. They also permit web apps to appear as a regular app icon. The risks here are the same as those for web browsers in general: loading untrusted content and leaking cookies and URLs to foreign apps.

Mobile-focused web-based attacks can take advantage of the sensors on phones. The HTML5 Geolocation API allows JavaScript to find your location. A Use Current Location permission dialog appears, so the attacker has to hope the user will approve but there the attacker can provide incentives via a Trojan horse approach: provide a service that may legitimately need your location.

Recently, a proof of concept web attack showed how JavaScript could access the phone’s accelerometers to detect movements of the phone as a user enters a PIN. The team that implemented this achieved a 100% success rate of recognizing a four-digit PIN within five attempts of a user entering it. Apple patched this specific vulnerability but there may be more undiscovered ones.

Hardware support for security

ARM TrustZone worlds
ARM TrustZone worlds

All Android and iOS phones currently use ARM processors. ARM provides a dedicated security module, called TrustZone, that coexists with the normal processor. The hardware is separated into two “worlds”: secure (trusted) and non-secure (non-trusted) worlds. Any software resides in only one of these two worlds and the processor executes in only one world at a time.

Each of these worlds has its own operating system and applications. Android systems run an operating system called Trusty TEE in the secure world and, of course, Linux in the untrusted world.

Logically, you can think of the two worlds as two distinct processors, each running their own operating system with their own data and their own memory. Non-secure applications cannot access any own memory or registers of secure resources directly. The only way they can communicate is through a messaging API.

In practice, the hardware creates two virtual cores for each CPU core, managing separate registers and all processing state in each world.

The phone’s operating system and all applications reside in the non-trusted world. Secure components, such as cryptographic keys, signature services, encryption services, and payment services live in the trusted world. Even the operating system kernel does not have access to any of the code or data in the trusted world. Hence, even if an app manages a privilege escalation attack and gains root access, it will be unable to access certain security-critical data.

Applications for the trusted world include key management, secure boot, digital rights management, secure payment processing, mobile payments, and biometric authentication.

Apple Secure Enclave

Apple uses modified ARM processors for iPhones and iPads. In 2013, they announced Secure Enclave for their processors. The details are confidential but it appears to be similar in function to ARM’s TrustZone but designed as a physically separate coprocessor. As with TrustZone, the Secure Enclave coprocessor runs its own operating system (a modified L4 microkernel in this case).

The processor has its own secure bootloader and custom software update mechanism. It uses encrypted memory so that anything outside the Secure Enclave cannot access its data. It provides:

  • All cryptographic operations for data protection & key management.
  • Random number generation.
  • Secure key store, including Touch ID (fingerprint) and the Face ID neural network.
  • Data storage for payment processing.

The Secure Enclave maintains the confidentiality and integrity of data even if the iOS kernel has been compromised.

Steganography and Watermarking

Cryptography’s goal is to hide the contents of a message. Steganography’s goal is to hide the very existence of the message. Classic techniques included the use of invisible ink, writing a message on one’s head and allowing the hair to cover it, microdots, and carefully-clipped newspaper articles that together communicate the message.

A null cipher is one where the actual message is hidden among irrelevant data. For example, the message may comprise the first letter of each word (or each sentence, or every second letter, etc.). Chaffing and winnowing entails the transmission of a bunch of messages, of which only certain ones are legitimate. Each message is signed with a key known only to trusted parties (e.g., a MAC). Intruders can see the messages but can’t validate the signatures to distinguish the valid messages from the bogus ones.

Messages can be embedded into images. There are a couple of ways of hiding a message in an image:

  1. A straightforward method to hide a message in an image is to use low-order bits of an image, where the user is unlikely to notice slight changes in color. An image is a collection of RGB pixels. You can mess around with the least-significant bits and nobody will notice changes in the image, so you can just encode the entire message by spreading the bits of the message among the least-significant bits of the image.

  2. You can do a similar thing but apply a frequency domain transformation, like JPEG compression does, by using a Discrete Cosine Transform (DCT). The frequency domain maps the image as a collection ranging from high-frequency areas (e.g., “noisy” parts such as leaves, grass, and edges of things) through low-frequency areas (e.g., a clear blue sky). Changes to high frequency areas will mostly be unnoticed by humans: that’s why jpeg compression works. It also means that you can add your message into those areas and then transform it back to the spatial domain. Now your message is spread throughout the higher-frequency parts of the image and can be extracted if you do the DCT again and know where to look for the message.

Many laser printers embed a serial number and date simply by printing very faint color splotches.

Steganography is closely related to watermarking. and the terms “steganography” and “watermarking” are often used interchangeably.

The primary goal of watermarking is to create an indelible imprint on a message such that an intruder cannot remove or replace the message. It is often used to assert ownership, authenticity, or encode DRM rules. The message may be, but does not have to be, invisible.

The goal of steganography is to allow primarily one-to-one communication while hiding the existence of a message. An intruder – someone who does not know what to look for – cannot even detect the message in the data.



SQL Injection, The Open Web Application Security Project, April 10, 2016.

SQL Injection, Acunetix.

Simson Garfinkel & Gene Spafford, Section 11.5, Protecting Yourself, Practical UNIX & Internet Security, Second Edition, April 1996. Discusses shell attacks.

Directory traversal attack, Wikipedia.

Why does Directory traversal attack %C0%AF work?, Information Security Stack Exchange, September 9, 2016

Tom Rodriquez, What are unicode vulnerabilities on Internet Information Server (IIS)?, SANS.

The Unicode Consortium.

IDN homograph attack, Wikipedia.

Time of check to time of use, Wikipedia.

Michael Cobb, How to mitigate the risk of a TOCTTOU attack, TeachTarget, August 2011.

Ernst & Yount LLP Security & Technology Solutions, Using Attack Surface Area And Relative Attack Surface Quotient To Identify Attackability. Customer Information Paper.

Michael Howard, Back to the Future: Attack Surface Analysis and Reduction, Microsoft Secure Blog, February 14, 2011.

Olivier Sessink, Jailkit, November 18, 2015.


Evan Sarmiento, Chapter 4. The Jail Subsystem, FreeBSD Architecture Handbook, The FreeBSD Documentation Project. 2001, Last modified: 2016–10–29.

Matteo Riondato, Chapter 14. System Administration: Jails, FreeBSD Architecture Handbook, The FreeBSD Documentation Project. 2001, Last modified: 2016–10–29.

Chapter 1. Introduction to Control Groups, Red Hat Enterprise Linux 6.8 Resource Management Guide.

José Manuel Ortega, Everything you need to know about Containers Security, OSDEM 2018 presentation.

Johan De Gelas, Hardware Virtualization: the Nuts and Bolts, AnandTech article, March 17 2008.