Exam 3 study guide
The one-hour study guide for exam 3
Latest update: Wed May 20 13:43:00 EDT 2020
Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the one hour time window in the title literally.
Bitcoin & Blockchain
Bitcoin was introduced anonymously in 2009 by a person or group named Satoshi Nakamoto and is considered to be the first blockchain-based cryptocurrency. Bitcoin was designed as an open, distributed, public system: there is no authoritative entity and anyone can participate in operating the servers.
Traditional payment systems rely on banks to serve as a trusted third party. If Alice pays $500 to Charles, the bank, acting as a trusted third party, deducts $500 from Alice’s account and adds $500 to Charles’ account. Beyond auditing, there is no need to maintain a log of all transactions; we simply care about account sums. With a centralized system, all trust resides in this trusted third party. The system fails if the bank disappears, the banker makes a mistake, or if the banker is corrupt.
With Bitcoin, the goal was to create a completely decentralized, distributed system that allows people to manage transactions while preventing opportunities for fraud.
The ledger and the Bitcoin network
Bitcoin maintains a complete list of every single transaction since its creation in January 2009. This list of transactions is called the ledger and is stored in a structure called a blockchain. Complete copies of the ledger are replicated at Bitcoin nodes around the world. There is no concept of a master node or master copies of the ledger. All of these systems run the same software. New systems get the names of some well-known nodes when they download the software, and a DNS query on those names returns their IP addresses. After connecting to one or more nodes, a Bitcoin node will ask each for a list of known Bitcoin nodes. This creates a peer discovery process that allows a node to get a complete list of other nodes in the network.
User identities and addresses
We know how to create unforgeable messages: just sign them. If Alice wants to transfer $500 to Charles, she can create a transaction record that describes this transfer and sign it with her private key (e.g., use a digital signature algorithm or create a hash of the transaction and encrypt it with her private key). Bitcoin uses public-private key pairs and digital signatures to sign transactions.
Bitcoin transactions — the movement of bitcoins from one account to another — are associated with these public keys and not with users. Users are anonymous. Your identity is your public key, and you use this identity by proving that you have the corresponding private key.
There is never any association of your public key with your name. In fact, nothing stops you from creating multiple keys and having multiple identities. The system does not care, or know, what your physical identity is or how many addresses you assigned to yourself. All that matters is that only you have the private keys corresponding to the public keys identified in your transactions, so you are the only one who could have created valid signatures for your transactions.
If you can create transactions on behalf of a specific public key, it means that you own the corresponding private key. If you lose that private key, then you can no longer create transactions and hence cannot access your Bitcoin funds. There is nobody to call to recover a lost key since you are solely responsible for storing it!
In its initial deployment, your public key was your Bitcoin identity. If someone wanted to transfer money to you, they would create a transaction where your public key is identified as the recipient of the bitcoin. Bitcoin now identifies recipients by their Bitcoin address. Your Bitcoin address is essentially a hash of your public key (and you will have several addresses if you have several keys). The details of creating an address are a bit cumbersome:
1. Generate an ECDSA (Elliptic Curve Digital Signature Algorithm) public, private key pair. This serves as your identity and signing key.
2. Create a SHA–256 hash of the public key.
3. Perform a RIPEMD–160 hash on that result.
4. Add a version byte in front of the result.
5. Perform a SHA–256 hash on the result … and another SHA–256 hash on that result.
6. The first four bytes of the result are the address checksum. Append these four bytes to the end of the versioned RIPEMD–160 hash from step 4.
7. Convert the bytes to a base–58 string using Base58Check encoding to create a printable string. The encoding avoids easily-confused characters, and the checksum makes it possible to detect a mistyped address. This string is your bitcoin address.
When all is said and done, the address is essentially just a hash of your public key. Since a hash is a one-way function, someone can create (or verify) your address if they are presented with your public key. However, they cannot derive your public key if they have your address.
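The whole derivation fits in a short Python sketch. This is illustrative only: it assumes the ECDSA public key bytes are already in hand, and hashlib's ripemd160 support depends on the local OpenSSL build.

```python
import hashlib

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    s = ""
    while n > 0:
        n, r = divmod(n, 58)
        s = BASE58[r] + s
    # each leading zero byte is represented by the character '1'
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + s

def bitcoin_address(public_key: bytes) -> str:
    sha = hashlib.sha256(public_key).digest()                  # step 2
    ripe = hashlib.new("ripemd160", sha).digest()              # step 3 (needs OpenSSL ripemd160)
    versioned = b"\x00" + ripe                                 # step 4: version byte
    check = hashlib.sha256(hashlib.sha256(versioned).digest()).digest()[:4]  # steps 5-6
    return base58check(versioned + check)                      # step 7

print(bitcoin_address(b"\x02" + b"\x11" * 32))                 # fake 33-byte public key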
Bitcoin uses addresses only as destinations; an address can only receive funds. If Bob wants to send bitcoin to Alice, he will identify Alice as the output – the target of the money – by using her address. At some point in the future, Alice can use that money by creating a transaction whose source (input) refers to the transaction where she received the bitcoin. Any bitcoin node can validate this transaction:
Alice’s transaction will identify where the money comes from (inputs). Each input contains a reference to a past transaction, her public key, and her signature.
A node can validate the signature by using Alice’s public key, which is also a field of the input. This proves that someone who owns the private key (Alice) that corresponds to that public key created the transaction.
That transaction input references an older transaction whose output identifies Alice’s address as the recipient of the bitcoin. Given only the address, we could not derive the public key, but the input supplies both.
A bitcoin node can hash Alice’s public key (from the input) to create the address and see that it is the same as in the referenced old transaction (the output). That way, it validates that the older transaction indeed gives money to Alice.
User transactions (moving coins)
A transaction contains inputs and outputs. Inputs identify where the bitcoin comes from and outputs identify to whom it is being transferred. If Alice wants to send bitcoin to Bob, she creates a message that is a bitcoin transaction and sends it to one or more bitcoin nodes. When a node receives a message, it will forward the transaction to its peers (other nodes it knows about). Typically, within approximately five seconds, every bitcoin node on the network will have a copy of the transaction and can process it.
The bitcoin network is not a database. It is built around a ledger, the list of all transactions. There are no user accounts that can be queried. In her transaction, Alice needs to provide one or more links to previous transactions that will add up to at least the required amount of bitcoin that she’s sending. These links to earlier transactions are called inputs. Each input is an ID of an earlier transaction. Inputs are outputs of previous transactions.
When a bitcoin node receives a transaction, it performs several checks:
The signature of each input is validated by checking it against the public key in the transaction. This ensures that it was created by someone who has the private key that corresponds to the public key.
It hashes the public key in each input to create the address, which is matched against the destination address in the output of the referenced past transaction.
The transactions listed in the inputs are validated to make sure that those transactions have not been used by any other transaction. This ensures there will be no double spending.
Finally, it makes sure that there is a sufficient quantity of bitcoin output by those input transactions.
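These checks can be sketched in Python. The structures below are hypothetical (a ledger modeled as a map of past transactions, spent as the set of consumed outputs, and verify_sig standing in for real ECDSA verification); none of these names come from actual Bitcoin software.

```python
import hashlib
from dataclasses import dataclass

def hash160(pubkey: bytes) -> bytes:
    # Bitcoin's address hash: RIPEMD-160 of SHA-256 (needs OpenSSL ripemd160)
    return hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()

@dataclass
class TxOutput:
    address: bytes        # hash160 of the recipient's public key
    amount: int

@dataclass
class TxInput:
    prev_txid: str        # reference to an earlier transaction...
    index: int            # ...and which of its outputs is being spent
    public_key: bytes
    signature: bytes

def validate(inputs, outputs, ledger, spent, verify_sig) -> bool:
    total_in = 0
    for inp in inputs:
        if not verify_sig(inp.signature, inp.public_key):       # check 1: signature
            return False
        prev = ledger[inp.prev_txid][inp.index]                 # referenced output
        if hash160(inp.public_key) != prev.address:             # check 2: address match
            return False
        if (inp.prev_txid, inp.index) in spent:                 # check 3: double spend
            return False
        total_in += prev.amount
    return total_in >= sum(o.amount for o in outputs)           # check 4: sufficient funds
```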
A bitcoin transaction contains:
- One or more inputs:
- Each input identifies transactions where coins come from. These are references to past transactions. Each input also contains a signature and a public key that corresponds to the private key that was used to create the signature. A user may have multiple identities (keys) and reference past transactions that were directed to different addresses that belong to the user.
- Outputs:
    - Destination address & amount – who the money goes to. This is simply the recipient’s bitcoin address.
    - Change: the transaction owner’s address & amount. Every input must be completely spent; any excess is generated as another output back to the owner of the transaction.
    - Transaction fee (anywhere from 10¢ to a few dollars per transaction). There is a limited amount of space (about 1 MB) in a block. A transaction is about 250 bytes. To get your transaction processed quickly, you need to outbid others.
Blocks and the blockchain
Transactions are sent to all the participating servers. Each system keeps a complete copy of the entire ledger, which records all transactions from the very first one. Currently the bitcoin ledger is about 250 GB.
Transactions are grouped into a block. A block is just a partial list of transactions. When a server is ready to do so, it can add the block to the ledger, forming a linked list of blocks that comprise the blockchain. In Bitcoin, a block contains ten minutes’ worth of transactions, all of which are considered to be concurrent.
Every ten minutes, a new block of transactions is added to the blockchain. A block is approximately a megabyte in size and holds around 4,000 transactions. To make it easy to locate a specific transaction within a block, the transactions in a block are stored in a Merkle tree. This is a binary tree of hash pointers that makes it easy not just to locate a desired transaction but also to validate that it has not been tampered with, by validating the chain of hashes along the path.
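A minimal sketch of the Merkle-root computation, assuming the per-transaction hashes are already given. Bitcoin pairs hashes with double SHA–256 and duplicates the last hash of an odd-sized level.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    # Bitcoin's double SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(tx_hashes: list[bytes]) -> bytes:
    level = tx_hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]       # duplicate last entry of odd level
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

txs = [dsha256(t) for t in (b"tx1", b"tx2", b"tx3")]
print(merkle_root(txs).hex())
```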
Securing the Block
A critically important part of the Bitcoin blockchain is to make sure that blocks in the blockchain have not been modified. We explored the basic concept of a blockchain earlier. Each block contains a hash pointer to the previous block in the chain. A hash pointer not only points to the previous block but also contains a SHA–256[1] hash of that block. This creates a tamper-proof structure. If the contents of any block are modified (accidentally or maliciously), the hash pointer that points to that block will no longer be valid (the hashes won’t match).
To make a change to a block, an attacker will need to modify all the hash pointers from the most recent block back to the block that was changed. One way to prevent such a modification could have been to use signed hash pointers to ensure an attacker cannot change their values. However, that would require someone to be in charge of signing these pointers and there is no central authority in Bitcoin; anyone can participate in building the blockchain. We need a different way to protect blocks from modification.
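The tamper-evidence property is easy to demonstrate with a toy chain of hash pointers (illustrative structures, not Bitcoin's actual block format):

```python
import hashlib, json

def block_hash(block: dict) -> str:
    # deterministic hash of a block's contents
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = []
prev = "0" * 64                                   # genesis block has no predecessor
for txs in (["A->B 5"], ["B->C 2"], ["C->A 1"]):
    block = {"prev_hash": prev, "transactions": txs}
    chain.append(block)
    prev = block_hash(block)

def chain_is_valid(chain) -> bool:
    for earlier, later in zip(chain, chain[1:]):
        if later["prev_hash"] != block_hash(earlier):
            return False                          # a hash pointer no longer matches
    return True

chain[0]["transactions"][0] = "A->B 5000"         # tamper with an old block
print(chain_is_valid(chain))                      # -> False
```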
Proof of Work
Bitcoin makes the addition of a new block – or modification of a block in a blockchain – difficult by creating a puzzle that needs to be solved before the block can be added to the blockchain. By having a node solve a sufficiently difficult puzzle, there will be only a tiny chance that two or more nodes will propose adding a block to the chain at the same time.
This puzzle is called the Proof of Work and is an idea that has been adapted from an earlier system called hashcash. Proof of Work requires computing a hash of three components, hash(B, A, W) where:
- B = block of transactions (which includes the hash pointer to the previous block)
- A = address (i.e., hash of the public key) of the owner of the server doing the computation
- W = the Proof of Work number
When servers are ready to commit a block of transactions onto the chain, they each compute this hash, trying various values of W until the hash result has a specific pre-defined property. The property they are searching for is a hash value that is less than some given number. Currently, it’s a value that requires the leading 74 bits of the 256-bit hash to all be 0s. The property changes over time to ensure that the puzzle never gets too easy regardless of how many nodes are in the network or how fast processors get.
Recall that one property of a cryptographic hash function is the inability to deduce any of the input by looking at the output. Hence, we have no idea what values of W will yield a hash with the desired properties. Servers have to try trillions of values with the hope that they will get lucky and find a value that yields the desired hash. This process of searching for W is called mining.
When a server finds a value of W that yields the desired hash, it advertises that value to the entire set of bitcoin servers. Upon receiving this message, it is trivial for a server to validate the proof of work by simply computing hash(B, A, W) with the W sent in the message and checking the resultant value. The servers then add the block, which contains the Proof of Work number and the winner’s address, onto the blockchain.
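A toy version of the mining loop and its one-hash verification follows. The single SHA–256 and the 16-bit difficulty are simplifications so the demo finishes quickly; Bitcoin actually double-hashes a block header against a numeric target.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()   # leading zeros within this byte
            break
    return bits

def mine(block: bytes, address: bytes, difficulty: int) -> int:
    w = 0
    while True:
        digest = hashlib.sha256(block + address + w.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return w                        # the Proof of Work number
        w += 1

w = mine(b"block-of-transactions", b"miner-address", difficulty=16)
# Verification is a single hash: recompute hash(B, A, W) and check the property.
digest = hashlib.sha256(b"block-of-transactions" + b"miner-address" + w.to_bytes(8, "big")).digest()
assert leading_zero_bits(digest) >= 16
```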
Bitcoin’s mining difficulty is adjusted every 2,016 blocks, which corresponds to approximately 14 days, to keep the average rate at which blocks are added to the blockchain at 10 minutes. This allows the network to handle changes in the number of miners participating in computing the proof of work.
Double Spending and modifying past transactions
A major concern with decentralized cryptocurrency systems is double spending. Double spending refers to sending the same funds (or tokens) to multiple parties: Alice sends $500 to Charles and $500 to David but only has $500 in her account. Bitcoin deals with this by having every server maintain the complete ledger, so Alice’s entire list of transactions can be validated before a new one is accepted.
Alice may decide to go back to an older transaction and modify it. For example, she might change the transaction that sent bitcoin to Charles into one that sends money to David – or simply delete the fact that she paid Charles the full amount.
To do this, she would need to compute a new proof of work value for that block so the block hash will be valid. Since Bitcoin uses hash pointers, each block contains a hash pointer to the previous (earlier) block. Alice would thus need to compute new proof of work values for all newer blocks in the chain so that her modified version of the entire blockchain is valid. She ends up making a competing blockchain.
Recomputing the proof of work numbers is a computationally intensive process. Because of the requirement to generate the Proof of Work for each block, a malicious participant will not be able to catch up with the cumulative work of all the other participants. Because of errors or the rare instances where multiple nodes compute the proof of work concurrently, even honest participants may, on occasion, end up building a competing blockchain. Bitcoin’s policy is that the longest chain in the network is the correct one. The length of the chain is the chain’s score and the highest-scoring chain will be considered the correct one by the servers. A participant is obligated to update its chain with a higher-scoring one if it gets notice of a higher-scoring chain from another system. If it doesn’t update and insists on propagating its chain as the official one, its chain will simply be ignored by others.
51% Attack
Let us go back to the example of Alice maliciously modifying a past transaction. In addition to the work of modifying the existing blockchain, Alice will also need to process the new transactions that are steadily arriving and making the blockchain longer as new blocks get added to it. She needs to change the existing blockchain and also compute proof of work values for new blocks faster than everyone else in the network so that she would have the longest valid chain and hence a high score.
If she can do this then her chain becomes the official version of the blockchain and everyone updates their copy. This is called a 51% attack. To even have a chance of succeeding, Alice would need more computing power than the rest of the systems in the Bitcoin network combined. Back in 2017, The Economist estimated that “bitcoin miners now have 13,000 times more combined number-crunching power than the world’s 500 biggest supercomputers,” so it is not feasible for even a nation-state attacker to harness sufficient power to carry out this attack on a popular cryptocurrency network such as Bitcoin. Blockchain works only because of the assumption that the majority of participants are honest … or at least not conspiring together to modify the same transactions.
Even if someone tried to do this attack, they’d likely only be able to modify transactions in very recent history – in the past few blocks of the blockchain. For this reason, transactions further back in the blockchain are considered to be more secure.
Committing Transactions
Because of the chain structure, it requires more work to modify older transactions (more blocks = more proof of work computations). Modifying only the most recent block is not hugely challenging. Hence, the further back a transaction is in the blockchain, the less likely it is that anyone can amass the computing power to change it and create a competing blockchain.
A transaction is considered confirmed after some number, N, of additional blocks are added to the chain. The value of N is up to the party receiving the transaction and reflects their level of comfort: the higher the number, the deeper the transaction is in the blockchain and the harder it is to alter. Bitcoin recommends N=1 for low-value transactions (payments under $1,000; this enables them to be confirmed quickly), N=3 for deposits and mid-value transactions, and N=6 for large payments (e.g., $10k…$1M). Even larger values of N could be used for extremely large payments.
Rewards
Why would servers spend a huge amount of computation, which translates to huge investments in computing power and electricity, just to find a value that produces a hash with a certain property? To provide an incentive, the system rewards the first server (the miner) that advertises a successful Proof of Work number by depositing a certain number of Bitcoins into their account. To avoid rewarding false blockchains as well as to encourage continued mining efforts, the miner is rewarded only after 99 additional blocks have been added to the ledger.
The reward for computing a proof of work has been designed to decrease over time:
- 50 bitcoins for the first 4 years since 2008
- 25 bitcoins from 2012–2015
- 12.5 bitcoins from block #420,000 July 9, 2016 – 2019
- 6.25 bitcoins at block #630,000 – around May 24, 2020
Eventually, the reward will reach zero and there will be a maximum of around 21 million bitcoins in circulation. However, recall that each transaction has a fee associated with it. Whoever solves the puzzle first and gets a confirmed block into the blockchain will also reap the sum of all the transaction fees in that block.
Centralization
Bitcoin has been designed to operate as a large-scale, global, fully decentralized network. Anybody can download the software and operate a bitcoin node. All you need is sufficient storage to store the blockchain. There are currently over 9,000 reachable full nodes spread across 99 countries. It is estimated that there are over 100,000 total nodes, including those that are running old versions of the software or are not always reachable. In this sense, Bitcoin is truly decentralized. Note that there are different types of nodes. The nodes we discussed serve as full nodes. They maintain an entire copy of the blockchain and accept transactions. Light nodes are similar but store only a part of the blockchain, talking to a full node parent if they need to access other blocks.
Not everyone who operates a bitcoin node does mining (proof of work computation). Mining is incredibly time- and energy-intensive. To make money on mining, one needs to buy dedicated ASIC mining hardware that is highly optimized to compute SHA–256 hashes. Conventional computers will cost more in energy than they will earn in bitcoin rewards. Because of this, mining tends to be concentrated among a far smaller number of players. It is not as decentralized as one would like.
Bitcoin software is open source but there is only a small set of trusted developers. The software effort is inspectable but not really decentralized. Bugs have been fixed but many nodes still run old and buggy versions. Bitcoin transactions cannot be undone even if they were created by buggy nodes or via compromised keys.
-
SHA–256 is the member of the SHA–2 family of hash functions that produces a 256-bit output. The SHA–2 family also includes SHA–224, SHA–256, SHA–384, and SHA–512. ↩
Network Security
The Internet was designed to support the interconnection of multiple networks, each of which may use different underlying networking hardware and protocols. The Internet Protocol, IP, is a logical network built on top of these physical networks. IP assumes that the underlying networks do not provide reliable communication. It is up to higher layers of the IP software stack (either TCP or the application) to detect lost packets. Individual networks under IP are connected by routers, which are computing elements that are each connected to multiple networks. They receive packets on one network and forward them onto another network to get them to their destination. A packet from your computer will often flow through multiple networks and multiple routers that you know nothing about on its way to its destination. This poses security risks since you do not know of the trustworthiness of the routers and networks.
Networking protocol stacks are usually described using the OSI layered model. For IP, the layers are:
Physical. Represents the actual hardware.
Data Link. The protocol for the local network, typically Ethernet (802.3) or Wi-Fi (802.11). Ethernet and Wi-Fi use the same addressing scheme and were designed to be bridged together to form a single local area network.
Network. The protocol for creating a single logical network and routing packets across physical networks. The Internet Protocol (IP) is responsible for this.
Transport. The transport layer is responsible for creating logical software endpoints (ports) so that one application can send a stream of data to another via an operating system’s sockets interface. TCP uses sequence numbers, acknowledgement numbers, and retransmission to provide applications with a reliable, connection-oriented, bidirectional communication channel. UDP does not provide reliability and simply sends a packet to a given destination host and port.
Higher layers of the protocol stack are handled by applications and the libraries they use.
Data link layer
In an Ethernet network, the data link layer is handled by Ethernet transceivers and Ethernet switches. Security was not a consideration in the design of this layer and several fundamental attacks exist at this layer. Wi-Fi also operates at the data link layer and added encryption on wireless data between the device and access point. Note that the encryption is not end-to-end, between hosts, but ends at the access point.
Switch CAM table overflow
Sniff all data on the local area network (LAN).
Ethernet frames are delivered based on their 48-bit MAC[1] address. IP addresses are meaningless to Ethernet transceivers and to switches since IP is handled at higher levels of the network stack. Ethernet was originally designed as a bus-based shared network; all devices on the LAN shared the same wire. Any system could see all the traffic on the Ethernet. This resulted in increased congestion as more hosts were added to the local network.
Ethernet switches alleviated this problem by using a dedicated cable between each host and the switch. The switch routes an ethernet frame only to the Ethernet port (the connector on the switch) that is connected to the system that contains the desired destination address.
Unlike routers, switches are not programmed with routes. Instead, they learn which computers are on which switch ports by looking at the source MAC addresses of incoming ethernet frames. An incoming frame indicates that the system with that source address is connected to that switch port.
To implement this, a switch contains a switch table (a MAC address table). This table contains entries for known MAC addresses and their interface. The switch then uses forwarding and filtering:
When a frame arrives for some destination address D, the switch looks up D in the switch table to find the interface. If D is in the table and on a different port than that of the incoming frame, the switch forwards the frame to that interface, queueing it if necessary.
If D is not found in the table, then the switch assumes it has not yet learned what port that address is associated with, so it forwards the frame to ALL interfaces.
This procedure makes the switch self-learning: the switch table is empty initially and gets populated as the switch inspects source addresses.
A switch has to support extremely rapid lookups in the switch table. For this reason, the table is implemented using content addressable memory (CAM, also known as associative memory). CAM is expensive and switch tables are fixed-size and not huge. The switch will delete less-frequently used entries if it needs to make room for new ones.
The CAM table overflow attack exploits the limited size of this CAM-based switch table. The attacker sends bogus Ethernet frames with random source MAC addresses. Each newly-received address will displace an entry in the switch table, eventually filling up the table. With the CAM table full, legitimate traffic will be broadcast to all links. A host on any port can now see all traffic. The CAM table overflow attack turns a switch into a hub.
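This behavior is easy to model with a toy self-learning switch. This is an illustrative sketch (not real switch firmware) that evicts the least-recently used entry when the table is full:

```python
import random
from collections import OrderedDict

class Switch:
    def __init__(self, table_size: int):
        self.table = OrderedDict()        # MAC -> port, LRU-style eviction
        self.size = table_size

    def learn(self, src_mac: int, port: int) -> None:
        self.table[src_mac] = port
        self.table.move_to_end(src_mac)
        if len(self.table) > self.size:
            self.table.popitem(last=False)   # evict the least-recently used entry

    def forward(self, dst_mac: int) -> str:
        return f"port {self.table[dst_mac]}" if dst_mac in self.table else "FLOOD all ports"

sw = Switch(table_size=1024)
sw.learn(src_mac=0xAA, port=1)                   # legitimate host
for _ in range(10_000):                          # attacker's bogus source addresses
    sw.learn(random.getrandbits(48), port=7)
print(sw.forward(0xAA))                          # -> "FLOOD all ports": switch acts like a hub
```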
Countermeasures for CAM table attacks require the use of managed switches that support port security. These switches allow you to limit the number of addresses the table will hold for each switch port.
VLAN hopping (switch spoofing)
Sniff all data from connected virtual local area networks.
One use of local area networks is to isolate broadcast traffic from other groups of systems. Related users can all be placed on a single LAN. However, users may be relocated within an office and switches may be used inefficiently. Virtual Local Area Networks (VLANs) create multiple logical LANs over a single physical switch infrastructure. The network administrator can assign each port on a switch to a specific VLAN. Each VLAN is a separate broadcast domain so that each VLAN acts like a truly independent local area network. Users belonging to one VLAN will not see any traffic from the other; it has to be routed through an IP router.
Switches may be extended by cascading them with other switches: an Ethernet cable from one switch simply connects to another switch. With VLANs, the connection between switches forms a VLAN trunk and carries traffic from all VLANs to the other switch. To support this behavior, an extended Ethernet frame format was created for the Ethernet frames sent on this link since each frame now needs to identify the VLAN from which it originated.
A VLAN hopping attack employs switch spoofing: an attacker’s computer identifies itself as a switch with a trunk connection. It then receives traffic on all VLANs.
Defending against this attack requires a managed switch where an administrator can disable unused ports and associate them with some unused VLAN. Auto-trunking also needs to be disabled so that a port cannot become a trunk automatically; instead, trunk ports need to be configured explicitly.
ARP cache poisoning
Redirect IP packets by changing the IP address to MAC address mapping.
Recall that IP is a logical network that sits on top of physical networks. If we are on an Ethernet network and need to send an IP datagram, that IP datagram needs to be encapsulated in an Ethernet frame. The Ethernet frame needs to contain a destination MAC address that corresponds to the destination machine (or router, if the destination address is on a different LAN). For an operating system to send an IP packet, therefore, it needs to figure out what MAC address corresponds to a given IP address.
There is no relationship between an IP and Ethernet MAC address. To find the MAC address when given an IP address, a system uses the Address Resolution Protocol, ARP. The sending computer creates an Ethernet frame that contains an ARP message with the IP address it wants to query. This ARP message is then broadcast: all network adapters on the LAN receive the message. If some system receives this message and sees that its IP address matches the address in the query, it sends back an ARP response. The response identifies the MAC address of the system that owns that IP address.
To avoid the overhead of doing this query each time the system needs to use the IP address, the operating system maintains an ARP cache that stores recently used addresses. Moreover, hosts cache any ARP replies they see, even if they did not originate them. This is done on the assumption that many systems use the same set of IP addresses and the overhead of making an ARP query is substantial. Along the same lines, a computer can send an ARP response even if nobody sent a request. This is called a gratuitous ARP and is often sent by computers when they start up as a way to give other systems on the LAN the IP:MAC address mapping without them having to ask for it at a later time.
Note that there is no way to authenticate that a response is legitimate. The asking host does not have any idea of what MAC address is associated with the IP address. Hence, it cannot tell whether a host that responds really has that IP address or is an imposter.
An ARP cache poisoning attack is one where an attacker creates fake ARP responses that contain the attacker’s MAC address and the target’s IP address. This will direct any traffic meant for the target to the attacker. It enables man-in-the-middle or denial of service attacks since the real host will not be receiving any IP traffic.
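The core weakness fits in a few lines: a cache that believes any reply it sees. The addresses below are made up for illustration.

```python
# A toy ARP cache with the real protocol's weakness: it accepts any reply,
# including gratuitous ones, with no way to authenticate the sender.
arp_cache: dict[str, str] = {}

def on_arp_reply(ip: str, mac: str) -> None:
    arp_cache[ip] = mac       # nothing verifies that the sender really owns ip

on_arp_reply("10.0.0.1", "aa:aa:aa:aa:aa:aa")   # legitimate gateway reply
on_arp_reply("10.0.0.1", "ee:ee:ee:ee:ee:ee")   # attacker's gratuitous reply wins
print(arp_cache["10.0.0.1"])                    # gateway traffic now goes to the attacker
```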
There are several defenses against ARP cache poisoning. One defense is to ignore replies that are not associated with requests. However, you need to hope that the reply you get is a legitimate one since an attacker may respond more quickly or perhaps launch a denial of service attack against the legitimate host and then respond.
Another defense is to give up on ARP broadcasts and simply use static ARP entries. This works but can be an administrative nightmare since someone will have to maintain the list of IP-to-MAC address mappings and update it whenever machines are added to the environment.
Finally, one can enable something called Dynamic ARP Inspection. This essentially builds a local ARP table by using DHCP (Dynamic Host Configuration Protocol) Snooping data as well as static ARP entries. Any ARP responses will be validated against DHCP Snooping database information or static ARP entries. This assumes that the environment uses DHCP instead of fixed IP address assignments.
DHCP spoofing
Configure new devices on the LAN with your choice of DNS address, router address, etc.
When a computer joins a network, it needs to be configured for using the Internet Protocol (IP) on that network. This can be done automatically via DHCP, the Dynamic Host Configuration Protocol. It is used in practically every LAN environment and is particularly useful where computers (including phones) join and leave the network regularly, such as Wi-Fi hotspots.
A computer that joins a new network broadcasts a DHCP Discover message. A DHCP server on the network picks up this request and sends back a response that contains configuration information for this new computer on the network:
- IP address – the address given to the system
- Subnet mask – which bits of the IP address identify the local area network
- Default router – gateway to which all non-local datagrams will be routed
- DNS servers – servers that tell you the IP address for a domain name
- Lease time – how long the configuration is valid
As with ARP, we have the problem that a computer does not know where to go to for this information and has to rely on a broadcast query, hoping that it gets a legitimate response.
With DHCP spoofing, any system can pretend to be a DHCP server and spoof responses that would normally be sent by a valid DHCP server. This imposter can provide the new system with a legitimate IP address but with false addresses for the gateway (default router) and DNS servers. The result is that the imposter can field DNS requests (which convert domain names to IP addresses) and can also redirect any traffic that leaves the local area network from the new machine.
As with ARP cache poisoning, the attacker may launch a denial of service attack against the legitimate DHCP server to keep it from responding or at least delay its responses. If the legitimate server sends its response after the imposter, the new host will simply ignore the response.
There aren’t many defenses against DHCP spoofing. Some switches (such as those by Cisco and Juniper) support DHCP snooping. This allows an administrator to configure specific switch ports as “trusted” or “untrusted.” Only specific machines, those on trusted ports, will be permitted to send DHCP responses. Any other DHCP responses will be dropped. The switch will also use DHCP data to track client behavior to ensure that hosts use only the IP address assigned to them and that hosts do not generate fake ARP responses.
Network (IP) layer
The Internet Protocol (IP) layer is responsible for getting datagrams (packets) to their destination. It does not provide any guarantees on message ordering or reliable delivery. Datagrams may take different routes through the network and may be dropped by queue overflows in routers.
Source IP address authentication
Forge the source address of an IP datagram to impersonate another host.
One fundamental problem with IP communication is that there is absolutely no source IP address authentication. Clients are expected to use their own source IP address but anybody can override this if they have administrative privileges on their system by using a raw sockets interface.
This enables one to forge messages that appear to come from another system. Any software that authenticates requests based on their IP addresses will be at risk.
Anonymous denial of service
The ability to set an arbitrary source address in an IP datagram can be used for anonymous denial of service attacks. If a system sends a datagram that generates an error, the error will be sent back to the source address that was forged in the query. For example, a datagram sent with a small time-to-live (TTL) value will cause the router at which the TTL reaches zero to respond with an ICMP (Internet Control Message Protocol) Time to Live exceeded message. Error responses will be sent to the forged source IP address, and it is possible to send a vast number of such messages from many machines (by assembling a botnet) across many networks, causing the errors to all target a single system.
Routers
Routers are nothing more than computers with multiple network links and often with special purpose hardware to facilitate the rapid movement of packets across interfaces. They run operating systems and have user interfaces for administration. As with many other devices that people don’t treat as “real” computers, there is a danger that routers will have simple or even default passwords. Moreover, owners of routers may not be nearly as diligent in keeping the operating system and other software updated as they are with their computers.
Routers can be subject to some of the same attacks as computers. Denial of service (DoS) attacks can keep the router from doing its job. One way this is done is by sending a flood of ICMP datagrams. The Internet Control Message Protocol is typically used to send routing error messages and updates and a huge volume of these can overwhelm a router. Routers may also have input validation bugs and not handle certain improper datagrams correctly.
Route table poisoning is the modification of the router’s routing table either by breaking into a router or by sending route update datagrams over an unauthenticated protocol.
Transport layer (UDP, TCP)
UDP and TCP are transport layer protocols that allow applications to establish communication channels with each other. Each endpoint of such a channel is identified by a port number (a 16-bit integer that has nothing to do with Ethernet switch ports). The port number allows the operating system to direct traffic to the proper socket. Hence, both TCP and UDP packets contain not only source and destination addresses but also source and destination ports.
UDP, the User Datagram Protocol, is stateless, connectionless, and unreliable.
As we saw with IP source address forgery, anybody can send UDP messages with forged source IP addresses.
TCP, the Transmission Control Protocol, is stateful, connection-oriented, and reliable. Every packet contains a sequence number (byte offset) and the operating system assembles received packets into their correct order. The receiver also sends acknowledgements so that any missing packets are retransmitted.
To handle in-order, reliable communication, TCP needs to establish state at both endpoints. It does this through a connection setup process that comprises a three-way handshake.
1. SYN: Client sends a SYN segment. The client selects a random initial sequence number (client_isn).
2. SYN/ACK: Server sends a SYN/ACK. The server receives the SYN segment and knows that a client wants to connect to it. It allocates memory to store connection state and to hold out-of-order segments. The server generates an initial sequence number (server_isn) for its side of the data stream. This is also a random number. The response also contains an acknowledgement with the value client_isn+1.
3. ACK: Client sends a final acknowledgement. The client acknowledges receipt of the SYN/ACK message by sending a final ACK message that contains an acknowledgement number of server_isn+1.
Note that the initial sequence numbers are random rather than starting at zero as one might expect. There are two reasons for this.
The primary reason is that message delivery times on an IP network are unpredictable and it is possible that a recently-closed connection may receive delayed messages, confusing the server on the state of that connection.
The security-sensitive reason is that if sequence numbers were predictable then it would be quite easy to launch a sequence number prediction attack where an attacker would be able to guess at likely sequence numbers on a connection and send masqueraded packets that will appear to be part of the data stream. Random sequence numbers do not make the problem go away but make it more challenging to launch the attack, particularly if the attacker does not have the ability to see traffic on the network.
SYN flooding
In the second step of the three-way handshake, the server is informed that a client would like to connect and allocates memory to manage this connection. Given that kernel memory is a finite resource, the operating system will allocate only a finite number of TCP buffers in its TCP queue. After that, it will refuse to accept any new connections.
In the SYN flooding attack, the attacker sends a large number of SYN segments to the target. These SYN messages contain a forged source address of an unreachable host, so the target’s SYN/ACK responses never get delivered anywhere. The handshake is never completed but the operating system has allocated resources for this connection. Depending on the operating system, it might be a minute or much longer before it times out on waiting for a response and cleans up these pending connections. Meanwhile, all TCP buffers have been allocated and the operating system refuses to accept any more TCP connections, even if they are from a legitimate source.
SYN flooding attacks cannot be prevented completely. One way of lessening their impact is the use of SYN cookies. With SYN cookies, the server does not allocate buffers & TCP state when a SYN segment is received. It responds with a SYN/ACK and creates an initial sequence number that is a hash of several known values:
hash(src_addr, dest_addr, src_port, dest_port, SECRET)
The “SECRET” is not shared with anyone; it is local to the operating system. When (if) the final ACK comes back from the client, the server needs to validate the acknowledgement number. Normally this requires comparing the number to the stored server initial sequence number plus 1. We did not allocate space to store this value, but we can recompute it by re-generating the hash, adding one, and comparing the result to the acknowledgement number in the message. If it is valid, the kernel concludes it was not the victim of a SYN flooding attack and allocates the resources necessary for managing the connection.
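A simplified sketch of the idea follows; real implementations also fold a timestamp and MSS values into the cookie, which this omits.

```python
import hashlib, os

SECRET = os.urandom(16)   # local to the operating system, never shared

def syn_cookie(src_addr: str, dst_addr: str, src_port: int, dst_port: int) -> int:
    # Derive the server's initial sequence number from the connection fields;
    # no per-connection state needs to be stored yet.
    h = hashlib.sha256(f"{src_addr}|{dst_addr}|{src_port}|{dst_port}".encode() + SECRET)
    return int.from_bytes(h.digest()[:4], "big")     # 32-bit sequence number

def ack_is_valid(src_addr: str, dst_addr: str, src_port: int, dst_port: int,
                 ack_number: int) -> bool:
    # Recompute the cookie and compare; only on success is connection state allocated.
    expected = (syn_cookie(src_addr, dst_addr, src_port, dst_port) + 1) % 2**32
    return ack_number == expected
```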
TCP Reset
A somewhat simple attack is to send a RESET (RST) segment to an open TCP socket. If the server sequence number is correct then the connection will close. Hence, the tricky part is getting the correct sequence number to make it look like the RESET is part of the genuine message stream.
Sequence numbers are 32-bit values. The chance of successfully picking the correct sequence number is tiny: 1 in 2^32, or approximately one in four billion. However, many systems will accept any sequence number within a range around the expected value, to account for the fact that packets may arrive out of order and shouldn’t be rejected just because the sequence number is not exactly correct. This can reduce the search space tremendously, and an attacker can send a flood of RST packets with varying sequence numbers and a forged source address until the connection is broken.
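The arithmetic behind this reduction, assuming a 64 KB receive window purely for illustration:

```python
# If any sequence number inside the receive window is accepted, the attacker
# needs to land in one of seq_space/window slots, not hit one exact value.
seq_space = 2**32
window = 64 * 1024            # assumed window size for this example
print(seq_space // window)    # 65,536 guesses on average instead of ~4 billion
```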
Routing protocols
The Internet was designed to connect multiple independently-managed networks, each of which may use different hardware. Routers connect local area networks as well as wide area networks together. A collection of consecutive IP addresses (those sharing the same most significant bits, called a prefix) together with the underlying routers and network infrastructure, all managed as one administrative entity, is called an Autonomous System (AS). For example, the part of the Internet managed by Comcast constitutes an autonomous system (Comcast actually has 42 of them in different regions). The networks managed by Verizon constitute a few autonomous systems as well. For purposes of our discussion, think of an AS as an ISP or a large data center such as Google or Amazon (by the way, Rutgers is an Autonomous System: AS46).
The routers within an autonomous system need to share routing information so that those routers can route packets efficiently toward their destination. An Interior Gateway Protocol is used within an autonomous system. The most common is OSPF, Open Shortest Path First. While security issues exist within an autonomous system, we will turn our attention to the sharing of information between autonomous systems.
Routers that are connected to routers in other ASes use an Exterior Gateway Protocol (EGP) called the Border Gateway Protocol, or BGP. With BGP, each autonomous system exchanges routing and reachability information with the autonomous systems with which it connects. For example, Comcast can tell Verizon what parts of the Internet it can reach. BGP uses a distance vector routing algorithm (more precisely, a path vector variant, since advertisements carry the AS path) to enable the routers to determine the most efficient path to use to send packets that are destined for other networks. Unless an administrator explicitly configures a route, BGP will pick the shortest route.
BGP Hijacking
So what are the security problems with BGP? Edge routers use BGP to send route advertisements to routers they are connected to on neighboring autonomous systems. An advertisement is a list of IP address prefixes the AS can reach (shorter prefixes mean a bigger range of addresses) and the distance (number of hops) to each group of systems.
These are TCP messages with no authentication, integrity checks, or encryption. With BGP hijacking, a malicious party that has access to the network link or a connected router can inject advertisements for arbitrary routes. This information will propagate throughout the Internet and can cause routers throughout the Internet to send IP datagrams to the attacker, with the belief that it is the shortest path to the destination.
A BGP attack can be used for eavesdropping (direct network traffic to a specific network by telling everyone that you’re offering a really short path) or a denial of service (DoS) attack (make parts of the network unreachable by redirecting traffic and then dropping it). There are currently close to 33,000 autonomous systems and most have multiple administrators. We live in the hope that none are malicious and that all routers are properly configured and properly secured.
It is difficult to change BGP since tens of thousands of independent entities use it worldwide. Two partial solutions to this problem have emerged. The Resource Public Key Infrastructure (RPKI) framework simply has each AS get an X.509 digital certificate from a trusted entity (the Regional Internet Registry). Each AS signs its list of route advertisements with its private key, and any other AS can validate that list of advertisements using the AS’s certificate.
A related solution is BGPsec, which is still a draft standard. Instead of signing an individual AS’s routes, every BGP message between ASes is signed.
Both solutions require every single AS to employ this solution. If some AS is willing to accept untrusted route advertisements but will relay them to other ASes as signed messages then integrity is meaningless. Moreover, most BGP hijacking incidents took place because legitimate system administrators misconfigured route advertisements either accidentally or on purpose. They were not the actions of attackers that hacked into a router.
A high profile BGP attack occurred against YouTube in 2008. Pakistan Telecom received a censorship order from the telecommunications ministry to block YouTube traffic to the country. The company sent spoofed BGP messages claiming to be the best route for the range of IP addresses used by YouTube. It used a longer address prefix than the one advertised by YouTube (longer prefix = fewer addresses). Because the longer prefix was deemed to be more specific, BGP gave it a higher priority. Within minutes, routers worldwide were directing their YouTube requests to Pakistan Telecom, which would simply drop them.
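The "more specific prefix wins" rule can be reproduced with Python's ipaddress module. The /22 and /24 prefixes below reflect commonly reported details of the incident but serve here only as an illustration:

```python
import ipaddress

routes = {
    ipaddress.ip_network("208.65.152.0/22"): "YouTube's advertisement",
    ipaddress.ip_network("208.65.153.0/24"): "hijacker's advertisement",
}

dst = ipaddress.ip_address("208.65.153.238")
# Longest-prefix match: among routes covering dst, the longest prefix wins.
best = max((net for net in routes if dst in net), key=lambda net: net.prefixlen)
print(routes[best])    # -> hijacker's advertisement
```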
Domain Name System (DNS)
The Domain Name System (DNS) is a hierarchical service that maps Internet domain names to IP addresses. A user’s computer runs the DNS protocol via a program known as a DNS stub resolver. It first checks a local file for specific preconfigured name-to-address mappings. Then it checks its cache of previously-found mappings. Finally, it contacts an external DNS resolver, which is usually located at the ISP or is run as a public service, such as Google Public DNS or OpenDNS.
We trust that the name-to-address mapping is legitimate. Web browsers, for instance, rely on this to enforce their same-origin policy. However, DNS queries and responses are sent using UDP with no authentication or integrity checks. The only check is that each DNS query contains a Query ID (QID); a DNS response must have a matching QID so that the client can match it to the query. These responses can be intercepted and modified or just forged. Malicious responses can return a different IP address that will direct IP traffic to different hosts.
A solution called DNSsec has been proposed. It is a secure extension to the DNS protocol that provides authenticated requests and responses. However, few sites support it.
Pharming attack
A pharming attack is an attack on the configuration information maintained by a DNS server – either modifying the information used by the local DNS resolver or modifying that of a remote DNS server. By changing the name-to-IP-address mapping, an attacker can cause software to send packets to the wrong system.
The most direct form of a pharming attack is to modify the local hosts file. This is the file (/etc/hosts on Linux, BSD, and macOS systems; c:\Windows\System32\Drivers\etc\hosts on Windows) that contains mappings between domain names and IP addresses. If an entry is found here, the system will not bother checking a remote DNS server.
Alternatively, malware may modify the DNS server settings on a system so that it would contact an attacker’s DNS server, which can provide the wrong IP address for certain domain names.
DNS cache poisoning (DNS spoofing attack)
DNS queries first check the local host’s DNS cache to see if the results of a past query have been cached. This yields a huge improvement in performance since a network query can be avoided. If the cached name-to-address mapping is invalid, then the wrong IP address is returned. Modifying this cached mapping is called DNS cache poisoning, also known as DNS spoofing. In the general case, DNS cache poisoning refers to any mechanism where an attacker is able to provide malicious responses to DNS queries. One way that DNS cache poisoning is done is via JavaScript on a malicious website.
The browser requests access to a legitimate site, for example, a.bank.com. Because the system does not have the address of a.bank.com cached, it sends a DNS query to an external DNS resolver using the DNS protocol. The query includes a query ID (QID) x1. At the same time that the request for a.bank.com is made, JavaScript launches an attacker thread that sends 256 responses with random QIDs (y1, y2, …). Each of these DNS responses tells the server that the DNS server for bank.com is at the attacker’s IP address. If one of these responses happens to have a matching QID, the host system will accept it as truth that all future queries for anything at bank.com should be directed to the name server for bank.com, which is run by the attacker. If the responses don’t work, the script can try again with a different sub-domain, b.bank.com. It might take many minutes, but there is a high likelihood that the attack will eventually succeed.
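The odds work out as follows: with a 16-bit QID, 256 forged replies give a 1/256 chance of matching any one query, and retries against fresh sub-domains compound quickly.

```python
# Success odds for the QID race: 256 forged replies against a 16-bit query ID.
p_per_query = 256 / 2**16        # = 1/256 chance that any one query is poisoned

def p_success(queries: int) -> float:
    # probability at least one of the repeated attempts succeeds
    return 1 - (1 - p_per_query) ** queries

print(p_success(100))    # ~0.32 after 100 sub-domain attempts
print(p_success(1000))   # ~0.98 after 1000
```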
There are two defenses against this attack, but both require non-standard actions that need to be coded into the system. One is to randomize the source port number of the query. Since the attacker does not get to see the query, it will not know where to send the bogus responses; there are 2^16 (65,536) ports to try. The second defense is to force all DNS queries to be issued twice. The attacker will have to guess the 16-bit query ID correctly twice in a row (a chance of 1 in 2^32), and the odds of doing that successfully are infinitesimally small.
Summary: An attacker can run a local DNS server that will attempt to provide spoofed DNS responses to legitimate domain name lookup requests. If the query ID numbers of the fake response match those of a legitimate query (trial and error), the victim will get the wrong IP address, which will redirect legitimate requests to an attacker’s service.
DNS Rebinding
Web application security is based on the same-origin policy. Browser scripts can access cookies and other data on pages only if they share the same origin, which is the combination of URI (protocol), host name, and port number. The underlying assumption is that resolving a domain name takes you to the correct server.
The DNS rebinding attack allows JavaScript code on a malicious web page to access private IP addresses in the victim’s network. The attacker configures the DNS entry for a domain name to have a short time to live (TTL). When the victim’s browser visits the page and downloads JavaScript from that site, that JavaScript code is allowed to interact with the domain thanks to the same origin policy. However, right after downloading the script, the attacker can reconfigure the DNS server so that future queries will return an address in the internal network. The JavaScript code can then try to request resources from that system since, as far as the browser is concerned, the origin is the same because the name of the domain has not changed.
Summary: short time-to-live values in DNS allow an attacker to change the address of a domain name so that scripts from that domain can now access resources inside the private network.
DNS amplification attack
We have seen how source address spoofing can be used to carry out an anonymous denial of service (DoS) attack. Ideally, to overload a system, the attacker would like to send a small amount of data that would create a large response that would be sent to the target. This is called amplification. An obvious method would be to send a URL request over HTTP that will cause the server to respond with a large page reply. However, this does not work as HTTP uses TCP and the target would not have the TCP session established. DNS happens to be a UDP-based service. DNS amplification uses a collection of compromised systems that will carry out the attack (a botnet). Each system will send a small DNS query using a forged source address. These systems can contact their own ISP’s DNS servers since the goal is not to overwhelm any DNS server. The query asks for “ANY”, a request for all known information about the DNS zone. Each such query will cause the DNS server to send back a far larger reply.
-
MAC = Media Access Control and refers to the hardware address of the Ethernet device. ↩
Virtual Private Networks (VPNs)
Suppose we want to connect two local area networks in geographically-separated areas together. For instance, we might have a company with locations in New York and in San Francisco. One way of doing this is to get a dedicated private network link between the two points. Many phone companies and network providers offer a private line service, but it can be extremely expensive and is not feasible in many circumstances, such as if one of your endpoints is in the Amazon cloud rather than at your physical location.
Instead, we can use the public Internet to communicate between the two locations. Our two subnets will often have private IP addresses (such as 192.168.x.x), which are not routable over the public internet. To overcome this, we can use a technique called tunneling. Tunneling is the process of encapsulating an IP datagram within another IP datagram. An IP datagram in one subnet (a local area network in one of our locations) that is destined to an address on the remote subnet will be directed to a gateway router. There, it will be treated as payload (data) and packaged within an IP datagram whose destination is the IP address of the gateway router at our other location. This datagram is now routed over the public Internet. The source and destination addresses of this outer datagram are the gateway routers at both sides.
IP networking relies on store-and-forward routing. Network data passes through routers, which are often unknown and may be untrustworthy. We have seen that routes may be altered to pass data through malicious hosts or directed to malicious hosts that accept packets destined for the legitimate host. Even with TCP connections, data can be modified or redirected and sessions can be hijacked. We also saw that there is no source authentication on IP packets: a host can place any address it would like as the source. What we would like is the ability to communicate securely, with the assurance that our traffic cannot be modified and that we are truly communicating with the correct endpoints.
Virtual private networks (VPNs) take the concept of tunneling and safeguard the encapsulated data by adding a MAC (message authentication code) so that we can detect if the data is modified and encryption so that others cannot read the data. This way, VPNs allow separate local area networks to communicate securely over the public Internet.
IPsec is a popular VPN protocol that is really a set of two protocols.
The IPsec Authentication Header (AH) is an IPsec protocol that does not encrypt data but simply affixes a message authentication code to each datagram. It ensures the integrity of each datagram.
The Encapsulating Security Payload (ESP), which provides integrity checks and also encrypts the payload, ensuring secrecy.
IPsec can operate in tunnel mode or transport mode. In both cases, IPsec communicates at the same layer as the Internet Protocol. That is, it is not used by applications to communicate with one another but rather by routers or operating systems to direct an entire stream of traffic.
Tunnel mode VPNs provide network-to-network or host-to-network communication. The communication takes place between either two VPN-aware gateway routers or from a host to a VPN-aware router. The entire datagram is treated like payload and encapsulated within a datagram that is sent over the Internet to the remote gateway. That gateway receives this VPN datagram, extracts the payload, and routes it on the internal network where it makes its way to the target system.
Transport mode VPNs provide communication between two hosts. In this case, the IP header is not modified but data is protected. Note that, unlike transport layer security (TLS), which we examine later, setting up a transport mode VPN will protect all data streams between the two hosts. Applications are unaware that a VPN is in place.
Authentication Header (AH)
The Authentication Header (AH) protocol guarantees the integrity and authenticity of IP packets. AH adds an extra chunk of data (the authentication header) with a MAC to the IP datagram. Anyone with knowledge of the key can create the MAC or verify it. This ensures message integrity since an attacker will not be able to modify message contents and have the HMAC remain valid. Attackers will also not be able to forge messages because they will not know the key needed to create a valid MAC. Every AH also has a sequence number that is incremented for each datagram that is transmitted, ensuring that messages are not inserted, deleted, or replayed.
Hence, IPsec AH protects messages from tampering, forged addresses, and replay attacks.
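Conceptually, the protection amounts to a MAC over the sequence number and the datagram, plus a replay check. A simplified Python sketch follows (the real AH format covers selected header fields and uses negotiated algorithms, so this is only an illustration of the idea):
import hmac, hashlib

key = b"key-established-during-ipsec-setup"
seen = set()  # sequence numbers already accepted

def protect(seq, datagram):
    # the MAC covers the sequence number and the datagram contents
    return hmac.new(key, seq.to_bytes(8, "big") + datagram, hashlib.sha256).digest()

def accept(seq, datagram, tag):
    if seq in seen:
        return False  # replayed datagram
    if not hmac.compare_digest(tag, protect(seq, datagram)):
        return False  # modified or forged datagram
    seen.add(seq)
    return True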
Encapsulating Security Payload (ESP)
The Encapsulating Security Payload (ESP) provides the same integrity assurance but also adds encryption to the payload to ensure confidentiality. Data is encrypted with a symmetric cipher (usually AES).
IPsec cryptographic algorithms
Authentication
An IPsec session begins with authenticating the endpoints. IPsec supports the use of X.509 digital certificates or the use of pre-shared keys. Digital certificates contain the site’s public key and allow us to validate the identity of the certificate if we trust the issuer (the certification authority, or CA). We authenticate by proving that we can take a nonce that the other side encrypted with our public key and decrypt it using our private key. A pre-shared key means that both sides configured a static shared secret key ahead of time. We prove that we have the key in a similar manner: one side creates a nonce and asks the other side to encrypt it and send back the result. Then the other side does the same thing.
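A toy sketch of the pre-shared key challenge-response, using the Python cryptography package’s Fernet recipe as the symmetric cipher (real IPsec negotiates its ciphers): only a party holding the shared key can return a ciphertext that decrypts to the verifier’s nonce.
import os
from cryptography.fernet import Fernet

shared_key = Fernet.generate_key()  # configured on both sides ahead of time

nonce = os.urandom(16)                        # verifier's challenge
response = Fernet(shared_key).encrypt(nonce)  # prover encrypts and returns it
assert Fernet(shared_key).decrypt(response) == nonce  # verifier checks the proof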
Key exchange
HMAC message authentication codes and encryption algorithms both require the use of secret keys. IPsec uses Diffie-Hellman to create random shared session keys. Diffie-Hellman makes it quick to generate the public-private key pair that is needed to derive a common key, so there is no dependence on long-term keys, assuring forward secrecy.
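The arithmetic behind the exchange fits in a few lines of Python. This toy version uses a small Mersenne prime for readability; real deployments use vetted groups with 2048-bit or larger primes.
import secrets

p, g = 2**127 - 1, 5  # toy parameters for illustration only

a = secrets.randbelow(p - 2) + 1  # one side's ephemeral private value
b = secrets.randbelow(p - 2) + 1  # the other side's ephemeral private value

A = pow(g, a, p)  # sent over the network
B = pow(g, b, p)  # sent over the network

# both sides derive the same session key without ever transmitting it
assert pow(B, a, p) == pow(A, b, p)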
Confidentiality
In IPsec ESP, the payload is encrypted using either AES-CBC or 3DES-CBC. CBC is cipher block chaining, which makes each block of ciphertext dependent on all the blocks before it, ensuring that blocks cannot be substituted from old messages.
Integrity
IPsec uses HMAC, a form of a message authentication code that uses a cryptographic hash function and a shared secret key. It supports either SHA–1 or SHA–2 hash functions.
IPsec Authentication Header mode is rarely used since the overhead of encrypting data these days is quite low and ESP provides encryption in addition to authentication and integrity.
Transport Layer Security (TLS)
Virtual Private Networks were designed to operate at the network layer. They were designed to connect networks together. Even with transport mode connectivity, they tunnel all IP traffic and do not differentiate one data stream from another. They do not solve the problem of an application needing authenticated, tamper-proof, and encrypted communications to another application.
Secure Sockets Layer (SSL) was created as a layer of software above TCP that provides authentication, integrity, and encrypted communication while preserving the abstraction of a sockets interface to applications. An application sets up an SSL session to a service. After that, it simply sends and receives data over a socket just like it would with the normal sockets-based API that operating systems provide. The programmer does not have to think about network security. As SSL evolved, it morphed into a new version called TLS, Transport Layer Security. While SSL is commonly used in conversation, all current implementations are TLS.
Any TCP-based application that may not have addressed network security can be security-enhanced by simply using TLS. For example, the standard email protocols, SMTP, POP, and IMAP, all have TLS-secured interfaces. Web browsers use HTTP, the Hypertext Transfer Protocol, and also support HTTPS, which is the exact same protocol but runs over a TLS connection.
TLS has been designed to provide:
- Data encryption
- Symmetric cryptography is used to encrypt data.
- Data integrity
- Ensure that we can detect if data in transit has been modified. TLS includes a MAC with transmitted data.
- Authentication
- TLS provides mechanisms to authenticate the endpoints prior to sending data. Authentication is optional and can be unidirectional (the client may just authenticate the server), mutual (each side authenticates the other), or none (in which case we just exchange keys but do not validate identities).
- Key exchange
- After authentication, TLS performs a key exchange so that both sides can obtain random shared session keys. TLS creates separate keys for each direction of communication (encryption keys for client-to-server and server-to-client data streams) and separate keys for data integrity (MAC keys for client-to-server and server-to-client streams).
- Interoperability & evolution
- TLS was designed to support many different key exchange, encryption, integrity, & authentication protocols. The start of each session enables the protocol to negotiate what protocols to use for the session.
TLS sub-protocols
These features are implemented in two sub-protocols within TLS:
- 1. Authentication and key exchange
- Authentication uses public key cryptography with X.509 certificates to authenticate a system. Both the client and server can present their X.509 digital certificates. TLS validates the signature of the certificate and uses nonce-based public key authentication to validate that each party has the corresponding private key.
Key exchange supports several options. Ephemeral Diffie-Hellman key exchange is the most common since it supports the efficient generation of shared keys and there is no long-term key storage, providing forward secrecy. TLS can accommodate other key exchange techniques as well, including Diffie-Hellman with static keys, RSA public key-based key exchange, and pre-shared static keys.
- 2. Communication
- Data encryption uses symmetric cryptography and supports a variety of algorithms, including AES GCM, AES CBC, ARIA (GCM/CBC), and ChaCha20. AES is the Advanced Encryption Standard. CBC is cipher block chaining, which makes each ciphertext block a function of the preceding one. GCM is Galois/Counter Mode, an alternative to CBC that encrypts an incrementing counter and exclusive-ors it with a block of plaintext. ARIA is a South Korean standard encryption algorithm that is similar to AES. ChaCha20 is an encryption algorithm that is generally more efficient than AES on low-end processors.
Data integrity is provided by a message authentication code (MAC) that is attached to each block of data. TLS allows the choice of several, including HMAC-MD5, HMAC-SHA1, HMAC-SHA256/384, and Poly1305.
TLS protocol
The steps in a TLS session are:
The client connects to the server and sends information about its version and the ciphers it supports. It is up to the client and server to negotiate which ones they will use.
The server responds with its X.509 certificate, the protocol version and ciphers it is willing to use, and, possibly, a request for a client certificate.
The client validates the integrity of the server’s certificate by validating the signature of the certificate. If the client trusts the server’s CA, the client can validate the authenticity of the server. Otherwise, the client can simply validate that the server owns the corresponding private key.
The client and server establish random shared session keys via a Diffie-Hellman key exchange.
Optionally, the client responds with its certificate.
If the client responds with its certificate, the server validates the certificate and the client.
The client and server can now exchange data. Each message is first compressed and then encrypted with a symmetric algorithm. An HMAC (hash MAC) for the message is also sent to allow the other side to validate message integrity.
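From a programmer’s perspective, all of these steps hide behind a socket-like interface. A minimal sketch using Python’s standard ssl module (the host name is just an example):
import socket, ssl

context = ssl.create_default_context()  # trusted CA store and sane defaults

with socket.create_connection(("www.example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="www.example.com") as tls:
        # the handshake has completed: ciphers negotiated, certificate validated
        print(tls.version())  # e.g., TLSv1.3
        tls.sendall(b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n")
        print(tls.recv(4096).decode(errors="replace"))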
TLS is widely used and generally considered secure if strong cryptography is used. Its biggest problem was a man-in-the-middle attack where the attacker would be able to send a message to renegotiate the protocol and choose one that disables encryption. Another attack was a denial-of-service attack where an attacker initiates a TLS connection but keeps requesting a regeneration of the encryption key, using up the server’s resources in the process. Both of these have been fixed.
Unidirectional vs. mutual authentication
TLS supports both unidirectional and mutual authentication. In the common case, the server sends the client its X.509 digital certificate so the client can authenticate the server by having the server prove it knows the corresponding private key. For mutual authentication, the client also sends its X.509 certificate to the server so the server can authenticate the client.
One notable aspect of TLS sessions is that, in most cases, only the server will present a certificate. Hence, the server will not authenticate or know the identity of the client. Client-side certificates have been problematic. Generating keys and obtaining trustworthy certificates is not an easy process for users. A user would have to install the certificate and the corresponding private key on every system she uses. This would not be practical for shared systems. Moreover, if a client did have a certificate, any server can request it during TLS connection setup, thus obtaining the identity of the client. This could be desirable for legitimate banking transactions but not for sites where a user would like to remain anonymous. We generally rely on other authentication mechanisms, such as the password authentication protocol, but carry them out over TLS’s secure communication channel.
Firewalls
A firewall protects the junction between an untrusted network (e.g., the external Internet) and a trusted network (e.g., the internal network). Two approaches to firewalling are packet filtering and proxies. A packet filter, or screening router, determines not only the route of a packet but whether the packet should be dropped, based on contents in the IP header, TCP/UDP header, and the interface on which the packet arrived. It is usually implemented inside a border router, also known as the gateway router, that manages the flow of traffic between the ISP and the internal network. The basic principle of firewalls is to never allow a direct inbound connection from an originating host on the Internet to an internal host; all traffic must flow through a firewall and be inspected.
The packet filter evaluates a set of rules to determine whether to drop or accept a packet. This set of rules forms an access control list, often called a chain. Strong security follows a default deny model, where packets are dropped unless some rule in the chain specifically permits them.
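A toy sketch of how such a chain is evaluated (the rule fields here are invented for illustration): rules are checked in order, the first match wins, and an unmatched packet falls through to the default-deny action.
RULES = [
    {"proto": "tcp", "dst_port": 25, "action": "accept"},  # inbound mail
    {"proto": "tcp", "dst_port": 80, "action": "accept"},  # web server
    {"proto": "udp", "dst_port": 53, "action": "accept"},  # DNS
]

def filter_packet(packet):
    for rule in RULES:
        if all(packet.get(k) == v for k, v in rule.items() if k != "action"):
            return rule["action"]
    return "drop"  # default deny: nothing matched

print(filter_packet({"proto": "tcp", "dst_port": 80}))  # accept
print(filter_packet({"proto": "tcp", "dst_port": 23}))  # drop: telnet not allowed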
First-generation packet filters implemented stateless inspection. A packet is examined on its own with no context based on previously-seen packets.
Second-generation packet filters track TCP connections and other information from previous connections. These stateful packet inspection (SPI) firewalls allow the router to keep track of outstanding TCP connections. For instance:
They can block TCP data traffic if a connection setup did not take place to avoid sequence number prediction attacks.
They can track that a connection has been established by a client to a remote server and allow return traffic to that client (which is essential for any interaction by someone inside the network with external services).
They can track connectionless UDP and ICMP messages and allow responses to be sent back to clients in the internal network. DNS queries and pings (ICMP echo-reply messages) are examples of these.
They can also understand the relationship between packets. For example, when a client establishes an FTP (file transfer protocol) connection to a server on port 21, the server establishes a connection back to the client on a different port when it needs to send data.
Packet filters traditionally do not look above the transport layer (UDP and TCP protocols and port numbers). Third-generation packet filters incorporate deep packet inspection (DPI), which allows a firewall to examine application data as well and make decisions based on its contents. Deep packet inspection can validate the protocol of an application as well as check for malicious content such as malformed URLs or other security attacks. DPI is generally considered to be part of Intrusion Prevention Systems. Examples are detecting application-layer protocols such as HTTP and then applying application-specific filters, such as checking for suspicious URLs or disallowing the download of certain ActiveX or Java applets.
Deep Packet Inspection (DPI) firewalls evolved to Deep Content Inspection (DCI) firewalls. These use the same concept but are capable of buffering large chunks of data from multiple packets that contain an entire object and acting on it, such as unpacking base64-encoded content from web and email messages and performing a signature analysis for malware.
Application proxies
An application proxy is software that presents the same protocol to the outside network as the application for which it is a proxy. For example, a mail server proxy will listen on port 25 and understand SMTP, the Simple Mail Transfer Protocol. The primary job of the proxy is to validate the application protocol and thus guard against protocol attacks (extra commands, bad arguments) that may exploit bugs in the service. Valid requests are then regenerated by the proxy to the real application that is running on another server and is not accessible from the outside network.
Application proxies are usually installed on dual-homed hosts. This is a term for a system that has two “homes”, or network interfaces: one for the external network and another for the internal network. Traffic does not pass between the two networks. The proxy is the only one that can communicate with the internal network. Unlike DPI, a proxy may modify the data stream, such as stripping headers or modifying machine names. It may also restructure the commands in the protocol used to communicate with the actual servers (that is, it does not have to relay everything that it receives).
DMZs
A typical firewalled environment is a screened subnet architecture, with a separate subnet for systems that run externally-accessible services (such as web servers and mail servers) and another one for internal systems that do not offer services and should not be accessed from the outside. The subnet that contains externally-accessible services is called the DMZ (demilitarized zone). The DMZ contains all the hosts that may be offering services to the external network (usually the Internet). Machines on the internal network are not accessible from the Internet. All machines within an organization will be either in the DMZ or in the internal network.
Both subnets will be protected by screening routers. They will ensure that no packet from the outside network is permitted into the inside network. Logically, we can view our setup as containing two screening routers:
The exterior router allows IP packets only to the machines/ports in the DMZ that are offering valid services. It would also reject any packets that are masqueraded to appear to come from the internal network.
The interior router allows packets to only come from designated machines in the DMZ that need to access services in the internal network. Any packets not targeting the appropriate services in the internal network will be rejected. Both routers will generally allow traffic to flow from the internal network to the Internet, although an organization may block certain services (ports) or force users to use a proxy (for web access, for example).
Note that the two screening routers may be easily replaced with a single router since filtering rules can specify interfaces. Each rule can thus state whether an interface is the DMZ, internal network, or Internet (ISP).
Host-based firewalls
Firewalls generally intercept all packets entering or leaving a local area network. A host-based firewall, on the other hand, runs on a user’s computer. Unlike network-based firewalls, a host-based firewall can associate network traffic with individual applications. Its goal is to prevent malware from accessing the network. Only approved applications will be allowed to send or receive network data. Host-based firewalls are particularly useful in light of deperimeterization: the boundaries of external and internal networks are sometimes fuzzy as people connect their mobile devices to different networks and import data on flash drives. A concern with host-based firewalls is that if malware manages to get elevated privileges, it may be able to shut off the firewall or change its rules.
Intrusion detection/prevention systems
An enhancement to screening routers is the use of intrusion detection systems (IDS). Intrusion detection systems are often parts of DPI firewalls and try to identify malicious behavior. There are three forms of IDS:
A protocol-based IDS validates specific network protocols for conformance. For example, it can implement a state machine to ensure that messages are sent in the proper sequence, that only valid commands are sent, and that replies match requests.
A signature-based IDS is similar to a PC-based virus checker. It scans the bits of application data in incoming packets to try to discern if there is evidence of “bad data”, which may include malformed URLs, extra-long strings that may trigger buffer overflows, or bit patterns that match known viruses.
An anomaly-based IDS looks for statistical aberrations in network activity. Instead of having predefined patterns, normal behavior is first measured and used as a baseline. An unexpected use of certain protocols, ports, or even amount of data sent to a specific service may trigger a warning.
Anomaly-based detection implies that we know normal behavior and flag any unusual activity as bad. This is difficult since it is hard to characterize what normal behavior is, particularly since normal behavior can change over time and may exhibit random network accesses (e.g., people web surfing to different places). Too many false positives will annoy administrators and lead them to disregard alarms.
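As a toy illustration of anomaly-based detection, suppose the baseline is simply request volume: we record normal rates during training and flag anything more than three standard deviations away. Real systems profile many features (protocols, ports, destinations) and face exactly the false-positive problem described above.
from statistics import mean, stdev

baseline = [120, 135, 128, 110, 142, 125, 131]  # requests/min during training
mu, sigma = mean(baseline), stdev(baseline)

def is_anomalous(observed):
    return abs(observed - mu) > 3 * sigma

print(is_anomalous(133))  # False: within normal variation
print(is_anomalous(900))  # True: unusual burst worth an alert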
A signature-based system employs misuse-based detection. It knows bad behavior: the rules that define invalid packets or invalid application layer data (e.g., ssh root login attempts). Anything else is considered good.
Intrusion Detection Systems (IDS) monitor traffic entering and leaving the network and report any discovered problems. Intrusion Prevention Systems (IPS) serve the same function but are positioned to sit between two networks like a firewall and can actively block traffic that is considered to be a threat or policy violation.
Type | Description |
---|---|
Firewall (screening router) | 1st generation packet filter that filters packets between networks. Blocks/accepts traffic based on IP addresses, ports, protocols |
Stateful inspection firewall | 2nd generation packet filter. Like a screening router but also takes into account TCP connection state and information from previous connections (e.g., related ports for TCP) |
Deep Packet Inspection firewall | 3rd generation packet filter. Examines application-layer protocols |
Application proxy | Gateway between two networks for a specific application. Prevents direct connections to the application from outside the network. Responsible for validating the protocol |
IDS/IPS | Can usually do what a stateful inspection firewall does + examine application-layer data for protocol attacks or malicious content |
Host-based firewall | Typically screening router with per-application awareness. Sometimes includes anti-virus software for application-layer signature checking |
Host-based IPS | Typically allows real-time blocking of remote hosts performing suspicious operations (port scanning, ssh logins) |
Web security
When the web browser was first created, it was relatively simple: it parsed static content for display and presented it to the user. The content could contain links to other pages. As such, the browser was not an interesting security target. Any dynamic modification of pages was done on servers and all security attacks were focused on those servers. These attacks included malformed URLs, buffer overflows, root paths, and unicode attacks.
The situation is vastly different now. Browsers have become insanely complex:
Built-in JavaScript to execute arbitrary downloaded code
The Document Object Model (DOM), which allows JavaScript code to change the content and appearance of a web page.
XMLHttpRequest, which enables JavaScript to make HTTP requests back to the server and fetch content asynchronously.
WebSockets, which provide a more direct link between client and server without the need to send HTTP requests.
Multimedia support: HTML5 added direct support for <audio>, <video>, and <track> tags, as well as MediaStream recording of both audio and video and even speech recognition and synthesis (with the Chrome browser, for now)
Access to on-device sensors, including geolocation and tilt
The NaCl framework on Chrome, providing the ability to run native apps in a sandbox within the browser
The model evolved from simple page presentation to that of running an application. All these features provide a broader attack surface. The fact that many features are relatively new and more are being developed increases the likelihood of more bugs and therefore more security holes. Many browser features are complex and developers won’t always pay attention to every detail of the specs (see quirksmode.org). This leads to an environment where certain less-common uses of a feature may have bugs or security holes on certain browsers.
Multiple sources
Traditional software is installed as a single application. The application may use external libraries, but these are linked in by the author and tested. Web apps, on the other hand, dynamically load components from different places. These include fonts, images, scripts, and video as well as embedded iFrames that embed HTML documents within each other. The JavaScript code may issue XMLHttpRequests to yet additional sites.
One security concern is that of software stability. If you import JavaScript from several different places, will your page still display correctly and work properly in the future as those scripts are updated and web standards change? Do those scripts attempt to do anything malicious? Might they be modified by their author to do something malicious in the future?
Then there’s the question of how elements on a page should be allowed to interact. Can some analytics code access JavaScript variables that come from a script downloaded from jQuery.com on the same web page? The scripts came from different places, but the page author selected them for the page, so maybe it’s ok for them to interact. Can analytics scripts interact with event handlers? If the author wanted to measure mouse movements and keystrokes, perhaps it’s ok for a downloaded script to use the event handler. How about embedded frames? To the user, the content within a frame looks like it is part of the rest of the page. Should scripts work any differently?
Frames and iFrames
A browser window may contain a collection of documents from different sources. Each document is rendered inside a frame. In the most basic case, there is just one frame: the document window. A frame is a rigid division that is part of a frameset, a collection of frames. Frames are not officially supported in HTML5, the latest version of HTML, but many browsers still support them. An iFrame is a floating inline frame that moves with the surrounding content. iFrames are supported in HTML5. When we talk about frames, we will be talking about frames created with an iFrame tag.
Frames are generally invisible to users and are used to delegate screen area to content from another source. A very basic goal of browser security is to isolate visits to separate pages in distinct windows or tabs. If you visit a.com and b.com in two separate tabs, the address bar will identify each of them and they will not share information. Alternatively, a.com may have frames within it (e.g., to show ads from other sites), so b.com may be a frame within a.com. Here, too, we would like the browser to provide isolation between a.com and b.com even though b.com is not visible as a distinct site to the user.
Same-origin policy
The security model used by web browsers is the same-origin policy. A browser permits scripts in one page to access data in a second page only if both pages have the same origin. An origin is defined to be the URI scheme (http vs. https), the hostname, and the port. For example, http://www.poopybrain.com/419/test.html and http://www.poopybrain.com/index.html have the same origin since they both use http, both use port 80 (the default http port since none is specified), and have the same hostname (www.poopybrain.com). If any of those components were different, the origin would not be the same. For instance, www.poopybrain.com is not the same hostname as poopybrain.com.
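The comparison can be sketched with Python’s URL parser: two URLs share an origin exactly when their (scheme, hostname, port) triples match, with the port defaulting from the scheme.
from urllib.parse import urlsplit

def origin(url):
    u = urlsplit(url)
    port = u.port or {"http": 80, "https": 443}.get(u.scheme)
    return (u.scheme, u.hostname, port)

print(origin("http://www.poopybrain.com/419/test.html") ==
      origin("http://www.poopybrain.com/index.html"))  # True: same origin
print(origin("http://www.poopybrain.com/") ==
      origin("http://poopybrain.com/"))                # False: different hostname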
Under the same-origin policy, each origin has access to common client-side resources that include:
Cookies: Key-value data that clients or servers can set. Cookies associated with the origin are sent with each http request.
JavaScript namespace: Any functions and variables defined or downloaded into a frame share that frame’s origin.
DOM tree: This is the JavaScript definition of the HTML structure of the page.
DOM storage: Local key-value storage.
Each frame gets the origin of its URL. Many pages will have just one frame: the browser window. Other pages may embed other frames. Each of those embedded frames will not have the origin of the outer frame but rather the URL of the frame contents. Any JavaScript code downloaded into a frame will execute with the authority of its frame’s origin. For instance, if cnn.com loads JavaScript from jQuery.com, the script runs with the authority of cnn.com. Passive content, which is non-executable content such as CSS files and images, has no authority. This normally should not matter as passive content does not contain executable code but there have been attacks in the past that had code in passive content and made that passive content turn active.
Cross-origin content
As we saw, it is common for a page to load content from multiple origins. The same-origin policy states that JavaScript code from anywhere runs with the authority of the frame’s origin. Content from other origins is generally not readable or writable by JavaScript.
A frame can load images from other origins but cannot inspect that image. However, it can infer the size of the image by examining the changes to surrounding elements after it is rendered.
A frame may embed CSS (cascading stylesheets) from any origin but cannot inspect the CSS content. However, JavaScript in the frame can discover what the stylesheet does by creating new DOM nodes (e.g., a heading tag) and see how the styling changes.
A frame can load JavaScript, which executes with the authority of the frame’s origin. If the source is downloaded from another origin, it is executable but not readable. However, one can use JavaScript’s toString method to decompile a function and get a string representation of the function’s declaration. All these restrictions are somewhat ineffective anyway since a curious user can download any of that content directly (e.g., via the curl command) and inspect it.
Cross-Origin Resource Sharing (CORS)
Even though content may be loaded from different origins, browsers restrict cross-origin HTTP requests that are initiated from scripts (e.g., via XMLHttpRequest or Fetch). This can be problematic at times since sites such as poopybrain.com and www.poopybrain.com are considered distinct origins, as are http://poopybrain.com and https://poopybrain.com.
Cross-Origin Resource Sharing (CORS) was created to allow web servers to specify cross-domain access permission. This will allow scripts on a page to issue HTTP requests to approved sites. It also allows access to Web Fonts, inspectable images, and access to stylesheets. CORS is enabled by an HTTP header that states allowable origins. For example,
Access-Control-Allow-Origin: http://www.example.com
means that the URL http://www.example.com will be treated as the same origin as the frame’s URL.
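On the server side, granting cross-origin access can be as simple as echoing back the header for approved origins. A sketch (the origin list is invented):
ALLOWED_ORIGINS = {"http://www.example.com", "https://app.example.com"}

def cors_headers(request_origin):
    if request_origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}  # no header: the browser blocks the cross-origin read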
Cookies
Cookies are name-value sets that are designed to maintain state between a web browser and a server. Cookies are sent to the server along with HTTP requests and servers may send back cookies with a response. Uses for cookies include storing a session ID that identifies your browsing session to the server (including a reference to your shopping cart or partially-completed form), storing shopping cart contents directly, or tracking which pages you visited on the site in the past (tracking cookies). Cookies are also used to store authentication information so you can be logged into a page automatically upon visiting it (authentication cookies).
Now the question is: which cookies should be sent to a server when a browser makes an HTTP request? Cookies don’t quite use the same concept of an origin. The scope of a cookie is defined by its domain and path. Unlike origins, the scheme (http or https) is ignored by default, as is the port. The path is the path under the root URL, which is ignored for determining origins but used with cookies. Unless otherwise defined by the server, the default domain and path are those of the frame that made the request.
A client cannot set cookies for a different domain. A server, however, can specify top-level or deeper domains. Setting a cookie for a domain example.com will cause that cookie to be sent whenever example.com or any domain under example.com is accessed (e.g., www.example.com). For the cookie to be accepted by the browser, the domain must include the origin domain of the frame. For example, if you are on the page www.example.com, your browser will accept a cookie for example.com but will not accept a cookie for foo.example.com.
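The acceptance rule in the last example can be sketched as a suffix check (real browsers additionally reject overly broad domains, such as bare public suffixes like .com):
def cookie_domain_ok(frame_host, cookie_domain):
    # the cookie domain must be the frame's host or a parent domain of it
    return frame_host == cookie_domain or frame_host.endswith("." + cookie_domain)

print(cookie_domain_ok("www.example.com", "example.com"))      # True: accepted
print(cookie_domain_ok("www.example.com", "foo.example.com"))  # False: rejected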
Cookies often contain user names, complete authentication information, or shopping cart contents. If malicious code running on the web page could access those cookies, it could modify your cart, get your login credentials, or even modify cookies related to cloud-based services to have your documents or email get stored to a different account. This is a very real problem and two safeguards were put in place:
A server can tag a cookie with an HttpOnly flag. This will not allow scripts on the page to access the cookie, so it is useful to keep scripts from modifying or reading user identities or session state.
HTTP messages are sent via TCP. Nothing is encrypted. An attacker that has access to the data stream (e.g., a man in the middle or a packet sniffer) can freely read or even modify cookies. A Secure flag was added to cookies to specify that they can be sent only over an HTTPS connection:
Set-Cookie: username=paul; path=/; HttpOnly; Secure
If a user is making requests via HTTP, Secure cookies will not be transmitted.
Cross-Site Request Forgery (XSRF)
Cross-site request forgery is an attack that sends unauthorized requests from a user that the web server trusts. Let’s consider an example. You previously logged into Netflix. Because of that, the Netflix server sent an authentication cookie to your browser; you will not have to log in the next time you visit netflix.com. Now you go to another website that contains a malicious link or JavaScript code to access a URL. The URL is:
http://www.netflix.com/JSON/AddToQueue?movieid=860103
By hitting this link on this other website, the attacker added Plan 9 from Outer Space to your movie queue (this attack really worked with Netflix but has been fixed). This may be a minor annoyance, but the same attack could create more malicious outcomes. Instead of Netflix, the attack could take place against an e-commerce site that accepts your stored credentials but allows the attacker to specify a different shipping address in the URL. More dangerously, a banking site may use your stored credentials and account number. Going to the malicious website may enable the attacker to request a funds transfer to another account:
http://www.bank.com/action=transfer&amount=1000000&to_account=417824919
Note that the attack works because of how cookies work. You visited a random website but inadvertently requested another site. Your browser dutifully sends an HTTP GET request to that site with the URL specified in the link and also sends all the cookies for that site. The attacker never steals your cookies and does not intercept any traffic. The attack is simply the creation of a URL that makes it look like you requested some action.
There are several defenses against cross-site request forgery:
The server can validate the Referer header on the request. This will tell it whether the request came via a link or directly from a user (or from a link on a trusted site).
The server can require some unique token to be present in the request. For instance, visiting netflix.com might cause the Netflix server to return a token that will need to be passed to any successive URL. An attacker will not be able to create a static URL on her site that will contain this random token (see the sketch below).
The interaction with the server can use HTTP POST requests instead of GET requests, placing all parameters into the body of the request rather than in the URL. State information can be passed via hidden input fields instead of cookies.
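A sketch of the token defense (the names here are invented for illustration): the server issues a random token at the start of a session and rejects any state-changing request that does not echo it back. A forged URL on an attacker’s site cannot include a token the attacker never saw.
import secrets, hmac

session_tokens = {}  # session id -> expected anti-forgery token

def issue_token(session_id):
    token = secrets.token_urlsafe(32)
    session_tokens[session_id] = token
    return token  # embedded in forms or follow-up URLs served to the browser

def check_token(session_id, submitted):
    expected = session_tokens.get(session_id, "")
    return hmac.compare_digest(expected, submitted)  # constant-time compare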
Clickjacking
Clickjacking is a deception attack where the attacker overlays an image to have the user think that he is clicking some legitimate link or image but is really requesting something else. For example, a site may present a “win a free iPad” image. However, malicious JavaScript in the page can place an invisible frame over this image that contains a link. Nothing is displayed to obstruct the “win a free iPad” image but when a user clicks on it, the link that is processed is the one in the invisible frame. This malicious link could download malware, change security settings for the Flash plug-in, or redirect the user to a page containing malware or a phishing attack.
A defense for clickjacking is to use defensive JavaScript in the legitimate code to check that the content is at the topmost layer:
window.self == window.top
If it isn’t, then the content is obstructed, possibly by an invisible clickjacking attack. Another defense is to have the server send an X-Frame-Options HTTP header to instruct the browser to not allow framing from other domains.
Screen sharing
HTML5, the latest standard for HTML, added a screen-sharing API. Normally, no cross-origin communication is permitted between client and server. The screen-sharing API violates this. If a user grants screen-sharing permission to a frame, the frame can take a screenshot of the entire display (the entire monitor, all windows, and the browser). It can also get screenshots of pages hidden by tabs in a browser.
This is not a security hole and there are no exploits (yet) to enable screen sharing without the user’s explicit opt-in. However, it is a risk because the user might not be aware of the scope or duration of screen sharing. If you believe that you are sharing one browser window, you may be surprised to discover that the server was examining all your screen content.
Input sanitization
In the past we saw how user input that becomes a part of database queries or commands can alter those commands and, in many cases, enable an attacker to add arbitrary queries or commands. The same applies to URLs, HTML source, and JavaScript. Any user input needs to be parsed carefully before it can be made part of a URL, HTML content, or JavaScript. Consider a script that is generated with some in-line data that came from a malicious user:
<script> var x = "untrusted_data"; </script>
The malicious user might define that untrusted_data to be
Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="bye
The resulting script to set the variable x now becomes
<script> var x = "Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x=bye"; </script>
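Proper sanitization replaces the characters that can terminate the string or introduce new tags. Python’s standard library shows the idea for HTML contexts (data placed inside a <script> block needs JavaScript-specific escaping as well):
import html

untrusted = 'Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code...'
print(html.escape(untrusted))
# &quot;, &lt;, &gt;, and &amp; replace the dangerous characters, so the
# input can no longer close the script tag or inject new elements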
Cross-site scripting
Cross-site Scripting (XSS) is a code injection attack that allows the attacker to inject client-side scripts into web pages. It can be used to bypass the same-origin policy and other access controls. Cross-site scripting has been one of the most popular browser attacks.
The attack may be carried out in two ways: a URL that a user clicks on and gets back a page with the malicious code and by going to a page that contains user content that may include scripts.
In a Reflected XSS attack, all malicious content is in a page request, typically a link that an unsuspecting user will click on. The server will accept the request without sanitizing the user input and present a page in response. This page will include that original content. A common example is a search page that will display the search string before presenting the results (or a “not found” message). Another example is an invalid login request that will return with the name of the user and a “not found” message. Consider a case where the search string or the login name is not just a bunch of characters but text to a script. The server treats it as a string, does the query, cannot find the result, and sends back a page that contains that string, which is now processed as inline JavaScript code.
www.mysite.com/login.asp?user=<script>malicious_code(…) </script>
In a Persistent XSS attack, user input is stored at a site and later presented to other users. Consider online forums or comment sections for news postings and blogs. If you enter inline JavaScript as a comment, it will be placed into the page that the server constructs for any future people who view the article. The victim will not even have to click a link to run the malicious payload.
Cross-site scripting is a problem of input sanitization. Servers will need to parse input that is expected to be a string to ensure that it does not contain embedded HTML or JavaScript. The problem is more difficult with HTML because of its support for encoded characters. A parser will need to check not only for “script” but also for “%3cscript%3e”. As we saw earlier, there may be several acceptable Unicode encodings for the same character.
Cross-site scripting, by executing arbitrary JavaScript code, can:
- Access cookies related to that website
- Hijack a session
- Create arbitrary HTTP requests with arbitrary content via XMLHttpRequest
- Make arbitrary modifications to the HTML document by modifying the DOM
- Install keyloggers
- Download malware – or run JavaScript ransomware
- Try phishing by manipulating the DOM and adding a fake login page
The main defense against cross-site scripting is to sanitize all input. Some web frameworks do this automatically. For instance, Django templates allow the author to specify where generated content is inserted (e.g., <b> hello, {{name}} </b>) and perform the necessary sanitization to ensure it does not modify the HTML or add JavaScript.
Other defenses are:
Use a less-expressive markup language for user input, such as Markdown, if you want to give users the ability to enter rich text. However, input sanitization is still needed to ensure there are no HTML or JavaScript escapes.
Employ a form of privilege separation by placing untrusted content inside a frame with a different origin. For example, user comments may be placed in a separate domain. This does not stop XSS damage but limits it to the domain.
Use the Content Security Policy (CSP). The content security policy was designed to defend against XSS and clickjacking attacks. It allows website owners to tell clients what content is allowed, whether inline code is permitted, and whether the origin should be redefined to be unique.
SQL injection
We previously saw that SQL injection is an issue in any software that uses user input as part of the SQL query. The same applies to browsers. Many web services have databases behind them and links often contain queries mixed with user input. If input is not properly sanitized, it can alter the SQL query to modify the database, force a user authentication, or return the wrong data.
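The standard defense is parameterized queries, which bind user input as data so it can never change the structure of the statement. A sketch with Python’s sqlite3 module:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

user_input = "alice' OR '1'='1"  # an injection attempt
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # []: the attack string was treated as a literal name, not SQL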
GIFAR attack
The GIFAR attack is a way to embed malicious code into an image file. Sites that allow user-uploadable pictures are vulnerable. GIFAR is a pseudo-concatenation of GIF and JAR.
Java applets are sent as JAR files. A Java JAR file is essentially a zip file, a popular format for compressing and archiving multiple files. In JAR files, the header that contains information about the content is stored at the end of the file.
GIF files are lossless image files. The header in GIF files, as with most other file formats, is stored at the beginning of the file.
GIF and JAR files can be combined together to create a GIFAR file. Because the GIF header is at the beginning of the file, the browser believes it is an image and opens it as such, trusts its content, unaware that it contains code. Meanwhile the Java virtual machine (JVM) recognizes the JAR part of the file, which is run as an applet in the victim’s browser.
An attacker can use cross-site scripting to inject a request to invoke the applet (<applet archive="myimage.gif">), which will cause it to run in the context of the origin (the server that hosted it). Because the code is run as a Java applet rather than JavaScript, it bypasses the “no authority” restriction the browser imposes on JavaScript in images.
HTML image tag vulnerability
We saw that the same-origin policy treats images as static content with no authority. It would seem that images should not cause problems (ignoring the now-patched GIFAR vulnerability that allowed images to inject Java applets). However, an image tag (IMG) can pass parameters to the server, just like any other URL:
<img src="http://evil.com/images/balloons.jpg?extra_information" height="300" width="400"/>
This can be used to notify the server that the image was requested from specific content. The web server will also know the IP address that sent the request. The image itself can be practically hidden by setting its size to a single pixel:
<img src="..." height="1" width="1" />
This is sometimes done to track messages sent to users. If I send you HTML-formatted mail that contains a one-pixel image, you will not notice the image but my server will be notified that the image was downloaded. If the IMG tag contains some text to identify that this is related to the mail message I sent you, I will now know that you read the message.
Images can also be used for social engineering: to disguise a site by appropriating logos from well-known brands or adding certification logos.
Mixed HTTP and HTTPS content
A web page that was served via HTTPS might contain a reference to a URL, such as a script, that specifies HTTP content:
<script src="http://www.mysite.com/script.js"> </script>
The browser would follow the scheme in the URL and download that content via HTTP rather than over the secure link. An active network attacker can hijack that session and modify the content. A safer approach is to not specify the scheme for scripts, which would cause them to be served over the same protocol as their embedding frame.
<script src="//www.mysite.com/script.js"> </script>
Some browsers give warning of mixed content but the risks and knowledge of what really is going on might not be clear to users.
Extended Validation Certificates
TLS establishes a secure communication link between a client and server. For the authentication to be meaningful, the user must be convinced that the server’s X.509 certificate truly belongs to the entity that is identified in the certificate. Would you trust a bankofamerica.com certificate issued by the Rubber Ducky Cert Shack? Even legitimate issuers such as Symantec offer varying levels of validating a certificate owner’s identity.
The lowest level of identity assurance for organizations is a domain validated certificate. To validate the user, the certificate authority will validate that some contact at that domain approves the request. This is usually done through email. It does not prove that the user has legal authority to act on behalf of the company nor is there any validation of the company. They require consent of the domain owner but do not try to validate who that owner is. They offer only incrementally more identity binding than self-signed certificates.
With extended validation (EV) certificates, the certificate authority uses a more rigorous, human-driven validation process. The legal and physical presence of the organization is validated. Then, the organization is contacted through a verified phone number and both the contact and the contact’s supervisor must confirm the authenticity of the request.
An extended validation certificate contains the usual data in a certificate (public key, issuer, organization, …) but must also contain a government-registered serial number and a physical address of the organization.
An attacker could get a low-level certificate and set up a web site. Targets would go to it, see the lock icon on their browser’s address bar that indicates an SSL connection, and feel secure. This led users to a false sense of security: the connection is encrypted but there is no reason to believe that there is validity to the organization on the other side.
Modern browsers identify and validate EV certificates. Once validated, the browser presents an enhanced security indicator that identifies the certificate owner.
Browser status bar
Most browsers offer an option to display a status bar that shows the URL of a link before you click it. This bar is trivial to spoof by adding an onclick attribute to the link that invokes JavaScript to take the page to a different link. In the example below, hovering over the PayPal link will show a link to http://www.paypal.com/signin, which appears to be a legitimate PayPal login page. Clicking on that link, however, will take the user to http://www.evil.com.
<a href="http://www.paypal.com/signin"
onclick="this.href = 'http://www.evil.com/';">PayPal</a>
Mobile device security
What makes mobile devices unique?
In many ways, mobile devices should not be different from laptops or other computer systems. They run operating systems that are derived from those systems, run multiple apps, and connect to the network. There are differences, however, that make them more attractive targets for attackers.
Users
Several user factors make phones different from most computing devices:
Mobile users often do not think of their phones as real computers. They may not have the same level of paranoia that malware may get in or their activities may be monitored.
Users tend to install a lot more apps on their phones than they do on their computers. These apps are more likely to be from unknown software vendors than those installed on computers.
Social engineering may work more easily on phones. People are often in distracted environments when using their phones and may not pay attention to realize they are experiencing a phishing attack.
Phones are small. Users may be less likely to notice some security indicators, such as an EV certificate indicator. It is also easier to lose the phone … or have it stolen.
A lot of phones are protected with bad PINs. Four-digit PINs still dominate and, as with passwords, people tend to pick bad ones – or at least common ones. In fact, a handful of common PINs (1234, 1111, 0000, 1212, 7777) account for over 20% of PINs chosen by users.
While phones have safeguards to protect resources that apps can access, users may grant app permission requests without thinking: they will just click through during installation to get the app up and running.
Interfaces
Phones have many sensors built into them: GSM, Wi-Fi, Bluetooth, and NFC radios as well as a GPS, microphone, camera, 6-axis gyroscope and accelerometer, and even a barometer. These sensors can enable attackers to monitor the world around you: identify where you are and whether you are moving. They can record conversations and even capture video. The sensors are so sensitive that it has been demonstrated that a phone on a desk next to a keyboard can pick up vibrations from a user typing on the neighboring keyboard, achieving a word recovery rate of 80%.
Apps
There are a lot of mobile apps. Currently, there are about 2.6 million Android apps and 2.2 million iOS apps. Most of these apps are written by unknown, and hence untrusted, parties. We would be wary of downloading many of these on our PCs but think nothing of doing so on our phones. We place our trust in several areas:
- The testing & approval process by Google (automated) and Apple (automated + manual)
- The ability of the operating system to sandbox an application
- The operating system’s requirement of users granting permissions to access certain resources.
This trust may be misplaced as the approval process is far from foolproof. Overtly misadvertised or malicious apps can be detected but it is impossible to discern what a program will do in the future. Sandboxes have been broken in the past and users may be too happy to grant permissions to apps. Moreover, apps often ask for more permissions than they use. For example, a security researcher surveyed flashlight apps available for Android and discovered that, of the 937 apps surveyed, the majority requested an average of 25 permissions per app.
Most apps do not get security updates. There is little economic incentive for a developer to support existing apps, particularly if newer ones have been deployed.
Platform
Mobile phones are comparable to desktop systems in complexity. In some cases, they may even be more complex. This points to the fact that, like all large systems, they will have bugs and some of these bugs will be security sensitive. For instance, in late March, 2017, Apple released an upgrade for iOS, stating that they fixed over 80 security flaws. This is almost 10 years after the release of the iPhone. You can be certain there are many more flaws lurking in the system and more will be added as new features are introduced.
Because of bugs in the system, malicious apps may be able to get root privileges. If they do, they can install rootkits, enabling long-term control while concealing their presence. A lot of malicious iOS apps, for instance, gained root privileges by exploiting heap overflow vulnerabilities.
Unlike desktop systems and laptops, phones enforce a single user environment. Although PCs are usually used as single-user systems, they support multiple user accounts and run a general-purpose timesharing operating system. Mobile devices are more carefully tuned to the single-user environment.
Threats
Mobile devices pose threats to personal privacy and are at risk of traditional security violations. Personal privacy threats include identifying users and user location, accessing the camera and microphone, and leaking personal data from the phone over the network. Additional threats include traditional phishing attacks, malware installation, malicious Android intents (messages to other apps or services), and overly-broad access to system resources and sensors.
Android security
Android was conceived as an operating system for network-connected smart mobile devices. The company was acquired by Google in 2005 and the engineering effort shifted to developing it as a Linux-based operating system platform that would be provided for free to third-party phone manufacturers. Google would make money from services and apps.
Applications on Android had specific needs, some of which have not been priorities in the design of desktop operating systems. These include:
- Integrity
- The platform should ensure that the app is not modified between the time of its creation and its installation by users.
- Isolation
- Each app’s components and private data need to be protected from other apps.
- Sharing
- Apps may need access to shared storage and devices, including the network. This includes the on-device file system, external storage, communication interfaces, and the various sensors available on phones and other smart devices.
- Inter-app services
- An app needs to be able to send messages to other apps to interact with services – but only when it is permitted to do so. This affects the design of apps. Desktop apps have a single launching point and generally run as a single, monolithic process. Android apps, on the other hand, contain multiple independent components that include activities (user-facing parts), services (background components), and content providers (data access components).
- Portability
- The previous needs are general and apply to other mobile platforms, such as iOS. Since Android was designed as an operating system for a variety of hardware, apps should be able to run on different hardware architectures. A decision was made that apps should be distributed in a portable, architecture-neutral format. Apps run under the ART (Android Runtime) virtual machine, which is a variant of the Java virtual machine (JVM). The original intention in Android was that apps would be written only in Java but it soon became clear that support for native (C and C++) code was needed and Google introduced the Native Development Kit to support this.
App package
An Android app is distributed as a package in an APK format, which is a single zip-format compressed file that contains several components:
- Activity
- Code for the user-visible component - the interface. The code component includes compiled code and needed resource files, such as strings, images, UI layouts, and data.
- Service
Code for the background component – services that the app may offer to other programs
- Content provider
- A database for whatever persistent data the app needs to store. It implements APIs to provide access to structured data. For instance, user contacts will be stored in a content provider.
- Broadcast receiver
- Mailbox for received messages
- Package manifest (META-INF)
- This contains items needed to validate the origin of the package and that it has not been tampered with: - Signed list of hashes - Application creator’s certificate
- Application manifest
- This enumerates what makes up the application and includes: - Name (e.g., com.example.myapp) - Components list - Device requirements - Intents: interfaces the app supports to activate services & activities - Permissions: access to services this app requires (e.g., android.permission.SEND_SMS) - Permissions other apps need to access services this app provides
App Integrity
All Android apps must be signed by the developer. This allows both developers and users to know that the application will be installed without modifications on their device. On installation, the Android Package Manager verifies the signature.
Applications do not have to be signed by any central authorities. Developers can use a self-signed certificate and Android does not perform verification of CAs (certification authorities) in the certificate.
Prior to distribution, the contents of the APK package are hashed and signed by the developer and then the signature along with the developer’s certificate is inserted into the APK.
The app package is verified prior to installation by creating a hash and validating it against the signature in the package. If the app is distributed through the Google Play store, Google performs the same checks.
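A simplified sketch of the idea using the Python cryptography package (Android’s actual APK signing scheme uses X.509 certificates and its own package format, so this is only an illustration): the developer signs the package contents, and the verifier checks the signature with the public key shipped alongside the package.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

dev_key = Ed25519PrivateKey.generate()  # developer's signing key
package = b"...contents of the app package..."
signature = dev_key.sign(package)       # created when the package is built

public_key = dev_key.public_key()       # distributed in the developer's certificate
public_key.verify(signature, package)   # raises InvalidSignature if tampered
print("package verified")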
For additional protection, Google optionally supports a feature called Google Play Protect. This validates the app before it is downloaded and checks the user’s device for potential malware. It warns the user about malware or any apps that violate Google’s Unwanted Software Policy, such as apps that hide or misrepresent information.
The Android app sandbox
Android relies on process sandboxing for most of its security. Android is based on Linux, which is a multi-user operating system. Under Linux, each user has a distinct user ID and all apps run by that user run with the privileges of the user (ignoring setUID apps). This allows any one app full access to all user data.
User IDs
Android supports only a single user and instead uses Linux user IDs for isolating app privileges. Under Android, each app normally runs under a different user ID. Hence, apps are isolated and can access only their own resources. Access requests to other objects involve messages that pass through a gatekeeper, which validates the requests.
Core Android services also run in a similar manner, with each service running under its own unique user ID. For example:
User ID | Service |
---|---|
1001 | Telephony |
1002 | Bluetooth |
1003 | Graphics |
1004 | Input devices |
1005 | Audio |
Related apps may share the same Linux user ID if a sharedUserId attribute is set to the same value for two or more applications, as long as those apps are also signed with the same certificate. This allows the related apps to share files, and they can even be configured to share the same virtual machine.
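As a small illustration of this model, an app can inspect the Linux UID it was assigned. This sketch uses standard Android calls (the log tag is arbitrary):

```java
import android.content.pm.PackageManager;
import android.os.Process;
import android.util.Log;

public class UidDemo {
    // Logs the Linux user ID this app runs under and the name Android
    // associates with that UID (normally the app's package name, or a
    // shared name if sharedUserId is in effect).
    static void logUid(PackageManager pm) {
        int uid = Process.myUid();            // per-app Linux UID
        String name = pm.getNameForUid(uid);
        Log.i("UidDemo", "running as uid=" + uid + " (" + name + ")");
    }
}
```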
File permissions
Two mechanisms are used to enforce file access permissions:
- Linux file permissions
- These provide discretionary access control, allowing the owner (and root) to change permissions to allow others access to the files. With this mechanism, an app can decide to share a data file.
- SELinux mandatory access control
- Certain data and cache directories in Android are protected with the SELinux (Security-Enhanced Linux) mandatory access control (MAC) kernel extension. This ensures that even the owner cannot change access permissions for the files.
Internal storage provides a per-app private directory for files used by each application. External storage (e.g., attached microSD cards or USB devices) is shared among all apps and, of course, may be moved to other computers.
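To make the distinction concrete, here is a hedged sketch in Java (the file names are made up for illustration) of writing to the app's private internal storage, which is owned by the app's UID, versus shared external storage:

```java
import android.content.Context;
import android.os.Environment;
import java.io.File;
import java.io.FileOutputStream;

public class StorageDemo {
    // Writes to the app's private internal storage. MODE_PRIVATE means
    // the file is owned by this app's Linux UID and is inaccessible to
    // other apps.
    static void writePrivate(Context ctx, byte[] data) throws Exception {
        try (FileOutputStream out =
                 ctx.openFileOutput("secrets.dat", Context.MODE_PRIVATE)) {
            out.write(data);
        }
    }

    // Writes to shared external storage; any app with external-storage
    // access (and any computer the media is moved to) can read the file.
    static void writeShared(byte[] data) throws Exception {
        File dir = Environment.getExternalStoragePublicDirectory(
                Environment.DIRECTORY_DOCUMENTS);
        File f = new File(dir, "shared.dat");
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data);
        }
    }
}
```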
Intents
Android apps communicate with system services, between app components, and with other apps via intents. Intents are messaging objects. An intent is a message that contains:
- the requested action
- the data being sent to the action
- the component and app that should handle the intent
Intents are declarations of app capabilities and the messaging format used to communicate with an app. They identify app components and how those components are started (e.g., foreground or background and what the entry points are).
Intents allow an app to:
- Start a service (background task)
- Start an activity (start a user-facing foreground task, such as a camera or phone)
- Deliver notifications to one or more apps (broadcasts)
An app lists its available intents in its app manifest and these intents are registered when the app is installed. The intents form the list of services that the app exposes to other applications. If several apps register the same intent, the user selects which app should be launched. For example, you may have multiple browsers installed and are asked to pick the one that should be associated with implicit intents to display a URL.
Intents may be explicit or implicit. With explicit intents, the app identifies the target component in the intent; that is, the intent is sent to a specific, named app. With implicit intents, the app asks Android to find a component based on the type of data being sent. For example, sending a URL to display a web page can be an implicit intent that will cause Android to open the default web browser.
Intents are used to invoke system services as well as services available on any installed apps. Common examples of intents are: add a calendar event, set an alarm, take a photo & return it, view a contact, add a contact, show a location on a map, retrieve a file, or initiate a phone call.
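A minimal sketch of both styles in Java (the package, class, and URL names are placeholders):

```java
import android.content.Context;
import android.content.Intent;
import android.net.Uri;

public class IntentDemo {
    // Explicit intent: names the exact package and component that should
    // handle the request (placeholder names), so Android does no resolution.
    static void startExplicit(Context ctx) {
        Intent i = new Intent();
        i.setClassName("com.example.myapp",
                       "com.example.myapp.SettingsActivity");
        ctx.startActivity(i);
    }

    // Implicit intent: states only the action and the data. Android
    // resolves it to a registered component, e.g., the default browser.
    static void viewUrl(Context ctx) {
        Intent i = new Intent(Intent.ACTION_VIEW,
                              Uri.parse("https://example.com"));
        ctx.startActivity(i);
    }
}
```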
Intents pass through & are validated by Android’s gatekeeper.
Permissions
An app manifest also contains permissions, which can specify which apps or services are allowed access to the app’s services and whether a user needs to be prompted to grant permission. Permissions determine whether one app is allowed to access another app’s component.
Apps need permissions to access any services, which include the following (see the sketch after this list):
- System resources: logs, battery levels, …
- System interfaces: Internet, Bluetooth, send SMS, send email, …
- Sensitive data: SMS messages, contacts, email, …
- Any app-defined services
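As referenced above, here is a minimal sketch (Java, using the AndroidX compatibility helpers; the request code is arbitrary) of checking for and requesting a dangerous permission at runtime:

```java
import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

public class PermissionDemo {
    static final int REQ_SEND_SMS = 42;   // arbitrary request code

    // Asks the user to grant SEND_SMS (a "dangerous" permission) if the
    // app does not already hold it. The user's decision is delivered
    // asynchronously to the activity's onRequestPermissionsResult().
    static void ensureSmsPermission(Activity activity) {
        if (ContextCompat.checkSelfPermission(activity,
                Manifest.permission.SEND_SMS)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(activity,
                new String[] { Manifest.permission.SEND_SMS },
                REQ_SEND_SMS);
        }
    }
}
```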
Every service, whether part of a normal app or a system service, is assigned a protection level that determines who may access it:
Permission | Type |
---|---|
Normal | This is the default; there is no danger to the user or system if this service is accessed |
Dangerous | Access can compromise the system or the user’s privacy. The user has to approve access at installation or at runtime |
Signature | Access is granted only if the requesting app is signed by the same developer and contains the same certificate. This allows related apps to share services (e.g., Microsoft Office apps) |
SignatureOrSystem | Similar to Signature, but access will also be granted if a system application is requesting it |
The application manifest file defines the type of permission and the service associated with each permission name.
Permissions are managed in two forms:
- Permission text strings
- These are enforced by Android middleware. Sensitive resources such as the phone are only accessible via APIs and access is mediated through these APIs.
- Linux group IDs
- Group permissions are enforced by Linux file access checks. For efficiency, networking and file access operations do not go through APIs but directly to Linux. This includes access to Bluetooth, Wi-Fi, and external storage. To be able to access resources, the app needs to be a member of the group that corresponds to the resource. Android dynamically adds user IDs to various groups based on what permissions are granted to them.
Other protections
The Linux operating system provides per-process memory isolation and address space layout randomization (ASLR). Linux also uses no-execute (NX) protection on stack and heap memory pages if the processor supports it.
The Java compiler provides stack canaries, and its memory management libraries provide some heap overflow protections (checks of backward & forward pointers in dynamically allocated structures).
Android supports whole disk encryption so that if a device is stolen, an attacker will not be able to easily recover file contents even with raw access to the flash file system.
Unlike iOS, Android supports the concurrent execution of multiple apps. It is up to the developer to be frugal with battery life. Apps store their state in persistent memory so they can be stopped and restarted at any time. This ability to stop an app also helps with DoS attacks, as a stopped app is not accepting requests or using system resources.
Security concerns
An app can probe whether another app has specific permissions by specifying a permission with an intent method call to that app. This can help an attacker identify a target app. Receivers need to be able to handle malicious intents, even for actions they do not expect to handle and for data that might not make sense for the action.
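One hedged illustration of such probing (Java; the action string and target permission are placeholders): sendBroadcast() accepts an optional receiver permission, and observing whether the target app reacts can reveal whether it holds that permission.

```java
import android.content.Context;
import android.content.Intent;

public class ProbeDemo {
    // Delivers a broadcast only to receivers that hold the named
    // permission (placeholder action and permission strings). Whether
    // the target app responds leaks whether it has that permission.
    static void probe(Context ctx) {
        Intent i = new Intent("com.example.SOME_ACTION");
        ctx.sendBroadcast(i, "android.permission.READ_CONTACTS");
    }
}
```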
Apps may also exploit permissions re-delegation. An app, not having a certain permission, may be able to gain those privileges by communicating through another app. If a public component does not explicitly have an access permission listed in its manifest definition, Android permits any app to access it. For example, the Power Control Widget (a default Android widget) allows third-party apps to change protected system settings without requesting permissions to control those settings. This is done by presenting the user with a pop-up interface to control power-related settings. A malicious app can send a fake intent to the Power Control Widget, simulating the press of the widget button to switch power-related settings. It is effectively simulating a user’s actions on the screen.
By using external storage, apps can exercise permissions avoidance. By default, all apps have access to external storage. Many apps store data in external storage without specifying any protection, allowing other apps to access that data.
Permissions avoidance also arises because Android intents allow opening some system apps without requiring permission to do so. These apps include the camera, SMS, contact list, and browser. For instance, opening a browser via an intent can be dangerous since it enables data transmission, receiving remote commands, and even downloading files without user intervention.
iOS security
App signing
iOS requires mandatory code signing. Unlike Android, which accepts self-signed certificates, the app package must be signed using an Apple Developer certificate, and apps are normally available only through Apple’s App Store. This does not ensure the trustworthiness of an app but identifies the registered developer and ensures that the app has not been modified after it was signed.
Runtime protection
Apple’s iOS provides runtime protection via OS-level sandboxing. System resources and the kernel are shielded from user apps. The sandbox also limits which system calls an app can invoke. Except through kernel exploits, an app cannot leave its sandbox.
The app sandbox restricts the ability of one app to access another app’s data and resources. Each app has its own sandbox directory. The OS enforces the sandbox and permits access only to files within that directory, as well as restricted access to system preferences, the network, and other resources.
Inter-app communication can take place only through iOS APIs. Code generation by an app is prevented because data memory pages cannot be made executable and executable memory pages are not writable by user processes.
Data protection
All file contents are encrypted with a unique 256-bit AES per-file key, which is generated when the file is created.
This per-file key is encrypted with a class key and is stored along with the file’s metadata, which is the part of the file system that describes attributes of the file, such as size, modification time, and access permissions.
The class key is generated from a hardware key in the device and the user’s passcode. Unless the passcode is entered, the class key cannot be created and the file key cannot be decrypted.
The file system’s metadata is also encrypted, using a file system key that is derived directly from the hardware key, which is generated when iOS is installed. Keys are stored in Apple’s Secure Enclave, a separate processor and isolated memory that cannot be accessed directly by the main processor. Encrypting metadata encrypts the entire structure of the file system. Someone who rips out the flash memory from an iOS device and examines it will be able to see neither file contents (they are encrypted with per-file keys) nor information about those files (the metadata is encrypted with a file system key).
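This key hierarchy can be modeled conceptually in code. The sketch below (plain Java with the standard crypto APIs) is not Apple’s implementation: it only mirrors the idea that a random 256-bit per-file key encrypts the contents and is itself wrapped by a class key, and it glosses over cipher modes and IVs.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class FileKeyDemo {
    // Conceptual model of iOS data protection: each file gets its own
    // random 256-bit AES key; that key is wrapped (encrypted) with a
    // class key and stored alongside the file's metadata.
    static byte[][] encryptFile(byte[] contents, SecretKey classKey)
            throws Exception {
        // 1. Generate a unique 256-bit per-file key.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey fileKey = kg.generateKey();

        // 2. Encrypt the file contents with the per-file key.
        //    (ECB keeps the sketch short; a real design would use an
        //    authenticated mode with an IV.)
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.ENCRYPT_MODE, fileKey);
        byte[] ciphertext = c.doFinal(contents);

        // 3. Wrap the per-file key with the class key; only someone who
        //    can reconstruct the class key (hardware key + passcode on
        //    iOS) can unwrap it and decrypt the file.
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, classKey);
        byte[] wrappedFileKey = wrap.wrap(fileKey);

        return new byte[][] { ciphertext, wrappedFileKey };
    }
}
```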
A hardware AES engine encrypts and decrypts the file as it is written/read on flash memory so file encryption is done transparently and efficiently.
The iOS kernel partition is mounted read-only, so even if an app manages to break out of its sandbox due to some vulnerability and gain root access, it will still not have permission to modify the kernel.
Communication
The iOS sandbox restricts apps from accessing files stored by other apps or making changes to device settings. Each app is given a unique home directory for its files. System files and resources are shielded from the user’s apps.
Unlike Android, where each app is assigned a unique user ID, apps under iOS run as a non-privileged user “mobile.”
The iOS framework grants entitlements to apps. These are digitally-signed key-value pairs that are granted to an app to allow access to specific services. It is essentially a capability. If you have an entitlement, then you can access a service.
Kernel protection
In addition to the sandbox, iOS also uses address space layout randomization (ASLR) and memory execute protection for stack and heap pages via ARM’s Execute Never (XN) memory page flag.
Masque attacks
While Apple normally expects users to install apps only from its App Store, users need to be able to deploy pre-production apps to friendly parties for testing, and enterprises may need to deploy in-house apps to their employees. Apple supports a Developer Enterprise Program to create and distribute such in-house apps. This mechanism has been used to replace existing apps with private versions; that vulnerability has since been patched.
iOS has been hit several times with masque attacks. While there have been various forms of these, the basic attack is to get users to install a malicious app that has been created with the same bundle identifier as some existing legitimate app. This malicious app replaces the legitimate app and masquerades as it. Since Apple will not host an app with a duplicate bundle identifier, the installation of these apps has to bypass the App Store. Enterprise provisioning is used to get users to install the app, which typically requires the user to go to a URL that redirects to an XML manifest file hosted on a server. The ability to launch this attack is somewhat limited, as the user will generally need to have an enterprise certificate installed to make the installation seamless.
Web apps
Both iOS and Android have full web browsers that can be used to access web applications. They also permit web apps to appear as a regular app icon. The risks here are the same as those for web browsers in general: loading untrusted content and leaking cookies and URLs to foreign apps.
Mobile-focused web-based attacks can take advantage of the sensors on phones. The HTML5 Geolocation API allows JavaScript to find your location. A Use Current Location permission dialog appears, so the attacker has to hope the user will approve, but the attacker can provide incentives via a Trojan horse approach: offer a service that may legitimately need your location.
Recently, a proof of concept web attack showed how JavaScript could access the phone’s accelerometers to detect movements of the phone as a user enters a PIN. The team that implemented this achieved a 100% success rate of recognizing a four-digit PIN within five attempts of a user entering it. Apple patched this specific vulnerability but there may be more undiscovered ones.
Hardware support for security

All Android and iOS phones currently use ARM processors. ARM provides a dedicated security module, called TrustZone, that coexists with the normal processor. The hardware is separated into two “worlds”: secure (trusted) and non-secure (non-trusted) worlds. Any software resides in only one of these two worlds and the processor executes in only one world at a time.
Each of these worlds has its own operating system and applications. Android systems run an operating system called Trusty TEE in the secure world and, of course, Linux in the untrusted world.
Logically, you can think of the two worlds as two distinct processors, each running its own operating system with its own data and its own memory. Non-secure applications cannot directly access any memory or registers of the secure world. The only way the two worlds can communicate is through a messaging API.
In practice, the hardware creates two virtual cores for each CPU core, managing separate registers and all processing state in each world.
The phone’s operating system and all applications reside in the non-trusted world. Secure components, such as cryptographic keys, signature services, encryption services, and payment services, live in the trusted world. Even the operating system kernel does not have access to any of the code or data in the trusted world. Hence, even if an app manages to carry out a privilege escalation attack and gain root access, it will be unable to access certain security-critical data.
Applications for the trusted world include key management, secure boot, digital rights management, secure payment processing, mobile payments, and biometric authentication.
Apple Secure Enclave
Apple uses modified ARM processors for iPhones and iPads. In 2013, they announced Secure Enclave for their processors. The details are confidential but it appears to be similar in function to ARM’s TrustZone but designed as a physically separate coprocessor. As with TrustZone, the Secure Enclave coprocessor runs its own operating system (a modified L4 microkernel in this case).
The processor has its own secure bootloader and custom software update mechanism. It uses encrypted memory so that anything outside the Secure Enclave cannot access its data. It provides:
- All cryptographic operations for data protection & key management.
- Random number generation.
- Secure key store, including Touch ID (fingerprint) and the Face ID neural network.
- Data storage for payment processing.
The Secure Enclave maintains the confidentiality and integrity of data even if the iOS kernel has been compromised.