Formally verified sandboxing for packet-processing programs

Synopsis

Modern computing applications process vast amounts of data collaboratively employing many thousands of server machines residing in computing clusters. To support such applications, the network interconnecting servers and the packet-processing software on the servers should be fast (supporting high data rates and low delays), flexible (enabling diverse data-processing applications), and safe (e.g., programs must run without crashing). Berkeley Packet Filter (BPF) has emerged as a mechanism to meet these goals and accelerate novel high-performance packet-processing applications. BPF is currently deployed in many production systems. BPF achieves flexibility and performance by running user-developed programs in the context of the operating system. To ensure safety of such applications, this project will develop provably-correct static analyzers for BPF programs, protecting the operating system from security vulnerabilities, denial-of-service attacks, and crashes. This project will advance the state-of-the-art in the static analysis, program synthesis, and testing of networking applications such as load balancers, packet filters, and performance monitors. This project will also educate graduate, undergraduate, and high-school students on foundational techniques for reasoning about correctness, network monitoring, and filtering.

This project has three technical goals. The first is to develop a verified Berkeley Packet Filter (BPF) static analyzer based on an abstract interpretation that is correct by construction. The project will address key intellectual challenges involving the formalization of the BPF instruction set and modeling of domain-specific sandboxing properties. Currently, an in-kernel BPF static analyzer checks the safety of loaded BPF programs by performing range-tracking, memory safety, and freedom from information leaks. However, this analyzer has deficiences, resulting in the execution of unsafe programs and exploitable vulnerabilities. The second goal of this project is to develop an analyzer in the C programming language that can be usable as part of the kernel, by leveraging differential analysis, program synthesis, and testing. The final goal is to design a verified BPF toolchain based on LLVM, by developing validated translators from C to BPF bytecode.

This project is funded by National Science Foundation award #2019302 (FMitF, track I).

People

Faculty

Srinivas Narayana
Assistant Professor

Santosh Nagarakatte
Professor

Graduate Students

Harishankar Vishwanathan
Ph.D. student

Matan Shachnai
Ph.D. student

Publications

Verifying the Verifier: eBPF Range Analysis Verification

Harishankar Vishwanathan, Matan Shachnai, Srinivas Narayana, and Santosh Nagarakatte
Computer Aided Verification (CAV) 2023

This paper proposes an automated method to check the correctness of range analysis used in the Linux kernel's eBPF verifier. We provide the specification of soundness for range analysis performed by the eBPF verifier. We automatically generate verification conditions that encode the operation of the eBPF verifier directly from the Linux kernel's C source code and check it against our specification. When we discover instances where the eBPF verifier is unsound, we propose a method to generate an eBPF program that demonstrates the mismatch between the abstract and the concrete semantics. Our prototype automatically checks the soundness of 16 versions of the eBPF verifier in the Linux kernel versions ranging from 4.14 to 5.19. In this process, we have discovered new bugs in older versions and proved the soundness of range analysis in the latest version of the Linux kernel.

Sound, Precise, and Fast Abstract Interpretation with Tristate Numbers

Harishankar Vishwanathan, Matan Shachnai, Srinivas Narayana, and Santosh Nagarakatte
Symposium on Code Generation and Optimization (CGO) 2022

Extended Berkeley Packet Filter (BPF) is a language and run-time system that allows non-superusers to extend the Linux and Windows operating systems by downloading user code into the kernel. To ensure that user code is safe to run in kernel context, BPF relies on a static analyzer that proves properties about the code, such as bounded memory access and the absence of operations that crash. The BPF static analyzer checks safety using abstract interpretation with several abstract domains. Among these, the domain of tnums (tristate numbers) is a key domain used to reason about the bitwise uncertainty in program values. This paper formally specifies the tnum abstract domain and its arithmetic operators. We provide the first proofs of soundness and optimality of the abstract arithmetic operators for tnum addition and subtraction used in the BPF analyzer. Further, we describe a novel sound algorithm for multiplication of tnums that is more precise and efficient (runs 33% faster on average) than the Linux kernel’s algorithm. Our tnum multiplication is now merged in the Linux kernel.

This paper won a distinguished paper award.

Software

bpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul

This patch introduces a new algorithm for multiplication of tristate numbers (tnums) that is provably sound. It is faster and more precise when compared to the existing method. It has been merged into the mainline Linux kernel.

Like the existing method, this new algorithm follows the long multiplication algorithm. The idea is to generate partial products by multiplying each bit in the multiplier (tnum a) with the multiplicand (tnum b), and adding the partial products after appropriately bit-shifting them. The new algorithm, however, uses just a single loop over the bits of the multiplier (tnum a) and accumulates only the uncertain components of the multiplicand (tnum b) into a mask-only tnum. This CGO'22 research paper explains the algorithm in more detail.

A natural way to construct the tnum product is by performing a tnum addition on all the partial products. This algorithm presents another method of doing this: decompose each partial product into two tnums, consisting of the values and the masks separately. The mask-sum is accumulated within the loop in acc_m. The value-sum tnum is generated using a.value * b.value. The tnum constructed by tnum addition of the value-sum and the mask-sum contains all possible summations of concrete values drawn from the partial product tnums pairwise. We prove this result in the paper.

Our evaluations show that the new algorithm is overall more precise (producing tnums with less uncertain components) than the existing method. As an illustrative example, consider the input tnums A and B. The numbers in the parenthesis correspond to (value;mask).

A = 000000x1 (1;2)
B = 0010011x (38;1)
A * B (existing) = xxxxxxxx (0;255)
A * B (new) = 0x1xxxxx (32;95)

Importantly, we present a proof of soundness of the new algorithm in the aforementioned paper. Additionally, we show that this new algorithm is empirically faster than the existing method.

Education, Outreach, and Broader Impacts

The project is training two graduate students in formal methods and the Linux eBPF verifier technology. Both of these students have been learning the fundamentals of static analysis, building proofs, developing correct-by-construction systems, and gaining significant experience with automated theorem-proving tools such as Z3. As pre-requisite to working on this research, one of the students has attended a class on automated program reasoning, and developed a solid introduction to working with the Coq proof checker.

The principal investigators are training several undergraduate students, involving them in small project efforts related to learning and experimenting with eBPF technology. Students wrote small eBPF programs, figuring out how to get their code through the eBPF verifier, and gained experience with writing provably safe code. This enabled a broader set of students to understand and appreciate the benefits and problems with the eBPF ecosystem. The PIs will have more to report on these efforts in the subsequent years.

This project has made contributions to improve the Linux kernel, an operating system deployed in millions of devices worldwide.

Last updated Mon Oct 02 1930 EDT.

Formally Verified Sandboxing for Packet-Processing Programs