Path Queries for Interactive Network Debugging

Welcome to the path queries project!

Measuring the flow of traffic along network paths is crucial for many management tasks, including traffic engineering, diagnosing congestion, and mitigating DDoS attacks. However, conventional approaches to collecting path-based measurements rely on complex "joins" of multiple sources of data (i.e., forwarding, topology, and traffic), resulting in either inaccurate results or unnecessarily high overheads.

We introduce a declarative query language for efficient path-based traffic monitoring. Path queries are specified as regular expressions over predicates on packet locations and header values, with SQL-like groupby constructs for aggregating results anywhere along a path. A run-time system compiles queries into a deterministic finite automaton. The automaton's transition function is then partitioned, compiled into match-action rules, and distributed over the switches. Switches stamp packets with automaton states to track their progress toward satisfying a query. Only when packets satisfy a query are they counted, sampled, or sent to collectors for further analysis. By processing queries in the data plane, users "pay as they go": data-collection overhead is limited to exactly those packets that satisfy the query.
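
To make this pipeline concrete, here is a minimal, self-contained Python sketch of the idea: a query, given as a regular expression over predicates, becomes a DFA whose transition function is partitioned into per-switch match-action rules. The hand-built DFA, the rule format, and all names below are simplifications for exposition, not the actual Pyretic implementation.

    # Query (illustrative): a packet enters switch 1 destined to 10.0.0.3 and
    # later reaches switch 3, i.e., the regular expression  A . true* . B
    # over two predicates:
    A = {'switch': 1, 'dstip': '10.0.0.3'}   # predicate on location + header
    B = {'switch': 3}                        # predicate on location only

    # Hand-built DFA for A . true* . B (a real compiler derives this
    # automatically): state 0 = start, state 1 = matched A, state 2 = accept.
    # A hop that matches no listed transition leaves the state unchanged,
    # which plays the role of the "true*" self-loop.
    DFA_TRANSITIONS = [
        (0, A, 1),   # at switch 1 with matching dstip -> state 1
        (1, B, 2),   # later, at switch 3              -> accept
    ]
    ACCEPT = 2

    def rules_for_switch(sw):
        """Partition the transition function: keep only transitions that can
        fire at this switch, expressing each as a match-action rule that
        matches the state tag carried by the packet and rewrites it."""
        rules = []
        for state, pred, next_state in DFA_TRANSITIONS:
            if pred['switch'] != sw:
                continue   # this transition is installed on another switch
            match = {'tag': state}
            match.update({k: v for k, v in pred.items() if k != 'switch'})
            actions = ['set_tag=%d' % next_state]
            if next_state == ACCEPT:
                actions.append('count_and_send_to_collector')
            rules.append((match, actions))
        rules.append(({'tag': '*'}, ['keep_tag']))   # default: tag unchanged
        return rules

    for sw in (1, 2, 3):
        print('switch %d:' % sw)
        for match, actions in rules_for_switch(sw):
            print('  match %s -> %s' % (match, actions))

In the real system, the state tag rides in a spare header field such as a VLAN or MPLS tag, which is why 2-4 bytes of per-packet state suffice.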

We implemented our system on top of the Pyretic SDN controller and evaluated its performance on campus and ISP topologies. Our experiments indicate that the system can enable interactive debugging: it compiles multiple queries in a few seconds, while fitting rules comfortably in modern switch TCAMs and the automaton state into 2-4 bytes (e.g., a VLAN or MPLS header).

Here is a demonstration of this model of debugging, which pairs well with this talk describing our system.

Paper drafts

NSDI 2016 paper [PDF]

We also have an extended version [PDF] of the paper with more details, and a preliminary workshop paper [PDF].

Code

Our prototype source code is publicly available as part of the Pyretic SDN controller.

To see path queries running on a Mininet network, do the following:
  • Download the latest pyretic VM. Note that the VM is a 1 GB OVA file that can be opened by most virtual machine monitors. We recommend VirtualBox.
  • Set up, boot, and log into the VM by following instructions from this webpage: Mininet VM setup notes
  • Once logged into the VM, you can check out basic examples from ~/pyretic/pyretic/examples/path_query.py. To see a query running in action, run the following commands:
    	> cd pyretic
    	> pyretic.py pyretic.examples.path_query --query=path_test_1 \
    	--fwding=static_fwding_chain_3_3
    	# open a new terminal
    	> mininet.sh --topo=chain,3,3
          
    Now you should be able to run standard Mininet commands and test out both connectivity and the basic model of working with queries. For instance, typing pingall at the Mininet prompt will ping each pair of hosts, generating corresponding (packet-level) query outputs at the controller.
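
To get a feel for what a query definition looks like, here is a short sketch in the spirit of the examples in path_query.py. The imports, the atom constructor, the ^ concatenation operator, and register_callback shown here are schematic and may not match the code exactly; treat path_query.py as the authoritative reference.

    # Illustrative sketch only; see ~/pyretic/pyretic/examples/path_query.py
    # for the real syntax. The names used below are assumptions.
    from pyretic.lib.corelib import *   # assumed to provide match()
    from pyretic.lib.path import *      # assumed to provide atom() and ^

    def print_packet(pkt):
        # Invoked once for each packet that completes the queried path.
        print('query satisfied by: %s' % pkt)

    # "A packet observed at switch 1 and later at switch 3" on the chain topology.
    p = atom(match(switch=1)) ^ atom(match(switch=3))
    p.register_callback(print_packet)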

Reproducing Our Experiments

All the scripts needed to reproduce the results from our paper drafts are checked into the code repository. The scripts are located in the folder ~/pyretic/pyretic/evaluations/scripts/nsdi16.

To generate all metrics and output data relevant to a specific experiment in the paper, run the corresponding evaluation script as follows:
    cd ~/pyretic
    sudo bash pyretic/evaluations/scripts/nsdi16/{scriptname}.sh
  
Here {scriptname} is one of:
  • stanford_opts: to evaluate benefits of optimizations on an emulated Stanford campus network topology;
  • enterprise_opts: to evaluate the prototype with all optimizations enabled on a set of real and inferred enterprise and ISP topologies;
  • delauney_tests: to evaluate the prototype on the topologies we generated using IGen with the Delaunay triangulation heuristic; and
  • waxman: to evaluate the prototype on the Waxman topologies we generated.
For more details on the experiments themselves, see one of our paper drafts above.

Note that the pyretic VM does not include pypy, which is required to reproduce the results in our paper drafts. (We did not generate our results on the VM; we ran bare-metal on an Intel Xeon E3 server.) However, you can switch the Python interpreter used by the evaluations to the basic python installed on the VM by setting the PYCMD config parameter to python in ~/pyretic/pyretic/evaluations/scripts/nsdi16/init_settings.sh. This allows the evaluation scripts to run inside the VM.

Interpreting Results of Experiments

Results are stored in ~/pyretic/pyretic/evaluations/evaluation_results/nsdi16, under multiple folders, each corresponding to a query (or set of queries) on a given network with a given set of experiment options (e.g., enabled optimizations).

The main output of each experiment is the file stat.txt in the folder corresponding to the experiment, containing the following information:
  1. Compilation time statistics: The compilation time, overall as well as broken down by compilation stage, for each table stage that is compiled;
  2. Rule space statistics: The rule counts from each table stage; and
  3. DFA statistics: DFA parameters such as state counts and edges resulting from the queries being compiled.
A single stat.txt file shows the results of only one experiment, i.e., a single query running on a single network with a single set of experiment parameters.

To recover the averaged results shown in our paper, you need to process multiple stat files, aggregating the results in various ways. See ~/pyretic/pyretic/evaluations/scripts/nsdi16/processing-hints.sh for hints on processing the stat files with bash commands that recover various aggregations of the numerical results.
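
As a starting point, the following Python sketch walks the results tree and averages one metric across all experiments. The stat.txt line format assumed here (one "name: value" pair per metric, and the metric name itself) is hypothetical; inspect a real stat.txt and adjust the parsing, or use the bash hints in processing-hints.sh instead.

    # Hypothetical aggregation sketch: assumes each stat.txt contains lines
    # of the form "<metric name>: <number>"; the actual format may differ,
    # so adjust parse_stat() after inspecting a real stat.txt.
    import os
    import re

    RESULTS = os.path.expanduser(
        '~/pyretic/pyretic/evaluations/evaluation_results/nsdi16')

    def parse_stat(path, metric):
        """Return the value of `metric` from one stat.txt, or None."""
        with open(path) as f:
            for line in f:
                m = re.match(r'\s*%s\s*:\s*([\d.]+)' % re.escape(metric), line)
                if m:
                    return float(m.group(1))
        return None

    def average(metric):
        """Average `metric` over every stat.txt found under RESULTS."""
        vals = []
        for dirpath, _, filenames in os.walk(RESULTS):
            if 'stat.txt' in filenames:
                v = parse_stat(os.path.join(dirpath, 'stat.txt'), metric)
                if v is not None:
                    vals.append(v)
        return sum(vals) / len(vals) if vals else None

    if __name__ == '__main__':
        # 'total compile time' is a placeholder metric name.
        print('average:', average('total compile time'))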

Questions?

You can reach the corresponding author at alephtwo at csail dot mit dot edu.


Last updated on May 10th 2017 at 1744 ET