Research Experiences

I've worked on research and class projects at Princeton and interned at Microsoft (Redmond, Washington), Google (Mountain View, California), and IBM Research (Bangalore, India). I briefly describe these projects below in reverse chronological order.

Detecting application-level denial of service attacks in Microsoft's Azure public cloud

Denial of service attacks cause significant downtime and financial damage to operational networks and services. Such attacks increasingly target application-specific resources, such as thread pools and server compute capacity. Detecting them is challenging because they have no obviously discernible signature on the network: even if every packet is inspected, the impact often shows up only on application-specific resources. We use endpoint information, in the form of TCP connection parameters (bytes sent and received, number of timeouts, etc.) and resource statistics counters (CPU utilization, memory pool parameters, etc.), both to detect attacks and to identify the specific connections likely responsible for them. We trained a classifier that detected these attacks with high accuracy in controlled environments.
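The paragraph above does not say which classifier was used, so the sketch below swaps in a deliberately simple stand-in: a nearest-centroid classifier over standardized per-connection features. The feature names (`bytes_sent`, `bytes_received`, `timeouts`, `cpu_util`), the `Sample` and `NearestCentroid` classes, and the toy training data are all hypothetical; they only illustrate how TCP connection parameters and host resource counters can be combined into a single feature vector.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Sample:
    # Hypothetical per-connection features: TCP counters plus a host resource counter.
    bytes_sent: float
    bytes_received: float
    timeouts: float
    cpu_util: float  # fraction of endpoint CPU in use, 0.0 to 1.0

    def vector(self):
        return [self.bytes_sent, self.bytes_received, self.timeouts, self.cpu_util]

class NearestCentroid:
    """Label a sample with the class whose standardized mean feature vector is closest."""

    def fit(self, samples, labels):
        vecs = [s.vector() for s in samples]
        cols = list(zip(*vecs))
        self.mu = [mean(c) for c in cols]
        # Fall back to 1.0 so a constant feature does not cause division by zero.
        self.sigma = [pstdev(c) or 1.0 for c in cols]
        self.centroids = {}
        for label in set(labels):
            members = [self._scale(v) for v, lab in zip(vecs, labels) if lab == label]
            self.centroids[label] = [mean(c) for c in zip(*members)]
        return self

    def _scale(self, v):
        return [(x - m) / s for x, m, s in zip(v, self.mu, self.sigma)]

    def predict(self, sample):
        z = self._scale(sample.vector())
        return min(self.centroids, key=lambda lab: math.dist(z, self.centroids[lab]))

# Toy, made-up training data: two benign connections and two attack connections.
train = [
    Sample(5e5, 4e5, 0, 0.30), Sample(6e5, 5e5, 1, 0.35),
    Sample(2e3, 1e6, 40, 0.95), Sample(1e3, 9e5, 55, 0.90),
]
labels = ["benign", "benign", "attack", "attack"]
clf = NearestCentroid().fit(train, labels)
print(clf.predict(Sample(1.5e3, 8e5, 48, 0.92)))  # attack
```

Standardizing each feature first keeps large byte counts from drowning out small-valued signals such as timeout counts or CPU utilization.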

Measuring the impact of network reconfiguration in centralized traffic engineering on Google's production WAN

Logically centralizing traffic engineering (TE) in a private backbone network can achieve high network utilization through deterministic, fine-grained control of traffic flows. However, reconfigurations are not reflected on the data plane instantaneously, because network devices are geographically distributed and have varying reprogramming delays. We studied the impact of distributed data-plane updates triggered by TE in Google's production wide-area network, which supports large data transfers between compute clusters. In particular, we focused on adverse effects on packet loss rates and link utilization caused by pathological orderings of device updates. Simulations of worst-case update orderings with production workloads showed that links can sometimes be loaded to as much as 100% over capacity. Yet, surprisingly, an analysis of loss rates and link loads measured from counters on production switches revealed little adverse impact on these metrics during network transition periods (at a time scale of tens of seconds). We believe that the combination of high link utilization, long-lived bulk-transfer TCP flows, and switch QoS significantly simplifies TE design: distributed data-plane updates can effectively be treated as atomic and need not be carefully sequenced.
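To make the "100% over capacity" figure concrete, the sketch below brute-forces every update ordering for a tiny, hypothetical example. It assumes a simplified model in which each flow moves from its old path to its new path atomically when its ingress switch is reprogrammed; the topology, link names, rates, and the `link_load` and `worst_case_overload` helpers are illustrative, not the simulator or workloads used in the study.

```python
from itertools import permutations

def link_load(flows, updated, link):
    """Total rate on `link` when only the ingress switches in `updated` run the new config."""
    return sum(
        f["rate"]
        for f in flows
        if link in (f["new_path"] if f["ingress"] in updated else f["old_path"])
    )

def worst_case_overload(flows, capacity, link):
    """Largest transient load on `link`, as a fraction over capacity, across all update orderings."""
    switches = sorted({f["ingress"] for f in flows})
    worst = link_load(flows, set(), link)  # load before any switch is updated
    for order in permutations(switches):
        updated = set()
        for sw in order:
            updated.add(sw)
            worst = max(worst, link_load(flows, updated, link))
    return worst / capacity - 1.0

# Hypothetical example: f1 moves off link A-B while f2 moves onto it.
flows = [
    {"rate": 10.0, "ingress": "s1", "old_path": ["A-B"], "new_path": ["A-C", "C-B"]},
    {"rate": 10.0, "ingress": "s2", "old_path": ["A-D", "D-B"], "new_path": ["A-B"]},
]
print(f"{worst_case_overload(flows, capacity=10.0, link='A-B'):.0%} over capacity")
# 100% over capacity
```

Both the old and the new configuration keep A-B at exactly its capacity; only the ordering in which s2 is updated before s1 briefly doubles its load. Brute force is exponential in the number of switches, so this illustrates the effect rather than scaling to a real WAN.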

An OpenFlow controller for interoperable 802.1D Ethernet spanning tree

Industry has embraced the software-defined networking (SDN) philosophy as a practical next step toward mitigating the ossification of switch control stacks. To achieve this, SDN must be incrementally deployable alongside legacy network equipment, which requires backward compatibility with that equipment and the protocols it implements. Today, there is some support for emulating protocol functionality on an all-OpenFlow network, and for backward compatibility in the form of OpenFlow-capable hardware that runs in either traditional L2/L3 mode or OpenFlow mode, but not both at once. However, full support for simultaneous legacy interoperability and controller/flow-table customization is lacking. We take a modest step in this direction by implementing the Ethernet Spanning Tree Protocol (STP, IEEE 802.1D) on an OpenFlow network. The controller runs the STP algorithm separately on behalf of every switch, generating STP packets and instructing switches to send and receive them. We designed a modular architecture for our protocol implementation that makes the system easy to extend to other protocols.
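802.1D's spanning tree computation rests on one rule: a bridge prefers the lowest priority vector it hears, comparing root bridge ID first, then path cost to the root, then the sending bridge's ID, where a bridge ID is a priority followed by a MAC address. The sketch below shows only that comparison, computed centrally in a fixpoint loop to elect a root bridge and pick each switch's root port. It is a simplified, hypothetical model (no port IDs, timers, BPDU encoding, or port blocking), and the `BridgeId`, `PriorityVector`, and `elect` names are illustrative; it is not the project's controller, which exchanged real STP packets through the switches.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class BridgeId:
    priority: int  # bridge priority; lower wins (802.1D default is 32768)
    mac: str       # tie-breaker; fixed-width lowercase hex compares correctly as a string

@dataclass(frozen=True, order=True)
class PriorityVector:
    # Fields compare lexicographically in this order; lower is better.
    root: BridgeId
    cost: int
    sender: BridgeId

def elect(bridges, links):
    """bridges: {name: BridgeId}; links: {(a, b): cost}, undirected.
    Returns each bridge's best vector and the neighbor on its root port (None at the root)."""
    neighbors = {name: [] for name in bridges}
    for (a, b), cost in links.items():
        neighbors[a].append((b, cost))
        neighbors[b].append((a, cost))
    # Every bridge starts out claiming to be the root at cost 0.
    best = {n: PriorityVector(bid, 0, bid) for n, bid in bridges.items()}
    root_port = {n: None for n in bridges}
    changed = True
    while changed:
        changed = False
        for n in bridges:
            for nb, cost in neighbors[n]:
                # Vector n would adopt via nb: nb's root, nb's cost plus this link, nb as sender.
                heard = PriorityVector(best[nb].root, best[nb].cost + cost, bridges[nb])
                if heard < best[n]:
                    best[n], root_port[n] = heard, nb
                    changed = True
    return best, root_port

bridges = {
    "s1": BridgeId(32768, "00:00:00:00:00:03"),
    "s2": BridgeId(32768, "00:00:00:00:00:01"),
    "s3": BridgeId(4096, "00:00:00:00:00:09"),
}
links = {("s1", "s2"): 4, ("s2", "s3"): 4, ("s1", "s3"): 19}
best, root_port = elect(bridges, links)
print(root_port)  # {'s1': 's2', 's2': 's3', 's3': None}
```

Here s3 wins the root election on priority alone, and s1 reaches it more cheaply through s2 (cost 4 + 4 = 8) than over its direct link (cost 19), so s1's root port faces s2.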

Configurable line-rate traffic monitoring on a NetFPGA

NetFPGA is a programmable PCI card with an on-board FPGA and Gigabit Ethernet ports. We developed a tool that implements configurable count-based and field-based packet sampling on a NetFPGA, following the PSAMP specification (RFC 5476). The samplers are configured by writing to registers on the FPGA, which users can access through the NetFPGA's simple command-line interface.
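As a software model of the two kinds of selectors described above, the sketch below implements systematic count-based sampling (keep the first n packets of every window of N) and field-based selection (keep packets whose masked header field equals a configured value). In the hardware tool these parameters live in FPGA registers; here the `CountSampler` and `FieldFilter` classes, their parameters, and the dictionary packet representation are hypothetical stand-ins, not the tool's register map or hardware design.

```python
from dataclasses import dataclass, field

@dataclass
class CountSampler:
    """Systematic count-based sampling: keep the first `n` packets of every `interval` packets."""
    n: int
    interval: int
    seen: int = field(default=0, init=False)  # running packet count (a counter in hardware)

    def select(self, pkt) -> bool:
        position = self.seen % self.interval
        self.seen += 1
        return position < self.n

@dataclass
class FieldFilter:
    """Field-based selection: keep packets whose masked header field equals a configured value."""
    name: str
    value: int
    mask: int

    def select(self, pkt) -> bool:
        return (pkt[self.name] & self.mask) == (self.value & self.mask)

sampler = CountSampler(n=1, interval=4)
print([i for i in range(10) if sampler.select({"seq": i})])  # [0, 4, 8]

subnet = FieldFilter("dst_ip", value=0x0A000000, mask=0xFFFFFF00)  # 10.0.0.0/24
print(subnet.select({"dst_ip": 0x0A000007}), subnet.select({"dst_ip": 0xC0A80001}))  # True False
```

Expressing both selectors as a value plus a mask or an interval keeps their configuration to a handful of integers, which is what makes register-based configuration practical.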

Stability of explicit congestion control protocols

Rate Control Protocol (RCP) is a transport mechanism that uses explicit rate feedback from routers in the network to traffic sources to achieve short flow completion times. We analyze RCP to explain two non-intuitive observations. First, small-buffer variants of RCP that control queues through the mean of their distributions exhibit oscillatory behaviour inside the stable parameter region of the RCP feedback loop as flow bandwidth-delay products are reduced. Second, for some parameters just outside the stable region, queue and rate instabilities occur in the presence of queue feedback but not in its absence. We modelled the small-buffer RCP feedback loop with explicit queue evolution in small bandwidth-delay-product environments, and analytically derived necessary and sufficient conditions for the stability of this modified feedback loop. Our predictions agree with the empirical observations. We also characterized the instabilities observed just outside the stable region as Hopf bifurcations.
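For reference, the rate update in the original RCP proposal is R(t) = R(t - T) * [1 + (T/d) * (alpha * (C - y(t)) - beta * q(t)/d) / C], where C is the link capacity, y(t) the measured input rate, q(t) the queue length, d the average round-trip time, T the update interval, and alpha and beta the feedback gains. The sketch below is a one-step transcription of that standard rule; the small-buffer variants discussed above, which control queues through the mean of the queue distribution, are not modeled, and the parameter values in the usage line are arbitrary.

```python
def rcp_update(rate, capacity, input_rate, queue, rtt, interval, alpha, beta):
    """One step of the standard RCP rate update.

    rate: R(t - T), the rate advertised in the previous interval
    capacity, input_rate: C and y(t), in the same units as rate
    queue: q(t), in rate units multiplied by seconds
    rtt, interval: d and T, in seconds
    alpha, beta: feedback gains on spare capacity and on queue drain
    """
    spare = alpha * (capacity - input_rate) - beta * queue / rtt
    return rate * (1 + (interval / rtt) * spare / capacity)

# At equilibrium (input equals capacity, empty queue) the advertised rate is unchanged.
print(rcp_update(5.0, 10.0, 10.0, 0.0, rtt=0.1, interval=0.01, alpha=0.1, beta=1.0))  # 5.0
```

Excess input (y > C) or a standing queue (q > 0) makes the bracketed factor smaller than 1, so the advertised rate shrinks; spare capacity with an empty queue makes it grow. Because sources react to this rate only after a feedback delay, the closed loop can become unstable for some choices of gains.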

Transactions on the World Wide Telecom Web (WWTW)

The World Wide Telecom Web (also known as the "spoken web") is a voice-driven equivalent of the World Wide Web (WWW) over the telecom network. The spoken web started as a pilot project at IBM Research India to let people in developing regions leverage the benefits of the web through mobile phones that need only a simple numeric keypad and voice connectivity. We developed a mechanism for securing financial transactions over this medium using social trust: witnesses provide additional authentication factors for transactions. Our work was peer-reviewed internally at IBM Research India, and our ideas have been patented.