Research Experiences
I've worked on research and class projects at Princeton, and interned at
Microsoft (Redmond, Washington), Google (Mountain View, California) and IBM
Research (Bangalore, India). I briefly describe those projects below in reverse
chronological order.
Detecting application-level denial of service attacks in Microsoft's Azure
public cloud
Denial of service attacks cause significant downtime and financial damage to
operational networks and services. Such attacks increasingly target
application-specific resources such as thread pools and server compute resources, among
others. Detecting these attacks is challenging because they don't have an
obviously discernible signature on the network: even if every packet is
inspected, the impact is often experienced only on application-specific
resources. We used endpoint information in the form of TCP
connection parameters (bytes sent/received, number of timeouts, etc.) and
resource statistics counters (CPU utilization, memory pool parameters, etc.) to
detect attacks and to identify the specific connections likely responsible for
them. We trained a classifier that detected these attacks with high accuracy in
controlled environments.
Measuring the impact of network reconfiguration in centralized traffic
engineering on Google's production WAN
Logical centralization of traffic engineering in a private backbone network can
provide high network utilization, resulting from deterministic, fine-grained
control of traffic flows. However, reconfigurations are not reflected on the
data plane instantaneously, as network devices are geographically distributed,
with varying reprogramming delays. We studied the impact of
distributed data plane updates due to traffic engineering in Google's production
wide-area network, which supports large data transfers between compute
clusters. In particular, we focused on adverse effects on packet loss rates and
link utilization due to pathological orderings in which different devices are
updated. Simulations of worst-case update orderings with production workloads
showed that links can sometimes be loaded by as much as 100% over capacity. Yet
surprisingly, an analysis of loss rates and link loads measured from counters on
production switches revealed that there is little adverse impact on these
metrics during network transition periods (at a time scale of tens of
seconds). We believe that a combination of high link utilization, the presence
of long-lived bulk-transfer TCP flows, and switch QoS significantly simplifies
the TE design: distributed data plane updates can effectively be treated as
atomic and need not be carefully sequenced.
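As a toy illustration of how update ordering alone can overload a link (not the
simulator, topology, or workloads used in the study), the sketch below swaps two
flows between two links. The flow names, link names, and capacity are
hypothetical, and each flow is assumed to move when its own ingress device is
reprogrammed.

```python
from itertools import permutations

CAPACITY = 10.0  # hypothetical link capacity, arbitrary units

# Hypothetical flows: name -> (demand, link before update, link after update).
FLOWS = {
    "A": (10.0, "X", "Y"),
    "B": (10.0, "Y", "X"),
}


def link_loads(updated):
    """Load on each link when only the flows in `updated` use their new path."""
    loads = {}
    for name, (demand, old_link, new_link) in FLOWS.items():
        link = new_link if name in updated else old_link
        loads[link] = loads.get(link, 0.0) + demand
    return loads


def worst_overload(order):
    """Peak fraction over capacity seen across the intermediate states."""
    worst = 0.0
    updated = set()
    for name in order:
        updated.add(name)
        peak = max(link_loads(updated).values())
        worst = max(worst, peak / CAPACITY - 1.0)
    return worst


# Non-atomic update: try every order in which the two devices could update.
overload_by_order = {order: worst_overload(order) for order in permutations(FLOWS)}

# Atomic update: only the final state is ever visible on the data plane.
atomic_overload = max(link_loads(set(FLOWS)).values()) / CAPACITY - 1.0
```

In this toy case every non-atomic ordering briefly places both flows on one
link, so that link carries twice its capacity, while the atomic update never
exceeds capacity.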
An OpenFlow controller for interoperable 802.1D Ethernet spanning
tree
The SDN philosophy has been embraced by industry as a practical next
step to mitigate the ossification of control stacks for switches. To
meet these goals, SDN needs to be incrementally deployable
alongside legacy network equipment, which requires backwards
compatibility with this equipment and the protocols it
implements. Today, there is some support for emulation of protocol
functionality on a fully OpenFlow network, and backwards compatibility
in the form of OpenFlow-capable hardware (run either in
traditional L2/L3 mode or OpenFlow mode exclusively). However, full
support for simultaneous legacy-interoperability and
controller/flow-table customization is lacking. We make a modest step
in this direction by implementing the Ethernet Spanning Tree Protocol
(STP 802.1D) on an OpenFlow network. The controller runs the STP
algorithm separately for every switch, generating STP packets and
instructing switches to send and receive them. We designed a
modular architecture for our protocol implementation that makes
the system easy to extend to other protocols.
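The per-switch computation can be sketched as follows. This is a minimal
illustration of the 802.1D root-port choice (lowest root ID, then lowest path
cost, then lowest sender bridge and port IDs), not the controller described
above. The names Bpdu and select_root_port and the example IDs and costs are
hypothetical, and no OpenFlow library calls are shown.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Bpdu(NamedTuple):
    """Fields of a configuration BPDU that matter for root selection."""
    root_id: int
    root_path_cost: int
    sender_bridge_id: int
    sender_port_id: int


def select_root_port(
    bridge_id: int,
    received: Dict[int, Bpdu],
    port_cost: Dict[int, int],
) -> Tuple[Optional[int], int, int]:
    """Return (root port, root bridge ID, root path cost) for one switch.

    `received` maps a local port number to the best BPDU heard on it.
    A root port of None means this switch believes it is the root.
    """
    # The switch's own claim to be the root: cost 0, no root port.
    candidates = {None: (bridge_id, 0, bridge_id, 0, 0)}
    for port, bpdu in received.items():
        candidates[port] = (
            bpdu.root_id,
            bpdu.root_path_cost + port_cost[port],  # add the receiving port's cost
            bpdu.sender_bridge_id,
            bpdu.sender_port_id,
            port,  # final tie-breaker: local port number
        )
    # Tuples compare field by field, and lower is better at every position.
    root_port = min(candidates, key=candidates.get)
    root_id, cost = candidates[root_port][:2]
    return root_port, root_id, cost


# The controller keeps separate state for every switch it manages.
heard_by_switch = {
    1: {},  # switch 1 has heard nothing better than itself
    3: {1: Bpdu(1, 0, 1, 2), 2: Bpdu(1, 4, 2, 1)},
}
costs = {1: 4, 2: 4}
decisions = {
    switch: select_root_port(switch, heard, costs)
    for switch, heard in heard_by_switch.items()
}
```

Here switch 3 picks port 1 as its root port (root 1 at cost 4), while switch 1
considers itself the root. Keeping the decision a pure function of per-switch
inputs is one way to run an independent STP instance for each switch.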
Configurable line-rate traffic monitoring on a NetFPGA
NetFPGA is a programmable PCI card with an on-board FPGA and Gigabit Ethernet
ports. We developed a tool that implements configurable counter-based and
field-based packet sampling on a NetFPGA, based on the PSAMP RFC (5476). The
samplers can be configured by setting registers on the FPGA, which are
accessible to the user through the simple command line interface of the NetFPGA.
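The two sampler types can be illustrated in software. This is only a sketch of
the selection logic, assuming packets are represented as dictionaries of header
fields; the class names, field names, and the modelling of FPGA registers as
plain attributes are hypothetical and do not reflect the hardware design.

```python
from dataclasses import dataclass


@dataclass
class CountSampler:
    """Systematic count-based sampling: select 1 packet out of every `interval`."""
    interval: int      # stands in for a configuration register
    seen: int = 0      # packets seen since the last selected one

    def select(self, packet: dict) -> bool:
        self.seen += 1
        if self.seen >= self.interval:
            self.seen = 0
            return True
        return False


@dataclass
class FieldSampler:
    """Field-based selection: select packets whose masked field equals a value."""
    name: str                 # header field to inspect
    value: int                # value to match
    mask: int = 0xFFFFFFFF    # bits of the field that take part in the match

    def select(self, packet: dict) -> bool:
        return (packet.get(self.name, 0) & self.mask) == (self.value & self.mask)


# Nine hypothetical packets alternating between two destination ports.
packets = [{"dst_port": 80 if i % 2 == 0 else 443} for i in range(9)]

every_third = CountSampler(interval=3)
by_count = [i for i, p in enumerate(packets) if every_third.select(p)]

web_only = FieldSampler(name="dst_port", value=80)
by_field = [i for i, p in enumerate(packets) if web_only.select(p)]
```

Reconfiguring a sampler here amounts to changing interval, value, or mask,
which mirrors the idea of writing a new value to a register.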
Stability of explicit congestion control protocols
Rate Control Protocol (RCP) is a transport mechanism that uses explicit rate
feedback from points in the network to traffic sources in order to achieve small
flow completion times. We analyzed RCP to explain two non-intuitive
observations. First, small-buffer variants of RCP that control queues through
the mean of their distributions exhibit oscillatory behaviour inside the stable
parameter regions of the RCP feedback loop, as flow bandwidth-delay products are
reduced. Second, there are parameters just outside the stable parameter region
for which queue and rate instabilities occur in the presence of queue feedback,
but not otherwise. We modelled the small-buffer RCP feedback loop with explicit
queue evolutions in small bandwidth-delay product environments, and analytically
found the necessary and sufficient stability conditions for this modified
feedback loop. Our predictions agree with the empirical observations. We also
characterized the observed instabilities just outside the stable region through
Hopf bifurcations.
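For readers unfamiliar with RCP, the sketch below iterates the commonly cited
RCP rate update, R <- R [1 + (T/d)(alpha (C - y) - beta q/d) / C], on a single
bottleneck link. It is not the small-buffer model analyzed above: it assumes
identical flows, one update per round-trip time (T = d), and no feedback delay,
so it cannot reproduce the instabilities discussed. All parameter values are
hypothetical.

```python
def simulate_rcp(alpha, beta, n_flows=10, capacity=1000.0, rtt=0.1,
                 initial_rate=200.0, steps=200):
    """Iterate the RCP rate update once per RTT on one bottleneck link.

    Returns (rates, queues): the per-flow rate and the queue size after
    each step, starting with the initial values.
    """
    rate, queue = initial_rate, 0.0
    rates, queues = [rate], [queue]
    for _ in range(steps):
        arrival = n_flows * rate                   # aggregate input rate y
        spare = alpha * (capacity - arrival)       # spare-capacity term
        drain = beta * queue / rtt                 # queue-feedback term
        new_rate = rate * (1.0 + (spare - drain) / capacity)  # T/d = 1
        # The queue integrates the excess arrival over one RTT, never below 0.
        queue = max(0.0, queue + (arrival - capacity) * rtt)
        rate = new_rate
        rates.append(rate)
        queues.append(queue)
    return rates, queues


FAIR_SHARE = 1000.0 / 10  # capacity / number of flows

# With queue feedback, the initial overload is drained.
rates_fb, queues_fb = simulate_rcp(alpha=0.4, beta=0.2)

# Without queue feedback, rates still settle but the backlog is never drained.
rates_nofb, queues_nofb = simulate_rcp(alpha=0.4, beta=0.0)
```

In this simplified setting both runs settle at the fair share, but only the run
with queue feedback empties the queue built up during the initial overload.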
Transactions on the World Wide Telecom Web (WWTW)
The World Wide Telecom Web (also known as the "spoken web") is a voice-driven
equivalent of the World Wide Web (WWW) over the telecom network. The spoken web
started as a pilot project by IBM Research India to enable developing regions to
leverage the benefits of the web through their mobile phones, which are required
only to have a simple numeric keypad and voice connections. We developed a
mechanism for securing financial transactions over this medium using social
trust, i.e., witnesses, to provide additional authentication factors for
transactions. Our work was peer reviewed internally at IBM Research India and
our ideas have been patented.