CS 671: Graph Streaming Algorithms and Lower Bounds (Fall 2020)


Instructor Sepehr Assadi
Credits 3 units
Schedule Tuesdays 12:00 PM - 3:00 PM (Online)
Prerequisites Background on (randomized) algorithms, complexity theory, discrete mathematics, and probability theory (basic concentration inequalities). Mathematical maturity and some experience with theoretical computer science beyond the introductory level, say, Advanced Algorithms (CS 513, CS 514, or an equivalent) is necessary.
Syllabus The full course syllabus is available here. This webpage contains the highlights of course syllabus that are potentially updated as the semester progresses.

Overview

Massive graphs abound: web-graphs, social networks, or biological networks are just a few examples. However, the traditional algorithmic approaches to these problems do not accurately capture the challenges of processing massive graphs. For instance, we can no longer assume a (fast) random access to the entire input on these graphs, as their sheer size prevents us from loading them into the main memory. As a result, there is a rapidly growing interest in developing algorithms that explicitly account for the restrictions of processing massive inputs. One particularly successful approach here is graph streaming algorithms. These are algorithms that process the input graphs by making one or few sequential passes over their edges while using a limited memory---these constraints capture several challenges of processing massive graphs such as I/O-efficiency or monitoring evolving graphs.

In this course, we will study the general picture of the landscape of powers and limitations of graph streaming algorithms, some of the key ideas in establishing these results, and some of the main open questions and gaps in our understanding of this model. This is a seminar course organized in the format of a focused reading group on the topic of graph streaming algorithms and lower bounds. The courseload involves writting criticial reviews, solving algorithmic problems, and presenting papers.

Logistics

This is a synchronous online course:

Grading

The final grade for the course will be based on the following weights: Students are expected to follow Rutgers academic integrity policy for all their work in this course.

The following is a summary of each of these components; see the course syllabus for more information.

Reviews: A key component of this course is to write critical reviews for the discussed papers in a format of conference reviews for TCS papers. The main purpose of the reviews is to familiarize the students with each paper before they are being discussed in the class. The reviews are for the weekly assigned readings and are due on 11:59pm EST on Monday each week, the day before the corresponding paper is being discussed in the class. The reviews should be turned in on Canvas; please use this template for your answers.

Problem sets: Each week, we will also have a single algorithmic or lower bound problem on the topic of the discussed paper. The main goal of these problem sets is to allow the students to test their understanding of the discussed topics in more depths. The solutions to these problemsets must be typeset in LaTeX and submitted via Canvas by 11:59pm EST on Tuesday the problem set is due; see LaTeX section for some resources on using LaTeX. The students are allowed to discuss the problem sets among themselves and there is no limit on the number of involved students. However, (1) the students should write their solutions completely independently (in particular, you should understand and be able to explain everything that is written in your solution); (2) you should include the name of your collaborators in your solutions.

Presentations: Each student is expected to present one or two papers to the class during the semester. This is the main component of the course for each student and involves the following three parts: (i) preparing the materials for the presentation (such as slides) one week before the in-class presentation to receive feedback from the Instructor; (ii) an in-class presentation for an entire lecture including the discussion of the main result, its proof, and any necessary background; (iii) a report on the presented paper including the proof of the main result presented in the class---the target of these notes are other students in the class, and the notes in particular should help other students to answer the problem set given for this presentation. The students do not need to write a review nor turn in a solution for the problem set for the paper they are presenting themselves.

The list of papers for presentation along with their allocated time slot is available on the Course Calendar: the lectures marked with * are for students' presentation---the main paper to present is the assigned reading paper but in some cases, the extra references give a simpler proof of the same result; in that case, the proof can be presented from the other papers. Students should send a list of three papers they would like to present to the Instructor by September 14th, 11:59PM EST.

Course Calendar

The following is a schedule of the lectures. The lectures marked with * are for students' presentations. The lecture notes marked as draft have not been proofread by the Instructor yet.

# Date Topics Assigned Reading Extra References Notes Psets
1 Tue 09/01 Course Policy, Graph Streaming, Communication Complexity -- FKMSZ04, B08 [lec1] [pset1]
Tue 09/08 No Class: Monday's schedule
2 Tue 09/15 Maximum Cardinality Matching in a Single Pass GKK12 K13, AB19, KMM12, B20 [lec2] [pset2]
3 Tue 09/22 Palette Sparsification: (Δ+1) Coloring in a Single Pass ACK19a AA20, BCG20 [lec3] [pset3]
4 Tue 09/29 Local Ratio: Maximum Weight Matching in a Single Pass* PS17 GW19, FKMSZ04, CS14 [lec4] [pset4]
5 Tue 10/06 Graph Sketching and Dynamic Streams: Min Cut and Sparsifiers* AGM12b AGM12a, KLMMS14 [lec5] [pset5]
6 Tue 10/13 Graph Sketching and the SMP model: Maximum Matching* AKLY16 DK20, AKL17 [lec6-draft] [pset6]
7 Tue 10/20 Information Complexity: Streaming Coverage Problems AKL16 ER14, HIMV16, CW16, MV17 [lec7] [pset7]
8 Tue 10/27 Sample-and-Prune for Multi-Pass Graph Streaming Algorithms KMVV13 ACGMW15 [pset8]
9 Tue 11/03 Iterative Methods: Weighted Matching in Multiple Passes* AG11 AG15 [pset9]
10 Tue 11/10 Iterative Methods: Shortest Path in Multiple Passes* BFKL17 [lec10-draft] [pset10]
11 Tue 11/17 Sparsificiation: Exact Minimum Cut in Two Passes* -- RSW18 [lec11-draft] [pset11]
12 Tue 11/24 Pointer Chasing for Multipass Graph Streaming Lower Bounds* GO13 FKMSZ05, ACK19b, AR20 [lec12-draft] --
13 Tue 12/01 Maximum Matching in Random-Order Streams (by A. Bernstein) B20 ABBMS19, FHMRR20 -- [pset12]
14 Tue 12/08 Subgraph Counting in Small Space* MVV16 BC17, BFLMS06

Resources

There is no official textbook for this course and all required materials will be posted on this webpage. The following is a list of some helpful supplementary materials:

Bibliography

This is a list of the papers related to the topics discussed in the lectures. The list may be updated throughout the semester to add the relevant papers.

Disclaimer: This course only covers a small subset of the vast literature in graph streaming and the discussed papers are by no means a comprehensive list of the entire area.

ACGMW15 Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth, Correlation Clustering in Data Streams. ICML 2015.
AG11 Kook Jin Ahn, Sudipto Guha, Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem. ICALP 2011.
AG15 Kook Jin Ahn, Sudipto Guha, Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints. SPAA 2015, TOPC 2018.
AGM12a Kook Jin Ahn, Sudipto Guha, Andrew McGregor, Analyzing Graph Structure via Linear Measurements. SODA 2012.
AGM12b Kook Jin Ahn, Sudipto Guha, Andrew McGregor, Graph sketches: sparsification, spanners, and subgraphs. PODS 2012.
AA20 Noga Alon, Sepehr Assadi, Palette Sparsification Beyond (Δ+1) Vertex Coloring. RANDOM 2020.
AB19 Sepehr Assadi, Aaron Bernstein, Towards a Unified Theory of Sparsification for Matching Problems. SOSA@SODA 2019.
ABBMS19 Sepehr Assadi, MohammadHossein Bateni, Aaron Bernstein, Vahab Mirrokni, Cliff Stein, Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs. SODA 2019.
ACK19a Sepehr Assadi, Yu Chen, Sanjeev Khanna, Sublinear Algorithms for (Δ+1) Vertex Coloring. SODA 2019.
ACK19b Sepehr Assadi, Yu Chen, Sanjeev Khanna, Polynomial Pass Lower Bounds for Graph Streaming Algorithms. STOC 2019.
AKL16 Sepehr Assadi, Sanjeev Khanna, Yang Li, Tight Bounds for Single-Pass Streaming Complexity of the Set Cover Problem. STOC 2016, SICOMP 2019.
AKL17 Sepehr Assadi, Sanjeev Khanna, Yang Li, On Estimating Maximum Matching Size in Graph Streams. SODA 2017.
AKLY16 Sepehr Assadi, Sanjeev Khanna, Yang Li, Grigory Yaroslavtsev, Maximum Matchings in Dynamic Graph Streams and the Simultaneous Communication Model. SODA 2016.
AR20 Sepehr Assadi, Ran Raz, Near-Quadratic Lower Bounds for Two-Pass Graph Streaming Algorithms. FOCS 2020.
B08 Surender Baswana, Streaming algorithm for graph spanners - single pass and constant processing time per edge.. Inf. Process. Lett 2008.
BFKL17 Ruben Becker, Sebastian Forster, Andreas Karrenbauer, Christoph Lenzen, Near-Optimal Approximate Shortest Paths and Transshipment in Distributed and Streaming Models. DISC 2017.
BCG20 Suman K. Bera, Amit Chakrabarti, Prantar Ghosh, Graph Coloring via Degeneracy in Streaming and Other Space-Conscious Models. ICALP 2020.
BC17 Suman K. Bera, Amit Chakrabarti, Towards Tighter Space Bounds for Counting Triangles and Other Substructures in Graph Streams. STACS 2017.
B20 Aaron Bernstein, Improved Bound for Matching in Random-Order Streams. ICALP 2020.
BFLMS06 Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti-Spaccamela, Christian Sohler, Counting triangles in data streams. PODS 2016.
CW16 Amit Chakrabarti, Anthony Wirth Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover. SODA 2016.
CS14 Michael Crouch, Daniel S. Stubbs Improved Streaming Algorithms for Weighted Matching, via Unweighted Matching. APPROX-RANDOM 2014.
DK20 Jacques Dark, Christian Konrad Optimal Lower Bounds for Matching and Vertex Cover in Dynamic Graph Streams. CCC 2020.
ER14 Yuval Emek, Adi Rosén Semi-Streaming Set Cover. ICALP 2014, TALG 2016.
FHMRR20 Alireza Farhadi, Mohammad Taghi Hajiaghayi, Tung Mai, Anup Rao, Ryan A. Rossi, Approximate Maximum Matching in Random Streams. SODA2020.
FKMSZ04 Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, Jian Zhang, On Graph Problems in a Semi-streaming Model. ICALP 2004, Theor. Comput. Sci. 2005.
FKMSZ05 Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, Jian Zhang, Graph Distances in the Data-Stream Model. SODA 2005, SICOMP 2008.
GW19 Mohsen Ghaffari, David Wajc, Simplified and Space-Optimal Semi-Streaming (2+ε)-Approximate Matching. SOSA@SODA 2019.
GKK12 Ashish Goel, Michael Kapralov, Sanjeev Khanna, On the communication and streaming complexity of maximum bipartite matching. SODA 2012.
GO13 Venkatesan Guruswami, Krzysztof Onak, Superlinear lower bounds for multipass graph processing. CCC 2013, Algorithmica 2016.
HIMV16 Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, Ali Vakilian, Towards Tight Bounds for the Streaming Set Cover Problem. PODS 2016.
K13 Michael Kapralov Better bounds for matchings in the streaming model. SODA 2013.
KKS15 Michael Kapralov, Sanjeev Khanna, Madhu Sudan, Streaming Lower Bounds for Approximating MAX-CUT. SODA 2015.
KK19 Michael Kapralov, Dmitry Krachun An Optimal Space Lower Bound for Approximating MAX-CUT. SODA 2013.
KLMMS14 Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, Aaron Sidford, Single Pass Spectral Sparsification in Dynamic Streams. FOCS 2014, SICOMP 2017.
KMM12 Christian Konrad, Frédéric Magniez, Claire Mathieu, Maximum Matching in Semi-Streaming with Few Passes. APPROX-RANDOM 2012.
KMVV13 Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, Andrea Vattani, Fast greedy algorithms in mapreduce and streaming. SPAA 2013, TOPC 2015.
MV17 Andrew McGregor, Hoa T. Vu, Better Streaming Algorithms for the Maximum Coverage Problem. ICDT 2017, Theory Comput. Syst. 2019.
MVV16 Andrew McGregor, Sofya Vorotnikova, Hoa T. Vu, Better Algorithms for Counting Triangles in Data Streams. PODS 2016.
PS17 Ami Paz, Gregory Schwartzman A (2 + ε)-Approximation for Maximum Weight Matching in the Semi-Streaming Model. SODA 2017.
RSW18 Aviad Rubinstein, Tselil Schramm, S. Matthew Weinberg, Computing Exact Minimum Cuts Without Knowing the Graph. ITCS 2018.
VY11 Elad Verbin, Wei Yu, The Streaming Complexity of Cycle Counting, Sorting by Reversals, and Other Problems. SODA 2011.

LaTeX

You can download LaTeX for free here. For the purpose of this course, you do not even need to install LaTeX and can instead use an online LaTeX editor such as Overleaf.

Two great introductory resources for LaTeX are A Short Introduction to LaTeX by Allin Cottrell (for general purpose LaTeX) and LaTeX for Undergraduates by Jim Hefferson (for undergraduates mathematics) accompanied by the following cheatsheet (note that this document use "\( MATH \)" notation compared to the perhaps more widely used "$ MATH $" -- both are completely fine in LaTeX). You can also use this wonderful tool Detexify by Daniel Kirsch for finding the LaTeX commands of a symbol (just draw the symbol!).

If you are interested in learning more about LaTeX (beyond what is needed for this course), check the Wikibook on LaTeX and the Wikibook on LaTeX for Mathematics.