CS 514: Advanced Algorithms II -- Fall 2021
(Sublinear Algorithms or "Algorithms for Big Data")


Instructor: Sepehr Assadi
Credits: 3 units
Schedule: Tuesdays 3:00 PM - 6:00 PM in TIL-226 (Livingston campus)
Prerequisites: Undergraduate courses on algorithms, complexity theory, discrete mathematics, and probability; mathematical maturity.
Syllabus: The full course syllabus is available here. This webpage contains the highlights of the course syllabus, which may be updated as the semester progresses.

Overview

With the emergence of massive datasets across different application domains, there is rapidly growing interest in solving various problems over immense amounts of data. However, even the most basic algorithms can become computationally prohibitive when processing massive datasets, as the inputs are often too large to be stored in one place or even read once. As a result, a new set of algorithmic tools and ideas is needed for computing with extremely constrained resources. This is the focus of sublinear algorithms, namely, algorithms whose resource requirements (e.g., time or space) are substantially smaller than the size of the input they operate on.

We will study various advanced algorithmic ideas through the lens of sublinear algorithms in this course. In particular, we consider the two most canonical models of sublinear algorithms, namely, sublinear time algorithms and streaming algorithms. We cover several key algorithmic techniques in these (and related) models and discuss the limitations inherent to computing with constrained resources.

Logistics

This course has no recitation sections.

COVID-19 Protocols: In accordance with Rutgers policy, masks must be worn during class meetings. See the course syllabus for more details.

List of Topics

The following is a tentative list of topics that will be covered in this course. Along the way, we will learn about various key ideas, such as probabilistic analysis of algorithms, compressed sensing, dimensionality reduction, sparsification, sketching, and coresets, that are used extensively in algorithm design as a whole and in sublinear algorithms in particular.

Grading

The final grade for the course will be based on the following weights: More details on the grading will be posted soon.

Course Calendar

The schedule below the red line is tentative and subject to change.

# | Date | Topics | References | Lecture notes and Remarks
1 | Tue 09/07 | Introduction, Course Policy, Probabilistic Analysis | -- | Lecture Notes 1
2 | Tue 09/14 | Sublinear Time Algorithms: Connected Components, Average Degree | CRT05, F06, GR08, S15 | Lecture Notes 2
3 | Tue 09/21 | Query Complexity: OR Function and Connectivity | BW02 | Lecture Notes 3 -- Pset 1 release: [pdf]
4 | Tue 09/28 | Property Testing: Testing Sortedness | EKKRV98 | Lecture Notes 4
5 | Tue 10/05 | Distribution Testing: Uniformity Testing | BFRSW00 | Lecture Notes 5
6 | Tue 10/12 | Compressed Sensing and Sparse Recovery | BHRRS18, RSW18 | Pset 1 due date
7 | Tue 10/19 | Streaming Algorithms: Frequency Moments Estimation | AMS96, BJKST02 | Lecture Notes 7 -- Pset 2 release: [pdf]
8 | Tue 10/26 | Communication Complexity: Equality, Index | A96, T16 | Lecture Notes 8
9 | Tue 11/02 | Streaming Algorithms: Regression via Dimensionality Reduction | CW09 |
10 | Tue 11/09 | Streaming Algorithms: Clustering via Coresets | GMMMO03, G09 | Pset 2 due date
11 | Tue 11/16 | Graph Streaming Algorithms: Connectivity, Shortest Paths, Coloring | FKMSZ04, ACK19 | Pset 3 release date: [pdf]
12 | Tue 11/23 | Graph Sketching: AGM Sketch for Connectivity | AGM12 |

13 | Tue 11/30 | Multi-Pass Graph Streaming Algorithms | AHK12 |
14 | Tue 12/07 | Student Presentations | | Pset 3 due date

Project

The project can take one of the following forms: A list of project ideas (including open theory problems and some directions to explore) will be posted sometime in October. However, you are strongly encouraged to approach the instructor before this date with any project idea you have on sublinear algorithms to pick as your own project. Note that your project does not need to be limited to the topics discussed in class, as long as it is (loosely) related to sublinear algorithms.

Project policies: Timetable for Projects:

Resources

There is no official textbook for this course, and all required materials will be posted on this webpage. The following is a list of some helpful supplementary materials (this list is by no means comprehensive): Last but not least, you should definitely check the List of Open Problems in Sublinear Algorithms, one of the best places to get recent pointers on sublinear algorithms.

Bibliography

This is a (far from comprehensive) list of papers related to the topics discussed in the lectures. The list will be updated after each lecture to add the new relevant papers.

A96 Farid M. Ablayev, Lower Bounds for One-Way Probabilistic Communication Complexity and Their Application to Space Complexity. Theor. Comput. Sci. 1996, ICALP 1993.
AG11 Kook Jin Ahn, Sudipto Guha, Linear Programming in the Semi-streaming Model with Application to the Maximum Matching Problem. ICALP 2011.
AGM12 Kook Jin Ahn, Sudipto Guha, Andrew McGregor, Analyzing Graph Structure via Linear Measurements. SODA 2012.
AMS96 Noga Alon, Yossi Matias, Mario Szegedy, The Space Complexity of Approximating the Frequency Moments. STOC 1996.
AHK12 Sanjeev Arora, Elad Hazan, Satyen Kale, The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing 2012.
ACK19 Sepehr Assadi, Yu Chen, Sanjeev Khanna, Sublinear Algorithms for (Δ+1) Vertex Coloring. SODA 2019.
BFRSW00 Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, Patrick White, Testing That Distributions Are Close. FOCS 2000.
BJKST02 Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, Luca Trevisan, Counting Distinct Elements in a Data Stream. RANDOM 2002.
BHRRS18 Paul Beame, Sariel Har-Peled, Sivaramakrishnan Natarajan Ramamoorthy, Cyrus Rashtchian, Makrand Sinha, Edge Estimation with Independent Set Oracles. ITCS 2018.
BW02 Harry Buhrman, Ronald de Wolf, Complexity Measures and Decision Tree Complexity: A Survey. Theor. Comput. Sci., 2002.
CW09 Kenneth L. Clarkson, David P. Woodruff, Numerical Linear Algebra in the Streaming Model. STOC 2009.
CRT05 Bernard Chazelle, Ronitt Rubinfeld, Luca Trevisan, Approximating the Minimum Spanning Tree Weight in Sublinear Time. SIAM Journal on Computing 2005, ICALP 2001.
EKKRV98 Funda Ergün, Sampath Kannan, Ravi Kumar, Ronitt Rubinfeld, Mahesh Viswanathan, Spot-Checkers. STOC 1998.
F06 Uriel Feige, On Sums of Independent Random Variables with Unbounded Variance and Estimating the Average Degree in a Graph. SIAM Journal on Computing 2006, STOC 2004.
FKMSZ04 Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, Jian Zhang, On Graph Problems in a Semi-streaming Model. ICALP 2004.
G09 Sudipto Guha, Tight Results for Clustering and Summarizing Data Streams. ICDT 2009.
GMMMO03 Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani, Liadan O'Callaghan, Clustering Data Streams: Theory and Practice. IEEE Trans. Knowl. Data Eng. 2003, FOCS 2000.
GR08 Oded Goldreich, Dana Ron, Approximating Average Parameters of Graphs. Random Structures and Algorithms 2008, APPROX-RANDOM 2006.
PR07 Michal Parnas, Dana Ron, Approximating the Minimum Vertex Cover in Sublinear Time and a Connection to Distributed Algorithms. Theor. Comput. Sci., 2007.
RSW18 Aviad Rubinstein, Tselil Schramm, S. Matthew Weinberg, Computing Exact Minimum Cuts Without Knowing the Graph. ITCS 2018.
RTVX11 Ronitt Rubinfeld, Gil Tamir, Shai Vardi, Ning Xie, Fast Local Computation Algorithms. I(T)CS 2011.
S15 C. Seshadhri, A Simpler Sublinear Algorithm for Approximating the Triangle Count. Available on arXiv.
T16 Tim Roughgarden, Communication Complexity (for Algorithm Designers). Foundations and Trends in Theoretical Computer Science 2016.

LaTeX

You can download LaTeX for free here. For the purpose of this course, you do not even need to install LaTeX and can instead use an online LaTeX editor such as Overleaf.

Two great introductory resources for LaTeX are A Short Introduction to LaTeX by Allin Cottrell (for general-purpose LaTeX) and LaTeX for Undergraduates by Jim Hefferon (for undergraduate mathematics), accompanied by the following cheatsheet (note that this document uses the "\( MATH \)" notation rather than the perhaps more widely used "$ MATH $"; both are completely fine in LaTeX). You can also use the wonderful tool Detexify by Daniel Kirsch to find the LaTeX command for a symbol (just draw the symbol!).
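
As a minimal sketch of the two inline-math notations mentioned above, the following small document typesets the same formula both ways. The sentence and the formula are made up purely for illustration and are not taken from the course materials.

```latex
\documentclass{article}
\begin{document}

% Inline math written with the \( ... \) delimiters.
The estimator has variance \( \sigma^2 / k \).

% The same formula written with the $ ... $ delimiters.
The estimator has variance $ \sigma^2 / k $.

\end{document}
```

Both lines should produce identical output, so either notation can be used in the problem sets.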

If you are interested in learning more about LaTeX (beyond what is needed for this course), check the Wikibook on LaTeX and the Wikibook on LaTeX for Mathematics.

Rutgers CS Diversity and Inclusion Statement

Rutgers Computer Science Department is committed to creating a consciously anti-racist, inclusive community that welcomes diversity in various dimensions (e.g., race, national origin, gender, sexuality, disability status, class, or religious beliefs). We will not tolerate micro-aggressions and discrimination that creates a hostile atmosphere in the class and/or threatens the well-being of our students. We will continuously strive to create a safe learning environment that allows for the open exchange of ideas and cherished freedom of speech, while also ensuring equitable opportunities and respect for all of us. Our goal is to maintain an environment where students, staff, and faculty can contribute without the fear of ridicule or intolerant or offensive language.

If you witness or experience racism, discrimination, micro-aggressions, or other offensive behavior, you are encouraged to bring it to the attention of the undergraduate program director and/or the department chair. You can also report it to the Bias Incident Reporting System.

COVID-19 Protocols

In order to protect the health and well-being of all members of the University community, masks must be worn by all persons on campus when in the presence of others (within six feet) and in buildings in non-private enclosed settings (e.g., common workspaces, workstations, meeting rooms, classrooms, etc.). Masks must be worn during class meetings; any student not wearing a mask will be asked to leave.

Masks should conform to CDC guidelines and should completely cover the nose and mouth.

If you are feeling sick, or suspect you may have been exposed to COVID-19, do not come to class. Arrangements will be made for students who are not able to attend class because of illness or quarantine.