This workshop aims to foster collaborations between researchers across multiple disciplines through a set of central questions and techniques for algorithm design for large data.
By bringing together experts from diverse fields, we seek to explore innovative approaches and shared challenges in this rapidly evolving area.
Due to a generous grant from SIGACT, the workshop is free to attend for all interested participants.
The workshop is now over. Thanks for your interest in WALDO 2025!
See below for talk details, including abstracts.
12:00 pm ET
Opening Remarks by SIGACT
12:05 pm ET
Anupam Gupta: The Price of Explainability for Clustering
[Abstract]
A recent line of work asks: given a clustering optimization problem (like k-means or k-medians), how does the cost of the best explainable clustering compare to that of the best clustering?
We consider the model where a clustering is called explainable if it can be defined by a tree of axis-parallel cuts. Building on a recent sequence of works, we pin down the optimal price of explainability
for k-median clustering, and also improve on existing results for k-means clustering. In this talk, I will discuss some of these results (both ours and prior results), the techniques underlying them,
and point out some of the open questions in the area.
This talk is based on this paper (https://arxiv.org/abs/2304.09743) with Madhusudhan Pittu (CMU), Ola Svensson (EPFL), and Rachel Yuan (CMU).
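For readers unfamiliar with the model, "explainable by a tree of axis-parallel cuts" means that cluster membership is determined by a small threshold tree over the coordinates, so each cluster is an axis-parallel box. A toy sketch of such an assignment rule in Python (illustrative only, not the paper's construction; the tree below is hypothetical):

def assign(point, node):
    """Walk a threshold tree: each internal node tests a single coordinate
    against a threshold, so every leaf (cluster) is an axis-parallel box."""
    while isinstance(node, dict):                     # internal node
        branch = "left" if point[node["axis"]] <= node["threshold"] else "right"
        node = node[branch]
    return node                                       # leaf = cluster id

# Three clusters defined by two axis-parallel cuts.
tree = {"axis": 0, "threshold": 0.5,
        "left": 0,
        "right": {"axis": 1, "threshold": 2.0, "left": 1, "right": 2}}

print(assign((0.2, 9.0), tree), assign((0.9, 1.0), tree), assign((0.9, 5.0), tree))  # 0 1 2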
12:30 pm ET
Mark Braverman: Optimality of Frequency Moment Estimation
[Abstract]
Estimating the second frequency moment of a stream up to (1 ± ε) multiplicative error requires at most O(log n / ε²) bits of
space, due to a seminal result of Alon, Matias, and Szegedy. It is
also known that at least Ω(log n + 1/ε²) space is needed.
We prove a tight lower bound of Ω(log(nε²) / ε²) for all ε = Ω(1/√n).
When ε > n^(-1/2 + c), where c > 0, our lower bound matches the
classic upper bound of AMS. For smaller values of ε, we also introduce
a revised algorithm that improves the classic AMS bound and matches
our lower bound.
Our lower bound also applies to the more general problem of p-th
frequency moment estimation for the range of p in (1, 2], providing a
tight bound in the only remaining range to settle the optimal space
complexity of estimating frequency moments.
Based on joint work with Or Zamir.
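As background for the AMS bound mentioned above, here is a toy version of the classic second-moment estimator (illustrative only; it uses fully random signs rather than the 4-wise independent hashing needed for small space, and it is not the revised algorithm from the talk):

import random
from collections import defaultdict

def ams_f2_estimate(stream, num_estimators=2000):
    """Toy AMS-style estimator for F2 = sum_i f_i^2.
    Each estimator keeps a single counter z = sum_i s(i) * f_i with random
    signs s(i); E[z^2] = F2, and averaging reduces the variance."""
    estimates = []
    for _ in range(num_estimators):
        signs = defaultdict(lambda: random.choice((-1, 1)))
        z = 0
        for item in stream:
            z += signs[item]
        estimates.append(z * z)
    return sum(estimates) / num_estimators

stream = [1, 2, 1, 3, 1, 2]   # frequencies 3, 2, 1, so F2 = 9 + 4 + 1 = 14
print(ams_f2_estimate(stream))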
12:55 pm ET
Coffee Break (15 mins)
1:10 pm ET
David P. Woodruff: Lifting Linear Sketches: Optimal Bounds and Adversarial Robustness
[Abstract]
We introduce a novel technique for "lifting" dimension lower bounds for linear sketches in the continuous setting to dimension lower bounds for
linear sketches with polynomially-bounded integer entries when the input is a polynomially-bounded integer vector. Using this technique,
we obtain the first optimal sketching lower bounds for discrete inputs in a data stream, for classical problems such as approximating
the frequency moments, estimating the operator norm, and compressed sensing. Additionally, we lift the adaptive attack of Hardt and Woodruff (STOC, 2013)
for breaking any real-valued linear sketch via a sequence of real-valued queries, and show how to obtain an attack on any integer-valued linear sketch using
integer-valued queries. This shows that there is no linear sketch in a data stream with insertions and deletions that is adversarially robust for approximating
any $L_p$ norm of the input. This resolves a central open question for adversarially robust streaming algorithms. To do so, we introduce a
new pre-processing technique of independent interest which, given an integer-valued linear sketch, increases the dimension of the sketch
by only a constant factor in order to make the orthogonal lattice to its row span smooth, enabling us to leverage results in lattice theory on
discrete Gaussian distributions and reason that efficient discrete sketches imply efficient continuous sketches.
Our work resolves open questions from the Banff '14 and '17 workshops on Communication Complexity and Applications,
as well as the STOC '21 and FOCS '23 workshops on adaptivity and robustness.
Based on joint work with Elena Gribelyuk, Honghao Lin, Huacheng Yu, and Samson Zhou.
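As background, a linear sketch maintains S·x for a fixed sketching matrix S, which is exactly why it handles streams with both insertions and deletions. A minimal illustration of that update rule (a generic sketch, unrelated to the attacks or lower bounds in the talk; the ±1 matrix below is a placeholder):

import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 50
S = rng.choice((-1, 1), size=(k, n)).astype(float)   # fixed sketching matrix, k << n
sketch = np.zeros(k)

def update(i, delta):
    """Insertion (delta = +1) or deletion (delta = -1) of coordinate i:
    by linearity, the sketch of x is maintained without ever storing x."""
    global sketch
    sketch += delta * S[:, i]

update(3, +1); update(3, +1); update(7, -1)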
1:35 pm ET
Slobodan Mitrović: Computing Graph Cuts Privately
[Abstract]
With the increasing availability of publicly shared statistics derived from private datasets, safeguarding users' personal information has become crucial. Differential privacy (DP)
has emerged as a widely adopted framework for quantifying the level of individual privacy an algorithm preserves.
Over the past decade, numerous fundamental algorithms have been studied within the context of DP. This presentation will focus on
recent advances in the differentially private computation of graph cuts. We will also touch on multiple interesting open problems in this area of research.
2:00 pm ET
Monday Poster Session
3:00 pm ET
Madhur Tulsiani: Expander Graphs and Optimally List-Decodable Codes
[Abstract]
We construct a new family of explicit codes that are list decodable to capacity and achieve an optimal list size of O(1/ϵ).
In contrast to existing explicit constructions of codes achieving list decoding capacity, our arguments do not rely on
algebraic structure but utilize simple combinatorial properties of expander graphs.
Our construction is based on a celebrated distance amplification procedure due to Alon, Edmonds, and Luby [FOCS'95],
which transforms any high-rate code into one with near-optimal rate-distance tradeoff. We generalize it to show that the same procedure can be used
to transform any high-rate code into one that achieves list decoding capacity. Our proof can be interpreted as a "local-to-global" phenomenon
for (a slight strengthening of) the generalized Singleton bound.
As a corollary of our result, we also obtain the first explicit construction of LDPC codes achieving list decoding capacity,
and in fact arbitrarily close to the generalized Singleton bound.
Based on joint work with Fernando Granha Jeronimo, Tushant Mittal, and Shashank Srivastava.
3:25 pm ET
Meghal Gupta: Stream-Decodable Error-Correcting Codes
[Abstract]
In the standard noisy communication model, Alice encodes a message using an error-correcting code and sends it to Bob,
who decodes it after receiving the entire message and storing it in memory. In this talk, we'll explore what happens when Bob
doesn't have enough memory to store the whole message and must instead decode it bit by bit as it arrives. We'll define what it
means for a code to be stream-decodable and present nearly matching upper and lower bounds on the code length required in this setting.
This is based on joint works with Venkat Guruswami, Mihir Singhal, and Rachel Zhang.
3:50 pm ET
Coffee Break (15 mins)
4:05 pm ET
Rocco Servedio: Is Nasty Noise Actually Harder than Malicious Noise?
[Abstract]
We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions:
* malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and
* nasty noise, in which an adversary can arbitrarily corrupt an adversarially chosen subset of examples given to the learner.
We consider both the distribution-independent and fixed-distribution settings. Our main results highlight a dramatic difference between these two settings:
1. For distribution-independent learning, we prove a strong equivalence between the two noise models: If a class C of functions is efficiently learnable in the presence of η-rate malicious noise, then it is also efficiently learnable in the presence of η-rate nasty noise.
2. In sharp contrast, for the fixed-distribution setting we show an arbitrarily large separation: Under a standard cryptographic assumption, for any arbitrarily large value r there exists a concept class for which there is a ratio of r between the rate η_malicious of
malicious noise that polynomial-time learning algorithms can tolerate, versus the rate η_nasty of nasty noise that such learning algorithms can tolerate.
Joint work with Guy Blanc, Yizhi Huang, and Tal Malkin.
4:30 pm ET
Pravesh Kothari: The Quasi-polynomial Low-Degree Conjecture is False
[Abstract]
There is a growing body of work on proving hardness results for average-case optimization problems by bounding
the low-degree likelihood ratio (LDLR) --- a quantitative estimate of the closeness of low-degree moments --- between a null distribution
and a related planted distribution. Such hardness results are now ubiquitous for foundational problems in algorithms, statistics, and cryptography.
This line of work is supported by the low-degree conjecture of Hopkins, which postulates that a vanishing degree-D LDLR implies
the absence of any noise-tolerant distinguishing algorithm with runtime n^{\widetilde{O}(D)} whenever 1) the null distribution is
product on $\{0,1\}^{\binom{n}{k}}$, and 2) the planted distribution is permutation invariant, that is, invariant under any relabeling $[n] \rightarrow [n]$.
In this talk, I'll present our refutation of this conjecture. Specifically, we show that for all sufficiently small $\epsilon>0$ and
any $k\geq 2$, there is a permutation-invariant planted distribution on $\{0,1\}^{\binom{n}{k}}$ that has a vanishing degree-$n^{1-O(\epsilon)}$
LDLR with respect to the uniform distribution on $\{0,1\}^{\binom{n}{k}}$ even as an $n^{O(\log^{1/(k-1)}(n))}$-time algorithm solves
the corresponding $\epsilon$-noisy distinguishing problem. Our construction relies on algorithms for list decoding for noisy polynomial interpolation in the high-error regime.
We also give another construction of a planted distribution and a (non-product) null distribution on $\mathbb{R}^{n \times n}$ with a
vanishing $n^{\Omega(1)}$-degree LDLR even as simply the largest eigenvalue serves as an efficient noise-tolerant distinguisher.
Our results suggest that while a vanishing LDLR bound may still be interpreted as evidence of hardness, developing a theory of average-case
complexity based on such heuristics requires a more careful approach.
12:00 pm ET
Opening Remarks by Steering Committee
12:05 pm ET
Michael Kapralov: Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions
[Abstract]
Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian Kernel matrices $K\in \mathbb{R}^{n\times n}$.
$K$'s columns are indexed by a set of $n$ keys $k_1, k_2, \ldots, k_n \in \mathbb{R}^d$, rows by a set of $n$ queries $q_1, q_2, \ldots, q_n \in \mathbb{R}^d$, and
its $i,j$ entry is $K_{ij} = e^{-\|q_i-k_j\|_2^2/2\sigma^2}$ for some bandwidth parameter $\sigma>0$. Given a vector $x\in \mathbb{R}^n$ and error parameter $\epsilon>0$,
our task is to output a $y\in \mathbb{R}^n$ such that $\|Kx-y\|_2\leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$.
Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$,
as opposed to worst case quadratic growth. We validate this assumption experimentally, for Gaussian kernel matrices encountered in
various settings such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption,
for unrestricted vectors.
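For reference, computing the product exactly in the notation above takes Θ(n²d) time; the subquadratic algorithms in the talk aim to beat this baseline under the linear-mass assumption. A direct (quadratic-time) baseline looks like the following sketch, with synthetic data standing in for real keys and queries:

import numpy as np

def gaussian_kernel_matvec(queries, keys, x, sigma=1.0):
    """Exact baseline: y = K x with K_ij = exp(-||q_i - k_j||^2 / (2 sigma^2)).
    Forming K explicitly costs Theta(n^2 d) time and Theta(n^2) memory."""
    sq_dists = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(axis=-1)  # n x n
    K = np.exp(-sq_dists / (2 * sigma ** 2))
    return K @ x

rng = np.random.default_rng(0)
n, d = 500, 16
q, k, x = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=n)
y = gaussian_kernel_matvec(q, k, x)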
12:30 pm ET
Sanjeev Khanna: Correlation Clustering and (De)Sparsification: Graph Sketches Can Match Classical Algorithms
[Abstract]
Correlation clustering is a widely-used approach for clustering large data sets based only on pairwise similarity information.
In recent years, there has been a steady stream of better and better classical algorithms for approximating this problem.
Meanwhile, another line of research has focused on porting the classical advances to various sublinear algorithm models, including semi-streaming,
Massively Parallel Computation (MPC), and distributed computing. Yet, these latter works typically rely on
ad-hoc approaches that may not always keep up with advances in improved approximation ratios achieved by classical algorithms.
This raises the following natural question: can the gains made by classical algorithms for correlation clustering be ported over to
sublinear algorithms in a black-box manner? We answer this question in the affirmative via the paradigm of graph de-sparsification that may be of independent interest.
This is joint work with Sepehr Assadi and Aaron (Louie) Putterman.
12:55 pm ET
Coffee Break (15 mins)
1:10 pm ET
Nicole Wein: Covering Approximate Shortest Paths with DAGs
[Abstract]
I will talk about a new notion that we define, called a "DAG cover". It is a directed analog of a tree cover,
which is closely related to a probabilistic tree embedding. A DAG cover of a general directed graph G is a
small collection of DAGs so that for all pairs of vertices s,t, some DAG in the collection provides low distortion for dist(s,t).
I will discuss upper and lower bounds for DAG covers in various parameter regimes, and pose some open problems.
Joint work with Sepehr Assadi and Gary Hoppenworth.
1:35 pm ET
Karthik C.S.: Near-Optimal Lower Bound for Parameterized Euclidean k-means
[Abstract]
The Euclidean k-means problem is a fundamental problem in computational geometry, where given a set of points in high-dimensional space,
the goal is to find k representative points so as to minimize the sum of the squared distances from each point to its closest representative.
In seminal works, de la Vega, Karpinski, Kenyon, and Rabani [STOC'03] and Kumar, Sabharwal, and Sen [JACM'10] showed how to obtain a (1+eps)-approximation for
high-dimensional Euclidean k-means in time poly(nd) · 2^{(k/eps)^{O(1)}}.
In this talk, we introduce a new fine-grained hypothesis called Exponential Time for Expanders Hypothesis (XXH), which roughly asserts that there are no
non-trivial exponential-time approximation algorithms for the vertex cover problem on near-perfect vertex expanders. Assuming XXH,
we close the above long line of work on approximating Euclidean k-means by showing that there is no poly(nd) 2^(o(k/eps)) time algorithm
achieving a (1+eps)-approximation for k-means in Euclidean space.
This lower bound is tight as it matches the algorithm given by Feldman, Monemizadeh, and Sohler [SoCG'07] up to log factors in the exponent.
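As a reminder, the objective itself is simple to evaluate; the hardness concerns the time needed to approximately minimize it. A brute-force evaluation of the cost of a fixed set of representatives (illustrative only, with synthetic data) is:

import numpy as np

def kmeans_cost(points, centers):
    """Euclidean k-means objective: sum over points of the squared distance
    to the nearest representative."""
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # n x k
    return sq_dists.min(axis=1).sum()

rng = np.random.default_rng(1)
P = rng.normal(size=(200, 5))
C = P[rng.choice(len(P), size=3, replace=False)]   # a candidate set of k = 3 representatives
print(kmeans_cost(P, C))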
2:00 pm ET
Soheil Behnezhad: Vizing's Theorem in Near-Linear Time
[Abstract]
In his classic result, Vizing (1964) proved that any graph of maximum degree ∆ can be edge colored using at most ∆+1 different colors.
Vizing’s original proof is algorithmic and shows that such an edge coloring can be found in O(mn) time where m and n denote the number of
edges and vertices respectively. This was subsequently improved to O~(m \sqrt{n}) independently by [Arjomandi ’82; Gabow et al. ’85].
This bound remained state-of-the-art for four decades, and only recently got improved to O~(min{n^2, mn^{1/4}}) [Assadi ’24; Bhattacharya et al. ’24].
In this talk, I will present a completely new approach to edge coloring that leads to the
first near-linear—in fact O(m log ∆)—time algorithm for Vizing’s theorem.
Based on a recent joint work with Sepehr Assadi, Sayan Bhattacharya, Martín Costa, Shay Solomon, and Tianyi Zhang (to appear in STOC 2025).
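For context, a single greedy pass already edge-colors any graph with at most 2∆ − 1 colors; the difficulty the talk addresses is achieving Vizing's ∆ + 1 bound in near-linear time. A toy sketch of the greedy baseline (not the talk's algorithm):

def greedy_edge_coloring(edges):
    """Greedy baseline: give each edge the smallest color not already used at
    either endpoint. Each edge sees at most 2*(Delta - 1) forbidden colors, so
    at most 2*Delta - 1 colors are ever used (compare Vizing's Delta + 1)."""
    used = {}        # vertex -> set of colors incident to it
    coloring = {}
    for u, v in edges:
        taken = used.setdefault(u, set()) | used.setdefault(v, set())
        color = next(c for c in range(len(taken) + 1) if c not in taken)
        coloring[(u, v)] = color
        used[u].add(color)
        used[v].add(color)
    return coloring

print(greedy_edge_coloring([(1, 2), (2, 3), (1, 3), (3, 4)]))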
2:25 pm ET
Break (35 mins)
3:00 pm ET
Vincent Cohen-Addad: Solving the Correlation Cluster LP in Sublinear Time
[Abstract]
Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a
clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges.
[CCL+24] introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations.
However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, [CCL+24] showed how to find a feasible
solution for the cluster LP in time n^{poly(1/ε)} with objective value at most (1 + ε) times the value of an optimal solution for the respective Correlation Clustering instance.
Furthermore, they showed how to round a solution to the cluster LP, yielding a (1.437 + ε)-approximation algorithm for the Correlation Clustering problem.
The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most (1 + ε) times the optimum in time Õ(2^{poly(1/ε)} · n),
where n is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast (1.437 + ε)-approximation
algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast
algorithms.
Joint work with Nairen Cao, Euiwoong Lee, Shi Li, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, and Hanwen Zhang.
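For concreteness, the objective value of a candidate clustering is easy to evaluate directly (a toy sketch, not the LP-based algorithm from the talk; the small graph below is hypothetical):

from itertools import combinations

def correlation_clustering_cost(vertices, edges, clusters):
    """Number of disagreements: edges cut between clusters plus missing edges
    inside clusters. `edges` is a set of frozensets and `clusters` maps each
    vertex to its cluster id."""
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = clusters[u] == clusters[v]
        is_edge = frozenset((u, v)) in edges
        if is_edge != same_cluster:
            cost += 1
    return cost

V = [1, 2, 3, 4]
E = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}
print(correlation_clustering_cost(V, E, {1: 0, 2: 0, 3: 1, 4: 1}))   # -> 1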
3:25 pm ET
Elena Grigorescu: Differential Privacy and Sublinear-Time are Incompatible Sometimes
[Abstract]
Differential privacy and sublinear algorithms have both been well-studied paradigms in big data analysis. Although recent works have shown the existence of differentially private
sublinear algorithms for many problems, little is known regarding hardness results on these algorithms. In this paper, we initiate the study of lower bounds for problems that
aim for both differentially private and sublinear-time algorithms. Our main result shows that the two desiderata are incompatible in general.
In particular, we prove that a simple problem based on one-way marginals admits both a differentially private algorithm and a sublinear-time algorithm,
but does not admit a "strictly" sublinear-time algorithm that is also differentially private.
Joint work with Jeremiah Blocki, Hendrik Fichtenberger,
Tamalika Mukherjee.
3:50 pm ET
Coffee Break (15 mins)
4:05 pm ET
Jane Lange: Lifting Uniform Learners via Distributional Decomposition
[Abstract]
We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an
arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$,
running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by depth-$d$ decision trees,
where $m$ is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from $\mathcal{D}$,
and for general ones it uses subcube conditioning samples.
A key technical ingredient is an algorithm which, given the aforementioned access to $\mathcal{D}$, produces an \emph{optimal} decision tree
decomposition of $\mathcal{D}$: an approximation of $\mathcal{D}$ as a mixture of uniform distributions over disjoint subcubes.
With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree.
This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially
improve on the prior state of the art---results of independent interest in distribution learning.
4:30 pm ET
Josh Brakensiek: Redundancy Is All You Need (for Sparsification)
[Abstract]
The seminal work of Benczúr and Karger demonstrated cut sparsifiers of near-linear size, with several applications throughout theoretical computer science. Subsequent extensions have yielded sparsifiers for hypergraph cuts and more recently linear codes over Abelian groups.
A decade ago, Kogan and Krauthgamer asked about the sparsifiability of arbitrary constraint satisfaction problems (CSPs). For this question, a trivial lower bound is the size of a non-redundant CSP instance, which admits, for each constraint, an assignment satisfying only
that constraint (so that no constraint can be dropped by the sparsifier). For graph cuts, spanning trees are non-redundant instances.
Our main result is that redundant clauses are sufficient for sparsification: for any CSP predicate R, every unweighted instance of CSP(R) has a sparsifier of size at most its non-redundancy (up to polylog factors). For weighted instances, we similarly pin down the sparsifiability
to the so-called chain length of the predicate. These results precisely determine the extent to which any CSP can be sparsified. A key technical ingredient in our work is a novel application of the entropy method from Gilmer's recent breakthrough on the union-closed sets conjecture.
As an immediate consequence of our main theorem, a number of results in the non-redundancy literature immediately extend to CSP sparsification. We also contribute new techniques for understanding the non-redundancy of CSP predicates. In particular, we give an explicit family of
predicates whose non-redundancy roughly corresponds to the structure of matching vector families in coding theory. By adapting methods from the matching vector codes literature, we are able to construct an explicit predicate whose non-redundancy has a non-integral exponent.
Joint work with Venkatesan Guruswami.
12:00 pm ET
Opening Remarks by Organizers
12:30 pm ET
Sofya Raskhodnikova: Fully Dynamic Algorithms for Graphs with Edge Differential Privacy
[Abstract]
We study differentially private algorithms for analyzing graphs in the challenging setting of continual release with fully dynamic updates, where edges are inserted and deleted over time,
and the algorithm is required to update the solution at every time step. Previous work has presented differentially private algorithms for many graph problems that can handle insertions only or deletions only
(called partially dynamic algorithms) and obtained some hardness results for the fully dynamic setting. The only algorithms in the latter setting were for the edge count, given by Fichtenberger, Henzinger, and Ost (ESA '21),
and for releasing the values of all graph cuts, given by Fichtenberger, Henzinger, and Upadhyay (ICML '23). We provide the first differentially private and fully dynamic graph algorithms for several
other fundamental graph statistics (including the triangle count, the number of connected components, the size of the maximum matching, and the degree histogram), analyze their error, and show strong
lower bounds on the error for all algorithms in this setting.
In the talk, we will discuss two variants of edge differential privacy for fully dynamic graph algorithms and our current understanding of the error achievable under both variants: event-level and item-level.
Under the former notion, two graph update sequences are considered neighboring if, roughly speaking, they differ in at most one update; under the latter notion, they can differ only in updates pertaining to one edge.
Differential privacy requires that for any two neighboring inputs, the output distributions of the algorithm are close. We give upper and lower bounds on the error of both---event-level and item-level---fully dynamic algorithms
for several fundamental graph problems. No fully dynamic algorithms that are private at the item-level (the more stringent of the two notions) were known before. In the case of item-level privacy, for several problems,
our algorithms match our lower bounds.
Joint work with Teresa Anna Steiner.
12:55 pm ET
Coffee Break (15 mins)
1:10 pm ET
Piotr Indyk: A Bi-metric Framework for Fast Similarity Search
[Abstract]
We propose a new "bi-metric" framework for designing nearest neighbor data structures. Our framework assumes two dissimilarity functions:
a ground-truth metric that is accurate but expensive to compute (e.g., a cross-encoder that runs a large neural network to compare two sentences),
and a proxy metric that is cheaper but less accurate (e.g., the Euclidean distance between feature vectors representing the sentences, possibly sketched).
In both theory and practice, we show how to construct data structures using only the proxy metric such that the query procedure achieves
the accuracy of the expensive metric, while only using a limited number of calls to both metrics.
Our theoretical results instantiate this framework for two popular nearest neighbor search algorithms: DiskANN and Cover Tree.
In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor,
our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric. On the empirical side, we apply the
framework to the text retrieval problem with two dissimilarity functions evaluated by ML models with vastly different computational costs.
We observe that for almost all data sets in the MTEB benchmark, our approach achieves a considerably better accuracy-efficiency tradeoff than the alternatives, such as re-ranking.
Joint work with Haike Xu and Sandeep Silwal.
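For context, the simplest bi-metric baseline mentioned at the end of the abstract is re-ranking: retrieve candidates under the cheap proxy metric, then re-score only those candidates with the expensive metric. A hedged sketch of that baseline (the distance functions and data below are placeholders, not the models used in the paper):

def rerank_search(query, items, proxy_dist, exact_dist, num_candidates=100, k=10):
    """Re-ranking baseline: filter with the cheap proxy metric, then order a
    small candidate set with the expensive ground-truth metric."""
    candidates = sorted(items, key=lambda v: proxy_dist(query, v))[:num_candidates]
    return sorted(candidates, key=lambda v: exact_dist(query, v))[:k]

items = [(float(i), float(i % 7)) for i in range(1000)]
proxy = lambda q, v: abs(q[0] - v[0])                                  # cheap surrogate
exact = lambda q, v: ((q[0] - v[0]) ** 2 + (q[1] - v[1]) ** 2) ** 0.5  # "expensive" metric
print(rerank_search((3.0, 2.0), items, proxy, exact, k=3))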
1:35 pm ET
Sumegha Garg: A New Information Complexity Measure for Multi-Pass Streaming Algorithms
[Abstract]
In this talk, we will introduce a new notion of information complexity for multi-pass streaming problems, and use it to prove memory lower bounds for the coin problem.
In the coin problem, one sees a stream of n i.i.d. uniform bits and one would like to compute the majority (or sum) with constant advantage.
We show that any constant-pass algorithm must use Ω(log n) bits of memory. This information complexity notion is also useful to prove tight
space complexity for the needle problem, which in turn implies tight bounds for the problem of approximating higher frequency moments in a data stream.
Joint work with Mark Braverman, Qian Li, Shuo Wang, David P. Woodruff and Jiapeng Zhang.
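For context, the trivial one-pass algorithm that this Ω(log n) bound matches is simply an exact counter, which uses O(log n) bits of memory (a toy illustration, not part of the talk's contribution):

def stream_majority(bits):
    """One-pass exact majority: two counters of O(log n) bits each, matching
    (up to constants) the lower bound for constant-pass algorithms that only
    need a constant advantage over guessing."""
    ones = total = 0
    for b in bits:          # each b is 0 or 1
        ones += b
        total += 1
    return int(2 * ones > total)

print(stream_majority([1, 0, 1, 1, 0, 1]))   # -> 1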
2:00 pm ET
Wednesday Poster Session
3:00 pm ET
Sepehr Assadi: Distributed Triangle Detection is Hard in Few Rounds
[
Abstract]
We prove that any distributed algorithm for detecting the existence of triangles in synchronous n-vertex networks with O(log n) bandwidth per edge (namely, the CONGEST model)
requires Omega(log log n) rounds. This establishes the first multi-round lower bound for this problem.
It has been known that the standard communication complexity arguments in the CONGEST model are provably incapable of establishing any meaningful lower bounds for the triangle detection problem.
Our main technical contribution is thus a new information theoretic argument which combines recent advances on multi-pass graph streaming lower bounds with the point-to-point
communication aspects of distributed models, and can be of independent interest.
Based on joint work with Janani Sundaresan (available on arXiv: https://arxiv.org/abs/2504.01802).
3:25 pm ET
Huacheng Yu: Near-Optimal Relative Error Streaming Quantile Estimation via Elastic Compactors
[Abstract]
Computing the approximate quantiles or ranks of a stream is a fundamental task in data monitoring. Given a stream of elements x_1, x_2, ..., x_n and a query x,
a relative-error quantile estimation algorithm can estimate the rank of x with respect to the stream up to a multiplicative (1 ± eps) error, i.e., an additive error of at most eps · rank(x).
Notably, this requires the sketch to obtain more precise estimates for the ranks of elements on the tails of the distribution, as compared to the
additive ± eps·n error regime. This is particularly favorable for some practical applications, such as anomaly detection.
Previously, the best known algorithms for relative error achieved space O(eps^-1 log^1.5(eps n)) (Cormode, Karnin, Liberty, Thaler, Veselý, 2021) and
O(eps^-2 log(eps n)) (Zhang, Lin, Xu, Korn, Wang, 2006). In this work, we present a nearly-optimal streaming algorithm for the relative-error
quantile estimation problem using O~(eps^-1 log(eps n)) space, which almost matches the trivial offline Omega(eps^-1 log(eps n)) space lower bound.
To surpass the eps^-1 log^1.5(eps n) barrier of the previous approach, our algorithm crucially relies on a new data structure, called the elastic compactor,
which can be dynamically resized over the course of the stream. Interestingly, we design a space allocation scheme which adaptively allocates space to
each compactor based on the "hardness" of the input stream. This approach allows us to avoid using the maximal space simultaneously for every compactor
and facilitates the improvement in the total space complexity. Along the way, we also propose and study a new problem called the Top Quantiles Problem,
which only requires the sketch to provide estimates for the ranks of elements in a fixed-length tail of the distribution.
This problem serves as an important subproblem in our algorithm, though it is also an interesting problem in its own right.
Joint work with Elena Gribelyuk, Pachara Sawettamalya, Hongxun Wu.
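To make the error guarantee concrete, the quantity being approximated and the allowed deviation can be spelled out directly (a toy non-streaming check, not the sketch from the talk):

def rank(stream, x):
    """Exact rank: the number of stream elements that are at most x."""
    return sum(1 for v in stream if v <= x)

def within_relative_error(estimate, stream, x, eps):
    """Relative-error guarantee: the estimate may deviate by at most
    eps * rank(x), so tail elements must be estimated more precisely
    than under an additive eps * n guarantee."""
    r = rank(stream, x)
    return abs(estimate - r) <= eps * r

data = list(range(1, 1001))
print(rank(data, 10), within_relative_error(10.4, data, 10, eps=0.05))   # -> 10 True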
3:50 pm ET
Coffee Break (15 mins)
4:05 pm ET
Maryam Aliakbarpour: Leveraging Predictions for Efficient Hypothesis Testing in Discrete Distributions
[Abstract]
Prior knowledge—whether from historical data, domain expertise, or predictive models—can enhance statistical inference by reducing sample complexity.
We introduce a general framework for leveraging such predictions and apply it to hypothesis testing for discrete distributions. In the standard setting,
optimal sample complexity bounds are known for identity, closeness, and uniformity testing. We show that access to a predicted distribution can reduce the required samples,
with gains determined by its total variation distance from the true distribution. Our algorithms are adaptive, adjusting to prediction accuracy without prior calibration,
and robust, never exceeding the standard sample complexity when predictions are uninformative. We establish information-theoretic lower bounds confirming optimality and
present experimental results demonstrating significant practical improvements.
4:30 pm ET
Clément Canonne: Better Private Distribution Testing by Leveraging Unverified Auxiliary Data
[Abstract]
We extend the framework of augmented distribution testing (Aliakbarpour, Indyk, Rubinfeld, and Silwal, NeurIPS 2024) to the differentially private setting.
This captures scenarios where a data analyst must perform hypothesis testing tasks on sensitive data, but is able to leverage prior knowledge
(public, but possibly erroneous or untrusted) about the data distribution.
We design private algorithms in this augmented setting for three flagship distribution testing tasks, uniformity, identity, and closeness testing,
whose sample complexity smoothly scales with the claimed quality of the auxiliary information. We complement our algorithms with
information-theoretic lower bounds, showing that their sample complexity is optimal (up to logarithmic factors).
Joint work with Maryam Aliakbarpour, Arnav Burudgunte, and Ronitt Rubinfeld.
Posters
Monday Poster Session:
Wednesday Poster Session:
Support
WALDO 2025 is generously supported by a community grant from SIGACT.
Web design by Pedro Paredes.
Steering Committee