Spring 2018

Sublinear algorithms, local algorithms and robust statistics


With the emergence of big data, and the huge combinatorial objects that make up this data, there is a need for understanding data that is too large to store or view in its entirety. As much of today's data has some sort of combinatorial structure, the questions that now arise have relatively unfamiliar flavors: For example, does the Twitter graph have small diameter? How fast is a networked community growing? Does a distribution over traffic accidents in a very large map have certain shape properties, such as monotone increasing density with distance to a traffic light, or similarity to a Poisson Multinomial distribution? Determining the scope of what is possible to understand in such settings is a basic and fundamental endeavor, that has been addressed in multiple ways within the communities of statistics, mathematics, computer science, and information theory. Models that have been studied include: streaming algorithms, where big data streams by, and the algorithm has limited space in which to store information and intermediate computations; sketching algorithms, where the input is compressed (sketched) into a smaller domain in a way that certain operations can be supported on the sketches; compressive sensing algorithms, where a signal is recovered from noisy measurements; sublinear time algorithms, in which algorithms approximate a parameter of the data after viewing a miniscule fraction of the data; local algorithms that compute and make decisions on parts of the output considering only a portion of the input; property testing algorithms, in which a quick determination of what properties the data has must be made in sublinear time in the size of the data; and distribution property testing algorithms, in which properties of the distribution must be understood in time sublinear in the size of the sampled domain.


As big data is getting bigger, there is a need for analyzing data with sublinear constraints -- that is, for algorithms which require only sublinear time, space, measurements and/or samples. The goal of this workshop is to bring together experts in various areas of Computer Science, Electrical Engineering, Statistics and Mathematics to discuss recent work and exciting new challenges. It is hoped that the multidisciplinary nature of the workshop will highlight common goals and themes, as well as to facilitate an interchange of technical ideas that may be of use more widely than previously thought. The workshop will be preceded by a one day bootcamp on June 10, with the goal of presenting the basic techniques, definitions and goals in several of the communities.


The workshop will take place at MIT (room 2-190) on June 11-13. Bootcamp on June 10 (room 32-G449).

Confirmed workshop speakers

  • Ankur Moitra (MIT)
  • Michael Mitzenmacher (Harvard)
  • Moses Charikar (Stanford)
  • Yihong Wu (Yale)
  • Alon Orlitsky (UCSD)
  • Huy Nguyen (Northeastern)
  • Ilias Diakonikolas (USC)
  • Deanna Needell (UCLA)
  • David Woodruff (CMU)
  • Vladimir Braverman (Johns Hopkins)
  • Parikishit Gopalan (VMware Research)
  • Noga Ron Zewi (University of Haifa)
  • Artur Czumaj (U. of Warwick)
  • Sofya Raskhodnikova (Boston U.)
  • Jacob Fox (Stanford)
  • Andrea Montanari (Stanford)
  • Jelani Nelson (Harvard)

Confirmed bootcamp speakers

  • Yufei Zhao (MIT)
  • David Gamarnik (MIT)
  • Alex Andoni (Columbia)
  • Krzysztof Onak (IBM Yorktown)
  • Costas Daskalakis (MIT)


  • David Gamarnik (MIT)
  • Piotr Indyk (MIT)
  • Philippe Rigollet (MIT)
  • Ronitt Rubinfeld (MIT) -- Lead organizer
  • Devavrat Shah (MIT)
  • Madhu Sudan (Harvard)
  • Rachel Ward (UT Austin)
  • Yihong Wu (Yale)
Local information

Practical information