MIFODS Seminar

MIFODS runs a joint seminar with other seminars at MIT: the Stochastics and Statistics Seminar, the LIDS Seminar, the ToC Colloquium, the Machine Learning Seminar, the Applied Math Colloquium, and the Combinatorics Seminar.

Seminars

All seminars in chronological order

Afonso Bandeira (NYU)

Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment

Video
Many problems in signal/image processing and computer vision amount to estimating a signal, image, or three-dimensional structure/scene from corrupted measurements. A particularly challenging form of measurement corruption is latent transformations of the underlying signal to be recovered. Many such transformations can be described as a group acting on the object to be recovered. Examples include the Simultaneous Localization and Mapping (SLAM) problem in robotics and computer vision, where pictures of a scene are obtained from different positions and orientations; Cryo-Electron Microscopy (Cryo-EM) imaging, where projections of a molecule density are taken from unknown rotations; and several others. One fundamental example of this type of problem is Multi-Reference Alignment: given a group acting on a space, the goal is to estimate an orbit of the group action from noisy samples. For example, in one of its simplest forms, one is tasked with estimating a signal from noisy cyclically shifted copies. We will show that the number of observations needed by any method has a surprising dependency on the signal-to-noise ratio (SNR) and on algebraic properties of the underlying group action. Remarkably, in some important cases, this sample complexity is achieved with computationally efficient methods based on computing invariants under the group of transformations.
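As a small illustration of the invariants idea, a toy numpy sketch for the cyclic-shift case (not the speaker's code; the full method also needs the bispectrum to recover Fourier phases, which is omitted here):

```python
# Shift-invariant feature averaging for multi-reference alignment (toy).
import numpy as np

rng = np.random.default_rng(0)
L, n, sigma = 32, 10000, 1.0           # signal length, #samples, noise level
x = rng.standard_normal(L)             # ground-truth signal

# Observations: random cyclic shifts of x plus Gaussian noise.
shifts = rng.integers(0, L, size=n)
Y = np.stack([np.roll(x, s) for s in shifts]) + sigma * rng.standard_normal((n, L))

# The power spectrum |fft|^2 is invariant to cyclic shifts, so averaging it
# over samples converges to |fft(x)|^2 + L*sigma^2 (additive noise bias).
ps_est = np.mean(np.abs(np.fft.fft(Y, axis=1)) ** 2, axis=0) - L * sigma ** 2
ps_true = np.abs(np.fft.fft(x)) ** 2
print("relative error of power-spectrum invariant:",
      np.linalg.norm(ps_est - ps_true) / np.linalg.norm(ps_true))
```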

Justin Solomon (MIT)

Scaling and Broadening the Scope of Computational Transport

Video
The theory of optimal transport links probability to geometry, providing a natural means to characterize, compare, and manipulate probability distributions supported on a geometric domain. While optimal transport has long been identified as a candidate for improving pipelines in computer vision, graphics, statistics, and other disciplines, computational considerations have made it challenging to apply in practice; recent improvements in the computational pipeline, however, have motivated renewed interest in this discipline among members of the computer science community. In this talk, I will discuss recent efforts jointly with collaborators and students in the MIT Geometric Data Processing Group to develop efficient algorithms for computational optimal transport and derived problems, leveraging structure in key applications of this machinery. I will highlight new techniques for stochastic transport, transport along discrete surfaces, and transport along graphs, as well as extensions of the basic problem to tensor-valued measures and to quadratic assignment problems.
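For flavor, a minimal sketch of entropically regularized transport via Sinkhorn iterations, one of the standard workhorses behind scalable computational transport (a toy illustration, not the group's software):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, iters=500):
    """Approximate optimal coupling between histograms mu and nu with
    ground-cost matrix C, under entropic regularization eps."""
    K = np.exp(-C / eps)                # Gibbs kernel
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)                # match row marginals
        v = nu / (K.T @ u)              # match column marginals
    return u[:, None] * K * v[None, :]  # coupling P = diag(u) K diag(v)

# Toy example: transport between two histograms on a line.
xs = np.linspace(0, 1, 50)
mu = np.exp(-((xs - 0.3) ** 2) / 0.01); mu /= mu.sum()
nu = np.exp(-((xs - 0.7) ** 2) / 0.02); nu /= nu.sum()
C = (xs[:, None] - xs[None, :]) ** 2    # squared-distance ground cost
P = sinkhorn(mu, nu, C)
print("approximate transport cost:", (P * C).sum())
```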

David Sontag (MIT)

When Inference is Tractable

Video
A key capability of artificial intelligence will be the ability to reason about abstract concepts and draw inferences. Where data is limited, probabilistic inference in graphical models provides a powerful framework for performing such reasoning, and can even be used as modules within deep architectures. But, when is probabilistic inference computationally tractable? I will present recent theoretical results that substantially broaden the class of provably tractable models by exploiting model stability (Lang, Sontag, Vijayaraghavan, AISTATS '18), structure in model parameters (Weller, Rowland, Sontag, AISTATS '16), and reinterpreting inference as ground truth recovery (Globerson, Roughgarden, Sontag, Yildirim, ICML '15).
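As a baseline for what tractability means here, a toy sketch of exact inference by variable elimination on a chain-structured model, the textbook case where marginals and the partition function are computable in polynomial time (the talk concerns much broader tractable classes):

```python
# Exact inference on a chain by message passing, in O(n k^2) time.
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3                             # chain length, states per variable
phi = rng.random((n, k))                # unary potentials phi[i](x_i)
psi = rng.random((n - 1, k, k))         # pairwise potentials psi[i](x_i, x_{i+1})

# Forward messages: m(x_{i+1}) = sum_{x_i} m(x_i) psi[i](x_i, x_{i+1}) phi[i+1](x_{i+1})
m = phi[0]
for i in range(n - 1):
    m = (m[:, None] * psi[i]).sum(axis=0) * phi[i + 1]
Z = m.sum()                             # partition function
print("partition function:", Z)
print("marginal of last variable:", m / Z)
```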

Ohad Shamir (Weizmann Institute)

Is Depth Needed for Deep Learning?

Video
Deep learning, as its name indicates, is based on training artificial neural networks with many layers. A key theoretical question is to understand why such depth is beneficial, and when it is provably necessary for expressing certain types of functions. In fact, this question is closely related to circuit complexity, which has long been studied in theoretical computer science -- albeit for different reasons, and for circuits which differ in some important ways from modern neural networks. Despite this similarity, the interaction between the circuit complexity and machine learning research communities is currently quite limited. In this talk, I'll survey some of the recent depth separation results developed in the machine learning community, and discuss open questions where insights from circuit complexity might help. The talk is aimed at a general theoretical computer science audience, and no prior knowledge about deep learning will be assumed.

Johannes Schmidt-Hieber (Leiden)

Statistical theory for deep neural networks with ReLU activation function

Video
The universal approximation theorem states that neural networks are capable of approximating any continuous function up to a small error that depends on the size of the network. The expressive power of a network does not, however, guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In the talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to logarithmic factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the tuning parameter is the sparsity of the network. Specifically, we consider large networks whose number of potential parameters is much larger than the sample size. Interestingly, the depth (number of layers) of the neural network architecture plays an important role, and our theory suggests that scaling the network depth with the logarithm of the sample size is natural.
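A toy illustration of the depth-scaling heuristic, fitting a ReLU network whose depth grows like log n with scikit-learn (the weight decay below is only a crude stand-in for the sparsity constraint in the theory; the target function and widths are made up):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 3))
# Regression function with composition structure: f(x) = g(h(x)).
y = np.sin(3 * (X[:, 0] + X[:, 1] ** 2)) + 0.1 * rng.standard_normal(n)

depth = max(1, int(np.log(n)))          # depth ~ log(sample size)
net = MLPRegressor(hidden_layer_sizes=(32,) * depth, activation="relu",
                   alpha=1e-4, max_iter=2000, random_state=0).fit(X, y)
print("depth:", depth, " training R^2:", net.score(X, y))
```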

Sanjoy Dasgupta (UCSD)

Using interaction for simpler and better learning

Video
In the usual setup of supervised learning, the learner is given a stack of labeled examples and told to fit a classifier to them. It would be quite unnatural for a human to learn in this way, and indeed this model is known to suffer from a variety of fundamental hardness barriers. However, many of these hurdles can be overcome by moving to a setup in which the learner interacts with a human (or other information source) during the learning process. We will see how interaction makes it possible to: 1. Learn DNF (disjunctive normal form) concepts. 2. Perform machine teaching in situations where the student’s concept class is unknown. 3. Improve the results of unsupervised learning. We will present a generic approach to “interactive structure learning” that, for instance, yields simple interactive algorithms for topic modeling and hierarchical clustering. Along the way, we will present a novel cost function for hierarchical clustering, as well as an efficient algorithm for approximately minimizing this cost.
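As an illustration of scoring a hierarchy with a cost function, a toy sketch in the spirit of the cost mentioned in the talk: sum over pairs the similarity weight times the size of the subtree rooted at the pair's least common ancestor (made-up data; smaller is better):

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
W = np.exp(-pdist(X) ** 2)              # similarity weight w_ij per pair
Z = linkage(X, method="average")        # a candidate hierarchy

n = X.shape[0]
# leaves[c] = set of leaves under cluster c (clusters n..2n-2 are merges).
leaves = {i: {i} for i in range(n)}
for t, (a, b, _, _) in enumerate(Z):
    leaves[n + t] = leaves[int(a)] | leaves[int(b)]

def lca_size(i, j):
    # Size of the smallest cluster in the tree containing both i and j.
    return min(len(s) for s in leaves.values() if i in s and j in s)

cost = sum(w * lca_size(i, j)
           for w, (i, j) in zip(W, combinations(range(n), 2)))
print("hierarchy cost:", cost)
```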

Carlos Fernandez-Granda (NYU)

Sparse Recovery Beyond Compressed Sensing

Video
Recovering sparse signals from underdetermined linear measurements is a challenging problem. Deconvolution in reflection seismology and imaging, source localization in EEG, estimation of relaxation parameters in MRI, and direction-of-arrival estimation in radar can all be reformulated as sparse inverse problems. Convex-programming methods based on l1-norm minimization are widely applied to tackle such problems in practice, but current theoretical guarantees are mostly focused on randomized measurement operators that are not relevant to these applications. In this talk, we present a theoretical framework to analyze these methods for realistic deterministic operators, which yields exact-recovery guarantees under certain conditions on the signal support that are related to the correlation structure of the linear operator.
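A minimal sketch of the l1 approach via proximal gradient descent (ISTA); note the demo uses a random Gaussian operator for simplicity, whereas the talk is precisely about deterministic ones:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, s = 60, 200, 5
A = rng.standard_normal((m, d)) / np.sqrt(m)   # measurement operator
x0 = np.zeros(d)
x0[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
y = A @ x0                                     # noiseless measurements

lam = 0.01                                     # l1 regularization weight
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1 / Lipschitz constant
x = np.zeros(d)
for _ in range(2000):
    g = A.T @ (A @ x - y)                      # gradient of 0.5*||Ax - y||^2
    z = x - step * g
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0)  # soft-threshold
print("relative recovery error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```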

Shai Shalev-Shwartz (Mobileye)

Autonomous driving: From a science project to mass production

Video
In recent years, car makers and tech companies have been racing towards self-driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of the talk is to add two additional crucial parameters to the equation. The first is standardization of safety assurance --- what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements. The second parameter is scalability --- engineering solutions with runaway costs will not scale to millions of cars, which will push interest in this field into a niche academic corner and drive the entire field into a "winter of autonomous driving". In the first part of the talk I will show why statistical guarantees provide only weak safety, and will instead propose a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). Time permitting, the second part of the talk will involve reinforcement learning techniques for scalable driving policy.
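For concreteness, a sketch of RSS's safe longitudinal distance, following the formula in the published RSS paper (the parameter values below are made up):

```python
def rss_safe_distance(v_rear, v_front, rho, a_max, b_min_rear, b_max_front):
    """Minimum gap so the rear car can always stop in time, assuming it
    accelerates at most a_max during its response time rho, then brakes at
    least at b_min_rear, while the front car brakes at most at b_max_front."""
    v_after = v_rear + rho * a_max      # rear speed after the response time
    d = (v_rear * rho + 0.5 * a_max * rho ** 2
         + v_after ** 2 / (2 * b_min_rear)
         - v_front ** 2 / (2 * b_max_front))
    return max(d, 0.0)

# Example: both cars at 30 m/s, 0.5 s response time (units: m, s, m/s^2).
print(rss_safe_distance(30, 30, rho=0.5, a_max=2, b_min_rear=4, b_max_front=8))
```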

Dejan Slepcev (CMU)

Variational problems on random structures and their continuum limits

Video
We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying "ground-truth" measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distances between points. Many machine learning tasks, such as clustering and semi-supervised learning, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts and graph Laplacians, and study their limits as the number of data points goes to infinity. In particular, we establish under what conditions the minimizers of the discrete problems have a well-defined continuum limit.
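A toy sketch of the discrete objects in question: a weighted graph built from a random sample, its graph Laplacian, and a spectral cut (made-up data; the talk is about the continuum limits of such constructions):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# Sample from a "ground-truth" measure: two Gaussian blobs in the plane.
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])

eps = 1.0                                          # kernel bandwidth
W = np.exp(-squareform(pdist(X)) ** 2 / eps ** 2)  # edge weights from distances
np.fill_diagonal(W, 0)
L = np.diag(W.sum(axis=1)) - W                     # unnormalized graph Laplacian

# The second eigenvector (Fiedler vector) separates the two clusters.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)
print("cluster sizes:", np.bincount(labels))
```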

Moritz Hardt (UC Berkeley)

When Recurrent Models Don't Need To Be Recurrent

Video
We show that stable recurrent neural networks are well approximated by feed-forward networks for the purposes of both inference and training by gradient descent. Our result applies to a broad range of non-linear recurrent neural networks under a natural stability condition, which we observe is also necessary. Complementing our theoretical findings, we verify the conclusions of our theory on both real and synthetic tasks. Furthermore, we demonstrate that recurrent models satisfying the stability assumption of our theory can achieve excellent performance on real sequence learning tasks. Joint work with John Miller (UC Berkeley).
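A toy numpy sketch of the phenomenon: a contractive (hence stable) recurrent map is nearly indistinguishable from its k-step truncation, which reads only the last k inputs and is therefore a feed-forward computation (illustration only, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)     # rescale so the spectral norm is 0.5
U = rng.standard_normal((d, d))

def run(xs):
    """Recurrent state update h_t = tanh(W h_{t-1} + U x_t) from h_0 = 0."""
    h = np.zeros(d)
    for x in xs:
        h = np.tanh(W @ h + U @ x)
    return h

xs = rng.standard_normal((200, d))
h_full = run(xs)                    # full recurrent computation
k = 20
h_trunc = run(xs[-k:])              # depends only on the last k inputs
print("truncation error:", np.linalg.norm(h_full - h_trunc))
```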

Costis Daskalakis (MIT)

Improving Generative Adversarial Networks using Game Theory and Statistics

Video
Generative Adversarial Networks (aka GANs) are a recently proposed approach for learning samplers of high-dimensional distributions with intricate structure, such as distributions over natural images, given samples from these distributions. They are obtained by setting up a two-player zero-sum game between two neural networks, which learn statistics of a target distribution by adapting their strategies in the game using gradient descent. Despite their intriguing performance in practice, GANs pose great challenges to both optimization and statistics. Their training suffers from oscillations, and they are difficult to scale to high-dimensional settings. We study how game-theoretic and statistical techniques can be brought to bear on these important challenges. We use Game Theory towards improving GAN training, and Statistics towards scaling up the dimensionality of the generated distributions.
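A toy sketch of one game-theoretic fix: optimistic gradient updates on the bilinear game f(x, y) = x*y, where plain simultaneous gradient descent/ascent spirals away from the equilibrium but the optimistic variant converges to (0, 0) (a minimal illustration; GAN training is far more involved):

```python
# Optimistic gradient descent/ascent on min_x max_y f(x, y) = x*y.
eta = 0.1
x, y = 1.0, 1.0
gx_prev, gy_prev = 0.0, 0.0
for t in range(500):
    gx, gy = y, x                       # grad_x f = y, grad_y f = x
    # Optimistic update: current gradient plus a correction by the last one.
    x -= eta * (2 * gx - gx_prev)
    y += eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy
print("final (x, y):", (round(x, 6), round(y, 6)))   # close to (0, 0)
```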

Suresh Venkatasubramanian (Utah)

Towards a theory (or theories) of fairness in automated decision-making

Video
With the advent of automated decision-making has come a whole new vocabulary for algorithm design. We say that automated decision-making should be "fair" or "unbiased" or "nondiscriminatory". But our algorithms speak only mathematics, and these words need translation (if they can be translated at all). Over the last few years, the research area of fairness, accountability and transparency has made attempts to formalize what it means for an algorithm to be fair. In this talk I'll discuss what has become a conversation between society and belief systems on the one hand, and mathematical formalisms on the other. I'll explain the different formalizations of fairness and how they reflect differing perspectives on how society understands fairness and bias. I'll also describe how the formalizations lead to conclusions about the possibility (or not) of fair decision-making.
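Two of the standard formalizations, computed side by side on made-up data to show how they can come apart (a toy sketch, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
group = rng.integers(0, 2, 1000)          # protected attribute
y_true = rng.integers(0, 2, 1000)         # true outcome
y_pred = (rng.random(1000) < 0.4 + 0.1 * group).astype(int)  # biased classifier

def rate(mask):
    return y_pred[mask].mean()            # positive-prediction rate on a subset

# Demographic parity: equal positive rates across groups.
dp_gap = abs(rate(group == 0) - rate(group == 1))
# Equal opportunity: equal true-positive rates across groups.
eo_gap = abs(rate((group == 0) & (y_true == 1)) -
             rate((group == 1) & (y_true == 1)))
print(f"demographic parity gap: {dp_gap:.3f}, equal opportunity gap: {eo_gap:.3f}")
```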

Tselil Schramm (Harvard+MIT)

Efficient Algorithms for the Graph Matching Problem in Correlated Random Graphs

Video
The Graph Matching problem is a robust version of the Graph Isomorphism problem: given two not-necessarily-isomorphic graphs, the goal is to find a permutation of the vertices which maximizes the number of common edges. We study a popular average-case variant; we deviate from the common heuristic strategy and give the first quasi-polynomial time algorithm, where previously only sub-exponential time algorithms were known.
Based on joint work with Boaz Barak, Chi-Ning Chou, Zhixian Lei, and Yueqi Sheng.
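To make the objective concrete, a toy version of the average-case setup: two correlated subsampled copies of a parent graph with a hidden vertex permutation, and the count of common edges under a candidate permutation (the quasi-polynomial algorithm itself is far subtler):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, corr = 30, 0.3, 0.9
base = rng.random((n, n)) < p
A = np.triu(base, 1); A = A | A.T                   # parent graph
keep1 = np.triu(rng.random((n, n)) < corr, 1)
keep2 = np.triu(rng.random((n, n)) < corr, 1)
G1 = A & (keep1 | keep1.T)                          # correlated copy 1
B = A & (keep2 | keep2.T)                           # correlated copy 2
pi = rng.permutation(n)                             # hidden permutation
G2 = B[np.ix_(pi, pi)]                              # copy 2, relabeled

def common_edges(perm):
    return (G1 & G2[np.ix_(perm, perm)]).sum() // 2

inv = np.argsort(pi)                                # the planted solution
print("common edges (random perm) :", common_edges(rng.permutation(n)))
print("common edges (planted perm):", common_edges(inv))
```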

John Duchi (Stanford)

Locally private estimation, learning, inference, and optimality

Video
In this talk, we investigate statistical learning and estimation under local privacy constraints, where data providers do not trust the collector of the data and so privatize their data before it is even collected. We identify fundamental tradeoffs between statistical utility and privacy in such local models of privacy, providing instance-specific bounds for private estimation and learning problems by developing local minimax risks. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to evaluate the difficulty of individual problem instances and delineate the possibilities for adaptation in private estimation and inference. As part of this, we identify an alternative to the Fisher information for private estimation, giving a more nuanced understanding of the challenges of adaptivity and optimality. We also provide optimal procedures for private inference, highlighting the importance of a more careful development of optimal tradeoffs between estimation and privacy. One consequence of our results is to identify settings where standard local privacy restrictions may be too strong for practice; time permitting, I will then discuss a few new directions that maintain limited amounts of privacy while simultaneously allowing the development of high-performance statistical and learning procedures.
Based on joint work with Feng Ruan.
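To make the local model concrete, a minimal sketch of randomized response, the canonical locally private mechanism: each user randomizes their own bit before sending it, and the collector debiases the aggregate:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1.0                                 # local privacy parameter
bits = (rng.random(100000) < 0.3).astype(float)   # true private bits, mean 0.3

p = np.exp(eps) / (np.exp(eps) + 1)       # probability of reporting truthfully
truthful = rng.random(bits.size) < p
reports = np.where(truthful, bits, 1 - bits)

# Debias: E[report] = (2p - 1)*mean + (1 - p), so solve for the mean.
est = (reports.mean() - (1 - p)) / (2 * p - 1)
print("locally private estimate of the mean:", round(est, 4))
```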

Emma Brunskill (Stanford)

Towards Better Reinforcement Learning for High Stakes Domains

There is increasing excitement about reinforcement learning -- a subarea of machine learning for enabling an agent to learn to make good decisions. Yet numerous questions and challenges remain for reinforcement learning to help support progress in important high stakes domains like education, consumer marketing and healthcare. One key question is how to leverage the ever expanding logs of sequences of decisions made and their outcomes to identify better decision policies, such as using electronic medical record data to inform better personalized patient treatments. I will discuss some of our work on developing better statistical estimators and optimizers for this problem, from the vantage point of a reinforcement learning researcher. A second key issue is to narrow the gap between RL theory and experiment, to create online reinforcement learning algorithms with strong guarantees on their performance. In this effort, I will briefly describe our recent work on policy certificates for RL algorithms. This work is a step towards providing the types of guarantees and transparency that will enable us to confidently deploy RL in important high stakes scenarios.
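The basic estimator behind learning from logged decisions is importance sampling; a toy sketch for a two-action setting (made-up policies and rewards, not the talk's estimators):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50000
pi_b = np.array([0.7, 0.3])        # behavior policy (generated the logs)
pi_e = np.array([0.2, 0.8])        # target policy to evaluate
mean_reward = np.array([0.5, 1.0])

actions = rng.choice(2, size=n, p=pi_b)            # logged actions
rewards = rng.normal(mean_reward[actions], 0.1)    # logged rewards

weights = pi_e[actions] / pi_b[actions]            # importance weights
print("IS estimate:", np.mean(weights * rewards))
print("true value :", pi_e @ mean_reward)          # 0.2*0.5 + 0.8*1.0 = 0.9
```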

Parikshit Gopalan (VMware)

Good visualizations have short sketches

We consider the problem of interactively visualizing a distributed tabular dataset with billions of rows. Our goal is to allow a user to browse such datasets in real time, at any resolution, with just a few mouse clicks (much like navigating an online map service).
We hypothesize that the sketching model of computation is tailor-made for this setting: interesting visualizations are amenable to efficient sketching protocols, whose communication cost is bounded by the size of the desired output (which is small, since it fits on screen). In this talk, we present Hillview, an open-source tool for interactive visualization of large datasets, built around this premise. We focus on the algorithmic challenges that arise in trying to render common visualizations in the sketching model.
Based on joint work with Mihai Budiu, Udi Wieder, Marcos Aguilera, Lalith Suresh and Han Kruiger (all from VMware Research).
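A toy sketch of the premise: a fixed-size, mergeable summary (here just bucket counts for an on-screen histogram), whose communication cost tracks the rendered output rather than the data size (illustration only, not Hillview's implementation):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
edges = np.linspace(0, 1, 65)             # 64 on-screen buckets

def sketch(shard):
    counts, _ = np.histogram(shard, bins=edges)
    return counts                          # fixed-size summary, independent of shard size

def merge(a, b):
    return a + b                           # sketches combine by addition

shards = [rng.random(100000) ** 2 for _ in range(8)]   # 8 "workers"
merged = reduce(merge, (sketch(s) for s in shards))
total = sketch(np.concatenate(shards))                 # centralized answer
print("merged equals centralized:", np.array_equal(total, merged))
```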

Emmanuel Abbe (EPFL)

When deep learning fails at learning

As the success of deep learning reaches ever more domains, one would like to also envision the potential limits of deep learning. This talk gives a first set of results proving that deep learning algorithms fail at learning certain kinds of functions in poly-time, focusing on the classical problem of parities. It further points to information measures that quantify when these failures tend to generalize, and to the tradeoffs between memory, noise and initialization. Joint work with Colin Sandon.
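A toy check of why parities are hard for correlation- and gradient-based learners: the full parity over k bits is uncorrelated with every proper sub-parity (and in particular with any individual input bit), so low-order statistics carry no signal about it:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
k, n = 8, 200000
X = rng.choice([-1, 1], size=(n, k))
y = X.prod(axis=1)                         # full parity in +/-1 encoding

# Empirical correlation of y with every strict sub-parity: all near zero,
# by Fourier orthogonality of the parity characters.
worst = max(abs(np.mean(y * X[:, list(S)].prod(axis=1)))
            for r in range(1, k) for S in combinations(range(k), r))
print("largest correlation with any proper sub-parity:", round(worst, 4))
```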