MIFODS Seminar

MIFODS runs a joint seminar with other seminar series at MIT: the Stochastics and Statistics Seminar, the LIDS Seminar, the ToC Colloquium, the Machine Learning Seminar, the Applied Math Colloquium, and the Combinatorics Seminar.

Seminars

All seminars in chronological order

Afonso Bandeira (NYU)

Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment

Video
Many problems in signal/image processing and computer vision amount to estimating a signal, image, or three-dimensional structure/scene from corrupted measurements. A particularly challenging form of measurement corruption is a latent transformation of the underlying signal to be recovered. Many such transformations can be described as a group acting on the object to be recovered. Examples include the Simultaneous Localization and Mapping (SLAM) problem in robotics and computer vision, where pictures of a scene are obtained from different positions and orientations, and Cryo-Electron Microscopy (Cryo-EM) imaging, where projections of a molecule density are taken from unknown rotations, among several others. One fundamental example of this type of problem is Multi-Reference Alignment: given a group acting on a space, the goal is to estimate an orbit of the group action from noisy samples. For example, in one of its simplest forms, one is tasked with estimating a signal from noisy cyclically shifted copies. We will show that the number of observations needed by any method has a surprising dependency on the signal-to-noise ratio (SNR) and on algebraic properties of the underlying group action. Remarkably, in some important cases, this sample complexity is achieved with computationally efficient methods based on computing invariants under the group of transformations.
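To make the simplest setting concrete, here is a minimal numerical sketch of the invariants idea for the cyclic-shift model (an illustration based on the abstract, not code from the talk): the power spectrum of the discrete Fourier transform is invariant to cyclic shifts, so it can be averaged across observations without aligning them first.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 32, 100_000, 1.0           # signal length, samples, noise level
x = rng.standard_normal(n)               # unknown ground-truth signal

# Observations: a random cyclic shift of x plus Gaussian noise.
shifts = rng.integers(0, n, size=m)
obs = np.stack([np.roll(x, s) for s in shifts])
obs += sigma * rng.standard_normal((m, n))

# Average the shift-invariant power spectra and debias the noise term:
# E|FFT(y)_k|^2 = |FFT(x)_k|^2 + n * sigma^2 at every frequency k.
power = (np.abs(np.fft.fft(obs, axis=1)) ** 2).mean(axis=0) - n * sigma**2

# This recovers |FFT(x)| but not the Fourier phases; phase recovery needs
# higher-order invariants such as the bispectrum.
print(np.allclose(np.sqrt(np.clip(power, 0, None)),
                  np.abs(np.fft.fft(x)), atol=0.1))   # should print True
```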

Justin Solomon (MIT)

Scaling and Broadening the Scope of Computational Transport

Video
The theory of optimal transport links probability to geometry, providing a natural means to characterize, compare, and manipulate probability distributions supported on a geometric domain. While optimal transport has long been identified as a candidate for improving pipelines in computer vision, graphics, statistics, and other disciplines, computational considerations have made it challenging to apply in practice; recent improvements in the computational pipeline, however, have motivated renewed interest in this discipline among members of the computer science community. In this talk, I will discuss recent efforts jointly with collaborators and students in the MIT Geometric Data Processing Group to develop efficient algorithms for computational optimal transport and derived problems, leveraging structure in key applications of this machinery. I will highlight new techniques for stochastic transport, transport along discrete surfaces, and transport along graphs, as well as extensions of the basic problem to tensor-valued measures and to quadratic assignment problems.
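For concreteness, here is a generic sketch of one workhorse from this computational toolbox: entropy-regularized transport solved by Sinkhorn matrix scaling (a standard method, not necessarily the specific algorithms presented in the talk).

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, iters=500):
    """Approximate OT between histograms mu, nu with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)                 # scale rows to match mu
        v = nu / (K.T @ u)               # scale columns to match nu
    P = u[:, None] * K * v[None, :]      # transport plan
    return P, np.sum(P * C)              # plan and transport cost

# Example: transport between two 1-D histograms on a grid.
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2       # squared-distance cost
mu = np.exp(-((x - 0.25) ** 2) / 0.01); mu /= mu.sum()
nu = np.exp(-((x - 0.75) ** 2) / 0.01); nu /= nu.sum()
P, cost = sinkhorn(mu, nu, C)
print(f"entropic OT cost ~ {cost:.4f}")  # near (0.75 - 0.25)^2 = 0.25
```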

David Sontag (MIT)

When Inference is Tractable

Video
A key capability of artificial intelligence will be the ability to reason about abstract concepts and draw inferences. Where data is limited, probabilistic inference in graphical models provides a powerful framework for performing such reasoning, and can even be used as a module within deep architectures. But when is probabilistic inference computationally tractable? I will present recent theoretical results that substantially broaden the class of provably tractable models by exploiting model stability (Lang, Sontag, Vijayaraghavan, AISTATS ’18), structure in model parameters (Weller, Rowland, Sontag, AISTATS ’16), and reinterpreting inference as ground truth recovery (Globerson, Roughgarden, Sontag, Yildirim, ICML ’15).
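As a warm-up for what "tractable" means here, a minimal sketch of the textbook case: exact inference on a chain-structured model by dynamic programming in O(n k^2) time rather than k^n brute force (a generic illustration, unrelated to the specific papers cited).

```python
import numpy as np

def chain_partition_function(unary, pairwise):
    """unary: (n, k) log-potentials; pairwise: (n-1, k, k) log-potentials."""
    msg = unary[0]                                  # running log-messages
    for t in range(1, len(unary)):
        # Sum out variable t-1 in log-space (log-sum-exp for stability).
        scores = msg[:, None] + pairwise[t - 1] + unary[t][None, :]
        m = scores.max(axis=0)
        msg = m + np.log(np.exp(scores - m).sum(axis=0))
    m = msg.max()
    return m + np.log(np.exp(msg - m).sum())        # log partition function

rng = np.random.default_rng(1)
n_vars, n_states = 10, 3
logZ = chain_partition_function(
    rng.standard_normal((n_vars, n_states)),
    rng.standard_normal((n_vars - 1, n_states, n_states)))
print(f"log Z = {logZ:.3f}")
```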

Ohad Shamir (Weizmann Institute)

Is Depth Needed for Deep Learning?

Video
Deep learning, as its name indicates, is based on training artificial neural networks with many layers. A key theoretical question is to understand why such depth is beneficial, and when it is provably necessary for expressing certain types of functions. In fact, this question is closely related to circuit complexity, which has long been studied in theoretical computer science -- albeit for different reasons, and for circuits which differ in some important ways from modern neural networks. Despite this similarity, the interaction between the circuit complexity and machine learning research communities is currently quite limited. In this talk, I'll survey some of the recent depth separation results developed in the machine learning community, and discuss open questions where insights from circuit complexity might help. The talk is aimed at a general theoretical computer science audience, and no prior knowledge about deep learning will be assumed.
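One representative depth separation, stated in hedged form (a paraphrase of the Eldan-Shamir result, quoted from memory): there exists a function g on R^d computable by a 3-layer network of width polynomial in d, and a probability distribution μ, such that every 2-layer network f of width less than exponential in d satisfies

\[
\mathbb{E}_{x\sim\mu}\bigl[(f(x)-g(x))^2\bigr] \;\ge\; c
\]

for a universal constant c > 0.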

Johannes Schmidt-Hieber (Leiden)

Statistical theory for deep neural networks with ReLU activation function

Video
The universal approximation theorem states that neural networks can approximate any continuous function up to a small error that depends on the size of the network. The expressive power of a network does not, however, guarantee that deep networks perform well on data. For that, control of the statistical estimation risk is needed. In this talk, we derive statistical theory for fitting deep neural networks to data generated from the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to logarithmic factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints such as (generalized) additive models. While there is a lot of flexibility in the network architecture, the key tuning parameter is the sparsity of the network. Specifically, we consider large networks whose number of potential parameters is much larger than the sample size. Interestingly, the depth (number of layers) of the network architecture plays an important role, and our theory suggests that scaling the depth with the logarithm of the sample size is natural.
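For context, the classical benchmark these results extend (standard minimax theory, stated in hedged form): for a β-smooth (Hölder) regression function on [0,1]^d, the minimax risk scales as

\[
\inf_{\hat f}\;\sup_{f}\;\mathbb{E}\bigl\|\hat f - f\bigr\|_{L^2}^2 \;\asymp\; n^{-\frac{2\beta}{2\beta+d}},
\]

which deteriorates quickly as d grows. Under the talk's composition assumption f = g_q ∘ ⋯ ∘ g_0, the attainable rate is instead governed, up to logarithmic factors, by the worst component, roughly max_i n^{-2β_i*/(2β_i* + t_i)}, where t_i is the intrinsic input dimension and β_i* an effective smoothness of g_i.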

Sanjoy Dasgupta (UCSD)

Using interaction for simpler and better learning

Video
In the usual setup of supervised learning, the learner is given a stack of labeled examples and told to fit a classifier to them. It would be quite unnatural for a human to learn in this way, and indeed this model is known to suffer from a variety of fundamental hardness barriers. However, many of these hurdles can be overcome by moving to a setup in which the learner interacts with a human (or other information source) during the learning process. We will see how interaction makes it possible to:
1. learn DNF (disjunctive normal form) concepts;
2. perform machine teaching in situations where the student’s concept class is unknown;
3. improve the results of unsupervised learning.
We will present a generic approach to “interactive structure learning” that, for instance, yields simple interactive algorithms for topic modeling and hierarchical clustering. Along the way, we will present a novel cost function for hierarchical clustering, as well as an efficient algorithm for approximately minimizing this cost.
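The hierarchical clustering objective at the end is presumably the cost function introduced by the speaker (Dasgupta, STOC 2016); for a similarity graph with weights w_ij and a hierarchy T, it reads

\[
\mathrm{cost}(T) \;=\; \sum_{\{i,j\}\in E} w_{ij}\,\bigl|\mathrm{leaves}\bigl(T[i\vee j]\bigr)\bigr|,
\]

where T[i ∨ j] is the subtree rooted at the least common ancestor of leaves i and j; minimizing it forces similar pairs to be merged deep in the tree, where the subtree containing both is small.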

Carlos Fernandez-Granda (NYU)

Sparse Recovery Beyond Compressed Sensing

Video
Recovering sparse signals from underdetermined linear measurements is a challenging problem. Deconvolution in reflection seismology and imaging, source localization in EEG, estimation of relaxation parameters in MRI, and direction-of-arrival estimation in radar can all be reformulated as sparse inverse problems. Convex-programming methods based on l1-norm minimization are widely applied to tackle such problems in practice, but current theoretical guarantees are mostly focused on randomized measurement operators that are not relevant to these applications. In this talk, we present a theoretical framework to analyze these methods for realistic deterministic operators, which yields exact-recovery guarantees under certain conditions on the signal support that are related to the correlation structure of the linear operator.
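A minimal sketch of the convex-programming approach: solve the l1-regularized least-squares problem by iterative soft-thresholding (ISTA). Note that this demo uses a random operator A for simplicity, whereas the talk's point is precisely the analysis of realistic deterministic operators.

```python
import numpy as np

def ista(A, y, lam=0.05, iters=2000):
    """Minimize 0.5*||Ax - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz const. of gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - A.T @ (A @ x - y) / L             # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
    return x

rng = np.random.default_rng(2)
n, d, s = 60, 200, 5
A = rng.standard_normal((n, d)) / np.sqrt(n)      # random demo operator
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.choice([-1.0, 1.0], size=s)
y = A @ x_true
x_hat = ista(A, y)
print("support recovered:",
      set(np.flatnonzero(np.abs(x_hat) > 0.3)) == set(np.flatnonzero(x_true)))
```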

Shai Shalev-Shwartz (Mobileye)

Autonomous driving: From a science project to mass production

Video
In recent years, car makers and tech companies have been racing towards self-driving cars. It seems that the main parameter in this race is who will have the first car on the road. The goal of the talk is to add two additional crucial parameters to the equation. The first is standardization of safety assurance: what are the minimal requirements that every self-driving car must satisfy, and how can we verify these requirements? The second parameter is scalability: engineering solutions that lead to runaway costs will not scale to millions of cars, which will push interest in this field into a niche academic corner and drive the entire field into a "winter of autonomous driving". In the first part of the talk I will show why statistical guarantees provide only very weak safety, and will propose instead a white-box, interpretable, mathematical model for safety assurance, which we call Responsibility-Sensitive Safety (RSS). Time permitting, the second part of the talk will cover reinforcement learning techniques for a scalable driving policy.
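One concrete ingredient of RSS, stated in hedged form (the safe longitudinal distance from the published RSS paper, reproduced from memory): a rear vehicle at speed v_r keeps a safe distance behind a front vehicle at speed v_f whenever

\[
d \;\ge\; \left[\, v_r\rho + \tfrac12 a_{\max}\rho^2 + \frac{(v_r+\rho\,a_{\max})^2}{2\,b_{\min}} - \frac{v_f^2}{2\,b_{\max}} \,\right]_{+},
\]

where ρ is the response time, a_max the maximal acceleration during the response, b_min the minimal braking the rear vehicle is guaranteed to apply, and b_max the maximal braking the front vehicle might apply.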

Dejan Slepcev (CMU)

Variational problems on random structures and their continuum limits

Video
We will discuss variational problems arising in machine learning and their limits as the number of data points goes to infinity. Consider point clouds obtained as random samples of an underlying “ground-truth” measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points. Many machine learning tasks, such as clustering and semi-supervised learning, can be posed as minimizing functionals on such graphs. We consider functionals involving graph cuts and graph Laplacians, and study their limits as the number of data points goes to infinity. In particular, we establish under what conditions the minimizers of the discrete problems have a well-defined continuum limit.
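The discrete objects in question are easy to write down; a minimal sketch (one common convention for the kernel and scaling, chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, eps = 300, 0.3
X = rng.uniform(size=(n, 2))                 # samples of a ground-truth measure

# Edge weights from pairwise distances (here: a hard epsilon-ball kernel).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = (D < eps).astype(float)
np.fill_diagonal(W, 0.0)

L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian

# The graph Dirichlet energy u^T L u = 0.5 * sum_ij w_ij (u_i - u_j)^2 is the
# kind of discrete functional whose continuum limit the talk studies.
u = rng.standard_normal(n)
print(np.isclose(u @ L @ u, 0.5 * np.sum(W * (u[:, None] - u[None, :]) ** 2)))
```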

Moritz Hardt (UC Berkeley)

When Recurrent Models Don't Need To Be Recurrent

Video
We show that stable recurrent neural networks are well approximated by feed-forward networks for the purpose of both inference and training by gradient descent. Our result applies to a broad range of non-linear recurrent neural networks under a natural stability condition, which we observe is also necessary. Complementing our theoretical findings, we verify the conclusions of our theory on both real and synthetic tasks. Furthermore, we demonstrate that recurrent models satisfying the stability assumption of our theory can have excellent performance on real sequence learning tasks. Joint work with John Miller (UC Berkeley).
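In hedged form, one way the stability condition can be written for a recurrent model (a sketch of the setting, not the paper's exact statement): with a 1-Lipschitz nonlinearity φ,

\[
h_t = \phi\bigl(W h_{t-1} + U x_t\bigr), \qquad \|W\|_2 \le \lambda < 1,
\]

so the influence of the input from k steps ago decays like λ^k, and truncating to the most recent k ≈ log(1/ε)/log(1/λ) inputs gives a feed-forward network approximating the recurrent one to error ε.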

Costis Daskalakis (MIT)

Improving Generative Adversarial Networks using Game Theory and Statistics

Video
Generative Adversarial Networks (GANs) are a recently proposed approach for learning samplers of high-dimensional distributions with intricate structure, such as distributions over natural images, given samples from these distributions. They are obtained by setting up a two-player zero-sum game between two neural networks, which learn statistics of a target distribution by adapting their strategies in the game using gradient descent. Despite their intriguing performance in practice, GANs pose great challenges to both optimization and statistics. Their training suffers from oscillations, and they are difficult to scale to high-dimensional settings. We study how game-theoretic and statistical techniques can be brought to bear on these important challenges. We use Game Theory towards improving GAN training, and Statistics towards scaling up the dimensionality of the generated distributions.
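The game in question, in its original formulation (Goodfellow et al., 2014):

\[
\min_{G}\;\max_{D}\;\; \mathbb{E}_{x\sim p_{\mathrm{data}}}\bigl[\log D(x)\bigr] \;+\; \mathbb{E}_{z\sim p_z}\bigl[\log\bigl(1-D(G(z))\bigr)\bigr],
\]

where the generator G and discriminator D are neural networks trained by simultaneous gradient steps; these are the dynamics whose oscillations the talk addresses.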

Suresh Venkatasubramanian (Utah)

Towards a theory (or theories) of fairness in automated decision-making

Video
With the advent of automated decision making has come a whole new vocabulary for algorithm design. We say that automated decision-making should be "fair" or "unbiased" or "nondiscriminatory". But our algorithms speak only mathematics, and these words need translation (if they can be translated at all). Over the last few years, the research area of fairness, accountability and transparency has made attempts to formalize what it means for an algorithm to be fair. In this talk I'll discuss what has become a conversation between society and belief systems on the one hand, and mathematical formalisms on the other. I'll explain the different formalizations of fairness and how they reflect differing perspectives on how society understands fairness and bias, and I'll describe how the formalizations lead to conclusions about the possibility (or not) of fair decision-making.
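Two of the standard formalizations in this literature, for a predicted label Ŷ, true outcome Y, and protected attribute A:

\[
\Pr\bigl(\hat Y = 1 \mid A = a\bigr) \;=\; \Pr\bigl(\hat Y = 1 \mid A = a'\bigr) \qquad \text{(demographic parity)},
\]

\[
\Pr\bigl(\hat Y = 1 \mid Y = y,\, A = a\bigr) \;=\; \Pr\bigl(\hat Y = 1 \mid Y = y,\, A = a'\bigr) \;\;\text{for all } y \qquad \text{(equalized odds)}.
\]

Known impossibility results show that natural criteria of this kind cannot, in general, all be satisfied at once, which is one source of the conclusions about the possibility (or not) of fair decision-making.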

Tselil Schramm (Harvard+MIT)

Efficient Algorithms for the Graph Matching Problem in Correlated Random Graphs

Video
The Graph Matching problem is a robust version of the Graph Isomorphism problem: given two not-necessarily-isomorphic graphs, the goal is to find a permutation of the vertices which maximizes the number of common edges. We study a popular average-case variant; we deviate from the common heuristic strategy and give the first quasi-polynomial time algorithm, where previously only sub-exponential time algorithms were known.
Based on joint work with Boaz Barak, Chi-Ning Chou, Zhixian Lei, and Yueqi Sheng.
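In symbols, with adjacency matrices A and B, the objective is

\[
\max_{\pi\in S_n}\;\sum_{i,j} A_{ij}\,B_{\pi(i)\pi(j)} \;=\; \max_{P\ \text{permutation matrix}}\;\bigl\langle A,\; P B P^{\top}\bigr\rangle,
\]

a quadratic assignment problem; in the average-case variant studied, A and B are correlated random graphs (e.g. correlated Erdős–Rényi).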

John Duchi (Stanford)

Locally private estimation, learning, inference, and optimality

Video
In this talk, we investigate statistical learning and estimation under local privacy constraints, where data providers do not trust the collector of the data and so privatize their data before it is even collected. We identify fundamental tradeoffs between statistical utility and privacy in such local models of privacy, providing instance-specific bounds for private estimation and learning problems by developing local minimax risks. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to evaluate the difficulty of individual problem instances and delineate the possibilities for adaptation in private estimation and inference. As part of this, we identify an alternative to the Fisher information for private estimation, giving a more nuanced understanding of the challenges of adaptivity and optimality. We also provide optimal procedures for private inference, highlighting the importance of a more careful development of optimal tradeoffs between estimation and privacy. One consequence of our results is to identify settings where standard local privacy restrictions may be too strong for practice; time permitting, I will then discuss a few new directions that maintain limited amounts of privacy while simultaneously allowing the development of high-performance statistical and learning procedures.
Based on joint work with Feng Ruan.
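A minimal sketch of the local model in code: randomized response, the canonical locally private mechanism (illustrative only; the talk concerns optimal mechanisms and fundamental limits, not this specific scheme). Each user randomizes their own bit before the collector ever sees it, and the collector debiases the aggregate.

```python
import numpy as np

def randomized_response(bits, eps, rng):
    p = np.exp(eps) / (np.exp(eps) + 1.0)      # keep bit w.p. p, else flip
    keep = rng.uniform(size=bits.shape) < p
    return np.where(keep, bits, 1.0 - bits)

def debiased_mean(private_bits, eps):
    # E[private bit] = (2p - 1) * mean + (1 - p); invert the affine map.
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return (private_bits.mean() - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(4)
bits = (rng.uniform(size=200_000) < 0.3).astype(float)   # true mean is 0.3
private = randomized_response(bits, eps=1.0, rng=rng)
print(f"private estimate of the mean: {debiased_mean(private, eps=1.0):.3f}")
```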

Emma Brunskill (Stanford)

Towards Better Reinforcement Learning for High Stakes Domains

Video
There is increasing excitement about reinforcement learning -- a subarea of machine learning for enabling an agent to learn to make good decisions. Yet numerous questions and challenges remain for reinforcement learning to help support progress in important high stakes domains like education, consumer marketing and healthcare. One key question is how to leverage the ever-expanding logs of sequences of decisions made and their outcomes to identify better decision policies, such as using electronic medical record data to inform better personalized patient treatments. I will discuss some of our work on developing better statistical estimators and optimizers for this problem, from the vantage point of a reinforcement learning researcher. A second key issue is to narrow the gap between RL theory and experiment, to create online reinforcement learning algorithms with strong guarantees on their performance. In this effort, I will briefly describe our recent work on policy certificates for RL algorithms. This work is a step towards providing the types of guarantees and transparency that will enable us to confidently deploy RL in important high stakes scenarios.
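The logged-data problem in the middle of the abstract can be made concrete with the most basic estimator in this family, importance sampling; a minimal one-step (bandit) sketch, not the talk's estimators:

```python
import numpy as np

# Estimate the value of a target policy pi from actions and rewards logged
# under a different behavior policy mu, by reweighting each logged sample.
rng = np.random.default_rng(5)
n_actions, n = 3, 100_000
mu = np.array([0.5, 0.3, 0.2])          # logging (behavior) policy
pi = np.array([0.1, 0.1, 0.8])          # target policy to evaluate
reward_mean = np.array([1.0, 0.0, 2.0])

a = rng.choice(n_actions, size=n, p=mu)                 # logged actions
r = reward_mean[a] + rng.standard_normal(n) * 0.1       # logged rewards

w = pi[a] / mu[a]                                       # importance weights
print(f"IS estimate : {np.mean(w * r):.3f}")
print(f"true value  : {np.dot(pi, reward_mean):.3f}")   # 0.1*1 + 0.8*2 = 1.7
```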

Parikshit Gopalan (VMware)

Good visualizations have short sketches

We consider the problem of interactively visualizing a distributed tabular dataset with billions of rows. Our goal is to allow a user to browse such datasets in real time, at any resolution, with just a few mouse clicks (much like navigating an online map service).
We hypothesize that the sketching model of computation is tailor-made for this setting: interesting visualizations are amenable to efficient sketching protocols, whose communication cost is bounded by the size of the desired output (which is small, since it fits on screen). In this talk, we present Hillview, an open-source tool for interactive visualization of large datasets, built around this premise. We focus on the algorithmic challenges that arise in trying to render common visualizations in the sketching model.
Based on joint work with Mihai Budiu, Udi Wieder, Marcos Aguilera, Lalith Suresh and Han Kruiger (all from VMware Research).
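A toy illustration of the sketching premise described above (hypothetical code, not the Hillview API; the HistogramSketch class and its methods are invented for illustration): each worker compresses its shard into a small, mergeable summary, and only summaries, not rows, cross the network.

```python
from dataclasses import dataclass, field

@dataclass
class HistogramSketch:
    lo: float
    hi: float
    buckets: list[int] = field(default_factory=lambda: [0] * 64)

    def add(self, value: float) -> None:
        # Clamp to range and bump the matching bucket.
        idx = int((min(max(value, self.lo), self.hi) - self.lo)
                  / (self.hi - self.lo) * (len(self.buckets) - 1))
        self.buckets[idx] += 1

    def merge(self, other: "HistogramSketch") -> "HistogramSketch":
        # Merging is associative and commutative, so workers can combine
        # partial results in any order (e.g., up an aggregation tree).
        merged = HistogramSketch(self.lo, self.hi)
        merged.buckets = [a + b for a, b in zip(self.buckets, other.buckets)]
        return merged

# Two workers sketch their shards; the coordinator merges fixed-size results.
w1, w2 = HistogramSketch(0, 100), HistogramSketch(0, 100)
for v in range(0, 50): w1.add(v)
for v in range(50, 100): w2.add(v)
print(sum(w1.merge(w2).buckets))   # 100 rows summarized in O(buckets) space
```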

Emmanuel Abbe (EPFL)

When Deep Learning Fails at Learning

Video
As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning. This talk gives a first set of results proving that deep learning algorithms fail at learning certain kinds of functions in poly-time, focusing on the classical problem of parities. It further points to information measures that quantify when these failures tend to generalize, and to the tradeoffs between memory, noise and initialization. Joint work with Colin Sandon.
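The parity functions in question: for a subset S ⊆ {1, …, d},

\[
f_S(x) \;=\; \prod_{i\in S} x_i, \qquad x \in \{-1,+1\}^d,
\]

i.e. the XOR of the coordinates in S. Parities are learnable from noiseless samples by Gaussian elimination, which makes them a clean benchmark separating what is learnable in principle from what gradient-based training achieves in poly-time.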

Nike Sun (MIT)

Capacity Lower Bound for the Ising Perceptron

Video
The perceptron is a toy model of a simple neural network that stores a collection of given patterns. Its analysis reduces to a simple problem in high-dimensional geometry, namely, understanding the intersection of the cube (or sphere) with a collection of random half-spaces. Despite the simplicity of this model, its high-dimensional asymptotics are not well understood. I will describe what is known and present recent results.
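In the geometric language of the abstract, the solution set of the Ising perceptron is

\[
S(m) \;=\; \bigl\{\, x \in \{-1,+1\}^n \;:\; \langle g^{(a)}, x\rangle \ge \kappa\sqrt{n} \ \text{ for all } a=1,\dots,m \,\bigr\}
\]

for i.i.d. Gaussian vectors g^{(a)} (often with κ = 0), and the capacity is the largest ratio α = m/n at which S(m) remains nonempty with high probability as n → ∞.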

Zico Kolter (CMU)

Making Deep Learning More Robust, Modular, and Efficient

Video
Deep learning is often seen as the "breakthrough" AI technology of recent years, revolutionizing areas spanning computer vision, natural language processing, and game playing. However, if we seek to deploy such systems in real-world, safety-critical domains, a starker reality emerges. Modern deep learning systems are brittle (sensitive to adversarial manipulation and generally lacking robustness), opaque (their components are difficult to interpret and debug), and expensive (often requiring vastly more data than is practical in real-world settings).

These failings are sometimes billed as an argument against deep learning as a whole. But in this talk, I will argue instead for new methods that can address these challenges, while preserving the fundamental benefits of deep learning (namely, end-to-end training of composable, differentiable architectures). First, I will discuss our approaches to designing provably robust deep networks using tools from convex relaxations and duality. I also highlight recent work on scaling these methods to much larger domains, including some initial work on provable robustness at ImageNet scale. Second, I will present our work on integrating more complex modules as interpretable layers within deep learning architectures. I show how modules such as optimization solvers, physical simulation, model-based control, and game equilibrium solvers can all be integrated as layers within a deep network, enabling more intuitive architectures that can learn from vastly less data. Last, I will highlight some additional ongoing directions and open questions in both these areas.
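The robustness thread can be summarized by the robust optimization objective underlying this line of work:

\[
\min_{\theta}\; \mathbb{E}_{(x,y)}\Bigl[\,\max_{\|\delta\|\le\epsilon}\; \ell\bigl(f_\theta(x+\delta),\,y\bigr)\Bigr],
\]

where provable robustness amounts to certifying an upper bound on the inner maximization, for instance via a convex relaxation of the network and its dual.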

Sam Hopkins (Berkeley)

How to Estimate the Mean of a Random Vector in Polynomial Time

Video
We study polynomial-time algorithms for estimating the mean of a multivariate random vector under very mild assumptions: we assume only that the random vector X has finite mean and covariance, which allows X to be heavy-tailed. In this setting, the radius of the confidence intervals achieved by the empirical mean is exponentially larger than in the case that X is Gaussian or sub-Gaussian; that is, the empirical mean is poorly concentrated. We offer the first polynomial-time algorithm to estimate the mean of X with sub-Gaussian-size confidence intervals under such mild assumptions; that is, our estimator is exponentially better concentrated than the empirical mean. Our algorithm is based on a new semidefinite programming relaxation of a high-dimensional median. Previous estimators that assumed only existence of finitely many moments of X either sacrifice sub-Gaussian performance or are only known to be computable via brute-force search requiring time exponential in the dimension.

Based on https://arxiv.org/abs/1809.07425, to appear in Annals of Statistics.
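For intuition, a minimal scalar sketch of the classical building block behind the talk: median-of-means, which gives sub-Gaussian-style confidence intervals for heavy-tailed data. The talk's contribution is a polynomial-time, SDP-based analogue of the multivariate median; this toy is one-dimensional only.

```python
import numpy as np

def median_of_means(samples, k):
    """Split into k groups, average each group, take the median of averages."""
    groups = np.array_split(samples, k)
    return np.median([g.mean() for g in groups])

rng = np.random.default_rng(6)
x = rng.standard_t(df=2.5, size=10_000)        # heavy tails, true mean 0
print(f"empirical mean : {x.mean():+.4f}")     # fluctuates more across runs
print(f"median of means: {median_of_means(x, k=50):+.4f}")
```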

Aaditya Ramdas (CMU)

Exponential Line-crossing Inequalities

Video
This talk will present a class of exponential bounds for the probability that a martingale sequence crosses a time-dependent linear threshold. Our key insight is that it is both natural and fruitful to formulate exponential concentration inequalities in this way. We will illustrate this point by presenting a single assumption and a single theorem that together strengthen many tail bounds for martingales, including classical inequalities (1960-80) by Bernstein, Bennett, Hoeffding, and Freedman; contemporary inequalities (1980-2000) by Shorack and Wellner, Pinelis, Blackwell, van de Geer, and de la Peña; and several modern inequalities (post-2000) by Khan, Tropp, Bercu and Touati, Delyon, and others. In each of these cases, we give the strongest and most general statements to date, quantifying the time-uniform concentration of scalar, matrix, and Banach-space-valued martingales, under a variety of nonparametric assumptions in discrete and continuous time. In doing so, we bridge the gap between existing line-crossing inequalities, the sequential probability ratio test, the Cramér-Chernoff method, self-normalized processes, and other parts of the literature. Time permitting, I will briefly discuss applications to sequential covariance matrix estimation, adaptive clinical trials, and multi-armed bandits via the notion of “confidence sequences”.

(joint work with Steve Howard, Jas Sekhon and Jon McAuliffe, preprint https://arxiv.org/abs/1808.03204)
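The mechanism behind such bounds fits in two lines (a standard derivation, not specific to this paper): if M_t = exp(λS_t − tψ(λ)) is a nonnegative supermartingale with M_0 = 1, then Ville's inequality gives

\[
\Pr\bigl(\exists t:\ M_t \ge 1/\alpha\bigr) \;\le\; \alpha
\quad\Longleftrightarrow\quad
\Pr\Bigl(\exists t:\ S_t \ge \frac{\psi(\lambda)}{\lambda}\,t + \frac{\log(1/\alpha)}{\lambda}\Bigr) \;\le\; \alpha,
\]

i.e. a linear (in t) threshold that the martingale S_t crosses with probability at most α, uniformly over time.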

Timnit Gebru (Google)

Towards Transparency in AI: Methods and Challenges

Video
Automated decision making tools are currently used in high stakes scenarios. From natural language processing tools used to automatically determine one’s suitability for a job, to health diagnostic systems trained to determine a patient’s outcome, machine learning models are used to make decisions that can have serious consequences on people’s lives. In spite of the consequential nature of these use cases, vendors of such models are not required to perform specific tests showing the suitability of their models for a given task. Nor are they required to provide documentation describing the characteristics of their models, or to disclose the results of algorithmic audits ensuring that certain groups are not unfairly treated. I will show some examples examining the dire consequences of basing decisions entirely on machine learning based systems, and discuss recent work on auditing and exposing the gender and skin-tone bias found in commercial gender classification systems. I will end with the concept of AI datasheets for datasets and model cards for model reporting, which standardize information about datasets and pre-trained models in order to push the field as a whole towards transparency and accountability. Recently, we have seen many powerful entities in academia and industry announcing initiatives related to AI ethics. I will spend some time in this talk discussing how we can learn from the mistakes and evolutions of other disciplines that have performed, or continue to perform, what some call parachute research: research that uses the pain of marginalized communities without centering their voices or benefiting them.

Yoram Singer (Princeton)

Memory-Efficient Adaptive Optimization for Humungous-Scale Learning

Video
Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training some of the largest deep models used at Google.
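A simplified sketch in the spirit of the talk (SM3-like, after the paper by Anil, Gupta, Koren, and Singer; illustrative, not the exact published algorithm): instead of one second-moment statistic per parameter, keep one accumulator per row and per column of a matrix-shaped parameter and combine them with a min, using O(m + n) memory instead of O(m * n).

```python
import numpy as np

def sm3_like_step(param, grad, row_acc, col_acc, lr=0.1, eps=1e-8):
    # Per-entry second-moment estimate from the factored accumulators.
    nu = np.minimum(row_acc[:, None], col_acc[None, :]) + grad ** 2
    # Each accumulator takes the max estimate over its cover set.
    row_acc[:] = nu.max(axis=1)
    col_acc[:] = nu.max(axis=0)
    param -= lr * grad / (np.sqrt(nu) + eps)        # AdaGrad-style update

# Toy problem: minimize 0.5 * ||W||_F^2, whose gradient is W itself.
rng = np.random.default_rng(7)
W = rng.standard_normal((100, 50))
row_acc, col_acc = np.zeros(100), np.zeros(50)
for _ in range(200):
    sm3_like_step(W, W.copy(), row_acc, col_acc)
print(f"||W||_F after 200 steps: {np.linalg.norm(W):.3f}")  # shrinks toward 0
```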

Will Perkins (University of Chicago)

Counting and Sampling at Low Temperatures

Video
We consider the problem of efficient sampling from the hard-core and Potts models from statistical physics. On certain families of graphs, phase transitions in the underlying physics model are linked to changes in the performance of some sampling algorithms, including Markov chains. We develop new sampling and counting algorithms that exploit the phase transition phenomenon and work efficiently on lattices (and bipartite expander graphs) at sufficiently low temperatures in the phase coexistence regime. Our algorithms are based on Pirogov-Sinai theory and the cluster expansion, classical tools from statistical physics. Joint work with Tyler Helmuth and Guus Regts.
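The tools named at the end take a compact form in the abstract polymer model (a standard formulation, sketched in hedged form): the partition function is rewritten as

\[
Z \;=\; \sum_{\Gamma\ \text{compatible}}\ \prod_{\gamma\in\Gamma} w(\gamma),
\qquad
\log Z \;=\; \sum_{\text{clusters}\ \Gamma'} \varphi(\Gamma')\prod_{\gamma\in\Gamma'} w(\gamma),
\]

where the polymers γ are defects from a chosen ground state and φ is the Ursell function. At sufficiently low temperature the weights are small enough for the cluster expansion to converge, and truncating it gives an efficient approximation of Z, from which approximate counting and sampling follow.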