Friday 9:30 am Michael Mitzenmacher
Data Exploration: Equitability and the Maximal Information Coefficient
Abstract: A fundamental problem in data exploration with a high-dimensional data set is to figure out what pairs of dimensions appear to have an interesting relationship, worthy of future study. We set out desiderata for methods to solve this problem, introducing the concept of "equitability"; equitability means having a ranking method that does not favor certain types of relationships over others. We then explain our algorithmic approach to achieving equitability, using what we have dubbed the maximal information coefficient, a measure based on mutual information. Finally, we present some experimental results.
Joint work with David Reshef, Yakir Reshef, Pardis Sabeti, and others.
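For readers unfamiliar with the statistic, here is a hedged sketch of the definition as commonly stated in the published work (the grid bound B(n) = n^{0.6} is the usual default, not something specified in this abstract):

    MIC(X,Y) = max_{n_x * n_y <= B(n)}  I( (X,Y)|_G ) / log_2 min(n_x, n_y),

where the inner maximization is over all n_x-by-n_y grids G, I((X,Y)|_G) is the mutual information of the distribution induced by binning the n sample points into G, and the normalization by log_2 min(n_x, n_y) keeps the score in [0,1] so that different relationship types can be ranked on a common scale.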
Friday 10:00 am Sergei Vassilvitskii
Abstract: We consider the problem of inferring choices made by users from data that contains only the relative popularity of each item. We propose a framework that models the problem as that of inferring a Markov chain given its stationary distribution. Formally, given a graph and a distribution on its nodes, the goal is to assign scores to nodes such that the stationary distribution of the following Markov chain equals the given distribution: the transition probability from a node to its neighbor is proportional to the score of the neighbor. We prove sufficient conditions under which this problem is feasible and, for the feasible instances, obtain a simple algorithm for a generic version of the problem. This iterative algorithm provably finds the unique solution and has a polynomial rate of convergence; in practice it converges after fewer than 10 iterations. We then apply this framework to choice problems in online settings and show that our algorithm explains the observed data and predicts user choices much better than competing baselines across a variety of diverse datasets.
Appeared in WSDM 2015. Joint work with Ravi Kumar, Andrew Tomkins, and Erik Vee.
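As a rough illustration of the model above (not necessarily the algorithm from the WSDM paper), the following Python sketch builds the score-parameterized Markov chain and uses a naive multiplicative fixed-point update to match a target stationary distribution; the example graph, the initialization, and the update rule are illustrative assumptions.

    import numpy as np

    # Toy undirected graph as adjacency lists (illustrative only).
    neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    n = len(neighbors)

    def stationary(scores, iters=2000):
        """Stationary distribution of the chain in which the transition
        probability from u to a neighbor v is proportional to scores[v]."""
        P = np.zeros((n, n))
        for u, nbrs in neighbors.items():
            z = sum(scores[v] for v in nbrs)
            for v in nbrs:
                P[u, v] = scores[v] / z
        pi = np.full(n, 1.0 / n)
        for _ in range(iters):          # power iteration
            pi = pi @ P
        return pi

    target = np.array([0.1, 0.4, 0.3, 0.2])   # observed relative popularity
    scores = np.ones(n)
    for _ in range(100):                       # naive fixed-point heuristic
        scores *= target / stationary(scores)  # boost under-visited nodes
        scores /= scores.sum()

    print(np.round(stationary(scores), 3))     # compare against target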
Friday 11:00 am Stefanie Jegelka
Abstract: Submodular functions capture a spectrum of discrete problems in machine learning, signal processing and computer vision. In these areas, practical algorithms are a major concern, which motivates exploiting structure in addition to submodularity. A simple example of such structure is functions that decompose as a sum of "simple" submodular functions. For this setting, several algorithms arise from relations between submodularity and convexity. In particular, this talk will focus on a class of algorithms that solve submodular minimization as a best approximation problem. These algorithms are easy to use and to parallelize, and they solve both a convex relaxation and the original discrete problem. We observe that the algorithms work well in practice, and we analyze their convergence properties.
This is joint work with Robert Nishihara, Francis Bach, Suvrit Sra and Michael I. Jordan.
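To make the "best approximation" viewpoint concrete, here is a hedged sketch of the two-component case, using the standard minimum-norm-point connection (the talk's exact formulation and the product-space version for more summands may differ in details):

    min_{S \subseteq V} F_1(S) + F_2(S)   is solved via   min_{y_1 \in B(F_1), y_2 \in B(F_2)} || y_1 - (-y_2) ||^2,

i.e., finding the closest pair of points between the base polytope B(F_1) and the reflected base polytope -B(F_2); a discrete minimizer is then recovered by thresholding, S* = { v : (y_1 + y_2)_v < 0 }. Projections onto each B(F_i) only require solving the "simple" subproblems, which is what makes such schemes easy to use and to parallelize.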
Friday 11:30 am Francis Bach
Abstract: In this talk, I will consider the submodular function minimization problem and present several algorithms for solving it, for generic and semi-generic (e.g., decomposable) submodular functions. Some come with theoretical guarantees, some do not; some may be used for large-scale problems such as those encountered in machine learning or computer vision, some may not. Is it possible to have both theoretical guarantees and practical efficiency in this setting?
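One standard bridge between submodularity and convexity underlying several of these algorithms is the Lovász extension (a well-known fact, stated here for reference rather than as part of the talk):

    min_{S \subseteq V} F(S) = min_{x \in [0,1]^n} f(x),    where    f(x) = E_{\theta ~ Uniform[0,1]} [ F({ v : x_v >= \theta }) ].

The extension f is convex exactly when F is submodular, so submodular minimization reduces to minimizing a (non-smooth) convex function over the unit cube; the various algorithms differ in how they exploit this equivalence.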
Friday 2:00 pm Babis Tsourakakis
Abstract: In this talk, we will present recent results on dense subgraph discovery in large networks. We will introduce the concept of densest subgraph sparsifiers, which yields a randomized algorithm that allows scalable densest subgraph computations on multi-gigabyte (static) networks. Furthermore, we will present state-of-the-art approximation algorithms for dense subgraph discovery in large-scale dynamic graphs. We will conclude with a recent generalization of the well-studied densest subgraph problem.
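As background for the density objective, here is a minimal sketch of the classical greedy peeling algorithm for the densest subgraph problem (a standard 2-approximation baseline due to Charikar; it is not the sparsifier or dynamic-graph machinery of the talk):

    from collections import defaultdict

    def densest_subgraph(edges):
        """Greedy peeling: repeatedly delete a minimum-degree vertex and
        return the intermediate vertex set with the highest density |E|/|V|."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        nodes, m = set(adj), len(edges)
        best_density, best_set = 0.0, set(nodes)
        while nodes:
            density = m / len(nodes)
            if density > best_density:
                best_density, best_set = density, set(nodes)
            u = min(nodes, key=lambda x: len(adj[x]))   # minimum-degree vertex
            for v in adj[u]:
                adj[v].discard(u)
            m -= len(adj[u])
            del adj[u]
            nodes.remove(u)
        return best_density, best_set

    # Toy graph: a 4-clique {0,1,2,3} with a pendant vertex 4.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    print(densest_subgraph(edges))   # -> (1.5, {0, 1, 2, 3})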
Friday 2:30 pm Vahab Mirrokni
Abstract: In this talk, I discuss a large-scale graph mining project at Google NYC. The goal of the project is to develop a distributed graph algorithm library for analyzing graphs with hundreds of billions of edges. I present an overview of the challenges from three perspectives: application-inspired problems, distributed computation challenges, and combined systems+algorithms research. On the first topic, I discuss the model of public-private graphs. On the second topic, I discuss randomized composable core-sets for distributed optimization and balanced clustering. Finally, I discuss hybrid algorithmic and systems problems for computing connected components in MapReduce, as well as a new graph mining framework based on asynchronous message passing, referred to as ASYMP.
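For the connected-components workload mentioned above, here is a toy illustration of the min-label propagation idea that such distributed frameworks are built around (a generic, sequential sketch, not Google's MapReduce or ASYMP implementation):

    def connected_components(edges, num_nodes):
        """Each vertex repeatedly adopts the smallest label in its neighborhood;
        once labels stop changing, equal labels mean same component."""
        label = list(range(num_nodes))        # start with the vertex's own id
        changed = True
        while changed:                        # one sweep = one round of messages
            changed = False
            for u, v in edges:
                lo = min(label[u], label[v])
                if label[u] != lo or label[v] != lo:
                    label[u] = label[v] = lo
                    changed = True
        return label

    print(connected_components([(0, 1), (1, 2), (3, 4)], num_nodes=6))
    # -> [0, 0, 0, 3, 3, 5]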
Friday 3:30 pm Amin Karbasi
Abstract: Faced with massive data, is it possible to trade off (statistical) risk and (computational) space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model. Our summarization is based on coreset constructions from computational geometry. We also develop an algorithm, TRAM, to navigate the space/time/data/risk tradeoff in practice. In particular, we show that for a fixed risk (or data size), as the data size increases (resp. risk increases) the running time of TRAM decreases. Our extensive experiments on real data sets demonstrate the existence and practical utility of such tradeoffs, not only for k-means but also for Gaussian Mixture Models.
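The summarization step relies on the standard coreset guarantee from computational geometry, which (stated roughly, with the talk's particular construction and constants omitted) asks for a small weighted set C such that

    | cost_C(Q) - cost_X(Q) | <= epsilon * cost_X(Q)    for every set Q of k centers,

where cost_X(Q) = sum_{x in X} min_{q in Q} ||x - q||^2 is the k-means cost on the full data X and cost_C(Q) is the weighted cost on C; any near-optimal clustering of C is then near-optimal for X, so running time is traded for space and (bounded) additional risk.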
Friday 4:00 pm Yisong Yue
Abstract: The Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also called bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this talk, I will provide an overview of the Dueling Bandits Problem, including intuitions and some details of basic algorithmic results. I will also discuss some more recent advances as well as interesting directions for future work.
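A commonly used formalization of the feedback and the objective, standard in the dueling-bandits literature (the specific variants covered in the talk may differ):

    R_T = sum_{t=1}^{T} [ epsilon(b*, b_t) + epsilon(b*, b_t') ],    epsilon(b, b') = P(b beats b') - 1/2,

where at each round the learner picks a pair (b_t, b_t'), observes only the noisy outcome of their comparison, and b* is the best bandit; regret accrues whenever either chosen bandit would lose to b* with probability above 1/2.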
Saturday 9:00 am Jeff Bilmes
Abstract: We investigate two mixed robust/average-case submodular data partitioning problems that we collectively call Submodular Partitioning. These problems generalize purely robust instances of the problem, namely max-min submodular fair allocation (SFA) and min-max submodular load balancing (SLB), as well as average-case instances, namely the submodular welfare problem (SWP) and submodular multiway partition (SMP). The robust versions have been studied before, with a focus on tight approximation guarantees. For the average-case versions, many of the existing algorithms are scalable. In the present work, we propose several simple algorithms that not only scale to large datasets but also achieve approximation guarantees comparable to the state of the art. We are motivated by problems in machine learning, including data partitioning and load balancing for distributed machine learning, data clustering, and image segmentation. We empirically demonstrate the efficacy of our algorithms on real-world problems involving data partitioning for distributed optimization (of convex, nearest-neighbor, and deep neural network objectives), as well as purely unsupervised image segmentation.
Joint work with Kai Wei, Rishabh Iyer, Shengjie Wang, and Wenruo Bai, of the University of Washington, Seattle.
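For reference, the four special cases named in the abstract, written out in their standard forms (the mixed robust/average-case objectives studied in the talk interpolate between them):

    SFA:  max over partitions (A_1, ..., A_m) of V of  min_i f_i(A_i)
    SLB:  min over partitions of  max_i f_i(A_i)
    SWP:  max over partitions of  sum_i f_i(A_i)
    SMP:  min over partitions of  sum_i f_i(A_i)

where each f_i is a nonnegative submodular function on the ground set V.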
Saturday 9:30 am Morteza Zadimoghaddam
Abstract: An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has recently been applied to diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique. In this paper, we focus on the efficient construction of a randomized variant of composable core-sets, where the above idea is applied to a random clustering of the data. We employ this technique for diversity maximization as well as coverage and monotone and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in both distributed and streaming settings.
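The following Python sketch illustrates the distributed pattern described above, instantiated for max-coverage (a monotone submodular objective) with plain greedy as the per-piece routine; the toy data and the choice of greedy are illustrative assumptions, not the paper's exact construction or analysis.

    import random

    def greedy(candidates, item_sets, k):
        """Pick up to k sets greedily by marginal coverage gain."""
        chosen, covered = [], set()
        for _ in range(min(k, len(candidates))):
            best = max(candidates, key=lambda s: len(item_sets[s] - covered))
            chosen.append(best)
            covered |= item_sets[best]
        return chosen

    def distributed_greedy(item_sets, k, machines=4, seed=0):
        random.seed(seed)
        parts = [[] for _ in range(machines)]
        for s in item_sets:                       # random partition of the input
            parts[random.randrange(machines)].append(s)
        coreset = []
        for part in parts:                        # per-machine "core-set"
            coreset += greedy(part, item_sets, k)
        return greedy(coreset, item_sets, k)      # final solution on the union

    item_sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6, 7, 8},
                 "d": {1, 5}, "e": {2, 6}, "f": {7, 8, 9}}
    print(distributed_greedy(item_sets, k=2))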
Saturday 10:00 am Andreas Krause
Abstract: As a discrete analogue of convexity, submodularity has profound implications for optimization. In recent years, submodular optimization has found many new applications, such as in machine learning, vision, data mining and network analysis. In this talk, I will present our recent work on quantifying uncertainty in submodular optimization. In particular, we carry out the first systematic investigation of inference and learning in probabilistic submodular models (PSMs). These are probabilistic models defined through submodular functions -- log-sub/supermodular distributions -- generalizing regular binary Markov Random Fields and Determinantal Point Processes. They express natural notions such as attractiveness and repulsion and allow us to capture long-range, high-order dependencies among the variables. I will present our recently discovered variational approach towards inference in general PSMs based on sub- and supergradients. We obtain both lower and upper bounds on the log-partition function, which enables computing probability intervals for marginals, conditionals and marginal likelihoods. We also obtain fully factorized approximate posteriors, at essentially the same computational cost as ordinary submodular optimization. Our framework results in convex problems for optimizing over differentials of submodular functions, which we show how to solve optimally. Our approximation is exact at the mode (for log-supermodular distributions), and we provide bounds on the approximation quality of the log-partition function with respect to the curvature of the function. We further establish natural relations between our variational approach and the classical mean-field method from statistical physics. Exploiting additive structure in the objective leads to highly scalable, parallelizable message passing algorithms. We empirically demonstrate the accuracy of our inference scheme on several PSMs arising in computer vision and network analysis.
This is joint work with Josip Djolonga.
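To fix notation for the bounds mentioned above, here is a hedged sketch of the log-supermodular case in the spirit of the sub-/supergradient construction (signs, normalizations, and the exact statements in the talk may differ):

    p(S) = exp(-F(S)) / Z,    Z = sum_{S \subseteq V} exp(-F(S)),    F submodular.

For any modular lower bound m(S) = sum_{v in S} m_v with m(S) <= F(S) for all S (e.g., a point of the base polytope obtained from a subgradient of the Lovász extension),

    Z <= sum_{S} exp(-m(S)) = prod_{v in V} (1 + exp(-m_v)),

which gives an upper bound on the log-partition function and a fully factorized approximate posterior q(S) proportional to exp(-m(S)); a modular upper bound on F (from a supergradient) yields the matching lower bound on Z, and together the two give probability intervals for marginals.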
Saturday 11:00 am Jan Vondrak
Abstract: We study the question of learning valuation functions from random examples, initiated by Balcan and Harvey. We aim to learn valuation functions over the uniform distribution on {0,1}^n, within an L2 error of epsilon. Our earlier work showed that for submodular functions, this can be done in time 2^{O(1/epsilon^2)} poly(n). In this work, we improve the bound to 2^{O(1/epsilon^{4/5})} poly(n) and show that the 4/5 exponent is optimal. The technical question that leads to this result is: what is the optimal degree of a multilinear polynomial that approximates a submodular function within an L2 error of epsilon? We show the answer to be O(1/epsilon^{4/5}). We also give optimal polynomial approximations for XOS and self-bounding functions.
Joint work with Vitaly Feldman.
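For concreteness, the approximation notion used above is the L2 error under the uniform distribution (standard, and implicit in the abstract):

    || f - p ||_2 = ( E_{x ~ {0,1}^n} [ (f(x) - p(x))^2 ] )^{1/2} <= epsilon,

so the degree bound O(1/epsilon^{4/5}) refers to the smallest degree of a multilinear polynomial p achieving this error.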
Saturday 11:30 am Yaron Singer
Abstract: We consider the question of whether provable guarantees are obtainable in combinatorial optimization in the presence of error and noise. We provide initial answers by focusing on the question of maximizing monotone submodular functions under matroid constraints.
Based on joint work with Avinatan Hassidim.
Saturday 2:00 pm Tim Roughgarden
Abstract: We first explain how to use concepts from learning theory to make optimal auction theory more operational, replacing the “common prior” assumption with a data-driven approach. We then expand on this idea, taking a PAC learning approach to the selection of application-specific algorithms. Our work shows that dimension notions from statistical learning theory, historically used to measure the complexity of classes of binary- and real-valued functions, are relevant in a much broader algorithmic context.
Includes joint work with Rishi Gupta and Jamie Morgenstern.
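The dimension notion most relevant for real-valued performance measures is the pseudo-dimension; a standard definition, included here only for reference:

    Pdim(F) = the largest m for which there exist inputs x_1, ..., x_m and thresholds t_1, ..., t_m
              such that for every pattern b in {0,1}^m some f in F satisfies  f(x_i) > t_i  iff  b_i = 1,

i.e., the largest number of inputs that can be "shattered" relative to per-input thresholds; uniform-convergence arguments then need roughly Pdim(F)/epsilon^2 samples to estimate every function's expected value to within epsilon, which is what makes such dimensions useful for data-driven algorithm and auction selection.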
Saturday 2:30 pm Elad Hazan
Abstract: For many data sets that are sparse and structured, data reconstruction is computationally hard. A notable example is media-recommendation data, which is sparse and low-rank. We consider a general model for classification and regression tasks in which data is missing and the (clean) data is assumed to reside in a low-dimensional subspace. We give a non-reconstructive learning approach for this setting with provable guarantees.
Based on joint work with Roi Livni and Yishay Mansour.
Saturday 3:00 pm Edo Liberty
Abstract: This talk will review two recent results about performing Principal Component Analysis (PCA) in an online fashion. The first portion of the talk will be dedicated to motivating a new problem definition. Then, a simple and deterministic algorithm for online PCA will be presented along with its guarantees (and their proof sketches). Finally, recent progress and open problems will be discussed.
Online Principal Components Analysis, SODA 2015, Christos Boutsidis, Dan Garber, Zohar Shay Karnin, Edo Liberty
Online PCA with Spectral Bounds, COLT 2015, Zohar Shay Karnin, Edo Liberty
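As a point of reference for the online setting, here is a minimal sketch of the classical Oja-style streaming update for maintaining a rank-k subspace (a textbook baseline; the algorithms and guarantees in the papers above are different and come with formal bounds):

    import numpy as np

    def streaming_pca(stream, k, eta=0.01, seed=0):
        """Maintain an orthonormal d-by-k basis W, updated on each arriving row."""
        rng = np.random.default_rng(seed)
        W = None
        for x in stream:
            x = np.asarray(x, dtype=float)
            if W is None:                          # lazy random initialization
                W, _ = np.linalg.qr(rng.standard_normal((x.size, k)))
            W += eta * np.outer(x, x @ W)          # Oja-style gradient step
            W, _ = np.linalg.qr(W)                 # re-orthonormalize
        return W

    # Toy stream: noisy points along one direction in R^3.
    rng = np.random.default_rng(1)
    direction = np.array([1.0, 2.0, 2.0]) / 3.0
    stream = [t * direction + 0.05 * rng.standard_normal(3)
              for t in rng.standard_normal(500)]
    W = streaming_pca(stream, k=1)
    print(abs(W[:, 0] @ direction))                # near 1 if the direction is found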
Saturday 4:00 pm Everyone
Abstract: We will form groups around topics people are interested in discussing. For each discussion we'll nominate a person to lead it. We will conclude with a short summary of the discussions.
Sunday 9:30 am Eva Tardos
Abstract: Classical work in economics on inferring agent values from data relies on the assumption that all participant strategies are best responses to the observed play of other players, i.e., that they constitute a Nash equilibrium. In this paper, we show how to perform inference relying on a weaker assumption instead: that players are using some form of no-regret learning. Learning outcomes have emerged in recent years as an attractive alternative to Nash equilibrium for analyzing game outcomes, modeling players who haven’t reached a stable equilibrium but rather use algorithmic learning, aiming to learn the best way to play from previous observations. In this paper we show how to infer the values of players who use algorithmic learning strategies. Such inference is an important first step before we move to testing any learning-theoretic behavioral model on auction data. We apply our techniques to a dataset from Microsoft’s sponsored search ad auction system. Joint work with Denis Nekipelov and Vasilis Syrgkanis.
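A schematic version of the inference constraint implied by the no-regret assumption (the rationalizable set in the paper is defined per auction and with more care; this is only meant to convey the shape of the argument):

    (1/T) * sum_{t=1}^{T} u_i(b_{i,t}, b_{-i,t}; v_i)  >=  (1/T) * sum_{t=1}^{T} u_i(b', b_{-i,t}; v_i) - epsilon
    for every fixed alternative bid b',

i.e., a value v_i is consistent with the observed bid sequence if, under v_i, the player's average utility is within epsilon of the best fixed bid in hindsight; the set of (v_i, epsilon) pairs satisfying these inequalities is what the inference procedure characterizes, in place of exact best-response (Nash) constraints.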
Sunday 10:00 am Joachim Buhmann
Abstract: Algorithms are traditionally analyzed with respect to the resources of execution time and memory space. Robustness of algorithms to input fluctuations, model mismatch and perturbations is required in almost all real-world applications of information technology, but it is rarely considered as a selection criterion for algorithms. The achievable precision of an algorithm, i.e., the attainable resolution in its output space, is determined by its capability to extract predictive information from the input relative to its output. I will present an information-theoretic framework to control the generalization of an algorithm over input fluctuations. The algorithm is formulated as a computational evolution of a (possibly contracting) posterior distribution over the output space. The tradeoff between precision and stability is optimized by an input-sensitive generalization capacity (GC). GC measures how much the posteriors on two different problem instances agree despite the noise in the input; thereby, GC objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. Information-theoretic algorithm selection is demonstrated for minimum spanning tree algorithms and for greedy MaxCut algorithms. The method can rank centroid-based and spectral clustering methods, e.g., k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering.
Sunday 11:00 am Sanjeev Arora
Abstract: Semantic word embeddings represent words as vectors in a geometric space. They are useful for many NLP tasks and are often constructed using nonlinear/nonconvex techniques such as deep nets and energy-based models. Recently, Mikolov et al. (2013) showed that such embeddings exhibit linear structure that can be used to solve "word analogy tasks" such as man: woman :: king: ??. This has triggered much subsequent work on neural models for language.
We give a log-linear *generative* model for text corpora that directly models latent semantic structure in words and involves a random walk in the context space. A rigorous mathematical analysis yields explicit expressions for word co-occurrence probabilities, which leads to a surprising explanation for why low-dimensional word embeddings exist and why they exhibit linear structure that allows solving analogies. It also casts new light on older "Vector Space methods." Our model and its analysis lead to several counterintuitive predictions, which are also verified empirically.
We think our methodology and generative model may be useful for other NLP tasks.
Joint work with Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski (listed in alphabetical order).
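A hedged sketch of the generative model's form, as suggested by the description above (the precise statements, assumptions, and constants are in the paper):

    Pr[ word w is emitted at time t | context c_t ]  proportional to  exp( <c_t, v_w> ),

where v_w is the word's vector and the context vector c_t performs a slow random walk in the context space; the analysis then yields explicit co-occurrence expressions, roughly of the form

    log Pr[w, w'] ~ || v_w + v_{w'} ||^2 / (2d) - 2 log Z,

which is the kind of formula that explains why low-dimensional embeddings exist and why linear relations between them solve analogies.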