- Algorithms: Designing efficient algorithms for problems in combinatorial optimization and data analysis.
- Theoretical Machine Learning: Designing efficient and robust algorithms for problems in machine learning.
- Beyond Worst-Case Analysis: Improved algorithmic guarantees using paradigms like realistic average-case models, stable instances and smoothed analysis.
-
Efficient Tensor Decomposition.
Book Chapter in Beyond the Worst-Case Analysis of Algorithms
Edited by Tim Roughgarden, Cambridge University Press 2020.
[ArXiv| Book Link | Abstract]This chapter studies the problem of decomposing a tensor into a sum of constituent rank-one tensors. While tensor decompositions are very useful in designing learning algorithms and in data analysis, they are NP-hard to compute in the worst case. We will see how to design efficient algorithms with provable guarantees under mild assumptions, using beyond worst-case frameworks like smoothed analysis.
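For a concrete feel for the rank-one decomposition task, here is a minimal numpy sketch of classical Jennrich-style simultaneous diagonalization (a standard method that works under simplifying assumptions such as linearly independent factors; it is not necessarily the chapter's algorithm, and the toy dimensions and variable names below are made up for the example):

    # Sketch: recover the factors of T = sum_i a_i (x) b_i (x) c_i by simultaneous diagonalization.
    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 8, 4                               # dimension and rank (r <= n)
    A, B, C = (rng.standard_normal((n, r)) for _ in range(3))
    T = np.einsum('ir,jr,kr->ijk', A, B, C)   # ground-truth low-rank tensor

    # Contract the third mode with two random vectors.
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    Mx = np.einsum('ijk,k->ij', T, x)         # = A diag(C^T x) B^T
    My = np.einsum('ijk,k->ij', T, y)         # = A diag(C^T y) B^T

    # Eigenvectors of Mx My^+ recover the columns of A (up to scaling and permutation).
    vals, vecs = np.linalg.eig(Mx @ np.linalg.pinv(My))
    top = np.argsort(-np.abs(vals))[:r]
    A_hat = np.real(vecs[:, top])

    # Sanity check: every recovered direction should align with some true column of A.
    cos = np.abs((A / np.linalg.norm(A, axis=0)).T @ (A_hat / np.linalg.norm(A_hat, axis=0)))
    print("best |cosine| per recovered column:", cos.max(axis=0).round(3))

-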
Adversarial robustness via robust low rank representations.
(With Pranjal Awasthi, Himanshu Jain and Ankit Singh Rawat).
[ArXiv| Abstract]Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees.
Our first contribution is for certified robustness to perturbations measured in ℓ2 norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100.
Our second contribution is for the more challenging setting of certified robustness to perturbations measured in ℓ∞ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties that can be leveraged to provide significantly better guarantees for certified robustness to ℓ∞ perturbations in those representations. Our certificate of ℓ∞ robustness relies on a natural quantity involving the ∞→2 matrix operator norm associated with the representation, to translate robustness guarantees from ℓ2 to ℓ∞ perturbations.
A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.
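To make the ℓ2-to-ℓ∞ translation concrete, the sketch below illustrates the underlying inequality $\|W\delta\|_2 \le \|W\|_{\infty\to 2}\,\|\delta\|_\infty$ for a linear encoder $W$. This is my own toy illustration: the encoder, dimensions, and the certified ℓ2 radius r2 are made up, the norm is computed by brute force (which is only feasible in tiny dimension), and the paper's multiplicative-weights upper bound is not reproduced here.

    # Translating a certified l2 radius on the representation z = W x into an l_inf radius on the input.
    import itertools
    import numpy as np

    rng = np.random.default_rng(1)
    d_in, d_rep = 12, 4
    W = rng.standard_normal((d_rep, d_in)) / np.sqrt(d_in)   # assumed (made-up) linear encoder

    # ||W||_{inf->2} is attained at a sign vector, so brute force is exact for tiny d_in.
    signs = np.array(list(itertools.product([-1, 1], repeat=d_in))).T   # shape (d_in, 2^d_in)
    inf_to_2 = np.linalg.norm(W @ signs, axis=0).max()

    # A cheap but valid upper bound: ||W||_{inf->2}^2 <= sum_i ||row_i(W)||_1^2.
    upper = np.sqrt((np.abs(W).sum(axis=1) ** 2).sum())

    r2 = 0.5   # hypothetical certified l2 radius in representation space
    print("l_inf radius via exact norm :", r2 / inf_to_2)
    print("l_inf radius via cheap bound:", r2 / upper)       # smaller, but still a valid certificate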
-
Scheduling Precedence-Constrained Jobs on Related Machines with Communication Delay.
(With Biswaroop Maiti, Rajmohan Rajaraman, David Stalfa and Zoya Svitkina).
FOCS 2020. [ArXiv| Abstract]We consider the problem of scheduling $n$ precedence-constrained jobs on $m$ uniformly-related machines in the presence of an arbitrary, fixed communication delay $\rho$. We consider a model that allows job duplication, i.e. processing of the same job on multiple machines, which, as we show, can reduce the length of a schedule (i.e., its makespan) by a logarithmic factor. Our main result is an $O(\log m \log \rho/\log\log \rho)$-approximation algorithm for minimizing makespan, assuming the minimum makespan is at least $\rho$. Our algorithm is based on rounding a linear programming relaxation for the problem, which includes carefully designed constraints capturing the interaction among communication delay, precedence requirements, varying speeds, and job duplication. Our result builds on two previous lines of work, one with communication delay but identical machines (Lepere, Rapine 2002) and the other with uniformly-related machines but no communication delay (Chudak, Shmoys 1999).
We next show that the integrality gap of our mathematical program is $\Omega(\sqrt{\log \rho})$. Our gap construction employs expander graphs and exploits a property of robust expansion and its generalization to paths of longer length. Finally, we quantify the advantage of duplication in scheduling with communication delay. We show that the best schedule without duplication can have makespan $\Omega(\rho/\log\rho)$ or $\Omega(\log m/\log\log m)$ or $\Omega(\log n/\log\log n)$ times that of an optimal schedule allowing duplication. Nevertheless, we present a polynomial time algorithm to transform any schedule to a schedule without duplication at the cost of a $O(\log^2 n\log m)$ factor increase in makespan. Together with our makespan approximation algorithm for schedules allowing duplication, this also yields a polylogarithmic-approximation algorithm for the setting where duplication is not allowed.
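A toy instance (my own illustration, not taken from the paper) of why duplication helps under communication delay: with unit jobs a → b and a → c on two unit-speed machines and delay ρ ≥ 1, duplicating a on both machines finishes in 2 time units, while any schedule without duplication needs at least 3. The checker below verifies precedence and delay constraints only; the hard-coded schedules are machine-conflict-free by construction.

    # Toy comparison of makespan with and without job duplication under communication delay.
    rho = 5
    preds = {'a': [], 'b': ['a'], 'c': ['a']}

    def makespan(schedule):
        # schedule maps each job to a list of (machine, start) unit-length copies.
        finish = {j: [(m, s + 1) for (m, s) in copies] for j, copies in schedule.items()}
        for j, copies in schedule.items():
            for (m, s) in copies:
                for p in preds[j]:
                    # some copy of p must finish on the same machine by time s,
                    # or on another machine by time s - rho
                    ok = any(f <= s if pm == m else f + rho <= s for (pm, f) in finish[p])
                    assert ok, f"precedence violated for a copy of {j}"
        return max(f for copies in finish.values() for (_, f) in copies)

    with_dup = {'a': [(1, 0), (2, 0)], 'b': [(1, 1)], 'c': [(2, 1)]}   # duplicate 'a' on both machines
    no_dup   = {'a': [(1, 0)], 'b': [(1, 1)], 'c': [(1, 2)]}           # best schedule with a single copy of 'a'
    print("makespan with duplication   :", makespan(with_dup))          # -> 2
    print("makespan without duplication:", makespan(no_dup))            # -> 3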
-
Estimating Principal Components under Adversarial Perturbations.
(With Pranjal Awasthi and Xue Chen).
COLT 2020. [ArXiv| Abstract]Robustness is a key requirement for widespread deployment of machine learning algorithms, and has received much attention in both statistics and computer science. We study a natural model of robustness for high-dimensional statistical estimation problems that we call the adversarial perturbation model. An adversary can perturb every sample arbitrarily up to a specified magnitude δ measured in some $\ell_q$ norm, say $\ell_\infty$. Our model is motivated by emerging paradigms such as low precision machine learning and adversarial training.
We study the classical problem of estimating the top-r principal subspace of the Gaussian covariance matrix in high dimensions, under the adversarial perturbation model. We design a computationally efficient algorithm that given corrupted data, recovers an estimate of the top-$r$ principal subspace with error that depends on a robustness parameter that we identify. This parameter corresponds to the $q \to 2$ operator norm of the projector onto the principal subspace, and generalizes well-studied analytic notions of sparsity. Additionally, in the absence of corruptions, our algorithmic guarantees recover existing bounds for problems such as sparse PCA and its higher rank analogs. We also prove that the above dependence on the robustness parameter is almost optimal asymptotically, not just in a minimax sense, but remarkably for every instance of the problem. This instance-optimal guarantee shows that the $q \to 2$ operator norm of the subspace essentially characterizes the estimation error under adversarial perturbations.
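A small numerical illustration (my own, with made-up dimensions) of why the $q \to 2$ norm of the projector generalizes sparsity: for a rank-one projector $P = vv^T$ onto a unit vector $v$, one has $\|P\|_{\infty\to 2} = \|v\|_1$, which is about $\sqrt{s}$ for an $s$-sparse direction but about $\sqrt{d}$ for a dense one.

    # ||P||_{inf->2} for rank-one projectors: sparse principal directions give a much smaller
    # robustness parameter than dense ones.
    import numpy as np

    d, s = 400, 16
    v_sparse = np.zeros(d); v_sparse[:s] = 1 / np.sqrt(s)   # s-sparse unit vector
    v_dense = np.ones(d) / np.sqrt(d)                        # dense unit vector

    for name, v in [("sparse", v_sparse), ("dense", v_dense)]:
        P = np.outer(v, v)
        # max_{||x||_inf <= 1} ||P x||_2 is attained at x = sign(v) and equals ||v||_1
        print(name, " ||P||_{inf->2} =", np.linalg.norm(P @ np.sign(v)).round(2),
              "  ||v||_1 =", np.abs(v).sum().round(2))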
-
Adversarially Robust Low Dimensional Representations.
(With Pranjal Awasthi, Vaggos Chatziafratis and Xue Chen).
[ArXiv| Abstract]Adversarial or test time robustness measures the susceptibility of a machine learning system to small perturbations made to the input at test time. This has attracted much interest on the empirical side, since many existing ML systems perform poorly under imperceptible adversarial perturbations to the test inputs. On the other hand, our theoretical understanding of this phenomenon is limited, and has mostly focused on supervised learning tasks.
In this work we study the problem of computing adversarially robust representations of data. We formulate a natural extension of Principal Component Analysis (PCA) where the goal is to find a low dimensional subspace to represent the given data with minimum projection error, and that is in addition robust to small perturbations measured in $\ell_q$ norm (say $q=\infty$). Unlike PCA which is solvable in polynomial time, our formulation is computationally intractable to optimize as it captures the well-studied sparse PCA objective.
We show the following algorithmic and statistical results. - Polynomial time algorithms in the worst-case that achieve constant factor approximations to the objective while only violating the robustness constraint by a constant factor. - We prove that our formulation (and algorithms) also enjoy significant statistical benefits in terms of sample complexity over standard PCA on account of a ``regularization effect'', that is formalized using the well-studied spiked covariance model. - Surprisingly, we show that our algorithmic techniques can also be made robust to corruptions in the training data, in addition to yielding representations that are robust at test time! Here an adversary is allowed to corrupt potentially every data point up to a specified amount in the $\ell_q$ norm. We further apply these techniques for mean estimation and clustering under adversarial corruptions to the training data.
-
On Robustness to Adversarial Examples and Polynomial Optimization.
(With Pranjal Awasthi and Abhratanu Dutta).
NeurIPS 2019. [ArXiv | Slides | Abstract]We study the design of computationally efficient algorithms with provable guarantees that are robust to adversarial (test time) perturbations. While there has been an explosion of recent work on this topic due to its connections to test time robustness of deep networks, there is limited theoretical understanding of several basic questions including \emph{when and how can one design provably robust learning algorithms?}, and \emph{what is the price of achieving robustness to adversarial examples in a computationally efficient manner?}
The main contribution of this work is to exhibit a strong connection between achieving robustness to adversarial examples, and a rich class of polynomial optimization problems, thereby making progress on the above questions. In particular, we leverage this connection to (a) design the first computationally efficient robust algorithms with provable guarantees for a large class of hypotheses, namely linear classifiers and degree-$2$ polynomial threshold functions (PTFs), (b) give a precise characterization of the price of achieving robustness in a computationally efficient manner for these classes, (c) design efficient algorithms to certify robustness and generate adversarial attacks in a principled manner for $2$-layer neural networks. We empirically demonstrate the effectiveness of these attacks on real data.
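For linear classifiers, the standard starting point is that the worst-case ℓ∞ perturbation of the margin has a closed form, so the robust loss can be evaluated exactly. The sketch below is my own illustration on made-up data (the paper's contributions concern the harder degree-2 PTF and 2-layer cases):

    # For a linear classifier w, min_{||delta||_inf <= eps}  y * w.(x + delta)  =  y * w.x - eps * ||w||_1.
    import numpy as np

    rng = np.random.default_rng(2)
    n, d, eps = 200, 20, 0.1
    w_true = rng.standard_normal(d)
    X = rng.standard_normal((n, d))
    y = np.sign(X @ w_true)

    w = w_true                                           # pretend this is the learned classifier
    margins = y * (X @ w)                                # standard margins
    robust_margins = margins - eps * np.abs(w).sum()     # worst-case margins under an l_inf attack

    print("standard error:", np.mean(margins <= 0))
    print("robust error  :", np.mean(robust_margins <= 0))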
-
Smoothed Analysis in Unsupervised Learning via Decoupling
(With Aditya Bhaskara, Aidao Chen and Aidan Perreault).
FOCS 2019. [ArXiv| Abstract]Smoothed analysis is a powerful paradigm in overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries, that are given by low-degree polynomials of a few base underlying random variables.
In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to obtain polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning:
1. Robust subspace recovery, when the fraction $\alpha$ of inliers in the $d$-dimensional subspace $T \subset \mathbb{R}^n$ satisfies $\alpha > (d/n)^{\ell}$ for any constant integer $\ell>0$. This contrasts with the known worst-case intractability when $\alpha< d/n$, and the previous smoothed analysis result which needed $\alpha > d/n$ (Hardt and Moitra, 2013).
2. Higher order tensor decompositions, where we generalize the so-called FOOBI algorithm of Cardoso to find order-$\ell$ rank-one tensors in a subspace. This allows us to obtain polynomially robust decomposition algorithms for $2\ell$'th order tensors with rank $O(n^{\ell})$.
3. Learning overcomplete hidden Markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in a smoothed analysis model.
-
Block Stability for MAP inference
(With Hunter Lang and David Sontag).
AISTATS 2019. [ArXiv| Abstract]To understand the empirical success of approximate MAP inference, recent work (Lang et al., 2018) has shown that some popular approximation algorithms perform very well when the input instance is stable. The simplest stability condition assumes that the MAP solution does not change at all when some of the pairwise potentials are (adversarially) perturbed. Unfortunately, this strong condition does not seem to be satisfied in practice. In this paper, we introduce a significantly more relaxed condition that only requires blocks (portions) of an input instance to be stable. Under this block stability condition, we prove that the pairwise LP relaxation is persistent on the stable blocks. We complement our theoretical results with an empirical evaluation of real-world MAP inference instances from computer vision. We design an algorithm to find stable blocks, and find that these real instances have large stable regions. Our work gives a theoretical explanation for the widespread empirical phenomenon of persistency for this LP relaxation.
-
Towards Learning Sparsely Used Dictionaries with Arbitrary Supports
(With Pranjal Awasthi).
FOCS 2018. [ArXiv| Abstract]Dictionary learning is a popular approach for inferring a hidden basis or dictionary in which data has a sparse representation. Data generated from the dictionary A (an n by m matrix, with m > n in the over-complete setting) is given by $Y = AX$ where X is a matrix whose columns have supports chosen from a distribution over k-sparse vectors, and the non-zero values are chosen from a symmetric distribution. Given Y, the goal is to recover A and X in polynomial time. Existing algorithms give polytime guarantees for recovering incoherent dictionaries, under strong distributional assumptions both on the supports of the columns of X, and on the values of the non-zero entries. In this work, we study the following question: Can we design efficient algorithms for recovering dictionaries when the supports of the columns of X are arbitrary? To address this question while circumventing the issue of non-identifiability, we study a natural semirandom model for dictionary learning where there are a large number of samples y=Ax with arbitrary k-sparse supports for x, along with a few samples where the sparse supports are chosen uniformly at random. While the few samples with random supports ensure identifiability, the support distribution can look almost arbitrary in aggregate. Hence existing algorithmic techniques seem to break down as they make strong assumptions on the supports. Our main contribution is a new polynomial time algorithm for learning incoherent over-complete dictionaries that works under the semirandom model. Additionally, the same algorithm provides polynomial time guarantees in new parameter regimes when the supports are fully random. Finally, using these techniques, we also identify a minimal set of conditions on the supports under which the dictionary can be (information theoretically) recovered from polynomial samples for almost linear sparsity, i.e., $k=\widetilde{O}(n)$.
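The following sketch generates data from the model described above, with toy sizes and my own (arbitrary) choice of the skewed support distribution; it is only meant to make the semirandom setup concrete.

    # Samples y = Ax with k-sparse x: most supports are "arbitrary" (here, skewed to the
    # first 2k coordinates), a small beta fraction are uniformly random; values are symmetric.
    import numpy as np

    rng = np.random.default_rng(3)
    n, m, k, N, beta = 64, 128, 6, 5000, 0.05
    A = rng.standard_normal((n, m))
    A /= np.linalg.norm(A, axis=0)                  # random unit columns (an incoherent-ish dictionary)

    X = np.zeros((m, N))
    for t in range(N):
        if rng.random() < beta:
            supp = rng.choice(m, size=k, replace=False)         # the few random-support samples
        else:
            supp = rng.choice(2 * k, size=k, replace=False)     # "arbitrary": skewed to 2k coordinates
        X[supp, t] = rng.choice([-1, 1], size=k) * rng.uniform(0.5, 1.5, size=k)   # symmetric values

    Y = A @ X                                       # observed samples; the goal is to recover A and X from Y
    print("Y shape:", Y.shape, " average sparsity:", (X != 0).sum(0).mean())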
-
Clustering Semi-Random Mixtures of Gaussians
(With Pranjal Awasthi).
ICML 2018. [ArXiv| Abstract]Gaussian mixture models (GMM) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural semi-random model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. In our model, a semi-random adversary is allowed to make arbitrary "monotone" or helpful changes to the data generated from the Gaussian mixture model. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching information-theoretic lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
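Since the algorithm analyzed is plain Lloyd's iteration, here is a minimal reference implementation on a toy spherical mixture; the initialization scheme and separation conditions assumed in the paper are not reproduced here.

    # Plain Lloyd's iteration for k-means.
    import numpy as np

    def lloyd(X, k, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]      # simple random initialization
        for _ in range(iters):
            # assign each point to its nearest center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(axis=1)
            # recompute each center as the mean of its cluster (keep the old center if empty)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return labels, centers

    # toy mixture of 3 spherical Gaussians
    rng = np.random.default_rng(4)
    means = np.array([[0, 0], [8, 0], [0, 8]], dtype=float)
    X = np.vstack([m + rng.standard_normal((100, 2)) for m in means])
    labels, centers = lloyd(X, k=3)
    print(np.round(centers, 1))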
-
Optimality of Approximate Inference Algorithms on Stable Instances
(With Hunter Lang and David Sontag).
AISTATS 2018. [ArXiv| Abstract]Approximate algorithms for structured prediction problems---such as the popular alpha-expansion algorithm (Boykov et al. 2001) in computer vision---typically far exceed their theoretical performance guarantees on real-world instances. These algorithms often find solutions that are very close to optimal. The goal of this paper is to partially explain the performance of alpha-expansion on MAP inference in Ferromagnetic Potts models (FPMs). Our main results use the connection between energy minimization in FPMs and the Uniform Metric Labeling problem to give a stability condition under which the alpha-expansion algorithm provably recovers the optimal MAP solution. This theoretical result complements the numerous empirical observations of alpha-expansion's performance. Additionally, we give a different stability condition under which an LP-based algorithm recovers the optimal solution.
-
On Learning Mixtures of Well-Separated Gaussians
(With Oded Regev).
FOCS 2017. [ArXiv|PDF | Abstract]We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians with means $\mu_1,\ldots,\mu_k \in \mathbb{R}^d$, and the goal is to estimate the means up to accuracy $\delta$ using $\mathrm{poly}(k,d, 1/\delta)$ samples. In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k,d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds, by showing the following results.
1. We show that with separation $o(\sqrt{\log k})$, super-polynomially many samples are required. In fact, this holds even when the $k$ means of the Gaussians are picked at random in $d=O(\log k)$ dimensions.
2. We show that with separation $\Omega(\sqrt{\log k})$, $\mathrm{poly}(k,d,1/\delta)$ samples suffice. Notice that the bound on the separation is independent of $\delta$. This result is based on a new and efficient ``accuracy boosting'' algorithm that takes as input coarse estimates of the true means and in time (and samples) $\mathrm{poly}(k,d, 1/\delta)$ outputs estimates of the means up to arbitrarily good accuracy $\delta$ assuming the separation between the means is $\Omega(\min\{\sqrt{\log k},\sqrt{d}\})$ (independently of $\delta$). The idea of the algorithm is to iteratively solve a ``diagonally dominant'' system of non-linear equations.
We also (1) present a computationally efficient algorithm in $d=O(1)$ dimensions with only $\Omega(\sqrt{d})$ separation, and (2) extend our results to the case that components might have different weights and variances. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.
-
Clustering Stable Instances of Euclidean k-means
(With Abhratanu Dutta and Alex Wang).
NIPS 2017.[ArXiv| Abstract]The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the k-means optimal solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we show an efficient algorithm with provable guarantees that is also robust to outliers. We complement these results by studying the amount of stability in real datasets and demonstrating that our algorithm performs well on these benchmark datasets.
-
Approximation Algorithms for Label Cover and the Log-Density Threshold
(With Eden Chlamtac, Pasin Manurangsi and Dana Moshkovitz).
SODA 2017. [ArXiv|SODA | Abstract]Many known optimal NP-hardness of approximation results are reductions from a problem called \textsc{Label-Cover}. The input is a bipartite graph $G=(L,R,E)$ and each edge $e=(x,y)\in E$ carries a projection $\pi_e$ that maps labels of $x$ to labels of $y$. The objective is to find a labeling of the vertices that satisfies as many of the projections as possible. It is believed that the best approximation ratio efficiently achievable for Label-Cover is of the form $N^{-c}$ where $N=nk$, $n$ is the number of vertices, $k$ is the number of labels, and $c \in (0,1)$ is some constant.
Inspired by a framework originally developed for Densest $k$-Subgraph, we propose a ``log density threshold'' for the approximability of Label-Cover. Specifically, we suggest the possibility that the Label-Cover approximation problem undergoes a computational phase transition at the same threshold at which local algorithms for its random counterpart fail. This threshold is $N^{2\sqrt{2}-3}\approx N^{-0.17}$. We then design, for any $\varepsilon>0$, a polynomial-time approximation algorithm for {\em semi-random} \textsc{Label-Cover} whose approximation ratio is $N^{2\sqrt{2}-3+\varepsilon}$. In our semi-random model, the input graph is random (or even just expanding), and the projections on the edges are arbitrary.
For worst-case Label-Cover we show a polynomial-time algorithm whose approximation ratio is roughly $N^{-0.233}$. The previous best efficient approximation ratio was $N^{-0.25}$. We present some evidence towards an $N^{-c}$ threshold by constructing integrality gaps for $N^{\Omega(1)}$ rounds of the Sum-of-squares/Lasserre hierarchy of the natural relaxation of Label Cover. For general 2CSP the ``log density threshold'' is $N^{-0.25}$, and we give a polynomial-time algorithm in the semi-random model whose approximation ratio is $N^{-0.25+\varepsilon}$ for any $\varepsilon>0$.
-
Learning Communities in the Presence of Errors
(With Konstantin Makarychev and Yury Makarychev).
COLT 2016. [ ArXiv | Abstract]We study the problem of learning communities in the presence of modeling errors and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, which is also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors. In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors, Feige-Kilian or monotone errors, and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether almost exact recovery is possible when the adversary is allowed to add $o(n)$ edges. Our work answers this question affirmatively even in the case of k>2 communities.
We then show that our algorithms work not only when the instances come from the SBM, but also when the instances come from any distribution of graphs that is $\epsilon m$-close to the SBM in the Kullback-Leibler divergence. This result also works in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.
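For concreteness, the snippet below is my own toy instantiation of the error model described above: a two-community SBM adjacency matrix to which a small number of extra edges is then added (the model allows these to be chosen arbitrarily by an adversary; here they are placed at random just to produce a runnable example).

    # Two-community SBM plus o(n) adversarially added edges (toy instantiation).
    import numpy as np

    rng = np.random.default_rng(5)
    n, a, b = 400, 0.10, 0.02                            # n nodes, intra/inter edge probabilities
    sigma = np.array([0] * (n // 2) + [1] * (n // 2))    # planted communities
    probs = np.where(sigma[:, None] == sigma[None, :], a, b)
    A = (rng.random((n, n)) < probs).astype(int)
    A = np.triu(A, 1); A = A + A.T                       # symmetric, no self-loops

    num_extra = 50                                       # o(n) extra edges; an adversary may pick any pairs
    for _ in range(num_extra):
        i, j = rng.choice(n, size=2, replace=False)
        A[i, j] = A[j, i] = 1

    print("edges:", A.sum() // 2)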
-
Beating the Random Assignment on Constraint Satisfaction Problems of Bounded Degree
(With Boaz Barak, Ankur Moitra, Ryan O'Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, David Witmer and John Wright).
APPROX 2015. [ ArXiv | Abstract]We show that for any odd $k$ and any instance of the Max-$k$XOR constraint satisfaction problem, there is an efficient algorithm that finds an assignment satisfying at least a $1/2 + c/\sqrt{D}$ fraction of constraints, where $D$ is a bound on the number of constraints that each variable occurs in. This improves both qualitatively and quantitatively on the recent work of Farhi, Goldstone, and Gutmann (2014), which gave a quantum algorithm to find an assignment satisfying a $1/2 + D^{-3/4}$ fraction of the equations. For arbitrary constraint satisfaction problems, we give a similar result for "triangle-free" instances; i.e., an efficient algorithm that finds an assignment satisfying at least a $\mu + c/\sqrt{D}$ fraction of constraints, where $\mu$ is the fraction that would be satisfied by a uniformly random assignment.
-
Correlation Clustering with Noisy, Partial Information
(With Konstantin Makarychev and Yury Makarychev).
COLT 2015. [ ArXiv | Abstract]In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs $G$. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value $(1+\delta)\cdot\text{opt-cost}+O_{\delta}(n\log^3 n)$ with high probability, where opt-cost is the value of the optimal solution (for every $\delta>0$). The second algorithm finds the ground truth clustering with an arbitrarily small classification error (under some additional assumptions on the instance).
-
Learning Mixtures of Ranking Models
(With Pranjal Awasthi, Avrim Blum and Or Sheffet).
NIPS 2014. [ ArXiv | Abstract]This work concerns learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a Mallows Mixture Model. Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima.
We present the first polynomial time algorithm which provably learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-$k$ prefix of both the rankings. In particular, our results imply that in order to recover the top-$k$ rankings from a heterogeneous population, it is enough for each user to just specify their top-3 (unordered) preferences.
-
Constant Factor Approximations for Balanced Cut in the random PIE model
(With Konstantin Makarychev and Yury Makarychev).
STOC 2014. [ ArXiv | Abstract]We propose and study a new average-case model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than previously studied average-case models like semi-random models, random planted partitioning models and stochastic block models, and can capture real-world network models like preferential attachment models. We give constant factor approximations for Balanced Cut in this model.
The model: Consider a set of vertices V partitioned into two clusters L and R of equal size. Let G be an arbitrary graph on V with no edges between L and R. Let E_random be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under renaming of vertices in L and in R). Then we say that G + E_random is a graph with permutation-invariant random edges. For example, in a social network formed by the superposition of separate networks representing different types of ties (say geographic and professional), these different networks are independent of each other.
We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost $O(|E_{random}|) + n \cdot \mathrm{polylog}(n)$ in this model. In the most interesting regime, this is a constant factor approximation with respect to the cost of the planted cut.
-
Smoothed Analysis of Tensor Decompositions
(With Aditya Bhaskara, Moses Charikar and Ankur Moitra)
STOC 2014. [ PDF | ArXiv | Abstract ]Low rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges and tensor analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error -- a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, these are not known to be stable to noise.
Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e. the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this is an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.
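The following small experiment (my own, with toy sizes) illustrates the key statement numerically: Khatri-Rao columns $a_i \otimes b_i$ built from slightly perturbed vectors have least singular value strictly bounded away from zero, even when the unperturbed vectors are highly degenerate (here, only two distinct base directions).

    # Least singular value of a Khatri-Rao matrix, before and after a small smoothing perturbation.
    import numpy as np

    rng = np.random.default_rng(6)
    n, r, sigma = 10, 15, 0.1                        # note r > n: the "overcomplete" regime
    base = rng.standard_normal((n, 2))
    A0 = base[:, rng.integers(0, 2, size=r)]         # adversarial: many repeated columns
    B0 = base[:, rng.integers(0, 2, size=r)]

    def khatri_rao(A, B):
        # column-wise Kronecker products, shape (n*n, r)
        return np.einsum('ir,jr->ijr', A, B).reshape(A.shape[0] * B.shape[0], -1)

    A = A0 + sigma * rng.standard_normal(A0.shape)   # smoothed (perturbed) factors
    B = B0 + sigma * rng.standard_normal(B0.shape)

    print("sigma_min, unperturbed:", np.linalg.svd(khatri_rao(A0, B0), compute_uv=False).min())
    print("sigma_min, perturbed  :", np.linalg.svd(khatri_rao(A, B), compute_uv=False).min())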
-
Uniqueness of Tensor Decompositions and Polynomial Identifiability of Latent Variable Models
(With Aditya Bhaskara and Moses Charikar)
COLT 2014. [PDF | ArXiv | Abstract ]We give a robust version of the celebrated result of Kruskal on the uniqueness of tensor decompositions: we prove that given a tensor whose decomposition satisfies a robust form of Kruskal's rank condition, it is possible to approximately recover the decomposition if the tensor is known up to a sufficiently small (inverse polynomial) error.
Kruskal's theorem has found many applications in proving the identifiability of parameters for various latent variable models and mixture models such as Hidden Markov models, topic models, etc. Our robust version immediately implies identifiability using only polynomially many samples in many of these settings. This polynomial identifiability is an essential first step towards efficient learning algorithms for these models.
Recently, algorithms based on tensor decompositions have been used to estimate the parameters of various hidden variable models efficiently in special cases as long as they satisfy certain "non-degeneracy" properties. Our methods give a way to go beyond this non-degeneracy barrier, and establish polynomial identifiability of the parameters under much milder conditions. Given the importance of Kruskal's theorem in the tensor literature, we expect that this robust version will have several applications beyond the settings we explore in this work.
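For reference, the exact (non-robust) form of Kruskal's condition, stated in standard notation (the paper's contribution is a quantitative, robust analogue of it): a decomposition $T=\sum_{i=1}^{R} a_i\otimes b_i\otimes c_i$ is unique up to permutation and scaling whenever $k_A+k_B+k_C \geq 2R+2$, where $k_A$ is the Kruskal rank of the factor matrix $A=[a_1,\ldots,a_R]$, i.e., the largest $k$ such that every $k$ columns of $A$ are linearly independent (and similarly for $k_B$ and $k_C$).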
- Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut
(With Konstantin Makarychev and Yury Makarychev)
SODA 2014. [PDF | ArXiv | Abstract]We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomial-time algorithm for $\gamma$-stable Max Cut instances with $\gamma \geq c\sqrt{\log n}\log\log n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $\gamma$-stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $\gamma$-stable. We prove that there is no robust polynomial-time algorithm for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$, where $\alpha_{SC}$ is the best approximation factor for Sparsest Cut with non-uniform demands.
Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $\gamma \geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every n point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $\gamma < D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $\gamma$-stable instances of Max Cut when $\gamma < \alpha_{SC}(n/2)$. That suggests that solving $\gamma$-stable instances with $\gamma =o(\sqrt{\log n})$ might be difficult or impossible.
Our results significantly improve previously known results. The best previously known algorithm for $\gamma$-stable instances of Max Cut required that $\gamma \geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability.
- Sorting noisy data with partial information
(With Konstantin Makarychev and Yury Makarychev)
ITCS 2013. [PDF | Abstract]We study the Feedback Arc Set (FAS) problem: find the minimum set of edges (arcs) whose removal makes a given graph a directed acyclic graph (DAG). The interest in FAS stems from its numerous applications in sorting and ranking based on pairwise information, removing cyclic dependencies, etc. However, the best known algorithm for this problem only gives a polylogarithmic approximation in the worst case, and constant factor approximations have been elusive. Since real-world instances are unlikely to behave like worst-case instances, a compelling question is: can we design better algorithms for realistic average-case instances? We propose two realistic average-case models for the problem, and present new approximation algorithms which obtain a PTAS and constant factor approximations in these two models.
- Approximation algorithms for Semi-random Graph Partitioning problems
(With Konstantin Makarychev and Yury Makarychev)
STOC 2012. [PDF | ArXiv | Abstract]We propose and study a new semi-random model for graph partitioning problems. We believe that it captures many properties of real-world instances. The model is more flexible than the semi-random model of Feige and Kilian and the planted random model of Bui, Chaudhuri, Leighton and Sipser.
In our semi-random graph partitioning model, only the edges across different clusters need to be (semi-)random; the edges inside clusters can be completely arbitrary (unlike in previous models).
We develop a general framework for solving semi-random instances using new SDP-based techniques and apply it to several problems of interest. We present constant factor approximation algorithms for semi-random instances of the Balanced Cut, Multicut, Min Uncut and Small-Set Expansion problems. We also show how to almost recover the optimal solution if the instance satisfies an additional expanding condition. Our algorithms work in a wider range of parameters than algorithms for previously studied random and semi-random models.
- Polynomial Integrality gaps for Strong relaxations of Densest k-subgraph
(With Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami and Yuan Zhou)
SODA 2012. [PDF | ArXiv | Abstract]The densest k-subgraph (DkS) problem (i.e., find a size-$k$ subgraph with the maximum number of edges) is one of the notorious problems in approximation algorithms. There is a significant gap between known upper and lower bounds for DkS: the current best algorithm gives an $O(n^{1/4})$ approximation, while even showing a small constant factor hardness requires significantly stronger assumptions than P != NP. In addition to interest in designing better algorithms, a number of recent results have exploited the conjectured hardness of densest k-subgraph and its variants. Thus, understanding the approximability of DkS is an important challenge. In this work, we give evidence for the hardness of approximating DkS within polynomial factors. Specifically, we expose the limitations of strong semidefinite programs from SDP hierarchies in solving densest k-subgraph. Our results include:
* A lower bound of $\Omega(n^{1/4}/\log^3 n)$ on the integrality gap for $\Omega(\log n/\log \log n)$ rounds of the Sherali-Adams relaxation for DkS. This also holds for the relaxation obtained from Sherali-Adams with an added SDP constraint. Our gap instances are in fact Erdos-Renyi random graphs.
* For every $\epsilon > 0$, a lower bound of $n^{2/53-\epsilon}$ on the integrality gap of $n^{\Omega(\epsilon)}$ rounds of the Lasserre SDP relaxation for DkS, and an $n^{\Omega_\epsilon(1)}$ gap for $n^{1-\epsilon}$ rounds. Our construction proceeds via a reduction from random instances of a certain Max-CSP over large domains. In the absence of inapproximability results for DkS, our results show that even the most powerful SDPs are unable to beat a factor of $n^{\Omega(1)}$, and in fact even improving the best known $n^{1/4}$ factor is a barrier for current techniques.
-
Algorithms and Hardness of the k-route cuts problem
(With Julia Chuzhoy, Yury Makarychev and Yuan Zhou)
SODA 2012. [PDF | ArXiv | Abstract]We study the k-route cut problem: given an undirected edge-weighted graph G=(V,E), a collection {(s_1,t_1),(s_2,t_2),...,(s_r,t_r)} of source-sink pairs, and an integer connectivity requirement k, the goal is to find a minimum-weight subset E' of edges to remove, such that the connectivity of every pair (s_i, t_i) falls below k. Specifically, in the edge-connectivity version, EC-kRC, the requirement is that there are at most (k-1) edge-disjoint paths connecting s_i to t_i in $G \setminus E'$, while in the vertex-connectivity version, NC-kRC, the same requirement is for vertex-disjoint paths. Prior to our work, poly-logarithmic approximation algorithms were known for the special case where $k \le 3$, but no non-trivial approximation algorithms were known for any value $k>3$, except in the single-source setting. We show an $O(k \log^{3/2} r)$-approximation algorithm for EC-kRC with uniform edge weights, and several polylogarithmic bi-criteria approximation algorithms for EC-kRC and NC-kRC, where the connectivity requirement k is violated by a constant factor. We complement these upper bounds by proving that NC-kRC is hard to approximate to within a factor of $k^{\epsilon}$ for some fixed $\epsilon>0$.
We then turn to study a simpler version of NC-kRC, where only one source-sink pair is present. We give a simple bi-criteria approximation algorithm for this case, and show evidence that even this restricted version of the problem may be hard to approximate. For example, we prove that the single source-sink pair version of NC-kRC has no constant-factor approximation, assuming Feige's Random k-AND assumption.
- On Quadratic Programming with a ratio objective
(With Aditya Bhaskara, Moses Charikar and Rajsekar Manokaran)
ICALP 2012. [PDF | ArXiv ]
- Approximating the matrix p-norm
(With Aditya Bhaskara)
SODA 2011. [PDF | Abstract]We consider the problem of computing the $q \to p$ norm of a matrix $A$, which is defined for $p, q \ge 1$ as $\|A\|_{q\to p} = \max_{\|x\|_q = 1} \|Ax\|_p$.
This is in general a non-convex optimization problem, and is a natural generalization of the well-studied question of computing singular values (when $p=q=2$). Different settings of parameters give rise to a variety of known interesting problems (such as the Grothendieck problem when $p=1$ and $q=\infty$). Our first result is an efficient algorithm for computing the $q \to p$ norm of matrices with non-negative entries, when $q \ge p \ge 1$. The algorithm we analyze can be seen as an analog of power iteration for computing eigenvalues.
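For illustration, here is a minimal sketch of that power-iteration analog for entrywise nonnegative matrices: my own implementation of the natural fixed-point update (the function name, toy matrix, and iteration count are made up, and the paper's convergence analysis and exact setup are not reproduced).

    # Fixed-point iteration for ||A||_{q->p} = max_{||x||_q = 1} ||Ax||_p on nonnegative matrices.
    import numpy as np

    def q_to_p_norm(A, q, p, iters=200):
        """Approximate the q->p norm of a nonnegative matrix A (assumes p, q > 1)."""
        x = np.ones(A.shape[1])
        x /= np.linalg.norm(x, q)
        for _ in range(iters):
            y = A @ x
            z = A.T @ (y ** (p - 1))      # gradient direction of ||Ax||_p^p
            x = z ** (1.0 / (q - 1))      # invert the l_q duality map
            x /= np.linalg.norm(x, q)
        return np.linalg.norm(A @ x, p)

    rng = np.random.default_rng(7)
    A = rng.random((30, 40))              # nonnegative entries
    print("q->p norm estimate:", q_to_p_norm(A, q=3.0, p=2.0))
    # For p = q = 2 this reduces to ordinary power iteration / the top singular value:
    print(q_to_p_norm(A, 2.0, 2.0), "vs", np.linalg.svd(A, compute_uv=False).max())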
We then present an application of our techniques in the oblivious routing setting: we make constructive a recent existential result of Englert and Racke on O(log n) competitive oblivious routing schemes in the l_p norm.
On the other hand, for general matrices, we show "almost polynomial" factor hardness for $2 < p \le q$ and $p \le q < 2$ if NP does not have quasi-polynomial time algorithms.
-
Detecting High Log-Densities -- an $O(n^{1/4})$ Approximation for Densest k-Subgraph
(With Aditya Bhaskara, Moses Charikar, Eden Chlamtac and Uri Feige)
STOC 2010. [PDF | Abstract]In the Densest k-Subgraph problem, given a graph G and a parameter k, one needs to find a subgraph of G induced on k vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NP-hard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the current best known algorithm of Feige, Kortsarz and Peleg gives an approximation ratio of $n^{1/3 - \epsilon}$ for some specific $\epsilon > 0$ (estimated by those authors at around $\epsilon = 1/60$).
We present an algorithm that for every $\epsilon> 0$ approximates the Densest $k$-Subgraph problem within a ratio of $n^{1/4 + \epsilon}$ in time $n^{O(1/\epsilon)}$. If allowed to run for time $n^{O(\log n)}$, our algorithm achieves an approximation ratio of $O(n^{1/4})$. Our algorithm is inspired by studying an average-case version of the problem where the goal is to distinguish random graphs from random graphs with planted dense subgraphs -- the approximation ratio we achieve for the general case matches the ``distinguishing ratio'' we obtain for this planted problem. Achieving a distinguishing ratio of $o(n^{1/4})$ for the planted problem (in polynomial time) is beyond the reach of our current techniques.
At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in $G$, and using these counts to identify the vertices of the dense subgraph. Our algorithm is based on the following principle. We say that a graph $G(V,E)$ has log-density $\alpha$ if its average degree is $\Theta(|V|^{\alpha})$. The algorithmic core of our result is a family of algorithms that output $k$-subgraphs of nontrivial density whenever the log-density of the densest $k$-subgraph is larger than the log-density of the host graph.
Finally, we extend this algorithm to obtain an $O(n^{1/4 - \epsilon})$-approximation algorithm which runs in time $O(2^{n^{O(\epsilon)}})$ and also explore various approaches to obtain better approximation algorithms in restricted parameter settings for random instances.
-
Graph Isomorphism: Approximate and Robust
(With Yu Wu, Yuichi Yoshida and Yuan Zhou).
- Non-cooperative Bundling games
(With Ravishankar Krishnaswamy, Pandu Rangan Chandrasekaran and Ravi Sundaram)
[Manuscript| Abstract]Bundling is a basic problem in commerce. We define and study the Bundling Game: each of n players with a unique product to sell can choose to sell their products in bundles with other players' products so as to maximize payoff; the players have knowledge of the valuations of m (potential) buyers for each of the products. We define two natural classes of payoff functions - Fair Profit Sharing Functions (FPSFs) and Incentive Functions (IFs). For FPSFs we prove the surprising impossibility result that there exists no Fair Profit Sharing Payoff Function that can always guarantee stability. We counterbalance this result by showing that pure Nash equilibria always exist for IFs and obtain tight bounds on the Price of Anarchy. In particular, the revenue maximizing bundling is a pure Nash equilibrium. Motivated by the bundling game, we present hardness results and approximation algorithms for finding revenue maximizing bundlings and bundles and associated variants.
Some older work includes