Universal Algorithms for Clustering Problems

ARUN GANESH, UC Berkeley, USA
BRUCE M. MAGGS, Duke University and Emerald Innovations, USA
DEBMALYA PANIGRAHI, Duke University, USA

This paper presents universal algorithms for clustering problems, including the widely studied k-median, k-means, and k-center objectives. The input is a metric space containing all potential client locations. The algorithm must select k cluster centers such that they are a good solution for any subset of clients that actually realize. Specifically, we aim for low regret, defined as the maximum over all subsets of the difference between the cost of the algorithm's solution and that of an optimal solution. A universal algorithm's solution sol for a clustering problem is said to be an (α, β)-approximation if for all subsets of clients C', it satisfies sol(C') ≤ α · opt(C') + β · mr, where opt(C') is the cost of the optimal solution for clients C' and mr is the minimum regret achievable by any solution. Our main results are universal algorithms for the standard clustering objectives of k-median, k-means, and k-center that achieve (O(1), O(1))-approximations. These results are obtained via a novel framework for universal algorithms using linear programming (LP) relaxations. These results generalize to other ℓ_p-objectives and to the setting where some subset of the clients is fixed. We also give hardness results showing that (α, β)-approximation is NP-hard if α or β is at most a certain constant, even for the widely studied special case of Euclidean metric spaces. This shows that, in some sense, (O(1), O(1))-approximation is the strongest type of guarantee obtainable for universal clustering.

CCS Concepts: · Theory of computation → Facility location and clustering.

Additional Key Words and Phrases: universal algorithms, clustering

1 INTRODUCTION

In universal approximation (e.g., [9, 10, 12, 22, 26, 31, 36, 49, 50]), the algorithm is presented with a set of potential input points and must produce a solution. After seeing the solution, an adversary selects some subset of the points as the actual realization of the input, and the cost of the solution is based on this realization. The goal of a universal algorithm is to obtain a solution that is near-optimal for every possible input realization. For example, suppose that a network-based-service provider can afford to deploy servers at k locations around the world and hopes to minimize latency between clients and servers. The service provider does not know in advance which clients will request service, but knows where potential clients are located. A universal solution provides guarantees on the quality of the solution regardless of which clients ultimately request service. As another example, suppose that a program committee chair wishes to invite k people to serve on the committee. The chair knows the areas of expertise of each person who is qualified to serve. Based on past iterations of the conference, the chair also knows about many possible topics that might be addressed by submissions. The chair could use a universal algorithm to select a committee that will cover the topics well, regardless of the topics of the papers that are actually submitted.

(Footnote 1: In the context of clustering, "universal facility location" sometimes refers to facility location where facility costs scale with the number of clients assigned to them. That problem is unrelated to the notion of universal algorithms studied in this paper.)
Authors' addresses: Arun Ganesh, arunganesh@berkeley.edu, UC Berkeley, Soda Hall, Berkeley, California, USA, 94709; Bruce M. Maggs, bmm@cs.duke.edu, Duke University and Emerald Innovations, 308 Research Drive, Durham, North Carolina, USA, 27710; Debmalya Panigrahi, debmalya@cs.duke.edu, Duke University, 308 Research Drive, Durham, North Carolina, USA, 27710.

© 2022 Copyright held by the owner/author(s). 1549-6325/2022/12-ART https://doi.org/10.1145/3572840

The same situation also arises in targeting advertising campaigns to client demographics. Suppose a campaign can pay for k advertisements, each targeted to a specific client type. While the entire set of client types that are potentially interested in a new product is known, the exact subset of clients that will watch the ads, or eventually purchase the product, is unknown to the advertiser. How does the advertiser target her k advertisements to address the interests of any realized subset of clients?

Motivated by these sorts of applications, this paper presents the first universal algorithms for clustering problems, including the classic k-median, k-means, and k-center problems. The input to these algorithms is a metric space containing the locations of all clients and cluster centers. The algorithm must select k cluster centers such that this is a good solution for any subset of clients that actually realize.

It is tempting to imagine that, for some large enough value of α, one can find a solution sol such that for all realizations (i.e., subsets of clients C'), sol(C') ≤ α · opt(C'), where sol(C') denotes sol's cost in realization C' and opt(C') denotes the optimal cost in realization C'. But this turns out to be impossible for many problems, including the clustering problems we study, and indeed this difficulty may have limited the study of universal algorithms. For example, suppose that the input for the k-median problem is a uniform metric on k + 1 points, each with a cluster center and a client. In this case, for any solution sol with k cluster centers, there is some realization C' consisting of a single client that is not co-located with any of the k cluster centers in sol. Then, sol(C') > 0 but opt(C') = 0.

Since it is not possible to provide a strict multiplicative approximation guarantee for every realization, we could instead consider an additive approximation. That is, we could instead seek to minimize the regret, defined as the maximum difference between the cost of the algorithm's solution and the optimal cost across all realizations. Informally, the regret is the additional cost incurred due to not knowing the realization ahead of time. The solution that minimizes regret is called the minimum regret solution, or mrs for short, and its regret is termed the minimum regret, or mr. More formally, mr = min_sol max_{C'} [sol(C') − opt(C')]. We now seek a solution sol that achieves, for all input realizations C', sol(C') − opt(C') ≤ mr, i.e., sol(C') ≤ opt(C') + mr.
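To make the regret and mr definitions concrete, the following brute-force sketch (illustrative only, not one of the paper's algorithms; all names are ours) computes them exactly on a tiny k-median instance by enumerating all center sets and realizations. The instance is the uniform metric on k + 1 points described above.

from itertools import chain, combinations

def kmedian_cost(centers, clients, dist):
    # sum over realized clients of the distance to the nearest chosen center
    return sum(min(dist[i][j] for i in centers) for j in clients)

def regret(centers, points, dist, k):
    # max over nonempty realizations C' of cost(centers, C') - opt(C')
    worst = 0.0
    realizations = chain.from_iterable(combinations(points, r) for r in range(1, len(points) + 1))
    for realized in realizations:
        opt = min(kmedian_cost(s, realized, dist) for s in combinations(points, k))
        worst = max(worst, kmedian_cost(centers, realized, dist) - opt)
    return worst

# Uniform metric on k + 1 = 4 points (distance 1 between distinct points), k = 3.
n, k = 4, 3
dist = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
points = list(range(n))
mr = min(regret(list(s), points, dist, k) for s in combinations(points, k))
print(mr)  # 1: every choice of 3 centers misses one point, which may be the only realized client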
But obtaining such a solution turns out to be NP-hard for many problems, and furthermore, obtaining a solution that achieves approximately minimum regret (that is, regret at most c · mr for some constant c) is also NP-hard in general (see Section 8 for more details). In turn, one has to settle for approximation on both opt and mr. That is, we settle for seeking sol such that sol(C') ≤ α · opt(C') + β · mr for all C'. An algorithm generating such a solution is then called an (α, β)-approximate universal algorithm for the problem. Note that in the aforementioned example with k + 1 points, any solution must pay mr (the distance between any two points) in some realization where opt(C') = 0 and only one client appears (in which case paying mr might sound avoidable or undesirable). This example demonstrates that stricter notions of regret and approximation than (α, β)-approximation are infeasible in general, suggesting that (α, β)-approximation is the least relaxed guarantee possible for universal clustering.

1.1 Problem Definitions and Results

We are now ready to formally define our problems and state our results. In all the clustering problems that we consider in this paper, the input is a metric space on all the potential client locations C and cluster centers F. Let c_ij denote the metric distance between points i and j. The solution produced by the algorithm comprises k cluster centers in F; let us denote this set by sol. Now, suppose a subset of clients C' ⊆ C realizes in the actual input. Then, the cost of each client j ∈ C' is given as the distance from the client to its closest cluster center, i.e., cost(j, sol) = min_{i∈sol} c_ij. The clustering problems differ in how these costs are combined into the overall minimization objective; the respective objectives are given below.

(Footnote 2: We only consider finite C in this paper. If C is infinite (e.g., C = R^d), then the minimum regret will usually also be infinite. If one restricts to realizations where, say, at most m clients appear, it suffices to consider realizations that place m clients at one of finitely many points, letting us reduce to universal k-center with finite C.)

(Footnote 3: The special case where F = C has also been studied in the clustering literature, e.g., in [17, 32], although the more common setting (as in our work) is to not make this assumption. Of course, all results (including ours) without this assumption also apply to the special case. If F = C, the constants in our bounds improve, but the results are qualitatively the same. We note that some sources refer to the k-center problem with F ≠ C as the k-supplier problem instead, and use k-center to refer exclusively to the case where F = C.)

• k-median (e.g., [6, 13, 17, 34, 44]): sum of client costs, i.e., sol(C') = Σ_{j∈C'} cost(j, sol).
• k-center (e.g., [21, 32, 33, 40, 47]): maximum client cost, i.e., sol(C') = max_{j∈C'} cost(j, sol).
• k-means (e.g., [1, 30, 37, 43, 45]): ℓ_2-norm of client costs, i.e., sol(C') = ( Σ_{j∈C'} cost(j, sol)^2 )^{1/2}.

We also consider ℓ_p-clustering (e.g., [30]), which generalizes all these individual clustering objectives. In ℓ_p-clustering, the objective is the ℓ_p-norm of the client costs for a given value p ≥ 1, i.e.,

    sol(C') = ( Σ_{j∈C'} cost(j, sol)^p )^{1/p}.

Note that k-median and k-means are special cases of ℓ_p-clustering for p = 1 and p = 2, respectively.
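For concreteness, the objectives above can be written as a single helper (an illustrative sketch, not from the paper): the ℓ_p form recovers k-median at p = 1 and k-means at p = 2, and k-center is the max-cost objective that ℓ_p-clustering approaches as p grows.

def cost(j, sol, dist):
    # distance from client j to its closest chosen center
    return min(dist[i][j] for i in sol)

def lp_objective(sol, realized, dist, p):
    # ℓ_p-clustering objective: ℓ_p-norm of the realized clients' costs
    return sum(cost(j, sol, dist) ** p for j in realized) ** (1.0 / p)

def k_median(sol, realized, dist):
    return lp_objective(sol, realized, dist, 1)

def k_means(sol, realized, dist):
    return lp_objective(sol, realized, dist, 2)

def k_center(sol, realized, dist):
    return max(cost(j, sol, dist) for j in realized)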
The k-center objective can also be defined in the ℓ_p-clustering framework as the limit of the objective for p → ∞; moreover, it is well known that ℓ_p-norms differ only by constants for p > log n (see Appendix A), thereby allowing the k-center objective to be approximated within a constant by ℓ_p-clustering for p = log n.

Our main result is to obtain (O(1), O(1))-approximate universal algorithms for k-median, k-center, and k-means. We also generalize these results to the ℓ_p-clustering problem.

Theorem 1.1. There are (O(1), O(1))-approximate universal algorithms for the k-median, k-means, and k-center problems. More generally, there are (O(p), O(p^2))-approximate universal algorithms for ℓ_p-clustering problems, for any p ≥ 1.

Remark: The bound for k-means is obtained by setting p = 2 in ℓ_p-clustering. For k-median and k-center, we use separate algorithms to obtain improved bounds over those provided by the ℓ_p-clustering result. This is particularly noteworthy for k-center, where ℓ_p-clustering only gives a poly-logarithmic approximation.

Universal Clustering with Fixed Clients. We also consider a more general setting where some of the clients are fixed, i.e., appear in every realization, while the remaining clients may or may not realize as in the previous case. (Of course, if no client is fixed, we get back the previous setting as a special case.) This more general model is inspired by settings where a set of clients is already present but the remaining clients are mere predictions. This surprisingly creates new technical challenges, which we overcome to get:

Theorem 1.2. There are (O(1), O(1))-approximate universal algorithms for the k-median, k-means, and k-center problems with fixed clients. More generally, there are (O(p^2), O(p^2))-approximate universal algorithms for ℓ_p-clustering problems with fixed clients, for any p ≥ 1.

Hardness Results. Next, we study the limits of approximation for universal clustering. In particular, we show that the universal clustering problems for all the objectives considered in this paper are NP-hard in a rather strong sense. Specifically, we show that both α and β are separately bounded away from 1, irrespective of the value of the other parameter, showing the necessity of both α and β in our approximation bounds. Similar lower bounds continue to hold for universal clustering in Euclidean metrics, even when PTASes are known in the offline (non-universal) setting [1, 5, 41, 43, 47].

Theorem 1.3. In universal ℓ_p-clustering for any p ≥ 1, obtaining α < 3 or β < 2 is NP-hard. Even for Euclidean metrics, obtaining α < 1.8 or β ≤ 1 is NP-hard. The lower bounds on α (resp., β) are independent of the value of β (resp., α).

Interestingly, our lower bounds rely on realizations where sometimes as few as one client appears. This suggests that, e.g., redefining regret to be some function of the number of clients that appear (rather than just their cost) cannot subvert these lower bounds.

1.2 Techniques

Before discussing our techniques, we discuss why standard approximations for clustering problems are insufficient. It is known that the optimal solution for the realization that includes all clients gives a (1, 2)-approximation for universal k-median (this is a corollary of a more general result in [38]; we do not know if their analysis can be extended to, e.g., k-means), giving universal algorithms for "easy" cases of k-median such as tree metrics.
But the clustering problems we consider in this paper are NP-hard in general; so, the best we can hope for in polynomial time is to obtain optimal fractional solutions, or approximate integer solutions. Unfortunately, the proof of [38] does not generalize to any regret guarantee for the optimal fractional solution. Furthermore, for all problems considered in this paper, even (1 + ε)-approximate (integer) solutions for the "all clients" instance are not guaranteed to be (α, β)-approximations for any finite α, β (see the example in Appendix B). These observations fundamentally distinguish universal approximations for NP-hard problems like the clustering problems in this paper from those for problems in P, and require us to develop new techniques for universal approximations.

In this paper, we develop a general framework for universal approximation based on linear programming (LP) relaxations that forms the basis of our results on k-median, k-means, and k-center (Theorem 1.1), as well as the extension to universal clustering with fixed clients (Theorem 1.2). The first step in our framework is to write an LP relaxation of the regret minimization problem. In this formulation, we introduce a new regret variable that we seek to minimize, and that is constrained to be at least the difference between the cost of the (fractional) solution obtained by the LP and the optimal integer solution for every realizable instance. Abstractly, if the LP relaxation of the optimization problem is given by min{c · x : x ∈ P}, then the new regret minimization LP is given by min{r : x ∈ P; c(I) · x ≤ opt(I) + r, ∀I}. Here, I ranges over all realizable instances of the problem. (Footnote 4: For problems like k-means with non-linear objectives, the constraint c(I) · x ≤ opt(I) + r cannot be replaced with a constraint that is simultaneously linear in x and r. However, for a fixed value of r, the corresponding non-linear constraints still give a convex feasible region, and so the techniques we discuss in this section can still be used.) Hence, the LP is exponential in size, and we need to invoke the ellipsoid method via a separation oracle to obtain an optimal fractional solution. We first note that the constraints x ∈ P can be handled using a separation oracle for the optimization problem itself. So, our focus is on designing a separation oracle for the new set of constraints c(I) · x ≤ opt(I) + r, ∀I. This amounts to determining the regret of a given fixed solution x, which unfortunately is NP-hard for our clustering problems. So, we settle for designing an approximate separation oracle, i.e., approximating the regret of a given solution. For k-median, we reduce this to a submodular maximization problem subject to a cardinality constraint, which can then be (approximately) solved via standard greedy algorithms. For k-means, and more generally ℓ_p-clustering, the situation is more complex. Similarly to k-median, maximizing regret for a fixed solution can be reduced to a set of submodular maximization problems, but deriving the function value for a given set now requires solving the NP-hard knapsack problem. We overcome this difficulty by showing that we can use fractional knapsack solutions as surrogates for the optimal integer knapsack solution in this reduction, thereby restoring polynomial running time of the submodular maximization oracle. Finally, in the presence of fixed clients, we need to run the submodular maximization algorithm over a set of combinatorial objects called independence systems. In this case, the resulting separation oracle only gives a bi-criteria guarantee, i.e., the solutions it considers feasible are only guaranteed to satisfy ∀I : c(I) · x ≤ α · opt(I) + β · r for constants α and β. Note that the bi-criteria guarantee suffices for our purposes since these constants are absorbed in the overall approximation bounds.
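The role of the approximate separation oracle in this framework can be sketched abstractly as follows. This is an illustrative sketch only, not the paper's implementation: base_oracle and approx_max_regret are hypothetical placeholders, and in the actual algorithm this routine is invoked inside the ellipsoid method rather than called in isolation.

def regret_separation_oracle(x, r, base_oracle, approx_max_regret):
    """Approximate separation oracle for  min{ r : x in P,  c(I)·x <= opt(I) + r  for all I }.

    base_oracle(x)       -> a violated constraint of the base polytope P, or None if x is in P.
    approx_max_regret(x) -> (value, instance), where value is within a factor gamma
                            (e.g. gamma = e/(e-1) via greedy submodular maximization)
                            of the true maximum regret of x, attained on `instance`.
    """
    violated = base_oracle(x)
    if violated is not None:
        return violated                    # x is infeasible for P
    value, instance = approx_max_regret(x)
    if value > r:
        # the regret constraint c(instance)·x <= opt(instance) + r is violated
        return ("regret-constraint", instance)
    return None                            # declare feasible: true regret of x is at most gamma * r

If the oracle declares (x, r) feasible, the true regret of x is at most γ · r; this approximate notion of feasibility is exactly what the ellipsoid method can tolerate, and it is the source of the e/(e−1) factor in Lemma 2.5 below.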
The next step in our framework is to round these fractional solutions to integer solutions for the regret minimization LP. Typically, in clustering problems such as k-median, LP rounding algorithms give average guarantees, i.e., although the overall objective in the integer solution is bounded against that of the fractional solution, individual connection costs of clients are not (deterministically) preserved in the rounding. But average guarantees are too weak for our purpose: in a realized instance, an adversary may select only the clients whose connection costs increase by a large factor in the rounding, thereby causing a large regret. Ideally, we would like to ensure that the connection cost of every individual client is preserved up to a constant factor in the rounding. However, this may be impossible in general, i.e., no integer solution might satisfy this requirement. Consider a uniform metric over k + 1 points. One fractional solution is to make a k/(k+1) fraction of each point a cluster center. Then, each client has connection cost 1/(k+1) in the fractional solution, since it needs to connect a 1/(k+1) fraction to a remote point. However, in any integer solution, since there are only k cluster centers but k + 1 points overall, there is one client that has connection cost 1, which is k + 1 times its fractional connection cost.

To overcome this difficulty, we allow for a uniform additive increase in the connection cost of every client. We show that such a rounding also preserves the regret guarantee of our fractional solution within constant factors. The clustering problem we now solve has a modified objective: for every client, the distance to the closest cluster center is now discounted by the additive allowance, with the caveat that the connection cost is 0 if this difference is negative. This variant is a generalization of a problem appearing in [25], and we call it clustering with discounts (e.g., for k-median, we call this problem k-median with discounts). Our main tool in the rounding then becomes an approximation algorithm for clustering problems with discounts. For k-median, we use a Lagrangian relaxation of this problem to the classic facility location problem to design such an approximation. For k-means and ℓ_p-clustering, the same general concept applies, but we need an additional ingredient called a virtual solution that acts as a surrogate between the regret of the (rounded) integer solution and that of the fractional solution obtained above. For k-center, we give a purely combinatorial (greedy) algorithm.

1.3 Related Work

For all previous universal algorithms, the approximation factor corresponds to our α parameter, i.e., these algorithms are (α, 0)-approximate. The notion of regret was not considered. As we have explained, however, it is not possible to obtain such results for universal clustering. Furthermore, it may be possible to trade off some of the large values of α in the results below, e.g., Ω(√n) for set cover, by allowing β > 0.
Universal algorithms have been of large interest in part because of their applications as online algorithms where all the computation is performed ahead of time. Much of the work on universal algorithms has focused on TSP. For Euclidean TSP in the plane, Platzman and Bartholdi [49] gave an O(log n)-approximate universal algorithm. Hajiaghayi et al. [31] generalized this result to an O(log n)-approximation for minor-free metrics, and Schalekamp and Shmoys [50] gave an O(log n)-approximation for tree metrics. For arbitrary metrics, Jia et al. [35] presented an O(log^4 n / log log n)-approximation, which improves to an O(log n)-approximation for doubling metrics. The approximation factor for arbitrary metrics was improved to O(log^2 n) by Gupta et al. [26]. It is also known that these logarithmic bounds are essentially tight for universal TSP [9, 10, 22, 31]. For the metric Steiner tree problem, Jia et al. [35] adapted their own TSP algorithm to provide an O(log^4 n / log log n)-approximate universal algorithm, which is also tight up to the exponent of the log [2, 10, 35]. Busch et al. [12] present an O(2^{O(√log n)})-approximation for universal Steiner tree on general graphs and an O(polylog(n))-approximation for minor-free graphs. Finally, for universal (weighted) set cover, Jia et al. [35] (see also [23]) provide an O(√(n log n))-approximate universal algorithm and an almost matching lower bound.

The problem of minimizing regret has been studied in the context of robust optimization. The robust 1-median problem was introduced for tree metrics by Kouvelis and Yu [42], and several faster algorithms and algorithms for general metrics were developed in the following years (e.g., [8]). For robust k-center, Averbakh and Berman [8] gave a reduction to a linear number of ordinary k-center problems, and thus for classes of instances where the ordinary k-center problem is polynomial-time solvable (e.g., instances with constant k or on tree metrics) this problem is also polynomial-time solvable [7]. A different notion of robust algorithms is one where a set S of possible scenarios is provided as part of the input to the problem. This model was originally considered for network design problems (see the survey by Chekuri [18]). Anthony et al. [3] gave an O(log n + log |S|)-approximation algorithm for solving k-median and a variety of related problems in this model on an n-point metric space (see also [11]). However, note that |S| can be exponential in |C| in general. Another popular model for uncertainty is two-stage optimization (e.g., [16, 19, 20, 27-29, 39, 51-53]). Here, the first stage presents a set of realizable instances (or a distribution over them) and the second stage chooses one of those realizations. The algorithm is free to make choices at either stage, but those choices come at a higher cost in the second stage, when it has more information about the input. Because of the different costs, results in this model have no bearing on our setting.

Roadmap. We present the constant approximation algorithms (Theorem 1.1) for universal k-median, ℓ_p-clustering (k-means is a special case), and k-center in Sections 2, 3, and 4, respectively. In describing these algorithms, we defer the clustering with discounts algorithms used in the rounding to Appendix C. We also give the extensions to universal clustering with fixed clients for k-median, k-means/ℓ_p-clustering, and k-center (Theorem 1.2) in Sections 5, 6, and 7.
Finally, the hardness results for general metrics and for Euclidean metrics (Theorem 1.3) appear in Sections 8 and 9, respectively.

2 UNIVERSAL k-MEDIAN

In this section, we prove the following theorem:

Theorem 2.1. There exists a (27, 49)-approximate universal algorithm for the k-median problem.

We follow the recipe described in Section 1.2. Namely, the algorithm has two components. The first component is a separation oracle for the regret minimization LP based on submodular maximization, which we define below.

Submodular Maximization with Cardinality Constraints. A (non-negative) function f : 2^E → R^+ is said to be submodular if for all S ⊆ T ⊆ E and x ∈ E, we have f(T ∪ {x}) − f(T) ≤ f(S ∪ {x}) − f(S). It is said to be monotone if for all S ⊆ T ⊆ E, we have f(T) ≥ f(S). The following theorem for maximizing monotone submodular functions subject to a cardinality constraint is well known.

Theorem 2.2 (Nemhauser et al. [48]). For the problem of finding S ⊆ E that maximizes a monotone submodular function f : 2^E → R^+ subject to |S| ≤ k, the natural greedy algorithm that starts with S = ∅ and repeatedly adds the x ∈ E that maximizes f(S ∪ {x}) until |S| = k is an e/(e−1) ≈ 1.58-approximation.

We give the reduction from the separation oracle to submodular maximization in Section 2.1, and then employ the above theorem (a short code sketch of the greedy algorithm appears at the end of this subsection). The second component of our framework is a rounding algorithm that employs the k-median with discounts problem, which we define below.

k-median with Discounts. In the k-median with discounts problem, we are given a k-median instance, but where each client j has an additional (non-negative) parameter r_j called its discount. Just as in the k-median problem, our goal is to place k cluster centers that minimize the total connection cost of all clients. But the connection cost for client j can now be discounted by up to r_j, i.e., client j with connection cost c_j contributes (c_j − r_j)^+ := max{0, c_j − r_j} to the objective of the solution.

Let opt be the cost of an optimal solution to the k-median with discounts problem. We say an algorithm alg that outputs a solution with connection cost c_j for client j is a (γ, σ)-approximation if:

    Σ_{j∈C} (c_j − γ · r_j)^+ ≤ σ · opt.

That is, a (γ, σ)-approximate algorithm outputs a solution whose objective function, when computed using discounts γ · r_j for all j, is at most σ times the optimal objective using discounts r_j. In the case where all r_j are equal, [25] gave a (9, 6)-approximation algorithm for this problem based on the classic primal-dual algorithm for k-median. The following lemma generalizes their result to the setting where the r_j may differ:

Lemma 2.3. There exists a (deterministic) polynomial-time (9, 6)-approximation algorithm for the k-median with discounts problem.

We give details of the algorithm and the proof of this lemma in Appendix C. We note that when all r_j are equal, the constants in [25] can be improved (see, e.g., [15]); we do not know of any similar improvement when the r_j may differ. In Section 2.2, we give the reduction from rounding the fractional solution for universal k-median to the k-median with discounts problem, and then employ the above lemma.
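As promised above, here is the greedy algorithm of Theorem 2.2, which we use as the engine of our separation oracles (an illustrative sketch; f is assumed to be a monotone submodular set function available as a black box):

def greedy_submodular_max(f, ground_set, k):
    """Greedy e/(e-1)-approximation for max{ f(S) : S ⊆ ground_set, |S| <= k },
    assuming f is monotone and submodular (Nemhauser et al. [48])."""
    S = set()
    for _ in range(min(k, len(ground_set))):
        # add the element with the largest marginal gain f(S ∪ {x}) − f(S)
        x = max((e for e in ground_set if e not in S), key=lambda e: f(S | {e}))
        S.add(x)
    return S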
2.1 Universal k-Median: Fractional Algorithm

The standard k-median polytope (see, e.g., [34]) is given by:

    P = {(x, y) : Σ_i x_i ≤ k;  ∀i, j : y_ij ≤ x_i;  ∀j : Σ_i y_ij ≥ 1;  ∀i, j : x_i, y_ij ∈ [0, 1]}.

Here, x_i represents whether point i is chosen as a cluster center, and y_ij represents whether client j connects to i as its cluster center. Now, consider the following LP formulation for minimizing the regret r:

    min{r : (x, y) ∈ P;  ∀C' ⊆ C : Σ_{j∈C'} Σ_i c_ij y_ij − opt(C') ≤ r},        (1)

where opt(C') is the cost of the (integral) optimal solution in realization C'. Note that the new constraints ∀C' ⊆ C : Σ_{j∈C'} Σ_i c_ij y_ij − opt(C') ≤ r (we call them the regret constraint set) require that the regret is at most r in all realizations.

In order to solve LP (1), we need a separation oracle for the regret constraint set. Note that there are exponentially many constraints corresponding to realizations C'; moreover, even for a single realization C', computing opt(C') is NP-hard. So, we resort to designing an approximate separation oracle. Fix some fractional solution (x, y, r). Overloading notation, let S(C') denote the cost of the solution with cluster centers S in realization C'. By definition, opt(C') = min_{S⊆F, |S|=k} S(C'). Then designing a separation oracle for the regret constraint set is equivalent to determining whether the following inequality holds:

    max_{C'⊆C} max_{S⊆F, |S|=k} [ Σ_{j∈C'} Σ_i c_ij y_ij − S(C') ] ≤ r.

We flip the order of the two maximizations, and define f_y(S) as follows:

    f_y(S) = max_{C'⊆C} [ Σ_{j∈C'} Σ_i c_ij y_ij − S(C') ].

Then designing a separation oracle is equivalent to maximizing f_y(S) over S ⊆ F subject to |S| = k. The rest of the proof consists of showing that this function is monotone and submodular, and efficiently computable.

Lemma 2.4. Fix y. Then, f_y(S) is a monotone submodular function in S. Moreover, f_y(S) is efficiently computable for a fixed S.

Proof. Let d(j, S) := min_{i'∈S} c_{i'j} denote the distance from client j to the nearest cluster center in S. If S = ∅, we set d(j, S) := ∞. The value of C' that defines f_y(S) is the set of all clients closer to S than to the fractional solution y, i.e., those with Σ_i c_ij y_ij > min_{i∈S} c_ij. This immediately establishes efficient computability of f_y(S). Moreover, we can equivalently write f_y(S) as follows:

    f_y(S) = Σ_{j∈C} ( Σ_i c_ij y_ij − d(j, S) )^+.

A sum of monotone submodular functions is a monotone submodular function, so it suffices to show that for all clients j, the function g_{y,j}(S) := ( Σ_i c_ij y_ij − d(j, S) )^+ is monotone submodular.

• g_{y,j} is monotone: for S ⊆ T, d(j, T) ≤ d(j, S), and thus ( Σ_i c_ij y_ij − d(j, S) )^+ ≤ ( Σ_i c_ij y_ij − d(j, T) )^+.
• g_{y,j} is submodular if:

    ∀S ⊆ T ⊆ F, ∀x ∈ F : g_{y,j}(S ∪ {x}) − g_{y,j}(S) ≥ g_{y,j}(T ∪ {x}) − g_{y,j}(T).

Fix S, T, and x. Assume g_{y,j}(T ∪ {x}) − g_{y,j}(T) is positive (if it is zero, by monotonicity the above inequality trivially holds). This implies that x is closer to client j than any cluster center in T (and hence in S too), i.e., d(j, x) ≤ d(j, T) ≤ d(j, S). Thus, d(j, x) = d(j, S ∪ {x}) = d(j, T ∪ {x}), which implies that g_{y,j}(S ∪ {x}) = g_{y,j}(T ∪ {x}). Then we just need to show that g_{y,j}(S) ≤ g_{y,j}(T), but this holds by monotonicity. □
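A direct transcription of this computation (an illustrative sketch; y is assumed to be a dict mapping (i, j) to the fractional assignment, and dist[i][j] plays the role of c_ij):

def f_y(S, clients, facilities, y, dist):
    """Evaluate f_y(S) = sum over clients of (fractional connection cost - d(j, S))^+.
    The worst-case realization is exactly the set of clients closer to S than to y."""
    total = 0.0
    for j in clients:
        frac_cost = sum(dist[i][j] * y.get((i, j), 0.0) for i in facilities)
        d_jS = min((dist[i][j] for i in S), default=float("inf"))
        total += max(0.0, frac_cost - d_jS)
    return total

Combined with greedy_submodular_max above, this yields the approximate maximizer S used by the separation oracle described next.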
By standard results (see, e.g., GLS [24]), we get an (α, β)-approximate fractional solution for universal k-median via the ellipsoid method if we have an approximate separation oracle for LP (1) that, given a fractional solution (x, y, r), does either of the following:

• Declares (x, y, r) feasible, in which case (x, y) has cost at most α · opt(C') + β · r in every realization C', or
• Outputs an inequality violated by (x, y, r) in LP (1).

The approximate separation oracle does the following for the regret constraint set (all other constraints can be checked exactly): Given a solution (x, y, r), find a set S whose value f_y(S) is at least an (e−1)/e fraction of the maximum, via Lemma 2.4 and Theorem 2.2. Let C' be the set of clients closer to S than to the fractional solution y (i.e., the realization that maximizes f_y(S)). If f_y(S) > r, the separation oracle returns the violated inequality Σ_{j∈C'} Σ_i c_ij y_ij − S(C') ≤ r; else, it declares the solution feasible. Whenever the actual regret of (x, y) is at least (e/(e−1)) · r, this oracle will find S such that f_y(S) > r and output a violated inequality. Hence, we get the following lemma:

Lemma 2.5. There exists a deterministic algorithm that in polynomial time computes a fractional e/(e−1) ≈ 1.58-approximate solution for LP (1) representing the universal k-median problem.

2.2 Universal k-Median: Rounding Algorithm

Let frac denote the e/(e−1)-approximate fractional solution to the universal k-median problem provided by Lemma 2.5. We will use the following property of k-median, shown by Archer et al. [4].

Lemma 2.6 ([4]). The integrality gap of the natural LP relaxation of the k-median problem is at most 3.

Lemmas 2.5 and 2.6 imply that for any set of clients C',

    (1/3) · opt(C') ≤ frac(C') ≤ opt(C') + (e/(e−1)) · mr.        (2)

Our overall goal is to obtain a solution sol that minimizes max_{C'⊆C} [sol(C') − opt(C')]. But, instead of optimizing over the exponentially many different opt(C') solutions, we use the surrogate 3 · frac(C'), which has the advantage of being defined by a fixed solution frac, but still 3-approximates opt(C') by Eq. (2). This suggests minimizing the following objective instead: max_{C'} [sol(C') − 3 · frac(C')]. For a given solution sol, the set of clients C' that maximizes the new expression is the set of clients whose connection cost in sol (denoted c_j) exceeds 3 times their cost in frac (denoted f_j):

    max_{C'} [sol(C') − 3 · frac(C')] = Σ_{j∈C} (c_j − 3f_j)^+.

But minimizing this objective is precisely the aim of the k-median with discounts problem, where the discount for client j is 3f_j. This allows us to invoke Lemma 2.3 for the k-median with discounts problem. Thus, our overall algorithm is as follows. First, use Lemma 2.5 to find a fractional solution frac = (x, y, r). Let f_j := Σ_i c_ij y_ij be the connection cost of client j in frac. Then, construct a k-median with discounts instance where client j has discount 3f_j, and use Lemma 2.3 on this instance to obtain the final solution to the universal k-median problem.
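Putting the pieces together, the overall algorithm has the following shape (an illustrative sketch; solve_regret_lp and kmedian_with_discounts stand in for the procedures of Lemma 2.5 and Lemma 2.3, which are not reproduced here):

def universal_k_median(clients, facilities, dist, k, solve_regret_lp, kmedian_with_discounts):
    """Two-stage algorithm of Theorem 2.1 (sketch).
    solve_regret_lp        -> fractional (x, y, r) approximately minimizing regret (Lemma 2.5)
    kmedian_with_discounts -> (9, 6)-approximation for k-median with discounts (Lemma 2.3)"""
    x, y, r = solve_regret_lp(clients, facilities, dist, k)
    # f_j: connection cost of client j in the fractional solution frac
    f = {j: sum(dist[i][j] * y.get((i, j), 0.0) for i in facilities) for j in clients}
    # discount each client by 3 * f_j and round via the k-median with discounts algorithm
    discounts = {j: 3.0 * f[j] for j in clients}
    return kmedian_with_discounts(clients, facilities, dist, k, discounts)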
We now complete the proof of Theorem 2.1 using the above lemmas.

Proof of Theorem 2.1. Let m_j be the connection cost of mrs to client j. Then,

    mr = max_{C'⊆C} [mrs(C') − opt(C')] ≥ max_{C'⊆C} [mrs(C') − 3 · frac(C')]   (by Eq. (2))
       = Σ_{j∈C} (m_j − 3f_j)^+ = Σ_{j∈C : m_j > 3f_j} (m_j − 3f_j).

Thus, mr upper bounds the optimal objective in the k-median with discounts instance that we construct. Let c_j be the connection cost of client j in the solution output by the algorithm. Then by Lemma 2.3 we get that:

    Σ_{j∈C} (c_j − 27f_j)^+ ≤ 6 · Σ_{j∈C} (m_j − 3f_j)^+ ≤ 6 · mr.        (3)

As a consequence, we have:

    ∀C' ⊆ C : Σ_{j∈C'} c_j = Σ_{j∈C'} [27f_j + (c_j − 27f_j)] ≤ Σ_{j∈C'} 27f_j + Σ_{j∈C'} (c_j − 27f_j)^+ ≤ 27 · frac(C') + 6 · mr,

where the last step uses the definition of f_j and Eq. (3). Now, using the bound on frac(C') from Eq. (2) in the inequality above, we have the desired bound on the cost of the algorithm:

    ∀C' ⊆ C : Σ_{j∈C'} c_j ≤ 27 · frac(C') + 6 · mr ≤ 27 · (opt(C') + (e/(e−1)) · mr) + 6 · mr ≤ 27 · opt(C') + 49 · mr,

where the last step uses 27 · e/(e−1) + 6 ≈ 48.7 ≤ 49. □

3 UNIVERSAL ℓ_p-CLUSTERING AND UNIVERSAL k-MEANS

In this section, we give universal algorithms for ℓ_p-clustering with the following guarantee:

Theorem 3.1. For all p ≥ 1, there exists a (54p, 103p^2)-approximate universal algorithm for the ℓ_p-clustering problem.

As a corollary, we obtain the following result for universal k-means (p = 2).

Corollary 3.2. There exists a (108, 412)-approximate universal algorithm for the k-means problem.

Before describing further details of the universal ℓ_p-clustering algorithm, we note a rather unusual feature of the universal clustering framework. Typically, in ℓ_p-clustering, standard algorithms effectively optimize the ℓ_p^p objective (e.g., sum of squared distances for k-means) because these are equivalent in the following sense: an α-approximation for the ℓ_p^p objective is equivalent to an α^{1/p}-approximation for the ℓ_p objective. But this equivalence fails in the setting of universal algorithms for reasons that we discuss below. Indeed, we first give a universal ℓ_p^p-clustering algorithm, which is a simple extension of the k-median algorithm, and then give universal ℓ_p-clustering algorithms, which turns out to be much more challenging.

3.1 Universal ℓ_p^p-Clustering

As in universal k-median, we can write an LP formulation for universal ℓ_p^p-clustering, i.e., clustering with the objective sol(C') = Σ_{j∈C'} cost(j, sol)^p:

    min{r : (x, y) ∈ P;  ∀C' ⊆ C : Σ_{j∈C'} Σ_i c_ij^p y_ij − opt(C') ≤ r},        (4)

where P is still the k-median polytope defined in Section 2.1. The main difficulty is that the ℓ_p^p distances no longer form a metric, i.e., they do not satisfy the triangle inequality. Nevertheless, the distances still have a metric connection, in that they are the p-th powers of metric distances. We show that this connection is sufficient to prove the following result:

Theorem 3.3. For all p ≥ 1, there exists a (27^p, 27^p · e/(e−1) + (2/3) · 9^p)-approximate algorithm for the universal ℓ_p^p-clustering problem.

As in universal k-median, a key component in proving Theorem 3.3 is a rounding algorithm that employs a bi-criteria approximation to the ℓ_p^p-clustering with discounts problem. Indeed, this result will also be useful in the next subsection, when we consider the universal ℓ_p-clustering problem. So, we formally define the ℓ_p^p-clustering with discounts problem below and state our result for it.

ℓ_p^p-clustering with Discounts. In this problem, we are given an ℓ_p^p-clustering instance, but where each client j has an additional (non-negative) parameter r_j called its discount. Our goal is to place k cluster centers that minimize the total connection cost of all clients. But the connection cost for client j can now be discounted by up to r_j^p, i.e., client j with connection cost c_j contributes (c_j^p − r_j^p)^+ := max{0, c_j^p − r_j^p} to the objective of the solution. (Note that the k-median with discounts problem that we described in the previous section is a special case of this problem for p = 1.)
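The discounted ℓ_p^p objective just defined can be written directly (an illustrative sketch, with p = 1 recovering the k-median with discounts objective):

def lp_p_discounted_objective(centers, clients, dist, discounts, p):
    """Objective of the ℓ_p^p-clustering with discounts problem: each client j
    contributes (c_j^p - r_j^p)^+, where c_j is its distance to the nearest
    chosen center and r_j = discounts[j]."""
    total = 0.0
    for j in clients:
        c_j = min(dist[i][j] for i in centers)
        total += max(0.0, c_j ** p - discounts[j] ** p)
    return total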
Let opt be the cost of an optimal solution to the ℓ_p^p-clustering with discounts problem. We say an algorithm alg that outputs a solution with connection cost c_j for client j is a (γ, σ)-approximation if:

    Σ_{j∈C} (c_j^p − γ^p · r_j^p)^+ ≤ σ · opt.

(Footnote 5: We refer to this as a (γ, σ)-approximation instead of a (γ^p, σ)-approximation to emphasize the difference between the scaling factor for discounts, γ, and the resulting loss in the approximation factor, γ^p.)

That is, a (γ, σ)-approximate algorithm outputs a solution whose objective function, computed using discounts γ · r_j for all j, is at most σ times the optimal objective using discounts r_j. We give the following result about the ℓ_p^p-clustering with discounts problem (see Appendix C for details):

Lemma 3.4. There exists a (deterministic) polynomial-time (9, (2/3) · 9^p)-approximation algorithm for the ℓ_p^p-clustering with discounts problem.

We now employ this lemma in obtaining Theorem 3.3. Recall that the universal k-median result in the previous section had three main ingredients:

• Lemma 2.5 to obtain an e/(e−1)-approximate fractional solution. This continues to hold for the ℓ_p^p objective, since Lemma 2.5 does not use any metric property.
• An upper bound of 3 on the integrality gap of the natural LP relaxation of k-median from [4]. The same result now gives an upper bound of 3^p on the integrality gap of ℓ_p^p-clustering.
• Lemma 2.3 to obtain an approximation guarantee for the k-median with discounts problem. This is where the metric property of the connection costs in the k-median problem was being used. Nevertheless, Lemma 3.4 above gives a generalization of Lemma 2.3 to the ℓ_p^p-clustering with discounts problem.

Theorem 3.3 now follows from these three observations using exactly the same steps as Theorem 2.1 in the previous section; we omit these steps for brevity. □

The rest of this section is dedicated to the universal ℓ_p-clustering problem. As for k-median, we have two stages, the fractional algorithm and the rounding algorithm, which we present in the next two subsections.

3.2 Universal ℓ_p-Clustering: Fractional Algorithm

Let us start by describing the fractional relaxation of the universal ℓ_p-clustering problem (again, P is the k-median polytope defined in Section 2.1):

    min{r : (x, y) ∈ P;  ∀C' ⊆ C : ( Σ_{j∈C'} Σ_i c_ij^p y_ij )^{1/p} − opt(C') ≤ r}.        (5)

As described earlier, when minimizing regret, the ℓ_p and ℓ_p^p objectives are no longer equivalent. For instance, recall that one of the key steps in Lemma 2.5 was to establish the submodularity of the function f_y(S) denoting the maximum regret caused by any realization when comparing two given solutions: a fractional solution y and an integer solution S. Indeed, the worst-case realization had a simple structure: choose all clients that have a smaller connection cost for S than for y. This observation continues to hold for the ℓ_p^p objective because of the linearity of f_y(S) as a function of the realized clients once y and S are fixed. But the ℓ_p objective is not linear even after fixing the solutions, and as a consequence, we lose both the simple structure of the maximizing realization and the submodularity of the overall function f_y(S). For instance, consider two clients: one at distances 1 and 0, and another at distances (1 + ε)^{1/p} and 1, from y and S respectively. Using the ℓ_p objective, the regret with both clients is (2 + ε)^{1/p} − 1, whereas with just the first client the regret is 1, which is larger for p ≥ 2.
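A quick numeric check of this two-client example (an illustrative sketch):

# Two clients; distances to the fractional solution y and the integer solution S.
# Client 1: 1 from y, 0 from S.  Client 2: (1+eps)^(1/p) from y, 1 from S.
p, eps = 2, 0.01

def lp_cost(dists, p):
    # ℓ_p objective over a realization
    return sum(d ** p for d in dists) ** (1.0 / p)

regret_both = lp_cost([1, (1 + eps) ** (1.0 / p)], p) - lp_cost([0, 1], p)   # ≈ 0.418
regret_first_only = lp_cost([1], p) - lp_cost([0], p)                        # = 1.0
print(regret_both, regret_first_only)  # adding client 2 strictly decreases the regret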
The above observation results in two related difficulties: first, f_y(S) is not submodular, and hence standard submodular maximization techniques do not apply; but also, given y and S, we cannot even compute the function f_y(S) efficiently. To overcome this difficulty, we further refine the function f_y(S) to a collection of functions f_{y,Y}(S) by also fixing the cost of the fractional solution y to be at most a given value Y. As we will soon see, this allows us to relate the ℓ_p objective to the ℓ_p^p objective, but under an additional "knapsack"-like packing constraint. It is still not immediate that f_{y,Y} is efficiently computable, because of the knapsack constraint that we have introduced. Our second observation is that relaxing the (NP-hard) integer knapsack problem to the corresponding (poly-time) fractional knapsack problem does not affect the optimal value of f_{y,Y}(S) (i.e., allowing fractional clients does not increase y's regret), while making the function efficiently computable. As a bonus, the relaxation to fractional knapsack also restores submodularity of the function, allowing us to use standard maximization tools as earlier. We describe these steps in detail below.

To relate the regret in the ℓ_p and ℓ_p^p objectives, let frac_p^q(C') and S_p^q(C') denote the ℓ_p objective raised to the q-th power for y and S, respectively, in realization C' (and let frac_p(C'), S_p(C') denote the corresponding ℓ_p objectives). Assume that y's regret against S is non-zero. Then:

(Footnote 6: The constraints in (5) are not simultaneously linear in y and r. However, fixing r, we can write these constraints as Σ_{j∈C'} Σ_i c_ij^p y_ij ≤ (opt(C') + r)^p, which is linear in y. In turn, to solve this program we bisection search over r, using the ellipsoid method to determine whether there is a feasible point for each fixed r.)

    max_{C'⊆C} [frac_p(C') − S_p(C')]
        = max_{C'⊆C} [ (frac_p^p(C') − S_p^p(C')) / ( Σ_{q=0}^{p−1} frac_p^q(C') · S_p^{p−1−q}(C') ) ]
        ≃ max_{C'⊆C} [ (frac_p^p(C') − S_p^p(C')) / frac_p^{p−1}(C') ]
        = max_Y max_{C'⊆C : frac_p^p(C') ≤ Y} [ (frac_p^p(C') − S_p^p(C')) / Y^{1−1/p} ]
        = max_Y [ ( max_{C'⊆C, frac_p^p(C') ≤ Y} [frac_p^p(C') − S_p^p(C')] ) / Y^{1−1/p} ].

The ≃ denotes equality to within a factor of p, and uses the fact that if the regret is non-zero, then for every C' such that frac_p(C') > S_p(C') (one of which is always the maximizer of all expressions in this equation), every term in the sum in the denominator is upper bounded by frac_p^{p−1}(C'). We would like to argue that the numerator,

    max{frac_p^p(C') − S_p^p(C') : C' ⊆ C, frac_p^p(C') ≤ Y},

is a submodular function of S. If we did this, then we could find an adversary and realization of clients that (approximately) maximizes the regret of y by iterating over all (discretized) values of Y. But, as described above, it is easier to work with its fractional analog because of the knapsack constraint:

    f_{y,Y}(S) := max{frac_p^p(I) − S_p^p(I) : I ∈ [0, 1]^C, frac_p^p(I) ≤ Y}.

Here, frac_p^p(I) := Σ_{j∈C} d_j · Σ_i c_ij^p y_ij and S_p^p(I) := Σ_{j∈C} d_j · min_{i∈S} c_ij^p are the natural extensions of the ℓ_p^p objective to fractional clients, where d_j is the fraction of client j that is included in I. The next lemma shows that allowing fractional clients does not affect the maximum regret:
Lemma 3.5. For any two solutions y, S, there exists a global maximum of frac_p(I) − S_p(I) over I ∈ [0, 1]^C at which all the clients are integral, i.e., I ∈ {0, 1}^C. Therefore,

    max_{I∈[0,1]^C} [frac_p(I) − S_p(I)] = max_{C'⊆C} [frac_p(C') − S_p(C')].

We remark that unlike for the ℓ_p^p objective, integrality of the maximizer is not immediate for the ℓ_p objective, because the regret of y compared to S is not a linear function of I.

Proof. We will show that the derivative of frac_p(I) − S_p(I), when frac_p(I) > S_p(I), with respect to a fixed d_j is either always positive or always negative for d_j ∈ (0, 1), or negative while 0 < d_j < d' and then positive afterwards. This gives that any I with a fractional coordinate where frac_p(I) > S_p(I) (which is necessary for a fractional I to be a global maximum that beats the integral all-zeroes vector) cannot be a local maximum, giving the lemma.

To show this property of the derivative, letting I_{−j} denote I with d_j = 0, we have

    frac_p(I) − S_p(I) = (frac_p^p(I_{−j}) + c_1 d_j)^{1/p} − (S_p^p(I_{−j}) + c_2 d_j)^{1/p},

where c_1 and c_2 are the ℓ_p^p distances from j to frac and S, respectively. For positive d_j, the derivative with respect to d_j is well-defined and equals
We will refer to this fact pr aseix the property. We will show that we can construct a fractional knapsack solution that when using clusterScenters ∪ {x} has value at least f (S) + f (T ∪{x}) − f (T ), proving the lemma. For brevity, we will refer to fractions of clients which may y,Y y,Y y,Y be in the knapsack as if they were integral clients. Considerf (T ∪ {x}) − f (T ). We can split this diference into four cases: y,Y y,Y (1) A client in the optimal knapsack S, Tfor , and T ∪ {x}. (2) A client in the optimal knapsack S and forT ∪ {x} but not forT . (3) A client in the optimal knapsack T and forT ∪ {x} but not forS. (4) A client in the optimal knapsack T ∪for {x} but not forS or T . In every case, the client’s value must have increased (otherwise, it cannot contribute to the diference in cases 1 and 3, or it must also beTin ’s knapsack in cases 2 and 4), i.e x .is the closest cluster center to the client T ∪{in x} (and thus S ∪ {x}). Let w ,w ,w ,w be the total weight of clients in each case. The total weight ofTclients ’s in 1 2 3 4 knapsack but not T ∪ {x} isw + w . We will refer to these clients as replaced clients. The increase in value due 2 4 to cases 2 and 4 can be thought of as replacing the replaced clients with case 2 and 4 clients. In particular, we will think of the case 2 clients as replacing the replaced clientswofwith weight the smallest total value T , and for the case 4 clients as replacing the remaining replaced clients (i.e. those with the largest T ).total value for ACM Trans. Algor. 14 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi Without loss of generality, we assume there are no case 1 clients. By the preix property, any of the knapsack instances for S,T ,T ∪ {x} (and alsoS ∪ {x} by monotonicity of values and the preix property) has optimal value equal to the total value of case 1 clients plus the optimal value of a smaller knapsack instance with total weight Y − w and all clients except case 1 clients available. The value of case S 1∪clients {x} and Tfor ∪ {x} is the same (since the values are determinedxby ), and can only be smallerSfor than T by monotonicity of values. In turn, we just need to construct a knapsack forS ∪{x} for the smaller instance with no case 1 clients whose value is at least that Sofplus the contributionf to (T ∪ {x}) − f (T ) from cases 2-4. y,Y y,Y To build the desired knapsackSfor∪ {x}, we start with the knapsack for S. The case 2 clients Sin ’s knapsack by the preix property have less value for S than forT by monotonicity of values. By the preix propertyT, for the case 2 clients have less value than the replaced clients of total w with weight the smallest total value T (since for the former are not in the optimal knapsackTfor , and the latter are). So, the increase in value of the case 2 clients inS’s knapsack is at least the contribution f to(T ∪ {x}) − f (T ) due to replacing clients T ’sin knapsack y,Y y,Y with case 2 clients. To account for the case 4 clients, we take the clients S’s knapsack in which are not case 2 clients with weight w and the least total valueTfor , and replace them with the case 4 clients. These clients are among the clients of total weight w + w with the lowest value-to-weight ratios T ) (for inS’s knapsack (they aren’t necessarily the 2 4 clients of total weight w with the lowest value-to-weight ratios, because we chose not to include case 2 clients in this set). 
On the other hand, the replaced clients T ’s knapsack in all have value-to-weight ratios T grfor eater than at leastw weight of other clients T ’sin knapsack (those replaced by the case 2 clients). So by monotonicity of values and the preix property, the clients we replace S’s knapsack in with case 4 clients have lower value S for than the clients being replaced by case 4 clientsT ,do and for so we increase the value of our knapsack by more than the contribution to f (T ∪ {x}) − f (T ) due to case 4 clients. y,Y y,Y Lastly, we take any clientsS’s inknapsack which are not in T ’s knapsack or case 2 or case 4 clients with total weight w and replace them with the case 3 clients. Since these clients ar Te’snot knapsack, in their value forT (and thus their value for S by monotonicity of values) is less than the case 3 clients’Tvalue by thefor preix property. In turn, this replacement increases the value of our knapsack by more than the contribution to f (T ∪ {x}) − f (T ) due to case 3 clients. □ y,Y y,Y This lemma allows us to now give an approximate separation oracle for fractional solutions of universal ℓ -clustering by trying all guesses Y (ℓfor -SepOracle in Figure 1). p p One complication of using the above as a separation oracle in the ellipsoid algorithm is that it outputs linear constraints whereas the actual constraints in the fractional relaxation (5)giv are en non-linear in . So, in the following lemma, we need some additional work to show that violation of the linear constraints output by ℓ -SepOracle also implies violation of the non-linear constraints in (5). Lemma 3.7. For any p ≥ 1, ϵ > 0 there exists an algorithm which inds ( a · p + ϵ )-approximation to the lp for e−1 the universalℓ -clustering problem. Proof. The algorithm is to use the ellipsoid metho ℓ -SepOra d with cle as the separation oracle. We note that f (S) can be evaluated in polynomial time by computing optimal fractional knapsacks as discussed in the proof y,Y of Lemma 3.6. In addition, there are polynomially values Y that arof e iterated over, since⌈log ′ c /c ⌉ is max min 1+ϵ O (p logn/ϵ ) times the number of bits needed to describe the largest value c . Soofeach call ℓto-SepOracle ij p takes polynomial time. p p p−1 By Lemma 3.5 and the observation thatfrac (I)−S (I) ≤ p·frac (I)(frac (I)−S (I)) if frac (I)−S (I) > 0, p p p p p p p we get that ℓ -SepOracle cannot output a violated inequality if the input solution is feasible (5). So we just to Eq. ′ ′ e ′ need to show that if frac (C )− S (C ) ≥ ( p + ϵ )r for someS,C , ℓ -SepOracle outputs a violated inequality, p p p e−1 i.e. does not output łFeasible.ž Let Y be the smallest valueYof iterated over byℓ -SepOracle that is at least ACM Trans. Algor. Universal Algorithms for Clustering Problems • 15 ℓ -SepOracle((x,y, r ), F, C): Input: Fractional solution (x,y, r ), set of cluster centers F, set of all clients C 1: if Any constraint in (5) except the regret constraint set is violate then d 2: return the violated constraint 3: end if p p 4: c ← min c ,c ← max c min i∈F, j∈C max j∈C i∈F ij ij ′ ′ 2 ′ ⌈log ′ c /c ⌉ max min 1+ϵ 5: for Y ∈ {c ,c (1 + ϵ ),c (1 + ϵ ) , . . . 
One complication of using the above as a separation oracle in the ellipsoid algorithm is that it outputs linear constraints, whereas the actual constraints in the fractional relaxation (5) are non-linear in y. So, in the following lemma, we need some additional work to show that violation of the linear constraints output by ℓ_p-SepOracle also implies violation of the non-linear constraints in (5).

Lemma 3.7. For any p ≥ 1, ε > 0, there exists an algorithm which finds a (e/(e−1) · p + ε)-approximation to the LP for the universal ℓ_p-clustering problem.

Proof. The algorithm is to use the ellipsoid method with ℓ_p-SepOracle as the separation oracle. We note that f_{y,Y}(S) can be evaluated in polynomial time by computing optimal fractional knapsacks, as discussed in the proof of Lemma 3.6. In addition, there are polynomially many values of Y that are iterated over, since ⌈log_{1+ε'}(c_max/c_min)⌉ is O(p log n / ε') times the number of bits needed to describe the largest value of c_ij. So each call to ℓ_p-SepOracle takes polynomial time.

By Lemma 3.5 and the observation that frac_p^p(I) − S_p^p(I) ≤ p · frac_p^{p−1}(I) · (frac_p(I) − S_p(I)) if frac_p(I) − S_p(I) > 0, we get that ℓ_p-SepOracle cannot output a violated inequality if the input solution is feasible for Eq. (5). So we just need to show that if frac_p(C') − S_p(C') ≥ (e/(e−1) · p + ε) · r for some S, C', then ℓ_p-SepOracle outputs a violated inequality, i.e., does not output "Feasible". Let Y' be the smallest value of Y iterated over by ℓ_p-SepOracle that is at least frac_p^p(C'), and let S' be the (e−1)/e-approximate maximizer of f_{y,Y'} found by ℓ_p-SepOracle. For the I' found on Line 7 of ℓ_p-SepOracle in this iteration, we have:

    (1/(p(Y')^{1−1/p})) · [frac_p^p(I') − S'_p^p(I')]
        ≥ ((e−1)/(pe(Y')^{1−1/p})) · max_{S, I : frac_p^p(I) ≤ Y'} [frac_p^p(I) − S_p^p(I)]
        ≥^{(i)} ((e−1)/(pe(Y')^{1−1/p})) · max_{S, I : frac_p^p(I) ≤ frac_p^p(C')} [frac_p^p(I) − S_p^p(I)]
        ≥ ((e−1)/(pe(Y')^{1−1/p})) · [frac_p^p(C') − S_p^p(C')]
        ≥ ((e−1)/(pe(1+ε'))) · (frac_p^p(C') − S_p^p(C')) / frac_p^{p−1}(C')
        ≥ ((e−1)/(pe(1+ε'))) · [frac_p(C') − S_p(C')] > r.

In (i), we use the fact that for a fixed S, max_{I : frac_p^p(I) ≤ Y} [frac_p^p(I) − S_p^p(I)] is the solution to a fractional knapsack problem with weight Y, and that decreasing the weight allowed in a fractional knapsack instance can only reduce the optimum. For the last inequality to hold, we just need to choose ε' < ((e−1)/(pe)) · ε in ℓ_p-SepOracle for the desired ε. This shows that for Y', ℓ_p-SepOracle will output a violated inequality, as desired. □

3.3 Universal ℓ_p-Clustering: Rounding Algorithm

At a high level, we use the same strategy for rounding the fractional ℓ_p-clustering solution as we did with k-median. Namely, we solve a discounted version of the problem where the discount for each client is equal to the (scaled) cost of the client in the fractional solution. However, if we apply this directly to the ℓ_p objective, we run into several problems. In particular, the linear discounts are incompatible with the non-linear objective defined over the clients. A more promising idea is to use these discounts on the ℓ_p^p objective, which in fact is defined as a linear combination over the individual clients' objectives. But, for this to work, we will first need to relate the regret bound in the ℓ_p objective to that in the ℓ_p^p objective. This relation does not hold in general, i.e., for all realizations. However, we show that the realization that maximizes the regret of an algorithm alg against a fixed solution sol in both objectives is the same under the following "farness" condition: for every client, either alg's connection cost is smaller than sol's, or it is at least p times as large as sol's. Given any solution sol, it is easy to define
This contrasts (and limits) the example given at the beginning of this section, where we showed that including clients whose distance alg eto xceeds the distance tosol by a smaller factor can actually reduce the regret alg of againstsol. In turn, if all clients are closersol to are closer by a factor of p, then the realization that maximizes regretℓin -obje thective is also the realization that maximizes regret in the ℓ -objective. Lemma 3.8. Suppose alg and sol are two solutions to anℓ -clustering instance, such that there is a subset of ∗ ∗ clientsC with the following property: for every clientCin , the connection cost in alg is greater thanp times the connection cost in sol, while for every client not C in, the connection cost in sol is at least the connection cost in alg. Then, C maximizes the following function: p p ′ ′ alg (C )−sol (C )  p p alg (C ) > 0 p−1 ′ ′ alg (C ) f (C ) := p  ′ 0 alg (C ) = 0 q q ′ ′ Proof. Fix any subset of clients C which does not include j. Let alg (C ) be alg’s ℓ -objective cost on this p p q q subset, sol (C ) be sol’s ℓ -objective on this subset, a be alg’s connection cost for j to the pth power, and s be p p sol’s connection cost for j to the pth power. To do this, we analyze a continuous extension f , eof valuated atC plusj in fractional amount x: p p ′ ′ alg (C ) + ax − sol (C ) − sx p p f (C , x ) = . ′ (p−1)/p (alg (C ) + ax ) ′ ′ ′ ′ When x = 0, this is f (C ) (if f (C ) is positive) and when x = 1, this is f (C ∪{j}) (if f (C ∪{j}) is positive). Its derivative with respectx to is: p p ′ ′ alg (C )+ax−sol (C )−sx p a(p−1) p p ′ (p−1)/p (a − s)(alg (C ) + ax ) − · ′ 1/p (alg (C )+ax ) f (C , x ) = . ′ 2(p−1)/p dx (alg (C ) + ax ) Which has the same sign as: p p ′ ′ alg (C ) + ax − sol (C ) − sx a(p − 1) p p (a − s) − · . alg (C ) + ax p p ′ ′ ′ d ′ ˜ ˜ Ifalg (C ) + ax > sol (C ) + sx, i.e.f (C , x ) is positive, thenf (C , x ) is negativeaif≤ s. Consider j j p p dx ′ ∗ ′ ′ any C including a client j not inC . Suppose f (C ) > 0. Then f (C , x ) has a negative derivative on , 1][0 ′ ′ (sincef (C , x ) starts out positive and is increasing x goas es from 1 to 0, i.e. it stays positivfe), (Cso\ {j}) = ′ ′ ′ ′ ′ ′ ˜ ˜ f (C , 0) > f (C , 1) = f (C ), and C cannot be a maximizer of f . If otherwise f (C ) = 0, then C clearly cannot j j be a maximizer of f unlessC is as well. Similarly, observe that: ACM Trans. Algor. Universal Algorithms for Clustering Problems • 17 p p ′ ′ alg (C ) + ax − sol (C ) − sx a(p − 1) a(p − 1) p p a (a − s) − · > (a − s) − = − s. p p p alg (C ) + ax d ′ ∗ ′ ∗ So f (C , x ) is positivae > if ps, which holds forjall ∈ C . Consider anyC not including a client j inC . dx ′ ′ ′ ′ ′ ′ ˜ ˜ ˜ Suppose f (C ) > 0. Then f (C , x ) has a positive derivative,on 1],[0 so f (C ) = f (C , 0) < f (C , 1) = f (C ∪{j}), j j j ′ ′ ′ ∗ and C cannot be a maximizer of f . If otherwisef (C ) = 0, then C clearly cannot be a maximizer f unless of C ′ ∗ ∗ is as well. Since we have shown that eCver,yC cannot be a maximizer of f unlessC is also a maximizer of f , we conclude thatC maximizes f . □ Intuitively, this lemma connects ℓ and the ℓ objectives as this subset of clients C will also be the set that maximizes the ℓ regret ofalg vs sol, and f (C ) is (within a factor p)of equal to theℓ regret. We use this lemma along with the ℓ -clustering with discounts approximation in Lemma 3.4 to design the rounding algorithm for universal ℓ -clustering. 
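To make the statement of Lemma 3.8 concrete, the following small sketch (our own illustrative code, not part of the paper's algorithm; the instance generation and names such as check_lemma are hypothetical) builds a random instance satisfying the farness condition and brute-forces all realizations to confirm that the set of "far" clients attains the maximum of f.

```python
import itertools
import random

def f(alg_cost, sol_cost, subset, p):
    """Regret surrogate of Lemma 3.8: (alg_p^p(C') - sol_p^p(C')) / alg_p^{p-1}(C')."""
    a = sum(alg_cost[j] ** p for j in subset)
    s = sum(sol_cost[j] ** p for j in subset)
    if a == 0:
        return 0.0
    return (a - s) / a ** ((p - 1) / p)

def check_lemma(p=3, n=8, seed=0):
    rng = random.Random(seed)
    sol_cost = [rng.uniform(0.5, 2.0) for _ in range(n)]
    alg_cost, far = [], set()
    for j in range(n):
        if rng.random() < 0.5:
            # "far" client: alg pays strictly more than p times sol
            alg_cost.append(sol_cost[j] * (p + rng.uniform(0.1, 1.0)))
            far.add(j)
        else:
            # remaining clients: sol pays at least as much as alg
            alg_cost.append(sol_cost[j] * rng.uniform(0.1, 1.0))
    best = max(
        f(alg_cost, sol_cost, set(c), p)
        for r in range(n + 1)
        for c in itertools.combinations(range(n), r)
    )
    # the set of far clients should attain the maximum (up to float error)
    return abs(best - f(alg_cost, sol_cost, far, p)) < 1e-9

if __name__ == "__main__":
    print(all(check_lemma(seed=s) for s in range(20)))
```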
As in the rounding algorithm for univ k-meersal dian, let sol denote a (virtual) solution whose connection costs are 3 times that of the fractional solution frac for all clients. The rounding algorithm solves anℓ -clustering with discounts instance, where the discounts aresol 2 times ’s connection costs. (Recall that ink-median, the discount was equalsol to’s connection cost. Now, we need the additional factor of 2 for technical reasons.) Let alg be the solution output by the algorithm of Lemma 3.4 for this problem. We prove the following boundalg for: Lemma 3.9. There exists an algorithm which given(any α, β )-approximate fractional solution frac for ℓ - 1/p clustering, outputs a(54pα, 54pβ + 18p )-approximate integral solution. Proof. Let sol denote a (virtual) solution whose connection costs are 3 times that of the fractional solution frac for all clients. The rounding algorithmℓsolv -clustering es an with discounts instance, where the discounts are 2 timessol’s connection costs. Letalg be the solution output by the algorithm of Lemma 3.4 for this problem. We also consider an additional virtualsol g solution , whose connection costs are deined as follows: For clients j such that alg’s connection cost is greater than 18 times sol’s but less than 18 p timessol’s, we multiply sol’s g g connection costs byp to obtain connection costssol in. For all other clients, the connection cost sol is inthe same g g as that insol. Now, alg and 18·sol satisfy the condition in Lemma 3.8 and ·sol18is a(54pα, 54pβ )-approximation. Our goal in the rest of the proof is to bound the regralg et ofagainst (a constant times) sol g by (a constant times) the minimum regr mr et. Let us denote this regret: f g ′ ′ reg := max alg (C ) − 18· sol (C ) . p p C ⊆C ′ ′ ′ Note that if reg = 0 (it can’t be negative), then for all realizations C , alg (C ) ≤ 18· sol g (C ). In that case, the p p lemma follows immediately. So, we assume reg that > 0. f g ′ ′ Let C = argmax alg (C ) − 18· sol g (C ) , i.e., the realization deining reg that maximizes the regret 1 p p C ⊆C for theℓ objective. We need to relate reg to the regret in theℓ -objective for us to use the approximation guarantees ofℓ -clustering with discounts from Lemma 3.4. Lemma 3.8 gives us this relation, since it tells us that C is exactly the set of clients for which alg’s closest cluster center is at a distance of more than 18 times that ofsol’s closest cluster center. But, this means that C also maximizes the regret forℓthe objective, i.e., ACM Trans. Algor. 18 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi f g p p ′ p ′ C = argmax alg (C ) − 18 · sol g (C ) . Then, we have: 1 p p f g f g p p p p ′ p ′ ′ p ′ max ′ alg (C ) − 18 · sol g (C ) max ′ alg (C ) − 18 · sol g (C ) C ⊆C C ⊆C p p p p reg = ≤ P p−1−j p−1 p−1 j alg (C ) alg (C ) · 18· sol g (C ) 1 1 p 1 j=0 p f g p p ′ p ′ max alg (C ) − 18 · sol (C ) C ⊆C p p ≤ . p−1 alg (C ) p 1 The last inequality follows since connection sol gcosts are atin least those in sol. Note that the numerator in this last expression is exactly the value of the objectiveℓ for -clustering the with discounts problem from Lemma 3.4. Using this lemma, we can now bound the numerator by the optimum for this problem, which in turn is bounded by the objective produced by the minimum regret solution mrs for theℓ -clustering with discounts instance: f g f g p p p p ′ p ′ ′ p ′ ′ ′ max alg (C ) − 18 · sol (C ) max mrs (C ) − 2 · sol (C ) C ⊆C p p 2 C ⊆C p p reg ≤ ≤ · 9 · . 
(6) p−1 p−1 alg (C ) alg (C ) 1 1 p p p p ′ p ′ First, we bound the numerator in the above expression.CLet:= argmax [mrs (C )− 2 · sol (C )] be the 2 ′ p p C ⊆C realization that maximizes this term. We now relate this mrterm (the irst to step is by factorization and the second step holds because 2· sol = 6· frac exceeds the optimal integer solution by to the upper bound of 3 on the integrality gap [4]): f g p p ′ p ′ max mrs (C ) − 2 · sol (C ) = mrs (C ) − 2· sol (C ) p 2 p 2 p p p−1 p−1−j · mrs (C ) · (2· sol (C )) 2 p 2 j=0 p−1 p−1−j ≤ mr· mrs (C ) · (2· sol (C )) . p 2 p 2 j=0 Using the above bound in Eq. (6), we get: p−1 j p−1−j mrs (C ) · (2· sol (C )) 2 p 2 2 j=0 p reg ≤ · 9 · mr· . (7) p−1 alg (C ) In the rest of proof, we obtain a bound on the last term in(7)Eq. . We consider two cases. Ifalg (C ) ≥ p 1 9p · mrs (C ) (intuitively, the denominator is large compared to the numerator), then: p 2 p−1 j p−1−j p−1 p−1−j mrs (C )2 sol (C ) 2 2 mrs (C ) 1 p p 2 j=0 p −p+1 −p+1 1/p ≤ p · ≤ p · (9p ) = 9 p , p−1 p−1 alg (C ) alg (C ) 1 1 p p p−1 The irst step uses the fact thatmrs (C ) ≥ 2sol (C ), so the largest term in the summrs is (C ). Combined p 2 p 2 2 1/p with Eq. (7) this reg ≤ 6p · mr, giving the lemma statement. Ifalg (C ) < 9p · mrs (C ), then we cannot hope to meaningfully bound(7)Eq . In this case, howeverreg , p 1 p 2 is also bounded by mrs (C ), which we will eventually bound mr. Mor by e formally, by our assumption that p 2 p p reg > 0, mrs (C ) − 2 · sol (C ) > 0 and we have 2sol (C ) ≤ mrs (C ) ≤ sol (C ) + mr. The irst inequality 2 2 p 2 p 2 p 2 p p p p is by our assumption that mrs (C ) − 2 · sol (C ) > 0, the second inequality is by deinition mrs, mr ofand 2 2 p p ACM Trans. Algor. Universal Algorithms for Clustering Problems • 19 the fact thatsol(C ) upper bounds opt(C ). In turn, sol (C ) ≤ mr which gives mrs (C ) ≤ 2mr and thus 2 2 p 2 p 2 1/p 1/e reg ≤ alg (C ) ≤ 18p · mr. We note that for all p ≥ 1, p ≤ e ≤ 1.5. □ p 1 We note that the inal step requires using discounts equal tosol twice ’s connection costs instead of just sol’s connection costs. If we did the latter, we would have started with the ine sol quality (C ) ≤ mrs (C ) ≤ p 2 p 2 sol (C ) + mr instead, which does not give us any useful bound solon (C ) or mrs(C ) in terms of just mr. We p 2 2 2 also note that we chose not to optimize the constants in the inal result of Lemma 3.9 in favor of simplifying the presentation. Theorem 3.1 now follows by using the values (α, βof ) from Lemma 3.7 (and a suiciently small choice of the error parameter ϵ) in the statement of Lemma 3.9 above. 4 UNIVERSAL k-CENTER In the previous section, we gave universal algorithms forℓgeneral -clustering problems. Recall that k-center the objective, deined as the maximum distance of a client from its closest cluster center, can also be interpreted as the ℓ -objective in the ℓ -clustering framework. Moreover, it is well known thatnfor -dimensional any vector, ∞ p itsℓ and ℓ norms difer only a constant factor (see Fact A.1 in Appendix A). Therefore, choposing = logn logn ∞ in Theorem 3.3 gives poly-logarithmic approximation bounds for thek-center universal problem. In this section, we give direct techniques that improve these bounds to constants: Theorem 4.1. There exists a(3, 3)-approximate algorithm for the universal k-center problem. Recall that F is the set of all cluster centers, min so c gives the smallest distance from client j to any cluster i∈F ij center that can be opened. 
In turn, for every client j, its distance to the closest cluster center in the minimum regret solution mrs, min_{i∈mrs} c_ij, must be at most mr_j := min_{i∈F} c_ij + mr. This is because in the realization where only j appears, the optimal solution has cost min_{i∈F} c_ij, and the cost of mrs is just min_{i∈mrs} c_ij. So, we design an algorithm alg that 3-approximates these distances mr_j, i.e., for every client j, its distance to the closest cluster center in alg is at most 3·mr_j. Indeed, this algorithm satisfies a more general property: given any value r, it produces a set of cluster centers alg such that every client j is at distance at most 3r_j from its closest cluster center (that is, min_{i∈alg} c_ij ≤ 3r_j), where r_j := min_{i∈F} c_ij + r. Moreover, if r ≥ mr, then the number of cluster centers selected by alg is at most k (for smaller values of r, alg might select more than k cluster centers).

Our algorithm alg is a natural greedy algorithm. We order clients j in increasing order of r_j, and if a client j does not have a cluster center within distance 3r_j in the current solution, then we add its closest cluster center in F to the solution.

Lemma 4.2. Given a value r, the greedy algorithm alg selects cluster centers that satisfy the following properties:
• Every client j is within a distance 3r_j = 3(min_{i∈F} c_ij + r) from its closest cluster center.
• If r ≥ mr, then alg does not select more than k cluster centers, i.e., the solution produced by alg is feasible for the k-center problem.

Proof. The first property follows from the definition of alg. To show that alg does not pick more than k cluster centers, we map the cluster center added to alg because of some client j to the cluster center in mrs that is closest to j; note that when r ≥ mr, this cluster center is at distance at most mr_j ≤ r_j from j. Now, we claim that no two cluster centers i_1, i_2 in alg can be mapped to the same cluster center i in mrs. Clearly, this proves the lemma since mrs has only k cluster centers.

Suppose i_1, i_2 are two cluster centers in alg mapped to the same cluster center i in mrs. Assume without loss of generality that i_1 was added to alg before i_2. Let j_1, j_2 be the clients that caused i_1, i_2 to be added; since i_2 was added later, we have r_{j_1} ≤ r_{j_2}. The distance from j_2 to i_1 is at most the length of the path (j_2, i, j_1, i_1) (see Fig. 2), which is at most 2r_{j_1} + r_{j_2} ≤ 3r_{j_2}. But, in this case j_2 would not have added a new cluster center i_2, thus arriving at a contradiction. □

Fig. 2. Two clients j_1, j_2 that are at distance at most r_{j_1}, r_{j_2} respectively from the same cluster center i in mrs cannot cause alg to add two different cluster centers i_1, i_2.

We now use the above lemma to prove Theorem 4.1.

Proof of Theorem 4.1. In any realization C' ⊆ C, the optimal value of the k-center objective is opt(C') ≥ max_{j∈C'} min_{i∈F} c_ij, whereas the solution produced by the algorithm alg given above has objective value at most 3(max_{j∈C'} min_{i∈F} c_ij + r). So, alg's solution costs at most 3·opt(C') + 3r for all realizations C' ⊆ C. So, if we were able to choose r = mr, we would prove the theorem. But, we do not know the value of mr (in fact, computing it is NP-hard). Instead, we increase the value of r continuously until alg produces a solution with at most k cluster centers. By Lemma 4.2, we are guaranteed that this will happen for some r ≤ mr, which then proves the theorem.
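Before the final implementation detail, here is a minimal sketch of the greedy subroutine alg and the outer search over r (our own Python rendering; dist is assumed to be a nested dict of center-to-client distances, and candidate_rs is the polynomially large set of candidate regret values discussed next).

```python
def greedy_centers(dist, F, C, r):
    """Greedy algorithm of Lemma 4.2: process clients in increasing order of
    r_j = min_{i in F} dist[i][j] + r, opening the closest cluster center for
    any client not yet within distance 3 * r_j of the current solution."""
    centers = []
    for j in sorted(C, key=lambda j: min(dist[i][j] for i in F) + r):
        r_j = min(dist[i][j] for i in F) + r
        if not centers or min(dist[i][j] for i in centers) > 3 * r_j:
            centers.append(min(F, key=lambda i: dist[i][j]))
    return centers

def universal_k_center(dist, F, C, k, candidate_rs):
    """Try candidate regret values in increasing order; by Lemma 4.2 the first
    value for which at most k centers are opened is at most mr, which yields a
    (3, 3)-approximate universal solution."""
    for r in sorted(candidate_rs):
        centers = greedy_centers(dist, F, C, r)
        if len(centers) <= k:
            return centers, r
    raise ValueError("no candidate value of r produced at most k centers")
```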
Our final observation is that this algorithm can be implemented in polynomial time, since there are only polynomially many possibilities for the k-center objective across all realizations (namely, the set of all cluster center to client distances), and thus polynomially many possible values for mr (namely, the set of all differences between possible solution costs). So, we only need to run alg for these values of r in increasing order. □

We note that the greedy algorithm described above can be viewed as an extension of the k-center algorithm in [33] to a (3, 3)-approximation for the "k-center with discounts" problem, where the discounts are the minimum distances min_{i∈F} c_ij.

5 UNIVERSAL k-MEDIAN WITH FIXED CLIENTS

In this section, we extend the techniques from Section 2 to prove the following theorem:

Theorem 5.1. If there exists a deterministic polynomial time γ-approximation algorithm for the k-median problem, then for every ϵ > 0 there exists a (54γ + ϵ, 60)-approximate universal algorithm for the k-median problem with fixed clients.

By using the derandomized version of the (2.732 + ϵ)-approximation algorithm of Li and Svensson [44] for the k-median problem, and an appropriate choice of both ϵ parameters, we obtain the following corollary from Theorem 5.1.

Corollary 5.2. For every ϵ > 0, there exists a (148 + ϵ, 60)-approximate universal algorithm for the k-median problem with fixed clients.

Our high level strategy comprises two steps. In Section 5.2, we show how to find a good fractional solution by approximately solving a linear program. In Section 5.3, we then round the fractional solution in a manner that preserves its regret guarantee within constant factors. As discussed in Section 1.1, for simplicity our algorithm's description and analysis will avoid the notion of demands and instead equivalently view the input as specifying a set of fixed and unfixed clients, of which multiple might exist at the same location.

5.1 Preliminaries

In addition to the preliminaries of Section 2, we will use the following tools:

Submodular Maximization over Independence Systems. An independence system comprises a ground set E and a set of subsets (called independent sets) I ⊆ 2^E with the property that if A ⊆ B and B ∈ I then A ∈ I (the subset closed property). An independent set S in I is maximal if there does not exist S' ⊃ S such that S' ∈ I. Note that one can define an independence system by specifying the set of maximal independent sets only, since the subset closed property implies I is simply all subsets of sets in I. An independence system is a 1-independence system (or 1-system in short) if all maximal independent sets are of the same size. The following result on maximizing submodular functions over 1-independence systems follows from a more general result given implicitly in [48] and more formally in [14].

Theorem 5.3. There exists a polynomial time algorithm that, given a 1-independence system (E, I) and a non-negative monotone submodular function f: 2^E → R^+ defined over it, finds a 1/2-maximizer of f, i.e. finds S' ∈ I such that f(S') ≥ (1/2) · max_{S∈I} f(S).

The algorithm in the above theorem is the natural greedy algorithm, which starts with S' = ∅ and repeatedly adds to S' the element u that maximizes f(S' ∪ {u}) while maintaining that S' ∪ {u} is in I, until no such addition is possible.

Incremental ℓ_p-Clustering.
We will also use the incremental ℓ_p-clustering problem, which is defined as follows: Given an ℓ_p-clustering instance and a subset of the cluster centers S (the "existing" cluster centers), find the minimum cost solution to the ℓ_p-clustering instance with the additional constraint that the solution must contain all cluster centers in S. When S = ∅, this is just the standard ℓ_p-clustering problem; more generally, this problem is equivalent to the standard ℓ_p-clustering problem by the following lemma:

Lemma 5.4. If there exists a γ-approximation algorithm for the ℓ_p-clustering problem, there exists a γ-approximation for the incremental ℓ_p-clustering problem.

Proof of Lemma 5.4. The γ-approximation for incremental ℓ_p-clustering is as follows: Given an instance I of incremental ℓ_p-clustering with clients C and existing cluster centers S, create an ℓ_p-clustering instance I' which has the same cluster centers and clients as the ℓ_p-clustering instance, except that at the location of every cluster center in S, we add a client with demand (γ|C|^{1/p} max_{i,j} c_ij + 1)^p.

Let T* be the solution that is a superset of S of size k that achieves the lowest cost of all such supersets in instance I. Let T' be the output of running a γ-approximation algorithm for ℓ_p-clustering on I'. Then we wish to show T' is a superset of S and has cost at most γ times the cost of T* in instance I.

Any solution that buys all cluster centers in S has the same cost in I and I'. Then we claim it suffices to show that T' is a superset of S. If T' is a superset of S, then since both T' and T* are supersets of S and since T' is a γ-approximation in instance I', its cost in I' is at most γ times the cost of T* in I'. This in turn implies T' has cost at most γ times the cost of T* in I, giving the Lemma.

Assume without loss of generality that no two cluster centers are distance 0 away from each other (and, by rescaling distances, that all non-zero distances are at least 1). To show that T' is a superset of S, note that in instance I' any solution that does not buy a superset of S is at least distance 1 from the location of some cluster center in S, and thus pays cost at least γ|C|^{1/p} max_{i,j} c_ij + 1 due to one of the added clients. On the other hand, any solution that is a superset of S is distance 0 from all the added clients and thus only has to pay connection costs on clients in C, which in turn means it has cost at most |C|^{1/p} max_{i,j} c_ij. Since T' is the output of a γ-approximation algorithm, T' thus has cost at most γ|C|^{1/p} max_{i,j} c_ij, which means T' must be a superset of S. □

5.2 Obtaining a Fractional Solution for Universal k-Median with Fixed Clients

Let C_f ⊆ C denote the set of fixed clients, and for any realization of clients C' satisfying C_f ⊆ C' ⊆ C, let opt(C') denote the cost of the optimal solution for C'. The universal k-median LP is given by:

min r    (r denotes the maximum regret across all demand realizations)
s.t.  ∑_{i∈F} x_i ≤ k    (x_i = 1 if we open cluster center i)
      ∀i ∈ F, j ∈ C: y_ij ≤ x_i    (y_ij = 1 if cluster center i is serving client j)
      ∀j ∈ C: ∑_{i∈F} y_ij ≥ 1
      ∀C': C_f ⊆ C' ⊆ C: ∑_{j∈C'} ∑_{i∈F} c_ij y_ij − opt(C') ≤ r    (8)
      ∀i ∈ F, j ∈ C: x_i, y_ij ∈ [0, 1]

Note that Eq. (8) and the objective function distinguish this LP from the standard k-median LP. We call Eq. (8) the regret constraint set. For a fixed fractional solution x, y, r, our goal is to approximately separate the regret constraint set, since all other constraints can be separated exactly. In the rest of this subsection, we describe our approximate separation oracle and give its analysis.
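As a warm-up for the separation oracle developed next, note that the inner maximization in the regret constraint set is easy for a fixed integral solution S: the worst realization contains all fixed clients plus exactly those clients for which the fractional assignment pays more than S does. A minimal sketch (our own code and data conventions, with c and y as nested dicts indexed by center and then client):

```python
def frac_cost(c, y, F, j):
    """Connection cost of client j under the fractional assignment y."""
    return sum(c[i][j] * y[i][j] for i in F)

def regret_against(S, c, y, F, C, C_fixed):
    """Exact value of max over realizations C' (with C_fixed <= C' <= C) of
    sum_{j in C'} frac_cost(j) - S(C'), for a fixed set of centers S."""
    def s_cost(j):
        return min(c[i][j] for i in S)
    worst = set(C_fixed) | {j for j in C if frac_cost(c, y, F, j) > s_cost(j)}
    return worst, sum(frac_cost(c, y, F, j) - s_cost(j) for j in worst)
```

The hard part, maximizing this quantity over all feasible choices of S as well, is what the separation oracle below handles via submodular maximization.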
′ ′ ′ Let S (C ) denote the cost of the solution S ⊆ F in realization C (that is,S (C ) = ′ min c ). Since j∈C i∈S ij ′ ′ opt(C ) = min S (C ), separating the regret constraint set exactly is equivalent to deciding if the S:S⊆F,|S|=k following holds:   X X     ∀S : S ⊆ F ,|S| = k : max  c y − S (C ) ≤ r . (9) ij ij ′ ′   C :C ⊆C ⊆C  ′  j∈C i∈F   P P ′ ′ By splitting the terms ′ c y and S (C ) into terms for C and C \ C , we can rewrite Eq.(9) as follows: j∈C i∈F ij ij f f ACM Trans. Algor. Universal Algorithms for Clustering Problems • 23 X X max c y − S (C ) ≤ r ij ij C ⊆C ⊆C,S⊆F,|S|=k j∈C i∈F   X X     ∀S ⊆ F ,|S| = k : max c y − S (C ) ≤ r ij ij   C ⊆C ⊆C   j∈C i∈F    X X X X    ∀S ⊆ F ,|S| = k : max  c y + c y − S (C \ C ) − S (C ) ≤ r ij ij ij ij f f ′   C ⊆C ⊆C ′   j∈C \C i∈F j∈C i∈F     X X X X     ∀S ⊆ F ,|S| = k : max  c y − S (C \ C ) ≤ S (C ) − c y + r ij ij f f ij ij ′   C ⊆C ⊆C  ′  j∈C \C i∈F j∈C i∈F f f      X X  X X   ∀S ⊆ F ,|S| = k : max  c y − S (C ) ≤ S (C ) − c y + r ij ij f ij ij ∗   C ⊆C\C  ∗  j∈C i∈F j∈C i∈F   For fractional solution y, let   X X     f (S) = max  c y − S (C ) . (10) y ij ij ∗ ∗   C :C ⊆C\C  ∗  j∈C i∈F   Note that we can compute f (S) for anyS easily since the maximizing value C is the of set of clients j for which S has connection cost less than c y . We already knowf (S) is not submodular. But, the term S (C ) is ij ij y i∈F f not ixed with respect to S, so maximizing f (S) is not enough to separate Eq.(8). To overcome this diiculty, for every possible cost M on the ixed clients, we replace S (C ) withM and only maximize over solutions S for whichS (C ) ≤ M (for convenience, we will call any Ssolution for which S (C ) ≤ M an M-cheap solution): f f ( ) X X ∀M ∈ 0, 1, . . . ,|C | maxc : max f (S) ≤ M − c y + r . (11) f ij y ij ij i, j S:S⊆F,|S|=k,S (C )≤M j∈C i∈F Note that this set of inequalities is equivalent (8), but to Eq. it has the advantage that the left-hand side is approximately maximizable and the right-hand side is ixed. Hence, these inequalities can be approximately separated. However, there are exponentially many inequalities; so, forϵany > 0, ixe wedrelax to the following polynomially large set of inequalities: ( ) ⌈log (|C | max c )⌉+1 f i, j i j 1+ϵ ∀M ∈ 0, 1, 1 + ϵ, . . . , (1 + ϵ ) : X X max f (S) ≤ M − c y + r . (12) y ij ij S:S⊆F,|S|=k,S (C )≤M j∈C i∈F Separating inequality(12) Eq.for a ixedM corresponds to submodular maximization f (of S), but now subject to the constraints|S| = k and S (C ) ≤ M as opposed to just |S| = k. Let S be the set of all S ⊆ F such that f M |S| = k and S (C ) ≤ M. Sincef (S) is monotone, maximizing f (S) over S is equivalent to maximizing f (S) f y y M y over the independence system(F ,I ) with maximal independentSsets . M M Then all that is needed to approximately separate (12)Eq. corresponding to a ixeM d is an oracle for deciding ′ ′ ′ membership in (F ,I ). Recall that S ⊆ F is in (F ,I ) if there exists a Sset⊇ S such that|S | = k and S (C ) ≤ M. M M But, even deciding membership of the empty set (Fin ,I ) requires one to solveka-median instance on the ixed clients, which is in general NP-hard. More generally, we are required to solve an instance of the incremental k-median problem (see Section 5.1) with existing cluster centers S. in ACM Trans. Algor. 24 • Arun Ganesh, Bruce M. 
Maggs, and Debmalya Panigrahi While exactly solving incremental k-median is NP-hard, we have a constant approximation algorithm for it (callAit ), by Lemma 5.4. So, we could deine a new system (F ,I ) that contains a setS ⊆ F if the output of A for the incremental k-median instance with existing clusterScenters has cost at most M. But, (F ,I ) may no longer be a 1-system, or even an independence system. To restore the subset closed property, the membership ′ ′ oracle needs to ensure that: (a) if a subset S ⊆ S is determined to not be (in F ,I ), then S is not either, and (b) ′ ′ if a superset S ⊇ S is determined to be (in F ,I ), then so isS. GreedyMax((x,y, r ), F, C , C, M, T , f ): f 0 Input: Fractional solution (x,y, r ), set of cluster centers F, set of ixed clients C , set of all clients C, valueM, F + M-cheap solution T , submodular objectivfe: 2 → R 1: S ← ∅ 2: F ← F 3: for l from 1 tok do 4: for Each cluster centeri inF \ S do l−1 l−1 5: if S ∪ {i} ⊆ T and T isM-cheap then l−1 0 0 6: T ← T l,i 7: else ′ ′ ′ ′ ′ ′ 8: if For some l , i , S ∪ {i} ⊆ T and T isM-cheap then l−1 l ,i l ,i ′ ′ 9: T ← T l,i l ,i 10: else 11: T ← Output ofγ -approximation algorithm on incremental ℓ -clustering l,i p instance with cluster centers F , existing cluster centers S ∪ {i} l−1 l−1 and clients C . 12: end if 13: end if 14: end for 15: F ← F l l−1 16: for Each cluster centeri inF \ S do l l−1 17: if i does not appear in anyT ′ that isM-cheap then l,i 18: F ← F \ {i} l l 19: end if 20: end for 21: S ← S ∪ {argmax f (S ∪ {i})} l l−1 l−1 i∈F \S l l−1 22: end for 23: return S Fig. 3. Modified Greedy Submodular Maximization Algorithm We now give the modiied greedy maximization algorithm GreedyMax that we use to try to separate one of the inequalities in (12),Eq. which uses a built-in membership oracle that ensures the above properties hold. Pseudocode is given in Figure 3, and we informally describ Greed e ityMax here. initializes S = ∅, F = F, and 0 0 starts with aM-cheap k-median solution T (generated by running aγ -approximation on the k-median instance involving only ixed clients C ). In iteration l, GreedyMax starts with a partial solution S withl − 1 cluster f l−1 centers, and it is considering adding cluster centers F toin S . For each cluster centeri inF , GreedyMax l−1 l−1 l−1 generates some k-median solution T containing S ∪ {i} to determine if S ∪ {i} is in the independence l,i l−1 l−1 ACM Trans. Algor. Universal Algorithms for Clustering Problems • 25 ′ ′ system. If a previously generated solution, T or T ′ ′ for anyl , i , containsS ∪ {i} and isM-cheap, then T 0 l ,i l−1 l,i is set to this solution. Otherwise Greed, yMax runs the incremental k-median approximation algorithm on the instance with existing cluster centers S in ∪ {i}, the only cluster centers in the instanceF are, and the client l−1 l−1 set isC . It sets T to the solution generated by the approximation algorithm. f l,i After generating the set of solutions {T } , if one of these solutions contains S ∪ {i} and isM-cheap, i∈F l,i l−1 l−1 then GreedyMax concludes thatS ∪{i} is in the independence system. This, combined with the fact that these l−1 solutions may be copied from previous iterations ensures property (b) holds M-cheap (as thesolutions generated by GreedyMax are implicitly considered to be in the independence system). Otherwise Greed , since yMax was unable to ind an M-cheap superset ofS ∪ {i}, it considers S ∪ {i} to not be in the independence system. 
In l−1 l−1 accordance with these beliefs, GreedyMax initializes F as a copy ofF , and then removes any i such that it l l−1 did not ind an M-cheap superset ofS ∪ {i} fromF and thus from future consideration, ensuring property l−1 l (a) holds. It then greedily addsS to the i inF that maximizes f (S ∪ {i}) as deined before to create a new l−1 l y l−1 partial solution S . After thekth iteration, GreedyMax outputs the solution S . l k SepOracle((x,y, r ), F, C , C): Input: A fractional solution x,y, r, set of cluster centers F, set of ixed clients C , set of all clients C 1: if Any constraint in the universal k-median LP except the regret constraint set is violate thendreturn the violated constraint 2: end if 3: T ← output ofγ -approximation algorithm A fork-median run on instance with cluster centers F, clients C 2 ⌈log (γ |C | max c )⌉+1 i, j i j 1+ϵ f 4: for M ∈ {0, 1, 1 + ϵ, (1 + ϵ ) , . . . (1 + ϵ ) } such that T isM-cheap do 5: S ← GreedyMax((x,y, r ), F, C , C, M, T , f ) f 0 y P P ′ ′ ∗ 6: C ← argmax ∗ [ ∗ c y − S (C )] ij ij j∈C i∈F C ⊆C\C P P P P 7: if ′ c y − S (C ) > M − c y + r then j∈C i∈F ij ij j∈C i∈F ij ij P P P P 8: return ′ c y − S (C ) ≤ M − c y + r ij ij ij ij j∈C i∈F j∈C i∈F 9: end if 10: end for 11: return łFeasiblež Fig. 4. Approximate Separation Oracle for Universalk-Median Our approximate separation oracle SepOra , cle, can then use GreedyMax as a subroutine. Pseudocode is given in Figure 4, and we give an informal description of the algorithm SepOracle hereche . cks all constraints except the regret constraint set, and then outputs any violated constraints it inds. If none are found, it then k-me runs dian a approximation algorithm on the instance containing only the ixed clients to generate T . For a solution each M that is 0 or a power of+1ϵ (as in Eq.(12)), if T isM-cheap, it then invokes GreedyMax for this valueMof(otherwise, GreedyMax will consider the corresponding independence system to be empty, so there is no point in running P P P P it), passing T to GreedyMax. If then checks the inequality ′ c y − S (C ) ≤ M − c y + r for 0 ij ij ij ij j∈C i j∈C i the solution S outputted by GreedyMax, and outputs this inequality if it is violated. This completes the intuition behind and description of the separation oracle. We now move on to its analysis. First, we show thatGreedyMax always inds a valid solution. Lemma 5.5. GreedyMax always outputs a set S of sizek when called by SepOracle. ACM Trans. Algor. 26 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi Proof. Note that GreedyMax is only invokeTd if isM-cheap. This implies some T isM-cheap since some 0 1,i T will be initialize T . Then, d toit suices to show that in the lth iteration, there is some i that can be added to 1,i 0 S . If this is true, it implies S is of size k sincek elements are added across all k iterations. l k This is true in iteration 1 becauseTsome isM-cheap and thus any element of T is in F and can be added. 1,i 1,i 1 Assume this is inductively true in iteration l, i.ei. is added toS in iteration l because i is in some M-cheap T . l l,i ′′ ′ ′ ′ SinceT isM-cheap, no element of T is deleted from F . Then in iteration l + 1, for all i inT \ S (a set of l,i l,i l l,i l ′′ ′′ ′ sizek − l, i.e. non-empty),T can be initialize T d.to Then all such i can be added to S because all such l+1,i l,i l+1 ′′ ′′ i satisfy that T isM-cheap and thus are inF . By induction, in all iterations, therei that is some can be l+1,i l+1 added to S , giving the Lemma. 
□ Then, the following lemma assertsGreed that yMax is indeed performing greedy submodular maximization over some 1-system. Lemma 5.6. Fix any run ofGreedyMax. Consider the values ofS ,T , F for all l, i (deined as in Figure 3) at l l,i l the end of this run. LetB be the set containing S ∪ {i} for each l, i < F , i < S . Let (F ,S) be the independence l−1 l l−1 system for which the set of maximal independent sets S consists of all size k subsets S of F such that no subset of max S is inB, and S isM-cheap. Then the following properties hold: (1) For any l and any i < F , S ∪ {i} is not inS l l−1 (2) For any l and any i ∈ F , S ∪ {i} is inS l l−1 (3) (F ,S) is a 1-system Proof. Property 1: This property is immediate from the deinition B and S of. Property 2: Fix anyl, i ∈ F . We want to show that S ∪ {i} is inS. Sincei ∈ F , there exists someT ′ such l l−1 l l,i that S ∪ {i} is a subset of T ′ and T ′ isM-cheap (otherwisei, would have been deleted from F ). If we can l−1 l,i l,i l show T ′ is inS then we immediately get that S ∪ {i} is inS. l,i max l−1 ′′ Suppose not. SinceT ′ isM-cheap, this must be because some subset of T ′ is of the form S ′ ∪ {i } for l,i l,i l −1 ′′ ′′ ′ ′ i < F ′, i < S ′ . In particular, consider the smallestlvalue for which this is true, i.e l .blet e the iteration in l l −1 ′′ whichi was deleted from F ′. ′ ′′ ′′ Ifl < l, sincei was deleted from F ′, i cannot appear in anyM-cheap solution containing S ′ generated l l −1 by the incremental k-median approximation algorithm before the end of iteration l (otherwiseT, ′ ′′ could be l ,i ′′ ′′ initialized to this solution, pr i evfrenting om being deleted). Since i is not in F ′ (and thus not inF ′ . . . F ), l l +1 l ′ ′′ in iterations l + 1 to l the approximation algorithm is not allowei d.to Souse no M-cheap solution is ever ′′ ′ ′ generated by the approximation algorithm which is a supSerset∪of{i }. But T is aM-cheap superset of l −1 l,i ′′ S ∪ {i } which must have been generated by the approximation algorithm at some point, a contradiction. l −1 ′ ′′ ′ ′ ′ Thus we can assume l ≥ l. However, recall that T is anM-cheap solution containSing∪{i }. Ifl = l, this l,i l ′′ ′ ′ ′ ′′ prevents i from being deleted in iteration l , giving a contradiction. l >Ifl then T can be initialized to a l ,i ′′ ′′ ′ ′ M-cheap superset ofS ∪ {i }, sinceT is such a superset. This also prevents i from being deleted in iteration l l,i ′′ l , giving a contradiction. ′ ′ In all cases assuming T is not inS leads to a contradiction,Tso is inS and thus S ∪ {i} is inS. l,i max l,i max l−1 Property 3: S is deined so that all maximal independent sets arke,of giving size the property. □ Corollary 5.7. Any run of GreedyMax outputs an M-cheap, -maximizer fof(S) over the system (F ,S) as deined in Lemma 5.6. Proof. Properties 1 and 2 in Lemma 5.6 imply that at each Greed step, yMax adds the element to its current solution that maximizes the objectiv f (S)ewhile maintaining that the current solution S. Thus is in GreedyMax is exactly the greedy algorithm in Theorem 5.3 for maximizing a monotone submodular objective over an independence system. By Lemma 5.5,GreedyMax always inds a maximal independent set, and the deinition of ACM Trans. Algor. Universal Algorithms for Clustering Problems • 27 S guarantees that this maximal independent M set-cheap is . Lemma 2.4 gives thatf (S) is a monotone submodular function of S. Then, Property 3 combined with Theorem 5.3 implies the solution output Greedby yMax is a -maximizer. 
□ Of course, maximizing over an arbitrary 1-system is of little use. In particular, we would like to show that the 1-system Lemma 5.6 shows we are maximizing over approximates the 1-system of subsets of solutions whose cost on the ixed clients is at most M. The next lemma shows that while all such solutions may not be in this 1-system, all solutions that ar -cheap e are. Lemma 5.8. In any run of GreedyMax, let S be deined as in Lemma 5.6. For the value M passed to GreedyMax in this run and any solution S which is-cheap, S ∈ S. Proof. Fix any suchS. Let B be deined as in Lemma 5.6. For any element B ofB, it must be the case that running aγ -approximation on the incremental k-median instance with existing clusterBcenters produced a solution with cost greater than M. This implies that forB any inB, the incremental k-median instance with M ′ existing cluster centers B has optimal solution with cost greater than . However, for any subsetS ofS, the ′ M optimal solution to the incremental k-median instance with existing clusterScenters has cost at most sinceS is a feasible solution to this instance which -cheap.is Thus no subset ofS is inB, and hence S is inS. □ Lastly, we show thatSepOracle never incorrectly outputs that a point is infeasible, i.e., that the region SepOracle considers feasible strictly contains the region that is actually feasiblek-me in dian the univ LP. ersal Lemma 5.9. If x,y, r is feasible for the universal k-median LPSepOra , cle outputs łFeasible.ž Proof. SepOracle can exactly check all constraints besides the regret constraint set, so assume that if P P P P SepOracle outputs that x,y, r is not feasible, it outputs that′ c y −S (C ) ≤ M− c y +r j∈C i∈F ij ij j∈C i∈F ij ij is violated for some M, S. In particular, it only outputs that this constraint is violated if it actually is violated. If this constraint is violated, then since by Corollar S isMy-cheap: 5.7 X X X X c y − S (C ) > M − c y + r ij ij ij ij j∈C i∈F j∈C i∈F X X X X c y + c y > S (C ) + M + r ij ij ij ij j∈C i∈F j∈C i∈F X X c y > S (C ) + M + r ij ij j∈C ∪C i∈F ′ ′ ′ ≥ S (C ) + S (C ) + r = S (C ∪ C ) + r ≥ opt(C ∪ C ) + r f f f Which implies the pxoint ,y, r is not feasible for the univ k-me ersal dian LP. □ We now have all the tools to prove our overall claim: Lemma 5.10. If there exists a deterministic polynomial-time γ -approximation algorithm forkthe -median problem, then for every ϵ > 0 there exists a deterministic algorithm that outputs (2γ (1a+ ϵ ), 2)-approximate fractional solution to the universal k-median problem in polynomial time. Proof. We use the ellipsoid method wher SepOra e cle is used as the separation oracle. By Lemma 5.9 since the minimum regret solution is a feasible solution to the k-me univ dian ersal LP, it is also considered feasible by ∗ ∗ ∗ ∗ SepOracle. Then, the solution x ,y , r output by the ellipsoid method satisies r ≤ mr. ACM Trans. Algor. 28 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi ∗ ∗ ∗ ∗ ∗ Suppose the ellipsoid method outputs x ,y , r such that x ,y are not a (2γ (1 + ϵ ), 2)-approximate solution. This means there exists S,C ⊆ C ⊆ C such that: X X ∗ ′ c y >2γ (1 + ϵ )S (C ) + 2· mr ij ij j∈C i∈F X X ∗ ′ ′ c y − S (C \ C ) >2γ (1 + ϵ )S (C ) + (2γ (1 + ϵ ) − 1)S (C \ C ) ij f f f ij j∈C \C i∈F X X − c y + 2· mr ij ij j∈C i∈F   X X     ≥2 γ (1 + ϵ )S (C ) − c y + mr . f ij ij     j∈C i∈F   2 ⌈log (γ |C | max c )⌉+1 f i, j i j 1+ϵ Thus, for the value of M in the set{0, 1, 1 + ϵ, (1 + ϵ ) , . . . 
(1 + ϵ ) } contained in the interval [γS (C ),γ (1 + ϵ )S (C )], we have f f     X X  X X   X X      ∗ ′ ∗ ∗ ∗ c y − S (C \ C ) ≥ 2 M − c y + mr ≥ 2 M − c y + r  . ij f ij ij ij ij ij     ′     j∈C \C i∈F j∈C i∈F j∈C i∈F f f f     The last inequality follows r ≤ since mr. Then, consider the iterationSepOra in cle where it runsGreedyMax for this valueMof . SinceM ≥ γS (C ), S is -cheap. Thus by Lemma 5.8, S is part of the independence system S speciied in Lemma 5.6 which GreedyMax inds a maximizer for in this iteration, and thus the maximum of P P ∗ ∗ the objective in this independence system is at least M − 2[ c y + r ]. By Corollary 5.7, SepOracle ij j∈C i f ij P P ′ ′′ ′ ∗ ′ ′′ thus inds someS ,C ⊆ C \ C such that S isM-cheap and for which ′′ c y − S (C ) is at least f ij j∈C i ij P P ∗ ∗ ∗ ∗ ∗ M − c y + r . But this meansSepOracle will output that x ,y , r is infeasible, which means the ij j∈C i f ij ellipsoid algorithm cannot output this solution, a contradiction. □ 5.3 Rounding the Fractional Solution for Universalk-Median with Fixed Clients Proof of Theorem 5.1. The algorithm is as follows: Use the algorithm of Lemma 5.10 with error parameter ϵ ϵ to ind a(2γ (1 + ), 2)-approximate fractional solution. f bLet e the connection cost of this fractional 54γ 54γ solution for client j. Construct a k-median with discounts instance with the same clients C and cluster centers F where client j has discount 0 if it was originally a ixed client, and f if discount it was3originally a unixed client. The solution to this instance given by Lemma 2.3 is the solution forkthe -meuniv dianersal instance. Again using the integrality gap upper bound of k-me 3 for dian, we have:   X X X   ′ ′  ′  + mr = max[mrs(C ) − opt(C )] ≥ max mrs(C ) − 3 f  = (m − 3f ) + (m − 3f ) . (13) j j j j j ′ ′   C C  ′  j∈C j∈C j∈C\C f f   The cost of the minimum regret solution in k-me the dian with discounts instance is given by: X X X X X X + + m + (m − 3f ) = 3f + (m − 3f ) + (m − 3f ) ≤ 3f + mr, by Eq. (13). (14) j j j j j j j j j j∈C j∈C j∈C j∈C j∈C\C j∈C\C f f f f f f ACM Trans. Algor. Universal Algorithms for Clustering Problems • 29 Let c be the connection cost of the algorithm’s solution for j. client Lemma 2.3 and Eq. (14) give:   X X X     c + (c − 9· 3f ) ≤ 6  3f + mr j j j j     j∈C j∈C\C j∈C f f f   X X X c + (c − 27f ) ≤ 18f + 6· mr j j j j (15) j∈C j∈C j∈C\C f f f   X X X X     +   =⇒ max c − 27 f = (c − 27f ) + (c − 27f ) ≤ 6· mr. j j j j j j ′    ′ ′  j∈C j∈C j∈C j∈C\C   Lemma 5.10 then gives that for any valid C : ′ ′ frac(C ) = f ≤ 2γ 1 + · opt(C ) + 2· mr. (16) 54γ j∈C Using Eq. (15) and (16), we can then conclude that X X ′ ′ ∀C ⊆ C ⊆ C : c ≤ 27 f + 6· mr ≤ (54γ + ϵ ) · opt(C ) + 60· mr. □ f j j ′ ′ j∈C j∈C 6 UNIVERSAL ℓ -CLUSTERING WITH FIXED CLIENTS In this section, we give the following theorem: Theorem 6.1. For all p ≥ 1, if there existsγa-approximation for ℓ -clustering, then for allϵ > 0 there exists a 1/p 2 1/p (54pγ · 2 + ϵ, 108p + 6p + ϵ )-approximate universal algorithmℓfor -clustering with ixed clients. In particular, we get from known results [1, 30]: 2 1/p 2 1/p • A (162p · 2 + ϵ, 108p + 18p + ϵ )-approximate universal algorithm ℓ -clustering for with ixed clients for all ϵ > 0, p ≥ 1. • A (459, 458)-approximate universal algorithm k-means for with ixed clients. The algorithm for universal ℓ -clustering with ixed clients follows by combining techniques ℓ -clustering from p p and k-median with ixed clients. 
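Returning briefly to the rounding step in the proof of Theorem 5.1 above, the instance handed to the k-median-with-discounts subroutine (Lemma 2.3) is easy to write down. The sketch below (our own code; the subroutine itself is left abstract) shows the discount assignment and the discounted objective that the subroutine approximately minimizes; Eqs. (14) and (15) then translate its guarantee back into a regret bound.

```python
def build_discounts(frac_cost, fixed_clients, clients):
    """Discounts used in Section 5.3: fixed clients get discount 0, every other
    client gets 3 times its connection cost f_j in the fractional solution."""
    return {j: 0.0 if j in fixed_clients else 3.0 * frac_cost[j] for j in clients}

def discounted_cost(connection_cost, discounts, clients):
    """k-median-with-discounts objective: each client only pays the part of its
    connection cost that exceeds its discount."""
    return sum(max(connection_cost[j] - discounts[j], 0.0) for j in clients)
```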
6.1 Finding a Fractional Solution We reuse the subroutineGreedyMax to do submodular maximization over an independence system whose bases are M-cheap solutions (that is, solutionsℓ with -objective at mostM on only the ixed clients), and use the submodular function f with varying choicesYof as we did for ℓ -clustering. We can extend Lemma 3.5 as y,Y p follows: C C\C f f Lemma 6.2. For any two solutionsy, S, if the global maximum fra of c (I)− S (I) over 1 × [0, 1] is positive, p p C C\C f f then there is a maximizer that is 1 in× {0, 1} , i.e. f g f g ′ ′ max frac (I) − S (I) = max frac (C ) − S (C ) . p p p p C C\C C ⊆C ⊆C f f s I∈1 ×[0,1] : The proof follows exactly the same way as Lemma 3.5. In that proof, the property we use of having no ixed clients is that if the global maximum is not the all zeroes vector, then it isfra positiv c (I) >eSand (I).soIn the p p statement of Lemma 6.2, we just assume positivity instead. This shows that it is still ine to output separating hyperplanes based on fractional realizations of clients in the presence of ixed clients. The only time it is maybe ACM Trans. Algor. 30 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi not ine is in a fractional realization where if the fra łregr c is etž of negative, but in this case we will not output a separating hyperplane anyway. Fixed-ℓ -SepOracle((x,y, r ), F, C , C): p f Input: A fractional solution x,y, r, set of cluster centers F, set of ixed clients C , set of all clients C 1: if Any constraint in the universal ℓ -clustering LP except the regret constraint set is violate then rdeturn the violated constraint 2: end if 3: T ← output ofγ -approximation algorithm A forℓ -clustering run on instance with cluster centers F, 0 p clients C p p 4: c ← min c ,c ← max c min i∈F, j∈C\C max j∈C\C i∈F ij f ij P P 5: Y ← c y ij f j∈C i∈F f ij 1/p 2 ⌈log (γ |C | max c )⌉+1 f i, j i j 1+ϵ 6: for M ∈ {0, 1, 1 + ϵ, (1 + ϵ ) , . . . (1 + ϵ ) } such that T isM-cheap do ′ ′ 2 ′ ⌈log ′ c /c ⌉ max min 1+ϵ 7: for Y ∈ {0,c ,c (1 + ϵ ),c (1 + ϵ ) , . . . c (1 + ϵ ) } do min min min min 8: S ← GreedyMax((x,y, r ), F, C , C, M, T , f ) f 0 y,Y P P P p p P P 9: I ← argmax C C\C p d c y − d min c j ij j i∈S f f j∈C\C i∈F j∈C\C f ij f ij I∈1 ×[0,1] : d c y ≤Y j∈C\C j i∈F i j i j f g P P P p p 1 ′ ′ 10: if d c y − d min c > r then ij i∈S 1−1/p j∈C i∈F j∈C j ij j ij p (Y +Y ) f g P P P p p 1 ′ ′ 11: return d c y − d min c ≤ r 1−1/p j∈C i∈F ij j∈C i∈S j ij j ij p (Y +Y ) 12: end if 13: end for 14: end for 15: return łFeasiblež Fig. 5. Approximate Separation Oracle for Universalℓ -Clustering with Fixed Clients. GreedyMax is the same algorithm as presented in Figure 3 fork-median. 1/p Lemma 6.3. If there exists aγ -approximation for ℓ -clustering, then for allϵ > 0, α = 2 γ (1 + ϵ ), β = 2p(1 + ϵ ) there exists an algorithm that outputs an (α, β )-approximate universal fractional solution ℓ for -clustering with ixed clients. Proof. IfFixed-ℓ -SepOracle ever outputs an inequality in the regret constraint set, for the corresponding q q q Y ,Y , I , S, letfrac (I), sol (I) denote the ℓ costs of the fractional solution S as and before. Then we have by f p p p P P deinitionYofand the constraint that d c y ≤ Y : f j ij j∈C i∈F ij   X X X   p p  ′ ′ r <  d c y − d minc  ≤ ij j ij j ij 1−1/p   p(Y + Y ) i∈S   j∈C i∈F j∈C   f g p p ′ ′ ′ ′ frac (I ) − sol (I ) = frac (I ) − sol (I ). 
P p p p p p−1 j p−1−j ′ ′ frac (I )sol (I ) j=0 p p The second inequality uses that Fixed-ℓ -SepOracle only outputs an inequality in the regret constraint set ′ ′ such that frac (I ) > sol (I ). We then have by Lemma 6.2 that for any feasible fractional solution, the inequality p p output by Fixed-ℓ -SepOracle is satisied. ACM Trans. Algor. Universal Algorithms for Clustering Problems • 31 Now, suppose there exists someI, sol such that frac (I) > αsol (I) + βr (forr ≥ 0). Consider the values of p p P P P P p p Y , M iterated over byFixed-ℓ -SepOracle such that d c y ≤ Y ≤ (1+ϵ )( d c y ) p j∈C\C j i∈F ij j∈C\C j i∈F ij f ij f ij and γ sol (C ) ≤ M ≤ γ (1 + ϵ )sol (C ). Then: p f p f 1/p X X * p + . c y / > αsol (C ) + βr ij p ij j∈C i∈F , - 1/p X X * p + . c y / − αsol (C ) > βr ij p ij j∈C i∈F , - P P p p p ′ ′ c y − α sol (C ) ij j∈C i∈F p ij > βr (i) P P 1−1/p ′ c y j∈C i∈F ij ij P P p p p ′ (1 + ϵ ) ′ c y − α sol (C ) ij j∈C i∈F p ij > βr (ii) 1−1/p Y + Y 1−1/p X X βr Y + Y p p p ′ c y − α sol (C ) > ij ij 1 + ϵ j∈C i∈F 1−1/p X X X X βr (Y + Y ) p p p p p ′ p c y − α sol (C \ C ) > + α sol (C ) − c y ij f f ij ij p p ij 1 + ϵ j∈C \C i∈F j∈C i∈F f f 1−1/p X X X X βr (Y + Y ) p p p p ′ p c y − α sol (C \ C ) > + α − c y . (iii) ij ij p f ij ij 1 + ϵ γ (1 + ϵ ) i∈F j∈C i∈F j∈C \C f f P P (i) follows from the fact sol that (C ) > ′ c if a > b. (ii) follows from deinitions Y ,Yof . (iii) p f j∈C i∈F ij ′ ′ ′ follows from the choice M.of Let I denote the vector whose jth element is I if j ∈ C and 0 otherwise. By C j the analysis in Section 5.1, since sol isM/γ -cheap it is in the independence systemGreed that yMax inds a 1/2-maximizer for. ThatGreed is, yMax outputs some S and Fixed-ℓ -Oracle inds someI such that S is P P M-cheap, d c y ≤ Y , and such that: ij j∈C\C i∈F f j ij ACM Trans. Algor. 32 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi   1−1/p p X X X X   βr (Y + Y ) 1 f M p p p   ′ p ′ p d c y − α S (I ) >  + α − c y  ij ij j p ij C\C ij f   2 1 + ϵ γ (1 + ϵ ) ′   j∈C \C i∈F j∈C i∈F    p 1−1/p X X X X βr (Y + Y ) S (C ) 1   f p f p p p ′ ′ p ′   d c y − S (I ) > + α − d c y (iv) ij ij p   j ij C\C j ij 2  1 + ϵ γ (1 + ϵ )  i∈F j∈C i∈F j∈C \C   f f 1−1/p X X X X βr (Y + Y ) p p p p ′ ′ ′ d c y − S (I ) > + S (C ) − d c y (v) ij f ij j p p j ij C\C ij 2(1 + ϵ ) j∈C \C i∈F j∈C i∈F 1−1/p X X βr (Y + Y ) p p ′ ′ d c y − S (I ) > ij j p ij 2(1 + ϵ ) j∈C i∈F P P p p ′ ′ ′ d c y − S (I ) j∈C i∈F ij βr j ij p 1−1/p 2p(1 + ϵ ) p(Y + Y ) P P p p ′ ′ ′ d c y − S (I ) ij j∈C i∈F p j ij > r . (vi) 1−1/p p(Y + Y ) (iv) follows the M-cheapness ofS. (v) follows from the choice α. of (vi) follows from the choice β. of So, Fixed-ℓ -SepOracle outputs an inequality as desired. □ 6.2 Rounding the Fractional Solution Again, we show how to generalize the approach for rounding fractional solutions k-median forwith ixed clients to round fractional solutions ℓ for clustering with ixed clients. We extend Lemma 3.8 as follows: Lemma 6.4. Suppose alg and sol are two (possibly virtual) solutions toℓan -clustering instance with ixed clients ∗ ∗ C , such that there is a subset of clients C ⊂ (C \ C ) such that for every client in C alg’s connection cost is greater f f than p timessol’s connection cost, and for every clientC in\C \C , sol’s connection cost is at least alg’s connection cost. Then p p ′ ′ alg (C )−sol (C )  p p alg (C ) > 0 p−1 ′ ′  alg (C ) f (C ) := p  ′ 0 alg (C ) = 0 is maximizedCby∪ C . The proof follows exactly as that of Lemma 3.8. Lemma 6.5. 
There exists an algorithm that given any (α, β )-approximate universal fractional solution ℓ for - 1/p clustering with ixed clients, outputs (54apα, 54pβ + 18p )-approximate universal integral solution. Proof. Let sol be the virtual solution whose connection costs are 3 times the fractional solution’s for all clients. The algorithm is to solv ℓ -clustering e the with discounts instance using Lemma 3.4 where the discounts are 0 for ixed clients and 2 times sol’s connection costs for the remaining clients. Note that using these f g p p p ′ p ′ discounts, theℓ -clustering with discounts objective emax quals ′ alg (C ) − 2 · sol (C \ C ) instead C ⊆C ⊆C f p f p p f g p p ′ p ′ of max alg (C ) − 2 · sol (C ) . C ⊆C ⊆C f p p ACM Trans. Algor. Universal Algorithms for Clustering Problems • 33 Let alg be the output solution. We will againalg bound ’s cost against the virtual solution sol g whose connection costs are sol’s connection costs times p for non-ixed clients j such that alg’s connection cost toj is at least 18 timessol’s but less than 18 p times 18· sol’s, and the same as sol’s for the remaining clients. f g ′ ′ We use max ′ to denote max ′ . Ifmax ′ alg (C ) − 18sol g(C ) ≤ 0 then alg’s cost is always bounded C C ⊆C ⊆C C p f g ′ ′ ′ ′ ′ g g g by 18 timessol’s cost and we are done. So assume max [alg (C )−18sol (C )] > 0. LetC = argmax ′ alg (C ) − 18sol (C ) p p 1 p p f g p p ′ p ′ and C = argmax mrs (C ) − 2 · sol (C ) . Like in the proof of Lemma 3.9, via Lemma 6.4 we have: 2 p p f g p p ′ p ′ f g max alg (C ) − 18 sol (C ) p p ′ ′ max alg (C ) − 18sol g (C ) = = p p p−1 alg (C ) f g p p p ′ p ′ p max ′ alg (C ) − 18 sol (C \ C ) − 18 sol (C ) C f f p p p p−1 alg (C ) f g p p p ′ p ′ p max mrs (C ) − 2 sol (C \ C ) − 18 sol (C ) 2 p p f p f · 9 ≤ p−1 alg (C ) f g p p ′ p ′ max ′ mrs (C ) − 2 sol (C ) 2 p p · 9 . p−1 alg (C ) 1/p Using the same analysis as in Lemma 3.9 we can upper bound this inal quantity p ·by mr18 , proving the lemma. □ Theorem 6.1 follows from Lemmas 6.3 and 6.5. 7 UNIVERSAL k-CENTER WITH FIXED CLIENTS In this section, we discuss how to extend the proof of Theorem 4.1 to prove the following theorem: Theorem 7.1. There exists a(9, 3)-approximate algorithm for universal k-center with ixed clients. Proof. To extend the proof of Theorem 4.1 to the case where ixed clients are present,apx let (C ) denote the ′ ′ cost of a 3-approximation to the k-center problem with client C set ; it is well known how to compute apx(C ) in polynomial time 33]. A [ solution with regr r must et be within distance r := apx(C ∪{j}) +r of client j, otherwise j s in realization C ∪ {j} the solution has regret larger than r due to client j. The same algorithm as in the proof of Theorem 4.1 using this deinition r inds of alg within distance r =3 3·apx(C ∪{j})+3·mr ≤ 9·opt(C ∪{j})+3mr j j s s ′ ′ ′ of client j. opt(C ) ≥ opt(C ∪ {j}) for any realization C and any client j ∈ C , so this solution is (9, 3a)- approximation. □ 8 HARDNESS OF UNIVERSAL CLUSTERING FOR GENERAL METRICS In this section we give some hardness results to help contextualize the algorithmic results. Much like the hardness results for k-median, all our reductions are based on the NP-hardness of approximating set cover (or equivalently, dominating set) due to the natural relation between the two types of problems. We state our hardness results in terms of ℓ -clustering. Setting p = 1 gives hardness results for k-median, and setting p = ∞ (and using the convention /1∞ = 0 in the proofs as needed) gives hardness resultskfor -center. 
8.1 Hardness of Approximating α

Theorem 8.1. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering where α < 3 is NP-hard.

Proof. We will show that given a deterministic (α, β)-approximate algorithm where α < 3, we can design an algorithm (using the (α, β)-approximate algorithm as a subroutine) that solves the set cover problem (i.e., finds a set cover of size k if one exists), giving the lemma by NP-hardness of set cover. The algorithm is as follows: Given an instance of set cover, construct the following instance of universal ℓ_p-clustering:
• For each element, there is a corresponding client in the universal ℓ_p-clustering instance.
• For each set S, there is a cluster center which is at distance 1 from the clients corresponding to elements in S and at distance 3 from all other clients.
Then, we just run the universal ℓ_p-clustering algorithm on this instance, and output the sets corresponding to the cluster centers this algorithm buys.

Assume for the rest of the proof that a set cover of size k exists. Then the corresponding k cluster centers are as close as possible to every client, and are always an optimal solution. This gives that mr = 0 for this universal ℓ_p-clustering instance. Now, suppose by contradiction that this algorithm does not solve the set cover problem. That is, for some set cover instance we run an (α, β)-approximate algorithm where α < 3 on the produced ℓ_p-clustering instance, and it produces a solution alg that does not choose cluster centers corresponding to a set cover. This means alg is at distance 3 from some client j. For the realization C' = {j}, we have by the definition of (α, β)-approximation:

alg(C') ≤ α · opt(C') + β · mr  ⟹  3 ≤ α · 1 + β · 0 = α,

which is a contradiction, giving the lemma. □

Note that for, e.g., k-median, we can classically get an approximation ratio of less than 3. So this theorem shows that the universal version of the problem is harder, even if we are willing to use arbitrarily large β.

8.2 Hardness of Approximating β

We give the following result on the hardness of universal ℓ_p-clustering.

Theorem 8.2. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering where β < 2 is NP-hard.

Proof. We will show that given a deterministic (α, β)-approximate algorithm where β < 2, we can design an algorithm (using the (α, β)-approximate algorithm as a subroutine) that solves the dominating set problem (i.e., outputs at most k vertices which are a dominating set if a dominating set of size k exists), giving the lemma by NP-hardness of dominating set. The algorithm is as follows: Given an instance of dominating set G = (V, E), construct the following instance of universal ℓ_p-clustering (sketched in code below):
• For each vertex v ∈ V, there is a corresponding k-clique of clients in the universal ℓ_p-clustering instance (each client is co-located with a cluster center).
• For each (u, v) ∈ E, connect all clients in u's corresponding clique to all those in v's.
• Impose the shortest path metric on the clients, where all edges are length 1.
Then, we just run the universal ℓ_p-clustering algorithm on this instance, and output the set of vertices corresponding to cluster centers this algorithm buys.

Assume for the rest of the proof that a dominating set of size k exists in the dominating set instance. Then, a dominating set of size k also exists in the constructed universal ℓ_p-clustering instance (where the cliques this set resides in correspond to the vertices in the dominating set in the original instance).
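The construction just described can be sketched as follows (our own code and conventions, not the paper's; G maps each vertex to its set of neighbours, and every returned point serves both as a client and as a co-located cluster center).

```python
from collections import deque

def clustering_instance_from_dominating_set(G, k):
    """Reduction from the proof of Theorem 8.2: every vertex becomes a k-clique
    of clients, each co-located with a cluster center; all edges (inside a clique
    and between cliques of adjacent vertices) have length 1, and the metric is
    the resulting shortest-path distance."""
    def hops(src):
        # BFS distances from src in the original graph G
        d, q = {src: 0}, deque([src])
        while q:
            u = q.popleft()
            for w in G[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    q.append(w)
        return d

    hop = {v: hops(v) for v in G}
    points = [(v, t) for v in G for t in range(k)]  # (vertex, copy) = client/center

    def dist(p, q):
        if p == q:
            return 0                      # a client and its co-located center
        if p[0] == q[0]:
            return 1                      # two different clients of the same clique
        return hop[p[0]].get(q[0], float("inf"))

    return points, dist
```

The universal clustering algorithm under test is then run on (points, dist) with the same value of k.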
Thus, there is a solution to the universal ℓ_p-clustering instance that covers all clients at distance at most 1.

We first show that this dominating set solution is a minimum regret solution. Given a dominating set solution, note that in any realization of the demands, opt can cover k locations at distance 0, and must cover the rest of the clients at distance at least 1. Thus, to maximize the regret of a dominating set solution, we pick any k clients covered at distance 1 by the dominating set, and choose the realization including only these clients, for a regret of k^{1/p}. Now, consider any solution which is not a dominating set. For such a solution, there is some k-clique covered at distance 2. We can make such a solution incur regret 2k^{1/p} by including all k clients in this clique, with the optimal solution being to buy all cluster centers in this clique at cost 0. Thus, the dominating set solution is a minimum regret solution, and mr = k^{1/p}.

Now consider any (α, β)-approximation algorithm and suppose this algorithm, when run on the reduced dominating set instance, does not produce a dominating set solution while one exists. Consider the realization C' including only the clients in some k-clique covered at distance 2. By definition of (α, β)-approximation we get:

alg(C') ≤ α · opt(C') + β · mr
2k^{1/p} ≤ 0 + β·k^{1/p}    (17)
2 ≤ β

If β < 2, this is a contradiction, i.e. the algorithm will always output a dominating set of size k if one exists. Thus an (α, β)-approximation algorithm where β < 2 can be used to solve the dominating set problem, proving the theorem. □

9 HARDNESS OF UNIVERSAL CLUSTERING FOR EUCLIDEAN METRICS

9.1 Hardness of Approximating α

We can consider the special case of ℓ_p-clustering where the cluster center and client locations are all points in R^d, and the metric is an ℓ_q-norm in R^d. One might hope that, e.g., for d = 2, α = 1 + ϵ is achievable, since for the classic Euclidean k-median problem a PTAS exists [5]. We show that there is still a lower bound on α even for ℓ_p-clustering in R^2.

Theorem 9.1. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering in R^2 using the ℓ_q-norm where α < (1+√7)/2 for q = 2 or α < 2 for q = 1, ∞ is NP-hard.

Proof. The hardness is via reduction from the discrete k-center problem in R^2. Section 3 of [46] shows how to reduce an instance of planar 3-SAT (which is NP-hard) to an instance of Euclidean k-center in R^2 using the ℓ_q norm as the metric such that:
• For every client, the distance to the nearest cluster center is 1.
• There exists a k-cluster center solution which is distance 1 from all clients if the planar 3-SAT instance is satisfiable, and none exists if the instance is unsatisfiable.
• Any solution that is at distance strictly less than α from all clients can be converted in polynomial time to a solution within distance 1 of all clients, where α = (1+√7)/2 for q = 2 and α = 2 for q = 1, ∞.
We note that [46]'s reduction is actually to the "continuous" version of the problem where every point in R^2 can be chosen as a cluster center, including the points clients are located at. That is, if we use this reduction without modification then the first property is not actually true (since the minimum distance is 0).
9 HARDNESS OF UNIVERSAL CLUSTERING FOR EUCLIDEAN METRICS

9.1 Hardness of Approximating α

We can consider the special case of ℓ_p-clustering where the cluster center and client locations are all points in R^d, and the metric is an ℓ_q-norm in R^d. One might hope that, e.g., for d = 2, α = 1 + ϵ is achievable since for the classic Euclidean k-median problem, a PTAS exists [5]. We show that there is still a lower bound on α even for ℓ_p-clustering in R^2.

Theorem 9.1. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering in R^2 using the ℓ_q-norm where α < (1 + √7)/2 for q = 2 or α < 2 for q = 1, ∞ is NP-hard.

Proof. The hardness is via reduction from the discrete k-center problem in R^2. Section 3 of [46] shows how to reduce an instance of planar 3-SAT (which is NP-hard) to an instance of Euclidean k-center in R^2 using the ℓ_q-norm as the metric such that:
• For every client, the distance to the nearest cluster center is 1.
• There exists a k-cluster center solution which is distance 1 from all clients if the planar 3-SAT instance is satisfiable, and none exists if the instance is unsatisfiable.
• Any solution that is strictly less than distance α − ϵ away from all clients can be converted in polynomial time to a solution within distance 1 of all clients, for α = (1 + √7)/2 if q = 2 and α = 2 if q = 1, ∞.
We note that [46]'s reduction is actually to the "continuous" version of the problem where every point in R^2 can be chosen as a cluster center, including the points clients are located at. That is, if we use this reduction without modification then the first property is not actually true (since the minimum distance is 0). However, in the proof of correctness for this construction, [46] shows that (both for the optimal solution and any algorithmic solution) it suffices to only consider cluster centers located at the centers of a set of ℓ_q discs of radius 1 chosen such that every client lies on at least one of these discs and no client is contained within any of these discs. So, taking this reduction and then restricting the choice of cluster centers to the centers of these discs, we retrieve an instance with the desired properties.

Now, consider the corresponding instance as a universal ℓ_p-clustering instance. Like in the proof of Theorem 8.1, if the planar 3-SAT instance reduced from is satisfiable, there exists a clustering solution that is as close as possible to every client, i.e., has regret 0. So mr = 0. Thus, an (α, β)-approximate clustering solution is within distance α of every client (in the realization where only client j appears, opt is 1, so an α-approximate solution must be within distance α of this client). In turn, using the properties of the reduced clustering instance, an (α, β)-approximation where α is less than the lower bound given in [46] can be converted into an algorithm that solves planar 3-SAT. □

9.2 Hardness of Approximating β

We can also show that β = 1 is NP-hard in R^2 using a similar reduction:

Theorem 9.2. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering in R^2 using the ℓ_q-norm where β = 1 for q = 1, 2, ∞ is NP-hard.

Proof. We again use a reduction from planar 3-SAT due to [46]. This time, we use the reductions in Section 4 of [46] for simplicity, which have the properties that:
• Every client is distance 0 from a co-located cluster center, and the distance to the second-closest cluster center is 1.
• There exists a k-cluster center solution which is distance 1 from all but k clients and distance 0 from k clients (the ones at the cluster centers) if the planar 3-SAT instance is satisfiable, and none exists if the instance is unsatisfiable.

Consider any instance reduced from a satisfiable planar 3-SAT instance. The solution sol* in the resulting instance with the second property above has regret k^{1/p} (and in fact, this is mr): by the first property above, no solution can be less than distance 1 away from any clients other than the k clients co-located with its cluster centers. In turn, the regret of sol* against any adversarial sol' is maximized by the realization C' only including the clients co-located with the k cluster centers in sol'. We then get sol*(C') − sol'(C') = k^{1/p} − 0 = k^{1/p}.

Now consider an arbitrary (α, 1)-approximate universal solution alg in this instance. Consider any set of k clients C' not co-located with alg's cluster centers. opt(C') = 0, so we get alg(C') ≤ α · opt(C') + mr = mr ≤ k^{1/p}. alg is distance at least 1 from all clients in C' by construction, so this only holds if alg is distance 1 from all clients in C'. This gives that alg is distance 1 from all but k clients (those co-located with cluster centers in alg), and distance 0 from the remaining clients. In turn, alg satisfies the property of a solution corresponding to a satisfying assignment to the planar 3-SAT instance. This shows that an (α, 1)-approximate universal solution to ℓ_p-clustering in R^2 can be used to solve planar 3-SAT. □
10 FUTURE DIRECTIONS

In this paper, we gave the first universal algorithms for clustering problems: k-median, k-means, and k-center (and their generalization to ℓ_p-clustering). While we achieve constant approximation guarantees for these problems, the actual constants are orders of magnitude larger than the best (non-universal) approximations known for these problems. In part to ensure clarity of presentation, we did not attempt to optimize these constants. But it is unlikely that our techniques will lead to small constants for the k-median and k-means problems (although, interestingly, we got small constants for k-center). On the other hand, we show that in general it is NP-hard to find an (α, β)-approximation algorithm for a universal clustering problem where α matches the approximation factor for the standard clustering problem. Therefore, it is not entirely clear what one should expect: are there universal algorithms for clustering with approximation factors of the same order as the classical (non-universal) bounds?

One possible approach to improving the constants is considering algorithms that use more than k cluster centers. For example, our (9^p, (2/3) · 9^p)-approximation for ℓ_p-clustering with discounts can easily be improved to a (3^p, 3^p)-approximation if it is allowed to use 2k − 1 cluster centers. This immediately improves all constants in the paper. For example, our (27, 49)-approximation for universal k-median becomes a (9, 18)-approximation if it is allowed to use 2k − 1 cluster centers. Unfortunately, our lower bounds on α, β apply even if the algorithm is allowed to use (1 − ϵ)k ln n cluster centers, but it is an interesting problem to show that, e.g., using (1 + ϵ)k ln n cluster centers allows one to beat either bound.

Another open research direction pertains to Euclidean clustering. Here, we showed that for R^d with d ≥ 2, α needs to be bounded away from 1, which is in stark contrast to non-universal clustering problems that admit PTASes in constant-dimension Euclidean space. But, for d = 1, i.e., for universal clustering on a line, the picture is not as clear. On a line, the lower bounds on α are no longer valid, which brings forth the possibility of (non-bicriteria) approximations of regret. Indeed, it is known that there is a 2-approximation for universal k-median on a line [38], and even better, an optimal algorithm for universal k-center on a line [7]. This raises the natural question: can we design a PTAS for the universal k-median problem on a line?

ACKNOWLEDGMENTS

Arun Ganesh was supported in part by NSF Award CCF-1535989. Bruce M. Maggs was supported in part by NSF Award CCF-1535972. Debmalya Panigrahi was supported in part by NSF grants CCF-1535972, CCF-1955703, an NSF CAREER Award CCF-1750140, and the Indo-US Virtual Networked Joint Center on Algorithms under Uncertainty.

REFERENCES
[1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. 2017. Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science. 61–72. https://doi.org/10.1109/FOCS.2017.15
[2] N. Alon and Y. Azar. 1992. On-line Steiner Trees in the Euclidean Plane. In Proceedings of the 8th Annual Symposium on Computational Geometry. 337–343.
[3] Barbara Anthony, Vineet Goyal, Anupam Gupta, and Viswanath Nagarajan. 2010. A Plant Location Guide for the Unsure: Approximation Algorithms for Min-Max Location Problems. Math. Oper. Res. 35, 1 (Feb. 2010), 79–101. https://doi.org/10.1287/moor.1090.0428
[4] Aaron Archer, Ranjithkumar Rajagopalan, and David B. Shmoys. 2003. Lagrangian Relaxation for the k-Median Problem: New Insights and Continuity Properties. In Algorithms - ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003, Proceedings, Giuseppe Di Battista and Uri Zwick (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 31–42. https://doi.org/10.1007/978-3-540-39658-1_6
[5] Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. 1998. Approximation Schemes for Euclidean k-medians and Related Problems. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (Dallas, Texas, USA) (STOC '98). ACM, New York, NY, USA, 106–113. https://doi.org/10.1145/276698.276718
[6] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. 2001. Local Search Heuristic for K-median and Facility Location Problems. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing (Hersonissos, Greece) (STOC '01). ACM, New York, NY, USA, 21–29. https://doi.org/10.1145/380752.380755
[7] I. Averbakh and Oded Berman. 1997. Minimax regret p-center location on a network with demand uncertainty. Location Science 5, 4 (1997), 247–254. https://doi.org/10.1016/S0966-8349(98)00033-3
[8] Igor Averbakh and Oded Berman. 2000. Minmax Regret Median Location on a Network Under Uncertainty. INFORMS Journal on Computing 12, 2 (2000), 104–110. https://doi.org/10.1287/ijoc.12.2.104.11897
[9] D. Bertsimas and M. Grigni. 1989. Worst-case examples for the spacefilling curve heuristic for the Euclidean traveling salesman problem. Operations Research Letters 8, 5 (Oct. 1989), 241–244.
[10] Anand Bhalgat, Deeparnab Chakrabarty, and Sanjeev Khanna. 2011. Optimal Lower Bounds for Universal and Differentially Private Steiner Trees and TSPs. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Leslie Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 75–86.
[11] Sayan Bhattacharya, Parinya Chalermsook, Kurt Mehlhorn, and Adrian Neumann. 1994. New Approximability Results for the Robust k-Median Problem. In Proceedings of the 14th Scandinavian Workshop on Algorithm Theory. 51–60.
[12] Costas Busch, Chinmoy Dutta, Jaikumar Radhakrishnan, Rajmohan Rajaraman, and Srinivasagopalan Srivathsan. 2012. Split and Join: Strong Partitions and Universal Steiner Trees for Graphs. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012. 81–90.
[13] Jaroslaw Byrka, Fabrizio Grandoni, Thomas Rothvoß, and Laura Sanità. 2013. Steiner Tree Approximation via Iterative Randomized Rounding. J. ACM 60, 1 (2013), 6:1–6:33.
[14] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. 2011. Maximizing a Monotone Submodular Function Subject to a Matroid Constraint. SIAM J. Comput. 40, 6 (Dec. 2011), 1740–1766. https://doi.org/10.1137/080733991
[15] Deeparnab Chakrabarty and Chaitanya Swamy. 2019. Approximation Algorithms for Minimum Norm and Ordered Optimization Problems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (Phoenix, AZ, USA) (STOC 2019). Association for Computing Machinery, New York, NY, USA, 126–137. https://doi.org/10.1145/3313276.3316322
[16] Moses Charikar, Chandra Chekuri, and Martin Pál. 2005. Sampling Bounds for Stochastic Optimization. In Proceedings of the 8th International Workshop on Approximation, Randomization and Combinatorial Optimization Problems, and Proceedings of the 9th International Conference on Randomization and Computation: Algorithms and Techniques (Berkeley, CA) (APPROX'05/RANDOM'05). Springer-Verlag, Berlin, Heidelberg, 257–269. https://doi.org/10.1007/11538462_22
[17] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. 1999. A Constant-factor Approximation Algorithm for the K-median Problem (Extended Abstract). In Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing (Atlanta, Georgia, USA) (STOC '99). ACM, New York, NY, USA, 1–10. https://doi.org/10.1145/301250.301257
[18] Chandra Chekuri. 2007. Routing and network design with robustness to changing or uncertain traffic demands. SIGACT News 38, 3 (2007), 106–129.
[19] Kedar Dhamdhere, Vineet Goyal, R. Ravi, and Mohit Singh. 2005. How to Pay, Come What May: Approximation Algorithms for Demand-Robust Covering Problems. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), 23-25 October 2005, Pittsburgh, PA, USA, Proceedings. 367–378.
[20] Uriel Feige, Kamal Jain, Mohammad Mahdian, and Vahab S. Mirrokni. 2007. Robust Combinatorial Optimization with Exponential Scenarios. In Integer Programming and Combinatorial Optimization, 12th International IPCO Conference, Ithaca, NY, USA, June 25-27, 2007, Proceedings. 439–453.
[21] Teofilo F. Gonzalez. 1985. Clustering to Minimize the Maximum Intercluster Distance. Theor. Comput. Sci. 38 (1985), 293–306.
[22] Igor Gorodezky, Robert D. Kleinberg, David B. Shmoys, and Gwen Spencer. 2010. Improved Lower Bounds for the Universal and a priori TSP. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Maria Serna, Ronen Shaltiel, Klaus Jansen, and José Rolim (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 178–191.
[23] F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh. 2008. Set Covering with Our Eyes Closed. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science.
[24] Martin Grötschel, László Lovász, and Alexander Schrijver. 1981. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica 1, 2 (1981), 169–197.
[25] Sudipto Guha and Kamesh Munagala. 2009. Exceeding Expectations and Clustering Uncertain Data. In Proceedings of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Providence, Rhode Island, USA) (PODS '09). Association for Computing Machinery, New York, NY, USA, 269–278. https://doi.org/10.1145/1559795.1559836
[26] Anupam Gupta, Mohammad T. Hajiaghayi, and Harald Räcke. 2006. Oblivious Network Design. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (Miami, Florida) (SODA '06). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 970–979. http://dl.acm.org/citation.cfm?id=1109557.1109665
[27] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. 2014. Thresholded covering algorithms for robust and max-min optimization. Math. Program. 146, 1-2 (2014), 583–615.
[28] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. 2016. Robust and MaxMin Optimization under Matroid and Knapsack Uncertainty Sets. ACM Trans. Algorithms 12, 1 (2016), 10:1–10:21.
[29] Anupam Gupta, Martin Pál, R. Ravi, and Amitabh Sinha. 2004. Boosted Sampling: Approximation Algorithms for Stochastic Optimization. In Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing (Chicago, IL, USA) (STOC '04). ACM, New York, NY, USA, 417–426. https://doi.org/10.1145/1007352.1007419
[30] Anupam Gupta and Kanat Tangwongsan. 2008. Simpler Analyses of Local Search Algorithms for Facility Location. ArXiv abs/0809.2554 (2008).
[31] Mohammad T. Hajiaghayi, Robert Kleinberg, and Tom Leighton. 2006. Improved Lower and Upper Bounds for Universal TSP in Planar Metrics. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (Miami, Florida). 649–658.
[32] Dorit S. Hochbaum and David B. Shmoys. 1985. A Best Possible Heuristic for the k-Center Problem. Math. Oper. Res. 10, 2 (May 1985), 180–184. https://doi.org/10.1287/moor.10.2.180
[33] Dorit S. Hochbaum and David B. Shmoys. 1986. A Unified Approach to Approximation Algorithms for Bottleneck Problems. J. ACM 33, 3 (May 1986), 533–550. https://doi.org/10.1145/5925.5933
[34] Kamal Jain and Vijay V. Vazirani. 2001. Approximation Algorithms for Metric Facility Location and k-Median Problems Using the Primal-dual Schema and Lagrangian Relaxation. J. ACM 48, 2 (March 2001), 274–296. https://doi.org/10.1145/375827.375845
[35] L. Jia, G. Lin, G. Noubir, R. Rajaraman, and R. Sundaram. 2005. Universal Algorithms for TSP, Steiner Tree, and Set Cover. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing.
[36] Lujun Jia, Guevara Noubir, Rajmohan Rajaraman, and Ravi Sundaram. 2006. GIST: Group-Independent Spanning Tree for Data Aggregation in Dense Sensor Networks. In Distributed Computing in Sensor Systems, Phillip B. Gibbons, Tarek Abdelzaher, James Aspnes, and Ramesh Rao (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 282–304.
[37] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. 2002. A Local Search Approximation Algorithm for K-means Clustering. In Proceedings of the Eighteenth Annual Symposium on Computational Geometry (Barcelona, Spain) (SCG '02). ACM, New York, NY, USA, 10–18. https://doi.org/10.1145/513400.513402
[38] Adam Kasperski and Pawel Zielinski. 2007. On the existence of an FPTAS for minmax regret combinatorial optimization problems with interval data. Oper. Res. Lett. 35 (2007), 525–532.
[39] Rohit Khandekar, Guy Kortsarz, Vahab S. Mirrokni, and Mohammad R. Salavatipour. 2013. Two-stage Robust Network Design with Exponential Scenarios. Algorithmica 65, 2 (2013), 391–408.
[40] Samir Khuller and Yoram J. Sussmann. 2000. The Capacitated K-Center Problem. SIAM Journal on Discrete Mathematics 13, 3 (2000), 403–418. https://doi.org/10.1137/S0895480197329776
[41] Stavros G. Kolliopoulos and Satish Rao. 1999. A Nearly Linear-Time Approximation Scheme for the Euclidean k-median Problem. In Algorithms - ESA '99, Jaroslav Nešetřil (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 378–389.
[42] Panos Kouvelis and Gang Yu. 1997. Robust 1-Median Location Problems: Dynamic Aspects and Uncertainty. Springer US, Boston, MA, 193–240. https://doi.org/10.1007/978-1-4757-2620-6_6
[43] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. 2004. A simple linear time (1 + ϵ)-approximation algorithm for k-means clustering in any dimensions. In Proceedings of the 45th IEEE Symposium on Foundations of Computer Science. 454–462.
[44] Shi Li and Ola Svensson. 2013. Approximating k-Median via Pseudo-Approximation. In Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing (Palo Alto, California, USA). 901–910.
[45] Stuart P. Lloyd. 1982. Least squares quantization in PCM. IEEE Trans. Information Theory 28 (1982), 129–136.
[46] Stuart G. Mentzer. 2016. Approximability of Metric Clustering Problems. (March 2016). Unpublished manuscript.
[47] Viswanath Nagarajan, Baruch Schieber, and Hadas Shachnai. 2013. The Euclidean k-Supplier Problem. In Integer Programming and Combinatorial Optimization, Michel Goemans and José Correa (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 290–301.
[48] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions–I. Mathematical Programming 14, 1 (1978), 265–294. https://doi.org/10.1007/BF01588971
[49] Loren K. Platzman and John J. Bartholdi, III. 1989. Spacefilling Curves and the Planar Travelling Salesman Problem. J. ACM 36, 4 (Oct. 1989), 719–737. https://doi.org/10.1145/76359.76361
[50] Frans Schalekamp and David B. Shmoys. 2008. Algorithms for the universal and a priori TSP. Operations Research Letters 36, 1 (2008), 1–3. https://doi.org/10.1016/j.orl.2007.04.009
[51] David B. Shmoys and Chaitanya Swamy. 2006. An Approximation Scheme for Stochastic Linear Programming and Its Application to Stochastic Integer Programs. J. ACM 53, 6 (Nov. 2006), 978–1012. https://doi.org/10.1145/1217856.1217860
[52] Chaitanya Swamy and David B. Shmoys. 2006. Approximation algorithms for 2-stage stochastic optimization problems. SIGACT News 37, 1 (2006), 33–46.
[53] Chaitanya Swamy and David B. Shmoys. 2012. Sampling-Based Approximation Algorithms for Multistage Stochastic Optimization. SIAM J. Comput. 41, 4 (2012), 975–1004.

A RELATIONS BETWEEN VECTOR NORMS

For completeness, we give a proof of the following well-known fact that relates vector ℓ_p norms.

Fact A.1. For any 1 ≤ p ≤ q and x ∈ R^n, we have ‖x‖_q ≤ ‖x‖_p ≤ n^{1/p − 1/q} ‖x‖_q.

Proof. For all p, repeatedly applying Minkowski's inequality we have:
\[ \|x\|_p = \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} \le \Big(\sum_{i=1}^{n-1} |x_i|^p\Big)^{1/p} + |x_n| \le \Big(\sum_{i=1}^{n-2} |x_i|^p\Big)^{1/p} + |x_{n-1}| + |x_n| \le \dots \le \sum_{i=1}^{n} |x_i| = \|x\|_1. \]
Then we bound ‖x‖_q by ‖x‖_p as follows:
\[ \|x\|_q = \Big(\sum_{i=1}^{n} |x_i|^q\Big)^{1/q} = \Bigg(\Big(\sum_{i=1}^{n} (|x_i|^p)^{q/p}\Big)^{p/q}\Bigg)^{1/p} \le \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p} = \|x\|_p. \]
The inequality is by applying ‖x'‖_{q/p} ≤ ‖x'‖_1 to the vector x' with entries x_i' = |x_i|^p. To bound ‖x‖_p by ‖x‖_q, we invoke Hölder's inequality as follows:
\[ \|x\|_p^p = \sum_{i=1}^{n} |x_i|^p = \sum_{i=1}^{n} |x_i|^p \cdot 1 \le \Big(\sum_{i=1}^{n} (|x_i|^p)^{q/p}\Big)^{p/q} \Big(\sum_{i=1}^{n} 1^{q/(q-p)}\Big)^{1-p/q} = \|x\|_q^p \cdot n^{1-p/q}. \]
Taking the pth root of this inequality gives the desired bound. □

The limiting behavior as q → ∞ shows that ‖x‖_∞ ≤ ‖x‖_{c log n} ≤ n^{1/(c log n)} ‖x‖_∞ = 2^{1/c} ‖x‖_∞, i.e., that the ℓ_∞-norm and the ℓ_p-norm for p = Ω(log n) are within a constant factor.
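The two inequalities in Fact A.1 are easy to sanity-check numerically. The snippet below is our own illustrative check (not part of the paper) on random vectors and random exponents 1 ≤ p ≤ q.

# Numerical sanity check of Fact A.1 (illustrative only).
import random

def lp_norm(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 20)
    x = [random.uniform(-10, 10) for _ in range(n)]
    p = random.uniform(1, 5)
    q = random.uniform(p, 10)
    lo, hi = lp_norm(x, q), lp_norm(x, p)          # ||x||_q and ||x||_p
    assert lo <= hi + 1e-9
    assert hi <= n ** (1.0 / p - 1.0 / q) * lo + 1e-9
print("Fact A.1 holds on all sampled vectors")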
B APPROXIMATIONS FOR ALL-CLIENTS INSTANCES ARE NOT UNIVERSAL

In this section, we demonstrate that even (1 + ϵ)-approximate (integer) solutions for the "all clients" instance for clustering problems are not guaranteed to be (α, β)-approximations for any finite α, β. This is in sharp contrast to the optimal (integer) solution, which is known to be a (1, 2)-approximation for a broad range of problems including the clustering problems considered in this paper [38].

Consider an instance of universal 1-median with clients c_1, c_2 and cluster centers f_1, f_2. Both the cluster centers are at distance 1 from c_1, and at distances 0 and ϵ respectively from c_2 (see Figure 6). f_2 is a (1 + ϵ)-approximate solution for the realization containing both clients. In this instance mr is 0, and so f_2 is not an (α, β)-approximation for any finite α, β due to the realization containing only c_2. The same example can be used for the ℓ_p-clustering objective for all p ≥ 1 since f_2 has approximation factor (1 + ϵ^p)^{1/p} ≤ 1 + ϵ when all clients are present. In the case of k-center, f_2 is an optimal solution when all clients are present.

Fig. 6. Example where a (1 + ϵ)-approximation for all clients has no (α, β)-approximation guarantee, for any ℓ_p-clustering objective including k-center.
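The two-client example can be verified directly. The following snippet is ours and purely illustrative; it enumerates all realizations of the instance and computes the regret of each single-center solution for the ℓ_p objective.

# The Appendix B instance (illustrative check).  Clients c1, c2; centers f1, f2
# with d(f1, c1) = 1, d(f1, c2) = 0, d(f2, c1) = 1, d(f2, c2) = eps.
from itertools import chain, combinations

eps, p = 0.01, 1
dist = {("f1", "c1"): 1.0, ("f1", "c2"): 0.0,
        ("f2", "c1"): 1.0, ("f2", "c2"): eps}

def cost(center, clients):
    return sum(dist[(center, j)] ** p for j in clients) ** (1.0 / p)

def regret(center):
    realizations = chain.from_iterable(
        combinations(["c1", "c2"], r) for r in range(1, 3))
    return max(cost(center, C) - min(cost(f, C) for f in ("f1", "f2"))
               for C in realizations)

print(regret("f1"), regret("f2"))   # 0.0 and eps: f2 is (1 + eps)-approximate
                                    # for all clients but has regret eps > 0 = mr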
C ALGORITHMS FOR k-MEDIAN AND ℓ_p-CLUSTERING WITH DISCOUNTS

In this section, we prove Lemma 3.4, which states that there exists a (9^p, (2/3) · 9^p)-approximation algorithm for the ℓ_p-clustering with discounts problem. As a corollary, by setting p = 1, we will obtain Lemma 2.3, which states that there exists a (9, 6)-approximation algorithm for the k-median with discounts problem. To prove Lemma 3.4, we will first use a technique due to Jain and Vazirani [34] to design a Lagrangian-preserving approximation for the ℓ_p-facility location with discounts problem. ℓ_p-facility location with discounts (FLD) is the same as ℓ_p-clustering with discounts, except rather than being restricted to buying k cluster centers, each cluster center i has a cost f_i associated with buying it (discounts and cluster center costs are not connected in any way).

C.1 Algorithm for ℓ_p-Facility Location with Discounts

Since FLD is a special case of non-metric facility location, we can consider the standard linear programming primal-dual formulation for the latter. The primal program is as follows:
\[ \min \sum_{i \in F} f_i x_i + \sum_{i \in F, j \in C} (c_{ij}^p - r_j^p)^+ y_{ij} \]
\[ \text{s.t.} \quad \forall j \in C: \sum_{i \in F} y_{ij} \ge 1; \qquad \forall i \in F, j \in C: y_{ij} \le x_i; \qquad \forall i \in F: x_i \ge 0; \qquad \forall i \in F, j \in C: y_{ij} \ge 0. \]
The dual program is as follows:
\[ \max \sum_{j \in C} a_j \]
\[ \text{s.t.} \quad \forall i \in F, j \in C: a_j - (c_{ij}^p - r_j^p)^+ \le b_{ij}; \qquad \forall i \in F: \sum_{j \in C} b_{ij} \le f_i; \qquad \forall j \in C: a_j \ge 0; \qquad \forall i \in F, j \in C: b_{ij} \ge 0. \]

We design a primal-dual algorithm for the FLD problem. This FLD algorithm operates in two phases. In both programs, all variables start out as 0. In the first phase, we generate a dual solution. For each client j, define a "time" variable t_j which is initialized to 0. We grow the dual variables as follows: we increase the t_j uniformly. We grow the a_j such that for any j, at all times a_j = (t_j^p − r_j^p)^+ (or equivalently, all a_j start at 0, and we increase all a_j at a uniform rate, but we only start growing a_j at time r_j). Each b_ij is set to the minimum feasible value, i.e., (a_j − (c_{ij}^p − r_j^p)^+)^+. If the constraint Σ_{j∈C} b_ij ≤ f_i is tight, we stop increasing t_j and a_j for all j for which b_ij = a_j − (c_{ij}^p − r_j^p)^+, i.e., for the clients that contributed to increasing the value Σ_{j∈C} b_ij (we say these clients put weight on this cluster center). We continue this process until all t_j stop growing. Note that at any time the dual solution grown is always feasible.

In the second phase, consider a graph induced on the cluster centers whose constraints are tight, where we place an edge between cluster centers i, i' if there exists some client j that put weight on both cluster centers. Find a maximal independent set S of this graph and output this set of cluster centers. Let π be a map from clients to cluster centers such that π(j) is the cluster center which made t_j stop increasing in the first phase of the algorithm. If π(j) ∈ S, connect j to cluster center π(j); otherwise connect j to one of π(j)'s neighbors in the graph arbitrarily.

We can equivalently think of the algorithm as generating an integral primal solution where x_i = 1 for all i ∈ S and x_i = 0 otherwise, and y_ij = 1 if j is connected to i and y_ij = 0 otherwise. Based again on the technique of [34], we can show the following lemma holds:

Lemma C.1. Let x, y be the primal solution and a, b be the dual solution generated by the above FLD algorithm. x, y satisfies
\[ \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij} + 3^p \sum_{i \in F} f_i x_i \le 3^p \sum_{j \in C} a_j. \]

Proof. Let C^{(1)} be the set of clients j such that π(j) ∈ S and C^{(2)} = C \ C^{(1)}. For any i ∈ S, let C_i be the set of all clients j such that π(j) = i. Note that:
\[ \sum_{j \in C_i} a_j = \sum_{j \in C_i} \big[ b_{ij} + (c_{ij}^p - r_j^p)^+ \big] = \sum_{j \in C_i} (c_{ij}^p - r_j^p)^+ + f_i. \]
No client in C^{(1)} contributes to the sum Σ_{j∈C} b_ij for multiple i in S (because S is an independent set). This gives us:
\[ \sum_{j \in C^{(1)}} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij} + \sum_{i \in F} f_i x_i \le \sum_{j \in C^{(1)}} \sum_{i \in F} (c_{ij}^p - r_j^p)^+ y_{ij} + \sum_{i \in F} f_i x_i = \sum_{i \in S} \Big[ f_i + \sum_{j \in C_i} (c_{ij}^p - r_j^p)^+ \Big] = \sum_{i \in S} \sum_{j \in C_i} a_j = \sum_{j \in C^{(1)}} a_j. \tag{18} \]
For each client j in C^{(2)}, j is connected to one of π(j)'s neighbors i. Since π(j) and i are neighbors, there is some client j' that put weight on both π(j) and i. Since j' put weight on π(j), and thus π(j) going tight would have stopped t_{j'} from increasing, t_{j'} stopped increasing before or when π(j) went tight, which was when t_j stopped growing. Since all t_j start growing at the same time and grow uniformly, t_{j'} ≤ t_j. Since j put weight on π(j), we know a_j − (c_{π(j)j}^p − r_j^p)^+ > 0 and thus (t_j^p − r_j^p)^+ − (c_{π(j)j}^p − r_j^p)^+ > 0, implying t_j ≥ c_{π(j)j}. Similarly, t_{j'} ≥ c_{π(j)j'}, c_{ij'}. The triangle inequality gives c_{ij} ≤ c_{ij'} + c_{π(j)j'} + c_{π(j)j} ≤ 3t_j. Then, we get:
\[ \sum_{j \in C^{(2)}} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij} \le \sum_{j \in C^{(2)}} (3^p t_j^p - 3^p r_j^p)^+ = 3^p \sum_{j \in C^{(2)}} (t_j^p - r_j^p)^+ = 3^p \sum_{j \in C^{(2)}} a_j. \tag{19} \]
Adding 3^p times Eq. (18) to Eq. (19) gives the lemma. □
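To make the two phases concrete, the following rough sketch mirrors the dual-growing and independent-set steps described above. It is our own illustrative code, not the paper's implementation: a faithful version would advance the clock from tightness event to tightness event rather than in fixed steps, and would also return the client-to-center connections via π.

# A coarse, discretized sketch of the two-phase FLD primal-dual algorithm
# (illustrative only).
def fld_primal_dual(F, C, dist, f, r, p, dt=1e-3, t_max=100.0):
    """dist[(i, j)]: center-to-client distances, f[i]: center costs, r[j]: discounts."""
    plus = lambda v: max(v, 0.0)
    t = {j: 0.0 for j in C}             # each client's (possibly frozen) time
    active = set(C)
    contrib = {i: set() for i in F}     # clients that put weight on center i
    tight = []                          # centers whose dual constraint went tight
    clock = 0.0
    while active and clock < t_max:
        clock += dt
        for j in active:
            t[j] = clock
        a = {j: plus(t[j] ** p - r[j] ** p) for j in C}
        b = {(i, j): plus(a[j] - plus(dist[(i, j)] ** p - r[j] ** p))
             for i in F for j in C}
        for i in F:
            if i not in tight and sum(b[(i, j)] for j in C) >= f[i]:
                tight.append(i)
                contrib[i] = {j for j in C if b[(i, j)] > 0}
                active -= contrib[i]    # freeze the clients that put weight on i
    # Phase 2: maximal independent set among tight centers, where two tight
    # centers conflict if some client put weight on both of them.
    S = []
    for i in tight:
        if all(contrib[i].isdisjoint(contrib[i2]) for i2 in S):
            S.append(i)
    return S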
C.2 Algorithm for ℓ_p-Clustering with Discounts

We now move on to finding an algorithm for ℓ_p-clustering with discounts. We can represent the problem as a primal/dual linear program pair as follows. The primal program is:
\[ \min \sum_{i \in F, j \in C} (c_{ij}^p - r_j^p)^+ y_{ij} \]
\[ \text{s.t.} \quad \forall j \in C: \sum_{i \in F} y_{ij} \ge 1; \qquad \forall i \in F, j \in C: y_{ij} \le x_i; \qquad \sum_{i \in F} x_i \le k; \qquad \forall i \in F: x_i \ge 0; \qquad \forall i \in F, j \in C: y_{ij} \ge 0. \]
The dual program is as follows:
\[ \max \sum_{j \in C} a_j - kz \]
\[ \text{s.t.} \quad \forall i \in F, j \in C: a_j - (c_{ij}^p - r_j^p)^+ \le b_{ij}; \qquad \forall i \in F: \sum_{j \in C} b_{ij} \le z; \qquad \forall j \in C: a_j \ge 0; \qquad \forall i \in F, j \in C: b_{ij} \ge 0. \]

We now describe the algorithm we will use to prove Lemma 3.4, which uses the FLD algorithm from Section C.1 as a subroutine. By taking our ℓ_p-clustering with discounts instance and assigning all cluster centers the same cost z, we can produce an FLD instance. When z = 0, the FLD algorithm will either buy more than k cluster centers, or find a set of at most k cluster centers, in which case we can output that set. When z = |C| max_{i,j} c_ij, the FLD algorithm will buy only one cluster center. Thus, for any ϵ such that log(1/ϵ) = n^{O(1)}, via bisection search using polynomially many runs of this algorithm we can find a value of z such that this algorithm buys a set of cluster centers S_1 of size k_1 ≥ k when cluster centers cost z, and a set of cluster centers S_2 of size k_2 ≤ k when cluster centers cost z + ϵ (the bisection search starts with the range [0, |C| max_{i,j} c_ij] and in each iteration determines how many cluster centers are bought when z is the midpoint value in its current range. It then recurses on the half [a, b] of its current range which maintains the invariant that when z = a, at least k cluster centers are bought, and when z = b, at most k cluster centers are bought).

If either k_1 = k or k_2 = k, we output the corresponding cluster center set. Otherwise, we will randomly choose a solution which is roughly a combination of S_1 and S_2 (we will describe later how to derandomize this process, as is required to prove Lemma 2.3). Let ρ be the solution in [0, 1] to ρk_1 + (1 − ρ)k_2 = k, i.e., ρ = (k − k_2)/(k_1 − k_2). Construct a set S_1' that consists of the closest cluster center in S_1 to each cluster center in S_2. If the size of S_1' is less than k_2, add arbitrary cluster centers from S_1 \ S_1' to S_1' until its size is k_2. Then, with probability ρ, let S* = S_1'; otherwise let S* = S_2. Then, sample a uniformly random subset of k − k_2 elements from S_1 \ S_1' and add them to S*. Then output S* (note that S_1 \ S_1' is of size k_1 − k_2, so every element in S_1 \ S_1' has probability ρ of being chosen).
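The outer loop around the FLD subroutine can be sketched as follows. This is our own illustrative code: fld_primal_dual refers to the sketch above, dist is assumed to give the metric distance between any two points (centers or clients), and the stopping rule is simplified relative to the exact invariant maintained in the text.

# Sketch of the bisection over the uniform center cost z and of the bi-point
# combination step (illustrative only).
import random

def bisect_center_cost(F, C, dist, r, p, k, eps):
    lo, hi = 0.0, len(C) * max(dist[(i, j)] for i in F for j in C)
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        S = fld_primal_dual(F, C, dist, {i: mid for i in F}, r, p)
        if len(S) == k:
            return S, None
        lo, hi = (mid, hi) if len(S) > k else (lo, mid)
    S1 = fld_primal_dual(F, C, dist, {i: lo for i in F}, r, p)   # opens k1 >= k
    S2 = fld_primal_dual(F, C, dist, {i: hi for i in F}, r, p)   # opens k2 <= k
    return None, (S1, S2)

def combine_bipoint(S1, S2, dist, k):
    # S1' = closest center in S1 to each center of S2, padded to |S2| centers.
    Sp = {min(S1, key=lambda i: dist[(i, i2)]) for i2 in S2}
    Sp = list(Sp) + [i for i in S1 if i not in Sp][: len(S2) - len(Sp)]
    rho = (k - len(S2)) / (len(S1) - len(S2))
    base = Sp if random.random() < rho else list(S2)
    extra = random.sample([i for i in S1 if i not in Sp], k - len(S2))
    return list(set(base) | set(extra))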
Proof of Lemma 3.4. Note that if the FLD algorithm ever outputs a solution which buys exactly k cluster centers, then by Lemma C.1 we get that for the LP solution x, y encoding this solution and a dual solution a:
\[ \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij} + 3^p \sum_{i \in F} f_i x_i \le 3^p \sum_{j \in C} a_j \implies \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij} + 3^p kz \le 3^p \sum_{j \in C} a_j \implies \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij} \le 3^p \Big[ \sum_{j \in C} a_j - kz \Big], \]
which by duality means that this solution is also a (3^p, 3^p)-approximation for the ℓ_p-clustering with discounts instance.

If bisection search never finds a solution with exactly k cluster centers, but instead a pair of solutions S_1, S_2 where |S_1| > k, |S_2| < k, the idea is that the algorithm constructs a "bi-point" fractional solution from these solutions (i.e., constructs a fractional solution that is just a convex combination of the two integral solutions) and then rounds it.

Consider the primal/dual solutions x^{(1)}, y^{(1)}, a^{(1)} and x^{(2)}, y^{(2)}, a^{(2)} corresponding to S_1, S_2. By Lemma C.1 we get:
\[ \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij}^{(1)} + 3^p k_1 z \le 3^p \sum_{j \in C} a_j^{(1)}, \qquad \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ y_{ij}^{(2)} + 3^p k_2 (z + \epsilon) \le 3^p \sum_{j \in C} a_j^{(2)}. \]
By combining the two inequalities and choosing ϵ appropriately we can get that:
\[ \sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \big( \rho y_{ij}^{(1)} + (1-\rho) y_{ij}^{(2)} \big) \le (3^p + \epsilon') \Big[ \sum_{j \in C} \big( \rho a_j^{(1)} + (1-\rho) a_j^{(2)} \big) - kz \Big] \]
for an ϵ' we will fix later.

Note that ρ(x^{(1)}, y^{(1)}, a^{(1)}) + (1 − ρ)(x^{(2)}, y^{(2)}, a^{(2)}) and z form a feasible (fractional) primal/dual solution pair for the ℓ_p-clustering with discounts problem, and by the above inequality the primal solution is a (3^p, 3^p + ϵ')-approximation.

Then, we round the convex combination of the two solutions as described above. Let c_j be the connection cost of client j in the rounded solution, and c_j^{(1)}, c_j^{(2)} the connection costs of client j in solutions S_1, S_2. Then since (3^p + ϵ')(2 · 3^{p−1} − ϵ') ≤ (2/3) · 9^p for ϵ' ∈ [0, 1], to prove the lemma it suffices to show that for each client j the expected contribution to the objective using discount 9r_j for client j is at most 2 · 3^{p−1} − ϵ' times the contribution of client j to the primal solution's objective using discount 3r_j. That is:
\[ \mathbb{E}\big[ (c_j^p - 9^p r_j^p)^+ \big] \le (2 \cdot 3^{p-1} - \epsilon') \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1-\rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big]. \]

Suppose client j's nearest cluster center in S_1 is in S_1'. Then with probability ρ, j is connected to that cluster center at connection cost c_j^{(1)}, and with probability 1 − ρ it is connected to the nearest cluster center in S_2 at connection cost c_j^{(2)}. Then:
\[ \mathbb{E}\big[ (c_j^p - 9^p r_j^p)^+ \big] \le \mathbb{E}\big[ (c_j^p - 3^p r_j^p)^+ \big] = \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1-\rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+. \]

Suppose client j's nearest cluster center in S_1 (call it i_1) is not in S_1'. Note that each cluster center in S_1 \ S_1' has probability ρ of being opened. Thus with probability ρ, we can upper bound c_j by the distance from j to i_1. If this does not happen, let i_2 be j's nearest cluster center in S_2 and i_1' be the cluster center nearest to i_2 in S_1. One of i_2, i_1' must be opened, so we can bound j's connection cost by its connection cost to whichever is opened. Then one of three cases occurs:
• With probability ρ, j's nearest cluster center in S_1 is opened. Then c_j is at most the distance from j to i_1, i.e., c_j^{(1)}.
• With probability (1 − ρ)ρ, j's nearest cluster center in S_1 is not opened and S_1' is opened. c_j is at most the distance from j to i_1'. Since i_1' is the cluster center closest to i_2 in S_1, the distance from i_2 to i_1' is at most the distance from i_2 to i_1, which is at most c_j^{(1)} + c_j^{(2)}. Then by the triangle inequality, the distance from j to i_1' is at most c_j^{(1)} + 2c_j^{(2)}. Using the AM-GM inequality, we get c_j^p ≤ 3^{p−1}(c_j^{(1)p} + 2c_j^{(2)p}).
• With probability (1 − ρ)^2, j's nearest cluster center in S_1 is not opened and S_2 is opened. c_j is at most the distance from j to i_2, i.e., c_j^{(2)}.
Then we get:
\[
\begin{aligned}
\mathbb{E}\big[ (c_j^p - 9^p r_j^p)^+ \big]
&\le \rho \big( c_j^{(1)p} - 9^p r_j^p \big)^+ + (1-\rho)^2 \big( c_j^{(2)p} - 9^p r_j^p \big)^+ + (1-\rho)\rho \big( 3^{p-1}(c_j^{(1)p} + 2 c_j^{(2)p}) - 9^p r_j^p \big)^+ \\
&= \rho \big( c_j^{(1)p} - 9^p r_j^p \big)^+ + (1-\rho)^2 \big( c_j^{(2)p} - 9^p r_j^p \big)^+ + 3^{p-1}(1-\rho)\rho \big( c_j^{(1)p} + 2 c_j^{(2)p} - 3 \cdot 3^p r_j^p \big)^+ \\
&\le \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1-\rho)^2 \big( c_j^{(2)p} - 3^p r_j^p \big)^+ + 3^{p-1}(1-\rho)\rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + 2 \cdot 3^{p-1}(1-\rho)\rho \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \\
&= \big( 3^{p-1}(1-\rho) + 1 \big) \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ \Big] + \big( 2 \cdot 3^{p-1}\rho + 1 - \rho \big) \Big[ (1-\rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big] \\
&\le \big( 2 \cdot 3^{p-1} - \min\{\rho, 1-\rho\} \big) \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1-\rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big] \\
&\le \big( 2 \cdot 3^{p-1} - \epsilon' \big) \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1-\rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big],
\end{aligned}
\]
where the last step is given by choosing ϵ' to be at most 1/|F|, since ρ = (k − k_2)/(k_1 − k_2) and 1 ≤ k_2 < k < k_1 ≤ |F|, and thus ρ and 1 − ρ are both at least 1/|F|. This gives the lemma, except that the algorithm is randomized. However, the randomized rounding scheme can easily be derandomized: first, we choose S* to be whichever of S_1', S_2 has a lower expected objective. Then, to choose the remaining k − k_2 cluster centers to add to S*, we add cluster centers one by one. When we have c cluster centers left to add to S*, we add the cluster center i from S_1 \ S_1' that minimizes the expected objective achieved by S* ∪ {i} and c − 1 random cluster centers from (S_1 \ S_1') \ (S* ∪ {i}).
Each step of the derandomization cannot increase the expected objective, so the derandomized algorithm achieves the guarantee of Lemma 3.4. □

Again, we note that Lemma 2.3 is obtained as a corollary of Lemma 3.4, where we set p = 1.
ARUN GANESH, UC Berkeley, USA BRUCE M. MAGGS, Duke University and Emerald Innovations, USA DEBMALYA PANIGRAHI, Duke University, USA This paper presentsuniversalalgorithms for clustering problems, including the widely k-median, studie k-means, d and k-center objectives. The input is a metric space containing potential allclient locations. The algorithm must k cluster select centers such that they are a good solution for any subset of clients that actually realize. Speciically, weregr aim et, for low deined as the maximum over all subsets of the diference between the cost of the algorithm’s solution and that of an optimal solution. A universal algorithm’s solution sol for a clustering problem is said to(αb,eβan )-approximation if for all subsets ′ ′ ′ ′ ′ of clients C , it satisies sol(C ) ≤ α · opt(C ) + β · mr, where opt(C ) is the cost of the optimal solution forCclients and mr is the minimum regret achievable by any solution. Our main results are universal algorithms for the standard clustering obje k-me ctiv dian, es of k-means, and k-center that achieve(O (1),O (1))-approximations. These results are obtained via a novel framework for universal algorithms using linear programming (LP) relaxations. These results generalize to ℓ other -objectives and the setting where some subset of the clients are ixed. We also give hardness results showing(that α, β )-approximation is NP-harαd if or β is at most a certain constant, even for the widely studied special case of Euclidean metric spaces. This shows that in (Osome (1),Osense (1))-appr , oximation is the strongest type of guarantee obtainable for universal clustering. CCS Concepts: · Theory of computation → Facility location and clustering. Additional Key Words and Phrases: universal algorithms, clustering 1 INTRODUCTION In universalapproximation (e.g., 9,[10, 12, 22, 26, 31, 36, 49, 50]), the algorithm is presented with a set potential of input points and must produce a solution. After seeing the solution, an adversary selects some subset of the points as the actual realization of the input, and the cost of the solution is based on this realization. The goal of a universal algorithm is to obtain a solution that is near-optimal every possible for input realization. For example, suppose that a network-based-service provider can aford to deploy serversk at locations around the world and hopes to minimize latency between clients and servers. The service provider does not know in advance which clients will request service, but knows where clients are located. A universal solution provides guarantees on the quality of the solution regardless of which clients ultimately request service. As another example, suppose that a program committee chair wishes to invite k people to serve on the committee. The chair knows the areas of expertise of each person who is qualiied to serve. Based on past iterations of the conference, the chair also knows about many possible topics that might be addressed by submissions. The chair could use a universal algorithm to select a committee that will cover the topics well, regardless of the topics of the papers that are submitted. The In the context of clustering, universal facility location sometimes refers to facility location where facility costs scale with the number of clients assigned to them. This problem is unrelated to the notion of universal algorithms studied in this paper. Authors’ addresses: Arun Ganesh, arunganesh@berkeley.edu, UC Berkeley, Soda Hall, Berkeley, California, USA, 94709; Bruce M. 
Maggs, bmm@cs.duke.edu, Duke University and Emerald Innovations, 308 Research Drive, Durham, North Carolina, USA, 27710; Debmalya Panigrahi, debmalya@cs.duke.edu, Duke University, 308 Research Drive, Durham, North Carolina, USA, 27710. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proit or commercial advantage and that copies bear this notice and the full citation on the irst page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2022 Copyright held by the owner/author(s). 1549-6325/2022/12-ART https://doi.org/10.1145/3572840 ACM Trans. Algor. 2 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi situation also arises in targeting advertising campaigns to client demographics. Suppose a campaign can spend fork advertisements, each targeted to a speciic client type. While the entire set of client types that are potentially interested in a new product is known, the exact subset of clients that will watch the ads, or eventually purchase the product, is unknown to the advertiser. How does the advertiser target kher advertisements to address the interests of any realized subset of clients? Motivated by these sorts of applications, this paper presents the irst universal algorithms for clustering problems, including the classic k-median,k-means, and k-center problems. The input to these algorithms is a metric space containing all locations clientsand of cluster centers. The algorithm must sele k ct cluster centers such that this is a good solution any forsubset of clients that actually realize. It is tempting to imagine that, in general, for some large enoughα,value one can ofind a solution sol such ′ ′ ′ ′ that for all realizations (i.e., subsets ofCclients) , sol(C ) ≤ α · opt(C ), where sol(C ) denotes sol’s cost in ′ ′ ′ realization C and opt(C ) denotes the optimal cost in realization C . But this turns out to be impossible for many problems, including the clustering problems we study, and indeed this diiculty may have limited the study of universal algorithms. For example, suppose that the inputkfor -methe dian problem is a uniform metric k + 1on points, each with a cluster center and client. In this case, for any sol solution withk cluster centers, there is some realization C consisting of a single client that is not co-located with k cluster any of the centers in sol. Then, ′ ′ sol(C ) > 0 but opt(C ) = 0. Since it is not possible to provide a strict multiplicative approximation guarantee for every realization, we could instead consider an additive approximation. That is, we could instead seek to minimize the regret, deined as the maximum diference between the cost of the algorithm’s solution and the optimal cost across all realizations. Informally, the regret is the additional cost incurred due to not knowing the realization ahead of time. The solution that minimizes regret is calle minimum d the regret solution , or mrs for short, and its ′ ′ regret is termedminimum regror et mr. More formally mr, = min max ′[sol(C ) − opt(C )]. We now seek a sol C ′ ′ ′ ′ ′ solution sol that achieves, for all input realizations C , sol(C )− opt(C ) ≤ mr, i.e.,sol(C ) ≤ opt(C ) + mr. 
But, obtaining such a solution turns out to NPb-har e d for many problems, and furthermore obtaining a solution that achieves approximately minimum regret (that is, it has regretc ·atmr most for somec) is also NP-hard in general (see Section 8 for more details). In turn, one has to settle for approximationopt on band othmr. That is, we settle ′ ′ ′ for seeking sol such that sol(C ) ≤ α · opt(C ) + β · mr for all C . An algorithm generating such a solution is then called an (α, β )-approximate universal algorithm for the problem. Note that in the aforementioned example withk + 1 points, any solution must pay mr (the distance between any two points) in some realization where opt(C ) = 0 and only one client appears (in which case paying mr might sound avoidable or undesirable). This example demonstrates that stricter notions of regret and approximation (αthan , β )-approximation are infeasible in general, suggesting that (α, β )-approximation is the least relaxed guarantee possible for universal clustering. 1.1 Problem Definitions and Results We are now ready to formally deine our problems and state our results. In all the clustering problems that we consider in this paper, the input is a metric space on all the potential client C and cluster locations centersF. Let c denote the metric distance between points i and j. The solution produced by the algorithm comprises k ij cluster centers in F; let us denote this set by sol. Now, suppose a subset of clients C ⊆ C realizes in the actual input. Then, the cost of each client j ∈ C is given as the distance from the client to its closest cluster center, 2 d We only consider inite C in this paper.CIfis ininite C (e.g. = R ), then the minimum regret will usually also be ininite. If one restricts to realizations where, say, at most m clients appear, it suices to consider realizations that m clients place at one of initely many points, letting us reduce to universal k-center with inite C. The special case wherFe= C has also been studied in the clustering literature17 , e,.g., 32],in although [ the more common setting (as in our work) is to not make this assumption. Of course, all results (including ours) without this assumption also apply to the special case. If F = C, the constants in our bounds improve, but the results are qualitatively the same. We note that some sources refer k-center to the problem whenF , C as the k-supplier problem instead, andkuse -center to refer exclusively to the case wher F =e C. ACM Trans. Algor. Universal Algorithms for Clustering Problems • 3 i.e., cost(j, sol) = min c . The clustering problems difer in how these costs are combined into the overall i∈sol ij minimization objective. The respective objectives are given below: • k-median (e.g., [6, 13, 17, 34, 44]): sum of client costs, i.e sol.,(C ) = ′ cost(j, sol). j∈C • k-center (e.g., [21, 32, 33, 40, 47]): maximumclient cost, i.e sol.,(C ) = max cost(j, sol). j∈C ′ 2 • k-means (e.g., [1, 30, 37, 43, 45]): ℓ -norm of client costs, i.e sol.,(C ) = ′ cost(j, sol) . 2 j∈C We also consider ℓ -clustering (e.g., [30]) which generalizes all these individual clusteringℓobje - ctives. In p p clustering, the objective isℓ the -norm of the client costs for a given value p ≥ 1, i.e., 1/p ′ * p+ . / sol(C ) = cost(j, sol) . j∈C , - Note that k-median andk-means are special cases ℓof-clustering for p = 1 and p = 2 respectively k-center . 
can also be deined in the ℓ -clustering framework as the limit of the objectiv p →e∞for ; moreover, it is well-known that ℓ -norms only difer by constants for p > logn (see Appendix A), thereby allowingk-center the objective to be approximated within a constantℓby -clustering for p = logn. Our main result is to obtain (O (1),O (1))-approximate universal algorithms k-me for dian,k-center, and k-means. We also generalize these results toℓ the -clustering problem. Theorem 1.1. There are (O (1),O (1))-approximate universal algorithms forkthe -median,k-means, and k-center problems. More generally, there are (O (p),O (p ))-approximate universal algorithmsℓfor -clustering problems, for any p ≥ 1. Remark: The bound fork-means is by setting p = 2 inℓ -clustering. For k-median andk-center, we use separate algorithms to obtain improved bounds than those provided by ℓ -clustering the result. This is particularly noteworthy fork-center where ℓ -clustering only gives poly-logarithmic approximation. Universal Clustering with Fixed Clients. We also consider a more general setting where some of the clients are ixed, i.e., are there in any realization, but the remaining clients may or may not realize as in the previous case. (Of course, if no client is ixed, we get back the previous setting as a special case.) This more general model is inspired by settings where a set of clients is already present but the remaining clients are mere predictions. This surprisingly creates new technical challenges, that we overcome to get: Theorem 1.2. There are (O (1),O (1))-approximate universal algorithms for kthe -median,k-means, and k- 2 2 center problems with ixed clients. More generally, there ar (Oe(p ),O (p ))-approximate universal algorithms for ℓ -clustering problems, for anyp ≥ 1. Hardness Results. Next, we study the limits of approximation for universal clustering. In particular, we show that the universal clustering problems for all the objectives considered in this NP-har pap d er inaraerather strong sense. Speciically, we show that bαoth and β are separately bounded away from 1, irrespective of the value of the other parameter, showing the necessity of both α and β in our approximation bounds. Similar lower bounds continue to hold for universal clustering in Euclidean metrics, even when PTASes are known in the oline (non-universal) setting [1, 5, 41, 43, 47]. Theorem 1.3. In universalℓ -clustering for anyp ≥ 1, obtaining α < 3 or β < 2 isNP-hard. Even for Euclidean metrics, obtaining α < 1.8 or β ≤ 1 isNP-hard. The lower bounds on α (resp., β) are independent of the value ofβ (resp., α). Interestingly, our lower bounds rely on realizations where sometimes as few as one client appears. This suggests that e.g. redeining regret to be some function of the number of clients that appear (rather than just their cost) cannot subvert these lower bounds. ACM Trans. Algor. 4 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi 1.2 Techniques Before discussing our techniques, we discuss why standard approximations for clustering problems are insuicient. It is known that theoptimalsolution for the realization that includes all clients (1, 2)-appr giv oximation es a for universal k-median (this is a corollary of a more general result 38]; wein do[not know if their analysis can be extended to e.g. k-means), giving universal algorithms for easyž ł cases k-meof dian such as tree metrics. 
But, the clustering problems we consider in this pap NP er-har ared in general; so, the best we can hope for in polynomial time is to obtain optimal fractionalsolutions, or approximateinteger solutions. Unfortunately, the proof of38 [ ] does not generalize any to regret guarantee for the optimal fractionalsolution. Furthermore, for all problems considered in this paper, (e1v+ enϵ )-approximate (integer) solutions for the łall clientsž instance are not guaranteed to be (α, β )-approximations for any inite α, β (see the example in Appendix B). These observations fundamentally distinguish universal approximations NP-hard pr for oblems like the clustering problems in this paper from those in P, and require us to develop new techniques for universal approximations. In this paper, we develop a general framework for universal approximation based on linear prlp ogramming ) ( relaxations that forms the basis of our results k-me ondian,k-means, and k-center (Theorem 1.1) as well as the extension to universal clustering with ixed clients (Theorem 1.2). The irst step in our framework is to write lpan relaxation of the regret minimization problem. In this formulation, we introduce a new regret variable that we seek to minimize and is constrained to be at least the diference between the (fractional) solution obtainelp d by and the the optimal integer solution for every realizable instance . Abstractly, if the lp relaxation of the optimization problem ismin giv{cen· xby : x ∈ P}, then the new regret minimization lp is given by min{r : x ∈ P; c(I )·x ≤ opt(I ) + r, ∀I}. Here, I ranges over all realizable instances of the problem. Hence lp is, the exponential in size, and we need to invoke the ellipsoid method via a separation oracle to obtain an optimal fractional solution. We irst note that the constraintsx ∈ P can be handled using a separation oracle for the optimization problem itself. So, our focus is on designing a separation oracle for the new set of constraints c(I )·x ≤ opt(I ) + r, ∀I. This amounts to determining the regret of a ixed solution given x, by which unfortunatelyNP , is-hard for our clustering problems. So, we settle for designing an approximate separation oracle, i.e., approximating the regret of a given solution. For k-median, we reduce this to a submodular maximization problem subject to a cardinality constraint, which can then be (approximately) solved via standard greedy algorithms. k-means, For and more generally ℓ -clustering, the situation is more complex. Similarly k-median, to maximizing regret for a ixed solution can be reduced to a set of submodular maximization problems, but deriving the functional value for a given set now requires solving the NP-hard knapsack problem. We overcome this diiculty by showing that we can fractional use knapsack solutions as surrogates for the optimal integer knapsack solution in this reduction, thereby restoring polynomial running time of the submodular maximization oracle. Finally, in the presence of ixed clients, we need to run the submodular maximization algorithm over a set of combinatorial obje indep cts calle endencedsystems . In this case, the resulting separation oracle only gives a bicriteria guarantee, i.e., the solutions it considers feasible are only guaranteed to satisfy∀I : c(I )·x ≤ α · opt(I ) + β · r for constantsα and β. Note that the bi-criteria guarantee suices for our purposes since these constants are absorbed in the overall approximation bounds. 
The next step in our framework is to round these fractional solutions to integer solutions for the regret minimization lp. Typically, in clustering problemsksuch -median, as lp rounding algorithms giv average e guarantees, i.e., although the overall objective in the integer solution is bounded against that of the fractional For problems like k-means with non-linear objectives, the constraint c(I )·x ≤ opt(I ) + r cannot be replaced with a constraint that is simultaneously linear x, r.in However, for a ixed value rof , the corresponding non-linear constraints still give a convex feasible region, and so the techniques we discuss in this section can still be used. ACM Trans. Algor. Universal Algorithms for Clustering Problems • 5 solution, individual connection costs of clients are not (deterministically) preserved in the rounding. But, average guarantees are too weak for our purpose: in a realized instance, an adversary may only select the clients whose connection costs increase by a large factor in the rounding thereby causing a large regret. Ideally, we would like to ensure that the connection costevof ery individual client is preserved up to a constant in the rounding. However, this may be impossible in general, i.e., no integer solution might satisfy this requirement. Consider a uniform metric ovker+ 1 points. One fractional solution is to make fraction of each point a cluster center. k+1 1 1 Then, each client has connection cost in the fractional solution since it needs to conne fraction ct to a k+1 k+1 remote point. However, in any integer solution, since there ar k ecluster only centers butk + 1 points overall, there is one client that has connection cost of 1, which k + 1is times its fractional connection cost. To overcome this diiculty, we allow for a uniform additivincr e ease in the connection cost of every client. We show that such a rounding also preserves the regret guarantee of our fractional solution within constant factors. The clustering problem we now solve has a modiied objective: for every client, the distance to the closest cluster center is now discounted by the additive allowance, with the caveat that the connection cost is 0 if this diference is negative. This variant is a generalization of a problem25 app ],earing and we call in [ it clustering with discounts(e.g., fork-median, we call this problem k-median with discounts .) Our main tool in the rounding then becomes an approximation algorithm to clustering problems with discounts. k-median, For we use a Lagrangian relaxation of this problem to the classic facility location problem to design such an appr k-means oximation. For and ℓ -clustering, the same general concept applies, but we need an additional ingredient virtual calle solution d a that acts as a surrogate between the regret of the (rounded) integer solution and that of the fractional solution obtained above. Fork-center, we give a purely combinatorial (greedy) algorithm. 1.3 Related Work For all previous universal algorithms, the approximation factor corresponds to ourα,parameter i.e., these algorithms ar(αe , 0)-approximate. The notion of regret was not considered. As we have explained, however, it is not possible to obtain such results for universal clustering. Furthermore, it may be possible to trade-of some of the large values αofin the results below, eΩ.g., ( n) for set cover, by allowing β > 0. 
Universal algorithms have been of large interest in part because of their applications as online algorithms where all the computation is performed ahead of time. Much of the work on universal algorithms has focused on TSP. For Euclidean TSP in the plane, Platzman and Bartholdi 49] gave[an O (logn)-approximate universal algorithm. Hajiaghayi et al. [31] generalized this resultO to(an logn)-approximation for minor-free metrics, and Schalekamp and Shmoys [50] gave an O (logn)-approximation for tree metrics. For arbitrary metrics, et al.Jia [35] presented an O (logn/ log log n)-approximation, which improves to O (an logn)-approximation for doubling metrics. The approximation factor for arbitrary metrics was imprO ov(elog d ton) by Gupta et al. [26]. It is also known that these logarithmic bounds are essentially tight for univ9ersal , 10, 22TSP , 31].[ For the metric Steiner tree problem, Jiaet al. [35] adapted their own TSP algorithm to provide O (an logn/ log log n)-approximate universal algorithm, logn which is also tight up to the exponent of the2,log 10, 35 [ ]. Busch et al. [12] present an O (2 )-approximation for universal Steiner tree on general graphs and O (pan olylog (n))-approximation for minor-free graphs. Finally, for universal (weighted) set coveret , Jia al. [35] (see also23 [ ]) provide anO ( n logn)-approximate universal algorithm and an almost matching lower bound. The problem of minimizing regret has been studied in the context of robust optimization. The robust 1-median problem was introduced for tree metrics by Kouvelis and42 Yu] and in [several faster algorithms and for general metrics were developed in the following years (8e]). .g. For see r[obust k-center, Averbakh and Berman[8] gave a reduction to a linear number of ordinar k-center y problems, and thus for classes of instances where the ordinary k-center problem is polynomial time solvable (e.g., instances with k or constant on tree metrics) this problem is also polynomial time solvable 7]. A difer [ ent notion of robust algorithms is one wherSe of a set possible ACM Trans. Algor. 6 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi scenarios is provided as part of the input to the problem. This model was originally considered for network design problems (see the survey by Chekuri18[]). Anthony et al. [3] gave an O (logn + log|S|)-approximation algorithm for solving k-median and a variety of related problems in this model 11 (se])e on also an n[ -point metric space. However, note that |S| can be exponential |in C| in general. Another popular model for uncertainty is two-stage optimization 16, 19 (e,.g., 20,[27ś 29, 39, 51ś 53]). Here, the irst stage presents a set of realizable instances (or a distribution over them) and the second stage chooses one of those realizations. The algorithm is free to make choices at either stage but those choices come at a higher cost in the second stage when it has more information about the input. Because of the diferent costs, results in this model have no bearing on our setting. Roadmap. We present the constant approximation algorithms (Theorem 1.1) for univ k-me ersal dian,ℓ -clustering (k-means is a special case), and k-center in Sections 2, 3, and 4 respectively. In describing these algorithms, we defer the clustering with discounts algorithms used in the rounding to Appendix C. We also give the extensions to universal clustering with ixed clients k-median, for k-means/ℓ -clustering, and k-center (Theorem 1.2) in Sections 5, 6, and 7. 
Finally, the hardness results for general metrics and for Euclidean metrics (Theorem 1.3) appear in Sections 8 and 9 respectively.
2 UNIVERSAL k-MEDIAN
In this section, we prove the following theorem:
Theorem 2.1. There exists a (27, 49)-approximate universal algorithm for the k-median problem.
We follow the recipe described in Section 1.2. Namely, the algorithm has two components. The first component is a separation oracle for the regret minimization LP based on submodular maximization, which we define below.
Submodular Maximization with Cardinality Constraints. A (non-negative) function f : 2^E → R^+ is said to be submodular if for all S ⊆ T ⊆ E and x ∈ E, we have f(T ∪ {x}) − f(T) ≤ f(S ∪ {x}) − f(S). It is said to be monotone if for all S ⊆ T ⊆ E, we have f(T) ≥ f(S). The following theorem for maximizing monotone submodular functions subject to a cardinality constraint is well-known.
Theorem 2.2 (Nemhauser et al. [48]). For the problem of finding S ⊆ E that maximizes a monotone submodular function f : 2^E → R^+, the natural greedy algorithm that starts with S = ∅ and repeatedly adds the x ∈ E that maximizes f(S ∪ {x}) until |S| = k, is an e/(e−1) ≈ 1.58-approximation.
We give the reduction from the separation oracle to submodular maximization in Section 2.1, and then employ the above theorem. The second component of our framework is a rounding algorithm that employs the k-median with discounts problem, which we define below.
k-median with Discounts. In the k-median with discounts problem, we are given a k-median instance, but where each client j has an additional (non-negative) parameter r_j called its discount. Just as in the k-median problem, our goal is to place k cluster centers that minimize the total connection costs of all clients. But, the connection cost for client j can now be discounted by up to r_j, i.e., client j with connection cost c_j contributes (c_j − r_j)^+ := max{0, c_j − r_j} to the objective of the solution.
Let opt be the cost of an optimal solution to the k-median with discounts problem. We say an algorithm alg that outputs a solution with connection cost c_j for client j is a (γ, σ)-approximation if:
Σ_{j∈C} (c_j − γ · r_j)^+ ≤ σ · opt.
That is, a (γ, σ)-approximate algorithm outputs a solution whose objective function, when computed using discounts γ · r_j for all j, is at most σ times the optimal objective using discounts r_j. In the case where all r_j are equal, [25] gave a (9, 6)-approximation algorithm for this problem based on the classic primal-dual algorithm for k-median. The following lemma generalizes their result to the setting where the r_j may differ:
Lemma 2.3. There exists a (deterministic) polynomial-time (9, 6)-approximation algorithm for the k-median with discounts problem.
We give details of the algorithm and the proof of this lemma in Appendix C. We note that when all r_j are equal, the constants in [25] can be improved (see e.g. [15]); we do not know of any similar improvement when the r_j may differ. In Section 2.2, we give the reduction from rounding the fractional solution for universal k-median to the k-median with discounts problem, and then employ the above lemma.
2.1 Universal k-median: Fractional Algorithm
The standard k-median polytope (see e.g., [34]) is given by:
P = {(x, y) : Σ_i x_i ≤ k; ∀i, j : y_{ij} ≤ x_i; ∀j : Σ_i y_{ij} ≥ 1; ∀i, j : x_i, y_{ij} ∈ [0, 1]}.
i ij i ij i ij i i Here, x represents whether pointi is chosen as a cluster center, and y represents whether client j connects to i i ij as its cluster center. Now, consider the follo lpwing formulation for minimizing r: regret X X ′ ′ min{r : (x,y) ∈ P;∀C ⊆ C : c y − opt(C ) ≤ r}, (1) ij ij j∈C i ′ ′ where opt(C ) is the cost of the (integral) optimal solution in Crealization . Note that the new constraints: P P ′ ′ ∀C ⊆ C : ′ c y − opt(C ) ≤ r (we call it the regret constraint set) require that the regret is ratin most j∈C i ij ij all realizations. In order to solvelp (1), we need a separation oracle for the regret constraint set. Note that there are exponentially ′ ′ ′ many constraints corresponding to realizations C ; moreover, even for a single realization C , computingopt(C ) isNP-hard. So, we resort to designing an approximateseparation oracle. Fix some fractional solution (x,y, r ). ′ ′ Overloading notation,Slet (C ) denote the cost of the solution with cluster centers S in realization C . By deinition, ′ ′ opt(C ) = min S (C ). Then designing a separation oracle for the regret constraint set is equivalent to S⊆F,|S|=k determining if the following inequality holds:   X X    max max  c y − S (C ) ≤ r . ij ij   C ⊆C S⊆F,|S|=k  ′  j∈C i   We lip the order of the two maximizations, and deine f (S) as follows:   X X    ′    f (S) = max c y − S (C ) . y ij ij ′   C ⊆C  ′  j∈C i   Then designing a separation oracle is equivalent to maximizing f (S) forS ⊆ F subject to |S| = k. The rest of the proof consists of showing that this function is monotone and submodular, and eiciently computable. Lemma 2.4. Fixy. Then, f (S) is a monotone submodular function Sin . Moreover, f (S) is eiciently computable y y for a ixedS. Proof. Let d (j, S) := min′ c ′ denote the distance from clien j totthe nearest cluster center in S. IfS = ∅, i ∈S i j we say d (j, S) := ∞. The value of C that deinesf (S) is the set of all clients closer S thantoto the fractional ACM Trans. Algor. 8 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi ′ ′ solution y, i.e., c y > min c . This immediately establishes eicient computability f (S). Moreof over, i ij ij i ∈S i j y we can equivalently write f (S) as follows: X X f (S) = ( c y − d (j, S)) . y ij ij j∈C i A sum of monotone submodular functions is a monotone submodular function, so it suices to show that for all clients j, the new function д (S) := ( c y − d (j, S)) is monotone submodular. y, j ij ij P P + + • д ismonotone: forS ⊆ T , d (j,T ) ≤ d (j, S), and thus ( c y − d (j, S)) ≤ ( c y − d (j,T )) . y, j ij ij ij ij i i • д is submodular if: y, j ∀S ⊆ T ⊆ F ,∀x ∈ F : д (S ∪ {x}) − д (S) ≥ д (T ∪ {x}) − д (T ) y, j y, j y, j y, j FixS, T , and x. Assume д (T ∪{x})− д (T ) is positive (if it is zero, by monotonicity the above inequality y, j y, j trivially holds). This implies x is closer that to client j than any cluster center in T (and hence S too), i.e., d (j, x ) ≤ d (j,T ) ≤ d (j, S). Thus, d (j, x ) = d (j, S ∪ {x}) = d (j,T ∪ {x}) which impliesдthat (S ∪ {x}) = y, j д (T ∪ {x}). Then we just need to show that д (S) ≤ д (T ), but this holds by monotonicity. 
By standard results (see e.g., GLS [24]), we get an (α, β)-approximate fractional solution for universal k-median via the ellipsoid method if we have an approximate separation oracle for LP (1) that, given a fractional solution (x, y, r), does either of the following:
• Declares (x, y, r) feasible, in which case (x, y) has cost at most α · opt(I) + β · r in all realizations, or
• Outputs an inequality violated by (x, y, r) in LP (1).
The approximate separation oracle does the following for the regret constraint set (all other constraints can be checked exactly): Given a solution (x, y, r), find an (e−1)/e-approximate maximizer S of f_y via Lemma 2.4 and Theorem 2.2. Let C′ be the set of clients closer to S than to the fractional solution y (i.e., the realization that maximizes f_y(S)). If f_y(S) > r, the separation oracle returns the violated inequality Σ_{j∈C′} Σ_i c_{ij} y_{ij} − S(C′) ≤ r; else, it declares the solution feasible. Whenever the actual regret of (x, y) is at least e/(e−1) · r, this oracle will find S such that f_y(S) > r and output a violated inequality. Hence, we get the following lemma:
Lemma 2.5. There exists a deterministic algorithm that in polynomial time computes a fractional e/(e−1) ≈ 1.58-approximate solution for LP (1) representing the universal k-median problem.
2.2 Universal k-Median: Rounding Algorithm
Let frac denote the e/(e−1)-approximate fractional solution to the universal k-median problem provided by Lemma 2.5. We will use the following property of k-median, shown by Archer et al. [4].
Lemma 2.6 ([4]). The integrality gap of the natural LP relaxation of the k-median problem is at most 3.
Lemmas 2.5 and 2.6 imply that for any set of clients C′,
(1/3) · opt(C′) ≤ frac(C′) ≤ opt(C′) + e/(e−1) · mr.   (2)
Our overall goal is to obtain a solution sol that minimizes max_{C′⊆C} [sol(C′) − opt(C′)]. But, instead of optimizing over the exponentially many different opt(C′) solutions, we use the surrogate 3 · frac(C′), which has the advantage of being defined by a fixed solution frac, but still 3-approximates opt(C′) by Eq. (2). This suggests minimizing the following objective instead: max_{C′⊆C} [sol(C′) − 3 · frac(C′)]. For a given solution sol, the set of clients C′ that maximizes the new expression is the set of clients whose connection costs in sol (denoted c_j) exceed 3 times their cost in frac (denoted f_j):
max_{C′⊆C} [sol(C′) − 3 · frac(C′)] = Σ_{j∈C} (c_j − 3f_j)^+.
But, minimizing this objective is precisely the aim of the k-median with discounts problem, where the discount for client j is 3f_j. This allows us to invoke Lemma 2.3 for the k-median with discounts problem.
Thus, our overall algorithm is as follows. First, use Lemma 2.5 to find a fractional solution frac = (x, y, r). Let f_j := Σ_i c_{ij} y_{ij} be the connection cost of client j in frac. Then, construct a k-median with discounts instance where client j has discount 3f_j, and use Lemma 2.3 on this instance to obtain the final solution to the universal k-median problem. We now complete the proof of Theorem 2.1 using the above lemmas.
Proof of Theorem 2.1. Let m_j be the connection cost of mrs to client j. Then,
mr = max_{C′⊆C} [mrs(C′) − opt(C′)] ≥ max_{C′⊆C} [mrs(C′) − 3 · frac(C′)]   (by Eq. (2))
   = Σ_{j∈C} (m_j − 3f_j)^+ = Σ_{j∈C : m_j > 3f_j} (m_j − 3f_j).
Thus, mr upper bounds the optimal objective in the k-median with discounts instance that we construct. Let c_j be the connection cost of client j in the solution output by the algorithm.
Then by Lemma 2.3 we get that:
Σ_{j∈C} (c_j − 27f_j)^+ ≤ 6 · Σ_{j∈C} (m_j − 3f_j)^+ ≤ 6 · mr.   (3)
As a consequence, we have:
∀C′ ⊆ C : Σ_{j∈C′} c_j = Σ_{j∈C′} [27f_j + (c_j − 27f_j)] ≤ Σ_{j∈C′} 27f_j + Σ_{j∈C′} (c_j − 27f_j)^+ ≤ 27 · frac(C′) + 6 · mr,
where the last step uses the definition of f_j and Eq. (3). Now, using the bound on frac(C′) from Eq. (2) in the inequality above, we have the desired bound on the cost of the algorithm:
∀C′ ⊆ C : Σ_{j∈C′} c_j ≤ 27 · frac(C′) + 6 · mr ≤ 27 · (opt(C′) + e/(e−1) · mr) + 6 · mr ≤ 27 · opt(C′) + 49 · mr. □
3 UNIVERSAL ℓ_p-CLUSTERING AND UNIVERSAL k-MEANS
In this section, we give universal algorithms for ℓ_p-clustering with the following guarantee:
Theorem 3.1. For all p ≥ 1, there exists a (54p, 103p)-approximate universal algorithm for the ℓ_p-clustering problem.
As a corollary, we obtain the following result for universal k-means (p = 2).
Corollary 3.2. There exists a (108, 412)-approximate universal algorithm for the k-means problem.
Before describing further details of the universal ℓ_p-clustering algorithm, we note a rather unusual feature of the universal clustering framework. Typically, in ℓ_p-clustering, standard algorithms effectively optimize the ℓ_p^p objective (e.g., sum of squared distances for k-means) because these are equivalent in the following sense: an α-approximation for the ℓ_p^p objective is equivalent to an α^{1/p}-approximation for the ℓ_p objective. But, this equivalence fails in the setting of universal algorithms for reasons that we discuss below. Indeed, we first give a universal ℓ_p^p-clustering algorithm, which is a simple extension of the k-median algorithm, and then give a universal ℓ_p-clustering algorithm, which turns out to be much more challenging.
3.1 Universal ℓ_p^p-Clustering
As in universal k-median, we can write an LP formulation for universal ℓ_p^p-clustering, i.e., clustering with the objective sol(C′) = Σ_{j∈C′} cost(j, sol)^p:
min{ r : (x, y) ∈ P; ∀C′ ⊆ C : Σ_{j∈C′} Σ_i c_{ij}^p y_{ij} − opt(C′) ≤ r },   (4)
where P is still the k-median polytope defined in Section 2.1. The main difficulty is that the ℓ_p^p distances no longer form a metric, i.e., they do not satisfy the triangle inequality. Nevertheless, the distances still have a metric connection, in that they are the pth power of metric distances. We show that this connection is sufficient to prove the following result:
Theorem 3.3. For all p ≥ 1, there exists a (27^p, 27^p · e/(e−1) + (2/3) · 9^p)-approximate algorithm for the universal ℓ_p^p-clustering problem.
As in universal k-median, a key component in proving Theorem 3.3 is a rounding algorithm that employs a bi-criteria approximation to the ℓ_p^p-clustering with discounts problem. Indeed, this result will also be useful in the next subsection, when we consider the universal ℓ_p-clustering problem. So, we formally define the ℓ_p^p-clustering with discounts problem below and state our result for it.
ℓ_p^p-clustering with Discounts. In this problem, we are given an ℓ_p^p-clustering instance, but where each client j has an additional (non-negative) parameter r_j called its discount. Our goal is to place k cluster centers that minimize the total connection costs of all clients. But, the connection cost for client j can now be discounted by up to r_j, i.e., client j with connection cost c_j contributes (c_j^p − r_j^p)^+ := max{0, c_j^p − r_j^p} to the objective of the solution. (Note that the k-median with discounts problem that we described in the previous section is a special case of this problem for p = 1.)
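As a quick illustration of this objective (a hedged sketch with assumed inputs, not code from the paper), the discounted cost of a candidate set of centers S can be computed as follows; setting p = 1 recovers the k-median with discounts objective.

def discounted_cost(S, C, c, r, p):
    """l_p^p-clustering with discounts: each client j pays max(0, c_j^p - r_j^p),
    where c_j is its distance to the nearest center in S and r_j is its discount."""
    total = 0.0
    for j in C:
        c_j = min(c[i][j] for i in S)
        total += max(0.0, c_j**p - r[j]**p)
    return total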
Let opt be the cost of an optimal solution to the ℓ_p^p-clustering with discounts problem. We say an algorithm alg that outputs a solution with connection cost c_j for client j is a (γ, σ)-approximation (we refer to this as a (γ, σ)-approximation rather than a (γ^p, σ)-approximation to emphasize the difference between the scaling factor γ^p for the discounts and the loss γ in the approximation factor) if:
Σ_{j∈C} (c_j^p − γ^p · r_j^p)^+ ≤ σ · opt.
That is, a (γ, σ)-approximate algorithm outputs a solution whose objective function computed using discounts γ · r_j for all j is at most σ times the optimal objective using discounts r_j. We give the following result about the ℓ_p^p-clustering with discounts problem (see Appendix C for details):
Lemma 3.4. There exists a (deterministic) polynomial-time (9, (2/3) · 9^p)-approximation algorithm for the ℓ_p^p-clustering with discounts problem.
We now employ this lemma in obtaining Theorem 3.3. Recall that the universal k-median result in the previous section had three main ingredients:
• Lemma 2.5 to obtain an e/(e−1)-approximate fractional solution. This continues to hold for the ℓ_p^p objective, since Lemma 2.5 does not use any metric property.
• An upper bound of 3 on the integrality gap of the natural LP relaxation of k-median from [4]. The same result now gives an upper bound of 3^p on the integrality gap of ℓ_p^p-clustering.
• Lemma 2.3 to obtain an approximation guarantee for the k-median with discounts problem. This is where the metric property of the connection costs in the k-median problem was being used. Nevertheless, Lemma 3.4 above gives a generalization of Lemma 2.3 to the ℓ_p^p-clustering with discounts problem.
Theorem 3.3 now follows from these three observations using exactly the same steps as Theorem 2.1 in the previous section; we omit these steps for brevity. □
The rest of this section is dedicated to the universal ℓ_p-clustering problem. As for k-median, we have two stages, the fractional algorithm and the rounding algorithm, which we present in the next two subsections.
3.2 Universal ℓ_p-Clustering: Fractional Algorithm
Let us start by describing the fractional relaxation of the universal ℓ_p-clustering problem (again, P is the k-median polytope defined as in Section 2.1):
min{ r : (x, y) ∈ P; ∀C′ ⊆ C : ( Σ_{j∈C′} Σ_i c_{ij}^p y_{ij} )^{1/p} − opt(C′) ≤ r }.   (5)
As described earlier, when minimizing regret, the ℓ_p and the ℓ_p^p objectives are no longer equivalent. For instance, recall that one of the key steps in Lemma 2.5 was to establish the submodularity of the function f_y(S) denoting the maximum regret caused by any realization when comparing two given solutions: a fractional solution y and an integer solution S. Indeed, the worst case realization had a simple structure: choose all clients that have a smaller connection cost for S than for y. This observation continues to hold for the ℓ_p^p objective because of the linearity of f_y(S) as a function of the realized clients once y and S are fixed. But, the ℓ_p objective is not linear even after fixing the solutions, and as a consequence, we lose both the simple structure of the maximizing realization and the submodularity of the overall function f_y(S). For instance, consider two clients: one at distances 1 and 0, and another at distances 1 + ϵ and 1, from y and S respectively. Using the ℓ_p objective, the regret with both clients is (2 + ϵ)^{1/p} − 1, whereas with just the first client the regret is 1, which is larger for p ≥ 2.
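The failure of this equivalence can be checked numerically on the toy example above (assumed values, purely an illustration): for p = 2, adding the second client decreases the regret of y against S.

p, eps = 2, 0.01
dist_y = [1.0, 1.0 + eps]   # client distances under the fractional solution y
dist_S = [0.0, 1.0]         # client distances under the integer solution S

def lp_regret(chosen):
    # l_p cost of y minus l_p cost of S on the chosen realization
    cost = lambda d: sum(d[j]**p for j in chosen) ** (1 / p) if chosen else 0.0
    return cost(dist_y) - cost(dist_S)

print(lp_regret([0]))      # first client only: 1.0
print(lp_regret([0, 1]))   # both clients: about 0.42 < 1.0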
for The above observation results in two related diiculties: irst, f (S) that is not submodular and hence standard submodular maximization techniques do not apply, but also that y and giv S,en we cannot even compute the functionf (S) eiciently. To overcome this diiculty, we further reine the function f (S) to a collection of y y functions f (S) by also ixing the cost of the fractional solution y to at most a given value Y . As we will soon y,Y see, this allows us to relateℓ the objective to theℓ objective, but under an additional łknapsackž-like packing constraint. It is still not immediate f is thateiciently computable because of the knapsack constraint that y,Y we have introduced. Our second observation is that relaxing NPthe -har ( d) integer knapsack problem to the corresponding (poly-time) fractionalknapsack problem does not afect the optimal value f of (S) (i.e., allowing y,Y fractional clients does not incr y’sease regret), while making the function eiciently computable. As a bonus, the relaxation to fractional knapsack also restores submodularity of the function, allowing us to use standard maximization tools as earlier. We describe these steps in detail below. p q q ′ ′ To relate the regret in the ℓ and ℓ objectives, let frac (C ) and S (C ) denote the ℓ objective to theqth p p p p p ′ ′ ′ power fory and S respectively in realization C (and letfrac (C ), S (C ) denote the correspondingℓ objectives). p p p Assume that y’s regret againstS is non-zero. Then: P P 6 ′ The constraints are not simultaneously linear y andin r , although ixing r , we can write these constraints as ′ c y ≤ (opt(C ) + i j j∈C i i j r ) , which is linear y. In inturn, to solve this program we bisection searchro,vusing er the ellipsoid method to determine if there is a feasible point for each ixe r . d ACM Trans. Algor. 12 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi   p p ′ ′ f g   frac (C ) − S (C ) p p ′ ′     max frac (C ) − S (C ) = max p p p−1 q p−1−q ′ ′   C ⊆C C ⊆C ′ ′  frac (C )S (C )  p p q=0   p p  ′ ′  frac (C ) − S (C )   p p   ≃ max   p−1 C ⊆C   frac (C )  p  p p  ′ ′  frac (C ) − S (C )   p p   = max max   p 1−1/p Y ′ ′  Y  C ⊆C:frac (C )≤Y   p p   ′ ′ max [frac (C ) − S (C )]  ′ ′  p p C ⊆C,frac (C )≤Y   = max   . 1−1/p       The ≃ denotes equality to within a factor p, and of uses the fact that if the regret is non-zero, then for every ′ ′ ′ C such that frac (C ) > S (C ) (one of which is always the maximizer of all expressions in this equation), every p p p−1 term in the sum in the denominator is upper boundefra d byc (C ). We would like to argue that the numerator, p p p ′ ′ ′ ′ max{frac (C ) − S (C ) : C ⊆ C, frac (C ) ≤ Y}, p p p is a submodular function S. of If we did this, then we could ind an adversary and realization of clients that (approximately) maximizes the regryetby ofiterating over all (discretized) values Y . But,of as described above, it is easier to work with its fractional analog because of the knapsack constraint: p p p f (S) := max{frac (I) − S (I) : I ∈ [0, 1] , frac (I) ≤ Y}. y,Y p p p P P P p p p p p Here, frac (I) := d · c y and S (I) := d · min c are the natural extensions of the ℓ j ij j i∈S p j∈C i∈F p j∈C p ij ij objective to fractional clients,dwher isethe fraction of client j that is includedI.in The next lemma shows that allowing fractional clients does not afect the maximum regret: Lemma 3.5. 
For any two solutionsy, S, there exists a global maximum fra of c (I) − S (I) over I ∈ [0, 1] where p p all the clients are integral, i.e I ∈.,{0, 1} . Therefore, f g f g ′ ′ max frac (I) − S (I) = max frac (C ) − S (C ) . p p p p C ⊆C I∈[0,1] We remark that unlike for the ℓ objective, integrality of the maximizer is not immediate ℓ obje for ctiv thee because the regret ofy compared to S is not a linear function I. of Proof. We will show that the derivativ fra ecof(I) − S (I) when frac (I) > S (I) with respect to a ixed p p p p d is either always positive or always negativ d ∈e(0for , 1), or negative while < 0d < d and then positive j j j afterwards. This gives that any I with a fractional coordinate wher frace (I) > S (I) (which is necessary for a p p fractional I to be a global maximum but not the integral all-zeroes vector) cannot be a local maximum, giving the lemma. To show this property of the derivative, letting I denote I withd = 0, we have frac (I)−S (I) = (frac (I )+ −j j p p −j p p 1/p 1/p c d ) − (S (I ) + c d ) where c ,c are the ℓ distance from j to frac, S respectively. For positiv d , the e 1 j −j 2 j 1 2 j p p derivative with respectd tois well-deined and equals ACM Trans. Algor. Universal Algorithms for Clustering Problems • 13 1 c c 1 2 * + − . p p 1−1/p 1−1/p (frac (I ) + c d ) (S (I ) + c d ) −j 1 j −j 2 j p p , - The derivative is positive if the following inequality holds: S (I ) + c d p −j 2 j c p−1 > ( ) . frac (I ) + c d 1 p −j 1 j p p We irst focus on the case wherec > c . The left-hand side startsSat(I )/frac (I ) and monotonically 1 2 p −j p −j and asymptotically approaches c /c as d increases. This implies that either it is always c /c atorleast it is 2 1 j 2 1 p/(p−1) p/(p−1) ′ increasing and approaching c /c , i.e. at least (c /c ) ford > 0 or less than(c /c ) for all d < d 2 1 2 1 j 2 1 j ′ p/(p−1) ′ for somed and then greater than (c /c ) for all d > d (here, we are using the fact that c > c and so 2 1 j 1 2 p/(p−1) (c /c ) < c /c ). In turn, the desired property of the derivative holds. 2 1 2 1 p p p p In the case where c ≤ c , at any optimumS (I ) + c d < frac (I ) + c d (otherwisefrac (I) < S (I) 1 2 −j 2 j −j 1 j p p p p and so I cannot be a maximum because the all zeroes vector achieves a better objective) and so the derivative is always negative in this case as desired. □ It turns out that relaxing to fractional clients not only helps in eicient computability f of (Sthe ), function y,Y but also simpliies the proof of submodularity of the function. Lemma 3.6. The functionf (S) as deined above is submodular. y,Y Proof. Fix a universal clustering instance, fractional y, and solution valueY . Consider anyS ⊆ T ⊆ F and x ∈ F. f (T ∪{x})− f (T ) ≤ f (S ∪{x})− f (S). f (S) is the optimum of a fractional knapsack instance y,Y y,Y y,Y y,Y y,Y p p where each client is an item with value equal to the diference between its contribution frac (I) − Sto(I) p p and weight equal to its contribution fracto(I), with the knapsack having total weight Y . For simplicity we can assume there is a dummy item with value 0 and weight Y in the knapsack instance as well (a value 0 item cannot afect the optimum). Note that the weights are ixed for any S, and the values increase monotonically withS. We will refer to the latter fact monotonicity as of values . The optimum of a fractional knapsack instance is given by sorting the items in decreasing ratio of value to weight, and taking the (fractional) preix of the items sorted this way that has total weight Y . 
We will refer to this fact pr aseix the property. We will show that we can construct a fractional knapsack solution that when using clusterScenters ∪ {x} has value at least f (S) + f (T ∪{x}) − f (T ), proving the lemma. For brevity, we will refer to fractions of clients which may y,Y y,Y y,Y be in the knapsack as if they were integral clients. Considerf (T ∪ {x}) − f (T ). We can split this diference into four cases: y,Y y,Y (1) A client in the optimal knapsack S, Tfor , and T ∪ {x}. (2) A client in the optimal knapsack S and forT ∪ {x} but not forT . (3) A client in the optimal knapsack T and forT ∪ {x} but not forS. (4) A client in the optimal knapsack T ∪for {x} but not forS or T . In every case, the client’s value must have increased (otherwise, it cannot contribute to the diference in cases 1 and 3, or it must also beTin ’s knapsack in cases 2 and 4), i.e x .is the closest cluster center to the client T ∪{in x} (and thus S ∪ {x}). Let w ,w ,w ,w be the total weight of clients in each case. The total weight ofTclients ’s in 1 2 3 4 knapsack but not T ∪ {x} isw + w . We will refer to these clients as replaced clients. The increase in value due 2 4 to cases 2 and 4 can be thought of as replacing the replaced clients with case 2 and 4 clients. In particular, we will think of the case 2 clients as replacing the replaced clientswofwith weight the smallest total value T , and for the case 4 clients as replacing the remaining replaced clients (i.e. those with the largest T ).total value for ACM Trans. Algor. 14 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi Without loss of generality, we assume there are no case 1 clients. By the preix property, any of the knapsack instances for S,T ,T ∪ {x} (and alsoS ∪ {x} by monotonicity of values and the preix property) has optimal value equal to the total value of case 1 clients plus the optimal value of a smaller knapsack instance with total weight Y − w and all clients except case 1 clients available. The value of case S 1∪clients {x} and Tfor ∪ {x} is the same (since the values are determinedxby ), and can only be smallerSfor than T by monotonicity of values. In turn, we just need to construct a knapsack forS ∪{x} for the smaller instance with no case 1 clients whose value is at least that Sofplus the contributionf to (T ∪ {x}) − f (T ) from cases 2-4. y,Y y,Y To build the desired knapsackSfor∪ {x}, we start with the knapsack for S. The case 2 clients Sin ’s knapsack by the preix property have less value for S than forT by monotonicity of values. By the preix propertyT, for the case 2 clients have less value than the replaced clients of total w with weight the smallest total value T (since for the former are not in the optimal knapsackTfor , and the latter are). So, the increase in value of the case 2 clients inS’s knapsack is at least the contribution f to(T ∪ {x}) − f (T ) due to replacing clients T ’sin knapsack y,Y y,Y with case 2 clients. To account for the case 4 clients, we take the clients S’s knapsack in which are not case 2 clients with weight w and the least total valueTfor , and replace them with the case 4 clients. These clients are among the clients of total weight w + w with the lowest value-to-weight ratios T ) (for inS’s knapsack (they aren’t necessarily the 2 4 clients of total weight w with the lowest value-to-weight ratios, because we chose not to include case 2 clients in this set). 
On the other hand, the replaced clients T ’s knapsack in all have value-to-weight ratios T grfor eater than at leastw weight of other clients T ’sin knapsack (those replaced by the case 2 clients). So by monotonicity of values and the preix property, the clients we replace S’s knapsack in with case 4 clients have lower value S for than the clients being replaced by case 4 clientsT ,do and for so we increase the value of our knapsack by more than the contribution to f (T ∪ {x}) − f (T ) due to case 4 clients. y,Y y,Y Lastly, we take any clientsS’s inknapsack which are not in T ’s knapsack or case 2 or case 4 clients with total weight w and replace them with the case 3 clients. Since these clients ar Te’snot knapsack, in their value forT (and thus their value for S by monotonicity of values) is less than the case 3 clients’Tvalue by thefor preix property. In turn, this replacement increases the value of our knapsack by more than the contribution to f (T ∪ {x}) − f (T ) due to case 3 clients. □ y,Y y,Y This lemma allows us to now give an approximate separation oracle for fractional solutions of universal ℓ -clustering by trying all guesses Y (ℓfor -SepOracle in Figure 1). p p One complication of using the above as a separation oracle in the ellipsoid algorithm is that it outputs linear constraints whereas the actual constraints in the fractional relaxation (5)giv are en non-linear in . So, in the following lemma, we need some additional work to show that violation of the linear constraints output by ℓ -SepOracle also implies violation of the non-linear constraints in (5). Lemma 3.7. For any p ≥ 1, ϵ > 0 there exists an algorithm which inds ( a · p + ϵ )-approximation to the lp for e−1 the universalℓ -clustering problem. Proof. The algorithm is to use the ellipsoid metho ℓ -SepOra d with cle as the separation oracle. We note that f (S) can be evaluated in polynomial time by computing optimal fractional knapsacks as discussed in the proof y,Y of Lemma 3.6. In addition, there are polynomially values Y that arof e iterated over, since⌈log ′ c /c ⌉ is max min 1+ϵ O (p logn/ϵ ) times the number of bits needed to describe the largest value c . Soofeach call ℓto-SepOracle ij p takes polynomial time. p p p−1 By Lemma 3.5 and the observation thatfrac (I)−S (I) ≤ p·frac (I)(frac (I)−S (I)) if frac (I)−S (I) > 0, p p p p p p p we get that ℓ -SepOracle cannot output a violated inequality if the input solution is feasible (5). So we just to Eq. ′ ′ e ′ need to show that if frac (C )− S (C ) ≥ ( p + ϵ )r for someS,C , ℓ -SepOracle outputs a violated inequality, p p p e−1 i.e. does not output łFeasible.ž Let Y be the smallest valueYof iterated over byℓ -SepOracle that is at least ACM Trans. Algor. Universal Algorithms for Clustering Problems • 15 ℓ -SepOracle((x,y, r ), F, C): Input: Fractional solution (x,y, r ), set of cluster centers F, set of all clients C 1: if Any constraint in (5) except the regret constraint set is violate then d 2: return the violated constraint 3: end if p p 4: c ← min c ,c ← max c min i∈F, j∈C max j∈C i∈F ij ij ′ ′ 2 ′ ⌈log ′ c /c ⌉ max min 1+ϵ 5: for Y ∈ {c ,c (1 + ϵ ),c (1 + ϵ ) , . . . 
c (1 + ϵ ) } do min min min min e−1 6: S ← -maximizer of f subject to |S| ≤ k via Theorem 2.2 y,Y P P P p p P P 7: I ← argmax d c y − d min c j∈C j i∈F ij j∈C j i∈S I∈[0,1] : d c y ≤Y ij ij j i j j∈C i∈F i j f g P P P p p ′ ′ 8: if d c y − d min c > r then 1−1/p j∈C i∈F ij j∈C i∈S j j ij ij pY f g P P P p p 1 ′ ′ 9: return d c y − d min c ≤ r 1−1/p j∈C i∈F ij j∈C i∈S j ij j ij pY 10: end if 11: end for 12: return łFeasiblež Fig. 1. Separation Oracle forℓ -Clustering ′ ′ e−1 ′ frac (C ), and S the -maximizer of f found byℓ -SepOracle. We have for theI found on Line 7 of y,Y p ℓ -SepOracle: f g (i ) 1 e − 1 p p p p ′ ′ ′ ′ ′ frac (I ) − S (I ) ≥ max [frac (I ) − S (I )] ≥ p p p p ′ 1−1/p ′ 1−1/p p(Y ) pe (Y ) S,I:frac (I)≤Y f g e − 1 e − 1 p p p p ′ ′ max [frac (I) − S (I)] ≥ frac (C ) − S (C ) ≥ p p p p ′ 1−1/p p p ′ 1−1/p pe (Y ) pe (Y ) S,I:frac (I)≤frac (C ) p p p p ′ ′ f g frac (C ) − S (C ) e − 1 e − 1 p p ′ ′ = frac (C ) − S (C ) > r . p p ′ p−1 ′ pe (1 + ϵ ) pe (1 + ϵ ) frac (C ) p p ′ ′ In (i), we use the fact that for a ixeSd, max p [frac (I ) − S (I )] is the solution to a fractional p p I:frac (I)≤Y knapsack problem with weight Y , and that decreasing the weight allowed in a fractional knapsack instance can (e−1) only reduce the optimum. For the last inequality to hold, we just need to ϵ cho < ϵose inℓ -SepOracle for pe the desiredϵ. This shows that for Y , ℓ -SepOracle will output a violated inequality as desired. □ 3.3 Universal ℓ -Clustering: Rounding Algorithm At a high level, we use the same strategy for rounding the fractional ℓ -clustering solution as we did with k-median. Namely, we solve a discounted version of the problem where the discount for each client is equal to the (scaled) cost of the client in the fractional solution. However, if we apply this ℓ dir objeectiv ctlye,to the we run into several problems. In particular, the linear discounts are incompatible with the non-linear objective deined over the clients. A more promising idea is to use these discounts ℓ obje on the ctive, which in fact is deined as a linear combination over the individual client’s objectives. But, for this to work, we will irst need to relate the regret bound in the ℓ objective to that in the ℓ objective. This is clearly not true in general, i.e., for all p p realizations. However, we show that the realization that maximizes the regret of analg algorithm against a ixed solution sol in both objectives is the same under the following łfarnessž condition: for every client, either alg’s connection is smaller than sol’s or it is at least p times as large assol’s. Given any solution sol, it is easy to deine ACM Trans. Algor. 16 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi a virtualsolution sol g whose individual connection costs are bounde p times d by that in sol, and sol g satisies the farness condition. This allows us to relate the ralg egret against of sol g (and thus againstp timessol) in theℓ objective to its regret in ℓtheobjective. We irst state the technical lemma relating ℓ and theℓ objectives under the farness condition. Informally, this lemma says that if we want to choose a realization maximizing the alg regr against et of sol in (an approximation of) theℓ -objective, we should always include a client whose distance alg exceto eds their distance sol to by a factor larger than p. 
This contrasts (and limits) the example given at the beginning of this section, where we showed that including clients whose distance alg eto xceeds the distance tosol by a smaller factor can actually reduce the regret alg of againstsol. In turn, if all clients are closersol to are closer by a factor of p, then the realization that maximizes regretℓin -obje thective is also the realization that maximizes regret in the ℓ -objective. Lemma 3.8. Suppose alg and sol are two solutions to anℓ -clustering instance, such that there is a subset of ∗ ∗ clientsC with the following property: for every clientCin , the connection cost in alg is greater thanp times the connection cost in sol, while for every client not C in, the connection cost in sol is at least the connection cost in alg. Then, C maximizes the following function: p p ′ ′ alg (C )−sol (C )  p p alg (C ) > 0 p−1 ′ ′ alg (C ) f (C ) := p  ′ 0 alg (C ) = 0 q q ′ ′ Proof. Fix any subset of clients C which does not include j. Let alg (C ) be alg’s ℓ -objective cost on this p p q q subset, sol (C ) be sol’s ℓ -objective on this subset, a be alg’s connection cost for j to the pth power, and s be p p sol’s connection cost for j to the pth power. To do this, we analyze a continuous extension f , eof valuated atC plusj in fractional amount x: p p ′ ′ alg (C ) + ax − sol (C ) − sx p p f (C , x ) = . ′ (p−1)/p (alg (C ) + ax ) ′ ′ ′ ′ When x = 0, this is f (C ) (if f (C ) is positive) and when x = 1, this is f (C ∪{j}) (if f (C ∪{j}) is positive). Its derivative with respectx to is: p p ′ ′ alg (C )+ax−sol (C )−sx p a(p−1) p p ′ (p−1)/p (a − s)(alg (C ) + ax ) − · ′ 1/p (alg (C )+ax ) f (C , x ) = . ′ 2(p−1)/p dx (alg (C ) + ax ) Which has the same sign as: p p ′ ′ alg (C ) + ax − sol (C ) − sx a(p − 1) p p (a − s) − · . alg (C ) + ax p p ′ ′ ′ d ′ ˜ ˜ Ifalg (C ) + ax > sol (C ) + sx, i.e.f (C , x ) is positive, thenf (C , x ) is negativeaif≤ s. Consider j j p p dx ′ ∗ ′ ′ any C including a client j not inC . Suppose f (C ) > 0. Then f (C , x ) has a negative derivative on , 1][0 ′ ′ (sincef (C , x ) starts out positive and is increasing x goas es from 1 to 0, i.e. it stays positivfe), (Cso\ {j}) = ′ ′ ′ ′ ′ ′ ˜ ˜ f (C , 0) > f (C , 1) = f (C ), and C cannot be a maximizer of f . If otherwise f (C ) = 0, then C clearly cannot j j be a maximizer of f unlessC is as well. Similarly, observe that: ACM Trans. Algor. Universal Algorithms for Clustering Problems • 17 p p ′ ′ alg (C ) + ax − sol (C ) − sx a(p − 1) a(p − 1) p p a (a − s) − · > (a − s) − = − s. p p p alg (C ) + ax d ′ ∗ ′ ∗ So f (C , x ) is positivae > if ps, which holds forjall ∈ C . Consider anyC not including a client j inC . dx ′ ′ ′ ′ ′ ′ ˜ ˜ ˜ Suppose f (C ) > 0. Then f (C , x ) has a positive derivative,on 1],[0 so f (C ) = f (C , 0) < f (C , 1) = f (C ∪{j}), j j j ′ ′ ′ ∗ and C cannot be a maximizer of f . If otherwisef (C ) = 0, then C clearly cannot be a maximizer f unless of C ′ ∗ ∗ is as well. Since we have shown that eCver,yC cannot be a maximizer of f unlessC is also a maximizer of f , we conclude thatC maximizes f . □ Intuitively, this lemma connects ℓ and the ℓ objectives as this subset of clients C will also be the set that maximizes the ℓ regret ofalg vs sol, and f (C ) is (within a factor p)of equal to theℓ regret. We use this lemma along with the ℓ -clustering with discounts approximation in Lemma 3.4 to design the rounding algorithm for universal ℓ -clustering. 
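The following is a small brute-force illustration (toy distances and assumed names, not part of the paper's algorithm) of the characterization in Lemma 3.8: when every client either has alg-cost more than p times its sol-cost, or sol-cost at least its alg-cost, the set C* of the former clients maximizes f.

from itertools import combinations

p = 2
alg_d = [5.0, 0.5, 7.0, 0.2]   # alg's connection costs; clients 0 and 2 have alg > p * sol
sol_d = [2.0, 0.9, 1.0, 0.3]   # sol's connection costs; clients 1 and 3 have sol >= alg

def f(chosen):
    # f(C') = (alg_p^p(C') - sol_p^p(C')) / alg_p(C')^(p-1), and 0 if alg_p(C') = 0
    a = sum(alg_d[j]**p for j in chosen)
    s = sum(sol_d[j]**p for j in chosen)
    return (a - s) / a**((p - 1) / p) if a > 0 else 0.0

C_star = [j for j in range(4) if alg_d[j] > p * sol_d[j]]
best = max((list(sub) for m in range(5) for sub in combinations(range(4), m)), key=f)
print(C_star, best)   # both are [0, 2]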
As in the rounding algorithm for univ k-meersal dian, let sol denote a (virtual) solution whose connection costs are 3 times that of the fractional solution frac for all clients. The rounding algorithm solves anℓ -clustering with discounts instance, where the discounts aresol 2 times ’s connection costs. (Recall that ink-median, the discount was equalsol to’s connection cost. Now, we need the additional factor of 2 for technical reasons.) Let alg be the solution output by the algorithm of Lemma 3.4 for this problem. We prove the following boundalg for: Lemma 3.9. There exists an algorithm which given(any α, β )-approximate fractional solution frac for ℓ - 1/p clustering, outputs a(54pα, 54pβ + 18p )-approximate integral solution. Proof. Let sol denote a (virtual) solution whose connection costs are 3 times that of the fractional solution frac for all clients. The rounding algorithmℓsolv -clustering es an with discounts instance, where the discounts are 2 timessol’s connection costs. Letalg be the solution output by the algorithm of Lemma 3.4 for this problem. We also consider an additional virtualsol g solution , whose connection costs are deined as follows: For clients j such that alg’s connection cost is greater than 18 times sol’s but less than 18 p timessol’s, we multiply sol’s g g connection costs byp to obtain connection costssol in. For all other clients, the connection cost sol is inthe same g g as that insol. Now, alg and 18·sol satisfy the condition in Lemma 3.8 and ·sol18is a(54pα, 54pβ )-approximation. Our goal in the rest of the proof is to bound the regralg et ofagainst (a constant times) sol g by (a constant times) the minimum regr mr et. Let us denote this regret: f g ′ ′ reg := max alg (C ) − 18· sol (C ) . p p C ⊆C ′ ′ ′ Note that if reg = 0 (it can’t be negative), then for all realizations C , alg (C ) ≤ 18· sol g (C ). In that case, the p p lemma follows immediately. So, we assume reg that > 0. f g ′ ′ Let C = argmax alg (C ) − 18· sol g (C ) , i.e., the realization deining reg that maximizes the regret 1 p p C ⊆C for theℓ objective. We need to relate reg to the regret in theℓ -objective for us to use the approximation guarantees ofℓ -clustering with discounts from Lemma 3.4. Lemma 3.8 gives us this relation, since it tells us that C is exactly the set of clients for which alg’s closest cluster center is at a distance of more than 18 times that ofsol’s closest cluster center. But, this means that C also maximizes the regret forℓthe objective, i.e., ACM Trans. Algor. 18 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi f g p p ′ p ′ C = argmax alg (C ) − 18 · sol g (C ) . Then, we have: 1 p p f g f g p p p p ′ p ′ ′ p ′ max ′ alg (C ) − 18 · sol g (C ) max ′ alg (C ) − 18 · sol g (C ) C ⊆C C ⊆C p p p p reg = ≤ P p−1−j p−1 p−1 j alg (C ) alg (C ) · 18· sol g (C ) 1 1 p 1 j=0 p f g p p ′ p ′ max alg (C ) − 18 · sol (C ) C ⊆C p p ≤ . p−1 alg (C ) p 1 The last inequality follows since connection sol gcosts are atin least those in sol. Note that the numerator in this last expression is exactly the value of the objectiveℓ for -clustering the with discounts problem from Lemma 3.4. Using this lemma, we can now bound the numerator by the optimum for this problem, which in turn is bounded by the objective produced by the minimum regret solution mrs for theℓ -clustering with discounts instance: f g f g p p p p ′ p ′ ′ p ′ ′ ′ max alg (C ) − 18 · sol (C ) max mrs (C ) − 2 · sol (C ) C ⊆C p p 2 C ⊆C p p reg ≤ ≤ · 9 · . 
(6) p−1 p−1 alg (C ) alg (C ) 1 1 p p p p ′ p ′ First, we bound the numerator in the above expression.CLet:= argmax [mrs (C )− 2 · sol (C )] be the 2 ′ p p C ⊆C realization that maximizes this term. We now relate this mrterm (the irst to step is by factorization and the second step holds because 2· sol = 6· frac exceeds the optimal integer solution by to the upper bound of 3 on the integrality gap [4]): f g p p ′ p ′ max mrs (C ) − 2 · sol (C ) = mrs (C ) − 2· sol (C ) p 2 p 2 p p p−1 p−1−j · mrs (C ) · (2· sol (C )) 2 p 2 j=0 p−1 p−1−j ≤ mr· mrs (C ) · (2· sol (C )) . p 2 p 2 j=0 Using the above bound in Eq. (6), we get: p−1 j p−1−j mrs (C ) · (2· sol (C )) 2 p 2 2 j=0 p reg ≤ · 9 · mr· . (7) p−1 alg (C ) In the rest of proof, we obtain a bound on the last term in(7)Eq. . We consider two cases. Ifalg (C ) ≥ p 1 9p · mrs (C ) (intuitively, the denominator is large compared to the numerator), then: p 2 p−1 j p−1−j p−1 p−1−j mrs (C )2 sol (C ) 2 2 mrs (C ) 1 p p 2 j=0 p −p+1 −p+1 1/p ≤ p · ≤ p · (9p ) = 9 p , p−1 p−1 alg (C ) alg (C ) 1 1 p p p−1 The irst step uses the fact thatmrs (C ) ≥ 2sol (C ), so the largest term in the summrs is (C ). Combined p 2 p 2 2 1/p with Eq. (7) this reg ≤ 6p · mr, giving the lemma statement. Ifalg (C ) < 9p · mrs (C ), then we cannot hope to meaningfully bound(7)Eq . In this case, howeverreg , p 1 p 2 is also bounded by mrs (C ), which we will eventually bound mr. Mor by e formally, by our assumption that p 2 p p reg > 0, mrs (C ) − 2 · sol (C ) > 0 and we have 2sol (C ) ≤ mrs (C ) ≤ sol (C ) + mr. The irst inequality 2 2 p 2 p 2 p 2 p p p p is by our assumption that mrs (C ) − 2 · sol (C ) > 0, the second inequality is by deinition mrs, mr ofand 2 2 p p ACM Trans. Algor. Universal Algorithms for Clustering Problems • 19 the fact thatsol(C ) upper bounds opt(C ). In turn, sol (C ) ≤ mr which gives mrs (C ) ≤ 2mr and thus 2 2 p 2 p 2 1/p 1/e reg ≤ alg (C ) ≤ 18p · mr. We note that for all p ≥ 1, p ≤ e ≤ 1.5. □ p 1 We note that the inal step requires using discounts equal tosol twice ’s connection costs instead of just sol’s connection costs. If we did the latter, we would have started with the ine sol quality (C ) ≤ mrs (C ) ≤ p 2 p 2 sol (C ) + mr instead, which does not give us any useful bound solon (C ) or mrs(C ) in terms of just mr. We p 2 2 2 also note that we chose not to optimize the constants in the inal result of Lemma 3.9 in favor of simplifying the presentation. Theorem 3.1 now follows by using the values (α, βof ) from Lemma 3.7 (and a suiciently small choice of the error parameter ϵ) in the statement of Lemma 3.9 above. 4 UNIVERSAL k-CENTER In the previous section, we gave universal algorithms forℓgeneral -clustering problems. Recall that k-center the objective, deined as the maximum distance of a client from its closest cluster center, can also be interpreted as the ℓ -objective in the ℓ -clustering framework. Moreover, it is well known thatnfor -dimensional any vector, ∞ p itsℓ and ℓ norms difer only a constant factor (see Fact A.1 in Appendix A). Therefore, choposing = logn logn ∞ in Theorem 3.3 gives poly-logarithmic approximation bounds for thek-center universal problem. In this section, we give direct techniques that improve these bounds to constants: Theorem 4.1. There exists a(3, 3)-approximate algorithm for the universal k-center problem. Recall that F is the set of all cluster centers, min so c gives the smallest distance from client j to any cluster i∈F ij center that can be opened. 
In turn, for every client j, its distance to the closest cluster center in the minimum regret solution mrs, min_{i∈mrs} c_{ij}, must be at most mr_j := min_{i∈F} c_{ij} + mr. This is because in the realization where only j appears, the optimal solution has cost min_{i∈F} c_{ij}, and the cost of mrs is just min_{i∈mrs} c_{ij}. So, we design an algorithm alg that 3-approximates these distances mr_j, i.e., for every client j, its distance to the closest cluster center in alg is at most 3 mr_j. Indeed, this algorithm satisfies a more general property: given any value r, it produces a set of cluster centers alg such that every client j is at a distance ≤ 3r_j from its closest cluster center (that is, min_{i∈alg} c_{ij} ≤ 3r_j), where r_j := min_{i∈F} c_{ij} + r. Moreover, if r ≥ mr, then the number of cluster centers selected by alg is at most k (for smaller values of r, alg might select more than k cluster centers).
Our algorithm alg is a natural greedy algorithm. We order clients j in increasing order of r_j, and if a client j does not have a cluster center within distance 3r_j in the current solution, then we add its closest cluster center in F to the solution.
Lemma 4.2. Given a value r, the greedy algorithm alg selects cluster centers that satisfy the following properties:
• Every client j is within a distance of 3r_j = 3(min_{i∈F} c_{ij} + r) from its closest cluster center.
• If r ≥ mr, then alg does not select more than k cluster centers, i.e., the solution produced by alg is feasible for the k-center problem.
Proof. The first property follows from the definition of alg. To show that alg does not pick more than k cluster centers, we map each cluster center added to alg on behalf of some client j to the cluster center in mrs that is closest to j. Now, we claim that no two cluster centers i_1, i_2 in alg can be mapped to the same cluster center i in mrs. Clearly, this proves the lemma since mrs has only k cluster centers.
Suppose i_1, i_2 are two cluster centers in alg mapped to the same cluster center i in mrs. Assume without loss of generality that i_1 was added to alg before i_2. Let j_1, j_2 be the clients that caused i_1, i_2 to be added; since i_2 was added later, we have r_{j_1} ≤ r_{j_2}. (Note that since r ≥ mr, each client j is within distance mr_j ≤ r_j of its closest cluster center in mrs.) The distance from j_2 to i_1 is at most the length of the path (j_2, i, j_1, i_1) (see Fig. 2), which is at most 2r_{j_1} + r_{j_2} ≤ 3r_{j_2}. But, in this case j_2 would not have added a new cluster center i_2, thus arriving at a contradiction. □
Fig. 2. Two clients j_1, j_2 that are at distance at most r_{j_1}, r_{j_2} respectively from the same cluster center i in mrs cannot cause alg to add two different cluster centers i_1, i_2.
We now use the above lemma to prove Theorem 4.1.
Proof of Theorem 4.1. In any realization C′ ⊆ C, the optimal value of the k-center objective is opt(C′) ≥ max_{j∈C′} min_{i∈F} c_{ij}, whereas the solution produced by the algorithm alg given above has objective value at most 3(max_{j∈C′} min_{i∈F} c_{ij} + r). So, alg's solution costs at most 3 · opt(C′) + 3 · r for all realizations C′ ⊆ C. So, if we were able to choose r = mr, we would prove the theorem. But, we do not know the value of mr (in fact, computing it is NP-hard). Instead, we increase the value of r continuously until alg produces a solution with at most k cluster centers. By Lemma 4.2, we are guaranteed that this will happen for some r ≤ mr, which then proves the theorem.
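A minimal sketch of the greedy procedure from Lemma 4.2 for a single guess r is given below (assumed inputs: a distance table c[i][j] and index sets F, C; this is an illustration, not the paper's code). The outer search over r just described would call it repeatedly and keep the first run that opens at most k centers.

def greedy_kcenter(r, F, C, c):
    """Greedy of Lemma 4.2: scan clients in increasing order of r_j = min_i c[i][j] + r;
    if a client is not within 3*r_j of an open center, open its closest center in F."""
    alg = set()
    for j in sorted(C, key=lambda j: min(c[i][j] for i in F) + r):
        r_j = min(c[i][j] for i in F) + r
        if min((c[i][j] for i in alg), default=float("inf")) > 3 * r_j:
            alg.add(min(F, key=lambda i: c[i][j]))   # j's closest cluster center in F
    return alg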
Our final observation is that this algorithm can be implemented in polynomial time, since there are only polynomially many possibilities for the k-center objective across all realizations (namely, the set of all cluster center to client distances), and thus polynomially many possible values for mr (the set of all differences between all possible solution costs). So, we only need to run alg for these values of r in increasing order. □
We note that the greedy algorithm described above can be viewed as an extension of the k-center algorithm in [33] to a (3, 3)-approximation for the "k-center with discounts" problem, where the discounts are the minimum distances min_{i∈F} c_{ij}.
5 UNIVERSAL k-MEDIAN WITH FIXED CLIENTS
In this section, we extend the techniques from Section 2 to prove the following theorem:
Theorem 5.1. If there exists a deterministic polynomial-time γ-approximation algorithm for the k-median problem, then for every ϵ > 0 there exists a (54γ + ϵ, 60)-approximate universal algorithm for the universal k-median problem with fixed clients.
By using the derandomized version of the (2.732 + ϵ)-approximation algorithm of Li and Svensson [44] for the k-median problem, and an appropriate choice of both ϵ parameters, we obtain the following corollary from Theorem 5.1.
Corollary 5.2. For every ϵ > 0, there exists a (148 + ϵ, 60)-approximate universal algorithm for the k-median problem with fixed clients.
Our high level strategy comprises two steps. In Section 5.2, we show how to find a good fractional solution by approximately solving a linear program. In Section 5.3, we then round the fractional solution in a manner that preserves its regret guarantee within constant factors. As discussed in Section 1.1, for simplicity our algorithm's description and analysis will avoid the notion of demands and instead equivalently view the input as specifying a set of fixed and unfixed clients, of which multiple might exist at the same location.
5.1 Preliminaries
In addition to the preliminaries of Section 2, we will use the following tools:
Submodular Maximization over Independence Systems. An independence system comprises a ground set E and a set of subsets (called independent sets) I ⊆ 2^E with the property that if A ⊆ B and B ∈ I then A ∈ I (the subset closed property). An independent set S in I is maximal if there does not exist S′ ⊃ S such that S′ ∈ I. Note that one can define an independence system by specifying the set of maximal independent sets in I only, since the subset closed property implies that I is simply all subsets of sets in I. An independence system is a 1-independence system (or 1-system in short) if all maximal independent sets are of the same size. The following result on maximizing submodular functions over 1-independence systems follows from a more general result given implicitly in [48] and more formally in [14].
Theorem 5.3. There exists a polynomial time algorithm that, given a 1-independence system (E, I) and a non-negative monotone submodular function f : 2^E → R^+ defined over it, finds a 1/2-maximizer of f, i.e., finds S′ ∈ I such that f(S′) ≥ (1/2) · max_{S∈I} f(S).
The algorithm in the above theorem is the natural greedy algorithm, which starts with S′ = ∅ and repeatedly adds to S′ the element u that maximizes f(S′ ∪ {u}) while maintaining that S′ ∪ {u} is in I, until no such addition is possible.
Incremental ℓ_p-Clustering.
We will also useincr theementalℓ -clusteringproblem which is deined as follows: p p Given anℓ -clustering instance and a subset of the cluster centers S (the eł xistingž cluster centers), ind the minimum cost solution toℓ the -clustering instance with the additional constraint that the solution must contain all cluster centersS.in When S = ∅, this is just the standarℓd-clustering problem, and this problem is equivalent to the standard ℓ -clustering problem by the following lemma: Lemma 5.4. If there exists aγ -approximation algorithm forℓthe -clustering problem, there existsγa-approximation for the incrementalℓ -clustering problem. Proof of Lemma 5.4. The γ -approximation for incremental ℓ -clustering is as follows: Given an instance I of incremental ℓ -clustering with clients C and existing cluster centers S, create a ℓ -clustering instance I which p p has the same cluster centers and clients as ℓthe -clustering instance except that at the location of every cluster 1/p center inS, we add a client with demand γ|C| max c + 1. i, j ij Let T be the solution that is a supersetS of of size k that achieves the lowest cost of all such supersets in instanceI. Let T be the output of runningγa-approximation algorithm ℓ for -clustering on I . Then we wish to show T is a superset of S and has cost at most γ times the cost of T in instance I. Any solution that buys all cluster centers S hasin the same cost inI and I . Then we claim it suices to show that T is a superset of S. IfT is a superset of S, then since bothT and T are supersets ofS and sinceT is a ′ ′ ∗ ′ γ -approximation in instance I , its cost in I is at mostγ times the cost of T inI . This in turn implies T has cost at most γ times the cost of T inI, giving the Lemma. ACM Trans. Algor. 22 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi Assume without loss of generality no two cluster centers are distance 0 away from each other. To sho T w that is a superset of S, note that in instance I any solution that does not buy a superset Sofis thus at least distance 1 1/p from the location of some cluster center S and inthus pays cost at leastγ|C| max c + 1 due to one of the i, j ij added clients. On the other hand, any solution that is a superset S isof distance 0 from all the added clients and 1/p thus only has to pay connection costs on clients C, which in in turn means it has cost at most |C| max c . i, j ij 1/p SinceT is the output ofγa-approximation algorithm, T thus has cost at most γ|C| max c , which meansT i, j ij must be a superset ofS. □ 5.2 Obtaining a Fractional Solution for Universalk-Median with Fixed Clients ′ ′ ′ Let C ⊆ C denote the set of ixed clients and for any realization of C clients satisfying C ⊆ C ⊆ C, letopt(C ) f f denote the cost of the optimal solution C .for The universal k-median LP is given by: minr (r denotes maximum regret across all demand realizations) s.t. x ≤ k (x = 1 if we open cluster center i) i i i∈F ∀i ∈ F , j ∈ C : y ≤ x (y = 1 if cluster center i is serving client j) ij i ij ∀j ∈ C : y ≥ 1 ij i∈F X X ∀C ⊆ C ⊆ C : c y − opt(C ) ≤ r (8) f ij ij j∈C i∈F ∀i ∈ F , j ∈ C : x ,y ∈ [0, 1] i ij Note that Eq. (8) and the objective function distinguish this LP from thekstandar -median d LP. We call(8) Eq. the regret constraint set . For a ixed fractional solution x,y, r, our goal is to approximately separate the regret constraint set, since all other constraints can be separated exactly. In the rest of this subsection, we describe our approximate separation oracle and give its analysis. 
′ ′ ′ Let S (C ) denote the cost of the solution S ⊆ F in realization C (that is,S (C ) = ′ min c ). Since j∈C i∈S ij ′ ′ opt(C ) = min S (C ), separating the regret constraint set exactly is equivalent to deciding if the S:S⊆F,|S|=k following holds:   X X     ∀S : S ⊆ F ,|S| = k : max  c y − S (C ) ≤ r . (9) ij ij ′ ′   C :C ⊆C ⊆C  ′  j∈C i∈F   P P ′ ′ By splitting the terms ′ c y and S (C ) into terms for C and C \ C , we can rewrite Eq.(9) as follows: j∈C i∈F ij ij f f ACM Trans. Algor. Universal Algorithms for Clustering Problems • 23 X X max c y − S (C ) ≤ r ij ij C ⊆C ⊆C,S⊆F,|S|=k j∈C i∈F   X X     ∀S ⊆ F ,|S| = k : max c y − S (C ) ≤ r ij ij   C ⊆C ⊆C   j∈C i∈F    X X X X    ∀S ⊆ F ,|S| = k : max  c y + c y − S (C \ C ) − S (C ) ≤ r ij ij ij ij f f ′   C ⊆C ⊆C ′   j∈C \C i∈F j∈C i∈F     X X X X     ∀S ⊆ F ,|S| = k : max  c y − S (C \ C ) ≤ S (C ) − c y + r ij ij f f ij ij ′   C ⊆C ⊆C  ′  j∈C \C i∈F j∈C i∈F f f      X X  X X   ∀S ⊆ F ,|S| = k : max  c y − S (C ) ≤ S (C ) − c y + r ij ij f ij ij ∗   C ⊆C\C  ∗  j∈C i∈F j∈C i∈F   For fractional solution y, let   X X     f (S) = max  c y − S (C ) . (10) y ij ij ∗ ∗   C :C ⊆C\C  ∗  j∈C i∈F   Note that we can compute f (S) for anyS easily since the maximizing value C is the of set of clients j for which S has connection cost less than c y . We already knowf (S) is not submodular. But, the term S (C ) is ij ij y i∈F f not ixed with respect to S, so maximizing f (S) is not enough to separate Eq.(8). To overcome this diiculty, for every possible cost M on the ixed clients, we replace S (C ) withM and only maximize over solutions S for whichS (C ) ≤ M (for convenience, we will call any Ssolution for which S (C ) ≤ M an M-cheap solution): f f ( ) X X ∀M ∈ 0, 1, . . . ,|C | maxc : max f (S) ≤ M − c y + r . (11) f ij y ij ij i, j S:S⊆F,|S|=k,S (C )≤M j∈C i∈F Note that this set of inequalities is equivalent (8), but to Eq. it has the advantage that the left-hand side is approximately maximizable and the right-hand side is ixed. Hence, these inequalities can be approximately separated. However, there are exponentially many inequalities; so, forϵany > 0, ixe wedrelax to the following polynomially large set of inequalities: ( ) ⌈log (|C | max c )⌉+1 f i, j i j 1+ϵ ∀M ∈ 0, 1, 1 + ϵ, . . . , (1 + ϵ ) : X X max f (S) ≤ M − c y + r . (12) y ij ij S:S⊆F,|S|=k,S (C )≤M j∈C i∈F Separating inequality(12) Eq.for a ixedM corresponds to submodular maximization f (of S), but now subject to the constraints|S| = k and S (C ) ≤ M as opposed to just |S| = k. Let S be the set of all S ⊆ F such that f M |S| = k and S (C ) ≤ M. Sincef (S) is monotone, maximizing f (S) over S is equivalent to maximizing f (S) f y y M y over the independence system(F ,I ) with maximal independentSsets . M M Then all that is needed to approximately separate (12)Eq. corresponding to a ixeM d is an oracle for deciding ′ ′ ′ membership in (F ,I ). Recall that S ⊆ F is in (F ,I ) if there exists a Sset⊇ S such that|S | = k and S (C ) ≤ M. M M But, even deciding membership of the empty set (Fin ,I ) requires one to solveka-median instance on the ixed clients, which is in general NP-hard. More generally, we are required to solve an instance of the incremental k-median problem (see Section 5.1) with existing cluster centers S. in ACM Trans. Algor. 24 • Arun Ganesh, Bruce M. 
While exactly solving incremental k-median is NP-hard, we have a constant approximation algorithm for it (call it A), by Lemma 5.4. So, we could define a new system (F, I) that contains a set S ⊆ F if the output of A for the incremental k-median instance with existing cluster centers S has cost at most M. But (F, I) may no longer be a 1-system, or even an independence system. To restore the subset-closed property, the membership oracle needs to ensure that: (a) if a subset S′ ⊆ S is determined not to be in (F, I), then S is not either, and (b) if a superset S′ ⊇ S is determined to be in (F, I), then so is S.

GreedyMax((x, y, r), F, C_f, C, M, T_0, f):
Input: Fractional solution (x, y, r), set of cluster centers F, set of fixed clients C_f, set of all clients C, value M, M-cheap solution T_0, submodular objective f : 2^F → R^+
 1: S_0 ← ∅
 2: F_0 ← F
 3: for l from 1 to k do
 4:   for each cluster center i in F_{l−1} \ S_{l−1} do
 5:     if S_{l−1} ∪ {i} ⊆ T_0 and T_0 is M-cheap then
 6:       T_{l,i} ← T_0
 7:     else
 8:       if for some l′, i′: S_{l−1} ∪ {i} ⊆ T_{l′,i′} and T_{l′,i′} is M-cheap then
 9:         T_{l,i} ← T_{l′,i′}
10:       else
11:         T_{l,i} ← output of the γ-approximation algorithm on the incremental ℓ_p-clustering instance with cluster centers F_{l−1}, existing cluster centers S_{l−1} ∪ {i}, and clients C_f
12:       end if
13:     end if
14:   end for
15:   F_l ← F_{l−1}
16:   for each cluster center i in F_l \ S_{l−1} do
17:     if i does not appear in any M-cheap T_{l,i′} then
18:       F_l ← F_l \ {i}
19:     end if
20:   end for
21:   S_l ← S_{l−1} ∪ {argmax_{i ∈ F_l \ S_{l−1}} f(S_{l−1} ∪ {i})}
22: end for
23: return S_k

Fig. 3. Modified greedy submodular maximization algorithm.

We now give the modified greedy maximization algorithm GreedyMax that we use to try to separate one of the inequalities in Eq. (12), which uses a built-in membership oracle that ensures the above properties hold. Pseudocode is given in Figure 3, and we informally describe GreedyMax here. It initializes S_0 = ∅, F_0 = F, and starts with an M-cheap k-median solution T_0 (generated by running a γ-approximation on the k-median instance involving only the fixed clients C_f). In iteration l, GreedyMax starts with a partial solution S_{l−1} with l−1 cluster centers, and it is considering adding cluster centers in F_{l−1} to S_{l−1}. For each cluster center i in F_{l−1}, GreedyMax generates some k-median solution T_{l,i} containing S_{l−1} ∪ {i} to determine whether S_{l−1} ∪ {i} is in the independence system. If a previously generated solution, T_0 or T_{l′,i′} for any l′, i′, contains S_{l−1} ∪ {i} and is M-cheap, then T_{l,i} is set to this solution. Otherwise, GreedyMax runs the incremental k-median approximation algorithm on the instance in which the existing cluster centers are S_{l−1} ∪ {i}, the only cluster centers in the instance are F_{l−1}, and the client set is C_f. It sets T_{l,i} to the solution generated by the approximation algorithm.

After generating the set of solutions {T_{l,i}}_{i ∈ F_{l−1}}, if one of these solutions contains S_{l−1} ∪ {i} and is M-cheap, then GreedyMax concludes that S_{l−1} ∪ {i} is in the independence system. This, combined with the fact that these solutions may be copied from previous iterations, ensures that property (b) holds (as the M-cheap solutions generated by GreedyMax are implicitly considered to be in the independence system). Otherwise, since GreedyMax was unable to find an M-cheap superset of S_{l−1} ∪ {i}, it considers S_{l−1} ∪ {i} not to be in the independence system.
In accordance with these beliefs, GreedyMax initializes F_l as a copy of F_{l−1}, and then removes from F_l (and thus from future consideration) any i for which it did not find an M-cheap superset of S_{l−1} ∪ {i}, ensuring that property (a) holds. It then greedily adds to S_{l−1} the i in F_l that maximizes f_y(S_{l−1} ∪ {i}), as defined before, to create a new partial solution S_l. After the kth iteration, GreedyMax outputs the solution S_k.

SepOracle((x, y, r), F, C_f, C):
Input: A fractional solution (x, y, r), set of cluster centers F, set of fixed clients C_f, set of all clients C
 1: if any constraint in the universal k-median LP except the regret constraint set is violated then return the violated constraint
 2: end if
 3: T_0 ← output of the γ-approximation algorithm A for k-median run on the instance with cluster centers F and clients C_f
 4: for M ∈ {0, 1, 1+ϵ, (1+ϵ)^2, …, (1+ϵ)^{⌈log_{1+ϵ}(γ|C_f| max_{i,j} c_ij)⌉+1}} such that T_0 is M-cheap do
 5:   S ← GreedyMax((x, y, r), F, C_f, C, M, T_0, f_y)
 6:   C′ ← argmax_{C* ⊆ C\C_f} [ Σ_{j∈C*} Σ_{i∈F} c_ij y_ij − S(C*) ]
 7:   if Σ_{j∈C′} Σ_{i∈F} c_ij y_ij − S(C′) > M − Σ_{j∈C_f} Σ_{i∈F} c_ij y_ij + r then
 8:     return the inequality Σ_{j∈C′} Σ_{i∈F} c_ij y_ij − S(C′) ≤ M − Σ_{j∈C_f} Σ_{i∈F} c_ij y_ij + r
 9:   end if
10: end for
11: return "Feasible"

Fig. 4. Approximate separation oracle for universal k-median.

Our approximate separation oracle, SepOracle, can then use GreedyMax as a subroutine. Pseudocode is given in Figure 4, and we give an informal description here. SepOracle checks all constraints except the regret constraint set, and outputs any violated constraint it finds. If none are found, it then runs a k-median approximation algorithm on the instance containing only the fixed clients to generate a solution T_0. For each M that is 0 or a power of 1+ϵ (as in Eq. (12)), if T_0 is M-cheap, it then invokes GreedyMax for this value of M (otherwise, GreedyMax will consider the corresponding independence system to be empty, so there is no point in running it), passing T_0 to GreedyMax. It then checks the inequality Σ_{j∈C′} Σ_{i∈F} c_ij y_ij − S(C′) ≤ M − Σ_{j∈C_f} Σ_{i∈F} c_ij y_ij + r for the solution S output by GreedyMax, and outputs this inequality if it is violated. This completes the intuition behind and description of the separation oracle. We now move on to its analysis. First, we show that GreedyMax always finds a valid solution.

Lemma 5.5. GreedyMax always outputs a set S_k of size k when called by SepOracle.

Proof. Note that GreedyMax is only invoked if T_0 is M-cheap. This implies that some T_{1,i} is M-cheap, since some T_{1,i} will be initialized to T_0. Then, it suffices to show that in the lth iteration, there is some i that can be added to S_{l−1}. If this is true, it implies that S_k is of size k since k elements are added across all k iterations.

This is true in iteration 1 because some T_{1,i} is M-cheap and thus any element of T_{1,i} is in F_1 and can be added. Assume this is inductively true in iteration l, i.e., i is added to S_l in iteration l because i is in some M-cheap T_{l,i}. Since T_{l,i} is M-cheap, no element of T_{l,i} is deleted from F_l. Then in iteration l+1, for all i′′ in T_{l,i} \ S_l (a set of size k−l, i.e., non-empty), T_{l+1,i′′} can be initialized to T_{l,i}. Then all such i′′ can be added to S_{l+1}, because all such i′′ satisfy that T_{l+1,i′′} is M-cheap and thus are in F_{l+1}. By induction, in all iterations there is some i that can be added to S_l, giving the lemma. □
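To make the preceding description concrete, here is a compressed Python sketch of GreedyMax. It follows the structure of Figure 3 but simplifies the bookkeeping: a candidate center is dropped as soon as no M-cheap witness containing S ∪ {i} is found, a slight simplification of the removal rule in lines 16–19. The callables `incr_approx`, `cost_on_fixed`, and `f_y` are hypothetical stand-ins for the γ-approximation for incremental k-median, the cost of a solution on the fixed clients, and the objective of Eq. (10); this is a sketch under those assumptions, not the paper's implementation.

```python
def greedy_max(F, k, M, T0, f_y, incr_approx, cost_on_fixed):
    """Simplified sketch of GreedyMax (Figure 3).

    F                : candidate cluster centers
    T0               : an M-cheap starting solution (as produced by SepOracle)
    incr_approx(existing, pool): hypothetical gamma-approximation for incremental
                       k-median on the fixed clients, forced to open `existing`
                       and restricted to centers in `pool`
    cost_on_fixed(T) : cost of solution T on the fixed clients only
    f_y(S)           : the submodular objective of Eq. (10)
    """
    S, pool = set(), set(F)
    cache = [set(T0)] if cost_on_fixed(set(T0)) <= M else []  # M-cheap solutions seen so far
    for _ in range(k):
        witnesses = {}
        for i in pool - S:
            cand = S | {i}
            witness = next((T for T in cache if cand <= T), None)
            if witness is None:
                T = set(incr_approx(cand, pool))
                if cost_on_fixed(T) <= M:
                    cache.append(T)
                    witness = T
            if witness is not None:
                witnesses[i] = witness     # cand has an M-cheap superset, so keep i
        pool = S | set(witnesses)          # discard centers with no M-cheap witness
        if not witnesses:
            break                          # cannot happen when T0 is M-cheap (Lemma 5.5)
        S.add(max(witnesses, key=lambda i: f_y(S | {i})))
    return S
```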
Then, the following lemma asserts that GreedyMax is indeed performing greedy submodular maximization over some 1-system.

Lemma 5.6. Fix any run of GreedyMax. Consider the values of S_l, T_{l,i}, F_l for all l, i (defined as in Figure 3) at the end of this run. Let B be the set containing S_{l−1} ∪ {i} for each l and each i ∉ F_l with i ∉ S_{l−1}. Let (F, S) be the independence system for which the set of maximal independent sets S_max consists of all size-k subsets S of F such that no subset of S is in B and S is M-cheap. Then the following properties hold:
(1) For any l and any i ∉ F_l, S_{l−1} ∪ {i} is not in S.
(2) For any l and any i ∈ F_l, S_{l−1} ∪ {i} is in S.
(3) (F, S) is a 1-system.

Proof. Property 1: This property is immediate from the definition of B and S.

Property 2: Fix any l and i ∈ F_l. We want to show that S_{l−1} ∪ {i} is in S. Since i ∈ F_l, there exists some T_{l,i′} such that S_{l−1} ∪ {i} is a subset of T_{l,i′} and T_{l,i′} is M-cheap (otherwise, i would have been deleted from F_l). If we can show that T_{l,i′} is in S_max, then we immediately get that S_{l−1} ∪ {i} is in S.

Suppose not. Since T_{l,i′} is M-cheap, this must be because some subset of T_{l,i′} is of the form S_{l′−1} ∪ {i′′} for i′′ ∉ F_{l′}, i′′ ∉ S_{l′−1}. In particular, consider the smallest value of l′ for which this is true, i.e., let l′ be the iteration in which i′′ was deleted from F_{l′}.

If l′ < l, then since i′′ was deleted from F_{l′}, i′′ cannot appear in any M-cheap solution containing S_{l′−1} generated by the incremental k-median approximation algorithm before the end of iteration l′ (otherwise, T_{l′,i′′} could be initialized to this solution, preventing i′′ from being deleted). Since i′′ is not in F_{l′} (and thus not in F_{l′+1}, …, F_l), in iterations l′+1 to l the approximation algorithm is not allowed to use i′′. So no M-cheap solution that is a superset of S_{l′−1} ∪ {i′′} is ever generated by the approximation algorithm. But T_{l,i′} is an M-cheap superset of S_{l′−1} ∪ {i′′} which must have been generated by the approximation algorithm at some point, a contradiction.

Thus we can assume l′ ≥ l. However, recall that T_{l,i′} is an M-cheap solution containing S_{l′−1} ∪ {i′′}. If l′ = l, this prevents i′′ from being deleted in iteration l′, giving a contradiction. If l′ > l, then T_{l′,i′′} can be initialized to an M-cheap superset of S_{l′−1} ∪ {i′′}, since T_{l,i′} is such a superset. This also prevents i′′ from being deleted in iteration l′, giving a contradiction.

In all cases, assuming T_{l,i′} is not in S_max leads to a contradiction, so T_{l,i′} is in S_max and thus S_{l−1} ∪ {i} is in S.

Property 3: S is defined so that all maximal independent sets are of size k, giving the property. □

Corollary 5.7. Any run of GreedyMax outputs an M-cheap 1/2-maximizer of f_y(S) over the system (F, S) as defined in Lemma 5.6.

Proof. Properties 1 and 2 in Lemma 5.6 imply that at each step, GreedyMax adds to its current solution the element that maximizes the objective f_y(S) while maintaining that the current solution is in S. Thus GreedyMax is exactly the greedy algorithm in Theorem 5.3 for maximizing a monotone submodular objective over an independence system. By Lemma 5.5, GreedyMax always finds a maximal independent set, and the definition of S guarantees that this maximal independent set is M-cheap. Lemma 2.4 gives that f_y(S) is a monotone submodular function of S. Then, Property 3 combined with Theorem 5.3 implies that the solution output by GreedyMax is a 1/2-maximizer. □
Of course, maximizing over an arbitrary 1-system is of little use. In particular, we would like to show that the 1-system over which Lemma 5.6 shows we are maximizing approximates the 1-system of subsets of solutions whose cost on the fixed clients is at most M. The next lemma shows that while all such solutions may not be in this 1-system, all solutions that are M/γ-cheap are.

Lemma 5.8. In any run of GreedyMax, let S be defined as in Lemma 5.6. For the value M passed to GreedyMax in this run and any solution S which is M/γ-cheap, S ∈ S.

Proof. Fix any such S. Let B be defined as in Lemma 5.6. For any element B of B, it must be the case that running a γ-approximation on the incremental k-median instance with existing cluster centers B produced a solution with cost greater than M. This implies that for any B in B, the incremental k-median instance with existing cluster centers B has an optimal solution with cost greater than M/γ. However, for any subset S′ of S, the optimal solution to the incremental k-median instance with existing cluster centers S′ has cost at most M/γ, since S is a feasible solution to this instance and is M/γ-cheap. Thus no subset of S is in B, and hence S is in S. □

Lastly, we show that SepOracle never incorrectly outputs that a point is infeasible, i.e., that the region SepOracle considers feasible contains the region that is actually feasible in the universal k-median LP.

Lemma 5.9. If x, y, r is feasible for the universal k-median LP, SepOracle outputs "Feasible."

Proof. SepOracle can exactly check all constraints besides the regret constraint set, so assume that if SepOracle outputs that x, y, r is not feasible, it outputs that Σ_{j∈C′} Σ_{i∈F} c_ij y_ij − S(C′) ≤ M − Σ_{j∈C_f} Σ_{i∈F} c_ij y_ij + r is violated for some M, S. In particular, it only outputs that this constraint is violated if it actually is violated. If this constraint is violated, then since by Corollary 5.7 S is M-cheap:

    Σ_{j∈C′} Σ_{i∈F} c_ij y_ij − S(C′) > M − Σ_{j∈C_f} Σ_{i∈F} c_ij y_ij + r
    Σ_{j∈C′} Σ_{i∈F} c_ij y_ij + Σ_{j∈C_f} Σ_{i∈F} c_ij y_ij > S(C′) + M + r
    Σ_{j∈C′∪C_f} Σ_{i∈F} c_ij y_ij > S(C′) + M + r ≥ S(C′) + S(C_f) + r = S(C′ ∪ C_f) + r ≥ opt(C′ ∪ C_f) + r,

which implies that the point x, y, r is not feasible for the universal k-median LP. □

We now have all the tools to prove our overall claim:

Lemma 5.10. If there exists a deterministic polynomial-time γ-approximation algorithm for the k-median problem, then for every ϵ > 0 there exists a deterministic algorithm that outputs a (2γ(1+ϵ), 2)-approximate fractional solution to the universal k-median problem in polynomial time.

Proof. We use the ellipsoid method where SepOracle is used as the separation oracle. By Lemma 5.9, since the minimum regret solution is a feasible solution to the universal k-median LP, it is also considered feasible by SepOracle. Then, the solution x*, y*, r* output by the ellipsoid method satisfies r* ≤ mr.

Suppose the ellipsoid method outputs x*, y*, r* such that x*, y* are not a (2γ(1+ϵ), 2)-approximate solution. This means there exist S and C_f ⊆ C′ ⊆ C such that:

    Σ_{j∈C′} Σ_{i∈F} c_ij y*_ij > 2γ(1+ϵ)·S(C′) + 2·mr
    Σ_{j∈C′\C_f} Σ_{i∈F} c_ij y*_ij − S(C′\C_f) > 2γ(1+ϵ)·S(C_f) + (2γ(1+ϵ) − 1)·S(C′\C_f) − Σ_{j∈C_f} Σ_{i∈F} c_ij y*_ij + 2·mr
        ≥ 2·[ γ(1+ϵ)·S(C_f) − Σ_{j∈C_f} Σ_{i∈F} c_ij y*_ij + mr ].
Thus, for the value of M in the set {0, 1, 1+ϵ, (1+ϵ)^2, …, (1+ϵ)^{⌈log_{1+ϵ}(γ|C_f| max_{i,j} c_ij)⌉+1}} contained in the interval [γ·S(C_f), γ(1+ϵ)·S(C_f)], we have

    Σ_{j∈C′\C_f} Σ_{i∈F} c_ij y*_ij − S(C′\C_f) ≥ 2·[ M − Σ_{j∈C_f} Σ_{i∈F} c_ij y*_ij + mr ] ≥ 2·[ M − Σ_{j∈C_f} Σ_{i∈F} c_ij y*_ij + r* ].

The last inequality follows since r* ≤ mr. Then, consider the iteration in SepOracle where it runs GreedyMax for this value of M. Since M ≥ γ·S(C_f), S is M/γ-cheap. Thus by Lemma 5.8, S is part of the independence system S specified in Lemma 5.6 for which GreedyMax finds a 1/2-maximizer in this iteration, and thus the maximum of the objective over this independence system is at least 2·[M − Σ_{j∈C_f} Σ_{i∈F} c_ij y*_ij + r*]. By Corollary 5.7, SepOracle thus finds some S′, C′′ ⊆ C\C_f such that S′ is M-cheap and for which Σ_{j∈C′′} Σ_{i∈F} c_ij y*_ij − S′(C′′) is at least M − Σ_{j∈C_f} Σ_{i∈F} c_ij y*_ij + r*. But this means SepOracle will output that x*, y*, r* is infeasible, which means the ellipsoid algorithm cannot output this solution, a contradiction. □

5.3 Rounding the Fractional Solution for Universal k-Median with Fixed Clients

Proof of Theorem 5.1. The algorithm is as follows: Use the algorithm of Lemma 5.10 with error parameter ϵ/(54γ) to find a (2γ(1 + ϵ/(54γ)), 2)-approximate fractional solution. Let f_j be the connection cost of this fractional solution for client j. Construct a k-median with discounts instance with the same clients C and cluster centers F where client j has discount 0 if it was originally a fixed client, and discount 3f_j if it was originally an unfixed client. The solution to this instance given by Lemma 2.3 is the solution for the universal k-median instance. Again using the integrality gap upper bound of 3 for k-median, and writing m_j for the connection cost of the minimum regret solution mrs for client j, we have:

    mr = max_{C′} [mrs(C′) − opt(C′)] ≥ max_{C′} [mrs(C′) − 3 Σ_{j∈C′} f_j] = Σ_{j∈C_f} (m_j − 3f_j) + Σ_{j∈C\C_f} (m_j − 3f_j)^+.        (13)

The cost of the minimum regret solution in the k-median with discounts instance is given by:

    Σ_{j∈C_f} m_j + Σ_{j∈C\C_f} (m_j − 3f_j)^+ = Σ_{j∈C_f} 3f_j + Σ_{j∈C_f} (m_j − 3f_j) + Σ_{j∈C\C_f} (m_j − 3f_j)^+ ≤ Σ_{j∈C_f} 3f_j + mr,   by Eq. (13).        (14)

Let c_j be the connection cost of the algorithm's solution for client j. Lemma 2.3 and Eq. (14) give:

    Σ_{j∈C_f} c_j + Σ_{j∈C\C_f} (c_j − 9·3f_j)^+ ≤ 6·[ Σ_{j∈C_f} 3f_j + mr ]
    Σ_{j∈C_f} c_j + Σ_{j∈C\C_f} (c_j − 27f_j)^+ ≤ Σ_{j∈C_f} 18f_j + 6·mr        (15)
    ⟹ max_{C′} [ Σ_{j∈C′} c_j − 27 Σ_{j∈C′} f_j ] = Σ_{j∈C_f} (c_j − 27f_j) + Σ_{j∈C\C_f} (c_j − 27f_j)^+ ≤ 6·mr.

Lemma 5.10 then gives that for any valid C′:

    frac(C′) = Σ_{j∈C′} f_j ≤ 2γ(1 + ϵ/(54γ))·opt(C′) + 2·mr.        (16)

Using Eq. (15) and (16), we can then conclude that

    ∀C′: C_f ⊆ C′ ⊆ C:   Σ_{j∈C′} c_j ≤ 27 Σ_{j∈C′} f_j + 6·mr ≤ (54γ + ϵ)·opt(C′) + 60·mr. □

6 UNIVERSAL ℓ_p-CLUSTERING WITH FIXED CLIENTS

In this section, we give the following theorem:

Theorem 6.1. For all p ≥ 1, if there exists a γ-approximation for ℓ_p-clustering, then for all ϵ > 0 there exists a (54pγ·2^{1/p} + ϵ, 108p^2 + 6p·2^{1/p} + ϵ)-approximate universal algorithm for ℓ_p-clustering with fixed clients.

In particular, we get from known results [1, 30]:
• A (162p^2·2^{1/p} + ϵ, 108p^2 + 18p·2^{1/p} + ϵ)-approximate universal algorithm for ℓ_p-clustering with fixed clients for all ϵ > 0, p ≥ 1.
• A (459, 458)-approximate universal algorithm for k-means with fixed clients.

The algorithm for universal ℓ_p-clustering with fixed clients follows by combining the techniques from ℓ_p-clustering and from k-median with fixed clients.
6.1 Finding a Fractional Solution We reuse the subroutineGreedyMax to do submodular maximization over an independence system whose bases are M-cheap solutions (that is, solutionsℓ with -objective at mostM on only the ixed clients), and use the submodular function f with varying choicesYof as we did for ℓ -clustering. We can extend Lemma 3.5 as y,Y p follows: C C\C f f Lemma 6.2. For any two solutionsy, S, if the global maximum fra of c (I)− S (I) over 1 × [0, 1] is positive, p p C C\C f f then there is a maximizer that is 1 in× {0, 1} , i.e. f g f g ′ ′ max frac (I) − S (I) = max frac (C ) − S (C ) . p p p p C C\C C ⊆C ⊆C f f s I∈1 ×[0,1] : The proof follows exactly the same way as Lemma 3.5. In that proof, the property we use of having no ixed clients is that if the global maximum is not the all zeroes vector, then it isfra positiv c (I) >eSand (I).soIn the p p statement of Lemma 6.2, we just assume positivity instead. This shows that it is still ine to output separating hyperplanes based on fractional realizations of clients in the presence of ixed clients. The only time it is maybe ACM Trans. Algor. 30 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi not ine is in a fractional realization where if the fra łregr c is etž of negative, but in this case we will not output a separating hyperplane anyway. Fixed-ℓ -SepOracle((x,y, r ), F, C , C): p f Input: A fractional solution x,y, r, set of cluster centers F, set of ixed clients C , set of all clients C 1: if Any constraint in the universal ℓ -clustering LP except the regret constraint set is violate then rdeturn the violated constraint 2: end if 3: T ← output ofγ -approximation algorithm A forℓ -clustering run on instance with cluster centers F, 0 p clients C p p 4: c ← min c ,c ← max c min i∈F, j∈C\C max j∈C\C i∈F ij f ij P P 5: Y ← c y ij f j∈C i∈F f ij 1/p 2 ⌈log (γ |C | max c )⌉+1 f i, j i j 1+ϵ 6: for M ∈ {0, 1, 1 + ϵ, (1 + ϵ ) , . . . (1 + ϵ ) } such that T isM-cheap do ′ ′ 2 ′ ⌈log ′ c /c ⌉ max min 1+ϵ 7: for Y ∈ {0,c ,c (1 + ϵ ),c (1 + ϵ ) , . . . c (1 + ϵ ) } do min min min min 8: S ← GreedyMax((x,y, r ), F, C , C, M, T , f ) f 0 y,Y P P P p p P P 9: I ← argmax C C\C p d c y − d min c j ij j i∈S f f j∈C\C i∈F j∈C\C f ij f ij I∈1 ×[0,1] : d c y ≤Y j∈C\C j i∈F i j i j f g P P P p p 1 ′ ′ 10: if d c y − d min c > r then ij i∈S 1−1/p j∈C i∈F j∈C j ij j ij p (Y +Y ) f g P P P p p 1 ′ ′ 11: return d c y − d min c ≤ r 1−1/p j∈C i∈F ij j∈C i∈S j ij j ij p (Y +Y ) 12: end if 13: end for 14: end for 15: return łFeasiblež Fig. 5. Approximate Separation Oracle for Universalℓ -Clustering with Fixed Clients. GreedyMax is the same algorithm as presented in Figure 3 fork-median. 1/p Lemma 6.3. If there exists aγ -approximation for ℓ -clustering, then for allϵ > 0, α = 2 γ (1 + ϵ ), β = 2p(1 + ϵ ) there exists an algorithm that outputs an (α, β )-approximate universal fractional solution ℓ for -clustering with ixed clients. Proof. IfFixed-ℓ -SepOracle ever outputs an inequality in the regret constraint set, for the corresponding q q q Y ,Y , I , S, letfrac (I), sol (I) denote the ℓ costs of the fractional solution S as and before. Then we have by f p p p P P deinitionYofand the constraint that d c y ≤ Y : f j ij j∈C i∈F ij   X X X   p p  ′ ′ r <  d c y − d minc  ≤ ij j ij j ij 1−1/p   p(Y + Y ) i∈S   j∈C i∈F j∈C   f g p p ′ ′ ′ ′ frac (I ) − sol (I ) = frac (I ) − sol (I ). 
P p p p p p−1 j p−1−j ′ ′ frac (I )sol (I ) j=0 p p The second inequality uses that Fixed-ℓ -SepOracle only outputs an inequality in the regret constraint set ′ ′ such that frac (I ) > sol (I ). We then have by Lemma 6.2 that for any feasible fractional solution, the inequality p p output by Fixed-ℓ -SepOracle is satisied. ACM Trans. Algor. Universal Algorithms for Clustering Problems • 31 Now, suppose there exists someI, sol such that frac (I) > αsol (I) + βr (forr ≥ 0). Consider the values of p p P P P P p p Y , M iterated over byFixed-ℓ -SepOracle such that d c y ≤ Y ≤ (1+ϵ )( d c y ) p j∈C\C j i∈F ij j∈C\C j i∈F ij f ij f ij and γ sol (C ) ≤ M ≤ γ (1 + ϵ )sol (C ). Then: p f p f 1/p X X * p + . c y / > αsol (C ) + βr ij p ij j∈C i∈F , - 1/p X X * p + . c y / − αsol (C ) > βr ij p ij j∈C i∈F , - P P p p p ′ ′ c y − α sol (C ) ij j∈C i∈F p ij > βr (i) P P 1−1/p ′ c y j∈C i∈F ij ij P P p p p ′ (1 + ϵ ) ′ c y − α sol (C ) ij j∈C i∈F p ij > βr (ii) 1−1/p Y + Y 1−1/p X X βr Y + Y p p p ′ c y − α sol (C ) > ij ij 1 + ϵ j∈C i∈F 1−1/p X X X X βr (Y + Y ) p p p p p ′ p c y − α sol (C \ C ) > + α sol (C ) − c y ij f f ij ij p p ij 1 + ϵ j∈C \C i∈F j∈C i∈F f f 1−1/p X X X X βr (Y + Y ) p p p p ′ p c y − α sol (C \ C ) > + α − c y . (iii) ij ij p f ij ij 1 + ϵ γ (1 + ϵ ) i∈F j∈C i∈F j∈C \C f f P P (i) follows from the fact sol that (C ) > ′ c if a > b. (ii) follows from deinitions Y ,Yof . (iii) p f j∈C i∈F ij ′ ′ ′ follows from the choice M.of Let I denote the vector whose jth element is I if j ∈ C and 0 otherwise. By C j the analysis in Section 5.1, since sol isM/γ -cheap it is in the independence systemGreed that yMax inds a 1/2-maximizer for. ThatGreed is, yMax outputs some S and Fixed-ℓ -Oracle inds someI such that S is P P M-cheap, d c y ≤ Y , and such that: ij j∈C\C i∈F f j ij ACM Trans. Algor. 32 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi   1−1/p p X X X X   βr (Y + Y ) 1 f M p p p   ′ p ′ p d c y − α S (I ) >  + α − c y  ij ij j p ij C\C ij f   2 1 + ϵ γ (1 + ϵ ) ′   j∈C \C i∈F j∈C i∈F    p 1−1/p X X X X βr (Y + Y ) S (C ) 1   f p f p p p ′ ′ p ′   d c y − S (I ) > + α − d c y (iv) ij ij p   j ij C\C j ij 2  1 + ϵ γ (1 + ϵ )  i∈F j∈C i∈F j∈C \C   f f 1−1/p X X X X βr (Y + Y ) p p p p ′ ′ ′ d c y − S (I ) > + S (C ) − d c y (v) ij f ij j p p j ij C\C ij 2(1 + ϵ ) j∈C \C i∈F j∈C i∈F 1−1/p X X βr (Y + Y ) p p ′ ′ d c y − S (I ) > ij j p ij 2(1 + ϵ ) j∈C i∈F P P p p ′ ′ ′ d c y − S (I ) j∈C i∈F ij βr j ij p 1−1/p 2p(1 + ϵ ) p(Y + Y ) P P p p ′ ′ ′ d c y − S (I ) ij j∈C i∈F p j ij > r . (vi) 1−1/p p(Y + Y ) (iv) follows the M-cheapness ofS. (v) follows from the choice α. of (vi) follows from the choice β. of So, Fixed-ℓ -SepOracle outputs an inequality as desired. □ 6.2 Rounding the Fractional Solution Again, we show how to generalize the approach for rounding fractional solutions k-median forwith ixed clients to round fractional solutions ℓ for clustering with ixed clients. We extend Lemma 3.8 as follows: Lemma 6.4. Suppose alg and sol are two (possibly virtual) solutions toℓan -clustering instance with ixed clients ∗ ∗ C , such that there is a subset of clients C ⊂ (C \ C ) such that for every client in C alg’s connection cost is greater f f than p timessol’s connection cost, and for every clientC in\C \C , sol’s connection cost is at least alg’s connection cost. Then p p ′ ′ alg (C )−sol (C )  p p alg (C ) > 0 p−1 ′ ′  alg (C ) f (C ) := p  ′ 0 alg (C ) = 0 is maximizedCby∪ C . The proof follows exactly as that of Lemma 3.8. Lemma 6.5. 
There exists an algorithm that given any (α, β )-approximate universal fractional solution ℓ for - 1/p clustering with ixed clients, outputs (54apα, 54pβ + 18p )-approximate universal integral solution. Proof. Let sol be the virtual solution whose connection costs are 3 times the fractional solution’s for all clients. The algorithm is to solv ℓ -clustering e the with discounts instance using Lemma 3.4 where the discounts are 0 for ixed clients and 2 times sol’s connection costs for the remaining clients. Note that using these f g p p p ′ p ′ discounts, theℓ -clustering with discounts objective emax quals ′ alg (C ) − 2 · sol (C \ C ) instead C ⊆C ⊆C f p f p p f g p p ′ p ′ of max alg (C ) − 2 · sol (C ) . C ⊆C ⊆C f p p ACM Trans. Algor. Universal Algorithms for Clustering Problems • 33 Let alg be the output solution. We will againalg bound ’s cost against the virtual solution sol g whose connection costs are sol’s connection costs times p for non-ixed clients j such that alg’s connection cost toj is at least 18 timessol’s but less than 18 p times 18· sol’s, and the same as sol’s for the remaining clients. f g ′ ′ We use max ′ to denote max ′ . Ifmax ′ alg (C ) − 18sol g(C ) ≤ 0 then alg’s cost is always bounded C C ⊆C ⊆C C p f g ′ ′ ′ ′ ′ g g g by 18 timessol’s cost and we are done. So assume max [alg (C )−18sol (C )] > 0. LetC = argmax ′ alg (C ) − 18sol (C ) p p 1 p p f g p p ′ p ′ and C = argmax mrs (C ) − 2 · sol (C ) . Like in the proof of Lemma 3.9, via Lemma 6.4 we have: 2 p p f g p p ′ p ′ f g max alg (C ) − 18 sol (C ) p p ′ ′ max alg (C ) − 18sol g (C ) = = p p p−1 alg (C ) f g p p p ′ p ′ p max ′ alg (C ) − 18 sol (C \ C ) − 18 sol (C ) C f f p p p p−1 alg (C ) f g p p p ′ p ′ p max mrs (C ) − 2 sol (C \ C ) − 18 sol (C ) 2 p p f p f · 9 ≤ p−1 alg (C ) f g p p ′ p ′ max ′ mrs (C ) − 2 sol (C ) 2 p p · 9 . p−1 alg (C ) 1/p Using the same analysis as in Lemma 3.9 we can upper bound this inal quantity p ·by mr18 , proving the lemma. □ Theorem 6.1 follows from Lemmas 6.3 and 6.5. 7 UNIVERSAL k-CENTER WITH FIXED CLIENTS In this section, we discuss how to extend the proof of Theorem 4.1 to prove the following theorem: Theorem 7.1. There exists a(9, 3)-approximate algorithm for universal k-center with ixed clients. Proof. To extend the proof of Theorem 4.1 to the case where ixed clients are present,apx let (C ) denote the ′ ′ cost of a 3-approximation to the k-center problem with client C set ; it is well known how to compute apx(C ) in polynomial time 33]. A [ solution with regr r must et be within distance r := apx(C ∪{j}) +r of client j, otherwise j s in realization C ∪ {j} the solution has regret larger than r due to client j. The same algorithm as in the proof of Theorem 4.1 using this deinition r inds of alg within distance r =3 3·apx(C ∪{j})+3·mr ≤ 9·opt(C ∪{j})+3mr j j s s ′ ′ ′ of client j. opt(C ) ≥ opt(C ∪ {j}) for any realization C and any client j ∈ C , so this solution is (9, 3a)- approximation. □ 8 HARDNESS OF UNIVERSAL CLUSTERING FOR GENERAL METRICS In this section we give some hardness results to help contextualize the algorithmic results. Much like the hardness results for k-median, all our reductions are based on the NP-hardness of approximating set cover (or equivalently, dominating set) due to the natural relation between the two types of problems. We state our hardness results in terms of ℓ -clustering. Setting p = 1 gives hardness results for k-median, and setting p = ∞ (and using the convention /1∞ = 0 in the proofs as needed) gives hardness resultskfor -center. 
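To give a concrete sense of how these reductions are built, the instance constructed from a set cover instance in the proof of Theorem 8.1 below can be written in a few lines of Python. This is a minimal sketch; the instance representation (a distance dictionary keyed by center/client pairs) is hypothetical.

```python
def clustering_instance_from_set_cover(elements, sets):
    """Build the universal clustering instance used in the proof of Theorem 8.1:
    one client per element, one candidate cluster center per set, at distance 1
    from the elements the set covers and distance 3 from every other element."""
    clients = sorted(elements)
    centers = sorted(sets)  # `sets` maps a set's name to the elements it covers
    dist = {(s, e): 1 if e in sets[s] else 3 for s in centers for e in clients}
    return clients, centers, dist

# Toy usage: universe {1, 2, 3} with sets A = {1, 2} and B = {2, 3}.
clients, centers, dist = clustering_instance_from_set_cover(
    {1, 2, 3}, {"A": {1, 2}, "B": {2, 3}}
)
```

A set cover of size k then corresponds exactly to k centers at distance 1 from every client, which is what the proof below exploits.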
8.1 Hardness of Approximating α

Theorem 8.1. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering where α < 3 is NP-hard.

Proof. We will show that given a deterministic (α, β)-approximate algorithm where α < 3, we can design an algorithm (using the (α, β)-approximate algorithm as a subroutine) that solves the set cover problem (i.e., finds a set cover of size k if one exists), giving the lemma by NP-hardness of set cover. The algorithm is as follows: Given an instance of set cover, construct the following instance of universal ℓ_p-clustering:
• For each element, there is a corresponding client in the universal ℓ_p-clustering instance.
• For each set S, there is a cluster center which is at distance 1 from the clients corresponding to the elements in S and at distance 3 from all other clients.
Then, we just run the universal ℓ_p-clustering algorithm on this instance, and output the sets corresponding to the cluster centers this algorithm buys.

Assume for the rest of the proof that a set cover of size k exists. Then the corresponding k cluster centers are as close as possible to every client, and are always an optimal solution. This gives mr = 0 for this universal ℓ_p-clustering instance. Now, suppose by contradiction that this algorithm does not solve the set cover problem. That is, for some set cover instance we run an (α, β)-approximate algorithm where α < 3 on the produced ℓ_p-clustering instance, and it produces a solution alg that does not choose cluster centers corresponding to a set cover. This means it is at distance 3 from some client j. For the realization C′ = {j}, we have by the definition of (α, β)-approximation:

    alg(C′) ≤ α·opt(C′) + β·mr  ⟹  3 ≤ α·1 + β·0 = α,

which is a contradiction, giving the lemma. □

Note that for, e.g., k-median, the classical (non-universal) problem admits an approximation ratio of less than 3. So this theorem shows that the universal version of the problem is harder, even if we are willing to use arbitrarily large β.

8.2 Hardness of Approximating β

We give the following result on the hardness of universal ℓ_p-clustering.

Theorem 8.2. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering where β < 2 is NP-hard.

Proof. We will show that given a deterministic (α, β)-approximate algorithm where β < 2, we can design an algorithm (using the (α, β)-approximate algorithm as a subroutine) that solves the dominating set problem (i.e., outputs at most k vertices which are a dominating set of size k if a dominating set of size k exists), giving the lemma by NP-hardness of dominating set. The algorithm is as follows: Given an instance of dominating set G = (V, E), construct the following instance of universal ℓ_p-clustering:
• For each vertex v ∈ V, there is a corresponding k-clique of clients in the universal ℓ_p-clustering instance (each client location also hosts a cluster center).
• For each (u, v) ∈ E, connect all clients in u's corresponding clique to all those in v's.
• Impose the shortest path metric on the clients, where all edges have length 1.
Then, we just run the universal ℓ_p-clustering algorithm on this instance, and output the set of vertices corresponding to the cluster centers this algorithm buys.

Assume for the rest of the proof that a dominating set of size k exists in the dominating set instance. Then a dominating set of size k also exists in the constructed universal ℓ_p-clustering instance (where the cliques this set resides in correspond to the vertices in the dominating set in the original instance).
Thus, there is a solution to the universal ℓ_p-clustering instance that covers all clients at distance at most 1.

We will first show that this dominating set solution is a minimum regret solution. Given a dominating set solution, note that in any realization of the demands, opt can cover k locations at distance 0 and must cover the rest of the clients at distance at least 1. Thus, to maximize the regret of a dominating set solution, we pick any k clients covered at distance 1 by the dominating set, and choose the realization including only these clients. Now, consider any solution which is not a dominating set. For such a solution, there is some k-clique covered at distance 2. We can make such a solution incur regret 2k^{1/p} by including all k clients in this clique, with the optimal solution being to buy all cluster centers in this clique at cost 0. Thus, the dominating set solution is a minimum regret solution, and mr = k^{1/p}.

Now consider any (α, β)-approximation algorithm and suppose that this algorithm, when run on the reduced dominating set instance, does not produce a dominating set solution while one exists. Consider the realization C′ including only the clients in some k-clique covered at distance 2. By the definition of (α, β)-approximation we get:

    alg(C′) ≤ α·opt(C′) + β·mr
    2k^{1/p} ≤ 0 + β·k^{1/p}        (17)
    2 ≤ β.

If β < 2, this is a contradiction, i.e., the algorithm will always output a dominating set of size k if one exists. Thus an (α, β)-approximation algorithm where β < 2 can be used to solve the dominating set problem, proving the theorem. □

9 HARDNESS OF UNIVERSAL CLUSTERING FOR EUCLIDEAN METRICS

9.1 Hardness of Approximating α

We can consider the special case of ℓ_p-clustering where the cluster center and client locations are all points in R^d, and the metric is an ℓ_q-norm in R^d. One might hope that, e.g., for d = 2, α = 1 + ϵ is achievable, since for the classic Euclidean k-median problem a PTAS exists [5]. We show that there is still a lower bound on α even for ℓ_p-clustering in R^2.

Theorem 9.1. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering in R^2 using the ℓ_q-norm where α < (1+√7)/2 for q = 2 or α < 2 for q = 1, ∞ is NP-hard.

Proof. The hardness is via reduction from the discrete k-center problem in R^2. Section 3 of [46] shows how to reduce an instance of planar 3-SAT (which is NP-hard) to an instance of Euclidean k-center in R^2 using the ℓ_q norm as the metric such that:
• For every client, the distance to the nearest cluster center is 1.
• There exists a k-cluster-center solution which is at distance 1 from all clients if the planar 3-SAT instance is satisfiable, and none exists if the instance is unsatisfiable.
• Any solution that is strictly less than distance α − ϵ away from all clients can be converted in polynomial time to a solution within distance 1 of all clients, for α = (1+√7)/2 if q = 2 and α = 2 if q = 1, ∞.

We note that [46]'s reduction is actually to the "continuous" version of the problem where every point in R^2 can be chosen as a cluster center, including the points at which clients are located. That is, if we use this reduction without modification then the first property is not actually true (since the minimum distance is 0).
However, in the proof of correctness for this construction, [46] shows that (both for the optimal solution and any algorithmic solution) it suffices to only consider cluster centers located at the centers of a set of ℓ_q-discs of radius 1, chosen such that every client lies on at least one of these discs and no client is contained within any of these discs. So, taking this reduction and then restricting the choice of cluster centers to the centers of these discs, we retrieve an instance with the desired properties.

Now, consider the corresponding instance as a universal ℓ_p-clustering instance. Like in the proof of Theorem 8.1, if the planar 3-SAT instance reduced from is satisfiable, there exists a clustering solution that is as close as possible to every client, i.e., has regret 0. So mr = 0. Thus, an (α, β)-approximate clustering solution is within distance α of every client (in the realization where only client j appears, opt is 1, so an α-approximate solution must be within distance α of this client). In turn, using the properties of the reduced clustering instance, an (α, β)-approximation where α is less than the lower bound given in [46] can be converted into an algorithm that solves planar 3-SAT. □

9.2 Hardness of Approximating β

We can also show that β = 1 is NP-hard in R^2 using a similar reduction:

Theorem 9.2. For all p ≥ 1, finding an (α, β)-approximate solution for universal ℓ_p-clustering in R^2 using the ℓ_q-norm where β = 1 for q = 1, 2, ∞ is NP-hard.

Proof. We again use a reduction from planar 3-SAT due to [46]. This time, we use the reductions in Section 4 of [46] for simplicity, which have the properties that:
• Every client is at distance 0 from a co-located cluster center, and the distance to the second-closest cluster center is 1.
• There exists a k-cluster-center solution which is at distance 1 from all but k clients and at distance 0 from k clients (the ones at its cluster centers) if the planar 3-SAT instance is satisfiable, and none exists if the instance is unsatisfiable.

Consider any instance reduced from a satisfiable planar 3-SAT instance. The solution sol* in the resulting instance with the second property above has regret k^{1/p} (and in fact, this is mr): by the first property above, no solution can be at distance less than 1 from any clients other than the k clients co-located with its cluster centers. In turn, the regret of sol* against any adversarial sol′ is maximized by the realization C′ including only the clients co-located with the k cluster centers in sol′. We then get sol*(C′) − sol′(C′) = k^{1/p} − 0 = k^{1/p}.

Now consider an arbitrary (α, 1)-approximate universal solution alg in this instance. Consider any set of k clients C′ not co-located with alg's cluster centers. opt(C′) = 0, so we get alg(C′) ≤ α·opt(C′) + mr = mr ≤ k^{1/p}. alg is at distance at least 1 from all clients in C′ by construction, so this only holds if alg is at distance exactly 1 from all clients in C′. This gives that alg is at distance 1 from all but k clients (those co-located with cluster centers in alg), and at distance 0 from the remaining clients. In turn, alg satisfies the property of a solution corresponding to a satisfying assignment to the planar 3-SAT instance. This shows that an (α, 1)-approximate universal solution to ℓ_p-clustering in R^2 can be used to solve planar 3-SAT. □
10 FUTURE DIRECTIONS

In this paper, we gave the first universal algorithms for clustering problems: k-median, k-means, and k-center (and their generalization to ℓ_p-clustering). While we achieve constant approximation guarantees for these problems, the actual constants are orders of magnitude larger than the best (non-universal) approximations known for these problems. In part to ensure clarity of presentation, we did not attempt to optimize these constants. But it is unlikely that our techniques will lead to small constants for the k-median and k-means problems (although, interestingly, we got small constants for k-center). On the other hand, we show that in general it is NP-hard to find an (α, β)-approximation algorithm for a universal clustering problem where α matches the approximation factor for the standard clustering problem. Therefore, it is not entirely clear what one should expect: are there universal algorithms for clustering with approximation factors of the same order as the classical (non-universal) bounds?

One possible approach to improving the constants is considering algorithms that use more than k cluster centers. For example, our (9^p, 2·9^p)-approximation for ℓ_p-clustering with discounts can easily be improved to a (3^p, 3^p)-approximation if it is allowed to use 2k−1 cluster centers. This immediately improves all constants in the paper. For example, our (27, 49)-approximation for universal k-median becomes a (9, 18)-approximation if it is allowed to use 2k−1 cluster centers. Unfortunately, our lower bounds on α, β apply even if the algorithm is allowed to use (1−ϵ)k ln n cluster centers, but it is an interesting problem to show that, e.g., using (1+ϵ)k ln n cluster centers allows one to beat either bound.

Another open research direction pertains to Euclidean clustering. Here, we showed that for R^d with d ≥ 2, α needs to be bounded away from 1, which is in stark contrast to non-universal clustering problems that admit PTASes in constant-dimension Euclidean space. But for d = 1, i.e., for universal clustering on a line, the picture is not as clear. On a line, the lower bounds on α are no longer valid, which brings forth the possibility of (non-bicriteria) approximations of regret. Indeed, it is known that there is a 2-approximation for universal k-median on a line [38], and even better, an optimal algorithm for universal k-center on a line [7]. This raises the natural question: can we design a PTAS for the universal k-median problem on a line?

ACKNOWLEDGMENTS

Arun Ganesh was supported in part by NSF Award CCF-1535989. Bruce M. Maggs was supported in part by NSF Award CCF-1535972. Debmalya Panigrahi was supported in part by NSF grants CCF-1535972, CCF-1955703, an NSF CAREER Award CCF-1750140, and the Indo-US Virtual Networked Joint Center on Algorithms under Uncertainty.

REFERENCES

[1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. 2017. Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms. In Proceedings of the 58th Annual IEEE Symposium on Foundations of Computer Science. 61–72. https://doi.org/10.1109/FOCS.2017.15
[2] N. Alon and Y. Azar. 1992. On-line Steiner Trees in the Euclidean Plane. In Proceedings of the 8th Annual Symposium on Computational Geometry. 337–343.
[3] Barbara Anthony, Vineet Goyal, Anupam Gupta, and Viswanath Nagarajan. 2010. A Plant Location Guide for the Unsure: Approximation Algorithms for Min-Max Location Problems. Math. Oper. Res. 35, 1 (Feb.
2010), 79ś101. https://doi.org/10.1287/moor.1090.0428 [4] Aaron Archer, Ranjithkumar Rajagopalan, and David B. Shmoys. 2003. Lagrangian Relaxation for the k-Median Problem: New Insights and Continuity Properties.Algorithms In - ESA 2003: 11th Annual European Symposium, Budapest, Hungary, September 16-19, 2003. Proceedings , Giuseppe Di Battista and Uri Zwick (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 31ś42. https://doi.org/10.1007/978- 3-540-39658-1_6 [5] Sanjeev Arora, Prabhakar Raghavan, and Satish Rao. 1998. Approximation Schemes for Euclidean k-medians and Related Problems. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (Dallas, Texas, USA (ST ) OC ’98). ACM, New York, NY, USA, 106ś113. https://doi.org/10.1145/276698.276718 [6] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. 2001. Local Search Heuristic for K-median and Facility Location Problems. ProceeIn dings of the Thirty-third Annual ACM Symposium on Theory of Computing (Hersonissos, Greece)(STOC ’01). ACM, New York, NY, USA, 21ś29. https://doi.org/10.1145/380752.380755 [7] I. Averbakh and Oded Berman. 1997. Minimax regret p-center location on a network with demand uncertainty Location . Science 5, 4 (1997), 247 ś 254. https://doi.org/10.1016/S0966-8349(98)00033-3 [8] Igor Averbakh and Oded Berman. 2000. Minmax Regret Median Location on a Network Under Uncertainty INFORMS . Journal on Computing12, 2 (2000), 104ś110. https://doi.org/10.1287/ijoc.12.2.104.11897 arXiv:https://doi.org/10.1287/ijoc.12.2.104.11897 [9] D. Bertsimas and M. Grigni. 1989. Worst-case examples for the spaceilling curve heuristic for the Euclidean traveling salesman problem. Operations Research Letter8, 5 (Oct. 1989), 241ś244. [10] Anand Bhalgat, Deeparnab Chakrabarty, and Sanjeev Khanna. 2011. Optimal Lower Bounds for Universal and Diferentially Private Steiner Trees and TSPs. InApproximation, Randomization, and Combinatorial Optimization. Algorithms and , Leslie Techniques Ann Goldberg, Klaus Jansen, R. Ravi, and José D. P. Rolim (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 75ś86. [11] Sayan Bhattacharya, Parinya Chalermsook, Kurt Mehlhorn, and Adrian Neumann. 1994. New Approximability Results for the Robust k-Median Problem. Pr Inoceedings of the 14th Scandanavian Workshop on Algorithm The.or 51ś60. y [12] Costas Busch, Chinmoy Dutta, Jaikumar Radhakrishnan, Rajmohan Rajaraman, and Srinivasagopalan Srivathsan. 2012. Split and Join: Strong Partitions and Universal Steiner Trees for Graphs. 53rd In Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012 . 81ś90. ACM Trans. Algor. 38 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi [13] Jaroslaw Byrka, Fabrizio Grandoni, Thomas Rothvoß, and Laura Sanità. 2013. Steiner Tree Approximation via Iterative Randomized Rounding.J. ACM 60, 1 (2013), 6:1ś6:33. [14] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. 2011. Maximizing a Monotone Submodular Function Subject to a Matroid Constraint. SIAM J. Comput. 40, 6 (Dec. 2011), 1740ś1766. https://doi.org/10.1137/080733991 [15] Deeparnab Chakrabarty and Chaitanya Swamy. 2019. Approximation Algorithms for Minimum Norm and Ordered Optimization Problems. InProceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (Phoenix, AZ, USA(ST ) OC 2019). Association for Computing Machinery, New York, NY, USA, 126ś137. 
https://doi.org/10.1145/3313276.3316322 [16] Moses Charikar, Chandra Chekuri, and Martin Pál. 2005. Sampling Bounds for Stochastic Optimization. ProceedingsInof the 8th International Workshop on Approximation, Randomization and Combinatorial Optimization Problems, and Proceedings of the 9th International Conference on Randamization and Computation: Algorithms and Techniques (Berkeley, CA(APPROX’05/RANDOM’05) ) . Springer-Verlag, Berlin, Heidelberg, 257ś269. https://doi.org/10.1007/11538462_22 [17] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. 1999. A Constant-factor Approximation Algorithm for the K-median Problem (Extended Abstract). InProceedings of the Thirty-irst Annual ACM Symposium on Theory of Computing (Atlanta, Georgia, USA) (STOC ’99). ACM, New York, NY, USA, 1ś10. https://doi.org/10.1145/301250.301257 [18] Chandra Chekuri. 2007. Routing and network design with robustness to changing or uncertain traic demands. SIGACT News 38, 3 (2007), 106ś129. [19] Kedar Dhamdhere, Vineet Goyal, R. Ravi, and Mohit Singh. 2005. How to Pay, Come What May: Approximation Algorithms for Demand-Robust Covering Problems. 46th In Annual IEEE Symposium on Foundations of Computer Science (FOCS 2005), 23-25 October 2005, Pittsburgh, PA, USA, Proceedings . 367ś378. [20] Uriel Feige, Kamal Jain, Mohammad Mahdian, and Vahab S. Mirrokni. 2007. Robust Combinatorial Optimization with Exponential Scenarios. InInteger Programming and Combinatorial Optimization, 12th International IPCO Conference, Ithaca, NY, USA, June 25-27, 2007, Proceedings . 439ś453. [21] Teoilo F. Gonzalez. 1985. Clustering to Minimize the Maximum Intercluster TheDistance or. Comput. . Sci.38 (1985), 293ś306. [22] Igor Gorodezky, Robert D. Kleinberg, David B. Shmoys, and Gwen Spencer. 2010. Improved Lower Bounds for the Universal and a priori TSP. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and , Maria Techniques Serna, Ronen Shaltiel, Klaus Jansen, and José Rolim (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 178ś191. [23] F. Grandoni, A. Gupta, S. Leonardi, P. Miettinen, P. Sankowski, and M. Singh. 2008. Set Covering with Our EyesPr Close oceedings d. In of the 49th Annual IEEE Symposium on Foundations of Computer Science . [24] Martin Grötschel, László Lovász, and Alexander Schrijver. 1981. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica 1, 2 (1981), 169ś197. [25] Sudipto Guha and Kamesh Munagala. 2009. Exceeding Expectations and Clustering UncertainPrData. oceedings In of the Twenty-Eighth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Providence, Rhode Island, USA (PODS ) ’09). Association for Computing Machinery, New York, NY, USA, 269ś278. https://doi.org/10.1145/1559795.1559836 [26] Anupam Gupta, Mohammad T. Hajiaghayi, and Harald Räcke. 2006. Oblivious Network Design. Proceedings In of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm (Miami, Florida) (SODA ’06). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 970ś979. http://dl.acm.org/citation.cfm?id=1109557.1109665 [27] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. 2014. Thresholded covering algorithms for robust and max-min optimization. Math. Program. 146, 1-2 (2014), 583ś615. [28] Anupam Gupta, Viswanath Nagarajan, and R. Ravi. 2016. Robust and MaxMin Optimization under Matroid and Knapsack Uncertainty Sets. ACM Trans. Algorithms12, 1 (2016), 10:1ś10:21. [29] Anupam Gupta, Martin Pál, R. Ravi, and Amitabh Sinha. 2004. 
Boosted Sampling: Approximation Algorithms for Stochastic Optimization. In Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing (Chicago, IL, USA(ST ) OC ’04). ACM, New York, NY, USA, 417ś426. https://doi.org/10.1145/1007352.1007419 [30] Anupam Gupta and Kanat Tangwongsan. 2008. Simpler Analyses of Local Search Algorithms for Facility ArXiv Location. abs/0809.2554 (2008). [31] Mohammad T. Hajiaghayi, Robert Kleinberg, and Tom Leighton. 2006. Improved Lower and Upper Bounds for Universal TSP in Planar Metrics. InProceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (Miami, Florida). 649ś658. [32] Dorit S. Hochbaum and David B. Shmoys. 1985. A Best Possible Heuristic for the k-Center Pr Math. oblem. Oper. Res. 10, 2 (May 1985), 180ś184. https://doi.org/10.1287/moor.10.2.180 [33] Dorit S. Hochbaum and David B. Shmoys. 1986. A Uniied Approach to Approximation Algorithms for BottleneckJ.Pr Aoblems. CM 33, 3 (May 1986), 533ś550. https://doi.org/10.1145/5925.5933 [34] Kamal Jain and Vijay V. Vazirani. 2001. Approximation Algorithms for Metric Facility Location and k-Median Problems Using the Primal-dual Schema and Lagrangian Relaxation. J. ACM 48, 2 (March 2001), 274ś296. https://doi.org/10.1145/375827.375845 [35] L. Jia, G. Lin, G. Noubir, R. Rajaraman, and R. Sundaram. 2005. Universal Algorithms for TSP, Steiner Tree, and Set PrCo oceveer dings . In of the 36th Annual ACM Symposium on Theory of Computing . ACM Trans. Algor. Universal Algorithms for Clustering Problems • 39 [36] Lujun Jia, Guevara Noubir, Rajmohan Rajaraman, and Ravi Sundaram. 2006. GIST: Group-Independent Spanning Tree for Data Aggregation in Dense Sensor Networks. Distribute In d Computing in Sensor Systems , Phillip B. Gibbons, Tarek Abdelzaher, James Aspnes, and Ramesh Rao (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 282ś304. [37] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. 2002. A Local Search Approximation Algorithm for K-means Clustering. Proceedings In of the Eighteenth Annual Symposium on Computational Geometry (Barcelona, Spain) (SCG ’02). ACM, New York, NY, USA, 10ś18. https://doi.org/10.1145/513400.513402 [38] Adam Kasperski and Pawel Zielinski. 2007. On the existence of an FPTAS for minmax regret combinatorial optimization problems with interval data. Oper. Res. Lett. 35 (2007), 525ś532. [39] Rohit Khandekar, Guy Kortsarz, Vahab S. Mirrokni, and Mohammad R. Salavatipour. 2013. Two-stage Robust Network Design with Exponential Scenarios. Algorithmica 65, 2 (2013), 391ś408. [40] Samir. Khuller and Yoram J. Sussmann. 2000. The Capacitated K-Center Problem. SIAM Journal on Discrete Mathematics 13, 3 (2000), 403ś418. https://doi.org/10.1137/S0895480197329776 arXiv:https://doi.org/10.1137/S0895480197329776 [41] Stavros G. Kolliopoulos and Satish Rao. 1999. A Nearly Linear-Time Approximation Scheme for the Euclidean k-median Problem. In Algorithms - ESA’ 99 , Jaroslav Nešetřil (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 378ś389. [42] Panos Kouvelis and Gang Yu. 1997. Robust 1-Median Location Problems: Dynamic Aspects and Uncertainty . Springer US, Boston, MA, 193ś240. https://doi.org/10.1007/978-1-4757-2620-6_6 [43] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. 2004. A simple linear (1 + time ϵ )-approximation algorithm k-means for clustering in any dimensions. Pr Inoceedings of the 45th IEEE Symposium on Foundations of Computer Science . 454ś462. [44] Shi Li and Ola Svensson. 2013. 
Approximating k-Median via Pseudo-Approximation. ProIn ceedings of the Forty-ifth Annual ACM Symposium on Theory of Computing (Palo Alto, California, USA). 901ś910. [45] Stuart P. Lloyd. 1982. Least squares quantization in PCM. IEEE Trans. Information Theory28 (1982), 129ś136. [46] Stuart G. Mentzer. 2016. Approximability of Metric Clustering Problems. (March 2016). Unpublished manuscript. [47] Viswanath Nagarajan, Baruch Schieber, and Hadas Shachnai. 2013. The Euclidean k-Supplier Problem. Integer PrInogramming and Combinatorial Optimization , Michel Goemans and José Correa (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 290ś301. [48] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functionsÐI. Mathematical Programming 14, 1 (1978), 265ś294. https://doi.org/10.1007/BF01588971 [49] Loren K. Platzman and John J. Bartholdi, III. 1989. Spaceilling Curves and the Planar Travelling Salesman J. ACM Pr36, oblem. 4 (Oct. 1989), 719ś737. https://doi.org/10.1145/76359.76361 [50] Frans Schalekamp and David B. Shmoys. 2008. Algorithms for the universal and a priori Operations TSP. Research Letters36, 1 (2008), 1ś3. https://doi.org/10.1016/j.orl.2007.04.009 [51] David B. Shmoys and Chaitanya Swamy. 2006. An Approximation Scheme for Stochastic Linear Programming and Its Application to Stochastic Integer Programs.J. ACM 53, 6 (Nov. 2006), 978ś1012. https://doi.org/10.1145/1217856.1217860 [52] Chaitanya Swamy and David B. Shmoys. 2006. Approximation algorithms for 2-stage stochastic optimization SIGA prCT oblems. News 37, 1 (2006), 33ś46. [53] Chaitanya Swamy and David B. Shmoys. 2012. Sampling-Based Approximation Algorithms for Multistage Stochastic Optimization. SIAM J. Comput. 41, 4 (2012), 975ś1004. A RELATIONS BETWEEN VECTOR NORMS For completeness, we give a proof of the following well known fact that relates ℓ norms. vector Fact A.1. For any 1 ≤ p ≤ q and x ∈ R , we have 1/p−1/q ||x|| ≤ ||x|| ≤ n ||x|| . q p q Proof. For all p, repeatedly applying Minkowski’s inequality we have: 1/p 1/p 1/p n n−1 n−2 n X X X X p p p * + * + * + ||x|| = |x| ≤ |x| + |x | ≤ |x| + |x | + |x | ≤ . . . ≤ |x | = ||x|| . p n n−1 n i 1 i=1 i=1 i=1 i=1 , - , - , - Then we bound ||x|| by ||x|| as follows: q p ACM Trans. Algor. 40 • Arun Ganesh, Bruce M. Maggs, and Debmalya Panigrahi 1/p 1/q p/q 1/p n n n X X X * q/p + q p p * + * + * + ||x|| = |x| = . |x| / ≤ |x| = ||x|| . q p i=1 i=1 i=1 , - , - , - , - ′ ′ ′ ′ p The inequality is by applying ||x || ≤ ||x || to the vector x with entries x = |x | . To bound ||x|| by q/p 1 p ||x|| , we invoke Holder’s inequality as follows: p/q 1−p/q n n n n X X X X p q/p p p p p q/(q−p) 1−p/q * + * + ||x|| = |x| = |x| · 1 ≤ |x | 1 = ||x|| · n . i q i=1 i=1 , i=1 - , i=1 - Taking thepth root of this inequality gives the desired bound. □ 1/c logn 1/c The limiting behavior q →as∞ shows that ||x|| ≤ ||x|| ≤ n ||x|| = 2 ||x|| , i.e. that the ∞ c logn ∞ ∞ ℓ -norm and ℓ -norm forp = Ω(logn) are within a constant factor. ∞ p B APPROXIMATIONS FOR ALL-CLIENTS INSTANCES ARE NOT UNIVERSAL In this section, we demonstrate that even (1 + ϵ )-approximate (integer) solutions for the łall clientsž instance for clustering problems are not guaranteed to(αb,eβ )-approximations for any inite α, β. This is in sharp contrast to the optimal (integer) solution, which is known (1,to2)b-appr e a oximation for a broad range of problems including the clustering problems considered in this paper [38]. 
Consider an instance of universal 1-median with c clients ,c and cluster centersf , f . Both the cluster centers 1 2 1 2 are at distance 1 from c , and at distances 0 andϵ respectively from c (see Figure 6).f is a(1 + ϵ )-approximate 1 2 2 solution for the realization containing both clients. In this mr is instance 0 and sof, is not an(α, β )-approximation for any inite α, β due to the realization containing c only . The same example can be used for the ℓ -clustering 2 p p 1/p objective for all p ≥ 1 sincef has approximation factor (1 + ϵ ) ≤ (1 + ϵ ) when all clients are present. In the case ofk-center, f is an optimal solution when all clients are present. Fig. 6. Example where a (1 + ϵ)-approximation for all clients has no(α, β )-approximation guarantee, for anyℓ -clustering objective including k-center. C ALGORITHMS FOR k-MEDIAN AND ℓ -CLUSTERING WITH DISCOUNTS p 2 p In this section, we prove Lemma 3.4 which states that there exists (9 ,a · 9 )-approximation algorithm for the ℓ -clustering with discounts problem. As a corollary, by p = setting 1, we will obtain Lemma 2.3 which states that there exists a(9, 6)-approximation algorithm for k-me thedian with discounts problem. To prove Lemma 3.4, we will irst use a technique due to Jain and Vazirani 34] to design [ a Lagrangian-preserving approximation for the ACM Trans. Algor. Universal Algorithms for Clustering Problems • 41 p p p ℓ -facility location with discountsℓpr-facility oblem. location with discounts (FLD) is the ℓ -clustering same as p p p with discounts, except rather than being restricted to buying k cluster centers, each cluster center has a cost f associated with buying it (discounts and cluster centers costs are not connected in any way). C.1 Algorithm forℓ -Facility Location with Discounts Since FLD is a special case of non-metric facility location, we can consider the standard linear programming primal-dual formulation for the latter. The primal program is as follows: P P p p min f x + (c − r ) y i i ij i∈F i∈F, j∈C ij j s.t. ∀j ∈ C : y ≥ 1 ij i∈F ∀i ∈ F , j ∈ C : y ≤ x ij i ∀i ∈ F : x ≥ 0 ∀i ∈ F , j ∈ C : y ≥ 0 ij The dual program is as follows: max a j∈C p p s.t. ∀i ∈ F , j ∈ C : a − (c − r ) ≤ b j ij ij j ∀i ∈ F : b ≤ f j∈C ij i ∀j ∈ C : a ≥ 0 ∀i ∈ F , j ∈ C : b ≥ 0 ij We design a primal-dual algorithm for the FLD problem. This FLD algorithm operates in two phases. In both programs, all variables start out as 0. In the irst phase, we generate a dual solution. For each client j deine a łtimež variable t which is initialized to 0. We grow the dual variables as follows: we incrtease uniformly the . We grow the a such that for anyj, j j at all times a = (t − r ) (or equivalently,aall start at 0, and we increase all a at a uniform rate, but we j j j j p p p + + only start growing a at timer ). Each b is set to the minimum feasible value (a −, i.e (c .− r ) ) . If the j ij j j ij j p p constraint b ≤ f is tight, we stop increasing t ,all a for which b = a − (c − r ) , i.e., for the clients j∈C ij i j j ij j ij j j that contributed to increasing the value b (we say these clients put weight on this cluster center). We ij j∈C continue this process until t stop allgrowing. Note that at any time the dual solution grown is always feasible. In the second phase, consider a graph induced on the cluster centers whose constraints are tight, where we place an edge between cluster centers i, i if there exists some client j that put weight on both cluster centers. 
C ALGORITHMS FOR k-MEDIAN AND $\ell_p$-CLUSTERING WITH DISCOUNTS

In this section, we prove Lemma 3.4, which states that there exists a $(9^p, \frac{2}{3} \cdot 9^p)$-approximation algorithm for the $\ell_p$-clustering with discounts problem. As a corollary, by setting $p = 1$, we obtain Lemma 2.3, which states that there exists a $(9, 6)$-approximation algorithm for the $k$-median with discounts problem. To prove Lemma 3.4, we first use a technique due to Jain and Vazirani [34] to design a Lagrangian-preserving approximation for the $\ell_p$-facility location with discounts problem. $\ell_p$-facility location with discounts (FLD) is the same as $\ell_p$-clustering with discounts, except that rather than being restricted to buying $k$ cluster centers, each cluster center $i$ has a cost $f_i$ associated with buying it (discounts and cluster center costs are not connected in any way).

C.1 Algorithm for $\ell_p$-Facility Location with Discounts

Since FLD is a special case of non-metric facility location, we can consider the standard linear programming primal-dual formulation for the latter. The primal program is as follows:
\[
\begin{array}{ll}
\min & \sum_{i \in F} f_i x_i + \sum_{i \in F, j \in C} (c_{ij}^p - r_j^p)^+ \, y_{ij} \\
\text{s.t.} & \forall j \in C: \sum_{i \in F} y_{ij} \ge 1 \\
& \forall i \in F, j \in C: y_{ij} \le x_i \\
& \forall i \in F: x_i \ge 0 \\
& \forall i \in F, j \in C: y_{ij} \ge 0
\end{array}
\]
The dual program is as follows:
\[
\begin{array}{ll}
\max & \sum_{j \in C} a_j \\
\text{s.t.} & \forall i \in F, j \in C: a_j - (c_{ij}^p - r_j^p)^+ \le b_{ij} \\
& \forall i \in F: \sum_{j \in C} b_{ij} \le f_i \\
& \forall j \in C: a_j \ge 0 \\
& \forall i \in F, j \in C: b_{ij} \ge 0
\end{array}
\]
We design a primal-dual algorithm for the FLD problem. This FLD algorithm operates in two phases. In both programs, all variables start out as 0.

In the first phase, we generate a dual solution. For each client $j$, define a "time" variable $t_j$ which is initialized to 0. We grow the dual variables as follows: we increase all the $t_j$ uniformly. We grow the $a_j$ such that for any $j$, at all times $a_j = (t_j - r_j^p)^+$ (or equivalently, all $a_j$ start at 0, and we increase all $a_j$ at a uniform rate, but we only start growing $a_j$ at time $r_j^p$). Each $b_{ij}$ is set to the minimum feasible value, i.e., $(a_j - (c_{ij}^p - r_j^p)^+)^+$. If the constraint $\sum_{j \in C} b_{ij} \le f_i$ becomes tight, we stop increasing $t_j, a_j$ for all $j$ for which $b_{ij} = a_j - (c_{ij}^p - r_j^p)^+$, i.e., for the clients that contributed to increasing the value $\sum_{j \in C} b_{ij}$ (we say these clients put weight on this cluster center). We continue this process until all $t_j$ stop growing. Note that at any time the dual solution grown is always feasible.

In the second phase, consider a graph induced on the cluster centers whose constraints are tight, where we place an edge between cluster centers $i, i'$ if there exists some client $j$ that put weight on both cluster centers. Find a maximal independent set $S$ of this graph and output this set of cluster centers. Let $\pi$ be a map from clients to cluster centers such that $\pi(j)$ is the cluster center which made $t_j$ stop increasing in the first phase of the algorithm. If $\pi(j) \in S$, connect $j$ to cluster center $\pi(j)$; otherwise, connect $j$ to one of $\pi(j)$'s neighbors in the graph arbitrarily. We can equivalently think of the algorithm as generating an integral primal solution where $x_i = 1$ for all $i \in S$ and $x_i = 0$ otherwise, and $y_{ij} = 1$ if $j$ is connected to $i$ and $y_{ij} = 0$ otherwise. Based again on the technique of [34], we can show the following lemma holds:

Lemma C.1. Let $x, y$ be the primal solution and $a, b$ be the dual solution generated by the above FLD algorithm. Then $x, y$ satisfies
\[
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij} + 3^p \sum_{i \in F} f_i x_i \le 3^p \sum_{j \in C} a_j.
\]

Proof. Let $C^{(1)}$ be the set of clients $j$ in $C$ such that $\pi(j) \in S$, and $C^{(2)} = C \setminus C^{(1)}$. For any $i \in S$, let $C_i$ be the set of all clients $j$ such that $\pi(j) = i$. Note that:
\[
\sum_{j \in C_i} a_j = \sum_{j \in C_i} \big[ b_{ij} + (c_{ij}^p - r_j^p)^+ \big] = \sum_{j \in C_i} (c_{ij}^p - r_j^p)^+ + f_i.
\]
No client in $C^{(1)}$ contributes to the sum $\sum_{j \in C} b_{ij}$ for multiple $i$ in $S$ (because $S$ is an independent set). This gives us:
\[
\begin{aligned}
\sum_{j \in C^{(1)}} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij} + \sum_{i \in F} f_i x_i
&\le \sum_{j \in C^{(1)}} \sum_{i \in F} (c_{ij}^p - r_j^p)^+ \, y_{ij} + \sum_{i \in F} f_i x_i \\
&= \sum_{i \in S} \Big[ f_i + \sum_{j \in C_i} (c_{ij}^p - r_j^p)^+ \Big] \\
&= \sum_{i \in S} \sum_{j \in C_i} a_j \\
&= \sum_{j \in C^{(1)}} a_j. \qquad (18)
\end{aligned}
\]
For each client $j$ in $C^{(2)}$, $j$ is connected to one of $\pi(j)$'s neighbors $i$. Since $\pi(j)$ and $i$ are neighbors, there is some client $j'$ that put weight on both $\pi(j)$ and $i$. Since $j'$ put weight on $\pi(j)$, and thus $\pi(j)$ going tight would have stopped $t_{j'}$ from increasing, $t_{j'}$ stopped increasing before or when $\pi(j)$ went tight, which was when $t_j$ stopped growing. Since all $t_j$ start growing at the same time and grow uniformly, $t_{j'} \le t_j$. Since $j$ put weight on $\pi(j)$, we know $a_j - (c_{\pi(j)j}^p - r_j^p)^+ > 0$ and thus $(t_j - r_j^p)^+ - (c_{\pi(j)j}^p - r_j^p)^+ > 0$, implying $t_j \ge c_{\pi(j)j}^p$. Similarly, $t_{j'} \ge c_{\pi(j)j'}^p, c_{ij'}^p$. The triangle inequality gives $c_{ij} \le c_{ij'} + c_{\pi(j)j'} + c_{\pi(j)j} \le 3 t_j^{1/p}$. Then, we get:
\[
\sum_{j \in C^{(2)}} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij} \le \sum_{j \in C^{(2)}} (3^p t_j - 3^p r_j^p)^+ = 3^p \sum_{j \in C^{(2)}} (t_j - r_j^p)^+ = 3^p \sum_{j \in C^{(2)}} a_j. \qquad (19)
\]
Adding $3^p$ times Eq. (18) to Eq. (19) gives the lemma. □
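The two-phase procedure above can be summarized in code. The following is a simplified, discretized sketch (the function name, data layout, and time step dt are our own choices; the actual algorithm grows the duals continuously, so this only approximates the dual values):

```python
def fld_primal_dual(F, C, cost, f, r, p=1, dt=1e-3, t_max=None):
    """Discretized sketch of the primal-dual FLD algorithm.

    F, C: lists of cluster centers and clients; cost[i][j]: distance from
    center i to client j; f[i]: cost of buying center i; r[j]: discount of
    client j.  Returns a maximal independent set S of tight centers.
    """
    disc = {(i, j): max(cost[i][j] ** p - r[j] ** p, 0.0) for i in F for j in C}
    t = {j: 0.0 for j in C}               # "time" variable of each client
    active = set(C)                       # clients whose t_j is still growing
    tight = []                            # centers whose dual constraint went tight
    weight_on = {i: set() for i in F}     # clients that put weight on center i
    if t_max is None:                     # crude safety bound for the simulation
        t_max = max(cost[i][j] ** p for i in F for j in C) + max(f.values()) + 1.0

    # Phase 1: grow t_j uniformly; a_j = (t_j - r_j^p)^+, b_ij = (a_j - disc_ij)^+.
    while active and max(t.values()) < t_max:
        for j in active:
            t[j] += dt
        a = {j: max(t[j] - r[j] ** p, 0.0) for j in C}
        b = {(i, j): max(a[j] - disc[i, j], 0.0) for i in F for j in C}
        for i in F:
            if i in tight:
                continue
            if sum(b[i, j] for j in C) >= f[i]:       # center i goes tight
                tight.append(i)
                weight_on[i] = {j for j in C if b[i, j] > 0}
                active -= weight_on[i]                # these clients stop growing

    # Phase 2: maximal independent set in the conflict graph on tight centers
    # (two tight centers conflict if some client put weight on both).
    S = []
    for i in tight:
        if all(not (weight_on[i] & weight_on[i2]) for i2 in S):
            S.append(i)
    return S
```

The step size dt trades accuracy for running time; an event-driven implementation would instead advance directly to the next time at which some center goes tight or some client starts putting weight on a center.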
C.2 Algorithm for $\ell_p$-Clustering with Discounts

We now move on to finding an algorithm for $\ell_p$-clustering with discounts. We can represent the problem as a primal/dual linear program pair as follows. The primal program is:
\[
\begin{array}{ll}
\min & \sum_{i \in F, j \in C} (c_{ij}^p - r_j^p)^+ \, y_{ij} \\
\text{s.t.} & \forall j \in C: \sum_{i \in F} y_{ij} \ge 1 \\
& \forall i \in F, j \in C: y_{ij} \le x_i \\
& \sum_{i \in F} x_i \le k \\
& \forall i \in F: x_i \ge 0 \\
& \forall i \in F, j \in C: y_{ij} \ge 0
\end{array}
\]
The dual program is as follows:
\[
\begin{array}{ll}
\max & \sum_{j \in C} a_j - kz \\
\text{s.t.} & \forall i \in F, j \in C: a_j - (c_{ij}^p - r_j^p)^+ \le b_{ij} \\
& \forall i \in F: \sum_{j \in C} b_{ij} \le z \\
& \forall j \in C: a_j \ge 0 \\
& \forall i \in F, j \in C: b_{ij} \ge 0
\end{array}
\]
We now describe the algorithm we will use to prove Lemma 3.4, which uses the FLD algorithm from Section C.1 as a subroutine. By taking our $\ell_p$-clustering with discounts instance and assigning all cluster centers the same cost $z$, we can produce an FLD instance. When $z = 0$, the FLD algorithm will either buy more than $k$ cluster centers, or find a set of at most $k$ cluster centers, in which case we can output that set. When $z = |C| \max_{i,j} c_{ij}^p$, the FLD algorithm will buy only one cluster center. Thus, for any $\epsilon$ such that $\log(1/\epsilon) = n^{O(1)}$, via bisection search using polynomially many runs of this algorithm we can find a value of $z$ such that the algorithm buys a set $S_1$ of $k_1 \ge k$ cluster centers when cluster centers cost $z$, and a set $S_2$ of $k_2 \le k$ cluster centers when cluster centers cost $z + \epsilon$. (The bisection search starts with the range $[0, |C| \max_{i,j} c_{ij}^p]$ and in each iteration determines how many cluster centers are bought when $z$ is the midpoint value of its current range. It then recurses on the half $[a, b]$ of its current range which maintains the invariant that when $z = a$, at least $k$ cluster centers are bought, and when $z = b$, at most $k$ cluster centers are bought.)

If either $k_1 = k$ or $k_2 = k$, we output the corresponding cluster center set. Otherwise, we randomly choose a solution which is roughly a combination of $S_1$ and $S_2$ (we describe later how to derandomize this process, as is required to prove Lemma 2.3). Let $\rho$ be the solution in $[0, 1]$ to $\rho k_1 + (1 - \rho) k_2 = k$, i.e., $\rho = \frac{k - k_2}{k_1 - k_2}$. Construct a set $S_1'$ that consists of the closest cluster center in $S_1$ to each cluster center in $S_2$. If the size of $S_1'$ is less than $k_2$, add arbitrary cluster centers from $S_1 \setminus S_1'$ to $S_1'$ until its size is $k_2$. Then, with probability $\rho$, let $S^* = S_1'$; otherwise let $S^* = S_2$. Then, sample a uniformly random subset of $k - k_2$ elements from $S_1 \setminus S_1'$ and add them to $S^*$. Then output $S^*$ (note that $S_1 \setminus S_1'$ is of size $k_1 - k_2$, so every element in $S_1 \setminus S_1'$ has probability $\rho$ of being chosen).
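A compact, illustrative (rather than efficient) sketch of this reduction and rounding is given below; it relies on the fld_primal_dual sketch from Section C.1, all identifiers are our own, dist(x, y) denotes the metric, and r[j] is the discount of client j:

```python
import random

def cluster_with_discounts(F, C, dist, r, k, p=1, eps=1e-6):
    """Sketch: bisection over a uniform center cost z, then bi-point rounding."""
    def run(z):
        cost = {i: {j: dist(i, j) for j in C} for i in F}
        return fld_primal_dual(F, C, cost, {i: z for i in F}, r, p)

    lo, hi = 0.0, len(C) * max(dist(i, j) for i in F for j in C) ** p
    S1, S2 = run(lo), run(hi)       # z = 0 buys many centers; a huge z buys few
    if len(S1) <= k:
        return set(S1)
    while hi - lo > eps and len(S1) != k and len(S2) != k:
        mid = (lo + hi) / 2
        S_mid = run(mid)
        if len(S_mid) >= k:
            lo, S1 = mid, S_mid     # invariant: at cost lo, at least k centers
        else:
            hi, S2 = mid, S_mid     # invariant: at cost hi, at most k centers
    if len(S1) == k:
        return set(S1)
    if len(S2) == k:
        return set(S2)

    k1, k2 = len(S1), len(S2)
    rho = (k - k2) / (k1 - k2)
    # S1': for each center of S2, its closest center in S1, padded up to size k2.
    S1p = {min(S1, key=lambda i: dist(i, i2)) for i2 in S2}
    S1p |= set(sorted(set(S1) - S1p)[: k2 - len(S1p)])
    # Open S1' with probability rho, else S2; add k - k2 random extras from S1 \ S1'.
    S_star = set(S1p) if random.random() < rho else set(S2)
    S_star |= set(random.sample(sorted(set(S1) - S1p), k - k2))
    return S_star
```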
Proof of Lemma 3.4. Note that if the FLD algorithm ever outputs a solution which buys exactly $k$ cluster centers, then by Lemma C.1 we get that for the LP solution $x, y$ encoding this solution and the dual solution $a$:
\[
\begin{aligned}
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij} + 3^p \sum_{i \in F} f_i x_i &\le 3^p \sum_{j \in C} a_j \\
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij} + 3^p k z &\le 3^p \sum_{j \in C} a_j \\
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij} &\le 3^p \Big[ \sum_{j \in C} a_j - kz \Big],
\end{aligned}
\]
which by duality means that this solution is also a $(3^p, 3^p)$-approximation for the $\ell_p$-clustering with discounts instance.

If the bisection search never finds a solution with exactly $k$ cluster centers, but instead a pair of solutions $S_1, S_2$ where $|S_1| > k$ and $|S_2| < k$, the idea is that the algorithm constructs a "bi-point" fractional solution from these solutions (i.e., constructs a fractional solution that is a convex combination of the two integral solutions) and then rounds it.

Consider the primal/dual solutions $x^{(1)}, y^{(1)}, a^{(1)}$ and $x^{(2)}, y^{(2)}, a^{(2)}$ corresponding to $S_1, S_2$. By Lemma C.1 we get:
\[
\begin{aligned}
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij}^{(1)} + 3^p k_1 z &\le 3^p \sum_{j \in C} a_j^{(1)} \\
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \, y_{ij}^{(2)} + 3^p k_2 (z + \epsilon) &\le 3^p \sum_{j \in C} a_j^{(2)}
\end{aligned}
\]
By combining the two inequalities and choosing $\epsilon$ appropriately we can get that
\[
\sum_{j \in C} \sum_{i \in F} (c_{ij}^p - 3^p r_j^p)^+ \big( \rho y_{ij}^{(1)} + (1 - \rho) y_{ij}^{(2)} \big) \le (3^p + \epsilon') \Big[ \sum_{j \in C} \big( \rho a_j^{(1)} + (1 - \rho) a_j^{(2)} \big) - kz \Big]
\]
for an $\epsilon'$ we will fix later.

Note that $\rho(x^{(1)}, y^{(1)}, a^{(1)}) + (1 - \rho)(x^{(2)}, y^{(2)}, a^{(2)})$ and $z$ form a feasible (fractional) primal/dual solution pair for the $\ell_p$-clustering with discounts problem, and by the above inequality the primal solution is a $(3^p, 3^p + \epsilon')$-approximation.

Then, we round the convex combination of the two solutions as described above. Let $c_j$ be the connection cost of client $j$ in the rounded solution, and $c_j^{(1)}, c_j^{(2)}$ the connection costs of client $j$ in solutions $S_1, S_2$. Then, since $(3^p + \epsilon')(2 \cdot 3^{p-1} - \epsilon') < \frac{2}{3} \cdot 9^p$ for $\epsilon' \in (0, 1]$, to prove the lemma it suffices to show that for each client $j$, the expected contribution of client $j$ to the objective using discount $9 r_j$ is at most $2 \cdot 3^{p-1} - \epsilon'$ times the contribution of client $j$ to the primal solution's objective using discount $3 r_j$. That is:
\[
\mathbb{E}\big[ (c_j^p - 9^p r_j^p)^+ \big] \le (2 \cdot 3^{p-1} - \epsilon') \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1 - \rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big].
\]
Suppose client $j$'s nearest cluster center in $S_1$ is in $S_1'$. Then with probability $\rho$, $j$ is connected to that cluster center at connection cost $c_j^{(1)}$, and with probability $1 - \rho$ it is connected to the nearest cluster center in $S_2$ at connection cost $c_j^{(2)}$. Then:
\[
\mathbb{E}\big[ (c_j^p - 9^p r_j^p)^+ \big] \le \mathbb{E}\big[ (c_j^p - 3^p r_j^p)^+ \big] \le \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1 - \rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+.
\]
Suppose instead that client $j$'s nearest cluster center in $S_1$ (call it $i_1$) is not in $S_1'$. Note that each cluster center in $S_1 \setminus S_1'$ has probability $\rho$ of being opened. Thus with probability $\rho$, we can upper bound $c_j$ by the distance from $j$ to $i_1$. If this does not happen, let $i_2$ be $j$'s nearest cluster center in $S_2$ and $i_1'$ be the cluster center nearest to $i_2$ in $S_1$ (which belongs to $S_1'$ by construction). One of $i_2, i_1'$ must be opened, so we can bound $j$'s connection cost by its connection cost to whichever is opened. Then one of three cases occurs:
• With probability $\rho$, $j$'s nearest cluster center in $S_1$ is opened. Then $c_j$ is at most the distance from $j$ to $i_1$, i.e., $c_j^{(1)}$.
• With probability $(1 - \rho)\rho$, $j$'s nearest cluster center in $S_1$ is not opened and $S_1'$ is opened. $c_j$ is at most the distance from $j$ to $i_1'$. Since $i_1'$ is the cluster center closest to $i_2$ in $S_1$, the distance from $i_2$ to $i_1'$ is at most the distance from $i_2$ to $i_1$, which is at most $c_j^{(1)} + c_j^{(2)}$. Then by the triangle inequality, the distance from $j$ to $i_1'$ is at most $c_j^{(1)} + 2 c_j^{(2)}$. Using the power mean inequality (convexity of $x \mapsto x^p$), we get $c_{i_1' j}^p \le 3^{p-1} \big( c_j^{(1)p} + 2 c_j^{(2)p} \big)$.
• With probability $(1 - \rho)^2$, $j$'s nearest cluster center in $S_1$ is not opened and $S_2$ is opened. $c_j$ is at most the distance from $j$ to $i_2$, i.e., $c_j^{(2)}$.
Then we get:
\[
\begin{aligned}
\mathbb{E}\big[ (c_j^p - 9^p r_j^p)^+ \big]
&\le \rho \big( c_j^{(1)p} - 9^p r_j^p \big)^+ + (1 - \rho)^2 \big( c_j^{(2)p} - 9^p r_j^p \big)^+ + (1 - \rho)\rho \big( 3^{p-1} ( c_j^{(1)p} + 2 c_j^{(2)p} ) - 9^p r_j^p \big)^+ \\
&= \rho \big( c_j^{(1)p} - 9^p r_j^p \big)^+ + (1 - \rho)^2 \big( c_j^{(2)p} - 9^p r_j^p \big)^+ + 3^{p-1} (1 - \rho)\rho \big( c_j^{(1)p} + 2 c_j^{(2)p} - 3 \cdot 3^p r_j^p \big)^+ \\
&\le \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1 - \rho)^2 \big( c_j^{(2)p} - 3^p r_j^p \big)^+ + 3^{p-1} (1 - \rho)\rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + 2 \cdot 3^{p-1} (1 - \rho)\rho \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \\
&= \big( 3^{p-1} (1 - \rho) + 1 \big) \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ \Big] + \big( 2 \cdot 3^{p-1} \rho + 1 - \rho \big) \Big[ (1 - \rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big] \\
&\le \big( 2 \cdot 3^{p-1} - \min\{\rho, 1 - \rho\} \big) \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1 - \rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big] \\
&\le \big( 2 \cdot 3^{p-1} - \epsilon' \big) \Big[ \rho \big( c_j^{(1)p} - 3^p r_j^p \big)^+ + (1 - \rho) \big( c_j^{(2)p} - 3^p r_j^p \big)^+ \Big],
\end{aligned}
\]
where the last step is given by choosing $\epsilon'$ to be at most $\frac{1}{|F|}$, since $\rho = \frac{k - k_2}{k_1 - k_2}$ and $1 \le k_2 < k < k_1 \le |F|$, and thus $\rho$ and $1 - \rho$ are both at least $\frac{1}{|F|}$. This gives the lemma, except that the algorithm is randomized. However, the randomized rounding scheme can easily be derandomized: first, we choose $S^*$ to be whichever of $S_1', S_2$ has a lower expected objective. Then, to choose the remaining $k - k_2$ cluster centers to add to $S^*$, we add cluster centers one by one. When we have $c$ cluster centers left to add to $S^*$, we add the cluster center $i$ from $S_1 \setminus S_1'$ that minimizes the expected objective achieved by $S^* \cup \{i\}$ and $c - 1$ random cluster centers from $(S_1 \setminus S_1') \setminus (S^* \cup \{i\})$.
Each step of the derandomization cannot increase the expected objective, so the derandomized algorithm achieves the guarantee of Lemma 3.4. □

Again, we note that Lemma 2.3 is obtained as a corollary of Lemma 3.4, where we set $p = 1$.
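As a quick check on the constants, expanding the product used at the start of the proof of Lemma 3.4 gives
\[
(3^p + \epsilon')(2 \cdot 3^{p-1} - \epsilon') = 2 \cdot 3^{2p-1} - 3^{p-1}\epsilon' - \epsilon'^2 \le 2 \cdot 3^{2p-1} = \tfrac{2}{3} \cdot 9^p,
\]
and setting $p = 1$ in the guarantee $(9^p, \tfrac{2}{3} \cdot 9^p)$ of Lemma 3.4 indeed yields the $(9, 6)$ guarantee of Lemma 2.3.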
