We study optimization problems where the optimization variable is a probability measure. Since the probability space is not a vector space, many classical and powerful methods for optimization (e.g., gradients) are of little help. Thus, one typically resorts to the abstract machinery of infinite-dimensional analysis or to other ad-hoc methodologies, not tailored to the probability space, which however involve projections or rely on convexity-type assumptions. We believe instead that these problems call for a comprehensive methodological framework for calculus in probability spaces. In this work, we combine ideas from optimal transport, variational analysis, and Wasserstein gradient flows to equip the Wasserstein space (i.e., the space of probability measures endowed with the Wasserstein distance) with a variational structure, both by combining and extending existing results and by introducing novel tools. Our theoretical analysis culminates in very general necessary conditions for optimality. Notably, our conditions (i) resemble the rationales of Euclidean spaces, such as the Karush-Kuhn-Tucker and Lagrange conditions, (ii) are intuitive, informative, and easy to study, and (iii) provide either closed-form solutions or computationally attractive algorithms. We believe this framework lays the foundation for new algorithmic and theoretical advancements in the study of optimization problems in probability spaces, which we exemplify with numerous case studies and applications to machine learning, drug discovery, and distributionally robust optimization.
In a nutshell: this work considers optimization problems of the form
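(schematically, and in the notation used in the figures below; the paper's general formulation additionally features explicit constraint functions)
\[
  \min_{\varGeneric \in \feasibleSet} \; \objective(\varGeneric),
  \qquad \feasibleSet \subseteq \Pp{2}{\reals^d}, \quad \objective\colon \Pp{2}{\reals^d} \to \reals \cup \{+\infty\},
\]
that is, the minimization of a functional \(\objective\) over a constraint set \(\feasibleSet\) of probability measures.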
Our main result is a set of general first-order optimality conditions for the optimization problem above:
If \(\varOptimal \in \Pp{}{\reals^d}\) is an optimal solution with finite second moment and provided that a constraint qualification holds, then the ``Wasserstein subgradients'' are ``aligned'' with the constraints at ``optimality'', i.e.,
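(schematically, for a single inequality constraint \(g(\varGeneric)\leq 0\); the precise statement in the paper is phrased via Wasserstein subdifferentials and handles several constraints) there exists a multiplier \(\lambda \geq 0\) such that
\[
  \xi(x) + \lambda\, \zeta(x) = 0 \quad \text{for } \varOptimal\text{-almost every } x \in \reals^d,
  \qquad
  \lambda\, g(\varOptimal) = 0,
\]
where \(\xi\) and \(\zeta\) are Wasserstein subgradients of the objective \(\objective\) and of the constraint function \(g\) at \(\varOptimal\), respectively: the (sub)gradients cancel along the support of \(\varOptimal\), and complementary slackness holds, just as in the Euclidean Karush-Kuhn-Tucker conditions.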
To get some intuition, let's consider the following two pictures.
The figure on the left depicts two empirical candidate solutions \( \textcolor{blue}{\varGeneric_1}\) and \(\textcolor{orangeDark}{\varGeneric_2}\) for the generic optimization problem with \(\feasibleSet = \{\varGeneric\in\Pp{2}{\reals^2}\st\expectedValue{(x_1, x_2)\sim\varGeneric}{x_1^2+x_2^2}\leq \varepsilon^2\}\) (i.e., a bound on the second moment) and \(\objective(\varGeneric) = \expectedValue{(x_1, x_2) \sim \varGeneric}{x_1+x_2}\), whose contours and gradient vector field are shown. The solid (resp., dashed) black arrows represent the gradient of the constraint function \(\expectedValue{(x_1, x_2)\sim\varGeneric}{x_1^2+x_2^2}- \varepsilon^2\) at \(\textcolor{blue}{\varGeneric_1}\) (resp., \(\textcolor{orangeDark}{\varGeneric_2}\)). Here, \(\textcolor{blue}{\varGeneric_1}\) is indeed a candidate optimal solution: the gradient of the objective is aligned with the gradient of the constraint at every particle. For \(\textcolor{orangeDark}{\varGeneric_2}\), instead, the two gradients are not aligned; thus, \(\textcolor{orangeDark}{\varGeneric_2}\) cannot be optimal.
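To make the alignment test concrete, here is a minimal NumPy sketch of the check suggested by the figure. It is our own illustration: the particle positions are invented, and the per-particle collinearity check is only the visual counterpart of the full optimality condition (which also involves a common multiplier).

import numpy as np

# Objective  J(mu) = E_mu[x1 + x2]              -> gradient at a particle x: (1, 1)
# Constraint g(mu) = E_mu[x1^2 + x2^2] - eps^2  -> gradient at a particle x: 2 x
# At a candidate optimum, the two gradients should be aligned (collinear)
# at (almost) every particle of the empirical measure.

def misalignment(particles: np.ndarray) -> float:
    """Max over particles of 1 - |cos(angle)| between grad J and grad g (0 = aligned everywhere)."""
    grad_J = np.ones_like(particles)   # gradient of x -> x1 + x2
    grad_g = 2.0 * particles           # gradient of x -> x1^2 + x2^2
    cos = np.sum(grad_J * grad_g, axis=1) / (
        np.linalg.norm(grad_J, axis=1) * np.linalg.norm(grad_g, axis=1)
    )
    return float(np.max(1.0 - np.abs(cos)))

eps = 1.0
# Candidate 1: all particles on the ray spanned by (1, 1) -> gradients aligned.
mu_1 = np.tile(-eps / np.sqrt(2) * np.ones(2), (20, 1))
# Candidate 2: particles scattered inside the feasible set -> gradients not aligned.
mu_2 = 0.4 * eps * np.random.default_rng(0).standard_normal((20, 2))

print(f"misalignment at mu_1: {misalignment(mu_1):.2e}")  # ~0: passes the necessary condition
print(f"misalignment at mu_2: {misalignment(mu_2):.2f}")  # >0: mu_2 cannot be optimal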
The figure on the right shows that \(\textcolor{blue}{\varOptimal}\) satisfies the optimality condition for the generic optimization problem with \(\feasibleSet = \closedWassersteinBall{\textcolor{red}{\varReference}}{\varepsilon}{2}=\{\varGeneric\in\Pp{2}{\reals^2}\st\wassersteinDistance{\varGeneric}{\textcolor{red}{\varReference}}{2}\leq\varepsilon\}\) (i.e., the Wasserstein ball of radius \(\varepsilon\) centered at \(\textcolor{red}{\varReference}\)) and \(\objective(\varGeneric) = \expectedValue{(x_1, x_2) \sim \varGeneric}{x_1 + x_2}\), whose contours and gradient vector field are shown. The black arrows connecting \(\textcolor{red}{\varReference}\) and \(\textcolor{blue}{\varOptimal}\) represent the gradient of the constraint function \(\wassersteinDistance{\varGeneric}{\textcolor{red}{\varReference}}{2}^2-\varepsilon^2\). Since \(\textcolor{blue}{\varOptimal}\) is optimal, the gradients of the objective and of the constraint are aligned at all the ``particles'' of \(\textcolor{blue}{\varOptimal}\).
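For intuition about the black arrows, recall a standard fact from optimal transport (stated informally, and assuming an optimal transport map \(T\) from \(\varGeneric\) to \(\textcolor{red}{\varReference}\) exists): the Wasserstein gradient of \(\varGeneric \mapsto \wassersteinDistance{\varGeneric}{\textcolor{red}{\varReference}}{2}^2\) is the vector field
\[
  x \mapsto 2\,\big(x - T(x)\big),
\]
i.e., at each particle \(x\) of \(\varGeneric\), the (scaled) displacement from its matched point under \(T\); up to the sign convention, this is the vector field depicted by the black arrows connecting \(\textcolor{red}{\varReference}\) and \(\textcolor{blue}{\varOptimal}\).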
We illustrate our optimality conditions by informally studying a simple and accessible version of the general optimization problem. For \(\theta \neq 0\) and \(\varepsilon > 0\), consider the problem
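For illustration we take the Wasserstein-ball instance suggested by the right-hand figure above (an assumption on our part; the paper's worked example may differ in its details), a worst-case expectation of the kind appearing in distributionally robust optimization:
\[
  \max_{\varGeneric \in \closedWassersteinBall{\textcolor{red}{\varReference}}{\varepsilon}{2}} \;
  \expectedValue{x \sim \varGeneric}{\langle \theta, x\rangle}.
\]
For this instance, the alignment condition singles out the closed-form solution \(\varOptimal = T_{\#}\textcolor{red}{\varReference}\) with \(T(x) = x + \varepsilon\,\theta/\|\theta\|\): the reference measure translated by \(\varepsilon\) in the direction of \(\theta\), attaining the value \(\expectedValue{x\sim\textcolor{red}{\varReference}}{\langle\theta, x\rangle} + \varepsilon\|\theta\|\).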
For more examples and applications, check out the paper!
@article{lanzetti2024variational,
  author  = {Lanzetti, Nicolas and Terpin, Antonio and D\"orfler, Florian},
  title   = {Variational Analysis in the Wasserstein Space},
  journal = {arXiv preprint arXiv:2406.10676},
  year    = {2024}
}