# Lagrange duality

Yao Yao on December 4, 2014

## 0. Introduction

In the study of convex optimization, the mathematical optimization problems follow the form below:

\begin{align} & \underset{x \in \mathbb{R}^{n}}{\text{minimize}} & & f(x) \newline & \text{subject to} & & x \in C. \tag{1} \label{eq1} \end{align}

where

• $x \in \mathbb{R}^{n}$ is a vector known as the optimization variable
• $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ is a convex function that we want to minimize
• $C \subseteq \mathbb{R}^{n}$ is a convex set describing the set of feasible (capable of being done or carried out) solutions

A necessary and sufficient condition for $x^{\ast} \in \mathbb{R}^{n}$ to be the globally optimal of problem $(\ref{eq1})$ is that $\nabla_{x}{f(x^{\ast})} = 0$.

In the more general setting of convex optimization problem with constraints, however, this simple optimality condition does not work. One primary goal of duality theory is to characterize the optimal points of convex programs in a mathematically rigorous way.

Here we propose a form of generic differentiable convex optimization problems, as:

\begin{align} & \underset{x \in \mathbb{R}^{n}}{\text{minimize}} & & f(x) \newline & \text{subject to} & & g_{i}(x) \leq 0, & i=1,\cdots,m \newline & & & h_{i}(x) = 0, & i=1,\cdots,p \tag{OPT} \label{eqopt} \end{align}

where

• $x \in \mathbb{R}^{n}$ is the optimization variable
• $f : \mathbb{R}^{n} \rightarrow \mathbb{R}$ and $g_{i} : \mathbb{R}^{n} \rightarrow \mathbb{R}$ are differentiable convex functions
• $h_{i} : \mathbb{R}^{n} \rightarrow \mathbb{R}$ are affine functions
• An affine function is a function of the form $f(x) = a^{T}x+b$ for some $a \in \mathbb{R}^{n}$, $b \in \mathbb{R}$
• $g_{i}$ is one of the $m$ convex inequality constraints
• $h_{i}$ is one of the $p$ affine equality constraints

## 1. The Lagrangian

Given a convex constrained minimization problem of the form $(\ref{eqopt})$, the (generalized) Lagrangian is a function $L : \mathbb{R}^{n} \times \mathbb{R}^{m} \times \mathbb{R}^{p} \rightarrow \mathbb{R}$ defined as

\begin{align} L(x, \alpha, \beta) = f(x) + \sum_{i=1}^{m}{\alpha_{i}g_{i}(x)} + \sum_{i=1}^{p}{\beta_{i}h_{i}(x)} \tag{2} \label{eq2} \end{align}

where

• $x \in \mathbb{R}^{n}$ is the primal variables of the Lagrangian
• $\alpha \in \mathbb{R}^{m}$ and $\beta \in \mathbb{R}^{p}$ are the dual variables of the Lagrangian, a.k.a Lagrange multipliers

Intuitively, the Lagrangian can be thought of as a modified version of the objective function to the original convex optimization problem $(\ref{eqopt})$ which accounts for each of the constraints. The Lagrange multipliers $\alpha_{i}$ and $\beta_{i}$ can be thought of penalties of violating the constraints. The key intuition behind the theory of Lagrange duality is the following:

For any convex optimization problem, there always exist settings of the dual variables such that the unconstrained minimum of the Lagrangian with respect to the primal variables (keeping the dual variables fixed) coincides with the solution of the original constrained minimization problem.

We formalize this intuition when we describe the KKT conditions in Section 6.

## 2. Primal and dual problems

To show the relationship between the Lagrangian and the original convex optimization problem $(\ref{eqopt})$, we introduce the notions of the “primal” and “dual problems” associated with a Lagrangian:

### The primal problem

The primal objective function $\theta_{P} : \mathbb{R}^{n} \rightarrow \mathbb{R}$ is defined as

\begin{align} \theta_{P}(x) = \underset{\alpha, \beta : \alpha_{i} \geq 0, \forall i}{\text{max}} & & L(x, \alpha, \beta) \end{align}

Then the unconstrained minimization problem

\begin{align} \underset{x}{\text{min}} & & \theta_{P}(x) \tag{P} \label{eqp} \end{align}

is known as the primal problem.

• Generally, we say that a point $x \in \mathbb{R}^{n}$ is primal feasible if $g_{i}(x) \leq 0$, $i = 1, \cdots, m$ and $h_{i}(x) = 0$, $i = 1, \cdots, p$
• We typically use the vector $x^{\ast} \in \mathbb{R}^{n}$ to denote the solution of $(\ref{eqp})$, and we let $p^{\ast} = \theta_{P}(x^{\ast})$ denote the optimal value of the primal objective.

## The dual problem

By switching the order of the minimization and maximization above, we obtain an entirely different optimization problem.

The dual objective function $\theta_{D} : \mathbb{R}^{m} \times \mathbb{R}^{p} \rightarrow \mathbb{R}$ is defined as

\begin{align} \theta_{D}(\alpha, \beta) = \underset{x}{\text{min}} & & L(x, \alpha, \beta) \end{align}

Then the constrained maximization problem

\begin{align} \underset{\alpha, \beta : \alpha_{i} \geq 0, \forall i}{\text{max}} & & \theta_{D}(\alpha, \beta) \tag{D} \label{eqd} \end{align}

is known as the dual problem.

• Generally, we say that $(\alpha, \beta)$ are dual feasible if $\alpha_{i}(x) \geq 0$, $i = 1, \cdots, m$
• We typically use the vector $(\alpha^{\ast}, \beta^{\ast}) \in \mathbb{R}^{m} \times \mathbb{R}^{p}$ to denote the solution of $(\ref{eqd})$, and we let $d^{\ast} = \theta_{D}(\alpha^{\ast}, \beta^{\ast})$ denote the optimal value of the dual objective.

## 3. Interpreting the primal problem

\begin{align} \theta_{P}(x) & = \underset{\alpha, \beta : \alpha_{i} \geq 0, \forall i}{\text{max}} L(x, \alpha, \beta) \newline & = \underset{\alpha, \beta : \alpha_{i} \geq 0, \forall i}{\text{max}} \left [ f(x) + \sum_{i=1}^{m}{\alpha_{i}g_{i}(x)} + \sum_{i=1}^{p}{\beta_{i}h_{i}(x)} \right ] \newline & = f(x) + \underset{\alpha, \beta : \alpha_{i} \geq 0, \forall i}{\text{max}} \left [ \sum_{i=1}^{m}{\alpha_{i}g_{i}(x)} + \sum_{i=1}^{p}{\beta_{i}h_{i}(x)} \right ] \end{align}

## 4. Interpreting the dual problem

The dual objective, $\theta_{D}(\alpha, \beta)$, is a concave function of $\alpha$ and $\beta$. To interpret the dual problem, first we make the following observation:

Lemma 1: If $(\alpha, \beta)$ are dual feasible, then $\theta_{D}(\alpha, \beta) \leq p^{\ast}$.

Proof:

\begin{align} \theta_{D}(\alpha, \beta) & = \underset{x}{\text{min}} L(x, \alpha, \beta) \newline & \leq L(x^{\ast}, \alpha, \beta) \newline & = f(x^{\ast}) + \sum_{i=1}^{m}{\alpha_{i}g_{i}(x^{\ast})} + \sum_{i=1}^{p}{\beta_{i}h_{i}(x^{\ast})} \newline & \leq f(x^{\ast}) = p^{\ast} \end{align}

Lemma 2 (Weak Duality): For any pair of primal and dual problems, $d^{\ast} = \theta_{D}(\alpha^{\ast}, \beta^{\ast}) \leq p^{\ast}$.

This is obvious following Lemma 1.

Lemma 3 (Strong Duality): For any pair of primal and dual problems which satisfy certain technical conditions called constraint qualifications, then $d^{\ast} = p^{\ast}$.

Slater’s condition: a primal/dual problem pair satisfy Slater’s condition if there exists some feasible primal solution $x$ for which all inequality constraints are strictly satisfied (i.e., $g_{i}(x) < 0$, $i = 1, \cdots, m$).

• 求 dual problem $(\ref{eqd})$
• 等价于
• 求 primal problem $(\ref{eqp})$
• 等价于
• 求最原始的 optimization 问题 $(\ref{eqopt})$

## 5. Complementary slackness

Lemma 4 (Complementary Slackness): If strong duality holds, then $\alpha_{i}^{\ast} g_{i}(x^{\ast}) = 0$ for each $i = 1,\cdots,m$.

$\alpha_{i}^{\ast} > 0 \Rightarrow g_{i}(x^{\ast}) = 0$

$g_{i}(x^{\ast}) < 0 \Rightarrow \alpha_{i}^{\ast} = 0$

## 6. The KKT (Karush-Kuhn-Tucker) conditions

Theorem 1: Suppose that $x^{\ast} \in \mathbb{R}^{n}$, $\alpha^{\ast} \in \mathbb{R}^{m}$ and $\beta^{\ast} \in \mathbb{R}^{p}$ satisfy the following conditions:

1. (Primal feasibility) $g_{i}(x^{\ast}) \leq 0$, $i = 1, \cdots, m$ and $h_{i}(x^{\ast}) = 0$, $i = 1, \cdots, p$,
2. (Dual feasibility) $\alpha_{i}^{\ast} \geq 0$, $i = 1, \cdots, m$,
3. (Complementary slackness) $\alpha_{i}^{\ast} g_{i}(x^{\ast}) = 0$, $i = 1, \cdots, m$, and
4. (Lagrangian stationarity) $\nabla_{x}{L(x^{\ast}, \alpha^{\ast}, \beta^{\ast})} = 0$.

Then $x^{\ast}$ is primal optimal and $(\alpha^{\ast}, \beta^{\ast})$ are dual optimal. Furthermore, if strong duality holds, then any primal optimal $x^{\ast}$ and dual optimal $(\alpha^{\ast}, \beta^{\ast})$ must satisfy the conditions 1 through 4.

These conditions are known as the Karush-Kuhn-Tucker (KKT) conditions.