The inconsistency among the unlabeled examples:

The inconsistency between labeled and unlabeled examples:

Objective function:

The optimal pseudo labels $\mathbf y^u$ can be found by minimizing $F$, formally

This is a convex optimaiztion problem and can be solved effectively by numerical methods, which have nothing to do with your learning algorithm $\mathcal A$.

Now we do want to involve our learning algorithm $\mathcal A$ and use the same idea of problem $\eqref{prob1}$ to improve $\mathcal A$. On the other hand, we want to put problem $\eqref{prob1}$ into a machine learning scenario and still find optimal ${\mathbf y^u}$.

Suppose you are going to solve problem $\eqref{prob1}$ with gradient. Therefore during every step of your iteration, you update $\mathbf y^u$ to get a smaller $F$. In the machine learning scenario, what you update is a classifier $H$ which would predict and pseudo-label $\mathbf y^u$ to get a smaller $F$.

That is to say, we are going to substitute $\mathbf y = \left [\mathbf y^l; \mathbf y^u \right ]$ to $\mathbf y = \left [\mathbf y^l; H(\mathbf x^u) \right ]$ or even $\mathbf y = H(\mathbf x) \text{ s.t. } H(\mathbf x^l) = \mathbf y^l$.

We further expand our machine learning scenario to involve an ensemble of classifiers:

At the $(T+1)^{\text{th}}$ iteration, our goal is to find:

To simplify the notation, define $H^{(T)}(\mathbf x_j) \equiv H_{j}$, $h^{(T+1)}(\mathbf x_j) \equiv h_j$ and $\alpha_{T+1} \equiv \alpha$.

Thus

Expand $F(\mathbf y, S)$ by substituding $\mathbf y^u$ with $H(\mathbf x^u)$:

Now the problem becomes:

Problem $\eqref{prob3}$ involves products of $\alpha$ and $h_i$, making it nonlinear and, hence, difficult to optimize. We are going to apply Bound-Optimization below to solve this problem.

We first further expand $F_l(\mathbf y, S)$:

Then we find an upper bound of $F_u(\mathbf y^u, S)$:

Flip $i$ and $j$ of the first term in $F_uâ€™(\mathbf y^u, S)$:

Define:

Note that when calculating $p_j$ and $q_j$, $j$ is fixed and $p_j$ and $q_j$ are functions of $H_j$.

$p_j$ and $q_j$ can be interpreted as the confidence in classifying the unlabeled example $\mathbf x_i$ into the positive class and the negative class, respectively.

$\text{Claim 1}:$ Problem $\eqref{prob3}$ is equivalent to

The expression in $\eqref{prob4}$ is difficult to optimize since the weight $\alpha$ and the classifier $h$ are coupled together. We simplify the problem furhter using the upper bound of $\overline{F}_1$.

$\text{Claim 2}:$ Problem $\eqref{prob4}$ is equivalent to