Iterative Shrinkage and Thresholding Algorithm
Our description is based on [DDDM04, Ela10, ZE10].
ISTA can be used to solve \(\ell_1\)-regularized least squares problems and their variants (with different regularizations). Here the objective function combines a quadratic data-fidelity term with a sparsity-promoting regularization term; the standard form is sketched below.
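Following [DDDM04, Ela10] (with \(\lambda > 0\) denoting the regularization weight, our notation), the problem reads:

\[
\min_{\bx} \; \frac{1}{2} \| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1 .
\]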
Derivation
For an identity \(\bA\), the problem reduces to minimizing \(\frac{1}{2} \| \bb - \bx \|_2^2 + \lambda \| \bx \|_1\). This objective is separable in the components of \(\bx\), so it splits into independent scalar problems, and each scalar problem has a minimizer given by the soft thresholding function, as worked out below.
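Spelling this out (a standard argument from [DDDM04, Ela10]; \(\mathbf{ST}_{\lambda}\) is our notation for the soft thresholding operator):

\[
\frac{1}{2} \| \bb - \bx \|_2^2 + \lambda \| \bx \|_1
= \sum_{i} \left( \frac{1}{2} (b_i - x_i)^2 + \lambda | x_i | \right),
\]

and each scalar function \(\frac{1}{2} (b - x)^2 + \lambda | x |\) is minimized by

\[
\hat{x} = \mathbf{ST}_{\lambda}(b) = \operatorname{sgn}(b) \, \max(|b| - \lambda, 0).
\]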
Zibulevsky and Elad in [ZE10] show that a similar solution emerges when \(\bA\) is unitary as well.
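As an illustration, a minimal NumPy sketch of the soft thresholding operator (the function name and interface are ours, not taken from the cited references):

```python
import numpy as np

def soft_threshold(b: np.ndarray, threshold: float) -> np.ndarray:
    """Elementwise soft thresholding: sgn(b) * max(|b| - threshold, 0)."""
    return np.sign(b) * np.maximum(np.abs(b) - threshold, 0.0)

# Entries with magnitude below the threshold are zeroed;
# the remaining entries shrink towards zero by the threshold amount.
print(soft_threshold(np.array([-2.0, -0.3, 0.0, 0.5, 3.0]), 1.0))
# -> [-1. -0.  0.  0.  2.]
```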
Daubechies et al. in [DDDM04] introduced a quadratic surrogate term that removes the coupling between the components of \(\bx\) caused by \(\bA\). For this surrogate term to be strictly convex w.r.t. \(\bx\), its Hessian must be positive definite. This holds if the constant \(c\) appearing in the surrogate exceeds the largest eigenvalue of \(\bA^H \bA\).
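For reference (following [DDDM04]; the symbol \(\lambda_{\max}\) is our notation), with current estimate \(\bx_0\) and constant \(c > 0\) the surrogate term is

\[
\frac{c}{2} \| \bx - \bx_0 \|_2^2 - \frac{1}{2} \| \bA \bx - \bA \bx_0 \|_2^2 ,
\]

whose Hessian \(c \mathbf{I} - \bA^H \bA\) is positive definite exactly when

\[
c > \lambda_{\max}\left( \bA^H \bA \right).
\]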
The new surrogate objective function is defined by adding this surrogate term to the original objective. Introducing an auxiliary vector that collects the terms involving \(\bA\), \(\bb\) and \(\bx_0\), we can rewrite the surrogate objective as a quadratic term plus the regularization term, which is separable in the components of \(\bx\). As discussed above, such an objective can be minimized using the soft thresholding operator; the explicit expressions are sketched below.
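Written out (the name \(\bv_0\) for the auxiliary vector is our label; the derivation follows [DDDM04, Ela10]):

\[
\widetilde{f}(\bx) = \frac{1}{2} \| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1
+ \frac{c}{2} \| \bx - \bx_0 \|_2^2 - \frac{1}{2} \| \bA \bx - \bA \bx_0 \|_2^2 .
\]

With

\[
\bv_0 = \bx_0 + \frac{1}{c} \bA^H (\bb - \bA \bx_0),
\]

the surrogate becomes, up to terms independent of \(\bx\),

\[
\widetilde{f}(\bx) = \frac{c}{2} \| \bx - \bv_0 \|_2^2 + \lambda \| \bx \|_1 + \text{const},
\]

which is minimized componentwise by \(\hat{\bx} = \mathbf{ST}_{\lambda / c}(\bv_0)\).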
Starting with some \(\bx_0\), we can propose an iterative method over a sequence \(\{ \bx_i \} = \{\bx_0, \bx_1, \bx_2, \dots \}\), in which each new iterate is obtained by applying the soft thresholding step above with the previous iterate in place of \(\bx_0\).
This is the IST algorithm.
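In explicit form (following [DDDM04, Ela10]; the threshold \(\lambda / c\) assumes the \(\ell_1\) regularizer above):

\[
\bx_{i+1} = \mathbf{ST}_{\lambda / c}\left( \bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i) \right).
\]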
By changing the regularization in (1), we can derive different IST algorithms with different thresholding functions. The version below considers a generalized thresholding function which depends on the regularizer.
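For example, replacing the \(\ell_1\) penalty with an \(\ell_0\)-style penalty leads to a hard thresholding function (with a suitably chosen threshold) instead of a soft one. A minimal NumPy sketch of such an alternative thresholding function (the name is ours):

```python
import numpy as np

def hard_threshold(v: np.ndarray, threshold: float) -> np.ndarray:
    """Keep entries whose magnitude exceeds the threshold; zero out the rest."""
    return np.where(np.abs(v) > threshold, v, 0.0)
```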
Sparsifying Basis
Often, the signal \(\bx\) (e.g. an image) may not itself be sparse or compressible, but it has a sparse representation \(\ba\) in some basis \(\bB\), with \(\bx = \bB \ba\) and \(\ba = \bB^H \bx\). The regularization is then applied to the representation \(\ba\), and (1) becomes a problem over \(\ba\), as sketched below.
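A sketch of the resulting problem (with \(\lambda\) as before):

\[
\min_{\ba} \; \frac{1}{2} \| \bb - \bA \bB \ba \|_2^2 + \lambda \| \ba \|_1 .
\]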
We can rewrite this equivalently as a problem over \(\bx\) with the penalty applied to \(\bB^H \bx\). The iteration (14) changes accordingly: substituting \(\ba = \bB^H \bx\) and \(\bx = \bB \ba\) and simplifying yields an update directly on \(\bx\), sketched below. This is the version of the IST algorithm with an operator \(\bA\) and a sparsifying basis \(\bB\).
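The resulting update (consistent with the implementation steps listed later; \(\mathbf{ST}_{\lambda / c}\) as before):

\[
\bx_{i+1} = \bB \, \mathbf{ST}_{\lambda / c}\!\left( \bB^H \left( \bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i) \right) \right).
\]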
Implementation
We introduce the current residual \(\br = \bb - \bA \bx\) and a step size parameter \(\alpha = \frac{1}{c}\). We also assume that a thresholding function \(\mathbf{T}\) will be user defined. This simplifies the iteration to the form sketched below.
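In these terms (matching the per-iteration steps listed further below):

\[
\bx_{i+1} = \bB \, \mathbf{T}\!\left( \bB^H \left( \bx_i + \alpha \, \bA^H \br_i \right) \right),
\qquad \br_i = \bb - \bA \bx_i .
\]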
Algorithm state
The current state of the algorithm is described by the following quantities:
| Term | Description |
|---|---|
| x | Current estimate of \(\bx\) |
| r | Current residual \(\br\) |
| r_norm_sqr | Squared norm of the residual \(\| \br \|_2^2\) |
| x_change_norm | Change in the norm of \(\bx\) given by \(\|\bx_{i+1} - \bx_{i} \|_2\) |
| iterations | Number of iterations of the algorithm run so far |
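A minimal sketch of how this state could be held in code (NumPy-based; the class and field names are illustrative, not a published API):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ISTAState:
    x: np.ndarray          # current estimate of the solution
    r: np.ndarray          # current residual b - A x
    r_norm_sqr: float      # squared norm of the residual
    x_change_norm: float   # norm of the change in x from the last update
    iterations: int        # number of iterations run so far
```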
Algorithm initialization
We initialize the algorithm with:
\(\bx \leftarrow \bx_0\), an initial estimate of the solution given by the user. It can be \(\bzero\).
\(\br \leftarrow \bb - \bA \bx_0\)
Compute r_norm_sqr \(\leftarrow \| \br \|_2^2\).
Give a very high value to x_change_norm (since there is no \(\bx_{-1}\)).
iterations = 0
A code sketch covering both the initialization and the iteration appears after the iteration steps below.
Algorithm iteration
The following steps are involved in each iteration. These are directly based on (20).
Compute gradient: \(\bg \leftarrow \alpha \bA^H \br\)
Update estimate: \(\bx \leftarrow \bx + \bg\)
Transform: \(\ba \leftarrow \bB^H \bx\)
Threshold: \(\ba \leftarrow \mathbf{T} (\ba)\)
Inverse transform: \(\bx \leftarrow \bB \ba\)
Update residual: \(\br \leftarrow \bb - \bA \bx\)
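A minimal NumPy sketch of the initialization and these iteration steps (the function names, the convergence test, and the use of explicit matrices for \(\bA\) and \(\bB\) are illustrative assumptions; soft thresholding stands in for the user-defined \(\mathbf{T}\)):

```python
import numpy as np

def soft_threshold(a, threshold):
    """Elementwise soft thresholding: sgn(a) * max(|a| - threshold, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - threshold, 0.0)

def ista(A, b, B, threshold, alpha, x0=None, max_iters=200, tol=1e-6):
    """Sketch of the IST iteration with operator A and sparsifying basis B."""
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else x0.copy()  # x <- x0 (zero by default)
    r = b - A @ x                                 # r <- b - A x0
    r_norm_sqr = r @ r                            # squared norm of the residual
    x_change_norm = np.inf                        # "very high" since there is no x_{-1}
    iterations = 0
    while iterations < max_iters and x_change_norm > tol:
        x_prev = x
        g = alpha * (A.conj().T @ r)              # gradient: g <- alpha A^H r
        x = x + g                                 # update estimate: x <- x + g
        a = B.conj().T @ x                        # transform: a <- B^H x
        a = soft_threshold(a, threshold)          # threshold: a <- T(a)
        x = B @ a                                 # inverse transform: x <- B a
        r = b - A @ x                             # update residual: r <- b - A x
        r_norm_sqr = r @ r
        x_change_norm = np.linalg.norm(x - x_prev)
        iterations += 1
    return x, r_norm_sqr, iterations

# Example usage on a small synthetic problem with B = identity basis:
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
b = A @ x_true
c = 1.01 * np.linalg.norm(A, 2) ** 2              # c > largest eigenvalue of A^H A
x_hat, res, iters = ista(A, b, np.eye(100), threshold=0.05 / c, alpha=1.0 / c)
```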