
# Forward-Backward Proximal Splitting

This numerical tour presents the Forward-Backward (FB) algorithm to minimize the sum of a smooth and a simple function. It shows an application to sparse-spikes deconvolution.

## Installing toolboxes and setting up the path

You need to download the following files: signal toolbox and general toolbox.

You need to unzip these toolboxes in your working directory, so that you have toolbox_signal and toolbox_general in your directory.

For Scilab users: you must replace the Matlab comment character '%' by its Scilab counterpart '//'.

Recommendation: You should create a text file named for instance numericaltour.sce (in Scilab) or numericaltour.m (in Matlab) to write all the Scilab/Matlab commands you want to execute. Then, simply run exec('numericaltour.sce'); (in Scilab) or numericaltour; (in Matlab) to run the commands.

Execute this line only if you are using Matlab.

getd = @(p)path(p,path); % scilab users must *not* execute this


Then you can add the toolboxes to the path.

getd('toolbox_signal/');
getd('toolbox_general/');


## Bibliography

Excellent review papers on proximal splitting algorithms include:

Amir Beck and Marc Teboulle, Gradient-Based Algorithms with Applications to Signal Recovery Problems, in "Convex Optimization in Signal Processing and Communications". Editors: Yonina Eldar and Daniel Palomar. Cambridge university press.

P. L. Combettes and J.-C. Pesquet, Proximal splitting methods in signal processing, in: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, (H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Editors), pp. 185-212. Springer, New York, 2011.

## Forward-Backward Algorithm

We consider the problem of minimizing the sum of two functions $E^\star = \umin{x \in \RR^N} E(x) = f(x) + g(x).$

We assume that $$f$$ is a $$C^1$$ function with $$L$$-Lipschitz gradient, which means that $\forall (x,y) \in \RR^N \times \RR^N, \quad \norm{\nabla f(x) - \nabla f(y)} \leq L \norm{x-y}.$

We also assume that $$g$$ is "simple", in the sense that one can compute in closed form the so-called proximal mapping, which is defined, for any $$\ga > 0$$, as $\text{prox}_{\ga g}(x) = \uargmin{y \in \RR^N} \frac{1}{2}\norm{x-y}^2 + \ga g(y).$

The FB algorithm reads, after initializing $$x^{(0)} \in \RR^N$$, $x^{(\ell+1)} = \text{prox}_{\ga g}\pa{ x^{(\ell)} - \ga \nabla f( x^{(\ell)} ) }.$

If $$0 < \ga < 2/L$$, then this scheme converges to a minimizer of $$f+g$$.
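Although the tour's code is in Matlab, the iteration above is easy to sketch in any language. The following NumPy illustration applies a generic FB scheme to a hypothetical one-dimensional toy problem $\frac{1}{2}(x-2)^2 + \abs{x}$ (not part of the tour), for which $$L = 1$$ and the minimizer is $$x = 1$$.

```python
import numpy as np

def forward_backward(prox_g, grad_f, x0, gamma, n_iter=100):
    """Generic FB iteration: x <- prox_{gamma*g}(x - gamma*grad_f(x))."""
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_g(x - gamma * grad_f(x), gamma)
    return x

# Toy problem: minimize 1/2*(x-2)^2 + |x|; here f is smooth with L = 1.
grad_f = lambda x: x - 2.0
prox_g = lambda x, gamma: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
x_star = forward_backward(prox_g, grad_f, np.array([0.0]), gamma=1.0)
# The minimizer is x = 1 (from the optimality condition x - 2 + sign(x) = 0).
```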

## Sparse Regularization of Inverse Problems

We consider a linear inverse problem $y = \Phi x_0 + w \in \RR^P$ where $$x_0 \in \RR^N$$ is the (unknown) signal to recover, $$w \in \RR^P$$ is a noise vector, and $$\Phi \in \RR^{P \times N}$$ models the acquisition device.

To recover an approximation of the signal $$x_0$$, we use the Basis Pursuit denoising method, which uses the $$\ell^1$$ norm as a sparsity-enforcing penalty $\umin{x \in \RR^N} \frac{1}{2} \norm{y-\Phi x}^2 + \la \norm{x}_1,$ where the $$\ell^1$$ norm is defined as $\norm{x}_1 = \sum_i \abs{x_i}.$

The parameter $$\la$$ should be set in accordance with the noise level $$\norm{w}$$.

This minimization problem can be cast in the form of minimizing $$f+g$$ where $f(x) = \frac{1}{2} \norm{y-\Phi x}^2 \qandq g(x) = \la \norm{x}_1.$

$$f$$ is a smooth function, with gradient $\nabla f(x) = \Phi^* (\Phi x - y).$ This gradient is $$L$$-Lipschitz with $L = \norm{ \Phi^* \Phi }.$
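When $$\Phi$$ is a circular convolution (as in the deconvolution example below), $$\Phi^* \Phi$$ is diagonalized by the discrete Fourier transform, so $$L$$ is simply the largest squared magnitude of the filter's Fourier coefficients. A NumPy sketch on a small hypothetical filter (chosen only for illustration):

```python
import numpy as np

# For a circular convolution Phi x = h * x, Phi^* Phi is diagonalized by
# the FFT, so L = ||Phi^* Phi|| = max_k |fft(h)_k|^2.
N = 8
h = np.zeros(N)
h[0], h[1], h[-1] = 2.0, -1.0, -1.0   # small discrete-Laplacian-like filter
L = np.max(np.abs(np.fft.fft(h))**2)
# Here fft(h)_k = 2 - 2*cos(2*pi*k/N), maximal at k = N/2, so L = 16.
```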

The $$\ell^1$$-norm is "simple", because its proximal operator is a soft thresholding: $\text{prox}_{\ga g}(x)_k = \max\pa{ 0, 1 - \frac{\la \ga}{\abs{x_k}} } x_k.$
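This soft thresholding can be written equivalently as $\text{prox}_{\ga g}(x)_k = \text{sign}(x_k)\max(\abs{x_k} - \la\ga, 0)$, which avoids the division by $$\abs{x_k}$$ when $$x_k = 0$$. A NumPy sketch (the tour itself uses Matlab's perform_thresholding):

```python
import numpy as np

def soft_threshold(x, tau):
    # prox of tau*||.||_1: shrink each entry toward 0 by tau,
    # setting entries with |x_k| <= tau to exactly 0.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Entries larger than tau shrink by tau; smaller entries vanish.
soft_threshold(np.array([3.0, -0.5, 0.0]), 1.0)   # -> [2., 0., 0.]
```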

## Signal Deconvolution Problem on Synthetic Sparse Data

A simple linearized model of seismic acquisition considers a linear operator which is a filtering (convolution) $\Phi x = \phi \star x.$

Dimension of the problem.

N = 1024;


We load a seismic filter $$\phi$$, which is a second derivative of a Gaussian.

Width of the filter.

s = 5;


Second derivative of Gaussian.

t = (-N/2:N/2-1)';
h = (1-t.^2/s^2).*exp( -(t.^2)/(2*s^2) );
h = h-mean(h);


Define the operator $$\Phi$$.

h1 = fftshift(h); % Recenter the filter for fft use.
Phi = @(u)real(ifft(fft(h1).*fft(u)));


Display the filter and its Fourier transform.

% Fourier transform (normalized)
hf = real(fftshift(fft(h1))) / sqrt(N);
% display
q = 200;
clf;
subplot(2,1,1);
plot(-N/2+1:N/2, h);
axis([-q q min(h) max(h)]);
title('Filter, Spatial (zoom)');
subplot(2,1,2);
plot(-N/2+1:N/2, hf);
axis([-q q 0 max(hf)]);
title('Filter, Fourier (zoom)');


We generate a synthetic sparse signal $$x_0$$, with only a small number of non zero coefficients.

Number of Diracs of the signal.

s = round(N*.03);


Set the random seed (for reproducibility).

rand('state', 1);
randn('state', 1);


Locations of the Diracs.

sel = randperm(N); sel = sel(1:s);


Signal $$x_0$$.

x0 = zeros(N,1); x0(sel) = 1;
x0 = x0 .* sign(randn(N,1)) .* (1-.3*rand(N,1));


Noise level.

sigma = .06;


Compute the measurements $$y=\Phi x_0 + w$$ where $$w$$ is a realization of a Gaussian white noise of variance $$\si^2$$.

y = Phi(x0) + sigma*randn(N,1);


Display signals and measurements.

clf; ms = 20;
subplot(2,1,1);
u = x0; u(x0==0) = NaN;
stem(u, 'b.', 'MarkerSize', ms); axis('tight');
title('Signal x_0');
subplot(2,1,2);
plot(y); axis('tight');
title('Measurements y');


## Sparse-Spikes Deconvolution

We now implement the FB algorithm for the sparse spikes problem.

Regularization strength $$\la$$.

lambda = 1;


Define the proximity operator of $$\ga g$$.

proxg = @(x,gamma)perform_thresholding(x, lambda*gamma, 'soft');


Define the gradient operator of $$f$$

gradf = @(x)Phi(Phi(x)-y);


Define the Lipschitz constant of $$f$$.

L = max(abs(fft(h)))^2;


Gradient step size $$\ga$$, should be smaller than $$2/L$$.

gamma = 1.95 / L;


Initialization $$x^{(0)}$$.

x = y;


Perform one step of FB.

x = proxg( x - gamma*gradf(x), gamma );


Exercise 1: (check the solution) Compute the solution of the L1 deconvolution. Keep track of the decay of the energy $$E=f+g$$.

exo1;


Display the result.

clf;
subplot(2,1,1);
u = x0; u(x0==0) = NaN;
stem(u, 'b.', 'MarkerSize', ms); axis('tight');
title('Signal x_0');
subplot(2,1,2);
u = x; u(x==0) = NaN;
stem(u, 'b.', 'MarkerSize', ms); axis('tight');
title('Recovered signal x');


## Over-relaxed Forward-Backward

It is possible to introduce a relaxation parameter $$-1 < \mu < 1$$ to average the current iterate of the FB with the previous one. In this case, one must set the step size to $\ga=\frac{1}{L}.$

gamma = 1/L;


The algorithm initializes $$z^{(1)}=x^{(0)} \in \RR^N$$, and then it computes $x^{(\ell)} = \text{prox}_{\ga g}\pa{ z^{(\ell)} - \ga \nabla f( z^{(\ell)} ) }.$ $z^{(\ell+1)} = x^{(\ell)} + \mu \pa{ x^{(\ell)} - x^{(\ell-1)} }$

The regime $$-1<\mu <0$$ corresponds to under-relaxation. Setting $$\mu=0$$, one recovers the unrelaxed Forward-Backward. Setting $$0 < \mu < 1$$ creates over-relaxation.

Note that convergence of the iterates $$x^{(\ell)}$$ is only guaranteed for $$-1 < \mu < 1/2$$, and that one can only prove convergence of $$E(x^{(\ell)})$$ toward $$E^\star$$ in case $$\mu \in [1/2,1[$$.
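The relaxed scheme above can be sketched in NumPy as follows, reusing the same hypothetical toy problem $\frac{1}{2}(x-2)^2 + \abs{x}$ as before (an assumption for illustration, not part of the tour's data):

```python
import numpy as np

def relaxed_fb(prox_g, grad_f, x0, gamma, mu, n_iter=100):
    """FB with relaxation: extrapolate z from the two latest iterates."""
    x_prev = x0.copy()
    z = x0.copy()
    for _ in range(n_iter):
        x = prox_g(z - gamma * grad_f(z), gamma)   # FB step at z
        z = x + mu * (x - x_prev)                  # relaxation step
        x_prev = x
    return x

# Toy problem: minimize 1/2*(x-2)^2 + |x|, L = 1, minimizer x = 1.
grad_f = lambda x: x - 2.0
prox_g = lambda x, gamma: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
x_star = relaxed_fb(prox_g, grad_f, np.array([0.0]), gamma=1.0, mu=0.3)
```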

For an in-depth analysis of this scheme, see the book:

H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer-Verlag, 2011.

Exercise 2: (check the solution) Implement the relaxed FB algorithm and display its convergence rate for several values of $$\mu$$.

exo2;


## FISTA Accelerated Algorithm

It is possible to use an adaptive relaxation parameter $$\mu = \mu^{(\ell)}$$ that changes from iteration to iteration.

This strategy is used in the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) introduced in:

Amir Beck and Marc Teboulle, A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM Journal on Imaging Sciences 2 (2009), no. 1, 183--202.

In this algorithm, the relaxation parameters $$\mu^{(\ell)}$$ are automatically set and tend to the limit value $\mu^{(\ell)} \overset{\ell \rightarrow +\infty}{\longrightarrow} 1 .$

The step size should be set to: $\ga = \frac{1}{L}.$

gamma = 1/L;


The algorithm initializes $$z^{(1)}=x^{(0)} \in \RR^N$$ and $$t^{(1)}=1$$, and then it computes $x^{(\ell)} = \text{prox}_{\ga g}\pa{ z^{(\ell)} - \ga \nabla f( z^{(\ell)} ) }.$ $t^{(\ell+1)} = \frac{ 1 + \sqrt{1+4(t^{(\ell)})^2} }{2} \qandq \mu^{(\ell+1)} = \frac{t^{(\ell)}-1}{ t^{(\ell+1)} }$ $z^{(\ell+1)} = x^{(\ell)} + \mu^{(\ell+1)} \pa{ x^{(\ell)} - x^{(\ell-1)} }$

For this scheme, one cannot prove convergence of the iterates, but one can prove that it reaches an optimal convergence rate on the energy, namely $E(x^{(\ell)}) - E^\star = O(1/\ell^2),$ while the convergence rate for the usual FB is only $$O(1/\ell)$$.
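The FISTA updates above can be sketched in NumPy as follows, again on the hypothetical toy problem $\frac{1}{2}(x-2)^2 + \abs{x}$ used only for illustration:

```python
import numpy as np

def fista(prox_g, grad_f, x0, gamma, n_iter=100):
    """FISTA: FB with an adaptive relaxation parameter mu^(l) -> 1."""
    x_prev = x0.copy()
    z = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x = prox_g(z - gamma * grad_f(z), gamma)      # FB step at z
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)   # extrapolation
        t, x_prev = t_next, x
    return x

# Toy problem: minimize 1/2*(x-2)^2 + |x|, L = 1, minimizer x = 1.
grad_f = lambda x: x - 2.0
prox_g = lambda x, gamma: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
x_star = fista(prox_g, grad_f, np.array([0.0]), gamma=1.0)
```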

Exercise 3: (check the solution) Compute the solution of the L1 deconvolution using FISTA. Keep track of the decay of the energy $$E = f+g$$.

exo3;