Projection onto a simplex
http://arxiv.org/abs/1101.6081 ( arXiv )
ALL VOLUMES CITATION SEARCH THUMBNAILS PDF VIEWER PAGE IMAGE ZOOMABLE
Full Citation
STANDARD VIEW MARC VIEW
Permanent Link: http://ufdc.ufl.edu/IR00000353/00002
 Material Information
Title: Projection onto a simplex
Physical Description: Technical Reports
Language: English
Creator: Ye, Xiaojing
Chen, Yunmei
Publication Date: February 9, 2011
 Notes
Abstract: This mini-paper presents a fast and simple algorithm to compute the projection onto the canonical n-dimensional simplex. Utilizing the Moreau’s identity, we show that the problem is essentially a univariate minimization and the objective function is strictly convex and continuously differ- entiable. Moreover, it is shown that there are at most n candidates which can be computed explicitly, and the minimizer is the only one that falls into the correct interval.
Acquisition: Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by Xiaojing Ye.
Publication Status: Unpublished
Citation/Reference: Feb. 18, 2011: version includes minor typographical updates from slightly earlier version online from 1/30/2011-2/17/2011. Earlier version remains available.
 Record Information
Source Institution: University of Florida Institutional Repository
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution.
System ID: IR00000353:00002

Downloads

This item has the following downloads:

( PDF )


Full Text









Projection Onto A Simplex


Yunmei Chen and Xiaojing Ye *

February 9, 2011



Abstract
This mini-paper presents a fast and simple algorithm to compute the projection onto the
canonical simplex A". Utilizing the Moreau's identity, we show that the problem is essentially
a univariate minimization and the objective function is strictly convex and continuously differ-
entiable. Moreover, it is shown that there are at most n candidates which can be computed
explicitly, and the minimizer is the only one that falls into the correct interval.

Keywords. Nonlinear programming, projection onto a simplex, Moreau's identity, proximity
operators.


1 Introduction

The computation of projection onto simplex appears in many imaging and statistics problems,
such as multiphase segmentation, diffusion tensor imaging, etc. The problem can be described
as follows: for any given vector y E R', the projection of y onto the simplex A" is to solve the
minimization problem
x -arg mmin || - y||, (1)

where || || denotes the regular Euclidean norm and A" is the canonical simplex defined by


A": x= (x,-- .,x )TEP. :0 i= 1

For example, A2 is the line segment between two points (1, 0) and (0, 1) in R2, and A3 is the triangle
in R3 with vertexes (1, 0, 0), (0, 1, 0) and (0, 0, 1). The solution is nontrivial and it does not yield
an explicit form. Early attempts include iterative projections onto affine subspaces of R" [3], and
Lagrangian method that shows the optimal solution is (y - t)+ for some t followed by the iterative
bisection algorithm to find this t, etc. In addition, it is worth pointing out that there are a large
number of literatures projection onto the i1 ball {x e . : ||x||i < 1}, e.g. [2, 5], which has has
similar formulation and is in general easier to solve.
In this report, we present a novel, faster and simpler numerical algorithm to solve (1) for any
dimension n > 2.
*Department of Mathematics, University of Florida, PO Box 118105, Gainesville, FL 32611-8105. Phone (352)
392-0281. Fax (352) 392-8357. Email: yun,xye~ufl.edu, Web: http://www.math.ufl.edu/~yun,xye.









2 Algorithm


The minimization problem in (1) can be rewritten as
1
minm (x) + I x-y2, y (y 1, ,, . , (3)
zxER 2
where w is the indicator function of A" defined by

,W ) { 0 if x EA",
( +oo otherwise .

As w is a proper and convex function due to the fact that A" is close and convex, we know (3), and
hence (1), has a unique solution.
In the literature, the solution to (3) is also denoted by the Moreau's :i.. :im:il7- operator [4]
1
(I+ 7)-1 (y) : arg min 7(x) + I||X - ||2, (5)
xER1 2

which satisfies the Moreau's identity [1]

y= (I+97)-(y) + (I + *)'(y), (6)

where 7* is the Fenchel transform of 7 defined by

*(y) := sup (z,y) - (z). (7)
zERn

Note that 7* in (7) has a simple form if 7 is the indicator function of A" defined in (4), i.e.

7*() : sup) sup (z, y)
zER" zEA"
n (8)
= max > = max {yi}.
zEL, 1 i= 1
So 7*(y) is just the largest component of y. Based on (6), it is suffice to compute the Moreau's
p]...-::iil7, operator of w*

(I+ *) y) argin*( min z) + 2
zER 1 (9)
=arg min max {zi} + |z - y12(
zERn 1
and then (5) can be obtained by

(I+ O97)- (y) =y - (I+ O*')- (y). (10)

To find the solution of (9), we first sort the components of y (= yi, -- ,y,)T E Rn in the
ascending order as Y(1) < ... < 'i, ,. Then we know the minimization problem in (9) can be written
as
mm max {zi} + - z - y||2 = min min t + ||z - y||2 : max zi t (11)
zERn l1









by introducing the new variable t. For any fixed t, the inner minimization problem on the right of
(11) is


min t + -I|z
zERI � 2
z6s 2


y112 : max ~Z}
l

(12)


and the minimizer z(t) (as it depends on t) is obviously


if y> > t
if Yt < t


and the minimum value of (12) is


(13)


f(t) : + ll(t)- yll21


t + 2 Z(t _ yj)2
j=1
L 1 n
t + 2 (t -y(j))2
j7i+1
1
2 (t


if t < Y(1)


1, . ,n- 1


if t > I, ,


Note that f is well-defined at Y(1), ... , , , as shown in Lemma 2.1
equivalent to the univariate minimization problem


below. Therefore, (11) is


min f(t)
tER


(15)


where f(t) is defined in (14).
Note that f in (14) is piecewise quadratic, and hence is piecewise convex and smooth. Moreover,
the following lemma shows that f e C1(R).
Lemma 2.1. The function f /. fi, ,1 in (14) is continuous on R, and its derivative f' exists and is
also continuous on R.
Proof: According to (14), for all i = 1, ... , n, there are


lim f(t)= , + E 01, ,
t-Y(i) j i


lim f'(t)
(i)


n
1 + E(, ,
j i


-y(j))2 -", +, + E ",
j i+1

n
Y(j)) 1 + (', , - y(j))
j i+1


which prove the lemma. D
Based on Lemma 2.1, we can obtain the derivative f' as


f'(t) -


n
1 + (t - yj) if t < Y(1)
j-1
n
1 + (t - (j)) if I,,", j i+1
1 + (t - 'I, ,) if t ,>,


Note that f' is well-defined at the points y(1), - - ,, i,, in (18) as it is continuous. Based on the
discussion above, we have the theorem that restricts the search in n candidates and obtains the
solution to (1) using the only candidate that falls into the correct interval.


(14)


y())2 - f(, ,)


(16)


(17)


(18)


if,, ,< t< t , 1), i


- lim f'(,, ,),
t-+y)


(z(t)) -
ye Y









Theorem 2.2. For any vector ye . , the p,..i.. /.'.. of y onto A" as in (1) is obtained by the
positive part of y - t:
x (y - t). (19)
where t is the only one in {ti : i =0, 0 , n - 1} that falls into the corresponding interval as follows,
i}n y(j) 1
ti := E -iz y( - , i =0,... ,n - 1, where t (20)
Proof. Based on Lemma 2.1, we know f is piecewise quadratic and f e C1(R). As (3) has a
unique minimizer, we know that (9) has a unique minimizer as well, and hence the optimal t is
unique. Therefore, there is only one single t that has vanish derivative i.e. f'(t) = 0, and this t is
the minimizer of mintER f(t).
Now we compute the possible choices for the optimizer t. Note that f is piecewisely defined,
and according to (18), there are at most n points that have vanish derivative. It is easy to check
that i., v are those shown in (20).
However, since the optimal t exists and is unique, we know that there is one and only one of
{ti : i 0, - - , n - 1} that can be the optimal L. In another words, there is only one ti that falls
into the "correct" interval and hence is the optimal choice of L.
Once t is obtained, we have the minimizer of (11) as z(t) where z(.) is defined in (13), and hence
obtain the solution x to (1) based on the Moreau's identity (10):

i (Y- {L I it h- if yi > (21)
0 otherwise
which implies that x (= y - t)+. This completes the proof. E
Based on Theorem 2.2, we only need to find the ti in (20) that falls in the corresponding interval,
and claim it as the optimal t for (21). This procedure is described as in Steps 2-4 in the Algorithm
1. An interpretation of such procedure is as follows. Suppose we start with a very large t, namely
t > ii, ,, and let t go towards negative. First, we can see f'(t) = 1+ (t- i, ,) > 1 > 0 if t > ii, , and
hence the optimal t cannot occur in 'i, ,, oo). As f' is continuous and f'(,i, ,) = 1 > 0, we know f'
is positive near y ). As f is quadratic in ',1, -1), , ,), this also implies that the optimal t, if exists
in this interval, can only be t,-i :- , - 1 based on (18). Due to the existence and uniqueness of
t, we can surely accept t t1-i if i, -1) < tn1- < I,, ,, or simply t,-i > 1,, -1) (since obviously
t1-i < ,, ,). If t,1- < ", -1), we know the optimal t is not achieved yet and hence f'(,, _1)) > 0
due to the fact that f is quadratic in ',, _1), 1, ,). Repeating the similar argument, we can accept
t tn-2 :- (", -1) + ",i - 1)/2 if 'i, -2) < tn-2 < 1i, -1), or simply tn-2 > li, -2) as we know
f(t) is quadratic and keeps increasing near 1,, -1) in this case (hence tn-2 cannot be equal to or
larger than 1,, _1)). Based on this ,..,,1- -i-. we can repeat at most n - 1 steps of such "compute ti
- compare to ',, ," processes (i n - 1, n - 2, ... , 1) until t is found. If i is still not found after
n - 1 steps, then it must be to : ((E' j - 1)/n.
Completed with the last step x = (y - i)+, the algorithm is summarized in Algorithm 1.



3 Numerical Examples

We manually computed the projections of several examples of y in low-dimensional (2D and
3D) cases using Algorithm 1, and the results turned out to be exact as expected. For more general
cases, it is usually nontrivial to demonstrate the correctness of Algorithm 1 numerically.










Algorithm 1 (projsplx). Projection of y E R" onto the simplex A".

1. Input y (yi, , y)T E R;

2. Sort y in the ascending order as y(1) < " '- < '., ,, and set i = n - 1;


3. Compute ti _ Y() . If i > , , then set = ti and go
and redo Step 3 if i > 1 or go to Step 4 if i 0;

4. Set i Z - Yj-1.

5. Return x (= y - 1+ as the projection of y onto A".


k .
* �.'....'. . . . - '" . . .- .... �.
I' ^ ;*^ *J- .' , ** *
:. . : ... .: ' ,, .:- , : . . *
*. . . ; .. .

� ; ., :" ., .o . :- �7


3 2 1 0 1
(a) 2D case


2 3 4


to Step 5, otherwise set i +- i- 1


*.






2 -2 -1 5 -05 05
(b) 3D case


15 2


Figure 1: Projections (blue) of 1024 random points (red) onto simplexes A2 (left) and A3 (right).


For completeness of this report, we show several numerical examples of Algorithm 1. The
algorithm is implemented in C and compiled in MATLAB (Version R2010b) using mex function.
The computations are performed on a Lenovo ThinkPad laptop with Intel Core 2 CPU at 2.53GHz,
3GB of memory, and GNU/Linux (Kernel version 2.6.35) operating system.
We first generate 1024 2D points from N(0, 2) using the MATLAB function randn(2,1024)
and project them onto A2 using Algorithm 1. The results are plotted in Figure 1(a). A similar
test projects 1024 samples from N(0, 0.513) to A3 and plots the results in Figure 1(b). Note that
both tests are not designed to show that Algorithm computed the correct projections, since the
correspondences are not shown. On the other hand, these two tests demonstrate that the outputs
x are indeed located at the simplex (note that we did not check :ir, vi., re in Algorithm 1 that x is
on the simplex).
The next test shows the CPU time of projections of 216 = 65, 536 n-dimensional points draw
from N(0, I,) for n = 2, 3,... , 50. The results are plotted in Figure 3. The CPU time has the trend
of increase as n becomes larger. Interestingly, the increasing rate is rather low. For example, for
n = 5 to n = 50, the CPU time used only changes from 1.88s to 2.52s, for which the difference seems
rather minor compared to the significant changes in computational complexity of the problem.









2.6


2.5 --

u 2.4-
0
o 2.3

S2.2 -

-- 2.1 -

0 2-

1.9-

1.8 I I I I
0 10 20 30 40 50
Dimension n

Figure 2: CPU time (in seconds) of projections of 65, 536 n-dimensional points drawn from N(0, Ia).
The tests are carried on for n = 2,... ,50.

4 Concluding Remarks

In this paper we propose a fast and simple algorithm projsplx as shown in Algorithm 1 that
projects an n-dimensional vector y to the canonical simplex A/. The computation comprises of the
sort of components of the input y and at most n simple "compute-co .'iip..." processes. The solution
is exact and the algorithm is extremely easy to implement. The MATLAB/C code is provided on
the following websites for public use.

* Author's website: http://www.math.ufl.edu/-xye/codes/projsplx.zip
* Matlab Central: http://www. mathworks. com/matlabcentral/f ileexchange/30332

5 Acknowledgement

Xiaojing Ye would like to thank Stephen Becker (CalTech) for his helpful comments in comple-
menting the references.

References

[1] P. Combettes and V. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale
Model. Simul., 4(4):1168-1200, 2005.









[2] John Duchi, Shai S. Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto
the 11-ball for learning in high dimensions. In Proceedings of the 25th international ...'f. ,. in.
on Machine learning, pages 272-279, 2008.

[3] C. Michelot. A finite algorithm for finding the projection of a point onto the canonical simiplex
of R". J. Optim. T. .,,.;/ Appl., 50(1):195-200, 1'-'.

[4] J. Moreau. Proximity et duality dans un space hilbertien. Bull. Soc. Math. France, 93:273-299,
1- i ,

[5] Ewout van den Berg and Michael P. Friedlander. Probing the pareto frontier for basis pursuit
solutions. SIAM J. Sci. Comput., 31:890-912, November 2008.




University of Florida Home Page
© 2004 - 2011 University of Florida George A. Smathers Libraries.
All rights reserved.

Acceptable Use, Copyright, and Disclaimer Statement
Last updated May 24, 2011 - Version 3.0.0 - mvs