Projection Onto A Simplex
Yunmei Chen and Xiaojing Ye *
February 9, 2011
Abstract
This mini-paper presents a fast and simple algorithm to compute the projection onto the
canonical simplex $\Delta^n$. Utilizing Moreau's identity, we show that the problem is essentially
a univariate minimization and the objective function is strictly convex and continuously differentiable.
Moreover, it is shown that there are at most $n$ candidates, which can be computed
explicitly, and the minimizer is the only one that falls into the correct interval.
Keywords. Nonlinear programming, projection onto a simplex, Moreau's identity, proximity
operators.
1 Introduction
The computation of the projection onto a simplex appears in many imaging and statistics problems,
such as multiphase segmentation, diffusion tensor imaging, etc. The problem can be described
as follows: for any given vector $y \in \mathbb{R}^n$, the projection of $y$ onto the simplex $\Delta^n$ is the solution of the
minimization problem
$$x = \arg\min_{x \in \Delta^n} \|x - y\|, \qquad (1)$$
where $\|\cdot\|$ denotes the regular Euclidean norm and $\Delta^n$ is the canonical simplex defined by
$$\Delta^n := \Big\{ x = (x_1, \ldots, x_n)^T \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = 1,\ x_i \geq 0,\ i = 1, \ldots, n \Big\}. \qquad (2)$$
For example, $\Delta^2$ is the line segment between the two points $(1,0)$ and $(0,1)$ in $\mathbb{R}^2$, and $\Delta^3$ is the triangle
in $\mathbb{R}^3$ with vertices $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$. The solution is nontrivial and does not have
an explicit form. Early attempts include iterative projections onto affine subspaces of $\mathbb{R}^n$ [3], and
a Lagrangian method that shows the optimal solution is $(y - t)_+$ for some $t$, followed by an iterative
bisection algorithm to find this $t$, etc. In addition, it is worth pointing out that there is a large
body of literature on projection onto the $\ell_1$ ball $\{x \in \mathbb{R}^n : \sum_i |x_i| \leq 1\}$, e.g. [2, 5], which has a
similar formulation and is in general easier to solve.
In this report, we present a novel, faster and simpler numerical algorithm to solve (1) for any
dimension $n \geq 2$.
*Department of Mathematics, University of Florida, PO Box 118105, Gainesville, FL 32611-8105. Phone: (352)
392-0281. Fax: (352) 392-8357. Email: {yun,xye}@ufl.edu. Web: http://www.math.ufl.edu/~{yun,xye}.
2 Algorithm
The minimization problem in (1) can be rewritten as
$$\min_{x \in \mathbb{R}^n} w(x) + \frac{1}{2}\|x - y\|^2, \qquad y = (y_1, \ldots, y_n)^T, \qquad (3)$$
where $w$ is the indicator function of $\Delta^n$ defined by
$$w(x) := \begin{cases} 0 & \text{if } x \in \Delta^n, \\ +\infty & \text{otherwise}. \end{cases} \qquad (4)$$
As $w$ is a proper and convex function, due to the fact that $\Delta^n$ is closed and convex, we know that (3), and
hence (1), has a unique solution.
In the literature, the solution to (3) is also denoted by Moreau's proximity operator [4]: for a proper convex function $\varphi$,
$$(I + \partial\varphi)^{-1}(y) := \arg\min_{x \in \mathbb{R}^n} \varphi(x) + \frac{1}{2}\|x - y\|^2, \qquad (5)$$
which satisfies Moreau's identity [1]
$$y = (I + \partial\varphi)^{-1}(y) + (I + \partial\varphi^*)^{-1}(y), \qquad (6)$$
where $\varphi^*$ is the Fenchel transform of $\varphi$ defined by
$$\varphi^*(y) := \sup_{z \in \mathbb{R}^n} \langle z, y \rangle - \varphi(z). \qquad (7)$$
Note that $\varphi^*$ in (7) has a simple form if $\varphi$ is the indicator function $w$ of $\Delta^n$ defined in (4), i.e.
$$w^*(y) = \sup_{z \in \mathbb{R}^n} \langle z, y \rangle - w(z) = \sup_{z \in \Delta^n} \langle z, y \rangle = \max_{z \in \Delta^n} \sum_{i=1}^{n} z_i y_i = \max_{1 \leq i \leq n} \{y_i\}. \qquad (8)$$
So $w^*(y)$ is just the largest component of $y$. Based on (6), it suffices to compute Moreau's
proximity operator of $w^*$,
$$(I + \partial w^*)^{-1}(y) = \arg\min_{z \in \mathbb{R}^n} w^*(z) + \frac{1}{2}\|z - y\|^2 = \arg\min_{z \in \mathbb{R}^n} \max_{1 \leq i \leq n}\{z_i\} + \frac{1}{2}\|z - y\|^2, \qquad (9)$$
and then (5) can be obtained by
$$(I + \partial w)^{-1}(y) = y - (I + \partial w^*)^{-1}(y). \qquad (10)$$
To find the solution of (9), we first sort the components of $y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$ in
ascending order as $y_{(1)} \leq \cdots \leq y_{(n)}$. Then we know the minimization problem in (9) can be written
as
$$\min_{z \in \mathbb{R}^n} \max_{1 \leq i \leq n}\{z_i\} + \frac{1}{2}\|z - y\|^2 = \min_{t \in \mathbb{R}} \min_{z \in \mathbb{R}^n} \Big\{ t + \frac{1}{2}\|z - y\|^2 : \max_{1 \leq i \leq n} z_i \leq t \Big\} \qquad (11)$$
by introducing the new variable $t$. For any fixed $t$, the inner minimization problem on the right of
(11) is
$$\min_{z \in \mathbb{R}^n} \Big\{ t + \frac{1}{2}\|z - y\|^2 : \max_{1 \leq i \leq n} z_i \leq t \Big\}, \qquad (12)$$
and the minimizer $z(t)$ (as it depends on $t$) is obviously
$$z(t)_i = \begin{cases} y_i & \text{if } y_i \leq t, \\ t & \text{if } y_i > t, \end{cases} \qquad i = 1, \ldots, n, \qquad (13)$$
and the minimum value of (12) is
$$f(t) := t + \frac{1}{2}\|z(t) - y\|^2 = \begin{cases} t + \dfrac{1}{2}\displaystyle\sum_{j=1}^{n}\big(t - y_{(j)}\big)^2 & \text{if } t < y_{(1)}, \\[6pt] t + \dfrac{1}{2}\displaystyle\sum_{j=i+1}^{n}\big(t - y_{(j)}\big)^2 & \text{if } y_{(i)} \leq t < y_{(i+1)},\ i = 1, \ldots, n-1, \\[6pt] t & \text{if } t \geq y_{(n)}. \end{cases} \qquad (14)$$
Note that $f$ is well-defined at $y_{(1)}, \ldots, y_{(n)}$, as shown in Lemma 2.1 below. Therefore, (11) is
equivalent to the univariate minimization problem
$$\min_{t \in \mathbb{R}} f(t), \qquad (15)$$
where $f(t)$ is defined in (14).
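For illustration, consider $n = 2$ and $y = (0, 1)^T$, so $y_{(1)} = 0$ and $y_{(2)} = 1$. Then (14) reads
$$f(t) = \begin{cases} t + \frac{1}{2}\big[t^2 + (t-1)^2\big] & \text{if } t < 0, \\[2pt] t + \frac{1}{2}(t-1)^2 & \text{if } 0 \leq t < 1, \\[2pt] t & \text{if } t \geq 1, \end{cases}$$
and one can verify directly that the pieces join continuously (e.g. the first two pieces both equal $\frac{1}{2}$ at $t = 0$). The minimizer of (15) is $\hat{t} = 0$, where the derivative of the middle piece, $1 + (t - 1) = t$, vanishes; then $y - \hat{t} = y$ already lies on $\Delta^2$, so $y$ is its own projection.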
Note that $f$ in (14) is piecewise quadratic, and hence is piecewise convex and smooth. Moreover,
the following lemma shows that $f \in C^1(\mathbb{R})$.
Lemma 2.1. The function $f$ defined in (14) is continuous on $\mathbb{R}$, and its derivative $f'$ exists and is
also continuous on $\mathbb{R}$.
Proof. According to (14), for all $i = 1, \ldots, n$, there are
$$\lim_{t \to y_{(i)}^-} f(t) = y_{(i)} + \frac{1}{2}\sum_{j=i}^{n}\big(y_{(i)} - y_{(j)}\big)^2 = y_{(i)} + \frac{1}{2}\sum_{j=i+1}^{n}\big(y_{(i)} - y_{(j)}\big)^2 = \lim_{t \to y_{(i)}^+} f(t), \qquad (16)$$
$$\lim_{t \to y_{(i)}^-} f'(t) = 1 + \sum_{j=i}^{n}\big(y_{(i)} - y_{(j)}\big) = 1 + \sum_{j=i+1}^{n}\big(y_{(i)} - y_{(j)}\big) = \lim_{t \to y_{(i)}^+} f'(t), \qquad (17)$$
which prove the lemma. $\square$
Based on Lemma 2.1, we can obtain the derivative $f'$ as
$$f'(t) = \begin{cases} 1 + \displaystyle\sum_{j=1}^{n}\big(t - y_{(j)}\big) & \text{if } t < y_{(1)}, \\[6pt] 1 + \displaystyle\sum_{j=i+1}^{n}\big(t - y_{(j)}\big) & \text{if } y_{(i)} \leq t < y_{(i+1)},\ i = 1, \ldots, n-1, \\[6pt] 1 & \text{if } t \geq y_{(n)}. \end{cases} \qquad (18)$$
Note that $f'$ is well-defined at the points $y_{(1)}, \ldots, y_{(n)}$ in (18), as it is continuous. Based on the
discussion above, we have the following theorem, which restricts the search to $n$ candidates and obtains the
solution to (1) using the only candidate that falls into the correct interval.
Theorem 2.2. For any vector $y \in \mathbb{R}^n$, the projection of $y$ onto $\Delta^n$ as in (1) is obtained by the
positive part of $y - \hat{t}$:
$$x = (y - \hat{t})_+, \qquad (19)$$
where $\hat{t}$ is the only one in $\{t_i : i = 0, \ldots, n-1\}$ that falls into the corresponding interval, namely $t_i \in [y_{(i)}, y_{(i+1)})$ for $i \geq 1$ and $t_0 < y_{(1)}$, where
$$t_i := \frac{\sum_{j=i+1}^{n} y_{(j)} - 1}{n - i}, \qquad i = 0, \ldots, n-1. \qquad (20)$$
Proof. Based on Lemma 2.1, we know $f$ is piecewise quadratic and $f \in C^1(\mathbb{R})$. As (3) has a
unique minimizer, we know that (9) has a unique minimizer as well, and hence the optimal $t$ is
unique. Therefore, there is only one single $t$ with vanishing derivative, i.e. $f'(t) = 0$, and this $t$ is
the minimizer of $\min_{t \in \mathbb{R}} f(t)$.
Now we compute the possible choices for the optimizer $t$. Note that $f$ is piecewise defined,
and according to (18), there are at most $n$ points with vanishing derivative. It is easy to check
that they are those shown in (20).
However, since the optimal $t$ exists and is unique, we know that there is one and only one of
$\{t_i : i = 0, \ldots, n-1\}$ that can be the optimal $\hat{t}$. In other words, there is only one $t_i$ that falls
into the "correct" interval and hence is the optimal choice of $\hat{t}$.
Once $\hat{t}$ is obtained, we have the minimizer of (11) as $z(\hat{t})$, where $z(\cdot)$ is defined in (13), and hence
obtain the solution $x$ to (1) based on Moreau's identity (10):
$$x_i = y_i - z(\hat{t})_i = \begin{cases} y_i - \hat{t} & \text{if } y_i > \hat{t}, \\ 0 & \text{otherwise}, \end{cases} \qquad (21)$$
which implies that $x = (y - \hat{t})_+$. This completes the proof. $\square$
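For a small numerical illustration of Theorem 2.2, take $n = 3$ and $y = (0.3, 0.7, 1.2)^T$, which is already sorted. The candidates in (20) are
$$t_2 = \frac{y_{(3)} - 1}{1} = 0.2, \qquad t_1 = \frac{y_{(2)} + y_{(3)} - 1}{2} = 0.45, \qquad t_0 = \frac{y_{(1)} + y_{(2)} + y_{(3)} - 1}{3} = 0.4.$$
Only $t_1 = 0.45$ falls into its interval $[y_{(1)}, y_{(2)}) = [0.3, 0.7)$; the candidate $t_2 = 0.2 < y_{(2)} = 0.7$ fails, and $t_0 = 0.4$ is not below $y_{(1)} = 0.3$. Hence $\hat{t} = 0.45$ and $x = (y - \hat{t})_+ = (0, 0.25, 0.75)^T$, which indeed sums to $1$.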
Based on Theorem 2.2, we only need to find the $t_i$ in (20) that falls into the corresponding interval,
and claim it as the optimal $\hat{t}$ for (21). This procedure is described in Steps 2-4 of Algorithm
1. An interpretation of this procedure is as follows. Suppose we start with a very large $t$, namely
$t \geq y_{(n)}$, and let $t$ decrease. First, we can see $f'(t) = 1 > 0$ if $t \geq y_{(n)}$, and
hence the optimal $t$ cannot occur in $[y_{(n)}, \infty)$. As $f'$ is continuous and $f'(y_{(n)}) = 1 > 0$, we know $f'$
is positive near $y_{(n)}$. As $f$ is quadratic on $[y_{(n-1)}, y_{(n)})$, this also implies that the optimal $t$, if it exists
in this interval, can only be $t_{n-1} := y_{(n)} - 1$ based on (18). Due to the existence and uniqueness of
$\hat{t}$, we can surely accept $\hat{t} = t_{n-1}$ if $y_{(n-1)} \leq t_{n-1} < y_{(n)}$, or simply if $t_{n-1} \geq y_{(n-1)}$ (since obviously
$t_{n-1} < y_{(n)}$). If $t_{n-1} < y_{(n-1)}$, we know the optimal $t$ is not achieved yet, and hence $f'(y_{(n-1)}) > 0$
due to the fact that $f$ is quadratic on $[y_{(n-1)}, y_{(n)})$. Repeating the same argument, we can accept
$\hat{t} = t_{n-2} := (y_{(n-1)} + y_{(n)} - 1)/2$ if $y_{(n-2)} \leq t_{n-2} < y_{(n-1)}$, or simply if $t_{n-2} \geq y_{(n-2)}$, as we know
$f(t)$ is quadratic and keeps increasing near $y_{(n-1)}$ in this case (hence $t_{n-2}$ cannot be equal to or
larger than $y_{(n-1)}$). Based on this analysis, we can repeat at most $n-1$ such "compute $t_i$,
compare to $y_{(i)}$" steps ($i = n-1, n-2, \ldots, 1$) until $\hat{t}$ is found. If $\hat{t}$ is still not found after
$n-1$ steps, then it must be $t_0 := \big(\sum_{j=1}^{n} y_j - 1\big)/n$.
Completed with the last step $x = (y - \hat{t})_+$, the algorithm is summarized in Algorithm 1.
3 Numerical Examples
We manually computed the projections of several examples of $y$ in low-dimensional (2D and
3D) cases using Algorithm 1, and the results turned out to be exact, as expected. For more general
cases, it is usually nontrivial to demonstrate the correctness of Algorithm 1 numerically.
Algorithm 1 (projsplx). Projection of $y \in \mathbb{R}^n$ onto the simplex $\Delta^n$.
1. Input $y = (y_1, \ldots, y_n)^T \in \mathbb{R}^n$;
2. Sort $y$ in ascending order as $y_{(1)} \leq \cdots \leq y_{(n)}$, and set $i = n - 1$;
3. Compute $t_i = \dfrac{\sum_{j=i+1}^{n} y_{(j)} - 1}{n - i}$. If $t_i \geq y_{(i)}$, then set $\hat{t} = t_i$ and go to Step 5; otherwise set $i \leftarrow i - 1$, and redo Step 3 if $i \geq 1$ or go to Step 4 if $i = 0$;
4. Set $\hat{t} = \dfrac{\sum_{j=1}^{n} y_j - 1}{n}$;
5. Return $x = (y - \hat{t})_+$ as the projection of $y$ onto $\Delta^n$.
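For concreteness, the steps of Algorithm 1 can be sketched in C, the language of the released implementation. This is an illustrative re-implementation written from the pseudocode above, not the authors' distributed code; the function name projsplx and its signature are chosen here to mirror the algorithm's name.

```c
#include <stdlib.h>
#include <string.h>

/* Ascending comparator for qsort over doubles. */
static int cmp_double(const void *a, const void *b) {
    const double da = *(const double *)a, db = *(const double *)b;
    return (da > db) - (da < db);
}

/* Project y (length n) onto the canonical simplex; the result is written
   into x. Follows Steps 2-5 of Algorithm 1. */
void projsplx(const double *y, double *x, int n) {
    double *s = (double *)malloc((size_t)n * sizeof(double));
    double t = 0.0, cumsum = 0.0;
    int i, found = 0;

    /* Step 2: sort a copy of y in ascending order, s = (y_(1), ..., y_(n)). */
    memcpy(s, y, (size_t)n * sizeof(double));
    qsort(s, (size_t)n, sizeof(double), cmp_double);

    /* Step 3: for i = n-1 down to 1, t_i = (sum_{j>i} y_(j) - 1)/(n - i);
       accept the first t_i with t_i >= y_(i).  (0-based: y_(i) is s[i-1].) */
    for (i = n - 1; i >= 1; i--) {
        cumsum += s[i];
        t = (cumsum - 1.0) / (double)(n - i);
        if (t >= s[i - 1]) { found = 1; break; }
    }

    /* Step 4: fall back to t_0 = (sum_j y_j - 1)/n. */
    if (!found) {
        cumsum += s[0];
        t = (cumsum - 1.0) / (double)n;
    }

    /* Step 5: x = (y - t)_+ componentwise. */
    for (i = 0; i < n; i++)
        x[i] = (y[i] > t) ? (y[i] - t) : 0.0;

    free(s);
}
```

The sort costs $O(n \log n)$ and the candidate scan $O(n)$; running the sketch on $y = (0.3, 0.7, 1.2)^T$ reproduces the candidate scan described above, accepting $t_1 = 0.45$.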
[Figure 1 here: two scatter plots, (a) 2D case and (b) 3D case.]
Figure 1: Projections (blue) of 1024 random points (red) onto simplexes $\Delta^2$ (left) and $\Delta^3$ (right).
For completeness of this report, we show several numerical examples of Algorithm 1. The
algorithm is implemented in C and compiled in MATLAB (Version R2010b) using the mex function.
The computations are performed on a Lenovo ThinkPad laptop with an Intel Core 2 CPU at 2.53GHz,
3GB of memory, and a GNU/Linux (kernel version 2.6.35) operating system.
We first generate 1024 2D points from $N(0, I_2)$ using the MATLAB function randn(2,1024)
and project them onto $\Delta^2$ using Algorithm 1. The results are plotted in Figure 1(a). A similar
test projects 1024 samples from $N(0, 0.5 I_3)$ onto $\Delta^3$ and plots the results in Figure 1(b). Note that
both tests are not designed to show that Algorithm 1 computed the correct projections, since the
correspondences are not shown. On the other hand, these two tests demonstrate that the outputs
$x$ are indeed located on the simplex (note that we did not check anywhere in Algorithm 1 that $x$ is
on the simplex).
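Since Algorithm 1 itself never verifies feasibility, a separate membership check is easy to add when testing. The helper below, on_simplex, is a hypothetical addition sketched here for illustration; it is not part of Algorithm 1 or the released code.

```c
#include <math.h>

/* Hypothetical test helper: return 1 if x (length n) lies on the canonical
   simplex up to tolerance tol, i.e. every component is (numerically)
   nonnegative and the components sum to 1; return 0 otherwise. */
int on_simplex(const double *x, int n, double tol) {
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        if (x[i] < -tol) return 0;  /* nonnegativity violated */
        sum += x[i];
    }
    return fabs(sum - 1.0) <= tol;  /* sum-to-one constraint */
}
```

A tolerance is needed because the projection is computed in floating point, so the sum-to-one constraint only holds up to rounding error.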
The next test shows the CPU time of projections of $2^{16} = 65{,}536$ $n$-dimensional points drawn
from $N(0, I_n)$ for $n = 2, 3, \ldots, 50$. The results are plotted in Figure 2. The CPU time tends to
increase as $n$ becomes larger. Interestingly, the rate of increase is rather low. For example, from
$n = 5$ to $n = 50$, the CPU time only changes from 1.88s to 2.52s, a difference that seems
rather minor compared to the significant changes in the computational complexity of the problem.
[Figure 2 here: CPU time in seconds (roughly 1.8 to 2.6) versus dimension $n$ from 0 to 50.]
Figure 2: CPU time (in seconds) of projections of 65,536 $n$-dimensional points drawn from $N(0, I_n)$.
The tests are carried out for $n = 2, \ldots, 50$.
4 Concluding Remarks
In this paper we propose a fast and simple algorithm, projsplx, shown in Algorithm 1, that
projects an $n$-dimensional vector $y$ onto the canonical simplex $\Delta^n$. The computation comprises a
sort of the components of the input $y$ and at most $n$ simple "compute-compare" steps. The solution
is exact and the algorithm is extremely easy to implement. The MATLAB/C code is provided on
the following websites for public use.
* Author's website: http://www.math.ufl.edu/~xye/codes/projsplx.zip
* Matlab Central: http://www.mathworks.com/matlabcentral/fileexchange/30332
5 Acknowledgement
Xiaojing Ye would like to thank Stephen Becker (Caltech) for his helpful comments in complementing
the references.
References
[1] P. Combettes and V. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale
Model. Simul., 4(4):1168-1200, 2005.
[2] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto
the $\ell_1$-ball for learning in high dimensions. In Proceedings of the 25th International Conference
on Machine Learning, pages 272-279, 2008.
[3] C. Michelot. A finite algorithm for finding the projection of a point onto the canonical simplex
of $\mathbb{R}^n$. J. Optim. Theory Appl., 50(1):195-200, 1986.
[4] J. Moreau. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France, 93:273-299,
1965.
[5] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit
solutions. SIAM J. Sci. Comput., 31:890-912, 2008.