THE GENERALIZED INVERSE DUAL FOR

QUADRATICALLY CONSTRAINED QUADRATIC PROGRAMS

By

WILLIAM DAVID RANDOLPH

A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF

THE UNIVERSITY OF FLORIDA

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1974


To Alicia

ACKNOWLEDGMENTS

The author wishes to take this opportunity to express his gratitude

to all those who lent support in this research effort. Thanks are

especially due Dr. D. W. Hearn for his suggestion of the research topic

and his continued support in the pursuit of meaningful results. The

author would also like to thank Dr. M. E. Thomas, Dr. H. D. Ratliff,

Dr. R. L. Francis and Dr. N. Passell for their advice and helpful

suggestions as members of his supervisory committee.

The author wishes to express his appreciation also to the many

faculty and fellow students whom he studied with at Rose Polytechnic

Institute, The George Washington University, and the University of

Florida.

Finally, thanks go to his wife, Alicia, for her confidence,

encouragement, and support.

This research was supported in part by U. S. Army Contract No.

DA-ARO-D-31-124-72-G123.

TABLE OF CONTENTS

Page

ACKNOWLEDGMENTS .................................................  iii

ABSTRACT ........................................................   vi

CHAPTERS:

1. INTRODUCTION ...........................................    1

2. THE MOORE-PENROSE GENERALIZED INVERSE OF POSITIVE
   SEMIDEFINITE MATRICES ..................................    4

   2.1 Introduction ......................................    4

   2.2 Extensions ........................................    4

   2.3 Differential Forms ................................   11

3. GENERALIZED INVERSE DUAL FOR CONVEX QUADRATICALLY
   CONSTRAINED QUADRATIC PROGRAMS ........................   14

   3.1 Introduction ......................................   14

   3.2 Duality ...........................................   15

   3.3 Feasible Set F ....................................   19

   3.4 Objective Function ψ(y) ...........................   30

   3.5 Optimal Primal Solution ...........................   37

   3.6 Linear Constraints ................................   41

   3.7 Linear and Quadratic Programming ..................   42

   3.8 Equivalence of Dual Forms .........................   44

4. DUAL ALGORITHMIC CONSIDERATIONS .......................   50

   4.1 Introduction ......................................   50

   4.2 Algorithm .........................................   51

   4.3 Projection Matrices ...............................   54

   4.4 Objective Function Gradient .......................   57

TABLE OF CONTENTS (Continued)

Page

4.5 Definite Problems................................. 58

4.6 Semidefinite Problems............................. 61

5. APPLICATIONS ........................................... 66

5.1 Introduction...................................... 66

5.2 Multifacility Euclidean Distance Location

Problem............................................ 66

5.3 Sinha Duality.................................... 72

5.4 General Fermat Problem.......................... 76

5.5 Portfolio Selection............................... 77

5.6 Convex Programming................................ 79

6. FUTURE RESEARCH ........................................ 83

APPENDICES .................. .................................... 87

APPENDIX A.................................................. 88

APPENDIX B................................................ 104

REFERENCES ...................................................... 107

BIOGRAPHICAL SKETCH ............................................. 112

Abstract of Dissertation Presented to the

Graduate Council of the University of Florida in Partial

Fulfillment of the Requirements for the Degree of Doctor of Philosophy

THE GENERALIZED INVERSE DUAL FOR

QUADRATICALLY CONSTRAINED QUADRATIC PROGRAMS

By

William David Randolph

June, 1974

Chairman: Dr. D. W. Hearn

Major Department: Industrial and Systems Engineering

The generalized inverse dual (GID) and its properties are developed

for convex quadratically constrained quadratic programs, based on the

Moore-Penrose generalized inverse of a matrix. The resulting dual problem

has a concave objective function and the constraints are shown to be

effectively linear. The closure of the dual constraint set is character-

ized by an index maximal dual vector.

It is also shown that the GID is equivalent to the conjugate function

dual (CFD), developed through conjugate function theory or generalized

geometric inequalities, and has significantly fewer variables.

Comparison of the two duals, GID and CFD, for a projected gradient

algorithm, demonstrates that for strictly convex problems the GID is

favored over the CFD; some experimental results for the GID are given.

For nonstrictly convex problems some advantages of the GID are discussed,

e.g., determining an initial dual feasible vector.

Five areas of application where the GID could prove to have signifi-

cant computational advantages are discussed: multifacility Euclidean

distance location problem, stochastic programming, the general Fermat

problem, portfolio selection, and general convex programming.

CHAPTER 1

INTRODUCTION

This dissertation is concerned with convex quadratically constrained

quadratic programs.

Various examples and references to such programs have occurred in

the mathematical programming literature. John [37] in his classic

paper of 1948 presented as an example a quadratically constrained program

with a linear objective function. The special structure of quadratically

constrained quadratic programs was also recognized by Kuhn and Tucker [40]

and used as an example of a saddle point problem.

Other examples of quadratically constrained quadratic programs are

found in the field of chance-constrained programming, see Charnes and

Cooper [11], and in Euclidean distance location problems, see Elzinga

and Hearn [25], Hearn [34], and Francis [29,30,31].

Some authors have addressed specific quadratic problems. For

example, Bellman [3] discusses a solution technique for a quadratic

program with one quadratic constraint. The procedure is based on

simultaneous diagonalization of the two Hessians and a change of vari-

ables. van de Panne [69] presents a finite algorithm for a linear

objective function with linear constraints and one quadratic constraint.

Major developments in this area have been the 1967 generalized

geometric inequality dual of Peterson and Ecker [51,52,53], and the

conjugate function duality of Rockafellar [57]. The two developments

have one point in common; neither appears to relate to the duality of

Wolfe [71] which has been a powerful tool in mathematical programming.

A more recent work is that of Baron [2]. Baron discusses and

develops computational procedures for the class of quadratically con-

strained quadratic programs using two forms of the Lagrangian function.

The duality theory employed was first shown by Falk [26] for strictly

convex programs. Baron uses the well-known convex programming algorithm

of Dantzig [20] for solution of these Lagrangian dual problems.

The research reported herein develops a new dual form for the class

of convex quadratically constrained quadratic programs.

In Chapter 2 some extensions to the theory of the Moore-Penrose

[47,50] generalized inverse for sums of positive semidefinite matrices

are developed. In particular, general theorems are stated which are

germane to the developments of Chapter 3.

In Chapter 3 the generalized inverse dual and its properties for

convex quadratically constrained quadratic programs are developed. The

only theoretical basis required for the generalized inverse dual is

linear algebra and Wolfe [71] duality. It is also shown that the

corresponding duals of Peterson and Ecker and Rockafellar are derivable

from the generalized inverse dual and conversely.

Chapter 4 is a discussion of a feasible direction algorithm for the

generalized inverse dual and the conjugate function dual of Peterson and

Ecker and Rockafellar. It is concluded that for a particular subclass of

convex quadratically constrained quadratic programs the generalized inverse

dual algorithm should prove more favorable than that for the conjugate

function dual.


Chapter 5 presents five applications where the generalized inverse

dual leads to more computationally tractable mathematical programming

problems.

Chapter 6 sets out specific areas of future research based on the

results reported herein.

CHAPTER 2

THE MOORE-PENROSE GENERALIZED INVERSE

OF POSITIVE SEMIDEFINITE MATRICES

2.1 Introduction

The development of the duality theory for convex quadratically

constrained quadratic programs encompasses specializations of the

following extensions to the theory of the Moore-Penrose generalized

inverse. These extensions are developed and presented at this time due

to their importance and the fact that they are not known to exist else-

where in the literature.

The theory of this chapter is restricted to sums of symmetric

positive semidefinite matrices. The reason for this restriction will

be evident in Chapter 3.

2.2 Extensions

Notationally, let Q_j, j = 1,2,...,m, be symmetric positive semidefinite

matrices of order n×n and rank ρ_j. Let

Q = Σ_{j=1}^m Q_j.

Then Q is a symmetric positive semidefinite matrix.

Lemma 2.1. Let Q^- be the generalized inverse of Q; then

Q^-Q = QQ^-.

Proof of Lemma 2.1. By the properties of the generalized inverse,

Q^-Q is symmetric, so

Q^-Q = (Q^-Q)^t = Q^t(Q^-)^t.

From Corollary A.3.1, Q = Q^t implies that (Q^t)^- = Q^-; hence,

Q^-Q = QQ^-.

Q.E.D.

Theorem 2.1. Let Q^- be the generalized inverse of Q; then, for

j = 1,2,...,m,

Q_j(I - Q^-Q) = 0,

Q_j(I - QQ^-) = 0,

(I - QQ^-)Q_j = 0,

(I - Q^-Q)Q_j = 0.

Proof of Theorem 2.1. By definition,

QQ^-Q = Q,

Q(I - Q^-Q) = 0,

so that

Σ_{j=1}^m Q_j(I - Q^-Q) = 0.   (2.2.1)

From Theorem B.2, the trace of the left-hand side matrix of (2.2.1) is

tr[Σ_{j=1}^m Q_j(I - Q^-Q)] = Σ_{j=1}^m tr[Q_j(I - Q^-Q)] = 0.

Now Q_j is symmetric positive semidefinite and, by Theorem A.10,

(I - Q^-Q) is positive semidefinite; hence, by Corollary B.4.1,

tr[Q_j(I - Q^-Q)] ≥ 0, for j = 1,2,...,m.

For a sum of nonnegative terms to add to zero they must all equal

zero. Therefore,

tr[Q_j(I - Q^-Q)] = 0 and Q_j(I - Q^-Q) = 0

by Theorem B.4.

The other forms follow in the same manner using Lemma 2.1 and the

alternate form of (2.2.1),

Σ_{j=1}^m (I - QQ^-)Q_j = 0.

Q.E.D.
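Lemma 2.1 and Theorem 2.1 can be sketched numerically; the following illustration (not part of the dissertation) uses numpy.linalg.pinv, which computes the Moore-Penrose generalized inverse, on a rank-deficient sum of positive semidefinite matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two symmetric PSD summands whose column spaces lie in a proper subspace,
# so the sum Q is singular and Q^- is a genuine generalized inverse.
A1 = rng.standard_normal((4, 2)); A1[3, :] = 0
A2 = rng.standard_normal((4, 2)); A2[3, :] = 0
Q1, Q2 = A1 @ A1.T, A2 @ A2.T
Q = Q1 + Q2

Qm = np.linalg.pinv(Q)          # Moore-Penrose generalized inverse Q^-
I = np.eye(4)

# Lemma 2.1: Q^-Q = QQ^- for symmetric Q.
assert np.allclose(Qm @ Q, Q @ Qm)

# Theorem 2.1: each summand is annihilated by (I - Q^-Q) and (I - QQ^-).
for Qj in (Q1, Q2):
    assert np.allclose(Qj @ (I - Qm @ Q), np.zeros((4, 4)))
    assert np.allclose((I - Q @ Qm) @ Qj, np.zeros((4, 4)))
```

The zero fourth rows force every column space into a common three-dimensional subspace, which is what makes the identities nontrivial here.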

A notational device for weighted sums of matrices will be

Q = Q(y) = Σ_{j=1}^m y_j Q_j,  y ∈ E^m.

If there is more than one set of weights, the distinguishing character

will be adjoined to Q, e.g., y'_j and Q'. For emphasis the functional

notation Q(y) will be retained.

Corollary 2.1.1. Let Q = Σ_{j=1}^m y_j Q_j, y_j > 0, j = 1,2,...,m;

then

Q_j(I - Q^-Q) = 0,

Q_j(I - QQ^-) = 0,

(I - QQ^-)Q_j = 0,

(I - Q^-Q)Q_j = 0.

Corollary 2.1.2. Corollary 2.1.1 holds when Q_j is replaced by

Q_j^-, the generalized inverse of Q_j.

Proof of Corollary 2.1.2. It will suffice to show that

(I - QQ^-)Q_j^- = 0, since the same approach is used in all cases.

Since Q_j^- Q_j Q_j^- = Q_j^-,

(I - QQ^-)Q_j^- = (I - QQ^-)Q_j^- Q_j Q_j^-.

Since Q_j is symmetric, Q_j^- Q_j = Q_j Q_j^- by Lemma 2.1, so

(I - QQ^-)Q_j^- = (I - QQ^-)Q_j Q_j^- Q_j^- = 0

by Corollary 2.1.1.

Q.E.D.

Theorem 2.2. Let Q = Q(y) = Σ_{j=1}^m y_j Q_j and

Q' = Q(y') = Σ_{j=1}^m y'_j Q_j, where y_j > 0 and y'_j > 0 for

j = 1,2,...,m, and y' ≠ y. Then Q^-Q = Q'^-Q'.

Proof of Theorem 2.2. The proof will be to show that (I - Q^-Q) and

(I - Q'^-Q') are both generalized inverses of (I - Q^-Q), which by the

uniqueness of the generalized inverse, Theorem A.2, implies

(I - Q^-Q) = (I - Q'^-Q').

Corollary A.8.2 and Theorems A.9 and A.10 show the generalized

inverse of (I - Q^-Q) to be (I - Q^-Q) itself. Now, using Lemma 2.1 and

Corollary 2.1.1, it is easily shown that (I - Q'^-Q') satisfies the four

properties of the generalized inverse, e.g.,

(I - Q^-Q)(I - Q'^-Q') = (I - Q^-Q) - (I - Q^-Q)Q'Q'^-

                       = (I - Q^-Q) - Σ_{j=1}^m y'_j (I - Q^-Q)Q_j Q'^-

                       = (I - Q^-Q), hence symmetric,

and therefore (I - Q'^-Q') is also a generalized inverse of (I - Q^-Q).

Thus,

(I - Q^-Q) = (I - Q'^-Q')

and Q^-Q = Q'^-Q'.

Q.E.D.

Corollary 2.2.1.

QQ^- = Q'Q'^-,

QQ^- = Q'^-Q',

Q^-Q = Q'Q'^-.

Proof of Corollary 2.2.1. By Lemma 2.1,

Q^-Q = QQ^- and Q'^-Q' = Q'Q'^-,

and the result is immediate from Theorem 2.2.

Q.E.D.

Another important corollary to Theorem 2.2 deals with the column

space and null space of a matrix. The following two definitions are

stated for emphasis.

Definition. Let A be an n×m matrix; denote the m columns of A as

vectors in E^n, so that A = [a_1, a_2, ..., a_m]. The vector space

spanned by these m column vectors of A is defined as the column space of

the matrix A.

Definition. Let A be an n×m matrix. The null space of the matrix

A is defined to be the set of vectors

S = {y | Ay = 0, y ∈ E^m}.

If the matrix A is symmetric, the following lemma establishes the

equivalence between its null space and the orthogonal complement of its

column space.

Lemma 2.2. Let A be an n×m matrix. The null space of A^t and the

orthogonal complement of the column space of A are the same.

A proof of this lemma is found in Graybill [32].

Corollary 2.2.2. Given Q = Σ_{j=1}^m y_j Q_j and

Q' = Σ_{j=1}^m y'_j Q_j, with y_j, y'_j > 0 for j = 1,2,...,m, the

column vectors of Q span the same space as do those of Q'; i.e., Q and

Q' have the same column space.

Proof of Corollary 2.2.2. By Lemma 2.2 and Corollary A.8.1 the

null space of Q is spanned by (I - Q^-Q) and that of Q' by (I - Q'^-Q').

Let b be in the column space of Q. Then

(I - Q^-Q)b = 0,

Q^-Qb = b,

but by Theorem 2.2

Q^-Q = Q'^-Q',

and so

(I - Q'^-Q')b = 0.

Hence, b is in the column space of Q'. Likewise, a vector b' in the

column space of Q' is in the column space of Q. Therefore, Q and Q'

have the same column space.

Q.E.D.
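The weight-independence in Theorem 2.2 and Corollary 2.2.2 can be sketched numerically; in the following illustration (mine, not the dissertation's), numpy.linalg.pinv stands in for the generalized inverse, and the projector Q^-Q is seen not to change when the positive weights change:

```python
import numpy as np

rng = np.random.default_rng(1)
# Rank-deficient symmetric PSD summands sharing the null vector e4.
A1 = rng.standard_normal((4, 2)); A1[3, :] = 0
A2 = rng.standard_normal((4, 2)); A2[3, :] = 0
Q1, Q2 = A1 @ A1.T, A2 @ A2.T

def projector(y1, y2):
    """Q^-(y)Q(y): the orthogonal projector onto the column space of Q(y)."""
    Q = y1 * Q1 + y2 * Q2
    return np.linalg.pinv(Q) @ Q

# Theorem 2.2: Q^-Q is the same for any two sets of positive weights.
assert np.allclose(projector(1.0, 1.0), projector(0.3, 5.0))
```

Since Q(y) is symmetric, Q^-Q is exactly the projector onto its column space, so equal projectors means equal column spaces, which is Corollary 2.2.2.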

Corollary 2.2.3. Given Q = Σ_{j=1}^m y_j Q_j and, for some index

set I = {i_1, i_2, ..., i_{m_1}}, m_1 ≤ m, I ⊆ {1,2,...,m}, let

Q' = Σ_{j∈I} y'_j Q_j, with y'_j > 0 (the y'_j need not equal the y_j).

Then the column space of Q' is contained in or equal to the column space

of Q.

Proof of Corollary 2.2.3. Take b' in the column space of Q'; then

Q'Q'^- b' = b'.

Premultiplying by QQ^- and applying Corollary 2.1.1, it is seen that

QQ^- Q'Q'^- b' = Q'Q'^- b' = QQ^- b' = b'.

Hence, the column space of Q' is contained in the column space of Q.

Now take b in the column space of Q, so that

QQ^- b = b.

Premultiply by (I - Q'Q'^-); if b is in the column space of Q' it will

have zero component in the null space of Q':

(I - Q'Q'^-)QQ^- b = (I - Q'Q'^-)b,

Σ_{j=1}^m y_j (I - Q'Q'^-)Q_j Q^- b = (I - Q'Q'^-)b.

By Corollary 2.1.1 the terms with j ∈ I vanish, and clearly only if

Q_j, for j ∉ I, is in the column space of Q' will b have zero component

in the null space of Q'. Hence, in general, the column space of Q will

not be included in the column space of Q'.

Q.E.D.

An alternative and useful way of describing the matrix Q, through

decomposition, is based on the following lemma.

Lemma 2.3. An n×n symmetric positive semidefinite matrix Q can be

factored as Q = B^tB, where B is an n×n upper triangular matrix. For Q

of rank ρ, B is of the form

B = [B̄^t, 0]^t,

B̄ being of order ρ×n.

Proof of Lemma 2.3. It is well known [7] that a symmetric positive

semidefinite matrix Q of rank ρ can be factored as B̄^tB̄, where B̄ is an

upper triangular matrix of order ρ×n. Form the n×n matrix B = [B̄^t, 0]^t,

0 being a zero matrix of order (n-ρ)×n; then

B^tB = B̄^tB̄ + 0 = Q,

conformability requirements being satisfied.

Q.E.D.
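The padding construction of Lemma 2.3 can be sketched as follows (an illustration of mine; the matrix B̄ is chosen rather than computed):

```python
import numpy as np

# A chosen 2x3 upper triangular Bbar gives a rank-2 PSD matrix Q = Bbar^t Bbar.
Bbar = np.array([[1.0, 2.0, 0.0],
                 [0.0, 3.0, 1.0]])
Q = Bbar.T @ Bbar

# Pad with an (n - rho) x n zero block to obtain the square B = [Bbar^t, 0]^t.
B = np.vstack([Bbar, np.zeros((1, 3))])

assert np.allclose(B.T @ B, Q)           # B^t B = Bbar^t Bbar = Q
assert np.linalg.matrix_rank(Q) == 2     # rank rho = 2 < n = 3
```

The zero block contributes nothing to B^tB, which is why the square factor reproduces Q exactly.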

The matrix Q, which is a weighted sum of symmetric positive

semidefinite matrices Q_j = B_j^t B_j, j = 1,2,...,m, can now be

expressed as

Q = Σ_{j=1}^m y_j B_j^t B_j,

which can be put into the matrix product form

Q = B^t Y² B,

where

B^t = [B_1^t, B_2^t, ..., B_m^t]

and

Y² = diag(y_1, y_2, ..., y_m) ⊗ I,

I being the n×n identity matrix; Y² is defined by a Kronecker product

(see Appendix A for the definition).

A further simplification of this notational form can be realized

by noting that Y is defined as having diagonal elements √y_j; then

Q = B̂^t B̂  with  B̂ = B̂(y) = YB.

Theorem 2.3.

Q^-Q = B̂^t(B̂^t)^- = B̂^-B̂.

Proof of Theorem 2.3.

Q^-Q = (B̂^tB̂)^-(B̂^tB̂).

By Corollary A.3.2, (B̂^tB̂)^- = B̂^-(B̂^t)^-; then

Q^-Q = B̂^-(B̂^t)^-B̂^tB̂

     = B̂^-(B̂B̂^-)^tB̂

     = B̂^-B̂B̂^-B̂

     = B̂^-B̂,

and by symmetry B̂^-B̂ = (B̂^-B̂)^t = B̂^t(B̂^t)^-.

Q.E.D.
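The Kronecker-structured weighting and Theorem 2.3 can be checked numerically; in this sketch of mine, numpy's pinv again plays the role of the generalized inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
# Factors B_j with a common zero column, so Q(y) is singular for all y > 0.
B1 = rng.standard_normal((3, 4)); B1[:, 3] = 0
B2 = rng.standard_normal((3, 4)); B2[:, 3] = 0
y = np.array([2.0, 0.5])

B = np.vstack([B1, B2])                       # stacked factor B
Y = np.kron(np.diag(np.sqrt(y)), np.eye(3))   # diagonal blocks sqrt(y_j) I
Bhat = Y @ B                                  # Bhat = Y B

Q = y[0] * (B1.T @ B1) + y[1] * (B2.T @ B2)
assert np.allclose(Bhat.T @ Bhat, Q)                 # Q = Bhat^t Bhat
assert np.allclose(np.linalg.pinv(Q) @ Q,
                   np.linalg.pinv(Bhat) @ Bhat)      # Theorem 2.3
```

Both sides of the last assertion are the projector onto the row space of B̂, which equals the column space of the symmetric Q.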

Corollary 2.3.1.

B̂^t(B̂^t)^- Q_j = B̂^-B̂ Q_j = Q_j,

Q_j B̂^t(B̂^t)^- = Q_j B̂^-B̂ = Q_j.

Proof of Corollary 2.3.1. By Corollary 2.1.1,

Q^-Q Q_j = Q_j,

Q_j Q^-Q = Q_j,

and the result is immediate from Theorem 2.3.

Q.E.D.

2.3 Differential Forms

In order to express the differential of the generalized inverse

of Q = Σ_{j=1}^m y_j Q_j, y_j > 0 for j = 1,2,...,m, it will be useful

to state the following lemma.

The lemma, while well known, is stated and proved to emphasize the

correspondence between the partial derivative forms for the nonsingular

inverse and the generalized inverse. It is assumed that Q is a function

of the y_j with the Q_j fixed for j = 1,2,...,m.

Lemma 2.4. Given Q = Σ_{j=1}^m y_j Q_j, y_j > 0 for j = 1,2,...,m,

and Q nonsingular, then

∂Q^{-1}/∂y_k = -Q^{-1} Q_k Q^{-1}.

Proof of Lemma 2.4. Differentiating Q^{-1}Q = I with respect to y_k,

(∂Q^{-1}/∂y_k)Q + Q^{-1}Q_k = 0,

so that

∂Q^{-1}/∂y_k = -Q^{-1} Q_k Q^{-1}.

Q.E.D.

Using the alternate definition of the generalized inverse, the

partial derivative of Q^- can be given for Q singular. While the proof

is more involved, a similar form is derived as for the nonsingular Q,

and as expected the generalized form reduces to that of Lemma 2.4 for

Q nonsingular.

Theorem 2.4. Given Q = Σ_{j=1}^m y_j Q_j, y_j > 0 for j = 1,2,...,m,

then

∂Q^-/∂y_k = -Q^- Q_k Q^-.

Proof of Theorem 2.4. The alternate definition of the generalized

inverse of Q is

Q^- = lim_{δ→0} (QQ + δ²I)^{-1} Q,

where (QQ + δ²I) is a nonsingular symmetric matrix for all δ > 0. The

partial derivative of Q^- with respect to y_k is then

∂Q^-/∂y_k = lim_{δ→0} [∂(QQ + δ²I)^{-1}/∂y_k · Q + (QQ + δ²I)^{-1} Q_k].   (2.3.1)

By Lemma 2.4,

∂(QQ + δ²I)^{-1}/∂y_k = -(QQ + δ²I)^{-1}(Q_kQ + QQ_k)(QQ + δ²I)^{-1},

which when substituted into (2.3.1) results in

∂Q^-/∂y_k = lim_{δ→0} (QQ + δ²I)^{-1} Q_k

          - lim_{δ→0} (QQ + δ²I)^{-1} Q_k Q (QQ + δ²I)^{-1} Q

          - lim_{δ→0} (QQ + δ²I)^{-1} Q Q_k (QQ + δ²I)^{-1} Q.

Noting that lim_{δ→0} (QQ + δ²I)^{-1} Q = Q^-, then

lim_{δ→0} Q (QQ + δ²I)^{-1} Q = QQ^-.

Recalling Corollary 2.1.1 (so that Q_k QQ^- = Q_k), it is seen that

lim_{δ→0} (QQ + δ²I)^{-1} Q_k Q (QQ + δ²I)^{-1} Q = lim_{δ→0} (QQ + δ²I)^{-1} Q_k,

and the first two terms cancel. Therefore,

∂Q^-/∂y_k = -lim_{δ→0} (QQ + δ²I)^{-1} Q Q_k (QQ + δ²I)^{-1} Q,

which, by the alternate definition of the generalized inverse, is

equivalent to

∂Q^-/∂y_k = -Q^- Q_k Q^-.

Q.E.D.
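Theorem 2.4's derivative formula can be checked against a central finite difference; this sketch (mine) exploits the fact that, by Corollary 2.2.2, the column space of Q(y) is fixed for y > 0, so the pseudoinverse varies smoothly:

```python
import numpy as np

# Fixed PSD summands with a common null vector, so Q(y) has constant rank 2.
Q1 = np.diag([1.0, 1.0, 0.0])
Q2 = np.array([[1.0, 1.0, 0.0],
               [1.0, 2.0, 0.0],
               [0.0, 0.0, 0.0]])

def Qminus(y):
    return np.linalg.pinv(y[0] * Q1 + y[1] * Q2)

y, h = np.array([1.5, 0.7]), 1e-6
dy = np.array([h, 0.0])                      # perturb y_1 (k = 1)

numeric = (Qminus(y + dy) - Qminus(y - dy)) / (2 * h)
closed = -Qminus(y) @ Q1 @ Qminus(y)         # Theorem 2.4: -Q^- Q_k Q^-
assert np.allclose(numeric, closed, atol=1e-5)
```

If the rank of Q(y) changed with y the pseudoinverse would be discontinuous and no such formula could hold, which is why the theorem is stated for strictly positive weights.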

CHAPTER 3

GENERALIZED INVERSE DUAL FOR CONVEX QUADRATICALLY

CONSTRAINED QUADRATIC PROGRAMS

3.1 Introduction

The material of this chapter is an extension of duality theory

developed from the classical Lagrangian analysis. In particular, the

theory is developed from the duality of Wolfe [71].

The philosophy of the analysis parallels that of Falk [26]. Falk's

results are for more general problems, but have limited usefulness in

algorithmic application. His results for strictly convex problems are

extended to convex quadratically constrained quadratic programs.

Other duals for quadratically constrained quadratic programs are

those of Peterson and Ecker [51,52,53] and Rockafellar [57]. The

Peterson-Ecker dual is established by use of a generalized geometric

inequality and the Rockafellar dual is by conjugate function analysis.

The work of Peterson and Ecker and Rockafellar motivated a number

of results in this chapter. It is shown that the smaller generalized

inverse dual has the same desirable features as the conjugate dual and

that the duals are derivable from each other. All proofs are new and

require only linear algebra. Furthermore, the generalized inverse dual

reveals a useful characterization of the constraint set not available

with the conjugate dual form.

3.2 Duality

The following development establishes the generalized inverse dual

and its properties for the class of convex quadratically constrained

quadratic programs.

The primal problem is stated in the general form

{P.1}  minimize_x  φ_0(x) = ½ x^t Q_0 x + h_0^t x + c_0

       subject to:  φ_j(x) = ½ x^t Q_j x + h_j^t x + c_j ≤ 0,

                    for j = 1,2,...,m,

where Q_j is an n×n real, symmetric, positive semidefinite matrix,

h_j ∈ E^n, c_j ∈ E^1 for j = 0,1,2,...,m.

Theorem 3.1 is the major result of this chapter and is based on the

theory of the Moore-Penrose generalized inverse of a matrix.

Theorem 3.1. The Wolfe dual of {P.1} is equivalent to

{D.1}  maximize  ψ(y) = -½ y^t H^t Q^-(y) H y + c^t y

       subject to:  y = (y_0, y_1, y_2, ..., y_m)^t ≥ 0,

                    y_0 = 1,

                    QQ^- H y = H y,

where

H = (h_0, h_1, h_2, ..., h_m),

Q = Q(y) = Σ_{j=0}^m y_j Q_j,

c = (c_0, c_1, c_2, ..., c_m)^t,

and Q^- is the generalized inverse of Q.

Proof of Theorem 3.1. The Wolfe dual [71] of {P.1} is

{W.1}  maximize  ψ(x,y) = ½ x^t Q x + y^t H^t x + c^t y    (3.2.1)

       subject to:  y = (y_0, y_1, ..., y_m)^t ≥ 0,

                    Qx + Hy = 0,    (3.2.2)

                    y_0 = 1,

with H, Q, and c defined as above.

By Theorem A.7, (3.2.2) can be expressed as

x = -Q^- H y + (I - Q^-Q)g    (3.2.3)

for g ∈ E^n. Equation (3.2.2) implies that, for a given nonnegative y to

be feasible, Hy must lie in the column space of Q. Hence,

QQ^- H y = H y,

as stated in Theorem A.6. Substitution for x, by (3.2.3) in (3.2.1),

results in

ψ(y) = -½ y^t H^t Q^-(y) H y + c^t y.

Thus, {W.1} and {D.1} are equivalent.

Q.E.D.

The proof that ψ(y) is concave on the feasible set will be deferred

until Section 3.4, since the proof is more direct when based on the

differentiability of ψ(y).

As a result of the equivalence of {D.1} to {W.1}, the major duality

results of Wolfe can be applied to {D.1}. The weak duality theorem

states that ψ(y) ≤ φ_0(x) for y feasible to {D.1} and x feasible to

{P.1}, and by Wolfe's duality theorem ψ(y⁰) = φ_0(x⁰) for y⁰ and x⁰

optimal. Also, the unbounded dual theorem and the no primal minimum

theorem apply to {P.1} and {D.1}.
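The dual {D.1} and weak duality can be sketched on a tiny strictly convex instance; the data below are illustrative (mine), with numpy's pinv as Q^-:

```python
import numpy as np

# {P.1} instance: minimize 1/2 x^t Q0 x + h0^t x
#                 subject to 1/2 x^t Q1 x + c1 <= 0, i.e. ||x||^2 <= 1.
Q0, Q1 = np.eye(2), np.eye(2)
h0, h1 = np.array([-2.0, 0.0]), np.zeros(2)
c0, c1 = 0.0, -0.5

H = np.column_stack([h0, h1])        # H = (h0, h1)
c = np.array([c0, c1])

def psi(y):
    """{D.1} objective: -1/2 y^t H^t Q^-(y) H y + c^t y."""
    Q = y[0] * Q0 + y[1] * Q1
    Hy = H @ y
    return -0.5 * Hy @ np.linalg.pinv(Q) @ Hy + c @ y

def phi0(x):
    return 0.5 * x @ Q0 @ x + h0 @ x + c0

x_feas = np.array([0.5, 0.0])        # primal feasible: 1/2 ||x||^2 - 1/2 <= 0
y_feas = np.array([1.0, 2.0])        # dual feasible: y0 = 1, y1 >= 0
assert psi(y_feas) <= phi0(x_feas)   # weak duality: psi(y) <= phi0(x)
```

Because Q(y) is nonsingular here, the column-space constraint QQ^-Hy = Hy is automatically satisfied, leaving only the sign constraints, as in Corollary 3.1.2 below.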

Corollary 3.1.1. If each of the Q_j of {P.1} is a diagonal matrix,

then the dual {D.1} is a fractional programming problem with linear

constraints.

Proof of Corollary 3.1.1. Clearly Q is a diagonal matrix; let its

rank be r. Then, by a suitable transformation,

Q = (q_ij) = (q_ij(y)),

where

q_ii is a polynomial of degree one in y, for i = 1,2,...,r,

q_ij = 0 for i ≠ j,

q_ii = 0 for i = r+1, r+2, ..., n.

Furthermore, reflecting the above transformation, let

Hy = (ℓ_1, ℓ_2, ..., ℓ_n)^t,

where each ℓ_i, i = 1,2,...,n, is a polynomial of degree one in y.

Using Theorem A.10 and simplifying, {D.1} becomes

{D.2}  maximize  ψ(y) = -½ Σ_{i=1}^r ℓ_i²/q_ii + Σ_{j=1}^m c_j y_j + c_0

       subject to:  y_j ≥ 0, j = 1,2,...,m,

                    ℓ_{r+k} = 0, k = 1,2,...,n-r.

Q.E.D.

A subclass of {P.1} for which the only dual constraints are

nonnegativity and y_0 = 1 is characterized by φ_0(x) being strictly

convex. This corresponds, by a well-known theorem, to Q_0 being

positive definite.

Corollary 3.1.2. Given that {P.1} has a strictly convex objective

function, {D.1} reduces to

{D.3}  maximize  ψ(y) = -½ y^t H^t Q^{-1} H y + c^t y

       subject to:  y = (y_0, y_1, y_2, ..., y_m)^t ≥ 0,

                    y_0 = 1.

Proof of Corollary 3.1.2. Q_0 is positive definite and therefore

nonsingular. Hence, Q is positive definite and nonsingular, and the

generalized inverse of Q is Q^{-1}.

Q.E.D.

Note that if y_j > 0 for some Q_j positive definite, the same result

holds. This corresponds to a strictly convex φ_j being an active primal

constraint so that, via complementary slackness, y_j > 0.

Corollary 3.1.3. {D.3} is a fractional programming problem.

Proof of Corollary 3.1.3. Each element of Q^{-1} is a polynomial

in the y_j divided by a polynomial in the y_j; that is, let

Q^{-1} = (q_ij) = (q_ij(y)),

where q_ij is determined by the cofactor of the ij-th element of Q and

is a polynomial in the y_j of degree n-1 divided by the determinant of

Q, which is a polynomial in the y_j of degree n.

The vector Hy has elements which are polynomials in the y_j of

degree one. Therefore, y^t H^t Q^{-1} H y will be a polynomial of degree

n+1 divided by a polynomial of degree n.

Q.E.D.

{D.1} is a function of dual variables only; the number of dual

variables being equal to the number of primal constraints. In this

respect, it is a theoretical improvement over previously developed duals

for the class {P.1}.

To expand on {D.1} and investigate its potential as a computational

tool, the characterization of the constraint set must be simplified.

The constraint set will be shown to be convex, but in general it is

neither open nor closed. The approach is then to define its closure and

relative interior.

It will be seen that there is a unique characterization of the

closure and relative interior of the constraint set determined by a

specially structured feasible dual vector. In fact, by just knowing the

form of this vector the dual problem is significantly simplified in that

the constraints reduce to linear equality and inequality forms.

The importance of the closure characterization will be demonstrated

by showing that the dual problem solved on the closure of the constraint

set results in the same solution as if solved over the original constraints.

3.3 Feasible Set F

The set of feasible points for {D.1} can be expressed as

F = {y ∈ E_+^{m+1} | y_0 = 1, Q(y)Q^-(y)Hy = Hy},

where y = (y_0, y_1, ..., y_m)^t and E_+^{m+1} is the nonnegative

orthant of E^{m+1}.

Theorem 3.2. The feasible set F is convex.

Proof of Theorem 3.2. Let y^1, y^2 ∈ F and y^3 = λy^1 + (1-λ)y^2,

0 < λ < 1. By y^1, y^2 feasible,

Q(y^1)Q^-(y^1)Hy^1 = Hy^1,

Q(y^2)Q^-(y^2)Hy^2 = Hy^2,

and y^3 = λy^1 + (1-λ)y^2 ≥ 0 with y^3_0 = 1. Furthermore, by

Corollary 2.1.1, noting that if y^1_j > 0 then y^3_j > 0 and if

y^2_j > 0 then y^3_j > 0,

[I - Q^-(y^3)Q(y^3)]Q(y^1) = 0,

[I - Q^-(y^3)Q(y^3)]Q(y^2) = 0.

Hence,

[I - Q^-(y^3)Q(y^3)]{λQ(y^1)Q^-(y^1)Hy^1 + (1-λ)Q(y^2)Q^-(y^2)Hy^2} = 0

and

[I - Q^-(y^3)Q(y^3)]Hy^3 = 0.

Thus, y^3 = λy^1 + (1-λ)y^2 ∈ F.

Q.E.D.
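The convexity of F, and the fact that F is in general not closed, can be illustrated numerically on data of my own choosing where Q(y) becomes singular on part of the boundary:

```python
import numpy as np

# Dual data with m = 1: Q(y) = y0*diag(1,0) + y1*diag(1,1), H = (h0, h1).
Q0, Q1 = np.diag([1.0, 0.0]), np.diag([1.0, 1.0])
H = np.array([[0.0, 0.0],
              [1.0, -1.0]])

def in_F(y):
    """Membership in F: y >= 0, y0 = 1, and QQ^-Hy = Hy."""
    if y[0] != 1.0 or (y < 0).any():
        return False
    Q = y[0] * Q0 + y[1] * Q1
    Hy = H @ y
    return bool(np.allclose(Q @ np.linalg.pinv(Q) @ Hy, Hy))

ya, yb = np.array([1.0, 0.5]), np.array([1.0, 2.0])
assert in_F(ya) and in_F(yb)
assert in_F(0.5 * ya + 0.5 * yb)        # Theorem 3.2: F is convex
assert not in_F(np.array([1.0, 0.0]))   # the boundary point y1 = 0 is not in F
```

At y_1 = 0 the matrix Q(y) drops rank and Hy leaves its column space, so the boundary point is lost; this is exactly the behavior the closure F* of the next pages is designed to capture.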

A new notion will now be introduced which leads to a linear

characterization of the closure of F, designated by F*. The notion is

that of an index maximal dual feasible vector. The existence of such a

vector is assured for all dual feasible problems. The disadvantage is

that it is not easily computed.

To introduce the concept of an index maximal dual feasible vector

the index set of feasible dual vectors is defined.

Definition. The index set of a feasible dual vector is

N(y) = {j | y_j > 0}.

Note that for every y ∈ F, 0 ∈ N(y).

The size of N(y) is defined as the number of elements it contains.

Definition. The size of the index set of a feasible dual vector is

S(y) = r,

where r is the number of dual variables, including y_0, that are

positive; e.g., given

N(y) = {0, j_1, j_2, ..., j_{r-1}},

then S(y) = r.

Definition. If y* ∈ F and S(y*) ≥ S(y) for all y ∈ F, then y* is

an index maximal dual feasible vector.

It is clear there will exist such a vector for every dual feasible

problem.

In referring to such vectors, the shortened term "index maximal"

will often be used, but where emphasis is required the full term will be

employed.

Theorem 3.3. Given y ∈ F, then

(I - Q^-(y)Q(y))Q_j = Q_j(I - Q^-(y)Q(y)) = 0

for j ∈ N(y).

Proof of Theorem 3.3. The proof is immediate from Corollary 2.1.1

on noting that y_j > 0 for j ∈ N(y).

Q.E.D.

Corollary 3.3.1. Let y, y' ∈ F be such that N(y) = N(y'). Then Q(y)

and Q(y') span the same column space and Q(y)Q^-(y) = Q(y')Q^-(y').

Proof of Corollary 3.3.1. The proof follows from Corollary 2.2.2

and Theorem 2.2, again noting that y_j > 0 for j ∈ N(y).

Q.E.D.

Theorem 3.4. Given y* ∈ F an index maximal dual feasible vector

and y ∈ F, then

N(y) ⊆ N(y*).

Proof of Theorem 3.4. If F consists of a single point the theorem

is trivial; therefore, assume that F contains more than one point.

The proof will be by contradiction. Assume that N(y) ⊄ N(y*).

Since F is convex there exists y^2 ∈ F such that

y^2 = λy* + (1-λ)y, for some 0 < λ < 1. Hence,

N(y^2) = N(y*) ∪ N(y) = {j | y*_j > 0 or y_j > 0}.

This leads to a contradiction of y* being index maximal, i.e.,

N(y*) ⊊ N(y^2), implying that S(y^2) > S(y*).

Q.E.D.

Theorem 3.5. Let

F* = {y ∈ E_+^{m+1} | y_0 = 1, [I - Q^-(y*)Q(y*)]Hy = 0},

where y* is an index maximal dual feasible vector; then F ⊆ F*.

Proof of Theorem 3.5. Let y* be index maximal and y ∈ F; then by

Theorem 3.4, N(y) ⊆ N(y*). y ∈ F implies Q(y)Q^-(y)Hy = Hy; multiplying

by [I - Q^-(y*)Q(y*)],

[I - Q^-(y*)Q(y*)]Q(y)Q^-(y)Hy = [I - Q^-(y*)Q(y*)]Hy.

By Theorem 3.3,

[I - Q^-(y*)Q(y*)]Q(y) = 0;

therefore,

[I - Q^-(y*)Q(y*)]Hy = 0

for all y ∈ F, and F ⊆ F*.

Q.E.D.

F* is closed and convex, and if φ_j(x) is linear for j = 1,2,...,m, then F* = F.

A significant consequence of this theorem is that, while the concept

of an index maximal dual feasible vector implies dual feasibility, it is

only necessary to know N(y*). That is, by Corollary 3.3.1 any vector y

such that N(y) = N(y*) will suffice for defining the matrix

[I - Q^-(y*)Q(y*)]. The most natural choice is the vector which has

y_j = 1 for j ∈ N(y*) and y_j = 0 for j ∉ N(y*). Therefore, the set

N(y*), and not the particular y*, defines F*. For an example of this

idea let

Q_0 = [1 1 1 1          Q_1 = [0 0 0 0
       1 1 1 1                 0 4 0 0
       1 1 2 2                 0 0 5 5
       1 1 2 2],               0 0 5 5],

and y* = (1,3)^t, resulting in

Q(y*) = [1  1  1  1          Q^-(y*) = (1/192) [220 -16 -6 -6
         1 13  1  1                             -16  16  0  0
         1  1 17 17                              -6   0  3  3
         1  1 17 17],                            -6   0  3  3].

Finally,

Q(y*)Q^-(y*) = [1 0  0   0
                0 1  0   0
                0 0 1/2 1/2
                0 0 1/2 1/2].

Now let y = (1,1)^t; then

Q(y) = [1 1 1 1          Q^-(y) = (1/24) [34 -6 -2 -2
        1 5 1 1                           -6  6  0  0
        1 1 7 7                           -2  0  1  1
        1 1 7 7],                         -2  0  1  1],

and

Q(y)Q^-(y) = [1 0  0   0
              0 1  0   0
              0 0 1/2 1/2
              0 0 1/2 1/2].

Thus it is seen that Q(y*)Q^-(y*) = Q(y)Q^-(y) for N(y*) = N(y).
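The two projections in this example can be verified directly (a check of mine, using numpy's pinv):

```python
import numpy as np

Q0 = np.array([[1, 1, 1, 1],
               [1, 1, 1, 1],
               [1, 1, 2, 2],
               [1, 1, 2, 2]], dtype=float)
Q1 = np.array([[0, 0, 0, 0],
               [0, 4, 0, 0],
               [0, 0, 5, 5],
               [0, 0, 5, 5]], dtype=float)

def proj(y):
    """Q(y)Q^-(y) for Q(y) = y0*Q0 + y1*Q1."""
    Q = y[0] * Q0 + y[1] * Q1
    return Q @ np.linalg.pinv(Q)

expected = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 0.5, 0.5],
                     [0.0, 0.0, 0.5, 0.5]])
# N(y*) = N(y) = {0, 1}, so the two projections coincide.
assert np.allclose(proj(np.array([1.0, 3.0])), expected)
assert np.allclose(proj(np.array([1.0, 1.0])), expected)
```

The common column space is spanned by e_1, e_2, and e_3 + e_4, which is exactly what the 1/2 block of the projection expresses.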

An example where F ≠ F* is the following. Take the primal problem

{E.1}  minimize  φ_0(x) = ½ x^t Q_0 x + h_0^t x

       subject to:  φ_1(x) = ½ x^t Q_1 x + h_1^t x - 5 ≤ 0,

                    φ_2(x) = ½ x^t Q_2 x + h_2^t x - 4 ≤ 0,

where

Q_0 = diag(1, 0, 0),  h_0 = (1, -1, -2)^t,

Q_1 = diag(0, 1, 0),  h_1 = (0, 0, 1)^t,

Q_2 = diag(0, 0, 1),  h_2 = (0, 1, 0)^t.

The primal solution is x⁰ = (-1, 2, 2)^t and φ_0(x⁰) = -6.5. For the

dual problem,

Q(y) = diag(1, y_1, y_2),   Hy = (1, -1 + y_2, -2 + y_1)^t,

and Q is seen to be nonsingular for y = (1,1,1)^t, an index maximal

dual feasible vector. Hence,

I - Q^-(y)Q(y) = 0

and

F* = {y | y_0 = 1, y_1 ≥ 0, y_2 ≥ 0}.

F is seen to differ from F* by noting that no y = (1, 0, y_2)^t with

y_2 ≠ 1, and no y = (1, y_1, 0)^t with y_1 ≠ 2, is in F; see

Figure 3-2(c). Thus

F = {y | y_0 = 1, y_1 > 0, y_2 > 0} ∪ {(1, 0, 1)^t, (1, 2, 0)^t},

and F ⊆ F*, F ≠ F*.
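The membership claims for F can be checked numerically; the sketch below (mine) uses dual data of the form Q(y) = diag(1, y_1, y_2) and Hy = (1, -1 + y_2, -2 + y_1)^t, as in example {E.1}:

```python
import numpy as np

def in_F(y):
    """Tests QQ^-Hy = Hy for Q(y) = diag(1, y1, y2),
    Hy = (1, -1 + y2, -2 + y1)^t."""
    Q = np.diag([1.0, y[1], y[2]])
    Hy = np.array([1.0, -1.0 + y[2], -2.0 + y[1]])
    return bool(np.allclose(Q @ np.linalg.pinv(Q) @ Hy, Hy))

assert in_F([1.0, 1.0, 1.0])       # interior point: Q nonsingular
assert in_F([1.0, 0.0, 1.0])       # boundary point that stays feasible
assert in_F([1.0, 2.0, 0.0])
assert not in_F([1.0, 0.0, 2.0])   # in F* but not in F
```

On the face y_1 = 0 the middle row of Q(y) vanishes, so feasibility there forces the middle component of Hy to vanish, which happens only at y_2 = 1; the analogous argument on y_2 = 0 singles out y_1 = 2.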

Another example, in which F = F*, is a problem {E.2} of the form

{P.1} with n = 3, m = 2, primal solution x⁰ = (1, 1, 1)^t, and

φ_0(x⁰) = -5. Its dual has

Hy = (-2 - y_1, -2 + y_1 + y_2, -2 - y_2)^t

and

I - Q^-(y)Q(y) = diag(0, 1, 0) for all y_1, y_2 > 0,

so that

[I - Q^-(y)Q(y)]Hy = (0, -2 + y_1 + y_2, 0)^t.

Dual feasibility therefore requires y_1 + y_2 = 2, and

F = F* = {y | y_0 = 1, y_1 + y_2 = 2, y_1 ≥ 0, y_2 ≥ 0}.

An index maximal dual feasible vector will have y_1, y_2 > 0.

Before stating the next theorem and its corollaries it will be

useful to state two definitions dealing with the topology of convex

sets. These concepts and their properties are developed by Fenchel [27]

and Rockafellar [58].

Definition. The affine hull of C ⊆ E^n, aff C, is the intersection

of the collection of affine sets M such that C ⊆ M.

Definition. The relative interior of a convex set C, ri C,

consists of the points x ∈ aff C for which there exists an ε > 0 such

that y ∈ C whenever y ∈ aff C and ||x - y|| ≤ ε. In other words,

ri C = {x ∈ aff C | ∃ ε > 0 such that (x + εB) ∩ aff C ⊆ C},

where B is the unit ball in E^n, B = {x | ||x|| ≤ 1}, and

x + εB = {x' | ||x - x'|| ≤ ε}.

It is evident that for a convex set C,

ri C ⊆ C ⊆ cl C,

where cl C is the closure of C.

The following corollary of Rockafellar [58, Corollary 6.3.1],

relating the closures of two sets given a hypothesis on their relative

interiors, is stated without proof as a lemma.

Lemma 3.1. Let C_1 and C_2 be convex sets in E^n. Then

cl C_1 = cl C_2 if and only if ri C_1 = ri C_2.

The following is an important consequence of Theorem 3.5.

Lemma 3.2. aff F = aff F*.

Proof of Lemma 3.2. Clearly, if F = F* the lemma is trivial;

therefore, assume that F ≠ F*. Now F ⊆ F*; therefore, aff F ⊆ aff F*.

Let y* ∈ F ⊆ F* be index maximal and let y ∈ F* but y ∉ F. F* is

convex; hence, for 0 < λ' < 1 there exists

y' = λ'y* + (1-λ')y ∈ F*.

From this,

N(y') ⊇ N(y*),

thus

N(y') = N(y*) by y* being index maximal, and

Q^-(y*)Q(y*) = Q^-(y')Q(y') by Corollary 3.3.1.

Then y' ∈ F, and the line passing through y*, y', y is in the affine

hull of F, i.e.,

aff F* ⊆ aff F,

which combined with aff F ⊆ aff F* results in aff F = aff F*.

Q.E.D.

The next theorem and associated corollaries establish the importance

of the concept of an index maximal dual feasible vector.

Theorem 3.6. ri F = ri F*.

Proof of Theorem 3.6. By Lemma 3.2, aff F = aff F*. If F = F* the

result is trivial; hence, assume that F ≠ F* = cl F*. Also, it is clear

that if y ∈ ri F then y ∈ ri F*.

Now take y ∈ ri F*. There exist y', y* ∈ F* such that y* is index

maximal and, because y ∈ ri F*,

y = λy* + (1-λ)y'

for some 0 < λ < 1. From this it is seen that y is also index maximal

and hence in F, by application of Corollary 3.3.1.

Select y'' ∈ (y + εB) ∩ aff F*; that is, select y'' in a nonempty

ε-neighborhood of y, with ||y - y''|| ≤ ε. Then there exists y''' in

the ε-neighborhood such that

y'' = λ''y + (1-λ'')y'''

for some 0 < λ'' < 1. Hence, y'' is an index maximal dual feasible

vector in F. Thus, (y + εB) ∩ aff F ⊆ F, implying y ∈ ri F and

ri F = ri F*.

Q.E.D.

Corollary 3.6.1. cl(F) = F*.

Proof of Corollary 3.6.1. By definition F* is closed. Using

Theorem 3.6 and Lemma 3.1 the result is immediate.

Q.E.D.

Corollary 3.6.2. y ∈ F (F*) is an index maximal dual feasible vector if and only if y ∈ ri F (ri F*).

Proof of Corollary 3.6.2. Assume F consists of more than one point; otherwise the corollary is immediate.

Take y ∈ ri F; then there exist points y*, y' ∈ F* such that y* is index maximal and y = λy* + (1−λ)y' for some 0 < λ < 1. Hence, N(y) = N(y*) and y is index maximal.

Now take y index maximal in F* and let ε = min_{j∈N(y)} {y_j}. Then any y' ∈ (y + ½εB) ∩ aff F* is in F*. Therefore, y ∈ ri F* (ri F).

The implications of the preceding corollary are significant. First, there can be no dual of {P.1} which exhibits a closed F properly contained in the interior of E^m_+. Furthermore, relative boundary points of F are always on at least one of the coordinate hyperplanes of E^m_+; here F is assumed to consist of more than one point.

Figure 3-1 consists of three convex sets, m = 2, which are not permissible as F, while Figure 3-2 consists of three examples of F, m = 2 again.

[Figure 3-1: three convex sets in the (y_1, y_2) plane which are not permissible as F (y_0 = 1, not shown)]

[Figure 3-2: three examples of permissible sets F in the (y_1, y_2) plane (y_0 = 1, not shown)]

3.4 Objective Function ψ(y)

Having seen that F can be characterized by F*, i.e., ri F = ri F* and cl F = F*, the next step is to investigate ψ(y) on F*. This is done by initially looking at the vector-valued function f(y) over F:

f(y) = Q⁻(y)Hy.

It will be shown that ψ(y) is concave, continuous, and twice differentiable on F. Also, ψ(y) approaches negative infinity as a relative boundary point of F, not in F, is approached.

The definition of ψ(y) will be extended to F*, and ψ(y) will be shown to be upper semicontinuous on F*.

Lemma 3.3. For y* ∈ ri F*, y ∈ F*, and 0 < λ < 1,

f(λ) = f(λy* + (1−λ)y)
     = [λQ(y*) + (1−λ)Q(y)]⁻H(λy* + (1−λ)y)

f(λ) = ((1−λ)/λ)(R^tR)⁻Hy
     + (I−R⁻B(y*))Q⁻(y)(I−R⁻B(y*))^tHy
     + (I−R⁻B(y*))Q⁻(y)(I−R⁻B(y*))^tHy*
     − λ(I−R⁻B(y*))Q⁻(y)B^t(y*)GM(λ)GB(y*)Q⁻(y)(I−R⁻B(y*))^tHy*

where

R = B(y*)(I−Q⁻(y)Q(y)),

G = I − RR⁻,

M(λ) = {I + (λ/(1−λ))²GB(y*)Q⁻(y)B^t(y*)G}⁻¹.

Proof of Lemma 3.3.

[λQ(y*) + (1−λ)Q(y)]⁻ = (1/λ)[Q(y*) + ((1−λ)/λ)Q(y)]⁻.

Letting U = ((1−λ)/λ)^{1/2}B(y), V = B(y*) and applying Cline's result, Theorem A.14, the result is obtained.

The term M(λ) can readily be shown to exhibit the property

lim_{λ→0} M(λ) = I + O(λ²)

where a matrix-valued function is O(λ²) if each element is O(λ²). The definition employed for O(λⁿ) is the following:

Definition. A scalar function T(·), of a real variable λ, is said to be O(λⁿ) as λ → 0 if T(λ)/λⁿ is bounded as λ → 0.

The next lemma establishes a useful relationship for some matrix products inherent in f(λ).

Lemma 3.4. R^t⁻Q⁻(y) = 0 and R^t⁻(I−Q⁻(y)Q(y)) = R^t⁻.

Proof of Lemma 3.4. Noting that

Q⁻(y)Q(y)R^tR^t⁻Q(y) = Q⁻(y)Q(y)(I−Q⁻(y)Q(y))B^t(y*)R^t⁻Q(y) = 0,

R^tRQ(y) = R^tB(y*)(I−Q⁻(y)Q(y))Q(y) = 0,

RR^t⁻Q⁻(y)Q(y)R^t = RR^t⁻Q⁻(y)Q(y)(I−Q⁻(y)Q(y))B^t(y*) = 0,

Q⁻(y)Q(y)R^t = Q⁻(y)Q(y)(I−Q⁻(y)Q(y))B^t(y*) = 0,

the hypothesis of Theorem A.16 is satisfied for [Q(y)R^t]⁻. Hence,

[Q(y)R^t]⁻ = R^t⁻Q⁻(y),

but Q(y)R^t = 0, so R^t⁻Q⁻(y) = 0. Then

R^t⁻ = R^t⁻ − R^t⁻Q⁻(y)Q(y) = R^t⁻(I−Q⁻(y)Q(y)).

Q.E.D.

There is a direct connection between the matrix R and the set of

dual feasible vectors F. This is exhibited in the following.

Lemma 3.5. If y* ∈ ri F*, y ∈ F*, and R = B(y*)(I−Q⁻(y)Q(y)) = 0, then y ∈ F.

Proof of Lemma 3.5. By definition of F*, Q(y*)Q⁻(y*)Hy = Hy. Premultiplying by (I−Q⁻(y)Q(y)) results in

(I−Q⁻(y)Q(y))Q(y*)Q⁻(y*)Hy = (I−Q⁻(y)Q(y))Hy,

but by Theorem 2.3, Q(y*)Q⁻(y*) = B^t(y*)B^t⁻(y*); therefore, by hypothesis,

(I−Q⁻(y)Q(y))Hy = 0 and y ∈ F.

Q.E.D.

Theorem 3.7. If y* ∈ ri F*, y ∈ F*, then

(a) lim_{λ→0} ψ(λy* + (1−λ)y) = −∞ if y ∉ F,

(b) lim_{λ→0} ψ(λy* + (1−λ)y) = ψ(y) if y ∈ F;

hence ψ(y) is continuous over F.

Proof of Theorem 3.7. First it is noted that

ψ(λy* + (1−λ)y) = −½f^t(λ)[λQ(y*) + (1−λ)Q(y)]f(λ) + c^t(λy* + (1−λ)y)

and on substitution of f(λ) from Lemma 3.3, it is seen that:

(a) If F = F* the result is satisfied vacuously; otherwise one term of λf^t(λ)Q(y*)f(λ) is

λ((1−λ)/λ)²y^tH^t(R^tR)⁻Q(y*)(R^tR)⁻Hy

= λ((1−λ)/λ)²y^tH^tR⁻R^t⁻(I−Q⁻(y)Q(y))Q(y*)(I−Q⁻(y)Q(y))R⁻R^t⁻Hy

= ((1−λ)²/λ)y^tH^t(R^tR)⁻Hy = ((1−λ)²/λ)[R^t⁻Hy]^t[R^t⁻Hy].

The first equivalence follows from Corollary A.3.2 and Lemma 3.4. By Lemma 3.5, R ≠ 0, and y ∉ F implies that R^t⁻Hy ≠ 0. Hence, the term is positive and, as λ approaches zero, ψ approaches negative infinity, all other terms being finite.

(b) y ∈ F implies Hy = Q⁻(y)Q(y)Hy; Corollary A.3.2 implies (R^tR)⁻ = R⁻R^t⁻, and using Lemma 3.4, (R^tR)⁻Hy = 0. The only term remaining in the limit is

y^tH^tQ⁻(y)(I−R⁻B(y*))^tQ(y)(I−R⁻B(y*))Q⁻(y)Hy,

which on application of Lemma 3.4 reduces to y^tH^tQ⁻(y)Q(y)Q⁻(y)Hy, and lim_{λ→0} ψ(λy* + (1−λ)y) = ψ(y) for y ∈ F.

The continuity of ψ(y) on F is immediate from part (b).

Q.E.D.

The directional derivative of ψ(y) also exhibits a useful property as relative boundary points are approached.

Corollary 3.7.1. If y* ∈ ri F, y ∈ F*, then

lim_{λ→0} (d/dλ)ψ(λy* + (1−λ)y) = +∞ for y ∉ F.

Proof of Corollary 3.7.1.

ψ(λy* + (1−λ)y) = −½f^t(λ)[λQ(y*) + (1−λ)Q(y)]f(λ) + c^t(λy* + (1−λ)y)

where f(λ) is as developed in Lemma 3.3. From this it can be shown that

lim_{λ→0} (d/dλ)ψ(λy* + (1−λ)y) = K(y*,y) + lim_{λ→0} ((1−λ²)/(2λ²))y^tH^t(R^tR)⁻Hy,

y^tH^t(R^tR)⁻Hy > 0 by hypothesis, with

K(y*,y) = ½y*^tH^t(R^tR)⁻H(y*−2y)
 − y*^tH^t(R^tR)⁻Q(y*)D_1Hy
 − ½y^tH^tD_1^t(Q(y*)−Q(y))D_1Hy
 + (y−y*)^tH^tD_1^tQ(y*)(R^tR)⁻Hy
 + y*^tH^tD_1^tQ(y*)(R^tR)⁻Hy
 − y^tH^tD_2^tQ(y)D_1Hy*
 + y^tH^tD_2^tQ(y)D_2Hy*
 + c^t(y*−y)

where

D_1 = (I−R⁻B(y*))Q⁻(y)(I−B^t(y*)R^t⁻),

D_2 = (I−R⁻B(y*))Q⁻(y)Q(y*)D_1.

K(y*,y) is constant.

Thus it is seen that

lim_{λ→0} (d/dλ)ψ(λy* + (1−λ)y) = +∞ for y ∉ F.

Q.E.D.

Up to this point ψ(y) has only been defined on F and not on F*. Extending the domain of ψ(y) to F* can be accomplished by use of Theorem A.13. That is, for y ∈ F*, y ∉ F, accept the best approximate solution to the system Q(y)x + Hy = 0, namely, x = −Q⁻(y)Hy. Then ψ(y) has the same form as on F and is finite for such boundary points. Hereafter, ψ(y) will be used to represent the function on both F and F*.

In order to characterize ψ(y) on F*, the definition of an upper semicontinuous function is stated for emphasis.

Definition. A real-valued function f defined on a normed space E^n is said to be upper semicontinuous at x_0 if, given ε > 0, there is a δ > 0 such that f(x) − f(x_0) < ε for ||x−x_0|| < δ.

From the continuity of ψ(y) on F, the definition of ψ(y) over F*, and Theorem 3.7, the following is immediate.

Corollary 3.7.2. ψ(y) is an upper semicontinuous function on F*.

The form of the gradient and the Hessian of ψ(y) will now be developed. To simplify the proof of the continuity of the gradient and the Hessian, the next lemma is developed.

Stewart [66] has shown that a necessary and sufficient condition for a generalized inverse to be continuous is that its rank be constant over its domain. This is an extension of the work of Ben-Israel [4], who first addressed the question.

The relative interior of F is a set of index maximal dual feasible vectors, Corollary 3.6.2, which by Corollary 3.3.1 implies that the rank of Q is constant over ri F. Therefore, applying Stewart's result, the lemma is stated as:

Lemma 3.6. f(y) = Q⁻(y)Hy is continuous over the relative interior of F.
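Stewart's rank condition can be illustrated numerically. The sketch below uses hypothetical 2x2 data (Q1, Q2, and the vector h standing in for Hy are assumptions, not taken from the text) and numpy's pinv for the Moore-Penrose inverse: f varies continuously where the rank of Q(y) is constant, and jumps where the rank drops.

```python
import numpy as np

# Hypothetical data: Q(y) = y1*Q1 + y2*Q2 with Q1, Q2 positive semidefinite.
Q1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # rank 1
Q2 = np.array([[0.0, 0.0], [0.0, 1.0]])   # rank 1
h = np.array([1.0, 1.0])                  # stands in for Hy

def f(y1, y2):
    """f(y) = Q(y)^- H y with the Moore-Penrose inverse."""
    return np.linalg.pinv(y1 * Q1 + y2 * Q2) @ h

# Where rank Q(y) is constant (rank 2 for y1, y2 > 0), f is continuous:
print(np.linalg.norm(f(1.0, 1.0) - f(1.0 + 1e-8, 1.0)))   # tiny

# At y1 = 0 the rank drops to 1 and the pseudoinverse jumps:
print(f(1e-8, 1.0)[0])   # large, about 1e8
print(f(0.0, 1.0)[0])    # 0
```

The jump at y1 = 0 is exactly the discontinuity ruled out on ri F, where the index set N(y), and hence the rank of Q(y), is constant.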

Theorem 3.8. The gradient of ψ(y), y ∈ F, is

∇ψ(y) = [φ_1(−Q⁻(y)Hy), φ_2(−Q⁻(y)Hy), ..., φ_m(−Q⁻(y)Hy)]

and is continuous on ri F.

Proof of Theorem 3.8. The kth component of ∇ψ(y) is

∂ψ(y)/∂y_k = ½y^tH^tQ⁻Q_kQ⁻Hy − h_k^tQ⁻Hy + c_k,

which by Theorem 2.4, for k ∈ N(y), reduces to

∂ψ(y)/∂y_k = φ_k(−Q⁻(y)Hy).

For k ∉ N(y), modification of the proof of Theorem 2.4 gives

∂Q⁻/∂y_k = lim_{δ→0} (QQ⁻ + δ²I)⁻¹Q_k(I−QQ⁻) − Q⁻Q_kQ⁻

and

(∂Q⁻/∂y_k)Hy = (∂Q⁻/∂y_k)QQ⁻Hy = −Q⁻Q_kQ⁻Hy.

Then ∂ψ(y)/∂y_k = φ_k(−Q⁻(y)Hy) for k ∉ N(y) and the form of the gradient is defined.

The continuity of ∇ψ(y) follows from Lemma 3.6, noting that ∂ψ(y)/∂y_k = φ_k(−f(y)).

Q.E.D.

This is an extension of the result of Falk [26], where it is seen that ∇ψ(y) is defined at a particular point, i.e., x = −Q⁻(y)Hy, instead of at any x = −Q⁻(y)Hy − (I−Q⁻(y)Q(y))g.

Corollary 3.8.1. The jkth element of the Hessian of ψ(y), for y ∈ ri F, is

H_jk = −y^tH^tQ⁻Q_jQ⁻Q_kQ⁻Hy + h_j^tQ⁻Q_kQ⁻Hy + h_k^tQ⁻Q_jQ⁻Hy − h_j^tQ⁻h_k.

The Hessian exists for each y ∈ ri F and is continuous on ri F.

Proof of Corollary 3.8.1. The proof follows from

H_jk = ∂²ψ(y)/∂y_j∂y_k,

Theorem 2.5, and Theorem 3.8. The continuity of the Hessian follows from the generalized inverse being continuous on ri F and Lemma 3.6.

Q.E.D.

The next corollary will be used in the proof of the concavity of ψ(y) on F.

Corollary 3.8.2.

∇ψ(y)y = ψ(y) − φ_0(−Q⁻Hy).

Proof of Corollary 3.8.2.

∇ψ(y)y = Σ_{j=1}^m [½y^tH^tQ⁻Q_jQ⁻Hy − h_j^tQ⁻Hy + c_j]y_j

 = ½y^tH^tQ⁻QQ⁻Hy − ½y^tH^tQ⁻Q_0Q⁻Hy − y^tH^tQ⁻Hy + h_0^tQ⁻Hy + c^ty − c_0

 = ψ(y) − φ_0(−Q⁻Hy).

Q.E.D.

ψ(y) can now be proved to be concave based on the following result found in Mangasarian [43]:

ψ(y) is concave on F if and only if

ψ(y²) − ψ(y¹) ≤ ∇ψ(y¹)(y² − y¹) for each y¹, y² ∈ F.

Theorem 3.9. ψ(y) is concave on F.

Proof of Theorem 3.9. Take y¹, y² ∈ F; then Q(y²) = Σ_{j∈N(y²)} y_j²Q_j is positive semidefinite and

0 ≤ ½[y²^tH^tQ⁻(y²) − y¹^tH^tQ⁻(y¹)]Q(y²)[Q⁻(y²)Hy² − Q⁻(y¹)Hy¹]

0 ≤ ½y²^tH^tQ⁻(y²)Hy² + ½y¹^tH^tQ⁻(y¹)Q(y²)Q⁻(y¹)Hy¹
   − ½y²^tH^tQ⁻(y²)Q(y²)Q⁻(y¹)Hy¹ − ½y¹^tH^tQ⁻(y¹)Q(y²)Q⁻(y²)Hy².

Now y² ∈ F implies that Q(y²)Q⁻(y²)Hy² = Hy². This, and adding c^ty² to the inequality, give

ψ(y²) ≤ ½y¹^tH^tQ⁻(y¹)Q(y²)Q⁻(y¹)Hy¹ − y²^tH^tQ⁻(y¹)Hy¹ + c^ty²

ψ(y²) ≤ Σ_{j=1}^m [½y¹^tH^tQ⁻(y¹)Q_jQ⁻(y¹)Hy¹ − h_j^tQ⁻(y¹)Hy¹ + c_j]y_j² + φ_0(−Q⁻(y¹)Hy¹).

But this is recognized by Corollary 3.8.2 to be

ψ(y²) ≤ ∇ψ(y¹)y² + φ_0(−Q⁻(y¹)Hy¹);

adding −ψ(y¹) and using Corollary 3.8.2 again, the inequality reduces to

ψ(y²) − ψ(y¹) ≤ ∇ψ(y¹)(y² − y¹)

and ψ(y) is concave on F.

Q.E.D.
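The concavity of ψ(y) and the gradient form of Theorem 3.8 can be checked numerically on a randomly generated strictly convex instance. All data below are hypothetical, and ψ is evaluated as the Lagrangian at x = −Q⁻(y)Hy (with Q(y) nonsingular, so ordinary solves replace the generalized inverse); this is a sketch, not the dissertation's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2   # hypothetical sizes

def psd(rank):                     # random positive semidefinite matrix
    A = rng.standard_normal((n, rank))
    return A @ A.T

Q = [psd(n) + np.eye(n)] + [psd(1) for _ in range(m)]   # Q0 pd, Qj psd
h = [rng.standard_normal(n) for _ in range(m + 1)]      # h0, ..., hm
c = rng.standard_normal(m + 1)                          # c0, ..., cm

def Qy(y):                         # Q(y) = Q0 + sum_j yj Qj  (y0 = 1)
    return Q[0] + sum(yj * Qj for yj, Qj in zip(y, Q[1:]))

def Hy(y):
    return h[0] + sum(yj * hj for yj, hj in zip(y, h[1:]))

def phi(j, x):                     # primal functions phi_j(x)
    return 0.5 * x @ Q[j] @ x + h[j] @ x + c[j]

def psi(y):                        # Lagrangian evaluated at x = -Q(y)^- H y
    x = -np.linalg.solve(Qy(y), Hy(y))
    return 0.5 * x @ Qy(y) @ x + Hy(y) @ x + c[0] + c[1:] @ y

def grad_psi(y):                   # Theorem 3.8: kth component is phi_k(-Q^- H y)
    x = -np.linalg.solve(Qy(y), Hy(y))
    return np.array([phi(j, x) for j in range(1, m + 1)])

y1 = np.array([0.5, 1.5])
y2 = np.array([2.0, 0.3])
# Mangasarian's gradient inequality, the criterion used in Theorem 3.9:
assert psi(y2) - psi(y1) <= grad_psi(y1) @ (y2 - y1) + 1e-10
```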

3.5 Optimal Primal Solutions

Wolfe duality theory applied to {D.1} ensures that the solution of the dual problem characterizes the primal solution of {P.1}. This is certainly true in the sense that the optimal objective function value of {P.1} is determined from the optimal dual solution, yet not all optimal primal variables are explicitly defined. This is due to the primal solution being defined in two parts, explicitly on the range space and implicitly on the null space of Q. Only that part defined explicitly on the range space of Q is immediately determined from the optimal dual solution. This should not be construed to imply that an optimal dual solution does not return the optimal primal vector; it does.

Assume {D.1} has been solved and the optimal dual vector is y⁰. Then y⁰ ∈ F and the optimal primal vector is given by

x⁰ = −Q⁻(y⁰)Hy⁰ − (I−Q⁻(y⁰)Q(y⁰))g

for some g ∈ E^n. The optimal primal vector will be unique only if Q(y⁰) has rank n, i.e., I = Q⁻(y⁰)Q(y⁰). Therefore, to establish x⁰, g must be determined.

Substituting x⁰ into {P.1} results in the following:

{P₀.1} minimize φ_0(g) = −h_0^t(I−Q⁻Q)g + φ_0(−Q⁻Hy⁰)

subject to: φ_j(g) = −h_j^t(I−Q⁻Q)g + φ_j(−Q⁻Hy⁰) = 0
  for j ∈ N(y⁰)

φ_j(g) = ½g^t(I−Q⁻Q)Q_j(I−Q⁻Q)g
  − (h_j^t − y⁰^tH^tQ⁻Q_j)(I−Q⁻Q)g
  + φ_j(−Q⁻Hy⁰) ≤ 0
  for j ∉ N(y⁰)

where Q is understood to be a function of y⁰ and the order of the constraints has been arranged so that the first S(y⁰)−1 constraints are active, that is, N(y⁰) ∪ {0} = {0,1,2,...,S(y⁰)−1}.

By complementary slackness, φ_j(g) for j ∈ N(y⁰) is zero. This results in a consistent set of linear equalities.

Let H_0 = (h_1,h_2,...,h_{S(y⁰)−1}), an n×(S(y⁰)−1) matrix, and Φ_0 be the (S(y⁰)−1)-vector

Φ_0 = [φ_1(−Q⁻Hy⁰), φ_2(−Q⁻Hy⁰), ..., φ_{S(y⁰)−1}(−Q⁻Hy⁰)]^t.

The first S(y⁰)−1 constraints of {P₀.1} are expressible as

H_0^t(I−Q⁻(y⁰)Q(y⁰))g = Φ_0.

Solving for (I−Q⁻Q)g results in

(I−Q⁻Q)g = H_0^t⁻Φ_0 + (I−H_0^t⁻H_0^t)g_r    (3.5.1)

for g_r ∈ E^n.

Theorem 3.10. φ_0(g) is a constant for y⁰. That is, the primal objective function value is fully determined by the optimal dual vector.

Proof of Theorem 3.10.

φ_0(g) = −h_0^t(I−Q⁻Q)g + φ_0(−Q⁻Hy⁰), and by the constraint set of {D.1},

−h_0^t(I−Q⁻Q) = [y_1h_1^t + y_2h_2^t + ... + y_mh_m^t](I−Q⁻Q),

which can be written as

−h_0^t(I−Q⁻Q) = y_o^tH_0^t(I−Q⁻Q)

where y_o = (y_1,y_2,...,y_{S(y⁰)−1})^t, noting y_k = 0 for k ≥ S(y⁰).

Now,

φ_0(g) = y_o^tH_0^t(I−Q⁻Q)g + φ_0(−Q⁻Hy⁰),

and on substitution for (I−Q⁻Q)g by (3.5.1),

φ_0(x⁰) = φ_0(g) = y_o^tH_0^tH_0^t⁻Φ_0 + φ_0(−Q⁻(y⁰)Hy⁰).

Q.E.D.

{P₀.1} is now characterized as the minimization of a constant subject to m−S(y⁰)+1 quadratic constraints. The following theorem defines those instances when it is unnecessary to solve {P₀.1}.

Theorem 3.11. The optimal primal solution is fully determined by the optimal dual solution provided the n×(S(y⁰)−1) matrix H_0 has rank n, and the optimal primal solution is

x⁰ = −Q⁻(y⁰)Hy⁰ − (H_0H_0^t)⁻¹H_0Φ_0.

Proof of Theorem 3.11. If H_0 has rank n, then by Theorem A.6, H_0^t⁻ = (H_0H_0^t)⁻¹H_0 and (I−H_0^t⁻H_0^t) = (I−(H_0H_0^t)⁻¹H_0H_0^t) = 0. Therefore,

(I−Q⁻Q)g = H_0^t⁻Φ_0 = (H_0H_0^t)⁻¹H_0Φ_0 and x⁰ = −Q⁻(y⁰)Hy⁰ − (H_0H_0^t)⁻¹H_0Φ_0

for H_0 = (h_1,h_2,...,h_{S(y⁰)−1}).

Q.E.D.
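The linear algebra behind Theorem 3.11 can be sketched with hypothetical random data (the sizes and matrices below are assumptions; numpy's pinv stands in for the generalized inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 3, 5   # hypothetical sizes: H0 is n x k, with k playing S(y0) - 1

H0 = rng.standard_normal((n, k))       # full row rank with probability 1
H0t_pinv = np.linalg.pinv(H0.T)        # the generalized inverse H0^{t-}

# Theorem A.6 as invoked in the proof:
assert np.allclose(H0t_pinv, np.linalg.inv(H0 @ H0.T) @ H0)
# ... so the null-space term (I - H0^{t-} H0^t) vanishes:
assert np.allclose(H0t_pinv @ H0.T, np.eye(n))

# A consistent system H0^t u = Phi0 then has the unique solution
# u = (H0 H0^t)^{-1} H0 Phi0, the component recovered in x0:
u_true = rng.standard_normal(n)
Phi0 = H0.T @ u_true                   # consistent right-hand side
u = np.linalg.inv(H0 @ H0.T) @ H0 @ Phi0
assert np.allclose(u, u_true)
```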

When H_0 does not have rank n, {P₀.1} is a special form of {P.1} which can readily be solved. Substituting (3.5.1) in {P₀.1}, noting that it is only required to determine a g which satisfies the m−S(y⁰)+1 constraints, and solving the following strictly convex auxiliary primal problem defines the component of x⁰ which is in the null space of Q(y⁰).

{P₀.2} minimize g_r^tIg_r

subject to: φ_j(g_r) ≤ 0, for j ∉ N(y⁰)

where

φ_j(g_r) = ½g_r^t(I−H_0^t⁻H_0^t)Q_j(I−H_0^t⁻H_0^t)g_r
 + [Φ_0^t(H_0^t⁻)^tQ_j − h_j^t + y⁰^tH^tQ⁻Q_j](I−H_0^t⁻H_0^t)g_r
 + [½Φ_0^t(H_0^t⁻)^tQ_j − h_j^t + y⁰^tH^tQ⁻Q_j]H_0^t⁻Φ_0
 + φ_j(−Q⁻(y⁰)Hy⁰).

Solving {P₀.2} by the dual {D.1} will return a unique g_r, since the objective matrix I is of rank n. Therefore, at most two dual problems will require solution, and at least one will be a strictly convex problem permitting the use of Corollary 3.1.2.


3.6 Linear Constraints

The class of convex quadratically constrained quadratic programs {P.1} can be expanded to admit problems with linear constraints.

{P.1'} minimize φ_0(x) = ½x^tQ_0x + h_0^tx + c_0

subject to: φ_j(x) = ½x^tQ_jx + h_j^tx + c_j ≤ 0
  for j = 1,2,...,m_1

a_k^tx + c_k ≤ 0
  for k = m_1+1, m_1+2, ..., m

where m − m_1 = m_2.

By treating linear inequality constraints as quadratic constraints with zero quadratic terms, {P.1'} is merely a special form of {P.1} and the theory of Section 3.2 applies.

The more interesting case is when {P.1'} has linear equality constraints. Let the set of linear equality constraints be given by

Ax = c    (3.6.1)

where A is an m_2×n coefficient matrix and c is an m_2×1 constant vector.

Given that the primal problem is feasible, there exist solutions to (3.6.1). Using the generalized inverse and the assumption of feasibility,

x = A⁻c + (I−A⁻A)g    (3.6.2)

for all g ∈ E^n. Clearly any solution to {P.1'} is of the form (3.6.2) and the primal problem can be transformed to one in g.
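A minimal numerical sketch of (3.6.2), with hypothetical data, confirms that every vector of that form solves Ax = c when the system is feasible:

```python
import numpy as np

rng = np.random.default_rng(2)
m2, n = 2, 4   # hypothetical sizes for the equality system Ax = c

A = rng.standard_normal((m2, n))
c = A @ rng.standard_normal(n)     # right-hand side chosen so Ax = c is feasible
A_pinv = np.linalg.pinv(A)         # the generalized inverse A^-

# Every vector of the form (3.6.2) satisfies Ax = c, for any g:
for _ in range(3):
    g = rng.standard_normal(n)
    x = A_pinv @ c + (np.eye(n) - A_pinv @ A) @ g
    assert np.allclose(A @ x, c)
```

The term (I−A⁻A)g sweeps out the null space of A, which is exactly the freedom the transformed problem {P.2'} optimizes over.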

Substituting (3.6.2) into {P.1'} results in the following:

{P.2'} minimize φ_0'(g) = ½g^t(I−A⁻A)Q_0(I−A⁻A)g + (h_0^t + c^t(A⁻)^tQ_0)(I−A⁻A)g + (c_0 + ½c^t(A⁻)^tQ_0A⁻c + h_0^tA⁻c)

subject to: φ_j'(g) = ½g^t(I−A⁻A)Q_j(I−A⁻A)g + (h_j^t + c^t(A⁻)^tQ_j)(I−A⁻A)g + (c_j + ½c^t(A⁻)^tQ_jA⁻c + h_j^tA⁻c) ≤ 0
  for j = 1,2,...,m_1.

This transforms {P.1'} with linear equality constraints to the form {P.1}. The advantage of the conversion is that the dual problem {D.1} is reduced from a problem in m dual variables to one in only m_1 dual variables.

It is possible to express a set of linear inequality constraints as

a set of equality constraints by the use of slack variables. This would

be highly desirable based on the potential reduction of the size of the

dual problem. Unfortunately, the use of slack variables will also

increase the size of the set of nonnegativity constraints and the dual

problem size remains effectively unchanged.

In conclusion, it can be said that linear constraints cause no

difficulty in applying the dual {D.1}. While linear equality constraints

permit the formation of an equivalent reduced problem, linear inequality

constraints are treated as special quadratic form constraints.

3.7 Linear and Quadratic Programming

Linear and convex quadratic programs are proper subsets of the class {P.1}. The dual {D.1}, therefore, should admit special forms for these two subclasses.

For linear programs, {D.1} results in the unsymmetric dual. This is seen by noting that Q ≡ 0 and the constraint set F is given by

F = {y | y_j ≥ 0, j = 1,2,...,m; Σ_{j=1}^m h_jy_j = −h_0},

and the objective function is

ψ(y) = c_0 + Σ_{j=1}^m c_jy_j.
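Weak duality for this linear-programming specialization is easy to check numerically. The instance below is constructed (hypothetical data, built so that both the primal point x and the dual point y are feasible); it is a sketch of the inequality ψ(y) ≤ φ_0(x), not of any solution algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 4   # hypothetical sizes

hj = [rng.standard_normal(n) for _ in range(m)]
y = rng.uniform(0.1, 1.0, m)                   # candidate dual point, y >= 0
h0 = -sum(yj * h for yj, h in zip(y, hj))      # forces sum_j hj yj = -h0
x = rng.standard_normal(n)
cj = np.array([-(h @ x) - rng.uniform(0, 1) for h in hj])   # makes x feasible
c0 = rng.standard_normal()

primal = h0 @ x + c0           # phi_0(x) = h0^t x + c0
dual = c0 + cj @ y             # psi(y)  = c0 + sum_j cj yj
assert dual <= primal + 1e-10  # weak duality: psi(y) <= phi_0(x)
```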

The application of {D.1} to convex quadratic programs results in a nonsymmetric dual which is again a quadratic program.

Define the general convex quadratic program as

{Q.1} minimize φ_0(x) = ½x^tQ_0x + h_0^tx + c_0

subject to: φ_j(x) = h_j^tx + c_j ≤ 0
  for j = 1,2,...,m

where Q_0 is symmetric and positive semidefinite. {D.1} reduces to

{D.4} maximize ψ(y) = −½y^tH^tQ_0⁻Hy + (c^t − h_0^tQ_0⁻H)y + (−½h_0^tQ_0⁻h_0 + c_0)

subject to: y = (y_1,y_2,...,y_m)^t ≥ 0

(I−Q_0Q_0⁻)Hy = −(I−Q_0Q_0⁻)h_0

where H = (h_1,h_2,...,h_m) and c = (c_1,c_2,...,c_m)^t.

{D.4} reflects a notational variation from {D.1} in that Q = Q_0 and is not a function of the y_j. This permits the implicit representation of y_0 = 1 in {D.4}. It is clear that ψ(y) is quadratic and that the constraint set is linear. For Q_0 of rank r, there are n−r linear constraints in {D.4}. This is seen by noting that there exists an orthogonal matrix P such that

P^tQ_0P = D_0,

where D_0 is a diagonal matrix which has r nonzero diagonal elements equal to the eigenvalues of Q_0. The constraint set is thereby reduced to

(I−D_0D_0⁻)P^tHy = −(I−D_0D_0⁻)P^th_0.

(I−D_0D_0⁻) is a diagonal matrix with r zero and n−r nonzero diagonal elements. Hence, there will be n−r linear equality constraints for {D.4}.

If Q_0 is nonsingular, {D.4} has only nonnegativity constraints and is the dual form addressed by Lemke [41].
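For a nonsingular Q_0, the {D.4} objective can be verified numerically against its derivation: it equals the minimum of the QP Lagrangian over x, attained at x = −Q_0⁻¹(h_0 + Hy). All data below are hypothetical random instances.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2   # hypothetical sizes; Q0 positive definite so Q0^- = Q0^{-1}

A = rng.standard_normal((n, n)); Q0 = A @ A.T + np.eye(n)
h0 = rng.standard_normal(n); c0 = rng.standard_normal()
H = rng.standard_normal((n, m)); c = rng.standard_normal(m)
Q0inv = np.linalg.inv(Q0)

def psi_D4(y):   # objective of {D.4}
    return (-0.5 * y @ H.T @ Q0inv @ H @ y
            + (c - H.T @ Q0inv @ h0) @ y
            - 0.5 * h0 @ Q0inv @ h0 + c0)

def lagrangian_min(y):   # min over x of the {Q.1} Lagrangian
    x = -Q0inv @ (h0 + H @ y)
    return 0.5 * x @ Q0 @ x + (h0 + H @ y) @ x + c0 + c @ y

y = rng.uniform(0, 1, m)
assert np.isclose(psi_D4(y), lagrangian_min(y))
```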

3.8 Equivalence of Dual Forms

In Sections 3.2 through 3.6 the generalized inverse dual {D.1} and its properties were developed for convex quadratically constrained quadratic programs. {D.1} was developed from the Wolfe dual, {W.1}, of {P.1}. Three of the significant advantages of {D.1} over {W.1} are: (1) {D.1} is an m-variable dual while {W.1} has m+n variables; (2) the objective function of {D.1} is concave whereas that of {W.1} is not; and (3) the constraint set of {D.1} is also characterized by a linear system.

There are two other dual forms of {P.1}. One is the generalized geometric inequality dual of Peterson and Ecker [51,52,53]; the other is the conjugate function dual of Rockafellar [57].

The dual of Peterson and Ecker is applicable to all convex lp-constrained lp-programs, and to {P.1} in particular, while Rockafellar's dual is applicable to the class of faithfully convex programs, faithful convexity being defined as:

Definition. A function f is faithfully convex if it is not affine (simultaneously convex and concave) along any line segment, unless f is affine along the entire line extending the line segment.

Because Rockafellar [57] has shown that his dual and that of Peterson and Ecker are equivalent for {P.1}, the designation {CD.1} will be employed to refer to both.

The following form of {CD.1} will be used.

{CD.1} maximize T(y,z) = −½ Σ_{j∈N(y)} (1/y_j)z_j^tz_j + c^ty

subject to: y_j ≥ 0, j = 1,2,...,m

y_0 = 1

Σ_{j=0}^m B_j^tz_j = Σ_{j=0}^m h_jy_j

z_j = 0 if y_j = 0

z_j ∈ E^n, y_j ∈ E^1.

{CD.1} is stated explicitly in terms of the dual variables y_j, z_j for j = 0,1,2,...,m. For notational convenience the number of dual variables is exhibited as (m+1)(n+1), but in applications the number of dual variables will be Σ_{j=0}^m p_j + m, where p_j is the rank of Q_j.

Both Peterson and Ecker and Rockafellar have developed properties of {CD.1} comparable to those developed for {D.1}. Three differences between {CD.1} and {D.1} are seen to be the number of variables, Σ_{j=0}^m p_j + m compared to m, the form of the constraint set, and the form of the objective function.

To compare the duals {D.1} and {CD.1}, it is first noted that there is an alternate characterization of the system

Q(y)Q⁻(y)Hy = Hy

resulting in a consistent system of equations in an expanded space.

Theorem 3.12. Given that Qx = −Hy is consistent, i.e., QQ⁻Hy = Hy, then there exists a vector of m+1 n-vectors

z = (z_0^t, z_1^t, z_2^t, ..., z_m^t)^t;  z_j = y_j^{1/2}z_j', z_j' ∈ E^n,

such that for Q_j = B_j^tB_j, y_j ≥ 0, j = 0,1,2,...,m,

Σ_{j=0}^m B_j^tz_j = Σ_{j=0}^m h_jy_j

is consistent. The converse holds if z_j = 0 when y_j = 0.

Proof of Theorem 3.12. By Lemma 2.3, Q(y) can be factored into

Q(y) = B^t(y)B(y), B(y) = YB,

where B^t = (B_0^t, B_1^t, B_2^t, ..., B_m^t) for Q_j = B_j^tB_j, j = 0,1,2,...,m, B_j of order n×n for all j, and Y, of order n(m+1)×n(m+1), is the Kronecker product

Y = diag(1, y_1^{1/2}, ..., y_m^{1/2}) ⊗ I.

Using this factorization and Theorem 2.3, QQ⁻Hy = Hy reduces to

B^t(y)B^t⁻(y)Hy = Hy.

Hence, there exists a vector z' = (z_0'^t, z_1'^t, ..., z_m'^t)^t such that

B^t(y)z' = Hy, i.e., B^tYz' = Hy.

By defining the vector of m+1 n-vectors z = (z_0^t, z_1^t, ..., z_m^t)^t = Yz', it is seen that

Σ_{j=0}^m B_j^tz_j = Σ_{j=0}^m h_jy_j.

The converse is obtained by reversing the above sequence of arguments with z_j = 0 when y_j = 0.

Q.E.D.

The expression for the feasible set F in the expanded dual space is now

F = {y ∈ E^m_+ | y_0 = 1 and there exist z_j ∈ E^n, j = 0,1,...,m, solving Σ_{j=0}^m B_j^tz_j = Σ_{j=0}^m h_jy_j, with y_j = 0 implying z_j = 0}.

Theorem 3.13. There exists a one-to-one correspondence between the feasible points of {D.1} and {CD.1}. Furthermore, for such points the associated objective function values are equal.

Proof of Theorem 3.13. The proof will be by demonstrating that {CD.1} can be derived from {D.1} and vice versa.

From {D.1},

ψ(y) = −½y^tH^tQ⁻(y)Q(y)Q⁻(y)Hy + c^ty,

which by the factorization Q(y) = B^t(y)B(y) is equivalent to

ψ(y) = −½[B^t⁻(y)Hy]^t[B^t⁻(y)Hy] + c^ty

where B(y) = YB as defined for Theorem 3.12.

Define

z = YB^t⁻(y)Hy = YB(y)Q⁻(y)Hy,    (3.8.1)

a vector of m+1 n-vectors whose jth n-vector is zero if y_j is zero.

Multiplying (3.8.1) by Y⁻,

Y⁻z = Y⁻YB^t⁻(y)Hy = (Y⁻Y)B^t⁻(y)Hy = B^t⁻(y)Hy.    (3.8.2)

Substituting (3.8.2) into ψ(y) results in

T(y,z) = −½(Y⁻z)^t(Y⁻z) + c^ty.    (3.8.3)

Multiplying (3.8.2) by B^t(y),

B^t(y)Y⁻z = B^tYY⁻z = B^t(y)B^t⁻(y)Hy = Hy

by the constraints of {D.1} and Theorem 3.12.

By Theorem A.11, YY⁻ is a diagonal matrix with y_j > 0 implying a diagonal element of 1 and y_j = 0 implying 0, resulting in YY⁻z = z by definition. Then by Theorem 3.12 the constraints of {D.1} can be stated as:

B^tz = Hy

z_j = 0 if y_j = 0

y_j ≥ 0 for j = 1,2,...,m and y_0 = 1

and the objective function by (3.8.3); hence {D.1} implies {CD.1}.

Now taking {CD.1}: it is seen by Theorem 3.12 that the constraint set of {CD.1} is equivalent to that of {D.1}. Furthermore,

B^tz = B^tYY⁻z = B^t(y)Y⁻z = Hy

by definition of Y and z for {CD.1}. This equation is consistent; therefore,

Y⁻z = B^t⁻(y)Hy + (I−B^t⁻B^t)g

for g ∈ E^{n(m+1)}. Substituting this into the objective function, equation (3.8.3), results in

T(y,g) = −½[(B^t⁻Hy)^t(B^t⁻Hy) + g^t(I−B^t⁻B^t)g] + c^ty.

The matrix (I−B^t⁻B^t) is a symmetric positive semidefinite matrix, and maximizing −½g^t(I−B^t⁻B^t)g implies that g is in the range space of B. In particular, g = 0 is in the range space of B, and so without loss of generality T(y,g) can be simplified to

ψ(y) = −½(B^t⁻Hy)^t(B^t⁻Hy) + c^ty.

Hence, {CD.1} implies {D.1}.

Q.E.D.

The question of which dual form would be more advantageous for a

particular application will be addressed in Chapter 4.


CHAPTER 4

DUAL ALGORITHMIC CONSIDERATIONS

4.1 Introduction

In Chapter 3 three dual forms were introduced for {P.1}: the Wolfe dual {W.1}, the generalized inverse dual {D.1}, and the conjugate dual {CD.1}.

Since {W.1} lacks many desirable properties which {D.1} and {CD.1} possess, the analysis of this chapter will address the question of whether {D.1} or {CD.1} offers computational advantages for either or both of the two subclasses of {P.1}.

The two subclasses of {P.1} will be defined as definite problems and semidefinite problems. Definite problems will be those for which φ_0(x) is strictly convex or those which have a known active strictly convex constraint. Semidefinite problems will be those which are not definite.

The comparison analysis will be subjective in nature, but will catalog those critical elements of algorithmic development which are likely to prove computationally advantageous or disadvantageous.

To this date there has been one reported algorithm developed for {CD.1}. This work, done by Ecker and Niemi [24], reported one experimental result based on a definite problem, but offered no comparison information with other algorithmic approaches.

For intermediate scale definite problems, primal problems of 40

variables and 30 constraints, a projected gradient algorithm was

developed for {D.3} and reported on by Hearn and Randolph [35].

Experimental results, CPU time, obtained with this algorithm are

catalogued in Table 1. The problems were randomly generated, had 100

percent dense matrices with positive components between zero and ten,

and strictly convex objective functions. For each n,m three problems

were generated and solved. Also, a general primal algorithm, the

Sequential Unconstrained Minimization Technique (SUMT) of Fiacco and

McCormick [28] as implemented by Mylander, Holmes and McCormick [48],

was applied to three of the primal problems. The first problem n = 5,

m = 3 had a CPU time of 2.6 seconds; the second n = 10, m = 5 had a CPU

time of 11.9 seconds and a third problem, n = 15, m = 10, was attempted,

but terminated after 120 seconds.

4.2 Algorithm

To compare the duals, {D.1} and {CD.1}, the projected gradient

algorithm will be used as the standard for investigation. A discussion

of the projected gradient algorithm is found in Rosen [9 ,60]. A

discussion of feasible direction approaches, which includes gradient

projection, is found in Zoutendijk [72].

The basic form of the algorithm is as follows, stated for {D.1}:

(a) At a feasible point y^k compute the gradient of the objective function, ∇ψ(y^k).

(b) Determine the matrix T, defining the support of the constraint set at y^k. The rows of T are the gradients of the constraints active at y^k (or a subset of the active constraints).

[Table 1: CPU times (seconds) for the projected gradient algorithm applied to randomly generated definite test problems of various sizes n, m]

(c) Compute the projection matrix P, which projects vectors onto the null space of T.

(d) Compute y^{k+1} = y^k + λ_kP∇ψ(y^k), such that ψ(y^{k+1}) is maximized, for P∇ψ(y^k) ≠ 0, and go to (b). If P∇ψ(y^k) = 0 and the orthogonal projection of ∇ψ(y^k) onto the null space of P results in nonpositive components, then y^k is optimal; otherwise select the largest positive component of this projection, remove the associated equality constraint from the set of active constraints, and go to (b).
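Steps (a)-(d) can be sketched for a generic concave objective with linear active constraints. Everything below is illustrative (the function names, the fixed step replacing the one-dimensional maximization of step (d), and the toy problem are all assumptions, not the dissertation's implementation):

```python
import numpy as np

def projected_gradient_step(grad, T, y, step=0.1):
    """One iteration: move along the gradient projected onto null(T)."""
    P = np.eye(len(y)) - np.linalg.pinv(T) @ T   # null-space projector of T
    d = P @ grad(y)                              # feasible ascent direction
    return y + step * d                          # line search omitted

# Toy illustration: maximize -||y||^2 with active constraint y1 + y2 fixed.
grad = lambda y: -2.0 * y
T = np.array([[1.0, 1.0]])       # gradient of the single active constraint
y = np.array([2.0, 0.0])
for _ in range(200):
    y = projected_gradient_step(grad, T, y)
print(y)   # approaches [1, 1], the constrained maximizer on y1 + y2 = 2
```

Because each direction d lies in the null space of T, every iterate stays on the active manifold, which is the feasibility-preserving property noted below for {D.1} and {CD.1}.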

Rosen, in defining the support matrix, required the selection of a linearly independent set of active constraint gradients for the rows of T. Other authors have used the definition of a regular point to meet this requirement; for example, see Luenberger [42]. This particular concept, while theoretically acceptable, creates computational difficulties. The difficulty is obviated by the use of the generalized inverse of T. That is, by Corollary A.8.1 the projection onto the null space of T is defined by the matrix (I−T⁻T) and the projection onto the null space of P by T⁻T.
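A quick numerical illustration (hypothetical 3x3 data) of why the generalized inverse sidesteps the independence requirement: (I−T⁻T) is the same null-space projector whether or not the rows of T are linearly dependent.

```python
import numpy as np

# Active-constraint gradients with a deliberately dependent duplicate row:
T_dep = np.array([[1.0, 0.0, 1.0],
                  [2.0, 0.0, 2.0],     # dependent copy of row 1
                  [0.0, 1.0, 0.0]])
T_ind = T_dep[[0, 2], :]               # an independent subset of the rows

P_dep = np.eye(3) - np.linalg.pinv(T_dep) @ T_dep
P_ind = np.eye(3) - np.linalg.pinv(T_ind) @ T_ind
assert np.allclose(P_dep, P_ind)       # identical projector either way
assert np.allclose(T_dep @ P_dep, 0)   # projected vectors satisfy T d = 0
```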

Therefore, in comparing the two dual forms the generalized inverse

of the support matrix will be derived.

Other necessary computations for the algorithm involve finding

gradients of the objective functions and solving the one-dimensional

maximization problem.

An important consequence of both {D.1} and {CD.1} with respect to feasible direction algorithms is that initiating such an algorithm at a dual feasible point will be sufficient to ensure that no infeasible dual solutions are generated. For {D.1}, Theorem 3.8 and Corollary 3.8.1 are seen to prove this statement.

Therefore, use of the projected gradient algorithm generates feasible

directions and the one-dimensional maximization will result in dual

feasible solutions.

4.3 Projection Matrices

The support matrices of the two dual problems will be determined and their null space projectors defined. To distinguish between the two, a subscript D will designate matrices for {D.1} and a subscript c those corresponding to {CD.1}.

To simplify the notation, assume that the dual variables y_1, y_2, ..., y_s, s < m, are zero and the active constraints of {D.1} are arranged as

y_1 = 0
y_2 = 0
...
y_s = 0
(I−Q⁻(y)Q(y))Hy = 0,

where y ∈ F. The matrix defining the support manifold at the intersection of these active constraints is given by

T_D = [ I        0 ]
      [ (I−Q⁻Q)H   ]

where I ∈ E^{s×s}, 0 ∈ E^{s×(m−s)}, (I−Q⁻Q)H ∈ E^{n×m}, and it is understood that H = (h_1,h_2,...,h_m), the h_0 column being removed due to y_0 being constant for all y ∈ F. T_D is seen to be an (s+n)×m matrix and its null space projection matrix is defined as

P_D = (I−T_D⁻T_D).

By Theorem A.15,

T_D^tT_D^t⁻ = UU⁻ + CC⁻

where U = [I ; 0] ∈ E^{m×s}, V = H^t(I−Q⁻Q) ∈ E^{m×n}, and C = (I−UU⁻)V.

By Corollary A.3.4, U⁻ = [I, 0] and

UU⁻ = [ I 0 ]
      [ 0 0 ]  ∈ E^{m×m}, I ∈ E^{s×s}.

Therefore,

C = (I−UU⁻)V = [ 0 0 ]
               [ 0 I ] H^t(I−Q⁻Q), I ∈ E^{(m−s)×(m−s)},

and, partitioning H as H = (H_s, H̄_s), where H_s = (h_1,h_2,...,h_s) and H̄_s = (h_{s+1},h_{s+2},...,h_m), it is seen that

C = [ 0           ]
    [ H̄_s^t(I−Q⁻Q) ]  ∈ E^{m×n}, 0 ∈ E^{s×n}.

Again applying Corollary A.3.4 results in

CC⁻ = [ 0  0                                ]
      [ 0  H̄_s^t(I−Q⁻Q)[H̄_s^t(I−Q⁻Q)]⁻ ]  ∈ E^{m×m}

and

H̄_s^t(I−Q⁻Q)[H̄_s^t(I−Q⁻Q)]⁻ = [(I−Q⁻Q)H̄_s]⁻[(I−Q⁻Q)H̄_s],

which reduces to

[(I−Q⁻Q)H̄_s]⁻H̄_s

since, by Theorem A.16, [(I−Q⁻Q)H̄_s]⁻Q⁻Q = 0.

Combining the above,

P_D = [ 0  0                       ]
      [ 0  I−[(I−Q⁻Q)H̄_s]⁻H̄_s ]  ∈ E^{m×m}

with {I−[(I−Q⁻Q)H̄_s]⁻H̄_s} ∈ E^{(m−s)×(m−s)}.

Hence, in applying the projected gradient algorithm to {D.1} it is only necessary to work with the dual variables which are positive, i.e., the y_j such that j ∈ N(y).

Now turning to {CD.1}, again assume that y_1, y_2, ..., y_s = 0 and the set of active constraints is

y_1 = 0
y_2 = 0
...
y_s = 0
Σ_{j=0}^m B_j^tz_j − Σ_{j=0}^m h_jy_j = 0.

From this, the matrix defining the support manifold of active constraints at the point y is

T_c = [ I   0   ]
      [ −H  B^t ]  ∈ E^{(s+n)×(m̄+m)}

where I ∈ E^{s×s}, m̄ = (m+1)n, B^t = (B_0^t,B_1^t,...,B_m^t), and H = (h_1,h_2,...,h_m).

The null space projection matrix for {CD.1} is then

P_c = (I−T_c⁻T_c).

Applying Theorem A.15,

T_c^tT_c^t⁻ = UU⁻ + CC⁻

with U = [I ; 0] and C = (I−UU⁻)[−H^t ; B].

Corollary A.3.4 gives

UU⁻ = [ I 0 ]
      [ 0 0 ]

resulting in

C = [ 0       ]
    [ −H̄_s^t ]  , 0 ∈ E^{s×n},
    [ B       ]

and

CC⁻ = [ 0  0                               ]
      [ 0  [−H̄_s^t ; B][−H̄_s^t ; B]⁻ ].

Therefore,

P_c = [ 0  0                                 ]
      [ 0  I−(−H̄_s, B^t)⁻(−H̄_s, B^t) ]  ∈ E^{(m̄+m)×(m̄+m)}

where [I−(−H̄_s, B^t)⁻(−H̄_s, B^t)] ∈ E^{(m̄+m−s)×(m̄+m−s)}. Further use of Theorem A.15 will not simplify this form, but again it is seen that the projection matrix P_c depends on the positive y_j and not on the dual variable vectors z_j.

4.4 Objective Function Gradient

The gradient of ψ(y) is given in Theorem 3.8. For {CD.1}, the objective function is

T(y,z) = −½ Σ_{j∈N(y)} (1/y_j)z_j^tz_j + c^ty

and its gradient is

∇T(y,z) = [∂T/∂y_k, ∂T/∂z_k]

where

∂T/∂y_k = ½(1/y_k²)z_k^tz_k + c_k for k = 1,2,...,m,

∂T/∂z_k = −(1/y_k)z_k for k = 0,1,...,m,

defined only for y_k > 0.

4.5 Definite Problems

By definition of definite problems, Q(y) is nonsingular, and by Corollary 3.1.2 {D.1} specializes to {D.3}.

It is clear that for {D.3}: F = F*, the selection of an initial y ∈ ri F is trivial, and the optimal dual solution completely defines the optimal primal solution.

For {CD.1} the major benefit of the definite problem is in the determination of an initial feasible solution point. That is, say Q_0 is positive definite; then

z_0 = (B_0^t)⁻¹[Σ_{j=0}^m h_jy_j − Σ_{j=1}^m B_j^tz_j].

Arbitrarily selecting y_j > 0 and z_j for j = 1,2,...,m, computing z_0, and recalling that y_0 = 1, results in an initial feasible dual solution.

In terms of applying a projected gradient algorithm it is seen that

for {D.1} the projection matrix reduces to

P_D = \begin{bmatrix} 0 & 0 \\ 0 & I \end{bmatrix}

where I \in E^{(m-s) \times (m-s)}, but in contrast, for {CD.1}, using Theorem A.6,

P_c = \begin{bmatrix} 0 & 0 \\ 0 & I - (-\bar H_s, \bar B^t)^t (\bar H_s \bar H_s^t + Q(1))^{-1} (-\bar H_s, \bar B^t) \end{bmatrix}

where Q(1) = \sum_{j=0}^{m} Q_j.

One means of comparing algorithms is to calculate the number of

multiplications required. This basis of comparison, while not foolproof,

can be used as an indicator of relative computing requirements.

Table 2 details the multiplications for {D.1} and {CD.1} applied to

definite problems. For convenience it has been assumed that y_1, y_2, ..., y_s = 0

for both {D.1} and {CD.1} and that the Choleski method of matrix inversion

was employed (see Clasen [15] and Westlake [70]). Also, it is assumed that

for a given primal problem {D.1} and {CD.1} will be solved in the same

number of iterations.

An initial observation to be made is that the determination of P_c and

P_c∇T(y,z) contributes the majority of the iteration operations to {CD.1}.

This implies that if it were known that the optimal solution was in the

relative interior of F_c, the feasible region of {CD.1}, then the iteration

count for {CD.1} would be drastically reduced and other considerations

relative to the two duals would come into play.

Another case where the multiplication counts are significantly

different is quadratic programming. Here the iteration counts for both

duals are reduced. That is, for {D.1} it is only necessary to compute

one inverse at the initiation of the algorithm and for {CD.1} the order

of P_c is reduced from (m+1)n+m-s to n+m-s. The cumulative result of

these reductions, though, is that while the order of P_c is reduced it is

still necessary to invert an n×n matrix for each iteration not in the

relative interior of F and {D.1} still offers significantly fewer

multiplicative operations.

For the general case, m proper quadratic constraints, it is clear

that for noninterior point solutions {D.1} incurs fewer multiplicative

operations than does {CD.1}.

Other characteristics to be considered are the relative size of the

two duals and associated computer storage requirements. {D.1} has m dual

Table 2. Multiplication counts for {D.1} and {CD.1} applied to definite problems.

variables whereas {CD.1} has \sum_{j=0}^{m} p_j + m dual variables, p_j being the

rank of Q_j. For example, if n = 50 and m = 25, assuming the average

p_j = 25, {D.1} would require 25 dual variables while {CD.1} would require

650 dual variables, 26 times as many. Computer storage requirements will

differ in a comparable fashion also. That is, even though {D.1} needs to

store the same number of matrices as {CD.1}, the Q_j and B_j, it will be

necessary to maintain storage for the matrix Pc, which will be of order

1300 x 1300, and the matrix of zj dual variables.

In conclusion, then, for definite problems the generalized inverse

dual, {D.1} specialized to {D.3}, should prove to be favorable over the

conjugate dual, {CD.1}, computationally.

4.6 Semidefinite Problems

The comparison of {D.1} to {CD.1} for semidefinite problems is not

so clear cut. The major difficulty is that of determining an initial

feasible dual vector, preferably in the relative interior of F. This

problem is common to both dual forms and, therefore, will only be dis-

cussed as related to {D.1}.

The ideal initial dual vector for {D.1}, as stated, would be one in

the relative interior of F. The following are classifications for which

such an initial vector can be readily determined.

(a) For some k ∈ {0,1,2,...,m}, φ_k is strictly convex.

This implies that Q_k is positive definite and F* is the non-

negative orthant. The initial point, y^1, can be selected such

that y_j^1 > 0 for j = 1,2,...,m, y^1 ∈ ri F.

(b) For j,k = 0,1,2,...,m, h_j is in the column space of Q_k. Again

F* is the nonnegative orthant and y^1 can be selected such that

y_j^1 > 0, j = 1,2,...,m, y^1 ∈ ri F.

The next set of classifications determine an initial dual vector y^1

in F, either a relative boundary or relative interior point.

(c) h_0 is in the column space of Q_0; then y_j = 0 for j = 1,2,...,m

will result in y ∈ F. Also, if there is an index set

I_r = {0, j_1, j_2, ..., j_r} such that for k ∈ I_r, h_k is in the column

space of some Q_j, j ∈ I_r, then y_k > 0 for k ∈ I_r, y_k = 0 for

k ∉ I_r results in y ∈ F.

(d) There exists y ≥ 0 such that Hy = 0. This clearly results in

y ∈ F. Hy = 0 and y ≥ 0 can be expressed by the system

\sum_{j=1}^{m} h_j y_j = -h_0,   (4.6.1)

y_j ≥ 0.   (4.6.2)

To facilitate recognition of this characterization it is noted that

by Farkas' theorem [43] there will either exist a solution to the system

(4.6.1), (4.6.2) or to

h_j^t x ≤ 0 for j = 1,2,...,m   (4.6.3)

-h_0^t x < 0   (4.6.4)

but not to both. The system (4.6.3), (4.6.4) is readily recognized

through a linear programming formulation. By solving

{LP.1} minimize λ(x) = -h_0^t x

subject to: h_j^t x ≤ 0

for j = 1,2,...,m

a solution to the system (4.6.3), (4.6.4) is determined to exist or not.

That is, if the optimal objective function value for {LP.1} is nonnegative

then (4.6.1), (4.6.2) has a solution which is in F.
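Classification (d) can also be tested directly on small instances. The sketch below (all data hypothetical, chosen so the answer is affirmative) solves the linear system of (4.6.1) by least squares and checks nonnegativity; in general the {LP.1} formulation above is the reliable test, since least squares need not return a nonnegative solution:

```python
import numpy as np

# Classification (d): does some y >= 0 solve  sum_j h_j y_j = -h_0 ?
# All data below are hypothetical, chosen so that the answer is yes.
H = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])       # columns are h_1, h_2, h_3
h0 = np.array([-2.0, -3.0])

# Minimum-norm least-squares solution of H y = -h_0
y, *_ = np.linalg.lstsq(H, -h0, rcond=None)
feasible = bool(np.allclose(H @ y, -h0) and np.all(y >= 0))
```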

Obviously, not all problems will fall into the preceding categories

and even if they did the recognition of a particular category is not

especially straightforward. The introduction of the following technique

is intended to offer one approach to alleviating the difficulty by using

an extension of the 'Big M' method, see Charnes [12], of linear pro-

gramming which was also designed for determining initial feasible

solutions.

Taking {P.1}, add the m+1st constraint

φ_{m+1}(x) = \frac{1}{2} x^t I x - M ≤ 0

with M a large positive number. {P.1} with this constraint will be

referred to as the auxiliary problem, {AP.1}. Clearly, φ_{m+1}(x) is

strictly convex and thereby {AP.1} falls into classification (a) for

selecting an initial feasible solution to the dual auxiliary problem,

{AD.1}. The objective function of {AD.1} is

ψ(y) = -\frac{1}{2} \bar y^t \bar H^t \bar Q^-(\bar y) \bar H \bar y + \bar c^t \bar y

where

\bar y^t = (y_0, y_1, ..., y_m, y_{m+1}),

\bar H = (h_0, h_1, ..., h_m, 0),

\bar Q(\bar y) = \sum_{j=0}^{m} y_j Q_j + y_{m+1} I,

and \bar c^t = (c_0, c_1, ..., c_m, -M). Maximizing ψ(y) using an initial

feasible solution containing y_{m+1} > 0 will lead to a feasible solution

of {D.1}, driving y_{m+1} to zero, if such a feasible solution exists. As

in the two phase method of linear programming, see Dantzig [20], when

y_{m+1} is reduced to zero, continued iterations should be performed using

{D.1} instead of {AD.1}, since y_{m+1} will never be made positive again

due to the influence of -M on the objective function.

The above technique, while not particularly elegant, is simple to

implement and will determine a feasible solution to {D.1} or indicate that

{D.1} is infeasible.

Another potentially useful technique for solving semidefinite problems

is again based on an auxiliary problem which is definite. The theoretical

basis of the approach is found in Fiacco and McCormick [28, Theorem 37].

In essence, the proof is that if a sequence of {P.1} problems is solved

with objective functions

φ_0^i(x_i) = \frac{1}{2} x_i^t (Q_0 + ε_i I) x_i + h_0^t x_i + c_0,

ε_i > 0 and \lim_{i \to \infty} ε_i = 0,

then the solution to the original {P.1} is the limit of x_i^o as i → ∞. This

approach is computationally advantageous in that each problem in the

sequence is definite and can be solved using {D.1} with no additional

dual variables required. The disadvantage is the requirement to solve a

sequence of problems.
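The limiting behavior underlying this approach can be illustrated on a single unconstrained quadratic with a singular Q_0. Assuming h_0 lies in the column space of Q_0, the regularized minimizers -(Q_0 + ε_i I)^{-1} h_0 approach the minimum-norm stationary point -Q_0^- h_0 (the data below are made up):

```python
import numpy as np

# Singular positive semidefinite Q0, with h0 in its column space
Q0 = np.diag([2.0, 1.0, 0.0])
h0 = np.array([2.0, -1.0, 0.0])

# Regularized minimizers x_i = -(Q0 + eps_i I)^{-1} h0 as eps_i -> 0
xs = [-np.linalg.solve(Q0 + eps * np.eye(3), h0)
      for eps in (1e-1, 1e-3, 1e-6)]

# Candidate limit: the minimum-norm stationary point -Q0^- h0
x_lim = -np.linalg.pinv(Q0) @ h0
```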

For semidefinite problems both {D.1} and {CD.1} involve the

computation of generalized inverses. In recent years, a sizable litera-

ture has developed relating to generalized inverse computational methods,

but experimental results have yet to indicate preferable methods for

classes of matrices. Rust, Burrus and Schneeberger [62] have developed

a FORTRAN program based on Gram-Schmidt orthogonalization while

Greville [33] and Tewarson [67] have developed methods for special

partitioned matrices. Computational methods based on Gauss-Jordan elimi-

nation have been proposed by Ben-Israel and Wersan [6 ] and Noble [49];

whereas Pyle [56] has a method employing a gradient projection method.


Also, Hestenes' [36] biorthogonalization technique has been extended,

see Boullion and Odell [ 8], to computing generalized inverses and

Decell [23] and Ben-Israel and Charnes [5 ] have an approach which is

based on the Cayley-Hamilton theorem.

The question of what technique might prove best for evaluating the

objective of {D.1} is a numerical one which requires further research.
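Whatever computational method is chosen, a candidate generalized inverse can always be validated against the four Penrose conditions (see Appendix A). A sketch using the SVD-based routine in numpy, which is not one of the specialized methods cited above:

```python
import numpy as np

rng = np.random.default_rng(1)
# A deliberately rank-deficient 5x4 matrix (rank at most 3)
A = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))
X = np.linalg.pinv(A)               # SVD-based candidate for A^-

# The four Penrose conditions characterizing the generalized inverse
c1 = np.allclose(A @ X @ A, A)
c2 = np.allclose(X @ A @ X, X)
c3 = np.allclose((A @ X).T, A @ X)
c4 = np.allclose((X @ A).T, X @ A)
```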

CHAPTER 5

APPLICATIONS

5.1 Introduction

The intent of this chapter is to present some areas of application

in which the generalized inverse dual, {D.1}, should prove useful.

It is demonstrated that the generalized inverse dual of some dual

forms results in the solution of the primal problem which was nondiffer-

entiable on the primal feasible space. Also, it is seen that such

problems evidence other simplifications in the generalized inverse dual

form.

To simplify the notation for presentation, some assumptions have

been made which could be relaxed for discussing these applications in

detail.

5.2 Multifacility Euclidean Distance Location Problem

This problem is one of locating N new facilities in E^n such that

the maximum weighted Euclidean distance from M existing facilities is

minimized.

The problem is a minimax version of the well-known Fermat problem

which is addressed by Kuhn [39]. (For additional references, see Cabot,

Francis, and Stary [10].) Other distance measures for the problem have

also been considered; most notably the rectilinear measure considered

by Dearing and Francis [22].

The primal Euclidean distance minimax multifacility location problem

is:

{M.1} minimize over x_1, ..., x_N: maximum { w_ji ||x_j - a_i||^2 for all j and i;

u_jk ||x_j - x_k||^2 for 1 ≤ j < k ≤ N }

where:

a_i ∈ E^n represents the existing facility site for i = 1,2,...,M.

x_j ∈ E^n is the new facility site for j = 1,2,...,N.

w_ji ∈ E is a given nonnegative weight (squared) representing

interaction between new facility x_j and existing

facility a_i for all j and i.

u_jk ∈ E is a given nonnegative weight (squared) representing

interaction between new facilities x_j and x_k for

1 ≤ j < k ≤ N.

|| · || is the Euclidean norm function.

{M.1} can be written as a constrained nonlinear programming problem

by letting the variable s represent the minimand;

{M.2} minimize s

subject to: w_ji ||x_j - a_i||^2 - s ≤ 0

for j = 1,2,...,N and i = 1,2,...,M,

u_jk ||x_j - x_k||^2 - s ≤ 0

for 1 ≤ j < k ≤ N.

To facilitate the presentation it will be assumed that wji, j = 1,

2,...,N and i = 1,2,...,M, are all positive and that ujk, for all

1 < j < k < N are also positive. This assumption can be replaced by a

chaining assumption, see Cabot and Francis [ 9] for discussion, which

insures the problem being well formulated.

Using the Euclidean norm squared leads to the following interpreta-

tion of {M.2},

{M.3} minimize p_0^t x

subject to: \frac{1}{2} x^t D_j x + h_ji^t x + c_i ≤ 0   (5.2.1)

for j = 1,2,...,N and i = 1,2,...,M,

\frac{1}{2} x^t D_jk x + p_jk^t x ≤ 0   (5.2.2)

for 1 ≤ j < k ≤ N,

where now x^t = (x_1^t, x_2^t, ..., x_N^t, s) ∈ E^{nN+1},

x_j = (x_1j, x_2j, ..., x_nj)^t ∈ E^n,

h_ji = (0_1^t, 0_2^t, ..., 0_{j-1}^t, -2a_i^t, 0_{j+1}^t, ..., 0_N^t, -1/w_ji)^t ∈ E^{nN+1}

with 0 the zero vector in E^n,

p_jk = (0_1^t, 0_2^t, ..., 0_N^t, -1/u_jk)^t ∈ E^{nN+1},

p_0 = (0_1^t, 0_2^t, ..., 0_N^t, 1)^t,

c_i = a_i^t a_i ∈ E, and D_j and D_jk are matrices.

Let D be an (nN+1)×(nN+1) matrix represented by

D = \begin{bmatrix} d_11 I & d_12 I & \cdots & d_1N I & 0 \\ d_21 I & d_22 I & \cdots & d_2N I & 0 \\ \vdots & & & \vdots & \vdots \\ d_N1 I & d_N2 I & \cdots & d_NN I & 0 \\ 0^t & 0^t & \cdots & 0^t & 0 \end{bmatrix}

where d_ij ∈ E, I ∈ E^{n×n} and 0 ∈ E^n. The matrices D_j and D_jk are

defined as follows:

D_j = D such that d_jj = 2 and all other d_ik = 0;

D_jk = D such that d_jj = d_kk = 2, d_jk = d_kj = -2, and all other

d_ik = 0.

Clearly, {M.3} is a quadratically constrained quadratic programming

problem and, by definition of the Euclidean norm, D_j and D_jk for all

j and k are positive semidefinite.
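The block structure of D_j and D_jk can be assembled directly and their positive semidefiniteness checked numerically. A sketch with hypothetical sizes N = 3, n = 2:

```python
import numpy as np

N, n = 3, 2                      # 3 new facilities in the plane (illustrative)
dim = n * N + 1

def D_from_pattern(d):
    """Expand an NxN coefficient pattern d into the (nN+1)x(nN+1) matrix D."""
    D = np.zeros((dim, dim))
    D[:n * N, :n * N] = np.kron(d, np.eye(n))
    return D

def D_j(j):                      # d_jj = 2, all other entries 0
    d = np.zeros((N, N)); d[j, j] = 2.0
    return D_from_pattern(d)

def D_jk(j, k):                  # d_jj = d_kk = 2, d_jk = d_kj = -2
    d = np.zeros((N, N)); d[j, j] = d[k, k] = 2.0; d[j, k] = d[k, j] = -2.0
    return D_from_pattern(d)

# Both families are positive semidefinite, as required by {M.3}.
eigs = np.concatenate([np.linalg.eigvalsh(D_j(0)),
                       np.linalg.eigvalsh(D_jk(0, 2))])
```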

Let the dual variables associated with the primal constraints (5.2.1)

be designated by y_ji and those associated with (5.2.2) be designated by

v_jk.

Hence, by Theorem 3.1, the dual of {M.3} is

{MD.1} maximize ψ(y) = -\frac{1}{2} y^t H^t Q^- H y + \sum_{j=1}^{N} \sum_{i=1}^{M} y_ji c_i

subject to: y ≥ 0, y ∈ E^{NM+[N(N-1)/2]+1},

y^t = (y_0, \bar y^t, v^t),

y_0 = 1,

\bar y = (y_11, y_12, ..., y_1M, ..., y_N1, ..., y_NM)^t,

v = (v_12, v_13, ..., v_1N, v_23, ..., v_2N, ..., v_{N-1,N})^t,

(I - QQ^-) H y = 0,

where Q = \sum_{j=1}^{N} \sum_{i=1}^{M} y_ji D_j + \sum_{j=1}^{N-1} \sum_{k=j+1}^{N} v_jk D_jk,

H = (p_0, h_11, h_12, ..., h_NM, p_12, ..., p_1N, p_23, ..., p_{N-1,N}).

It is noted that

Hy = \begin{bmatrix} -2 \sum_{i=1}^{M} y_1i a_i \\ -2 \sum_{i=1}^{M} y_2i a_i \\ \vdots \\ -2 \sum_{i=1}^{M} y_Ni a_i \\ 1 - \sum_{j=1}^{N} \sum_{i=1}^{M} y_ji / w_ji - \sum_{j=1}^{N-1} \sum_{k=j+1}^{N} v_jk / u_jk \end{bmatrix}.

Also, by defining

a_jj = \sum_{i=1}^{M} y_ji + \sum_{k=j+1}^{N} v_jk + \sum_{m=1}^{j-1} v_mj   (5.2.3)

for j = 1,2,...,N, using the convention \sum_{k=N+1}^{N} = 0,

a_ij = -v_ij for 1 ≤ i < j ≤ N,   (5.2.4)

a_ji = a_ij,   (5.2.5)

it is seen that

Q = \begin{bmatrix} \bar A & 0 \\ 0^t & 0 \end{bmatrix}

where \bar A = I × 2A, A = (a_nm) ∈ E^{N×N}, I ∈ E^{n×n}, and × denotes the Kronecker

product. By Corollary A.3.4,

Q^- = \begin{bmatrix} \bar A^- & 0 \\ 0^t & 0 \end{bmatrix}

and by Theorem A.12,

\bar A^- = I × \frac{1}{2} A^-.

Therefore,

F = \{ y \mid y ≥ 0, y_0 = 1, \begin{bmatrix} I - I×AA^- & 0 \\ 0^t & 1 \end{bmatrix} Hy = 0 \}.

Looking at the matrix A it is noted that by setting all dual

variables positive and applying the following known theorem of linear

algebra, stated here as a lemma with proof omitted, it is seen that A

has rank N.

Lemma 5.1. If A is a symmetric N×N matrix and if a_ii > \sum_{j \ne i} |a_ij|

for i = 1,2,...,N, then A is a positive definite matrix.
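Lemma 5.1 is easy to exercise numerically: a symmetric matrix forced to be strictly diagonally dominant admits a Cholesky factorization, which exists only for positive definite matrices (illustrative random data):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
B = rng.normal(size=(N, N))
A = (B + B.T) / 2                   # symmetric with arbitrary off-diagonals
# Force strict diagonal dominance: a_ii > sum_{j != i} |a_ij|
np.fill_diagonal(A, np.abs(A).sum(axis=1) - np.abs(np.diag(A)) + 1.0)

L = np.linalg.cholesky(A)           # succeeds only for positive definite A
min_eig = np.linalg.eigvalsh(A).min()
```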

Therefore, A is nonsingular and

F* = \{ y \mid y ≥ 0, y_0 = 1, \sum_{j=1}^{N} \sum_{i=1}^{M} y_ji / w_ji + \sum_{j=1}^{N-1} \sum_{k=j+1}^{N} v_jk / u_jk = 1 \}

with an index maximal dual feasible vector being one which has all

components positive. The selection of an initial solution in ri F* = ri F

is trivial.

Furthermore, for any feasible vector y it is seen by (5.2.3), (5.2.4)

and (5.2.5) that if a_jj is zero for a particular j then the elements of

the associated row and column are zero. Therefore, by an elementary,

orthogonal, row and column transformation, E, A can be expressed as

A = E^t \begin{bmatrix} A_11 & 0 \\ 0 & 0 \end{bmatrix} E

where A_11 has positive diagonal elements.

It can then be shown that

A^- = E^t \begin{bmatrix} A_11^- & 0 \\ 0 & 0 \end{bmatrix} E

and by Corollary A.3.4

AA^- = E^t \begin{bmatrix} A_11 A_11^- & 0 \\ 0 & 0 \end{bmatrix} E.

Often it will hold that A_11 is also positive definite, in which

case A_11^- = A_11^{-1} and the dual is further simplified.

Thus, it is seen that if ψ(y), a concave objective function, is

maximized by application of a feasible direction algorithm (feasible

directions to F) over F*, a linearly constrained region, then such a

solution also solves {MD.1}. The computational difficulties of solving

this dual are of size (dimension of Q and H) and not complexity, as the

foregoing demonstrates.

The analysis of this section can also be extended to include multi-

facility problems with constraints. This would encompass some type of

constraint on the weighted distance between any two facilities, either

existing or new. For a discussion of this problem see Dearing [21].

5.3 Sinha Duality

Another application of {D.1} is in solving Sinha's [65] problem.

Sinha has established a dual for one formulation of the stochastic

programming problem. The primal and dual forms of Sinha are

{SP.1} maximize φ(x) = d^t x - \sum_{j=1}^{m} (x^t Q_j x)^{1/2}

subject to: Ax ≤ b

x ≥ 0

where d, x ∈ E^n, Q_j ∈ E^{n×n} and symmetric positive semidefinite, A ∈ E^{s×n},

b ∈ E^s.

{SD.1} minimize Ω(z) = b^t z

subject to: A^t z + \sum_{j=1}^{m} Q_j w_j ≥ d

w_j^t Q_j w_j ≤ 1, for j = 1,2,...,m

z ≥ 0

where w_j ∈ E^n for j = 1,2,...,m and z ∈ E^s.

Making the following notational definitions,

w^t = (z^t, w_1^t, w_2^t, ..., w_m^t) ∈ E^{s+mn},

\bar b^t = (b^t, 0, 0, ..., 0) ∈ E^{s+mn},

\bar Q^t = (A^t, Q_1, Q_2, ..., Q_m) ∈ E^{n×(s+mn)},

\hat Q_j ∈ E^{(s+mn)×(s+mn)} an m+1 block diagonal matrix such that there

is first an s×s zero matrix and then m n×n matrices, the j+1st

matrix on the diagonal being Q_j, all others zero,

\tilde Q = (I, 0, 0, ..., 0) ∈ E^{s×(s+mn)},

Sinha's dual is now given by,

{SD.1'} minimize η(w) = \bar b^t w

subject to: \frac{1}{2} w^t \hat Q_j w - \frac{1}{2} ≤ 0

for j = 1,2,...,m

-\bar Q^t w + d ≤ 0

-\tilde Q w ≤ 0.

Defining y^t = (y_0, y_m^t, y_n^t, y_s^t) ∈ E^{m+n+s+1}, where y_0 = 1, y_m ∈ E^m,

y_n ∈ E^n, and y_s ∈ E^s, results in the following generalized inverse dual.

{D.5} maximize ψ(y) = -\frac{1}{2} \{ \sum_{j=1}^{m} [ y_n^t Q_j (y_mj Q_j)^- Q_j y_n + y_mj ] \} + d^t y_n

subject to: A y_n + I_s y_s = b

[I - (y_mj Q_j)(y_mj Q_j)^-] Q_j y_n = 0

for j = 1,2,...,m

y_m ≥ 0, y_n ≥ 0, y_s ≥ 0,

where it is seen that the vector y_s can be dropped and the first linear

equality converted to a linear inequality, A y_n ≤ b.

It is first observed that while Sinha's dual has s+mn variables,

{D.5} has only m+n. Furthermore, ψ(y) can be rewritten, based on the

properties of the generalized inverse and Corollary A.3.5, as

ψ(y) = -\frac{1}{2} \{ \sum_{j \in N(y_m)} [ \frac{1}{y_mj} y_n^t Q_j y_n + y_mj ] \} + d^t y_n

where N(y_m) = \{ j \mid y_mj > 0 \}, with Q_j y_n = 0 for j ∉ N(y_m).

The gradient of ψ(y) for any feasible point is likewise available

by Theorem 3.8, recalling that if (y_mj Q_j) is the zero matrix, then

its generalized inverse is also a zero matrix. An initial feasible

point for {D.5} can be derived by solving A y_n ≤ b, y_n ≥ 0, and

setting y_mj = 1 for all j = 1,2,...,m. Hence, solution procedures for

{D.5} will require no matrix inversion computations and {D.5} is therefore

computationally tractable.

The merit of {D.5} is exemplified in the next theorem.

Theorem 5.1. Given that (y_n^{ot}, y_m^{ot})^t solves {D.5}, then x^o = y_n^o

solves {SP.1}. Also, if x^o solves {SP.1}, then (x^{ot}, y_m^{ot})^t solves {D.5}

where y_mj^o = (x^{ot} Q_j x^o)^{1/2}.

Proof of Theorem 5.1. Clearly x = y_n^o is feasible to {SP.1}.

Also, for j ∈ N(y_m^o), the optimality conditions give

y_mj^{o2} = y_n^{ot} Q_j y_n^o and y_mj^o = (y_n^{ot} Q_j y_n^o)^{1/2}.

Substituting into ψ(y),

ψ(y^o) = -\sum_{j \in N(y_m^o)} (y_n^{ot} Q_j y_n^o)^{1/2} + d^t y_n^o = φ(x^o = y_n^o),

where it is recalled that if j ∉ N(y_m^o), Q_j y_n^o = 0. Then φ(x = y_n^o) ≥ φ(x)

for all feasible x to {SP.1}; assuming otherwise leads to a contradiction

on y^o being optimal to {D.5}.

Now, assuming that x^o is a solution to {SP.1}, it is seen that by

letting y_mj^o = (x^{ot} Q_j x^o)^{1/2}, (x^{ot}, y_m^{ot})^t is feasible to {D.5} and

ψ(x^o, y_m^o) = φ(x^o). Also, ψ(x^o, y_m^o) ≥ ψ(y_n, y_m) for (y_n^t, y_m^t)^t feasible to

{D.5}, as otherwise a contradiction results to x^o being optimal to {SP.1}.

Q.E.D.
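The relation ψ(y^o) = φ(x^o) can be checked numerically: for any x, setting y_mj = (x^t Q_j x)^{1/2} makes the simplified dual objective coincide with the primal objective (random illustrative data; sizes n, m are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
Qs = []
for _ in range(m):
    B = rng.normal(size=(n, 2))
    Qs.append(B @ B.T)              # symmetric positive semidefinite Q_j
d = rng.normal(size=n)
x = rng.normal(size=n)              # any trial point; plays the role of y_n

# Sinha primal objective: phi(x) = d'x - sum_j (x' Q_j x)^{1/2}
phi = d @ x - sum(np.sqrt(x @ Q @ x) for Q in Qs)

# Dual variables y_mj = (x' Q_j x)^{1/2} and the simplified dual objective
ym = np.array([np.sqrt(x @ Q @ x) for Q in Qs])
psi = -0.5 * sum((x @ Q @ x) / ym[j] + ym[j]
                 for j, Q in enumerate(Qs)) + d @ x
```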

By using {D.5} instead of {SP.1} the difficulty of the nondifferen-

tiability of φ(x) is avoided by expanding the problem dimensionality to

n+m. This, opposed to {SD.1} having dimensionality of s+mn, should offer

significant advantages in developing efficient computational procedures.

Also, solutions to {D.5} define solutions to {SP.1} by the above theorem,

and it is unnecessary to invoke the added complexity of computing primal

variables from dual solutions to {SD.1}, which, as Sinha discusses, will

often involve solution of a linear program.

Obvious application of {D.5} is to stochastic programming problems,

but another application is found in the work of Cabot and Francis [9]

who used a formulation of Sinha's dual in investigating a multifacility

location problem involving Euclidean distances. Again the difficulty of

the primal problem involves the nondifferentiability of the objective

function at feasible points.

Cabot and Francis [9] employ an equality constrained form of

Sinha's primal and develop an equality constrained dual. In handling

this through {D.5} there are two alternatives. First, as per section

3.6, the linear equality constraint could be used to convert the Sinha

dual to an equivalent form with no equality constraints and then

formulate {D.5}. Alternatively, the equality constraint could be

retained, resulting in y_n being unrestricted in {D.5}. Because of

the necessity to compute the generalized inverse of an n×(mn+s) matrix

with the first alternative and, by Theorem 5.1, the fact that the

second alternative results in the primal solution, the second alterna-

tive should prove preferable even at the cost of more variables.


5.4 General Fermat Problem

The Fermat problem dates from the 17th Century and was originally

stated as: Given three points in the plane, find a fourth point such

that the sum of its distances to the three given points is a minimum.

This form of the problem has been addressed by several authors; see

Kuhn [39] for a historical sketch.

The general Fermat problem is: given m points in the plane, find

the point which minimizes the sum of positively weighted distances to the

m points.

{GF.1} minimize φ_0(x) = \sum_{j=1}^{m} w_j ||x - p_j||

for w_j > 0, p_j the given jth point, and ||x - y|| the Euclidean distance

between points x and y.

From the preceding section this is recognized as a special case of

the Sinha primal. If one dualizes the corresponding Sinha dual, as in

section 5.3, a new version of {GF.1} results, namely

{D.6} minimize ψ'(x,y) = \frac{1}{2} \sum_{j=1}^{m} [ (x - p_j)^t (y_j I)^- (x - p_j) + y_j w_j^2 ]

subject to: y_j ≥ 0

[I - (y_j I)(y_j I)^-](x - p_j) = 0

for j = 1,2,...,m.

It is clear that F* = \{(x,y) \mid y ≥ 0\} and an algorithm can

readily be initiated at a relative interior point of F. Use of a

projected gradient algorithm for {D.6} is easily implemented by noting

that the projection matrix is diagonal, with zero diagonal elements

associated with y_j = 0 and x = p_j, and unity elsewhere.

Note that at the optimal point, for y_j > 0,

\frac{\partial ψ'(x,y)}{\partial y_j} = \frac{1}{2} \left[ -\frac{1}{y_j^2} (x^o - p_j)^t (x^o - p_j) + w_j^2 \right] = 0,

which implies that y_j^o = w_j^{-1} ||x^o - p_j|| and, as in Theorem 5.1, at optimality

ψ'(x^o, y^o) = φ_0(x^o).

By the theory of Chapter 3 the function ψ'(x,y) is convex and twice

continuously differentiable over y > 0, x unrestricted, the relative

interior of F. Furthermore, if some y_j → 0, ψ'(x,y) → ∞ unless x → p_j.

This is seen by noting that the numerator ||x - p_j||^2 goes to zero faster

than y_j. It is noted that this notion applies to the more general

multifacility location problem of Cabot and Francis [9].
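Minimizing ψ'(x,y) alternately in y (closed form, y_j = ||x - p_j||/w_j) and in x (a weighted average of the p_j) yields a scheme recognizable as Weiszfeld's algorithm. A sketch on an equilateral triangle with equal weights, where the optimum is the centroid; it assumes the iterates never coincide with a p_j:

```python
import numpy as np

# Equilateral triangle with equal weights: the optimum is the centroid.
p = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
w = np.ones(3)

x = np.array([0.8, 0.1])            # any start distinct from the p_j
for _ in range(500):
    y = np.linalg.norm(x - p, axis=1) / w    # minimize psi' over y, x fixed
    coef = 1.0 / y                            # equals w_j / ||x - p_j||
    x = (coef[:, None] * p).sum(axis=0) / coef.sum()  # minimize over x

centroid = p.mean(axis=0)
```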

5.5 Portfolio Selection

There exist several forms of this problem; examples are found in

Saaty and Bram [63], Markowitz [45], Mao and Särndal [44], Roy [61] and

Sharpe [64].

The problem addressed here is

n n

{PS.1} maximize I pjxj (minimize I pjxj)

j = j=l

n n

subject to: aij xx < a

J=l i=l

n

il xj c

x > 0

where a given sum of money c is to be allocated to n securities, μ_j is

the expected rate of return on the jth security, the nonzero symmetric

matrix (a_ij) is the covariance matrix of the random variables r_j, the

return rate on the jth security, and α is a constraint on the variance

allowable.

The equality constraint is equivalent to

x = (h^t)^- c + (I - (h^t)^- h^t) g, g ∈ E^n,

where h^t = (1,1,...,1) ∈ E^n, each element being unity. It is easily

shown that

(h^t)^- = \frac{1}{n} h, (h^t)(h^t)^- = 1, and

J = [I - (h^t)^- h^t] = \frac{1}{n} \begin{bmatrix} n-1 & -1 & \cdots & -1 \\ -1 & n-1 & \cdots & -1 \\ \vdots & & & \vdots \\ -1 & -1 & \cdots & n-1 \end{bmatrix} has rank n-1.

Hence,

x = \frac{c}{n} h + J g

and {PS.1} is transformed to determining g.
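This parameterization covers the budget hyperplane: since h^t J = 0, every choice of g yields Σ x_j = c. A quick numeric check (n and c here are hypothetical):

```python
import numpy as np

n, c = 5, 100.0
h = np.ones(n)
J = np.eye(n) - np.outer(h, h) / n   # J = I - (h^t)^- h^t

g = np.random.default_rng(4).normal(size=n)
x = (c / n) * h + J @ g              # the proposed parameterization

total = x.sum()                      # equals c for every g, since h'J = 0
rankJ = np.linalg.matrix_rank(J)     # n - 1
```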

{PS.2} minimize -\frac{c}{n} u^t h - u^t J g

subject to: \frac{1}{2} g^t J Q J g + \frac{c}{n} h^t Q J g + \left( \left(\frac{c}{n}\right)^2 \frac{1}{2} h^t Q h - α \right) ≤ 0

-J g - \frac{c}{n} h ≤ 0

where Q = 2A = 2(a_ij) ∈ E^{n×n},

u^t = (μ_1, μ_2, ..., μ_n) ∈ E^n and g ∈ E^n.

{PS.2} has a linear objective function, one quadratic constraint and

n linear inequality constraints.

The generalized inverse dual of {PS.2} is

{D.7} maximize ψ(y) = -\frac{1}{2} y^t H^t (y_1 Q')^- H y + \bar c^t y

subject to: y = (y_0, y_1, ..., y_{n+1})^t ≥ 0

y_0 = 1

C y = 0

where H = (-Ju, \frac{c}{n} J Q h, -J), Q' = J Q J,

C = [I - (y_1 J Q J)(y_1 J Q J)^-] H,

and

\bar c^t = \left( -\frac{c}{n} u^t h, -α + \left(\frac{c}{n}\right)^2 \frac{1}{2} h^t Q h, -\frac{c}{n}, ..., -\frac{c}{n} \right) ∈ E^{n+2}.

ψ(y) has been simplified by the readily proved identity

(JQJ)^- = J(JQJ)^-J.
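This identity holds because the row and column spaces of JQJ lie in the range of the projection J, on which J acts as the identity; it is easy to confirm numerically on made-up data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
h = np.ones(n)
J = np.eye(n) - np.outer(h, h) / n   # symmetric idempotent projection
B = rng.normal(size=(n, n))
Q = B @ B.T                          # any symmetric positive semidefinite Q

M = np.linalg.pinv(J @ Q @ J)        # (JQJ)^-
lhs, rhs = M, J @ M @ J              # the identity asserts lhs == rhs
```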

The form of {D.7} can possibly be further simplified depending on

the probability distribution assumed for the random variables r_j; that

is, if a multivariate normal distribution is assumed, A will be non-

singular and Cy = 0 will reduce to a single linear constraint for y > 0.

{D.7} has a concave objective function, linear constraints and

offers computational advantages over {PS.1}. The feasible set of {D.7}

permits the straightforward application of Rosen's gradient projection

algorithm and it is only necessary to compute the generalized inverse

of Q' = JQJ once.

5.6 Convex Programming

Topkis and Veinott [68] have stated conditions on the directions

and step sizes which assure convergence to a stationary point in feasible

direction algorithms for minimizing a real-valued continuous function on

a closed set. In particular, for constrained problems, they state second

order methods which require the solution of quadratically constrained

quadratic programs for the determination of feasible directions. See

Topkis and Veinott, Theorem 3 and Lemmas 4 and 5.

Given the mathematical program

minimize φ_0(x)

subject to: φ_j(x) ≤ 0

for j = 1,2,...,m

where the φ_j(x), j = 0,1,2,...,m, are convex and twice differentiable,

the basic second order method algorithm is stated as follows for

the k+1st solution given the kth feasible solution x^k.

(1) Is x^k optimal? If yes, stop; otherwise continue.

(2) Compute a feasible direction d^k by solution of the following

program,

minimize σ

subject to: [∇φ_0(x^k)]^t d + \frac{1}{2} d^t H(φ_0(x^k)) d ≤ σ

φ_j(x^k) + [∇φ_j(x^k)]^t d + \frac{1}{2} d^t H(φ_j(x^k)) d ≤ σ

for j = 1,2,...,m

d^t d ≤ 1

using the generalized inverse dual, {D.1}.

(3) Solve the following for λ_k:

minimize over λ ≥ 0: φ_0(x^k + λ d^k)

subject to: φ_j(x^k + λ d^k) ≤ 0

for j = 1,2,...,m.

Put x^{k+1} = x^k + λ_k d^k and return to Step (1).
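For {E.3} below, at the boundary point x^k = (1, 0), the Step (2) subproblem can be solved by brute force for illustration; this is a didactic check over a grid of the unit disk, not the {D.1}-based method proposed here:

```python
import numpy as np

# Step (2) subproblem for {E.3} at x^k = (1, 0).  Here phi_0 = x_2 has a
# zero Hessian and phi_1 = x_1^2 + x_2^2 - 1 is active, so the program is
#   minimize sigma(d) = max( d_2 , 2 d_1 + d_1^2 + d_2^2 )  s.t. ||d|| <= 1.
ts = np.linspace(-1.0, 1.0, 401)
best, d_best = np.inf, None
for d1 in ts:
    for d2 in ts:
        if d1 * d1 + d2 * d2 <= 1.0:
            sigma = max(d2, 2.0 * d1 + d1 * d1 + d2 * d2)
            if sigma < best:
                best, d_best = sigma, (d1, d2)
```

The resulting direction has both components negative, moving the iterate down and around the circular boundary toward the minimizer (0, -1).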

Topkis and Veinott, among others, point out that such an algorithm

could prove superior to available first order methods. To see this and

gain an appreciation of the potential of the algorithm the following

example is offered.

{E.3} minimize x_2

subject to: x_1^2 + x_2^2 ≤ 1.

The result of two iterations for this problem using the proposed

second order method is seen in Figure 5-1 as the dashed line. The

second step estimates the minimum at x_1 = -0.069 and x_2 = -0.998, where the

actual values for the minimum are x_1 = 0 and x_2 = -1.000. The correspond-

ing results for the first order method (in a first order method d^k is

obtained in Step (2) as the solution of a linear program) appear in

Figure 5-1 as the solid line.

The generalized inverse dual, {D.1}, of Chapter 3 offers a

computationally tractable method of determining second order feasible

directions, which has been the major drawback to the algorithm.

Figure 5-1. Iterates of the second order method (dashed) and the first order method (solid) applied to {E.3}.

CHAPTER 6

FUTURE RESEARCH

In the foregoing, a viable duality theory has been developed for

convex quadratically constrained quadratic programs. The results of this

research have exposed other pertinent areas in which further investigation

should prove fruitful. These areas can be broadly categorized as theory,

algorithm design, and numerical theory associated with algorithm imple-

mentation.

Attempts at generalizing the primal problem's convex quadratic functions

to pseudoconvex and quasiconvex quadratic functions have proved to be un-

satisfactory. That is, applying the results of Martos [46] and Cottle

and Ferland [19], it can readily be shown that the Lagrangian function

does not exhibit any usable mathematical structure. This is not totally

unexpected because, in general, sums of pseudoconvex or quasiconvex

functions are not pseudoconvex or quasiconvex. The significance, though,

lies in the fact that while convex quadratic functions impart to the

Lagrangian their mathematical structure, a generalization of convexity

for quadratic functions negates this structural transfer.

The research into which, if any, subclasses of the generalized con-

vex quadratically constrained quadratic programs evidence a viable

generalized inverse dual should continue. The duality theorems of

Karamardian [38], for Lagrangian functions which are strictly quasi-

convex in the primal variables and strictly quasiconcave in the dual

variables, could prove to be a starting point for defining these

subclasses. As noted, investigations to date have been based on

structural properties of the primal problem and not on the dual form.

Hence, one must determine if a recognizable mathematical structure on the

dual form dictates structural properties for the primal problem.

Another area for continued research is the extension of the concept

of an index maximal dual feasible vector as introduced in Chapter 3. It

was seen how this concept led to some very interesting theoretical results

for the class {P.1} and it is possible that a broader interpretation and

use can be established.

A prelude to the above could be the examination of minimal dimension

dual spaces. This idea is prompted by the demonstrated equivalence

between the generalized inverse dual and the conjugate function dual.

That is, the generalized inverse dual has but one dual variable associated

with each inequality constraint of the primal problem, whereas the conjugate

function dual associates n+1 dual variables with each inequality constraint,

so the generalized inverse dual space is smaller than that of the conjugate

dual space.

In a sense, the property of taking an n variable, m constraint

problem and developing an m variable, n constraint dual has been extended

to convex quadratically constrained quadratic programs. Can this extension

be carried further to other subclasses of convex programs, the dimension

of the dual space being determined by the number of inequalities of the

primal problem? If so, what advantages are there?

Another area of particular interest is the investigation of the

conjugate function in relation to the generalized inverse. Again this

research is prompted by the equivalence between the generalized inverse

dual and the conjugate function dual. What is the detailed relationship

between the two concepts for quadratic functions and can this relationship

be extended to other classes of functions?

Finally, the question of second-order feasible direction methods

should be investigated further in light of the generalized inverse dual.

The research of this topic falls into all three categories: theory,

algorithm design, and numerical theory.

With respect to algorithm design, there are two general areas for

future research, the first being design of an algorithm based on the

generalized inverse dual for semidefinite problems. Two possible

approaches to such an algorithm would be to design around the generalized

inverse dual exclusively or to use the generalized inverse dual as the

primary form and the conjugate function dual for unidirectional searches,

because of its simpler objective function.

The second area of algorithm design is based on the results reported

in Chapter 5. Specialized algorithms for the general Fermat problem and

extensions to the multifacility problem of Cabot and Francis could be

significant improvements over currently available algorithms.

A direct descendant of algorithm design is research into the numerical

analysis of algorithm implementation. It is first noted that the existing

implemented algorithm for definite problems does not take full advantage

of the state of the art. That is, an open question remains as to how

efficiently can definite problems be solved by use of the generalized

inverse dual.

For semidefinite problems many questions of implementation require

research. For example, would it be better to compute Q^-(y) or solve

Q(y)x = -Hy? Also, with respect to an algorithm design combining the

generalized inverse and conjugate function dual, what is required in


terms of computation to transfer from one dual form to the other?

In conclusion, it can be seen from this partial listing that

several interesting and potentially beneficial areas of research have

been opened by the introduction of the generalized inverse dual.

APPENDICES

APPENDIX A

THE MOORE-PENROSE GENERALIZED INVERSE

Ordinarily matrices are thought of as being either nonsingular or

singular and consequently having or not having an inverse. In 1920

Moore [47] published a paper on the reciprocal of a general matrix, but

it went unnoticed until 1955 when Penrose [50], unaware of Moore's work,

published a paper on the generalized inverse of a matrix. Penrose's

paper created the impetus for further research and several authors have

now added to the theory of what is known as the Moore-Penrose generalized

inverse. Because there is as yet no standard nomenclature, some authors

refer to it as the Moore-Penrose pseudo-inverse, pseudo-inverse, general

reciprocal, Moore-Penrose generalized inverse, or generalized inverse,

while others apply these and other names to various other matrix inverse

forms. The recent work of Boullion and Odell [8] has an extensive

bibliography on the subject. Herein, Moore-Penrose generalized inverse

and generalized inverse will be synonymous.

The major advantage of the Moore-Penrose generalized inverse is

that it exists and is unique for any matrix. This permits a unified

theoretical treatment of matrix calculus and greater flexibility in the

application of matrix methods to other fields.

As is often the case in matrix calculus, initial efforts at

developing the theory of the generalized inverse were carried on in the

field of statistics. A representative bibliography of these works is

found in Boullion and Odell.

The application of the generalized inverse theory to operations

research has been almost exclusively in the area of linear programming

and carried out by Cline [18], Pyle [55], and Charnes, Cooper and

Thompson [13]. Charnes and Kirby [14] have addressed the generalized

inverse applied to a convex programming problem.

This appendix is included to support the self-contained nature of

the dissertation and presents a compilation of the Moore-Penrose

generalized inverse theory which is germane to the extensions of

Chapter 2, the duality theory of Chapter 3, and the equivalence concepts

of Chapter 4. The theory herein presented is for real matrices, but the

extension to matrices over the complex field is immediate.

The first definition is due to Penrose [50]. The second and

alternate definition, due to Moore [47], is used for theoretical

development of derivative forms.

Definition. Let A be an n×m matrix. A matrix A⁻ which has the

following properties is called a generalized inverse of A:

(i) AA⁻ is symmetric

(ii) A⁻A is symmetric

(iii) AA⁻A = A

(iv) A⁻AA⁻ = A⁻
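The four defining properties can be checked numerically. The small example below (not part of the dissertation) uses NumPy's pinv, which computes the Moore-Penrose inverse, and verifies all four conditions for a rank-deficient rectangular matrix.

```python
import numpy as np

# Verify the four Penrose conditions for an arbitrary rectangular matrix.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # 2x3, rank 1
Ainv = np.linalg.pinv(A)             # 3x2 Moore-Penrose generalized inverse

assert np.allclose(A @ Ainv, (A @ Ainv).T)      # (i)   A A^- is symmetric
assert np.allclose(Ainv @ A, (Ainv @ A).T)      # (ii)  A^- A is symmetric
assert np.allclose(A @ Ainv @ A, A)             # (iii) A A^- A = A
assert np.allclose(Ainv @ A @ Ainv, Ainv)       # (iv)  A^- A A^- = A^-
```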

Definition. For any n×m matrix A, the generalized inverse is

defined as

    A⁻ = lim_{δ→0} (AᵗA + δ²I)⁻¹Aᵗ = lim_{δ→0} Aᵗ(AAᵗ + δ²I)⁻¹.

A discussion and proof of the equivalence of these two definitions

is found in Albert [1].
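The limit definition can be observed numerically: the regularized expression (AᵗA + δ²I)⁻¹Aᵗ is well defined even when AᵗA is singular, and it approaches the generalized inverse as δ shrinks. The example below is an illustration only, not taken from the text.

```python
import numpy as np

# A 3x2 matrix of rank 1, so A^t A is singular and has no ordinary inverse.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
Ainv = np.linalg.pinv(A)

# The regularized inverse converges to A^- as delta -> 0; the error in
# each entry shrinks on the order of delta**2 for this example.
for d in (1e-2, 1e-4, 1e-6):
    approx = np.linalg.inv(A.T @ A + d**2 * np.eye(2)) @ A.T
    assert np.allclose(approx, Ainv, atol=10 * d**2)
```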

Before proving the existence theorem for generalized inverses,

three lemmas are stated and proved. These lemmas are central to the

existence theorem proof.

Lemma A.1. Let M be a matrix of order n×m and rank r. Then there

are matrices P, of order n×r, and Q, of order r×m, both of rank r, such

that M = PQ.

Proof of Lemma A.1. Since M has rank r, there exist nonsingular

matrices R and S such that

    RMS = [ I_r  0 ]     or, equivalently,     M = R⁻¹ [ I_r  0 ] S⁻¹.
          [ 0    0 ]                                   [ 0    0 ]

Put

    P = R⁻¹ [ I_r ]     and     Q = [ I_r  0 ] S⁻¹;
            [ 0   ]

then M = PQ and P and Q both have rank r.

Q.E.D.
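A full-rank factorization M = PQ as in Lemma A.1 can be computed in several ways. The sketch below (an illustration, not the construction used in the proof) builds one from the singular value decomposition and verifies the rank conditions.

```python
import numpy as np

# Rank factorization M = PQ via the SVD: keep only the r nonzero
# singular values, absorbing them into the left factor.
M = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [1.0, 2.0, 0.0]])          # 3x3 matrix of rank 1
U, s, Vt = np.linalg.svd(M)
r = int(np.sum(s > 1e-12))               # numerical rank

P = U[:, :r] * s[:r]                     # n x r, rank r
Q = Vt[:r, :]                            # r x m, rank r

assert np.allclose(P @ Q, M)
assert np.linalg.matrix_rank(P) == r
assert np.linalg.matrix_rank(Q) == r
```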

Lemma A.2. Let M be an m×r matrix of rank r. Then MᵗM is

nonsingular.

Proof of Lemma A.2. Since M is of rank r, there is a nonsingular

matrix P of order m×m such that

    PM = [ I_r ] ,
         [ 0   ]

so that

    MᵗPᵗ = ( I_r  0 )

and

    MᵗPᵗPM = ( I_r  0 ) [ I_r ] = I_r .
                        [ 0   ]

Thus there exists a nonsingular matrix P such that MᵗPᵗPM = I;

MᵗM is thus equivalent to I and hence nonsingular.

Q.E.D.

Lemma A.3. Let M be an rxm matrix of rank r. Then MMt is

nonsingular.

Proof of Lemma A.3. Define N = Mᵗ and apply Lemma A.2, proving NᵗN

is nonsingular; but NᵗN = MMᵗ, so MMᵗ is nonsingular.

Q.E.D.

Theorem A.1. The generalized inverse of an n×m matrix A, if it

exists, is of order m×n.

Proof of Theorem A.1. By the properties of the generalized

inverse and conformability requirements it is seen that for AA⁻ and A⁻A

to be symmetric with A of order n×m, A⁻ must be of order m×n.

Q.E.D.

The existence theorem of the generalized inverse can now be

stated.

Theorem A.2. Let A be an nxm matrix. Then A has a generalized

inverse.

Proof of Theorem A.2. If A is a zero matrix of order n×m, then A⁻

is the zero matrix of order m×n. Assume that A is not a zero matrix. By Lemma

A.1, A can be factored in the form

A = PQ

where P is an n×r matrix and Q an r×m matrix, both of rank r. PᵗP and QQᵗ are non-

singular by Lemmas A.2 and A.3.

Put A⁻ = Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ; then

(i) (AA⁻)ᵗ = A⁻ᵗAᵗ

           = [Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ]ᵗ(PQ)ᵗ

           = P{(PᵗP)⁻¹}ᵗ{(QQᵗ)⁻¹}ᵗQQᵗPᵗ

           = P(PᵗP)⁻¹(QQᵗ)⁻¹(QQᵗ)Pᵗ

           = P(PᵗP)⁻¹Pᵗ

           = AA⁻.

Therefore, AA⁻ is symmetric and satisfies (i) of the definition.

(ii) (A⁻A)ᵗ = AᵗA⁻ᵗ

           = (PQ)ᵗ[Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ]ᵗ

           = QᵗPᵗP{(PᵗP)⁻¹}ᵗ{(QQᵗ)⁻¹}ᵗQ

           = Qᵗ(QQᵗ)⁻¹Q

           = A⁻A.

A⁻A is symmetric and satisfies (ii) of the definition.

(iii) AA⁻A = PQ[Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ]PQ

           = PQ

           = A,

satisfying (iii) of the definition.

(iv) A⁻AA⁻ = [Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ]PQ[Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ]

           = Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ

           = A⁻.

The defined A⁻ = Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ is therefore a generalized inverse

of A and always exists.

Q.E.D.

In the following, it is shown that the generalized inverse as

defined is unique.

Theorem A.3. Let A be an n×m matrix. The generalized inverse A⁻

of A is unique.

Proof of Theorem A.3. Suppose that A has two generalized inverses,

say A₁⁻ and A₂⁻. It will first be shown that AA₁⁻ = AA₂⁻.

AA₂⁻ = (AA₁⁻A)A₂⁻

     = (AA₁⁻)(AA₂⁻)

     = [(AA₂⁻)(AA₁⁻)]ᵗ

     = [(AA₂⁻A)A₁⁻]ᵗ

     = (AA₁⁻)ᵗ

     = AA₁⁻.

Similarly, A₂⁻A = A₁⁻A. Therefore,

A₂⁻ = A₂⁻AA₂⁻

    = (A₂⁻A)A₂⁻

    = (A₁⁻A)A₂⁻

    = A₁⁻(AA₂⁻)

    = A₁⁻AA₁⁻

    = A₁⁻.

Hence, the generalized inverse is unique.

Q.E.D.

Corollary A.3.1. The generalized inverse of Aᵗ is the transpose

of the generalized inverse of A; i.e., (Aᵗ)⁻ = (A⁻)ᵗ.

Proof of Corollary A.3.1. The proof consists of showing that

(A⁻)ᵗ is the generalized inverse of Aᵗ; since the generalized inverse

is unique by Theorem A.3, it follows that (Aᵗ)⁻ = (A⁻)ᵗ. Write A = PQ

as in the proof of Theorem A.2, so that

A⁻ = Qᵗ(QQᵗ)⁻¹(PᵗP)⁻¹Pᵗ.    (A.1)

Also,

Aᵗ = QᵗPᵗ

and

(Aᵗ)⁻ = P(PᵗP)⁻¹(QQᵗ)⁻¹Q.