OPTIMIZATION APPROACHES IN RISK MANAGEMENT AND FINANCIAL ENGINEERING

By
MICHAEL ZABARANKIN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2003

Copyright 2003 by Michael Zabarankin

ACKNOWLEDGMENTS

I thank my supervisors, Prof. Stanislav Uryasev and Prof. R. Tyrrell Rockafellar, and the members of my Ph.D. committee for their help and guidance. I am also grateful to my parents for their constant support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 PORTFOLIO ANALYSIS WITH GENERAL DEVIATION MEASURES
  2.1 Introduction
  2.2 Deviation and Risk
  2.3 Portfolio Framework
  2.4 Fundamental Optimization Problem
  2.5 Efficient Sets and Frontiers
  2.6 Threshold Determination for the Risk-Free Rate
  2.7 Characterization of Optimal Portfolios
  2.8 Specialized CAPM-like Relations
  2.9 Multiplier Derivation of Thresholds
  2.10 Conclusions
3 DRAWDOWN MEASURE IN PORTFOLIO OPTIMIZATION
  3.1 Introduction
    3.1.1 Drawdown Regulations in Real Trading Strategies
    3.1.2 Drawdown Notion in Theoretical Framework
  3.2 Model Development
    3.2.1 Dynamic Performance Functionals
  3.3 Absolute Drawdown for a Single Sample Path
    3.3.1 Maximum, Average and Conditional Drawdowns
    3.3.2 Conditional Value-at-Risk and Conditional Drawdown Properties
    3.3.3 Mixed Conditional Drawdown
  3.4 Optimization Techniques for Conditional Drawdown Computation
  3.5 Multi-scenario Conditional Value-at-Risk and Drawdown Measure
    3.5.1 Multi-scenario Conditional Value-at-Risk
    3.5.2 Drawdown Measure
  3.6 Portfolio Optimization with Drawdown Measure
    3.6.1 Reduction to Linear Programming Problem
    3.6.2 Efficient Frontier
  3.7 Drawdown Measure in Real-life Portfolio Optimization
    3.7.1 Static Asset Allocation
    3.7.2 Historical Data and Scenario Generation
    3.7.3 Numerical Results
  3.8 Conclusions
4 TRAJECTORY OPTIMIZATION IN A THREAT ENVIRONMENT
  4.1 Introduction
  4.2 Model Development
  4.3 Minimization of a Functional with Nonholonomic Constraint and Movable End Point
  4.4 Calculus of Variations Approach
  4.5 Network Flow Optimization Approach
    4.5.1 Network Structure
    4.5.2 Smoothing Procedure and Curvature Constraint
    4.5.3 Approximation Scheme
    4.5.4 Reduction to the Constrained Shortest Path Problem
    4.5.5 The Label Setting Algorithm with Preprocessing Procedure
  4.6 Numerical Experiments
    4.6.1 2D Network Optimization
    4.6.2 3D Network Optimization in the Case of a Single Radar
    4.6.3 3D Network Optimization in Cases with Two and Three Radars
  4.7 Analysis of Computational Results
  4.8 Conclusions
5 DISCUSSION
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES

Table
3-1 Average Drawdown optimization: 1 sample path
3-2 Average Drawdown optimization: 100 sample paths
3-3 Average Drawdown optimization: 300 sample paths
3-4 0.8-Conditional Drawdown optimization: 1 sample path
3-5 0.8-Conditional Drawdown optimization: 100 sample paths
3-6 0.8-Conditional Drawdown optimization: 300 sample paths
3-7 Maximum Drawdown optimization: 1 sample path
3-8 Maximum Drawdown optimization: 100 sample paths
3-9 Maximum Drawdown optimization: 300 sample paths
4-1 Optimal risk, AL, C and C's estimate
4-2 Coefficients for the Gaussian quadrature for J = 16
4-3 2D network preprocessing: single radar
4-4 2D network optimization with LSA: single radar
4-5 3D network preprocessing: single radar
4-6 3D network optimization with LSA: single radar
4-7 3D network optimization with LSA & smoothing condition: single radar
4-8 3D network optimization: two radars
4-9 3D network optimization: three radars

LIST OF FIGURES

Figure
2-1 Classical efficient set: expected gain versus standard deviation
2-2 Reduced optimization perspective
2-3 Efficient sets in coordinates standard for graphs of functions
2-4 Efficient sets in coordinates traditional for finance
2-5 Threshold values for the risk-free rate
2-6 Threshold interval tied to corner behavior
3-1 Time series of uncompounded cumulative rate of return $w$ and corresponding absolute drawdown $\xi$
3-2 Drawdown time series $\xi$ and indicator function $I$
3-3 Function $\pi(s)$
3-4 Inverse function $\pi^{-1}(\alpha)$
3-5 Drawdown surface and threshold plane
3-6 Efficient frontiers: Average Drawdown
3-7 Optimal risk-adjusted returns: Average Drawdown
3-8 Efficient frontiers: 0.8-Conditional Drawdown
3-9 Optimal risk-adjusted returns: 0.8-Conditional Drawdown
3-10 Efficient frontiers: Maximum Drawdown
3-11 Optimal risk-adjusted returns: Maximum Drawdown
4-1 Ellipsoid shape is defined by parameter $\kappa = b/a$
4-2 3D model for optimal path planning in a threat environment
4-3 Function p( ) sin )
4-4 Optimal trajectories for the case of sphere with different constraints on the length, $l_*$, in the trajectory's plane
4-5 Optimal trajectories for the case of sphere with different constraints on the length, $l_*$, in 3D space
4-6 Comparison of optimal trajectories in trajectory's plane for the cases with a single sensor ($n = 2$) and single radar ($n = 4$) with the same constraint on the length, $l_* = 3.2$
4-7 Optimal trajectories for sphere ($\kappa = 1.0$) and elongated ellipsoids ($\kappa = 0.5, 0.1$) for $n = 4$ and $l_* = 3.2$ shown in trajectory's plane
4-8 Optimal trajectories for sphere ($\kappa = 1.0$) and compressed ellipsoids ($\kappa = 2.0, 10$) for $n = 4$ and $l_* = 3.2$ shown in trajectory's plane
4-9 Optimal trajectories for sphere ($\kappa = 1.0$), elongated ($\kappa = 0.1$) and compressed ($\kappa = 2.0$) ellipsoids for $n = 4$ and $l_* = 3.2$ in 3D space
4-10 Structure of arcs in every node in a 2D network: "1" axis arcs, "2" diagonal arcs, "3" long diagonal arcs
4-11 Structure of arcs in every node in a 3D network
4-12 Network smoothing and curvature constraint
4-13 3D network for solving the risk minimization problem
4-14 Comparison of analytical and discrete optimization trajectories for the case of sphere ($\kappa = 1.0$), $n = 4$ and different length constraints, $l_*$, in trajectory's plane
4-15 Comparison of analytical and discrete optimization trajectories for elongated ($\kappa = 0.1$) and compressed ($\kappa = 2.0$) ellipsoids for $n = 4$ and the same constraint on the length $l_* = 3.2$ in trajectory's plane
4-16 Comparison of analytical and discrete optimization trajectories for sphere ($\kappa = 1.0$), $n = 4$ with different length constraints, $l_*$, in 3D space
4-17 Comparison of analytical and discrete optimization trajectories for elongated ellipsoid $\kappa = 0.1$ and parameters $n = 4$, $l_* = 3.2$ in 3D space
4-18 Comparison of analytical and discrete optimization trajectories for compressed ellipsoid $\kappa = 2.0$ and parameters $n = 4$, $l_* = 3.2$ in 3D space
4-19 Optimal trajectories in the case of two radars for compressed ellipsoid ($\kappa = 2.0$), sphere ($\kappa = 1.0$) and elongated ellipsoid ($\kappa = 0.1$) with the same length constraint, $l_* = 3.2$
4-20 Front view: optimal trajectories in the case of two radars for compressed ellipsoid ($\kappa = 2.0$), sphere ($\kappa = 1.0$) and elongated ellipsoid ($\kappa = 0.1$) with the same length constraint, $l_* = 3.2$
4-21 View from above: optimal trajectories in the case of two radars for compressed ellipsoid ($\kappa = 2.0$), sphere ($\kappa = 1.0$) and elongated ellipsoid ($\kappa = 0.1$) with the same length constraint, $l_* = 3.2$
4-22 Optimal trajectories in the case of three radars for compressed ellipsoid ($\kappa = 2.0$), sphere ($\kappa = 1.0$) and elongated ellipsoid ($\kappa = 0.1$) with the same length constraint, $l_* = 3.2$
4-23 Front view: optimal trajectories in the case of three radars for compressed ellipsoid ($\kappa = 2.0$), sphere ($\kappa = 1.0$) and elongated ellipsoid ($\kappa = 0.1$) with the same length constraint, $l_* = 3.2$
4-24 Side view: optimal trajectories
in the case of three radars for compressed ellipsoid ($\kappa = 2.0$), sphere ($\kappa = 1.0$) and elongated ellipsoid ($\kappa = 0.1$) with the same length constraint, $l_* = 3.2$
4-25 Dependence of LSA running time on the shape of ellipsoid, $\kappa$ (3D network, single radar): curve "1" no smoothing, curve "2" smoothing is used
4-26 LSA running time versus number of labels treated: 3D network, single radar
4-27 Number of labels treated versus number of nodes left after preprocessing (3D network, single radar): curves "1" and "2" correspond to LSA and LSA with smoothing, respectively

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMIZATION APPROACHES IN RISK MANAGEMENT AND FINANCIAL ENGINEERING

By
Michael Zabarankin
August 2003

Chair: Stanislav Uryasev
Major Department: Industrial and Systems Engineering

The dissertation is dedicated to the development and application of analytical and discrete optimization approaches in Risk Management and Financial Engineering.

Through techniques of convex analysis, the dissertation rigorously generalizes the axiomatic properties of standard deviation to a new class of deviation measures. The role of deviation measures in optimization and Financial Engineering is analyzed. Optimality conditions based on concepts of convex analysis, but relying on the special features of deviation measures, are derived in support of a variety of potential financial applications, such as portfolio optimization and the Capital Asset Pricing Model.

The dissertation introduces a new one-parameter family of risk measures called Conditional Drawdown (CDD). These measures of risk are functionals of the portfolio drawdown (underwater) curve considered in active portfolio management.
Mathematical properties of the CDD are studied, and efficient optimization techniques for CDD computation and for solving asset-allocation problems with a CDD measure are proposed. A real-life asset-allocation problem is analyzed and solved using the proposed measures.

The dissertation develops a model and corresponding analytical and discrete optimization approaches for routing an aircraft in a threat environment. The threat is associated with the risk of aircraft detection by radar or sensor installations. The model considers the aircraft in three-dimensional space with a variable radar cross-section and an arbitrary number of installations. The risk minimization problem subject to a constraint on trajectory length is reduced to network flow optimization. In the case of a single installation, an analytical solution for this problem is obtained.

CHAPTER 1
INTRODUCTION

The dissertation considers and develops analytical and discrete optimization approaches to various Risk Management and Financial Engineering applications.

Chapter 2 considers generalized measures of deviation, as substitutes for standard deviation, in a framework like that of classical portfolio theory for coping with the uncertainty inherent in achieving rates of return beyond the risk-free rate. Such measures, associated for example with conditional value-at-risk and its variants, can reflect the different attitudes of different classes of investors. They lead nonetheless to generalized one-fund theorems as well as to covariance relations which resemble those commonly used in capital asset pricing models (CAPM), but have wider interpretations. A more customized version of portfolio optimization is the aim, rather than the idea that a single "master fund" might arise from market equilibrium and serve the interests of all investors.
The results cover discrete distributions along with continuous distributions, and are therefore applicable in particular to financial models involving finitely many future states, whether introduced directly or for purposes of numerical approximation. Through techniques of convex analysis, our study deals rigorously with a number of features that have not been given much attention in this subject, such as solution nonuniqueness, or nonexistence, and a potential lack of differentiability of the deviation expression with respect to the portfolio weights. Moreover, we address in detail the previously neglected phenomenon that, if the risk-free rate lies above a certain threshold, a master fund of the usual type will fail to exist and will need to be replaced by one of an alternative type, representing a "net short position" instead of a "net long position" in the risky instruments.

Chapter 2 develops the Capital Asset Pricing Model (CAPM) based on generalized deviation measures in the general case of stock distributions. The CAPM is intended for finding an efficient portfolio of financial instruments available in the market, and for pricing these instruments with respect to an efficient portfolio. The risk and the expected rate of return are the two main characteristics associated with a portfolio. The efficient portfolio is an optimal solution of a problem of portfolio risk minimization with a constraint on the portfolio expected rate of return. Using the apparatus of convex analysis, we extend the classical CAPM to the class of deviation measures under the assumption that a risk-free instrument is available in the market. It turns out that, theoretically, there is a critical set of values of the rate of return of the risk-free instrument for which an appropriate CAPM cannot be constructed. A theorem establishing the structure of the critical set is formulated and proved. The critical set must be a closed finite interval.
One of the main results is formulated in the following form: if the rate of return of the risk-free instrument does not belong to the critical interval, then the vector composed of the stocks' expected rates of return must belong to the set of generalized gradients of a deviation measure calculated for the efficient portfolio. This result is illustrated by several examples using classical measures such as standard deviation and Conditional Value-at-Risk.

Chapter 3 proposes a new one-parameter family of risk measures called Conditional Drawdown (CDD). These measures of risk are functionals of the portfolio drawdown (underwater) curve considered in active portfolio management. For some value of the tolerance parameter $\alpha$, in the case of a single sample path, the drawdown functional is defined as the mean of the worst $(1 - \alpha) \cdot 100\%$ drawdowns. The CDD measure generalizes the notion of the drawdown functional to a multi-scenario case and can be considered as a generalization of a deviation measure to a dynamic case. The CDD measure includes the Maximal Drawdown and Average Drawdown as its limiting cases. Chapter 3 studies mathematical properties of the CDD and develops efficient optimization techniques for CDD computation and for solving asset-allocation problems with a CDD measure. For a particular example, we find the optimal portfolios for cases of Maximal Drawdown, Average Drawdown, and several intermediate cases between these two. The CDD family of risk functionals is similar to Conditional Value-at-Risk (CVaR), which is also called Mean Shortfall, Mean Excess Loss, or Tail Value-at-Risk. Some recommendations on how to select the optimal risk functionals for getting practically stable portfolios are provided. Chapter 3 solves a real-life asset-allocation problem using the proposed measures.

Chapter 4 deals with Risk Management in military applications. It develops a model and analytical and discrete optimization approaches for routing an aircraft in a threat environment.
The model considers an aircraft's trajectory in three-dimensional (3D) space, and represents the aircraft as a symmetrical ellipsoid with the axis of symmetry determining trajectory direction. The threat is associated with the risk of aircraft detection by radars, sensors or surface-to-air missiles. Using the analytical and discrete optimization approaches, the deterministic problem of finding an aircraft's optimal risk trajectory subject to a constraint on the trajectory length has been solved efficiently. Through techniques of Calculus of Variations, the analytical approach reduces the original risk optimization problem to a vectorial nonlinear differential equation. In the case of a single detecting installation, the solution to this equation is expressed by a quadrature. The discrete optimization approach approximates the original problem by the Constrained Shortest Path Problem (CSPP) for a 3D network with a flexible structure. The CSPP has been solved for various ellipsoid shapes and different length constraints in the cases of several radars. The impact of ellipsoid shape on the geometry of an optimal trajectory, as well as the impact of variable RCS on the performance of the discrete optimization approach, have been analyzed and illustrated with several numerical examples.

CHAPTER 2
PORTFOLIO ANALYSIS WITH GENERAL DEVIATION MEASURES

2.1 Introduction

The minimization of variance, or equivalently standard deviation, is a universally familiar feature of classical portfolio theory. It has been subjected to criticism, however, because standard deviation does not adequately account for the phenomenon of "fat tails" in loss distributions, and moreover penalizes ups and downs equally. In everyday applications, a more common tool than standard deviation is value-at-risk, VaR.
But that too has been controversial because of mathematical shortcomings (lack of convexity and monotonicity, as well as reasonable continuity) and its inability to respond to the magnitude of the possible losses below the threshold it identifies. A related concept, conditional value-at-risk, CVaR (also known as expected shortfall and tail-VaR), has proved to be superior in these respects and therefore more suited for the optimization of choices in financial management.

For sound methodology in finance, a theory relating these different approaches and allowing their effects to be compared is essential. A very important step toward such a comprehensive theory was made by Artzner et al. [1], who introduced the notion of a coherent risk measure, demonstrating that VaR did not provide coherency. Although extensive motivation was given by Artzner et al. [1], "coherency" has not fully taken hold in the community concerned with applications. One obstacle has been their axiom [1] that deals with the effect of adding a constant to the return of a financial random variable. Many have balked at that axiom, despite its explanation.

The reason for this difficulty may well be confusion over the meaning of "loss." Artzner et al. [1] referred to loss as a negative outcome, whereas for many practitioners it is assumed to refer to a shortfall relative to expectation. This is compounded by the fact that, in applications, VaR and CVaR are typically invoked to measure such a shortfall, instead of measuring absolute loss itself.

There is more to this than might appear on the surface. On a space of random variables $X$, it is one thing to apply a risk measure to $X - C$ for some fixed constant $C$, and quite another to apply it to $X - EX$. A basic aim of this chapter is to make clear that risk measures, more or less in the sense of Artzner et al. [1], when applied to $X - EX$, yield a separate class of functionals that can aptly be called deviation measures.
Deviation measures satisfy different axioms and have their own significance, with standard deviation being just one example that happens to be symmetric.

In classical portfolio theory, investors respond to the uncertainty of profits by selecting portfolios that minimize variance, or equivalently standard deviation, subject to achieving a specified level in expected gain [2, 3]. The well-known "one-fund theorem" [4, 5] stipulates that this can be accomplished in terms of a single "master fund" portfolio by means of a formula that balances the amount invested in that portfolio with the amount invested at the current risk-free rate. The accompanying "CAPM" theorem furnishes covariance relations with respect to the master fund portfolio which are interpreted as having a potential for predicting the market behavior of financial instruments [6, 5]. An overview of "CAPM" results was provided by Grauer [7]. Extensions of the theory to account for higher moments than variance were considered by Samuelson [8].

Nowadays, other approaches to uncertainty have gained in popularity. Portfolios are being selected on the basis of characteristics such as value-at-risk (VaR), conditional value-at-risk (CVaR), or other properties proposed for use in risk assessment. These measures have no pretension of being universal, however. VaR and CVaR depend, for instance, on the specification of a confidence level parameter, which could vary among investors. Instead, what is apparent in the alternative approaches currently being touted is a move toward a kind of partial customization of responses to risk, while still avoiding (as impractical) a reliance on specifying individual utility functions. Utility functions in finance have been the theme of numerous publications [9, 10].

A question in this evolving environment is whether any parallels to the classical theory persist when the minimization of standard deviation is replaced by something else.
Researchers have already looked into the possibilities in several special cases, under various limiting assumptions (recognized explicitly or imbedded implicitly). Our goal, in contrast, is to demonstrate that important parallels with classical theory exist much more broadly, despite technical hurdles. By showing this, we hope to bring out features that have not been completely analyzed, or even perceived, in the past.

We focus on the general deviation measures that were developed axiomatically by Rockafellar et al. [11]. These were shown to be closely related to, yet distinct from, risk measures in the sense of Artzner, Delbaen, Eber and Heath [1], and to enjoy a number of attractive characterizations. Our idea is to substitute such a deviation measure for standard deviation in the setting of classical theory and investigate the consequences rigorously in detail. Furthermore, we aim at doing so, for the first time, in cases where the rates of return may have discrete distributions as well as cases where they have continuous distributions.

Our central result says that a basic one-fund theorem holds regardless of the particular choice of the deviation measure, but with certain modifications. The optimal risky portfolio need not be unique (although it often might be). More significantly, the full study of optimality in the context of an available risk-free instrument requires an understanding not only of an efficient frontier for risky portfolios at cost $1$, but also of such a frontier for risky portfolios at cost $-1$. We establish that the second frontier comes into play when the risk-free rate of return exceeds a certain threshold. Moreover, we show how that threshold can be calculated by solving an auxiliary optimization problem.

The need for an efficient frontier referring to "net short positions," along with the usual one for "net long positions," is not surprising, in view of the diversity of measures that investors may be using.
In line with their different opinions about risk, some investors may find the risk-free rate high enough to warrant borrowing from the market and investing that money risk-free, while others will prefer a fund in which the "longs" outweigh the "shorts." Sharpe [6, p. 507] discussed an interesting analogy in terms of a stock index futures contract which might even consist entirely of short positions.

The emergence of a variety of different master funds, optimal for different deviation measures, is an inescapable outcome of any theory which, like ours, attempts to cope with the current tendency toward customization in portfolio optimization. What does it mean for CAPM, though? A master fund identified with respect to the wishes of one class of investors can no longer be proposed as obviously furnishing input for factor analysis of the market as a whole, because the financial markets react to the wishes of all investors. A master fund, in our general sense, can no longer be interpreted as associated with a sort of universal equilibrium.

Whether some such master funds, individually or collectively, might nonetheless turn out to be valuable in factor analysis is an issue outside the scope of this chapter. We do, however, prove a theorem that resembles the classical one in providing CAPM-like covariance relations, which serve to characterize the master funds. We place the generalized $\beta$ coefficients in these relations in a framework from which the results in this category that other researchers have obtained earlier, in special contexts, can readily be derived. Included in that way are results, subject to restrictions, for the deviation measures associated with mean-lower partial moments [12, 13], conditional value-at-risk [14, 15, 16], and mean absolute deviation [17].

Our CAPM-like results have far fewer restrictions.
They do not rely on the existence of density functions for the distributions that arise, or even on the absence of probability atoms (corresponding to jumps in the distribution functions), which would preclude applications to discrete random variables. Furthermore, they do not require the differentiability of the deviation with respect to the parameters specifying the relative weights of the instruments in the portfolio. This broad advance has been made possible by our utilization of techniques of convex analysis [18] beyond standard calculus, along with strict adherence to the fundamental principles of optimization theory.

Lack of familiarity with the mathematics of optimization has been a handicap in some of the finance literature in this area, going all the way back to Markowitz. For example, Markowitz [3] excluded short positions by constraining the portfolio weights to be nonnegative. He neglected, however, to take into account that Lagrange multipliers for those inequality constraints could come into play, in which case a closed-form solution to the optimality conditions for a master fund would be impossible. Supposing that the multipliers can be taken to be zero is equivalent (because of convexity) to supposing that, if shorting were allowed, there would anyway be no short positions at optimality. There is no support for that conviction, however, and indeed, numerical calculations are known to produce quite different answers when shorting is allowed and when it is not. Similar looseness about whether solutions to optimization problems even exist must be the reason why the magnitude of the risk-free rate was not perceived to have an effect, and the need for master funds representing net short positions went undetected. The need for allowing at least some short positions in a master fund was emphasized by Sharpe [6, pp. 500, 505].

Even though the generalized CAPM-like $\beta$ coefficients we study lack the "market" connotations ascribed to the classical $\beta$'s, they capture characteristics which may be valuable in assessing the riskiness of financial instruments. In every case, the $\beta$ of an instrument is the ratio of a covariance expression to the generalized deviation of the random variable giving the rate of return of a particular master fund. Instead of the covariance between the rate of return of the instrument and the rate of return of the master fund, however, the $\beta$ is based on the covariance between the rate of return of the instrument and a special random variable extracted from the rate of return of the master fund, as dictated by the particular deviation measure. For instance:

- The $\beta$ for lower semideviation measures the dependence of the instrument's rate of return on the master fund's downside (this being just the part of the fund's random variable that tracks underperformance).
- The $\beta$ for conditional value-at-risk relative to a chosen $\alpha$-tail of losses assesses how the instrument reacts when the master fund for this situation dips into that tail.
- The $\beta$ for a mixture of conditional value-at-risk involving several different loss tails provides a balanced assessment based on the possibilities of poor performance of the master funds associated with those tails.
- The $\beta$ for the lower-range deviation of an instrument measures its reaction to the worst-case performance of a certain master fund.

Such examples, worked out near the end of the chapter, reveal that, despite departures from classical portfolio theory in interpretation, the passage to general deviation measures is capable of uncovering information that could be of considerable interest in financial applications.

2.2 Deviation and Risk

We start by reviewing what we mean by deviation measures and explaining how they are paired with risk measures, which thereby give rise to a spectrum of useful examples.
We consider a space $\Omega$, the elements $\omega$ of which can represent future states or individual scenarios (perhaps just finitely many), and suppose it to be supplied with a probability measure $P$ and the other technicalities that make it a legitimate probability space. We treat as random variables (r.v.'s) the (measurable) functions $X$ on $\Omega$ for which $E[X^2] < \infty$; the space of such functions will be denoted, for short, by $\mathcal{L}^2(\Omega)$. For $X$ in $\mathcal{L}^2(\Omega)$, the mean $\mu(X)$ and variance $\sigma^2(X)$ are well defined, in particular:

$$\mu(X) = EX = \int_\Omega X(\omega)\,dP(\omega), \qquad \sigma^2(X) = E[X - EX]^2 = \int_\Omega [X(\omega) - \mu(X)]^2\,dP(\omega). \tag{2.1}$$

To assist in working with constant r.v.'s, $X(\omega) \equiv C$, the letter $C$ will always denote a constant in the real numbers $\mathbb{R}$.

By a deviation measure will be meant a functional $\mathcal{D}$ that assigns to each random variable $X$ (understood to be in $\mathcal{L}^2(\Omega)$ always) a value $\mathcal{D}(X)$ in accordance with the following axioms:

(D1) $\mathcal{D}(X + C) = \mathcal{D}(X)$; equivalently, $\mathcal{D}(X) = \mathcal{D}(X - EX)$ for all $X$,
(D2) $\mathcal{D}(0) = 0$, and $\mathcal{D}(\lambda X) = \lambda \mathcal{D}(X)$ for all $X$ and $\lambda > 0$,
(D3) $\mathcal{D}(X + X') \le \mathcal{D}(X) + \mathcal{D}(X')$ for all $X$ and $X'$,
(D4) $\mathcal{D}(X) > 0$ for nonconstant $X$, whereas $\mathcal{D}(X) = 0$ for constant $X$.

These axioms come from our paper [11], where the notion of a general deviation measure was first formulated at this level.[1] The equivalence in D1 is evident from taking $C$ to equal $-EX$ and, on the other hand, noting that $[X + C] - E[X + C] = X - EX$ for any constant $C$.

The example of standard deviation, $\mathcal{D}(X) = \sigma(X)$, dominates classical portfolio theory and is symmetric in the sense that $\mathcal{D}(-X) = \mathcal{D}(X)$. Similar but nonsymmetric examples of deviation measures satisfying the axioms include the standard semideviations $\mathcal{D}(X) = \sigma_-(X)$ and $\mathcal{D}(X) = \sigma_+(X)$, where

$$\sigma_-^2(X) = E[\max\{EX - X, 0\}^2], \qquad \sigma_+^2(X) = E[\max\{X - EX, 0\}^2]. \tag{2.2}$$

[1] Malevergne and Sornette [13] described a class of measures axiomatically in terms of D1, D2 (effectively), and the version of D4 requiring only weak inequality, but no D3. Such measures lack convexity and other properties.
The first of the semideviations in (2.2) emphasizes the downside of $X$, while the second emphasizes the upside. A very different pair of examples, likewise oriented to downside or upside, is furnished by the lower range and the upper range,

$$\mathcal{D}(X) = EX - \inf X, \qquad \mathcal{D}(X) = \sup X - EX, \tag{2.3}$$

where $\inf X$ and $\sup X$ denote the essential infimum and supremum of $X(\omega)$ over $\omega \in \Omega$ (obtained by disregarding subsets of $\Omega$ having probability 0). For either of these, it is possible for some r.v.'s $X$ that $\mathcal{D}(X) = \infty$, which is allowed by the axioms. Of course, both are sure to be finite in the case of a finite, discrete probability space $\Omega$.

Another class of deviation measures, of increasing interest now in applications, arises from conditional value-at-risk, CVaR, as an alternative to value-at-risk, VaR. A brief discussion of risk measures, in contrast to deviation measures, will lay the platform for introducing this class properly.

By an expectation-bounded risk measure will be meant a functional $\mathcal{R}$ that assigns values $\mathcal{R}(X)$ to random variables $X$ in such a way that

(R1) $\mathcal{R}(X + C) = \mathcal{R}(X) - C$ for all $X$ and constants $C$,
(R2) $\mathcal{R}(0) = 0$, and $\mathcal{R}(\lambda X) = \lambda\mathcal{R}(X)$ for all $X$ and all $\lambda > 0$,
(R3) $\mathcal{R}(X + X') \le \mathcal{R}(X) + \mathcal{R}(X')$ for all $X$ and $X'$,
(R4) $\mathcal{R}(X) > E[-X]$ for all nonconstant $X$, whereas $\mathcal{R}(X) = E[-X]$ for constant $X$.

Axiom R4 is the property we explicitly mean by "expectation-boundedness." (The equation part of R4 is already a consequence of R1, so the strict inequality for nonconstant $X$ is the chief assertion.) Artzner, Delbaen, Eber and Heath in their landmark contribution to risk theory were the first to consider risk measures from a broad perspective, but they concentrated instead on functionals $\mathcal{R}$ satisfying R1, R2, R3 and, instead of R4, the monotonicity axiom:

(R5) $\mathcal{R}(X) \le \mathcal{R}(X')$ when $X \ge X'$.

They called these coherent risk measures. (Actually, they posed R5 in a seemingly weaker form, namely $\mathcal{R}(X) \le 0$ when $X \ge 0$, which is equivalent to the present R5 under the other axioms.
Also, they had a somewhat different version of R1, tailored to the use of an investment instrument, but the version used here was subsequently adopted by Delbaen [19].) Property R5 is natural and even crucial for many purposes, and we will be concerned with it as well. However, we forgo it in the basic definition of an expectation-bounded risk measure in order to capture a fundamental pairing between risk measures and deviation measures. The risk boundedness axiom R4, first spelled out in our own earlier work, is needed for the following result, where it emerges as the counterpart to deviation axiom D4.

Theorem 1 (deviation vs. risk [11]). Deviation measures correspond one-to-one with expectation-bounded risk measures under the relations

(a) $\mathcal{D}(X) = \mathcal{R}(X - EX)$,
(b) $\mathcal{R}(X) = E[-X] + \mathcal{D}(X)$.

Specifically, if $\mathcal{R}$ is an expectation-bounded risk measure and $\mathcal{D}$ is defined by (a), then $\mathcal{D}$ is a deviation measure that yields back $\mathcal{R}$ through (b). On the other hand, if $\mathcal{D}$ is any deviation measure and $\mathcal{R}$ is defined by (b), then $\mathcal{R}$ is a risk measure that yields back $\mathcal{D}$ through (a). In this correspondence, $\mathcal{R}$ has the further property R5, yielding coherency, if and only if $\mathcal{D}$ has the further property that

(D5) $\mathcal{D}(X) \le EX - \inf X$ for all $X$.

In accordance with the final part of Theorem 1, we call $\mathcal{D}$ a coherent deviation measure when it satisfies D5 along with D1, D2, D3, and D4. The deviation measure $\mathcal{D}(X) = \lambda\sigma(X)$ for any $\lambda \in (0, \infty)$ is paired, for instance, with $\mathcal{R}(X) = \lambda\sigma(X) - \mu(X)$, whereas the lower-range deviation measure $\mathcal{D}(X) = EX - \inf X$ is paired with the maximum loss risk measure $\mathcal{R}(X) = \sup[-X]$. Coherency is lacking in the first example but present in the second example.

Recall next that for any $\alpha \in (0, 1)$ the value-at-risk of $X$ at level $\alpha$ is defined by

$$\mathrm{VaR}_\alpha(X) = -\inf\{\, z \mid P\{X \le z\} > \alpha \,\}, \tag{2.4}$$

and then the corresponding conditional value-at-risk is

$$\mathrm{CVaR}_\alpha(X) = -[\text{expectation of } X \text{ in its lower } \alpha\text{-tail distribution}]. \tag{2.5}$$

The expectation in question is the same as the conditional expectation of $X$ subject to $X \le -\mathrm{VaR}_\alpha(X)$ when $P\{X = -\mathrm{VaR}_\alpha(X)\} = 0$, but refers more generally to the expectation of the r.v. whose cumulative distribution function $F_\alpha$ is obtained from the cumulative distribution function $F$ for $X$ by taking $F_\alpha(z) = F(z)/\alpha$ when $z < -\mathrm{VaR}_\alpha(X)$ and $F_\alpha(z) = 1$ when $z \ge -\mathrm{VaR}_\alpha(X)$. This form of the definition of $\mathrm{CVaR}_\alpha(X)$ was developed by Rockafellar and Uryasev [20]; earlier, Rockafellar and Uryasev [21] concentrated only on the case of continuous distribution functions. Acerbi [22] has shown that this value can also be obtained from the formula

$$\mathrm{CVaR}_\alpha(X) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_\beta(X)\,d\beta. \tag{2.6}$$

The important thing for our purposes here is that a coherent risk-deviation pair in the pattern of Theorem 1 is obtained by taking

$$\mathcal{R}(X) = \mathrm{CVaR}_\alpha(X), \qquad \mathcal{D}(X) = \mathrm{CVaR}_\alpha(X - EX). \tag{2.7}$$

In contrast, the functional $\mathcal{R}(X) = \mathrm{VaR}_\alpha(X)$ fails in general to satisfy axioms R3, R4 and R5, and correspondingly $\mathcal{D}(X) = \mathrm{VaR}_\alpha(X - EX)$ fails to satisfy D3, D4 and D5. Rockafellar et al. [11] provided more detail on these examples, as well as their generalization to "mixed" CVaR.

The abundance of different deviation measures leads naturally to the question of how to distinguish among them for application purposes. An important guideline is available in terms of the notion of a risk envelope. By this is meant a subset $\mathcal{Q}$ of $\mathcal{L}^2(\Omega)$, with elements $Q$ appropriately to be called risk monitors, which satisfies the axioms

(Q1) $\mathcal{Q}$ is convex, closed, and contains 1 (constant r.v.),
(Q2) $EQ = 1$ for every $Q \in \mathcal{Q}$,
(Q3) for every nonconstant $X \in \mathcal{L}^2(\Omega)$ there is a $Q \in \mathcal{Q}$ such that $E[QX] < EX$.

A risk envelope $\mathcal{Q}$ is called coherent if it satisfies, in addition,

(Q4) $Q \ge 0$ for every $Q \in \mathcal{Q}$.

When $\mathcal{Q}$ is coherent, its elements $Q$ can be interpreted as probability density functions on $\Omega$ relative to the underlying probability measure $P$ on $\Omega$ (which itself corresponds to the element $1 \in \mathcal{Q}$ as its density).
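Returning briefly to definitions (2.4) through (2.7), the following sketch evaluates them on a finite distribution, where the lower $\alpha$-tail may have to split a probability atom. The scenario data and all function names are hypothetical choices made for illustration. Exact rational arithmetic is used so that the comparison with Acerbi's formula (2.6) is not blurred by rounding.

```python
from fractions import Fraction as F

def var_alpha(xs, ps, alpha):
    """VaR_alpha(X) = -inf{ z : P{X <= z} > alpha } for a finite distribution."""
    cum = F(0)
    for x, p in sorted(zip(xs, ps)):
        cum += p
        if cum > alpha:
            return -x
    raise ValueError("alpha must lie in (0, 1)")

def cvar_alpha(xs, ps, alpha):
    """CVaR_alpha(X) = -(mean of X in its lower alpha-tail), splitting an atom if needed."""
    remaining, tail_sum = alpha, F(0)
    for x, p in sorted(zip(xs, ps)):
        w = min(p, remaining)      # probability mass taken from this outcome
        tail_sum += w * x
        remaining -= w
        if remaining == 0:
            break
    return -tail_sum / alpha

def cvar_via_acerbi(xs, ps, alpha):
    """(1/alpha) * integral of VaR_beta over (0, alpha); VaR_beta is a step function of beta."""
    cuts, cum = [F(0)], F(0)
    for _, p in sorted(zip(xs, ps)):
        cum += p
        if cum < alpha:
            cuts.append(cum)
    cuts.append(alpha)
    total = F(0)
    for lo, hi in zip(cuts, cuts[1:]):
        total += (hi - lo) * var_alpha(xs, ps, (lo + hi) / 2)
    return total / alpha

def cvar_deviation(xs, ps, alpha):
    """D(X) = CVaR_alpha(X - EX), as in (2.7)."""
    ex = sum(p * x for x, p in zip(xs, ps))
    return cvar_alpha([x - ex for x in xs], ps, alpha)

# Hypothetical scenario gains, equally likely (assumed data).
gains = [F(-3), F(-1), F(0), F(2), F(4)]
probs = [F(1, 5)] * 5
alpha = F(3, 10)

print(var_alpha(gains, probs, alpha), cvar_alpha(gains, probs, alpha))
```

With $\alpha = 3/10$ the tail consists of the whole atom at $-3$ and half of the atom at $-1$, which is the situation the generalized definition via $F_\alpha$ is designed to handle.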
For a density function $Q$, the probability of a subset $\Omega_0 \subset \Omega$ is $\int_{\Omega_0} Q(\omega)\,dP(\omega)$, whereas the expectation of a random variable $X$ is $E_Q[X] = \int_\Omega X(\omega)Q(\omega)\,dP(\omega)$.

To explain the connection between risk envelopes $\mathcal{Q}$ and deviation measures $\mathcal{D}$, we need to impose on $\mathcal{D}$ the minor technical property of lower semicontinuity, which means that the sets $\{\, X \mid \mathcal{D}(X) \le \delta \,\}$ are closed in $\mathcal{L}^2(\Omega)$ for all $\delta > 0$. This holds for all of the examples of deviation measures $\mathcal{D}$ mentioned above; cf. [11, Proposition 2].

Theorem 2 (risk envelope characterization of deviation [11]). The lower semicontinuous deviation measures $\mathcal{D}$ satisfying D1, D2, D3 and D4 correspond one-to-one with risk envelopes $\mathcal{Q}$ satisfying Q1, Q2 and Q3 under the relations

(a) $\mathcal{D}(X) = \sup_{Q \in \mathcal{Q}} E[Q(EX - X)]$,
(b) $\mathcal{Q} = \{\, Q \mid EQ = 1,\ E[Q(EX - X)] \le \mathcal{D}(X) \text{ for all } X \,\}$,

where equivalently $E[Q(EX - X)] = E_Q[-X] - E[-X] = \mathrm{covar}(-X, Q)$. Moreover, $\mathcal{D}$ has the additional property D5 (for coherency) if and only if $\mathcal{Q}$ has property Q4.

The significance of the characterization in Theorem 2 is especially clear in the coherent case, because $E_Q[-X]$ is then the expectation of the loss r.v. $-X$ with respect to the alternative probability distribution having density $Q$ on $\Omega$. The interpretation is that $\mathcal{D}(X)$ assesses how much worse that expected loss could be than the expected loss $E[-X]$ under the reference probability distribution $P$ on $\Omega$, when the alternative distributions the investor wishes to take into account are those encompassed by $\mathcal{Q}$. The different elements $Q$ of $\mathcal{Q}$, under their designation as risk monitors, perceive the potential losses in $X$ from different perspectives. Axiom Q3 requires $\mathcal{Q}$ to be rich enough that, when $X$ has a downside, that downside can be detected by at least one of the available risk monitors $Q \in \mathcal{Q}$. In association with Q1 and Q2, it can be seen as meaning that the elements $Q \in \mathcal{Q}$ form a sort of neighborhood around the constant 1 relative to all the density functions in $\mathcal{L}^2(\Omega)$.
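The formula in Theorem 2(a) can be evaluated by hand on a small finite example. The sketch below assumes a three-state space with equal reference probabilities and takes the envelope to be the convex hull of three hypothetical densities; both choices, and the function names, are illustrative assumptions rather than anything prescribed by the text. Because $E[Q(EX - X)]$ is linear in $Q$, the supremum over a convex hull is attained at one of the generating densities, so a finite maximum suffices.

```python
from fractions import Fraction as F

# Reference probabilities on a three-state space (assumed).
P = [F(1, 3)] * 3

# Hypothetical generating densities relative to P; each has E[Q] = 1 and Q >= 0,
# and their average is the constant density 1, so the hull contains 1.
Q_VERTICES = [
    [F(2), F(1, 2), F(1, 2)],
    [F(1, 2), F(2), F(1, 2)],
    [F(1, 2), F(1, 2), F(2)],
]

def expect(values, probs=P):
    """Expectation under the reference measure P."""
    return sum(p * v for p, v in zip(probs, values))

def deviation_from_envelope(X, vertices=Q_VERTICES):
    """D(X) = max over generating densities Q of E[Q (EX - X)], per Theorem 2(a)."""
    ex = expect(X)
    return max(expect([q * (ex - x) for q, x in zip(Q, X)]) for Q in vertices)

def covar(X, Q):
    """covar(X, Q) = E[XQ] - E[X] E[Q]."""
    return expect([x * q for x, q in zip(X, Q)]) - expect(X) * expect(Q)

X = [F(-3), F(0), F(6)]   # hypothetical gains in the three states
print(deviation_from_envelope(X))
```

For this particular envelope the result works out to half of the lower range, $(EX - \inf X)/2$, so property D5 holds, in line with the densities being nonnegative (Q4).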
Focusing on a particular choice of the deviation measure $\mathcal{D}$ in some application corresponds in this framework to deciding on a particular attitude toward risk, in the sense of the trustworthiness of the reference probability distribution $P$. Rockafellar et al. [11] discussed several issues in connection with related concepts of risk.

Descriptions of the risk envelopes $\mathcal{Q}$ corresponding to a number of specific deviation measures $\mathcal{D}$ have been worked out by Rockafellar et al. [11, 23]. Some will be recalled toward the end of this chapter in connection with optimality rules and CAPM-like equations. An additional example, which we have not addressed elsewhere but which deserves mention because of its rich modeling implications, is the one in which $\Omega$ is just a finite set of future states and $\mathcal{Q}$ is taken to be the convex hull of a finite collection of functions $Q_1, \ldots, Q_m$ on $\Omega$ which give probability densities relative to $P$, i.e., satisfy

$$Q_j(\omega) \ge 0, \qquad \sum_{\omega \in \Omega} Q_j(\omega)P(\omega) = 1,$$

and have the property that for every nonconstant function $X$ on $\Omega$ there is at least one index $j$ such that $E[Q_j(EX - X)] > 0$. Axioms Q1, Q2, Q3 and Q4 are all fulfilled then, and the corresponding deviation measure $\mathcal{D}$ in Theorem 2, coming out as

$$\mathcal{D}(X) = \max_{j=1,\ldots,m} E[Q_j(EX - X)] = \max_{j=1,\ldots,m} \{E_{Q_j}[-X] - E[-X]\}, \tag{2.8}$$

is therefore coherent. In this setting, each $Q_j$ can be viewed as a "mixed scenario" representing an alternative probability distribution over the scenarios (the elements $\omega$ of $\Omega$) that is different from the reference distribution. An investor may have specified these mixed scenarios as tests for what might reasonably be a cause for concern in appraising the future. The deviation $\mathcal{D}(X)$ in (2.8) identifies the worst discrepancy that could occur between the expected losses under the specified alternatives and the expected loss $E[-X]$.

2.3 Portfolio Framework

To proceed with our effort to extend the classical results in portfolio theory for standard deviation to a general deviation measure $\mathcal{D}$, we must provide a market setting.
The market will be taken, for model purposes, to consist of instruments $i = 0, 1, \ldots, n$ having rates of return $r_i$. The first of these instruments, for $i = 0$, is risk-free; its rate of return $r_0$ is a constant. The other instruments, for $i = 1, \ldots, n$, are risky; their rates of return $r_i$ are r.v.'s in $\mathcal{L}^2(\Omega)$. A dollar invested in instrument $i$ brings back $1 + r_i$, for a gain (or profit) of $r_i$ dollars at the end of the time period under consideration.

We will be concerned with portfolios that can be put together by investing an amount $x_i$ in each instrument $i$. These amounts (in dollars) can be positive, zero or negative. (A negative investment corresponds to a short position.) Such a portfolio has the present cost $x_0 + x_1 + \cdots + x_n$ and the uncertain future value $x_0(1 + r_0) + x_1(1 + r_1) + \cdots + x_n(1 + r_n)$. The associated gain is thus the r.v. $X$ in $\mathcal{L}^2(\Omega)$ described by

$$X = x_0 r_0 + x_1 r_1 + \cdots + x_n r_n. \tag{2.9}$$

Here we are using "gain" in the sense that a loss is a negative gain. Costs, too, might be negative as well as positive, or zero.

To facilitate our work with these r.v.'s $X$ while taking into account the special role of the risk-free instrument and keeping notation simple, we introduce

$$r = (r_1, \ldots, r_n) \text{ (vector r.v.)}, \qquad \bar r = (\bar r_1, \ldots, \bar r_n) \text{ for } \bar r_i = E r_i,$$

along with the vectors $x = (x_1, \ldots, x_n)$ and $e = (1, \ldots, 1)$. The general r.v. $X$ in (2.9) is then $x_0 r_0 + x^T r$; its expected gain $EX$ is $x_0 r_0 + x^T \bar r$, and its cost is $x_0 + x^T e$. We speak of $x = (x_1, \ldots, x_n)$ itself as giving the $x$-portfolio for which the gain is the r.v. $x^T r$ in $\mathcal{L}^2(\Omega)$, the expected gain is $x^T \bar r$, and the cost is $x^T e$.

The following assumptions on instruments $i = 1, \ldots, n$ in the model will henceforth be in effect. The rest of this section will be devoted to elucidating their immediate consequences. Basic assumptions are

(A1) No $x$-portfolio with $x \ne 0$ is risk-free.
(A2) The expected rates of return $\bar r_1, \ldots, \bar r_n$ are not all the same.
(A3) $\mathcal{D}(r_i) < \infty$ and $\mathcal{D}(-r_i) < \infty$ for all $i$.
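The bookkeeping in (2.9) can be sketched on a finite scenario table. Everything concrete below (the risk-free rate, the scenario returns, the portfolio and the function names) is a hypothetical choice for illustration only. The sketch computes the cost $x_0 + x^T e$, the scenario gains $x_0 r_0 + x^T r$, and the expected gain $x_0 r_0 + x^T \bar r$, with $x_0$ fixed by a unit total cost.

```python
from fractions import Fraction as F

R0 = F(2, 100)   # risk-free rate r_0 (assumed)

# Rows are scenarios, columns are the risky instruments i = 1, 2 (assumed data).
SCENARIO_RETURNS = [
    [F(10, 100), F(-5, 100)],
    [F(-2, 100), F(8, 100)],
    [F(4, 100), F(3, 100)],
]
PROBS = [F(1, 3)] * 3

def expected_returns():
    """The vector r-bar of expected rates of return."""
    n = len(SCENARIO_RETURNS[0])
    return [sum(p * row[i] for p, row in zip(PROBS, SCENARIO_RETURNS)) for i in range(n)]

def cost(x0, x):
    """Present cost x_0 + x^T e."""
    return x0 + sum(x)

def gain_in_scenario(x0, x, row):
    """Gain x_0 r_0 + x^T r in one scenario."""
    return x0 * R0 + sum(xi * ri for xi, ri in zip(x, row))

def expected_gain(x0, x):
    """Expected gain x_0 r_0 + x^T r-bar."""
    return x0 * R0 + sum(xi * ri for xi, ri in zip(x, expected_returns()))

x = [F(3, 2), F(-1, 4)]   # long instrument 1, short instrument 2
x0 = 1 - sum(x)           # remainder in the risk-free instrument for unit cost

print(cost(x0, x), expected_gain(x0, x))
```

Here $x_0$ comes out negative, which corresponds to borrowing at the risk-free rate; the two expected rates of return differ, as assumption A2 requires.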
Assumption A1 is harmless and merely underscores our aim of letting the $i = 0$ instrument do all the risk-free service. A notion of redundancy will help in understanding why this is true. Let us say that an instrument $i$ is redundant in the model if the associated r.v. $r_i$, which gives the gain from investing one dollar in instrument $i$, can exactly be replicated by the gain r.v. of a portfolio put together from the other instruments. Note that such replication, if possible at all, would have to be achieved at cost 1, or an arbitrage opportunity would exist, thereby undermining our intent of starting from a market in which prices are in equilibrium.

Proposition 1 (elimination of redundancy). Assumption A1 is fulfilled if and only if none of the instruments $i$ in the model is redundant.

Proof. If some $x$-portfolio with $x \ne 0$ were risk-free, we could find a value $x_0$ such that the r.v. $X = x_0 r_0 + x_1 r_1 + \cdots + x_n r_n$ is identically 0. One of the coefficients $x_1, \ldots, x_n$ would be nonzero; suppose for purposes of illustration that it is $x_1$. We would then have $r_1 = x'_0 r_0 + x'_2 r_2 + \cdots + x'_n r_n$ for $x'_i = -x_i/x_1$, which would mean that the $i = 1$ instrument is redundant.

For the converse, suppose some instrument $i$ is redundant. If that holds for $i = 0$, then by the definition of redundancy there must be a nonzero $x$-portfolio that is risk-free. Otherwise, we can suppose for simplicity of notation that $i = 1$ is redundant. This refers to the existence of coefficients $x_0, x_2, \ldots, x_n$ such that $r_1 = x_0 r_0 + x_2 r_2 + \cdots + x_n r_n$. Then $r_1 - x_2 r_2 - \cdots - x_n r_n = x_0 r_0$, so the $x$-portfolio for $x = (1, -x_2, \ldots, -x_n)$ would be risk-free. □

Redundant instruments offer nothing new, so we could always eliminate them from the model one by one until nothing redundant was left. Then A1 would hold.

Another insight into A1 can be obtained through consideration of distribution functions.

Proposition 2 (continuous distributions). Assumption A1 is satisfied when the r.v.
$r$ is continuously distributed (i.e., the multivariate distribution function for $r_1, \ldots, r_n$ is continuous on $\mathbb{R}^n$), thus guaranteeing that the gain $x^T r$ of any $x$-portfolio with $x \ne 0$ is continuously distributed as well.

Proof. The well-known fact about $x^T r$ being continuously distributed in these circumstances precludes $x^T r$ from being a constant r.v., of course. □

Assumption A2 is needed to sidestep special circumstances which have little interest for us here. If it did not hold, there would be a value $\rho$ such that $\bar r_i = \rho$ for $i = 1, \ldots, n$; then the expected gain $x^T \bar r$ of an $x$-portfolio would always be $\rho$ times its cost $x^T e$.

Both A1 and A2 seem to be taken for granted by many in finance, even though they are essential to the validity of commonly made assertions.² The desire to maintain mathematical rigor in our development of portfolio theory requires us to make these assumptions, and others, explicit.

² For instance, in the text [24, p. 159], the $n + 2$ linear equations in $n + 2$ unknowns that describe the weights for an efficient risky portfolio in the Markowitz model are said to have a unique solution, but really that is only true when the coefficient matrix is nonsingular. The matrix in question fails to be nonsingular if A1 and A2 do not hold.

Assumption A3 is certainly satisfied when $\mathcal{D}$ is a deviation measure that is finite on all of $\mathcal{L}^2(\Omega)$, and many measures with that property have already been indicated beyond $\mathcal{D}(X) = \sigma(X)$, for instance $\mathcal{D}(X) = \mathrm{CVaR}_\alpha(X - EX)$. But A3 may also be satisfied for some deviation measures that are not finite on all of $\mathcal{L}^2(\Omega)$. An example is $\mathcal{D}(X) = EX - \inf X$ when the rates of return $r_i$ are bounded. Note that we are obliged to require the finiteness of $\mathcal{D}(r_i)$ and $\mathcal{D}(-r_i)$ separately, because $\mathcal{D}$ need not be symmetric.

Proposition 3 (portfolio deviations). The deviation function $f_{\mathcal{D}}(x) = \mathcal{D}(x^T r)$ is finite everywhere and convex on $\mathbb{R}^n$ (hence also continuous), moreover with the properties that

(a) $f_{\mathcal{D}}(0) = 0$, but $f_{\mathcal{D}}(x) > 0$ when $x \ne 0$,
(b) $f_{\mathcal{D}}(\lambda x) = \lambda f_{\mathcal{D}}(x)$ when $\lambda > 0$,
(c) $f_{\mathcal{D}}(x + x') \le f_{\mathcal{D}}(x) + f_{\mathcal{D}}(x')$,
(d) $\{\, x \mid f_{\mathcal{D}}(x) \le \delta \,\}$ is a bounded set for every $\delta > 0$,
(e) $f_{\mathcal{D}}(x) = \max_{Q \in \mathcal{Q}} \sum_{i=1}^n x_i(\bar r_i - E[Q r_i]) = \max_{Q \in \mathcal{Q}} \sum_{i=1}^n x_i\,\mathrm{covar}(Q, -r_i)$,

where, in the final formula, $\mathcal{Q}$ is the risk envelope associated with the deviation measure $\mathcal{D}$.

Proof. In view of axiom D4 on $\mathcal{D}$, the strict inequality in (a) is equivalent to A1. Properties (b) and (c), together with the fact in (a) that $f_{\mathcal{D}}(0) = 0$, follow immediately from axioms D2 and D3 on $\mathcal{D}$. They imply in particular that $f_{\mathcal{D}}$ is a convex function. The set of $x$ for which $f_{\mathcal{D}}(x) < \infty$ is then a convex subset of $\mathbb{R}^n$. Because of A3, that set includes the vectors

$$\pm(1, 0, \ldots, 0),\ \pm(0, 1, \ldots, 0),\ \ldots,\ \pm(0, 0, \ldots, 1),$$

which correspond to portfolios consisting of just one of the instruments $i = 1, \ldots, n$, either in unit long position or unit short position. It must also then include all positive multiples of those vectors, through (b), as well as all sums generated from those, through (c). Thus, it has to be all of $\mathbb{R}^n$. For the fact that finite convex functions on $\mathbb{R}^n$ are continuous, see [18, Theorem 10.1].

On the principle of [18, Corollary 8.7.1] and $f_{\mathcal{D}}$ being convex and continuous, if any set of the form $\{\, x \mid f_{\mathcal{D}}(x) \le \delta \,\}$ is bounded, then all sets of that form must be bounded. By (a), the set $\{\, x \mid f_{\mathcal{D}}(x) \le 0 \,\}$ is the singleton $\{0\}$, so (d) is correct. Finally, the max formula in (e) comes immediately from applying the formula in Theorem 2(a) to $X = x^T r = \sum_{i=1}^n x_i r_i$.

Proposition 4 (richness of cost-gain combinations). For every choice of $(\pi, \zeta) \in \mathbb{R}^2$, there is an $x$-portfolio having cost $x^T e = \pi$ and expected gain $x^T \bar r = \zeta$.

Proof. This is the main consequence of A2. The set of pairs $(\pi, \zeta)$ coming from portfolios in this way constitutes a subspace of $\mathbb{R}^2$, so if it were not all of $\mathbb{R}^2$, these pairs would be collinear, and we would be in the lockstep situation excluded by A2.
□

2.4 Fundamental Optimization Problem

The problem of optimization that we now wish to study with respect to the gain r.v.'s $X$ in (2.9) is

$\mathcal{P}(\Delta)$: minimize $\mathcal{D}(x_0 r_0 + x^T r)$ subject to $x_0 + x^T e = 1$ and $x_0 r_0 + x^T \bar r \ge r_0 + \Delta$,

where $\mathcal{D}(x_0 r_0 + x^T r)$ is actually just the $f_{\mathcal{D}}(x)$ in Proposition 3, of course, since by axiom D1 the constant term $x_0 r_0$ does not affect the deviation. The cost constraint $x_0 + x^T e = 1$ signifies that exactly one dollar is to be invested in the portfolio. The gain constraint $x_0 r_0 + x^T \bar r \ge r_0 + \Delta$ requires that this unit investment should result in an expected future value of at least $1 + r_0 + \Delta$. The parameter $\Delta$ gives the risk premium, that is, the extra amount being demanded over the gain associated with investing at the risk-free rate $r_0$.

The gain constraint has been written as an inequality instead of an equation because there should not be any objection if some portfolio, without worsening the deviation or costing more, might have an expected gain that is more than $r_0 + \Delta$. It will come out below, however, that any portfolio solving problem $\mathcal{P}(\Delta)$ must satisfy this constraint with equality when $\Delta > 0$.

The unit cost constraint in $\mathcal{P}(\Delta)$ can be used to eliminate $x_0$ by assigning it the value $x_0 = 1 - x^T e$. The problem statement comes down then to:

$\mathcal{P}_0(\Delta)$: minimize $f_{\mathcal{D}}(x)$ subject to $x^T[\bar r - r_0 e] \ge \Delta$.

Adopting this framework in terms of $x$-portfolios alone, we let

$$d_0(\Delta) = \text{optimal value (the infimum of the deviation) in } \mathcal{P}_0(\Delta), \qquad S_0(\Delta) = \text{optimal solution set (the minimizing vectors } x) \text{ in } \mathcal{P}_0(\Delta). \tag{2.10}$$

Proposition 5 (solution existence and homogeneity). An optimal solution to problem $\mathcal{P}_0(\Delta)$ is sure to exist (not necessarily uniquely), no matter what the choice of $\Delta$. Indeed, the optimal solution set $S_0(\Delta)$ is always convex, closed and bounded, in addition to being nonempty. Moreover,

$$\text{for } \Delta \le 0:\ d_0(\Delta) = 0 \text{ and } S_0(\Delta) = \{0\} \text{ (put all in the risk-free instrument)},$$
$$\text{for } \Delta > 0:\ d_0(\Delta) > 0, \text{ with } d_0(\Delta) = \Delta\, d_0(1) \text{ and } S_0(\Delta) = \{\, \Delta x \mid x \in S_0(1) \,\}. \tag{2.11}$$

Additionally, when $\Delta > 0$ the gain constraint is always active in $\mathcal{P}_0(\Delta)$, i.e., every $x \in S_0(\Delta)$ satisfies $x^T[\bar r - r_0 e] = \Delta$.

Proof.
In view of Proposition 4, the constraint in problem $\mathcal{P}_0(\Delta)$ can be satisfied regardless of the choice of $r_0$ and $\Delta$. The sets of type

$$\{\, x \mid f_{\mathcal{D}}(x) \le \delta,\ x^T[\bar r - r_0 e] \ge \Delta \,\} \text{ for } \delta > d_0(\Delta) \tag{2.12}$$

are nonempty by the definition of $d_0(\Delta)$ as well as compact because of the continuity of $f_{\mathcal{D}}$ and the boundedness in Proposition 3(d). Any nest of nonempty compact sets has a nonempty intersection. In this case, moreover, the sets are convex by virtue of the convexity of $f_{\mathcal{D}}$, so the intersection is likewise a convex set. This confirms that $S_0(\Delta)$ is nonempty, convex and compact.

The special assertions about $\mathcal{P}_0(\Delta)$ in the case of $\Delta \le 0$ are evident from Proposition 3(a). They rely also on the constraint having been stated as an inequality rather than an equation. In the case where $\Delta > 0$, the relationships involving $d_0(1)$ and $S_0(1)$ are immediate from the positive homogeneity of $f_{\mathcal{D}}$ in Proposition 3(b).

The constraint in $\mathcal{P}_0(\Delta)$ has to be active when $\Delta > 0$, because if $x$ has $x^T[\bar r - r_0 e] > \Delta$, there is a factor $\theta \in (0, 1)$ such that the vector $x' = \theta x$ satisfies the same inequality and yet yields a deviation amount that is smaller than the one for $x$ by the same factor. This is incompatible with $x$ being optimal. Note that here we are invoking Proposition 3(a) once more, since this argument would fall through if the deviation in question were 0. □

Theorem 3 (generalized one-fund theorem in unscaled form). Suppose $x^*$ belongs to the solution set $S_0(\Delta^*)$ to problem $\mathcal{P}_0(\Delta^*)$ for some $\Delta^* > 0$. Then, for each $\Delta > 0$, an optimal solution to problem $\mathcal{P}(\Delta)$ is obtained by investing the amount $(\Delta/\Delta^*)(x^{*T} e)$ in the $x^*$-portfolio and the amount $x_0 = 1 - (\Delta/\Delta^*)(x^{*T} e)$ in the risk-free instrument.

Proof. According to the solution rule in (2.11) of Proposition 5, by having $x^* \in S_0(\Delta^*)$ we are sure to have $(\Delta/\Delta^*)x^* \in S_0(\Delta)$. We have formulated problem $\mathcal{P}_0(\Delta)$ in such a manner that a solution to $\mathcal{P}(\Delta)$ is constructed by acquiring an $x \in S_0(\Delta)$, at cost $x^T e$, and then investing the amount $x_0 = 1 - x^T e$ in the risk-free instrument.
Here we have $x^T e = (\Delta/\Delta^*)(x^{*T} e)$, so the conclusion is at hand. □

Theorem 3 is "unscaled" because it imposes no restriction on the cost $x^{*T} e$ of the risky portfolio that is being utilized in constructing a solution to problem $\mathcal{P}(\Delta)$ for each $\Delta > 0$. In principle, the cost of the $x^*$-portfolio should not really matter, since in passing from $x^*$ to $x = (\Delta/\Delta^*)x^*$ the relative proportions invested in the instruments $i = 1, \ldots, n$ remain the same. The magnitudes of these amounts go up or down, but they do so by the common factor $\Delta/\Delta^*$. The unscaled result nevertheless leads right away to the following "scaled" conclusions.

Theorem 3' (generalized one-fund theorem in scaled form).

(a) Suppose, for some $\Delta^* > 0$ and $x^* \in S_0(\Delta^*)$, that $x^{*T} e = 1$, i.e., the $x^*$-portfolio has positive unit cost. Then, for any $\Delta > 0$, an optimal solution to problem $\mathcal{P}(\Delta)$ is obtained by investing the positive amount $\Delta/\Delta^*$ in the $x^*$-portfolio and the amount $x_0 = 1 - (\Delta/\Delta^*)$ in the risk-free instrument.

(b) Suppose, for some $\Delta^* > 0$ and $x^* \in S_0(\Delta^*)$, that $x^{*T} e = -1$, i.e., the $x^*$-portfolio has negative unit cost. Then, for any $\Delta > 0$, an optimal solution to problem $\mathcal{P}(\Delta)$ is obtained by investing the negative amount $-(\Delta/\Delta^*)$ in the $x^*$-portfolio, thereby effectively obtaining the positive amount $\Delta/\Delta^*$, and then investing the amount $1 + (\Delta/\Delta^*)$ in the risk-free instrument.

For someone versed in classical portfolio theory, this scaled form of the one-fund theorem may seem strange. Why did we bother with the unscaled form at all? Why not proceed straight to the case of an $x^*$-portfolio as in (a)? And how can it ever be necessary to consider, or even be possible to encounter, an $x^*$-portfolio as in (b)? The answer lies in the fact that nothing in our assumptions, so far, about $r_0$ and the r.v.'s $r_1, \ldots, r_n$ ensures that in solving problem $\mathcal{P}(\Delta)$, or equivalently, solving problem $\mathcal{P}_0(\Delta)$, for a specified $\Delta > 0$, the risky $x$-portfolio we get in optimality will have positive cost.
The possibility that the cost is negative, or zero, cannot be understood without a deeper investigation. Indeed, we will see that it depends on the interplay between the chosen deviation measure $\mathcal{D}$ and the size of the risk-free rate $r_0$, relative to the uncertain rates $r_1, \ldots, r_n$.

Definition 1 (master funds). An $x^*$-portfolio meeting the prescription in (a) of Theorem 3' will be said to furnish a master fund of positive type, whereas an $x^*$-portfolio meeting the prescription in (b) of Theorem 3' will be said to furnish a master fund of negative type.

In the classical theory, based on $\mathcal{D} = \sigma$, only master funds of the "positive type" in this definition are contemplated, and they are immediately tied into the notion of "efficient set." The familiar picture in Figure 2-1 is used to indicate that, for $\Delta > 0$, the optimal value $d_0(\Delta)$ can be obtained by interpolating along a tangent line to the efficient set that passes through the point $(0, r_0)$. The point of tangency corresponds to an $x^*$-portfolio of cost 1 having expected gain $\zeta^* = r_0 + \Delta^*$ for some $\Delta^* > 0$. This portfolio then solves problem $\mathcal{P}_0(\Delta^*)$ and furnishes a master fund which is able to perform the role in Theorem 3'(a).

In our situation of nonstandard deviation, there is a need to look much more closely at this picture and recognize certain shortcomings as well as major challenges. In the first place, the "efficient set" corresponding analogously to a general deviation measure $\mathcal{D} \ne \sigma$ may no longer be a quadratic curve like the one in Figure 2-1 (which is actually a hyperbola). For that reason, the one-fund theorem must contend with serious complications.³

[Figure 2-1: Classical efficient set: expected gain versus standard deviation. Horizontal axis: deviation, with $d_0(1)$ and $d_0(\Delta)$ marked; vertical axis: expected gain. The tangent line through $(0, r_0)$ has slope $1/d_0(1)$ and touches the efficient set at the tangent portfolio.]

³ Without the curve being quadratic, there is no hope at all, by the way, of generalizing the classical "two-fund" theorem [4], which asserts the existence of two portfolios from which all efficient portfolios can be constructed as linear combinations. That result is intrinsically "quadratic" in its mathematical underpinnings.

Although the region marked out by the efficient set will continue to be convex in our generalized setting, its boundary may incorporate corners or straight segments. For the case of a corner, the very meaning of tangency has to be pinned down carefully. In the presence of straight segments, the tangent line could have a whole interval in common with the efficient set, and this might in fact be an infinite interval. Another complication, which can also come up in the classical model, is the possibility that, because of the asymptotic behavior of the efficient set, the efficient set has no "tangent" line at all that passes through $(0, r_0)$.

Indeed, apart from any troubles with asymptotic behavior, there is an unspoken difficulty in the classical picture over the fact that it takes for granted the existence of a master fund of positive type. Clearly this existence, perceived in relation to "tangency," depends in particular on the rate $r_0$. It might fail if $r_0$ were too high.

In the traditional setting with $\mathcal{D} = \sigma$, such a situation has been regarded as implausible and "incompatible with market equilibrium." This view appears to originate in CAPM considerations and the supposition that all investors are effectively engaged in minimizing standard deviation. A master fund could not be of negative type, for the reason that if all investors wanted to take a net short position as represented by a certain portfolio, so as to obtain money to invest at the risk-free rate, something must be wrong with the risk-free rate: an implicit market instability which fails to account for the limited supply of money.
None of that really applies to our setting, however, because we are only exploring portfolio optimization for a subclass of investors, those who choose the particular deviation measure $\mathcal{D}$ we are focusing on. Other investors, with different measures $\mathcal{D}$, can be expected to come to different conclusions about their portfolio choices. Some may end up with net short positions, while others may not. From that angle, there is no hint of conflict with market equilibrium in thinking about a master fund of negative type possibly emerging from a particular choice of $\mathcal{D}$ at some level of $r_0$.

We are compelled, therefore, in our framework of a diversity of deviation measures $\mathcal{D}$, to face up to all cost possibilities for $x$-portfolios as potential solutions to problem $\mathcal{P}_0(\Delta)$. This will lead us to study how such solutions may depend on $r_0$ as a parameter.

Before getting into that parametric analysis, we can record a key fact about duality in $\mathcal{P}_0(\Delta)$. This problem is, after all, a convex programming problem, in which the convex function $f_{\mathcal{D}}$ is minimized subject to a single linear constraint. The Lagrangian function is

$$L_\Delta(x, \lambda) = f_{\mathcal{D}}(x) + \lambda(\Delta - x^T[\bar r - r_0 e]) \text{ for } \lambda \ge 0, \tag{2.13}$$

and the problem dual to $\mathcal{P}_0(\Delta)$ consists therefore of maximizing the function $g_\Delta(\lambda) = \inf_x L_\Delta(x, \lambda)$ subject to $\lambda \ge 0$ (the general theory is developed by Rockafellar [18, 25]). Because of the positive homogeneity in Proposition 3(b), however, we have $g_\Delta(\lambda) = \lambda\Delta$ when $f_{\mathcal{D}}(x) - \lambda x^T[\bar r - r_0 e] \ge 0$ for all $x \in \mathbb{R}^n$, but $g_\Delta(\lambda) = -\infty$ otherwise. The problem dual to $\mathcal{P}_0(\Delta)$ therefore takes the form:

$$\text{maximize } \lambda\Delta \text{ with respect to } \lambda \ge 0 \text{ satisfying } f_{\mathcal{D}}(x) \ge \lambda x^T[\bar r - r_0 e] \text{ for all } x \in \mathbb{R}^n. \tag{2.14}$$

Of course, we really only need to understand the case of $\Delta = 1$, since everything else can be obtained from that through rescaling. By applying known duality results about the relationship between a convex programming problem and its dual, we get the following conclusion about that case.

Proposition 6 (duality).
The optimal value $d_0(1)$ in problem $\mathcal{P}_0(1)$ has the dual characterization of being the highest $\lambda$ such that

$$f_{\mathcal{D}}(x) \ge \lambda x^T[\bar r - r_0 e] \text{ for all } x \in \mathbb{R}^n. \tag{2.15}$$

Proof. We are dealing with a convex programming problem in which the objective function $f_{\mathcal{D}}$ has bounded level sets, the property in Proposition 3(d), and on the other hand the Slater constraint qualification holds (it is possible to satisfy the inequality constraints, here just one, with strict inequality). In that case the dual problem has an optimal solution, as does the primal problem, and the optimal values in the two problems (the min value in the primal problem and the max value in the dual problem) coincide; cf. [25]. □

2.5 Efficient Sets and Frontiers

In our endeavor to understand how the classical picture in Figure 2-1 might have to be modified and expanded, we cannot limit our attention to $x$-portfolios with cost $x^T e > 0$, for reasons already explained. It is essential to look at costs $x^T e < 0$ as well. Moreover, we have to adjust to the fact that the deviation measure $\mathcal{D}$ under scrutiny might not be symmetric. If we have an $x$-portfolio representing a "net long position," in the sense that $x^T e > 0$, and we wish to pass to the associated $x'$-portfolio with $x' = -x$, representing a "net short position" because $x'^T e = -x^T e < 0$, we cannot count on having $\mathcal{D}(x'^T r) = \mathcal{D}(x^T r)$. Switches between "long" and "short" could have significant effects on risk perception.

Out of these considerations, we are obliged to investigate an auxiliary optimization problem with respect to the instruments $i = 1, \ldots, n$. In this problem, $\pi$ and $\zeta$ are parameters denoting targeted cost and expected gain, and we seek to solve:

$\mathcal{P}(\pi, \zeta)$: minimize $f_{\mathcal{D}}(x) = \mathcal{D}(x^T r)$ subject to $x^T e = \pi$ and $x^T \bar r = \zeta$.

We wish to investigate it without any preconditions on the signs of $\pi$ or $\zeta$.
The gain constraint in $\mathcal{P}(\pi, \zeta)$ has been written as an equation this time because of the chiefly technical role that this problem will play, and the simpler geometry afforded by having an equation instead of an inequality. We let

$$d(\pi, \zeta) = \text{optimal value (the infimum of the deviation) in } \mathcal{P}(\pi, \zeta), \qquad S(\pi, \zeta) = \text{optimal solution set (the minimizing vectors } x) \text{ in } \mathcal{P}(\pi, \zeta).$$

Proposition 7 (parametric framework for cost and expected gain). An optimal solution to problem $\mathcal{P}(\pi, \zeta)$ is sure to exist (not necessarily uniquely), no matter what the choice of $\pi$ and $\zeta$. Indeed, the solution set $S(\pi, \zeta)$ in $\mathbb{R}^n$ is always convex, closed and bounded, with

$$S(0, 0) = \{0\} \text{ and } S(\lambda\pi, \lambda\zeta) = \{\, \lambda x \mid x \in S(\pi, \zeta) \,\} \text{ when } \lambda > 0, \tag{2.17}$$

while the function $d$ on $\mathbb{R}^2$ giving the minimum deviation is finite everywhere and convex (hence also continuous), moreover with the properties that

(a) $d(0, 0) = 0$, but $d(\pi, \zeta) > 0$ when $(\pi, \zeta) \ne (0, 0)$,
(b) $d(\lambda\pi, \lambda\zeta) = \lambda\, d(\pi, \zeta)$ when $\lambda > 0$,
(c) $d(\pi_1 + \pi_2, \zeta_1 + \zeta_2) \le d(\pi_1, \zeta_1) + d(\pi_2, \zeta_2)$,
(d) $\{\, (\pi, \zeta) \mid d(\pi, \zeta) \le \delta \,\}$ is a bounded set for every $\delta > 0$.

Proof. Our assumption A2 guarantees through Proposition 4 that the constraints in $\mathcal{P}(\pi, \zeta)$ can be satisfied, regardless of how $\pi$ and $\zeta$ are chosen. The finiteness of $d(\pi, \zeta)$ and nonemptiness of $S(\pi, \zeta)$ follow then from the properties of $f_{\mathcal{D}}$ in Proposition 3 (much as in the proof of Proposition 5). As the set of solutions to a convex programming problem, $S(\pi, \zeta)$ is convex and closed. By virtue of the boundedness of the level sets of $f_{\mathcal{D}}$ in Proposition 3(d), $S(\pi, \zeta)$ is bounded. Properties (a), (b), (c) and (d) of the function $d$ follow from the corresponding properties of $f_{\mathcal{D}}$ in Proposition 3. In particular, the set in (d) is the image of the compact set in Proposition 3(d) under the (continuous) linear transformation $x \mapsto (x^T e, x^T \bar r)$, and that guarantees it is compact as well. □

The relevance of problem $\mathcal{P}(\pi, \zeta)$ for our goal of analyzing the cost of a solution to problem $\mathcal{P}_0(\Delta)$ comes from the following observation.
Proposition 8 (reduced optimization perspective). When $\Delta > 0$, problem $\mathcal{P}_0(\Delta)$ is equivalent to the problem
$$\mathcal{P}'_0(\Delta):\quad \text{minimize } d(\pi,\zeta) \text{ subject to } \pi \text{ and } \zeta \text{ satisfying } \zeta - r_0\pi = \Delta,$$
in the sense that optimal values in both problems are the same, and the solutions to $\mathcal{P}_0(\Delta)$ are the vectors $x$ in $\mathbb{R}^n$ such that the pair $(\pi,\zeta) = (x^T e, x^T \bar r)$ solves $\mathcal{P}'_0(\Delta)$.

Proof. This is elementary in view of the nonemptiness of the solution sets $S(\pi,\zeta)$ established in Proposition 7, but notice that the single linear constraint, which was an inequality in $\mathcal{P}_0(\Delta)$, has been written now as an equation. Proposition 5 has made this possible by establishing that, when $\Delta > 0$, the inequality must be tight at optimality. □

According to Proposition 8, the pairs $(x^T e, x^T \bar r)$ giving the cost and expected gain associated with the solutions $x$ (or solution, if unique) to problem $\mathcal{P}_0(\Delta)$ are the pairs $(\pi,\zeta)$ that furnish the minimum of the function $d$ along the line in $\mathbb{R}^2$ described by the equation $\zeta = r_0\pi + \Delta$. Due to positive homogeneity in (2.11) of Proposition 5, of course, we can concentrate on the case where $\Delta = 1$. In that case, depicted in Figure 2-2, the line in question is the one with slope $r_0$ that passes through the point $(0,1)$. The curves shown in Figure 2-2 are given by the equations $d(\pi,\zeta) = \delta$ for various $\delta > 0$ and reflect the properties in Proposition 7. They are the boundaries of certain compact, convex sets which are merely rescaled versions of each other, generated by expanding or contracting the one for $\delta = d(0,1)$. In the classical case of standard deviation, the curves would be ellipses, but in general they may have corners and straight segments.
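The reduced perspective of Proposition 8 can be tried out numerically in the classical case of standard deviation. The following is only a minimal sketch under stated assumptions: the expected rates `rbar` and covariance matrix `Sigma` are hypothetical data, $D$ is taken to be standard deviation so that $f_D(x) = \sqrt{x^T \Sigma x}$, and `d`, `solve_reduced` and `solve_direct` are illustrative helper names. In that special case $d(\pi,\zeta)$ has a closed form as the square root of a quadratic form, which is what makes the level curves ellipses.

```python
import numpy as np

# Hypothetical data (not from the text): three instruments.
rbar = np.array([0.08, 0.12, 0.15])            # expected rates of return
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])         # covariance matrix (positive definite)
e = np.ones(3)

A = np.vstack([e, rbar])                       # rows: cost and expected-gain functionals
M = A @ np.linalg.solve(Sigma, A.T)            # 2x2 matrix A Sigma^{-1} A^T

def d(pi, zeta):
    """Minimum standard deviation over x with x.e = pi and x.rbar = zeta."""
    b = np.array([pi, zeta])
    return float(np.sqrt(b @ np.linalg.solve(M, b)))

def solve_reduced(r0):
    """Minimize d(pi, r0*pi + 1) over pi (the reduced problem with Delta = 1)."""
    u = np.array([0.0, 1.0])                   # point (0, 1) on the line
    v = np.array([1.0, r0])                    # direction of the line (slope r0)
    # d^2 is quadratic in pi, so its minimizer solves a linear equation.
    pi = -(v @ np.linalg.solve(M, u)) / (v @ np.linalg.solve(M, v))
    zeta = r0 * pi + 1.0
    return pi, zeta, d(pi, zeta)

def solve_direct(r0):
    """Solve the original problem with Delta = 1 directly in the portfolio vector x."""
    g = rbar - r0 * e
    w = np.linalg.solve(Sigma, g)
    x = w / (g @ w)                            # satisfies x.(rbar - r0 e) = 1
    return x, float(np.sqrt(x @ Sigma @ x))

pi, zeta, value = solve_reduced(r0=0.03)
x, value_direct = solve_direct(r0=0.03)
print("reduced:", pi, zeta, value)
print("direct: ", e @ x, rbar @ x, value_direct)
```

Under these assumptions the two routes should report the same optimal deviation, and the cost and expected gain of the direct solution should coincide with the minimizing pair on the line. For a nonsmooth deviation measure no such closed form is available and a general convex solver would be needed instead.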
[Figure 2-2: Reduced optimization perspective. Horizontal axis: portfolio cost; vertical axis: expected gain. Shown: the line $\zeta = r_0\pi + 1$ (slope $r_0$) through $(0,1)$ and the curve $d(\pi,\zeta) = \delta$ for $\delta = d(0,1)$.]

The issue of whether the x-portfolios for $x \in S_0(1)$ have cost $x^T e > 0$, $x^T e = 0$ or $x^T e < 0$ comes down to whether, in minimizing $d$ along the line in Figure 2-2, the points $(\pi,\zeta)$ that are obtained at the minimum have $\pi > 0$, $\pi = 0$ or $\pi < 0$. (We have to speak in general of "x-portfolios" and "points $(\pi,\zeta)$" because uniqueness of optimal solutions is not assured here, in general.) It is immediately clear that this must depend largely on the size of the risk-free rate $r_0$ and cannot be resolved merely on the basis of any of the assumptions that have been made, so far, on the rates of return of the instruments $i = 1, \ldots, n$ in our model. For the $r_0$ that is illustrated, the slanted line in Figure 2-2 cuts into the set $\{(\pi,\zeta) \mid d(\pi,\zeta) \le d(0,1)\}$ toward the right, and one therefore has $\pi > 0$ at optimality. But for higher and higher levels of $r_0$, a stage will eventually be reached where the line henceforth cuts into this set instead toward the left, in which case $\pi < 0$ at optimality. A formal analysis of these circumstances, aimed at characterizing the threshold value, or values, of $r_0$ where the line does not cut into the set at all, will have to be undertaken.

Observe that in the case shown in Figure 2-2 there is not just one $r_0$ for which the line does not cut into the set in question, but indeed a whole interval of such values. Geometrically, this corresponds to the boundary of the set having a "corner point" at $(0,1)$. Although that can be regarded as an exceptional situation, it cannot be ruled out. Our result on threshold behavior (Theorem 5 in the next section) must therefore, in general, allow for an interval of $r_0$ values for which one has $\pi = 0$ at optimality.
For now, the essential thing to recognize is the need to study two efficient sets, if cost behavior in the one-fund theorem is to be understood over the whole range of possible $r_0$ values. There has to be an efficient set corresponding to x-portfolios furnishing "unit long positions" (cost $= 1$), but also one for x-portfolios furnishing "unit short positions" (cost $= -1$). Because the deviation measure $D$ might not be symmetric, neither of these efficient sets can be expected to be derivable in a simple way from the other.

Dictates of simplicity in dealing with the geometry of efficiency and its relationship to the one-fund theorem and properties of the function $d$ cause us to adopt a convention different from the one in Figure 2-1, where deviation is on the horizontal axis and expected gain on the vertical axis. Instead, we will have deviation on the vertical axis and expected gain on the horizontal axis. Of course, a flip across the 45° line between the two axes can be used to convert our convention to the classical one, when desired.

Definition 2 (efficient sets and frontiers, positive and negative). By the positive efficient set and the negative efficient set will be meant the boundaries $G^+$ and $G^-$, respectively, of the feasibility sets
$$F^+ = \{(\zeta,\delta) \mid \exists\, x \text{ with } x^T e = 1,\ x^T \bar r = \zeta,\ f_D(x) \le \delta\},$$
$$F^- = \{(\zeta,\delta) \mid \exists\, x \text{ with } x^T e = -1,\ x^T \bar r = -\zeta,\ f_D(x) \le -\delta\}. \tag{2.18}$$
By the positive efficient frontier will be meant the part of $G^+$ consisting of all $(\zeta,\delta) \in F^+$ for which there is no $(\zeta',\delta) \in F^+$ with $\zeta' > \zeta$. Likewise, by the negative efficient frontier will be meant the part of $G^-$ consisting of all $(\zeta,\delta) \in F^-$ for which there is no $(\zeta',\delta) \in F^-$ with $\zeta' < \zeta$.
[Figure 2-3: Efficient sets in coordinates standard for graphs of functions. Labels: master fund (negative), negative efficient frontier, negative efficient set, $F^-$.]

The virtue of passing to $(-\zeta, -\delta)$ in the definition of $F^-$ will emerge in the results below as a way of getting the most out of a single geometric picture in which both of the efficient sets, or frontiers, can be seen, namely the picture in Figure 2-3, which will be explained in due course, after Theorem 4. Note that, through a notational switch between $x$ and $-x$, one can express $F^-$ equivalently by
$$F^- = \{(\zeta,\delta) \mid \exists\, x \text{ with } x^T e = 1,\ x^T \bar r = \zeta,\ f_D(-x) \le -\delta\}. \tag{2.19}$$
Thus, in cases where the deviation measure $D$ is symmetric, so that $f_D(-x) = f_D(x)$, the set $F^-$ would merely be the reflection of the set $F^+$ across the $\zeta$-axis, and there would be less of an imperative for considering it separately.

As depicted in Figure 2-3, the positive efficient frontier is the "right" boundary of $F^+$, in contrast to $G^+$ being the whole boundary. In the same way, the negative efficient frontier is the "left" boundary of $F^-$, in contrast to $G^-$ being the whole boundary. Only these partial boundaries will really have a role in what follows, but it is convenient mathematically to work with $G^+$ and $G^-$ themselves.

For convenience of comparisons, Figure 2-4 poses all these sets in the reversed coordinate system that is customary in finance. There, the positive efficient frontier becomes an "upper" boundary and the negative efficient frontier a "lower" boundary.

[Figure 2-4: Efficient sets in coordinates traditional for finance. Axes: deviation, expected gain. Labels: $F^-$, negative efficient set, negative efficient frontier, positive efficient frontier, positive efficient set.]

Proposition 9 (efficient sets as function graphs). The positive efficient set $G^+$ is the graph of the convex function
$$d^+(\zeta) = d(1,\zeta) = \text{min deviation for cost } 1 \text{ and expected gain } \zeta, \tag{2.20}$$
whereas the negative efficient set $G^-$ is the graph of the concave function
$$d^-(\zeta) = -d(-1,-\zeta) = -\,\text{min deviation for cost } {-1} \text{ and expected gain } {-\zeta}.$$
(2.21)
Indeed, $F^+$ is the closed, convex set consisting of the pairs $(\zeta,\delta)$ for which $\delta \ge d^+(\zeta)$, whereas $F^-$ is the closed, convex set consisting of the pairs $(\zeta,\delta)$ for which $\delta \le d^-(\zeta)$. Furthermore, the asymptotic slope of $G^+$ on the right is the same as the asymptotic slope of $G^-$ on the left,
$$\lim_{\zeta\to\infty} \frac{d^+(\zeta)}{\zeta} = \lim_{\zeta\to-\infty} \frac{d^-(\zeta)}{\zeta} = d(0,1) > 0. \tag{2.22}$$

Proof. The convexity of $d^+$ and concavity of $d^-$ are evident from the convexity of $d$ in Proposition 7. Like $d$, these functions are finite and continuous, in particular. Those properties, along with the fact in Proposition 7 that the minimum deviations in (2.20) and (2.21) are sure to be attained, immediately yield the descriptions claimed for $F^+$ and $F^-$.

The relations in (2.22) specialize to $d$ a well known property in convex analysis: the asymptotic slope of a finite convex function along any half-line depends only on the direction of the half-line, not on its starting point (see [18, Theorem 8.5]). Here, we are looking at such slopes along half-lines in the $\pi\zeta$-space of Figure 2-2 that are parallel to the positive or negative $\zeta$-axis. On the positive $\zeta$-axis itself, $d$ is linear because of the positive homogeneity in Proposition 7(b): we have $d(0,\zeta)/\zeta = d(0,1)$ for all $\zeta > 0$. This value is trivially then the limit of the ratio as $\zeta \to \infty$. Therefore, $d(0,1)$ is also the limit of $d(1,\zeta)/\zeta$ as $\zeta \to \infty$, as well as the limit of $d(-1,\zeta)/\zeta$ as $\zeta \to \infty$, where, by the definitions of $d^+$ and $d^-$, the first is the limit of $d^+(\zeta)/\zeta$ as $\zeta \to \infty$ and the second is the limit of $d^-(-\zeta)/(-\zeta)$ as $\zeta \to \infty$.

Incidentally, the asymptotic slope of $G^+$ on the left agrees likewise with the asymptotic slope of $G^-$ on the right, but this fact will not play any role here. (The portions of $G^+$ on the left and $G^-$ on the right would drop out if we wrote the $\zeta$ constraint in $\mathcal{P}(\pi,\zeta)$ as an inequality instead of an equation.)
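The asymptotic slope relation (2.22) can be checked numerically in a simple case. The sketch below assumes, purely for illustration, that $D$ is standard deviation with hypothetical data `rbar` and `Sigma`, so that $d$ has a closed form; the helper names `d`, `d_plus` and `d_minus` are not from the text.

```python
import numpy as np

# Hypothetical data (not from the text).
rbar = np.array([0.08, 0.12, 0.15])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
e = np.ones(3)

A = np.vstack([e, rbar])
M = A @ np.linalg.solve(Sigma, A.T)

def d(pi, zeta):
    """Minimum standard deviation for cost pi and expected gain zeta."""
    b = np.array([pi, zeta])
    return float(np.sqrt(b @ np.linalg.solve(M, b)))

def d_plus(zeta):
    """Function whose graph is the positive efficient set, as in (2.20)."""
    return d(1.0, zeta)

def d_minus(zeta):
    """Function whose graph is the negative efficient set, as in (2.21)."""
    return -d(-1.0, -zeta)

slope = d(0.0, 1.0)                       # common asymptotic slope claimed in (2.22)
for z in (1e2, 1e4, 1e6):
    # Ratios on the right for d_plus and on the left for d_minus.
    print(z, d_plus(z) / z, d_minus(-z) / (-z), slope)
```

Both ratios should approach `slope` as `z` grows, the first from the right-hand branch of $G^+$ and the second from the left-hand branch of $G^-$. Because standard deviation is symmetric, this particular example cannot display any asymmetry between the two efficient sets.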
We are in position now to answer the question of how the tangency relationship associated with the classical one-fund theorem, as in Figure 2-1, can be extended to our framework of general deviation measures, as a complement to the one-fund results in Theorems 3 and 3'.

Theorem 4 (efficiency characterization of master funds). The optimal deviation value $d_0(1)$ in problem $\mathcal{P}_0(1)$ is the highest of the slopes of all the lines in $\mathbb{R}^2$ through $(r_0, 0)$ that lie between the curves $G^+$ and $G^-$ (perhaps touching them, but not crossing them). In referring to the line through $(r_0, 0)$ with slope $d_0(1)$ as the "$r_0$-line", the following conclusions can be drawn.
(a) If the $r_0$-line touches $G^+$ at a point $(\zeta^*, \delta^*)$, then any $x^* \in S(1, \zeta^*)$ furnishes a master fund of positive type: it has cost $x^{*T} e = 1$ and belongs to the optimal solution set $S_0(\Delta^*)$ for $\Delta^* = \zeta^* - r_0 > 0$.
(b) If the $r_0$-line touches $G^-$ at a point $(\zeta^*, \delta^*)$, then any $x^* \in S(-1, -\zeta^*)$ furnishes a master fund of negative type: it has cost $x^{*T} e = -1$ and belongs to the optimal solution set $S_0(\Delta^*)$ for $\Delta^* = -\zeta^* + r_0 > 0$.
(c) The maximum value that $d_0(1)$ can have with respect to different values of $r_0$ is the common asymptotic slope value $d(0,1)$ for $G^+$ on the right and $G^-$ on the left.

Proof. Our strategy is to derive this from the dual characterization of $d_0(1)$ in Proposition 6. That characterization translates in terms of the definition of $d$ into having
$$d_0(1) = \max\{\lambda \mid \lambda[\zeta - r_0\pi] \le d(\pi,\zeta) \text{ for all } (\pi,\zeta) \in \mathbb{R}^2\}. \tag{2.23}$$
For the inequality condition inside this description to hold, it only has to hold when $\pi > 0$ or $\pi < 0$, since it must then hold automatically for $\pi = 0$ by the continuity of $d$. Indeed, because of the positive homogeneity of $d$ in Proposition 7(b), it merely has to hold for $\pi = 1$ and for $\pi = -1$, in order for this conclusion to be reached. In the case of $\pi = 1$, the inequality $\lambda[\zeta - r_0\pi] \le d(\pi,\zeta)$ comes down to $\lambda(\zeta - r_0) \le d^+(\zeta)$. Having this hold for all $\zeta \in \mathbb{R}$ means that the line in $(\zeta,\delta)$-space through $(r_0, 0)$ with slope $\lambda$ does not cross above the graph of $d^+$.
In the case of $\pi = -1$, the inequality $\lambda[\zeta - r_0\pi] \le d(\pi,\zeta)$ comes down to $\lambda[\zeta + r_0] \le -d^-(-\zeta)$. With a switch of notation between $\zeta$ and $-\zeta$, this becomes $\lambda[\zeta - r_0] \ge d^-(\zeta)$. Having that hold for all $\zeta \in \mathbb{R}$ means that the line in question does not cross below the graph of $d^-$. Thus, the characterization of $d_0(1)$ in (2.23) reduces to the graphical characterization claimed in the theorem.

Now (a) and (b) are obvious from the representations of $G^+$ and $G^-$ in Proposition 9. On the other hand, (c) follows from the monotonicity in the curvatures of $G^+$ and $G^-$ (i.e., of the left and right derivatives of the functions $d^+$ and $d^-$ due to their convexity and concavity), as expressed through Proposition 9 and the claim there about asymptotic slopes. □

Figure 2-3 illustrates the general situation described in Theorem 4. Two possible values of the risk-free rate $r_0$ are indicated, corresponding to alternatives (a) and (b) of the theorem. Note the possibility of more than one point of "tangency," and on the other hand, the possibility of a range of $r_0$ values all yielding the same point of tangency.

Alternative (c) of Theorem 4 corresponds to the $r_0$-line having the same slope as the diagonal line shown in dashes. Typically there might be only one such line, associated with a unique threshold rate, but sometimes there could be a family of parallel lines corresponding to an interval of $r_0$ rates. This will be the subject of the next section, and eventually, Figure 2-5.

The same basic relationships underlie the reversed-coordinate picture in Figure 2-4, of course, but there they cannot be described so simply in terms of slopes. A line having slope $d_0(1)$ in Figure 2-3 turns into a line having slope $1/d_0(1)$ in Figure 2-4, so for instance, the optimal value in problem $\mathcal{P}_0(1)$ emerges instead as the reciprocal of the lowest of the slopes of all the lines through $(0, r_0)$ that lie between the two efficient sets.
The awkwardness of this kind of statement, insisting on reciprocals, is another of the reasons why we have chosen to give priority to the presentation in Figure 2-3.

2.6 Threshold Determination for the Risk-Free Rate

The task immediately ahead of us is the analysis of the transitional behavior between the cases in Theorem 4. For that, we will make use of the Lagrange multipliers associated with the problem $\mathcal{P}(\pi,\zeta)$, specifically in the case of $(\pi,\zeta) = (0,1)$. The Lagrangian for $\mathcal{P}(\pi,\zeta)$ is the function
$$L_{(\pi,\zeta)}(x,\rho,\eta) = f_D(x) + \rho[\pi - x^T e] + \eta[\zeta - x^T \bar r]. \tag{2.24}$$
We say that $(\rho,\eta)$ is a Lagrange multiplier vector for $\mathcal{P}(\pi,\zeta)$ when
$$\inf_x L_{(\pi,\zeta)}(x,\rho,\eta) = \text{optimal value } d(\pi,\zeta) \text{ in } \mathcal{P}(\pi,\zeta). \tag{2.25}$$
This definition, the standard one for a convex programming problem like $\mathcal{P}(\pi,\zeta)$ (cf. [18]), gets around the fact that the objective function $f_D(x)$ need not be differentiable everywhere with respect to $x$. We let
$$M(\pi,\zeta) = \text{set of Lagrange multiplier vectors } (\rho,\eta) \text{ in } \mathcal{P}(\pi,\zeta). \tag{2.26}$$
Under our assumptions, the Lagrange multiplier set $M(\pi,\zeta)$ is always nonempty, convex and bounded. This is true from the general theory of convex programming problems because the optimal solution set to $\mathcal{P}(\pi,\zeta)$ is always nonempty and bounded, and the optimal value $d(\pi,\zeta)$ is always finite; cf. [25], [18]. Moreover, the multiplier vectors for $\mathcal{P}(\pi,\zeta)$ are known from that theory to be the subgradients of the (optimal-value) function $d$ at the point $(\pi,\zeta)$. Accordingly, they furnish the formula
$$d'(\pi,\zeta;\pi',\zeta') = \max_{\rho,\eta}\{\pi'\rho + \zeta'\eta \mid (\rho,\eta) \in M(\pi,\zeta)\}, \tag{2.27}$$
where the left side denotes the one-sided directional derivative of $d$ at $(\pi,\zeta)$ with respect to a vector $(\pi',\zeta')$ and is defined by
$$d'(\pi,\zeta;\pi',\zeta') = \lim_{\varepsilon \downarrow 0} \frac{d(\pi + \varepsilon\pi', \zeta + \varepsilon\zeta') - d(\pi,\zeta)}{\varepsilon}. \tag{2.28}$$
Such derivatives exist because $d$ is convex.

Theorem 5 (rate thresholds and cost behavior).
(a) Threshold values $r_0^+$ and $r_0^-$ exist which satisfy $r_0^+ \le r_0^-$ and have the following effect for problem $\mathcal{P}_0(\Delta)$ and its solution set $S_0(\Delta)$ for all $\Delta > 0$:
every $x \in S_0(\Delta)$ has cost $x^T e > 0$ when $r_0 < r_0^+$,
every $x \in S_0(\Delta)$ has cost $x^T e < 0$ when $r_0 > r_0^-$,
every $x \in S_0(\Delta)$ has cost $x^T e = 0$ when $r_0^+ < r_0 < r_0^-$,
where the third case falls away if actually $r_0^+ = r_0^-$, so as to yield a single threshold rate $r_0^*$.
(b) In the borderline case of $r_0 = r_0^+$, at least one $x \in S_0(\Delta)$ has cost $x^T e = 0$, but there could be other vectors $x \in S_0(\Delta)$ having cost $x^T e > 0$. Similarly, in the case of $r_0 = r_0^-$, at least one $x \in S_0(\Delta)$ has cost $x^T e = 0$, but there could be other vectors $x \in S_0(\Delta)$ having cost $x^T e < 0$.
(c) The threshold rates $r_0^+$ and $r_0^-$ can be determined from the Lagrange multiplier set $M(0,1)$ for problem $\mathcal{P}(0,1)$. That set consists of the pairs $(\rho,\eta)$ satisfying $\eta = d(0,1)$ and $\rho^- \le \rho \le \rho^+$ for a certain interval $[\rho^-, \rho^+]$, and in those terms one has
$$r_0^+ = -\frac{\rho^+}{d(0,1)}, \qquad r_0^- = -\frac{\rho^-}{d(0,1)}.$$

Proof. Let $\varphi(\pi) = d(\pi, r_0\pi + 1)$, this being a finite, convex function on $\mathbb{R}$ (by Proposition 7). Because of the scaling relation in (2.11) of Proposition 5, we need only look at the case where $\Delta = 1$. As seen in Proposition 8, the costs $x^T e$ of the vectors $x \in S_0(1)$ are the values of $\pi$ that minimize $\varphi$ over $\mathbb{R}$. Such values form a nonempty, closed, bounded interval in $\mathbb{R}$, inasmuch as $S_0(1)$ is a nonempty, closed, bounded, convex subset of $\mathbb{R}^n$ (cf. Proposition 7); this interval may well collapse to just one $\pi$ value, of course. The issue is the extent to which the values of $\pi$ that minimize $\varphi$ may be positive, negative or zero.

As a finite, convex function on $\mathbb{R}$, $\varphi$ has right and left derivatives $\varphi'_+(\pi)$ and $\varphi'_-(\pi)$ which are nondecreasing as functions of $\pi$, with $\varphi'_-(\pi) \le \varphi'_+(\pi)$. The minimum of $\varphi$ is attained at $\pi$ if and only if $\varphi'_-(\pi) \le 0 \le \varphi'_+(\pi)$. We can test this condition at $\pi = 0$. If $\varphi'_+(0) < 0$, the minimum of $\varphi$ can only be attained at some $\pi > 0$, whereas if $\varphi'_-(0) > 0$, it can only be attained at some $\pi < 0$.
If $\varphi'_-(0) < 0 < \varphi'_+(0)$, it can only be attained at $\pi = 0$. When $\varphi'_+(0) = 0$, the minimum is attained at $\pi = 0$, but it is conceivable that $\varphi$ might be constant over some interval $[0, \varepsilon]$ (with $\varepsilon > 0$), and the minimum would also be attained then by the positive values of $\pi$ in that interval. Likewise, when $\varphi'_-(0) = 0$, the minimum is attained at $\pi = 0$, but it is conceivable that $\varphi$ might be constant over some interval $[-\varepsilon, 0]$, and the minimum would also be attained then by the negative values of $\pi$ in that interval.

[Figure 2-5: Threshold values for the risk-free rate. Axes: expected gain, deviation.]

The crucial left and right derivatives $\varphi'_+(0)$ and $\varphi'_-(0)$ are obtainable from the one-sided directional derivatives of $d$:
$$\varphi'_+(0) = d'(0,1;\,1,\,r_0), \qquad \varphi'_-(0) = -d'(0,1;\,-1,\,-r_0).$$
The Lagrange multiplier characterization of the directional derivatives of $d$ in (2.27) tells us then that
$$\varphi'_+(0) = \max\{\rho + r_0\eta \mid (\rho,\eta) \in M(0,1)\}, \qquad \varphi'_-(0) = \min\{\rho + r_0\eta \mid (\rho,\eta) \in M(0,1)\}.$$
We note next that, because $d(0,\lambda) = \lambda\, d(0,1)$ when $\lambda > 0$ by the positive homogeneity in Proposition 7(b), we have $d(0,1) = d'(0,1;\,0,\,1)$ and $-d(0,1) = d'(0,1;\,0,\,-1)$, and consequently by (2.27) that
$$d(0,1) = \max\{0\rho + 1\eta \mid (\rho,\eta) \in M(0,1)\} = \min\{0\rho + 1\eta \mid (\rho,\eta) \in M(0,1)\}.$$
This implies that $M(0,1)$ lies within the horizontal line consisting of the pairs $(\rho,\eta)$ such that $\eta = d(0,1)$. Because $M(0,1)$ is a compact, convex set, it must actually be a closed segment of that line (possibly reduced to a single point): the corresponding $\rho$ values must comprise an interval $[\rho^-, \rho^+]$. Thus, $M(0,1)$ has the special form claimed in the theorem. The formulas we already have for $\varphi'_+(0)$ and $\varphi'_-(0)$ tell us then that
$$\varphi'_+(0) = \rho^+ + r_0\, d(0,1), \qquad \varphi'_-(0) = \rho^- + r_0\, d(0,1).$$
Finally, we can put this together with the criterion already developed for the location of the values of $\pi$ that minimize $\varphi$.
Clearly $\varphi'_+(0) < 0$ if and only if $r_0 < -\rho^+/d(0,1)$, whereas $\varphi'_-(0) > 0$ if and only if $r_0 > -\rho^-/d(0,1)$, and the proof is thus finished. □

Corollary (optimal portfolios at transition). The values of $r_0$ for which the maximum in part (c) of Theorem 4 is attained are those in the interval $[r_0^+, r_0^-]$. For such $r_0$, the optimal solution set $S_0(\Delta)$ to $\mathcal{P}_0(\Delta)$, for any $\Delta > 0$, contains an $x$ having cost $x^T e = 0$.

Theorem 5 provides important information about master funds, in particular.

Theorem 6 (existence of master funds). The threshold values $r_0^+$ and $r_0^-$ in Theorem 5 have the property that
when $r_0 < r_0^+$, there is a master fund of positive type but none of negative type,
when $r_0 > r_0^-$, there is a master fund of negative type but none of positive type,
when $r_0^+ < r_0 < r_0^-$, there is neither a master fund of positive type nor one of negative type.
In the borderline case of $r_0 = r_0^+$, there might be a master fund of positive type, whereas in the borderline case of $r_0 = r_0^-$, there might be a master fund of negative type. (When $r_0 = r_0^+ = r_0^-$ it is not excluded that master funds of both types exist simultaneously.)

Proof. When $r_0 < r_0^+$, there exists by Proposition 5 and Theorem 5(a) an $x \in S_0(\Delta)$ having $x^T e > 0$. By setting $x^* = x/x^T e$ and $\Delta^* = \Delta/x^T e$, we get $x^{*T} e = 1$ and have $x^* \in S_0(\Delta^*)$ (again by Proposition 5). This $x^*$ meets the prescription in Definition 1 for furnishing a master fund of positive type. Similarly, when $r_0 > r_0^-$, there exists by Proposition 5 and Theorem 5(a) an $x \in S_0(\Delta)$ having $x^T e < 0$. Then, by setting $x^* = x/|x^T e|$ and $\Delta^* = \Delta/|x^T e|$, we get $x^{*T} e = -1$ and have $x^* \in S_0(\Delta^*)$. This $x^*$ furnishes a master fund of negative type. On the other hand, Theorem 5(a) makes clear that a master fund of positive type cannot exist when $r_0 > r_0^+$, but might exist (by the argument just given) when $r_0 = r_0^+$. Likewise, a master fund of negative type cannot exist when $r_0 < r_0^-$, but might exist when $r_0 = r_0^-$.
□

The interpretation coming from Theorems 5 and 6 is that when the risk-free rate $r_0$ is high enough (specifically, above the threshold rate $r_0^-$), it is advantageous, for investors whose attitudes toward risk are captured by the particular deviation measure $D$ under investigation (and its associated risk envelope $\mathcal{Q}$), to take a net short position in the market (an x-portfolio with negative cost) and invest at the risk-free rate all the money that is obtained that way.

The relation between threshold behavior and the efficient set geometry in Figure 2-3 is indicated in Figure 2-5. Ordinarily, it can be expected that $r_0^+ = r_0^-$, in which case the unified threshold value could be denoted simply by $r_0^*$. The explanation is that these values are determined in the proof of Theorem 5 from right and left derivatives of the convex function $\varphi(\pi) = d(\pi, r_0\pi + 1)$, and such derivatives have to coincide almost everywhere. When $r_0^+ = r_0^-$, there is only one line that fits between the two efficient sets.

The reason why cases with $r_0^+ < r_0^-$ can truly occur is seen, from this vantage point, to be tied to the fact that the right and left derivatives of $\varphi$ may differ in some places, due to the function $d$ not being differentiable. But it can also be understood from the remarks made earlier about the geometry in Figure 2-2. When the curve through $(0,1)$ in Figure 2-2 has a corner there, one has a range of slopes $r_0$ corresponding to cases for which the minimum in the reduced format of Proposition 8 occurs with $\pi = 0$. This range of slopes is marked by the two threshold values $r_0^+$ and $r_0^-$, as shown in Figure 2-6.

Allowance for corner points really does have to be made, because of deviation measures such as the one in (2.8), for instance. There we have
$$f_D(x) = \max_j\, x^T(\bar r - E[Q_j r]) = \max_j \{E[x^T r] - E[Q_j\, x^T r]\}, \tag{2.29}$$
so that $f_D$ is piecewise linear and the curves in Figure 2-6 are polygonal.
Aside from the modeling potential of this kind of formula, it could also come up in numerical methods in which the risk envelope $\mathcal{Q}$ associated with $D$ is approximated progressively by sets formed by generating finitely many elements $Q_j \in \mathcal{Q}$. Such approximation yields (2.29) as a substitute for the max formula for $f_D$ in Proposition 3(e). Insight into circumstances in which corner points are sure not to be present will be furnished later in Proposition 11 and its corollary, and in Example 7.

It should be noted that, although Theorem 6 conveys the circumstances in which master funds of one type or the other are sure to exist, it says nothing about when they might be unique. That is an entirely separate issue. Uniqueness could fail on two grounds. The first is the possibility of more than one point of tangency where the $r_0$-line meets the frontier, as seen in Figure 2-3. The second arises when more than one portfolio can yield the same point on the frontier. It may be anticipated that these phenomena are "rare," but they cannot readily be eliminated, a priori, in the absence of a suitable strict convexity property of $D$. Rockafellar et al. demonstrated [23, Example 4], however, that the required version of strict convexity is unavailable, in general, for coherent deviation measures such as lower semideviation, lower range, and CVaR, and the same can be seen for mixed CVaR and mean absolute deviation.

In contrast to these observations, and the facts in Theorem 6, both the existence and uniqueness of a master portfolio seem to be taken for granted in much of the literature on portfolio optimization. The belief is widespread, moreover, that a master portfolio of positive type always suffices, regardless of the magnitude of the risk-free rate $r_0$. Our hope is that the rigorous methodology pursued in this chapter will help to dispel such misconceptions.
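The threshold formula of Theorem 5(c) can be illustrated in the smooth case of standard deviation, where $d$ is differentiable away from the origin and the multiplier set $M(0,1)$ reduces to the gradient of $d$ at $(0,1)$, so that the two thresholds coincide. The sketch below rests on assumptions not taken from the text: the data `rbar` and `Sigma` are hypothetical, $f_D(x) = \sqrt{x^T \Sigma x}$, and all function names are illustrative.

```python
import numpy as np

# Hypothetical data (not from the text).
rbar = np.array([0.08, 0.12, 0.15])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
e = np.ones(3)

A = np.vstack([e, rbar])
M = A @ np.linalg.solve(Sigma, A.T)        # d(b)^2 = b^T M^{-1} b for b = (pi, zeta)

# Gradient of d at (0, 1): the single multiplier pair (rho, eta) in this smooth case.
u = np.array([0.0, 1.0])
Minv_u = np.linalg.solve(M, u)
d01 = float(np.sqrt(u @ Minv_u))           # d(0, 1)
rho, eta = Minv_u / d01

# Threshold from the multiplier formula, with rho^+ = rho^- = rho here.
r_star = -rho / d01

# Cross-check: expected rate of the minimum-variance portfolio of unit cost.
x_mv = np.linalg.solve(Sigma, e)
x_mv = x_mv / (e @ x_mv)
r_mv = float(rbar @ x_mv)

def optimal_cost(r0):
    """Cost x.e of the solution of the problem with Delta = 1 for standard deviation."""
    g = rbar - r0 * e
    w = np.linalg.solve(Sigma, g)
    x = w / (g @ w)
    return float(e @ x)

print("eta, d(0,1):", eta, d01)
print("threshold from multipliers:", r_star, " minimum-variance rate:", r_mv)
for r0 in (r_star - 0.01, r_star + 0.01):
    print("r0 =", r0, " optimal cost =", optimal_cost(r0))
```

With these assumptions, `eta` should equal `d01`, the threshold from the multiplier formula should agree with the expected rate of the minimum-variance unit-cost portfolio, and the optimal cost should change sign as `r0` passes through the threshold. A deviation measure with corners, such as the polyhedral form in (2.29), would instead require computing a whole interval of multipliers.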
[Figure 2-6: Threshold interval tied to corner behavior. Axes: portfolio cost, expected gain. Shown: the curve $d(\pi,\zeta) = \delta$ for $\delta = d(0,1)$ and lines through $(0,1)$ labeled by their slopes.]

2.7 Characterization of Optimal Portfolios

Our attention turns now to the challenge of identifying the distinguishing characteristics of the x-portfolios that, for a selected deviation measure $D$, solve the basic problems $\mathcal{P}_0(\Delta)$ when $\Delta > 0$. In this context, we henceforth add the assumptions
(A4) $D$ is lower semicontinuous, i.e., the sets $\{X \mid D(X) \le \delta\}$ are closed in $\mathcal{L}^2(\Omega)$ for every $\delta > 0$,
(A5) $D(X) < \infty$ for all $X \in \mathcal{L}^2(\Omega)$,
which ensure actually that $D$ is continuous on $\mathcal{L}^2(\Omega)$; cf. [11, Proposition 1]. We then have at our disposal the dual representation of $D$ in terms of its associated risk envelope, as in part (a) of Theorem 2, which in particular can be expressed by covariances with risk envelopes:
$$D(X) = \max_{Q \in \mathcal{Q}} \operatorname{covar}(-X, Q). \tag{2.30}$$
Of special interest to us in what follows will be the subset of $\mathcal{Q}$ on which the maximum in this formula is attained for a given $X$, namely
$$\mathcal{Q}_X = \{Q \in \mathcal{Q} \mid \operatorname{covar}(-X, Q) = D(X)\}. \tag{2.31}$$
This assists in the statement of optimality conditions for problems in which $D$ is minimized, through its connection to subgradients of $D$. By the standard definition in convex analysis, as adapted to the probability framework of our space $\mathcal{L}^2(\Omega)$, a subgradient of $D$ at $X$ is an element $Y \in \mathcal{L}^2(\Omega)$ such that $D(X') \ge D(X) + E[Y(X' - X)]$ for all $X' \in \mathcal{L}^2(\Omega)$. The notation is used that $\partial D(X) = \{\text{set of all subgradients } Y \text{ of } D \text{ at } X\}$. In our situation where $D$ is finite and continuous on $\mathcal{L}^2(\Omega)$, $\partial D(X)$ is always a nonempty, convex subset of $\mathcal{L}^2(\Omega)$ which is closed and bounded. Rockafellar et al. showed [11, Theorem 5] that in fact
$$\partial D(X) = \{Y = 1 - Q \mid Q \in \mathcal{Q}_X\}, \tag{2.32}$$
and furthermore that, for any $Y \in \partial D(X)$, one has $EY = 0$ and $D(X) = E[XY] = \operatorname{covar}(X, Y)$.

Subgradients of the function $f_D(x) = D(x^T r)$ can be derived from those of $D$ by a kind of chain rule.
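The relations (2.30) to (2.32) can be made concrete on a finite scenario space. The sketch below assumes, as an illustration rather than anything stated above, the lower range deviation $D(X) = EX - \min X$, whose risk envelope is taken to consist of all $Q \ge 0$ with $EQ = 1$; the probabilities `p`, the gains `X` and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.2, 0.3, 0.1, 0.4])           # scenario probabilities (hypothetical)
X = np.array([0.05, -0.02, 0.10, 0.01])      # gain in each scenario (hypothetical)

def E(Z):
    """Expectation under the scenario probabilities."""
    return float(p @ Z)

def covar(U, V):
    """Covariance of two random variables on the scenario space."""
    return E(U * V) - E(U) * E(V)

def D(Z):
    """Lower range deviation: expected value minus worst outcome (assumed example)."""
    return E(Z) - float(Z.min())

# Maximizer in the dual representation: all weight on the worst scenario.
k = int(np.argmin(X))
Q = np.zeros_like(p)
Q[k] = 1.0 / p[k]                            # Q >= 0 and E[Q] = 1

Y = 1.0 - Q                                  # candidate subgradient, as in (2.32)

print("D(X)           =", D(X))
print("covar(-X, Q)   =", covar(-X, Q))
print("E[Y], E[XY]    =", E(Y), E(X * Y))

# Other envelope elements should not exceed D(X) in the max formula.
for _ in range(3):
    q = rng.random(4)
    Q_other = q / (p @ q)                    # rescaled so that E[Q_other] = 1
    print("covar(-X, Q') =", covar(-X, Q_other))
```

Under these assumptions the covariance with the chosen `Q` should reproduce `D(X)`, the element `Y` should have mean zero with `E(X * Y)` equal to `D(X)`, and randomly generated envelope elements should give values no larger than `D(X)`.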
A subgradient of $f_D$ at a point $x \in \mathbb{R}^n$ is of course a vector $y \in \mathbb{R}^n$ such that $f_D(x') \ge f_D(x) + (x' - x)^T y$ for all $x' \in \mathbb{R}^n$. The set of such $y$ is denoted by $\partial f_D(x)$. A number of background facts from convex analysis [18] are worth recalling. Because $f_D$ is convex and finite everywhere, the subgradient set $\partial f_D(x)$ is always nonempty, convex and compact. It reduces to a singleton $\{y\}$ if and only if $f_D$ is differentiable at $x$, in which case the unique element $y = (y_1, \ldots, y_n)$ is the gradient $\nabla f_D(x)$, so that
$$y_i = \frac{\partial f_D}{\partial x_i}(x) \quad\text{for } i = 1, \ldots, n \tag{2.33}$$
(see [18, Theorem 25.1]). Indeed, the convexity and finiteness of $f_D$ guarantee that this case of differentiability, in which $\partial f_D(x)$ reduces simply to $\{\nabla f_D(x)\}$, holds for almost every $x \in \mathbb{R}^n$. But unfortunately, there is no easy way to know, in general, whether a particular $x$ calculated to be optimal will happen to be such a point of differentiability. Most of the deviation measures $D$ of interest in our general framework necessarily do lead to functions $f_D$ for which differentiability can fail at a significant class of points $x$. In consequence, the simplifications accruing from differentiability cannot be taken for granted. A possible help sometimes could be the estimate that
$$\text{if } y \in \partial f_D(x), \text{ then } \frac{\partial^- f_D}{\partial x_i}(x) \le y_i \le \frac{\partial^+ f_D}{\partial x_i}(x) \quad\text{for } i = 1, \ldots, n, \tag{2.34}$$
where $(\partial^- f_D/\partial x_i)(x)$ and $(\partial^+ f_D/\partial x_i)(x)$ denote the left and right (one-sided) partial derivatives of $f_D$ with respect to $x_i$ at $x$ (which exist because $f_D$ is convex). The converse implication is generally false; the estimate in (2.34) is not enough to pin down $y$ as a subgradient at $x$. When the left and right partial derivatives coincide for each $i$, however, one does necessarily have $f_D$ differentiable at $x$ with (2.33) holding.

Proposition 10 (chain rule for deviation subgradients). The vectors $y \in \partial f_D(x)$ are the vectors $(y_1, \ldots, y_n)$ such that, for the r.v.
$X = x^T r$, there exists $Y \in \partial D(X)$ with
$$y_i = \operatorname{covar}(r_i, Y) \quad\text{for } i = 1, \ldots, n, \tag{2.35}$$
or equivalently, there exists $Q \in \mathcal{Q}_X$ such that
$$y_i = \operatorname{covar}(-r_i, Q) \quad\text{for } i = 1, \ldots, n. \tag{2.36}$$
Furthermore, $y \in \partial f_D(x)$ entails $x^T y = f_D(x)$. When $x$ is a point where $f_D$ is differentiable, as is sure to hold in particular when there is only one $Y \in \partial D(X)$, or equivalently, only one $Q \in \mathcal{Q}_X$, these formulas are available to be combined with the partial derivative relations in (2.33).

Proof. The first formula comes from a general chain rule of convex analysis through the fact that $f_D$ is the composition of $D$ with the continuous linear transformation $T$ from $\mathbb{R}^n$ to $\mathcal{L}^2(\Omega)$ defined by $T(x) = x^T r$. That chain rule [25, Theorem 19] requires, for instance, the existence of a point in the range of $T$ at which $D$ is continuous, and this requirement is met under our assumptions A4 and A5, as noted. It characterizes $\partial f_D(x)$ as the set of vectors of the form $T^*(Y)$ for $Y \in \partial D(X)$, where $T^*$ is the adjoint linear transformation from $\mathcal{L}^2(\Omega)$ to $\mathbb{R}^n$. That adjoint transformation takes $Y$ to the vector in $\mathbb{R}^n$ having the components $y_i$ in (2.35). That confirms the first description of $\partial f_D(x)$, and the second description then falls immediately out of (2.32). The fact that $x^T y = f_D(x)$ when $y \in \partial f_D(x)$ is obvious then from the fact that $E[XY] = D(X)$ when $Y \in \partial D(X)$. It is clear that when $\partial D(X)$ reduces to a singleton $\{Y\}$, the characterization of $\partial f_D(x)$ yields a singleton $\{y\}$, in which case $f_D$ is differentiable at $x$ and (2.33) holds, as recalled above. □

Theorem 7 (optimality rule). A portfolio vector $x$ belongs to the solution set $S_0(\Delta)$ in problem $\mathcal{P}_0(\Delta)$ for a $\Delta > 0$ if and only if $x^T[\bar r - r_0 e] = \Delta$ and there is a value of $\lambda$ and a vector $y \in \partial f_D(x)$ (as characterized by Proposition 10) such that
$$\lambda[\bar r_i - r_0] = y_i \quad\text{for } i = 1, \ldots, n. \tag{2.37}$$
Then necessarily $x \ne 0$, $f_D(x) > 0$, and $\lambda = f_D(x)/\Delta > 0$.

Proof. In problem $\mathcal{P}_0(\Delta)$, the finite (but not necessarily differentiable) function $f_D$ is minimized subject to the single linear constraint $x^T[\bar r - r_0 e] \ge \Delta$.
For such a problem of convex programming, the condition both necessary and sufficient for the optimality of $x$ is (the fulfillment of the constraint along with) the existence of a Lagrange multiplier $\lambda \ge 0$ such that the Lagrangian expression
$$L_\Delta(x, \lambda) = f_D(x) + \lambda\big[\Delta - x^T(\bar r - r_0 e)\big]$$
attains its unconstrained minimum at this $x$, with $\lambda = 0$ if $x^T[\bar r - r_0 e] > \Delta$. By Proposition 5, we necessarily have $x^T[\bar r - r_0 e] = \Delta$ in optimality, however. Since $\Delta > 0$, that entails $x \ne 0$, so $f_D(x) > 0$ by Proposition 3(a). The condition for an unconstrained minimum, $L_\Delta(x', \lambda) \ge L_\Delta(x, \lambda)$ for all $x'$, translates into
$$f_D(x') \ge f_D(x) + (x' - x)^T[\lambda(\bar r - r_0 e)] \quad\text{for all } x'.$$
This means $\lambda(\bar r - r_0 e) \in \partial f_D(x)$ and is the same as (2.37) holding for some $y \in \partial f_D(x)$. It implies by Proposition 10 that $x^T[\lambda(\bar r - r_0 e)] = f_D(x)$, from which it follows (even without knowing in advance that $\lambda \ge 0$, as we do) that $\lambda = f_D(x)/x^T[\bar r - r_0 e] = f_D(x)/\Delta > 0$. □

It must be emphasized that the optimality rule in Theorem 7 is valid regardless of the cost of the x-portfolio. It can be applied in particular to master funds, however, by way of Theorems 5 and 6, as will be carried out shortly in Theorem 8.

For many choices of the deviation measure $D$, a precise description of the vectors $y \in \partial f_D(x)$ that enter the rule is available through Proposition 10 and the analysis of deviation subgradients $Y \in \partial D(X)$ and risk envelopes $Q \in \mathcal{Q}_X$ that we have already carried out in examples [11, 23]. The details will not be listed here, but we will draw on them below in a series of examples concerned with master funds.

The statement of Theorem 7 has been chosen to keep close to the optimality conditions that would be anticipated when $f_D$ is a differentiable function (away from the origin of $\mathbb{R}^n$, where its differentiability is impossible). Then the $y_i$'s are the partial derivatives in (2.33), and it may be imagined that a solution $x = (x_1, \ldots, x_n)$ could be determined from the $n+1$ equations in the $n+1$ unknowns $x_1, \ldots, x_n$ and $\lambda$ that correspond then to optimality, namely
$$\begin{cases} x_1[\bar r_1 - r_0] + \cdots + x_n[\bar r_n - r_0] = \Delta, \\[2pt] \lambda[\bar r_i - r_0] = \dfrac{\partial f_D}{\partial x_i}(x_1, \ldots, x_n) \quad\text{for } i = 1, \ldots, n. \end{cases} \tag{2.38}$$
A practical shortcoming in this approach is apparent, however, in the fact that the partial derivatives depend nonlinearly on $x_1, \ldots, x_n$, in general, and although this dependence would be continuous, it would not likely be differentiable. (The twice differentiability of $f_D$ is rarely to be expected in our setting.) Solving systems of nonlinear equations directly is a formidable task even by numerical methods when the expressions in the equations are not differentiable. This is true all the more when the solution is not guaranteed to be unique, even locally, a further difficulty which, on the basis of the remarks at the end of the preceding section, cannot be shoved aside.

Anyway, there is no need to reduce optimality conditions to a system of equations in order to gain insight from them. The subgradient expressions in Theorem 7 provide both practical and theoretical information which can readily be utilized. Numerical methods of optimization for solving problem $\mathcal{P}_0(\Delta)$, making use of these conditions, can be substituted very effectively for numerical methods of solving nonlinear equations. Such optimization techniques can take advantage of special features like the max formula for $f_D$ in Proposition 3(e) obtained from the risk envelope $\mathcal{Q}$, which relates to the form of the $y_i$'s in (2.36), for example, but this is not the place to pursue that topic.

The optimality conditions in Theorem 7 can be restated in a different manner by recognizing that, since $\lambda$ must turn out to be equal to $f_D(x)/\Delta$, this ratio can be substituted for $\lambda$ in (2.37), moreover with $\Delta$ replaced by $x^T[\bar r - r_0 e]$. The conditions we reach by that route are
$$\bar r_i - r_0 = \frac{y_i}{f_D(x)}\,\Delta \quad\text{for } i = 1, \ldots, n, \text{ with } (y_1, \ldots, y_n) = y \in \partial f_D(x).$$
(2.39)

These equations moreover embody the requirement that $x^T[\bar r - r_0 e] = \Delta$, since that follows from them by Proposition 10 through multiplying each equation by $x_i$ and adding up. The criterion for optimality in (2.39) produces important information about the portfolios that furnish master funds.

Theorem 8 (characterization of master funds). For an $x^*$-portfolio and its gain r.v. $X^* = x^{*T} r$, let $B(x^*)$ stand for the set of all vectors $\beta = (\beta_1, \ldots, \beta_n)$ (perhaps no more than one) that satisfy

$$\beta_i = \mathrm{covar}(r_i, Y^*)/D(X^*) \ \text{ for some } Y^* \in \partial D(X^*), \quad \text{and} \quad \beta_i = \mathrm{covar}(-r_i, Q^*)/D(X^*) \ \text{ for some } Q^* \in \mathcal{Q}_{X^*}, \tag{2.40}$$

the two descriptions in this formula being equivalent.

(a) A master fund of positive type is furnished by $x^*$ if and only if $x^*$ has cost $x^{*T} e = 1$ and there is a coefficient vector $\beta \in B(x^*)$ such that

$$\bar r_i - r_0 = \beta_i [EX^* - r_0] \ \text{ for } i = 1, \ldots, n. \tag{2.41}$$

(b) A master fund of negative type is furnished by $x^*$ if and only if $x^*$ has cost $x^{*T} e = -1$ and there is a coefficient vector $\beta \in B(x^*)$ such that

$$\bar r_i - r_0 = -\beta_i [(-EX^*) - r_0] \ \text{ for } i = 1, \ldots, n. \tag{2.42}$$

(c) All coefficient vectors $\beta \in B(x^*)$ have the property that

$$\beta_1 x_1^* + \cdots + \beta_n x_n^* = 1. \tag{2.43}$$

They are subject to the general estimates

$$\frac{1}{f_D(x^*)} \frac{\partial^- f_D}{\partial x_i}(x^*) \le \beta_i \le \frac{1}{f_D(x^*)} \frac{\partial^+ f_D}{\partial x_i}(x^*) \ \text{ for } i = 1, \ldots, n, \tag{2.44}$$

which, when $x^*$ happens to be a point at which $f_D$ is differentiable, reduce to fixing $\beta$ uniquely through

$$\beta_i = \frac{1}{f_D(x^*)} \frac{\partial f_D}{\partial x_i}(x^*) \ \text{ for } i = 1, \ldots, n. \tag{2.45}$$

Proof. This combines the optimality conditions in Theorem 7, in the version posed in (2.39), with the facts in Proposition 10 about vectors $y \in \partial f_D(x^*)$. In the light of having $x^{*T} e = 1$, the equations in (2.41) assert the existence of $y \in \partial f_D(x^*)$ such that

$$\bar r_i - r_0 = \frac{y_i}{f_D(x^*)}\,\Delta^* \ \text{ for } i = 1, \ldots, n, \ \text{ with } \Delta^* = x^{*T}[\bar r - r_0 e]. \tag{2.46}$$

That is equivalent to having $x^*$ in the solution set $S_0(\Delta^*)$, and since the cost of $x^*$ is 1, corresponds to $x^*$ furnishing a master fund of positive type in the sense of Definition 1. The case of a master fund of negative type in (b) is entirely parallel, except for being adjusted to $x^*$ having cost $-1$.
Since $-\beta_i[(-EX^*) - r_0] = \beta_i x^{*T}[\bar r - r_0 e]$ in that case, they correspond again to having (2.46) hold for some $y \in \partial f_D(x^*)$. The relations in (2.40), (2.43), (2.44) and (2.45) simply translate properties in Proposition 10 from $y$ to $y/f_D(x^*)$. $\square$

The switch in Theorem 8(b) from $EX^*$ to $-EX^*$ amounts to a switch from the $x^*$-portfolio, which constitutes a net short position, to its opposite, the $\tilde x^*$-portfolio for $\tilde x^* = -x^*$, which constitutes a net long position. The appearance of $-\beta_i$ in place of $\beta_i$ in (2.42) corresponds to reversing, accordingly, the various correlations in (2.40). Regardless of the type of master fund, the equation in (2.43) can be interpreted as providing guidance to the allocation of risk among the instruments $i = 1, \ldots, n$ in determining $x^*$.

The generality of Theorems 7 and 8 deserves emphasis. Other researchers working with nonstandard deviation measures have dealt with special classes of measures and have typically narrowed the scope of their results by supposing the uniqueness of the optimal portfolio in question, or further, in the case of a master fund, that it is of positive type. They have also relied on the function $f_D$ being differentiable. Theorems 7 and 8, in contrast, do not have these limitations.

2.8 Specialized CAPM-like Relations

The equations in the characterization of master funds in Theorem 8 bear a strong resemblance to the CAPM relations in classical portfolio theory. They need not have the same interpretation as in that theory, though. We explore this now with respect to a number of different choices of the deviation measure $D$. The subgradients $Y^*$ and risk monitors $Q^*$ described in these examples can be applied also beyond the master fund context to the optimality conditions in Theorem 7. We leave the details of that aside, however.

Example 1 (master funds for standard deviation). When $D = \sigma$ and $X^*$ is any nonconstant r.v., there is a unique $Y^* \in \partial D(X^*)$, namely $Y^* = [X^* - EX^*]/\sigma(X^*)$.
The master fund characterizations in Theorem 8, with respect to $X^* = x^{*T} r$, therefore hold with $f_D$ differentiable and

$$\beta_i = \frac{\mathrm{covar}(r_i, X^*)}{\sigma^2(X^*)}. \tag{2.47}$$

Detail. The description of $\partial D$ for this case comes from [11, Examples 14 and 18]. $\square$

In this setting, as long as the risk-free rate $r_0$ is not too high (in the sense of the threshold in Theorem 5), there is a unique master fund of positive type: $x^*$, $\lambda^*$ and $\Delta^*$ are uniquely determined subject to the $x^*$-portfolio having unit cost. The coefficients $\beta_i$ in (2.47) turn the relations in (2.41) into the standard CAPM equations for the expected rate of return of this master fund (or "market portfolio"). These classical covariance relations have been interpreted in the case of a master fund of positive type (see footnote 4) as furnishing a one-factor predictive model in the form

$$r_i - r_0 \approx \beta_i [X^* - r_0] \ \text{ for } i = 1, \ldots, n. \tag{2.48}$$

This is based conceptually on a supposition that all investors seek essentially to minimize standard deviation when putting together a portfolio at a specified level of expected gain.

It is tempting to think that the equations of Theorem 8 in (2.41) might be able to take on such a role as in (2.48) more widely, for other deviation measures, but one must be careful not to jump directly to such a conclusion. We are operating here from a distinctly different standpoint, where the investors employing any particular deviation measure $D$ are viewed only as a subgroup of all the investors, perhaps just a small subgroup. There is little reason to believe that the actions of such a subgroup ought to have a determining influence on market behavior as a whole.

Footnote 4. Little, if any, attention has been paid in the classical context to the potential nonexistence of such a fund. For instance, in Luenberger's derivation of a master fund in his book [24, p. 168], he sets up a function of the "weights," our $x_i$'s, and claims that by determining where the partial derivatives of this function vanish, the fund in question will be determined.
The function is nonconvex, however, so this is just a necessary condition, not a sufficient condition, and could correspond to a maximum as well as a minimum. Anyway, he neglects the issue of whether a point where the derivatives vanish even exists. In this way, the threshold phenomenon with the risk-free rate is missed entirely.

Another issue which must not be ignored, in general, is that the coefficients $\beta_i$ in Theorem 8 might not be uniquely determined. This could happen in (2.40) because of $Y^*$ or $Q^*$ not being uniquely determined by the conditions at hand with respect to $x^*$, but it could also occur in (2.40) or (2.45) from $x^*$ not even being the only solution to problem $P_0(\Delta^*)$. Still another possibility is that $x^*$ might be the unique solution for this $\Delta^*$, and yet another portfolio, corresponding to a value different from $\Delta^*$, might furnish a different master fund. This could arise from a "flat spot" on the efficient frontier; it occurs in Figures 2-3 and 2-4.

Of course, it is conceivable nonetheless that, through statistical analysis, the relations (2.41) in Theorem 8 with respect to one, or maybe several alternative deviation measures in combination, may lead to interesting predictive models of type (2.48) with advantages over the classical CAPM. (The underlying assumption of the classical model is, anyway, not beyond controversy.) That is not a topic to be taken up in this chapter, however.

In the examples that follow, we are content mainly to see what the relations coming from Theorem 8 look like as master fund characterizations, and to note comparisons with the "beta" formulas derived by other researchers under assumptions like differentiability and uniqueness. In each case, the same coefficients work for master funds of negative type as well as ones of positive type, in line with the alternative forms of the equations in (2.41) and (2.42).

Example 2 (master funds for lower semideviation).
When $D = \sigma_-$ and $X^*$ is any nonconstant r.v., there is a unique $Y^* \in \partial D(X^*)$, namely $Y^* = [X^*_- - EX^*_-]/\sigma_-(X^*)$ for $X^*_- = \min\{X^* - EX^*, 0\}$. The master fund characterizations in Theorem 8, with respect to $X^* = x^{*T} r$, therefore hold with $f_D$ differentiable and

$$\beta_i = \frac{\mathrm{covar}(r_i, X^*_-)}{\sigma_-^2(X^*)}. \tag{2.49}$$

Detail. Here, we rely on the formula for $\partial D$ established by Rockafellar et al. [23], Example 6. $\square$

Lower semideviation is among the measures covered by Malevergne and Sornette [13]. Those authors, although concerned especially with higher moments, based their results on axioms aimed at covering a wide class of measures of "fluctuation" type. They did not require convexity or continuity, or invoke those properties anywhere, so the underpinnings to their assertions of the existence and uniqueness of optimal portfolios appear to be without foundation. The same is true of their claims of having determined master funds of positive type without making any restriction on the risk-free rate.

Example 3 (master funds for CVaR). Let $D(X) = \mathrm{CVaR}_\alpha(X - EX)$ for any choice of $\alpha \in (0, 1)$. For any r.v. $X^*$, the elements of $\mathcal{Q}_{X^*}$ are then the functions $Q^*$ on $\Omega$ that are densities ($Q^* \ge 0$, $EQ^* = 1$) such that

$$Q^*(\omega) \begin{cases} = \alpha^{-1} & \text{on } \{\omega \in \Omega \mid X^*(\omega) < -\mathrm{VaR}_\alpha(X^*)\}, \\ = 0 & \text{on } \{\omega \in \Omega \mid X^*(\omega) > -\mathrm{VaR}_\alpha(X^*)\}, \\ \in [0, \alpha^{-1}] & \text{on } \{\omega \in \Omega \mid X^*(\omega) = -\mathrm{VaR}_\alpha(X^*)\}. \end{cases} \tag{2.50}$$

The master fund characterizations in Theorem 8, with respect to $X^* = x^{*T} r$, hold therefore with the coefficients $\beta_i$ in (2.40) coming from such a density function $Q^*$. Moreover, if the set $\{\omega \in \Omega \mid X^*(\omega) = -\mathrm{VaR}_\alpha(X^*)\}$ has probability 0, as is true in particular when the r.v. $r = (r_1, \ldots, r_n)$ is continuously distributed, then $f_D$ is differentiable at $x^*$ and the coefficients $\beta_i$ can be expressed by conditional expectations:

$$\beta_i = \frac{E[\,\bar r_i - r_i \mid X^* \le -\mathrm{VaR}_\alpha(X^*)\,]}{E[\,EX^* - X^* \mid X^* \le -\mathrm{VaR}_\alpha(X^*)\,]}. \tag{2.51}$$

Detail. This utilizes the description of $\mathcal{Q}_{X^*}$ coming out of [11, Example 20]. Clearly, $Q^*$ is uniquely determined (up to the usual equivalence in $\mathcal{L}^2(\Omega)$) by the relations in (2.50) for $X^*$ when there is zero probability of $X^*$ taking on the value $-\mathrm{VaR}_\alpha(X^*)$. Then $f_D$ is differentiable at $x^*$ by Proposition 10. Otherwise, though, $Q^*$ might not be uniquely determined, and $f_D$ could thus fail to be differentiable at $x^*$. Indeed, when $P\{X^* < -\mathrm{VaR}_\alpha(X^*)\} < \alpha < P\{X^* \le -\mathrm{VaR}_\alpha(X^*)\}$, the values of $Q^*$ on the set $\{\omega \in \Omega \mid X^*(\omega) = -\mathrm{VaR}_\alpha(X^*)\}$ can be selected arbitrarily from the interval $[0, \alpha^{-1}]$, subject only to arranging that $EQ^* = 1$. It may not be possible in that situation to pass from the formula for $\beta_i$ in (2.40) to the formula in (2.51). $\square$

The version of the CVaR optimality relations in (2.51) is interesting for the reason that the numerators of the beta coefficients give the conditional expectation of the downside of $r_i$ subject to $X^*$ being in its lower $\alpha$-tail. As noted, however, this version is valid when $r$ is continuously distributed, but not necessarily when the distribution of $X^*$ has a probability atom at $-\mathrm{VaR}_\alpha(X^*)$.

In previous work on CVaR master funds, Tasche [14], Bertsimas [15] and Acerbi and Simonetti [16] have avoided the issue of discontinuities coming up in the probability distributions and have moreover required the differentiability of $f_D$. The need for a threshold assumption on the risk-free rate, in order to be assured of the existence of a master fund of positive type, did not get addressed, nor did the issue of nonuniqueness of such a fund, even when it exists.

Example 4 (master funds for mixed CVaR). Let $D(X) = \sum_{k=1}^m \lambda_k \mathrm{CVaR}_{\alpha_k}(X - EX)$ with $\alpha_k \in (0, 1)$, $\lambda_k > 0$ and $\sum_{k=1}^m \lambda_k = 1$. For any r.v. $X^*$, the elements of $\mathcal{Q}_{X^*}$ are then the functions $Q^*$ on $\Omega$ of the form $Q^* = \sum_{k=1}^m \lambda_k Q_k^*$, where each $Q_k^*$ is a density ($Q_k^* \ge 0$, $EQ_k^* = 1$) such that

$$Q_k^*(\omega) \begin{cases} = \alpha_k^{-1} & \text{on } \{\omega \in \Omega \mid X^*(\omega) < -\mathrm{VaR}_{\alpha_k}(X^*)\}, \\ = 0 & \text{on } \{\omega \in \Omega \mid X^*(\omega) > -\mathrm{VaR}_{\alpha_k}(X^*)\}, \\ \in [0, \alpha_k^{-1}] & \text{on } \{\omega \in \Omega \mid X^*(\omega) = -\mathrm{VaR}_{\alpha_k}(X^*)\}. \end{cases} \tag{2.52}$$
In this case, therefore, the master fund characterizations in Theorem 8, with respect to $X^* = x^{*T} r$, hold with the coefficients $\beta_i$ in (2.40) coming from such a $Q^*$, which itself is another density function. If the sets $\{\omega \in \Omega \mid X^*(\omega) = -\mathrm{VaR}_{\alpha_k}(X^*)\}$ for $k = 1, \ldots, m$ all have probability 0, as is true in particular when $r = (r_1, \ldots, r_n)$ is continuously distributed, then $f_D$ is differentiable at $x^*$ and the coefficients $\beta_i$ can be expressed by a weighting of conditional expectations:

$$\beta_i = \frac{\sum_{k=1}^m \lambda_k E[\,\bar r_i - r_i \mid X^* \le -\mathrm{VaR}_{\alpha_k}(X^*)\,]}{\sum_{k=1}^m \lambda_k E[\,EX^* - X^* \mid X^* \le -\mathrm{VaR}_{\alpha_k}(X^*)\,]}. \tag{2.53}$$

Detail. We have $D = \sum_k \lambda_k D_k$ for $D_k(X) = \mathrm{CVaR}_{\alpha_k}(X - EX)$. The rule for the risk envelope of a sum of deviation measures [23, Example 2] then comes into play. According to that rule, $\mathcal{Q}$ consists of all $Q = \sum_{k=1}^m \lambda_k Q_k$ with $Q_k$ belonging to the risk envelope $\mathcal{Q}_k$ for $D_k$. Then too, since $\mathcal{Q}_{X^*}$, for any r.v. $X^*$, consists, by definition, of the elements $Q$ that maximize $\mathrm{covar}(-X^*, Q)$ subject to $Q \in \mathcal{Q}$, we have $Q^* \in \mathcal{Q}_{X^*}$ if and only if $Q^* = \sum_{k=1}^m \lambda_k Q_k^*$ for a choice of functions $Q_k^* \in (\mathcal{Q}_k)_{X^*}$. It remains only to utilize for each of the CVaR deviation measures $D_k$ the description of $(\mathcal{Q}_k)_{X^*}$ in the pattern of Example 3. $\square$

Example 5 (master funds for lower range deviation). Let $D(X) = EX - \inf X$ with the state space $\Omega$ being finite (so that $\inf X$ is finite for all $X \in \mathcal{L}^2(\Omega)$). For any r.v. $X^*$, the elements of $\mathcal{Q}_{X^*}$ are then the functions $Q^*$ on $\Omega$ that are densities concentrated in the worst states $\omega$ for $X^*$, i.e., they are the functions $Q^*$ satisfying

$$EQ^* = 1 \ \text{ and } \ Q^*(\omega) \begin{cases} \ge 0 & \text{for } \omega \in \Omega \text{ such that } X^*(\omega) = \inf X^*, \\ = 0 & \text{for } \omega \in \Omega \text{ such that } X^*(\omega) > \inf X^*. \end{cases} \tag{2.54}$$

In this case, the master fund characterizations in Theorem 8, with respect to $X^* = x^{*T} r$, therefore hold with the coefficients $\beta_i$ in (2.40) coming from such a density function $Q^*$. When the set of states $\omega \in \Omega$ such that $X^*(\omega) = \inf X^*$ consists of a unique $\omega^*$ (having nonzero probability), then $Q^*$ is uniquely determined from $X^*$, so $f_D$ is differentiable at $x^*$ and the coefficients $\beta_i$ come out as

$$\beta_i = \frac{\bar r_i - r_i(\omega^*)}{EX^* - X^*(\omega^*)}, \ \text{ where } X^*(\omega^*) = \inf X^*. \tag{2.55}$$

Detail. The description of $\mathcal{Q}_{X^*}$ for this choice of $D$ corresponds to the formula for $\partial D$ obtained by Rockafellar et al. [11], Example 19. $\square$

Example 6 (master funds for generalized mean absolute deviation). Suppose

$$D(X) = E[\,a|X - EX|\,] = \int_\Omega a(\omega)\,|X(\omega) - EX|\,dP(\omega) \tag{2.56}$$

for a positive (measurable) function $a$ of the states $\omega \in \Omega$. For any r.v. $X^*$, the subgradients $Y^* \in \partial D(X^*)$ are then the functions of form $Y^* = V - EV$ coming from functions $V$ on $\Omega$ that satisfy

$$V(\omega) \begin{cases} = a(\omega) & \text{on } \{\omega \in \Omega \mid X^*(\omega) > EX^*\}, \\ = -a(\omega) & \text{on } \{\omega \in \Omega \mid X^*(\omega) < EX^*\}, \\ \in [-a(\omega), a(\omega)] & \text{on } \{\omega \in \Omega \mid X^*(\omega) = EX^*\}. \end{cases} \tag{2.57}$$

Therefore, the master fund characterizations in Theorem 8, with respect to $X^* = x^{*T} r$, hold with the coefficients $\beta_i$ in (2.40) coming from such a $Y^*$ (in which case $\mathrm{covar}(r_i, Y^*) = \mathrm{covar}(r_i, V)$, actually). If the set $\{\omega \in \Omega \mid X^*(\omega) = EX^*\}$ in this formula has probability 0, as is true in particular when the r.v. $r = (r_1, \ldots, r_n)$ is continuously distributed, then $f_D$ is differentiable at $x^*$ and the coefficients $\beta_i$ are determined by

$$\beta_i = \frac{E[\,a\,(r_i - \bar r_i)\,\mathrm{sign}(X^* - EX^*)\,]}{E[\,a\,|X^* - EX^*|\,]}. \tag{2.58}$$

Detail. We have $D(X) = J(X - EX)$ for the convex functional $J(U) = \int_\Omega \varphi(\omega, U(\omega))\,dP(\omega)$ in which $\varphi(\omega, u) = a(\omega)|u|$ for $(\omega, u) \in \Omega \times \mathbb{R}$. Hence the subgradients $Y^* \in \partial D(X^*)$ have the form $Y^* = V - EV$ for the subgradients $V \in \partial J(X^* - EX^*)$. The subgradients of an integral functional such as $J$ are known in convex analysis to be characterized by the rule that $V \in \partial J(U)$ if and only if $V(\omega) \in \partial_u \varphi(\omega, U(\omega))$, where $\partial_u \varphi(\omega, U(\omega))$ refers to the subgradient set of $\varphi(\omega, u)$ with respect to $u$ at $u = U(\omega)$. This corresponds to $V$ satisfying the relations in (2.57). $\square$

Konno [17] has investigated mean absolute deviation with $a(\omega) \equiv 1$ under the assumption that the $r_i$'s have a multivariate distribution given by a density function on $\mathbb{R}^n$, and with that he has obtained similar $\beta$'s. In his setting, the uniqueness of a master fund is not assured, though, since mean absolute deviation lacks the kind of strict convexity needed for that.
Short positions are said to be excluded, which would make comparisons difficult with the optimization problem we treat here, but the constraints against shorting are suppressed in the developments by assuming their Lagrange multipliers can be set equal to 0. The optimal portfolio is assumed nevertheless to involve no shorting, so the potential need for a master fund of negative type does not come into view.

2.9 Multiplier Derivation of Thresholds

The characterization of the two threshold rates at the end of Theorem 5 provides a means of calculating these threshold risk-free rates through optimization. By solving problem $P(0, 1)$, one gets its multiplier set $M(0, 1)$, which has the special form described and embodies all the information needed. It is possible now, by building on the subgradient developments in the preceding section, to indicate in more detail what this involves.

The Lagrangian function for $P(0, 1)$ is $L_{(0,1)}(x, \rho, \eta) = f_D(x) - \rho\,x^T e + \eta[1 - x^T \bar r]$, as specialized from (2.24), and $M(0, 1)$ consists of the pairs $(\rho, \eta)$ such that $\inf_x L_{(0,1)}(x, \rho, \eta) = d(0, 1)$, as noted in (2.25). There are no sign restrictions on these multipliers $\rho$ and $\eta$, a priori, since the constraints in $P(0, 1)$ are equations.

Proposition 11 (threshold multipliers). In terms of an optimal $x^*$ in $P(0, 1)$, the multiplier pairs $(\rho, \eta) \in M(0, 1)$ are characterized by the relation

$$\rho e + \eta \bar r \in \partial f_D(x^*), \tag{2.59}$$

which corresponds to the existence of some $Y^* \in \partial D(X^*)$ such that

$$\rho = \mathrm{covar}(r_i, Y^*) - \eta \bar r_i \ \text{ for } i = 1, \ldots, n, \ \text{ with } \eta = D(X^*) = d(0, 1). \tag{2.60}$$

Proof. In $P(0, 1)$, we are looking at a convex programming problem in which a finite convex function on $\mathbb{R}^n$, namely $f_D$, is minimized subject to two linear constraints which we know can always be satisfied (Proposition 4). In such a problem, the condition that $x^*$ be optimal and $(\rho, \eta)$ be a Lagrange multiplier vector corresponds to $x^*$ satisfying $x^{*T} e = 0$ and $x^{*T} \bar r = 1$, and being such that the inequality $L_{(0,1)}(x^*, \rho, \eta) \le L_{(0,1)}(x, \rho, \eta)$ holds for all $x$.
This inequality has the form $f_D(x) - \rho\,x^T e + \eta[1 - x^T \bar r] \ge f_D(x^*) - \rho\,x^{*T} e + \eta[1 - x^{*T} \bar r]$, which can be written as $f_D(x) \ge f_D(x^*) + [x - x^*]^T[\rho e + \eta \bar r]$. Having it hold for all $x \in \mathbb{R}^n$ is the same as saying that $\rho e + \eta \bar r$ is a subgradient $y \in \partial f_D(x^*)$, as claimed in (2.59). Applying the description of such subgradients $y$ in (2.35) of Proposition 10, we get the characterization claimed in (2.60). $\square$

Corollary (single thresholds). In situations where the subgradient set $\partial D(X^*)$ consists of a unique $Y^*$, the multiplier set $M(0, 1)$ reduces to a single pair $(\rho, \eta)$, and a single threshold for the risk-free rate is then assured, with value

$$-\rho/\eta. \tag{2.61}$$

Although the numerical approach in which the possible multiplier vectors $(\rho, \eta)$ are calculated as byproducts of the optimization in $P(0, 1)$ itself may be all that can be counted on in general, it is interesting to note that, in the classical case of standard deviation, an analytic formula for the threshold can be derived.

Example 7 (threshold formula for standard deviation). When $D = \sigma$, there is a single threshold for the risk-free rate, which can be expressed in terms of the variance-covariance matrix for the risky assets $i = 1, \ldots, n$. This matrix, let it be denoted by $\Lambda$, is positive definite under assumption A1, so its inverse $\Lambda^{-1}$ exists. The threshold value of $r_0$ is

$$\frac{\bar r^T \Lambda^{-1} e}{e^T \Lambda^{-1} e}. \tag{2.62}$$

Detail. In building on Example 1, we know that the unique $Y^*$ in (2.60) must be $[X^* - EX^*]/\sigma(X^*)$. Then (2.60) says that $\rho e = \Lambda[x^*/\sigma(X^*)] - \eta \bar r$, so $\Lambda x^* = \sigma(X^*)[\rho e + \eta \bar r]$ and consequently $x^* = \sigma(X^*)\Lambda^{-1}[\rho e + \eta \bar r]$. Since $x^{*T} e = 0$, we must have $[\rho e + \eta \bar r]^T \Lambda^{-1} e = 0$, or in other words, $\rho[e^T \Lambda^{-1} e] + \eta[\bar r^T \Lambda^{-1} e] = 0$. Therefore, $\rho$ is uniquely determined and the corresponding single threshold value for the risk-free rate is the ratio in (2.62). $\square$

2.10 Conclusions

We have endeavored to complement the groundbreaking work of Artzner, Delbaen, Eber and Heath by demonstrating that when risk measures in their sense are applied to $X - EX$ instead of to a return r.v.
$X$ itself, in the presence of a new property of expectation-boundedness, a parallel class of functionals, appropriately termed deviation measures, arises. This should help bridge the gap between the theory of risk measures and the way that risk is typically viewed by practitioners in the finance industry. To further this aim, we have explored the distinction between risk measures and deviation measures in a range of key examples, providing insights also in terms of risk acceptance notions and their dualization by risk envelopes.

The replacement of standard deviation by other deviations, such as arise from conditional value-at-risk and other risk notions, in accordance with current trends, by no means renders the classical approach to optimization outdated. Instead, it enriches that approach by making a degree of customization available. One-fund theorems still reign as a way of simplification, even though the designated funds, in their dependence on the deviation measure, can be different for different classes of investors.

Utilizing tools of convex analysis, we have furthermore developed a scheme for determining optimality in problems of minimizing deviation or risk. We have shown that risk envelopes have major significance in solving such problems and have illustrated the particulars in a number of settings. The results open the way for many applications and advances in their numerical methodology.

Furthermore, optimality can still be characterized by covariance relations. However, optimality with respect to net short positions in the risky instruments must be analyzed in addition to optimality with respect to net long positions in order to determine how an investor may wish to act with respect to the current risk-free rate. The covariance relations that are obtained furnish additional information about the behavior of risky instruments in various situations.
This information could be useful in practical analysis, even if no longer associated with notions of equilibrium in which all investors are attracted to a single master fund.

CHAPTER 3
DRAWDOWN MEASURE IN PORTFOLIO OPTIMIZATION

3.1 Introduction

Optimal portfolio allocation is a longstanding issue in both practical portfolio management and academic research on portfolio theory. Various methods have been proposed and studied by Grinold [26]. All of them, as a starting point, assume some measure of portfolio performance, which consists of at least two components: evaluating expected portfolio reward and assessing expected portfolio risk. From a theoretical perspective, there are two well-known approaches to managing portfolio performance: Expected Utility Theory and Risk Management, which are usually considered within the framework of a one-period or multi-period model. If we are interested in the Risk Management approach to portfolio optimization over a long term, what are the functionals for assessing portfolio risk that account for different sequences of portfolio losses?

Let a portfolio be optimized within time interval $[0, T]$, and let $W(t)$ be the portfolio value at time moment $t \in [0, T]$. One of the functionals that we are looking for is the portfolio drawdown defined by $\left(\max_{\tau \in [0, t]} W(\tau) - W(t)\right)/W(t)$, which, indeed, accounts for a sequence of portfolio losses. What are the advantages of formulating a portfolio optimization problem with a constraint on portfolio drawdown? To answer this question, drawdown regulations in real trading strategies and theoretical aspects of drawdown should be addressed first.

3.1.1 Drawdown Regulations in Real Trading Strategies

From the standpoint of a fund manager who trades clients' or a bank's proprietary capital, and for whom the clients' accounts are the only source of income, coming in the form of management and incentive fees, losing these accounts is equivalent to the death of his business.
This is true regardless of whether the employed strategy is valid in the long term and has very attractive expected return characteristics. Such a fund manager's primary concern is to keep the existing accounts and to attract new ones in order to increase his revenues. A Commodity Trading Advisor (CTA) faces the following rules regarding the magnitude and duration of drawdowns in clients' accounts:

* A client is highly unlikely to tolerate a 50% drawdown in an account with an average or small risk CTA.
* An account may be shut down if a 20% drawdown is breached.
* A warning is issued if an account is in a 15% drawdown.
* An account will be closed if it is in a drawdown, even of small magnitude, for longer than 2 years.
* Time to get out of a drawdown should not be longer than a year.

3.1.2 Drawdown Notion in Theoretical Framework

Several studies have discussed portfolio optimization with drawdown constraints. Grossman and Zhou [27] obtained an exact analytical solution to portfolio optimization with a constraint on maximal drawdown based on the following model:

* Continuous setup
* One-dimensional case: allocating current capital between one risky and one risk-free asset
* An assumption of lognormality of the risky asset
* Use of a dynamic programming approach: finding a time-dependent fraction of the current capital invested into the risky asset

Cvitanic and Karatzas [28] generalized this model [27] to the multidimensional case (several risky assets).

In contrast to Grossman and Zhou [27] and Cvitanic and Karatzas [28], Chekhlov et al. [29] defined portfolio drawdown to be the drop of the current portfolio value compared to its maximum achieved in the past up to the current moment $t$, i.e. $\max_{\tau \in [0, t]} W(\tau) - W(t)$, and introduced a one-parameter family of drawdown functionals, entitled Conditional Drawdown (CDD). Moreover, Chekhlov et al.
[29] considered portfolio optimization with a constraint on drawdown functionals in a setup similar to the index tracking problem [30], where an index's historical performance is replicated by a portfolio with constant weights. Chekhlov et al. [29] proposed the following setup:

* Discrete formulation
* Multidimensional case: several risky assets (markets and futures)
* A static set of portfolio weights satisfying a certain risk condition over the whole interval $[0, T]$
* No assumption about the underlying probability distribution, which allows a variety of practical applications to be considered: use of the historical sample paths of assets' rates of return over $[0, T]$
* Use of a linear programming approach: reduction of portfolio optimization to a linear programming (LP) problem

The CDD is related to the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) measures studied by Rockafellar and Uryasev [21, 20]. By definition, with respect to a specified probability level $\alpha$, the $\alpha$-VaR of a portfolio is the lowest amount $\zeta_\alpha$ such that, with probability $\alpha$, the loss will not exceed $\zeta_\alpha$ in a specified time $\tau$, whereas the $\alpha$-CVaR is the conditional expectation of losses above that amount $\zeta_\alpha$. Various issues about VaR methodology were discussed by Jorion [31]. The CDD is similar to CVaR and can be viewed as a modification of the CVaR to the case when the loss function is defined as a drawdown. CDD and CVaR are conceptually related percentile-based risk performance functionals. Optimization approaches developed for CVaR are directly extended to CDD. The CDD includes the average drawdown and maximal drawdown as its limiting cases. It takes into account both the magnitude and duration of the drawdowns, whereas the maximal drawdown concentrates on a single event: the maximal loss of the account from its previous peak.

However, Chekhlov et al. [29] only tested the proposed approach to portfolio optimization subject to constraints on drawdown functionals.
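The VaR and CVaR definitions recalled above can be illustrated on a finite sample. The sketch below is only an assumption-laden illustration, not the procedure of [20, 21] or [29]: it treats the sample losses as equally likely scenarios, and the function name `var_cvar` and the data are hypothetical. CVaR is computed as the average over the worst $(1 - \alpha)$ share of outcomes, splitting the probability atom at VaR when necessary.

```python
import math


def var_cvar(losses, alpha):
    """Empirical alpha-VaR and alpha-CVaR of equally likely sample losses."""
    if not losses or not 0.0 < alpha < 1.0:
        raise ValueError("need a non-empty sample and 0 < alpha < 1")
    ordered = sorted(losses)
    n = len(ordered)
    # Position of the smallest loss whose cumulative probability reaches alpha
    # (the tiny tolerance guards against floating-point noise in alpha * n).
    idx = max(math.ceil(alpha * n - 1e-12) - 1, 0)
    var = ordered[idx]
    cum_prob = (idx + 1) / n
    # Probability-weighted losses strictly beyond that position.
    tail = sum(ordered[idx + 1:]) / n
    # Average over the worst (1 - alpha) share of outcomes; the first term
    # splits the probability atom at VaR when alpha * n is not an integer.
    cvar = ((cum_prob - alpha) * var + tail) / (1.0 - alpha)
    return var, cvar


sample_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical scenario losses
var_80, cvar_80 = var_cvar(sample_losses, 0.8)
```

With these ten scenarios and $\alpha = 0.8$, the VaR is the eighth smallest loss and the CVaR is the mean of the two largest losses. Applying the same computation to drawdown values instead of losses is the idea behind the CDD.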
The CDD [29] was not defined as a true risk measure, and the real-life portfolio optimization example was considered based only on the historical sample paths of assets' rates of return.

This chapter is focused on:

* Concept of drawdown measure: possession of all properties of a deviation measure, generalization of deviation measures to a dynamic case
* Concept of risk profiling: Mixed Conditional Drawdown (generalization of CDD)
* Optimization techniques for CDD computation: reduction to a linear programming (LP) problem
* Portfolio optimization with a constraint on Mixed CDD

Our study develops the concept of drawdown measure by generalizing the notion of the CDD to the case of several sample paths for the portfolio's uncompounded rate of return. The definition of drawdown measure is essentially based on the notion of CVaR [32, 21, 20] and mixed CVaR [11] extended to a multiscenario case. Drawdown measure uses the concept of risk profiling introduced by Rockafellar et al. [11]; namely, drawdown measure is a multiscenario mixed CVaR applied to the drawdown loss function.

From a theoretical perspective, drawdown measure satisfies the system of axioms determining deviation measures [11, 23]. Those axioms are: nonnegativity, insensitivity to constant shift, positive homogeneity and convexity. Moreover, drawdown measure is an example generalizing properties of deviation measures to a dynamic case. We develop optimization techniques for efficient computation of drawdown measure in the case when instruments' rates of return are given.

Similar to the Markowitz mean-variance approach [3], we formulate and solve an optimization problem with the reward performance function and CDD constraints. The reward-CDD optimization is a piecewise linear convex optimization problem [18], which can be reduced to a linear programming (LP) problem using auxiliary variables.

Linear programming allows solving large optimization problems with hundreds of thousands of instruments.
The algorithm is fast, numerically stable, and provides a solution in a single run (without adjusting parameters, as in genetic algorithms or neural networks). Linear programming approaches are routinely used in portfolio optimization with various criteria, such as mean absolute deviation [33], maximum deviation [34], and mean regret [30]. Ziemba and Mulvey [35] discussed other applications of optimization techniques in the finance area.

3.2 Model Development

Suppose a given time interval $[0, T]$ is partitioned into $N$ subintervals $[t_{k-1}, t_k]$, $k = 1, \ldots, N$, by the set of points $\{t_0 = 0, t_1, t_2, \ldots, t_N = T\}$, and suppose there are $m$ risky assets with rates of return determined by the random vector $r(t_k) = (r_1(t_k), r_2(t_k), \ldots, r_m(t_k))$ at time moments $t_k$ for $k = 1, \ldots, N$. We also assume that the risk-free instrument (or cash) with the constant rate of return $r_0$ is available. The $i$th asset's rate of return at time moment $t_k$ is defined by $r_i(t_k) = p_i(t_k)/p_i(t_{k-1}) - 1$, where $p_i(t_k)$ and $p_i(t_{k-1})$ are the $i$th asset's prices per share at moments $t_k$ and $t_{k-1}$, respectively. Let $C$ denote the initial capital at $t_0 = 0$ and let the values $x_i(t_k)$ for $i = 1, \ldots, m$ and $x_0(t_k)$ define the proportions of the current capital invested in the $i$th risky asset and the risk-free instrument at $t_k$, respectively. Consequently, a portfolio formed of the $m$ risky assets and the risk-free instrument is determined by the vector of weights $x(t_k) = (x_0(t_k), x_1(t_k), x_2(t_k), \ldots, x_m(t_k))$. The components of $x(t_k)$ satisfy the budget constraint

$$\sum_{i=0}^m x_i(t_k) = 1. \tag{3.1}$$

By definition, the rate of return of the portfolio at time moment $t_k$ is

$$r^{(p)}(x(t_k)) = \sum_{i=0}^m r_i(t_k)\,x_i(t_k), \tag{3.2}$$

where $r_0(t_k) = r_0$ for the risk-free instrument.

Portfolio optimization can be considered within the framework of a one-period or multi-period model. A one-period model in portfolio optimization assumes the $i$th asset's rates of return for all $t_k$, $k = 1, \ldots, N$, to be independent observations of a random variable $r_i$.
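As a small numerical illustration of the definitions above, the following sketch computes asset rates of return from prices and the portfolio rate of return (3.2) under the budget constraint (3.1). The function names, the price data and the weights are hypothetical and chosen only for illustration.

```python
def asset_returns(prices):
    """prices[k][i] is the price of risky asset i at t_k, k = 0..N.

    Returns rates[k-1][i] = p_i(t_k) / p_i(t_{k-1}) - 1 for k = 1..N.
    """
    return [
        [now / before - 1.0 for now, before in zip(current, previous)]
        for previous, current in zip(prices, prices[1:])
    ]


def portfolio_return(weights, risky_returns, risk_free_rate):
    """Portfolio rate of return (3.2); weights = (x_0, x_1, ..., x_m)."""
    if abs(sum(weights) - 1.0) > 1e-9:  # budget constraint (3.1)
        raise ValueError("weights must sum to 1")
    rates = [risk_free_rate] + list(risky_returns)
    if len(rates) != len(weights):
        raise ValueError("one weight is needed per instrument, cash included")
    return sum(x * r for x, r in zip(weights, rates))


# Hypothetical data: two risky assets observed at t_0 and t_1.
prices = [[100.0, 50.0], [110.0, 45.0]]
rates = asset_returns(prices)            # approximately [[0.1, -0.1]]
r_p = portfolio_return([0.2, 0.5, 0.3], rates[0], risk_free_rate=0.01)
```

Here 20% of the capital is held in cash at a 1% rate, 50% in the first asset and 30% in the second, so the portfolio rate of return is about 0.022.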
In this case, the vector of portfolio weights is constant and the portfolio rate of return is a random variable $r^{(p)}$ given by a linear combination of the random assets' rates of return $r_i$, $i = 1, \ldots, m$, and the constant $r_0$, i.e. $r^{(p)} = \sum_{i=0}^m r_i x_i$. A traditional setup for a one-period portfolio optimization problem from the Risk Management point of view is maximizing the portfolio's expected rate of return subject to the budget constraint and a constraint on the risk:

$$\max_x \ E(r^{(p)}) \quad \text{s.t.} \quad \mathrm{Risk}(r^{(p)}) \le d, \quad \sum_{i=0}^m x_i = 1. \tag{3.3}$$

Risk of the portfolio can be measured by different performance functionals, depending on the investor's risk preferences. Variance, VaR, CVaR and Mean Absolute Deviation (MAD) are examples of risk functionals used in portfolio Risk Management [11]. Certainly, solving optimization problem (3.3) with different risk measures will lead to different optimal portfolios. However, all of them are based on a one-period model, which does not take into account the sequence of the assets' rates of return within time interval $[0, T]$.

A multi-period model in portfolio optimization is intended for controlling and optimizing portfolio wealth over a long term. It is essentially based on how the assets' rates of return evolve within the whole time interval. Moreover, at each time moment $t_k$, $k = 0, \ldots, N$, there might be a capital inflow into or outflow from the portfolio, and the portfolio weights $x_i(t_k)$, $i = 1, \ldots, m$, might be rebalanced. In this case, the portfolio wealth at $t_k$ for $k = 1, \ldots, N$ is defined by

$$W_k(x(t_k)) = \left(W_{k-1}(x(t_{k-1})) + Y(t_{k-1})\right)\left(1 + r^{(p)}(x(t_k))\right), \tag{3.4}$$

where $Y(t_{k-1}) = F_+(t_{k-1}) - F_-(t_{k-1})$ is the resulting capital flow at $t_{k-1}$ (inflow $F_+(t_{k-1})$ minus outflow $F_-(t_{k-1})$), which can be positive or negative.

A portfolio optimization problem can also be formulated based on the Expected Utility Theory (EUT). According to the EUT, an investor with an additively separable concave utility function $U(\cdot)$ chooses a consumption stream $\{C_0, C_1, \ldots, C_{N-1}\}$ and a portfolio to maximize $E\left(\sum_{k=0}^{N-1} U(C(t_k), t_k) + B\left(W_N(x(t_N))\right)\right)$, where $B(\cdot)$
is the concave utility of bequest. Note the EUT is focused on maximization of the investor's consumption. However, a risk manager who runs a hedge fund and wishes to increase capital inflow by attracting new investors would be more interested in maximizing portfolio wealth at the final moment $t_N = T$ and decreasing portfolio drops over the whole time interval $[0, T]$. In this case, the Risk Management approach is more adequate for formulating a portfolio optimization problem:
$$\max_{x}\ P(W) \quad \text{s. t.} \quad R(W) \le d, \quad \sum_{i=0}^{m} x_i(t_k) = 1,\ k = 0, \dots, N, \qquad (3.5)$$
where $P(W)$ and $R(W)$ are performance and risk functionals, respectively, depending on stream $W = (W_1, W_2, \dots, W_N)$. Suppose the optimization problem (3.5) is considered under the following conditions: (i) a manager cannot affect the stream of $Y(t_k)$ (if the portfolio value increases, it is likely that capital inflow will also increase, and vice versa); (ii) the manager can only allocate resources among different instruments (investment strategies) in the portfolio at every moment $t_k$, $k = 0, \dots, N$, i.e. he or she can only optimize the portfolio rate of return by choosing portfolio weights $x_i(t_k)$. Accounting for these conditions, how can the manager evaluate portfolio performance over $[0, T]$ and efficiently solve (3.5)? Before answering this question, the following legitimate issues regarding problem formulation (3.5) should be addressed: (i) how the risk is measured within $[0, T]$; (ii) how the assets' rates of return are modeled within $[0, T]$; (iii) what optimization approach is chosen to solve (3.5).

3.2.1 Dynamic Performance Functionals

Does using variance, VaR, CVaR or MAD by itself make sense in a dynamic case? The obvious answer is no, since none of them by itself takes into account the sequence of the assets' rates of return. However, the aforementioned risk measures may be quite appropriate if they are applied to a random variable or functional that distinguishes different sequences of $W_k$ in a stream $(W_1, W_2, \dots, W_N)$.
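To make the point about sequence sensitivity concrete, here is a minimal Python sketch. It is an illustration only: the return values and the helper names `wealth_path` and `largest_relative_drop` are assumptions introduced here, not part of the original text, and capital flows are taken to be zero. Two return streams built from the same four numbers have the same variance, yet the largest relative drop of wealth from a running peak differs between them.

```python
from statistics import pvariance

def wealth_path(returns, w0=1.0):
    """Compounded wealth W_k = W_{k-1} * (1 + r_k), assuming no capital flows."""
    path = [w0]
    for r in returns:
        path.append(path[-1] * (1.0 + r))
    return path

def largest_relative_drop(path):
    """Largest drop from the running peak, measured relative to that peak."""
    peak, worst = path[0], 0.0
    for w in path:
        peak = max(peak, w)
        worst = max(worst, 1.0 - w / peak)
    return worst

# Hypothetical return streams: same values, different order.
gains_first = [0.10, 0.10, -0.10, -0.10]
alternating = [0.10, -0.10, 0.10, -0.10]

variance_gap = abs(pvariance(gains_first) - pvariance(alternating))
drop_gains_first = largest_relative_drop(wealth_path(gains_first))
drop_alternating = largest_relative_drop(wealth_path(alternating))
```

In this toy case the order-blind statistic cannot separate the two streams, while the peak-to-valley drop is roughly 19% for the first stream and roughly 11% for the second.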
One of the functionals accounting for a sequence of $W_k$ is based on the notion of portfolio drawdown, which deals with the drop in portfolio wealth at time moment $t_k$ with respect to the wealth's maximum value preceding $t_k$. By definition, the portfolio drawdown at $t_k$ is the ratio of the drop in portfolio wealth at $t_k$ to the preceding wealth's maximum:
$$DD(W, t_k) = 1 - \frac{W_k(x(t_k))}{\max_{0 \le j \le k} \{W_j(x(t_j))\}}. \qquad (3.6)$$
A possible problem formulation is to consider portfolio optimization with drawdown constraints in continuous dynamics [27, 28], for instance
$$\Pr\{DD(W, t) \le \gamma,\ \forall t \in [0, T]\} = 1. \qquad (3.7)$$
In this case, portfolio drawdown should not exceed a given value $\gamma \in [0, 1]$ almost surely for all $t \in [0, T]$. However, instead of imposing the constraint (3.7) for all $t \in [0, T]$, we are interested in maximizing the portfolio expected rate of return while controlling an integral characteristic of portfolio performance. We call such a characteristic a dynamic performance functional (DPF). In this case, following Risk Management methodology, we would be able to construct an efficient frontier establishing the dependence between the expected rate of return of optimal portfolios and the corresponding values of the DPF. Developing DPFs based on the notion of portfolio drawdown and solving a real-life asset-allocation problem with these functionals is the subject of this chapter.

3.3 Absolute Drawdown for a Single Sample Path

This section presents the notion of the Absolute Drawdown (AD) and considers three DPFs based on this notion. The AD is applied to a sample path of the uncompounded cumulative portfolio rate of return. Note the AD is not applied to the compounded cumulative portfolio rate of return $W_k(x(t_k))$. If the values of $r^{(p)}(x(t_k))$ for $k = 1, \dots, N$ determine a sample path (time series) of the portfolio's rate of return, then, by definition, the uncompounded cumulative portfolio rate of return at time moment $t_k$ is
$$w_k(x(t_k)) = \begin{cases} 0, & k = 0, \\ \sum_{l=1}^{k} r^{(p)}(x(t_l)), & k = 1, \dots, N. \end{cases} \qquad (3.8)$$
To simplify notations, we use $w_k$ instead of $w_k(x(t_k))$, assuming that $w_k$ is a function of vector $x(t_k)$. Further in this section, we consider only a single sample path of $w_k$, $k = 1, \dots, N$, which we denote by vector $w$, i.e. $w = (w_1, \dots, w_N)$.

Definition 1. The AD is a vectorial functional depending on the sample path $w$:
$$AD(w) = \xi = (\xi_1, \dots, \xi_N), \qquad \xi_k = \max_{0 \le j \le k} \{w_j\} - w_k. \qquad (3.9)$$
Note that components $(w_1, \dots, w_N)$ and $(\xi_1, \dots, \xi_N)$ of vectors $w$ and $\xi$ are, in fact, time series $w_1, \dots, w_N$ and $\xi_1, \dots, \xi_N$, respectively, where the $k$th components of $w$ and $\xi$ correspond to time moment $t_k$. Since $\xi_0$ is zero, we do not include it into the drawdown time series $\xi$. Moreover, although $AD(w)$ and $\xi$ are the same drawdown time series, we refer to notation $AD(w)$ to emphasize its dependence on $w$ and to notation $\xi$ whenever we use the drawdown time series just as a vector of numbers.

Figure 3-1 illustrates an example of the absolute drawdown $\xi$ and a corresponding sample path of the uncompounded cumulative rate of return $w$. Starting from $t_0 = 0$, the uncompounded cumulative rate of return $w$ goes up and the first component of $\xi$ equals zero. When $w$ decreases, $\xi$ goes up. When time series $w$ achieves its local minimum, the absolute drawdown $\xi$ achieves its local maximum. This process continues until $t_N = T$.

Figure 3-1: Time series of uncompounded cumulative rate of return $w$ and corresponding absolute drawdown $\xi$.

Proposition 1. Defining vectorial operations $w + \mathrm{const} = (w_1 + \mathrm{const}, \dots, w_N + \mathrm{const})$ and $\lambda w = (\lambda w_1, \dots, \lambda w_N)$, the $AD(w)$ satisfies the following properties:
1. Nonnegativity: $AD(w) \ge 0$.
2. Insensitivity to constant shift: $AD(w + \mathrm{const}) = AD(w)$.
3. Positive homogeneity: $AD(\lambda w) = \lambda\, AD(w)$, $\forall \lambda > 0$.
4. Convexity: if $w_\lambda = \lambda w_a + (1 - \lambda) w_b$ is a linear combination of any two sample paths of uncompounded cumulative rates of return, $w_a$ and $w_b$, with $\lambda \in [0, 1]$, then $AD(w_\lambda) \le \lambda\, AD(w_a) + (1 - \lambda)\, AD(w_b)$.

Proof. Properties 1-3 are direct consequences of definition (3.9).
Property 4 is proved based on
$$\max_{0 \le j \le k} \{\lambda w_{a,j} + (1 - \lambda) w_{b,j}\} \le \lambda \max_{0 \le j \le k} \{w_{a,j}\} + (1 - \lambda) \max_{0 \le j \le k} \{w_{b,j}\}, \qquad \lambda \in [0, 1]. \qquad \Box$$
The difference between the AD and DD is similar to the difference between absolute and relative errors in a measurement. The AD and DD functionals can be used in Risk Management and Statistics to control absolute and relative drops in a realization of a stochastic process. However, in this chapter we focus on applications of drawdown functionals in portfolio optimization. Since further in this chapter we deal only with the absolute drawdown functional, AD, the word "absolute" can be omitted without confusion.

3.3.1 Maximum, Average and Conditional Drawdowns

We consider three DPFs based on the notion of drawdown: (i) Maximum Drawdown (MaxDD), (ii) Average Drawdown (AvDD), and (iii) Conditional Drawdown (CDD). The last risk functional is actually a family of performance functions depending upon parameter $\alpha$. It is defined similarly to CVaR [20] and includes the MaxDD and AvDD as special cases.

Definition 2. For a given time interval $[0, T]$, partitioned into $N$ subintervals $[t_{k-1}, t_k]$, $k = 1, \dots, N$, with $t_0 = 0$ and $t_N = T$, the MaxDD and AvDD functionals are defined, respectively, by
$$\mathrm{MaxDD}(w) = \max_{1 \le k \le N} \{\xi_k\}, \qquad (3.10)$$
$$\mathrm{AvDD}(w) = \frac{1}{N} \sum_{k=1}^{N} \xi_k. \qquad (3.11)$$
To define Conditional Value-at-Risk (CV@R) and CDD, we introduce a function $\pi_\xi(s)$ such that
$$\pi_\xi(s) = \frac{1}{N} \sum_{k=1}^{N} I\{\xi_k \le s\}, \qquad (3.12)$$
where $I\{\cdot\}$ is the indicator function, equal to one if the condition in curly brackets is true and equal to zero if the condition is false, i.e. $I\{c \le s\} = 1$ for $c \le s$ and $I\{c \le s\} = 0$ for $c > s$. Figure 3-2 explains the definition of function $\pi_\xi(s)$. For the threshold $s$ shown on the figure, function $\pi_\xi(s)$ equals $5/8$, since $\xi_k \le s$ for 5 values of $k$, namely, $k = 2, 3, 4, 7, 8$. The inverse function to (3.12) is defined by
$$\pi_\xi^{-1}(\alpha) = \begin{cases} \inf\{s \mid \pi_\xi(s) \ge \alpha\}, & \alpha \in (0, 1], \\ 0, & \alpha = 0. \end{cases} \qquad (3.13)$$
Remark 1. Since all $\xi_k$, $k = 1, \dots, N$, are nonnegative, we define $\pi_\xi^{-1}(0)$ to be zero.

Remark 2. In fact, $\forall \alpha \in (0, 1]$, $s = \pi_\xi^{-1}(\alpha)$ is the smallest solution to the two inequalities
$$\pi_\xi(s - 0) \le \alpha \le \pi_\xi(s + 0), \qquad (3.14)$$
and it is the unique solution whenever $\pi_\xi(s) > \alpha$.
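Definitions (3.8)-(3.13) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the return series is made up and expressed in percent so that the arithmetic stays exact, and the indicator in $\pi_\xi$ is read as "less than or equal to".

```python
def absolute_drawdown(returns):
    """xi_k = max_{0<=j<=k} w_j - w_k for k = 1..N, with w_0 = 0."""
    w, peak, xi = 0.0, 0.0, []
    for r in returns:
        w += r                # uncompounded cumulative return w_k
        peak = max(peak, w)   # running maximum, which includes w_0 = 0
        xi.append(peak - w)
    return xi

def max_dd(xi):
    return max(xi)

def av_dd(xi):
    return sum(xi) / len(xi)

def pi(xi, s):
    """Fraction of drawdowns that do not exceed the threshold s."""
    return sum(1 for v in xi if v <= s) / len(xi)

def pi_inverse(xi, alpha):
    """inf{s : pi(s) >= alpha} for alpha in (0, 1], and 0 for alpha = 0."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alpha == 0.0:
        return 0.0
    # pi is a step function, so the infimum is attained at an observed value
    for s in sorted(set(xi)):
        if pi(xi, s) >= alpha:
            return s

# Hypothetical per-period portfolio returns, in percent.
sample_returns = [2, -1, -3, 5, -2, 1]
xi = absolute_drawdown(sample_returns)
```

For this sample the cumulative path is 2, 1, -2, 3, 1, 2, so the drawdown series is 0, 1, 4, 0, 2, 1: the drawdown is zero whenever the path sets a new maximum and peaks where the path reaches a local minimum.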
Figures 3-3 and 3-4 illustrate the right- and left-continuous step functions $\pi_\xi(s)$ and $\pi_\xi^{-1}(\alpha)$, respectively, which correspond to the drawdown time series $\xi$ shown on Figure 3-2.

Figure 3-2: Drawdown time series $\xi$ and indicator function $I\{\xi_k \le s\}$.

Let $\zeta(\alpha)$ be a threshold such that $(1 - \alpha) \cdot 100\%$ of drawdowns exceed this threshold. By definition,
$$\zeta(\alpha) = \pi_\xi^{-1}(\alpha). \qquad (3.15)$$
If we are able to precisely count $(1 - \alpha) \cdot 100\%$ of the worst drawdowns, then $\pi_\xi(\zeta(\alpha)) = \pi_\xi(\pi_\xi^{-1}(\alpha)) = \alpha$. For such a value of the parameter $\alpha$, the CV@R of $\xi_k$, $k = 1, \dots, N$, is defined as the mean of the worst $(1 - \alpha) \cdot 100\%$ drawdowns. For instance, if $\alpha = 0$, then CV@R is the average drawdown, and if $\alpha = 0.95$, then CV@R is the average of the worst 5% of drawdowns. However, in the general case, $\pi_\xi(\zeta(\alpha)) = \pi_\xi(\pi_\xi^{-1}(\alpha)) \ge \alpha$, which follows from definition (3.13). It means that, in general, we are not able to precisely count $(1 - \alpha) \cdot 100\%$ of the worst drawdowns. In this case, the CV@R becomes a weighted average of the threshold $\zeta(\alpha)$ and the mean of the worst drawdowns strictly exceeding $\zeta(\alpha)$.

Figure 3-3: Function $\pi_\xi(s)$.

Definition 3. For a given sequence of $\xi_k$, $k = 1, \dots, N$, CV@R is formally defined by
$$\mathrm{CV@R}_\alpha(\xi) = \frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha}\, \zeta(\alpha) + \frac{1}{(1 - \alpha) N} \sum_{\xi_k \in \Xi_\alpha} \xi_k, \qquad (3.16)$$
where $\Xi_\alpha = \{\xi_k \mid \xi_k > \zeta(\alpha),\ k = 1, \dots, N\}$. Note the first term in the right-hand side of (3.16) appears because of inequality $\pi_\xi(\zeta(\alpha)) \ge \alpha$. If $(1 - \alpha) \cdot 100\%$ of the worst drawdowns can be counted precisely, then $\pi_\xi(\pi_\xi^{-1}(\alpha)) = \alpha$ and the first term in the right-hand side of (3.16) disappears. Definition (3.16) follows from the framework of the CVaR methodology [21, 20]. The close relation between the CVaR and CV@R is discussed in the following remark.

Remark 3. $\mathrm{CV@R}_\alpha$, given by (3.16), and functional $\mathrm{CVaR}_\alpha$ [11] (p. 7, example 4) are linearly dependent, i.e. if $X$ is an arbitrary random variable then
$$\mathrm{CV@R}_\alpha(X) = \frac{1}{1 - \alpha} \big(E(X) + \alpha\, \mathrm{CVaR}_\alpha(X)\big). \qquad (3.17)$$
Thus, use of the CV@R or CVaR is only a matter of convenience.
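Formula (3.16) can be sketched directly in Python. The function name `cv_at_r` and the sample series are assumptions made for illustration; the sketch takes nonnegative observations, $0 \le \alpha < 1$, and reads the indicator in $\pi_\xi$ as "less than or equal to".

```python
def cv_at_r(xi, alpha):
    """CV@R_alpha of a nonnegative series xi per (3.16), for 0 <= alpha < 1."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(xi)

    def pi(s):
        # fraction of observations not exceeding s, as in (3.12)
        return sum(1 for v in xi if v <= s) / n

    # threshold zeta(alpha) = pi^{-1}(alpha), as in (3.13) and (3.15)
    if alpha == 0.0:
        zeta = 0.0
    else:
        zeta = next(s for s in sorted(set(xi)) if pi(s) >= alpha)

    # observations strictly above the threshold enter with full weight
    tail_sum = sum(v for v in xi if v > zeta)
    # the threshold itself carries the leftover probability mass
    boundary = (pi(zeta) - alpha) / (1.0 - alpha) * zeta
    return boundary + tail_sum / ((1.0 - alpha) * n)

# Hypothetical drawdown series with N = 8 observations.
series = [6, 1, 2, 3, 8, 5, 0, 4]
value_070 = cv_at_r(series, 0.70)
value_075 = cv_at_r(series, 0.75)
value_000 = cv_at_r(series, 0.0)
```

With $\alpha = 0.75$ exactly two of the eight observations form the tail, so the first term vanishes and the result is their plain average. With $\alpha = 0.7$ the tail holds 2.4 observations, and the threshold value 5 enters with the fractional weight $(0.75 - 0.7)/0.3 = 1/6$.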
Definition 4. In a single scenario case, the CDD with tolerance level $\alpha \in [0, 1]$ is the CV@R applied to the drawdown functional $AD(w)$:
$$\Delta_\alpha(w) = \mathrm{CV@R}_\alpha(AD(w)). \qquad (3.18)$$
Equivalently, interpreting $\xi_k$, $k = 1, \dots, N$, to be observations of a "random variable" $\xi$, the $\alpha$-CDD is the $\mathrm{CV@R}_\alpha$ of a loss function $AD(w)$.

3.3.2 Conditional Value-at-Risk and Conditional Drawdown Properties

CDD is an example of a functional generalizing properties of deviation measures to a dynamic case. However, since CDD is closely related to CVaR, whose properties were studied in detail by Rockafellar and Uryasev [21, 20], it is useful to discuss CDD properties based on properties of CVaR. Because of linear relation (3.17), we can replace CVaR by CV@R.

Proposition 2. $\mathrm{CV@R}_\alpha(\xi)$ satisfies the following properties:
1. Constant translation: $\mathrm{CV@R}_\alpha(\xi + \mathrm{const}) = \mathrm{CV@R}_\alpha(\xi) + \mathrm{const}$, $\forall \alpha \in [0, 1]$.
2. Positive homogeneity: $\mathrm{CV@R}_\alpha(\lambda \xi) = \lambda\, \mathrm{CV@R}_\alpha(\xi)$, $\forall \lambda > 0$ and $\forall \alpha \in [0, 1]$.
3. Monotonicity: if $\xi_k \le \eta_k$, $\forall k = 1, \dots, N$, then $\mathrm{CV@R}_\alpha(\xi) \le \mathrm{CV@R}_\alpha(\eta)$, $\forall \alpha \in [0, 1]$.
4. Convexity: if $\xi_\lambda = \lambda \xi_a + (1 - \lambda) \xi_b$ is a linear combination of any two drawdown sample paths $\xi_a$ and $\xi_b$ with $\lambda \in [0, 1]$, then $\mathrm{CV@R}_\alpha(\xi_\lambda) \le \lambda\, \mathrm{CV@R}_\alpha(\xi_a) + (1 - \lambda)\, \mathrm{CV@R}_\alpha(\xi_b)$.

Proof. Based on the linear relation between $\mathrm{CV@R}_\alpha$ and $\mathrm{CVaR}_\alpha$, given by (3.17), properties 1-4 are a direct consequence of the $\mathrm{CVaR}_\alpha$ properties [11]. $\Box$

Proposition 3. The CDD, $\Delta_\alpha(w)$, satisfies the properties of deviation measures, i.e.
1. Nonnegativity: $\Delta_\alpha(w) \ge 0$, $\forall \alpha \in [0, 1]$.
2. Insensitivity to constant shift: $\Delta_\alpha(w + \mathrm{const}) = \Delta_\alpha(w)$, $\forall \alpha \in [0, 1]$.
3. Positive homogeneity: $\Delta_\alpha(\lambda w) = \lambda\, \Delta_\alpha(w)$, $\forall \lambda > 0$ and $\forall \alpha \in [0, 1]$.
4. Convexity: if $w_\lambda = \lambda w_a + (1 - \lambda) w_b$ is a linear combination of any two sample paths of uncompounded cumulative rates of return $w_a$ and $w_b$ with $\lambda \in [0, 1]$, then $\Delta_\alpha(w_\lambda) \le \lambda\, \Delta_\alpha(w_a) + (1 - \lambda)\, \Delta_\alpha(w_b)$.

Proof. Properties 1-4 follow from Propositions 1 and 2. Indeed, based on the relation between the CDD and CV@R, i.e. $\Delta_\alpha(w) = \mathrm{CV@R}_\alpha(AD(w))$, the first property is a direct consequence of $AD(w)$ nonnegativity.
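Properties 2-4 are proved, respectively, by
$$\Delta_\alpha(w + c) = \mathrm{CV@R}_\alpha(AD(w + c)) = \mathrm{CV@R}_\alpha(AD(w)) = \Delta_\alpha(w),$$
$$\Delta_\alpha(\lambda w) = \mathrm{CV@R}_\alpha(AD(\lambda w)) = \mathrm{CV@R}_\alpha(\lambda\, AD(w)) = \lambda\, \mathrm{CV@R}_\alpha(AD(w)) = \lambda\, \Delta_\alpha(w),$$
$$\Delta_\alpha(w_\lambda) = \mathrm{CV@R}_\alpha\big(AD(\lambda w_a + (1 - \lambda) w_b)\big) \le \mathrm{CV@R}_\alpha\big(\lambda\, AD(w_a) + (1 - \lambda)\, AD(w_b)\big) \le \lambda\, \mathrm{CV@R}_\alpha(AD(w_a)) + (1 - \lambda)\, \mathrm{CV@R}_\alpha(AD(w_b)) = \lambda\, \Delta_\alpha(w_a) + (1 - \lambda)\, \Delta_\alpha(w_b).$$
Note the monotonicity property of CV@R is used in the first line of the proof of CDD convexity. $\Box$

Proposition 4. MaxDD (3.10) and AvDD (3.11) are special cases of the $\alpha$-CDD functional (this notation is used to emphasize the CDD dependence on $\alpha$), namely,
$$\mathrm{MaxDD}(w) = \Delta_1(w), \qquad \mathrm{AvDD}(w) = \Delta_0(w). \qquad (3.19)$$
Proof. To prove the first formula of (3.19), we assume that $\zeta(1) < \infty$. Based on this assumption, in the case of $\alpha = 1$, we have $\zeta(1) = \pi_\xi^{-1}(1) = \pi_\xi^{-1}(1 - 0) = \zeta(1 - 0)$, i.e. function $\zeta(\alpha)$ is constant in the left vicinity of 1. Hence, $\pi_\xi(\zeta(1)) = \pi_\xi(\zeta(1 - 0)) = 1$, $\Xi_1 = \emptyset$ and
$$\Delta_1(w) = \zeta(1) \lim_{\alpha \to 1} \frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha} = \zeta(1) \lim_{\alpha \to 1} \frac{1 - \alpha}{1 - \alpha} = \zeta(1) = \mathrm{MaxDD}(w).$$
When $\alpha = 0$, according to the definition (3.13), $\zeta(0) = 0$, $\Xi_0 = \{\xi_k \mid \xi_k > 0,\ k = 1, \dots, N\}$ and, consequently,
$$\Delta_0(w) = \frac{1}{N} \sum_{\xi_k \in \Xi_0} \xi_k = \frac{1}{N} \sum_{k=1}^{N} \xi_k = \mathrm{AvDD}(w). \qquad \Box$$
Theorem 1. $\mathrm{CV@R}_\alpha(\xi)$ can be presented in the alternative form
$$\mathrm{CV@R}_\alpha(\xi) = \frac{1}{1 - \alpha} \int_\alpha^1 \pi_\xi^{-1}(q)\, dq, \qquad (3.20)$$
which is mathematically equivalent to (3.16).

Proof. Let $\{s_j \mid j = 1, \dots, J\}$ be the set of the ordered values of $\xi_k$, $k = 1, \dots, N$, such that $s_1 < s_2 < \dots < s_J$, where $J$ is the number of different values of $\xi_k$, $k = 1, \dots, N$, and $n_j \ge 1$ is the multiplicity of $s_j$, i.e. $n_j = \sum_{k=1}^{N} I\{\xi_k = s_j\}$ and $\sum_{j=1}^{J} n_j = N$. Defining $q_j = \frac{1}{N} \sum_{i=1}^{j} n_i$, step functions $\pi_\xi$ and $\pi_\xi^{-1}$ are determined by the set of $(s_j, q_j)$, $j = 1, \dots, J$, i.e.
$$\pi_\xi(s_j) = q_j, \qquad \pi_\xi^{-1}(q_j) = s_j. \qquad (3.21)$$
Let $s_0 = 0$ and $q_0 = 0$. Then, since $\bigcap_{j=1}^{J} (q_{j-1}, q_j] = \emptyset$ and $\bigcup_{j=1}^{J} (q_{j-1}, q_j] = (0, 1]$, for any value of $\alpha \in (0, 1]$ there exists $j^*$ from $1, \dots, J$ such that $\alpha \in (q_{j^*-1}, q_{j^*}]$. Using (3.21) and condition $\alpha \in (q_{j^*-1}, q_{j^*}]$, we obtain $\zeta(\alpha) = s_{j^*}$, $\pi_\xi(\zeta(\alpha)) = q_{j^*}$, and, consequently,
$$\frac{1}{N} \sum_{\xi_k \in \Xi_\alpha} \xi_k = \frac{1}{N} \sum_{j=j^*+1}^{J} s_j\, n_j = \sum_{j=j^*+1}^{J} \pi_\xi^{-1}(q_j)\, (q_j - q_{j-1}) = \int_{q_{j^*}}^{1} \pi_\xi^{-1}(q)\, dq.$$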
Taking the last relations into account, for any $\alpha \in (0, 1)$, the integral in the right-hand side of (3.20) is presented by
$$\int_\alpha^1 \pi_\xi^{-1}(q)\, dq = (q_{j^*} - \alpha)\, s_{j^*} + \int_{q_{j^*}}^{1} \pi_\xi^{-1}(q)\, dq = \big(\pi_\xi(\zeta(\alpha)) - \alpha\big)\, \zeta(\alpha) + \frac{1}{N} \sum_{\xi_k \in \Xi_\alpha} \xi_k,$$
which coincides with the expression (3.16) up to the multiplier $\frac{1}{1 - \alpha}$. Only two cases are left to consider, namely, $\alpha = 0$ and $\alpha = 1$. Assuming $\pi_\xi^{-1}(1) < \infty$, we have, respectively,
$$\Delta_0(w) = \int_0^1 \pi_\xi^{-1}(q)\, dq = \frac{1}{N} \sum_{j=1}^{J} n_j s_j = \frac{1}{N} \sum_{k=1}^{N} \xi_k = \mathrm{AvDD}(w),$$
$$\Delta_1(w) = \lim_{\alpha \to 1} \frac{1}{1 - \alpha} \int_\alpha^1 \pi_\xi^{-1}(q)\, dq = \pi_\xi^{-1}(1) = \mathrm{MaxDD}(w). \qquad \Box$$
Remark 4. Let $X$ be an arbitrary random variable with the cumulative distribution function $F_X(t) = \Pr\{X \le t\}$. Assuming $F_X^{-1}(\alpha)$ to be the inverse function of $F_X(t)$, functionals $\mathrm{CV@R}_\alpha$ and $\mathrm{CVaR}_\alpha$ are expressed, respectively, by
$$\mathrm{CV@R}_\alpha(X) = \frac{1}{1 - \alpha} \int_\alpha^1 F_X^{-1}(q)\, dq, \qquad \mathrm{CVaR}_\alpha(X) = -\frac{1}{\alpha} \int_0^\alpha F_X^{-1}(q)\, dq. \qquad (3.22)$$
Relation (3.17) can be verified based on (3.22). CVaR methodology was thoroughly developed by Rockafellar and Uryasev [21, 20].

Example 1. To illustrate the concept of the CV@R, let us calculate $\mathrm{CV@R}_{0.7}(\xi)$ for the drawdown time series $\xi$ shown on Figure 3-2. According to Figure 3-4, $\zeta(0.7) = \pi_\xi^{-1}(0.7) = \xi_6$, and, consequently, from Figure 3-3, $\pi_\xi(\zeta(0.7)) = \pi_\xi(\xi_6) = 0.75$. Using formula (3.16), we obtain
$$\mathrm{CV@R}_{0.7}(\xi) = \frac{0.75 - 0.7}{1 - 0.7}\, \xi_6 + \frac{1}{(1 - 0.7) \cdot 8}\, (\xi_1 + \xi_5) = \frac{1}{6}\, \xi_6 + \frac{5}{12}\, (\xi_1 + \xi_5).$$
To verify this result, we can calculate $\mathrm{CV@R}_{0.7}(\xi)$ based on (3.20). Namely, following Figure 3-4, we have
$$\mathrm{CV@R}_{0.7}(\xi) = \frac{1}{1 - 0.7} \big((0.75 - 0.7)\, \xi_6 + (0.875 - 0.75)\, \xi_1 + (1 - 0.875)\, \xi_5\big) = \frac{1}{6}\, \xi_6 + \frac{5}{12}\, \xi_1 + \frac{5}{12}\, \xi_5.$$
Example 2. For the drawdown time series shown on Figure 3-2, $\mathrm{MaxDD}(w) = \xi_5$ and $\mathrm{AvDD}(w) = \frac{1}{8} \sum_{k=1}^{8} \xi_k$.

3.3.3 Mixed Conditional Drawdown

The notion of CDD can be generalized by considering convex combinations of the CDDs corresponding to different confidence levels. This idea is essentially based on risk profiling, i.e. assignment of specific weights to CDDs with predetermined confidence levels.

Definition 5. Given a risk profile $\chi(\alpha)$ such that 1) $d\chi(\alpha) \ge 0$; 2) $\int_0^1 d\chi(\alpha) = 1$; the mixed CDD is defined by
$$\Delta^{+}_{\chi}(w) = \int_0^1 \Delta_\alpha(w)\, d\chi(\alpha). \qquad (3.23)$$
Obviously, the mixed CDD preserves all properties of $\Delta_\alpha(w)$ stated in Proposition 3. A fund manager can flexibly express his or her risk preferences by shaping $\chi(\alpha)$.

Proposition 5. The mixed CDD can be presented in the alternative form
$$\Delta^{+}_{\chi}(w) = \int_0^1 \pi_\xi^{-1}(\alpha)\, \rho(\alpha)\, d\alpha, \qquad (3.24)$$
with "spectrum" $\rho(\alpha)$ being: 1) nonnegative on $[0, 1]$; 2) nondecreasing on $[0, 1]$; 3) normalized, $\int_0^1 \rho(\alpha)\, d\alpha = 1$. The relation between $\chi(\alpha)$ in (3.23) and $\rho(\alpha)$ in (3.24) is $d\rho(\alpha) = \frac{1}{1 - \alpha}\, d\chi(\alpha)$.

Proof. Expressing $\Delta_\alpha(w)$ in the form of (3.20), consider
$$\Delta^{+}_{\chi}(w) = \int_0^1 \Big(\frac{1}{1 - \alpha} \int_\alpha^1 \pi_\xi^{-1}(q)\, dq\Big) d\chi(\alpha) = \int_0^1 \int_0^1 \frac{1}{1 - \alpha}\, \pi_\xi^{-1}(q)\, I\{q \ge \alpha\}\, dq\, d\chi(\alpha) = \int_0^1 \pi_\xi^{-1}(q) \Big(\int_0^1 \frac{I\{q \ge \alpha\}}{1 - \alpha}\, d\chi(\alpha)\Big) dq = \int_0^1 \pi_\xi^{-1}(q)\, \rho(q)\, dq,$$
where $\rho(\alpha) = \int_0^\alpha \frac{d\chi(q)}{1 - q}$ satisfies all properties 1)-3). Indeed, $\rho(\alpha)$ is nonnegative and nondecreasing, since $d\rho(\alpha) = \frac{1}{1 - \alpha}\, d\chi(\alpha) \ge 0$. Moreover,
$$\int_0^1 \rho(\alpha)\, d\alpha = \int_0^1 \int_0^1 \frac{1}{1 - q}\, I\{\alpha > q\}\, d\chi(q)\, d\alpha = \int_0^1 d\chi(q) = 1.$$
Obviously, conditions 1)-3) are necessarily satisfied by function $\rho(\alpha)$, since they are derived from the properties of function $\chi(\alpha)$. However, if function $\rho(\alpha)$ satisfies conditions 1)-3), then it is sufficient for (3.24) to be constant translating, positively homogeneous, monotonic and convex with respect to $\xi$. The last fact comes from a direct verification of those properties. $\Box$

Corollary 1. The nondecrease property of spectrum $\rho(\alpha)$ is a necessary condition for the mixed CDD to be convex. This property has an obvious but important interpretation, namely, the greater the drawdown quantile, $\pi_\xi^{-1}$, the greater the penalty coefficient, $\rho$, that should be assigned. A similar conclusion regarding the risk spectrum in coherent risk measures was made by Acerbi and Tasche [32]. This conclusion is a consequence of a general principle stating that the greater the risk is, the more it should be penalized [1].

Example 3. MaxDD and AvDD are mixed CDDs with risk profiles $\chi(\alpha) = I\{\alpha \ge 1\}$ and $\chi(\alpha) = I\{\alpha > 0\}$, respectively.

Discrete risk profile. An important case is when the risk profile, $\chi(\alpha)$, is specified by the discrete set of points $\chi_i = d\chi(\alpha_i)$, $i = 1, \dots, L$.
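In this case, the mixed CDD is expressed by
$$\Delta^{+}_{\chi}(w) = \sum_{i=1}^{L} \chi_i\, \Delta_{\alpha_i}(w), \qquad (3.25)$$
where $\sum_{i=1}^{L} \chi_i = 1$ and $\chi_i \ge 0$. Consequently, the "spectrum" function is presented by
$$\rho(\alpha) = \sum_{i=1}^{L} \frac{\chi_i}{1 - \alpha_i}\, I\{\alpha \ge \alpha_i\}. \qquad (3.26)$$
Detail. Interchanging summation and integration operations in $\Delta^{+}_{\chi}(w)$, the result follows:
$$\Delta^{+}_{\chi}(w) = \sum_{i=1}^{L} \chi_i\, \Delta_{\alpha_i}(w) = \sum_{i=1}^{L} \frac{\chi_i}{1 - \alpha_i} \int_{\alpha_i}^{1} \pi_\xi^{-1}(q)\, dq = \int_0^1 \Big(\sum_{i=1}^{L} \frac{\chi_i}{1 - \alpha_i}\, I\{q \ge \alpha_i\}\Big) \pi_\xi^{-1}(q)\, dq.$$
Obviously, (3.26) is a positive nondecreasing function.

3.4 Optimization Techniques for Conditional Drawdown Computation

This section develops optimization techniques for efficient CDD computation. Formulas (3.16) and (3.20) require calculating the value of $\zeta(\alpha)$ first, which doubles computational time. However, there is an optimization procedure that obtains the values of threshold $\zeta(\alpha)$ and CDD simultaneously. This procedure is especially important in large-scale optimization. In the case when a time series of drawdowns is given, computation of the $\alpha$-CDD is reduced to computation of $\mathrm{CV@R}_\alpha(\xi)$.

Theorem 2. Given a time series of an instrument's drawdowns $\xi = (\xi_1, \dots, \xi_N)$, corresponding to time moments $\{t_1, \dots, t_N\}$, the CDD functional is presented by $\mathrm{CV@R}_\alpha(\xi)$, whose computation is reduced to the following linear programming procedure
$$\mathrm{CV@R}_\alpha(\xi) = \min_{y,\, z}\ y + \frac{1}{(1 - \alpha) N} \sum_{k=1}^{N} z_k \quad \text{s. t.} \quad z_k \ge \xi_k - y, \quad z_k \ge 0, \quad k = 1, \dots, N, \qquad (3.27)$$
leading to a single optimal value of $y$ equal to $\zeta(\alpha)$ if $\pi_\xi(\zeta(\alpha)) > \alpha$, and to a closed interval of optimal $y$ with the left endpoint of $\zeta(\alpha)$ if $\pi_\xi(\zeta(\alpha)) = \alpha$.

Proof. We introduce a piecewise linear function
$$h(y) = y + \frac{1}{(1 - \alpha) N} \sum_{k=1}^{N} [\xi_k - y]^{+}, \qquad (3.28)$$
where $[\xi_k - y]^{+} = \max\{\xi_k - y,\ 0\}$, and establish the following relation
$$\mathrm{CV@R}_\alpha(\xi) = \min_{y} h(y). \qquad (3.29)$$
The derivative of $h(y)$ with respect to $y$ is presented by
$$\frac{d}{dy} h(y) = 1 - \frac{1}{(1 - \alpha) N} \sum_{k=1}^{N} I\{\xi_k > y\} = \frac{1}{1 - \alpha} \big(\pi_\xi(y) - \alpha\big). \qquad (3.30)$$
Note $\frac{d}{dy} h(y)$ is continuous for all values of $y$, except the set of points $y \in \{\xi_k \mid k = 1, \dots, N\}$.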
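The necessary condition for function $h(y)$ to attain an extremum is
$$\frac{d^{-}}{dy} h(y) \le 0 \le \frac{d^{+}}{dy} h(y), \qquad (3.31)$$
where $\frac{d^{-}}{dy} h(y) = \frac{1}{1 - \alpha} \big(\pi_\xi(y - 0) - \alpha\big)$ and $\frac{d^{+}}{dy} h(y) = \frac{1}{1 - \alpha} \big(\pi_\xi(y + 0) - \alpha\big)$ are left and right derivatives, respectively, which coincide with each other for all $y$ except $y \in \{\xi_k \mid k = 1, \dots, N\}$. According to (3.30) and (3.31), an optimal value $y^*$ should satisfy inequalities $\pi_\xi(y^* - 0) \le \alpha \le \pi_\xi(y^* + 0)$, which have a unique solution $y^* = \zeta(\alpha)$ if $\pi_\xi(\zeta(\alpha)) > \alpha$ (see Remark 2), i.e. if $y^* \in \{\xi_k \mid k = 1, \dots, N\}$. However, if $\pi_\xi(\zeta(\alpha)) = \alpha$, then there is a closed interval of optimal values $y^*$, with the left endpoint of $\zeta(\alpha)$, namely, $y^* \in [\zeta(\alpha), \zeta(\alpha + 0)]$, where $\pi_\xi(\zeta(\alpha + 0)) > \alpha$. Hence, two cases are considered: a) $y^* = \zeta(\alpha)$ if $\pi_\xi(\zeta(\alpha)) > \alpha$; b) $y^* \in [\zeta(\alpha), \zeta(\alpha + 0)]$ if $\pi_\xi(\zeta(\alpha)) = \alpha$. In both cases, equality $[\xi_k - y^*]^{+} = (\xi_k - y^*)\, I\{\xi_k > y^*\} = (\xi_k - y^*)\, I\{\xi_k > \zeta(\alpha)\}$ holds with respect to all $\xi_k$, $k = 1, \dots, N$, for any fixed $y^*$. Thus, based on this fact, we obtain
$$\min_{y} h(y) = h(y^*) = y^* + \frac{1}{(1 - \alpha) N} \sum_{k=1}^{N} [\xi_k - y^*]^{+} = y^* \Big(1 - \frac{1}{(1 - \alpha) N} \sum_{k=1}^{N} I\{\xi_k > \zeta(\alpha)\}\Big) + \frac{1}{(1 - \alpha) N} \sum_{\xi_k \in \Xi_\alpha} \xi_k = \frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha}\, y^* + \frac{1}{(1 - \alpha) N} \sum_{\xi_k \in \Xi_\alpha} \xi_k,$$
where $\frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha}\, y^* = \frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha}\, \zeta(\alpha)$ in the case of a), and $\frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha}\, y^* = 0$ in the case of b). Since expression $\sum_{k=1}^{N} [\xi_k - y]^{+}$ is minimized, it can equivalently be presented by the sum $\sum_{k=1}^{N} z_k$ with $z_k \ge \xi_k - y$ and $z_k \ge 0$, $k = 1, \dots, N$, which gives (3.27). $\Box$

Corollary 2. $\mathrm{CV@R}_\alpha(\xi)$ is the optimal value of the knapsack problem
$$\mathrm{CV@R}_\alpha(\xi) = \max_{q} \sum_{k=1}^{N} \xi_k\, q_k \quad \text{s. t.} \quad \sum_{k=1}^{N} q_k = 1, \quad 0 \le q_k \le \frac{1}{(1 - \alpha) N}, \quad k = 1, \dots, N. \qquad (3.32)$$
The value of $\mathrm{CV@R}_\alpha(\xi)$ can be found in $O(n \log_2 n)$ time.

Proof. It is enough to observe that knapsack problem (3.32) is dual to linear programming problem (3.27). Based on duality theory, optimal values of the objective functions in (3.27) and (3.32) should coincide. Problem (3.32) can be solved by the standard greedy algorithm in $O(n \log_2 n)$ time. The algorithm sorts items according to their values $\{\xi_k \mid k = 1, \dots, N\}$. Let $[a]$ denote the integer part of real number $a$. Obviously, the $q$-variables corresponding to the largest $[(1 - \alpha) N]$ values have optimal values equal to $\frac{1}{(1 - \alpha) N}$, and the $q$-variable corresponding to the $([(1 - \alpha) N] + 1)$th value in the sorted order has optimal value equal to $1 - \frac{[(1 - \alpha) N]}{(1 - \alpha) N}$. The rest of the $q$-variables equal 0.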
In this case, the complexity of the algorithm is mainly determined by the sorting procedure, which requires at least $O(n \log_2 n)$ operations. $\Box$

Formulation (3.32) is closely related to the presentation of CV@R based on the concept of a risk envelope, which is a closed, convex set of probabilities containing 1. Risk envelope theory was developed by Rockafellar et al. [11, 23].

Suppose a sample path of an instrument's rates of return $(r_1, \dots, r_N)$, corresponding to time moments $\{t_1, \dots, t_N\}$, is given. In this case, the uncompounded cumulative instrument's rate of return at $t_k$ is $w_k = \sum_{l=1}^{k} r_l$, and the CDD is presented in the form of $\Delta_\alpha(w)$.

Proposition 6. Given a sample path of an instrument's rates of return $(r_1, \dots, r_N)$, the CDD functional, $\Delta_\alpha(w)$, is computed by the following optimization procedure
$$\Delta_\alpha(w) = \min_{u,\, y,\, z}\ y + \frac{1}{(1 - \alpha) N} \sum_{k=1}^{N} z_k \quad \text{s. t.} \quad z_k \ge u_k - y, \quad u_k \ge u_{k-1} - r_k, \quad u_0 = 0, \quad z_k \ge 0, \quad u_k \ge 0, \quad k = 1, \dots, N, \qquad (3.33)$$
which leads to a single optimal value of $y$ equal to $\zeta(\alpha)$ if $\pi_\xi(\zeta(\alpha)) > \alpha$, and to a closed interval of optimal $y$ with the left endpoint of $\zeta(\alpha)$ if $\pi_\xi(\zeta(\alpha)) = \alpha$.

Proof. By virtue of relation $\Delta_\alpha(w) = \mathrm{CV@R}_\alpha(AD(w)) = \mathrm{CV@R}_\alpha(\xi)$, optimization problem (3.33) is a direct consequence of (3.27). Using recursive formula $\xi_k = [\xi_{k-1} - r_k]^{+}$, constraint $z_k \ge \xi_k - y$ in (3.27) is reduced to $z_k \ge u_k - y$, where nonnegative auxiliary variables $u_k$ satisfy additional constraints $u_k \ge u_{k-1} - r_k$, $k = 1, \dots, N$, with $u_0 = 0$. $\Box$

Corollary 3. Given a sample path of an instrument's rates of return $(r_1, \dots, r_N)$, the CDD functional, $\Delta_\alpha(w)$, is computed by the following optimization procedure
$$\Delta_\alpha(w) = \max_{q,\, \eta}\ -\sum_{k=1}^{N} r_k\, \eta_k \quad \text{s. t.} \quad \sum_{k=1}^{N} q_k = 1, \quad \eta_k - \eta_{k+1} \le q_k \le \frac{1}{(1 - \alpha) N}, \quad q_k \ge 0, \quad \eta_k \ge 0, \quad \eta_{N+1} = 0, \quad k = 1, \dots, N. \qquad (3.34)$$
Proof. Problem (3.34) is dual to linear programming problem (3.33). $\Box$

Theorem 2 and all its corollaries can be easily generalized to the case of mixed CDD.

Proposition 7.
Given a sample path of an instrument's rates of return $\{r_k \mid k = 1, \dots, N\}$ and discrete risk profile $\chi_i = d\chi(\alpha_i)$, $i = 1, \dots, L$, the mixed CDD, $\Delta^{+}_{\chi}(w)$, is computed by
$$\Delta^{+}_{\chi}(w) = \min_{u,\, y,\, z} \sum_{i=1}^{L} \chi_i \Big(y_i + \frac{1}{(1 - \alpha_i) N} \sum_{k=1}^{N} z_{ik}\Big) \quad \text{s. t.} \quad z_{ik} \ge u_k - y_i, \quad u_k \ge u_{k-1} - r_k, \quad u_0 = 0, \quad z_{ik} \ge 0, \quad u_k \ge 0, \quad i = 1, \dots, L,\ k = 1, \dots, N. \qquad (3.35)$$
Proof. Formulation (3.35) is a direct consequence of mixed CDD definition (3.25) and optimization problem (3.33). Notice that auxiliary variables $u_k$ do not have index $i$, since they determine the drawdown sequence, which is the same for all $\alpha_i$. $\Box$

3.5 Multiscenario Conditional Value-at-Risk and Drawdown Measure

This section presents the concept of the multiscenario CV@R and drawdown measure, which, in fact, are the CV@R and CDD defined in the case of several sample paths for the uncompounded cumulative portfolio rate of return. We generalize results obtained for the CDD under the assumption of a single sample path to the case of several sample paths.

Let $\Omega$ denote a discrete set of random events, i.e. $\Omega = \{\omega_j \mid j = 1, \dots, K\}$, and let $p_j$ be the probability of event $\omega_j$ ($\forall j$: $p_j \ge 0$, and $\sum_{j=1}^{K} p_j = 1$). Suppose $r_j(t_k) = (r_{1j}(t_k), r_{2j}(t_k), \dots, r_{mj}(t_k))$, $k = 1, \dots, N$, is the $j$th sample path for the random vector of risky assets' rates of return, corresponding to random event $\omega_j \in \Omega$ and time interval $[0, T]$ presented by the discrete set of time moments $\{t_0 = 0, t_1, t_2, \dots, t_N = T\}$. Consequently, the $j$th sample path for the rate of return and uncompounded cumulative rate of return of a portfolio with capital weights $x(t_k) = (x_0(t_k), x_1(t_k), x_2(t_k), \dots, x_m(t_k))$ are defined, respectively, by
$$r_j^{(p)}(x(t_k)) = r_j(t_k) \cdot x(t_k) = \sum_{i=1}^{m} r_{ij}(t_k)\, x_i(t_k), \qquad (3.36)$$
$$w_{jk}(x(t_k)) = \begin{cases} 0, & k = 0, \\ \sum_{l=1}^{k} r_j^{(p)}(x(t_l)), & k = 1, \dots, N. \end{cases} \qquad (3.37)$$
To simplify notations, we use $w_{jk}$ instead of $w_{jk}(x(t_k))$, implying that $w_{jk}$ is a function of $x(t_k)$. In a multiscenario case, $w$ denotes matrix $\{w_{jk}\}$, $j = 1, \dots, K$, $k = 0, \dots, N$.

3.5.1 Multiscenario Conditional Value-at-Risk

Definition 6.
In a multiscenario case, the $AD(w)$ is a matrix functional defined on $\Omega \times [0, T]$:
$$AD(w) = \xi = \{\xi_{jk}\}, \qquad \xi_{jk} = \max_{0 \le l \le k} \{w_{jl}\} - w_{jk}, \qquad j = 1, \dots, K,\ k = 1, \dots, N. \qquad (3.38)$$
All AD properties stated in Proposition 1 hold in a multiscenario case. Indeed, based on (3.38), properties 1-4 in Proposition 1 can be verified directly. Matrix $AD(w)$ is interpreted to be the drawdown surface $\xi_{jk}$, $(\omega_j, t_k) \in \Omega \times [0, T]$.

Definition 7. Similar to the definitions of MaxDD and AvDD in a single scenario case, MaxDD and AvDD are defined on $\Omega \times [0, T]$, respectively, by
$$\mathrm{MaxDD}(w) = \max_{1 \le j \le K,\ 1 \le k \le N} \{\xi_{jk}\}, \qquad (3.39)$$
$$\mathrm{AvDD}(w) = \frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, \xi_{jk}. \qquad (3.40)$$
Definition 8. The indicator function for the drawdown surface, its inverse function and the threshold plane, $\zeta(\alpha)$, are defined, respectively, by
$$\pi_\xi(s) = \frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, I\{\xi_{jk} \le s\}, \qquad (3.41)$$
$$\pi_\xi^{-1}(\alpha) = \begin{cases} \inf\{s \mid \pi_\xi(s) \ge \alpha\}, & \alpha \in (0, 1], \\ 0, & \alpha = 0, \end{cases} \qquad (3.42)$$
$$\zeta(\alpha) = \pi_\xi^{-1}(\alpha). \qquad (3.43)$$
Figure 3-5 illustrates the drawdown surface $\xi_{jk}$ and threshold plane $\zeta(\alpha)$.

Figure 3-5: Drawdown surface and threshold plane.

Definition 9. Multiscenario CV@R may be defined similar to a single period CV@R, namely,
$$\mathrm{CV@R}_\alpha(\xi) = \frac{\pi_\xi(\zeta(\alpha)) - \alpha}{1 - \alpha}\, \zeta(\alpha) + \frac{1}{(1 - \alpha) N} \sum_{\xi_{jk} \in \Xi_\alpha} p_j\, \xi_{jk}, \qquad (3.44)$$
where $\Xi_\alpha = \{\xi_{jk} \mid \xi_{jk} > \zeta(\alpha),\ j = 1, \dots, K,\ k = 1, \dots, N\}$.

Proposition 8. Multiscenario CV@R, given by (3.44), can be presented in the alternative form
$$\mathrm{CV@R}_\alpha(\xi) = \frac{1}{1 - \alpha} \int_\alpha^1 \pi_\xi^{-1}(q)\, dq, \qquad (3.45)$$
where $\pi_\xi^{-1}(q)$ is the inverse function given by (3.42).

Proof. Similar to the proof of Theorem 1. $\Box$

Remark 5. Let $X$ be an arbitrary random variable. Suppose we are given $K$ sample paths $X(t_k, \omega_j)$, $k = 1, \dots, N$, corresponding to random events $\omega_j \in \Omega$ with probabilities $p_j$ such that $\sum_{j=1}^{K} p_j = 1$. Defining an indicator function for $X$ to be $\pi_X(s) = \frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, I\{X(t_k, \omega_j) \le s\}$ (where the inverse function $\pi_X^{-1}$ is defined similar to (3.42)), multiscenario CV@R may be determined similar to a single period CV@R, namely, $\mathrm{CV@R}_\alpha(X) = \frac{1}{1 - \alpha} \int_\alpha^1 \pi_X^{-1}(q)\, dq$.
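3.5.2 Drawdown Measure

In a multiscenario case, CDD with tolerance level $\alpha$ is interpreted as (i) the average of the worst $(1 - \alpha) \cdot 100\%$ drawdowns on the drawdown surface, if the worst $(1 - \alpha) \cdot 100\%$ drawdowns can be counted precisely; (ii) the linear combination of $\zeta(\alpha)$ and the average of the drawdowns strictly exceeding threshold plane $\zeta(\alpha)$, if we are unable to precisely count $(1 - \alpha) \cdot 100\%$ of the drawdowns. A strict mathematical definition of the drawdown measure is given below.

Definition 10. In a multiscenario case, the CDD with tolerance level $\alpha \in [0, 1]$ is the multiscenario $\mathrm{CV@R}_\alpha$ applied to the drawdown surface $AD(w)$:
$$\Delta_\alpha(w) = \mathrm{CV@R}_\alpha(AD(w)), \qquad (3.46)$$
and the drawdown measure is the mixed CDD with risk profile $\chi(\alpha)$:
$$\Delta^{+}_{\chi}(w) = \int_0^1 \Delta_\alpha(w)\, d\chi(\alpha), \qquad (3.47)$$
where $\Delta_\alpha(w)$ is given by (3.46).

Proposition 9. Defining matrix operations $w + \mathrm{const} = \{w_{jk} + \mathrm{const}\}$ and $\lambda w = \{\lambda w_{jk}\}$, drawdown measure $\Delta^{+}_{\chi}(w)$ satisfies the following properties:
1. Nonnegativity: $\Delta^{+}_{\chi}(w) \ge 0$.
2. Insensitivity to constant shift: $\Delta^{+}_{\chi}(w + \mathrm{const}) = \Delta^{+}_{\chi}(w)$.
3. Positive homogeneity: $\Delta^{+}_{\chi}(\lambda w) = \lambda\, \Delta^{+}_{\chi}(w)$, $\forall \lambda > 0$.
4. Convexity: if $w_\lambda = \lambda w_1 + (1 - \lambda) w_2$ is a linear combination of any $w_1$ and $w_2$ with $\lambda \in [0, 1]$, then $\Delta^{+}_{\chi}(w_\lambda) \le \lambda\, \Delta^{+}_{\chi}(w_1) + (1 - \lambda)\, \Delta^{+}_{\chi}(w_2)$.

Proof. Properties 1-4 are a direct generalization of the CDD properties stated in Proposition 3. $\Box$

Proposition 10. In the case of a discrete risk profile, the drawdown measure is computed by
$$\Delta^{+}_{\chi}(w) = \min_{u,\, y,\, z} \sum_{i=1}^{L} \chi_i \Big(y_i + \frac{1}{(1 - \alpha_i) N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, z_{ijk}\Big) \quad \text{s. t.} \quad z_{ijk} \ge u_{jk} - y_i, \quad u_{jk} \ge u_{j(k-1)} - r_j^{(p)}(x(t_k)), \quad u_{jk} \ge 0, \quad u_{j0} = 0, \quad z_{ijk} \ge 0, \quad i = 1, \dots, L,\ j = 1, \dots, K,\ k = 1, \dots, N. \qquad (3.48)$$
Proof. Introducing intermediate optimization problems
$$\sum_{i=1}^{L} \chi_i\, \mathrm{CV@R}_{\alpha_i}(\xi) = \min_{y} \sum_{i=1}^{L} \chi_i \Big(y_i + \frac{1}{(1 - \alpha_i) N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, [\xi_{jk} - y_i]^{+}\Big),$$
$$\sum_{i=1}^{L} \chi_i\, \mathrm{CV@R}_{\alpha_i}(\xi) = \min_{y_i,\, z_{ijk}} \sum_{i=1}^{L} \chi_i \Big(y_i + \frac{1}{(1 - \alpha_i) N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, z_{ijk}\Big) \quad \text{s. t.} \quad z_{ijk} \ge \xi_{jk} - y_i, \quad z_{ijk} \ge 0, \quad i = 1, \dots, L,\ j = 1, \dots, K,\ k = 1, \dots, N,$$
the proof is conducted similar to the proof of Theorem 2. $\Box$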
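As a numerical cross-check of the multiscenario definitions, the following Python sketch computes the CDD of a small drawdown surface in two ways: from definition (3.44), and by minimizing the piecewise linear function $y + \frac{1}{(1-\alpha)N} \sum_{k} \sum_{j} p_j [\xi_{jk} - y]^{+}$ over its breakpoints. The second route is a stand-in for the linear program with a single tolerance level, not an LP solver. All function names, the two scenario paths and their probabilities are assumptions introduced for illustration.

```python
def drawdown_surface(paths):
    """xi_jk for every scenario j, given per-period portfolio returns."""
    surface = []
    for returns in paths:
        w = peak = 0.0
        row = []
        for r in returns:
            w += r
            peak = max(peak, w)   # running maximum, including w_j0 = 0
            row.append(peak - w)
        surface.append(row)
    return surface

def cdd_by_definition(surface, probs, alpha):
    """Multiscenario CV@R applied to the drawdown surface, 0 <= alpha < 1."""
    n = len(surface[0])
    # each observation xi_jk carries weight p_j / N
    obs = sorted((v, p / n) for row, p in zip(surface, probs) for v in row)

    def pi(s):
        return sum(wt for v, wt in obs if v <= s)

    zeta = 0.0 if alpha == 0.0 else next(v for v, _ in obs if pi(v) >= alpha)
    tail = sum(v * wt for v, wt in obs if v > zeta)
    return (pi(zeta) - alpha) / (1.0 - alpha) * zeta + tail / (1.0 - alpha)

def cdd_by_minimization(surface, probs, alpha):
    """Minimum over y of y + sum_jk p_j [xi_jk - y]^+ / ((1 - alpha) N)."""
    n = len(surface[0])

    def h(y):
        excess = sum(p * max(v - y, 0.0)
                     for row, p in zip(surface, probs) for v in row)
        return y + excess / ((1.0 - alpha) * n)

    # h is convex and piecewise linear, so a minimum sits at a breakpoint
    breakpoints = {v for row in surface for v in row}
    return min(h(y) for y in breakpoints)

# Two hypothetical, equally likely scenarios; returns are in percent.
scenario_paths = [[2, -1, -3, 5], [-2, 1, 3, -4]]
scenario_probs = [0.5, 0.5]
surface = drawdown_surface(scenario_paths)
```

For these inputs both routes should agree at every tolerance level, for example at $\alpha = 0$ (the probability-weighted average drawdown), $\alpha = 0.5$ and $\alpha = 0.8$. A mixed CDD with a discrete risk profile is then the $\chi_i$-weighted sum of such values.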
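3.6 Portfolio Optimization with Drawdown Measure

This section formulates a portfolio optimization problem with the drawdown risk measure and suggests efficient optimization techniques for solving it. Optimal asset allocation considers (i) generation of sample paths for the assets' rates of return; (ii) the uncompounded cumulative portfolio rate of return rather than the compounded one. In this case, optimal asset allocation maximizes the expected value of the uncompounded cumulative portfolio rate of return at the final time moment $t_N = T$ subject to a constraint on the drawdown measure:
$$\max_{x \in X}\ E\big(w(T, \omega, x)\big) = \sum_{j=1}^{K} p_j\, w_{jN}(x) \quad \text{s. t.} \quad \Delta^{+}_{\chi}(w(x)) \le \gamma, \qquad (3.49)$$
where $X$ is the set of linear constraints and $\gamma \in [0, 1]$ is the proportion of the initial capital that is allowed to be lost. In contrast to Grossman and Zhou [27] and Cvitanic and Karatzas [28], who considered the vector of portfolio weights to be a function of time within $[0, T]$, we assume portfolio weights $x(t_k)$ to be static for all $t_k$, $k = 0, \dots, N$. This special strategy can be achieved by portfolio rebalancing at every $t_k$, $k = 0, \dots, N$. Justification of this assumption depends on a particular case study. Based on the assumption made, the uncompounded cumulative portfolio rate of return $w$ is rewritten as
$$w_{jk}(x) = \sum_{i=1}^{m} \Big(\sum_{l=1}^{k} r_{ij}(t_l)\Big) x_i. \qquad (3.50)$$

3.6.1 Reduction to Linear Programming Problem

Theorem 3. Problem (3.49) is reduced to the linear programming (LP) problem
$$\max_{u,\ x \in X,\ y,\ z} \sum_{j=1}^{K} p_j\, w_{jN}(x) \quad \text{s. t.} \quad \sum_{i=1}^{L} \chi_i \Big(y_i + \frac{1}{(1 - \alpha_i) N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, z_{ijk}\Big) \le \gamma, \quad z_{ijk} \ge u_{jk} - y_i, \quad u_{jk} \ge u_{j(k-1)} - r_{jk}^{(p)}(x), \quad u_{jk} \ge 0, \quad u_{j0} = 0, \quad z_{ijk} \ge 0, \quad i = 1, \dots, L,\ j = 1, \dots, K,\ k = 1, \dots, N, \qquad (3.51)$$
where $u_{jk}$, $y_i$ and $z_{ijk}$ are auxiliary variables.

Proof. Consider the piecewise linear function
$$H(x, y) = \sum_{i=1}^{L} \chi_i \Big(y_i + \frac{1}{(1 - \alpha_i) N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, [\xi_{jk}(x) - y_i]^{+}\Big). \qquad (3.52)$$
According to Proposition 10, the drawdown measure may be presented by
$$\Delta^{+}_{\chi}(w(x)) = \sum_{i=1}^{L} \chi_i\, \mathrm{CV@R}_{\alpha_i}(\xi(x)) = \min_{y} H(x, y). \qquad (3.53)$$
Consequently, problem (3.49) is reduced to
$$\max_{x \in X} \sum_{j=1}^{K} p_j\, w_{jN}(x) \quad \text{s. t.} \quad \min_{y} H(x, y) \le \gamma. \qquad (3.54)$$
The key point of the proof is to show that the minimum in the constraint of (3.54) may be relaxed, i.e. to show that problem (3.54) is equivalent to
$$\max_{x \in X,\ y} \sum_{j=1}^{K} p_j\, w_{jN}(x) \quad \text{s. t.} \quad H(x, y) \le \gamma. \qquad (3.55)$$
The proof of this fact is conducted by relaxing constraint $\min_{y} H(x, y) \le \gamma$ in (3.54), namely, problem (3.54) is equivalently rewritten as
$$\min_{\lambda \ge 0} \max_{x \in X} \Big(\sum_{j=1}^{K} p_j\, w_{jN}(x) + \lambda \big(\gamma - \min_{y} H(x, y)\big)\Big) = \min_{\lambda \ge 0} \max_{x \in X,\ y} \Big(\sum_{j=1}^{K} p_j\, w_{jN}(x) + \lambda \big(\gamma - H(x, y)\big)\Big). \qquad (3.56)$$
However, problem (3.56) is the Lagrange relaxation of (3.55). Hence, (3.55) is equivalent to (3.54). According to Theorem 4 and Proposition 10, LP (3.51) is a direct consequence of (3.55). $\Box$

Corollary 4. In the cases of $\mathrm{MaxDD}(w)$ and $\mathrm{AvDD}(w)$, corresponding to the mixed CDD with risk profiles $\chi(\alpha) = I\{\alpha \ge 1\}$ and $\chi(\alpha) = I\{\alpha > 0\}$, LP (3.51) is simplified, respectively, to
$$\max_{u,\ x \in X} \sum_{j=1}^{K} p_j\, w_{jN}(x) \quad \text{s. t.} \quad u_{jk} \ge u_{j(k-1)} - r_{jk}^{(p)}(x), \quad \gamma \ge u_{jk} \ge 0, \quad u_{j0} = 0, \quad j = 1, \dots, K,\ k = 1, \dots, N, \qquad (3.57)$$
$$\max_{u,\ x \in X} \sum_{j=1}^{K} p_j\, w_{jN}(x) \quad \text{s. t.} \quad \frac{1}{N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, u_{jk} \le \gamma, \quad u_{jk} \ge u_{j(k-1)} - r_{jk}^{(p)}(x), \quad u_{jk} \ge 0, \quad u_{j0} = 0, \quad j = 1, \dots, K,\ k = 1, \dots, N. \qquad (3.58)$$

3.6.2 Efficient Frontier

The efficient frontier is a central concept in Risk Management methodology. Suppose that for every value of $\gamma$ and risk profile $\chi$, $x^*_\chi(\gamma)$ is an optimal solution to (3.51). In this case, the efficient frontier is a curve expressing the dependence of the optimal portfolio expected reward $\sum_{j=1}^{K} p_j\, w_{jN}(x^*_\chi(\gamma))$ on the portfolio risk $\gamma$.

Proposition 11. Efficient frontier $\big\{\gamma,\ \sum_{j=1}^{K} p_j\, w_{jN}(x^*_\chi(\gamma))\big\}$ is a concave curve.

Proof. Denoting $g(x) = \sum_{j=1}^{K} p_j\, w_{jN}(x)$, we show that for any $\gamma_1, \gamma_2 \in [0, 1]$ and $\tau \in [0, 1]$
$$g\big(x^*_\chi(\tau \gamma_1 + (1 - \tau) \gamma_2)\big) \ge \tau\, g\big(x^*_\chi(\gamma_1)\big) + (1 - \tau)\, g\big(x^*_\chi(\gamma_2)\big).$$
According to the proof of Theorem 3, we have
$$g\big(x^*_\chi(\gamma)\big) = \max_{x \in X,\ y} g(x) \quad \text{s. t.} \quad H(x, y) \le \gamma,$$
and using notation $G_\lambda(x, y) = g(x) - \lambda H(x, y)$, we obtain
$$g\big(x^*_\chi(\gamma)\big) = \min_{\lambda \ge 0} \max_{x \in X,\ y} \big(G_\lambda(x, y) + \lambda \gamma\big) = \min_{\lambda \ge 0} \big(G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma\big).$$
Since expression $G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma$ is linear with respect to $\gamma$, $\min_{\lambda \ge 0} \big(G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma\big)$ is a concave function of $\gamma$.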
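Indeed,

\[
\begin{aligned}
& \min_{\lambda \ge 0} \Big( G_\lambda(x(\lambda), y(\lambda)) + \lambda \big( \tau \gamma_1 + (1-\tau) \gamma_2 \big) \Big) \\
& \quad = \min_{\lambda \ge 0} \Big( \tau \big( G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma_1 \big) + (1-\tau) \big( G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma_2 \big) \Big) \\
& \quad \ge \tau \min_{\lambda \ge 0} \big( G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma_1 \big) + (1-\tau) \min_{\lambda \ge 0} \big( G_\lambda(x(\lambda), y(\lambda)) + \lambda \gamma_2 \big).
\end{aligned}
\]

This fact proves the proposition. □

The risk-adjusted return is an important characteristic for choosing an optimal portfolio on an efficient frontier. It evaluates the ratio of the portfolio reward to the portfolio risk:

\[
\rho_\chi(\gamma) = \gamma^{-1} \sum_{j=1}^{K} p_j\, w_{jN}\big( x^*_\chi(\gamma) \big). \tag{3.59}
\]

A fund manager is interested in the value of $\gamma \in [0,1]$ for which the risk-adjusted return $\rho_\chi(\gamma)$ is maximal. This value is interpreted as the best balance between the risk accepted and the rate of return achieved. According to Proposition 11, $\sum_{j=1}^{K} p_j\, w_{jN}(x^*_\chi(\gamma))$ is concave; hence, the ratio $\rho_\chi(\gamma)$ has a finite global maximum. Although $\rho_\chi(\gamma)$ is a nonlinear function of $\gamma$, the problem of finding the maximum of $\rho_\chi(\gamma)$ and the corresponding optimal $\gamma$ is reduced to an LP.

Proposition 12. The optimization problem $\max_{\gamma \in [0,1]} \rho_\chi(\gamma)$ is reduced to the LP

\[
\begin{aligned}
\max_{\tilde u,\; v,\; \tilde x \in \tilde X,\; \tilde y,\; \tilde z} \quad & \sum_{j=1}^{K} p_j\, w_{jN}(\tilde x) \\
\text{s.t.} \quad & \sum_{i=1}^{L} \chi_i \Big( \tilde y_i + \frac{1}{(1-\alpha_i) N} \sum_{k=1}^{N} \sum_{j=1}^{K} p_j\, \tilde z_{ijk} \Big) \le 1, \\
& \tilde z_{ijk} \ge \tilde u_{jk} - \tilde y_i, \\
& \tilde u_{jk} \ge \tilde u_{j(k-1)} - r_{jk}(\tilde x), \\
& \tilde u_{jk} \ge 0, \quad \tilde u_{j0} = 0, \quad \tilde z_{ijk} \ge 0, \\
& i = 1, \dots, L, \quad j = 1, \dots, K, \quad k = 1, \dots, N.
\end{aligned} \tag{3.60}
\]

If $\tilde x^*$ is an optimal solution to (3.60), then $\rho_\chi(\gamma^*) = \max_{\gamma \in [0,1]} \rho_\chi(\gamma) = \sum_{j=1}^{K} p_j\, w_{jN}(\tilde x^*)$, with the optimal value $\gamma^* = 1 \big/ \sum_{l=0}^{m} \tilde x^*_l$ and the corresponding optimal portfolio $x^*_l = \gamma^* \tilde x^*_l$, $l = 0, \dots, m$.

Proof. Since

\[
\max_{\gamma \in [0,1]} \rho_\chi(\gamma) = \max_{\gamma \in [0,1]} \gamma^{-1} \sum_{j=1}^{K} p_j\, w_{jN}\big( x^*_\chi(\gamma) \big) = \max_{\gamma \in [0,1]} \; \max_{x \in X_\chi} \; \gamma^{-1} \sum_{j=1}^{K} p_j\, w_{jN}(x),
\]

where $X_\chi$ is the set of constraints in problem (3.51), the problem $\max_{\gamma \in [0,1]} \max_{x \in X_\chi} \gamma^{-1} \sum_{j=1}^{K} p_j\, w_{jN}(x)$ is reduced to LP (3.60) by the change of variables $\tilde x_l = x_l / \gamma$, $\tilde y_i = y_i / \gamma$, $\tilde u_{jk} = u_{jk} / \gamma$, $\tilde z_{ijk} = z_{ijk} / \gamma$, $l = 0, \dots, m$, $i = 1, \dots, L$, $j = 1, \dots, K$, $k = 1, \dots, N$. The set $\tilde X$ may include the additional variable $v = 1/\gamma$. For instance, a box constraint $x_{\min} \le x_l \le x_{\max}$ from the set $X$ is transformed to $x_{\min} v \le \tilde x_l \le x_{\max} v$, which is an element of $\tilde X$.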
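The quantities that enter the problems of Section 3.6 can be evaluated directly for a fixed weight vector. The following Python sketch is an illustration only, not the LP itself: it computes the uncompounded cumulative return on each sample path, the drawdowns via the recursion that the LP constraints on $u_{jk}$ encode, and then the maximum, average, and conditional drawdown together with the expected final return. All function names and the two toy sample paths are hypothetical, and equal path probabilities are assumed.

```python
from typing import List, Sequence


def cumulative_returns(period_returns: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Uncompounded cumulative portfolio return w_1, ..., w_N on one sample path."""
    path, total = [], 0.0
    for asset_returns in period_returns:
        total += sum(r * x for r, x in zip(asset_returns, weights))
        path.append(total)
    return path


def drawdowns(path: Sequence[float]) -> List[float]:
    """Drawdowns from the recursion u_k = max(0, u_{k-1} - r_k), with u_0 = w_0 = 0."""
    out, prev_u, prev_w = [], 0.0, 0.0
    for w in path:
        prev_u = max(0.0, prev_u - (w - prev_w))
        prev_w = w
        out.append(prev_u)
    return out


def drawdowns_direct(path: Sequence[float]) -> List[float]:
    """Same quantity from the definition: running maximum minus current value."""
    out, peak = [], 0.0
    for w in path:
        peak = max(peak, w)
        out.append(peak - w)
    return out


def cdd(dd_paths: Sequence[Sequence[float]], probs: Sequence[float],
        alpha: float) -> float:
    """Conditional drawdown for 0 <= alpha < 1 as
    min over y of  y + sum_{k,j} p_j * max(0, u_jk - y) / ((1 - alpha) * N)."""
    n = len(dd_paths[0])

    def h(y: float) -> float:
        tail = sum(p * max(0.0, u - y)
                   for path, p in zip(dd_paths, probs) for u in path)
        return y + tail / ((1.0 - alpha) * n)

    # h is convex and piecewise linear in y, so a minimum is attained
    # at one of its breakpoints (the observed drawdown values).
    return min(h(y) for y in {u for path in dd_paths for u in path})


# Hypothetical toy data: 2 sample paths, 4 periods, 2 assets.
scenarios = [
    [[0.20, 0.00], [-0.10, 0.00], [-0.04, 0.00], [0.08, 0.00]],
    [[0.00, 0.04], [0.00, 0.04], [0.00, -0.02], [0.00, 0.02]],
]
probs = [0.5, 0.5]
weights = [0.5, 0.5]

paths = [cumulative_returns(s, weights) for s in scenarios]
dd = [drawdowns(p) for p in paths]

reward = sum(p * path[-1] for p, path in zip(probs, paths))  # objective value
max_dd = max(u for d in dd for u in d)                       # MaxDD
av_dd = cdd(dd, probs, 0.0)                                  # AvDD (alpha = 0)
cdd_half = cdd(dd, probs, 0.5)                               # CDD at alpha = 0.5

print(reward, max_dd, av_dd, cdd_half, reward / max_dd)
```

The last printed number is the ratio of reward to risk for this particular weight vector when the risk is measured by the maximum drawdown. An LP solver would be needed to search over the weights; this sketch only evaluates one candidate.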
□

3.7 Drawdown Measure in Real-life Portfolio Optimization

3.7.1 Static Asset Allocation

This section formulates and solves a real-life portfolio optimization problem with a static set of weights using the drawdown measure. The problem of dynamic weight allocation under a constraint on the worst equity drawdown, when an asset (or a set of assets) is log-Brownian, has been considered in several papers. First, the one-dimensional case was solved by Grossman and Zhou [27] as a mathematical programming problem. Then, the problem was generalized to the multidimensional case by Cvitanic and Karatzas [28].

In contrast to Grossman and Zhou [27] and Cvitanic and Karatzas [28], we are interested in a constant set of weights that optimizes a certain portfolio of assets, which are not assumed to have log-Brownian dynamics. This problem is motivated by several important practical financial applications, particularly those related to the so-called hedge-fund business.

A Commodity Trading Advisor (CTA) company is a hedge fund that normally trades several (sometimes more than 100) futures markets simultaneously using mathematical strategies that it believes have a certain edge. Such companies manage a substantial part of all hedge fund assets, by some estimates close to $100 billion. Most of the CTA community trades the so-called long-term trend-following systems, but there are now multiple examples of short-term mean-reverting trading systems as well. These systems may be viewed as functions of the individual futures market prices realized prior to the present time. These strategies normally have a substantial smoothing-out effect on the futures prices and have close to stationary properties. Every CTA, then, has to allocate a certain portion of the overall risk (or of the overall capital that it manages) to each "market". Due to the substantial level of stationarity of the strategies, each CTA calculates the weights according to a certain internal proprietary weight allocation procedure.
Normally, this set remains fixed and does not change unless a certain market is added to or removed from the set, which normally happens when a new system is introduced, when a certain market disappears (like the Deutsche Mark or the French Franc in 1999), or when a new market is added. A standard practice in the CTA community is to use some version of the classical Markowitz mean-variance approach.

Another important example of static asset allocation comes from the so-called Fund of Funds (FoF) business. In recent years this sector of hedge funds has experienced substantial growth. A typical FoF manager allocates its clients' capital to a set of preselected managers, normally between 5 and 25. It does so fairly infrequently, partly because of liquidity constraints imposed by the managers themselves, but this is not the only reason. A FoF views equity return streams as fairly stationary time series with attractive return, risk, and correlation properties, which need some time to present themselves. Unless some unexpected event happens, the allocations are given for a substantial period of time, on average 2 years or more. A group of analysts in a typical FoF is responsible for finding a constant set of weights that makes the total portfolio of the FoF attractive to its clients.

Both of these typical cases face the problem of finding a constant set of weights that optimizes the portfolio in a certain sense. The practical goal of this chapter is to facilitate this process with a clear and statistically sound algorithm, which utilizes a newly designed set of risk measures based on the notion of an equity drawdown.

Despite its known potential drawbacks, it is a well-accepted and, moreover, recommended practice [36] to study the historical backtested strategy results of a hedge fund and, based on these results, to obtain an estimate of the inherent risk using some risk measures. The only popular quantitative risk measure is VaR [36].
Various insufficiencies of the VaR measure are also widely known. We believe that the results developed in our study would facilitate understanding of how this can be achieved.

3.7.2 Historical Data and Scenario Generation

Even though scientists and engineers have used certain simple versions of resampling procedures since the 1930s, it was B. Efron [37] who unified the disconnected ideas, and resampling emerged as a robust method of estimating confidence intervals of measurable functions over a statistical sample of data. The method is particularly useful for time series, where obtaining other realizations of the data may be difficult or even impossible. The bootstrap is a form of resampling of the original data set that "resamples with replacement." Sometimes, the simplest version of it is called the "nonparametric bootstrap." The method was originally applied to some sociological and biological applications, staying in the shade for statistical, engineering, and financial applications up until the 1990s. Due to their intrinsic "one realization only" nature, financial time series could be one of the best applications for resampling methods.

Within financial applications, a particularly strong interest in obtaining estimates of certain measurable quantities (such as the rate of return or the standard deviation) comes from the development of trading systems. It is well known that actually using overfitted trading systems can lead to substantial financial losses. Therefore, it is hard to overestimate the importance of discovering how overfitted a particular trading system is. Among a few examples, one can mention a single-asset trading system, for example, a system which trades a back-adjusted continuous 10-year U.S. Government Note futures contract, or a more general portfolio optimization problem, such as the allocation of weights between several assets in a portfolio subject to certain constraints.
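As a minimal illustration of the "resampling with replacement" idea described above, the Python sketch below draws nonparametric bootstrap resamples of a return series and reads off a crude percentile interval for the mean rate of return. The return values, the function names, and the simple index-based percentile rule are assumptions made for the example. It also treats the observations as exchangeable, which ignores any serial dependence in a real time series.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def bootstrap_means(sample: Sequence[float], n_resamples: int,
                    seed: int = 0) -> List[float]:
    """Mean of each bootstrap resample (drawn with replacement, same length)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values: Sequence[float],
                        level: float = 0.90) -> Tuple[float, float]:
    """Crude percentile interval: order statistics at the two tail positions."""
    ordered = sorted(values)
    lo_idx = int((1.0 - level) / 2.0 * (len(ordered) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(ordered) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Hypothetical monthly rates of return of one trading system.
returns = [0.021, -0.013, 0.008, 0.034, -0.027, 0.015,
           0.002, -0.009, 0.041, -0.018, 0.011, 0.006]

boot = bootstrap_means(returns, n_resamples=1000, seed=42)
low, high = percentile_interval(boot, level=0.90)
print("sample mean:", mean(returns))
print("90% bootstrap interval for the mean:", (low, high))
```

The same resampling loop can be applied to any other measurable quantity of the series (for example, the standard deviation or a drawdown functional) by replacing `mean` with the statistic of interest.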
Our study considers a particular example of an optimal portfolio allocation problem. This example could be very relevant for global CTA managers, who apply certain trading systems (very frequently, long-term trend-following systems) across a wide set of global futures markets, attempting to take advantage of price movements occurring in these markets. Normally, after they are content with their trading system, they have to decide how to allocate their portfolio risk between the various markets.

In this example, we are given a set of sample paths of a certain futures trading system (in this particular case, a long-term trend-following system) as applied to a set of 32 different global futures markets. The system includes long, short, or flat markets, and always trades the same number of contracts, with an average trade length of one to two months. Here is a list of the markets, with their corresponding exchanges, that were traded by the system. Ticker symbols of FutureSource are used for their abbreviation. In alphabetical order of ticker symbol:

1. AAO - The Australian All Ordinaries Index (OTC);
2. AD - Australian Dollar Currency Futures (CME);
3. AXB - Australian 10-Year Bond Futures (SFE);
4. BD - U.S. Long (30-Year) Treasury Bond Futures (CBT);
5. BP - British Pound Sterling Currency Futures (CME);
6. CD - Canadian Dollar Currency Futures (CME);
7. CP - Copper Futures (COMEX);
8. DGB - German 10-Year Bond (Bund) Futures (LIFFE);
9. DX - U.S. Dollar Index Currency Futures (FNX);
10. ED - 90-Day Euro Dollar Futures (CME);
11. EU - Euro Currency Futures (CME);
12. FV - U.S. 5-Year Treasury Note Futures (CBT);
13. FXADJY - Australian Dollar vs. Japanese Yen Cross Currency Forward (OTC);
14. FXBPJY - British Pound Sterling vs. Japanese Yen Cross Currency Forward (OTC);