Transceiver design for MIMO communications : a channel decomposition perspective


Material Information

Title:
Transceiver design for MIMO communications : a channel decomposition perspective
Physical Description:
x, 105 leaves : ill. ; 29 cm.
Language:
English
Creator:
Jiang, Yi
Publication Date:

Subjects

Subjects / Keywords:
Electrical and Computer Engineering thesis, Ph. D
Dissertations, Academic -- Electrical and Computer Engineering -- UF
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 2005.
Bibliography:
Includes bibliographical references.
Additional Physical Form:
Also available online.
General Note:
Printout.
General Note:
Vita.
Statement of Responsibility:
by Yi Jiang.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 027715281
oclc - 847496253
System ID:
AA00025770:00001

Full Text











TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
A CHANNEL DECOMPOSITION PERSPECTIVE
















By
YI JIANG
















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2005



































Copyright by Yi Jiang, 2005

All rights reserved.















ACKNOWLEDGMENTS
Foremost, I thank my advisor, Professor Jian Li, for her support, encouragement, and guidance over the past four years. Dr. Li provided me the invaluable opportunity to investigate these fascinating research problems and always showed full confidence in me. I just hope I can live up to her expectations in my future career. I am very grateful to my collaborator, Professor William W. Hager, whose suggestions and rigorous mathematics have benefited me a lot. I thank Dr. Tan F. Wong for teaching me information theory. Some of the basic ideas of this dissertation were formulated when I was taking his class in the fall of 2003. I would like to thank Dr. John M. Shea, Dr. Tan F. Wong, Dr. Kenneth K. O, and Dr. William W. Hager for serving on my dissertation committee. Thanks go to all my friends, both at the University of Florida and elsewhere, who made the last four years full of fun.
This dissertation is dedicated to my parents and my fiance Hongying.
















TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
1.1 Two Categories of Schemes for MIMO Communications
1.2 Joint Transceiver Design: Where Tx and Rx Collaborate
1.3 MIMO Transceiver Design from Channel Decomposition Perspective
1.4 Dissertation Outline
2 LINEAR MIMO TRANSCEIVER DESIGNS
2.1 Channel Model and Channel Capacity
2.1.1 Channel Model
2.1.2 Channel Capacity
2.2 Channel Capacity and Cramér-Rao Bound
2.3 Rate Performance of Linear Transceivers
3 MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOMPOSITION
3.1 VBLAST and ZF-DP
3.1.1 VBLAST
3.1.2 ZF-DP
3.2 Geometric Mean Decomposition for MIMO Transceiver Design
3.3 Performance Analyses and Implementation Issues
3.3.1 Performance Analyses
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking
3.3.3 Subchannel Selection
3.3.4 Further Remarks
3.4 Performance Examples
3.5 Conclusions
4 UNIFORM CHANNEL DECOMPOSITION
4.1 Closed-Form Representation of MMSE-VBLAST
4.2 UCD-VBLAST
4.3 UCD-DP
4.4 Performance Analysis
4.4.1 Diversity Gain Analysis
4.4.2 Further Remarks
4.5 Numerical Examples
4.6 Conclusions
5 TUNABLE CHANNEL DECOMPOSITION
5.1 Introduction
5.2 Channel Model and Preliminaries
5.2.1 Channel Model
5.2.2 Channel Decomposition
5.2.3 Majorization and Generalized Triangular Decomposition
5.3 Tunable Channel Decomposition
5.3.1 TCD-VBLAST
5.3.2 TCD-DP
5.4 MIMO Communications with QoS Constraints
5.5 CDMA Sequence Design
5.5.1 CDMA Sequences Maximizing Sum Capacity
5.5.2 Uplink Case
5.5.3 Downlink Case
5.5.4 Numerical Example
5.5.5 Further Remarks
5.6 Conclusions
6 NOVEL MATRIX DECOMPOSITIONS
6.1 Introduction
6.2 Geometric Mean Decomposition
6.2.1 Generalized Maximin Properties
6.2.2 Implementation Based on Initial SVD
6.3 Generalized Triangular Decomposition
6.3.1 Existence of GTD
6.3.2 The GTD Algorithm
6.3.3 Inverse Eigenvalue Problem
6.4 Conclusions
7 CONCLUSIONS
REFERENCES
BIOGRAPHICAL SKETCH
































LIST OF TABLES
Table
4-1 The UCD-VBLAST scheme
5-1 The TCD-VBLAST scheme
6-1 Comparison of SVDEIG and GTD for inverse eigenvalue problems (CPU time in seconds, singular value and eigenvalue errors in sup-norm)



























































LIST OF FIGURES
Figure
3-1 Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and Mr = 4 for i.i.d. Rayleigh flat fading channels
3-2 Complementary cumulative distribution functions of the capacities of 5 subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results based on 2000 Monte Carlo trials
3-3 Complementary cumulative distribution function of the capacity of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 1000 Monte Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB
3-4 BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh flat fading channel vs. SNR with (a) Mt = 2 and Mr = 4 and (b) Mt = 4 and Mr = 4
3-5 BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined with OFDM for ISI suppression
4-1 Complementary cumulative distribution function of the capacity of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 2000 Monte Carlo trials. SNR = (a) 10 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB
4-2 Complementary cumulative distribution functions of the capacities of 5 subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results based on 2000 Monte Carlo trials
4-3 Uncoded BER performance when using 16-QAM. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 4 and Mr = 4
4-4 BER performances of the UCD-DP, UCD-VBLAST schemes and the imaginary UCD-genie scheme. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10
5-1 Illustration of the capacity lossless region obtainable via TCD. We assume K = 3, C1 = 3, C2 = 2, and C3 = 1
5-2 A Matlab function to solve (5.49)
5-3 Input SNR vs. output SINR. The result is based on the average of 500 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 6
5-4 Input SNR vs. C1. A rank 2 channel is decomposed into two subchannels with capacities C1 and C2 = 10 C1
6-1 The operation displayed in (6.8)
6-2 The operation displayed in (6.15)








































































Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
A CHANNEL DECOMPOSITION PERSPECTIVE
By
Yi Jiang
May 2005
Chair: Jian Li
Major Department: Electrical and Computer Engineering
This dissertation studies the signal processing aspect of multi-input multi-output (MIMO) communications. The contribution of this dissertation is twofold.
First, this dissertation presents a new perspective on MIMO communications: any MIMO scheme can be regarded as a MIMO channel decomposer, which decomposes (in an information lossy or lossless manner) a MIMO channel into multiple scalar subchannels. Based on this perspective, this dissertation presents three novel MIMO transceiver designs: the geometric mean decomposition (GMD) scheme, the uniform channel decomposition (UCD) scheme, and the tunable channel decomposition (TCD) scheme. All these schemes deploy either a decision feedback equalizer (DFE) at the receiver or a dirty paper precoder (DPP) at the transmitter. These transceiver designs represent a paradigm shift from the conventional linear MIMO transceiver designs to nonlinear ones. The superior performance of the GMD and UCD schemes unveils the practical significance of making the transmitter and receiver cooperate with each other. That is, such cooperation facilitates achieving the optimal tradeoff between the diversity gain and the multiplexing gain promised by MIMO communication theory. The TCD scheme represents a unifying solution to a considerably wide range of problems, including designing the precoder for orthogonal frequency division multiplexing (OFDM) communications and the optimal code division multiple access (CDMA) sequence design.
Second, this dissertation introduces two novel matrix decomposition algorithms, i.e., the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). The two matrix decompositions form the cornerstones of the three transceiver designs proposed in this dissertation. Moreover, the two decompositions have significant implications for the matrix analysis community. For instance, the GTD is a new solution to the inverse eigenvalue problem.




































































CHAPTER 1
INTRODUCTION
1.1 Two Categories of Schemes for MIMO Communications
Communications over multiple-input multiple-output (MIMO) wireless channels have been a subject of intense research over the past several years because deploying multiple antennas at both transmitter and receiver sides can drastically improve the spectral efficiency [1] [2] [3] [4]. For example, in contrast to the conventional additive white Gaussian noise (AWGN) channel whose spectral efficiency is

$$C(\mathrm{snr}) = \log_2(1 + \mathrm{snr}) \ \text{bps/Hz},$$

without requiring additional input power, the MIMO channel with $M_t$ transmitting antennas and $M_r$ receiving antennas can have spectral efficiency as large as [1] [2]

$$C(\mathrm{snr}) = \min(M_t, M_r) \log_2(\mathrm{snr}) + O(1) \ \text{bps/Hz},$$

given that there is plenty of scattering in the channel. Many spatial multiplexing methods, e.g., the BLAST scheme [2] [5] [6] [7] [8] [9] [10] [11], have been proposed to reap the great channel capacity.
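As a rough numerical illustration of the two spectral-efficiency expressions above, the following sketch (not part of the original dissertation) draws i.i.d. Rayleigh channels and averages the capacity obtained with equal power on all transmit antennas. The function names, the seed, and the trial counts are illustrative assumptions; the point is only that the high-SNR slope per factor-of-ten increase in SNR approaches $\min(M_t, M_r) \log_2 10$.

```python
import numpy as np

def awgn_capacity(snr):
    """Spectral efficiency of the scalar AWGN channel in bps/Hz."""
    return np.log2(1.0 + snr)

def mimo_capacity_equal_power(H, snr):
    """Capacity in bps/Hz of channel H with power split evenly over the transmit antennas."""
    Mr, Mt = H.shape
    G = np.eye(Mr) + (snr / Mt) * (H @ H.conj().T)
    _, logabsdet = np.linalg.slogdet(G)      # natural log of |det G|
    return logabsdet / np.log(2.0)

def average_mimo_capacity(Mt, Mr, snr, trials=500, seed=0):
    """Monte Carlo average over i.i.d. Rayleigh flat fading channels (unit-variance entries)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((Mr, Mt))
             + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2.0)
        total += mimo_capacity_equal_power(H, snr)
    return total / trials

if __name__ == "__main__":
    for snr_db in (10, 20, 30):
        snr = 10.0 ** (snr_db / 10.0)
        print(snr_db, awgn_capacity(snr), average_mimo_capacity(4, 4, snr))
```

Because the same seed is reused for every SNR value, the capacities at different SNRs are computed on the same channel draws, which makes the slope estimate less noisy.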
Improving the data transmission reliability is another advantage of applying multiple antennas in wireless communications. By transmitting the same information through more than one independent fading channel, one can obtain much more reliable communications thanks to the redundancy introduced. The space-time coding methods are based on such a rationale (see, e.g., [12] [13] [14] [15]).
Zheng and Tse [16] show that one can exploit the diversity gain and multiplexing gain promised by the MIMO channel simultaneously. However, there is a fundamental tradeoff between the two gains. Zheng and Tse's theory provides a unifying framework to measure the performance of any MIMO scheme. Hence designing practical schemes capable of achieving the optimal diversity-multiplexing tradeoff is a central research topic in MIMO communications.










1.2 Joint Transceiver Design: Where Tx and Rx Collaborate
All the aforementioned methods assume that the channel state information (CSI) is available at the receiver (CSIR) only. Under this assumption, collaborations between the transmitter and receiver are difficult in the physical layer. However, if the communication environment is relatively stationary, the availability of CSI at the transmitter (CSIT) is also possible via feedback or the reciprocal principle when time division duplex (TDD) is used. In fact, in the third generation WCDMA standard [17], the CSIT is assumed to obtain improved system performance, which is referred to as the closed-loop transmit diversity or transmit adaptive array (TxAA) technique. Based on this assumption, the joint optimal transceiver design (also referred to as precoding at the transmitter and equalization at the receiver) has recently attracted considerable attention [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29].
These designs are based on a variety of criteria, including minimum mean-squared-error (MMSE) [18] [21] [22], maximum SNR [21], maximum information rate [19] [20] [22], and BER based criteria [23] [24] [25] [29]. More recently, a unified framework has been presented to accommodate all these criteria, under which the design problems can be solved via convex optimization methods [26].
The aforementioned literature on joint transceiver design considered linear transformations only. It is widely understood that the singular value decomposition (SVD), which decomposes a MIMO channel into multiple parallel subchannels, and water filling can be used to achieve the channel capacity [3]. However, because the signal-to-noise ratios (SNRs) of the subchannels usually differ widely, this apparently simple scheme requires careful bit allocation (see, e.g., [19] [20] [23]) to match the subchannel capacity and achieve a prescribed BER. Bit allocation not only increases the coding/decoding complexity, but also is inherently capacity lossy because of the finite constellation granularity. An alternative is to use the same constellation in all the subchannels, like the schemes adopted by the European standard HIPERLAN/2 and the IEEE 802.11 standards for wireless local area networks (WLANs). However, for this alternative, the BER is dominated by the subchannels with the lowest SNRs. To optimize the BER performance, more signal power could be allocated to the poorer subchannels. Yet this approach causes significant capacity loss due to the "inverse water filling" like power allocation. There is apparently a fundamental tradeoff between the capacity and the BER performance.










1.3 MIMO Transceiver Design from Channel Decomposition Perspective
In this dissertation, we present a new perspective on MIMO communications. We regard the aforementioned MIMO schemes as MIMO channel decomposers, which decompose (in an information lossy or lossless manner) a MIMO channel into multiple scalar subchannels. For instance, the MIMO transceiver design based on SVD decomposes a MIMO channel into multiple eigen-subchannels. Similarly, the V-BLAST scheme decomposes a MIMO channel into multiple scalar subchannels, which are referred to as layers by its inventors. These channel decompositions, however, are totally determined by the specific channel realization, and one has little control over how the channel is decomposed. For example, the gains of the subchannels obtained via SVD are totally determined by the singular values of the channel matrix, over which one has no control.
An interesting question arises: if the transmitter and receiver are allowed to collaborate, how can we design a transceiver that can decompose a MIMO channel into multiple subchannels with prescribed channel gain, and without incurring capacity loss? This dissertation is devoted to answering this question. In the process of pursuing the answer, we investigate the following aspects of the problem.
First, we show that the conventional linear transceivers are inherently inflexible, and we cannot rely on linear transceivers to achieve our desired channel decompositions. Hence we need to go beyond the linearity constraint and investigate the nonlinear schemes, such as a decision feedback equalizer (DFE) and a dirty paper precoder (DPP).
Second, we study the possibility of new matrix decompositions other than the SVD. We propose two novel matrix decomposition algorithms, the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). The two decompositions represent a wide class of matrix decompositions, which has significant implications for the matrix analysis community. For instance, the GTD is a new solution to the inverse eigenvalue problem.
Third, we propose three transceiver designs which combine the new matrix decomposition algorithms with the DFE and DPP. The three designs are the GMD scheme, the uniform channel decomposition (UCD) scheme and the tunable channel decomposition (TCD) scheme. Among them, the UCD scheme can decompose, in a strictly capacity lossless manner, a MIMO channel into multiple subchannels with identical capacities or, equivalently, identical channel gains. Moreover, the UCD scheme is a practical scheme that can achieve the optimal tradeoff between the diversity gain and multiplexing gain. Without incurring any capacity loss, the TCD scheme can decompose a MIMO channel into multiple subchannels with prescribed capacities/channel gains. This scheme is applicable to a wide range of applications, including multi-task communications, where independent data streams with different qualities-of-service (QoS) share the same MIMO channel, and the design of optimal CDMA sequences.
1.4 Dissertation Outline
In Chapter 2, we introduce the data model and some relevant information-theoretic results that will be used in this dissertation. We also review the existing transceiver designs and analyze the performances of those methods. By linking the channel capacity with the Cramér-Rao bound (CRB), we give an information-theoretic explanation of why linear transceivers are inflexible.
Chapter 3 presents the GMD scheme, which combines the VBLAST detector or DP precoder with the GMD matrix decomposition algorithm. The GMD scheme can decompose a MIMO channel into multiple identical scalar subchannels. This desirable property can bring much convenience to the practical system design, particularly the symbol constellation selection. Moreover, we show that the GMD scheme is asymptotically optimal at high SNR in terms of both information rate and BER performance, while its computational complexity is comparable to that of the conventional linear transceiver scheme.
In Chapter 4, we propose a uniform channel decomposition (UCD) scheme. Similar to the GMD scheme, the UCD is also based on the GMD matrix decomposition algorithm and can decompose a MIMO channel into multiple identical subchannels. Two remarkable merits of UCD, which are not shared by the GMD scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD can achieve the optimal diversity and multiplexing tradeoff. Moreover, the UCD scheme can decompose a MIMO channel into an arbitrarily large number of independent subchannels, which is an enabling technology to achieve high data rate transmission using small symbol constellations.
Chapter 5 is devoted to tackling a new aspect of the MIMO transceiver design problem. Instead of attempting to optimize the BER performance for fixed input power and data rate, we propose the TCD scheme, which can decompose a MIMO channel into multiple subchannels with prescribed channel capacities. We show that TCD is a solution to a wide range of applications, including those in which independent data streams with different qualities-of-service (QoS) share the same MIMO channel, and the design of optimal CDMA sequences.
The mathematical foundations of this dissertation, the GMD and GTD algorithms, are established in Chapter 6. The two novel matrix decomposition algorithms are the cornerstones of the MIMO transceiver designs proposed in this dissertation.
The conclusions are given in Chapter 7.
To read this dissertation, it is unnecessary to plunge into the details of the GMD and GTD algorithms. For this reason, we place them in the latter part of the dissertation. However, a rough understanding of the two algorithms is necessary to appreciate Chapters 3-5.















CHAPTER 2
LINEAR MIMO TRANSCEIVER DESIGNS
2.1 Channel Model and Channel Capacity
2.1.1 Channel Model
We consider a communication system with $M_t$ transmitting and $M_r$ receiving antennas in a frequency flat fading channel. The sampled baseband signal is given by

$$y = HFx + z, \tag{2.1}$$

where $x \in \mathbb{C}^{L \times 1}$ is the information symbols precoded by the precoder $F \in \mathbb{C}^{M_t \times L}$, $y \in \mathbb{C}^{M_r \times 1}$ is the received signal, and $H \in \mathbb{C}^{M_r \times M_t}$ is the channel matrix with rank $K$. We assume $E[xx^*] = \sigma_x^2 I_L$ and $z \sim N(0, \sigma_z^2 I_{M_r})$ is the circularly symmetric complex Gaussian noise, where $I_L$ stands for an identity matrix with dimension $L$. We define the input SNR as

$$\rho = \frac{E[x^*F^*Fx]}{\sigma_z^2} = \frac{\sigma_x^2}{\sigma_z^2} \mathrm{Tr}\{F^*F\} \triangleq \frac{1}{\alpha} \mathrm{Tr}\{F^*F\}, \tag{2.2}$$

where $\alpha = \sigma_z^2 / \sigma_x^2$. Designing the MIMO transceivers, including the precoder $F$ and the associated equalizer, is the focus of this dissertation.
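The data model (2.1) and the input SNR definition (2.2) can be made concrete with a small simulation. The sketch below is an illustration added here, not code from the dissertation; the helper names, the dimensions, and the variances are hypothetical choices. It draws symbols and noise with the stated covariances and compares the empirical transmit power over $\sigma_z^2$ with $\mathrm{Tr}\{F^*F\}/\alpha$.

```python
import numpy as np

def complex_gaussian(shape, variance, rng):
    """Circularly symmetric complex Gaussian samples with the given per-entry variance."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def input_snr(F, sigma_x2, sigma_z2):
    """Input SNR of (2.2): rho = Tr{F* F} / alpha with alpha = sigma_z^2 / sigma_x^2."""
    alpha = sigma_z2 / sigma_x2
    return float(np.real(np.trace(F.conj().T @ F)) / alpha)

def simulate(H, F, sigma_x2, sigma_z2, num_symbols, rng):
    """Return (x, y) holding one realization of y = H F x + z per column."""
    x = complex_gaussian((F.shape[1], num_symbols), sigma_x2, rng)
    z = complex_gaussian((H.shape[0], num_symbols), sigma_z2, rng)
    return x, H @ F @ x + z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Mt, Mr, L = 4, 4, 3                      # hypothetical dimensions
    H = complex_gaussian((Mr, Mt), 1.0, rng)
    F = complex_gaussian((Mt, L), 1.0, rng)  # arbitrary precoder, for illustration only
    x, y = simulate(H, F, 1.0, 0.1, 10000, rng)
    empirical = np.mean(np.sum(np.abs(F @ x) ** 2, axis=0)) / 0.1
    print(input_snr(F, 1.0, 0.1), empirical)
```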
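We note that the data model in (2.1) is generic. For an intersymbol-interference (ISI) channel with impulse response $h = [h_0, h_1, \ldots, h_{M-1}]^T$, with $(\cdot)^T$ denoting transpose, if a data block of length $N$ is transmitted using the "zero-padded" OFDM, then the received block data can also be written in the form of (2.1) with

$$H = \begin{bmatrix}
h_0 & 0 & \cdots & 0 \\
h_1 & h_0 & \ddots & \vdots \\
\vdots & h_1 & \ddots & 0 \\
h_{M-1} & \vdots & \ddots & h_0 \\
0 & h_{M-1} & & h_1 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_{M-1}
\end{bmatrix}. \tag{2.3}$$

In this case, $H$ is a Toeplitz matrix with dimensions $M_t = N$ and $M_r = N + M - 1$. If the OFDM with cyclic prefix is used, the channel matrix is a circulant Toeplitz matrix,

$$H = \begin{bmatrix}
h_0 & 0 & \cdots & 0 & h_{M-1} & \cdots & h_1 \\
h_1 & h_0 & 0 & \cdots & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & & \ddots & h_{M-1} \\
h_{M-1} & \cdots & h_1 & h_0 & 0 & \cdots & 0 \\
0 & \ddots & & \ddots & \ddots & \ddots & \vdots \\
\vdots & \ddots & h_{M-1} & \cdots & h_1 & h_0 & 0 \\
0 & \cdots & 0 & h_{M-1} & \cdots & h_1 & h_0
\end{bmatrix}. \tag{2.4}$$

Here, $M_t = M_r = N$. In either case, if the block data are precoded with the linear precoder $F$, then the received data are given in (2.1). This ISI channel problem has been studied in [21] [30].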
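The two channel matrices (2.3) and (2.4) can be built directly from the impulse response. The following sketch is an added illustration under the assumption $N \ge M$; the function names are hypothetical. Multiplying by the zero-padded matrix reproduces linear convolution with $h$, and multiplying by the circulant matrix reproduces circular convolution.

```python
import numpy as np

def zero_padded_channel(h, N):
    """(N + M - 1) x N Toeplitz matrix of (2.3): column j holds h shifted down by j rows."""
    h = np.asarray(h)
    M = h.size
    H = np.zeros((N + M - 1, N), dtype=h.dtype)
    for j in range(N):
        H[j:j + M, j] = h
    return H

def cyclic_prefix_channel(h, N):
    """N x N circulant matrix of (2.4): column j is the zero-extended h cyclically shifted by j."""
    h = np.asarray(h)
    M = h.size
    if N < M:
        raise ValueError("block length N must be at least the channel length M")
    first_col = np.concatenate([h, np.zeros(N - M, dtype=h.dtype)])
    return np.column_stack([np.roll(first_col, j) for j in range(N)])

if __name__ == "__main__":
    h = np.array([1.0, 0.5, -0.25])          # hypothetical 3-tap channel
    print(zero_padded_channel(h, 5))
    print(cyclic_prefix_channel(h, 5))
```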
In an idealized synchronous CDMA (S-CDMA) system where the channel does not experience any fading or near-far effect, $L$ mobile users modulate their information symbols via spreading sequences $\{s_l\}_{l=1}^{L}$, each of which has the processing gain $N$. The discrete-time baseband S-CDMA signal received at the (single-antenna) base station can be represented as [31]

$$y = Sx + z, \tag{2.5}$$

where $S = [s_1, \ldots, s_L] \in \mathbb{R}^{N \times L}$ and the $l$th ($1 \le l \le L$) entry of $x$, $x_l$, stands for the information symbol from the $l$th user. In the downlink channel, the base station multiplexes the information dedicated to the $L$ mobile users through the spreading sequences, which are the columns of $S$. Then all the mobiles receive the same signal given in (2.5). We remark that (2.5) can also be written as (2.1) with $H = I_N$ and $F = S$. Here $M_r = M_t = N$ is the processing gain. Hence, optimizing the spreading sequences amounts to optimizing the precoder $F$ for a MIMO system. Indeed, this problem has been under intensive research in the past several years.
In summary, both designing a precoder for OFDM transmission through an ISI channel and searching for the optimal S-CDMA sequences can be regarded as special cases in the unifying framework of MIMO transceiver designs. MIMO transceiver designs can be used in the OFDM and CDMA applications after only simple modifications. In this dissertation, we will concentrate on MIMO transceiver design although we will discuss the optimal design of CDMA sequences in Chapter 5.








2.1.2 Channel Capacity
Suppose $x$ is a Gaussian random vector. The capacity of the MIMO channel (2.1) is$^{1}$

$$C = \log_2 \left| I + \alpha^{-1} HFF^*H^* \right|, \tag{2.6}$$

where $|\cdot|$ denotes the determinant of a matrix. If both CSIT and CSIR are available, we can maximize the channel capacity with respect to $F$ given the input power constraint $\sigma_x^2 \mathrm{Tr}\{FF^*\} = \rho\sigma_z^2$. That is,

$$C_{IT} = \max_{\sigma_x^2 \mathrm{Tr}\{FF^*\} = \rho\sigma_z^2} \log_2 \left| I + \alpha^{-1} HFF^*H^* \right|, \tag{2.7}$$

where $\alpha$ is as defined in (2.2) and the subscript of $C_{IT}$ stands for "informed transmitter".
Denote the SVD of $H$ as $H = U\Lambda V^*$, where $\Lambda$ is a $K \times K$ diagonal matrix whose diagonal elements $\{\lambda_{H,k}\}_{k=1}^{K}$ are the nonzero singular values of $H$. The solution to $F$ in (2.7) is [3]

$$F = V\Phi^{1/2}. \tag{2.8}$$

Here $\Phi$ is diagonal, and its $k$th ($1 \le k \le K$) diagonal element $\phi_k$ determines the power loaded to the $k$th subchannel and is found via "water filling" to be

$$\phi_k(\mu) = \left( \mu - \frac{\alpha}{\lambda_{H,k}^2} \right)^+, \tag{2.9}$$

with $\mu$ being chosen such that $\sigma_x^2 \sum_{k=1}^{K} \phi_k(\mu) = \rho\sigma_z^2$ and $(a)^+ = \max\{0, a\}$. Then the solution to (2.7) is

$$C_{IT} = \sum_{k=1}^{K} \log_2 \left( 1 + \alpha^{-1} \phi_k \lambda_{H,k}^2 \right) \ \text{bps/Hz}. \tag{2.10}$$

Note that some of the $\phi_k$'s can be zero. In this case, we can only transmit $L < K$ data streams.
If the CSIT is not available, the optimal transmission strategy is to evenly allocate power to each antenna [3]. For this case, $F = \sqrt{\rho\alpha / M_t} \, I_{M_t}$ and the channel capacity with uninformed transmitter (UT) is

$$C_{UT} = \sum_{n=1}^{K} \log_2 \left( 1 + \frac{\rho}{M_t} \lambda_{H,n}^2 \right) \ \text{bps/Hz}. \tag{2.11}$$


$^{1}$ Throughout this dissertation, we assume that the coherence time of the channel goes to infinity. Hence advanced coding is applicable to approach the Shannon capacity.










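It is proven [32] that if $K$ = [value illegible in the source],

$$\frac{C_{IT}}{C_{UT}} \to 1 \quad \text{as } \rho \to \infty. \tag{2.12}$$

We claim a stronger relationship as follows.$^{2}$
Lemma 2.1.1 For the data model in (2.1), if the channel matrix $H$ is of full column rank, i.e., $K = M_t$, then

$$C_{IT} - C_{UT} \to 0 \quad \text{as } \rho \to \infty. \tag{2.13}$$

Proof: Inserting (2.9) into (2.10) yields

$$C_{IT} = \sum_{n=1}^{K} \left( \log_2 \frac{\mu \lambda_{H,n}^2}{\alpha} \right)^+, \tag{2.14}$$

where $\mu$ is chosen such that

$$K\mu - \sum_{n=1}^{K} \frac{\alpha}{\lambda_{H,n}^2} = \rho\alpha, \tag{2.15}$$

or

$$\mu = \frac{\alpha}{K} \left( \rho + \sum_{n=1}^{K} \frac{1}{\lambda_{H,n}^2} \right). \tag{2.16}$$

Here we assume that all the $K$ subchannels are used because of large $\rho$, i.e., $\phi_n > 0$ for $n = 1, 2, \ldots, K$.
From (2.14), (2.16), and (2.11), and using the fact that $K = M_t$, we have

$$C_{IT} - C_{UT} = \sum_{n=1}^{K} \log_2 \frac{\rho + \sum_{i=1}^{K} \lambda_{H,i}^{-2}}{\rho + K\lambda_{H,n}^{-2}}. \tag{2.17}$$

Note that

$$\lim_{\rho \to \infty} \frac{\rho + \sum_{i=1}^{K} \lambda_{H,i}^{-2}}{\rho + K\lambda_{H,n}^{-2}} = 1 \quad \text{for } 1 \le n \le K, \tag{2.18}$$

and that $f(x) = \log_2 x$ is a continuous function if $x > 0$. The lemma follows immediately from (2.17). □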
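The closed form (2.17) makes the lemma easy to check numerically. The sketch below is an added illustration with hypothetical names and singular values; it evaluates the gap $C_{IT} - C_{UT}$ for a full-column-rank channel and refuses SNR values so low that some subchannel would be switched off, since (2.17) assumes all $K$ subchannels are active.

```python
import numpy as np

def capacity_gap(sing_vals, rho):
    """C_IT - C_UT from (2.17), valid when K = Mt and all K subchannels are active."""
    inv = 1.0 / np.asarray(sing_vals, dtype=float) ** 2
    K = inv.size
    # All subchannels are active only if the water level exceeds the largest floor.
    if rho + inv.sum() <= K * inv.max():
        raise ValueError("rho too small: not all subchannels are active")
    return float(np.sum(np.log2((rho + inv.sum()) / (rho + K * inv))))

if __name__ == "__main__":
    lam = [2.0, 1.0, 0.5]                     # hypothetical singular values
    for rho in (10.0, 100.0, 1e4):
        print(rho, capacity_gap(lam, rho))
```

The gap is nonnegative by Jensen's inequality and shrinks roughly like $1/\rho^2$ for these values, consistent with the limit in (2.13).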
However, we note that CSIT can be very helpful in the following cases:
A. The SNR is low or moderate.
B. $H$ is rank deficient or ill-conditioned.
C. There are more transmitting antennas than receiving ones, i.e., $M_t > M_r$.
Moreover, the availability of CSIT provides more freedom, which makes it easier to devise joint transceiver design schemes to achieve the underlying channel capacity. This observation is the underlying theme of this dissertation.

$^{2}$ A similar, but somewhat vague, statement is found in [8].
2.2 Channel Capacity and Cramér-Rao Bound
One of the most significant aspects of Shannon's information theory is that it predicts the highest achievable data rate for a given channel. Similarly, the Cramér-Rao bound (CRB) [33], which is the inverse of the Fisher information matrix (FIM), predicts the minimum mean squared error (MMSE) an estimator can achieve. In this section, we show that the MIMO channel capacity formula of (2.6) can be rewritten as a function of the CRB, or FIM. Based on this relationship, we show that linear transceivers lack flexibility.
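We rewrite (2.1) as follows:

$$y = Hx + z, \tag{2.19}$$

but we relax the assumption of (2.1) slightly. Instead of assuming spatially white noise, we assume that $z \sim N(0, R_z)$. We also assume that the channel input $x \sim N(0, R_x)$ has a circularly symmetric complex Gaussian distribution and is independent of $z$. Then the channel output is $y \sim N(0, R_y)$ with $R_y = HR_xH^* + R_z$. For this more general scenario, the channel capacity is

$$C = \log_2 \frac{|R_z + HR_xH^*|}{|R_z|}. \tag{2.20}$$

Now consider the following random vector,

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( 0, \begin{bmatrix} R_x & R_xH^* \\ HR_x & R_y \end{bmatrix} \right). \tag{2.21}$$

Its log-likelihood function is

$$\log f(x, y) = -\mathrm{const} - \begin{bmatrix} x^* & y^* \end{bmatrix} \begin{bmatrix} R_x & R_xH^* \\ HR_x & R_y \end{bmatrix}^{-1} \begin{bmatrix} x \\ y \end{bmatrix}. \tag{2.22}$$

Using the block matrix inversion formula [34], we get

$$\begin{bmatrix} R_x & R_xH^* \\ HR_x & R_y \end{bmatrix}^{-1} = \begin{bmatrix} A & B \\ B^* & \diamond \end{bmatrix}, \tag{2.23}$$

where

$$A = (R_x - R_xH^*R_y^{-1}HR_x)^{-1}, \tag{2.24}$$

$$B = -(R_x - R_xH^*R_y^{-1}HR_x)^{-1} R_xH^*R_y^{-1}, \tag{2.25}$$

and $\diamond$ is irrelevant to the present discussion. From (2.22)-(2.25) we have

$$\frac{\partial \log f(x, y)}{\partial \bar{x}} = -(R_x - R_xH^*R_y^{-1}HR_x)^{-1} (x - R_xH^*R_y^{-1}y), \tag{2.26}$$

where $\bar{x}$ is the conjugate of $x$. Here we define the differentiation with respect to a complex-valued vector as [35, Appendix B]

$$\frac{\partial}{\partial w} = \frac{1}{2} \begin{bmatrix} \frac{\partial}{\partial u_1} - j\frac{\partial}{\partial v_1} \\ \vdots \\ \frac{\partial}{\partial u_M} - j\frac{\partial}{\partial v_M} \end{bmatrix}, \qquad \frac{\partial}{\partial \bar{w}} = \frac{1}{2} \begin{bmatrix} \frac{\partial}{\partial u_1} + j\frac{\partial}{\partial v_1} \\ \vdots \\ \frac{\partial}{\partial u_M} + j\frac{\partial}{\partial v_M} \end{bmatrix}, \tag{2.27}$$

where the $m$th entry of $w$ is $w_m = u_m + jv_m$, $m = 1, \ldots, M$. The Bayesian Fisher information matrix (FIM) [36] is given by

$$\mathrm{FIM} = E\left[ \frac{\partial \log f(x, y)}{\partial \bar{x}} \left( \frac{\partial \log f(x, y)}{\partial x} \right)^T \right]. \tag{2.28}$$

Based on (2.26) and (2.21), we obtain

$$\begin{aligned}
\mathrm{FIM} &= A \begin{bmatrix} I & -R_xH^*R_y^{-1} \end{bmatrix} \begin{bmatrix} R_x & R_xH^* \\ HR_x & R_y \end{bmatrix} \begin{bmatrix} I \\ -R_y^{-1}HR_x \end{bmatrix} A \\
&= (R_x - R_xH^*R_y^{-1}HR_x)^{-1} &\quad (2.29) \\
&= R_x^{-1} + H^*(R_y - HR_xH^*)^{-1}H &\quad (2.30) \\
&= R_x^{-1} + H^*R_z^{-1}H. &\quad (2.31)
\end{aligned}$$

Comparing (2.20) and (2.31), we see that

$$\begin{aligned}
C &= \log_2 |R_x| + \log_2 |\mathrm{FIM}| &\quad (2.32) \\
&= \log_2 |R_x| - \log_2 |\mathrm{CRB}|, &\quad (2.33)
\end{aligned}$$

where

$$\mathrm{CRB} = \mathrm{FIM}^{-1} = R_x - R_xH^*R_y^{-1}HR_x. \tag{2.34}$$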
This shows that there exists a simple relation between the Gaussian MIMO channel throughput, which is an upper bound on the information transmission rate for any coder/decoder, and the CRB, which is a lower bound on the covariance matrix of any unbiased estimator of $x$.
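The relation between capacity, FIM, and CRB in (2.20) and (2.31)-(2.34) can be verified numerically for arbitrary covariances. The sketch below is an added illustration; the random test matrices and the function names are assumptions made for the example, not part of the dissertation.

```python
import numpy as np

def random_covariance(n, rng):
    """A random Hermitian positive definite matrix, used only as test data."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + n * np.eye(n)

def log2det(M):
    """log2 of |det M| computed stably."""
    _, logabsdet = np.linalg.slogdet(M)
    return logabsdet / np.log(2.0)

def capacity(H, Rx, Rz):
    """Capacity of (2.20): log2(|Rz + H Rx H*| / |Rz|)."""
    return log2det(Rz + H @ Rx @ H.conj().T) - log2det(Rz)

def fim(H, Rx, Rz):
    """Fisher information matrix of (2.31): Rx^{-1} + H* Rz^{-1} H."""
    return np.linalg.inv(Rx) + H.conj().T @ np.linalg.solve(Rz, H)

def crb(H, Rx, Rz):
    """Cramer-Rao bound of (2.34): Rx - Rx H* Ry^{-1} H Rx."""
    Ry = H @ Rx @ H.conj().T + Rz
    return Rx - Rx @ H.conj().T @ np.linalg.solve(Ry, H @ Rx)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
    Rx, Rz = random_covariance(3, rng), random_covariance(5, rng)
    print(capacity(H, Rx, Rz), log2det(Rx) - log2det(crb(H, Rx, Rz)))
```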
The MMSE estimator of $x$ is

$$\hat{x}_{MMSE} = R_xH^*(HR_xH^* + R_z)^{-1} y. \tag{2.35}$$

It is easy to verify that the MMSE estimator of $x$ can achieve the CRB. Hence the MMSE estimator is the best we can achieve under the Gaussian assumptions. In general cases, the matrices FIM and CRB are non-diagonal; i.e., the MMSE estimates of the elements of $x$ are correlated. The correlations between the elements of $x$ clearly contain useful information for the subsequent decoding procedures. However, in practice, we only estimate the single elements of $x$ separately and ignore the correlations between these elements. This causes a loss of information. In fact, we can quantify the capacity loss as

$$C_{loss} = \sum_{k=1}^{M_t} \log_2 \mathrm{CRB}_{kk} - \log_2 |\mathrm{CRB}|, \tag{2.36}$$

where $\mathrm{CRB}_{kk}$ denotes the $k$th diagonal element of CRB. According to the Hadamard inequality [34], for any positive semidefinite matrix $M \in \mathbb{C}^{K \times K}$,

$$|M| \le \prod_{k=1}^{K} M_{kk}, \tag{2.37}$$

and the equality holds if and only if $M$ is diagonal. Hence $C_{loss} \ge 0$, and there is no capacity loss if and only if CRB is a diagonal matrix.
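The capacity loss (2.36) is straightforward to evaluate. The following sketch is an added example with hypothetical names; it assumes white noise and unit-power white inputs. It compares a generic channel, for which the CRB is non-diagonal and the loss is strictly positive, with a channel whose columns are orthogonal, for which the loss vanishes.

```python
import numpy as np

def crb_matrix(H, Rx, Rz):
    """CRB as the inverse of the FIM in (2.31)."""
    fim = np.linalg.inv(Rx) + H.conj().T @ np.linalg.solve(Rz, H)
    return np.linalg.inv(fim)

def capacity_loss(crb):
    """C_loss of (2.36): sum_k log2 CRB_kk - log2 |CRB|."""
    diag = np.real(np.diag(crb))
    _, logabsdet = np.linalg.slogdet(crb)
    return float(np.sum(np.log2(diag)) - logabsdet / np.log(2.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Mr, Mt = 4, 3                                  # hypothetical dimensions
    Rx, Rz = np.eye(Mt), np.eye(Mr)
    H_generic = rng.standard_normal((Mr, Mt))
    Q, _ = np.linalg.qr(rng.standard_normal((Mr, Mt)))
    H_orth = Q @ np.diag([3.0, 2.0, 1.0])          # orthogonal columns
    print(capacity_loss(crb_matrix(H_generic, Rx, Rz)))
    print(capacity_loss(crb_matrix(H_orth, Rx, Rz)))
```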
Based on the aforementioned discussions, we see that i) in general MIMO communications, linear MMSE estimators followed by separate substream decoding are not capacity-wise optimal, and ii) if the channel matrix $H$ has the property that the CRB of (2.34) is a diagonal matrix, linear MMSE estimators may be the first step of capacity lossless processing. If CSIT is available, the transmitter can apply some precoder $F$ and get a virtual channel matrix

$$H_v = HF \tag{2.38}$$

such that the CRB is diagonal. This explains why all the existing linear transceiver designs invariably lead to the diagonalization of the channel matrix. Indeed, if $R_x$ is diagonal and $R_z = \sigma_z^2 I$, then it follows from (2.31) that $H_v$ must have orthogonal columns to get a diagonal FIM and hence a diagonal CRB. Then the precoder $F = V$, whose columns are the right singular vectors of $H$, is the only optimal solution. Yet as we discussed before, this inflexible transceiver scheme can bring many difficulties to the subsequent coding/decoding and modulation/demodulation procedures.
2.3 Rate Performance of Linear Transceivers
To gain more insights into the limitations of the linear transceiver designs, we analyze the asymptotic rate performances of two typical linear transceiver designs for high SNR. We will show that the linear transceivers may suffer from considerable capacity loss and there is apparently a fundamental tradeoff between the throughput and the BER performance.
According to the channel model of (2.1), the received data vector is

y = HFx + z. (2.39)

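The optimal linear receiver is always the LMMSE equalizer (also see, e.g., [23])

$G_{opt} = F^* H^* (H F F^* H^* + \alpha I)^{-1}$,    (2.40)

with $\alpha = \sigma_z^2 / \sigma_x^2$, which yields the optimal estimate of the information symbol $\hat{x} = G_{opt} y$. The mean-squared-error (MSE) matrix of $\hat{x}$, normalized by $\sigma_x^2$, is

$E = (I + \alpha^{-1} F^* H^* H F)^{-1}$.    (2.41)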

Note that E is a function of the linear precoder F. In the following, we analyze two linear precoder designs based on the minimization of the trace of the MSE matrix (MTM) and the minimization of the maximum diagonal element of the MSE matrix (MMD) criteria, which are referred to as ARITH-MSE and MAX-MSE in [26], respectively. We choose these two schemes because they appear to be the most typical ones and the MMD scheme yields the optimal (or very close to the optimal) performance among all the linear transceivers. Indeed, the MMD is equivalent to the linear MIN-BER scheme in the flat fading channel case (see [26]). We do not consider the SVD plus water filling scheme herein since it requires the complicated bit loading.
The MTM scheme, or ARITH-MSE, which has appeared in several linear transceiver design papers (see, e.g., [22] [23] and [26]), attempts to minimize tr(E) with respect to F. The MTM precoder turns out to be

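$F_{MTM} = V \Phi^{1/2}$    (2.42)

where V is as defined in the SVD $H = U \Lambda V^*$, and $\Phi$ is a diagonal matrix whose ith diagonal element $\phi_i$ denotes the signal power loaded to the ith subchannel. According to the literature (see, e.g., [23], Sec. III-A),

$\phi_i = \left( \mu^{-1/2} \alpha^{1/2} \lambda_{H,i}^{-1} - \alpha \lambda_{H,i}^{-2} \right)^+$    (2.43)

where $(a)^+ = \max\{a, 0\}$, $\alpha = \sigma_z^2 / \sigma_x^2$, and $\mu$ is the Lagrange multiplier which controls the loaded power such that $\sum_{i=1}^{K} \phi_i = \rho \alpha$. Suppose $\rho$ is sufficiently large. Then all the K subchannels are used and

$\sum_{i=1}^{K} \left( \mu^{-1/2} \alpha^{1/2} \lambda_{H,i}^{-1} - \alpha \lambda_{H,i}^{-2} \right) = \rho \alpha$    (2.44)

or

$\mu^{-1/2} = \alpha^{1/2} \frac{\rho + \sum_{k=1}^{K} \lambda_{H,k}^{-2}}{\sum_{k=1}^{K} \lambda_{H,k}^{-1}}$.    (2.45)

Substituting (2.42), (2.43) and (2.45) into (2.41), we see that E is diagonal with the ith diagonal element

$E_{ii} = \frac{\sum_{k=1}^{K} \lambda_{H,k}^{-1}}{\rho + \sum_{k=1}^{K} \lambda_{H,k}^{-2}} \lambda_{H,i}^{-1}$.    (2.46)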

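Then (cf. Equation (28) of [26])

$C_i = -\log_2 E_{ii}$    (2.47)

$= \log_2 \left( \frac{\rho + \sum_{k=1}^{K} \lambda_{H,k}^{-2}}{\sum_{k=1}^{K} \lambda_{H,k}^{-1}} \right) + \log_2 \lambda_{H,i}$.    (2.48)

Hence the sum rate of the channel using the MTM scheme is

$C_{MTM} = \sum_{i=1}^{K} C_i = K \log_2 \left( \frac{\rho + \sum_{k=1}^{K} \lambda_{H,k}^{-2}}{\sum_{k=1}^{K} \lambda_{H,k}^{-1}} \right) + \sum_{i=1}^{K} \log_2 \lambda_{H,i}$.    (2.49)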
The channel capacity with uniform power loading in the K subchannels is

$C_{UPL} = \sum_{i=1}^{K} \log_2 \left( 1 + \frac{\rho}{K} \lambda_{H,i}^2 \right)$.    (2.50)

Here $C_{UPL}$ is different from $C_{UT}$ defined in (2.11) in that $C_{UPL}$ corresponds to the channel with the transmitter knowing the range space of H.
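It follows from (2.49) and (2.50) that

$C_{UPL} - C_{MTM} = \sum_{i=1}^{K} \log_2 \left( 1 + \frac{\rho}{K} \lambda_{H,i}^2 \right) - K \log_2 \left( \frac{\rho + \sum_{i=1}^{K} \lambda_{H,i}^{-2}}{\sum_{i=1}^{K} \lambda_{H,i}^{-1}} \right) - \sum_{i=1}^{K} \log_2 \lambda_{H,i}$.    (2.51)

After some straightforward calculations, we have

$\lim_{\rho \to \infty} (C_{UPL} - C_{MTM}) = K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} \lambda_{H,i}^{-1}}{\left( \prod_{i=1}^{K} \lambda_{H,i}^{-1} \right)^{1/K}}$ bps/Hz.    (2.52)

Note that for any positive real-valued sequence $\{\lambda_{H,i}^{-1}\}_{i=1}^{K}$, the arithmetic mean is greater than or equal to the geometric mean, or $\frac{1}{K} \sum_{i=1}^{K} \lambda_{H,i}^{-1} \ge \left( \prod_{i=1}^{K} \lambda_{H,i}^{-1} \right)^{1/K}$. Hence we conclude that $\lim_{\rho \to \infty} (C_{UPL} - C_{MTM}) \ge 0$ and the equality holds if and only if $\{\lambda_{H,i}\}_{i=1}^{K}$ are all the same. We infer from (2.52) that the capacity loss of the MTM transceiver can be quite large if the channel matrix H has a large condition number, which is verified in Section 3.4.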
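The limit in (2.52) can be checked numerically. The Python sketch below evaluates the uniform-power-loading rate, the MTM sum rate (in the regime where all K subchannels are loaded), and the arithmetic-to-geometric-mean expression of the limit. The singular values and the SNR are hypothetical, and the function names are introduced only for this illustration.

```python
import math

def c_upl(sv, rho):
    """Rate with uniform power loading over the K subchannels, cf. (2.50)."""
    K = len(sv)
    return sum(math.log2(1.0 + rho / K * s * s) for s in sv)

def c_mtm(sv, rho):
    """MTM sum rate, cf. (2.49); valid only when all K subchannels are loaded."""
    K = len(sv)
    inv1 = sum(1.0 / s for s in sv)
    inv2 = sum(1.0 / (s * s) for s in sv)
    return K * math.log2((rho + inv2) / inv1) + sum(math.log2(s) for s in sv)

def mtm_asymptotic_loss(sv):
    """High-SNR limit of C_UPL - C_MTM, cf. (2.52)."""
    K = len(sv)
    arith = sum(1.0 / s for s in sv) / K
    geom = math.exp(sum(math.log(1.0 / s) for s in sv) / K)
    return K * math.log2(arith / geom)

singular_values = [2.0, 1.0, 0.25]   # hypothetical channel singular values
rho = 1e9                            # very high SNR on a linear scale
gap = c_upl(singular_values, rho) - c_mtm(singular_values, rho)
limit = mtm_asymptotic_loss(singular_values)
```

For this channel the gap at very high SNR agrees with the limit to within numerical precision, and the limit is zero when all singular values coincide.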
If the same constellation is used for each subchannel, then the substream corresponding to the largest $E_{ii}$ dominates the overall BER performance. Recall from (2.46) that $E_{ii}$ is proportional to the inverse of $\lambda_{H,i}$. Hence the subchannels may have very different SNRs, especially when H has a large condition number. To mitigate this undesirable effect, one can use the MMD transceiver, or MAX-MSE (cf. [26], Section V-A5), with

$F_{MMD} = F_{MTM} \Theta$,    (2.53)

where $\Theta$ is a unitary matrix that makes all the diagonal elements of E in (2.41) the same, that is, equal to

$\epsilon = \frac{1}{K} \sum_{i=1}^{K} E_{ii}$,    (2.54)

where $E_{ii}$ is the MTM value given in (2.46).
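According to (2.47), the capacity of the channel using the MMD linear transceiver is

$C_{MMD} = -K \log_2 \epsilon = -K \log_2 \left( \frac{1}{K} \sum_{i=1}^{K} E_{ii} \right)$.    (2.55)

Thus

$C_{MTM} - C_{MMD} = K \log_2 \left( \frac{1}{K} \sum_{i=1}^{K} E_{ii} \right) - \sum_{i=1}^{K} \log_2 E_{ii}$    (2.56)

$= K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} E_{ii}}{\left( \prod_{i=1}^{K} E_{ii} \right)^{1/K}}$    (2.57)

$= K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} \lambda_{H,i}^{-1}}{\left( \prod_{i=1}^{K} \lambda_{H,i}^{-1} \right)^{1/K}}$,    (2.58)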







where to get (2.58) from (2.57) we have used (2.46). Note that the relative capacity loss of MMD compared with MTM is independent of SNR given that all the subchannels are used. Interestingly, we can see from (2.58) and (2.52) that $C_{MTM} - C_{MMD} = \lim_{\rho \to \infty} (C_{UPL} - C_{MTM})$. We conclude that asymptotically for high SNR, the MMD transceiver has twice the capacity loss of MTM, i.e.,

$\lim_{\rho \to \infty} (C_{UPL} - C_{MMD}) = 2K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} \lambda_{H,i}^{-1}}{\left( \prod_{i=1}^{K} \lambda_{H,i}^{-1} \right)^{1/K}}$ bps/Hz,    (2.59)

although it may yield better BER performance. An intuitive explanation of the capacity loss of the MMD transceiver is as follows. Note that the only difference between MTM and MMD is the prerotation matrix $\Theta$, which is an invariant operator in terms of information capacity. However, $\Theta$ makes the MSE matrix E non-diagonal, which means that the elements of $\hat{x} = G_{opt} y$ are correlated. Clearly, the correlation contains useful information for symbol detection and decoding. However, the linear equalizer ignores the correlation, which results in the additional capacity loss quantified in (2.58). The analyses here are verified in Section 3.4.
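The SNR independence of the gap between MTM and MMD can also be observed numerically. The following Python sketch computes the diagonal MSE entries of the MTM design from (2.46) and evaluates the difference in (2.56) at two SNR values. The singular values are hypothetical and the function names are assumptions of this example; both SNR values are chosen high enough that all subchannels are loaded.

```python
import math

def mtm_mse_diagonal(sv, rho):
    """Diagonal MSE entries of the MTM design, cf. (2.46); assumes all subchannels are loaded."""
    inv1 = sum(1.0 / s for s in sv)
    inv2 = sum(1.0 / (s * s) for s in sv)
    return [inv1 / ((rho + inv2) * s) for s in sv]

def rate_gap_mtm_minus_mmd(sv, rho):
    """C_MTM - C_MMD, cf. (2.56): K * log2(mean MSE) - sum of log2(MSE)."""
    e = mtm_mse_diagonal(sv, rho)
    K = len(e)
    return K * math.log2(sum(e) / K) - sum(math.log2(v) for v in e)

sv = [2.0, 1.0, 0.25]                             # hypothetical singular values
gap_20dB = rate_gap_mtm_minus_mmd(sv, 100.0)      # rho = 20 dB
gap_40dB = rate_gap_mtm_minus_mmd(sv, 10000.0)    # rho = 40 dB
```

The two gaps coincide because the SNR-dependent factor in (2.46) is common to all diagonal entries and cancels in the ratio of the arithmetic and geometric means.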
In summary, the MTM transceiver suffers from capacity loss of (2.52) due to the information theoretically non-optimal power loading defined in (2.43). The MMD transceiver suffers from additional capacity loss because it makes the MSE matrix nondiagonal. Hence there is an apparently inevitable tradeoff between the information rate and BER performance if the same symbol constellation is used in the different subchannels. In the next chapter, we will introduce the GMD scheme and clarify that there is not necessarily a tradeoff between BER performance and channel capacity. Indeed, the GMD scheme attempts to achieve the best of both worlds simultaneously.
















CHAPTER 3
MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOMPOSITION
3.1 VBLAST and ZF-DP
In this section, we first give a brief introduction to the VBLAST architecture [5], which is equivalent to the generalized decision feedback equalizer (GDFE) [37]. We also introduce the more recent zero-forcing "dirty paper" precoder (ZFDP) applied to the MIMO broadcast channels [38] [39].
3.1.1 VBLAST
VBLAST is a simple suboptimal receiver interface which is used in the MIMO system assuming that only CSIR is available. For a MIMO system (2.1) with $M_t \le M_r$ and rank $K = M_t$, the transmitter allocates independent bit streams across the $M_t$ transmitting antennas with no precoding. To decode the transmitted information symbol, VBLAST first estimates the signal with the spatial structure $h_{M_t}$, where $h_i$ denotes the ith column of H, and then cancels it out from the received signal vector. Next, it estimates the signal with spatial structure $h_{M_t - 1}$ and so on. The signal estimator can be either the ZF or MMSE estimator. Some proper reordering of the columns of H is helpful to improve the BER performance [5]. This decoding scheme involves sequential nulling and cancellation, which is proved to be equivalent to the generalized decision feedback equalizer (GDFE) [37].
The ZF nulling step in the VBLAST scheme can be represented by the QR decomposition H = QR where Q is an $M_r \times K$ matrix with orthonormal columns and R is a K x K upper triangular matrix. Let us rewrite (2.1) as

y = QRx + z. (3.1)

Multiplying both sides of (3.1) by $Q^*$ from the left yields

$\tilde{y} = R x + \tilde{z}$,    (3.2)








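or

$\begin{bmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \vdots \\ \tilde{y}_K \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1K} \\ 0 & r_{22} & \cdots & r_{2K} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & r_{KK} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{bmatrix} + \begin{bmatrix} \tilde{z}_1 \\ \tilde{z}_2 \\ \vdots \\ \tilde{z}_K \end{bmatrix}$.    (3.3)

The sequential signal detection is as follows:

for i = K : -1 : 1
    $\hat{x}_i = C\left[ \left( \tilde{y}_i - \sum_{j=i+1}^{K} r_{ij} \hat{x}_j \right) / r_{ii} \right]$
end

where C stands for mapping to the nearest symbol in the symbol constellation. Ignoring the error-propagation effect, we see that the MIMO channel is decomposed into K parallel scalar subchannels

$\tilde{y}_i = r_{ii} x_i + \tilde{z}_i$,  i = 1, 2, ..., K.    (3.4)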
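The sequential detection loop can be sketched in a few lines of Python. The example below assumes that the rotated observation and the upper triangular factor R are already available, uses a QPSK alphabet, and picks R, the transmitted symbols, and the small noise values arbitrarily for illustration. The function names are not from the original text.

```python
def nearest_symbol(value, constellation):
    """Map a complex value to the closest constellation point (the operator C)."""
    return min(constellation, key=lambda s: abs(value - s))

def vblast_detect(R, y_tilde, constellation):
    """Successive detection for y_tilde = R x + noise with R upper triangular."""
    K = len(y_tilde)
    x_hat = [0j] * K
    for i in range(K - 1, -1, -1):   # i = K, K-1, ..., 1 in the text's numbering
        # Cancel the contribution of the symbols that were already detected.
        interference = sum(R[i][j] * x_hat[j] for j in range(i + 1, K))
        x_hat[i] = nearest_symbol((y_tilde[i] - interference) / R[i][i],
                                  constellation)
    return x_hat

qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[1.0, 0.5 - 0.2j, 0.3j],
     [0.0, 0.8, -0.4],
     [0.0, 0.0, 1.2]]                    # hypothetical triangular factor
x = [1 + 1j, -1 - 1j, 1 - 1j]            # transmitted symbols
noise = [0.05, -0.03j, 0.02 + 0.01j]     # small, hand-picked noise samples
y_tilde = [sum(R[i][j] * x[j] for j in range(3)) + noise[i] for i in range(3)]
detected = vblast_detect(R, y_tilde, qpsk)
```

With noise this small every decision is correct, so no error propagates; a wrong decision at stage i would corrupt the interference term of all later stages.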


3.1.2 ZF-DP
We consider a broadcast MIMO channel with $M_t$ transmitting antennas and $M_r$ receiving antennas ($M_t \ge M_r$). The channel model is exactly the same as (2.1) and the CSIT is available. However, the receiving antennas cannot cooperate with each other. A vector transmission scheme was proposed in [40], which combines the QR decomposition and "dirty paper" precoding. We refer to this approach as the zero-forcing "dirty paper" precoding (ZFDP). (The use of the "dirty paper" phrase is due to Costa [41].)
The ZFDP scheme resembles the zero-forcing VBLAST method. It also goes through the sequential nulling and cancellation procedure. The only difference is that all these operations are done by the transmitter.
By assuming H to be of full row rank, i.e., $K = M_r$, ZFDP also begins with the QR decomposition $H^* = QR$. Let us rewrite (2.1) as

$y = R^* Q^* x + z$.    (3.5)

Denoting $x = Q \tilde{x}$ yields

$y = R^* \tilde{x} + z$,    (3.6)









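or

$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix} = \begin{bmatrix} r_{11} & 0 & \cdots & 0 \\ r_{21} & r_{22} & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ r_{K1} & \cdots & \cdots & r_{KK} \end{bmatrix} \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_K \end{bmatrix} + \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_K \end{bmatrix}$,    (3.7)

where, with a slight abuse of notation, $r_{ij}$ denotes the (i, j)th entry of $R^*$. Denote $s \in \mathbb{C}^{K \times 1}$ to be the symbol vector destined for the K receivers. We wish to have $\tilde{x}$ satisfying

$\begin{bmatrix} r_{11} s_1 \\ r_{22} s_2 \\ \vdots \\ r_{KK} s_K \end{bmatrix} = \begin{bmatrix} r_{11} & 0 & \cdots & 0 \\ r_{21} & r_{22} & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ r_{K1} & \cdots & \cdots & r_{KK} \end{bmatrix} \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_K \end{bmatrix}$.    (3.8)

The solution to (3.8) is

$\tilde{x} = R^{-*} \mathrm{diag}\{R\} s$.    (3.9)

However, the matrix inversion can amplify the norm of $\tilde{x}$ significantly, which can lead to additional power consumption at the transmitter. By exploiting the finite alphabet property of the communication signals, the modulo arithmetic precoder (more recently known as the Tomlinson-Harashima Precoder [42], [43]) can be applied to bound the value of the transmitted signal. Moreover, the trellis precoding can be used to eliminate the 1.53 dB shape-loss of Tomlinson-Harashima precoding [44]. The ZFDP transmission scheme decomposes the MIMO channel into K parallel scalar channels (see [40] for more details)

$y_i = r_{ii} s_i + z_i$,  i = 1, 2, ..., K.    (3.10)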
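The pre-subtraction in (3.9) amounts to a forward substitution with the lower triangular matrix $R^*$. The Python sketch below illustrates this step only: it takes $R^*$ literally as the conjugate transpose of an upper triangular R with real positive diagonal, and it omits the modulo (Tomlinson-Harashima) operation that bounds the transmit power. The matrix, the symbols, and the function name are hypothetical.

```python
def zfdp_presubtract(R, s):
    """Solve R^* x_tilde = diag(R) s by forward substitution, cf. (3.9)."""
    K = len(s)
    x_tilde = [0j] * K
    for i in range(K):
        # Row i of R^* holds the conjugates of column i of R.
        known = sum(R[j][i].conjugate() * x_tilde[j] for j in range(i))
        x_tilde[i] = (R[i][i] * s[i] - known) / R[i][i].conjugate()
    return x_tilde

R = [[2.0, 0.5 - 0.5j, 1j],
     [0.0, 1.0, 0.3],
     [0.0, 0.0, 0.5]]              # hypothetical upper triangular factor
s = [1 + 1j, -1 + 1j, 1 - 1j]      # symbols destined for the K receivers
x_tilde = zfdp_presubtract(R, s)

# Noise-free signal seen by each receiver: row i of R^* applied to x_tilde.
received = [sum(R[j][i].conjugate() * x_tilde[j] for j in range(i + 1))
            for i in range(3)]
```

Each noise-free received value equals $r_{ii} s_i$, which is the interference-free scalar channel of (3.10).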

Several remarks are now in order. a) VBLAST is shown to be able to achieve only about 72% of the capacity [5]. That is because imposing the same rate of transmission on all the transmitters makes the channel capacity limited by the worst of the K scalar subchannels. b) VBLAST has a diversity gain of only $M_r - M_t + 1$. c) ZFDP can achieve the broadcast channel capacity for high SNR [39], but the subchannels have different fading levels. Hence the transmitter, just like the aforementioned linear transceivers, has to consider the tradeoffs between the BER performance and the channel throughput. d) The ZFDP scheme causes no error propagation, and thus (3.10) is precise. e) Both VBLAST and ZFDP involve nonlinear operations.







3.2 Geometric Mean Decomposition for MIMO Transceiver Design
Note that VBLAST assumes no cooperation among the transmitting antennas and ZFDP assumes no cooperation among the receivers. Then a natural question arises: can we exploit both the CSIR and CSIT to make things better if both CSIR and CSIT are available? We attempt to address this question next.
In the sequel, we assume that the same signal constellation is used in all the independent symbol streams to reduce the system complexity. This is consistent with the HIPERLAN/2 and IEEE 802.11 standards. Then the overall BER performance of the system will be limited by the subchannel with the lowest SNR. To mitigate this problem, based on (3.4) and (3.10), we consider the following optimization problem

$\max_{Q,P} \min \{ r_{ii} : 1 \le i \le K \}$
subject to  $R = Q^* H P$,
            $R \in \mathbb{R}^{K \times K}$, $r_{ij} = 0$ for $i > j$,    (3.11)
            $r_{ii} > 0$ for $1 \le i \le K$,
            $Q^* Q = P^* P = I_K$

where the semi-unitary matrices Q and P denote the linear operations at the receiver and transmitter, respectively.
Since both Q and P are semi-unitary matrices, we have $\prod_{n=1}^{K} r_{nn} = \prod_{n=1}^{K} \lambda_{H,n}$, where $\{\lambda_{H,n}\}_{n=1}^{K}$ are the K non-zero singular values of H. In Chapter 6 we show that if there exist semi-unitary matrices P and Q satisfying

$H = Q R P^*$, or equivalently, $R = Q^* H P$,    (3.12a)

where the diagonal elements of R are given by

$r_{ii} = \bar{\lambda}_H \triangleq \left( \prod_{n=1}^{K} \lambda_{H,n} \right)^{1/K}$,  $1 \le i \le K$,    (3.12b)

then the R in (3.12) is the solution to (3.11). The detailed treatment of the decomposition (3.12) is deferred to Chapter 6. We refer to this decomposition as the geometric mean decomposition (GMD) since the diagonal elements of R are the geometric mean of $\{\lambda_{H,n}\}_{n=1}^{K}$. A computationally efficient and numerically stable algorithm is proposed in Section 6.2 to calculate the decomposition.
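To make the equal-diagonal property concrete, the following Python sketch treats only the simplest special case: a real 2 x 2 diagonal channel diag(d1, d2) with d1 >= d2 > 0. It builds a pair of plane rotations Q and P such that Q^T diag(d1, d2) P is upper triangular with both diagonal entries equal to the geometric mean. This is an illustrative construction under these assumptions, not the general algorithm of Section 6.2, and all names in it are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose of a 2x2 matrix."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def gmd_2x2_diagonal(d1, d2):
    """Equal-diagonal factorization of diag(d1, d2), assuming d1 >= d2 > 0.

    Returns (Q, R, P), real 2x2 matrices with R = Q^T diag(d1, d2) P,
    R upper triangular and both diagonal entries equal to sqrt(d1 * d2).
    """
    g = math.sqrt(d1 * d2)                      # geometric mean
    if d1 == d2:
        eye = [[1.0, 0.0], [0.0, 1.0]]
        return eye, [[d1, 0.0], [0.0, d2]], eye
    c = math.sqrt((g * g - d2 * d2) / (d1 * d1 - d2 * d2))
    s = math.sqrt(1.0 - c * c)
    P = [[c, -s], [s, c]]                       # rotation on the transmit side
    Q = [[c * d1 / g, -s * d2 / g],
         [s * d2 / g, c * d1 / g]]              # rotation on the receive side
    D = [[d1, 0.0], [0.0, d2]]
    R = matmul(transpose(Q), matmul(D, P))
    return Q, R, P

Q, R, P = gmd_2x2_diagonal(3.0, 1.0)   # hypothetical singular values 3 and 1
```

For singular values 3 and 1 both diagonal entries of R equal the square root of 3, while the product of the diagonal entries still equals the product of the singular values.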
It seems reasonable to constrain the linear equalizer Q to be semi-unitary since it will keep the background noise white. Yet it seems unnecessary to constrain P to be semi-unitary as well. Indeed, the constraint that P and Q should be semi-unitary is in fact inactive, as shown in the following lemma established in Section 6.2.1.

Lemma 3.2.1 The GMD of (3.12) is also the solution to the following optimization problem with relaxed constraints:

$\max_{P,Q} \min \{ r_{ii} : 1 \le i \le K \}$
subject to  $R = Q^* H P$, $r_{ij} = 0$ for $i > j$, $R \in \mathbb{R}^{K \times K}$,    (3.13)
            $r_{ii} > 0$, $1 \le i \le K$,
            $\mathrm{tr}(Q^* Q) \le K$, $\mathrm{tr}(P^* P) \le K$.

Proof: Omitted. See Section 6.2.1 for details. $\Box$
The GMD, which can be viewed as an extended QR decomposition, can be readily combined with the aforementioned VBLAST (GDFE) or ZFDP. GMD-VBLAST is implemented as follows: We first calculate the GMD H = QRP*. Next we choose the precoder F = P, then the equivalent data model is

y = QRx + z. (3.14)

The next step is nothing but the VBLAST detector.
Ignoring the error propagation effect, we can regard the resulting subchannels as K independent and identical subchannels

$y_i = \bar{\lambda}_H x_i + z_i$, for i = 1, ..., K.    (3.15)

The GMD-ZFDP scheme is similar to GMD-VBLAST because of the duality between VBLAST and ZFDP.
3.3 Performance Analyses and Implementation Issues
In this section, we first present the performance analyses of the GMD scheme from capacity perspective, from which we demonstrate the advantages of our GMD scheme over the linear transceivers. Next, we consider combining the GMD scheme with the blind two-way channel subspace tracking in the TDD scenario. To achieve close to optimal performance at low SNR, we propose to combine GMD with subchannel selection. Finally, we discuss the relationship between our GMD scheme and [30].
3.3.1 Performance Analyses
As we have mentioned earlier, the overall BER performance of a MIMO communication system is dominated by the worst subchannels asymptotically for high SNR. Hence the scheme optimizing the worst subchannel can enjoy the optimal BER performance for high SNR. This observation is also the motivation of the aforementioned MMD scheme. As a major advantage over the linear transceiver schemes, the GMD scheme is also asymptotically optimal in terms of the channel capacity for high SNR as we will show below.
If the signal power is allocated evenly to the K subchannels, then based on (3.15), we get
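$C_{GMD} = K \log_2 \left( 1 + \frac{\rho}{K} \bar{\lambda}_H^2 \right)$,    (3.16)

where $\rho$ is defined in (2.2). The channel capacity with uniform power loading on the K subchannels is (see (2.50))

$C_{UPL} = \sum_{n=1}^{K} \log_2 \left( 1 + \frac{\rho}{K} \lambda_{H,n}^2 \right)$.    (3.17)

It follows from (3.16) and (3.17) that

$C_{UPL} - C_{GMD} = \log_2 \frac{\prod_{n=1}^{K} \left( 1 + \frac{\rho}{K} \lambda_{H,n}^2 \right)}{\left( 1 + \frac{\rho}{K} \bar{\lambda}_H^2 \right)^K}$.    (3.18)

From (3.12b) and (3.18), we have

$\lim_{\rho \to \infty} (C_{UPL} - C_{GMD}) = 0$.    (3.19)

Based on Lemma 2.1.1,

$\lim_{\rho \to \infty} (C_{IT} - C_{UPL}) = 0$.    (3.20)

Hence, it follows from (3.19) and (3.20) that

$\lim_{\rho \to \infty} (C_{IT} - C_{GMD}) = 0$,    (3.21)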

i.e., for high SNR the GMD scheme is asymptotically optimal.
Hence the GMD scheme does not need to make the tradeoff between the information rate and BER performance as the conventional linear transceivers do. Instead, our GMD scheme can achieve the optimum on both aspects simultaneously for high SNR.
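The vanishing gap in (3.19) is easy to observe numerically. The Python sketch below evaluates the rate of K identical GMD subchannels and the uniform-power-loading rate, assuming in both cases that the total SNR $\rho$ is split evenly over the K subchannels. The singular values and the SNR grid are hypothetical, and the function names are introduced only for this example.

```python
import math

def c_upl(sv, rho):
    """Rate with uniform power loading over the K eigen-subchannels, cf. (3.17)."""
    K = len(sv)
    return sum(math.log2(1.0 + rho / K * s * s) for s in sv)

def c_gmd(sv, rho):
    """Rate of K identical GMD subchannels, cf. (3.16)."""
    K = len(sv)
    geo_sq = math.exp(2.0 * sum(math.log(s) for s in sv) / K)  # squared geometric mean
    return K * math.log2(1.0 + rho / K * geo_sq)

sv = [2.0, 1.0, 0.25]                    # hypothetical singular values
snr_db_grid = (0, 20, 40, 60)
gaps = [c_upl(sv, 10 ** (snr_db / 10)) - c_gmd(sv, 10 ** (snr_db / 10))
        for snr_db in snr_db_grid]
```

The gap is nonnegative at every SNR on the grid and becomes negligible at the highest values, in line with (3.19).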
As we have mentioned before, VBLAST may suffer from error propagation. Hence the BER performance of GMD-VBLAST will be inferior to the scalar equivalence in (3.15). We calculate the upper bound of the GMD-VBLAST BER as follows. For a fixed SNR $\rho$, we assume that the system of (3.15) has symbol error rate (SER) $P_e$, i.e., each subchannel has SER $P_e/K$. We consider the worst case that decoding errors in some subchannels will cause the failure of the decoding in all the subsequent subchannels. The SER upper bound is readily calculated as

$P_{e,GMD-VBLAST} \le \sum_{n=0}^{K-1} \left( 1 - \frac{P_e}{K} \right)^n (K - n) \frac{P_e}{K}$
$\le \frac{1}{K} \sum_{n=0}^{K-1} (K - n) P_e$
$= \frac{K + 1}{2} P_e$.    (3.22)

For a moderate K, say K < 10, the performance loss caused by the error propagation is rather small. For a system with high dimensionality, GMD-ZFDP is a better choice since it causes no error propagation. On the other hand, the Tomlinson-Harashima precoder leads to an input power increase of M/(M - 1) for M-QAM.
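The worst-case bound can be evaluated directly. The Python sketch below assumes the following reading of the first line of (3.22): the first decoding error occurs after n correct subchannels with probability $(1 - P_e/K)^n P_e/K$ and then corrupts the remaining K - n subchannels. It compares that sum with the closed-form bound $(K + 1) P_e / 2$. The values of K and $P_e$ are hypothetical and the function names are not from the original text.

```python
def ser_bound_sum(K, pe):
    """Sum over the position n of the first decoding error (assumed reading of (3.22))."""
    p = pe / K                                   # per-subchannel SER
    return sum((1.0 - p) ** n * (K - n) * p for n in range(K))

def ser_bound_simple(K, pe):
    """Closed-form bound (K + 1) / 2 * Pe, the last line of (3.22)."""
    return (K + 1) / 2.0 * pe

K, pe = 4, 1e-3                                  # hypothetical dimension and SER
tight = ser_bound_sum(K, pe)
loose = ser_bound_simple(K, pe)
```

For small $P_e$ the two values are nearly identical, so the closed form loses very little while growing only linearly in K.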
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking
In TDD systems, the GMD scheme may be combined with two-way channel subspace tracking techniques. The GMD algorithm, given in Chapter 6, starts with the SVD. To calculate the matrix P (cf. (3.11)), we only need to know the singular values $\Lambda$ and the right singular vectors V (cf. Chapter 6). Similarly, only $\Lambda$ and U are used to calculate Q. Rewriting (2.1) with the precoder F = P yields

y = HPx + z.    (3.23)

Since the GMD scheme uses the same signal constellation and uniform power allocation, the covariance matrix of x is a scaled identity matrix, i.e., $E[xx^*] = \sigma_x^2 I$. Hence,

$R_y = E[yy^*] = \sigma_x^2 H H^* + \sigma_z^2 I$.    (3.24)

If the signal power $\sigma_x^2$ and the noise power $\sigma_z^2$ are known a priori, we have $HH^* = (R_y - \sigma_z^2 I)/\sigma_x^2$. Applying SVD to $HH^*$, we get

$HH^* = U \Lambda^2 U^*$.    (3.25)

The GMD algorithm can be applied based on U and $\Lambda$ to get the matrices Q and R, which are sufficient for decoding. If a TDD system is used, the reverse channel, where the roles of previous transmitter and receiver are exchanged, can be modeled as

$y_{rev} = H^T Q^* s_{rev} + z_{rev}$,    (3.26)

where the subscript "rev" means "reverse channel". Define

$R_{\bar{y}_{rev}} \triangleq E[\bar{y}_{rev} \bar{y}_{rev}^*]$,    (3.27)

where $\bar{y}$ denotes the complex conjugate of y. Using a similar argument, we have

$H^* H = V \Lambda^2 V^*$.    (3.28)

Then the reverse receiver, i.e., the previous transmitter, can calculate R and P from V and $\Lambda$. Channel subspace tracking techniques (see, e.g., [45] [46]) can be used to estimate U, V and $\Lambda$ efficiently. Hence our GMD scheme can be applied without the need of using training symbols for channel estimation. We note that this merit of GMD is not shared by the conventional transceiver schemes introduced in Section 2.3 since all those methods allocate different powers to different subchannels, which makes it difficult, if not impossible, to estimate the singular values in $\Lambda$. Of course, if the same power is allocated to each eigen-subchannel, this blind two-way channel subspace tracking idea can also be combined with the SVD based schemes, at the cost of significant capacity loss.
The GMD scheme can be made backward compatible with the TDD systems using VBLAST decoders. By using CSIT or blind subspace tracking techniques, the transmitter can calculate the linear precoder F. Hence it can always precode the transmitted data x to be Px, even when sending the training data. Thus the receiver is "fooled" into believing that the channel is the virtual one $H_{vt} = HP = QR$. Although the linear precoder P is made transparent to the VBLAST detector, the decoder still enjoys the multiple identical subchannels due to the linear precoder F = P.
3.3.3 Subchannel Selection
The previous discussion is based on the assumption that all the subchannels corresponding to positive singular values are used for signal transmission. However, in practical scenarios, some of the positive singular values of the channel matrix H can be very small. This situation occurs for spatially correlated flat fading channels, or even i.i.d. Rayleigh flat fading channels with $M_r = M_t \gg 1$. From (3.12b), we see that such small singular values lower the common subchannel gain and hence the overall channel quality, so subchannel selection is helpful. The other situation where subchannel selection is needed is the case when the input power is low or moderate. In this section, we propose a simple algorithm to select the subchannels, which is numerically verified to be able to achieve near optimal capacity even at low SNR.
Let us sort the singular values of H as $\lambda_{H,1} \ge \lambda_{H,2} \ge \cdots \ge \lambda_{H,K} > 0$. If GMD is constrained to the first $n \le K$ eigen subchannels, we obtain n identical subchannels

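$y_i = \bar{\lambda}_n x_i + z_i$, for i = 1, ..., n,    (3.29)

where

$\bar{\lambda}_n = \left( \prod_{i=1}^{n} \lambda_{H,i} \right)^{1/n}$.    (3.30)

To maximize the channel throughput with our GMD scheme, we need to solve the following problem

$\max_{1 \le n \le K} n \log_2 \left( 1 + \frac{\rho}{n} \bar{\lambda}_n^2 \right)$    (3.31)

or

$\max_{1 \le n \le K} \left( 1 + \frac{\rho}{n} \bar{\lambda}_n^2 \right)^n$.    (3.32)

The solution to this problem is straightforward. We can use either linear search or bisection method to find the optimal n.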
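The linear search mentioned above can be sketched as follows in Python. The sketch assumes that the total SNR $\rho$ is split evenly over the n selected subchannels, so that the objective is $n \log_2(1 + (\rho/n) \bar{\lambda}_n^2)$ with $\bar{\lambda}_n$ the geometric mean of the n largest singular values. The singular values and the function name are hypothetical.

```python
import math

def select_subchannels(sv, rho):
    """Linear search for the n that maximizes n * log2(1 + rho / n * gm_n ** 2)."""
    ordered = sorted(sv, reverse=True)           # largest singular values first
    best_n, best_rate = 0, -1.0
    log_sum = 0.0
    for n, s in enumerate(ordered, start=1):
        log_sum += math.log(s)
        gm_sq = math.exp(2.0 * log_sum / n)      # squared geometric mean of the top n
        rate = n * math.log2(1.0 + rho / n * gm_sq)
        if rate > best_rate:
            best_n, best_rate = n, rate
    return best_n, best_rate

sv = [2.0, 1.0, 0.01]                            # one very weak subchannel
n_low, rate_low = select_subchannels(sv, 10.0)   # rho = 10 dB
n_high, rate_high = select_subchannels(sv, 1e6)  # rho = 60 dB
```

At 10 dB the weak third subchannel is dropped, whereas at 60 dB all three are used. The running sum of logarithms keeps the cost of the search linear in K once the singular values are sorted.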
Several remarks are in order. i) It is straightforward to incorporate the channel selection into the GMD algorithm. In Section 6.2.2, we show that GMD starts from the SVD $H = U \Lambda V^*$ and then applies a series of Givens transformations to $\Lambda$ to make it upper triangular. The Givens transformations can be constrained to the first $n \le K$ diagonal elements of $\Lambda$. ii) The blind channel subspace tracking can be combined with the subchannel selection strategy seamlessly. If only the subchannels corresponding to the largest $n \le K$ singular values are selected, the blind channel tracking technique will track the n dimensional subspace automatically. iii) The performance loss of the GMD scheme in the low SNR region is due to the well-known fact that the zero-forcing equalizer is inherently suboptimal. In the next chapter, we propose the so-called uniform channel decomposition (UCD) scheme, which can decompose a MIMO channel into multiple identical subchannels in a strictly capacity lossless manner.
3.3.4 Further Remarks
The author later noticed [30] in which an idea similar to GMD was proposed to approach the performance of the ML detector in the ISI suppression scenario. For a SISO ISI channel, if symbols are precoded and transmitted in a block manner, then the data model (2.39) can be used to represent the received block data (cf. (2.3) and (2.4)). Note that for this case, H is a Toeplitz matrix due to the time invariant property of the ISI channel. A linear precoder design F was proposed in [30] such that the virtual channel $H_{vt} = HF$ can be decomposed via QR decomposition to be $H_{vt} = QR$ where R has equal diagonal elements. We see that this equal diagonal idea is equivalent to GMD. However, our GMD scheme, independently motivated by the MIMO transceiver design problem, has several major advantages over the algorithm in [30]:
1. Our GMD scheme represents a paradigm shift from the conventional linear transceiver designs to nonlinear designs and can be proven, both numerically and theoretically, to have superior performance from both BER and information theoretic aspects.
2. Our GMD algorithm is computationally much more efficient than that of [30]. Both algorithms start from the SVD of H, which is followed by K - 1 iterations. The GMD involves 2K - 2 fast Givens rotations. For a channel H with $M_t = M_r = K$, the SVD requires $O(K^3)$ flops while the GMD requires additional $O(K^2)$ flops. Thus the computational complexity of the GMD scheme is comparable to the conventional linear transceiver schemes. However, the algorithm in [30] involves multiplications and inversions of matrices in each iteration and the overall computational burden turns out to be additional $O(K^4)$ flops.
3. For the GMD algorithm, only the information of $HH^*$, and hence $\Lambda$ and U, is needed to calculate Q. However, for the algorithm in [30], the equalizer needs to know both the precoder F and H, and hence $H_{vt} = HF$, in order to apply the traditional QR to $H_{vt}$. Hence it cannot be combined with the blind two-way channel subspace tracking algorithm introduced in Section 3.3.2.
Like the algorithm in [30], the GMD scheme can also be combined with orthogonal frequency division multiplexing (OFDM) for ISI suppression. For a SISO ISI channel with memory L,

$y(n) = \sum_{l=0}^{L-1} h_l x(n - l) + z(n)$,    (3.33)

after applying OFDM with block length N, we get a MIMO channel

y = Dx + z    (3.34)

where D is a diagonal matrix with the diagonal elements equal to the N-point FFT of $h = [h_0, h_1, \ldots, h_{L-1}]^T$. Hence the GMD scheme can be applied directly. We expect that GMD-ZFDP may have better BER performance than GMD-VBLAST if $N \gg 1$, in which case the GMD-VBLAST may suffer from considerable performance degradation due to error propagation.
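Since D in (3.34) is diagonal, its singular values are the magnitudes of the N-point DFT of h, and the common subchannel gain that GMD would produce is their geometric mean. The Python sketch below computes both for a hypothetical two-tap channel using a direct DFT. It assumes that no DFT bin is exactly zero, and the function names are introduced only for this example.

```python
import cmath
import math

def dft(h, N):
    """N-point DFT of the impulse response h, zero-padded to length N."""
    padded = list(h) + [0.0] * (N - len(h))
    return [sum(padded[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def gmd_subchannel_gain(h, N):
    """Geometric mean of |D_k|, assuming every DFT bin is nonzero."""
    mags = [abs(d) for d in dft(h, N)]
    return math.exp(sum(math.log(m) for m in mags) / N)

h = [1.0, 0.5]                       # hypothetical two-tap channel
D = dft(h, 4)                        # diagonal entries of D for N = 4
gain = gmd_subchannel_gain(h, 4)     # common gain of the 4 GMD subchannels
```

A direct O(N^2) DFT is used only to keep the example free of external libraries; an FFT routine would normally be used for block lengths such as N = 64.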
3.4 Performance Examples
We present next several numerical examples to demonstrate the effectiveness of the GMD scheme. In all the examples, we assume Rayleigh independent flat fading channels.
In the first example, we consider a Rayleigh flat fading channel with $M_t = 4$ and $M_r = 4$. We compute the Shannon capacities of the channel with both CSIR and CSIT ($C_{IT}$, (2.10)), the channel with uninformed transmitter ($C_{UT}$, (2.11)), the channel using the GMD scheme ($C_{GMD}$, (3.16)), the channel using the MTM scheme ($C_{MTM}$, (2.49)), and the channel using the MMD scheme ($C_{MMD}$, (2.55)). We average the capacities of 1000 Monte-Carlo-generated H realizations. The result is presented in Figure 3-1. We note that the capacity loss of the MMD scheme is about twice that of the MTM scheme at high SNR as predicted in Section 2.3. The relative capacity loss of the MMD scheme compared with MTM is smaller at low SNR because some subchannels are not used at low SNR. The GMD scheme outperforms the linear transceiver designs when the SNR is moderate or high and is asymptotically capacity lossless at high SNR.
Figure 3-2 shows the complementary cumulative distribution functions (CCDF) of the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR equal to 23 dB. The five thin dashed curves denote the channel capacities of the five subchannels obtained via SVD plus water filling. Note that the leftmost thin curve crosses the vertical axis at a value less than one, which means that the worst subchannel (corresponding to the smallest singular value of the channel matrix) is sometimes discarded by water filling. The thick line is the CCDF of each subchannel capacity obtained via GMD. Figure 3-2 further illustrates the disadvantages of the conventional "SVD plus bit allocation" scheme (see, e.g., [19] [20] [23]). The channel capacities of the 5 subchannels obtained via SVD plus water filling range from 0 to about 10 bps/Hz, which suggests that the BPSK or QPSK modulation should be used to match the capacity of the worst subchannel and something like 512 or 1024 QAM to the best subchannel. This bit allocation significantly increases the modulation/demodulation complexity. Moreover, using a constellation with size greater than 256 is impractical for the current RF circuit design technology. For the CMD scheme, on the other hand, the same constellation with a moderate size, say 64-QAM, can be applied to reap most of the channel capacity.







To demonstrate the effectiveness of the subchannel selection approach, we consider a 10 x 10 independent Rayleigh flat fading channel. The channel is usually ill-conditioned since some singular values of H are very close to zero. Without the subchannel selection strategy, GMD suffers from performance degradation, especially at low SNR, as seen in Figure 3-3. On the other hand, with the subchannel selection scheme, there is only about 0.2 bit/sec/Hz rate loss compared with the CIT, even at very low SNR.
We compare the BER performance of the GMD-VBLAST scheme with the unprecoded MMSE-VBLAST scheme with the optimal detection ordering, the MTM scheme and the MMD scheme. No error correcting code is used in the simulations. In Figure 3-4(a), $H \in \mathbb{C}^{4 \times 2}$ has independent and identically distributed Rayleigh fading elements. Hence the channel matrix is usually well-conditioned. Two independent symbol streams modulated as 16-QAM are transmitted. The figure is obtained by averaging 1000 Monte Carlo trials of H. We see that the GMD scheme has more than one dB improvement over the MMD scheme at moderate to high SNR. In Figure 3-4(b), $H \in \mathbb{C}^{4 \times 4}$ usually has a large condition number, in which case the MMD scheme is subject to more capacity loss as analyzed in Section 2.3. Four independent symbol streams are transmitted. The BER performance of the GMD scheme is much better than the others. We did not include MTM because it discards some bad subchannels and hence cannot be used to transmit four independent data streams.
In the final example, we combine the GMD scheme with 64-point FFT based OFDM for ISI suppression in a SISO channel. We assume that the channel responses $h_l$, l = 0, 1, ..., L - 1, are independent zero-mean circularly symmetric Gaussian random variables with unit variance. The channel length is L = 4. The GMD-ZFDP is about 2 dB better than GMD-VBLAST. This is because GMD-VBLAST suffers from a considerable error propagation effect. This result suggests that GMD-ZFDP may be preferred over GMD-VBLAST if the channel has a large dimensionality.
3.5 Conclusions
In this chapter, we introduce a novel joint transceiver design, which combines the geometric mean decomposition (GMD) with the VBLAST equalizer or dirty paper precoder. The GMD scheme can decompose a MIMO channel into multiple identical scalar subchannels. This desirable property can bring about much convenience to the practical system design, particularly the symbol constellation selection. Moreover, we have shown that the GMD scheme is optimal asymptotically for high SNR in terms of both information rate and BER performance while the computational complexity of our scheme is comparable with the conventional linear transceiver scheme. Furthermore, we have shown that the GMD scheme can be applied without the need of using training symbols for channel estimation if combined with subspace tracking techniques. We have also considered the issue of subchannel selection when some of the subchannels are too poor to be useful. The GMD scheme can also be combined with OFDM for ISI suppression. Both the theoretical analyses and empirical simulations have been provided to validate the effectiveness of our approaches.

[Figure 3-1 plot omitted. Panel title: Mr = 4, Mt = 4 i.i.d. Rayleigh channel. Horizontal axis: SNR (dB). Curves: IT, UT, GMD, MTM (ARITH-MSE), MMD (MAX-MSE).]

Figure 3-1: Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and Mr = 4 for i.i.d. Rayleigh flat fading channels.










[Figure 3-2 plot omitted. Panel title: Mr = 5, Mt = 5 i.i.d. Rayleigh channel. Horizontal axis: Capacity (bit/sec/Hz). Curves: cap/dim, GMD; cap/dim, IT.]

Figure 3-2: Complementary cumulative distribution functions of the capacities of 5 subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results based on 2000 Monte Carlo trials.












[Figure 3-3 plots omitted. Four panels (a) to (d), each for Mr = 10, Mt = 10. Horizontal axis: Capacity (bit/sec/Hz). Legible legend entries: GMD, GMD w/ Ch Sel, UT.]

Figure 3-3: Complementary cumulative distribution function of the capacity of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 1000 Monte Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.





[Figure 3-4 plots omitted. Panel (a): Mt = 2, Mr = 4 i.i.d. Rayleigh channel, 16-QAM; curves: Ordered MMSE-VBLAST, MTM (ARITH-MSE), MMD (MAX-MSE), GMD-VBLAST. Panel (b): Mt = 4, Mr = 4 i.i.d. Rayleigh channel, 16-QAM; curves: Ordered MMSE-VBLAST, MMD (MAX-MSE), GMD-VBLAST. Horizontal axis: SNR (dB).]

Figure 3-4: BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh flat fading channel vs. SNR with (a) Mt = 2 and Mr = 4 and (b) Mt = 4 and Mr = 4.











[Figure 3-5 plot omitted. Panel title: GMD+OFDM, N = 64, L = 4, 64-QAM. Horizontal axis: SNR (dB). Curves: GMD-VBLAST, GMD-ZFDP.]

Figure 3-5: BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined with OFDM for ISI suppression.














CHAPTER 4
UNIFORM CHANNEL DECOMPOSITION
We have seen in Chapter 3 that the GMD scheme can perform much better than the conventional linear transceivers. However, the GMD scheme may suffer considerable capacity loss at low SNR because its inherent "zero-forcing" operations are capacity lossy. In this chapter, we propose a uniform channel decomposition (UCD) scheme, which is also based on the GMD matrix decomposition algorithm, to decompose a MIMO channel into multiple identical subchannels. The UCD scheme has two implementation forms. One is the combination of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred to as UCD-VBLAST, and the other includes a dirty paper (DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as UCD-DP. Just like the GMD scheme, UCD greatly simplifies the subsequent modulation/demodulation and coding/decoding procedures by obviating the need for bit allocation. Two remarkable merits of UCD, which are not shared by the GMD scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD has the maximal diversity gain. Moreover, the UCD scheme can decompose a MIMO channel into an arbitrarily large number of independent subchannels, which is an enabling technology to achieve high data rate transmission using small symbol constellations.
To facilitate the discussion, we recall the channel model given in (2.1) as follows.

y = HFx + z, (4.1)

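where $x \in \mathbb{C}^{L \times 1}$ is the information symbols precoded by the linear precoder $F \in \mathbb{C}^{M_t \times L}$, $y \in \mathbb{C}^{M_r \times 1}$ is the received signal, and $H \in \mathbb{C}^{M_r \times M_t}$ is the channel matrix with rank $K$. We assume $E[xx^*] = \sigma_x^2 I_L$ and $z \sim N(0, \sigma_z^2 I_{M_r})$ is the circularly symmetric complex Gaussian noise. We define the input SNR as

$$ \rho = \frac{E[x^* F^* F x]}{\sigma_z^2} = \frac{\sigma_x^2}{\sigma_z^2}\,\mathrm{Tr}\{F^* F\} \triangleq \frac{1}{\alpha}\,\mathrm{Tr}\{F^* F\}. \tag{4.2} $$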






4.1 Closed-Form Representation of MMSE-VBLAST
The UCD scheme is based on the closed-form representation of the VBLAST scheme using MMSE nulling vectors. For MMSE-VBLAST, the nulling vector for the ith layer is

$$ w_i = \Big( \sum_{j=1}^{i} h_j h_j^* + \alpha I \Big)^{-1} h_i, \quad i = 1, \ldots, M_t. \tag{4.3} $$

The MMSE-VBLAST algorithm can be represented in a concise matrix form which was given in [9] (also see the more detailed version [47]).
Consider the augmented matrix

$$ H_a = \begin{bmatrix} H \\ \sqrt{\alpha}\, I_{M_t} \end{bmatrix} \in \mathbb{C}^{(M_r + M_t) \times M_t}. \tag{4.4} $$

Applying the QR decomposition to $H_a$ yields

$$ H_a = Q_{H_a} R_{H_a} \triangleq \begin{bmatrix} Q^u_{H_a} \\ Q^l_{H_a} \end{bmatrix} R_{H_a}, \tag{4.5} $$

where $R_{H_a} \in \mathbb{C}^{M_t \times M_t}$ is an upper triangular matrix with positive diagonal elements and $Q^u_{H_a} \in \mathbb{C}^{M_r \times M_t}$. Note that $H = Q^u_{H_a} R_{H_a}$ is not the QR decomposition of $H$ since $Q^u_{H_a}$ is not unitary. However, we can readily obtain the nulling vectors using $Q^u_{H_a}$ and $R_{H_a}$ as shown in the following lemma [47]:
Lemma 4.1.1 Let $\{q_{H_a,i}\}_{i=1}^{M_t}$ denote the columns of $Q^u_{H_a}$ and $\{r_{H_a,ii}\}_{i=1}^{M_t}$ the diagonal elements of $R_{H_a}$, where $Q^u_{H_a}$ and $R_{H_a}$ are given in (4.5). The nulling vectors of (4.3) satisfy

$$ w_i = r_{H_a,ii}^{-1}\, q_{H_a,i}, \quad i = 1, 2, \ldots, M_t. \tag{4.6} $$

Then the output signal-to-interference-and-noise ratio (SINR) of the ith layer (i.e., the signal corresponding to $h_i$) using $w_i$ is

$$ \rho_i = \frac{|w_i^* h_i|^2}{w_i^* \Big( \sum_{j=1}^{i-1} h_j h_j^* + \alpha I \Big) w_i}. \tag{4.7} $$

Inserting (4.3) into (4.7), we can simplify (4.7) via some straightforward calculations to be (see, e.g., [48])

$$ \rho_i = h_i^* C_i^{-1} h_i, \quad i = 1, \ldots, M_t, \tag{4.8} $$

where $C_i = \sum_{j=1}^{i-1} h_j h_j^* + \alpha I$.








The SINRs given in (4.8) are related to $R_{H_a}$ as shown in the following lemma:
Lemma 4.1.2 The diagonal of $R_{H_a}$ given in (4.5) and $\{\rho_i\}_{i=1}^{M_t}$ given in (4.8) satisfy

$$ \alpha (1 + \rho_i) = r_{H_a,ii}^2, \quad i = 1, 2, \ldots, M_t. \tag{4.9} $$

Proof: See Appendix A. ∎
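Lemmas 4.1.1 and 4.1.2 are easy to check numerically. The following is a minimal sketch, assuming NumPy and an arbitrary random channel; the sizes, the seed, and the variable names are illustrative choices and are not taken from the thesis. It compares the nulling vectors of (4.3) with the QR-based form (4.6) and checks the SINR relation (4.9).

```python
import numpy as np

rng = np.random.default_rng(0)
Mr, Mt, alpha = 5, 4, 0.3          # illustrative sizes; alpha = sigma_z^2 / sigma_x^2
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)

# Augmented matrix (4.4) and its QR decomposition (4.5).
Ha = np.vstack([H, np.sqrt(alpha) * np.eye(Mt)])
Q, R = np.linalg.qr(Ha)
# NumPy does not promise a positive diagonal, so absorb the phases into Q.
phase = np.diag(R) / np.abs(np.diag(R))
Q = Q * phase
R = phase.conj()[:, None] * R

Qu = Q[:Mr, :]                     # first Mr rows of Q
r = np.real(np.diag(R))
W_qr = Qu / r                      # columns q_i / r_ii, as in (4.6)

W_direct = np.empty((Mr, Mt), dtype=complex)
rho = np.empty(Mt)
for i in range(Mt):
    Hi = H[:, : i + 1]             # layers 1..i
    W_direct[:, i] = np.linalg.solve(Hi @ Hi.conj().T + alpha * np.eye(Mr), H[:, i])  # (4.3)
    Hp = H[:, :i]                  # layers 1..i-1 (empty for the first layer)
    C = Hp @ Hp.conj().T + alpha * np.eye(Mr)
    rho[i] = np.real(H[:, i].conj() @ np.linalg.solve(C, H[:, i]))                    # (4.8)

print(np.allclose(W_qr, W_direct), np.allclose(alpha * (1 + rho), r**2))
```

Both comparisons are expected to hold up to rounding error for any full column rank augmented matrix.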

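An immediate corollary follows.
Corollary 4.1.3 The MMSE-VBLAST detector is information lossless. That is,

$$ \sum_{i=1}^{M_t} \log(1 + \rho_i) = \log \left| H^* H \alpha^{-1} + I \right|, \tag{4.10} $$

where the right hand side of (4.10) is equal to (2.7) with $F = I_{M_t}$.
Proof: From (4.4) and (4.5), we have

$$ H^* H \alpha^{-1} + I = \alpha^{-1} H_a^* H_a = \alpha^{-1} R_{H_a}^* R_{H_a}. \tag{4.11} $$

Hence

$$ \log \left| H^* H \alpha^{-1} + I \right| = \log \prod_{i=1}^{M_t} \left( \alpha^{-1} r_{H_a,ii}^2 \right). \tag{4.12} $$

According to Lemma 4.1.2,

$$ \log \left| H^* H \alpha^{-1} + I \right| = \sum_{i=1}^{M_t} \log(1 + \rho_i). $$

∎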

We note that Corollary 4.1.3 coincides with the findings in [48].
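As a quick illustration of Corollary 4.1.3, the sketch below sums the per-layer rates of MMSE-VBLAST and compares the total with the log-determinant in (4.10). It assumes NumPy; the helper name `vblast_sinrs` and the test channel are hypothetical and serve only this example.

```python
import numpy as np

def vblast_sinrs(H, alpha):
    """SINRs (4.8): layer i sees interference only from layers j < i."""
    Mr, Mt = H.shape
    rho = np.empty(Mt)
    for i in range(Mt):
        C = H[:, :i] @ H[:, :i].conj().T + alpha * np.eye(Mr)
        rho[i] = np.real(H[:, i].conj() @ np.linalg.solve(C, H[:, i]))
    return rho

rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(2)
alpha = 0.5

layer_sum = np.sum(np.log2(1.0 + vblast_sinrs(H, alpha)))          # left side of (4.10)
_, logdet = np.linalg.slogdet(np.eye(3) + H.conj().T @ H / alpha)  # natural log of the determinant
total = logdet / np.log(2.0)                                       # right side of (4.10), in bits

print(layer_sum, total)
```

The two printed numbers should agree up to rounding, which is the information-lossless property in numerical form.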
4.2 UCD-VBLAST
If we modify the precoder F given in (2.8) to be

$$ F = V \Phi^{1/2} \Omega^*, \tag{4.13} $$

where $\Omega \in \mathbb{C}^{L \times K}$ with $L \ge K$ (to avoid capacity loss, we should not choose $L < K$ in general) and $\Omega^* \Omega = I$, then we see through inserting (4.13) into (2.7) that the F given in (4.13) is also a precoder maximizing the channel throughput. However, introducing $\Omega$ brings much greater flexibility than the precoder of (2.8). In the following, we concentrate on how to design $\Omega$.









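Given the precoder of (4.13), the virtual channel is

$$ G \triangleq HF = U \Lambda \Phi^{1/2} \Omega^* \triangleq U \Sigma \Omega^*, \tag{4.14} $$

where $\Sigma = \Lambda \Phi^{1/2}$ is a diagonal matrix with diagonal elements $\{\sigma_i\}_{i=1}^{K}$. Let $G_a$ denote the augmented matrix

$$ G_a = \begin{bmatrix} U \Sigma \Omega^* \\ \sqrt{\alpha}\, I_L \end{bmatrix}. \tag{4.15} $$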
The UCD scheme is based on the following lemma.
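Lemma 4.2.1 For any matrix of the form given in (4.15), we can find a semi-unitary matrix $\Omega \in \mathbb{C}^{L \times K}$ such that the QR decomposition of $G_a$ yields an upper triangular matrix with equal diagonal elements.
Proof: Rewrite (4.15) as

$$ G_a = \begin{bmatrix} U [\Sigma \;\vdots\; 0_{K \times (L-K)}]\, \Omega_0^* \\ \sqrt{\alpha}\, I_L \end{bmatrix}, \tag{4.16} $$

where $\Omega_0 \in \mathbb{C}^{L \times L}$ is a unitary matrix whose first K columns form $\Omega$. We further rewrite (4.16) as

$$ G_a = \begin{bmatrix} I_{M_r} & 0 \\ 0 & \Omega_0 \end{bmatrix} \begin{bmatrix} U [\Sigma \;\vdots\; 0_{K \times (L-K)}] \\ \sqrt{\alpha}\, I_L \end{bmatrix} \Omega_0^*. \tag{4.17} $$

We can have the following GMD:

$$ J \triangleq \begin{bmatrix} U [\Sigma \;\vdots\; 0_{K \times (L-K)}] \\ \sqrt{\alpha}\, I_L \end{bmatrix} = Q_J R_J P_J^*, \tag{4.18} $$

where $R_J \in \mathbb{R}^{L \times L}$ is an upper triangular matrix with equal diagonal elements, $Q_J \in \mathbb{C}^{(M_r + L) \times L}$ is semi-unitary, and $P_J \in \mathbb{C}^{L \times L}$ is unitary. Inserting (4.18) into (4.17) yields

$$ G_a = \begin{bmatrix} I_{M_r} & 0 \\ 0 & \Omega_0 \end{bmatrix} Q_J R_J P_J^* \Omega_0^*. \tag{4.19} $$

Let $\Omega_0 = P_J^*$ and

$$ Q_{G_a} = \begin{bmatrix} I_{M_r} & 0 \\ 0 & \Omega_0 \end{bmatrix} Q_J. \tag{4.20} $$

Then (4.19) can be rewritten to be $G_a = Q_{G_a} R_J$, which is the QR decomposition of $G_a$. The semi-unitary matrix $\Omega$ associated with $G_a$ consists of the first K columns of $\Omega_0$ (or $P_J^*$). ∎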










From Lemma 4.2.1 and Lemma 4.1.2, we conclude that we can always combine a linear precoder and the MMSE-VBLAST detector to uniformly decompose a MIMO channel into $L \ge K$ subchannels with the same output SINRs. According to Corollary 4.1.3, we can further conclude that the channel decomposition is strictly capacity lossless. We refer to the scheme demonstrated in Lemma 4.2.1 as UCD-VBLAST.
The proof of Lemma 4.2.1 is insightful. Indeed, given the SVD of H and the "water filling" level $\Phi^{1/2}$, we only need to calculate the GMD given in (4.18). Then we immediately obtain the linear precoder $F = V \Phi^{1/2} \Omega^*$, where $\Omega$ consists of the first K columns of $P_J^*$. Let $Q^u_{G_a}$ denote the first $M_r$ rows of $Q_{G_a}$, or equivalently the first $M_r$ rows of $Q_J$ (cf. (4.20)). According to Lemma 4.1.1, the nulling vectors are calculated as

$$ w_i = r_{J,ii}^{-1}\, q_{G_a,i}, \quad i = 1, 2, \ldots, L, \tag{4.21} $$

where $r_{J,ii}$ is the ith diagonal element of $R_J$ and $q_{G_a,i}$ is the ith column of $Q^u_{G_a}$.
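Some observations can help reduce the computational complexity. For any matrix $B \in \mathbb{C}^{M \times N}$ with SVD $B = U_B \Lambda_B V_B^*$ and the augmented matrix with SVD

$$ A = \begin{bmatrix} B \\ \sqrt{\alpha}\, I_N \end{bmatrix} = U_A \Lambda_A V_A^*, \tag{4.22} $$

the diagonal elements of $\Lambda_A$ and $\Lambda_B$, i.e., $\lambda_{A,i}$ and $\lambda_{B,i}$, satisfy

$$ \lambda_{A,i} = \sqrt{\lambda_{B,i}^2 + \alpha}, \quad i = 1, \ldots, N. \tag{4.23} $$

Moreover

$$ U_A = \begin{bmatrix} U_B \Lambda_B \Lambda_A^{-1} \\ \sqrt{\alpha}\, V_B \Lambda_A^{-1} \end{bmatrix} \quad \text{and} \quad V_A = V_B. \tag{4.24} $$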
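The relations (4.22) to (4.24) can be verified directly. The following is a small sketch, assuming NumPy and a random tall matrix B (so that the thin SVD has N singular values); the sizes and names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, alpha = 6, 4, 0.2
B = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
UB, lamB, VBh = np.linalg.svd(B, full_matrices=False)    # B = UB diag(lamB) VBh
VB = VBh.conj().T

lamA = np.sqrt(lamB**2 + alpha)                          # (4.23)
UA = np.vstack([UB * (lamB / lamA),                      # U_B Lambda_B Lambda_A^{-1}
                np.sqrt(alpha) * VB / lamA])             # sqrt(alpha) V_B Lambda_A^{-1}, (4.24)

A = np.vstack([B, np.sqrt(alpha) * np.eye(N)])           # augmented matrix (4.22)
A_rebuilt = (UA * lamA) @ VBh                            # U_A Lambda_A V_A^* with V_A = V_B

print(np.allclose(A_rebuilt, A), np.allclose(UA.conj().T @ UA, np.eye(N)))
```

The practical point is that the SVD of the augmented matrix comes for free once the SVD of B is known, which is what keeps the cost of UCD close to that of an SVD-based design.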
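Hence the SVD of J defined in (4.18) is

$$ J = \begin{bmatrix} U [\Sigma \;\vdots\; 0_{K \times (L-K)}]\, \tilde{\Sigma}^{-1} \\ \sqrt{\alpha}\, \tilde{\Sigma}^{-1} \end{bmatrix} \tilde{\Sigma}\, I_L, \tag{4.25} $$

where $\tilde{\Sigma}$ is an $L \times L$ diagonal matrix with the diagonal elements

$$ \tilde{\sigma}_i = \sqrt{\sigma_i^2 + \alpha}, \quad 1 \le i \le K, \tag{4.26a} $$

and

$$ \tilde{\sigma}_i = \sqrt{\alpha}, \quad K + 1 \le i \le L. \tag{4.26b} $$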








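Applying the GMD matrix decomposition algorithm given in Section 6.2 to $\tilde{\Sigma}$ yields

$$ \tilde{\Sigma} = (Q_1 Q_2 \cdots Q_{L-1})\, R_J\, (P_{L-1}^T P_{L-2}^T \cdots P_1^T). \tag{4.27} $$

Hence

$$ J = \begin{bmatrix} U [\Sigma \;\vdots\; 0_{K \times (L-K)}]\, \tilde{\Sigma}^{-1} \\ \sqrt{\alpha}\, \tilde{\Sigma}^{-1} \end{bmatrix} (Q_1 Q_2 \cdots Q_{L-1})\, R_J\, (P_{L-1}^T P_{L-2}^T \cdots P_1^T). \tag{4.28} $$

Then the linear precoder has the form:

$$ F = V [\Phi^{1/2} \;\vdots\; 0_{K \times (L-K)}]\, P_1 P_2 \cdots P_{L-1}. \tag{4.29} $$

The nulling vectors are calculated according to (4.21) with $r_{J,ii} = \big( \prod_{l=1}^{L} \tilde{\sigma}_l \big)^{1/L}$, and

$$ Q^u_{G_a} = U [\Sigma \;\vdots\; 0_{K \times (L-K)}]\, \tilde{\Sigma}^{-1} Q_1 Q_2 \cdots Q_{L-1}. \tag{4.30} $$

Note that $Q_l$ and $P_l$, $l = 1, 2, \ldots, L-1$, are Givens rotation matrices and hence calculating (4.29) and (4.30) needs $O(M_t(K + L))$ and $O(M_r(K + L))$ flops, respectively.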
We summarize the UCD-VBLAST scheme as follows (see footnote 1).

Table 4-1: The UCD-VBLAST scheme

Step   Operation                                                    Flops
1      Compute SVD $H = U \Lambda V^*$                              $O(M_t M_r K)$
2      Calculate $\Phi^{1/2}$ using (2.9)                           $O(K^2)$
3      $\Sigma = \Lambda \Phi^{1/2}$                                $O(K)$
4      Obtain $\tilde{\Sigma}$ using (4.26)                         $O(K)$
5      Apply GMD to $\tilde{\Sigma}$ to obtain (4.27)               $O(L^2)$
6      Generate $F$ using (4.29)                                    $O(M_t(K + L))$
7      Compute $Q^u_{G_a}$ using (4.30)                             $O(M_r(K + L))$
8      Calculate $\{w_i\}_{i=1}^{L}$ using (4.21)                   $O(M_r L)$
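The steps of Table 4-1 can be sketched end to end as follows. This is only an illustration under stated assumptions: NumPy is assumed; `gmd_diag` is our own compact Givens-rotation GMD for a positive diagonal matrix and may differ in details from the algorithm of Section 6.2, which is not reproduced here; `ucd_vblast` is a hypothetical helper name; and a uniform power loading replaces the water filling of (2.9) to keep the example short. The code forms F and the nulling vectors, then checks that MMSE-VBLAST on the virtual channel G = HF sees L identical SINRs.

```python
import numpy as np

def gmd_diag(d):
    """Return real Q, R, P with diag(d) = Q @ R @ P.T, Q and P orthogonal, and R upper
    triangular with every diagonal entry equal to the geometric mean of d (d > 0)."""
    d = np.asarray(d, dtype=float)
    L = d.size
    gbar = np.exp(np.mean(np.log(d)))
    Q, R, P = np.eye(L), np.diag(d), np.eye(L)

    def swap(i, j):
        if i == j:
            return
        # Symmetric permutation of R, mirrored in Q and P so that Q R P^T is unchanged.
        R[:, [i, j]] = R[:, [j, i]]
        R[[i, j], :] = R[[j, i], :]
        Q[:, [i, j]] = Q[:, [j, i]]
        P[:, [i, j]] = P[:, [j, i]]

    for k in range(L - 1):
        swap(k, k + int(np.argmax(np.diag(R)[k:])))              # largest entry to (k, k)
        swap(k + 1, k + 1 + int(np.argmin(np.diag(R)[k + 1:])))  # smallest to (k+1, k+1)
        d1, d2 = R[k, k], R[k + 1, k + 1]
        if d1 - d2 <= 1e-12 * gbar:
            continue                                             # trailing entries already equal
        c = np.sqrt(np.clip((gbar**2 - d2**2) / (d1**2 - d2**2), 0.0, 1.0))
        s = np.sqrt(1.0 - c**2)
        G1 = np.array([[c * d1, -s * d2], [s * d2, c * d1]]) / gbar   # left rotation
        G2 = np.array([[c, -s], [s, c]])                              # right rotation
        idx = [k, k + 1]
        R[idx, :] = G1.T @ R[idx, :]
        R[:, idx] = R[:, idx] @ G2
        Q[:, idx] = Q[:, idx] @ G1
        P[:, idx] = P[:, idx] @ G2
    return Q, R, P

def ucd_vblast(H, phi, alpha, L):
    """Steps 1 and 3-8 of Table 4-1 for given loadings phi (length K = rank of H)."""
    phi = np.asarray(phi, dtype=float)
    K = phi.size
    U, lam, Vh = np.linalg.svd(H, full_matrices=False)           # step 1
    U, lam, V = U[:, :K], lam[:K], Vh.conj().T[:, :K]
    sigma = lam * np.sqrt(phi)                                   # step 3
    d = np.concatenate([np.sqrt(sigma**2 + alpha),               # step 4, (4.26)
                        np.full(L - K, np.sqrt(alpha))])
    Q, R, P = gmd_diag(d)                                        # step 5, (4.27)
    pad = np.zeros((K, L - K))
    F = V @ np.hstack([np.diag(np.sqrt(phi)), pad]) @ P          # step 6, (4.29)
    Qu = U @ np.hstack([np.diag(sigma), pad]) @ np.diag(1.0 / d) @ Q   # step 7, (4.30)
    W = Qu / np.diag(R)                                          # step 8, (4.21)
    return F, W

rng = np.random.default_rng(3)
Mr, Mt, L, alpha = 4, 4, 6, 0.1
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
phi = np.ones(Mt)                      # uniform loading, an assumption made for this demo
F, W = ucd_vblast(H, phi, alpha, L)

G = H @ F                              # virtual channel (4.14)
rho = np.empty(L)
for i in range(L):
    C = G[:, :i] @ G[:, :i].conj().T + alpha * np.eye(Mr)
    rho[i] = np.real(G[:, i].conj() @ np.linalg.solve(C, G[:, i]))   # (4.8) applied to G
sigma = np.linalg.svd(H, compute_uv=False) * np.sqrt(phi)
rho_pred = np.prod(sigma**2 / alpha + 1.0) ** (1.0 / L) - 1.0        # (4.33)
print(rho, rho_pred)
```

Here L = 6 exceeds the channel rank K = 4, so the run also illustrates that the number of identical subchannels is not limited by the channel dimensions.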


Obviously, our UCD-VBLAST scheme has comparable computational complexity to the SVD based linear transceiver designs. An observation relevant to practical implementations is as follows. Note that the receiver does not have to calculate Step 6 since CSIT is available and the transmitter can run Steps 1 to 6. However, if the receiver calculates F, which only takes a small number of flops, and feeds it back to the transmitter,



1 Steps 5-7 can be processed simultaneously as in the GMD algorithm.








then the transmitter is relieved from calculating the SVDs. Hence in FDD systems, it is preferable to feed back F, rather than H, to the transmitter. In TDD systems, there are still advantages for feeding back F since this reduces by approximately half the overall computational complexity.
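We conclude the discussion of the UCD-VBLAST scheme by deriving the SINR of each subchannel. Note that the diagonal elements of $R_J$ are

$$ r_{J,ii} = \Big( \prod_{l=1}^{L} \tilde{\sigma}_l \Big)^{1/L}, \quad i = 1, 2, \ldots, L, \tag{4.31} $$

which is the geometric mean of the diagonal elements of $\tilde{\Sigma}$. It follows from (4.26) that

$$ \Big( \prod_{l=1}^{L} \tilde{\sigma}_l \Big)^{1/L} = \Big( \alpha^{\frac{L-K}{2}} \prod_{l=1}^{K} \sqrt{\sigma_l^2 + \alpha} \Big)^{1/L} = \sqrt{\alpha}\, \Big( \prod_{l=1}^{K} (\alpha^{-1} \sigma_l^2 + 1) \Big)^{\frac{1}{2L}}. \tag{4.32} $$

According to Lemma 4.1.2,

$$ \rho_i = \rho \triangleq \Big( \prod_{l=1}^{K} (\alpha^{-1} \sigma_l^2 + 1) \Big)^{1/L} - 1, \quad i = 1, 2, \ldots, L. \tag{4.33} $$

Hence

$$ \sum_{i=1}^{L} \log_2(1 + \rho_i) = \sum_{i=1}^{K} \log_2(1 + \alpha^{-1} \sigma_i^2) = \sum_{i=1}^{K} \log_2(1 + \alpha^{-1} \phi_i \lambda_{H,i}^2), \tag{4.34} $$

which is exactly the $C_{IT}$ in (2.10). Hence UCD-VBLAST is strictly capacity lossless.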
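A short numerical reading of (4.33) and (4.34): the sketch below, assuming NumPy and made-up values for the $\sigma_i$ and $\alpha$, computes the common SINR for several choices of L and confirms that the total rate does not depend on L. The function name `ucd_sinr` is ours.

```python
import numpy as np

def ucd_sinr(sigma, alpha, L):
    """Common SINR (4.33) of the L UCD subchannels."""
    sigma = np.asarray(sigma, dtype=float)
    return np.prod(sigma**2 / alpha + 1.0) ** (1.0 / L) - 1.0

sigma = np.array([2.0, 1.0, 0.5])      # illustrative sigma_i = lambda_{H,i} * sqrt(phi_i)
alpha = 0.25
capacity = np.sum(np.log2(1.0 + sigma**2 / alpha))               # right side of (4.34)
rates = {L: L * np.log2(1.0 + ucd_sinr(sigma, alpha, L)) for L in (3, 5, 8)}

for L, total in rates.items():
    print(L, ucd_sinr(sigma, alpha, L), total, capacity)
```

Increasing L lowers the per-subchannel SINR, and hence the constellation size each subchannel needs, while the sum rate stays at the capacity.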
4.3 UCD-DP
As a dual form of UCD-VBLAST, the UCD scheme can be implemented by using DP precoding, which we refer to as UCD-DP. For UCD-DP, a direct construction of the linear precoder F as done in Section 4.2 is not obvious. Instead, we exploit the uplink-downlink duality revealed in [49] to obtain UCD-DP.
We convert the UCD-DP problem into the UCD-VBLAST problem in the reverse channel where the roles of the transmitter and receiver are exchanged

y = H*x + z. (4.35)

The UCD-VBLAST scheme can be applied to the channel of (4.35), which yields the precoder $F_{rev}$ and the equalizer $\{w_i\}_{i=1}^{L}$ as in (4.29) and (4.21), respectively. Normalize $\{w_i\}_{i=1}^{L}$ to be of unit Euclidean norm, which we denote as $\{\bar{w}_i\}_{i=1}^{L}$. Let $\bar{W} = [\bar{w}_1, \ldots, \bar{w}_L]$. According to the uplink-downlink duality, the precoder of UCD-DP should be $F = \bar{W} D_q$, where $D_q$ is diagonal with the diagonal elements $\{\sqrt{q_i}\}_{i=1}^{L}$, which will be










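determined based on (4.40) below. We use $F_{rev}$, the linear precoder in the reverse channel, as the linear equalizer. Then the equivalent MIMO channel is

$$ y = F_{rev}^* H \bar{W} D_q x + F_{rev}^* z, \tag{4.36} $$

where the ith scalar subchannel of the MIMO channel is

$$ y_i = f_i^* H \bar{w}_i \sqrt{q_i}\, x_i + \sum_{j=i+1}^{L} f_i^* H \bar{w}_j \sqrt{q_j}\, x_j + \sum_{j=1}^{i-1} f_i^* H \bar{w}_j \sqrt{q_j}\, x_j + f_i^* z. \tag{4.37} $$

Applying the dirty paper precoder to $x_i$ and treating $\sum_{j=1}^{i-1} f_i^* H \bar{w}_j \sqrt{q_j}\, x_j$ as the interference known at the transmitter (note that here we precode the first layer first while for UCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel

$$ y_i = f_i^* H \bar{w}_i \sqrt{q_i}\, x_i + \sum_{j=i+1}^{L} f_i^* H \bar{w}_j \sqrt{q_j}\, x_j + f_i^* z \tag{4.38} $$

with SINR

$$ \rho_i = \frac{q_i\, |f_i^* H \bar{w}_i|^2}{\alpha \|f_i\|^2 + \sum_{j=i+1}^{L} q_j\, |f_i^* H \bar{w}_j|^2}, \quad i = 1, 2, \ldots, L. \tag{4.39} $$

The next step is to calculate $\{q_i\}_{i=1}^{L}$ such that $\rho_i = \rho$, $1 \le i \le L$, where $\rho$ is as defined in (4.33). Let $a_{ij} = |f_i^* H \bar{w}_j|^2$. Then (4.39) can be represented in the matrix form

$$ \begin{bmatrix} a_{11} & -\rho a_{12} & \cdots & -\rho a_{1L} \\ 0 & a_{22} & \cdots & -\rho a_{2L} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & a_{LL} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_L \end{bmatrix} = \rho \alpha \begin{bmatrix} \|f_1\|^2 \\ \|f_2\|^2 \\ \vdots \\ \|f_L\|^2 \end{bmatrix}. \tag{4.40} $$

It is easy to see that $q_i > 0$, $1 \le i \le L$. It is proven in [49] that $\sum_{i=1}^{L} q_i = \mathrm{tr}(F F^*) = \mathrm{tr}(F_{rev} F_{rev}^*)$. That is, the UCD-DP needs exactly the same power as the UCD-VBLAST to obtain L identical subchannels with SINR $\rho$.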
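Because the system (4.40) is upper triangular, the powers can be found by back substitution starting from $q_L$. The sketch below assumes NumPy and uses random stand-ins for $F_{rev}$, $H$, and $\bar{W}$ and an arbitrary target $\rho$ (so it does not reproduce the power identity of [49]); it only shows that the solved powers equalize the SINRs of (4.39). The function name `dp_powers` is hypothetical.

```python
import numpy as np

def dp_powers(Frev, H, Wbar, rho, alpha):
    """Solve the upper triangular system (4.40) for q by back substitution."""
    a = np.abs(Frev.conj().T @ H @ Wbar) ** 2        # a_ij = |f_i^* H wbar_j|^2
    fnorm2 = np.sum(np.abs(Frev) ** 2, axis=0)       # ||f_i||^2
    L = a.shape[0]
    q = np.zeros(L)
    for i in range(L - 1, -1, -1):                   # last layer first
        interference = a[i, i + 1:] @ q[i + 1:]      # layers j > i are not pre-cancelled
        q[i] = rho * (alpha * fnorm2[i] + interference) / a[i, i]
    return q, a, fnorm2

rng = np.random.default_rng(4)
Mr, Mt, L, alpha, rho = 4, 4, 3, 0.1, 2.0            # illustrative values

def cplx(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H, Frev, Wbar = cplx(Mr, Mt), cplx(Mr, L), cplx(Mt, L)
Wbar = Wbar / np.linalg.norm(Wbar, axis=0)           # unit-norm columns

q, a, fnorm2 = dp_powers(Frev, H, Wbar, rho, alpha)
sinr = np.array([q[i] * a[i, i] / (alpha * fnorm2[i] + a[i, i + 1:] @ q[i + 1:])
                 for i in range(L)])                 # (4.39)
print(q, sinr)
```

Every right-hand side entry and every $a_{ij}$ is nonnegative, which is why the back substitution returns strictly positive powers whenever the $a_{ii}$ are nonzero.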
The UCD-DP using the Tomlinson-Harashima precoder leads to an input power increase by a factor of $M/(M-1)$ for M-QAM symbols. Nevertheless, for a system with high dimensionality and/or using a large constellation, UCD-DP is a better choice than UCD-VBLAST since it is free of error propagation.
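The size of the Tomlinson-Harashima power penalty can be illustrated with a few lines of arithmetic. The sketch below relies on the usual approximation, assumed here rather than taken from the thesis, that the precoder output is uniformly distributed over the modulo square, and on a square M-QAM grid with per-axis levels ±1, ±3, and so on; both function names are ours.

```python
import numpy as np

def qam_power(M):
    """Average energy of square M-QAM with per-axis levels ±1, ±3, ..., ±(sqrt(M) - 1)."""
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2).astype(float)
    return 2.0 * np.mean(levels**2)

def thp_power(M):
    """Energy of a symbol uniform over the modulo square [-sqrt(M), sqrt(M))^2."""
    m = np.sqrt(M)
    return 2.0 * m**2 / 3.0

ratios = {M: thp_power(M) / qam_power(M) for M in (4, 16, 64)}
for M, ratio in ratios.items():
    print(M, ratio, 10 * np.log10(ratio))   # ratio and the same penalty in dB
```

The penalty shrinks quickly as the constellation grows, which is consistent with the preference for UCD-DP when large constellations are used.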
4.4 Performance Analysis
4.4.1 Diversity Gain Analysis
An important performance metric is diversity gain, which is defined as follows [16].








Definition 4.4.1 Let $P_e(\rho)$ denote the average error probability of a scheme at SNR $\rho$. The diversity gain of the scheme is

$$ d = -\lim_{\rho \to \infty} \frac{\log P_e(\rho)}{\log \rho}. \tag{4.41} $$

The diversity gain measures how fast the error probability decays with SNR. We note that diversity gain is usually discussed without assuming the availability of CSIT. The reason is that diversity gain is a concept associated with channel outage, i.e., the case where the channel is too poor to support a target data rate. Using CSIT, one can adjust the transmission rate to avoid channel outage. However, if the rate is fixed, which is desirable in practice, we can also use diversity gain as a performance measure of the transceiver designs. Based on this observation, we analyze the diversity gains of the UCD and GMD schemes. The result is summarized in the following proposition.
Proposition 4.4.2 Consider the i.i.d. Rayleigh flat fading MIMO channel defined in (4.1). Let $M = \max(M_t, M_r)$ and $m = \min(M_t, M_r)$. The diversity gains of the GMD and the UCD schemes are

$$ d_{GMD}(M, m) = (M - m + 1)m, \quad \text{and} \quad d_{UCD}(M, m) = Mm, \tag{4.42} $$

respectively.

We have applied the typical error event analysis (see [16][50]) to obtain (4.42). The details are relegated to Appendix B. We see that although UCD has a negligible coding gain compared with the GMD scheme at high SNR, it has an additional diversity gain of $m^2 - m$ over GMD. An interesting point to make is that water filling does not help improve diversity gains. Hence at high SNR, water filling is useless in both capacity and diversity aspects.
Given the fact that the GMD scheme is asymptotically capacity lossless for high SNR, it is rather surprising to see the large diversity loss of GMD compared with UCD. We give an intuitive explanation as follows. Note that diversity gain is determined by the typical error events that the MIMO channel is in deep fade. Namely, the diversity gain of a scheme depends on its ability of dealing with bad channels. A deeply faded channel with high input SNR is equivalent to a "normal" channel with low SNR, in which scenario the GMD scheme is far less efficient than UCD as shown in the numerical examples. Consequently, the GMD has less diversity gain than UCD.










4.4.2 Further Remarks
Besides the larger coding gain at low SNR and an improved diversity gain at high SNR, the UCD scheme enjoys more flexibility than the GMD scheme. For a rank K MIMO channel, the GMD scheme can support no more than K independent data streams. However, the UCD scheme can decompose a rank K MIMO channel into L > K identical subchannels, and L is not even limited by the dimensionality of the channel matrix. This property of the UCD scheme enables one to achieve high data rate transmission using small constellations as demonstrated in the numerical examples.
The UCD scheme also suggests new ways of channel decomposition which are much more flexible than the conventional SVD based ones. Indeed, one may choose the permutation matrices and Givens rotations to achieve a wide variety of channel decompositions with some prescribed SINRs as suggested by the generalized triangular decomposition (GTD) (see Chapter 6 and [51][52]). This idea is developed in Chapter 5.
Finally, we link UCD with DBLAST [2], which has been shown to be able to achieve the optimal tradeoff between the channel diversity and multiplexing [16]. We observe that each diagonal layer of DBLAST can be viewed as the interleaving of the vertical layers of VBLAST in the space-time domain and each diagonal layer can be regarded as a virtual subchannel with the same capacity. However, DBLAST requires short and powerful error correcting coding to make the virtual subchannel work as a "real" one. This is a major difficulty for the implementation of DBLAST. In addition, DBLAST suffers from boundary wastage. In contrast, our UCD scheme, by exploiting CSIT, applies interleaving (via the Givens rotations and permutations) in the space domain only. This makes the UCD scheme free from the boundary wastage. Moreover, the UCD scheme is decoupled from coding procedures. Indeed, UCD can be concatenated with any error correcting code. Furthermore, UCD makes it easier to design the coding scheme since UCD decomposes a MIMO channel into multiple subchannels with identical capacities. Thus in a slowly time varying channel, UCD is much easier to implement than DBLAST despite their duality. This manifests clearly the values of CSIT.
4.5 Numerical Examples
We present next several numerical examples to demonstrate the effectiveness of the UCD scheme.
In the first example, we assume Rayleigh independent flat fading channels with Mt = 10 and Mr = 10. We compare the channel capacity using the UCD and GMD








schemes. The complementary cumulative distribution functions (CCDF) of the capacity drawn out of 2000 Monte-Carlo realizations of H are shown in Figure 4-1. We see that the UCD scheme outperforms the GMD scheme significantly at low SNR although the difference becomes smaller at higher SNR.
Figure 4-2 shows the CCDFs of the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR equal to 25 dB. The five thin dashed curves denote the channel capacities of the five subchannels obtained via SVD plus water filling. Note that the leftmost thin dashed curve crosses the vertical axis at a value less than one, which means that the worst subchannel (corresponding to the smallest singular value of the channel matrix) is sometimes discarded by water filling. The thick solid line is the CCDF of the capacity of the L = 5 subchannels obtained via UCD. All these subchannels have the same capacity. As discussed in Section 4.2, a rank K MIMO channel can be decomposed into L > K subchannels. The thin solid line represents the case where a MIMO channel is decomposed into 7 identical subchannels using the UCD scheme. Figure 4-2 demonstrates the advantages of our UCD scheme over the conventional "SVD plus bit allocation" scheme (see, e.g., [19]). The channel capacities of the 5 subchannels obtained via SVD plus water filling range from 0 to about 11 bps/Hz, which suggests that the BPSK or QPSK modulation should be used to match the capacity of the worst subchannel and something like 1024 or 2048 QAM to the best subchannel. This bit allocation significantly increases the modulation/demodulation complexity. Using GMD or UCD, we can decompose a rank 5 MIMO channel into 5 subchannels and hence the same constellation with a reasonable size, say 128-QAM, can be used to reap most of the channel capacity. The UCD scheme can do even better. In this example, after decomposing a MIMO channel into 7 subchannels via UCD, we can apply a small to moderate constellation, say 16-QAM or 64-QAM, to achieve the channel capacity.
In the third example, we assume Rayleigh independent flat fading channels with Mt = 4 and Mr = 4. We compare the BER performance of the GMD and UCD schemes along with the conventional MMSE-VBLAST with optimal detection ordering in Figure 4-3. We see that both GMD and UCD outperform the conventional VBLAST detector significantly. Moreover, the BER vs. SNR lines of the GMD and UCD schemes have much steeper decreasing slopes, which means much better diversity gains, than the conventional VBLAST. The diversity gains of the GMD and UCD schemes are 4 and 16,










respectively. While there is a noticeably larger diversity gain for UCD compared with GMD as shown in Figure 4-3, the difference is not as drastic as the theoretical prediction. It is because the input SNR is not high enough to validate the approximations made in the typical error event analyses (see Appendix B).
In the final example, we compare the BER performance of UCD-VBLAST and UCD-DP in the scenario of a 10 x 10 Rayleigh flat fading channel. To present a benchmark, we also include UCD-genie as the imaginary scenario where at each layer, a genie would eliminate the influence of erroneous detections from the previous layers when using UCD-VBLAST. Figure 4-4 shows that UCD-VBLAST may suffer from some small BER degradations caused by error propagation (about 0.5 dB for BER = $10^{-4}$) compared with UCD-genie. The UCD-DP, on the contrary, is free of error propagation and hence has BER performance very close to that of UCD-genie. The slight SNR loss of UCD-DP is mainly due to the inherent power-amplification effect of the Tomlinson-Harashima precoder.
4.6 Conclusions
Based on the GMD matrix decomposition algorithm and the closed-form representation of the MMSE-VBLAST detector, we have introduced the UCD scheme for MIMO communications that can decompose a MIMO channel into multiple subchannels with identical capacities in a capacity lossless manner. We have proposed two versions of the UCD scheme, i.e., UCD-VBLAST and UCD-DP. The UCD scheme can provide much convenience for the subsequent modulation/demodulation and coding/decoding procedures due to obviating the need of bit allocation. We have also shown that UCD can achieve the maximal diversity gain. The simulations show that the UCD scheme has excellent performance even without the use of error correcting codes. The UCD scheme suggests a new way of channel decomposition which enjoys much more flexibility than the conventional SVD based ones.
Appendix A
Proof of Lemma 4.1.2
Rewrite (4.5)

$$ H_a = Q_{H_a} R_{H_a} \triangleq \begin{bmatrix} Q^u_{H_a} \\ Q^l_{H_a} \end{bmatrix} R_{H_a}. \tag{4.43} $$










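Let $H_{a,i}$ ($H_i$) denote the submatrix containing the first i columns of $H_a$ ($H$) and $h_{a,i}$ ($h_i$) the ith column. Then

$$ H_{a,i} = \begin{bmatrix} H_i \\ \sqrt{\alpha}\, I_i \\ 0_{(M_t - i) \times i} \end{bmatrix}, \qquad h_{a,i} = \begin{bmatrix} h_i \\ 0_{(i-1) \times 1} \\ \sqrt{\alpha} \\ 0_{(M_t - i) \times 1} \end{bmatrix}. \tag{4.44} $$

For the QR decomposition $H_a = Q_{H_a} R_{H_a}$, the geometric implication of $r_{H_a,ii}$ is the component of $h_{a,i}$ projected onto the subspace spanned by the ith column of $Q_{H_a}$, i.e., $q_{H_a,i}$. Note that $q_{H_a,i}$ is orthogonal to the subspace spanned by $\{q_{H_a,j}\}_{j=1}^{i-1}$ or, equivalently, the column space of $H_{a,i-1}$. Hence

$$ r_{H_a,ii}^2 = h_{a,i}^* P^{\perp}_{H_{a,i-1}} h_{a,i}, \tag{4.45} $$

where $P^{\perp}_{A}$ stands for the orthogonal projection onto the null space of $A^*$. Therefore

$$ r_{H_a,ii}^2 = h_{a,i}^* \left[ I - H_{a,i-1} \left( H_{a,i-1}^* H_{a,i-1} \right)^{-1} H_{a,i-1}^* \right] h_{a,i}. \tag{4.46} $$

Inserting (4.44) into (4.46) yields

$$ r_{H_a,ii}^2 = \alpha + h_i^* \left[ I - H_{i-1} \left( H_{i-1}^* H_{i-1} + \alpha I \right)^{-1} H_{i-1}^* \right] h_i = \alpha + \alpha\, h_i^* \left( H_{i-1} H_{i-1}^* + \alpha I \right)^{-1} h_i. \tag{4.47} $$

From (4.8), we see that

$$ \rho_i = h_i^* \left( H_{i-1} H_{i-1}^* + \alpha I \right)^{-1} h_i. \tag{4.48} $$

Hence $r_{H_a,ii}^2 = \alpha (1 + \rho_i)$. The lemma is proven.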
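The second equality in (4.47) rests on the matrix identity $I - H(H^*H + \alpha I)^{-1}H^* = \alpha (HH^* + \alpha I)^{-1}$. A minimal numerical check, assuming NumPy and random stand-ins for $H_{i-1}$ and $h_i$ (the names `Hp` and `h` are ours), is sketched below.

```python
import numpy as np

rng = np.random.default_rng(5)
Mr, n, alpha = 5, 3, 0.4
Hp = rng.standard_normal((Mr, n)) + 1j * rng.standard_normal((Mr, n))   # plays the role of H_{i-1}
h = rng.standard_normal(Mr) + 1j * rng.standard_normal(Mr)              # plays the role of h_i

lhs_mat = np.eye(Mr) - Hp @ np.linalg.solve(Hp.conj().T @ Hp + alpha * np.eye(n), Hp.conj().T)
rhs_mat = alpha * np.linalg.inv(Hp @ Hp.conj().T + alpha * np.eye(Mr))

r2 = alpha + np.real(h.conj() @ lhs_mat @ h)                            # first form in (4.47)
rho = np.real(h.conj() @ np.linalg.solve(Hp @ Hp.conj().T + alpha * np.eye(Mr), h))   # (4.48)

print(np.allclose(lhs_mat, rhs_mat), np.isclose(r2, alpha * (1 + rho)))
```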
Appendix B
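Proof of Proposition 4.4.2
Without loss of generality, we assume $H \in \mathbb{C}^{M \times m}$, each of whose entries follows a circularly symmetric Gaussian distribution with zero mean and unit variance. Consider BPSK modulation. The average error probability of the GMD scheme is

$$ P_e^{GMD} = E\left[ Q\left( \sqrt{2 \rho \bar{\lambda}_H^2} \right) \right] = E\left[ Q\left( \sqrt{2 \rho \Big( \prod_{i=1}^{m} \lambda_{H,i} \Big)^{2/m}} \right) \right], \tag{4.49} $$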








where the Q-function is defined as

$$ Q(x) = \int_{x}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. $$

The diversity gain of the GMD scheme is

$$ d_{GMD}(M, m) = -\lim_{\rho \to \infty} \frac{\log P_e^{GMD}}{\log \rho}. \tag{4.50} $$

For any QAM constellation, the average error probability is similar to (4.49) except for some constants before or inside the Q-function. Since we focus on the high SNR region, all these constants will not affect the diversity gain defined in (4.50).
At high SNR, the typical error event is

$$ \mathcal{E} = \left\{ \bar{\lambda}_H^2 < \rho^{-1} \right\}. \tag{4.51} $$

It can be shown that instead of calculating (4.50), which involves complicated integrations, we can compute the following [50, Ch. 3]:

$$ d_{GMD}(M, m) = -\lim_{\rho \to \infty} \frac{\log P(\mathcal{E})}{\log \rho}. \tag{4.52} $$

Note that

$$ \bar{\lambda}_H^{2m} = \prod_{i=1}^{m} \lambda_{H,i}^2 = |H^* H|. \tag{4.53} $$

According to [53, Theorem 7.5.3] (with straightforward extensions from the real-valued domain to the complex-valued domain),

$$ \bar{\lambda}_H^{2m} = |H^* H| = \prod_{i=1}^{m} g_{2(M-m+i)}, \tag{4.54} $$

where the $g_{2i}$'s are independent Chi-squared random variables with probability density

$$ f_{g_{2i}}(x) = \frac{1}{(i-1)!}\, x^{i-1} e^{-x}, \quad x \ge 0. \tag{4.55} $$

Now the typical error event can be written as

$$ \mathcal{E} = \Big\{ \prod_{i=1}^{m} g_{2(M-m+i)} < \rho^{-m} \Big\} = \left\{ g_{2(M-m+i)} < \rho^{-\alpha_i},\; i = 1, \ldots, m \;:\; \{\alpha_i\}_{i=1}^{m} \in \mathcal{A} \right\}, \tag{4.56} $$








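where $\mathcal{A} = \big\{ \{\alpha_i\}_{i=1}^{m} : \sum_{i=1}^{m} \alpha_i > m \big\}$. Hence

$$ P(\mathcal{E}) = \int_{\mathcal{A}} \prod_{i=1}^{m} P\left( g_{2(M-m+i)} < \rho^{-\alpha_i} \right) d\alpha_1 \cdots d\alpha_m. \tag{4.57} $$

From (4.55), we know that as $\epsilon \to 0$,

$$ P(g_{2i} < \epsilon) = \int_{0}^{\epsilon} \frac{1}{(i-1)!}\, x^{i-1} e^{-x}\, dx \approx \int_{0}^{\epsilon} \frac{1}{(i-1)!}\, x^{i-1}\, dx = \frac{1}{i!}\, \epsilon^{i}. \tag{4.58} $$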
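The small-$\epsilon$ approximation in (4.58) can be checked against the exact distribution function of the density (4.55), which has a closed form for integer i. The sketch below uses only the Python standard library; the function names and the value $\epsilon = 10^{-2}$ are illustrative choices.

```python
import math

def erlang_cdf(i, eps):
    """Exact P(g_{2i} < eps) for the density (4.55): 1 - exp(-eps) * sum_{n<i} eps^n / n!."""
    return 1.0 - math.exp(-eps) * sum(eps**n / math.factorial(n) for n in range(i))

def small_eps_approx(i, eps):
    """Leading-order term eps^i / i! from (4.58)."""
    return eps**i / math.factorial(i)

eps = 1e-2
ratios = [erlang_cdf(i, eps) / small_eps_approx(i, eps) for i in (1, 2, 3)]
print(ratios)   # each ratio should be close to 1 for small eps
```

The exponent i of $\epsilon$ is what carries over into the SNR exponents of (4.59); the factorial only contributes a constant that vanishes in the limit.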


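Using (4.52), (4.58), and (4.57), we calculate the diversity gain as

$$ d_{GMD}(M, m) = -\lim_{\rho \to \infty} \frac{\log \int_{\mathcal{A}^+} \prod_{i=1}^{m} \frac{\rho^{-(M-m+i)\alpha_i}}{(M-m+i)!}\, d\alpha_1 \cdots d\alpha_m}{\log \rho} $$

$$ = -\lim_{\rho \to \infty} \frac{\log \int_{\mathcal{A}^+} \rho^{-\sum_{i=1}^{m} (M-m+i)\alpha_i}\, d\alpha_1 \cdots d\alpha_m}{\log \rho} \tag{4.59} $$

$$ = \inf_{\mathcal{A}^+} \sum_{i=1}^{m} (M - m + i)\, \alpha_i, \tag{4.60} $$

where $\mathcal{A}^+ = \mathcal{A} \cap \{\alpha_i \ge 0,\; i = 1, \ldots, m\}$. To obtain (4.60) from (4.59), we have used the property that the integral in the numerator of (4.59) is dominated by the term with the SNR exponent closest to zero, as $\rho \to \infty$ (see [16] for details). Here the integration is constrained over $\mathcal{A}^+$ because the integration over $\mathcal{A}$ is dominated by the one over $\mathcal{A}^+$. The reason is as follows. Suppose only $\alpha_1, \ldots, \alpha_j > 0$, $j < m$, and the other $\alpha$'s, $\alpha_{j+1}, \ldots, \alpha_m$, are negative. Then

$$ \prod_{i=1}^{m} P\left( g_{2(M-m+i)} < \rho^{-\alpha_i} \right) \approx \prod_{i=1}^{j} P\left( g_{2(M-m+i)} < \rho^{-\alpha_i} \right). $$

Let $\tilde{\mathcal{A}}^+$ denote $\big\{ \{\alpha_i\}_{i=1}^{j} : \alpha_i \ge 0,\; \sum_{i=1}^{j} \alpha_i > m - \sum_{k=j+1}^{m} \alpha_k \big\}$. Clearly,

$$ \inf_{\tilde{\mathcal{A}}^+} \sum_{i=1}^{j} (M - m + i)\, \alpha_i > \inf_{\mathcal{A}^+} \sum_{i=1}^{m} (M - m + i)\, \alpha_i, $$

which implies that the integration over $\mathcal{A}$ is dominated by that over $\mathcal{A}^+$. Solving the optimization problem of (4.60) yields

$$ d_{GMD}(M, m) = (M - m + 1)m. \tag{4.61} $$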








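Now we consider UCD. We observe that the power allocation applied to each eigen subchannel is no greater than $\rho$. Hence the overall channel throughput of UCD satisfies

$$ \sum_{i=1}^{m} \log\left( 1 + \frac{\rho}{m} \lambda_{H,i}^2 \right) \le R_{UCD} \le \sum_{i=1}^{m} \log\left( 1 + \rho \lambda_{H,i}^2 \right), \tag{4.62} $$

where the left term denotes the channel throughput associated with uniform power allocation. Applying UCD, we obtain m subchannels with the same SNR:

$$ \prod_{i=1}^{m} \left( 1 + \frac{\rho}{m} \lambda_{H,i}^2 \right)^{1/m} - 1 \le \rho_{UCD} \le \prod_{i=1}^{m} \left( 1 + \rho \lambda_{H,i}^2 \right)^{1/m} - 1. \tag{4.63} $$

The typical error event is

$$ \mathcal{E} = \left\{ \{\lambda_{H,i}\}_{i=1}^{m} : \rho_{UCD} < 1 \right\}. \tag{4.64} $$

It follows from (4.63) that

$$ P_1(\rho) \triangleq P\left( \prod_{i=1}^{m} \left( 1 + \frac{\rho}{m} \lambda_{H,i}^2 \right)^{1/m} - 1 < 1 \right) \ge P(\mathcal{E}) \ge P\left( \prod_{i=1}^{m} \left( 1 + \rho \lambda_{H,i}^2 \right)^{1/m} - 1 < 1 \right) \triangleq P_2(\rho). \tag{4.65} $$

It is easy to see that

$$ \lim_{\rho \to \infty} \frac{\log P_1(\rho)}{\log \rho} = \lim_{\rho \to \infty} \frac{\log P_2(\rho)}{\log \rho}. \tag{4.66} $$

Hence

$$ \lim_{\rho \to \infty} \frac{\log P(\mathcal{E})}{\log \rho} = \lim_{\rho \to \infty} \frac{\log P_1(\rho)}{\log \rho}, \tag{4.67} $$

which implies that water filling does not help improve diversity gain.
It follows from the analyses of [16] that the UCD scheme achieves the optimal diversity-multiplexing tradeoff. In particular, when the transmission data rate is fixed regardless of the increase of input SNR, the diversity gain is $d_{UCD}(M, m) = Mm$.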






































[Figure 4-1 plots not reproduced. Four panels, each titled Mt = 10, Mr = 10 with SNR = 0, 10, 20, and 30 dB, respectively. Horizontal axis: Capacity (bit/sec/Hz). Legend: GMD; UCD.]

Figure 4-1: Complementary cumulative distribution function of the capacity of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 2000 Monte Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.














[Figure 4-2 plot not reproduced. Panel title: Mr = 5, Mt = 5, i.i.d. Rayleigh channel, SNR = 25 dB. Horizontal axis: Capacity (bit/sec/Hz). Legend: cap/dim, GMD; cap/dim, UCD (L = 5); cap/dim, UCD (L = 7).]

Figure 4-2: Complementary cumulative distribution functions of the capacities of 5 subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results based on 2000 Monte Carlo trials.







[Figure 4-3 plot not reproduced. Panel title: Mt = 4, Mr = 4, i.i.d. Rayleigh channel, 16-QAM. Horizontal axis: SNR (dB). Legend: Ordered MMSE-VBLAST; GMD-VBLAST; UCD-VBLAST.]

Figure 4-3: Uncoded BER performance when using 16-QAM. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 4 and Mr = 4.


























[Figure 4-4 plot not reproduced. Panel title: Mt = 10, Mr = 10, i.i.d. Rayleigh channel, 64-QAM. Horizontal axis: SNR (dB). Legend entry recovered: UCD-VBLAST.]

Figure 4-4: BER performances of the UCD-DP, UCD-VBLAST schemes and the imaginary UCD-genie scheme. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10.















CHAPTER 5
TUNABLE CHANNEL DECOMPOSITION
5.1 Introduction
All these aforementioned MIMO transceiver designs focus on improving the communication quality subject to power constraints. In this chapter, we tackle a new aspect of the MIMO transceiver design problem. We regard a MIMO transceiver design as a way of decomposing a MIMO channel into multiple subchannels. As we have mentioned, the MIMO channel decomposition through SVD plus "water filling" lacks flexibility despite its optimality in terms of achieving the maximal overall channel capacity. The success of UCD motivates a much more flexible channel decomposition approach, namely the tunable channel decomposition (TCD) scheme, which is the main result of this chapter. Using the recently developed generalized triangular decomposition (GTD), we propose the TCD scheme to decompose a MIMO channel into multiple subchannels with prescribed capacities or, equivalently, signal-to-interference-and-noise ratios (SINR). The main properties of the TCD scheme are summarized as follows:
1. Given K parallel subchannels with capacities $C_1, C_2, \ldots, C_K$, which are obtained through applying SVD plus "water filling" to a rank K MIMO channel, TCD can convert the K subchannels into $L \ge K$ subchannels with capacities $R_1, R_2, \ldots, R_L$ if and only if $(C_1, \ldots, C_K, 0, \ldots, 0) \in \mathbb{R}^L$ majorizes $(R_1, R_2, \ldots, R_L)$ (see footnote 1). In particular, $\sum_{i=1}^{K} C_i = \sum_{i=1}^{L} R_i$, i.e., the TCD is capacity lossless.
2. The TCD scheme has two implementation forms. One is the combination of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred to as TCD-VBLAST, and the other includes a DP precoder and a linear equalizer followed by a DP decoder, which we refer to as TCD-DP.
3. Given the SVD of the MIMO channel matrix, the computational complexity of TCD, which is to calculate the precoder and equalizer matrices, is O(KL), which is computationally quite efficient.
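The majorization condition in the first property can be tested mechanically. The sketch below assumes NumPy and the standard additive definition of majorization (equal totals, and dominating partial sums of the entries sorted in descending order); the formal definition used in this chapter appears later in Section 5.2, so the code is an illustration rather than a transcription, and the capacity values are made up.

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """True if x majorizes y: equal totals, and each partial sum of the
    descending-sorted x is at least the corresponding partial sum of y."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]
    y = np.sort(np.asarray(y, dtype=float))[::-1]
    if x.size != y.size or abs(x.sum() - y.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(x) >= np.cumsum(y) - tol))

C = [6.0, 3.0, 1.0]                        # hypothetical capacities of K = 3 subchannels
padded = C + [0.0, 0.0]                    # (C_1, ..., C_K, 0, ..., 0) in R^L with L = 5
uniform = [2.0] * 5                        # five identical subchannels, as in UCD
too_greedy = [7.0, 1.0, 1.0, 1.0, 0.0]     # asks for more than the best subchannel offers

print(majorizes(padded, uniform), majorizes(padded, too_greedy))
```

The uniform target is always admissible because any vector majorizes the constant vector with the same total, which is one way to see UCD as a special case of TCD.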



1 The concept of majorization is introduced in Section 5.2.



The optimal design of symbol synchronous CDMA (S-CDMA) sequences, a topic that originated at almost the same time as the research on MIMO transceiver designs, has been under intensive study over the past decade (see, e.g., [31][54][55][56]). Although the two research topics have been studied in an apparently independent manner in the signal processing and information theory communities, the CDMA sequence design problem can be viewed as a special case of the MIMO transceiver design as we have shown in Section 2.1.1. Hence the TCD scheme can be applied, with little modification, to the design of optimal CDMA sequences. Moreover, the TCD-VBLAST and TCD-DP schemes can be applied to design optimal CDMA sequences in the uplink (mobile-to-base) and downlink (base-to-mobile) scenarios, respectively. Our TCD scheme, which is independently motivated by the MIMO transceiver design problem, turns out to be related to the scheme proposed in [56]. The relationship is discussed in Section 5.3.
5.2 Channel Model and Preliminaries
5.2.1 Channel Model
To facilitate the discussion, we rewrite the channel model used in the previous chapters.
y = HFx + z, (5.1)

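where $x \in \mathbb{C}^{L \times 1}$ is the information symbols precoded by the linear precoder $F \in \mathbb{C}^{M_t \times L}$, $y \in \mathbb{C}^{M_r \times 1}$ is the received signal, and $H \in \mathbb{C}^{M_r \times M_t}$ is the channel matrix with rank $K$. We assume $E[xx^*] = \sigma_x^2 I_L$ and $z \sim N(0, \sigma_z^2 I_{M_r})$ is the circularly symmetric complex Gaussian noise. We define the SNR as

$$ \rho = \frac{E[x^* F^* F x]}{\sigma_z^2} = \frac{\sigma_x^2}{\sigma_z^2}\,\mathrm{Tr}\{F^* F\} \triangleq \frac{1}{\alpha}\,\mathrm{Tr}\{F^* F\}. \tag{5.2} $$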

5.2.2 Channel Decomposition
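Denote the SVD of a rank K channel H as $H = U \Lambda V^*$, where $\Lambda$ is a $K \times K$ diagonal matrix whose diagonal elements $\{\lambda_{H,k}\}_{k=1}^{K}$ are the nonzero singular values of H. To maximize the channel capacity with respect to F given the input power constraint $\mathrm{Tr}\{F F^*\} \le \rho \sigma_z^2 / \sigma_x^2$, one needs to solve

$$ C_{IT} = \max_{\mathrm{Tr}\{F F^*\} \le \rho \sigma_z^2 / \sigma_x^2} \log_2 \left| I + \alpha^{-1} H F F^* H^* \right|. \tag{5.3} $$

The optimal linear precoder is (cf. (2.8))

$$ F = V \Phi^{1/2}. \tag{5.4} $$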








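Here L = K and $\Phi$ is diagonal whose kth ($1 \le k \le K$) diagonal element $\phi_k$ determines the power loaded to the kth subchannel and is found via "water filling" to be

$$ \phi_k(\mu) = \left( \mu - \frac{\alpha}{\lambda_{H,k}^2} \right)^{+}, \tag{5.5} $$

with $\mu$ being chosen such that $\sum_{k=1}^{K} \phi_k(\mu) = \rho \alpha$ and $(a)^+ = \max\{0, a\}$. In this case, we obtain K subchannels with capacities

$$ C_k = \log_2 \left( 1 + \frac{\phi_k \lambda_{H,k}^2}{\alpha} \right) = \left( \log_2 \frac{\mu \lambda_{H,k}^2}{\alpha} \right)^{+} \text{ bps/Hz}, \quad k = 1, 2, \ldots, K. \tag{5.6} $$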

Due to the usually large dynamic range of the singular values $\{\lambda_{H,k}\}_{k=1}^{K}$, the SVD decomposes a MIMO channel into multiple parallel eigen-subchannels with very different capacities. Moreover, since the optimal power loading levels are fixed by (5.5), the resulting channel decomposition is rigidly given by (5.6) and lacks flexibility.
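The water filling rule (5.5) and the resulting capacities (5.6) can be illustrated with a minimal Python sketch. The function names `water_filling` and `subchannel_capacities`, the example singular values, and the choice $\alpha = 1$ are assumptions made only for this illustration; they do not appear in the original text.

```python
import math

def water_filling(sing_vals, rho, alpha=1.0):
    """Return (phi, mu) with phi_k = max(0, mu - alpha/lambda_k^2) and sum(phi)/alpha == rho."""
    floors = sorted(alpha / s**2 for s in sing_vals)   # alpha / lambda_k^2, ascending
    budget = alpha * rho
    mu = 0.0
    # Try to keep the n strongest subchannels active, largest n first.
    for n in range(len(floors), 0, -1):
        mu = (budget + sum(floors[:n])) / n
        if mu > floors[n - 1]:     # all n active subchannels receive positive power
            break
    phi = [max(0.0, mu - alpha / s**2) for s in sing_vals]
    return phi, mu

def subchannel_capacities(sing_vals, phi, alpha=1.0):
    """Capacities C_k of (5.6) in bps/Hz."""
    return [math.log2(1.0 + p * s**2 / alpha) for s, p in zip(sing_vals, phi)]

lam = [2.0, 1.0, 0.5]              # illustrative singular values
phi, mu = water_filling(lam, rho=1.0)
caps = subchannel_capacities(lam, phi)
print(phi, mu)                     # phi -> [0.875, 0.125, 0.0], mu -> 1.125
print(caps)
```

In this example the weakest eigen-subchannel receives no power at all, which shows how unevenly the capacities in (5.6) can be spread when the singular values differ widely.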
Another way of decomposing a MIMO channel is to use the VBLAST detector [5]. The VBLAST scheme involves sequential nulling and cancellation, and it decomposes the MIMO channel into K subchannels (or layers, as coined in [5]). By changing the ordering of the signal detection, we can get K! subchannel combinations, each of which is capacity lossless [48].
Theoretically, more combinations of subchannels are possible via time sharing (see [57, Ch. 14.3]). Recall that every DBLAST layer sends its data substream across the K transmitting antennas, or VBLAST layers, in a time sharing manner [2]. For example, for a system with $M_t = 2$, the transmitted data are

$$\begin{array}{lccccc} \text{Vertical Layer-I:} & x_1 & y_2 & x_3 & y_4 & \cdots \\ \text{Vertical Layer-II:} & 0 & x_2 & y_3 & x_4 & \cdots \end{array} \tag{5.7}$$

Let $x_i$ and $y_i$, $i = 1, 2, \ldots$, denote the symbols transmitted through the DBLAST layers I and II, respectively, at time $i$. The receiver first estimates $x_1$ and then estimates $x_2$ by regarding $y_2$ as interference. The estimates of $x_1, x_2$ are decoded jointly, which form the output of the diagonal layer I. After subtracting out the effect of $x_1, x_2$ from the received data, we can estimate and decode $y_2, y_3$, which form the diagonal layer II. We remark that DBLAST can be viewed as a combination of VBLAST and the time sharing technique, which decomposes the MIMO channel into multiple identical subchannels.








However, time sharing can be difficult to implement in practice. For instance, the major difficulty of DBLAST is the requirement of encoding the diagonal layer with short and efficient error correction codes, which limits its practical implementation despite its superb theoretical performance analyzed in [16].
If CSIT is available, more flexible and practical channel decompositions can be achieved. In Chapter 4, we proposed the UCD scheme, which combines the geometric mean decomposition (GMD) developed in Section 6.2 with either an MMSE-VBLAST detector or a DP precoder to decompose the MIMO channel of (5.1) into $L \ge K$ identical subchannels. Hence, the UCD scheme can achieve the theoretical performance of the DBLAST scheme without resorting to any error correcting coding.
In this chapter, we generalize the results of Chapter 4 and develop a systematic channel decomposition that combines the recently proposed GTD algorithm with either an MMSE-VBLAST detector or a DP precoder. We show that given K parallel subchannels with capacities $C_1, C_2, \ldots, C_K$, which are obtained via SVD, TCD can convert the K subchannels into $L \ge K$ subchannels (see footnote 2) with capacities $R_1, R_2, \ldots, R_L$ if and only if $(R_1, R_2, \ldots, R_L)$ is majorized by $(C_1, \ldots, C_K, 0, \ldots, 0) \in \mathbb{R}^L$. This scheme is particularly relevant to applications where independent data streams with different qualities of service (QoS) share the same MIMO channel [28]. For example, video services usually require higher SNRs than audio services. Decomposing a MIMO channel into multiple subchannels with prescribed capacities and transmitting independent data streams through these subchannels greatly simplifies resource allocation.
5.2.3 Majorization and Generalized Triangular Decomposition
We introduce several basic concepts and theorems of the majorization theory from [58].
Definition 1 For $x, y \in \mathbb{R}^n$, if

$$\sum_{i=1}^{j} x_{[i]} \le \sum_{i=1}^{j} y_{[i]}, \quad 1 \le j \le n, \tag{5.8}$$

with equality holding for $j = n$, where the subscript $[i]$ denotes the $i$th largest element of the sequence, we say that $x$ is majorized by $y$ and denote $x \prec_{+} y$ or, equivalently, $y \succ_{+} x$.



2 If $L < K$, some eigen-subchannels are discarded, which causes capacity loss. Hence we focus on the case of $L \ge K$.








Definition 2 An $n \times n$ matrix $P$ is doubly stochastic if its $(i,j)$th entry $p_{ij} \ge 0$ for $i, j = 1, \ldots, n$, and $\sum_{i=1}^{n} p_{ij} = 1$ and $\sum_{j=1}^{n} p_{ij} = 1$.
Theorem 5.2.1 $x \prec_{+} y$ if and only if there exists a doubly stochastic matrix $P$ such that $x = Py$.
A square matrix $\Pi$ is said to be a permutation matrix if each row and column has a single one and all the other entries are zero. There are $n!$ permutation matrices of size $n \times n$.
Theorem 5.2.2 The permutation matrices constitute the extreme points of the set of doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex hull of the permutation matrices.
It follows from Theorems 5.2.1 and 5.2.2 that the set $\{x \mid x \prec_{+} y\}$ is the convex hull spanned by the $n!$ points which are the permutations of $y$.
As we have mentioned before, given K parallel subchannels with capacities $C_1, C_2, \ldots, C_K$, which are obtained via SVD, TCD can convert the K subchannels into $L \ge K$ subchannels with capacities $R_1, R_2, \ldots, R_L$ if and only if $(R_1, R_2, \ldots, R_L) \prec_{+} (C_1, \ldots, C_K, 0, \ldots, 0) \in \mathbb{R}^L$. For example, for a MIMO channel H with rank $K = 3$, assume that the capacities of the 3 subchannels obtained via SVD are $C_1 \ge C_2 \ge C_3$. If $L = K$, then TCD can decompose the MIMO channel into 3 subchannels with a rate vector $r = (R_1, R_2, R_3)$ if and only if $r$ lies in the convex hull

$$\mathrm{Co}\left\{ \begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix}, \begin{pmatrix} C_1 \\ C_3 \\ C_2 \end{pmatrix}, \begin{pmatrix} C_2 \\ C_1 \\ C_3 \end{pmatrix}, \begin{pmatrix} C_2 \\ C_3 \\ C_1 \end{pmatrix}, \begin{pmatrix} C_3 \\ C_1 \\ C_2 \end{pmatrix}, \begin{pmatrix} C_3 \\ C_2 \\ C_1 \end{pmatrix} \right\}. \tag{5.9}$$

Here Co stands for the convex hull, defined as

$$\mathrm{Co}\{S\} = \{\theta_1 x_1 + \cdots + \theta_K x_K \mid x_i \in S,\ \theta_i \ge 0,\ \theta_1 + \cdots + \theta_K = 1\}. \tag{5.10}$$

In general, the "capacity region" is a convex hull defined by $K!$ vertices in a K-dimensional space. Since the TCD is capacity lossless, i.e., $\sum_{i=1}^{K} C_i = \sum_{i=1}^{K} R_i$, the capacity region falls into a $(K-1)$-dimensional hyperplane. The gray area in Figure 5-1 shows the convex hull of (5.9) with $C_1 = 3$, $C_2 = 2$, and $C_3 = 1$. In this case, the 6 vertices lie in the 2-D plane $\{x \mid \sum_{i=1}^{3} x_i = 6\}$. An interesting special case is the UCD scheme [59], which achieves the rate vector corresponding to the center of the convex hull, i.e., $r = (2, 2, 2)$.








[Figure 5-1 plot not reproduced. Panel title: "Capacity lossless region ($C_1 = 3$, $C_2 = 2$, $C_3 = 1$)"; the labeled vertices include (1,2,3) and (1,3,2).]

Figure 5-1: Illustration of the capacity lossless region obtainable via TCD. We assume $K = 3$, $C_1 = 3$, $C_2 = 2$, and $C_3 = 1$.
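The geometry of the example with $C_1 = 3$, $C_2 = 2$, $C_3 = 1$ can be checked with a short Python sketch: it enumerates the $K! = 6$ vertices of (5.9), confirms that they lie in the plane $\sum_i x_i = 6$, and recovers the center $(2, 2, 2)$ as the equal-weight convex combination. The helper name `convex_combination` and the second rate vector are illustrative assumptions, not part of the original text.

```python
from itertools import permutations

c = (3.0, 2.0, 1.0)                      # SVD subchannel capacities C1, C2, C3
vertices = sorted(set(permutations(c)))  # the K! = 6 vertices of (5.9)

# Every vertex lies in the plane x1 + x2 + x3 = C1 + C2 + C3 (capacity lossless).
on_plane = all(abs(sum(v) - sum(c)) < 1e-12 for v in vertices)

# Equal weights theta = 1/6 in the convex combination give the hull center.
center = tuple(sum(v[i] for v in vertices) / len(vertices) for i in range(len(c)))

def convex_combination(points, weights):
    """Return sum_k weights[k] * points[k]; weights must be nonnegative and sum to 1."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-12
    dim = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim))

# Another achievable rate vector: a mixture of two vertices.
r = convex_combination([(3.0, 2.0, 1.0), (1.0, 2.0, 3.0)], [0.75, 0.25])
print(len(vertices), on_plane, center, r)   # 6 True (2.0, 2.0, 2.0) (2.5, 2.0, 1.5)
```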

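Definition 3 For $x, y \in \mathbb{R}_{+}^{n}$, if

$$\prod_{i=1}^{j} x_{[i]} \le \prod_{i=1}^{j} y_{[i]}, \quad 1 \le j \le n, \tag{5.11}$$

with equality for $j = n$, we say that $x$ is multiplicatively majorized by $y$ and write $x \prec_{\times} y$ or, equivalently, $y \succ_{\times} x$.
Obviously, if $x \prec_{\times} y$, then $\log x \prec_{+} \log y$.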
Now we are ready to introduce the GTD theorem.
Theorem 5.2.3 (GTD theorem) Let $H \in \mathbb{C}^{m \times n}$ have rank $K$ with singular values $\lambda \in \mathbb{R}^{K}$. There exist an upper triangular matrix $R \in \mathbb{C}^{K \times K}$ and matrices $Q$ and $P$ with orthonormal columns such that $H = QRP^*$ if and only if the diagonal elements of $R$ satisfy $|r| \prec_{\times} \lambda$.

Proof. We relegate the proof to Chapter 6. $\Box$
There is a computationally efficient and numerically stable algorithm to achieve the GTD predicted by Theorem 5.2.3, which is presented in Chapter 6.
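For intuition, the $2 \times 2$ case of Theorem 5.2.3 can be verified by hand. The Python sketch below is not the GTD algorithm of Chapter 6: it only writes down one upper triangular matrix with a prescribed diagonal and checks, via the Frobenius norm and the determinant, that its singular values are the given $\lambda_1 \ge \lambda_2$. The function names `gtd_2x2` and `singular_values_2x2` and the numbers used are assumptions for illustration; the matrices $Q$ and $P$ are not computed.

```python
import math

def gtd_2x2(lam1, lam2, r1):
    """Upper triangular 2x2 R with diagonal (r1, lam1*lam2/r1) and singular values lam1 >= lam2.

    Requires lam2 <= r1 <= lam1, which is the 2x2 multiplicative majorization condition.
    """
    if not (lam2 <= r1 <= lam1):
        raise ValueError("diagonal is not multiplicatively majorized by the singular values")
    r2 = lam1 * lam2 / r1
    off = math.sqrt((r1**2 - lam2**2) * (lam1**2 - r1**2)) / r1
    return [[r1, off], [0.0, r2]]

def singular_values_2x2(m):
    """Singular values of a real 2x2 matrix from its Frobenius norm and determinant."""
    (a, b), (c, d) = m
    fro2 = a * a + b * b + c * c + d * d          # s1^2 + s2^2
    det = abs(a * d - b * c)                      # s1 * s2
    disc = math.sqrt(max(fro2**2 - 4.0 * det**2, 0.0))
    return math.sqrt((fro2 + disc) / 2.0), math.sqrt((fro2 - disc) / 2.0)

R = gtd_2x2(4.0, 1.0, 2.0)        # equal diagonal: r1 = r2 = sqrt(4 * 1) = 2
s1, s2 = singular_values_2x2(R)
print(R, s1, s2)                  # [[2.0, 3.0], [0.0, 2.0]] 4.0 1.0
```

Choosing $r_1 = \sqrt{\lambda_1 \lambda_2}$, as in the example, gives the equal-diagonal (geometric mean) case, while any other $r_1 \in [\lambda_2, \lambda_1]$ gives an unequal diagonal that is still admissible.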
5.3 Tunable Channel Decomposition
5.3.1 TCD-VBLAST
We see from (5.2) that F can always be scaled such that $\alpha = 1$. Hence, without loss of generality, we let $\alpha = 1$ in the sequel to simplify the notation.








Denote the SVD of a rank $K$ channel $H$ as $H = U \Lambda V^*$, where $\Lambda$ is a $K \times K$ diagonal matrix whose diagonal elements $\{\lambda_{H,k}\}_{k=1}^{K}$ are the nonzero singular values of $H$. The conventional SVD-based linear transceiver designs have the precoder $F = V \Phi^{1/2}$, where $\Phi$ is a diagonal matrix whose diagonal elements $\{\phi_k\}_{k=1}^{K}$ stand for the power allocation. The precoder F transforms the MIMO channel into K orthogonal subchannels with capacities
$$C_k = \log_2(1 + \phi_k \lambda_{H,k}^2) \text{ bps/Hz}, \quad k = 1, 2, \ldots, K. \tag{5.12}$$

For this kind of precoder design, the only way of controlling the capacities of the subchannels is to change the power allocation $\Phi$.
If we modify the precoder F to be (see footnote 3)

$$F = V \Phi^{1/2} \Omega^T, \tag{5.13}$$

where $\Omega \in \mathbb{R}^{L \times K}$ with $L \ge K$ and $\Omega^T \Omega = I$, then it can be readily seen that introducing $\Omega$ does not change the overall channel capacity. However, it brings much greater flexibility, as demonstrated in the following theorem.
Theorem 5.3.1 (TCD Theorem) Consider a MIMO channel of (4.1) with F given in (5.13). For any $L \ge K$, let $c \in \mathbb{R}^L$ be a zero vector with its first K elements replaced with $\{C_k\}_{k=1}^{K}$, where $C_k = \log(1 + \lambda_{H,k}^2 \phi_k)$. Given any rates $\{R_k\}_{k=1}^{L}$, we can find a matrix $\Omega \in \mathbb{R}^{L \times K}$ with orthonormal columns such that the combination of the linear precoder $F = V \Phi^{1/2} \Omega^T$ and the MMSE-VBLAST detector yields L subchannels with capacities $\{R_k\}_{k=1}^{L}$ if and only if $\{R_k\}_{k=1}^{L} \prec_{+} c$.
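Proof: Given the precoder of (5.13), the virtual channel is

$$G \triangleq HF = U \Lambda \Phi^{1/2} \Omega^T = U \Lambda_G \Omega^T, \tag{5.14}$$

where $\Lambda_G = \Lambda \Phi^{1/2}$ is a diagonal matrix with diagonal elements

$$\lambda_{G,i} = \lambda_{H,i}\, \phi_i^{1/2}, \quad i = 1, \ldots, K. \tag{5.15}$$


3 Letting $\Omega$ be complex-valued does not introduce additional flexibility, as is clear from the GTD algorithm.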









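Let the augmented matrix $G_a$ be defined as

$$G_a = \begin{bmatrix} U \Lambda_G \Omega^T \\ I_L \end{bmatrix} \in \mathbb{C}^{(M_r + L) \times L}. \tag{5.16}$$

After some straightforward calculations, we can obtain the SVD of $G_a$ as

$$G_a = \begin{bmatrix} U [\Lambda_G \,\vdots\, 0_{K \times (L-K)}] \Lambda_{G_a}^{-1} \\ \Omega_0 \Lambda_{G_a}^{-1} \end{bmatrix} \Lambda_{G_a} \Omega_0^T, \tag{5.17}$$

where $\Omega_0 \in \mathbb{R}^{L \times L}$ is orthogonal with its first K columns forming $\Omega$, and the diagonal matrix $\Lambda_{G_a}$ contains the singular values of $G_a$:

$$\lambda_{G_a,i} = \begin{cases} \sqrt{1 + \lambda_{G,i}^2}, & 1 \le i \le K, \\ 1, & i > K. \end{cases} \tag{5.18}$$

According to Theorem 5.2.3, we can apply GTD to obtain

$$\Lambda_{G_a} = Q R_{G_a} P^T \tag{5.19}$$

if and only if the diagonal elements of $R_{G_a} \in \mathbb{R}^{L \times L}$, which we denote as $\{r_{G_a,ii}\}_{i=1}^{L}$, satisfy

$$\{|r_{G_a,ii}|\}_{i=1}^{L} \prec_{\times} \{\lambda_{G_a,i}\}_{i=1}^{L}. \tag{5.20}$$

Note that both Q and P in (5.19) are real-valued matrices because $\Lambda_{G_a}$ is a real-valued diagonal matrix. Inserting (5.19) into (5.17) yields

$$G_a = \begin{bmatrix} U [\Lambda_G \,\vdots\, 0_{K \times (L-K)}] \Lambda_{G_a}^{-1} \\ \Omega_0 \Lambda_{G_a}^{-1} \end{bmatrix} Q R_{G_a} P^T \Omega_0^T. \tag{5.21}$$

Choose $\Omega_0 = P^T$ and define

$$Q_{G_a} = \begin{bmatrix} U [\Lambda_G \,\vdots\, 0_{K \times (L-K)}] \Lambda_{G_a}^{-1} \\ \Omega_0 \Lambda_{G_a}^{-1} \end{bmatrix} Q. \tag{5.22}$$

Then (5.21) can be rewritten as $G_a = Q_{G_a} R_{G_a}$, which is the QR decomposition of $G_a$. By Lemma 4.1.2, it follows that for $\alpha = 1$, (5.20) is equivalent to

$$\{\sqrt{1 + \rho_i}\}_{i=1}^{L} = \{r_{G_a,ii}\}_{i=1}^{L} \prec_{\times} \{\lambda_{G_a,i}\}_{i=1}^{L}, \tag{5.23}$$

where $\rho_i$, $1 \le i \le L$, denotes the output SINR of the ith subchannel, and $\lambda_{G_a,i}$ is given in (5.18). If

$$\{R_i\}_{i=1}^{L} = \{\log(1 + \rho_i)\}_{i=1}^{L} \prec_{+} \{\log \lambda_{G_a,i}^2\}_{i=1}^{L} = c, \tag{5.24}$$

then (5.20) and (5.23) hold, which implies the existence of $\Omega$ (the first K columns of $P^T$).
Conversely, suppose that there exists a semi-unitary matrix $\Omega$ such that the linear precoder $F = V \Phi^{1/2} \Omega^T$ and the MMSE-VBLAST detector yield L subchannels with capacities $\{R_k\}_{k=1}^{L}$. Let $G_a = Q_{G_a} R_{G_a}$ be the QR decomposition. It follows from Theorem 5.2.3 that (5.20) holds. Hence, by (5.23), we conclude that (5.24) holds. $\Box$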
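The condition of Theorem 5.3.1 is easy to test numerically. The following Python sketch builds the vector $c$ from the singular values and the power loading (with $\alpha = 1$), pads it with zeros, and checks additive majorization by comparing partial sums of the sorted vectors. The function name `tcd_rate_feasible`, the tolerance, and the example numbers are assumptions introduced for illustration only.

```python
import math

def tcd_rate_feasible(rates, sing_vals, phi, tol=1e-9):
    """Check whether rates (length L) are majorized by c = (C_1, ..., C_K, 0, ..., 0)."""
    L, K = len(rates), len(sing_vals)
    if L < K:
        raise ValueError("Theorem 5.3.1 assumes L >= K")
    # C_k = log2(1 + lambda_k^2 * phi_k), padded with L - K zeros.
    c = [math.log2(1.0 + p * s**2) for s, p in zip(sing_vals, phi)] + [0.0] * (L - K)
    r_sorted, c_sorted = sorted(rates, reverse=True), sorted(c, reverse=True)
    r_sum = c_sum = 0.0
    for r, ck in zip(r_sorted, c_sorted):
        r_sum += r
        c_sum += ck
        if r_sum > c_sum + tol:          # a partial sum of the rates is too large
            return False
    return abs(r_sum - c_sum) <= tol     # totals must agree (capacity lossless)

lam, phi = [1.0, 1.0], [7.0, 3.0]        # gives C = (3, 2) bps/Hz
print(tcd_rate_feasible([2.0, 2.0, 1.0], lam, phi))   # True
print(tcd_rate_feasible([3.5, 1.0, 0.5], lam, phi))   # False: 3.5 exceeds C_1 = 3
```

The base of the logarithm does not affect the outcome, since majorization is preserved under a common positive scaling of both vectors.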
The proof of Theorem 5.3.1 is constructive. Indeed, given the SVD of H and the power loading level $\Phi^{1/2}$, we only need to calculate $\Lambda_G$, $\Lambda_{G_a}$, and the GTD of $\Lambda_{G_a}$ given in (5.19). Then we immediately obtain the linear precoder

$$F = V \Phi^{1/2} \Omega^T = V [\Phi^{1/2} \,\vdots\, 0_{K \times (L-K)}] P. \tag{5.25}$$

Let $Q_{G_a}^{u}$ denote the first $M_r$ rows of $Q_{G_a}$. Then it follows from (5.22) that

$$Q_{G_a}^{u} = U [\Lambda_G \,\vdots\, 0_{K \times (L-K)}] \Lambda_{G_a}^{-1} Q = U [\Gamma \,\vdots\, 0_{K \times (L-K)}] Q, \tag{5.26}$$

where $\Gamma \in \mathbb{R}^{K \times K}$ is diagonal with its ith diagonal element being $\gamma_i = \lambda_{G,i} / \sqrt{1 + \lambda_{G,i}^2}$. According to Lemma 4.1.1, the nulling vectors are calculated as

$$w_i = r_{G_a,ii}^{-1}\, q_{G_a,i}^{u}, \quad 1 \le i \le L, \tag{5.27}$$

where $r_{G_a,ii}$ is the ith diagonal element of $R_{G_a}$ and $q_{G_a,i}^{u}$ is the ith column of $Q_{G_a}^{u}$.
In the GTD algorithm, P and Q are obtained via multiplication of $L - 1$ Givens rotation matrices. Hence calculating (5.25) and (5.26) needs $O(M_t(L+K))$ and $O(M_r(L+K))$ flops, respectively. We note that the decoding starts with the Lth layer, then the $(L-1)$th, and so on.
Given the SVD of H and the power allocation $\Phi$, the TCD-VBLAST scheme needs to run the procedures summarized in Table 5-1. If $M_t = M_r = K$, then the TCD-VBLAST scheme requires only $O(L^2 + K^2 + KL)$ flops, given the SVD of the channel matrix.









Table 5-1: The TCD-VBLAST Scheme
Step   Operation                                               Flops
1      Calculate $\Lambda_G = \Lambda \Phi^{1/2}$              $O(K)$
2      Obtain $\Lambda_{G_a}$ using (5.18)                     $O(K)$
3      Apply GTD to $\Lambda_{G_a}$ to obtain (5.19)           $O(L^2)$
4      Generate $F$ using (5.25)                               $O(M_t(L + K))$
5      Compute $Q_{G_a}^{u}$ using (5.26)                      $O(M_r(L + K))$
6      Calculate $\{w_i\}_{i=1}^{L}$ using (5.27)              $O(M_r L)$

5.3.2 TCD-DP
Similar to UCD, TCD also has two implementation forms, which are dual to each other. As the dual form of TCD-VBLAST, the TCD scheme can be implemented by using a DP precoder, which we refer to as TCD-DP. For TCD-DP, a direct construction of the linear precoder F as done in Section 5.3.1 is not obvious. Instead, we exploit the uplink-downlink duality revealed in [49] to obtain TCD-DP. This technique is also used in [59].
We first apply the TCD-VBLAST scheme to the reverse channel

y = H*Fx + z, (5.28)

where the roles of the transmitter and receiver are exchanged and the H in (5.1) is replaced by $H^*$. Then we obtain the precoder $F = [f_1, \ldots, f_L]$ and the equalizer $W = [w_1, \ldots, w_L]$ from $H^*$ according to (5.25) and (5.27), respectively. Applying F and the VBLAST detector with nulling vectors $\{w_i\}_{i=1}^{L}$, we obtain L subchannels

$$w_i^* y = w_i^* H^* f_i x_i + \sum_{j=1}^{i-1} w_i^* H^* f_j x_j + w_i^* z, \quad i = 1, \ldots, L, \tag{5.29}$$

where the ith subchannel (5.29) is free of interference from the jth ($j > i$) subchannels, which are detected and cancelled out in advance. The SINR of the subchannel (5.29) is

$$\rho_i = \frac{\sigma_x^2 |w_i^* H^* f_i|^2}{\sigma_z^2\, w_i^* w_i + \sigma_x^2 \sum_{j=1}^{i-1} |w_i^* H^* f_j|^2}. \tag{5.30}$$

Note that replacing $w_i$ by $\bar{w}_i$, which is obtained by scaling $w_i$ such that $\|\bar{w}_i\| = 1$, does not change $\rho_i$ since the output SINR is invariant to the length of $w_i$. Also note that $\alpha = 1$, i.e., $\sigma_x^2 = \sigma_z^2$. Hence (5.30) can be simplified to

$$\rho_i = \frac{|\bar{w}_i^* H^* f_i|^2}{1 + \sum_{j=1}^{i-1} |\bar{w}_i^* H^* f_j|^2}. \tag{5.31}$$









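Let $\bar{f}_i$, $i = 1, \ldots, L$, be the scaled version of $f_i$ with unit length. Denote $p_i = \|f_i\|^2$. Then

$$\rho_i = \frac{p_i\, |\bar{w}_i^* H^* \bar{f}_i|^2}{1 + \sum_{j=1}^{i-1} p_j\, |\bar{w}_i^* H^* \bar{f}_j|^2}, \quad i = 1, \ldots, L. \tag{5.32}$$

Let $a_{ij} = |\bar{f}_i^* H \bar{w}_j|^2$. Then (5.32) can be represented in the matrix form

$$\begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ -\rho_2 a_{12} & a_{22} & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ -\rho_L a_{1L} & -\rho_L a_{2L} & \cdots & a_{LL} \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_L \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_L \end{bmatrix}. \tag{5.33}$$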

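According to the uplink-downlink duality, in the original channel, the precoder of TCD-DP should be $\tilde{F} = [\sqrt{q_1}\,\bar{w}_1, \ldots, \sqrt{q_L}\,\bar{w}_L]$, where $\{q_i\}_{i=1}^{L}$ will be determined later in (5.37), and the receiving vectors are $\bar{f}_i$, $i = 1, \ldots, L$. Then we get L subchannels, and the ith scalar subchannel of the MIMO channel is

$$y_i = \bar{f}_i^* H \bar{w}_i \sqrt{q_i}\, x_i + \sum_{j=i+1}^{L} \bar{f}_i^* H \bar{w}_j \sqrt{q_j}\, x_j + \sum_{j=1}^{i-1} \bar{f}_i^* H \bar{w}_j \sqrt{q_j}\, x_j + \bar{f}_i^* z. \tag{5.34}$$

Applying the dirty paper precoder to $x_i$ and treating $\sum_{j=1}^{i-1} \bar{f}_i^* H \bar{w}_j \sqrt{q_j}\, x_j$ as the interference known at the transmitter (note that here we precode the first layer first, while for TCD-VBLAST we detect the Lth layer first), we obtain an equivalent subchannel

$$y_i = \bar{f}_i^* H \bar{w}_i \sqrt{q_i}\, x_i + \sum_{j=i+1}^{L} \bar{f}_i^* H \bar{w}_j \sqrt{q_j}\, x_j + \bar{f}_i^* z \tag{5.35}$$

with SINR (again, recall that $\alpha = 1$ and $\sigma_x^2 = \sigma_z^2$)

$$\rho_i = \frac{q_i\, |\bar{f}_i^* H \bar{w}_i|^2}{1 + \sum_{j=i+1}^{L} q_j\, |\bar{f}_i^* H \bar{w}_j|^2}, \quad i = 1, 2, \ldots, L. \tag{5.36}$$

Similar to (5.32), (5.36) can also be represented as

$$\begin{bmatrix} a_{11} & -\rho_1 a_{12} & \cdots & -\rho_1 a_{1L} \\ 0 & a_{22} & \cdots & -\rho_2 a_{2L} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & a_{LL} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_L \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_L \end{bmatrix}. \tag{5.37}$$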
It is easy to see that $q_i > 0$, $1 \le i \le L$. It is proven in [49] that $\sum_{i=1}^{L} q_i = \mathrm{tr}(\tilde{F}\tilde{F}^*) = \mathrm{tr}(FF^*) = \sum_{i=1}^{L} p_i$. That is, to obtain L subchannels with SINRs $\{\rho_i\}_{i=1}^{L}$, TCD-DP needs exactly the same power as TCD-VBLAST. To make this chapter self-contained, we give below an alternative proof of this interesting and useful fact.
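Let $U_A$ denote a strictly upper triangular matrix whose $(i,j)$th entry is $a_{ij}$ for $1 \le i < j \le L$ and zero otherwise. Let $D_A$ and $D_\rho$ be two $L \times L$ diagonal matrices with their ith diagonal elements equal to $a_{ii}$ and $\rho_i$, respectively. Then (5.32) can be rewritten as

$$(D_A - D_\rho U_A^T)\, p = \rho \tag{5.38}$$

or equivalently

$$(D_\rho^{-1} D_A - U_A^T)\, p = \mathbf{1}, \tag{5.39}$$

where $p = [p_1, \ldots, p_L]^T$, $\rho = [\rho_1, \ldots, \rho_L]^T$, and $\mathbf{1}$ is a vector with unit elements. Hence

$$p = (D_\rho^{-1} D_A - U_A^T)^{-1} \mathbf{1}. \tag{5.40}$$

Similarly, (5.37) can be rewritten as

$$(D_A - D_\rho U_A)\, q = \rho \tag{5.41}$$

or

$$(D_\rho^{-1} D_A - U_A)\, q = \mathbf{1}. \tag{5.42}$$

Hence

$$q = (D_\rho^{-1} D_A - U_A)^{-1} \mathbf{1}. \tag{5.43}$$

From (5.40) and (5.43), and because a scalar equals its own transpose,

$$\sum_{i=1}^{L} p_i = \mathbf{1}^T (D_\rho^{-1} D_A - U_A^T)^{-1} \mathbf{1} = \mathbf{1}^T (D_\rho^{-1} D_A - U_A)^{-1} \mathbf{1} = \sum_{i=1}^{L} q_i. \tag{5.44}$$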
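The power identity (5.44) can be checked numerically. The Python sketch below solves the lower triangular system (5.33) by forward substitution and the upper triangular system (5.37) by back substitution, then compares the total powers. The coupling coefficients $a_{ij}$ and the target SINRs are made-up positive numbers chosen only for illustration (in the scheme itself they would come from $a_{ij} = |\bar{f}_i^* H \bar{w}_j|^2$), and the function names are hypothetical.

```python
def vblast_powers(a, rho):
    """Solve (5.33): a[i][i]*p[i] - rho[i] * sum_{j<i} a[j][i]*p[j] = rho[i]."""
    L = len(rho)
    p = [0.0] * L
    for i in range(L):                      # forward substitution
        interference = sum(a[j][i] * p[j] for j in range(i))
        p[i] = rho[i] * (1.0 + interference) / a[i][i]
    return p

def dp_powers(a, rho):
    """Solve (5.37): a[i][i]*q[i] - rho[i] * sum_{j>i} a[i][j]*q[j] = rho[i]."""
    L = len(rho)
    q = [0.0] * L
    for i in range(L - 1, -1, -1):          # back substitution
        interference = sum(a[i][j] * q[j] for j in range(i + 1, L))
        q[i] = rho[i] * (1.0 + interference) / a[i][i]
    return q

# Illustrative coefficients; only the entries a[i][j] with i <= j are used.
a = [[1.0, 0.3, 0.2],
     [0.0, 0.8, 0.5],
     [0.0, 0.0, 1.2]]
rho = [3.0, 1.0, 2.0]
p, q = vblast_powers(a, rho), dp_powers(a, rho)
print(p, sum(p))
print(q, sum(q))    # the individual powers differ, but the two sums agree
```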
We can use the Tomlinson-Harashima precoder [42][43] or the trellis precoder [44] to achieve known interference cancellation at the transmitter. For a system with high dimensionality, TCD-DP is a better choice than TCD-VBLAST since it is free of error propagation.
5.4 MIMO Communications with QoS Constraints
In this section, we apply the TCD scheme to MIMO communications with QoS constraints. Suppose we want to transmit $L \ge K$ independent data streams through a MIMO channel. Instead of multiplexing all the substreams in a time division manner to share the entire MIMO channel, we apply TCD to decompose the MIMO channel into multiple subchannels whose capacities/SINRs meet the QoS requirements of the substreams, and dedicate one subchannel to each substream. In [28], the authors studied the same problem. They proposed a linear transceiver design which, similar to TCD, can also control the SINR of each subchannel via designing the precoder. However, the linear transceiver is capacity lossy and can suffer from considerable performance degradation compared with our TCD scheme, as we will show at the end of this section.
Given that all the subchannels meet the QoS constraints, we want to minimize the overall input power. We need to solve the following optimization problem:

$$\min_{F}\ \mathrm{tr}(FF^*) \quad \text{subject to} \quad \begin{bmatrix} HF \\ I_L \end{bmatrix} = QR, \quad \mathrm{diag}(R) = \{\sqrt{1 + \rho_i}\}_{i=1}^{L}. \tag{5.45}$$

Here QR denotes the QR decomposition and diag(R) denotes the vector formed by the diagonal of R. According to Lemma 4.1.2, the diagonal of R determines the SINRs of the subchannels. Without loss of generality, we assume that $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_L$. We now consider a problem whose constraints are more relaxed than those of (5.45):

$$\min_{F}\ \mathrm{tr}(FF^*) \quad \text{subject to} \quad \lambda_{G_a} \succ_{\times} \{\sqrt{1 + \rho_i}\}_{i=1}^{L}, \quad G_a = \begin{bmatrix} HF \\ I_L \end{bmatrix}, \tag{5.46}$$

where $\lambda_{G_a}$ stands for the singular values of the augmented matrix $G_a$. In general, for any matrix A, we let $\lambda_A$ denote the singular values of A. By Theorem 5.2.3, if F is feasible in (5.45), then F is feasible in (5.46). We now further simplify (5.46) and show that its solution provides a solution of (5.45).
Theorem 5.4.1 If $H = U \Lambda V^*$ is the singular value decomposition of H, then (5.46) has a solution of the form $F = V \Phi^{1/2}$, where $\Phi \in \mathbb{R}^{K \times K}$ is a diagonal matrix with diagonal elements $\phi_i$, $1 \le i \le K$, chosen to solve the problem

$$\begin{aligned} \min_{\phi}\ & \sum_{i=1}^{K} \phi_i \\ \text{subject to}\ & \prod_{i=1}^{k} (1 + \lambda_{H,i}^2 \phi_i) \ge \prod_{i=1}^{k} (1 + \rho_i), \quad \phi_k \ge \phi_{k+1} \ge 0, \quad 1 \le k < K, \\ & \prod_{i=1}^{K} (1 + \lambda_{H,i}^2 \phi_i) = \prod_{i=1}^{L} (1 + \rho_i). \end{aligned} \tag{5.47}$$

Moreover, if $Q R_{G_a} P^T$ is the GTD of $\Lambda_{G_a}$ in (5.19), then (5.45) has the solution $F = V \Phi^{1/2} \Omega^T$, where $\Phi$ is a solution of (5.47) and $\Omega$ is the matrix formed by the first K columns of $P^T$.
Proof: See Appendix A. $\Box$
We now develop an efficient algorithm for solving (5.47). We will see that the constraint $\phi_k \ge \phi_{k+1}$ can be omitted since it is automatically satisfied at a minimizer of (5.47). To begin, we make a change of variables to further simplify the formulation of (5.47). We define

$$\psi_i = \phi_i + \frac{1}{\lambda_{H,i}^2},\ 1 \le i \le K; \qquad \beta_i = \frac{1 + \rho_i}{\lambda_{H,i}^2},\ 1 \le i < K; \qquad \beta_K = \frac{1}{\lambda_{H,K}^2} \prod_{i=K}^{L} (1 + \rho_i). \tag{5.48}$$

With these definitions, (5.47) reduces to

$$\min_{\psi}\ \sum_{i=1}^{K} \psi_i \quad \text{subject to} \quad \prod_{i=1}^{k} \psi_i \ge \prod_{i=1}^{k} \beta_i, \quad \psi_k \ge \frac{1}{\lambda_{H,k}^2}, \quad 1 \le k \le K. \tag{5.49}$$

Both the equality constraint and the inequalities $\phi_k \ge \phi_{k+1}$ in (5.47) have been dropped since these constraints are automatically satisfied at an optimum. The fact that $\phi_k \ge \phi_{k+1}$ is established after Lemma 5.4.2. With regard to the equality constraint, if $\psi$ is feasible in (5.49) and the inequality corresponding to $k = K$ is strict, then the cost is reduced when the trailing components of $\psi$ are lowered.
Clearly, the feasible set for (5.49) is nonempty and the cost function tends to infinity as any of the components of $\psi$ tends to infinity. By continuity of the cost function and the constraints, a minimizer must exist. We now analyze the structure of the minimizer. By exploiting the structure, we obtain a fast algorithm for solving (5.49).
We first study a similar optimization problem with relaxed constraints.
Lemma 5.4.1 Any solution $\psi$ of the problem

$$\min_{\psi}\ \sum_{i=1}^{K} \psi_i \quad \text{subject to} \quad \prod_{i=1}^{k} \psi_i \ge \prod_{i=1}^{k} \beta_i, \quad 1 \le k \le K, \tag{5.50}$$

has the property that $\psi_{i+1} \le \psi_i$ for each i.

Proof: We replace the inequalities in (5.50) by the equivalent constraints obtained by taking logs:

$$\sum_{i=1}^{k} \log(\psi_i) \ge \sum_{i=1}^{k} \log(\beta_i), \quad 1 \le k \le K.$$

The Lagrangian $\mathcal{L}$ associated with (5.50), after this modification of the constraints, is

$$\mathcal{L}(\psi, \mu) = \sum_{k=1}^{K} \left( \psi_k - \mu_k \sum_{i=1}^{k} \big( \log(\psi_i) - \log(\beta_i) \big) \right).$$

By the first-order optimality conditions associated with $\psi$, there exists $\mu \ge 0$ with the property that the gradient of the Lagrangian with respect to $\psi$ vanishes. Equating to zero the partial derivative of the Lagrangian with respect to $\psi_j$, we obtain the relations

$$\psi_j = \sum_{i=j}^{K} \mu_i, \quad j = 1, \ldots, K.$$

Hence, $\psi_j - \psi_{j+1} = \mu_j \ge 0$. $\Box$
Using Lemma 5.4.1, we can gain insights into the structure of a solution to (5.49).
Lemma 5.4.2 There exists a solution $\psi$ to (5.49) with the property that for some integer $j \in [1, K]$,

$$\psi_{i+1} \le \psi_i \ \text{for all } i < j, \qquad \psi_{i+1} \ge \psi_i \ \text{for all } i \ge j, \qquad \psi_i = \frac{1}{\lambda_{H,i}^2} \ \text{for all } i > j. \tag{5.51}$$

In particular, $\psi_j \le \psi_i$ for all i.
Proof: If $\psi$ is a solution of (5.49) with the property that $\psi_i > 1/\lambda_{H,i}^2$ for all $1 \le i \le K$, then by the convexity of the constraints, it follows that $\psi$ is a solution of (5.50). By Lemma 5.4.1, we conclude that Lemma 5.4.2 holds with $j = K$. Now, suppose that $\psi$ is a solution of (5.49) with $\psi_i = 1/\lambda_{H,i}^2$ for some i. We wish to show that $\psi_k = 1/\lambda_{H,k}^2$ for all $k > i$. Suppose, to the contrary, that there exists an index $k \ge i$ with the property that $\psi_k = 1/\lambda_{H,k}^2$ and $\psi_{k+1} > 1/\lambda_{H,k+1}^2$. We show that components $k$ and $k+1$ of $\psi$ can be modified so as to satisfy the constraints and make the cost strictly smaller. In particular, let $\psi(\epsilon)$ be identical with $\psi$ except for components $k$ and $k+1$:

$$\psi_k(\epsilon) = (1 + \epsilon)\, \psi_k \quad \text{and} \quad \psi_{k+1}(\epsilon) = \frac{\psi_{k+1}}{1 + \epsilon}. \tag{5.52}$$

For $\epsilon > 0$ small, $\psi(\epsilon)$ satisfies the constraints of (5.49). The change $\Delta(\epsilon)$ in the cost function of (5.49) is

$$\Delta(\epsilon) = (1 + \epsilon)\, \psi_k + \frac{\psi_{k+1}}{1 + \epsilon} - \psi_k - \psi_{k+1}.$$

The derivative of $\Delta(\epsilon)$ evaluated at zero is

$$\Delta'(0) = \psi_k - \psi_{k+1}.$$

Since $1/\lambda_{H,k}^2$ is an increasing function of k and since $\psi_k = 1/\lambda_{H,k}^2$, we conclude that $\psi_{k+1} > \psi_k$ and $\Delta'(0) < 0$. Hence, for $\epsilon > 0$ near zero, $\psi(\epsilon)$ has a smaller cost than $\psi(0)$, which yields a contradiction. Hence, there exists an index j with the property that $\psi_i = 1/\lambda_{H,i}^2$ for all $i > j$ and $\psi_i > 1/\lambda_{H,i}^2$ for all $i \le j$.
According to Lemma 5.4.1, $\psi_i \ge \psi_{i+1}$ for any $i < j$. To complete the proof, we need to show that $\psi_j \le \psi_{j+1}$. As noted previously, any solution of (5.49) satisfies

$$\prod_{i=1}^{K} \psi_i = \prod_{i=1}^{K} \beta_i,$$

which implies (cf. (5.48))

$$\prod_{i=1}^{j} \psi_i = \prod_{i=1}^{j} \beta_i \prod_{i=j+1}^{K} \big( \beta_i \lambda_{H,i}^2 \big) > \prod_{i=1}^{j} \beta_i.$$

That is, the constraint $\prod_{i=1}^{j} \psi_i \ge \prod_{i=1}^{j} \beta_i$ in (5.49) is inactive. If $\psi_j > \psi_{j+1}$, we will decrease the jth component and increase the $(j+1)$th component, while leaving the other components unchanged. Letting $\psi(\delta)$ be the modified vector, we set

$$\psi_{j+1}(\delta) = (1 + \delta)\, \psi_{j+1} \quad \text{and} \quad \psi_j(\delta) = \frac{\psi_j}{1 + \delta}.$$

Since the jth constraint in (5.49) is inactive, $\psi(\delta)$ is feasible for $\delta$ near zero. And if $\psi_j > \psi_{j+1}$, then the cost decreases as $\delta$ increases. It follows that $\psi_j \le \psi_{j+1}$. $\Box$
By Lemma 5.4.2, $\psi_i$ is a decreasing function of i for $i \in [1, j]$, while $\psi_i = 1/\lambda_{H,i}^2$ for $i > j$. Since $\lambda_{H,i}$ is a decreasing function of i, it follows that $\phi_i = \psi_i - 1/\lambda_{H,i}^2$ is a decreasing function of i for $i \in [1, j]$ with $\phi_i \ge 0$, while $\phi_i = 0$ for $i > j$. Hence, $\phi_i$ is a decreasing function of $i \in [1, K]$. In particular, the constraint $\phi_k \ge \phi_{k+1}$ in (5.47) is automatically satisfied by the associated solution characterized in Lemma 5.4.2.
We refer to the index j in Lemma 5.4.2 as the "break point." At the break point, the lower bound constraint $\psi_i \ge 1/\lambda_{H,i}^2$ changes from inactive to active. We now use Lemma 5.4.2 to obtain an algorithm for (5.49).
Lemma 5.4.3 Let $\gamma_k$ denote the kth geometric mean of the $\beta_i$:

$$\gamma_k = \left( \prod_{i=1}^{k} \beta_i \right)^{1/k},$$

and let $l$ denote an index for which $\gamma_k$ is the largest:

$$l = \arg\max \{\gamma_k : 1 \le k \le K\}. \tag{5.53}$$

If $\gamma_l \ge 1/\lambda_{H,l}^2$, then putting $\psi_i = \gamma_l$ for all $i \le l$ is optimal in (5.49). If $\gamma_l < 1/\lambda_{H,l}^2$, then $\psi_i = 1/\lambda_{H,i}^2$ for all $i \ge l$ at an optimal solution of (5.49).
Proof: See Appendix B. $\Box$
Based on Lemma 5.4.3, we can use the following strategy to solve (5.49). We form the geometric means described in Lemma 5.4.3 and we evaluate $l$. If $\gamma_l \ge 1/\lambda_{H,l}^2$, then we set $\psi_i = \gamma_l$ for $i \le l$, and we simplify (5.49) by removing $\psi_i$, $1 \le i \le l$, from the problem. If $\gamma_l < 1/\lambda_{H,l}^2$, then we set $\psi_i = 1/\lambda_{H,i}^2$ for $i \ge l$, and we simplify (5.49) by removing $\psi_i$, $l \le i \le K$, from the problem. The Matlab code TCDPow implementing this algorithm appears in Figure 5-2.

    function psi = TCDPow (beta, lambda)
    % psi: solution of (5.49); beta: from (5.48); lambda: singular values of H
    L = 1 ; R = length (beta) ; psi = zeros (1, R) ;
    zeta = cumsum (log (beta)) ;                  % running sums of log(beta)
    while R >= L
        [t, l] = max (zeta(L:R)./[1:R-L+1]) ;     % largest log geometric mean
        gamma_l = exp (t) ; L1 = L + l - 1 ;
        if gamma_l >= 1/lambda(L1)^2
            psi(L:L1) = gamma_l ;
            L = L1 + 1 ;
            zeta(L:R) = zeta(L:R) - zeta(L-1) ;
        else
            psi(L1:R) = 1./(lambda(L1:R).^2) ;
            zeta(L1-1) = zeta(R) - sum (log (psi(L1:R))) ;
            R = L1 - 1 ;
        end
    end

Figure 5-2: A Matlab function to solve (5.49).
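For readers without Matlab, the strategy above can be sketched in plain Python. This is a hedged port of the TCDPow routine of Figure 5-2 as we read it, using 0-based indices; the names `make_beta` and `tcd_pow`, the example values, and the small guard against an empty head range are assumptions added for this illustration and are not part of the original routine.

```python
import math

def make_beta(rho, lam):
    """Build beta_1..beta_K from (5.48); rho has length L >= K = len(lam)."""
    K = len(lam)
    beta = [(1.0 + rho[i]) / lam[i] ** 2 for i in range(K - 1)]
    tail = 1.0
    for r in rho[K - 1:]:
        tail *= 1.0 + r
    beta.append(tail / lam[K - 1] ** 2)
    return beta

def tcd_pow(beta, lam):
    """Solve (5.49) following the strategy of Lemma 5.4.3."""
    K = len(beta)
    psi = [0.0] * K
    zeta, total = [], 0.0            # running sums of log(beta)
    for b in beta:
        total += math.log(b)
        zeta.append(total)
    lo, hi = 0, K - 1                # active index range [lo, hi]
    while hi >= lo:
        # log geometric means over the active range; take the first largest one
        means = [zeta[i] / (i - lo + 1) for i in range(lo, hi + 1)]
        t = max(means)
        l1 = lo + means.index(t)
        gamma = math.exp(t)
        if gamma >= 1.0 / lam[l1] ** 2:
            for i in range(lo, l1 + 1):
                psi[i] = gamma       # common level on the leading block
            base = zeta[l1]
            lo = l1 + 1
            for i in range(lo, hi + 1):
                zeta[i] -= base      # restart the running sums after the block
        else:
            for i in range(l1, hi + 1):
                psi[i] = 1.0 / lam[i] ** 2          # zero power on the tail
            if l1 > lo:              # guard added in this sketch
                # fold the fixed tail into the last remaining product constraint
                zeta[l1 - 1] = zeta[hi] - sum(math.log(psi[i]) for i in range(l1, hi + 1))
            hi = l1 - 1
    return psi

lam_a = [2.0, 1.0]
psi_a = tcd_pow(make_beta([3.0, 3.0], lam_a), lam_a)     # both levels close to 2
lam_b = [2.0, 0.1]
psi_b = tcd_pow(make_beta([3.0, 3.0], lam_b), lam_b)     # close to [4, 100]
phi_b = [p - 1.0 / s ** 2 for p, s in zip(psi_b, lam_b)] # second subchannel gets ~0 power
print(psi_a, psi_b, phi_b)
```

In the first example both subchannels share one level, which is the ordinary water filling path; in the second, the weak subchannel is switched off and the strong one carries both SINR targets.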
After obtaining the power loading levels $\phi_i = \psi_i - 1/\lambda_{H,i}^2$, $1 \le i \le K$, we calculate the precoder F and the nulling vectors $\{w_i\}_{i=1}^{L}$ according to Table 5-1 in Section 5.3. Note that one of the possible paths through the TCDPow routine makes the leading elements of $\psi$ all equal while setting the trailing elements to $\psi_i = 1/\lambda_{H,i}^2$. This path coincides with the standard water filling algorithm. In this case, the TCD scheme is optimal in terms of maximizing the overall throughput given the input power. On the other hand, if some substream has a very high prescribed SINR such that the $l$ given in (5.53) is less than the "break point" j, then $\phi$ becomes a multi-level water filling power allocation, which suffers from overall capacity loss. This happens when the target rate vector $[R_1, \ldots, R_L]$ falls outside the convex hull spanned by the $L!$ permutations of $[C_1, \ldots, C_K, 0, \ldots, 0]$ (cf. Figure 5-1), where $C_k$, $k = 1, \ldots, K$, are the capacities of the eigen-subchannels with water filling power allocation. As a remedy, one can "break" (if it is practically allowable) the oversized substream into more than one substream with smaller rates or, equivalently, lower SINR requirements. Note that TCD can decompose a MIMO channel into an arbitrarily large number of subchannels.
An interesting special case is $\rho_1 = \rho_2 = \cdots = \rho_L$, i.e., the substreams share the same SINR requirement. In this case, $\beta_1 \le \beta_2 \le \cdots \le \beta_K$ since the singular values $\{\lambda_{H,i}\}_{i=1}^{K}$ are in nonincreasing order, and TCDPow yields a standard water filling solution. In this case, TCD becomes UCD.
We present two numerical examples to conclude this section. In the first example, we assume independent Rayleigh flat fading channels with $M_t = 5$ and $M_r = 6$. We consider equal QoS requirements for $L = 5$ independent substreams. Figure 5-3 compares the input power needed by our TCD scheme and the linear transceiver scheme of [28]. Our scheme can save about 2.5 dB for any prescribed output SINR.

[Figure 5-3 plot not reproduced. Panel title: "$M_r = 6$, $M_t = 5$ i.i.d. Rayleigh flat fading"; curves: "Linear TxRx" and "TCD"; x-axis: prescribed output SINR (dB), from 0 to 25.]

Figure 5-3: Input SNR vs. output SINR. The result is based on the average of 500 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with $M_t = 5$ and $M_r = 6$.


In the second example, we consider a rank-two MIMO channel with singular values $\lambda_1, \lambda_2$. Suppose we want to decompose the MIMO channel into 2 subchannels with capacities $C_1$ and $C_2$ with $C_1 + C_2 = 10$ bps/Hz. We consider three scenarios with ($\lambda_1 = 2$, $\lambda_2 = 1$), ($\lambda_1 = 5$, $\lambda_2 = 1$), and ($\lambda_1 = 10$, $\lambda_2 = 1$). For all three cases, there is an inflection point beyond which our TCD is the same as the linear design of [28]. That is because when the two subchannels have very disparate QoS constraints, i.e., $C_1$ is far larger than $C_2$, the optimal strategy is to apply SVD to the channel matrix and transmit data through the orthogonal eigen-subchannels. (In this case, $\Omega = I$ (cf. (5.13)).) If the subchannels' QoS constraints are not too disparate, which corresponds to the region to the left of the inflection point, the required input power of our TCD scheme is invariant with respect to $C_1, C_2$ and is strictly less than that needed by the linear design. This region corresponds to the capacity lossless region (cf. Figure 5-1). Another interesting point is that the relative advantage of TCD is more prominent if the singular values $\lambda_1, \lambda_2$ become more disparate.

[Figure 5-4 plot not reproduced. Curves: "Linear TxRx" and "TCD" for $(\lambda_1, \lambda_2) = (2, 1)$, $(5, 1)$, and $(10, 1)$; x-axis: $C_1 = 10 - C_2$, from 5 to 9.]

Figure 5-4: Input SNR vs. $C_1$. A rank 2 channel is decomposed into two subchannels with capacities $C_1$ and $C_2 = 10 - C_1$.

5.5 CDMA Sequence Design
As we have shown in Section 2.1.1, the CDMA sequence design problem can be viewed as a special case of the MIMO transceiver design. In an idealized S-CDMA system where the channel does not experience any fading or near-far effect, L mobile users modulate their information symbols via spreading sequences $\{s_i\}_{i=1}^{L}$, each of which has the processing gain N. The discrete-time baseband S-CDMA signal received at the (single-antenna) base station can be represented as [31]

$$y = Sx + z, \tag{5.54}$$

where $S = [s_1, \ldots, s_L] \in \mathbb{R}^{N \times L}$ and the $l$th ($1 \le l \le L$) entry of x, $x_l$, stands for the information symbol from the $l$th user. In the downlink channel, the base station multiplexes the information dedicated to the L mobile users through the spreading sequences, which are the columns of S. Then, all the mobiles receive the same signal given in (5.54). We remark that (5.54) can also be written as (4.1) with $H = I_N$ and $F = S$. Here $M_r = M_t = N$ is the processing gain. Hence, optimizing the spreading sequences amounts to optimizing the precoder F for a MIMO system. Indeed, due to the simple channel matrix ($H = I$), some procedures of the TCD scheme can be simplified. We shall show that the TCD scheme turns out to be an improved solution to the sequence design proposed in [56]. At the end of this section, we will compare our TCD scheme and the scheme proposed in [56].
5.5.1 CDMA Sequences Maximizing Sum Capacity
Recall that the precoder maximizing the overall MIMO channel capacity is $F = V \Phi^{1/2} \Omega^T$, where $\Phi$ is obtained by the water filling algorithm. For an S-CDMA channel, $H = I$, so $V = I$ and the optimal power loading is the uniform power allocation $\Phi = \phi I_N$. Hence the CDMA sequences maximizing the sum capacity are $S = \sqrt{\phi}\, \Omega^T$. Since $\Omega$ has orthonormal columns, we obtain $SS^T = \phi I$. This observation coincides with the findings in [31], in which the authors show that the CDMA sequences maximizing the sum capacity are the Welch-Bound-Equality sequences.
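As a small illustration of the property $SS^T \propto I$, consider $L = 3$ unit-norm sequences of length $N = 2$ whose directions are 120 degrees apart. This particular set of sequences is a hypothetical example chosen for the sketch, not one taken from the text; the Python code below simply forms $SS^T$ and shows that it equals $(L/N)\, I$.

```python
import math

def gram_rows(S):
    """Return S S^T for a matrix S given as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in S] for r1 in S]

N, L = 2, 3
# Hypothetical example: L = 3 unit-norm sequences of length N = 2, 120 degrees apart.
angles = [2.0 * math.pi * k / L for k in range(L)]
S = [[math.cos(t) for t in angles],     # first row of S (first chip of each sequence)
     [math.sin(t) for t in angles]]     # second row of S

G = gram_rows(S)
print(G)    # approximately [[1.5, 0.0], [0.0, 1.5]], i.e. (L/N) * I
```

Dividing $S^T$ by $\sqrt{L/N}$ gives a matrix with orthonormal columns, which plays the role of $\Omega$ in $S = \sqrt{\phi}\, \Omega^T$ with $\phi = L/N$.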
5.5.2 Uplink Case
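For the uplink scenario, i.e., the mobiles to base station case, the base station calculates the optimal CDMA sequences for each mobile user and the associated successive nulling vectors needed by itself. Then the base station informs the mobile users of their designated CDMA sequences.
First, we need to calculate the power loading levels $\Phi \in \mathbb{R}^{N \times N}$ such that the following GTD matrix decomposition is possible:

$$\Phi_a \triangleq \begin{bmatrix} [\Phi^{1/2} \,\vdots\, 0_{N \times (L-N)}] \\ I_L \end{bmatrix} = Q R P^T, \tag{5.55}$$

where the diagonal elements of R, $r_{ii}$, $i = 1, 2, \ldots, L$, satisfy the QoS constraints. Note that the singular values of $\Phi_a$ form a sequence whose first N elements are $\sqrt{1 + \phi_i}$, $i = 1, 2, \ldots, N$, followed by $L - N$ ones. From Theorem 5.2.3, (5.55) exists if and only if

$$\left( \{\sqrt{1 + \phi_i}\}_{i=1}^{N}, 1, \ldots, 1 \right) \succ_{\times} \{\sqrt{1 + \rho_i}\}_{i=1}^{L}. \tag{5.56}$$

Similar to (5.47), we need to solve the problem

$$\min_{\phi}\ \sum_{i=1}^{N} \phi_i \quad \text{subject to} \quad \left( \{\sqrt{1 + \phi_i}\}_{i=1}^{N}, 1, \ldots, 1 \right) \succ_{\times} \{\sqrt{1 + \rho_i}\}_{i=1}^{L}, \quad \phi_i \ge 0\ \ \forall i. \tag{5.57}$$

Similar to (5.49), (5.57) can be further simplified using the variables

$$\psi_i = \phi_i + 1, \qquad \beta_i = 1 + \rho_i \ \text{for } i < N, \qquad \text{and} \qquad \beta_N = \prod_{i=N}^{L} (1 + \rho_i).$$

The simplified problem is

$$\min_{\psi}\ \sum_{i=1}^{N} \psi_i \quad \text{subject to} \quad \prod_{i=1}^{k} \psi_i \ge \prod_{i=1}^{k} \beta_i, \quad \psi_k \ge 1, \quad 1 \le k \le N. \tag{5.58}$$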
The algorithm TCDPow simplifies immensely when we apply it to (5.58). Since $\beta_i \ge 1$ for all i, the constraints $\psi_i \ge 1$ are inactive. Since $\beta_i \le \beta_{i-1}$ for all $i < N$, the geometric means satisfy $\gamma_i \le \gamma_{i-1}$ for all $i < N$. Hence, in Lemma 5.4.3, the value of $l$ is either 1 or N. If $l = 1$, then we set $\psi_1 = \beta_1$ and we remove $\psi_1$ from the problem. If $l = N$, then $\psi_i = \gamma_N$ for all i. It follows that there exists an index j with the property that

$$\psi_i = \beta_i \ \text{for all } i \le j \quad \text{and} \quad \psi_i = \left( \prod_{k=j+1}^{N} \beta_k \right)^{1/(N-j)} \ \text{for all } i > j.$$

This observation coincides with the solution obtained in [56].
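The simplified rule above can be sketched directly in Python, without calling the general TCDPow routine. The function name `cdma_power_levels` and the example SINR targets are assumptions for illustration; the input SINRs are assumed to be sorted in nonincreasing order, as in the text, and $N \le L$.

```python
def cdma_power_levels(rho, N):
    """Return psi_1..psi_N for (5.58), given SINR targets rho (nonincreasing) and processing gain N."""
    beta = [1.0 + r for r in rho[:N - 1]]
    tail = 1.0
    for r in rho[N - 1:]:
        tail *= 1.0 + r
    beta.append(tail)                       # beta_N collects users N, ..., L
    psi = []
    j = 0
    while j < N:
        remaining = beta[j:]
        prod = 1.0
        for b in remaining:
            prod *= b
        gm = prod ** (1.0 / len(remaining)) # geometric mean of the remaining beta
        if beta[j] > gm:                    # l = 1: this user keeps its own level
            psi.append(beta[j])
            j += 1
        else:                               # l = N: the rest share one common level
            psi.extend([gm] * len(remaining))
            break
    return psi

psi_eq = cdma_power_levels([1.0, 1.0, 1.0], N=2)     # equal targets: one common level
psi_over = cdma_power_levels([15.0, 1.0, 1.0], N=2)  # one very demanding user
phi_over = [p - 1.0 for p in psi_over]               # power levels phi_i = psi_i - 1
print(psi_eq, psi_over, phi_over)                    # psi_over -> [16.0, 4.0]
```

In the second example the first user alone fixes $\psi_1 = \beta_1$, and only the remaining users share a common level, which is the index-$j$ structure described above.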
Let $\Psi$ denote an $L \times L$ identity matrix with its first N diagonal elements replaced by $\psi_i$, $1 \le i \le N$. According to the TCD scheme presented in Section 5.3.1, we then apply the GTD algorithm to $\Psi^{1/2}$ to obtain

$$\Psi^{1/2} = Q R_{\Psi} P_{\Psi}^T. \tag{5.59}$$

According to (5.25),
$$S = F = [\Phi^{1/2} \,\vdots\, 0_{N \times (L-N)}]\, P_{\Psi}. \tag{5.60}$$

Let
$$[v_1, \ldots, v_L] = [\Phi^{1/2} \,\vdots\, 0_{N \times (L-N)}]\, \Psi^{-1/2} Q. \tag{5.61}$$








By (5.26) and (5.27), the nulling vectors used at the base station are

\[ w_i = r_{ii}^{-1} v_i, \quad i = 1, \ldots, L, \tag{5.62} \]

where $r_{ii}$ is the $i$th diagonal element of $R$. In summary, the base station needs to run the following three steps:
1. Solve the optimization problem (5.58).
2. Apply the GTD algorithm to $\Psi^{1/2}$ in (5.59).
3. Obtain the spreading sequences for all mobile users, $[s_1, \ldots, s_L] = S$, and the nulling vectors $\{w_i\}_{i=1}^{L}$ (cf. (5.60) and (5.62)) for the base station.
5.5.3 Downlink Case
In the downlink case, the mobiles cannot cooperate with each other for decision feedback. Hence VBLAST detection is impractical at the receivers. However, we can apply TCD-DP, as introduced in Section 5.3.2, to cancel out known interference at the transmitter, i.e., the base station. We can convert the downlink problem into an uplink one and exploit the downlink-uplink duality as we have done in Section 5.3.2. Note that $H = H^{*} = I$, i.e., the downlink and uplink channels are the same. Consider the case where the uplink and downlink communications are symmetric, i.e., for each mobile user, the QoS of the communications from the user to the base station and from the base station to the user are the same. After obtaining the spreading sequences $[s_1, \ldots, s_L]$ for the mobile users and the nulling vectors $[w_1, \ldots, w_L]$ used at the base station for the uplink case, we immediately know that in the downlink case the spreading sequences transmitted from the base station are exactly $[w_1, \ldots, w_L]$ and the nulling vectors used at the mobiles are the spreading sequences $[s_1, \ldots, s_L]$ used in the uplink case. The only parameters we need to calculate are $q_1, \ldots, q_L$ (cf. (5.37)). Hence in this symmetric case, the base station only needs to inform the mobiles of their designated spreading sequences once in the two-way communications. Each mobile uses the same sequence for both data transmission in the uplink channel and interference nulling in the downlink channel.
5.5.4 Numerical Example
We present one numerical example to show how TCD can be applied to CDMA sequence design. We consider an example where there are $L = 4$ mobile users and the processing gain is $N = 3$. The prescribed SINRs of the four users are 20, 19, 18, and 17 dB, respectively. For the uplink case, we apply the TCD-VBLAST scheme to obtain the spreading sequences of the four users as the columns of the matrix

\[ S = \begin{pmatrix} 10.0000 & -12.0745 & -6.4974 & -3.0926 \\ 0 & 0 & 7.4138 & -15.5760 \\ 0 & 8.8312 & -13.3801 & -6.3686 \end{pmatrix}. \tag{5.63} \]

The nulling vectors used by the base station are the columns of the matrix

\[ W = \begin{pmatrix} 0.0990 & -0.0015 & -0.0037 & -0.0104 \\ 0 & 0 & 0.1157 & -0.0522 \\ 0 & 0.1098 & -0.0077 & -0.0213 \end{pmatrix}. \tag{5.64} \]

We note that for this uplink scenario, the base station detects the fourth mobile user, whose spreading sequence is the fourth column of $S$, first and the first user last.
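As a sanity check on the uplink numbers, the following sketch recomputes the per-user output SINRs from the printed spreading sequences and nulling vectors. It rests on assumptions that are not spelled out in this section: unit-power symbols, white noise with unit variance, and perfect cancellation of already detected users, so that user $i$ only sees interference from users $1, \ldots, i-1$. The helper name `uplink_sinr_db` is hypothetical.

```python
import numpy as np

# Spreading sequences (5.63) and nulling vectors (5.64), as printed.
S = np.array([[10.0000, -12.0745,  -6.4974,  -3.0926],
              [ 0.0,      0.0,      7.4138, -15.5760],
              [ 0.0,      8.8312, -13.3801,  -6.3686]])
W = np.array([[0.0990, -0.0015, -0.0037, -0.0104],
              [0.0,     0.0,     0.1157, -0.0522],
              [0.0,     0.1098, -0.0077, -0.0213]])

def uplink_sinr_db(S, W, noise_var=1.0):
    """Output SINR (dB) per user when users are detected last-to-first."""
    L = S.shape[1]
    out = []
    for i in range(L):
        w = W[:, i]
        signal = (w @ S[:, i]) ** 2
        # Users i+1..L are assumed already detected and cancelled.
        interference = sum((w @ S[:, j]) ** 2 for j in range(i))
        noise = noise_var * (w @ w)
        out.append(10 * np.log10(signal / (interference + noise)))
    return np.array(out)

sinr = uplink_sinr_db(S, W)
total_power = np.trace(S @ S.T)
```

Under these assumptions the computed SINRs land within rounding error of the prescribed 20, 19, 18, and 17 dB, and `total_power` is close to 892.73; small deviations are expected because the matrix entries are printed to four decimals.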
If the prescribed SINRs of the four users remain the same in the downlink scenario, the spreading sequences used by the base station are the four columns of the matrix

\[ F = \begin{pmatrix} 17.1936 & -0.2303 & -0.5154 & -1.2796 \\ 0 & 0 & 16.0012 & -6.4449 \\ 0 & 17.0149 & -1.0614 & -2.6352 \end{pmatrix}. \tag{5.65} \]

In this case, the base station applies the dirty paper precoder to the first mobile user first and the last user last. Note that the columns of $F$ and $W$ in (5.64) are the same up to a scaling factor. Moreover, $\mathrm{tr}(FF^{T}) = \mathrm{tr}(SS^{T}) = 892.7274$, which means that the power consumed at the base station equals the overall power used by the four mobile users. At the mobile end, the users use the nulling vectors

\[ S = \begin{pmatrix} 10.0000 & -12.0745 & -6.4974 & -3.0926 \\ 0 & 0 & 7.4138 & -15.5760 \\ 0 & 8.8312 & -13.3801 & -6.3686 \end{pmatrix}, \tag{5.66} \]

which are exactly the spreading sequences used in the uplink scenario. Scaling the output signals may be necessary at the mobile ends for the subsequent dirty paper decoder. But the signal scaling does not influence the output SINR.
If zeros are not allowed in the spreading sequences, we can left-multiply $S$ and $W$ by a $3 \times 3$ orthogonal matrix to eliminate the zeros in $S$.








5.5.5 Further Remarks
The TCD scheme, which was originally motivated by MIMO transceiver designs, turns out to be similar to the scheme of [56] in several aspects. Both schemes are based on nonlinear decision feedback operations. Hence both are optimal in terms of maximizing the channel throughput and minimizing the overall input power. Both the GTD algorithm, on which the TCD scheme is based, and the construction of the Hermitian matrix with prescribed eigenvalues and Cholesky values as done in [56] rely on the Weyl-Horn theorem. However, our TCD scheme enjoys several remarkable advantages over the scheme of [56]. First, note that if we obtain the GTD $H = QRP^{*}$, where $R$ has the prescribed diagonal elements, then it follows immediately that $A \triangleq P^{*}H^{*}HP = R^{*}R$ is the desired Cholesky decomposition. However, the information associated with $Q$ is lost in the Cholesky decomposition. Hence the nulling vectors used at the receivers of [56] cannot be calculated explicitly as our TCD does (cf. (5.27)). Furthermore, the correlation matrix $A$ is only an intermediate result. To get the CDMA sequences, one has to decompose $A = R^{*}R$ explicitly. The TCD scheme, however, can be used to obtain both the precoder (CDMA sequences), which are the columns of $P$, and the equalizer from $Q$ simultaneously. Second, our TCD scheme is a solution to the more general MIMO transceiver design problem. The Cholesky decomposition algorithm provided in Appendix C of [56] is only applicable to the scenario where the singular values take only two distinct values. Hence it is not applicable to the general design of MIMO transceivers. The more general Cholesky factorization algorithm suggested in the proofs is computationally far less efficient. Third, the TCD scheme has two implementation forms, i.e., TCD-VBLAST and TCD-DP, which makes it applicable to both the downlink and uplink scenarios. Finally, the TCD scheme provides insights that identify the CDMA sequence design problem as a special case of the MIMO transceiver design.
5.6 Conclusions
Based on the recently developed GTD matrix decomposition algorithm, we have proposed the TCD scheme utilizing the CSIT and CSIR. TCD can be used to decompose a MIMO channel into multiple subchannels with prescribed capacities. The TCD scheme has two implementation forms. One is the combination of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred to as TCD-VBLAST, and the other includes a dirty paper (DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as TCD-DP. Both forms of TCD









are computationally very efficient. We have also determined the subchannel capacity region such that a capacity lossless decomposition is possible. The applications of the TCD scheme for MIMO communications with QoS constraints have been investigated. We have also identified the problems of designing precoders for OFDM communications and designing CDMA sequences as special cases in the unifying framework of MIMO transceiver designs. In particular, we have shown that the CDMA sequence design problem in the uplink and downlink scenarios can be solved using TCD-VBLAST and TCD-DP, respectively.
Appendix A
Proof of Theorem 5.4.1 Observe that for $F = V\Phi^{1/2}$, we have

\[ \mathrm{tr}(FF^{*}) = \sum_{i=1}^{K} \phi_i \quad \text{and} \quad HF = U\Lambda\Phi^{1/2}. \tag{5.67} \]

Hence, $\lambda_{HF,i} = \lambda_{H,i}\phi_i^{1/2}$ for $1 \le i \le K$, and

\[ \lambda_{G_a,i} = \begin{cases} \sqrt{1 + \lambda_{H,i}^{2}\phi_i}, & 1 \le i \le K, \\ 1, & i > K. \end{cases} \]

Since $1 + \rho_i \ge 1$, the last $L - K$ inequalities in the multiplicative majorization condition in (5.46) are implied by the single equality constraint in (5.47). Hence, the problem (5.46) reduces to (5.47) where $F = V\Phi^{1/2}$, which gives an upper bound for the minimum in (5.46).
Let $F = U_F \Phi^{1/2} \Omega^{T}$ denote the singular value decomposition for any given $F \in \mathbb{C}^{M \times L}$. Once again, $\mathrm{tr}(FF^{*})$ is given by the sum in (5.67). By [60, Theorem 3.3.14], the singular values of the product $HF$ of two matrices are multiplicatively majorized by the product of the singular values of $H$ and $F$:

\[ \prod_{i=1}^{k} \lambda_{HF,i}^{2} \le \prod_{i=1}^{k} \lambda_{H,i}^{2}\lambda_{F,i}^{2}, \quad 1 \le k \le K. \tag{5.68} \]

Taking logs, we have

\[ \sum_{i=1}^{k} \log(\lambda_{HF,i}^{2}) \le \sum_{i=1}^{k} \log(\lambda_{H,i}^{2}\lambda_{F,i}^{2}), \quad 1 \le k \le K. \tag{5.69} \]

By [60, Lemma 3.3.8] and (5.69),

\[ \sum_{i=1}^{k} f\big(\log(\lambda_{HF,i}^{2})\big) \le \sum_{i=1}^{k} f\big(\log(\lambda_{H,i}^{2}\lambda_{F,i}^{2})\big), \quad 1 \le k \le K, \tag{5.70} \]

whenever $f$ is a real-valued, increasing convex function. The function $f(t) = \log(e^{t} + 1)$ is convex since its second derivative is positive. Making this choice for $f$ in (5.70) and exponentiating both sides, we obtain:

\[ \prod_{i=1}^{k} (\lambda_{HF,i}^{2} + 1) \le \prod_{i=1}^{k} (\lambda_{H,i}^{2}\lambda_{F,i}^{2} + 1), \quad 1 \le k \le K. \tag{5.71} \]

Since $F$ is feasible in (5.46),

\[ \prod_{i=1}^{k} (\lambda_{HF,i}^{2} + 1) \ge \prod_{i=1}^{k} (\rho_i + 1), \quad 1 \le k < K, \qquad \prod_{i=1}^{K} (\lambda_{HF,i}^{2} + 1) = \prod_{i=1}^{L} (\rho_i + 1). \]

Combining this with (5.71) and writing $\phi_i = \lambda_{F,i}^{2}$, we get

\[ \prod_{i=1}^{k} (\lambda_{H,i}^{2}\phi_i + 1) \ge \prod_{i=1}^{k} (\rho_i + 1), \quad 1 \le k < K, \qquad \prod_{i=1}^{K} (\lambda_{H,i}^{2}\phi_i + 1) \ge \prod_{i=1}^{L} (\rho_i + 1). \tag{5.72} \]

Since $\lambda_{H,i}^{2}\phi_i + 1$ is the square of the $i$-th singular value of the augmented matrix $G_a$ corresponding to the choice $V\Phi^{1/2}$, we conclude that $F = V\Phi^{1/2}$ satisfies all the inequality constraints in (5.46). If the inequality (5.72) is strict, then $\phi_K$ should be decreased in order to satisfy the equality constraint in (5.47). Since decreasing $\phi_K$ only lowers $\mathrm{tr}(FF^{*})$, we deduce that the minimum in (5.46) is achieved by a matrix of the form $F = V\Phi^{1/2}$. If $F = V\Phi^{1/2}$ is optimal in (5.46), then so is $F = V\Phi^{1/2}\Omega^{T}$ whenever $\Omega$ has $K$ orthonormal columns (since the constraints are satisfied and the value of the cost does not change). We now make the choice for $\Omega$ given in Theorem 5.3.1. That is, if $QRP^{T}$ is the GTD of the matrix in (5.19), where $\Phi$ is a solution of (5.47), then $\Omega$ is the matrix formed by the first $K$ columns of $P^{T}$. For this choice of $\Omega$, the constraints of (5.45) are satisfied. As noted earlier, the minimum in (5.45) can be no smaller than the minimum in (5.46). Since this choice for $F$ yields the same cost in both (5.45) and (5.46), we conclude that $F = V\Phi^{1/2}\Omega^{T}$ is optimal in (5.45).








Appendix B
Proof of Lemma 5.4.3 First suppose that $\gamma_l \ge 1/\lambda_{H,l}^{2}$. By the arithmetic/geometric mean inequality, the problem

\[ \min \sum_{i=1}^{l} \psi_i \quad \text{subject to} \quad \prod_{i=1}^{l} \psi_i = \prod_{j=1}^{l} \beta_j, \quad \psi > 0, \tag{5.73} \]

has the solution $\psi_i = \gamma_l$, $1 \le i \le l$. Since $\lambda_{H,i}$ is a decreasing function of $i$ and $\gamma_l \ge 1/\lambda_{H,l}^{2}$, we conclude that $\psi_i = \gamma_l$ satisfies the constraints $\psi_i \ge 1/\lambda_{H,i}^{2}$ for $1 \le i \le l$. Since $l$ attains the maximum in (5.53),

\[ \gamma_l^{k} \ge \prod_{i=1}^{k} \beta_i \]

for all $k \le l$. Hence, by taking $\psi_i = \gamma_l$ for $1 \le i \le l$, the first $l$ inequalities in (5.49) are satisfied, with equality for $k = l$, and the first $l$ lower bound constraints $\psi_i \ge 1/\lambda_{H,i}^{2}$ are satisfied.
Let $\psi^{*}$ denote any optimal solution of (5.49). If

\[ \prod_{i=1}^{l} \psi_i^{*} = \prod_{i=1}^{l} \beta_i, \tag{5.74} \]

then by the unique optimality of $\psi_i = \gamma_l$, $1 \le i \le l$, in (5.73), and by the fact that the inequality constraints in (5.49) are satisfied for $k \in [1, l]$, we conclude that $\psi_i^{*} = \gamma_l$ for all $i \in [1, l]$. On the other hand, suppose that

\[ \prod_{i=1}^{l} \psi_i^{*} > \prod_{i=1}^{l} \beta_i. \tag{5.75} \]

We show below that this leads to a contradiction; consequently, (5.74) holds and $\psi_i^{*} = \gamma_l$ for $i \in [1, l]$.
Define the quantity



By (5.75) > -ye. Again, by the arithnetic/geometric mean inequality, the solution of the problem
I I
mill VJi subject to fJ' i > y', ?p > 0, (5.76)
,=l i=1
is bi = J* for i C [1, 11. By (5.75), > 71 and 0 satisfies the inequality constraints in (5.49) for k E [1,1].










Let M be the first index with the property that M M
H : H .(5.77)
i=1 i1

Such an index exists since 0* is optimal, which implies that K K

i=1 i=1
First, suppose that M < j, where j is the break point given in Lemma 5.4.2. By complementary slackness, pi = 0 and V for 1 < i K M. We conclude that
bi = -y. for 1 < i < M. By (5.77) we have

M
=.

It follows that



which contradicts the fact that 1 achieves the maximum in (5.53).
In the case Al > j, we have V/i = 7* for 1 < i < j. Again, this follows by complementary slackness. However, we need to stop when i = j since the lower bound constraints become active for i > j. In Lemma 5.4.2, we show that = ^y. for i > j. Consequently, we have 1M3 =M i> YM>7,1


Again, this contradicts the fact that 1 achieves the maximum in (5.53). This completes the analysis of the case where -j _> 1/A4,.
Now consider the case "j < 1/4,. By the definition of 7'1, we have


3 or -tK > fI f3,. (5.78)
i~l i=l1

If j is the break point described in Lemma 5.4.2, then ,> for all i; it follows that

K
H (V)K. (5.79)








Since the product of the components of 0* is equal to the product of the components of 0, from (5.78) and (5.79) we get K K
J > fjI13= 11* : (03;)'

i=1 i=1
Hence, y~ > O > 1/A4l > 1/A4,i for all i < j. In particular, if I < j, then 7Y 1/A4, or, I > j when t < 1/Aj. As a consequence, = -














CHAPTER 6
NOVEL MATRIX DECOMPOSITIONS
6.1 Introduction
Given a complex matrix $H$, we consider the decomposition $H = QRP^{*}$, where $R$ is upper triangular and $Q$ and $P$ have orthonormal columns. Special instances of this decomposition are
(a) the singular value decomposition (SVD) [61, 62]

\[ H = V\Sigma W^{*}, \]

where $\Sigma$ is a diagonal matrix containing the singular values on the diagonal,
(b) the Schur decomposition [63]

\[ H = QUQ^{*}, \]

where $U$ is an upper triangular matrix with the eigenvalues of $H$ on the diagonal,
(c) the QR decomposition where $P = I$.
In this chapter, we introduce two novel matrix decompositions, i.e., the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). As introduced before, the GMD scheme and the UCD scheme are based on the GMD matrix decomposition algorithm, and the TCD is based on the GTD algorithm. The results of this chapter are motivated by applications to MIMO transceiver design. Interestingly, these results also turn out to be useful to the numerical analysis community.
6.2 Geometric Mean Decomposition
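In this section, we present a new unitary decomposition which we call the geometric mean decomposition or GMD. Given a rank $K$ matrix $H \in \mathbb{C}^{m \times n}$, we write it as the product $H = QRP^{*}$, where $P$ and $Q$ have orthonormal columns and $R \in \mathbb{R}^{K \times K}$ is a real upper triangular matrix whose diagonal elements are all equal to the geometric mean of the positive singular values:

\[ r_{ii} = \bar{\sigma} = \Big( \prod_{\sigma_j > 0} \sigma_j \Big)^{1/K}, \quad 1 \le i \le K. \]

Here the $\sigma_j$ are the singular values of $H$, and $\bar{\sigma}$ is the geometric mean of the positive singular values. Thus $R$ is upper triangular and the nonzero diagonal elements are the geometric mean of the positive singular values.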
We were led to this decomposition when trying to optimize the performance of multiple-input multiple-output (MIMO) systems. However, this decomposition has arisen recently in several other applications. In [64, Prob. 26.3] Higham proposed the following problem:
Develop an efficient algorithm for computing a unit upper triangular matrix with prescribed singular values $\sigma_i$, $1 \le i \le K$, where the product of the $\sigma_i$ is 1.
A solution to this problem could be used to construct test matrices with user specified singular values.
The solution of Kosowski and Smoktunowicz [65] starts with the diagonal matrix $\Sigma$, with $i$-th diagonal element $\sigma_i$, and applies a series of 2 by 2 orthogonal transformations to obtain a unit triangular matrix. The complexity of their algorithm is $O(K^{2})$. Thus the solution given in [65] amounts to the statement

\[ Q_0^{T} \Sigma P_0 = R, \tag{6.1} \]

where $R$ is unit upper triangular.
For general $\Sigma$, where the product of the $\sigma_i$ is not necessarily 1, one can multiply $\Sigma$ by the scaling matrix $\bar{\sigma}^{-1} I$, apply (6.1), then multiply by $\bar{\sigma}$ to obtain the GMD of $\Sigma$. And for a general matrix $H$, the singular value decomposition $H = V\Sigma W^{*}$ and (6.1) combine to give $H = QRP^{*}$ where

\[ Q = VQ_0 \quad \text{and} \quad P = WP_0. \]

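According to (3.11), we consider the problem of choosing $Q$ and $P$ to maximize the minimum of the $r_{ii}$:

\[ \begin{aligned} \max_{Q,P} \quad & \min\{ r_{ii} : 1 \le i \le K \} \\ \text{subject to} \quad & QRP^{*} = H, \quad Q^{*}Q = I, \quad P^{*}P = I, \\ & r_{ij} = 0 \ \text{for } i > j, \quad R \in \mathbb{R}^{K \times K}, \end{aligned} \tag{6.2} \]

where $K$ is the rank of $H$. For any feasible $R$, the product of the diagonal elements equals, in magnitude, the product of the positive singular values of $H$, so the smallest $r_{ii}$ cannot exceed $\bar{\sigma}$. Since the GMD of $H$ is feasible in (6.2) and attains this bound, we conclude that the GMD yields the optimal solution to (6.2).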









6.2.1 Generalized Maximin Properties
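We consider the following problem:

\[ \begin{aligned} \max_{F,G} \quad & \min\{ u_{ii} : 1 \le i \le K \} \\ \text{subject to} \quad & GUF^{*} = H, \quad u_{ij} = 0 \ \text{for } i > j, \quad U \in \mathbb{R}^{K \times K}, \\ & u_{ii} > 0, \quad 1 \le i \le K, \\ & \mathrm{tr}\big((G^{*}G)^{-1}\big) \le p_1, \quad \mathrm{tr}\big((F^{*}F)^{-1}\big) \le p_2. \end{aligned} \tag{6.3} \]

If $p_1 = p_2 = K$, then any $Q$ and $P$ feasible in (6.2) are feasible in (6.3). Hence, the problem (6.3) is less constrained than the problem (6.2) since the set of feasible matrices has been enlarged. Nonetheless, we now show that the solution to this relaxed problem is the same as the solution of the more constrained problem (6.2).
Theorem 6.2.1 If $H \in \mathbb{C}^{m \times n}$ has rank $K$, then a solution of (6.3) is given by

\[ G = Q\sqrt{\frac{K}{p_1}}, \quad U = \frac{\sqrt{p_1 p_2}}{K}\, R, \quad \text{and} \quad F = P\sqrt{\frac{K}{p_2}}, \]

where $QRP^{*}$ is the GMD of $H$.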
Proof: Let $V\Sigma W^{*}$ be the singular value decomposition of $H$, where $\Sigma \in \mathbb{R}^{K \times K}$ contains the $K$ positive singular values of $H$ on the diagonal. If $F$ and $G$ satisfy the constraints of (6.3), then we have

\[ H = V\Sigma W^{*} = GUF^{*}. \]

The column space of $GUF^{*}$ is contained in the column space of $G$. Since $G$ has $K$ columns, the dimension of the column space is at most $K$. Since $GUF^{*} = H$ has rank $K$, the column space of $G$ must coincide with the column space of $H$, which is equal to the column space of $V$. Hence, there exists a $K$ by $K$ invertible matrix $A$ such that

\[ G = VA. \tag{6.4} \]

In the same fashion, the column space of $F$ must coincide with the column space of $H^{*}$, which is equal to the column space of $W$. And there exists a $K$ by $K$ invertible matrix $B$ such that

\[ F = WB. \tag{6.5} \]

Combining (6.4) and (6.5) with the identity $GUF^{*} = H = V\Sigma W^{*}$ gives

\[ AUB^{*} = \Sigma. \]

It follows that

\[ \det(\Sigma^{*}\Sigma) = \det(BU^{*}A^{*}AUB^{*}) = \det(A^{*}A)\det(B^{*}B) \prod_{i=1}^{K} u_{ii}^{2}, \]

which gives

\[ \min_{1 \le i \le K} u_{ii}^{2K} \le \prod_{i=1}^{K} u_{ii}^{2} = \det(\Sigma^{*}\Sigma)\det(A^{*}A)^{-1}\det(B^{*}B)^{-1}. \tag{6.6} \]

By the constraints of (6.3), we have

\[ \mathrm{tr}\big((G^{*}G)^{-1}\big) = \mathrm{tr}\big((A^{*}A)^{-1}\big) \le p_1, \quad \mathrm{tr}\big((F^{*}F)^{-1}\big) = \mathrm{tr}\big((B^{*}B)^{-1}\big) \le p_2. \]

By the geometric mean inequality and the fact that the determinant (trace) of a matrix is the product (sum) of the eigenvalues, a $K$ by $K$ Hermitian positive semidefinite matrix $S$ satisfies

\[ \det(S) \le \left( \frac{\mathrm{tr}(S)}{K} \right)^{K}. \]

Using these bounds for the determinant and the trace in (6.6), we have

\[ \min_{1 \le i \le K} u_{ii} \le \frac{\bar{\sigma}\sqrt{p_1 p_2}}{K}. \tag{6.7} \]

Finally, it can be verified that for the choices of $G$, $U$, and $F$ given in the statement of the theorem, the inequality (6.7) is an equality. $\square$
6.2.2 Implementation Based on Initial SVD
We now give an algorithm for evaluating the GMD that starts with the singular value decomposition $H = V\Sigma W^{*}$. The algorithm generates a sequence of upper triangular matrices $R^{(L)}$, $1 \le L \le K$, with $R^{(1)} = \Sigma$. Each matrix $R^{(L)}$ has the following properties:
(a) $r_{ij}^{(L)} = 0$ when $i > j$ or $j > \max\{L, i\}$.
(b) $r_{ii}^{(L)} = \bar{\sigma}$ for all $i < L$, and the geometric mean of $r_{ii}^{(L)}$, $L \le i \le K$, is $\bar{\sigma}$.
We express $R^{(k+1)} = Q_k^{T} R^{(k)} P_k$ where $Q_k$ and $P_k$ are orthogonal for each $k$.
These orthogonal matrices are constructed using a symmetric permutation and a pair of Givens rotations. Suppose that $R^{(k)}$ satisfies (a) and (b). If $r_{kk}^{(k)} \ge \bar{\sigma}$, then let $\Pi$ be a permutation matrix with the property that $\Pi R^{(k)} \Pi$ exchanges the $(k+1)$-st diagonal element of $R^{(k)}$ with any element $r_{pp}$, $p > k$, for which $r_{pp} \le \bar{\sigma}$. If $r_{kk}^{(k)} < \bar{\sigma}$, then let $\Pi$ be chosen to exchange the $(k+1)$-st diagonal element with any element $r_{pp}$, $p > k$, for which $r_{pp} \ge \bar{\sigma}$. Let $\delta_1 = r_{kk}^{(k)}$ and $\delta_2 = r_{pp}^{(k)}$ denote the new diagonal elements at locations $k$ and $k + 1$ associated with the permuted matrix $\Pi R^{(k)} \Pi$.
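Next, we construct orthogonal matrices $G_1$ and $G_2$ by modifying the elements in the identity matrix that lie at the intersection of rows $k$ and $k + 1$ and columns $k$ and $k + 1$. We multiply the permuted matrix $\Pi R^{(k)} \Pi$ on the left by $G_2^{T}$ and on the right by $G_1$. These multiplications will change the elements in the 2 by 2 submatrix at the intersection of rows $k$ and $k + 1$ with columns $k$ and $k + 1$. Our choice for the elements of $G_1$ and $G_2$ is shown below, where we focus on the relevant 2 by 2 submatrices of $G_2^{T}$, $\Pi R^{(k)} \Pi$, and $G_1$:

\[ \underbrace{\bar{\sigma}^{-1} \begin{pmatrix} c\delta_1 & s\delta_2 \\ -s\delta_2 & c\delta_1 \end{pmatrix}}_{(G_2^{T})} \underbrace{\begin{pmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{pmatrix}}_{(\Pi R^{(k)} \Pi)} \underbrace{\begin{pmatrix} c & -s \\ s & c \end{pmatrix}}_{(G_1)} = \underbrace{\begin{pmatrix} \bar{\sigma} & x \\ 0 & y \end{pmatrix}}_{(R^{(k+1)})}. \tag{6.8} \]

If $\delta_1 = \delta_2 = \bar{\sigma}$, we take $c = 1$ and $s = 0$; if $\delta_1 \ne \delta_2$, we take

\[ c = \sqrt{\frac{\bar{\sigma}^{2} - \delta_2^{2}}{\delta_1^{2} - \delta_2^{2}}} \quad \text{and} \quad s = \sqrt{1 - c^{2}}. \tag{6.9} \]

In either case,

\[ x = \frac{sc(\delta_2^{2} - \delta_1^{2})}{\bar{\sigma}} \quad \text{and} \quad y = \frac{\delta_1 \delta_2}{\bar{\sigma}}. \tag{6.10} \]

Since $\bar{\sigma}$ lies between $\delta_1$ and $\delta_2$, $s$ and $c$ are nonnegative real scalars.
Figure 6-1 depicts the transformation from $R^{(k)}$ to $G_2^{T} \Pi R^{(k)} \Pi G_1$. The dashed box is the 2 by 2 submatrix displayed in (6.8). Notice that $c$ and $s$, defined in (6.9), are real scalars chosen so that

\[ c^{2} + s^{2} = 1 \quad \text{and} \quad (c\delta_1)^{2} + (s\delta_2)^{2} = \bar{\sigma}^{2}. \]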

With these identities, the validity of (6.8) follows by direct computation. Defining $Q_k = \Pi G_2$ and $P_k = \Pi G_1$, we set

\[ R^{(k+1)} = Q_k^{T} R^{(k)} P_k. \tag{6.11} \]

It follows from Figure 6-1, (6.8), and the identity $\det(R^{(k+1)}) = \det(R^{(k)})$ that (a) and (b) hold for $L = k + 1$. Thus there exists a real upper triangular matrix $R^{(K)}$, with $\bar{\sigma}$ on the diagonal, and unitary matrices $Q_i$ and $P_i$, $i = 1, 2, \ldots, K - 1$, such that

\[ R^{(K)} = (Q_{K-1}^{T} \cdots Q_2^{T} Q_1^{T})\, \Sigma\, (P_1 P_2 \cdots P_{K-1}). \]

Combining this identity with the singular value decomposition, we obtain $H = QRP^{*}$ where

\[ Q = V \Big( \prod_{i=1}^{K-1} Q_i \Big), \quad R = R^{(K)}, \quad \text{and} \quad P = W \Big( \prod_{i=1}^{K-1} P_i \Big). \]

Figure 6-1: The operation displayed in (6.8)
In summary, our algorithm for computing the GMD, based on an initial SVD, is the following:
1. Let $H = V\Sigma W^{*}$ be the singular value decomposition of $H$, and initialize $Q = V$, $P = W$, $R = \Sigma$, and $k = 1$.
2. If $r_{kk} \ge \bar{\sigma}$, choose $p > k$ such that $r_{pp} \le \bar{\sigma}$. If $r_{kk} < \bar{\sigma}$, choose $p > k$ such that $r_{pp} \ge \bar{\sigma}$. In $R$, $P$, and $Q$, perform the following exchanges:

\[ r_{k+1,k+1} \leftrightarrow r_{pp}, \qquad P_{:,k+1} \leftrightarrow P_{:,p}, \qquad Q_{:,k+1} \leftrightarrow Q_{:,p}. \]

3. Construct the matrices $G_1$ and $G_2$ shown in (6.8). Replace $R$ by $G_2^{T} R G_1$, replace $Q$ by $QG_2$, and replace $P$ by $PG_1$.
4. If $k = K - 1$, then stop; $QRP^{*}$ is the GMD of $H$. Otherwise, replace $k$ by $k + 1$ and go to step 2.
Given the SVD, this algorithm for the GMD requires $O((m + n)K)$ flops. For comparison, reduction of $H$ to bidiagonal form by the Golub-Reinsch bidiagonalization scheme [66, 67, 68], often the first step in the computation of the SVD, requires $O(mnK)$ flops.
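The four steps of the SVD-based GMD algorithm can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `gmd`, the rank tolerance, and the choice of $p$ as the smallest (or largest) trailing diagonal element are choices made here for concreteness; the text allows any admissible $p$. For readability it forms the full $K \times K$ rotation matrices, so it does not reach the $O((m+n)K)$ flop count discussed above.

```python
import numpy as np

def gmd(H, rank_tol=1e-12):
    """Geometric mean decomposition H = Q R P^* via an initial SVD (sketch)."""
    V, sig, Wh = np.linalg.svd(H, full_matrices=False)
    K = int(np.sum(sig > rank_tol * sig[0]))      # numerical rank
    Q, P = V[:, :K].copy(), Wh[:K, :].conj().T.copy()
    R = np.diag(sig[:K])
    sbar = np.exp(np.mean(np.log(sig[:K])))       # geometric mean
    for k in range(K - 1):                        # 0-based step index
        d = np.diag(R)
        # Step 2: pick p > k on the other side of sbar, swap it into k+1.
        if d[k] >= sbar:
            p = k + 1 + int(np.argmin(d[k + 1:]))
        else:
            p = k + 1 + int(np.argmax(d[k + 1:]))
        idx = np.arange(K)
        idx[[k + 1, p]] = idx[[p, k + 1]]
        R, Q, P = R[np.ix_(idx, idx)], Q[:, idx], P[:, idx]
        # Step 3: rotations G1 and G2 from (6.8) and (6.9).
        d1, d2 = R[k, k], R[k + 1, k + 1]
        G1, G2 = np.eye(K), np.eye(K)
        if abs(d1 - d2) > 1e-12 * sbar:
            c = np.sqrt(np.clip((sbar**2 - d2**2) / (d1**2 - d2**2), 0.0, 1.0))
            s = np.sqrt(1.0 - c**2)
            G1[k:k + 2, k:k + 2] = [[c, -s], [s, c]]
            G2[k:k + 2, k:k + 2] = np.array([[c * d1, -s * d2],
                                             [s * d2, c * d1]]) / sbar
        R = G2.T @ R @ G1
        Q, P = Q @ G2, P @ G1
    return Q, R, P

rng = np.random.default_rng(0)
H = rng.standard_normal((5, 4))
Q, R, P = gmd(H)
```

On the random test matrix above, `Q @ R @ P.conj().T` reproduces `H` up to rounding, `R` is upper triangular, and every diagonal entry of `R` equals the geometric mean of the singular values of `H`.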
6.3 Generalized Triangular Decomposition
In this section, we consider the generalized decomposition of the form

\[ H = QRP^{*}, \tag{6.12} \]

where $R$ is upper triangular and $Q$ and $P$ have orthonormal columns. We will answer the following two questions. First, what is the necessary and sufficient condition for the decomposition (6.12) to exist? Second, how can such a decomposition be calculated? Sections 6.3.1 and 6.3.2 answer these two questions in turn.
6.3.1 Existence of GTD
The following result is due to Weyl [69] (also see [60, p. 171]):
Theorem 6.3.1 If $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda$ and singular values $\sigma$, then $\lambda \prec \sigma$.
The following result is due to Horn [70] (also see [60, p. 220]):

Theorem 6.3.2 If $r \in \mathbb{C}^{n}$ and $\sigma \in \mathbb{R}^{n}$ with $r \prec \sigma$, then there exists an upper triangular matrix $R \in \mathbb{C}^{n \times n}$ with singular values $\sigma_i$, $1 \le i \le n$, and with $r$ on the diagonal of $R$.
We now combine Theorems 6.3.1 and 6.3.2 to obtain:

Theorem 6.3.3 Let $H \in \mathbb{C}^{m \times n}$ have rank $K$ with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_K > 0$. There exists an upper triangular matrix $R \in \mathbb{C}^{K \times K}$ with diagonal $r$ and matrices $Q$ and $P$ with orthonormal columns such that $H = QRP^{*}$ if and only if $r \prec \sigma$.
Proof: If $H = QRP^{*}$, then the eigenvalues of $R$ are its diagonal elements and the singular values of $R$ coincide with those of $H$. By Theorem 6.3.1, $r \prec \sigma$. Conversely, suppose that $r \prec \sigma$. Let $H = V\Sigma W^{*}$ be the singular value decomposition, where $\Sigma \in \mathbb{R}^{K \times K}$. By Theorem 6.3.2, there exists an upper triangular matrix $R \in \mathbb{C}^{K \times K}$ with the $r_i$ on the diagonal and with singular values $\sigma_i$, $1 \le i \le K$. Let $R = V_0 \Sigma W_0^{*}$ be the singular value decomposition of $R$. Substituting $\Sigma = V_0^{*} R W_0$ in the singular value decomposition for $H$, we have

\[ H = (VV_0^{*})\, R\, (WW_0^{*})^{*}. \]

In other words, $H = QRP^{*}$ where $Q = VV_0^{*}$ and $P = WW_0^{*}$. $\square$
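The multiplicative majorization condition used in these theorems is easy to test numerically: after sorting by decreasing magnitude, every partial product on the left must not exceed the corresponding partial product on the right, and the full products must agree. The sketch below checks Weyl's inequality on a random complex matrix; the helper name `multiplicatively_majorized` and its tolerance are assumptions introduced here, and nonzero entries of equal length are assumed.

```python
import numpy as np

def multiplicatively_majorized(a, b, rtol=1e-9):
    """True if |a| is multiplicatively majorized by |b| (sketch)."""
    a = np.sort(np.abs(a))[::-1]          # decreasing magnitudes
    b = np.sort(np.abs(b))[::-1]
    pa, pb = np.cumprod(a), np.cumprod(b)
    partial_ok = np.all(pa <= pb * (1 + rtol))
    total_ok = np.isclose(pa[-1], pb[-1], rtol=rtol)
    return bool(partial_ok and total_ok)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
eigvals = np.linalg.eigvals(A)
svals = np.linalg.svd(A, compute_uv=False)
weyl_holds = multiplicatively_majorized(eigvals, svals)
```

The same helper can be used to test whether a prescribed diagonal $r$ is admissible for a given matrix before attempting to compute the decomposition.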
6.3.2 The GTD Algorithm
Given a matrix $H \in \mathbb{C}^{m \times n}$ with rank $K$ and with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_K > 0$, and given a vector $r \in \mathbb{C}^{K}$ such that $r \prec \sigma$, we now give an algorithm for computing the decomposition $H = QRP^{*}$. This algorithm for the GTD essentially yields a constructive proof of Theorem 6.3.2.
Let $V\Sigma W^{*}$ be the singular value decomposition of $H$, where $\Sigma$ is a $K$ by $K$ diagonal matrix with the diagonal containing the positive singular values. We let $R^{(L)} \in \mathbb{C}^{K \times K}$ denote an upper triangular matrix with the following properties:
(a) $r_{ij}^{(L)} = 0$ when $i > j$ or $j > i \ge L$. In other words, the trailing principal submatrix of $R^{(L)}$, starting at row $L$ and column $L$, is diagonal.
(b) If $r^{(L)}$ denotes the diagonal of $R^{(L)}$, then the first $L - 1$ elements of $r$ and $r^{(L)}$ are equal. In other words, the leading diagonal elements of $R^{(L)}$ match the prescribed leading elements of the vector $r$.
(c) $r_{L:K} \prec r^{(L)}_{L:K}$, where $r_{L:K}$ denotes the subvector of $r$ consisting of components $L$ through $K$. In other words, the trailing diagonal elements of $R^{(L)}$ multiplicatively majorize the trailing elements of the prescribed vector $r$.
Initially, we set $R^{(1)} = \Sigma$. Clearly, (a)-(c) hold for $L = 1$. Proceeding by induction, suppose we have generated upper triangular matrices $R^{(L)}$, $L = 1, 2, \ldots, k$, satisfying (a)-(c), and unitary matrices $Q_L$ and $P_L$, such that $R^{(L+1)} = Q_L^{*} R^{(L)} P_L$ for $1 \le L < k$. We now show how to construct unitary matrices $Q_k$ and $P_k$ such that $R^{(k+1)} = Q_k^{*} R^{(k)} P_k$, where $R^{(k+1)}$ satisfies (a)-(c) for $L = k + 1$.
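Let $p$ and $q$ be defined as follows:

\[ p = \arg\min_{i} \{ |r_i^{(k)}| : k \le i \le K, \ |r_i^{(k)}| \ge |r_k| \}, \tag{6.13} \]

\[ q = \arg\max_{i} \{ |r_i^{(k)}| : k \le i \le K, \ |r_i^{(k)}| \le |r_k|, \ i \ne p \}, \tag{6.14} \]

where $r_i^{(k)}$ is the $i$-th element of $r^{(k)}$. Since $r_{k:K} \prec r^{(k)}_{k:K}$, there exist $p$ and $q$ satisfying (6.13) and (6.14). Let $\Pi$ be the matrix corresponding to the symmetric permutation $\Pi^{*} R^{(k)} \Pi$ which moves the diagonal elements $r_{pp}^{(k)}$ and $r_{qq}^{(k)}$ to the $k$-th and $(k+1)$-st diagonal positions, respectively. Let $\delta_1 = r_{pp}^{(k)}$ and $\delta_2 = r_{qq}^{(k)}$ denote the new diagonal elements at locations $k$ and $k + 1$ associated with the permuted matrix $\Pi^{*} R^{(k)} \Pi$.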
Next, we construct unitary matrices $G_1$ and $G_2$ by modifying the elements in the identity matrix that lie at the intersection of rows $k$ and $k + 1$ and columns $k$ and $k + 1$. We multiply the permuted matrix $\Pi^{*} R^{(k)} \Pi$ on the left by $G_2^{*}$ and on the right by $G_1$. These multiplications will change the elements in the 2 by 2 submatrix at the intersection of rows $k$ and $k + 1$ with columns $k$ and $k + 1$. Our choice for the elements of $G_1$ and $G_2$ is shown below, where we focus on the relevant 2 by 2 submatrices of $G_2^{*}$, $\Pi^{*} R^{(k)} \Pi$, and $G_1$:

\[ \underbrace{\frac{r_k}{|r_k|^{2}} \begin{pmatrix} c\delta_1^{*} & s\delta_2^{*} \\ -s\delta_2 & c\delta_1 \end{pmatrix}}_{(G_2^{*})} \underbrace{\begin{pmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{pmatrix}}_{(\Pi^{*} R^{(k)} \Pi)} \underbrace{\begin{pmatrix} c & -s \\ s & c \end{pmatrix}}_{(G_1)} = \underbrace{\begin{pmatrix} r_k & x \\ 0 & y \end{pmatrix}}_{(R^{(k+1)})}. \tag{6.15} \]

Figure 6-2: The operation displayed in (6.15)

If $|\delta_1| = |\delta_2| = |r_k|$, we take $c = 1$ and $s = 0$; if $|\delta_1| \ne |\delta_2|$, we take

\[ c = \sqrt{\frac{|r_k|^{2} - |\delta_2|^{2}}{|\delta_1|^{2} - |\delta_2|^{2}}} \quad \text{and} \quad s = \sqrt{1 - c^{2}}. \tag{6.16} \]

In either case,

\[ x = \frac{sc(|\delta_2|^{2} - |\delta_1|^{2})\, r_k}{|r_k|^{2}} \quad \text{and} \quad y = \frac{\delta_1 \delta_2 r_k}{|r_k|^{2}}. \tag{6.17} \]

Figure 6-2 depicts the transformation from $\Pi^{*} R^{(k)} \Pi$ to $G_2^{*} \Pi^{*} R^{(k)} \Pi G_1$. The dashed box is the 2 by 2 submatrix displayed in (6.15). Notice that $c$ and $s$, defined in (6.16), are real scalars chosen so that

\[ c^{2} + s^{2} = 1 \quad \text{and} \quad c^{2}|\delta_1|^{2} + s^{2}|\delta_2|^{2} = |r_k|^{2}. \tag{6.18} \]

With these identities, the validity of (6.15) follows by direct computation. By the choice of $p$ and $q$, we have

\[ |\delta_2| \le |r_k| \le |\delta_1|. \tag{6.19} \]

If $|\delta_1| \ne |\delta_2|$, it follows from (6.19) that $c$ and $s$ are real nonnegative scalars. It can be checked that the 2 by 2 matrices in (6.15) associated with $G_1$ and $G_2^{*}$ are both unitary. Consequently, both $G_1$ and $G_2$ are unitary. We define

\[ R^{(k+1)} = (\Pi G_2)^{*} R^{(k)} (\Pi G_1) = Q_k^{*} R^{(k)} P_k, \]

where $Q_k = \Pi G_2$ and $P_k = \Pi G_1$. By (6.15) and Figure 6-2, $R^{(k+1)}$ has properties (a) and (b) for $L = k + 1$. Now consider property (c).
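The construction described so far (select $p$ and $q$, permute them into positions $k$ and $k+1$, then apply the two rotations) can be sketched with NumPy. The sketch makes simplifying assumptions that are not in the text: the prescribed diagonal $r$ is real and positive, so the factor $r_k/|r_k|^{2}$ reduces to $1/r_k$, and the majorization condition is assumed rather than checked. The function name `gtd`, the tolerance, and the fallback used when $r_k$ already sits on the diagonal are choices made here. The usage lines apply it to the four-user CDMA example of Section 5.5.4, taking the equal loading $\psi_i = (\prod_i (1+\rho_i))^{1/3}$ as given.

```python
import numpy as np

def gtd(H, r, tol=1e-10):
    """GTD sketch: H = Q R P^* with R upper triangular and diag(R) = r.

    Assumes r is real, positive, and multiplicatively majorized by the
    singular values of H (the existence condition is not checked).
    """
    r = np.asarray(r, dtype=float)
    K = len(r)
    V, sig, Wh = np.linalg.svd(H, full_matrices=False)
    Q, P = V[:, :K].copy(), Wh[:K, :].conj().T.copy()
    R = np.diag(sig[:K])
    for k in range(K - 1):                       # 0-based step index
        d = np.abs(np.diag(R))
        cand = np.arange(k, K)
        big = cand[d[cand] >= r[k] * (1 - tol)]
        p = big[np.argmin(d[big])]               # cf. (6.13)
        others = cand[cand != p]
        small = others[d[others] <= r[k] * (1 + tol)]
        if small.size == 0:                      # safeguard added here:
            small = others                       # r_k already sits at p
        q = small[np.argmax(d[small])]           # cf. (6.14)
        rest = [i for i in cand if i != p and i != q]
        perm = list(range(k)) + [p, q] + rest    # symmetric permutation
        R, Q, P = R[np.ix_(perm, perm)], Q[:, perm], P[:, perm]
        d1, d2 = R[k, k], R[k + 1, k + 1]
        G1, G2 = np.eye(K), np.eye(K)
        if abs(d1 - d2) > tol * d1:
            c = np.sqrt(np.clip((r[k]**2 - d2**2) / (d1**2 - d2**2), 0.0, 1.0))
            s = np.sqrt(1.0 - c**2)
            G1[k:k + 2, k:k + 2] = [[c, -s], [s, c]]
            G2[k:k + 2, k:k + 2] = np.array([[c * d1, -s * d2],
                                             [s * d2, c * d1]]) / r[k]
        R = G2.T @ R @ G1
        Q, P = Q @ G2, P @ G1
    return Q, R, P

# Usage on the CDMA example: L = 4 users, processing gain N = 3.
rho = 10 ** (np.array([20.0, 19.0, 18.0, 17.0]) / 10)   # SINR targets
r = np.sqrt(1 + rho)                                     # prescribed diagonal
psi = np.prod(1 + rho) ** (1 / 3)                        # equal loading
Psi_half = np.diag([np.sqrt(psi)] * 3 + [1.0])
Q, R, P = gtd(Psi_half, r)
S = np.sqrt(psi - 1) * P[:3, :]                          # cf. (5.60)
```

Because the singular value $\sqrt{\psi}$ is repeated, the SVD factors returned by NumPy are not unique, so `S` may differ from the matrix printed in (5.63) by an orthogonal transformation; the decomposition itself, the diagonal of `R`, and the total power `np.trace(S @ S.T)` should nevertheless agree with the values in the text.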
We write $a \sim b$ if $a$ and $b$ are equal after a suitable reordering of the components. Let $a$, $b$, $a^{+}$, and $b^{+}$ be vectors whose components are ordered in decreasing magnitude, and which satisfy

\[ a \sim r_{k:K}, \quad b \sim r^{(k)}_{k:K}, \quad a^{+} \sim r_{k+1:K}, \quad \text{and} \quad b^{+} \sim r^{(k+1)}_{k+1:K}. \tag{6.20} \]

Thus $a_i$ is the $i$-th largest (in magnitude) component of $r_{k:K}$. By the induction hypothesis, we have $a \prec b$. To establish (c), we need to show that $a^{+} \prec b^{+}$. Let the index $s$ be chosen so that $a_s = r_k$, and let the index $t$ be chosen so that

\[ |b_t| \ge |r_k| \ge |b_{t+1}|. \tag{6.21} \]

By the definition of $p$ and $q$, $r_{pp}^{(k)} = b_t$ and $r_{qq}^{(k)} = b_{t+1}$. As seen in (6.20), $a^{+}$ is obtained from $a$ by deleting $a_s = r_k$. The vector $r^{(k+1)}$ is obtained from $r^{(k)}$ by a unitary transformation that changes the value of two elements. In particular, $b^{+}$ is obtained from $b$ by replacing the adjacent pair $b_t$ and $b_{t+1}$ by

\[ y = \frac{b_t b_{t+1} r_k}{|r_k|^{2}}. \]

By (6.21), $|b_t| \ge |y| \ge |b_{t+1}|$. Consequently,

\[ b^{+}_{t} = y. \tag{6.22} \]
We partition the proof of (c) into 2 cases.
Case 1: s < t. Since a+ < ai for all i, a- b, and bi = b+ for 1 < i < t, we have atl a1-1 -,<4 b = b t. b (6.23)


For j > t > s, it follows from the induction hypothesis and the connection between a and a+ that
j-1 j-1jj
I-kl fllaI = lafIfIlatl = H jai K 17 b. (6.24)
i=i=1=1 i=1 i=1
Since G, and G2 are unitary, the determinant of (6.15) gives

1,, k = r ) = Ibt = 17r'y = irkbI+l, (6.25)