Citation

## Material Information

Title:
Transceiver design for MIMO communications : a channel decomposition perspective
Creator:
Jiang, Yi
Publication Date:
Language:
English
Physical Description:
x, 105 leaves : ill. ; 29 cm.

## Subjects

Subjects / Keywords:
Antennas ( jstor )
Channel capacity ( jstor )
Constellations ( jstor )
Eigenvalues ( jstor )
Geometric mean ( jstor )
Matrices ( jstor )
Signals ( jstor )
Supernova remnants ( jstor )
Transceivers ( jstor )
Transmitters ( jstor )
Dissertations, Academic -- Electrical and Computer Engineering -- UF
Electrical and Computer Engineering thesis, Ph. D
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

## Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 2005.
Bibliography:
Includes bibliographical references.
Also available online.
General Note:
Printout.
General Note:
Vita.
Statement of Responsibility:
by Yi Jiang.

## Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
027715281 ( ALEPH )
847496253 ( OCLC )

Full Text

TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
A CHANNEL DECOMPOSITION PERSPECTIVE

By
YI JIANG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2005

ACKNOWLEDGMENTS
Foremost, I thank my advisor, Professor Jian Li, for her support, encouragement, and guidance in the past four years. Dr. Li provided me the invaluable opportunity to investigate those fascinating research problems and always showed full confidence in me. I just hope I can live up to her expectations in my future career. I am very grateful to my collaborator, Professor William W. Hager, whose suggestions and rigorous math have benefited me a lot. I thank Dr. Tan F. Wong for teaching me the information theory. Some of the basic ideas of this dissertation were formulated when I was taking his class in the fall of 2003. 1 would like to thank Dr. John M. Shea, Dr. Tan F. Wong, Dr. Kenneth K. 0, and Dr. William W. Hager for serving in my dissertation committee. Thanks go to all my friends both at the University of Florida and elsewhere who made the last four years full of fun.
This dissertation is dedicated to my parents and my fiance Hongying.

page
ACKNOWLEDGMENTS..................................... iii
LIST OF TABLES.......................................... vi
LIST OF FIGURES......................................... vii
ABSTRACT..................... ......................... ix
CHAPTER
1INTRODUCTION....................................... 1
1.1 Two Categories of Schemes for MIMO Communications.......... 1
1.2 Joint Transceiver Design: Where Tx and Rx Collaborate......... 2 1.3 MIMO Transceiver Design from Channel Decomposition Perspective .3 1.4 Dissertation Outline..................................4
2 LINEAR MIMO TRANSCEIVER DESIGNS...................... 6
2.1 Channel Model and Channel Capacity ..... ... ... ... ...6
2.1.1 Channel Model .. .. .. .. .. ... ... ... .... ... ..6
2.1.2 Channel Capacity .. .. .. .. ... ... .. .... .....8
2.2 Channel Capacity and Cram6r-Rao Bound .. .. .. .. .... .....10
2.3 Rate Performance of Linear Transceivers .. .. .. ... ... ... ..13
3 MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOMPOSITION. .. .. .. ... ... ... ... ... ... ... ... ... ..17
3.1 VI3LAST and ZF-DP .. .. .. .. ... ... ... ... ... ... ..17
3.1.1 VBLAST .. .. .. .. ... ... ... ... ... ... ... ..17
3.1.2 ZF-DP .. .. .. .. .... ... ... ... ... ... ... ..18
3.2 Geometric Mean Decomposition for MIMO Transceiver Design ... 20 3.3 Performance Analyses and Imiplementat ions Issues .. .. .. .. .....21
3.3.1 Performance Analyses .. .. .. .. ... ... ... ... .....21
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking 23 3.3.3 Subchiannel Selection............................ 24
3.3.4 Further Remarks............................... 25
3.4 Performance Examples............................... 27
3.5 Conclusions. .. .. .. ... ... ... ... ... ... .... .....28
4 UNIFORM CHANNEL DECOMPOSITION .. .. .. .. ... ... ....33
4.1 Closed-Form Representationi of MMISE-VBLAST. .. .. .. .. .. ..34
4.2 UCD-VBLAST .. .. .. ... ... ... ... ... ..... ....35
4.3 UCD.-DP. .. .. .. .. ... ... ... ... ..... .... .....39
4.4 Performance Anialysis.................. .. .. .. .. .. ....40
4.11 Diversity Gain Anialysis.........................40
4.4.2 Further Remarks............................... 42

iv

4.5 Numerical Examples . . . . . . . . . . . . . 42
4.6 Conclusions . . . . . . . . . . . . . . . . 44
5 TUNABLE CHANNEL DECOMPOSITION . . . . . . . . . 52
5.1 Introduction . . . . . . . . . . . . . . . . 52
5.2 Channel Model and Preliminaries . . . . . . . . . . 53
5.2.1 Channel Model . . . . . . . . . . . . . 53
5.2.2 Channel Decomposition . . . . . . . . . . . 53
5.2.3 Majorization and Generalized Triangular Decomposition . . 55
5.3 Tunable Channel Decomposition . . . . . . . . . . 57
5.3.1 TCD-VBLAST . . . . . . . . . . . . . 57
5.3.2 TCD-DP . . . . . . . . . . . . . . . 61
5.4 MIMO Communications with QoS Constraints . . . . . . 63
5.5 CDMA Sequence Design . . . . . . . . . . . . 70
5.5.1 CDMA Sequences Maximizing Sum Capacity . . . . . 71
5.5.2 Uplink Case . . . . . . . . . . . . . 71
5.5.3 Downlink Case . . . . . . . . . . . . . 73
5.5.4 Numerical Example . . . . . . . . . . . . 73
5.5.5 Further Remarks . . . . . . . . . . . . . 75
5.6 Conclusions . . . . . . . . . . . . . . . 75
6 NOVEL MATRIX DECOMPOSITIONS . . . . . . . . . . 81
6.1 Introduction . . . . . . . . . . . . . . . . 81
6.2 Geometric Mean Decomposition . . . . . . . . . . 81
6.2.1 Generalized Maximin Properties . . . . . . . . 83
6.2.2 Implementation Based on Initial SVD . . . . . . . 84
6.3 Generalized Triangular Decomposition . . . . . . . . 87
6.3.1 Existence of GTD . . . . . . . . . . . . 87
6.3.2 The GTD Algorithm . . . . . . . . . . . 87
6.3.3 Inverse Eigenvalue Problem . . . . . . . . . . 93
6.4 Conclusions . . . . . . . . . . . . . . . . 95
7 CONCLUSIONS . . . . . . . . . . . . . . . . 99
REFERENCES . . . . . . . . . . . . . . . . . . 100
BIOGRAPHICAL SKETCH . . . . . . . . . . . . . . . 105

v

LIST OF TABLES
Table page
4-1 The UCD-VBLAST scheme ...... .................... 38
5-1 The TCD-VBLAST Scheme ...... .................... 61
6-1 Comparison of SVDEIG and GTD for inverse eigenvalue problems (CPU
time in seconds, singular value and eigenvalue errors in sup-norm) . 93

VI

LIST OF FIGURES
Figure page
3-1 Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and
Mr = 4 for i.i.d. Rayleigh flat fading channels ................... 29
3-2 Complementary cumulative distribution functions of the capacities of 5
subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and
M1 = 5. Results based on 2000 Monte Carlo trials ............... 30
3-3 Complementary cumulative distribution function of the capacity of an
i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 1000 Monte Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20
dB, and (d) 30 dB .................................... 31
3-4 BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh
flat fading channel vs. SNR with (a) Mt = 2 and Mr = 4 and (b) Mt = 4
and A,1 = 4 ......................................... 31
3-5 BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined with OFDM for ISI suppression ....................... 32
4-1 Complementary cumulative distribution function of the capacity of an
i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 2000 Monte Carlo trials. SNR = (a) 10 dB, (b) 10 dB (c) 20
dB, and (d) 30 dB .................................... 49
4-2 Complementary cumulative distribution functions of the capacities of 5
subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and
1r = 5. Results based on 2000 Monte Carlo trials ............... 50
4-3 Uncoded BER performance when using 16-QAM. Results based on 1000
Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 4
and Air = 4 ......................................... 50
4-4 BER performances of the UCD-DP, UCD-VBLAST schemes and the
imaginary UCD-genie scheme. Results based on 1000 Monte Carlo
trials of an i.i.d. Rayleigh flat fading channel with Mt = 10 and M, = 10. 51
5-1 Illustration of the capacity lossless region obtainable via TCD. We assume
K = 3, C, = 3, C2 = 2, and 03 = 1 .......................... 57
5-2 A Matlab function to solve (5.49) ............................. 68
5-3 Inpt SNR vs. Output SINR. The result is based on the average of 500
Monte Carlo trials of a i.i.d. Rayleigh flat fading channel with it = 5
and A,. = 6 .......... .................................. 69
5-4 Input SNP vs. C1. A rank 2 channel is decomposed into two subchannels
with capacities C, and C2 = 10 C, .................... 70
6 1 The operation displayed in (6.8) ....... ....................... 86

vii

6-2 The operation displayed in (6.15) . . . . . . . . . . . 89

viii

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
A CHANNEL DECOMPOSITION PERSPECTIVE By
Yi Jiang
May 2005
Chair: Jian Li
Major Department: Electrical and Computer Engineering
This dissertation studies the signal processing aspect of multi-input multi-output (MIMO) communications. The contribution of this dissertation is twofold.
First, this dissertation presents a new perspective to the MIMO communications: any MIMO scheme can be regarded as a MIMO channel decomposer, which decomposes (in an information loss or lossless manner) a MIMO channel into multiple scalar subchannels. Based on this perspective, this dissertation presents three novel MIMO transceiver designs, the geometric mean decomposition (GMD) scheme, the uniform channel decomposition (UCD) scheme, and the tunable channel decomposition (TCD) scheme. All these schemes deploy either a decision feedback equalizer (DFE) at the receiver or a dirty paper precoder (DPP) at the transmitter. These transceiver designs represent a paradigm shift from the conventional linear MIMO transceiver designs to the nonlinear ones. The superior performance of the GMD and UCD schemes unveils the practical significance of making transmitter and receiver cooperate with each other. That is, such cooperations facilitate achieving the optimal tradeoff between the diversity gain and multiplexing promised by the MIMO communication theory. The TCD scheme represents a unifying solution to a considerably wide range of problems, including designing the precoder for orthogonal frequency division multiplexing (OFDM) communications and the optimal code division multiple access (CDMA) sequence design.
Second, this dissertat ion introduces two novel matrix decomposition algorithms, i.e., the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). The two matrix decompositions form the cornerstones of the three transceiver

ix

designs proposed in this dissertation. Moreover, the two decompositions have significant implications in the matrix analysis community. For instance, the GTD is a new solution to the inverse eigenvaluc problem.

x

CHAPTER 1
INTRODUCTION
1.1 Two Categories of Schemes for MIMO Communications
Communications over multiple-input multiple-output (MIMO) wireless channels have been a subject of intense research over the past several years because deploying multiple antennas at both transmitter and receiver sides can drastically improve the spectral efficiency [1] [2] [3] [4]. For example, in contrast to the conventional additive white Gaussian noise (AWGN) channel whose spectral efficiency is

C(snr) = log2(1 + snr) bps/Hz,

without requiring additional input power, the MIMO channel with Mt transmitting antennas and M receiving antennas can have spectral efficiency as large as [1] [2]

C(snr) = min(AMt, Mr) log2(snr) + 0(1) bps/Hz,

given that there is plenty of scattering in the channel. Many spatial multiplexing methods, e.g., the BLAST scheme [2] [5] [6] [7] [8] [9] [10] [11], have been proposed to reap the great channel capacity.
Improving the data transmission reliability is another advantage of applying multiple antennas in wireless communications. By transmitting the same information through more than one independent fading channel, one can obtain much more reliable communications thanks to the redundance introduced. The space-time coding methods are based on such a rationale, (see, e.g., [12] [13] [14] [15]).
Zheng and Tse [16] show that one can exploit the diversity gain and multiplexing gain promised by the MIMO channel simultaneously. However, there is a fundamental tradeoff between the two gains. Zlieng and Tse's theory provides a unifying framework to measure the performance of any MIMO schemes. Hence designing practical schemes capable of achieving the optimal diversity-inultiplexing tradeoff is a central research topic in MIMO communications.

2

1.2 Joint Transceiver Design: Where Tx and Rx Collaborate
All the aforementioned methods assume that the channel state information (CSI) is available at the receiver (CSIR) only. Under this assumption, collaborations between the transmitter and receiver are difficult in the physical layer. However, if the communication environment is relatively stationary, the availability of CSI at the transmitter (CSIT) is also possible via feedback or the reciprocal principle when time division duplex (TDD) is used. In fact, in the third generation WCDMA standard [17], the CSIT is assumed to obtain improved system performance, which is referred to as the closed-loop transmit diversity or transmit adaptive array (TxAA) technique. Based on this assumption, the joint optimal transceiver design (also referred to as precoding at the transmitter and equalization at the receiver) has recently attracted considerable attentions [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29].
These designs are based on a variety of criteria, including minimum mean-squarederror (MMSE), [18] [21] [22], maximum SNR [21], maximum information rate [19] [20] [22], and BER based criteria [23] [24] [25] [29]. More recently, a unified framework has been presented to accommodate all these criteria, under which the design problems can be solved via convex optimization methods [26].
The aforementioned literature on joint transceiver design considered linear transformations only. It is widely understood that the singular value decomposition (SVD), which decomposes a MIMO channel into multiple parallel subchannels, and water filling can be used to achieve the channel capacity [3]. However, due to the usually very different signal-to-noise ratios (SNR) of the subchannels, this apparently simple scheme requires careful bit allocation (see, e.g., [19] [20] [231) to match the subchannel capacity and achieve a prescribed BER. Bit allocation not only increases the coding/decoding complexity, but also is inherently capacity lossy because of the finite constellation granularity. An alternative is to use the same constellation in all the subchannels, like the schemes adopted by the European standard HIPERLAN/2 and the IEEE 802.11 standards for wireless local area networks (WLANs). However, for this alternative, the BER is dominated by the subchannels with the lowest SNRs. To optimize the BER performance, more signal power could be allocated to the poorer subehannels. Yet this approach causes significant capacity loss due to "inverse water filling" like power allocation. There is apparently a fundamental tradeoff between the capacity and the BER performance.

3

1.3 MIMO Transceiver Design from Channel Decomposition Perspective
In this dissertation, we present a new perspective to the MIMO communications. We regard the aforementioned MIMO schemes as MIMO channel decomposers, which decompose (in an information lossy or lossless manner) a MIMO channel into multiple scalar subchannels. For instance, the MIMO transceiver design based on SVD decomposes a MIMO into multiple eigen-subchannels. Similarly, the V-BLAST scheme decomposes a MIMO channel into multiple scalar subchannels which are referred to as layers by its inventors. These channel decompositions, however, are totally determined by the specific channel realization and one can have little control over how the channel is decomposed. For example, the gains of the subchannels obtained via SVD are totally determined by the singular values of the channel matrix, which one can have no control over.
An interesting question arises: if the transmitter and receiver are allowed to collaborate, how can we design a transceiver that can decompose a MIMO channel into multiple subchannels with prescribed channel gain, and without incurring capacity loss? This dissertation is devoted to answering this question. In the process of pursuing the answer, we investigate the following aspects of the problem.
First, we show that the conventional linear transceivers are inherently inflexible, and we cannot rely on linear transceivers to achieve our desired channel decompositions. Hence we need to go beyond the linearity constraint and investigate the nonlinear schemes, such as a decision feedback equalizer (DFE) and a dirty paper precoder (DPP).
Second, we study the possibility of new matrix decompositions other than using SVD. We propose two novel matrix decomposition algorithms, the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). The two decompositions represent a wide class of matrix decomposition, which has significant implications in the matrix analysis community. For instance, the GTD is a new solution to the inverse eigenvalue problem.
Third, we propose three transceiver designs which combine the new matrix decomposition algorithms with the DFE and DPP. The three designs are the GMD scheme, the uniform channel decomposition (UCD) scheme and the tunable channel decomposition (TCD) scheme. Among them, the UCD scheme can decompose, in a strictly capacity lossless manner, a MIMO channel into multiple subchannels with identical capacities or, equivalently, identical channel gains. Moreover, the UCD scheme is a practical scheme

4

that can achieve the optimal tradeoff between the diversity gain and multiplexing gain. Without incurring any capacity loss, the TCD scheme can decompose a MIMO channel into multiple subchannels with prescribed capacities/channel gains. This scheme is applicable to a wide range of applications, including the multi-task communications where independent data streams with different qualities-of-service (QoS) share the same MIMO channel, and designing the optimal CDMA sequences.
1.4 Dissertation Outline
In Chapter 2, we introduce the data model and some relevant information-theoretic results that will be used in this dissertation. We also review the existing transceiver designs and analyze the performances of those methods. By linking the channel capacity with the Cram6r Rao bound (CRB), we give an information-theoretic explanation why linear transceivers are inflexible.
Chapter 3 presents the GMD scheme that combines the VBLAST detector or DP precoder with the GMD matrix decomposition algorithm. The GMD scheme can decompose a MIMO channel into multiple identical scalar subchannels. This desirable property can bring much convenience to the practical system design, particularly the symbol constellation selection. Moreover, we have shown that the GMD scheme is optimal asymptotically for high SNR in terms of both information rate and BER performance while the computational complexity of our scheme is comparable to the conventional linear transceiver scheme.
In Chapter 4, we propose a uniform channel decomposition (UCD) scheme. Similar to the GMD scheme, the UCD is also based on the GMD matrix decomposition algorithm and can decompose a MIMO channel into multiple identical subchannels. Two remarkable merits of UCD, which are not shared by the GMD scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD can achieve the optimal diversity and multiplexing tradeoff. Moreover, the UCD scheme can decompose a MIMO channel into an arbitrarily large number of independent subchannels, which is an enabling technology to achieve high data rate transmission using small symbol constellations.
Chapter 5 is devoted to tackling a new aspect of the MIMO transceiver design l)roblem. Instead of attempting to optimize the BEt performance for fixed input power and data rate, we propose tie TCD scheme which can decompose a MIMO channel into multiple sutchannels with prescribed channel capacities. We show that TCD is a

5

solution to a wide range of applications, including the applications in which independent data streams with different qualities-of-service (QoS) share the same MIMO channel and design the optimal CDMA sequences.
The mathematical foundations of this dissertation, the GMD and GTD algorithms, are established in Chapter 6. The two novel matrix decomposition algorithms are the cornerstones of the MIMO transceiver designs proposed in this dissertation.
The conclusions are given in Chapter 7.
To read this dissertation, it is unnecessary to plunge into the details of the GMD and GTD algorithms. For this reason, we put them to the latter part of the dissertation. However, a rough understanding of the two algorithms is necessary to appreciate Chapters 3-5.

CHAPTER 2
LINEAR MIMO TRANSCEIVER DESIGNS
2.1 Channel Model and Channel Capacity
2.1.1 Channel Model
We consider a communication system with Mt transmitting and M, receiving antennas in a frequency flat fading channel. The sampled baseband signal is given by

y = HFx + z, (2.1)

where x E CLX1 is the information symbols precoded by the precoder F E CMxL and y E CMx' is the received signal and H E CMxM is the channel matrix with rank K. We assume E[xx*] = oaIL and z N(O, (olM,) is the circularly symmetric complex Gaussian noise, where IL stands for an identity matrix with dimension L. We define the input SNR as
E[x*F*Fx] ( 1
P O2 -Tr{F*F} Tr{F*F}, (2.2)
Ozz
2
where a = -. Designing the MIMO transceivers, including the precoder F and the associated equalizer, is the focus of this dissertation.
We note that the data model in (2.1) is generic. For an intersymbol-interference (ISI) channel with impulse response h = [h0, hi,..., hAM-1]T with (.)T denoting transpose, if a block data with length N are transmitted using the "zero-padded" OFDM, then the received block data can also be written in the form of (2.1) with

h0 0 0 ... 0
h0 0 ... 0

H= 0 ..0 (2.3)

0 h4 1 ... h10
0
0 0 ... 0 hA 11 In this case, H is a Toeplitz matrix with its dimensionality At = N and Air N+M -1. If the OFDM with cyclic prefix is used, the channel matrix is a circulant Toeplitz matrix,

6

7

ho 0 hM-1 ...... hi
hi ho 0 hM-1 h2
0 .
H =(2.4) hM-2 ... hM-1
hM-1 hM-2 ... hi ho 0
0 hM-1 hM-2 ... hi ho

Here, Mt = r N. In either case, if the block data are precoded with the linear
precoder F, then the received data are given in (2.1). This ISI channel problem has been studied in [21] [30].
In an idealized synchronous CDMA (S-CDMA) system where the channel does not experience any fading or near-far effect, L mobile users modulate their information symbols via spreading sequences {sI }[1, each of which has the processing gain N. The discrete-time baseband S-CDMA signal received at the (single-antenna) base-station can be represented as [31]
y = Sx + z (2.5)

where S = [SI,... SL] E R NL and the lth (1 < I < L) entry of x, xj, stands for the information symbol from the lth user. In the downlink channel, the base station multiplexes the information dedicated to the L mobile users through the spreading sequences, which are the columns of S. Then all the mobiles receive the same signal given in (2.5). We remark that (2.5) can also be written as (2.1) with H = IN and F = S. Here Mr = Mt = N is the processing gain. Hence, optimizing the spreading sequences amounts to optimizing the precoder F for a MIMO system. Indeed, this problem has been under intensive research in the past several years.
In summary, both designing a precoder for OFDM transmission through an ISI channel and searching for the optimal S-CDMA sequences can be regarded as special cases in the unifying framework of MIMO transceiver designs. MIMO transceiver designs can be used in the OFDM and CDMA applications after only simple modifications. In this dissertation, we will concentrate on MIMO transceiver design although we will discuss the optimal design of CDMA sequences in Chapter 5.

8

2.1.2 Channel Capacity
Suppose x is a Gaussian random vector. The capacity of the MIMO channel (2.1) is 1
C = log2 I + 2HFF*H*I (2.6)

where denotes the determinant of a matrix. If both CSIT and CSIR are available, we can maximize the channel capacity with respect to F given the input power constraint axTr{FF*} = pu. That is,

CIT = max log2 II + -1HFF*H*i, (2.7)
4 T{FF* }=pa~

where a is as defined in (2.2) and the subscript of CIT stands for "informed transmitter".
Denote the SVD of H as H = UAV*, where A is a K x K diagonal matrix whose diagonal elements {AH,k} =1 are the nonzero singular values of H. The solution to F in (2.7) is [3]
F = VM1/2. (2.8)

Here 4 is diagonal whose kth (1 < k < K) diagonal element Ok determines the power loaded to the kth subchannel and is found via "water filling" to be

k (P) = (2.9)
H,k
2 El
with IL being chosen such that au Ok(P) = pa' and (a)+ = max{0,a}. Then the solution to (2.7) is
CIT = log2 1 + -2 Ak bps/Hz. (2.10)
k=1
Note that some of Ok's can be zeros. In this case, we can only transmit L < K data streams.
If the CSIT is not available, the optimal transmission strategy is to evenly allocate power to each antenna [3]. For this case, F = I ,h and the channel capacity with uninformed transmitter (UT) is Al, / fA2\,,
Cur = log2 1 + bps/Hz. (2.11)
n=1

1 Throughout this dissertation, we assume that the coherent time of the channel goes to infinity. Hence advanced coding is applicable to approach the Shannon capacity.

9

It is proven [32] that if K = CIT
CT--+1 as p-oc. (2.12)
CUT

We claim a stronger relationship as follows. 2 Lemma 2.1.1 For the data model in (2.1), if the channel matrix H is of full column rank, i.e., K = Mt, then

CIT CUT --+ 0 as p oc. (2.13)

Proof: Inserting (2.9) into (2.10) yields

K
2 +
CIT = log2 (1iAH n) (2.14)
n=1
where p is chosen such that
K
Kp E -2 -, (2.15)
n=1 Hn
or
2 = P+(2.16)

Here we assume that all the K subchannels are used because of large p, i.e., IL > 0 for n = 1,2,...,K.
From (2.14), (2.16), and (2.11), and using the fact that K = Mt, we have K + K i
n= A
CIT CUT Z= log2= p+ K (2.17)

Note that
P n=l A,-lir K =1 for 1 P-OC

and that, f(x) = log2 x is a continuous function if x > 0. The lemma follows immediately from (2.17). U
However, we note that CSIT can be very helpful in the following cases:
A. The SNR is low or moderate.
B. H is rank deficient or ill-conditioned.

2 A similar, but somewhat vague, statement is found in [8].

10

C. There are more transmitting antennas than receiving ones, i.e., Mt > M,.
Moreover, the availability of CSIT provides more freedom, which makes it easier to devise joint transceiver design schemes to achieve the underlying channel capacity. This observation is the underlying theme of this dissertation.
2.2 Channel Capacity and Cram6r-Rao Bound
One of the most important significances of the Shannon's information theory is that this theory can predict the highest achievable data rate for a given channel. Similarly, the Cram6r-Rao bound (CRB) [33], which is the inverse of the Fisher information matrix (FIM), can predict the minimum mean squared error (MMSE) an estimator can achieve. In this section, we show that the MIMO channel capacity formula of (2.6) can be rewritten as a function of CRB, or FIM. Based on this relationship, we show that linear transceivers lack flexibility.
We rewrite (2.1) as follows:
y = Hx + z, (2.19)

but we relax the assumption of (2.1) slightly. Instead of assuming spatially white noise, we assume that z N(O, R.). We also assume that the channel input x N(O, R:) also has circularly symmetric complex Gaussian distribution and is independent of z. Then the channel output y N(O, HRXH* + R,). For this more general scenario, the channel capacity is
C = log JR, + HRxH*I (2.20)
IR I (.0

Now Consider the following random vector,

x N (0, R, (2.21)
y HR,, RY

Its log-likelihood function is

log f(x, y) = -const [x* y*] R, RxH* (2.22)
HR, R y

Using the block matrix inversion formula [34], we get,

[ ., RVH*] [A ] (2.23)
HR RY B* o>

where
A = (R, RxH*RyxHRz)- (2.24)

B = (R, RxH*R HRz)- R1H* R (2.25)

and o is irrelevant to the present discussion. From (2.22)-(2.25) we have

Olog f(x, y) = (R RxH*R-'HR,) (x RxH*R 'y). (2.26)

where x is the conjugate of x. Here we define the differentiation with respect to a complex-valued vector as [35, Appendix B] ( -a (a a 1
8 1 8, 3-, 1 7 +3
: (2.27)
0w 2 8* 2
O,,M 3oM+ o-,

where the mth entry of w, wm Um + jVm, m = 1,...,M. The Bayesian Fisher information matrix (FIM) [36] is given by FIM = E [ log f (x, y) log f (x, y)T SOx (2.28)

Based on (2.26) and (2.21), we obtain

FIM = A[I: RH*Rl[ R R HI A
HRx Ry -RyxHRx
= (R RxH*R 1HR) (2.29)
= R; + H*(Ry HRxH*)-'H (2.30)

= R- + H*R;-H (2.31)

Comparing (2.20) and (2.31), we see that R = log2 IRI + log2 IFIMI (2.32)

= log2 RxI log2 ICRBI (2.33)

where
CRB = FINI-' = Rx RxH*R'HR (2.34)

This shows that there exists a simple relation between the Gaussian MIMO channel throughput, which is an upper bound of the information transmission rate for any

12

coder/decoder, and the CRB, which is a lower bound on the covariance matrix of any unbiased estimator of x.
The MMSE estimator of x is

KMMSE -- R1H (HRxH* + R,)-1 y (2.35)

It is easy to verify that the MMSE estimator of x can achieve the CRB. Hence the MMSE estimator is the best we can achieve under the Gaussian assumptions. In general cases, the matrices FIM and CRB are non-diagonal; i.e., the MMSE estimates of the elements of x are correlated. The correlations between the elements of x clearly contain useful information for the subsequent decoding procedures. However, in practice, we only estimate the single elements of x separately and ignore the correlations between these elements. This causes the loss of information. In fact, we can quantify the capacity loss as
Mt
Q_ = log CRBkk log JCRB (2.36)
k=1
where CRBkk denotes the k-th diagonal element of CRB. According to the Hadamard inequality [34], for any positive semidefinite matrix M E CK,

K
IMI < 17 Mkk (2.37)
i=1

and the equality holds if and only if M is diagonal. Hence C,,,, > 0 and there is no capacity loss if and only if CRB is a diagonal matrix.
Based on the aforementioned discussions, we see that i) in general MIMO cornmunications, linear MMSE estimators followed by separate substream decoding are not capacity-wise optimal and ii) if the channel matrix H has the property that CRB of (2.34) is a diagonal matrices, linear MMSE estimators may be the first step of capacity lossless processing. If CSIT is available, the transmitter can apply some precoder F and get a virtual channel matrix
H,, = HF (2.38)

such that CRB is diagonal. This explains why all the existing linear transceiver designs invariably lead to the diagonalization of the chanmel matrix. Indeed, if R, is diagonal and R: = (., then it follows from (2.31) that H,, must have orthogonal columns to get diagonal FIM and hence diagonal CRB. Theii the precoder F = V, which is the right singular vector of H, is the only optimal solution. Yet as we discussed

13

before, this inflexible transceiver scheme can bring many difficulties to the subsequent coding/decoding and modulation/demodulation procedures.
2.3 Rate Performance of Linear Transceivers
To gain more insights into the limitations of the linear transceiver designs, we analyze the asymptotic rate performances of two typical linear transceiver designs for high SNR. We will show that the linear transceivers may suffer from considerable capacity loss and there is apparently a fundamental tradeoff between the throughput and the BER performance.
According to the channel model of (2.1), the received data vector is

y = HFx + z. (2.39)

The optimal linear receiver is always the LMMSE equalizer (also see, e.g., [23]) G,= FH (HFF*H*o2 + aI)-1, (2.40)

which yields the optimal estimate of the information symbol 9 = Gopty. The meansquared-error (MSE) matrix of 9 is

E = (I + t-'F*H*HF)-'. (2.41)

Note that E is a function of the linear precoder F. In the following, we analyze two linear precoder designs based on the minimization of the trace of the MSE matrix (MTM) and the minimization of the maximum diagonal elements of MSE matrix (MMD) criteria, which are referred to as ARITH-MSE and MAX-MSE in [26], respectively. We choose these two schemes because they appear to be the most typical ones and the MMD scheme yields the optimal (or very close to the optimal) performance among all the linear transceivers. Indeed, tile MMD is equivalent to the linear MIN-BER scheme in the flat fading channel case (see [26]). We do not consider the SVD plus water filling scheme herein since it requires the complicated bit loading.
The MTM scheme, or ARITH-MSE, which has appeared in several linear transceiver design papers (see, e.g., [22] [23] and [26]), attempts to minimize tr(E) with respect to F. The MTM precoder turns out to be

FAn'M = VI )1/2 (2.42)

14

where V is as defined in the SVD H = UAV*, and 4) is a diagonal matrix whose ith diagonal element Oi denotes the signal power loaded to the ith subchannel. According to the literature (see e.g. [231 Sec. III-A)

1AHi AH- (2.43)

where p is the Lagrange multiplier which controls the loaded power such that 1 di = pao Suppose p is sufficiently large. Then all the K subchannels are used and

= 2i = pr2 (2.44)

or
-1/2 P + iAH (2.45)
= "A-'o;1(245 Substituting (2.42), (2.43) and (2.45) into (2.41), we see that E is diagonal with the ith diagonal element
K 1-1
k=l AH,k
E ( = + kK=1 A )AH, (2.46)

Then (cf. Equation (28) of [26])

C = -log 2E (2.47)

= log2 (+K ) + 10g2 A,. (2.48)
i=k H,k )
Hence the sum rate of the channel using the MTM scheme is K p +-' K -2 K
CAr : = C = K log2 K k= -1Hk Og2 ,i. (2.49)
i=1 _-.k=1 H,k i=1
The channel capacity with uniform power loading in the K subchannels is

K
CUPL = log2(1 Hi (2.50)
i=1
Here CupL is different from C7T defined in (2.11) in that CupL corresponds to tilhe channel with the transmitter knowing the range space of H.
It follows from (2.49) and (2.50) that
D 2 pt -+' Ai= H A2

CUPL-CITM = log2 (1 + -AH,)-Klog2 ( i log2AHi. (2.51)
i=1 i= 1 ,i i=

15

After some straightforward calculations, we have Y vK A-1
lim CUL CMTM = K log2 zi=1 ,i bps/Hz. (2.52)
(- =1 1/K
P-00 K A-1
1.li=1 "H,i)

Note that for any real valued sequence {AH,i }K1 > 0, the arithmetic mean is greater than EKI, K 1/'K
or equal to the geometric mean, or 1 A (He K H 1 Ai11 Hence we conclude
that limp_~. CUPL CMTM > 0 and the equality holds if and only if {AH,i__ are all the same. We infer from (2.52) that the capacity loss of the MTM transceiver can be quite large if the channel matrix H has a large condition number, which is verified in Section 3.4.
If the same constellation is used for each subchannel, then the substream corresponding to the largest E dominates the overall BER performance. Recall that E = which is proportional to the inverse of AH,i. Hence the subchannels may have very different SNRs especially when H has a large condition number. To mitigate this undesirable effect, one can use the MMD transceiver, or MAX-MSE (cf. [26] Section V-A5), with FMMD = FMTM8, (2.53)

where 8 is a unitary matrix that makes all the diagonal elements of E in (2.41) the same, that is,
I K
E = K Ei. (2.54)
i=1
According to (2.47), the capacity of the channel using the MMD linear transceiver is

K
CMMD = -K log2e = -Klog2 E. (2.55)
i= 1
Thus

KA
1 K K
CTM CMMD = K log2 E log2 E, (2.56)
i=1 i=

= Klog2 1 ) K, (2.57)

g2 K A-1 (2.58)
= K 10g2 11lj(2.58)

16

where to get (2.58) from (2.57) we have used (2.46). Note that the relative capacity loss of MMD compared with MTM is independent of SNR given that all the subchannels are used. Interestingly, we can see from (2.58) and (2.52) that CMTM CMMD = limp__o CUPL CMfTM. We conclude that asymptotically for high SNR, the MMD transceiver has twice the capacity loss of MTM, i.e.,
I 1 bp/-1 2.9
lim CUPL CMMD = 2K log2 bps/Hz, (2.59)
(K -)1/K

although it may yield better BER performance. An intuitive explanation of the capacity loss of the MMD transceiver is as follows. Note that the only difference between MTM and MMD is the prerotation matrix E, which is an invariant operator in terms of information capacity. However, E makes the MSE matrix E non-diagonal, which means that the elements of i = Govty are correlated. Clearly, the correlation contains useful information for symbol detection and decoding. However, the linear equalizer ignores the correlation, which results in the additional capacity loss quantified in (2.58). The analyses here are verified in Section 3.4.
In summary, the MTM transceiver suffers from capacity loss of (2.52) due to the information theoretically non-optimal power loading defined in (2.43). The MMD transceiver suffers from additional capacity loss because it makes the MSE matrix nondiagonal. Hence there is an apparently inevitable tradeoff between the information rate and BER performance if the same symbol constellation is used in the different subchannels. In the next chapter, we will introduce the GMD scheme and clarify that there is not necessarily a tradeoff between BER performance and channel capacity. Indeed, the GMD scheme attempts to achieve the best of both worlds simultaneously.

CHAPTER 3
MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOMPOSITION
3.1 VBLAST and ZF-DP
In this section, we first give a brief introduction to the VBLAST architecture [5], which is equivalent to the generalized decision feedback equalizer (GDFE) [37]. We also introduce the more recent zero-forcing "dirty paper" precoder (ZFDP) applied to the MIMO broadcast channels [38] [39].
3.1.1 VBLAST
VBLAST is a simple suboptimal receiver interface which is used in the MIMO system assuming that only CSIR is available. For a MIMO system (2.1) with Mt < Mr and rank K = Af,, the transmitter allocates independent bit streams across the Mt transmitting antennas with no precoding. To decode the transmitted information symbol, VBLAST first estimates the signal with the spatial structure hM,, where hi denotes the ith column of H, and then cancels it out from the received signal vector. Next, it estimates the signal with spatial structure hM,-1 and so on. The signal estimator can be either the ZF or MMSE estimator. Some proper reordering of the columns of H is helpful to improve the BER performance [5]. This decoding scheme involves sequential pulling and cancellation which is proved to be equivalent to the generalized decision feedback equalizer (GDFE) [37].
The ZF nulling step in the VBLAST scheme can be represented by the QR decomposition H = QR where Q is an Aft x K matrix with orthonormal columns and R is a K x K upper triangular matrix. Let us rewrite (2.1) as

y = QRx + z. (3.1)

Multiplying Q* to both sides of (3.1) yields

SRx + 2, (3.2)

17

18

or
Yl r1l r12 ... rlK Xl
Y2 0 r22 ... r2K X2 + 2
=+ .(3.3)

YK 0 ... 0 rKK XK K
The sequential signal detection is as follows for i = K : -1 : 1
ii= C -( _ZEKi+1 ri~ij) j

end
where C stands for mapping to the nearest symbol in the symbol constellation. Ignoring the error-propagation effect, we see that the MIMO channel is decomposed into K parallel scalar subchannels i = riixi + ii, i = 1, 2,. .., K. (3.4)

3.1.2 ZF-DP
We consider a broadcast MIMO channel with Mt transmitting antennas and Al,. receiving antennas (Mt >_ Al,). The channel model is exactly the same as (2.1) and the CSIT is available. However the receiving antennas cannot cooperate with each other. A vector transmission scheme was proposed in [40], which combines the QR decomposition and "dirty paper" precoding. We refer to this approach as the zero-forcing "dirty paper" precoding (ZFDP). (The use of the "dirty paper" phrase is due to Costa [41].)
The ZFDP scheme resembles the zero-forcing VBLAST method. It also goes through the sequential pulling and cancellation procedure. The only difference is that all these operations are done by the transmitter.
By assuming H to be of full row rank, i.e., K = Air, ZFDP also begins with the QR decomposition H* O f. Let us rewrite (2.1) as

y = R* *x + z. (3.5)

Denoting x = Q k yields
y = R*:c + z, (3.6)

19

or
Y1 { r11 0 ... 0 Z1
Y2 r21 62 ... 0 2 Z2

YK rKl ...... rKK XK ZK

Denote s E CKX1 to be the symbol vector destined for the K receivers. We wish to have k satisfying
?1181 ?11 0 ... 0
r22s2 21 r22 ... :2 (3.8)

rKKSr rKl ...... rKK K
The solution to (3.8) is
k= R-*diag{R}s. (3.9)

However, the matrix inversion can amplify the norm of k significantly which can lead to additional power consumption at the transmitter. By exploiting the finite alphabet property of the communication signals, the modulo arithmetic precoder (more recently known as the Tomlinson-Harashima Precoder [42], [43]) can be applied to bound the value of the transmitted signal. Moreover, the trellis precoding can be used to eliminate the 1.53 dB shape-loss of Tomlinson-Harashima precoding [44]. The ZFDP transmission scheme decomposes the MIMO channel into K parallel scalar channels (see [40] for more details)
yi :i + zi i = 1, 2,..., K. (3.10)

Several remarks are now in order. a) VBLAST is shown to be able to achieve only about 72% of the capacity [5]. That is because imposing the same rate of transmission on all the transmitters makes the channel capacity limited by the worst of the K scalar subchannels. b) VBLAST has only diversity gain of MrA- t+1. c) ZFDP can achieve the broadcast channel capacity for high SNR [39], but the subchannels have different fading levels. Hence the transmitter, just like the aforementioned linear transceivers, have to consider the tradeoffs between the BER performance and the channel throughput. d) ZFDP scheme causes no error propagation, and thus (3.10) is precise. e) Both VBLAST and ZFDP involve nonlinear operations.

20

3.2 Geometric Mean Decomposition for MIMO Transceiver Design
Note that VBLAST assumes no cooperations among transmitting antennas and ZFDP assumes no cooperations at the receivers. Then a natural question arises: can we exploit both the CSIR and CSIT to make things better if both CSIR and CSIT are available? We attempt to address this question next.
In the sequel, we assume that the same signal constellation is used in all the independent symbol streams to reduce the system complexity. This is consistent with the HIPERLAN/2 and IEEE 802.11 standards. Then the overall BER performance of the system will be limited by the subchannel with the lowest SNR. To mitigate this problem, based on (3.4) and (3.10), we consider the following optimization problem

max min {rii : 1 < i < K}
QP
subject to R = Q*HP
R CIKXK,rij= 0 for i> j (3.11)
rii>0 for 1 Q*Q = P*P = IK

where the semi-unitary matrices Q and P denote the linear operations at the receiver and transmitter, respectively.
Since both Q and P are semi-unitary matrices, we have Hn=1 nn n=1 AH,n,
where {AH,,},=i are the K non-zero singular values of H. In Chapter 6 we show that if there exist semi-unitary matrices P and Q satisfying

H = QRP*, or equivalently, R = Q*HP (3.12a)

where the diagonal elements of R are given by

rii = A< K H,n 1
then the R in (3.12) is the solution to (3.11). The detailed treatment of the decomposition (3.12) is delegated to Chapter 6, We refer to this decomposition as the geometric mean deconiposition (GMD) since the diagonal elements of R are the geometric mean of {A ,,,J=1. A computationally efficient and numerically stable algorithm is proposed in Section 6.2 to calculate the decomposition.
It seems reasonable to constrain the linear equalizer Q to be semni-unitary since it will keep the background noise white. Yet it seems unnecessary to constrain P to be

21

semi-unitary as well. Indeed, the constraint that P and Q should be semi-unitary is in fact inactive as shown in the following lemma established in Section 6.2.1. Lemma 3.2.1 The GMD of (3.12) is also the solution to the following optimization problem with relaxed constraints:

max min {rii: 1 PQ

subject to R = Q*HP, rij = 0 for i > j, R G R (3.13)

rii>0, 1
tr(Q*Q) < K, tr(P*P) < K.

Proof: Omitted. See Section 6.2.1 for details. U
The GMD, which can be viewed as an extended QR decomposition, can be readily combined with the aforementioned VBLAST (GDFE) or ZFDP. GMD-VBLAST is implemented as follows: We first calculate the GMD H = QRP*. Next we choose the precoder F = P, then the equivalent data model is

y = QRx + z. (3.14)

The next step is nothing but the VBLAST detector.
Ignoring the error propagation effect, we can regard the resulting subchannels as K independent and identical subchannels

yi = AHxi z, for i= 1,...,K. (3.15)

The GMD-ZFDP scheme is similar to GMD-VBLAST because of the duality between VBLAST and ZFDP.
3.3 Performance Analyses and Implementations Issues
In this section, we first present the performance analyses of the GMD scheme from capacity perspective, from which we demonstrate the advantages of our GMD scheme over the linear transceivers. Next, we consider combining the GMD scheme with the blind two-way channel subspace tracking in the TDD scenario. To achieve close to optimal performance at low SNR, we propose to combine GMD with subchannel selection. Finally, we discuss the relationship between our GMD scheme and [30].
3.3.1 Performance Analyses
As we have mentioned earlier, the overall BER performance of a MIMO communication system is dominated by the worst subchanuels asymptotically for high SNR.

22

Hence the scheme optimizing the worst subchannel can enjoy the optimal BER performance for high SNR. This observation is also the motivation of the aforementioned MMD scheme. As a major advantage over the linear transceiver schemes, the GMD scheme is also asymptotically optimal in terms of the channel capacity for high SNR as we will show below.
If the signal power is allocated evenly to the K subchannels, then based on (3.15), we get
CGMD = K1og2 ( K H (3.16)

where p is defined in (2.2). The channel capacity with uniform power loading on the K subchannels is (see (2.50))

K
CUPL = og2 (I H (3.17)
n=l

It follows from (3.16) and (3.17) that

CUPL CGMD = log2 =I(1 + p ,n) (3.18)
(1 + pX2)K

From (3.12b) and (3.18), we have

lim CUPL CGMD = 0. (3.19)
p--cxD

Based on Lemma 2.1.1
lim CIT CUPL = 0. (3.20)
p-oo
Hence, it follows from (3.19) and (3.20) that

lim CIT CGMD = 0, (3.21)

i.e., for high SNR the GMD scheme is asymptotically optimal.
Hence the GMD scheme does not need to make the tradeoff between the information rate and BER performance as the conventional linear transceivers. Instead, our GMD scheme can achieve the optimum on both aspects simultaneously for high SNR.
As we have mentioned before, VBLAST may suffer from error propagation. Hence the BER performance of GMD-VBLAST will be inferior to the scalar equivalence in (3.15). We calculate the upper bound of the GMD-VBLAST BER as follows. For a fixed SNR p, we assume that the system of (3.15) has symbol error rate (SER) P,, i.e., each subchannel has SER Pa/K. XVe consider the worst case that decoding errors in some

23

subchannels will cause the failure of the decoding in all the subsequent subchannels. The SER upper bound is readily calculated as SK-1
Pe,GMD-VBLAST K (1 Pe)n(K n)P
n=0
K-1
< -k (K-n)Pe
n=O
K + 1Pe. (3.22)
2

For a moderate K, say K < 10, the performance loss caused by the error propagation is rather small. For a system with high dimensionality, GMD-ZFDP is a better choice since it causes no error propagation. On the other hand, the Tomlinson-Harashima precoder leads to an input power increase of M for M-QAM.
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking
In TDD systems, the GMD scheme may be combined with two-way channel subspace tracking techniques. The GMD algorithm, given in Chapter 6, starts with the SVD. To calculate the matrix P (cf. (3.11)), we only need to know the singular values A and the right singular vectors V (cf. Chapter 6). Similarly, only A and U are used to calculate Q. Rewriting (2.1) with the precoder F = P yields,

y = HPx + z. (3.23)

Since the GMD scheme uses the same signal constellation and uniform power allocation, the covariance matrix of s is a scaled identity matrix, i.e., E[xx*] = a2I. Hence,

Ry = E[yy*] = HH*a + azI. (3.24)

If the signal power ax and the noise power az are known a priori, we have HH* = (Ry KI)/aX. Applying SVD to HH*, we get

HH* = UA2U*. (3.25)

The GMD algorithm can be applied based on U and A to get the matrices Q and R, which are sufficient for decoding. If a TDD system is used, the reverse channel, where the roles of previous transmitter and receiver are exchanged, can be modeled as

Yrev, = HTQ*s,.ev, + Zre, (3.26)

24

where the subscript "rev" means "reverse channel". Define Ry E[YrvYv E v] (3.27)

where y denotes the complex conjugate of y. Using the similar argument, we have

H*H = VA2V*. (3.28)

Then the reverse receiver, i.e., the previous transmitter, can calculate R and P from V and A. Channel subspace tracking techniques (see, e.g., [45] [46]) can be used to estimate U, V and A efficiently. Hence our GMD scheme can be applied without the need of using training symbols for channel estimation. We note that this merit of GMD is not shared by the conventional transceiver schemes introduced in Section 2.3 since all those methods allocate different powers to different subchannels, which makes it difficult, if not impossible, to estimate the singular values in A. Of course, if the same power is allocated to each eigen-subchannel, this blind two-way channel subspace tracking idea can also be combined with the SVD based schemes, at the cost of significant capacity loss.
The GMD scheme can be made backward compatible with the TDD systems using VBLAST decoders. By using CSIT or blind subspace tracking techniques, the transmitter can calculate the linear precoder F. Hence it can always precode the transmitted data x to be Px, even when sending the training data. Thus the receiver is "fooled" to believe that the channel is the virtual one H,t = HP = QR. Although the linear precoder P is made transparent to the VBLAST detector, the decoder still enjoys the multiple identical subchannels due to the linear precoder F = P.
3.3.3 Subchannel Selection
The previous discussion is based on the assumption that all the subchannels corresponding to positive singular values are used for signal transmission. However, in practical scenarios, some of the positive singular values of the channel matrix H can be very small. This situation occurs for spatially correlated flat fading channels, or even i.i.d. Rayleigh flat fading channels with 1,M r Alt > 1. From (3.12b), we see that it will influence the overall channel quality and hence subehannel selection is helpful. The other situation where subchannel selection is needed is the case when the input power is low or moderate. In this section, we propose a simple algorithm to select the

25

subchannels, which is numerically verified to be able to achieve near optimal capacity even at low SNR.
Let us sort the singular values of H as AH,1 > AH,2... > AH,K > 0. If GMD is constrained to the first n < K eigen subchannels, we obtain n identical subchannels

yi=A,xi+z, for i=1,...,n. (3.29)

where

wr = F (3.30)

To maximize the channel throughput with our GMD scheme, we need to solve the following problem

max n log 1 + (3.31)
1 or

max 1 + P (3.32)
1< n The solution to this problem is straightforward. We can use either linear search or bisection method to find the optimal n.
Several remarks are in order. i) It is straightforward to incorporate the channel selection into the GMD algorithm. In Section 6.2.2, we show that GMD starts from SVD H = UAV* and then applies a series of Givens transformation to A to make it upper triangular. The Givens transformation can be constrained to the first n. < K diagonal elements of A. ii) The blind channel subspace tracking can be combined with the subchannel selection strategy seamlessly. If only the subchannels corresponding to the largest n < K singular values are selected, the blind channel tracking technique will track the n dimensional subspace automatically. iii) The performance loss of the GMD scheme at low SNR region is due to the well-known fact that the zero-forcing equalizer is inherently suboptimal. In the next chapter, we propose the so-called uniform channel decomposition (UCD) scheme, which can decompose a MIMO channel into multiple identical subchannels in a strictly capacity lossless manner.
3.3.4 Further Remarks
The author later noticed [30] in which an idea similar to GMD was proposed to approach the performance of the ML detector in the ISI suppression scenario. For a SISO ISI channel, if symbols are precoded and transmitted in a block manner, then the

26

data model (2.39) can be used to represent the received block data (cf. (2.3) and (2.4)). Note that for this case, H is a Toeplitz matrix due to the time invariant property of the ISI channel. A linear precoder design F was proposed in [30] such that the virtual channel Hvt = HF can be decomposed via QR decomposition to be Ht = QR where R has equal diagonal elements. We see that this equal diagonal idea is equivalent to GMD. However, our GMD scheme, independently motivated by the MIMO transceiver design problem, has several major advantages over the algorithm in [30]:
1. Our GMD scheme represents a paradigm shift from the conventional linear transceiver designs to nonlinear designs and can be proven, both numerically and theoretically, to have superior performance from both BER and information theoretic
aspects.
2. Our GMD algorithm is computationally much more efficient than that of [30].
Both algorithms start from the SVD of H which is followed by K 1 iterations.
The GMD involves 2K 2 fast Givens rotations. For a channel H with Mt = MT = K, the SVD requires O(Ka) flops while the GMD requires additional O(K2) flops. Thus the computational complexity of the GMD scheme is comparable to the conventional linear transceiver schemes. However, the algorithm in [30] involves multiplications and inversions of matrices in each iteration and the overall
computational burden turns out to be additional O(K4) flops.
3. For the GMD algorithm, only the information of HH*, and hence A and U, are
needed to calculate Q. However, for the algorithm in [30], the equalizer needs to know both the precoder F and H, and hence Hvt = HF, in order to apply the traditional QR to Hvt. Hence it cannot be combined with the aforementioned
blind two-way channel subspace tracking algorithm introduced in Section 3.3.2.
Like the algorithm in [30], the GMD scheme can also be combined with orthogonal frequency division multiplexing (OFDM) for ISI suppression. For a SISO ISI channel with memory L,
L-I
y(n) E hix(n 1) + z(n), (3.33)
1=0
after applying OFDM with block length N, we get a MIMO channel

y = Dx + z (3.34)

where D is a diagonal matrix with the diagonal elements equal to the N-point FFT of h = [ho, hl.. -i.l1]T. Hence the CMD scheme can be applied directly. We expect

27

that GMD-ZFDP may have better BER performance than GMD-VBLAST if N > 1, in which case the GMD-VBLAST may suffer from considerable performance degradation due to error propagations.
3.4 Performance Examples
We present next several numerical examples to demonstrate the effectiveness of the GMD scheme. In all the examples, we assume Rayleigh independent flat fading channels.
In the first example, we consider a Rayleigh flat fading channel with Mt = 4 and M, = 4. We compute the Shannon capacities of the channel with both CSIR and CSIT (CIT, (2.10)), the channel with uninformed transmitter (CUT, (2.11)), the channel using the GMD scheme (CGMD, (3.16)), the channel using the MTM scheme (CMTM, (2.49)), and the channel using the MMD scheme (CMMD, (2.55)). We average the capacities of 1000 Monte-Carlo-generated H realizations. The result is presented in Figure 3-1. We note that the capacity loss of the MMD scheme is about twice that of the MTM scheme at high SNR as predicted in Section 2.3. The relative capacity loss of the MMD scheme compared with MTM is smaller at low SNR because some subchannels are not used at low SNR. The GMD scheme outperforms the linear transceiver designs when the SNR is moderate or high and is asymptotically capacity lossless at high SNR.
Figure 3-2 shows the complementary cumulative distribution functions (CCDF) of the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR equal to 23 dB. The five thin dashed curves denote the channel capacities of the five subchannels obtained via SVD plus water filling. Note that the leftmost thin curve crosses the vertical axis at a value less than one, which means that the worst subchannel (corresponding to the smallest singular value of the channel matrix) is sometimes discarded by water filling. The thick line is the CCDF of each subchannel capacity obtained via GMD. Figure 3-2 further illustrates the disadvantages of the conventional "SVD plus bit allocation" scheme (see, e.g., [19] [20] [23]). The channel capacities of the 5 subchannels obtained via SVD plus water filling range from 0 to about 10 bps/Hz, which suggests that the BPSK or QPSK modulation should be used to match the capacity of the worst subchannel and something like 512 or 1024 QAM to the best subchannel. This bit allocation significantly increases the modulation/demodulation complexity. Moreover, using a constellation with size greater than 256 is impractical for the current RF circuit design technology. For the CMD scheme, on the other hand, the same constellation with a moderate size, say 64-QAM, can be applied to reap most of the channel capacity.

28

To demonstrate the effectiveness of the subchannel selection approach, we consider a 10 x 10 independent Rayleigh flat fading channel. The channel is usually ill-conditioned since some singular values of H are very close to zero. Without the subchannel selection strategy, GMD suffers from performance degradation, especially at low SNR, as seen in Figure 3-3. On the other hand, with the subchannel selection scheme, there is only about 0.2 bit/sec/Hz rate loss compared with the CIT, even at very low SNR.
We compare the BER performance of the GMD-VBLAST scheme with the unprecoded MMSE-VBLAST scheme with the optimal detection ordering, the MTM scheme and the MMD scheme. No error correcting code is used in the simulations. In Figure 3-4(a), H e C4X2 has identically independent Rayleigh fading elements. Hence the channel matrix is usually well-conditioned. Two independent symbol streams modulated as 16-QAM are transmitted. The figure is obtained by averaging 1000 Monte Carlo trials of H. We see that the GMD scheme has more than one dB improvement over the MMD scheme at moderate to high SNR. In Figure 3-4(b), H G C4"4 usually has a large condition number, in which case the MMD scheme is subject to more capacity loss as analyzed in Section 2.3. Four independent symbol streams are transmitted. The BER performance of the GMD scheme is much better than the others. We did not include MTM because it discards some bad subchannels and hence cannot be used to transmit four independent data streams.
In the final example, we combine the GMD scheme with 64-point FFT based OFDM for ISI suppression in a SISO channel. We assume that the channel response hl, I = 0, 1,... L 1, are independent zero-mean circularly symmetric Gaussian random variables with unit variance. The channel length is L = 4. The GMD-ZFDP is about 2 dB better than GMD-VBLAST. This is because GMD-VBLAST suffers from considerable error propagation effect. This result suggests that GMD-ZFDP may be preferred over GMD-VBLAST if the channel has a large dimensionality.
3.5 Conclusions
In this chapter, we introduce a novel joint transceiver design, which combining the geometric mean decomposition (GMD) with the VBLAST equalizer or dirty paper precoder. The GMD scheme can decompose a MIMO channel into multiple identical scalar subchannels. This desirable property can bring about much convenience to the practical system design, particularly the symbol constellation selection. Moreover, we have shown that the GMD scheme is optimal asymptotically for high SNR in terms of

29

Mr = 4, Mt = 4 lid Rayleigh channel 35I I I
IT
30 -- UT
-GMD 7
-A, MTM (ARITH-MSE)
25- MMD (MAX-MSE)

20- A
-,;,0
"~5
0 ...-

10

o I I I I
0 5 10 15 20 25 30
SNR (dB)

Figure 3-1: Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and Mr = 4 for i.i.d. Rayleigh flat fading channels.

both information rate and BER performance while the computational complexity of our scheme is comparable with the conventional linear transceiver scheme. Furthermore, we have shown that the GMD scheme can be applied without the need of using training symbols for channel estimation if combined with subspace tracking techniques. We have also considered the issue of subchannel selection when some of the subchannels are too poor to be useful. The GMD scheme can also be combined with OFDM for ISI suppression. Both the theoretical analyses and empirical simulations have been provided to validate the effectiveness of our approaches.

30

Mr =5,Mt = 5 lid Rayleigh channel SNR = 23 dB

0.9.

0 .8 .

0.7

0.6
U
0.5 0.4

0.3

0.2 .
-cap/dim, GMO
0.1 cap/dim, IT

0
0 2 4 6 8 10
Capacity (bit/sec/Hz)

Figure 3-2: Complementary cumulative distribution functions of the capacities of 5 subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results based on 2000 Monte Carlo trials.

31

M 10, Mt101, SNR.d 0 1U, 10, =10,SNR 10 dB

0.9- 0.90.8 0.685

0.7- 0.70.6- 0.6
IL I.

0.4-v 0.4

0.3 5 .0.3

0.2 GUD 0.2
GMD w/Ch Sol -OMDwC~
0.1 UT 0.1 VWChSo

0 2 4 6 0 12 14 10 Is 20 25 30 35
Capaciy (bit/oc/z) 10Capacity (bitsec/Hz)

(a) (b)
U 10, M, =10, SNR 20OdB M, 10, M,=10, SNR 30 dB

0.9- 0.9

0.8- 0.80.7- 0.7 'V

U..
J 0.6 049

0.4- 0.40.3- 0.30.2- M 0.2J GMO w/ Ch Sof
UT -UT 35 40 45 so 55 00 15 75 so as so a5
Capacity (bit/sac/Hz) Capacity (bit/soc/Hz)

(c) (d)

Figure 3-3: Complementary cumulative distribution function of the capacity of an i.i.d. Rayleigh flat fading channel with Alt = 10 and Al, = 10. Results based on 1000 Monte Carlo trials. SNB = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.

M=2. Mr. 4 Ild Rayleigh channel, 16-OAM M,= 4, M,= 4 ild Rayleigh channel, 16-OAM

--Ordered MUSE-VBLAST --Ordered MUSE-VBLAST
10 -.MTU (ARITOI-USE) -sMMID(MAX-MSE)
D U~(MAX-USE) 10o-, GMO-VBLAST
-GMD-VBLAST

ir10 a

10,

to-" 10'.

10,

SNR (dB) SNR (dB)

(a) (b)

Figure 3-4: PER p~erformlanlce averaged over 1000 Monte Carlo trials of i.i.d. Ray' leigh

flat fading channel vs SNR with (a) AIt = 2 and Mrl = 4 anfd (b) Alt 4 and Air 4.

32

GMD+OFDM, N =64, L =4, 64-QAM

-GMO-VBLAST
--GMD-ZFDP T

Cr 102

10

10 15 20 25
SNR (MB) Figure 3-5: BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined with OFDM for ISI suppression.

CHAPTER 4
UNIFORM CHANNEL DECOMPOSITION We have seen in Chapter 3 that the GMD scheme can have much better performance than the conventional linear transceivers. However, the GMD scheme may suffer from considerable capacity loss at low SNR due to the inherent "zero-forcing" operations which is capacity lossy, especially at low SNR. In this chapter, we propose a uniform channel decomposition (UCD) scheme, which is also based on the GMD matrix decomposition algorithm, to decompose a MIMO channel into multiple identical subchannels. The UCD scheme has two implementation forms. One is the combination of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred to as UCD-VBLAST, and the other includes a dirty paper
(DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as UCD-DP. Just like the GMD scheme, UCD can bring much convenience to the subsequent modulation/demodulation and coding/decoding procedures by obviating the need of bit allocation. Two remarkable merits of UCD, which are not shared by the GMD scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD has the maximal diversity gain. Moreover, the UCD scheme can decompose a MIMO channel into an arbitrarily large number of independent subchannels, which is an enabling technology to achieve high data rate transmission using small symbol constellations.
To facilitate the discussion, we recall the channel model given in (2.1) as follows.

y = HFx + z, (4.1)

where x G CLx is the information symbols precoded by the linear precoder F E Cm,L and y E CMI is the received signal and H C CMM' is the channel matrix with rank K. We assume E[xx*] = o-IL and z N(O, OU2IA,) is the circularly symmetric complex Gaussian noise. We define the input SNR as E[x*F*Fx) 0,21
F !Tr{F*F} -Tr{F*F}. (4.2)
0, 5z2 2 3

33

34

4.1 Closed-Form Representation of MMSE-VBLAST
The UCD scheme is based on the closed-form representation of the VBLAST scheme using MMSE nulling vectors. For MMSE-VBLAST, the nulling vector for the ith layer

we = hjhj + al hi, i= 1,...,Mt. (4.3)
(is
ij=1
The MMSE-VBLAST algorithm can be represented in a concise matrix form which was given in [9] (also see the more detailed version [47]).
Consider the augmented matrix

H
Ha = v/-aIM' (Mr+Mt)xM (4.4)

Applying the QR decomposition to Ha yields

Ha = QI RH A Q I RH (4.5)
Q1e

where RH. E CM xM' is an upper triangular matrix with positive diagonal elements and QUH E CMx"'. Note that H = QuRH. is not the QR decomposition of H since Qu is not unitary. However, we can readily obtain the nulling vectors using Qu. and RH. as shown in the following lemma [47]: Lemma 4.1.1 Let {qi}l denote the columns of Q o and 1{rH,iiLi1 the diagonal elements of RIa, where QH and RH, are given in (4.5). The nulling vectors of (4.3) satisfy
wi -= rH1,iiqHa,i, i= 1,2...,Am. (4.6)

Then the output signal-to-interfere-and-noise ratio (SINR) of the ith layer (i.e., the signal corresponding to hi) using wi is S= i (4.7)
w, j hjh* + o(I wi

Inserting (4.3) into (4.7). we can simplify (4.7) via some straightforward calculations to be (see. e.g.. [48])
Pi = h Ci h11i i = 1,..., M. (4.8)

where C, = E- hjh; + o.

35

The SINRs given in (4.8) are related to RH. as shown in the following lemma: Lemma 4.1.2 The diagonal of RH, given in (4.5) and {pi}j'l given in (4.8) satisfy

a(1 + p) = rHaii i = 1, 2,..., Mr. (4.9)

Proof: See Appendix A. U

An immediate corollary follows.
Corollary 4.1.3 The MMSE-VBLAST detector is information lossless. That is, AMt
Slog(1 + pi) = log IH*Ha-1 + I, (4.10)
i=1
where the right hand side of (4.10) is equal to (2.7) with F = IM,.
Proof: From (4.4) and (4.5), we have

H*Ha-' + I = a-la*HaO = a-iRH*RH. (4.11)

Hence
Mt
log IH*H-1 + I = log (a-r2 j) (4.12)
i=1
According to Lemma 4.1.2, Mt
log IH*Ha-i +I = log(1 +pi).
i= 1

We note that Corollary 4.1.3 coincides with the findings in [48].
4.2 UCD-VBLAST
If we modify the precoder F given in (2.8) to be

F = VMI/2Q* (4.13)

where CLxK with L > K (to avoid capacity loss, we should not choose L < K
in general) and Q* = I, then we see through inserting (4.13) into (2.7) that the F given in (4.13) is also a precoder maximizing the channel throughput. However, introducing Q brings much greater flexibility than the precoder of (2.8). In the following, we concentrate on how to design f.

36

Given the precoder of (4.13), the virtual channel is

G A HF = UAI)1/2* A UEO* (4.14)

where E = AD1/2 is a diagonal matrix with diagonal elements {a}K__. Let Ga denote the augmented matrix

G, = I (4.15)
V'aIL
The UCD scheme is based on the following lemma.
Lemma 4.2.1 For any matrix of the form given in (4.15), we can find a semi-unitary matrix n E CLxK such that the QR decomposition of Ga yields an upper triangular matrix with equal diagonal elements.
Proof: Rewrite (4.15) as

U[EO9Kx(L-K)]1
Ga = U[E K(L-K)0 (4.16)
VSIL I

where Qo E CLxL is a unitary matrix whose first K columns form Q. We further rewrite (4.16) as
l ol OO(.Kx(L-K)] G0 o Ll no*. (4.17)
0 Qo Ita
We can have the following GMD:

j [ U OKx(L-K) QjRjP (4.18)
J -IL= QgRP) (4.18)
VaIL

where R E RILxL is an upper triangular matrix with equal diagonal elements and Qj E C(M-+L)xL is semi-unitary and Pj E CLxL is unitary. Inserting (4.18) into (4.17) yields
= IM, O 1
Go =o QJRJP*g. (4.19)

0 0o
Let no = P* and

Q = II 0]Qj. (4.20)

Then (4.19) can be rewritten to be Ga = QcoRj which is the QR decomposition of G,. The semi-unitary matrix Q associated with G, consists of the first K columns of Q0 (or P*))1 0

37

From Lemma 4.2.1 and Lemma 4.1.2, we conclude that we can always combine a linear precoder and the MMSE-VBLAST detector to uniformly decompose a MIMO channel into L > K subchannels with the same output SINRs. According to Corollary 4.1.3, we can further conclude that the channel decomposition is strictly capacity lossless. We refer to the scheme demonstrated in Lemma 4.2.1 as UCD-VBLAST.
The proof of Lemma 4.2.1 is insightful. Indeed, given the SVD of H and the "water filling" level 41/2 we only need to calculate the GMD given in (4.18). Then we immediately obtain the linear precoder F = V41/2*, where Q consists of the first K columns of P*. Let Qu denote the first Mr rows of QG., or equivalently the first M, rows of Qj (cf. (4.20)). According to Lemma 4.1.1, the nulling vectors are calculated as wi = r i oi, i = 1,2,..., L (4.21)

where rji is the ith diagonal element of Rj and qGo,i is the ith column of Qo.
Some observations can help reduce the computational complexity. For any matrix B e CMxN with SVD B = UBABV* and the augmented matrix with SVD

A = = UAAAV*, (4.22)

the diagonal elements of AA and AB, i.e., AA,j and AB,, satisfy AA,i = A, i = 1,...,N. (4.23)

Moreover

UA = and VA = VB. (4.24)
V A
Hence the SVD of J defined in (4.18) is

j U[ OK(L-K)]E IL (4.25)

where E is an L x L diagonal matrix with the diagonal elements = (, 1 < i < K. (4.26a)

and
di v K + < i

38

Applying the GMD matrix decomposition algorithm given in Section 6.2 to E yields

= (QIQ2 ... QL-1)RJ(PL-IL-2... P). (4.27)

Hence

U[E- OKx(L-K)l U[E!0OKx(L-K)j 1 Q ..L-Rj(TT T T)
(Qx 2-- L-) J L- L-2 lI (4.28)
Then the linear precoder has the form:

F=V [41/2: 0Kx(L-K) P12 .. PL-1. (4.29)

The nulling vectors are calculated according to (4.21) with rj,i = ( =Iai) L, and

QG = U[E 0Kx(L-K)]EQQ2 ... QL-1. (4.30)

Note that Q, and Pt, 1 = 1,2,.. ., L, are Givens rotation matrices and hence calculating (4.29) and (4.30) needs O(Mt(K + L)) and O(Mr(K + L)) flops, respectively.
We summarize the UCD-VBLAST scheme as follows 1
Table 4-1: The UCD-VBLAST scheme step operation flops
1 Compute SVD H = UAV* O(MtM,K)
2 Calculate ,p/2 using (2.9) O(K2)
3 E = A0/2 O(K)
4 Obtain E using (4.26) O(K)
5 Apply GMD to E to obtain (4.27) O(L2)
6 Generate F using (4.29) O(Mt(K + L))
7 Compute Qg using (4.30) O(M,.(K + L))
8 Calculate {w_1 using (4.21) O(M,.L)

Obviously, our UCD-VBLAST scheme has comparable computational complexity to the SVD based linear transceiver designs. An observation relevant to practical implementations is as follows. Note that the receiver does not have to calculate Step 6 since CSIT is available and the transmitter can run Steps 1 to 6. However, if the receiver calculates F, which only takes a small number of flops, and feeds it back to the transmitter,

1 Steps 5-7 can be processed simultaneously as in the GMD algorithm.

39

then the transmitter is relieved from calculating the SVDs. Hence in FDD systems, it is preferable to feed back F, rather than H, to the transmitter. In TDD systems, there are still advantages for feeding back F since this reduces by approximately half the overall computational complexity.
We conclude the discussions of the UCD-VBLAST scheme by deriving the SINR of each subchannel. Note that the diagonal elements of Rj is

rj, = j)l i= 1,2,...,L, (4.31)
1= 1

which is the geometric mean of the diagonal elements of t. It follows from (4.26) that

( K (01 )1/L= ( K 1 /21L
= aL-K 12 + a 1a (a-la + 1) (4.32)
l=1 \l=1

According to Lemma 4.1.2,

Pi = p a-l + 1) 1, i = 1, 2,..., L. (4.33)
\l=1

Hence
L K K
log2(1 +Pi) = 0log2(1 a = og2(1 +-1Hii) (4.34)
i=1 i=1 i=1
which is exactly the CIT in (2.10). Hence UCD-VBLAST is strictly capacity lossless.
4.3 UCD-DP
As a dual form of UCD-VBLAST, the UCD scheme can be implemented by using DP precoding, which we refer to as UCD-DP. For UCD-DP, a direct construction of the linear precoder F as done in Section 4.2 is not obvious. Instead, we exploit the uplink-downlink duality revealed in [49] to obtain UCD-DP.
We convert the UCD-DP problem into the UCD-VBLAST problem in the reverse channel where the roles of the transmitter and receiver are exchanged

y = H*x + z. (4.35)

The UCD-VBLAST scheme can be applied to the channel of (4.35), which yields the precoder Fre, and the equalizer {wJl as in (4.29) and (4.21), respectively. Normalize {wi}= to be of unit Euclidean norm, which we denote as {i}\$_1. Let W = [ ., VL]. According to the uplink-downlink duality, the precoder of UCD-DP should be F = WDq where Dq is diagonal with the diagonal elements {J}_~, which will be

40

determined based on (4.40) below. We use F,ev, the linear precoder in the reverse channel, as the linear equalizer. Then the equivalent MIMO channel is

y = F evHWDqx + F*ez, (4.36)

where the ith scalar subchannel of the MIMO channel is L i-1
Yi = fiHwivxi + E f*HYjv'xj + E f*Hcvjr3~xJ + f*z. (4.37)
j=i+1 j=1

Applying the dirty paper precoder to xi and treating j_11 f*H jVqjyxj as the interference known at the transmitter (note that here we precode the first layer first while for UCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel

L
y, = f*Hvi vAxi + E f* HWjVIqxJ + f*z (4.38)
j=i+1
with SINR
qi IffH' v |2
Sqjf*Hj for i = 1, 2,. .., L. (4.39)
allfi|2 + Lj=i+1 qJf*Hwj12
The next step is to calculate {qi}fLI such that pi = p, 1 < i < L, where p is as defined in (4.33). Let aij = If*Hvj12. Then (4.39) can be represented in the matrix form

all -pa12 ... -palL q 1f 12
0 a22 -Pa2L q2 11f2112
= pa (4.40)

0 ... 0 aLL qL fLI2

It is easy to see that qi > 0, 0 < i < L. It is proven in [49] that EI qi = tr(FF*) = tr(Fe,,Fev). That is, the UCD-DP needs exactly the same power as the UCD-VBLAST to obtain L identical subchannels with SINR p.
The UCD-DP using the Tomlinson-Harashima precoder leads to an input power increase of A for M-QAM symbols. Nevertheless, for a system with high dimensionality and/or using large constellation, UCD-DP is a better choice than UCD-VBLAST since it is free of propagation errors.
4.4 Performance Analysis
4.4.1 Diversity Gain Analysis
An important performance metric is diversity gain, which is defined as follows [161.

41

Definition 4.4.1 Let P(p) denote the average error probability of a scheme at SNR p. The diversity gain of the scheme is
d = lim log Pe(p)

P- logp (4.41)
The diversity gain measures how fast the error probability decays with SNR. We note that diversity gain is usually discussed without assuming the availability of CSIT. The reason is that diversity gain is a concept associated with channel outage, i.e., the case where the channel is too poor to support a target data rate. Using CSIT, one can adjust the transmission rate to avoid channel outage. However, if the rate is fixed, which is desirable in practice, we can also use diversity gain as a performance measure of the transceiver designs. Based on this observation, we analyze the diversity gains of the UCD and GMD schemes. The result is summarized in the following proposition. Proposition 4.4.2 Consider the i.i.d. Rayleigh flat fading MIMO channel defined in (4.1). Let M = max(Mt, Mr) and m = min(M, M,). The diversity gains of the GMD and the UCD schemes are

dM(M, m) = (M m + 1)m, and duc (M,m) = Mm, (4.42)

respectively.

We have applied the typical error event analysis (see [16][50]) to obtain (4.42). The details are relegated to Appendix B. We see that although UCD has a negligible coding gain compared with the GMD scheme at high SNR, it has an additional m2 m diversity gains over GMD. An interesting point to make is that water filling does not help improve diversity gains. Hence at high SNR, water filling is useless in both capacity and diversity aspects.
Given the fact that the GMD scheme is asymptotically capacity lossless for high SNR, it is rather surprising to see the large diversity loss of GMD compared with UCD. We give an intuitive explanation as follows. Note that diversity gain is determined by the typical error events that the MIMO channel is in deep fade. Namely, the diversity gain of a scheme depends on its ability of dealing with bad channels. A deeply faded channel with high input SNR is equivalent to a "normal" channel with low SNR, in which scenario the GMD scheme is far less efficient than UCD as shown in the numerical examples. Consequently, the GMD has less diversity gain than UCD.

42

4.4.2 Further Remarks
Besides the larger coding gain at low SNR and an improved diversity gain at high SNR, the UCD scheme enjoys more flexibility than the GMD scheme. For a rank K MIMO channel, the GMD scheme can support no more than K independent data streams. However, the UCD scheme can decompose a rank K MIMO channel into L > K identical subchannels, and L is not even limited by the dimensionality of the channel matrix. This property of the UCD scheme enables one to achieve high data rate transmission using small constellations as demonstrated in the numerical examples.
The UCD scheme also suggests new ways of channel decomposition which are much more flexible than the conventional SVD based ones. Indeed, one may chose the permutation matrices and Givens rotations to achieve a wide variety of channel decompositions with some prescribed SINRs as suggested by the generalized triangular decomposition (GTD) (See Chapter 6, [51] [52]). This idea is developed in Chapter 5.
Finally, we link UCD with DBLAST [2], which has been shown to be able to achieve the optimal tradeoff between the channel diversity and multiplexing [16]. We observe that each diagonal layer of DBLAST can be viewed as the interleaving of the vertical layers of VBLAST in the space-time domain and each diagonal layer can be regarded as a virtual subchannel with the same capacity. However, DBLAST requires short and powerful error correcting coding to make the virtual subchannel work as a "real" one. This is a major difficulty for the implementation of DBLAST. In addition, DBLAST suffers from boundary wastage. In contrast, our UCD scheme, by exploiting CSIT, applies interleaving (via the Givens rotations and permutations) in the space domain only. This makes the UCD scheme free from the boundary wastage. Moreover, the UCD scheme is decoupled from coding procedures. Indeed, UCD can be concatenated with any error correcting code. Furthermore, UCD makes it easier to design the coding scheme since UCD decomposes a MIMO channel into multiple subchannels with identical capacities. Thus in a slowly time varying channel, UCD is much easier to implement than DBLAST despite their duality. This manifests clearly the values of CSIT.
4.5 Numerical Examples
We present next several numerical examples to demonstrate the effectiveness of the UCD scheme.
In the first example, we assume Rayleigh independent flat fading channels with lt = 10 and A,. = 10. Ve compare the channel capacity using the UCD and GMD

43

schemes. The complementary cumulative distribution functions (CCDF) of the capacity drawn out of 2000 Monte-Carlo realizations of H are shown in Figure 4-1. We see that the UCD scheme outperforms the GMD scheme significantly at low SNR although the difference becomes smaller at higher SNR.
Figure 4-2 shows the CCDFs of the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR equal to 25 dB. The five thin dashed curves denote the channel capacities of the five subchannels obtained via SVD plus water filling. Note that the leftmost thin dashed curve crosses the vertical axis at a value less than one, which means that the worst subchannel (corresponding to the smallest singular value of the channel matrix) is sometimes discarded by water filling. The thick solid line is the CCDF of the capacity of the L = 5 subchannels obtained via UCD. All these subchannels have the same capacity. As discussed in Section 4.2, a rank K MIMO channel can be decomposed into L > K subchannels. The thin solid line represents the case where a MIMO channel is decomposed into 7 identical subchannels using the UCD scheme. Figure 4-2 demonstrates the advantages of our UCD scheme over the conventional "SVD plus bit allocation" scheme (see, e.g., [19]). The channel capacities of the 5 subchannels obtained via SVD plus water filling range from 0 to about 11 bps/Hz, which suggests that the BPSK or QPSK modulation should be used to match the capacity of the worst subchannel and something like 1024 or 2048 QAM to the best subchannel. This bit allocation significantly increases the modulation/demodulation complexity. Using GMD or UCD, we can decompose a rank 5 MIMO channel into 5 subchannels and hence the same constellation with a reasonable size, say 128-QAM, can be used to reap most of the channel capacity. The UCD scheme can do even better. In this example, after decomposing a MIMO channel into 7 subchannels via UCD, we can apply a small to moderate constellation, say 16-QAM or 64-QAM, to achieve the channel capacity.
In the third example, we assume Rayleigh independent fiat fading channels with It = 4 and A1r = 4. We compare the BER performance of the GMD and UCD schemes along with the conventional MMSE-VBLAST with optimal detection ordering in Figure 4-3. We see that both GMD and UCD outperform the conventional VBLAST detector significantly. Moreover, the BER vs. SNR lines of the GMD and UCD schemes have much steeper decreasing slopes, which means much better diversity gains, than the conventional VBLAST. The diversity gains of the GMD and UCD schemes are 4 and 16,

44

respectively. While there is a noticeably larger diversity gain for UCD compared with GMD as shown in Figure 4-3, the difference is not as drastic as the theoretical prediction. It is because the input SNR is not high enough to validate the approximations made in the typical error event analyses (see Appendix B).
In the final example, we compare the BER performance of UCD-VBLAST and UCD-DP in the scenario of a 10 x 10 Rayleigh flat fading channel. To present a benchmark, we also include UCD-genie as the imaginary scenario where at each layer, a genie would eliminate the influence of erroneous detections from the previous layers when using UCD-VBLAST. Figure 4-4 shows that UCD-VBLAST may suffer from some small BER degradations caused by error propagation (about 0.5 dB for BER = 10-4) compared with UCD-genie. The UCD-DP, on the contrary, is free of error propagation and hence has BER performance very close to that of UCD-genie. The slight SNR loss of UCD-DP is mainly due to the inherent power-amplification effect of the Tomlinson-Harashima precoder.
4.6 Conclusions
Based on the GMD matrix decomposition algorithm and the closed-form representation of the MMSE-VBLAST detector, we have introduced the UCD scheme for MIMO communications that can decompose a MIMO channel into multiple subchannels with identical capacities in a capacity lossless manner. We have proposed two versions of the UCD scheme, i.e., UCD-VBLAST and UCD-DP. The UCD scheme can provide much convenience for the subsequent modulation/demodulation and coding/decoding procedures due to obviating the need of bit allocation. We have also shown that UCD can achieve the maximal diversity gain. The simulations show that the UCD scheme has excellent performance even without the use of error correcting codes. The UCD scheme suggests a new way of channel decomposition which enjoys much more flexibility than the conventional SVD based ones.
Appendix A
Proof of Lemma 4.1.2
Rewrite (4.5)

H. = QH 0 A Q ] RRo. (4.43)

45

Let Ha,i (Hi) denote the submatrix containing the first i columns of Ha (H) and ha,i
(hi) the ith column. Then hi
Hi
Ha,i vIi hai 0(i-1)x (4.44)
O(Mt-i)xMt
O(Mt-i)x

For the QR decomposition Ha = QH.RH., the geometric implication of rH,ii is the component of ha,i projected onto the subspace spanned by the ith column of QHa, i.e., qHa,i. Note that qHs,i is orthogonal to the subspace spanned by {qHj -1 or, equivalently, the column space of Ha,i-1. Hence

r ,i= h* ,p ha,i, (4.45)

where PA stands for the orthogonal projection onto th null space of AT. Therefore

Hii = h,, I Ha,i-1 (Ha,i- Ha,i-1)-1 Ha,i- ha,i. (4.46)

Inserting (4.44) into (4.46) yields
r2 *1
rH, = a + h I Hi-1 (H_lHi_1 + aI)- H*-I hi
= a + ah (Hi-1H* I+ aI) h. (4.47)

From (4.8), we see that
pi = h (HiH* + al)-1 hi. (4.48)

Hence r2,i = a(1 + pi). The lemma is proven.
Appendix B
Proof of Proposition 4.4.2 Without loss of generality, we assume H e Cum, each of whose entry is of circularly symmetric Gaussian distribution with zero-mean and unit variance. Consider BPSK modulation. The average error probability of the GMD scheme is

PeD = E [Q (= [Q ( 24)j = E Q 2p (P H
i=(4.49)
(4.49)

46

where the Q-function is defined as /+0o
Q(-) dx.
Q(X) 27r

The diversity gain of the GMD scheme is log PGMD
dGMoD(M, m) = lim e (4.50)
p-.o log p
For any QAM constellation, the average error probability is similar to (4.49) except for some constants before or inside the Q-function. Since we focus on the high SNR region, all these constants will not affect the diversity gain defined in (4.50).
At high SNR, the typical error event is

= {AH 2 -' }. (4.51)

It can be shown that instead of calculating (4.50), which involves complicated integrations, we can compute the following [50, Ch. 3]:

doMu(M, m) = lim log P()4.52) p-so logp (4.52)
Note that
771
= ]H ,, = IH*HI- (4.53)
i=1
According to [53, Theorem 7.5.3] (with straightforward extensions from real-valued domain to the complex-valued domain),

TO
= H*Hj = 19-m1+i (4.54)
i=1
where gig's are independent Chi-squared random variables with probability density

1
f2(x) (i-l)!x- e, x > 0. (4.55)

Now the typical error event can be written as

S M-m+iil 1- f-m+i p-m
Si=1 = {g-,n}i= A -,+i i = 1. ... ,m} (4.56)

47

where o = {{= i}im, : -', ai > m}. Hence
m
P(e) = IP(g2M-m+i a i=1

From (4.55), we know that as E 0,

P(g2 < E) = )'-le-dx j dx = -e'. (4.58)

Using (4.52) (4.58) and (4.57), we calculate the diversity gain as
m D-(M -rr+i) d a
dMD(M, m) = l log fe+ U- 1 (M-rm+i)! do .. dom.
dGMD(M, M)1 -1
p-oo log p
lim log f,+ p- d(M-m+i)ada ... dam. (4.59) =- limlog (4.59)
p-Oo log p
m
= inf (M m + i)ai, (4.60)
Co i=1

where + ,,= o {(na > 0, i = 1,..., m}. To obtain (4.60) from (4.59), we have used the property that the integral in the numerator of (4.59) is dominated by the term with the SNR exponent closest to zero, as p -+ oc (see [16] for details). Here the integration is constrained over + because the integration over E, is dominated by the one over +. The reason is as follows. Suppose only a,..., a'j > 0, j < m, and the other a's, ak, ..., k,,, are negative. Then
m ]
r l P (g- + < a) ;; 11 pgI-I'j~ ( ~ j,
<- + < P-_) -: P

i=1 i=1

Let E+ denote { {,01 : > 0 = a, > m .'.j ak. Clearly,

3 m
inf L(M in + n)a, > inf (Ml mn + i)ai, a i=1 i=1

which implies that the integration over So is dominated by that over 9,+. Solving the optimization problem of (4.60) yields

d;x,(MA, min) = (MAl min + 1)m. (4.61)

48

Now we consider UCD. We observe that the power allocation applied to each eigen subchannel is no greater than p. Hence the overall channel throughput of UCD is
m m
log (1 + P A Rc log (1 + pA4,), (4.62)
Elg 1 m H H<-R~D
i=1 i=1

where the left term denotes the channel throughput associated with uniform power allocation. Applying UCD, we obtain m subchannels with the same SNR:

mm mm
1 + -A2 1 PUCD (1 + pA2,i) -1. (4.63)
= 1 i= 1

The typical error event is

E = {{AH,ij : PUCD < 1}. (4.64)

It follows from (4.63) that

Pi (P)( PP "( )i (1TI H ) P2 (P).
Pi(P)=P 1+- H),i,-1<1 2P(g)>P (1+pH,)-1< 1 AP2
i=1 i=1
(4.65)
It is easy to see that
lirm log Pi(p)= lim logP2(p)(4.66) p-o log p p-o log p
Hence
log P(S) log P, (p)
lim = lim (4.67)
p-o logp p-co log p
which implies that water filling does not help improve diversity gain.
It follows from the analyses of [16] that the UCD scheme achieves the optimal diversity-multiplexing tradeoff. In particular, when the transmission data rate is fixed, disregard the increase of input SNR, the diversity gain is d.,dd(M, m) = Mm.

49

Mt=10, M = 10, SNR=0OdB Mt =10, Mr=10, SNR= 10 dB

I 1W I - GMD0
0.9 UCD 0.9 UCO

0.8 0.8

0.7- 0.7

0.6 0.86
U. IL
6o.o 5 o "
0.U 0.5

A0.4 0.4

0.3 0.3

0.2- 0.20.1 0.1

2 4 8 10 12 14 10 15 20 25 30 35
Capacity (bit/seclHz) Capacity (bWtsecHz)

(a) (b)
M, 10, Mr =10, SNR=20 dB Mt =10, Mr =10, SNR =30dB

0 -9- I I I -mo
0.9 UCO 0.9 UCD

0.8 0.80.7- 0.70.6- 0.6ILILL
0. 0.50.4- 0.40.3 0.30.2- 0.20.1 0.1
I J-.
40 45 s0 00 8 5 55 80 90 95
Capacity (bit/sec/Hz) Capacity (bit/secHz)

(c) (d)

Figure 4-1: Complementary cumulative distribution function of the capacity of an i.i.d. Rayleigh flat fading channel with MAt = 10 and Mr = 10. Results based on 2000 Monte Carlo trials. SNR = (a) 10 dB, (b) 10 dB (c) 20 dB, and (d) 30 dB.

50

M =5, Mt= 5 lid Rayleigh channel SNR = 25 dB
1 1 ( I

0.9 0.8

0.7

0.6

0 0.5

0.4

0.3

0.2 cap/dim, GMD

0.1 cap/dim, UCD (L = 5)
cap/dim, UCD (L = 7)
0 .I I 'II.
0 2 4 6 8 10
Capacity (bit/sec/Hz) Figure 4-2: Complementary cumulative distribution functions of the capacities of 5 subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and M = 5. Results based on 2000 Monte Carlo trials.

Mt= 4, Mr= 4 lid Rayleigh channel, 16-QAM

- Ordered MMSE-VBLAST
-o. GMD-VBLAST 10 *g !: : UCD-VBLAST

E10 '
10

S10
10-4 . .. . tk. . ::

10

10. I I
10 15 20 25
SNR (dB)

Figure 4-3: Uncoded BER performance when using 16-QAM. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with M. = 4 and M. = 4.

51

Mr= 10, Mr= 10 lid Rayleigh channel, 64-QAM

-UCD-VBLAST
1 0 -.

10-2.....

16 18 20 22 24 26
SNR (dB)

Figure 4-4: BER performances of the UCD-DP, UCD-VBLAST schemes and the imaginary UCD-genie scheme. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with AI = 10 and Mr = 10.

CHAPTER 5
TUNABLE CHANNEL DECOMPOSITION
5.1 Introduction
All these aforementioned MIMO transceiver designs focus on improving the communication quality subject to power constraints. In this chapter, we tackle a new aspect of the MIMO transceiver design problem. We regard a MIMO transceiver design as a way of decomposing a MIMO channel into multiple subchannels. As we have mentioned, the MIMO channel decomposition through SVD plus "water filling" lacks flexibility despite its optimality in terms of achieving the maximal overall channel capacity. The success of UCD motivates a much more flexible channel decomposition approach, namely the tunable channel decomposition (TCD) scheme, which is the main result of this chapter. Using the recently developed generalized triangular decomposition (GTD), we propose the TCD scheme to decompose a MIMO channel into multiple subchannels with prescribed capacities or, equivalently, signal-to-interference-and-noise ratios (SINR). The main properties of the TCD scheme are summarized as follows:
1. Given K parallel subchannels with capacities C1, C2,... CK, which is obtained
through applying SVD plus "water filling" to a rank K MIMO channel, TCD can convert the K subchannels into L > K subchannels with capacities R1, R2, -., RL if and only if (Ci, ., CK, 0,.., 0) E RL majorizes (R1, R2,. RL) 1 In particular, E-1C = Ei=1 Ri, i.e., the TCD is capacity lossless.
2. The TCD scheme has two implementation forms. One is the combination of a
linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred to as TCD-VBLAST, and the other includes a DP precoder and a linear equalizer followed by a DP decoder, which we refer to as
TCD-DP.
3. Given the SVD of the MIMO channel matrix, the computational complexity of
TCD, which is to calculate the precoder and equalizer matrices, is O(KL), which
is computationally quite efficient.

1 The concept of majorization is introduced in Section 5.2.

52

53

Almost originated at the same time as the research on MIMO transceiver designs, the optimal design of symbol synchronous CDMA (S-CDMA) sequences has been under intensive study over the past decade (see, e.g., [311[54][551[561). Although the two research topics have been studied in an apparently independent manner in the signal processing and information theory communities, the CDMA sequence design problem can be viewed as a special case of the MIMO transceiver design as we have shown in Section 2.1.1. Hence the TCD scheme can be applied, with little modifications, to the design of optimal CDMA sequences. Moreover, the TCD-VBLAST and TCD-DP schemes can be applied to design optimal CDMA sequences in the uplink (mobile-tobase) and downlink (base-to-mobile) scenarios, respectively. Our TCD scheme, which is independently motivated by the MIMO transceiver design problem, turns out to be related to the scheme proposed in [561. The relationship is discussed in Section 5.3.
5.2 Channel Model and Preliminaries
5.2.1 Channel Model
To facilitate the discussion, we rewrite the channel model used in the previous chapters.
y = HFx + z, (5.1)

where x E CL, I is the information symbols precoded by the linear precoder F G CM x L and y e CMr-I is the received signal and H E CM-xM is the channel matrix with rank K. We assume E[xx*]= U UIL and z N(O, UrIAf,) is the circularly symmetric complex Gaussian noise. We define the SNR as

E[x*F*Fx U2 1
P- Tr{F*F} Tr{F*F}, (5.2)

5.2.2 Channel Decomposition
Denote the SVD of a rank K channel H as H = UAV*, where A is a K x K diagonal matrix whose diagonal elements {A ,k}K I are the nonzero singular values of H. To maximize the channel capacity with respect to F given the input power constraint Tr{FF*} po,/U2, one needs to solve

CrT = max log2 I + a-HFF*H*. (5.3)
TrFF" } p./_2

The optimal linear precoder is (cf. (2.8))

F = Vpl/2. (5.4)

54

Here L = K and P is diagonal whose kth (1 < k < K) diagonal element Ok determines the power loaded to the kth subchannel and is found via "water filling" to be
/ )+
OkW A(5.5) H ,

with pu being chosen such that a E- 1 Ok(P') pc' and (a)+ = max{0, a}. In this case, we obtain K subchannels with capacities

Ck = log2 ( + H4k = log2 bps/Hz, k = 1,2,...,K. (5.6)

Due to the usually large dynamic range of singular values { H,k}k.1, the SVD decomposes a MIMO channel into multiple parallel eign-subchannels with different channel capacities. Moreover, since the optimal power loading levels are fixed as given in (5.5), the achievable MIMO channel decomposition is rigidly given in (5.6) and it lacks flexibility.
Another way of decomposing a MIMO channel is to use the VBLAST detector [5]. The VBLAST scheme involves sequential pulling and cancellation and it decomposes the MIMO channel into K subchannels (or layers as coined in [5]). By changing the ordering of the signal detection, we can get K! subchannel combinations, each of which is capacity lossless [48].
Theoretically, more combinations of subchannels is possible via time sharing (see [57, Ch. 14.3]). Recall that every DBLAST layer sends its data substream across the K transmitting antennas, or VBLAST layers, in a time sharing manner [2]. For example, for a system with Mt = 2, the transmitted data are Vertical Layer-I :1 Y2 x3 Y4 .. (5.7)
Vertical Layer-II 0 X2 Y3 x4 ...

Let xi and yi, i = 1, 2,..., denote the symbols transmitted through the DBLAST layers I and II, respectively, at time i. The receiver first estimates x, and then estimates x2 by regarding Y2 as interference. The estimates of X1, x2 are decoded jointly, which form the output of the diagonal layer I. After subtracting out the effect of x], X2 from the received data, we can estimate and decode Y2, Y3. which form the diagonal layer II. We remark that DBLAST can be viewed as a combination of VBLAST and tie time sharing technique, which decomposes the MIMO0 channel into multiple identical subchannels.

55

However, time sharing can be difficult to implement in practice. For instance, the major difficulty of DBLAST is the requirement of encoding the diagonal layer with short and efficient error correction codes, which limits its practical implementation despite its superb theoretical performance analyzed in [16].
If CSIT is available, more flexible and practical channel decompositions can be achieved. In Chapter 4, we have proposed the UCD scheme which combines the geometric mean decomposition (GMD) developed in Section 6.2 with either an MMSE-VBLAST detector or a DP precoder to decompose the MIMO channel of (5.1) into L > K identical subchannels. Hence, the UCD scheme can achieve the theoretical performance of the DBLAST scheme without resorting to any error correcting coding.
In this chapter, we generalize the results of Chapter 4 and develop a systematic channel decomposition that combines the recently proposed GTD algorithm with either an MMSEVBLAST detector or a DP precoder. We show that given K parallel subchannels with capacities C1, C2,. .., CK, which are obtained via SVD, TCD can convert the K subchannels into L > K subchannels2 with capacities R1, R2, .., RL if and only if (R1, R2,..., RL) is majorized by (C1, . , C 0,..., 0) E R L. This scheme is particularly relevant to the applications where independent data streams with different qualities-ofservice (QoS) share the same MIMO channel [28]. For example, video services usually require higher SNRs than audio services. Decomposing a MIMO channel into multiple subchannels with prescribed capacities and transmitting independent data streams through these subchannels can provide much convenience for resource allocations.
5.2.3 Majorization and Generalized Triangular Decomposition
We introduce several basic concepts and theorems of the majorization theory from [58].
Definition 1 For x, y E nR, if
J J
,] y~ 1 J i=l i=1

with equality holds for j = n, where the subscript [i] denotes the ith largest element of the sequence, we say that x is majonized by y and denote x -<+ y or, equivalently, y >-+ x.

2 If L < K, some eigu-subchanmels are discarde(d, which causes capacity loss. Hence we focus on the case of L > K.

56

Definition 2 An n x n matrix P is doubly stochastic if its (i,j)th entry pij 0 for i,j = 1,...,n, and -=lPij = 1 and _'=lPij = 1. Theorem 5.2.1 x -<+ y if and only if there exists a doubly stochastic matrix P such that x = Py.
A square matrix 11 is said to be a permutation matrix if each row and column has a single one, and all the other entries are zero. There are n! permutation matrices of size n X n.
Theorem 5.2.2 The permutation matrices constitute the extreme points of the set of doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex hull of the permutation matrices.
It follows from Theorems 5.2.1 and 5.2.2 that the set {x Ix -<+ y} is the convex hull spanned by the n! points which are the permutations of y.
As we have mentioned before, given K parallel subchannels with capacities C1, C2,..., CK, which are obtained via SVD, TCD can convert the K subchannels into L > K subchannels with capacities R1, R2,. . RL if and only if (R1, R2,.. ., RL) -<+ (Cl, . CK, 0, ...,O) 0
RL. For example, for a MIMO channel H with rank K 3, assume that the capacities of the 3 subchannels obtained via SVD are C, > C2 > C3. If L = K, then TCD can decompose the MIMO channel into 3 subchannels with a rate vector r = (R1, R2, R3) if and only if r lies in the convex hull

CO C2 ,C3 C1 C3 C2 ,C1 (5.9)

1 C3 C2 C3 C' C1 C2

Here Co stands for the convex hull defined as

Co{S} = {JOxj + ... + OKxKjxj E S,9, > 0,01 +OK = 1}. (5.10)

In general, the "capacity region" is a convex hull defined by K! vertices in a Kdimensional space. Since the TCD is capacity lossless, i.e., E Ci = 1ii1 Ri, the capacity region falls into a (K 1)-dimensional hyperplane. The gray area in Figure 5-1 shows the convex hull of (5.9) with C, = 3, C2 = 2, and C3 = 1. In this case, the 6 vertices lie in the 2-D plane {x xi= 6}. An interesting special case is the UCD scheme [59], which achieves the rate vector corresponding to the center of the convex hull, i.e., r = (2,2,2).

57

Capacity losses region (C1 : 3, C2 = 2, C3 1)

(1.;2,3)

3
2.5- (1,3,2)
2.5

2

1 .5- M12
1

Figure 5-1: Illustration of the capacity lossless region obtainable via TCD. We assume K = 3, C1 = 3, C2 = 2, and 03 =1.

Definition 3 For x, y E ll if
0 2

fix~ i=l i=l

with equality for j = n, we say that x is multiplicatively majorized by y and write x -<, y or, equivalently, y >-. x.
Obviously, if x -<, y, then logx -<+ logy.
Now we are ready to introduce the GTD theorem.
Theorem 5.2.3 (GTD theorem) Let H E C"nx have rank K with singular values A E IR'. There exists an upper triangular matrix R E CKXK and matrices Q and P with orthonormal columns such that H = QRP* if and only if the diagonal elements of R satisfy Irl -< A.

Proof. We relegate the proof to Chapter 6. U
There is a coniputationally efficient and numerically stable algorithm to achieve the GTD predicted by Theorem 5.2.3, which is presented in Chapter 6.
5.3 Tunable Channel Decomposition
5.3.1 TCD-VBLAST
We see from (5.2) that F can always be scaled such that a = 1. Hence without loss of generality, we let o = I in the sequel to simplify the notation.

58

Denote the SVD of a rank K channel H as H = UAV*, where A is a K x K diagonal matrix whose diagonal elements {AHk k=1 are the nonzero singular values of H. The conventional SVD based linear transceiver designs have precoder F = V1/2 where & is a diagonal matrix whose diagonal elements stand for the power allocation. The precoder F transforms the MIMO channel into K orthogonal subchannels with capacities
Ck = log2(1 +A k) bps/Hz, k= 1,2,...,K. (5.12)

For this kind of precoder design, the only way of controlling the capacity of the subchannels is to change the power allocation 4.
If we modify the precoder F to be 3

F = V 1)/2f2T (5.13)

where Q E RLxK with L > K, and fTO = I, then it can been readily seen that introducing f does not change the overall channel capacity. However, it brings much greater flexibility as demonstrated in the following theorem. Theorem 5.3.1 (TCD Theorem) Consider a MIMO channel of (4.1) with F given in (5.13). For any L > K, let c E RIL be a zero vector with its first K elements replaced with {Ck k=1, where Ck = log (1 + A ,kk). Given any rates {Rk}k=1, we can find an orthonormal matrix Q E RLxK such that the combination of the linear precoder F = V41/2 T and the MMSE-VBLAST detector yields L subchannels with capacities
I L IL
{Rk k=1 if and only if Rk =1 -+ C.
Proof: Given the precoder of (5.13), the virtual channel is

G A HF = UA /2T = UAcG f2T (5.14)

where AG = A(1/2 is a diagonal matrix with diagonal elements

A = AH,;

3 Letting Q to be complex-valued does not introduce additional flexibility as is clear according to the GTD algorithm.

59

Let the augmented matrix Ga be defined as

G = UAt QT (5.16)
IL I (M,-+L)xL

After some straightforward calculations, we can obtain the SVD of Ga as the following: U[AGc.0Kx(L-K)]GAc n
Ga =[U[AOKL-K)] AGo 0, (5.17)

where fo E RLxL is orthogonal with its first K columns forming f and the diagonal matrix AGa contains the singular values of Ga: {~~ 1/+~f, l< i< K,

AcG.,i = + A 1 i < K(5.18)
1, i > K.

According to Theorem 5.2.3, we can apply GTD to obtain Aco = QRcGpT (5.19)

if and only if the diagonal elements of RGo RLxL, which we denote as {fGii iL1 satisfy
{|rGa*,iIi {L=1 x = (5.20)

Note that both Q and P in (5.19) are real-valued matrices because AGa is a real-valued diagonal matrix. Inserting (5.19) into (5.17) yields U[Ac' 0Kx(L-K)]A# T
Ga = o QR 0oP (5.21)
Go g
Choose Go = pT and define UAG. OKx(L-K) (QGo = Q. (5.22)

Then (5.21) can be rewritten as Ga = QGoRcGo, which is the QR decomposition of Ga. By Lemmna 4.1.2, it follows that for a = 1, (5.20) is equivalent to {1 + p} = {I_ - ?1+ ~=l G gji=l G ~ii=l,

60

where pi, 1 < i < L, denotes the output SINR of the ith subchannel, and AG.,i is given in (5.18). If
_}= = {log(1+ p -)}- .+ {logA ,}=1 = c, (5.24)

then (5.20) and (5.23) hold, which implies the existence of Q (the first K columns of PT).
Conversely, suppose that there exists a semi-unitary matrix Q such that the linear precoder F = V4 1/2[T and the MMSE-VBLAST detector yields L subchannels with capacities {Rk} k=L. Let Go = QGoRG, be the QR decomposition. It follows from Theorem 5.2.3 that (5.20) holds. Hence, by (5.23), we conclude that (5.24) holds. U
The proof of Theorem 5.3.1 is constructive. Indeed, given the SVD of H and the power loading level #1/2, we only need to calculate AG, AGa, and the GTD of AGa given in (5.19). Then we immediately obtain the linear precoder

F = Vj1/2QT = V [1/2 OKx(L-K) P. (5.25)

Let Qo denote the first Mr, rows of QGa. Then it follows from (5.22) that Qu = U[AcGiOKx(L-K)]A'Q

= U [rI OK(L-K) Q, (5.26)

where F E RKxK is diagonal with its ith diagonal element being yj According to Lemma 4.1.1, the nulling vectors are calculated as 1= G 1 < i < L, (5.27)
Wi = r'G.,iiq (5.27)

where rGa,ii is the ith diagonal element of RG, and qGo,i is the ith column of Q.
In the GTD algorithm, P and Q are obtained via multiplication of L 1 Givens rotation matrices. Hence calculating (5.25) and (5.26) needs O(Mt(L+K)) and O(M,(L+ K)) flops, respectively. We note that the decoding starts with the Lth layer, then the L Ith, and so on.
Given the SVD of H and the power allocation level -1, the TCD-VBLAST scheme needs to run the procedures summarized in Table 5-1. If Mt = AMr = K, then the TCDVBLAST scheme requires only O(L2 + K2 + KL) flops, given the SVD of the channel matrix.

61

Table 5-1: The TCD-VBLAST Scheme step operation flops
1 Calculate AG = A'1/2 O(K)
2 Obtain AGa using (5.18) O(K)
3 Apply GTD to AG, to obtain (5.19) O(L2)
4 Generate F using (5.25) O(Mt(L + K))
5 Compute Q' using (5.26) O(M,(L + K))
6 Calculate {wij=l1 using (5.27) O(MrL)

5.3.2 TCD-DP
Similar to UCD, the TCD also have two implementation forms, which are dual to each other. As a dual form of TCD-VBLAST, the TCD scheme can be implemented by using a DP precoder, which we refer to as TCD-DP. For TCD-DP, a direct construction of the linear precoder F as done in Section 5.3.1 is not obvious. Instead, we exploit the uplink-downlink duality revealed in [49] to obtain TCD-DP. This technique is also used in [59].
We first apply the TCD-VBLAST scheme to the reverse channel

y = H*Fx + z, (5.28)

where the roles of the transmitter and receiver are exchanged and the H in (5.1) is replaced by H*. Then we obtain the precoder F and the equalizer W [wl,...,WL] from H* according to (5.25) and (5.27), respectively. Applying F and the VBLAST detector with nulling vectors {wi}',, we obtain L subchannels i-1
wfy = w*H*fixi+ w*H*fjxj + wz, i= 1,...,L, (5.29)
j=1
where the ith subchannel (5.29) is free of interference from the jth (j > i) subchannels which are detected and cancelled out in advance. The SINR of the subchannel (5.29) is wH*f, 12 U2
wiw + j wHf (5.30)
wf weF-= lwf H*fl2o42
Note that replacing wi by Wi, which is obtained by scaling wi such that 1 wi[ = 1, does not change pi since the output SINR is invariant to the length of wi. Also note that a = 1, i.e., or = o Hence (5.29) can be simplified to be Iw*H*fi 2
Pi = 1 .-1 (5.31)
1 + J: w;~v*H*fj12

62

Let fi, i = 1,..., L, be the scaled version of fi and has unit length. Denote pi = |fs]j2. Then
IW*H*fi 1p, 5
p = H*f2 i= 1,...,L.(5.32)
1 + E ', i*H*f l2Pj

Let aj = f*Hwj12 Then (5.32) can be represented in the matrix form all 0 ... 0 Pl pl
-p2a12 a22 P2 P2 (5.33)
(5.33)
0
-PLal1L -PLa2L '" aLL PL PL

According to the uplink-downlink duality, in the original channel, the precoder of TCDDP should be F = [_W1,..., V/qL], where {qi}L1 will be determined later in (5.37), and the receiving vectors are fi, i = 1,...,L. Then we get L subchannels whose ith scalar subchannel of the MIMO channel is L i-1
Yi = f*HHW4vi-x + E f UjvfHW,-yxj + f*HWj/,-7xj + f*z. (5.34) j=i+1 j=1
i-1
Applying the dirty paper precoder to xi and treating E'1, fj HW qJxJ as the interference known at the transmitter (note that here we precode the first layer first while for TCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel
L
y. = f*Hi v-i + E f,*HvW4/xj + fi*z (5.35)
j=i+1

with SINR (again, recall that a = 1 and o2 = a ) f 1 *H I2
p = for i= 1,2,...,L. (5.36)
1+ I-j=i+l qJ If;iHj2,

Similar to (5.32), (5.36) can also be represented as a11 -pla12 ... -plalL q Pl
0 a22 ... -P2a2L q2 P2
= i(5.37)

0 ... 0 aLL qL PL
L
It is easy to see that qj > 0, 0 < i < L. It is proven in [49] that y= q = tr(FF*) = tr(FF*) = I'-. pi. That is, to obtain L subchannels with SINRs {f }L the TCD-DP

63

needs exactly the same power as the TCD-VBLAST. To make this chapter self-contained, we give below an alternative proof to this interesting and useful fact.
Let UA denote a strictly upper triangular matrix whose (i,j)th entry is aij for 1 < i < j < L and zero otherwise. Let D A and Dp,, be two L x L diagonal matrices with their ith element equal to aii and pi, respectively. Then (5.32) can be rewritten as

(DA- DPu ) p = p (5.38)

or equivalently
(DP DA &U) p = 1 (5.39)

where p = [PI,. PL]T, P = [P, - PL]T and 1 is a vector with unit elements. Hence

p = (9 1DA -AT) -11 (5.40)

Similarly, (5.37) can be rewritten as

(DA DUA) q = p (5.41)

or
(D'DA UA) q =1. (5.42)

Hence
q = ()p'DA UA) 1. (5.43)

From (5.40) and (5.43),
L L
pi = 1T (Dp A _T)-l 1 = 1T (7p-1DA 1A) 1 = qi. (5.44)
i= 1 i=1
We can use the Tomlinson-Harashima precoder [42][431 or the trellis precoder [441 to achieve known interference cancellation at the transmitter. For a system with high dimensionality, TCD-DP is a better choice than TCD-VBLAST since it is free of propagation errors.
5.4 MIMO Communications with QoS Constraints
In this section, we apply the TCD scheme to MIMO communications with QoS constraints. Suppose we want to transmit L > K independent data streams through a MIMO channel. Instead of multiplexing all the substreams in the time division manner to share the entire MIMO channel, we apply TCD to decompose the MIMO channel into multiple subchannels whose capacities/SINRs meet the QoS requirements of the

64

substreams, and dedicate one subchannel to each substream. In [28], the authors studied the same problem. They proposed a linear transceiver design which, similar to TCD, can also control the SINR of each subchannel via designing the precoder. However, the linear transceiver is capacity lossy and can suffer from considerable performance degradation compared with our TCD scheme as we will show at the end of this section. Given that all the subchannels meet the QoS constraints, we want to minimize the overall input power. We need to solve the following optimization problem: minF tr (FF*)
subject to = QR (5.45)
IL

diag(R) = { 1-+}i=l.

Here QR denotes the QR decomposition and diag(R) denotes the vector formed by the diagonal of R. According to Lemma 4.1.2, the diagonal of R determines the SINRs of the subchannels. Without loss of generality, we assume that p, > P2 > ... > PL, We now consider a problem whose constraints are more relaxed than those of (5.45):

minF tr (FF*)

subject to AG, >-{ V -}L+ Ga= (HF (5.46)
\ IL

where AG. stands for the singular values of the augmented matrix Ga. In general, for any matrix A, we let AA denote the singular values of A. By Theorem 5.2.3, if F is feasible in (5.45), then F is feasible in (5.46). We now further simplify (5.46) and show that its solution provides a solution of (5.45). Theorem 5.4.1 If H = UAV* is the singular value decomposition of H, then (5.46) has a solution of the form F = Vp1/2 where 4 e ]RKK is a diagonal matrix with diagonal elements i. 1 < i < K, chosen to solve the problem

min, Ei l Oi

subject to H-i=1( + 4.q ))-li1(l+pi), Ok>k+ >0, 1
I+ Aio) = I-I=l1(1 + pi).
(5.47)

65

Moreover, if QRG.PT is the GTD of AG. in (5.19), then (5.45) has the solution F = V4)1/2fT where 41 is a solution of (5.47) and fl is the matrix formed by the first K columns of pT.
Proof: See Appendix A. U
We now develop an efficient algorithm for solving (5.47). We will see that the constraint /k > Ok+I can be omitted since it is automatically satisfied at a minimizer of (5.47). To begin, we make a change of variables to further simplify the formulation of (5.47). We define
i + 1/AH,i, 1 < i < K,

/3i = I
-,7, i1H=K( + Pi).
H,K

With these definitions, (5.47) reduces to

min z(5.49)
kk
subject to 1i=1 'Vi > H 1 3i, Ok 1/A,k, 1 < k < K.

Both the equality constraint and the inequalities Ok Ok+1 in (5.47) have been dropped since these constraints are automatically satisfied at an optimum. The fact that Ck > k+1 is established after Lemma 5.4.2. With regard to the equality constraint, if V) is feasible in (5.49) and the inequality corresponding to k = K is strictly positive, then the cost is reduced when the trailing components of 0 are lowered. That is, if 0 is feasible in (5.49) and the inequality corresponding to k = K is strictly positive, then the cost is reduced when the trailing components of 0 are lowered.
Clearly, the feasible set for (5.49) is nonempty and the cost function tends to infinity as any of the components of 7P tends to infinity. By continuity of the cost function and the constraints, a minimizer must exist. We now analyze the structure of the minimizer. By exploiting the structure, we obtain a fast algorithm for solving (5.49).
We first study a similar optimization problem with relaxed constraints. Lemma 5.4.1 Any solution ip of the problem K k k
min E Oi subject to fJIbi >1 f3 1 < k < K, (5.50)
i=1 i= i=1

has the property that Oi+1 < 4'i for each i.

66

Proof: We replace the inequalities in (5.50) by the equivalent constraints obtained by taking log's:
k k
Zlog(Vi) > Elog(3i), 1 < k < K,
i=1 i=1
The Lagrangian L associated with (5.50), after this modification of the constraints, is K /
4(0, ) = E Ok -Ilk (og(oi) lo0g3i)))
k=I \

By the first-order optimality conditions associated with 4, there exists I > 0 with the property that the gradient of the Lagrangian with respect to 4, vanishes. Equating to zero the partial derivative of the Lagrangian with respect to Vj, we obtain the relations

K
=~ E pi, j= I.,K.
i=j

Hence, Oj Oj+l = pj > 0. U
Using Lemma 5.4.1, we can gain insights into the structure of a solution to (5.49). Lemma 5.4.2 There exists a solution 4, to (5.49) with the property that for some integer j E [1,K),
1
V + 5 4'i for all i < j, Vi+j1 >_ Vi for all i > j, i = for all i > j. (5.51)

In particular, Oj < Vi for all i.
Proof: If 4 is a solution of (5.49) with the property that Vi > -4- for all 1 < j < K, then by the convexity of the constraints, it follows that 4, is a solution of (5.50). By Lemma 5.4.1, we conclude that Lemma 5.4.2 holds with j = K. Now, suppose that 4, is a solution of (5.49) with Vq 1/=IA2i for some i. We wish to show that Ok = 1/A2k for all k > i. Suppose, to the contrary, that there exists an index k > i with the property that Ok = /A 2 and Obk+I > 1i/A2k+]. We show that components k and k + 1 of 4, can be modified so as to satisfy the constraints and make the cost strictly smaller. In particular, let 0(c) be identical with except for components k and k + 1:

Vk(6) =(1 I + ()k and Vlk+I(() = 1+e (5.52)

For e > 0 small, O(c) satisfies the constraints of (5.49). The change A(() in the cost function of (5.49) is
+ +'Ok+ 1
( ) = (1 + ()2,. + -- .- 14-1.
l+c

67

The derivative of A(c) evaluated at zero is

A'(0) = /k Ok+1

Since 1/Ak is an increasing function of k and since k = 1/A2,k, we conclude that k+1 > /k and A'(0) < 0. Hence, for E > 0 near zero, /(E) has a smaller cost than i(0), which yields a contradiction. Hence, there exists an index j with the property that #i = 1/A,i for all i > j and i > 1/A2,i for all i < j.
According to Lemma 5.4.1, 4i i/ i+ for any i < j. To complete the proof, we need to show that j, < Vj+4. As noted previously, any solution of (5.49) satisfies K K

i=1 i=1

which implies (cf. (5.48))

j j- ( 11 J2
II, = > I ,.
i=1 i=1 i=j+l i=1

That is, the constraint j=1 #i > HI= i i n (5.49) is inactive. If Oj > Oj+, we will decrease the j-th component and increase the j + 1 component, while leaving the other components unchanged. Letting V(6) be the modified vector, we set VJ+1(6) = (1 + 6)&j+l and #j(6)- = 1 1+6'

Since the j-th constraint in (5.49) is inactive, 0(6) is feasible for 6 near zero. And if Vkj > 4j+l, then the cost decreases as 6 increases. It follows that ,j < j+i. U
By Lemma 5.4.2, , is a decreasing function of i for i C [1,j] while Oi = 1/AL, for i > j. Since AnH, is a decreasing function of i, it follows that Oi = i 1/A',i is a decreasing function of i for i E [1,j] with Oi > 0, while Oi = 0 for i > j. Hence, Oi is a decreasing function of i E [1, K]. In particular, the constraint Ok > k+1 in (5.47) is automatically satisfied by the associated solution characterized in Lenmma 5.4.2.
We refer to the index j in Lemma 5.4.2 as the "break point." At the break point, the lower bound constraint 0i > 1i/Ai changes from inactive to active. We now use Lemma 5.4.2 to obtain an algorithm for (5.49). Lemma 5.4.3 Let %k denote the k-th geometric mean of the 3i: k )1/k

68

function b = TCDPow (0,A)
L = 1 ; R = length (03) ; = zeros (1, R)
( = cumsum (log (/3)) ;
while R > L
[t, 1] = max (((L:R)./[i:R-L+1i]) ; 7y, = exp (t) ; L1 = L + 1 1 ; if > 1/A(LI)^2
i(L:Li) = Ti ;
L=L+ ;
C(L:R) = C(L:R) (L-i) ;
else
iP(L1:R) = 1./(A(L1:R).^2) ; C(Li-1) = ((R) sum (log (4(L1:R))) R = L1- 1 ;
end
end

Figure 5-2: A Matlab function to solve (5.49).

and let I denote an index for which 7k is the largest:

S= arg max{k : 1 < k < K}. (5.53)

If T~i > 1/A then putting (4 = -t for all i < 1 is optimal in (5.49). If T < 1/Ar, then = /AH,i for all i > 1 at an optimal solution of (5.49).
Proof: See Appendix B. U
Based on Lemma 5.4.3, we can use the following strategy to solve (5.49). We form the geometric mean described in Lemma 5.4.3 and we evaluate 1. If i > 1/A2,1, then we set i = 7 for i < 1, and we simplify (5.49) by removing Vi, 1 < i < 1, from the problem. If < 1/A j1, then we set Vh = 1/Aji, for i > 1, and we simplify (5.49) by removing Vj, I < i < K, from the problem. The Matlab code TCDPow implementing this algorithm appears in Figure 5-2.
After obtaining the power loading level Oi = 'i- 1/A2,i, 1 < i < K, we calculate the precoder F and the nulling vectors {wi}L= according to Table 5-1 in Section 5.3. Note that one of the possible paths through the TCDPow routine makes the leading elements of Small equal while setting the trailing elements of Vi = 1/Aji. This path coincides with the standard water filling algorithm. In this case, the TCD scheme is optimal in terms of maximizing the overall throughput given the input power. On the other hand, if some substreamni has a very high prescribed SINR such that the I given in (5.53) is less than the "break point" j,. then 4 leads to be a multi-level water filling power allocation, which

69

suffers from overall capacity loss. This happens when the target rate vector [R1,... RL] falls out of the convex hull spanned by the L! permutations of [Ci, ... CK, 0,. . 0] (cf. Figure 5-1), where Ck, k = 1, ... K, are the capacities of the eigen subchannels with water filling power allocation. As a remedy to this issue, one can "break" (if it is practically allowable) the oversized substream into more than one substreams with smaller rates, or equivalently, lower SINR requirements. Note that TCD can decompose a MIMO channel into an arbitrarily large number of subchannels.
An interesting special case is that p, = P2 ... = PL, i.e., the substream shares the same SINR requirements. In this case, 01 < 2 _< ... < #K since the singular values { AH,i }=' are in nonincreasing order, and TCDPow yields a standard water filling solution. In this case, TCD becomes UCD.
We present two numerical example to conclude this section. In the first example, we assume Rayleigh independent flat fading channels with Mt = 5 and Mr = 6. We consider equal QoS requirements for L = 5 independent substreams. Figure 5-3 compares the input power needed by our TCD scheme and the linear transceiver scheme of [28]. Our scheme can save about 2.5 dB for any prescribed output SINR.

Mr = 6, Mt = 5 lid Rayleigh Flat Fading
30 1 I I I
- Linear TxRx"
25- TCD

20

cc,

MoteCrl 1ril5o i d Rylig fltf-n hne ihM n r 6

0510 is 20 25
Prescribed Output SINR (dB)

Figure 5-3: Input SNR vs. Output SINt-. The result is based on the average of 500 Monte Carlo trials of a i.i.d. Rayleigh fat fading channel with Mt = 5 and M, = 6.

In the second example, we consider a rank two MIMO channel with singular value Aj1,A2. Suppose we want to decompose the MIMO channel into 2 subchannels with

70

capacity C1 and C2 with C, + C2 = 10 bps/Hz. We consider the three scenarios with (A1 = 2, A2 = 1), (A1 = 5, A2 = 1), and (A1 = 10, A2 = 1). For all the three cases, there is an inflection point beyond which our TCD is the same as the linear design of [28]. That is because when the two subchannels have very disparate QoS constraints, i.e., C1 is far larger than C2, the optimal strategy is to apply SVD to the channel matrix and transmit data through the orthogonal eign-subchannels. (In this case, fl = I. (cf. (5.13)).) If the subchannels QoS constraints are not too disparate, which corresponds to the region to the left of the inflection point, the required input power of our TCD scheme is invariant with respect to Ci, C2 and is strictly less than that needed by the linear design. This region corresponds to the capacity lossless region (cf. Figure 5-1). Another interesting point is that the relative advantage of TCD is more prominent if the singular values A1, A2 become more disparate.

22 2 Linear TxRxX1 = 2, 2 = 1
TCD X1 = 2, X2 =1
20 0 Linear TxRx 1= 5, 2 = 1 TCD 1 = 5, 2 =1
18 1 Linear TxRx 1= 10, 2= 1
... TCD1 = 10' "2 = 1
i 16 .

Z 14o

12 a a 0 .
-------- ... -, --e O O
10 [

.. . .. . .. .,.. . .. . .,, .. .. ............ -13, ,, ,, O
6 I I I
5 5.5 6 6.5 7 7.5 8 8.5 9
C1=10-C2

Figure 5-4: Input SNR vs. C1. A rank 2 channel is decomposed into two subchannels with capacities C, and C2 = 10 C1.

5.5 CDMA Sequence Design As we have shown in Section 2.1.1, the CDMA sequence design problem can be viewed as a special case of the MIMO transceiver design. In an idealized S-CDMA system where the channel does not experience any fading or near-far effect, L mobile users modulate their information symbols via spreading sequences { i},i, each of which has the processing gain N. The discrete-time baseband S-CDMA signal received at the

71

(single-antenna) base-station can be represented as [31]

y = Sx + z (5.54)

where S = [S1...,SL] E RNxL and the /th (1 < I < L) entry of x, x1, stands for the information symbol from the lth user. In the downlink channel, the base station multiplexes the information dedicated to the L mobile users through the spreading sequences, which are the columns of S. Then, all the mobiles receive the same signal given in (5.54). We remark that (5.54) can also be written as (4.1) with H = IN and F = S. Here M = Mt = N is the processing gain. Hence, optimizing the spreading sequences amounts to optimizing the precoder F for a MIMO system. Indeed, due to the simple channel matrix (H = I), some procedures of the TCD scheme can be simplified. We shall show that the TCD scheme turns out to be an improved solution to the sequence design proposed in [56]. At the end of this section, we will compare our TCD scheme and the scheme proposed in [56].
5.5.1 CDMA Sequences Maximizing Sum Capacity
Recall that the precoder maximizing the overall MIMO channel capacity is F = V41)/2T where 4I is obtained by water filling algorithm. For an S-CDMA channel, H = I, then V = I and the optimal power loading level is the uniform power allocation. Hence the CDMA sequence maximizing the sun capacity is S = VAUT. Since Q has orthonormal columns, we obtain SST = pl. This observation coincides with the findings in [31], in which the authors show that the CDMA sequences maximizing the sum capacity are the Welch-Bound-Equality sequences.
For the uplink scenario, i.e., the mobiles to base station case, the base station calculates the optimal CDMA sequences for each mobile user and the associated successive nulling vectors needed by itself. Then the base station informs the mobile users their designated CDMA sequences.
First, we need to calculate the power loading levels P E RNN such that the following GTD matrix decomposition is possible:

A ( [4)/2"ONx(L-N) QRpT'(5.55)

where the diagonal elements of R, r,,. i = 1.2,..., L, satisfy the QoS constraints. Note that the singular values of 4)a form a sequence whose first N elements are V1 + 0, i =

72

1, 2,..., N, followed by L N ones. From Theorem 5.2.3, (5.55) exists if and only if

({1+ + }gl,1,...,1) >- {1 +pi}= (5.56)

Similar to (5.47), we need to solve the problem min, EiN1 i
subject to ({1 + i}Ji1, 1,..., 1) > {1 +p (5.57)

S_ 0, Vi

Similar to (5.49), (5.57) can be further simplified using the variables
L
=4 i+l1, = 1 + pi for i < N, and 3YN = (1+pi).
i=N
The simplified problem is

N
minP Ei=1 4i
} (5.58)
subject to H 1, i =1 i 'k >1 1, 1 < k < N.

The algorithm TCDPow simplifies immensely when we apply it to (5.58). Since f3i > 1 =
1 for all i, the constraints V4 > 1 are inactive. Since fi 5 f3i-l for all i < N, the geometric means satisfy 7i, < -y,-1 for all i < N. Hence, in Lemma 5.4.3, the value of I is either 1 or N. If 1 = 1, then we set 4i = /301 and we remove /1 from the problem. If I = N, then 0i = gN for all i. It follows that there exists an index j with the property that

'=/3i for all i j.
i= j+1
This observation coincides with the solution obtained in [561.
Let % denote an L x L identity matrix with its first N diagonal elements replaced by 0ip, 1 < i < N. According to the TCD scheme presented in Section 5.3.1, we then apply the GTD algorithm to %1/2 to obtain

T2 = QRPs. (5.59)

According to (5.25),
S = F = [4,1/2 ONx(L-N)] P. (5.60)

Let
[vI,..., VLi = [41'/2:0 Nx(L N)l, -'1 2Q ,. (5.61)

73

By (5.26) and (5.27), the nulling vectors used at the base station are

wi= rPiij i = 1.... L7 (5.62)

where r, is the ith diagonal element of Rp. In summary, the base station needs to run the following three steps:
1. Solve the optimization problem (5.58).
2. Apply the GTD algorithm to q11/2 in (5.59).
3. Obtain the spreading sequences for all mobile users, [Sl, SL] = S, and the
nulling vectors IW}?t (cf (5.60) and (5.62)) for the base station.
In the downlink case, the mobiles cannot cooperate with each other for decision feed-back. Hence the VBLAST detection is impractical at receivers. However, we can apply TCD-DP as introduced in Section 5.3.2 to cancel out known interferences at the transmitter, i.e., the base station. We can convert the downlink problem as an uplink one and exploit the downlink-uplink duality as we have done in Section 5.3.2. Note that H = H*= 1, i.e., the downlink and uplink channels are the same! Consider the case where the uplink and downlink communications are symmetric, i.e., for each mobile user, the QoS of the communications from the user to the base station and the base station to the user are the same. After obtaining the spreading sequences [SI, -. SLj for the mobile users, and the nulling vectors [wl, .. WL] used at the base station for the uplink case, we immediately know that in the downlink case the spreading sequences transmitted from the base station are exactly [wi,. -, wL] and the nulling vectors used at the mobiles are the spreading sequences, ISI, SL], used in the uplink case. The only parameters we need to calculate are ql, . qNv (cf. (5.37)). Hence in this symmetric case, the base station only needs to inform the mobiles their designated spreading sequences once in the two-way communications. Each mobile uses the same sequence for both data transmission in the uplink channel and interference niulling in the downlink channel.
5.5.4 Numerical Example
We present one numerical example to show how TCD can be applied to CDMA sequence design. WVe consider ail example where there are L = 4 mobile users and the processing gain N = 3. The prescribed SINRs of the four users are 20, 19,18, and 17 dB, respectivel. For time uplinik case, we apply the TCD-VBLAST scheme to obtainl

74

the spreading sequences of the four users as the columns of the matrix 10.0000 -12.0745 -6.4974 -3.0926 S = 0 0 7.4138 -15.5760j (5.63)

0 8.8312 -13.3801 -6.3686)

The nulling vectors used by the base station are the columns of the matrix

0.0990 -0.0015 -0.0037 -0.0104 ( 00 0.1157 -0.0522 .(5.64)

0 0.1098 -0.0077 -0.0213

We note that for this uplink scenario, the base station detects the fourth mobile user, which has the spreading sequence corresponds to the fourth column of S, first and the first user last.
If the prescribed SINRs of the four users remain the same in the downlink scenario, the spreading sequences used by the base station are the four columns of the matrix 17.1936 -0.2303 -0.5154 -1.2796 F0 0 16.0012 -6.4449 .(5.65)

0 17.0149 -1.0614 -2.6352

In this case, the base station applies the dirty paper precoder to the first mobile user first and the last user last. Note that the columns of F and W in (5.64) are the same up to a scaling factor. Moreover, tr(FF T) =tr (SST) 892.7274. which means that the power consumed iii the base station equals to the overall power used by the four mobile users. At the mobile end, the users use the nulling vectors 100000 -12.0745 -6.4974 -3.0926 S 0 0 7.4138 -15.5760 ,(5.66)

0 8.8312 -13.3801 -6.3686

which are exactly the spreading sequences used in the up)linlk scenario. Scaling the output signals may be necessary at the mobile ends for the subsequent dirty l)pper decoder. But the signal scaling does not influence the output SINR.
If zeros are not allowed in the spreading sequences, we can left multiply S and W a 3 x 3 orthogonal matrix to eliminate the zeros in S.

75

5.5.5 Further Remarks
The TCD scheme, which was originally motivated by MIMO transceiver designs, turns out to be similar to the scheme of [56] in several aspects. Both schemes are based on the nonlinear decision feedback operations. Hence both are optimal in terms of maximizing the channel throughput and minimizing the overall input power. Both the GTD algorithm, on which the TCD scheme is based, and the construction of the Hermitian matrix with prescribed eigenvalues and Cholesky values as done in [56] rely on the WeylHorn theorem. However, our TCD scheme enjoys several remarkable advantages over the scheme of [56]. First, note that if we obtain the GTD H = QRP*, where R has the prescribed diagonal elements, then it follows immediately that A & P*H*HP = RR* is the desired Cholesky decomposition. However, the information associated with Q is lost in the Cholesky decomposition. Hence the nulling vectors used at the receivers of [56] cannot be calculated explicitly as our TCD does (cf. (5.27)). Furthermore, the correlation matrix A is only an intermediate result. To get the CDMA sequences, one has to decompose A = RR* explicitly. The TCD scheme, however, can be used to obtain both the precoder (CDMA sequences), which are the columns of P, and the equalizer from Q simultaneously. Second, our TCD scheme is a solution to the more general MIMO transceiver design problem. The Cholesky decomposition algorithm provided in Appendix C of [56] is only applicable to the scenario where the singular values are only of two values. Hence it is not applicable to the general design of MIMO transceivers. The more general Cholesky factorization algorithm suggested in the proofs is computationally far less efficient. Third, the TCD scheme has two implementation forms, i.e., TCD-VBLAST and TCD-DP, which makes it applicable to both the downlink and uplink scenarios. Finally, the TCD scheme provides insights that identify the CDMA sequence design problem as special cases of the MIMO transceiver design.
5.6 Conclusions
Based on the recently developed GTD matrix decomposition algorithm, we have proposed the TCD scheme utilizing the CSIT and CSIR. TCD can be used to decompose a MIMO channel into multiple subchannels with prescribed capacities. The TCD scheme has two implementation forms. One is the combination of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred to as TCD-VBLAST, and the other includes a dirty paper (DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as TCD-DP. Both forms of TCD

76

are computationally very efficient. We have also determined the subchannel capacity region such that a capacity lossless decomposition is possible. The applications of the TCD scheme for MIMO communications with QoS constraints have been investigated. We have also identified the problems of designing precoders for OFDM communications and designing CDMA sequences as special cases in the unifying framework of MIMO transceiver designs. In particular, we have shown that the CDMA sequence design problem in the uplink and downlink scenarios can be solved using TCD-VBLAST and TCD-DP, respectively.
Appendix A
Proof of Theorem 5.4.1 Observe that for F = VI)1/2, we have
K
tr (FF*)= E and HF = UA4)12. (5.67)
i=1

Hence, HFi = A 2,ioi for 1 < i < K, and V/I+ ii I i K,
1, i>K.

Since 1+pi > 1, the last L-K inequalities in the multiplicative majorization condition in (5.46) are implied by the single equality constraint in (5.47). Hence, the problem (5.46) reduces to (5.47) where F = V1'/2, which gives an upper bound for the minimum in (5.46).
Let F = UFDli/2fT denote the singular value decomposition for any given F E CML. Once again, tr (FF*) is given by the sum in (5.67). By [60, Theorem 3.3.14], the singular values of the product HF of two matrices are multiplicatively majorized by the product of the singular values of H and F:

k k
HJ,io > A2 Fi 1 < k < K. (5.68)

i=1 i=1
Taking log's, we have

k k
Elog(A 2'i( )) __ log(A 2 F, ) I < k < A'. (5.69)

77

By [60, Lemma 3.3.81) and (5.69),
k k
f (log(A,44)) f(1og(A, )), 1 < k < K, (5.70)
i=1 i=1

whenever f is a real-valued, increasing convex function. The function f(t) = log(et + 1) is convex since its second derivative is positive. Making this choice for f in (5.70) and exponentiating both sides, we obtain: k k
(Ai + 1) (A2F, i + 1), 1 < k < K. (5.71)
i=1 i=1

Since F is feasible in (5.46), k k
(A ,i + 1) (p + 1), 1 < k < K,
i=1 i=1
K L
(AF,i + l1) = (pi + 1).
i=1 i=1

Combining this with (5.71), we get k k
fl(A4,i + 1) > (Pi + 1), 1 i=1 i=1
K L
fl(A i,i i + 1) > I(pi + 1). (5.72)
i=1 i=1
Since Ajii + 1 is the square of the i-th singular value of the augmented matrix Ga corresponding to the choice V4)1/2, we conclude that F = V4)1/2 satisfies all the inequality constraints in (5.46). If the inequality (5.72) is strict, then OK should be decreased in order to satisfy the equality constraint in (5.47). Since decreasing OK only lowers tr(FF*), we deduce that the minimum in (5.46) is achieved by a matrix of the form F = V4)1/2. If F = V11/2 is optimal in (5.46), then so is F = V)1/2T whenever 0 has K orthonorinal columns (since the constraints are satisfied and the value of the cost does not change). NWe now make the choice for 01 given in Theorem 5.3.1. That is, if QRGaPT is the GTD of Ac., in (5.19) where 4 is a solution of (5.47), then n is the matrix formed by the first K columns of pT. For this choice of Q, the constraints of (5.45) are satisfied. As noted earlier, the minimum in (5.45) can be no smaller than the minimum in (5.46). Since this choice for F yields the same cost in both (5.45) and (5.46), we conclude that F = VI)1/22T is optimal in (5.45).

78

Appendix B
Proof of Lemma 5.4.3 First suppose that -f, > 1/A2. By the arithmetic/geometric mean inequality, the problem

min V i subject to Vi-H3, 0 >0, (5.73)
i=l i---1 j=l
has the solution Vi = <, i < 1. Since A11,i is a decreasing function of i and S> 1/A',i, we conclude that V, = -y, satisfies the constraints bi > 1/A',i for 1 < i < I. Since I attains the maximum in (5.53),
k

i=1
for all k < 1. Hence, by taking 0i = for 1 < i < 1, the first I inequalities in (5.49) are satisfied, with equality for k = 1, and the first 1 lower bound constraints i >! I/A,i are satisfied.
Let 0* denote any optimal solution of (5.49). If I I
H1 =I (5.74)
i=1 i=1

then by the unique optimality of Vbi = -y, 1 < i < 1, in (5.73), and by the fact that the inequality constraints in (5.49) are satisfied for k E [1, 1], we conclude that V/* = yz for all i E [1, 1]. On the other hand, suppose that I I
171 f> j31= (5.75)
i=l i=1
We show below that this leads to a contradiction; consequently, (5.74) holds and = for i E [1,1].
Define the quantity

By (5.75) > -ye. Again, by the arithnetic/geometric mean inequality, the solution of the problem
I I
mill VJi subject to fJ' i > y', ?p > 0, (5.76)
,=l i=1
is bi = J* for i C [1, 11. By (5.75), > 71 and 0 satisfies the inequality constraints in (5.49) for k E [1,1].

79

Let M be the first index with the property that M M
H : H .(5.77)
i=1 i1

Such an index exists since 0* is optimal, which implies that K K

i=1 i=1
First, suppose that M < j, where j is the break point given in Lemma 5.4.2. By complementary slackness, pi = 0 and V for 1 < i K M. We conclude that
bi = -y. for 1 < i < M. By (5.77) we have

M
=.

It follows that

which contradicts the fact that 1 achieves the maximum in (5.53).
In the case Al > j, we have V/i = 7* for 1 < i < j. Again, this follows by complementary slackness. However, we need to stop when i = j since the lower bound constraints become active for i > j. In Lemma 5.4.2, we show that = ^y. for i > j. Consequently, we have 1M3 =M i> YM>7,1

Again, this contradicts the fact that 1 achieves the maximum in (5.53). This completes the analysis of the case where -j _> 1/A4,.
Now consider the case "j < 1/4,. By the definition of 7'1, we have

3 or -tK > fI f3,. (5.78)
i~l i=l1

If j is the break point described in Lemma 5.4.2, then ,> for all i; it follows that

K
H (V)K. (5.79)

80

Since the product of the components of 0* is equal to the product of the components of 0, from (5.78) and (5.79) we get K K
J > fjI13= 11* : (03;)'

i=1 i=1
Hence, y~ > O > 1/A4l > 1/A4,i for all i < j. In particular, if I < j, then 7Y 1/A4, or, I > j when t < 1/Aj. As a consequence, = -

CHAPTER 6
NOVEL MATRIX DECOMPOSITIONS
6.1 Introduction
Given a complex matrix H, we consider the decomposition H = QRP*, where R is upper triangular and Q and P have orthonorinal columns. Special instances of this decomposition are
(a) the singular value decomposition (SVD) [61, 62]

H = VEW*,

where E is a diagonal matrix containing the singular values on the diagonal,
(b) the Schur decomposition [63] H = QUQ*,

where U is an upper triangular matrix with the eigenvalues of H on the diagonal,
(c) the QR decomposition where P = I.
In this chapter, we will introduce two novel matrix decompositions, i.e., the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). As we introduced before, the GMD scheme and the UCD scheme are based on the GMD matrix decomposition algorithm, and the TCD is based on the GTD algorithm. The results of this chapter are motivated by the applications of designing MIMO transceiver. Interesting, these results turn out to be also useful to the numerical analysis community.
6.2 Geometric Mean Decomposition
In this section, we present a new unitary decomposition which call the geometric mean decomposition or GMD. Given a rank K matrix H c Ce
Cj < i < K.
>0

81

82

Here the aj are the singular values of H, and & is the geometric mean of the positive singular values. Thus R is upper triangular and the nonzero diagonal elements are the geometric mean of the positive singular values.
We were led to this decomposition when trying to optimize the performance of multiple-input multiple-output (MIMO) systems. However, this decomposition has arisen recently in several other applications. In [64, Prob. 26.3] Higham proposed the following problem:
Develop an efficient algorithm for computing a unit upper triangular matrix with prescribed singular values ai, 1 < i < K, where the product of the ai
is 1.
A solution to this problem could be used to construct test matrices with user specified singular values.
The solution of Kosowski and Smoktunowicz [65] starts with the diagonal matrix E, with i-th diagonal element ai, and applies a series of 2 by 2 orthogonal transformations to obtain a unit triangular matrix. The complexity of their algorithm is O(K2). Thus the solution given in [65] amounts to the statement

QTEP= R, (6.1)

where R is unit upper triangular.
For general E, where the product of the ai is not necessarily 1, one can multiply E by the scaling matrix &-'I, apply (6.1), then multiply by & to obtain the GMD of E. And for a general matrix H, the singular value decomposition H = VEW* and (6.1) combine to give the H = QRP* where

Q = VQ0 and P = WP0.

According to (3.11), we consider the problem of choosing Q and P to maximize the minimum of the rij:

max min {rii : 1 < i < K}
QP
subject to QRP* = H, Q*Q = I, P*P = I, (6.2)
rij= 0 for i > j, R EIR1;IJ

where K is the rank of H. Since the GMD of H is feasible in (6.2), we conclude that the GMD yields the optimal solution to (6.2).

83

6.2.1 Generalized Maximin Properties
We consider the following problem:

max min {uii:l F,G

subject to GUF* =H, uij =0 for i > j, U KK (6.3)

u~i>0, 1 tr ((G'G)-) _< p, tr ((F'F)-) < p2. If Pi = P2 = K, then any Q and P feasible in (6.2) are feasible in (6.3). Hence, the problem (6.3) is less constrained than the problem (6.2) since the set of feasible matrices has been enlarged. Nonetheless, we now show that the solution to this relaxed problem is the same as the solution of the more constrained problem (6.2). Theorem 6.2.1 If H E Cmn has rank K, then a solution of (6.3) is given by

K= U ( )PPNR, and F=P K

where QRP* is the GMD of H.
Proof" Let VYEW* be the singular value decomposition of H, where E E /K contains the K positive singular values of H on the diagonal. If F and G satisfy the constraints of (6.3), then we have

H = VEW* = GUF*.

The column space of GUF* is contained in the column space of G. Since G has K columns, the dimension of the column space is at most K. Since GUF* = H has rank K, the column space of G must coincide with the column space of H, which is equal to the column space of V. Hence, there exists a K by K invertible matrix A such that

G = VA. (6.4)

In the same fashion, the column space of F must coincide with the column space of H*, which is equal to the column space of W. And there exists a K by K invertible matrix B such that
F = WB. (6.5)

84

Combining (6.4) and (6.5) with the identity GUF* = H = VEW* gives

AUB* = E.

It follows that
K
det (E*E) = det (BU*A*AUB*) = det (A*A)det (B*B) uii2I|,
i= 1
which gives
K
min u,12K < 171 2 = det (E*E)det (A*A)-'det (B*B)-'. (6.6)
l
By the constraints of (6.3), we have

tr ((G*G)-') = tr ((A*A)-') < pi, tr ((F*F)-1) = tr ((B*B)-1) < P2.

By the geometric mean inequality and the fact that the determinant (trace) of a matrix is the product (sum) of the eigenvalues, a K by K Hermitian positive semidefinite matrix S satisfies
-K
det (S) < K

Using these bounds for the determinant and the trace in (6.6), we have

mim ui < P- (6.7)
1
Finally, it can be verified that for the choices of G, U, and F given in the statement of the theorem, the inequality (6.7) is an equality. O
6.2.2 Implementation Based on Initial SVD
We now give an algorithm for evaluating the GMD that starts with the singular value decomposition H = VEW*. The algorithm generates a sequence of upper triangular matrices R(L), 1 < L < K, with R1) = E. Each matrix R(L) has the following properties:
(a) ri = 0 when i > j or j > max {L,i}.
(b)-(L) = for all i < L, and the geometric mean of rL), L < i < K, is T. We express Rk+l) = QkR k)Pk where Q k and Pk are orthogonal for each k.
These orthogonal matrices are constructed using a symmetric permutation and a pair of Givens rotations. Suppose that R(k) satisfies (a) and (b). If r > d, then let

85

HI be a permutation matrix with the property that flR(k)fl exchanges the (k + 1)-st diagonal element of R(k) with any element rpp, p > k, for which rp< d. If-k( < ( then let H be chosen to exchange the (k + 1)-st diagonal element with any element rP, p>k, for which rpp > &. Let 61 =(k) and J2 = r() denote the new diagonal elements at locations k and k + 1 associated with the permuted matrix IR(k)I.
Next, we construct orthogonal matrices G1 and G2 by modifying the elements in the identity matrix that lie at the intersection of rows k and k + 1 and columns k and k + 1. We multiply the permuted matrix flR(k)IH on the left by GT and on the right by G1. These multiplications will change the elements in the 2 by 2 submatrix at the intersection of rows k and k + 1 with columns k and k + 1. Our choice for the elements of G1 and G2 is shown below, where we focus on the relevant 2 by 2 submatrices of G2, HR(k)HI, and GI:

S[C3161 [ 01 c -i & [ X~
-s62 c51 0 62 s c 0 y (6.8)
(C) (HR(k)H) (G,) (n(k+l))

If 1 =2 = ,we take c = 1 and s = 0; if 51 \$5 2, we take 22
c 2 2 and s= V1-c2. (6.9)
1 62

In either case,
sc(62 l)r255
x 2 )rk and y = (6.10)

Since & lies between 5j and 62, s and c are nonnegative real scalars.
Figure 6-1 depicts the transformation from R(k) to GIIR(k)HlG1. The dashed box is the 2 by 2 submatrix displayed in (6.8). Notice that c and s, defined in (6.9), are real scalars chosen so that

c2 +S2 =1 and (c31)2 + (s52)2 =2.

With these identities, the validity of (6.8) follows by direct computation. Defining Qk = HG2 and Pk = HG1, we set R(k+l) = QTR(k)Pk. (6.11)

It follows from Figure 6--1, (6.8), and the identity (let (R(k+ 1) = det (R(k)), that (a) and (b) hold for L = k + 1. Thus there exists a real upper triangular matrix RO), with

86
Column k
I

X 000 XX X 00
A 0: 0 0o X :- 0 0
Row k
--XJ 0 0 : ---X: 0 0
xoo x0o0

X 0 X 0
X X

R(k) G nHR(k)HGI

Figure 6-1: The operation displayed in (6.8)

Son the diagonal, and unitary matrices Qj and Pi, i = 1, 2,.. ., K 1, such that R(K) = (QT 1 ... QTQT) R (K) k- 2 1 2 E(P1P2 ... Pk-l)Combining this identity with the singular value decomposition, we obtain H = QRP* where
(K-1 K-1 Q=V( Qi, R=R(K), and P = W Pg .
(i=1 )I(i=1 )
In summary, our algorithm for computing the GMD, based on an initial SVD, is the following:
1. Let H = VEW* be the singular value decomposition of H, and initialize Q = V,
P = W, R = E, and k = 1.
2. If rkk > choose p > k such that rp 5 &. If rkk < &, choose p > k such that
rp> In R, P, and Q, perform the following exchanges:

rk+lk+1 pp

P:,k+l P:,p
Q: Q:.p

3. Construct the matrices G1 and G2 shown in (6.8). Replace R by GTRG1, replace
Q by QG2, and replace P by PG1.
4. If k = K 1, then stop, QRP* is the GMD of H. Otherwise, replace k by k + 1
and go to step 2.
Given the SVD, this algorithms for the GMD requires O((m + n)K) flops. For comparison, reduction of H to bidiagonal form by the Golub-Reinsch bidiagonalization

87

scheme [66, 67, 68], often the first step in the computation of the SVD, requires O(mnK) flops.
6.3 Generalized Triangular Decomposition
In this section, we attempt to generalized decomposition of the form

H = QRP*, (6.12)

where R is upper triangular and Q and P have orthonormal columns. We will answer the following two questions. First, what is the necessary and sufficient condition that the decomposition of (6.12) exists. Second, how to calculate such a decomposition. Section
6.3.1 and 6.3.2 focus on answering the two questions.
6.3.1 Existence of GTD
The following result is due to Weyl [69] (also see [60, p. 171]):
Theorem 6.3.1 If A E Cnn with eigenvalues A and singular values or, then A -< a.
The following result is due to Horn [70] (also see [60, p. 220]):

Theorem 6.3.2 If r E Cn and o E IRn with r -- o, then there exists an upper triangular matrix R E Cfln with singular values ai, 1 < i < n, and with r on the diagonal of R.
We now combine Theorems 6.3.1 and 6.3.2 to obtain:

Theorem 6.3.1 Let H E CmX" have rank K with singular values a, > U*2 > ... >UK > 0. There exists an upper triangular matrix R E CKK and matrices Q and P with orthonormal columns such that H = QRP* if and only if r -< a.
Proof: If H = QRP*, then the eigenvalues of R are its diagonal elements and the singular values of R coincide with those of H. By Theorem 6.3.1, r --< a. Conversely, suppose that r -< o,. Let H = V5W* be the singular value decomposition, where E E R"KK. By Theorem 6.3.2, there exists an upper triangular matrix R E CKxK with the ri on the diagonal and with singular values ci, 1 < i < K. Let R = V0EW* be the singular value decomposition of R. Substituting E = V RW0 in the singular value decomposition for H, we have

H = (VV0)R(WW0)*.

In other words, H = QRP* where Q = VV* and P = WW*. U
6.3.2 The GTD Algorithm
Given a matrix H E C"'" with rank K and with singular values a1 > a2 > -. > UK > 0, and given a vector r G C' such that r -< or, we now give an algorithm for

88

computing the decomposition H = QRP*. This algorithm for the GTD essentially yields a constructive proof of Theorem 6.3.2.
Let VEW* be the singular value decomposition of H, where E is a K by K diagonal matrix with the diagonal containing the positive singular values. We let R(L) E CKxK denote an upper triangular matrix with the following properties:
(a) r = 0 when i > j or j > i > L. In other words, the trailing principal submatrix of R(L), starting at row L and column L, is diagonal.
(b) If r(L) denotes the diagonal of R(L), then the first L 1 elements of r and r(L) are equal. In other words, the leading diagonal elements of R(L) match the prescribed
leading elements of the vector r.
(c) rL:K -< rLK, where rLK denotes the subvector of r consisting of components L
through K. In other words, the trailing diagonal elements of R(L) multiplicatively
majorize the trailing elements of the prescribed vector r.
Initially, we set R(1') = E. Clearly, (a)-(c) hold for L = 1. Proceeding by induction, suppose we have generated upper triangular matrices R(L), L = 1,2,...,k, satisfying
(a)-(c), and unitary matrices QL and PL, such that R(L+l) = Q*R(L)PL for 1 < L < k. We now show how to construct unitary matrices Qk and Pk such that R(k+1) Q*R(k)Pk, where R(k+1) satisfies (a)-(c) for L = k + 1.
Let p and q be defined as follows:

p = arg mi{Irk)l : k < i < K, Ir k)I rkI}, (6.13)

q = arg max{Ir k) : k where r(kJ is the i-th element of r(k). Since rk:K -r, there exists p and q satisfying (6.13) and (6.14). Let HI be the matrix corresponding to the symmetric permutation fl*R(k)fl which moves the diagonal elements rg) and rqq to the k-th and (k + 1)-st diagonal positions respectively. Let 6i = r( and 62 = r4) denote the new diagonal elements at locations k and k + 1 associated with the permuted matrix H*R(k)1i.
Next, we construct unitary matrices G1 and G2 by modifying the elements in the identity matrix that lie at the intersection of rows k and k + 1 and columns k and k + 1. We multiply the permuted matrix HI*R(k)I on the left by G* and on the right by G1. These multiplications will change the elements in the 2 by 2 submatrix at the intersection of rows k and k + 1 with columns k and k + 1. Our choice for the elements of G1 and G2 is shown below, where we focus on the relevant 2 by 2 submatrices of G,

89
Column k
I

X X X X X X X X X X X X
X X X X X X X X X X
AX0:0 0 AXX 00
Row k
X- 0 0 X- 0 0
xio o _...Xi0o

X 0 X 0
X X

I*R(k) > G*nI*R(k)T G

Figure 6-2: The operation displayed in (6.15)

H*R(k)H, and GI:

r ck* s J i I x (6.15)
-s62 c1 0 62 s c 0 y (6.15)

(G) (H*R(k)n) (G1) (R(k+1))

If 1611 = 1621 = Irkl, we take c = 1 and s = 0; if 1651 # 1621, we take lrk12 16212
c= r 12 and s= -c2. (6.16)
612 1622

In either case,
sc(16212 1,12)rk 6162rk (6.17)
x: =~ and y-i= 1. (6.17)
Irk 12 al Ek- 2
H*R(k)H- to GHI*R(k)IIG1. The dashed box is the 2 by 2 submatrix displayed in (6.15). Notice that c and s, defined in (6.16), are real scalars chosen so that

c2 + 2 = and 21612 + S216212 = Irk2. (6.18)

With these identities, the validity of (6.15) follows by direct computation. By the choice of p and q, we have
1621 r 7k 161, (6.19)

If 1611 / 1621, it follows from (6.19) that c and s are real nonnegative scalars. It can be checked that the 2 by 2 matrices in (6.15) associated with G1 and G* are both unitary. Consequently, both G1 and G2 are unitary. We define

R+l) = (HG2)*R(k)(HGI) = Q*R )Ps,

90

where Qk = IG2 and Pk = TIG1. By (6.15) and Figure 6-2, R(k+l) has properties (a) and (b) for L = k + 1. Now consider property (c).
We write a b if a and b are equal after a suitable reordering of the components. Let a, b, a+, and b+ be vectors whose components are ordered in decreasing magnitude, and which satisfy

a rk:K, b r (k) a+ rk+:K, and b+ (k+l) (6.20)
a'*K, O~ k:K, a ~.rk+I:K, LI rk+l:K" 6.0

Thus ai is the i-th largest (in magnitude) component of rk:K. By the induction hypothesis, we have a -< b. To establish (c), we need to show that a+ < b. Let the index s be chosen so that a8 = rk, and let the index t be chosen so that

btl > Irkl Ibt+ll. (6.21)

By the definition of p and q, rp = bt and r qq= bt+,. As seen in (6.20), a+ is obtained from a by deleting a8 = rk. The vector r(k+l) is obtained from r(k) by a unitary transformation that changes the value of two elements. In particular, b+ is obtained from b by replacing the adjacent pair bt and bt+l by btbt+l rk
Z- Ik 2

By (6.21) btl > ly) > lbt+ll. Consequently, b =y. (6.22)

We partition the proof of (c) into 2 cases.
Case 1: s < t. Since a+ < ai for all i, a- b, and bi = b+ for 1 < i < t, we have atl a1-1 -,<4 b = b t. b (6.23)

For j > t > s, it follows from the induction hypothesis and the connection between a and a+ that
j-1 j-1jj
I-kl fllaI = lafIfIlatl = H jai K 17 b. (6.24)
i=i=1=1 i=1 i=1
Since G, and G2 are unitary, the determinant of (6.15) gives

1,, k = r ) = Ibt = 17r'y = irkbI+l, (6.25)

Full Text
87
scheme [66, 67, 68], often the first step in the computation of the SVD, requires 0(mnK)
flops.
6.3 Generalized Triangular Decomposition
In this section, we attempt to generalized decomposition of the form
H = QRP*, (6.12)
where R is upper triangular and Q and P have orthonormal columns. We will answer
the following two questions. First, what is the necessary and sufficient condition that the
decomposition of (6.12) exists. Second, how to calculate such a decomposition. Section
6.3.1and 6.3.2 focus on answering the two questions.
6.3.1Existence of GTD
The following result is due to Weyl [69] (also see [60, p. 171]):
Theorem 6.3.1 If A E Cnxn with eigenvalues A and singular values cr, then A X a.
The following result is due to Horn [70] (also see [60, p. 220]):
Theorem 6.3.2 If r 6 Cn and matrix R Â£ Cnxn with singular values at, 1 < i < n, and with r on the diagonal of R.
We now combine Theorems 6.3.1 and 6.3.2 to obtain:
Theorem 6.3.1 Let H E > ... >
aK > 0. There exists an upper triangular matrix R E CKxK and matrices Q and P
with orthonormal columns such that H = QRP* if and only if v < cr.
Proof: If H = QRP*, then the eigenvalues of R are its diagonal elements and the
singular values of R coincide with those of H. By Theorem 6.3.1, r < a. Conversely,
suppose that r < a. Let H VEW* be the singular value decomposition, where
E E RhxK. By Theorem 6.3.2, there exists an upper triangular matrix R Â£ CKxK with
the r on the diagonal and with singular values cr,, 1 < i < K. Let R = V0EW be
the singular value decomposition of R. Substituting E = VJRW0 in the singular value
decomposition for H, we have
h = (vv;)R(ww;)*.
In other words, H = QRP* where Q VVq and P WWJ.
6.3.2The GTD Algorithm
Given a matrix H E Crnx" with rank K and with singular values oÂ¡ > o2 > ... >
&K > 0, and given a vector r E CA such that r a, we now give an algorithm for

63
needs exactly the same power as the TCD-VBLAST. To make this chapter self-contained,
we give below an alternative proof to this interesting and useful fact.
Let Ua denote a strictly upper triangular matrix whose (, _?)th entry is aXJ for
1 < i < j < L and zero otherwise. Let VA and Vp be two Lx L diagonal matrices with
their zth element equal to att and p,, respectively. Then (5.32) can be rewritten as
{VA VJA\) p = p (5.38)
or equivalently
{V^Va-UJ) p = 1 (5.39)
where p = [pi,... ,p]T, p [pi, >Pl]T and 1 is a vector with unit elements. Hence
P = (V;1Va-UJ)~11 (5.40)
Similarly, (5.37) can be rewritten as
{VA VPUA) q p (5.41)
or
{V-1Va-Ua) q = l. (5.42)
Hence
q =(V;1Va-Ua)'11. (5.43)
Prom (5.40) and (5.43),
L L
j^Pi = iT {vyvA-ujy11 = iT (v-yvA-uAyl 1 = (5.44)
=i =1
We can use the Tomlinson-Harashima precoder [42] [43] or the trellis precoder [44]
to achieve known interference cancellation at the transmitter. For a system with high
dimensionality, TCD-DP is a better choice than TCD-VBLAST since it is free of prop
agation errors.
5.4 MIMO Communications with QoS Constraints
In this section, we apply the TCD scheme to MIMO communications with QoS
constraints. Suppose we want to transmit L > K independent data streams through a
MIMO channel. Instead of multiplexing all the substreams in the time division manner
to share the entire MIMO channel, we apply TCD to decompose the MIMO channel
into multiple subchannels whose capacities/SINRs meet the QoS requirements of the

69
suffers from overall capacity loss. This happens when the target rate vector [i?i,... Rl\
falls out of the convex hull spanned by the L\ permutations of [C\,... C/f,0,... ,0]
(cf. Figure 5-1), where C*, k = l...., K, are the capacities of the eigen subchannels
with water filling power allocation. As a remedy to this issue, one can break (if it
is practically allowable) the oversized substream into more than one substreams with
smaller rates, or equivalently, lower SINR requirements. Note that TCD can decompose
a MIMO channel into an arbitrarily large number of subchannels.
An interesting special case is that pi = P2 = Pl, he., the substream shares
the same SINR requirements. In this case, fd\ < /?2 < < Pk since the singular values
{^//,},Ci are in nonincreasing order, and TCDPow yields a standard water filling solution.
In this case, TCD becomes UCD.
We present two numerical example to conclude this section. In the first example, we
assume Rayleigh independent flat fading channels with Mt 5 and Mr = 6. We consider
equal QoS requirements for L = 5 independent substreams. Figure 5-3 compares the
input power needed by our TCD scheme and the linear transceiver scheme of [28], Our
scheme can save about 2.5 dB for any prescribed output SINR.
= 6, M( = 5 iid Rayleigh Flat Fading
Figure 5-3: Input SNR vs. Output SINR. The result is based on the average of 500
Monte Carlo trials of a i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr 6.
In the second example, we consider a rank two MIMO channel with singular value
Ai,A2. Suppose we want to decompose the MIMO channel into 2 subchannels with

42
4.4.2 Further Remarks
Besides the larger coding gain at low SNR and an improved diversity gain at high
SNR, the UCD scheme enjoys more flexibility than the GMD scheme. For a rank
K MIMO channel, the GMD scheme can support no more than K independent data
streams. However, the UCD scheme can decompose a rank K MIMO channel into
L > K identical subchannels, and L is not even limited by the dimensionality of the
channel matrix. This property of the UCD scheme enables one to achieve high data rate
transmission using small constellations as demonstrated in the numerical examples.
The UCD scheme also suggests new ways of channel decomposition which are much
more flexible than the conventional SVD based ones. Indeed, one may chose the permu
tation matrices and Givens rotations to achieve a wide variety of channel decompositions
with some prescribed SINRs as suggested by the generalized triangular decomposition
(GTD) (See Chapter 6, [51] [52]). This idea is developed in Chapter 5.
Finally, we link UCD with DBLAST [2], which has been shown to be able to achieve
the optimal tradeoff between the channel diversity and multiplexing [16]. We observe
that each diagonal layer of DBLAST can be viewed as the interleaving of the vertical
layers of VBLAST in the space-time domain and each diagonal layer can be regarded
as a virtual subchannel with the same capacity. However, DBLAST requires short and
powerful error correcting coding to make the virtual subchannel work as a real one.
This is a major difficulty for the implementation of DBLAST. In addition, DBLAST
suffers from boundary wastage. In contrast, our UCD scheme, by exploiting CSIT,
applies interleaving (via the Givens rotations and permutations) in the space domain
only. This makes the UCD scheme free from the boundary wastage. Moreover, the
UCD scheme is decoupled from coding procedures. Indeed, UCD can be concatenated
with any error correcting code. Furthermore, UCD makes it easier to design the coding
scheme since UCD decomposes a MIMO channel into multiple subchannels with identical
capacities. Thus in a slowly time varying channel, UCD is much easier to implement
than DBLAST despite their duality. This manifests clearly the values of CSIT.
4.5 Numerical Examples
We present next several numerical examples to demonstrate the effectiveness of the
UCD scheme.
In the first example, we assume Rayleigh independent flat fading channels with
Mt = 10 and Mr = 10. We compare the channel capacity using the UCD and GMD

CHAPTER 1
INTRODUCTION
1.1 Two Categories of Schemes for MIMO Communications
Communications over multiple-input multiple-output (MIMO) wireless channels
have been a subject of intense research over the past several years because deploy
ing multiple antennas at both transmitter and receiver sides can drastically improve the
spectral efficiency [1] [2] [3] [4]. For example, in contrast to the conventional additive
white Gaussian noise (AWGN) channel whose spectral efficiency is
C(snr) = log2(l + snr) bps/Hz,
without requiring additional input power, the MIMO channel with Mt transmitting
antennas and Mr receiving antennas can have spectral efficiency as large as [1] [2]
C(snr) min(Mr, Mt) log2(snr) + 0(1) bps/Hz,
given that there is plenty of scattering in the channel. Many spatial multiplexing meth
ods, e.g., the BLAST scheme [2] [5] [6] [7] [8] [9] [10] [11], have been proposed to reap
the great channel capacity.
Improving the data transmission reliability is another advantage of applying multi
ple antennas in wireless communications. By transmitting the same information through
more than one independent fading channel, one can obtain much more reliable commu
nications thanks to the redundance introduced. The space-time coding methods are
based on such a rationale, (see, e.g., [12] [13] [14] [15]).
Zheng and Tse [16] show that one can exploit the diversity gain and multiplexing
gain promised by the MIMO channel simultaneously. However, there is a fundamental
tradeoff between the two gains. Zheng and Tses theory provides a unifying framework
to measure the performance of any MIMO schemes. Hence designing practical schemes
capable of achieving the optimal diversity-multiplexing tradeoff is a central research
topic in MIMO communications.
1

86
Column k
X
X
X
0
0
0
X X
X
X
0
0
X
X
0
0
0
X
X
X
0
0
Ic
iX
0;
0
0
; X
X
0
0
xi
0
0
xj
0
0
X
0
X
0
X
Ak)
GjnR^nG,
Figure 6-1: The operation displayed in (6.8)
a on the diagonal, and unitary matrices Q and P, i = 1,2,..., K 1, such that
Rw = (QLi QQ[)S(PiP2 . Pfc-i).
Combining this identity with the singular value decomposition, we obtain H = QRP*
where
q=v(iiq.)> r=rw. and p=w^j][Pj-
In summary, our algorithm for computing the GMD, based on an initial SVD, is
the following:
1. Let H = VEW* be the singular value decomposition of H, and initialize Q = V,
P = W, R = S, and k = 1.
2. If rkk > a. choose p > k such that rpp < a. If rkk < a, choose p > k such that
Cpp > d-. In R, P, and Q, perform the following exchanges:
Cfe+l,fc+l fpp
P;,fc+1 ^ P :,p
Q:,fc ^ Q:,p
3. Construct the matrices G] and G2 shown in (6.8). Replace R by GjRGi, replace
Q by QG2, and replace P by PG^
4. If k = K 1, then stop, QRP* is the GMD of H. Otherwise, replace k by k + 1
and go to step 2.
Given the SVD, this algorithm for the GMD requires 0((m + n)K) flops. For
comparison, reduction of H to bidiagonal form by the Golub-Reinsch bidiagonalization

9
It is proven [32] that if K = Mt
c
-p~- > 1 as p > oo. (2-12)
Cut
We claim a stronger relationship as follows. 2
Lemma 2.1.1 For the data model in (2.1), if the channel matrix H is of full column
rank, i.e., K = Mt, then
Cit Cut * 0 as P * oo.
(2.13)
Proof: Inserting (2.9) into (2.10) yields
K
CIT = ^\og2 {pX2HJ
where p is chosen such that
n=l
K
or
KP ~ TT- = P>
n=l H,n
p 1 A 1
K + X2Hn
n=l H (2.14)
(2.15)
(2.16)
Here we assume that all the K subchannels are used because of large p, i.e., p jf > 0
AH.n
for n = 1,2,..., K.
From (2.14), (2.16), and (2.11), and using the fact that K = Mt, we have
* //-+Â£.
Cit Cut = 2_^ 1Â§2
K 1
W.
n=l
p +
K
Note that
lim
k = 1 for 1 < n < K
(2.17)
(2.18)
poo n + 2
AH. n
and that f(x) = log2 x is a continuous function if x > 0. The lemma follows immediately
from (2.17).
However, we note that CSIT can be very helpful in the following cases:
A. The SNR is low or moderate.
B. H is rank deficient or ill-conditioned.
2 A similar, but somewhat vague, statement is found in [8],

76
are computationally very efficient. We have also determined the subchannel capacity
region such that a capacity lossless decomposition is possible. The applications of the
TCD scheme for MIMO communications with QoS constraints have been investigated.
We have also identified the problems of designing precoders for OFDM communications
and designing CDMA sequences as special cases in the unifying framework of MIMO
transceiver designs. In particular, we have shown that the CDMA sequence design
problem in the uplink and downlink scenarios can be solved using TCD-VBLAST and
TCD-DP, respectively.
Appendix A
Proof of Theorem 5.4.1
Observe that for F Vi1/2, we have
K
tr(FF*) = ^& and HF = UAi1/2. (5.67)
1=1
Hence, \2HFi = X2Hi . Jl + 1 < i < K,
\G i = v
1,
i > K.
Since 1+pi > 1, the last LK inequalities in the multiplicative majorization condition in
(5.46) are implied by the single equality constraint in (5.47). Hence, the problem (5.46)
reduces to (5.47) where F = Vi1/2, which gives an upper bound for the minimum in
(5.46).
Let F = UFi' 2fiT denote the singular value decomposition for any given F G
CM,xL. Once again, tr (FF*) is given by the sum in (5.67). By [60, Theorem 3.3.14],
the singular values of the product HF of two matrices are multiplicatively majorized by
the product of the singular values of H and F:
k k
JJ 1 < k < K. (5.68)
i= 1 =1
Taking logs, we have
k k
^!g(A^,,) > ^log(A^Fi),
i=l
1=1
1 (5.69)

104
[60] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge: Cambridge
University Press, 1991.
[61] E. Beltrami, Sulle funzioni bilineari, Giornale De Matematiche, vol. 11, pp. 98-
106, 1873.
[62] C. Jordan, Mmoire sur les formes bilinaires, J. Math. Purs Appl., vol. 19,
pp. 35-54, 1874.
[63] I. Schur, On the characteristic roots of a linear substitution with an application
to the theory of integral equations, Math. Ann., vol. 66, pp. 488-510, 1909.
[64] N. J. Higham, Accuracy and Stability of Numerical Algorithms. Philadelphia:
SIAM, 1996.
[65] P. Kosowski and A. Smoktunowicz, On constructing unit triangular matrices with
prescribed singular values, Computing, vol. 64, pp. 279-285, 2000.
[66] G. H. Golub and C. Reinsch, Singular value decomposition and least squares so
lution, Numerische Mathematik, vol. 14, pp. 403-420, 1970.
[67] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns
Hopkins University Press, 1983.
[68] J. H. Wilkinson and C. Reinsch, Linear algebra, in Handbook for Automatic
Computation (F. L. Bauer, ed.), vol. 2, (Berlin), Springer-Verlag, 1971.
[69] H. Weyl, Inequalities between two kinds of eigenvalues of a linear transformation,
Proc. Nat. Acad. Sci. U. S. A., vol. 35, pp. 408-411, 1949.
[70] A. Horn, On the eigenvalues of a matrix with prescribed singular values, Proc.
Amer. Math. Soc., vol. 5, pp. 4-7, 1954.
[71] M. T. Chu, A fast recursive algorithm for constructing matrices with prescribed
eigenvalues and singular values, SIAM J. Numer. Anal., vol. 37, pp. 1004-1020,
2000.
[72] Y. Jiang, J. Li, and W. Hager, Joint transceiver design for
MIMO communications using geometric mean decomposition, IEEE
Transactions on Signal Processing, to appear. Available online:
http://www.sal.ufl.edu/yjiang/papers/gmdCommR2.pdf.
[73] M. T. Chu, Inverse eigenvalue problems, SIAM Rev., vol. 40, no. 1, pp. 1-39
(electronic), 1998.

LIST OF TABLES
Table page
4-1 The UCD-VBLAST scheme 38
5-1 The TCD-VBLAST Scheme 61
6-1 Comparison of SVD_EIG and gtd for inverse eigenvalue problems (CPU
time in seconds, singular value and eigenvalue errors in sup-norm) ... 93
vi

31
M *10, Mb 10, SNR = 0dB
r t
M = 10, M 10, SNR = 10dB
r I
Capacity (bit/sec/Hz)
Capacity (bit/sec/Hz)
(a)
(b)
M = 10, M = 10, SNR 20 dB
r t
M = 10, M = 10, SNR = 30 dB
r t
(c)
(d)
Figure 3-3: Complementary cumulative distribution function of the capacity of an i.i.d.
Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 1000 Monte
Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.
M{= 2, Mf= 4 iid Rayleigh channel, 16-QAM
M{= 4, Mf= 4 iid Rayleigh channel, 16-QAM
(b)
Figure 3-4: BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh
flat fading channel vs. SNR with (a) Mt = 2 and Mr = 4 and (b) Mt = 4 and Mr = 4.

BIOGRAPHICAL SKETCH
Yi Jiang was born in Yixing, Jiangsu Province, China, in November 1978. He
received his B.Sc. degree in Electronic Engineering and Information Science from the
University of Science and Technology of China (USTC) in July 2001, the M.S. degree
in Electrical Engineering from the University of Florida (UF) in May 2003.
105

97
delta2 = d (q) ;
d ([kpl q]) = d ([q kpl]) ;
sq.deltal = abs (deltal)'2 ;
sq_delta2 = abs (delta2)'2 ;
sq_rk = abs_rk*2 ;
denom = sq.deltal sq_delta2 ;
if ( (denom <= 0) I (sq_rk > sq_deltal) )
c = 1 ; s = 0 ;
elseif ( sq_rk < sq_delta2 )
c = 0 ; s = 1 ;
else
c = sqrt ((sq_rk sq_delta2)/denom) ;
s = sqrt (l-c*c) ;
end
if ( sqjrk > 0 )
x = -s*c*rk*denom/sq_rk ;
y = deltal*delta2*rk/sq_rk ;
G1 = [ c -s
sc];
G2 = [ c*deltal -s*(delta2)
s*delta2 c*(deltal) ] ;
G2 = ((rk')/sq_rk) G2 ;
else
x = 0. ;
y = deltal ;
G1 = [0-1
10];
G2 = G1 ;
end
if ( k > 1 )
7, permute the columns
R (l:kml, [k p]) = R (l:kml, [p k]) ;
R (l:kml, [kpl q]) = R (l:kml, [q kpl])

30
= 5, M( = 5 iid Rayleigh channel SNR = 23 dB
Figure 3-2: Complementary cumulative distribution functions of the capacities of 5
subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results
based on 2000 Monte Carlo trials.

CHAPTER 5
TUNABLE CHANNEL DECOMPOSITION
5.1 Introduction
All these aforementioned MIMO transceiver designs focus on improving the commu
nication quality subject to power constraints. In this chapter, we tackle a new aspect of
the MIMO transceiver design problem. We regard a MIMO transceiver design as a way
of decomposing a MIMO channel into multiple subchannels. As we have mentioned, the
MIMO channel decomposition through SVD plus water filling lacks flexibility despite
its optimality in terms of achieving the maximal overall channel capacity. The success
of UCD motivates a much more flexible channel decomposition approach, namely the
tunable channel decomposition (TCD) scheme, which is the main result of this chapter.
Using the recently developed generalized triangular decomposition (GTD), we propose
the TCD scheme to decompose a MIMO channel into multiple subchannels with pre
scribed capacities or, equivalently, signal-to-interference-and-noise ratios (SINR). The
main properties of the TCD scheme are summarized as follows:
1. Given K parallel subchannels with capacities C\, C2, .., Ck, which is obtained
through applying SVD plus water filling to a rank K MIMO channel, TCD can
convert the K subchannels into L > K subchannels with capacities R\, R2,..., R
if and only if (Ci,..., CK, 0,...,0)eRJ majorizes (Ri,R2, ..., Ri) 1 In partic
ular, Ci = X^!=i Rii be., the TCD is capacity lossless.
2. The TCD scheme has two implementation forms. One is the combination of a
linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST)
detector, which is referred to as TCD-VBLAST, and the other includes a DP
precoder and a linear equalizer followed by a DP decoder, which we refer to as
TCD-DP.
3. Given the SVD of the MIMO channel matrix, the computational complexity of
TCD, which is to calculate the precoder and equalizer matrices, is O(KL), which
is computationally quite efficient.
1 The concept of majorization is introduced in Section 5.2.
52

TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
- A CHANNEL DECOMPOSITION PERSPECTIVE
By
YI JIANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2005

ACKNOWLEDGMENTS
Foremost, I thank my advisor, Professor Jian Li, for her support, encouragement,
and guidance in the past four years. Dr. Li provided me the invaluable opportunity
to investigate those fascinating research problems and always showed full confidence in
me. I just hope I can live up to her expectations in my future career. I am very grateful
to my collaborator, Professor William W. Hager, whose suggestions and rigorous math
have benefited me a lot. I thank Dr. Tan F. Wong for teaching me the information
theory. Some of the basic ideas of this dissertation were formulated when I was taking
his class in the fall of 2003. I would like to thank Dr. John M. Shea, Dr. Tan F. Wong,
Dr. Kenneth K. O, and Dr. William W. Hager for serving in my dissertation committee.
Thanks go to all my friends both at the University of Florida and elsewhere who made
the last four years full of fun.
This dissertation is dedicated to my parents and my fiance Hongying.
iii

page
ACKNOWLEDGMENTS iii
LIST OF TABLES vi
LIST OF FIGURES vii
ABSTRACT ix
CHAPTER
1 INTRODUCTION 1
1.1 Two Categories of Schemes for MIMO Communications 1
1.2 Joint Transceiver Design: Where Tx and Rx Collaborate 2
1.3 MIMO Transceiver Design from Channel Decomposition Perspective 3
1.4 Dissertation Outline 4
2 LINEAR MIMO TRANSCEIVER DESIGNS 6
2.1 Channel Model and Channel Capacity 6
2.1.1 Channel Model 6
2.1.2 Channel Capacity 8
2.2 Channel Capacity and Cramr-Rao Bound 10
2.3 Rate Performance of Linear Transceivers 13
3 MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOM
POSITION 17
3.1 VBLAST and ZF-DP 17
3.1.1 VBLAST 17
3.1.2 ZF-DP 18
3.2 Geometric Mean Decomposition for MIMO Transceiver Design .... 20
3.3 Performance Analyses and Implementations Issues 21
3.3.1 Performance Analyses 21
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking 23
3.3.3 Subchannel Selection 24
3.3.4 Further Remarks 25
3.4 Performance Examples 27
3.5 Conclusions 28
4 UNIFORM CHANNEL DECOMPOSITION 33
4.1 Closed-Form Representation of MMSE-VBLAST 34
4.2 UCD-VBLAST 35
4.3 UCD-DP 39
4.4 Performance Analysis 40
4.4.1 Diversity Gain Analysis 40
4.4.2 Further Remarks 42
IV

4.5 Numerical Examples 42
4.6 Conclusions 44
5 TUNABLE CHANNEL DECOMPOSITION 52
5.1 Introduction 52
5.2 Channel Model and Preliminaries 53
5.2.1 Channel Model 53
5.2.2 Channel Decomposition 53
5.2.3 Majorization and Generalized Triangular Decomposition .... 55
5.3 Tunable Channel Decomposition 57
5.3.1 TCD-VBLAST 57
5.3.2 TCD-DP 61
5.4 MIMO Communications with QoS Constraints 63
5.5 CDMA Sequence Design 70
5.5.1 CDMA Sequences Maximizing Sum Capacity 71
5.5.4 Numerical Example 73
5.5.5 Further Remarks 75
5.6 Conclusions 75
6 NOVEL MATRIX DECOMPOSITIONS 81
6.1 Introduction 81
6.2 Geometric Mean Decomposition 81
6.2.1 Generalized Maximin Properties 83
6.2.2 Implementation Based on Initial SVD 84
6.3 Generalized Triangular Decomposition 87
6.3.1 Existence of GTD 87
6.3.2 The GTD Algorithm 87
6.3.3 Inverse Eigenvalue Problem 93
6.4 Conclusions 95
7 CONCLUSIONS .' 99
REFERENCES 100
BIOGRAPHICAL SKETCH 105
v

LIST OF TABLES
Table page
4-1 The UCD-VBLAST scheme 38
5-1 The TCD-VBLAST Scheme 61
6-1 Comparison of SVD_EIG and gtd for inverse eigenvalue problems (CPU
time in seconds, singular value and eigenvalue errors in sup-norm) ... 93
vi

LIST OF FIGURES
Figure page
3-1 Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and
Mr = 4 for i.i.d. Rayleigh flat fading channels 29
3-2 Complementary cumulative distribution functions of the capacities of 5
subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and
Mr = 5. Results based on 2000 Monte Carlo trials 30
3-3 Complementary cumulative distribution function of the capacity of an
i.i.d. Rayleigh flat fading channel with Mt 10 and Mr = 10. Results
based on 1000 Monte Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20
dB, and (d) 30 dB 31
3-4 BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh
flat fading channel vs. SNR with (a) Mt = 2 and Mr 4 and (b) Mt 4
and Mr = 4 31
3-5 BER performances of GMD-VBLAST and GMD-ZFDP. Both are com
bined with OFDM for ISI suppression 32
4-1 Complementary cumulative distribution function of the capacity of an
i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results
based on 2000 Monte Carlo trials. SNR = (a) 10 dB, (b) 10 dB (c) 20
dB, and (d) 30 dB 49
4-2 Complementary cumulative distribution functions of the capacities of 5
subchannels of an i.i.d. Rayleigh flat fading channel with Mt 5 and
Mr = 5. Results based on 2000 Monte Carlo trials 50
4-3 Uncoded BER performance when using 16-QAM. Results based on 1000
Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt 4
and Mr = 4 50
4-4 BER performances of the UCD-DP, UCD-VBLAST schemes and the
imaginary UCD-genie scheme. Results based on 1000 Monte Carlo
trials of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. 51
5-1 Illustration of the capacity lossless region obtainable via TCD. We assume
K = 3, Ci = 3, C2 = 2, and C3 = 1 57
5-2 A Matlab function to solve (5.49) 68
5-3 Input SNR vs. Output SINR. The result is based on the average of 500
Monte Carlo trials of a i.i.d. Rayleigh flat fading channel with Mt 5
and Mr = 6 69
5-4 Input SNR vs. C\. A rank 2 channel is decomposed into two subchannels
with capacities C\ and C2 = 10 C\ 70
6-1 The operation displayed in (6.8) .
vii
86

6-2 The operation displayed in (6.15)
89
viii

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
- A CHANNEL DECOMPOSITION PERSPECTIVE
By
Yi Jiang
May 2005
Chair: Jian Li
Major Department: Electrical and Computer Engineering
This dissertation studies the signal processing aspect of multi-input multi-output
(MIMO) communications. The contribution of this dissertation is twofold.
First, this dissertation presents a new perspective to the MIMO communications:
any MIMO scheme can be regarded as a MIMO channel decomposer, which decomposes
(in an information lossy or lossless manner) a MIMO channel into multiple scalar sub
channels. Based on this perspective, this dissertation presents three novel MIMO trans
ceiver designs, the geometric mean decomposition (GMD) scheme, the uniform channel
decomposition (UCD) scheme, and the tunable channel decomposition (TCD) scheme.
All these schemes deploy either a decision feedback equalizer (DFE) at the receiver or
a dirty paper precoder (DPP) at the transmitter. These transceiver designs represent a
paradigm shift from the conventional linear MIMO transceiver designs to the nonlinear
ones. The superior performance of the GMD and UCD schemes unveils the practical
significance of making transmitter and receiver cooperate with each other. That is, such
cooperations facilitate achieving the optimal tradeoff between the diversity gain and
multiplexing promised by the MIMO communication theory. The TCD scheme repre
sents a unifying solution to a considerably wide range of problems, including designing
the precoder for orthogonal frequency division multiplexing (OFDM) communications
and the optimal code division multiple access (CDMA) sequence design.
Second, this dissertation introduces two novel matrix decomposition algorithms, i.e.,
the geometric mean decomposition (GMD) and the generalized triangular decomposition
(GTD). The two matrix decompositions form the cornerstones of the three transceiver
IX

designs proposed in this dissertation. Moreover, the two decompositions have significant
implications in the matrix analysis community. For instance, the GTD is a new solution
to the inverse eigenvalue problem.
x

CHAPTER 1
INTRODUCTION
1.1 Two Categories of Schemes for MIMO Communications
Communications over multiple-input multiple-output (MIMO) wireless channels
have been a subject of intense research over the past several years because deploy
ing multiple antennas at both transmitter and receiver sides can drastically improve the
spectral efficiency [1] [2] [3] [4]. For example, in contrast to the conventional additive
white Gaussian noise (AWGN) channel whose spectral efficiency is
C(snr) = log2(l + snr) bps/Hz,
without requiring additional input power, the MIMO channel with Mt transmitting
antennas and Mr receiving antennas can have spectral efficiency as large as [1] [2]
C(snr) min(Mr, Mt) log2(snr) + 0(1) bps/Hz,
given that there is plenty of scattering in the channel. Many spatial multiplexing meth
ods, e.g., the BLAST scheme [2] [5] [6] [7] [8] [9] [10] [11], have been proposed to reap
the great channel capacity.
Improving the data transmission reliability is another advantage of applying multi
ple antennas in wireless communications. By transmitting the same information through
more than one independent fading channel, one can obtain much more reliable commu
nications thanks to the redundance introduced. The space-time coding methods are
based on such a rationale, (see, e.g., [12] [13] [14] [15]).
Zheng and Tse [16] show that one can exploit the diversity gain and multiplexing
gain promised by the MIMO channel simultaneously. However, there is a fundamental
tradeoff between the two gains. Zheng and Tses theory provides a unifying framework
to measure the performance of any MIMO schemes. Hence designing practical schemes
capable of achieving the optimal diversity-multiplexing tradeoff is a central research
topic in MIMO communications.
1

2
1.2 Joint Transceiver Design: Where Tx and Rx Collaborate
All the aforementioned methods assume that the channel state information (CSI)
is available at the receiver (CSIR) only. Under this assumption, collaborations between
the transmitter and receiver are difficult in the physical layer. However, if the commu
nication environment is relatively stationary, the availability of CSI at the transmitter
(CSIT) is also possible via feedback or the reciprocal principle when time division duplex
(TDD) is used. In fact, in the third generation WCDMA standard [17], the CSIT is
assumed to obtain improved system performance, which is referred to as the closed-loop
transmit diversity or transmit adaptive array (TxAA) technique. Based on this assump
tion, the joint optimal transceiver design (also referred to as precoding at the transmitter
and equalization at the receiver) has recently attracted considerable attentions [18] [19]
[20] [21] [22] [23] [24] [25] [26] [27] [28] [29],
These designs are based on a variety of criteria, including minimum mean-squared-
error (MMSE), [18] [21] [22], maximum SNR [21], maximum information rate [19] [20]
[22], and BER based criteria [23] [24] [25] [29], More recently, a unified framework has
been presented to accommodate all these criteria, under which the design problems can
be solved via convex optimization methods [26].
The aforementioned literature on joint transceiver design considered linear trans
formations only. It is widely understood that the singular value decomposition (SVD),
which decomposes a MIMO channel into multiple parallel subchannels, and water fill
ing can be used to achieve the channel capacity [3], However, due to the usually very
different signal-to-noise ratios (SNR) of the subchannels, this apparently simple scheme
requires careful bit allocation (see, e.g., [19] [20] [23]) to match the subchannel capacity
and achieve a prescribed BER. Bit allocation not only increases the coding/decoding
complexity, but also is inherently capacity lossy because of the finite constellation gran
ularity. An alternative is to use the same constellation in all the subchannels, like
the schemes adopted by the European standard HIPERLAN/2 and the IEEE 802.11
standards for wireless local area networks (WLANs). However, for this alternative, the
BER is dominated by the subchannels with the lowest SNRs. To optimize the BER
performance, more signal power could be allocated to the poorer subchannels. Yet this
approach causes significant capacity loss due to inverse water filling like power allo
cation. There is apparently a fundamental tradeoff between the capacity and the BER
performance.

3
1.3 MIMO Transceiver Design from Channel Decomposition Perspective
In this dissertation, we present a new perspective to the MIMO communications.
We regard the aforementioned MIMO schemes as MIMO channel decomposers, which
decompose (in an information lossy or lossless manner) a MIMO channel into mul
tiple scalar subchannels. For instance, the MIMO transceiver design based on SVD
decomposes a MIMO into multiple eigen-subchannels. Similarly, the V-BLAST scheme
decomposes a MIMO channel into multiple scalar subchannels which are referred to as
layers by its inventors. These channel decompositions, however, are totally determined
by the specific channel realization and one can have little control over how the channel
is decomposed. For example, the gains of the subchannels obtained via SVD are totally
determined by the singular values of the channel matrix, which one can have no control
over.
An interesting question arises: if the transmitter and receiver are allowed to col
laborate, how can we design a transceiver that can decompose a MIMO channel into
multiple subchannels with prescribed channel gain, and without incurring capacity loss?
This dissertation is devoted to answering this question. In the process of pursuing the
answer, we investigate the following aspects of the problem.
First, we show that the conventional linear transceivers are inherently inflexible,
and we cannot rely on linear transceivers to achieve our desired channel decomposi
tions. Hence we need to go beyond the linearity constraint and investigate the nonlinear
schemes, such as a decision feedback equalizer (DFE) and a dirty paper precoder (DPP).
Second, we study the possibility of new matrix decompositions other than using
SVD. We propose two novel matrix decomposition algorithms, the geometric mean de
composition (GMD) and the generalized triangular decomposition (GTD). The two de
compositions represent a wide class of matrix decomposition, which has significant im
plications in the matrix analysis community. For instance, the GTD is a new solution
to the inverse eigenvalue problem.
Third, we propose three transceiver designs which combine the new matrix decom
position algorithms with the DFE and DPP. The three designs are the GMD scheme, the
uniform channel decomposition (UCD) scheme and the tunable channel decomposition
(TCD) scheme. Among them, the UCD scheme can decompose, in a strictly capacity
lossless manner, a MIMO channel into multiple subchannels with identical capacities or,
equivalently, identical channel gains. Moreover, the UCD scheme is a practical scheme

4
that can achieve the optimal tradeoff between the diversity gain and multiplexing gain.
Without incurring any capacity loss, the TCD scheme can decompose a MIMO chan
nel into multiple subchannels with prescribed capacities/channel gains. This scheme
is applicable to a wide range of applications, including the multi-task communications
where independent data streams with different qualities-of-service (QoS) share the same
MIMO channel, and designing the optimal CDMA sequences.
1.4 Dissertation Outline
In Chapter 2, we introduce the data model and some relevant information-theoretic
results that will be used in this dissertation. We also review the existing transceiver
designs and analyze the performances of those methods. By linking the channel capacity
with the Cramer Rao bound (CRB), we give an information-theoretic explanation why
linear transceivers are inflexible.
Chapter 3 presents the GMD scheme that combines the VBLAST detector or DP
precoder with the GMD matrix decomposition algorithm. The GMD scheme can decom
pose a MIMO channel into multiple identical scalar subchannels. This desirable prop
erty can bring much convenience to the practical system design, particularly the symbol
constellation selection. Moreover, we have shown that the GMD scheme is optimal as
ymptotically for high SNR in terms of both information rate and BER performance
while the computational complexity of our scheme is comparable to the conventional
linear transceiver scheme.
In Chapter 4, we propose a uniform channel decomposition (UCD) scheme. Similar
to the GMD scheme, the UCD is also based on the GMD matrix decomposition algo
rithm and can decompose a MIMO channel into multiple identical subchannels. Two
remarkable merits of UCD, which are not shared by the GMD scheme, are that first,
UCD is strictly capacity lossless at any SNR, and second, UCD can achieve the opti
mal diversity and multiplexing tradeoff. Moreover, the UCD scheme can decompose
a MIMO channel into an arbitrarily large number of independent subchannels, which
is an enabling technology to achieve high data rate transmission using small symbol
constellations.
Chapter 5 is devoted to tackling a new aspect of the MIMO transceiver design
problem. Instead of attempting to optimize the BER performance for fixed input power
and data rate, we propose the TCD scheme which can decompose a MIMO channel
into multiple subchannels with prescribed channel capacities. We show that TCD is a

5
solution to a wide range of applications, including the applications in which independent
data streams with different qualities-of-service (QoS) share the same MIMO channel and
design the optimal CDMA sequences.
The mathematical foundations of this dissertation, the GMD and GTD algorithms,
are established in Chapter 6. The two novel matrix decomposition algorithms are the
cornerstones of the MIMO transceiver designs proposed in this dissertation.
The conclusions are given in Chapter 7.
To read this dissertation, it is unnecessary to plunge into the details of the GMD
and GTD algorithms. For this reason, we put them to the latter part of the disserta
tion. However, a rough understanding of the two algorithms is necessary to appreciate
Chapters 3-5.

CHAPTER 2
LINEAR MIMO TRANSCEIVER DESIGNS
2.1 Channel Model and Channel Capacity
2.1.1 Channel Model
We consider a communication system with Mt transmitting and Mr receiving an
tennas in a frequency flat fading channel. The sampled baseband signal is given by
y = HFx + z,
(2.1)
where x G CLxl is the information symbols precoded by the precoder F G CA,X and
y Â£ CMrXl is the received signal and H G CMrXMt is the channel matrix with rank K.
We assume Â£[xx*] = a^Ii and z ~ N(0, is the circularly symmetric complex
Gaussian noise, where I stands for an identity matrix with dimension L. We define the
input SNR as
p = = ^|Tr{F*F} 4 lTr{FF}, (2.2)
af a
where a = f. Designing the MIMO transceivers, including the precoder F and the
associated equalizer, is the focus of this dissertation.
We note that the data model in (2.1) is generic. For an intersymbol-interference
(ISI) channel with impulse response h = [Hq. hi,, /-m-i]T with (-)T denoting trans
pose, if a block data with length N are transmitted using the zero-padded OFDM,
then the received block data can also be written in the form of (2.1) with
H -
h0 0
ho
hM-i
0
0
0
0
0
0
0 h\f-i ... ho
o . . ;
0 0 ... 0 h\{-1
(2.3)
In this case, H is a Toeplitz matrix with its dimensionality Mt = N and Mr N+M1.
If the OFDM with cyclic prefix is used, the channel matrix is a circulant Toeplitz matrix,
6

7
i.e.,
ho 0 /ijvf-i
hi ho 0 /im-i
hi
h2
H =
0
(2.4)
h\i-2 hM-1
hM-1 hM-2 hi h0 0
O hM-1 hM-2 h\ ho
Here, Mt = Mr = Ar. In either case, if the block data are precoded with the linear
precoder F, then the received data are given in (2.1). This ISI channel problem has
been studied in [21] [30].
In an idealized synchronous CDMA (S-CDMA) system where the channel does
not experience any fading or near-far effect, L mobile users modulate their information
symbols via spreading sequences {s,}^, each of which has the processing gain N. The
discrete-time baseband S-CDMA signal received at the (single-antenna) base-station can
be represented as [31]
y = Sx + z
(2.5)
where S = [si,...,s] 6 RNxL and the Ith (1 < l < L) entry of x, xÂ¡, stands for
the information symbol from the fth user. In the downlink channel, the base station
multiplexes the information dedicated to the L mobile users through the spreading
sequences, which are the columns of S. Then all the mobiles receive the same signal
given in (2.5). We remark that (2.5) can also be written as (2.1) with H = IN and F = S.
Here Mr = Mt = N is the processing gain. Hence, optimizing the spreading sequences
amounts to optimizing the precoder F for a MIMO system. Indeed, this problem has
been under intensive research in the past several years.
In summary, both designing a precoder for OFDM transmission through an ISI
channel and searching for the optimal S-CDMA sequences can be regarded as special
cases in the unifying framework of MIMO transceiver designs. MIMO transceiver designs
can be used in the OFDM and CDMA applications after only simple modifications. In
this dissertation, we will concentrate on MIMO transceiver design although we will
discuss the optimal design of CDMA sequences in Chapter 5.

8
2.1.2 Channel Capacity
Suppose x is a Gaussian random vector. The capacity of the MIMO channel (2.1)
is
C = log2
|<72I + (T2HFF*H*|
kzi|
(2.6)
where | | denotes the determinant of a matrix. If both CSIT and CSIR are available, we
can maximize the channel capacity with respect to F given the input power constraint
cr2Tr{FF*} = pa\. That is,
CjT = max log2 |I + a-1HFF*H*|, (2.7)
<7Â¡Tr{FF*}=p<72
where a is as defined in (2.2) and the subscript of Cjt stands for informed transmitter.
Denote the SVD of H as H = UAV*, where A is a K x K diagonal matrix whose
diagonal elements {A//*})^ are the nonzero singular values of H. The solution to F in
(2.7) is [3]
F = V\$1/2. (2.8)
Here \$ is diagonal whose /cth (1 < k < K) diagonal element 4>k determines the power
loaded to the fcth subchannel and is found via water filling to be
+
(2.9)
Mp0 = yi~ ^ j
with fi being chosen such that Ylk=i fail*) Pal an(l (a)+ = max{0,a}. Then the
solution to (2.7) is
Cjt = log2 (l + Atf'j bps/Hz. (2.10)
k=i \ a /
Note that some of *.s can be zeros. In this case, we can only transmit L < K data
streams.
If the CSIT is not available, the optimal transmission strategy is to evenly allocate
I poc
power to each antenna [3], For this case, F = \/7tIm( and the channel capacity with
V Mt
uninformed transmitter (UT) is
Cut = log2 f 1 + ) bps/Hz.
n=l t
(2.11)
1 Throughout this dissertation, we assume that the coherent time of the channel goes
to infinity. Hence advanced coding is applicable to approach the Shannon capacity.

9
It is proven [32] that if K = Mt
c
-p~- > 1 as p > oo. (2-12)
Cut
We claim a stronger relationship as follows. 2
Lemma 2.1.1 For the data model in (2.1), if the channel matrix H is of full column
rank, i.e., K = Mt, then
Cit Cut * 0 as P * oo.
(2.13)
Proof: Inserting (2.9) into (2.10) yields
K
CIT = ^\og2 {pX2HJ
where p is chosen such that
n=l
K
or
KP ~ TT- = P>
n=l H,n
p 1 A 1
K + X2Hn
n=l H (2.14)
(2.15)
(2.16)
Here we assume that all the K subchannels are used because of large p, i.e., p jf > 0
AH.n
for n = 1,2,..., K.
From (2.14), (2.16), and (2.11), and using the fact that K = Mt, we have
* //-+Â£.
Cit Cut = 2_^ 1Â§2
K 1
W.
n=l
p +
K
Note that
lim
k = 1 for 1 < n < K
(2.17)
(2.18)
poo n + 2
AH. n
and that f(x) = log2 x is a continuous function if x > 0. The lemma follows immediately
from (2.17).
However, we note that CSIT can be very helpful in the following cases:
A. The SNR is low or moderate.
B. H is rank deficient or ill-conditioned.
2 A similar, but somewhat vague, statement is found in [8],

10
C. There are more transmitting antennas than receiving ones, i.e., Mt> Mr.
Moreover, the availability of CSIT provides more freedom, which makes it easier to
devise joint transceiver design schemes to achieve the underlying channel capacity. This
observation is the underlying theme of this dissertation.
2.2 Channel Capacity and Cramr-Rao Bound
One of the most important significances of the Shannons information theory is that
this theory can predict the highest achievable data rate for a given channel. Similarly,
the Cramr-Rao bound (CRB) [33], which is the inverse of the Fisher information matrix
(FIM), can predict the minimum mean squared error (MMSE) an estimator can achieve.
In this section, we show that the MIMO channel capacity formula of (2.6) can be re
written as a function of CRB, or FIM. Based on this relationship, we show that linear
transceivers lack flexibility.
We rewrite (2.1) as follows:
y = Hx + z, (2-19)
but we relax the assumption of (2.1) slightly. Instead of assuming spatially white noise,
we assume that z ~ A'(0, R.). We also assume that the channel input x ~ N(0. Rx)
also has circularly symmetric complex Gaussian distribution and is independent of z.
Then the channel output y ~ Ar(0. HR IT -f R,). For this more general scenario, the
channel capacity is
^ |R2 + HRXH*
c=logm
Now Consider the following random vector,
X
~ Ar [ 0,
y
l [
Its log-likelihood function is
R,
HR,
log/(x,y)
const [x* y*]
R,
R,H*
-1
X
HR,
Ry
y
(2.21)
(2.22)
Using the block matrix inversion formula [34], we get
R.r
R,H*
-1
A B
HR,
Ry
B*
(2.23)

where
11
A = (R* RxITR^HR*) 1 (2.24)
B = (Rx R^ITR^HR*)_1 RXH*R~1 (2.25)
and o is irrelevant to the present discussion. From (2.22)-(2.25) we have
~dlog/fx, Y) = (Rx RxH*RHRx)_1 (x RaH*R;1y). (2.26)
where x is the conjugate of x. Here we define the differentiation with respect to a
complex-valued vector as [35, Appendix B]
d_
<9w
1
2
/
_2_
dui J dv\
\
a
duM
f dvM )
d_
<9w
a
du\
d
duM
+ jA
+ 3
dvM )
(2.27)
where the mth entry of w, wm urn + jvm, m = 1 The Bayesian Fisher
information matrix (FIM) [36] is given by
FIM = E
dlog/(x,y) <91og/(x,y)T
dx dx
(2.28)
Based on (2.26) and (2.21), we obtain
FIM = A [I : R-rFPRy1]
Rx RxH*
I
HRj Rj,
-R^hr*
= (Rx-RxH*Ry1HRx)1
= Rj1 + H*(Ry HRXH*)-1H
= R^ + HR^H
(2.29)
(2.30)
(2.31)
Comparing (2.20) and (2.31), we see that
R = log2|Rx|+log2|FIM|
(2.32)
= log2 |Rx| log2 |CRB|
(2.33)
where
CRB = FIM"1 =RX- R.ITR^HR, (2.34)
This shows that there exists a simple relation between the Gaussian MIMO channel
throughput, which is an upper bound of the information transmission rate for any

12
coder/decoder, and the CRB, which is a lower bound on the covariance matrix of any
unbiased estimator of x.
The MMSE estimator of x is
xMMSE = R,H (HRXH* + R,)-1 y (2.35)
It is easy to verify that the MMSE estimator of x can achieve the CRB. Hence the MMSE
estimator is the best we can achieve under the Gaussian assumptions. In general cases,
the matrices FIM and CRB are non-diagonal; i.e., the MMSE estimates of the elements
of x are correlated. The correlations between the elements of x clearly contain useful
information for the subsequent decoding procedures. However, in practice, we only
estimate the single elements of x separately and ignore the correlations between these
elements. This causes the loss of information. In fact, we can quantify the capacity loss
as
Qob. = Â£ log CRB** log |CRB| (2.36)
fc=i
where CRB** denotes the k-th diagonal element of CRB. According to the Hadamard
inequality [34], for any positive semidefinite matrix M 6 CK,
K
|M| < H Mkk (2.37)
2 = 1
and the equality holds if and only if M is diagonal. Hence Closs > 0 and there is no
capacity loss if and only if CRB is a diagonal matrix.
Based on the aforementioned discussions, we see that i) in general MIMO com
munications, linear MMSE estimators followed by separate substream decoding are not
capacity-wise optimal and ii) if the channel matrix H has the property that CRB of
(2.34) is a diagonal matrices, linear MMSE estimators may be the first step of capacity
lossless processing. If CSIT is available, the transmitter can apply some precoder F and
get a virtual channel matrix
Hvt = HF (2.38)
such that CRB is diagonal. This explains why all the existing linear transceiver designs
invariably lead to the diagonalization of the channel matrix. Indeed, if Rx is diagonal
and R- = a]I, then it follows from (2.31) that Hvt must have orthogonal columns
to get diagonal FIM and hence diagonal CRB. Then the precoder F = V, which
is the right singular vector of H, is the only optimal solution. Yet as we discussed

13
before, this inflexible transceiver scheme can bring many difficulties to the subsequent
coding/decoding and modulation/demodulation procedures.
2.3 Rate Performance of Linear Transceivers
To gain more insights into the limitations of the linear transceiver designs, we ana
lyze the asymptotic rate performances of two typical linear transceiver designs for high
SNR. We will show that the linear transceivers may suffer from considerable capacity
loss and there is apparently a fundamental tradeoff between the throughput and the
BER performance.
According to the channel model of (2.1), the received data vector is
y = HFx + z. (2.39)
The optimal linear receiver is always the LMMSE equalizer (also see, e.g., [23])
Gop( = F*H* (HFFHV2 + ct2I)_1, (2.40)
which yields the optimal estimate of the information symbol s = G^y. The mean-
squared-error (MSE) matrix of s is
E = (I + a_1F*H*HF)_1. (2.41)
Note that E is a function of the linear precoder F. In the following, we analyze two linear
precoder designs based on the minimization of the trace of the MSE matrix (MTM) and
the minimization of the maximum diagonal elements of MSE matrix (MMD) criteria,
which are referred to as ARITH-MSE and MAX-MSE in [26], respectively. We choose
these two schemes because they appear to be the most typical ones and the MMD
scheme yields the optimal (or very close to the optimal) performance among all the
linear transceivers. Indeed, the MMD is equivalent to the linear MIN-BER scheme in
the flat fading channel case (see [26]). We do not consider the SVD plus water filling
The MTM scheme, or ARITH-MSE, which has appeared in several linear transceiver
design papers (see, e.g., [22] [23] and [26]), attempts to minimize tr(E) with respect to
F. The MTM precoder turns out to be
F MTM = V*1/2
(2.42)

14
where V is as defined in the SVD H = UAV*, and d> is a diagonal matrix whose ith
diagonal element denotes the signal power loaded to the ith subchannel. According
to the literature (see e.g. [23] Sec. III-A)
fa
^1/2Ah, A2,.
(2.43)
where p is the Lagrange multiplier which controls the loaded power such that fa =
pa2. Suppose p is sufficiently large. Then all the K subchannels are used and
K
or
1/2 P + E.=i A Â£
Eli, *iW'
(2.44)
(2.45)
Substituting (2.42), (2.43) and (2.45) into (2.41), we see that E is diagonal with the th
diagonal element
spK x-i
Z^k=l AH,k
Ei =
(P + Xjt=i A///t)A//,
Then (cf. Equation (28) of [26])
Ci = log2 Et
l (P + Sfc=1 AH k \ , x
= lg2 | .-1 + log2 AH.i-
V* A_:
\ 2^i=k AH,k /
Hence the sum rate of the channel using the MTM scheme is
k fn + yrK A-2 \ K
Cmtm = J2i = Klog2 | K k~*H,k ] + loS2 Ah,.-
.=1 \ X-/fc=l A H k ) i=i
The channel capacity with uniform power loading in the K subchannels is
K
Cupl = Y log2 (1 +
i= 1
(2.46)
(2.47)
(2.48)
(2.49)
(2.50)
Here Cupl is different from CUT defined in (2.11) in that Cupl corresponds to the
channel with the transmitter knowing the range space of H.
It follows from (2.49) and (2.50) that
Cupl-Cmtm = lg2 (l + -^A^)Alog2 (P 1 H'') lQg2 Ah,.- (2.
=1 \ 2^= 1 AH,i / ,-=l
51)

15
After some straightforward calculations, we have
A-1
lim CUPL Cmtm = K log2 bps/Hz. (2.52)
(IE, *il)v
Note that for any real valued sequence {A//^}^ > 0, the arithmetic mean is greater than
or equal to the geometric mean, or Lii ^h\ > ^h\^J Hence we conclude
that limp^ooCupl Cmtm > 0 and the equality holds if and only if {\h,}: are aU
the same. We infer from (2.52) that the capacity loss of the MTM transceiver can be
quite large if the channel matrix H has a large condition number, which is verified in
Section 3.4.
If the same constellation is used for each subchannel, then the substream cor
responding to the largest Et dominates the overall BER performance. Recall that
a-1
Ei k=1 y, which is proportional to the inverse of AHence the sub-
(P+Z^fc=1
channels may have very different SNRs especially when H has a large condition number.
To mitigate this undesirable effect, one can use the MMD transceiver, or MAX-MSE
(cf. [26] Section V-A5), with
F mmd = Emtm, (2.53)
where is a unitary matrix that makes all the diagonal elements of E in (2.41) the
same, that is,
1 K
E=kYE'- (254)
1=1
According to (2.47), the capacity of the channel using the MMD linear transceiver is
Thus
K
C,
MMD
= -K log2 = -K log2 ^
1=1
(2.55)
j K K
= K^-^YE'~YXo^Ei
= K log2 K
= K log2
1 Li: Ei
1 IK
i ,-i
(iE, e,)
A A-1
K AH,t
(IE, Ail)
(2.56)
(2.57)
(2.58)
Cmtm Cmmd

16
where to get (2.58) from (2.57) we have used (2.46). Note that the relative capacity loss
of MMD compared with MTM is independent of SNR given that all the subchannels
are used. Interestingly, we can see from (2.58) and (2.52) that Cmtm ~ Cmmd =
lim^oo Cupl Cmtm We conclude that asymptotically for high SNR, the MMD
transceiver has twice the capacity loss of MTM, i.e.,
V A-1
lim CVPL Cmmd = 2K log2 ^ ^ bps/Hz, (2.59)
(nr., as-)
although it may yield better BER performance. An intuitive explanation of the capacity
loss of the MMD transceiver is as follows. Note that the only difference between MTM
and MMD is the prerotation matrix , which is an invariant operator in terms of
information capacity. However, makes the MSE matrix E non-diagonal, which means
that the elements of s = Gopty are correlated. Clearly, the correlation contains useful
information for symbol detection and decoding. However, the linear equalizer ignores
the correlation, which results in the additional capacity loss quantified in (2.58). The
analyses here are verified in Section 3.4.
In summary, the MTM transceiver suffers from capacity loss of (2.52) due to the
ceiver suffers from additional capacity loss because it makes the MSE matrix non
diagonal. Hence there is an apparently inevitable tradeoff between the information
rate and BER performance if the same symbol constellation is used in the different sub
channels. In the next chapter, we will introduce the GMD scheme and clarify that there
is not necessarily a tradeoff between BER performance and channel capacity. Indeed,
the GMD scheme attempts to achieve the best of both worlds simultaneously.

CHAPTER 3
MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOMPOSITION
3.1 VBLAST and ZF-DP
In this section, we first give a brief introduction to the VBLAST architecture [5],
which is equivalent to the generalized decision feedback equalizer (GDFE) [37]. We also
introduce the more recent zero-forcing dirty paper precoder (ZFDP) applied to the
3.1.1 VBLAST
VBLAST is a simple suboptimal receiver interface which is used in the MIMO
system assuming that only CSIR is available. For a MIMO system (2.1) with Mt <
Mr and rank K = Mt, the transmitter allocates independent bit streams across the
Mt transmitting antennas with no precoding. To decode the transmitted information
symbol, VBLAST first estimates the signal with the spatial structure hAit, where h,
denotes the zth column of H, and then cancels it out from the received signal vector.
Next, it estimates the signal with spatial structure hA/t-i and so on. The signal estimator
can be either the ZF or MMSE estimator. Some proper reordering of the columns of H is
helpful to improve the BER performance [5]. This decoding scheme involves sequential
nulling and cancellation which is proved to be equivalent to the generalized decision
feedback equalizer (GDFE) [37].
The ZF nulling step in the VBLAST scheme can be represented by the QR decom
position H = QR where Q is an Mt x K matrix with orthonormal columns and R is a
K x K upper triangular matrix. Let us rewrite (2.1) as
y = QRx + z. (3.1)
Multiplying Q* to both sides of (3.1) yields
y = Rx + z, (3.2)
17

18
1
... Â£
to i
i

Di D2
0 722
... r1K
r2K
X\
X2
+
Z\
z2
VK
0 ...
0 rKK
xk
Zk
The sequential signal detection is as follows
(3.3)
for i = K : 1:1
xx = C Ej=+i rijÂ£jJ /ra
end
where C stands for mapping to the nearest symbol in the symbol constellation. Ignoring
the error-propagation effect, we see that the MIMO channel is decomposed into K
parallel scalar subchannels
yi = riixi + zi, i = 1,2, , AT.
(3.4)
3.1.2 ZF-DP
We consider a broadcast MIMO channel with Mt transmitting antennas and Mr
receiving antennas (Mt > Mr). The channel model is exactly the same as (2.1) and the
CSIT is available. However the receiving antennas cannot cooperate with each other. A
vector transmission scheme was proposed in [40], which combines the QR decomposition
and dirty paper precoding. We refer to this approach as the zero-forcing dirty paper
precoding (ZFDP). (The use of the dirty paper phrase is due to Costa [41].)
The ZFDP scheme resembles the zero-forcing VBLAST method. It also goes
through the sequential nulling and cancellation procedure. The only difference is that
all these operations are done by the transmitter.
By assuming H to be of full row rank, i.e., K = Mr, ZFDP also begins with the
QR decomposition H* = QR. Let us rewrite (2.1) as
y = R*Q*x + z. (3.5)
Denoting x = Qx yields
y = R*x + z,
(3.6)

19
yi
hi
0 .
. 0
Xi
Zi
V2
hi
h2
. 0
&2
+
2
rx i
Vk
rKK
XK
zk
Denote s Â£ CKxl to be the symbol vector destined for the K receivers. We wish to
x satisfying
hisi
1
O
. 0
X\
?22S2

hi h2
. 0
X2
XKKSr
rx 1
rKK
Xk
(3.7)
have
(3.8)
The solution to (3.8) is
x = R *diag{R}s.
(3.9)
However, the matrix inversion can amplify the norm of x significantly which can lead
to additional power consumption at the transmitter. By exploiting the finite alphabet
property of the communication signals, the modulo arithmetic precoder (more recently
known as the Tomlinson-Harashima Precoder [42], [43]) can be applied to bound the
value of the transmitted signal. Moreover, the trellis precoding can be used to eliminate
the 1.53 dB shape-loss of Tomlinson-Harashima precoding [44]. The ZFDP transmission
scheme decomposes the MIMO channel into K parallel scalar channels (see [40] for more
details)
Vi = ruXi + Zi i = 1,2, (3.10)
Several remarks are now in order, a) VBLAST is shown to be able to achieve only
about 72% of the capacity [5]. That is because imposing the same rate of transmission
on all the transmitters makes the channel capacity limited by the worst of the K scalar
subchannels, b) VBLAST has only diversity gain of MrMt+1. c) ZFDP can achieve the
broadcast channel capacity for high SNR [39], but the subchannels have different fading
levels. Hence the transmitter, just like the aforementioned linear transceivers, have to
consider the tradeoffs between the BER performance and the channel throughput, d)
ZFDP scheme causes no error propagation, and thus (3.10) is precise, e) Both VBLAST
and ZFDP involve nonlinear operations.

20
3.2 Geometric Mean Decomposition for MIMO Transceiver Design
Note that VBLAST assumes no cooperations among transmitting antennas and
ZFDP assumes no cooperations at the receivers. Then a natural question arises: can
we exploit both the CSIR and CSIT to make things better if both CSIR and CSIT are
available? We attempt to address this question next.
In the sequel, we assume that the same signal constellation is used in all the inde
pendent symbol streams to reduce the system complexity. This is consistent with the
HIPERLAN/2 and IEEE 802.11 standards. Then the overall BER performance of the
system will be limited by the subchannel with the lowest SNR. To mitigate this problem,
based on (3.4) and (3.10), we consider the following optimization problem
max min : 1 < i < K\
Q,P
subject to R = Q*HP
R G R*xK,rq = 0 for i > j C3-11)
> 0 for 1 < i < K
Q*Q = P*P = Ik
where the semi-unitary matrices Q and P denote the linear operations at the receiver
and transmitter, respectively.
Since both Q and P are semi-unitary matrices, we have fln^i rnn = A//,n,
where {A#n}A=1 are the K non-zero singular values of H. In Chapter 6 we show that if
there exist semi-unitary matrices P and Q satisfying
H = QRP*, or equivalently, R = Q*HP (3.12a)
where the diagonal elements of R are given by
ru = A# = A#,n^
i IK
1 < < K,
(3.12b)
then the R in (3.12) is the solution to (3.11). The detailed treatment of the decompo
sition (3.12) is delegated to Chapter 6, We refer to this decomposition as the geometric
mean decomposition (GMD) since the diagonal elements of R are the geometric mean
of {A// } ^=1. A computationally efficient and numerically stable algorithm is proposed
in Section 6.2 to calculate the decomposition.
It seems reasonable to constrain the linear equalizer Q to be semi-unitary since it
will keep the background noise white. Yet it seems unnecessary to constrain P to be

21
semi-unitary as well. Indeed, the constraint that P and Q should be semi-unitary is in
fact inactive as shown in the following lemma established in Section 6.2.1.
Lemma 3.2.1 The GMD of (3.12) is also the solution to the following optimization
problem with relaxed constraints:
max min {ra : 1 < i < K\
P,Q
subject to R = Q*HP, r^ = 0 for i > j, Re RKxK,
(3.13)
r > 0, 1 < i < K,
tr(Q*Q) < K, tr(P*P) < K.
Proof: Omitted. See Section 6.2.1 for details.
The GMD, which can be viewed as an extended QR decomposition, can be read
ily combined with the aforementioned VBLAST (GDFE) or ZFDP. GMD-VBLAST is
implemented as follows: We first calculate the GMD H = QRP*. Next we choose the
precoder F = P, then the equivalent data model is
y = QRx + z. (3-14)
The next step is nothing but the VBLAST detector.
Ignoring the error propagation effect, we can regard the resulting subchannels as K
independent and identical subchannels
yi = XHxi + zi, for i = l,...,K. (3.15)
The GMD-ZFDP scheme is similar to GMD-VBLAST because of the duality be
tween VBLAST and ZFDP.
3.3 Performance Analyses and Implementations Issues
In this section, we first present the performance analyses of the GMD scheme from
capacity perspective, from which we demonstrate the advantages of our GMD scheme
over the linear transceivers. Next, we consider combining the GMD scheme with the
blind two-way channel subspace tracking in the TDD scenario. To achieve close to opti
mal performance at low SNR, we propose to combine GMD with subchannel selection.
Finally, we discuss the relationship between our GMD scheme and [30].
3.3.1 Performance Analyses
As we have mentioned earlier, the overall BER performance of a MIMO commu
nication system is dominated by the worst subchannels asymptotically for high SNR.

22
Hence the scheme optimizing the worst subchannel can enjoy the optimal BER per
formance for high SNR. This observation is also the motivation of the aforementioned
MMD scheme. As a major advantage over the linear transceiver schemes, the GMD
scheme is also asymptotically optimal in terms of the channel capacity for high SNR as
we will show below.
If the signal power is allocated evenly to the K subchannels, then based on (3.15),
we get
Cgmd = K\og2 (l +(3.16)
where p is defined in (2.2). The channel capacity with uniform power loading on the K
subchannels is (see (2.50))
K
Cupl = ^ log2 (l + ) (3.17)
It follows from (3.16) and (3.17) that
Cupl CGmd = log2
rin=l (! + P^H,n)
(1 -I- pXjj)
From (3.12b) and (3.18), we have
(3.18)
lim Cupl Cgmd 0. (3.19)
p* OO
Based on Lemma 2.1.1
lim CÂ¡T Cupl = 0. (3.20)
p+ OO
Hence, it follows from (3.19) and (3.20) that
lim Cit Cgmd = 0, (3-21)
p*oo
i.e., for high SNR the GMD scheme is asymptotically optimal.
Hence the GMD scheme does not need to make the tradeoff between the information
rate and BER performance as the conventional linear transceivers. Instead, our GMD
scheme can achieve the optimum on both aspects simultaneously for high SNR.
As we have mentioned before, VBLAST may suffer from error propagation. Hence
the BER performance of GMD-VBLAST will be inferior to the scalar equivalence in
(3.15). We calculate the upper bound of the GMD-VBLAST BER as follows. For a
fixed SNR p, we assume that the system of (3.15) has symbol error rate (SER) Pe, i.e.,
each subchannel has SER Pe/K. We consider the worst case that decoding errors in some

23
subchannels will cause the failure of the decoding in all the subsequent subchannels. The
SER upper bound is readily calculated as
Pe,GMD-VBLAST =
<
-tfitl-P.m-r,)P.
n0
1 K~'
-^T(K-n)Pc
71=0
(3.22)
For a moderate K, say K < 10, the performance loss caused by the error propagation is
rather small. For a system with high dimensionality, GMD-ZFDP is a better choice since
it causes no error propagation. On the other hand, the Tomlinson-Harashima precoder
leads to an input power increase of for M-QAM.
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking
In TDD systems, the GMD scheme may be combined with two-way channel sub
space tracking techniques. The GMD algorithm, given in Chapter 6, starts with the
SVD. To calculate the matrix P (cf. (3.11)), we only need to know the singular values
A and the right singular vectors V (cf. Chapter 6). Similarly, only A and U are used
to calculate Q. Rewriting (2.1) with the precoder F = P yields,
y = HPx + z. (3.23)
Since the GMD scheme uses the same signal constellation and uniform power allocation,
the covariance matrix of s is a scaled identity matrix, i.e., E[xx*] = Ry = Â£[yy*] = HHV2 -F a\l. (3.24)
If the signal power a2 and the noise power a2 are known a priori, we have HH* =
(Ry cr2I)/cr2. Applying SVD to HH*, we get
HH* = UA2U*. (3.25)
The GMD algorithm can be applied based on U and A to get the matrices Q and R,
which are sufficient for decoding. If a TDD system is used, the reverse channel, where
the roles of previous transmitter and receiver are exchanged, can be modeled as
Y rev H Q Srer -|- Z,.rl;
(3.26)

24
where the subscript rev means reverse channel. Define
= Â£[yreyL] (3-27)
where y denotes the complex conjugate of y. Using the similar argument, we have
H*H = VA2V*. (3.28)
Then the reverse receiver, i.e., the previous transmitter, can calculate R and P from V
and A. Channel subspace tracking techniques (see, e.g., [45] [46]) can be used to estimate
U, V and A efficiently. Hence our GMD scheme can be applied without the need of
using training symbols for channel estimation. We note that this merit of GMD is not
shared by the conventional transceiver schemes introduced in Section 2.3 since all those
methods allocate different powers to different subchannels, which makes it difficult, if
not impossible, to estimate the singular values in A. Of course, if the same power is
allocated to each eigen-subchannel, this blind two-way channel subspace tracking idea
can also be combined with the SVD based schemes, at the cost of significant capacity
loss.
The GMD scheme can be made backward compatible with the TDD systems using
VBLAST decoders. By using CSIT or blind subspace tracking techniques, the trans
mitter can calculate the linear precoder F. Hence it can always precode the transmitted
data x to be Px, even when sending the training data. Thus the receiver is fooled
to believe that the channel is the virtual one Ht = HP = QR. Although the linear
precoder P is made transparent to the VBLAST detector, the decoder still enjoys the
multiple identical subchannels due to the linear precoder F = P.
3.3.3 Subchannel Selection
The previous discussion is based on the assumption that all the subchannels cor
responding to positive singular values are used for signal transmission. However, in
practical scenarios, some of the positive singular values of the channel matrix H can be
very small. This situation occurs for spatially correlated flat fading channels, or even
i.i.d. Rayleigh flat fading channels with Mr ta Mt 1. From (3.12b), we see that
it will influence the overall channel quality and hence subchannel selection is helpful.
The other situation where subchannel selection is needed is the case when the input
power is low or moderate. In this section, we propose a simple algorithm to select the

25
subchannels, which is numerically verified to be able to achieve near optimal capacity
even at low SNR.
Let us sort the singular values of H as A// i > Ah,2 > ^h,k > 0. If GMD is
constrained to the first n < K eigen subchannels, we obtain n identical subchannels
Di = \nXi + Zi, for i = 1,... ,n.
where
A =
\=i
(3.29)
(3.30)
To maximize the channel throughput with our GMD scheme, we need to solve the
following problem
max n log 1 +
l n5<
or
max
l 1 +
\ *=i
(3.31)
(3.32)
The solution to this problem is straightforward. We can use either linear search or
bisection method to find the optimal n.
Several remarks are in order, i) It is straightforward to incorporate the channel
selection into the GMD algorithm. In Section 6.2.2, we show that GMD starts from
SVD H = UAV* and then applies a series of Givens transformation to A to make it
upper triangular. The Givens transformation can be constrained to the first n < K
diagonal elements of A. ii) The blind channel subspace tracking can be combined with
the subchannel selection strategy seamlessly. If only the subchannels corresponding to
the largest n < K singular values are selected, the blind channel tracking technique will
track the n dimensional subspace automatically, iii) The performance loss of the GMD
scheme at low SNR region is due to the well-known fact that the zero-forcing equalizer
is inherently suboptimal. In the next chapter, we propose the so-called uniform channel
decomposition (UCD) scheme, which can decompose a MIMO channel into multiple
identical subchannels in a strictly capacity lossless manner.
3.3.4 Further Remarks
The author later noticed [30] in which an idea similar to GMD was proposed to
approach the performance of the ML detector in the ISI suppression scenario. For a
SISO ISI channel, if symbols are precoded and transmitted in a block manner, then the

26
data model (2.39) can be used to represent the received block data (cf. (2.3) and (2.4)).
Note that for this case, H is a Toeplitz matrix due to the time invariant property of
the IS I channel. A linear precoder design F was proposed in [30] such that the virtual
channel Ht = HF can be decomposed via QR decomposition to be Hu( = QR where
R has equal diagonal elements. We see that this equal diagonal idea is equivalent to
GMD. However, our GMD scheme, independently motivated by the MIMO transceiver
design problem, has several major advantages over the algorithm in [30]:
1. Our GMD scheme represents a paradigm shift from the conventional linear trans
ceiver designs to nonlinear designs and can be proven, both numerically and theo
retically, to have superior performance from both BER and information theoretic
aspects.
2. Our GMD algorithm is computationally much more efficient than that of [30],
Both algorithms start from the SVD of H which is followed by K 1 iterations.
The GMD involves 2K 2 fast Givens rotations. For a channel H with Mt =
Mr = K, the SVD requires 0(K3) flops while the GMD requires additional 0(K2)
flops. Thus the computational complexity of the GMD scheme is comparable
to the conventional linear transceiver schemes. However, the algorithm in [30]
involves multiplications and inversions of matrices in each iteration and the overall
computational burden turns out to be additional 0(/F4) flops.
3. For the GMD algorithm, only the information of HH*, and hence A and U, are
needed to calculate Q. However, for the algorithm in [30], the equalizer needs
to know both the precoder F and H, and hence Hvt = HF, in order to apply
the traditional QR to H,,(. Hence it cannot be combined with the aforementioned
blind two-way channel subspace tracking algorithm introduced in Section 3.3.2.
Like the algorithm in [30], the GMD scheme can also be combined with orthogonal
frequency division multiplexing (OFDM) for ISI suppression. For a SISO ISI channel
with memory L,
L-1
y{n) hix(n l) + z(n), (3.33)
1=0
after applying OFDM with block length N, we get a MIMO channel
y = Dx + z (3.34)
where D is a diagonal matrix with the diagonal elements equal to the V-point FFT of
h = [ho, hi,..., /Â¡l-]7. Hence the GMD scheme can be applied directly. We expect

27
that GMD-ZFDP may have better BER performance than GMD-VBLAST if N 1, in
which case the GMD-VBLAST may suffer from considerable performance degradation
due to error propagations.
3.4 Performance Examples
We present next several numerical examples to demonstrate the effectiveness of the
GMD scheme. In all the examples, we assume Rayleigh independent flat fading channels.
In the first example, we consider a Rayleigh flat fading channel with Mt 4 and
Mr = 4. We compute the Shannon capacities of the channel with both CSIR and CSIT
(Cjt, (2.10)), the channel with uninformed transmitter (Cut, (2.11)), the channel using
the GMD scheme (Cgmd, (3.16)), the channel using the MTM scheme (Cmtm, (2-49)),
and the channel using the MMD scheme (Cmmd, (2.55)). We average the capacities of
1000 Monte-Carlo-generated H realizations. The result is presented in Figure 3-1. We
note that the capacity loss of the MMD scheme is about twice that of the MTM scheme
at high SNR as predicted in Section 2.3. The relative capacity loss of the MMD scheme
compared with MTM is smaller at low SNR because some subchannels are not used at
low SNR. The GMD scheme outperforms the linear transceiver designs when the SNR
is moderate or high and is asymptotically capacity lossless at high SNR.
Figure 3-2 shows the complementary cumulative distribution functions (CCDF) of
the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR
equal to 23 dB. The five thin dashed curves denote the channel capacities of the five sub
channels obtained via SVD plus water filling. Note that the leftmost thin curve crosses
the vertical axis at a value less than one, which means that the worst subchannel (cor
responding to the smallest singular value of the channel matrix) is sometimes discarded
by water filling. The thick line is the CCDF of each subchannel capacity obtained via
GMD. Figure 3-2 further illustrates the disadvantages of the conventional SVD plus bit
allocation scheme (see, e.g., [19] [20] [23]). The channel capacities of the 5 subchannels
obtained via SVD plus water filling range from 0 to about 10 bps/Hz, which suggests
that the BPSK or QPSK modulation should be used to match the capacity of the worst
subchannel and something like 512 or 1024 QAM to the best subchannel. This bit
allocation significantly increases the modulation/demodulation complexity. Moreover,
using a constellation with size greater than 256 is impractical for the current RF circuit
design technology. For the GMD scheme, on the other hand, the same constellation with
a moderate size, say 64-QAM, can be applied to reap most of the channel capacity.

28
To demonstrate the effectiveness of the subchannel selection approach, we consider
a 10 x 10 independent Rayleigh flat fading channel. The channel is usually ill-conditioned
since some singular values of H are very close to zero. Without the subchannel selection
strategy, GMD suffers from performance degradation, especially at low SNR, as seen
in Figure 3-3. On the other hand, with the subchannel selection scheme, there is only
about 0.2 bit/sec/Hz rate loss compared with the CÂ¡T, even at very low SNR.
We compare the BER performance of the GMD-VBLAST scheme with the unpre-
coded MMSE-VBLAST scheme with the optimal detection ordering, the MTM scheme
and the MMD scheme. No error correcting code is used in the simulations. In Fig
ure 3-4(a), H C4x2 has identically independent Rayleigh fading elements. Hence the
channel matrix is usually well-conditioned. Two independent symbol streams modulated
as 16-QAM are transmitted. The figure is obtained by averaging 1000 Monte Carlo tri
als of H. We see that the GMD scheme has more than one dB improvement over the
MMD scheme at moderate to high SNR. In Figure 3-4(b), H condition number, in which case the MMD scheme is subject to more capacity loss as
analyzed in Section 2.3. Four independent symbol streams are transmitted. The BER
performance of the GMD scheme is much better than the others. We did not include
MTM because it discards some bad subchannels and hence cannot be used to transmit
four independent data streams.
In the final example, we combine the GMD scheme with 64-point FFT based
OFDM for ISI suppression in a SISO channel. We assume that the channel response
hi, l = 0,1,..., L 1, are independent zero-mean circularly symmetric Gaussian random
variables with unit variance. The channel length is L = 4. The GMD-ZFDP is about 2
dB better than GMD-VBLAST. This is because GMD-VBLAST suffers from consider
able error propagation effect. This result suggests that GMD-ZFDP may be preferred
over GMD-VBLAST if the channel has a large dimensionality.
3.5 Conclusions
In this chapter, we introduce a novel joint transceiver design, which combining
the geometric mean decomposition (GMD) with the VBLAST equalizer or dirty paper
precoder. The GMD scheme can decompose a MIMO channel into multiple identical
scalar subchannels. This desirable property can bring about much convenience to the
practical system design, particularly the symbol constellation selection. Moreover, we
have shown that the GMD scheme is optimal asymptotically for high SNR in terms of

29
= 4, M( = 4 Â¡id Rayleigh channel
Figure 3-1: Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and
Mr 4 for i.i.d. Rayleigh flat fading channels.
both information rate and BER performance while the computational complexity of our
scheme is comparable with the conventional linear transceiver scheme. Furthermore, we
have shown that the GMD scheme can be applied without the need of using training
symbols for channel estimation if combined with subspace tracking techniques. We
have also considered the issue of subchannel selection when some of the subchannels
are too poor to be useful. The GMD scheme can also be combined with OFDM for ISI
suppression. Both the theoretical analyses and empirical simulations have been provided
to validate the effectiveness of our approaches.

30
= 5, M( = 5 iid Rayleigh channel SNR = 23 dB
Figure 3-2: Complementary cumulative distribution functions of the capacities of 5
subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results
based on 2000 Monte Carlo trials.

31
M *10, Mb 10, SNR = 0dB
r t
M = 10, M 10, SNR = 10dB
r I
Capacity (bit/sec/Hz)
Capacity (bit/sec/Hz)
(a)
(b)
M = 10, M = 10, SNR 20 dB
r t
M = 10, M = 10, SNR = 30 dB
r t
(c)
(d)
Figure 3-3: Complementary cumulative distribution function of the capacity of an i.i.d.
Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 1000 Monte
Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.
M{= 2, Mf= 4 iid Rayleigh channel, 16-QAM
M{= 4, Mf= 4 iid Rayleigh channel, 16-QAM
(b)
Figure 3-4: BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh
flat fading channel vs. SNR with (a) Mt = 2 and Mr = 4 and (b) Mt = 4 and Mr = 4.

32
GMD+OFDM, N = 64, L = 4, 64-QAM
Figure 3-5: BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined
with OFDM for ISI suppression.

CHAPTER 4
UNIFORM CHANNEL DECOMPOSITION
We have seen in Chapter 3 that the GMD scheme can have much better perfor
mance than the conventional linear transceivers. However, the GMD scheme may suffer
from considerable capacity loss at low SNR due to the inherent zero-forcing oper
ations which is capacity lossy, especially at low SNR. In this chapter, we propose a
uniform channel decomposition (UCD) scheme, which is also based on the GMD matrix
decomposition algorithm, to decompose a MIMO channel into multiple identical sub
channels. The UCD scheme has two implementation forms. One is the combination
of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST)
detector, which is referred to as UCD-VBLAST, and the other includes a dirty paper
(DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as
UCD-DP. Just like the GMD scheme, UCD can bring much convenience to the subse
quent modulation/demodulation and coding/decoding procedures by obviating the need
of bit allocation. Two remarkable merits of UCD, which are not shared by the GMD
scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD has
the maximal diversity gain. Moreover, the UCD scheme can decompose a MIMO chan
nel into an arbitrarily large number of independent subchannels, which is an enabling
technology to achieve high data rate transmission using small symbol constellations.
To facilitate the discussion, we recall the channel model given in (2.1) as follows.
y = HFx + z, (4.1)
where x G and y G CMrXl is the received signal and H G CMrXMt is the channel matrix with rank
K. We assume E[xx*] = and z ~ N(0,a^I\Â¡T) is the circularly symmetric complex
Gaussian noise. We define the input SNR as
P = -~-J2 FX^ = ^Tr{F*F} 4 Tr{FF}. (4.2)
33

34
4.1 Closed-Form Representation of MMSE-VBLAST
The UCD scheme is based on the closed-form representation of the VBLAST scheme
using MMSE nulling vectors. For MMSE-VBLAST, the nulling vector for the ith layer
is
(4.3)
w, = hjh* + alj hj, i = 1,..., Mt.
The MMSE-VBLAST algorithm can be represented in a concise matrix form which was
given in [9] (also see the more detailed version [47]).
Consider the augmented matrix
Ha =
H
x/oIm,
(4.4)
(Air+Mt)xMt
Applying the QR decomposition to Ha yields
Ha = Q//aR//a =
Q Ha
Qtfa
R
Ha
(4.5)
where R//a 6 CMfXM is an upper triangular matrix with positive diagonal elements and
QuHa CWrXA/t. Note that H = Q# R//a is not the QR decomposition of H since
is not unitary. However, we can readily obtain the nulling vectors using and R
as shown in the following lemma [47]:
Lemma 4.1.1 Let {q//a,}fi'i denote the columns of and the diagonal
elements of R//, where QuHa and R#o are given in (f.5). The nulling vectors of (4-3)
satisfy
w
i = rHl,iicHa, i = l,2,...,Mt.
(4.6)
Then the output signal-to-interfere-and-noise ratio (SINR) of the zth layer (i.e., the
signal corresponding to hÂ¡) using w, is
|h>i|2^
Pi =
w* (E=i alhJh* + E1) w4
(4.7)
Inserting (4.3) into (4.7), we can simplify (4.7) via some straightforward calculations to
be (see, e.g., [48])
P* = h.*cr1hi> i = l,...,Mf
where C = E}=i hjh* + al.
(4.8)

35
The SINRs given in (4.8) are related to R//a as shown in the following lemma:
Lemma 4.1.2 The diagonal o/R//a given in (4-5) and {p,}^ given in (4-8) satisfy
a(l + pi) = r2Haii, i = 1,2,..., Aft- (4-9)
Proof: See Appendix A.
An immediate corollary follows.
Corollary 4.1.3 The MMSE-VBLAST detector is information lossless. That is,
M,
^ log(l + Pi) = log |H*Ha-1 + I|, (4.10)
t=i
where the right hand side of (4-10) is equal to (2.7) with F = 1m,
Proof: From (4.4) and (4.5), we have
H*Ha-1+I = a-1H0*H. = a-1R/io*Riro. (4.11)
Hence
Mt
log |H*Ha_1 +1| = Â£ log (cT1^) (4.12)
=i
According to Lemma 4.1.2,
log |H*Ha-1 +1| = ^ log(l + p).
=i
We note that Corollary 4.1.3 coincides with the findings in [48].
4.2 UCD-VBLAST
If we modify the precoder F given in (2.8) to be
F = V\$1/2iT (4.13)
where 1 Â£ (to avoid capacity loss, we should not choose L < K
in general) and fl*Pl = I, then we see through inserting (4.13) into (2.7) that the
F given in (4.13) is also a precoder maximizing the channel throughput. However,
introducing S7 brings much greater flexibility than the precoder of (2.8). In the following,
we concentrate on how to design 2.

36
Given the precoder of (4.13), the virtual channel is
G = HF = UA1/2iT = UET
(4.14)
Ga =
(4.15)
where E = A4>1/2 is a diagonal matrix with diagonal elements Let Ga denote
the augmented matrix
r UEft*
Vah
The UCD scheme is based on the following lemma.
Lemma 4.2.1 For any matrix of the form given in (4-15), we can find a semi-unitary
matrix ft G CL*K such that the QR decomposition of Ga yields an upper triangular
matrix with equal diagonal elements.
Proof: Rewrite (4.15) as
Ga =
U[E:
Voii-L
(4.16)
where 20 Cixi is a unitary matrix whose first K columns form !. We further rewrite
(4.16) as
G =
Ij\ir 0
0 S7q
We can have the following GMD:
U[E: 0/cX(x,_/c)]
Vah
5-
(4.17)
Ji
U[E : 0ft-x(i/_ft-)]
- QjRjP*,
(4.18)
where Rj G M.LxL is an upper triangular matrix with equal diagonal elements and
Qj G C(Mr+L)xL Semi-unitary and Pj G CLxL is unitary. Inserting (4.18) into (4.17)
yields
IjWr 0
0 fig
Ga =
QjRjPjLq.
(4.19)
Let 70 = P} and
Qg =
l A/,.
o n
0
Q ./
(4.20)
Then (4.19) can be rewritten to be Ga = QGaRj which is the QR decomposition of Ga.
The semi-unitary matrix FI associated with Gu consists of the first K columns of fi0 (or
P})-

37
Prom Lemma 4.2.1 and Lemma 4.1.2, we conclude that we can always combine a
linear precoder and the MMSE-VBLAST detector to uniformly decompose a MIMO
channel into L > K subchannels with the same output SINRs. According to Corollary
4.1.3, we can further conclude that the channel decomposition is strictly capacity lossless.
We refer to the scheme demonstrated in Lemma 4.2.1 as UCD-VBLAST.
The proof of Lemma 4.2.1 is insightful. Indeed, given the SVD of H and the
water filling level 41/2, we only need to calculate the GMD given in (4.18). Then we
immediately obtain the linear precoder F = V<)I,/22*, where il consists of the first K
columns of P). Let QÂ£a denote the first Mr rows of QGa, or equivalently the first Mr
rows of Qj (cf. (4.20)). According to Lemma 4.1.1, the nulling vectors are calculated as
w* = rj}i (4.21)
where rjtii is the zth diagonal element of R; and qGo>i is the zth column of QÂ£ .
Some observations can help reduce the computational complexity. For any matrix
B G CMxN with SVD B = UgAsYg and the augmented matrix with SVD
A =
B
y/al
= U^V^,
the diagonal elements of A and A#, i.e., and Asatisfy
^a, = yA2Bi + a, i
Moreover
U,
v^V^A^1
Hence the SVD of J defined in (4.18) is
and Vj = Vo.
J =
U[S: 0/fX(Â£,_K-)]S 1
SI/
where E is an L x L diagonal matrix with the diagonal elements
Gi \J at2 + a, 1 and
(4.22)
(4.23)
(4.24)
(4.25)
(4.26a)
r = y/a, K + 1 < i < L.
(4.26b)

38
Applying the GMD matrix decomposition algorithm given in Section 6.2 to E yields
S (Q1Q2 Ql-ORjP-xPL, Pj). (4.27)
Hence
U[E: 0/cx(Z-iC)]
y/ah
U[S : Oft-x(_/c)]Â£ 1
(Q1Q2...Ql-1)Rj(PI_1PI_2...P[).
(4.28)
Then the linear precoder has the form:
F = V
\$1/2:0 kx(L-K)
PiP2 -. .P-i.
(4.29)
The nulling vectors are calculated according to (4.21) with rJtl = ^nf=i &i\ and
Qg = U[Â£ : 0^X(i,_/c)]SQiQ2... Qr,_i. (4.30)
Note that Q; and PÂ¡, l = 1,2,..., L, are Givens rotation matrices and hence calcu
lating (4.29) and (4.30) needs 0(Mt(K + L)) and 0(Mr(K + L)) flops, respectively.
We summarize the UCD-VBLAST scheme as follows 1
Table 4-1: The UCD-VBLAST scheme
step
operation
flops
1
Compute SVD H = UAV*
0(MtMrK)
2
Calculate h1/2 using (2.9)
0(K2)
3
S = AfcVa
0(K)
4
Obtain E using (4.26)
0(K)
5
Apply GMD to E to obtain (4.27)
0(P)
6
Generate F using (4.29)
0(Mt(K + L))
7
Compute QÂ£ using (4.30)
0(Mr(K + L))
8
Calculate {w}|lj using (4.21)
0(MrL)
Obviously, our UCD-VBLAST scheme has comparable computational complexity
to the SVD based linear transceiver designs. An observation relevant to practical imple
mentations is as follows. Note that the receiver does not have to calculate Step 6 since
CSIT is available and the transmitter can run Steps 1 to 6. However, if the receiver cal
culates F. which only takes a small number of flops, and feeds it back to the transmitter,
1 Steps 5-7 can be processed simultaneously as in the GMD algorithm.

39
then the transmitter is relieved from calculating the SVDs. Hence in FDD systems, it is
preferable to feed back F, rather than H, to the transmitter. In TDD systems, there are
still advantages for feeding back F since this reduces by approximately half the overall
computational complexity.
We conclude the discussions of the UCD-VBLAST scheme by deriving the SINR of
each subchannel. Note that the diagonal elements of Rj is
rjii=n^l = 1,2,..., Z,, (4.31)
which is the geometric mean of the diagonal elements of E. It follows from (4.26) that
K
\1/L ( k \ l/L
rlu = ( aL~K Rtf + )) = a ( (a-1 (4.32)
Â¡=i
q=i
According to Lemma 4.1.2,
K
l/L
Pi = p ~ II(a 1 a=i
(4.33)
Hence
L K K
loS2(! + Pi) = k&i1 + a~1 i=l i=l :=1
which is exactly the CÂ¡j in (2.10). Hence UCD-VBLAST is strictly capacity lossless.
4.3 UCD-DP
As a dual form of UCD-VBLAST, the UCD scheme can be implemented by using
DP precoding, which we refer to as UCD-DP. For UCD-DP, a direct construction of
the linear precoder F as done in Section 4.2 is not obvious. Instead, we exploit the
We convert the UCD-DP problem into the UCD-VBLAST problem in the reverse
channel where the roles of the transmitter and receiver are exchanged
y = H*x + z. (4.35)
The UCD-VBLAST scheme can be applied to the channel of (4.35), which yields
the precoder Fre and the equalizer {w,}f=1 as in (4.29) and (4.21), respectively. Nor
malize {w}t=1 to be of unit Euclidean norm, which we denote as {w,}f=1. Let W =
[wj,..., w]. According to the uplink-downlink duality, the precoder of UCD-DP should
be F = WD, where D, is diagonal with the diagonal elements which will be

40
determined based on (4.40) below. We use Fre, the linear precoder in the reverse
channel, as the linear equalizer. Then the equivalent MIMO channel is
Y F*etlHWD9x + F*evz, (4.36)
where the zth scalar subchannel of the MIMO channel is
L t-1
Vi = f*Hwiy/qlxi + f/Hwjy/q]xj + ^ f)*Hwjy/qjxj + f*z. (4.37)
j=i+1 j= 1
Applying the dirty paper precoder to x, and treating ^ f* Hwj ^/qjXj as the interfer
ence known at the transmitter (note that here we precode the first layer first while for
UCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel
L
Vi = f)*Hwjy/qjXj + f*z (4.38)
j=i+1
with SINR
Pi =
for = 1,2.
L.
(4.39)
The next step is to calculate {qi}i=i such that pi = p, 1 < i < L, where p is as defined
in (4.33). Let aj = |f*Hwy|2. Then (4.39) can be represented in the matrix form
aii pan
-pan
<7i
' llfill2 '
0 22
-pa2L
<72
INI2
0
0 aLL
<7l
pa
.I|fi||a.
(4.40)
It is easy to see that 0, 0 < i < L. It is proven in [49] that Yl!i=\ <7 tr(FF*) =
tr(FreF*ei,). That is, the UCD-DP needs exactly the same power as the UCD-VBLAST
to obtain L identical subchannels with SINR p.
The UCD-DP using the Tomlinson-Harashima precoder leads to an input power
increase of for M-QAM symbols. Nevertheless, for a system with high dimension
ality and/or using large constellation, UCD-DP is a better choice than UCD-VBLAST
since it is free of propagation errors.
4.4 Performance Analysis
4.4.1 Diversity Gain Analysis
An important performance metric is diversity gain, which is defined as follows [16].

41
Definition 4.4.1 Let Pe(p) denote the average error probability of a scheme at SNR p.
The diversity gain of the scheme is
d =
lim !2fS.
p-> OO log p
(4.41)
The diversity gain measures how fast the error probability decays with SNR. We note
that diversity gain is usually discussed without assuming the availability of CSIT. The
reason is that diversity gain is a concept associated with channel outage, i.e., the case
where the channel is too poor to support a target data rate. Using CSIT, one can adjust
the transmission rate to avoid channel outage. However, if the rate is fixed, which is
desirable in practice, we can also use diversity gain as a performance measure of the
transceiver designs. Based on this observation, we analyze the diversity gains of the
UCD and GMD schemes. The result is summarized in the following proposition.
Proposition 4.4.2 Consider the i.i.d. Rayleigh flat fading MIMO channel defined in
(4-1). Let M = max(Mt,Mr) and m min(Mt,Mr). The diversity gains of the GMD
and the UCD schemes are
dGMD{LI,m) (M m + 1 )m, and dUCD(M,m) = Mm, (4.42)
respectively.
We have applied the typical error event analysis (see [16] [50]) to obtain (4.42). The
details are relegated to Appendix B. We see that although UCD has a negligible coding
gain compared with the GMD scheme at high SNR, it has an additional m2 m diversity
gains over GMD. An interesting point to make is that water filling does not help improve
diversity gains. Hence at high SNR, water filling is useless in both capacity and diversity
aspects.
Given the fact that the GMD scheme is asymptotically capacity lossless for high
SNR, it is rather surprising to see the large diversity loss of GMD compared with UCD.
We give an intuitive explanation as follows. Note that diversity gain is determined by
the typical error events that the MIMO channel is in deep fade. Namely, the diversity
gain of a scheme depends on its ability of dealing with bad channels. A deeply faded
channel with high input SNR is equivalent to a normal channel with low SNR, in
which scenario the GMD scheme is far less efficient than UCD as shown in the numerical
examples. Consequently, the GMD has less diversity gain than UCD.

42
4.4.2 Further Remarks
Besides the larger coding gain at low SNR and an improved diversity gain at high
SNR, the UCD scheme enjoys more flexibility than the GMD scheme. For a rank
K MIMO channel, the GMD scheme can support no more than K independent data
streams. However, the UCD scheme can decompose a rank K MIMO channel into
L > K identical subchannels, and L is not even limited by the dimensionality of the
channel matrix. This property of the UCD scheme enables one to achieve high data rate
transmission using small constellations as demonstrated in the numerical examples.
The UCD scheme also suggests new ways of channel decomposition which are much
more flexible than the conventional SVD based ones. Indeed, one may chose the permu
tation matrices and Givens rotations to achieve a wide variety of channel decompositions
with some prescribed SINRs as suggested by the generalized triangular decomposition
(GTD) (See Chapter 6, [51] [52]). This idea is developed in Chapter 5.
Finally, we link UCD with DBLAST [2], which has been shown to be able to achieve
the optimal tradeoff between the channel diversity and multiplexing [16]. We observe
that each diagonal layer of DBLAST can be viewed as the interleaving of the vertical
layers of VBLAST in the space-time domain and each diagonal layer can be regarded
as a virtual subchannel with the same capacity. However, DBLAST requires short and
powerful error correcting coding to make the virtual subchannel work as a real one.
This is a major difficulty for the implementation of DBLAST. In addition, DBLAST
suffers from boundary wastage. In contrast, our UCD scheme, by exploiting CSIT,
applies interleaving (via the Givens rotations and permutations) in the space domain
only. This makes the UCD scheme free from the boundary wastage. Moreover, the
UCD scheme is decoupled from coding procedures. Indeed, UCD can be concatenated
with any error correcting code. Furthermore, UCD makes it easier to design the coding
scheme since UCD decomposes a MIMO channel into multiple subchannels with identical
capacities. Thus in a slowly time varying channel, UCD is much easier to implement
than DBLAST despite their duality. This manifests clearly the values of CSIT.
4.5 Numerical Examples
We present next several numerical examples to demonstrate the effectiveness of the
UCD scheme.
In the first example, we assume Rayleigh independent flat fading channels with
Mt = 10 and Mr = 10. We compare the channel capacity using the UCD and GMD

43
schemes. The complementary cumulative distribution functions (CCDF) of the capacity
drawn out of 2000 Monte-Carlo realizations of H are shown in Figure 4-1. We see that
the UCD scheme outperforms the GMD scheme significantly at low SNR although the
difference becomes smaller at higher SNR.
Figure 4-2 shows the CCDFs of the channel capacities of a 5 x 5 independent
Rayleigh flat fading channel with SNR equal to 25 dB. The five thin dashed curves
denote the channel capacities of the five subchannels obtained via SVD plus water
filling. Note that the leftmost thin dashed curve crosses the vertical axis at a value
less than one, which means that the worst subchannel (corresponding to the smallest
singular value of the channel matrix) is sometimes discarded by water filling. The thick
solid line is the CCDF of the capacity of the L = 5 subchannels obtained via UCD. All
these subchannels have the same capacity. As discussed in Section 4.2, a rank K MIMO
channel can be decomposed into L > K subchannels. The thin solid line represents
the case where a MIMO channel is decomposed into 7 identical subchannels using the
UCD scheme. Figure 4-2 demonstrates the advantages of our UCD scheme over the
conventional SVD plus bit allocation scheme (see, e.g., [19]). The channel capacities
of the 5 subchannels obtained via SVD plus water filling range from 0 to about 11
bps/Hz, which suggests that the BPSK or QPSK modulation should be used to match
the capacity of the worst subchannel and something like 1024 or 2048 QAM to the best
subchannel. This bit allocation significantly increases the modulation/demodulation
complexity. Using GMD or UCD, we can decompose a rank 5 MIMO channel into 5
subchannels and hence the same constellation with a reasonable size, say 128-QAM, can
be used to reap most of the channel capacity. The UCD scheme can do even better.
In this example, after decomposing a MIMO channel into 7 subchannels via UCD, we
can apply a small to moderate constellation, say 16-QAM or 64-QAM, to achieve the
channel capacity.
In the third example, we assume Rayleigh independent flat fading channels with
Mt 4 and Mr = 4. We compare the BER performance of the GMD and UCD
schemes along with the conventional MMSE-VBLAST with optimal detection ordering
in Figure 4-3. We see that both GMD and UCD outperform the conventional VBLAST
detector significantly. Moreover, the BER vs. SNR lines of the GMD and UCD schemes
have much steeper decreasing slopes, which means much better diversity gains, than the
conventional VBLAST. The diversity gains of the GMD and UCD schemes are 4 and 16,

44
respectively. While there is a noticeably larger diversity gain for UCD compared with
GMD as shown in Figure 4-3, the difference is not as drastic as the theoretical prediction.
It is because the input SNR is not high enough to validate the approximations made in
the typical error event analyses (see Appendix B).
In the final example, we compare the BER performance of UCD-VBLAST and
UCD-DP in the scenario of a 10 x 10 Rayleigh flat fading channel. To present a bench
mark, we also include UCD-genie as the imaginary scenario where at each layer, a genie
would eliminate the influence of erroneous detections from the previous layers when using
UCD-VBLAST. Figure 4-4 shows that UCD-VBLAST may suffer from some small BER
degradations caused by error propagation (about 0.5 dB for BER = 10-4) compared with
UCD-genie. The UCD-DP, on the contrary, is free of error propagation and hence has
BER performance very close to that of UCD-genie. The slight SNR loss of UCD-DP
is mainly due to the inherent power-amplification effect of the Tomlinson-Harashima
precoder.
4.6 Conclusions
Based on the GMD matrix decomposition algorithm and the closed-form represen
tation of the MMSE-VBLAST detector, we have introduced the UCD scheme for MIMO
communications that can decompose a MIMO channel into multiple subchannels with
identical capacities in a capacity lossless manner. We have proposed two versions of the
UCD scheme, i.e., UCD-VBLAST and UCD-DP. The UCD scheme can provide much
convenience for the subsequent modulation/demodulation and coding/decoding proce
dures due to obviating the need of bit allocation. We have also shown that UCD can
achieve the maximal diversity gain. The simulations show that the UCD scheme has
excellent performance even without the use of error correcting codes. The UCD scheme
suggests a new way of channel decomposition which enjoys much more flexibility than
the conventional SVD based ones.
Appendix A
Proof of Lemma 4.1.2
Rewrite (4.5)
Q Ha
H Q = QHuRHa 4
(4.43)

45
H
hi
Ha,, =
vM
7 ha,i
0(-l)xl
0(M,-t)xl
Let H, (H) denote the submatrix containing the first i columns of Ha (H) and ha i
(hj) the th column. Then
(4.44)
For the QR decomposition Ha = Q//aR//a, the geometric implication of rua ii is
the component of haa projected onto the subspace spanned by the ith column of Q//o,
i.e., qHa,i- Note that q//ai is orthogonal to the subspace spanned by {qH0,j}j=i or,
equivalently, the column space of Hence
rHa, = K^h^K, (4-45)
where stands for the orthogonal projection onto th null space of AT. Therefore
2 i_ *
^Ha,ii ^a,i
I-HQ,i_! (HV.jH^x) 1 H;,_!
Inserting (4.44) into (4.46) yields
ba.i-
(4.46)
rlaM = a + h* I H_x (H^jHi-i + a:!)-1 H*_j
h,
= a + ah? (Hi-iHJ.! + al) 1 h,
(4.47)
From (4.8), we see that
Pi = h* (Hj_1H*_1 + al)-1 hj. (4.48)
Hence r2Ha ii = a(l + pt). The lemma is proven.
Appendix B
Proof of Proposition 4.4.2
Without loss of generality, we assume H e CMxm, each of whose entry is of cir
cularly symmetric Gaussian distribution with zero-mean and unit variance. Consider
BPSK modulation. The average error probability of the GMD scheme is
Q ^\/2Pgmd^ E
qUwb)
= E
Q
t
\
/ m \ X/m\
2p (n )
V
\=i / y
(4.49)

46
where the Q-function is defined as
The diversity gain of the GMD scheme is
log PGMD
d0MD(M,m) = lim 5. (4.50)
P^oo log p
For any QAM constellation, the average error probability is similar to (4.49) except for
some constants before or inside the Q-function. Since we focus on the high SNR region,
all these constants will not affect the diversity gain defined in (4.50).
At high SNR, the typical error event is
Â£ {^h < P }
(4.51)
It can be shown that instead of calculating (4.50), which involves complicated integra
tions, we can compute the following [50, Ch. 3]:
dGMv{M,m) = lim
p*oo
log P(Â£)
log p
(4.52)
Note that
m
= |H*HI (4.53)
i=l
According to [53, Theorem 7.5.3] (with straightforward extensions from real-valued do
main to the complex-valued domain),
^/T = |H*H| = jQ (4-54)
t=l
where g2Ps are independent Chi-squared random variables with probability density
= J~iyx'~le~X' x - (4.55)
Now the typical error event can be written as
m
{5A/-m+i}t=l j[ 9M-m+i ^ P
i-1
U {{?M-m+}Â£l -ah-m+i
47
where Â£a {{aj x : Ya=i a > m}' Hence
P{Â£) =
2
Mm+i
< p ai)dai... dam
(4.57)
Prom (4.55), we know that as e * 0,
P(gl Using (4.52) (4.58) and (4.57), we calculate the diversity gain as
<^gmd(^Uim)
i r TTm P m+*)Q ,
_ Jej 1 li=l (M-m-fi)!
poo log P
log L.+ p~ (Ai-m+*)- don dam.
- lim 2 -
p>oo log P
m
m + z)a,
Â£q i=i
(4.59)
(4.60)
where Â£+ = Â£Q H^t > 0, i = 1,..., m). To obtain (4.60) from (4.59), we have used the
property that the integral in the numerator of (4.59) is dominated by the term with the
SNR exponent closest to zero, as p oo (see [16] for details). Here the integration is
constrained over Â£+ because the integration over Â£a is dominated by the one over Â£+.
The reason is as follows. Suppose only ani,... 1anj > 0, j < m, and the other as,
Qfcl,.. .,akm_., are negative. Then
m j
n^(5M-m+i < P ) ~ < P
i=l =1
Let Â£+ denote {{a,H=1 > 0 : Â£<=i ani > m -
j m
*"i) p-EUl(.M-m+ni)Qni
afcJ. Clearly,
inf (M m + rii)an > inf (M m + i)at,
+
which implies that the integration over Â£a is dominated by that over Â£+. Solving the
optimization problem of (4.60) yields
dGtjm(M, m) = (M -m + 1 )m.
(4.61)

48
Now we consider UCD. We observe that the power allocation applied to each eigen
subchannel is no greater than p. Hence the overall channel throughput of UCD is
m m
Yl bS (l + ^Ak) ^ ^CD < Yl l0g 1 + (4-62)
=1 i=l
where the left term denotes the channel throughput associated with uniform power
allocation. Applying UCD, we obtain m subchannels with the same SNR:
\
771 rn
IK1* ~~ 1 PvcD ~ \
=i \
n (i+pA^) -1.
(4.63)
i=l
The typical error event is
It follows from (4.63) that
Â£ = : PVCD < 1}
(4.64)
m \ / m m \
\J n 0+Â£xh) -1 <11
(4.65)
It is easy to see that
Hence
lim = lim hgPM
p>oc
logp
Mm asm ,im
p->oo log P
logRi(p)
(4.66)
(4.67)
poo log P p->oo log p '
which implies that water filling does not help improve diversity gain.
It follows from the analyses of [16] that the UCD scheme achieves the optimal
diversity-multiplexing tradeoff. In particular, when the transmission data rate is fixed,
disregard the increase of input SNR, the diversity gain is ducd(M,m) = Mm.
w

49
M, = 10, M > 10, SNR = 0 dB M, = 10, M = 10, SNR = 10 dB
t r I r
(a)
(b)
M -10, M =10, SNR = 20 dB
t r
Mt = 10, Mf = 10, SNR = 30 dB
Figure 4-1: Complementary cumulative distribution function of the capacity of an i.i.d.
Rayleigh flat fading channel with Mt = 10 and Mr 10. Results based on 2000 Monte
Carlo trials. SNR = (a) 10 dB, (b) 10 dB (c) 20 dB, and (d) 30 dB.

50
Mr = 5, M( = 5 iid Rayleigh channel SNR = 25 dB
Figure 4-2: Complementary cumulative distribution functions of the capacities of 5
subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results
based on 2000 Monte Carlo trials.
Mf= 4, Mr= 4 iid Rayleigh channel, 16-QAM
Figure 4-3: Uncoded BER performance when using 16-QAM. Results based on 1000
Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 4 and MT = 4.

BER
51
Mt= 10, Mf= 10 Â¡id Rayleigh channel, 64-QAM
Figure 4-4: BER performances of the UCD-DP, UCD-VBLAST schemes and the imagi
nary UCD-genie scheme. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh
flat fading channel with Mt = 10 and Mr = 10.

CHAPTER 5
TUNABLE CHANNEL DECOMPOSITION
5.1 Introduction
All these aforementioned MIMO transceiver designs focus on improving the commu
nication quality subject to power constraints. In this chapter, we tackle a new aspect of
the MIMO transceiver design problem. We regard a MIMO transceiver design as a way
of decomposing a MIMO channel into multiple subchannels. As we have mentioned, the
MIMO channel decomposition through SVD plus water filling lacks flexibility despite
its optimality in terms of achieving the maximal overall channel capacity. The success
of UCD motivates a much more flexible channel decomposition approach, namely the
tunable channel decomposition (TCD) scheme, which is the main result of this chapter.
Using the recently developed generalized triangular decomposition (GTD), we propose
the TCD scheme to decompose a MIMO channel into multiple subchannels with pre
scribed capacities or, equivalently, signal-to-interference-and-noise ratios (SINR). The
main properties of the TCD scheme are summarized as follows:
1. Given K parallel subchannels with capacities C\, C2, .., Ck, which is obtained
through applying SVD plus water filling to a rank K MIMO channel, TCD can
convert the K subchannels into L > K subchannels with capacities R\, R2,..., R
if and only if (Ci,..., CK, 0,...,0)eRJ majorizes (Ri,R2, ..., Ri) 1 In partic
ular, Ci = X^!=i Rii be., the TCD is capacity lossless.
2. The TCD scheme has two implementation forms. One is the combination of a
linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST)
detector, which is referred to as TCD-VBLAST, and the other includes a DP
precoder and a linear equalizer followed by a DP decoder, which we refer to as
TCD-DP.
3. Given the SVD of the MIMO channel matrix, the computational complexity of
TCD, which is to calculate the precoder and equalizer matrices, is O(KL), which
is computationally quite efficient.
1 The concept of majorization is introduced in Section 5.2.
52

53
Almost originated at the same time as the research on MIMO transceiver designs,
the optimal design of symbol synchronous CDMA (S-CDMA) sequences has been un
der intensive study over the past decade (see, e.g., [31] [54] [55] [56]). Although the two
research topics have been studied in an apparently independent manner in the signal
processing and information theory communities, the CDMA sequence design problem
can be viewed as a special case of the MIMO transceiver design as we have shown
in Section 2.1.1. Hence the TCD scheme can be applied, with little modifications, to
the design of optimal CDMA sequences. Moreover, the TCD-VBLAST and TCD-DP
schemes can be applied to design optimal CDMA sequences in the uplink (mobile-to-
base) and downlink (base-to-mobile) scenarios, respectively. Our TCD scheme, which
is independently motivated by the MIMO transceiver design problem, turns out to be
related to the scheme proposed in [56]. The relationship is discussed in Section 5.3.
5.2 Channel Model and Preliminaries
5.2.1 Channel Model
To facilitate the discussion, we rewrite the channel model used in the previous
chapters.
y = HFx + z, (5.1)
where x G CLxl is the information symbols precoded by the linear precoder F G CM,xL
and y G CMrXl is the received signal and H G CMrXMt is the channel matrix with rank
K. We assume Â£7[xx*] = cr2I and z ~ A(0,<72lA/r) is the circularly symmetric complex
Gaussian noise. We define the SNR as
Â£[x*F*Fx] a:
= -Â§Tr{F*F} ^ -Tr{F*F},
a
(5.2)
5.2.2 Channel Decomposition
Denote the SVD of a rank K channel H as H = UAV*, where A is a K x K
diagonal matrix whose diagonal elements are the nonzero singular values of
H. To maximize the channel capacity with respect to F given the input power constraint
Tr{FF*} < pcr^/cr2, one needs to solve
CÂ¡T = max log, |I + a_1HFF*H*
Tr{FF*} The optimal linear precoder is (cf. (2.8))
(5.3)
F =- V\$1/2.
(5.4)

54
Here L = K and 4> is diagonal whose kth {1 < k < K) diagonal element 4>k determines
the power loaded to the kth subchannel and is found via water filling to be
/ \ +
Mp) =
a
AH,k
(5.5)
with p being chosen such that a\ J2k=i 4>k(p) P&z an<^ (a)+ = max{0, a}. In this case,
we obtain K subchannels with capacities
Ck = log2 1 +
= log2
P^H,h
a
bps/Hz, k = 1,2,... ,K. (5.6)
Due to the usually large dynamic range of singular values {Af/fc}Â¡[Li> the SVD decom
poses a MIMO channel into multiple parallel eign-subchannels with different channel
capacities. Moreover, since the optimal power loading levels are fixed as given in (5.5),
the achievable MIMO channel decomposition is rigidly given in (5.6) and it lacks flexi
bility.
Another way of decomposing a MIMO channel is to use the VBLAST detector [5],
The VBLAST scheme involves sequential nulling and cancellation and it decomposes
the MIMO channel into K subchannels (or layers as coined in [5]). By changing the
ordering of the signal detection, we can get K\ subchannel combinations, each of which
is capacity lossless [48].
Theoretically, more combinations of subchannels is possible via time sharing (see
[57, Ch. 14.3]). Recall that every DBLAST layer sends its data substream across the K
transmitting antennas, or VBLAST layers, in a time sharing manner [2]. For example,
for a system with Mt 2, the transmitted data are
Vertical Layer-I : X\ y2 x2 y4 ...
Vertical Layer-II : 0 x? y3 x4 ...
Let X( and yi,i = 1,2,..., denote the symbols transmitted through the DBLAST layers
I and II, respectively, at time i. The receiver first estimates xi and then estimates x2
by regarding y2 as interference. The estimates of xY, x2 are decoded jointly, which form
the output of the diagonal layer I. After subtracting out the effect of X\,x2 from the
received data, we can estimate and decode y2)y2, which form the diagonal layer II. We
remark that DBLAST can be viewed as a combination of VBLAST and the time sharing
technique, which decomposes the MIMO channel into multiple identical subchannels.

55
However, time sharing can be difficult to implement in practice. For instance, the
major difficulty of DBLAST is the requirement of encoding the diagonal layer with short
and efficient error correction codes, which limits its practical implementation despite its
superb theoretical performance analyzed in [16].
If CSIT is available, more flexible and practical channel decompositions can be
achieved. In Chapter 4, we have proposed the UCD scheme which combines the geomet
ric mean decomposition (GMD) developed in Section 6.2 with either an MMSE-VBLAST
detector or a DP precoder to decompose the MIMO channel of (5.1) into L > K iden
tical subchannels. Hence, the UCD scheme can achieve the theoretical performance of
the DBLAST scheme without resorting to any error correcting coding.
In this chapter, we generalize the results of Chapter 4 and develop a systematic
channel decomposition that combines the recently proposed GTD algorithm with either
an MMSE-VBLAST detector or a DP precoder. We show that given K parallel subchan
nels with capacities Ci, C2,..., Ck, which are obtained via SVD, TCD can convert the
K subchannels into L > K subchannels2 with capacities Rx, R2,..., R if and only if
(Ri, /?2, , Rl) is majorized by {C\,..., Ck, 0,..., 0) 6 ML. This scheme is particularly
relevant to the applications where independent data streams with different qualities-of-
service (QoS) share the same MIMO channel [28], For example, video services usually
require higher SNRs than audio services. Decomposing a MIMO channel into multi
ple subchannels with prescribed capacities and transmitting independent data streams
through these subchannels can provide much convenience for resource allocations.
5.2.3 Majorization and Generalized Triangular Decomposition
We introduce several basic concepts and theorems of the majorization theory from
[58]-
Definition 1 For x, y G Rn, if
i i
l^j 1=1 1=1
with equality holds for j = n, where the subscript [i] denotes the ith largest element of the
sequence, we say that x is majorized by y and denote x -<+ y or, equivalently, y x.
2 If L < K, some eign-subchannels are discarded, which causes capacity loss. Hence
we focus on the case of L > K.

56
Definition 2 An n x n matrix P is doubly stochastic if its (i,j)th entry p^ > 0 for
i,j = 1,. , n, and Pij = 1 and Pij = 1.
Theorem 5.2.1 x -<+ y if and only if there exists a doubly stochastic matrix P such
that x = Py.
A square matrix II is said to be a permutation matrix if each row and column has a
single one, and all the other entries are zero. There are n! permutation matrices of size
n x n.
Theorem 5.2.2 The permutation matrices constitute the extreme points of the set of
doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex
hull of the permutation matrices.
It follows from Theorems 5.2.1 and 5.2.2 that the set {x|x -<+ y} is the convex
hull spanned by the n! points which are the permutations of y.
As we have mentioned before, given K parallel subchannels with capacities C\, C2,..., CV,
which are obtained via SVD, TCD can convert the K subchannels into L > K subchan
nels with capacities R\,R2,... ,Rl if and only if (RX,R2,..., Rl) -<+ (Cj,..., Ck, 0,..., 0) G
ML. For example, for a MIMO channel H with rank K 3, assume that the capacities
of the 3 subchannels obtained via SVD are C\ > C2 > C3. If L = K, then TCD can
decompose the MIMO channel into 3 subchannels with a rate vector r = (Rx, R2, R3) if
and only if r lies in the convex hull
Co
(A
C2
V Ca )
Here Co stands for the convex hull defined as
(5.9)
CofS} {01! + + 6Â¡ 0,9\ -F ... -)- 6Â¡< = 1}. (5.10)
In general, the capacity region is a convex hull defined by K\ vertices in a K-
dimensional space. Since the TCD is capacity lossless, i.e., Q the
capacity region falls into a (K l)-dimensional hyperplane. The gray area in Figure
5-1 shows the convex hull of (5.9) with Ci = 3, C2 = 2, and C3 = 1. In this case, the 6
vertices lie in the 2-D plane {x : Y^Â¡=i xi =6}. An interesting special case is the UCD
scheme [59], which achieves the rate vector corresponding to the center of the convex
hull, i.e., r = (2, 2,2).

57
Capacity lossles region (C, = 3, C2 2, C 1)
Figure 5-1: Illustration of the capacity lossless region obtainable via TCD. We assume
K = 3, Cx = 3, C2 = 2, and C3 = 1.
Definition 3 For x, y M", if
i i
n^ =1 i=i
with equality for j = n, we say that x is multiplicatively majorized by y and write
x - Obviously, if x y, then logx logy.
Now we are ready to introduce the GTD theorem.
Theorem 5.2.3 (GTD theorem) Let H Â£ cm*n have rank K with singular values
A R+. There exists an upper triangular matrix R Â£ CK*K and matrices Q and P
with orthonormal columns such that H = QRP* if and only if the diagonal elements of
R satisfy |r| A.
Proof: We relegate the proof to Chapter 6.
There is a computationally efficient and numerically stable algorithm to achieve the
GTD predicted by Theorem 5.2.3, which is presented in Chapter 6.
5.3 Tunable Channel Decomposition
5.3.1 TCD-VBLAST
We see from (5.2) that F can always be scaled such that a = 1. Hence without loss
of generality, we let a = 1 in the sequel to simplify the notation.

58
Denote the SVD of a rank K channel H as H = UAV*, where A is a K x K
diagonal matrix whose diagonal elements are the nonzero singular values of
H. The conventional SVD based linear transceiver designs have precoder F = V\$V2
where is a diagonal matrix whose diagonal elements stand for the power allocation.
The precoder F transforms the MIMO channel into K orthogonal subchannels with
capacities
Ck = log2(l + Alk) bps/Hz, k = 1,2,..., K. (5.12)
For this kind of precoder design, the only way of controlling the capacity of the sub
channels is to change the power allocation .
If we modify the precoder F to be 3
F = V\$l'2nr (5.13)
where 6 RixK with L > K, and 2T2 I, then it can been readily seen that
introducing 2 does not change the overall channel capacity. However, it brings much
greater flexibility as demonstrated in the following theorem.
Theorem 5.3.1 (TCD Theorem) Consider a MIMO channel of (4-1) with F given
in (5.13). For any L > K, let c be a zero vector with its first K elements
replaced with {Ck}k=v where Ck = log (1 -f A2H k4>k)- Given any rates we can
find an orthonormal matrix 2 6 RixK such that the combination of the linear precoder
F = Vfc^nT and the MMSE-VBLAST detector yields L subchannels with capacities
{ify}Â£=i if and only if {Rk)k=i -<+ c.
Proof: Given the precoder of (5.13), the virtual channel is
G = HF = UA<>1/2f2T = UAG2T (5.14)
where Ac = A\$1/2 is a diagonal matrix with diagonal elements
Xg, = XH(Â¡)]/2, i = l,...,K. (5.15)
3 Letting 2 to be complex-valued does not introduce additional flexibility as is clear
according to the GTD algorithm.

59
Let the augmented matrix Ga be defined as
Ga =
UAGf2T
h
(5.16)
J (Mr+L)xL
After some straightforward calculations, we can obtain the SVD of Ga as the following:
, -i
Ga =
U[Ag : 0Kx(L_K')]AGa
f0A'
A-Ga 0 >
(5.17)
where fi0 RLxL is orthogonal with its first K columns forming fi and the diagonal
matrix AGa contains the singular values of Ga:
\A + A1 < < K,
1, i > K.
(5.18)
According to Theorem 5.2.3, we can apply GTD to obtain
AGa = QRGaPT
(5.19)
if and only if the diagonal elements of RGa E RÂ£'xi', which we denote as {'g0,*}1,
satisfy
{AGo,af=1. (5.20)
Note that both Q and P in (5.19) are real-valued matrices because AGa is a real-valued
diagonal matrix. Inserting (5.19) into (5.17) yields
G =
U[Ag : 0Kx(L-K)]AGa
fi0AGJ
TnT
QRc, P fi
(5.21)
Choose fi0 = PT and define
QGa =
U[Ag : QKx(L-K)]A.Gla
fioA,
-i
Q
(5.22)
Then (5.21) can be rewritten asGa = QGaRGa, which is the QR decomposition of Ga.
By Lemma 4.1.2, it follows that for a = 1, (5.20) is equivalent to
{Id- P*}=i = {rGa,,,},=! dx {AGoii}f=1,
(5.23)

60
where p, 1 < i < L, denotes the output SINR of the th subchannel, and AGa i is given
in (5.18). If
mL = {iog(i+A)}
{igAG,Kii = c>
(5.24)
then (5.20) and (5.23) hold, which implies the existence of fi (the first K columns of
pt).
Conversely, suppose that there exists a semi-unitary matrix 2 such that the linear
precoder F = V capacities {Rk}k=v Let Ga QGaRGa be the QR decomposition. It follows from
Theorem 5.2.3 that (5.20) holds. Hence, by (5.23), we conclude that (5.24) holds.
The proof of Theorem 5.3.1 is constructive. Indeed, given the SVD of H and the
power loading level \$1/,;2, we only need to calculate Ac, AGq, and the GTD of AGo given
in (5.19). Then we immediately obtain the linear precoder
F = V\$1/2Ot = V
&1/2:0Kx{L-K)
P.
(5.25)
Let Qga denote the first Mr rows of QGa. Then it follows from (5.22) that
Qca = U[Ag:0kx(l-K)]AgQ
= U
r: oKx(L-K)
Q,
(5.26)
where T is diagonal with its ith diagonal element being 7
to Lemma 4.1.1, the nulling vectors are calculated as
7t7r According
w = rc!,uqG.,i, 1 < i < L, (5.27)
where rGai is the zth diagonal element of RGq and qGo is the ith column of QGa.
In the GTD algorithm, P and Q are obtained via multiplication of L 1 Givens ro
tation matrices. Hence calculating (5.25) and (5.26) needs 0(Mt(L+K)) and 0(Mr(L+
K)) flops, respectively. We note that the decoding starts with the Lth layer, then the
L lth, and so on.
Given the SVD of H and the power allocation level the TCD-VBLAST scheme
needs to run the procedures summarized in Table 5-1. If Mt Mr K, then the TCD-
VBLAST scheme requires only 0(L2 + K2 + KL) flops, given the SVD of the channel
matrix.

61
Table 5-1: The TCD-VBLAST Scheme
step
operation
flops
1
Calculate Ac = A\$1/i2
0(K)
2
Obtain Ago using (5.18)
O(K)
3
Apply GTD to A.Ga to obtain (5.19)
SWT
4
Generate F using (5.25)
0(Mt(L + K))
5
Compute using (5.26)
0(Mr(L + K))
6
Calculate using (5.27)
0{MrL)
5.3.2 TCD-DP
Similar to UCD, the TCD also have two implementation forms, which are dual to
each other. As a dual form of TCD-VBLAST, the TCD scheme can be implemented by
using a DP precoder, which we refer to as TCD-DP. For TCD-DP, a direct construction
of the linear precoder F as done in Section 5.3.1 is not obvious. Instead, we exploit the
uplink-downlink duality revealed in [49] to obtain TCD-DP. This technique is also used
in [59].
We first apply the TCD-VBLAST scheme to the reverse channel
y = H*Fx + z, (5.28)
where the roles of the transmitter and receiver are exchanged and the H in (5.1) is
replaced by H*. Then we obtain the precoder F and the equalizer W = [wi,..., w]
from H* according to (5.25) and (5.27), respectively. Applying F and the VBLAST
detector with nulling vectors {w}f=1, we obtain L subchannels
1
w*y = ^ + w*z, i = 1,..., L,
j=i
(5.29)
where the zth subchannel (5.29) is free of interference from the yth (j > i) subchannels
which are detected and cancelled out in advance. The SINR of the subchannel (5.29) is
Pi
lw*H*f|2a^
(5.30)
Note that replacing w by w,, which is obtained by scaling w, such that ||w|| = 1, does
not change p, since the output SINR is invariant to the length of w,. Also note that
a = 1, i.e., a\ a2. Hence (5.29) can be simplified to be
|w*H*f|2
Pi =
(5.31)

62
Let f, i = 1,..., L, be the scaled version of f\ and has unit length. Denote pt ||f,||2.
Then
, i 1,.... L.
Let ij |f*Hw.,|2. Then (5.32) can be represented in the matrix form
(5.32)
1
o
o
1
Pi
Pi
~P2a 12 22
P2
P2
: . o
~PLalL -PLa2L aLL
PL
PL
(5.33)
According to the uplink-downlink duality, in the original channel, the precoder of TCD-
DP should be F = ..., v/Â¡tZw], where { and the receiving vectors are f, i 1,... ,L. Then we get L subchannels whose ith
scalar subchannel of the MIMO channel is
L i-1
2/t + Y i'HwjsfqjXj + Y + f.*z (5.34)
j=i+l j=1
Applying the dirty paper precoder to x, and treating E}=i f Hwj y/qjXj as the interfer
ence known at the transmitter (note that here we precode the first layer first while for
TCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel
L
yi = ViYlvjiy/qixi+ Y + f*z (5-35)
=1+1
with SINR (again, recall that a 1 and o\ cr2)
<7|f;Hw|2
pi =
1 + Ej=i+1 ^IfrHwjl
for i 1,2,... ,L.
(5.36)
Similar to (5.32), (5.36) can also be represented as
11 Pl12 PllL
Pi
0 a22 P22L
92
=
P2
0 0 an
9L
PL
It is easy to see that > 0, 0 < i < L. It is proven in [49] that E+=i <7
tr(FF*) = Yh=iP- That is, to obtain L subchannels with SINRs {p}f=1,
(5.37)
= tr(FF) =
the TCD-DP

63
needs exactly the same power as the TCD-VBLAST. To make this chapter self-contained,
we give below an alternative proof to this interesting and useful fact.
Let Ua denote a strictly upper triangular matrix whose (, _?)th entry is aXJ for
1 < i < j < L and zero otherwise. Let VA and Vp be two Lx L diagonal matrices with
their zth element equal to att and p,, respectively. Then (5.32) can be rewritten as
{VA VJA\) p = p (5.38)
or equivalently
{V^Va-UJ) p = 1 (5.39)
where p = [pi,... ,p]T, p [pi, >Pl]T and 1 is a vector with unit elements. Hence
P = (V;1Va-UJ)~11 (5.40)
Similarly, (5.37) can be rewritten as
{VA VPUA) q p (5.41)
or
{V-1Va-Ua) q = l. (5.42)
Hence
q =(V;1Va-Ua)'11. (5.43)
Prom (5.40) and (5.43),
L L
j^Pi = iT {vyvA-ujy11 = iT (v-yvA-uAyl 1 = (5.44)
=i =1
We can use the Tomlinson-Harashima precoder [42] [43] or the trellis precoder [44]
to achieve known interference cancellation at the transmitter. For a system with high
dimensionality, TCD-DP is a better choice than TCD-VBLAST since it is free of prop
agation errors.
5.4 MIMO Communications with QoS Constraints
In this section, we apply the TCD scheme to MIMO communications with QoS
constraints. Suppose we want to transmit L > K independent data streams through a
MIMO channel. Instead of multiplexing all the substreams in the time division manner
to share the entire MIMO channel, we apply TCD to decompose the MIMO channel
into multiple subchannels whose capacities/SINRs meet the QoS requirements of the

64
substreams, and dedicate one subchannel to each substream. In [28], the authors studied
the same problem. They proposed a linear transceiver design which, similar to TCD, can
also control the SINR of each subchannel via designing the precoder. However, the linear
transceiver is capacity lossy and can suffer from considerable performance degradation
compared with our TCD scheme as we will show at the end of this section. Given that
all the subchannels meet the QoS constraints, we want to minimize the overall input
power. We need to solve the following optimization problem:
minp
subject to
tr (FF*)
diag(R) = {V1 + Pi)i=\-
(5.45)
Here QR denotes the QR decomposition and diag(R) denotes the vector formed by the
diagonal of R. According to Lemma 4.1.2, the diagonal of R determines the SINRs of
the subchannels. Without loss of generality, we assume that pi > p2 > ... > pl- We
now consider a problem whose constraints are more relaxed than those of (5.45):
minp tr (FF*)
subject to XGa {v/1 + Pi)i=i, Ga =
(5.46)
where \Ga stands for the singular values of the augmented matrix Ga. In general, for
any matrix A, we let denote the singular values of A. By Theorem 5.2.3, if F is
feasible in (5.45), then F is feasible in (5.46). We now further simplify (5.46) and show
that its solution provides a solution of (5.45).
Theorem 5.4.1 If H = UAV* is the singular value decomposition of H, then (5.46)
has a solution of the form F = V41/2 where \$ Â£ RKxK is a diagonal matrix with
diagonal elements 4>i, 1 < i < K, chosen to solve the problem
min* Y!=i &
subject to nti(l + x2H,ii) > H=i(l + P). k > 4>k+\ > 0, l
iiiLiU+^H,^) nti(i+Pi)'
(5.47)

65
Moreover, if QRGaPT is the GTD of Aca in (5.19), then (5.45) has the solution F =
V^1/2^7 where \$ is a solution of (5.47) and PI is the matrix formed by the first K
columns of PT.
Proof: See Appendix A.
We now develop an efficient algorithm for solving (5.47). We will see that the
constraint cpk > (5.47). To begin, we make a change of variables to further simplify the formulation of
(5.47). We define
A = i + 1 /A#,*, 1 < i < K,
A = 1 AH,i
fa = rittiri1+pi)-
With these definitions, (5.47) reduces to
miiv J2i 'Pi

subject to nil 'Pi > nil A, A > 1 1 (5.49)
Both the equality constraint and the inequalities > 4>k+\ in (5.47) have been dropped
since these constraints are automatically satisfied at an optimum. The fact that (Â¡>k >
(Â¡>k+\ is established after Lemma 5.4.2. With regard to the equality constraint, if ip is
feasible in (5.49) and the inequality corresponding to k = K is strictly positive, then the
cost is reduced when the trailing components of f> are lowered. That is, if ip is feasible
in (5.49) and the inequality corresponding to k = K is strictly positive, then the cost is
reduced when the trailing components of ip are lowered.
Clearly, the feasible set for (5.49) is nonempty and the cost function tends to infinity
as any of the components of ip tends to infinity. By continuity of the cost function and
the constraints, a minimizer must exist. We now analyze the structure of the minimizer.
By exploiting the structure, we obtain a fast algorithm for solving (5.49).
We first study a similar optimization problem with relaxed constraints.
Lemma 5.4.1 Any solution ip of the problem
K k k
min 'Pi subject to riA>n A, 1 < k < K, (5.50)
i=l =1 t=l
has the property that ipi+1 < ipi for each i.

66
Proof: We replace the inequalities in (5.50) by the equivalent constraints obtained
by taking logs:
k k
Y los(^) > Y 1 =1 =1
The Lagrangian C associated with (5.50), after this modification of the constraints, is
K / k
p) = Y ( Vk Y (los(^*)_ los(A))
k= 1 \ =1
By the first-order optimality conditions associated with ip, there exists i > 0 with the
property that the gradient of the Lagrangian with respect to ip vanishes. Equating to
zero the partial derivative of the Lagrangian with respect to ipj, we obtain the relations
K
i=j
Hence, ipj ipj+1 = p,j >0.
Using Lemma 5.4.1, we can gain insights into the structure of a solution to (5.49).
Lemma 5.4.2 There exists a solution ip to (5.49) with the property that for some integer
Vt+i < ipi for all i < j, Vi+i > fpi for oil i > j, ipi = -rj for all i > j. (5.51)
*H,i
In particular, ipj < ipi for all i.
Proof: If ip is a solution of (5.49) with the property that ipi > -J for all 1 < j <
K, then by the convexity of the constraints, it follows that ip is a solution of (5.50). By
Lemma 5.4.1, we conclude that Lemma 5.4.2 holds with j K. Now, suppose that ip is
a solution of (5.49) with ipi = 1 /X2H i for some i. We wish to show that ipk = 1 /X2H k for
all k > i. Suppose, to the contrary, that there exists an index k > i with the property
that ipk = 1/A2Hk and ipk+i > 1/A^ fc+1. We show that components k and k + 1 of ip
can be modified so as to satisfy the constraints and make the cost strictly smaller. In
particular, let ip(e) be identical with ip except for components k and k -f 1:
ipk(e) = (1 + e)ipk
and
ipk+i(e) =
*Pk+1
1 + e
(5.52)
For e > 0 small, ip(e) satisfies the constraints of (5.49). The change A(e) in the cost
function of (5.49) is
A(e) (1 + e)ipk +
A+i
-ipk- ipk+1.
1 + e

The derivative of A(e) evaluated at zero is
A'(0) = %pk -Tpk+i-
Since 1 /A^ fc is an increasing function of k and since pk l/A#*., we conclude that
tpk+i > V'k and A'(0) < 0. Hence, for e > 0 near zero, -0(e) has a smaller cost than
-0(0), which yields a contradiction. Hence, there exists an index j with the property
that V'. = 1 /\2H i for all i > j and pi > 1/A^ for all i < j.
According to Lemma 5.4.1, pi > ipi+i for any i < j. To complete the proof, we
need to show that pj < pj+i- As noted previously, any solution of (5.49) satisfies
K K
i=1 =1
which implies (cf. (5.48))
i j / K \ j
n >n-
1=1 i=l \i=j+l J =1
That is, the constraint nu a > uu Pi in (5.49) is inactive. If ipj > ipj+u we will
decrease the j-th component and increase the j -f 1 component, while leaving the other
components unchanged. Letting 3/3(6) be the modified vector, we set
Vj+i (S) = (1 + S)3pj+1 and ^(S) =
Since the j-th constraint in (5.49) is inactive, 3/3(6) is feasible for <5 near zero. And if
3/3j > 3/3j+i, then the cost decreases as J increases. It follows that xpj < 3/3j+1.
By Lemma 5.4.2, is a decreasing function of i for i [1, j] while fa = 1/A2Hi
for i > j. Since An,i is a decreasing function of i, it follows that fa = fa 1/A2Hi is a
decreasing function of i for i [1,/] with pi > 0, while pi = 0 for i > j. Hence, pi is
a decreasing function of i 6 [1,/f], In particular, the constraint pk > pk+l in (5.47) is
automatically satisfied by the associated solution characterized in Lemma 5.4.2.
We refer to the index j in Lemma 5.4.2 as the break point. At the break point,
the lower bound constraint pi > 1/A^ changes from inactive to active. We now use
Lemma 5.4.2 to obtain an algorithm for (5.49).
Lemma 5.4.3 Let 7^ denote the k-th geometric mean of the Pi:

68
function ip = TCDPow (/3,A)
L = 1 ; R = length (/3) ; ip = zeros (1, R) ;
C = cumsum (log (/3)) ;
while R > L
[t, 1] = max (Â£(L:R)./[1:R-L+l]) ;
71 = exp (t) ; LI = L + 1 1 ;
if 71 > 1/A (LI) 2
ip(.L:Ll) = 71 ;
L = L + 1 ;
C(L:R) = C(L:R) CCL-1) ;
else
V>(L1:R) = 1./(A(L1:R).*2) ;
C(LI1) = COO sum (log (-0(LI:R))) ;
R = LI 1 ;
end
end
Figure 5-2: A Matlab function to solve (5.49).
and let l denote an index for which 7* is the largest:
l = arg max{7fc : 1 < k < K}. (5.53)
If 7; > 1/Af, then putting ipi = 7Â¡ for all i < l is optimal in (5.49). 7/7/ < 1/Af, then
ipi = 1/A# for all i > l at an optimal solution of (5.49).
Proof: See Appendix B.
Based on Lemma 5.4.3, we can use the following strategy to solve (5.49). We form
the geometric mean described in Lemma 5.4.3 and we evaluate l. If 7Â¡ > 1/A^(, then
we set (pi = 7; for i < l, and we simplify (5.49) by removing ipi, 1 < i < l, from the
problem. If 7Â¡ < 1/A2Hl, then we set tpi = l/X2H i for i > l, and we simplify (5.49) by
removing xpi, l < i < K, from the problem. The Matlab code TCDPow implementing this
algorithm appears in Figure 5-2.
After obtaining the power loading level (pi = xpi~ l/X2Hi, 1 precoder F and the nulling vectors {w}f=1 according to Table 5-1 in Section 5.3. Note
that one of the possible paths through the TCDPow routine makes the leading elements of
t/> all equal while setting the trailing elements of xpi = 1/A^,. This path coincides with
the standard water filling algorithm. In this case, the TCD scheme is optimal in terms
of maximizing the overall throughput given the input power. On the other hand, if some
substream has a very high prescribed SINR such that the l given in (5.53) is less than the
break point j, then ip leads to be a multi-level water filling power allocation, which

69
suffers from overall capacity loss. This happens when the target rate vector [i?i,... Rl\
falls out of the convex hull spanned by the L\ permutations of [C\,... C/f,0,... ,0]
(cf. Figure 5-1), where C*, k = l...., K, are the capacities of the eigen subchannels
with water filling power allocation. As a remedy to this issue, one can break (if it
is practically allowable) the oversized substream into more than one substreams with
smaller rates, or equivalently, lower SINR requirements. Note that TCD can decompose
a MIMO channel into an arbitrarily large number of subchannels.
An interesting special case is that pi = P2 = Pl, he., the substream shares
the same SINR requirements. In this case, fd\ < /?2 < < Pk since the singular values
{^//,},Ci are in nonincreasing order, and TCDPow yields a standard water filling solution.
In this case, TCD becomes UCD.
We present two numerical example to conclude this section. In the first example, we
assume Rayleigh independent flat fading channels with Mt 5 and Mr = 6. We consider
equal QoS requirements for L = 5 independent substreams. Figure 5-3 compares the
input power needed by our TCD scheme and the linear transceiver scheme of [28], Our
scheme can save about 2.5 dB for any prescribed output SINR.
= 6, M( = 5 iid Rayleigh Flat Fading
Figure 5-3: Input SNR vs. Output SINR. The result is based on the average of 500
Monte Carlo trials of a i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr 6.
In the second example, we consider a rank two MIMO channel with singular value
Ai,A2. Suppose we want to decompose the MIMO channel into 2 subchannels with

70
capacity C\ and C2 with C\ + C2 = 10 bps/Hz. We consider the three scenarios with
(Ai = 2, A2 = 1), (Ai = 5, A2 = 1), and (Ai = 10, A2 = 1). For all the three cases, there
is an inflection point beyond which our TCD is the same as the linear design of [28].
That is because when the two subchannels have very disparate QoS constraints, i.e.,
Ci is far larger than C2, the optimal strategy is to apply SVD to the channel matrix
and transmit data through the orthogonal eign-subchannels. (In this case, 0 = 1. (cf.
(5.13)).) If the subchannels QoS constraints are not too disparate, which corresponds
to the region to the left of the inflection point, the required input power of our TCD
scheme is invariant with respect to C\, C2 and is strictly less than that needed by the
linear design. This region corresponds to the capacity lossless region (cf. Figure 5-1).
Another interesting point is that the relative advantage of TCD is more prominent if
the singular values A1;A2 become more disparate.
Figure 5-4: Input SNR vs. C\. A rank 2 channel is decomposed into two subchannels
with capacities C\ and C2 = 10 C\.
5.5 CDMA Sequence Design
As we have shown in Section 2.1.1, the CDMA sequence design problem can be
viewed as a special case of the MIMO transceiver design. In an idealized S-CDMA
system where the channel does not experience any fading or near-far effect, L mobile
users modulate their information symbols via spreading sequences {sjf=1, each of which
has the processing gain N. The discrete-time baseband S-CDMA signal received at the

71
(single-antenna) base-station can be represented as [31]
y = Sx + z
(5.54)
where S = [si,...,s] 6 RNxL and the 1th (1 < l < L) entry of x, xÂ¡, stands for
the information symbol from the 1th user. In the downlink channel, the base station
multiplexes the information dedicated to the L mobile users through the spreading
sequences, which are the columns of S. Then, all the mobiles receive the same signal
given in (5.54). We remark that (5.54) can also be written as (4.1) with H = 1^ and
F = S. Here Mr = Mt = N is the processing gain. Hence, optimizing the spreading
sequences amounts to optimizing the precoder F for a MIMO system. Indeed, due
to the simple channel matrix (H = I), some procedures of the TCD scheme can be
simplified. We shall show that the TCD scheme turns out to be an improved solution
to the sequence design proposed in [56]. At the end of this section, we will compare our
TCD scheme and the scheme proposed in [56].
5.5.1 CDMA Sequences Maximizing Sum Capacity
Recall that the precoder maximizing the overall MIMO channel capacity is F =
V>1/,27T where 4 is obtained by water filling algorithm. For an S-CDMA channel,
H = I, then V = I and the optimal power loading level is the uniform power allocation.
Hence the CDMA sequence maximizing the sum capacity is S = Since 2 has
orthonormal columns, we obtain SST = pi. This observation coincides with the findings
in [31], in which the authors show that the CDMA sequences maximizing the sum
capacity are the Welch-Bound-Equality sequences.
For the uplink scenario, i.e., the mobiles to base station case, the base station cal
culates the optimal CDMA sequences for each mobile user and the associated successive
nulling vectors needed by itself. Then the base station informs the mobile users their
designated CDMA sequences.
First, we need to calculate the power loading levels \$ e RNxN such that the
following GTD matrix decomposition is possible:

A
[31 2: Owx(-jV)]
h
qrpt,
(5.55)
where the diagonal elements of R, riiti 1. 2..... L, satisfy the QoS constraints. Note
that the singular values of H form a sequence whose first N elements are

72
1,2,..., iV, followed by L N ones. From Theorem 5.2.3, (5.55) exists if and only if
({l + l>--->l) >x {1 + pjf=1. (5.56)
Similar to (5.47), we need to solve the problem
min* Eili fa
subject to ({1 + 0<}il1,1,. 1) {1 + pj.ii
fa > 0, Vz
Similar to (5.49), (5.57) can be further simplified using the variables
L
(5.57)
fa = fa + 1, A = 1 + pi for i < N, and PN ]j[(l + pj.
i=N
The simplified problem is
minV> E.ii fa
subject to nil fa > nil A, fa > 1, 1 < k < N.
(5.58)
The algorithm TCDPow simplifies immensely when we apply it to (5.58). Since \$ > 1 =
for all z, the constraints fa > l are inactive. Since fa < /3,_i for all i < N, the
geometric means satisfy 7, < 7_j for all i < N. Hence, in Lemma 5.4.3, the value of l
is either 1 or N. If l = 1, then we set fa = Pi and we remove fa from the problem. If
l N, then fa = 7^ for all z. It follows that there exists an index j with the property
that
/ N \
fa Pi for all i < j and fa I TT A I for all > j-
\i=j+1 /
This observation coincides with the solution obtained in [56],
Let T denote an L x L identity matrix with its first N diagonal elements replaced
by 1 < i < N. According to the TCD scheme presented in Section 5.3.1, we then
apply the GTD algorithm to tF1/2 to obtain
*1/2 Q*R*PÂ¡.
According to (5.25),
Let
S = F =
Nx(LN)
P*.
[vi, , VJ = [4>l/: 0vx(L-/v)]'I' l/ 2Qj<.
(5.59)
(5.60)
(5.61)

73
By (5.26) and (5.27), the nulling vectors used at the base station are
w* = r,v i = (5.62)
where is the ith diagonal element of R>Â¡,. In summary, the base station needs to
run the following three steps:
1. Solve the optimization problem (5.58).
2. Apply the GTD algorithm to '1/2 in (5.59).
3. Obtain the spreading sequences for all mobile users, [si,...,s] = S, and the
nulling vectors {w}t=1 (cf. (5.60) and (5.62)) for the base station.
In the downlink case, the mobiles cannot cooperate with each other for decision
feed-back. Hence the VBLAST detection is impractical at receivers. However, we can
apply TCD-DP as introduced in Section 5.3.2 to cancel out known interferences at the
transmitter, i.e., the base station. We can convert the downlink problem as an uplink
one and exploit the downlink-uplink duality as we have done in Section 5.3.2. Note that
H = H* = I, i.e., the downlink and uplink channels are the same! Consider the case
where the uplink and downlink communications are symmetric, i.e., for each mobile
user, the QoS of the communications from the user to the base station and the base
station to the user are the same. After obtaining the spreading sequences [sj,..., s] for
the mobile users, and the nulling vectors [wi,..., w] used at the base station for the
transmitted from the base station are exactly [wi,..., w] and the nulling vectors used
at the mobiles are the spreading sequences, [sj,..., s], used in the uplink case. The only
parameters we need to calculate are qi,...,qN (cf- (5.37)). Hence in this symmetric case,
the base station only needs to inform the mobiles their designated spreading sequences
once in the two-way communications. Each mobile uses the same sequence for both data
transmission in the uplink channel and interference nulling in the downlink channel.
5.5.4 Numerical Example
We present one numerical example to show how TCD can be applied to CDMA
sequence design. We consider an example where there are L 4 mobile users and
the processing gain N = 3. The prescribed SINRs of the four users are 20,19,18, and
17 dB, respectively. For the uplink case, we apply the TCD-VBLAST scheme to obtain

74
the spreading sequences of the four users as the columns of the matrix
( 10.0000 -12.0745 -6.4974 -3.0926 ^
S =
V
0 0 7.4138 -15.5760
0 8.8312 -13.3801 -6.3686
The nulling vectors used by the base station are the columns of the matrix
W =
/ 0.0990 -0.0015 -0.0037 -0.0104 ^
0 0 0.1157 -0.0522
0 0.1098 -0.0077 -0.0213
(5.63)
(5.64)
We note that for this uplink scenario, the base station detects the fourth mobile user,
which has the spreading sequence corresponds to the fourth column of S, first and the
first user last.
If the prescribed SINRs of the four users remain the same in the downlink scenario,
the spreading sequences used by the base station are the four columns of the matrix
F -
1 17.1936
0
l
-0.2303
0
17.0149
-0.5154
16.0012
-1.0614
-1.2796 ^
-6.4449
-2.6352 y
(5.65)
In this case, the base station applies the dirty paper precoder to the first mobile user
first and the last user last. Note that the columns of F and W in (5.64) are the same
up to a scaling factor. Moreover, tr(FFT) = tr(SST) = 892.7274. which means that the
power consumed in the base station equals to the overall power used by the four mobile
users. At the mobile end, the users use the nulling vectors
S =
1 10.0000
0
V 0
-12.0745 -6.4974
0 7.4138
8.8312 -13.3801
-3.0926 N
-15.5760
-6.3686 ,
(5.66)
which are exactly the spreading sequences used in the uplink scenario. Scaling the output
signals may be necessary at the mobile ends for the subsequent dirty paper-decoder. But
the signal scaling does not influence the output SINR.
If zeros are not allowed in the spreading sequences, we can left multiply S and W
a 3 x 3 orthogonal matrix to eliminate the zeros in S.

75
5.5.5 Further Remarks
The TCD scheme, which was originally motivated by MIMO transceiver designs,
turns out to be similar to the scheme of [56] in several aspects. Both schemes are based
on the nonlinear decision feedback operations. Hence both are optimal in terms of max
imizing the channel throughput and minimizing the overall input power. Both the GTD
algorithm, on which the TCD scheme is based, and the construction of the Hermitian
matrix with prescribed eigenvalues and Cholesky values as done in [56] rely on the Weyl-
Horn theorem. However, our TCD scheme enjoys several remarkable advantages over
the scheme of [56]. First, note that if we obtain the GTD H = QRP*, where R has the
prescribed diagonal elements, then it follows immediately that A = P*H*HP = RR* is
the desired Cholesky decomposition. However, the information associated with Q is lost
in the Cholesky decomposition. Hence the nulling vectors used at the receivers of [56]
cannot be calculated explicitly as our TCD does (cf. (5.27)). Furthermore, the correla
tion matrix A is only an intermediate result. To get the CDMA sequences, one has to
decompose A = RR* explicitly. The TCD scheme, however, can be used to obtain both
the precoder (CDMA sequences), which are the columns of P, and the equalizer from Q
simultaneously. Second, our TCD scheme is a solution to the more general MIMO trans
ceiver design problem. The Cholesky decomposition algorithm provided in Appendix C
of [56] is only applicable to the scenario where the singular values are only of two values.
Hence it is not applicable to the general design of MIMO transceivers. The more general
Cholesky factorization algorithm suggested in the proofs is computationally far less effi
cient. Third, the TCD scheme has two implementation forms, i.e., TCD-VBLAST and
TCD-DP, which makes it applicable to both the downlink and uplink scenarios. Finally,
the TCD scheme provides insights that identify the CDMA sequence design problem as
special cases of the MIMO transceiver design.
5.6 Conclusions
Based on the recently developed GTD matrix decomposition algorithm, we have
proposed the TCD scheme utilizing the CSIT and CSIR. TCD can be used to decompose
a MIMO channel into multiple subchannels with prescribed capacities. The TCD scheme
has two implementation forms. One is the combination of a linear precoder and a
minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred
to as TCD-VBLAST, and the other includes a dirty paper (DP) precoder and a linear
equalizer followed by a DP decoder, which we refer to as TCD-DP. Both forms of TCD

76
are computationally very efficient. We have also determined the subchannel capacity
region such that a capacity lossless decomposition is possible. The applications of the
TCD scheme for MIMO communications with QoS constraints have been investigated.
We have also identified the problems of designing precoders for OFDM communications
and designing CDMA sequences as special cases in the unifying framework of MIMO
transceiver designs. In particular, we have shown that the CDMA sequence design
problem in the uplink and downlink scenarios can be solved using TCD-VBLAST and
TCD-DP, respectively.
Appendix A
Proof of Theorem 5.4.1
Observe that for F Vi1/2, we have
K
tr(FF*) = ^& and HF = UAi1/2. (5.67)
1=1
Hence, \2HFi = X2Hi . Jl + 1 < i < K,
\G i = v
1,
i > K.
Since 1+pi > 1, the last LK inequalities in the multiplicative majorization condition in
(5.46) are implied by the single equality constraint in (5.47). Hence, the problem (5.46)
reduces to (5.47) where F = Vi1/2, which gives an upper bound for the minimum in
(5.46).
Let F = UFi' 2fiT denote the singular value decomposition for any given F G
CM,xL. Once again, tr (FF*) is given by the sum in (5.67). By [60, Theorem 3.3.14],
the singular values of the product HF of two matrices are multiplicatively majorized by
the product of the singular values of H and F:
k k
JJ 1 < k < K. (5.68)
i= 1 =1
Taking logs, we have
k k
^!g(A^,,) > ^log(A^Fi),
i=l
1=1
1 (5.69)

77
By [60, Lemma 3.3.8]) and (5.69),
k k
Y /(log(A^)) > Y /(log(A2HF,i)). 1 < k < K, (5.70)
t=l 1=1
whenever / is a real-valued, increasing convex function. The function f(t) = log(e -I-1)
is convex since its second derivative is positive. Making this choice for / in (5.70) and
exponentiating both sides, we obtain:
k k
+ 1) + 1)> 1 < k < K.
t=l =1
Since F is feasible in (5.46),
H(Ahf, + 1)
i=l
K
JJ(AffF,i + 1)
k
> n<*.+1)>
i=l
= n<*+^
=1
1 < k < K,
Combining this with (5.71), we get
=i
K
n(A//,i0i+1)
k
> fj(p + l)i
t=l
L
=i
l (5.71)
(5.72)
Since Xjj +1 is the square of the i-th singular value of the augmented matrix Ga cor
responding to the choice Vi1/2, we conclude that F = Vi1/2 satisfies all the inequality
constraints in (5.46). If the inequality (5.72) is strict, then x should be decreased
in order to satisfy the equality constraint in (5.47). Since decreasing x only lowers
tr(FF*), we deduce that the minimum in (5.46) is achieved by a matrix of the form
F = Vi1/2. If F = Vi1/2 is optimal in (5.46), then so is F = Vi1/2!}7 whenever
fl has K orthonormal columns (since the constraints are satisfied and the value of the
cost does not change). We now make the choice for fi given in Theorem 5.3.1. That
is, if QRcuPT is the GTD of in (5.19) where i is a solution of (5.47), then 17 is
the matrix formed by the first K columns of PT. For this choice of 2, the constraints
of (5.45) are satisfied. As noted earlier, the minimum in (5.45) can be no smaller than
the minimum in (5.46). Since this choice for F yields the same cost in both (5.45) and
(5.46), we conclude that F = V!?1/2!}7 is optimal in (5.45).

78
Appendix B
Proof of Lemma 5.4.3
First suppose that 7Â¡ > 1 /Af. By the arithmetic/geometric mean inequality, the
problem
1 1 1
min ^ fai subject to n**n A, ^>0, (5.73)
1=1 1=1 1=1
has the solution fai =7Â¡, 1 < i < l. Since A//, is a decreasing function of i and
71 > 1 /X2Hi, we conclude that fai 7; satisfies the constraints fai > 1/Af{i for 1 < i < l.
Since l attains the maximum in (5.53),
k
if >n&
i=l
for all k <1. Hence, by taking fa = 7/ for 1 < i < Z, the first l inequalities in (5.49) are
satisfied, with equality for k = l, and the first l lower bound constraints fa > 1/A^t are
satisfied.
Let fa denote any optimal solution of (5.49). If
1 1
ii*-n (s-74)
=i =1
then by the unique optimality of fa = 7*, 1 < i < l, in (5.73), and by the fact that the
inequality constraints in (5.49) are satisfied for k G [1,Z], we conclude that fa = 7Â¡ for
all i [1, /]. On the other hand, suppose that
1 1
n>*?>n>=t- (5-75)
Â¡=1 =i
We show below that this leads to a contradiction; consequently, (5.74) holds and fa = 7/
for i [1,/].
Define the quantity
i/Â¡
7*=(n ^
u=i
By (5.75) 7* > 7j. Again, by the arithmetic/geometric mean inequality, the solution of
the problem
( 1
min fa subject to n fa > 7I1 ^>0, (5.76)
1=1 1=1
is fa 7 for i [1,Z]. By (5.75), 7* > 7Â¡ and fa satisfies the inequality constraints in
(5.49) for k 6 [1,Z],

79
Let M be the first index with the property that
M M
II* = 11* (5-77)
=1 1=1
Such an index exists since ip* is optimal, which implies that
1=1 =1
First, suppose that M < j, where j is the break point given in Lemma 5.4.2. By
complementary slackness, /q = 0 and ip* ip*+1 = Â¡il for 1 < z < M. We conclude that
xpi = 7, for 1 < i < M. By (5.77) we have
M
7.M=n a.
=1
It follows that
/ M \ X/M
IJaJ =7* >7i>
which contradicts the fact that l achieves the maximum in (5.53).
In the case M > j, we have ipi 7 for 1 < z < j. Again, this follows by
complementary slackness. However, we need to stop when i = j since the lower bound
constraints become active for i > j. In Lemma 5.4.2, we show that ip* > ip* 7 for
i > j. Consequently, we have
M M
t=l i=l
Again, this contradicts the fact that l achieves the maximum in (5.53). This completes
the analysis of the case where 7; > 1/A2HÂ¡.
Now consider the case By the definition of 7we have
K
l/K
K
7; > ( n^ J or 7^>n^
(5.78)
0=1
i= 1
If j is the break point described in Lemma 5.4.2, then ip* > ip* for all f; it follows that
K
i=1
(5.79)

80
Since the product of the components of 0* is equal to the product of the components
of f3, from (5.78) and (5.79) we get
K K
7* na=n* w)K
=1 i=l
Hence, 7 > 0* > 1/A^ fr z J- particular, if Z < j, then 7Â¡ > 1/A#,,
or, l > j when 7Â¡ < 1/A#,. As a consequence, 0* = j}.

CHAPTER 6
NOVEL MATRIX DECOMPOSITIONS
6.1 Introduction
Given a complex matrix H, we consider the decomposition H = QRP*, where R
is upper triangular and Q and P have orthonormal columns. Special instances of this
decomposition are
(a) the singular value decomposition (SVD) [61, 62]
H = VSW',
where Â£ is a diagonal matrix containing the singular values on the diagonal,
(b) the Schur decomposition [63]
H = QUQ*,
where U is an upper triangular matrix with the eigenvalues of H on the diagonal,
(c) the QR decomposition where P = I.
In this chapter, we will introduce two novel matrix decompositions, i.e., the geomet
ric mean decomposition (GMD) and the generalized triangular decomposition (GTD).
As we introduced before, the GMD scheme and the UCD scheme are based on the GMD
matrix decomposition algorithm, and the TCD is based on the GTD algorithm. The
results of this chapter are motivated by the applications of designing MIMO transceiver.
Interesting, these results turn out to be also useful to the numerical analysis community.
6.2 Geometric Mean Decomposition
In this section, we present a new unitary decomposition which call the geometric
mean decomposition or GMD. Given a rank K matrix H G Cmxn, it is expressed in the
form QRP* where P and Q have orthonormal columns, and R G RKxK is a real upper
triangular matrix with diagonal elements all equal to the geometric mean of the positive
singular values:
81

82
Here the ctj are the singular values of H, and a is the geometric mean of the positive
singular values. Thus R is upper triangular and the nonzero diagonal elements are the
geometric mean of the positive singular values.
We were led to this decomposition when trying to optimize the performance of
multiple-input multiple-output (MIMO) systems. However, this decomposition has
arisen recently in several other applications. In [64, Prob. 26.3] Higham proposed
the following problem:
Develop an efficient algorithm for computing a unit upper triangular matrix
with prescribed singular values 1 < i < K, where the product of the ax
is 1.
A solution to this problem could be used to construct test matrices with user specified
singular values.
The solution of Kosowski and Smoktunowicz [65] starts with the diagonal matrix E,
with 2-th diagonal element crÂ¡, and applies a series of 2 by 2 orthogonal transformations
to obtain a unit triangular matrix. The complexity of their algorithm is 0(K2). Thus
the solution given in [65] amounts to the statement
QjEPo-R, (6.1)
where R is unit upper triangular.
For general E, where the product of the ax is not necessarily 1, one can multiply E
by the scaling matrix a-1I, apply (6.1), then multiply by a to obtain the GMD of E.
And for a general matrix H, the singular value decomposition H = VEW* and (6.1)
combine to give the H = QRP* where
Q = VQo and P = WPu.
According to (3.11), we consider the problem of choosing Q and P to maximize the
minimum of the
max min : 1 < i < K\
Q.P j
subject to QRP* = H, Q*Q = I, P*P = I, (6-2)
r,j = 0 for i > j, Re MA *K,
where K is the rank of H. Since the GMD of H is feasible in (6.2), we conclude that
the GMD yields the optimal solution to (6.2).

83
6.2.1 Generalized Maximin Properties
We consider the following problem:
max min {u1 : 1 < i < K}
F,G
subject to GUF* = H, Uij 0 for i > j, U Â£ RKxK,
uÂ¡i > 0, 1 < i < K,
tr ((G*G)-1) < pi, tr((F*F)-1) (6.3)
If Pi P2 = K, then any Q and P feasible in (6.2) are feasible in (6.3). Hence, the
problem (6.3) is less constrained than the problem (6.2) since the set of feasible matrices
has been enlarged. Nonetheless, we now show that the solution to this relaxed problem
is the same as the solution of the more constrained problem (6.2).
Theorem 6.2.1 7/H Â£ Cmxn has rank K, then a solution of (6.3) is given by
G = Q*/^, U=(^pW and F = P
Pi
v K J
where QRP* is the GMD of H.
Proof: Let VSW* be the singular value decomposition of H, where E Â£ KAxA
contains the K positive singular values of H on the diagonal. If F and G satisfy the
constraints of (6.3), then we have
H = YEW* GUF*.
The column space of GUF* is contained in the column space of G. Since G has K
columns, the dimension of the column space is at most K. Since GUF* = H has rank
K, the column space of G must coincide with the column space of H, which is equal to
the column space of V. Hence, there exists a K by K invertible matrix A such that
G = VA. (6.4)
In the same fashion, the column space of F must coincide with the column space
of H*, which is equal to the column space of W. And there exists a K by K invertible
matrix B such that
F = WB.
(6.5)

84
Combining (6.4) and (6.5) with the identity GUF* = H = VSW* gives
AUB* = S.
It follows that
K
det (Â£*Â£) = det (BU*A*AUB*) = det (AA)det (B*B) |u|2,
i=i
which gives
K
min.KI2* < JJM2 = det (S*S)det (A*A)_1det (B*B)-1.
i=l
By the constraints of (6.3), we have
(6.6)
tr ((G*G)_1) = tr ((A*A)-1) < pi,
tr ((F*F)-1) = tr ((B*B)_1) < p?.
By the geometric mean inequality and the fact that the determinant (trace) of a matrix
is the product (sum) of the eigenvalues, a K by K Hermitian positive semidefinite matrix
S satisfies
det (S) <
Using these bounds for the determinant and the trace in (6.6), we have
min I rtiv I <
i < \JP\P2
K
(6.7)
Finally, it can be verified that for the choices of G, U, and F given in the statement of
the theorem, the inequality (6.7) is an equality.
6.2.2 Implementation Based on Initial SVD
We now give an algorithm for evaluating the GMD that starts with the singular
value decomposition H = VSW*. The algorithm generates a sequence of upper trian
gular matrices R(U, 1 < L < K, with R(b = Â£. Each matrix R(G has the following
properties:
(a) rj^ = 0 when i > j or j > max {L, z}.
(b) r\^ = o for all i < L, and the geometric mean of L < i < K, is g.
We express R(:+1l = Q,R,,-)Pt where Q*. and are orthogonal for each k.
These orthogonal matrices are constructed using a symmetric permutation and a
pair of Givens rotations. Suppose that R(/c) satisfies (a) and (b). If rjÂ¡ > g, then let

85
II be a permutation matrix with the property that ITR^II exchanges the (k + l)-st
diagonal element of with any element r^,, p > k, for which rw < a. If < a,
then let II be chosen to exchange the (k + l)-st diagonal element with any element rpp,
p > k, for which > a. Let 8\ = r[^ and 82 = rpp denote the new diagonal elements
at locations k and k + 1 associated with the permuted matrix IIRa n
Next, we construct orthogonal matrices Gi and G2 by modifying the elements in
the identity matrix that lie at the intersection of rows k and k + 1 and columns k and
k + 1. We multiply the permuted matrix IIR^n on the left by Gj and on the right
by Gi. These multiplications will change the elements in the 2 by 2 submatrix at the
intersection of rows k and k + 1 with columns k and k + 1. Our choice for the elements
of Gj and G2 is shown below, where we focus on the relevant 2 by 2 submatrices of Gj,
IIRWn, and G,:
(6.8)
c8\
s2
1
0
c s
1
QI
H
o
<0
1
1
c8\
1
O
s c
0 y
(GJ)
(nR<>n) (Gj)
(R(fc+1>)
If 1 = 82 = cr, we take c = 1 and s = 0; if 8\ ^ 82, we take
In either case,
c =
x -
62 and s = vT
A
and y = .
a
(6.9)
(6.10)
Since a lies between 1 and 82, s and c are nonnegative real scalars.
Figure 6-1 depicts the transformation from R(fc) to GjlIR^IIGi. The dashed
box is the 2 by 2 submatrix displayed in (6.8). Notice that c and s, defined in (6.9), are
real scalars chosen so that
c2 + s2 = 1 and (c8\)2 4- (s2)2 = a2.
With these identities, the validity of (6.8) follows by direct computation. Defining
Qk = IIG2 and P/t = nG^ we set
R(k+D = QTR(^)Pfc (6,n)
It follows from Figure 6-1, (6.8), and the identity det (R(fc+1>) = det (R(fc)), that (a)
and (b) hold for L = k+ 1. Thus there exists a real upper triangular matrix R^, with

86
Column k
X
X
X
0
0
0
X X
X
X
0
0
X
X
0
0
0
X
X
X
0
0
Ic
iX
0;
0
0
; X
X
0
0
xi
0
0
xj
0
0
X
0
X
0
X
Ak)
GjnR^nG,
Figure 6-1: The operation displayed in (6.8)
a on the diagonal, and unitary matrices Q and P, i = 1,2,..., K 1, such that
Rw = (QLi QQ[)S(PiP2 . Pfc-i).
Combining this identity with the singular value decomposition, we obtain H = QRP*
where
q=v(iiq.)> r=rw. and p=w^j][Pj-
In summary, our algorithm for computing the GMD, based on an initial SVD, is
the following:
1. Let H = VEW* be the singular value decomposition of H, and initialize Q = V,
P = W, R = S, and k = 1.
2. If rkk > a. choose p > k such that rpp < a. If rkk < a, choose p > k such that
Cpp > d-. In R, P, and Q, perform the following exchanges:
Cfe+l,fc+l fpp
P;,fc+1 ^ P :,p
Q:,fc ^ Q:,p
3. Construct the matrices G] and G2 shown in (6.8). Replace R by GjRGi, replace
Q by QG2, and replace P by PG^
4. If k = K 1, then stop, QRP* is the GMD of H. Otherwise, replace k by k + 1
and go to step 2.
Given the SVD, this algorithm for the GMD requires 0((m + n)K) flops. For
comparison, reduction of H to bidiagonal form by the Golub-Reinsch bidiagonalization

87
scheme [66, 67, 68], often the first step in the computation of the SVD, requires 0(mnK)
flops.
6.3 Generalized Triangular Decomposition
In this section, we attempt to generalized decomposition of the form
H = QRP*, (6.12)
where R is upper triangular and Q and P have orthonormal columns. We will answer
the following two questions. First, what is the necessary and sufficient condition that the
decomposition of (6.12) exists. Second, how to calculate such a decomposition. Section
6.3.1and 6.3.2 focus on answering the two questions.
6.3.1Existence of GTD
The following result is due to Weyl [69] (also see [60, p. 171]):
Theorem 6.3.1 If A E Cnxn with eigenvalues A and singular values cr, then A X a.
The following result is due to Horn [70] (also see [60, p. 220]):
Theorem 6.3.2 If r 6 Cn and matrix R Â£ Cnxn with singular values at, 1 < i < n, and with r on the diagonal of R.
We now combine Theorems 6.3.1 and 6.3.2 to obtain:
Theorem 6.3.1 Let H E > ... >
aK > 0. There exists an upper triangular matrix R E CKxK and matrices Q and P
with orthonormal columns such that H = QRP* if and only if v < cr.
Proof: If H = QRP*, then the eigenvalues of R are its diagonal elements and the
singular values of R coincide with those of H. By Theorem 6.3.1, r < a. Conversely,
suppose that r < a. Let H VEW* be the singular value decomposition, where
E E RhxK. By Theorem 6.3.2, there exists an upper triangular matrix R Â£ CKxK with
the r on the diagonal and with singular values cr,, 1 < i < K. Let R = V0EW be
the singular value decomposition of R. Substituting E = VJRW0 in the singular value
decomposition for H, we have
h = (vv;)R(ww;)*.
In other words, H = QRP* where Q VVq and P WWJ.
6.3.2The GTD Algorithm
Given a matrix H E Crnx" with rank K and with singular values oÂ¡ > o2 > ... >
&K > 0, and given a vector r E CA such that r a, we now give an algorithm for

88
computing the decomposition H = QRP\ This algorithm for the GTD essentially
yields a constructive proof of Theorem 6.3.2.
Let VSW* be the singular value decomposition of H, where Â£ is a K by K diagonal
matrix with the diagonal containing the positive singular values. We let 6 CKxK
denote an upper triangular matrix with the following properties:
(a) r\^ = 0 when i > j or j > i > L. In other words, the trailing principal submatrix
of starting at row L and column L, is diagonal.
(b) If denotes the diagonal of R^L\ then the first L 1 elements of r and are
equal. In other words, the leading diagonal elements of R) match the prescribed
leading elements of the vector r.
(c) ti-k where Tl-k denotes the subvector of r consisting of components L
through K. In other words, the trailing diagonal elements of R^) multiplicatively
majorize the trailing elements of the prescribed vector r.
Initially, we set R(1^ = Â£. Clearly, (a)-(c) hold for L = 1. Proceeding by induction,
suppose we have generated upper triangular matrices R^, L = 1,2satisfying
(a)-(c), and unitary matrices Q and P, such that R(+1) = QR()P for 1 < L <
k. We now show how to construct unitary matrices Qt and P*, such that R^+b =
Q^R^'^Pfc, where R^fc+1^ satisfies (a)-(c) for L = k + 1.
Let p and q be defined as follows:
p = arg min{|rt-fc)| : k < i < K, |r{fc)| > |r*,|}, (6.13)
i
q = arg max{|rf}| : k i
where r-A) is the -th element of Since rk-.K ^ r^, there exists p and q satisfying
(6.13) and (6.14). Let II be the matrix corresponding to the symmetric permutation
ITR^n which moves the diagonal elements and r\$ to the k-th and (k + l)-st
diagonal positions respectively. Let 8X and 52 = denote the new diagonal
elements at locations k and k + 1 associated with the permuted matrix ITR^n.
Next, we construct unitary matrices Gi and G2 by modifying the elements in the
identity matrix that lie at the intersection of rows k and k + 1 and columns k and
k + 1. We multiply the permuted matrix II*R(ir)n on the left by G2 and on the right
by Gj. These multiplications will change the elements in the 2 by 2 submatrix at the
intersection of rows k and A: + 1 with columns k and k + 1. Our choice for the elements
of G! and G2 is shown below, where we focus on the relevant 2 by 2 submatrices of Gj,

89
Column k
X
X
X
X
X
X
X X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
ix
o!
0
0
; X
X:
0
0
Row
k
X:
0
0
X ;
0
0
X
0
X
0
X X
n*RU)n Gjn'R^nG,
Figure 6-2: The operation displayed in (6.15)
ITR^n, and Gj
If |<5i| = |<521 = |r/t|, we take c = 1 and s = 0; if |i| ^ |2|, we take
c<5j s62
X 0
c s
rk x
S2 cSi
0 s2
s c
0 y
(6.15)
(G5)
(n*R(fc)n)
(Gi)
(R(*+b)
In either case,
c
x =
,H2-|<52|2
| stf-iw
sc(|2|2-|1|2)rfc
and s = \/l c2.
and y =
(6.16)
(6.17)
N2 M2
n*R(>n to G5n*RWnGx. The dashed box is the 2 by 2 submatrix displayed in
(6.15). Notice that c and s, defined in (6.16), are real scalars chosen so that
c2 + s2 1 and (?\6\ |2 + s2|2|2 = |r*.|2. (6.18)
With these identities, the validity of (6.15) follows by direct computation. By the choice
of p and q, we have
M < M < |ii|. (6.19)
If |5i| |2|, it follows from (6.19) that c and s are real nonnegative scalars. It can be
checked that the 2 by 2 matrices in (6.15) associated with Gi and G. are both unitary.
Consequently, both GÂ¡ and G2 are unitary. We define
R(fc+1) = (nG2)*R(fc)(nG!) = Q;R(i')pfc,

90
where = nG2 and Pfc = FIGi. By (6.15) and Figure 6-2, R(fc+1) has properties (a)
and (b) for L k + 1. Now consider property (c).
We write a ~ b if a and b are equal after a suitable reordering of the components.
Let a, b, a+, and b+ be vectors whose components are ordered in decreasing magnitude,
and which satisfy
a ~ rk:K, b ~ rj^, a+ ~ rk+1:K, and b+ ~ (6.20)
Thus at is the z-th largest (in magnitude) component of rk:x. By the induction hypoth
esis, we have a ^ b. To establish (c), we need to show that a+ X b+. Let the index s
be chosen so that as = rk, and let the index t be chosen so that
M > \rk\ > |6t+i|. (6.21)
By the definition of p and q, bt and = 6i+1. As seen in (6.20), a+ is obtained
from a by deleting as = rk. The vector r(fc+1^ is obtained from r(fc) by a unitary trans
formation that changes the value of two elements. In particular, b+ is obtained from b
by replacing the adjacent pair bt and 6(+i by
btbt+lrk
By (6.21) |6t| > |y| > |6t+i|. Consequently,
K = y. (6.22)
We partition the proof of (c) into 2 cases.
Case 1: s < t. Since at+ < a for all f, a -< b, and = bf for 1 < i < t, we have
^ ai:ti = b^t_j. (6.23)
For j > t > s, it follows from the induction hypothesis and the connection between a
and a+ that
J-1 j~ 1 3 3
n n ia+i=ii n i^i=n n ^ n n- ^
=1 =1 =i =1
Since Gj and G2 are unitary, the determinant of (6.15) gives
m\ = |r'fc)rf | \btbt+i\ = \rky\ = |rfc6+|,
(6.25)

91
where the last equality in (6.25) comes from (6.22). Hence, for j > t, it follows that
nw = wimMn) ( n w)
=1 \.=1 / \i=t+2 /
= wi*fi(nnfij(iift+i)-Nf[iifi. m
\i=i / \*=t+i / <=i
Combining (6.23), (6.24), and (6.26), we have a+ < b+.
Case 2: s > t. As before, (6.23) holds. For t < j < s, we have
nwi-nwsnw-Mnufi.
=1 i=l =1 t=l
where the first equality comes from the relation j < s, the middle inequality is the
induction hypothesis, and the last equality is (6.26). Rearranging this gives
fDnwisnw. <6-27)
Since |iZj|/1rfc| = |aj|/|as| > 1 when j < s, we deduce from (6.27) that
aj-i -< bnj-i
when j < s. This also holds for j > s due to (6.24) and (6.26). This completes the
proof of (c).
Hence, there exists an upper triangular matrix \ with occupying the first
K 1 diagonal elements, and unitary matrices Q and P, = 1,2,..., /C 1, such that
Rw = (QU Q;QI)S(P,P2 ... Pfc_i).
(6.28)
Equating determinants in (6.28) and utilizing the identity r-k> = for 1 < i < K 1,
we have
K
nw
=i
Wi
k)\ / K
K
K
' K
Vk\
nw =n-*=nN.
0=1
i=l
i=l
where the last equality is due to the assumption r -< a. It follows that |rj^| = |r/c|.
Let C be the diagonal matrix obtained by replacing the (K, K) element of the identity
matrix by /rK. The matrix C is unitary since |rfc|/|r^| = 1. The matrix
R = C*R(*>
(6.29)
has diagonal equal to r due to the choice of C.

92
Combining (6.28) and (6.29) with the singular value decomposition H = VEW*
gives
H = VQ1Q2 Qjt-iCRP^_j.. P*P*W*.
Hence, we have obtained the GTD with
Finally, note that if r is real, then Gj and G2 are real, which implies R is real.
We summarize the steps of the GTD algorithm as follows. To make it easier to
distinguish between the elements of the matrix R and the elements of the given diagonal
vector r, we use Rij to denote the (i,j) element of R and r, to denote the z-th element
of r.
1. Let H = VEW* be the singular value decomposition of H, and suppose we are
given r 6 CA with r < a. Initialize Q = V, P = W, R = S, and k = 1.
2. Let p and q be defined as follows:
p = arg min{|i?i| : k |rfc|},
i
q = arg max{|Ri| : k i
In R, P, and Q, perform the following exchanges:
(Rkk, Pfc+l,fc+l)
(Rl:fc-l,fci Rl:fc-l,fc+l)
(P:,fc> P;,fc+l)
(Q:.k, Q:,fc+l)
{Rpp> Rqq)
(Rl:/cl,pi Rl:fc1,9)
- (P,p,P,q)
(Q:,p>Q:.i)
3. Construct the matrices Gx and G2 shown in (6.15). Replace R by GjRGi, replace
Q by QG2, and replace P by PGj.
4. If k = K 1, then go to step 5. Otherwise, replace k by k + 1 and go to step 2.
5. Multiply column K of Q by Rkk/tk', replace Rkk by rK The product QRP* is
the GTD of H based on r.
A Matlab implementation of our GTD algorithm appears in the Appendix. Given the
SVD, this algorithm for the GTD requires 0((m+n)K) flops. For comparison, reduction
of H to bidiagonal form by the Golub-Reinsch bidiagonalization scheme [66, 67, 68], often
the first step in the computation of the SVD, requires 0(mnK) flops.

93
Dimension
Time
SVD_EIG
GTD
a error
SVD_EIG GTD
A error
SVD_EIG GTD
100
0.61
0.20
9.8e15
1.0e-14
3.3e14
0
200
2.24
0.38
2.0e14
1.7e14
5.9e13
0
400
13.84
0.86
6.8e14
3.7e14
3.3e13
0
800
97.50
2.30
9.8e14
7.0e14
1.5e10
0
1200
317.83
5.67
l.le13
1.3e13
1.5e9
0
1600
746.77
10.77
3.2e13
1.5e13
7.7e4
0
Table 6-1: Comparison of SVD_EIG and GTD for inverse eigenvalue problems (CPU time
in seconds, singular value and eigenvalue errors in sup-norm)
6.3.3 Inverse Eigenvalue Problem
In [71] Chu presents a recursive procedure for constructing matrices with prescribed
eigenvalues and singular values. His algorithm, which he calls SVD_EIG, is based on
Horns divide and conquer proof of the sufficiency of Weyls product inequalities. In
general, the output of SVD_EIG is not upper triangular. Consequently, this routine could
not be used to generate the GTD. Chu notes that to achieve an upper triangular matrix
would require an algorithm one order more expensive than the divide-and-conquer
algorithm.
Given a vector of singular values a R" and a vector of eigenvalues A C", with
A ^ cr, we can use the GTD to generate a matrix R with A on the diagonal and with
singular values problem provided by the GTD to Chus algorithm. In our initial experimentation, we
discovered that the algorithm of Chu, as presented in [71], did not work. When this was
pointed out, Chu provided an adjustment in which the parameter Â¡i in [71, (2.2)] was
replaced by /zAi/|Ai|. With this adjustment, it was possible to solve 4 by 4 and 5 by
5 test cases that previously caused failure. The results reported in this section use the
Both Matlab routines GTD (see Appendix) and SVD_EIG [71] require 0(n2) flops,
so in an asymptotic sense, the approaches are equivalent. In Table 6-1 we compare the
actual running times of GTD and SVD_EIG for matrices of various dimensions. These
computer runs were performed on a Sun Workstation with 1 GB memory. In making
these runs, the portion of the GTD code connected with the updating of the matrices
P and Q was deleted since SVD_EIG does not accumulate the unitary matrices. The
input arrays a and A were generated in the following way: Using the Matlab routine
RAND, we randomly generated a square matrix whose element lie between 0 and 1. The

94
singular values a were computed using the Matlab routine SVD and the eigenvalues A
were computed using Matlabs EIG. By the theorem of Weyl [69], \ < a. We then
used both SVD_EIG and GTD to generate matrices with the specified singular values and
eigenvalues. Five different matrices of each dimension were generated and the average
running time is reported in Table 6-1.
The times shown in Table 6-1 indicate that GTD becomes increasingly more efficient
than svd_EIG as the matrix dimension increases. For a dimension of 100, GTD is about
three times faster than svd_eig. For a dimension of 1600, GTD is about 70 times faster
than SVD_EIG. In an efficiently designed compiled code, the difference in speed between
these two approaches to inverse eigenvalue problems could be more substantial: the
permutations appearing in GTD could be replaced by the updating of a pointer array;
also, the columns of R that are zero except for the diagonal entry could be flagged,
and when multiplying R by Glt we can skip the multiplication of these essentially zero
columns.
In Table 6-1 we also compare the specified singular values and eigenvalues to those
obtained by applying Matlabs SVD and EIG routines to the generated matrices. That is,
for each matrix output by either svd_eig or GTD, we use Matlabs routines to compute
the singular values and eigenvalues, and we compare to the specified singular values and
eigenvalues using the sup-norm. The errors reported in Table 6-1 are the average errors
for the 5 random matrices of each dimension. Both routines generate matrices with
singular values that match those computed by Matlabs SVD routine to within 13 or 14
digits. Observe that GTD always matches exactly the prescribed eigenvalues since the
generated matrix is triangular, with the specified eigenvalues on the diagonal. The error
in the eigenvalues of the matrix generated by SVD_EIG was comparable to the singular
value error for matrices of dimension up to 400. Thereafter, the error in the eigenvalues
grew quickly. When the matrix dimension doubled from 400 to 800, the error increased
roughly by the factor 103. And when the matrix dimension doubled again from 800 to
1600, the error increased roughly by the factor 106.
A recursive algorithm can require a significant amount of memory. While SVD_EIG
executed, we monitored the memory usage using the Unix top command. We observed
that for a matrix of dimension 1600, the memory consumption grew to 319 MB. Since a
complex double precision matrix of dimension 1600 occupies about 41 MB memory, the
recursion required more than 7 times as much space as the matrix itself.

95
6.4 Conclusions
In this chapter, we introduce two novel matrix decomposition algorithms, including
the geometric mean decomposition and the generalized triangular decomposition. The
GMD H QRP* is a solution of the maximin problem (6.3); the smallest diagonal
element of R is as large as possible. Starting with the SVD, we show that the GMD
can be computed using a series of Givens rotations, and row and column exchanges.
Alternatively, the GMD could be computed directly, without performing an initial SVD,
if H is first reduced by unitary transformations to a real matrix. In a further extension of
our algorithm for the GMD, we show in [51] how to compute a factorization H = QRP*
where the diagonal of R is any vector satisfying the Weyl multiplicative majorization
conditions [69]. The GTD represents the most general unitary decomposition H =
QRP* '. That is, the diagonal r of R must satisfy r < cr, where cr is the vector of singular
values for H, while for any diagonal r with r ^ cr, we can write H = QRP*. The GTD
includes, as special cases, the singular value decomposition, the Schur decomposition,
the QR decomposition, and the GMD. Similar to GMD, given the SVD, the GTD
based on r can also be evaluated using a series of Givens rotations and permutations.
The GTD algorithm provides a new proof of Horns theorem [70], Applications of the
GTD include transceiver design for MIMO communications [72] and inverse eigenvalue
problems surveyed extensively in [73].
Appendix
Matlab Implementation of GTD
/, Input:
7.
"/, H = U*S*V (singular value decomposition of H)
'/, U and V orthonormal columns
/o S diagonal matrix with nonnegative diagonal entries
1 r desired diagonal of R
'/, nnz (r) = nnz (S)
'/ r multiplicatively majorized by diag (S)
7, product nonzero r = product nonzero diag (S)
*/.
'/, Output:
7.

96
*/. H = Q*R*P (GTD based on r)
*/, P and Q orthonormal columns
/, R upper triangular, R (i, i) = r (i)
function [Q, R, P] = gtd (U, S, V, r)
d = diag (S) ;
K = min (size (S)) ;
P=V;Q=U;R= zeros (K) ;
for k = 1 : K-l
rk = r (k) ;
abs_rk = abs (rk) ;
kpl = k + 1 ; kml = k 1 ;
I = find (abs (d (k : K)) > abs_rk) ;
if ( isempty (I) )
[x, p] = max (abs (d (k : K))) ;
p = p + kml ;
else
I = I + kml ;
[x, p] = min (abs (d (I))) ;
P = I (p) ;
end
deltal = d (p) ;
d ([k p]) = d ([p k]) ;
I = find (abs (d (kpl : K)) <= abs_rk) ;
if ( isempty (I) )
[x, q] = min (abs (d (kpl : K))) ;
q = q + k ;
else
I = I + k ;
[x, q] = max (abs (d (I))) ;
q = i (q) ;
end

97
delta2 = d (q) ;
d ([kpl q]) = d ([q kpl]) ;
sq.deltal = abs (deltal)'2 ;
sq_delta2 = abs (delta2)'2 ;
sq_rk = abs_rk*2 ;
denom = sq.deltal sq_delta2 ;
if ( (denom <= 0) I (sq_rk > sq_deltal) )
c = 1 ; s = 0 ;
elseif ( sq_rk < sq_delta2 )
c = 0 ; s = 1 ;
else
c = sqrt ((sq_rk sq_delta2)/denom) ;
s = sqrt (l-c*c) ;
end
if ( sqjrk > 0 )
x = -s*c*rk*denom/sq_rk ;
y = deltal*delta2*rk/sq_rk ;
G1 = [ c -s
sc];
G2 = [ c*deltal -s*(delta2)
s*delta2 c*(deltal) ] ;
G2 = ((rk')/sq_rk) G2 ;
else
x = 0. ;
y = deltal ;
G1 = [0-1
10];
G2 = G1 ;
end
if ( k > 1 )
7, permute the columns
R (l:kml, [k p]) = R (l:kml, [p k]) ;
R (l:kml, [kpl q]) = R (l:kml, [q kpl])

98
7, apply G1 to R
R (l:kml, [k kpl]) = R (l:kml, [k kpl])*Gl
end
R (k, k) = rk ;
R (k, kpl) = x ;
d (kpl) = y ;
7 permute the columns
P (:, [k p]) = P (:, [p k]) ;
P (:, [kpl q]) = P (:, [q kpl]) ;
Q (:, [k p]) = Q (:, [p k]) ;
Q (:, [kpl q]) = Q (:, [q kpl]) ;
7. apply G1 to P
P (:, [k kpl]) = P (:, [k kpl])*Gl ;
7. apply G2 to Q
Q (:, [k kpl]) = Q (: [k kpl])*G2 ;
end
R (K, K) = r (K) ;
if ( r (K) ~= 0 )
Q (:, K) = q (:, K)*d (K)/ r (K) ;
end
P = P (:, 1:K) ;
Q = Q (:, 1:K) ;

CHAPTER 7
CONCLUSIONS
This dissertation studies the signal processing aspect of MIMO communications.
We present a new perspective to the MIMO communications: any MIMO scheme can
be regarded as a MIMO channel decomposer, which decomposes a MIMO channel into
multiple scalar subchannels. Based on this perspective, this dissertation presents three
novel MIMO transceiver designs: the geometric mean decomposition (GMD) scheme,
the uniform channel decomposition (UCD) scheme, and the tunable channel decompo
sition (TCD) scheme. All these schemes deploying either a VBLAST detector at the
receiver or a dirty paper precoder at the transmitter. These transceiver designs rep
resent a paradigm shift from the conventional linear MIMO transceiver design to the
nonlinear ones. The superior performance of the GMD and UCD scheme unveils the
practical significance of the collaborations between the transmitter and receiver. That
is, such collaborations facilitate achieving the optimal tradeoff between the diversity and
multiplexing gains promised by the MIMO communication theory. The TCD scheme
represents a unifying solution to a considerably wide range of problems, including de
signing the precoder for OFDM communications and the optimal CDMA sequences.
Motivated by the application of transceiver designs, this dissertation also intro
duces two novel matrix decomposition algorithms, i.e., the geometric mean decompo
sition (GMD) and the generalized triangular decomposition (GTD). The two matrix
decompositions are the cornerstones of the three transceiver designs proposed in this
dissertation. Moreover, the two matrix decomposition algorithms have significant im
plications in the matrix analysis community. For instance, the GTD is a new and more
efficient solution to the inverse eigenvalue problem.
99

REFERENCES
[1] I. E. Telatar, Capacity of multiple antenna Gaussian channels, AT&T Technical
Memorandum, June 1995.
[2] G. J. Foschini, Jr., Layered space-time architecture for wireless communication in
a fading environment when using multi-element antennas, Bell Labs Tech. Journal,
vol. 1, pp. 41-59, Autumn 1996.
[3] I. E. Telatar, Capacity of multi-antenna Gaussian channels, European Transac
tions on Telecommunications, vol. 10, no. 6, pp. 585-595, 1999.
[4] G. J. Foschini and M. J. Gans, On limits of wireless communications in a fading
environment when using multiple antennas, IEEE Journal on Selected Areas in
Communications, vol. 17, pp. 1841-1852, November 1999.
[5] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, Simplified
processing for high spectral efficiency wireless communication employing multiple-
element arrays, Wireless Personal Communications, vol. 6, pp. 311-335, March
1999.
[6] M. Sellathurai and S. Haykin, Turbo-BLAST for wireless communications: Theory
and experiments, IEEE Transactions on Signal Processing, vol. 50, pp. 2538-2545,
October 2002.
[7] M. Sellathurai and G. J. Foschini, Stratified diagonal layered space-time architec
tures: signal processing and information theoretic aspects, IEEE Transactions on
Signal Processing, vol. 51, pp. 2943-2954, November 2003.
[8] G. J. Foschini, D. Chizhik, M. J. Gans, C. Papadias, and R. A. Valenzuela, Analy
sis and performance of some basic space time architectures, IEEE Journal on
Selected Areas in Communications, vol. 21, pp. 303-320, April 2003.
[9] B. Hassibi, A fast square-root implementation for BLAST, Thirty-Fourth Asilo-
mar Conf. Signals, Systems and Computers, pp. 1255-1259, November 2000.
[10] S. L. Ariyavisitakul, Turbo space-time processing to improve wireless channel ca
pacity, IEEE Transactions on Communications, vol. 48, pp. 1347-1358, August
2000.
[11] J. Benesty, Y. Huang, and J. Chen, A fast recursive algorithm for optimum sequen
tial signal detection in a BLAST system, IEEE Transactions on Signal Processing,
vol. 51, pp. 1722-1730, July 2003.
[12] S. M. Alamouti, A simple transmit diversity techniques for wireless communica
tions, IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1451-1458,
October 1998.
[13] V. Tarokh, N. Seshadri, and A. R. Calderbank, Space-time codes for high data
rate wireless communications: Performance criterion and code construction, IEEE
Transactions on Information Theory, vol. 44, pp. 744-765, March 1998.
100

101
[14] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, Space-time block codes from
orthogonal designs, IEEE Transactions on Information Theory, vol. 45, pp. 1456
1467, July 1999.
[15] V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank, Combined array
processing and space-time coding, IEEE Transactions on Information Theory,
vol. 47, pp. 199-207, Feb. 1999.
[16] L. Zheng and D. Tse, Diversity and multiplexing: A fundamental tradeoff in
multiple-antenna channels, IEEE Transactions on Information Theory, vol. 49,
pp. 1073-1096, May 2003.
[17] Available Online: http://www.3gpp.org.
[18] J. Yang and S. Roy, On joint transmitter and receiver optimization for multiple-
input-multiple-output (MIMO) transmission systems, IEEE Transactions on
Communications, vol. 42, pp. 3221-3231, December 1994.
[19] G. G. Raleigh and J. M. Cioffi, Spatial-temporal coding for wireless communica
tion, IEEE Transactions on Communications, vol. 46, pp. 357-366, March 1998.
[20] A. Scaglione, G. B. Giannakis, and S. Barbarossa, Filterbank transceiver opti-
mizating information rate in block transmissions over dispersive channels, IEEE
Transactions on Information Theory, vol. 45, pp. 1019-1032, April 1999.
[21] A. Scaglione, G. B. Giannakis, and S. Barbarossa, Redundant filterbank precoders
and equalizers part i: Unification and optimal designs, IEEE Transactions on
Signal Processing, vol. 47, pp. 1988-2006, July 1999.
[22] H. Sampath, P. Stoica, and A. Paulraj, Generalized linear precoder and decoder
design for MIMO channels using the weighted MMSE criterion, IEEE Transactions
on Communications, vol. 49, pp. 2198-2206, December 2001.
[23] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath, Optimal
designs for space-time linear precoders and decoders, IEEE Transactions on Signal
Processing, vol. 50, pp. 1051-1064, May 2002.
[24] E. Onggosanusi, A. Sayeed, and B. V. Veen, Optimal antenna diversity signal
ing for wideband space-time wireless channels utilizing channel state information,
IEEE Transactions on Communications, vol. 50, pp. 341-353, February 2002.
[25] E. Onggosanusi, A. Sayeed, and B. V. Veen, Efficient signaling schemes for wide
band space-time wireless channels using channel state information, IEEE Trans
actions on Vehicular Technology, vol. 52, pp. 1-13, January 2003.
[26] D. Palomar, J. Cioffi, and M. Lagunas, Joint Tx-Rx beamforming design for mul
ticarrier MIMO channels: A unified framework for convex optimization, IEEE
Transactions on Signal Processing, vol. 51, pp. 2381-2401, September 2003.
[27] D. Palomar and M. Lagunas, Joint transmit-receive space-time equalization in
spatially correlated MIMO channels: A beamforming approach, IEEE Journal on
Selected Areas in Communications, vol. 21, pp. 730-743, June 2003.
[28] D. Palomar, M. Lagunas, and J. Cioffi, Optimum linear joint transmit-receive
processing for MIMO channels with QoS constraints, IEEE Transactions on Signal
Processing, vol. 52, pp. 1179-1197, May 2004.

102
[29] L. Collin, O. Berder, P. Rostaing, and G. Burel, Optimal minimum distance-based
precoder for MIMO spatial multiplexing systems, IEEE Transactions on Signal
Processing, vol. 52, pp. 617-627, March 2004.
[30] J.-K. Zhang, A. Kavcic, X. Ma, and K. M. Wong, Design of unitary precoders for
ISI channels, in Proceedings IEEE International Conference on Acoustics Speech
and Signal Processing, vol. Ill, (Orlando, Florida), pp. 2265-2268, 2002.
[31] M. Rupf and J. L. Massey, Optimal sequence multisets for synchronous code
division multiple-access channels, IEEE Transactions on Information Theory,
vol. 40, pp. 1261-1266, July 1994.
[32] D. W. Bliss, K. W. Forsythe, A. O. Hero, and A. F. Yegulalp, Environmental issues
for MIMO capacity, IEEE Transactions on Signal Processing, vol. 50, pp. 2128
2142, September 2002.
[33] P. Stoica and R. L. Moses, Introduction to Spectral Analysis. Englewood Cliffs, NJ:
Prentice-Hall, 1997.
[34] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge: Cambridge University
Press, 1985.
[35] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, Third
Edition, 1996.
[36] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York,
NY: John Wiley and Sons, Inc., 1968.
[37] G. Ginis and J. M. CiofR, On the relationship between V-BLAST and the GDFE,
IEEE Communications Letters, vol. 5, pp. 364-366, September 2001.
[38] G. Ginis and J. M. Cioffi, Vectored transmission for digital subscriber line sys
tems, IEEE Journal on Selected Areas in Communications, vol. 20, pp. 1085-1104,
June 2002.
[39] G. Caire and S. Shamai, On the achievable throughput of a multiantenna Gaussian
broadcast channel, IEEE Transactions on Information Theory, vol. 49, pp. 1691-
1706, July 2003.
[40] G. Ginis and J. M. Cioffi, A multi-user precoding scheme achieving crosstalk can
cellation with application to DSL systems, Proc. 3fth Asilomar Conference Sig
nals, Systems, Computers, Asilomar, CA, vol. 2, pp. 1627-1631, 29 Oct.-l Nov.
2000.
[41] M. Costa, Writing on dirty paper, IEEE Transactions on Information Theory,
vol. 29, pp. 439-441, May 1983.
[42] M. Tomlinson, New automatic equaliser employing modulo arithmetic, Electron.
Lett., vol. 7, pp. 138-139, March 1971.
[43] H. Harashima and H. Miyakawa, Matched-transmission technique for channels
with intersymbol interference, IEEE Trans. Communications, pp. 774-780, August
1972.
[44] W. Yu and J. M. Cioffi, Trellis precoding for the broadcast channel, Global
Telecommunications Conference, vol. 2, pp. 1344-1348, November 2001.

103
[45] B. C. Banister and J. Zeidler, Feedback assisted transmission subspace tracking
for MIMO systems, IEEE Journal on Selected Areas in Communications, vol. 21,
pp. 452-463, April 2003.
[46] A. Poon, D. Tse, and R. W. Brodersen, An adaptive multiantenna transceiver
for slowly flat fading channels, IEEE Transactions on Communications, vol. 51,
pp. 1820-1827, November 2003.
[47] B. Hassibi, An efficient square-root implementation for BLAST, http://cm.bell-
labs. com/who/hochwald/papers/squareroot/, 2000.
[48] M. Varanasi and T. Guess, Optimum decision feedback multiuser equalization
with successive decoding achieves the total capacity of the Gaussian multiple-access
channel, Conference Record of the Thirty-First Asilomar Conference on Signals,
Systems and Computers, vol. 2, pp. 1405 1409, Nov 2-5 1997.
[49] P. Viswanath and D. Tse, Sum capacity of the vector Gaussian broadcast channel
pp. 1912-1921, August 2003.
[50] D. Tse and P. Viswanath, Fundamentals of Wireless Communications. Available:
http://inst.eecs.berkeley.edu/~ee224b/sp04/#Course Notes, 2004.
[51] Y. Jiang, W. Hager, and J. Li, The generalized triangular decomposi
tion, SIAM Journal on Matrix Analysis and Applications, Available online:
http://www.sal.ufl.edu/yjiang/papers/gtd.pdf Submitted.
[52] Y. Jiang and J. Li, Adaptable channel decomposition for MIMO communica
tions, IEEE International Conference on Acoustics, Speech, and Signal Processing,
[53] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, second edi
tion, John Wiley and Sons, Inc., 1984.
[54] P. Viswanath and V. Anantharam, Optimal sequences and sum capacity of syn
chronous CDMA systems, IEEE Transactions on Information Theory, vol. 45,
pp. 1984-1991, September 1999.
[55] P. Viswanath, V. Anantharam, and D. Tse, Optimal sequences, power control
and user capacity of synchronous CDMA systems with linear MMSE multiuser
receivers, IEEE Transactions on Information Theory, vol. 45, pp. 1968-1983, Sep
tember 1999.
[56] T. Guess, Optimal sequence for CDMA with decision-feedback receivers, IEEE
Transactions on Information Theory, vol. 49, pp. 886-900, April 2003.
[57] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley &
Sons, Inc, 1991.
[58] A. Marshall and I. Olkin, Inequalities: Theory of Majorization. New York: Acad
emic, 1979.
[59] Y. Jiang, J. Li, and W. Hager, Uniform channel decomposition for MIMO
communications, IEEE Transactions on Signal Processing, available online:
http://www.sal.ufl.edu/yjiang/papers/ucdR3.pdf, to appear 2004. The conference
version presented at Asilomar conference, Nov. 2004.

104
[60] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge: Cambridge
University Press, 1991.
[61] E. Beltrami, Sulle funzioni bilineari, Giornale De Matematiche, vol. 11, pp. 98-
106, 1873.
[62] C. Jordan, Mmoire sur les formes bilinaires, J. Math. Purs Appl., vol. 19,
pp. 35-54, 1874.
[63] I. Schur, On the characteristic roots of a linear substitution with an application
to the theory of integral equations, Math. Ann., vol. 66, pp. 488-510, 1909.
[64] N. J. Higham, Accuracy and Stability of Numerical Algorithms. Philadelphia:
SIAM, 1996.
[65] P. Kosowski and A. Smoktunowicz, On constructing unit triangular matrices with
prescribed singular values, Computing, vol. 64, pp. 279-285, 2000.
[66] G. H. Golub and C. Reinsch, Singular value decomposition and least squares so
lution, Numerische Mathematik, vol. 14, pp. 403-420, 1970.
[67] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns
Hopkins University Press, 1983.
[68] J. H. Wilkinson and C. Reinsch, Linear algebra, in Handbook for Automatic
Computation (F. L. Bauer, ed.), vol. 2, (Berlin), Springer-Verlag, 1971.
[69] H. Weyl, Inequalities between two kinds of eigenvalues of a linear transformation,
Proc. Nat. Acad. Sci. U. S. A., vol. 35, pp. 408-411, 1949.
[70] A. Horn, On the eigenvalues of a matrix with prescribed singular values, Proc.
Amer. Math. Soc., vol. 5, pp. 4-7, 1954.
[71] M. T. Chu, A fast recursive algorithm for constructing matrices with prescribed
eigenvalues and singular values, SIAM J. Numer. Anal., vol. 37, pp. 1004-1020,
2000.
[72] Y. Jiang, J. Li, and W. Hager, Joint transceiver design for
MIMO communications using geometric mean decomposition, IEEE
Transactions on Signal Processing, to appear. Available online:
http://www.sal.ufl.edu/yjiang/papers/gmdCommR2.pdf.
[73] M. T. Chu, Inverse eigenvalue problems, SIAM Rev., vol. 40, no. 1, pp. 1-39
(electronic), 1998.

BIOGRAPHICAL SKETCH
Yi Jiang was born in Yixing, Jiangsu Province, China, in November 1978. He
received his B.Sc. degree in Electronic Engineering and Information Science from the
University of Science and Technology of China (USTC) in July 2001, the M.S. degree
in Electrical Engineering from the University of Florida (UF) in May 2003.
105

I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
Jian Li, Chair
Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
Kenneth K. O
Professor of Electrical and Computer
Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
John M. Shea
Assistant Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
Tan F. Wong
Associate Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
William W. Hager
Professor of Mathematics

TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
- A CHANNEL DECOMPOSITION PERSPECTIVE
Yi Jiang
(352) 392-5241
Department of Electrical and Computer Engineering
Chair: Jian Li
Degree: Doctor of Philosophy
It is recently discovered that deploying multiple transmitting and multiple receiv
ing antennas in a wireless communication system can drastically improve the data rate
and reliability of wireless communications, even without consuming additional band
width and input power. This so-called multi-input multi-output (MIMO) technology
has been under intense research and will be applied to the next generation of wireless
communication networks.
This dissertation focuses on designing practical transceiver designs for MIMO sys
tems with sound theoretical foundations. Three designs, i.e., the GMD, UCD, and TCD
schemes, are proposed. These designs represent a paradigm shift from the conventional
linear designs to the nonlinear designs while keeping the implementation complexity
low. It is proven, both through theoretical analyses and numerical simulations, that the
three designs are much better than their linear counterparts in that they can achieve
faster and more reliable communications. The schemes proposed in this dissertation
will probably play an important role in the next generation wireless fidelity (Wi-Fi)
and digital subscribe line (DSL) technologies. Besides its engineering significance, this
dissertation also invents two matrix decompositions, which are significant contributions
to the numerical analysis community.

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor ofvPJtosbj)hy.
Jian Li, Chair/
Professor anelectrical and
Computer Engineering
1 certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
Kenneth K. <
Professor or Electrical and Computer
Engineering
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctpt-^Philosoghy. j
13 Shea
Assistant Professor of Electrical and
Computer Engineering
I certify that 1 have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Docfor of Philosophy.
Associate Professor ofTilectrical and
Computer Engineering
1 certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
(aJ
William W. Hager
Professor of Mathematics

This dissertation was submitted to the Graduate Faculty of the College of En
gineering and to the Graduate School and was accepted as partial fulfillment of the
requirements for the degree of Doctor of Philosophy.
May 2005
Pramod P. Khargonekar
Dean, College of Engineering
Kenneth Gerhardt

92
Combining (6.28) and (6.29) with the singular value decomposition H = VEW*
gives
H = VQ1Q2 Qjt-iCRP^_j.. P*P*W*.
Hence, we have obtained the GTD with
Finally, note that if r is real, then Gj and G2 are real, which implies R is real.
We summarize the steps of the GTD algorithm as follows. To make it easier to
distinguish between the elements of the matrix R and the elements of the given diagonal
vector r, we use Rij to denote the (i,j) element of R and r, to denote the z-th element
of r.
1. Let H = VEW* be the singular value decomposition of H, and suppose we are
given r 6 CA with r < a. Initialize Q = V, P = W, R = S, and k = 1.
2. Let p and q be defined as follows:
p = arg min{|i?i| : k |rfc|},
i
q = arg max{|Ri| : k i
In R, P, and Q, perform the following exchanges:
(Rkk, Pfc+l,fc+l)
(Rl:fc-l,fci Rl:fc-l,fc+l)
(P:,fc> P;,fc+l)
(Q:.k, Q:,fc+l)
{Rpp> Rqq)
(Rl:/cl,pi Rl:fc1,9)
- (P,p,P,q)
(Q:,p>Q:.i)
3. Construct the matrices Gx and G2 shown in (6.15). Replace R by GjRGi, replace
Q by QG2, and replace P by PGj.
4. If k = K 1, then go to step 5. Otherwise, replace k by k + 1 and go to step 2.
5. Multiply column K of Q by Rkk/tk', replace Rkk by rK The product QRP* is
the GTD of H based on r.
A Matlab implementation of our GTD algorithm appears in the Appendix. Given the
SVD, this algorithm for the GTD requires 0((m+n)K) flops. For comparison, reduction
of H to bidiagonal form by the Golub-Reinsch bidiagonalization scheme [66, 67, 68], often
the first step in the computation of the SVD, requires 0(mnK) flops.

designs proposed in this dissertation. Moreover, the two decompositions have significant
implications in the matrix analysis community. For instance, the GTD is a new solution
to the inverse eigenvalue problem.
x

36
Given the precoder of (4.13), the virtual channel is
G = HF = UA1/2iT = UET
(4.14)
Ga =
(4.15)
where E = A4>1/2 is a diagonal matrix with diagonal elements Let Ga denote
the augmented matrix
r UEft*
Vah
The UCD scheme is based on the following lemma.
Lemma 4.2.1 For any matrix of the form given in (4-15), we can find a semi-unitary
matrix ft G CL*K such that the QR decomposition of Ga yields an upper triangular
matrix with equal diagonal elements.
Proof: Rewrite (4.15) as
Ga =
U[E:
Voii-L
(4.16)
where 20 Cixi is a unitary matrix whose first K columns form !. We further rewrite
(4.16) as
G =
Ij\ir 0
0 S7q
We can have the following GMD:
U[E: 0/cX(x,_/c)]
Vah
5-
(4.17)
Ji
U[E : 0ft-x(i/_ft-)]
- QjRjP*,
(4.18)
where Rj G M.LxL is an upper triangular matrix with equal diagonal elements and
Qj G C(Mr+L)xL Semi-unitary and Pj G CLxL is unitary. Inserting (4.18) into (4.17)
yields
IjWr 0
0 fig
Ga =
QjRjPjLq.
(4.19)
Let 70 = P} and
Qg =
l A/,.
o n
0
Q ./
(4.20)
Then (4.19) can be rewritten to be Ga = QGaRj which is the QR decomposition of Ga.
The semi-unitary matrix FI associated with Gu consists of the first K columns of fi0 (or
P})-

where
11
A = (R* RxITR^HR*) 1 (2.24)
B = (Rx R^ITR^HR*)_1 RXH*R~1 (2.25)
and o is irrelevant to the present discussion. From (2.22)-(2.25) we have
~dlog/fx, Y) = (Rx RxH*RHRx)_1 (x RaH*R;1y). (2.26)
where x is the conjugate of x. Here we define the differentiation with respect to a
complex-valued vector as [35, Appendix B]
d_
<9w
1
2
/
_2_
dui J dv\
\
a
duM
f dvM )
d_
<9w
a
du\
d
duM
+ jA
+ 3
dvM )
(2.27)
where the mth entry of w, wm urn + jvm, m = 1 The Bayesian Fisher
information matrix (FIM) [36] is given by
FIM = E
dlog/(x,y) <91og/(x,y)T
dx dx
(2.28)
Based on (2.26) and (2.21), we obtain
FIM = A [I : R-rFPRy1]
Rx RxH*
I
HRj Rj,
-R^hr*
= (Rx-RxH*Ry1HRx)1
= Rj1 + H*(Ry HRXH*)-1H
= R^ + HR^H
(2.29)
(2.30)
(2.31)
Comparing (2.20) and (2.31), we see that
R = log2|Rx|+log2|FIM|
(2.32)
= log2 |Rx| log2 |CRB|
(2.33)
where
CRB = FIM"1 =RX- R.ITR^HR, (2.34)
This shows that there exists a simple relation between the Gaussian MIMO channel
throughput, which is an upper bound of the information transmission rate for any

15
After some straightforward calculations, we have
A-1
lim CUPL Cmtm = K log2 bps/Hz. (2.52)
(IE, *il)v
Note that for any real valued sequence {A//^}^ > 0, the arithmetic mean is greater than
or equal to the geometric mean, or Lii ^h\ > ^h\^J Hence we conclude
that limp^ooCupl Cmtm > 0 and the equality holds if and only if {\h,}: are aU
the same. We infer from (2.52) that the capacity loss of the MTM transceiver can be
quite large if the channel matrix H has a large condition number, which is verified in
Section 3.4.
If the same constellation is used for each subchannel, then the substream cor
responding to the largest Et dominates the overall BER performance. Recall that
a-1
Ei k=1 y, which is proportional to the inverse of AHence the sub-
(P+Z^fc=1
channels may have very different SNRs especially when H has a large condition number.
To mitigate this undesirable effect, one can use the MMD transceiver, or MAX-MSE
(cf. [26] Section V-A5), with
F mmd = Emtm, (2.53)
where is a unitary matrix that makes all the diagonal elements of E in (2.41) the
same, that is,
1 K
E=kYE'- (254)
1=1
According to (2.47), the capacity of the channel using the MMD linear transceiver is
Thus
K
C,
MMD
= -K log2 = -K log2 ^
1=1
(2.55)
j K K
= K^-^YE'~YXo^Ei
= K log2 K
= K log2
1 Li: Ei
1 IK
i ,-i
(iE, e,)
A A-1
K AH,t
(IE, Ail)
(2.56)
(2.57)
(2.58)
Cmtm Cmmd

95
6.4 Conclusions
In this chapter, we introduce two novel matrix decomposition algorithms, including
the geometric mean decomposition and the generalized triangular decomposition. The
GMD H QRP* is a solution of the maximin problem (6.3); the smallest diagonal
element of R is as large as possible. Starting with the SVD, we show that the GMD
can be computed using a series of Givens rotations, and row and column exchanges.
Alternatively, the GMD could be computed directly, without performing an initial SVD,
if H is first reduced by unitary transformations to a real matrix. In a further extension of
our algorithm for the GMD, we show in [51] how to compute a factorization H = QRP*
where the diagonal of R is any vector satisfying the Weyl multiplicative majorization
conditions [69]. The GTD represents the most general unitary decomposition H =
QRP* '. That is, the diagonal r of R must satisfy r < cr, where cr is the vector of singular
values for H, while for any diagonal r with r ^ cr, we can write H = QRP*. The GTD
includes, as special cases, the singular value decomposition, the Schur decomposition,
the QR decomposition, and the GMD. Similar to GMD, given the SVD, the GTD
based on r can also be evaluated using a series of Givens rotations and permutations.
The GTD algorithm provides a new proof of Horns theorem [70], Applications of the
GTD include transceiver design for MIMO communications [72] and inverse eigenvalue
problems surveyed extensively in [73].
Appendix
Matlab Implementation of GTD
/, Input:
7.
"/, H = U*S*V (singular value decomposition of H)
'/, U and V orthonormal columns
/o S diagonal matrix with nonnegative diagonal entries
1 r desired diagonal of R
'/, nnz (r) = nnz (S)
'/ r multiplicatively majorized by diag (S)
7, product nonzero r = product nonzero diag (S)
*/.
'/, Output:
7.

60
where p, 1 < i < L, denotes the output SINR of the th subchannel, and AGa i is given
in (5.18). If
mL = {iog(i+A)}
{igAG,Kii = c>
(5.24)
then (5.20) and (5.23) hold, which implies the existence of fi (the first K columns of
pt).
Conversely, suppose that there exists a semi-unitary matrix 2 such that the linear
precoder F = V capacities {Rk}k=v Let Ga QGaRGa be the QR decomposition. It follows from
Theorem 5.2.3 that (5.20) holds. Hence, by (5.23), we conclude that (5.24) holds.
The proof of Theorem 5.3.1 is constructive. Indeed, given the SVD of H and the
power loading level \$1/,;2, we only need to calculate Ac, AGq, and the GTD of AGo given
in (5.19). Then we immediately obtain the linear precoder
F = V\$1/2Ot = V
&1/2:0Kx{L-K)
P.
(5.25)
Let Qga denote the first Mr rows of QGa. Then it follows from (5.22) that
Qca = U[Ag:0kx(l-K)]AgQ
= U
r: oKx(L-K)
Q,
(5.26)
where T is diagonal with its ith diagonal element being 7
to Lemma 4.1.1, the nulling vectors are calculated as
7t7r According
w = rc!,uqG.,i, 1 < i < L, (5.27)
where rGai is the zth diagonal element of RGq and qGo is the ith column of QGa.
In the GTD algorithm, P and Q are obtained via multiplication of L 1 Givens ro
tation matrices. Hence calculating (5.25) and (5.26) needs 0(Mt(L+K)) and 0(Mr(L+
K)) flops, respectively. We note that the decoding starts with the Lth layer, then the
L lth, and so on.
Given the SVD of H and the power allocation level the TCD-VBLAST scheme
needs to run the procedures summarized in Table 5-1. If Mt Mr K, then the TCD-
VBLAST scheme requires only 0(L2 + K2 + KL) flops, given the SVD of the channel
matrix.

4.5 Numerical Examples 42
4.6 Conclusions 44
5 TUNABLE CHANNEL DECOMPOSITION 52
5.1 Introduction 52
5.2 Channel Model and Preliminaries 53
5.2.1 Channel Model 53
5.2.2 Channel Decomposition 53
5.2.3 Majorization and Generalized Triangular Decomposition .... 55
5.3 Tunable Channel Decomposition 57
5.3.1 TCD-VBLAST 57
5.3.2 TCD-DP 61
5.4 MIMO Communications with QoS Constraints 63
5.5 CDMA Sequence Design 70
5.5.1 CDMA Sequences Maximizing Sum Capacity 71
5.5.4 Numerical Example 73
5.5.5 Further Remarks 75
5.6 Conclusions 75
6 NOVEL MATRIX DECOMPOSITIONS 81
6.1 Introduction 81
6.2 Geometric Mean Decomposition 81
6.2.1 Generalized Maximin Properties 83
6.2.2 Implementation Based on Initial SVD 84
6.3 Generalized Triangular Decomposition 87
6.3.1 Existence of GTD 87
6.3.2 The GTD Algorithm 87
6.3.3 Inverse Eigenvalue Problem 93
6.4 Conclusions 95
7 CONCLUSIONS .' 99
REFERENCES 100
BIOGRAPHICAL SKETCH 105
v

35
The SINRs given in (4.8) are related to R//a as shown in the following lemma:
Lemma 4.1.2 The diagonal o/R//a given in (4-5) and {p,}^ given in (4-8) satisfy
a(l + pi) = r2Haii, i = 1,2,..., Aft- (4-9)
Proof: See Appendix A.
An immediate corollary follows.
Corollary 4.1.3 The MMSE-VBLAST detector is information lossless. That is,
M,
^ log(l + Pi) = log |H*Ha-1 + I|, (4.10)
t=i
where the right hand side of (4-10) is equal to (2.7) with F = 1m,
Proof: From (4.4) and (4.5), we have
H*Ha-1+I = a-1H0*H. = a-1R/io*Riro. (4.11)
Hence
Mt
log |H*Ha_1 +1| = Â£ log (cT1^) (4.12)
=i
According to Lemma 4.1.2,
log |H*Ha-1 +1| = ^ log(l + p).
=i
We note that Corollary 4.1.3 coincides with the findings in [48].
4.2 UCD-VBLAST
If we modify the precoder F given in (2.8) to be
F = V\$1/2iT (4.13)
where 1 Â£ (to avoid capacity loss, we should not choose L < K
in general) and fl*Pl = I, then we see through inserting (4.13) into (2.7) that the
F given in (4.13) is also a precoder maximizing the channel throughput. However,
introducing S7 brings much greater flexibility than the precoder of (2.8). In the following,
we concentrate on how to design 2.

TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
- A CHANNEL DECOMPOSITION PERSPECTIVE
By
YI JIANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2005

56
Definition 2 An n x n matrix P is doubly stochastic if its (i,j)th entry p^ > 0 for
i,j = 1,. , n, and Pij = 1 and Pij = 1.
Theorem 5.2.1 x -<+ y if and only if there exists a doubly stochastic matrix P such
that x = Py.
A square matrix II is said to be a permutation matrix if each row and column has a
single one, and all the other entries are zero. There are n! permutation matrices of size
n x n.
Theorem 5.2.2 The permutation matrices constitute the extreme points of the set of
doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex
hull of the permutation matrices.
It follows from Theorems 5.2.1 and 5.2.2 that the set {x|x -<+ y} is the convex
hull spanned by the n! points which are the permutations of y.
As we have mentioned before, given K parallel subchannels with capacities C\, C2,..., CV,
which are obtained via SVD, TCD can convert the K subchannels into L > K subchan
nels with capacities R\,R2,... ,Rl if and only if (RX,R2,..., Rl) -<+ (Cj,..., Ck, 0,..., 0) G
ML. For example, for a MIMO channel H with rank K 3, assume that the capacities
of the 3 subchannels obtained via SVD are C\ > C2 > C3. If L = K, then TCD can
decompose the MIMO channel into 3 subchannels with a rate vector r = (Rx, R2, R3) if
and only if r lies in the convex hull
Co
(A
C2
V Ca )
Here Co stands for the convex hull defined as
(5.9)
CofS} {01! + + 6Â¡ 0,9\ -F ... -)- 6Â¡< = 1}. (5.10)
In general, the capacity region is a convex hull defined by K\ vertices in a K-
dimensional space. Since the TCD is capacity lossless, i.e., Q the
capacity region falls into a (K l)-dimensional hyperplane. The gray area in Figure
5-1 shows the convex hull of (5.9) with Ci = 3, C2 = 2, and C3 = 1. In this case, the 6
vertices lie in the 2-D plane {x : Y^Â¡=i xi =6}. An interesting special case is the UCD
scheme [59], which achieves the rate vector corresponding to the center of the convex
hull, i.e., r = (2, 2,2).

13
before, this inflexible transceiver scheme can bring many difficulties to the subsequent
coding/decoding and modulation/demodulation procedures.
2.3 Rate Performance of Linear Transceivers
To gain more insights into the limitations of the linear transceiver designs, we ana
lyze the asymptotic rate performances of two typical linear transceiver designs for high
SNR. We will show that the linear transceivers may suffer from considerable capacity
loss and there is apparently a fundamental tradeoff between the throughput and the
BER performance.
According to the channel model of (2.1), the received data vector is
y = HFx + z. (2.39)
The optimal linear receiver is always the LMMSE equalizer (also see, e.g., [23])
Gop( = F*H* (HFFHV2 + ct2I)_1, (2.40)
which yields the optimal estimate of the information symbol s = G^y. The mean-
squared-error (MSE) matrix of s is
E = (I + a_1F*H*HF)_1. (2.41)
Note that E is a function of the linear precoder F. In the following, we analyze two linear
precoder designs based on the minimization of the trace of the MSE matrix (MTM) and
the minimization of the maximum diagonal elements of MSE matrix (MMD) criteria,
which are referred to as ARITH-MSE and MAX-MSE in [26], respectively. We choose
these two schemes because they appear to be the most typical ones and the MMD
scheme yields the optimal (or very close to the optimal) performance among all the
linear transceivers. Indeed, the MMD is equivalent to the linear MIN-BER scheme in
the flat fading channel case (see [26]). We do not consider the SVD plus water filling
The MTM scheme, or ARITH-MSE, which has appeared in several linear transceiver
design papers (see, e.g., [22] [23] and [26]), attempts to minimize tr(E) with respect to
F. The MTM precoder turns out to be
F MTM = V*1/2
(2.42)

66
Proof: We replace the inequalities in (5.50) by the equivalent constraints obtained
by taking logs:
k k
Y los(^) > Y 1 =1 =1
The Lagrangian C associated with (5.50), after this modification of the constraints, is
K / k
p) = Y ( Vk Y (los(^*)_ los(A))
k= 1 \ =1
By the first-order optimality conditions associated with ip, there exists i > 0 with the
property that the gradient of the Lagrangian with respect to ip vanishes. Equating to
zero the partial derivative of the Lagrangian with respect to ipj, we obtain the relations
K
i=j
Hence, ipj ipj+1 = p,j >0.
Using Lemma 5.4.1, we can gain insights into the structure of a solution to (5.49).
Lemma 5.4.2 There exists a solution ip to (5.49) with the property that for some integer
Vt+i < ipi for all i < j, Vi+i > fpi for oil i > j, ipi = -rj for all i > j. (5.51)
*H,i
In particular, ipj < ipi for all i.
Proof: If ip is a solution of (5.49) with the property that ipi > -J for all 1 < j <
K, then by the convexity of the constraints, it follows that ip is a solution of (5.50). By
Lemma 5.4.1, we conclude that Lemma 5.4.2 holds with j K. Now, suppose that ip is
a solution of (5.49) with ipi = 1 /X2H i for some i. We wish to show that ipk = 1 /X2H k for
all k > i. Suppose, to the contrary, that there exists an index k > i with the property
that ipk = 1/A2Hk and ipk+i > 1/A^ fc+1. We show that components k and k + 1 of ip
can be modified so as to satisfy the constraints and make the cost strictly smaller. In
particular, let ip(e) be identical with ip except for components k and k -f 1:
ipk(e) = (1 + e)ipk
and
ipk+i(e) =
*Pk+1
1 + e
(5.52)
For e > 0 small, ip(e) satisfies the constraints of (5.49). The change A(e) in the cost
function of (5.49) is
A(e) (1 + e)ipk +
A+i
-ipk- ipk+1.
1 + e

47
where Â£a {{aj x : Ya=i a > m}' Hence
P{Â£) =
2
Mm+i
< p ai)dai... dam
(4.57)
Prom (4.55), we know that as e * 0,
P(gl Using (4.52) (4.58) and (4.57), we calculate the diversity gain as
<^gmd(^Uim)
i r TTm P m+*)Q ,
_ Jej 1 li=l (M-m-fi)!
poo log P
log L.+ p~ (Ai-m+*)- don dam.
- lim 2 -
p>oo log P
m
m + z)a,
Â£q i=i
(4.59)
(4.60)
where Â£+ = Â£Q H^t > 0, i = 1,..., m). To obtain (4.60) from (4.59), we have used the
property that the integral in the numerator of (4.59) is dominated by the term with the
SNR exponent closest to zero, as p oo (see [16] for details). Here the integration is
constrained over Â£+ because the integration over Â£a is dominated by the one over Â£+.
The reason is as follows. Suppose only ani,... 1anj > 0, j < m, and the other as,
Qfcl,.. .,akm_., are negative. Then
m j
n^(5M-m+i < P ) ~ < P
i=l =1
Let Â£+ denote {{a,H=1 > 0 : Â£<=i ani > m -
j m
*"i) p-EUl(.M-m+ni)Qni
afcJ. Clearly,
inf (M m + rii)an > inf (M m + i)at,
+
which implies that the integration over Â£a is dominated by that over Â£+. Solving the
optimization problem of (4.60) yields
dGtjm(M, m) = (M -m + 1 )m.
(4.61)

8
2.1.2 Channel Capacity
Suppose x is a Gaussian random vector. The capacity of the MIMO channel (2.1)
is
C = log2
|<72I + (T2HFF*H*|
kzi|
(2.6)
where | | denotes the determinant of a matrix. If both CSIT and CSIR are available, we
can maximize the channel capacity with respect to F given the input power constraint
cr2Tr{FF*} = pa\. That is,
CjT = max log2 |I + a-1HFF*H*|, (2.7)
<7Â¡Tr{FF*}=p<72
where a is as defined in (2.2) and the subscript of Cjt stands for informed transmitter.
Denote the SVD of H as H = UAV*, where A is a K x K diagonal matrix whose
diagonal elements {A//*})^ are the nonzero singular values of H. The solution to F in
(2.7) is [3]
F = V\$1/2. (2.8)
Here \$ is diagonal whose /cth (1 < k < K) diagonal element 4>k determines the power
loaded to the fcth subchannel and is found via water filling to be
+
(2.9)
Mp0 = yi~ ^ j
with fi being chosen such that Ylk=i fail*) Pal an(l (a)+ = max{0,a}. Then the
solution to (2.7) is
Cjt = log2 (l + Atf'j bps/Hz. (2.10)
k=i \ a /
Note that some of *.s can be zeros. In this case, we can only transmit L < K data
streams.
If the CSIT is not available, the optimal transmission strategy is to evenly allocate
I poc
power to each antenna [3], For this case, F = \/7tIm( and the channel capacity with
V Mt
uninformed transmitter (UT) is
Cut = log2 f 1 + ) bps/Hz.
n=l t
(2.11)
1 Throughout this dissertation, we assume that the coherent time of the channel goes
to infinity. Hence advanced coding is applicable to approach the Shannon capacity.

61
Table 5-1: The TCD-VBLAST Scheme
step
operation
flops
1
Calculate Ac = A\$1/i2
0(K)
2
Obtain Ago using (5.18)
O(K)
3
Apply GTD to A.Ga to obtain (5.19)
SWT
4
Generate F using (5.25)
0(Mt(L + K))
5
Compute using (5.26)
0(Mr(L + K))
6
Calculate using (5.27)
0{MrL)
5.3.2 TCD-DP
Similar to UCD, the TCD also have two implementation forms, which are dual to
each other. As a dual form of TCD-VBLAST, the TCD scheme can be implemented by
using a DP precoder, which we refer to as TCD-DP. For TCD-DP, a direct construction
of the linear precoder F as done in Section 5.3.1 is not obvious. Instead, we exploit the
uplink-downlink duality revealed in [49] to obtain TCD-DP. This technique is also used
in [59].
We first apply the TCD-VBLAST scheme to the reverse channel
y = H*Fx + z, (5.28)
where the roles of the transmitter and receiver are exchanged and the H in (5.1) is
replaced by H*. Then we obtain the precoder F and the equalizer W = [wi,..., w]
from H* according to (5.25) and (5.27), respectively. Applying F and the VBLAST
detector with nulling vectors {w}f=1, we obtain L subchannels
1
w*y = ^ + w*z, i = 1,..., L,
j=i
(5.29)
where the zth subchannel (5.29) is free of interference from the yth (j > i) subchannels
which are detected and cancelled out in advance. The SINR of the subchannel (5.29) is
Pi
lw*H*f|2a^
(5.30)
Note that replacing w by w,, which is obtained by scaling w, such that ||w|| = 1, does
not change p, since the output SINR is invariant to the length of w,. Also note that
a = 1, i.e., a\ a2. Hence (5.29) can be simplified to be
|w*H*f|2
Pi =
(5.31)

59
Let the augmented matrix Ga be defined as
Ga =
UAGf2T
h
(5.16)
J (Mr+L)xL
After some straightforward calculations, we can obtain the SVD of Ga as the following:
, -i
Ga =
U[Ag : 0Kx(L_K')]AGa
f0A'
A-Ga 0 >
(5.17)
where fi0 RLxL is orthogonal with its first K columns forming fi and the diagonal
matrix AGa contains the singular values of Ga:
\A + A1 < < K,
1, i > K.
(5.18)
According to Theorem 5.2.3, we can apply GTD to obtain
AGa = QRGaPT
(5.19)
if and only if the diagonal elements of RGa E RÂ£'xi', which we denote as {'g0,*}1,
satisfy
{AGo,af=1. (5.20)
Note that both Q and P in (5.19) are real-valued matrices because AGa is a real-valued
diagonal matrix. Inserting (5.19) into (5.17) yields
G =
U[Ag : 0Kx(L-K)]AGa
fi0AGJ
TnT
QRc, P fi
(5.21)
Choose fi0 = PT and define
QGa =
U[Ag : QKx(L-K)]A.Gla
fioA,
-i
Q
(5.22)
Then (5.21) can be rewritten asGa = QGaRGa, which is the QR decomposition of Ga.
By Lemma 4.1.2, it follows that for a = 1, (5.20) is equivalent to
{Id- P*}=i = {rGa,,,},=! dx {AGoii}f=1,
(5.23)

78
Appendix B
Proof of Lemma 5.4.3
First suppose that 7Â¡ > 1 /Af. By the arithmetic/geometric mean inequality, the
problem
1 1 1
min ^ fai subject to n**n A, ^>0, (5.73)
1=1 1=1 1=1
has the solution fai =7Â¡, 1 < i < l. Since A//, is a decreasing function of i and
71 > 1 /X2Hi, we conclude that fai 7; satisfies the constraints fai > 1/Af{i for 1 < i < l.
Since l attains the maximum in (5.53),
k
if >n&
i=l
for all k <1. Hence, by taking fa = 7/ for 1 < i < Z, the first l inequalities in (5.49) are
satisfied, with equality for k = l, and the first l lower bound constraints fa > 1/A^t are
satisfied.
Let fa denote any optimal solution of (5.49). If
1 1
ii*-n (s-74)
=i =1
then by the unique optimality of fa = 7*, 1 < i < l, in (5.73), and by the fact that the
inequality constraints in (5.49) are satisfied for k G [1,Z], we conclude that fa = 7Â¡ for
all i [1, /]. On the other hand, suppose that
1 1
n>*?>n>=t- (5-75)
Â¡=1 =i
We show below that this leads to a contradiction; consequently, (5.74) holds and fa = 7/
for i [1,/].
Define the quantity
i/Â¡
7*=(n ^
u=i
By (5.75) 7* > 7j. Again, by the arithmetic/geometric mean inequality, the solution of
the problem
( 1
min fa subject to n fa > 7I1 ^>0, (5.76)
1=1 1=1
is fa 7 for i [1,Z]. By (5.75), 7* > 7Â¡ and fa satisfies the inequality constraints in
(5.49) for k 6 [1,Z],

91
where the last equality in (6.25) comes from (6.22). Hence, for j > t, it follows that
nw = wimMn) ( n w)
=1 \.=1 / \i=t+2 /
= wi*fi(nnfij(iift+i)-Nf[iifi. m
\i=i / \*=t+i / <=i
Combining (6.23), (6.24), and (6.26), we have a+ < b+.
Case 2: s > t. As before, (6.23) holds. For t < j < s, we have
nwi-nwsnw-Mnufi.
=1 i=l =1 t=l
where the first equality comes from the relation j < s, the middle inequality is the
induction hypothesis, and the last equality is (6.26). Rearranging this gives
fDnwisnw. <6-27)
Since |iZj|/1rfc| = |aj|/|as| > 1 when j < s, we deduce from (6.27) that
aj-i -< bnj-i
when j < s. This also holds for j > s due to (6.24) and (6.26). This completes the
proof of (c).
Hence, there exists an upper triangular matrix \ with occupying the first
K 1 diagonal elements, and unitary matrices Q and P, = 1,2,..., /C 1, such that
Rw = (QU Q;QI)S(P,P2 ... Pfc_i).
(6.28)
Equating determinants in (6.28) and utilizing the identity r-k> = for 1 < i < K 1,
we have
K
nw
=i
Wi
k)\ / K
K
K
' K
Vk\
nw =n-*=nN.
0=1
i=l
i=l
where the last equality is due to the assumption r -< a. It follows that |rj^| = |r/c|.
Let C be the diagonal matrix obtained by replacing the (K, K) element of the identity
matrix by /rK. The matrix C is unitary since |rfc|/|r^| = 1. The matrix
R = C*R(*>
(6.29)
has diagonal equal to r due to the choice of C.

82
Here the ctj are the singular values of H, and a is the geometric mean of the positive
singular values. Thus R is upper triangular and the nonzero diagonal elements are the
geometric mean of the positive singular values.
We were led to this decomposition when trying to optimize the performance of
multiple-input multiple-output (MIMO) systems. However, this decomposition has
arisen recently in several other applications. In [64, Prob. 26.3] Higham proposed
the following problem:
Develop an efficient algorithm for computing a unit upper triangular matrix
with prescribed singular values 1 < i < K, where the product of the ax
is 1.
A solution to this problem could be used to construct test matrices with user specified
singular values.
The solution of Kosowski and Smoktunowicz [65] starts with the diagonal matrix E,
with 2-th diagonal element crÂ¡, and applies a series of 2 by 2 orthogonal transformations
to obtain a unit triangular matrix. The complexity of their algorithm is 0(K2). Thus
the solution given in [65] amounts to the statement
QjEPo-R, (6.1)
where R is unit upper triangular.
For general E, where the product of the ax is not necessarily 1, one can multiply E
by the scaling matrix a-1I, apply (6.1), then multiply by a to obtain the GMD of E.
And for a general matrix H, the singular value decomposition H = VEW* and (6.1)
combine to give the H = QRP* where
Q = VQo and P = WPu.
According to (3.11), we consider the problem of choosing Q and P to maximize the
minimum of the
max min : 1 < i < K\
Q.P j
subject to QRP* = H, Q*Q = I, P*P = I, (6-2)
r,j = 0 for i > j, Re MA *K,
where K is the rank of H. Since the GMD of H is feasible in (6.2), we conclude that
the GMD yields the optimal solution to (6.2).

16
where to get (2.58) from (2.57) we have used (2.46). Note that the relative capacity loss
of MMD compared with MTM is independent of SNR given that all the subchannels
are used. Interestingly, we can see from (2.58) and (2.52) that Cmtm ~ Cmmd =
lim^oo Cupl Cmtm We conclude that asymptotically for high SNR, the MMD
transceiver has twice the capacity loss of MTM, i.e.,
V A-1
lim CVPL Cmmd = 2K log2 ^ ^ bps/Hz, (2.59)
(nr., as-)
although it may yield better BER performance. An intuitive explanation of the capacity
loss of the MMD transceiver is as follows. Note that the only difference between MTM
and MMD is the prerotation matrix , which is an invariant operator in terms of
information capacity. However, makes the MSE matrix E non-diagonal, which means
that the elements of s = Gopty are correlated. Clearly, the correlation contains useful
information for symbol detection and decoding. However, the linear equalizer ignores
the correlation, which results in the additional capacity loss quantified in (2.58). The
analyses here are verified in Section 3.4.
In summary, the MTM transceiver suffers from capacity loss of (2.52) due to the
ceiver suffers from additional capacity loss because it makes the MSE matrix non
diagonal. Hence there is an apparently inevitable tradeoff between the information
rate and BER performance if the same symbol constellation is used in the different sub
channels. In the next chapter, we will introduce the GMD scheme and clarify that there
is not necessarily a tradeoff between BER performance and channel capacity. Indeed,
the GMD scheme attempts to achieve the best of both worlds simultaneously.

25
subchannels, which is numerically verified to be able to achieve near optimal capacity
even at low SNR.
Let us sort the singular values of H as A// i > Ah,2 > ^h,k > 0. If GMD is
constrained to the first n < K eigen subchannels, we obtain n identical subchannels
Di = \nXi + Zi, for i = 1,... ,n.
where
A =
\=i
(3.29)
(3.30)
To maximize the channel throughput with our GMD scheme, we need to solve the
following problem
max n log 1 +
l n5<
or
max
l 1 +
\ *=i
(3.31)
(3.32)
The solution to this problem is straightforward. We can use either linear search or
bisection method to find the optimal n.
Several remarks are in order, i) It is straightforward to incorporate the channel
selection into the GMD algorithm. In Section 6.2.2, we show that GMD starts from
SVD H = UAV* and then applies a series of Givens transformation to A to make it
upper triangular. The Givens transformation can be constrained to the first n < K
diagonal elements of A. ii) The blind channel subspace tracking can be combined with
the subchannel selection strategy seamlessly. If only the subchannels corresponding to
the largest n < K singular values are selected, the blind channel tracking technique will
track the n dimensional subspace automatically, iii) The performance loss of the GMD
scheme at low SNR region is due to the well-known fact that the zero-forcing equalizer
is inherently suboptimal. In the next chapter, we propose the so-called uniform channel
decomposition (UCD) scheme, which can decompose a MIMO channel into multiple
identical subchannels in a strictly capacity lossless manner.
3.3.4 Further Remarks
The author later noticed [30] in which an idea similar to GMD was proposed to
approach the performance of the ML detector in the ISI suppression scenario. For a
SISO ISI channel, if symbols are precoded and transmitted in a block manner, then the

74
the spreading sequences of the four users as the columns of the matrix
( 10.0000 -12.0745 -6.4974 -3.0926 ^
S =
V
0 0 7.4138 -15.5760
0 8.8312 -13.3801 -6.3686
The nulling vectors used by the base station are the columns of the matrix
W =
/ 0.0990 -0.0015 -0.0037 -0.0104 ^
0 0 0.1157 -0.0522
0 0.1098 -0.0077 -0.0213
(5.63)
(5.64)
We note that for this uplink scenario, the base station detects the fourth mobile user,
which has the spreading sequence corresponds to the fourth column of S, first and the
first user last.
If the prescribed SINRs of the four users remain the same in the downlink scenario,
the spreading sequences used by the base station are the four columns of the matrix
F -
1 17.1936
0
l
-0.2303
0
17.0149
-0.5154
16.0012
-1.0614
-1.2796 ^
-6.4449
-2.6352 y
(5.65)
In this case, the base station applies the dirty paper precoder to the first mobile user
first and the last user last. Note that the columns of F and W in (5.64) are the same
up to a scaling factor. Moreover, tr(FFT) = tr(SST) = 892.7274. which means that the
power consumed in the base station equals to the overall power used by the four mobile
users. At the mobile end, the users use the nulling vectors
S =
1 10.0000
0
V 0
-12.0745 -6.4974
0 7.4138
8.8312 -13.3801
-3.0926 N
-15.5760
-6.3686 ,
(5.66)
which are exactly the spreading sequences used in the uplink scenario. Scaling the output
signals may be necessary at the mobile ends for the subsequent dirty paper-decoder. But
the signal scaling does not influence the output SINR.
If zeros are not allowed in the spreading sequences, we can left multiply S and W
a 3 x 3 orthogonal matrix to eliminate the zeros in S.

41
Definition 4.4.1 Let Pe(p) denote the average error probability of a scheme at SNR p.
The diversity gain of the scheme is
d =
lim !2fS.
p-> OO log p
(4.41)
The diversity gain measures how fast the error probability decays with SNR. We note
that diversity gain is usually discussed without assuming the availability of CSIT. The
reason is that diversity gain is a concept associated with channel outage, i.e., the case
where the channel is too poor to support a target data rate. Using CSIT, one can adjust
the transmission rate to avoid channel outage. However, if the rate is fixed, which is
desirable in practice, we can also use diversity gain as a performance measure of the
transceiver designs. Based on this observation, we analyze the diversity gains of the
UCD and GMD schemes. The result is summarized in the following proposition.
Proposition 4.4.2 Consider the i.i.d. Rayleigh flat fading MIMO channel defined in
(4-1). Let M = max(Mt,Mr) and m min(Mt,Mr). The diversity gains of the GMD
and the UCD schemes are
dGMD{LI,m) (M m + 1 )m, and dUCD(M,m) = Mm, (4.42)
respectively.
We have applied the typical error event analysis (see [16] [50]) to obtain (4.42). The
details are relegated to Appendix B. We see that although UCD has a negligible coding
gain compared with the GMD scheme at high SNR, it has an additional m2 m diversity
gains over GMD. An interesting point to make is that water filling does not help improve
diversity gains. Hence at high SNR, water filling is useless in both capacity and diversity
aspects.
Given the fact that the GMD scheme is asymptotically capacity lossless for high
SNR, it is rather surprising to see the large diversity loss of GMD compared with UCD.
We give an intuitive explanation as follows. Note that diversity gain is determined by
the typical error events that the MIMO channel is in deep fade. Namely, the diversity
gain of a scheme depends on its ability of dealing with bad channels. A deeply faded
channel with high input SNR is equivalent to a normal channel with low SNR, in
which scenario the GMD scheme is far less efficient than UCD as shown in the numerical
examples. Consequently, the GMD has less diversity gain than UCD.

REFERENCES
[1] I. E. Telatar, Capacity of multiple antenna Gaussian channels, AT&T Technical
Memorandum, June 1995.
[2] G. J. Foschini, Jr., Layered space-time architecture for wireless communication in
a fading environment when using multi-element antennas, Bell Labs Tech. Journal,
vol. 1, pp. 41-59, Autumn 1996.
[3] I. E. Telatar, Capacity of multi-antenna Gaussian channels, European Transac
tions on Telecommunications, vol. 10, no. 6, pp. 585-595, 1999.
[4] G. J. Foschini and M. J. Gans, On limits of wireless communications in a fading
environment when using multiple antennas, IEEE Journal on Selected Areas in
Communications, vol. 17, pp. 1841-1852, November 1999.
[5] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, Simplified
processing for high spectral efficiency wireless communication employing multiple-
element arrays, Wireless Personal Communications, vol. 6, pp. 311-335, March
1999.
[6] M. Sellathurai and S. Haykin, Turbo-BLAST for wireless communications: Theory
and experiments, IEEE Transactions on Signal Processing, vol. 50, pp. 2538-2545,
October 2002.
[7] M. Sellathurai and G. J. Foschini, Stratified diagonal layered space-time architec
tures: signal processing and information theoretic aspects, IEEE Transactions on
Signal Processing, vol. 51, pp. 2943-2954, November 2003.
[8] G. J. Foschini, D. Chizhik, M. J. Gans, C. Papadias, and R. A. Valenzuela, Analy
sis and performance of some basic space time architectures, IEEE Journal on
Selected Areas in Communications, vol. 21, pp. 303-320, April 2003.
[9] B. Hassibi, A fast square-root implementation for BLAST, Thirty-Fourth Asilo-
mar Conf. Signals, Systems and Computers, pp. 1255-1259, November 2000.
[10] S. L. Ariyavisitakul, Turbo space-time processing to improve wireless channel ca
pacity, IEEE Transactions on Communications, vol. 48, pp. 1347-1358, August
2000.
[11] J. Benesty, Y. Huang, and J. Chen, A fast recursive algorithm for optimum sequen
tial signal detection in a BLAST system, IEEE Transactions on Signal Processing,
vol. 51, pp. 1722-1730, July 2003.
[12] S. M. Alamouti, A simple transmit diversity techniques for wireless communica
tions, IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1451-1458,
October 1998.
[13] V. Tarokh, N. Seshadri, and A. R. Calderbank, Space-time codes for high data
rate wireless communications: Performance criterion and code construction, IEEE
Transactions on Information Theory, vol. 44, pp. 744-765, March 1998.
100

21
semi-unitary as well. Indeed, the constraint that P and Q should be semi-unitary is in
fact inactive as shown in the following lemma established in Section 6.2.1.
Lemma 3.2.1 The GMD of (3.12) is also the solution to the following optimization
problem with relaxed constraints:
max min {ra : 1 < i < K\
P,Q
subject to R = Q*HP, r^ = 0 for i > j, Re RKxK,
(3.13)
r > 0, 1 < i < K,
tr(Q*Q) < K, tr(P*P) < K.
Proof: Omitted. See Section 6.2.1 for details.
The GMD, which can be viewed as an extended QR decomposition, can be read
ily combined with the aforementioned VBLAST (GDFE) or ZFDP. GMD-VBLAST is
implemented as follows: We first calculate the GMD H = QRP*. Next we choose the
precoder F = P, then the equivalent data model is
y = QRx + z. (3-14)
The next step is nothing but the VBLAST detector.
Ignoring the error propagation effect, we can regard the resulting subchannels as K
independent and identical subchannels
yi = XHxi + zi, for i = l,...,K. (3.15)
The GMD-ZFDP scheme is similar to GMD-VBLAST because of the duality be
tween VBLAST and ZFDP.
3.3 Performance Analyses and Implementations Issues
In this section, we first present the performance analyses of the GMD scheme from
capacity perspective, from which we demonstrate the advantages of our GMD scheme
over the linear transceivers. Next, we consider combining the GMD scheme with the
blind two-way channel subspace tracking in the TDD scenario. To achieve close to opti
mal performance at low SNR, we propose to combine GMD with subchannel selection.
Finally, we discuss the relationship between our GMD scheme and [30].
3.3.1 Performance Analyses
As we have mentioned earlier, the overall BER performance of a MIMO commu
nication system is dominated by the worst subchannels asymptotically for high SNR.

80
Since the product of the components of 0* is equal to the product of the components
of f3, from (5.78) and (5.79) we get
K K
7* na=n* w)K
=1 i=l
Hence, 7 > 0* > 1/A^ fr z J- particular, if Z < j, then 7Â¡ > 1/A#,,
or, l > j when 7Â¡ < 1/A#,. As a consequence, 0* = j}.

49
M, = 10, M > 10, SNR = 0 dB M, = 10, M = 10, SNR = 10 dB
t r I r
(a)
(b)
M -10, M =10, SNR = 20 dB
t r
Mt = 10, Mf = 10, SNR = 30 dB
Figure 4-1: Complementary cumulative distribution function of the capacity of an i.i.d.
Rayleigh flat fading channel with Mt = 10 and Mr 10. Results based on 2000 Monte
Carlo trials. SNR = (a) 10 dB, (b) 10 dB (c) 20 dB, and (d) 30 dB.

89
Column k
X
X
X
X
X
X
X X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
ix
o!
0
0
; X
X:
0
0
Row
k
X:
0
0
X ;
0
0
X
0
X
0
X X
n*RU)n Gjn'R^nG,
Figure 6-2: The operation displayed in (6.15)
ITR^n, and Gj
If |<5i| = |<521 = |r/t|, we take c = 1 and s = 0; if |i| ^ |2|, we take
c<5j s62
X 0
c s
rk x
S2 cSi
0 s2
s c
0 y
(6.15)
(G5)
(n*R(fc)n)
(Gi)
(R(*+b)
In either case,
c
x =
,H2-|<52|2
| stf-iw
sc(|2|2-|1|2)rfc
and s = \/l c2.
and y =
(6.16)
(6.17)
N2 M2
n*R(>n to G5n*RWnGx. The dashed box is the 2 by 2 submatrix displayed in
(6.15). Notice that c and s, defined in (6.16), are real scalars chosen so that
c2 + s2 1 and (?\6\ |2 + s2|2|2 = |r*.|2. (6.18)
With these identities, the validity of (6.15) follows by direct computation. By the choice
of p and q, we have
M < M < |ii|. (6.19)
If |5i| |2|, it follows from (6.19) that c and s are real nonnegative scalars. It can be
checked that the 2 by 2 matrices in (6.15) associated with Gi and G. are both unitary.
Consequently, both GÂ¡ and G2 are unitary. We define
R(fc+1) = (nG2)*R(fc)(nG!) = Q;R(i')pfc,

85
II be a permutation matrix with the property that ITR^II exchanges the (k + l)-st
diagonal element of with any element r^,, p > k, for which rw < a. If < a,
then let II be chosen to exchange the (k + l)-st diagonal element with any element rpp,
p > k, for which > a. Let 8\ = r[^ and 82 = rpp denote the new diagonal elements
at locations k and k + 1 associated with the permuted matrix IIRa n
Next, we construct orthogonal matrices Gi and G2 by modifying the elements in
the identity matrix that lie at the intersection of rows k and k + 1 and columns k and
k + 1. We multiply the permuted matrix IIR^n on the left by Gj and on the right
by Gi. These multiplications will change the elements in the 2 by 2 submatrix at the
intersection of rows k and k + 1 with columns k and k + 1. Our choice for the elements
of Gj and G2 is shown below, where we focus on the relevant 2 by 2 submatrices of Gj,
IIRWn, and G,:
(6.8)
c8\
s2
1
0
c s
1
QI
H
o
<0
1
1
c8\
1
O
s c
0 y
(GJ)
(nR<>n) (Gj)
(R(fc+1>)
If 1 = 82 = cr, we take c = 1 and s = 0; if 8\ ^ 82, we take
In either case,
c =
x -
62 and s = vT
A
and y = .
a
(6.9)
(6.10)
Since a lies between 1 and 82, s and c are nonnegative real scalars.
Figure 6-1 depicts the transformation from R(fc) to GjlIR^IIGi. The dashed
box is the 2 by 2 submatrix displayed in (6.8). Notice that c and s, defined in (6.9), are
real scalars chosen so that
c2 + s2 = 1 and (c8\)2 4- (s2)2 = a2.
With these identities, the validity of (6.8) follows by direct computation. Defining
Qk = IIG2 and P/t = nG^ we set
R(k+D = QTR(^)Pfc (6,n)
It follows from Figure 6-1, (6.8), and the identity det (R(fc+1>) = det (R(fc)), that (a)
and (b) hold for L = k+ 1. Thus there exists a real upper triangular matrix R^, with

45
H
hi
Ha,, =
vM
7 ha,i
0(-l)xl
0(M,-t)xl
Let H, (H) denote the submatrix containing the first i columns of Ha (H) and ha i
(hj) the th column. Then
(4.44)
For the QR decomposition Ha = Q//aR//a, the geometric implication of rua ii is
the component of haa projected onto the subspace spanned by the ith column of Q//o,
i.e., qHa,i- Note that q//ai is orthogonal to the subspace spanned by {qH0,j}j=i or,
equivalently, the column space of Hence
rHa, = K^h^K, (4-45)
where stands for the orthogonal projection onto th null space of AT. Therefore
2 i_ *
^Ha,ii ^a,i
I-HQ,i_! (HV.jH^x) 1 H;,_!
Inserting (4.44) into (4.46) yields
ba.i-
(4.46)
rlaM = a + h* I H_x (H^jHi-i + a:!)-1 H*_j
h,
= a + ah? (Hi-iHJ.! + al) 1 h,
(4.47)
From (4.8), we see that
Pi = h* (Hj_1H*_1 + al)-1 hj. (4.48)
Hence r2Ha ii = a(l + pt). The lemma is proven.
Appendix B
Proof of Proposition 4.4.2
Without loss of generality, we assume H e CMxm, each of whose entry is of cir
cularly symmetric Gaussian distribution with zero-mean and unit variance. Consider
BPSK modulation. The average error probability of the GMD scheme is
Q ^\/2Pgmd^ E
qUwb)
= E
Q
t
\
/ m \ X/m\
2p (n )
V
\=i / y
(4.49)

CHAPTER 3
MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOMPOSITION
3.1 VBLAST and ZF-DP
In this section, we first give a brief introduction to the VBLAST architecture [5],
which is equivalent to the generalized decision feedback equalizer (GDFE) [37]. We also
introduce the more recent zero-forcing dirty paper precoder (ZFDP) applied to the
3.1.1 VBLAST
VBLAST is a simple suboptimal receiver interface which is used in the MIMO
system assuming that only CSIR is available. For a MIMO system (2.1) with Mt <
Mr and rank K = Mt, the transmitter allocates independent bit streams across the
Mt transmitting antennas with no precoding. To decode the transmitted information
symbol, VBLAST first estimates the signal with the spatial structure hAit, where h,
denotes the zth column of H, and then cancels it out from the received signal vector.
Next, it estimates the signal with spatial structure hA/t-i and so on. The signal estimator
can be either the ZF or MMSE estimator. Some proper reordering of the columns of H is
helpful to improve the BER performance [5]. This decoding scheme involves sequential
nulling and cancellation which is proved to be equivalent to the generalized decision
feedback equalizer (GDFE) [37].
The ZF nulling step in the VBLAST scheme can be represented by the QR decom
position H = QR where Q is an Mt x K matrix with orthonormal columns and R is a
K x K upper triangular matrix. Let us rewrite (2.1) as
y = QRx + z. (3.1)
Multiplying Q* to both sides of (3.1) yields
y = Rx + z, (3.2)
17

43
schemes. The complementary cumulative distribution functions (CCDF) of the capacity
drawn out of 2000 Monte-Carlo realizations of H are shown in Figure 4-1. We see that
the UCD scheme outperforms the GMD scheme significantly at low SNR although the
difference becomes smaller at higher SNR.
Figure 4-2 shows the CCDFs of the channel capacities of a 5 x 5 independent
Rayleigh flat fading channel with SNR equal to 25 dB. The five thin dashed curves
denote the channel capacities of the five subchannels obtained via SVD plus water
filling. Note that the leftmost thin dashed curve crosses the vertical axis at a value
less than one, which means that the worst subchannel (corresponding to the smallest
singular value of the channel matrix) is sometimes discarded by water filling. The thick
solid line is the CCDF of the capacity of the L = 5 subchannels obtained via UCD. All
these subchannels have the same capacity. As discussed in Section 4.2, a rank K MIMO
channel can be decomposed into L > K subchannels. The thin solid line represents
the case where a MIMO channel is decomposed into 7 identical subchannels using the
UCD scheme. Figure 4-2 demonstrates the advantages of our UCD scheme over the
conventional SVD plus bit allocation scheme (see, e.g., [19]). The channel capacities
of the 5 subchannels obtained via SVD plus water filling range from 0 to about 11
bps/Hz, which suggests that the BPSK or QPSK modulation should be used to match
the capacity of the worst subchannel and something like 1024 or 2048 QAM to the best
subchannel. This bit allocation significantly increases the modulation/demodulation
complexity. Using GMD or UCD, we can decompose a rank 5 MIMO channel into 5
subchannels and hence the same constellation with a reasonable size, say 128-QAM, can
be used to reap most of the channel capacity. The UCD scheme can do even better.
In this example, after decomposing a MIMO channel into 7 subchannels via UCD, we
can apply a small to moderate constellation, say 16-QAM or 64-QAM, to achieve the
channel capacity.
In the third example, we assume Rayleigh independent flat fading channels with
Mt 4 and Mr = 4. We compare the BER performance of the GMD and UCD
schemes along with the conventional MMSE-VBLAST with optimal detection ordering
in Figure 4-3. We see that both GMD and UCD outperform the conventional VBLAST
detector significantly. Moreover, the BER vs. SNR lines of the GMD and UCD schemes
have much steeper decreasing slopes, which means much better diversity gains, than the
conventional VBLAST. The diversity gains of the GMD and UCD schemes are 4 and 16,

22
Hence the scheme optimizing the worst subchannel can enjoy the optimal BER per
formance for high SNR. This observation is also the motivation of the aforementioned
MMD scheme. As a major advantage over the linear transceiver schemes, the GMD
scheme is also asymptotically optimal in terms of the channel capacity for high SNR as
we will show below.
If the signal power is allocated evenly to the K subchannels, then based on (3.15),
we get
Cgmd = K\og2 (l +(3.16)
where p is defined in (2.2). The channel capacity with uniform power loading on the K
subchannels is (see (2.50))
K
Cupl = ^ log2 (l + ) (3.17)
It follows from (3.16) and (3.17) that
Cupl CGmd = log2
rin=l (! + P^H,n)
(1 -I- pXjj)
From (3.12b) and (3.18), we have
(3.18)
lim Cupl Cgmd 0. (3.19)
p* OO
Based on Lemma 2.1.1
lim CÂ¡T Cupl = 0. (3.20)
p+ OO
Hence, it follows from (3.19) and (3.20) that
lim Cit Cgmd = 0, (3-21)
p*oo
i.e., for high SNR the GMD scheme is asymptotically optimal.
Hence the GMD scheme does not need to make the tradeoff between the information
rate and BER performance as the conventional linear transceivers. Instead, our GMD
scheme can achieve the optimum on both aspects simultaneously for high SNR.
As we have mentioned before, VBLAST may suffer from error propagation. Hence
the BER performance of GMD-VBLAST will be inferior to the scalar equivalence in
(3.15). We calculate the upper bound of the GMD-VBLAST BER as follows. For a
fixed SNR p, we assume that the system of (3.15) has symbol error rate (SER) Pe, i.e.,
each subchannel has SER Pe/K. We consider the worst case that decoding errors in some

94
singular values a were computed using the Matlab routine SVD and the eigenvalues A
were computed using Matlabs EIG. By the theorem of Weyl [69], \ < a. We then
used both SVD_EIG and GTD to generate matrices with the specified singular values and
eigenvalues. Five different matrices of each dimension were generated and the average
running time is reported in Table 6-1.
The times shown in Table 6-1 indicate that GTD becomes increasingly more efficient
than svd_EIG as the matrix dimension increases. For a dimension of 100, GTD is about
three times faster than svd_eig. For a dimension of 1600, GTD is about 70 times faster
than SVD_EIG. In an efficiently designed compiled code, the difference in speed between
these two approaches to inverse eigenvalue problems could be more substantial: the
permutations appearing in GTD could be replaced by the updating of a pointer array;
also, the columns of R that are zero except for the diagonal entry could be flagged,
and when multiplying R by Glt we can skip the multiplication of these essentially zero
columns.
In Table 6-1 we also compare the specified singular values and eigenvalues to those
obtained by applying Matlabs SVD and EIG routines to the generated matrices. That is,
for each matrix output by either svd_eig or GTD, we use Matlabs routines to compute
the singular values and eigenvalues, and we compare to the specified singular values and
eigenvalues using the sup-norm. The errors reported in Table 6-1 are the average errors
for the 5 random matrices of each dimension. Both routines generate matrices with
singular values that match those computed by Matlabs SVD routine to within 13 or 14
digits. Observe that GTD always matches exactly the prescribed eigenvalues since the
generated matrix is triangular, with the specified eigenvalues on the diagonal. The error
in the eigenvalues of the matrix generated by SVD_EIG was comparable to the singular
value error for matrices of dimension up to 400. Thereafter, the error in the eigenvalues
grew quickly. When the matrix dimension doubled from 400 to 800, the error increased
roughly by the factor 103. And when the matrix dimension doubled again from 800 to
1600, the error increased roughly by the factor 106.
A recursive algorithm can require a significant amount of memory. While SVD_EIG
executed, we monitored the memory usage using the Unix top command. We observed
that for a matrix of dimension 1600, the memory consumption grew to 319 MB. Since a
complex double precision matrix of dimension 1600 occupies about 41 MB memory, the
recursion required more than 7 times as much space as the matrix itself.

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
- A CHANNEL DECOMPOSITION PERSPECTIVE
By
Yi Jiang
May 2005
Chair: Jian Li
Major Department: Electrical and Computer Engineering
This dissertation studies the signal processing aspect of multi-input multi-output
(MIMO) communications. The contribution of this dissertation is twofold.
First, this dissertation presents a new perspective to the MIMO communications:
any MIMO scheme can be regarded as a MIMO channel decomposer, which decomposes
(in an information lossy or lossless manner) a MIMO channel into multiple scalar sub
channels. Based on this perspective, this dissertation presents three novel MIMO trans
ceiver designs, the geometric mean decomposition (GMD) scheme, the uniform channel
decomposition (UCD) scheme, and the tunable channel decomposition (TCD) scheme.
All these schemes deploy either a decision feedback equalizer (DFE) at the receiver or
a dirty paper precoder (DPP) at the transmitter. These transceiver designs represent a
paradigm shift from the conventional linear MIMO transceiver designs to the nonlinear
ones. The superior performance of the GMD and UCD schemes unveils the practical
significance of making transmitter and receiver cooperate with each other. That is, such
cooperations facilitate achieving the optimal tradeoff between the diversity gain and
multiplexing promised by the MIMO communication theory. The TCD scheme repre
sents a unifying solution to a considerably wide range of problems, including designing
the precoder for orthogonal frequency division multiplexing (OFDM) communications
and the optimal code division multiple access (CDMA) sequence design.
Second, this dissertation introduces two novel matrix decomposition algorithms, i.e.,
the geometric mean decomposition (GMD) and the generalized triangular decomposition
(GTD). The two matrix decompositions form the cornerstones of the three transceiver
IX

5
solution to a wide range of applications, including the applications in which independent
data streams with different qualities-of-service (QoS) share the same MIMO channel and
design the optimal CDMA sequences.
The mathematical foundations of this dissertation, the GMD and GTD algorithms,
are established in Chapter 6. The two novel matrix decomposition algorithms are the
cornerstones of the MIMO transceiver designs proposed in this dissertation.
The conclusions are given in Chapter 7.
To read this dissertation, it is unnecessary to plunge into the details of the GMD
and GTD algorithms. For this reason, we put them to the latter part of the disserta
tion. However, a rough understanding of the two algorithms is necessary to appreciate
Chapters 3-5.

28
To demonstrate the effectiveness of the subchannel selection approach, we consider
a 10 x 10 independent Rayleigh flat fading channel. The channel is usually ill-conditioned
since some singular values of H are very close to zero. Without the subchannel selection
strategy, GMD suffers from performance degradation, especially at low SNR, as seen
in Figure 3-3. On the other hand, with the subchannel selection scheme, there is only
about 0.2 bit/sec/Hz rate loss compared with the CÂ¡T, even at very low SNR.
We compare the BER performance of the GMD-VBLAST scheme with the unpre-
coded MMSE-VBLAST scheme with the optimal detection ordering, the MTM scheme
and the MMD scheme. No error correcting code is used in the simulations. In Fig
ure 3-4(a), H C4x2 has identically independent Rayleigh fading elements. Hence the
channel matrix is usually well-conditioned. Two independent symbol streams modulated
as 16-QAM are transmitted. The figure is obtained by averaging 1000 Monte Carlo tri
als of H. We see that the GMD scheme has more than one dB improvement over the
MMD scheme at moderate to high SNR. In Figure 3-4(b), H condition number, in which case the MMD scheme is subject to more capacity loss as
analyzed in Section 2.3. Four independent symbol streams are transmitted. The BER
performance of the GMD scheme is much better than the others. We did not include
MTM because it discards some bad subchannels and hence cannot be used to transmit
four independent data streams.
In the final example, we combine the GMD scheme with 64-point FFT based
OFDM for ISI suppression in a SISO channel. We assume that the channel response
hi, l = 0,1,..., L 1, are independent zero-mean circularly symmetric Gaussian random
variables with unit variance. The channel length is L = 4. The GMD-ZFDP is about 2
dB better than GMD-VBLAST. This is because GMD-VBLAST suffers from consider
able error propagation effect. This result suggests that GMD-ZFDP may be preferred
over GMD-VBLAST if the channel has a large dimensionality.
3.5 Conclusions
In this chapter, we introduce a novel joint transceiver design, which combining
the geometric mean decomposition (GMD) with the VBLAST equalizer or dirty paper
precoder. The GMD scheme can decompose a MIMO channel into multiple identical
scalar subchannels. This desirable property can bring about much convenience to the
practical system design, particularly the symbol constellation selection. Moreover, we
have shown that the GMD scheme is optimal asymptotically for high SNR in terms of

page
ACKNOWLEDGMENTS iii
LIST OF TABLES vi
LIST OF FIGURES vii
ABSTRACT ix
CHAPTER
1 INTRODUCTION 1
1.1 Two Categories of Schemes for MIMO Communications 1
1.2 Joint Transceiver Design: Where Tx and Rx Collaborate 2
1.3 MIMO Transceiver Design from Channel Decomposition Perspective 3
1.4 Dissertation Outline 4
2 LINEAR MIMO TRANSCEIVER DESIGNS 6
2.1 Channel Model and Channel Capacity 6
2.1.1 Channel Model 6
2.1.2 Channel Capacity 8
2.2 Channel Capacity and Cramr-Rao Bound 10
2.3 Rate Performance of Linear Transceivers 13
3 MIMO TRANSCEIVER DESIGN USING GEOMETRIC MEAN DECOM
POSITION 17
3.1 VBLAST and ZF-DP 17
3.1.1 VBLAST 17
3.1.2 ZF-DP 18
3.2 Geometric Mean Decomposition for MIMO Transceiver Design .... 20
3.3 Performance Analyses and Implementations Issues 21
3.3.1 Performance Analyses 21
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking 23
3.3.3 Subchannel Selection 24
3.3.4 Further Remarks 25
3.4 Performance Examples 27
3.5 Conclusions 28
4 UNIFORM CHANNEL DECOMPOSITION 33
4.1 Closed-Form Representation of MMSE-VBLAST 34
4.2 UCD-VBLAST 35
4.3 UCD-DP 39
4.4 Performance Analysis 40
4.4.1 Diversity Gain Analysis 40
4.4.2 Further Remarks 42
IV

90
where = nG2 and Pfc = FIGi. By (6.15) and Figure 6-2, R(fc+1) has properties (a)
and (b) for L k + 1. Now consider property (c).
We write a ~ b if a and b are equal after a suitable reordering of the components.
Let a, b, a+, and b+ be vectors whose components are ordered in decreasing magnitude,
and which satisfy
a ~ rk:K, b ~ rj^, a+ ~ rk+1:K, and b+ ~ (6.20)
Thus at is the z-th largest (in magnitude) component of rk:x. By the induction hypoth
esis, we have a ^ b. To establish (c), we need to show that a+ X b+. Let the index s
be chosen so that as = rk, and let the index t be chosen so that
M > \rk\ > |6t+i|. (6.21)
By the definition of p and q, bt and = 6i+1. As seen in (6.20), a+ is obtained
from a by deleting as = rk. The vector r(fc+1^ is obtained from r(fc) by a unitary trans
formation that changes the value of two elements. In particular, b+ is obtained from b
by replacing the adjacent pair bt and 6(+i by
btbt+lrk
By (6.21) |6t| > |y| > |6t+i|. Consequently,
K = y. (6.22)
We partition the proof of (c) into 2 cases.
Case 1: s < t. Since at+ < a for all f, a -< b, and = bf for 1 < i < t, we have
^ ai:ti = b^t_j. (6.23)
For j > t > s, it follows from the induction hypothesis and the connection between a
and a+ that
J-1 j~ 1 3 3
n n ia+i=ii n i^i=n n ^ n n- ^
=1 =1 =i =1
Since Gj and G2 are unitary, the determinant of (6.15) gives
m\ = |r'fc)rf | \btbt+i\ = \rky\ = |rfc6+|,
(6.25)

3
1.3 MIMO Transceiver Design from Channel Decomposition Perspective
In this dissertation, we present a new perspective to the MIMO communications.
We regard the aforementioned MIMO schemes as MIMO channel decomposers, which
decompose (in an information lossy or lossless manner) a MIMO channel into mul
tiple scalar subchannels. For instance, the MIMO transceiver design based on SVD
decomposes a MIMO into multiple eigen-subchannels. Similarly, the V-BLAST scheme
decomposes a MIMO channel into multiple scalar subchannels which are referred to as
layers by its inventors. These channel decompositions, however, are totally determined
by the specific channel realization and one can have little control over how the channel
is decomposed. For example, the gains of the subchannels obtained via SVD are totally
determined by the singular values of the channel matrix, which one can have no control
over.
An interesting question arises: if the transmitter and receiver are allowed to col
laborate, how can we design a transceiver that can decompose a MIMO channel into
multiple subchannels with prescribed channel gain, and without incurring capacity loss?
This dissertation is devoted to answering this question. In the process of pursuing the
answer, we investigate the following aspects of the problem.
First, we show that the conventional linear transceivers are inherently inflexible,
and we cannot rely on linear transceivers to achieve our desired channel decomposi
tions. Hence we need to go beyond the linearity constraint and investigate the nonlinear
schemes, such as a decision feedback equalizer (DFE) and a dirty paper precoder (DPP).
Second, we study the possibility of new matrix decompositions other than using
SVD. We propose two novel matrix decomposition algorithms, the geometric mean de
composition (GMD) and the generalized triangular decomposition (GTD). The two de
compositions represent a wide class of matrix decomposition, which has significant im
plications in the matrix analysis community. For instance, the GTD is a new solution
to the inverse eigenvalue problem.
Third, we propose three transceiver designs which combine the new matrix decom
position algorithms with the DFE and DPP. The three designs are the GMD scheme, the
uniform channel decomposition (UCD) scheme and the tunable channel decomposition
(TCD) scheme. Among them, the UCD scheme can decompose, in a strictly capacity
lossless manner, a MIMO channel into multiple subchannels with identical capacities or,
equivalently, identical channel gains. Moreover, the UCD scheme is a practical scheme

98
7, apply G1 to R
R (l:kml, [k kpl]) = R (l:kml, [k kpl])*Gl
end
R (k, k) = rk ;
R (k, kpl) = x ;
d (kpl) = y ;
7 permute the columns
P (:, [k p]) = P (:, [p k]) ;
P (:, [kpl q]) = P (:, [q kpl]) ;
Q (:, [k p]) = Q (:, [p k]) ;
Q (:, [kpl q]) = Q (:, [q kpl]) ;
7. apply G1 to P
P (:, [k kpl]) = P (:, [k kpl])*Gl ;
7. apply G2 to Q
Q (:, [k kpl]) = Q (: [k kpl])*G2 ;
end
R (K, K) = r (K) ;
if ( r (K) ~= 0 )
Q (:, K) = q (:, K)*d (K)/ r (K) ;
end
P = P (:, 1:K) ;
Q = Q (:, 1:K) ;

7
i.e.,
ho 0 /ijvf-i
hi ho 0 /im-i
hi
h2
H =
0
(2.4)
h\i-2 hM-1
hM-1 hM-2 hi h0 0
O hM-1 hM-2 h\ ho
Here, Mt = Mr = Ar. In either case, if the block data are precoded with the linear
precoder F, then the received data are given in (2.1). This ISI channel problem has
been studied in [21] [30].
In an idealized synchronous CDMA (S-CDMA) system where the channel does
not experience any fading or near-far effect, L mobile users modulate their information
symbols via spreading sequences {s,}^, each of which has the processing gain N. The
discrete-time baseband S-CDMA signal received at the (single-antenna) base-station can
be represented as [31]
y = Sx + z
(2.5)
where S = [si,...,s] 6 RNxL and the Ith (1 < l < L) entry of x, xÂ¡, stands for
the information symbol from the fth user. In the downlink channel, the base station
multiplexes the information dedicated to the L mobile users through the spreading
sequences, which are the columns of S. Then all the mobiles receive the same signal
given in (2.5). We remark that (2.5) can also be written as (2.1) with H = IN and F = S.
Here Mr = Mt = N is the processing gain. Hence, optimizing the spreading sequences
amounts to optimizing the precoder F for a MIMO system. Indeed, this problem has
been under intensive research in the past several years.
In summary, both designing a precoder for OFDM transmission through an ISI
channel and searching for the optimal S-CDMA sequences can be regarded as special
cases in the unifying framework of MIMO transceiver designs. MIMO transceiver designs
can be used in the OFDM and CDMA applications after only simple modifications. In
this dissertation, we will concentrate on MIMO transceiver design although we will
discuss the optimal design of CDMA sequences in Chapter 5.

64
substreams, and dedicate one subchannel to each substream. In [28], the authors studied
the same problem. They proposed a linear transceiver design which, similar to TCD, can
also control the SINR of each subchannel via designing the precoder. However, the linear
transceiver is capacity lossy and can suffer from considerable performance degradation
compared with our TCD scheme as we will show at the end of this section. Given that
all the subchannels meet the QoS constraints, we want to minimize the overall input
power. We need to solve the following optimization problem:
minp
subject to
tr (FF*)
diag(R) = {V1 + Pi)i=\-
(5.45)
Here QR denotes the QR decomposition and diag(R) denotes the vector formed by the
diagonal of R. According to Lemma 4.1.2, the diagonal of R determines the SINRs of
the subchannels. Without loss of generality, we assume that pi > p2 > ... > pl- We
now consider a problem whose constraints are more relaxed than those of (5.45):
minp tr (FF*)
subject to XGa {v/1 + Pi)i=i, Ga =
(5.46)
where \Ga stands for the singular values of the augmented matrix Ga. In general, for
any matrix A, we let denote the singular values of A. By Theorem 5.2.3, if F is
feasible in (5.45), then F is feasible in (5.46). We now further simplify (5.46) and show
that its solution provides a solution of (5.45).
Theorem 5.4.1 If H = UAV* is the singular value decomposition of H, then (5.46)
has a solution of the form F = V41/2 where \$ Â£ RKxK is a diagonal matrix with
diagonal elements 4>i, 1 < i < K, chosen to solve the problem
min* Y!=i &
subject to nti(l + x2H,ii) > H=i(l + P). k > 4>k+\ > 0, l
iiiLiU+^H,^) nti(i+Pi)'
(5.47)

103
[45] B. C. Banister and J. Zeidler, Feedback assisted transmission subspace tracking
for MIMO systems, IEEE Journal on Selected Areas in Communications, vol. 21,
pp. 452-463, April 2003.
[46] A. Poon, D. Tse, and R. W. Brodersen, An adaptive multiantenna transceiver
for slowly flat fading channels, IEEE Transactions on Communications, vol. 51,
pp. 1820-1827, November 2003.
[47] B. Hassibi, An efficient square-root implementation for BLAST, http://cm.bell-
labs. com/who/hochwald/papers/squareroot/, 2000.
[48] M. Varanasi and T. Guess, Optimum decision feedback multiuser equalization
with successive decoding achieves the total capacity of the Gaussian multiple-access
channel, Conference Record of the Thirty-First Asilomar Conference on Signals,
Systems and Computers, vol. 2, pp. 1405 1409, Nov 2-5 1997.
[49] P. Viswanath and D. Tse, Sum capacity of the vector Gaussian broadcast channel
pp. 1912-1921, August 2003.
[50] D. Tse and P. Viswanath, Fundamentals of Wireless Communications. Available:
http://inst.eecs.berkeley.edu/~ee224b/sp04/#Course Notes, 2004.
[51] Y. Jiang, W. Hager, and J. Li, The generalized triangular decomposi
tion, SIAM Journal on Matrix Analysis and Applications, Available online:
http://www.sal.ufl.edu/yjiang/papers/gtd.pdf Submitted.
[52] Y. Jiang and J. Li, Adaptable channel decomposition for MIMO communica
tions, IEEE International Conference on Acoustics, Speech, and Signal Processing,
[53] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, second edi
tion, John Wiley and Sons, Inc., 1984.
[54] P. Viswanath and V. Anantharam, Optimal sequences and sum capacity of syn
chronous CDMA systems, IEEE Transactions on Information Theory, vol. 45,
pp. 1984-1991, September 1999.
[55] P. Viswanath, V. Anantharam, and D. Tse, Optimal sequences, power control
and user capacity of synchronous CDMA systems with linear MMSE multiuser
receivers, IEEE Transactions on Information Theory, vol. 45, pp. 1968-1983, Sep
tember 1999.
[56] T. Guess, Optimal sequence for CDMA with decision-feedback receivers, IEEE
Transactions on Information Theory, vol. 49, pp. 886-900, April 2003.
[57] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley &
Sons, Inc, 1991.
[58] A. Marshall and I. Olkin, Inequalities: Theory of Majorization. New York: Acad
emic, 1979.
[59] Y. Jiang, J. Li, and W. Hager, Uniform channel decomposition for MIMO
communications, IEEE Transactions on Signal Processing, available online:
http://www.sal.ufl.edu/yjiang/papers/ucdR3.pdf, to appear 2004. The conference
version presented at Asilomar conference, Nov. 2004.

CHAPTER 2
LINEAR MIMO TRANSCEIVER DESIGNS
2.1 Channel Model and Channel Capacity
2.1.1 Channel Model
We consider a communication system with Mt transmitting and Mr receiving an
tennas in a frequency flat fading channel. The sampled baseband signal is given by
y = HFx + z,
(2.1)
where x G CLxl is the information symbols precoded by the precoder F G CA,X and
y Â£ CMrXl is the received signal and H G CMrXMt is the channel matrix with rank K.
We assume Â£[xx*] = a^Ii and z ~ N(0, is the circularly symmetric complex
Gaussian noise, where I stands for an identity matrix with dimension L. We define the
input SNR as
p = = ^|Tr{F*F} 4 lTr{FF}, (2.2)
af a
where a = f. Designing the MIMO transceivers, including the precoder F and the
associated equalizer, is the focus of this dissertation.
We note that the data model in (2.1) is generic. For an intersymbol-interference
(ISI) channel with impulse response h = [Hq. hi,, /-m-i]T with (-)T denoting trans
pose, if a block data with length N are transmitted using the zero-padded OFDM,
then the received block data can also be written in the form of (2.1) with
H -
h0 0
ho
hM-i
0
0
0
0
0
0
0 h\f-i ... ho
o . . ;
0 0 ... 0 h\{-1
(2.3)
In this case, H is a Toeplitz matrix with its dimensionality Mt = N and Mr N+M1.
If the OFDM with cyclic prefix is used, the channel matrix is a circulant Toeplitz matrix,
6

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor ofvPJtosbj)hy.
Jian Li, Chair/
Professor anelectrical and
Computer Engineering
1 certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
Kenneth K. <
Professor or Electrical and Computer
Engineering
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctpt-^Philosoghy. j
13 Shea
Assistant Professor of Electrical and
Computer Engineering
I certify that 1 have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Docfor of Philosophy.
Associate Professor ofTilectrical and
Computer Engineering
1 certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
(aJ
William W. Hager
Professor of Mathematics

50
Mr = 5, M( = 5 iid Rayleigh channel SNR = 25 dB
Figure 4-2: Complementary cumulative distribution functions of the capacities of 5
subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results
based on 2000 Monte Carlo trials.
Mf= 4, Mr= 4 iid Rayleigh channel, 16-QAM
Figure 4-3: Uncoded BER performance when using 16-QAM. Results based on 1000
Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt = 4 and MT = 4.

38
Applying the GMD matrix decomposition algorithm given in Section 6.2 to E yields
S (Q1Q2 Ql-ORjP-xPL, Pj). (4.27)
Hence
U[E: 0/cx(Z-iC)]
y/ah
U[S : Oft-x(_/c)]Â£ 1
(Q1Q2...Ql-1)Rj(PI_1PI_2...P[).
(4.28)
Then the linear precoder has the form:
F = V
\$1/2:0 kx(L-K)
PiP2 -. .P-i.
(4.29)
The nulling vectors are calculated according to (4.21) with rJtl = ^nf=i &i\ and
Qg = U[Â£ : 0^X(i,_/c)]SQiQ2... Qr,_i. (4.30)
Note that Q; and PÂ¡, l = 1,2,..., L, are Givens rotation matrices and hence calcu
lating (4.29) and (4.30) needs 0(Mt(K + L)) and 0(Mr(K + L)) flops, respectively.
We summarize the UCD-VBLAST scheme as follows 1
Table 4-1: The UCD-VBLAST scheme
step
operation
flops
1
Compute SVD H = UAV*
0(MtMrK)
2
Calculate h1/2 using (2.9)
0(K2)
3
S = AfcVa
0(K)
4
Obtain E using (4.26)
0(K)
5
Apply GMD to E to obtain (4.27)
0(P)
6
Generate F using (4.29)
0(Mt(K + L))
7
Compute QÂ£ using (4.30)
0(Mr(K + L))
8
Calculate {w}|lj using (4.21)
0(MrL)
Obviously, our UCD-VBLAST scheme has comparable computational complexity
to the SVD based linear transceiver designs. An observation relevant to practical imple
mentations is as follows. Note that the receiver does not have to calculate Step 6 since
CSIT is available and the transmitter can run Steps 1 to 6. However, if the receiver cal
culates F. which only takes a small number of flops, and feeds it back to the transmitter,
1 Steps 5-7 can be processed simultaneously as in the GMD algorithm.

26
data model (2.39) can be used to represent the received block data (cf. (2.3) and (2.4)).
Note that for this case, H is a Toeplitz matrix due to the time invariant property of
the IS I channel. A linear precoder design F was proposed in [30] such that the virtual
channel Ht = HF can be decomposed via QR decomposition to be Hu( = QR where
R has equal diagonal elements. We see that this equal diagonal idea is equivalent to
GMD. However, our GMD scheme, independently motivated by the MIMO transceiver
design problem, has several major advantages over the algorithm in [30]:
1. Our GMD scheme represents a paradigm shift from the conventional linear trans
ceiver designs to nonlinear designs and can be proven, both numerically and theo
retically, to have superior performance from both BER and information theoretic
aspects.
2. Our GMD algorithm is computationally much more efficient than that of [30],
Both algorithms start from the SVD of H which is followed by K 1 iterations.
The GMD involves 2K 2 fast Givens rotations. For a channel H with Mt =
Mr = K, the SVD requires 0(K3) flops while the GMD requires additional 0(K2)
flops. Thus the computational complexity of the GMD scheme is comparable
to the conventional linear transceiver schemes. However, the algorithm in [30]
involves multiplications and inversions of matrices in each iteration and the overall
computational burden turns out to be additional 0(/F4) flops.
3. For the GMD algorithm, only the information of HH*, and hence A and U, are
needed to calculate Q. However, for the algorithm in [30], the equalizer needs
to know both the precoder F and H, and hence Hvt = HF, in order to apply
the traditional QR to H,,(. Hence it cannot be combined with the aforementioned
blind two-way channel subspace tracking algorithm introduced in Section 3.3.2.
Like the algorithm in [30], the GMD scheme can also be combined with orthogonal
frequency division multiplexing (OFDM) for ISI suppression. For a SISO ISI channel
with memory L,
L-1
y{n) hix(n l) + z(n), (3.33)
1=0
after applying OFDM with block length N, we get a MIMO channel
y = Dx + z (3.34)
where D is a diagonal matrix with the diagonal elements equal to the V-point FFT of
h = [ho, hi,..., /Â¡l-]7. Hence the GMD scheme can be applied directly. We expect

70
capacity C\ and C2 with C\ + C2 = 10 bps/Hz. We consider the three scenarios with
(Ai = 2, A2 = 1), (Ai = 5, A2 = 1), and (Ai = 10, A2 = 1). For all the three cases, there
is an inflection point beyond which our TCD is the same as the linear design of [28].
That is because when the two subchannels have very disparate QoS constraints, i.e.,
Ci is far larger than C2, the optimal strategy is to apply SVD to the channel matrix
and transmit data through the orthogonal eign-subchannels. (In this case, 0 = 1. (cf.
(5.13)).) If the subchannels QoS constraints are not too disparate, which corresponds
to the region to the left of the inflection point, the required input power of our TCD
scheme is invariant with respect to C\, C2 and is strictly less than that needed by the
linear design. This region corresponds to the capacity lossless region (cf. Figure 5-1).
Another interesting point is that the relative advantage of TCD is more prominent if
the singular values A1;A2 become more disparate.
Figure 5-4: Input SNR vs. C\. A rank 2 channel is decomposed into two subchannels
with capacities C\ and C2 = 10 C\.
5.5 CDMA Sequence Design
As we have shown in Section 2.1.1, the CDMA sequence design problem can be
viewed as a special case of the MIMO transceiver design. In an idealized S-CDMA
system where the channel does not experience any fading or near-far effect, L mobile
users modulate their information symbols via spreading sequences {sjf=1, each of which
has the processing gain N. The discrete-time baseband S-CDMA signal received at the

TRANSCEIVER DESIGN FOR MIMO COMMUNICATIONS
- A CHANNEL DECOMPOSITION PERSPECTIVE
Yi Jiang
(352) 392-5241
Department of Electrical and Computer Engineering
Chair: Jian Li
Degree: Doctor of Philosophy
It is recently discovered that deploying multiple transmitting and multiple receiv
ing antennas in a wireless communication system can drastically improve the data rate
and reliability of wireless communications, even without consuming additional band
width and input power. This so-called multi-input multi-output (MIMO) technology
has been under intense research and will be applied to the next generation of wireless
communication networks.
This dissertation focuses on designing practical transceiver designs for MIMO sys
tems with sound theoretical foundations. Three designs, i.e., the GMD, UCD, and TCD
schemes, are proposed. These designs represent a paradigm shift from the conventional
linear designs to the nonlinear designs while keeping the implementation complexity
low. It is proven, both through theoretical analyses and numerical simulations, that the
three designs are much better than their linear counterparts in that they can achieve
faster and more reliable communications. The schemes proposed in this dissertation
will probably play an important role in the next generation wireless fidelity (Wi-Fi)
and digital subscribe line (DSL) technologies. Besides its engineering significance, this
dissertation also invents two matrix decompositions, which are significant contributions
to the numerical analysis community.

71
(single-antenna) base-station can be represented as [31]
y = Sx + z
(5.54)
where S = [si,...,s] 6 RNxL and the 1th (1 < l < L) entry of x, xÂ¡, stands for
the information symbol from the 1th user. In the downlink channel, the base station
multiplexes the information dedicated to the L mobile users through the spreading
sequences, which are the columns of S. Then, all the mobiles receive the same signal
given in (5.54). We remark that (5.54) can also be written as (4.1) with H = 1^ and
F = S. Here Mr = Mt = N is the processing gain. Hence, optimizing the spreading
sequences amounts to optimizing the precoder F for a MIMO system. Indeed, due
to the simple channel matrix (H = I), some procedures of the TCD scheme can be
simplified. We shall show that the TCD scheme turns out to be an improved solution
to the sequence design proposed in [56]. At the end of this section, we will compare our
TCD scheme and the scheme proposed in [56].
5.5.1 CDMA Sequences Maximizing Sum Capacity
Recall that the precoder maximizing the overall MIMO channel capacity is F =
V>1/,27T where 4 is obtained by water filling algorithm. For an S-CDMA channel,
H = I, then V = I and the optimal power loading level is the uniform power allocation.
Hence the CDMA sequence maximizing the sum capacity is S = Since 2 has
orthonormal columns, we obtain SST = pi. This observation coincides with the findings
in [31], in which the authors show that the CDMA sequences maximizing the sum
capacity are the Welch-Bound-Equality sequences.
For the uplink scenario, i.e., the mobiles to base station case, the base station cal
culates the optimal CDMA sequences for each mobile user and the associated successive
nulling vectors needed by itself. Then the base station informs the mobile users their
designated CDMA sequences.
First, we need to calculate the power loading levels \$ e RNxN such that the
following GTD matrix decomposition is possible:

A
[31 2: Owx(-jV)]
h
qrpt,
(5.55)
where the diagonal elements of R, riiti 1. 2..... L, satisfy the QoS constraints. Note
that the singular values of H form a sequence whose first N elements are

29
= 4, M( = 4 Â¡id Rayleigh channel
Figure 3-1: Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and
Mr 4 for i.i.d. Rayleigh flat fading channels.
both information rate and BER performance while the computational complexity of our
scheme is comparable with the conventional linear transceiver scheme. Furthermore, we
have shown that the GMD scheme can be applied without the need of using training
symbols for channel estimation if combined with subspace tracking techniques. We
have also considered the issue of subchannel selection when some of the subchannels
are too poor to be useful. The GMD scheme can also be combined with OFDM for ISI
suppression. Both the theoretical analyses and empirical simulations have been provided
to validate the effectiveness of our approaches.

102
[29] L. Collin, O. Berder, P. Rostaing, and G. Burel, Optimal minimum distance-based
precoder for MIMO spatial multiplexing systems, IEEE Transactions on Signal
Processing, vol. 52, pp. 617-627, March 2004.
[30] J.-K. Zhang, A. Kavcic, X. Ma, and K. M. Wong, Design of unitary precoders for
ISI channels, in Proceedings IEEE International Conference on Acoustics Speech
and Signal Processing, vol. Ill, (Orlando, Florida), pp. 2265-2268, 2002.
[31] M. Rupf and J. L. Massey, Optimal sequence multisets for synchronous code
division multiple-access channels, IEEE Transactions on Information Theory,
vol. 40, pp. 1261-1266, July 1994.
[32] D. W. Bliss, K. W. Forsythe, A. O. Hero, and A. F. Yegulalp, Environmental issues
for MIMO capacity, IEEE Transactions on Signal Processing, vol. 50, pp. 2128
2142, September 2002.
[33] P. Stoica and R. L. Moses, Introduction to Spectral Analysis. Englewood Cliffs, NJ:
Prentice-Hall, 1997.
[34] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge: Cambridge University
Press, 1985.
[35] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, Third
Edition, 1996.
[36] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York,
NY: John Wiley and Sons, Inc., 1968.
[37] G. Ginis and J. M. CiofR, On the relationship between V-BLAST and the GDFE,
IEEE Communications Letters, vol. 5, pp. 364-366, September 2001.
[38] G. Ginis and J. M. Cioffi, Vectored transmission for digital subscriber line sys
tems, IEEE Journal on Selected Areas in Communications, vol. 20, pp. 1085-1104,
June 2002.
[39] G. Caire and S. Shamai, On the achievable throughput of a multiantenna Gaussian
broadcast channel, IEEE Transactions on Information Theory, vol. 49, pp. 1691-
1706, July 2003.
[40] G. Ginis and J. M. Cioffi, A multi-user precoding scheme achieving crosstalk can
cellation with application to DSL systems, Proc. 3fth Asilomar Conference Sig
nals, Systems, Computers, Asilomar, CA, vol. 2, pp. 1627-1631, 29 Oct.-l Nov.
2000.
[41] M. Costa, Writing on dirty paper, IEEE Transactions on Information Theory,
vol. 29, pp. 439-441, May 1983.
[42] M. Tomlinson, New automatic equaliser employing modulo arithmetic, Electron.
Lett., vol. 7, pp. 138-139, March 1971.
[43] H. Harashima and H. Miyakawa, Matched-transmission technique for channels
with intersymbol interference, IEEE Trans. Communications, pp. 774-780, August
1972.
[44] W. Yu and J. M. Cioffi, Trellis precoding for the broadcast channel, Global
Telecommunications Conference, vol. 2, pp. 1344-1348, November 2001.

20
3.2 Geometric Mean Decomposition for MIMO Transceiver Design
Note that VBLAST assumes no cooperations among transmitting antennas and
ZFDP assumes no cooperations at the receivers. Then a natural question arises: can
we exploit both the CSIR and CSIT to make things better if both CSIR and CSIT are
available? We attempt to address this question next.
In the sequel, we assume that the same signal constellation is used in all the inde
pendent symbol streams to reduce the system complexity. This is consistent with the
HIPERLAN/2 and IEEE 802.11 standards. Then the overall BER performance of the
system will be limited by the subchannel with the lowest SNR. To mitigate this problem,
based on (3.4) and (3.10), we consider the following optimization problem
max min : 1 < i < K\
Q,P
subject to R = Q*HP
R G R*xK,rq = 0 for i > j C3-11)
> 0 for 1 < i < K
Q*Q = P*P = Ik
where the semi-unitary matrices Q and P denote the linear operations at the receiver
and transmitter, respectively.
Since both Q and P are semi-unitary matrices, we have fln^i rnn = A//,n,
where {A#n}A=1 are the K non-zero singular values of H. In Chapter 6 we show that if
there exist semi-unitary matrices P and Q satisfying
H = QRP*, or equivalently, R = Q*HP (3.12a)
where the diagonal elements of R are given by
ru = A# = A#,n^
i IK
1 < < K,
(3.12b)
then the R in (3.12) is the solution to (3.11). The detailed treatment of the decompo
sition (3.12) is delegated to Chapter 6, We refer to this decomposition as the geometric
mean decomposition (GMD) since the diagonal elements of R are the geometric mean
of {A// } ^=1. A computationally efficient and numerically stable algorithm is proposed
in Section 6.2 to calculate the decomposition.
It seems reasonable to constrain the linear equalizer Q to be semi-unitary since it
will keep the background noise white. Yet it seems unnecessary to constrain P to be

84
Combining (6.4) and (6.5) with the identity GUF* = H = VSW* gives
AUB* = S.
It follows that
K
det (Â£*Â£) = det (BU*A*AUB*) = det (AA)det (B*B) |u|2,
i=i
which gives
K
min.KI2* < JJM2 = det (S*S)det (A*A)_1det (B*B)-1.
i=l
By the constraints of (6.3), we have
(6.6)
tr ((G*G)_1) = tr ((A*A)-1) < pi,
tr ((F*F)-1) = tr ((B*B)_1) < p?.
By the geometric mean inequality and the fact that the determinant (trace) of a matrix
is the product (sum) of the eigenvalues, a K by K Hermitian positive semidefinite matrix
S satisfies
det (S) <
Using these bounds for the determinant and the trace in (6.6), we have
min I rtiv I <
i < \JP\P2
K
(6.7)
Finally, it can be verified that for the choices of G, U, and F given in the statement of
the theorem, the inequality (6.7) is an equality.
6.2.2 Implementation Based on Initial SVD
We now give an algorithm for evaluating the GMD that starts with the singular
value decomposition H = VSW*. The algorithm generates a sequence of upper trian
gular matrices R(U, 1 < L < K, with R(b = Â£. Each matrix R(G has the following
properties:
(a) rj^ = 0 when i > j or j > max {L, z}.
(b) r\^ = o for all i < L, and the geometric mean of L < i < K, is g.
We express R(:+1l = Q,R,,-)Pt where Q*. and are orthogonal for each k.
These orthogonal matrices are constructed using a symmetric permutation and a
pair of Givens rotations. Suppose that R(/c) satisfies (a) and (b). If rjÂ¡ > g, then let

6-2 The operation displayed in (6.15)
89
viii

19
yi
hi
0 .
. 0
Xi
Zi
V2
hi
h2
. 0
&2
+
2
rx i
Vk
rKK
XK
zk
Denote s Â£ CKxl to be the symbol vector destined for the K receivers. We wish to
x satisfying
hisi
1
O
. 0
X\
?22S2

hi h2
. 0
X2
XKKSr
rx 1
rKK
Xk
(3.7)
have
(3.8)
The solution to (3.8) is
x = R *diag{R}s.
(3.9)
However, the matrix inversion can amplify the norm of x significantly which can lead
to additional power consumption at the transmitter. By exploiting the finite alphabet
property of the communication signals, the modulo arithmetic precoder (more recently
known as the Tomlinson-Harashima Precoder [42], [43]) can be applied to bound the
value of the transmitted signal. Moreover, the trellis precoding can be used to eliminate
the 1.53 dB shape-loss of Tomlinson-Harashima precoding [44]. The ZFDP transmission
scheme decomposes the MIMO channel into K parallel scalar channels (see [40] for more
details)
Vi = ruXi + Zi i = 1,2, (3.10)
Several remarks are now in order, a) VBLAST is shown to be able to achieve only
about 72% of the capacity [5]. That is because imposing the same rate of transmission
on all the transmitters makes the channel capacity limited by the worst of the K scalar
subchannels, b) VBLAST has only diversity gain of MrMt+1. c) ZFDP can achieve the
broadcast channel capacity for high SNR [39], but the subchannels have different fading
levels. Hence the transmitter, just like the aforementioned linear transceivers, have to
consider the tradeoffs between the BER performance and the channel throughput, d)
ZFDP scheme causes no error propagation, and thus (3.10) is precise, e) Both VBLAST
and ZFDP involve nonlinear operations.

18
1
... Â£
to i
i

Di D2
0 722
... r1K
r2K
X\
X2
+
Z\
z2
VK
0 ...
0 rKK
xk
Zk
The sequential signal detection is as follows
(3.3)
for i = K : 1:1
xx = C Ej=+i rijÂ£jJ /ra
end
where C stands for mapping to the nearest symbol in the symbol constellation. Ignoring
the error-propagation effect, we see that the MIMO channel is decomposed into K
parallel scalar subchannels
yi = riixi + zi, i = 1,2, , AT.
(3.4)
3.1.2 ZF-DP
We consider a broadcast MIMO channel with Mt transmitting antennas and Mr
receiving antennas (Mt > Mr). The channel model is exactly the same as (2.1) and the
CSIT is available. However the receiving antennas cannot cooperate with each other. A
vector transmission scheme was proposed in [40], which combines the QR decomposition
and dirty paper precoding. We refer to this approach as the zero-forcing dirty paper
precoding (ZFDP). (The use of the dirty paper phrase is due to Costa [41].)
The ZFDP scheme resembles the zero-forcing VBLAST method. It also goes
through the sequential nulling and cancellation procedure. The only difference is that
all these operations are done by the transmitter.
By assuming H to be of full row rank, i.e., K = Mr, ZFDP also begins with the
QR decomposition H* = QR. Let us rewrite (2.1) as
y = R*Q*x + z. (3.5)
Denoting x = Qx yields
y = R*x + z,
(3.6)

57
Capacity lossles region (C, = 3, C2 2, C 1)
Figure 5-1: Illustration of the capacity lossless region obtainable via TCD. We assume
K = 3, Cx = 3, C2 = 2, and C3 = 1.
Definition 3 For x, y M", if
i i
n^ =1 i=i
with equality for j = n, we say that x is multiplicatively majorized by y and write
x - Obviously, if x y, then logx logy.
Now we are ready to introduce the GTD theorem.
Theorem 5.2.3 (GTD theorem) Let H Â£ cm*n have rank K with singular values
A R+. There exists an upper triangular matrix R Â£ CK*K and matrices Q and P
with orthonormal columns such that H = QRP* if and only if the diagonal elements of
R satisfy |r| A.
Proof: We relegate the proof to Chapter 6.
There is a computationally efficient and numerically stable algorithm to achieve the
GTD predicted by Theorem 5.2.3, which is presented in Chapter 6.
5.3 Tunable Channel Decomposition
5.3.1 TCD-VBLAST
We see from (5.2) that F can always be scaled such that a = 1. Hence without loss
of generality, we let a = 1 in the sequel to simplify the notation.

27
that GMD-ZFDP may have better BER performance than GMD-VBLAST if N 1, in
which case the GMD-VBLAST may suffer from considerable performance degradation
due to error propagations.
3.4 Performance Examples
We present next several numerical examples to demonstrate the effectiveness of the
GMD scheme. In all the examples, we assume Rayleigh independent flat fading channels.
In the first example, we consider a Rayleigh flat fading channel with Mt 4 and
Mr = 4. We compute the Shannon capacities of the channel with both CSIR and CSIT
(Cjt, (2.10)), the channel with uninformed transmitter (Cut, (2.11)), the channel using
the GMD scheme (Cgmd, (3.16)), the channel using the MTM scheme (Cmtm, (2-49)),
and the channel using the MMD scheme (Cmmd, (2.55)). We average the capacities of
1000 Monte-Carlo-generated H realizations. The result is presented in Figure 3-1. We
note that the capacity loss of the MMD scheme is about twice that of the MTM scheme
at high SNR as predicted in Section 2.3. The relative capacity loss of the MMD scheme
compared with MTM is smaller at low SNR because some subchannels are not used at
low SNR. The GMD scheme outperforms the linear transceiver designs when the SNR
is moderate or high and is asymptotically capacity lossless at high SNR.
Figure 3-2 shows the complementary cumulative distribution functions (CCDF) of
the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR
equal to 23 dB. The five thin dashed curves denote the channel capacities of the five sub
channels obtained via SVD plus water filling. Note that the leftmost thin curve crosses
the vertical axis at a value less than one, which means that the worst subchannel (cor
responding to the smallest singular value of the channel matrix) is sometimes discarded
by water filling. The thick line is the CCDF of each subchannel capacity obtained via
GMD. Figure 3-2 further illustrates the disadvantages of the conventional SVD plus bit
allocation scheme (see, e.g., [19] [20] [23]). The channel capacities of the 5 subchannels
obtained via SVD plus water filling range from 0 to about 10 bps/Hz, which suggests
that the BPSK or QPSK modulation should be used to match the capacity of the worst
subchannel and something like 512 or 1024 QAM to the best subchannel. This bit
allocation significantly increases the modulation/demodulation complexity. Moreover,
using a constellation with size greater than 256 is impractical for the current RF circuit
design technology. For the GMD scheme, on the other hand, the same constellation with
a moderate size, say 64-QAM, can be applied to reap most of the channel capacity.

40
determined based on (4.40) below. We use Fre, the linear precoder in the reverse
channel, as the linear equalizer. Then the equivalent MIMO channel is
Y F*etlHWD9x + F*evz, (4.36)
where the zth scalar subchannel of the MIMO channel is
L t-1
Vi = f*Hwiy/qlxi + f/Hwjy/q]xj + ^ f)*Hwjy/qjxj + f*z. (4.37)
j=i+1 j= 1
Applying the dirty paper precoder to x, and treating ^ f* Hwj ^/qjXj as the interfer
ence known at the transmitter (note that here we precode the first layer first while for
UCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel
L
Vi = f)*Hwjy/qjXj + f*z (4.38)
j=i+1
with SINR
Pi =
for = 1,2.
L.
(4.39)
The next step is to calculate {qi}i=i such that pi = p, 1 < i < L, where p is as defined
in (4.33). Let aj = |f*Hwy|2. Then (4.39) can be represented in the matrix form
aii pan
-pan
<7i
' llfill2 '
0 22
-pa2L
<72
INI2
0
0 aLL
<7l
pa
.I|fi||a.
(4.40)
It is easy to see that 0, 0 < i < L. It is proven in [49] that Yl!i=\ <7 tr(FF*) =
tr(FreF*ei,). That is, the UCD-DP needs exactly the same power as the UCD-VBLAST
to obtain L identical subchannels with SINR p.
The UCD-DP using the Tomlinson-Harashima precoder leads to an input power
increase of for M-QAM symbols. Nevertheless, for a system with high dimension
ality and/or using large constellation, UCD-DP is a better choice than UCD-VBLAST
since it is free of propagation errors.
4.4 Performance Analysis
4.4.1 Diversity Gain Analysis
An important performance metric is diversity gain, which is defined as follows [16].

68
function ip = TCDPow (/3,A)
L = 1 ; R = length (/3) ; ip = zeros (1, R) ;
C = cumsum (log (/3)) ;
while R > L
[t, 1] = max (Â£(L:R)./[1:R-L+l]) ;
71 = exp (t) ; LI = L + 1 1 ;
if 71 > 1/A (LI) 2
ip(.L:Ll) = 71 ;
L = L + 1 ;
C(L:R) = C(L:R) CCL-1) ;
else
V>(L1:R) = 1./(A(L1:R).*2) ;
C(LI1) = COO sum (log (-0(LI:R))) ;
R = LI 1 ;
end
end
Figure 5-2: A Matlab function to solve (5.49).
and let l denote an index for which 7* is the largest:
l = arg max{7fc : 1 < k < K}. (5.53)
If 7; > 1/Af, then putting ipi = 7Â¡ for all i < l is optimal in (5.49). 7/7/ < 1/Af, then
ipi = 1/A# for all i > l at an optimal solution of (5.49).
Proof: See Appendix B.
Based on Lemma 5.4.3, we can use the following strategy to solve (5.49). We form
the geometric mean described in Lemma 5.4.3 and we evaluate l. If 7Â¡ > 1/A^(, then
we set (pi = 7; for i < l, and we simplify (5.49) by removing ipi, 1 < i < l, from the
problem. If 7Â¡ < 1/A2Hl, then we set tpi = l/X2H i for i > l, and we simplify (5.49) by
removing xpi, l < i < K, from the problem. The Matlab code TCDPow implementing this
algorithm appears in Figure 5-2.
After obtaining the power loading level (pi = xpi~ l/X2Hi, 1 precoder F and the nulling vectors {w}f=1 according to Table 5-1 in Section 5.3. Note
that one of the possible paths through the TCDPow routine makes the leading elements of
t/> all equal while setting the trailing elements of xpi = 1/A^,. This path coincides with
the standard water filling algorithm. In this case, the TCD scheme is optimal in terms
of maximizing the overall throughput given the input power. On the other hand, if some
substream has a very high prescribed SINR such that the l given in (5.53) is less than the
break point j, then ip leads to be a multi-level water filling power allocation, which

77
By [60, Lemma 3.3.8]) and (5.69),
k k
Y /(log(A^)) > Y /(log(A2HF,i)). 1 < k < K, (5.70)
t=l 1=1
whenever / is a real-valued, increasing convex function. The function f(t) = log(e -I-1)
is convex since its second derivative is positive. Making this choice for / in (5.70) and
exponentiating both sides, we obtain:
k k
+ 1) + 1)> 1 < k < K.
t=l =1
Since F is feasible in (5.46),
H(Ahf, + 1)
i=l
K
JJ(AffF,i + 1)
k
> n<*.+1)>
i=l
= n<*+^
=1
1 < k < K,
Combining this with (5.71), we get
=i
K
n(A//,i0i+1)
k
> fj(p + l)i
t=l
L
=i
l (5.71)
(5.72)
Since Xjj +1 is the square of the i-th singular value of the augmented matrix Ga cor
responding to the choice Vi1/2, we conclude that F = Vi1/2 satisfies all the inequality
constraints in (5.46). If the inequality (5.72) is strict, then x should be decreased
in order to satisfy the equality constraint in (5.47). Since decreasing x only lowers
tr(FF*), we deduce that the minimum in (5.46) is achieved by a matrix of the form
F = Vi1/2. If F = Vi1/2 is optimal in (5.46), then so is F = Vi1/2!}7 whenever
fl has K orthonormal columns (since the constraints are satisfied and the value of the
cost does not change). We now make the choice for fi given in Theorem 5.3.1. That
is, if QRcuPT is the GTD of in (5.19) where i is a solution of (5.47), then 17 is
the matrix formed by the first K columns of PT. For this choice of 2, the constraints
of (5.45) are satisfied. As noted earlier, the minimum in (5.45) can be no smaller than
the minimum in (5.46). Since this choice for F yields the same cost in both (5.45) and
(5.46), we conclude that F = V!?1/2!}7 is optimal in (5.45).

72
1,2,..., iV, followed by L N ones. From Theorem 5.2.3, (5.55) exists if and only if
({l + l>--->l) >x {1 + pjf=1. (5.56)
Similar to (5.47), we need to solve the problem
min* Eili fa
subject to ({1 + 0<}il1,1,. 1) {1 + pj.ii
fa > 0, Vz
Similar to (5.49), (5.57) can be further simplified using the variables
L
(5.57)
fa = fa + 1, A = 1 + pi for i < N, and PN ]j[(l + pj.
i=N
The simplified problem is
minV> E.ii fa
subject to nil fa > nil A, fa > 1, 1 < k < N.
(5.58)
The algorithm TCDPow simplifies immensely when we apply it to (5.58). Since \$ > 1 =
for all z, the constraints fa > l are inactive. Since fa < /3,_i for all i < N, the
geometric means satisfy 7, < 7_j for all i < N. Hence, in Lemma 5.4.3, the value of l
is either 1 or N. If l = 1, then we set fa = Pi and we remove fa from the problem. If
l N, then fa = 7^ for all z. It follows that there exists an index j with the property
that
/ N \
fa Pi for all i < j and fa I TT A I for all > j-
\i=j+1 /
This observation coincides with the solution obtained in [56],
Let T denote an L x L identity matrix with its first N diagonal elements replaced
by 1 < i < N. According to the TCD scheme presented in Section 5.3.1, we then
apply the GTD algorithm to tF1/2 to obtain
*1/2 Q*R*PÂ¡.
According to (5.25),
Let
S = F =
Nx(LN)
P*.
[vi, , VJ = [4>l/: 0vx(L-/v)]'I' l/ 2Qj<.
(5.59)
(5.60)
(5.61)

This dissertation was submitted to the Graduate Faculty of the College of En
gineering and to the Graduate School and was accepted as partial fulfillment of the
requirements for the degree of Doctor of Philosophy.
May 2005
Pramod P. Khargonekar
Dean, College of Engineering
Kenneth Gerhardt

73
By (5.26) and (5.27), the nulling vectors used at the base station are
w* = r,v i = (5.62)
where is the ith diagonal element of R>Â¡,. In summary, the base station needs to
run the following three steps:
1. Solve the optimization problem (5.58).
2. Apply the GTD algorithm to '1/2 in (5.59).
3. Obtain the spreading sequences for all mobile users, [si,...,s] = S, and the
nulling vectors {w}t=1 (cf. (5.60) and (5.62)) for the base station.
In the downlink case, the mobiles cannot cooperate with each other for decision
feed-back. Hence the VBLAST detection is impractical at receivers. However, we can
apply TCD-DP as introduced in Section 5.3.2 to cancel out known interferences at the
transmitter, i.e., the base station. We can convert the downlink problem as an uplink
one and exploit the downlink-uplink duality as we have done in Section 5.3.2. Note that
H = H* = I, i.e., the downlink and uplink channels are the same! Consider the case
where the uplink and downlink communications are symmetric, i.e., for each mobile
user, the QoS of the communications from the user to the base station and the base
station to the user are the same. After obtaining the spreading sequences [sj,..., s] for
the mobile users, and the nulling vectors [wi,..., w] used at the base station for the
transmitted from the base station are exactly [wi,..., w] and the nulling vectors used
at the mobiles are the spreading sequences, [sj,..., s], used in the uplink case. The only
parameters we need to calculate are qi,...,qN (cf- (5.37)). Hence in this symmetric case,
the base station only needs to inform the mobiles their designated spreading sequences
once in the two-way communications. Each mobile uses the same sequence for both data
transmission in the uplink channel and interference nulling in the downlink channel.
5.5.4 Numerical Example
We present one numerical example to show how TCD can be applied to CDMA
sequence design. We consider an example where there are L 4 mobile users and
the processing gain N = 3. The prescribed SINRs of the four users are 20,19,18, and
17 dB, respectively. For the uplink case, we apply the TCD-VBLAST scheme to obtain

CHAPTER 6
NOVEL MATRIX DECOMPOSITIONS
6.1 Introduction
Given a complex matrix H, we consider the decomposition H = QRP*, where R
is upper triangular and Q and P have orthonormal columns. Special instances of this
decomposition are
(a) the singular value decomposition (SVD) [61, 62]
H = VSW',
where Â£ is a diagonal matrix containing the singular values on the diagonal,
(b) the Schur decomposition [63]
H = QUQ*,
where U is an upper triangular matrix with the eigenvalues of H on the diagonal,
(c) the QR decomposition where P = I.
In this chapter, we will introduce two novel matrix decompositions, i.e., the geomet
ric mean decomposition (GMD) and the generalized triangular decomposition (GTD).
As we introduced before, the GMD scheme and the UCD scheme are based on the GMD
matrix decomposition algorithm, and the TCD is based on the GTD algorithm. The
results of this chapter are motivated by the applications of designing MIMO transceiver.
Interesting, these results turn out to be also useful to the numerical analysis community.
6.2 Geometric Mean Decomposition
In this section, we present a new unitary decomposition which call the geometric
mean decomposition or GMD. Given a rank K matrix H G Cmxn, it is expressed in the
form QRP* where P and Q have orthonormal columns, and R G RKxK is a real upper
triangular matrix with diagonal elements all equal to the geometric mean of the positive
singular values:
81

83
6.2.1 Generalized Maximin Properties
We consider the following problem:
max min {u1 : 1 < i < K}
F,G
subject to GUF* = H, Uij 0 for i > j, U Â£ RKxK,
uÂ¡i > 0, 1 < i < K,
tr ((G*G)-1) < pi, tr((F*F)-1) (6.3)
If Pi P2 = K, then any Q and P feasible in (6.2) are feasible in (6.3). Hence, the
problem (6.3) is less constrained than the problem (6.2) since the set of feasible matrices
has been enlarged. Nonetheless, we now show that the solution to this relaxed problem
is the same as the solution of the more constrained problem (6.2).
Theorem 6.2.1 7/H Â£ Cmxn has rank K, then a solution of (6.3) is given by
G = Q*/^, U=(^pW and F = P
Pi
v K J
where QRP* is the GMD of H.
Proof: Let VSW* be the singular value decomposition of H, where E Â£ KAxA
contains the K positive singular values of H on the diagonal. If F and G satisfy the
constraints of (6.3), then we have
H = YEW* GUF*.
The column space of GUF* is contained in the column space of G. Since G has K
columns, the dimension of the column space is at most K. Since GUF* = H has rank
K, the column space of G must coincide with the column space of H, which is equal to
the column space of V. Hence, there exists a K by K invertible matrix A such that
G = VA. (6.4)
In the same fashion, the column space of F must coincide with the column space
of H*, which is equal to the column space of W. And there exists a K by K invertible
matrix B such that
F = WB.
(6.5)

44
respectively. While there is a noticeably larger diversity gain for UCD compared with
GMD as shown in Figure 4-3, the difference is not as drastic as the theoretical prediction.
It is because the input SNR is not high enough to validate the approximations made in
the typical error event analyses (see Appendix B).
In the final example, we compare the BER performance of UCD-VBLAST and
UCD-DP in the scenario of a 10 x 10 Rayleigh flat fading channel. To present a bench
mark, we also include UCD-genie as the imaginary scenario where at each layer, a genie
would eliminate the influence of erroneous detections from the previous layers when using
UCD-VBLAST. Figure 4-4 shows that UCD-VBLAST may suffer from some small BER
degradations caused by error propagation (about 0.5 dB for BER = 10-4) compared with
UCD-genie. The UCD-DP, on the contrary, is free of error propagation and hence has
BER performance very close to that of UCD-genie. The slight SNR loss of UCD-DP
is mainly due to the inherent power-amplification effect of the Tomlinson-Harashima
precoder.
4.6 Conclusions
Based on the GMD matrix decomposition algorithm and the closed-form represen
tation of the MMSE-VBLAST detector, we have introduced the UCD scheme for MIMO
communications that can decompose a MIMO channel into multiple subchannels with
identical capacities in a capacity lossless manner. We have proposed two versions of the
UCD scheme, i.e., UCD-VBLAST and UCD-DP. The UCD scheme can provide much
convenience for the subsequent modulation/demodulation and coding/decoding proce
dures due to obviating the need of bit allocation. We have also shown that UCD can
achieve the maximal diversity gain. The simulations show that the UCD scheme has
excellent performance even without the use of error correcting codes. The UCD scheme
suggests a new way of channel decomposition which enjoys much more flexibility than
the conventional SVD based ones.
Appendix A
Proof of Lemma 4.1.2
Rewrite (4.5)
Q Ha
H Q = QHuRHa 4
(4.43)

BER
51
Mt= 10, Mf= 10 Â¡id Rayleigh channel, 64-QAM
Figure 4-4: BER performances of the UCD-DP, UCD-VBLAST schemes and the imagi
nary UCD-genie scheme. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh
flat fading channel with Mt = 10 and Mr = 10.

34
4.1 Closed-Form Representation of MMSE-VBLAST
The UCD scheme is based on the closed-form representation of the VBLAST scheme
using MMSE nulling vectors. For MMSE-VBLAST, the nulling vector for the ith layer
is
(4.3)
w, = hjh* + alj hj, i = 1,..., Mt.
The MMSE-VBLAST algorithm can be represented in a concise matrix form which was
given in [9] (also see the more detailed version [47]).
Consider the augmented matrix
Ha =
H
x/oIm,
(4.4)
(Air+Mt)xMt
Applying the QR decomposition to Ha yields
Ha = Q//aR//a =
Q Ha
Qtfa
R
Ha
(4.5)
where R//a 6 CMfXM is an upper triangular matrix with positive diagonal elements and
QuHa CWrXA/t. Note that H = Q# R//a is not the QR decomposition of H since
is not unitary. However, we can readily obtain the nulling vectors using and R
as shown in the following lemma [47]:
Lemma 4.1.1 Let {q//a,}fi'i denote the columns of and the diagonal
elements of R//, where QuHa and R#o are given in (f.5). The nulling vectors of (4-3)
satisfy
w
i = rHl,iicHa, i = l,2,...,Mt.
(4.6)
Then the output signal-to-interfere-and-noise ratio (SINR) of the zth layer (i.e., the
signal corresponding to hÂ¡) using w, is
|h>i|2^
Pi =
w* (E=i alhJh* + E1) w4
(4.7)
Inserting (4.3) into (4.7), we can simplify (4.7) via some straightforward calculations to
be (see, e.g., [48])
P* = h.*cr1hi> i = l,...,Mf
where C = E}=i hjh* + al.
(4.8)

79
Let M be the first index with the property that
M M
II* = 11* (5-77)
=1 1=1
Such an index exists since ip* is optimal, which implies that
1=1 =1
First, suppose that M < j, where j is the break point given in Lemma 5.4.2. By
complementary slackness, /q = 0 and ip* ip*+1 = Â¡il for 1 < z < M. We conclude that
xpi = 7, for 1 < i < M. By (5.77) we have
M
7.M=n a.
=1
It follows that
/ M \ X/M
IJaJ =7* >7i>
which contradicts the fact that l achieves the maximum in (5.53).
In the case M > j, we have ipi 7 for 1 < z < j. Again, this follows by
complementary slackness. However, we need to stop when i = j since the lower bound
constraints become active for i > j. In Lemma 5.4.2, we show that ip* > ip* 7 for
i > j. Consequently, we have
M M
t=l i=l
Again, this contradicts the fact that l achieves the maximum in (5.53). This completes
the analysis of the case where 7; > 1/A2HÂ¡.
Now consider the case By the definition of 7we have
K
l/K
K
7; > ( n^ J or 7^>n^
(5.78)
0=1
i= 1
If j is the break point described in Lemma 5.4.2, then ip* > ip* for all f; it follows that
K
i=1
(5.79)

The derivative of A(e) evaluated at zero is
A'(0) = %pk -Tpk+i-
Since 1 /A^ fc is an increasing function of k and since pk l/A#*., we conclude that
tpk+i > V'k and A'(0) < 0. Hence, for e > 0 near zero, -0(e) has a smaller cost than
-0(0), which yields a contradiction. Hence, there exists an index j with the property
that V'. = 1 /\2H i for all i > j and pi > 1/A^ for all i < j.
According to Lemma 5.4.1, pi > ipi+i for any i < j. To complete the proof, we
need to show that pj < pj+i- As noted previously, any solution of (5.49) satisfies
K K
i=1 =1
which implies (cf. (5.48))
i j / K \ j
n >n-
1=1 i=l \i=j+l J =1
That is, the constraint nu a > uu Pi in (5.49) is inactive. If ipj > ipj+u we will
decrease the j-th component and increase the j -f 1 component, while leaving the other
components unchanged. Letting 3/3(6) be the modified vector, we set
Vj+i (S) = (1 + S)3pj+1 and ^(S) =
Since the j-th constraint in (5.49) is inactive, 3/3(6) is feasible for <5 near zero. And if
3/3j > 3/3j+i, then the cost decreases as J increases. It follows that xpj < 3/3j+1.
By Lemma 5.4.2, is a decreasing function of i for i [1, j] while fa = 1/A2Hi
for i > j. Since An,i is a decreasing function of i, it follows that fa = fa 1/A2Hi is a
decreasing function of i for i [1,/] with pi > 0, while pi = 0 for i > j. Hence, pi is
a decreasing function of i 6 [1,/f], In particular, the constraint pk > pk+l in (5.47) is
automatically satisfied by the associated solution characterized in Lemma 5.4.2.
We refer to the index j in Lemma 5.4.2 as the break point. At the break point,
the lower bound constraint pi > 1/A^ changes from inactive to active. We now use
Lemma 5.4.2 to obtain an algorithm for (5.49).
Lemma 5.4.3 Let 7^ denote the k-th geometric mean of the Pi:

2
1.2 Joint Transceiver Design: Where Tx and Rx Collaborate
All the aforementioned methods assume that the channel state information (CSI)
is available at the receiver (CSIR) only. Under this assumption, collaborations between
the transmitter and receiver are difficult in the physical layer. However, if the commu
nication environment is relatively stationary, the availability of CSI at the transmitter
(CSIT) is also possible via feedback or the reciprocal principle when time division duplex
(TDD) is used. In fact, in the third generation WCDMA standard [17], the CSIT is
assumed to obtain improved system performance, which is referred to as the closed-loop
transmit diversity or transmit adaptive array (TxAA) technique. Based on this assump
tion, the joint optimal transceiver design (also referred to as precoding at the transmitter
and equalization at the receiver) has recently attracted considerable attentions [18] [19]
[20] [21] [22] [23] [24] [25] [26] [27] [28] [29],
These designs are based on a variety of criteria, including minimum mean-squared-
error (MMSE), [18] [21] [22], maximum SNR [21], maximum information rate [19] [20]
[22], and BER based criteria [23] [24] [25] [29], More recently, a unified framework has
been presented to accommodate all these criteria, under which the design problems can
be solved via convex optimization methods [26].
The aforementioned literature on joint transceiver design considered linear trans
formations only. It is widely understood that the singular value decomposition (SVD),
which decomposes a MIMO channel into multiple parallel subchannels, and water fill
ing can be used to achieve the channel capacity [3], However, due to the usually very
different signal-to-noise ratios (SNR) of the subchannels, this apparently simple scheme
requires careful bit allocation (see, e.g., [19] [20] [23]) to match the subchannel capacity
and achieve a prescribed BER. Bit allocation not only increases the coding/decoding
complexity, but also is inherently capacity lossy because of the finite constellation gran
ularity. An alternative is to use the same constellation in all the subchannels, like
the schemes adopted by the European standard HIPERLAN/2 and the IEEE 802.11
standards for wireless local area networks (WLANs). However, for this alternative, the
BER is dominated by the subchannels with the lowest SNRs. To optimize the BER
performance, more signal power could be allocated to the poorer subchannels. Yet this
approach causes significant capacity loss due to inverse water filling like power allo
cation. There is apparently a fundamental tradeoff between the capacity and the BER
performance.

37
Prom Lemma 4.2.1 and Lemma 4.1.2, we conclude that we can always combine a
linear precoder and the MMSE-VBLAST detector to uniformly decompose a MIMO
channel into L > K subchannels with the same output SINRs. According to Corollary
4.1.3, we can further conclude that the channel decomposition is strictly capacity lossless.
We refer to the scheme demonstrated in Lemma 4.2.1 as UCD-VBLAST.
The proof of Lemma 4.2.1 is insightful. Indeed, given the SVD of H and the
water filling level 41/2, we only need to calculate the GMD given in (4.18). Then we
immediately obtain the linear precoder F = V<)I,/22*, where il consists of the first K
columns of P). Let QÂ£a denote the first Mr rows of QGa, or equivalently the first Mr
rows of Qj (cf. (4.20)). According to Lemma 4.1.1, the nulling vectors are calculated as
w* = rj}i (4.21)
where rjtii is the zth diagonal element of R; and qGo>i is the zth column of QÂ£ .
Some observations can help reduce the computational complexity. For any matrix
B G CMxN with SVD B = UgAsYg and the augmented matrix with SVD
A =
B
y/al
= U^V^,
the diagonal elements of A and A#, i.e., and Asatisfy
^a, = yA2Bi + a, i
Moreover
U,
v^V^A^1
Hence the SVD of J defined in (4.18) is
and Vj = Vo.
J =
U[S: 0/fX(Â£,_K-)]S 1
SI/
where E is an L x L diagonal matrix with the diagonal elements
Gi \J at2 + a, 1 and
(4.22)
(4.23)
(4.24)
(4.25)
(4.26a)
r = y/a, K + 1 < i < L.
(4.26b)

54
Here L = K and 4> is diagonal whose kth {1 < k < K) diagonal element 4>k determines
the power loaded to the kth subchannel and is found via water filling to be
/ \ +
Mp) =
a
AH,k
(5.5)
with p being chosen such that a\ J2k=i 4>k(p) P&z an<^ (a)+ = max{0, a}. In this case,
we obtain K subchannels with capacities
Ck = log2 1 +
= log2
P^H,h
a
bps/Hz, k = 1,2,... ,K. (5.6)
Due to the usually large dynamic range of singular values {Af/fc}Â¡[Li> the SVD decom
poses a MIMO channel into multiple parallel eign-subchannels with different channel
capacities. Moreover, since the optimal power loading levels are fixed as given in (5.5),
the achievable MIMO channel decomposition is rigidly given in (5.6) and it lacks flexi
bility.
Another way of decomposing a MIMO channel is to use the VBLAST detector [5],
The VBLAST scheme involves sequential nulling and cancellation and it decomposes
the MIMO channel into K subchannels (or layers as coined in [5]). By changing the
ordering of the signal detection, we can get K\ subchannel combinations, each of which
is capacity lossless [48].
Theoretically, more combinations of subchannels is possible via time sharing (see
[57, Ch. 14.3]). Recall that every DBLAST layer sends its data substream across the K
transmitting antennas, or VBLAST layers, in a time sharing manner [2]. For example,
for a system with Mt 2, the transmitted data are
Vertical Layer-I : X\ y2 x2 y4 ...
Vertical Layer-II : 0 x? y3 x4 ...
Let X( and yi,i = 1,2,..., denote the symbols transmitted through the DBLAST layers
I and II, respectively, at time i. The receiver first estimates xi and then estimates x2
by regarding y2 as interference. The estimates of xY, x2 are decoded jointly, which form
the output of the diagonal layer I. After subtracting out the effect of X\,x2 from the
received data, we can estimate and decode y2)y2, which form the diagonal layer II. We
remark that DBLAST can be viewed as a combination of VBLAST and the time sharing
technique, which decomposes the MIMO channel into multiple identical subchannels.

ACKNOWLEDGMENTS
Foremost, I thank my advisor, Professor Jian Li, for her support, encouragement,
and guidance in the past four years. Dr. Li provided me the invaluable opportunity
to investigate those fascinating research problems and always showed full confidence in
me. I just hope I can live up to her expectations in my future career. I am very grateful
to my collaborator, Professor William W. Hager, whose suggestions and rigorous math
have benefited me a lot. I thank Dr. Tan F. Wong for teaching me the information
theory. Some of the basic ideas of this dissertation were formulated when I was taking
his class in the fall of 2003. I would like to thank Dr. John M. Shea, Dr. Tan F. Wong,
Dr. Kenneth K. O, and Dr. William W. Hager for serving in my dissertation committee.
Thanks go to all my friends both at the University of Florida and elsewhere who made
the last four years full of fun.
This dissertation is dedicated to my parents and my fiance Hongying.
iii

53
Almost originated at the same time as the research on MIMO transceiver designs,
the optimal design of symbol synchronous CDMA (S-CDMA) sequences has been un
der intensive study over the past decade (see, e.g., [31] [54] [55] [56]). Although the two
research topics have been studied in an apparently independent manner in the signal
processing and information theory communities, the CDMA sequence design problem
can be viewed as a special case of the MIMO transceiver design as we have shown
in Section 2.1.1. Hence the TCD scheme can be applied, with little modifications, to
the design of optimal CDMA sequences. Moreover, the TCD-VBLAST and TCD-DP
schemes can be applied to design optimal CDMA sequences in the uplink (mobile-to-
base) and downlink (base-to-mobile) scenarios, respectively. Our TCD scheme, which
is independently motivated by the MIMO transceiver design problem, turns out to be
related to the scheme proposed in [56]. The relationship is discussed in Section 5.3.
5.2 Channel Model and Preliminaries
5.2.1 Channel Model
To facilitate the discussion, we rewrite the channel model used in the previous
chapters.
y = HFx + z, (5.1)
where x G CLxl is the information symbols precoded by the linear precoder F G CM,xL
and y G CMrXl is the received signal and H G CMrXMt is the channel matrix with rank
K. We assume Â£7[xx*] = cr2I and z ~ A(0,<72lA/r) is the circularly symmetric complex
Gaussian noise. We define the SNR as
Â£[x*F*Fx] a:
= -Â§Tr{F*F} ^ -Tr{F*F},
a
(5.2)
5.2.2 Channel Decomposition
Denote the SVD of a rank K channel H as H = UAV*, where A is a K x K
diagonal matrix whose diagonal elements are the nonzero singular values of
H. To maximize the channel capacity with respect to F given the input power constraint
Tr{FF*} < pcr^/cr2, one needs to solve
CÂ¡T = max log, |I + a_1HFF*H*
Tr{FF*} The optimal linear precoder is (cf. (2.8))
(5.3)
F =- V\$1/2.
(5.4)

101
[14] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, Space-time block codes from
orthogonal designs, IEEE Transactions on Information Theory, vol. 45, pp. 1456
1467, July 1999.
[15] V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank, Combined array
processing and space-time coding, IEEE Transactions on Information Theory,
vol. 47, pp. 199-207, Feb. 1999.
[16] L. Zheng and D. Tse, Diversity and multiplexing: A fundamental tradeoff in
multiple-antenna channels, IEEE Transactions on Information Theory, vol. 49,
pp. 1073-1096, May 2003.
[17] Available Online: http://www.3gpp.org.
[18] J. Yang and S. Roy, On joint transmitter and receiver optimization for multiple-
input-multiple-output (MIMO) transmission systems, IEEE Transactions on
Communications, vol. 42, pp. 3221-3231, December 1994.
[19] G. G. Raleigh and J. M. Cioffi, Spatial-temporal coding for wireless communica
tion, IEEE Transactions on Communications, vol. 46, pp. 357-366, March 1998.
[20] A. Scaglione, G. B. Giannakis, and S. Barbarossa, Filterbank transceiver opti-
mizating information rate in block transmissions over dispersive channels, IEEE
Transactions on Information Theory, vol. 45, pp. 1019-1032, April 1999.
[21] A. Scaglione, G. B. Giannakis, and S. Barbarossa, Redundant filterbank precoders
and equalizers part i: Unification and optimal designs, IEEE Transactions on
Signal Processing, vol. 47, pp. 1988-2006, July 1999.
[22] H. Sampath, P. Stoica, and A. Paulraj, Generalized linear precoder and decoder
design for MIMO channels using the weighted MMSE criterion, IEEE Transactions
on Communications, vol. 49, pp. 2198-2206, December 2001.
[23] A. Scaglione, P. Stoica, S. Barbarossa, G. B. Giannakis, and H. Sampath, Optimal
designs for space-time linear precoders and decoders, IEEE Transactions on Signal
Processing, vol. 50, pp. 1051-1064, May 2002.
[24] E. Onggosanusi, A. Sayeed, and B. V. Veen, Optimal antenna diversity signal
ing for wideband space-time wireless channels utilizing channel state information,
IEEE Transactions on Communications, vol. 50, pp. 341-353, February 2002.
[25] E. Onggosanusi, A. Sayeed, and B. V. Veen, Efficient signaling schemes for wide
band space-time wireless channels using channel state information, IEEE Trans
actions on Vehicular Technology, vol. 52, pp. 1-13, January 2003.
[26] D. Palomar, J. Cioffi, and M. Lagunas, Joint Tx-Rx beamforming design for mul
ticarrier MIMO channels: A unified framework for convex optimization, IEEE
Transactions on Signal Processing, vol. 51, pp. 2381-2401, September 2003.
[27] D. Palomar and M. Lagunas, Joint transmit-receive space-time equalization in
spatially correlated MIMO channels: A beamforming approach, IEEE Journal on
Selected Areas in Communications, vol. 21, pp. 730-743, June 2003.
[28] D. Palomar, M. Lagunas, and J. Cioffi, Optimum linear joint transmit-receive
processing for MIMO channels with QoS constraints, IEEE Transactions on Signal
Processing, vol. 52, pp. 1179-1197, May 2004.

39
then the transmitter is relieved from calculating the SVDs. Hence in FDD systems, it is
preferable to feed back F, rather than H, to the transmitter. In TDD systems, there are
still advantages for feeding back F since this reduces by approximately half the overall
computational complexity.
We conclude the discussions of the UCD-VBLAST scheme by deriving the SINR of
each subchannel. Note that the diagonal elements of Rj is
rjii=n^l = 1,2,..., Z,, (4.31)
which is the geometric mean of the diagonal elements of E. It follows from (4.26) that
K
\1/L ( k \ l/L
rlu = ( aL~K Rtf + )) = a ( (a-1 (4.32)
Â¡=i
q=i
According to Lemma 4.1.2,
K
l/L
Pi = p ~ II(a 1 a=i
(4.33)
Hence
L K K
loS2(! + Pi) = k&i1 + a~1 i=l i=l :=1
which is exactly the CÂ¡j in (2.10). Hence UCD-VBLAST is strictly capacity lossless.
4.3 UCD-DP
As a dual form of UCD-VBLAST, the UCD scheme can be implemented by using
DP precoding, which we refer to as UCD-DP. For UCD-DP, a direct construction of
the linear precoder F as done in Section 4.2 is not obvious. Instead, we exploit the
We convert the UCD-DP problem into the UCD-VBLAST problem in the reverse
channel where the roles of the transmitter and receiver are exchanged
y = H*x + z. (4.35)
The UCD-VBLAST scheme can be applied to the channel of (4.35), which yields
the precoder Fre and the equalizer {w,}f=1 as in (4.29) and (4.21), respectively. Nor
malize {w}t=1 to be of unit Euclidean norm, which we denote as {w,}f=1. Let W =
[wj,..., w]. According to the uplink-downlink duality, the precoder of UCD-DP should
be F = WD, where D, is diagonal with the diagonal elements which will be

62
Let f, i = 1,..., L, be the scaled version of f\ and has unit length. Denote pt ||f,||2.
Then
, i 1,.... L.
Let ij |f*Hw.,|2. Then (5.32) can be represented in the matrix form
(5.32)
1
o
o
1
Pi
Pi
~P2a 12 22
P2
P2
: . o
~PLalL -PLa2L aLL
PL
PL
(5.33)
According to the uplink-downlink duality, in the original channel, the precoder of TCD-
DP should be F = ..., v/Â¡tZw], where { and the receiving vectors are f, i 1,... ,L. Then we get L subchannels whose ith
scalar subchannel of the MIMO channel is
L i-1
2/t + Y i'HwjsfqjXj + Y + f.*z (5.34)
j=i+l j=1
Applying the dirty paper precoder to x, and treating E}=i f Hwj y/qjXj as the interfer
ence known at the transmitter (note that here we precode the first layer first while for
TCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel
L
yi = ViYlvjiy/qixi+ Y + f*z (5-35)
=1+1
with SINR (again, recall that a 1 and o\ cr2)
<7|f;Hw|2
pi =
1 + Ej=i+1 ^IfrHwjl
for i 1,2,... ,L.
(5.36)
Similar to (5.32), (5.36) can also be represented as
11 Pl12 PllL
Pi
0 a22 P22L
92
=
P2
0 0 an
9L
PL
It is easy to see that > 0, 0 < i < L. It is proven in [49] that E+=i <7
tr(FF*) = Yh=iP- That is, to obtain L subchannels with SINRs {p}f=1,
(5.37)
= tr(FF) =
the TCD-DP

48
Now we consider UCD. We observe that the power allocation applied to each eigen
subchannel is no greater than p. Hence the overall channel throughput of UCD is
m m
Yl bS (l + ^Ak) ^ ^CD < Yl l0g 1 + (4-62)
=1 i=l
where the left term denotes the channel throughput associated with uniform power
allocation. Applying UCD, we obtain m subchannels with the same SNR:
\
771 rn
IK1* ~~ 1 PvcD ~ \
=i \
n (i+pA^) -1.
(4.63)
i=l
The typical error event is
It follows from (4.63) that
Â£ = : PVCD < 1}
(4.64)
m \ / m m \
\J n 0+Â£xh) -1 <11
(4.65)
It is easy to see that
Hence
lim = lim hgPM
p>oc
logp
Mm asm ,im
p->oo log P
logRi(p)
(4.66)
(4.67)
poo log P p->oo log p '
which implies that water filling does not help improve diversity gain.
It follows from the analyses of [16] that the UCD scheme achieves the optimal
diversity-multiplexing tradeoff. In particular, when the transmission data rate is fixed,
disregard the increase of input SNR, the diversity gain is ducd(M,m) = Mm.
w

93
Dimension
Time
SVD_EIG
GTD
a error
SVD_EIG GTD
A error
SVD_EIG GTD
100
0.61
0.20
9.8e15
1.0e-14
3.3e14
0
200
2.24
0.38
2.0e14
1.7e14
5.9e13
0
400
13.84
0.86
6.8e14
3.7e14
3.3e13
0
800
97.50
2.30
9.8e14
7.0e14
1.5e10
0
1200
317.83
5.67
l.le13
1.3e13
1.5e9
0
1600
746.77
10.77
3.2e13
1.5e13
7.7e4
0
Table 6-1: Comparison of SVD_EIG and GTD for inverse eigenvalue problems (CPU time
in seconds, singular value and eigenvalue errors in sup-norm)
6.3.3 Inverse Eigenvalue Problem
In [71] Chu presents a recursive procedure for constructing matrices with prescribed
eigenvalues and singular values. His algorithm, which he calls SVD_EIG, is based on
Horns divide and conquer proof of the sufficiency of Weyls product inequalities. In
general, the output of SVD_EIG is not upper triangular. Consequently, this routine could
not be used to generate the GTD. Chu notes that to achieve an upper triangular matrix
would require an algorithm one order more expensive than the divide-and-conquer
algorithm.
Given a vector of singular values a R" and a vector of eigenvalues A C", with
A ^ cr, we can use the GTD to generate a matrix R with A on the diagonal and with
singular values problem provided by the GTD to Chus algorithm. In our initial experimentation, we
discovered that the algorithm of Chu, as presented in [71], did not work. When this was
pointed out, Chu provided an adjustment in which the parameter Â¡i in [71, (2.2)] was
replaced by /zAi/|Ai|. With this adjustment, it was possible to solve 4 by 4 and 5 by
5 test cases that previously caused failure. The results reported in this section use the
Both Matlab routines GTD (see Appendix) and SVD_EIG [71] require 0(n2) flops,
so in an asymptotic sense, the approaches are equivalent. In Table 6-1 we compare the
actual running times of GTD and SVD_EIG for matrices of various dimensions. These
computer runs were performed on a Sun Workstation with 1 GB memory. In making
these runs, the portion of the GTD code connected with the updating of the matrices
P and Q was deleted since SVD_EIG does not accumulate the unitary matrices. The
input arrays a and A were generated in the following way: Using the Matlab routine
RAND, we randomly generated a square matrix whose element lie between 0 and 1. The

46
where the Q-function is defined as
The diversity gain of the GMD scheme is
log PGMD
d0MD(M,m) = lim 5. (4.50)
P^oo log p
For any QAM constellation, the average error probability is similar to (4.49) except for
some constants before or inside the Q-function. Since we focus on the high SNR region,
all these constants will not affect the diversity gain defined in (4.50).
At high SNR, the typical error event is
Â£ {^h < P }
(4.51)
It can be shown that instead of calculating (4.50), which involves complicated integra
tions, we can compute the following [50, Ch. 3]:
dGMv{M,m) = lim
p*oo
log P(Â£)
log p
(4.52)
Note that
m
= |H*HI (4.53)
i=l
According to [53, Theorem 7.5.3] (with straightforward extensions from real-valued do
main to the complex-valued domain),
^/T = |H*H| = jQ (4-54)
t=l
where g2Ps are independent Chi-squared random variables with probability density
= J~iyx'~le~X' x - (4.55)
Now the typical error event can be written as
m
{5A/-m+i}t=l j[ 9M-m+i ^ P
i-1
U {{?M-m+}Â£l -ah-m+i

58
Denote the SVD of a rank K channel H as H = UAV*, where A is a K x K
diagonal matrix whose diagonal elements are the nonzero singular values of
H. The conventional SVD based linear transceiver designs have precoder F = V\$V2
where is a diagonal matrix whose diagonal elements stand for the power allocation.
The precoder F transforms the MIMO channel into K orthogonal subchannels with
capacities
Ck = log2(l + Alk) bps/Hz, k = 1,2,..., K. (5.12)
For this kind of precoder design, the only way of controlling the capacity of the sub
channels is to change the power allocation .
If we modify the precoder F to be 3
F = V\$l'2nr (5.13)
where 6 RixK with L > K, and 2T2 I, then it can been readily seen that
introducing 2 does not change the overall channel capacity. However, it brings much
greater flexibility as demonstrated in the following theorem.
Theorem 5.3.1 (TCD Theorem) Consider a MIMO channel of (4-1) with F given
in (5.13). For any L > K, let c be a zero vector with its first K elements
replaced with {Ck}k=v where Ck = log (1 -f A2H k4>k)- Given any rates we can
find an orthonormal matrix 2 6 RixK such that the combination of the linear precoder
F = Vfc^nT and the MMSE-VBLAST detector yields L subchannels with capacities
{ify}Â£=i if and only if {Rk)k=i -<+ c.
Proof: Given the precoder of (5.13), the virtual channel is
G = HF = UA<>1/2f2T = UAG2T (5.14)
where Ac = A\$1/2 is a diagonal matrix with diagonal elements
Xg, = XH(Â¡)]/2, i = l,...,K. (5.15)
3 Letting 2 to be complex-valued does not introduce additional flexibility as is clear
according to the GTD algorithm.

23
subchannels will cause the failure of the decoding in all the subsequent subchannels. The
SER upper bound is readily calculated as
Pe,GMD-VBLAST =
<
-tfitl-P.m-r,)P.
n0
1 K~'
-^T(K-n)Pc
71=0
(3.22)
For a moderate K, say K < 10, the performance loss caused by the error propagation is
rather small. For a system with high dimensionality, GMD-ZFDP is a better choice since
it causes no error propagation. On the other hand, the Tomlinson-Harashima precoder
leads to an input power increase of for M-QAM.
3.3.2 Combination of GMD with Two-way Channel Subspace Tracking
In TDD systems, the GMD scheme may be combined with two-way channel sub
space tracking techniques. The GMD algorithm, given in Chapter 6, starts with the
SVD. To calculate the matrix P (cf. (3.11)), we only need to know the singular values
A and the right singular vectors V (cf. Chapter 6). Similarly, only A and U are used
to calculate Q. Rewriting (2.1) with the precoder F = P yields,
y = HPx + z. (3.23)
Since the GMD scheme uses the same signal constellation and uniform power allocation,
the covariance matrix of s is a scaled identity matrix, i.e., E[xx*] = Ry = Â£[yy*] = HHV2 -F a\l. (3.24)
If the signal power a2 and the noise power a2 are known a priori, we have HH* =
(Ry cr2I)/cr2. Applying SVD to HH*, we get
HH* = UA2U*. (3.25)
The GMD algorithm can be applied based on U and A to get the matrices Q and R,
which are sufficient for decoding. If a TDD system is used, the reverse channel, where
the roles of previous transmitter and receiver are exchanged, can be modeled as
Y rev H Q Srer -|- Z,.rl;
(3.26)

4
that can achieve the optimal tradeoff between the diversity gain and multiplexing gain.
Without incurring any capacity loss, the TCD scheme can decompose a MIMO chan
nel into multiple subchannels with prescribed capacities/channel gains. This scheme
is applicable to a wide range of applications, including the multi-task communications
where independent data streams with different qualities-of-service (QoS) share the same
MIMO channel, and designing the optimal CDMA sequences.
1.4 Dissertation Outline
In Chapter 2, we introduce the data model and some relevant information-theoretic
results that will be used in this dissertation. We also review the existing transceiver
designs and analyze the performances of those methods. By linking the channel capacity
with the Cramer Rao bound (CRB), we give an information-theoretic explanation why
linear transceivers are inflexible.
Chapter 3 presents the GMD scheme that combines the VBLAST detector or DP
precoder with the GMD matrix decomposition algorithm. The GMD scheme can decom
pose a MIMO channel into multiple identical scalar subchannels. This desirable prop
erty can bring much convenience to the practical system design, particularly the symbol
constellation selection. Moreover, we have shown that the GMD scheme is optimal as
ymptotically for high SNR in terms of both information rate and BER performance
while the computational complexity of our scheme is comparable to the conventional
linear transceiver scheme.
In Chapter 4, we propose a uniform channel decomposition (UCD) scheme. Similar
to the GMD scheme, the UCD is also based on the GMD matrix decomposition algo
rithm and can decompose a MIMO channel into multiple identical subchannels. Two
remarkable merits of UCD, which are not shared by the GMD scheme, are that first,
UCD is strictly capacity lossless at any SNR, and second, UCD can achieve the opti
mal diversity and multiplexing tradeoff. Moreover, the UCD scheme can decompose
a MIMO channel into an arbitrarily large number of independent subchannels, which
is an enabling technology to achieve high data rate transmission using small symbol
constellations.
Chapter 5 is devoted to tackling a new aspect of the MIMO transceiver design
problem. Instead of attempting to optimize the BER performance for fixed input power
and data rate, we propose the TCD scheme which can decompose a MIMO channel
into multiple subchannels with prescribed channel capacities. We show that TCD is a

24
where the subscript rev means reverse channel. Define
= Â£[yreyL] (3-27)
where y denotes the complex conjugate of y. Using the similar argument, we have
H*H = VA2V*. (3.28)
Then the reverse receiver, i.e., the previous transmitter, can calculate R and P from V
and A. Channel subspace tracking techniques (see, e.g., [45] [46]) can be used to estimate
U, V and A efficiently. Hence our GMD scheme can be applied without the need of
using training symbols for channel estimation. We note that this merit of GMD is not
shared by the conventional transceiver schemes introduced in Section 2.3 since all those
methods allocate different powers to different subchannels, which makes it difficult, if
not impossible, to estimate the singular values in A. Of course, if the same power is
allocated to each eigen-subchannel, this blind two-way channel subspace tracking idea
can also be combined with the SVD based schemes, at the cost of significant capacity
loss.
The GMD scheme can be made backward compatible with the TDD systems using
VBLAST decoders. By using CSIT or blind subspace tracking techniques, the trans
mitter can calculate the linear precoder F. Hence it can always precode the transmitted
data x to be Px, even when sending the training data. Thus the receiver is fooled
to believe that the channel is the virtual one Ht = HP = QR. Although the linear
precoder P is made transparent to the VBLAST detector, the decoder still enjoys the
multiple identical subchannels due to the linear precoder F = P.
3.3.3 Subchannel Selection
The previous discussion is based on the assumption that all the subchannels cor
responding to positive singular values are used for signal transmission. However, in
practical scenarios, some of the positive singular values of the channel matrix H can be
very small. This situation occurs for spatially correlated flat fading channels, or even
i.i.d. Rayleigh flat fading channels with Mr ta Mt 1. From (3.12b), we see that
it will influence the overall channel quality and hence subchannel selection is helpful.
The other situation where subchannel selection is needed is the case when the input
power is low or moderate. In this section, we propose a simple algorithm to select the

55
However, time sharing can be difficult to implement in practice. For instance, the
major difficulty of DBLAST is the requirement of encoding the diagonal layer with short
and efficient error correction codes, which limits its practical implementation despite its
superb theoretical performance analyzed in [16].
If CSIT is available, more flexible and practical channel decompositions can be
achieved. In Chapter 4, we have proposed the UCD scheme which combines the geomet
ric mean decomposition (GMD) developed in Section 6.2 with either an MMSE-VBLAST
detector or a DP precoder to decompose the MIMO channel of (5.1) into L > K iden
tical subchannels. Hence, the UCD scheme can achieve the theoretical performance of
the DBLAST scheme without resorting to any error correcting coding.
In this chapter, we generalize the results of Chapter 4 and develop a systematic
channel decomposition that combines the recently proposed GTD algorithm with either
an MMSE-VBLAST detector or a DP precoder. We show that given K parallel subchan
nels with capacities Ci, C2,..., Ck, which are obtained via SVD, TCD can convert the
K subchannels into L > K subchannels2 with capacities Rx, R2,..., R if and only if
(Ri, /?2, , Rl) is majorized by {C\,..., Ck, 0,..., 0) 6 ML. This scheme is particularly
relevant to the applications where independent data streams with different qualities-of-
service (QoS) share the same MIMO channel [28], For example, video services usually
require higher SNRs than audio services. Decomposing a MIMO channel into multi
ple subchannels with prescribed capacities and transmitting independent data streams
through these subchannels can provide much convenience for resource allocations.
5.2.3 Majorization and Generalized Triangular Decomposition
We introduce several basic concepts and theorems of the majorization theory from
[58]-
Definition 1 For x, y G Rn, if
i i
l^j 1=1 1=1
with equality holds for j = n, where the subscript [i] denotes the ith largest element of the
sequence, we say that x is majorized by y and denote x -<+ y or, equivalently, y x.
2 If L < K, some eign-subchannels are discarded, which causes capacity loss. Hence
we focus on the case of L > K.

12
coder/decoder, and the CRB, which is a lower bound on the covariance matrix of any
unbiased estimator of x.
The MMSE estimator of x is
xMMSE = R,H (HRXH* + R,)-1 y (2.35)
It is easy to verify that the MMSE estimator of x can achieve the CRB. Hence the MMSE
estimator is the best we can achieve under the Gaussian assumptions. In general cases,
the matrices FIM and CRB are non-diagonal; i.e., the MMSE estimates of the elements
of x are correlated. The correlations between the elements of x clearly contain useful
information for the subsequent decoding procedures. However, in practice, we only
estimate the single elements of x separately and ignore the correlations between these
elements. This causes the loss of information. In fact, we can quantify the capacity loss
as
Qob. = Â£ log CRB** log |CRB| (2.36)
fc=i
where CRB** denotes the k-th diagonal element of CRB. According to the Hadamard
inequality [34], for any positive semidefinite matrix M 6 CK,
K
|M| < H Mkk (2.37)
2 = 1
and the equality holds if and only if M is diagonal. Hence Closs > 0 and there is no
capacity loss if and only if CRB is a diagonal matrix.
Based on the aforementioned discussions, we see that i) in general MIMO com
munications, linear MMSE estimators followed by separate substream decoding are not
capacity-wise optimal and ii) if the channel matrix H has the property that CRB of
(2.34) is a diagonal matrices, linear MMSE estimators may be the first step of capacity
lossless processing. If CSIT is available, the transmitter can apply some precoder F and
get a virtual channel matrix
Hvt = HF (2.38)
such that CRB is diagonal. This explains why all the existing linear transceiver designs
invariably lead to the diagonalization of the channel matrix. Indeed, if Rx is diagonal
and R- = a]I, then it follows from (2.31) that Hvt must have orthogonal columns
to get diagonal FIM and hence diagonal CRB. Then the precoder F = V, which
is the right singular vector of H, is the only optimal solution. Yet as we discussed

I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
Jian Li, Chair
Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
Kenneth K. O
Professor of Electrical and Computer
Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
John M. Shea
Assistant Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
Tan F. Wong
Associate Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality, as a
dissertation for the degree of Doctor of Philosophy.
William W. Hager
Professor of Mathematics

32
GMD+OFDM, N = 64, L = 4, 64-QAM
Figure 3-5: BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined
with OFDM for ISI suppression.

LIST OF FIGURES
Figure page
3-1 Average capacity over 1000 Monte Carlo trials vs. SNR with Mt = 4 and
Mr = 4 for i.i.d. Rayleigh flat fading channels 29
3-2 Complementary cumulative distribution functions of the capacities of 5
subchannels of the i.i.d. Rayleigh flat fading channel with Mt = 5 and
Mr = 5. Results based on 2000 Monte Carlo trials 30
3-3 Complementary cumulative distribution function of the capacity of an
i.i.d. Rayleigh flat fading channel with Mt 10 and Mr = 10. Results
based on 1000 Monte Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20
dB, and (d) 30 dB 31
3-4 BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh
flat fading channel vs. SNR with (a) Mt = 2 and Mr 4 and (b) Mt 4
and Mr = 4 31
3-5 BER performances of GMD-VBLAST and GMD-ZFDP. Both are com
bined with OFDM for ISI suppression 32
4-1 Complementary cumulative distribution function of the capacity of an
i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results
based on 2000 Monte Carlo trials. SNR = (a) 10 dB, (b) 10 dB (c) 20
dB, and (d) 30 dB 49
4-2 Complementary cumulative distribution functions of the capacities of 5
subchannels of an i.i.d. Rayleigh flat fading channel with Mt 5 and
Mr = 5. Results based on 2000 Monte Carlo trials 50
4-3 Uncoded BER performance when using 16-QAM. Results based on 1000
Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with Mt 4
and Mr = 4 50
4-4 BER performances of the UCD-DP, UCD-VBLAST schemes and the
imaginary UCD-genie scheme. Results based on 1000 Monte Carlo
trials of an i.i.d. Rayleigh flat fading channel with Mt = 10 and Mr = 10. 51
5-1 Illustration of the capacity lossless region obtainable via TCD. We assume
K = 3, Ci = 3, C2 = 2, and C3 = 1 57
5-2 A Matlab function to solve (5.49) 68
5-3 Input SNR vs. Output SINR. The result is based on the average of 500
Monte Carlo trials of a i.i.d. Rayleigh flat fading channel with Mt 5
and Mr = 6 69
5-4 Input SNR vs. C\. A rank 2 channel is decomposed into two subchannels
with capacities C\ and C2 = 10 C\ 70
6-1 The operation displayed in (6.8) .
vii
86

10
C. There are more transmitting antennas than receiving ones, i.e., Mt> Mr.
Moreover, the availability of CSIT provides more freedom, which makes it easier to
devise joint transceiver design schemes to achieve the underlying channel capacity. This
observation is the underlying theme of this dissertation.
2.2 Channel Capacity and Cramr-Rao Bound
One of the most important significances of the Shannons information theory is that
this theory can predict the highest achievable data rate for a given channel. Similarly,
the Cramr-Rao bound (CRB) [33], which is the inverse of the Fisher information matrix
(FIM), can predict the minimum mean squared error (MMSE) an estimator can achieve.
In this section, we show that the MIMO channel capacity formula of (2.6) can be re
written as a function of CRB, or FIM. Based on this relationship, we show that linear
transceivers lack flexibility.
We rewrite (2.1) as follows:
y = Hx + z, (2-19)
but we relax the assumption of (2.1) slightly. Instead of assuming spatially white noise,
we assume that z ~ A'(0, R.). We also assume that the channel input x ~ N(0. Rx)
also has circularly symmetric complex Gaussian distribution and is independent of z.
Then the channel output y ~ Ar(0. HR IT -f R,). For this more general scenario, the
channel capacity is
^ |R2 + HRXH*
c=logm
Now Consider the following random vector,
X
~ Ar [ 0,
y
l [
Its log-likelihood function is
R,
HR,
log/(x,y)
const [x* y*]
R,
R,H*
-1
X
HR,
Ry
y
(2.21)
(2.22)
Using the block matrix inversion formula [34], we get
R.r
R,H*
-1
A B
HR,
Ry
B*
(2.23)

65
Moreover, if QRGaPT is the GTD of Aca in (5.19), then (5.45) has the solution F =
V^1/2^7 where \$ is a solution of (5.47) and PI is the matrix formed by the first K
columns of PT.
Proof: See Appendix A.
We now develop an efficient algorithm for solving (5.47). We will see that the
constraint cpk > (5.47). To begin, we make a change of variables to further simplify the formulation of
(5.47). We define
A = i + 1 /A#,*, 1 < i < K,
A = 1 AH,i
fa = rittiri1+pi)-
With these definitions, (5.47) reduces to
miiv J2i 'Pi

subject to nil 'Pi > nil A, A > 1 1 (5.49)
Both the equality constraint and the inequalities > 4>k+\ in (5.47) have been dropped
since these constraints are automatically satisfied at an optimum. The fact that (Â¡>k >
(Â¡>k+\ is established after Lemma 5.4.2. With regard to the equality constraint, if ip is
feasible in (5.49) and the inequality corresponding to k = K is strictly positive, then the
cost is reduced when the trailing components of f> are lowered. That is, if ip is feasible
in (5.49) and the inequality corresponding to k = K is strictly positive, then the cost is
reduced when the trailing components of ip are lowered.
Clearly, the feasible set for (5.49) is nonempty and the cost function tends to infinity
as any of the components of ip tends to infinity. By continuity of the cost function and
the constraints, a minimizer must exist. We now analyze the structure of the minimizer.
By exploiting the structure, we obtain a fast algorithm for solving (5.49).
We first study a similar optimization problem with relaxed constraints.
Lemma 5.4.1 Any solution ip of the problem
K k k
min 'Pi subject to riA>n A, 1 < k < K, (5.50)
i=l =1 t=l
has the property that ipi+1 < ipi for each i.

14
where V is as defined in the SVD H = UAV*, and d> is a diagonal matrix whose ith
diagonal element denotes the signal power loaded to the ith subchannel. According
to the literature (see e.g. [23] Sec. III-A)
fa
^1/2Ah, A2,.
(2.43)
where p is the Lagrange multiplier which controls the loaded power such that fa =
pa2. Suppose p is sufficiently large. Then all the K subchannels are used and
K
or
1/2 P + E.=i A Â£
Eli, *iW'
(2.44)
(2.45)
Substituting (2.42), (2.43) and (2.45) into (2.41), we see that E is diagonal with the th
diagonal element
spK x-i
Z^k=l AH,k
Ei =
(P + Xjt=i A///t)A//,
Then (cf. Equation (28) of [26])
Ci = log2 Et
l (P + Sfc=1 AH k \ , x
= lg2 | .-1 + log2 AH.i-
V* A_:
\ 2^i=k AH,k /
Hence the sum rate of the channel using the MTM scheme is
k fn + yrK A-2 \ K
Cmtm = J2i = Klog2 | K k~*H,k ] + loS2 Ah,.-
.=1 \ X-/fc=l A H k ) i=i
The channel capacity with uniform power loading in the K subchannels is
K
Cupl = Y log2 (1 +
i= 1
(2.46)
(2.47)
(2.48)
(2.49)
(2.50)
Here Cupl is different from CUT defined in (2.11) in that Cupl corresponds to the
channel with the transmitter knowing the range space of H.
It follows from (2.49) and (2.50) that
Cupl-Cmtm = lg2 (l + -^A^)Alog2 (P 1 H'') lQg2 Ah,.- (2.
=1 \ 2^= 1 AH,i / ,-=l
51)

96
*/. H = Q*R*P (GTD based on r)
*/, P and Q orthonormal columns
/, R upper triangular, R (i, i) = r (i)
function [Q, R, P] = gtd (U, S, V, r)
d = diag (S) ;
K = min (size (S)) ;
P=V;Q=U;R= zeros (K) ;
for k = 1 : K-l
rk = r (k) ;
abs_rk = abs (rk) ;
kpl = k + 1 ; kml = k 1 ;
I = find (abs (d (k : K)) > abs_rk) ;
if ( isempty (I) )
[x, p] = max (abs (d (k : K))) ;
p = p + kml ;
else
I = I + kml ;
[x, p] = min (abs (d (I))) ;
P = I (p) ;
end
deltal = d (p) ;
d ([k p]) = d ([p k]) ;
I = find (abs (d (kpl : K)) <= abs_rk) ;
if ( isempty (I) )
[x, q] = min (abs (d (kpl : K))) ;
q = q + k ;
else
I = I + k ;
[x, q] = max (abs (d (I))) ;
q = i (q) ;
end

CHAPTER 4
UNIFORM CHANNEL DECOMPOSITION
We have seen in Chapter 3 that the GMD scheme can have much better perfor
mance than the conventional linear transceivers. However, the GMD scheme may suffer
from considerable capacity loss at low SNR due to the inherent zero-forcing oper
ations which is capacity lossy, especially at low SNR. In this chapter, we propose a
uniform channel decomposition (UCD) scheme, which is also based on the GMD matrix
decomposition algorithm, to decompose a MIMO channel into multiple identical sub
channels. The UCD scheme has two implementation forms. One is the combination
of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST)
detector, which is referred to as UCD-VBLAST, and the other includes a dirty paper
(DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as
UCD-DP. Just like the GMD scheme, UCD can bring much convenience to the subse
quent modulation/demodulation and coding/decoding procedures by obviating the need
of bit allocation. Two remarkable merits of UCD, which are not shared by the GMD
scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD has
the maximal diversity gain. Moreover, the UCD scheme can decompose a MIMO chan
nel into an arbitrarily large number of independent subchannels, which is an enabling
technology to achieve high data rate transmission using small symbol constellations.
To facilitate the discussion, we recall the channel model given in (2.1) as follows.
y = HFx + z, (4.1)
where x G and y G CMrXl is the received signal and H G CMrXMt is the channel matrix with rank
K. We assume E[xx*] = and z ~ N(0,a^I\Â¡T) is the circularly symmetric complex
Gaussian noise. We define the input SNR as
P = -~-J2 FX^ = ^Tr{F*F} 4 Tr{FF}. (4.2)
33

75
5.5.5 Further Remarks
The TCD scheme, which was originally motivated by MIMO transceiver designs,
turns out to be similar to the scheme of [56] in several aspects. Both schemes are based
on the nonlinear decision feedback operations. Hence both are optimal in terms of max
imizing the channel throughput and minimizing the overall input power. Both the GTD
algorithm, on which the TCD scheme is based, and the construction of the Hermitian
matrix with prescribed eigenvalues and Cholesky values as done in [56] rely on the Weyl-
Horn theorem. However, our TCD scheme enjoys several remarkable advantages over
the scheme of [56]. First, note that if we obtain the GTD H = QRP*, where R has the
prescribed diagonal elements, then it follows immediately that A = P*H*HP = RR* is
the desired Cholesky decomposition. However, the information associated with Q is lost
in the Cholesky decomposition. Hence the nulling vectors used at the receivers of [56]
cannot be calculated explicitly as our TCD does (cf. (5.27)). Furthermore, the correla
tion matrix A is only an intermediate result. To get the CDMA sequences, one has to
decompose A = RR* explicitly. The TCD scheme, however, can be used to obtain both
the precoder (CDMA sequences), which are the columns of P, and the equalizer from Q
simultaneously. Second, our TCD scheme is a solution to the more general MIMO trans
ceiver design problem. The Cholesky decomposition algorithm provided in Appendix C
of [56] is only applicable to the scenario where the singular values are only of two values.
Hence it is not applicable to the general design of MIMO transceivers. The more general
Cholesky factorization algorithm suggested in the proofs is computationally far less effi
cient. Third, the TCD scheme has two implementation forms, i.e., TCD-VBLAST and
TCD-DP, which makes it applicable to both the downlink and uplink scenarios. Finally,
the TCD scheme provides insights that identify the CDMA sequence design problem as
special cases of the MIMO transceiver design.
5.6 Conclusions
Based on the recently developed GTD matrix decomposition algorithm, we have
proposed the TCD scheme utilizing the CSIT and CSIR. TCD can be used to decompose
a MIMO channel into multiple subchannels with prescribed capacities. The TCD scheme
has two implementation forms. One is the combination of a linear precoder and a
minimum mean-squared-error VBLAST (MMSE-VBLAST) detector, which is referred
to as TCD-VBLAST, and the other includes a dirty paper (DP) precoder and a linear
equalizer followed by a DP decoder, which we refer to as TCD-DP. Both forms of TCD

88
computing the decomposition H = QRP\ This algorithm for the GTD essentially
yields a constructive proof of Theorem 6.3.2.
Let VSW* be the singular value decomposition of H, where Â£ is a K by K diagonal
matrix with the diagonal containing the positive singular values. We let 6 CKxK
denote an upper triangular matrix with the following properties:
(a) r\^ = 0 when i > j or j > i > L. In other words, the trailing principal submatrix
of starting at row L and column L, is diagonal.
(b) If denotes the diagonal of R^L\ then the first L 1 elements of r and are
equal. In other words, the leading diagonal elements of R) match the prescribed
leading elements of the vector r.
(c) ti-k where Tl-k denotes the subvector of r consisting of components L
through K. In other words, the trailing diagonal elements of R^) multiplicatively
majorize the trailing elements of the prescribed vector r.
Initially, we set R(1^ = Â£. Clearly, (a)-(c) hold for L = 1. Proceeding by induction,
suppose we have generated upper triangular matrices R^, L = 1,2satisfying
(a)-(c), and unitary matrices Q and P, such that R(+1) = QR()P for 1 < L <
k. We now show how to construct unitary matrices Qt and P*, such that R^+b =
Q^R^'^Pfc, where R^fc+1^ satisfies (a)-(c) for L = k + 1.
Let p and q be defined as follows:
p = arg min{|rt-fc)| : k < i < K, |r{fc)| > |r*,|}, (6.13)
i
q = arg max{|rf}| : k i
where r-A) is the -th element of Since rk-.K ^ r^, there exists p and q satisfying
(6.13) and (6.14). Let II be the matrix corresponding to the symmetric permutation
ITR^n which moves the diagonal elements and r\$ to the k-th and (k + l)-st
diagonal positions respectively. Let 8X and 52 = denote the new diagonal
elements at locations k and k + 1 associated with the permuted matrix ITR^n.
Next, we construct unitary matrices Gi and G2 by modifying the elements in the
identity matrix that lie at the intersection of rows k and k + 1 and columns k and
k + 1. We multiply the permuted matrix II*R(ir)n on the left by G2 and on the right
by Gj. These multiplications will change the elements in the 2 by 2 submatrix at the
intersection of rows k and A: + 1 with columns k and k + 1. Our choice for the elements
of G! and G2 is shown below, where we focus on the relevant 2 by 2 submatrices of Gj,

CHAPTER 7
CONCLUSIONS
This dissertation studies the signal processing aspect of MIMO communications.
We present a new perspective to the MIMO communications: any MIMO scheme can
be regarded as a MIMO channel decomposer, which decomposes a MIMO channel into
multiple scalar subchannels. Based on this perspective, this dissertation presents three
novel MIMO transceiver designs: the geometric mean decomposition (GMD) scheme,
the uniform channel decomposition (UCD) scheme, and the tunable channel decompo
sition (TCD) scheme. All these schemes deploying either a VBLAST detector at the
receiver or a dirty paper precoder at the transmitter. These transceiver designs rep
resent a paradigm shift from the conventional linear MIMO transceiver design to the
nonlinear ones. The superior performance of the GMD and UCD scheme unveils the
practical significance of the collaborations between the transmitter and receiver. That
is, such collaborations facilitate achieving the optimal tradeoff between the diversity and
multiplexing gains promised by the MIMO communication theory. The TCD scheme
represents a unifying solution to a considerably wide range of problems, including de
signing the precoder for OFDM communications and the optimal CDMA sequences.
Motivated by the application of transceiver designs, this dissertation also intro
duces two novel matrix decomposition algorithms, i.e., the geometric mean decompo
sition (GMD) and the generalized triangular decomposition (GTD). The two matrix
decompositions are the cornerstones of the three transceiver designs proposed in this
dissertation. Moreover, the two matrix decomposition algorithms have significant im
plications in the matrix analysis community. For instance, the GTD is a new and more
efficient solution to the inverse eigenvalue problem.
99