Title: Comparative performance analysis of parallel beamformers
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00094778/00001
 Material Information
Title: Comparative performance analysis of parallel beamformers
Physical Description: Book
Language: English
Creator: Kim, Keonwook
George, Alan D.
Sinha, Priyabrata
Publisher: High-performance Computing and Simulation Research Laboratory, Department of Electrical and Computer Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 1999
Copyright Date: 1999
 Record Information
Bibliographic ID: UF00094778
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.


ICSPAT1999 ( PDF )

© 1999, HCS Research Lab. All Rights Reserved.

Comparative Performance Analysis
of Parallel Beamformers

Keonwook Kim, Alan D. George and Priyabrata Sinha
HCS Research Lab, Electrical and Computer Engineering Department, University of Florida
216 Larsen Hall, Gainesville, FL 32611-6200, USA

Advancements in beamforming algo-
rithms are exceeding the computation and
communication capabilities of traditional
sonar array systems. Implementing parallel
beamforming algorithms in situ on distrib-
uted array systems holds the potential to
provide increased performance and fault tol-
erance at a lower cost. This paper compares
three parallel algorithms for distributed
arrays in terms of execution throughput,
result latency, scaled speedup, and parallel
efficiency.

1. Introduction
Passive sonar beamforming is a class of
array processing that optimizes an array gain
in a direction of interest to detect and locate
objects in an undersea environment. Beam-
forming algorithms are particularly vital in
radar and sonar applications. The parallel
algorithms considered here are designed for
a distributed array of sonar transducer
nodes, each with its own processing element
and interconnected by a network. The neces-
sity for parallel beamforming algorithms is a
direct result of the development of advanced
beamforming algorithms that are better able
to cope with quiet sources and cluttered en-
vironments. These developments have re-
sulted in increased demands for real-time
computation. Moreover, the use of larger
sonar arrays has in turn led to larger problem
sizes. Thus, a beamformer based on a
centralized processing system may prove
insufficient to meet these demands, and
parallel processing in situ on a distributed
array is a promising alternative.

2. Overview of Beamforming Algorithms
The basic operation in most beamforming
algorithms is to sum the manipulated
outputs from many spatially separated
sensors. The three parallel beamforming
algorithms discussed in this paper are based
on Conventional Beamforming (CBF),
Split-Aperture Conventional Beamforming
(SA-CBF) [1], and an Adaptive Beamforming
(ABF) algorithm for subspace projection [2].

2.1 Conventional Beamforming (CBF)
In a sonar array, the determination of the
direction of arrival relies on the detection of
the time delay of the signal between sensors.
In CBF, signals sampled across an array are
linearly phased (i.e. delayed) assuming a
configuration with uniform distance between
elements in the array. Incoming signals are
steered by complex-number vectors called
steering vectors. If the beamformer is prop-
erly steered to an incoming signal, the multi-
channel input signals will be amplified co-
herently, maximizing the beamformer output
power. Otherwise, the output of the
beamformer is attenuated to some degree.
Thus, peaks in the beamforming output
indicate directions of arrival for sources.
The output power of CBF for each steering
angle $\theta$ is defined as

$$P_{CBF}(\theta) = s^{*}(\theta)\, R\, s(\theta) \qquad (1)$$

where $s(\theta)$ is the steering vector, $R$
is the Cross-Spectral Matrix (CSM), and the
operator $*$ indicates complex-conjugate
transposition.
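
As an illustration of Equation (1), the following minimal sketch scans the CBF output power over 181 steering angles. It rests on assumptions that are not part of the paper: a uniform line array with half-wavelength spacing, a single noise-free plane wave, and a CSM estimated from one snapshot. The function names are hypothetical.

```python
import cmath
import math


def steering_vector(theta_deg, num_sensors, spacing_wavelengths=0.5):
    """Steering vector of a uniform line array (spacing given in wavelengths)."""
    phase = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_sensors)]


def cross_spectral_matrix(snapshot):
    """Single-snapshot CSM estimate: R[i][j] = x[i] * conj(x[j])."""
    return [[xi * xj.conjugate() for xj in snapshot] for xi in snapshot]


def cbf_power(theta_deg, csm):
    """Equation (1): P_CBF(theta) = s*(theta) R s(theta)."""
    n = len(csm)
    s = steering_vector(theta_deg, n)
    total = 0j
    for i in range(n):
        for j in range(n):
            total += s[i].conjugate() * csm[i][j] * s[j]
    return total.real


# Assumed test signal: a plane wave from 20 degrees on an 8-sensor array.
snapshot = steering_vector(20.0, 8)
csm = cross_spectral_matrix(snapshot)

angles = list(range(-90, 91))  # 181 steering angles, one per degree
spectrum = [cbf_power(angle, csm) for angle in angles]
peak_angle = angles[spectrum.index(max(spectrum))]
print("peak at", peak_angle, "degrees")
```

The peak of the scanned spectrum falls on the steering angle that matches the direction of the assumed source, which is the behavior described in the text above.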

2.2 Split-Aperture CBF (SA-CBF)
SA-CBF is based on single-aperture
conventional beamforming in the frequency
domain. The beamforming array is logically
divided into two sub-arrays. Each sub-array
independently performs CBF using steering
vectors on its own data. The two sub-array
beamforming outputs are cross-correlated to
detect the time delay of the signal for each
steering angle. Interpolation is used to gen-
erate outputs for angles other than the steer-
ing angles (e.g. a ratio of 4-to-1 is used in
this study). The cross-correlated data, with
knowledge of the steering angles and several
other parameters, will map the final
beamforming output. Figure 1 shows the
block diagram of the SA-CBF algorithm.


Figure 1: Block diagram of SA-CBF
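
The split-aperture idea can be sketched for a single frequency bin as follows. This is a simplified narrowband illustration and not the broadband processing chain of the paper: it assumes a uniform line array with half-wavelength spacing, a noise-free plane wave, and a single broadside steering direction, and it omits the smoothing and interpolation stages. All function names are hypothetical.

```python
import cmath
import math

SPACING_WAVELENGTHS = 0.5  # assumed element spacing


def mode_vector(theta_deg, num_sensors):
    """Plane-wave response of a uniform line array."""
    phase = 2.0 * math.pi * SPACING_WAVELENGTHS * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_sensors)]


def beamform(samples, theta_deg):
    """Conventional beam output s*(theta) x for one sub-array."""
    s = mode_vector(theta_deg, len(samples))
    return sum(si.conjugate() * xi for si, xi in zip(s, samples))


def split_aperture_cross(samples, theta_deg):
    """Beamform each half independently, then cross-multiply the two outputs."""
    half = len(samples) // 2
    left = beamform(samples[:half], theta_deg)
    right = beamform(samples[half:], theta_deg)
    return left.conjugate() * right


# Assumed test signal: a plane wave from 10 degrees on an 8-sensor array.
samples = mode_vector(10.0, 8)
cross = split_aperture_cross(samples, 0.0)  # both halves steered broadside

# The phase of the cross product is the phase delay between the two
# sub-arrays, which are separated by `half` element spacings.
half = len(samples) // 2
sin_estimate = cmath.phase(cross) / (half * 2.0 * math.pi * SPACING_WAVELENGTHS)
bearing_estimate = math.degrees(math.asin(sin_estimate))
print("estimated bearing:", round(bearing_estimate, 3), "degrees")
```

In this sketch the delay between the two sub-array outputs is read from the phase of their cross product. The phase is unambiguous only while it stays within plus or minus pi, which holds for the geometry assumed here.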

2.3 Adaptive Beamforming (ABF)
The ABF algorithm used in this study is
a subspace-projection beamformer based on
QR decomposition [2]. Subspace beamforming
algorithms such as MUSIC exploit the
property that the eigenvectors associated
with noise are orthogonal to the space
spanned by the incident signal mode vectors.
The reciprocal of the power of the steering
vector projected onto the noise subspace
therefore peaks at the signal locations.
However, identifying the subspaces that
separate noise and signal using the Singular
Value Decomposition (SVD) is computationally
expensive and difficult to implement in a
parallel algorithm because of the many
dependencies between the computational
tasks. Instead of the eigenvectors of the
CSM, this study uses the columns of the Q
matrix that correspond to the noise
subspace, where Q is obtained from the QR
decomposition of the CSM using elementary
reflectors. The output of the subspace
beamformer is defined as

$$P_{ABF}(\theta) = \frac{1}{s^{*}(\theta)\, E_N E_N^{*}\, s(\theta)} \qquad (2)$$

where $E_N$ consists of the columns of the Q
matrix corresponding to the noise subspace.
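
The following sketch evaluates Equation (2) for a single source. It does not perform the QR decomposition used in the paper. Instead, as a stated simplification, the noise-subspace projector E_N E_N* is built directly as the orthogonal complement of one known mode vector. The array geometry (uniform line array, half-wavelength spacing) and all function names are assumptions for illustration.

```python
import cmath
import math


def mode_vector(theta_deg, num_sensors, spacing_wavelengths=0.5):
    """Plane-wave response of a uniform line array."""
    phase = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_sensors)]


def noise_projector(signal_vector):
    """E_N E_N^* = I - a a^* / (a^* a) for one source with mode vector a."""
    n = len(signal_vector)
    norm_sq = sum(abs(a) ** 2 for a in signal_vector)
    return [
        [
            (1.0 if i == j else 0.0)
            - signal_vector[i] * signal_vector[j].conjugate() / norm_sq
            for j in range(n)
        ]
        for i in range(n)
    ]


def abf_power(theta_deg, projector):
    """Equation (2): reciprocal of the steered noise-subspace power."""
    n = len(projector)
    s = mode_vector(theta_deg, n)
    denom = 0j
    for i in range(n):
        for j in range(n):
            denom += s[i].conjugate() * projector[i][j] * s[j]
    # Guard against division by zero when steering exactly at the source.
    return 1.0 / max(denom.real, 1e-12)


# Assumed scenario: one source at 20 degrees, 8 sensors.
projector = noise_projector(mode_vector(20.0, 8))
angles = list(range(-90, 91))
spectrum = [abf_power(angle, projector) for angle in angles]
peak_angle = angles[spectrum.index(max(spectrum))]
print("peak at", peak_angle, "degrees")
```

Because the steering vector at the source direction is orthogonal to the noise subspace, the denominator of Equation (2) approaches zero there and the output shows a sharp peak.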

3. Parallel Beamforming Algorithms
In a distributed sonar array for parallel
processing in situ, the degree of parallelism
is linked to the number of physical nodes in
the system. However, an increase in the
number of nodes increases the problem size.
The goal is to obtain minimum processor
stalling through equal distribution of work
and minimum communication overhead.
The method of parallelization employed
by the parallel algorithms in this paper,
known as iteration decomposition [3,4], fo-
cuses on the partitioning of beamforming
jobs across iterations, with each iteration
processing a different set of array input
samples. Successive iterations are assigned
to successive processors in the array and are
overlapped in execution with one another by
pipelining. A single node performs the
beamforming task for a given sample set
while the other nodes simultaneously work
on their respective beamforming iterations.
At the beginning of every iteration, each
node executes an FFT on data that has been
newly collected by its sensor, and the results
are communicated to other processors before
the beamforming for that iteration
commences. The block diagram in Figure 2
illustrates the manner in which beamforming
iterations are distributed across the nodes in
the distributed array, in this case using 3
nodes.

Figure 2: Iteration decomposition

Each processor calculates an index based
on its node number, the current job number
and the number of nodes. This index tells
the node from which point in its iteration it
must continue after executing the FFT and
communication stages and when it must
pause to begin another iteration.
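
The paper does not give the index formula, so the following is only one plausible sketch of such a schedule. It assumes that each beamforming job is divided into as many stages as there are nodes, that a node advances its current job by one stage per sampling cycle, and that nodes start new jobs in round-robin order. The function names are hypothetical.

```python
def stage_for(node, cycle, num_nodes):
    """0-based pipeline stage of the job that `node` advances in `cycle`."""
    return (cycle - node) % num_nodes


def iteration_for(node, cycle, num_nodes):
    """Sampling cycle whose data the node's current job is processing.

    A negative value means the pipeline is still filling and the node
    has no job to work on yet.
    """
    return cycle - stage_for(node, cycle, num_nodes)


def schedule(num_nodes, num_cycles):
    """Per node, a list of (iteration, stage) pairs, one pair per cycle."""
    return [
        [
            (iteration_for(node, cycle, num_nodes), stage_for(node, cycle, num_nodes) + 1)
            for cycle in range(num_cycles)
        ]
        for node in range(num_nodes)
    ]


for node, row in enumerate(schedule(num_nodes=3, num_cycles=6)):
    cells = [
        "idle" if iteration < 0 else "iter %d job %d/3" % (iteration, stage)
        for iteration, stage in row
    ]
    print("node %d: %s" % (node, " | ".join(cells)))
```

Under these assumptions, iteration i is owned by node i mod N, and in every cycle exactly one node begins a new job while the others continue jobs started in earlier cycles.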
The communication pattern that can be
expected in the CBF and SA-CBF algo-
rithms is an 'all-to-one' pattern, as only one
of the nodes needs to receive the data sam-
pled each cycle to perform a given
beamforming iteration on data collected
throughout the array. However, ABF algo-
rithms require 'all-to-all' communication so
that the cross-spectral matrix on each of the
nodes is updated with each cycle of sam-
pling. Thus, to provide a common frame-
work for comparisons, an 'all-to-all' type of
communication is used with all the parallel
algorithms, where each node sends its Fou-
rier transformed data to all other nodes.
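
The difference between the two communication patterns can be made concrete by counting messages per sampling cycle. This small sketch assumes point-to-point messages with no broadcast or multicast support, which is an assumption about the network rather than a statement from the paper. The function names are hypothetical.

```python
def all_to_one_messages(num_nodes):
    """Every node except the receiver sends one message per cycle."""
    return num_nodes - 1


def all_to_all_messages(num_nodes):
    """Every node sends one message to each of the other nodes per cycle."""
    return num_nodes * (num_nodes - 1)


for nodes in (4, 6, 8):
    print(nodes, "nodes:", all_to_one_messages(nodes), "vs", all_to_all_messages(nodes))
```

Under this assumption the all-to-all pattern carries N times as many messages per cycle as the all-to-one pattern.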

4. Parallel Performance Analysis

The parallel algorithms in this paper
were implemented in MPI-C and executed
on a cluster of SPARCstation-20
workstations connected by a 155 Mb/s ATM
network. In the experiments described in
this section, a sampling frequency of
1500 Hz is assumed and the beamforming is
performed for 200 frequency bins. The FFT
length of the processed data is 2048
samples, and no frequency-bin averaging is
performed. Approximately 180 steering
angles are resolved (i.e. 181 for CBF and
ABF, and 177 for SA-CBF), with a 4-to-1
ratio of interpolation used with SA-CBF.
It is apparent from the results in Figure 3
that the effective execution times of the par-
allel algorithms are much lower than their
sequential counterparts. Moreover, the in-
crease in parallel execution time as the prob-
lem size increases is less pronounced than in
the sequential case. Thus, the parallel algo-
rithms are seen to provide a higher through-
put of execution.

             Sequential                  Parallel
             4 nodes  6 nodes  8 nodes   4 nodes  6 nodes  8 nodes
CBF            263      490      790        78       97      123
SA-CBF         225      273      316        62       54       51
ABF            298      772     1731        90      158      266

Figure 3: Beamformer execution times

The execution time for parallel SA-CBF
is always less than that for both parallel
CBF and parallel ABF. Unlike SA-CBF, the CBF
and ABF algorithms must process all steering
angles directly, since no interpolation is
performed, and hence they involve more
computation. This characteristic is an
inherent strength of the SA-CBF algorithm.
ABF requires a higher execution time than
both CBF and SA-CBF because of the
complexity of the QR decomposition stage.
Figure 4 shows the result latencies for
the different algorithms. Result latency
refers to the time required for the final
output to be available after the data has
been read by the sensors. In the case of
sequential algorithms, result latency is the
same as the execution time, as there is no
pipelining involved.

             Sequential                  Parallel
             4 nodes  6 nodes  8 nodes   4 nodes  6 nodes  8 nodes
CBF            263      490      790       312      585      980
SA-CBF         225      273      316       249      326      409
ABF            298      772     1731       359      951     2131

Figure 4: Beamformer Result Latency

Result latencies with the parallel algo-
rithms are slightly higher than those of their
sequential counterparts. This difference can
be attributed to the fact that each beamform-
ing job has been divided into pipeline stages
and hence involves pipeline management
overhead and communication time between
successive stages. Thus, there is an obvious
tradeoff between execution throughput and
result latency when using parallel algorithms
based on the technique of iteration
decomposition.
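
The latency side of this tradeoff can be quantified from the values reported for Figure 4. The sketch below computes the relative increase in result latency of each parallel algorithm over its sequential counterpart. Only ratios are computed, since the time unit is not stated with the figures. The dictionary and function names are hypothetical.

```python
# Result latencies as reported for Figure 4, keyed by algorithm and node count.
SEQUENTIAL_LATENCY = {
    "CBF": {4: 263, 6: 490, 8: 790},
    "SA-CBF": {4: 225, 6: 273, 8: 316},
    "ABF": {4: 298, 6: 772, 8: 1731},
}
PARALLEL_LATENCY = {
    "CBF": {4: 312, 6: 585, 8: 980},
    "SA-CBF": {4: 249, 6: 326, 8: 409},
    "ABF": {4: 359, 6: 951, 8: 2131},
}


def latency_overhead(algorithm, nodes):
    """Fractional increase of parallel over sequential result latency."""
    sequential = SEQUENTIAL_LATENCY[algorithm][nodes]
    return PARALLEL_LATENCY[algorithm][nodes] / sequential - 1.0


for algorithm in SEQUENTIAL_LATENCY:
    for nodes in (4, 6, 8):
        overhead = latency_overhead(algorithm, nodes)
        print("%-6s %d nodes: +%.0f%%" % (algorithm, nodes, overhead * 100))
```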
Speedup is defined as the ratio of the
sequential execution time to the parallel
execution time, where ideal speedup is equal
to the number of processors employed.
Scaled speedup recognizes that, in this case,
an increase in the number of processors
brings with it an increase in the problem
size, since each node possesses both a
processor and a sensor. As seen in Figure 5,
the scaled speedups for the three parallel
algorithms are observed to be near linear.
However, for a higher number of nodes,
parallel SA-CBF appears to provide a lower
speedup compared to parallel CBF and parallel
ABF. This outcome is a result of the presence
of loops of different sizes, with different
numbers of steering angles and output angles,
leading to a slight load imbalance.

             4 nodes  6 nodes  8 nodes
CBF            3.38     5.02     6.45
SA-CBF         3.62     5.01     6.17
ABF            3.32     4.87     6.5

Figure 5: Beamformer Scaled Speedup

Parallel efficiency is defined as the ratio
of speedup to the number of processing
nodes (i.e. the ideal speedup). As illustrated
in Figure 6, the parallel algorithms achieve
levels of parallel efficiency in the range of
77-91%, with an average of approximately
80% for the largest cases. Since
communication overhead plays a more
significant role as the size of the system
increases, the efficiencies decrease slightly
as the number of nodes increases. However,
the relatively flat nature of these results
demonstrates that scalability is achieved at
least for arrays of moderate size and
complexity.
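
The definitions of scaled speedup and parallel efficiency can be checked against the execution times reported for Figure 3. The sketch below recomputes both metrics from those values. Because the reported times are rounded, the results may differ from the plotted figures by about one unit in the last digit. The dictionary and function names are hypothetical.

```python
# Execution times as reported for Figure 3, keyed by algorithm and node count.
SEQUENTIAL_TIME = {
    "CBF": {4: 263, 6: 490, 8: 790},
    "SA-CBF": {4: 225, 6: 273, 8: 316},
    "ABF": {4: 298, 6: 772, 8: 1731},
}
PARALLEL_TIME = {
    "CBF": {4: 78, 6: 97, 8: 123},
    "SA-CBF": {4: 62, 6: 54, 8: 51},
    "ABF": {4: 90, 6: 158, 8: 266},
}


def scaled_speedup(algorithm, nodes):
    """Sequential time divided by parallel time for the same problem size."""
    return SEQUENTIAL_TIME[algorithm][nodes] / PARALLEL_TIME[algorithm][nodes]


def parallel_efficiency(algorithm, nodes):
    """Speedup divided by the number of nodes (the ideal speedup)."""
    return scaled_speedup(algorithm, nodes) / nodes


for algorithm in SEQUENTIAL_TIME:
    for nodes in (4, 6, 8):
        print(
            "%-6s %d nodes: speedup %.2f, efficiency %.0f%%"
            % (
                algorithm,
                nodes,
                scaled_speedup(algorithm, nodes),
                parallel_efficiency(algorithm, nodes) * 100,
            )
        )
```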

5. Conclusions

This paper has presented a comparative
analysis of the performance of several paral-
lel algorithms for in-situ beamforming on a
distributed system. Each of the algorithms
is based on the same technique of pipelined
decomposition, where consecutive iterations
of the beamforming process are scheduled in
a round-robin fashion to execute on con-
secutive processing nodes in the array.
These algorithms were implemented as
message-passing parallel programs and
executed on a cluster of workstations
connected by ATM, and their performance was
measured.

             4 nodes  6 nodes  8 nodes
CBF            85%      84%      81%
SA-CBF         91%      84%      77%
ABF            83%      81%      81%

Figure 6: Beamformer Parallel Efficiency

With respect to execution time, the par-
allel algorithms demonstrate a consistent
relationship regardless of the system size,
where split-aperture CBF performs the fast-
est, followed by the single-aperture CBF and
lastly the ABF. The sequential algorithms
demonstrate this same trend. However,
despite the increased complexity associated
with ABF, the results in these experiments
indicate that its execution time exceeds
that of the simple CBF by at most a factor
of about 2.
One of the disadvantages of using a
pipelined approach to parallel processing is
an increase in result latency. However,
measurements indicate that the pipelining
overhead that increases the latency in pro-
ducing results is marginal.
Finally, the scaled speedup and parallel
efficiency achieved with each of the parallel
beamformers were found to be approximately
80% of the ideal or better for systems of
four, six, and eight nodes. The general
trends indicate that comparable performance
can be expected for larger arrays, since the
decrease in efficiency as system size
increases is relatively slow.
The parallel beamforming algorithms
compared in this paper present many
opportunities for increased performance,
reliability, and flexibility in a distributed
system for sonar signal processing.
Undertaking and coordinating the
computations and communications to perform
beamforming in situ is a challenging task,
and is becoming more so as the beamformers
themselves continue to become more
sophisticated. Some beamformers, such as
Minimum Variance Distortionless Response
(MVDR), exhibit an even larger degree of
communication overhead and thus require a
more elaborate scheme to achieve reduction
and hiding of communication latency [5].
Future research activities on the subject
of parallel algorithms for in-situ processing
on distributed arrays will continue to focus
on adaptive techniques in the near term.
However, new studies and developments are
underway to help address the tremendous
challenges in computation and communica-
tion associated with advanced beamforming
in the littoral environment using matched-
field processing.

This work was sponsored in part by the Office of
Naval Research on grant N00014-99-1-0278.

[1] F. Machell, "Algorithms for broad-band proc-
essing and display," ARL Technical Letter No.
90-8 (ARL-TL-EV-90-8), Applied Research
Laboratories, Univ. of Texas at Austin, 1990.
[2] M.J. Smith and I.K. Proudler, "A one sided
algorithm for subspace projection beamform-
ing," SPIE Vol. 2846, 100-111, 1996.
[3] A. George, J. Markwell, and R. Fogarty,
"Real-time sonar beamforming on high-
performance distributed computers," Parallel
Computing, submitted Aug. 1998.
[4] A. George and K. Kim, "Parallel Algorithms
for Split-Aperture Conventional Beamforming,"
Journal of Computational Acoustics, to
appear.
[5] A. George and J. Garcia, "A Parallel Algo-
rithm for Distributed MVDR Beamforming,"
Journal of Computational Acoustics, submitted
July 1999.
