1999, HCS Research Lab All Rights Reserved
Comparative Performance Analysis
of Parallel Beamformers
Keonwook Kim, Alan D. George and Priyabrata Sinha
HCS Research Lab, Electrical and Computer Engineering Department, University of Florida
216 Larsen Hall, Gainesville, FL 32611-6200, USA
Abstract
Advancements in beamforming algorithms are exceeding the computation
and communication capabilities of traditional sonar array systems.
Implementing parallel beamforming algorithms in situ on distributed
array systems holds the potential to provide increased performance and
fault tolerance at a lower cost. This paper compares three parallel
algorithms for distributed arrays in terms of execution throughput,
result latency, scaled speedup, and parallel efficiency.
1. Introduction
Passive sonar beamforming is a class of array processing that
optimizes the array gain in a direction of interest to detect and
locate objects in an undersea environment. Beamforming algorithms are
particularly vital in radar and sonar applications. The parallel
algorithms considered here are designed for a distributed array of
sonar transducer nodes, each with its own processing element and
interconnected by a network. The necessity for parallel beamforming
algorithms is a direct result of the development of advanced
beamforming algorithms that are better able to cope with quiet sources
and cluttered environments. These developments have resulted in
increased demands for real-time computation. Moreover, the use of
larger sonar arrays has in turn led to larger problem sizes. Thus, a
beamformer based on a centralized processing system may prove
insufficient to meet these demands, and parallel processing in situ on
a distributed array is a promising alternative.
2. Overview of Beamforming Algorithms
The basic operation in most beamforming algorithms is to sum the
manipulated outputs from many spatially separated sensors. The three
parallel beamforming algorithms discussed in this paper are based on
Conventional Beamforming (CBF), Split-Aperture Conventional
Beamforming (SA-CBF) [1] and an Adaptive Beamforming (ABF) algorithm
for subspace projection [2], respectively.
2.1 Conventional Beamforming (CBF)
In a sonar array, the determination of the direction of arrival relies
on the detection of the time delay of the signal between sensors. In
CBF, signals sampled across an array are linearly phased (i.e.
delayed) assuming a configuration with uniform distance between
elements in the array. Incoming signals are steered by complex-number
vectors called steering vectors. If the beamformer is properly steered
to an incoming signal, the multichannel input signals will be
amplified coherently, maximizing the beamformer output power.
Otherwise, the output of the beamformer is attenuated to some degree.
Thus, peaks in the beamforming output indicate directions of arrival
for sources. The output power of CBF for each steering angle $\theta$
is defined as

$$P_{CBF}(\theta) = s^{*}(\theta)\, R\, s(\theta) \qquad (1)$$

where $s(\theta)$ is the steering vector, $R$ is the Cross-Spectral
Matrix (CSM), and the operator $*$ indicates complex-conjugate
transposition.
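
As an illustration of Equation (1), the following is a minimal sketch in Python (standard library only) rather than the implementation used in this study. It assumes a uniform line array under a narrowband plane-wave model and a CSM formed from a single snapshot; the function names and the `phase_step` parameter (2*pi*d/lambda, where pi corresponds to half-wavelength spacing) are hypothetical choices made for this example.

```python
import cmath
import math

def steering_vector(num_sensors, theta_deg, phase_step=math.pi):
    """Steering vector for a uniform line array (assumed plane-wave model).

    phase_step is 2*pi*d/lambda; pi means half-wavelength spacing (assumption).
    """
    phi = phase_step * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * m * phi) for m in range(num_sensors)]

def cross_spectral_matrix(x):
    """Single-snapshot CSM for one frequency bin: R[i][j] = x[i] * conj(x[j])."""
    return [[xi * xj.conjugate() for xj in x] for xi in x]

def cbf_power(s, R):
    """Equation (1): P(theta) = s* R s, which is real for a Hermitian R."""
    total = 0j
    for i, si in enumerate(s):
        for j, sj in enumerate(s):
            total += si.conjugate() * R[i][j] * sj
    return total.real

def cbf_scan(x, angles_deg):
    """Evaluate the CBF output power at every steering angle."""
    R = cross_spectral_matrix(x)
    return [cbf_power(steering_vector(len(x), a), R) for a in angles_deg]

if __name__ == "__main__":
    angles = list(range(-90, 91))          # 181 steering angles
    x = steering_vector(8, 30)             # noise-free source at 30 degrees
    powers = cbf_scan(x, angles)
    print("peak at", angles[powers.index(max(powers))], "degrees")
```

The scan peaks where the steering vector matches the incoming wavefront, which is the coherent amplification described above.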
2.2 Split-Aperture CBF (SA-CBF)
SA-CBF is based on single-aperture conventional beamforming in the
frequency domain. The beamforming array is logically divided into two
subarrays. Each subarray independently performs CBF using steering
vectors on its own data. The two subarray beamforming outputs are
cross-correlated to detect the time delay of the signal for each
steering angle. Interpolation is used to generate outputs for angles
other than the steering angles (e.g. a ratio of 4-to-1 is used in this
study). The cross-correlated data, together with the steering angles
and several other parameters, are then mapped to the final beamforming
output. Figure 1 shows the block diagram of the SA-CBF algorithm.

[Block diagram not reproduced. Legible block labels: multichannel
input, subarray 1, subarray 2, replica vectors, Fast Fourier
Transform, interpolation, smoothing, final plot.]
Figure 1: Block diagram of SA-CBF

2.3 Adaptive Beamforming (ABF)
The ABF algorithm used in this study is a subspace-projection
beamformer based on QR decomposition [2]. Subspace beamforming
algorithms for ABF such as MUSIC make use of the property that
eigenvectors associated with noise are orthogonal to the space spanned
by the incident signal mode vectors. The reciprocal of the steered
noise subspace therefore exhibits peaks at signal locations. However,
subspace identification to separate noise and signal using the
Singular Value Decomposition (SVD) is computationally expensive to
perform and difficult to implement in a parallel algorithm due to the
many dependencies between the computational tasks. Instead of the
eigenvectors of the CSM, the columns of the Q matrix that correspond
to the noise subspace are used. In this study, the Q matrix is
obtained from the QR decomposition of the CSM using elementary
reflectors. The output of the subspace beamformer is defined as

$$P_{ABF}(\theta) = \frac{1}{s^{*}(\theta)\, E_N E_N^{*}\, s(\theta)} \qquad (2)$$

where $E_N$ consists of the columns of the Q matrix corresponding to
the noise subspace.
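
Equation (2) can be illustrated with a short Python sketch. It is not the algorithm of [2]: as a stand-in for the QR decomposition of the CSM by elementary reflectors, the noise basis here is built by Gram-Schmidt orthogonalization against a known source vector, which is an assumption made only to keep the example small. The line-array model, the function names, and the small floor on the denominator are likewise hypothetical.

```python
import cmath
import math

def inner(u, v):
    """Complex inner product u* v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def steering_vector(num_sensors, theta_deg, phase_step=math.pi):
    """Uniform line array, phase_step = 2*pi*d/lambda (assumed model)."""
    phi = phase_step * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * m * phi) for m in range(num_sensors)]

def noise_basis(signal_vectors, dim, tol=1e-9):
    """Orthonormal basis of the space orthogonal to the signal vectors.

    Gram-Schmidt stand-in for the QR step described in the text.
    """
    unit = [[1 + 0j if i == k else 0j for i in range(dim)] for k in range(dim)]
    basis, num_signal = [], 0
    for idx, v in enumerate(list(signal_vectors) + unit):
        w = list(v)
        for b in basis:                      # remove components already spanned
            c = inner(b, w)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(inner(w, w).real)
        if norm > tol:
            basis.append([wi / norm for wi in w])
            if idx < len(signal_vectors):
                num_signal += 1
        if len(basis) == dim:
            break
    return basis[num_signal:]                # columns of E_N

def abf_power(s, E_N, floor=1e-12):
    """Equation (2): 1 / (s* E_N E_N* s); floor guards against division by zero."""
    denom = sum(abs(inner(e, s)) ** 2 for e in E_N)
    return 1.0 / max(denom, floor)

if __name__ == "__main__":
    angles = list(range(-90, 91))
    E_N = noise_basis([steering_vector(8, 30)], 8)
    spectrum = [abf_power(steering_vector(8, a), E_N) for a in angles]
    print("peak at", angles[spectrum.index(max(spectrum))], "degrees")
```

Because the steering vector at the source angle is orthogonal to every column of E_N, the denominator collapses there and the reciprocal produces a sharp peak.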
3. Parallel Beamforming Algorithms
In a distributed sonar array for parallel
processing in situ, the degree of parallelism
is linked to the number of physical nodes in
the system. However, an increase in the
number of nodes increases the problem size.
The goal is to obtain minimum processor
stalling through equal distribution of work
and minimum communication overhead.
The method of parallelization employed by the parallel algorithms in
this paper, known as iteration decomposition [3,4], focuses on the
partitioning of beamforming jobs across iterations, with each
iteration processing a different set of array input samples.
Successive iterations are assigned to successive processors in the
array and are overlapped in execution with one another by pipelining.
A single node performs the beamforming task for a given sample set
while the other nodes simultaneously work on their respective
beamforming iterations. At the beginning of every iteration, each node
executes an FFT on data that has been newly collected by its sensor,
and the results are communicated to other processors before the
beamforming for that iteration commences. The block diagram in Figure
2 illustrates the manner in which beamforming iterations are
distributed across the nodes in the distributed array, in this case
using 3 nodes.

[Timeline not reproduced. Along the time axis, each node alternates
FFT stages with one-third segments of its beamforming job (Job 1/3,
Job 2/3, Job 3/3), with the segments offset from node to node across
successive iterations.]
Figure 2: Iteration decomposition

Each processor calculates an index based
on its node number, the current job number
and the number of nodes. This index tells
the node from which point in its iteration it
must continue after executing the FFT and
communication stages and when it must
pause to begin another iteration.
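
The index calculation is not spelled out here, so the following Python sketch shows only one hypothetical formula that reproduces the round-robin pattern of Figure 2. It assumes that a new iteration starts on node (cycle mod number of nodes) in every sampling cycle and that each job is cut into as many segments as there are nodes; the function names are illustrative.

```python
def stage_index(node, cycle, num_nodes):
    """0-based job segment that `node` runs during sampling cycle `cycle`."""
    return (cycle - node) % num_nodes

def iteration_of(node, cycle, num_nodes):
    """Sample-set number the node is working on (negative during pipeline fill)."""
    return cycle - stage_index(node, cycle, num_nodes)

def schedule(num_nodes, num_cycles):
    """Per-node list of (iteration, segment) pairs, segments numbered from 1."""
    return [[(iteration_of(n, t, num_nodes), stage_index(n, t, num_nodes) + 1)
             for t in range(num_cycles)]
            for n in range(num_nodes)]

if __name__ == "__main__":
    for node, row in enumerate(schedule(3, 4)):
        cells = ["iter %d, job %d/3" % (it, seg) for it, seg in row]
        print("node %d: %s" % (node, " | ".join(cells)))
```

Under these assumptions exactly one node begins a new job in each cycle, so one beamforming result completes per cycle once the pipeline is full.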
The communication pattern that can be expected in the CBF and SA-CBF
algorithms is an 'all-to-one' pattern, as only one of the nodes needs
to receive the data sampled each cycle to perform a given beamforming
iteration on data collected throughout the array. However, ABF
algorithms require 'all-to-all' communication so that the
cross-spectral matrix on each of the nodes is updated with each cycle
of sampling. Thus, to provide a common framework for comparisons, an
'all-to-all' type of communication is used with all the parallel
algorithms, where each node sends its Fourier-transformed data to all
other nodes.
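
To give a feel for the cost of the common 'all-to-all' framework, the sketch below counts messages per sampling cycle for both patterns. It assumes plain point-to-point sends with no multicast, and the payload figures (200 frequency bins per node, 8 bytes per complex value) are illustrative assumptions rather than measured values.

```python
def messages_per_cycle(num_nodes, pattern):
    """Point-to-point messages needed in one sampling cycle."""
    if pattern == "all-to-one":
        return num_nodes - 1                 # every other node sends to one receiver
    if pattern == "all-to-all":
        return num_nodes * (num_nodes - 1)   # every node sends to every other node
    raise ValueError("unknown pattern: " + pattern)

def bytes_per_cycle(num_nodes, pattern, bins=200, bytes_per_value=8):
    """Total payload per cycle; bins and bytes_per_value are assumptions."""
    return messages_per_cycle(num_nodes, pattern) * bins * bytes_per_value

if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, "nodes:",
              messages_per_cycle(n, "all-to-one"), "vs",
              messages_per_cycle(n, "all-to-all"), "messages per cycle")
```

The all-to-all count grows quadratically with the number of nodes, which is why communication overhead becomes more visible in the larger configurations.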
4. Parallel Performance Analysis
The parallel algorithms in this paper were implemented in MPI-C and
executed on a cluster of SPARCstation-20 workstations connected by a
155 Mb/s ATM network. In the experiments described in this section, a
sampling frequency of 1500 Hz is assumed and the beamforming is
performed for 200 frequency bins. The FFT length of the processed data
is 2048 samples, and no frequency-bin averaging is performed.
Approximately 180 steering angles are resolved (i.e. 181 for CBF and
ABF, and 177 for SA-CBF), with a 4-to-1 ratio of interpolation used
with SA-CBF.
It is apparent from the results in Figure 3 that the effective
execution times of the parallel algorithms are much lower than those
of their sequential counterparts. Moreover, the increase in parallel
execution time as the problem size increases is less pronounced than
in the sequential case. Thus, the parallel algorithms are seen to
provide a higher throughput of execution.

          Sequential                  Parallel
          4 nodes  6 nodes  8 nodes   4 nodes  6 nodes  8 nodes
CBF           263      490      790        78       97      123
SA-CBF        225      273      316        62       54       51
ABF           298      772     1731        90      158      266
Figure 3: Beamformer execution times

The execution time for parallel SA-CBF is always less than that for
both parallel CBF and parallel ABF. Unlike SA-CBF, the CBF and ABF
algorithms must process every steering angle directly, with no
interpolation, and hence involve more computation. This characteristic
is an inherent strength of the SA-CBF algorithm. ABF requires a higher
execution time than both CBF and SA-CBF because of the complexity of
the QR decomposition stage.
Figure 4 shows the result latencies for the different algorithms.
Result latency refers to the time required for the final output to be
available after the data has been read by the sensors. In the case of
sequential algorithms, result latency is the same as the execution
time, as there is no pipelining involved.

          Sequential                  Parallel
          4 nodes  6 nodes  8 nodes   4 nodes  6 nodes  8 nodes
CBF           263      490      790       312      585      980
SA-CBF        225      273      316       249      326      409
ABF           298      772     1731       359      951     2131
Figure 4: Beamformer Result Latency

Result latencies with the parallel algorithms are slightly higher than
those of their sequential counterparts (by roughly 11% to 29% in
Figure 4). This difference can be attributed to the fact that each
beamforming job has been divided into pipeline stages and hence
involves pipeline management overhead and communication time between
successive stages. Thus, there is an obvious tradeoff between
execution throughput and result latency when using parallel algorithms
based on the technique of iteration decomposition.
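
The tradeoff can be checked against the tabulated values of Figures 3 and 4. The Python sketch below computes the latency overhead of each parallel algorithm and also divides each parallel latency by the node count. That the quotient lands within one time unit of the effective execution time is an observation drawn from the tables, not a claim made in the text; it is consistent with one result leaving the pipeline per sampling cycle. The variable and function names are illustrative.

```python
NODES = (4, 6, 8)

# Values transcribed from Figures 3 and 4 (same time units as the figures).
SEQUENTIAL = {"CBF": (263, 490, 790), "SA-CBF": (225, 273, 316), "ABF": (298, 772, 1731)}
PARALLEL_EXEC = {"CBF": (78, 97, 123), "SA-CBF": (62, 54, 51), "ABF": (90, 158, 266)}
PARALLEL_LATENCY = {"CBF": (312, 585, 980), "SA-CBF": (249, 326, 409), "ABF": (359, 951, 2131)}

def latency_overhead(algorithm):
    """Fractional increase of parallel result latency over sequential latency."""
    return [par / seq - 1.0
            for par, seq in zip(PARALLEL_LATENCY[algorithm], SEQUENTIAL[algorithm])]

def latency_per_node(algorithm):
    """Parallel result latency divided by the number of nodes."""
    return [lat / n for lat, n in zip(PARALLEL_LATENCY[algorithm], NODES)]

if __name__ == "__main__":
    for name in SEQUENTIAL:
        overhead = ", ".join("%.0f%%" % (100 * o) for o in latency_overhead(name))
        per_node = ", ".join("%.1f" % v for v in latency_per_node(name))
        print("%-6s overhead: %s | latency/nodes: %s | exec: %s"
              % (name, overhead, per_node, PARALLEL_EXEC[name]))
```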
Speedup is defined as the ratio of the sequential execution time to
the parallel execution time, where ideal speedup is equal to the
number of processors employed. Scaled speedup recognizes that, in this
case, an increase in the number of processors brings with it an
increase in the problem size, since each node possesses both a
processor and a sensor. As seen in Figure 5, the scaled speedups for
the three parallel algorithms are observed to be near linear. However,
for a higher number of nodes, parallel SA-CBF appears to provide a
lower speedup compared to parallel CBF and parallel ABF. This outcome
is a result of the presence of loops of different sizes, different
numbers of steering angles and different numbers of output angles,
leading to a slight imbalance.

          4 nodes  6 nodes  8 nodes
CBF          3.38     5.02     6.45
SA-CBF       3.62     5.01     6.17
ABF          3.32     4.87     6.5
Figure 5: Beamformer Scaled Speedup

Parallel efficiency is defined as the ratio of speedup to the number
of processing nodes (i.e. the ideal speedup). As illustrated in Figure
6, the parallel algorithms achieve levels of parallel efficiency in
the range of 77% to 91%, with an average of approximately 80% for the
largest cases. Since communication overhead plays a more significant
role as the size of the system increases, the efficiencies decrease
slightly as the number of nodes increases. However, the relatively
flat nature of these results demonstrates that scalability is achieved
at least for arrays of moderate size and complexity.
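
Both metrics follow directly from the definitions above. The Python sketch below recomputes them from the execution times in Figure 3. Because those times are rounded, the results agree with Figures 5 and 6 only to within a few hundredths, so the sketch should be read as a consistency check rather than a reproduction of the measurements.

```python
NODES = (4, 6, 8)

# Execution times transcribed from Figure 3 (same time units as the figure).
SEQUENTIAL = {"CBF": (263, 490, 790), "SA-CBF": (225, 273, 316), "ABF": (298, 772, 1731)}
PARALLEL = {"CBF": (78, 97, 123), "SA-CBF": (62, 54, 51), "ABF": (90, 158, 266)}

def scaled_speedup(algorithm):
    """Sequential time divided by parallel time, per node count."""
    return [seq / par for seq, par in zip(SEQUENTIAL[algorithm], PARALLEL[algorithm])]

def parallel_efficiency(algorithm):
    """Speedup divided by the ideal speedup (the number of nodes)."""
    return [s / n for s, n in zip(scaled_speedup(algorithm), NODES)]

if __name__ == "__main__":
    for name in SEQUENTIAL:
        speedups = ", ".join("%.2f" % s for s in scaled_speedup(name))
        effs = ", ".join("%.0f%%" % (100 * e) for e in parallel_efficiency(name))
        print("%-6s speedup: %s | efficiency: %s" % (name, speedups, effs))
```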
5. Conclusions
This paper has presented a comparative analysis of the performance of
several parallel algorithms for in-situ beamforming on a distributed
system. Each of the algorithms is based on the same technique of
pipelined decomposition, where consecutive iterations of the
beamforming process are scheduled in a round-robin fashion to execute
on consecutive processing nodes in the array. These algorithms were
implemented as message-passing parallel programs and executed on a
cluster of workstations connected by ATM, and their performance was
measured.

          4 nodes  6 nodes  8 nodes
CBF           85%      84%      81%
SA-CBF        91%      84%      77%
ABF           83%      81%      81%
Figure 6: Beamformer Parallel Efficiency

With respect to execution time, the parallel algorithms demonstrate a
consistent relationship regardless of the system size, where
split-aperture CBF performs the fastest, followed by the
single-aperture CBF and lastly the ABF. The sequential algorithms
demonstrate this same trend. However, despite the increased complexity
associated with ABF, the results in these experiments indicate that
its execution time exceeds that of the simple CBF by at most only
about a factor of 2.
One of the disadvantages of using a pipelined approach to parallel
processing is an increase in result latency. However, measurements
indicate that the pipelining overhead that increases the latency in
producing results is marginal.
Finally, the scaled speedup and parallel efficiency achieved with each
of the parallel beamformers were found to be approximately 80% of the
ideal or better for systems of four, six, and eight nodes. The general
trends indicate that comparable performance can be expected for larger
arrays, since the decrease in efficiency as system size increases is
relatively slow.
The parallel beamforming algorithms compared in this paper present
many opportunities for increased performance, reliability, and
flexibility in a distributed system for sonar signal processing.
Undertaking and coordinating the computations and communications to
perform beamforming in situ is a challenging task, and is becoming
more so as the beamformers themselves continue to become more
sophisticated. Some beamformers, such as Minimum Variance
Distortionless Response (MVDR), exhibit an even larger degree of
communication overhead and thus require a more elaborate scheme to
achieve reduction and hiding of communication latency [5].
Future research activities on the subject of parallel algorithms for
in-situ processing on distributed arrays will continue to focus on
adaptive techniques in the near term. However, new studies and
developments are underway to help address the tremendous challenges in
computation and communication associated with advanced beamforming in
the littoral environment using matched-field processing.
Acknowledgements
This work was sponsored in part by the Office of Naval Research on
grant N00014-99-1-0278.
References
[1] F. Machell, "Algorithms for broadband processing and display,"
    ARL Technical Letter No. 90-8 (ARL-TL-EV-90-8), Applied Research
    Laboratories, Univ. of Texas at Austin, 1990.
[2] M.J. Smith and I.K. Proudler, "A one sided algorithm for subspace
    projection beamforming," SPIE Vol. 2846, pp. 100-111, 1996.
[3] A. George, J. Markwell, and R. Fogarty, "Real-time sonar
    beamforming on high-performance distributed computers," Parallel
    Computing, submitted Aug. 1998.
[4] A. George and K. Kim, "Parallel Algorithms for Split-Aperture
    Conventional Beamforming," Journal of Computational Acoustics, to
    appear.
[5] A. George and J. Garcia, "A Parallel Algorithm for Distributed
    MVDR Beamforming," Journal of Computational Acoustics, submitted
    July 1999.
