7th International Conference on Multiphase Flow,
ICMF 2010, Tampa, FL, May 30 - June 4, 2010
Towards High-Resolution Simulation of Turbulent Collision of Cloud Droplets
H. Parishani*, B. Rosa†, W.W. Grabowski‡ and L.P. Wang*
* Department of Mechanical Engineering, University of Delaware, Newark, DE 19716, USA
† Institute of Meteorology and Water Management, Warsaw, Poland
‡ Mesoscale and Microscale Meteorology Division, National Center for Atmospheric Research,
Boulder, CO 80307, USA
hparish@udel.edu, bogdan.rosa@imgw.pl, grabow@ucar.edu, and lwang@udel.edu
Keywords: turbulent collision-coalescence, aerodynamic interactions, cloud droplets,
direct numerical simulation, MPI implementation
Abstract
Recently we developed a hybrid direct numerical simulation (HDNS) approach to simulate the collision rate of cloud droplets in turbulent air, where both the inertia and the sedimentation of cloud droplets are considered. The most important aspect of HDNS is the incorporation of local droplet-droplet aerodynamic interactions, achieved by embedding the Stokes disturbance flows due to the droplets in a pseudo-spectral simulation of the background air turbulence. In order to increase the range of turbulent flow scales and the problem size (the computational domain size and the number of droplets), higher-resolution DNS is desired. Towards this goal, we here discuss the MPI (Message Passing Interface) implementation of the aerodynamic interactions of cloud droplets within the HDNS. We explore and compare several approaches to identify an optimal MPI implementation. In the MPI implementation, the data communication can affect the nature of the iteration scheme used to solve for the disturbance velocities felt by individual droplets. It is found that the wall clock time per time step depends on the problem size (the number of particles in the system) in a complex manner, as the number of iterations per time step also depends on the problem size. We then validate the results pertinent to turbulent collisions of cloud droplets obtained from our newly developed MPI HDNS code by comparing with our published results based on the OpenMP implementation. The physical results are found to be in excellent agreement with our previous results.
Introduction
Formation of clouds and precipitation is a fundamental aspect of weather and climate. Reliable weather and climate prediction at both local and global scales depends on our understanding of microphysical processes (i.e., droplet activation, condensational growth, collision-coalescence growth, and drop breakup) and the small-scale cloud dynamics (i.e., entrainment, mixing, multiscale turbulent transport, and thermodynamics). Clouds are a turbulent suspension of drops and ice particles that exhibits strong aerodynamic and thermodynamic interphase couplings. The interactions between the dynamics and microphysics within clouds are complex multiscale, multiphase processes (Pruppacher and Klett 1997; Shaw 2003; Bodenschatz et al. 2010).
In recent years, direct numerical simulation (DNS) of small-scale air turbulence has been applied to study turbulent collision-coalescence (Franklin et al. 2007; Ayala et al. 2008a; Wang et al. 2008), the growth of cloud droplets by diffusion of water vapor (Vaillancourt et al. 2002; Lanotte et al. 2009), as well as entrainment and mixing processes (Andrejczuk et al. 2006). In warm (ice-free) clouds, turbulent collision-coalescence of cloud droplets affects the droplet size distribution and is a necessary step for precipitation formation (Wang and Grabowski 2009). This is the main motivation for the current study.
In DNS, turbulent air motion at the dissipation-range scales (mm to cm scales) and a limited range of inertial-subrange scales (currently up to O(10 cm)) are resolved, while larger-scale motion is represented by a forcing scheme. The current limitation of DNS to low flow Reynolds numbers, relative to atmospheric turbulence, makes it impossible to directly address the effect of larger-scale turbulent eddies on droplet-scale processes.
In the last few years, our group has developed a hybrid direct numerical simulation (HDNS) approach to simulate the collision rate of cloud droplets in turbulent air, where both the inertia and the sedimentation of cloud droplets are considered. The most important aspect of HDNS is to incorporate droplet-droplet local aerodynamic interactions by embedding Stokes disturbance flows due to droplets in a pseudo-spectral simulation of the background air turbulence (Wang et al. 2005; Ayala et al. 2007). The methodology allows us to study both the geometric collision rate and the collision efficiency of cloud droplets in a turbulent carrier flow. Kinematic pair statistics such as the radial distribution function and the radial relative velocity can be obtained in order to guide the development of an analytical parameterization of the turbulent collision kernel (Ayala et al. 2008a). Under the assumptions that small-scale turbulence governs the droplet-droplet pair interactions and that the presence of droplets has a negligible effect on the air turbulence due to low mass loading [O(10^-3 kg/m^3)], the background air turbulence is modeled as isotropic and homogeneous turbulence driven by large-scale forcing. The flow dissipation rate, droplet inertia and settling velocity are prescribed to mimic the conditions of atmospheric clouds. To render the many-body interaction problem tractable, the disturbance flow due to a given droplet is truncated, and it is assumed that such a truncation has little effect on pair collision statistics.
To more realistically represent all significant scales of turbulent motion and to gain further insight into the role of the flow Reynolds number, in this paper we discuss our efforts towards higher-resolution HDNS, so that a larger computational domain (of 1 m size) or a wider range of flow Reynolds numbers (say, a Taylor-microscale Reynolds number of 500) could be considered. This requires us to first implement MPI (Message Passing Interface) in our HDNS codes to take full advantage of scalable, distributed-memory computers. Preliminary work in this direction has been performed by Rosa and Wang (2009), who considered MPI implementations of various tasks in the HDNS except the aerodynamic interactions. In this paper, we will focus on the MPI implementation of the aerodynamic interactions of cloud droplets. The MPI implementation is based on the domain decomposition previously adopted for efficient MPI implementation of the Fast Fourier Transform in the pseudo-spectral simulation of fluid turbulence (Dmitruk et al. 2001). Freely moving droplets, however, are not naturally aligned with this domain decomposition, and new issues pertinent to droplet transport arise: (1) the gathering of information
in a finite region surrounding a droplet that could occupy more than one subdomain, and (2) the rearrangement of data associated with droplets crossing subdomain boundaries.
The main objective of this paper is to discuss how we deal with these issues in the MPI implementation. We will first describe the technical aspects of the hybrid DNS approach. The MPI implementation of the aerodynamic interactions will be discussed next. We will then validate the newly developed MPI implementation and present performance data for the implementation. Finally, the main conclusions will be drawn along with a discussion of future work.
Hybrid DNS
The basic ideas and algorithms of the HDNS approach have been presented in Ayala et al. (2007) and Wang et al. (2009). Only a brief description of the approach is given here to prepare for our discussion of the MPI implementation of the aerodynamic interaction of droplets. We consider a dilute suspension of droplets in a background turbulent air flow U(x, t) solved by the usual pseudo-spectral method in a periodic domain. The flow is resolved with N^3 grid points. The fluid velocity at the location of the kth droplet of radius a(k), denoted by U(Y(k)(t), t), is interpolated from the grid using 6-point Lagrangian interpolation in each spatial direction, where Y(k)(t) is the location of that droplet. The velocity of the kth droplet will be denoted by V(k)(t).
As a first approximation, we assume that the disturbance flows resulting from the presence of the droplets in the background turbulent flow are localized in space, since the droplet size (10 to 100 μm) is assumed to be much smaller than the Kolmogorov scale (~1 mm). The disturbance velocity at the center of the kth droplet due to all other droplets in the system is denoted by u(k) ≡ u(Y(k)(t)); it must be solved from a coupled linear system (Ayala et al. 2007) of dimension 3Np, as shown by Eq. (1) in Table 1. Note that Eq. (1) is derived from the requirement that the composite flow field (background flow plus the disturbance flow fields caused by the droplets) satisfies, on average, the no-slip boundary condition on the surface of each droplet. It is evident that the disturbance velocity felt by a given droplet depends on the background turbulent flow and on the positions and velocities of all other droplets in the system. Other equations needed to describe the motion of droplets are summarized in Table 1. The most computationally demanding step in the HDNS approach is to solve Eq. (1), and the MPI implementation needed to solve this equation is the focus of this paper.

Table 1: Key equations for the droplets in the HDNS approach (from top to bottom): (1) the linear system of dimension 3Np used to solve for the disturbance velocities felt by the droplets; (2) the definition of the Stokes disturbance flow; (3) the viscous drag experienced by the kth droplet; (4) the equation of motion; and (5) the kinematic equation for the droplet location.

  u(k) = Σ_{m=1, m≠k}^{Np} u_S( Y(k)(t) − Y(m)(t); a(m), V(m) − U(Y(m), t) − u(m) ),  k = 1, 2, ..., Np   (1)

  u_S(r; a, Vp) = [ 3a/(4r) + a^3/(4r^3) ] Vp + [ 3a/(4r) − 3a^3/(4r^3) ] (Vp · r̂) r̂,  r = |r|,  r̂ = r/r   (2)

  D(k)(t) = −6πμ a(k) [ V(k)(t) − ( U(Y(k)(t), t) + u(k) ) ]   (3)

  dV(k)(t)/dt = − [ V(k)(t) − ( U(Y(k)(t), t) + u(k) ) ] / τp(k) + g   (4)

  dY(k)(t)/dt = V(k)(t)   (5)
Once the disturbance velocities u(k) are computed, the droplets are advanced by solving their equation of motion, Eq. (4). Noting that the effective relative fluid velocity felt by the kth droplet is U(Y(k)(t), t) + u(k) − V(k)(t), the drag force acting on the kth droplet due to the interactions with the background turbulent flow and the disturbance flow field is obtained from Eq. (3) (Wang et al. 2005a). The Stokes inertial response time in Eq. (4) is given as

  τp(k) = 2 ρp (a(k))^2 / (9 μ),   (6)

where ρp is the droplet density (the density of water in our simulation) and μ is the dynamic viscosity of the background fluid (that of air in our simulation); g in Eq. (4) is the gravitational acceleration.
Given a(k), V(k), U(Y(k), t) and Y(k) at a specified time t, Eq. (1) is a system of equations with 3Np unknowns for the disturbance velocities of the Np droplets. Note that the disturbance velocity at the location of one droplet is coupled with the disturbance velocities of all other droplets. Also, based on Eq. (1), the three spatial components of the disturbance velocity unfortunately cannot be separated to form smaller systems, so system (1) must be solved as a whole to yield the disturbance velocities u(k).

In this paper, we solve Eq. (1) using an iterative method: for each time step, we start with an initial guess u(k),0 = 0 and solve Eq. (1) iteratively to obtain the successive approximations u(k),1, u(k),2, ..., u(k),n. The process continues until the difference between two consecutive iterates of the disturbance velocities of all droplets is less than a prescribed permissible error ε. Namely, the solution of system (1) is considered converged if the normalized error satisfies

  | u(k),n+1 − u(k),n | / Ucharac < ε   (7)

for all k. Here n is the iteration counter and Ucharac is a characteristic velocity. In this paper, we are dealing with a bidisperse system with two different droplet radii. Np = Np1 + Np2 is the total number of droplets in the domain, and the characteristic velocity Ucharac is defined as the differential terminal velocity (Ucharac = |vp2 − vp1|). For the results obtained in this paper, we set ε = 10^-5 unless stated otherwise.
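The structure of this iteration can be sketched in Python for a pair of droplets. This is a simplified serial illustration, not the production solver: the Stokes disturbance formula and the sample droplet positions, radii, and slip velocities below are illustrative assumptions, and the sweep uses freshly updated values in the Gauss-Seidel style discussed later:

```python
import numpy as np

def stokes_disturbance(r_vec, a, v_rel):
    """Stokes disturbance flow of a sphere of radius a moving at relative velocity
    v_rel, evaluated at displacement r_vec from its center (a sketch of u_S in Eq. (2))."""
    r = np.linalg.norm(r_vec)
    rhat = r_vec / r
    c1 = 3.0 * a / (4.0 * r) + a**3 / (4.0 * r**3)
    c2 = 3.0 * a / (4.0 * r) - 3.0 * a**3 / (4.0 * r**3)
    return c1 * v_rel + c2 * np.dot(v_rel, rhat) * rhat

def solve_disturbance(pos, rad, v_slip, u_charac, eps=1e-5, max_iter=100):
    """Iterate u(k) = sum_{m != k} u_S(Y(k)-Y(m); a(m), v_slip(m) - u(m)) until the
    normalized change of every u(k) falls below eps, as in Eq. (7)."""
    n = len(pos)
    u = np.zeros((n, 3))
    for it in range(max_iter):
        err = 0.0
        for k in range(n):
            new_u = np.zeros(3)
            for m in range(n):
                if m != k:
                    new_u += stokes_disturbance(pos[k] - pos[m], rad[m], v_slip[m] - u[m])
            err = max(err, np.linalg.norm(new_u - u[k]) / u_charac)
            u[k] = new_u  # freshest value is reused within the same sweep
        if err < eps:
            return u, it + 1
    return u, max_iter

# Two settling droplets, one above the other (illustrative values):
pos = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 5e-4]])
rad = np.array([20e-6, 25e-6])
v_slip = np.array([[0.0, 0.0, -0.05], [0.0, 0.0, -0.07]])  # slip relative to local air
u, iters = solve_disturbance(pos, rad, v_slip, u_charac=0.02)
```

Because the disturbance decays like a/r, the off-diagonal coupling is weak at dilute separations and the fixed-point iteration converges in a handful of sweeps.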
Parallel implementation
Previously, a loop-level parallelization using OpenMP was used to solve the linear system, Eq. (1) (Ayala et al. 2007). Namely, the droplets on the right-hand side (RHS) of Eq. (1) were divided among OpenMP threads. This is a straightforward and efficient approach to compute the RHS of Eq. (1), but it assumes shared memory and is limited to the processors (typically 32 or fewer) within a single node. For the problem sizes we are dealing with, parallelization with OpenMP could handle the problem efficiently up to a 128^3 flow grid resolution with O(10^5) droplets, giving a maximum Reynolds number of Rλ = 72.4, λ being the Taylor microscale of the background turbulence.
In order to tackle problems with a larger number of droplets or flows at higher Reynolds numbers, here we parallelize the code using MPI (Message Passing Interface) to solve system (1). In principle, the MPI parallelization does not pose a limitation on the maximum number of processors that can be utilized, so a higher speed-up can be achieved; this, along with the larger available memory, makes it possible to treat larger problem sizes.
For the MPI implementation, we decompose the cubic computational domain into "slabs". Figure 1 shows a cubic computational domain decomposed into 8 slabs, numbered 0 to 7 from left to right. The domain is decomposed in the direction perpendicular to gravity in order to minimize the number of droplets crossing the slab boundaries. This in turn reduces the amount of time spent in data communication between the slabs. Since this is a spatial decomposition, we use MPI to communicate data between the slabs when droplets cross the boundaries or whenever data must be exchanged between processes. Basically, each processor is assigned to compute the RHS of Eq. (1) for the droplets inside its own slab, and when needed, data associated with droplets located in the neighboring slabs are obtained by MPI SEND and RECEIVE.
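The rank bookkeeping behind these exchanges can be sketched as follows. This is a hypothetical serial Python illustration of the periodic slab topology, not the actual MPI code; the function names are our own:

```python
def slab_neighbors(rank, nproc):
    """Left and right neighbor ranks in the periodic slab decomposition;
    the first and last slabs wrap around (e.g. slabs 0 and 7 in Figure 1)."""
    return (rank - 1) % nproc, (rank + 1) % nproc

def slab_of(y, lbox, nproc):
    """Rank owning a droplet with coordinate y along the decomposed direction."""
    return min(int(y / (lbox / nproc)), nproc - 1)

# With 8 slabs, slab 0 exchanges data with slabs 7 and 1:
left, right = slab_neighbors(0, 8)
```

After each droplet advance, any droplet whose `slab_of` value differs from its current rank is packed into a send buffer for the corresponding neighbor.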
The computation of the summation in Eq. (1) is expensive, since for every droplet this sum must be carried out over all other droplets in the domain. In other words, the computational cost of (1) grows as Np^2. As far as the collision efficiency is concerned, Ayala et al. (2007) showed that we can truncate the summation and restrict the radius of influence of the disturbance flow caused by the kth droplet to a distance Htrunc × a(k) from the center of this droplet, without a significant effect on the computed collision efficiency. Their experiments showed that the computed collision efficiency is insensitive to Htrunc if Htrunc is larger than 35. In this paper we use a dimensionless truncation radius of Htrunc = 50, which is more conservative than Htrunc = 35.
In order to efficiently locate the droplets and their immediate neighbors inside the truncation sphere, we make use of the cell-index method and the concept of linked lists (Allen and Tildesley 1989). This is implemented numerically by dividing the slabs into smaller cells and keeping track of the droplet indices inside each cell. This speeds up the process of finding neighboring droplets, and it does not affect the converged solution of system (1). (Since the droplets in each slab are assigned to an individual processor in our spatial decomposition, we use the terms "process", "processor" and "slab" interchangeably.)

Figure 1: A spatial domain decomposition showing 8 slabs in the y direction. Droplets in each slab are assigned to an individual processor, so in this case 8 processors are used.
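The cell-index/linked-list bookkeeping of Allen and Tildesley can be sketched in a few lines. This is a generic illustration with hypothetical names, not the HDNS data structures: `head[c]` stores the last droplet filed into cell `c`, and `nxt[i]` chains back to the previous droplet in the same cell:

```python
import numpy as np

def build_cell_list(pos, lbox, ncell):
    """Classic cell-index method with linked lists: head[c] is the most recently
    filed droplet in cell c, nxt[i] points to the previous droplet in that cell,
    and -1 terminates a chain."""
    npd = len(pos)
    head = -np.ones(ncell**3, dtype=int)
    nxt = -np.ones(npd, dtype=int)
    for i in range(npd):
        ix, iy, iz = np.floor(pos[i] / (lbox / ncell)).astype(int) % ncell
        c = (ix * ncell + iy) * ncell + iz
        nxt[i] = head[c]
        head[c] = i
    return head, nxt

def droplets_in_cell(c, head, nxt):
    """Walk the linked list of one cell and return the droplet indices in it."""
    out, i = [], head[c]
    while i != -1:
        out.append(i)
        i = nxt[i]
    return out

# Three droplets in a unit box split into 2x2x2 cells:
pos = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.2, 0.2, 0.2]])
head, nxt = build_cell_list(pos, 1.0, 2)
```

Neighbor candidates for a target droplet are then gathered by walking the lists of its own cell and the 26 adjacent cells, reducing the search cost from O(Np^2) to roughly O(Np).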
In the MPI implementation of the aerodynamic interactions (ADI), the necessity of data communication between the slabs is the main challenge. Here, we have assumed that the ADI truncation sphere of any individual droplet can at most cover a space belonging to 3 slabs. Namely, we assume that the domain of influence of a droplet k located in slab number n is confined to slabs n − 1, n and n + 1. Since the domain we are dealing with is periodic, slab 0 and slab 7 are also neighbors (if we have a total of 8 slabs as in Figure 1). This assumption amounts to an upper bound on the number of processors that we can use:

  Nproc ≤ Lbox / (Htrunc × amax),   (8)

where Nproc is the number of processors (or "slabs"), amax is the maximum radius of the droplets in the system, and Lbox is the domain size in any direction (the y direction in Figure 1). Since the droplet radii considered are in the range of 10 to 60 μm, this does not pose a severe restriction, but it has the benefit of limiting the data communication to nearest-neighbor slabs.
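The bound in Eq. (8) is easy to evaluate. A minimal sketch, with the 1 m box size mentioned earlier and the largest droplet radius of 60 μm taken as illustrative inputs:

```python
def max_slabs(lbox, h_trunc, a_max):
    """Upper bound on the slab count from Eq. (8): the slab width must not be
    smaller than the truncation radius, so each truncation sphere spans at most
    a slab and its two neighbors."""
    return int(lbox / (h_trunc * a_max))

# A 1 m box with H_trunc = 50 and a_max = 60 microns allows a few hundred slabs,
# so the bound is not restrictive for the cases considered here.
nmax = max_slabs(1.0, 50, 60e-6)
```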
We have developed and tested four variations of the MPI implementation of the droplet aerodynamic interactions, in an effort to identify the most accurate and most efficient approach. They will be denoted ADI1, ADI2, ADI3 and ADI4. The purpose of all variations is the same: to solve the linear system, Eq. (1), on a distributed-memory machine.
Figure 2: Four different approaches to implement parallel ADI. The circle has a radius equal to the truncation radius. The numbers inside the circle represent the sequence in which the summation in Eq. (1) is implemented. Gray slabs are synchronized, while gray and white slabs alternate during the updating of disturbance velocities.

In Figure 2, a schematic of each of the four approaches is shown. The circle has a radius equal to the truncation radius, and only the 3 neighboring slabs n − 1, n and n + 1 are shown (assuming the periodicity of the domain is already taken care of). On each panel, the droplet located at the center of the truncation sphere will be referred to as the target droplet (say, droplet A), together with 3 representative neighbors: neighbor B from the same slab, neighbor C from the slab on the right, and neighbor D from the slab on the left.
In ADI1, we target one droplet (say A) and find the neighboring droplets inside a sphere of radius Htrunc × amax using the linked lists (say droplets B, C and D are the "neighbors" of A). Contributions from the neighbor droplets to the RHS of Eq. (1) are summed if the distances are less than Htrunc times the neighbor's radius. Namely, the k index in Eq. (1) is associated with A, and the m index in Eq. (1) is associated with droplets B, C, and D. At this step we use an initial guess of zero for the disturbance at the locations of the neighbors (B, C and D) to compute the first estimate of the disturbance velocity felt by droplet A. Then we move to the next target droplet,
using the newly obtained disturbance velocities for the droplets located within the same slab. If the truncation sphere does not extend into the neighboring slabs, we deal only with data for droplets located in the same slab, i.e., data on the local processor. In contrast, if the truncation sphere extends into the neighboring slabs (as it does in Figure 2), the most up-to-date information for the two droplets C and D will not be readily available to slab n. This means that the required data for droplets C and D must be transferred from the neighboring slabs on the right and on the left, respectively. We compute the contribution from B to A first (step 1), and the updated information of droplet A is transferred to the neighbors immediately after step 1. Then the contribution from C to A is computed (step 2), and the updated information of A is again made available to the neighbors. The last step (step 3) computes the contribution from D to A, followed by updating the A information for the neighbors. The above procedure amounts to solving Eq. (1) by the block Gauss-Seidel method, with all contributions to A accumulated in three sequential steps (Smith et al. 1996).
In ADI2, we instead compute the contributions to the disturbance velocities of the neighboring droplets B, C, D originating from the target droplet A. Therefore, the droplets B, C, and D are assumed to be within a distance of Htrunc × aA from A. In other words, the m index in Eq. (1) is associated with A, and the k index in Eq. (1) is associated with droplets B, C, and D. As a result, the partial disturbance velocities felt by droplets B, C and D due to droplet A are computed and saved on the current processor. Then we move to another target droplet A within the current processor, using the newly obtained, partially computed disturbance velocities felt by B, C and D. For target droplets whose truncation spheres extend into the neighboring slabs, the required data (in this case, the locations of droplets C and D needed for the first argument on the RHS of Eq. (1)) must be received from the neighboring slabs on the right and the left, respectively. In this method, for a given droplet (here B, C, or D), the summation on the RHS of Eq. (1) is not computed all at once; rather, it is accumulated gradually as all A droplets are processed.
In ADI3, for each iteration, we first transfer the required droplet data (namely, the velocities, positions, and updated disturbance velocities for C and D) to the current slab. Then, using these "available data", we implement the same method as in ADI1. In practice, this means that the data from the neighboring slabs (the information for droplets C and D) come from the previous iteration rather than the current one. This amounts to solving Eq. (1) via the block Gauss-Seidel method for contributions from droplets in the current slab and via the Jacobi method for
contributions from droplets located in the neighboring slabs. Compared to ADI1, ADI3 requires one data communication step per iteration, while ADI1 involves two data communication steps per iteration.

Table 2: Wall clock time and required number of iterations per time step for the four approaches, for a given problem size.

  Method      time (s)/time step    Avg. # of iterations
  MPI-ADI1        0.369                  10.1
  MPI-ADI2        0.724                  21.56
  MPI-ADI3        0.41                   11.34
  MPI-ADI4        0.64                   10.7
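The Gauss-Seidel versus Jacobi distinction behind ADI1 and ADI3 can be illustrated on a toy linear system. This is a generic sketch, not the droplet system: the matrix below is a small, diagonally dominant stand-in for Eq. (1), chosen so that both sweeps converge, and it shows why using freshly updated values (Gauss-Seidel) tends to need fewer sweeps than using previous-iteration values (Jacobi):

```python
import numpy as np

def iterate(A, b, mode, tol=1e-10, max_iter=500):
    """Solve A x = b with Jacobi or Gauss-Seidel sweeps; return (x, iteration count)."""
    n = len(b)
    x = np.zeros(n)
    for it in range(1, max_iter + 1):
        x_old = x.copy()
        # Jacobi reads the previous iterate; Gauss-Seidel reads x itself,
        # which already holds the freshly updated entries of this sweep.
        src = x_old if mode == "jacobi" else x
        for i in range(n):
            s = sum(A[i, j] * src[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            return x, it
    return x, max_iter

# Diagonally dominant toy system standing in for the weakly coupled Eq. (1):
rng = np.random.default_rng(0)
n = 20
A = 20.0 * np.eye(n) - rng.uniform(0.0, 1.0, (n, n))
b = rng.uniform(-1.0, 1.0, n)
x_jac, it_jac = iterate(A, b, "jacobi")
x_gs, it_gs = iterate(A, b, "gs")
# Gauss-Seidel reaches the tolerance in no more sweeps than Jacobi here.
```

The same mechanism is at work in Table 2: the more up-to-date the neighbor data used inside a sweep, the fewer iterations Eq. (1) needs.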
ADI4 is implemented by computing the summation in Eq. (1) for all k particles located in odd-numbered slabs first (the first half of an iteration step). This is followed by the computation of the disturbance velocities felt by all particles in the even-numbered slabs using the updated information from the first half step. Essentially, even-numbered processes stand idle while odd-numbered processes are busy, and vice versa. In this manner, we can utilize the newly updated data in odd-numbered processes to update the disturbance velocities for droplets located in even-numbered processes, and vice versa. There is an obvious drawback to this method: it takes almost twice as long as (say) ADI3 to converge, since half of the processes are idle while the other half is busy computing.
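The ADI4 scheduling is a red-black (odd-even) ordering over slabs. A minimal sketch of the two half steps, with a hypothetical helper name of our own:

```python
def adi4_schedule(nproc):
    """ADI4-style two-half-step schedule: odd-numbered slabs update in the first
    half of an iteration, even-numbered slabs in the second half, each half
    reading the other half's freshly updated data."""
    odd = [r for r in range(nproc) if r % 2 == 1]
    even = [r for r in range(nproc) if r % 2 == 0]
    return odd, even

# With 8 slabs: ([1, 3, 5, 7], [0, 2, 4, 6]); while one group computes,
# the other group is idle, which is the drawback noted above.
odd, even = adi4_schedule(8)
```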
Table 2 shows the wall clock times and average iteration counts measured for ADI1 through ADI4. These values are measured for a 256^3 grid resolution for the flow simulation, with Np = 10^6 droplets of radii 20 and 25 μm on Nproc = 32 processes. Both the time measurements and the numbers of iterations are averaged over 10^4 time steps. The table suggests that ADI2 and ADI4 are computationally slow and would not be appropriate for long runs and large problem sizes. ADI3 is about 11% slower than ADI1, which suggests that ADI1 is the optimal approach.
To gain some understanding of how the solution converges, we plot in Figures 3, 4 and 5 the normalized errors in the x, y and z components of the disturbance velocity u(k) versus the iteration number for all 4 methods. These graphs are plotted for a given droplet in the domain, and in this case the simulation is allowed to run further, to a normalized error of ε = 10^-15. In each case, the error decreases roughly exponentially with the iteration number. ADI2 stands out as the slowest to converge, but the other 3 methods show very similar convergence behavior. We should emphasize that although ADI1, ADI3 and ADI4 require similar numbers of iterations to converge, their wall clock times are
Figure 3: Convergence of the ADI subroutines: x component of the disturbance velocity.
Figure 4: Convergence of the ADI subroutines: y component of the disturbance velocity.
different, as shown in Table 2. The times per iteration for the four approaches are 0.0365 s, 0.0336 s, 0.0361 s, and 0.0598 s, respectively. Further analytical studies should be carried out to explain the rate of convergence of each approach.
Since ADI1 is the fastest among the approaches, it will be used below to validate results against our previously published results based on OpenMP.
Figure 5: Convergence of the ADI subroutines: z component of the disturbance velocity.
Validation and preliminary results

In the study of droplet collision and coalescence, one is usually interested in the dynamic collision kernel, defined as

  Γ = Ṅc Vbox^2 / npairs,   (9)

where npairs is the total number of droplet pairs and Ṅc is the number of collisions per unit time per unit volume. For a monodisperse system of Np droplets, npairs = Np(Np − 1)/2, while for a bidisperse system of Np1 size-1 droplets and Np2 size-2 droplets, npairs = Np1 Np2. In our HDNS code, the collision kernel Γ can be determined dynamically by counting the number of collisions between droplets and using Eq. (9) for the different types of colliding pairs (i.e., 1-2, 1-1, and 2-2 pairs). Furthermore, kinematic pair statistics
are also of interest, such as the average radial relative velocity ⟨|wr|⟩(r = R) and the radial distribution function g12(r = R) (Wang et al. 2005). Here R is the collision radius, which for colliding droplets of radii a1 and a2 is R = a1 + a2. In our code, we compute ⟨|wr|⟩(r = R) by averaging |wr| over droplet pairs at a given separation, over time and space. The radial distribution function g12(r = R) is computed by defining a spherical shell with an inner radius R − δ1 and an outer radius R + δ2, in which δ1 and δ2 are small fractions of R. More details regarding the physical and mathematical aspects of the collision kernel, the radial distribution function and the radial relative velocity can be found in Wang et al.
(2008). When aerodynamic interactions are considered, non-overlapping corrections for both kinematic properties are needed, as explained in Wang et al. (2005).
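The two estimators just described can be sketched as follows. This is a simplified illustration with hypothetical function names: it applies Eq. (9) directly and estimates g12 by shell counting, ignoring the periodic images that the production code accounts for:

```python
import numpy as np

def dynamic_kernel(coll_rate, np1, np2, vbox):
    """Dynamic collision kernel, Eq. (9): Gamma = Ndot_c * Vbox^2 / n_pairs,
    with n_pairs = Np1 * Np2 for a bidisperse system (coll_rate is collisions
    per unit time per unit volume)."""
    return coll_rate * vbox**2 / (np1 * np2)

def rdf_at_contact(pos1, pos2, R, d1, d2, vbox):
    """Estimate g12(r = R) by counting 1-2 pairs in the shell [R - d1, R + d2]
    and normalizing by the count expected for a uniform suspension.
    (Periodic images are ignored in this sketch.)"""
    n_shell = 0
    for x in pos1:
        d = np.linalg.norm(pos2 - x, axis=1)
        n_shell += np.count_nonzero((d >= R - d1) & (d <= R + d2))
    shell_vol = 4.0 / 3.0 * np.pi * ((R + d2) ** 3 - (R - d1) ** 3)
    expected = len(pos1) * len(pos2) * shell_vol / vbox
    return n_shell / expected

# Uniformly random droplets should give g12 close to 1 (no clustering):
rng = np.random.default_rng(1)
pos1 = rng.random((2000, 3))
pos2 = rng.random((2000, 3))
g = rdf_at_contact(pos1, pos2, R=0.02, d1=0.005, d2=0.005, vbox=1.0)
```

Inertial droplets in turbulence cluster preferentially, so the measured g12 at contact exceeds 1, which is one of the turbulence effects that enhances the collision kernel.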
Figure 6: Dynamic collision kernel from ADI1 compared with the results from the previously developed OpenMP code (Wang et al. 2008).
Figure 7: Radial relative velocity from ADI1 compared with the results from the previously developed OpenMP code (Wang et al. 2008).
Cross-size collision statistics (the dynamic collision kernel, the radial distribution function and the radial relative velocity) are obtained for a bidisperse system with a1 = 30 μm in radius and a variable droplet size a2. For each droplet set, we ran the code twice: first with ADI ON, which includes the effects of aerodynamic interactions (call it
Figure 8: Radial distribution function from ADI1 compared with the results from the previously developed OpenMP code (Wang et al. 2008).
ADI case), and second with ADI OFF, which does not consider aerodynamic interactions (call it the NADI case). Five different size ratios, a2/a1 = 0.167, 0.333, 0.5, 0.667 and 0.833, were run. The results for the dynamic collision kernel, the radial distribution function and the radial relative velocity are plotted in Figures 6, 7, and 8. The results from Wang et al. (2008) based on the OpenMP code are also shown for comparison. Excellent agreement between the MPI-code and OpenMP-code results is obtained for all collision and kinematic statistics.
In Figure 8, however, for a2/a1 = 0.167 the time interval covered by our simulation was not long enough to achieve a small numerical uncertainty. Since a2/a1 is small, the smaller droplets of radius a2 are easily repelled by the larger droplets of radius a1, and this reduces the pair collision rate. Longer simulation times would therefore be necessary for this case.
For these runs, in each case, we set the number of droplets Np so that the liquid water content would be of order 10 g/m^3. In order to obtain consistent and steady-state collision statistics, we performed each simulation for a period of 10 Te, where Te is the large-eddy turnover time of the background turbulence, and excluded the initial 4 Te to avoid unsteady transient effects. We used Rλ = 72.6 at a flow dissipation rate of ε = 400 cm^2/s^3 to compare with the results in Wang et al. (2008).
Conclusions

In this paper, we have presented several approaches developed to parallelize the HDNS code using domain decomposition and MPI. This allows us to run the HDNS code on distributed-memory computers. In general, the MPI implementation is more challenging than our previous OpenMP implementation for the multiple-body interaction problem considered here. Among the four approaches we proposed and tested, ADI1 is the most efficient in terms of the number of iterations required and the wall clock time per time step. The wall clock time per time step depends on the problem size (the number of particles in the system) in a complex manner, as the number of iterations also depends on the problem size. More work is needed to understand the convergence rate of the MPI implementation.
We then validated the results pertinent to turbulent collisions of cloud droplets obtained from our newly developed MPI HDNS code by comparing with our published results based on the OpenMP implementation. The results are in excellent agreement, showing that the MPI code can now be used to simulate larger problem sizes and larger flow Reynolds numbers. More simulations are underway to extend the current collision efficiency, collision kernel, RDF and relative velocity tables to higher Rλ and a wider range of droplet-size combinations. We anticipate that our ADI1 implementation can be used in the near future to gather collision statistics for aerodynamically interacting droplets with Rλ up to 210 and systems of O(10^7) droplets.
Timing and code scalability
Figure 9 shows how the wall clock time scales with
Np, Nproc, and the average droplet size (a1 + a2)/2 for
typical simulations. In panel (a), the wall clock time
and the average number of iterations are measured for a
simulation at 256^3 grid resolution, with droplets of size
a1 = 20 μm and a2 = 25 μm. Panel (b) shows the results
at the same 256^3 flow grid resolution and the same size
combination as in (a), for Np = 10^6 droplets. In
panel (c), Np = 10^6 droplets are simulated, again at
256^3 flow grid resolution. For the data in panels (a) and
(c), we used Nproc = 32 processors. The wall clock
time per iteration scales roughly linearly with the number
of particles (or the problem size), with the number
of iterations increasing with the system size, as expected.
For a given problem size, the number of iterations is not
very sensitive to the number of processors used. The
dependence of the wall clock time on the hydrodynamic
interaction radius is neither linear nor quadratic.
Figure 9: (a) Wall clock time plotted versus Np; (b)
wall clock time plotted versus Nproc; (c) wall clock
time plotted versus (a1 + a2)/2. The inset in each panel
shows the number of iterations needed for convergence.
Acknowledgements
This work was supported by the National Science
Foundation (NSF) under grants OCI-0904534, ATM-0527140,
and ATM-0730766. Computing resources were
provided by the National Center for Atmospheric Research
through CISL-35751010 and CISL-35751014.
References
Allen M.P. and Tildesley D.J., Computer Simulation of
Liquids, Oxford University Press, 1989.
Andrejczuk M., Grabowski W.W., Malinowski S.P., and
Smolarkiewicz P.K., Numerical simulation of cloud-clear
air interfacial mixing: effects on cloud microphysics,
J. Atmos. Sci., 63, pp. 3204-3225, 2006.
Ayala O., Rosa B., Wang L.-P., and Grabowski W.W., Effects
of Turbulence on the Geometric Collision Rate of Sedimenting
Droplets: Part 1. Results from direct numerical
simulation, New J. Physics, 10, 075015, 2008a.
Ayala O., Rosa B., and Wang L.-P., Effects of Turbulence on
the Geometric Collision Rate of Sedimenting Droplets:
Part 2. Theory and Parameterization, New J. Physics,
10, 075016, 2008b.
Ayala O., Grabowski W.W., and Wang L.-P., A hybrid
approach for simulating turbulent collisions of
hydrodynamically-interacting particles, J. Comp. Phys.,
Vol. 225, pp. 51-73, 2007.
Bodenschatz E., Malinowski S.P., Shaw R.A., and Stratmann
F., Can we understand clouds without turbulence?,
Science, Vol. 327, pp. 970-971, 2010.
Dmitruk P., Wang L.-P., Matthaeus W.H., Zhang R., and
Seckel D., Scalable parallel FFT for spectral simulations
on a Beowulf cluster, Parallel Computing, Vol. 27, No. 14,
pp. 1921-1936, 2001.
Franklin C.N., Vaillancourt P.A., and Yau M.K., Statistics
and parameterizations of the effects of turbulence on the
geometric collision kernel of cloud droplets, J. Atmos.
Sci., 64, pp. 938-954, 2007.
Karrila S.J., Fuentes Y.O., and Kim S., Parallel Computation
Strategies for Hydrodynamic Interactions Between
Rigid Particles of Arbitrary Shape in a Viscous Fluid, J.
Rheology, 33(6), pp. 913-947, 1989.
Lanotte A.S., Seminara A., and Toschi F., Cloud Droplet
Growth by Condensation in Homogeneous Isotropic
Turbulence, J. Atmos. Sci., Vol. 66, pp. 1685-1697, 2009.
Pruppacher H.R. and Klett J.D., Microphysics of Clouds
and Precipitation, Kluwer Academic Publishers,
Dordrecht, The Netherlands, 1997.
Rosa B. and Wang L.-P., Parallel implementation of
particle tracking and collision in a turbulent flow,
PPAM 2009, Wroclaw, Poland, September 13-16, 2009.
Shaw R.A., Particle-turbulence interactions in atmospheric
clouds, Annual Review of Fluid Mechanics, 35,
pp. 183-227, 2003.
Smith B.F., Bjorstad P., and Gropp W., Domain Decomposition:
Parallel Multilevel Methods for Elliptic PDEs,
Cambridge University Press, 1996.
Vaillancourt P.A., Yau M.K., Bartello P., and Grabowski
W.W., Microscopic approach to cloud droplet growth by
condensation. Part II: Turbulence, clustering, and condensational
growth, J. Atmos. Sci., 59, pp. 3421-3435,
2002.
Wang L.-P., Ayala O., Kasprzak S.E., and Grabowski
W.W., Theoretical formulation of collision rate and collision
efficiency of hydrodynamically-interacting cloud
droplets in turbulent atmosphere, J. Atmos. Sci., Vol. 62,
pp. 2433-2450, 2005.
Wang L.-P. and Grabowski W.W., The role of air turbulence
in warm rain initiation, Atmos. Sci. Lett., Vol. 10,
pp. 1-8, 2009.
Wang L.-P., Ayala O., Rosa B., and Grabowski W.W., Turbulent
collision efficiency of heavy particles relevant to
cloud droplets, New Journal of Physics, Vol. 10, 2008.
Wang L.-P., Rosa B., Gao H., He G.-W., and Jin G.,
Turbulent collision of inertial particles: Point-particle
based, hybrid simulations and beyond, Intl. Journal of
Multiphase Flow, Vol. 35, pp. 854-867, 2009.
Wang L.-P., Ayala O., and Grabowski W.W., Effects of
aerodynamic interactions on the motion of heavy particles
in a bidisperse suspension, Journal of Turbulence,
Vol. 8, No. 25, 2007.
