Citation |

- Permanent Link:
- https://ufdc.ufl.edu/UFE0017240/00001
## Material Information- Title:
- Adaptive, Multi-Domain Techniques for Two-Phase Flow Computations
- Creator:
- UZGOREN, ERAY (
*Author, Primary*) - Copyright Date:
- 2008
## Subjects- Subjects / Keywords:
- Boundary conditions ( jstor )
Cartesianism ( jstor ) Decomposition methods ( jstor ) Film thickness ( jstor ) Linear systems ( jstor ) Liquids ( jstor ) Mathematical physics ( jstor ) Plugs ( jstor ) Poisson equation ( jstor ) Simulations ( jstor )
## Record Information- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright Eray Uzgoren. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Embargo Date:
- 12/31/2008
- Resource Identifier:
- 658230435 ( OCLC )
## UFDC Membership |

Downloads |

## This item is only available as the following downloads: |

Full Text |

PAGE 1 ADAPTIVE, MULTI-DOMAIN TECHNIQUES FOR TWO-PHASE FLOW COMPUTATIONS By ERAY UZGREN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLOR IDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 2006 PAGE 2 Copyright 2006 by ERAY UZGREN PAGE 3 To my parents PAGE 4 iv ACKNOWLEDGMENTS Among the many individuals for their suppo rt, my most sincere acknowledgements go to my advisor, Dr. Wei Shyy, for providi ng me the opportunity and flexibility to perform this work. His guidance and encour agement provided me the most excellent source of direction throughout this research. I can not thank him enough for being so patient and understanding for year s and pushing us to learn more. I would like to thank Dr. Tony Ladd for hi s guidance and support in the area of parallel computing. I would also like to express my appreciation to Dr. Jacob Chung, Dr. Renwei Mei, and Dr. Bruce Carroll for their helpful comments and for participating in my dissertation committee. I would like to express my gratitude to my family members; my parents, Hrriyet and brahim Uzgren; my sisters and their families, Nilay, Ada and Melih Papila; Emine, Yi it and Bulent Topoglu for thei r patience, support a nd love during these years. They are the undisclosed source of the power that I found in myself at hard times. Finally, I would like to thank to Yasemin Merzifonluo lu for her endless support. PAGE 5 v TABLE OF CONTENTS page ACKNOWLEDGMENTS.................................................................................................iv LIST OF TABLES...........................................................................................................viii LIST OF FIGURES...........................................................................................................ix ABSTRACT.....................................................................................................................xiv CHAPTER 1 INTRODUCTION...................................................................................................1 1.1 Multiphase Flows.........................................................................................1 1.2 Objective and Contributions........................................................................3 1.3 Outline of the Dissertation...........................................................................6 2 MATERIALS AND METHODS.............................................................................8 2.1 Interface Location........................................................................................8 2.2 Interfacial Dynamics..................................................................................12 2.3 Computational Efficiency..........................................................................15 2.3.1 Adaptive Grid Computation...........................................................16 2.3.2 Improvements on Iterative Solvers................................................18 2.3.3 Parallel Computing........................................................................19 2.4 Summary....................................................................................................21 3 LITERATURE REVIEW......................................................................................23 3.1 Domain Decomposition Methods..............................................................25 3.1.1 Schur Complement Methods..........................................................27 3.1.2 Schwarz Methods...........................................................................30 3.1.3 Applications of Con tinuous Interface Methods.............................35 3.2 Partitioning.................................................................................................38 3.2.1 Partitioning Techniques.................................................................39 3.2.1.1 Multilevel schemes............................................................40 3.2.1.2 Space filling curves............................................................41 3.2.2. Quality Considerations...................................................................42 3.2.2.1 Load balancing...................................................................42 3.2.2.2 Communication overhead..................................................43 PAGE 6 vi 3.2.2.3 Convergence......................................................................43 3.2.2.4 Computational time............................................................44 3.3 Summary....................................................................................................44 4 MULTIDIMENSIONAL IMMERSED BOUNDARY COMPUTATION...........46 4.1 Immersed Boundary Method.....................................................................46 4.1.1 Interface Tracking..........................................................................48 4.1.2 Conservative Reconstruction.........................................................49 4.1.3 Interface Construction for Topology Changes...............................50 4.1.4 Modeling Complex Solid Boundaries............................................52 4.1.5 Interactions between Interf ace and Computational Domain Boundary........................................................................................55 4.1.6 Connection between Eulerian and Lagrangian variables...............56 4.2 Adaptive Cartesian Grid Ba sed Flow Computations.................................64 4.2.1 Grid Adaptation and Data Structure..............................................65 4.2.2 Staggered Grid Arrangement.........................................................66 4.2.3 Discretization of Governing Equations..........................................68 4.2.4 Axisymmetric Computations......................................................... 73 4.3 Summary....................................................................................................74 5 ADAPTIVE COMPUTATIONS IN MULTIPLE DOMAINS..............................77 5.1 Partitioning Algorithm...............................................................................77 5.1.1 Space Filling Curve Generation.....................................................79 5.1.2 Cell Ordering.................................................................................81 5.1.3 Mapping Partitions.........................................................................82 5.1.4 Overlap Region..............................................................................83 5.1.4.1 Ghost cells..........................................................................84 5.1.4.2 Ghost faces.........................................................................86 5.1.5 Communication Routines...............................................................87 5.1.6 Grid Adaptation.............................................................................89 5.2 Domain Decomposition for Advection-Diffusion Equation......................90 5.3 Domain Decomposition for Pr essure Poisson Equation............................92 5.3.1 Global Treatment...........................................................................94 5.3.2 Additive Schwarz Technique.........................................................96 5.3.3 Multigrid Solver.............................................................................98 5.3.3.1 Coarse grid generation.......................................................99 5.3.3.2 Matrix assembly...............................................................101 5.3.3.3 Restriction........................................................................102 5.3.3.4 Prolongation.....................................................................103 5.3.3.5 Scheduling........................................................................103 5.4 Summary..................................................................................................105 PAGE 7 vii 6 RESULTS AND DISCUSSION..........................................................................107 6.1 Flow over a Cylinder...............................................................................108 6.2 Liquid-Plug Flow.....................................................................................112 6.2.1 Steady State Solution...................................................................115 6.2.2 Time Dependent Plug Propagation..............................................121 6.2.3 Liquid Plug Flow in Branching Channels....................................122 6.3 Multi-Domain Solution Techniques........................................................126 6.3.1 Partitioning...................................................................................127 6.3.2 Pressure Poisson Equation in a Single Domain...........................130 6.3.3 Pressure Poisson Equati on in Multiple domains..........................137 6.4 Summary..................................................................................................142 7 SUMMARY AND FUTURE WORK.................................................................145 7.1 Summary..................................................................................................145 7.2 Future Work.............................................................................................146 LIST OF REFERENCES.................................................................................................148 BIOGRAPHICAL SKETCH...........................................................................................158 PAGE 8 viii LIST OF TABLES Table page 1-1 Industrial and natural occu rrences of multiphase flows.........................................2 2-1 Comparison of Interface Tracking/Capturing methods........................................11 2-2 Features of SIM and CIM.....................................................................................14 2-3 Aspects of accelerators for iterative solvers.........................................................19 3-1 Decomposition of a linear system.........................................................................26 3-2 Comparison of Schwarz and Schur domain decomposition methods...................34 3-3 Overall computational time spen t to solve a linear system of 6.6x105 elements using CG with different orderings.........................................................44 4-1 Examples of delta function bounded by two cell width........................................62 4-2 Comparison of various delta functions.................................................................63 4-3 Original conjugate gradie nt with preconditioning M............................................72 5-1 Algorithm for mapping partitions to the available processors..............................82 5-2 Data structure of ghost cells..................................................................................85 5-3 Complexity analysis of the or iginal conjugate gradient method..........................93 5-4 Complexity of the conjugate grad ient method with direct parallelism.................94 5-5 Complexity of Additive Schwar z preconditioned conjugate gradient algorithm...............................................................................................................97 5-6 Multigrid cycle and its use with the conjugate gradient algorithm.....................105 6-1 Comparison between prior studies and pr esent study for drag, lift coefficients and Strouhal number at Re =100..........................................................................110 PAGE 9 ix LIST OF FIGURES Figure page 1-1 Illustration of immiscible fluids se parated by an interface which evolves in time.........................................................................................................................1 2-1 Illustrations for identifying the interface location..................................................9 2-2 Examples of special cases in interface tracking....................................................10 2-3 Surface tension, pressure and normal component of shear stress acting on an interface.................................................................................................................12 2-4 Computational time for pressure Poisson equation during a rising bubble computation ( Re =100, We =4, and Fr =1)..............................................................15 2-5 Characteristics of ad aptive Cartesian grids...........................................................17 3-1 Routes for processors to r each data in a parallel system......................................24 3-2 Trend in parallel systems......................................................................................24 3-3 Fundamental idea of parallelis m in scientific computing.....................................26 3-4 Overview of Schur complement methods.............................................................29 3-5 Local solves at the end of iterations to illustrate the influence of the overlap region on the convergence rate.............................................................................31 3-6 Overview of Schwarz methods.............................................................................32 3-7 Different locations set for the simulations............................................................35 3-8 Increase in iteration number according to the placement of the interface with respect to the pro cessor distribution.....................................................................36 3-9 Examples of possible partiti ons on regular Cartesian grids..................................38 3-10 Adaptively refined Cartesian grid.........................................................................39 3-11 Coarsening procedure in multilevel partitioning method.....................................40 PAGE 10 x 3-12 Popular space-filling curv e structures in a 16x16 grid.........................................41 3-13 Possible partitions.................................................................................................42 4-1 Flowchart for immersed boundary method for the computations of two-phase flows using marker points with capabili ties of handling topological changes.....47 4-2 Interface representation by marker points.............................................................48 4-3 Data structure to represents the interface both in 2D and 3D...............................49 4-4 Marker deletion procedure for the ol d interface to form the new interface..........50 4-5 Indicator function varies from zero to one to identify the constituents lying inside or outside the interface...............................................................................51 4-6 Reconstruction process in 2D and 3D...................................................................52 4-7 Indicator function varying smoothly from zero to two.........................................53 4-8 Definition of faces around the solid interface for u-velocity................................54 4-9 Snapshot of the geometry of the interface an instance after the impact...............55 4-10 Odd/even modes of the first moment for piecewise cubic function produces a checker-board field...............................................................................................60 4-11 Gradients computed usin g central differencing operators for cell-center and cell-faces for a rapidly varying pressure field.......................................................63 4-12 Steps of interface location based adaptation.........................................................65 4-13 Placement of velocity field and corresponding control volumes on a staggered variable arrangement............................................................................67 4-14 Schematic for the staggered control volume.........................................................67 4-15 Construction of the normal veloc ities on the control volume for the convection term.....................................................................................................69 4-16 Flux distribution at th e control volumes of T-no des that leads to global conservation..........................................................................................................70 4-17 Graphs for two different cell-orderings................................................................72 4-18 Volumetric element considered for axisymmetric computations.........................73 5-1 Partitioning based on curves.................................................................................77 PAGE 11 xi 5-2 Construction of the Peano-Hilbert curve..............................................................79 5-3 Initial cell-ordering for a channel flow.................................................................81 5-4 Hilbert curve based ordering.................................................................................82 5-5 Partitioning based on cell-ordering.......................................................................83 5-6 Computational stencil for pressure Poisson equation in 2D consisting of immediate neighbors in eas t, west, north, south...................................................84 5-7 Overlapping regions that will exch ange information with the neighboring sub-domains (marked with crosshatches).............................................................84 5-8 Computational stencil for convectio n-diffusion equation for a face in ydirection................................................................................................................86 5-9 Ghost faces at the corner of a partition.................................................................87 5-10 Communication processes to update the information present in the ghost cells.......................................................................................................................88 5-11 Adjusting partition boundaries after adaptation....................................................90 5-12 Sample matrix that has a non-zero element at the second row and its AIJ format....................................................................................................................93 5-13 Hiding the latency of communicatio n between the processors during a matrix-vector product............................................................................................95 5-14 Results of the coarsening process.......................................................................100 5-15 Step-by-step illustration of the coarsening algorithm.........................................101 5-16 Restriction operator.............................................................................................102 5-17 Prolongation operates on the residual correction................................................103 5-18 Cycles of multigrid.............................................................................................103 6-1 Computational setup used for the flow over a stationary cylinder case.............108 6-2 Typical grid after adaptive refinement for the case of a flow over a cylinder....109 6-3 Pressure contours of Re =100 flow at an instant af ter vortex shedding become constant (Contours are spaced by P = 0.2).......................................................109 6-4 Streamlines around the cylinder; co lors indicate pressure levels.......................110 PAGE 12 xii 6-6 Grid convergence study based on the pe rcentage error in Strouhal number......111 6-7 Illustration of th e Bretherton problem................................................................112 6-8 Illustration of computational setup for liquid plug problem...............................114 6-9 Snapshot of adaptively refined grid (at 3 levels) for Re =0 and Ca =0.05 at the steady state..........................................................................................................115 6-10 Validation study for various Capillary numbers.................................................116 6-11 Steady state shapes of the air-fi nger for various Capillary numbers..................117 6-12 Pressure contours and streamline trac es of the steady propagation of liquid plug ( Ca =0.05)....................................................................................................118 6-13 Comparison of the trailing film thickness at Ca =0.05 for various Reynolds number................................................................................................................118 6-14 History of trailing film thickness fo r different initial conditions, i.e., h0 = 0.1 and h0 = 0.2 until steady state is achieved..........................................................119 6-15 Effect of grid refinement on the trailing film thickness at Re =100 and Ca =0.05...............................................................................................................120 6-16 Evolution of air finger in a 2D channel for Re =100 and Ca =0.05......................122 6-17 Pressure contours at t = 0.8.................................................................................122 6-18 Computational setup for the solid bo undaries for the flow inside branching channels...............................................................................................................123 6-19 Pressure contours fo r single phase flow ( P = 1)...............................................124 6-20 Computational setup for liquid pl ug problem in branching channels.................124 6-21 Time-dependent development of th e bubble shape in branching channels........125 6-22 Variation of film thickness in time in the branching channel flow before bifurcation point is reached.................................................................................126 6-23 Hilbert curve based partitioning algorithm is illustrated on a 32x32x32 uniform grid for 2, 3 and 4 processors................................................................127 6-24 Sample partitioning on an adap tive grid into 6 sub-domains.............................128 6-25 Comparision of the quality of parti tions between current algorithm (SFC) and METIS.................................................................................................................129 PAGE 13 xiii 6-26 Computation time for partitioning algorithm......................................................130 6-27 Iteration history for conjugate gradient and multigrid solvers on uniform grids.....................................................................................................................131 6-28 Illustration for creation of coarser levels for multigrid solver based on the initial grid represented as Level 4.......................................................................132 6-29 Performance of multigrid when diffe rent number of levels is employed...........133 6-30 Performance of multigrid and conjuga te gradient algorit hms on non-uniform grids.....................................................................................................................134 6-31 Treatment of the density field in multigrid algorithm........................................135 6-32 Performance of multigrid and conjugate gradient at various density ratios.......135 6-33 Residual correction steps in multigrid solver for density ratio of 100 using sharp varying density field..................................................................................137 6-34 Grids and partitions used for parallel performance study...................................138 6-35 Iteration history of ASCG...................................................................................139 6-36 Effect of number of s ub-domains on convergence rate......................................140 6-37 Creation of coarse grid levels and the way they are partitioned into five subdomains...............................................................................................................141 6-38 Iteration numbers for diffe rent number of sub-domains.....................................142 PAGE 14 xiv Abstract of Dissertation Pres ented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy ADAPTIVE, MULTI-DOMAIN TECHNIQUES FOR TWO-PHASE FLOW COMPUTATIONS By Eray Uzgren December 2006 Chair: Wei Shyy Major: Aerospace Engineering Computations of immiscible two-phase fl ows deal with interf aces that may move and/or deform in response to the dynamics with in the flow field. As interfaces move, one needs to compute the new shapes and the as sociated geometric information (such as curvatures, normals, and projec ted areas/volumes) as part of the solution. The present study employs the immersed boundary method (IB M), which uses marker points to track the interface location and con tinuous interface methods to m odel interfacial conditions. The large transport property jumps across th e interface, and the c onsiderations of the mechanism including convection, diffusion, pressure, body force and surface tension create multiple time/length scales. The re sulting computational stiffness and moving boundaries make numerical simulations comput ationally expensive in three-dimensions, even when the computations are performed on adaptively refined 3D Cartesian grids that efficiently resolve the length scales. A do main decomposition method and a partitioning strategy for adaptively refined grids are developed to enable parallel computing capabilities. Specifically, the approach c onsists of multilevel additive Schwarz method for domain decomposition, and Hilbert space filling curve ordering for partitioning. The PAGE 15 xv issues related to load balancing, communica tion and computation, convergence rate of the iterative solver in regard to grid si ze and the number of sub-domains and interface shape deformation, are studied. Moreover, in terfacial representati on using marker points is extended to model complex solid geometries for single and two-phase flows. Developed model is validated using a benc hmark test case, flow over a cylinder. Furthermore, overall algorithm is employed to further investigate steady and unsteady behavior of the liquid plug problem. Finally, capability of handling two-phase flow simulations in complex solid geometries is demonstrated by studying the effect of bifurcation point on the liqui d plug, which is placed in a branching channel. PAGE 16 1 CHAPTER 1 INTRODUCTION 1.1 Multiphase Flows Multiphase flows are defined as the motion of two or more fluids which are in different thermodynamics state and/or have different chemical compositions. Multiphase flows are common in daily life as well as in many industrial applications. Some of the specific practical applications can be cited as boiling heat transfer in power industries, impact of a droplet on paper fo r buble-jet printers and biom edical applications in drug delivery processes. Table 1-1 provides a brief summ ary of industrial and natural occurrences of multiphase flows lis ted in Kleinstreuer (2003). Figure 1-1. Illustration of immiscible fluids separated by an interface which evolves in time. Table 1-1 uses phase configurations to categorize the multiphase flows. The scope of this dissertation is to focus on immiscib le liquid-liquid or gasliquid flows for bubble and drop dynamics problems. In these confi gurations, the constituents are separated by an interface, in which the transport properties of the fluid, such as density and viscosity, are typically discontinuous. Figure 1-1 demonstrates the basic characteristics of twoFluid 1 Fluid 2 Interface Surface tension force PAGE 17 2 phase flows, i.e., the presence of an in terface separating different fluids/phases, discontinuous fluid properties (density, vi scosity, etc.) across the interface, and the presence of interfacial effects such as surface tension forces. Table 1-1. Industrial and natural oc currences of multiphase flows. Phases Configuration Industr ial or Natural Occurrences Gas-Solid Dispersion of solid particles in gas Aerosol, particulate matter pollution, Filters and particle-collection devices, Pneumatic transport of particles, Gas â€“fluidized beds. Liquid-Solid Dispersion of particles in liquid Cells and particles in blood, Hydraulic transport of particles, Liquid-fluidized beds. (a) Dispersion of liquid drops in gas Mist, fog, cloud, rain storm, Coalescence of drops, Fuel droplet injection, Spray coating and cooling. (b) Dispersion of gas bubbles in liquid Transport of oil and gas, Bubble-curtain break-water, Flow of boiling liquid in pipes, Gas-Liquid or Liquid Gas (c) Both liquid and gas connected Annular regime of vertical gas-liquid flow, Movement of air-water interface in soil. (a) Dispersion of liquid drops in liquid Emulsion and creams, Break-up of immiscible drops, Coalescence of drops and phase separation. Liquid-liquid (b) Both bodies of liquids connected Transport of two liquids in horizontal pipe, Movements of oil-water interface in porous rocks. Note: Adapted from Kleinstreuer (2003). Computations of immiscible two-phase flows need to re solve the interface location in order to apply the interfacial conditions , which arise as a result of surface tension forces and distinctive material properties of the constituents. Wh en the interface moves in response to the flow dynamics, the new c onditions need to be computed for its new shape. These include the computation of geometric properties, such as curvatures, normal and tangent vectors. The interaction of surface forces with the flow field creates the most obvious difficulties since it adds non-linearity to the problem. The PAGE 18 3 discontinuities across the interfaces create multiple length/time scales and result in numerical stiffness that make s the computer simulations of two-phase flows expensive. 1.2 Objective and Contributions The present study focuses on the deve lopment of robust and efficient multidimensional algorithms for incompressible tw o-phase flows. The building blocks of multiphase flow computations have been constr ucted as part of a larger research group activity from the key ideas merged in the recent years. The following summarizes these key concepts: Multi-fluid interface tracking o Marker points that represen ts interface for 2D and 3D o Conservative reconstruction by adding/de leting elements on the representation o Handling topological changes o Modeling complex solid geometries o Interactions with the computational boundaries for contact line problems Interfacial dynamics and flow modeling o Single fluid formulation for the motion of all constituents o The discrete delta function to connect Lagrangian markers to Eulerian grid o Adaptive grid computations o Resolution adjusted automatically with the interface location and flow features Fractional-step method for moment um equations on staggered grids o Axisymmetric extension for its computational advantages o Solving linear systems in multiple domains o Partitioning strategy for adaptive Cartesian grids o Domain decomposition techniques for the solver These components form the basic structure of a recently emerged tool that can be used easily to study interfacial flow dynamics and its applications, one of which include the dynamics of an air finger in the lungs dur ing a medical treatment process, i.e., drug delivery. The liquid plug forms in the trachea just before inspiration. It propagates during the inspiration while leaving a trailing film th ickness and may eventually rupture. As the plug propagates down the airway tree, the film thickness behind is us ually thicker than the film ahead. In order to understand how liquid distributes in the airways throughout PAGE 19 4 the lung, Fujioka and Grotberg (2004) studied the interactio n of the effect of the plug propagation speed and plug length at the Stokes flow limit as well as at finite Reynolds numbers. The present work allows us to simulate both the air and the liquid phase on a branching tube for a better understandin g of the steady-state and/or non-steady development of an air-finger. Two-phase flow simulations that involve drop collisions have been studied in threedimension by Nobari and Tryggvason (1996), Shin and Juric (2002), Pan and Suga (2005) and Singh (2006). In these cases, the interplay between the impact velocity and the interfacial forces determines the dynami cs of the flow. Depending on the regime, droplets can bounce back, merge permanently or separate after a merger with a possible formation of satellite droplets. The comput ational demand due to the requirements of high grid resolution amplifies especially when the density ratio between the drops and the surrounding environment is high. Therefore, further examining such problems is only possible after developing strategies to reduce computational cost. The primary goal of this dissertation is to develop comput ationally efficient algorithms for numerical two phase flow simula tions. In order to ach ieve this goal, two major approaches have been simultaneously studied, adaptively refi ned Cartesian grids and multi-domain techniques. The key advantage of adaptive grid refinement is its ability to manage the grid re solution locally, allowi ng us to resolve flow features without bearing the redundant computati onal cost. Furthermore, de velopment of an efficient multi-domain strategy on adaptive refined Cartesian grids would lead to a competent parallel implementation. PAGE 20 5 In the literature, there are various studies in two-phase flow computations that consider parallelism. They usually lack a r obust parallel performance for large number of processors (i.e., greater than 100) that ar e available on massively parallel computers. Furthermore, most of these studies fail to cons ider the effect of la rge variations in the flow field. A strategy for developing such scalable parallel algorithms requires the following elements: Domain decomposition method to be used for solving the linear system of equations Partitioning algorithms Domain decomposition methods enable us to iteratively solve a linear system of equations. Its performance is based on the neces sity for data exchange while maintaining a reasonable convergence rate. The existence of large variations in the flow field establishes additional constraints on the process of developing a domain decomposition method. We employ the Schwarz type of methods as recent efforts i ndicate its capability of elliptic equations with strong variations. Partitioning algorithms are responsible for th e division of the computational load in such a way that they assure load balancing while keepin g the communication cost at minimum. Partitioning is one of the ke y components of effec tive parallelism when unstructured grids are employed. Among the po ssible choices, we employ Peano-Hilbert space filling curves to partition adaptively refined Cartesian grid. The specific contributions of this dissertation can be listed as Marker based interface tracking a nd interfacial dynamics modeling o Handling 2D interfaces and related t opological changes to merge and breakup o Interactions between the comput ational boundaries and the interface o Modeling complex solid boundaries o Axisymmetric computations PAGE 21 6 Multi-domain technique for immersed boundary method o Development of a partitioning strategy fo r adaptively refined Cartesian grids o Development of a domain decomposition strategy for Navier-Stokes equation o Development of multigrid solver. Capabilities of the developed approach a nd algorithms have been demonstrated and analyzed via a set of case studies; including flow over a cylinder and steady and unsteady behavior of the liquid plug problem. 1.3 Outline of the Dissertation This document is organized into seven ch apters. Chapter 2 focuses on the methods and materials to illustrate the alternative techniques of two-phase flow computations. Chapter 3 explains the fundamental concepts in multi-domain techniques which are essential for parallel scientific computing. In this chapter, prim ary components of the multi-domain techniques, domain decompositi on methods and partitioning algorithms, are presented. Chapter 4 elaborates on the implementation specifics of immersed boundary computations on adaptively refined Cartesia n grids. The soluti on techniques for the Navier-Stokes equations are described hi ghlighting the numeri cal discretization procedure. Chapter 5 explains the details of the mu lti-domain technique employed in this study. In this chapte r, the partitioning algorithm and its use with the domain decomposition technique for Navier-Stokes e quations are described in detail. The complexity analysis of the multi-domain solver is presented for various scenarios in high performance computing platforms. The key results of our resear ch are presented in Chapte r 6. A validation study for solid boundary treatment includes the flow ov er a cylinder. Moreover, the liquid plug PAGE 22 7 problem is studied to obtain steady and uns teady flow features. Furthermore, the capability of handling two phase flows with complex solid geometries is demonstrated using the case of a liquid plug propagating in a branching channel. The performance of the developed domain decomposition method is also investigated in this chapter. Chapter 7 provides a summary of the pr esented work. Future extensions and refinements to the present research are also elaborated in this chapter. PAGE 23 8 CHAPTER 2 MATERIALS AND METHODS Typical interfacial-flow computations involve the following components: Interface location is resolved for the geom etric properties either by capturing the interface location using a scalar func tion, or tracking explicitly using a representative surface. Interfacial dynamics modeling deals with interfacial forces and treatment of the discontinuities in transport pr operties across the interface. Effective solution procedure for Navie r-Stokes equations which consider the features of interfacial transport in order to obtain accurate, stable and economically viable computations. Various methods exist for each component with their own strength and weakness. A brief review of some of the most popular techniques in practice is presented in the following sections. 2.1 Interface Location There are three main categories for hand ling the moving interface, namely the Lagrangian, the Eulerian, and the combined Eulerian-Lagrangian methods (Shyy et al. 1996). Figure 2-1 illustrates e ach of these methods. In Lagrangian methods, the interface lo cation coincides with the computational grid, which is as well advected and deformed with the moving interface (Figure 2-1a). Tracking the interface e xplicitly, Lagrangian methods produce accurate results when the interface motion is modest. If large deformations/movements take place, it becomes necessary to regenerate the whole grid in order to preserve the quality on the computational mesh Antaki et al. (2000). This is a difficult and an expensive task for PAGE 24 9 three-dimensional computations Johansen and Colella (1998). Examples of this category include Wasekar and Manglik (2003) on a curvilinear mesh, and Perot and Nallapati (2003) using triangulated meshes. Fluid 2 Fluid 1 Fluid 2 Fluid 2 Fluid 1 Fluid 1 Fluid 2 Fluid 1 Fluid 2 Fluid 2 Fluid 1 Fluid 1 (a) (b) (c) Figure 2-1. Illustrations for id entifying the interf ace location. (a) Lagrangian method with a grid line coinciding with the interface. (b) Eulerian method with a simplified line interface (SLIC). (c) Combined Eulerian-Lagrangian method with massless markers. Eulerian methods employ a scalar functi on on the computational grid to extract geometric information of the interface (Figur e 2-1b). This category includes two popular methods, namely, the level-set (LS) method and the volume of fluids (VOF) method. The difference between these two methods is due to the choice of the scalar function, which is a signed-distance function in LS a nd volume fraction in VOF. Both of them are relatively convenient to implement and can ha ndle topological changes (i.e., merger or break-up) naturally (Shyy et al . 1996). Problems may arise at inadequate grid resolutions because the detail level of the interface is restricted to the grid size (Rider and Kothe 1998). Enright et al. (2002) reported severe mass loss/gain for LS methods and suggested the use of the particle-level set method that uses a cloud of Lagrangian markers to avoid erroneous diffusion. On the other hand, while being able to conserve mass, VOF faces the difficulty of retrieving geometric propertie s due to possible fragmented representation of interface (Scardovelli and Zaleski 1999). Th is problem with VOF is ameliorated by PAGE 25 10 coupled LS-VOF method which conserves mass due to VOF and computes geometric information using LS (Sussman and Pucke tt 2000). Another remedy suggested by Aulisa et al. (2004) is hybrid marker -VOF method which represents the interface with markers to avoid interface reconstruction for geom etric-property computation and moves the interface using VOF solution. This method improves the pe rformance of VOF in underresolved areas. 0.1 0.1 0.0 0.0 0.4 0.9 1.0 0.1 0.3 0.8 0.9 (a) (b) (c) Figure 2-2. Examples of special cases in in terface tracking. (a) Lagrangian methods. (b) Eulerian methods. (c) Combined Eulerian-Lagrangian methods. Combined Eulerian-Lagrangian methods employ marker points to locate the constituents (Figure 2-1c). This is more advantag eous than Eulerian methods as the interface is known explicitly even at the sub-grid levels. Su ch explicit nature is also convenient because it allows direct computati on of geometric propertie s on the interface. One shortcoming of these methods is that the surface representation requires complex data structure and book-keeping for 3D comput ations. When topological changes are of interest, it becomes even harder because the connectivity information needs to be rearranged (Shyy et al. 1996). Torres and Brackbill (2000) suggested representing the interface with a set of unconnected points. The complications during topological changes are mostly avoided since there is no need to preserve any connectivity between the line segments. Subsequent efforts by Shin and Juric (2002) extended this idea using PAGE 26 11 unconnected line segments for 2D and unconnected triangular elements for 3D computations. They employed level-contour based interface recons truction at regular intervals both to capture topol ogical changes and to maintain interface quality. The idea of level-contour based reconstruction is utilized by Singh and Shyy (2006) only when topological changes are required. The connectiv ity information is kept to maintain the interface quality so that the penalties of fr equent level-based reconstruction algorithm, i.e., excessive perturbation of interface, are avoided. Table 2-1. Comparison of Interface Tracking/Capturing methods. Methods Highlights Issues Lagrangian Interface explicitly known as part of the grid. Conditions applied directly on interface. Requires frequent grid regeneration to preserve quality (Antaki et al. 2000). Difficult and expensive for complex and large 3D computations (Johansen and Colella 1998). Eulerian Easy geometric computation. Handles topological changes naturally (Sethian and Smereka 2003). LS methods have severe mass loss/gain in under-resolved areas of the interface. Particle level-set is proposed to ameliorate mass loss/gain (Enright et al. 2002). VOF has poor performance in under-resolved areas causing fragmentation of interface (Rider and Kothe 1998). Hybrid marker-VOF uses markers which moves with VOF (Aulisa et al. 2004). Coupled LS-VOF use level-set to compute geometry and VOF for conserving phase volume (Sussman 2003). Mixed EulerianLagrangian Interface explicitly known even at sub-grid levels. Geometry-computation directly on interface as it is known explicitly. Complex data-structure and book-keeping for 3D interfaces. Topological changes require manipulation of interface data (Shyy et al. 1996). Connectivity free tracking is proposed by Torres and Brackbill (2000). Level-contour-based rec onstruction is proposed by Shin and Juric (2002). Connectivity information is used maintaining interface quality which is decoupled from levelcontour based reconstruction for topological changes (Singh and Shyy 2006). In Table 2-1, advantages and disadvantages of each method are summarized. Among these methods, the present work em ploys combined Lagrangian-Eulerian PAGE 27 12 methods due to its ability to re solve the interface accu rately even at sub-grid levels. The connectivity information is kept for maintaining the interface quality by adding and deleting markers locally. Topological change s are handled using an indicator-based interface reconstruction. The details of the algorithm are described in Singh and Shyy (2006). 2.2 Interfacial Dynamics The pressure and viscous stresses show discontinuities across the interface as a result of the surface tension forces (Figure 2-1). A representati ve condition is given by the Equation (2-1) where p is the pressure, is the viscous stresses, is the surface tension, is the curvature and subscripts represent the corresponding constituents. 2121nn pp (2-1) Figure 2-3. Surface tension, pressure and norma l component of shear stress acting on an interface. Interfacial dynamics modeling incorpor ates the jump condition using two distinguished methods, namely, sharp interface methods (SIM) and continuous interface methods (CIM). Sharp interface methods solve transport equations separately for each constituent by imposing the condition (Equation (2-1)) explicitly. Such treatment yields in a reduced numerical stress imbalance acro ss the interface as indicated by Ye et al. (2002) that SIM PAGE 28 13 brings down the magnitude of spurious curren ts for a bubble in static equilibrium. Some examples of this category that employ markers include th e cut-cell method (Ye et al. 1999), immersed interface method (Li and Lai 2001) and ghost fluid method (Hao and Prosperetti 2004). Implementation of cut-cell method is challenging because constructing control volumes around the in terface involves non-rectangular cells. Moreover, the algorithm needs to identify th e cells intersecting with the interfaces, to break them into separate regions and to co mbine certain portions with the neighboring cells depending on the way that the cell is cut (Udaykumar et al. 2001). Furthermore, it employs an iterative procedure which perturbs the interface in local normal direction in order to find the correct location of the inte rface that satisfies th e jump condition (Ye et al. 1999). The overall process makes cut-ce ll method very difficult and computationally expensive for three-dimensional computati ons. On the other hand, immersed interface method (IIM) and ghost fluid method (GFM) modify the discrete operators in the governing equations to include the exact jump condition (M arella et al. 2005). Unlike cut-cell method of Ye et al. (1999), these me thods do not perturb the interface iteratively. GFM is usually used along with level-set methods as the imp licit representation of the interface location makes the multidimensional impl ementation easier (Liu et al. 2000). In addition, examples of GFM with explicit inte rface tracking algorithms also exist in the literature (Hao and Prosperetti 2004). Continuous interface methods (CIM) solve on ly one set of equations for all the constituents. Jump condition is transmitted to the momentum and, if appropriate, energy equations as source terms. This approach leads to a smeared interfacial region with a thickness of a few computationa l cells although the actual physic al process, based on the PAGE 29 14 continuum mechanics framework, represents a sharp jump with zero-thickness. Smearing is employed to depict the singular surface forces using an approximate Dirac delta function. Smearing of fluid and flow propertie s enhances the stability of the algorithm even though it limits the accuracy of continuous interface methods to only first order in space (Rider and Kothe 1995; Shyy 2004). Recent efforts by Tornberg and Enquist (2004) aimed to analyze a cl ass of regularization methods for the treatment of the singular source term for higher order of converg ence in the vicinity of interfaces. Even though they established fourth order accuracy in one-dimensions, numerical experiments in higher dimensions still indicated firs t order convergence around the interface region. Table 2-2. Features of SIM and CIM. Interfacial Dynamics Highlights Issues SIM Explicit jump conditions. Magnitude of spurious velocity currents is reduced (Ye et al. 2004). Extending cut-cell method of Ye et al. (1999) to 3D is difficult. Ghost fluid method is typically first order around the interface because the jump conditions in tangential derivatives is neglected (Liu et al. 2000). CIM Ease in implementation as it involves only one set of equation for both phases. Enhanced stability due to smearing (Rider and Kothe 1995). First order spatial accuracy around the interface because of smearing out the properties (Shyy 2004). Considering Table 2-2 that presents the features of SIM and CIM, this dissertation employs a continuous interface method (CIM) be cause it enables solving only one set of equations for all the constituents while CIM remains reasonable for a wide range of fluidfluid multiphase problems of present intere st (Francois and Shyy 2002; Francois and Shyy 2003; Shyy 2004). The overall technique is called the immersed boundary method PAGE 30 15 (IBM) when used with Eulerian-Lagrangian marker-based interface tracking (Peskin 1977; Peskin 2003). 2.3 Computational Efficiency Numerical simulations of tw o-phase flows are computa tionally demanding. One of the primary sources of the computational cost is due to the convergence behavior of pressure Poisson equation, which is adversely affected by the existence of multiple fluid components (Shyy et al. 1996). .... (a) (b) Figure 2-4. Computational time for pressu re Poisson equation during a rising bubble computation ( Re =100, We =4, and Fr =1). (a) Various density ratios. (b) Various viscosity ratios. Figure 2-4 is a representative case of degrading perfor mance with increasing fluid property jumps across the interfac e (Francois et al. 2004). In this study, the computations were performed for a buoyancy-driven risi ng bubble using immersed boundary method. A W-cycle multigrid technique was used to solve the pressure Poisson equation. The computations were performed for different density and viscosity ratios across the interface to evaluate the effect of fluid pr operty jumps on the overall computation-time. The horizontal axis in Figure 2-4 shows the levels of multigrid used for computation PAGE 31 16 where â€œ level= 1 â€ represents computation without any multigrid computation. They conclude that the density ratio has the most significant impact on the performance; i.e., computational time for density ratio of 1000 is ten times higher than that for density ratio of 10 when no enhancement with multigrid ex ists. Although these observations come from using a specific method (immersed boundary) for multiphase flows, they still represent a general tre nd (Francois et al. 2004; Lrstad and Fuchs 2004). These observations recommend taking appropr iate measures to make multiphase flows computationally mo re affordable especially for large three-dimensional problems. The commonly employed remedies are listed as follows: Adaptive grid refinement around the inte rfacial region to resolve the multiple length/time scales effectively at the de sired accuracy level (Singh et al. 2005; Sussman 2005; Wang and Wang 2004). Improvements for convergence properties of the iterative solver using multigrid and/or preconditioners (Saad 2003). Parallel computations using domain decomposition methods which make largescale simulations possible (Uzgoren et al. 2004). The subsequent sections provide an overv iew on improvements on adaptive grids, iterative solvers and parallel computations. 2.3.1 Adaptive Grid Computation Adaptive grid is a procedure that optimizes computational data size while maintaining a desirable solution by efficiently resolving local length scales. The desired level of resolution is achieved by refining/coarse ning grid cells locally. Local adaptation creates a complicated da ta-structure and more computationalburden per grid-cell while it can freely contro l the local grid resolu tion. The fundamental advantage of such a technique is its ability to resolve length scales locally with much fewer numbers of global grid-cells as co mpared to uniform grids. The overall PAGE 32 17 computation-time and computational-resource s (memory requirements) can be reduced significantly, especially for 3D computations. The grid type used for spatial discretiza tion plays a key role on the efficiency and accuracy of the numerical soluti on. Cartesian grids are widely used with local adaptation because they offer relatively uncomplicated gr id generation as well as data-structure. Figure 2-5 indicates the typical features of adaptive Cartesia n grids using various types of data-structures: quadtree/octree in 2D/3D (Wang and Wang 2004), unstructured (Aftosmis 1997), and AMR or adaptive mesh refinement (Sussman 2005). (a) (b) Figure 2-5. Characteristics of adaptive Cartesian grids. (a) Unstructured data or quadtree. (b) AMR showing embedded blocks of different resolutions. AMR collects cells in blocks and refine them to the desired level. This results in number of excessive cells. A tree-based da ta-structure offers a compact data-storage system that ignores storing the explicit info rmation about the identities of neighboring cells. When needed, grid connectivity info rmation is extracted by employing a query on the tree which results in an additional computational effort. On the other hand, unstructured approach stores the geometri c information as well as the connectivity information. Despite the large memory requirement, this method is usually the most PAGE 33 18 feasible since it saves significant amount of computational time on the retrieval of connectivity information (Aftosmis 1997). The present study employs a cell-by-cell adaptation and the grid-data is stored in an unstructured format. 2.3.2 Improvements on Iterative Solvers The convergence behavior of pressure Poisson equation can be improved by numerically enhancing the iterative procedure. Multigrid is often used in fluid dynamics computations for such purposes. Its efficiency has been demonstrated for smooth elliptic equations (Briggs and McCormick 2000; We sseling 1982). Multigrid achieves such computational efficiency by damping the hi ghest frequency component of the error arising from the fine grid on a coarser le vel (Ghia et al. 1982; Shyy 1994). Multigrid techniques have been popular with relaxation-based itera tive methods such as Gauss Seidel or Jacobi methods and are often used with equations rising from regular structured meshes. When unstructured meshes are considere d, implementation of geometric multigrid is challenging. Algebraic multigrid, initiate d by Ruge and Stuben (1985), can be used for unstructured grids while its performance is s ubject to limitations fo r anisotropic and/or variable coefficients (Brezina et al. 2005). Systems arising from unstructured mesh es are usually solved using Krylov subspace solvers, i.e., generalized minimu m residual (GMRES) and conjugate gradient (CG). Such methods employ some form of lo cal projection for conve rgence, resulting in less number of iterations than stationary iterative methods. Th e acceleration can be improved by using preconditioners (Saad and van der Vorst 2000). These methods along with preconditioners are widely used for problems that employ unstructured grids. PAGE 34 19 Table 2-3 summarizes the aspects of possi ble methods to improve the convergence behavior of pressure Poisson equations. The present work employs Krylov subspace and multigrid methods to accelerate the iterative solver for the elliptic pressure Poisson equation. Table 2-3. Aspects of accelerat ors for iterative solvers. Method for improvement Aspects Geometric: Efficient for smooth elliptic equations (Briggs and McCormick 2000). Usually used with relaxation based iterative solvers (Saad and va n der Vorst 2000). Widely employed for structured grids. Multigrid Algebraic: Difficult to implement. Can be used with unstructured grids. Problems may arise for an isotropic or variable coefficients (Brezina et al. 2005). Krylov Methods It is often used for unstructured grids. Preconditioners can be effective to close the gap between multigrid type accelerators. 2.3.3 Parallel Computing Many applications of fluid flow simulations involve multiple length scales, which can only be resolved using fine grids. A valid approximation for such problems requires an enormous amount of computer memory and computational time even when both accelerators and adaptive grids are employed. Such requirements limit our capabilities on sequential computers. Massively parallel computing offers an effective way for simulating large-scale problems by distributi ng overall computationa l load on a number of processors. The main objective of parallel computing is to decrease the computational time by increasing the number of processors used. Such scalable parallelism is only possible for â€œembarrassingly parallel problemsâ€ (Fox et al. 1994). An example of such problems is Monte Carlo trials, in which processors pe rform computations i ndependently without a PAGE 35 20 need to communicate with others. Unfortunate ly, this is not the case for most of the computational fluid dynamicsâ€™ applications in which processors are required to communicate to obtain a global solution. Several factors deteriorate the expected scalability preventing us to fully benefit from the parallel computing. Load balanci ng and communication co st are some of the fundamental factors that need to be addre ssed for a scalable parallel design. Load balancing refers to equal divi sion of computationa l tasks on processors so that it is ensured that the processors complete th eir computational cycles simultaneously. Communication cost is the time spent to exchan ge data between the processors. In most cases, it may not be possible to eliminate co mmunication cost completely while it is crucial to reduce it as much as possible to achieve an efficient parallelism. Scientific parallel computations of fluid flow simulations typically require solving linear systems of equations resulting from the discrete form of the governing equations. The parallel strategy considers the followi ng ingredients for scie ntific computing: Domain decomposition: The strategy of so lving linear systems iteratively using data parallelism. Partitioning: The way that the data is divided into smaller sub-domains. Domain decomposition methods that s upport data-paralelism are Schwarz and Schur complement methods. Schwarz dom ain decomposition methods have been successfully employed with th e computational fluid dynamics problems. They are known to be robust, but they usually require special treatments on th e overlap region to obtain good scalability on massively parallel system s (number of processo rs larger than 100 computers (Garbey 2005)). In contrast to Schwarz methods, Schur complement methods handle the linear system globally. However, Schur complement methods are difficult to PAGE 36 21 implement and consequently, there are limited nu mber of parallel flow solvers based on Schur complement methods. One example is studied by Hill et al. (2004) using PETSc, an open-source library for parallel linear solvers. Partitioning is a crucial gradient on the su ccess of parallel algorithms especially on adaptive grids. The fundamental goal of partitioning is to minimize the communication cost while maintaining the computational lo ad on each processor balanced. Partitioning of unstructured meshes can be treated as a graph partitioning problem that is shown to be NP-complete (Garey and Johnson 1979) and optimal solutions are computationally intractable for large problems. However, se veral solution methods have been proposed. Among many possible approaches, the most popular two methods ar e the space filling curves (SFC) (Aftosmis 1997) and multi-level k -way partitioning (METIS) (Karypis and Kumar 1998). This study employs SFC for partitioning due to its computational efficiency while having comparable performance with other partitioning strategies for the following reasons. The quality of partitioning based on space curves has been examined and found to be competitive with respect to other pa rtitioners (Iqbal and Carey 2005; Oliker and Biswas 1998; Pilkington and Baden 1996). On adaptation, repartitioning can be perfor med locally as the basis of the curve remains the same. This avoids glob al remapping when compared to other techniques, which is an order of magnit ude more expensive (Oliker and Biswas 1998). Partitioning is based on local values; hen ce there is no need for communication via processors in the partitioning pr ocedure (Aftosmis et al. 2004). 2.4 Summary In this chapter, av ailable methods and components of numerical two-phase flow simulations are overviewed. Firstly, techni ques to identify the moving interface location PAGE 37 22 are categorized as Lagrangian, Eulerian and mixed Eulerian/Lagrangian methods. Advantages and disadvantages of the prim ary methods in each group are discussed. Among these methods, marker based tracking, especi ally with the recent advances in data handling, is chosen because of its capability to keep the explicit knowledge of the interface shape/location at all times. Secondly, two main cla sses for interfacial dynamics models are introduced, sharp interface met hods and continuous interface methods. Because of the concerns related to computa tional efficiency and relative difficulties with sharp interface methods via marker base d tracking, present study focuses on the continuous interface methods, which offers a reasonably accurate and stable method for bubble/drop dynamics problems. The overall te chnique is referred to immersed boundary method. Even though immersed boundary technique is one of the most computationally efficient algorithms for two-phase flow simu lations, tackling with large scale problems especially with multiple interf aces can still be a major challenge. This is mainly because the existence of multiple lenght/time scales and discontinuities make the simulations computationally demanding. Efficient algorith ms, such as employing adaptively refined grids, can ameliorate such issues to an exte nt. Further remedies aiming to improve flow solution procedure include the multi-domain te chniques, which provid e an effective way of enabling parallel computations on massively parallel systems. For a successful multidomain strategy for two-phase flow simulations, partitioning algorithms and domain decomposition methods are overviewed briefly. A more comprehensive discussion on these methods is presented in Chapter 3. PAGE 38 23 CHAPTER 3 LITERATURE REVIEW Mooreâ€™s law states that processorsâ€™ speed doubles about every three years. Instead of anticipating for such a trend for cost -effective computations, one can combine currently available processorsâ€™ power to solv e large scale problems in a shorter execution time. This is known as parallel computi ng. Since the numerical simulations of twophase flows are computationally demanding, em ploying such an idea brings the benefits of efficient use of resources. One of the major concepts in scientific parallel computing is locality (Demmel et al. 1993). Locality refers to the proximity of processor and memory units. This concept can be better understood examining Figure 3-1, which shows the layout of a distributed memory system. Such systems include blocks of processor and memory units which are combined through interconnect/network. Even though any processor can reach the data stored in local and non-local memory units , their access may be established through different devices. When it is on non-local memo ry unit, data travel through interconnect, which is about an order magnitude slower th an the internal speed of any block. As a result, reaching non-local data via network requ ires more time than reaching data from a local data source. This extra time spent on th e network is a cost in parallelism resulting from the communication proce dure between the processors. The cost attached to communication is si gnificant for parallel systems especially with 100 or more computing units. Such a nu mber is not unusual when it is possible to PAGE 39 24 find scientific computers using 100,000 processors. Figure 3-2 shows the trend in parallel systems that are us ed for scientific computing. Interconnect/Networ k Processor Cache Memory Processor Cache Memory Processor Cache Memory Processor Cache Memory Figure 3-1. Routes for processors to reach data in a parallel system. Figure 3-2. Trend in parallel systems. Scientific computations involve linear systems that resu lt from the discrete form of partial differential equations. Solutions of su ch linear systems are usually attained using iterative solvers. In contrast to solvers that directly invert the coefficient matrix, they correct unknowns repeatedly until a reasona ble approximation is obtained. Iterative solvers are preferred over the direct soluti on techniques especially for large-scale problems because of their computational e fficiency. On the other hand, its success depends greatly on the spectrum of the coeffi cient matrix that influences the number of iterations it takes for the solution to converg e (Barrett 1994). The extension of iterative methods for parallel environments may exhi bit different convergence rates even though original behavior can be achieved at the cost of frequent co mmunication. Attaining PAGE 40 25 convergence rate with a reasona ble communication cost comes as another constraint for efficient scientific parallel computing. The following list is a summary of the key components of efficient parallel algorithm design for scientific computing in or der to assure an optimal arrangement of where and when to store or access data: A balanced distribution of computational load, Minimal communication between processors, Minimal number of iterations for convergence. Domain decomposition methods offer a way to cope with these tr adeoffs in parallel algorithms for solving linear systems. Because it is based on data-parallelism, many flow computations have used this approach duri ng the last two decades (Johan 1992; Liu et al. 2003; Passoni et al. 2001; Wasf y et al. 1998). The distributi on of the computational load may become challenging for large sparse systems that may arise with adaptive mesh refinement. Such data may contain irregular structures which require special partitioning algorithms in order to ensu re the success of the doma in decomposition method. The following sections provide an overv iew on domain decomposition techniques and partitioning methods in order to establish a strategy for parallel implementation of two-phase flow computations with ad aptive grid refinement capabilities. 3.1 Domain Decomposition Methods Domain decomposition methods, in which th e words literally indicate partitioning of a region, are based on the idea of â€˜divide and conque râ€™. Its essence is to divide a large problem into smaller pieces, each of which then solved independently before being combined to obtain the global solution of th e original problem (Chan and Mathew 1994). In the case of a system of equations , the problem is denoted by Equation (3-1). PAGE 41 26 Auf (3-1) where A is the coefficient matrix, u is the unknown vector and f is the source term. This system can be represented as a combination of N subsystems (Table 3-1). Table 3-1. Decomposition of a linear system. Decomposition of unknowns: 1uuN i i for , 1iiN (3-2)TARARijij (3-3) where Ri is the restriction operator that transforms iuu where TRi is the extension operator that transforms iuu Linear system: 11111 1AAuf AAufN NNNNN (3-4) Figure 3-3 illustrates this procedure for th ree subsystems on a lin ear system and its corresponding graph. (a) (b) Figure 3-3. Fundamental idea of parallelism in scientific computing. (a) Representation on a linear system. (b) Representation on a graph or a mesh. Domain decomposition methods were firs t suggested by Schwarz in late 19th century. It garnered interest in late 80â€™s and early 90â€™s due to its intrinsic properties for parallel computing. Domain decomposition methods have been extensively studied for different type of equation systems and produced a variety of methods. In the literature, they are recognized in two di fferent classes; namely, Schur complement methods and Schwarz methods (Smith et al. 2004). Thes e are explained in the following sections. PAGE 42 27 3.1.1 Schur Complement Methods Schur complement methods, which are also referred to â€œiterative substructuringâ€, iterate on the interface unknowns implicitly us ing the internal unknowns as intermediate variables. They are based on organizing a larg e linear system by direct factorization of the matrix, yielding in Schur complements, that eliminates the interior degrees of freedom. Consider the local system, i, in Equation (3-5) that yields from Equation (3-1). ,0 AA A AAiIIIBII iiii BIB BIBBBB ijj iiii jngbruf u uf (3-5) where I and B represent the corresponding values at the interior and boundary of the subsystems. Original Schur complement met hods construct the sub-sytems for boundary unknowns using Dirichlet boundary conditions. The summation term on the left-hand side is the contribution of th e neighboring subsystems to the local equation. Eliminating the interior unknowns from Equation (3-5) yields 1 ,AAAiBBIBBBIIII iiijjiiiii jngbrSuuffg (3-6) where iS is the local Schur complement, given as, 1SAAAABBBIIIIB iiiii (3-7) When all the subdomains are combined together, they constitute a linear system involving only the interface unknowns. 112111 212222 12AA AA AABIBIBB N BIBIBB N BIBIBB NNNNNSug Sug Sug (3-8) PAGE 43 28 The coefficient matrix of (3-8) has the same off-diagonal blocks as the coefficient matrix in Equation (3-4). Obtaining a solution for Equation (3-8) enables us to solve the subsystems independently. Because Schur co mplements are usually dense matrices, nonstationary iterative solution techniques (K rylov methods) are preferred for reducing the computational cost. The strength of Schur complement methods is its global natu re of handling the linear system. Also, parallelism is natural because it doesnâ€™t need to communicate with other processors during the e limination process. On the other hand, the major problems come with the way that Si is computed. It involves the inverse of the coefficient matrix for the interior unknowns, whic h increase the overall comput ation time of the method. This indicates that more sub-systems would deteriorate the performance. Consequently, approximations to these substructures are used through preconditioners on directly employed in the factorized system of Equation (3-8). Convergence rate becomes dependent on the choice of approximate su bstructures and the suitability of the preconditioner. The variants of the described method aim to enhance the convergence behavior of the reduced system. A ve ry popular practice is to employ a Neumann-Neumann preconditioner, which makes use of symmetric decompositions (Bourgat et al. 1989). For the case of non-symmetric systems, it is preferable to use Robin-Robin type preconditioner, which is usually obtained thro ugh a Fourier analysis. However, neither of these preconditioners is exact for arbitrary decompositions and they require additional modifications in the original method. Ot herwise, convergence ra te is observed to deteriorate with an increase in the number of subsystems and become dependent on the PAGE 44 29 aspect ratio (De Roeck et al. 1992; Le Tallec 1994). (Mandel 1993) showed that convergence rate dependence can be ameliorate d by adding a coarser-level problem to the Neumann-Neumann method. This method is known as balancing Neumann methods (BDD). Preconditioners Schur Complement Methods Approximate Schur com p lements Dirichlet Cond. High computational cost for Solving the reduced system Constructing substructures Robin-Robin FETI BDD Poor convergence rate large number of systems irregular subsystems, i.e. aspect ratio NeumannNeumann S y mmetric? YES NO BDDC FETI-2 Adding a coarse level problem Constraints for continuity at the interfaces FETI-DP Figure 3-4. Overview of Schur complement methods. Variations of BDD methods use different weight matrices for Neumann-Neumann preconditioners to form a decomposition of unity on the subsystem boundaries. A recent form of this method, BDDC, is in troduced by Dohrmann (2003) in which preconditioning is based on energy minimization. Such constraints at the interfaces for continuity of unknowns are al so used in dual-primal finite element tearing and interconnecting methods (FETI-DP) , which is a variation of th e original FETI algorithm. FETI methods can be viewed as a two-st ep preconditioned conjuga te gradient (PCG) algorithm (Farhat et al. 2001). Both FETI and BDD variants are difficult to implement, especially for problems with irregular data structures. More importantly, they lack a generalized approach for 3D fluid dynami cs applications. Figure 3-4 summarizes the Schur complement methods. PAGE 45 30 3.1.2 Schwarz Methods Schwarz algorithms divide the computational domain in to sub-regions, possibly overlapping. Each of these sub-regions form s subsystems of linear equations that are solved locally and they are then coupled with other sub-systems to obtain global solution. The classical alternating Schwarz method couples sub-systems using successive exchange of Dirichlet data at the overlap regions (Schwa rz 1869). In the case of matching meshes, alternating Schwarz may be interpreted as additive Schwarz or multiplicative Schwarz methods. The diffe rences between these two methods are highlighted using a system decom posed into two as in Equation (3-9). 111211 212222 AAuf AAuf (3-9) When considered with stationa ry iterative methods, Equation (3-9) can be transformed into the following form: 1 kk NuNAuf, where11 22 A N A (3-10) In Equation (3.10), the first row of N corresponds to subsystem 1 and similarly second row corresponds to subsystem 2 . During the iterations on 1, the entry 12A is moved to the right hand side of the equation using the previous iteration values. The original Schwarz method uses the most recently updated values from 1 in the iterations of 2. This is equivalent to setting 21 A in Equation (3-10), which is analogous to blockwise Gauss Seidel. The resulting scheme is known as multiplicative Schwarz method. This method, however, can only be used in sequential algorithms because the subsystems are solved one after anot her. On the ot her hand, setting allows PAGE 46 31 solving each sub-system independently, wh ich makes Schwarz methods suitable for parallel computing. This technique is known as additive Schwarz method and it is analogous to block Jacobi. 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1A2B2A1A0B0B1Subdomain2 Subdomain1 0 0.2 0.4 0.6 0.8 1A2B2A1A3A0B0B1Subdomain2B3Subdomain1 Figure 3-5. Local solves at the end of iterati ons to illustrate the influence of the overlap region on the convergence rate. One of the major drawbacks of classical Schwarz methods is that the size of the overlap region affects the convergence behavior of the iterative solver. In the limit of zero overlap size, it fails to converge. Figure 3-5 illustrates the iteration behavior after each local solve of a simple 1D elliptic equation with an overlap (Garbey 2005). The first iteration of additive Schwarz algorithm yields A0 and B0 at the sub-domain interfaces. Second iteration uses the arti ficial conditions on these lines to obtain A1 and B1. This is repeated until the convergence crit erion is met. As it can also be inferred from Figure 3-5, overlap region determines the convergence rate. When the overlap region gets smaller, convergen ce rate deteriorates. Classi cal Schwarz methods are slow especially when large number of subsystems exist (O (100) or more). This is linked to the overlapping because there are more ba rriers to achieve the actual speed of propagation. The recent improvements/de velopments of the Schwarz methods are summarized in Figure 3-6. PAGE 47 32 Original Schwarz Method ( alternatin g) Additive Multi p licative Dirichlet conditions Can not be in parallel Imitate Aitken-like SteffensenSchwarz Two-level Multi-level Complicates data-structure, especially for unstructured grids Coarse levels may limit parallelism Robin conditions Solve optimization problems for parameters Can be used as a preconditioner to Krylov sub-space methods Overlap Dependency Slow Conver g ence Choice of optimal parameters Figure 3-6. Overview of Schwarz methods. One way to speedup the original Schwarz method is to use two-level or multilevel (multigrid) methods. Besides the usual adva ntages of multilevel techniques, the overlap dependence is ameliorated because the width of the overlap is naturally wide for coarser grids (Smith et al. 2004). On the other hand, implementati on requires deep modifications to the data structure which makes it challe nging for unstructured grids. Furthermore, coarse grids may become a bottleneck for pa rallelism because the ratio of communication over computation amplifies (Garbey 2005). Another remedy is to employ more genera l conditions at the processor interfaces. Lions (1990) proposed to use Robin conditi ons. Since this idea not only enhances convergence behavior but also permits non-overlapping decompositions, it has been followed by many others (Bramble et al. 1998; Dubois 2003; Nataf and Rogier 1995). The Robin conditions are represented in Equation (3-11). PAGE 48 33 1 11 1111 1 12 1112 1 2222 1 21 2222Auf at the overlap Auf at the overlapkk kkk kk k kkuu SuSu nn uu SuSu nn (3-11) In these conditions, Si is a linear operator acting on va riable in tangent direction. The choice of these operators has a consider able influence on the convergence behavior of the domain decomposition methods and bri ngs the major question for these techniques; what would be the optimal transmission conditions? One way to find the optimal conditions is through the Fourier space after taking lower order Taylor approximations of the parameter that makes the convergence rate (in Fourier space) zero. Alternatively, one can choose to solve an optimization problem to minimize the convergence rate (Dubois 2003). Recently, an alternative wa y to accelerate the overlap convergence of Schwarz method, the Steffensen-Schwarz (Aitkenlike acceleration) method, is proposed by Garbey and Dervout (2002). The idea is base d on predicting the itera tion behavior at the overlap region, i.e., the operator 1 kkuuuu is linear. This information is used to construct a linear square matrix P that may satisfy the following system. 1Pkkuuuu (3-12) The additive Schwarz algorithm becomes 1 1IPPnnuuu (3-13) The original Steffensen-Schwarz algorithm was restricted to regular Cartesian grids and has been extended to non-uniform Cart esian grids (Baranger et al. 2005). This PAGE 49 34 method is shown to exhibit good scalability on parallel architectures with large number of processors (Barberou et al. 2003). From the implementation point of view, their algorithm can easily be extended from cla ssical Schwarz as it only operates on the overlap region. A general improvement of Schwarz methods is the use of Krylov subspace methods (Japhet et al. 2001). Its usage can be il lustrated for the classical additive Schwarz algorithm as follows; 12uuBfAukkk (3-14) where B can be used as a preconditioner for Krylov subspace solvers. The additive Schwarz preconditioner reads; 1 1BRAN T iki i R , where AAkk (3-15) Table 3-2. Comparison of Schwarz and Schur domain decomposition methods. Highlights Issues Schwarz Methods Robust and easy to implement. Highly parallel; especially for memory reduction. Convergence rate deteriorates for O (100) or more. Dependence on the overlap width. Two-level Multilevel Convergence rates independent of overlap width. Change in data-structure. Coarse levels may become a bottleneck for parallelism. Optimal Conditions Better convergence rate with reduced independency of overlap width. Allows non-overlapping. Optimal conditions may vary locally: leads to solving an optimization problem. Improvements SteffensenSchwarz Improved convergence rate even for large number of sub-systems. Can easily be modified from original Schwarz. Increase in computational cost, but can be ameliorated with the use of parallelism. Schur complement methods Global nature. Reduced communication cost. Computationally expensive and requires approximations to substructures and use of suitable preconditioners. Difficult to implement and to generalize, especially in 3D. Convergence depends on aspect ratio and number of sub-domains. FETI BDDC Extended for complex problems with better convergence properties. They are even more difficult to implement. PAGE 50 35 Table 3.2 provides a comparison chart for hi ghlights and issues of various domain decomposition methods. 3.1.3 Applications of Continuous Interface Methods The solution strategy for the conti nuous interface methods usually involves projection methods that have two componen ts which are iteratively solved; namely, advection-diffusion equation and pressure Po isson equation. The pressure Poisson equation takes most of the computational tim e of the overall algorithm due to its slow convergence behavior (Shyy 1994). The nature of this equation is elliptic. In order to benefit from parallelism, methodology for domain decomposition needs to address the issues of an elliptic type of equations. In these systems, the speed of propagation is infinite and it is crucial to find an appropriate domain decomposition method that has loosely coupled subsystems. The interfacial jump conditions and materi al property variations bring another challenge for continuous interface methods. Th ey appear in the linear system as strong variations in the coefficient matrix and as the source term due to freely moving interface. When the domain is decomposed into a nu mber of sub-problems, the location of the unknowns is expected to have an influen ce on the convergence rate. This can be illustrated using a case of a spherical bubble in static equilibrium at different locations with respect to the decomposed domain (Figure 3-7). (a) (b) (c) Figure 3-7. Different locations set for the simu lations. (a) Case I. (b) Case II. (c) Case III. PAGE 51 36 Each configuration is tested for density ratio of 1 and 10 to see the effect of density variation and location on the convergence behavior. Figure 3-8 shows the convergence trend of the interface location for different density ratios. The increase in the iteration number is marked with respect to single processor runs. The number of iterations required to converge increases as the interface is shared with more processors for density ratios different than one. These results in dicate that the domai n decomposition method needs to consider such variations in the coefficient matrix. Mandel (1992) and Dryja & Widlund (1996) tackled this problem using Schur complement methods that has a coarser grid level. Their appr oach requires locally uniform meshes that a coarser mesh can re solve the discontinuous coefficients in the coefficient matrix. In practice, the precond itioner chosen for the domain decomposition, partitioning of adaptive meshes and coarse grid construction are not directly linked. Graham and Hagger (1999) points out that it becomes unreasonable to assume that these discontinuities can be resolved by the coarse r mesh for unstructured grids. They suggest that the use of additive Schwarz preconditi oners with conjugate gradient method produces optimal convergence rate that is only weakly dependent on the discontinuities. CASE# IncreaseinIterationNumber 1 2 3 1 1.1 1.2 1.3 1.4 DensityRatio1 DensityRatio10 Figure 3-8. Increase in iteration number accord ing to the placement of the interface with respect to the processor distribution. PAGE 52 37 Many studies in two-phase flow computati ons have employed parallelism in order to reduce the computational cost and enable large-scale simulations. However, existing studies usually do not emphasize the feat ures/performance evaluation of the parallel implementation (Liovic et al. 2004). Even though most examples employ Eulerian tracking, their domain decomposition methodo logy can be generalized for continuous interface methods that employ markers for tracking, i.e., immersed boundary method. This is mainly because the challenges they face for domain decomposition are closely related, i.e., rapid variations in coefficient matrix and the source term. Algorithmic variations have been reported, e.g. in Al-Rawahi and Tryggvason (2004) . They employed immersed boundary method in whic h they reserved an additional processor only for interface tracking. McQueen and Peskin (1997) implemented a shared memory implementation of immersed boundary method based on loop parallelism. They achieved 60% parallel efficiency for 16 processors. Xiao (1999) used classical additive Schwarz method by decomposing a uniform Cartesian grid into slices. They used level-set method for interface tracking with a Krylov solver, sp ecifically Bi-CGSTAB. They obtained 77% efficiency for up to 32 processors. Their results indicate that the more processors are used, the more efficiency is lost. Another example of overlapping domain decomposition with level-set approach in three-dimensions is developed by Croce et al. (2003). In their work, Cartesian grids are employed. Hill et al. (2004) studied parallel phase field model for flow with elastic membranes. Their parallel implementation employs the open source parallel linear solver, PETSc. They used Schur complement method with a Schwarz preconditioner for PAGE 53 38 the reduced system. This aimed to address the difficulties that may arise due to discontinuous coefficients. However, their results indicate an increase in iteration number with square root of the problem size. Sussman (2005) employed level based para llelism with adaptive mesh refinement using coupled level-set and volume of fluids method. They used additive Schwarz domain decomposition technique at each level, separately and hence the problems that may arise in domain decomposition technique due to unstructured grids were avoided. They used a parallel library developed by CSSE group at Lawrence Berkeley National Laboratories (Rendleman et al. 2000). Th ey used up to 16 processors, and their performance result indicated no improvement in computational time when they used more than 8 processors, for which they achieved a speedup of ~75%. 3.2 Partitioning The decision for choosing sub-problems for domain decomposition is referred to partitioning, which is a crucial ingredient of efficient parallel scientific computations. Once the decomposition is marked in the data, sub-problems are mapped onto a number of processors. This procedure is trivial fo r uniform Cartesian grids. The sub-domains could be arranged by dividing the number of cells to the corresponding number of available processors in different ways. Figure 3-9 illustrates some examples of partitioning for regular Cartesian grids. Figure 3-9. Examples of possible par titions on regular Cartesian grids. PAGE 54 39 When unstructured Cartesian grids are c onsidered, the linear system usually produces sparse coefficient matrix (Figure 3-10). In order to solve the system efficiently, we seek to decompose the problem in such a way that each proce ssor would have equal number of elements while keeping the sub-domain data as clustered as possible. For this purpose, partitioning algorithms use either computational mesh or the coefficient matrix to obtain such decompositions. The graph of the coefficient matrix may be used to reorder the rows and columns in order to obt ain a better clustered representation of the original system. This can also be obtained by ordering the elements of the computational mesh. Figure 3-10. Adaptively refined Cartesian grid. Following section highlights some of the most popular technique s that have been employed for scientific parallel computing on unstructured grids. 3.2.1 Partitioning Techniques Partitioning techniques are usually recognized in two categories: geometric and graph partitioners. Geometric techniques use the information on the location of the cells and they usually ignore the connectivity information. On the other hand, graph PAGE 55 40 partitioners take advantage of the connectivity information to work on the graph of the coefficient matrix. Since partitioning algorithms are used so as to reduce the computational cost, it needs to be completed fast and as space-efficient as possible. Being an NP-hard problem, it garnered interest from th e scientific community for exploring ways to find good strategies. There exist open source libraries which are devoted to solely partitioning problems. These include METIS (Karypis and Kumar 1998), JOSTLE (Walshaw et al. 2002), CHACO (Hendrickson and Leland 1995) and PARTY (Schamberger 2004). These packages follow the multilevel partitio ning method which is a graph partitioner. 3.2.1.1 Multilevel schemes Multilevel schemes consist of three phases : graph coarsening, initial partitioning, and multilevel refinement. The graph coarse ning phase involves constructing several graphs to find a related coarser graph. This graph is used to find another related coarser graph and this will be repeated until a su fficiently small graph is obtained. The partitioning problem is then solved for the sm allest graph and cells in higher levels are assigned to different partitions according to th eir lower levels. Then a local refinement is applied to further enhance the solution. This process usually leads to a high-quality partitioning of the original graph. Figure 3-11 illustrates this procedure. Figure 3-11. Coarsening procedure in mu ltilevel partitioning method. Computational nodes are colored circles and lines show the connectivity information. PAGE 56 41 Multilevel method is successfully used for partitioning unstructured grids with/without adaptive capabil ities (Lang and Wittum 2005; Park and Kwon 2004). In some applications, such as multiphase flows, the load weight of cells in the mesh may vary rapidly or unexpectedly. In addition, wh en an adaptive grid refinement is employed, the graph structure may change frequently. In these cases, load has to be rebalanced very quickly to avoid runtime bei ng dominated by the partitioni ng algorithm. Although multilevel methods produce high quality partitions, they execute slowly which makes it unsuitable for dynamic adaptive problems. On the other hand, another partitioning method, space filling curves, offers a relatively faster approach ( O ( N log N )) to balance meshes containing nodes with fluctuating computational load. 3.2.1.2 Space filling curves Space filling curves are special functions pa ssing through all the po ints in space in a compact way. Such curves provide 1D orde ring of points in 3D space that translates physical locality into memory locality. Figure 3-12 illustrates the most popular space filling curves: Hilbert, Lebesgue and Sierpinski. (a) Hilbert (b) Lebesgue (c) Sierpinski Figure 3-12. Popular space-fillin g curve structures in a 16x16 grid. Partitioning algorithms take advantage of such curves to obtain locality. Zumbusch (2003) showed that partitioning usin g space filling curves is quasi-optimal for PAGE 57 42 adaptively refined grids. The communication volume is bounded by 2/3CNP for 3D curves, where N is the number cells, P is the number of partitions and C is a constant dependent on the curve employed. Applications using space filling curves as partitioners include Aftosmis et al. (2003), Berger et al . (2003) and Bartholdi and Goldman (2001). 3.2.2. Quality Considerations When deciding on the partitioning algorith m, one needs to consider the following requirements for effective parallelizatio n: load balancing, communication overhead, convergence behavior, and computational time spent for partitioning. Figure 3-13 shows examples of possible partitions to illustrate the quality considerations. (a) (b) (c) Figure 3-13. Possible partitions. (a) Poor load balance. (b) Poor lo cality and disconnected sub-domains. (c) Good load ba lance and goof connectivity. 3.2.2.1 Load balancing The operation count on each cell is the same which indicates that each represents some fraction of the overall computational load of the simulation. Provided that cells have equal weight of computational load, distribution of equal number of cells on partitions is an indicator of good load balancing. Figure 3-13(a) shows a partition with poor load balancing as the number of cells in different sub-domains varies. Space filling curves offer partitioning that considers th e variations in work load. One can assign weights on cells that may also change du ring computations. Load balancing can be performed at any time without modifying the original space filling curve used. PAGE 58 43 3.2.2.2 Communication overhead Processors communicate with others after each local solve. In overlapping domain decomposition, the information to be exchan ged lies in the overlapping region. This indicates the communication volume which can be quantified by the ratio of the elementfaces on partition boundaries, known as edge-cut s, to the total elements in the partition. One of the objectives of the partitioning algo rithms is to minimize this ratio. In the illustrated example, partitioning quality is low in Figure 3-13(b) when compared to Figure 3-13(a) and Figure 3-13(c) as the communication volume is larger. Comparisons between multilevel algorithms and space-filli ng curves have been mostly focused on the communication volume (Hendrickson and Devi ne 2000; Schamberger 2004; Schloegel et al. 2004). Results indicate that multileve l methods determine partitions with lower communication volume especially for small num ber of partitions. In their study, they presented that the gap closes as the number of sub-domains increase. Space filling curves have better properties on orthogonal grid s and they are shown to perform even better when more cells are employed. Another consideration for communication overhead is the num ber of neighbors each sub-domain has. This correlates the number of message startups for each communication. Exchanging information with a number of neighboring processors often requires serialization of message setup, making this an important measure of scalability. 3.2.2.3 Convergence The convergence behavior of the linear sy stems of sub-domains is affected by cellordering algorithm. If the partition has unc onnected cells, the solution may not converge. Partitioning algorithms with locality propertie s reorder cells or the coefficient matrix to PAGE 59 44 obtain a clustered graph of the overall system. Figure 3-13(b) shows disconnected cells (14, 15, 26, etc) which would hamper the convergence of th e local solution. Table 3-3 illustrates the results of Olik er et al. (2000), in which the overall computational time for a linear system is compared. The system is solved using conjugate gradient (CG) with different orderi ng routines. As it can be seen from the table, self avoiding walk (a variation of SF Cs) is found to be three times faster than METIS while they both achieved above 75% scalability. Table 3-3. Overall computational time spen t to solve a linear system of 6.6x105 elements using CG with different orderings. P Original METIS Self avoiding walk (a variation of SFCs) 8 8.652 7.662 2.916 16 5.093 2.909 1.491 32 3.167 1.468 0.795 64 1.929 0.961 0.462 3.2.2.4 Computational time Scientific simulations with adaptive mesh refinement capabilities require repartitioning whenever grid structure is modified. Additional computation time resulted by partitioning algorithm may hamper the advant ages of parallel computing. In order to prevent such computational inefficiency, a very fast partitioning al gorithm is required. SFCs are very fast with an operation count of O( N log N ) (Pilkington and Baden 1996), whereas multilevel methods are known to have computational cost (Teresco et al. 2005). 3.3 Summary In this chapter, an overview on multi-domain techniques is presented for scientific computing and for numerical two-phase fl ow simulations. Multi-domain techniques provide a strategy for solving large linear sy stems in multiple domains. When practiced with parallel computing, multi-domain techni ques help developing scalable algorithms PAGE 60 45 that would utilize the computational resources economically. Key components of multidomain technique include domain decomposition methods and partitioning algorithms. Domain decomposition methods contain the essence of â€œdivide-and-conquerâ€, which refers to dividing a large problem into smaller pieces, each of which then solved independently before being combined to obt ain the global solution of the original problem. Among many techniques, domain de composition is categorized into two major groups, Schwarz and Schur complement method s. Since there are concerns related to relative difficulties with Schur complement methods, especially when two-phase flow computations are considered, present study focuses on Schwarz methods. It is known that the classical Schwarz methods are slow, especially when the number of sub-domains is large. Fortunate ly, it is possible to improve the classical Schwarz algorithm via multilevel methods, wh ich can eliminate the convergence rate dependence on the overlap region. Other factors, such as variations in stiffness matrix for unstructured systems are discussed in the co ntext of two-phase flow computations. Using domain decomposition methods on adaptively refined grids requires development of a strategy for partitioning, which is known to be NP-hard problem. This study particularly employs space filling curves for partitioning due to its capability of producing good quality partitions at a low computational cost. PAGE 61 46 CHAPTER 4 MULTIDIMENSIONAL IMMERSED BOUNDARY COMPUTATION 4.1 Immersed Boundary Method Immersed boundary method (IBM) employs Eu lerian and Lagrangian variables in order to perform the interfacial flow computations. Eulerian quantities are solved on the stationary background grid, whereas Lagrangi an quantities arise due to the marker points defined on the interface which can move fr eely. Immersed boundary methodâ€™s strength lies in its ability to commun icate between these quantities, which are connected through the Dirac delta function. IBM solves a single fluid formulation for all the constituents by smearing the properties across the interface. Incompre ssible Navier-Stokes equations for mass and momentum conservation are given in Equations (4-1) and (4-2) respectively, which also accounts for the interfacial dynamics. Th e source term in the momentum equation ( Fs) arises from the surface tension ( ) and the curvature ( ) as shown in Equation (4-3) (Brackbill et al. 1992). 0 u (4-1) T su uupuuFg t (4-2) xXs SFndS (4-3) The surface force in Equation (4-3) is computed using the Lagrangian marker points, X , and is translated into an Eulerian quantity, x , via the approximate discrete PAGE 62 47 Dirac delta function, xX. After these equations are solved, approximate Dirac delta function is used again for obtaining the marker velocity field to move marker points for obtaining the new geometric surface representation. Interpolate marker velocities and move markers Conserve p hase volume Compute indicator function and get fluid properties. Compute surface tension and compute Fs. Solve for velocity and pressure. Single set of equations for all Redistribute markers for desired i Need to visit surface quality? Topological change? Reconstruct Interface based on indicator Scale interface to conserve volume. BEGIN YES NO NO YES Figure 4-1. Flowchart for immersed boundary method for the computations of two-phase flows using marker points with capabili ties of handling topological changes. Figure 4-1 illustrates the overall process of immersed boundary method. As the interface evolves, markers are added or de leted to maintain in terface resolution. PAGE 63 48 Capabilities of handling topological changes are included in this algorithm utilizing levelcontour based reconstruction algorithms. Key components of the pr esent IBM algorithm are: Interface tracking via marker points Conservative reconstruction Interface construction for topological changes Solid boundary treatment The interaction between the interface and the computational boundaries Connection between Eulerian and Lagrangian variables The following sections explain the details of the numerical implementation of immersed boundary method. 4.1.1 Interface Tracking Fundamental goal of interface tracking is to construct a geometric representation for the surface separating the constituents. Fo r this purpose, IBM uses marker points which are defined on the interface. Two-di mensional interfaces are represented by line segments whereas three-dimensional interfaces are represented by a triangulated surface grid as shown in Figure 2.2(a)-(b) where the nodes are interface markers. () (a) (b) Figure 4-2. Interface represen tation by marker points. (a) Curve by line segments in 2D. (b) Surface with triangu lar elements in 3D. The elements of the interface, either representing a line segment or a triangulated surface, are represented using the same data structure, which is given in Figure 4-3. The nodes within an element are arranged in a way that the yielding normal vector PAGE 64 49 computation points outward direction from th e element. When the computations are held in two-dimensional coordinates, z-coordinate is set to a constant value and the last node in the element list is set to a negative valu e to make sure that it is not included accidentally during the computations. L1 N1 N2 N5 N7 N4 12 12 tt n tt t1 t2 t3 y x t1 t2 12 12 tt n tt N1 N2 Imaginary unit depth Figure 4-3. Data structure to repres ents the interface bo th in 2D and 3D. 4.1.2 Conservative Reconstruction The quality of surface repr esentation depends on the way that the markers are placed on the surface. When the distance betw een any two marker points is too large or too small, there is a danger that unphysical undulations occur on the interface data. Such effects can be avoided by continuously addi ng or deleting marker points to maintain appropriate marker spacing. Marker addition and deletion may cause a loss/gain in the volume enclosed by the interface. Conservation pr operties can be enforced by modifying the addition and deletion procedure to account for th e defects locally (Singh and Shyy 2006). Figure 4-4 illustrates this procedure in two-dimensions for marker deletion. The volume defect due PAGE 65 50 to the deletion of markers is recovered by m oving the newly placed marker in the normal direction to match the original volume. Area N1 N4 N2 N2 new N3 Ref N1 N4 N3 N2 Ref N1 N4 N2 new Area n (a) (b) (c) Figure 4-4. Marker deletion procedure for th e old interface to form the new interface. (a) Non-conservative marker deletion, (b) Intermediate step for conservation that computes a reference area. (c) Conservation via moved marker. 4.1.3 Interface Construction for Topology Changes The interaction of two or more bubbles may require a change in the interface topology. When two interfaces come within a certain distance, th ey merge to form a single entity. The criterion to trigger a topol ogical change is based on the distance between the two interfaces coming close to each other because the smearing effect of each starts to interfere the other. In such cases, marker poin ts are required to go through an extensive rearrangement process, which can be very challenging since the connectivity information needs to be preserved. Instead, th e indicator function, which is related to the Dirac delta function, is used to reconstruct the interface. Figure 4-5 illustrates an indicator function which varies from zero to one to identify constituents. When reconstruction is requested, the indicator function is used to redraw the interface at the location where the function value equals to half. Shin and Juric (2002) suggested this idea and applied it by removi ng all the marker points and replacing the interface data with the new one. The pres ent work tracks the interface with its PAGE 66 51 connectivity information. Unlike the conn ectivity free method introduced by Shin and Juric (2002), reconstruction is invoked only when the topological change is required. I =0 I =0 I =1 Figure 4-5. Indicator function varies from zero to one to identify the constituents lying inside or outside the interface. The topological change decision is made using a probe from the interface in the positive and negative normal direction. If the probe detects that both these indicator values lie in the same range (both are less or greater than half), the algorithm recognizes that either two interfaces had become clos er or the interface b ecame thinner. This triggers a reconstruction procedure which can re sult in an interface break-up or merger cases. Reconstruction is performe d by marking the cell corners and face-centers with a flag that is assembled using th e indicator values. These flags in each of the relevant cells are scanned to create new marker points, whic h are placed on the relevant faces where the indicator function equals to half. The connect ivity of marker points are determined based on pre-identified element construction scenario s. Some of these cases are illustrated in Figure 4-6. The new elements may not immediately be suitable for the data structure, i.e., polygons instead of triangular elements ma y form in 3D. These are broken into PAGE 67 52 proper/acceptable elements once it is formed. The details of 3D reconstruction can be found in Singh and Shyy (2005). 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 1 0 1 1 1 0 0 0 1 1 0 0 1 0 1 1 0 1 1 1 Figure 4-6. Reconstructi on process in 2D and 3D. 4.1.4 Modeling Complex Solid Boundaries One of the biggest challenge s in computations of flui d flows with complex solid boundaries is to preserve the quality of th e grid when boundary conforming methods are used. On the other hand, using non-boundary co nforming strategies can help us to tackle such challenges. Such methods are easier to generalize, computa tionally efficient and can be used along with Cartesian grids. The elastic/rigid solid bodies can be defi ned using marker points when Cartesian grids are used. Implementation details of this approach are quite similar to the algorithm by Yang and Balaras (2006). This method employs given boundary conditions on an arbitrary interface by reconstruc ting a velocity field at cell s near the interface. This process doesnâ€™t involve smearing and can be seen as a sharp interface method. The present study employs the modeling ap proach where cells around a solid body are marked as forcing, pseudo-fluid and solid cells. Forcing points are used to define the imposed boundary conditions on the flow fiel d; pseudo-fluid points are required in case PAGE 68 53 the solid body can freely move and solid points are turned off for velocity and pressure field computations. As the flow field can have multiple fluid components and complex solid bodies, a function of I smoothly varying from zero to two is formed. Certain ranges are marked to indicate if the cell lies in the vicinity of a different constituent than the bulk material. This is illustrated in Figure 4-7 with the ranges us ed in the computations. Solid I = 2 Fluid I I = 0 Fluid II I = 1 Figure 4-7. Indicator function va rying smoothly from zero to two. The velocity field is constructed at the forcing points to yield the boundary condition defined on the solid boundary. Forcing points are defined in the fluid phase, which have at least one neighboring point insi de the solid interface (a pseudo-fluid point). The velocity components are defi ned at the faces of a cell. Figure 4-8 shows a representation for flagging x-oriented faces as liquid, solid, forcing and pseudo-liquid faces. In Figure 4-8, the triangle shaded areas represent the point s considered for reconstruction of forcing point velocities. The construction invol ves an interpolation schema that uses one point on the interface (which does not necessarily have to be a marker point), two points from the liqui d faces. Among the interpolation schemas suggested by Young and Balaras (2006) and Gilmanov and Sotiropoulos (2005), present study adopts the former one. These schema s mainly differ by the points involved. PAGE 69 54 Young and Balaras (2006) schema gives a more compact scheme, represented by triangular area marked with A in Figure 4-8 than the one by Gilmanov and Sotiropoulos (2005). Figure 4-8. Definition of faces arou nd the solid interface for u-velocity. The interpolation procedure is perform ed assuming a linear variation of any variable . Equations (4-4) and (4-5) is the formulation of the procedure in 2D. 123bbxby (4-4) 1 1111 2222 33331 1 1 bxy bxy bxy (4-5) In Equations (4-4) and (4-5), xi, yi represents the corners of the triangle presented in Figure 4-8. For stationary objects, the coeffici ents can be obtained once and can then be used for reconstructing the velocity field at each time step. On the other hand, the system has to be solved at every time step for m oving boundaries. 3D com putations are achieved in a similar manner by adding an additional point to obtain the coefficient of the z coordinate, b4. PAGE 70 55 4.1.5 Interactions between Interface and Computational Domain Boundary When we consider the case of a drop on a solid surface, the inte raction of all three phases, fluid-fluid-solid, becomes very challe nging. During the im pact, the drop starts spreading on the surface forming a thin film and if it doesnâ€™t break up, it would recover its thickness back, which is known as recoiling. This behavior is found to be influenced by the following parameters. Drop inertia at the time of impact in fluences the spread characteristics Surface tension determines recoil frequency Drop viscosity damps out the drop sp reading and recoiling processes Contact angle affects both drop sp reading and recoiling processes Figure 4-9 illustrates the sn apshot of a bubble after the impact. The contact line is illustrated along with the angle, . Red box illustrates the criti cal part of the problem for numerical simulations, which needs to be handled using a proper model. Vapor Liquid Interface Contact Line Solid g U d D H h Ts Figure 4-9. Snapshot of the geometry of the interface an instan ce after the impact. The interface data is required to be modified to handle the interaction between the solid wall and the interf ace at the contact point. Interfaces use a probe similar to the one introduced for the topological ch anges to identify whether th ey are close to any of the PAGE 71 56 computational boundaries. In case any of its elements is in the vicinity of the boundary, it snaps itself to the computational boundary . The connectivity information for the interface is edited to include a negative num ber when the element is connected to a computational domain boundary. The negative nu mber is chosen in a way to represent the east, west, north, south, front and back boundaries. The proper model then can be applied if the computational boundary is define d as solid, symmetric, inflow or outflow. For simplicity, the wall condition utilizes a constant contact angle model, which lets the marker slip on the no-slip surface de pending on the first marker inside the flow domain based on a predeter mined contact angle. For symmetric boundary conditions, the mark er on the computational domain sets its normal in the direction of the symmetry axis. This is achieved by an imaginary element on the other side of the computati onal domain. The inflow condition sets its contact angle according to the dire ction that the flow comes in. 4.1.6 Connection between Eulerian and Lagrangian variables The equations of motion are solved on the Eulerian grid including the surface tension forces which are computed on the Lagrangian marker points. As shown by Equation (4-3), transferring quantitie s from one to another involves Dirac delta function (denoted by ). This is illustrated via Equation (4-6) which relates the surface forces computed on the Lagrangian ( X ) surface to Eulerian ( x ) grid. (x)(X)xXs f fdS, Lagrangian to Eulerian (4-6) The force due to the surface is computed on the line segments or the triangular elements using its tangent and normal vectors. This procedure to compute the surface force is relatively simpler than extracting the curvature from the marker points. PAGE 72 57 Moreover, it produces good results due to its intrinsic nature of smoothing dilations (AlRawahi and Tryggvason 2004; Shin and Juric 2002). In addition to the surface force, veloc ity field is the second quantity to be computed. It is first calculated at the Eu lerian grid and then interpolated on the Lagrangian marker points using Equation (4-7). u(X)uxxXnn h vdv (4-7) The new marker locations are then computed using Equation (4-8). X uXnt (4-8) The material properties, i.e., density and viscosity, are discontinuous across the interface. The variation needs to be smoothe d in order to facilitate a single fluid formulation. An indicator function, I , that varies smoothly from one (outside the interface) to zero (inside the interface) is employed for this purpose. The indicator function is computed on the background grid using the Dirac delta function which is given by Equation (4-9). Once the indicator function is computed, the fluid properties varying from 1 to 2 are computed using Equation (4-10). 2 xXs I xndS (4-9) 212 I (4-10) Equations (4-7)-(4-10) implies that the Dirac func tion is actively involved in the procedure and directly affects the numerical fo rmulation. The presented formulations are exact in the continuous form. The discrete form require s a smoothed approximation to Dirac delta function. This â€œapproximationâ€ can be acquired in various ways, but the PAGE 73 58 quantity needs to be conserved during the transformation process. Equation (4-11) ensures that mass and momentum are cons erved, while energy conservation can be imposed by Equation (4-12) (Peskin 2003). 3xX1h sh, for all X (4-11) 3xXxX0h sh, for all X (4-12) In Equations (4-11) and (4-12), h is an approximated delta function which is scaled with grid spacing h . There are several Dirac delta function approximations that satisfy these fundamental requirements. Am ong them, some have enhanced properties due to the consideration of additional constr aints on the construction of the delta function (Peskin 2003; Tornberg and Engquist 2004). A one-dimensional representa tion of delta function ( ) is used in the following formulations to illustrate the main ideas of approximations. A three-dimensional quantity h can be obtained from either through the di stance function (Equation (4-13)) or through the product of three one -dimensional quantities of (Equation (4-14)). 1 ()hdr h, where xXr h (4-13) 123 31 ()h x rrr h, where 3 12 123,, x xx rrr hhh (4-14) Equation (4-13) has been used with level-set methods because the shortest distance from any cell to the su rface is readily available. For other met hods, it is computationally convenient to use Equation (4-14) instead. Unbounded functions require all of the Lagrangian points to interact with all the Eulerian points, even though th ere wonâ€™t be any significant contribution from the far PAGE 74 59 fields. Therefore, from the computational poi nt of view, it is desi rable to have a bounded region for the approximate Dirac delta function (i.e., rm ). The additional constraints on the constr uction procedure can be developed by applying Taylor expansion for f(x) in Equation (4-6) which yields Equation (4-15). (1) (2)() 2f()ff ff 2!jj p p jjrrjrrjrjr rr jrjrjrjr p (4-15) Equation (4-15) leads to the definition of moments (p M ) of the order p . The equalities for moment conditions are obtained in Equation (4-16) by matching the left hand side terms with the right hand side terms. 01 2 21, 0, 0, 0jj p p jjMjrMjrjr MjrjrMjrjr (4-16) The bounded discrete form of Dirac delta function can only satisfy a certain number of moment conditions. Tornberg and Engqui st (2004) indicate that the higher power moment conditions are satisfied, th e more accurate the approximation gets. When carefully examined, zeroth moment, 0 M , is actually the condition required for mass and momentum conservation, whic h is previously st ated by Equation (4-11). In addition, First moment condition, 1 M , resembles the energy conservation of Equation (4-12). The second moment condition, 2 M , on the other hand, indi cates that the mean square thickness of the bounda ry is zero (Peskin 2003). The consideration of all the moments is sufficient for the total quantity but the discrete operators on the Eulerian grid may not see their effects locally . For instance, if a PAGE 75 60 central difference operator is defined in a twocell width stencil, there is a possibility of odd and even decoupling. This is mainly because the distribution does not impose any constraint on even or odd cells (Figure 4-10). -3 -2 -1 0 1 2 3 -0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 r M0 Sum of Odd Cells Sum of Even Cells Sum of both Figure 4-10. Odd/even modes of the first mo ment for piecewise cubic function produces a checker-board field. In order to overcome this problem, Pesk in (2003) suggests a stronger condition on 0 M that would ensure that the even and odd ce lls have equal weight in the distribution with Equation (4-17). 2 1 even j odd jj r j r (4-17) Transferring the approximated Eulerian quantity in Equation (4-15) back to the Lagrangian framework is expect ed to yield the original function. Similar analysis is established by observing the function recovering its original value in the reverse transfer direction. If we apply the Ta ylor series expansion on Equation (4-15): 1f()ffp p krpr j rjrkrMrjr (4-18) PAGE 76 61 In the ideal situation, the second term needs to be zero. On the other hand, the first term on the right-hand side can be manipulated to yield Equation (4-19). This is done by relating the point k with j . ()r j kjrkr (4-19) This condition implies translation-invari ance, which can not be satisfied for functions that are bounded (Peskin 2003). Fo r such functions, one can impose a weaker form of the equation by setting j equal to k yielding the Equation (4-20). 20jrjC (4-20) According to Peskin (2003), the ultimate goal for utilizing approximate delta functions is to hide the effect of the Eule rian grid as much as possible. Among the approximate delta function examples presented in Table 4-1, the ones that satisfy more moment conditions are expected to be less in fluenced by the numerical effect introduced by the approximation. On the other hand, satisfying higher order moments requires sacrifices from other desired properties. Table 4-2 gives a comparison of the pr operties of the delta approximations presented in Table 4-1. Even though approximation for C (piecewise cubic function) satisfies higher order moments; the odd-even coupling is not forced by the first moment condition. This limits the usage of central differencing operators wh ich primarily utilize the two-cell width stencil of co llocated arrangements. In such arrangements, the pressure and the velocity fields are defined at the sa me location on a given grid. On the other hand, B , introduced by Peskin (2003), consider s the odd-even decoupling but satisfies only M0 and M1. PAGE 77 62 Table 4-1. Examples of delta function bounded by two cell width. 1 1cos,2 () 42 0,otherwiseAr r r -3 -2 -1 0 1 2 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 r(4-21) 2 21 527124,12 8 1 ()32144,01 8 0,otherwiseBrrrr rrrrr -3 -2 -1 0 1 2 3 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 r(4-22) 23 23111 1,12 66 11 ()1,01 22 0,otherwiseCrrrr rrrrr -3 -2 -1 0 1 2 3 -0.2 -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 r(4-23) Consistent usage of discrete divergen ce and gradient operator via momentum interpolation allows collocat ed arrangements to eliminate the possibility of odd-even decoupling. However, when large gradient s of pressure field may still result in physically unrealistic solutions where the constructed cell -face velocities do not lie between the neighboring cell-cen tered velocities (Miller a nd Schmidt 1988). Parallel observations are reported by Gu (1991) and C hoi et al. (2003) for the cases where the body source term is very large. Similar situations may also arise for the immersed boundary method, which represents in terfacial conditions as body forces. PAGE 78 63 Table 4-2. Comparison of various delta functions. ()Ar ()Br ()Cr Continuous function Yes Yes Yes Bounded region 2 cell-width 2 cell-width 2 cell-width Odd-even coupling on M0 Yes Yes Oscillates btw 0.0-1.0 pth order moment 01M 10.02M 20.50.55M 01M 10M 20.50.55M 01M10M 20M30M 40.21.0M 2jrjC C = 0.375 C=0.375 Oscillates btw 0.6-1.0 Figure 4-11 illustrates a possible unrealistic field that has been produced by one of the delta-functions. On the other had, stagge red grid arrangements, in which the velocity field is defined at locations where the pre ssure gradient is de fined, avoid odd-even decoupling naturally without introducing artificial terms in the continuity equation. -3 -2 -1 0 1 2 3 4 5 -0.1 0 0.1 0.2 0.3 0.4 0.5 igrad(P) Original Discrete Cell-Faces Cell-center Face value do not lie between cell velocities Figure 4-11. Gradients computed using central differencing operators for cell-center and cell-faces for a rapidly varying pressure field. The illustration is computed in 1D using an analytical repres entation for pressure field. The present study employs the delta f unction proposed by Peskin (2003) (B in Table 4-1) on staggered grid arrangements in order to avoid any possibilities of odd-even decoupling. PAGE 79 64 4.2 Adaptive Cartesian Grid Based Flow Computations Various approaches have been successfu lly developed for the Navier-Stokes flow computations (Issa 1986; Kim and Moin 1985; Patankar and Spalding 1972). The present study focuses on flow problem s at the incompressible limit. Therefore, the compressibility effect caused by the Mach num ber is not considered. For fluid flow computations, the fractional step method or iginally developed by Chorin (1968) is employed because it offers efficient soluti ons to the Navier-Stokes equations for incompressible flows (Equations (4-1) and (4-2)). The fractional step method decouples th e pressure field by integrating an incomplete form of the momentum equations. This yields an approximate velocity field, *u, which is usually not divergence free. Equation (4-24) gives the semi-discrete form of Equation (4-2) for a time step size of t . *uu1 uu fntRe (4-24) The pressure field is solved using the c ontinuity equation. This equation can be obtained by taking the diverg ence of the momentum equati on while considering solely pressure term. Equation (4-25) indicates the computati on of the continuity equation. 1*1uuunnp t *u p t (4-25) The intermediate velocity field is corrected using the pressure gradient to make the final velocity field divergence free (Equation (4-26)). 1*uuntp (4-26) When applying the fractional step method, one needs to solve a sequence of decoupled equations for the velo city and the pressure at each time step. This makes the PAGE 80 65 large scale unsteady numerical simulations of incompressible flows very efficient. For this reason, the fractional step method is very suitable for immersed boundary method. 4.2.1 Grid Adaptation and Data Structure Using a single set of equations for all of the constituents brings rapid variations to flow variables and material properties ar ound the interface. Resolving the whole domain to the desired length scale may result in unne cessary resolutions at regions far from the interface. This would require an enormous amount of memory and computational power especially for large-scale simulations. One remedy is to employ adaptive grid refinement, which offers a way to locally adjust the grid sizes to the desired length scale. Adaptation is performed on an unstructured Cartesian grid similar to the works of Aftosmis (1997) and Ham et al. (2002). (a) (b) (c) Figure 4-12. Steps of interface location based adaptation. (a) Initial 10x10 base grid. (b) Grid after two level of adaptation. (c) Grid after four levels of adaptation. The grid is initialized in a uniform stru cture which has prescribed number of cells in each coordinate direction. The cell that needs to be refined is split into two in each direction yielding four or eight equal sized sma ller cells in 2D or 3D respectively. This procedure brings additional levels of grid in which new coordinates in computational space is assigned to the level that cells reside . To ensure a smooth variation in grid cellsize for quality and simplicity, cells sharing a fa ce are not allowed to differ by more than PAGE 81 66 one level of refinement. The adaptation is triggered by two different mechanisms. These are interface location and flow solution based adaptation. Interface based adaptation refines the cells around the interface to the finest level. As the approximate delta functions usually operate on uniform mesh, cells within the vicinity of the interface are always kept at this resolution. The typical width of this region contains 8 to 10 cells ac ross the interface so that the accuracy is maintained even when the interface moves before adaptation is requested. This procedure is illustrated in Figure 4-12. Solution based grid adaptation method is de signed to handle larg e gradients in the flow field for the cells away from the inte rface. The current implementation considers velocity-curl to activate the adaptation proce ss. Implementation deta ils can also be found in Singh et al. (2005). 4.2.2 Staggered Grid Arrangement Grid arrangements have an influence on the features of the numerical algorithm because spatial discretization of the Navier-S tokes equations involves the discrete form of divergence and gradient operators. Correct velocity field is di vergence free only when the discrete Laplace operator in the pressure Poisson equation is defined in a consistent and conservative manner with the discrete gradient operator in Equation(4-25), i.e., the product of discrete divergence and discre te gradient operato rs (Tafti 1995). Staggered arrangements offer a natura l way to prevent the checker-board oscillations that may happen when using co llocated arrangements (Ferziger and Peric 2002). This is achieved by shifting the cont rol volumes for momentum equation where the pressure gradient is de fined. The schematic in Figure 4-13 illustrates the staggered variable arrangement on a uniform grid. PAGE 82 67 x y v u P, , Figure 4-13. Placement of velocity field and corresponding control volumes on a staggered variable arrangement. Pressure , density and viscosity are defined at the cell centers, whereas components of ve locity are shifted on the cell-faces. The implementation of staggered grid invo lves construction of separate control volumes for each velocity component. These control volumes is important to ensure the local and global conservation for mass and mo mentum. This is natural in uniform Cartesian grids when the control volumes are constructed as in the schematic given in Figure 4-13. However, there exists some freedom in the choice of defining the momentum control volumes when the grid is non-uniform. ue2 up uw Pp Pe2 Pe1 up ue1 face1 face2 Pe xe x p (a) (b) Figure 4-14. Schematic for the staggered cont rol volume. (a) Location where grid levels change. (b) Sum of all c ontrol volumes gives a gl obal volume ensuring the global conservation. One possible arrangement, proposed orig inally by Losasso et al. (2004a; 2004b), employs equal velocities on fine faces when there is a level difference between the cells. Figure 4-14 illustrates an example of this case. PAGE 83 68 Because of the equality defined at the fine face, sum of the control volumes yields a bigger control volume that would require only one pressure gradie nt defined at both faces. The gradient is computed using Equation (4-27). 12122 2epeep pe faceface pePPPPP PP xx x xxx (4-27) 4.2.3 Discretization of Governing Equations Finite volume method is employed to obtain the discrete equations of flow motion. Following the steps of the projectio n method as given in Equations (4-24) and (4-25), the discrete equations are derived in the fin ite-volume form for the advection-diffusion equation (Equation (4-28)) and pressure Poisson equation (Equation (4-29)). CVCSCSudVuundAundAF t (4-28) CSCSp undAndA t (4-29) where u is the velocity field, p is the pressure field, is the density, is the viscosity and F is the body forces including the interfac ial forces of immersed boundary method. The unit normal direction of the surface is represented by n. 5544 13 12 54 7788 16 781 222 1 2N E S WAA U U AA AA U U AA (4-30) Convection term in the advection-diffusi on equation can be represented in the discrete form as in Equation (4-31). faces CSuundAUA (4-31) PAGE 84 69 where is the velocity component, U is the velocity normal to the control volume faces and A is the area of the corresponding face. The normal velocities on the control volume faces are computed appropriate ly to maintain the divergence free condition within each control volume. The following normal face computations are performed for Figure 4-15 in order to illustrate th e construction process. UW 4 1 1 2 3 4 8 7 6 5 UE US UN Figure 4-15. Construction of the normal ve locities on the control volume for the convection term. Control volumes that lead to global conservation need to address how the fluxes will be distributed at the locations where the grid level changes. The common faces between the large cell and the small cells are indicated by a capital letter. The fluxes are always computed from the side of a sma ller cell and the corresponding portion is added as a contribution to the large cellâ€™s cont rol volume. These fluxes are marked as f1, f2, and f3 in Figure 4-16. These fluxe s are counted for the small face s, AC, CD and DF as they are. On the other hand, the larger control volu meâ€™s face is recognized as BE instead of AF. This is handled by e qually splitting the fluxes f1 and f3 into two for the missing parts AB and EF. The viscous terms require the gradient of the velocity term at the control volume faces. They are computed in a similar way to the normal velocities in convection terms by PAGE 85 70 replacing the interpolation operator with the differencing operator, which is defined in Equation (4-32). f1 f3 f2 f2 f3/2 f3/2 f3/2 f3/2 A B C D E F Figure 4-16. Flux distribution at the control volumes of Tnodes that leads to global conservation. 12 xx , where x is the distance between 1 and 2 (4-32) The viscous terms around the interface include the mixed derivative terms Tu for incompressible flows due to smoot hly varying viscosity of different constituents. The treatment far from interface accounts only for Tu where the viscosity is constant. The temporal discretization for the conv ection terms employs second order RungeKutta methods, whereas the diffusion terms us e Crank-Nicolson. The overall discrete form of the governing equations of fr actional method is given in Equation (4-35)-(4-35). * * *1 2n n n x facesfacesfacesuu VuUAuAuAfV t (4-33) PAGE 86 71 * * *1 2n n n y facesfacesfacesvv VvUAvAvAfV t (4-34) * * * b ody forces convection diffusion1 2n n n z facesfacesfaces predictionww VwUAwAwAfV t (4-35) The solution to Equation (4-35) yields the predicted velocity field which doesnâ€™t satisfy the velocity divergence condition. Th e asterisk in the conv ection equation denotes the Runge-Kutta treatment for convection term s. The body forces include gravitational forces as well as inte rfacial forces in volumetric form. These terms, convection and body forces, the part of the diffu sion term that comes from the nth time step are summed up together to form a linear system as presented in Equation (4-36). * * vVa S t (4-36) where * represents the predicted velocity field that includes all components, va is the coefficient related with the viscous terms, and S includes all other terms that are known at time n. The continuity equation is forced through the pressure Poisson equation, which is presented in discrete form in Equation (4-37). *1facesfacesp A uA t (4-37) The control volumes for the pressure Poi sson equation is the actual cell volume. The predicted face velocities ar e already solved at the faces of the co ntrol volume. The gradient of pressure is constructed following the procedure illustrated in Figure 4-14. Then the systems of equations can be represented as in Equation (4-38). PAGE 87 72 (a) (b) Figure 4-17. Graphs for two di fferent cell-orderings. (a) Using Jordan curve. (b) Using Peano-Hilbert curve. Apb (4-38) The coefficient matrix A is sparse and symmetric in nature but it includes strong variations in the coefficients because of the density variations of different constituents. A compact AIJ format is adopted for the storag e of the coefficient ma trix after adopting a cell-ordering routine which aids the conve rgence rate of the iterative solver. Figure 4-17 shows two different orderings to illustrate the structure of the same coefficient matrix. The right hand side of Equation (4-38) also shows strong variations because of the local effect due to the interfacial forces. Table 4-3. Original conjugate gradient with preconditioning M. while (convergence) { 1 1 1 1 1 1 1 1 1,/, check convergence; ,, 1i iiili iiii iiili i i ii iii i iiiii rrpAp xxp rrAp rMr rrrr prp } PAGE 88 73 The nature of the pressure Poisson equation requires us to use an efficient iterative solver which can handle the structures pres ent in the coefficient matrix. Conjugate gradient, a Krylov subspace solver, is empl oyed to reduce the computational cost. The original algorithm is given in Table 4-3. After solving for the pressure field, the predicted velocity field is corrected using Equation (4-26). 4.2.4 Axisymmetric Computations The axisymmetric computations consider each control-volume as a cylindrical volumetric element. An array that holds the distance between each face of the control volume and the axisymmetric line is constructed whenever the grid is created or adapted. This variable, which is initialized with unity for the planar case, is used to modify the area of the each control volume during flux computations with following relations illustrated in Figure 4-18. Ae As r z Figure 4-18. Volumetric element consid ered for axisymmetric computations. eeAry , wwAry , nnArx , ssArx (4-39) ewcellcenterrry , nr0.5err , sr0.5err (4.40) The computations of the surface tension forces are also modified with the additional primary axis present in axisymmetric computations. This modification applies to curvature computations as in Equation (4-41). PAGE 89 74 planaraxisymm, r axisymmn r (4-41) where rn is the radial component of the normal and r is the distance from the axisymmetric axis. 4.3 Summary This chapter explicated the adaptive Ca rtesian grid implementation for immersed boundary computations. Overall algorithm is explained in two major sections, the immersed boundary method and the solution pr ocedure for the flow field. The key aspects of these components are listed as: Immersed boundary method o Interface tracking Conservative reconstruction Interface constructi on for topology changes Interactions between interf ace and computational boundary o Modeling interfacial conditions Link between Eulerian and Lagrangian variables Modeling complex solid boundaries Adaptive Cartesian grid ba sed flow computations o Grid adaptation and data structure o Staggered grid arrangement o Fractional step method o Axisymmetric computations The components of the immersed boundary method include management of interface markers via interface tracking and mo deling the interfacial conditions with the help of the discrete delta function. Using unstructured meshes (triangles in 3D and line segments in 2D) for interface representation simplifies the issues related w ith data management. Elements formed by nodes provide connectivity information which can be modified by simply adding/removing marker points on the interface. This idea is used extensively to manage the surface quality, while being careful by avoiding undesired volume errors via the PAGE 90 75 conservative reconstruction algorithm. In addition, this allows defining multiple interfaces that are not connect ed, i.e. multiple droplets. When interfaces go through a change in its topology during a merger or a breakup, interfacial data can be partially modified with the help of a level-contour. Furthermore, in case any structure on an interface approaches a comput ational boundary, the closest marker snaps itself on the computational boundary and inherits its predef ined condition, which can be slip, no-slip, inlet, and outlet. For instan ce, if the correspondi ng boundary is defined as a wall (no-slip condition), the snapped marker de fines a contact line problem. In the case of the liquid plug problem, studied in Chapter 6, interface is defined at the inlet and the symmetry line. A special treatment for the inlet condition is defined because the intersection point, which may move with the flow conditions, can re define the boundary condition for the flow field. Interfacial conditions can have two separate forms wh ether the interface represents a solid boundary or a fluid boundary. Solid boun dary treatment allows us to handle twophase flow simulations in complex geometries. Such capability is crucial for Cartesian grid formulations. The present methodology involves enforcing velo city and pressure conditions of the solid interface into the flow field, which makes the treatment a sharp interface method. When fluid boundaries ar e considered, solution field employs a discrete form of Dirac delta function to tr anslate Eulerian variables to Lagrangian quantities and vice versa. In this process, transport properties are smoothened across the interface, which a characteristic of conti nuous interface methods. Formation of various approximate delta functions are examined fo r their applicability/compatibility with the adaptive grid based flow computations. PAGE 91 76 Flow computations are carried on an ad aptively refined Cartesian grid. Data structure and adaptation criteria are illustra ted. Staggered grid layout of Harlow and Welch (1965) is adopted on isotropically refi ned Cartesian grids. Spatial and temporal discrete equations are formulated for 2D , 3D and axisymmetric computations. PAGE 92 77 CHAPTER 5 ADAPTIVE COMPUTATIONS IN MULTIPLE DOMAINS Multi-domain implementation employs the following components: Partitioning Algorithm Domain decomposition for c onvection-diffusion equation Domain decomposition for pr essure Poisson equation The following sections describe the key components of each algorithm. 5.1 Partitioning Algorithm Partitioning algorithm divides the computational space into a number of subspaces, which will be assigned to a separate computational unit. Following strategies can be used: Cell-ordering based partitioning that utilizes a special curve Optimization techniques to find the mi nimum communication/computation ratio 1 2 3 (a) (b) Figure 5-1. Partitioning based on curves. (a) Jordan curve. (b) Final partitions for three sub-domains. METIS is an open source package that pr ovides a solution to the partitioning problem by using optimization techniques. It is included in the development as an option to the partitioning algorithm. However, due to METISâ€™s computa tional cost, this study PAGE 93 78 mainly utilizes a cell orderi ng based partitioning algorith m. This is because, cellordering is not only required for the purpose of the partitioning algorit hm, but also one of its merits is to enhance the convergence rate of the solvers, esp ecially for relaxation based solvers. Initially, Jordan curve had been used in the sequential solver for cell-ordering purposes. Jordan curve basically collects a ll the neighbori ng cells for a given cell and keeps collecting further the neighboring cells of those collected. This procedure results in an ordering that follows a diagonal base d path throughout the computational domain. Figure 5-1 illustrates the path that it divi des the computational domain into a number of partitions by relating the computational load to the number of cells. Although Jordan curve has the ability to improve the convergence rate of the solver, it doesnâ€™t serve as an effective partitioning tool. This can be observed from Figure 5-1 as the partitions might have unconnected cells depe nding on the choice of the division point for the Jordan curve. Such problems can be handled by introducing additional algorithms, which considerably increase the computational cost and therefore diminishes the attractiveness of cell ordering algorithms over METIS. On th e other hand, as opposed to Jordan curves, space filling curves (SFC) can produce quality partitions due to the curvesâ€™ intrinsic properties, i.e., locality. SFCâ€™s are special functions , which are used to map n -dimensional space into 1D space at a desired resolution. The present work considers Peano-Hilbert type space filling curves. The details of the algorithm for curve generation, its use for cell-ordering, partitioning, and preparing the data structure to perform parallel computations are described in the following sections. PAGE 94 79 5.1.1 Space Filling Curve Generation A space-filling curve is a parameterized, injective function which maps a unit line segment to a continuous curve in the unit square, cube, hypercube, etc, which gets arbitrarily close to a given point in the unit cube as the resolution increases. In two dimensions, the basi c building block of the Hilb ert curve is a â€œUâ€ shaped line which visits each of 4 cells in a 2x2 block. Each subsequent level further divides the previous levelâ€™s cells, creating sub-quadrants, visited by rotated â€œUâ€ shaped curves. In three dimensions, the curves follow the sa me basic construction rules, but the basic building block extends into the third dimension with additional â€œUâ€ shaped turns. As the resolution increases, the curve fills the sp ace. Hence we can map the space with a 1D line. Figure 5-2 illustrates the Hilbert curves on 0-1 domain with di fferent resolution, which can be recursively defined by substituting a straight line segment by a certain pattern of lines. The curve fi lls the whole domain as this process is repeated infinitely. Repeating this sub-division for k times will yield in an approximation of a Hilbert curve of k -order, which refers to its resolution. (a) (b) (c) Figure 5-2. Construction of the Peano-Hilb ert curve. (a) level=1. (b) level = 2. (c) level = 3. Hilbert curves pass every point in an n -dimensional space once and only once in some particular order according to some algor ithm. As a result, they graphically express a PAGE 95 80 mapping between one-dimensional values and the coordinates of the points, regarded as multi-dimensional values. The points are placed in a sequence according to the order in which the curve passes through them. This sequence number is referred to Hilbert derived-key, Hi, which is an integer between 1 and 2k. ,,kiiii f Hxyz (5-1) The function given in Equation (5-1) can be constructed for any dimension for any resolution using the algorithm suggested by Butz (1971). Alternative curve generation algorithms that use recursion have more comp utational cost than Butzâ€™s algorithm. The advantage of the Butzâ€™ algorithm is simply b ecause it works on binary numbers to rotate the basic shape at a given level which en ables the access to any point on the curve without visiting the prior nodes on the Hilbert curve. This operation is referred to encoding Hilbert keys. The idea to be able to acce ss the Hilbert keys from the physical space is suggested by Lawder (2000), who constructe d a similar algorithm for th e inverse function (Equation (5-2)). This operation is referred to decoding Hilbert keys. Both al gorithms give us the ability to encode and decode Hilbert keys without baring the computati onal cost of curve construction. The details of both algorithms are explained in de tail by Lawder (2000). 1,,kiiii f xyzH (5-2) Algorithms for encoding and decoding work on a domain, which is either a unitsquare in 2D or a unit-c ube in 3D, can only be app lied for cell-ordering on a computational domain, that have equal number of cells in each direction. Moreover, the number of cells is required to be multiples of 2n. Otherwise, some cells would not conform to the 2n nature of the space filling curve. In order to overcome such issues, the PAGE 96 81 initial box grid, which is defined by the user as an input parameter, is assigned a base orientation and a base Hilbert key. Because the nature of the isotropic adaptation is similar to the Hilbert curve, which is based on division of a square/cube cell to four/eight sibling cells, the function for H ilbert curve can operate indepe ndently inside each of these boxes to create Hilbert keys depending on th e refinement. During the cell-ordering, the independent Hilbert keys can be combined with the base values of orientation and resolution parameters to preserve the overall ordering. Figure 5-3 illustra tes possible grid arrangements for channel flows with the initial and adapted cu rves. The initially set base values are inherited by the sibl ing cells during the adaptation. (a) (b) Figure 5-3. Initial cell-ordering for a cha nnel flow. (a) Two-dimensional domain. (b) Three-dimensional domain. 5.1.2 Cell Ordering The algorithm for SFC is used to assign Hilb ert key values all the active cells in the computational domain. The input parameters used are based on thei r coordinate in the computational domain, which ar e represented by integers ( i , j , k , level ). Figure 5-4 illustrates the way that the cells are ordered when multiple levels exist. The resolution for the Peano-Hilbert curve is set to the maximum allowable resolution in PAGE 97 82 the grid. This allows us to use a sorting algorithm instead of a walk through the cells. A quick sort algorithm, which has N log N operations, is used to fina lize the cell-ordering. The process creates a compact coefficient ma trix as the curve ensures at least two neighboring cells reside next to the diagonal element. (a) (b) Figure 5-4. Hilbert curve based ordering. (a) Initial adaptivel y refined Cartesian grid. (b) Ordering produced by the algorithm. 5.1.3 Mapping Partitions A cut-off point is set on the ordered cell list to determine the partition for the parallel computing. Then each of these parti tions are assigned, or mapped, to one of the available processing units. One of the simple st but an effective approach that ensures load balancing is to assign equal number of cells into each sub-domain (Table 5-1). The cells from one cut-off point, H0, to another, H1, form a sub-domain/partition. Table 5-1. Algorithm for mapping partit ions to the available processors. if ( N / p ) is an integer H0 = I . N / P + 1 H1 = ( I +1) N / P if ( N / p ) has a remainder [ r ] if I < r , H0 = I . N / P + P + 1 H1 = ( I +1) N / P + P + 1 else H0 = I . N / P + r + 1 H1 = ( I +1) N / P + r PAGE 98 83 Figure 5-5 illustrates the mapping proce ss for a computational domain with a number of N cells over a number of P processors. Since, each processor is responsible for only one partition; P also implies the desired number of sub-domains. Each subdomain is denoted with i (ranging from 0 P -1). Ordered cells are marked with these cut-off points, as indicated with capital letters in Figure 5-5. These cut-off points form line segments of AB, BC, CD, and DE whic h include equal number of cells even though they govern different sized ar eas in the physical domain. A B C D E A B C D E 1st quad 2nd quad 3rd quad 4th quad Figure 5-5. Partitioning based on cell-ordering. New partitions start at the capital letters. A global1 array is created in order to track the identificati on of the governing processor for each cell. This enables us to create overlapping cell/face lists (ghost cells/faces). A local array is created that governs only the cells between H0 and H1 and a local list is arranged to transfer any local lo cation to a global locati on and vice versa. 5.1.4 Overlap Region As part of the additive Schwarz algor ithm, the overlap region is required to exchange information at the end of each local solve. These cells/f aces, referred to as 1 The term â€œglobalâ€ is referred to data available to all processors, whereas the term â€œlocalâ€ is referred to data available only to the processor of ownership. PAGE 99 84 ghost cells/faces, need to be marked as overlappi ng cells to form a buffer to be sent to the neighboring sub-domain. 5.1.4.1 Ghost cells The communication through the ce ll-list is arranged in su ch a way that provides a solution to the Pressure Poisson equation. Th e computational stencil for pressure Poisson equation is represented in Figure 5-6. Figure 5-6. Computational stencil for pressu re Poisson equation in 2D consisting of immediate neighbors in east, west, north, south. As the mapping routines had formed a globa l array to track th e governing processor for each cell, we perform a search for the cel ls that has neighbors from other processors in east, west, north, south, front and back. Each processor marks those cells and forms a receive buffer locally while also marking the ownership of those cel ls in another list. Computational cell Ghost cells Figure 5-7. Overlapping regions that will ex change information with the neighboring subdomains (marked with crosshatches). PAGE 100 85 Figure 5-7 shows the overlapping/ghost cells being prepared for communication. Red lines represent a boundary of local solution domains. At this stage, overlapping cells appear as an extension of the global mesh st ructure. Each processor marks all the ghost cells, for which they expect to receive data fr om other processors. Information in the first inner layer, cross-hatched region, is to be sent to the corresponding neighboring subdomain. Table 5-2 illustrates the data structure of ghost cells for Figure 5-7. Neighbors of 1 are given as the following sub-domains: 2 , 3 and 5 . Receive buffer will consist of two arrays; receive domain list and receive buffer list. Receive domain list has number of elements equal to the total number of s ub-domains. Indices marked with the subdomain number pointing out the starting index of the receive buffer list, which includes the indices of the cell numbers. The end i ndex is retrieved via the next sub-domainâ€™s start index. In the example, cell s between 1-24 will be sent to 2 , cells between 25-60 will be sent to 3 , and cells between 61-92 will be sent to 5 . On the other hand, subdomain 4 is ignored. Table 5-2. Data structure of ghost cells. 1 24 25 60 61 92 Receive Buffer C1 C2 C3 contains cell numbers Ci Receive Domain 1 1 25 61 61 93 1 23 4 5 The same format is used in the send list and other possible lists for communication. After constructing the receive list, each processor has the information what they should be receiving and from which processor they expe ct to receive it. These processors need to be invoked to prepare the information to be sent. This is done through explicit communication of the receive list with the responsible partie s of the other side. This PAGE 101 86 process prevents certain inconsistencies when compared to building the send buffers in the same manner of building the receive buffers. The cells in the receive buffer is added to the ordered cell list yet they are marked as ghost, indicating that these cells are not going to be solved for. The ordered cell list is extended from the size [ H1 â€“ H0 + 1] by the number of cells to be sent. The cells in the communication volume are not or dered as they do not involve in the solution process other than providing continuity to the solution field. 5.1.4.2 Ghost faces The communication through the face-list is mainly for th e staggered arrangement of the convection-diffusion equation and hence th e ghost-faces are determined according to the needs of the convection-diffusion equation. Figure 5-8. Computational stencil for c onvection-diffusion equation for a face in ydirection. Control volume is represented with dashed lines. Figure 5-8 is an exampl e of the stencil for y -control volumes that is used to compute the convection-diffusion equation. The construction of the surface integral for the control volume represented with the black dash line requires the fluxes computed for the gray control-volumes. This would requi re additional faces to be included in the stencil especially at the places wh ere grid resolution level changes. The solution domain for the velocity field is determined by the core-ordered celllist; which also includes the common faces at the processor domains. Those control PAGE 102 87 volumes form the overlap region for convection-diffusion equation. Figure 5-9 is an illustration of ghost faces which may include mo re faces than the faces of the ghost cells created for the pressure equation. Computational face Processor boundary Computational face Ghost faces Figure 5-9. Ghost faces at th e corner of a partition. 5.1.5 Communication Routines Once the lists are constructed, the values for any array can be updated at any time. Two separate routines are arranged to update a local array (with the size of the local cells/faces) or a global array (with the size of the total grid size). These routines are based on another set of fundamental subr outines for communica tion that sets MPI commands. These utilize the non-blocki ng MPI_ISEND and MPI_IRECV commands based on the data type (integ er or double precision) to pos t messages and to wait and receive for others to send. The non-blocking nature is analogous to sending a regular mail and receiving a tracking number instead of confirming the message is actually sent or delivered. Operations can still be performed in the period between sending and receiving as long as the oper ations do not involve the me ssage to be transferred. Figure 5-10, based on Figure 5-7, illustrates the locati ons of the buffers to be sent and where it is located. The communication routines accept additional two parameters other than the variable itself. These routines should accommodate the type of action to be taken with the message received. These routines provide the flexibility not only to update PAGE 103 88 the cells/faces with the receiv ed values, but also to perform certain operations, such as relaxation or subtraction. Figure 5-10. Communication processes to upda te the information present in the ghost cells. 211()gold iij x xx (5-3) In Equation(5-3), old i x is the old value contained in the ith processorâ€™s ghost cell, j x is the value received from the jth processor, and g i x is the updated value at the ghost cell of CPU i. When 1 and 2 are one, it will replace the ol d values with the new. If 1 is less than one but 2 is one, value is under-relaxed. If 1 is greater than one but 2 is one, value is over-relaxed. If 1 is greater than one but 2 is zero, old value is subtracted from the original. When updating of the ghost cells of processor i, which has d neighboring processors; i will send and expect to receive da ta from each neighboring processor, j. Let n be the number of loca l computational cells (N is the total cells), and yij be the length of data to be sent and yji is the length of data to be received. Usually it is expected that yij = yji; but in some cases yij ~ yji. Let the cost of communication of a unit length be denoted PAGE 104 89 by c (even though c becomes smaller when yij is large). The operation cost for communication (C) can be approximated for nonblocking communication as in Equation(5-4). yij.c < Cij < 2yij.c ' 11jjdd iijiji i jjCCcycdy (5-4) 1jd ij i j y y Another way to represent the cost is to establish the communication to computation ratio. If we denote as the ratio, i idyn , we can replace iy in Equation (5-4) to obtain Equation (5-5). ii iN Ccdycnc P (5-5) These routines use prescribed lists that ar e only for ghost cells/faces. An additional routine is also available as an option for cu stom lists. If a new communication with cells other than ghost cells is required, it is possibl e to construct a list in the similar format that is presented for receive list. This way of co mmunication is used after grid adaptation. Another type of communication required fo r computing the dot product utilizes MPI_ALLREDUCE command, which collects the local sums from i ndividual processors to distribute the overall sum to all the availa ble processors. This processâ€™ operation count, S, can be approximated based on c as in Equation (5-6). 21ScP (5-6) 5.1.6 Grid Adaptation Whenever the grid is adapted, the partitio ns need to be rearranged in order to accommodate the requirements of load ba lancing. This brings an additional PAGE 105 90 communication step as the previously comput ed flow field variables may no longer be available to the recently assigned processor and are required to be retrieved from the previous owner. This situation is illustrated in Figure 5-11. The locality property of space filling curves minimizes the movement and hence the amount of information that needs to be exchanged is observed to be around 10% at each adaptation. l n-1 Managed by same processor at both time steps Managed only at the previous time step, need to send info to whom is assigned at current time step. Managed only at the current time step, need to receive info from the previous responsible CPU. Subdomain at time n-1 Subdomain at time n-1 Figure 5-11. Adjusting partition boundaries after adaptation. In order to establish such communication, following steps are included in the implementation. When adaptation is requested, allocate an array and copy the values of the old values in order to identify the previous owner of the cells. During adaptation, if a cell is refined, the siblings will inherit the parentâ€™s processor ownership (flow variable is also updated for those sibling cells). Create new partitions and build a list of cells that changed from one processor to another by comparing the new and old values. Communicate for flow field va riables to update the data in the current partitions Discard the allocated array. 5.2 Domain Decomposition for Advection-Diffusion Equation The convection-diffusion e quation involves in construc tion of convection related terms, diffusion related terms (also cross-derivative (ux, uy, vx, vy) terms around the interface), moving marker points, including su rface tension forces/additional body forces. PAGE 106 91 All of these routines are adju sted for computing the fluxes only in the â€œlocalâ€ domain of each processor. We use the following schemes for the time integration of th e convection-diffusion equation: Forward-Euler Runge-Kutta/Crank Nicolson When Forward-Euler method is used, the in tegration is explicit and it involves only one communication routine for the ghost faces. The faces common to two processors (at the processor boundaries) are solved for in each sub-domain but not shared via communication. The solved flow field results in the same values as long as the previous time step values at the ghost faces are updated correctly. When Runge-Kutta/Crank Nicolson method is used, the convection terms from the previous time step and the current time step are computed and there should be an update for the ghost faces in between these computations. The Crank-Nicolson requires solution of a linear system, for which the coefficient matrix is formed by the diffusion equation. The linear system is solved using Gauss-Se idel method for which the convergence rate significantly improves with the utilization of the order cell list. The faces, which are corrected during the iterations, are used to compute the rest. Once the iteration is complete, the ghost face values are update d to adopt additive Schwarz domain decomposition. The local error in each proc essor is computed by taking the maximum of the difference between the old and the new va lue of cells and global error is deduced from the maximum of all the local errors. This error is normali zed by the cell volume and t. The convergence is achieved only if the normalized error is less than the desired level of tolerance. PAGE 107 92 5.3 Domain Decomposition for Pressure Poisson Equation The solution for the pressure Poisson equation is obtained by solving a linear system of equations. Since the coefficient matrix is symmetric and sparse, the conjugate gradient solver is used to obtain th e solution for the linear system. The coefficient matrix is updated after each grid adaptation. This avoids the computational cost of the matrix assembly is during the time steps between adaptations. Moreover, the coefficient matrix at this stag e is in a template form involving only grid related information without the density fiel d which changes with the location of the interface. Density field can be incorporated to the template with some modifications to the current coefficients of the matrix. Let ija and ija denote the template and corrected coefficients for the cell with the row, i, and column number, j, respectively. Equation (5-7) addresses modification process. 1 2ij ij ija a (5-7) The coefficients of the matrix are stored in a sparse matrix storage format for better memory management purposes. The particul ar format used in this theses is the AIJ format, which involve a one-dimensional intege r and a real array to store the diagonal and non-zero off-diagonal elements of the matrix . This matrix storage is represented in Figure 5-12. Iterations of the conjugate gradient al gorithm involve dot pr oducts and a matrixvector multiplication. These processes are adju sted for the sparse storage format and they only perform operations on non-zero elements. Table 5-3 gives an analysis of the complexity of the conjugate gradient algorithm. PAGE 108 93 12N-1N 11 222,1 1 CCCCN N NNRD RDO D R D Figure 5-12. Sample matrix that has a non-ze ro element at the second row and its AIJ format. In Table 5-3, N denotes the number of unknowns to be solved, M denotes the average number of off-diagonal elements per row, and L is the iteration count, which is limited by N. When there is no parallelism, the operation count ( ) for convergence can be estimated as in Equation (5-8) 14NMLL (5-8) Table 5-3. Complexity analysis of th e original conjugate gradient method. Operation Count Compute the residual (involves a matrix vector multiplication) NxM (>6N; < N2) Loop until converges Dot product N Compute direction (p) N Compute A.p (matrix vector multiplication) NxM Dot product N Update solution vector and residual N End Loop L There are multiple ways to achieve parallel ism with this algorithm. One can simply apply the direct parallelism by simply employi ng parallel versions of a dot product and a parallel matrix-vector multiplication. Alternatively, one can employ Additive Schwarz domain decomposition routine for co arse granular parallelism. This dissertation focuses on several appro aches for parallel implementation of the conjugate gradient method. These ar e given in the following list as: PAGE 109 94 Global treatment of conjugate gradient solver Additive Schwarz technique Multigrid solver 5.3.1 Global Treatment Global treatment of conjugate gradient so lver requires an update on the ghost cells in order to complete the matrix vector multiplication and a global summation over subdomains during the dot product. On the other hand, this method introduces an additional cost due to the required communications between the processors. First of these is due to the update performed on the ghost cells. This is denoted as C in Table 5-4, which can also be represented by the ratio between the communication to computation ratio using Equation (5-5). Table 5-4. Complexity of the conjugate gr adient method with direct parallelism. Operation Count Compute the residual (involves a matrix vector multiplication) n.M + C Loop until converges Dot product n + S Compute direction (p) n Compute A.p (matrix vector multiplication) n.M + C Dot product n + S Update solution vector and residual n End Loop L The communication cost, C, can be minimized by taking ad vantage of the nature of the matrix-vector multiplication and the non-bl ocking nature of the MPI routines. The latency between sending and receiving the me ssages can be hidden during computation of the locally available values. Once the message is received, contribu tion of the ghost cells can then be accounted to finali ze the summation. This works very well if the latency is equal or greater than the cost of the loca l multiplication, which is the case for a large matrix. Otherwise, latency will become visible. This is illustrated in Figure 5-13. PAGE 110 95 Send Message CPU A CPU B Local Multiplication Receive Message Continue on multiplication Send Message Local Multiplication Receive Message Continue on multiplication Latency Figure 5-13. Hiding the late ncy of communication between the processors during a matrix-vector product. Another communication is required to complete the dot product process, which is a global type of communica tion and denoted with (S). In Table 5-4, n represents the local number of cells which is in the order of N/P, where P is the total number of processors. M doesnâ€™t change because the proc ess involves all the coefficien ts of the stiffness matrix. Using this notation, the opera tion cost for the direct para llel option for the conjugate gradient algorithm is approximated as in Equation (5-9). 122NN M CLLS PP (5-9) The speed-up that can be achieved by usi ng this algorithm is given in Equation (5-10), which can be further manipulated to give Equation (5-11) with the assumption of average number of neighboring cells (M) equal to 4 (for 2D). (1)4 122 NMLLN NN M CLLS PP (5-10) 1 12 1 84 CPLLSP P NLN (5-11) The efficiency of the parallelism is improves when the ratio of Equation (5-11) increases. The maximum value of this ra tio can only be one. The second term in the PAGE 111 96 denominator is the cost due to communication ( ) and it can be furt her manipulated to yield Equation (5-12). 24(1) for L >> 1 888 CPSPdPPc c NN (5-12) This analysis helps us to identify the f actors that have effect on the communication cost. For instance, communication cost decreases when; Communication to computation ratio is small, Number of neighboring processors is small, Total number of cells is large, Total number of processors is small. 5.3.2 Additive Schwarz Technique A domain decomposition technique which u tilizes coarse grain parallelism might enable us to efficiently use the available pa rallel computing resources. On the other hand, when communication frequency decreases , the convergence rate of the iterative solvers would deteriorate. Therefore, to improve the convergence rate, Additive Schwarz domain decomposition method is applied to conjugate gradient as a preconditioner. Additive Schwarz domain decomposition method can be incorporated into conjugate gradient with the following modi fication as given in Equation (5-13). 11 T iiiiiiirMrRARr, for ith sub-domain (5-13) The solution of ir in iii M rr can be obtained using the conjugate gradient algorithm, during which communication is not required as the components of r can be obtained locally. Table 5-5 illustrates the complexity of th e conjugate gradient algorithm with an Additive Schwarz preconditioner. In Table 5-5, t denotes the number of iterations to obtain an approximate solution for the precond itioner. As the pr econditioning step can PAGE 112 97 become costly, this parameter can be set to a fixed number of sweeps than obtaining a solution at a given accuracy level. Table 5-5. Complexity of Additive Schwarz preconditioned conjugate gradient algorithm. Operation Count Compute the residual (involves a matrix vector multiplication) nM + C Loop until global convergence Solve for z in iii M zr nmt Dot product n + S Compute direction (p) n Compute A.p (matrix vector multiplication) nM + C Dot product n + S Update solution vector and residual n End global convergence loop A Operation cost of this algorithm is approximated as in Equation (5.14). 142 NNN M CAAmtAS PPP (5.14) The speed-up that can be achieved by usi ng this algorithm is given in Equation (5-15), which leads to Equation (5-16) with the assumption of average number of neighboring cells in global matrix ( M ) and in the local matrix ( m ) to be equal to 4 in twodimensional computations. (1)4 142 NMLLN NNN M CAAmtAS PPP (5-15) 1 2 2 38 CPSP P AtA LLN (5-16) According to the denominator of Equation (5-16), the ratio of â€œ A / L â€ has significant effect on both computation time (first term in the denominator) and communication cost (second term in the denominator). Further studying this relation, the following can be deduced: PAGE 113 98 If L = A , then no gain is achieved through the preconditioner. If L < A , then it is more costly than the direct parallelization If L > A , the communication cost and computationa l cost of direct parallelization is reduced considerably as long as the relation in Equation (5-17) holds. 3 2 L t A (5-17) Equation (5-17) limits the use of Additive Schwarz algorithm as the number of iterations to obtain the local solution is problem dependent. When the number of iterations, t , is fixed, there would be no guarantee that the convergence of the global system will enhance. In addition, Additiv e Schwarz methodsâ€™ convergence behavior is highly dependent on the grid size in the overlap region. Employing multigrid type methods removes this direct reliance on the grid size. This possibility is further explored in the following section. 5.3.3 Multigrid Solver The high frequency components of error decay faster than the low frequency components. Multigrid aims to damp a ll solution error components, low and high frequency, at the same rate to achieve a be tter convergence rate. The highest frequency component of the error is due to smallest grid size. Coarse grid levels are used to smooth errors that exist on the fine grid. This proc ess leads to damping the error uniformly at all levels, which is demonstrated by Ghia et al (1982) and Shyy (1994). Let s ol satisfies Equation (5-18), and let n be the nth iterative solution that contains error yielding a residual from R as given in Equation (5-19). solAb (5-18) nsol A AbR (5-19) PAGE 114 99 This leads to Equation (5-20) assuming A operates on the solution vectors independently, solAAbR (5-20) Using Equation (5-18), Equation (5-21) holds for the residual. nARbA (5-21) In order to apply the idea in multigrid, this relation is transferred from a fine level grid ( l ) to a coarser level ( l -1) using a appropriate interp olation/restriction operator, 1l l I . Its application is given in Equation (5-22). 1 1 1 1 111 l lll l lll lllIAA IRR AR (5-22) After solving for 1l at a coarser level, the solu tion field is corrected after transferring the smoothed erro r back to the fine level via prolongation/interpolation operator. 1 11 nnl llllI (5-23) This process enables damping of hi gh frequency and low frequency error components at the same rate. This is an attractive property as the additive schwarz algorithm has a strong dependence on the overlap grid size. 5.3.3.1 Coarse grid generation Coarse grids are created repr esentatively to assemble the coefficient matrix and to perform the interpolation operators for restric tion and prolongation. They are represented with a list that contains cells from the closes t fine grid without connectivity information. The coarse grids are created using following rules PAGE 115 100 The grid is coarsened one level at a time Candidate cells for coarsening are chosen based on the cells Hilbert key values If the cells to be coarsened already have multiple levels; only the cells at the highest level are coarsened The coarsest grid is set to be initial box gr id set by the user, i.e., level = 0 at normal computations According to the definitions given above ; the coarse level computations produce the coarse levels as in Figure 5-14. (a) (b) (c) Figure 5-14. Results of the coarsening proce ss. (a) Level = 2 illust rated with Hilbert cellordering. (b) Level = 1. (c) Level = 0 (initial box grid). The creation of coarser grids is handled based on Hilbert space filling curves by combining 4 cells in 2D and 8 cells in 3D together. We travel along the ordered cell list and check if this is possible or not. If all candidate cells to be combined are at the same level, all represent a coarser cell in the new level. This process is repeated until the desired number of coarse grid levels is constructed. This is illustrated in Figure 5-15 in detail. Coarse grid generation process involves se t of rules to ensure that cells being coarsened can form a proper cell. These are listed as follows: PAGE 116 101 Check the current cell using Hilbert key values whether it is at a proper location for coarsening. If it is, mark the current cell as C1 and put the following cells into the stack as candidates. Check the current four cells ( C1, C2, C3, C4) if it matches with the level of C1. In case they are at the same level, put C1 into the stack and mark cells at this level as coarsened. Set new C1 as the cells after C4. Check order is C2, C3, and C4. If the check is failed at any point; put the priors into the list with flags as not coarsened and set C1 to the current and pe rform a similar check. C1 C2 C3 C4 C1 C2 C3 C4 C1 C3 C4 C2 (a) (b) (c) C3 C4 C2 C1 C3 C4 C2 C1 C3 C4 C2 C1 (d) (e) (f) Figure 5-15. Step-by-step illustration of the co arsening algorithm. Shaded areas represent the candidate cells; reds are denied. Th e red shaded cells are the ones that prevent coarsening for the candidates. 5.3.3.2 Matrix assembly Geometric based coefficient matrix asse mbly is based on conservation laws. Coefficient matrix is created for all grid le vels and its algorithm requires the connectivity information at all levels. However, such inform ation is only available at the original grid PAGE 117 102 level and the rest are constructed using the cell-list that contains the identification of the smaller cells that a coarse cell contains. C onnectivity information of the contained cells (at the finest level) would help us to constr uct the connectivity information at the coarse levels. The cells coarsened also tracks the information of refinement level from the initial box grid to access the shape of the cell (spacing and area). This is mainly because of the non-uniform nature of the original grid . Some cells may not be coarsened for all created coarse grids. 5.3.3.3 Restriction Restriction operator relates the variables that are at a fine level to a coarser level. In the multigrid method, restriction operates e ither on the residual or on the density at a given level to compute its representative value at the coarser level. The density is a field variable and it is restricted by means of aver aging the fine level values using Equations (5-24) and (5-25) based on Figure 5-16. The difference in Equations (5-24) and (5-25) is because density is a field variable and resi dual includes the volumetric information of the cell it resides in. 2 3 1 4 c R2 R3 R1 R4 Rc (a) (b) Figure 5-16. Restriction operato r. (a) Density. (b) Residual. 12340.25Cffff (5-24) 1234Cffff R RRRR (5-25) PAGE 118 103 5.3.3.4 Prolongation The prolongation operator relates the variables that are at a coarse level to a fine level. It operates on the correction obtained after solving the coarse grid equations. This is simply achieved by Equation (5-26) based on Figure 5-17 as the correction is a field variable. 1234 f fffC R RRRR (5-26) R2 R3 R1 R4 Rc Figure 5-17. Prolongation operates on the residual correction. 5.3.3.5 Scheduling The multigrid algorithm is recursively defined and accepts three parameters, , 1 , and 2 which sets the nature of the cycle as V-cycle or W-cycle. 1 and 2 denote the number of sweeps for pre-smoothing and post-smoothing, respectively. When =1, 1=2>0 , multigrid follows a V-Cycle and when =2, 1=2>0 ; it follows a W-cycle scheduling which are represented in Figure 5-18. L3 L2 L0 L1 Restriction Prolongation Smoothing Legend V-Cycle W-Cycle Figure 5-18. Cycles of multigrid. PAGE 119 104 Both preand post-smoothi ng is performed using a conj ugate gradient algorithm. In the parallel application of multigrid pr econditioned conjugate gradient, the smoother can also employ an additional Additive Schwarz preconditioner to enhance the approximate M-1 in a given number of sweeps. Since the number of grids becomes negligible in the coar se levels, the operation cost for smoothing can be ignored in some cases. However, one should note that the la tency cannot be hidden in too coarse grids and possible restrictions on the maximum leve l of the multigrid can provide a solution to such issues. Multigrid cycles usually have good convergen ce characteristics and convergence is achieved when the cycles are repeated as defined until the tolerance limit is reached. During these iterations, solution vector can be underor over-relax ed. Furthermore, multigrid algorithm can be used along with th e conjugate gradient algorithm when it is applied as a preconditioner. The modificat ion process is similar to the case when Additive Schwarz preconditioner is applied to the conjugate gradient algorithm. Equation (5-27) represents the difference with Equation (5-13) using the restriction and prolongation operators. 111 1ll iiililirMrIAIr (5-27) The solution of ir in iii M rr can be obtained through a cycle of multigrid, which is illustrated in Table 5-6 with its use in the c onjugate gradient algorithm. The complexity of this algorithm have si milar results as the Additive Schwarz Decomposition, except that the iteration numbe r enhancement quite considerably and it does not depend on the overlap size. When the MG cycle is considered alone, the algorithm becomes the multigrid solver. PAGE 120 105 Table 5-6. Multigrid cycle and its use w ith the conjugate gradient algorithm. Multigrid cycle MG preconditioned conjugate gradient Algorithm MG (Al, b, x, , 1, 2) { 1 1012 2if ( coarsest) Solve else { Presmooth(,,,) restrict() repeat () =MG(,,,,,) prolongate() Postsmooth(,,,) } return l l l l ll Axb xAbx dbAx eAde xxe xAbx } while (convergence) { 1 1 1 112 1 1 1 1,/, check convergence; (,,,,,) ,, 1i iiili iiii iiili ii li ii iii i iiiMGL ii rrpAp xxp rrAp rrr rrrr prp } 5.4 Summary This chapter explained the implementation specifics for the developed multidomain technique. The key aspects of these components are as follows: Partitioning algorithm o Hilbert curve generation o Cell ordering o Mapping partitions o Overlap region o Communication routines o Grid Adaptation Domain decomposition method o Global treatment o Additive Schwarz technique o Multigrid solver Partitioning algorithm divides the computational space into a number of subspaces, which can be assigned to a separate computational unit. As the approach involves Hilbert type space filling curves, an algorithm that translates n -dimensional coordinate into one-dimensional Hilbert key values (vice versa) is presented. This algorithm is utilized during the cell-ordering process, which is done via th e quick sort algorithm with a computational cost of N log N operations. After the partitions are created by assigning PAGE 121 106 equal number of cells to each sub-domain, they are expanded by one cell to determine the cells at the overlap regions. These cells ar e actively used in the domain decomposition implementation and they resemble the communication process during parallel computations. Even though multi-domain technique targ ets pressure Poisson equation, its application to advection-diffusion equation is practiced as well. The solution procedure for the pressure Poisson equation is examined for three different multi-domain techniques, which are implemented for a comp arison study of their efficiency. These methods include a global treatment for c onjugate gradient method, additive Schwarz treatment, and multigrid methods. The global treatment resembles fine grain parallelism (frequent communication) as it collects the direction and magnitude for conjugate gradient iterations from each subdomain whenever required. Considerations for parallelism of this approach show that it can be inefficient due to an increase in the communication cost when large number of sub-domains is considered. On the other hand, Schwarz methods, which donâ€™t require as frequent communications in pa rallel environments, are shown to be beneficiary only when there is a reduction in the iteration number for convergen ce. Final approach that involves multigrid solver is expected to ame liorate the issues of Schwarz methods. The numerical experiments using th ese techniques are explored in Chapter 6 in detail. PAGE 122 107 CHAPTER 6 RESULTS AND DISCUSSION Previous chapters have focused on the key concepts and necessary methods to develop robust and efficient multidimensional algorithms for incompressible two-phase flows. In this chapter, we concentrate on the validation and analysis of the developed key components as well as their engineering applications. First section is a validation study for the solid boundary treatment using immersed boundary method. For this purpose, flow over a cylinder at a Reynolds number equal to 100 is considered. The vortex shedding frequency and drag/lift coefficients are compared with the prior studies. Second case study examines liqui d plug problem, in which air finger propagates in a channel filled with liquid. In this cas e, interface appears at the computational boundary. Our treatment allows the marker at th is location to move only in one direction, which in return modifies the initial bounda ry condition. This interactive boundary condition and overall method is compared to prior numerical studies that computed the steady state solution for the film thickness a nd interface features. In addition, unsteady propagation of the liqui d plug is investigated in hori zontal and branching channels. When the branching channel flow is considered, the shape of the channel is constructed using marker points. This case aims to de monstrate the methodâ€™s capability of handling solid, liquid and gas objects all together. Finally, developed strategy for the treatment of multi-domain computations on adaptively refined Cartesian grids is analyzed for its performance and its applicability for PAGE 123 108 parallel computations. The partitioning method by space filling curves is demonstrated on two and three dimensional grids. Its pe rformance is compared with METIS. The solution procedure for pressure Poisson equa tion is analyzed for its performance when multigrid and conjugate gradient solvers are ut ilized. The single domain study is used to compare both methodsâ€™ convergence rate and total computational time on uniform and non-uniform grids. In addition, the effects of various density ratios are also examined. The application of additive Schwarz domain decomposition method is considered for the multi-domain solution. Its use with conjugate gradient algorithm and multigrid solver is examined to determine the best approach suitable for parallel computations. 6.1 Flow over a Cylinder The implementation for solid boundary treatment is validated using the test case of flow over a circular body. Figure 6-1 shows the comput ational geometry and the boundary conditions used for this test case. We note that the configuration presented above is similar to the one recently used by Xu and Wang (2006). -101234 -1 0 1 X Y Z Slip Stationary Cylinder D=0.2 Outlet Inlet Slip Figure 6-1. Computational se tup used for the flow over a stationary cylinder case. The flow is initiated with a velocity field that is set to zero everywhere. As the inlet velocity is introduced to the flow domain, the flui d flows around the cylindrical object. The Reynolds numb er for this case is set to 100, which is based on the diameter PAGE 124 109 of the stationary cylinder and the inlet velo city. The computations are performed until frequency of vortex shedding becomes constant. X Y Z Figure 6-2. Typical grid after adaptive refinement for the ca se of a flow over a cylinder. The base grid resolution is set to 40x20 w ith four levels of resolution, which corresponds to a uniform grid at a resolution of 640x320. Th e grid is refined to the maximum level around the solid body due to th e adaptation criterion that is based on interface location. The remaining sections of the computational domain are adaptively refined or coarsened based on the level of vorti city so that the features of the flow can sufficiently be captured. Figure 6-2 illustrates the features of the adaptively refined grid at an instant after the frequency of vortex shedding becomes constant. -101234 -1 0 1 Figure 6-3. Pressure contours of Re=100 flow at an instan t after vortex shedding become constant (Contours are spaced by P = 0.2). PAGE 125 110 Figure 6-3 presents the pressure contour s obtained for the grid presented in Figure 6-2. The boundary condition at the cylinder is incorporated in the solution procedure by constructing a force field around the marker point s. This is analogous to a sharp interface method that doesnâ€™t involve smearing to obtain the desired boundary conditions. Figure 6-4 illustrates the streamlines ar ound the cylinder at an instant ( t = 100, non-dimensional time). Figure 6-4. Streamlines around the cylinde r; colors indicate pressure levels. Table 6-1. Comparison between prior studies a nd present study for drag, lift coefficients and Strouhal number at Re =100. Drag coefficient Lift co efficient Strouhal number Prior studies (Range) 1.33 0.014 1.43 0.009 .25 .34 0.164 0.175 Present study 1.39 0.012 .28 0.167 For the computations of flow over a cylindrical body at Re=100, the Strouhal number is observed to be 0.167, which had been reported to be in the range between 0.164 and 0.175 by Russell and Wang (2003). The study by Xu and Wang (2006) obtained the Strouhal number equal to 0.171 on a uniform grid at the resolution of 640x320. Table 6-1 compares the lift and drag co efficients obtained for Re=100. It indicates that our results are within the range of computati ons reported by Xu and Wang (2006) and those reviewed by Russell and Wang (2003). PAGE 126 111 Grid adaptation evolution is shown in Figure 6-5, in which the total number of cells employed is normalized by the corresponding uniform grid size. Using four level of grid refinement is shown to reduce the computationa l cost by 87%. Time GridSize 0 5 10 15 20 5% 10% 15% 20% 25%4levels(24.1of205)x1032levels(3.4of12,8)x1033levels(8.2of51.2)x103 Figure 6-5. Evolution of grid adaptation in time for 3 different levels of refinement. Grid convergence analysis is performed usi ng 3 different levels of refinement on the base grid of 40x20. Error in Strouhal num ber is utilized to determine the order of spatial accuracy, which is computed to be 1.94 based on the solution obtained at the finest grid. The asymptotic analysis usi ng Richardson extrapolation is presented in Figure 6-6. GridCells Error%(StrouhalNo) 100 200 300 400 500 600 5% 10% 15% 20%1 2 Figure 6-6. Grid convergence study based on th e percentage error in Strouhal number. PAGE 127 112 6.2 Liquid-Plug Flow Theory of elongated bubbles is a classical problem in fluid mechanics, referred to as the Bretherton problem or li quid plug flow. Figure 6-5 illustrates an infinitely long air finger in a channel. gas liquid h U 2 H Figure 6-7. Illustration of the Bretherton problem. Bretherton (1961) showed that the thickness between the film and the channel wall, or the film thickness, decreases with a decrease Capillary number, CaU , where U is the speed of the air finger, is the viscosity and is the surface tension. Using the asymptotic analysis, Bretherton (1961) demons trated that at the limit of Capillary number approaching zero, the film thickness is proportional to 2/3Ca. Also, Cox (1962) stated that the film thickness reaches a finite value for large Capillary numbers. Bretherton problem has been investigated numerically for intermediate values of Capillary number. Giavedonni and Saita (1997) and Heil (2000) studied the effects of inertia and showed that the film thickness d ecreases with increasi ng Reynolds number for flows at a fixed Capillary number. This obs ervation holds at low Reynolds numbers and the behavior is reversed at high Reynolds numbers. Heil (2001) showed that the formation of closed vortices at the bubble tip ca uses a significant increase in the pressure rise in that region. It is also concluded that this become s more significant for flows at low Ca and high Re , which would cause a rupture (Howell et al. 2000). Such a phenomenon can occur in the lungs during a medical treatment process, i.e., drug delivery. PAGE 128 113 Fujioka and Grotberg (2004) studied the interaction between the plug propagation speed and plug length (distance between the fr ont and the back air finger) at the Stokes flow limit and at various Reynolds numbers. They concluded th at the film thickness decreases as the plug length decreases below the channel width, and for sufficiently short plugs, there is a significant interaction betw een the leading and the trailing menisci and their local flow effects. So far, Bretherton problem has only been investigated for the steady state using free-surface boundary conditions while neglecting the viscous effects of the air finger. Some of the examples of such studies in clude Heil (2001), Howe ll et al. (2000), and Fujioka and Grotberg (2006). These resear chers solved the governing equations using body fitted grids, in which preserving the grid quality for large deformations is a difficult and computationally expensive ta sk. Therefore, in order to further investigate liquid plug dynamics, a flexible interface tracking method that can handle the large deformations with a feasible computational cost needs to be considered. This is crucial especially when the mechanisms of rupture and the inte ractions between multiple air bubbles are of interest. Immersed boundary method is a very good candidate for such problems since it has more attractive features when compared to methods that use body fitted grids. These features can be listed as: It includes the effects of all the c onstituents in the flow field Interface is tracked via markers allowing us handle possible large deformations. In numerical simulations of liquid plug flows, the grid needs to be resolved in the region between the wall and air fi nger in order to correctly capt ure the film thickness. As the current method employs adaptively refined Cartesian grids, this region can have very high resolution locally without globally handling such a resolution demand. PAGE 129 114 Computational setup for the Bretherton pr oblem is considered to be a planar channel, in which the bubble pr essure drives an air-finger to advance inside the liquid. Previous numerical studies considering steady state solution for the finger shape and film thickness used the computational framework that moves at a constant speed with the airfinger. Present study adopts a simila r strategy for validation purposes. Inflow Z eromassflux Outflow No-slip Symmetry axis Markers adjust to yield = 0 Markers adjust for symmetry A B Figure 6-8. Illustration of computa tional setup for liquid plug problem. Computational setup for a planar channel flow is presented in Figure 6-8. Interface of the air finger, represented with marker poi nts, is initially separated from the channel wall by a distance, h0. Marker point at the comput ational boundary (marked with A) imposes a mixed boundary condition, which im poses inflow condition at the lower part and zero mass flux condition at the film. C ondition for the interface at this location is assumed to remain horizontal further from the inlet condition, which allows the movement of the marker point only in the ver tical direction. The marker point at the symmetry axis (marked with B) is treated following the symmetry condition used for the computations and it doesnâ€™t modify the conditions at the co mputational boundary. Non-dimensional parameters of the flow based on the properties of the liquid are defined as follows: PAGE 130 115 l lUH Re , lCaU , 2l l H Re We Ca (6-1) In Equation (6-1), H is the half of the channel height, U is the velocity at which the bubble propagates, l and l are the density and viscosity of the liquid and is the surface tension. Because the solution of the lower and the upper half channel are identical, computations are ca rried only on the upper half. Following the above definitions, steady stat e propagation of air finger in water is studied at the limit of Reynolds number e qual to zero at various Capillary numbers. Moreover, the shape and the film thickness va lues are obtained and compared at various Reynolds number when Capillary number is fixed. After these validation cases, we investigate the unsteady shapes of the flow s in a horizontal and branching channels. 6.2.1 Steady State Solution At the steady state, the finger propagates at a speed of U and leaves a film thickness of h behind. Therefore, a frame of reference is chosen so that the computational domain moves at a constant speed of U . No-slip condition at the ch annel walls is applied as u =(1,0). Far right from the bubble tip, outlet condition is imposed by specifying the pressure. A mixed boundary condition is used on the far left from the bubble tip by enforcing u =(0,0) for air and u =(-1, 0) for liquid. Figure 6-9. Snapshot of adaptivel y refined grid (at 3 levels) for Re =0 and Ca =0.05 at the steady state. PAGE 131 116 Figure 6-9 is an illustration for the adaptively refined grid at Re =0 and Ca =0.05. The film region, between the channel wall and the interface at the inlet, is required to contain minimum of 10 cells to be able to resolve the flow field. Grid in Figure 6-9 is obtained using a base grid of 50x10 and three levels of refinement. Original theory by Brethert on (1961) suggests that the film thickness varies with capillary number, Ca , when Re approaches zero. Equation (6-2) presents the expression of this analysis. 2/31.3375hCa (6-2) Figure 6-10 illustrates the variation of the film thickness and the pressure drop at the bubble tip for various Capillary numbers. (a) (b) Figure 6-10. Validation study for various Cap illary numbers. (a) Pr essure drop at the bubble tip. (b) Film thickness. Flow conditions explored in the present study include Ca=0.01, 0.05, 0.1, 0.25, 1.0, and 5.0. Figure 6-10 illustrates that our results are in good agreement with the prior studies of Reinelt and Saffman (1985) and Heil (2001). Variation from the theory of Bretherton (1961) is due to an increase in the capillary number as the approach is PAGE 132 117 applicable to small Capillary numbers. The corresponding bubble sh apes at the steady state are also presented in Figure 6-11. Ca=0.001 Ca=0.01 Ca=0.1 Ca=0.25 Ca=1.0 Ca=5.0 X 0 0.5 1 1.5 2 2.5 Figure 6-11. Steady state shapes of the air-finger for various Capillary numbers. Reynolds number at a given Capillary numbe r is also investigated at the steady state. Figure 6-12 compares the results obtain ed using the immersed boundary method and a benchmark solutions of Heil (2001) when Ca=0.05. In this figure, pressure contours are shown with colors and streamlines are presented via white lines. Similar to their observations, IBM solutions also captured the formation of a closed vortex field in the vicinity of the bubble tip for finite Reynolds numbers. This vortex field is not PAGE 133 118 observed when the fluid inertia is not considered as in the case of Re=0 and it is observed to increase with Reynolds number. Figure 6-12. Pressure contours and streamline traces of the steady propagation of liquid plug (Ca=0.05). Figure 6-13. Comparison of the trailing film thickness at Ca=0.05 for various Reynolds number. PAGE 134 119 Film thickness variation with Re at Ca =0.05 is shown in Figure 6-13 comparing the results with the numerical study of Heil (2001). In th e steady state simulations, the initial thickness is usually chosen to be clos e to the thickness sugge sted by the analytical theory or earlier numerical studies in or der to avoid extensiv e computation time. However, the steady state solution is independent from the initial conditions and the simulations recover the trailing thickn ess value regardless of the choice of h0. t*=t.(H / ) FilmThickness 0 50 100 150 0.1 0.12 0.14 0.16 0.18 0.2 h0=0.1 h0=0.2 Figure 6-14. History of trailing film thickne ss for different initial conditions, i.e., h0 = 0.1 and h0 = 0.2 until steady state is achieved. Figure 6-14 illustrates results of a numerical experiment, in which two different initial values are used for h0. Film thickness at the steady state for the case Re=100 and Ca=0.05 is studied on a base grid of 50x10 with two levels of refi nement. The film thickness is known to be around 0. 137 and the two values for initial thickness are set to 0.1 and 0.2. These values are chosen to a pproach the steady state value from different directions. The first point is at a closer point to the wall than the steady state, whereas the second choice of initial film thickne ss is further away from the wall. PAGE 135 120 As it can be observed from Figure 6-14, film thickness at the steady state is obtained regardless of the initial film thickness value. The variation from the initial stage to the steady state is derived by the pressure difference due to the boundary conditions. When the marker at the boundary adjusts itself to a new cell, the inlet velocity is readjusted in order to account for the cha nge in the boundary conditions. The effect of this adjustment appears in the time history figures as flat regions, where the change in film thickness is small. Film region needs to be adequately reso lved by computational grid because the smearing nature of immersed boundary method at the near wall regions and at the inlet location may deteriorate the leve l of accuracy in a way that th e bias error in the solution can be significant. This is mainly because the inlet boundary condition changes according to the solution field. X t*=t.(H / ) FilmThickness 0 50 100 150 0.1 0.12 0.14 0.16 0.18 0.2 2LevelsofRefinement 3LevelsofRefinement Heil(2001) X Figure 6-15. Effect of grid refineme nt on the trailing film thickness at Re=100 and Ca=0.05. Figure 6-15 shows the time hi story of film thickness when two and thr ee levels of refinement are employed on the same base grid of 50x10. The flow is computed at PAGE 136 121 Re=100 and Ca=0.05. At the initial stages of th e flow computations, the coarse grid solution comes too close to the wall, where there are only four cells between the wall and the interface at the inlet. On the other ha nd, the number of cells in that region drops down to 10 cells at most when th ree levels of refinement are used. As a result, the fine grid computations were able to produce th e same result which is obtained by the benchmark solution (Heil 2001). 6.2.2 Time Dependent Plug Propagation Capabilities developed by the current study allow us to further investigate the evolution of the interface for the liquid plug problem. In this section, we focus on the growth of the air finger to capture the bubble shapes. Bounda ry conditions are chosen to be the same as the ones used in the steady state computations. Representation of the computational domain is shown in Figure 6-8. On the other hand, computational frame is stationary and does not follow the bubble to observe the changes at the bubble tip. Imposition of the boundary conditions of unst eady computations is modified for its consistency with the steady state cases. Mi xed inflow conditions are given on the left boundary where the liquid velocity is set to u=(0,0) and air to u=(1,0). No-slip condition is used at the top channel wall by enforcing u=(0,0). Outflow and symmetry conditions are imposed for the right and bottom boundaries. Inlet condition brings extra mass into the computational domain. This condition will cause bubble volume to increase, which is computed using the area and velocity at the inlet. Volume conservation routine accounts for this change when it is required to conserve the mass enclosed by the interface. Bubble shapes are also studied for Re=100 and Ca=0.05 for its development in a channel flow. Snapshots from the computations are shown in Figure 6-16. As the air PAGE 137 122 finger elongates in time, the liquid plug starts developing a neck between the tip of the air finger and the inlet conditions. This is due to the pressure rise at the film region. t=0.0 t=0.2 t=0.4 t=0.6 t=0.8 Figure 6-16. Evolution of air finger in a 2D channel for Re=100 and Ca=0.05. Figure 6-17 shows the pressure contours at an instant, t = 0.8. As the pr essure at the film region builds up, the interface forms a neck by balancing the pressure drop with interfacial forces. If the air finger reaches a certain length, this mechanism would cause rupture as shown in the expe riments by Cassidy et al. (1999). P 5 4 3 2 1 0 -1 -2 -3 -4 -5 Figure 6-17. Pressu re contours at t = 0.8. The cutoff values are limited to in order to highlight the pressure developing at the film region. 6.2.3 Liquid Plug Flow in Branching Channels Having capabilities of handling complex solid geometries opens up the possibilities to further investigate the unst eady behavior of liquid plug flow s in practical applications. One relevant example occurs in the medical treatment in the lung. Liquid plugs are PAGE 138 123 transported through the airways in drug delivery applications. This process relies heavily on the liquid distribution in the lung, which has many branching airways. In order to understand the liquid distribution, it is importa nt to understand the nature of liquid plugs at these airway bifurcations. Developed methodology is used to initiate the computational appr oach that can be directly applied to such a practical proble m. Bifurcation point and branches at a predefined angle can be include d in the computations via marker points. These markers are tagged as solid points to enforce the boundary conditions at these geometries. Another group of markers, ta gged as fluid points, is utilized to define the interface between the air and the liquid phases. Before including both solid and fluid points, we first explore the single phase flow in branching channels. Solid walls of the ch annel are represented by additional interfaces at the locations where it doesnâ€™t confor m to the computational boundary. Bottom computational boundary is set for the axis of symmetry. Figure 6-18 shows the configuration of the channel wh ich has a section that makes 30o from the horizontal. Figure 6-18. Computational setup for the soli d boundaries for the fl ow inside branching channels. Computations are performed on a base grid of 40x10 with four levels of refinement. The mesh covers the whole co mputational domain but the solid cells and PAGE 139 124 faces are skipped during the flow computations. Flow parameters used for the single phase computations include the Reynolds number, which is based on the channel height H, inlet velocity magnitude U, density (l ) and the viscosity (l ) of the fluid as given in Equation (6-3). l lUH Re (6-3) Figure 6-19 shows the pressure contours obt ained when the Reynol ds number is set to 100. Pressure gradient parallel to the chan nel walls is observed to be constant. At this flow condition, separation didnâ€™t occur at the critical locations where the channel tilts. Figure 6-19. Pressure contour s for single phase flow ( P = 1). Figure 6-20 shows the initial computationa l setup for the liquid plug problem when the interface is included. The grid employe d for this case is same as the single phase flow computations. Flow para meters as defined in Equation (6-1) is taken as Re=100 and Ca=0.5. Figure 6-20. Computational setup for liqui d plug problem in branching channels. PAGE 140 125 Figure 6-21. Time-dependent development of the bubble shape in branching channels. Figure 6-21 shows snapshots of the air-fi nger shapes in the channel before its bifurcation point. The developm ent of the air-finger during th e initial stages occurs in a similar fashion to the one that is presented for the horizontal channel (Figure 6-16). On the other hand, air bubble, in c ontrast to the referred case, doesnâ€™t create a neck during elongation process. This is because the bubble shape is affected by the existence of the PAGE 141 126 bifurcation point. As a resu lt, the film thickness gets smaller than the steady state solutions for the same flow parameters wh ile the bubble tip region becomes more flat. Film thickness development in the branch ing channel is compared with the steady state solutions of Heil (2001). Figure 6-22 illustrates the vari ation of film thickness in time before the bifurcation point. According to Figure 6-22, the present case study determines thickness to be 0.107 whereas stead y state solution predicts the thickness to be around 0.129. time FilmThickness 0.2 0.4 0.6 0.8 1 0.1 0.102 0.104 0.106 0.108 0.11Presentstudy (Branchingchannel) Estimated steadystatesol. byHeil(2001) h=0.107 h~0.129 Figure 6-22. Variation of film thickness in ti me in the branching channel flow before bifurcation point is reached. As the bubble elongates in the channel, th e interaction between the solid interfaces at the bifurcation point is yet to be investig ated. Such problems require modeling of their interaction, which can be handled in a si milar fashion that computational boundary and fluid interfaces are handled. 6.3 Multi-Domain Solution Techniques Previous sections have focused on the ap plications of the developed method for two-phase flow computations with complex so lid boundaries. In this section, we explore PAGE 142 127 both single and multi-domain techniques for the solution procedure to identify their influences on the efficiency of the overall al gorithm. Firstly, partitioning algorithm that establishes the sub-domain distribution is ex amined for its applicability in parallel computing. In addition, quality of the part itions produced by two separate approaches, SFC and METIS, are compared. Finally, multi grid method on adaptively refined grids is studied for both single and multi-domains, whic h utilizes SFC partitioning methodology. 6.3.1 Partitioning Partitioning algorithm based on Hilbert spac e filling curve is tested for different grid arrangements. The algor ithm creates a map of cells that contain the processor identification they belong to. Overlap regi on is created and tested for communication performance. We first explore the quality of the parti tioning algorithm on a uniform grid with the size of 32x32x32. Figure 6-23. Hilbert curve based partitioning algorith m is illustrated on a 32x32x32 uniform grid for 2, 3 and 4 processors. Figure 6-23 illustrates the partitions obtained using Hilbert curve for a set of processors. In this figure, each color repres ents a partition. For instance, for the two processors case, processor 1 is represented by red, and processor 2 is represented by blue. When the number of sub-domains can be repr esented as a power of two, then the sub- PAGE 143 128 regions are rectangular boxes. This is due to a locality property for Hilbert curve based partitioning and it improves the convergence properties of the domain decomposition algorithm when linear systems with elliptic nature are considered. For two-phase flows, grid is refined to the finest level in the vicinity of interfaces. Figure 6-24 denotes the case when an ellipsoid is placed at the cente r of a grid 16x16x16 that is resolved around the interface to 64x64x64. Figure 6-24 is produced for a spherical interface shifted from the center. Figure 6-24. Sample partitioning on an adaptive grid into 6 sub-domains. In parallel environments, transferring info rmation from one processor to the others brings additional computational cost. Current implementation is evaluated in terms of communication requirement among different processors. For benchmarking purposes, our results obtained using one-dimensional arrays are compared to the results of METIS. The ratio of overlap cells to computational cells can also be used as a performance measure for the communication cost. Attractiv eness of a partitioning algorithm increase when this ratio gets smaller. Therefore, k eeping this ratio at its minimum is one of the main objectives of the partitioning algorithm. Figure 6-25(a) illustrates the communicati on to computation ratio for METIS and SFC. According to the figure, the performan ces of both METIS and SFC partitioning are comparable for the grids tested. For small gr id sizes, the ratio for Hilbert curve takes PAGE 144 129 more desirable values, i.e., smaller values, when compared to METIS. On the other hand, when the grid sizes get larger , METIS produces better results. Figure 6-25(b) summarizes our results wi th respect to actual time spent for communication for a single variable for METIS and SFC. The actual communication time, which is performed on SGI Altix system (Table 3.3), is observed to be shorter for Hilbert curve based partitioning (SFC) even though the results ar e within the same order. The increase in grid size works in the favor of communication. This is as a result of a decrease in the performance of the netw ork connection for small sized packages. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 2345678 SubdomainComm Ratio METIS 4096 SFC 4096 METIS 39901 SFC 39901 0.0000 0.0002 0.0004 0.0006 0.0008 0.0010 0.0012 0.0014 0.0016 0.0018 0.0020 2345678 SubdomainComm Time METIS 4096 SFC 4096 METIS 39901 SFC 39901 (a) (b) Figure 6-25. Comparision of the quality of partitions between cu rrent algorithm (SFC) and METIS. (a) Communication to co mputation ratio. (b) Communication time. Figure 6-26 summarizes our analysis for th e computation time for SFC and METIS. Figure 6-26(a) compares performances of SF C and METIS for number of sub-domains. Similarly, Figure 6-26(b) illustrates the computation time spent in partitioning for various grid sizes. Figure 6-26(a) implies that the SFC init ial partitioning algo rithm runs faster than METIS. Furthermore, according to Figure 6-26 (b), the difference between performances of SFC and METIS gets even more dramatic when grid size increases (even though, they are observed to be in the same order). PAGE 145 130 0.01000.10001.00002345678 SubdomainPart. Time METIS 4096 SFC 4096 METIS 39901 SFC 39901 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 01000020000300004000050000 Grid SizePart. Time METIS SFC Figure 6-26. Computation time for partitioning algorithm. (a) Variation with respect to number of subdomains. (b) Variation with respect to grid size. 6.3.2 Pressure Poisson Equation in a Single Domain Pressure Poisson equation is one of the most critical components of the overall immersed boundary computations in terms of computational cost. This section focuses on the techniques to solve this equation. A test case is developed in order to evaluate the performance and possible issues. Equation (6-4) addresses the governing equation for this test case for a given density field ( ) on a unit square which covers the spatial domain of 0.50.5x and 0.50.5y . 2, ,2coscos p fxy f xyxy (6-4) Equation (6-5) states the boundary conditions applied to Equation (6-4). (0.5,)(0.5,)(,0.5)(,0.5)0fyfyfxfx (6-5) The solution field for p is obtained using the discretiz ation procedure described in Chapter 4. PAGE 146 131 First, we explore the problem presented in Equations (6-4) and (6-5) on uniform grids. Both conjugate gradient and multigrid solvers are used to obtain the solution field for pressure. Two base grid sets are used with a resolution of 5x5 and 10x10 at 6 levels of refinement. This arrangement yields approximately 25,000 cells and 100,000 cells in two-dimensional representation, respectively. The number of levels for grid refinement determines the number levels of multigrid solver. Iteration Log(error) 10 20 30 40 50 60 -7 -6 -5 -4 -3 -2 -1 0 1 CG(G0=25,000) CG(G1=100,000) MG6(G0=25,000) MG6(G1=100,000) Figure 6-27. Iteration history for conjugate gradient and multigrid solvers on uniform grids. Iteration history based on the grid size is given in Figure 6-27. Based on the figure, it can be concluded that multigrid solver conve rges faster than the conjugate gradient algorithm. Even though the computational time for one mutigrid cycle is larger than a single iteration of conjugate gradient solver, multigrid solver usually achieves a speedup of 2-3 mainly because of the order of reducti on in the iteration number. Furthermore, the number of cycles for convergence in mu ltigrid remains almost the same whereas conjugate gradient algorithm shows an increas e of 30% in the iteration number when the grid is doubled. PAGE 147 132 A similar scenario is explored for adaptiv ely refined grids. The grid refinement criterion is based on the interface location. For this particular test case, a circle with a radius of 0.25 is placed at the center of the computational domain. The immersed boundary algorithm is used to smooth the density field when different materials are used. The solution for pressure field is sought on a base grid of 20x20 with five levels of refinement using both multigrid and conjugate gradient algorithm. Level 4 Level 3 Level 2 Level 1 Level 0 Figure 6-28. Illustration for creation of coar ser levels for multigrid solver based on the initial grid represented as Level 4. The levels presented in Figure 6-28 reflect all of the levels that the current algorithm supports. However, it is also possibl e to adjust the maximum number of levels in the multigrid solver. Using more levels enhances the convergence rate and the computational time even though it brings an overhead to each of the multigrid cycles. PAGE 148 133 Figure 6-29 illustrates the performance of mu ltigrid algorithm when different levels of multigrid employed. As expected, the numb er of iterations decreases when we include more levels in multigrid solver. Furthermore, the computation time, scaled with respect to the total time spent by the conjug ate gradient algorithm as presented in Figure 6-29(b), is found to be in favor of the multigrid solver in most of the cases. Only the case of multigrid with two levels is found to be slower than CG, while a speedup of 3 is achieved when all the available levels are employed by the algorithm. Cycles, 1= 2=1 Log(error) 5 10 15 20 25 30 -6 -4 -2 0 2Levels 3Levels 4Levels 5Levels NumberofLevels Speedup 2 3 4 5 0 1 2 3 4BasedonCPUtimeofCG (a) (b) Figure 6-29. Performance of multigrid when different number of levels is employed. (a) Iteration history. (b ) Speedup based on the soluti on time of the conjugate gradient algorithm. Behavior of the convergence rate on adap tively refined grids is similar to the uniform grid cases. However, it is certain that each multigrid level in uniform grids is subject to coarsening. On the other hand, coarsening proc ess only includes those cells that can be coarsened in the non-uniform grids. Figure 6-30 shows the performance of conjugate gradient and multigrid solvers on adaptively refined grids. Grid size is modified by changing the base level grid size while keeping the level of refinement in order to employ same number of levels in the multigrid solver. Total iteration number to PAGE 149 134 reduce the maximum error to a tolerence limit is observed to remain in the same order for multigrid while it is found to increase linear ly for the conjugate gradient algorithm. Similar to the uniform grid case, the computational cost of multigrid algorithm is less than conjugate gradient solver for all grid sizes and the speedup valu es vary between 1.5 and 3.4. NumberofCells CPUTime Speedup 20000 40000 60000 80000 100000 0 10 20 30 40 1 2 3 4 CG MG Speedup(MG) Figure 6-30. Performance of multigrid and c onjugate gradient algorithms on non-uniform grids. (a) Iteration number. (b) Computational time and speedup. Density variation in the flow field affects the coefficient matrix at all levels. At the finest grid level, density varies across the interface smoothly as it is smeared using indicator function. One way to incorporate de nsity values at a coarse level is to employ the restriction operator on the fine level. A nother way we consider is to use interface location and employ the density variation sharpl y only at coarser levels. Original density field with smooth variation and both options for density field at coarse levels are presented in Figure 6-31. PAGE 150 135 (a) (b) (c) Figure 6-31. Treatment of the density field in multigrid algorithm. (a) Smeared density field at the finest level. (b) Smooth variation at intermediate levels. (c) Sharp variation at intermediate levels. Figure 6-32 shows the performance results at different density ra tios for conjugate gradient algorithm and multigrid algorithm w ith smooth and sharp density treatment. Materials inside and outside the interface is c hosen in a way to simulate the problems of bubble and drop dynamics. Ratios less than one corresponds to a bub ble (lighter fluid inside the interface) while ratios greater than one corresponds to a drop (heavier fluids inside the interface). DensityRatio(Out/In) Iterations 10-3 10-2 10-1 100 101 102 103 100 200 300 CG MGSmoothTreatment MGSharpTreatment Bubble Underrelaxed with0.5 800Iterations Droplet DensityRatio(Out/In) CPUTime 10-3 10-2 10-1 100 101 102 10 3 0 0.5 1 1.5 CG MGSmoothTreatment MGSharpTreatment Figure 6-32. Performance of multigrid and conj ugate gradient at various density ratios. (a) Total iteration number. (b) Total CPU time. The convergence rate and hen ce the computational cost of multi-grid algorithm for both density treatments stayed advantage ous over conjugate gradient for moderate PAGE 151 136 density variations. All algorithms of MG a nd CG have shown a significant increase for the iteration numbers when the density ratios are in the order of 1000. Such an increase in iteration number is observed to be higher in cases resembling droplets than the ones of bubbles. Especially, the multi-grid algorithm w ith smooth density variations in coarse levels requires under-relaxation factor to ac hieve convergence at th e ratio of 1000. On the other hand, convergence is achieved without employing under-relaxation with the sharp treatment for density variation achie ved convergence without under-relaxation even though the iterations for convergence still shows a significant increase when compared to moderate density ratios. Figure 6-32 is produced using 4 sweeps for preand postsmoothing inside the multigrid algorithm. When the number of sweeps is increased from 4 to 8 for the case of sharp treatment of density variation, the iteration number for convergence decreased to back to 19. Also , sharp treatment achie ved convergence at a reasonable rate for density ratios of 10,000 when the number of sweeps is set 24, while smooth treatment failed to converge for ratios above 1,000. Figure 6-32(b) shows the total computati on time for CG and MG with sharp and smooth treatment. The number of sweeps for smoothing operators is consistent with the iterations presented in Figure 6-32(a) (4 sweeps). Sharp variation for multigrid achieves better performance for the ratios of 0.001, 0.01, 0.1, 1, 10 and 100 than the CG and MG with smooth treatment. The choice of underrelaxation parame ters and number of sweeps in smoothing steps of multigrid modifies the overall performance significantly for density ratios in the order of 1000 or higher. Figure 6-33 illustrates the steps of residual correction in multigrid for a density ratio of 100. Density variation at coarse levels is treated sharply. PAGE 152 137 (Presmoothing Level 3) (Presmoothing Level 2) (Presmoothing Level 1) (Solve Level 0) (Postsmoothing Level 1) (Postsmoothing Level 3) X-0.4 -0.2 0 0.2 0.4 Y-0.4 -0.2 0 0.2 0.4 Residual0 0.2 0.4 0.6 0.8 X-0.4 -0.2 0 0.2 0.4 Y-0.4 -0.2 0 0.2 0.4 Residual0 0.2 0.4 0.6 0.8 X Y Z Figure 6-33. Residual correction steps in multig rid solver for density ratio of 100 using sharp varying density field. 6.3.3 Pressure Poisson Equation in Multiple domains Direct parallelization approach for the conjugate gradient method algorithm can give good speedup rates for small number of processors. However, its performance deteriorates significantly when the communica tion between the processors is slow or non-homogenous as in the applications of metacomputing. In order to open up the possibilities for coarse grain parallelism, we explore domain decomposition methods PAGE 153 138 using multigrid. The implemention is examined for its performance on multiple domains for the problem presented in Equations (6-4) and (6-5). Direct application of additive Schwarz domain decomposition method on any iterative solver exhibits dependency on both th e grid size at the overlap region and the number of sub-domains employed. We expl ore the behavior of single level additive Schwarz method when applied as a preconditioner to the conj ugate gradient algorithm. Grid initialized with a box grid (64x64) and adapted continuously for additional levels uniformly. Cases were compared fo r different number of processors. Figure 6-34 illustrates some of the partitions used for the domain decompositions method (64x64). (a) (b) (c) Figure 6-34. Grids and partitions used for pa rallel performance study. (a) Single CPU. (b) 6 CPUs. (c) 11 CPUs. (Number of colors indicates maximum number of neighbors per processor.) Additive-Schwarz preconditioner is employe d as an inner solver, which produce yields a residual vector to be used by the c onjugate gradient algorithm. Attractive nature of AS preconditioner is that the solution process is completely performed on the local processor without any need of communication. However, e fficiency of the overall CG algorithm is dependent on the number of swee ps employed to obtain the approximate preconditioner. The more the number of sweep s is used, the coarse r the granularity of parallelism gets. An excessive reduction of error for this equation can cause redundant PAGE 154 139 computational effort, while inadequate reduc tion of error results less employment for local computation. (a) (b) Figure 6-35. Iteration history of ASCG. (a) Different number of grids at 4 sweeps. (b) Different number of sweeps on first two grids ( G0 and G1). Figure 6-35 illustrates the effect of the grid size and the number sweeps employed for the additive Schwarz solver on the number of iterations on four sub-domains. It can be inferred from the Figure 6-35 that the iteration num ber for convergence is doubled when number of cells increase by four. Furt hermore, convergence hi stories of iterations for different number of sweeps shows that more number of sweeps decrease the number of conjugate gradient iterations, resulting an increase in the computational cost. The number of iterations for convergence redu ces when the number of sweeps is set according to the number of sub-domains employed. Figure 6-36(a) illustrates the domain depe ndency of additive Schwarz algorithm for different frequencies of communication. Figure 6-36(b) compares the convergence behavior of conjugate gradient to the additive Schwarz algorithm. Figure 6-36 suggests that direct application of additive Schwar z algorithm as a preconditioner to conjugate gradient is dependent on the number of pa rtitions and grid size. Convergence rate PAGE 155 140 deteriorates when communicati on frequency is set to a lower value (coarse granular). Moreover, convergence rate greatly deteriorat es when more subdomains are employed. Iterations Log(Err) 200 400 600 800 -6 -5 -4 -3 -2 -1 0 Grid1,ASCG Grid2,ASCG Grid3,ASCG Grid1,PCG Grid2,PCG Grid3,PCG 6CPU (a) (b) Figure 6-36. Effect of number of sub-domains on convergence rate. (a) AS solver with CG sweeps for coarse and fine grain domain decomposition. (b) Convergence history of CG preconditioned with AS using different number of sweeps. Multilevel additive Schwarz method is e xplored for its performance on multiple domains. It is similar to the multigrid met hod presented in a single domain. As it can be seen from Figure 6-37, grids at the coarser levels are also decomposed into multiple subdomains provided that they are greater than a threshold value for the number of cells in a single domain. If it is lower than the thre shold value or it is the coarsest level, then partitioning is not employed. This is mainly because the hidden latency for the vectormatrix multiplication would become ineffec tive for the communications on very coarse grids. Choice for the smoother can affect the performance of the multigrid solver and parallelization procedure. Ghost cells at ea ch grid level are required to communicate as many as the number of sweeps when conjugate gr adient is used as a smoother. This is referred to MG-CG algorithm. This algorithm can further be enhanced to yield a better PAGE 156 141 performance by utilization of AS preconditione r to the conjugate gradient algorithm. Another approach is to employ Gauss-Seidel as both preand post smoother. This arrangement, referred to MG-GS, requires single communication at the beginning of the sweep and uses that value to smooth the solution field. (Level 4) (Level 3) (Level 2) (Level 1) (Level 0) Figure 6-37. Creation of coarse grid levels and the way they are partitioned into five subdomains. Figure 6-38 shows the number of iterati ons for each of the algorithm for density ratio of 100. In this figure, coarse gra nular solver with Gau ss-Seidel smoothing has similar characteristics of single level additiv e Schwarz algorithm. On the other hand, multigrid solver with CG smoother can achieve convergence rates independent of number of subdomains. When employed as a precond itioner, iteration number for convergence decreases. PAGE 157 142 Subdomains IterationNumber 5 10 15 20 25 10 15 20 25 30 MG-GS MG-CG MG-CGAS Figure 6-38. Iteration numbers for di fferent number of sub-domains. 6.4 Summary In this chapter, the key components of the immersed boundary technique are validated and analyzed using benchmark pr oblems. In addition, steady and unsteady behavior of the liquid plug problem is investigated. The results obtained for the flow over a cylinder show good agreement with the prior studies. This completes the validati on study of the solid boundary treatment that utilizes marker points to enforce the boundary conditions on the flow field. Second case study examines liquid plug problem, in which air finger propagates in a channel filled with liquid. Interactive boundary condition due to the marker points at the inlet and the symmetry line are used to obtain the steady stat e solution field. These results are verified using the prior numerical studies. Furthe rmore, unsteady propagation of the liquid plug is studied in horizontal and branching chan nels. Since the branching channel flow simulation involves both solid and fluid inte rfaces, our computations demonstrate the methodâ€™s capability of handling solid, liquid and gas objects all together. PAGE 158 143 Finally, developed strategy of the trea tment of multi-domain computations on adaptively refined Cartesian grid s is investigated for its performance and its applicability on parallel computations. The partitioning method by space filling curves is demonstrated on two and three dimensional gr ids. It is shown that the communication cost that is produced using Hilbert curve base d partitions is in the same order with the communication cost of partitions by METI S while computational time required for partitioning algorithm tends to increase with grid size for METIS much faster than SFC partitioning. The solution procedure for pressure Po isson equation is analyzed for its performance when multigrid and conjugate grad ient solvers are utilized. Single domain study showed that standalone multigrid solver offers a speedup of 2-3 when compared to conjugate gradient solver. The effect of vari ations in the stiffness matrix is examined using various density ratios. Convergence rate of multigrid remained almost constant for density ratios between 0.01 and 100, whereas conjugate gradient solver exhibited a decrease. However, multigrid solvers may require additional tuning of relaxation parameter and/or preand post smoothing parameters when the variations in the stiffness matrix are in the order of 1000. At these cond itions, conjugate gradie nt stayed consistent with the trend of a decrease in its convergence rate. The application of additive Schwarz do main decomposition method is considered for the multi-domain solution. The number of iterations for convergence is found to be highly dependent on the number of subdomains. One of the reasons of such strong dependence can be due to the existence of various cell sizes in the overlap region. Multigrid approach is shown to aid such issues as it is shown that the iteration number for PAGE 159 144 convergence remain constant for up to 28 sub-dom ains. As a result, we can conclude that multigrid technique is a suitable technique for parallel computations of two-phase flow simulations. PAGE 160 145 CHAPTER 7 SUMMARY AND FUTURE WORK 7.1 Summary Primary objective of this dissertation is to develop an effective computational framework using multi-domain techniques for complex interfacial flow simulations. Main components of the present wo rk are summarized as follows: Immersed boundary method for multidimensi onal flow with the capabilities of handling multiple separate interfaces of co mplex solid geometries and fluid-fluid phase combinations. A domain decomposition algorithm that acc ounts for large property variations in the flow field. A partitioning algorithm for unstructured mesh data with capabilities of adaptation. A validation study for modeling complex geom etries is performed using the case of a flow over a cylinder. Our results agreed with the results obtained by the prior studies. Second case study examines the liquid plug problem and resu lts are compared to prior numerical studies that computed the steady state solution for the film thickness and interface features. Furthermore, unsteady propa gation of the liquid plug is investigated in horizontal and branching channels . When branching channel fl ow is considered, shape of the channel is constructed using marker po ints. This case demonstrates the methodâ€™s capability of handling solid, liquid and gas objects all together. The developed strategy for the treatment of multi-domain computations on adaptively refined Cartesian grids is studied . The partitioning method by space filling curves is demonstrated on two and three di mensional grids. Solution procedure for PAGE 161 146 pressure Poisson equation is analyzed for its performance when multigrid and conjugate gradient solvers are utili zed. The single domain study is aimed to compare both methodsâ€™ convergence rate, to tal computati onal time on uniform and non-uniform grids. In addition, the effects of various density ra tios are also examined. The application of additive Schwarz domain decomposition met hod is considered for the multi-domain solution. Multigrid implementation showed that it achieves convergence rates which are independent of the number of sub-domains. Wh en there is large variation in the density field, sharp treatment of density in coarse gr id levels is observed to yield better results. When additive Schwarz is used as a preconditioner to the smoothers in MG-CG algorithm, computational cost increases even though itera tion numbers decrease. 7.2 Future Work Marker based interface tracking met hods to model moving/stationary solid boundaries have been used by several researchers for a wi de range of computations ranging from turbulent flows (Yang and Bala ras 2006) and fluid-structure interaction (Gilmanov and Sotiropoulos 2005; Zhu and Pesk in 2002). This dissertation adopted an approach that is capable of handling fixed so lid bodies immersed inside the flow field. The approach is similar to the one used by Yang and Elias (2005), in which the bodies can also deform and move with the flow f eatures. Having such capabilities would open the possibilities of exploring many flow pr oblems, i.e.; micro-air vehicle design and flight of an insect. The heat transfer computat ions involving phase change are crucial parts of the multiphase flows, which need to be investigat ed. It was studied as part of a group work and needs to be extended and studied further for complex flows. Important concepts to be addressed include the artific ial correction of temperature in cells across the interface, PAGE 162 147 and thermal boundary layer thickness. In a ddition, for morphological level of phase changes, such as crystal growth problems, wh en the geometry is hi ghly irregular and the curvatures and surface tension/energy effect s more difficult to accurately capture, significant efforts will be required before truly three-dimensional physics of practical interest can be simulated. PAGE 163 148 LIST OF REFERENCES Aftosmis, M. J. (1997). "Solution Adaptiv e Cartesian Grid Methods for Aerodynamic Flows with Complex Geometries." Lecture notes, von Karm an Institute for Fluid Dynamics, Series , 2. Aftosmis, M. J., Berger, M. J., and Murm an, S. M. (2004). "Applications of SpaceFilling-Curves to Cartesian Methods for CFD." AIAA Paper , 1232, 2003-1237. Al-Rawahi, N., and Tryggvason, G. (2004) . "Numerical simulation of dendritic solidification with convection: Three-dimensional flow." Journal of Computational Physics , 194(2), 677-696. Antaki, J. F., Blelloch, G. E., Ghattas, O., Ma lcevic, I., Miller, G. L., and Walkington, N. J. (2000). "A parallel dynamic-mesh la grangian method for simulation of flows with dynamic interfaces." Proceedings of the 2000 ACM/IEEE Conference on Supercomputing . Aulisa, E., Manservisi, S., and Scardovell i, R. (2004). "A surface marker algorithm coupled to an area-preserving marker re distribution method for three-dimensional interface tracking." Journal of Computational Physics , 197(2), 555-584. Baranger, J., Garbey, M., and Oudin-Dar dun, F. (2005). "Generalized Aitken-like Acceleration of the Schwarz Method." Lecture Notes in Computational Science and Engineering , 40, 505-512. Barberou, N., Garbey, M., Hess, M., Resch, M. M., Rossi, T., Toivanen, J., and TromeurDervout, D. (2003). "Efficient metacomputi ng of elliptic linear and non-linear problems." Journal of Parallel and Distributed Computing , 63(5), 564-577. Barrett, R. (1994). Templates for the solution of linea r systems: building blocks for iterative methods , SIAM Philadelphia. Bourgat, J., Glowinski, R., Tallec, P., and Vi drascu, M. (1989). "Variational formulations and algorithms for trace operator in doma in decomposition calculations, inSecond Int." Symp. on Domain Decomposition Methods for PDEs, T. Chan, R. Glowinski, J. Periaux, and O. Widlund, eds., Philadelphia, PA , 3-17. Brackbill, J. U., Kothe, D. B., and Ze mach, C. (1992). "A continuum method for modeling surface tension." Journal of Computational Physics , 100(2), 335-354. PAGE 164 149 Bramble, J. H., Pasciak, J. E., and Vassilev, A. T. (1998). "Analysis of Non-Overlapping Domain Decomposition Algorithms with Inexact Solves." Mathematics of Computation , 67(221), 1-19. Bretherton, F. P. (1961). "The motion of long bubbles in tubes." Journal of Fluid Mechanics , 10, 166-188. Brezina, M., Falgout, R., MacLachlan, S., Mant euffel, T., McCormick, S., and Ruge, J. (2005). "Adaptive Smoothed Aggregation ( SA) Multigrid." SIAM Review , 47(2), 317-346. Briggs, W. L., and McCormick, S. F. (2000). A multigrid tutorial , Society for Industrial and Applied Mathematics Philadelphia, PA, USA. Butz, A. R. (1971). "Alternative algor ithm for Hilbertâ€™s space-filling curve." IEEE Transactions on Computers , 20, 424-426. Cassidy, K. J., Gavriely, N., and Grotberg, J. B. (2001). "Liquid Plug Flow in Straight and Bifurcating Tubes." Journal of Biomechanical Engineering , 123, 580. Chan, T. F., and Mathew, T. P. (1994). Domain Decomposition Algorithms , Dept. of Mathematics, University of California, Los Angeles. Choi, S. K., Kim, S. O., Lee, C. H., and Choi, H. K. (2003). "Use of the Momentum Interpolation Method for Flow s with a Large Body Force." Numerical Heat Transfer: Part B: Fundamentals , 43(3), 267-287. Chorin, A. J. (1968). "Numerical Solu tion of the Navier-Stokes Equations." Mathematics of Computation , 22(104), 745-762. Cox, B. G. (1961). "An experimental investig ation of the streamlines in viscous fluid expelled from a tube." Journal of Fluid Mechanics , 20(02), 193-200. Croce, R., Griebel, M., and Schweitzer, M. A. (2004). "A Parallel Level-Set Approach for Two-Phase Flow Problems with Surface Tension in Three Space Dimensions." Preprint , 157. De Roeck, Y. H., Le Tallec, P., and Vidras cu, M. (1992). "A domain-decomposed solver for nonlinear elasticity." Computer Methods in Applied Mechanics and Engineering , 99(2-3), 187-207. Demmel, J., Heath, M. T., and van der Vo rst, H. (1993). "Para llel linear algebra." Acta Numerica , 2, 111. Dohrmann, C. R. (2003). "A Preconditioner for Substructuring Based on Constrained Energy Minimization." SIAM Journal on Scientific Computing , 25(1), 246-258. PAGE 165 150 Dryja, M., Sarkis, M. V., and Widlund, O. B. (1996). "Multilevel Schwarz methods for elliptic problems with discontinuous coefficients in three dimensions." Numerische Mathematik , 72(3), 313-348. Dubois, O. (2003). "Optimized Schwarz Met hods for the Advection-Diffusion Equation," Department of Mathematics and Statistics, McGill University. Enright, D., Fedkiw, R., Ferziger , J., and Mitchell, I. (2002). "A hybrid particle level set method for improved interface capturing." Journal of Computational Physics , 183(1), 83-116. Farhat, C., Lesoinne, M., LeTallec, P., Pier son, K., and Rixen, D. (2001). "FETI-DP: A Dual-Primal United FETI Method-Part I: A Faster Alternative to the Two-level FETI Method." International Journal for Nume rical Methods in Engineering , 50(7), 1523-1544. Ferziger, J. H., and Peric, M. (2002). Computational methods for fluid dynamics , Springer New York. Fox, G. C., Williams, R. D ., and Messina, P. C. (1994). Parallel Computing Works! Morgan Kaufmann. Francois, M., and Shyy, W. (2002). "Micro -scale drop dynamics for heat transfer enhancement." Progress in Aerospace Sciences , 38(4), 275-304. Francois, M., and Shyy, W. (2003). "Computatio ns of Drop Dynamics with the Immersed Boundary Method, Part 1: Numerical Algoritm and B uoyancy-Induced Effect." Numerical Heat Transfer: Part B: Fundamentals , 44(2), 101-118. Francois, M., Uzgoren, E., Jackson, J., and Shyy, W. (2004). "Multigrid computations with the immersed boundary te chnique for multiphase flows." International Journal of Numerical Methods for Heat & Fluid Flow , 14(1), 98-115. Fujioka, H., and Grotberg, J. B. (2004). "S teady Propagation of a Liquid Plug in a TwoDimensional Channel." Journal of Biomechanical Engineering , 126, 567. Garbey, M. (2005). "Acceleration of the Schwarz Method for Elliptic Problems." SIAM Journal on Scientific Computing , 26(6), 1871-1893. Garbey, M., and Tromeur-Dervout , D. (2002). "On some Aitken -like acceleration of the Schwarz method." International Journal for Nu merical Methods in Fluids , 40(12), 1493-1513. Garey, M. R., and Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness , WH Freeman & Co. New York, NY, USA. PAGE 166 151 Ghia, U., Ghia, K. N., and Shin, C. T. (1982). "High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method." NASA, Symposium on Numerical Boundary Condition Procedures and Multigrid Methods, (Moffett Field, CA, Oct. 19-22, 1981) J ournal of Computational Physics (ISSN 0021-9991) , 48, 387-411. Giavedoni, M. D., and Saita, F. A. (2006). "T he axisymmetric and plane cases of a gas phase steadily displacing a Newtonian liquidâ€”A simultaneous solution of the governing equations." Physics of Fluids , 9(8), 2420-2428. Gilmanov, A., and Sotiropoulos, F. (2005). "A hybrid Cartesian/immersed boundary method for simulating flows with 3D, ge ometrically complex, moving bodies." Journal of Computational Physics , 207(2), 457-492. Graham, I. G., and Hagger, M. J. (1999). "Unstructured Additive Schwarz--Conjugate Gradient Method for Elliptic Problems with Highly Discontinuous Coefficients." SIAM Journal on Scientific Computing , 20(6), 2041-2066. Gu, C. Y. (1991). "Computation of Flows with Large Body Forces." Numerical Methods in Laminar and Turbulent Flow, Pineridge Press, Swansea, UK . Ham, F. E., Lien, F. S., and Strong, A. B. (2002). "A Cartesian Grid Method with Transient Anisotropic Adaptation." Journal of Computational Physics , 179(2), 469494. Hao, Y., and Prosperetti, A. (2004). "A numerical method for three-dimensional gasâ€“ liquid flow computations." Journal of Computational Physics , 196(1), 126-144. Harlow, F. H., and Welch, J. E. (1965) . "Numerical calculati on of time-dependent viscous incompressible flow." Phys. Fluids , 8(12), 2182. Heil, M. (2000). "Finite Reynolds number effect s in the propagation of an air finger into a liquid-filled flexible-walled channel." Journal of Fluid Mechanics , 424, 21-44. Hendrickson, B., and Devine, K. (2000). "D ynamic load balanci ng in computational mechanics." Computer Methods in Applie d Mechanics and Engineering , 184(2), 485-500. Hendrickson, B., and Leland, R. (1995). "T he Chaco userâ€™s guide: Version 2.0." Sandia National Laboratories, Albuquerque, NM , 87185-1110. Hill, J. C., Walkington, N., and Ghattas, O. (2004). "Parallel Eulerian Methods for Flows with Elastic Membranes." Howell, P. D., Waters, S. L., and Grotberg , J. B. (2000). "The propagation of a liquid bolus along a liquid-line d flexible tube." Journal of Fluid Mechanics , 406, 309335. PAGE 167 152 Iqbal, S., and Carey, G. F. (2005). "Perfo rmance analysis of dynamic load balancing algorithms with variable number of processors." Journal of Parallel and Distributed Computing , 65(8), 934-948. Issa, R. I. (1986). "Solution of the implicitly discretised fluid flow equations by operatorsplitting." Journal of Computational Physics , 62(1), 40-65. Japhet, C., Nataf, F., and Rogier, F. (2001) . "The optimized order 2 method Application to convectionâ€“diffusion problems." Future Generation Computer Systems , 18(1), 17-30. Johan, Z. (1992). "Data parall el finite element techniques for large-scale computational fluid dynamics." Ph. D. Thesis Stanford Univ., CA. Johansen, H., and Colella, P. (1998). "A Cartesian grid embedded boundary method for Poisson's equation on irregular domains." Journal of Computational Physics , 147(1), 60-85. Karypis, G., and Kumar, V. (1998). "METI S: A Software Package for Partitioning Unstructured Graphs, Partitioning Me shes, and Computing Fill-Reducing Orderings of Sparse Matrices." University of Minnesota, Sept . Kim, J., and Moin, P. (1985). "Application of a fractional-step method to incompressible Navier-Stokes equations." Journal of Computational Physics , 59, 308-323. Kleinstreuer, C. (2003). Two-Phase Flow: Theory and Applications , Taylor & Francis Group. Lang, S., and Wittum, G. (2005). "Large-scale density-driven flow simulations using parallel unstructured Grid adaptation and local multigrid methods." Concurrency and Computation Practice and Experience , 17(11), 1415-1440. Lawder, J. K. (2000). "Calculation of Mappi ngs between One and n-dimensional Values Using the Hilbert Space-Filling Curve." Le Tallec, P. (1994). "Numerical methods for nonlinear three-dime nsional elasticity." Handbook of Numerical Analysis , 3. Li, Z., and Lai, M. C. (2001). "The Imme rsed Interface Method for the Navier-Stokes Equations with Singular Forces." Journal of Computational Physics , 171(2), 822842. Lions, P. L. (1989). "On the Schwarz alternating method III: A variant for nonoverlapping subdomains." Third International Symposium on Domain Decomposition Methods for Par tial Differential Equations, he ld in Houston, Texas, March , 20-22. PAGE 168 153 Liovic, P., Lakehal, D., and Liow, J. L. ( 2004). "LES of turbulent bubble formation and break-up based on interface tracking." Direct and Large-Eddy Simulation V Amsterdam: Kluwer Academic , 5, 261. Liu, C. H., Leung, D. Y. C., and Woo, C. M. (2003). "Development of a scalable finite element solution to the Navierâ€“Stokes equations." Computational Mechanics , 32(3), 185-198. Liu, X. D., Fedkiw, R. P., and Kang, M. (2000). "A boundary condition capturing method for Poisson's equation on irregular domains." Journal of Computational Physics , 160(1), 151-178. Lrstad, D., and Fuchs, L. (2004). "Highorder surface tension VOF-model for 3D bubble flows with high density ratio." Journal of Computational Physics , 200(1), 153-176. Losasso, F., Fedkiw, R., and Osher, S. (2004) . "Spatially adaptive t echniques for level set methods and incompressible flow." Computers and Fluids . Losasso, F., Gibou, F., and Fedkiw, R. (2004) . "Simulating water and smoke with an octree data structure." ACM Transactions on Graphics (TOG) , 23(3), 457-462. Mandel, J. (1993). "Balancing domain decomposition." Communications in Numerical Methods in Engineering , 9(3), 233-241. Marella, S., Krishnan, S., Liu, H., and Udaykumar, H. S. (2005). "Sharp interface Cartesian grid method I: An easily implemented technique for 3D moving boundary computations." Journal of Computational Physics , 210(1), 1-31. McQueen, D., and Peskin, C. (1997). "Shared-Memory Parallel Vector Implementation of the Immersed Boundary Method for th e Computation of Blood Flow in the Beating Mammalian Heart." The Journal of Supercomputing , 11(3), 213-236. Miller, T. F., and Schmidt, F. W. (1988). "Use of a pressure-w eighted interpolation method for the solution of the incompre ssible Navier-Stokes equations on a nonstaggered grid system." Numerical Heat Transfer , 14(2), 213-233. Nataf, F., and Rogier, F. (1995). "Factorizat ion of the convection-diffusion operator and the Schwarz algorithm." M , 3, 67-93. Nobari, M., and Tryggvason, G. (1996). "Numer ical simulations of three-dimensional drop collisions." AIAA Journal , 34(4), 750-755. Oliker, L., and Biswas, R. (1998). "PLUM: parallel load balancing for adaptive unstructured meshes." Journal of Parallel and Distributed Computing , 52(2), 150177. PAGE 169 154 Oliker, L., and Biswas, R. (2000). "Paralle lization of a dynamic unstructured algorithm using three leading programming paradigms." IEEE Transactions on Parallel and Distributed Systems , 11(9), 931-940. Pan, Y., and Suga, K. (2005). "Numerical simu lation of binary liqui d droplet collision." Physics of Fluids , 17(8), -. Park, Y. M., and Kwon, O. J. (2004). "Simul ation of Unsteady Rotor Flow Field Using Unstructured Adaptive Sliding Meshes." Journal of the American Helicopter Society , 49(4), 391-400. Passoni, G., Cremonesi, P., and Alfonsi, G. (2001). "Analysis and implementation of a parallelization strategy on a Navier-Stokes solver for shear flow simulations." Parallel Computing , 27(13), 1665-1685. Patankar, S. V., and Spalding, D. B. (1972). "A calculation proce dure for heat, mass and momentum transfer in threedimensional parabolic flows." Int. J. Heat Mass Transfer , 15(10), 1787. Perot, B., and Nallapati, R. (2003). "A movi ng unstructured staggered mesh method for the simulation of incompressible free-surface flows." Journal of Computational Physics , 184(1), 192-214. Peskin, C. S. (1977). "Numerical anal ysis of blood flow in the heart." J. Comput. Phys , 25(3), 220. Peskin, C. S. (2003). "The immersed boundary method." Acta Numerica , 11, 479-517. Pilkington, J. R., and Baden, S. B. (1996). "Dynamic partitioning of non-uniform structured workloads withspacefilling curves." Parallel and Distributed Systems, IEEE Transactions on , 7(3), 288-300. Reinelt, D. A., and Saffman, P. G. (1985). "The penetration of a finger into a viscous fluid into a channel and tube." SIAM J. Sci. Stat. Comput , 6, 542. Rendleman, C. A., Beckner, V. E., Lijewski, M., Crutchfield, W., a nd Bell, J. B. (2000). "Parallelization of structured, hierarchi cal adaptive mesh refinement algorithms." Computing and Visualization in Science , 3(3), 147-157. Rider, W. J., and Kothe, D. B. (1995). "Stretching and tearing interface tracking methods." AIAA Computational Fluid Dynamics Conference, 12 th, and Open Forum, San Diego, CA , 806-816. Rider, W. J., and Kothe, D. B. (1998). "Reconstructing volume tracking." Journal of Computational Physics , 141(2), 112-152. PAGE 170 155 Ruge, J., and Stuben, K. (1985). "Efficient solu tion of finite difference and finite element equations." Multigrid methods for integral and differential equations (Bristol, 1983), Oxford Univ. Press, New York , 169. Saad, Y. (2003). "Iterative Methods for Sparse Linear Systems." Saad, Y., and van der Vorst, H. A. (2000). "Ite rative solution of linear systems in the 20th century." Journal of Computational and Applied Mathematics , 123(1-2), 1-33. Scardovelli, R., and Zaleski, S. (1999). "Direct numerical simulation of free-surface and interfacial flow." Annual Review of Fluid Mechanics , 31(1), 567-603. Schamberger, S. (2004). "Graph partitioning with the Party library : Helpful-sets in practice." Comp. Arch. and High Perf. Comp., SBAC-PAD , 4, 198. Schloegel, K., Karypis, G., and Kumar, V. (2004). "Graph Partitioning for High Performance Scientific Simulations." Computing Reviews , 45(2), 110. Schwarz, H. A. (1869). "Uber einige Abbildungsaufgaben." Ges. Math. Abh , 11, 65-83. Sethian, J. A., and Smereka, P. (2003). "Levet Set Methods for Fluid Interfaces." Annual Review of Fluid Mechanics , 35(1), 341-372. Shin, S., and Juric, D. (2002). "Modeling thre e-dimensional multiphase flow using a level contour reconstruction method for fr ont tracking without connectivity." Journal of Computational Physics , 180(2), 427-470. Shyy, W. (1994). Computational modeling for fluid flow and interfacial transport , Elsevier New York. Shyy, W. (2004). "Multiphase computati ons using sharp and continuous interface techniques for micro-gravity applications." Comptes rendus. Mecanique , 332(5-6), 375-386. Shyy, W., Smith, R. W., Udaykumar, H. S., and Rao, M. M. (1996). Computational Fluid Dynamics with Moving Boundaries , Taylor & Francis, Inc. Bristol, PA, USA. Singh, R. (2006). "Three-dimensional marker-b ased multiphase flow computation using adaptive cartesian grid techniques," Un iversity of Florida, Gainesville. Singh, R., N'Dri, N., Uzgoren, E., Shyy, W ., and Garbey, M. (2005). "Three-Dimensional Adaptive, Cartesian Grid Method for Multiphase Flow Computations." 43 rd AIAA Aerospace Sciences Meeting and Exhibit , 2005. Singh, R., and Shyy, W. (2006). "Three-Dimens ional Adaptive Grid Computation with Conservative, Marker-Based Tracking for Interfacial Fluid Dynamics." 44 th AIAA Aerospace Sciences Meeting and Exhibit , 1-23. PAGE 171 156 Smith, B., Bjorstad, P., and Gropp, W. (2004). Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations , Cambridge University Press. Sotiropoulos, F., and Abdallah, S. (1991). "The discrete continuity equation in primitive variable solutions of incompressible flow." Journal of Computational Physics , 95(1), 212-227. Sussman, M. (2003). "A second order coupled level set and volume-of-fluid method for computing growth and collapse of vapor bubbles." Journal of Computational Physics , 187(1), 110-136. Sussman, M. (2005). "A parallelized, adaptive algorithm for multiphase flows in general geometries." Computers and Structures , 83(6-7), 435-444. Sussman, M., and Puckett, E. (2000). "A coupled level set and volume-of-fluid method for improved interface capturing." Journal of Computational Physics , 162, 301â€“ 337. Tafti, D. (1995). "Alternate Formulations for the Pressure Equation Laplacian on a Collocated Grid for Solving the Unsteady Incompressible Navier-Stokes Equations." Journal of Computational Physics , 116(1), 143-153. Teresco, J. D., Faik, J., and Flaherty, J. E. (2005). "Hierarchical partitioning and dynamic load balancing for scientific computation." Lecture Notes in Computer Science , 3732, 911. Tornberg, A. K., and Engquist, B. (2004). "Num erical approximations of singular source terms in differential equations." Journal of Computational Physics , 200(2), 462488. Torres, D. J., and Brackbill, J. U. (2000). "The point-set methodFront-tracking without connectivity." Journal of Computational Physics , 165(2), 620-644. Udaykumar, H. S., Mittal, R., Rampunggoon, P., and Khanna, A. (2001). "A sharp interface Cartesian grid method for simulating flows with complex moving boundaries." Journal of Computational Physics , 174(1), 345-380. Uzgoren, E., Shyy, W., and Garbey, M. "Par allelized Domain Decomposition Techniques for Multiphase Flows." ASME Heat Transfer/Fluids Engineering Summer Conference , Charlotte, North Carolina. Walshaw, C., Cross, M., and McManus, K. (2002). "Multiphase Mesh Partitioning for Parallel Computational Mechanics Codes." Proceedings of the International Conference on Computati onal Science-Part II , 943-952. Wang, Z., and Wang, Z. J. (2004). "The Le vel Set Method on Adaptive Cartesian Grid For Interface Capturing." AIAA Paper No , 82. PAGE 172 157 Wasekar, V. M., and Manglik, R. M. (2003). "Short-Time-Transient Surfactant Dynamics and Marangoni Convection Around Boiling Nuclei." Journal of Heat Transfer , 125(5), 858-866. Wasfy, T., West, A. C., and Modi, V. (1998) . "Parallel finite element computation of unsteady incompressible flows." International Journal for Numerical Methods in Fluids , 26(1), 17-37. Wesseling, P. (1982). "A robust and efficient multigrid method." Xiao, F. (1999). "A Computational Model fo r Suspended Large Rigid Bodies in 3D Unsteady Viscous Flows." Journal of Computational Physics , 155(2), 348-379. Xu, S., and Wang, Z. J. (2006). "An imme rsed interface method for simulating the interaction of a fluid with moving boundaries." Journal of Computational Physics , 216(2), 454-493. Yang, J., and Balaras, E. (2006). "An em bedded-boundary formulation for large-eddy simulation of turbulent flows in teracting with moving boundaries." Journal of Computational Physics , 215(1), 12-40. Ye, T., Mittal, R., Udaykumar, H. S., and Shyy, W. (1999). "An accurate Cartesian grid method for viscous incompressible flow s with complex immersed boundaries." Journal of Computational Physics , 156(2), 209-240. Ye, T., Shyy, W., Tai, C. F., and Chung, J. N. (2004). "Assessment of sharp-and continuous-interface methods for drop in static equilibrium." Computers and Fluids , 33(7), 917-926. Zhu, L., and Peskin, C. S. (2002). "Simulation of a flapping flexible filament in a flowing soap film by the im mersed boundary method." Journal of Computational Physics , 179(2), 452-468. PAGE 173 158 BIOGRAPHICAL SKETCH Eray Uzgoren was born on June 23, 1976, in Ankara, Turkey, and has lived there until he came to the United States for the graduate school. He has obtained his B.S. degree in mechanical engineering from Middle East Tec hnical University in Ankara, Turkey, in Ju ly 1999. In January 2002, he started his Ph.D. study in the Department of Mechanical and Aerospace Engineering at the University of Florida, Gainesville. He got his M.S. degree in mechanical and aerospace engineering from the University of Florida in May 2003. His main research interests lie in numerical simulations of multiphase flows. He has worked on his dissertation with Professor Wei Shyy. During his Ph.D., he has served as a t eaching assistant for one semester. He has been also managing the website for the comput ational thermo-fluid s research group and administrating a Beowulf cluster for hi gh performance parallel computing. |