REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING UNIT

By
TIANYUN NI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

© 2008 Tianyun Ni

To my family, particularly my father, and to all of whom have lent encouragement and support during the time I put on this research

ACKNOWLEDGMENTS

I wish to express my sincerest thanks to the chair of my dissertation committee, Dr. Jörg Peters, for working with me throughout this long enterprise.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Motivation
  1.2 Problem Statement
  1.3 Modern GPU Pipeline and Current Trends
  1.4 Representations in Surface Modeling
    1.4.1 Subdivision Surfaces
    1.4.2 Parametric Patches
      1.4.2.1 Bezier technique
      1.4.2.2 Related work

2 A NEW SCHEME FOR SURFACE CONSTRUCTION
  2.1 Contribution
  2.2 The Conversion Algorithm
    2.2.1 The Conversion Rules for a Type-1 Quad
    2.2.2 The Conversion Rules for a Type-2, or Type-3 Quad
  2.3 Derivation of the coefficients of a c-patch
    2.3.1 Derivation of λ0 and λ1
    2.3.2 Derivation of b211 and b121
    2.3.3 Derivation of b112
  2.4 Smoothness Verification
  2.5 Complexity Analysis
    2.5.1 Number of Patches
    2.5.2 Cost of Patch Construction
    2.5.3 Cost of Surface Evaluation
  2.6 Approximation Catmull-Clark Subdivision Surface
  2.7 Water-Tight Surface Verification
  2.8 Discussion

3 GPU IMPLEMENTATION
  3.1 Overview
  3.2 2-pass Approach
  3.3 1-pass Approach
  3.4 Coordinate System Transformation
  3.5 Water-Tight Evaluation
  3.6 Conclusion

4 RESULTS
  4.1 Shape Quality
  4.2 Performance
  4.3 Displacement Mapping
  4.4 Morphing and Animation
  4.5 Conclusion

5 PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

6 DISCUSSION AND FUTURE WORK
  6.1 Future GPU API
  6.2 Volume Preservation
  6.3 Adaptive Tessellation

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 ALU operations for evaluation at (u, v)
4-2 Performance results
4-3 Performance of the 1-pass implementation

LIST OF FIGURES

1-1 Polygonal modeling
1-2 Problem statement
1-3 DirectX 10 pipeline stages
1-4 DirectX 10 pipeline
1-5 The primitives
1-6 The notations of input mesh
1-7 The three possible configurations
1-8 The Catmull-Clark stencils
1-9 The subdivision schemes
1-10 The suggested rendering passes
1-11 Future GPU architecture
1-12 The subdivision schemes
2-1 Derivation of c-patch
2-2 Vertex computation
2-3 Surface conversion
2-4 Computing control points v, e, f and t, the projection of e
2-5 Patch-based computation
2-6 Patch computation
2-7 The reparameterization of λ to meet G1 at the vertex
2-8 Coefficients b211 and b121 of c-patch is derived on top of a ghost patch
2-9 The choice of middle point in c-patch
2-10 The center of a bicubic patch can be evaluated by the linear combination of the boundary coefficients
2-11 C1 transition between a triangular and a bicubic patch
2-12 G1 transition between two triangular patches
3-1 2-Pass implementation
3-2 2-Pass conversion
3-3 1-Pass conversion
3-4 1-Pass implementation
3-5 (u, v) on an irregular quad
3-6 Watertight Evaluation
4-1 Shape quality comparison
4-2 Catmull-Clark approximation comparison
4-3 Ordinary patches and extraordinary patches
4-4 GPU smoothed quad surfaces with displacement mapping
4-5 Close-up of the frog. The refined mesh is watertight.
4-6 Displacement mapping on the frog model
4-7 Shape comparison
4-8 Shape comparison
4-9 Real time animation on the Sword model
4-10 Real time animation on the Frog model
4-11 Asynchronous animation of nine Frogs
5-1 The reasons for using Tri/Quad/Pent Meshes
5-2 A quad/tri/pent model
5-3 Patch representations
5-4 Triangular representation

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING UNIT

By
Tianyun Ni
August 2008

Chair: Jörg Peters
Major: Computer Engineering

Increased realism in interactive graphics and gaming requires complex smooth surfaces to be rendered at ever higher frame rates. In particular, representations used to model surfaces offline, such as spline and subdivision surfaces, have to be modified or reorganized to allow for efficient usage of the graphics processing unit and its SIMD (Single Instruction, Multiple Data) parallelism. This dissertation presents a novel algorithm for converting quad meshes on the GPU to smooth, watertight surfaces at the highest speed documented so far. The conversion reproduces bicubic splines wherever possible and closely mimics the shape of the Catmull-Clark subdivision surface by c-patches where a vertex has a valence different from 4. The smooth surface is piecewise polynomial and has well-defined normals everywhere.

CHAPTER 1
INTRODUCTION

This chapter introduces the challenges that motivate the dissertation, gives a detailed literature review, positions the research relative to the current state of the art, and gives an overview of the modern GPU pipeline.

1.1 Motivation

In graphics, 3D objects are approximated by polyhedral meshes of great complexity. For example, a game character can consist of tens of thousands of polygons (Figure 1-1). Increased realism in interactive gaming demands that such meshes be animated and rendered in real time. There are essentially two major approaches in the literature that serve this purpose: Polygonal Modeling and Higher-order Surface Modeling.

There are two scenarios of animation: morphing and skinning. Morphing changes one image into another through a seamless transition. Skinning is a common technique to deform characters [20, 23, 24, 32]: the animated mesh, referred to as a "skin", is deformed based on the pose of an underlying skeleton.

In Polygonal Modeling (Figure 1-1), skinning and morphing are applied to a high-detail mesh created by an artist. Most games currently use this approach. The technique involves redundant work because the polygonal representation allows only minimal sharing. In addition, the large number of vertices of a complex mesh must be fed into the graphics pipeline via the GPU's memory bus, which is a potential bottleneck.

Figure 1-1. Polygonal Modeling: currently the popular animation approach in games.

The alternative approach, Surface Modeling, animates a coarse mesh (Figure 1-2). Subdivision surfaces and parametric patches, two popular higher-order surface representations, both support level-of-detail rendering (see Section 1.4). Highly detailed 3D models are produced by displacement mapping [11]. Displacement mapping adds fine details in the form of scalar fields on the smooth surface defined by the coarse mesh. As a specific instance, Lee [27] proposes the Displaced Subdivision Surface, which represents a detailed surface model as a scalar-valued displacement over a smooth surface domain.

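The displacement step can be illustrated with a minimal sketch. It assumes that the smooth surface has already been sampled into positions with unit normals; the function name `displace` and the sample values are hypothetical and are not taken from [27].

```python
def displace(positions, normals, heights):
    """Move each sampled surface point along its unit normal by a scalar height."""
    displaced = []
    for (px, py, pz), (nx, ny, nz), h in zip(positions, normals, heights):
        displaced.append((px + h * nx, py + h * ny, pz + h * nz))
    return displaced


# Two samples on the plane z = 0, both with normal (0, 0, 1).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
heights = [0.25, -0.5]  # one scalar per sample, as stored in a displacement map
detailed = displace(points, normals, heights)
```

Only one scalar per sample is stored, which is the source of the memory savings of a displacement map over a full three-dimensional offset vector.
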
Surface Modeling reduces the number of vertices that must be read and animated in each frame because complex geometric details are generated on the GPU. The runtime cost now includes the conversion from the coarse input mesh to the final complex mesh. The conversion involves surface construction, evaluation and displacement mapping.

Figure 1-2. Each high-detail mesh in Surface Modeling is represented by a coarse control mesh with a displacement map. The coarse control mesh is first converted to a smooth surface. Then the surface is tessellated and the vertices are perturbed in the normal directions based on the corresponding value in the displacement map. Last, the normal at each vertex of the refined highly detailed mesh is updated.

In summary, the advantages of Surface Modeling are
1. lower computation cost of animation because skinning is done on the coarse mesh, not the final dense mesh;
2. memory and bandwidth savings by encoding most detail as one-dimensional displacements rather than three-dimensional vectors;
3. support of refinement level on the fly;
4. customization of archetypes: we can model different 3D models with the same coarse mesh, changing only the displacement map;
5. support of adaptive tessellation: evaluation does not have to be on a uniform grid.

The disadvantage of Surface Modeling is that modern GPUs cannot render such surfaces directly. The surface must be converted into triangles or quads through a process of tessellation and evaluation. Therefore, Surface Modeling becomes attractive as a real-time technique only if the conversion is cheaper than reading and animating a high-polygon mesh. Our goal is to design such a scheme on the GPU.

1.2 Problem Statement

Meshes consisting purely of quadrilateral facets are common in modeling for animation.
Any polyhedral mesh can be converted into such a quad mesh by one step of mesh refinement. But a good designer creates meshes with the quad restriction in mind so that no global refinement is necessary. We therefore focus on quadrilateral meshes and aim to derive a set of efficient rules that run directly on the GPU (Figure 1-2, the red dotted rectangle) and produce surfaces with good visual quality. Specifically, the resulting surfaces should
1. consist of a small number of low-degree polynomial pieces;
2. possess smooth geometry (no extra cost for smooth shading);
3. closely approximate Catmull-Clark surfaces (a standard modeling tool);
4. be watertight (no pixel drops out);
5. map well to the graphics pipeline and leverage the strengths of GPU computation.

1.3 Modern GPU Pipeline and Current Trends

A graphics processing unit (GPU) is a dedicated graphics rendering device. Its SIMD architecture has evolved substantially over the last decade. This highly parallel structure makes it more effective than general-purpose CPUs for a range of algorithms. Modern GPUs expose a programmable parallel stream processing pipeline as a series of short programs called shaders. During the last five years, major graphics software libraries such as OpenGL and DirectX have been used to program the GPU via shaders on a programmable pipeline, which has mostly superseded the older "fixed-function pipeline". The two most popular graphics software libraries, DirectX and OpenGL, currently both specify APIs for three types of shaders: vertex, geometry, and pixel shaders. The shaders in the DirectX 10 system [4] (Figure 1-4) share a common core that accesses up to 128 memory buffers and 16 parameter (constant) buffers. Vertex and pixel shaders use a "one-in, one-out" data processing model. In contrast, the geometry shader has a limited ability to amplify or reduce the primitive count and thus is able to change meshes.
Figure 1-3 shows the input and output of each pipeline stage.

Stage | Type | Input | Output
Input Assembler (IA) | Fixed | Vertex and index buffers | Vertices
Vertex Shader (VS) | Programmable | A vertex, up to 8 32-bit 4-component data | A vertex, up to 8 32-bit 4-component data
Geometry Shader (GS) | Programmable | A primitive, up to 8 32-bit 4-component data | Any number of primitives, up to 1024 32-bit 4-component data
The Rasterizer (TR) | Fixed | Vertices and attributes of a single primitive | Fragments
Pixel Shader (PS) | Programmable | A fragment, up to 8 32-bit 4-component data | A fragment, up to 8 32-bit 4-component data
Output Merger (OM) | Fixed | A fragment | A pixel

Figure 1-3. The input and output of each pipeline stage in the DirectX 10 system.

Figure 1-4. DirectX 10 Pipeline.

A more detailed explanation of each stage is as follows:

1. The Input Assembler (IA) gathers vertex data to set up vertex and index buffers. Vertex buffers contain per-vertex data while index buffers define geometry primitives as integer indices into vertex buffers. Indexing helps avoid redundant computations of the same vertex.

2. The vertex shader (VS) typically processes vertex-based operations such as changing the position and normal of a single vertex. The computations in this stage are local. Each vertex only has its own information and does not communicate with other vertices. The VS is most commonly used to transform vertices from object space to clip space.

3. The geometry shader (GS) processes the vertices of a single primitive. A primitive can be a point, a line segment, a triangle, a point with adjacency, a line segment with adjacency, or a triangle with adjacency (Figure 1-5). Because all vertices of the primitive are available (up to 6 vertices for a triangle with adjacency), the computations in this stage are less local than those on the VS and PS. The GS can emit additional primitives.
This new amplification feature, introduced in DirectX 10, adds flexibility and makes it possible to implement a number of algorithms [1] on the GPU, such as mesh refinement, shadow volumes and dynamic particle systems. The geometry shader output may be fed to the rasterizer stage and/or to a vertex buffer in memory via the stream output stage.

4. The rasterizer (TR) is a fixed-function stage that generates fragments by filling in the polygons sent through the graphics pipeline. Clipping, culling, perspective divide, viewport transform, primitive setup, scissoring and depth offset also happen in this stage.

5. The pixel shader (PS) operates on one fragment at a time. Usually scene lighting and pixel-related effects such as bump mapping and color tone mapping occur in the PS.

6. The output merger (OM) takes a fragment from the PS and performs traditional stencil and depth testing operations as well as render target blending to generate a final pixel on the screen.

Figure 1-5. The six primitives used in the GS: point, point with adjacency, line segment, line segment with adjacency, triangle, triangle with adjacency.

The future GPU pipeline [29, 48] is expected to provide a Tessellation Unit, combined with new shader stages for patch conversion and evaluation of tessellated higher-order surfaces. The Tessellator provides a solution to adaptive refinement on the graphics hardware. Based on user-provided tessellation factors per edge, the tessellator adaptively creates a sampling pattern of the underlying parametric domain and automatically generates a set of parametric domain points. In addition, two special shaders are introduced to the next-generation GPU pipeline. The patch shader converts an input mesh to a set of patches. The evaluation shader takes the (u, v) output of the tessellator and evaluates the patch at (u, v).
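The division of labor between the tessellator and the evaluation shader can be sketched on the CPU as follows. This is only an illustration under simplifying assumptions: a single tessellation factor is used for all edges (the tessellator described above is adaptive per edge), and the names `uniform_domain_samples`, `evaluate_samples` and the bilinear test patch are hypothetical.

```python
def uniform_domain_samples(factor):
    """Return (u, v) samples on the unit square for `factor` segments per edge."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    steps = [i / factor for i in range(factor + 1)]
    return [(u, v) for v in steps for u in steps]


def evaluate_samples(patch, factor):
    """Stand-in for the evaluation shader: apply `patch` to every (u, v) sample."""
    return [patch(u, v) for (u, v) in uniform_domain_samples(factor)]


def bilinear(u, v):
    """Hypothetical patch: bilinear interpolation of four 2D corner points."""
    c00, c10, c01, c11 = (0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)
    return tuple(
        (1 - u) * (1 - v) * c00[a] + u * (1 - v) * c10[a]
        + (1 - u) * v * c01[a] + u * v * c11[a]
        for a in range(2))


grid = evaluate_samples(bilinear, 2)  # 3 x 3 samples of the patch
```

Each sample is evaluated independently of all others, which is why many arithmetic units can run the same evaluation code in parallel.
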
The future GPU architecture also allows the GPU to exploit more parallelism because multiple arithmetic units can run the same evaluation shader. Moreover, tessellation occurs on the GPU and overcomes the bottleneck of bus bandwidth caused by model complexity. The new GPU design indicates that Surface Modeling is the trend for real-time graphics.

1.4 Representations in Surface Modeling

In Computer Graphics, surfaces are represented by polyhedral meshes. A polyhedral mesh is a collection of vertices, edges and facets. The valence of a vertex is the number of its incident edges. Each facet is an n-sided polygon. In a triangular (or quadrilateral) mesh, n equals 3 (or 4 respectively). An arbitrary mesh has n-sided polygons where the value of n is arbitrary. The difference between regular and irregular vertices is explained in Figure 1-6. Figure 1-7 illustrates the three possible types of a facet.

Mesh | Facets | Regular vertex | Irregular vertex
Tri. | Triangles | Valence = 6 | Valence != 6
Quad. | Quadrilaterals | Valence = 4 | Valence != 4

Type 1: all vertices of a facet are regular. Type 2: exactly one vertex of a facet is irregular. Type 3: more than one vertex of a facet is irregular.

Figure 1-6. Tri and Quadrilateral meshes and facet types 1, 2, 3.

Figure 1-7. The three possible configurations. A Type-1 quad is regular. A Type-2 or Type-3 quad is irregular.

Parametric patches and subdivision surfaces are major tools for modeling free-form surfaces with arbitrary topology. A more intuitive way for inexperienced users to create shape, by drawing curves or sketching, is also available [22, 36].

1.4.1 Subdivision Surfaces

Subdivision surfaces, as part of standard modeling packages (e.g., 3DMax, Maya, Softimage, Mirai, Lightwave, etc.), have proven to be a useful modeling tool. Subdivision schemes were first introduced by [10, 12, 31]. They generate a smooth surface through a mesh refinement process. This method begins with a coarse mesh that approximates a 3D model, known as a control mesh.
Each vertex in the control mesh is called a control point. Control points influence the shape of the limit surface. The mesh is refined in each subdivision step by inserting new vertices into the mesh, refining existing point positions, and updating the connectivity. The positions of the new vertices are computed by averaging rules that apply to the positions of nearby old vertices. The averaging rules differ from scheme to scheme (see a comparison in Figure 1-9), and it is these rules that determine the properties of the surface. The graphs that illustrate the rules are called stencils. Binary subdivision splits each edge into two while ternary subdivision splits each edge into three. Usually each subdivision scheme has at most three types of rules: vertex stencil, edge stencil, and face stencil. For example, the stencils of Catmull-Clark subdivision are shown in Figure 1-8. The refinement rules include stencils for smooth surfaces as well as special rules for creating sharp or semi-sharp features. Each refinement step produces a denser mesh than the previous one. The limit subdivision surface is the surface produced by this process after infinitely many refinements. In practical use, however, the algorithm is applied only a limited number of times, usually four.

Figure 1-8. The stencils used in Catmull-Clark subdivision (face stencil, edge stencil, vertex stencil). These stencils define the rules to derive the new vertices that lie on the old vertices, edges, and facets.

A realization of tessellation-on-the-fly for Loop subdivision surfaces was proposed in [33]. Pulli [44] implemented Loop's subdivision scheme with additions by Hoppe et al. [19]. Bischoff [3] proposed a forward-differencing method that only requires a constant amount of memory regardless of the subdivision step. DeRose [13] generalized the infinitely sharp creases of [19] to obtain semi-sharp creases.
Hoppe [19] extended Loop's scheme by introducing subdivision rules that lead to a piecewise smooth surface with features such as creases, corners, darts, and conical vertices.

Figure 1-9. Classification of common subdivision schemes. Rows as listed in the figure:
Catmull-Clark: any, C2, C1, No
Doo-Sabin: C2, C1, No
Loop: C2, C1, No
Butterfly: C1, C1, Yes
Kobbelt: C1, C1, Yes
Simplest: any, C1, C1, No
Sqrt(3): C2, C1, No
4-8: C4, C1, No
Ternary Triangle: C4, C1, No
Quad/Triangle: M, C2, C1, No
4-3: M, C2, C1, No
Ternary Quad: C2, C1, No

Adaptive subdivision can dramatically speed up rendering because the level of detail (LOD) is updated based on the dynamic distance to the camera as well as the complexity of each part of the model. Adaptive refinement was previously implemented using a quadtree data structure [50]. Each level of the tree represents one refinement level of the mesh. However, it is difficult to map the recursive non-uniform tree structure to parallel computation. Bunnell [9] provides code for adaptive refinement. Even though this code was optimized for an earlier generation of GPUs, the implementation adaptively renders subdivision surfaces in real time on current hardware. Lai and Cheng [26] implemented adaptive Catmull-Clark subdivision. Hardware architecture support for adaptive refinement is proposed by [5].

The implementations of subdivision surfaces on the GPU can be roughly categorized into three groups: (I) recursive evaluation [9, 13, 28, 44, 46]; (II) direct evaluation [45, 47]; (III) pre-tabulated basis function composition [6, 7]. Recursive evaluation is the most intuitive approach, but not the most efficient. Stam [47] directly evaluates subdivision surfaces at arbitrary parameter values. However, Stam's method cannot evaluate a mesh that contains Type-3 quads. Moreover, the required projection of control points into the eigenspace is too complex for large meshes on the GPU. The methods of [6, 7, 9, 46] cannot convert a mesh with Type-3 quads either.
Getting rid of those quads usually means applying at least one Catmull-Clark subdivision step on the CPU and a fourfold data transfer to the GPU. In more detail, Shiue implements recursive Catmull-Clark subdivision using several passes via the pixel shader, using textures for storage and spiral-enumerated mesh fragments for maximizing parallelism [46]. Bolz tabulates the subdivision nodal functions up to a given density and linearly combines them on the GPU [6, 7]. The number of nodal functions equals the number of vertices of the input mesh.

One obvious advantage of subdivision surfaces is that they can model surfaces of arbitrary topological type. Also, because each scheme has static refinement rules, subdivision surfaces are easy to implement. Although subdivision surfaces have been known for nearly twenty years, their use has been hindered in real-time applications such as games because recursive refinement is efficient neither in memory nor in performance. Multiple passes are required to render a visually smooth surface. Moreover, the approximately fourfold increase of geometry after each subdivision step causes heavy memory traffic on the bus between the CPU and the GPU.

1.4.2 Parametric Patches

Since current and impending GPU configurations favor short explicit surface definitions over recursively defined surfaces, the alternative patch-based refinement has been advocated for fast rendering. Parametric patches (PP for short) are rendered directly in terms of their polynomial representations, as opposed to a collection of approximating facets. Generally speaking, a PP scheme converts control meshes to a set of patches that are parametric piecewise polynomials. PP schemes fit conveniently into a 2-pass implementation on the current graphics pipeline (Figure 1-10). The two rendering passes are combined into one pass in a future GPU pipeline (Figure 1-11) [48].

Figure 1-10. The suggested rendering passes. A patch-based surface construction pass converts the coarse quad mesh to parametric patches. In the following pass, the patches produced by the previous pass are evaluated over the tessellated domain, (u, v) or (s, t, w), and details are added using displacement mapping.

The overall speed of a PP scheme is influenced by both the complexity of the patches and the number of patches. In terms of shape, a desirable PP scheme ensures at least G1 continuity across adjacent patches and closely approximates subdivision surfaces. One of the biggest challenges is to ensure smoothness everywhere over the patches. Peters explained how to solve the vertex enclosure problem and geometric continuity in [39, 41]. GPU-based evaluation of trimmed NURBS surfaces is proposed in [16, 25]. Peters [40] used an approximation to the limit surface of Doo-Sabin subdivision to get a quickly convergent series of approximations to the volume of the enclosed subdivision surface. The difficult problem of filling n-sided holes was recently solved by [21, 42]. Bajaj et al. [2] introduced A-patches in trivariate BB form with few free parameters to adjust the shape both locally and globally. In [15], an explicit spline representation of smooth free-form surfaces, in either NURBS form or as cubic triangular Bézier patches, forms the basis of an interactive sculpting environment. In the spirit of the Tessellator, Boubekeur [8] describes a generic refinement pattern for Surface Modeling (tessellation + displacement) on any programmable GPU.

Figure 1-11. One possible pass on the future graphics rendering pipeline.

1.4.2.1 Bezier technique

The Bézier form is a parametric surface representation and was first developed in 1972 by the French engineer Pierre Bézier. A comprehensive overview of the Bézier form can be found in [43]. A Bézier patch is defined by control points. A Bézier surface, as a set of Bézier patches, is piecewise polynomial.
Bézier patches are visually intuitive and mathematically convenient due to the following properties:
1. Affine invariance: Applying an affine transformation to a control mesh applies it to the corresponding Bézier patch as well.
2. The convex hull property: A Bézier patch lies completely within the convex hull of its control points, and therefore also completely within the bounding box of its control points in any given Cartesian coordinate system.

There are two types of Bézier patch. A tensor-product patch in Bézier form of degree m by n is defined as

$b(u, v) := \sum_{i=0}^{m} \sum_{j=0}^{n} b_{ij} \binom{m}{i} (1-u)^{m-i} u^{i} \binom{n}{j} (1-v)^{n-j} v^{j}$

where (u, v) are the parameters on the domain [0, 1] x [0, 1]. A triangular Bézier patch of degree n is defined as

$b(s, t, w) := \sum_{i+j+k=n} b_{ijk} \frac{n!}{i!\,j!\,k!} s^{i} t^{j} w^{k}$

where (s, t, w) are the barycentric coordinates on a triangle domain.

1.4.2.2 Related work

For quadrilateral input meshes, it is well known that Type-1 quads can be converted into degree 3 by 3 patches in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from ordinary quads will join C2. The interesting aspect is the conversion of Type-2 and Type-3 quads. A number of techniques (see a comparison in Figure 1-12) exist to smooth out quad meshes. Peters [38] generates NURBS output that could be rendered, for example, by the GPU algorithm of [17]. But this has not been implemented. The method of [30] generates one bicubic patch per quad following the shape of Catmull-Clark surfaces. Since these bicubic patches typically do not join smoothly, Loop and Schaefer compute two additional patches whose cross product approximates the normal of the bicubic patch. As pointed out in [49], this trompe l'oeil represents a simple solution when true smoothness is not needed. Comparing the number of operations in construction and evaluation, the method of [30] should run at speeds comparable to our GPU quad mesh smoothing. Our method [37] designs a c-patch for converting an irregular quad.
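The two Bézier forms defined in Section 1.4.2.1 can be evaluated directly from their Bernstein weights. The following is a minimal sketch for illustration only; the function names and the flat test patch are hypothetical, and a GPU implementation would organize the arithmetic differently.

```python
from math import comb, factorial


def bernstein(degree, k, t):
    """Bernstein polynomial of the given degree and index k at parameter t."""
    return comb(degree, k) * (1.0 - t) ** (degree - k) * t ** k


def eval_tensor_patch(ctrl, u, v):
    """Evaluate a degree m-by-n tensor-product patch; ctrl[i][j] is a 3D point."""
    m, n = len(ctrl) - 1, len(ctrl[0]) - 1
    point = [0.0, 0.0, 0.0]
    for i in range(m + 1):
        for j in range(n + 1):
            weight = bernstein(m, i, u) * bernstein(n, j, v)
            for axis in range(3):
                point[axis] += weight * ctrl[i][j][axis]
    return tuple(point)


def eval_triangular_patch(ctrl, degree, s, t):
    """Evaluate a triangular patch; ctrl maps (i, j, k), i + j + k = degree, to 3D points."""
    w = 1.0 - s - t  # third barycentric coordinate
    point = [0.0, 0.0, 0.0]
    for (i, j, k), b in ctrl.items():
        multinomial = factorial(degree) // (factorial(i) * factorial(j) * factorial(k))
        weight = multinomial * s ** i * t ** j * w ** k
        for axis in range(3):
            point[axis] += weight * b[axis]
    return tuple(point)


# A flat bicubic patch whose control points form a regular grid over the unit square.
flat_bicubic = [[(i / 3.0, j / 3.0, 0.0) for j in range(4)] for i in range(4)]
center = eval_tensor_patch(flat_bicubic, 0.5, 0.5)
```

Because the Bernstein weights are non-negative and sum to one, every evaluated point is a convex combination of the control points, which is the convex hull property stated in Section 1.4.2.1.
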
The c-patches of [37] form a G1 surface. The alternative algorithm proposed by [35] uses a bi-5 Bézier patch for each irregular quad.

Figure 1-12. This figure compares existing PP schemes in terms of how well they meet the performance and shape measurements. m: number of input quads; geom = geometry patches, tan = tangent patches. Entries as listed in the figure:
Peters [38]: Bi3; C1; 4m
Vlachos et al. [49]: Cubic, Quadratic; 2m
Loop and Schaefer [30]: Bi3, 2 by 3; m geom, 2m tan
Myles et al. [35]: Bi5; m
Thesis [37]: Bi3, c-patch; m

CHAPTER 2
A NEW SCHEME FOR SURFACE CONSTRUCTION

2.1 Contribution

This thesis proposes a set of rules for converting a quadrilateral mesh to a surface consisting of bicubic splines wherever possible. Each irregular quad (Figure 1-7) is converted to a novel C1 surface patch (c-patch for short). The surface closely mimics the shape of the Catmull-Clark subdivision surface and is constructed entirely by local parallel operations on the GPU. The resulting surface is piecewise polynomial and has well-defined normals everywhere. The evaluation avoids pixel dropout.

A c-patch is a C1 piecewise polynomial patch with cubic boundary. It is defined by 24 coefficients whose instantiation for a smooth surface is given in Section xxx below and indicated in Figure 2-1. A c-patch has an alternative representation as four triangular, total degree 4 patches in Bernstein-Bézier form (Figure 2-5 right).

Figure 2-1. The c-patch coefficients. For $i = 0, 1, 2, 3$, the boundary coefficients $v^i$ and $e^i_j$ are defined by vertex neighborhoods (Figure 2-4 specifies the formulas). The interior coefficients are $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ (Figure 2-6), where $i = 0..3$, $j = 0..n_i$, and $n_i$ is the valence of $N^i$.

2.2 The Conversion Algorithm

Here we give the detailed algorithm for converting the quad mesh into coefficients that define a smooth surface of low degree. Essentially, the conversion from a mesh to a patch consists of computing new points near a vertex using the knowledge of the vertex neighborhood. A vertex neighborhood consists of a mesh point $p_*$ and the mesh points $p_k$, $k = 0, \ldots, 2N-1$, of all quads surrounding $p_*$ (Figure 2-2). The union of the four vertex neighborhoods is the quad neighborhood (Figure 2-3 a) that defines a patch. In our scheme, the patch is either a tensor-product bicubic Bézier patch or a c-patch.

Figure 2-2. Smoothing the vertex neighborhood according to Figure 2-4. The center point $p_*$, its direct neighbors $p_{2j}$ and diagonal neighbors $p_{2j+1}$ form a vertex neighborhood, $j = 0..n-1$.

Figure 2-3. (a) quad neighborhood, (b) bicubic, (c) c-patch. a) A quad neighborhood defining a surface piece. b) A bicubic patch with 4 x 4 control points. This patch is the output if the quad is regular, and is used to determine the shape of a c-patch c) if the quad is irregular. A c-patch is defined by 4 x 6 control points and can alternatively, for analysis, be represented as four C1-connected triangular pieces of degree 4 with degree 3 outer boundaries identical to the bicubic patch boundaries.

2.2.1 The Conversion Rules for a Type-1 Quad

Recall that a quad is Type-1 if all four vertices have 4 neighbors. Type-1 quads are considered regular in the literature. Such a facet is converted into a degree 3 by 3 patch in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from Type-1 quads will join C2. Figure 2-3 illustrates the derivation process from a quad to a bicubic Bézier patch. The conversion rules are shown in Figure 2-4.

Figure 2-4. Computing control points $v$, $e_j$, $f_j$ and $t_j$, the projection of $e_j$, at a vertex of valence $N$ from the mesh points $p_j$ of a vertex neighborhood; the subscripts are modulo $2N$. The figure lists one formula per control point; the formula for $f_j$ reads $f_j := (4p_* + 2p_{2j} + 2p_{2j+2} + p_{2j+1})/9$. By default, $\sigma_N := \left(c_N + 5 + \sqrt{(c_N + 9)(c_N + 1)}\right)/16$ with $c_N := \cos\frac{2\pi}{N}$, the subdominant eigenvalue of Catmull-Clark subdivision.
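Some of the quantities of Figure 2-4 can be checked numerically. The sketch below implements the rule for $f_j$ and the default $\sigma_N$ as given in the caption. For the vertex point it uses the widely known Catmull-Clark limit mask $(N^2 p_* + 4\sum_j p_{2j} + \sum_j p_{2j+1})/(N(N+5))$, which is an assumption about an equivalent form and not necessarily the expression printed in the figure. All function names are hypothetical.

```python
from math import cos, pi, sqrt


def subdominant_eigenvalue(valence):
    """sigma_N = (c + 5 + sqrt((c + 9)(c + 1))) / 16 with c = cos(2 pi / N)."""
    c = cos(2.0 * pi / valence)
    return (c + 5.0 + sqrt((c + 9.0) * (c + 1.0))) / 16.0


def face_points(center, ring):
    """f_j = (4 p* + 2 p_{2j} + 2 p_{2j+2} + p_{2j+1}) / 9, subscripts modulo 2N.

    ring lists the 2N neighbors in order: even entries are direct neighbors,
    odd entries are diagonal neighbors.
    """
    count = len(ring)
    points = []
    for j in range(count // 2):
        direct_a = ring[2 * j]
        diagonal = ring[2 * j + 1]
        direct_b = ring[(2 * j + 2) % count]
        points.append(tuple(
            (4.0 * c + 2.0 * a + 2.0 * b + d) / 9.0
            for c, a, b, d in zip(center, direct_a, direct_b, diagonal)))
    return points


def limit_point(center, ring):
    """Assumed limit mask: (N^2 p* + 4 sum direct + sum diagonal) / (N (N + 5))."""
    n = len(ring) // 2
    total = [n * n * c for c in center]
    for index, p in enumerate(ring):
        weight = 4.0 if index % 2 == 0 else 1.0
        for axis in range(len(total)):
            total[axis] += weight * p[axis]
    return tuple(value / (n * (n + 5)) for value in total)


# Regular (valence 4) neighborhood on a planar unit grid.
p_star = (0.0, 0.0, 0.0)
ring = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (-1, 1, 0),
        (-1, 0, 0), (-1, -1, 0), (0, -1, 0), (1, -1, 0)]
f = face_points(p_star, ring)
v = limit_point(p_star, ring)
```

For valence 4 the eigenvalue evaluates to 1/2 and the limit mask reduces to the bicubic B-spline weights (16, 4, 1)/36, so the regular case agrees with the standard B-spline to Bézier conversion.
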
A vertex $v$ computed according to Figure 2-4 is the limit point of Catmull-Clark subdivision as explained, for example, in [18]. The rules for $e_j$ and $f_j$ are the standard rules for converting a uniform bicubic tensor-product B-spline to its Bézier representation. The points $t_j$ are a projection of $e_j$ into a common tangent plane (see e.g. [15]). The default scale factor $\sigma$ is the subdominant eigenvalue of Catmull-Clark subdivision. We note that for $N = 4$, $e_{j+2} = 2v - e_j$ and $\sigma = 1/2$, so that the projection leaves the tangent control points invariant as $t_j = e_j$:

for $N = 4$: $t_j = v + (e_j - e_{j+2})/2 = v + (e_j - v) = e_j$. (2-1)

In the next stage, we combine information from four vertex neighborhoods, as shown in Figure 2-5, to populate a tensor-product patch $g$ of degree 3 by 3 in Bézier form [14]:

$g(u, v) := \sum_{k=0}^{3} \sum_{l=0}^{3} g_{kl} \binom{3}{k} (1-u)^{3-k} u^{k} \binom{3}{l} (1-v)^{3-l} v^{l}$

The patch is defined by its 16 control points $g_{kl}$. The formulas of Figure 2-4 make this patch the Bézier representation of a bicubic spline in B-spline form. For example, in the notation of Figure 2-5, $(g_{k0})_{k=0,\ldots,3} = (v^0, t^0_0, t^1_1, v^1)$.

Figure 2-5. Patch construction. On the left, four vertex neighborhoods $N^i$ with vertices $v^i$ each contribute one sector to assemble the 4 x 4 coefficients of the Bézier patch $g$, for example $g_{00} = v^0$, $g_{10} = e^0_0$, $g_{11} = f^0_0$, $g_{01} = e^0_1$ (we use superscripts to indicate vertices). On the right, the same four sectors are used to determine a c-patch if the underlying quad is extraordinary. The indices of the control points of $g$ and $b^i$ are shown. Note that only a subset of the coefficients of the four triangular pieces $b^i$ is actually computed to define the c-patch. The full set of coefficients displayed here is only used to analyze the construction. The indexing of the 15 coefficients of a quartic triangular patch (for example 310, 220, 130, 040 along one boundary) is shown on the right. We use this labeling throughout the dissertation.

2.2.2 The Conversion Rules for a Type-2, or Type-3 Quad

Type-2 and Type-3 quads are known as irregular.
The irregular quads have at least one and possibly up to four vertices with valence other than 4. For each irregular quad, the conversion involves two steps:

1. Apply the regular rules defined in Figure 2-4 to generate $v^i$ and $e^i$ shown in Figure 2-5, left.
2. Then apply the rules in Figure 2-6 to yield $b_{211}$, $b_{121}$ and $b_{112}$ shown in Figure 2-5, right.

We use the bicubic patch to outline the shape as we replace it by a c-patch (Figure 2-3c). A c-patch has the right degrees of freedom to cheaply and locally construct a smooth surface. We introduce the c-patch in terms of a well-known Bézier form of a polynomial piece $b^i$ of total degree 4 [14]:

\[ b^i(u,v) := \sum_{\substack{k+l+m=4\\ k,l,m\ge 0}} b^i_{klm}\,\frac{4!}{k!\,l!\,m!}\,u^k v^l (1-u-v)^m. \qquad (2\text{-}2) \]

The c-patch is equivalent to the union of four $b^i$, $i = 0, 1, 2, 3$, of total degree 4, but defined by only 4 x 6 c-coefficients constructed in Figures 2-4 and 2-6: $v^i, t^i_0, t^i_1, b^i_{211}, b^i_{121}, b^i_{112}$, $i = 0, 1, 2, 3$. These 24 c-coefficients imply the missing interior control points of the representation (2-2) by $C^1$ continuity between the triangular pieces: for $j = 0, 1, 2, 3$ and $i = 0, 1, 2, 3$,

\[ b^i_{3-j,0,1+j} = b^{i-1}_{0,3-j,1+j} := \big(b^i_{3-j,1,j} + b^{i-1}_{1,3-j,j}\big)/2, \qquad (2\text{-}3) \]

and the boundary control points $b^i_{kl0}$ are implied by degree-raising [14]:

\[ b^i_{400} := v^i,\quad b^i_{310} := (v^i + 3t^i_0)/4,\quad b^i_{220} := (t^i_0 + t^{i+1}_1)/2,\quad b^i_{130} := (v^{i+1} + 3t^{i+1}_1)/4,\quad b^i_{040} := v^{i+1}. \qquad (2\text{-}4) \]

For all objects with boundaries, the boundary rules are simply the derivation of cubic Bézier curves defined by $(v^i, t^i_0, t^{i+1}_1, v^{i+1})$. Basis functions corresponding to the 24 c-coefficients of the c-patch can be read off by setting one c-coefficient to one and all others to zero and then applying (2-3) and (2-4).

[Figure 2-6: formulas for $b^i_{211}$, $b^i_{121}$ and $b^i_{112}$]
Figure 2-6. Formulas for the 4 x 3 interior control points that, together with the vertex control points $v^i$ and the tangent control points $t^i_j$, define a c-patch. See also Figures 2-11 and 2-12. Here $c_i := \cos(2\pi/n_i)$, $s_i := \sin(2\pi/n_i)$ and superscripts are modulo 4. By default, $q := \frac{1}{64}\sum_{i=0}^{3}\big(v^i + 3(e^i_0 + e^i_1) + 9f^i\big)$, the central point of the ordinary patch.

2.3 Derivation of the Coefficients of a c-patch

When a c-patch sector $b$ meets a c-patch sector $a$ (Figure 2-12), the following equation must hold to preserve $G^1$ continuity across the boundary between $b$ and $a$:

\[ \lambda(u)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}5) \]

where $\lambda(u) := (1-u)\lambda_0 + u\lambda_1$ and, with $\cdot$ denoting the scalar product (applied to each of the three vector components),

\[ \partial_1 b(u,0) := 3\,(U_0, 2U_1, U_2)\cdot\big((1-u)^2, (1-u)u, u^2\big), \]
\[ \partial_2 b(u,0) := 4\,(v_0, 3v_1, 3v_2, v_3)\cdot\big((1-u)^3, (1-u)^2u, (1-u)u^2, u^3\big), \]
\[ \partial_1 a(0,u) := 4\,(w_0, 3w_1, 3w_2, w_3)\cdot\big((1-u)^3, (1-u)^2u, (1-u)u^2, u^3\big). \]

Equation (2-5) can be rewritten as the following collection of simplified equations in terms of $U_i$, $v_i$ and $w_i$:

\[ 3\lambda_0 U_0 = 4(v_0 + w_0), \qquad (2\text{-}6) \]
\[ 6\lambda_0 U_1 + 3\lambda_1 U_0 = 12(v_1 + w_1), \qquad (2\text{-}7) \]
\[ 3\lambda_0 U_2 + 6\lambda_1 U_1 = 12(v_2 + w_2), \qquad (2\text{-}8) \]
\[ 3\lambda_1 U_2 = 4(v_3 + w_3). \qquad (2\text{-}9) \]

2.3.1 Derivation of $\lambda_0$ and $\lambda_1$

The scalar $\lambda_0$ is derived from (2-6); (2-9) sets the constraint for $\lambda_1$. Let $U_0 := (1, 0)$, $V_0 := (\cos\frac{2\pi}{n_0}, \sin\frac{2\pi}{n_0})$ and $W_0 := (\cos\frac{2\pi}{n_0}, -\sin\frac{2\pi}{n_0})$ (Figure 2-7). We know $u_0 = \frac{3}{4}U_0$ from degree raising. Then

\[ v_0 + w_0 = \tfrac{3}{4}\Big(\tfrac{1 + \cos\frac{2\pi}{n_0}}{2}, \tfrac{\sin\frac{2\pi}{n_0}}{2}\Big) + \tfrac{3}{4}\Big(\tfrac{1 + \cos\frac{2\pi}{n_0}}{2}, -\tfrac{\sin\frac{2\pi}{n_0}}{2}\Big) = \tfrac{3}{4}\big(1 + \cos\tfrac{2\pi}{n_0}, 0\big) = \tfrac{3}{4}\big(1 + \cos\tfrac{2\pi}{n_0}\big)U_0. \qquad (2\text{-}10) \]

Hence $4(v_0 + w_0) = 3(1 + \cos\frac{2\pi}{n_0})U_0$ and, by (2-6), $\lambda_0 = 1 + \cos\frac{2\pi}{n_0}$. Similarly, with $V_3$ and $W_3$ defined analogously at the second vertex,

\[ 4(v_3 + w_3) = 3\big(1 - \cos\tfrac{2\pi}{n_1}\big)U_2. \qquad (2\text{-}11) \]

Hence, by (2-9), $\lambda_1 = 1 - \cos\frac{2\pi}{n_1}$.

2.3.2 Derivation of $b_{211}$ and $b_{121}$

To derive the formulas for $b_{211}$ and its symmetric counterpart $b_{121}$, note that the formulas must guarantee a smooth transition between $b^i$ and its neighbor patch on an adjacent quad regardless of whether the adjacent quad is regular or irregular. That is, the formulas are derived to satisfy simultaneously two types of smoothness constraints (see Section 2.4).

[Figure 2-7: tangent vectors at the two end points of the shared edge, with $n_i$ the valence of vertex $N_i$, $u_0 = \frac{3}{4}U_0$, $v_0 + w_0 = \frac{3}{4}(\cos\frac{2\pi}{n_0} + 1)U_0$ and $v_3 + w_3 = \frac{3}{4}(1 - \cos\frac{2\pi}{n_1})U_2$]
Figure 2-7. The reparameterization $\lambda$ to meet $G^1$ at the vertex.
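The relation annotated in Figure 2-7 can be checked numerically in the local frame where $U_0 = (1, 0)$. The C++ sketch below is illustrative only: the name `lambda0FromTangents` is hypothetical, and it assumes that $v_0$ and $w_0$ are 3/4 times the average of $U_0$ with $V_0$ and with $W_0$ respectively, which is one reading of the derivation of $\lambda_0$.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };

// Hypothetical helper: recovers lambda_0 from 3*lambda_0*U0 = 4*(v0 + w0)
// for a vertex of valence n, in the local frame where U0 = (1, 0).
double lambda0FromTangents(int n) {
    assert(n >= 3);
    const double pi = std::acos(-1.0);
    const double a = 2.0 * pi / n;
    const Vec2 U0{1.0, 0.0};
    const Vec2 V0{std::cos(a),  std::sin(a)};
    const Vec2 W0{std::cos(a), -std::sin(a)};
    // Assumption: v0 and w0 are 3/4 of the average of U0 with V0 and W0.
    const Vec2 v0{0.375 * (U0.x + V0.x), 0.375 * (U0.y + V0.y)};
    const Vec2 w0{0.375 * (U0.x + W0.x), 0.375 * (U0.y + W0.y)};
    const Vec2 sum{v0.x + w0.x, v0.y + w0.y};
    assert(std::abs(sum.y) < 1e-12);       // v0 + w0 is parallel to U0
    return 4.0 * sum.x / (3.0 * U0.x);     // solve 3*lambda_0*U0 = 4*(v0 + w0)
}
```

Under these assumptions the result agrees with $1 + \cos(2\pi/n)$, for instance 1 for valence 4 and 1/2 for valence 3.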
[Figure 2-8: triangular patches and ghost patch]
Figure 2-8. Coefficients $b_{211}$ and $b_{121}$ of a c-patch are derived on top of a ghost patch.

From Equation (2-7), we obtain

\[ b_{211} + a_{211} = \tfrac{1}{2}\lambda_0 U_1 + \tfrac{1}{4}\lambda_1 U_0 + 2b_{310}. \qquad (2\text{-}12) \]

To get a second constraint and determine $b_{211}$ uniquely, we consider the values $b_{211}$ and $a_{211}$ of each ghost patch in terms of sine averages (Figure 2-8): $4s_0(b_{211} - b_{310}) + 4s_1(b_{211} - b_{220}) = 3(f_0 - e_0)$ yields

\[ b_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f_0 - e_0)}{4(s_0 + s_1)}. \qquad (2\text{-}13) \]

Similarly,

\[ a_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} - 3(f_0 - e_0)}{4(s_0 + s_1)}. \qquad (2\text{-}14) \]

Therefore,

\[ b_{211} - a_{211} = \frac{3(f_0 - e_0)}{2(s_0 + s_1)}. \qquad (2\text{-}15) \]

Together with Equation (2-12),

\[ b_{211} = b_{310} + \tfrac{1}{4}\lambda_0 U_1 + \tfrac{1}{8}\lambda_1 U_0 + \frac{3(f_0 - e_0)}{4(s_0 + s_1)}. \qquad (2\text{-}16) \]

Equation (2-8) implies

\[ b_{121} + a_{121} = \tfrac{1}{4}\lambda_0 U_2 + \tfrac{1}{2}\lambda_1 U_1 + 2b_{130}. \qquad (2\text{-}17) \]

Using the same approach as for $b_{211}$, $4s_0(b_{121} - b_{220}) + 4s_1(b_{121} - b_{130}) = 3(f_1 - e_1)$ yields

\[ b_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f_1 - e_1)}{4(s_0 + s_1)}. \qquad (2\text{-}18) \]

Similarly,

\[ a_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} - 3(f_1 - e_1)}{4(s_0 + s_1)}. \qquad (2\text{-}19) \]

From (2-18) and (2-19),

\[ b_{121} - a_{121} = \frac{3(f_1 - e_1)}{2(s_0 + s_1)}, \qquad (2\text{-}20) \]

and from (2-17) and (2-20),

\[ b_{121} = b_{130} + \tfrac{1}{8}\lambda_0 U_2 + \tfrac{1}{4}\lambda_1 U_1 + \frac{3(f_1 - e_1)}{4(s_0 + s_1)}. \qquad (2\text{-}21) \]

The formulas (2-16) and (2-21) are the same as shown in Figure 2-6.

2.3.3 Derivation of $b_{112}$

By contrast, $b_{112}$ is not pinned down by continuity constraints. We could choose each $b^i_{112}$ arbitrarily without changing the formal smoothness of the resulting surface. However, we opt for increased smoothness at the center of the c-patch and additionally use the freedom to closely mimic the shape of Catmull-Clark subdivision surfaces, as we did earlier for vertices. First, we approximately satisfy four $C^2$ constraints across the diagonal boundaries at the central point $b_{004}$ (Figure 2-9) by enforcing the linear system (2-22) in the unknowns $b^i_{112}$, $i = 0, 1, 2, 3$, whose coefficient matrix is

\[ \begin{pmatrix} 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 1 \end{pmatrix} \]

and whose right-hand side combines $q$ with the coefficients $b^i_{211}$ and $b^i_{121}$. The perturbation by $q$ is necessary, since the coefficient matrix of the $C^2$ constraints is rank deficient. After perturbation, the system can be solved with the last equation implied by the first three. We add the constraint that the average of the $b^i_{112}$ matches $g := g(\frac{1}{2}, \frac{1}{2})$, the center position of the bicubic patch. That is, the last row of the system is replaced by $\frac{1}{4}(1, 1, 1, 1)$ with right-hand side $g$.

[Figure 2-9: control points involved in the $C^2$ constraints]
Figure 2-9. Dark lines cover the control points involved in the $C^2$ constraints (2-22). The points on dashed lines are implied by averaging.

The point $g$ lies on the bicubic patch at $u = 0.5$ and $v = 0.5$. The bicubic control points are given except for the 4 interior points, because all the control points on the boundaries are calculated. We can use a mask for determining Bézier control points from a uniform bicubic B-spline surface. Figure 2-10(a) is a mask for $b_{11}$. For the other interior points, we can use a symmetric mask.

[Figure 2-10: panels (a), (b), (c) showing masks on the 4 x 4 control net]
Figure 2-10. The center of a bicubic patch can be evaluated by the linear combination of the boundary coefficients.

Figure 2-10(b) shows a mask for the evaluation of the bicubic patch at (0.5, 0.5):

\[ g = \big(b_{00} + 3b_{01} + 3b_{02} + b_{03} + 3b_{10} + 9b_{11} + 9b_{12} + 3b_{13} + 3b_{20} + 9b_{21} + 9b_{22} + 3b_{23} + b_{30} + 3b_{31} + 3b_{32} + b_{33}\big)/64. \]

Now we can solve for the $b^i_{112}$, $i = 0, 1, 2, 3$, and obtain the formula of Figure 2-6.

2.4 Smoothness Verification

In this section we formally verify the following lemma. For the purpose of the proof, we view the c-patch in its equivalent representation as four Bézier patches of total degree 4.

Lemma 1. Two adjacent polynomial pieces $a$ and $b$ defined by the rules of Section 2.2 (Figure 2-4, Figure 2-6, (2-3), (2-4)) meet at least
(i) $C^2$ if $a$ and $b$ correspond to two regular quads;
(ii) $C^1$ if $a$ and $b$ are adjacent pieces of a c-patch;
(iii) $C^1$ if $a$ and $b$ correspond to two quads, exactly one of which is regular;
(iv) with tangent continuity if $a$ and $b$ correspond to two different irregular quads.

Proof: (i) If $a$ and $b$ are bicubic patches corresponding to regular quads, they are part of a bicubic spline with uniform knots and therefore meet $C^2$. (ii) If $a$ and $b$ are adjacent pieces of a c-patch then Equations (2-3) enforce $C^1$ continuity.
For the remaining cases, let $b$ be a triangular piece. Let $u$ be the parameter corresponding to the quad edge between $b_{400} = v_0$, where $u = 0$ and the valence is $n_0$, and $b_{040} = v_1$, where $u = 1$ and the valence is $n_1$ (Figure 2-11 for case (iii) and Figure 2-12 for case (iv)). By construction, the common boundary $b(u,0) = a(0,u)$ is a curve of degree 3 with Bézier control points $(v_0, t_0, t_1, v_1)$ so that bicubic patches on regular quads and triangular patches on irregular quads match up exactly. Denote by $\partial_1 b$ the partial derivative of $b$ along the common boundary and by $\partial_2 b$ the partial derivative in the other variable. Since $b(u,0) = a(0,u)$, we have $\partial_1 b(u,0) = \partial_2 a(0,u)$. The partial derivative of $a$ in the other variable is $\partial_1 a$. We will verify that the following conditions hold, which imply tangent continuity:

if one quad is ordinary (case (iii)),
\[ \partial_1 b(u,0) = 2\,\partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}23) \]
if both quads are extraordinary (case (iv)),
\[ \big((1-u)\lambda_0 + u\lambda_1\big)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}24) \]

where $\lambda_0 := 1 + c_0$, $\lambda_1 := 1 - c_1$ and $c_i := \cos(2\pi/n_i)$.

Both equations, (2-23) and (2-24), equate vector-valued polynomials of degree 3 (we write $\partial_1 b(u,0)$ in degree-raised form [14]). The equations hold if and only if all Bézier coefficients are equal. Offhand, this means checking four vector-valued equations for each of (2-23) and (2-24). However, in both cases, the setup is symmetric with respect to reversal of the direction in which the boundary $b(u,0)$ is traversed. That means we need only check the first two equations (2-23') and (2-23'') of (2-23) and the first two equations (2-24') and (2-24'') of (2-24). We verify these equations by inserting the formulas of Figures 2-4 and 2-6.

To verify (2-23), the key observation is that $n_0 = n_1 = 4$ if one quad is ordinary. Hence $c_0 = c_1 = 0$ and $s_0 = s_1 = 1$ (cf. Figure 2-6) and $t_j = e_j$. Therefore, for example (cf. Figure 2-11), $2\,\partial_2 b(0,0) = 2\cdot 4\,(b_{301} - v_0)$, where $b_{301} - v_0$ carries the factor 3/4 that stems from raising the degree from 3 to 4; and the second Bézier coefficients of $\partial_1 b(u,0)$ (in degree-raised form) and of $2\,\partial_2 b(u,0)$ are respectively (cf. Figure 2-11)

\[ 3(e_0 - v_0) + 6(e_1 - e_0) \quad\text{and}\quad 24(b_{211} - b_{310}) = 8\Big(\tfrac{3}{4}(e_1 - e_0) + \tfrac{3}{8}(e_0 - v_0) + \tfrac{9}{8}(f_0 - e_0)\Big). \]

[Figure 2-11: a triangular piece $b$ and a bicubic patch $a$ sharing the boundary with control points $v_0, e_0, e_1, v_1$]
Figure 2-11. $C^1$ transition between a triangular and a bicubic patch.

Then, comparing the Bézier coefficients of $\partial_1 b(u,0)$ and $2\,\partial_2 b(u,0) + \partial_1 a(0,u)$ yields equality and establishes $C^1$ continuity. After division by 3, the second coefficients are

\[ \partial_1 b:\ (e_0 - v_0) + 2(e_1 - e_0), \qquad 2\,\partial_2 b:\ 2(e_1 - e_0) + (e_0 - v_0) + 3(f_0 - e_0), \qquad \partial_1 a:\ -3(f_0 - e_0). \]

The equations for (2-24) are similar, except that we need to replace $e_j$ by $t_j$ and keep in mind that, by definition, $(t^0_1 - v_0) + (t^0_{-1} - v_0) = 2c_0(t_0 - v_0)$.

[Figure 2-12: two triangular pieces $a$ and $b$ sharing a boundary]
Figure 2-12. $G^1$ transition between two triangular patches.

Hence, for example,

\[ \partial_2 b(0,0) + \partial_1 a(0,0) = 4\,(b_{301} - v_0 + a_{301} - v_0) = 4\cdot\tfrac{3}{8}\big(2c_0(t_0 - v_0) + 2(t_0 - v_0)\big). \]

The first of the four coefficient equations of (2-24) then simplifies to

\[ 3(1 + c_0)(t_0 - v_0) = 4\,(b_{301} + a_{301} - 2v_0) = \tfrac{3}{2}\big(2c_0(t_0 - v_0) + 2(t_0 - v_0)\big). \]

Noting that the terms $3(f_0 - e_0)/(4(s_0 + s_1))$ in the expansions of $b_{211}$ and $a_{211}$ cancel, the second coefficient equation is

\[ 6\lambda_0(t_1 - t_0) + 3\lambda_1(t_0 - v_0) = 12\,(b_{211} + a_{211} - 2b_{310}) = 12\Big(\tfrac{2(1 + c_0)}{4}(t_1 - t_0) + \tfrac{2(1 - c_1)}{8}(t_0 - v_0)\Big). \]

It is easy to read off that the equalities hold. So the claim of smoothness is verified.

2.5 Complexity Analysis

2.5.1 Number of Patches

The conversion scheme yields the minimum set of patches because (1) no initial refinement of the input coarse mesh is needed; (2) each quadrilateral facet of the coarse mesh corresponds to only one patch. Namely, the total number of patches equals the number of facets in the mesh. The patch complexity of various schemes is compared in Figure 1-12. The low cost of construction and evaluation makes c-patches an attractive representation, not just on the GPU.

2.5.2 Cost of Patch Construction

The separation into vertex and patch construction means that the number of scaled vertex additions (adds) per patch is independent of the valence.
The cost of computing the control points per patch, i.e., with the cost of vertex computations distributed, is 4 x (4 + 1 + 1 + 2) = 32 adds per bicubic construction, and computing $t_j$ and determining $b_{211}$, $b_{121}$ and $b_{112}$ according to Figure 2-6 amounts to an additional 4 x (2 + 6 + 6 + 12) = 104 adds per c-patch. Each c-patch has 24 coefficients. This compares favorably to, say, [30] where 16+12+12 coefficients are generated.

2.5.3 Cost of Surface Evaluation

The patch can be evaluated at any parameter pair (u, v) using de Casteljau's algorithm. A tensor-product bicubic Bézier patch is defined by 16 control points. The evaluation at (u, v) needs 42 vector-vector additions, 42 scalar-vector multiplications, and 42 scalar-scalar operations. Similarly, the evaluation of a c-patch at (u, v) requires 40 vector-vector additions and 60 scalar-vector multiplications. In terms of evaluation cost, a c-patch costs roughly the same as a bicubic patch.

2.6 Approximation of the Catmull-Clark Subdivision Surface

Since Catmull-Clark subdivision is a standard modeling tool, our scheme is designed to approximate the Catmull-Clark subdivision surface. In fact, the resulting bicubic patches completely agree with the Catmull-Clark subdivision surface except in the immediate neighborhood of irregular mesh vertices. In such a neighborhood they join at least with tangent continuity and interpolate the limit of the irregular mesh vertex. Furthermore, the center of a c-patch interpolates the center point of the corresponding Catmull-Clark limit surface due to the choice of the c-patch coefficient $b_{112}$.

2.7 Watertight Surface Verification

Patches are evaluated independently. If the vertices generated along the boundary from the adjacent patches do not match exactly, the refined mesh will have a hole in it. There are three configurations for adjacent patches: (1) both are Bi3 patches, (2) one of them is a Bi3 patch, (3) both are c-patches.
The coefficients defining the shared boundary curve are derived by the averaging rules defined in Figure 2-4. Since additions are commutative, the generation of all boundary coefficients is independent of the choice of patch. In other words, no round-off error and no cracking are possible for the first case. The boundary coefficients of a c-patch are computed by the same rules in Figure 2-4; therefore watertightness is also achieved for the latter two cases. Note that the computation of the cubic boundaries shared by a bicubic patch and a c-patch is mathematically identical.

2.8 Discussion

The introduction of triangular patches to model quad patches is somewhat unconventional, but has been used in an I3D paper before [15]. Also [49] is based on triangular patches. Evaluation and normal computation of degree 4 triangular patches is comparable in cost to that of tensor-product bicubic patches: in the triangular case we have to average 15 control points, in the tensor-product case 16. Triangular patches may deserve more attention in OpenGL.

CHAPTER 3
GPU IMPLEMENTATION

3.1 Overview

We implemented the conversion scheme using C++ on the DirectX 10 pipeline. We compute vertex neighborhoods according to Figure 2-4 in the vertex shader and use the geometry shader primitive triangle with adjacency to accumulate the coefficients of the bicubic patch or compute a c-patch according to Figure 2-6. We implemented conversion plus rendering in two variants: a 1-pass and a 2-pass scheme.

3.2 2-pass Approach

[Figure 3-1: pipeline diagram with stages Input Assembler, Vertex Shader, Geometry Shader and Pixel Shader, annotated with the data passed between them]
Figure 3-1. 2-pass implementation detailed in Figure 3-2. The first pass converts, the second renders. Note that the geometry shader only computes at most 24 coefficients per patch and does not create (amplify to) evaluation point primitives.

Pass 1: Conversion
  VS In    $p_*$, $n$, $\sigma$
  VS       Use texture lookup to retrieve $p_{2j}$, $p_{2j+1}$
  VS Out   $v$, $t_0$, $t_1$, $f_j$, $j = 0..n-1$
  GS In    $v^i$, $t^i_0$, $t^i_1$, $f^i$, $i = 0..3$
  GS       if regular quad: assemble $g_{kl}$, $k,l = 0..3$
           else: compute $b^i_{211}$, $b^i_{121}$, $b^i_{112}$
  GS Out   if regular quad: stream out $g_{kl}$, $k,l = 0..3$
           else: stream out $b^i_{400}$, $t^i_0$, $t^i_1$, $b^i_{211}$, $b^i_{121}$, $b^i_{112}$, $i = 0..3$
Pass 2: Evaluation of Position and Normal
  VS In    $(u, v)$
  VS       if regular quad: compute normal and position at $(u, v)$ by the tensored de Casteljau algorithm
           else: compute the remaining Bézier control points;
                 compute normal and position at $(u, v)$ by de Casteljau's algorithm adjusted to c-patches
  VS Out   position, normal
  PS In    position, normal
  PS       compute color
  PS Out   color

Figure 3-2. 2-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. VS Out of Pass 1 outputs $N$ points $f_j$ for one vertex (hence the subscript) and GS In of Pass 1 retrieves four points $f^i$, each generated by a different vertex of the quad (hence the superscript).

The 2-pass implementation constructs the patches in the first pass using the vertex shader and the geometry shader and evaluates positions and normals in the second pass. Pass 1 streams out only the 4 x 6 coefficients of a c-patch and not the $4 \times \binom{4+2}{2}$ Bézier control points of the equivalent triangular pieces. The data amplification necessary to evaluate takes place by instancing a (u, v)-grid on the vertex shader in the second pass. That is, we do not stream back large data sets after amplification. Position and normal are computed on the (u, v) domain $[0..1]^2$ of the bicubic patch or of the c-patch (not on any triangular domains). We pretessellate the quad domain and store the results in a set of textures with different resolutions. If the tessellation factor is chosen to be $m$, the texture with $(m + 1)$ by $(m + 1)$ parametric values will be sent to the vertex shader in the subsequent evaluation pass. Given the pretessellated domain with a patch identifier, the vertex shader loads the appropriate control points and evaluates the patch. Figure 3-2 lists the input, output and the computations of each pipeline stage.
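The pretessellated domain described above can be sketched on the CPU as follows. This is only an illustration under stated assumptions: the names `UV` and `makeParameterGrid` are hypothetical, and the row-major layout is an arbitrary choice. The text does not specify how the parametric values are laid out in the texture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: one parametric sample of the quad domain [0,1]^2.
struct UV { float u, v; };

// Hypothetical helper: builds the (m+1) x (m+1) uniform parameter grid for
// tessellation factor m, stored row by row (v varies slowest).
std::vector<UV> makeParameterGrid(int m) {
    assert(m >= 1);
    std::vector<UV> grid;
    grid.reserve(static_cast<std::size_t>(m + 1) * static_cast<std::size_t>(m + 1));
    for (int j = 0; j <= m; ++j) {
        for (int i = 0; i <= m; ++i) {
            grid.push_back({static_cast<float>(i) / m, static_cast<float>(j) / m});
        }
    }
    return grid;
}
```

One such grid per supported tessellation factor would be computed once and reused for every patch, which matches the idea of storing the static values in a set of textures of different resolutions.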
Figure 3-1 illustrates this association of computations and resources. In order to avoid costly branching in HLSL (High Level Shader Language) and to optimize the performance, specialized shaders are written for patch construction and evaluation based on the patch type.

3.3 1-pass Approach

In the 1-pass implementation, the evaluation immediately follows conversion in the geometry shader, using the geometry shader's ability to amplify, i.e., to output multiple point primitives for each facet (Figure 3-4). While a 1-pass implementation sounds more efficient than a 2-pass implementation, DX10 limits data amplification in the geometry shader so that the maximal evaluation density is 8 x 8 per quad. Moreover, maximal amplification in the geometry shader slows the performance. We observed a minimum of 25% better performance of the 2-pass implementation. Figure 3-3 lists the data flow on the graphics pipeline.

Pass 1: Conversion and Evaluation
  VS In    $p_*$, $n$, $\sigma$
  VS       Use texture lookup to retrieve $p_{2j}$, $p_{2j+1}$; compute $v$, $t_0$, $t_1$, $f_j$
  VS Out   $v$, $t_0$, $t_1$, $f_j$, $j = 0..n-1$
  GS In    $v^i$, $t^i_0$, $t^i_1$, $f^i$, $i = 0..3$
  GS       if regular quad: assemble $g_{kl}$, $k,l = 0..3$;
                 tessellate the parametric domain;
                 compute normal and position at $(u, v)$
           else: compute $b^i_{211}$, $b^i_{121}$, $b^i_{112}$;
                 compute the remaining Bézier control points;
                 tessellate the parametric domain;
                 compute normal and position at $(u, v)$ by de Casteljau's algorithm adjusted to c-patches
  PS In    position, normal
  PS       compute color
  PS Out   color

Figure 3-3. 1-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. GS amplifies the geometry and evaluates the patches.

[Figure 3-4: pipeline diagram with stages Input Assembler, Vertex Shader, Geometry Shader and Pixel Shader]
Figure 3-4. At present, the 1-pass conversion-and-rendering must place patch assembly and evaluation on the geometry shader. This is not efficient.

3.4 Coordinate System Transformation

When we evaluate normal and position of an irregular quad at (u, v), we first need to transform the tessellated domain value from Cartesian coordinates (u, v) to barycentric coordinates (s, t, w). Figure 3-5 illustrates how to locate which of the four triangles (u, v) lies in. In this way, we minimize the number of comparisons and take care of the shared vertices. We make (0.0, 0.0), (1.0, 0.0), (0.5, 0.5) only belong to T1, (1.0, 1.0) only belong to T2, and (0.0, 1.0) only belong to T4.

    if (v - u <= 0)
        if (u + v - 1 <= 0) T1
        else T2
    else
        if (u + v - 1 <= 0) T4
        else T3

Figure 3-5. (u, v) on an irregular quad.

3.5 Watertight Evaluation

The HLSL code in Figure 3-6 shows that the same cubic curve is evaluated along the boundary. An explicit if statement in the evaluation guarantees the exact same ordering of computations since boundary coefficients are only computed once.

    VS_OUTPUT_eval cpatchEval(VS_INPUT_eval input, uint vID : SV_InstanceID)
    {
        VS_OUTPUT_eval output;
        // Compute 15 Bezier coefficients of the triangular patch

        // In the boundary case, for the sake of watertightness:
        [branch] if (w == 0)
        {
            // The v-coordinate is what needs to be used to evaluate at the
            // boundary. Since u = 1 - v, it is also used here.
            output.P = ((u*u*u)*b300 + (v*v*v)*b030)
                     + ((3*u*u*v)*b210 + (3*u*v*v)*b120);
        }
        else
        {
            // Evaluates the interior points
        }
        return output;
    }

Figure 3-6. Watertight evaluation.

3.6 Conclusion

The presented approach fits well into a GPU pipeline. In both approaches, we compute $v$, $e$, $f$ and $t$ using the vertex neighborhood and the rules in Figure 2-4 in the vertex shader. Each vertex has $2n + 1$ vertices in its vertex neighborhood, where $n$ is the valence. This information is stored in a texture. With a vertex ID and its valence, all vertices in its neighborhood can be retrieved in counterclockwise order. In the geometry shader, the patch is finalized and assembled.
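The sector lookup of Section 3.4 can be sketched in C++ as follows. The two comparisons follow Figure 3-5; everything else is an assumption made for illustration: the names `Sector` and `locateSector` are hypothetical, sectors are numbered 0 to 3 for T1 to T4, and the barycentric weights are returned in the order (first corner, next corner, center of the quad), which the text does not specify.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: sector index 0..3 (T1..T4) and barycentric
// weights with respect to (first corner, next corner, quad center).
struct Sector { int index; double s, t, w; };

// Hypothetical helper: locates the triangle containing (u, v) in [0,1]^2 and
// converts (u, v) to barycentric coordinates of that triangle.
Sector locateSector(double u, double v) {
    int index;
    double up, vp;  // (u, v) rotated into the frame of sector T1
    if (v - u <= 0.0) {
        if (u + v - 1.0 <= 0.0) { index = 0; up = u;       vp = v;       }  // T1
        else                    { index = 1; up = v;       vp = 1.0 - u; }  // T2
    } else {
        if (u + v - 1.0 <= 0.0) { index = 3; up = 1.0 - v; vp = u;       }  // T4
        else                    { index = 2; up = 1.0 - u; vp = 1.0 - v; }  // T3
    }
    // In the T1 frame the corners are (0,0), (1,0) and the center (0.5,0.5).
    return {index, 1.0 - up - vp, up - vp, 2.0 * vp};
}
```

With this tie-breaking, the shared points named in Section 3.4 each land in exactly one sector: (0,0), (1,0) and (0.5,0.5) in T1, (1,1) in T2 and (0,1) in T4.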
Overall, the 2-pass implementation has better performance because of the small stream out, the short geometry shader code and the minimal amplification on the geometry shader.

CHAPTER 4
RESULTS

4.1 Shape Quality

Our algorithm produces $C^1$ surfaces that closely approximate Catmull-Clark subdivision surfaces. We compare our algorithm with [30] on the closeness to Catmull-Clark surfaces. We measure how close the surface is to the Catmull-Clark surface by comparing both the geometric difference and the normal angle difference. Figure 4-1 compares the smoothed quad mesh surfaces with densely refined Catmull-Clark subdivision surfaces based on the same mesh. Both geometric distance, as percent of the local quad size, and normal distance, in degrees of variation, are compared. Especially after displacement, large models rendered by subdivision and quad smoothing appear visually indistinguishable. The relatively small examples, without displacement, shown in Figure 4-1 and the close-up in Figure 4-5 are also important to support our observation that c-patches do not create shape problems compared to a single bicubic patch: despite the lower degree and internal $C^1$ join, their visual appearance is remarkably similar to that of bicubic patches. The comparison with ACC-patches [30] is shown in Figure 4-2. Figures 4-3 and 4-4 show the smooth surface generated by our algorithm and the surface after applying displacement mapping, respectively.

[Figure 4-1: columns CC and Our Scheme; rows Surface, Geometry Difference (%), Normal Angle Difference (°)]
Figure 4-1. Comparison between the Catmull-Clark (CC) subdivision limit surface and the smoothed quad mesh surface for the same input.

[Figure 4-2: panels Geometry distance error and Normal angle error for C-patch and ACC-patch (Loop 07); reported values: C-patch Avg: 1.07474, Max: 4.43114; ACC-patch Avg: 1.76099, Max: 4.92876; Avg: 1.88284, Max: 8.06158; Avg: 1.89969, Max: 7.06922]
Figure 4-2. Comparison of ACC-patch and C-patch in terms of approximation of Catmull-Clark subdivision surfaces for the same input.

Figure 4-3. GPU smoothed quad surfaces: orange patches correspond to ordinary quads, blue patches to extraordinary quads.

Figure 4-4. GPU smoothed quad surfaces with displacement mapping.

4.2 Performance

We compiled and executed the implementation on the latest graphics cards of both major vendors under DirectX 10 and tested the performance for several industry-sized models. Two surface models and models with displacement mapping are shown in Figures 4-3 and 4-4 respectively. Table 4-2 summarizes the performance of the 2-pass algorithm for different granularities of evaluation. The frog model, in particular, provides a challenge due to the large number of extraordinary patches. The Frog Party shown in Figure 4-11 currently renders at 50 fps for uniform evaluation for N = 9, i.e., on a 9 x 9 grid. That is, the implementation converts 1292 x 9 quads, of which 59% are extraordinary, and renders on the order of 1 million polygons 50 times per second. On the same hardware, we measured Bunnell's efficient implementation (distribution accompanying [9]) featuring the single frog model, i.e., 1/9th of the work of the Frog Party, running at 44 fps with three subdivisions (equivalent to tessellation factor N = 9). That is, GPU smoothing of quad meshes is an order of magnitude faster. Compared to [46], the speed up is even more dramatic. While the comparison is not among equals since both [46] and [9] implement recursive Catmull-Clark subdivision, it is nevertheless fair to observe that the speedup is at least partially due to our avoiding stream back after amplification (data explosion due to refinement). We expect that more careful storage of vertex neighborhoods, in retrieving order, will further improve our use of the texture cache and thereby improve the frames per second (fps) count.

Table 4-1. A total degree 4 patch and a bicubic patch have the same evaluation cost at (u, v) in terms of ALU operations.

  ALU vector ops    c-patch    bicubic patch
  position          55         56
  normal            3          3
  other             1          0
  total             59         59

Table 4-2. Frames per second for some standard test meshes with each patch evaluated on a grid of size N x N; eqs = percentage of extraordinary quads. Sword and Frog are shown in Figure 4-3, Head in Figure 4-1.

  Mesh (verts, quads, eqs)    N = 5    9      17     33
  Sword (140, 138, 38%)       965      965    965    703
  Head (602, 600, 100%)       637      557    376    165
  Frog (1308, 1292, 59%)      483      392    226    87

Figure 4-5. Close-up of the frog. The refined mesh is watertight.

Table 4-3. Performance of the 1-pass implementation (frames per second of the slower 1-pass implementation).

  Mesh     N = 2    5     8
  Sword    389      96    43
  Head     108      34    15
  Frog     44       10    4

4.3 Displacement Mapping

Displacement mapping is a technique for adding geometric details to the mesh with a height map. It differs from bump mapping or normal mapping in that it changes the geometry by moving vertices, often along their normal directions, according to the value in the height map. The change of real geometry, not just of the normal as in bump mapping, permits self-occlusion. Figure 4-6 shows the displacement mapping on the frog model, which consists of 330k facets. The size of the height map is 1024 by 1024.

Figure 4-6. Displacement mapping on the frog model.

In order to perturb normals after displacement mapping, we need the bump mapping values $D_u$ and $D_v$. The displaced surface is

\[ S = P + D\,n, \qquad (4\text{-}1) \]

where $S$ is the displaced position of the point $P$, $D$ is the displacement and $n$ is the normal at $P$. Then the new normal is calculated by the cross product of $S_u$ and $S_v$:

\[ S_u = P_u + D_u\,n + D\,n_u, \qquad (4\text{-}2) \]
\[ S_v = P_v + D_v\,n + D\,n_v. \qquad (4\text{-}3) \]

Note that $n_u$ and $n_v$ are the derivatives of the normalized normal $n$:

\[ n_u = \frac{\bar n_u - (\bar n_u \cdot n)\,n}{\lVert \bar n \rVert}, \qquad (4\text{-}4) \]

where $\bar n := P_u \times P_v$ is the unnormalized normal and $\bar n_u = P_{uu} \times P_v + P_u \times P_{uv}$.

4.4 Morphing and Animation

We implement morphing using the 2-pass approach. The animated sequence of the input meshes in the form of textures is fed into the Input Assembler of the first pass each frame.
The morphed patches are constructed during the first pass. Fine details are added in the second pass. The screen shots in Figures 4-9, 4-10 and 4-11 illustrate real time displacement and animation.

[Figure 4-7: panels N-patch (Vlahos 00), Catmull-Clark Subdivision, ACC-patch (Loop 07), C-patch]
Figure 4-7. Comparison of the c-patch scheme with PN-Triangles (also called N-patch), ACC-patch, and Catmull-Clark subdivision.

[Figure 4-8: panels Input Mesh, Catmull-Clark Subdivision, N-patch (Vlahos 00), C-patch]
Figure 4-8. Comparison of the c-patch scheme with PN-Triangles (also called N-patch), ACC-patches, and Catmull-Clark subdivision.

Figure 4-9. Real time animation on the Sword model.

Figure 4-10. Real time animation on the Frog model.

Figure 4-11. Asynchronous animation of nine Frogs.

4.5 Conclusion

Smoothing quad meshes on the GPU offers an alternative to highly refined facet representations transmitted to the GPU and is preferable for interactive graphics and integration with complex morphing and displacement. We advocated a 2-pass scheme since, as we argued, the DX10 geometry shader is not well suited for the data amplification for evaluation after conversion. The 1-pass scheme outlined in Section 3 may become more valuable with the availability of a dedicated hardware tessellator [29, 48]. Such a tessellator will make amplification more efficient and support adaptive tessellation (which is why we only discussed uniform tessellation in Section 3). Such hardware amplification will also benefit the 2-pass approach in that the (u, v) domain tessellation fed into the second pass will be replaced by the amplification unit.

CHAPTER 5
PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

Our conversion algorithm can be generalized to work for arbitrary meshes. The generalized algorithm [34] provides an elegant solution for meshes with Tri/Quad/Pent facets.
Removing restrictions on vertex valences and allowing meshes with triangles, quadrilaterals, and pentagons vastly simplifies a designer's task and enriches the design space of meshes for smooth surfaces: while quads naturally model the flow of (parallel) feature lines and are therefore the main facet type in models, triangular facets allow merging lines while pentagonal facets allow starting new lines (Figure 5-1) without creating T-corners or forcing refinement of intermediate models to satisfy connectivity or quad-layout constraints. Essentially, designers can reuse the whole range of polyhedral models they are used to. We modified the algorithm for converting quad meshes to a generalized method for a mesh with Tri/Quad/Pent facets. The generalized scheme converts such a polyhedral model to a surface with everywhere well-defined normal and $C^2$ in 'regular' mesh regions with quad-grid connectivity. Figure 5-2 shows an example of the resulting surfaces. Note that the facets are limited to triangles, quads and pentagons due to current GPU constraints and to avoid unnecessary notational, technical and shape complexity.

Figure 5-1. (a) Retaining the density of feature lines while varying their number. (b),(c) Axe handle detail using a triangle and a pentagon to transition between detailed and coarser areas.

An irregular facet with $k$ sides is converted into a k-patch. A k-patch is a generalization of a c-patch. It is a piecewise degree 4 $C^1$ spline patch with $k$ cubic boundaries. A k-patch is defined by $6k + 1$ control points indicated as ∘ in Figure 5-3(b),(c). That is, the k-patch corresponding to a triangular, quadrilateral or pentagonal facet is defined by a total of 19, 25 or 31 points respectively.

Figure 5-2. The generalized scheme converts a mesh with Tri/Quad/Pent facets to a smooth surface consisting of bicubic patches (yellow) and k-patches with k = 3 (green), k = 4 (red), and k = 5 (gray).

[Figure 5-3: panels (a) ordinary, (b) polar, (c) extraordinary]
Figure 5-3. (a) An ordinary facet is converted to a bicubic patch with 16 control points $g_{ij}$. (b),(c) An extraordinary facet with $k$ sides is converted to a k-patch defined by $6k + 1$ control points shown as ∘. The k-patch can be viewed as $k$ $C^1$-connected degree-4 triangular patches $b^i$, $i = 0..k-1$, with cubic outer boundaries.

[Figure 5-4: (a) cubic boundary labels 300, 210, 120, 030 of consecutive sectors; (b) quartic labels 400, 310, 220, 130, 040, 301, 202, 103, 004, 031, ...]
Figure 5-4. The triangular sectors are listed in counterclockwise order with a modulo-k superscript. (a) 14 control points from three consecutive sectors of a k-patch define (b) a single patch in triangular Bézier form.

For evaluation, we can recover the polynomial representation of the $i$th sector in triangular form of total degree 4 (Figure 5-3(b) and (c)),

\[ b(u,v) := \sum_{i+j+k=4} b_{ijk}\,\frac{4!}{i!\,j!\,k!}\,u^i v^j (1-u-v)^k, \qquad (5\text{-}1) \]

where the $\binom{4+2}{2}$ BB-coefficients $b_{ijk} \in \mathbb{R}^3$ are indexed as in Figure 5-4. Specifically, we compute the $\binom{4+2}{2}$ coefficients $b_{ijk}$ (Figure 5-4(b)) from the 14 coefficients labeled in Figure 5-4(a) by simple averaging: degree-raising the coefficients $b^i_{3-l,l,0}$, $l = 0, \ldots, 3$, to $b^i_{4-l,l,0}$, $l = 0, \ldots, 4$,

\[ [b^i_{400}, b^i_{310}, b^i_{220}, b^i_{130}, b^i_{040}] := \Big[b^i_{300},\ \tfrac{b^i_{300} + 3b^i_{210}}{4},\ \tfrac{b^i_{210} + b^i_{120}}{2},\ \tfrac{3b^i_{120} + b^i_{030}}{4},\ b^i_{030}\Big], \]

and computing the shared coefficients on the sector boundaries $b^i_{3-l,0,1+l} = b^{i-1}_{0,3-l,1+l}$, $l = 0, 1, 2, 3$, i.e., indices 301, 202, 103 and 004 in Figure 5-4(b), from the $C^1$ constraints. See [34] for a thorough explanation of the algorithm, its GPU implementation and its smoothness verification.

CHAPTER 6
DISCUSSION AND FUTURE WORK

6.1 Future GPU API

Our conversion scheme not only fits well with the current graphics hardware pipeline, but also matches very well with the architecture of future graphics hardware [29, 48]. The work load currently in the geometry shader will be assigned to the patch shader. The ideal GPU pipeline needs to explore more parallelism in the geometry shader, where the 24 coefficients of a c-patch can be computed independently given the vertex neighborhood.
With maximal parallelism, the cost of deriving one coefficient is roughly equal to the cost of constructing a whole patch. Currently we precompute the tessellated domain and store these static values in a set of textures. In the future, this part of the computation will be replaced by the tessellation unit. Animation using our conversion scheme will then be achieved in a single pass without geometry transmission between passes.

6.2 Volume Preservation

Preserving volume under constraints can achieve realistic deformable object animation. The well-known divergence theorem can be used to reduce a volume integral to an integral over the surface. Given a closed object, the volume is matched to a prescribed value by inflating or deflating the deformable object uniformly. To enhance realism, this method can be further extended to fix parts of the object and attach different material properties to surface pieces. This exact, localized volume preservation method works for all surfaces that consist of Bézier patches. Therefore, we will combine this method with our new surface conversion algorithm to achieve real-time volume preservation.

6.3 Adaptive Tessellation

Adaptive tessellation samples each surface patch more densely in regions of high curvature and less densely in regions of low curvature. Moreover, it adjusts the level of detail according to how close the geometry is to the camera. The surface is tessellated only where and when necessary. Therefore, an adaptively tessellated surface will greatly improve performance. The tessellation factor can be generated using the flatness test [9]. With the tessellation unit in the GPU, the cost of tessellating the domain is almost free.

REFERENCES

[15] C. Gonzalez and J. Peters. Localized hierarchy surface splines. In S. S. J. Rossignac, editor, ACM Symposium on Interactive 3D Graphics, pages 7–15, 1999.
[16] M. Guthe, Á. Balázs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-Spline surfaces. ACM Transactions
on Graphics, 24(3):1016–1023, 2005.
[17] M. Guthe, Á. Balázs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-Spline surfaces. ACM Transactions on Graphics, 24(3):1016–1023, 2005.
[18] M. Halstead, M. Kass, and T. DeRose. Efficient, fair interpolation using Catmull-Clark surfaces. In Proceedings of SIGGRAPH 93, pages 35–44, Aug 1993.
[19] H. Hoppe, T. DeRose, T. Duchamp, M. Halstead, H. Jin, J. McDonald, J. Schweitzer, and W. Stuetzle. Piecewise smooth surface reconstruction. Computer Graphics, 28(Annual Conference Series):295–302, 1994.
[20] D. L. James and C. D. Twigg. Skinning mesh animations. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 399–407, New York, NY, USA, 2005. ACM.
[21] K. Karciauskas and J. Peters. Guided subdivision. http://www.cise.ufl.edu/research/SurfLab/papers.shtml.
[22] O. A. Karpenko and J. F. Hughes. SmoothSketch: 3D free-form shapes from complex sketches. ACM Transactions on Graphics, 25(3):589–598, 2006.
[23] L. Kavan, C. O'Sullivan, and J. Zara. Efficient collision detection for spherical blend skinning. In Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia, Kuala Lumpur, Malaysia, pages 147–156, 2006.
[24] L. Kavan and J. Zara. Spherical blend skinning: a real-time deformation of articulated models. In I3D '05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, pages 9–16, New York, NY, USA, 2005. ACM.
[25] A. Krishnamurthy, R. Khardekar, and S. McMains. Direct evaluation of NURBS curves and surfaces on the GPU. In SPM '07: Proceedings of the 2007 ACM Symposium on Solid and Physical Modeling, pages 329–334, New York, NY, USA, 2007. ACM.
[26] S. Lai and F. Cheng. Adaptive rendering of Catmull-Clark subdivision surfaces. In CAD-CG '05: Proceedings of the Ninth International Conference on Computer Aided Design and Computer Graphics, pages 125–132, Washington, DC, USA, 2005. IEEE Computer Society.
[27] A. Lee, H.
Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 85–94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000. citeseer.ist.psu.edu/lee00displaced.html.
[28] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 85–94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.
[43] H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-Spline Techniques. Springer, 2002.
[44] K. Pulli and M. Segal. Fast rendering of subdivision surfaces. In SIGGRAPH '96: ACM SIGGRAPH 96 Visual Proceedings: The Art and Interdisciplinary Programs of SIGGRAPH '96, page 144, New York, NY, USA, 1996. ACM.
[45] S. Schaefer and J. Warren. Exact evaluation of non-polynomial subdivision schemes at rational parameter values. In PG '07: Proceedings of the 15th Pacific Conference on Computer Graphics and Applications, pages 321–330, Washington, DC, USA, 2007. IEEE Computer Society.
[46] L.-J. Shiue, I. Jones, and J. Peters. A realtime GPU subdivision kernel. In M. Gross, editor, SIGGRAPH 2005, Computer Graphics Proceedings, Annual Conference Series, pages 1010–1015. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2005.
[47] J. Stam. Exact evaluation of Catmull-Clark subdivision surfaces at arbitrary parameter values. In SIGGRAPH, pages 395–404, 1998.
[48] A. Tatarinov. Instanced tessellation in DirectX10. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6-263A-4424-A7FE-69CFF1A5B180&displaylang=en.
[49] A. Vlachos, J. Peters, C. Boyd, and J. L. Mitchell. Curved PN triangles. In Symposium on Interactive 3D Graphics, Bi-Annual Conference Series, pages 159–166.
[50] D. Zorin. Subdivision for modeling and animation. ACM SIGGRAPH Course Notes.

BIOGRAPHICAL SKETCH

Tianyun Ni was born in China.
She was awarded her BS in computer science with a mathematics minor from Texas State University and her ME in computer engineering from the University of Florida. She earned her doctoral degree in computer engineering.

REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING UNIT

By
TIANYUN NI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2008

© 2008 Tianyun Ni

To my family, especially my father, and to all who have lent encouragement and support during the time spent on this research

ACKNOWLEDGMENTS

I wish to express my sincerest thanks to the chair of my dissertation committee, Dr. Jörg Peters, for working with me throughout this long enterprise.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
1.1 Motivation
1.2 Problem Statement
1.3 Modern GPU Pipeline and Current Trends
1.4 Representations in Surface Modeling
1.4.1 Subdivision Surfaces
1.4.2 Parametric Patches
1.4.2.1 Bezier technique
1.4.2.2 Related work
2 A NEW SCHEME FOR SURFACE CONSTRUCTION
2.1 Contribution
2.2 The Conversion Algorithm
2.2.1 The Conversion Rules for a Type-1 Quad
2.2.2 The Conversion Rules for a Type-2, or Type-3 Quad
2.3 Derivation of the coefficients of a c-patch
2.3.1 Derivation of λ0 and λ1
2.3.2 Derivation of b211 and b121
2.3.3 Derivation of b112
2.4 Smoothness Verification
2.5 Complexity Analysis
2.5.1 Number of Patches
2.5.2 Cost of Patch Construction
2.5.3 Cost of Surface Evaluation
2.6 Approximation Catmull-Clark Subdivision Surface
2.7 Water-Tight Surface Verification
2.8 Discussion
3 GPU IMPLEMENTATION
3.1 Overview
3.2 2-pass Approach
3.3 1-pass Approach
3.4 Coordinate System Transformation
3.5 Water-Tight Evaluation
3.6 Conclusion
4 RESULTS
4.1 Shape Quality
4.2 Performance
4.3 Displacement Mapping
4.4 Morphing and Animation
4.5 Conclusion
5 PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS
6 DISCUSSION AND FUTURE WORK
6.1 Future GPU API
6.2 Volume Preservation
6.3 Adaptive Tessellation
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
4-1 ALU operations for evaluation at (u, v)
4-2 Performance results
4-3 Performance of the 1-pass implementation.
LIST OF FIGURES

Figure
1-1 Polygonal modeling
1-2 Problem statement
1-3 DirectX 10 pipeline stages
1-4 DirectX 10 pipeline
1-5 The primitives
1-6 The notations of input mesh
1-7 The three possible configurations
1-8 The Catmull-Clark stencils
1-9 The subdivision schemes
1-10 The suggested rendering passes
1-11 Future GPU architecture
1-12 The subdivision schemes
2-1 Derivation of c-patch
2-2 Vertex computation
2-3 Surface conversion
2-4 Computing control points v, e, f and t, the projection of e
2-5 Patch-based computation
2-6 Patch computation
2-7 The reparameterization to meet G1 at the vertex
2-8 Coefficients b211 and b121 of a c-patch are derived on top of a ghost patch.
2-9 The choice of middle point in c-patch
2-10 The center of a bicubic patch can be evaluated by the linear combination of the boundary coefficients.
2-11 C1 transition between a triangular and a bicubic patch.
2-12 G1 transition between two triangular patches.
3-1 2-Pass implementation
3-2 2-Pass conversion
3-3 1-Pass conversion
3-4 1-Pass implementation
3-5 (u, v) on an irregular quad.
3-6 Watertight evaluation
4-1 Shape quality comparison
4-2 Catmull-Clark approximation comparison
4-3 Ordinary patches and extraordinary patches
4-4 GPU-smoothed quad surfaces with displacement mapping.
4-5 Close-up of the frog. The refined mesh is watertight.
4-6 Displacement mapping on the frog model
4-7 Shape comparison
4-8 Shape comparison
4-9 Real-time animation on the Sword model.
4-10 Real-time animation on the Frog model.
4-11 Asynchronous animation of nine Frogs.
5-1 The reasons for using Tri/Quad/Pent meshes
5-2 A quad/tri/pent model
5-3 Patch representations
5-4 Triangular representation
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING UNIT

By
Tianyun Ni
August 2008

Chair: Jörg Peters
Major: Computer Engineering

Increased realism in interactive graphics and gaming requires complex smooth surfaces to be rendered at ever higher frame rates. In particular, representations used to model surfaces offline, such as spline and subdivision surfaces, have to be modified or reorganized to allow for efficient usage of the graphics processing unit and its SIMD (Single Instruction, Multiple Data) parallelism. This dissertation presents a novel algorithm for converting quad meshes on the GPU to smooth, water-tight surfaces at the highest speed documented so far. The conversion reproduces bicubic splines wherever possible and closely mimics the shape of the Catmull-Clark subdivision surface by c-patches where a vertex has a valence different from 4. The smooth surface is piecewise polynomial and has well-defined normals everywhere.

CHAPTER 1
INTRODUCTION

This chapter introduces the challenges that motivate the dissertation, gives a detailed literature review, positions the research relative to the current state of the art, and gives an overview of the modern GPU pipeline.

1.1 Motivation

In graphics, 3D objects are approximated by polyhedral meshes of great complexity. For example, a game character can consist of tens of thousands of polygons (Figure 1-1). Increased realism in interactive gaming demands that such meshes be animated and rendered in real time. There are essentially two major approaches in the literature that serve this purpose: Polygonal Modeling and Higher-order Surface Modeling.
There are two scenarios of animation: Morphing and Skinning. Morphing is used to change one image into another through a seamless transition. Skinning is a common technique to deform characters [20, 23, 24, 32]. The animated mesh, referred to as a skin, is deformed based on the pose of an underlying skeleton. In Polygonal Modeling (Figure 1-1), skinning and morphing are applied to a high-detail mesh created by an artist. Most games currently use this approach. This technique involves redundant work due to minimal sharing in the Polygonal Modeling representation. In addition, a large number of vertices in a complex mesh must be fed into the graphics pipeline via the GPU's memory bus, which is a potential bottleneck.

Figure 1-1. Polygonal Modeling: currently the popular animation approach in games.

The alternative approach, Surface Modeling, animates a coarse mesh (Figure 1-2). Subdivision surfaces and parametric patches, as two popular high-order surface representations, both support level-of-detail rendering (see Section 1.4). Highly detailed 3D models are produced by displacement mapping [11]. Displacement mapping adds fine details in the form of scalar fields on the smooth surface defined by the coarse mesh. As a specific instance, Lee [27] proposes the Displaced Subdivision Surface to represent a detailed surface model as a scalar-valued displacement over a smooth surface domain. This approach reduces the number of vertices that must be read and animated in each frame because complex geometric details are generated on the GPU. The run-time cost now includes the conversion process from the coarse input mesh to the final complex mesh. The conversion process involves surface construction, evaluation and displacement mapping.

Figure 1-2. Each high-detail mesh in Surface Modeling is represented by a coarse control mesh with a displacement map. The coarse control mesh is first converted to a smooth surface. Then the surface is tessellated and the vertices are perturbed in the normal directions based on the corresponding value in the displacement map. Last, the normal at each vertex of the refined highly detailed mesh is updated.

In summary, the advantages of Surface Modeling are
1.
lower computation cost of animation because skinning is done on the coarse mesh, not the final dense mesh;
2. memory and bandwidth savings by encoding most detail as one-dimensional displacements rather than three-dimensional vectors;
3. support of refinement level on the fly;
4. customization of archetypes: we can model different 3D models with the same coarse mesh, changing only the displacement map;
5. support of adaptive tessellation: evaluation does not have to be on a uniform grid.

The disadvantage of Surface Modeling is that modern GPUs cannot render such surfaces directly. The surface must be converted into triangles or quads through a process of tessellation and evaluation. Therefore, Surface Modeling becomes attractive as a real-time technique only if the conversion is cheaper than reading and animating a high-polygon mesh. Our goal is to design such a scheme on the GPU.

1.2 Problem Statement

Meshes consisting of pure quadrilateral facets are common in modeling for animation. Any polyhedral mesh can be converted into such a quad mesh by one step of mesh refinement. But a good designer creates meshes with the quad-restriction in mind so that no global refinement is necessary. We therefore focus on quadrilateral meshes and aim to derive a set of efficient rules directly on the GPU (Figure 1-2, the red dotted rectangle) that produce surfaces with good visual quality. Specifically, the resulting surfaces should
1. consist of a small number of low-degree polynomial pieces;
2. possess smooth geometry (no extra cost for smooth shading);
3. closely approximate Catmull-Clark surfaces (a standard modeling tool);
4. be watertight (no pixel drop-out);
5. map well to the graphics pipeline and leverage the strengths of GPU computation.

1.3 Modern GPU Pipeline and Current Trends

A graphics processing unit (GPU) is a dedicated graphics rendering device. Its SIMD architecture has evolved substantially over the last decade. This highly parallel structure makes it more effective than general-purpose CPUs for a range of algorithms. Modern GPUs expose a programmable parallel stream-processing pipeline as a series of short programs called shaders.
During the last five years, major graphics software libraries such as OpenGL and DirectX have been used to program the GPU via shaders on a programmable pipeline, which has mostly superseded the older fixed-function pipeline. The two most popular graphics software libraries, DirectX and OpenGL, currently both specify APIs for three types of shaders: vertex, geometry, and pixel shaders. The shaders in the DirectX 10 system [4] (Figure 1-4) share a common core that accesses up to 128 memory buffers and 16 parameter (constant) buffers. Vertex and pixel shaders use a one-in, one-out data processing model. In contrast, the geometry shader has a limited ability to amplify or reduce the primitive count and thus is able to change meshes. Figure 1-3 shows the input and output of each pipeline stage. A more detailed explanation of each stage is as follows:

Figure 1-3. The input and output of each pipeline stage in the DirectX 10 system.

Figure 1-4. DirectX 10 pipeline.

1. The Input Assembler (IA) gathers vertex data to set up vertex and index buffers. Vertex buffers contain per-vertex data while index buffers define geometry primitives as integer indices into vertex buffers. Indexing helps avoid redundant computations of the same vertex.
2. The vertex shader (VS) typically processes vertex-based operations such as changing the position and normal of a single vertex. The computations in this stage are local. Each vertex only has its own information and does not communicate with other vertices. The VS is most commonly used to transform vertices from object space to clip space.
3.
The geometry shader (GS) processes the vertices of a single primitive. A primitive can be a point, a line segment, a triangle, a point with adjacency, a line segment with adjacency, or a triangle with adjacency (Figure 1-5). Due to the availability of the primitive's vertices (up to 6 vertices for a triangle with adjacency), the computations in this stage are less local than those in the VS and PS. The GS can emit additional primitives. This new amplification feature, introduced in DirectX 10, adds flexibility and makes it possible to implement a number of algorithms [1] on the GPU, such as mesh refinement, shadow volumes, and dynamic particle systems. The geometry shader output may be fed to the rasterizer stage and/or to a vertex buffer in memory via the stream output stage.
4. The rasterizer (TR) is a fixed-function stage that generates fragments by filling in the polygons sent through the graphics pipeline. Clipping, culling, perspective divide, viewport transform, primitive setup, scissoring, and depth offset also happen in this stage.
5. The pixel shader (PS) operates on one fragment at a time. Usually scene lighting and pixel-related effects such as bump mapping and color tone mapping occur in the PS.
6. The output merger (OM) takes a fragment from the PS and performs traditional stencil and depth testing operations as well as render-target blending to generate a final pixel on the screen.
Figure 1-5. The six primitives used in the GS.

The future GPU pipeline [29, 48] is expected to provide a Tessellation Unit, combined with new shader stages for patch conversion and evaluation of tessellated high-order surfaces. The Tessellator provides a solution to adaptive refinement on the graphics hardware. Based on user-provided tessellation factors per edge, the tessellator adaptively creates a sampling pattern of the underlying parametric domain and automatically generates a set of parametric domains. In addition, two special shaders are introduced in the next-generation GPU pipeline. The patch shader converts an input mesh to a set of patches. The evaluation shader takes the (u, v) output of the tessellator and evaluates the patch at (u, v). This future GPU architecture also allows the GPU to exploit more parallelism because multiple arithmetic units can run the same evaluation shader. Moreover, tessellation occurs on the GPU and overcomes the bottleneck of bus bandwidth caused by model complexity. The new GPU design indicates that Surface Modeling is the trend for real-time graphics.

1.4 Representations in Surface Modeling

In Computer Graphics, surfaces are represented by polyhedral meshes. A polyhedral mesh is a collection of vertices, edges and facets. The valence of a vertex is the number of its incident edges. Each facet is an n-sided polygon. In a triangular (or quadrilateral) mesh, n equals 3 (or 4, respectively). An arbitrary mesh has n-sided polygons where the value of n is arbitrary. The difference between regular and irregular vertices is explained in Figure 1-6. Figure 1-7 illustrates the three possible types of a facet.

Figure 1-6. Tri- and quadrilateral meshes and facet types 1, 2, 3.

Figure 1-7. The three possible configurations. A Type-1 quad is regular. Type 2 or 3 is irregular.
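The definitions of valence and of regular quads can be made concrete with a short sketch. The mesh layout (a list of vertex-index quadruples), the helper `grid_quads` and both function names are assumptions chosen for illustration; boundary vertices are simply counted by their incident edges, and the finer distinction between Type 2 and Type 3 shown in Figure 1-7 is not reproduced here.

```python
from collections import defaultdict

def vertex_valences(quads):
    """Valence of a vertex = number of distinct mesh edges incident to it."""
    edges = set()
    for q in quads:
        q = tuple(q)
        # Consecutive vertices of a facet (cyclically) form its edges.
        for a, b in zip(q, q[1:] + q[:1]):
            edges.add((min(a, b), max(a, b)))
    valence = defaultdict(int)
    for a, b in edges:
        valence[a] += 1
        valence[b] += 1
    return dict(valence)

def is_regular_quad(quad, valence):
    """A quad is regular (Type 1) if all four of its vertices have valence 4."""
    return all(valence[v] == 4 for v in quad)

def grid_quads(n):
    """Hypothetical test mesh: n x n quads on an (n+1) x (n+1) vertex grid."""
    w = n + 1
    return [(r * w + c, r * w + c + 1, (r + 1) * w + c + 1, (r + 1) * w + c)
            for r in range(n) for c in range(n)]

quads = grid_quads(3)
valence = vertex_valences(quads)
center_is_regular = is_regular_quad((5, 6, 10, 9), valence)
corner_is_regular = is_regular_quad((0, 1, 5, 4), valence)
```

In this 3 x 3 grid only the center quad has four valence-4 vertices; the quads touching the mesh boundary do not, which is why a real implementation would need an explicit rule for boundaries.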
Parametric patches and subdivision surfaces are major tools for modeling free-form surfaces with arbitrary topology. A more intuitive way for inexperienced users to create shape, by drawing curves or sketching, is also available [22, 36].

1.4.1 Subdivision Surfaces

Subdivision surfaces, as part of standard modeling packages (e.g., 3D Max, Maya, Softimage, Mirai, Lightwave, etc.), have proven to be a useful modeling tool. Subdivision schemes were first introduced by [10, 12, 31]. They generate a smooth surface through a mesh refinement process. This method begins with a coarse mesh that approximates a 3D model, known as a control mesh. Each vertex in the control mesh is called a control point. Control points influence the shape of the limit surface. The mesh is refined in each subdivision step by inserting new vertices into the mesh, refining existing point positions, and updating the connectivity. The positions of the new vertices in the mesh are computed by averaging rules that apply to the positions of nearby old vertices. The averaging rules differ from scheme to scheme (see the comparison in Figure 1-9), and it is these rules that determine the properties of the surface. The graphs that illustrate the rules are called stencils. Binary subdivision splits each edge into 2 while ternary subdivision splits each edge into 3. Usually each subdivision scheme has at most three types of rules: vertex stencil, edge stencil, and face stencil. For example, the stencils of Catmull-Clark subdivision are shown in Figure 1-8. The refinement rules include stencils for smooth surfaces as well as special rules for creating sharp or semi-sharp features. Each refinement step produces a denser mesh than the previous one. The limit subdivision surface is the surface produced by this process after infinitely many refinements. In practical use, however, this algorithm is only applied a limited number of times, usually four.

Figure 1-8. The stencils used in Catmull-Clark subdivision. These stencils define the rules to derive the new vertices that correspond to the old vertices, edges, and facets.

A realization of tessellation on the fly for Loop subdivision surfaces was proposed in [33]. Pulli [44] implemented Loop's subdivision scheme with additions by Hoppe et al. [19].
Bischoff [3] proposed a forward differencing method that only requires a constant amount of memory regardless of subdivision step. DeRose [13] generalized the infinitely sharp creases of [19] to obtain semi-sharp creases. Hoppe [19] extended Loop's scheme by introducing subdivision rules that lead to a piecewise smooth surface with features such as creases, corners, darts, and conical vertices.

Figure 1-9. Classification of common subdivision schemes.

Adaptive subdivision can dramatically improve performance because the level of detail (LOD) is updated based on the dynamic distance to the camera as well as the complexity of each part of the model. Adaptive refinement was previously implemented using a quad-tree data structure [50]. Each level of the tree represents one refinement level of the mesh. However, it is difficult to map the recursive non-uniform tree structure to parallel computation. Bunnell [9] provides code for adaptive refinement. Even though this code was optimized for an earlier generation of GPUs, this implementation adaptively renders subdivision surfaces in real time on current hardware. Lai and Cheng [26] implemented adaptive Catmull-Clark subdivision. Hardware architecture support for adaptive refinement is proposed by [5].

The implementations of subdivision surfaces on the GPU can be roughly categorized into three groups: (I) recursive evaluation [9, 13, 28, 44, 46]; (II) direct evaluation [45, 47]; (III) pre-tabulated basis function composition [6, 7]. Recursive evaluation is the most intuitive way, but not the most efficient approach. Stam [47] directly evaluates subdivision surfaces at arbitrary parameter values. However, Stam's method cannot evaluate a mesh that contains Type-3 quads. Moreover, the required projection of control points into the eigenspace is too complex for large meshes on the GPU. The weakness of [6, 7, 9, 46] is that they cannot convert a mesh with Type-3 quads either. Getting rid of those quads usually means applying at least one Catmull-Clark subdivision step on the CPU and a four-fold data transfer to the GPU. In more detail, Shiue implements recursive Catmull-Clark subdivision using several passes via the pixel shader, using textures for storage and spiral-enumerated mesh fragments
for maximizing parallelism [46]. Bolz tabulates the subdivision nodal functions up to a given density and linearly combines them on the GPU [6, 7]. The number of nodal functions equals the number of vertices of the input mesh.

One of the obvious advantages of subdivision surfaces is that they can model surfaces of arbitrary topological type. Also, because of the static refinement rule of each scheme, subdivision surfaces are easy to implement. Although subdivision surfaces have been known for nearly twenty years, their use has been hindered in real-time applications such as games because recursive refinement is neither memory-efficient nor performance-efficient. Multiple passes are required to render a visually smooth surface. Moreover, the approximately 4-fold increase of geometry after each subdivision step causes heavy memory traffic on the bus between the CPU and the GPU.

1.4.2 Parametric Patches

Since current and impending GPU configurations favor short explicit surface definitions over recursively defined surfaces, the alternative, patch-based refinement, has been advocated for fast rendering. Parametric patches (PP for short) are rendered directly in terms of their polynomial representations, as opposed to a collection of approximating facets. Generally speaking, PP schemes convert control meshes to a set of patches that are parametric piecewise polynomials. PP schemes can conveniently fit into a 2-pass implementation on the current graphics pipeline (Figure 1-10). The two rendering passes are combined into one pass in a future GPU pipeline (Figure 1-11) [48].

Figure 1-10. The animation and Displacement Mapping (DM) take place in the VS of the first pass and the second pass, respectively. The first pass converts the deformed control mesh to its parametric patch representations. In the following pass, the details are added using DM after the evaluation of the patches produced by the previous pass.
The overall speed of a PP scheme is influenced by both the complexity of patches and the number of patches. For shape measurements, a desirable PP scheme ensures at least G1 continuity across adjacent patches and is a close approximation of subdivision surfaces. One of the biggest challenges is to ensure smoothness everywhere over the patches. Peters explained how to solve the vertex enclosure problem and geometric continuity in [39, 41].

GPU-based evaluation of trimmed NURBS surfaces is proposed in [16, 25]. Peters [40] used an approximation to the limit surface of Doo-Sabin subdivision to get a quickly convergent series of approximations to the volume of the enclosed subdivision surface. The difficult problem of filling n-sided holes was recently solved by [21, 42]. Bajaj et al. [2] introduced A-patches in trivariate BB-form with few free parameters to adjust the shape both locally and globally. In [15], the free-form surface is represented in either NURBS form or as cubic triangular Bézier patches. An explicit spline representation of smooth free-form surfaces is to form the basis of an interactive sculpting environment. In the spirit of the Tessellator, Boubekeur [8] describes a generic refinement pattern for Surface Modeling (tessellation + displacement) on any programmable GPU.

Figure 1-11. One possible pass on the future graphics rendering pipeline.

1.4.2.1 Bezier technique

The Bézier form is a parametric surface representation and was first developed in 1972 by the French engineer Pierre Bézier. A comprehensive overview of the Bézier form can be found in [43]. A Bézier patch is defined by control points. A Bézier surface, as a set of Bézier patches, is piecewise polynomial. Bézier surfaces are visually intuitive and mathematically convenient due to the following properties:
1. Affine invariance: Applying an affine transformation to a control mesh applies it to the corresponding Bézier patch as well.
2. The convex hull property: A Bézier patch lies completely within the convex hull of its control points, and therefore also completely within the bounding box of its control points in any given Cartesian coordinate system.
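Both properties can be checked numerically. The sketch below uses a planar cubic Bézier curve rather than a patch, purely to keep the example short; the control points, the affine map and all function names are hypothetical choices for illustration, and the bounding-box test is the weaker consequence of the convex hull property mentioned above.

```python
def de_casteljau(points, t):
    """Evaluate a Bezier curve with the given control points at t in [0, 1]."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # Repeated linear interpolation between neighboring points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def affine(p):
    """A hypothetical affine map: scale and shear, followed by a translation."""
    x, y = p
    return (2.0 * x + 1.0, y + 0.5 * x - 3.0)

def in_bounding_box(p, points, eps=1e-12):
    """True if p lies in the axis-aligned bounding box of the given points."""
    return all(min(c) - eps <= x <= max(c) + eps
               for x, c in zip(p, zip(*points)))

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
t = 0.3

# Affine invariance: map the curve point, or evaluate the mapped control points.
mapped_point = affine(de_casteljau(ctrl, t))
point_of_mapped = de_casteljau([affine(p) for p in ctrl], t)

# Bounding-box consequence of the convex hull property at a few samples.
samples_inside = all(in_bounding_box(de_casteljau(ctrl, k / 10), ctrl)
                     for k in range(11))
```

The same argument carries over to patches because every Bernstein-Bézier basis is non-negative on its domain and sums to one.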
There are two types of Bézier patch:

A tensor-product patch in Bézier form of degree m by n is defined as

$$ g(u,v) := \sum_{i=0}^{m} \sum_{j=0}^{n} g_{ij} \binom{m}{i} u^i (1-u)^{m-i} \binom{n}{j} v^j (1-v)^{n-j}, $$

where (u, v) is a parameter pair in the domain [0, 1] × [0, 1].

A triangular Bézier patch of degree n is defined as

$$ b(s,t,w) := \sum_{\substack{i+j+k=n \\ i,j,k \ge 0}} b_{ijk} \binom{n}{i\,j\,k} s^i t^j w^k, \qquad \binom{n}{i\,j\,k} := \frac{n!}{i!\,j!\,k!}, $$

where (s, t, w), with s + t + w = 1, are the barycentric coordinates on a triangle domain.

1.4.2.2 Related work

For quadrilateral input meshes, it is well known that Type-1 quads can be converted into degree 3 by 3 patches in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from ordinary quads will join C2. The interesting aspect is the conversion of Type-2 and Type-3 quads. A number of techniques (see the comparison in Figure 1-12) exist to smooth out quad meshes. Peters [38] generates NURBS output that could be rendered, for example, by the GPU algorithm of [17]. But this has not been implemented. The method of [30] generates one bicubic patch per quad following the shape of Catmull-Clark surfaces. Since these bicubic patches typically do not join smoothly, Loop and Schaefer compute two additional patches whose cross product approximates the normal of the bicubic patch. As pointed out in [49], this trompe l'oeil represents a simple solution when true smoothness is not needed. Comparing the number of operations in construction and evaluation, the method of [30] should run at comparable speeds to our GPU quad mesh smoothing. Our method [37] designs a c-patch for converting an irregular quad. The resulting c-patches form a G1 surface. The alternative algorithm proposed by [35] uses a bi-5 Bézier patch for each irregular quad.

Figure 1-12. This figure compares existing PP schemes in terms of how well they meet the performance and shape measurements. geom = geometry patches, tan = tangent patches.
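The two patch definitions of Section 1.4.2.1 translate directly into code. The following is a minimal CPU-side sketch, not the shader code of this dissertation; the data layouts (a nested list `g[i][j]` for the tensor-product patch, a dictionary keyed by `(i, j, k)` for the triangular patch) and the function names are assumptions made for illustration.

```python
from math import comb, factorial

def tensor_patch(g, u, v):
    """Evaluate a degree m-by-n tensor-product Bezier patch.
    g[i][j] is the control point (a tuple) with index (i, j)."""
    m, n = len(g) - 1, len(g[0]) - 1
    dim = len(g[0][0])
    out = [0.0] * dim
    for i in range(m + 1):
        bu = comb(m, i) * u**i * (1 - u)**(m - i)
        for j in range(n + 1):
            weight = bu * comb(n, j) * v**j * (1 - v)**(n - j)
            for d in range(dim):
                out[d] += weight * g[i][j][d]
    return tuple(out)

def triangular_patch(b, n, s, t):
    """Evaluate a degree-n triangular Bezier patch at barycentric (s, t, 1-s-t).
    b maps index triples (i, j, k) with i + j + k = n to control points."""
    w = 1.0 - s - t
    dim = len(next(iter(b.values())))
    out = [0.0] * dim
    for (i, j, k), p in b.items():
        multinomial = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
        weight = multinomial * s**i * t**j * w**k
        for d in range(dim):
            out[d] += weight * p[d]
    return tuple(out)

# Usage: control points placed so that each patch reproduces its parameters.
g = [[(i / 3, j / 3, 0.0) for j in range(4)] for i in range(4)]
p = tensor_patch(g, 0.25, 0.6)

b = {(i, j, 4 - i - j): (i / 4, j / 4) for i in range(5) for j in range(5 - i)}
q = triangular_patch(b, 4, 0.2, 0.5)
```

With the control points above, the bicubic patch returns (u, v, 0) and the quartic triangular patch returns (s, t), which is a convenient sanity check for the index conventions; the quartic case has 15 coefficients.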
CHAPTER 2
A NEW SCHEME FOR SURFACE CONSTRUCTION

2.1 Contribution

This thesis proposes a set of rules for converting a quadrilateral mesh to a surface consisting of bicubic splines wherever possible. Each irregular quad (Figure 1-7) is converted to a novel C1 surface patch (c-patch for short). The surface closely mimics the shape of the Catmull-Clark subdivision surface and is constructed entirely by local parallel operations on the GPU. The resulting surface is piecewise polynomial and has well-defined normals everywhere. The evaluation avoids pixel drop-out.

A c-patch is a C1 piecewise polynomial patch with cubic boundary. It is defined by 24 coefficients whose instantiation for a smooth surface is given in Section xxx below and indicated in Figure 2-1. A c-patch has an alternative representation as four triangular, total degree 4 patches in Bernstein-Bézier form (Figure 2-5, right).

Figure 2-1. The c-patch coefficients. For i = 0, 1, 2, 3, the boundary coefficients $v^i$ and $e^i_j$ are defined by vertex neighborhoods (Figure 2-4 specifies the formulas). The interior coefficients are $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ (Figure 2-6), where i = 0..3, j = 0..$n_i$, and $n_i$ is the valence of $v^i$.

Figure 2-2. Smoothing the vertex neighborhood according to Figure 2-4. The center point p, its direct neighbors $p_{2j}$ and diagonal neighbors $p_{2j+1}$ form a vertex neighborhood, j = 0..n-1.

Figure 2-3. a) A quad neighborhood defining a surface piece. b) A bicubic patch with 4 × 4 control points. This patch is the output if the quad is regular, and is used to determine the shape of a c-patch c) if the quad is irregular. A c-patch is defined by 4 × 6 control points and can alternatively, for analysis, be represented as four C1-connected triangular pieces of degree 4 with degree-3 outer boundaries identical to the bicubic patch boundaries.

2.2 The Conversion Algorithm

Here we give the detailed algorithm for converting the quad mesh into coefficients that define a smooth surface of low degree. Essentially, the conversion from a mesh to a patch
consists of computing new points near a vertex using the knowledge of the vertex neighborhood. A vertex neighborhood consists of a mesh point $p$ and the mesh points $p_k$, $k = 0,\ldots,2N-1$, of all quads surrounding $p$ (Figure 2-2). The union of the four vertex neighborhoods is the quad neighborhood (Figure 2-3 a) that defines a patch. In our scheme, the patch is either a tensor-product bicubic Bézier patch or a c-patch.

2.2.1 The Conversion Rules for a Type-1 Quad

Recall that a quad is Type-1 if all four vertices have 4 neighbors. Type-1 quads are considered regular in the literature. Such a facet will be converted into a degree 3-by-3 patch in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from Type-1 quads will join $C^2$. Figure 2-3 illustrates the derivation process from a quad to a bicubic Bézier patch. The conversion rules are shown in Figure 2-4.

Figure 2-4. Computing control points $v$, $e$, $f$ and $t$, the projection of $e$, at a vertex of valence $N$ from the mesh points $p_j$ of a vertex neighborhood; the subscripts are modulo $2N$. By default, $\sigma_N := \left(c_N + 5 + \sqrt{(c_N+9)(c_N+1)}\right)/16$, the subdominant eigenvalue of Catmull-Clark subdivision, where $c_N := \cos(2\pi/N)$.
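As a quick numerical check of the default scale factor in the caption of Figure 2-4, the following sketch evaluates the subdominant eigenvalue formula. It assumes $c_N = \cos(2\pi/N)$, in line with the cosine terms defined in Figure 2-6; the function name is illustrative and not part of the implementation described in this dissertation.

```python
import math


def subdominant_eigenvalue(valence):
    """Default scale factor (c + 5 + sqrt((c + 9)(c + 1))) / 16.

    Assumption: c = cos(2*pi/valence).
    """
    c = math.cos(2.0 * math.pi / valence)
    return (c + 5.0 + math.sqrt((c + 9.0) * (c + 1.0))) / 16.0


if __name__ == "__main__":
    # Print the scale factor for a few common valences.
    for n in (3, 4, 5, 6):
        print(n, round(subdominant_eigenvalue(n), 6))
```

For valence 4 the value is 1/2 up to rounding, which is the regular case in which the tangent projection leaves the edge points unchanged.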
A vertex $v$ computed according to Figure 2-4 is the limit point of Catmull-Clark subdivision as explained, for example, in [18]. The rules for $e_j$ and $f_j$ are the standard rules for converting a uniform bicubic tensor-product B-spline to its Bézier representation. The points $t_j$ are a projection of $e_j$ into a common tangent plane (see e.g. [15]). The default scale factor $\sigma$ is the subdominant eigenvalue of Catmull-Clark subdivision. We note that for $N = 4$, $e_{j+2} = 2v - e_j$ and $\sigma = 1/2$, so that the projection leaves the tangent control points invariant as $t_j = e_j$:
$$\text{for } N = 4:\quad t_j = v + \tfrac{2}{4}(e_j - e_{j+2}) = v + (e_j - v) = e_j. \qquad (2\text{-}1)$$

In the next stage, we combine information from four vertex neighborhoods, as shown in Figure 2-5, to populate a tensor-product patch $g$ of degree 3 by 3 in Bézier form [14]:
$$g(u,v) := \sum_{k=0}^{3}\sum_{\ell=0}^{3} g_{k\ell}\binom{3}{k}u^{k}(1-u)^{3-k}\binom{3}{\ell}v^{\ell}(1-v)^{3-\ell}.$$
The patch is defined by its 16 control points $g_{k\ell}$. The formulas of Figure 2-4 make this patch the Bézier representation of a bicubic spline in B-spline form. For example, in the notation of Figure 2-5, $(g_{k0})_{k=0,\ldots,3} = (v^0, t^0_0, t^1_1, v^1)$.

Figure 2-5. Patch construction. On the left, four vertex neighborhoods with vertices $v^i$ each contribute one sector to assemble the $4\times 4$ coefficients of the Bézier patch $g$, for example $g_{00} = v^0$, $g_{10} = e^0_0$, $g_{11} = f^0$, $g_{30} = v^1$, $g_{31} = e^1_0$ (we use superscripts to indicate vertices). On the right, the same four sectors are used to determine a c-patch if the underlying quad is extraordinary. The indices of the control points of $g$ and $b^i$ are shown. Note that only a subset of the coefficients of the four triangular pieces $b^i$ is actually computed to define the c-patch. The full set of coefficients displayed here is only used to analyze the construction. The indexing of the 15 coefficients of a quartic triangular patch is shown on the right. We use this labeling throughout the dissertation.

2.2.2 The Conversion Rules for a Type-2 or Type-3 Quad

Type-2 and Type-3 quads are known as irregular. An irregular quad has at least one and possibly up to four vertices with valence other than 4. For each irregular quad, the conversion involves two steps:
1.
Apply the regular rules defined in Figure 2-4 to generate $v^i$ and $e^i_j$ shown in Figure 2-1, left.
2. Then apply the rules in Figure 2-6 to yield $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ shown in Figure 2-1, right.

We use the bicubic patch to outline the shape as we replace it by a c-patch (Figure 2-3 c). A c-patch has the right degrees of freedom to cheaply and locally construct a smooth surface. We introduce the c-patch in terms of the well-known Bézier form of a polynomial piece $b^i$ of total degree 4 [14]:
$$b^i(u_1,u_2) := \sum_{\substack{k+\ell+m=4\\ k,\ell,m\ge 0}} b^i_{k\ell m}\,\frac{4!}{k!\,\ell!\,m!}\,u_1^{k}u_2^{\ell}(1-u_1-u_2)^{m}. \qquad (2\text{-}2)$$
The c-patch is equivalent to the union of four pieces $b^i$, $i = 0,1,2,3$, of total degree 4, but it is defined by only $4\times 6$ c-coefficients constructed in Figures 2-4 and 2-6:
$$v^i,\ t^i_0,\ t^i_1,\ b^i_{211},\ b^i_{121},\ b^i_{112},\qquad i = 0,1,2,3.$$
These 24 c-coefficients imply the missing interior control points of the representation (2-2) by $C^1$ continuity between the triangular pieces: for $j = 0,1,2,3$ and $i = 0,1,2,3$,
$$b^i_{3-j,0,1+j} = b^{i-1}_{0,3-j,1+j} := \left(b^i_{3-j,1,j} + b^{i-1}_{1,3-j,j}\right)/2; \qquad (2\text{-}3)$$
and the boundary control points $b^i_{k\ell 0}$ are implied by degree-raising [14]:
$$b^i_{400} := v^i,\quad b^i_{310} := (v^i + 3t^i_0)/4,\quad b^i_{220} := (t^i_0 + t^{i+1}_1)/2,\quad b^i_{130} := (v^{i+1} + 3t^{i+1}_1)/4,\quad b^i_{040} := v^{i+1}. \qquad (2\text{-}4)$$

Figure 2-6. Formulas for the $4\times 3$ interior control points that, together with the vertex control points $v^i$ and the tangent control points $t^i_j$, define a c-patch. See also Figures 2-11 and 2-12. Here $c_i := \cos(2\pi/N_i)$, $s_i := \sin(2\pi/N_i)$, and superscripts are modulo 4. By default, $g := \sum_{i=0}^{3}\left(v^i + 3(e^i_0 + e^i_1) + 9f^i\right)/64$, the central point of the ordinary patch.

For all objects with boundaries, the boundary rules are simply the derivation of cubic Bézier curves defined by $(v^i, t^i_0, t^{i+1}_1, v^{i+1})$. Basis functions corresponding to the 24 c-coefficients of the c-patch can be read off by setting one c-coefficient to one and all others to zero and then applying (2-3) and (2-4).
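The degree-raising rule (2-4) can be checked numerically: the quartic boundary coefficients must trace the same curve as the cubic $(v^i, t^i_0, t^{i+1}_1, v^{i+1})$. The following is a minimal sketch under that reading; the function names and the sample control points are illustrative assumptions, not taken from the implementation.

```python
from math import comb


def bezier_curve(ctrl, u):
    """Evaluate a Bezier curve with control points ctrl (tuples) at parameter u."""
    n = len(ctrl) - 1
    point = [0.0] * len(ctrl[0])
    for k, c in enumerate(ctrl):
        weight = comb(n, k) * (1.0 - u) ** (n - k) * u ** k
        for d, value in enumerate(c):
            point[d] += weight * value
    return tuple(point)


def _mix(a, b, wa, wb, den):
    """Return (wa * a + wb * b) / den, componentwise."""
    return tuple((wa * x + wb * y) / den for x, y in zip(a, b))


def degree_raise_boundary(v_i, t_i0, t_next1, v_next):
    """Quartic coefficients b400, b310, b220, b130, b040 following (2-4)."""
    return [
        tuple(v_i),
        _mix(v_i, t_i0, 1.0, 3.0, 4.0),
        _mix(t_i0, t_next1, 1.0, 1.0, 2.0),
        _mix(v_next, t_next1, 1.0, 3.0, 4.0),
        tuple(v_next),
    ]


if __name__ == "__main__":
    # Hypothetical boundary cubic; both forms should print the same point.
    cubic = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (3.0, 2.0, 1.0), (4.0, 0.0, 0.0)]
    quartic = degree_raise_boundary(*cubic)
    print(bezier_curve(cubic, 0.3), bezier_curve(quartic, 0.3))
```

Because only the representation changes, a bicubic patch and a triangular piece that share this boundary agree along it.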
2.3 Derivation of the Coefficients of a c-patch

When a c-patch sector $b$ meets a c-patch sector $a$ (Figure 2-12), the following equation must hold to preserve $G^1$ continuity across the boundary between $b$ and $a$:
$$\lambda(u)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}5)$$
where, with $\cdot$ denoting the scalar product (respectively three scalar products for the vectors),
$$\lambda(u) := (\lambda_0, \lambda_1)\cdot(u, 1-u),$$
$$\partial_1 b(u,0) := 3\,(U_0, 2U_1, U_2)\cdot\left(u^2,\ u(1-u),\ (1-u)^2\right),$$
$$\partial_2 b(u,0) := 4\,(v_0, 3v_1, 3v_2, v_3)\cdot\left(u^3,\ u^2(1-u),\ u(1-u)^2,\ (1-u)^3\right),$$
$$\partial_1 a(0,u) := 4\,(w_0, 3w_1, 3w_2, w_3)\cdot\left(u^3,\ u^2(1-u),\ u(1-u)^2,\ (1-u)^3\right).$$
Equation (2-5) can be rewritten as the following collection of simplified forms in terms of $U_i$, $v_i$, $w_i$:
$$3\lambda_0 U_0 = 4v_0 + 4w_0 \qquad (2\text{-}6)$$
$$6\lambda_0 U_1 + 3\lambda_1 U_0 = 12(v_1 + w_1) \qquad (2\text{-}7)$$
$$3\lambda_0 U_2 + 6\lambda_1 U_1 = 12(v_2 + w_2) \qquad (2\text{-}8)$$
$$3\lambda_1 U_2 = 4v_3 + 4w_3 \qquad (2\text{-}9)$$

2.3.1 Derivation of $\lambda_0$ and $\lambda_1$

The scalar $\lambda_0$ is derived from (2-6); (2-9) sets the constraint for $\lambda_1$. Let $U_0 := (1,0)$, $V_0 := \left(\cos\frac{2\pi}{n_0}, \sin\frac{2\pi}{n_0}\right)$, and $W_0 := \left(\cos\frac{2\pi}{n_0}, -\sin\frac{2\pi}{n_0}\right)$ (Figure 2-7). We know $u_0 = \frac34 U_0$ and $u_3 = \frac34 U_2$ from degree raising. Then
$$v_0 + w_0 = \tfrac12\left(\tfrac34 V_0 + \tfrac34 U_0\right) + \tfrac12\left(\tfrac34 W_0 + \tfrac34 U_0\right) = \tfrac34\left(\tfrac{1+\cos\frac{2\pi}{n_0}}{2}, \tfrac{\sin\frac{2\pi}{n_0}}{2}\right) + \tfrac34\left(\tfrac{1+\cos\frac{2\pi}{n_0}}{2}, -\tfrac{\sin\frac{2\pi}{n_0}}{2}\right) = \tfrac34\left(1+\cos\tfrac{2\pi}{n_0},\ 0\right) = \tfrac34\left(1+\cos\tfrac{2\pi}{n_0}\right)U_0. \qquad (2\text{-}10)$$
Hence $4(v_0 + w_0) = 3\left(1+\cos\frac{2\pi}{n_0}\right)U_0$ and, by (2-6), $\lambda_0 = 1 + \cos\frac{2\pi}{n_0}$. Similarly, because $V_3 = \left(1-\cos\frac{2\pi}{n_1}, \sin\frac{2\pi}{n_1}\right)$ and $W_3 = \left(1-\cos\frac{2\pi}{n_1}, -\sin\frac{2\pi}{n_1}\right)$,
$$4(v_3 + w_3) = 3\left(1-\cos\tfrac{2\pi}{n_1}\right)U_2. \qquad (2\text{-}11)$$
Hence $\lambda_1 = 1 - \cos\frac{2\pi}{n_1}$.

Figure 2-7. The reparameterization to meet $G^1$ at the vertex.

2.3.2 Derivation of $b_{211}$ and $b_{121}$

To derive the formulas for $b^i_{211}$ and its symmetric counterpart $b^i_{121}$, note that the formulas must guarantee a smooth transition between $b^i$ and its neighbor patch on an adjacent quad, regardless of whether the adjacent quad is regular or irregular. That is, the formulas are derived to satisfy simultaneously two types of smoothness constraints (see Section 2.4).

Figure 2-8. Coefficients $b_{211}$ and $b_{121}$ of a c-patch are derived on top of a ghost patch. (Labels in the figure: ghost patch, triangular patches.)

From Equation
(2-7), we obtain
$$b_{211} + a_{211} = \tfrac12\lambda_0 U_1 + \tfrac14\lambda_1 U_0 + 2b_{310}. \qquad (2\text{-}12)$$
To get a second constraint and determine $b_{211}$ uniquely, we express $b_{211}$ and $a_{211}$ on each ghost patch in terms of sine-weighted averages (Figure 2-8):
$$4s_0(b_{211} - b_{310}) + 4s_1(b_{211} - b_{220}) = 3(b_{11} - b_{10})$$
yields
$$b_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_0 - t^0_0)}{4(s_0 + s_1)}. \qquad (2\text{-}13)$$
Similarly,
$$a_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_{n_0-1} - t^0_0)}{4(s_0 + s_1)}. \qquad (2\text{-}14)$$
Therefore,
$$b_{211} - a_{211} = \frac{3(f^0_0 - e^0_0)}{2(s_0 + s_1)}. \qquad (2\text{-}15)$$
Together with Equation (2-12),
$$b_{211} = b_{310} + \tfrac14\lambda_0(t^1_1 - t^0_0) + \tfrac18\lambda_1(t^0_0 - v^0) + \frac{3(f^0_0 - e^0_0)}{4(s_0 + s_1)}. \qquad (2\text{-}16)$$
Equation (2-8) implies
$$b_{121} + a_{121} = \tfrac14\lambda_0 U_2 + \tfrac12\lambda_1 U_1 + 2b_{130}. \qquad (2\text{-}17)$$
Using the same approach as for $b_{211}$,
$$4s_0(b_{121} - b_{220}) + 4s_1(b_{121} - b_{130}) = 3(b_{21} - b_{20})$$
yields
$$b_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_0 - t^1_1)}{4(s_0 + s_1)}. \qquad (2\text{-}18)$$
Similarly,
$$a_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_1 - t^1_1)}{4(s_0 + s_1)}. \qquad (2\text{-}19)$$
(2-18) and (2-19) imply
$$b_{121} - a_{121} = \frac{3(f^1_0 - e^1_1)}{2(s_0 + s_1)}, \qquad (2\text{-}20)$$
and (2-17) and (2-20) imply
$$b_{121} = b_{130} + \tfrac18\lambda_0(v^1 - t^1_1) + \tfrac14\lambda_1(t^1_1 - t^0_0) + \frac{3(f^1_0 - e^1_1)}{4(s_0 + s_1)}. \qquad (2\text{-}21)$$
The formulas (2-16) and (2-21) are the same as shown in Figure 2-6.

2.3.3 Derivation of $b_{112}$

By contrast, $b^i_{112}$ is not pinned down by continuity constraints. We could choose each $b^i_{112}$ arbitrarily without changing the formal smoothness of the resulting surface. However, we opt for increased smoothness at the center of the c-patch and additionally use the freedom to closely mimic the shape of Catmull-Clark subdivision surfaces, as we did earlier for vertices. First, we approximately satisfy four $C^2$ constraints across the diagonal boundaries at the central point $b_{004}$ (Figure 2-9) by enforcing
$$\begin{pmatrix} 1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1\\ -1 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} b^0_{112}\\ b^1_{112}\\ b^2_{112}\\ b^3_{112} \end{pmatrix} = \frac12\begin{pmatrix} b^0_{211} - b^1_{121} - q\\ b^1_{211} - b^2_{121} - q\\ b^2_{211} - b^3_{121} - q\\ b^3_{211} - b^0_{121} - q \end{pmatrix}, \qquad (2\text{-}22)$$
where $q := \frac14\sum_{i=0}^{3}\left(b^i_{211} - b^i_{121}\right)$. The perturbation by $q$ is necessary, since the coefficient matrix of the $C^2$
constraints is rank deficient. After perturbation, the system can be solved, with the last equation implied by the first three. We add the constraint that the average of the $b^i_{112}$ matches $g := g(\frac12, \frac12)$, the center position of the bicubic patch:
$$\begin{pmatrix} 1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1\\ 1 & 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} b^0_{112}\\ b^1_{112}\\ b^2_{112}\\ b^3_{112} \end{pmatrix} = \frac12\begin{pmatrix} b^0_{211} - b^1_{121} - q\\ b^1_{211} - b^2_{121} - q\\ b^2_{211} - b^3_{121} - q\\ 8g \end{pmatrix}.$$

Figure 2-9. Dark lines cover the control points involved in the $C^2$ constraints (2-22). The points on dashed lines are implied by averaging.

The point $g$ lies on the bicubic patch at $u = 0.5$ and $v = 0.5$. All bicubic control points except the 4 interior points are already given, because all the control points on the boundaries have been calculated. We can use a mask for determining Bézier control points from a uniform bicubic B-spline surface. Figure 2-10 (a) is a mask for $b_{11}$. For the other interior points, we can use a symmetric mask. Figure 2-10 (b) shows a mask for the evaluation of the bicubic patch at $(0.5, 0.5)$:
$$g = \tfrac{1}{64}\left(b_{00} + 3b_{01} + 3b_{02} + b_{03} + 3b_{10} + 9b_{11} + 9b_{12} + 3b_{13} + 3b_{20} + 9b_{21} + 9b_{22} + 3b_{23} + b_{30} + 3b_{31} + 3b_{32} + b_{33}\right).$$

Figure 2-10. The center of a bicubic patch can be evaluated by a linear combination of the boundary coefficients.

Now we can solve for the $b^i_{112}$, $i = 0,1,2,3$, and obtain the formula of Figure 2-6.

2.4 Smoothness Verification

In this section we formally verify the following lemma. For the purpose of the proof, we view the c-patch in its equivalent representation as four Bézier patches of total degree 4.

Lemma 1. Two adjacent polynomial pieces $a$ and $b$ defined by the rules of Section 2.2 (Figure 2-4, Figure 2-6, (2-3), (2-4)) meet at least
(i) $C^2$ if $a$ and $b$ correspond to two regular quads;
(ii) $C^1$ if $a$ and $b$ are adjacent pieces of a c-patch;
(iii) $C^1$ if $a$ and $b$ correspond to two quads, exactly one of which is regular;
(iv) with tangent continuity if $a$ and $b$ correspond to two different irregular quads.

Proof. (i) If $a$ and $b$ are bicubic patches corresponding to regular quads, they are part of a bicubic spline with uniform knots and therefore meet $C^2$. (ii) If $a$ and $b$ are adjacent pieces of a c-patch, then Equations (2-3) enforce $C^1$ continuity.
For the remaining cases, let $b$ be a triangular piece. Let $u$ be the parameter corresponding to the quad edge between $b_{400} = v^0$, where $u = 0$ and the valence is $N_0$, and $b_{040} = v^1$, where $u = 1$ and the valence is $N_1$ (Figure 2-11 for case (iii) and Figure 2-12 for case (iv)). By construction, the common boundary $b(u,0) = a(0,u)$ is a curve of degree 3 with Bézier control points $(v^0, t^0_0, t^1_1, v^1)$, so that bicubic patches on regular quads and triangular patches on irregular quads match up exactly. Denote by $\partial_1 b$ the partial derivative of $b$ along the common boundary and by $\partial_2 b$ the partial derivative in the other variable. Since $b(u,0) = a(0,u)$, we have $\partial_1 b(u,0) = \partial_2 a(0,u)$. The partial derivative of $a$ in the other variable is $\partial_1 a$. We will verify that the following conditions hold, which imply tangent continuity:

if one quad is ordinary (case (iii)),
$$\partial_1 b(u,0) = 2\,\partial_2 b(u,0) + \partial_1 a(0,u); \qquad (2\text{-}23)$$
if both quads are extraordinary (case (iv)),
$$\left((1-u)\lambda_0 + u\lambda_1\right)\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}24)$$
where $\lambda_0 := 1 + c_0$, $\lambda_1 := 1 - c_1$, and $c_i := \cos(2\pi/N_i)$.

Both equations, (2-23) and (2-24), equate vector-valued polynomials of degree 3 (we write $\partial_1 b(u,0)$ in degree-raised form [14]). The equations hold if and only if all Bézier coefficients are equal. Offhand, this means checking four vector-valued equations for each of (2-23) and (2-24). However, in both cases, the setup is symmetric with respect to reversal of the direction in which the boundary $b(u,0)$ is traversed. That means we need only check the first two equations (2-23') and (2-23'') of (2-23) and the first two equations (2-24') and (2-24'') of (2-24). We verify these equations by inserting the formulas of Figures 2-4 and 2-6.

Figure 2-11. $C^1$ transition between a triangular and a bicubic patch.

To verify (2-23), the key observation is that $N_0 = N_1 = 4$ if one quad is ordinary. Hence $c_0 = c_1 = 0$ and $s_0 = s_1 = 1$ (cf. Figure 2-6) and $t^i_j = e^i_j$. Therefore, for example (cf. Figure
2-11),
$$2\,\partial_2 b(0,0) = 2\cdot 4\,(b_{301} - v^0) = 8\cdot\tfrac34\left(\tfrac{e^0_0 + e^0_1}{2} - v^0\right) = 3(e^0_0 + e^0_1) - 6v^0,$$
where the factor $\frac34$ stems from raising the degree from 3 to 4; and the second Bézier coefficients of $\partial_1 b(u,0)$ (in degree-raised form) and of $2\,\partial_2 b(u,0)$ are, respectively (cf. Figure 2-11),
$$3\,\frac{(e^0_0 - v^0) + 2(e^1_1 - e^0_0)}{3}\quad\text{and}\quad 2\cdot 4\,(b_{211} - b_{310}) = 8\left(\frac{e^1_1 - e^0_0}{4} + \frac{e^0_0 - v^0}{8} + 3\,\frac{f^0 - e^0_0}{8}\right).$$
Then comparing the first two Bézier coefficients of $\partial_1 b(u,0)$ and $2\,\partial_2 b(u,0) + \partial_1 a(0,u)$ yields equality and establishes $C^1$ continuity:
$$\underbrace{3(e^0_0 - v^0)}_{\partial_1 b(0,0)} = \underbrace{3(e^0_0 + e^0_1) - 6v^0}_{2\,\partial_2 b(0,0)}\ \underbrace{-\,3(e^0_1 - v^0)}_{\partial_1 a(0,0)} \qquad (2\text{-}23')$$
$$(e^0_0 - v^0) + 2(e^1_1 - e^0_0) = 2(e^1_1 - e^0_0) + (e^0_0 - v^0) + 3(f^0 - e^0_0) - 3(f^0 - e^0_0). \qquad (2\text{-}23'')$$

The equations for (2-24) are similar, except that we need to replace $e_j$ by $t_j$ and keep in mind that, by definition,
$$(t^0_{n_0-1} - v^0) + (t^0_1 - v^0) = 2c_0\,(t^0_0 - v^0).$$
Hence, for example,
$$\partial_2 b(0,0) + \partial_1 a(0,0) = 4\,(b_{301} - v^0 + a_{301} - v^0) = 3(1 + c_0)(t^0_0 - v^0).$$

Figure 2-12. $G^1$ transition between two triangular patches.

The first of the four coefficient equations of (2-24) then simplifies to
$$3(1 + c_0)(t^0_0 - v^0) = 4\,(b_{301} + a_{301} - 2v^0) = 3\left(\frac{t^0_1 + t^0_0 - 2v^0}{2} + \frac{t^0_{N_0-1} + t^0_0 - 2v^0}{2}\right) = 3\cdot\tfrac12\left(2c_0(t^0_0 - v^0) + 2(t^0_0 - v^0)\right). \qquad (2\text{-}24')$$
Noting that the terms in $(f^0 - e^0_0)$ in the expansions of $b_{211}$ and $a_{211}$ cancel, the second coefficient equation is
$$6\lambda_0(t^1_1 - t^0_0) + 3\lambda_1(t^0_0 - v^0) = 12\,(b_{211} + a_{211} - 2b_{310}) = 12\,\frac{2(1 + c_0)}{4}(t^1_1 - t^0_0) + 12\,\frac{2(1 - c_1)}{8}(t^0_0 - v^0). \qquad (2\text{-}24'')$$
It is easy to read off that the equalities hold. So the claim of smoothness is verified.

2.5 Complexity Analysis

2.5.1 Number of Patches

The conversion scheme yields the minimum set of patches because (1) no initial refinement of the input coarse mesh is needed, and (2) each quadrilateral facet of the coarse mesh corresponds to only one patch. Namely, the total number of patches equals the number of facets in the mesh.
The patch complexity of various schemes is compared in Figure 1-12. The low cost of construction and evaluation makes c-patches an attractive representation, not just on the GPU.

2.5.2 Cost of Patch Construction

The separation into vertex and patch construction means that the number of scaled vertex additions (adds) per patch is independent of the valence. The cost of computing the control points per patch, i.e., with the cost of vertex computations distributed, is $4(4+1+1+2) = 32$ adds per bicubic construction. Computing $t_j$ from $t_0$ and $t_1$ and determining $b^i_{211}$, $b^i_{121}$ and $b^i_{112}$ according to Figure 2-6 amounts to an additional $4(2+6+6+12) = 104$ adds per c-patch. Each c-patch has 24 coefficients. This compares favorably to, say, [30], where 16+12+12 coefficients are generated.

2.5.3 Cost of Surface Evaluation

The patch can be evaluated at any parameter pair $(u,v)$ using de Casteljau's algorithm. A tensor-product bicubic Bézier patch is defined by 16 control points. The evaluation at $(u,v)$ needs 42 vector-vector additions, 42 scalar-vector multiplications, and 42 scalar-scalar operations. Similarly, the evaluation of a c-patch at $(u,v)$ requires 40 vector-vector additions and 60 scalar-vector multiplications. In terms of evaluation cost, a c-patch has roughly the same cost as a bicubic patch.
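To make the evaluation step concrete, here is a minimal sketch of de Casteljau evaluation for a bicubic Bézier patch with control points $g_{k\ell}$. It is an illustration in plain Python rather than the shader code used in this work, and it does not attempt to reproduce the operation counts quoted above; the helper names are assumptions.

```python
def lerp(a, b, t):
    """Affine combination (1 - t) * a + t * b of two points given as tuples."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(a, b))


def de_casteljau(points, t):
    """Evaluate a Bezier curve by repeated linear interpolation."""
    pts = list(points)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]


def eval_bicubic(g, u, v):
    """Evaluate the patch with control points g[k][l] (k runs with u, l with v)."""
    # Collapse each row in v, then evaluate the resulting cubic in u.
    curve = [de_casteljau(row, v) for row in g]
    return de_casteljau(curve, u)


if __name__ == "__main__":
    # Hypothetical control net: a 4 x 4 grid with a bilinear height k * l.
    g = [[(k / 3.0, l / 3.0, float(k * l)) for l in range(4)] for k in range(4)]
    print(eval_bicubic(g, 0.5, 0.5))
```

At $(0.5, 0.5)$ the result coincides with the $\frac{1}{64}$-weighted mask with weights $(1,3,3,1)\otimes(1,3,3,1)$ used for the center point $g$ in Section 2.3.3.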
2.6 Approximation of the Catmull-Clark Subdivision Surface

Since Catmull-Clark subdivision is a standard modeling tool, our scheme is designed to approximate the Catmull-Clark subdivision surface. In fact, the resulting bicubic patches completely agree with the Catmull-Clark subdivision surface except in the immediate neighborhood of irregular mesh vertices. In such a neighborhood they join at least with tangent continuity and interpolate the limit of the irregular mesh vertex. Furthermore, the center of a c-patch interpolates the center point of the corresponding Catmull-Clark limit surface due to the choice of the c-patch coefficient $b_{112}$.

2.7 Water-Tight Surface Verification

Patches are evaluated independently. If the vertices generated along the boundary by the adjacent patches do not match exactly, the refined mesh will have a hole in it. There are three configurations for adjacent patches: (1) both are bi-3 patches, (2) one of them is a bi-3 patch, (3) both are c-patches.

The coefficients defining the shared boundary curve are derived by the averaging rules defined in Figure 2-4. Since additions are commutative, the generation of all boundary coefficients is independent of the choice of patch. In other words, no roundoff error and no cracking are possible in the first case. The boundary coefficients of a c-patch are computed by the same rules in Figure 2-4; therefore water-tightness is also achieved in the latter two cases. Note that the computation of the cubic boundaries shared by a bicubic and a c-patch is mathematically identical.

2.8 Discussion

The introduction of triangular patches to model quad patches is somewhat unconventional, but has been used in an I3D paper before [15]. Also [49] is based on triangular patches. Evaluation and normal computation of degree-4 triangular patches is comparable in cost to tensor-product bicubic patches: in the triangular case we have to average 15 control points, in the tensor-product case 16. Triangular patches may deserve more attention in OpenGL.
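The remark about averaging 15 control points can be illustrated with a short sketch of de Casteljau evaluation for a triangular Bézier patch of total degree 4. The index convention (the weight $s$ pairs with the first index, as in the definition of the triangular patch in Section 1.4.2.1) and all function names are assumptions made for this example.

```python
def quartic_indices():
    """All index triples (i, j, k) with i + j + k = 4; there are 15 of them."""
    return [(i, j, 4 - i - j) for i in range(5) for j in range(5 - i)]


def eval_triangular(coeffs, degree, s, t, w):
    """Triangular de Casteljau evaluation at barycentric coordinates (s, t, w).

    coeffs maps every triple (i, j, k) with i + j + k = degree to a point.
    """
    layer = dict(coeffs)
    for n in range(degree - 1, -1, -1):
        nxt = {}
        for i in range(n + 1):
            for j in range(n + 1 - i):
                k = n - i - j
                a = layer[(i + 1, j, k)]
                b = layer[(i, j + 1, k)]
                c = layer[(i, j, k + 1)]
                nxt[(i, j, k)] = tuple(s * x + t * y + w * z for x, y, z in zip(a, b, c))
        layer = nxt
    return layer[(0, 0, 0)]


if __name__ == "__main__":
    # Hypothetical flat patch over the triangle A, B, C.
    A, B, C = (0.0, 0.0, 1.0), (2.0, 0.0, 1.0), (0.0, 3.0, 1.0)
    coeffs = {
        (i, j, k): tuple((i * a + j * b + k * c) / 4.0 for a, b, c in zip(A, B, C))
        for (i, j, k) in quartic_indices()
    }
    print(len(coeffs), eval_triangular(coeffs, 4, 0.2, 0.3, 0.5))
```

With evenly spaced control points as in the usage lines, the patch reproduces the flat triangle, which is a convenient sanity check before plugging in c-patch coefficients.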
CHAPTER 3
GPU IMPLEMENTATION

3.1 Overview

We implemented the conversion scheme using C++ on the DirectX 10 pipeline. We compute vertex neighborhoods according to Figure 2-4 in the vertex shader and use the geometry shader primitive "triangle with adjacency" to accumulate the coefficients of the bicubic patch or compute a c-patch according to Figure 2-6. We implemented conversion plus rendering in two variants: a 1-pass and a 2-pass scheme.

3.2 2-pass Approach

Figure 3-1. 2-pass implementation detailed in Figure 3-2. The first pass converts, the second renders. Note that the geometry shader only computes at most 24 coefficients per patch and does not create (amplify to) evaluation point primitives.

Figure 3-2. 2-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. VS Out of Pass 1 outputs $N$ points $f_j$ for one vertex (hence the subscript) and GS In of Pass 1 retrieves four points $f^i$, each generated by a different vertex of the quad (hence the superscript).

The 2-pass implementation constructs the patches in the first pass using the vertex shader and the geometry shader, and evaluates positions and normals in the second pass. Pass 1 streams out only the $4\times 6$ coefficients of a c-patch and not the $4\binom{4+2}{2}$ Bézier control points of the equivalent triangular pieces. The data amplification necessary to evaluate takes place by instancing a $(u,v)$ grid on the vertex shader in the second pass. That is, we do not stream back large data sets after amplification. Position and normal are computed on the $(u,v)$ domain $[0..1]^2$ of the bicubic or of the c-patch (not on any triangular domains). We pre-tessellate the quad domain and store the results in a set of textures with different resolutions. If the tessellation factor is chosen to be $m$, the texture with $(m+1)$ by $(m+1)$ parametric values will be sent to the vertex shader in the subsequent evaluation pass. Given the pre-tessellated domain with a patch identifier, the vertex shader loads the appropriate control points and evaluates the patch. Figure 3-2 lists the input, output and the computations of each pipeline stage. Figure 3-1 illustrates this association of computations and resources. In order to avoid pricey branching in HLSL (High
Level Shader Language) and to optimize performance, specialized shaders are written for patch construction and evaluation based on the patch type.

3.3 1-pass Approach

In the 1-pass implementation, the evaluation immediately follows conversion in the geometry shader, using the geometry shader's ability to amplify, i.e., to output multiple point primitives for each facet (Figure 3-4). While a 1-pass implementation sounds more efficient than a 2-pass implementation, DX10 limits data amplification in the geometry shader so that the maximal evaluation density is $8\times 8$ per quad. Moreover, maximal amplification in the geometry shader slows the performance. We observed that the 2-pass implementation performs at least 25% better. Figure 3-3 lists the data flow on the graphics pipeline.

Figure 3-3. 1-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. GS amplifies the geometry and evaluates the patches.

Figure 3-4. At present, the 1-pass conversion and rendering must place patch assembly and evaluation on the geometry shader. This is not efficient.

3.4 Coordinate System Transformation

When we evaluate normal and position of an irregular quad at $(u,v)$, we first need to transform the tessellated domain value from the Cartesian coordinates $(u,v)$ to barycentric coordinates $(s,t,w)$. Figure 3-5 illustrates how to determine in which of the four triangles $(u,v)$ lies. In this way, we minimize the number of comparisons and take care of the shared vertices. We make $(0.0, 0.0)$, $(1.0, 0.0)$ and $(0.5, 0.5)$ belong only to $T_1$, $(1.0, 1.0)$ belong only to $T_2$, and $(0.0, 1.0)$ belong only to $T_4$.

Figure 3-5. $(u,v)$ on an irregular quad. The unit square with corners $(0.0, 0.0)$, $(1.0, 0.0)$, $(1.0, 1.0)$, $(0.0, 1.0)$ is split at its center $(0.5, 0.5)$ into the triangles $T_1$, $T_2$, $T_3$, $T_4$.
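The sector lookup of Section 3.4 can be sketched as follows. The sketch assumes that $T_1$ to $T_4$ are the bottom, right, top and left triangles in counterclockwise order, which is consistent with the ownership rules for the shared vertices stated above but is not spelled out in the text; the ordering of the barycentric coordinates (two quad corners, then the center) is likewise an assumption.

```python
CENTER = (0.5, 0.5)
# Assumed corner pairs of T1..T4 (counterclockwise); the third corner is CENTER.
SECTOR_CORNERS = [
    ((0.0, 0.0), (1.0, 0.0)),  # T1: bottom
    ((1.0, 0.0), (1.0, 1.0)),  # T2: right
    ((1.0, 1.0), (0.0, 1.0)),  # T3: top
    ((0.0, 1.0), (0.0, 0.0)),  # T4: left
]


def locate_sector(u, v):
    """Return 0..3 for T1..T4 using two comparisons; ties follow Section 3.4."""
    if v <= u:
        return 0 if v <= 1.0 - u else 1
    return 3 if v <= 1.0 - u else 2


def to_barycentric(u, v):
    """Return (sector, (s, t, w)) with weights for corner A, corner B and the center."""
    sector = locate_sector(u, v)
    # Rotate (u, v) into the frame of T1, then read off the weights there.
    if sector == 0:
        lu, lv = u, v
    elif sector == 1:
        lu, lv = v, 1.0 - u
    elif sector == 2:
        lu, lv = 1.0 - u, 1.0 - v
    else:
        lu, lv = 1.0 - v, u
    return sector, (1.0 - lu - lv, lu - lv, 2.0 * lv)


if __name__ == "__main__":
    print(to_barycentric(0.75, 0.5))
```

Rotating the point into the frame of $T_1$ keeps the branch count small, in the spirit of avoiding costly branching in the shader.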
3.5 Water-Tight Evaluation

The HLSL code in Figure 3-6 shows that the same cubic curve is evaluated along the boundary. An explicit if-statement in the evaluation guarantees the exact same ordering of computations, since boundary coefficients are only computed once.

Figure 3-6. Water-tight evaluation.

3.6 Conclusion

The presented approach fits well into a GPU pipeline. In both approaches, we compute $v$, $e$, $f$ and $t$ using the vertex neighborhood and the rules in Figure 2-4 in the vertex shader. Each vertex has $2n+1$ vertices in its vertex neighborhood, where $n$ is the valence. This information is stored in a texture. With a vertex ID and its valence, all vertices in its neighborhood can be retrieved in counterclockwise order. In the geometry shader, the patch is finalized and assembled. Overall, the 2-pass implementation has better performance because of the small stream-out, short geometry shader code and minimal amplification on the geometry shader.

CHAPTER 4
RESULTS

4.1 Shape Quality

Our algorithm produces $C^1$ surfaces that closely approximate Catmull-Clark subdivision surfaces. We compare our algorithm with [30] on the closeness to Catmull-Clark surfaces. We measure how close the surface is to the Catmull-Clark surface by comparing both the geometric difference and the normal angle difference. Figure 4-1 compares the smoothed quad mesh surfaces with densely refined Catmull-Clark subdivision surfaces based on the same mesh. Both geometric distance, as a percentage of the local quad size, and normal distance, in degrees of variation, are compared. Especially after displacement, large models rendered by subdivision and quad smoothing appear visually indistinguishable. The relatively small examples, without displacement, shown in Figure 4-1 and the close-up in Figure 4-5 are also important to support our observation that c-patches do not create shape problems compared to a single bicubic patch: despite the lower degree and internal $C^1$ join, their visual appearance is remarkably similar to that of bicubic patches. The comparison with ACC patches [30] is shown in Figure 4-2. Figures 4-3 and 4-4 show the smooth surface generated by our algorithm and the surface after applying displacement mapping, respectively.
Figure 4-1. Comparison between the Catmull-Clark (CC) subdivision limit surface and the smoothed quad mesh surface for the same input.

Figure 4-2. Comparison of ACC-patch and c-patch in terms of approximation of Catmull-Clark subdivision surfaces for the same input.

Figure 4-3. GPU-smoothed quad surfaces: orange patches correspond to ordinary quads, blue patches to extraordinary quads.

Figure 4-4. GPU-smoothed quad surfaces with displacement mapping.

4.2 Performance

We compiled and executed the implementation on the latest graphics cards of both major vendors under DirectX 10 and tested the performance for several industry-sized models. Two surface models and the models with displacement mapping are shown in Figures 4-3 and 4-4, respectively. Table 4-2 summarizes the performance of the 2-pass algorithm for different granularities of evaluation. The frog model, in particular, provides a challenge due to the large number of extraordinary patches. The Frog Party shown in Figure 4-11 currently renders at 50 fps for uniform evaluation for N = 9, i.e., on a $9\times 9$ grid. That is, the implementation converts $1292\times 9$ quads, of which 59% are extraordinary, and renders on the order of 1 million polygons 50 times per second. On the same hardware, we measured Bunnell's efficient implementation (distribution accompanying [9]) featuring the single frog model, i.e., 1/9th of the work of the Frog Party, running at 44 fps with three subdivisions (equivalent to tessellation factor N = 9). That is, GPU smoothing of quad meshes is an order of magnitude faster. Compared to [46], the speedup is even more dramatic. While the comparison is not among equals, since both [46] and [9] implement recursive Catmull-Clark subdivision, it is nevertheless fair to observe that the speedup is at least partially due to our avoiding stream-back after amplification (data explosion due to refinement). We expect that more careful storage of vertex neighborhoods, in retrieving order, will further improve our use of the texture cache and thereby improve the frames per second (fps) count.

Table 4-1. A total degree 4 patch and a bicubic patch have the same evaluation cost at $(u,v)$ in terms of ALU operations.

ALU vector ops    c-patch    bicubic patch
position          55         56
normal            3          3
other             1          0
total             59         59

Table 4-2. Frames per second for some standard test meshes with each patch evaluated on a grid of size $N\times N$; eqs = percentage of extraordinary quads. Sword and Frog are shown in Figure 4-3, Head in Figure 4-1.

Mesh (verts, quads, eqs)    N=5    N=9    N=17   N=33
Sword (140, 138, 38%)       965    965    965    703
Head (602, 600, 100%)       637    557    376    165
Frog (1308, 1292, 59%)      483    392    226    87

Figure 4-5. Close-up of the frog. The refined mesh is water-tight.

Table 4-3. Performance of the (slower) 1-pass implementation.

Mesh     N=2    N=5    N=8
Sword    389    96     43
Head     108    34     15
Frog     44     10     4

4.3 Displacement Mapping

Displacement mapping is a technique for adding geometric details to the mesh with a height map. It is different from bump mapping or normal mapping in the sense that it changes the geometry by moving vertices, often along their normal directions, according to the value in the height map. The change of real geometry, not just of the normal as for instance in bump mapping, permits self-occlusion. Figure 4-6 shows the displacement mapping on the frog model, which consists of 330k facets. The size of the height map is 1024 by 1024.

Figure 4-6. Displacement mapping on the frog model.

In order to perturb normals after displacement mapping, we need the bump mapping values $D_u$ and $D_v$. The new normals are calculated as follows. Let
$$S = P + D\,n, \qquad (4\text{-}1)$$
where $S$ is the displaced position of the point $P$, $D$ is the displacement and $n$ is the normal at $P$. Then the new normal is calculated as the cross product of $S_u$ and $S_v$:
$$S_u = P_u + D_u\,n + D\,n_u, \qquad (4\text{-}2)$$
$$S_v = P_v + D_v\,n + D\,n_v. \qquad (4\text{-}3)$$
Note that $n_u$ and $n_v$ are the derivatives of the normalized normal $n$:
$$n_u = \frac{n'_u - n\,(n'_u\cdot n)}{\|n'\|}, \qquad (4\text{-}4)$$
where $n' := P_u\times P_v$ is the unnormalized normal and $n'_u = P_{uu}\times P_v + P_u\times P_{uv}$; $n_v$ is obtained analogously.

4.4 Morphing and Animation

We implement morphing using the 2-pass approach. The animated sequence of input meshes, in the form of textures, is fed into the Input Assembler of the first pass each frame. The morphed patches are constructed during the first pass. Fine details are added in the second pass. The screenshots in Figures 4-9, 4-10 and 4-11 illustrate real-time displacement and animation.
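The displaced-normal computation of Section 4.3 can be sketched as below. The sketch takes the first and second partial derivatives of the patch and the height-map values as plain tuples; it divides by the length of the unnormalized normal $P_u\times P_v$, which is how the derivative of a normalized vector works out, and all function and parameter names are assumptions for illustration.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, k):
    return tuple(k * x for x in a)


def add(*vectors):
    return tuple(sum(components) for components in zip(*vectors))


def unit_normal_derivative(n_raw, n_raw_d):
    """Derivative of n = n_raw / |n_raw|, given the derivative n_raw_d of n_raw."""
    length = math.sqrt(dot(n_raw, n_raw))
    n = scale(n_raw, 1.0 / length)
    return scale(add(n_raw_d, scale(n, -dot(n_raw_d, n))), 1.0 / length)


def displaced_normal(Pu, Pv, Puu, Puv, Pvv, D, Du, Dv):
    """Unnormalized normal S_u x S_v of the displaced surface S = P + D n."""
    n_raw = cross(Pu, Pv)
    n = scale(n_raw, 1.0 / math.sqrt(dot(n_raw, n_raw)))
    n_u = unit_normal_derivative(n_raw, add(cross(Puu, Pv), cross(Pu, Puv)))
    n_v = unit_normal_derivative(n_raw, add(cross(Puv, Pv), cross(Pu, Pvv)))
    Su = add(Pu, scale(n, Du), scale(n_u, D))
    Sv = add(Pv, scale(n, Dv), scale(n_v, D))
    return cross(Su, Sv)


if __name__ == "__main__":
    zero = (0.0, 0.0, 0.0)
    # Flat patch z = 0 with a height map that rises linearly in u.
    print(displaced_normal((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), zero, zero, zero, 1.0, 0.5, 0.0))
```

For a flat patch with a height map rising linearly in $u$, the resulting normal tilts against the $u$ direction, as expected.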
Figure 4-7. Comparison of the c-patch scheme with PN Triangles (also called N-patch), ACC-patch, and Catmull-Clark subdivision.

Figure 4-8. Comparison of the c-patch scheme with PN Triangles (also called N-patch), ACC-patches, and Catmull-Clark subdivision.

Figure 4-9. Real-time animation on the Sword model.

Figure 4-10. Real-time animation on the Frog model.

Figure 4-11. Asynchronous animation of nine Frogs.

4.5 Conclusion

Smoothing quad meshes on the GPU offers an alternative to highly refined facet representations transmitted to the GPU and is preferable for interactive graphics and integration with complex morphing and displacement.

We advertised a 2-pass scheme since, as we argued, the DX10 geometry shader is not well-suited for the data amplification for evaluation after conversion. The 1-pass scheme outlined in Section 3 may become more valuable with the availability of a dedicated hardware tessellator [29, 48]. Such a tessellator will make amplification more efficient and support adaptive tessellation (which is why we only discussed uniform tessellation in Section 3). Such hardware amplification will also benefit the 2-pass approach in that the $(u,v)$ domain tessellation fed into the second pass will be replaced by the amplification unit.
CHAPTER 5
PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

Our conversion algorithm can be generalized to work for arbitrary meshes. The generalized algorithm [34] provides an elegant solution for meshes with tri/quad/pent facets. Removing restrictions on vertex valences and allowing meshes with triangles, quadrilaterals, and pentagons vastly simplifies a designer's task and enriches the design space of meshes for smooth surfaces: while quads naturally model the flow of (parallel) feature lines and are therefore the main facet type in models, triangular facets allow merging lines, and pentagonal facets allow starting new lines (Figure 5-1), without creating T-corners or forcing refinement of intermediate models to satisfy connectivity or quad-layout constraints. Essentially, designers can reuse the whole range of polyhedral models they are used to. We modified the algorithm for converting quad meshes into a generalized method for a mesh with tri/quad/pent facets. The generalized scheme converts such a polyhedral model to a surface with an everywhere well-defined normal that is $C^2$ in `regular' mesh regions with quad-grid connectivity. Figure 5-2 shows an example of the resulting surfaces. Note that the facets are limited to triangles, quads and pentagons due to current GPU constraints and to avoid unnecessary notational, technical and shape complexity.

Figure 5-1. (a) Retaining the density of feature lines while varying their number. (b), (c) Axe handle detail using a triangle and a pentagon to transition between detailed and coarser areas.

An irregular facet with $k$ sides is converted into a k-patch. A k-patch is a generalization of a c-patch. It is a piecewise degree-4 $C^1$ spline patch with $k$ cubic boundaries. A k-patch is defined by $6k+1$ control points, as indicated in Figure 5-3 (b), (c). That is, the k-patch corresponding to a triangular, quadrilateral or pentagonal facet is defined by a total of 19, 24 or 31 points, respectively.
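Figure 5-2. The generalized scheme converts a mesh with tri/quad/pent facets to a smooth surface consisting of bicubic patches (yellow) and k-patches with $k = 3$ (green), $k = 4$ (red), and $k = 5$ (gray).

Figure 5-3. (a) An ordinary facet is converted to a bicubic patch with 16 control points $g_{ij}$. (b), (c) An extraordinary facet with $k$ sides is converted to a k-patch defined by $6k+1$ control points (marked in the figure). The k-patch can be viewed as $k$ $C^1$-connected degree-4 triangular patches $b^i$, $i = 0,\ldots,k-1$, with cubic outer boundaries.

Figure 5-4. The triangular sectors are listed in counterclockwise order with a modulo-$k$ superscript. (a) 14 control points from three consecutive sectors of a k-patch define (b) a single patch in triangular Bézier form.

For evaluation, we can recover the polynomial representation of the $i$th sector in triangular form of total degree 4 (Figure 5-3 (b) and (c)),
$$S(u,v) := \sum_{i+j+k=4} b_{ijk}\,\frac{4!}{i!\,j!\,k!}\,u^{i}v^{j}(1-u-v)^{k}, \qquad (5\text{-}1)$$
where the $\binom{4+2}{2}$ BB-coefficients $b_{ijk}\in\mathbb{R}^3$ are indexed as in Figure 5-4. Specifically, we compute the $\binom{4+2}{2}$ coefficients $b_{ijk}$ (Figure 5-4 (b)) from the 14 coefficients labeled in Figure 5-4 (a) by simple averaging: degree-raising the coefficients $b^i_{3-\ell,\ell,0}$, $\ell = 0,\ldots,3$, to $b^i_{4-\ell,\ell,0}$, $\ell = 0,\ldots,4$,
$$\left[b^i_{400},\ b^i_{310},\ b^i_{220},\ b^i_{130},\ b^i_{040}\right] = \left[b^i_{300},\ \frac{b^i_{300} + 3b^i_{210}}{4},\ \frac{b^i_{210} + b^i_{120}}{2},\ \frac{3b^i_{120} + b^i_{030}}{4},\ b^i_{030}\right],$$
and computing the shared coefficients on the sector boundaries, $b^i_{3-\ell,0,1+\ell} = b^{i-1}_{0,3-\ell,1+\ell}$, $\ell = 0,1,2,3$, i.e., indices 301, 202, 103 and 004 in Figure 5-4 (b), from the $C^1$ constraints.

Read [34] for a thorough explanation of the algorithm and its GPU implementation, smoothness verification, etc.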
CHAPTER 6
DISCUSSION AND FUTURE WORK

6.1 Future GPU API

Our conversion scheme not only fits well with the current graphics hardware pipeline, but also matches very well with the architecture of future graphics hardware [29, 48]. The workload currently in the geometry shader will be assigned to the patch shader. The ideal GPU pipeline needs to exploit more parallelism in the geometry shader, where the 24 coefficients of a c-patch can be computed independently given the vertex neighborhood. With maximal parallelism, the cost of deriving one coefficient roughly equals the cost of constructing a whole patch. Currently we pre-compute the tessellated domain and store these static values in a set of textures. In the future, this part of the computation will be replaced by the tessellation unit. Animation using our conversion scheme will be achieved in a single pass without geometry transmission between passes.

6.2 Volume Preservation

Preserving the volume under constraints can achieve a realistic deformable object animation. The well-known divergence theorem can be used to reduce a volume integral to an integral over the surface. Given a closed object, the volume is matched to a prescribed value by inflating or deflating the deformable object uniformly. To enhance realism, this method can be further extended to fix parts of the object and attach different material properties to surface pieces. This exact, localized volume preservation method works for all surfaces that consist of Bézier patches. Therefore, we will combine this method with our new surface conversion algorithm to achieve real-time volume preservation.

6.3 Adaptive Tessellation

Adaptive tessellation samples each surface patch more densely in regions of high curvature and less densely in regions of low curvature. Moreover, it adjusts the level of detail according to how close the geometry is to the camera. The surface is only tessellated where and when it is necessary. Therefore, an adaptively tessellated surface will greatly improve the performance. The tessellation factor can be generated by using the flat test [9]. With the tessellation unit in the GPU, the cost of tessellating the domain is almost free.
REFERENCES

[1] Microsoft DirectX 10 SDK. 2008. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6263A4424A7FE69CFF1A5B180displaylang=en.
[2] C. Bajaj, J. Chen, and G. Xu. Free-form surface design with A-patches. In Proceedings of Graphics Interface 94, pages 174-181, Banff, Alberta, Canada, 1994.
[3] S. Bischoff, L. P. Kobbelt, and H. Seidel. Towards hardware implementation of Loop subdivision. In HWWS '00: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, pages 41-50, New York, NY, USA, 2000. ACM Press.
[4] D. Blythe. The Direct3D 10 System. In Proceedings of ACM SIGGRAPH 2006, pages 724-734, 2006. http://download.microsoft.com/download/f/2/d/f2d5ee2cb7ba4cd09686b6508b5479a1/Direct3D10 web.pdf.
[5] M. Bóo, M. Amor, M. Doggett, J. Hirche, and W. Strasser. Hardware support for adaptive subdivision surface rendering, 2001. citeseer.ist.psu.edu/article/boo01hardware.html.
[6] J. Bolz and P. Schröder. Rapid evaluation of Catmull-Clark subdivision surfaces. In Web3D '02: Proceeding of the seventh international conference on 3D Web technology, pages 11-17, New York, NY, USA, 2002. ACM Press.
[7] J. Bolz and P. Schröder. Evaluation of subdivision surfaces on programmable graphics hardware. 2007. http://www.multires.caltech.edu/pubs/GPUSubD.pdf.
[8] T. Boubekeur and C. Schlick. Generic mesh refinement on GPU. In HWWS '05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 99-104, New York, NY, USA, 2005. ACM.
[9] M. Bunnell. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, chapter 7. Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping. Addison-Wesley, Reading, MA, 2005.
[10] E. Catmull and J. Clark. Recursively generated B-spline surfaces on arbitrary topological meshes. Computer-Aided Design, 10:350-355, 1978.
[11] R. L. Cook. Shade trees. ACM, New York, NY, USA, 1998.
[12] D. Doo and M. Sabin. Behaviour of recursive division surfaces near extraordinary points. Computer-Aided Design, 10:356-360, 1978.
[13] T. DeRose, M. Kass, and T. Truong. Subdivision surfaces in character animation. In SIGGRAPH '98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 85-94, New York, NY, USA, 1998. ACM Press.
[14] G. Farin. Curves and surfaces for computer aided geometric design: a practical guide. Academic Press Professional, Inc., San Diego, CA, USA, 1988.
[15] C. Gonzalez and J. Peters. Localized hierarchy surface splines. In S. S. J. Rossignac, editor, ACM Symposium on Interactive 3D Graphics, pages 7-15, 1999.
[16] M. Guthe, A. Balazs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-Spline surfaces. ACM Transactions on Graphics, 24(3):1016-1023, 2005.
[17] M. Guthe, A. Balazs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-Spline surfaces. ACM Trans. Graph., 24(3):1016-1023, 2005.
[18] M. Halstead, M. Kass, and T. DeRose. Efficient, fair interpolation using Catmull-Clark surfaces. Proceedings of SIGGRAPH 93, pages 35-44, Aug 1993.
[19] H. Hoppe, T. DeRose, T. Duchamp, M. Halstead, H. Jin, J. McDonald, J. Schweitzer, and W. Stuetzle. Piecewise smooth surface reconstruction. Computer Graphics, 28(Annual Conference Series):295-302, 1994.
[20] D. L. James and C. D. Twigg. Skinning mesh animations. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 399-407, New York, NY, USA, 2005. ACM.
[21] K. Karciauskas and J. Peters. Guided subdivision, 2005. http://www.cise.ufl.edu/research/SurfLab/papers.shtml.
[22] O. A. Karpenko and J. F. Hughes. Smoothsketch: 3D free-form shapes from complex sketches. ACM Transactions on Graphics, 25/3:589-598, 2006.
[23] L. Kavan, C. O'Sullivan, and J. Zara. Efficient collision detection for spherical blend skinning. In Proceedings of the 4th international conference on Computer graphics and interactive techniques in Australasia and Southeast Asia, Kuala Lumpur, Malaysia, pages 147-156, 2006.
[24] L. Kavan and J. Zara. Spherical blend skinning: a real-time deformation of articulated models. In I3D '05: Proceedings of the 2005 symposium on Interactive 3D graphics and games, pages 9-16, New York, NY, USA, 2005. ACM.
[25] A. Krishnamurthy, R. Khardekar, and S. McMains. Direct evaluation of NURBS curves and surfaces on the GPU. In SPM '07: Proceedings of the 2007 ACM symposium on Solid and physical modeling, pages 329-334, New York, NY, USA, 2007. ACM.
[26] S. Lai and F. F. Cheng. Adaptive rendering of Catmull-Clark subdivision surfaces. In CAD-CG '05: Proceedings of the Ninth International Conference on Computer Aided Design and Computer Graphics, pages 125-132, Washington, DC, USA, 2005. IEEE Computer Society.
[27] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, pages 85-94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000. citeseer.ist.psu.edu/lee00displaced.html.
[28] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 85-94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.
[29] M. Lee. Next generation graphics programming on Xbox 360, 2006. http://download.microsoft.com/download/d/3/0/d30d58cd87a241d5bb53baf560aa2373/Next Generation Graphics Programming on Xbox 360.ppt.
[30] C. Loop and S. Schaefer. Approximating Catmull-Clark subdivision surfaces with bicubic patches. Technical report, Microsoft Research, MSR-TR-2007-44, 2007.
[31] C. T. Loop. Smooth subdivision surfaces based on triangles, 1987. Master's Thesis, Department of Mathematics, University of Utah.
[32] A. Mohr, L. Tokheim, and M. Gleicher. Direct manipulation of interactive character skins. In I3D '03: Proceedings of the 2003 symposium on Interactive 3D graphics, pages 27-30, New York, NY, USA, 2003. ACM.
[33] K. Muller and S. Havemann. Subdivision surface tesselation on the fly using a versatile mesh data structure, 2000. citeseer.ist.psu.edu/muller00subdivision.html.
[34] A. Myles, T. Ni, and J. Peters. GPU-friendly smooth surfaces from meshes with tri/quad/pent facets. In Symposium on Geometry Processing, July 2-4, 2008, Copenhagen, Denmark, pages 1-8. Blackwell, 2008.
[35] A. Myles, Y. Yeo, and J. Peters. GPU conversion of quad meshes to smooth surfaces. In D. Manocha, B. Levy, and H. Suzuki, editors, ACM Solid and Physical Modeling Symposium, June 2-4, 2008, Stony Brook University, Stony Brook, New York, USA, pages 321-326. ACM Press, 2008.
[36] A. Nealen, T. Igarashi, O. Sorkine, and M. Alexa. Fibermesh: designing freeform surfaces with 3D curves. ACM Trans. Graph., 26(3), 2007.
[37] T. Ni, Y. Yeo, A. Myles, V. Goel, and J. Peters. GPU smoothing of quad meshes. In M. Spagnuolo, D. Cohen-Or, and X. Gu, editors, IEEE International Conference on Shape Modeling and Applications, June 4-6, 2008, Stony Brook University, Stony Brook, New York, USA, pages 3-10. ACM Press, 2008.
[38] J. Peters. Patching Catmull-Clark meshes. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 255-258. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.
[39] J. Peters. Geometric continuity. In Handbook of Computer Aided Geometric Design, pages 193-229. Elsevier, 2002.
[40] J. Peters and A. Nasri. Computing volumes of solids enclosed by recursive subdivision surfaces. Computer Graphics Forum, 16(3):C89-C94, 1997.
[41] J. Peters and U. Reif. Analysis of generalized B-spline subdivision algorithms. SIAM Journal on Numerical Analysis, 35(2):728-748, Apr. 1998.
[42] H. Prautzsch. Freeform splines. Computer Aided Geometric Design, 14(3):201-206, 1997.
[43] H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-Spline Techniques. Springer Verlag, 2002.
[44] K. Pulli and M. Segal. Fast rendering of subdivision surfaces. In SIGGRAPH '96: ACM SIGGRAPH 96 Visual Proceedings: The art and interdisciplinary programs of SIGGRAPH '96, page 144, New York, NY, USA, 1996. ACM.
[45] S. Schaefer and J. Warren. Exact evaluation of non-polynomial subdivision schemes at rational parameter values. In PG '07: Proceedings of the 15th Pacific Conference on Computer Graphics and Applications, pages 321-330, Washington, DC, USA, 2007. IEEE Computer Society.
[46] L.-J. Shiue, I. Jones, and J. Peters. A realtime GPU subdivision kernel. In M. Gross, editor, Siggraph 2005, Computer Graphics Proceedings, Annual Conference Series, pages 1010-1015. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2005.
[47] J. Stam. Exact evaluation of Catmull-Clark subdivision surfaces at arbitrary parameter values. In SIGGRAPH, pages 395-404, 1998.
[48] A. Tatarinov. Instanced tessellation in DirectX 10, 2008. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6263A4424A7FE69CFF1A5B180displaylang=en.
[49] A. Vlachos, J. Peters, C. Boyd, and J. L. Mitchell. Curved PN triangles. In 2001, Symposium on Interactive 3D Graphics, Bi-Annual Conference Series, pages 159-166. ACM Press, 2001.
[50] D. Zorin. Subdivision for modeling and animation. ACM SIGGRAPH Course Notes, 2000.

BIOGRAPHICAL SKETCH

Tianyun Ni was born in Nanjing, China. She was awarded her BS in computer science with a minor in mathematics from Texas State University in 2000 and her ME in computer engineering from the University of Florida in 2002. She earned her doctoral degree in the field of computer graphics in 2008.