
Real-Time Smooth Surface Construction on the Graphics Processing Unit

Permanent Link: http://ufdc.ufl.edu/UFE0021975/00001

Material Information

Title: Real-Time Smooth Surface Construction on the Graphics Processing Unit
Physical Description: 1 online resource (65 p.)
Language: english
Creator: Ni, Tianyun
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: animation, gpu, smooth, subdivision
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Increased realism in interactive graphics and gaming requires complex smooth surfaces to be rendered at ever higher frame rates. In particular, representations used to model surfaces offline, such as spline and subdivision surfaces, have to be modified or reorganized to allow for efficient usage of the graphics processing unit and its SIMD (Single Instruction, Multiple Data) parallelism. This dissertation presents a novel algorithm for converting quad meshes on the GPU to smooth, water-tight surfaces at the highest speed documented so far. The conversion reproduces bi-cubic splines wherever possible and closely mimics the shape of the Catmull-Clark subdivision surface by c-patches where a vertex has a valence different from 4. The smooth surface is piecewise polynomial and has well-defined normals everywhere.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Tianyun Ni.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Peters, Jorg.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0021975:00001








REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING
UNIT




















By
TIANYUN NI


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA



































© 2008 Tianyun Ni



































To my family, especially my father, and to all of those who have lent encouragement and support during the time I spent on this research










ACKNOWLEDGMENTS


I wish to express my sincerest thanks to the chair of my dissertation committee, Dr. Jörg Peters, for working with me throughout this long enterprise.











TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
1.1 Motivation
1.2 Problem Statement
1.3 Modern GPU Pipeline and Current Trends
1.4 Representations in Surface Modeling
1.4.1 Subdivision Surfaces
1.4.2 Parametric Patches
1.4.2.1 Bézier technique
1.4.2.2 Related work

2 A NEW SCHEME FOR SURFACE CONSTRUCTION
2.1 Contribution
2.2 The Conversion Algorithm
2.2.1 The Conversion Rules for a Type-1 Quad
2.2.2 The Conversion Rules for a Type-2 or Type-3 Quad
2.3 Derivation of the Coefficients of a c-patch
2.3.1 Derivation of Ao andX A t
2.3.2 Derivation of b211 and b121
2.3.3 Derivation of b112
2.4 Smoothness Verification
2.5 Complexity Analysis
2.5.1 Number of Patches
2.5.2 Cost of Patch Construction
2.5.3 Cost of Surface Evaluation
2.6 Approximating the Catmull-Clark Subdivision Surface
2.7 Water-Tight Surface Verification
2.8 Discussion

3 GPU IMPLEMENTATION
3.1 Overview
3.2 2-pass Approach
3.3 1-pass Approach
3.4 Coordinate System Transformation
3.5 Water-Tight Evaluation
3.6 Conclusion

4 RESULTS
4.1 Shape Quality
4.2 Performance
4.3 Displacement Mapping
4.4 Morphing and Animation
4.5 Conclusion

5 PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

6 DISCUSSION AND FUTURE WORK
6.1 Future GPU API
6.2 Volume Preservation
6.3 Adaptive Tessellation

REFERENCES

BIOGRAPHICAL SKETCH










LIST OF TABLES

Table

4-1 ALU operations for evaluation at (u, v)

4-2 Performance results

4-3 Performance of the 1-pass implementation



































































LIST OF FIGURES

Figure

1-1 Polygonal modeling
1-2 Problem statement
1-3 DirectX 10 pipeline stages
1-4 DirectX 10 pipeline
1-5 The primitives
1-6 The notations of input mesh
1-7 The three possible configurations
1-8 The Catmull-Clark stencils
1-9 The subdivision schemes
1-10 The suggested rendering passes
1-11 Future GPU architecture
1-12 The subdivision schemes
2-1 Derivation of c-patch
2-2 Vertex computation
2-3 Surface conversion
2-4 Computing control points v, e, f and t, the projection of e
2-5 Patch-based computation
2-6 Patch computation
2-7 The re-parameterization of A to meet G1 at the vertex
2-8 Coefficients b211 and b121 of the c-patch are derived on top of a ghost patch
2-9 The choice of middle point in c-patch
2-10 The center of a bi-cubic patch can be evaluated by the linear combination of the boundary coefficients
2-11 C1 transition between a triangular and a bicubic patch
2-12 G1 transition between two triangular patches
3-1 2-Pass implementation
3-2 2-Pass conversion
3-3 1-Pass conversion
3-4 1-Pass implementation
3-5 (u, v) on an irregular quad
3-6 Water-tight evaluation
4-1 Shape quality comparison
4-2 Catmull-Clark approximation comparison
4-3 Ordinary patches and extraordinary patches
4-4 GPU smoothed quad surfaces with displacement mapping
4-5 Close-up of the frog. The refined mesh is water-tight.
4-6 Displacement mapping on the frog model
4-7 Shape comparison
4-8 Shape comparison
4-9 Real-time animation on the Sword model
4-10 Real-time animation on the Frog model
4-11 Asynchronous animation of nine Frogs
5-1 The reasons for using Tri/Quad/Pent meshes
5-2 A quad/tri/pent model
5-3 Patch representations
5-4 Triangular representation










Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

REAL-TIME SMOOTH SURFACE CONSTRUCTION ON THE GRAPHICS PROCESSING
UNIT

By

Tianyun Ni

August 2008

Chair: Jörg Peters
Major: Computer Engineering

Increased realism in interactive graphics and gaming requires complex smooth surfaces to be rendered at ever higher frame rates. In particular, representations used to model surfaces offline, such as spline and subdivision surfaces, have to be modified or reorganized to allow for efficient usage of the graphics processing unit and its SIMD (Single Instruction, Multiple Data) parallelism. This dissertation presents a novel algorithm for converting quad meshes on the GPU to smooth, water-tight surfaces at the highest speed documented so far. The conversion reproduces bi-cubic splines wherever possible and closely mimics the shape of the Catmull-Clark subdivision surface by c-patches where a vertex has a valence different from 4. The smooth surface is piecewise polynomial and has well-defined normals everywhere.










CHAPTER 1
INTRODUCTION


This chapter introduces the challenges that motivate the dissertation, gives an overview of the modern GPU pipeline, and reviews the literature to position this research relative to the current state of the art.

1.1 Motivation


In graphics, 3D objects are approximated by polyhedral meshes of great complexity. For example, a game character can consist of tens of thousands of polygons (Figure 1-1). Increased realism in interactive gaming demands that such meshes be animated and rendered in real time. There are essentially two major approaches in the literature that serve this purpose: Polygonal Modeling and Higher-order Surface Modeling.

Two common animation techniques are morphing and skinning. Morphing changes one shape into another through a seamless transition. Skinning is a common technique to deform characters [20, 23, 24, 32]: the animated mesh, referred to as a "skin", is deformed based on the pose of an underlying skeleton. In Polygonal Modeling (Figure 1-1), skinning and morphing are applied directly to a high-detail mesh created by an artist. Most games currently use this approach. It involves redundant work, because the Polygonal Modeling representation permits only minimal sharing of computation. In addition, the large number of vertices in a complex mesh must be fed into the graphics pipeline via the GPU's memory bus, which is a potential bottleneck.
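Skinning, as described above, can be illustrated with linear blend skinning, a widely used formulation in which each skin vertex moves with a weighted blend of the rigid transforms of the bones that influence it. The sketch below is an illustrative assumption, not necessarily the formulation used in the cited works: it uses 2D rotations plus translations for brevity, and the names `Bone`, `skin_vertex`, and `skin_mesh` are hypothetical. It also makes the cost argument concrete. In Polygonal Modeling the per-vertex loop runs over every vertex of the dense mesh, whereas in Surface Modeling it runs only over the coarse control mesh.

```python
import math
from dataclasses import dataclass


@dataclass
class Bone:
    """Hypothetical rigid 2D bone transform: rotate by `angle`, then translate."""
    angle: float
    tx: float
    ty: float

    def apply(self, p):
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (c * p[0] - s * p[1] + self.tx, s * p[0] + c * p[1] + self.ty)


def skin_vertex(rest_pos, influences, bones):
    """Linear blend skinning for one vertex.

    influences: (bone_index, weight) pairs whose weights sum to 1.
    """
    x = y = 0.0
    for bone_index, weight in influences:
        px, py = bones[bone_index].apply(rest_pos)
        x += weight * px
        y += weight * py
    return (x, y)


def skin_mesh(rest_positions, all_influences, bones):
    # One call per vertex: the per-frame cost grows with the vertex count.
    return [skin_vertex(p, inf, bones)
            for p, inf in zip(rest_positions, all_influences)]
```

For example, a vertex at (1, 0) weighted equally between an identity bone and a bone rotated by 90 degrees lands at approximately (0.5, 0.5).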












Figure 1-1. Polygonal Modeling: currently the most popular animation approach in games.










The alternative approach, Surface Modeling, animates a coarse mesh (Figure 1-2). Subdivision surfaces and parametric patches, two popular higher-order surface representations, both support level-of-detail rendering (see Section 1.4). Highly detailed 3D models are produced by displacement mapping [11], which adds fine details in the form of a scalar field on the smooth surface defined by the coarse mesh. As a specific instance, Lee [27] proposes Displaced Subdivision Surfaces, which represent a detailed surface model as a scalar-valued displacement over a smooth surface domain. This approach reduces the number of vertices that must be read and animated in each frame, because complex geometric details are generated on the GPU. The runtime cost now includes the conversion from the coarse input mesh to the final complex mesh, which involves surface construction, evaluation, and displacement mapping.







Figure 1-2. Each high-detail mesh in Surface Modeling is represented by a coarse control mesh
with a displacement map. The coarse control mesh is first converted to a smooth
surface. Then the surface is tessellated and the vertices are perturbed in the normal
directions based on the corresponding value in the displacement map. Last, the
normal at each vertex of the refined highly-detailed mesh is updated.
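The displacement step of Figure 1-2 can be sketched as follows. This is a minimal illustration that assumes the smooth surface has already been tessellated into points with normals and the displacement map has already been sampled to one scalar height per point. The names `displace` and `displace_grid` and the `scale` parameter are hypothetical. The final step, updating the normals of the displaced mesh, is not shown.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def displace(position, normal, height, scale=1.0):
    """Move a tessellated surface point along its unit normal by a scalar height."""
    n = normalize(normal)
    return tuple(p + scale * height * c for p, c in zip(position, n))


def displace_grid(points, normals, heights, scale=1.0):
    # Only one scalar per vertex is stored, instead of a full 3D offset vector.
    return [displace(p, n, h, scale) for p, n, h in zip(points, normals, heights)]
```

Storing one scalar per vertex rather than a 3D offset vector is the source of the memory and bandwidth savings listed below.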



In summary, the advantages of Surface Modeling are

1. lower computational cost of animation, because skinning is applied to the coarse mesh rather than the final dense mesh;

2. memory and bandwidth savings, because most detail is encoded as one-dimensional displacements rather than three-dimensional vectors;

3. support for changing the refinement level on the fly;

4. customization of archetypes: different 3D models can share the same coarse mesh and differ only in their displacement maps;

5. support for adaptive tessellation: evaluation does not have to be on a uniform grid.

The disadvantage of Surface Modeling is that modern GPUs cannot render such surfaces directly. The surface must be converted into triangles or quads through tessellation and evaluation. Therefore, Surface Modeling is attractive as a real-time technique only if this conversion costs less than reading and animating a high-polygon mesh. Our goal is to design such a scheme on the GPU.

1.2 Problem Statement

Meshes consisting purely of quadrilateral facets are common in modeling for animation. Any polyhedral mesh can be converted into such a quad mesh by one step of mesh refinement, but a good designer creates meshes with the quad restriction in mind so that no global refinement is necessary. We therefore focus on quadrilateral meshes and aim to derive a set of efficient rules that run directly on the GPU (Figure 1-2, the red dotted rectangle) and produce surfaces with good visual quality. Specifically, the resulting surfaces should

1. consist of a small number of low-degree polynomial patches;

2. have smooth geometry (no extra cost for smooth shading);

3. closely approximate Catmull-Clark surfaces (a standard modeling tool);

4. be water-tight (no pixels drop out);

5. map well to the graphics pipeline and leverage the strengths of GPU computation.










1.3 Modern GPU Pipeline and Current Trends


A graphics processing unit (GPU) is a dedicated graphics rendering device. Its SIMD architecture has evolved substantially over the last decade, and this highly parallel structure makes it more effective than general-purpose CPUs for a range of algorithms. Modern GPUs expose a programmable parallel stream-processing pipeline that is programmed through a series of short programs called shaders. During the last five years, this programmable pipeline has mostly superseded the older "fixed-function pipeline", and the major graphics software libraries are used to program the GPU via shaders. The two most popular libraries, DirectX and OpenGL, currently both specify APIs for three types of shaders: vertex, geometry, and pixel shaders. The shaders in the DirectX 10 system [4] (Figure 1-4) share a common core that can access up to 128 memory buffers and 16 parameter (constant) buffers. Vertex and pixel shaders use a "one-in, one-out" data processing model. In contrast, the geometry shader has a limited ability to amplify or reduce the primitive count and can therefore change meshes. Figure 1-3 shows the input and output of each pipeline stage.

Stage                  Type          Input                                           Output
Input Assembler (IA)   Fixed         Vertex and index buffers                        Vertices
Vertex Shader (VS)     Programmable  A vertex, up to 8 32-bit 4-component data       A vertex, up to 8 32-bit 4-component data
Geometry Shader (GS)   Programmable  A primitive, up to 8 32-bit 4-component data    Any number of primitives, up to 1024 32-bit 4-component data
Rasterizer (TR)        Fixed         Vertices and attributes of a single primitive   Fragments
Pixel Shader (PS)      Programmable  A fragment, up to 8 32-bit 4-component data     A fragment, up to 8 32-bit 4-component data
Output Merger (OM)     Fixed         A fragment                                      A pixel

Figure 1-3. The input and output of each pipeline stage in the DirectX 10 system

A more detailed explanation of each stage follows:
































Figure 1-4. DirectX 10 Pipeline


1. The Input Assembler (IA) reads vertex data from the vertex and index buffers. Vertex buffers contain per-vertex data, while index buffers define geometry primitives as integer indices into the vertex buffers. Indexing helps avoid redundant computation for the same vertex.


2. The vertex shader (VS) typically performs per-vertex operations such as changing the position and normal of a single vertex. The computations in this stage are local: each vertex has access only to its own data and does not communicate with other vertices. The VS is most commonly used to transform vertices from object space to clip space.


3. The geometry shader (GS) processes the vertices of a single primitive. A primitive can be a point, a line segment, a triangle, a point with adjacency, a line segment with adjacency, or a triangle with adjacency (Figure 1-5). Because all vertices of the primitive are available (up to 6 vertices for a triangle with adjacency), the computations in this stage are less local than those in the VS and PS. The GS can also emit additional primitives. This amplification feature, introduced in DirectX 10, adds flexibility and makes it possible to implement a number of algorithms [1] on the GPU, such as mesh refinement, shadow volumes, and dynamic particle systems. The geometry shader output may be fed to the rasterizer stage and/or to a vertex buffer in memory via the stream output stage.












4. The rasterizer (TR) is a fixed-function stage that generates fragments by filling in the polygons sent through the graphics pipeline. Clipping, culling, perspective divide, viewport transform, primitive setup, scissoring, and depth offset also happen in this stage.


5. The pixel shader (PS) operates on one fragment at a time. Scene lighting and pixel-related effects such as bump mapping and color tone mapping usually occur in the PS.


6. The output merger (OM) takes a fragment from the PS and performs the traditional stencil and depth tests as well as render-target blending to generate the final pixel on the screen.
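The benefit of indexing described in item 1 can be illustrated with a small CPU-side emulation. This is a sketch under simplifying assumptions and does not describe how any particular GPU or the DirectX API works. The hypothetical `draw_indexed` caches every shaded vertex by its index, whereas real hardware keeps only a small post-transform cache. The hypothetical `draw_non_indexed` shades every triangle corner separately.

```python
def draw_indexed(vertex_buffer, index_buffer, vertex_shader):
    """Idealized indexed draw: the vertex shader runs once per distinct index.

    The unbounded dictionary is a simplifying assumption; hardware caches
    are small, so real reuse is only partial.
    """
    cache = {}
    invocations = 0
    triangles = []
    for t in range(0, len(index_buffer), 3):
        corners = []
        for idx in index_buffer[t:t + 3]:
            if idx not in cache:
                cache[idx] = vertex_shader(vertex_buffer[idx])
                invocations += 1
            corners.append(cache[idx])
        triangles.append(tuple(corners))
    return triangles, invocations


def draw_non_indexed(vertex_list, vertex_shader):
    """Without indexing, every triangle corner is shaded separately."""
    shaded = [vertex_shader(v) for v in vertex_list]
    triangles = [tuple(shaded[t:t + 3]) for t in range(0, len(shaded), 3)]
    return triangles, len(shaded)
```

For a quad drawn as two triangles with indices [0, 1, 2, 0, 2, 3], the indexed draw runs the shader 4 times instead of 6 and produces the same triangles.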


Figure 1-5. The six primitives used in GS


The future GPU pipeline [29, 48] is expected to provide a Tessellation Unit, combined with new shader stages for patch conversion and for evaluation of tessellated higher-order surfaces. The tessellator provides a solution for adaptive refinement on the graphics hardware: based on user-provided tessellation factors per edge, it adaptively creates a sampling pattern of the underlying parametric domain and automatically generates the corresponding set of domain points. In addition, two special shaders are introduced into the next-generation GPU pipeline. The patch shader converts an input mesh to a set of patches. The evaluation shader takes the (u, v) output of the tessellator and evaluates the patch at (u, v). This future GPU architecture also allows the GPU to exploit more parallelism, because multiple arithmetic units can run the same evaluation shader. Moreover, tessellation occurs on the GPU, which overcomes the bus-bandwidth bottleneck caused by model complexity. The new GPU design suggests that Surface Modeling is the trend for real-time graphics.
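The division of labor between the tessellator and the evaluation shader can be sketched on the CPU. This is a hedged illustration only. It assumes a single uniform tessellation factor for the whole quad domain, so the per-edge factors and adaptive patterns described above are not modeled. It treats the patch as any callable of (u, v) and uses hypothetical function names.

```python
def tessellate_quad_domain(factor):
    """Uniform sampling of [0,1]^2 with `factor` segments per edge.

    Returns the (u, v) domain points and triangles as index triples.
    """
    n = factor
    uv = [(i / n, j / n) for j in range(n + 1) for i in range(n + 1)]
    tris = []
    for j in range(n):
        for i in range(n):
            a = j * (n + 1) + i   # lower-left corner of the grid cell
            b = a + 1             # lower-right
            c = a + (n + 1)       # upper-left
            d = c + 1             # upper-right
            tris.append((a, b, d))
            tris.append((a, d, c))
    return uv, tris


def run_evaluation_shader(patch, uv_points):
    """Stand-in for an evaluation shader: each (u, v) is evaluated independently,
    which is what lets many arithmetic units run the same program in parallel."""
    return [patch(u, v) for (u, v) in uv_points]
```

Because each domain point is evaluated independently, the evaluation loop has no data dependencies and maps naturally onto SIMD hardware.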










1.4 Representations in Surface Modeling


In computer graphics, surfaces are commonly represented by polyhedral meshes. A polyhedral mesh is a collection of vertices, edges, and facets. The valence of a vertex is the number of its incident edges. Each facet is an n-sided polygon. In a triangular (or quadrilateral) mesh, n equals 3 (or 4, respectively); an arbitrary mesh has n-sided polygons where n varies. The difference between regular and irregular vertices is explained in Figure 1-6, and Figure 1-7 illustrates the three possible types of a facet.


Mesh            Regular vertex   Irregular vertex
Triangle mesh   valence = 6      valence ≠ 6
Quad mesh       valence = 4      valence ≠ 4

Type-1 facet: all vertices of the facet are regular.
Type-2 facet: exactly one vertex of the facet is irregular.
Type-3 facet: more than one vertex of the facet is irregular.


Figure 1-6. Tri- and Quadrilateral meshes and facet types 1,2,3.









Type-1 Quad Type-2 Quad Type-3 Quad

Figure 1-7. The three possible configurations. Type-1 Quad is regular. Type-2 or 3 is irregular.
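The vertex and facet classification of Figures 1-6 and 1-7 translates directly into code. The sketch below assumes a closed quad mesh given as 4-tuples of vertex indices. Boundary vertices, whose regular valence differs, are not handled, and the function names are hypothetical.

```python
from collections import defaultdict


def vertex_valences(quads):
    """Valence = number of distinct edges incident to each vertex (closed mesh assumed)."""
    neighbors = defaultdict(set)
    for q in quads:
        for k in range(4):
            a, b = q[k], q[(k + 1) % 4]
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {v: len(nbrs) for v, nbrs in neighbors.items()}


def quad_type(quad, valence):
    """Type-1: all four vertices regular; Type-2: exactly one irregular; Type-3: more than one."""
    irregular = sum(1 for v in quad if valence[v] != 4)
    if irregular == 0:
        return 1
    return 2 if irregular == 1 else 3
```

On a cube every vertex has valence 3, so all six quads are Type-3. On a closed quad torus every vertex has valence 4, so every quad is Type-1.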



Parametric patches and subdivision surfaces are major tools for modeling free-form surfaces with arbitrary topology. More intuitive alternatives for inexperienced users, who create shapes by drawing curves or sketching, are also available [22, 36].

1.4.1 Subdivision Surfaces

Subdivision surfaces, as part of standard modeling packages (e.g., 3DMax, Maya, Softimage, Mirai, Lightwave), have proven to be a useful modeling tool. Subdivision schemes were first introduced in [10, 12, 31]. They generate a smooth surface through a mesh refinement process. The process begins with a coarse mesh that approximates a 3D model, known as the control mesh. Each vertex of the control mesh is called a control point, and the control points influence the shape of the limit surface. In each subdivision step, the mesh is refined by inserting new vertices, updating the positions of existing vertices, and updating the connectivity. The positions of the new vertices are computed by averaging rules applied to the positions of nearby old vertices. The averaging rules differ from scheme to scheme (see the comparison in Figure 1-9), and it is these rules that determine the properties of the surface. The diagrams that illustrate the rules are called stencils. Binary subdivision splits each edge into two, while ternary subdivision splits each edge into three. A subdivision scheme usually has at most three types of rules: vertex stencils, edge stencils, and face stencils. For example, the stencils of Catmull-Clark subdivision are shown in Figure 1-8. The refinement rules include stencils for smooth surfaces as well as special rules for creating sharp or semi-sharp features. Each refinement step produces a denser mesh than the previous one. The limit subdivision surface is the surface obtained after infinitely many refinement steps. In practice, however, refinement is applied only a limited number of times, usually four.








Face stencil    Edge stencil    Vertex stencil

Figure 1-8. The stencils used in Catmull-Clark subdivision. These stencils define the rules to
derive the new vertices that lie on the old vertices, edges, and facets.
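For reference, one refinement step with the Catmull-Clark face, edge, and vertex stencils can be sketched as follows. It uses the standard rules for a closed mesh without creases. The face point is the centroid of the facet. The edge point is the average of the two endpoints and the two adjacent face points. The vertex point is (Q + 2R + (n - 3)S)/n, where Q is the average of the adjacent face points, R is the average of the incident edge midpoints, S is the old position, and n is the valence. This is an illustrative CPU sketch with hypothetical names, not a GPU implementation from the cited works. Every n-sided facet becomes n quads, which is why one step turns any polyhedral mesh into a quad mesh.

```python
from collections import defaultdict


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def average(points):
    total = (0.0, 0.0, 0.0)
    for p in points:
        total = add(total, p)
    return scale(total, 1.0 / len(points))


def catmull_clark_step(verts, faces):
    """One Catmull-Clark step on a closed manifold mesh (no boundary, no creases)."""
    # Face stencil: centroid of each facet.
    face_pts = [average([verts[i] for i in f]) for f in faces]

    # Map each undirected edge to the facets that share it.
    edge_faces = defaultdict(list)
    for fi, f in enumerate(faces):
        for k in range(len(f)):
            a, b = f[k], f[(k + 1) % len(f)]
            edge_faces[(min(a, b), max(a, b))].append(fi)

    # Edge stencil: endpoints plus the two adjacent face points.
    edge_pts = {e: average([verts[e[0]], verts[e[1]]] + [face_pts[fi] for fi in fs])
                for e, fs in edge_faces.items()}

    # Per-vertex neighborhoods.
    v_faces = defaultdict(list)
    for fi, f in enumerate(faces):
        for i in f:
            v_faces[i].append(fi)
    v_edges = defaultdict(list)
    for (a, b) in edge_faces:
        v_edges[a].append((a, b))
        v_edges[b].append((a, b))

    # Vertex stencil: (Q + 2R + (n - 3) S) / n.
    new_verts = []
    for i, s in enumerate(verts):
        n = len(v_edges[i])
        q = average([face_pts[fi] for fi in v_faces[i]])
        r = average([average([verts[a], verts[b]]) for (a, b) in v_edges[i]])
        new_verts.append(scale(add(add(q, scale(r, 2.0)), scale(s, n - 3.0)), 1.0 / n))

    # Append edge points and face points after the vertex points.
    edge_index = {}
    for e, p in edge_pts.items():
        edge_index[e] = len(new_verts)
        new_verts.append(p)
    face_index = []
    for p in face_pts:
        face_index.append(len(new_verts))
        new_verts.append(p)

    # Each n-sided facet becomes n quads: (vertex, next edge, face, previous edge).
    new_faces = []
    for fi, f in enumerate(faces):
        m = len(f)
        for k in range(m):
            prev_v, v, next_v = f[k - 1], f[k], f[(k + 1) % m]
            e_next = edge_index[(min(v, next_v), max(v, next_v))]
            e_prev = edge_index[(min(prev_v, v), max(prev_v, v))]
            new_faces.append((v, e_next, face_index[fi], e_prev))
    return new_verts, new_faces
```

On a cube with corners at (±1, ±1, ±1), one step produces 26 vertices and 24 quads, and the corner (1, 1, 1) moves to (5/9, 5/9, 5/9).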



A realization of tessellation-on-the-fly for Loop subdivision surfaces was proposed in [33]. Pulli [44] implemented Loop's subdivision scheme with the additions by Hoppe et al. [19]. Bischoff [3] proposed a forward-differencing method that requires only a constant amount of memory regardless of the number of subdivision steps. DeRose [13] generalized the infinitely sharp creases of [19] to obtain semi-sharp creases. Hoppe [19] extended Loop's scheme by introducing subdivision rules that lead to a piecewise smooth surface with features such as creases, corners, darts, and conical vertices.

Scheme             Mesh   Smoothness (regular)   Smoothness (extraordinary)   Interpolating
Catmull-Clark      any    C2                     C1                           No
Doo-Sabin                 C1                     C1                           No
Loop                      C2                     C1                           No
Butterfly                 C1                     C1                           Yes
Kobbelt                   C1                     C1                           Yes
Simplest           any    C1                     C1                           No
Sqrt(3)                   C2                     C1                           No
4-8                       C4                     C1                           No
Ternary Triangle          C4                     C1                           No
Quad/Triangle      M      C2                     C1                           No
4-3                M      C2                     C1                           No
Ternary Quad              C2                     C1                           No

Figure 1-9. Classification of common subdivision schemes.


Adaptive subdivision can dramatically speed up rendering, because the level of detail (LOD) is chosen dynamically based on the distance to the camera as well as on the complexity of each part of the model. Adaptive refinement has previously been implemented using a quadtree data structure [50], in which each level of the tree represents one refinement level of the mesh. However, it is difficult to map such a recursive, non-uniform tree structure to parallel computation. Bunnell [9] provides code for adaptive refinement; even though this code was optimized for an earlier generation of GPUs, it adaptively renders subdivision surfaces in real time on current hardware. Lai and Cheng [26] implemented adaptive Catmull-Clark subdivision. Hardware architecture support for adaptive refinement is proposed in [5].
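A distance-based LOD choice of the kind described above might look like the following. The rule is an illustrative assumption, not the criterion used in [9], [26], or [50]. Each Catmull-Clark step halves the edge length of the facets, so dropping one level each time the camera distance doubles keeps the projected facet size roughly constant. The function name and the default cap of four levels are hypothetical.

```python
import math


def subdivision_level(distance, reference_distance, max_level=4):
    """Hypothetical LOD rule: full refinement up to `reference_distance`,
    then one level less each time the camera distance doubles."""
    if distance <= reference_distance:
        return max_level
    level = max_level - math.floor(math.log2(distance / reference_distance))
    return max(0, level)
```

For example, with a reference distance of 10 units, a patch 20 or 30 units away is refined three times, and a patch 1000 units away is not refined.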

GPU implementations of subdivision surfaces can be roughly categorized into three groups: (I) recursive evaluation [9, 13, 28, 44, 46]; (II) direct evaluation [45, 47]; (III) pre-tabulated basis function composition [6, 7]. Recursive evaluation is the most intuitive approach, but not the most efficient one. Stam [47] directly evaluates subdivision surfaces at arbitrary parameter values. However, Stam's method cannot evaluate a mesh that contains Type-3 quads. Moreover, the required projection of control points into the eigenspace is too complex for large meshes on the GPU. The methods of [6, 7, 9, 46] cannot convert a mesh with Type-3 quads either. Removing such quads usually means applying at least one Catmull-Clark subdivision step on the CPU, which increases the data transferred to the GPU four-fold. In more detail, Shiue [46] implements recursive Catmull-Clark subdivision in several pixel-shader passes, using textures for storage and spiral-enumerated mesh fragments to maximize parallelism. Bolz [6, 7] tabulates the subdivision nodal functions up to a given density and linearly combines them on the GPU; the number of nodal functions equals the number of vertices of the input mesh.
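The idea behind pre-tabulated basis function composition (group III) can be shown in the simplest setting. The sketch below is restricted to a Type-1 quad, where the nodal functions reduce to the 16 uniform bicubic B-spline basis functions. Handling irregular vertices, as [6, 7] do, requires tabulating additional functions and is not shown, and the function names are hypothetical. The table depends only on the sampling density, so at run time each sample is just a weighted sum of control points.

```python
def bspline_basis(t):
    """Uniform cubic B-spline basis functions on one segment, t in [0, 1]."""
    return [(1 - t) ** 3 / 6,
            (3 * t**3 - 6 * t**2 + 4) / 6,
            (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
            t**3 / 6]


def tabulate(n):
    """Pre-tabulate the 16 tensor-product weights on an (n+1) x (n+1) (u, v) grid."""
    table = []
    for a in range(n + 1):
        bu = bspline_basis(a / n)
        for b in range(n + 1):
            bv = bspline_basis(b / n)
            table.append([bu[i] * bv[j] for i in range(4) for j in range(4)])
    return table


def compose(table, ctrl):
    """Linear combination at run time; ctrl[4*i + j] is control point P_ij."""
    return [tuple(sum(w * p[k] for w, p in zip(weights, ctrl)) for k in range(3))
            for weights in table]
```

With control points on an integer grid, P_ij = (i, j, 0), the composed samples reproduce the plane, which is a quick check of the tabulated weights.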

One obvious advantage of subdivision surfaces is that they can model surfaces of arbitrary topological type. Because each scheme uses a fixed set of refinement rules, subdivision surfaces are also easy to implement. Although subdivision surfaces have been known for about thirty years, their use in real-time applications such as games has been hindered because recursive refinement is neither memory-efficient nor performance-efficient. Multiple passes are required to render a visually smooth surface. Moreover, the roughly four-fold increase in geometry after each subdivision step causes heavy memory traffic on the bus between the CPU and the GPU.
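For a pure quad mesh, the growth is exactly four-fold per Catmull-Clark step, because every quad is split into four. As an illustrative calculation with a hypothetical mesh size, four refinement steps of a 1,000-quad control mesh give:

```latex
F_k = 4^k F_0, \qquad F_0 = 1000,\; k = 4
\;\Longrightarrow\;
F_4 = 4^4 \cdot 1000 = 256 \cdot 1000 = 256\,000
```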

1.4.2 Parametric Patches


Since current and impending GPU configurations favor short, explicit surface definitions over recursively defined surfaces, patch-based refinement has been advocated as an alternative for fast rendering. Parametric patches (PP for short) are rendered directly in terms of their polynomial representations, as opposed to a collection of approximating facets. Generally speaking, a PP scheme converts the control mesh to a set of patches that are parametric piecewise polynomials. PP schemes fit conveniently into a 2-pass implementation on the current graphics pipeline (Figure 1-10). The two rendering passes are combined into one pass in a future GPU pipeline (Figure 1-11) [48].










[Figure: Coarse quad mesh → Patch-based surface construction → Surface evaluation on the tessellated domain, (u, v) or (s, t, w) → Output]

Figure 1-10. The suggested rendering passes. The coarse mesh is converted to parametric patches. In the following pass, the patches are evaluated and details are added using displacement mapping (DM).





The overall speed of a PP scheme is influenced by both the complexity and the number of patches. In terms of shape, a desirable PP scheme ensures at least G1 continuity across adjacent patches and closely approximates subdivision surfaces. One of the biggest challenges is to ensure smoothness everywhere across the patches. Peters explained how to solve the vertex enclosure problem and achieve geometric continuity in [39, 41].


GPU-based evaluation of trimmed NURBS surfaces is proposed in [16, 25]. Peters [40] used an approximation to the limit surface of Doo-Sabin subdivision to obtain a quickly convergent series of approximations to the volume enclosed by the subdivision surface. The difficult problem of filling n-sided holes was recently solved in [21, 42]. Bajaj et al. [2] introduced A-patches in trivariate BB form with few free parameters to adjust the shape both locally and globally. In [15], free-form surfaces are represented either in NURBS form or as cubic triangular Bézier patches; this explicit spline representation of smooth free-form surfaces forms the basis of an interactive sculpting environment. In the spirit of the tessellator, Boubekeur [8] describes a generic refinement pattern for Surface Modeling (tessellation + displacement) on any programmable GPU.

Figure 1-11. One possible pass on the future graphics rendering pipeline.

1.4.2.1 Bezier technique

The Bézier form is a parametric surface representation that was popularized in the 1960s by the French engineer Pierre Bézier. A comprehensive overview of the Bézier form can be found in [43]. A Bézier patch is defined by its control points, and a Bézier surface, as a set of Bézier patches, is piecewise polynomial. Bézier patches are visually intuitive and mathematically convenient due to the following properties:

1. Affine invariance: applying an affine transformation to the control points applies it to the corresponding Bézier patch as well.

2. The convex hull property: a Bézier patch lies completely within the convex hull of its control points, and therefore also completely within the bounding box of its control points in any Cartesian coordinate system.

There are two types of Bézier patches:









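A tensor-product patch in Bézier form of degree m by n is defined as:

$$\mathbf{b}(u,v) = \sum_{i=0}^{m}\sum_{j=0}^{n} \mathbf{b}_{ij}\, B_i^m(u)\, B_j^n(v), \qquad B_i^m(u) = \binom{m}{i} u^i (1-u)^{m-i},$$

where (u, v) is a parameter in the unit-square domain [0, 1] × [0, 1].

A triangular Bézier patch of degree n is defined as:

$$\mathbf{b}(s,t,w) = \sum_{i+j+k=n} \mathbf{b}_{ijk}\, \frac{n!}{i!\,j!\,k!}\, s^i\, t^j\, w^k,$$

where (s, t, w), with s + t + w = 1, are the barycentric coordinates on a triangle domain.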
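Both definitions can be evaluated directly from the Bernstein form. The sketch below is a straightforward, unoptimized evaluation with hypothetical function names. Control points are 3D tuples, and the triangular control net is a dictionary keyed by the index triple (i, j, k).

```python
from math import comb, factorial


def bernstein(n, i, t):
    """Univariate Bernstein polynomial B_i^n(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def eval_tensor(ctrl, u, v):
    """ctrl[i][j]: control points of a degree m-by-n patch ((m+1) x (n+1) grid)."""
    m = len(ctrl) - 1
    n = len(ctrl[0]) - 1
    point = [0.0, 0.0, 0.0]
    for i in range(m + 1):
        bu = bernstein(m, i, u)
        for j in range(n + 1):
            w = bu * bernstein(n, j, v)
            for k in range(3):
                point[k] += w * ctrl[i][j][k]
    return tuple(point)


def eval_triangular(ctrl, n, s, t, w):
    """ctrl maps (i, j, k) with i + j + k == n to a point; assumes s + t + w == 1."""
    point = [0.0, 0.0, 0.0]
    for (i, j, k), p in ctrl.items():
        basis = factorial(n) / (factorial(i) * factorial(j) * factorial(k)) * s**i * t**j * w**k
        for c in range(3):
            point[c] += basis * p[c]
    return tuple(point)
```

With control points placed uniformly over the domain, both functions reproduce the parameter values (linear precision), which gives a quick sanity check.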

1.4.2.2 Related work


For quadrilateral input meshes, it is well known that Type-1 quads can be converted into degree 3 by 3 patches in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from ordinary quads join C2. The interesting aspect is the conversion of Type-2 and Type-3 quads. A number of techniques (see the comparison in Figure 1-12) exist to smooth out quad meshes. Peters [38] generates NURBS output that could be rendered, for example, by the GPU algorithm of [17], but this has not been implemented. The method of [30] generates one bicubic patch per quad following the shape of Catmull-Clark surfaces. Since these bicubic patches typically do not join smoothly, Loop and Schaefer compute two additional patches whose cross product approximates the normal of the bicubic patch. As pointed out in [49], this trompe l'oeil represents a simple solution when true smoothness is not needed. Comparing the number of operations in construction and evaluation, the method of [30] should run at speeds comparable to our GPU quad mesh smoothing. Our method [37] designs a c-patch for converting an irregular quad, and the resulting c-patches form a G1 surface. The alternative algorithm proposed in [35] uses a bi-5 Bézier patch for each irregular quad.

































Scheme                   Patch type          Number of patches
Peters [38]              Bi-3 (C1)           4m
Vlachos et al. [49]      Cubic, quadratic    2m
Loop and Schaefer [30]   Bi-3, 2 by 3        m geom, 2m tan
Myles et al. [35]        Bi-3, Bi-5          m
Thesis [37]              Bi-3, c-patch       m

m: number of input quads

Figure 1-12. This figure compares existing PP schemes in terms of how well they meet the performance and shape measurements. geom = geometry patches, tan = tangent patches.









CHAPTER 2
A NEW SCHEME FOR SURFACE CONSTRUCTION

2.1 Contribution

This thesis proposes a set of rules for converting a quadrilateral mesh to a surface consisting of bicubic splines wherever possible. Each irregular quad (Figure 1-7) is converted to a novel C1 surface patch (c-patch for short). The surface closely mimics the shape of the Catmull-Clark subdivision surface and is constructed entirely by local parallel operations on the GPU. The resulting surface is piecewise polynomial and has well-defined normals everywhere. The evaluation avoids pixel dropout.

A c-patch is a C1 piecewise polynomial patch with cubic boundary. It is defined by 24 coefficients whose instantiation for a smooth surface is given in Section xxx below and indicated in Figure 2-1. A c-patch has an alternative representation as four triangular patches of total degree 4 in Bernstein-Bézier form (Figure 2-5, right).


Figure 2-1. The c-patch coefficients. For i = 0, 1, 2, 3, the boundary coefficients $v^i$ and $t^i_j$ are defined by vertex neighborhoods (Figure 2-4 specifies the formulas). The interior coefficients are $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ (Figure 2-6), where i = 0..3, j = 0..n_i - 1, and $n_i$ is the valence of $v^i$.


2.2 The Conversion Algorithm

Here we give the detailed algorithm for converting the quad mesh into coefficients that define a smooth surface of low degree. Essentially, the conversion from a mesh to a patch consists of computing new points near a vertex using the knowledge of the vertex neighborhood. A vertex neighborhood consists of a mesh point $p$ and the mesh points $p_k$, k = 0, ..., 2N - 1, of all quads surrounding $p$ (Figure 2-2). The union of the four vertex neighborhoods is the quad neighborhood (Figure 2-3a) that defines a patch. In our scheme, the patch is either a tensor-product bicubic Bézier patch or a c-patch.

Figure 2-2. Smoothing the vertex neighborhood according to Figure 2-4. The center point $p$, its direct neighbors $p_{2j}$, and its diagonal neighbors $p_{2j+1}$, j = 0..n - 1, form a vertex neighborhood.

Panels: (a) quad neighborhood, (b) bicubic, (c) c-patch.

Figure 2-3. a) A quad neighborhood defining a surface piece. b) A bicubic patch with 4 × 4 control points. This patch is the output if the quad is regular, and it is used to determine the shape of a c-patch c) if the quad is irregular. A c-patch is defined by 4 × 6 control points and can alternatively, for analysis, be represented as four C1-connected triangular pieces of degree 4 with degree 3 outer boundaries identical to the bicubic patch boundaries.

2.2.1 The Conversion Rules for a Type-1 Quad

Recall that a quad is Type-1 if all four of its vertices have 4 neighbors. Type-1 quads are considered regular in the literature. Such a facet is converted into a degree 3 by 3 patch in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from Type-1 quads join C2. Figure 2-3 illustrates the derivation process from a quad to a bicubic Bézier patch. The conversion rules are shown in Figure 2-4.
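The one-dimensional rule behind this conversion is sketched below for a single segment of a uniform cubic B-spline with de Boor points d0, ..., d3; applying it in both parameter directions yields the masks of Figure 2-4 at a regular vertex. The names are hypothetical, and only one coordinate is handled.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Seg = std::array<double, 4>;

// Bezier control points of the uniform cubic B-spline segment with de Boor points d[0..3].
Seg bsplineToBezier(const Seg& d) {
    return Seg{(d[0] + 4.0 * d[1] + d[2]) / 6.0,   // segment start: weights (1, 4, 1) / 6
               (2.0 * d[1] + d[2]) / 3.0,          // one third along d1 -> d2
               (d[1] + 2.0 * d[2]) / 3.0,          // two thirds along d1 -> d2
               (d[1] + 4.0 * d[2] + d[3]) / 6.0};  // segment end: weights (1, 4, 1) / 6
}

// Second differences at the start and end of a Bezier segment
// (proportional to the second derivative there).
double secondDiffStart(const Seg& b) { return b[0] - 2.0 * b[1] + b[2]; }
double secondDiffEnd(const Seg& b) { return b[1] - 2.0 * b[2] + b[3]; }
```

Consecutive segments share their end point and match first and second differences, which is the one-dimensional reason why patches from Type-1 quads join C2.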


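$$f_j := (4p + 2p_{2j} + 2p_{2j+2} + p_{2j+1})/9,$$
$$e_j := (f_j + f_{j-1})/2,$$
$$v := \frac{1}{N(N+5)}\Big(N^2 p + \sum_{j=0}^{N-1} (4p_{2j} + p_{2j+1})\Big),$$
$$t_j := v + \frac{4\sigma}{N}\sum_{k=0}^{N-1}\cos\frac{2\pi(k-j)}{N}\,(e_k - v), \qquad j = 0, \ldots, N-1.$$

Figure 2-4. Computing control points v, e, f, and t (the projection of e) at a vertex of valence N from the mesh points $p_j$ of a vertex neighborhood; the subscripts of p are taken modulo 2N and those of e, f, t modulo N. By default, $\sigma := \big(c_N + 5 + \sqrt{(c_N + 9)(c_N + 1)}\big)/16$ with $c_N := \cos(2\pi/N)$, the subdominant eigenvalue of Catmull-Clark subdivision.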
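The rules of Figure 2-4 can be sketched per vertex as below. This is a hedged CPU-side illustration, not the vertex shader: the ring layout (edge neighbors at even indices, diagonal neighbors at odd indices), the `Vec3` helper, and in particular the normalization 4σ/N of the tangent projection are assumptions, the latter chosen so that $t_j = e_j$ for N = 4 as required by (2-1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }

const double kPi = 3.14159265358979323846;

// Subdominant eigenvalue of Catmull-Clark subdivision at valence N (default sigma).
double sigma(int N) {
    const double c = std::cos(2.0 * kPi / N);
    return (c + 5.0 + std::sqrt((c + 9.0) * (c + 1.0))) / 16.0;
}

struct VertexPoints {
    Vec3 v;                     // Catmull-Clark limit point
    std::vector<Vec3> e, f, t;  // edge, face, and projected tangent points
};

// p: center vertex; ring[k], k = 0..2N-1: p_{2j} edge neighbors, p_{2j+1} diagonal neighbors.
VertexPoints vertexRules(Vec3 p, const std::vector<Vec3>& ring) {
    const int N = static_cast<int>(ring.size() / 2);
    assert(N >= 3 && ring.size() == static_cast<std::size_t>(2 * N));
    auto q = [&](int k) { return ring[k % (2 * N)]; };  // subscripts modulo 2N
    VertexPoints out;
    out.e.resize(N);
    out.f.resize(N);
    out.t.resize(N);
    for (int j = 0; j < N; ++j)
        out.f[j] = (1.0 / 9.0) * (4.0 * p + 2.0 * q(2 * j) + 2.0 * q(2 * j + 2) + q(2 * j + 1));
    for (int j = 0; j < N; ++j)
        out.e[j] = 0.5 * (out.f[j] + out.f[(j + N - 1) % N]);
    Vec3 sum = static_cast<double>(N * N) * p;  // limit stencil (N^2, 4, 1) / (N(N+5))
    for (int j = 0; j < N; ++j) sum = sum + 4.0 * q(2 * j) + q(2 * j + 1);
    out.v = (1.0 / (N * (N + 5))) * sum;
    for (int j = 0; j < N; ++j) {  // first-harmonic projection of the e_k, scaled by sigma
        Vec3 acc{0.0, 0.0, 0.0};
        for (int k = 0; k < N; ++k)
            acc = acc + std::cos(2.0 * kPi * (k - j) / N) * (out.e[k] - out.v);
        out.t[j] = out.v + (4.0 * sigma(N) / N) * acc;
    }
    return out;
}
```

Any projection of this first-harmonic form satisfies $(t_1 - v) + (t_{-1} - v) = 2c_N (t_0 - v)$, the identity used in Section 2.4.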


A vertex v computed according to Figure 2-4 is the limit point of Catmull-Clark subdivision, as explained, for example, in [18]. The rules for $e_j$ and $f_j$ are the standard rules for converting a uniform bicubic tensor-product B-spline to its Bézier representation. The points $t_j$ are a projection of $e_j$ into a common tangent plane (see, e.g., [15]). The default scale factor σ is the subdominant eigenvalue of Catmull-Clark subdivision. We note that for N = 4, $e_{j+2} = 2v - e_j$ and σ = 1/2, so that the projection leaves the tangent control points invariant, that is, $t_j = e_j$:

$$\text{for } N = 4:\quad t_j = v + \tfrac{1}{2}(e_j - e_{j+2}) = v + \tfrac{1}{2}\big(e_j - (2v - e_j)\big) = e_j. \qquad (2\text{-}1)$$


In the next stage, we combine information from four vertex neighborhoods, as shown in Figure 2-5, to populate a tensor-product patch g of degree 3 by 3 in Bézier form [14]:

$$g(u,v) := \sum_{k=0}^{3}\sum_{l=0}^{3} g_{kl}\, \binom{3}{k} u^k (1-u)^{3-k} \binom{3}{l} v^l (1-v)^{3-l}.$$

The patch is defined by its 16 control points $g_{kl}$. The formulas of Figure 2-4 make this patch the Bézier representation of a bicubic spline in B-spline form. For example, in the notation of Figure 2-5, $(g_{k0})_{k=0,\ldots,3} = (v^0, e^0_0, e^1_1, v^1)$.

Figure 2-5. Patch construction. On the left, four vertex neighborhoods with vertices $v^i$ each contribute one sector to assemble the 4 × 4 coefficients of the Bézier patch g, for example $g_{00} = v^0$, $g_{10} = e^0_0$, $g_{11} = f^0_0$ (we use superscripts to indicate vertices). On the right, the same four sectors are used to determine a c-patch if the underlying quad is extraordinary. The indices of the control points of g and $b^i$ are shown. Note that only a subset of the coefficients of the four triangular pieces $b^i$ is actually computed to define the c-patch. The full set of coefficients displayed here is only used to analyze the construction. The indexing of the 15 coefficients of a quartic triangular patch is shown on the right. We use this labeling throughout the dissertation.










2.2.2 The Conversion Rules for a Type-2 or Type-3 Quad

Type-2 and Type-3 quads are known as irregular. An irregular quad has at least one and possibly up to four vertices with valence other than 4. For each irregular quad, the conversion involves two steps:

1. Apply the regular rules defined in Figure 2-4 to generate $v^i$, $e^i_j$, and their projections $t^i_j$, shown in Figure 2-1 left.

2. Then apply the rules in Figure 2-6 to yield $b^i_{211}$, $b^i_{121}$, $b^i_{112}$, shown in Figure 2-1 right.

We use the bicubic patch to outline the shape as we replace it by a c-patch (Figure 2-3c). A c-patch has the right degrees of freedom to cheaply and locally construct a smooth surface. We introduce the c-patch in terms of the well-known Bézier form of a polynomial piece $b^i$ of total degree 4 [14]:

$$b^i(u,v,w) := \sum_{\substack{k+l+m=4\\ k,l,m\ge 0}} b^i_{klm}\, \frac{4!}{k!\,l!\,m!}\, u^k v^l w^m. \qquad (2\text{-}2)$$

The c-patch is equivalent to the union of four pieces $b^i$, i = 0, 1, 2, 3, of total degree 4, but it is defined by only 4 × 6 c-coefficients constructed in Figures 2-4 and 2-6:

$$v^i,\; t^i_0,\; t^i_1,\; b^i_{211},\; b^i_{121},\; b^i_{112}, \qquad i = 0, 1, 2, 3.$$

These 24 c-coefficients imply the missing interior control points of the representation (2-2) by C1 continuity between the triangular pieces: for j = 0, 1, 2, 3 and i = 0, 1, 2, 3,

$$b^i_{3-j,0,1+j} = b^{i-1}_{0,3-j,1+j} := \big(b^i_{3-j,1,j} + b^{i-1}_{1,3-j,j}\big)/2, \qquad (2\text{-}3)$$

and the boundary control points $b^i_{kl0}$ are implied by degree raising [14]:

$$b^i_{400} := v^i,\quad b^i_{310} := (v^i + 3t^i_0)/4,\quad b^i_{220} := (t^i_0 + t^{i+1}_1)/2,$$
$$b^i_{130} := (3t^{i+1}_1 + v^{i+1})/4,\quad b^i_{040} := v^{i+1}. \qquad (2\text{-}4)$$

For objects with boundaries, the boundary control points are simply derived from the cubic Bézier curves defined by $(v^i, t^i_0, t^{i+1}_1, v^{i+1})$. Basis functions corresponding to the 24 c-coefficients of the c-patch can be read off by setting one c-coefficient to one and all others to zero and then applying (2-3) and (2-4).

Figure 2-6. Formulas for the 4 × 3 interior control points that, together with the vertex control points $v^i$ and the tangent control points $t^i_j$, define a c-patch. See also Figures 2-11 and 2-12. Here $c_i := \cos\frac{2\pi}{n_i}$, $s_i := \sin\frac{2\pi}{n_i}$, and superscripts are modulo 4. By default, $g_c := \sum_{i=0}^{3}\big(v^i + 3(e^i_0 + e^i_1) + 9f^i_0\big)/64$, the central point of the ordinary patch.

2.3 Derivation of the coefficients of a c-patch

When a c-patch sector b meets a c-patch sector a (Figure 2-12), the following equation
must hold to preserve G1 continuity across the boundary between b and a,


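$$\lambda(u)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}5)$$

where, with · denoting the scalar product (respectively, three scalar products for vector-valued coefficients),

$$\lambda(u) := (\lambda_0, \lambda_1)\cdot(1-u,\, u),$$
$$\partial_1 b(u,0) := 3\,(U_0,\, 2U_1,\, U_2)\cdot\big((1-u)^2,\, (1-u)u,\, u^2\big),$$
$$\partial_2 b(u,0) := 4\,(v_0,\, 3v_1,\, 3v_2,\, v_3)\cdot\big((1-u)^3,\, (1-u)^2 u,\, (1-u)u^2,\, u^3\big),$$
$$\partial_1 a(0,u) := 4\,(w_0,\, 3w_1,\, 3w_2,\, w_3)\cdot\big((1-u)^3,\, (1-u)^2 u,\, (1-u)u^2,\, u^3\big).$$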










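Equation (2-5) can be rewritten as the following collection of simplified forms in terms of $U_i$, $v_i$, and $w_i$:

$$3\lambda_0 U_0 = 4(v_0 + w_0), \qquad (2\text{-}6)$$
$$6\lambda_0 U_1 + 3\lambda_1 U_0 = 12(v_1 + w_1), \qquad (2\text{-}7)$$
$$3\lambda_0 U_2 + 6\lambda_1 U_1 = 12(v_2 + w_2), \qquad (2\text{-}8)$$
$$3\lambda_1 U_2 = 4(v_3 + w_3). \qquad (2\text{-}9)$$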


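2.3.1 Derivation of λ0 and λ1

The scalar λ0 is derived from (2-6), and (2-9) sets the constraint for λ1. Let $U_0 := (1, 0)$, $V_0 := (\cos\frac{2\pi}{n_0}, \sin\frac{2\pi}{n_0})$, and $W_0 := (\cos\frac{2\pi}{n_0}, -\sin\frac{2\pi}{n_0})$ (Figure 2-7). From degree raising, we know $u_0 = \frac{3}{4}U_0$ and

$$v_0 = \frac{3}{4}\Big(\frac{1 + \cos\frac{2\pi}{n_0}}{2},\, \frac{\sin\frac{2\pi}{n_0}}{2}\Big),\qquad w_0 = \frac{3}{4}\Big(\frac{1 + \cos\frac{2\pi}{n_0}}{2},\, -\frac{\sin\frac{2\pi}{n_0}}{2}\Big),$$
$$v_0 + w_0 = \frac{3}{4}\Big(1 + \cos\frac{2\pi}{n_0},\, 0\Big) = \frac{3}{4}\Big(1 + \cos\frac{2\pi}{n_0}\Big)U_0. \qquad (2\text{-}10)$$

Hence, $4(v_0 + w_0) = 3(1 + \cos\frac{2\pi}{n_0})U_0$, and (2-6) yields $\lambda_0 = 1 + \cos\frac{2\pi}{n_0}$. Similarly, with $U_2 = (1, 0)$ along the same edge, because

$$v_3 = \frac{3}{4}\Big(\frac{1 - \cos\frac{2\pi}{n_1}}{2},\, \frac{\sin\frac{2\pi}{n_1}}{2}\Big),\qquad w_3 = \frac{3}{4}\Big(\frac{1 - \cos\frac{2\pi}{n_1}}{2},\, -\frac{\sin\frac{2\pi}{n_1}}{2}\Big),$$
$$4(v_3 + w_3) = 3\Big(1 - \cos\frac{2\pi}{n_1}\Big)U_2, \qquad (2\text{-}11)$$

(2-9) yields $\lambda_1 = 1 - \cos\frac{2\pi}{n_1}$.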
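The model configuration of Section 2.3.1 is easy to check numerically. The sketch assumes the vectors exactly as written above (unit edge direction $U_0 = (1, 0)$ and neighboring directions rotated by ±2π/n_0); the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Vec2 { double x, y; };

double lambda0(int n0) { return 1.0 + std::cos(2.0 * kPi / n0); }
double lambda1(int n1) { return 1.0 - std::cos(2.0 * kPi / n1); }

// Returns 4 (v_0 + w_0) for the model vectors of (2-10) at a vertex of valence n0.
Vec2 fourTimesV0PlusW0(int n0) {
    assert(n0 >= 3);
    const double c = std::cos(2.0 * kPi / n0), s = std::sin(2.0 * kPi / n0);
    const Vec2 v0{0.75 * (1.0 + c) / 2.0, 0.75 * s / 2.0};
    const Vec2 w0{0.75 * (1.0 + c) / 2.0, -0.75 * s / 2.0};
    return {4.0 * (v0.x + w0.x), 4.0 * (v0.y + w0.y)};
}
```

By (2-6), the result should equal $3\lambda_0 U_0$ for every valence; at regular vertices both scalars equal 1.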

2.3.2 Derivation of b211 and b121

To derive the formulas for $b_{211}$ and its symmetric counterpart $b_{121}$, note that the formulas must guarantee a smooth transition between $b^i$ and its neighbor patch on an adjacent quad, regardless of whether the adjacent quad is regular or irregular. That is, the formulas are derived to satisfy two types of smoothness constraints simultaneously (see Section 2.4). From Equation (2-7), we obtain

$$b_{211} + a_{211} = \frac{1}{2}\lambda_0 U_1 + \frac{1}{4}\lambda_1 U_0 + 2b_{310}. \qquad (2\text{-}12)$$

Figure 2-7. The re-parameterization of λ to meet G1 at the vertex. Here $n_i$ is the valence of vertex $v^i$; $v_0 + w_0 = \frac{3}{4}(1 + \cos\frac{2\pi}{n_0})U_0$ and $v_3 + w_3 = \frac{3}{4}(1 - \cos\frac{2\pi}{n_1})U_2$.

Figure 2-8. Coefficients $b_{211}$ and $b_{121}$ of a c-patch are derived on top of a ghost patch. (Panels: triangular patches; ghost patch.)

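To get a second constraint and determine $b_{211}$ uniquely, we express $b_{211}$ and $a_{211}$ in terms of each ghost patch, using sine-weighted averages (Figure 2-8):

$$4s_0(b_{211} - b_{310}) + 4s_1(b_{211} - b_{220}) = 3(f^0_0 - e^0_0)$$

yields

$$b_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_0 - e^0_0)}{4(s_0 + s_1)}. \qquad (2\text{-}13)$$

Similarly,

$$a_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_{-1} - e^0_0)}{4(s_0 + s_1)}. \qquad (2\text{-}14)$$

Since $e^0_0 = (f^0_0 + f^0_{-1})/2$, we have $f^0_{-1} - e^0_0 = -(f^0_0 - e^0_0)$, and therefore

$$b_{211} - a_{211} = \frac{3(f^0_0 - e^0_0)}{2(s_0 + s_1)}. \qquad (2\text{-}15)$$

Together with Equation (2-12),

$$b_{211} = b_{310} + \frac{1}{4}\lambda_0 U_1 + \frac{1}{8}\lambda_1 U_0 + \frac{3(f^0_0 - e^0_0)}{4(s_0 + s_1)}. \qquad (2\text{-}16)$$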

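Equation (2-8) implies

$$b_{121} + a_{121} = \frac{1}{4}\lambda_0 U_2 + \frac{1}{2}\lambda_1 U_1 + 2b_{130}. \qquad (2\text{-}17)$$

Using an approach similar to the derivation of $b_{211}$,

$$4s_0(b_{121} - b_{220}) + 4s_1(b_{121} - b_{130}) = 3(f^1_0 - e^1_1)$$

yields

$$b_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_0 - e^1_1)}{4(s_0 + s_1)}. \qquad (2\text{-}18)$$

Similarly,

$$a_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_1 - e^1_1)}{4(s_0 + s_1)}. \qquad (2\text{-}19)$$

(2-18) and (2-19) imply

$$b_{121} - a_{121} = \frac{3(f^1_0 - e^1_1)}{2(s_0 + s_1)}, \qquad (2\text{-}20)$$

and (2-17) and (2-20) imply

$$b_{121} = b_{130} + \frac{1}{8}\lambda_0 U_2 + \frac{1}{4}\lambda_1 U_1 + \frac{3(f^1_0 - e^1_1)}{4(s_0 + s_1)}. \qquad (2\text{-}21)$$

The formulas (2-16) and (2-21) are the same as those shown in Figure 2-6.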

2.3.3 Derivation of b112

By contrast, $b_{112}$ is not pinned down by continuity constraints. We could choose each $b^i_{112}$ arbitrarily without changing the formal smoothness of the resulting surface. However, we opt for increased smoothness at the center of the c-patch and additionally use the freedom to closely mimic the shape of Catmull-Clark subdivision surfaces, as we did earlier for vertices. First, we approximately satisfy four C2 constraints across the diagonal boundaries at the central point $b_{004}$ (Figure 2-9) by enforcing

$$\begin{pmatrix} 1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1\\ -1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} b^0_{112}\\ b^1_{112}\\ b^2_{112}\\ b^3_{112} \end{pmatrix} = \begin{pmatrix} r^0 - q\\ r^1 - q\\ r^2 - q\\ r^3 - q \end{pmatrix}, \qquad (2\text{-}22)$$

where $r^i$ collects the already determined coefficients $b_{211}$ and $b_{121}$ that enter the i-th C2 constraint (dark lines in Figure 2-9) and $q := \frac{1}{4}\sum_{i=0}^{3} r^i$. The perturbation by q is necessary, since the coefficient matrix of the C2 constraints is rank deficient. After perturbation, the system can be solved, with the last equation implied by the first three. We therefore replace the last equation by the constraint that the average of the $b^i_{112}$ matches $g_c := g(\frac{1}{2}, \frac{1}{2})$, the center position of the bicubic patch:

Figure 2-9. Dark lines cover the control points involved in the C2 constraints (2-22). The points on dashed lines are implied by averaging.

$$\begin{pmatrix} 1 & -1 & 0 & 0\\ 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1\\ 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} b^0_{112}\\ b^1_{112}\\ b^2_{112}\\ b^3_{112} \end{pmatrix} = \begin{pmatrix} r^0 - q\\ r^1 - q\\ r^2 - q\\ 4g_c \end{pmatrix}.$$

The point $g_c$ lies on the bicubic patch at u = 0.5 and v = 0.5. All bicubic control points except the 4 interior ones are given, because all the control points on the boundaries have already been calculated. For the interior points, we can use the mask for determining Bézier control points from a uniform bicubic B-spline surface: Figure 2-10(a) is the mask for $b_{11}$, and the other interior points use symmetric masks.

Figure 2-10. The center of a bicubic patch can be evaluated by a linear combination of the boundary coefficients. (Panels a, b, c: evaluation masks.)

Figure 2-10(b) shows the mask for the evaluation of the bicubic patch at (0.5, 0.5):

$$g_c = \frac{1}{64}\big(b_{00} + 3b_{01} + 3b_{02} + b_{03} + 3b_{10} + 9b_{11} + 9b_{12} + 3b_{13} + 3b_{20} + 9b_{21} + 9b_{22} + 3b_{23} + b_{30} + 3b_{31} + 3b_{32} + b_{33}\big).$$

Now we can solve for $b^i_{112}$, i = 0, 1, 2, 3, and obtain the formula of Figure 2-6.
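The center mask of Figure 2-10(b) is the tensor product of the cubic Bernstein values (1, 3, 3, 1)/8 at 1/2. A small sketch, with a hypothetical function name and one coordinate per call:

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Grid4 = std::array<std::array<double, 4>, 4>;

// g(1/2, 1/2) of a bicubic Bezier patch with control points g[k][l]:
// weights (1, 3, 3, 1)/8 in each direction, i.e. the 1/64-scaled mask for g_c.
double bicubicCenter(const Grid4& g) {
    const double w[4] = {1.0, 3.0, 3.0, 1.0};
    double sum = 0.0;
    for (int k = 0; k < 4; ++k)
        for (int l = 0; l < 4; ++l)
            sum += w[k] * w[l] * g[k][l];
    return sum / 64.0;
}
```

For control points sampled from a bilinear function, the mask reproduces that function's value at the center.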

2.4 Smoothness Verification


In this section, we formally verify the following lemma. For the purpose of the proof, we view the c-patch in its equivalent representation as four Bézier patches of total degree 4.

Lemma 1. Two adjacent polynomial pieces a and b defined by the rules of Section 2.2 (Figure

2-4, Figure 2-6, (2-3), (2-4)) meet at least

(i) C2 if a and b correspond to two regular quads;

(ii) C1 if a and b are adjacent pieces of a c-patch;

(iii) C1 if a and b correspond to two quads, exactly one of which is regular;

(iv) with tangent continuity if a and b correspond to two different irregular quads.


Proof: (i) If a and b are bicubic patches corresponding to regular quads, they are part of a bicubic spline with uniform knots and therefore meet C2. (ii) If a and b are adjacent pieces of a c-patch, then Equations (2-3) enforce C1 continuity.









For the remaining cases, let b be a triangular piece. Let u be the parameter corresponding to the quad edge between $b_{400} = v^0$, where u = 0 and the valence is $N_0$, and $b_{040} = v^1$, where u = 1 and the valence is $N_1$ (Figure 2-11 for case (iii) and Figure 2-12 for case (iv)). By construction, the common boundary b(u, 0) = a(0, u) is a curve of degree 3 with Bézier control points $(v^0, t^0_0, t^1_1, v^1)$, so that bicubic patches on regular quads and triangular patches on irregular quads match up exactly.

Denote by $\partial_1 b$ the partial derivative of b along the common boundary and by $\partial_2 b$ the partial derivative in the other variable. Since b(u, 0) = a(0, u), we have $\partial_1 b(u,0) = \partial_2 a(0,u)$. The partial derivative of a in the other variable is $\partial_1 a$. We will verify that the following conditions hold, which imply tangent continuity:

if one quad is ordinary (case (iii)),
$$\partial_1 b(u,0) = 2\,\partial_2 b(u,0) + \partial_1 a(0,u); \qquad (2\text{-}23)$$

if both quads are extraordinary (case (iv)),
$$\big((1-u)\lambda_0 + u\lambda_1\big)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \qquad (2\text{-}24)$$

where $\lambda_0 := 1 + c_0$, $\lambda_1 := 1 - c_1$, and $c_i := \cos\frac{2\pi}{N_i}$.


Both equations, (2-23) and (2-24), equate vector-valued polynomials of degree 3 (we write $\partial_1 b(u,0)$ in degree-raised form [14]). The equations hold if and only if all Bézier coefficients are equal. Offhand, this means checking four vector-valued equations for each of (2-23) and (2-24). However, in both cases, the setup is symmetric with respect to reversal of the direction in which the boundary b(u, 0) is traversed. That means we need only check the first two equations, (2-23') and (2-23''), of (2-23) and the first two equations, (2-24') and (2-24''), of (2-24). We verify these equations by inserting the formulas of Figures 2-4 and 2-6.

To verify (2-23), the key observation is that $N_0 = N_1 = 4$ if one quad is ordinary. Hence $c_0 = c_1 = 0$ and $s_0 = s_1 = 1$ (cf. Figure 2-6), and $t_j = e_j$. Therefore, for example (cf. Figure 2-11),

$$2\,\partial_2 b(0,0) = 2\cdot 4\,(b_{301} - v^0) = 3(e^0_0 - v^0) + 3(e^0_1 - v^0),$$

where the factor 4 stems from degree raising from 3 to 4; and the second Bézier coefficients of $\partial_1 b(u,0)$ (in degree-raised form) and of $2\,\partial_2 b(u,0)$ are, respectively (cf. Figure 2-11),

$$\frac{3(e^0_0 - v^0) + 2\cdot 3(e^1_1 - e^0_0)}{3} \quad\text{and}\quad 2\cdot 4\,(b_{211} - b_{310}) = 8\Big(\frac{e^1_1 - e^0_0}{4} + \frac{e^0_0 - v^0}{8} + \frac{3(f^0_0 - e^0_0)}{8}\Big).$$

Figure 2-11. C1 transition between a triangular and a bicubic patch.

Then, comparing the Bézier coefficients of $\partial_1 b(u,0)$ and $2\,\partial_2 b(u,0) + \partial_1 a(0,u)$ yields equality and establishes C1 continuity; for the second coefficients,

$$\partial_1 b:\quad (e^0_0 - v^0) + 2(e^1_1 - e^0_0),$$
$$2\,\partial_2 b + \partial_1 a:\quad \big[2(e^1_1 - e^0_0) + (e^0_0 - v^0) + 3(f^0_0 - e^0_0)\big] - 3(f^0_0 - e^0_0).$$


The equations for (2-24) are similar, except that we need to replace $e_j$ by $t_j$ and keep in mind that, by definition,

$$(t^0_1 - v^0) + (t^0_{-1} - v^0) = 2c_0\,(t^0_0 - v^0).$$

Figure 2-12. G1 transition between two triangular patches.

Hence, for example,

$$\partial_2 b(0,0) + \partial_1 a(0,0) = 4(b_{301} - v^0) + 4(a_{301} - v^0) = \frac{3}{2}\big(2(t^0_0 - v^0) + 2c_0(t^0_0 - v^0)\big).$$

The first of the four coefficient equations of (2-24) then simplifies to

$$3(1 + c_0)(t^0_0 - v^0) = 4(b_{301} + a_{301} - 2v^0) = \frac{3}{2}\big(2(t^0_0 - v^0) + (t^0_1 - v^0) + (t^0_{-1} - v^0)\big) = \frac{3}{2}\big(2c_0(t^0_0 - v^0) + 2(t^0_0 - v^0)\big).$$

Noting that the terms $\pm 3(f^0_0 - e^0_0)/(4(s_0 + s_1))$ in the expansions of $b_{211}$ and $a_{211}$ cancel, the second coefficient equation is

$$6\lambda_0(t^1_1 - t^0_0) + 3\lambda_1(t^0_0 - v^0) = 12(b_{211} + a_{211} - 2b_{310}) = 12\Big(\frac{2(1 + c_0)}{4}(t^1_1 - t^0_0) + \frac{2(1 - c_1)}{8}(t^0_0 - v^0)\Big).$$

It is easy to read off that the equalities hold, which verifies the claim of smoothness.










2.5 Complexity Analysis


2.5.1 Number of Patches

The conversion scheme yields the minimum set of patches because (1) no initial refinement of the coarse input mesh is needed, and (2) each quadrilateral facet of the coarse mesh corresponds to only one patch. Namely, the total number of patches equals the number of facets in the mesh. The patch complexity of various schemes is compared in Figure 1-12.

The low cost of construction and evaluation makes c-patches an attractive representation, not just on the GPU.

2.5.2 Cost of Patch Construction

The separation into vertex and patch construction means that the number of scaled vertex additions (adds) per patch is independent of the valence. With the cost of vertex computations distributed, computing the control points costs 4 × (4 + 1 + 1 + 2) = 32 adds per bicubic construction, and computing $t_j$ for j = 0 and 1 and determining $b_{211}$, $b_{121}$, and $b_{112}$ according to Figure 2-6 amounts to an additional 4 × (2 + 6 + 6 + 12) = 104 adds per c-patch. Each c-patch has 24 coefficients. This compares favorably to, say, [30], where 16 + 12 + 12 coefficients are generated.

2.5.3 Cost of Surface Evaluation


The patch can be evaluated at any point (u, v) of its parametric domain using de Casteljau's algorithm. A tensor-product bicubic Bézier patch is defined by 16 control points. The evaluation at (u, v) needs 42 vector-vector additions, 42 scalar-vector multiplications, and 42 scalar-scalar operations. Similarly, the evaluation of a c-patch at (u, v) requires 40 vector-vector additions and 60 scalar-vector multiplications. In terms of evaluation cost, a c-patch therefore costs roughly the same as a bicubic patch.
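To make the evaluation step concrete, here is a hedged sketch of tensored de Casteljau evaluation of a bicubic patch that returns position and unit normal, in the spirit of the second pass. The types and names are assumptions, and the sketch does not attempt to reproduce the exact operation counts quoted above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
Vec3 lerp(Vec3 a, Vec3 b, double t) { return (1.0 - t) * a + t * b; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

using Row = std::array<Vec3, 4>;

// Cubic de Casteljau: returns the point and writes the first derivative.
Vec3 cubic(const Row& b, double t, Vec3& deriv) {
    const Vec3 a0 = lerp(b[0], b[1], t), a1 = lerp(b[1], b[2], t), a2 = lerp(b[2], b[3], t);
    const Vec3 c0 = lerp(a0, a1, t), c1 = lerp(a1, a2, t);
    deriv = 3.0 * (c1 - c0);
    return lerp(c0, c1, t);
}

// Position and unit normal of the bicubic patch g[k][l] (k along u, l along v) at (u, v).
Vec3 evalBicubic(const std::array<Row, 4>& g, double u, double v, Vec3& normal) {
    Row q, dq;  // each row k collapsed in v, and the v-derivative of that row
    for (int k = 0; k < 4; ++k) q[k] = cubic(g[k], v, dq[k]);
    Vec3 du, unused;
    const Vec3 pos = cubic(q, u, du);        // position and d/du
    const Vec3 dv = cubic(dq, u, unused);    // d/dv at (u, v)
    const Vec3 n = cross(du, dv);
    const double len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    assert(len > 0.0);  // degenerate points have no well-defined normal
    normal = (1.0 / len) * n;
    return pos;
}
```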










2.6 Approximating the Catmull-Clark Subdivision Surface

Since Catmull-Clark subdivision is a standard modeling tool, our scheme is designed to approximate the Catmull-Clark subdivision surface. In fact, the resulting bicubic patches completely agree with the Catmull-Clark subdivision surface except in the immediate neighborhood of irregular mesh vertices. In such a neighborhood, they join at least with tangent continuity and interpolate the limit of the irregular mesh vertex. Furthermore, the center of a c-patch interpolates the center point of the corresponding Catmull-Clark limit surface due to the choice of the c-patch coefficient $b_{112}$.

2.7 Water-Tight Surface Verification

Patches are evaluated independently. If the generated vertices along the boundary of adjacent patches do not match exactly, the refined mesh will have holes. There are three configurations for adjacent patches: (1) both are bicubic patches, (2) one of them is a bicubic patch, and (3) both are c-patches.

The coefficients defining the shared boundary curve are derived by the averaging rules defined in Figure 2-4. Since additions are commutative, the generated boundary coefficients are independent of the choice of patch. In other words, no round-off error and hence no cracking is possible in the first case. The boundary coefficients of a c-patch are computed by the same rules of Figure 2-4; therefore, water-tightness is also achieved in the latter two cases. Note that the computation of the cubic boundaries shared by a bicubic patch and a c-patch is mathematically identical.

2.8 Discussion

The introduction of triangular patches to model quad patches is somewhat unconventional, but it has been used in an I3D paper before [15]. Also, [49] is based on triangular patches. Evaluation and normal computation of degree 4 triangular patches is comparable in cost to that of tensor-product bicubic patches: in the triangular case, we have to average 15 control points; in the tensor-product case, 16. Triangular patches may deserve more attention in OpenGL.











CHAPTER 3
GPU IMPLEMENTATION


3.1 Overview


We implemented the conversion scheme using C++ on the DirectX 10 pipeline. We compute the vertex neighborhood points according to Figure 2-4 in the vertex shader and use the geometry shader primitive triangle with adjacency to accumulate the coefficients of the bicubic patch or to compute a c-patch according to Figure 2-6. We implemented conversion plus rendering in two variants: a 1-pass and a 2-pass scheme.


3.2 2-pass Approach







[Figure 3-1 diagram: Input Assembler; Vertex Shader (v, t0, t1, f); Geometry Shader; Pixel Shader (position, normal).]

Figure 3-1. 2-pass implementation detailed in Figure 3-2. The first pass converts; the second renders. Note that the geometry shader only computes at most 24 coefficients per patch and does not create (amplify to) evaluation point primitives.











Pass 1: Conversion
VS In     p, n, a
VS Use    texture lookup to retrieve p_{2j}, p_{2j+1}
VS Out    v, t_0, t_1, f_j, j = 0..n-1
GS In     v^i, t^i_0, t^i_1, f^i, i = 0..3
GS        if regular quad
              assemble g_kl, k, l = 0..3
          else
              compute b^i_211, b^i_121, b^i_112
GS Out    if regular quad, stream out g_kl, k, l = 0..3
          else stream out b^i_400, t^i_0, t^i_1, b^i_211, b^i_121, b^i_112, i = 0..3

Pass 2: Evaluating Position and Normal
VS In     (u, v)
VS        if regular quad
              compute normal and position at (u, v)
              by the tensored de Casteljau algorithm
          else
              compute the remaining Bezier control points
              compute normal and position at (u, v)
              by de Casteljau's algorithm adjusted to c-patches
VS Out    position, normal
PS In     position, normal
PS        compute color
PS Out    color

Figure 3-2. 2-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. VS Out of Pass 1 outputs N points f_j for one vertex (hence the subscript), and GS In of Pass 1 retrieves four points f^i, each generated by a different vertex of the quad (hence the superscript).


The 2-pass implementation constructs the patches in the first pass using the vertex shader and the geometry shader, and it evaluates positions and normals in the second pass. Pass 1 streams out only the 4 × 6 coefficients of a c-patch and not the Bézier control points of the equivalent triangular pieces. The data amplification necessary for evaluation takes place by instancing a (u, v)-grid on the vertex shader in the second pass. That is, we do not stream back large data sets after amplification. Position and normal are computed on the (u, v) domain [0, 1]^2 of the bicubic patch or of the c-patch (not on any triangular domain). We pre-tessellate the quad domain and store the results in a set of textures with different resolutions. If the tessellation factor is chosen to be m, the texture with (m + 1) × (m + 1) parametric values is sent to the vertex shader in the subsequent evaluation pass. Given the pre-tessellated domain with a patch identifier, the vertex shader loads the appropriate control points and evaluates the patch. Figure 3-2 lists the input, output, and computations of each pipeline stage. Figure 3-1 illustrates this association of computations and resources. In order to avoid pricey branching in HLSL (High Level Shader Language) and to optimize performance, we actually write specialized shaders for patch construction and evaluation based on the patch type.
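The pre-tessellated domain can be pictured with the following CPU-side sketch, which fills the (m + 1) × (m + 1) parameter grid for tessellation factor m. In the implementation these values live in a texture and are consumed by instanced vertex-shader invocations; this hypothetical helper models only the grid itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using UV = std::pair<float, float>;

// Row-major (m + 1) x (m + 1) grid of parameter values (u, v) on [0, 1]^2.
std::vector<UV> makeDomainGrid(int m) {
    assert(m >= 1);
    std::vector<UV> grid;
    grid.reserve(static_cast<std::size_t>((m + 1) * (m + 1)));
    for (int j = 0; j <= m; ++j)
        for (int i = 0; i <= m; ++i)
            grid.emplace_back(static_cast<float>(i) / m, static_cast<float>(j) / m);
    return grid;
}
```

With m = 8, for example, the grid holds 81 parameter values, i.e., a 9 × 9 grid. Deriving u and v from integer indices keeps the values along a shared edge identical for both neighboring patches.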

3.3 1-pass Approach

In the 1-pass implementation, the evaluation immediately follows conversion in the geometry shader, using the geometry shader's ability to amplify, i.e., to output multiple point primitives for each facet (Figure 3-4). While a 1-pass implementation sounds more efficient than a 2-pass implementation, DX10 limits data amplification in the geometry shader so that the maximal evaluation density is 8 × 8 per quad. Moreover, maximal amplification in the geometry shader slows performance. We observed at least 25% better performance with the 2-pass implementation. Figure 3-3 lists the data flow on the graphics pipeline.

3.4 Coordinate System Transformation

When we evaluate the normal and position of an irregular quad at (u, v), we first need to transform the tessellated domain value from Cartesian coordinates (u, v) to barycentric coordinates (s, t, w). Figure 3-5 illustrates how to locate which of the four triangles (u, v) lies in. In this way, we minimize the number of comparisons and take care of the shared vertices: (0.0, 0.0), (1.0, 0.0), and (0.5, 0.5) belong only to T1, (1.0, 1.0) belongs only to T2, and (0.0, 1.0) belongs only to T4.













Pass 1: Conversion and Evaluation
VS In     p, n, a
VS Use    texture lookup to retrieve p_{2j}, p_{2j+1}
          compute v, e_j, f_j, t_j
VS Out    v, t_0, t_1, f_j, j = 0..n-1
GS In     v^i, t^i_0, t^i_1, f^i, i = 0..3
GS        if regular quad
              assemble g_kl, k, l = 0..3
              tessellate the parametric domain
              compute normal and positions at (u, v)
          else
              compute b^i_211, b^i_121, b^i_112
              compute the remaining Bezier control points
              tessellate the parametric domain
              compute normal and position at (u, v)
              by de Casteljau's algorithm adjusted to c-patches
PS In     position, normal
PS        compute color
PS Out    color

Figure 3-3. 1-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. GS amplifies the geometry and evaluates the patches.


[Figure 3-4 diagram: Input Assembler (p, n, a); Vertex Shader (v, t0, t1, f); geometry shader input (v^i, t^i_0, t^i_1, f^i); Pixel Shader (position, normal; color).]

Figure 3-4. At present, the 1-pass conversion-and-rendering must place patch assembly and evaluation on the geometry shader. This is not efficient.









if (v - u <= 0)
    if (u + v - 1 <= 0) T1
    else T2
else
    if (u + v - 1 <= 0) T4
    else T3

Figure 3-5. (u, v) on an irregular quad.
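The test of Figure 3-5 and the subsequent change to barycentric coordinates can be sketched as follows. The following are assumptions not stated in the text: the corner order c0 = (0, 0), c1 = (1, 0), c2 = (1, 1), c3 = (0, 1); the labeling of Ti as the triangle (corner i-1, corner i, center); and barycentric coordinates ordered as (first corner, second corner, center).

```cpp
#include <cassert>
#include <cmath>

// Triangle index (1..4 for T1..T4) and barycentric coordinates (s, t, w) with respect to
// (first corner, second corner, center (0.5, 0.5)).
struct Bary { int tri; double s, t, w; };

Bary locate(double u, double v) {
    assert(u >= 0.0 && u <= 1.0 && v >= 0.0 && v <= 1.0);
    if (v - u <= 0) {
        if (u + v - 1 <= 0) return {1, 1 - u - v, u - v, 2 * v};  // T1 = (c0, c1, center)
        return {2, u - v, u + v - 1, 2 * (1 - u)};                // T2 = (c1, c2, center)
    }
    if (u + v - 1 <= 0) return {4, v - u, 1 - u - v, 2 * u};      // T4 = (c3, c0, center)
    return {3, u + v - 1, v - u, 2 * (1 - v)};                    // T3 = (c2, c3, center)
}
```

The two comparisons follow Figure 3-5 exactly, so the shared points listed in Section 3.4 land in the same triangle every time.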


3.5 Water-Tight Evaluation

The HLSL code in Figure 3-6 shows that the same cubic curve is evaluated along the boundary. An explicit if-statement in the evaluation guarantees exactly the same ordering of computations, since boundary coefficients are computed only once.

    VS_OUTPUT_eval cpatchEval(VS_INPUT_eval input, uint vID : SV_InstanceID)
    {
        VS_OUTPUT_eval output;

        // Compute the 15 Bezier coefficients of the triangular patch
        // ...

        // In the boundary case, for the sake of water-tightness:
        [branch]
        if (w == 0) {
            // The v-coordinate is what needs to be used to evaluate at the
            // boundary. Since u = 1 - v, it is also used here.
            output.lP = ((u*u*u)*b300 + (v*v*v)*b030) + ((3*u*u*v)*b210 + (3*u*v*v)*b120);
        }
        else {
            // Evaluate the interior points
            // ...
        }
        return output;
    }

Figure 3-6. Water-tight evaluation.
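One way to obtain the same guarantee outside the shader is sketched below. It rests on assumptions that are not part of the implementation described here: both patches sharing an edge orient it canonically (from the vertex with the smaller index), and both derive the parameter from an integer grid index, so the floating-point operations run in exactly the same order on both sides. The boundary expression mirrors the branch in Figure 3-6.

```cpp
#include <array>
#include <cassert>
#include <utility>

using Boundary = std::array<double, 4>;  // cubic coefficients b300, b210, b120, b030

// Same expression and operation order as the boundary branch of Figure 3-6.
double boundaryCubic(const Boundary& b, double v) {
    const double u = 1.0 - v;
    return ((u * u * u) * b[0] + (v * v * v) * b[3]) +
           ((3 * u * u * v) * b[1] + (3 * u * v * v) * b[2]);
}

// Evaluates the shared edge at grid index i of m. A patch that sees the edge from
// vertex idStart to idEnd with idStart > idEnd flips it to the canonical direction first.
double evalSharedEdge(Boundary b, int idStart, int idEnd, int i, int m) {
    assert(idStart != idEnd && i >= 0 && i <= m);
    if (idStart > idEnd) {
        std::swap(b[0], b[3]);
        std::swap(b[1], b[2]);
        i = m - i;  // integer flip, so no rounding is introduced
    }
    return boundaryCubic(b, static_cast<double>(i) / m);
}
```

Evaluating the reversed coefficient order at 1 - v directly would be mathematically equal but need not be bitwise equal, which is exactly what the canonical orientation avoids.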










3.6 Conclusion


The presented approach fits well into a GPU pipeline. In both approaches, we compute v, e, f, and t from the vertex neighborhood using the rules of Figure 2-4 in the vertex shader. Each vertex has 2n + 1 vertices in its vertex neighborhood, where n is the valence. This information is stored in a texture. With a vertex ID and its valence, all vertices in its neighborhood can be retrieved in counterclockwise order. In the geometry shader, the patch is finalized and assembled. Overall, the 2-pass implementation has better performance because of its small stream-out, short geometry shader code, and minimal amplification in the geometry shader.
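A CPU-side model of the neighborhood lookup may help. It is only a sketch under an assumed data layout (a valence array, an offset array, and one flat index array standing in for the texture), and the type and member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct NeighborhoodTable {
    std::vector<int> valence;    // n for each vertex id
    std::vector<int> offset;     // start of the vertex's ring in ringIndex
    std::vector<int> ringIndex;  // rings p_0 .. p_{2n-1}, counterclockwise, concatenated

    // Appends the next vertex (ids are assigned in insertion order).
    void addVertex(const std::vector<int>& ringCCW) {
        assert(ringCCW.size() % 2 == 0 && ringCCW.size() >= 6);
        valence.push_back(static_cast<int>(ringCCW.size() / 2));
        offset.push_back(static_cast<int>(ringIndex.size()));
        ringIndex.insert(ringIndex.end(), ringCCW.begin(), ringCCW.end());
    }

    // The 2n + 1 vertex ids of the neighborhood: the center first, then its ring.
    std::vector<int> neighborhood(int id) const {
        const int n = valence.at(static_cast<std::size_t>(id));
        const int start = offset.at(static_cast<std::size_t>(id));
        std::vector<int> ids{id};
        for (int k = 0; k < 2 * n; ++k)
            ids.push_back(ringIndex[static_cast<std::size_t>(start + k)]);
        return ids;
    }
};
```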









CHAPTER 4
RESULTS

4.1 Shape Quality

Our algorithm produces smooth surfaces that closely approximate Catmull-Clark subdivision surfaces. We compare our algorithm with [30] on the closeness to Catmull-Clark surfaces. We measure how close the surface is to the Catmull-Clark surface by comparing both the geometric difference and the normal angle difference. Figure 4-1 compares the smoothed quad mesh surfaces with densely refined Catmull-Clark subdivision surfaces based on the same mesh. Both the geometric distance, as a percentage of the local quad size, and the normal distance, in degrees of variation, are compared. Especially after displacement, large models rendered by subdivision and by quad smoothing appear visually indistinguishable. The relatively small examples without displacement shown in Figure 4-1 and the close-up in Figure 4-5 are also important to support our observation that c-patches do not create shape problems compared to a single bicubic patch: despite the lower degree and the internal C1 join, their visual appearance is remarkably similar to that of bicubic patches. The comparison with ACC-patches [30] is shown in Figure 4-2. Figures 4-3 and 4-4 show the smooth surface generated by our algorithm and the surface after applying displacement mapping, respectively.

[Figure 4-1 panels: columns CC Surface and Our Scheme (two models); rows Geometry Difference (%) and Normal Angle Difference (°).]

Figure 4-1. Comparison between the Catmull-Clark (CC) subdivision limit surface and the
smoothed quad mesh surface for the same input.











[Figure 4-2 panels: C-patch (Avg. 1.07474, Max 4.43114) and ACC patch, Loop 07 (Avg. 1.76099, Max 4.92876); rows: geometry distance error and normal angle error.]


Figure 4-2. Comparison of ACC-patch and C-patch in terms of approximation of Catmull-Clark
subdivision surfaces for the same input.


Figure 4-3. GPU smoothed quad surfaces: orange patches correspond to ordinary quads, blue patches to extraordinary quads.


Figure 4-4. GPU smoothed quad surfaces with displacement mapping.


[Panel values: Avg. 1.88284, Max 8.06158; Avg. 1.89969, Max 7.06922.]












We compiled and executed the implementation on the latest graphics cards of both major vendors under DirectX 10 and tested the performance for several industry-sized models. Two surface models and models with displacement mapping are shown in Figures 4-3 and 4-4, respectively. Table 4-2 summarizes the performance of the 2-pass algorithm for different granularities of evaluation. The frog model, in particular, provides a challenge due to the large number of extraordinary patches. The Frog Party shown in Figure 4-11 currently renders at 50 fps for uniform evaluation with N = 9, i.e., on a 9 × 9 grid. That is, the implementation converts 1292 × 9 quads, of which 59% are extraordinary, and renders roughly 1 million polygons 50 times per second. On the same hardware, we measured Bunnell's efficient implementation (distribution accompanying [9]) featuring the single frog model, i.e., 1/9th of the work of the Frog Party, running at 44 fps with three subdivisions (equivalent to tessellation factor N = 9).

Table 4-1. A a total degree 4 patch and a bicubic patch have the same evaluation cost at (u, v) in
terms of ALU operations.
evaluation for a c-patch ALU vector ops
position 55
normal 3
other 1
total 59
evaluation for a bicubic patch ALU vector ops
position 56
normal 3
other 0
total 59


Table 4-2. Frames per second for some standard test meshes with each patch evaluated on a grid
of size NVx NV; eqs = percentage of extraordinary quads. Sword and Frog are shown
in Figure 4-3, Head in Figure 4-1.
Mesh Frames per second
(verts,quads, eqs) N = 5 9 17 33
Sword (140,13 8, 38%) 965 965 965 703
Head (602,600, 100%) 637 557 376 165
Frog (1308, 1292, 59%) 483 392 226 87


4.2 Performance





























Figure 4-5. Close-up of the frog. The refined mesh is water-tight.

Table 4-3. Performance of the 1-pass implementation.
Mesh Slower 1-pass implementation
NV= 2 5 8
Sword 389 96 43
Head 108 34 15
Frog 44 10 4


GPU smoothing of quad meshes is an order of magnitude faster. Compared to [46], the speed

up is even more dramatic. While the comparison is not among equals since both [46] and [9]

implement recursive Catmull-Clark subdivision, it is nevertheless fair to observe that the speedup

is at least partially due to our avoiding stream back after amplification (data explosion due to

refinement). We expect that more careful storage of vertex neighborhoods, in retrieving order,

will further improve our use of texture cache and thereby improve the frames per second (fps)

count.
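The order-of-magnitude claim comes from normalizing both frame rates by the work done per frame: the Frog Party processes nine frogs per frame at 50 fps, whereas Bunnell's implementation processes a single frog at 44 fps. The arithmetic, using only the numbers quoted in this section (the variable names are ours):

```python
FROGS_IN_PARTY = 9
QUADS_PER_FROG = 1292          # Table 4-2
PARTY_FPS = 50                 # Frog Party, uniform evaluation with N = 9
BUNNELL_SINGLE_FROG_FPS = 44   # distribution accompanying [9], three subdivisions

# Quads converted from scratch in every frame of the Frog Party.
quads_per_frame = FROGS_IN_PARTY * QUADS_PER_FROG

# Frogs smoothed and rendered per second by each implementation.
ours = FROGS_IN_PARTY * PARTY_FPS
theirs = 1 * BUNNELL_SINGLE_FROG_FPS
speedup = ours / theirs

print(quads_per_frame, round(speedup, 1))  # prints: 11628 10.2
```

A work-normalized ratio of roughly 10 is what "an order of magnitude" refers to; as noted above, the two implementations do not compute the same surface, so the ratio compares throughput, not identical output.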

4.3 Displacement Mapping


Displacement mapping is a technique for adding geometric detail to a mesh by means of a height map. It differs from Bump Mapping or Normal Mapping in that it changes the geometry, moving vertices, typically along their normal directions, according to the values in the height map. Because the actual geometry changes, not just the normal as in Bump Mapping, displacement mapping permits self-occlusion. Figure 4-6 shows displacement mapping on the frog model, which consists of 330k facets. The height map has a resolution of 1024 × 1024.









Figure 4-6. Displacement mapping on the frog model


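To perturb the normals after displacement mapping, we need the bump mapping values $D_u$ and $D_v$, i.e., the partial derivatives of the displacement. The new normals are computed as follows.

$$S = P + D\,n \tag{4-1}$$

where $S$ is the displaced position of the point $P$, $D$ is the displacement and $n$ is the unit normal at $P$. The new normal is then the (normalized) cross product of $S_u$ and $S_v$:

$$S_u = P_u + D_u\,n + D\,n_u \tag{4-2}$$

$$S_v = P_v + D_v\,n + D\,n_v \tag{4-3}$$

Note that $n_u$ and $n_v$ are the derivatives of the normalized normal $n = \tilde{n}/\|\tilde{n}\|$ with $\tilde{n} = P_u \times P_v$:

$$n_u = \frac{\tilde{n}_u - (\tilde{n}_u \cdot n)\,n}{\|\tilde{n}\|} \tag{4-4}$$

where $\tilde{n}_u = P_{uu} \times P_v + P_u \times P_{uv}$, and analogously $n_v$ with $\tilde{n}_v = P_{uv} \times P_v + P_u \times P_{vv}$.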
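Equations (4-1) to (4-4) can be checked with a small CPU-side sketch. It assumes that the partial derivatives $P_u, P_v, P_{uu}, P_{uv}, P_{vv}$ of the smooth surface and the height-map value $D$ with its derivatives $D_u, D_v$ are already available at the sample point; the function name `displaced_point_and_normal` is hypothetical, and this is an illustration of the formulas, not the GPU shader used in the dissertation.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(*vs):
    # component-wise sum of any number of 3-vectors
    return tuple(sum(c) for c in zip(*vs))


def scale(s, a):
    return tuple(s * x for x in a)


def normalize(a):
    return scale(1.0 / math.sqrt(dot(a, a)), a)


def unit_normal_derivative(nt, nt_d, n):
    # Eq. (4-4): derivative of n = nt/|nt| is (nt_d - (nt_d . n) n) / |nt|
    return scale(1.0 / math.sqrt(dot(nt, nt)), add(nt_d, scale(-dot(nt_d, n), n)))


def displaced_point_and_normal(P, Pu, Pv, Puu, Puv, Pvv, D, Du, Dv):
    nt = cross(Pu, Pv)                          # unnormalized normal
    n = normalize(nt)
    nt_u = add(cross(Puu, Pv), cross(Pu, Puv))  # derivative of nt in u
    nt_v = add(cross(Puv, Pv), cross(Pu, Pvv))  # derivative of nt in v
    n_u = unit_normal_derivative(nt, nt_u, n)
    n_v = unit_normal_derivative(nt, nt_v, n)
    S = add(P, scale(D, n))                     # Eq. (4-1)
    S_u = add(Pu, scale(Du, n), scale(D, n_u))  # Eq. (4-2)
    S_v = add(Pv, scale(Dv, n), scale(D, n_v))  # Eq. (4-3)
    return S, normalize(cross(S_u, S_v))
```

Two sanity checks: on a plane with a displacement that rises in u, the new normal tilts against the slope; on a unit cylinder with constant displacement 0.5, the term $D\,n_u$ keeps the normal radial, which it would not do if $n_u$ were dropped.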

4.4 Morphing and Animation

We implement morphing using the 2-pass approach. In each frame, the animated sequence of input meshes, stored as textures, is fed into the Input Assembler of the first pass. The morphed patches are constructed during the first pass, and fine details are added in the second pass. The screenshots in Figures 4-9, 4-10 and 4-11 illustrate real-time displacement and animation.
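Because the first pass rebuilds the patches from whatever control vertices it receives, morphing only requires supplying the morphed control mesh each frame. A minimal sketch, assuming the animated sequence is a looping list of key meshes with identical connectivity and that in-between frames are obtained by linear interpolation of vertex positions; the dissertation does not state the interpolation scheme, and `morph_vertices` is a hypothetical name.

```python
import math


def morph_vertices(key_meshes, t):
    """Linearly blend vertex positions of a looping sequence of key meshes.

    key_meshes: list of key meshes; each is a list of (x, y, z) tuples with the
                same length and vertex ordering (identical connectivity)
    t:          animation time in units of key frames, e.g. 2.25 lies a quarter
                of the way from key mesh 2 to key mesh 3
    """
    k = len(key_meshes)
    base = math.floor(t)
    a = t - base        # blend weight in [0, 1)
    i = base % k        # current key mesh
    j = (i + 1) % k     # next key mesh; the sequence loops
    return [tuple((1.0 - a) * p + a * q for p, q in zip(vi, vj))
            for vi, vj in zip(key_meshes[i], key_meshes[j])]
```

The blended vertex array would then be uploaded (in the implementation described above, as a texture) and converted by the first pass exactly like a static mesh; only the small coarse mesh changes per frame, not the refined geometry.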

























[Figure panels: N-patch (Vlachos 00); Catmull-Clark subdivision; ACC-patch (Loop 07); C-patch]

Figure 4-7. Comparison of the c-patch scheme with PN triangles (also called N-patches), ACC-patches, and Catmull-Clark subdivision.


[Figure panels: input mesh; Catmull-Clark subdivision; N-patch (Vlachos 00); C-patch]

Figure 4-8. Comparison of the c-patch scheme with PN triangles (also called N-patches), ACC-patches, and Catmull-Clark subdivision.






















Figure 4-9. Real-time animation on the Sword model.












Figure 4-10. Real-time animation on the Frog model.


Figure 4-11. Asynchronous animation of nine Frogs.









4.5 Conclusion


Smoothing quad meshes on the GPU offers an alternative to transmitting highly refined facet representations to the GPU and is preferable for interactive graphics and for integration with complex morphing and displacement.

We advocated a 2-pass scheme since, as we argued, the DX10 geometry shader is not well suited for the data amplification required for evaluation after conversion. The 1-pass scheme outlined in Chapter 3 may become more valuable once a dedicated hardware tessellator is available [29, 48]. Such a tessellator will make amplification more efficient and support adaptive tessellation (which is why we only discussed uniform tessellation in Chapter 3). Such hardware amplification will also benefit the 2-pass approach, in that the (u, v) domain tessellation fed into the second pass will be replaced by the amplification unit.
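In the 2-pass scheme the uniform (u, v) domain tessellation is static, which is why it can be precomputed once and stored for the second pass to evaluate every patch at. A minimal sketch of such a precomputation, assuming N counts samples per patch edge (an N × N grid, as in Table 4-2) and two triangles per grid cell; the function name and the triangle layout are illustrative assumptions, not the dissertation's texture format.

```python
def uniform_domain(n):
    """Return (uv, tris) for an n x n grid of samples on [0, 1]^2.

    uv:   list of (u, v) parameter pairs in row-major order (v outer, u inner)
    tris: list of index triples, two counter-clockwise triangles per grid cell
    """
    if n < 2:
        raise ValueError("need at least 2 samples per edge")
    step = 1.0 / (n - 1)
    uv = [(i * step, j * step) for j in range(n) for i in range(n)]
    tris = []
    for j in range(n - 1):
        for i in range(n - 1):
            a = j * n + i   # lower-left corner of the cell
            b = a + 1       # lower-right
            c = a + n       # upper-left
            d = c + 1       # upper-right
            tris.append((a, b, d))
            tris.append((a, d, c))
    return uv, tris
```

Every patch reuses the same `uv` and `tris` arrays; only the evaluated positions differ, so the grid costs no per-frame work. A hardware tessellator would generate this pattern on the fly, and per edge rather than uniformly.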









CHAPTER 5
PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

Our conversion algorithm can be generalized beyond pure quad meshes. The generalized algorithm [34] provides an elegant solution for meshes with Tri/Quad/Pent facets. Removing restrictions on vertex valences and allowing meshes with triangles, quadrilaterals, and pentagons vastly simplifies a designer's task and enriches the design space of meshes for smooth surfaces: while quads naturally model the flow of (parallel) feature lines and are therefore the main facet type in models, triangular facets allow merging lines and pentagonal facets allow starting new lines (Figure 5-1), without creating T-corners or forcing refinement of intermediate models to satisfy connectivity or quad-layout constraints. Essentially, designers can reuse the whole range of polyhedral models they are used to. We modified the algorithm for converting quad meshes into a generalized method for meshes with Tri/Quad/Pent facets. The generalized scheme converts such a polyhedral model to a surface with an everywhere well-defined normal that is C2 in 'regular' mesh regions with quad-grid connectivity. Figure 5-2 shows an example of the resulting surfaces. Note that the facets are limited to triangles, quads and pentagons due to current GPU constraints and to avoid unnecessary notational, technical and shape complexity.

Figure 5-1. (a) Retaining the density of feature lines while varying their number. (b),(c) Axe handle detail using a triangle and a pentagon to transition between detailed and coarser areas.

An irregular facet with k sides is converted into a k-patch. A k-patch is a generalization of a c-patch: a piecewise degree-4, C1 spline patch with k cubic boundaries. A k-patch is defined by 6k + 1 control points, marked as circles in Figure 5-3(b),(c). That is, the k-patch corresponding to a triangular, quadrilateral or pentagonal facet is defined by a total of 19, 25 or 31 points, respectively.
























Figure 5-2. The generalized scheme converts a mesh with Tri/Quad/Pent facets to a smooth surface consisting of bi-cubic patches (yellow) and k-patches with k = 3 (green), k = 4 (red), and k = 5 (gray).


[Figure panels: (a) ordinary; (b) polar; (c) extraordinary]


Figure 5-3. (a) An ordinary facet is converted to a bi-cubic patch with 16 control points $g_{ij}$. (b),(c) An extraordinary facet with k sides is converted to a k-patch defined by 6k + 1 control points, shown as circles. The k-patch can be viewed as k C1-connected degree-4 triangular patches $b^\ell$, $\ell = 0, \dots, k-1$, with cubic outer boundaries.


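[Figure panels: (a) boundary coefficients indexed 300, 210, 120, 030 from three consecutive sectors; (b) degree-4 triangular Bézier patch with coefficients 400, 310, 220, 130, 040 along one edge and 301, 202, 103, 004 along the shared sector boundary]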

Figure 5-4. The triangular sectors are listed in counter-clockwise order with a modulo-k superscript. (a) 14 control points from three consecutive sectors of a k-patch define (b) a single patch in triangular Bézier form.









For evaluation, we can recover the polynomial representation of the $\ell$th sector in triangular Bézier form of total degree 4 (Figure 5-3(b) and (c)):

$$b^\ell(u,v) := \sum_{i+j+k=4} b^\ell_{ijk}\,\frac{4!}{i!\,j!\,k!}\,u^i v^j (1-u-v)^k \tag{5-1}$$

where the $\binom{4+2}{2} = 15$ BB-coefficients $b^\ell_{ijk} \in \mathbb{R}^3$ are indexed as in Figure 5-4. Specifically, we compute the 15 coefficients $b^\ell_{ijk}$ (Figure 5-4(b)) from the 14 coefficients labeled in Figure 5-4(a) by simple averaging, degree-raising the coefficients $b^\ell_{3-i,i,0}$, $i = 0, \dots, 3$, to $b^\ell_{4-i,i,0}$, $i = 0, \dots, 4$:

$$[b^\ell_{400}, b^\ell_{310}, b^\ell_{220}, b^\ell_{130}, b^\ell_{040}] = \left[b^\ell_{300},\ \tfrac{b^\ell_{300} + 3b^\ell_{210}}{4},\ \tfrac{b^\ell_{210} + b^\ell_{120}}{2},\ \tfrac{3b^\ell_{120} + b^\ell_{030}}{4},\ b^\ell_{030}\right],$$

and computing the coefficients $b^\ell_{3-i,0,1+i}$, $i = 0, \dots, 3$, i.e., indices 301, 202, 103 and 004 in Figure 5-4(b), which are shared along the sector boundaries, from the C1 constraints.

See [34] for a thorough explanation of the algorithm, its GPU implementation, and the smoothness verification.
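The two operations used above, evaluating a sector in total-degree-4 triangular Bézier form (Eq. 5-1) and degree-raising a cubic boundary to degree 4, can be sketched as follows. This is a minimal sketch, assuming the coefficients of one sector are stored in a dictionary keyed by the index triple (i, j, k) as in Figure 5-4; the C1 constraints that determine the shared boundary coefficients are given in [34] and are not reproduced here, and the function names are hypothetical.

```python
from math import factorial


def eval_bb(coeffs, u, v, degree=4):
    """Evaluate sum over i+j+k=degree of b_ijk * degree!/(i! j! k!) * u^i v^j w^k,
    with w = 1 - u - v. coeffs maps (i, j, k) to a point given as a tuple."""
    w = 1.0 - u - v
    result = None
    for (i, j, k), b in coeffs.items():
        weight = (factorial(degree) / (factorial(i) * factorial(j) * factorial(k))
                  * u**i * v**j * w**k)
        term = tuple(weight * x for x in b)
        result = term if result is None else tuple(r + t for r, t in zip(result, term))
    return result


def raise_cubic_to_quartic(c):
    """Degree-raise cubic Bezier coefficients [c0, c1, c2, c3] to quartic [d0, ..., d4]
    using d_i = (i/4) c_{i-1} + (1 - i/4) c_i."""
    d = [c[0]]
    for i in range(1, 4):
        a = i / 4.0
        d.append(tuple(a * p + (1.0 - a) * q for p, q in zip(c[i - 1], c[i])))
    d.append(c[3])
    return d
```

For c = [b300, b210, b120, b030] this reproduces the averaging formula above, e.g. d1 = (b300 + 3 b210)/4. Degree raising changes only the representation, so evaluating the boundary before and after raising gives the same point, which is what lets a cubic k-patch boundary match the quartic sector.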









CHAPTER 6
DISCUSSION AND FUTURE WORK

6.1 Future GPU API

Our conversion scheme not only fits well with the current graphics hardware pipeline, but also matches the architecture of future graphics hardware [29, 48] very well. The workload currently in the geometry shader will be assigned to the patch shader. The ideal GPU pipeline needs to exploit more parallelism in the geometry shader, where the 24 coefficients of a c-patch can be computed independently given the vertex neighborhood. With maximal parallelism, constructing a whole patch costs roughly as much as deriving a single coefficient. Currently, we precompute the tessellated domain and store these static values in a set of textures. In the future, this part of the computation will be replaced by the tessellation unit, and animation using our conversion scheme will be achieved in a single pass without geometry transmission between passes.

6.2 Volume Preservation

Preserving volume under constraints makes deformable object animation more realistic. The well-known divergence theorem can be used to reduce a volume integral to an integral over the surface. Given a closed object, the volume is matched to a prescribed value by inflating or deflating the deformable object uniformly. To enhance realism, this method can be further extended to fix parts of the object and to attach different material properties to surface pieces. This exact, localized volume preservation method works for all surfaces that consist of Bézier patches. Therefore, we will combine it with our new surface conversion algorithm to achieve real-time volume preservation.
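With the field $F(x) = x/3$, whose divergence is 1, the divergence theorem gives $V = \frac{1}{3}\oint_{\partial\Omega} x \cdot n \, dA$. For Bézier patches this surface integral can be evaluated exactly per patch; the sketch below shows the same idea on a closed, consistently oriented triangle mesh, where each triangle contributes the signed volume of the tetrahedron it spans with the origin, and then rescales the shape about its centroid to reach a prescribed volume. The triangle-mesh setting, the use of uniform scaling as "inflating or deflating", and the function names are assumptions for illustration, not the Bézier-patch method referred to above.

```python
def signed_volume(vertices, triangles):
    """Enclosed volume of a closed triangle mesh whose triangles are ordered
    counter-clockwise when seen from outside. Each triangle (a, b, c) adds the
    signed volume a . (b x c) / 6 of the tetrahedron it spans with the origin."""
    vol = 0.0
    for ia, ib, ic in triangles:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        bxc = (b[1] * c[2] - b[2] * c[1],
               b[2] * c[0] - b[0] * c[2],
               b[0] * c[1] - b[1] * c[0])
        vol += (a[0] * bxc[0] + a[1] * bxc[1] + a[2] * bxc[2]) / 6.0
    return vol


def rescale_to_volume(vertices, triangles, target):
    """Uniformly scale the mesh about its vertex centroid so that its enclosed
    volume equals target (volume grows with the cube of the scale factor)."""
    current = signed_volume(vertices, triangles)
    if current <= 0.0 or target <= 0.0:
        raise ValueError("expected a closed, outward-oriented mesh and a positive target")
    s = (target / current) ** (1.0 / 3.0)
    n = len(vertices)
    cen = tuple(sum(p[d] for p in vertices) / n for d in range(3))
    return [tuple(cen[d] + s * (p[d] - cen[d]) for d in range(3)) for p in vertices]
```

For the unit corner tetrahedron the sum gives 1/6, and rescaling to a target of 1/3 scales every vertex offset from the centroid by the cube root of 2. Fixing parts of the object, as suggested above, would replace the uniform scale by a displacement restricted to the free surface pieces.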

6.3 Adaptive Tessellation

Adaptive tessellation samples each surface patch more densely in regions of high curvature and less densely in regions of low curvature. Moreover, it adjusts the level of detail according to how close the geometry is to the camera. The surface is tested only where and when necessary. Therefore, adaptive tessellation will greatly improve performance. The tessellation factor can be generated by using the flat test [9]. With a tessellation unit in the GPU, tessellating the domain becomes almost free.
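The flat test of [9] is not spelled out here, so the following is only a stand-in with the same purpose: a hypothetical per-patch tessellation factor for a bicubic Bézier patch, derived from how far its 16 control points deviate from the bilinear interpolant of its four corners (zero deviation means the patch coincides with that bilinear surface) and relaxed with distance to the camera. The error model (factor proportional to the square root of deviation over tolerance), the parameter names `tol` and `max_factor`, and the clamping range are all assumptions.

```python
import math


def tessellation_factor(cp, eye, tol=0.01, max_factor=64):
    """Hypothetical adaptive tessellation factor for a bicubic Bezier patch.

    cp:  4 x 4 nested list of control points (x, y, z), indexed cp[i][j]
    eye: camera position; the tolerated error grows linearly with distance,
         mimicking a fixed screen-space error under perspective projection
    """
    c00, c30, c03, c33 = cp[0][0], cp[3][0], cp[0][3], cp[3][3]
    dev = 0.0
    for i in range(4):
        for j in range(4):
            s, t = i / 3.0, j / 3.0
            # bilinear interpolant of the four corners at (s, t)
            b = tuple((1 - s) * (1 - t) * c00[d] + s * (1 - t) * c30[d]
                      + (1 - s) * t * c03[d] + s * t * c33[d] for d in range(3))
            dev = max(dev, math.dist(cp[i][j], b))
    center = tuple(sum(cp[i][j][d] for i in range(4) for j in range(4)) / 16.0
                   for d in range(3))
    allowed = tol * max(math.dist(center, eye), 1e-6)
    # a curve's chord error falls roughly with the square of the segment count
    factor = math.ceil(math.sqrt(dev / allowed)) if dev > 0.0 else 1
    return max(1, min(max_factor, factor))
```

The square root reflects that, for a smooth curve, halving the segment length roughly quarters the chord error. With a hardware tessellator such a factor would be computed per patch edge, so that neighboring patches agree on shared boundaries and the surface stays water-tight.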










[15] C. Gonzalez and J. Peters. Localized hierarchy surface splines. In S. S. J. Rossignac, editor, ACM Symposium on Interactive 3D Graphics, pages 7–15, 1999.

[16] M. Guthe, Á. Balázs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-spline surfaces. ACM Transactions on Graphics, 24(3):1016–1023.

[17] M. Guthe, Á. Balázs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-spline surfaces. ACM Transactions on Graphics, 24(3):1016–1023.

[18] M. Halstead, M. Kass, and T. DeRose. Efficient, fair interpolation using Catmull-Clark surfaces. In Proceedings of SIGGRAPH 93, pages 35–44, Aug 1993.

[19] H. Hoppe, T. DeRose, T. Duchamp, M. Halstead, H. Jin, J. McDonald, J. Schweitzer, and W. Stuetzle. Piecewise smooth surface reconstruction. Computer Graphics, 28(Annual Conference Series):295–302, 1994.

[20] D. L. James and C. D. Twigg. Skinning mesh animations. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 399–407, New York, NY, USA, 2005. ACM.

[21] K. Karčiauskas and J. Peters. Guided subdivision. http://www.cise.ufl.edu/research/SurfLab/papers.shtml.

[22] O. A. Karpenko and J. F. Hughes. SmoothSketch: 3D free-form shapes from complex sketches. ACM Transactions on Graphics, 25(3):589–598.

[23] L. Kavan, C. O'Sullivan, and J. Žára. Efficient collision detection for spherical blend skinning. In Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia, pages [...]–156, Kuala Lumpur, Malaysia.

[24] L. Kavan and J. Žára. Spherical blend skinning: a real-time deformation of articulated models. In I3D '05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, pages 9–16, New York, NY, USA, 2005. ACM.

[25] A. Krishnamurthy, R. Khardekar, and S. McMains. Direct evaluation of NURBS curves and surfaces on the GPU. In SPM '07: Proceedings of the 2007 ACM Symposium on Solid and Physical Modeling, pages 329–334, New York, NY, USA, 2007. ACM.

[26] S. Lai and F. F. Cheng. Adaptive rendering of Catmull-Clark subdivision surfaces. In CAD-CG '05: Proceedings of the Ninth International Conference on Computer Aided Design and Computer Graphics, pages 125–132, Washington, DC, USA, 2005. IEEE Computer Society.

[27] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, pages 85–94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000. citeseer.ist.psu.edu/lee00displaced.html.

[28] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 85–94. ACM Press / ACM SIGGRAPH / Addison Wesley Longman, 2000.

[43] H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-Spline Techniques.

[44] K. Pulli and M. Segal. Fast rendering of subdivision surfaces. In SIGGRAPH '96: ACM SIGGRAPH 96 Visual Proceedings: The Art and Interdisciplinary Programs of SIGGRAPH '96, page 144, New York, NY, USA, 1996. ACM.

[45] S. Schaefer and J. Warren. Exact evaluation of non-polynomial subdivision schemes at rational parameter values. In PG '07: Proceedings of the 15th Pacific Conference on Computer Graphics and Applications, pages 321–330, Washington, DC, USA, 2007. IEEE Computer Society.

[46] L.-J. Shiue, I. Jones, and J. Peters. A realtime GPU subdivision kernel. In M. Gross, editor, Computer Graphics Proceedings, Annual Conference Series, pages 1010–1015. ACM Press / ACM SIGGRAPH / Addison Wesley Longman.

[47] J. Stam. Exact evaluation of Catmull-Clark subdivision surfaces at arbitrary parameter values. In SIGGRAPH 98 Proceedings, pages 395–404, 1998.

[48] A. Tatarinov. Instanced tessellation in DirectX10. http://www.microsoft.com/[...]FamilyId=572BE8A6-263A-[...]&displaylang=en.

[49] A. Vlachos, J. Peters, C. Boyd, and J. L. Mitchell. Curved PN triangles. In 2001 Symposium on Interactive 3D Graphics, Bi-Annual Conference Series, pages 159–166.

[50] D. Zorin. Subdivision for modeling and animation. ACM SIGGRAPH Course Notes.












BIOGRAPHICAL SKETCH

Tianyun Ni was born in [...], China. She was awarded her BS in computer science, with a mathematics minor, from Texas State University in [...] and her ME in computer engineering from the University of Florida in [...]. She earned her doctoral degree in the computer [...] field in [...].





PAGE 1

REAL-TIMESMOOTHSURFACECONSTRUCTIONONTHEGRAPHICSPROC ESSING UNIT By TIANYUNNI ADISSERTATIONPRESENTEDTOTHEGRADUATESCHOOL OFTHEUNIVERSITYOFFLORIDAINPARTIALFULFILLMENT OFTHEREQUIREMENTSFORTHEDEGREEOF DOCTOROFPHILOSOPHY UNIVERSITYOFFLORIDA 2008 1

PAGE 2

c r 2008TianyunNi 2

PAGE 3

Tomyfamily,especiallymyfatherandtoallofwhomhavelent encouragementandsupport duringthetimespentonthisresearch 3

PAGE 4

ACKNOWLEDGMENTS Iwishtoexpressmysincerestthankstothechairofmydisser tationcommittee,Dr.J¨org, Peters,forworkingwithmethroughoutthislongenterprise 4

PAGE 5

TABLEOFCONTENTS page ACKNOWLEDGMENTS .................................... 4 LISTOFTABLES ....................................... 7 LISTOFFIGURES ....................................... 8 ABSTRACT ........................................... 10 CHAPTER 1INTRODUCTION .................................... 11 1.1Motivation ...................................... 11 1.2ProblemStatement ................................. 13 1.3ModernGPUPipelineandCurrentTrends ..................... 14 1.4RepresentationsinSurfaceModeling ........................ 17 1.4.1SubdivisionSurfaces ............................ 17 1.4.2ParametricPatches ............................. 20 1.4.2.1Beziertechnique ......................... 22 1.4.2.2Relatedwork ........................... 23 2ANEWSCHEMEFORSURFACECONSTRUCTION ................. 25 2.1Contribution ..................................... 25 2.2TheConversionAlgorithm ............................. 25 2.2.1TheConversionRulesforaType-1Quad .................. 27 2.2.2TheConversionRulesforaType-2,orType-3Quad ............ 29 2.3Derivationofthecoefcientsofac-patch ..................... 30 2.3.1Derivationof 0 and 1 ........................... 31 2.3.2Derivationof b 211 and b 121 .......................... 31 2.3.3Derivationof b 112 .............................. 33 2.4SmoothnessVerication .............................. 35 2.5ComplexityAnalysis ................................ 39 2.5.1NumberofPatches ............................. 39 2.5.2CostofPatchConstruction ......................... 39 2.5.3CostofSurfaceEvaluation ......................... 39 2.6ApproximationCatmull-ClarkSubdivisionSurface ................ 40 2.7Water-TightSurfaceVerication .......................... 40 2.8Discussion ...................................... 40 3GPUIMPLEMENTATION ................................ 42 3.1Overview ...................................... 42 3.22-passApproach .................................. 42 5

PAGE 6

3.31-passApproach .................................. 44 3.4CoordinateSystemTransformation ......................... 44 3.5Water-TightEvaluation ............................... 46 3.6Conclusion ..................................... 47 4RESULTS ......................................... 48 4.1ShapeQuality .................................... 48 4.2Performance ..................................... 50 4.3DisplacementMapping ............................... 51 4.4MorphingandAnimation .............................. 52 4.5Conclusion ..................................... 55 5PATCHCONVERSIONSFORMESHESWITHTRI/QUAD/PENTFACETS ..... 56 6DISCUSSIONANDFUTUREWORK .......................... 59 6.1FutureGPUAPI ................................... 59 6.2VolumePreservation ................................ 59 6.3AdaptiveTessellation ................................ 59 REFERENCES ......................................... 61 BIOGRAPHICALSKETCH .................................. 65 6

PAGE 7

LISTOFTABLES Table page 4-1ALUoperationsforevaluationat ( u;v ) .......................... 50 4-2Performanceresults .................................... 50 4-3Performanceofthe1-passimplementation. ....................... 51 7

PAGE 8

LISTOFFIGURES Figure page 1-1Polygonalmodeling .................................... 11 1-2Problemstatement ..................................... 12 1-3DirectX10pipelinestages ................................ 14 1-4DirectX10pipeline .................................... 15 1-5Theprimitives ....................................... 16 1-6Thenotationsofinputmesh ................................ 17 1-7Thethreepossiblecongurations ............................. 17 1-8TheCatmull-Clarkstencils ................................ 18 1-9Thesubdivisionschemes ................................. 19 1-10Thesuggestedrenderingpasses .............................. 21 1-11FutureGPUarchitecture ................................. 22 1-12Thesubdivisionschemes ................................. 24 2-1Derivationofc-patch ................................... 25 2-2Vertexcomputation .................................... 26 2-3Surfaceconversion .................................... 26 2-4Computingcontrolpoints v e f and t ,theprojectionof e ................ 27 2-5Patch-basedcomputation ................................. 28 2-6Patchcomputation ..................................... 30 2-7There-parameterizationof tomeet G 1 atthevertex .................. 32 2-8Coefcients b 211 and b 121 ofc-patchisderivedontopofaghostpatch. ......... 32 2-9Thechoiceofmiddlepointinc-patch .......................... 34 2-10Thecenterofabi-cubicpatchcanbeevaluatedbythelin earcombinationoftheboundarycoefcients. ...................................... 35 2-11 C 1 transitionbetweenatriangularandabicubicpatch. ................. 37 2-12 G 1 transitionbetweentwotriangularpatches. ...................... 38 8

PAGE 9

3-12-Passimplementation .................................. 42 3-22-Passconversion ..................................... 43 3-31-Passconversion ..................................... 45 3-41-Passimplementation .................................. 45 3-5 ( u;v ) onanirregularquad. ................................ 46 3-6Water-tightEvaluation .................................. 46 4-1Shapequalitycomparison ................................. 48 4-2Catmull-Clarkapproximationcomparison ........................ 49 4-3Ordinarypatchesandextraordinarypatches ....................... 49 4-4GPUsmoothedquadsurfaceswithdisplacementmapping. ............... 49 4-5Close-upofthefrog.Therenedmeshiswater-tight. .................. 51 4-6Displacementmappingonthefrogmodel ........................ 52 4-7Shapecomparison ..................................... 53 4-8Shapecomparison ..................................... 53 4-9RealtimeanimationontheSwordmodel. ........................ 54 4-10RealtimeanimationontheFrogmodel. ......................... 54 4-11AsynchronousanimationofnineFrogs. ......................... 54 5-1ThereasonsforusingTr/Quad/PentMeshes ....................... 56 5-2Aquad/tri/pentmodel ................................... 57 5-3Patchrepresentations ................................... 57 5-4Triangularrepresentation ................................. 57 9

PAGE 10

AbstractofDissertationPresentedtotheGraduateSchool oftheUniversityofFloridainPartialFulllmentofthe RequirementsfortheDegreeofDoctorofPhilosophy REAL-TIMESMOOTHSURFACECONSTRUCTIONONTHEGRAPHICSPROC ESSING UNIT By TianyunNi August2008 Chair:J¨org,PetersMajor:ComputerEngineering Increasedrealismininteractivegraphicsandgamingrequi rescomplexsmoothsurfaces toberenderedateverhigherframerates.Inparticular,rep resentationsusedtomodelsurfaces ofine,suchassplineandsubdivisionsurfaces,havetobem odiedorreorganizedtoallow forefcientusageofthegraphicsprocessingunitanditsSI MD(SingleInstruction,Multiple Data)parallelism.Thisdissertationpresentsanovelalgo rithmforconvertingquadmesheson theGPUtosmooth,water-tightsurfacesatthehighestspeed documentedsofar.Theconversion reproducesbi-cubicsplineswhereverpossibleandclosely mimicstheshapeoftheCatmull-Clark subdivisionsurfacebyc-patcheswhereavertexhasavalenc edifferentfrom4.Thesmooth surfaceispiecewisepolynomialandhaswell-denednormal severywhere. 10

PAGE 11

CHAPTER1 INTRODUCTION Thischapterintroducesthechallengesthatmotivatethedi ssertation,givesadetailed literaturereview,positionsoftheresearchrelativetoth ecurrentstateoftheart.andanoverview ofthemodernGPUpipeline. 1.1Motivation Ingraphics,3Dobjectsareapproximatedbypolyhedralmesh esofgreatcomplexity.For example,agamecharactercanconsistoftensofthousandsof polygons(Figure 1-1 ).Increased realismininteractivegamingdemandssuchmeshestobeanim atedandrenderedinreal-time. Thereareessentiallytwomajorapproachesintheliteratur ewhichservethispurpose:Polygonal ModelingandHigher-orderSurfaceModeling. Therearetwoscenariosofanimations:MorphingandSkinnin g.Morphingisusedto changeoneimageintoanotherthroughaseamlesstransition .Skinningisacommontechnique todeformcharacters[ 20 23 24 32 ].Theanimatedmesh,referredasa”skin”,isdeformed basedontheposeofanunderlyingskeleton.InPolygonalMod eling(Figure 1-1 ),skinning andmorphingareappliedtoahigh-detailmeshcreatedbyana rtist.Mostgamescurrentlyuse thisapproach.Thistechniqueinvolvesredundantworkduet ominimalsharinginPolygonal Modelingrepresentation.Inaddition,alargenumberofver ticesinacomplexmeshmustbefed intothegraphicspipelineviatheGPU'smemorybus,whichis apotentialbottleneck. Figure1-1.PolygonalModeling:currentlythepopularanim ationapproachingames. 11

PAGE 12

Thealternativeapproach,SurfaceModeling,animatesacoa rsemesh(Figure 1-2 ). Subdivisionsurfacesandparametricpatches,astwopopula rhigh-ordersurfacerepresentations, bothsupportlevelofdetailrendering(seeSection1.4).Hi ghly-detailed3Dmodelsareproduced bydisplacementmapping[ 11 ].Displacementmappingaddsnedetailsinformofscalar eldsonthesmoothsurfacedenedbythecoarsemesh.Asaspe cicinstance,Lee[ 27 ] proposesDisplacedSubdivisionSurfacetorepresentadeta iledsurfacemodelasascalarvalue displacementoverasmoothsurfacedomain.Thisapproachre ducesthenumberofverticesthat mustbereadandanimatedineachframebecausecomplexgeome tricdetailsaregeneratedon theGPU.Theruntimecostnowincludestheconversionproces sfromthecoarseinputmeshto thenalcomplexmesh.Theconversionprocessinvolvessurf aceconstruction,evaluationand displacementmapping. Figure1-2.Eachhigh-detailmeshinSurfaceModelingisrep resentedbyacoarsecontrolmesh withadisplacementmap.Thecoarsecontrolmeshisrstconv ertedtoasmooth surface.Thenthesurfaceistessellatedandtheverticesar eperturbedinthenormal directionsbasedonthecorrespondingvalueinthedisplace mentmap.Last,the normalateachvertexoftherenedhighly-detailedmeshisu pdated. Insummary,theadvantagesofSurfaceModelingare 1. lowercomputationcostofanimationbecauseskinningisdon eonthecoarsemesh,notthe naldensemesh; 2. memoryandbandwidthsavingsbyencodingmostdetailasonedimensionaldisplacements ratherthanthree-dimensionalvectors; 12

PAGE 13

3. supportofrenementlevelonthey; 4. customizationofarchetypes:wecanmodeldifferent3Dmode lswiththesamecoarse mesh,changingonlythedisplacementmap; 5. supportofadaptivetessellation:evaluationdoesnothave tobeonauniformgrid. ThedisadvantagesofSurfaceModelingisthatmodernGPUsca nnotrendersuchsurfacedirectly. Thesurfacemustbeconvertedintotrianglesorquadsthroug haprocessoftessellationand evaluation.Therefore,SurfaceModelingbecomesmoreattr activeasareal-timetechniqueonlyif theconversionismorecheaplythanthecostofreadingandan imatingahigh-polygonmesh.Our goalistodesignsuchaschemeontheGPU. 1.2ProblemStatement Meshesconsistofpurequadrilateralfacetsarecommoninmo delingforanimation.Any polyhedralmeshcanbeconvertedintosuchaquadmeshbyones tepofmeshrenement.Buta gooddesignercreatesmesheswiththequad-restrictioninm indsothatnoglobalrenementis necessary.Wethereforefocusonquadrilateralmeshesanda imtoderiveasetofefcientrules directlyontheGPU(Figure 1-2 ,thereddottedrectangle)thatproducesurfaceswithgoodv isual quality.Specicallytheresultingsurfacesshould 1. generateasmallnumberoflowdegreepolynomials; 2. possesssmoothgeometry(noextracostforsmoothshading); 3. closelyapproximateCatmull-Clarksurfaces(astandardmo delingtool); 4. arewater-tight(nopixeldropsout); 5. mapwelltothegraphicspipelineandleveragethestrengths ofGPUcomputation. 13

PAGE 14

1.3ModernGPUPipelineandCurrentTrends Agraphicsprocessingunit(GPU)isadedicatedgraphicsren deringdevice.ItsSIMD architecturehasevolvedsubstantiallyoverthelastdecad e.Thishighlyparallelstructuremakes itmoreeffectivethangeneral-purposeCPUsforarangeofal gorithms.ModernGPUsexposea programmableparallelstreamprocessingpipelineasaseri esofshortprogramscalledshaders. Duringthelastveyears,majorgraphicssoftwarelibrarie ssuchasOpenGLandDirectXare usedtoprogramtheGPUviashadersonaprogrammablepipelin e,whichhasmostlysuperseded theolder”xed-functionpipeline”.Thetwomostpopulargr aphicssoftwarelibraries,DirectX andOpenGL,currentlybothspecifyAPIsforthreetypesofsh aders:vertex,geometry,andpixel shader.TheshadersinDirectX10system[ 4 ](Figure 1-4 )shareacommoncorethataccesses upto128memorybuffersand16parameter(constant)buffers .Vertexandpixelshadersusea ”one-in,one-out”dataprocessingmodel.Incontrast,theg eometryshaderhasalimitedabilityto amplifyorreduceprimitivecountandthusisabletochangem eshes.Figure 1-3 showstheinput Figure1-3.TheinputandoutputofeachpipelinestageinDir ectX10system andoutputofeachpipelinestage.Themoredetailedexplana tionofeachstageisasfollows: 14

PAGE 15

Figure1-4.DirectX10Pipeline1. TheInputAssembler(IA)gathersvertexdatatosetupvertex andindexbuffers.Vertex bufferscontainper-vertexdatawhileindexbuffersdeneg eometryprimitivesasinteger indicesintovertexbuffers.Indexinghelpsavoidredundan tcomputationsofthesame vertex. 2. Thevertexshader(VS)typicallyprocessesvertex-basedop erationssuchaschangingthe positionandnormalofasinglevertex.Thecomputationsint hisstagearelocal.Each vertexonlyhasitsowninformationanddoesnotcommunicate withothervertices.TheVS ismostcommonlyusedtotransformverticesfromobjectspac etoclipspace. 3. Thegeometryshader(GS)processestheverticesofasinglep rimitive.Aprimitivecanbe apoint,alinesegment,atriangle,apointwithadjacency,a linesegmentwithadjacency, andatrianglewithadjacency(Figure 1-5 ).Duetotheavailabilityoftheprimitivevertices upto6verticesforatrianglewithadjacency),thecomputat ionsinthestageareless localthanthoseontheVSandPS.TheGScanemitadditionalpr imitives.Thisnew amplicationfeature,introducedinDirectX10,addsmore exibilityandmakesanumber ofalgorithms[ 1 ]possibletobeimplementedontheGPU,suchasmeshrenemen t, shadowvolumes,dynamicparticlesystems,etc.Thegeometr yshaderoutputmaybefedto therasterizerstageand/ortoavertexbufferinmemoryviat hestreamoutputstage. 15

PAGE 16

4. Therasterizer(TR)isaxed-functionstagegeneratingfra gmentsbyllinginthepolygonssentthroughthegraphicspipeline.Clipping,culling ,perspectivedivide,viewport transform,primitiveset-up,scissoring,depthoffsetals ohappeninthestage. 5. Thepixelshader(PS)operatesononefragmentatatime.Usua llyscenelightingand pixel-relatedeffectssuchasbumpmappingandcolortonema ppingoccurinthePS. 6. Theoutputmerger(OM)takesafragmentfromPSandperformst raditionalstenciland depthtestingoperationsaswellasrendertargetblendingt ogenerateanalpixelonthe screen. Figure1-5.Thesixprimitivesusedin GS ThefutureGPUpipeline[ 29 48 ]isexpectedtoprovideaTessellationUnit,combinedwith newshaderstagesforpatchconversionandevaluationoftes sellatedhigh-ordersurfaces.The Tessllatorprovidesasolutiontoadaptiverenementonthe graphicshardware.Basedonuserprovidedtessellationfactorsperedge,thetessellatorad aptivelycreatesasamplingpatternof theunderlyingparametricdomainandautomaticallygenera tesasetofparametricdomains.In addition,twospecialshadersareintroducedtothenext-ge nerationGPUpipeline.Thepatch shaderconvertsaninputmeshtoasetofpatches.Theevaluat ionshadertakesthe ( u;v ) outputof thetessellatorandevaluatesthepatchat ( u;v ) .ThisfutureGPUarchitecturealsoallowstheGPU toexploitmoreparallelismbecausemultiplearithmeticun itscanberunningthesameevaluation shader.MoreovertessellationoccursontheGPUandovercom esthebottleneckofbusbandwidth causedbymodelcomplexity.ThenewGPUdesignindicatesSur faceModelingisthetrendfor real-timegraphics. 16

PAGE 17

1.4 Representations in Surface Modeling
In computer graphics, surfaces are represented by polyhedral meshes. A polyhedral mesh is a collection of vertices, edges and facets. The valence of a vertex is the number of its incident edges. Each facet is an n-sided polygon. In a triangular (or quadrilateral) mesh, n equals 3 (or 4, respectively). In an arbitrary mesh, the value of n is arbitrary. Figure 1-6 explains the difference between regular and irregular vertices. Figure 1-7 illustrates the three possible types of a facet.

Figure 1-6. Tri- and quadrilateral meshes and facet types 1, 2, 3.
Figure 1-7. The three possible configurations. A Type-1 quad is regular. A Type-2 or Type-3 quad is irregular.

Parametric patches and subdivision surfaces are the major tools for modeling free-form surfaces with arbitrary topology. Inexperienced users can also create shapes in a more intuitive way, by drawing curves or sketches [22, 36].

1.4.1 Subdivision Surfaces
Subdivision surfaces are part of standard modeling packages (e.g., 3D Max, Maya, Softimage, Mirai, Lightwave) and have proven to be a useful modeling tool. Subdivision schemes were first introduced by [10, 12, 31]. They generate a smooth surface through a mesh refinement
process. This method begins with a coarse mesh that approximates a 3D model, known as a control mesh. Each vertex in the control mesh is called a control point. Control points influence the shape of the limit surface. Each subdivision step refines the mesh by inserting new vertices, refining existing point positions, and updating the connectivity. The positions of the new vertices are computed by averaging rules applied to the positions of nearby old vertices. The averaging rules differ from scheme to scheme (see the comparison in Figure 1-9), and these rules determine the properties of the surface. The graphs that illustrate the rules are called stencils. Binary subdivision splits each edge into 2 parts, while ternary subdivision splits each edge into 3. Each subdivision scheme usually has at most three types of rules: a vertex stencil, an edge stencil, and a face stencil. For example, Figure 1-8 shows the stencils of Catmull-Clark subdivision. The refinement rules include stencils for smooth surfaces as well as special rules for creating sharp or semi-sharp features. Each refinement step produces a denser mesh than the previous one. The limit subdivision surface is the surface this process produces after infinitely many refinements. In practice, however, the algorithm is applied only a limited number of times, usually four.

Figure 1-8. The stencils used in Catmull-Clark subdivision. These stencils define the rules for deriving the new vertices that lie on the old vertices, edges, and facets.

A realization of tessellation-on-the-fly for Loop subdivision surfaces was proposed in [33]. Pulli [44] implemented Loop's subdivision scheme with additions by Hoppe et al. [19]. Bischoff [3] proposed a forward-differencing method that requires only a constant amount of memory regardless of the subdivision step. DeRose [13] generalized the infinitely sharp creases of [19] to obtain semi-sharp creases. Hoppe [19] extended Loop's scheme by introducing
subdivision rules that lead to a piecewise smooth surface with features such as creases, corners, darts, and conical vertices.

Figure 1-9. Classification of common subdivision schemes.

Adaptive subdivision can dramatically speed up performance, because the level of detail (LOD) is updated based on the dynamic distance to the camera as well as on the complexity of each part of the model. Adaptive refinement was previously implemented with a quad-tree data structure [50], in which each level of the tree represents one refinement level of the mesh. However, the recursive non-uniform tree structure is difficult to map to parallel computation. Bunnell [9] provides code for adaptive refinement. Although this code was optimized for an earlier generation of GPUs, it renders subdivision surfaces adaptively in real time on current hardware. Lai and Cheng [26] implemented adaptive Catmull-Clark subdivision. Hardware architecture support for adaptive refinement is proposed by [5].

GPU implementations of subdivision surfaces can be roughly categorized into three groups: (I) recursive evaluation [9, 13, 28, 44, 46]; (II) direct evaluation [45, 47]; (III) pre-tabulated basis function composition [6, 7]. Recursive evaluation is the most intuitive approach, but not the most efficient one. Stam [47] directly evaluates subdivision surfaces at
arbitrary parameter values. However, Stam's method cannot evaluate a mesh that contains Type-3 quads. Moreover, the required projection of control points into the eigenspace is too complex for large meshes on the GPU. The methods of [6, 7, 9, 46] cannot convert a mesh with Type-3 quads either. Removing those quads usually means applying at least one Catmull-Clark subdivision step on the CPU, which quadruples the data transferred to the GPU. In more detail, Shiue implements recursive Catmull-Clark subdivision with several passes through the pixel shader [46], using textures for storage and spiral-enumerated mesh fragments to maximize parallelism. Bolz tabulates the subdivision nodal functions up to a given density and combines them linearly on the GPU [6, 7]. The number of nodal functions equals the number of vertices of the input mesh.

One obvious advantage of subdivision surfaces is that they can model surfaces of arbitrary topological type. Also, because each scheme has a fixed set of refinement rules, subdivision surfaces are easy to implement. Although subdivision surfaces have been known for nearly twenty years, their use in real-time applications such as games has been hindered, because recursive refinement is efficient in neither memory nor performance. Rendering a visually smooth surface requires multiple passes. Moreover, the geometry grows roughly four-fold after each subdivision step, which causes heavy memory traffic on the bus between the CPU and the GPU.

1.4.2 Parametric Patches
Current and impending GPU configurations favor short explicit surface definitions over recursively defined surfaces. The alternative of patch-based refinement has therefore been advocated for fast rendering. Parametric patches (PP for short) are rendered directly in terms of their polynomial representations, as opposed to a collection of approximating facets. Generally speaking, a PP scheme converts a control mesh to a set of patches that are parametric piecewise polynomials. PP schemes fit conveniently into a 2-pass implementation on the current graphics pipeline (Figure 1-10). In a future GPU pipeline, the two rendering passes are combined into one pass (Figure 1-11) [48].
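The four-fold growth can be made concrete with a small counting sketch. It uses the standard Catmull-Clark facet counts and is not code from this work; the function name is ours. One step turns an n-sided facet into n quads, and every later step multiplies the quad count by four. A patch-based scheme that emits one patch per input quad keeps the facet count of the control mesh.

```python
def faces_after_catmull_clark(face_sizes, steps):
    """Facet count after `steps` rounds of Catmull-Clark subdivision.

    face_sizes lists the number of sides of every control-mesh facet.
    The first step splits an n-sided facet into n quads; every later
    step splits each quad into 4 quads.
    """
    if steps == 0:
        return len(face_sizes)
    return sum(face_sizes) * 4 ** (steps - 1)


# A control mesh of 1,000 quads, refined the "usual" four times:
print(faces_after_catmull_clark([4] * 1000, 4))  # 256000
```

With four refinement steps, a 1,000-quad control mesh thus turns into 256,000 quads that must be generated and moved through the pipeline. A one-patch-per-quad representation keeps 1,000 patches.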

Figure 1-10. Animation and displacement mapping (DM) take place in the VS of the first and the second pass, respectively. The first pass converts the deformed control mesh to its parametric patch representation. In the following pass, the details are added using DM after the patches produced by the previous pass have been evaluated.

The overall speed of a PP scheme depends on both the complexity and the number of the patches. In terms of shape, a desirable PP scheme ensures at least $G^1$ continuity across adjacent patches and closely approximates subdivision surfaces. One of the biggest challenges is to ensure smoothness everywhere across the patches. Peters explained how to solve the vertex enclosure problem and how to achieve geometric continuity in [39, 41].

GPU-based evaluation of trimmed NURBS surfaces is proposed in [16, 25]. Peters [40] used an approximation to the limit surface of Doo-Sabin subdivision to obtain a quickly convergent series of approximations to the volume enclosed by the subdivision surface. The difficult problem of filling n-sided holes was recently solved by [21, 42]. Bajaj et al. [2] introduced A-patches in trivariate BB-form, with a few free parameters to adjust the shape both locally and globally. In [15], the free-form surface is represented either in NURBS form or as cubic triangular Bézier patches, and this explicit spline representation of smooth free-form surfaces forms the basis of an interactive sculpting environment. In the spirit of the tessellator, Boubekeur [8]
describes a generic refinement pattern for surface modeling (tessellation + displacement) on any programmable GPU.

Figure 1-11. One possible pass on the future graphics rendering pipeline.

1.4.2.1 Bézier technique
The Bézier form is a parametric surface representation that was first developed in 1972 by the French engineer Pierre Bézier. A comprehensive overview of the Bézier form can be found in [43]. A Bézier patch is defined by control points. A Bézier surface, as a set of Bézier patches, is piecewise polynomial. Bézier patches are visually intuitive and mathematically convenient because of the following properties:
1. Affine invariance: applying an affine transformation to the control mesh applies it to the corresponding Bézier patch as well.
2. The convex hull property: a Bézier patch lies completely within the convex hull of its control points, and therefore also completely within the bounding box of its control points in any given Cartesian coordinate system.
There are two types of Bézier patch:

A tensor-product patch in Bézier form of degree $m$ by $n$ is defined as
$$ g(u,v) := \sum_{i=0}^{m}\sum_{j=0}^{n} g_{ij}\binom{m}{i}u^i(1-u)^{m-i}\binom{n}{j}v^j(1-v)^{n-j}, $$
where $(u,v)$ ranges over the unit-square domain $[0,1]\times[0,1]$.
A triangular Bézier patch of degree $n$ is defined as
$$ b(s,t,w) := \sum_{\substack{i+j+k=n\\ i,j,k\ge 0}} b_{ijk}\,\frac{n!}{i!\,j!\,k!}\,s^i t^j w^k, $$
where $(s,t,w)$, with $s+t+w=1$, are barycentric coordinates on a triangle domain.

1.4.2.2 Related work
For quadrilateral input meshes, it is well known that the standard B-spline to Bézier conversion rules [14] convert Type-1 quads into degree 3 by 3 patches in tensor-product Bézier form. Therefore, any two adjacent patches derived from ordinary quads join $C^2$. The interesting part is the conversion of Type-2 and Type-3 quads. A number of techniques exist for smoothing quad meshes (see the comparison in Figure 1-12). Peters [38] generates NURBS output that could be rendered, for example, by the GPU algorithm of [17], but this has not been implemented. The method of [30] generates one bicubic patch per quad that follows the shape of Catmull-Clark surfaces. Since these bicubic patches typically do not join smoothly, Loop and Schaefer compute two additional patches whose cross product approximates the normal of the bicubic patch. As pointed out in [49], this trompe l'oeil is a simple solution when true smoothness is not needed. Judging by the number of operations in construction and evaluation, the method of [30] should run at speeds comparable to our GPU quad mesh smoothing. Our method [37] converts an irregular quad into a c-patch, and the resulting c-patches form a $G^1$ surface. The alternative algorithm proposed by [35] uses a bi-5 Bézier patch for each irregular quad.
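The facet types can be assigned directly from vertex valences. The sketch below is a minimal illustration under two assumptions. First, the mesh is closed and given as a list of quads of vertex indices. Second, a Type-2 quad has exactly one vertex of valence other than 4, while a Type-3 quad has more than one. This second assumption agrees with the remark on Stam's method above but is not spelled out in the text. The function names are hypothetical.

```python
from collections import defaultdict


def vertex_valences(quads):
    """Valence of each vertex = number of distinct incident edges."""
    neighbors = defaultdict(set)
    for quad in quads:
        for k in range(4):
            a, b = quad[k], quad[(k + 1) % 4]
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {v: len(nbrs) for v, nbrs in neighbors.items()}


def quad_type(quad, valence):
    """Assumed classification by the number of corners with valence != 4:
    0 -> Type-1 (regular), 1 -> Type-2, 2 to 4 -> Type-3."""
    irregular = sum(1 for v in quad if valence[v] != 4)
    if irregular == 0:
        return 1
    return 2 if irregular == 1 else 3


# A cube: 8 vertices of valence 3, so every quad is Type-3.
cube = [(0, 1, 2, 3), (4, 7, 6, 5), (0, 4, 5, 1),
        (1, 5, 6, 2), (2, 6, 7, 3), (3, 7, 4, 0)]
val = vertex_valences(cube)
print([quad_type(q, val) for q in cube])  # [3, 3, 3, 3, 3, 3]
```

Boundary vertices of open meshes would need their own rule, since counting edges there does not match the interior notion of valence 4 = regular.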

Figure 1-12. This figure compares existing PP schemes in terms of how well they meet the performance and shape measurements. geom = geometry patches, tan = tangent patches.

CHAPTER 2
A NEW SCHEME FOR SURFACE CONSTRUCTION

2.1 Contribution
This thesis proposes a set of rules for converting a quadrilateral mesh to a surface consisting of bi-cubic splines wherever possible. Each irregular quad (Figure 1-7) is converted to a novel $C^1$ surface patch (c-patch for short). The surface closely mimics the shape of the Catmull-Clark subdivision surface and is constructed entirely by local parallel operations on the GPU. The resulting surface is piecewise polynomial and has well-defined normals everywhere. The evaluation avoids pixel dropout.

A c-patch is a $C^1$ piecewise polynomial patch with cubic boundary. It is defined by 24 coefficients. Their instantiation for a smooth surface is given below and indicated in Figure 2-1. A c-patch has an alternative representation as four triangular patches of total degree 4 in Bernstein-Bézier form (Figure 2-5, right).

Figure 2-1. The c-patch coefficients. For $i=0,1,2,3$, the boundary coefficients $v^i$ and $e^i_j$ are defined by vertex neighborhoods (Figure 2-4 specifies the formulas). The interior coefficients are $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ (Figure 2-6), where $i=0,\dots,3$, $j=0,\dots,n_i$, and $n_i$ is the valence of $v^i$.

2.2 The Conversion Algorithm
Here we give the detailed algorithm for converting the quad mesh into coefficients that define a smooth surface of low degree. Essentially, the conversion from a mesh to a patch consists of computing new points near a vertex from its vertex neighborhood. A vertex neighborhood consists of a mesh point $p$ and the mesh points $p_k$, $k=0,\dots,2N-1$, of all quads surrounding $p$ (Figure 2-2). The union of the four vertex neighborhoods is the quad neighborhood (Figure 2-3a) that defines a patch. In our scheme, the patch is either a tensor-product bi-cubic Bézier patch or a c-patch.

Figure 2-2. Smoothing the vertex neighborhood according to Figure 2-4. The center point $p$, its direct neighbors $p_{2j}$ and its diagonal neighbors $p_{2j+1}$, $j=0,\dots,N-1$, form a vertex neighborhood.
Figure 2-3. a) A quad neighborhood defining a surface piece. b) A bicubic patch with $4\times4$ control points. This patch is the output if the quad is regular. If the quad is irregular, it is used to determine the shape of a c-patch (c). A c-patch is defined by $4\times6$ control points. For analysis, it can alternatively be represented as four $C^1$-connected triangular pieces of degree 4 whose degree 3 outer boundaries are identical to the bicubic patch boundaries.

2.2.1 The Conversion Rules for a Type-1 Quad
Recall that a quad is Type-1 if all four of its vertices have 4 neighbors. Type-1 quads are considered regular in the literature. Such a facet is converted into a degree 3 by 3 patch in tensor-product Bézier form by the standard B-spline to Bézier conversion rules [14]. Therefore, any two adjacent patches derived from Type-1 quads join $C^2$. Figure 2-3 illustrates the derivation process from a quad to a bi-cubic Bézier patch. The conversion rules are shown in Figure 2-4.

Figure 2-4. Computing the control points $v$, $e_j$, $f_j$ and $t_j$ (the projection of $e_j$) at a vertex of valence $N$ from the mesh points $p_j$ of a vertex neighborhood; the subscripts are modulo $2N$. By default, $\sigma_N := \big(c_N+5+\sqrt{(c_N+9)(c_N+1)}\big)/16$ with $c_N := \cos(2\pi/N)$, the subdominant eigenvalue of Catmull-Clark subdivision.

A vertex $v$ computed according to Figure 2-4 is the limit point of Catmull-Clark subdivision, as explained, for example, in [18]. The rules for $e_j$ and $f_j$ are the standard rules for converting a uniform bicubic tensor-product B-spline to its Bézier representation. The points $t_j$ are a projection of the $e_j$ into a common tangent plane (see e.g. [15]). The default scale factor $\sigma_N$ is the subdominant eigenvalue of Catmull-Clark subdivision. We note that for $N=4$, $e_{j+2}=2v-e_j$ and $\sigma_4=1/2$, so that the projection leaves the tangent control points invariant:
$$ t_j = v + \tfrac{2}{4}(e_j - e_{j+2}) = v + (e_j - v) = e_j \qquad\text{for } N=4. \tag{2–1} $$
In the next stage, we combine information from four vertex neighborhoods, as shown in Figure 2-5, to populate a tensor-product patch $g$ of degree 3 by 3 in Bézier form [14]:
$$ g(u,v) := \sum_{k=0}^{3}\sum_{\ell=0}^{3} g_{k\ell}\binom{3}{k}u^k(1-u)^{3-k}\binom{3}{\ell}v^\ell(1-v)^{3-\ell}. $$
The patch is defined by its 16 control points $g_{k\ell}$. The formulas of Figure 2-4 make this patch the Bézier representation of a bicubic spline in B-spline form. For example, in the notation of Figure 2-5, $(g_{k0})_{k=0,\dots,3} = (v^0, t^0_0, t^1_1, v^1)$.

Figure 2-5. Patch construction. On the left, four vertex neighborhoods with vertices $v^i$ each contribute one sector to assemble the $4\times4$ coefficients of the Bézier patch $g$, for example $g_{00}=v^0$, $g_{10}=e^0_0$, $g_{11}=f^0$, $g_{30}=v^1$, $g_{31}=e^1_0$ (superscripts indicate vertices). On the right, the same four sectors are used to determine a c-patch if the underlying quad is extraordinary. The indices of the control points of $g$ and $b^i$ are shown. Only a subset of the coefficients of the four triangular pieces $b^i$ is actually computed to define the c-patch; the full set displayed here is used only to analyze the construction. The indexing of the 15 coefficients of a quartic triangular patch is shown on the right. We use this labeling throughout the dissertation.
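The vertex-neighborhood rules can be sketched under a few assumptions, since the exact formulas of Figure 2-4 are not reproduced in the text. The sketch uses the Catmull-Clark limit point for $v$ and the uniform B-spline to Bézier rules named above for $f_j$ and $e_j$. Here $e_j$ is written as the midpoint of the two adjacent face points, which is algebraically the same as the standard 1/18 edge rule. The projection $t_j$ and $\sigma_N$ are omitted. All helper names are hypothetical, and exact fractions keep the checks exact.

```python
from fractions import Fraction


def add(*points):
    """Component-wise sum of points given as tuples."""
    return tuple(sum(c) for c in zip(*points))


def scale(s, p):
    return tuple(s * c for c in p)


def limit_point(p, ring):
    """Catmull-Clark limit position of the center p.

    ring = [p_0, ..., p_{2N-1}] in counter-clockwise order: even entries
    are the direct neighbors p_{2j}, odd entries the diagonal ones.
    """
    n = len(ring) // 2
    total = add(scale(n * n, p), scale(4, add(*ring[0::2])), add(*ring[1::2]))
    return scale(Fraction(1, n * (n + 5)), total)


def face_point(p, ring, j):
    """f_j of sector j: (4p + 2p_{2j} + 2p_{2j+2} + p_{2j+1}) / 9."""
    m = len(ring)
    a, d, b = ring[2 * j % m], ring[(2 * j + 1) % m], ring[(2 * j + 2) % m]
    return scale(Fraction(1, 9), add(scale(4, p), scale(2, a), scale(2, b), d))


def edge_point(p, ring, j):
    """e_j on the edge toward p_{2j}: the midpoint of f_{j-1} and f_j."""
    n = len(ring) // 2
    f_prev, f_this = face_point(p, ring, (j - 1) % n), face_point(p, ring, j)
    return scale(Fraction(1, 2), add(f_prev, f_this))
```

For $N=4$ the property used in (2–1), $e_{j+2}=2v-e_j$, holds exactly for arbitrary input points. This can be checked with `add(edge_point(p, ring, j), edge_point(p, ring, (j + 2) % 4)) == scale(2, limit_point(p, ring))`.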

2.2.2 The Conversion Rules for a Type-2 or Type-3 Quad
Type-2 and Type-3 quads are known as irregular. An irregular quad has at least one, and possibly up to four, vertices with valence other than 4. For each irregular quad, the conversion involves two steps:
1. Apply the regular rules defined in Figure 2-4 to generate the $v^i$ and $e^i_j$ shown in Figure 2-1, left.
2. Apply the rules in Figure 2-6 to obtain the $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ shown in Figure 2-1, right.
We use the bicubic patch to outline the shape as we replace it by a c-patch (Figure 2-3c). A c-patch has the right number of degrees of freedom to construct a smooth surface cheaply and locally. We introduce the c-patch in terms of the well-known Bézier form of a polynomial piece $b^i$ of total degree 4 [14]:
$$ b^i(u_1,u_2) := \sum_{\substack{k+\ell+m=4\\ k,\ell,m\ge 0}} b^i_{k\ell m}\,\frac{4!}{k!\,\ell!\,m!}\,u_1^k u_2^\ell (1-u_1-u_2)^m. \tag{2–2} $$
The c-patch is equivalent to the union of the four pieces $b^i$, $i=0,1,2,3$, of total degree 4. However, it is defined by only the $4\times6$ c-coefficients constructed in Figures 2-4 and 2-6:
$$ v^i,\ t^i_0,\ t^i_1,\ b^i_{211},\ b^i_{121},\ b^i_{112},\qquad i=0,1,2,3. $$
These 24 c-coefficients imply the missing interior control points of the representation (2–2) by $C^1$ continuity between the triangular pieces: for $j=0,1,2,3$ and $i=0,1,2,3$,
$$ b^i_{3-j,0,1+j} = b^{i-1}_{0,3-j,1+j} := \big(b^i_{3-j,1,j} + b^{i-1}_{1,3-j,j}\big)/2, \tag{2–3} $$
and the boundary control points $b^i_{k\ell 0}$ are implied by degree-raising [14]:
$$ b^i_{400} := v^i,\quad b^i_{310} := \frac{v^i+3t^i_0}{4},\quad b^i_{220} := \frac{t^i_0+t^{i+1}_1}{2},\quad b^i_{130} := \frac{v^{i+1}+3t^{i+1}_1}{4},\quad b^i_{040} := v^{i+1}. \tag{2–4} $$
In every case, the boundary control points thus derive simply from the cubic Bézier curve defined by $(v^i, t^i_0, t^{i+1}_1, v^{i+1})$. The basis functions corresponding to the 24 c-coefficients of the
c-patch can be read off by setting one c-coefficient to one and all others to zero and then applying (2–3) and (2–4).

Figure 2-6. Formulas for the $4\times3$ interior control points that, together with the vertex control points $v^i$ and the tangent control points $t^i_j$, define a c-patch. See also Figures 2-11 and 2-12. Here $c_i := \cos(2\pi/N_i)$, $s_i := \sin(2\pi/N_i)$, and superscripts are modulo 4. By default, $g := \sum_{i=0}^{3}\big(v^i + 3(e^i_0+e^i_1) + 9f^i\big)/64$, the central point of the ordinary patch.

2.3 Derivation of the Coefficients of a c-patch
When a c-patch sector $b$ meets a c-patch sector $a$ (Figure 2-12), the following equation must hold to preserve $G^1$ continuity across the boundary between $b$ and $a$:
$$ \lambda(u)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \tag{2–5} $$
where $\cdot$ denotes the scalar product of a coefficient tuple with the tuple of basis polynomials (for point-valued coefficients, one scalar product per coordinate, i.e., three), and
$$ \lambda(u) := (\lambda_0,\lambda_1)\cdot(u,\,1-u), $$
$$ \partial_1 b(u,0) := 3(U_0,\,2U_1,\,U_2)\cdot\big(u^2,\,u(1-u),\,(1-u)^2\big), $$
$$ \partial_2 b(u,0) := 4(v_0,\,3v_1,\,3v_2,\,v_3)\cdot\big(u^3,\,u^2(1-u),\,u(1-u)^2,\,(1-u)^3\big), $$
$$ \partial_1 a(0,u) := 4(w_0,\,3w_1,\,3w_2,\,w_3)\cdot\big(u^3,\,u^2(1-u),\,u(1-u)^2,\,(1-u)^3\big). $$
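The boundary rules (2–4) are ordinary Bézier degree elevation of the cubic boundary curve from degree 3 to degree 4. The same elevation produces the factor 3/4 used in Section 2.3. The sketch below works on one coordinate at a time; the helper names are ours, and exact rational arithmetic keeps the comparison exact.

```python
from fractions import Fraction


def raise_degree(ctrl):
    """Bezier degree elevation n -> n+1 (one coordinate):
    q_0 = p_0, q_k = (k/(n+1)) p_{k-1} + (1 - k/(n+1)) p_k, q_{n+1} = p_n."""
    n = len(ctrl) - 1
    raised = [ctrl[0]]
    for k in range(1, n + 1):
        a = Fraction(k, n + 1)
        raised.append(a * ctrl[k - 1] + (1 - a) * ctrl[k])
    raised.append(ctrl[n])
    return raised


def de_casteljau(ctrl, t):
    """Evaluate a Bezier curve with scalar coefficients at parameter t."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]


# One coordinate of the boundary cubic (v^i, t^i_0, t^{i+1}_1, v^{i+1}):
cubic = [0, 4, 8, 4]
quartic = raise_degree(cubic)  # (b400, b310, b220, b130, b040); equals [0, 3, 6, 7, 4]
```

Raising the degree changes the control points but not the curve. This is why a bicubic patch and the triangular pieces of a c-patch can share the same cubic boundary.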

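Equation (2–5) can be rewritten as the following collection of simplified equations in terms of $U_i$, $v_i$, $w_i$:
$$ 3\lambda_0 U_0 = 4v_0 + 4w_0 \tag{2–6} $$
$$ 6\lambda_0 U_1 + 3\lambda_1 U_0 = 12(v_1+w_1) \tag{2–7} $$
$$ 3\lambda_0 U_2 + 6\lambda_1 U_1 = 12(v_2+w_2) \tag{2–8} $$
$$ 3\lambda_1 U_2 = 4v_3 + 4w_3 \tag{2–9} $$

2.3.1 Derivation of $\lambda_0$ and $\lambda_1$
The scalar $\lambda_0$ is derived from (2–6), and (2–9) sets the constraint for $\lambda_1$. Let $U_0 := (1,0)$, $V_0 := \big(\cos\frac{2\pi}{n_0}, \sin\frac{2\pi}{n_0}\big)$, and $W_0 := \big(\cos\frac{2\pi}{n_0}, -\sin\frac{2\pi}{n_0}\big)$ (Figure 2-7). From degree raising we know that $u_0 = \frac34 U_0$ and $u_3 = \frac34 U_2$. Then
$$ \begin{aligned} v_0 + w_0 &= \tfrac12\big(\tfrac34 V_0 + \tfrac34 U_0\big) + \tfrac12\big(\tfrac34 W_0 + \tfrac34 U_0\big)\\ &= \tfrac34\Big(\tfrac{1+\cos\frac{2\pi}{n_0}}{2},\ \tfrac{\sin\frac{2\pi}{n_0}}{2}\Big) + \tfrac34\Big(\tfrac{1+\cos\frac{2\pi}{n_0}}{2},\ -\tfrac{\sin\frac{2\pi}{n_0}}{2}\Big)\\ &= \tfrac34\big(1+\cos\tfrac{2\pi}{n_0},\ 0\big) = \tfrac34\big(1+\cos\tfrac{2\pi}{n_0}\big)U_0. \end{aligned} \tag{2–10} $$
Hence $4(v_0+w_0) = 3\big(1+\cos\frac{2\pi}{n_0}\big)U_0$, and comparison with (2–6) gives $\lambda_0 = 1+\cos\frac{2\pi}{n_0}$.
Similarly, because $V_3 = \big(1-\cos\frac{2\pi}{n_1}, \sin\frac{2\pi}{n_1}\big)$ and $W_3 = \big(1-\cos\frac{2\pi}{n_1}, -\sin\frac{2\pi}{n_1}\big)$,
$$ 4(v_3+w_3) = 3\big(1-\cos\tfrac{2\pi}{n_1}\big)U_2. \tag{2–11} $$
Hence $\lambda_1 = 1-\cos\frac{2\pi}{n_1}$.

2.3.2 Derivation of $b_{211}$ and $b_{121}$
To derive the formulas for $b^i_{211}$ and its symmetric counterpart $b^i_{121}$, note that the formulas must guarantee a smooth transition between $b^i$ and its neighbor patch on an adjacent quad,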
regardless of whether the adjacent quad is regular or irregular. That is, the formulas are derived to satisfy two types of smoothness constraints simultaneously (see Section 2.4). From Equation (2–7), we obtain
$$ b_{211} + a_{211} = \tfrac12\lambda_0 U_1 + \tfrac14\lambda_1 U_0 + 2b_{310}. \tag{2–12} $$
To obtain a second constraint and determine $b_{211}$ uniquely, we express each of $b_{211}$ and $a_{211}$ in terms of a ghost patch, using sine-weighted averages (Figure 2-8):
$$ 4s_0(b_{211}-b_{310}) + 4s_1(b_{211}-b_{220}) = 3(b_{11}-b_{10}) $$
yields
$$ b_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_0 - t^0_0)}{4(s_0+s_1)}. \tag{2–13} $$

Figure 2-7. The re-parameterization to meet $G^1$ at the vertex.
Figure 2-8. The coefficients $b_{211}$ and $b_{121}$ of a c-patch are derived on top of a ghost patch. (Panels: ghost patch; triangular patches.)

Similarly,
$$ a_{211} = \frac{4s_0 b_{310} + 4s_1 b_{220} + 3(f^0_{n_0-1} - t^0_0)}{4(s_0+s_1)}. \tag{2–14} $$
Therefore, since the edge point is the midpoint of its two adjacent face points ($f^0_0 + f^0_{n_0-1} = 2e^0_0$),
$$ b_{211} - a_{211} = \frac{3(f^0_0 - e^0_0)}{2(s_0+s_1)}. \tag{2–15} $$
Together with Equation (2–12),
$$ b_{211} = b_{310} + \tfrac14\lambda_0(t^1_1 - t^0_0) + \tfrac18\lambda_1(t^0_0 - v^0) + \frac{3(f^0_0 - e^0_0)}{4(s_0+s_1)}. \tag{2–16} $$
Equation (2–8) implies
$$ b_{121} + a_{121} = \tfrac14\lambda_0 U_2 + \tfrac12\lambda_1 U_1 + 2b_{130}. \tag{2–17} $$
Using the same approach as for $b_{211}$,
$$ 4s_0(b_{121}-b_{220}) + 4s_1(b_{121}-b_{130}) = 3(b_{21}-b_{20}) $$
yields
$$ b_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_0 - t^1_1)}{4(s_0+s_1)}. \tag{2–18} $$
Similarly,
$$ a_{121} = \frac{4s_1 b_{130} + 4s_0 b_{220} + 3(f^1_1 - t^1_1)}{4(s_0+s_1)}. \tag{2–19} $$
From (2–18) and (2–19),
$$ b_{121} - a_{121} = \frac{3(f^1_0 - e^1_1)}{2(s_0+s_1)}, \tag{2–20} $$
and from (2–17) and (2–20),
$$ b_{121} = b_{130} + \tfrac18\lambda_0(v^1 - t^1_1) + \tfrac14\lambda_1(t^1_1 - t^0_0) + \frac{3(f^1_0 - e^1_1)}{4(s_0+s_1)}. \tag{2–21} $$
The formulas (2–16) and (2–21) are the same as those shown in Figure 2-6.

2.3.3 Derivation of $b_{112}$
By contrast, continuity constraints do not pin down $b^i_{112}$. We could choose each $b^i_{112}$ arbitrarily without changing the formal smoothness of the resulting surface. However, we opt for increased smoothness at the center of the c-patch and additionally use the freedom to closely mimic the shape of Catmull-Clark subdivision surfaces, as we did earlier for vertices. First, we
approximately satisfy four $C^2$ constraints across the diagonal boundaries at the central point $b_{004}$ (Figure 2-9) by enforcing
$$ \begin{pmatrix} 1&-1&0&0\\ 0&1&-1&0\\ 0&0&1&-1\\ -1&0&0&1 \end{pmatrix} \begin{pmatrix} b^0_{112}\\ b^1_{112}\\ b^2_{112}\\ b^3_{112}\end{pmatrix} = \frac12 \begin{pmatrix} b^0_{211}-b^1_{121}-q\\ b^1_{211}-b^2_{121}-q\\ b^2_{211}-b^3_{121}-q\\ b^3_{211}-b^0_{121}-q \end{pmatrix}, \tag{2–22} $$
where $q := \frac14\sum_{i=0}^{3}\big(b^i_{211}-b^i_{121}\big)$. The perturbation by $q$ is necessary because the coefficient matrix of the $C^2$ constraints is rank deficient. After the perturbation, the right-hand sides sum to zero, so the system can be solved with the last equation implied by the first three. We add the constraint that the average of the $b^i_{112}$ matches $g := g(\frac12,\frac12)$, the center position of the bicubic patch:
$$ \begin{pmatrix} 1&-1&0&0\\ 0&1&-1&0\\ 0&0&1&-1\\ 1&1&1&1 \end{pmatrix} \begin{pmatrix} b^0_{112}\\ b^1_{112}\\ b^2_{112}\\ b^3_{112}\end{pmatrix} = \frac12 \begin{pmatrix} b^0_{211}-b^1_{121}-q\\ b^1_{211}-b^2_{121}-q\\ b^2_{211}-b^3_{121}-q\\ 8g \end{pmatrix}. $$

Figure 2-9. Dark lines cover the control points involved in the $C^2$ constraints (2–22). The points on dashed lines are implied by averaging.

The point $g$ lies on the bicubic patch at $u=0.5$ and $v=0.5$. For an irregular quad, all bicubic control points on the boundary are already computed; only the 4 interior ones are missing. We can
use the mask for determining Bézier control points from a uniform bicubic B-spline surface to obtain them. Figure 2-10(a) is the mask for $b_{11}$; the other interior points use the symmetric masks.

Figure 2-10. The center of a bi-cubic patch can be evaluated as a linear combination of the boundary coefficients.

Figure 2-10(b) shows the mask for evaluating the bicubic patch at $(0.5, 0.5)$:
$$ g = \tfrac{1}{64}\big(b_{00}+3b_{01}+3b_{02}+b_{03}+3b_{10}+9b_{11}+9b_{12}+3b_{13}+3b_{20}+9b_{21}+9b_{22}+3b_{23}+b_{30}+3b_{31}+3b_{32}+b_{33}\big). $$
Now we can solve for the $b^i_{112}$, $i=0,1,2,3$, and obtain the formula of Figure 2-6.

2.4 Smoothness Verification
In this section we formally verify the following lemma. For the purpose of the proof, we view the c-patch in its equivalent representation as four Bézier patches of total degree 4.

Lemma 1. Two adjacent polynomial pieces $a$ and $b$ defined by the rules of Section 2.2 (Figure 2-4, Figure 2-6, (2–3), (2–4)) meet at least
(i) $C^2$ if $a$ and $b$ correspond to two regular quads;
(ii) $C^1$ if $a$ and $b$ are adjacent pieces of a c-patch;
(iii) $C^1$ if $a$ and $b$ correspond to two quads, exactly one of which is regular;
(iv) with tangent continuity if $a$ and $b$ correspond to two different irregular quads.

Proof. (i) If $a$ and $b$ are bicubic patches corresponding to regular quads, they are part of a bicubic spline with uniform knots and therefore meet $C^2$. (ii) If $a$ and $b$ are adjacent pieces of a c-patch, then Equations (2–3) enforce $C^1$ continuity.

For the remaining cases, let $b$ be a triangular piece. Let $u$ be the parameter along the quad edge from $b_{400}=v_0$, where $u=0$ and the valence is $N_0$, to $b_{040}=v_1$, where $u=1$ and the valence is $N_1$ (Figure 2-11 for case (iii) and Figure 2-12 for case (iv)). By construction, the common boundary $b(u,0)=a(0,u)$ is a curve of degree 3 with Bézier control points $(v_0, t^0_0, t^1_1, v_1)$. Bicubic patches on regular quads and triangular patches on irregular quads therefore match up exactly. Denote by $\partial_1 b$ the partial derivative of $b$ along the common boundary and by $\partial_2 b$ the partial derivative in the other variable. Since $b(u,0)=a(0,u)$, we have $\partial_1 b(u,0)=\partial_2 a(0,u)$. The partial derivative of $a$ in the other variable is $\partial_1 a$. We will verify that the following conditions hold, which imply tangent continuity.
If one quad is ordinary (case (iii)),
$$ \partial_1 b(u,0) = 2\,\partial_2 b(u,0) + \partial_1 a(0,u); \tag{2–23} $$
if both quads are extraordinary (case (iv)),
$$ \big((1-u)\lambda_0 + u\lambda_1\big)\,\partial_1 b(u,0) = \partial_2 b(u,0) + \partial_1 a(0,u), \tag{2–24} $$
where $\lambda_0 := 1+c_0$, $\lambda_1 := 1-c_1$, and $c_i := \cos(2\pi/N_i)$.

Both equations, (2–23) and (2–24), equate vector-valued polynomials of degree 3 (we write $\partial_1 b(u,0)$ in degree-raised form [14]). The equations hold if and only if all Bézier coefficients are equal. At first sight, this means checking four vector-valued equations for each of (2–23) and (2–24). However, in both cases the setup is symmetric with respect to reversing the direction in which the boundary $b(u,0)$ is traversed. We therefore need to check only the first two equations (2–23') and (2–23'') of (2–23) and the first two equations (2–24') and (2–24'') of (2–24). We verify these equations by inserting the formulas of Figures 2-4 and 2-6.

To verify (2–23), the key observation is that $N_0=N_1=4$ if one quad is ordinary. Hence $c_0=c_1=0$ and $s_0=s_1=1$ (cf. Figure 2-6), and $t^i_j=e^i_j$. Therefore, for example (cf. Figure 2-11),
$$ 2\,\partial_2 b(0,0) = 2\cdot4(b_{301}-v_0) = 8\cdot\tfrac34\Big(\tfrac{e^0_0+e^0_1}{2}-v_0\Big) = 3(e^0_0+e^0_1)-6v_0, $$
where the factor $\frac34$ stems from raising the degree from 3 to 4. The second Bézier coefficients of $\partial_1 b(u,0)$ (in degree-raised form) and of $2\,\partial_2 b(u,0)$ are, respectively (cf. Figure 2-11),
$$ 3\,\frac{(e^0_0-v_0)+2(e^1_1-e^0_0)}{3} \qquad\text{and}\qquad 2\cdot4(b_{211}-b_{310}) = 8\Big(\frac{e^1_1-e^0_0}{4}+\frac{e^0_0-v_0}{8}+\frac{3(f^0_0-e^0_0)}{8}\Big). $$
Comparing the first two Bézier coefficients of $\partial_1 b(u,0)$ and of $2\,\partial_2 b(u,0)+\partial_1 a(0,u)$ then yields equality and establishes $C^1$ continuity:
$$ \underbrace{3(e^0_0-v_0)}_{\partial_1 b(0,0)} = \underbrace{3(e^0_0+e^0_1)-6v_0}_{2\partial_2 b(0,0)}\ \underbrace{-\,3(e^0_1-v_0)}_{\partial_1 a(0,0)}, \tag{2–23'} $$
$$ (e^0_0-v_0)+2(e^1_1-e^0_0) = 2(e^1_1-e^0_0)+(e^0_0-v_0)+3(f^0_0-e^0_0)-3(f^0_0-e^0_0). \tag{2–23''} $$

Figure 2-11. $C^1$ transition between a triangular and a bicubic patch.

The equations for (2–24) are similar, except that we replace $e_j$ by $t_j$ and keep in mind that, by definition,
$$ (t^0_{N_0-1}-v_0) + (t^0_1-v_0) = 2c_0(t^0_0-v_0). $$

Figure 2-12. $G^1$ transition between two triangular patches.

Hence, for example,
$$ \partial_2 b(0,0)+\partial_1 a(0,0) = 4(b_{301}-v_0+a_{301}-v_0) = 4\cdot\tfrac38\big(2c_0(t^0_0-v_0)+2(t^0_0-v_0)\big) = 3(1+c_0)(t^0_0-v_0). $$
The first of the four coefficient equations of (2–24) then simplifies to
$$ 3(1+c_0)(t^0_0-v_0) = 4(b_{301}+a_{301}-2v_0) = \tfrac32\big(t^0_1+t^0_0-2v_0+t^0_{N_0-1}+t^0_0-2v_0\big) = 3\cdot\tfrac12\big(2c_0(t^0_0-v_0)+2(t^0_0-v_0)\big). \tag{2–24'} $$
The terms involving $(f^0_0-e^0_0)$ in the expansions of $b_{211}$ and $a_{211}$ cancel, so the second coefficient equation is
$$ 6\lambda_0(t^1_1-t^0_0)+3\lambda_1(t^0_0-v_0) = 12(b_{211}+a_{211}-2b_{310}) = 12\,\frac{2(1+c_0)}{4}(t^1_1-t^0_0)+12\,\frac{2(1-c_1)}{8}(t^0_0-v_0). \tag{2–24''} $$
Both equalities can be read off directly. This verifies the claim of smoothness.
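The interior coefficients of Section 2.3.3 can be computed by back-substitution. Replace the redundant fourth row of (2–22) by the average constraint. The first three rows then link consecutive $b^i_{112}$, and the row of ones fixes their sum. The sketch below works per coordinate (apply it to x, y and z separately) and uses exact fractions. The function name is hypothetical.

```python
from fractions import Fraction


def solve_b112(b211, b121, g):
    """Interior coefficients b^i_112, i = 0..3, for one coordinate.

    b211[i], b121[i]: the coordinate of b^i_211 and b^i_121.
    g: the same coordinate of the bicubic center g(1/2, 1/2).
    Rows 1-3: b^i_112 - b^{i+1}_112 = d_i with
    d_i = (b^i_211 - b^{i+1}_121 - q) / 2; row 4: sum_i b^i_112 = 4 g.
    """
    q = Fraction(sum(b211[i] - b121[i] for i in range(4)), 4)
    d = [Fraction(b211[i] - b121[(i + 1) % 4] - q) / 2 for i in range(3)]
    # Substituting b1 = b0 - d0, b2 = b1 - d1, b3 = b2 - d2 into the
    # sum gives 4 b0 - 3 d0 - 2 d1 - d2 = 4 g.
    b0 = g + Fraction(3 * d[0] + 2 * d[1] + d[2], 4)
    b1 = b0 - d[0]
    b2 = b1 - d[1]
    b3 = b2 - d[2]
    return [b0, b1, b2, b3]
```

The discarded fourth row of (2–22) still holds for the result, because the perturbation by $q$ makes the four right-hand sides sum to zero.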

2.5 Complexity Analysis
2.5.1 Number of Patches
The conversion scheme yields the minimal set of patches for two reasons: (1) the coarse input mesh needs no initial refinement, and (2) each quadrilateral facet of the coarse mesh corresponds to exactly one patch. The total number of patches therefore equals the number of facets in the mesh. The patch complexity of various schemes is compared in Figure 1-12. The low cost of construction and evaluation makes c-patches an attractive representation, not just on the GPU.

2.5.2 Cost of Patch Construction
Because vertex construction and patch construction are separated, the number of scaled vertex additions (adds) per patch is independent of the valence. With the cost of the vertex computations distributed over the patches, computing the control points costs $4\times(4+1+1+2)=32$ adds per bicubic patch. Computing $t_j$ from $t_0$ and $t_1$ and determining $b^i_{211}$, $b^i_{121}$ and $b^i_{112}$ according to Figure 2-6 costs an additional $4\times(2+6+6+12)=104$ adds per c-patch. Each c-patch has 24 coefficients. This compares favorably with, for example, [30], where $16+12+12$ coefficients are generated.

2.5.3 Cost of Surface Evaluation
The patch can be evaluated at any point $(u,v)$ of its parametric domain using de Casteljau's algorithm. A tensor-product bi-cubic Bézier patch is defined by 16 control points. Its evaluation at $(u,v)$ needs 42 vector-vector additions, 42 scalar-vector multiplications, and 42 scalar-scalar operations. Similarly, the evaluation of a c-patch at $(u,v)$ requires 40 vector-vector additions and 60 scalar-vector multiplications. In terms of evaluation cost, a c-patch thus costs roughly the same as a bicubic patch.
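The counts quoted for the c-patch are consistent with de Casteljau's algorithm on one quartic triangular piece (2–2). The intermediate levels have 10, 6, 3 and 1 points, i.e., 20 recurrence steps. Each step uses 3 scalar-vector multiplications and 2 additions; this counting convention is our assumption. The sketch below is a minimal version: the names are ours, scalars stand for one coordinate so that each counted operation corresponds to one vector operation, and the result is cross-checked against the Bernstein form.

```python
from fractions import Fraction
from math import factorial


def tri_de_casteljau(ctrl, s, t, w):
    """Evaluate a triangular Bezier patch by de Casteljau's algorithm.

    ctrl maps (i, j, k) with i + j + k = n to one coordinate of b_ijk;
    (s, t, w) are barycentric coordinates. Returns (value, adds, mults),
    counting 2 additions and 3 multiplications per recurrence step.
    """
    n = max(i for i, _, _ in ctrl)
    level, adds, mults = dict(ctrl), 0, 0
    for r in range(n - 1, -1, -1):
        nxt = {}
        for i in range(r + 1):
            for j in range(r + 1 - i):
                k = r - i - j
                nxt[(i, j, k)] = (s * level[(i + 1, j, k)]
                                  + t * level[(i, j + 1, k)]
                                  + w * level[(i, j, k + 1)])
                adds, mults = adds + 2, mults + 3
        level = nxt
    return level[(0, 0, 0)], adds, mults


def tri_bernstein(ctrl, s, t, w):
    """Direct evaluation of the Bernstein-Bezier form (2-2)."""
    n = max(i for i, _, _ in ctrl)
    total = 0
    for (i, j, k), c in ctrl.items():
        multinomial = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
        total += multinomial * c * s**i * t**j * w**k
    return total
```

For a quartic piece, `tri_de_casteljau` reports 40 additions and 60 multiplications, which agrees with the figures above.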

2.6 Approximating the Catmull-Clark Subdivision Surface
Since Catmull-Clark subdivision is a standard modeling tool, our scheme is designed to approximate the Catmull-Clark subdivision surface. In fact, the resulting bi-cubic patches agree completely with the Catmull-Clark subdivision surface except in the immediate neighborhood of irregular mesh vertices. In such a neighborhood, the patches join at least with tangent continuity and interpolate the limit point of the irregular mesh vertex. Furthermore, owing to the choice of the c-patch coefficient $b_{112}$, the center of a c-patch interpolates the center point of the corresponding Catmull-Clark limit surface.

2.7 Water-Tight Surface Verification
Patches are evaluated independently. If the vertices generated along a boundary by two adjacent patches do not match exactly, the refined mesh will have a hole. There are three configurations of adjacent patches: (1) both are bi-3 patches, (2) one of them is a bi-3 patch, (3) both are c-patches.
The coefficients defining the shared boundary curve are derived by the averaging rules defined in Figure 2-4. Floating-point addition is commutative, so the boundary coefficients are the same whichever patch generates them. In other words, round-off error cannot cause cracks in the first case. The boundary coefficients of a c-patch are computed by the same rules in Figure 2-4, so water-tightness is also achieved in the latter two cases. Note that the computation of the cubic boundaries shared by a bicubic patch and a c-patch is mathematically identical.

2.8 Discussion
Using triangular patches to model quad patches is somewhat unconventional, but it has been done before in an I3D paper [15]. [49] is also based on triangular patches. Evaluation and normal computation of degree 4 triangular patches is comparable in cost to
tensor-product bicubic patches: the triangular case averages 15 control points, the tensor-product case 16. Triangular patches may deserve more attention in OpenGL.
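The tensor-product side of this comparison can be sketched in the same way, with hypothetical helper names. The code evaluates a bicubic Bézier patch with de Casteljau's algorithm. It returns the position together with an unnormalized normal, which is the cross product of the two partial derivatives. At $(1/2, 1/2)$ the position agrees with the 1/64 mask for $g$ given in Section 2.3.3.

```python
from fractions import Fraction


def lerp(a, b, t):
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def curve_point_and_tangent(pts, t):
    """de Casteljau on a Bezier curve; returns (point, derivative) at t."""
    pts, n = list(pts), len(pts) - 1
    while len(pts) > 2:
        pts = [lerp(a, b, t) for a, b in zip(pts, pts[1:])]
    tangent = tuple(n * (y - x) for x, y in zip(pts[0], pts[1]))
    return lerp(pts[0], pts[1], t), tangent


def bicubic_eval(ctrl, u, v):
    """Position and unnormalized normal of a bicubic Bezier patch.

    ctrl[k][l] is g_kl (k paired with u, l with v), each a 3-tuple.
    """
    rows = [curve_point_and_tangent(ctrl[k], v) for k in range(4)]
    pos, du = curve_point_and_tangent([p for p, _ in rows], u)
    dv, _ = curve_point_and_tangent([d for _, d in rows], u)
    return pos, cross(du, dv)


def center_mask(ctrl):
    """g(1/2, 1/2) via the 1/64 mask with weights (1,3,3,1) x (1,3,3,1)."""
    wts = (1, 3, 3, 1)
    return tuple(
        Fraction(sum(wts[k] * wts[l] * ctrl[k][l][c]
                     for k in range(4) for l in range(4)), 64)
        for c in range(3))
```

The normal is left unnormalized. A shader would normalize it once per sample after the cross product.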

CHAPTER 3
GPU IMPLEMENTATION

3.1 Overview
We implemented the conversion scheme in C++ on the DirectX 10 pipeline. The vertex shader computes the vertex neighborhoods according to Figure 2-4. The geometry shader, using the primitive triangle with adjacency, accumulates the coefficients of the bicubic patch or computes a c-patch according to Figure 2-6. We implemented conversion plus rendering in two variants: a 1-pass and a 2-pass scheme.

3.2 2-pass Approach

Figure 3-1. The 2-pass implementation, detailed in Figure 3-2. The first pass converts, the second renders. Note that the geometry shader computes at most 24 coefficients per patch and does not create (amplify to) evaluation point primitives.

Figure 3-2. 2-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. VSOut of Pass 1 outputs $N$ points $f_j$ for one vertex (hence the subscript), and GSIn of Pass 1 retrieves four points $f^i$, each generated by a different vertex of the quad (hence the superscript).

The 2-pass implementation constructs the patches in the first pass using the vertex shader and the geometry shader, and evaluates positions and normals in the second pass. Pass 1 streams out only the $4\times6$ coefficients of a c-patch, not the $4\times\binom{4+2}{2}=60$ Bézier control points of the equivalent triangular pieces. The data amplification needed for evaluation takes place in the second pass, by instancing a $(u,v)$-grid on the vertex shader. That is, we do not stream back large data sets after amplification. Position and normal are computed on the $(u,v)$ domain $[0,1]^2$
of the bicubic patch or of the c-patch (not on any triangular domain). We pre-tessellate the quad domain and store the results in a set of textures of different resolutions. If the tessellation factor is chosen to be $m$, the texture with $(m+1)\times(m+1)$ parametric values is sent to the vertex shader in the subsequent evaluation pass. Given the pre-tessellated domain and a patch identifier, the vertex shader loads the appropriate control points and evaluates the patch. Figure 3-2 lists the input, the output and the computations of each pipeline stage. Figure 3-1 illustrates this association of computations and resources. To avoid costly branching in HLSL (High Level Shader Language) and to optimize performance, we wrote specialized shaders for patch construction and evaluation for each patch type.

3.3 1-pass Approach
In the 1-pass implementation, the evaluation immediately follows conversion in the geometry shader. It uses the geometry shader's ability to amplify, i.e., to output multiple point primitives for each facet (Figure 3-4). A 1-pass implementation sounds more efficient than a 2-pass implementation. However, DX10 limits data amplification in the geometry shader so that the maximal evaluation density is $8\times8$ per quad. Moreover, maximal amplification in the geometry shader slows down performance. We observed that the 2-pass implementation performs at least 25% better. Figure 3-3 lists the data flow on the graphics pipeline.

3.4 Coordinate System Transformation
To evaluate the normal and position of an irregular quad at $(u,v)$, we first need to transform the tessellated domain value from the Cartesian coordinate $(u,v)$ to a barycentric coordinate $(s,t,w)$. Figure 3-5 illustrates how to determine which of the four triangles $(u,v)$ lies in. This minimizes the number of comparisons and takes care of the shared vertices. We assign $(0.0,0.0)$, $(1.0,0.0)$ and $(0.5,0.5)$ only to $T_1$, $(1.0,1.0)$ only to $T_2$, and $(0.0,1.0)$ only to $T_4$.
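The pre-tessellated grid and the triangle lookup can be sketched as follows. Figure 3-5 only labels the four triangles, so the layout used here is an assumption. $T_1$ touches the bottom edge $v=0$, $T_2$ the right edge, $T_3$ the top edge and $T_4$ the left edge, and all four share the center $(0.5, 0.5)$. Points on shared edges follow the assignments listed above. The barycentric convention is also hypothetical: $s$ belongs to the first corner, $t$ to the next corner counter-clockwise, and $w$ to the center. Two comparisons select the triangle, and a rotation about the center reduces every case to $T_1$.

```python
from fractions import Fraction


def uv_grid(m):
    """(m+1) x (m+1) parametric samples of [0,1]^2 for tessellation factor m."""
    return [(Fraction(i, m), Fraction(j, m))
            for j in range(m + 1) for i in range(m + 1)]


def locate(u, v):
    """Return (triangle number, (s, t, w)) for a point (u, v) of the unit square."""
    # Two comparisons pick the triangle; points on the diagonals follow the
    # tie-breaking (0,0), (1,0), (0.5,0.5) -> T1, (1,1) -> T2, (0,1) -> T4.
    if v <= u:
        tri = 1 if u + v <= 1 else 2
    else:
        tri = 3 if u + v > 1 else 4
    # Rotate about the center so the chosen triangle lands on T1, whose
    # corners (0,0), (1,0), (0.5,0.5) carry s, t, w respectively.
    x, y = {1: (u, v), 2: (v, 1 - u), 3: (1 - u, 1 - v), 4: (1 - v, u)}[tri]
    return tri, (1 - x - y, x - y, 2 * y)
```

The resulting $(s,t,w)$ can then be passed to a triangular evaluator of the corresponding piece $b^i$.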

Figure 3-3. 1-pass conversion: VS = vertex shader, GS = geometry shader, PS = pixel shader. The GS amplifies the geometry and evaluates the patches.
Figure 3-4. At present, the 1-pass conversion-and-rendering must place patch assembly and evaluation in the geometry shader. This is not efficient.

Figure 3-5. $(u,v)$ on an irregular quad: the $(u,v)$ domain with corners $(0.0,0.0)$, $(1.0,0.0)$, $(1.0,1.0)$, $(0.0,1.0)$ and center $(0.5,0.5)$, split into the triangles $T_1$, $T_2$, $T_3$, $T_4$.

3.5 Water-Tight Evaluation
The HLSL code in Figure 3-6 shows that the same cubic curve is evaluated along the boundary by both adjacent patches. Since the boundary coefficients are computed only once, an explicit if-statement in the evaluation guarantees exactly the same ordering of computations.

Figure 3-6. Water-tight evaluation.
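The idea behind water-tight evaluation can be illustrated without HLSL. The following is a hypothetical sketch, not the code of Figure 3-6. A sample on a shared boundary is always evaluated starting from the endpoint with the smaller global vertex id, and its parameter is formed from the integer sample index. Both adjacent patches therefore execute exactly the same floating-point operations and produce bit-identical points. The vertex-id rule is our assumption.

```python
def eval_cubic(p, t):
    """de Casteljau for one coordinate of a cubic Bezier curve."""
    a = [(1 - t) * p[i] + t * p[i + 1] for i in range(3)]
    b = [(1 - t) * a[i] + t * a[i + 1] for i in range(2)]
    return (1 - t) * b[0] + t * b[1]


def eval_shared_edge(ctrl, start_id, end_id, k, m):
    """k-th of the m+1 samples on a boundary edge, seen from one patch.

    ctrl lists the 4 boundary coefficients in this patch's direction,
    from vertex start_id to vertex end_id (hypothetical global ids).
    The explicit if-statement re-orients the data so that every patch
    evaluates from the smaller id with t = k / m built from integers.
    """
    if start_id > end_id:
        ctrl = ctrl[::-1]
        k = m - k
    return eval_cubic(ctrl, k / m)
```

The patch that traverses the edge from vertex 9 to vertex 5 sees the reversed coefficients and the mirrored sample index. After re-orientation, it performs the same computation as its neighbor.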

3.6 Conclusion

The presented approach fits well into a GPU pipeline. In both approaches, we compute v, e, f and t in the vertex shader, using each vertex's neighborhood and the rules in Figure 2-4. Each vertex has 2n+1 vertices in its vertex neighborhood, where n is the valence. This information is stored in a texture. Given a vertex ID and its valence, all vertices in its neighborhood can be retrieved in counter-clockwise order. In the geometry shader, the patch is finalized and assembled. Overall, the 2-pass implementation has better performance because of its small stream-out, short geometry shader code and minimal amplification in the geometry shader.
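The neighborhood storage can be pictured as one flat array plus a per-vertex offset, which stands in for the texture. This is a minimal sketch under stated assumptions. The layout (the vertex itself first, then its 2n neighbors in counter-clockwise order) and the explicit offset table are hypothetical. The thesis only states that a vertex ID and its valence suffice to retrieve the 2n+1 vertices.

```python
def pack_neighborhoods(rings):
    """Flatten per-vertex neighborhoods into one array plus an offset table.

    rings[v] holds the 2n+1 vertex ids of vertex v's neighborhood in
    counter-clockwise order, starting with v itself (hypothetical layout)."""
    flat, offsets, valences = [], [], []
    for ring in rings:
        if len(ring) % 2 == 0:
            raise ValueError("a neighborhood must contain 2n+1 vertices")
        offsets.append(len(flat))
        valences.append((len(ring) - 1) // 2)
        flat.extend(ring)
    return flat, offsets, valences


def fetch_neighborhood(flat, offsets, valences, v):
    """Emulate the shader-side lookup: vertex id -> start offset -> 2n+1 entries."""
    start = offsets[v]
    return flat[start:start + 2 * valences[v] + 1]
```

On the GPU, `flat` would live in a texture and the start offset would become a texture coordinate. Storing rings contiguously in retrieval order is also what makes the lookups texture-cache friendly.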

CHAPTER 4
RESULTS

4.1 Shape Quality

Our algorithm produces $C^1$ surfaces that closely approximate Catmull-Clark subdivision surfaces. We compare our algorithm with [30] on closeness to Catmull-Clark surfaces, measuring both the geometric difference and the normal angle difference. Figure 4-1 compares the smoothed quad mesh surfaces with densely refined Catmull-Clark subdivision surfaces based on the same mesh. Both the geometric distance, as a percentage of the local quad size, and the normal distance, in degrees of variation, are compared. Especially after displacement, large models rendered by subdivision and by quad smoothing appear visually indistinguishable. The relatively small examples without displacement, shown in Figure 4-1 and in the close-up in Figure 4-5, are also important to support our observation that c-patches do not create shape problems compared to a single bicubic patch: despite the lower degree and internal $C^1$ join, their visual appearance is remarkably similar to that of bicubic patches. The comparison with ACC-patches [30] is shown in Figure 4-2. Figures 4-3 and 4-4 show the smooth surface generated by our algorithm and the surface after applying displacement mapping, respectively.

Figure 4-1. Comparison between the Catmull-Clark (CC) subdivision limit surface and the smoothed quad mesh surface for the same input.
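The two error measures can be written down as follows. This is a minimal sketch that assumes corresponding sample points on both surfaces are already available, for example at matching (u, v), and that the caller supplies a local quad size. How correspondences and quad size were actually defined for Figure 4-1 is not stated here, and the function names are hypothetical.

```python
import math


def geometric_error_percent(p_smooth, p_cc, quad_size):
    """Distance between corresponding points as a percentage of the local quad size."""
    return 100.0 * math.dist(p_smooth, p_cc) / quad_size


def normal_angle_deg(n_smooth, n_cc):
    """Angle in degrees between two normals (they need not be unit length)."""
    cos_angle = sum(a * b for a, b in zip(n_smooth, n_cc)) / (
        math.hypot(*n_smooth) * math.hypot(*n_cc))
    # Clamp so rounding slightly outside [-1, 1] cannot break acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def max_errors(samples, quad_size):
    """samples: (p_smooth, n_smooth, p_cc, n_cc) tuples at matching parameters."""
    samples = list(samples)
    geo = max(geometric_error_percent(p, q, quad_size) for p, _, q, _ in samples)
    ang = max(normal_angle_deg(n, m) for _, n, _, m in samples)
    return geo, ang
```

Normalizing the distance by the quad size makes the geometric error comparable across models of different scale. The angle measure is scale-free by construction.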

Figure 4-2. Comparison of ACC-patches and c-patches in terms of approximating Catmull-Clark subdivision surfaces for the same input.

Figure 4-3. GPU smoothed quad surfaces: orange patches correspond to ordinary quads, blue patches to extraordinary quads.

Figure 4-4. GPU smoothed quad surfaces with displacement mapping.

4.2 Performance

We compiled and executed the implementation on the latest graphics cards (as of 2008) of both major vendors under DirectX 10, and tested the performance on several industry-sized models. Two surface models and models with displacement mapping are shown in Figure 4-3 and Figure 4-4, respectively. Table 4-2 summarizes the performance of the 2-pass algorithm for different granularities of evaluation. The frog model, in particular, is challenging because of its large number of extraordinary patches. The Frog Party shown in Figure 4-11 currently renders at 50 fps for uniform evaluation with N = 9, i.e., on a 9 × 9 grid. That is, the implementation converts 1292 × 9 quads, of which 59% are extraordinary, and renders roughly 1 million polygons 50 times per second. On the same hardware, we measured Bunnell's efficient implementation (the distribution accompanying [9]). With the single frog model, i.e., 1/9th of the work of the Frog Party, it runs at 44 fps with three subdivisions (equivalent to tessellation factor N = 9).

Table 4-1. A total degree 4 patch and a bicubic patch have the same evaluation cost at (u, v) in terms of ALU operations.

Evaluation    c-patch (ALU vector ops)    bicubic patch (ALU vector ops)
position      55                          56
normal        3                           3
other         1                           0
total         59                          59

Table 4-2. Frames per second for some standard test meshes with each patch evaluated on a grid of size N × N; eqs = percentage of extraordinary quads. Sword and Frog are shown in Figure 4-3, Head in Figure 4-1.

Mesh (verts, quads, eqs)    N = 5    N = 9    N = 17    N = 33
Sword (140, 138, 38%)       965      965      965       703
Head (602, 600, 100%)       637      557      376       165
Frog (1308, 1292, 59%)      483      392      226       87
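The Frog Party workload can be checked with a few lines of arithmetic. This sketch assumes that N = 9 means 9 × 9 evaluated points per patch, as the "9 × 9 grid" and the equivalence to three subdivision steps suggest. It further assumes two triangles per grid cell and independent evaluation of each patch, so shared boundary points are counted once per patch. The function names are hypothetical.

```python
def frog_party_workload(quads_per_frog=1292, frogs=9, n=9, fps=50):
    """Per-frame quads, evaluated points and triangles, plus triangles per second."""
    quads = quads_per_frog * frogs
    points = quads * n * n                 # N x N evaluations per patch
    triangles = quads * 2 * (n - 1) ** 2   # two triangles per grid cell
    return quads, points, triangles, triangles * fps


def relative_throughput(frogs=9, fps=50, baseline_frogs=1, baseline_fps=44):
    """Frogs rendered per second, relative to the single-frog baseline run."""
    return (frogs * fps) / (baseline_frogs * baseline_fps)
```

Under these assumptions, one frame of the Frog Party is 11,628 patches, 941,868 evaluated points and 1,488,384 triangles, or about 74.4 million triangles per second at 50 fps. Rendering nine frogs at 50 fps against one frog at 44 fps is a throughput ratio of about 10.2.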

Figure 4-5. Close-up of the frog. The refined mesh is water-tight.

Table 4-3. Performance of the slower 1-pass implementation.

Mesh     N = 2    N = 5    N = 8
Sword    389      96       43
Head     108      34       15
Frog     44       10       4

That is, GPU smoothing of quad meshes is an order of magnitude faster than Bunnell's implementation. Compared to [46], the speedup is even more dramatic. The comparison is not among equals, since both [46] and [9] implement recursive Catmull-Clark subdivision. It is nevertheless fair to observe that the speedup is at least partially due to our avoiding stream-back after amplification (the data explosion due to refinement). We expect that more careful storage of vertex neighborhoods, in retrieval order, will further improve our use of the texture cache and thereby raise the frames per second (fps) count.

4.3 Displacement Mapping

Displacement mapping is a technique for adding geometric detail to the mesh with a height map. It differs from bump mapping or normal mapping in that it changes the geometry: it moves vertices, often along their normal directions, according to the value in the height map. Changing the real geometry, not just the normal as in bump mapping, permits self-occlusion. Figure 4-6 shows displacement mapping on the frog model, which consists of 330k facets. The size of the height map is 1024 × 1024.

Figure 4-6. Displacement mapping on the frog model.

In order to perturb normals after displacement mapping, we need the bump-mapping values $D_u$ and $D_v$. The displaced point is

$$S = P + D\,n \tag{4–1}$$

where $S$ is the displaced position of the point $P$, $D$ is the displacement and $n$ is the unit normal at $P$. The new normal is then the cross product of $S_u$ and $S_v$:

$$S_u = P_u + D_u\,n + D\,n_u \tag{4–2}$$

$$S_v = P_v + D_v\,n + D\,n_v \tag{4–3}$$

Note that $n_u$ and $n_v$ are the derivatives of the normalized normal $n = n'/\|n'\|$, where $n' = P_u \times P_v$:

$$n_u = \frac{n'_u - n\,(n'_u \cdot n)}{\|n'\|} \tag{4–4}$$

where $n'_u = P_{uu} \times P_v + P_u \times P_{uv}$. Analogously, $n_v$ uses $n'_v = P_{uv} \times P_v + P_u \times P_{vv}$.

4.4 Morphing and Animation

We implement morphing using the 2-pass approach. Each frame, the animated sequence of input meshes, in the form of textures, is fed into the Input Assembler of the first pass. The morphed patches are constructed during the first pass; fine details are added in the second pass. The screenshots in Figures 4-9, 4-10 and 4-11 illustrate real-time displacement and animation.
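Equations (4–1) to (4–4) can be turned into a small CPU reference for checking a shader. This is a minimal sketch with plain tuples: the helper names are hypothetical, and the patch derivatives and the height-map values and derivatives are assumed to be supplied by the caller.

```python
import math


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, s):
    return tuple(s * x for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def unit_normal_derivative(n_raw, n_raw_d):
    """Eq. (4-4): derivative of n = n'/||n'|| from n' and its derivative n'_u (or n'_v)."""
    length = math.sqrt(dot(n_raw, n_raw))
    n = scale(n_raw, 1.0 / length)
    return scale(sub(n_raw_d, scale(n, dot(n_raw_d, n))), 1.0 / length)


def displaced_normal(Pu, Pv, Puu, Puv, Pvv, D, Du, Dv):
    """Unit normal of the displaced surface S = P + D n, eqs. (4-1) to (4-4)."""
    n_raw = cross(Pu, Pv)                                   # n' = P_u x P_v
    n = normalize(n_raw)
    n_u = unit_normal_derivative(n_raw, add(cross(Puu, Pv), cross(Pu, Puv)))
    n_v = unit_normal_derivative(n_raw, add(cross(Puv, Pv), cross(Pu, Pvv)))
    S_u = add(add(Pu, scale(n, Du)), scale(n_u, D))         # eq. (4-2)
    S_v = add(add(Pv, scale(n, Dv)), scale(n_v, D))         # eq. (4-3)
    return normalize(cross(S_u, S_v))
```

As a sanity check, displacing the plane P(u, v) = (u, v, 0) by D = u tilts the plane to z = x. `displaced_normal` accordingly returns (−1, 0, 1)/√2.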

Figure 4-7. Comparison of the c-patch scheme with PN-Triangles (also called N-patches), ACC-patches, and Catmull-Clark subdivision.

Figure 4-8. Comparison of the c-patch scheme with PN-Triangles (also called N-patches), ACC-patches, and Catmull-Clark subdivision.

Figure 4-9. Real-time animation on the Sword model.

Figure 4-10. Real-time animation on the Frog model.

Figure 4-11. Asynchronous animation of nine Frogs.

4.5 Conclusion

Smoothing quad meshes on the GPU offers an alternative to transmitting highly refined facet representations to the GPU. It is preferable for interactive graphics and for integration with complex morphing and displacement.

We advocated a 2-pass scheme since, as we argued, the DX10 geometry shader is not well suited to the data amplification needed for evaluation after conversion. The 1-pass scheme outlined in Chapter 3 may become more valuable once a dedicated hardware tessellator is available [29, 48]. Such a tessellator will make amplification more efficient and support adaptive tessellation (which is why we only discussed uniform tessellation in Chapter 3). Such hardware amplification will also benefit the 2-pass approach: the (u, v) domain tessellation fed into the second pass will be replaced by the amplification unit.

CHAPTER 5
PATCH CONVERSIONS FOR MESHES WITH TRI/QUAD/PENT FACETS

Our conversion algorithm can be generalized to meshes that are not restricted to quads. The generalized algorithm [34] provides an elegant solution for meshes with Tri/Quad/Pent facets. Removing restrictions on vertex valences and allowing meshes with triangles, quadrilaterals and pentagons vastly simplifies a designer's task and enriches the design space of meshes for smooth surfaces. Quads naturally model the flow of (parallel) feature lines and are therefore the main facet type in models. Triangular facets allow merging lines, while pentagonal facets allow starting new lines (Figure 5-1). Neither creates T-corners or forces refinement of intermediate models to satisfy connectivity or quad-layout constraints. Essentially, designers can re-use the whole range of polyhedral models they are used to. We generalized the algorithm for converting quad meshes into a method for meshes with Tri/Quad/Pent facets. The generalized scheme converts such a polyhedral model to a surface with everywhere well-defined normals that is $C^2$ in 'regular' mesh regions with quad-grid connectivity. Figure 5-2 shows an example of the resulting surfaces. Note that the facets are limited to triangles, quads and pentagons due to current GPU constraints and to avoid unnecessary notational, technical and shape complexity.

Figure 5-1. (a) Retaining the density of feature lines while varying their number. (b), (c) Axe handle detail using a triangle and a pentagon to transition between detailed and coarser areas.

An irregular facet with k sides is converted into a k-patch. A k-patch is a generalization of a c-patch: it is a piecewise degree-4 $C^1$ spline patch with k cubic boundaries. A k-patch is defined by 6k+1 control points, indicated in Figure 5-3 (b),(c). That is, the k-patch corresponding to a triangular, quadrilateral or pentagonal facet is defined by a total of 19, 25 or 31 points, respectively.

Figure 5-2. The generalized scheme converts a mesh with Tri/Quad/Pent facets to a smooth surface consisting of bi-cubic patches (yellow) and k-patches with k = 3 (green), k = 4 (red) and k = 5 (gray).

Figure 5-3. (a) An ordinary facet is converted to a bi-cubic patch with 16 control points $g_{ij}$. (b),(c) An extraordinary facet with k sides is converted to a k-patch defined by 6k+1 control points. The k-patch can be viewed as k $C^1$-connected degree-4 triangular patches $b^i$, $i = 0,\dots,k-1$, with cubic outer boundaries.

Figure 5-4. The triangular sectors are listed in counter-clockwise order with a modulo-k superscript. (a) 14 control points from three consecutive sectors of a k-patch define (b) a single patch in triangular Bézier form.

For evaluation, we can recover the polynomial representation of the i-th sector in triangular Bézier form of total degree 4 (Figure 5-3 (b) and (c)),

$$S(u,v) := \sum_{i+j+k=4} b_{ijk}\,\frac{4!}{i!\,j!\,k!}\,u^i v^j (1-u-v)^k, \tag{5–1}$$

where the $\binom{4+2}{2} = 15$ BB-coefficients $b_{ijk} \in \mathbb{R}^3$ are indexed as in Figure 5-4. Specifically, we compute the 15 coefficients $b_{ijk}$ (Figure 5-4 (b)) from the 14 coefficients labeled in Figure 5-4 (a) by simple averaging. First, we degree-raise the coefficients $b^i_{3-l,l,0}$, $l = 0,\dots,3$, to $b^i_{4-\ell,\ell,0}$, $\ell = 0,\dots,4$:

$$[b^i_{400}, b^i_{310}, b^i_{220}, b^i_{130}, b^i_{040}] = \left[b^i_{300},\ \frac{b^i_{300} + 3b^i_{210}}{4},\ \frac{b^i_{210} + b^i_{120}}{2},\ \frac{3b^i_{120} + b^i_{030}}{4},\ b^i_{030}\right].$$

Second, we compute the shared coefficients on the sector boundaries, $b^i_{3-\alpha,0,1+\alpha} = b^{i-1}_{0,3-\alpha,1+\alpha}$, $\alpha = 0,1,2,3$, i.e., indices 301, 202, 103 and 004 in Figure 5-4 (b), from the $C^1$ constraints. See [34] for a thorough explanation of the algorithm, its GPU implementation, smoothness verification, etc.
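The degree-raising step and the evaluation of (5–1) can be sketched as follows. This is a minimal sketch with hypothetical helper names, and it does not compute the $C^1$ boundary coefficients, whose rules are given in [34]. Points are 3-tuples, and the sector coefficients are assumed to be given as a dictionary keyed by (i, j, k).

```python
from math import factorial


def _lerp(p, q, wp, wq):
    """Weighted combination wp*p + wq*q of two points."""
    return tuple(wp * a + wq * b for a, b in zip(p, q))


def degree_raise_cubic(b):
    """Cubic boundary [b300, b210, b120, b030] -> quartic [b400, b310, b220, b130, b040]."""
    b0, b1, b2, b3 = b
    return [tuple(b0),
            _lerp(b0, b1, 0.25, 0.75),   # (b300 + 3 b210) / 4
            _lerp(b1, b2, 0.5, 0.5),     # (b210 + b120) / 2
            _lerp(b2, b3, 0.75, 0.25),   # (3 b120 + b030) / 4
            tuple(b3)]


def eval_quartic_triangle(coeffs, u, v):
    """Eq. (5-1): coeffs maps (i, j, k) with i + j + k == 4 to a point in R^3."""
    w = 1.0 - u - v
    out = [0.0, 0.0, 0.0]
    for (i, j, k), p in coeffs.items():
        weight = (factorial(4) / (factorial(i) * factorial(j) * factorial(k))
                  * u ** i * v ** j * w ** k)
        for d in range(3):
            out[d] += weight * p[d]
    return tuple(out)
```

Degree raising does not change the curve: the quartic control polygon describes exactly the same boundary cubic. Because the quartic Bernstein basis reproduces linear functions, choosing $b_{ijk} = (i/4, j/4, 0)$ makes `eval_quartic_triangle` return (u, v, 0).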

CHAPTER 6
DISCUSSION AND FUTURE WORK

6.1 Future GPU API

Our conversion scheme not only fits well with the current graphics hardware pipeline, but also matches the architecture of future graphics hardware very well [29, 48]. The workload currently in the geometry shader will be assigned to the patch shader. The ideal GPU pipeline needs to exploit more parallelism in the geometry shader, where the 24 coefficients of a c-patch can be computed independently given the vertex neighborhood. With maximal parallelism, constructing a whole patch costs roughly as much as deriving a single coefficient. Currently we precompute the tessellated domain and store these static values in a set of textures. In the future, this part of the computation will be replaced by the tessellation unit. Animation using our conversion scheme will then be achieved in a single pass, without geometry transmission between passes.

6.2 Volume Preservation

Preserving volume under constraints can produce realistic deformable-object animation. The well-known divergence theorem can be used to reduce a volume integral to an integral over the surface. Given a closed object, the volume is matched to a prescribed value by inflating or deflating the deformable object uniformly. To enhance realism, this method can be further extended to fix parts of the object and to attach different material properties to surface pieces. This exact, localized volume-preservation method works for all surfaces that consist of Bézier patches. Therefore, we will combine this method with our new surface conversion algorithm to achieve real-time volume preservation.

6.3 Adaptive Tessellation

Adaptive tessellation samples each surface patch more densely in regions of high curvature and less densely in regions of low curvature. Moreover, it adjusts the level of detail according to how close the geometry is to the camera. The surface is only refined where and when it is necessary. Therefore, adaptively tessellated surfaces will greatly improve performance. The tessellation factor can be generated by using the flatness test [9]. With a tessellation unit in the GPU, tessellating the domain becomes almost free.
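As an illustration of how a per-edge tessellation factor could combine curvature and camera distance, here is a minimal sketch. It is not Bunnell's flatness test from [9]. Instead, it substitutes a standard bound: a cubic Bézier curve has $|B''(t)| \le M = 6\max_i |b_i - 2b_{i+1} + b_{i+2}|$, and n uniform linear segments deviate by at most $M/(8n^2)$. The tolerance `tol`, which scales with eye distance, and the name `edge_tess_factor` are hypothetical.

```python
import math


def cubic_second_difference_bound(ctrl):
    """Upper bound M on |B''(t)| for a cubic Bezier: 6 * largest second difference."""
    def norm(a):
        return math.sqrt(sum(x * x for x in a))
    d0 = norm([a - 2 * b + c for a, b, c in zip(ctrl[0], ctrl[1], ctrl[2])])
    d1 = norm([a - 2 * b + c for a, b, c in zip(ctrl[1], ctrl[2], ctrl[3])])
    return 6.0 * max(d0, d1)


def edge_tess_factor(ctrl, eye, tol=0.01, max_factor=64):
    """Hypothetical per-edge factor: the fewest segments n with M / (8 n^2)
    below tol * (distance from the eye to the curve midpoint)."""
    # B(1/2) = (b0 + 3 b1 + 3 b2 + b3) / 8, computed per coordinate.
    mid = tuple((a + 3 * b + 3 * c + d) / 8.0 for a, b, c, d in zip(*ctrl))
    dist = max(math.dist(mid, eye), 1e-6)  # avoid division by zero
    m = cubic_second_difference_bound(ctrl)
    if m == 0.0:
        return 1                           # flat, uniformly parameterized edge
    n = math.ceil(math.sqrt(m / (8.0 * tol * dist)))
    return max(1, min(max_factor, n))
```

For the arc with control points (0,0,0), (1,1,0), (2,1,0), (3,0,0), this yields 3 segments when viewed from a distance of 10 and a single segment from a distance of 1000. Computing the factor per shared edge, rather than per patch, would keep neighboring patches crack-free, because both sides see the same edge data.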

REFERENCES

[1] Microsoft DirectX 10 SDK. 2008. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6-263A-4424-A7FE-69CFF1A5B180&displaylang=en.
[2] C. Bajaj, J. Chen, and G. Xu. Free form surface design with A-patches. In Proceedings of Graphics Interface 94, pages 174–181, Banff, Alberta, Canada, 1994.
[3] S. Bischoff, L. P. Kobbelt, and H. Seidel. Towards hardware implementation of Loop subdivision. In HWWS '00: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Workshop on Graphics Hardware, pages 41–50, New York, NY, USA, 2000. ACM Press.
[4] D. Blythe. The Direct3D 10 System. In Proceedings of ACM SIGGRAPH 2006, pages 724–734, 2006. http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/Direct3D10 web.pdf.
[5] M. Bóo, M. Amor, M. Doggett, J. Hirche, and W. Strasser. Hardware support for adaptive subdivision surface rendering, 2001. citeseer.ist.psu.edu/article/boo01hardware.html.
[6] J. Bolz and P. Schröder. Rapid evaluation of Catmull-Clark subdivision surfaces. In Web3D '02: Proceedings of the Seventh International Conference on 3D Web Technology, pages 11–17, New York, NY, USA, 2002. ACM Press.
[7] J. Bolz and P. Schröder. Evaluation of subdivision surfaces on programmable graphics hardware. 2007. http://www.multires.caltech.edu/pubs/GPUSubD.pdf.
[8] T. Boubekeur and C. Schlick. Generic mesh refinement on GPU. In HWWS '05: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware, pages 99–104, New York, NY, USA, 2005. ACM.
[9] M. Bunnell. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, chapter 7. Adaptive Tessellation of Subdivision Surfaces with Displacement Mapping. Addison-Wesley, Reading, MA, 2005.
[10] E. Catmull and J. Clark. Recursively generated B-spline surfaces on arbitrary topological meshes. Computer Aided Design, 10:350–355, 1978.
[11] R. L. Cook. Shade trees. ACM, New York, NY, USA, 1998.
[12] D. Doo and M. Sabin. Behaviour of recursive division surfaces near extraordinary points. Computer Aided Design, 10:356–360, 1978.
[13] T. DeRose, M. Kass, and T. Truong. Subdivision surfaces in character animation. In SIGGRAPH '98: Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, pages 85–94, New York, NY, USA, 1998. ACM Press.
[14] G. Farin. Curves and Surfaces for Computer Aided Geometric Design: A Practical Guide. Academic Press Professional, Inc., San Diego, CA, USA, 1988.
[15] C. Gonzalez and J. Peters. Localized hierarchy surface splines. In S. S. J. Rossignac, editor, ACM Symposium on Interactive 3D Graphics, pages 7–15, 1999.
[16] M. Guthe, A. Balazs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-spline surfaces. ACM Transactions on Graphics, 24(3):1016–1023, 2005.
[17] M. Guthe, A. Balazs, and R. Klein. GPU-based trimming and tessellation of NURBS and T-spline surfaces. ACM Trans. Graph., 24(3):1016–1023, 2005.
[18] M. Halstead, M. Kass, and T. DeRose. Efficient, fair interpolation using Catmull-Clark surfaces. Proceedings of SIGGRAPH 93, pages 35–44, Aug. 1993.
[19] H. Hoppe, T. DeRose, T. Duchamp, M. Halstead, H. Jin, J. McDonald, J. Schweitzer, and W. Stuetzle. Piecewise smooth surface reconstruction. Computer Graphics, 28(Annual Conference Series):295–302, 1994.
[20] D. L. James and C. D. Twigg. Skinning mesh animations. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 399–407, New York, NY, USA, 2005. ACM.
[21] K. Karciauskas and J. Peters. Guided subdivision, 2005. http://www.cise.ufl.edu/research/SurfLab/papers.shtml.
[22] O. A. Karpenko and J. F. Hughes. SmoothSketch: 3D free-form shapes from complex sketches. ACM Transactions on Graphics, 25(3):589–598, 2006.
[23] L. Kavan, C. O'Sullivan, and J. Zara. Efficient collision detection for spherical blend skinning. In Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia, Kuala Lumpur, Malaysia, pages 147–156, 2006.
[24] L. Kavan and J. Zara. Spherical blend skinning: a real-time deformation of articulated models. In I3D '05: Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games, pages 9–16, New York, NY, USA, 2005. ACM.
[25] A. Krishnamurthy, R. Khardekar, and S. McMains. Direct evaluation of NURBS curves and surfaces on the GPU. In SPM '07: Proceedings of the 2007 ACM Symposium on Solid and Physical Modeling, pages 329–334, New York, NY, USA, 2007. ACM.
[26] S. Lai and F. F. Cheng. Adaptive rendering of Catmull-Clark subdivision surfaces. In CADCG '05: Proceedings of the Ninth International Conference on Computer Aided Design and Computer Graphics, pages 125–132, Washington, DC, USA, 2005. IEEE Computer Society.
[27] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, pages 85–94. ACM Press/ACM SIGGRAPH/Addison Wesley Longman, 2000. citeseer.ist.psu.edu/lee00displaced.html.
[28] A. Lee, H. Moreton, and H. Hoppe. Displaced subdivision surfaces. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 85–94. ACM Press/ACM SIGGRAPH/Addison Wesley Longman, 2000.
[29] M. Lee. Next-generation graphics programming on Xbox 360, 2006. http://download.microsoft.com/download/d/3/0/d30d58cd-87a2-41d5-bb53-baf560aa2373/Next Generation Graphics Programming on Xbox 360.ppt.
[30] C. Loop and S. Schaefer. Approximating Catmull-Clark subdivision surfaces with bicubic patches. Technical report, Microsoft Research, MSR-TR-2007-44, 2007.
[31] C. T. Loop. Smooth subdivision surfaces based on triangles, 1987. Master's Thesis, Department of Mathematics, University of Utah.
[32] A. Mohr, L. Tokheim, and M. Gleicher. Direct manipulation of interactive character skins. In I3D '03: Proceedings of the 2003 Symposium on Interactive 3D Graphics, pages 27–30, New York, NY, USA, 2003. ACM.
[33] K. Muller and S. Havemann. Subdivision surface tesselation on the fly using a versatile mesh data structure, 2000. citeseer.ist.psu.edu/muller00subdivision.html.
[34] A. Myles, T. Ni, and J. Peters. GPU-friendly smooth surfaces from meshes with tri/quad/pent facets. In Symposium on Geometry Processing, July 2-4, 2008, Copenhagen, Denmark, pages 1–8. Blackwell, 2008.
[35] A. Myles, Y. Yeo, and J. Peters. GPU conversion of quad meshes to smooth surfaces. In D. Manocha, B. Levy, and H. Suzuki, editors, ACM Solid and Physical Modeling Symposium, June 2-4, 2008, Stony Brook University, Stony Brook, New York, USA, pages 321–326. ACM Press, 2008.
[36] A. Nealen, T. Igarashi, O. Sorkine, and M. Alexa. FiberMesh: designing freeform surfaces with 3D curves. ACM Trans. Graph., 26(3), 2007.
[37] T. Ni, Y. Yeo, A. Myles, V. Goel, and J. Peters. GPU smoothing of quad meshes. In M. Spagnuolo, D. Cohen-Or, and X. Gu, editors, IEEE International Conference on Shape Modeling and Applications, June 4-6, 2008, Stony Brook University, Stony Brook, New York, USA, pages 3–10. ACM Press, 2008.
[38] J. Peters. Patching Catmull-Clark meshes. In K. Akeley, editor, Siggraph 2000, Computer Graphics Proceedings, Annual Conference Series, pages 255–258. ACM Press/ACM SIGGRAPH/Addison Wesley Longman, 2000.
[39] J. Peters. Geometric continuity. In Handbook of Computer Aided Geometric Design, pages 193–229. Elsevier, 2002.
[40] J. Peters and A. Nasri. Computing volumes of solids enclosed by recursive subdivision surfaces. Computer Graphics Forum, 16(3):C89–C94, 1997.
[41] J. Peters and U. Reif. Analysis of generalized B-spline subdivision algorithms. SIAM Journal on Numerical Analysis, 35(2):728–748, Apr. 1998.
[42] H. Prautzsch. Freeform splines. Computer Aided Geometric Design, 14(3):201–206, 1997.
[43] H. Prautzsch, W. Boehm, and M. Paluszny. Bézier and B-Spline Techniques. Springer Verlag, 2002.
[44] K. Pulli and M. Segal. Fast rendering of subdivision surfaces. In SIGGRAPH '96: ACM SIGGRAPH 96 Visual Proceedings: The Art and Interdisciplinary Programs of SIGGRAPH '96, page 144, New York, NY, USA, 1996. ACM.
[45] S. Schaefer and J. Warren. Exact evaluation of non-polynomial subdivision schemes at rational parameter values. In PG '07: Proceedings of the 15th Pacific Conference on Computer Graphics and Applications, pages 321–330, Washington, DC, USA, 2007. IEEE Computer Society.
[46] L.-J. Shiue, I. Jones, and J. Peters. A realtime GPU subdivision kernel. In M. Gross, editor, Siggraph 2005, Computer Graphics Proceedings, Annual Conference Series, pages 1010–1015. ACM Press/ACM SIGGRAPH/Addison Wesley Longman, 2005.
[47] J. Stam. Exact evaluation of Catmull-Clark subdivision surfaces at arbitrary parameter values. In SIGGRAPH, pages 395–404, 1998.
[48] A. Tatarinov. Instanced tessellation in DirectX 10, 2008. http://www.microsoft.com/downloads/details.aspx?FamilyId=572BE8A6-263A-4424-A7FE-69CFF1A5B180&displaylang=en.
[49] A. Vlachos, J. Peters, C. Boyd, and J. L. Mitchell. Curved PN triangles. In 2001, Symposium on Interactive 3D Graphics, Bi-Annual Conference Series, pages 159–166. ACM Press, 2001.
[50] D. Zorin. Subdivision for modeling and animation. ACM SIGGRAPH Course Notes, 2000.

BIOGRAPHICAL SKETCH

Tianyun Ni was born in Nanjing, China. She was awarded her BS in computer science with a mathematics minor from Texas State University in 2000 and her ME in computer engineering from the University of Florida in 2002. She earned her doctoral degree in the field of computer graphics in 2008.