Smooth Surfaces from 4-sided Facets

T. L. Ni, Y. Yeo, A. Myles, V. Goel and J. Peters

Abstract

We present a fast algorithm for converting quad meshes on the GPU

to smooth surfaces. Meshes with 12,000 input quads, of which 60%

have one or more non-4-valent vertices, are converted, evaluated

and rendered with 9 x 9 resolution per quad at 50 frames per sec-

ond. The conversion reproduces bi-cubic splines wherever possible

and closely mimics the shape of the Catmull-Clark subdivision sur-

face by c-patches where a vertex has a valence different from 4.

The smooth surface is piecewise polynomial and has well-defined

normals everywhere. The evaluation avoids pixel dropout.

1 Introduction and Contribution

Due to the popularity of Catmull-Clark subdivision [Catmull and

Clark 1978], quad-meshes are common in modeling for animation.

Quad meshes are meshes consisting of quadrilateral facets without

restriction on the valence of the vertices. Any polyhedral mesh can

be converted into a quad mesh by one step of Catmull-Clark subdi-

vision, but a good designer creates meshes with the quad-restriction

in mind so that no global refinement is necessary.

For real-time applications such as gaming, interactive animation

and morphing, it is convenient to offload smoothing and render-

ing to the GPU. In particular, when morphing is implemented on

the GPU, it is inefficient to send large data streams on a round trip

to the CPU and back. Smooth surfaces are needed, for example, as

the base for displacement mapping in the surface normal direction

[Lee et al. 2000] (Fig 3).

For GPU smoothing, we distinguish two types of quads: ordinary

and extraordinary. A quad is ordinary if all four vertices have 4

neighbors. Such a facet will be converted into a degree 3 by 3 patch

in tensor-product Bézier form by the standard B-spline to Bézier

conversion rules [Farin 1990]. Therefore, any two adjacent patches

derived from ordinary quads will join C2. The interesting aspect is

the conversion of the extraordinary quads, i.e. quads having at least

one and possibly up to four vertices of valence $n \neq 4$. We present a

new algorithm for converting both types of quads on the fly so that

1. every ordinary quad is converted into a bicubic patch in

tensor-product Bézier form (Figure 1, b);

2. every extraordinary quad is converted into a composite patch

(short c-patch) with cubic boundary and defined by 24 coef-

ficients (Figure 1, c);

3. the surface is by default smooth everywhere (Lemma 1);

4. the shape follows that of Catmull-Clark subdivision;

5. conversion and evaluation can be mapped to the GPU to ren-

der at very high frame rates (at least an order of magnitude

faster than for example [Bunnell 2005; Shiue et al. 2005] on

current hardware).

1.1 Some Alternative Mesh Smoothing Techniques on

the GPU

A number of techniques exist to smooth out quad meshes. Catmull-

Clark subdivision [Catmull and Clark 1978] is an accepted stan-

dard, but does not easily port to the GPU. Evaluation using Stam's

approach [Stam 1998] is too complex for large meshes on the GPU.


Figure 1: (a) A quad neighborhood defining a surface piece. (b) A bicubic

patch with 4 x 4 control points. This patch is the output if the quad is

ordinary, and used to determine the shape of a c-patch (c) if the quad is

extraordinary. A c-patch is defined by 4 x 6 control points displayed as

* and can alternatively, for analysis, be represented as four C1-connected

triangular pieces of degree 4 with degree 3 outer boundaries identical to the

bicubic patch boundaries.

Figure 2: GPU smoothed quad surfaces: orange patches correspond to

ordinary quads, blue patches to extraordinary quads.

Figure 3: GPU smoothed quad surfaces with displacement mapping.

[Bunnell 2005; Shiue et al. 2005; Bolz and Schröder 2002] require

separated quad meshes, i.e. quad meshes such that each quad has at

most one point with valence $n \neq 4$. To turn quad meshes into separated
quad meshes usually means applying at least one Catmull-

Clark subdivision step on the CPU and four-fold data transfer to

the GPU. In more detail, [Shiue et al. 2005] implemented recursive

Catmull-Clark subdivision using several passes via the pixel shader,

using textures for storage and spiral-enumerated mesh fragments.

[Bolz and Schröder 2002] tabulate the subdivision functions up to

a given density and linearly combine them in the GPU. [Bunnell

2005] provides code for adaptive refinement. Even though this code

was optimized for the previous generation GPUs that provided con-

nectivity by textures read in the pixel shader, this implementation

adaptively renders the Frog (Figure 2) in real-time. (See Section 5

for a comparison). The main difference between our and Bunnell's

implementation is that we decouple mesh conversion from surface

evaluation and therefore do not have the primitive explosion before

the second rendering pass. Moreover, we place conversion early in

the pipeline so that the pixel shader is freed for additional tasks.

Two alternative smoothing strategies mimic Catmull-Clark subdivi-

sion by generating a finite number of bicubic patches. [Peters 2000]

generates NURBS output, that could be rendered, for example by

the GPU algorithm of [Guthe et al. 2005]. But this has not been

implemented to our knowledge. The method of [Loop and Schae-

fer 2007] generates one bicubic patch per quad following the shape

of Catmull-Clark surfaces. Since these bicubic patches typically do

not join smoothly, they compute two additional patches whose cross

product approximates the normal of the bicubic patch. As pointed

out in [Vlachos et al. 2001], this trompe l'oeil represents a sim-

ple solution when true smoothness is not needed. Comparing the

number of operations in construction and evaluation, the method of

[Loop and Schaefer 2007] should run at comparable speeds to our

GPU quad mesh smoothing (see also Section 6).

2 The Conversion Algorithm

Here we give the algorithm. Analysis and implementation follow in

the next sections. Essentially, the algorithm consists of computing

new points near a vertex using Table 1 and, for each extraordinary

quad, additional points according to Table 2. In Section 3, we will

verify that these new points define a smooth surface and in Section

4, we show how the two stages are mapped to the vertex shader and

geometry shader, respectively.

Figure 4: Smoothing the vertex neighborhood according to Table 1. The
center point $p_*$, its direct neighbors $p_{2j}$ and diagonal neighbors $p_{2j+1}$ form
a vertex neighborhood.

In the first part, we focus on a vertex neighborhood. A vertex

neighborhood consists of a mesh point $p_*$ and mesh points $p_k$,

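$$f_j := \big(4 p_* + 2 p_{2j} + 2 p_{2j+2} + p_{2j+1}\big)/9$$
$$e_j := (f_j + f_{j-1})/2$$
$$v := \frac{n-4}{n+5}\, p_* + \frac{9}{n(n+5)} \sum_{j=0}^{n-1} e_j$$
$$t_j := v + [\text{scale factor in } n \text{ and } \sigma_n, \text{ equal to } 2/4 \text{ for } n = 4] \sum_{l=0}^{n-1} \cos\frac{2\pi (j-l)}{n}\, e_l, \qquad j = 0, 1.$$

Table 1: Computing control points $v$, $e_j$, $f_j$ and $t_j$, the projection
of $e_j$, at a vertex of valence $n$ from the mesh points $p_k$ of a vertex
neighborhood; the subscripts are modulo $2n$. By default,
$\sigma_n := \big(c_n + 5 + \sqrt{(c_n + 9)(c_n + 1)}\big)/16$, the subdominant eigenvalue of
Catmull-Clark subdivision, with $c_n := \cos(2\pi/n)$.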

$k = 0, \ldots, 2n-1$ of all quads surrounding $p_*$ (Figure 4). A vertex
$v$ computed according to Table 1 is the limit point of Catmull-Clark
subdivision as explained, for example, in [Halstead et al. 1993].
For $n = 4$, this choice is the limit of bicubic subdivision, i.e. B-spline
evaluation. The rules for $e_j$ and $f_j$ are the standard rules for
converting a uniform bicubic tensor-product B-spline to its Bézier
representation of degree 3 by 3 [Farin 1990]. The points $t_j$ are a
projection of $e_j$ into a common tangent plane (see e.g. [Gonzalez
and Peters 1999]). The default scale factor $\sigma_n$ is the subdominant
eigenvalue of Catmull-Clark subdivision. We note that for $n = 4$,
$e_{j+2} = 2v - e_j$ and $\sigma_4 = 1/2$ so that the projection leaves the
tangent control points invariant as $t_j = e_j$:

$$\text{for } n = 4, \quad t_j = v + \frac{2}{4}\,(e_j - e_{j+2}) = v + (e_j - v) = e_j. \qquad (1)$$
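The vertex stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are plain 3-tuples, the function names `sigma`, `combine` and `vertex_points` are hypothetical (not from the paper or its shader code), and $v$ is computed directly from the standard Catmull-Clark limit mask $(n^2 p_* + 4\sum p_{2j} + \sum p_{2j+1})/(n(n+5))$. The tangent points $t_j$ are deliberately left out.

```python
import math

def sigma(n):
    """Subdominant eigenvalue of Catmull-Clark subdivision for valence n."""
    c = math.cos(2.0 * math.pi / n)
    return (c + 5.0 + math.sqrt((c + 9.0) * (c + 1.0))) / 16.0

def combine(weighted_points):
    """Weighted sum of 3D points given as (weight, point) pairs."""
    return tuple(sum(w * p[k] for w, p in weighted_points) for k in range(3))

def vertex_points(p_star, ring):
    """Return (v, e, f) for one vertex neighborhood.

    ring holds the 2n points p_0..p_{2n-1}: even indices are direct
    neighbors, odd indices are diagonal neighbors (indices modulo 2n).
    """
    m = len(ring)            # m = 2n
    n = m // 2
    # Face points f_j of the B-spline to Bezier conversion.
    f = [combine([(4 / 9, p_star), (2 / 9, ring[2 * j]),
                  (2 / 9, ring[(2 * j + 2) % m]), (1 / 9, ring[2 * j + 1])])
         for j in range(n)]
    # Edge points e_j = (f_j + f_{j-1}) / 2; f[-1] wraps around to f[n-1].
    e = [combine([(0.5, f[j]), (0.5, f[j - 1])]) for j in range(n)]
    # Limit point via the standard Catmull-Clark limit mask (assumption
    # stated in the lead-in).
    w = 1.0 / (n * (n + 5))
    v = combine([(n * n * w, p_star)]
                + [(4 * w, ring[2 * j]) for j in range(n)]
                + [(w, ring[2 * j + 1]) for j in range(n)])
    return v, e, f
```

For valence 4 the returned `v` coincides with the average of the four `e[j]`, which is the corner Bézier point of the bicubic B-spline patch, and `sigma(4)` evaluates to 1/2 up to rounding.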

In the second stage, we focus on the quads. Combining informa-

tion from four vertex neighborhoods as shown in Figure 5, we can

populate a tensor-product patch g of degree 3 by 3 in Bézier form

[Farin 1990]:

$$g(u, v) := \sum_{k=0}^{3} \sum_{\ell=0}^{3} g_{k\ell} \binom{3}{k} (1-u)^{3-k} u^k \binom{3}{\ell} (1-v)^{3-\ell} v^\ell.$$
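As a small illustration of the tensor-product Bézier form above, the following Python sketch evaluates a bicubic patch directly from its Bernstein weights. The names `bernstein3` and `eval_bicubic` and the nested-list layout `g[k][l]` are assumptions made for this example; the paper's GPU evaluation uses de Casteljau's algorithm in a shader instead.

```python
from math import comb

def bernstein3(k, t):
    """Cubic Bernstein polynomial B_k^3(t)."""
    return comb(3, k) * (1.0 - t) ** (3 - k) * t ** k

def eval_bicubic(g, u, v):
    """Evaluate a bicubic Bezier patch at (u, v).

    g[k][l] is the 3D control point g_{kl}, with k, l = 0..3.
    """
    point = [0.0, 0.0, 0.0]
    for k in range(4):
        for l in range(4):
            w = bernstein3(k, u) * bernstein3(l, v)
            for axis in range(3):
                point[axis] += w * g[k][l][axis]
    return tuple(point)
```

Evaluating at the four parameter corners returns the corner control points, e.g. `eval_bicubic(g, 0.0, 0.0)` equals `g[0][0]`.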

The patch is defined by its 16 control points $g_{k\ell}$. If the quad is
ordinary, the formulas of Table 1 make this patch the Bézier representation
of a bicubic spline in B-spline form. For example, in the
notation of Figure 7, $(g_{k0})_{k=0,\ldots,3} = (v^0, t^0_0, t^1_1, v^1)$. If the quad is

Figure 5: Patch construction. On the left, four vertex neighborhoods with
vertices $v^i$ each contribute one sector to assemble the 4 x 4 coefficients
of the Bézier patch g, for example $g_{00} = v^0$, $g_{10} = e^0_0$, $g_{11} = f^0$,
$g_{30} = v^1$, $g_{31} = e^1_0$ (we use superscripts to indicate vertices; see also
Figure 7). On the right, the same four sectors are used to determine a c-patch
if the underlying quad is extraordinary. The indices of the control points of
g and $b^i$ are shown. Note that only a subset of the coefficients of the four
triangular pieces $b^i$ is actually computed to define the c-patch. The full set
of coefficients displayed here is only used to analyze the construction.

extraordinary, we use the bicubic patch to outline the shape as we

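$$b^i_{211} := b^i_{310} + \frac{1 + c^i}{4}\,(t^{i+1}_1 - t^i_0) + \frac{1 - c^{i+1}}{8}\,(t^i_0 - v^i) + \frac{3}{4(s^i + s^{i+1})}\,(f^i - e^i_0)$$
$$b^i_{121} := b^i_{130} + \frac{1 + c^{i+1}}{4}\,(t^i_0 - t^{i+1}_1) + \frac{1 - c^i}{8}\,(t^{i+1}_1 - v^{i+1}) + \frac{3}{4(s^i + s^{i+1})}\,(f^{i+1} - e^{i+1}_1)$$
$$b^i_{112} := g_* + [\text{combination of the } b^j_{211} \text{ and } b^j_{121},\ j = 0, \ldots, 3]/16$$

Table 2: Formulas for the 4 x 3 interior control points that, together with the
vertex control points $v^i$ and the tangent control points $t^i_j$, define a c-patch.
See also Figures 7 and 8. Here $c^i := \cos(2\pi/n_i)$, $s^i := \sin(2\pi/n_i)$ and superscripts
are modulo 4. By default, $g_* := \sum_{i=0}^{3} \big(v^i + 3(e^i_0 + e^i_1) + 9 f^i\big)/64$, the
central point of the ordinary patch.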

replace it by a c-patch (Figure 1, c). A c-patch has the right degrees

of freedom to cheaply and locally construct a smooth surface. We

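introduce the c-patch in terms of a well-known Bézier form of a
polynomial piece $b^i$ of total degree 4 [Farin 1990]:

$$b^i(u_1, u_2) := \sum_{\substack{k+\ell+m=4 \\ k,\ell,m \ge 0}} b^i_{k\ell m}\, \frac{4!}{k!\,\ell!\,m!}\, u_1^k u_2^\ell (1 - u_1 - u_2)^m. \qquad (2)$$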
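For reference, the total-degree-4 form (2) can be evaluated as in the following Python sketch. It assumes the standard Bernstein-Bézier weights $4!/(k!\,\ell!\,m!)$ of [Farin 1990] and stores the 15 coefficients in a dictionary keyed by the index triple; the function name and data layout are hypothetical and chosen only for this example.

```python
from math import factorial

def eval_quartic_triangle(b, u1, u2):
    """Evaluate a total-degree-4 Bezier triangle at (u1, u2).

    b maps index triples (k, l, m) with k + l + m = 4 to 3D control points.
    """
    u3 = 1.0 - u1 - u2
    point = [0.0, 0.0, 0.0]
    for (k, l, m), coeff in b.items():
        # Trinomial Bernstein weight of the coefficient b_{klm}.
        w = (factorial(4) // (factorial(k) * factorial(l) * factorial(m))
             * u1 ** k * u2 ** l * u3 ** m)
        for axis in range(3):
            point[axis] += w * coeff[axis]
    return tuple(point)
```

With the convention of (2), the parameter pair (1, 0) reproduces $b_{400}$, (0, 1) reproduces $b_{040}$ and (0, 0) reproduces the central coefficient $b_{004}$.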

The c-patch is equivalent to the union of four $b^i$, $i = 0, 1, 2, 3$, of
total degree 4, but defined by only 4 x 6 c-coefficients constructed
in Tables 1 and 2:

$$v^i,\ t^i_0,\ t^i_1,\ b^i_{211},\ b^i_{121},\ b^i_{112}, \qquad i = 0, 1, 2, 3.$$

These c-coefficients imply the missing interior control points by
C1 continuity between the triangular pieces: for $j = 0, 1, 2, 3$ and
$i = 0, 1, 2, 3$,

$$b^i_{3-j,0,1+j} = b^{i-1}_{0,3-j,1+j} := \big(b^i_{3-j,1,j} + b^{i-1}_{1,3-j,j}\big)/2; \qquad (3)$$

and the boundary control points $b^i_{k\ell 0}$ are implied by degree-raising
[Farin 1990]:

$$b^i_{400} := v^i, \quad b^i_{310} := (v^i + 3t^i_0)/4, \quad b^i_{220} := (t^i_0 + t^{i+1}_1)/2,$$
$$b^i_{130} := (v^{i+1} + 3t^{i+1}_1)/4, \quad b^i_{040} := v^{i+1}. \qquad (4)$$
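The degree-raising step behind (4) can be checked numerically. The sketch below, with hypothetical helper names, raises a cubic boundary curve with control points (v0, t0, t1, v1) to degree 4 using the weights of (4) and evaluates both versions by de Casteljau's algorithm; the two curves should agree for every parameter value.

```python
def weighted_sum(pairs):
    """Weighted sum of 3D points given as (weight, point) pairs."""
    return tuple(sum(w * p[a] for w, p in pairs) for a in range(3))

def raise_cubic_to_quartic(v0, t0, t1, v1):
    """Return the five quartic control points of the cubic (v0, t0, t1, v1)."""
    return [v0,
            weighted_sum([(0.25, v0), (0.75, t0)]),
            weighted_sum([(0.5, t0), (0.5, t1)]),
            weighted_sum([(0.75, t1), (0.25, v1)]),
            v1]

def bezier_curve(ctrl, t):
    """de Casteljau evaluation of a Bezier curve of any degree."""
    pts = [tuple(p) for p in ctrl]
    while len(pts) > 1:
        pts = [weighted_sum([(1.0 - t, a), (t, b)])
               for a, b in zip(pts, pts[1:])]
    return pts[0]
```

Because the raised curve is the same polynomial, a bicubic patch and a c-patch that share the cubic (v0, t0, t1, v1) share the same boundary curve regardless of which representation is evaluated.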

In particular, a tensor-product patch g and a c-patch have
identical boundary curves of degree 3 where they meet.
Basis functions corresponding to the 24 c-coefficients of the
c-patch can be read off by setting one c-coefficient to one and all
others to zero and then applying (3) and (4).

Figure 6: Dark lines cover the control points involved in the C2
constraints (5). The points on dashed lines are implied by averaging.

To derive the formulas for $b^i_{211}$ and its symmetric counterpart
$b^i_{121}$ note that the formulas must guarantee a smooth transition
between $b^i$ and its neighbor patch on an adjacent quad, regardless
of whether the adjacent quad is ordinary or extraordinary. That is,
the formulas are derived to satisfy simultaneously two types of
smoothness constraints (see Section 3). By contrast, $b^i_{112}$ is not
pinned down by continuity constraints. We could choose each $b^i_{112}$
arbitrarily without changing the formal smoothness of the resulting
surface. However, we opt for increased smoothness at the center of
the c-patch and additionally use the freedom to closely mimic the
shape of Catmull-Clark subdivision surfaces, as we did earlier for
vertices. First, we approximately satisfy four C2 constraints across
the diagonal boundaries at the central point $b^i_{004}$ by enforcing

[4 x 4 linear system relating the unknowns $b^i_{112}$, $i = 0, \ldots, 3$, to the
coefficients $b^i_{211}$, $b^i_{121}$ and the perturbation $q$; see Figure 6]   (5)

where $q$ is defined in terms of the differences $b^i_{211} - b^i_{121}$. The
perturbation by $q$ is necessary, since the coefficient matrix of the C2
constraints is rank deficient. After perturbation, the system can be
solved with the last equation implied by the first three. We add the
constraint that the average of the $b^i_{112}$ matches $g_* := g(\tfrac{1}{2}, \tfrac{1}{2})$, the
center position of the bicubic patch. Now, we can solve for the $b^i_{112}$,
$i = 0, 1, 2, 3$ and obtain the formula of Table 2.

3 Smoothness Verification

In this section we formally verify the following lemma. For the

purpose of the proof, we view the c-patch in its equivalent
representation as four Bézier patches of total degree 4.

Lemma 1 Two adjacent polynomial pieces a and b defined by the
rules of Section 2 (Table 1, Table 2, (3), (4)) meet at least

(i) $C^2$ if a and b correspond to two ordinary quads;

(ii) $C^1$ if a and b are adjacent pieces of a c-patch;

(iii) $C^1$ if a and b correspond to two quads, exactly one of which is
ordinary;

(iv) with tangent continuity if a and b correspond to two different
extraordinary quads.

Proof (i) If a and b are bicubic patches corresponding to ordinary
quads, they are part of a bicubic spline with uniform knots and
therefore meet $C^2$. (ii) If a and b are adjacent pieces of a c-patch
then Equations (3) enforce $C^1$ continuity.

For the remaining cases, let b be a triangular piece. Let u be the
parameter corresponding to the quad edge between $b_{400} = v^0$, where
$u = 0$ and the valence is $n_0$, and $b_{040} = v^1$, where $u = 1$ and
the valence is $n_1$ (see Figure 7 for case (iii) and Figure 8 for case (iv)). By
construction, the common boundary $b(u, 0) = a(0, u)$ is a curve of
degree 3 with Bézier control points $(v^0, t^0_0, t^1_1, v^1)$ so that bicubic
patches on ordinary quads and triangular patches on extraordinary
quads match up exactly.

Denote by $\partial_1 b$ the partial derivative of b along the common boundary
and by $\partial_2 b$ the partial derivative in the other variable. Since
$b(u, 0) = a(0, u)$, we have $\partial_1 b(u, 0) = \partial_2 a(0, u)$. The partial
derivative in the other variable of a is $\partial_1 a$. We will verify that the
following conditions hold, which imply tangent continuity:

if one quad is ordinary (case (iii)),

$$\partial_1 b(u, 0) = \partial_2 b(u, 0) + \partial_1 a(0, u); \qquad (6)$$

if both quads are extraordinary (case (iv)),

$$\big((1 - u)\lambda_0 + u\lambda_1\big)\, \partial_1 b(u, 0) = \partial_2 b(u, 0) + \partial_1 a(0, u), \qquad (7)$$

where $\lambda_0 := 1 + c^0$, $\lambda_1 := 1 - c^1$ and $c^i := \cos(2\pi/n_i)$.

Both equations, (6) and (7), equate vector-valued polynomials of
degree 3 (we write $\partial_1 b(u, 0)$ in degree-raised form [Farin 1990]).
The equations hold if and only if all Bézier coefficients are equal.
Offhand, this means checking four vector-valued equations for
each of (6) and (7). However, in both cases, the setup is symmetric
with respect to reversal of the direction in which the boundary
$b(u, 0)$ is traversed. That means we need only check the first two
equations (6') and (6'') of (6) and the first two equations (7') and
(7'') of (7). We verify these equations by inserting the formulas of
Tables 1 and 2.

Figure 7: C1 transition between a triangular and a bicubic patch.

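To verify (6), the key observation is that $n_0 = n_1 = 4$ if one quad
is ordinary. Hence $c^0 = c^1 = 0$ and $s^0 = s^1 = 1$ (cf. Table 2) and
$t_j = e_j$. Here $f^0_0$ and $f^0_{-1}$ denote the points $f_j$ of the vertex $v^0$ in the
quads of b and a, respectively. Therefore, for example (cf. Figure 7)

$$\partial_2 b(0, 0) = 2 \cdot 4\,(b_{301} - v^0) = 8\left(\frac{v^0}{4} + \frac{3}{4} \cdot \frac{e^0_0 + e^0_1}{2} - v^0\right) = 3(e^0_0 + e^0_1) - 6v^0,$$

where the factor $\frac{3}{4}$ stems from raising the degree from 3 to 4; and
the second Bézier coefficient of $\partial_1 b(u, 0)$ (in degree-raised form)
and of $\partial_2 b(u, 0)$ are respectively (cf. Figure 7)

$$\frac{3(e^0_0 - v^0) + 2 \cdot 3(e^1_1 - e^0_0)}{3} \quad \text{and} \quad 2 \cdot 4\,(b_{211} - b_{310}) = 8\left(\frac{e^1_1 - e^0_0}{4} + \frac{e^0_0 - v^0}{8} + \frac{3(f^0_0 - e^0_0)}{8}\right).$$

Then, comparing the first two Bézier coefficients of $\partial_1 b(u, 0)$ and
$\partial_2 b(u, 0) + \partial_1 a(0, u)$ yields equality and establishes $C^1$ continuity:

$$\underbrace{3(e^0_0 - v^0)}_{\partial_1 b(0,0)} = \underbrace{3(e^0_0 + e^0_1) - 6v^0}_{\partial_2 b(0,0)} \; \underbrace{{} - 3(e^0_1 - v^0)}_{\partial_1 a(0,0)}, \qquad (6')$$

$$(e^0_0 - v^0) + 2(e^1_1 - e^0_0) = 2(e^1_1 - e^0_0) + (e^0_0 - v^0) + 3(f^0_0 - e^0_0) + 3(f^0_{-1} - e^0_0), \qquad (6'')$$

where the last two terms of (6'') cancel since $e^0_0 = (f^0_0 + f^0_{-1})/2$ by Table 1.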

Figure 8: G1 transition between two triangular patches.

The equations for (7) are similar, except that we need to replace $e_j$
by $t_j$ and keep in mind that, by definition,

$$(t^0_{-1} - v^0) + (t^0_1 - v^0) = 2c^0\,(t^0_0 - v^0).$$

Hence, for example,

$$\partial_2 b(0, 0) + \partial_1 a(0, 0) = 4\,(b_{301} - v^0 + a_{301} - v^0) = 4 \cdot \frac{3}{8}\,(2 + 2c^0)(t^0_0 - v^0).$$

The first of the four coefficient equations of (7) then simplifies to

$$3(1 + c^0)(t^0_0 - v^0) = 4\,(b_{301} + a_{301} - 2v^0) = \frac{3}{2}(t^0_1 + t^0_0 - 2v^0) + \frac{3}{2}(t^0_{-1} + t^0_0 - 2v^0) = \frac{3}{2}\big(2c^0(t^0_0 - v^0) + 2(t^0_0 - v^0)\big). \qquad (7')$$

Noting that the terms in $(f^0_0 - e^0_0)/(s^0 + s^1)$ in the expansions of $b_{211}$
and $a_{211}$ cancel, the second coefficient equation is

$$6\lambda_0 (t^1_1 - t^0_0) + 3\lambda_1 (t^0_0 - v^0) = 12\,(b_{211} + a_{211} - 2b_{310}) = 12 \cdot \frac{2(1 + c^0)}{4}\,(t^1_1 - t^0_0) + 12 \cdot \frac{2(1 - c^1)}{8}\,(t^0_0 - v^0). \qquad (7'')$$

It is easy to read off that the equalities hold. So the claim of
smoothness is verified. $\Box$
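The relation between neighboring tangent points that the verification of (7) relies on can be checked numerically. The sketch below assumes only that the $t_j$ are obtained from the $e_l$ by a cosine-weighted sum as in Table 1; the scale factor is represented by a hypothetical parameter `kappa`, since the identity holds for any value of it. The function names are illustrative.

```python
import math

def cosine_projection(v, e, kappa):
    """t_j = v + kappa * sum_l cos(2*pi*(j-l)/n) * e_l for j = 0..n-1."""
    n = len(e)
    t = []
    for j in range(n):
        acc = [0.0, 0.0, 0.0]
        for l in range(n):
            w = kappa * math.cos(2.0 * math.pi * (j - l) / n)
            for a in range(3):
                acc[a] += w * e[l][a]
        t.append(tuple(v[a] + acc[a] for a in range(3)))
    return t

def identity_residual(v, t, j):
    """Max abs component of (t_{j-1} - v) + (t_{j+1} - v) - 2c (t_j - v)."""
    n = len(t)
    c = math.cos(2.0 * math.pi / n)
    return max(abs(t[j - 1][a] + t[(j + 1) % n][a] - 2.0 * v[a]
                   - 2.0 * c * (t[j][a] - v[a])) for a in range(3))
```

The residual vanishes up to rounding because $\cos(\theta(j+1-l)) + \cos(\theta(j-1-l)) = 2\cos\theta\,\cos(\theta(j-l))$ for $\theta = 2\pi/n$, so the identity is a property of the cosine weights alone.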

4 GPU Implementation

We implemented our scheme in DirectX 10 using the vertex shader

to compute vertex neighborhoods according to Table 1 and the ge-

ometry shader primitive triangle with adjacency to accumulate the

coefficients of the bicubic patch or compute a c-patch according to

Table 2. We implemented conversion plus rendering in two vari-

ants: a 1-pass and a 2-pass scheme.

The 2-pass implementation constructs the patches in the first pass

using the vertex shader and the geometry shader and evaluates po-

sitions and normals in the second pass. Pass 1 streams out only the

4 x 6 coefficients of a c-patch and not the 4 x $\binom{4+2}{2}$ Bézier control
points of the equivalent triangular pieces. The data amplification
necessary to evaluate takes place by instancing a (u, v)-grid on the
vertex shader in the second pass. That is, we do not stream back
large data sets after amplification. Position and normal are computed
on the (u, v) domain $[0, 1]^2$ of the bicubic or of the c-patch
(not on any triangular domains). In our implementation, the number
of ALU ops for this evaluation is 59 both for the bicubic patch

and for the c-patch. Table 3 lists the input, output and the compu-

tations of each pipeline stage. Figure 9 illustrates this association

of computations and resources. Overall, the 2-pass implementation

has small stream-out, short geometry shader code and minimal am-

plification on the geometry shader.
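To make the second pass concrete for the bicubic case, the following CPU-side Python sketch generates the (u, v) samples that would be instanced per patch and computes position and normal by a tensored de Casteljau evaluation. It is an illustration only, not the shader code of the implementation: the function names are hypothetical, the normal is left unnormalized, and the c-patch variant of the evaluation is not covered.

```python
def de_casteljau(ctrl, t):
    """Return (point, derivative) of a Bezier curve with 3D control points."""
    pts = [tuple(p) for p in ctrl]
    degree = len(pts) - 1
    while len(pts) > 2:
        pts = [tuple((1 - t) * a[i] + t * b[i] for i in range(3))
               for a, b in zip(pts, pts[1:])]
    a, b = pts
    point = tuple((1 - t) * a[i] + t * b[i] for i in range(3))
    # The last two de Casteljau points span the tangent at t.
    deriv = tuple(degree * (b[i] - a[i]) for i in range(3))
    return point, deriv

def position_and_normal(g, u, v):
    """Position and unnormalized normal of a bicubic patch g[k][l] at (u, v)."""
    # Evaluate the four curves in u, keeping point and u-derivative of each.
    rows = [de_casteljau([g[k][l] for k in range(4)], u) for l in range(4)]
    pos, dv = de_casteljau([r[0] for r in rows], v)   # position, d/dv
    du, _ = de_casteljau([r[1] for r in rows], v)     # d/du
    normal = (du[1] * dv[2] - du[2] * dv[1],
              du[2] * dv[0] - du[0] * dv[2],
              du[0] * dv[1] - du[1] * dv[0])
    return pos, normal

def uv_grid(n):
    """The n x n parameter samples instanced per patch (n >= 2)."""
    return [(i / (n - 1), j / (n - 1)) for j in range(n) for i in range(n)]
```

A call such as `[position_and_normal(g, u, v) for (u, v) in uv_grid(9)]` mirrors the 9 x 9 evaluation density used in the timings, with the grid shared by all patches.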

In the 1-pass implementation, the evaluation immediately follows

conversion in the geometry shader, using the geometry shader's

ability to amplify, i.e. output multiple point primitives for each facet

(Figure 10). While a 1-pass implementation sounds more efficient

than a 2-pass implementation, DX10 limits data amplification in

the geometry shader so that the maximal evaluation density is 8 x 8

per quad. Moreover, maximal amplification in the geometry shader

slows the performance. We observed a minimum of "' better

performance of the 2-pass implementation.

5 Results

We compiled and executed the implementation on the latest graph-

ics cards of both major vendors under DirectX10 and tested the

performance for several industry-sized models. Two surface mod-

els and models with displacement mapping are shown in Figure 2

[Diagram. Pass 1: Input Assembler ($p_*$, $n$, $\sigma$) -> Vertex Shader ($v$, $t_0$, $t_1$, $f_j$) -> Geometry Shader -> coefficients $g_{k\ell}$ or $b^i_{400}$, $t^i_0$, $t^i_1$, $b^i_{211}$, $b^i_{121}$, $b^i_{112}$. Pass 2: Input Assembler ($(u, v)$) -> Vertex Shader (position, normal) -> Pixel Shader (color).]

Figure 9: 2-pass implementation detailed in Table 3. The first pass
converts, the second renders. Note that the geometry shader only computes at
most 24 coefficients per patch and does not create (amplify to) evaluation
point primitives.

[Diagram. Input Assembler ($p_*$, $n$, $\sigma$) -> Vertex Shader ($v$, $t_0$, $t_1$, $f_j$) -> Geometry Shader (position, normal) -> Pixel Shader (color).]

Figure 10: At present, the 1-pass conversion-and-rendering must place
patch assembly and evaluation on the geometry shader. This is not efficient.

Pass 1: Conversion
  VS In    $p_*$, $n$, $\sigma$
  VS       Use texture lookup to retrieve $p_{2j}$, $p_{2j+1}$.
           Compute $v$, $e_j$, $f_j$, $t_0$, $t_1$ (Table 1).
  VS Out   $v$, $t_0$, $t_1$, $f_j$, $j = 0..n-1$
  GS In    $v^i$, $t^i_0$, $t^i_1$, $f^i$, $i = 0..3$
  GS       if ordinary quad
               assemble $g_{k\ell}$, $k, \ell = 0..3$ (Figure 5)
           else
               compute $b^i_{211}$, $b^i_{121}$, $b^i_{112}$ (Table 2)
  GS Out   if ordinary quad, stream out $g_{k\ell}$, $k, \ell = 0..3$;
           else stream out $b^i_{400}$, $t^i_0$, $t^i_1$, $b^i_{211}$, $b^i_{121}$, $b^i_{112}$, $i = 0..3$.

Pass 2: Evaluating Position and Normal
  VS In    $(u, v)$
  VS       if ordinary quad
               compute normal and position at $(u, v)$
               by the tensored de Casteljau's algorithm
           else
               compute the remaining Bézier control points (3);
               compute normal and position at $(u, v)$
               by de Casteljau's algorithm adjusted to c-patches.
  VS Out   position, normal
  PS In    position, normal
  PS       compute color
  PS Out   color

Table 3: 2-pass conversion: VS = vertex shader, GS = geometry
shader, PS = pixel shader. VS Out of Pass 1 outputs $n$ points $f_j$ for
one vertex (hence the subscript) and GS In of Pass 1 retrieves four
points $f^i$, each generated by a different vertex of the quad (hence
the superscript).

and 3 respectively. Table 4 summarizes the performance of the 2-pass
algorithm for different granularities of evaluation. The frog
model, in particular, provides a challenge due to the large number
of extraordinary patches. The Frog Party shown in Figure 14 currently
renders at 50 fps for uniform evaluation for N = 9, i.e. on a
9 x 9 grid. That is, the implementation converts 9 x 1292 quads,
of which 59% are extraordinary, and renders on the order of 1 million
polygons 50 times per second. On the same hardware, we measured
Bunnell's efficient implementation (distribution accompanying
[Bunnell 2005]) featuring the single frog model, i.e. 1/9th of the work of
the Frog Party, running at 44 fps with three subdivisions (equivalent
to tessellation factor N = 9). That is, GPU smoothing of quad meshes
is an order of magnitude faster. Compared to [Shiue et al. 2005],
the speedup is even more dramatic. While the comparison is not
among equals since both [Shiue et al. 2005] and [Bunnell 2005]
implement recursive Catmull-Clark subdivision, it is nevertheless fair
to observe that the speedup is at least partially due to our avoiding
stream back after amplification (data explosion due to refinement).

  Mesh (verts, quads, eqs)      Frames per second
                                N = 5     9    17    33
  Sword (140, 138, 38%)           965   965   965   703
  Head (602, 600, 100%)           637   557   376   165
  Frog (1308, 1292, 59%)          483   392   226    87

Table 4: Frames per second for some standard test meshes with
each patch evaluated on a grid of size N x N; eqs = percentage of
extraordinary quads. Sword and Frog are shown in Figure 2, Head
in Figure 11.
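The "order of magnitude" statement follows from simple arithmetic on the figures quoted in the text. The Python sketch below spells it out; the function name is hypothetical, and the comparison counts converted quads per second only, ignoring any difference in per-quad work between the two methods.

```python
def quads_per_second(quads_per_frame, frames_per_second):
    """Throughput in input quads converted and rendered per second."""
    return quads_per_frame * frames_per_second

frog_quads = 1292                                  # quads of the Frog mesh
party = quads_per_second(9 * frog_quads, 50)       # nine frogs at 50 fps
single = quads_per_second(frog_quads, 44)          # one frog at 44 fps
speedup = party / single                           # ratio 450 / 44
```

The ratio is 450/44, i.e. slightly more than 10, which is the basis for calling the difference an order of magnitude.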

[Panels: Surface; Geometry Difference (%); Normal Angle Difference (°); each shown for CC and Our Scheme.]

Figure 11: Comparison between the Catmull-Clark (CC) subdivision limit
surface and the smoothed quad mesh surface for the same input.

We expect that more careful storage of vertex neighborhoods, in

retrieving order, will further improve our use of texture cache and

thereby improve the frames per second (fps) count.

Figure 11 compares the smoothed quad mesh surfaces with densely

refined Catmull-Clark subdivision surfaces based on the same

mesh. Both geometric distance, as percent of the local quad size,

and normal distance, in degrees of variation, are compared. Es-

pecially after displacement, large models rendered by subdivision

and quad smoothing appear visually indistinguishable. The rela-

tively small examples, without displacement, shown in Figure 11

are also important to support our observation that c-patches do not

create shape problems compared to a single bicubic patch: despite

the lower degree and internal C1 join, their visual appearance is

remarkably similar to that of bicubic patches.

The accompanying video (see screen shots in Figures 12, 13, 14)

illustrates real time displacement and animation. It was captured

with a camcorder to show real time performance. The fps rates

shown are lower than the ones in Table 4 since we captured it be-

fore we separated ordinary and extraordinary quad conversion in

the implementation.

6 Discussion

Smoothing quad meshes on the GPU offers an alternative to highly

refined facet representations transmitted to the GPU and is prefer-

able for interactive graphics and integration with complex morph-

ing and displacement. The separation into vertex and patch con-

struction means that the number of scaled vertex additions (adds)

per patch is independent of the valence. The cost of computing

the control points per patch, i.e. with the cost of vertex computations
distributed, is 4 x (4 + 1 + 1 + 2) = 32 adds per bicubic
construction; computing $t_j$ from $t_0$ and $t_1$ and determining
$b^i_{211}$, $b^i_{121}$ and $b^i_{112}$ according to Table 2 amounts to an additional
4 x (2 + 6 + 6 + 12) = 104 adds per c-patch. The data transfer between
passes in the 2-pass conversion is low since only 4 x 6 control
points are intermittently generated. This compares favorably to, say
[Loop and Schaefer 2007] where 16+12+12 coefficients are generated.

Since we only compute and evaluate in terms of the 24 c-patch

coefficients, the computation of the cubic boundaries shared by a

bicubic and a c-patch is mathematically identical. An explicit 'if'-

statement in the evaluation guarantees the exact same ordering of

computations since boundary coefficients are only computed once,

in the vertex shader, according to Table 1. That is, there is no pixel

drop out or gaps in the rendered surface. The resulting surface is

watertight.

We advertised a 2-pass scheme, since, as we argued, the DX10 ge-

ometry shader is not well suited for the data amplification for eval-

uation after conversion. The 1-pass scheme outlined in Section 4

may become more valuable with availability of a dedicated hard-

ware tessellator [Lee 2006]. Such a hardware amplification will

also benefit the 2-pass approach in that the (u, v) domain tessella-

tion, fed into the second pass will be replaced by the amplification

unit.


References

BOLZ, J., AND SCHRÖDER, P. 2002. Rapid evaluation of Catmull-

Clark subdivision surfaces. In Web3D '02: Proceeding of the

seventh international conference on 3D Web technology, ACM

Press, New York, NY, USA, 11-17.

BUNNELL, M. 2005. GPU Gems 2: Programming Techniques

for High-Performance Graphics and General-Purpose Compu-

tation. Addison-Wesley, Reading, MA, ch. 7. Adaptive Tessella-

tion of Subdivision Surfaces with Displacement Mapping.

CATMULL, E., AND CLARK, J. 1978. Recursively generated B-

spline surfaces on arbitrary topological meshes. Computer Aided

Design 10, 350-355.

FARIN, G. 1990. Curves and Surfaces for Computer Aided Geo-

metric Design: A Practical Guide. Academic Press.

GONZALEZ, C., AND PETERS, J. 1999. Localized hierarchy sur-

face splines. In ACM Symposium on Interactive 3D Graphics,

S. S. J. Rossignac, Ed., 7-15.

GUTHE, M., BALAZS, A., AND KLEIN, R. 2005. GPU-based

trimming and tessellation of NURBS and T-spline surfaces.

ACM Trans. Graph. 24, 3, 1016-1023.

HALSTEAD, M., KASS, M., AND DEROSE, T. 1993. Efficient,

fair interpolation using Catmull-Clark surfaces. Proceedings of

SIGGRAPH 93 (Aug), 35-44.

LEE, A., MORETON, H., AND HOPPE, H. 2000. Displaced subdi-

vision surfaces. In Siggraph 2000, Computer Graphics Proceed-

ings, ACM Press / ACM SIGGRAPH / Addison Wesley Long-

man, K. Akeley, Ed., Annual Conference Series, 85-94.

LEE, M. 2006. Next generation graphics programming
on Xbox 360. http://download.microsoft.com/download/d/3/0/d30d58cd-87a2-41d5-bb53-baf560aa2373/next_generation_graphics_programming_on_xbox_360.ppt.

LOOP, C., AND SCHAEFER, S. 2007. Approximating Catmull-

Clark subdivision surfaces with bicubic patches. Tech. rep., Mi-

crosoft Research, MSR-TR-2007-44.

PETERS, J. 2000. Patching Catmull-Clark meshes. In Siggraph
2000, Computer Graphics Proceedings, ACM Press / ACM SIGGRAPH /
Addison Wesley Longman, K. Akeley, Ed., Annual
Conference Series, 255-258.

SHIUE, L.-J., JONES, I., AND PETERS, J. 2005. A realtime GPU

subdivision kernel. ACM Trans. Graph. 24, 3, 1010-1015.

STAM, J. 1998. Exact evaluation of Catmull-Clark subdivision

surfaces at arbitrary parameter values. In SIGGRAPH, 395-404.

VLACHOS, A., PETERS, J., BOYD, C., AND MITCHELL, J. L.
2001. Curved PN triangles. In 2001, Symposium on Interactive
3D Graphics, ACM Press, Bi-Annual Conference Series, 159-166.

Figure 12: Real time displacement on the twisting Sword model. See the

video.

Figure 13: Real time displacement on the twisting Frog model. See the
video.

Figure 14: Asynchronous animation of nine Frogs. See the video.