Citation |

- Permanent Link:
- http://ufdc.ufl.edu/AA00031479/00001
## Material Information- Title:
- A ray casting visibility algorithm for cubic surfaces
- Creator:
- BÃ¼yÃ¼kliÌÂ‡manli, IsÌ¦in GÃ¼rses
- Publication Date:
- 1991
- Language:
- English
- Physical Description:
- v, 101 leaves : ill. ; 28 cm.
## Subjects- Subjects / Keywords:
- Algorithms ( jstor )
Boxes ( jstor ) Computer graphics ( jstor ) Coordinate systems ( jstor ) Engines ( jstor ) Parametric surfaces ( jstor ) Pipelines ( jstor ) Pixels ( jstor ) Ray tracing ( jstor ) Silhouettes ( jstor ) Algorithms ( lcsh ) Computer and Information Sciences thesis Ph. D Dissertations, Academic -- Computer and Information Sciences -- UF Geometry -- Data processing ( lcsh ) Image processing -- Digital techniques ( lcsh ) Three-dimensional display systems ( lcsh ) - Genre:
- bibliography ( marcgt )
non-fiction ( marcgt )
## Notes- Thesis:
- Thesis (Ph. D.)--University of Florida, 1991.
- Bibliography:
- Includes bibliographical references (leaves 95-100).
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by IsÌ¦in GÃ¼rses BÃ¼yÃ¼kliÌÂ‡manli.
## Record Information- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. Â§107) for non-profit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
- Resource Identifier:
- 026492077 ( ALEPH )
25222089 ( OCLC )
## UFDC Membership |

Downloads |

## This item has the following downloads: |

Full Text |

A RAY CASTING VISIBILITY ALGORITHM FOR CUBIC SURFACES By ISIN GURSES BUYUKLIMANLI A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1991 ACKNOWLEDGEMENTS I would like to express my appreciation to my advisor and supervisory committee chairman, Dr. John Staudhammer, for the guidance and encouragement he provided me on this project. I am also grateful to the other members of my supervisory committee, Dr. James Keesling, Dr. Steve Thebaut, Dr. Panos Livadas, and Dr. Carl Crane, for their commitment. I also wish to thank the members of the Computer Graphics Research Group, especially Kris Iskandar and Hasan Incirlioglu, for their help and suggestions. I want to thank to Dr. Murat Balaban for his encouragement and the help he provided. Last but not least, I thank my husband, my best friend, Temel, for his help, support, and love throughout this dissertation. This dissertation is dedicated to my parents, Yurdanur and Muzaffer GUrses. ii TABLE OF CONTENTS ACKNOWLEDGEMENTS ..................................... ABSTRACT............................... 1 INTRODUCTION.......................... 1.1 Research Motivation and Objective........ 1.2 Dissertation Overview................. 2 REVIEW OF BASIC CONCEPTS USED......... 2.1 Image Synthesis ...................... 2.1.1 Visibility Determination......... 2.1.2 Ray Casting .................. 2.2 Curved Surfaces ..................... 3 PROBLEM DEFINITION AND REVIEW ALGORITHMS ....................... 3.1 Problem Definition ................... 3.2 Literature Review.................. 3.2.1 Numerical Techniques .......... 3.2.2 Subdivision Techniques ......... 3.2.3 Parallel Processing Techniques ... OF EXISTING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 THE PROPOSED ALGORITHM ................ 4.1 Preprocessing Steps................... 4.1.1 Selection of a Bounding Volume . .. 4.1.2 Construction of a Bounding Box Tree 4.1.3 Determination of Boundary Edges .. 4.1.4 Location of Silhouette Edges ....... 4.1.5 Calculation of Surface Normal ...... 4.2 Newton's Method ...................... 4.3 Algorithm........................... 4.4 Lighting Model....................... 4.5 Results and Discussion .................: 4.5.1 Complexity and Performance Analysis 4.5.2 Simulation Results ............... iii ii V 1 2 4 5 5 8 10 17 28 28 30 31 33 35 37 38 38 41 42 43 45 48 53 57 60 61 64 ........... ........... ........... 5 THE PARALLEL IMPLEMENTATION OF THE ALGORITHM........66 5.1 Coherence .......................................... 69 5.2 Forward Differencing Technique.......................... 70 5.3 Pipelined and Parallel Architecture ........................ 76 5.4 Results and-Discussion ................................. 77 5.4.1 Complexity and Performance Analysis ................80 5.4.2 Simulation Results .............................. 88 6 CONCLUSION ............................................90 6.1 Summary ........................................... 90 6.2 Future Work ........................................ 93 BIBLIOGRAPHY ............................................ 95 BIOGRAPHICAL SKETCH....................................101 iv Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy A RAY CASTING VISIBILITY ALGORITHM FOR CUBIC SURFACES By Ipin Giirses Biiyuklimanli May 1991 Chairman: Dr. John Staudhammer Major Department: Computer and Information Sciences An algorithm for ray casting parametric surfaces is developed for visibility calculation. The new algorithm solves the ray/surface intersection directly using Newton iteration and utilizes ray-to-ray coherence. The computation required is reduced using bounding boxes as bounding volumes, coherence, and forward differencing methods. A parallel implementation of the algorithm is also presented taking advantage of the parallelism of ray casting to further reduce the algorithm's computational requirements. The final parametric surface intersection algorithm gives a good combination of speed and accuracy. The algorithm seems to be well suited for hardware implementation. Algorithms as well as various architectures are reviewed and some of them are also compared to the new algorithm to prove its efficiency. A system simulator is developed to verify the algorithm and the functional behavior of the system architecture. v CHAPTER 1 INTRODUCTION Computer graphics produces pictures generated by a computer programmed and operated by human beings. Only the imagination of the user and the capability of the hardware limits the form they can take: animation, simulation, graphs, recreations, portraits, architecture, drawings, paintings, or just simple lines. Because computer graphics is a way of making thoughts and ideas visible, these revolutionary images are becoming the universal language of our time [WARD89]. The design and manufacture of aircraft, turbo machinery, and automobiles, and recent applications in the advertising and animation industries have become driving forces behind research into techniques for surface design. Curves and surfaces having free-form shapes such as faces, clothes, shoes, clouds, mountains, car bodies, airplanes, are important in the construction of three-dimensional objects for image synthesis in computer graphics. The vast majority of curved surfaces may be defined with a relatively few numbers by using piecewise continuous surface patches [BART87]. However, the display of these surfaces, the rendering, on a high-definition display requires the computation of coloring on a pixel-by-pixel basis. Clearly this can be a computationally intensive task, even though the color variations are often slight. 1 2 One of the more widely used methods for finely rendering objects on a display screen is ray casting. As the name implies, the procedure calls for calculating the light properties of a ray of light which travels from the surface elements in a pixel to the observer. Ray casting is a powerful yet simple approach to image generation. This method has become one of the important tools from which highly realistic images are generated. But it also has disadvantages. The major drawback of this procedure is the great amount of time to compute ray/surface intersections. In this dissertation, this problem is attacked and a new ray casting visibility algorithm is devised. This algorithm has been compared with some algorithms that are widely used. The results show that its execution time is a bit better than the others. It has unique properties: it is short, elegant and has some interesting applications. 1.1 Research Motivation and Objective This dissertation was motivated by the lack of efficient ray casting algorithms for curved surfaces. This work especially deals with ray casting of parametric surfaces. The parametric method of surface representation is very convenient for approximation and design of curved surfaces. In particular, B~zier surface patches and B-Spline surfaces are extensively used in computer graphics and CAD. The research has focused on efficient and fast algorithm design with the objective of determining the ray/object intersections as fast as possible. The research has also addressed both the efficiency and speed of existing algorithms and analyzed 3 deficiencies of those algorithms. This process led to a new approach to resolve those deficiencies. As the first part of the research, the Newton-Raphson method is reformulated for Be-zier surfaces to reduce the complexity of the ray/object intersections. Since every ray casting algorithm is by nature an object-oriented one, this characteristic made the design and implementation attractive by applying object-oriented concepts and ideas. For applications that require fast transformation such as ray casting and rendering of complex shaded images, a parallel, pipelined multiprocessor architecture can be used to achieve high real-time performance for image synthesis. A parallel architecture offers an increase in speed that grows almost linearly with the number of processors used. To extract the maximum efficiency from a parallel architecture, key bottlenecks must be identified and an architecture that is best suited to overcoming those bottlenecks must be implemented. Since the calculation for a ray cast through a pixel in the screen is independent of the calculation for other rays cast, this process can be implemented in parallel. Therefore, as a second part of this research, a VLSI-oriented real-time algorithm is developed that combines efficiency, simplicity, speed, and parallelism. Using an incremental technique, the computation for the intersections of a ray with surfaces is reduced. The algorithm is designed in a way that each intersection processor can be placed in a VLSI chip. In a practical display system several of these chips may be used to calculate surface-ray intersections at pixels or groups of pixels. 4 1.2 Dissertation Overview This dissertation is concerned with the design of an efficient algorithm for display purposes. It is organized into six chapters. Chapter 1 is an introductory chapter. that covers objectives and background about the dissertation subject. In chapter 2 a review is presented of the basic concepts used devising the algorithm. Chapter 3 defines the problem and gives a review of existing algorithms. Chapter 4 discusses the details of the algorithm and its complexity and performance analysis. Chapter 5 describes the techniques to reduce the computation required and parallel implementation of the algorithm and complexity and performance analysis results are given. Finally, chapter 6 summarizes the results along with suggestions for future work. In summary, the major contributions of this work are the design of an efficient ray/object intersection algorithm, and improvements on this algorithm to make it suitable for parallel hardware implementation. CHAPTER 2 REVIEW OF BASIC CONCEPTS USED 2.1 Image Synthesis Image synthesis is the subfield of computer graphics that is concerned with the production of pictures defined by numbers and procedures. If the image is to appear true to a physical object, the process may be costly, complex and may require a tremendous amount of computing power. There is a tradeoff present in all image synthesis applications between realism and the amount of required computations. The image synthesis is usually accomplished with a series of processes, termed the image synthesis pipeline. Based on a simulation of light propagation in the real world, the image synthesis pipeline consists of several steps: creating data representations, geometry processing (modeling transformation, viewing operation), rasterization (visi bl e-surface determination, scan conversion, shading), and displaying the result. Geometry processing is independent of the display device, whereas the rasterization pipeline takes transformed, clipped primitives and produces pixels. The display process that maps a model to an image on screen is called rendering, and its implementation in software and/or hardware is referred to as the rendering pipeline. This standard graphics rendering, the image synthesis pipeline, is shown in Figure 2.1. 5 6 Object Data Base 4 Viewing Transformations 4 Clipping: 4 Projective Transformations FVisible Surface Calculations 4 Scan Conversion 4 Shading] 4 Iage 4 Display Controlled 4 Video Display Figure 2.1 Standard Rendering Pipeline 7 The object database contains geometric, optical and material information of the objects (also called primitives). The geometry representation of each object depends on the type of primitive used by the image renderer. The type of object used here is a curved surface, whose characteristics will be explained later. To render the scene image correctly, the object is transformed from modeling coordinates to world coordinates. Next, world coordinate descriptions are converted to viewing coordinates. That is, the scene to be viewed is clipped against a view volume (frustum) and projected into the view area defined on the view plane. There are two basic methods for projecting 3D objects onto a 2D viewing surface. All points of the object can be projected to the surface along parallel lines, or the points can be projected along lines that converge to a position called the center of the projection. These two methods are called parallel projection and perspective projection, respectively. A parallel projection preserves relative dimensions of objects (this is the technique used in drafting to produce scale drawings of three-dimensional objects) but does not give a realistic representation. A perspective projection, on the other hand, produces realistic views but doesn't preserve relative dimensions. The farther a primitive or part of a primitive is from the viewer, the smaller it will be drawn. This effect, known as perspective foreshortening, is similar to the effect achieved by a photograph and by the human visual system. The next step in the image generation process is visible-surface calculation, also called hidden-surface removal. Visible-surface determination is the process of determining which portions of a primitive are actually visible from the synthetic camera's point of view. Most 8 hidden-surface algorithms use image-space methods and in this study the ray casting technique as a point-sampling and image-space algorithm is used. Ray casting is compatible with the image compositing network designed by Fleischman [FLE188]. After determining the pixels covered by a primitive's image, scan conversion, the color for each covered-pixel is determined through the use of lighting equations, i.e. shading. This color value, along with its z coordinate and opacity, is passed to the compositing network for final display. 2.1,1 Visibility Determination The task of visibility determination of an object on a raster display has been originally referred to as the hidden-surface problem. Different methods for visiblesurface calculation are well established. It can be categorized into two groups: point sampling algorithms and continuous algorithms. While Z-buffer, ray casting/ tracing, painter's, and scan-line algorithms belong to the first group, area/volume subdivision and scan-plane algorithms belong to the second. Hidden- surface algorithms are also classified according to whether they deal with object definitions directly or with their projected images. These two approaches are called object-space methods and imagespace methods, respectively. An object-space method implemented in the physical coordinate system compares objects and parts of objects to each other to determine which surfaces are visible. These results are generally accurate to the precision of the machine. In an image-space algorithm, visibility is decided point by point at each pixel position on the projection plane. That is, calculations are performed only to 9 the precision of the screen. Therefore, image-space algorithms are also called pointsampling algorithms. The computational work for an object-space algorithm that compares each object in a scene with the remaining (n-i) objects grows as the number of objects squared (n2). Similarly, the work for an image-space algorithm which compares every -object in the scene with every pixel location in screen coordinates is proportional to nN, where N is the number of pixels, typically (5122) or more. While one might therefore believe that object-space algorithms require less work than image-space algorithms even when n exceeds 100,000, in practice, this is not the case. Most of the individual steps in object-space algorithms are more timeconsuming while it is easier to take advantage of coherence in raster scan implementation of image-space algorithms. Visibility determination is a fundamental and recurring subproblem in computer graphics. It is done in a 3D space prior to the projection into 2D that destroys the depth information needed for depth comparisons. The visible-surface problem for the simple case (high-order visibility problems being such as those calculating motion blur, depth of field, or indirect-illumination effects) can be stated as follows: Given a set of surfaces in three dimensions, a viewpoint, an oriented image plane, and a field of view, 10 For each pixel on the image plane within the field of view do{ Determine which pixel on which surface lies closest to the viewpoint along a line from the viewpoint through the pixel in the image plane. Draw the pixel in Thie appropriate color. The visibility determination of curved surfaces involves real-time image generation process. Ray casting determines the visibility of surfaces by tracing imaginary rays of light from the viewer's eye to the objects in the scene. This is a typical image-space algorithm discussed before and will be used throughout this dissertation for visibility determination. 2.1.2 Ray Casting Although ray casting has been around before computers, it became popular in the computer graphics community through the paper by Whitted [WHIT8O]. Ray casting is a technique which accomplishes a mapping from 3D display to 2D image plane by shooting a ray from the eye through each pixel on the screen to the world scene to identify the visible surface [KAUF86, ROTH82]. In the literature, ray casting and ray tracing are often used synonymously, defining a full recursive algorithm that integrates both visible surface determination and shading. However, in this dissertation, the use of ray tracing or ray casting refers only to visible surface 11 Object Data Base 4 Viewing Transformations 4 Ray Casting 4 Shading 4 Image Memory 4 Display Controller 4 Video Display Figure 2.2 Rendering Pipeline for Ray Casting. 12 algorithm. There is no tracing of additional rays from the points of intersection to determine a pixel's shade. As mentioned before, the rendering pipeline is a complex process, but the ray casting pipeline is the simplest among others because those objects that are visible at each pixel and their illumination are determined entirely in world coordinates (see Figure 2.2). Once objects have been obtained from the database and transformed by the viewing transformation, they are loaded into ray caster's world coordinate database. Ray casting is one of the more widely used methods for finely rendering objects on a display screen. The ray comes from the light source and hits the object. Some reach the observer and let the observer see the scene. If we trace the rays like this (from the light source to the observer), very few will reach the observer. Since this is obviously not an efficient way, we cast the rays in the opposite direction, from the eye through each pixel on the screen to the world scene. The first surface that an individual ray intersects will be the surface that is displayed in that particular pixel. The ray casting process is shown in Figure 2.3. This approach avoids the need to explicitly define silhouette edges for the general curved patch models and it also allows illumination effects to be modelled in a direct way. The problem here is intersecting the rays with a large collection of primitive objects defining an environment. The time spent to compute ray/surface intersections is the bottleneck of the ray casting algorithm. For each pixel in the raster, the corresponding ray is traced to the first surface that it encounters to 13 PVisible Surface Image Plane I I I I I I I y I I I I I I V1 I I I I I I A I I I I I I I I I V1 I I I I V1 I I i LJ Observer Figure 2.3 Ray Casting Process 14 determine which objects in the scene are intersected by the ray. Even for a screen resolution of 512x512, 250,000 rays should be traced (generated). The points of intersection of this ray with the surfaces are computed and the intersections are sorted in depth. The surface closest to the image plane is the visible surface through a particular pixel. The general algorithm for the ray casting is as follows: Read in the parameters for the scene from the model file Perform preprocessing for y = miny to maxy for x = minx to maxx send a ray from origin through pixel (x,y) for object = 1 to num-of-objects determine closest object to origin along ray determine lighting of closest object color pixel (x,y) The ray casting algorithm handles one pixel at a time and compares every object's depth at this pixel. As Whitted [WHIT8O] noted, it can take as much as 95 percent of the total processing time just for the ray/surface intersection calculations. There are two popular directions which can be considered for speeding up ray casting. One way is to reduce the number of candidates to be checked against each ray, at the expense of a little extra processing. Another way is to design a more 15 efficient ray/surface intersection procedure. The techniques discussed in this dissertation belong to these two directions. One approach is called space subdivision. The idea is that we can look at the world that surrounds the database and cut it up into many little volumes called voxels. As the ray moves through the world looking for the first object it hits, it essentially moves from volume to volume. If each volume contains a list of the objects within it, then we only need to check the ray against those objects. If we do get a hit, then we do not need to check the ray against all the other objects in the database; otherwise we move to the next volume and try again. There are two subdivision schemes. In the uniform subdivision proposed by Fujimoto et al. [FUJ186] and also described by Cleary and Wyvil [CLEA88], the object space is partitioned into voxels along three major axes by a regular 3D grid. Glassner [GLAS84] and Kaplan [KAPL85] describe adaptive subdivision in which a hierarchical tree known as octree is constructed to represent the given space by recursively subdividing it into eight equal size and disjoint voxels. Another approach is the bounding volume technique. A simple bounding volume such as a sphere or a box is placed around each object. Using bounding volumes results in a significant gain in efficiency. Only if a ray intersects the bounding volume does the object itself need to be checked for intersection. Though this actually increases the amount of computation for rays which come near enough to an object to pierce its bounding volume, in a typical environment most rays closely approach only a small fraction of the objects. Used in this way, bounding volumes 16 substitute simple intersection checks for more costly ones but do not decrease their number. From a theoretical standpoint this may reduce the computation by a constant factor but cannot improve upon the linear time complexity of exhaustive ray tracing. To alleviate this problem, Rubin and Whitted [RUBL8O] introduced the notion of hierarchical b-ounding volumes to ray tracing in order to attain a theoretical time complexity which is logarithmic in the number of objects instead of being linear. The method of hierarchical extents uses a tree search to find the objects hit by a ray. In the best case it yields O(logn) intersection calculations per ray. Since tree nodes are small in comparison to object data, only about 30 percent extra space is required to store the hierarchy required for machine use in a typical implementation. The idea is further developed by Kajiya [KAJI83J, Weghorst et al. [WEGH84], and Kay and Kajiya [KAY86]. In the algorithm proposed here, to eliminate unnecessary intersection tests, the intersection of a ray with a volume bounding the object is tested. There are two criteria for a good bounding volume. First, the volume must be easily tested for an intersection with a ray. The bounding volume intersection test requires significantly less computation time than testing against the original object. Second, although the bounding volume may be simple to test, it should be as tight a fit about the object as possible. With a tighter fit, the rays that hit the bounding volume will be minimized. The bounding box scheme used here is similar to the slab-plane scheme developed by Kay and Kajiya [KAY86] and later adopted by Hsu [HSU9O] in his dissertation. The construction of bounding boxes will be explained in detail in chapter 4. 17 2.2 Curved Surfaces The use of straight line segments and polygons to approximate curved lines and surfaces results in visually objectionable images, even with the most sophisticated continuous-shading models. Mach bands appear at the borders between adjacent polygons, and we always- see a polygonal silhouette. Also, polygonal methods often require excessive amounts of storage. However, curved surface techniques allow the resulting image to be computed. to whatever level detail the situation demands. Curved surface representations are important in computer graphics applications, because they are often more compact than polygonal representations and hence require relatively compact databases. Surfaces may be represented in general by the 3-space relationship F(xyz) = 0. This is the implicit representation of a surface. Unfortunately, the determination of the depth (z) of a surface point from a known ray direction (xy) involves the solution of an implicit relationship, which may become computationally difficult, even untractable. On the other hand, the use of parametric curves and surfaces for object modeling in computer graphics is becoming increasingly popular. Parametric representation is the most direct and most flexible description of a surface. It has many desirable properties [FAUX79, MUDU85, ROGE76]: 1. It allows a smooth object with no analytical definition to be approximated with a relatively small number of surface patches that join with continuity, thereby saving considerable memory space in representing the object. 18 2. It is axis independent: it avoids infinite slope values with respect to some arbitrary axis system. Difficulties arise using axis-dependent, nonparametric curves when the end point of a curve has a vertical slope relative to the chosen coordinate system. 3. It facilitates the representation of space curves in homogeneous coordinates. 4. Transformations carf be per-formed directly on parametric equations. Translation or rotation of the axes or the object can usually be carried out by translating or rotating the vectors defining the curve or patch, without modifying the functions of the parameters used. 5. It allows the unambiguous representation of multivalued surfaces or space functions. 6. It doesn't involve solution of nonlinear equations for each point (as in the case of implicit representation). 7. Points on the surface can easily be computed sequentially along parametric lines in the surface for display purposes. 8. Parametric surface patches lend themselves to piecewise constructions more naturally than do implicit surfaces. Two reasons for this are that parametric patches are defined over a finite domain and they have distinct boundary curves. By contrast, implicit surfaces can be of infinite extent. 9. Parametric equations usually offer more degrees of freedom for controlling the shape of curves and surfaces. For example, a 2D explicit curve of the form y = ax3 +bx2+a+d has four coefficients that we may vary to control the curve. 19 However, a 2D parametric cubic curve of the form x =au3+ bu2+cu +, y = eu+bu2+gu+h has eight coefficients that may be used for shape control. Curved surfaces can be described by parametric bicubic patches, Be-zier patches, and B-spline patches which are defined by cubic equations of two parameters, u and v [FOLE82]. Varying both parameters from 0 to 1 defines all points on a surface patch. Cubic patches are determined by a collection of 16 3D control points. Because these patches can be pieced together while maintaining second-order continuity at the boundaries, they are very useful for modeling smooth surfaces. Since two parameters are required for the representation of a surface, we assume that a surface may be represented by a bivariate vector-valued function Q(u'v) = [x(u'v) y(u'v) z(u'v)I (2.1) The form of Q(u~v) is Q (u,v) = a 11 u33 + a12u 3V2+ a13u 3v + a 14 u3 + a 2u23 + a .2 2V2 + a2Uv + a, U2 +(2 ) a3, uv3 + a32uv2 + a33uv + a34 U + a41v3 + a42 v2 + a43v + a The a,, vectors are the algebraic coefficients. There are sixteen a.- vectors, each with three components. This means that 48 degrees of freedom, or coefficients, must be specified to define a unique surface. 20 Equation 2.2 is more conveniently written as Q(u,v) = U C VT (2.3) where U = [u' u2 u 11, V = [v' V2 v 1], C is the coefficient matrix, and VT is the transpose of V Since the elements of the coefficient matrix are three component vectors, the C matrix is a 4 x 4 x 3 array. A bicubic patch is bounded by four curves as shown in Figure 2.4; each is a parametric cubic curve. To demonstrate this fact, if we substitute v=0 into equation 2.2, all terms containing v vanish, and the equation becomes Q(u) = a 14u3 +4-,2 +a 3u +a. (2.4) This is an expression for a parametric cubic curve. Of course, the same can be done for the boundary curves on u=O0, u = 1, and v= 1. In fact, by setting either u or v equal to some constant, one can produce rulings on the surface. The network of curves divides the surface into an assembly of topologically rectangular patches, each of which has as its boundaries 2 u-curves and 2 v-curves. Here it is assumed that u and v run from 0 to 1 along the relevant boundaries; then Q(u~v), 0 21 p 000 (Jo7 Figure 2.4 Boundary Curves of a Bicubic Patch 22 UU pR 0R0 UU p, POO1 Figure 2.5 Boundary Conditions Defining a Bicubic Patch 23 Since algebraic coefficients are not the most convenient way to define and control the shape of a patch, a geometric form defining a patch in terms of certain boundary conditions related to the algebraic coefficients is developed. To describe the boundary conditions 16 vectors are needed, 4 of them corner points, and 8 of them tangent vectors (Figure 2.5). Four more vectors called twist vectors (not shown in the figure) are used to show how the tangent vectors across a patch boundary change along that boundary. For example, the twist vectors at poo and p, control the wayp ', changes along the boundary curve u0O from p', topu1. The twist vectors are denoted as pu", puv, pu, and pu. We will deal with Bezier surfaces in this research since they are attractive in interactive design. Advantages of Bdzier surfaces are 1. It is easy to predict how the surface will respond if a control point is moved. 2. Local control is easy with Bezier surfaces. Any change in the vertices of a span will only affect the curve within that span. The rest of the curve will remain unaffected. 3. All that is necessary to increase the order of any curve segment is to specify another interior vertex. This greatly increases flexibility and overcomes many of the difficulties of cubic spline fitting and parabolic blending techniques. 4. The convex hull which is the convex polygon obtained by connecting the control points of bicubic Bezier surfaces allows bounding regions for the surface to be easily calculated, greatly increasing the efficiency of many algorithms. 24 5. The B6-zier patch generated by the 16 control points is constrained to lie within the convex hull of these points. This is useful for clipping and determining when two patches might intersect. 6. The surface shape can be changed without disturbing the slope design points, which may then be independently specified to satisfy continuity requirements with adjacent surfaces. 7. There is no need to calculate slope and tangent vectors. 8. Bdzier surface need not be represented by a square array of control points. There is also a disadvantage to the use of B6-zier surfaces: the movement of one control point will affect the entire surface. In order to obtain local control a patch must effectively be subdivided. A Bdzier surface is a tensor product of Bdzier blending functions (Bernstein basis) which interpolates between the first and last vertices. It is defined by a set of control vertices, in 3D x-y-z space, that is organized as a 2D graph with a rectangular topology. The set of control vertices is called a control hull or control graph [BARS84]. A bicubic Bezier surface, shown in Figure 2.6 takes the form m n Q,n,(l'v) = jJB,,(u )B,.,(v) (2.5) i..O j=O where m =n =3 and P, is the control graph specifying the location of the (m +1) by (n + 1) control points. The basis function, Bim(u), is given by 25 p3 301 Pol/, ~ 12 2 p Pp 02 33 Figure 2.6 Bicubic Bdzier Surface 26 BiAU)( juL(1-uri where (m~ =_M_l (rn-i)! i! with m being the degree of the polynomial and i the particular vertex in the ordered set (from 0 to in). For a bicubic Bdzier patch, the x component of Qv(u~v) in matrix form is X(U'V) = [BC13(U) B13(U) B23(U ) B33(u)] X00 XOI X02 X03 x1o X11 x12 x13 X201 X21 X22 X23 [X3 X31 X32 X33 The bicubic Be-zier surface takes the matrix form when the basis functions are substituted in the above equation: X(u,v) = [(I1-U)3 3(1 -U)2 u 3(1 -U)U2 U3] XOO XOI X2 X31 X10 X11 X12 x13 X20 X21 X2.2 X23 X30 x31 x32 X33~ (2.6) B03(V)~ B23(V) .B33(v) (2.8) (I1_V)3 3(1 -V)2V 3(1 -v)V2 V 3 (2.9) 27 Equation 2.9 can be rewritten as X(U,V) = [u3 u u 1 ] Mb PX M f [V3vv where -3 1 3 0 T o 0' Mb o 0. -3 -3 3 -6 3 0 -3 3 0 0 0 0and P. XOO X01 X02 X03 x10 X1i X12 x 13 X0x21 X22 X23 X3 31 X32X3 The matrix Px contains the position vectors for points that define the Bo-zier surface patch. Only the four corner points x., X31, X03, and X133 actually lie on the patch. The points x1, x2,, X,2, and X22 control the cross slopes along the boundary curves in the same way as the twist vectors of the bicubic patch. The remaining points control the slope of the boundary curves [MORT85]. (2.10) -1 3 Mb _3 3 -6 3 0 CHAPTER 3 PROBLEM DEFINITION AND REVIEW OF EXISTING ALGORITHMS 3.1 Problem Definition The task we are primarily concerned with is that of intersecting rays with a large collection of primitive objects defining an environment. For each pixel in the raster, the corresponding ray is traced to the first surface that it encounters to determine which objects in the scene, if any, are intersected by the ray. The points of intersection of this ray with the surfaces are computed and the intersections are sorted in depth. The surface closest to the image plane is the visible surface at a particular pixel [ROGE85]. The ray is represented parametrically, and the point along the ray is given by R(t) = RO + D -t (3.1) where RO = [xo yo z01 is the ray origin, D = [D. D, DJI is the ray direction, and t is the ray's traveling time relatively starting from the origin RO. Perspective projection enhances the realism of a displayed image by providing the viewer with depth cues. A perspective transformation need not be applied here since the ray casting method determines the correct perspective as an automatic result of tracing the rays from the eye. The objects are described in a left-handed 28 29 coordinate system with the eye at the origin (see Figure 3.1). The screen is centered about the positive z-axis and acts as the viewing plane. Without this centering process, objects may be skewed in the viewing plane. Skewing is also dependent on the distance in the z direction of the viewing plane. The further the distance between the eye and the viewing plane, the more parallel the different rays travel, thereby reducing skewing of the scene. In fact, some ray tracing algorithms place the eye at z = -co, thus having all rays travel in the same direction parallel to the z-axis. Our implementation places the eye at the origin with the view plane at a user specified distance d in the + z direction. The effect is to perform a single-point perspective projection onto the image plane. Since the z value represents the distance of a point from the eye, depth information is included in the projection equations. Objects that are farther away have their x and y coordinates divided by larger z values than those closer, causing these distant objects to be drawn smaller. Since we are using perspective projection and the center of projection is located at the origin (0,0,0), the ray equation 3.1 becomes R(t) = D-t(3.2) where x0 =y = zo=O, D. =DJd, D, = Dy/d, D 1. A bicubic Bdzier surface is given by the equation Q(u'v) = [x(u'v) y(u'v) z(u'v)] 33 (3.3) 30 y I 7' Figure 3.1 Left Handed Coordinate System 31 The intersection of a ray with a surface of this form is obtained as the simultaneous solution of the three nonlinear equations given by the vector equation: F = Q(u,v) - D-t - RO (3.4) The time spent for this ray/surface intersection computation is the bottleneck of the ray casting algorithm. Even for a screen resolution of 512x5 12, 250 000 rays should be generated. For complex scenes modeling complex lighting effects, antialiasing or motion blur [C00K84], this calculation must be performed millions of times. Direct methods of calculation have been found for several surface types, including spheres, polygons [WHIT8O], cylinders and volumes of revolution [KAJI83], Steiner patches [SEDE84], and bicubic surface patches [KAJI82]. Numerical techniques have been employed in the calculation for certain algebraic surfaces [HANR83] and parametric surface patches [J0Y86, LEVN87, LISC9O, SWEE86, TOTH85, YANG871. Several authors addressed the problem of reducing the number of ray/surface intersection calculations using the techniques of recursive subdivision [ARV087, FUJ186, GLAS84, W00D89], bounding volumes [WHIT8O, KAY86], hierarchical bounding volumes to further improve the efficiency of the bounding volumes [RUBI8O, WEGH84], and coherence [HECK841. One research direction in image synthesis has been to exploit the inherent parallelism in the algorithms used to create realistic images. Ray casting is an ideal candidate for parallel processing. Various hardware techniques [BAD090, DIPP84, 32 PLUN85, ULLN831 have been used for acceleration of ray tracing. Most of these approaches are based on tasking multiple microprocessors to manipulate a subset of rays in parallel. Pulleybank and Kapenga as well as Ullner also explored the use of VILSI circuits to speed up the process of ray/surface intersection [PULL87, ULLN83]. Since in this study we are dealing with the intersection of a parametric surface with a ray, a detailed survey about ray/parametric surface intersection will be presented rather than giving all the work for intersecting rays with surfaces. Although the methods used in other type of surfaces, such as implicit surfaces, are similar to the ones employed in our study, those won't be explained here in detail. 3.2 Literature Review Equation 3.3 can be solved for points of intersection, u, v, and t using numerical techniques or subdivision techniques. There is also a trend towards parallelizing ray casting algorithms, especially towards performing ray/object intersection calculations in parallel. In recent years several methods to calculate the intersection between parametric surfaces and rays have been proposed. But, published algorithms on ray tracing parametric surfaces are relatively few compared with the number which deal with other kinds of surfaces. Unfortunately, most of these algorithms are either very expensive, or possess other disadvantages that preclude their use in real-life rendering systems. 33 3.2.1 Numerical Techniques Kajiya [KAJ1821 uses ideas from algebraic geometry to obtain a numerical procedure for intersecting a ray with a bicubic surface patch. A three-dimensional ray is regarded as the intersection of two planes ax+by+cz+d, = 0 and a2x+bzy+c2z+d2 = 0. For the bicubic, Kajiya then substitutes the parametric surface equations into the plane equations to obtain polynomial equations of the form F, (t4v) = 0, F2(u~v) = 0, which can be solved numerically. The Laguerre method used in that paper is cubically convergent and therefore takes only one iteration for linear and quadratic functions. The method is slow but definite. It converges to the nearest solution more quickly than subdivision, but is more expensive at each step. The algorithm uses floating point operations. Kajiya estimates that 6000 floating point operations may have to be performed in order to find all of the intersections between one ray and one bicubic patch. Toth solves the ray/surface equation by minimizing the function (Q(u~v) - D i - RO)2 with respect to u, v and t [TOTH85]. This method, known as multivariate Newton iteration, has the advantage of being able to handle a line which either touches the surface or has no intersection with it. But, the computation of the control points for the interval extension method used in this paper requires more than 40% of the total CPU time used, which is a serious drawback in the algorithm. The main reason for this is that the method fails to utilize coherence. Joy and Bhetanabhotla propose another algorithm [J0Y86] using a quasi-Newton iteration which is basically a generalization of the method used by 34 Toth to solve the ray/surface intersection problem. To speed up the convergence of the quasi-Newton method, ray coherence is utilized: numerical calculations in adjoining pixels are used forming the initial approximations for new calculations. Although this method has computational advantages over the conventional Newton method in obtaining the initial guess, naive utilization of ray coherence may cause convergence to incorrect solutions and subdivision of object space to prevent this may lead to excessive memory requirements. Sweeney and Bartels [SWEE86] present a method for rendering B-spline surfaces for ray tracing. It combines subdivision and numerical techniques: the Bspline surface subdivision technique is utilized to generate a close tiling of the surface with control points which are then used as initial approximations for Newton's method, which in turn is used to generate the final intersection point. Time is saved at the expense of using more memory. There is also no guarantee that the Newton iteration, once initiated, will converge to the correct solution. Levner et al. [LEVN87] describe a method that was originally developed for ray tracing 1-spline surfaces, but has been applied successfully to bicubic surfaces in general. Their method is actually a variation on that of Sweeney and Bartels. Instead of refining the control vertex mesh, they create a mesh of points that lie on the surface itself. The advantage of this approach is that point evaluations on the surface allow treatment of general parametric surfaces. But, the memory requirements of this method are still very large. 35 Another improvement on the method reported by Sweeney and Bartels is described by Yang [YANG87]. An individual octree is created for each B-spline patch by subdividing its bounding box. It provides a more tight bounding scheme than that of Sweeney and Bartels. Since all leaf boxes hit by a ray are searched before actually attempItiihg ray/surface intersection, the algorithm is inefficient. Lischinski and Gonczarowski [LISC9O] present improvements of previously known techniques and introduce a space- and time-efficient algorithm utilizing these. This algorithm combines numerical and subdivision techniques, thus allowing utilization of ray coherence and greatly reducing the average ray/surface intersection time. Although some of the other methods mentioned above, or in the next section, are faster than their method, they claim that theirs is qualitatively superior to them, since it doesn't possess the others' disadvantages. 3.2.2 Subdivision Techniques Many subdivision methods have also been employed to solve the intersection of a ray with a bicubic patch. The subdivision algorithm uses much storage and converges only at a linear rate to a solution. Whitted [WHIT8O] was the first to describe recursive subdivision methods for ray tracing parametric surface patches. If the bounding volume of a patch is pierced by the ray, then the patch is subdivided and bounding volumes are produced by each subpatch. The subdivision process is repeated until either no bounding volumes are intersected, or the intersected bounding volume is smaller than a predetermined 36 minimum. This method requires testing each surface against each ray, which results in too many subdivisions; thus it is very inefficient. Rubin and Whitted [RUBI8O] have generalized the method used by Whitted representing the object space entirely by a hierarchical data structure consisting of bounding volumes. Tlie& algorithm utilizes parallelepipeds for bounding boxes with an orientation that minimizes their sizes. These hierarchical volumes were constructed by hand for each scene, a time consuming process not suitable for animation. Although the hierarchical representation yields an improvement of about 100:1 over naive subdivision, it is still too slow. Recently, a new approach based on recursive subdivision was presented by Woodward [W00D89]. The subdivision process is carried out in the orthogonal viewing coordinate system of the ray. To speed up the computation, the control vertex mesh is scaled to large integer numbers. Subdivisions are done using integer addition and bit shifts, taking care not to lose accuracy. This method is faster than Whitted's method and it is reported to compare favorably with methods such as that used by Sweeney and Bartels, in spite of the fact that it doesn't attempt to exploit coherence. But, like the previous subdivision methods, it has the drawback that the same amount of computation must be performed to intersect a ray with every patch: simple patches cost as much as complicated ones. 37 3.2.3 Parallel Processing Techniques In the original algorithm by Whitted over 75% of the time was spent calculating ray/object intersections. Since then much work has been done to speed up the intersection tests by using parallelism to perform many ray tests in parallel. All the researchers mdhtioned above, one way or another, explore the possibility of reducing the intersection computation, but their approaches were not simple enough for hardware implementation since they did not take into account that computation for intersection can be done in parallel. Some parallel machines have been proposed in the literature, but we have yet to see an implementation of such a machine. Ullner in his dissertation discusses how to parallelize ray tracing in various ways [ULLN83]. This is one of the early studies of parallelizing ray tracing and how ray tracing might perform on different architectures. In addition to using subdivision methods, Pulleybank also implemented ray/surface intersection in a VLSI chip [PULL87]. But, recursive subdivision for curve and surface rendering is expensive to implement in hardware due to the high speed stack memory requirements. Kedemn [KEDE84a] proposes an object space architecture based on the CSG and ray casting concepts. In his design, Kedemn implemented the CSG tree in hardware. The reconfigurable tree structure consists of two kinds of nodes, namely the Primitive Classifiers (PCs) as leaf nodes and Combine Classifiers (CCs) as internal nodes. Each PC is assigned a certain number of primitives. During the image generation process, the system casts rays from the eye through every pixel on 38 the simulated display. Each PC takes these rays and computes the intersection of every ray with its associated primitives. This computation is done in parallel. The result of the computation is passed up the tree. Each CC takes the results from its left and right nodes, combines them according to the operation assigned to it and passes the result up the- tree. At the root of the CSG tree, the color value of the pixel is calculated. This architecture computes quadratic surface visibility. Kedem and Ellis [KEDE84bI use CSG for building up a complex object out of simple primitives like cones, cylinders, and spheres. The main drawback of such a system is that representing an object with cubic or higher order surfaces require numerous quadratic primitives and even then is at best an approximation to the original surface. A similar algorithm for planar patched surfaces was presented by Thomas [THOM83]. It also used ray casting techniques and was specifically designed for VLSI implementation. Unlike Kedem's algorithm, it performs perspective projections. The intersection calculation of a ray with a parametric patch is still the subject of ongoing research in this area. To produce a highly parallel machine capable of real time operation, even compromising image quality, is also one of the challenging areas people have attacked. The problem still remains to find an efficient execution of ray casting algorithm for curved surfaces and usage of parallel processing techniques since ray casting is inherently a parallel process. CHAPTER 4 THE PROPOSED ALGORITHM The choice of rendering algorithm is dependent on the model representation and the degree of realism desired. Ray casting, the rendering algorithm of our choice, handles one pixel at a time and compares every object's depth at this pixel as opposed to the z-buffer algorithm which handles one object at a time and operates on all pixels that this object covers. The proposed algorithm determines visibility by tracing a bundle of rays from the observer to the objects in the scene. The direct intersection of a ray with a parametric surface, i.e. without generating a polygonal approximation of the surface, has generally been done using either rather sophisticated numerical methods or simpler but inefficient subdivision algorithms. This dissertation focuses on numerical techniques to calculate the intersection of a ray and surface. The techniques, as described here, deal with Bdzier surface patches, but can easily be extended to more general parametric surfaces. The algorithm consists of essentially two parts. The first part is a series of preprocessing steps. The actual ray/surface intersection is the second part of the algorithm. Preprocessing involves selection of a bounding volume, construction of a bounding box tree, determination of boundary and silhouette edges in the patch, 39 40 and calculation of the surface normal. These steps, Newton's method as a numerical technique which is used for the ray/Bdzier surface intersection calculation, actual intersection algorithm, and illumination model are discussed in this chapter. -- 4.1 Preprocessing Steps Since the ray casting procedure is so computationally expensive, any preprocessing that can be performed to help reduce computations should be done. Preprocessing is done before ray-tracing is begun. This saves time by eliminating redundant calculations for every line/object intersection. The preprocessing steps to follow are as follows: 1. Selection of a bounding volume. 2. Construction of a bounding box tree. 3. Determination of boundary edges. 4. Location of silhouette edges. 5. Calculation of surface normal. 4,1,1 Selection of a Bounding Volume Bdzier surfaces are capable of modelling a wide variety of shapes which are not easy to model in other ways. Parametric surfaces, such as Bezier surfaces, are employed increasingly as the computer graphics field matures. Since such representations involve more sophisticated mathematics than the simpler alternatives, it is more computationally expensive to produce their images. For example, in ray 41 casting a Bezier surface, it might require thousands of floating point iterations to find intersections of a ray with a Bdzier surface patch [KAJ186, TOTH85]. This is much more expensive than finding the intersections of a ray with a polygon or a quadratic surface. The selection of a bounding volume helps to accelerate ray casting Bdzier surfaces. It is possible to use the convex hull of a Bdzier patch as its bounding volume. Such a box, however, is not very tight, since control points could be quite far from the surface they define. We present a different technique: bounding boxes defined by the exact minimum (Xmin, y~. and maximum (xm., y..) values attained on a surface are tighter than other bounds obtained using approximate methods such as convex hulls of the control vertices (see Figure 4.1). Points on the surface are evaluated, using x(u~v), y(u~v) and z(u~v) given in chapter 2, by varying two parameters incrementally. The increments are obtained dividing the parameters u and v by an integer. The forward differencing technique which is presented in chapter 5 is used to obtain the increments. This scheme effectively reduces the size of the void area. An intersection test with the patch is performed only if its bounding box is intersected by the ray. The information sought for an intersection test with a bounding volume is simply a yes-no response. The computation of the actual point of intersection is not necessary. First the transformation to make the ray coincident with the z axis is computed. The transformation is then applied to each of the eight vertices of the bounding box. A new bounding box is then determined using the transformed 42 Figure 4.1 The Extent to the Surface Q(u,v). 43 vertices. An intersection between the ray (z axis) and the box occurs if the signs of the new x,,i,, and x,,. are opposite as well as the signs of y, and y.. 4.1.2 Construction of a Bounding Box Tree The second step- in the preprocessing of a surface is to create a tree of bounding boxes. The entire scene is bounded by a global bounding box at the root node of the tree. Every internal node of the tree contains pointers to its children, and its bounding box is just big enough to contain all of its children's boxes. Each leaf node holds representative values of u and v for the section of the surface it represents; these values will be used later as an initial guess for a Newton process. If a ray hits a leaf box, the intersection of the ray and the patch must be within the box or very close to it. The mean parameter values of the points contained in the leaf box hit by a ray provide good initial guesses for the Newton iteration for the calculation of the parameter values of the intersection point. Bounding boxes are useful for complex objects, such as bicubic surfaces, when the object is fairly small. Since the time to find the intersection with a bicubic surface is large, but bounding box intersections are fast, the ray tracer can save time for all the negative tests. In the case where the ray does enter the box, some additional overhead is incurred; however, this cost is easily justified by reduced times for negative tests. 44 When the initialization (modelling) stage is complete, a patch is represented by its defining matrices (C = M P M1) and the parameters kept in its bounding box requirement is less than that required by the polynomial representation. 4.1.3 Determination of Boundary Edges It is useful to have a univariate description of each bounding edge of the patch. Bicubic surface patches have four natural boundary curves: Q(uO), Q(u,1), Q(O,v), and Q(1,v), each of which is cubic in one variable. This is easily derived from the standard bivariate coefficient matrix C = M P M' as follows: Q(u,O) = edge0(u) = [u 3 u2 U 1] [C]I[Q 0 0 I]T (4.1) Q(u,l) = edge 1(u) = [u3 u 2 U 1] [C] [I1 I If The expressions for Q(Ov) and Q(1,v) are similar. Twelve coefficients are required to describe a cubic curve, four each for the x, y, and z components and these can easily be computed from the C matrix. The coefficients for edge1(u), for example, are summations of the rows of C In addition, surface normal information along each bounding edge of the patch must be provided for use by both the shader and the silhouette detection routine. The silhouette detection and the normal calculation are explained in the next two sections. 45 4.1.4 Location of Silhouette Edg-es The silhouette edges of a surface, if it contains any, are defined as those curves on the surface along which the z-component of the normal vector is zero, assuming the eye is on the z axis. In general, since the order of the surface normal is quintic (as we will s~e- in the next section) the order of the silhouette curve is also greater than the cubic, but it is approximated here by a piecewise cubic interpolant so that the silhouette can be treated the same as any other edge. Boundary and silhoutte edges are shown in Figure 4.2. To compute a silhouette edge, the cubic approximation of the normal surface and the z =0 plane are converted to the Bdzier surface representation [SCHW82I. This provides a surface representation of the normals at each point on the surface. The normal vector of the surface at the point (u~v) is evaluated. A set of points which lie on the intersection line of the normal surfaces are obtained using subdivision and the convex hull property of BO-zier meshes. The surface equation is calculated using (u~v) values to determine a corresponding set of points on the surface which represent the actual silhouette in screen space. Connecting these points with straight lines is guaranteed to be within a user-specified tolerance of the true silhouette. 46 Boundary Edges Sithoutte Edge Figure 4.2 Two Types of Edges for Curved Surfaces 47 4.1.5 Calculation of Surface Normal The normal vector to a Bdzier patch can be found by taking the vector cross product of the tangent vector in the u direction and the tangent vector in the v direction. T'he Bezier surface patch is defined by Q(u'v) = UMPMTVT (4.2) where U = [u3u2 u 11, V = [3v2Vv 1], M is the Bdzier matrix given in chapter 2, and P is the matrix of control points. The u tangent vector of the surface Q(u4v) is aQ(U,v) - (UMpMTVT) aU ____ _______= -~MpMTVT -[3u' 2u 1 O]MPMTVT = U/MPMTVT (4.3) au auau and the v tangent vector is aQ(u,v) . a(UMpMTVT) = UMpM T aV UMpMT[3v2 2v 1 OT = UMPM TVI (4.4) 0 av Both tangent vectors are parallel to the surface at the point (u4v) and, therefore, their cross-product is perpendicular to the surface. Each of the tangent vectors is a 3-tuple since equation 4.2 represents the x, y, and z components of the patch. The normal is given by N = -)'r aQ(u,v) X dQ (U'V) - ~zxy- v - yL~tYI (4.5) . n Jau l" -YZ ZX .I 48 where x., y,,, z., are the x, y, and z components of the u tangent vector and x~, y,, z, are the x, y, and z components of the v tangent vector. This is a quintic expression in u and v. In our algorithm, we chose the method given in [CATM74] to calculate the normal: shades are computed by calculating a cubic approximation to the normal at each point on the surface in order to simplify the computation. Twelve coefficients are required to approximate the cubic curve. This way, the rendering system could treat positional and normal information in a uniform way. Each component of the normal vector can be approximated by a Coons patch, corner positions, derivatives in each direction at the corners, and cross-derivatives at the corners, X.(u~v) UCP/2r V1, where dv dv X"(1,0) X"(1' 1) dvIO dv (4.6)1 ddu ddv dv(46 du du dudv dudv and 2 -2 1 1 2 -30 1 C -3 3 -2 -1 CT -2 3 0 0 (4.7) 0 0 1 0 1 -21 0 *10 0 0. 1-1 0 0 49 The y and z components, Y, Z,, are treated similarly. The values of the matrix P,, P. and P., are evaluated at u=O0, u = 1, v= 0, and v =1. These functions approximate unnormalized normal vector functions, therefore, the normal vector is normalized before use. The elements iri the matrix P, are evaluations of the functions at the corners of the patch and they are given by X.(U'V) = Yz-"Z = (U/MpX'MTV)(UMpZMTV/) _ (UMp MTVI)(U/Mp.MTV) AX(uv = (U/MpYMTV)(UMpZMTV) + (UlMpY MTVI)(UMpZMTV) _ (U/MpYMTVI)(UlMPZMTV) _ (UMpYMTVI)(UI/MpZM TV) dX,,(U,v) =(Uqp Y M T V)(UM Tp ZMTV/) + (U1Mp MTV)(UMpZMTV//) dv- (UMpYMTV//)(UMpZM TV) _ (UMP MTVI)(UIMP M TVI) (48 d 2X .(u,v) = (U"1MP MVpZM/V) + (U"1MP MTV)(UMPZMT1 + (U'Mp MTVI)(U/MpZMTV) + (UIMpYMTV)(UMPZMTVII) - (U'MP MTVI/)(U/MpZM TV) _ (U/MpYMTV/)(UlMpZMTVI) - (UMp, MTVII)(UIIMpZMTV) _ (UMPYMTVI)(UI/MpZM TVI) 4.2 Newton's Method An intersection between a surface Q(u~v) and a ray R(t) occurs when and only when F(uyv,t) = Q(u,v) - R(t) = 0(49 (4.9) 50 Since bulk of the computation time for the whole intersection calculation is taken by the root finding procedure, it is crucial to use a highly efficient root finder and the Newton's method is the one used here. This method can give the required solutions to any desired accuracy. Newton's method is a procedure for calculating the root of a polynomial equation or a system of polynomidal equations. These equations can be solved iteratively by the Newton method provided that a good initial approximation is known. Newton's method converges quadratically, i.e., near a root the number of significant digits approximately doubles with each step. This very strong convergence property makes Newton's method the method of choice for any function whose derivatives can be evaluated efficiently, and whose derivative is continuous in the neighborhood of a root. In the case of a polynomial equation in one variable, Newton's method has a simple geometric interpretation. If x,, is our initial guess for the solution to the equation f(x) = 0, we calculate the tangent _f(x,). The next approximation, x, is given by the point where the tangent crosses the axis. Equation 4.9 is equivalent to F,= x (u,v) - D ,-t =0 F uv - DY -t =0 (4.10) 1 = z (u,v) - t = 0 where F is a vector, in the form of F = [W, FY F.]. The system F(uivyt) = 0 can be reduced to a system of two equations in two unknowns, eliminating t. Then we have the equivalent system 51 g ,(u,v) = x (u,v) - Dx-z (u,V) = 0 (4.11) g2(u,v) = y (u,v) - D, z (u,v) = 0 Letting s = (u~v), equation 4.11 can be written in vector form as g (S) = 0 (4.12) Newton iteration generates a sequence of successively improved approximations to the solution of a system of equations above, using the relation X1 = Xr-J1(X,)g(Xr) (4.13) where J is the Jacobian matrix defined as au av(4.14) ag2 ag2 au av. calculated at u = u,, v = v, and J-(x,.) g (xr,) is the Newton step. We can rewrite the equation 4.13 as I 11 = [1 [~ (U, V) (4.15) The system of equations above is solved iteratively provided that the Jacobian matrix, , is non-singular at each approximation point and there is a good initial guess to start with. There is even no need to calculate the inverse matrix since the inverse of 2x2 Jacobian matrix is trivial. Since P' can be expressed explicitly as 52r1 -J1 equation 4.15 can be rewritten as Ur+1 = UrD(g, g - g2- g) (4.17) vr~ =v -r D(g2- ag 1*a2 where D(u~v) =1/det(J). Once the Newton steps have been calculated with an initial guess (u,,v,0), equation 4.17 is used to evaluate (u,,v,) from (u0,,v,), then (u2,V2) from (u,,v,), and so on, until one of the following conditions are satisfied: 1. g1(u~,,+1.) I + 1g2(Ur~,,vr.j) I < tolerance 2. u or v are outside the parametric intervals 3. Ig1(u,,1,v,+1) I + 1g2(Ur.11,r1) I > lg1(U,,v,) I + 1g2(U,,V,) 1 4. n+1I > maximum number of steps If the iteration is terminated by a case 1, it is considered that the ray does hit the patch, and Ur+,,v,+, are returned as the parameters for calculating the coordinates and the derivatives at the intersection point. If the iteration is terminated by cases 2 or 3 or 4, it is considered that no intersection happens, and NIL is returned. In this implementation, tolerance and maximum number of steps are chosen as 0.01 and 5, respectively, although these values can be varied according to the surfaces used. It was found all Newton calls were terminated by cases 1, 2, or 3, and 53 that about 90% calls required only one iteration. The search algorithm is performed in the surface parameter space only. The ray parameter is ignored, even while performing Newton iteration, until a solution is found. The Newton iteration finds a pair (uyv) such that a point Q(u,v) on the surface is also a point in a given ray. The Newton method, in high-level pseudocode, is shown in Figure 4.3. 4.3 Algorithm Primitive objects such as bicubic Bdzier surfaces are rendered one ray at a time. The algorithm developed utilizing this ray casting technique for bicubic B6-zier surface is as follows. A list of tree nodes to be tested for intersection with the ray is sorted by the minimum distance from the origin of the ray to the bounding box of the node so that the next node in the list is always closest to the origin of the ray. The list, at the beginning, contains only the root node of the tree. At each iteration, the closest node on the active list is chosen and removed. If the node is an internal node, each of its children is tested in turn for possible intersection with the ray. If the ray hits the bounding box of a child, the child is put back into the list. The nodes not intersected by the ray are thrown away, thereby saving memory. If the node is a leaf node, then we use the contained (u,v) parameter values to initiate the Newton's method. The loop exits when either the list of nodes to be tested is empty, or an intersection is found whose distance from the origin of the ray is less than the minimum distance from the origin to the next node in the list. 54 Newton (surface, treenode, ray, intersection, t, u, V) calculate C matrix /* C.' C:, C7 get initial values for u and v from treenode success = failure = FALSE while (not success && not failure){ calculate jacobian, J calculate det(J) if (det(J) = 0){ approximate J calculate det(J) again calculate JV calculate g,(u,v), g,(u,v) error = I g,(u~v) I + I g2(u,v) if (error < tolerance) success = TRUE else if (iterations > =maxit) && (error > last error) && (u u<0) H(u >1) I(v < 0) I(v >1)) failure =TRUE else I calculate Newton steps calculate ur.I, Vr+l increment iteration count if (success){ intersection = point on surface at (u,v) t = distance from ray origin to intersection point Figure 4.3 Algorithm for Newton Method 55 The intersection Procedure which is the heart of the whole algorithm is shown in Figure 4.4. The bicubic surface primitive engine (BiSPE) is based on this intersection procedure. This algorithm which executes the task of the BiSPE is suitable for parallel implementation and will be explored in detail in chapter 5. The algorithm follows T'horria's [TL-1M831 approach for collecting the intersection depth results from the primitive processors and for determining which surface is visible along the current ray. The intersection with the maximum z coordinate is the visible surface on the image plane. 4.4 Lighting Model Once a rendering algorithm has resolved the geometric question of what spot on what object a pixel represents, it needs to resolve what color that spot appears to the observer. The visible surface for a pixel is a small patch of surface. The shading model takes into account the composition, direction, and geometry of the light sources, the surface orientation and the surface properties of the object. Surface normals can be used to indicate the shape and orientation of a surface patch, which in turn is used to determine the shading value. Ray casting is the process of determining which object in a scene is first encountered by a r ay emanating from the camera view location and passing through a pixel on the display screen. From the list of ray-object intersections the first visible intersection is chosen. Since there is no further tracing the rays from the intersection point, the conventional shading model due to Phong [FOLE82, ROGES51 is used 56 intersect-surface (surface, ray, intersection) initialize empty list add root node of surface's tree to list while (list not empty) { remove next node from list if (node = leaf node){ call Newton process if (intersection found) && (closer than previous ones) save it as mnt found else for each child of node if (ray intersects bounding box) add child node to list if (mnt found < min dist to nextnode) exit loop Figure 4.4 Ray/B6-zier Surface Intersection Algorithm 57 with ambient or background lighting, diffuse shading due to surface orientation in relationship to the light source, and specular reflection or highlights due to the location of the light source and the eye as well as the surface orientation. Phong's equation for the intensity of each pixel consists of the sum of three terms given by +mn Idiffi.se =k I + d , N L + k I,(WVS') d+K d+K (4.18) =kl, + I,(kcosE) + ko'D 0 d+K dkcs~ where k4, kcd, and k,~ represent the ambient, diffuse and specular coefficients respectively for the object and 0 :5 k.,kd, : 1, 1. and I, are the intensity of the ambient light which is a constant throughout the given scene and the intensity of the light source respectively, d represents the distance from the perspective viewpoint to the object, K is an arbitrary constant that is included to prevent the denominator from approaching zero when d is small, N is the unit normal to the object at the point, L is the unit vector having the direction to the light source, V is the unit vector from the eye (origin) to the object, S is the unit vector of the specularly reflected ray, o is the angle between N and L, and 0 is the angle between V and S (see Figure 4.5). Diffuse and specular components of the lighting model are based on surface orientation. An object face perpendicular to the direction of the light source is illuminated differently from one whose face is angled to the direction of the light 58 source. The normal to the surface at the point of intersection describes the surface orientation. Each point in the scene visible to the viewer must have the normal to the surface at that point computed prior to the lighting model implementation. For evaluating the intensity of the light at each pixel, three color components of the intensity should be separately calculated. For an RGB video monitor, color components are red, green, and blue. Parameters for intensity and reflectivity then become three-element tuples, with one element for each of the color components. For instance, the triple representing the coefficient of diffuse reflection is (14,, ke kdb). However, since the constant value k, for specular reflection is not dependent on the color of the surface, kc, is the same for all three color components. Hence, the intensity calculation breaks into three equations: I, = k.,rI. + i *(k d,(N.L) + k,(V.S)") ard+K I = k I +K (k (N.L) + k,(V.Sy) (4.19) I= k ahIaj, + *1b (k A (N.L) + k,(V.S) d+K 4.5 Results and Discussion The ray tracing technique is an object oriented algorithm. Rather than group the software by procedure, we group it by data structure. Thus, instead of collecting all the intersection methods into one file and all the normal vector formulas into another, we split the problem the other way, collecting procedures for each primitive 59 N Figure 4.5 Diffuse and Specular Reflections r --9 60 type into a file of its own. If there was more than one primitive such as spheres, polygons, there would be one file containing patch-related routines, one file for sphere-related routines, another for polygon-related routines, etc. This has the advantage that primitive-dependent information can be hidden in data structures local to each file andy (he procedure interfaces can be very simple and generic. Adding new primitives to the system becomes easy with this scheme. Since the details of each primitive's data structure will be local to that sub-module, all operations on the primitive are supported by generic procedures. Each object shape is defined using its individual parameters. The intersection routine is object oriented also. Each object shape uses different procedural techniques to determine an intersection. 4.5.1 Complexity and Performance Analysis In this section, the precision required in the critical parts of the algorithm is examined. The algorithm is bound almost exclusively by floating point speed, not by storage management, not by dataflow complexity, and not by program control complexity. We intersect each ray with the environment by simply testing each and every primitive object retaining the nearest point of intersection, if one exists. This has a time complexity which is linear in the number of objects. All the intersections are sorted in order of increasing parameter t, and the first such intersection is returned. To find the intersections, we need to solve two simultaneous equations depicted in 61 equation 4.11. The number of floating point operations (flops) necessary at each step solving these equations are summarized as follows. 1. Calculation of g1(u~v) and g2(u~v). a. Computation of the coefficient matrix, C = M P M. Three 4x4 matrix multiplication reqpiires 40 subtractions, 16 additions, and 24 multiplications (knowing M contains I's and 0's in certain positions, the calculations are simplified). This is just for one component, therefore for three of them it adds up to 120 subtractions, 48 additions, and 72 multiplications. b. Computation of x(u~v), y(u~v), z(u~v). It takes 4 multiplications to calculated' u', v2, and V3. Multiplication of 1x4 U matrix with 4x4 C matrix and multiplication of resultant with 4x1 V matrix requires 15 multiplications and 15 additions for one component bringing the total to 49 multiplications and 45 additions for all three components. c. The solution of equation 4.11, once the necessary terms are calculated, requires 2 multiplies and 2 subtractions. 2. To solve the equations above, we need to calculate Newton iteration procedure. a. Calculation of partial derivatives for x, y, and z components requires 49 multiplications and 39 additions. b. Calculation of partial derivatives for g, and g, needs 6 multiplies and 4 subtractions. c. Calculation of det(J) requires 2 multiplications and 1 subtraction. d. Calculation of D requires 1 division. 62 e. To solve equation 4.17, 6 multiplications and 4 subtractions are needed. 3. Computing t takes 1 addition. It is not hard to see that, taken at its simplest level, ray casting is highly computationally expensive. These results are for ray/patch intersection for one iteration. Since it takes only one iteration 90% of the time for Newton iteration to converge, the total number of single precision floating point operations are tabulated and are shown in Table 4.1. For a resolution of 1024x 1024, there are one million rays need to be traced per frame at 30 frames per second to achieve real-time performance. A real-time processor has to dispatch the computation for each ray in 33ms. Thus, if only 10 floating operations are required per ray, real-time ray casting requires a 300 megaFlop processor. As we have seen, we have at worst case approximately 448 floating point operations per ray/patch intersection. Calculation yields a complexity of 3 Hr/frame on a quarter megaFlop machine (VAX 11/780). The state of the art for general purpose machines is roughly 200-1000 megaFlops. It is clear that in order to realize real-time ray-traced images we need substantial advances both in algorithm development as well as in special purpose hardware design. In this chapter, the analysis of the algorithm is given without taking advantage of any coherence. It seems that incorporating very strong ray-to-ray coherence is a must for a practical ray tracing system. Incoherence can be an advantage when seeking to parallelize the algorithm. The opportunity for virtually unbounded parallelism is afforded by the fact that the each pixel computation proceeds 63 Table 4.1 Number of Floating Point Operations for Ray/Patch Intersection Term Multiplication Addition Subtraction Division g1, g2 123 93 122 ur+i, V'+i 61 39 8 1 t 1 Total 184 133 130 1 independently of all others. Because it uses well known numerical procedures, fast floating point pipelines will without any difficulty directly accelerate the speed of image generation. There are several optimizations which are carried out on the ray/surface intersection calculations. First, rays are transformed to lie along the z-axis, then the same transformation are applied to each candidate object, so that any intersection occurs at x = y = 0. This simplifies the intersection calculation and allows the closest object to be determined by a z sort. The intersection point can then be transformed back for use in shading calculations via the inverse transformation. Second, by not computing the normal in the intersection routine, we make each intersection test cheaper. Finally, using integer arithmetic helps speed some computations. 64 4.5.2 Simulation Results A ray casting image rendering system has been designed to make the code easily expandable and maintainable. The validity of the algorithm was verified by software simulation. The algorithm is implemented in the C language on an IBM AT/386 computer witf a 80387 floating point processor. An object oriented design philosophy was used for the program, and the resulting code has been made robust. A bicubic Bdzier surface is presented to the program as three matrices of x, y, and z separately. The memory requirements for the bicubic Bdzier surface are determined largely by the size of the tree of bounding boxes. Performance characteristics for various surfaces are given in Table 4.2. Although, CPU times are often misleading, being dependent on programming style, code optimization, and hardware, we still supply these in order to give an indication of the code complexity. One of the major advantages of ray tracing algorithms is that they provide a uniform geometric framework, the line/surface intersection, for displaying variety of different surface types. Unfortunately, this increased generality increases the computation time significantly. Therefore, we consider implementing some part of the algorithm in hardware. One approach is to exploit the independence of the calculations from pixel to pixel and perform these calculations in parallel. The pixel level approach of the proposed architecture reduces the geometry that is typically involved for solving intersection problems to the comparison of depth values. Hardware implementation of the primitive engine algorithm finding the roots of polynomial very quickly and in a robust manner is discussed in the next chapter. 65 Table 4.2 Performance Characteristics for Various Surfaces Surface Teapot* Sphere' Doughnut' Number of patches 32 8 16 Number of control vertices/patch 16 16 16 Total number of rays 64000 40000 64000 Average boxes checked/ray 44.6 24.32 31.12 Average Newton calls/ray 0.78 0.89 0.87 Average iterations/call 1.08 0.92 1.03 Modelling time 2% 1% 1% Bounding box intersect time 46% 45% 48% Newton time 23% 28% 25% Shading time 15% 14% 11% Miscellaneous time 14% 12% 15% Total time (hrs:mins:secs) 0:19:23 0:8:07 0: 12:45 Control vertices data for teapot is taken from IEEE CG&A magazine, Vol. 7, No. 1, January 1987. 1 Control vertices data for doughnut and sphere is taken from IEEE CG&A magazine, Vol. 7, No. 4, April 1987. CHAPTER 5 THE PARALLEL IMPLEMENTATION OF THE ALGORITHM Ray casting algorithm has the advantage of being very simple, highly parallel in nature, and calculation intensive enough to produce good speedup even when each processor is assigned to a different pixel. To accelerate ray tracing some of the operations, especially intersection computation, can be performed in parallel since the computation of each pixel is independent of the others. This fact allows the ray intersection of each primitive, a bicubic Bdzier surface in our case, to be computed in parallel. Each surface is assigned to a primitive engine (processor) which determines the intersection depths. Then, these intersection results are passed to another processor, depth comparison logic, whose task is to determine the visible surface by sorting the intersections in depth. If the ray doesn't intersect the surface, that is the intersection is behind the observer and should not be displayed, the roots of the polynomial are complex and no results are passed to the depth comparison logic. If the ray is tangent to the surface, one real value is passed to depth comparison logic. If the ray intersects the surface in two or three points, the real values are passed to the depth comparison logic. The algorithm presented in the last chapter for the primitive engine was developed to efficiently use the low cost parallel processing possible in VLSI. In this 66 67 chapter, bicubic-surface primitive engine (BiSPE) based on bicubic surface concepts is presented. The whole architecture called GraPE configuring the system in a pipeline fashion with the primitive processing units (PEs) implemented in parallel is explored by Iskandar [ISKA9O]. The system presented in his dissertation can be summarized as follows:- A database manager at the top of the hierarchy distributes several objects to several pipelined viewing-transformation engine (VTE) and object processor (OP) combinations whose outputs are then distributed to a collection of primitive engines (PEs). The PEs broadcast the results of their computations such as color, intensity and depth, to the compositing network designed by Fleischman [FLE1881. This pipeline d-parallelI GraPE architecture is shown in Figure 5.1. It is the responsibility of the compositing network to combine these pixel attributes into the final image. Iskandar claims that a workable system can be attained with as little as one object processor and one set of primitive engines such that the set must contain one engine for each kind of primitive supported by the system. Therefore, BiSPE design appropriate for VLSI implementation can be incorporated into GraPE primitive engine as part of the bicubic surface generation unit. The architecture then can be connected to the compositing network. The details of the connection still need to be worked out. In implementing the algorithm in parallel, first step is to calculate the coefficients of parameters u, v in equation 2.3 and find out how many additions, shifts, multiplications, etc. we have. Once the coefficients are known, the second step is the solution of the non-linear equations. This might require more computational 68 PROCCEM SSP POESO DIASPLRAYTIONP4'E C~~ ~ ~ L PP AR Figure 5.1 Pipeline d-ParallelI GraPE Architecture 69 power from each primitive engine than is feasible to place on a single chip. That's why we need some simplifications that can be applied to the computation of coefficients and solution of the equations. The usage of coherence and forward differencing results in simple enough hardware such that one or more primitive processors could be pl'aced on a chip. This chapter examines these techniques to reduce the computation given in chapter 4. 5.1 Coherence Although an algorithm which treats all rays in the same way has the advantage of being uniform, its efficiency is limited without taking advantage of coherence. It is worth investigating whether coherence can be kept in the parallel implementation, and if so, to what degree. One must realize that as the number of processors in a system increases, the granularity of coherence is reduced. The primary coherence properties are scanline to scanline, pixel to pixel and area coherence. In this dissertation, the first two are used. In the former, adjacent scanlines intersect nearly the same set of edges which are then updated incrementally from scanline to scanline. In the latter, for interpolation across a scanline, a pixel would have a small change in its value from its neighbors if it is from the same span. The coherence along a scanline, that is pixel to pixel coherence, is in the fact that only one parameter in the ray definition changes as the ray is cast through the sequence of pixels. The algorithm takes advantage of the coherent structure of a scene to reduce the computations. The idea simply is that adjacent pixels on a 70 scanline are likely to have the same characteristics. In our case, the set of visible object spans determined for one scanline of an image typically differs little from those of the next scanline. Similarly, other properties of each scanline are quite like those of the preceding scanline. Calculations performed are dramatically reduced in the algorithm using fhdse characteristics. Since the root-finding algorithm requires an initial value, coherence is exploited by beginning with the previous scanline's solution for the current scanline. During image generation, the sample points on the image space are processed in scanline order. At the beginning of each scanline processing, every bounding box tree is traversed. g1(u~v) (see equation 4.11) has to be calculated only once for a sample point. Similarly, g2(u~v) only has to be computed once for a whole line of sample points. In processing a new sample point, instead of calculating g,1uv) from equation 4.11 again, it is updated by the increment calculated with the forward differencing method. For each scanline, g2(u~v) is also obtained incrementally. This way, a good deal of computation is saved. 5.2 Forward Differencing- Technique An efficient way of repeatedly evaluating the bicubic is by using forward differences [FOLE82]. Forward differencing, an incremental technique, is used to reduce the hardware required to implement the primitive processor. This technique for rendering curves and surfaces with adaptive subdivision is explored in [LIEN87] and [SHAN88]. In these papers, shading and image mapping on a bicubic surface 71 is performed by drawing many curves very close to each other so that no pixel gaps remain between them. Livadas et al. [HIVA88] used this incremental technique for the computation of quadratic surfaces. In this section, the scanline parameters at the beginning of each scanline are updated, taking advantage of coherence, using forward differencing technique. - The scanline parameters independent of the vertical parameter are constant from scanline to scanline and thus require no computation between scanlines. Computation of the forward differencing steps follows. The forward difference of a function f(t) is Af(t) = f(t+8) - f(t), 8>0 (5.1) We can rewrite this as f~ )= f(t) + Af(t) (5.2) So, if we knowf(t) and Af(t) we can determine f(t + 6). Rewriting the above equation in iterative terms, we see that this is where we evaluate f in constant steps of size 6, so n and t are related by t. = 6 xn, and f. = f(t,). For a bicubic surface Q(uv) = UCVT, the forward differences are AVx(u'v) = X(u,v+8) - X(u'v) ~~x(u,v) = A X(u,v+8) - Axu) (5.4) A 3x(u,v) = A 2x(u,v+b) - A 2~~v 72 The equations above are forward differences on v. Forward differences on u are calculated similarly. A,x(u~v) is a second-degree polynomial in v when u is kept constant. A', x(u~v) is linear in v whereas the third forward difference is a constant, so further forward differences are not needed. In the algorithm', ihe initial condition calculations, that is u = 0, v = 0, are done by direct evaluation of the equations. If we put these equations into a matrix D as shown below X(O,0) A X(0,O) A 2X(0,O) A 3X(0,O) 2 3 D. AMX(0,0) A. (A Vx(0,O)) A. (A "X(0,0)) AM (AVx(0,O) (5.5) 2 2 22 2 AX(0,0) A~ (A ~x(0,0)) A' (A ~X(0,0)) A u (A 3 X(0,0) 3 32 3A3XO) A~x(,O)A "(A ,X(0,O)) A3 (A 2X(0,0)) Au A~x00 then 1 Q 0 0 a 33 a 12.*a3, a3O 1 0 0 0 D OEE2 El a 23a22 a, a0 6 0 (5.6) 0 0 2E' 6 G3 a 13 a12 al, a1o0 5 2 2 82 0 .0 0 0 6 G3 a 03 a02 a01 a 0 0 83 6 8368 73 Rewriting the equation 5.6, D = E(G) C EM~ (5.7) where c and 6 are the parametric increments in the u and v directions, respectively and C is the 4x4 coefficient matrix. Because we are dealing with three functions, x(u~v), y(u~v), and z(u~v), there are three corresponding sets of initial conditions, D.= E(e) C.E(S), D, = E() C,E(S), and D_, = E(e) C.E(S). We will concentrate on the x component since the calculation of others are exactly the same. Computation of x(0,v) and x(u,0) in increments of 6 and c, respectively is done by applying a forward-step operation F to the matrix D, Then, we obtain the matrix below doo+d10 d01 +d11 d02 +dl2 d03 +dI3 DD d10+d2O d11+d2j d12 +d2 d13+d.3 (5.8) DD d20+d3O d21 +d3l d2+d32 d23 +d33 d~ d d 32 d33 where F=0 1 10 (5.9) 0 01 1 .0 00 1 74 After a forward-step operation is applied, the first row of D, has the terms x(e,O), A,r(e,O), A~(e,O), and A~x(e,O). These quantities are now used to compute x(,v), using forward differences. Applying F repeatedly, the first row of Dx is used to compute x(2e,v), and so on. Substituting column for row, that is transposing D., and applying F to D.,, we 6alculate x(u,O), x(u,S), x(u,26), and so on. The procedure calculating forward differences is shown in Figure 5.2. Using forward differencing, a polynomial of degree n can be evaluated at a sequence of equally spaced points by only n additions [CONT65]. This is considerably better than evaluating the polynomial all over again. However, error accumulation can be an issue with this approach, and sufficient fractional precision must be carried to avoid it [BART871. The forward difference basis allows parallel additions which are suitable for a pipeline implementation. We compute the intersection at one pixel and apply forward differencing to get the next pixel. The c step size for u and 6S step size for v are adjusted so that the curve steps along in approximately one pixel steps in screen coordinates. Forward differencing method has the advantage of producing picture quality equivalent to adaptive subdivision without the memory stack management overhead of recursive subdivision. The relatively poor performance of adaptive subdivision is due to the fact that a subdivision operation takes significantly more computation than a forward difference operation. Forward differencing also makes patch rendering performance competitive with polygon rendering. The technique operates in u, v space but polygon methods operate in the screen scanline order, 75 cubic-visible-surface algorithm for i =I to numpatches{ read patches /P, Pyl P, I for i =1 to numpatches { calculate coefficient -miatrices I for i =1 to numpatches{ D= E(E) x C., x E ' Dy E(,E) x Cy xE( T =z E(,E) x C, xE( T I DU,, DUY DU, F F F x D,,; x Dy; x D,; /* C", Cq, C2 */ /* get initial values * /* calculate x(u,v), y(u,v), z(u,v) * /* in increments of E * D2 = DzT DV, DVy DVz F F F x X X /* calculate x(u,v), y(u,v), z(u,v) * /* in increments of S */ for i = I to num-scanlines { update scanline parameters for j = 1 to numpixels{ calculate intersection determine lighting color pixel increment scanline parameters increment inter-scanline parameters I Figure 5.2 Procedure for Calculating Forward Differences 76 therefore, once the initial parameters are found, forward differencing does not require a transformation from screen space to image coordinates as the polygon method does. 5.3 Pipelined and Parallel Architecture For applications that require fast transformation such as ray casting and rendering of complex shaded images, a parallel, pipelined multiprocessor architecture can be used to achieve high throughput. A parallel architecture offers an increase in speed that grows almost linearly with the number of processors used. To extract the maximum efficiency from a parallel architecture, key bottlenecks must be identified and an architecture that is best suited to overcoming those bottlenecks must be chosen. In a pipelined architecture, each processor performs a unique operation on the entire data set, or object. Once, it has completed its operation, such as rotating an object, it passes the data to the next processor in the pipeline, which performs another function, such as clipping. While that processor is clipping the current object, the previous processor can be performing geometric transformations on the next object. The algorithm proposed is designed utilizing both pipelining and parallelism to achieve real-time performance. It is developed such that each intersection processor can be placed in a VLSI chip. In a practical display system several of these chips may be used to recursively calculate surface-ray intersections at pixels or groups of pixels. The bicubic surface primitive engine (BiSPE) has the capability to 77 compute the intersection of a ray and bicubic surface with sufficient throughput to allow real-time interactive display. Pipelining helps the system achieve high throughput. The next computation can be started before the previous result is available, but by the time the last circuit is reached, the result must be there. The bicubic surfaica computation unit, BiSPE, is responsible for computing the depth and color intensity values of pixels. A hardware implementation of the depth computation unit consists of multipliers, adders, and shifters. BiSPE is based on the algorithm given in chapter 4 for rendering bicubic surfaces. The underlying principle behind the algorithm is the representation of primitives as curved surfaces, not polygons. The block diagram of BiSPE is shown in Figure 5.3. A parallel computation in x, y, and z was chosen because it operates three times as fast as computing these values one after another and there appears to be space enough on the chip to allow it. Based on bounding box tests, each PB computes the pixel representation of each primitive sent to it. The BiSPE engine produces two values, the color of the surface at the display point and the depth of that point. A combining circuit then determines the visible scene dynamically while several BiSPE engines produce the next pixel's color/depth information. 5.4 Results and Discussion The difficulty in achieving a real-time performance in a graphics system is largely due to the insufficient bandwidth to the image memory. Consider a 1024 by 78 Read Patch Control Points Compute Bounding Box Compute Intersection F Test for Hit F Compare to Nearest F Update Nearest t U, V, t Figure 5.3 Bicubic Surface Primitive Engine repate for evwy ray 79 1024 display; for an image update rate of 30 frames per second, each pixel on the raster display must be updated every 31 nanoseconds or less. Current VLSI fabrication technology makes it possible to fabricate a 32-bit CPU on a single chip. However, to achieve high performance from that processor, the architecture and ithplementation must be carefully designed and tuned. The MIPS processor incorporates some new architectural ideas into a single-chip, NMOS implementation. Processor performance is obtained by the careful integration of the software (e.g., compilers), the architecture, and the hardware implementation. This integrated view also simplifies the design, making it practical to implement the processor using the NSF supported fabrication facility (MOSIS). One method of improving the performance of ray casting, to which the method is particularly well suited, is to introduce parallelism. Since each ray is independent of its neighbors, it can perform its computation in parallel with others without interference. An alternative use of parallelism would be for each object to have its own processor, and to supply on demand the z values of the intersections of a given ray with the object. This is analogous to the "processor per polygon" approach to parallelism in conventional rendering but at a higher level which in principle allows more advantage to be taken of any special properties of the object. An advantage of ray casting, and a possible justification for its use, is that it avoids the need to make polygonal approximations of curved surfaces. Given an analytic specification of the surface, all that is necessary is that it should be possible to solve the ray-surface intersection test. 80 5.4.1 Complexity and Performance Analysis The depth computation unit (DCU) or bicubic surface primitive engine (BiSPE) analysis examines two areas: complexity and performance. Complexity is estimated for discrete construction, for VLSI fabrication, and for gate-array construction. Performd.iice is examined with respect to image resolution and DCU processing speed. DCU's complexity is measured utilizing gate count estimate which is determined by partitioning the DCU conceptual hardware organization into individual functional logic blocks. Then, the estimated gate count of each functional block is determined and totaled to provide a gate count estimate of a DCU. This technique provides an estimate of expected complexity for integrated circuit fabrication. It also provides an estimate of board-level complexity for an off-the-shelf integrated circuit implementation, which is determined through totaling the functional logic package types used. It should be noted that performance is usually enhanced for a realization by judicious use of additional gating, which may alter the estimated gate count. When the architecture is built in a parallel and pipelined fashion, matrix multiplication takes considerably less computation than given in section 4.5.1. If we have four processors running in parallel, multiplication of the first row of M (Bdzier matrix) with the first column of P (matrix of control points) requires four subtractions only. This computation where MP is MxP is given below. 81 MP [01[0] = x - 3U20 + 3x10 - x( = (x3,-x,) - 2(x2,-x10) - (X20-X10 Calculation of first row of M with the other columns of P, namely MP[O] [1], MP[O] [2], and MP[O] [3], is exactly the same. Multiplication by 2 is just shifting 1 bit left and doesn't rei~jire extra hardware since it is hardwired. On the average, matrix multiplication when it is done in parallel takes one clock cycle. Pipeline (propagation) delay is three add delays and after every delay one result is obtained. Computation of the second row of M with P requires three adds and two subtracts: MP[1][O] = 3(x00-2x10+x2) = 3[(xoo-x0) + (x~o-x0)] = 2[(xoo-xld~ + (x~o-x0)I + [(x00-x10 + (x20-x10] For the third row, it takes one add and one subtract: MP[2][0] = -3x00 3x10 =(x10-x00) + 2(xl0-xw) The units to calculate elements of each row are shown in Figure 5.4. The results above show that multiplying three 4x4 matrices costs 22 additions/subtractions for one component. Computation of this matrix with U and then multiplication of the resultant with Vrequires 3 multiplies and 3 adds each. So, for x(u~v) 6 multiplies and 28 additions/subtractions are needed. In total, calculation of x(u4v), y(u~v), and z(u~v) requires 84 adds/subs and 22 multiplies. Rest of the computation is the same as the rest of the computation given in section 4.5.1. The number of floating point 82 - - SHLI X 10 X 1 SHL 1 3[(x. - X, 0) + (X. x) X10 0 SHL 1 3(x,,- X..) Figure 5.4. Block Diagram of the Matrix Computation Unit 83 operations when the intersection calculation is implemented in parallel is shown in Table 5. 1. The total number of floating point operations to calculate the intersection of a ray with a patch at a pixel with one iteration is 177. Clearly, there is a substantial saving compared to 448 floating point operations when the algorithm is implemented sequentially. The gate count of 32-bit full adder is 518 [KUCK78]. The gate count for multiplier is 15,284. With the iterative method, division requires 6 multiplications and 3 subtractions. So, for 177 operations the total gate count is approximately 1,000,000. In fact, gate counts of 5000 are often used in multipliers, therefore this circuit can be realized with a lot less than 1,000,000 gates. Because of the finite number of bits used in representing number, every mathematical computation has the potential of introducing a round-off error in its result. To prevent this error, guard bits are used. This means operands of any Table 5.1 Number of Floating Point Operations for Ray/Patch Intersection Term Multiplication Addition Subtraction Division g1, 92 20 42 44 Ur+i, Vr+i 33 27 9 1 t 1 Total 53 70 53 1 84 mathematical function must be represented by more bits than the result. The number and type of functions performed dictate the extra precision (guard bits) required by the operands. If the result of every operation is rounded, the error is plus or minus one half of the least significant bit. Thus, for a number represented by n bits, the worst case error is 2-'*) For multiplication and division, the least significant bit (LSB) of the result may be wrong. Ergo, for these operations, operands with higher precision than the result must be used to counter the er-ror. For addition/subtraction, the largest errors occur in producing differences in nearly equal numbers. These operations were eliminated when possible. Depth computation is handled by the depth computation unit (DCU). Figure 5.5 shows the components of DCU. This unit solves for the roots of equation 4.11. The number of bits required for each DCU's component depends on the required precision at the output. We use 24 bits for the depth information, since the depth resolution of 24-bits is satisfactory for high-end graphics devices. Since only positive values are used and negative intersection depths are ignored, the display space is limited to 23 bits of precision in the Z dimension. This limit dictates the precision on the X and Y dimensions to be 23 also. The result at the output, namely the parameters u, v, and t, requires 23-bit precision as discussed above. When the precision of the DCU's components is traced back from the output, we get 32-bit precision at the input. For real-time animation, a minimum image update rate is 10 frames per second. A 30-frames-per-second image update rate is more typical for high quality P Py PZ M y-. y y M M U U r ~ PxD X M MT )U- 0 Ur X. P r Z - iP M M T ZU r Z v rv 9 d91 dui dgv d92 -du dg2 u,v i ~- u r+I If i i v r+1 Figure 5.5 Block Diagram of the Depth Computation Unit 1/X D vr told 86 animation. This kind of image update rate leaves 33 milliseconds computation time for each image. This is enough time to render 60 primitives (patches). We can see that quite complex images can be built from this number of primitives. For the minimum image update rate, three times more primitives can be rendered. Many more primitives can be rendered if the clock frequency can be increased. A 100 MHz clock will double the primitives that can be handled. The above computation estimates the performance of one primitive engine. Several of these primitive engines can be running in parallel. The potential performance of the whole system then will be the performance of one primitive engine multiplied by the number of primitive engines in the system. We just estimated scene complexity by the number of primitives that can be rendered within one-frame time. Another measure of scene complexity is based on the screen resolution that can be managed by the system. This can be found by determining the pixel time. The pixel time is the time available to process one pixel. The pixel time depends on both screen resolution and image update rate as depicted by the following equation procssig tie =(Image update rate) (Screen resolution) From the above equation, processing times for different image resolutions are shown in Table 5.2. 87 Table 5.2 Processing Time for Various Screen Resolutions Screen Resolution Processing Time (ns)640 x 480 325.5 1280 x 960 81.4 1280 x 1024 76.3 1280 x 1280 61.0 1400 x 1280 55.8 2048 x 2048 23.8 Note :Image Update Rate = 10 frames per second Since the system is implemented in a pipeline fashion, its pixel time is determined by the slowest components in the pixel computation unit. These components are the floating point add or subtract units. Using the i486 as reference, a floating point add or subtract takes 5 clock cycles. The algorithm requires only 177 flops per iteration step compared to 419 flops for the method presented in Sweeney and Bartels [SWEE86]. Our algorithm takes only one iteration to converge whereas theirs take two or more. The normal computation takes 216 flops when theirs require 407 flops. Kajiya's method [KAJI82] takes 6135 flops in the worst case. The interval Newton method [TOT-85] involves similar figures. 88 5.4.2 Simulation Results Clearly, the intersection of a ray and a bicubic Bdzier surface is not a simple task. In spite of the complexity of the mathematics involved, the actual implementation is simple enough to allow convenient hardware implementation of the algorithm. The architecture developed to implement the algorithm for the bicubic surface primitive engine, BiSPE, is shown in Figure 5.3. The algorithm correctly handles the difficult cases of the ray tangentially intersecting a planar patch and intersections of the ray at a silhouette edge of the patch. It can be broken down into modules. The granularity (the size of tasks which are assigned for work) of the modules depends on what kind of hardware or architecture they are implemented on. One possibility of a coarse architecture is to implement the modules as firmware for a general purpose microprocessor with floating point support such as the Intel i860 or i486 (80486). Of course this type of implementation wastes a significant amount of the computing power and real-estate because of the general nature of the processor. The ideal implementation would be a medium-grain architecture with only the time-consuming tasks implemented in silicon. This is a type of microcoded processor. This kind of architecture makes it possible to implement the algorithms for different primitives on one type of processor. A fine-grain implementation is where every step in each algorithm is hard-wired. This of course will yield the fastest processor. However, due to complexities of the algorithms, this kind of architecture needs a considerable amount of silicon. 89 Depth computation unit, DCU, can be implemented in VLSI by today's technology. The difficulty may arise in implementing several DCU in a single chip. The goal is to implement the complete system on a single board, and several DCUs may be required to be in one chip. With the advancement of VLSI technology, this will be possible in near future. CHAPTER 6 CONCLUSION 6.1 Summary This dissertation presents the derivation of a ray casting algorithm for bicubic Bezier surfaces. A parallelized version was proposed. The major feature that distinguishes the algorithm lies in the way the primitives are rendered directly from the bicubic form, while most other systems are based on polygonal primitives. I had two goals when I started this research. One was to do the intersection computation in short time so that the use of patches in an interactive environment will be attractive. My second objective was to design the algorithm so that it can easily be implemented in VLSI chips. The two goals set at the beginning of the research have been accomplished: a computationally regular and efficient process for the display of bicubic Bdzier surfaces is devised, and a novel pipeline with hardware enhancements for image compositing and pixel production schemes is utilized. The algorithm developed is simple, efficient and fast. The method presented here compares well with the methods given in Sweeney and Bartels [SWEE86], Toth [TOTH85], and Kajiya [KAJI82I. The research focused on a numerical technique called Newton iteration to calculate the intersection points of a ray and a bicubic Bdzier surface. There are several reasons for selecting the Bezier representation. 90 91 The Bezier control mesh provides a rough approximation to the surface, and thus makes designing test cases relatively simple and more intuitive than if some other representation had been chosen. In this research, a processor is assigned to a primitive in the display. So, a primitive processor is the intersection computation hardware for a single surface. The points of the intersection are the roots of the equation Q(u~v) - D f - RO = 0 in terms of the parametric variables u, v, and t. After obtaining the roots of the above equation, coherence and forward difference are used to reduce the number of operations required. Usage of these methods results in simple enough hardware such that one or more primitive processors can be placed on a chip. It appears that the algorithm is simple enough to be implemented in a VLSI chip in the future. The purpose of the chip is to calculate the intersection point of a ray with the curved surface. Parameter values (u~v) specify the location of the intersection on the patch, and parameter value t gives the location of the intersection on the ray. The faster image creation and a more natural user interface resulting from the way objects are defined in the database are the foreseeable advantages of the proposed design. Ray casting is a very popular technique because of the realistic images that can be produced. A great variety of visible surface algorithms have been developed over the years, but not the research reported here. Not much progress has been made on hardware techniques for rendering curved surfaces. 92 These steps are followed in this research: 1. An algorithm is developed to calculate the intersection of the ray with the cubic surface. 2. The computation required is reduced using coherence and forward differencing methods. By taking ad61antage of image coherence the average number of iterations per intersection test is reduced to one. 3. Ray casting has been a particularly popular candidate for parallel implementation because, in its simplest form, each pixel is computed independently. The algorithm is designed such that it can be placed in a VLSI chip taking advantage of the parallelism of the ray casting algorithm. This also reduces the algorithm's computational requirements. 4. The algorithm's complexity is studied to determine its computational order. Moreover, the operations in the algorithm are also counted. Advantages of the algorithm are that it is conceptually relatively simple, and the code produced therefore is concise, easy to understand and easy to generalize. Newton's method is by nature approximative and this can be a disadvantage of the algorithm proposed here since imprecise intersection solutions can occasionally lead to problems. 93 6.2 Future Work Many improvements or extensions of the algorithm are possible. The objects in the world scene may be extended to include other objects such as spheres, fractals. Each object type may be defined separately and individual routines for the intersection computati6n7 may be included. As long as a routine may be written to determine the point of intersection and the normal to the object at that point, the object may be included as part of the scene, taking advantage of the object oriented property of the algorithm. The concept of bounding volumes could be extended to group objects in the environment which are physically close to each other. A hierarchical environment could be created of the entire scene, each level surrounded by a bounding volume. If the ray intersects the bounding volume, the components of that cluster need to be examined; otherwise several objects may be eliminated from the intersection process. The model file would supply the information regarding the groupings, creating some work for the user, however, the time savings of the ray casting process is substantial [WEGH84]. The Bdzier representation is actually a special case of B-splines. Extension of the algorithm to bicubic B-spline surfaces would only involve converting the control mesh to the internal power basis representation. Ray casting algorithm can be extended to be a ray tracing algorithm when rays are further traced from the intersection point. Surface shading, shadowing, and anti-aliasing can also be added to make the algorithm more complete. 94 The hardware referred to in this research was simulated in software. This is required to determine its detailed complexity, possible pipelining structure and register requirements. Hardware implementation of a primitive processor for ray/surface intersection is the next logical step. BIBLIOGRAPHY APPE68 Appel, A., "Some Techniques for Shading Machine Renderings of Solids," AFIPS Spring Joint Computer Conference, Vol. 32, 1968, pp. 3745. ARV087 Arvo, J. and Kirk, D., "Fast Ray Tracing by Ray Classification," Computer Graphics, Vol. 21, No. 4, July 1987, pp. 55-64. BAD090 Badouel, D., Boutaouch, K., and Priol, T., "Ray Tracing on Distributed Memory Parallel Computers: Strategies for Distributing Computations and Data," Technical Report 508, Institut de Recherche en Informatique et Systemes Aleatoires (IRISA), January 1990. BARS84 Barsky, B. A., "A Description and Evaluation of Various 3-D Models," IEEE Computer Graphics and Applications, Vol. 4, No. 1, January 1984, pp. 38-52. BART87 Bartels, R., Beatty, J., and Barsky, B., An Introduction to the Use of Splines in Computer Graphics, Morgan Kaufmann, Los Altos, CA, 1987. CATM74 Catmull, E., A Subdivision Algorithm for Computer Display of Curved Surfaces, Ph.D. Dissertation, University of Utah, December 1974. CATM75 Catmull, E., "Computer Display of Curved Surfaces," Proc. IEEE Conf. on Computer Graphics, Pattern Recognition and Data Structure, Los Angeles, May 1975, pp. 11-17. CHAN89 Chang, S., Shantz, M. and Rochetti, R., "Rendering Cubic Curves and Sur-faces with Integer Adaptive Forward Differencing," Computer Graphics, Vol. 23, No. 3, July 1989, pp. 157-166. CLAR79 Clark, J. H., "A Fast Scanline Algorithm for Rendering Parametric Surfaces," Computer Graphics, Vol. 13, No. 2, August 1979, p.174 (supplement to Proc. SIGGRAPH 79). CLAR82 Clark, J. H., "The Geometric Engine," Computer Graphics, Vol. 16, No. 3, July 1982, pp. 127-133. 95 |

Full Text |

PAGE 1 A RAY CASTING VISIBILITY ALGORITHM FOR CUBIC SURFACES By IÂ§IN GURSES BUYUKLiMANLI A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1991 PAGE 2 ACKNOWLEDGEMENTS I would like to express my appreciation to my advisor and supervisory committee chairman, Dr. John Staudhammer, for the guidance and encouragement he provided me on this project. I am also grateful to the other members of my supervisory committee. Dr. James Keesling, Dr. Steve Thebaut, Dr. Panos Livadas, and Dr. Carl Crane, for their commitment. I also wish to thank the members of the Computer Graphics Research Group, especially Kris Iskandar and Hasan Incirlioglu, for their help and suggestions. I want to thank to Dr. Murat Balaban for his encouragement and the help he provided. Last but not least, I thank my husband, my best friend, Temel, for his help, support, and love throughout this dissertation. This dissertation is dedicated to my parents, Yurdanur and Muzaffer Giirses. ii PAGE 3 TABLE OF CONTENTS ACKNOWLEDGEMENTS " ABSTRACT v 1 INTRODUCTION 1 LI Research Motivation and Objective 2 1.2 Dissertation Overview 4 2 REVIEW OF BASIC CONCEPTS USED 5 2.1 Image Synthesis 5 2.L1 Visibility Determination 8 2.L2 Ray Casting 10 2.2 Curved Surfaces 17 3 PROBLEM DEFINITION AND REVIEW OF EXISTING ALGORITHMS 28 3.1 Problem Definition 28 3.2 Literature Review 30 3.2.1 Numerical Techniques 31 3.2.2 Subdivision Techniques 33 3.2.3 Parallel Processing Techniques 35 4 THE PROPOSED ALGORITHM 37 4.1 Preprocessing Steps 38 4.1.1 Selection of a Bounding Volume 38 4.1.2 Construction of a Bounding Box Tree 41 4.1.3 Determination of Boundary Edges 42 4.1.4 Location of Silhouette Edges 43 4.1.5 Calculation of Surface Normal 45 4.2 Newton's Method 48 4.3 Algorithm 53 4.4 Lighting Model 57 4.5 Results and Discussion 60 4.5.1 Complexity and Performance Analysis 61 4.5.2 Simulation Results 64 iii PAGE 4 5 THE PARALLEL IMPLEMENTATION OF THE ALGORITHM 66 5.1 Coherence 69 5.2 Forward Differencing Technique 70 5.3 Pipelined and Parallel Architecture 76 5.4 Results and Discussion 77 5.4.1 Complexity and Performance Analysis 80 5.4.2 Simulation Results 88 6 CONCLUSION 90 6.1 Summary 90 6.2 Future Work 93 BIBLIOGRAPHY 95 BIOGRAPHICAL SKETCH 101 iv PAGE 5 I Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy A RAY CASTING VISIBILITY ALGORITHM FOR CUBIC SURFACES By IÂ§m Giirses Biiyiiklimanli May 1991 Chairman: Dr. John Staudhammer Major Department: Computer and Information Sciences An algorithm for ray casting parametric surfaces is developed for visibility calculation. The new algorithm solves the ray/surface intersection directly using Newton iteration and utilizes ray-to-ray coherence. The computation required is reduced using bounding boxes as bounding volumes, coherence, and forward differencing methods. A parallel implementation of the algorithm is also presented taking advantage of the parallelism of ray casting to further reduce the algorithm's computational requirements. The final parametric surface intersection algorithm gives a good combination of speed and accuracy. The algorithm seems to be well suited for hardware implementation. Algorithms as well as various architectures are reviewed and some of them are also compared to the new algorithm to prove its efficiency. A system simulator is developed to verify the algorithm and the functional behavior of the system architecture. V PAGE 6 CHAPTER 1 INTRODUCTION Computer graphics produces pictures generated by a computer programmed and operated by human beings. Only the imagination of the user and the capability of the hardware limits the form they can take: animation, simulation, graphs, recreations, portraits, architecture, drawings, paintings, or just simple lines. Because computer graphics is a way of making thoughts and ideas visible, these revolutionary images are becoming the universal language of our time [WARD89]. The design and manufacture of aircraft, turbo machinery, and automobiles, and recent applications in the advertising and animation industries have become driving forces behind research into techniques for surface design. Curves and surfaces having free-form shapes such as faces, clothes, shoes, clouds, mountains, car bodies, airplanes, are important in the construction of three-dimensional objects for image synthesis in computer graphics. The vast majority of curved surfaces may be defined with a relatively few numbers by using piecewise continuous surface patches [BART87]. However, the display of these surfaces, the rendering, on a high-definition display requires the computation of coloring on a pixel-by-pixel basis. Clearly this can be a computationally intensive task, even though the color variations are often slight. 1 PAGE 7 / 2 One of the more widely used methods for finely rendering objects on a display screen is ray casting. As the name implies, the procedure calls for calculating the light properties of a ray of light which travels from the surface elements in a pixel to the observer. Ray casting is a powerful yet simple approach to image generation. This method has become one of the important tools from which highly realistic images are generated. But it also has disadvantages. The major drawback of this procedure is the great amount of time to compute ray/surface intersections. In this dissertation, this problem is attacked and a new ray casting visibility algorithm is devised. This algorithm has been compared with some algorithms that are widely used. The results show that its execution time is a bit better than the others. It has unique properties: it is short, elegant and has some interesting applications. 1.1 Research Motivation and Objective This dissertation was motivated by the lack of efficient ray casting algorithms for curved surfaces. This work especially deals with ray casting of parametric surfaces. The parametric method of surface representation is very convenient for approximation and design of curved surfaces. In particular, Bezier surface patches and B-Spline surfaces are extensively used in computer graphics and CAD. The research has focused on efficient and fast algorithm design with the objective of determining the ray/object intersections as fast as possible. The research has also addressed both the efficiency and speed of existing algorithms and analyzed PAGE 8 3 deficiencies of those algorithms. This process led to a new approach to resolve those deficiencies. As the first part of the research, the Newton-Raphson method is reformulated for Bezier surfaces to reduce the complexity of the ray/object intersections. Since every ray casting algorithm is by nature an object-oriented one, this characteristic made the design and implementation attractive by applying object-oriented concepts and ideas. For applications that require fast transformation such as ray casting and rendering of complex shaded images, a parallel, pipelined multiprocessor architecture can be used to achieve high real-time performance for image synthesis. A parallel architecture offers an increase in speed that grows almost linearly with the number of processors used. To extract the maximum efficiency from a parallel architecture, key bottlenecks must be identified and an architecture that is best suited to overcoming those bottlenecks must be implemented. Since the calculation for a ray cast through a pixel in the screen is independent of the calculation for other rays cast, this process can be implemented in parallel. Therefore, as a second part of this research, a VLSI-oriented real-time algorithm is developed that combines efficiency, simplicity, speed, and parallelism. Using an incremental technique, the computation for the intersections of a ray with surfaces is reduced. The algorithm is designed in a way that each intersection processor can be placed in a VLSI chip. In a practical display system several of these chips may be used to calculate surface-ray intersections at pixels or groups of pixels. PAGE 9 4 1.2 Dissertation Overview This dissertation is concerned with the design of an efficient algorithm for display purposes. It is organized into six chapters. Chapter 1 is an introductory chapterthat covers objectives and background about the dissertation subject. In chapter 2 a review is presented of the basic concepts used devising the algorithm. Chapter 3 defines the problem and gives a review of existing algorithms. Chapter 4 discusses the details of the algorithm and its complexity and performance analysis. Chapter 5 describes the techniques to reduce the computation required and parallel implementation of the algorithm and complexity and performance analysis results are given. Finally, chapter 6 summarizes the results along with suggestions for future work. In summary, the major contributions of this work are the design of an efficient ray/object intersection algorithm, and improvements on this algorithm to make it suitable for parallel hardware implementation. PAGE 10 CHAPTER 2 REVIEW OF BASIC CONCEPTS USED 2.1 Image Synthesis Image synthesis is the subfield of computer graphics that is concerned with the production of pictures defined by numbers and procedures. If the image is to appear true to a physical object, the process may be costly, complex and may require a tremendous amount of computing power. There is a tradeoff present in all image synthesis applications between realism and the amount of required computations. The image synthesis is usually accomplished with a series of processes, termed the image synthesis pipeline. Based on a simulation of light propagation in the real world, the image synthesis pipeline consists of several steps: creating data representations, geometry processing (modeling transformation, viewing operation), rasterization (visible-surface determination, scan conversion, shading), and displaying the result. Geometry processing is independent of the display device, whereas the rasterization pipeline takes transformed, clipped primitives and produces pixels. The display process that maps a model to an image on screen is called rendering, and its implementation in software and/or hardware is referred to as the rendering pipeline. This standard graphics rendering, the image synthesis pipeline, is shown in Figure 2.1. 5 PAGE 11 Object Data Base Viewing Transformations 4 Clipping 4 Projective Transformations 4 Visible Surface Calculations 4 Scan Conversion 4 Shading 4 Image Memory 4 Display Controller 4 Video Display Figure 2.1 Standard Rendering Pipeline PAGE 12 7 The object database contains geometric, optical and material information of the objects (also called primitives). The geometry representation of each object depends on the type of primitive used by the image renderer. The type of object used here is a curved surface, whose characteristics will be explained later. To render the scene image correctly, the object is transformed from modeling coordinates to world coordinates. Next, world coordinate descriptions are converted to viewing coordinates. That is, the scene to be viewed is clipped against a view volume (frustum) and projected into the view area defined on the view plane. There are two basic methods for projecting 3D objects onto a 2D viewing surface. All points of the object can be projected to the surface along parallel lines, or the points can be projected along lines that converge to a position called the center of the projection. These two methods are called parallel projection and perspective projection, respectively. A parallel projection preserves relative dimensions of objects (this is the technique used in drafting to produce scale drawings of three-dimensional objects) but does not give a realistic representation. A perspective projection, on the other hand, produces realistic views but doesn't preserve relative dimensions. The farther a primitive or part of a primitive is from the viewer, the smaller it will be drawn. This effect, known as perspective foreshortening, is similar to the effect achieved by a photograph and by the human visual system. The next step in the image generation process is visible-surface calculation, also called hidden-surface removal. Visible-surface determination is the process of determining which portions of a primitive are actually visible from the synthetic camera's point of view. Most PAGE 13 8 hidden-surface algorithms use image-space methods and in this study the ray casting technique as a point-sampling and image-space algorithm is used. Ray casting is compatible with the image compositing network designed by Fleischman [FLEI88]. After determining the pixels covered by a primitive's image, scan conversion, the color for each covered^ixel is determined through the use of lighting equations, i.e. shading. This color value, along with its z coordinate and opacity, is passed to the compositing network for final display. 2.1.1 Visibility Determination The task of visibility determination of an object on a raster display has been originally referred to as the hidden-surface problem. Different methods for visiblesurface calculation are well established. It can be categorized into two groups: point sampling algorithms and continuous algorithms. While Z-buffer, ray casting/tracing, painter's, and scan-line algorithms belong to the first group, area/volume subdivision and scan-plane algorithms belong to the second. Hiddensurface algorithms are also classified according to whether they deal with object definitions directly or with their projected images. These two approaches are called object-space methods and imagespace methods, respectively. An object-space method implemented in the physical coordinate system compares objects and parts of objects to each other to determine which surfaces are visible. These results are generally accurate to the precision of the machine. In an image-space algorithm, visibility is decided point by point at each pixel position on the projection plane. That is, calculations are performed only to PAGE 14 9 the precision of the screen. Therefore, image-space algorithms are also called pointsampling algorithms. The computational work for an object-space algorithm that compares each object in a scene with the remaining (n-1) objects grows as the number of objects squared (n^). Similarly, the work for an image-space algorithm which compares every object in the scene with every pbcel location in screen coordinates is proportional to nN, where N is the number of pixels, typically (512^) or more. While one might therefore believe that object-space algorithms require less work than image-space algorithms even when n exceeds 100,000, in practice, this is not the case. Most of the individual steps in object-space algorithms are more timeconsuming while it is easier to take advantage of coherence in raster scan implementation of image-space algorithms. Visibility determination is a fundamental and recurring subproblem in computer graphics. It is done in a 3D space prior to the projection into 2D that destroys the depth information needed for depth comparisons. The visible-surface problem for the simple case (high-order visibility problems being such as those calculating motion blur, depth of field, or indirect-illumination effects) can be stated as follows: Given a set of surfaces in three dimensions, a viewpoint, an oriented image plane, and a field of view, PAGE 15 10 For each pixel on the image plane within the field of view do { Determine which pixel on which surface lies closest to the viewpoint along a line from the viewpoint through the pixel in the image plane. Draw the pixel in The appropriate color. } The visibility determination of curved surfaces involves real-time image generation process. Ray casting determines the visibility of surfaces by tracing imaginary rays of light from the viewer's eye to the objects in the scene. This is a typical image-space algorithm discussed before and will be used throughout this dissertation for visibility determination. 2.1.2 Rav Casting Although ray casting has been around before computers, it became popular in the computer graphics community through the paper by Whitted [WHIT80]. Ray casting is a technique which accomplishes a mapping from 3D display to 2D image plane by shooting a ray from the eye through each pixel on the screen to the world scene to identify the visible surface [KAUF86, ROTH82]. In the literature, ray casting and ray tracing are often used synonymously, defining a full recursive algorithm that integrates both visible surface determination and shading. However, in this dissertation, the use of ray tracing or ray casting refers only to visible surface PAGE 16 11 Object Data Base 4 Viewing Transformations 4 Ray Casting 4 Shading 4 Image Memory 4 Display Controller 4 Video Display Figure 2.2 Rendering Pipeline for Ray Casting. PAGE 17 12 algorithm. There is no tracing of additional rays from the points of intersection to determine a pixel's shade. As mentioned before, the rendering pipeline is a complex process, but the ray casting pipeline is the simplest among others because those objects that are visible at each pixel and their iirumination are determined entirely in world coordinates (see Figure 2.2). Once objects have been obtained from the database and transformed by the viewing transformation, they are loaded into ray caster's world coordinate database. Ray casting is one of the more widely used methods for finely rendering objects on a display screen. The ray comes from the light source and hits the object. Some reach the observer and let the observer see the scene. If we trace the rays like this (from the light source to the observer), very few will reach the observer. Since this is obviously not an efficient way, we cast the rays in the opposite direction, from the eye through each pixel on the screen to the world scene. The first surface that an individual ray intersects will be the surface that is displayed in that particular pixel. The ray casting process is shown in Figure 2.3. This approach avoids the need to explicitly define silhouette edges for the general curved patch models and it also allows illumination effects to be modelled in a direct way. The problem here is intersecting the rays with a large collection of primitive objects defining an environment. The time spent to compute ray/surface intersections is the bottleneck of the ray casting algorithm. For each pixel in the raster, the corresponding ray is traced to the first surface that it encounters to PAGE 18 13 Observer Figure 2.3 Ray Casting Process PAGE 19 14 determine which objects in the scene are intersected by the ray. Even for a screen resolution of 512x512, 250,000 rays should be traced (generated). The points of intersection of this ray with the surfaces are computed and the intersections are sorted in depth. The surface closest to the image plane is the visible surface through a particular pixel. The general algorithm for the ray casting is as follows: Read in the parameters for the scene from the model file Perform preprocessing for y = miny to maxy for X = minx to maxx send a ray from origin through pixel (x,y) for object = 1 to num_of_objects determine closest object to origin along ray determine lighting of closest object color pixel (x,y) The ray casting algorithm handles one pixel at a time and compares every object's depth at this pixel. As Whitted [WHIT80] noted, it can take as much as 95 percent of the total processing time just for the ray/surface intersection calculations. There are two popular directions which can be considered for speeding up ray casting. One way is to reduce the number of candidates to be checked against each ray, at the expense of a little extra processing. Another way is to design a more PAGE 20 15 efficient ray/surface intersection procedure. The techniques discussed in this dissertation belong to these two directions. One approach is called space subdivision. The idea is that we can look at the world that surrounds the database and cut it up into many little volumes called voxels. As the ray moves through the world looking for the first object it hits, it essentially moves from volume to volume. If each volume contains a list of the objects within it, then we only need to check the ray against those objects. If we do get a hit, then we do not need to check the ray against all the other objects in the database; otherwise we move to the next volume and try again. There are two subdivision schemes. In the uniform subdivision proposed by Fujimoto et al. [FUJI86] and also described by Cleary and Wyvil [CLEA88], the object space is partitioned into voxels along three major axes by a regular 3D grid. Glassner [GLAS84] and Kaplan [KAPL85] describe adaptive subdivision in which a hierarchical tree known as octree is constructed to represent the given space by recursively subdividing it into eight equal size and disjoint voxels. Another approach is the bounding volume technique. A simple bounding volume such as a sphere or a box is placed around each object. Using bounding volumes results in a significant gain in efficiency. Only if a ray intersects the bounding volume does the object itself need to be checked for intersection. Though this actually increases the amount of computation for rays which come near enough to an object to pierce its bounding volume, in a typical environment most rays closely approach only a small fraction of the objects. Used in this way, bounding volumes PAGE 21 16 substitute simple intersection checks for more costly ones but do not decrease their number. From a theoretical standpoint this may reduce the computation by a constant factor but cannot improve upon the linear time complexity of exhaustive ray tracing. To alleviate this problem, Rubin and Whitted [RUBI80] introduced the notion of hierarchical bounding volumes to ray tracing in order to attain a theoretical time complexity v^^hich is logarithmic in the number of objects instead of being linear. The method of hierarchical extents uses a tree search to find the objects hit by a ray. In the best case it yields O(logn) intersection calculations per ray. Since tree nodes are small in comparison to object data, only about 30 percent extra space is required to store the hierarchy required for machine use in a typical implementation. The idea is further developed by Kajiya [KAJI83], Weghorst et al. [WEGH84], and Kay and Kajiya [KAY86]. In the algorithm proposed here, to eliminate unnecessary intersection tests, the intersection of a ray with a volume bounding the object is tested. There are two criteria for a good bounding volume. First, the volume must be easily tested for an intersection with a ray. The bounding volume intersection test requires significantly less computation time than testing against the original object. Second, although the bounding volume may be simple to test, it should be as tight a fit about the object as possible. With a tighter fit, the rays that hit the bounding volume will be minimized. The bounding box scheme used here is similar to the slab-plane scheme developed by Kay and Kajiya [KAY86] and later adopted by Hsu [HSU90] in his dissertation. The construction of bounding boxes will be explained in detail in chapter 4. PAGE 22 17 2.2 Curved Surfaces The use of straight Hne segments and polygons to approximate curved lines and surfaces results in visually objectionable images, even with the most sophisticated continuous-shading models. Mach bands appear at the borders between adjacent polygons, and we alwayssee a polygonal silhouette. Also, polygonal methods often require excessive amounts of storage. However, curved surface techniques allow the resulting image to be computed to whatever level detail the situation demands. Curved surface representations are important in computer graphics applications, because they are often more compact than polygonal representations and hence require relatively compact databases. Surfaces may be represented in general by the 3-space relationship F(x,y^) = 0. This is the implicit representation of a surface. Unfortunately, the determination of the depth (z) of a surface point from a known ray direction ix,y) involves the solution of an implicit relationship, which may become computationally difficult, even untractable. On the other hand, the use of parametric curves and surfaces for object modeling in computer graphics is becoming increasingly popular. Parametric representation is the most direct and most flexible description of a surface. It has many desirable properties [FAUX79, MUDU85, ROGE76] : 1. It allows a smooth object with no analytical definition to be approximated with a relatively small number of surface patches that join with continuity, thereby saving considerable memory space in representing the object. PAGE 23 18 2. It is axis independent: it avoids infinite slope values with respect to some arbitrary axis system. Difficulties arise using axis-dependent, nonparametric curves when the end point of a curve has a vertical slope relative to the chosen coordinate system. 3. It facilitates the representation of space curves in homogeneous coordinates. 4. Transformations can Be performed directly on parametric equations. Translation or rotation of the axes or the object can usually be carried out by translating or rotating the vectors defining the curve or patch, without modifying the functions of the parameters used. 5. It allows the unambiguous representation of multivalued surfaces or space functions. 6. It doesn't involve solution of nonlinear equations for each point (as in the case of implicit representation). 7. Points on the surface can easily be computed sequentially along parametric lines in the surface for display purposes. 8. Parametric surface patches lend themselves to piecewise constructions more naturally than do implicit surfaces. Two reasons for this are that parametric patches are defined over a finite domain and they have distinct boundary curves. By contrast, implicit surfaces can be of infinite extent. 9. Parametric equations usually offer more degrees of freedom for controlling the shape of curves and surfaces. For example, a 2D explicit curve of the form y = cc^+bx^+cx+d has four coefficients that we may vary to control the curve. PAGE 24 19 However, a 2D parametric cubic curve of the form x = ai^+bu^+cu+d, y = ai+bu^+gu+h has eight coefficients that may be used for shape comrol. Curved surfaces can be described by parametric bicubic patches, Bezier patches, and B-spline patches which are defined by cubic equations of two parameters, u and v [FOLE82]. Varying both parameters from 0 to 1 defines all points on a surface patch. Cubic patches are determined by a collection of 16 3D control points. Because these patches can be pieced together while maintaining second-order continuity at the boundaries, they are very useful for modeling smooth surfaces. Since two parameters are required for the representation of a surface, we assume that a surface may be represented by a bivariate vector-valued function Q(u,v) = [x(u,v) yiu,v) z(u,v)] (2-1) The form of Q{u,v) is (2.2) The vectors are the algebraic coefficients. There are sixteen ag vectors, each with three components. This means that 48 degrees of freedom, or coefficients, must be specified to define a unique surface. PAGE 25 20 Equation 2.2 is more conveniently written as (?(u,v) = t/ C (^-^^ where U = [m^ u 1], F = [v' v 1], C is the coefficient matrix, and is the transpose of V. Since' the elements of the coefficient matrix are three component vectors, the C matrix is a 4 x 4 x 3 array. A bicubic patch is bounded by four curves as shown in Figure 2.4; each is a parametric cubic curve. To demonstrate this fact, if we substitute v=0 into equation 2.2, all terms containing v vanish, and the equation becomes Q{u) = a^^u^ +a^u'^ +a^u+a^^ (2-4) This is an expression for a parametric cubic curve. Of course, the same can be done for the boundary curves on u = 0, w= 1, and v= 1. In fact, by setting either M or V equal to some constant, one can produce rulings on the surface. The network of curves divides the surface into an assembly of topologically rectangular patches, each of which has as its boundaries 2 M-curves and 2 v-curves. Here it is assumed that u and v run from 0 to 1 along the relevant boundaries; then Q{u,v), 0
PAGE 26 21 Figure 2.4 Boundary Curves of a Bicubic Patch PAGE 27 22 Figure 2.5 Boundary Conditions Defining a Bicubic Patch PAGE 28 23 Since algebraic coefficients are not the most convenient way to define and control the shape of a patch, a geometric form defining a patch in terms of certain boundary conditions related to the algebraic coefficients is developed. To describe the boundary conditions 16 vectors are needed, 4 of them corner points, and 8 of them tangent vectors (Figure 2.5). Four more vectors called twist vectors (not shown in the figure) are used to show how the tangent vectors across a patch boundary change along that boundary. For example, the twist vectors at poo <^^^ Pn control the way pI, changes along the boundary curve u = 0 from pl^ to p^i. The twist vectors are denoted as pH, pZ, Pm and p^. We will deal with Bezier surfaces in this research since they are attractive in interactive design. Advantages of Bezier surfaces are 1. It is easy to predict how the surface will respond if a control point is moved. 2. Local control is easy with Bezier surfaces. Any change in the vertices of a span will only affect the curve within that span. The rest of the curve will remain unaffected. 3. All that is necessary to increase the order of any curve segment is to specify another interior vertex. This greatly increases flexibility and overcomes many of the difficulties of cubic spline fitting and parabolic blending techniques. 4. The convex hull which is the convex polygon obtained by connecting the control points of bicubic Bezier surfaces allows bounding regions for the surface to be easily calculated, greatly increasing the efficiency of many algorithms. PAGE 29 24 5. The Bezier patch generated by the 16 control points is constrained to He within the convex hull of these points. This is useful for cHpping and determining when two patches might intersect. 6. The surface shape can be changed without disturbing the slope design points, which may then be independently specified to satisfy continuity requirements with adjacent surfaces. 7. There is no need to calculate slope and tangent vectors. 8. Bezier surface need not be represented by a square array of control points. There is also a disadvantage to the use of Bezier surfaces: the movement of one control point will affect the entire surface. In order to obtain local control a patch must effectively be subdivided. A Bezier surface is a tensor product of Bezier blending functions (Bernstein basis) which interpolates between the first and last vertices. It is defined by a set of control vertices, in 3D x-y-z space, that is organized as a 2D graph with a rectangular topology. The set of control vertices is called a control hull or control graph [BARS84]. A bicubic Bezier surface, shown in Figure 2.6 takes the form where m=/i = 3 and is the control graph specifying the location of the (m + 1) by (n + 1) control points. The basis function, B^^iu), is given by PAGE 30 25 Figure 2.6 Bicubic Bezier Surface PAGE 31 26 where (2.6) ml (m-i)\ i\ with m being the degree of the polynomial and / the particular vertex in the ordered set (from 0 to m). For a bicubic Bezier patch, the x component of Qs^iUyV) in matrix form is xiu,v) = [Bo3(Â«) B,3(u) B^,(,u) B,^iu)] ^02 ^03 '^lo ^12 ^20 ^22 ^23 ^30 ^32 ^33 5.3(v) 523(V) (2.8) The bicubic Bezier surface takes the matrix form when the basis functions are substituted in the above equation: xiu,v) = [(l-uf 3(l-uYu 3(1-u)m2 u^] ^00 ^01 ^02 ^03 ^10 ^12 ^13 ^20 ^21 ^22 ^23 JTjQ Xjj A:33 (1-v)' 3(l-v)2v 3(1-v)v2 (2.9) PAGE 32 27 Equation 2.9 can be rewritten as x(u,v) =[u^ u I] Af/ [v3 V 1]' where (2.10) -1 3 -3 1 -1 3 -3 1 3 -6 3 0 3 -6 3 0 0 -3 3 0 0 -3 3 0 1 0 0 0 1 0 0 0 and P, = ^00 ^01 ^02 ^03 '^lO ^ll ^12 ^13 ^20 ^21 ^22 ^23 XjQ ATjj The matrix contains the position vectors for points that define the Bezier surface patch. Only the four corner points Xoo, x^a, Xoj, and x,, actually lie on the patch. The points x^, x^^, x^, and X22 control the cross slopes along the boundary curves in the same way as the twist vectors of the bicubic patch. The remaining points control the slope of the boundary curves [MORT85]. PAGE 33 CHAPTER 3 PROBLEM DEFINITION AND REVIEW OF EXISTING ALGORITHMS 3.1 Problem Definition The task we are primarily concerned with is that of intersecting rays with a large collection of primitive objects defining an environment. For each pixel in the raster, the corresponding ray is traced to the first surface that it encounters to determine which objects in the scene, if any, are intersected by the ray. The points of intersection of this ray with the surfaces are computed and the intersections are sorted in depth. The surface closest to the image plane is the visible surface at a particular pixel [ROGE85]. The ray is represented parametrically, and the point along the ray is given by R(t) = RO + Dt (3.1) where RO = [x^ zd is the ray origin, D = is the ray direction, and t is the ray's traveling time relatively starting from the origin RO. Perspective projection enhances the realism of a displayed image by providing the viewer with depth cues. A perspective transformation need not be applied here since the ray casting method determines the correct perspective as an automatic result of tracing the rays from the eye. The objects are described in a left-handed 28 PAGE 34 29 coordinate system with the eye at the origin (see Figure 3.1). The screen is centered about the positive z-axis and acts as the viewing plane. Without this centering process, objects may be skewed in the viewing plane. Skewing is also dependent on the distance in the z direction of the viewing plane. The further the distance between the eye and the viewing plane, the more parallel the different rays travel, thereby reducing skewing of the scene. In fact, some ray tracing algorithms place the eye at z = -Â«, thus having all rays travel in the same direction parallel to the z-axis. Our implementation places the eye at the origin with the view plane at a user specified distance d in the +z direction. The effect is to perform a single-point perspective projection onto the image plane. Since the z value represents the distance of a point from the eye, depth information is included in the projection equations. Objects that are farther away have their x and y coordinates divided by larger z values than those closer, causing these distant objects to be drawn smaller. Since we are using perspective projection and the center of projection is located at the origin (0,0,0), the ray equation 3.1 becomes Rit) = Dt (3.2) where Xo = yÂ„ = = 0, Z), = DJd, = D^/d, = 1. A bicubic Bezier surface is given by the equation PAGE 35 30 Figure 3.1 Left Handed Coordinate System PAGE 36 31 The intersection of a ray with a surface of this form is obtained as the simuhaneous solution of the three nonlinear equations given by the vector equation: F = Qiu,v) D t RO (3-'*) The time spent for this ray/surface intersection computation is the bottleneck of the ray casting algorithm. Even for a screen resolution of 512x512, 250 000 rays should be generated. For complex scenes modeling complex lighting effects, antialiasing or motion blur [COOK84], this calculation must be performed millions of times. Direct methods of calculation have been found for several surface types, including spheres, polygons [WHIT80], cylinders and volumes of revolution [KAJI83], Steiner patches [SEDE84], and bicubic surface patches [KAJI82]. Numerical techniques have been employed in the calculation for certain algebraic surfaces [HANR83] and parametric surface patches [JOY86, LEVN87, LISC90, SWEE86, TOTH85, YANG87]. Several authors addressed the problem of reducing the number of ray/surface intersection calculations using the techniques of recursive subdivision [ARV087, FUJI86, GLAS84, WOOD89], bounding volumes [WHIT80, KAY86], hierarchical bounding volumes to further improve the efficiency of the bounding volumes [RUBI80, WEGH84], and coherence [HECK84]. One research direction in image synthesis has been to exploit the inherent parallelism in the algorithms used to create realistic images. Ray casting is an ideal candidate for parallel processing. Various hardware techniques [BADO90, DIPP84, PAGE 37 32 PLUN85, ULLN83] have been used for acceleration of ray tracing. Most of these approaches are based on tasking multiple microprocessors to manipulate a subset of rays in parallel. PuUeybank and Kapenga as well as Ullner also explored the use of VLSI circuits to speed up the process of ray/surface intersection [PULL87, ULLN83]. Since in this study we are dealing with the intersection of a parametric surface with a ray, a detailed survey about ray/parametric surface intersection will be presented rather than giving all the work for intersecting rays with surfaces. Although the methods used in other type of surfaces, such as implicit surfaces, are similar to the ones employed in our study, those won't be explained here in detail. 3.2 Literature Review Equation 3.3 can be solved for points of intersection, u, v, and t using numerical techniques or subdivision techniques. There is also a trend towards parallelizing ray casting algorithms, especially towards performing ray/object intersection calculations in parallel. In recent years several methods to calculate the intersection between parametric surfaces and rays have been proposed. But, published algorithms on ray tracing parametric surfaces are relatively few compared with the number which deal with other kinds of surfaces. Unfortunately, most of these algorithms are either very expensive, or possess other disadvantages that preclude their use in real-life rendering systems. PAGE 38 33 3.2.1 Numerical Techniques Kajiya [KAJI82] uses ideas from algebraic geometry to obtain a numerical procedure for intersecting a ray with a bicubic surface patch. A three-dimensional ray is regarded as the intersection of two planes OtX+by+c^+di = 0 and a^+biy+ciz+di 0. For the bicubic, Kajiya then substitutes the parametric surface equations into the plane equations to obtain polynomial equations of the form Fi(u,v) = 0, F-^{u,v) 0, which can be solved numerically. The Laguerre method used in that paper is cubically convergent and therefore takes only one iteration for linear and quadratic functions. The method is slow but definite. It converges to the nearest solution more quickly than subdivision, but is more expensive at each step. The algorithm uses floating point operations. Kajiya estimates that 6000 floating point operations may have to be performed in order to find all of the intersections between one ray and one bicubic patch. Toth solves the ray/surface equation by minimizing the function {Q(u,v) Â• Di Â• ROy with respect to u, v and t [TOTH85]. This method, known as multivariate Newton iteration, has the advantage of being able to handle a line which either touches the surface or has no intersection with it. But, the computation of the control points for the interval extension method used in this paper requires more than 40% of the total CPU time used, which is a serious drawback in the algorithm. The main reason for this is that the method fails to utilize coherence. Joy and Bhetanabhotla propose another algorithm [JOY86] using a quasi-Newton iteration which is basically a generalization of the method used by PAGE 39 34 Toth to solve the ray/surface intersection problem. To speed up the convergence of the quasi-Newton method, ray coherence is utilized: numerical calculations in adjoining pixels are used forming the initial approximations for new calculations. Although this method has computational advantages over the conventional Newton method in obtaining the' initial guess, naive utilization of ray coherence may cause convergence to incorrect solutions and subdivision of object space to prevent this may lead to excessive memory requirements. Sweeney and Bartels [SWEE86] present a method for rendering B-spline surfaces for ray tracing. It combines subdivision and numerical techniques: the Bspline surface subdivision technique is utilized to generate a close tiling of the surface with control points which are then used as initial approximations for Newton's method, which in turn is used to generate the final intersection point. Time is saved at the expense of using more memory. There is also no guarantee that the Newton iteration, once initiated, will converge to the correct solution. Levner et al. [LEVN87] describe a method that was originally developed for ray tracing /3-spline surfaces, but has been applied successfully to bicubic surfaces in general. Their method is actually a variation on that of Sweeney and Bartels. Instead of refining the control vertex mesh, they create a mesh of points that lie on the surface itself. The advantage of this approach is that point evaluations on the surface allow treatment of general parametric surfaces. But, the memory requirements of this method are still very large. PAGE 40 35 Another improvement on the method reported by Sweeney and Bartels is described by Yang [YANG87]. An individual octree is created for each B-spHne patch by subdividing its bounding box. It provides a more tight bounding scheme than that of Sweeney and Bartels. Since all leaf boxes hit by a ray are searched before actually attempting ray/surface intersection, the algorithm is inefficient. Lischinski and Gonczarowski [LISC90] present improvements of previously known techniques and introduce a spaceand time-efficient algorithm utilizing these. This algorithm combines numerical and subdivision techniques, thus allowing utilization of ray coherence and greatly reducing the average ray/surface intersection time. Although some of the other methods mentioned above, or in the next section, are faster than their method, they claim that theirs is qualitatively superior to them, since it doesn't possess the others' disadvantages. 3.2.2 Subdivision Techniques Many subdivision methods have also been employed to solve the intersection of a ray with a bicubic patch. The subdivision algorithm uses much storage and converges only at a linear rate to a solution. Whitted [WHIT80] was the first to describe recursive subdivision methods for ray tracing parametric surface patches. If the bounding volume of a patch is pierced by the ray, then the patch is subdivided and bounding volumes are produced by each subpatch. The subdivision process is repeated until either no bounding volumes are intersected, or the intersected bounding volume is smaller than a predetermined PAGE 41 36 minimum. This method requires testing each surface against each ray, which resuUs in too many subdivisions; thus it is very inefficient. Rubin and Whitted [RUBI80] have generalized the method used by Whitted representing the object space entirely by a hierarchical data structure consisting of bounding volumes. TKe' algorithm utilizes parallelepipeds for bounding boxes with an orientation that minimizes their sizes. These hierarchical volumes were constructed by hand for each scene, a time consuming process not suitable for animation. Although the hierarchical representation yields an improvement of about 100:1 over naive subdivision, it is still too slow. Recently, a new approach based on recursive subdivision was presented by Woodward [WOOD89]. The subdivision process is carried out in the orthogonal viewing coordinate system of the ray. To speed up the computation, the control vertex mesh is scaled to large integer numbers. Subdivisions are done using integer addition and bit shifts, taking care not to lose accuracy. This method is faster than Whitted's method and it is reported to compare favorably with methods such as that used by Sweeney and Bartels, in spite of the fact that it doesn't attempt to exploit coherence. But, like the previous subdivision methods, it has the drawback that the same amount of computation must be performed to intersect a ray with every patch: simple patches cost as much as complicated ones. PAGE 42 37 3.2.3 Parallel Processing Techniques In the original algorithm by Whitted over 75% of the time was spent calculating ray/object intersections. Since then much work has been done to speed up the intersection tests by using parallelism to perform many ray tests in parallel. All the researchers mentioned above, one way or another, explore the possibility of reducing the intersection computation, but their approaches were not simple enough for hardware implementation since they did not take into account that computation for intersection can be done in parallel. Some parallel machines have been proposed in the literature, but we have yet to see an implementation of such a machine. UUner in his dissertation discusses how to parallelize ray tracing in various ways [ULLN83]. This is one of the early studies of parallelizing ray tracing and how ray tracing might perform on different architectures. In addition to using subdivision methods, PuUeybank also implemented ray/surface intersection in a VLSI chip [PULL87]. But, recursive subdivision for curve and surface rendering is expensive to implement in hardware due to the high speed stack memory requirements. Kedem [KEDE84a] proposes an object space architecture based on the CSG and ray casting concepts. In his design, Kedem implemented the CSG tree in hardware. The reconfigurable tree structure consists of two kinds of nodes, namely the Primitive Classifiers (PCs) as leaf nodes and Combine Classifiers (CCs) as internal nodes. Each PC is assigned a certain number of primitives. During the image generation process, the system casts rays from the eye through every pbcel on PAGE 43 38 the simulated display. Each PC takes these rays and computes the intersection of every ray with its associated primitives. This computation is done in parallel. The result of the computation is passed up the tree. Each CC takes the results from its left and right nodes, combines them according to the operation assigned to it and passes the result up the" tree. At the root of the CSG tree, the color value of the pixel is calculated. This architecture computes quadratic surface visibility. Kedem and Ellis [KEDE84b] use CSG for building up a complex object out of simple primitives like cones, cylinders, and spheres. The main drawback of such a system is that representing an object with cubic or higher order surfaces require numerous quadratic primitives and even then is at best an approximation to the original surface. A similar algorithm for planar patched surfaces was presented by Thomas [THOM83]. It also used ray casting techniques and was specifically designed for VLSI implementation. Unlike Kedem's algorithm, it performs perspective projections. The intersection calculation of a ray with a parametric patch is still the subject of ongoing research in this area. To produce a highly parallel machine capable of real time operation, even compromising image quality, is also one of the challenging areas people have attacked. The problem still remains to find an efficient execution of ray casting algorithm for curved surfaces and usage of parallel processing techniques since ray casting is inherently a parallel process. PAGE 44 CHAPTER 4 THE PROPOSED ALGORITHM The choice of rendering algorithm is dependent on the model representation and the degree of realism desired. Ray casting, the rendering algorithm of our choice, handles one pixel at a time and compares every object's depth at this pixel as opposed to the z-buffer algorithm which handles one object at a time and operates on all pixels that this object covers. The proposed algorithm determines visibility by tracing a bundle of rays from the observer to the objects in the scene. The direct intersection of a ray with a parametric surface, i.e. without generating a polygonal approximation of the surface, has generally been done using either rather sophisticated numerical methods or simpler but inefficient subdivision algorithms. This dissertation focuses on numerical techniques to calculate the intersection of a ray and surface. The techniques, as described here, deal with Bezier surface patches, but can easily be extended to more general parametric surfaces. The algorithm consists of essentially two parts. The first part is a series of preprocessing steps. The actual ray/surface intersection is the second part of the algorithm. Preprocessing involves selection of a bounding volume, construction of a bounding box tree, determination of boundary and silhouette edges in the patch, 39 PAGE 45 40 and calculation of the surface normal. These steps, Newton's method as a numerical technique which is used for the ray/Bezier surface intersection calculation, actual intersection algorithm, and illmnination model are discussed in this chapter. 4.1 Preprocessing Steps Since the ray casting procedure is so computationally expensive, any preprocessing that can be performed to help reduce computations should be done. Preprocessing is done before ray-tracing is begun. This saves time by eliminating redundant calculations for every line/object intersection. The preprocessing steps to follow are as follows: 1. Selection of a bounding volume. 2. Construction of a bounding box tree. 3. Determination of boundary edges. 4. location of silhouette edges. 5. Calculation of surface normal. 4.1.1 Selection of a Bounding Volume Bezier surfaces are capable of modelling a wide variety of shapes which are not easy to model in other ways. Parametric surfaces, such as Bezier surfaces, are employed increasingly as the computer graphics field matures. Since such representations involve more sophisticated mathematics than the simpler alternatives, it is more computationally expensive to produce their images. For example, in ray PAGE 46 41 casting a Bezier surface, it might require thousands of floating point iterations to find intersections of a ray with a Bezier surface patch [KAJI86, TOTH85]. This is much more expensive than finding the intersections of a ray with a polygon or a quadratic surface. The selection of "a bounding volume helps to accelerate ray casting Bezier surfaces. It is possible to use the convex hull of a Bezier patch as its bounding volume. Such a box, however, is not very tight, since control points could be quite far from the surface they define. We present a different technique: bounding boxes defined by the exact minimum (xÂ„iÂ„, y^) and maximum (x^^^, y^^,) values attained on a surface are tighter than other bounds obtained using approximate methods such as convex hulls of the control vertices (see Figure 4.1). Points on the surface are evaluated, using x(u,v),>'(u,v) and z(u,v) given in chapter 2, by varying two parameters incrementally. The increments are obtained dividing the parameters u and v by an integer. The forward differencing technique which is presented in chapter 5 is used to obtain the increments. This scheme effectively reduces the size of the void area. An intersection test with the patch is performed only if its bounding box is intersected by the ray. The information sought for an intersection test with a bounding volume is simply a yes-no response. The computation of the actual point of intersection is not necessary. First the transformation to make the ray coincident with the z axis is computed. The transformation is then applied to each of the eight vertices of the bounding box. A new bounding box is then determined using the transformed PAGE 47 42 Figure 4.1 The Extent to the Surface Q(u,v). PAGE 48 43 vertices. An intersection between the ray (z axis) and the box occurs if the signs of the new x^ and x^ are opposite as well as the signs of and y^. 4.1.2 Construction of a Bounding Box Tree The second step" in the preprocessing of a surface is to create a tree of bounding boxes. The entire scene is bounded by a global bounding box at the root node of the tree. Every internal node of the tree contains pointers to its children, and its bounding box is just big enough to contain all of its children's boxes. Each leaf node holds representative values of u and v for the section of the surface it represents; these values will be used later as an initial guess for a Newton process. If a ray hits a leaf box, the intersection of the ray and the patch must be within the box or very close to it. The mean parameter values of the points contained in the leaf box hit by a ray provide good initial guesses for the Newton iteration for the calculation of the parameter values of the intersection point. Bounding boxes are useful for complex objects, such as bicubic surfaces, when the object is fairly small. Since the time to find the intersection with a bicubic surface is large, but bounding box intersections are fast, the ray tracer can save time for all the negative tests. In the case where the ray does enter the box, some additional overhead is incurred; however, this cost is easily justified by reduced times for negative tests. PAGE 49 44 When the initialization (modelling) stage is complete, a patch is represented by its defining matrices (C = M P hf) and the parameters kept in its bounding box requirement is less than that required by the polynomial representation, 4.1.3 Determination of Boundary Edges It is useful to have a univariate description of each bounding edge of the patch. Bicubic surface patches have four natural boundary curves: C(m,0), Q(0,v), and Q{\,v), each of which is cubic in one variable. This is easily derived from the standard bivariate coefficient matrbc C = M P AT as follows: Q(u,0) = edge^iu) = [M^u^u l][q[0 0 0 If (4.1) Q{u,\) = edge^iu) = [u^ u l][q[l 1 1 If The expressions for j3(0,v) and Q{\,v) are similar. Twelve coefficients are required to describe a cubic curve, four each for the X, y, and z components and these can easily be computed from the C matrix. The coefficients for edge^iu), for example, are summations of the rows of C. In addition, surface normal information along each bounding edge of the patch must be provided for use by both the shader and the silhouette detection routine. The silhouette detection and the normal calculation are explained in the next two sections. PAGE 50 45 4.1.4 Location of Silhouette Edges The silhouette edges of a surface, if it contains any, are defined as those curves on the surface along which the z-component of the normal vector is zero, assuming the eye is on the z axis. In general, since the order of the surface normal is quintic (as we will see" in the next section) the order of the silhouette curve is also greater than the cubic, but it is approximated here by a piecewise cubic interpolant so that the silhouette can be treated the same as any other edge. Boundary and silhoutte edges are shown in Figure 4.2. To compute a silhouette edge, the cubic approximation of the normal surface and the z =0 plane are converted to the Bezier surface representation [SCHW82]. This provides a surface representation of the normals at each point on the surface. The normal vector of the surface at the point (u,v) is evaluated. A set of points which lie on the intersection line of the normal surfaces are obtained using subdivision and the convex hull property of Bezier meshes. The surface equation is calculated using (u,v) values to determine a corresponding set of points on the surface which represent the actual silhouette in screen space. Connecting these points with straight lines is guaranteed to be within a user-specified tolerance of the true silhouette. PAGE 51 46 Figure 4.2 Two Types of Edges for Curved Surfaces PAGE 52 47 4.1.5 Calculation of Surface Normal The normal vector to a Bezier patch can be found by taking the vector cross product of the tangent vector in the u direction and the tangent vector in the v direction. The Bezier surface patch is defined by Q(u,v) = UMPM^V^ ^"^-^^ where U = [i^u^ul],V = [v^ v 1], M is the Bezier matrix given in chapter 2, and P is the matrix of control points. The u tangent vector of the surface Q{u,v) is ^("v) = d(UMPSf^V^) ^ dUj^p^TyT ^ [3Â„2 2u 1 0]MPM^V^ = U'MPM^V (4-3) du du du and the v tangent vector is = djUMPM^V^) ^ fjj^pMTdVl UMPM^[3v' 2v 1 0]^ = UMPM^V (4.4) dv dv dv Both tangent vectors are parallel to the surface at the point (m,v) and, therefore, their cross-product is perpendicular to the surface. Each of the tangent vectors is a 3-tuple since equation 4.2 represents the x, y, and z components of the patch. The normal is given by PAGE 53 48 where xÂ„ are the x, y, and z components of the u tangent vector and xÂ„ yÂ„ z, are the X, y, and z components of the v tangent vector. This is a quintic expression in u and V. In our algorithm, we chose the method given in [CATM74] to calculate the normal: shades are computed by calculating a cubic approximation to the normal at each point on the surface in order to simplify the computation. Twelve coefficients are required to approximate the cubic curve. This way, the rendering system could treat positional and normal information in a uniform way. Each component of the normal vector can be approximated by a Coons patch, corner positions, derivatives in each direction at the corners, and cross-derivatives at the corners, XJiu,v) s UCP^CV, where XÂ„(0,0) XjiO,l) XÂ„(1,0) dX^iO,!) du du dx^a,i) du du dX^iOfi) dX^{Q,\) dv dv JXÂ„(1,0) dX(,\,\) dv dv dX(0,0) d%(S),\) dudv dudv d%i\fi) dXiU) dudv dudv (4.6) and C = 2 -2 1 1 2 -3 0 1 -3 3 -2 -1 -2 3 0 0 0 0 1 0 1 -2 1 0 1 0 0 0 1 -1 0 0 (4.7) PAGE 54 49 The y and z components, YÂ„ ZÂ„ are treated similarly. The values of the matrix PÂ„ Py, and are evaluated at u=0, iz= 1, v=0, and v= 1. These functions approximate unnormalized normal vector functions, therefore, the normal vector is normalized before use. The elements iri the matrix P^ are evaluations of the functions at the corners of the patch and they are given by = iU"MP M^VWMPM^V') + (U'MP M^V'XU'MP^M^V) du ^ (U'MP^M^V')iU'MP^M^V) (lJMPyM''V'){U"MP^M''V) dX(u,v) = (U'MP M^V'){UM^PM^V') + iU'MPM^V){UMP M^V) dv ' ' ' ' ' (4.8) (UMP^M^V'XU'MP^M^V) (UMP^M^V'XU'MP^M^V) = iU"MP^M^V'){UMP^M^V') + (U"MP^M^V){UMP^M^V") dudv + iU'MP^M^V'XU'MP^M^V) + (U'MP^M^VXU'MP^M^V") (U'MPyM^V"){U'MP^M^V) iU'MP^M^V'XU'MP^M^V) (UMP^M^V")iU"MP^M^V) {UMPyM''V')iU"MP^M^V') 4.2 Newton's Method An intersection between a surface Qiu,v) and a ray /?(/) occurs when and only when F(u,v,0 = PAGE 55 50 Since bulk of the computation time for the whole intersection calculation is taken by the root finding procedure, it is crucial to use a highly efficient root finder and the Newton's method is the one used here. This method can give the required solutions to any desired accuracy. Newton's method is a procedure for calculating the root of a polynomial equation or a system of polynomial equations. These equations can be solved iteratively by the Newton method provided that a good initial approximation is known. Newton's method converges quadratically, i.e., near a root the number of significant digits approximately doubles with each step. This very strong convergence property makes Newton's method the method of choice for any function whose derivatives can be evaluated efficiently, and whose derivative is continuous in the neighborhood of a root. In the case of a polynomial equation in one variable, Newton's method has a simple geometric interpretation. If Xo is our initial guess for the solution to the equation f(x) = 0, we calculate the tangent fix^). The next approximation, Xi is given by the point where the tangent crosses the axis. Equation 4.9 is equivalent to = xiu,v) D^-t = 0 Py = y(u,v) D^'t = 0 (4.10) = z(Â«,v) t = 0 where F is a vector, in the form of F = [F, F^ FJ. The system F(u,v,t) = 0 can be reduced to a system of two equations in two unknowns, eliminating /. Then we have the equivalent system PAGE 56 51 g,(u,v) = x(u,v) Z)/z(u,v) = 0 g^M = y(u,v) Dy-z{u,v) = 0 Letting s = (u,v), equation 4.11 can be written in vector form as gis) = 0 (4.11) (4.12) Newton iteration generates a sequence of successively improved approximations to the solution of a system of equations above, using the relation (4.13) where / is the Jacobian matrix defined as J = du dv du dv (4.14) calculated at u = , v = v, and J'\x,) g (x,) is the Newton step. We can rewrite the equation 4.13 as -7' ^l("..'^r)' (4.15) The system of equations above is solved iteratively provided that the Jacobian matrix, /, is non-singular at each approximation point and there is a good initial guess to start with. There is even no need to calculate the inverse matrix since the inverse of 2x2 Jacobian matrix is trivial. Since can be expressed explicitly as PAGE 57 52 1 Â•'c 01 (4.16) det(7) -J,. J{ 00 equation 4.15 can be rewritten as u, (4.17) V, where D{u,v) = 1 / detiJ). Once the Newton steps have been calculated with an initial guess (Uo,Vo), equation 4.17 is used to evaluate {u^,v^) from (Mo,Vo), then {u^,v^) from (Ui,Vi), and so on, until one of the following conditions are satisfied: 1. |gi(",.i,v,Â„) I + |g2(Â«..i,v,^i) I < tolerance l.uorv are outside the parametric intervals 3. |^i(Â«,.i,v,,i)| + |g2(u..Â„v,.,)l > ki(w^v,)| + \g^{uÂ„v;)\ 4. n+l > maximum number of steps If the iteration is terminated by a case 1, it is considered that the ray does hit the patch, and u,+i,v,+i are returned as the parameters for calculating the coordinates and the derivatives at the intersection point. If the iteration is terminated by cases 2 or 3 or 4, it is considered that no intersection happens, and NIL is returned. In this implementation, tolerance and maximum number of steps are chosen as 0.01 and 5, respectively, although these values can be varied according to the surfaces used. It was found all Newton calls were terminated by cases 1, 2, or 3, and PAGE 58 53 that about 90% calls required only one iteration. The search algorithm is performed in the surface parameter space only. The ray parameter is ignored, even while performing Newton iteration, until a solution is found. The Newton iteration finds a pair (u,v) such that a point Qiu,v) on the surface is also a point in a given ray. The Newton method, in high-level pseudocode, is shown in Figure 4.3. 4.3 Algorithm Primitive objects such as bicubic Bezier surfaces are rendered one ray at a time. The algorithm developed utilizing this ray casting technique for bicubic Bezier surface is as follows. A list of tree nodes to be tested for intersection with the ray is sorted by the minimum distance from the origin of the ray to the bounding box of the node so that the next node in the list is always closest to the origin of the ray. The list, at the beginning, contains only the root node of the tree. At each iteration, the closest node on the active list is chosen and removed. If the node is an internal node, each of its children is tested in turn for possible intersection with the ray. If the ray hits the bounding box of a child, the child is put back into the list. The nodes not intersected by the ray are thrown away, thereby saving memory. If the node is a leaf node, then we use the contained (u,v) parameter values to initiate the Newton's method. The loop exits when either the list of nodes to be tested is empty, or an intersection is found whose distance from the origin of the ray is less than the minimum distance from the origin to the next node in the list. PAGE 59 54 Newton (surface, treenode, ray, intersection, t, u, v) { calculate C matrix /* CÂ„ Cy, C, */ get initial values for u and v from treenode success = failure = FALSE while (not success && not failure) { calculate jacobian, J calculate det(J) if (det(J) = 0) { approximate J calculate det(J) again } calculate J'^ calculate gi(u,v), g2(u,v) error = |gi(u,v)| + |g2(u,v)| if (error < tolerance) success = TRUE else if (iterations > = maxit) && (error > = last error) &&((u<0) II (u>l) II (v<0) II (v>l)) failure = TRUE else { calculate Newton steps calculate u^+i, v^^j increment iteration count } } if (success) { intersection = point on surface at (u,v) t = distance from ray origin to intersection point } } Figure 4.3 Algorithm for Newton Method PAGE 60 55 The intersection procedure which is the heart of the whole algorithm is shown in Figure 4.4. The bicubic surface primitive engine (BiSPE) is based on this intersection procedure. This algorithm which executes the task of the BiSPE is suitable for parallel implementation and will be explored in detail in chapter 5. The algorithm follows Thomas's [THOM83] approach for collecting the intersection depth results from the primitive processors and for determining which surface is visible along the current ray. The intersection with the maximum z coordinate is the visible surface on the image plane. 4.4 Lighting Model Once a rendering algorithm has resolved the geometric question of what spot on what object a pixel represents, it needs to resolve what color that spot appears to the observer. The visible surface for a pixel is a small patch of surface. The shading model takes into account the composition, direction, and geometry of the light sources, the surface orientation and the surface properties of the object. Surface normals can be used to indicate the shape and orientation of a surface patch, which in turn is used to determine the shading value. Ray casting is the process of determining which object in a scene is first encountered by a ray emanating from the camera view location and passing through a pixel on the display screen. From the list of ray-object intersections the first visible intersection is chosen. Since there is no further tracing the rays from the intersection point, the conventional shading model due to Phong [FOLE82, ROGE85] is used PAGE 61 56 intersect_surface (surface, ray, intersection) { initialize empty list add root node of surface's tree to list while (list not empty) { remove next node from list if (node = leaf node) { call Newton process if (intersection found) && (closer than previous ones) save it as int_found } else for each child of node if (ray intersects bounding box) add child node to list if (int_found < min_dist_to_nextnode) exit loop } } Figure 4.4 Ray/Bezier Surface Intersection Algorithm PAGE 62 57 with ambient or background lighting, diffuse shading due to surface orientation in relationship to the light source, and specular reflection or highlights due to the location of the light source and the eye as well as the surface orientation. Phong's equation for the intensity of each pixel consists of the sum of three terms given by ^ ^ambient * ^diffuse ^ ^specular = kl + + I, = k I + (fc.cose + kcos"^) (4.18) where kj, and k, represent the ambient, diffuse and specular coefficients respectively for the object and 0 < kÂ„ka,k, < 1, and /, are the intensity of the ambient light which is a constant throughout the given scene and the intensity of the light source respectively, d represents the distance from the perspective viewpoint to the object, K is an arbitrary constant that is included to prevent the denominator from approaching zero when d is small, N is the unit normal to the object at the point, L is the unit vector having the direction to the light source, Fis the unit vector from the eye (origin) to the object, S is the unit vector of the specularly reflected ray, 6 is the angle between N^and L, and PAGE 63 58 source. The normal to the surface at the point of intersection describes the surface orientation. Each point in the scene visible to the viewer must have the normal to the surface at that point computed prior to the lighting model implementation. For evaluating the intensity of the light at each pixel, three color components of the intensity should be separately calculated. For an RGB video monitor, color components are red, green, and blue. Parameters for intensity and reflectivity then become three-element tuples, with one element for each of the color components. For instance, the triple representing the coefficient of diffuse reflection is (k^^ k^. However, since the constant value k, for specular reflection is not dependent on the color of the surface, k^ is the same for all three color components. Hence, the intensity calculation breaks into three equations: (4.19) h = kj^ + -^*{k^{N.L) + k^iVm a+K 4.5 Results and Discussion The ray tracing technique is an object oriented algorithm. Rather than group the software by procedure, we group it by data structure. Thus, instead of collecting all the intersection methods into one file and all the normal vector formulas into another, we split the problem the other way, collecting procedures for each primitive PAGE 64 59 Figure 4.5 Diffuse and Specular Reflections PAGE 65 60 type into a file of its own. If there was more than one primitive such as spheres, polygons, there would be one file containing patch-related routines, one file for sphere-related routines, another for polygon-related routines, etc. This has the advantage that primitive-dependent information can be hidden in data structures local to each file and'tlie procedure interfaces can be very simple and generic. Adding new primitives to the system becomes easy with this scheme. Since the details of each primitive's data structure will be local to that sub-module, all operations on the primitive are supported by generic procedures. Each object shape is defined using its individual parameters. The intersection routine is object oriented also. Each object shape uses different procedural techniques to determine an intersection. 4.5.1 Complexity and Performance Analysis In this section, the precision required in the critical parts of the algorithm is examined. The algorithm is bound almost exclusively by floating point speed, not by storage management, not by dataflow complexity, and not by program control complexity. We intersect each ray with the environment by simply testing each and every primitive object retaining the nearest point of intersection, if one exists. This has a time complexity which is linear in the number of objects. All the intersections are sorted in order of increasing parameter t, and the first such intersection is returned. To find the intersections, we need to solve two simultaneous equations depicted in PAGE 66 61 equation 4.11. The number of floating point operations (flops) necessary at each step solving these equations are summarized as follows. 1. Calculation of gi(w,v) and gi{u,v). a. Computation of the coefficient matrix, C = M P \f. Three 4x4 matrix multipHcation requires 40 subtractions, 16 additions, and 24 multiplications (knowing M contains I's and O's in certain positions, the calculations are simplified). This is just for one component, therefore for three of them it adds up to 120 subtractions, 48 additions, and 72 multiplications. b. Computation of x(ii,v), y(u,v), z(m,v). It takes 4 multiplications to calculate Â„^ li, v^, and v'. Multiplication of 1x4 U matrix with 4x4 C matrix and multiplication of resultant with 4x1 V matrix requires 15 multiplications and 15 additions for one component bringing the total to 49 multiplications and 45 additions for all three components. c. The solution of equation 4.11, once the necessary terms are calculated, requires 2 multiplies and 2 subtractions. 2. To solve the equations above, we need to calculate Newton iteration procedure. a. Calculation of partial derivatives for x, y, and z components requires 49 multiplications and 39 additions. b. Calculafion of partial derivatives for gi and needs 6 multiplies and 4 subtractions. c. Calculation of det(J) requires 2 multiplications and 1 subtraction. d. Calculation of D requires 1 division. PAGE 67 62 e. To solve equation 4.17, 6 multiplications and 4 subtractions are needed. 3. Computing t takes 1 addition. It is not hard to see that, taken at its simplest level, ray casting is highly computationally expensive. These results are for ray/patch intersection for one iteration. Since it takes only one iteration 90% of the time for Newton iteration to converge, the total number of single precision floating point operations are tabulated and are shown in Table 4.1. For a resolution of 1024x1024, there are one million rays need to be traced per frame at 30 frames per second to achieve real-time performance. A real-time processor has to dispatch the computation for each ray in 33ms. Thus, if only 10 floating operations are required per ray, real-time ray casting requires a 300 megaFlop processor. As we have seen, we have at worst case approximately 448 floating point operations per ray/patch intersection. Calculation yields a complexity of 3 Hr/frame on a quarter megaFlop machine (VAX 11/780). The state of the art for general purpose machines is roughly 200-1000 megaFlops. It is clear that in order to realize real-time ray-traced images we need substantial advances both in algorithm development as well as in special purpose hardware design. In this chapter, the analysis of the algorithm is given without taking advantage of any coherence. It seems that incorporating very strong ray-to-ray coherence is a must for a practical ray tracing system. Incoherence can be an advantage when seeking to parallelize the algorithm. The opportunity for virtually unbounded parallelism is afforded by the fact that the each pixel computation proceeds PAGE 68 63 Table 4.1 Number of Floating Point Operations for Ray/Patch Intersection Term Multipli cation Addition Subtraction Division gi, & 123 93 Ur+i, v,,i 61 39 t 1 Total 184 133 122 8 1 130 1 independently of all others. Because it uses well known numerical procedures, fast floating point pipelines will without any difficulty directly accelerate the speed of image generation. There are several optimizations which are carried out on the ray/surface intersection calculations. First, rays are transformed to lie along the z-axis, then the same transformation are applied to each candidate object, so that any intersection occurs at X = y = 0. This simplifies the intersection calculation and allows the closest object to be determined by a 2 sort. The intersection point can then be transformed back for use in shading calculations via the inverse transformation. Second, by not computing the normal in the intersection routine, we make each intersection test cheaper. Finally, using integer arithmetic helps speed some computations. PAGE 69 64 4.5.2 Simulation Results A ray casting image rendering system has been designed to make the code easily expandable and maintainable. The validity of the algorithm was verified by software simulation. The algorithm is implemented in the C language on an IBM AT/386 computer with" a 80387 floating point processor. An object oriented design philosophy was used for the program, and the resulting code has been made robust. A bicubic Bezier surface is presented to the program as three matrices of x, y, and z separately. The memory requirements for the bicubic Bezier surface are determined largely by the size of the tree of bounding boxes. Performance characteristics for various surfaces are given in Table 4.2. Although, CPU times are often misleading, being dependent on programming style, code optimization, and hardware, we still supply these in order to give an indication of the code complexity. One of the major advantages of ray tracing algorithms is that they provide a uniform geometric framework, the line/surface intersection, for displaying variety of different surface types. Unfortunately, this increased generality increases the computation time significantly. Therefore, we consider implementing some part of the algorithm in hardware. One approach is to exploit the independence of the calculations from pixel to pixel and perform these calculations in parallel. The pixel level approach of the proposed architecture reduces the geometry that is typically involved for solving intersection problems to the comparison of depth values. Hardware implementation of the primitive engine algorithm finding the roots of polynomial very quickly and in a robust manner is discussed in the next chapter. PAGE 70 65 Table 42 Performance Characteristics for Various Surfaces Surface 1 eapoi OUilCi c Onii phnut^ Number of natches 32 8 16 Number of control vertices/patch 16 16 16 Total number of ravs 64000 40000 64000 Average boxes checked/ray 44.6 24.32 31.12 Average Newton caHs/ray 0.78 0.89 0.87 Average iterations/call 1.08 0.92 1.03 Modelling time 2% 1% 1% Bounding box intersect time 46% 45% 48% Newton time 23% 28% 25% Shading time 15% 14% 11% Miscellaneous time 14% 12% 15% Total time (hrs:mins:secs) 0:19:23 0:8:07 0:12:45 ' Control vertices data for teapot is taken from IEEE CG&A magazine, Vol. 7, No. 1, January 1987. Control vertices data for doughnut and sphere is taken from IEEE CG&A magazine, Vol. 7, No. 4, April 1987. PAGE 71 CHAPTER 5 THE PARALLEL IMPLEMENTATION OF THE ALGORITHM Ray casting algorithm has the advantage of being very simple, highly parallel in nature, and calculation intensive enough to produce good speedup even when each processor is assigned to a different pixel. To accelerate ray tracing some of the operations, especially intersection computation, can be performed in parallel since the computation of each pixel is independent of the others. This fact allows the ray intersection of each primitive, a bicubic Bezier surface in our case, to be computed in parallel. Each surface is assigned to a primitive engine (processor) which determines the intersection depths. Then, these intersection results are passed to another processor, depth comparison logic, whose task is to determine the visible surface by sorting the intersections in depth. If the ray doesn't intersect the surface, that is the intersection is behind the observer and should not be displayed, the roots of the polynomial are complex and no results are passed to the depth comparison logic. If the ray is tangent to the surface, one real value is passed to depth comparison logic. If the ray intersects the surface in two or three points, the real values are passed to the depth comparison logic. The algorithm presented in the last chapter for the primitive engine was developed to efficiently use the low cost parallel processing possible in VLSI. In this 66 PAGE 72 67 chapter, bicubic-surface primitive engine (BiSPE) based on bicubic surface concepts is presented. The whole architecture called GraPE configuring the system in a pipeUne fashion with the primitive processing units (PEs) implemented in parallel is explored by Iskandar [ISKA90]. The system presented in his dissertation can be summarized as follows." A database manager at the top of the hierarchy distributes several objects to several pipelined viewing-transformation engine (VTE) and object processor (OP) combinations whose outputs are then distributed to a collection of primitive engines (PEs). The PEs broadcast the results of their computations such as color, intensity and depth, to the compositing network designed by Fleischman [FLEI88]. This pipelined-parallel GraPE architecture is shown in Figure 5.1. It is the responsibility of the compositing network to combine these pixel attributes into the final image. Iskandar claims that a workable system can be attained with as little as one object processor and one set of primitive engines such that the set must contain one engine for each kind of primitive supported by the system. Therefore, BiSPE design appropriate for VLSI implementation can be incorporated into GraPE primitive engine as part of the bicubic surface generation unit. The architecture then can be connected to the compositing network. The details of the connection still need to be worked out. In implementing the algorithm in parallel, first step is to calculate the coefficients of parameters u, v in equation 2.3 and find out how many additions, shifts, multiplications, etc. we have. Once the coefficients are known, the second step is the solution of the non-linear equations. This might require more computational PAGE 73 68 HOST COMPUTER TRANSFORMAT ION ENG I NE C L I P P E R I OBJECTS MANAGER c 0 p n m N b |7r E DISPLAY CONTROLLER D I SPLAY Figure 5.1 Pipelined-Parallel GraPE Architecture PAGE 74 69 power from each primitive engine than is feasible to place on a single chip. That's why we need some simplifications that can be applied to the computation of coefficients and solution of the equations. The usage of coherence and forward differencing results in simple enough hardware such that one or more primitive processors could be placed on a chip. This chapter examines these techniques to reduce the computation given in chapter 4. 5.1 Coherence Although an algorithm which treats all rays in the same way has the advantage of being uniform, its efficiency is limited without taking advantage of coherence. It is worth investigating whether coherence can be kept in the parallel implementation, and if so, to what degree. One must realize that as the number of processors in a system increases, the granularity of coherence is reduced. The primary coherence properties are scanline to scanline, pixel to pixel and area coherence. In this dissertation, the first two are used. In the former, adjacent scanlines intersect nearly the same set of edges which are then updated incrementally from scanline to scanline. In the latter, for interpolation across a scanline, a pixel would have a small change in its value from its neighbors if it is from the same span. The coherence along a scanline, that is pixel to pixel coherence, is in the fact that only one parameter in the ray definition changes as the ray is cast through the sequence of pixels. The algorithm takes advantage of the coherent structure of a scene to reduce the computations. The idea simply is that adjacent pixels on a PAGE 75 70 scanline are likely to have the same characteristics. In our case, the set of visible object spans determined for one scanline of an image typically differs little from those of the next scanline. Similarly, other properties of each scanline are quite like those of the preceding scanline. Calculations performed are dramatically reduced in the algorithm using these characteristics. Since the root-finding algorithm requires an initial value, coherence is exploited by beginning v^th the previous scanline's solution for the current scanline. During image generation, the sample points on the image space are processed in scanline order. At the beginning of each scanline processing, every bounding box tree is traversed. gi(u,v) (see equation 4.11) has to be calculated only once for a sample point. Similarly, g2iu,v) only has to be computed once for a whole line of sample points. In processing a new sample point, instead of calculating g,(M,v) from equation 4.11 again, it is updated by the increment calculated with the forward differencing method. For each scanline, g2(u,v) is also obtained incrementally. This way, a good deal of computation is saved. 5.2 Forward Differencing Technique An efficient way of repeatedly evaluating the bicubic is by using forward differences [FOLE82]. Forward differencing, an incremental technique, is used to reduce the hardware required to implement the primitive processor. This technique for rendering curves and surfaces with adaptive subdivision is explored in [LIEN87] and [SHAN88]. In these papers, shading and image mapping on a bicubic surface PAGE 76 71 is performed by drawing many curves very close to each other so that no pixel gaps remain between them. Livadas et al. [LIVA88] used this incremental technique for the computation of quadratic surfaces. In this section, the scanline parameters at the beginning of each scanline are updated, taking advantage of coherence, using forward differencing technique.' The scanline parameters independent of the vertical parameter are constant from scanline to scanline and thus require no computation between scanlines. Computation of the forward differencing steps follows. The forward difference of a function /(/) is A/(0 =/(r+6) -/(O, 6>0 (5.1) We can rewrite this as fit^b) = m ^ Am (5.2) So, if we know/(0 and A/(/) we can determine 6). Rewriting the above equation in iterative terms, we see that this is = /Â„ A/Â„ (53) where we evaluate /in constant steps of size 6, so n and t are related by tÂ„ = Sxn, and/, =/(0. For a bicubic surface Qiu,v) = UCV^, the forward differences are A^x{u,v) = x(u,v+6) xiu,v) aIxM = \x(u,v+5) A^xiu,v) (5.4) a';c(Â«,v) = aJx(Â«,v+6) aJx(Â«,v) PAGE 77 72 The equations above are forward differences on v. Forward differences on u are calculated similarly. A,x(u,v) is a second-degree polynomial in v when u is kept constant. x(m,v) is linear in v whereas the third forward difference is a constant, so further forward differences are not needed. Â• In the algorithm', the initial condition calculations, that is u = 0, v=0, are done by direct evaluation of the equations. If we put these equations into a matrix D as shown below D, = x(0,0) A^x(0,0) A;x(0,0) A>(0,0) AÂ„x(0,0) AÂ„(A^x(0,0)) AÂ„(A;x(0,0)) AÂ„(A^x(0,0) A^x(0,0) A^(A^x(0,0)) A^(A;x(0,0)) AliAlx(0,0) A^x(0,0) A^(A^a:(0,0)) aJ(A;x(0,0)) a'^CaJxCO.O) (5.5) then D = 10 0 0 0 6 6^ 6^ 0 0 0 0 0 66^ ^33 ^32 ^31 ^30 ^23 ^^22 ^21 ^20 ^13 ^12 ^11 ^10 ^03 ^02 ^01 ^00 10 0 0 0 6 0 0 0 6^ 26^ 0 0 6^ 66^ 66^ (5.6) PAGE 78 73 Rewriting the equation 5.6, D = Â£(6) C Â£(5) (5.7) where e and S are the parametric inaements in the u and v directions, respectively and C is the 4x4 coefficient matrix. Because we are dealing with three functions, xiu,v), yiu,v), and z(m,v), there are three corresponding sets of initial conditions, = Â£(Â€) QEiS), D, = E(e) C,E(S), and A = E(e) CM^)We will concentrate on the X component since the calculation of others are exactly the same. Computation of x(0,v) and x(u,0) in increments of S and c, respectively is done by applying a forward-step operation F to the matrix Then, we obtain the matrix below DD. = ^00 ^'^lO ^01^^11 '^02+<^12 ^03*^^13 ^10+^ ^11+<^21 ^12+^ ^13 +'^23 '^20-' ^30 ^21^^31 ^22-^^32 ^23 -"^33 *30 *31 *32 *33 (5.8) where F = 110 0 0 110 0 0 11 0 0 0 1 (5.9) PAGE 79 74 After a forward-step operation is applied, the first row of has the terms xie,0), A;c(Â€,0), Ajx(Â€,0), and A^(e,0). These quantities are now used to compute x(Â€,v), using forward differences. Applying F repeatedly, the first row of is used to compute x(2e,v), and so on. Substituting column for row, that is transposing DÂ„ and applying F to DÂ„ we calculate x(u,0), x(u,5), x(u^5), and so on. The procedure calculating forward differences is shown in Figure 5.2. Using forward differencing, a polynomial of degree n can be evaluated at a sequence of equally spaced points by only n additions [CONT65]. This is considerably better than evaluating the polynomial all over again. However, error accumulation can be an issue with this approach, and sufficient fractional precision must be carried to avoid it [BART87]. The forward difference basis allows parallel additions which are suitable for a pipeline implementation. We compute the intersection at one pixel and apply forward differencing to get the next pixel. The e step size for u and S step size for V are adjusted so that the curve steps along in approximately one pixel steps in screen coordinates. Forward differencing method has the advantage of producing picture quality equivalent to adaptive subdivision without the memory stack management overhead of recursive subdivision. The relatively poor performance of adaptive subdivision is due to the fact that a subdivision operation takes significantly more computation than a forward difference operation. Forward differencing also makes patch rendering performance competitive with polygon rendering. The technique operates in u, v space but polygon methods operate in the screen scanline order, PAGE 80 75 cubic_visible_surface algorithm { for i = 1 to num_patches { read patches / } * P P P */ for i = 1 to num_patches { calculate coefficient matrices /* CÂ„ Cy, Q */ } for i = 1 to num_patches { D, = E(Â€) X C, X E(5f ; = E(e) X X E(6f ; D, = E(e) X Q X E(5f ; /* get initial values */ DU, = F X D,; DUy = F X Dyi DU, = F X D,; D, = D7; Dy = D DV, = F X D,; DVy = F X Dy; DV = F X D Â• /* calculate x(u,v), y(u,v), z(u,v) */ /* in increments of e */ D, = D, /Â• calculate x(u,v), y(u,v), z(u,v) */ /* in increments of 5 */ for i = 1 to num_scanlines { update scanline parameters for j = 1 to num_pixels { calculate intersection determine lighting color pixel increment scanline parameters } increment inter-scanline parameters } Figure 5.2 Procedure for Calculating Forward Differences PAGE 81 76 therefore, once the initial parameters are found, forward differencing does not require a transformation from screen space to image coordinates as the polygon method does. 5.3 Pipelined and Parallel Architecmre For applications that require fast transformation such as ray casting and rendering of complex shaded images, a parallel, pipelined multiprocessor architecture can be used to achieve high throughput. A parallel architecture offers an increase in speed that grows almost linearly with the number of processors used. To extract the maximum efficiency from a parallel architecture, key bottlenecks must be identified and an architecture that is best suited to overcoming those bottlenecks must be chosen. In a pipelined architecture, each processor performs a unique operation on the entire data set, or object. Once, it has completed its operation, such as rotating an object, it passes the data to the next processor in the pipeline, which performs another function, such as clipping. While that processor is clipping the current object, the previous processor can be performing geometric transformations on the next object. The algorithm proposed is designed utilizing both pipelining and parallelism to achieve real-time performance. It is developed such that each intersection processor can be placed in a VLSI chip. In a practical display system several of these chips may be used to recursively calculate surface-ray intersections at pixels or groups of pixels. The bicubic surface primitive engine (BiSPE) has the capability to PAGE 82 77 compute the intersection of a ray and bicubic surface with sufficient throughput to allow real-time interactive display. Pipelining helps the system achieve high throughput. The next computation can be started before the previous result is available, but by the time the last circuit is reached, the result must be there. The bicubic surface computation unit, BiSPE, is responsible for computing the depth and color intensity values of pixels. A hardware implementation of the depth computation unit consists of muhipliers, adders, and shifters. BiSPE is based on the algorithm given in chapter 4 for rendering bicubic surfaces. The underlying principle behind the algorithm is the representation of primitives as curved surfaces, not polygons. The block diagram of BiSPE is shown in Figure 5.3. A parallel computation in x, y, and z was chosen because it operates three times as fast as computing these values one after another and there appears to be space enough on the chip to allow it. Based on bounding box tests, each PE computes the pixel representation of each primitive sent to it. The BiSPE engine produces two values, the color of the surface at the display point and the depth of that point. A combining circuit then determines the visible scene dynamically while several BiSPE engines produce the next pixel's color/depth information. 5.4 Results and Discussion The difficulty in achieving a real-time performance in a graphics system is largely due to the insufficient bandwidth to the image memory. Consider a 1024 by PAGE 83 78 Read Patch Control Points I Compute Bounding Box Compute Intersection Test for Hit Compare to Nearest Update Nearest t repeated for Â•wy ray U. V, t re 5.3 Bicubic Surface Primitive Engine PAGE 84 79 1024 display; for an image update rate of 30 frames per second, each pixel on the raster display must be updated every 31 nanoseconds or less. Current VLSI fabrication technology makes it possible to fabricate a 32-bit CPU on a single chip. However, to achieve high performance from that processor, the architecture and ithplementation must be carefully designed and tuned. The MIPS processor incorporates some new architectural ideas into a single-chip, NMOS implementation. Processor performance is obtained by the careful integration of the software (e.g., compilers), the architecture, and the hardware implementation. This integrated view also simplifies the design, making it practical to implement the processor using the NSF supported fabrication facility (MOSIS). One method of improving the performance of ray casting, to which the method is particularly well suited, is to introduce parallelism. Since each ray is independent of its neighbors, it can perform its computation in parallel with others without interference. An alternative use of parallelism would be for each object to have its own processor, and to supply on demand the z values of the intersections of a given ray with the object. This is analogous to the "processor per polygon" approach to parallelism in conventional rendering but at a higher level which in principle allows more advantage to be taken of any special properties of the object. An advantage of ray casting, and a possible justification for its use, is that it avoids the need to make polygonal approximations of curved surfaces. Given an analytic specification of the surface, all that is necessary is that it should be possible to solve the ray-surface intersection test. PAGE 85 80 5.4.1 Complexity and Performance Analysis The depth computation unit (DCU) or bicubic surface primitive engine (BiSPE) analysis examines two areas: complexity and performance. Complexity is estimated for discrete construction, for VLSI fabrication, and for gate-array construction. Performance is examined with respect to image resolution and DCU processing speed. DCU's complexity is measured utilizing gate count estimate which is determined by partitioning the DCU conceptual hardware organization into individual functional logic blocks. Then, the estimated gate count of each functional block is determined and totaled to provide a gate count estimate of a DCU. This technique provides an estimate of expected complexity for integrated circuit fabrication. It also provides an estimate of board-level complexity for an off-the-shelf integrated circuit implementation, which is determined through totaling the functional logic package types used. It should be noted that performance is usually enhanced for a realization by judicious use of additional gating, which may alter the estimated gate count. When the architecture is built in a parallel and pipelined fashion, matrix multiplication takes considerably less computation than given in section 4.5.1. If we have four processors running in parallel, multiplication of the first row of M (Bezier matrix) with the first column of P (matrix of control points) requires four subtractions only. This computation where MP is MxP is given below. PAGE 86 81 AfP[0][0] = 3x20 + 3*10 = (x^-Xq^ 2(^20 -JCio) (x^-x^^ Calculation of first row of M with the other columns of P, namely MP[0][1], MP[0][2], and MP[0][3], is exactly the same. Multiplication by 2 is just shifting 1 bit left and doesn't require extra hardware since it is hardwired. On the average, matrix multiplication when it is done in parallel takes one clock cycle. Pipeline (propagation) delay is three add delays and after every delay one result is obtained. Computation of the second row of M with P requires three adds and two subtracts: MP[1][0] = 3(xÂ«,2X10+^20) = 3[(^oo-^io) ^ (^20-^10)] = 2[(x^-x,^ + (J^ao-^io)! ^ [(^00-^10) ^ (^20-^10)1 For the third row, it takes one add and one subtract: MP[2][0] = -3x^ ^ 3x,, The units to calculate elements of each row are shown in Figure 5.4. The results above show that multiplying three 4x4 matrices costs 22 additions/subtractions for one component. Computation of this matrix with U and then multiplication of the resultant with K requires 3 multiplies and 3 adds each. So, forx(u,v) 6 multiplies and 28 additions/subtractions are needed. In total, calculation ofx(u,v),_y(M,v), and 2(u,v) requires 84 adds/subs and 22 multiplies. Rest of the computation is the same as the rest of the computation given in section 4.5.1. The number of floating point PAGE 87 82 Figure 5.4. Block Diagram of the Matrix Computation Unit PAGE 88 83 operations when the intersection calculation is implemented in parallel is shown in Table 5.1. The total number of floating point operations to calculate the intersection of a ray with a patch at a pixel with one iteration is 177. Clearly, there is a substantial saving compared to 448 floating point operations when the algorithm is implemented sequentially. The gate count of 32-bit full adder is 518 [KUCK78]. The gate count for multiplier is 15,284. With the iterative method, division requires 6 multiplications and 3 subtractions. So, for 177 operations the total gate count is approximately 1,000,000. In fact, gate counts of 5000 are often used in multipliers, therefore this circuit can be realized with a lot less than 1,000,000 gates. Because of the finite number of bits used in representing number, every mathematical computation has the potential of introducing a round-off error in its result. To prevent this error, guard bits are used. This means operands of any Table 5.1 Number of Floating Point Operations for Ray/Patch Intersection Term Multiplication Addition Subtraction Division gi,g2 20 42 44 u,,i, v,,i 33 27 9 1 t 1 Total 53 70 53 1 PAGE 89 84 . mathematical function must be represented by more bits than the result. The number and type of functions performed dictate the extra precision (guard bits) required by the operands. If the result of every operation is rounded, the error is plus or minus one half of the least significant bit. Thus, for a number represented by n bits, the worst case" error is Z^^"*^'. For multiplication and division, the least significant bit (LSB) of the result may be wrong. Ergo, for these operations, operands with higher precision than the result must be used to counter the error. For addition/subtraction, the largest errors occur in producing differences in nearly equal numbers. These operations were eliminated when possible. Depth computation is handled by the depth computation unit (DCU). Figure 5.5 shows the components of DCU. This unit solves for the roots of equation 4.11. The number of bits required for each DCU's component depends on the required precision at the output. We use 24 bits for the depth information, since the depth resolution of 24-bits is satisfactory for high-end graphics devices. Since only positive values are used and negative intersection depths are ignored, the display space is limited to 23 bits of precision in the Z dimension. This limit dictates the precision on the X and Y dimensions to be 23 also. The result at the output, namely the parameters u, v, and /, requires 23-bit precision as discussed above. When the precision of the DCU's components is traced back from the output, we get 32-bit precision at the input. For real-time animation, a minimum image update rate is 10 frames per second. A 30-frames-per-second image update rate is more typical for high quality PAGE 90 85 1 PAGE 91 86 animation. This kind of image update rate leaves 33 milliseconds computation time for each image. This is enough time to render 60 primitives (patches). We can see that quite complex images can be built from this number of primitives. For the minimum image update rate, three times more primitives can be rendered. Many more primitives can be rendered if the clock frequency can be increased. A 100 MHz clock will double the primitives that can be handled. The above computation estimates the performance of one primitive engine. Several of these primitive engines can be running in parallel. The potential performance of the whole system then will be the performance of one primitive engine multiplied by the number of primitive engines in the system. We just estimated scene complexity by the number of primitives that can be rendered within one-frame time. Another measure of scene complexity is based on the screen resolution that can be managed by the system. This can be found by determining the pixel time. The pixel time is the time available to process one pixel. The pixel time depends on both screen resolution and image update rate as depicted by the following equation 1 processing time = (Image update rate) * (Screen resolution) From the above equation, processing times for different image resolutions are shown in Table 5.2. PAGE 92 87 Table 52 Processing Time for Various Screen Resolutions Screen Resolution Processing Time (ns) 640 X 480 1280 X 960 1280 X 1024 1280 X 1280 1400 X 1280 2048 X 2048 325.5 81.4 76.3 61.0 55.8 23.8 Note : Image Update Rate = 10 frames per second Since the system is implemented in a pipeline fashion, its pixel time is determined by the slowest components in the pixel computation unit. These components are the floating point add or subtract units. Using the i486 as reference, a floating point add or subtract takes 5 clock cycles. The algorithm requires only 177 flops per iteration step compared to 419 flops for the method presented in Sweeney and Bartels [SWEE86]. Our algorithm takes only one iteration to converge whereas theirs take two or more. The normal computation takes 216 flops when theirs require 407 flops. Kajiya's method [KAJI82] takes 6135 flops in the worst case. The interval Newton method [TOTH85] involves similar figures. PAGE 93 88 5.4.2 Simulation Results Clearly, the intersection of a ray and a bicubic Bezier surface is not a simple task. In spite of the complexity of the mathematics involved, the actual implementation is simple enough to allow convenient hardware implementation of the algorithm. The architecture developed to implement the algorithm for the bicubic surface primitive engine, BiSPE, is shown in Figure 5.3. The algorithm correctly handles the difficult cases of the ray tangentially intersecting a planar patch and intersections of the ray at a silhouette edge of the patch. It can be broken down into modules. The granularity (the size of tasks which are assigned for work) of the modules depends on what kind of hardware or architecture they are implemented on. One possibility of a coarse architecture is to implement the modules as firmware for a general purpose microprocessor with floating point support such as the Intel i860 or i486 (80486). Of course this type of implementation wastes a significant amount of the computing power and real-estate because of the general nature of the processor. The ideal implementation would be a medium-grain architecture with only the time-consuming tasks implemented in silicon. This is a type of microcoded processor. This kind of architecture makes it possible to implement the algorithms for different primitives on one type of processor. A fine-grain implementation is where every step in each algorithm is hard-wired. This of course will yield the fastest processor. However, due to complexities of the algorithms, this kind of architecture needs a considerable amount of silicon. PAGE 94 89 Depth computation unit, DCU, can be implemented in VLSI by today's technology. The difficulty may arise in implementing several DCU in a single chip. The goal is to implement the complete system on a single board, and several DCUs may be required to be in one chip. With the advancement of VLSI technology, this will be possible in near future. PAGE 95 CHAPTER 6 CONCLUSION 6.1 Summary This dissertation presents the derivation of a ray casting algorithm for bicubic Bezier surfaces. A paralleHzed version was proposed. The major feature that distinguishes the algorithm lies in the way the primitives are rendered directly from the bicubic form, while most other systems are based on polygonal primitives. I had two goals when I started this research. One was to do the intersection computation in short time so that the use of patches in an interactive environment will be attractive. My second objective was to design the algorithm so that it can easily be implemented in VLSI chips. The two goals set at the beginning of the research have been accomplished: a computationally regular and efficient process for the display of bicubic Bezier surfaces is devised, and a novel pipeline with hardware enhancements for image compositing and pixel production schemes is utilized. The algorithm developed is simple, efficient and fast. The method presented here compares well with the methods given in Sweeney and Bartels [SWEE86], Toth [TOTH85], and Kajiya [KAJI82]. The research focused on a numerical technique called Newton iteration to calculate the intersection points of a ray and a bicubic Bezier surface. There are several reasons for selecting the Bezier representation. 90 PAGE 96 . 91 The Bezier control mesh provides a rough approximation to the surface, and thus makes designing test cases relatively simple and more intuitive than if some other representation had been chosen. In this research, a processor is assigned to a primitive in the display. So, a primitive processor is the intersection computation hardware for a single surface. The points of the intersection are the roots of the equation Q(u,v) Dt RO = 0 in terms of the parametric variables u, v, and /. After obtaining the roots of the above equation, coherence and forward difference are used to reduce the number of operations required. Usage of these methods results in simple enough hardware such that one or more primitive processors can be placed on a chip. It appears that the algorithm is simple enough to be implemented in a VLSI chip in the future. The purpose of the chip is to calculate the intersection point of a ray with the curved surface. Parameter values (u,v) specify the location of the intersection on the patch, and parameter value t gives the location of the intersection on the ray. The faster image creation and a more natural user interface resulting from the way objects are defined in the database are the foreseeable advantages of the proposed design. Ray casting is a very popular technique because of the realistic images that can be produced. A great variety of visible surface algorithms have been developed over the years, but not the research reported here. Not much progress has been made on hardware techniques for rendering curved surfaces. PAGE 97 92 These steps are followed in this research: 1. An algorithm is developed to calculate the intersection of the ray with the cubic surface. 2. The computation required is reduced using coherence and forward differencing methods. By taking advantage of image coherence the average number of iterations per intersection test is reduced to one. 3. Ray casting has been a particularly popular candidate for parallel implementation because, in its simplest form, each pixel is computed independently. The algorithm is designed such that it can be placed in a VLSI chip taking advantage of the parallelism of the ray casting algorithm. This also reduces the algorithm's computational requirements. 4. The algorithm's complexity is studied to determine its computational order. Moreover, the operations in the algorithm are also counted. Advantages of the algorithm are that it is conceptually relatively simple, and the code produced therefore is concise, easy to understand and easy to generalize. Newton's method is by nature approximative and this can be a disadvantage of the algorithm proposed here since imprecise intersection solutions can occasionally lead to problems. PAGE 98 93 6.2 Future Work Many improvements or extensions of the algorithm are possible. The objects in the world scene may be extended to include other objects such as spheres, fractals. Each object type may be defined separately and individual routines for the intersection computation may be included. As long as a routine may be written to determine the point of intersection and the normal to the object at that point, the object may be included as part of the scene, taking advantage of the object oriented property of the algorithm. The concept of bounding volumes could be extended to group objects in the environment which are physically close to each other. A hierarchical environment could be created of the entire scene, each level surrounded by a bounding volume. If the ray intersects the bounding volume, the components of that cluster need to be examined; otherwise several objects may be eliminated from the intersection process. The model file would supply the information regarding the groupings, creating some work for the user, however, the time savings of the ray casting process is substantial [WEGH84]. The Bezier representation is actually a special case of B-splines. Extension of the algorithm to bicubic B-spline surfaces would only involve converting the control mesh to the internal power basis representation. Ray casting algorithm can be extended to be a ray tracing algorithm when rays are further traced from the intersection point. Surface shading, shadowing, and anti-aliasing can also be added to make the algorithm more complete. PAGE 99 94 The hardware referred to in this research was simulated in software. This is required to determine its detailed complexity, possible pipelining structure and register requirements. Hardware implementation of a primitive processor for ray/surface intersection is the next logical step. PAGE 100 BIBUOGRAPHY APPE68 Appel, A., "Some Techniques for Shading Machine Renderings of SoMs" AFIPS Spring Joint Computer Conference, Vol. 32, 1968, pp. 3745. ARV087 Arvo, J. and Kirk, D., "Fast Ray Tracing by Ray Classification," Computer Graphics, Vol. 21, No. 4, July 1987, pp. 55-64. BADO90 Badouel, D., Boutaouch, K., and Priol, T., "Ray Tracing on Distributed Memory Parallel Computers: Strategies for Distributing Computations and Data," Technical Report 508, Institut de Recherche en Informatique et Systemes Aleatoires (IRISA), January 1990. BARS84 Barsky, B. A, "A Description and Evaluation of Various 3-D Models," IEEE Computer Graphics and Applications, Vol. 4, No. 1, January 1984, pp. 38-52. BART87 Bartels, R., Beatty, J., and Barsky, B., An Introduction to the Use of Splines in Computer Graphics, Morgan Kaufmann, Los Altos, CA 1987. CATM74 Catmull, E., A Subdivision Algorithm for Computer Display of Curved Surfaces, Ph.D. Dissertation, University of Utah, December 1974. CATM75 Catmull, E., "Computer Display of Curved Surfaces," Proc. IEEE Conf on Computer Graphics, Pattern Recognition and Data Structure, Los Angeles, May 1975, pp. 11-17. CHAN89 Chang, S., Shantz, M. and Rochetti, R., "Rendering Cubic Curves and Surfaces with Integer Adaptive Forward Differencing," Computer Graphics, Vol. 23, No. 3, July 1989, pp. 157-166. CLAR79 Clark, J. H., "A Fast Scanline Algorithm for Rendering Parametric Surfaces," Computer Graphics, Vol. 13, No. 2, August 1979, p. 174 (supplement to Proc. SIGGRAPH 79). CLAR82 Clark, J. H., "The Geometric Engine," Computer Graphics, Vol. 16, No. 3, July 1982, pp. 127-133. 95 PAGE 101 96 CLEA88 CONT80 COOK84 DENN83 DeR089 DeROS87 DIPP84 FAUX79 FLEI88 FOLE82 FOLE90 FUJI86 Cleary, J. G. and Wyvil, G., "An Analysis of an Algorithm for Fast Ray-tracing Using Uniform Space Subdivision," TJie Visual Computer, Vol. 4, 1988, pp. 65-83. Conte, S. D., Elementary Numerical Analysis, McGraw-Hill, New York, 1980. Cook, R. L., Porter, T., and Carpenter, L., "Distributed Ray Tracing," Computer Graphics, Vol. 18, No. 3, July 1984, pp. 137-145. Dennis, J. E. and Schnabel, R. B., Numerical Methods for Unconstrained Optimtation and Nonlinear Equations, Prentice-Hall, New Jersey, 1983. DeRose, T. D., Bailey, M. L., Barnard, B., Cypher, R., Dobrikin, D., Ebeling, C, Konstantinidou, S., McMurchie, L, Mizrahi, H., and Yost, B., "Apex: Two Architectures for Generating Parametric Curves and Surfaces," Visual Computer, No. 5, May 1989, pp. 264-276. DeRose, T. D. and Holman, T. J., "The Triangle: A Multiprocessor Architecture for Fast Curve and Surface Generation," Technical Report 87-08-07, University of Washington, Seattle, August 1987, pp. 1-36. Dippe, M. and Swensen, J., "An Adaptive Subdivision Algorithm and Parallel Architecture for Realistic Image Synthesis," Computer Graphics, Vol. 18, No. 3, July 1984, pp. 149-158. Faux, I. D. and Pratt, M. J., Computational Geometry for Design and Manufacture, EUis Horwood Limited, New York, 1979. Fleischman, R., Augmentable Object-Oriented Parallel Processor Architectures for Real-Time Computer-Generated Imagery, Ph.D. Dissertation, University of Florida, December 1988. Foley, J. D. and van Dam, A., Fundamentals of Interactive Computer Graphics, Addison Wesley, Reading, MA, 1982. Foley, J. D., van Dam, A., Feiner, S. K. and Hughes, J. F., Computer Graphics: Principles and Practice, Addison Wesley, Reading, MA, 1990. Fujimoto, A., Takayuki, T., and Kansei, I., "ARTS: Accelerated Ray Tracing System," IEEE Computer Graphics and Applications, Vol. 6, No. 4, April 1986, pp. 16-26. PAGE 102 97 GLAS84 Glassner, A. S., "Space Subdivision for Fast Ray Tracing," IEEE Computer Graphics and Applications, Vol. 4, No. 10, October 1984, pp. 15-22. GLAS89 Glassner, A. S. ed., Introduction to Ray Tracing, Academic Press, London, 1989. GLAS90 GRIM89 HANR83 HECK84 HENN90 HSU90 ISKA90 JOY86 KAJI82 KAJI83 KALR89 Glassner, A. S. ed., Graphics Gems, Academic Press, London, 1990. Grimes, J., Kohn, L., and Bharadhwaj, R., "The Intel i860 64-Bit Processor: A General-Purpose CPU with 3D Graphics Capabilities, IEEE Computer Graphics and Applications, Vol. 9, No. 4, pp. 85-94. Hanrahan, P., "Ray Tracing Algebraic Surfaces," Cowpw/er Graphics, Vol. 17, No. 3, July 1983, pp. 83-90. Heckbert, P. S. and Hanrahan, P., "Beam Tracing Polygonal Objects," Computer Graphics, Vol. 18, No. 3, July 1984, pp. 119-127. Hennessy, J. L. and Patterson, D. A., Computer Architecture: A Quantative Approach, Morgan Kaufnfenn, San Mateo, CA, 1990. Hsu, P. C, Algorithms for Animation Playback in Run-Length Frame Buffer Systems, Ph.D. Dissertation, University of Florida, May 1990. Iskandar, K., GraPE: Graphics Primitive Engine for Parallel ObjectOriented Computer Graphics Architecture, Ph.D. Dissertation, University of Florida, May 1990. Joy, K. L and Bhetanabhotla, M. N., "Ray Tracing Parametric Surface Patches Utilizing Numerical Techniques and Ray Coherence," Computer Graphics, Vol. 20, No. 4, August 1986, pp. 279-285. Kajiya, J. T., "Ray Tracing Parametric Patches," Computer Graphics, Vol. 16, No. 3, July 1982, pp. 245-254. Kajiya, J. T., "New Techniques for Ray Tracing Procedurally Defined Objects," Computer Graphics, Vol. 17, No. 3, July 1983, pp. 91-102. Kalra, D. and Barr, A. H., "Guaranteed Ray Intersections with Implicit Surfaces," Computer Graphics, Vol. 23, No. 3, July 1989, pp.297-306. PAGE 103 > 98 KAPL85 Kaplan, M. R., "Space Tracing, A Constant Time Ray-Tracer," Computer Graphics '85, San Francisco, State of the Art in Image Synthesis, Seminar Notes. KAUF86 Kaufmann, S. T., Real Time Interactive Visibility Computation for Quadratic Surfaces, Master of Engineering Thesis, University of Florida, August 1986. KAY86 Kay, T. arid Kajiya, J. T., "Ray Tracing Complex Scenes," Computer Graphics, Vol. 20, No. 4, August 1986, pp. 269-278. KEDE84a Kedem, G., "CSG-Oriented VLSI Hardware," Computer Graphics, Tutorial, Minneapolis, July 1984. KEDE84b Kedem, G. and Ellis, J. L, "The Raycasting Machine," Proc. IEEE Int'L Conf on Computer Design, 1984, pp. 533-538. KUCK78 Kuck, D. J., The Structure of Computers and Computations, John Wiley and Sons, New York, 1978. LANE80 Lane, J., Carpenter, L, Whitted, T., and Blinn, J., "Scan Line Methods for Displaying Parametrically Defined Surfaces," Communications of the ACM, Vol. 23, No. 1, January 1980, pp. 23-34. LEVN87 Levner, G., Tassinari, P., and Marini, D., "A Simple Method for Ray Tracing Bicubic Surfaces," In Kunii, T. L. (ed). Computer Graphics 1987, Springer, Tokyo, pp. 285-302. LIEN87 Lien, S., Shantz, M. and Pratt, V., "Adaptive Forward Differencing for Rendering Curves and Surfaces," Computer Graphics, Vol. 21, No. 4, July 1987, pp. 111-118. LISC90 Lischinski, D. and Gonczarowski, J., "Improved Techniques for Ray Tracing Parametric Surfaces," Visual Computer, Vol. 6, June 1990, pp. 134-152. LIVA88 Livadas, P., Staudhammer, J., and Kaufmann, S., "Real-time Interactive Visibility Computation for Quadratic Surfaces," International Journal of Computer Mathematics, Vol. 31, November 1988, pp. 19-41. MORT85 Mortenson, M. E., Geometric Modeling, John Wiley and Sons, New York, 1985. PAGE 104 99 MORT89 Mortenson, M. E., Computer Graphics: An Introduction to the Mathematics and Geometry, Industrial Press, New York, 1989. MUDU85 Mudur, S. P., "Mathematical Elements for Computer Graphics," Advances in Computer Graphics I, Springer, New York, 1985. NISH90 Nishita, T., Sederberg, T. W. and Kakimoto, M., "Ray Tracing Trimmed Rational Surface Patches," Computer Graphics, Vol. 24, No. 4, August 1990, pp. 337-345. PLUN85 Plunkett, D. J. and Bailey, M. J., "The Vectorization of a Ray-Tracing Algorithm for Improved Execution Speed," IEEE Computer Graphics and Applications, Vol. 5, No. 8, August 1985, pp. 52-60. PULL87 Pulleybank, R. and Kapenga, J. "The Feasibility of a VLSI Chip for Ray Tracing Bicubic Patches," IEEE Computer Graphics and Applications, Vol. 7, No. 3, March 1987, pp. 33-44. ROGE85 Rogers, D. F., Procedural Elements for Computer Graphics, McGraw-Hill, New York, 1985. ROGER76 Rogers, D. F. and Adams, J. A., Mathematical Elements for Computer Graphics, McGraw-Hill, New York, 1976. ROTH82 Roth, S. D., "Ray Casting for Modeling Solids," Computer Graphics and Image Processing, Vol. 18, No. 2, February 1982, pp. 109-144. RUBI80 Rubin, S. and Whitted, T., "A 3-Dimensional Representation for Fast Rendering of Complex Scenes," Computer Graphics, Vol. 14, No. 3, July 1980, pp. 110-116. SEDE84 Sederberg, T. W. and Anderson, D. C, "Ray Tracing Steiner Patches," Computer Graphics, Vol. 18, No. 3, July 1984, pp. 159-164. SHAN88 Shantz, M. and Chang, S., "Rendering Trimmed NURBS with Adaptive Forward Differencing," Computer Graphics, Vol. 22, No. 4, August 1988, pp. 189-198. STAU78 Staudhammer, J., "On Display of Space Filling Atomic Models in RealTime," Computer Graphics, Vol. 12, No. 3, July 1978, pp. 167-171. SWEE86 Sweeney, M. A. J. and Bartels, R. H., "Ray Tracing Free-Form B-Spline Surfaces," IEEE Computer Graphics and Applications, Vol. 6, No. 2, February 1986, pp. 41-49. PAGE 105 100 THOM83 Thomas, A., "Geometric Modeling and Display Primitives Toward Specialized Hardware," Computer Graphics, Vol. 17, No. 3, July 1983, pp. 299-310. ULLN83 UUner, K., Parallel Machines for Computer Graphics, Ph.D. Dissertation, California Institute of Technology, 1983. WALK85 Walker, P., "The Transputer," Byte, No. 10, May 1985, pp. 219-235. WARD89 Ward, F., "Images for the Computer Age," National Geographic, Vol. 175, No. 6, June 1989, pp. 718-751. WEGH84 Weghorst, H., Hooper, G. and Greenberg, D., "Improved Computation Methods for Ray Tracing," y4 CM Transactions on Graphics, Vol. 3, No. 1, January 1984, pp. 52-69. WHIT80 Whitted, T., "An Improved Illumination Model for Shaded Display," Communications of the ACM, Vol. 23, No. 6, June 1980, pp. 343-349. WOOD89 Woodward, C.,"Ray Tracing Parametric Surfaces by Subdivision in Viewing Plane," In: Strasser, W. (ed). Proceedings: Theory and Practice of Geometric Modeling, Springer, New York, 1989. YANG87 Yang, C, "On Speeding up Ray Tracing of B-Spline Surfaces," Computer Aided Design, Vol. 19, 1987, pp. 122-130. PAGE 106 BIOGRAPHICAL SKETCH IÂ§iii Gurses Buyiiklimanli received her Bachelor of Science degree in physics from Middle East Technical University, Turkey, in 1980 and her Master of Science degree in computer and information sciences from University of Florida in 1985. She worked as a research assistant at CAD/CAM/CAE FaciUty while she was working on her Ph.D. degree in computer and information sciences at University of Florida. Her research interests include computer graphics, software design and development, and artificial intelligence. Buyiiklimanli is a member of IEEE Computer Society, ACM and ACM SIGGRAPH. 101 PAGE 107 I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. bhn Staudhammer, Chair Professor of Computer and Information Sciences I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation ISLud is fully adequate, in scope and quaUty, as a dissertation for the degree of DoOtdr of Philosophy. es Keesling ssor of Mathematics I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree oH5octbr of Philosophy. Steven Thebaut Assistant Professor of Computer and Information Sciences I certify that I have read/tti|s study and that in my opinion it conforms to acceptable standards of scholar^ pfesentation and is fully adequate, in scope and quahty, as a dissertation for the degree ofi)octor of Philosophy. Vanos Livadas ^sistant Professor of Computer and Information Sciences I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy Carll5. Crane Assistant Professor of Mechanical Engineering PAGE 108 This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. August 1991 /rt^Winfred M. PhiUips Dean, College of Engineering Madelyn M. Lockhart Dean, Graduate School |