
Citation 
 Permanent Link:
 http://ufdc.ufl.edu/AA00039274/00001
Material Information
 Title:
 On speedup procedures in ray tracing
 Creator:
 Lee, Jun, 1957
 Publication Date:
 1994
 Language:
 English
 Physical Description:
 vi, 157 leaves : ill. ; 29 cm.
Subjects
 Subjects / Keywords:
 Algorithms ( jstor )
Binocular vision ( jstor ) Boxes ( jstor ) Copyrights ( jstor ) Eyes ( jstor ) Image processing ( jstor ) Images ( jstor ) Pixels ( jstor ) Ray tracing ( jstor ) Volume ( jstor ) Computer graphics  Mathematical models ( lcsh ) Dissertations, Academic  Electrical Engineering  UF Electrical Engineering thesis, Ph. D Image processing  Digital techniques  Mathematical models ( lcsh )
 Genre:
 bibliography ( marcgt )
nonfiction ( marcgt )
Notes
 Thesis:
 Thesis (Ph. D.)  University of Florida, 1994.
 Bibliography:
 Includes bibliographical references (leaves 150-152).
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 Jun Lee.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 The University of Florida George A. Smathers Libraries respect the intellectual property rights of others and do not claim any copyright interest in this item. This item may be protected by copyright but is made available here under a claim of fair use (17 U.S.C. §107) for nonprofit research and educational purposes. Users of this work have responsibility for determining copyright status prior to reusing, publishing or reproducing this item for purposes other than what is allowed by fair use or other copyright exemptions. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder. The Smathers Libraries would like to learn more about this item and invite individuals or organizations to contact the RDS coordinator (ufdissertations@uflib.ufl.edu) with any additional information they can provide.
 Resource Identifier:
 021435314 ( ALEPH )
33263673 ( OCLC )

ON SPEEDUP PROCEDURES IN RAY TRACING
BY
JUN LEE
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA
1994
ACKNOWLEDGMENTS
I would like to express my appreciation to my advisor and supervisory committee chairman, Dr. John Staudhammer, for the guidance and encouragement he provided me on this project. I am also grateful to the other members of my supervisory committee, Dr. Panos E. Livadas, Dr. Jack R. Smith, Dr. A. Antonio Arroyo, and Dr. Paul W. Chun, for their encouragement and their commitment.
I wish to express my sincere gratitude to the following organizations for providing resources which made this research possible. IBM Corporation loaned an RS6000 computer system for this research. Financial support was provided by the Republic of Korea Air Force.
Finally I express my sincere appreciation to my family for their dedicated support through all phases of my life especially at Gainesville, Florida in the USA.
TABLE OF CONTENTS
page
ACKNOWLEDGMENT ........................................... ii
ABSTRACT .................................................. v
CHAPTERS
1. INTRODUCTION .......................................... 1
1.1 Motivation ........................................ 1
1.2 Problem Definition ............................... 5
1.3 Overview of Dissertation ......................... 7
2. THE RAY TRACING PROCEDURE ............................. 10
2.1 Background . ....................................... 10
2.2 The Ray Tracing Process ........................... 18
2.3 Uniprocessor Implementation ...................... 21
2.4 Possible Bottlenecks .............................. 23
2.5 Possible Speedups ................................. 24
2.5.1 Image Clipping .............................. 24
2.5.2 Simplified Sorting ......................... 27
2.5.3 Parallelization ............................. 27
2.6 Summary . .......................................... 28
3. FAST RAY TRACING ALGORITHMS ........................... 30
3.1 Introduction . ..................................... 30
3.2 Bounding Volume Algorithms ....................... 32
3.3 Hierarchical Bounding Volume Algorithm ........... 38
3.4 Nonuniform Spatial Subdivision Algorithm ......... 42
3.5 Uniform Spatial Subdivision Algorithm ........... 47
3.6 Inside Test . ...................................... 49
4. DEPTH SORTER .......................................... 55
4.1 Introduction . ..................................... 55
4.2 General Depth Comparator ......................... 56
4.3 Possible Problems with the New Bounding Volume .... 67
4.3.1 Box Bounding Volumes ....................... 67
4.3.2 Sphere Bounding Volumes .................... 69
4.4 Bounding of Objects ............................... 71
4.5 Algorithm. ........................................ 75
4.5.1 Filtering and Comparison. ................... 75
4.5.2 Sorting .................................... 89
4.5.3 Ray Tracing Algorithm ...................... 90
4.6 Data Structure ................................... 95
4.6.1 Primitives ................................. 95
4.6.2 Link List .................................. 97
4.6.3 Sorting .................................... 98
4.7 Hardware Considerations .......................... 100
4.7.1 Introduction ............................... 100
4.7.2 The Ray Intersector ........................ 102
4.7.3 The Depth Sorter ........................... 112
5. EXPERIMENTAL RESULTS ................................. 122
6. CONCLUSION ........................................... 146
6.1 Summary .......................................... 146
6.2 Remarks on the New Algorithm ..................... 148
6.3 Recommendations for Future Research .............. 149
REFERENCES .............................................. 150
APPENDIX ................................................ 153
BIOGRAPHICAL SKETCH ..................................... 157
Abstract of Dissertation Presented to the Graduate School of
the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
ON SPEED UP PROCEDURES IN RAY TRACING
By
JUN LEE
December 1994
Chairman: Dr. John Staudhammer
Major Department: Electrical Engineering
This study critically examines the ray tracing process used in the generation of high-complexity images in computer graphics and provides design parameters for hardware which will alleviate bottlenecks inherent in the ray tracing procedure. A ray tracing algorithm is developed and bottleneck points in the ray tracing algorithm are identified which can be eliminated by hardware implementation.
To develop the new algorithm, the traditional fast algorithms are studied. By combining the strengths and considering the weak points of various algorithms, a new procedure is proposed that eliminates the inherent limitations of the basic ray tracing process.
The new algorithm employs sphere bounding volumes to reduce the number of ray-object intersections in the basic ray tracing algorithm. Traditional bounding volumes are used to bound objects. Sphere bounding volumes in the new algorithm are used to bound subspaces which could contain whole objects or parts of objects.
The sphere bounding volumes are sorted with respect to ray direction for each ray. Traditional sorting of sphere bounding volumes needs to calculate a square root. To avoid square root calculations, we develop a comparison algorithm which uses coefficients of quadratic equations for sorting bounding volumes. In traditional ray tracing algorithms the greatest computational load arises from the calculation of ray-object intersections. In the new algorithm, ray-object intersection tests start from the nearest bounding volume. If the ray hits an object in the bounding volume, the intersection test is terminated. If not, an intersection test is performed on the next nearest bounding volume.
Since the new bounding volume is established in the image space, not in object space, we must check whether the intersection point is in the bounding volume when a ray hits an object. For this test we develop a simple procedure using the coefficients of bounding volumes. The performance of the new algorithm is verified with computer simulations. We compare two outputs which are produced by each algorithm (traditional ray tracing algorithm and new algorithm) and show a substantial reduction in overall ray tracing calculation time. Characteristics of hardware modules are developed which can further reduce the image rendering time.
CHAPTER 1
INTRODUCTION
1.1 Motivation
The quest for visual realism continues to be a major research area in computer graphics. The thrust is to achieve images indistinguishable from a look at a real scene, i.e. we desire to effect a visual environment of artificial reality. Efforts continue to devise techniques that can ever more faithfully account for visual effects in computer produced images. Concurrent with the search for more realistic image detail calculations is a search for more efficient ways of carrying out the well-understood basic techniques for image calculation.
Computer graphics developers are continually looking for computationally economic techniques to simulate virtual reality. As computers have become more powerful and graphical hardware I/O devices more prevalent, photo realism has been achieved. Photo realism is achieved by painting on the display surface an image that focusses onto the retina of an observer a picture that would normally be produced by a natural environment. The basic underlying technique is to simulate, as far as possible within the constraints of the resolution imposed by the display hardware, a view of the natural scene.
The basic technique is to put on the screen image values that are those produced from the natural scene by the rays that ultimately are focussed on the retina. The real-world environment produces a plethora of rays, scattering light in every direction, and only a tiny fraction ever finds its way to an observer's eye, thus producing a directional view of the real scene. To calculate all the rays in the real world is wasteful for producing a computer-displayed image: that image is only one of the infinity of views of the real scene, and all the other views are not observable from the observer's viewpoint. The technique of calculating all rays is termed forward ray tracing; however, the visible scene, made up of the rays entering the observer's retina, requires far fewer rays to be considered. These rays are the ones entering the eye, and their production from elements of the real-world scene can be mimicked by following these rays from their destination (a spot on the retina) to the sources of the light whence they came. This technique is called reverse ray tracing, or often simply "ray tracing".
Ray tracing is an image rendering method which processes each pixel in turn and finds the surface point in the three-dimensional scene, of which a view is being presented, which determines its intensity and color. The image is not painted on the retina, but rather a display screen, which in turn is focussed by the observer. On the screen we paint a set of pixels, which depict the desired view of the real world. The ray tracing method is based on following rays from the viewpoint through each pixel until the rays meet a surface of an object. It is the coloring of that surface point that is painted as the color of that pixel.
The ray tracing algorithm itself allows the incorporation of many visual effects in a straightforward manner. Adding the same effects into other three-dimensional computer graphics techniques is much more difficult, if not impossible [Lin 92]. The technique of ray tracing resulted from the endless pursuit of photo realism.
Ray tracing produces high quality images, at a high computational cost. One of the biggest costs is the calculation of the visible object element at each pixel location. The algorithm must find the nearest object point from the location of the view point. Therefore the heart of any ray tracing package is the set of ray intersection routines. No matter what kind of techniques are applied to ray tracing, there is always the need to find the intersection point of a ray and an object. The basic ray tracing algorithm is
for (each pixel on the display)
    for (each object)
        find the nearest surface point
    retain the nearest surface point
    calculate the color of that point
For example, if the screen resolution is 1000 x 1000 and if there are 100 objects in the scene, the basic algorithm will require at least 100 million ray intersection calculations, of which only one per pixel will be used to calculate the picture coloring. If the objects themselves are defined by complex methods, each intersection calculation will also take a considerable amount of time. Even the largest supercomputer would find this computing requirement hard to satisfy within a reasonable running time.
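The count quoted above can be reproduced with a small sketch of the basic algorithm. This is an illustration only, not the dissertation's program: the objects are assumed to be spheres, the view point is assumed to sit at the origin looking along +z, and the names `hit_sphere` and `render_brute_force` are hypothetical.

```python
import math

def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive ray parameter t, or None if the ray misses.

    Assumption: the ray origin lies outside the sphere, so only the
    smaller root of the quadratic is of interest.
    """
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    half_b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = half_b * half_b - a * c
    if disc < 0.0:
        return None
    t = (-half_b - math.sqrt(disc)) / a
    return t if t > 0.0 else None

def render_brute_force(width, height, spheres):
    """Test every pixel's ray against every object and count the tests."""
    eye = (0.0, 0.0, 0.0)
    tests = 0
    nearest_index = []
    for row in range(height):
        for col in range(width):
            # Ray through the pixel centre on a screen plane at z = 1.
            x = (col + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (row + 0.5) / height * 2.0
            direction = (x, y, 1.0)
            best_t, best_i = None, None
            for i, (center, radius) in enumerate(spheres):
                tests += 1
                t = hit_sphere(eye, direction, center, radius)
                if t is not None and (best_t is None or t < best_t):
                    best_t, best_i = t, i
            # Only the nearest hit (if any) is retained for coloring.
            nearest_index.append(best_i)
    return tests, nearest_index

spheres = [((0.0, 0.0, 5.0), 1.0), ((0.0, 0.0, 9.0), 3.0)]
tests, nearest = render_brute_force(8, 8, spheres)
print(tests)               # width * height * number of objects
print(1000 * 1000 * 100)   # the 100 million figure from the text
```

The test count depends only on the image size and the number of objects, never on how many objects a ray actually hits, which is why eliminating candidate objects early pays off.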
Since the determination of each pixel color does not depend on the other pixels, parallel processing is possible pixel by pixel. But because the intersection calculation for some pixels takes a very long time, real-time interactive simulation is impossible for realistic image scenes.
To get the high quality picture desired for virtual reality applications using the ray tracing algorithm, it is critically important to avoid wasting computation time on checking the ray against objects that have no intersection and which can be trivially eliminated. The reduction of the number of intersection calculations may be done by many software optimization methods. All of those methods depend on eliminating those unintersected objects.
Still one of the greatest challenges of ray tracing is efficient execution. Despite its impressive image rendering capability, ray tracing is often dismissed as being too computationally exorbitant to be useful. Therefore efficiency is a critical issue and has been the focus of much research from the beginning. This has led to many creative approaches. Decreasing computing time can be achieved both by software improvements and hardware additions.
1.2 Problem Definition
The reduction of the number of intersection calculations may be done by the following four approaches.
1. Bounding Volume Method
2. Hierarchical Bounding Volume Method
3. Uniform Spatial Subdivision Method
4. Nonuniform Spatial Subdivision Method
Each approach has its own advantages and disadvantages. Because the intersection test is simple and no hierarchies are required, the sphere bounding volume algorithm is the simplest one. Here objects are bounded by a sphere extending to the objects' maximal extent in the image space. One needs to check only if a bounding volume lies in the pixel's position.
This simple test can be used to trivially reduce the number of candidate objects that need be considered for the full intersection calculation. Only those objects that meet the location test are then considered further.
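A minimal sketch of such a rejection test follows, assuming each object carries a bounding sphere given by a center and a radius. Only the sign of the discriminant of the ray-sphere quadratic is needed for a hit or miss decision, so no square root is taken. The function names and the scene data are hypothetical, and this is a standard test rather than the comparison procedure developed later in this work.

```python
def ray_hits_sphere(origin, direction, center, radius):
    """Return True if the ray origin + t*direction (t >= 0) touches the sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    half_b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    if c <= 0.0:
        return True      # the ray starts inside (or on) the sphere
    if half_b >= 0.0:
        return False     # the sphere lies behind the ray origin
    # Sign of the discriminant decides hit or miss; no square root needed.
    return half_b * half_b - a * c >= 0.0

def candidate_objects(origin, direction, bounded_objects):
    """Keep only the objects whose bounding sphere is touched by the ray."""
    return [name for name, center, radius in bounded_objects
            if ray_hits_sphere(origin, direction, center, radius)]

scene = [
    ("A", (0.0, 0.0, 5.0), 1.0),    # straight ahead
    ("B", (4.0, 0.0, 5.0), 1.0),    # off to the side
    ("C", (0.0, 0.0, -5.0), 1.0),   # behind the view point
]
print(candidate_objects((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))
```

Only the objects that survive this filter would go on to the full, and usually far more expensive, intersection calculation.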
Figure 1.1 Worst Case of Bounding Volume Algorithm (labels: View Point, bounding volume)
Consider the case shown in Figure 1.1. Here a long thin object, say a slender rod, is being rendered. Since a bounding volume should allow an easy intersection test with the ray, a natural choice of bounding volume is a sphere. When we apply the sphere to this object as a bounding volume, most of the rays which intersect the bounding volume will in fact not intersect the object. Thus many of the calculations will be for naught. In fact, using the bounding volume approach may easily INCREASE the number of required operations. Let us consider the other case, shown in Figure 1.2. Here the bounding volume encloses several objects. In this case most of the rays which intersect the bounding volume will intersect an object or a few objects. The sphere bounding volume in Figure 1.2 shows the main idea of the bounding volume method.
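To give a rough feel for the slender rod case, the sketch below estimates what fraction of the rays that hit the bounding sphere also hit the rod. It rests on assumptions that are not in the text: the rod is a cylinder viewed broadside from far away, so rays are nearly parallel and silhouette areas can be compared directly. The function name is hypothetical.

```python
import math

def useful_fraction(rod_length, rod_diameter):
    """Fraction of rays hitting the bounding sphere that also hit the rod.

    Assumes a distant viewer looking at the rod broadside, so the rod's
    silhouette is a rectangle and the sphere's silhouette is a disk.
    """
    # Smallest sphere enclosing a cylinder of this length and diameter.
    sphere_radius = 0.5 * math.hypot(rod_length, rod_diameter)
    rod_silhouette = rod_length * rod_diameter
    sphere_silhouette = math.pi * sphere_radius ** 2
    return rod_silhouette / sphere_silhouette

# A rod 100 units long and 1 unit thick: only about 1.3 % of the rays
# that pass the sphere test go on to hit the rod itself.
print(round(useful_fraction(100.0, 1.0), 4))
```

Under these assumptions almost every ray that passes the sphere test still requires a full object intersection calculation that ends in a miss, which is the waste the worst case describes.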
The critical question is how to apply the sphere bounding volume to every environment in the image space, so as to achieve the best efficiency as shown in Figure 1.2. The goal of the present work is to address this problem.
The specific objectives of this study are as follows:
1. Analyze the ray tracing process to assess the computing requirements in its various phases.
2. Develop a new speedup procedure for ray tracing.
3. Implement this approach.
4. Compare the efficiency of the proposed approach experimentally with that of the original ray tracing algorithm.
Figure 1.2 Best Case of Bounding Volume Algorithm (labels: window, objects, View Point, bounding volume)
1.3 Overview of the Dissertation
The dissertation consists of six chapters. Chapter 1 explains the uses of ray tracing and discusses the computational problems inherent in it. Chapter 2 critically reviews the ray tracing process. Chapter 3 discusses the traditional fast algorithms used in implementing the ray tracing procedure. The main problems of each fast algorithm are also summarized. A new fast algorithm for ray tracing is presented in Chapter 4. Results of simulations which were used to verify the performance of the new fast algorithm are presented in Chapter 5. The final chapter summarizes this study and suggests directions for future research.
CHAPTER 2
THE RAY TRACING PROCEDURE
2.1 Background
The objective of the ray tracing process is to calculate an image that is a faithful reproduction of a scene, be it natural or an imagined one. The test of a well-rendered ray traced image is in the fidelity with which a natural scene can be rendered from a geometric specification of objects in the scene, the surface properties of those objects, and a description of the illumination of the scene. The image is displayed on a workstation screen and is viewed by an observer. The image seen by the observer should be indistinguishable from a view of the natural scene. It is therefore important to review just what the eye can see.
Normal color vision perceives images in many colors and with a high degree of image fineness (acuity). There are basically two parts of the field of vision: peripheral and foveal. It is the foveal vision that has a high degree of spatial precision; peripheral vision is less spatially acute, but has the ability of detecting motion (without detail) and conveys to the user a sense of presence in the scene. Foveal acuity and color resolution have been studied in detail [Sou 61]; peripheral vision is less acute and less discriminating in color. Foveal vision exists in the visual center of a view field, and subtends only about one degree of arc.
The view field for a normal eye is almost 180° left-to-right and almost 180° top-down. Normal human vision is binocular: the left and the right eyes perceive slightly different images. The central part of the visual field is common to both eyes, resulting in an overlapped binocular field of some 150° high and about 120° laterally. The visual cortex, the image receptor in the human observer, derives three-dimensional information from the parallax between the two views [Gra 63].
The spacing of receptors in the foveal region is about 2 to 3 µm, and the focal distance of the eye is 15 mm. This results in a physiological resolution of about 0.5 to 0.7 minutes of arc. The eye in fact can resolve even finer details on structured targets [Lux 68].
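The resolution figure follows from the receptor spacing and the focal distance by simple trigonometry. The short check below is a worked calculation under the values quoted above; the function name is hypothetical.

```python
import math

def receptor_angle_arcmin(spacing_um, focal_mm):
    """Angle subtended by one receptor spacing at the given focal distance."""
    angle_rad = math.atan2(spacing_um * 1e-6, focal_mm * 1e-3)
    return math.degrees(angle_rad) * 60.0   # degrees to minutes of arc

for spacing_um in (2.0, 3.0):
    print(spacing_um, "um ->", round(receptor_angle_arcmin(spacing_um, 15.0), 2), "arcmin")
```

A spacing of 2 µm gives a little under 0.5 arcminute and 3 µm gives a little under 0.7 arcminute, consistent with the range stated in the text.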
In forming an image on its receptor surface, the retina, the human eye is basically an optical device. Every optical device suffers from chromatic aberration. The eye will focus on green-yellow light, making the red-light focus slightly behind the retina, and the blue-light focal plane slightly in front of the retina. Most measurements of visual acuity, image flicker rate and similar data, use white light with implicit color aberrations.
The central-field visual acuity varies with the type of image. Astronomers, before the availability of telescopes, depended heavily on acute eyesight; a job requirement was exceptionally keen vision, so that they could tell stars apart on a dark background. The normal discrimination for this task is about 2.9 x 10^-4 radians (about 1 minute of arc); for long lines on a self-luminous background (spider webs lit from behind), the minimum visual angle is about 8 arcseconds [Lux 68].
The distribution of visual receptors is not uniform in the human eye. The density decreases markedly toward the periphery. Rods (light intensity detectors) number 110 to 130 million, and cones (color detectors) 3 to 7 million. Hence if it were possible to focus a computer-produced image directly on the visual detectors of an observer's eye, one should be able to evoke a full monocular image consistent with the real-world environs with about 100 million picture elements. Since the "normal" workstation display has about 1 million pixels, technology is within two orders of magnitude of producing images that a human observer would not be able to distinguish, at least in image acuity, from a view of the real world.
Unfortunately the eye is in constant small motion when a view is perceived. This small motion is a tiny random, or nearly random motion, the saccadic motion of the eye. We should note that during saccadic eye motion the observer (at least the human observer) is blind, else a motion blur would interfere with the image being "seen", i.e. processed for understanding. The mechanism of this brief visual blackout is not well understood at this time, but indicates a highly complex interaction between the pure visual processing centers in the brain and the eye motion control centers. In this eye motion, apparently the vision detectors are "scanned" relative to the stationary image; image data processing in the nerve net underlying the receptors allows the eye to detect features in an image which are somewhat finer than the spacing of rods in the retina.
Saccadic motion requires that a picture be rendered on the computer display with as fine a resolution as the best eye resolving power, i.e. about 1/10 minute of arc in the areas that the eye might be roving over during normal vision. At the most this will be a total view angle of 180°. Hence the number of pixels required will be about 180 x 60 x 10 = 100,000 in each of the horizontal and vertical directions for a hemispheric panoramic display. This would require 10^10 pixels, about four orders of magnitude higher than contemporary "high resolution" workstation displays. Thus even with the best of current interactive display devices we can produce only a defocussed approximation of a real-world image.
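The pixel estimate can be checked with a few lines of arithmetic. The sketch assumes ten pixels per minute of arc, as implied by the factor 180 x 60 x 10, and a 1000 x 1000 workstation display for comparison; the function name is hypothetical.

```python
import math

def pixels_per_axis(view_angle_deg, pixels_per_arcmin):
    """Pixels needed along one axis to cover the view angle."""
    return view_angle_deg * 60 * pixels_per_arcmin

per_axis = pixels_per_axis(180, 10)        # 108,000, rounded to 100,000 in the text
total = per_axis ** 2                      # about 1.2 x 10^10 pixels
workstation = 1000 * 1000                  # assumed "high resolution" display
orders_of_magnitude = math.log10(total / workstation)

print(per_axis, total, round(orders_of_magnitude, 1))
```

The unrounded product is slightly above the 10^10 quoted in the text, and the gap to a one-megapixel display is indeed about four orders of magnitude.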
In general, color is perceived from the stimulation of various color receptors in the light detection organ of the perceiver; these are vastly different in various living organisms [Cro 94]. Hence the sensation of color is a species-dependent phenomenon. In some creatures there is a highly developed color detection system favoring blue colors, as in fish. Thus when we speak of color, we really need to specify what organism we use as a referent. Clearly, in computer graphics, we mean "color" to be a human experience. Humans perceive color from three primary color receptors, which have relatively broad frequency responses that overlap in the visible spectrum. Each has a primary response in the Red (590 nm), the Green (500 nm) or the Blue (470 nm) [Cro 94]. The combination of the responses evokes the recognition of color. This is the basis of the three-color (tristimulus) system of the common TV, which defines the standard colors used in a normal workstation display. Any color shown is a mixture of the three primary components. In generating a color image, three primary color components must be produced for each display pixel.
The color resolution of the human visual system is usually measured in terms of color saturation and color purity. Saturation measures the number of steps that are perceived between a "pure" color (such as a spectral color) and white (composed of 3 equal parts of the three Red-Green-Blue primaries). Such a measurement shows that the human can distinguish around a hundred different levels of saturation [Gra 63].
The color purity discrimination is a bit sharper; a change of wavelength of around 3 nm can be perceived in the yellow-green part of the spectrum (about 550 nm). Hence the color purity is more critical, amounting to about 1 % in color fidelity [Moo 61]. Thus specifying color information to an 8-bit accuracy is consistent with human color perception; however, for precise applications, when the nonlinear characteristics of the phosphors in the display surface need to be considered, a primary color specification of at least ten bits is necessary [Mar 82].
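The relation between bit depth and color fidelity can be made concrete. The sketch below assumes a linear quantization of each primary component to a value between 0 and 1, which is an illustrative simplification and not a model of phosphor nonlinearity; the function names are hypothetical.

```python
def step_percent(bits):
    """Size of one quantization step as a percentage of full scale."""
    return 100.0 / ((1 << bits) - 1)

def quantize(value, bits):
    """Round a component in [0, 1] to the nearest representable level."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels

print(round(step_percent(8), 3))    # 8 bits: steps of roughly 0.4 %
print(round(step_percent(10), 3))   # 10 bits: steps of roughly 0.1 %
print(quantize(0.5, 8))
```

An 8-bit step is already finer than the roughly 1 % fidelity quoted above, which is why 8 bits per primary suffices for ordinary viewing, while the finer 10-bit step leaves headroom for correcting display nonlinearities.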
Thus the color display that is to produce an image of the real world should show a pair of images, one for each eye; each display should subtend a visual angle of some 180°; each should have about 10^10 pixels; and each pixel should be capable of displaying three primary colors, with a color resolution of about 10 bits. Clearly we do not now have affordable technology to approach these numbers; currently we are 4 to 5 orders of magnitude away from these numbers.
One other aspect of a monocular view of a natural scene is inherent depth information in the wavefront. An observer can easily focus on near objects, or far ones, thus deriving some depth information. This is easily demonstrated in a view out of a window: the observer can easily ignore (i.e. NOT focus on) smudges on the glass pane, but sharply observe distant scenery. The image which one normally produces on a workstation display surface lacks the focussing feature. Normally a fixed-focus distance (usually infinite) is used for the production of a scene. Such an image is then painted to a screen at a fixed distance from the observer. In this respect a computer-produced image is yet another approximation to a natural scene. This refinement in imaging detail is usually ignored in artificially produced imaging.
Therefore, at best, current technology can produce only an approximation, notably of lower spatial resolution, but close to real-life colors. Most of the work in this dissertation will deal with workstation displays, with a resolution of about 1000 x 1000 pixels. The visual image normally subtends about 30° to 60°.
To maintain the illusion of stationary, or slowly moving, images, an image sequence is painted on the display surface. The human visual cortex will fuse sequences of images and evoke the sensation of a stationary or smoothly moving image, without the appearance of image flicker, if sequential images are repeated with a high enough rate. This image fusion frequency depends on the overall image brightness. For a darkened room, such as a movie auditorium, a frequency of 24/second is adequate. For a dimly-lit daytime room, normal for television viewing, the rate rises to around 30/second; and for normal daytime brightness the rate may exceed 60/second. Current high-performance workstations use display refresh rates of about 60 to 70/second.
We would, of course, like to have our display image calculated AND displayed at the maximum rate required by these physiological demands. Thus we need to contemplate the calculation of image data at a rate of at least 30 megapixels per second, each having Red, Green and Blue components.
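The rate quoted above follows from the display size and the frame rate. The arithmetic below assumes the 1000 x 1000 workstation display discussed earlier and the 30 frames/second television rate; the function name is hypothetical.

```python
def required_pixel_rate(width, height, frames_per_second):
    """Pixels that must be computed each second to sustain the frame rate."""
    return width * height * frames_per_second

rate = required_pixel_rate(1000, 1000, 30)   # pixels per second
component_rate = rate * 3                    # Red, Green and Blue values per second
budget_ns = 1e9 / rate                       # time available per pixel, in nanoseconds

print(rate, component_rate, round(budget_ns, 1))
```

At this rate a single processor would have only about 33 nanoseconds for each pixel, including every intersection test and the color calculation.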
The last item we need to discuss is just what should be on each pixel. In the natural world each receptor area will receive light from a small cone segment, a surface patch of about 1/10 arcminute across. Each of these patches is the sum of all light rays that are focussed on that small area. In calculating an artificial image of a real scene, we need to remember that each image pixel is a small-area integral that we need to calculate. We seldom have the luxury of integrating over however-small angular cones; we normally sample the scene for one (or a few) point(s) for each pixel. This is the major cause of picture aliasing: the image elements are defined with higher resolution than is possible to paint on the display surface. Antialiasing techniques, not a subject explored in this work, deal with refining the coloring information in an attempt to account for the lack of spatial display ability.
Thus the computational requirements for rendering even a modest scene may easily exceed any reasonable computing resources. It is therefore very important to look at procedures that ease the image computation task. Two promising major avenues of approach are:
1. Ways of parallelizing the computation task;
2. Algorithmic approaches to identify the computational bottlenecks and find effective speedup procedures.
This dissertation focusses on the second of these approaches. We will introduce a process for speeding up the basic ray tracing process. We will use a uniprocessor for accomplishing the basic ray tracing task, and will use the same processor to implement the speed up. Certainly the process can be adapted to parallel computations, thus conveying an added benefit to inherently faster computational complexes.
2.2 The Ray Tracing Process
In a natural environment all rays that make up a visible scene in the observer's eye start at various light sources, then are scattered, reflected and directed into the pupil of the observer's eye. Clearly, most of the light is lost to the observer; those parts of the scene image are not observable from the location of the observer. To mimic this process exactly on a computer, i.e. to trace all light rays emanating from the scene, would require the calculation of a great many light paths which would then be not utilized in forming an observer's image. This wasteful process is known as Direct Ray Tracing; the procedure models the action of light sources on the scene to produce all observable views of it. The procedure is not practical.
For reasons of economy in calculation it is necessary only to compute those rays that in fact will be used in forming the ultimate display image. Hence one starts with the rays at the image plane and traces the rays backwards to their sources. The process is known as Reverse Ray Tracing. It avoids the computational inefficiency inherent in Direct Ray Tracing. It is the Reverse Ray Tracing process which is commonly known as Ray Tracing; it is the process referred to as Ray Tracing in this work.
As a further concession to practicality, the scene focussing is replaced by a simple pinhole camera, rather than getting involved in optical aperture complications [Pot 82]. Hence the image to be produced will be a pinhole camera view of a scene, and thus of fixed focus. It is formed by tracing the rays that make up the image back to the light sources, accounting for interactions of objects in the scene with the rays.
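A pinhole camera reduces ray generation to one straight line per pixel through a single point. The sketch below illustrates this under assumptions that are not in the text: the pinhole sits at the origin, the camera looks along +z, and `fov_deg` is a vertical field of view of 45 degrees. The function name and parameters are hypothetical.

```python
import math

def primary_ray(col, row, width, height, fov_deg=45.0):
    """Return (origin, unit direction) of the ray through pixel (col, row)."""
    half_h = math.tan(math.radians(fov_deg) / 2.0)   # half height of the image plane at z = 1
    half_w = half_h * width / height                 # scaled by the aspect ratio
    # Map the pixel centre to image-plane coordinates; row 0 is the top row.
    u = ((col + 0.5) / width * 2.0 - 1.0) * half_w
    v = (1.0 - (row + 0.5) / height * 2.0) * half_h
    norm = math.sqrt(u * u + v * v + 1.0)
    return (0.0, 0.0, 0.0), (u / norm, v / norm, 1.0 / norm)

# One ray per pixel: a 640 x 480 image needs 307,200 primary rays.
print(640 * 480)
print(primary_ray(1, 1, 3, 3))   # centre pixel of a 3 x 3 image looks straight ahead
```

Because every ray shares the same origin, nothing is ever out of focus, which is the fixed-focus property noted above.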
The rays that make up the image are taken to be one for each pixel that appears on the final image rendering. Hence for a VGA image, one would need 480 rows x 640 picture elements = 307,200 pixels (i.e. rays). In a "normal workstation" there would be about 1000 rows and 1250 pixels/row, or about 1.2 million pixels. For each of these rays we need to calculate where they intersect the nearest object in the scene. It is not uncommon for a scene to contain several hundred objects. The process basically is:
For each ray;
    determine where each object intersects;
    determine the color for each of these points;
    paint this color to the pixel;
Display the whole image.
If the scene consists of N objects, the number of intersection calculations for a "simple VGA" image is basically 300,000 x N operations; for a workstation image, there may be four times as many calculations. Each of these calculations may require the determination of the intersection point of a ray with a surface, which may be a curved surface in three dimensions. These calculations are involved and may require many thousands of machine cycles [Bli 80].
Once the intersection point is determined, calculation of the color of that point may involve a great deal of additional calculation. These calculations are called the rendering calculations; they determine the color reflected or emitted at the point where the viewing ray meets the nearest object. Depending on the nature of the visible surface point, this calculation may be simple or very complex, especially if that visible point reflects light, or is perhaps part of a light refracting surface. For reflected and refracted light, secondary rays need to be examined which result from the light reflection/refraction properties of that particular surface point. Conceptually, one needs to determine the surface normal relative to the incident ray, and construct rays that further follow the imaged ray, all the way to the source of the light that produces the surface coloring. There may be a number of secondary rays; their intersections with the objects in the scene then need to be determined. This calculation is basically the same as the calculation of the ray emanating from the display plane; however, since there may be many such secondary rays, the amount of calculation may become quite large.
It is the objective of this dissertation to examine the ray tracing image production process, to identify those of its steps that are computation intensive, and to offer palliative measures for the computational bottlenecks.
2.3 Uniprocessor Implementation
The fundamental ray tracing program has three basic parts: input of scene and viewing parameters, calculation of the visible picture elements, and production of the visible image to a display device or a storage medium. There is considerable interaction between the three segments, especially when the image needs to be produced in real time, as with dynamic virtual reality environments. However, the conceptual program flow can be considered to be the sequential execution of the three parts listed above. Together with the computational tasks, the fundamental ray tracing program is diagrammed in Figure 2.1 [Hec 89].
Since the objective of this dissertation is the examination of the computational loads for the ray tracing process, we will concentrate on the second phase of the program and omit critical examination of the initialization, input and output tasks. Furthermore, we assume that the computational task is not further burdened with data access limitations, i.e. there is sufficient memory to hold all required data for the various calculations.
Initialize storage, input and output files
Input object geometry data, object surface properties, view parameters
procedure Raytrace (start, direction, depth, color)
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color ← black;
    if (depth ≤ max_depth)
    begin
        color ← background_color;
        [intersect ray with all objects and find intersection point (if any) that is closest to start of ray]
        if (intersection)
        begin
            local_color ← { contribution of local color model at intersection point }
            { Calculate direction of reflected ray }
            Raytrace (intersection_point, reflected_direction, depth+1, reflected_color)
            { Calculate direction of transmitted ray }
            Raytrace (intersection_point, transmitted_direction, depth+1, transmitted_color)
            Combine (color, local_color, local_weight_for_surface,
                     reflected_color, reflected_weight_for_surface,
                     transmitted_color, transmitted_weight_for_surface)
        end
    end
end
Output image file
Figure 2.1 Ray Tracing Program
We note from Figure 2.1 that the entire computational task is dominated by the intersection test, which determines the intersection point of a given ray with the nearest object. Fundamentally, this intersection test must be performed for all objects in the scene, but only one object will be retained for further calculation of reflected or refracted (transmitted) light.
We note that for a uniprocessor implementation the various computational tasks are performed sequentially. For timing the intersection calculations we need to assure that the various other sections are excluded from the timing measurements. We also note that no shading calculations can be made until the visible surface element is found in establishing the coloring of an image pixel.
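To illustrate why the intersection test dominates, the exhaustive search can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every object is taken to be a sphere given as a (center, radius) pair, and the names hit_sphere and nearest_hit are hypothetical and do not come from the program of Figure 2.1.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Smallest positive ray parameter t at which the ray meets the sphere, or None."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                      # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None                          # the sphere lies behind the ray origin


def nearest_hit(origin, direction, spheres):
    """Exhaustive search: every object is tested, only the nearest hit is kept."""
    best_t, best_index, tests = None, None, 0
    for index, (center, radius) in enumerate(spheres):
        tests += 1
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and (best_t is None or t < best_t):
            best_t, best_index = t, index
    return best_t, best_index, tests


scene = [((0, 0, 10), 1.0), ((0, 0, 5), 1.0), ((5, 5, 5), 1.0)]
print(nearest_hit((0, 0, 0), (0, 0, 1), scene))   # (4.0, 1, 3)
```

All three spheres are tested even though only the second one is retained, which is the linear cost per ray that the later chapters try to reduce.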
2.4 Possible Bottlenecks
From an analysis of the algorithm we note that the predominant computational bottleneck is in the intersection calculations, with secondary choke points in calculating reflected and transmitted (i.e. refracted) rays, as well as in calculating the coloring of a surface element. If the surfaces of the objects are curved, then the calculation of surface normals, needed for establishing the apparent surface coloring, can become computationally complex. These secondary effects have been studied extensively [Bli 80] [Whi 80]. While we do not wish to minimize these computational tasks, we note that the first choke point is finding the surface element of the nearest object, which is needed for any further coloring calculations. The problem is basically finding the intersection points of all objects with a ray, and then finding that point which is nearest to the origin of the ray. This is implicitly a sorting process; however, we can make the list of elements to be sorted very short if we find a computationally efficient way of ignoring all those objects which cannot intersect a given ray (because they are located far away from the ray as it radiates into the object scene from the position of the observer).
The following chapters outline such a method.
2.5 Possible Speedups
From an examination of the basic ray tracing process, there are several parts of the process which can easily lead to computational speedups. These are image clipping, shortcuts in the object sorting process, and parallelization. We briefly discuss each of these in the following paragraphs.
2.5.1 Image Clipping
Since basic ray tracing examines each object of the natural scene to determine which one is closest to the
viewpoint, the viewpoint can be located anywhere, including inside of the object assembly that makes up the final scene. This is merely a computerized version of building a scene, for example a movie set, and moving a camera toward, around and into it. At various positions of the path of the camera images are constructed, which become the sequence of frames that define a dynamic image, an animated image sequence. There is no need to discard any of the objects that make up the scene, even if those parts are invisible to the viewer, such as elements of the scene that may become located behind the observer. One could cull the objects and retain only those in the general direction of the view, and thereby greatly reduce the required calculations. Various image clipping algorithms exist which can provide a list of objects, and parts of the original objects, which will appear in the final image. Some objects will be only partly in the final scene; these elements must be clipped and the parts outside of the image extent must be discarded.
Since the objects in the image may be anything, the clipping must be able to handle arbitrary primitives. A clipping algorithm which is so general is itself a complicated procedure and may be computationally expensive. For general object primitives there is no solution available, short of a variation of the ray tracing algorithm itself. Hence the search for such a panacea is likely to be fruitless.
If we restrict the primitives to objects having polygonal facets only, the Weiler-Atherton algorithm will work well [Wei 77]. This algorithm will determine the visible parts of all objects defined by finite polygonal or infinite planar facets. The procedure is complex and computationally expensive.
However, a much simpler approach is more promising. We note that the visibility of the objects will be determined by the ray tracing process itself; we merely would like to reduce the number of elements outside of the viewing frustum that need to be tested. We simply discard entire objects which have all of their vertices outside of the viewing image. This requires a minor modification: the tests for lying outside the top, left, bottom and right edges of the view are applied, in view coordinates, to the lowest, rightmost, highest and leftmost vertex of each object, respectively. These tests can be applied when the objects are placed into the view coordinate system: for each object the indicated tests are performed at the end of the object's coordinate transformation.
Thus there is a relatively simple test to reduce the number of candidate objects for the ray intersection calculations. This step alone can noticeably reduce the total amount of required calculations. We assume that such a preliminary pruning is done on the original object list, and that the list of objects has already been reduced.
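The four edge tests described above can be sketched in Python as follows. This is only a sketch under assumptions: vertices are taken to be already projected into two-dimensional view coordinates, the view window is the rectangle [left, right] x [bottom, top], and the names outside_view and prune are hypothetical. Objects behind the observer would need an additional depth test that is not shown.

```python
def outside_view(vertices, left, right, bottom, top):
    """True if the whole object lies beyond one edge of the view window."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(ys) > top         # lowest vertex is above the top edge
            or max(xs) < left     # rightmost vertex is left of the left edge
            or max(ys) < bottom   # highest vertex is below the bottom edge
            or min(xs) > right)   # leftmost vertex is right of the right edge


def prune(objects, left, right, bottom, top):
    """Keep the names of objects that may contribute to the image."""
    return [name for name, vertices in objects
            if not outside_view(vertices, left, right, bottom, top)]


objects = [
    ("inside",    [(0.0, 0.0), (0.5, 0.5)]),
    ("right_of",  [(2.0, 0.0), (3.0, 1.0)]),
    ("straddles", [(0.5, 0.0), (2.0, 0.0)]),
    ("above",     [(-3.0, 3.0), (3.0, 3.5)]),
]
print(prune(objects, -1.0, 1.0, -1.0, 1.0))   # ['inside', 'straddles']
```

Only one extreme vertex is compared per edge, so the test costs four comparisons per object once the extremes are known, and an object that straddles an edge is kept for the ray tracer to resolve.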
2.5.2 Simplified Sorting
The next problem is to sort objects in a way that avoids many comparisons. A given object may be nowhere in the direction of the ray being used for construction of a pixel. Finding these "inactive" objects with a computationally efficient process is highly desirable.
2.5.3 Parallelization
One of the easiest concepts for speeding up the ray tracing calculation is to realize that physically every ray is independent of the other rays; thus each ray can be assigned to a separate processor. Then the total image generation time is merely the time taken by the most involved ray calculation; i.e. tracing of the most complex ray. For an "ordinary" VGA image one could use some 300,000 processors, assuming that few secondary rays need be calculated. For more complex scenes on workstations, over a million processors could be employed. There is also the problem of defining the image for each processor's use (by a data broadcasting process), and assembling the results into a single image buffer (so that calculation of the next scene can be started while the old image is displayed.)
This parallelization would require each processor to calculate the ray path for each pixel AND would ignore any information the neighboring processors may have about the image structure, i.e. image coherence is ignored. Thus the process may be the fastest, but it is also the most wasteful of computing resources.
A reasonable solution is to map the ray tracing process to the engines that comprise the computing environment. This dissertation merely notes this, and will not attempt to delve into it. We will concentrate on finding speedups in the uniprocessor solution, and leave the mapping of the improved process as a desirable extension for further work. Work in defining image composition hardware has been completed [Fle 88]; these together with the structures defined in this dissertation point toward a class of image generation machinery for calculating rich visual environments for use in dynamic artificial reality scenes.
2.6 Summary
This chapter reviewed the parameters involved in human vision and the ideas underlying the basic ray tracing process. It showed that reasonable computing machinery and existing affordable display technology can present but an approximation to a naturally observed scene. The image quality is limited by the resolution of the display device and by the computational capabilities of the machinery that may be used to produce the actual image. However, within these limitations, visually stunning images can be produced. It is the intent of this dissertation to examine one of the computational bottlenecks in the calculation of computer-produced images, and to relieve it by decreasing the number of objects that need to be considered during the calculation of visible parts of a scene. In achieving this goal we also specify design parameters for simple computational structures which can be cast in VLSI hardware. We will verify the correctness of these designs and use simulation to estimate their impact on the performance of the basic ray tracing procedure.
CHAPTER 3
FAST RAY TRACING ALGORITHMS
3.1 Introduction
The basic philosophy of ray tracing is that an observer sees a point on a display surface as a result of the interaction of the surface at that point with rays emanating from the scene. In general a light ray may reach the surface indirectly via reflection at other surfaces, transmission through partially transparent objects or a combination of these. Ray tracing is the most complete simulation of an illumination/reflection model in computer graphics. Ray tracing procedures have produced the most realistic images to date in computer graphics.
But the method has a number of disadvantages, the most critical being extremely high processing requirements. Image generation time can be hours or even days, even on powerful computer systems. As a result, ray tracing is often considered impractical. Much research has been done to improve the procedure, and several techniques have gained wide acceptance. In this chapter we will discuss four algorithms and point out the problems of each of them.
Faced with the task of accelerating the process of ray tracing, there are three very distinct strategies to consider [Arv 89]:
1) reducing the average cost of finding the intersection of a ray with the environment;
2) reducing the total number of rays intersected with the environment;
3) replacing individual rays with a more general entity.
These strategies are called "faster intersections", "fewer rays" and "generalized rays", respectively. The last strategy (generalized rays) places some constraint on the environment that can be considered, such as restricting the type of primitive objects, or we may need to abandon the notion of exact intersection calculations, accepting an approximation instead [Arv 87]. Examples are beam tracing, cone tracing, and pencil tracing [Fol 90] [Arv 89]. These procedures are peripheral to the study being presented here and will not be discussed further.
The other two strategies, faster intersections and fewer rays, can be combined under the heading of speedup algorithms. In this case obtaining faster ray intersection with objects is still the key to a fast algorithm. The strategy used to get faster intersections separates this class into the subcategories of "faster" and "fewer" ray-object intersections. The former consists of efficient algorithms for intersecting rays with specific primitive objects, while the latter addresses the larger problem of intersecting a ray with an environment using a minimum of ray-object intersection tests.
The following four algorithms are representative fast algorithms which can be placed into the above two subcategories:
1) Bounding volume algorithms;
2) Hierarchical bounding volume algorithms;
3) Uniform spatial subdivision algorithms;
4) Nonuniform spatial subdivision algorithms.
3.2 Bounding Volume Algorithms
The most fundamental and ubiquitous tool for ray tracing acceleration is the bounding volume. The idea of using a bounding volume takes advantage of the fact that rays usually miss the objects they are tested against. By enclosing complicated objects in invisible bounding volumes that are easy to intersect, one can avoid the complicated object intersection calculation whenever the ray misses the bounding volume. Because the bounding volume intersection test is simpler than the object intersection test, this algorithm may reduce the computation by a considerable amount, but it can not improve upon the linear time complexity of exhaustive ray tracing [Arv 89]. The reason can be explained with the objects in Figure 3.1. How do we bound these objects? How many bounding volumes are employed? What kinds of shapes are best as bounding volumes? The simplest bounding volume could be a sphere (a circle in this two-dimensional example) because its intersection test is easier than that of any other type. When we employ one bounding sphere, the bounding volume is too large, and every ray will hit it. In this case the bounding volume algorithm must pay the cost of three object intersection tests plus the cost of the bounding volume intersection test. When we employ three bounding volumes, one on each object, the bounding volumes for B and C have problems similar to the one bounding volume case. Even though empty space is reduced by employing three bounding volumes, volumes 2 and 3 still contain large empty spaces.
Figure 3.1 Bounding Volumes (panels (a), (b), (c); bounding volumes labeled BV1, BV2, BV3)
The other solution is to employ a box as a bounding volume (in the two-dimensional case a rectangle). By employing fitted bounding volumes we reduce the empty space in each bounding volume. The problem in this case arises with object C. The ray intersection test for object C requires three ray-line intersection calculations, but the ray intersection test for the bounding volume requires four. There is no advantage if the intersection with the bounding volume is as expensive as that with the object. Weghorst identifies this as the key problem of the bounding volume algorithm [Weg 84]. The void area of a bounding volume is defined as the difference between the projected areas of the bounding volume and the item. He considers the following two problems when bounding volumes are employed:
1. void area of bounding volume;
2. the cost of intersecting the bounding volumes.
In his computer experiment, Weghorst applied two bounding volumes (sphere, box) to the same object for deciding which one is better. He considered the case of only changing the view point and he measured two performance indices for each type of bounding volume. From his experiments he concluded that the optimal choice of bounding volume is ray dependent.
However, in the general case, as the illumination becomes more complex or the environment becomes specular or transparent, ray dependency becomes more difficult to consider since rays are more likely to arrive from any direction.
The best bounding volume depends on both the expense of performing tests on the bounding volume itself and on how well the volume protects the enclosed object from tests that do not yield an intersection. The criterion for the selection of the bounding volume is to minimize the total cost function T of the intersection test for an object. This total cost function is defined by Weghorst, Hooper, and Greenberg as follows [Weg 84]
T = b * B + i * I
where
T : total cost function
b : number of times that the bounding volume is tested for intersection
B : cost of testing the bounding volume for intersection
i : number of times that the item is tested for intersection
I : cost of testing the item for intersection.
For a specific item in an environment, with a given view, b and I are constant. However, by manipulating the shape and size of the bounding volume, B and i can be varied to reduce the total cost function. The two elements B and i are generally interdependent. For example, reducing B by reducing the complexity of the bounding volume will almost certainly increase i. To minimize the total cost function, we assign bounding volumes only to those items whose intersection tests are sufficiently complex to warrant one. Certain items, such as spheres, cylinders, and rectangular parallelepipeds, need not be bounded. In this case the ray tracing algorithm maintains two lists (one is the bounding volume list, the other the simple object list).
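The trade-off between B and i can be made concrete with a small worked example. The Python sketch below evaluates T = b * B + i * I for two candidate bounding volumes; all numerical values are hypothetical and are chosen only to show the interdependence of B and i, not taken from [Weg 84].

```python
def total_cost(b, B, i, I):
    """Total cost T = b*B + i*I of intersecting one bounded item."""
    return b * B + i * I


# Hypothetical figures: 1000 rays are tested against the bounding volume (b)
# and one item test costs 50 units (I). Both are fixed for a given view.
b, I = 1000, 50

# A cheap but loose sphere passes more rays on to the item than a tighter box.
candidates = {
    "sphere": {"B": 2, "i": 400},
    "box":    {"B": 6, "i": 250},
}

costs = {name: total_cost(b, v["B"], v["i"], I) for name, v in candidates.items()}
unbounded = b * I          # no bounding volume: every ray is tested against the item
best = min(costs, key=costs.get)

print(costs)       # {'sphere': 22000, 'box': 18500}
print(unbounded)   # 50000
print(best)        # box
```

With these assumed numbers the costlier box wins because it removes more item tests than it adds in bounding volume cost; for an item with a cheap intersection test (small I) the ordering could reverse, or the bounding volume could be dropped altogether.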
The bounding volume algorithm can be added to the original ray tracing algorithm. Figure 3.2 shows the bounding volume algorithm. This algorithm employs a bounding volume list and performs intersection tests with the objects after finishing the ray-bounding volume intersection tests. From a theoretical point of view the bounding volume algorithm may reduce the computation by a constant factor, but can not improve upon the linear time complexity of exhaustive ray tracing. The main problem of bounding volumes is that defining the optimal bounding volume is difficult.
procedure Raytrace (start, direction, depth, color)
    vector : start, direction
    integer : depth
    colors : color
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color ← black;
    if (depth ≤ max_depth)
    begin
        color ← background_color;
        while (bounding_volume)
            [intersect ray with all objects in a bounding volume and find intersection point (if any) that is closest to start of ray]
            bounding_volume ← next bounding_volume;
        endwhile
        if (intersection)
        begin
            local_color ← [ contribution of local color model at intersection point ]
            { Calculate direction of reflected ray }
            Raytrace (intersection_point, reflected_direction, depth+1, reflected_color)
            { Calculate direction of transmitted ray }
            Raytrace (intersection_point, transmitted_direction, depth+1, transmitted_color)
            Combine (color, local_color, local_weight_for_surface,
                     reflected_color, reflected_weight_for_surface,
                     transmitted_color, transmitted_weight_for_surface)
        end
    end
end
Figure 3.2 Bounding Volume Algorithm
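The constant-factor nature of the saving can be seen in a short Python sketch. It is a sketch under assumptions, not the program of Figure 3.2: each item carries a bounding sphere and a callable that stands in for its expensive intersection routine, the ray direction is assumed to be of unit length, and the names ray_hits_sphere and trace_with_bounds are hypothetical.

```python
def ray_hits_sphere(origin, direction, center, radius):
    """Boolean bounding-sphere test; direction is assumed to be unit length."""
    oc = [c - o for o, c in zip(origin, center)]
    dist2_center = sum(v * v for v in oc)
    if dist2_center <= radius * radius:
        return True                       # ray starts inside the sphere
    t_closest = sum(d * v for d, v in zip(direction, oc))
    if t_closest < 0.0:
        return False                      # sphere lies behind the ray origin
    # Squared distance from the sphere center to the ray's supporting line.
    return dist2_center - t_closest * t_closest <= radius * radius


def trace_with_bounds(origin, direction, items):
    """items: list of (center, radius, intersect). The expensive intersect
    callable is invoked only if the item's bounding sphere is hit."""
    nearest, bound_tests, item_tests = None, 0, 0
    for center, radius, intersect in items:
        bound_tests += 1
        if not ray_hits_sphere(origin, direction, center, radius):
            continue
        item_tests += 1
        t = intersect(origin, direction)
        if t is not None and (nearest is None or t < nearest):
            nearest = t
    return nearest, bound_tests, item_tests


# The lambdas stand in for costly item tests and simply return a fixed distance.
items = [
    ((0, 0, 5), 1.0, lambda o, d: 4.5),
    ((10, 0, 5), 1.0, lambda o, d: 3.0),
    ((0, 0, -5), 1.0, lambda o, d: 1.0),
]
print(trace_with_bounds((0, 0, 0), (0, 0, 1), items))   # (4.5, 3, 1)
```

Only one expensive item test is performed, but the number of bounding volume tests still equals the number of items, so the cost per ray remains linear in the size of the scene.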
3.3 Hierarchical Bounding Volume Algorithm
A common extension to bounding volumes is an attempt to impose a hierarchical structure of bounding volumes on the scene.
If it is possible, objects in close spatial proximity are allowed to form clusters, and the clusters are themselves enclosed in bounding volumes. Figure 3.3 shows a bounding volume A which contains one large object B and another bounding volume C, which has four small bounding volumes (C1, C2, C3, C4) inside it. The tree represents the hierarchical relationship between the seven boundary extents A, B, C, C1, C2, C3, C4. By enclosing a number of bounding volumes within a larger bounding volume it is possible to eliminate many objects from further consideration with a single intersection check. If a ray does not intersect the parent volume, there is no need to test it against the bounding volumes or objects contained within. Tracing a ray against the bounding volumes means traversing such a tree from the topmost level.
A ray that happens to intersect C1 in Figure 3.3 is tested against the bounding volumes C1, C2, C3 and C4, but only because it intersects the bounding volume C representing that cluster. A ray that misses bounding volume A need not be tested against the bounding volumes inside of A. This intersection algorithm is implemented in Figure 3.4. The hierarchical bounding volume algorithm employs an intersection procedure (HBV_intersect). The data structure for this hierarchy is assumed to be a tree with an arbitrary branching factor at each internal node. Thus bounding volumes may enclose any number of other bounding volumes. Each leaf node of the tree is a single primitive object while each interior node consists of a bounding volume. The procedure Intersect in HBV_intersect performs a ray-object intersection for the given ray information (origin, direction) and an object. The function Intersect_P is very similar to Intersect except that it returns a boolean value indicating whether an intersection was found or not. The intersection process of the hierarchical bounding volume begins with the root node of the tree.
Figure 3.3 Bounding Volume Hierarchies (bounding volume tree structure: root A with children B and C; C with children C1, C2, C3, C4)
To alleviate the bounding volume problem, Rubin and Whitted [Rub 80] introduce the hierarchical bounding volumes algorithm to ray tracing in order to attain a theoretical time complexity which is logarithmic in the number of objects instead of being linear. But constructing a bounding volume hierarchy involves two considerations: 1) which bounding volumes to enclose; 2) what type of bounding volume to enclose them with.
This is a challenging problem because the number of possible hierarchical groupings of objects grows exponentially with the number of objects, making an exhaustive search totally impractical. There are some suggestions on constructing the hierarchy of bounding volumes [Gol 80] [Rub 80] [Weg 84].
procedure Raytrace (start, direction, depth, color)
    vector : start, direction
    integer : depth
    colors : color
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color ← black;
    if (depth ≤ max_depth)
    begin
        color ← background_color;
        HBV_intersect (start, direction, node);
        if (intersection)
        begin
            local_color ← [ contribution of local color model at intersection point ]
            { Calculate direction of reflected ray }
            Raytrace (intersection_point, reflected_direction, depth+1, reflected_color)
            { Calculate direction of transmitted ray }
            Raytrace (intersection_point, transmitted_direction, depth+1, transmitted_color)
            Combine (color, local_color, local_weight_for_surface,
                     reflected_color, reflected_weight_for_surface,
                     transmitted_color, transmitted_weight_for_surface)
        end
    end
end
procedure HBV_intersect (origin, direction, node)
    vector : origin, direction
    globalpointer : *node
begin
    if node is a leaf then
        Intersect (origin, direction, node.object);
    elseif Intersect_P (origin, direction, node.bounding_volume) then
        for each child of node do
            HBV_intersect (origin, direction, child);
        endfor
    endif
end
Figure 3.4 Hierarchical Bounding Volume Algorithm
The potential clustering and the depth of the hierarchy depend on the nature of the scene. The problem with bounding volume hierarchies is that they are not convenient for a user to specify. That drawback is addressed by techniques for generating bounding volume hierarchies automatically [Nic 94].
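The pruning effect of the hierarchy can be sketched in Python for a tree shaped like the one described for Figure 3.3. The sketch rests on several assumptions that are not part of the original figures: all bounding volumes are spheres, each leaf object is itself a sphere, ray directions are of unit length, and the coordinates, the Node class and the function names are invented for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple
    radius: float
    children: list = field(default_factory=list)   # an empty list marks a leaf


def sphere_roots(origin, direction, center, radius):
    """Both ray parameters of the ray/sphere intersection, or None on a miss.
    direction is assumed to be unit length."""
    oc = [o - c for o, c in zip(origin, center)]
    half_b = sum(d * v for d, v in zip(direction, oc))
    disc = half_b * half_b - (sum(v * v for v in oc) - radius * radius)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return -half_b - root, -half_b + root


def hbv_intersect(origin, direction, node, stats):
    """Nearest hit distance in the subtree under node, or None."""
    stats["tests"] += 1
    roots = sphere_roots(origin, direction, node.center, node.radius)
    if roots is None or roots[1] < 0.0:
        return None                        # missed: the whole subtree is pruned
    if not node.children:                  # leaf: the sphere is the object itself
        return roots[0] if roots[0] > 0.0 else roots[1]
    hits = [hbv_intersect(origin, direction, child, stats) for child in node.children]
    hits = [t for t in hits if t is not None]
    return min(hits) if hits else None


def unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


# A contains the large object B and the cluster C, which holds C1..C4.
c_leaves = [Node((3, 0, 10), 0.5), Node((5, 0, 10), 0.5),
            Node((3, 2, 10), 0.5), Node((5, 2, 10), 0.5)]
tree = Node((0, 0, 10), 6.5, [Node((-4, 0, 10), 2.0), Node((4, 1, 10), 2.0, c_leaves)])

stats = {"tests": 0}
t_hit = hbv_intersect((0, 0, 0), unit((3, 0, 10)), tree, stats)
print(round(t_hit, 4), stats["tests"])
```

A ray aimed at C1 is tested against A, B, C and the four members of the cluster, while a ray that misses A costs a single test. How close the average cost comes to logarithmic depends on how well the hierarchy separates the objects, which is the construction problem discussed above.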
3.4 Nonuniform Spatial Subdivision Algorithm
Bounding volume hierarchies provide a means of recursively narrowing the focus of the search to more promising candidates for intersection. Bounding volume hierarchies organize objects bottom-up; in contrast, spatial subdivision algorithms (uniform or nonuniform) begin with a different philosophy: spatial partitioning subdivides space top-down. That is, we rely on simple volumes to identify objects which are good candidates for intersection, but these simple volumes are constructed by applying a divide-and-conquer technique to the space surrounding the objects instead of considering the objects themselves. One may construct the volumes in a top-down fashion by partitioning a volume bounding the environment into smaller pieces. The smaller volumes are assigned a collection of objects which are totally or partially contained within them. The spatial subdivision algorithm selects sets of objects based on given volumes. Each small volume is an axis-aligned rectangular prism called a "voxel". A preprocessing step is responsible for constructing nonoverlapping voxels.
The basic idea of the spatial subdivision algorithm is that a ray imposes a strict ordering on the pierced voxels based on the distance to the point at which the ray first enters each voxel. Because the points within each voxel are closer to the ray origin than those in all subsequent voxels, if we process the voxels in the order in which they are encountered along the ray, we need not consider the contents of any further voxels once we have found a point of intersection.
There are two types of spatial subdivision schemes: uniform spatial subdivision and nonuniform spatial subdivision. Nonuniform spatial subdivision techniques discretize space into regions of varying size in order to conform to features of the environment. This variation in size allows more subdivisions to be formed in densely populated regions of space and it allows large voxels to cover regions which are sparsely populated or are entirely void.
An octree is one possible data structure for creating and organizing such a collection of voxels. Glassner [Gla 84] introduced an octree variation for use in ray tracing. In the creation of the octree, a box containing the environment is recursively subdivided until each voxel contains fewer than some threshold number of intersection candidates or until a storage limitation is reached. After constructing the octree, we trace rays using the algorithm in Figure 3.5. In Glassner's approach, nodes of the octree are linked and accessed by uniquely defined names rather than by storing explicit pointers to descendent nodes. To access data associated with a node name, the name is used to retrieve a pointer from a hash table. Glassner observed that simply computing the name modulo the size of the hash table serves as a good hashing function. If a ray hits nothing within a voxel we must proceed to the next voxel pierced by that ray. Glassner's algorithm accomplishes this task by keeping the minimum length of voxels (the resolution of the voxels) in nodes of the octree. The movement to the next voxel is accomplished by finding a point within the next voxel and performing the lookup.
Another type of data structure for creating and organizing such a collection of voxels, suggested by Kaplan and Jansen, is the binary space partitioning tree (BSP tree). BSP trees obviate the need for voxel names and hashing at the expense of a potential increase in storage. Figure 3.6 shows a spatial subdivision algorithm based on BSP trees. This algorithm is suggested by Jansen [Arv 89].
The big difference between Glassner's and Jansen's approaches are the data structure for voxels and the movement to the next voxel. Instead of finding the next voxel by creating a point guaranteed to fall within it and traversing the hierarchical structure from the root, Jansen's algorithm
procedure Raytrace (start, direction, depth, color) vector : start, direction integer : depth colors : color
begin
vector : intersectionpoint, reflecteddirection,
transmitted direction
colors : localcolor, reflectedcolor, transmittedcolor
color + black ; if (depth : maxdepth) begin
color < back groundcolor;
Octree intersect(start,direction);
if (intersection)
begin
localcolor < [ contribution of local color model at intersection point ] { Calculate direction of reflected ray }
Raytrace (intersectionpoint,reflecteddirection,
depth+l,reflected color)
{ Calculate direction of transmitted ray }
Raytrace (intersectionpoint,transmitteddirection,
depth+l,transmittedcolor)
Combine (color, local_color,localweight forsurface,
reflectedcolor,reflected_weight forsurface,
transmittedcolor,transmittedweightforsurface)
end
end
end
procedure Octreeintersect (origin, direction) vector : origin, direction
begin
vector : Q
Q < origin;
repeat
[ locate the voxel which contains Q ]
for each object in the voxel do
Intersect(origin,direction,object);
endfor
if (no intersection)
Q < a point in the next voxel pierced by ray;
endif
until an intersection is found or Q is outside the
environment
end
Figure 3.5 Nonuniform Spatial Subdivision Algorithm (Octree)
procedure Raytrace (start, direction, depth, color) vector : start, direction integer : depth colors : color begin
vector : intersectionpoint, reflecteddirection,
transmitted direction
colors : localcolor, reflectedcolor, transmittedcolor color < black ; if (depth 5 maxdepth) begin
color + back groundcolor; BSP intersect(start,direction,node); if (intersection) begin
localcolor < [ contribution of local color model at intersection point ] { Calculate direction of reflected ray }
Raytrace (intersection point,reflected direction, depth+1,reflected color) { Calculate direction of transmitted ray }
Raytrace (intersection point,transmitteddirection, depth+l,transmittedcolor) Combine (color, local color,localweightfor surface,
reflected color,reflectedweight for surface,
transmittedcolor,transmitted_weight for surface)
end
end
end
procedure BSP_intersect (origin, direction, node)
vector : origin, direction
global pointer : *node
begin
    if ray interval is empty or node is nil then return
    if node is a leaf then
        for each object in the node do
            Intersect(origin, direction, object);
        endfor
    else
        near ← ray clipped to near side of node.partition;
        BSP_intersect (near.origin, near.direction, pointer to near half space);
        if (no intersection)
            far ← ray clipped to far side of node.partition;
            BSP_intersect (far.origin, far.direction, pointer to far half space);
        endif
    endif
end
Figure 3.6 Nonuniform Spatial Subdivision Algorithm (BSP)
recursively descends all the branches of the BSP tree which terminate at pierced voxels, making use of each partition node only once per ray.
Figure 3.7 shows the nonuniform spatial subdivision algorithm via an octree. Ray A visits five voxels and examines the objects in those voxels, so only three of the eight objects need to be tested for intersection. Ray B visits only one voxel and performs a single ray-object intersection test, where the original ray tracing algorithm would perform eight. Finer subdivision can decrease the number of ray-object intersection tests at the expense of additional voxel processing overhead. This algorithm requires enormous amounts of data storage [Wat 89].
3.5 Uniform Spatial Subdivision Algorithm
Fujimoto [Fuj 86] introduced a uniform spatial subdivision algorithm in which voxels of uniform size are organized in a regular three-dimensional grid. The overall strategy is quite similar to that of the nonuniform spatial subdivision algorithm: the voxels are processed in the order in which the ray pierces them, and when each voxel is processed, the candidate objects in the voxel are intersected with the ray. To do this, Fujimoto developed a three-dimensional digital differential analyzer (3DDDA) that incrementally computes successive voxel indices in the same way that efficient line rasterization algorithms incrementally compute pixel coordinates. This is
Figure 3.7 Nonuniform Spatial Subdivision (legend: TO = tested objects, PV = processed voxels)
similar to the line drawing algorithm [Fol 90]. This 3DDDA eliminates floating point multiplications and divisions. Figure 3.8 shows this algorithm.
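The incremental voxel stepping described above can be sketched in a few lines. The following is only an illustration, not Fujimoto's integer 3DDDA: it uses the floating-point stepping scheme of Amanatides and Woo, which visits the same voxels in the same order. The function name `traverse_grid` and its parameters (`grid_min`, `cell_size`, `resolution`) are hypothetical, and the ray origin is assumed to lie inside a cubic grid.

```python
import math

def traverse_grid(origin, direction, grid_min, cell_size, resolution):
    """Yield the (i, j, k) indices of the voxels pierced by a ray, in order.

    Assumptions (not from the thesis): the grid is a cube of
    resolution^3 cells and the ray origin lies inside it.
    The direction need not be normalized.
    """
    if not any(direction):
        raise ValueError("direction must be nonzero")
    idx, step, t_max, t_delta = [], [], [], []
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        i = int(math.floor((o - grid_min[axis]) / cell_size))
        i = min(max(i, 0), resolution - 1)
        idx.append(i)
        if d > 0.0:
            step.append(1)
            boundary = grid_min[axis] + (i + 1) * cell_size
            t_max.append((boundary - o) / d)    # t at the next boundary
            t_delta.append(cell_size / d)       # t needed to cross one cell
        elif d < 0.0:
            step.append(-1)
            boundary = grid_min[axis] + i * cell_size
            t_max.append((boundary - o) / d)
            t_delta.append(-cell_size / d)
        else:
            step.append(0)                      # ray never crosses this axis
            t_max.append(math.inf)
            t_delta.append(math.inf)
    while all(0 <= idx[a] < resolution for a in range(3)):
        yield tuple(idx)
        axis = t_max.index(min(t_max))          # boundary crossed first
        idx[axis] += step[axis]
        t_max[axis] += t_delta[axis]            # one addition per step
```

After the setup, each step costs one comparison pass and one addition, which is the property the text attributes to the incremental approach.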
The differences between the uniform and the nonuniform spatial subdivision algorithms are the following:
1) the subdivision strategy does not depend on the structure of the environment;
2) access to the ray-pierced voxels is very fast due to the incremental calculations.
Figure 3.9 shows a 2D analog of uniform spatial subdivision. Ray A visits 14 voxels, and only one object is tested for intersection with the ray. But there are many empty voxels in this example: since the subdivision does not depend on the structure of the environment, many voxels point to nothing. This disadvantage does not outweigh the advantage of fast access. The big disadvantage is that a huge memory may be required. Although paged memory techniques can be used to implement the scheme, paging incurs a large memory management overhead, and many modest images cannot be handled expeditiously.
3.6 Inside Test
Whatever kind of fast algorithm is used in ray tracing, it is vitally important that the ray-object intersections be correct. The big advantage of the spatial subdivision algorithm
procedure Raytrace (start, direction, depth, color)
vector : start, direction
integer : depth
colors : color
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color ← black;
    if (depth ≤ max_depth)
    begin
        color ← background_color;
        Grid_intersect(start, direction, node);
        if (intersection)
        begin
            local_color ← [ contribution of local color model at intersection point ]
            { Calculate direction of reflected ray }
            Raytrace (intersection_point, reflected_direction, depth+1, reflected_color)
            { Calculate direction of transmitted ray }
            Raytrace (intersection_point, transmitted_direction, depth+1, transmitted_color)
            Combine (color, local_color, local_weight_for_surface,
                     reflected_color, reflected_weight_for_surface,
                     transmitted_color, transmitted_weight_for_surface)
        end
    end
end
procedure Grid_intersect (origin, direction, node)
vector : origin, direction
global pointer : *node
begin
    [ compute i,j,k for the voxel containing origin ];
    [ set up 3DDDA based on direction and origin ]
    repeat
        for each object in voxel[i,j,k] do
            Intersect(origin, direction, object);
        endfor
        if (no intersection) then
            update i,j,k using 3DDDA;
        endif
    until an intersection is found or outside of environment
end
Figure 3.8 Uniform Spatial Subdivision Algorithm
Figure 3.9 Uniform Spatial Subdivision (legend: circle = object)
is that sorting of the ray-pierced voxels is built into the algorithm. The bounding volume algorithm does not include such a sorting procedure. Even if we sort the bounding volumes, their sorted order does not determine the sorted order of the objects. For example, ray A pierces two bounding volumes, BV1 and BV2, in Figure 3.10 (a). The sorted order of the bounding volumes with respect to ray A is (BV1, BV2), but the sorted order of the objects is different. There is a ray-object intersection in BV1, but it is not the correct intersection for ray A. The reason is that bounding is performed on objects, not on space. This is a problem of the bounding volume algorithm (including hierarchical bounding volumes).
The spatial subdivision algorithm has a different kind of problem. Ray B in Figure 3.10 (b) pierces the voxels in the following order: voxel 4, voxel 3, voxel 7, voxel 6, voxel 5. Since object OBJ3 is in voxel 4, the ray-object intersection test is performed between the ray and OBJ3. But the intersection point is not in voxel 4.
The inside test makes sure that an intersection point lies in a particular voxel in the spatial subdivision algorithm. After a ray-object intersection has been detected, the inside test procedure does the following:
1. find the intersection point using the intersection parameter value (t) and the ray equation;
2. check whether the intersection point is in the x, y, z interval of the voxel;
3. make sure that the intersection point is in the voxel.
Figure 3.10 Inside Test: (a) bounding volumes BV1 and BV2, (b) spatial subdivision
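The three steps of the inside test can be sketched as follows. This is a minimal illustration that assumes an axis-aligned voxel given by two corner points; the names `point_on_ray`, `inside_voxel`, and `accept_hit`, and the tolerance `eps`, are hypothetical and do not come from the thesis.

```python
def point_on_ray(origin, direction, t):
    """Step 1: evaluate the ray equation origin + t * direction."""
    return tuple(o + t * d for o, d in zip(origin, direction))

def inside_voxel(point, voxel_min, voxel_max, eps=1e-9):
    """Step 2: check the x, y, z intervals of an axis-aligned voxel."""
    return all(lo - eps <= p <= hi + eps
               for p, lo, hi in zip(point, voxel_min, voxel_max))

def accept_hit(origin, direction, t, voxel_min, voxel_max):
    """Step 3: accept the hit only if it lies ahead of the origin and in the voxel."""
    if t <= 0.0:
        return False
    return inside_voxel(point_on_ray(origin, direction, t), voxel_min, voxel_max)
```

A hit that is rejected here is not discarded for good: the same object is tested again when the ray reaches the voxel that actually contains the intersection point.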
To get a fast ray tracing algorithm, we must find an inside test that is fast and easily computed. To save memory in the spatial subdivision algorithm, we can combine the bounding volume algorithm and the spatial subdivision algorithm. In this case bounding is performed on subspaces rather than objects, and the inside test must be done on every intersection point. This new algorithm will be developed in Chapter 4.
CHAPTER 4
DEPTH SORTER
4.1 Introduction
The Depth Sorter discussed in this chapter is a procedure for substantially speeding up the ray tracing calculations. The core of this algorithm consists of two ideas:
1) sort bounding volumes,
2) avoid unnecessary intersection tests even for objects in
the bounding volumes intersected with the ray.
This algorithm is modeled after other fast algorithms but avoids their inherent drawbacks. This chapter describes the development of the algorithm and discusses some of its problems. The data structure is also an important consideration, both for saving memory and for scenes that contain a great many objects. A critical factor in the computational efficiency of ray tracing is the ease with which object bounding can be accomplished. After objects are assigned to bounding volumes, the sorting of those volumes with respect to the ray is a key problem.
The main sorting idea and aspects of its hardware implementation are considered. The following subsections discuss these things step by step.
4.2 General Depth Comparator
The ray tracing algorithm finds the object nearest to the ray origin. The algorithm first performs the intersection calculation. After a ray meets an object, the depth of the intersection is compared with the previously found depth. Repeating this step for every object, the comparator finds the nearest object at each pixel. We can therefore interpret the ray tracing algorithm as a depth comparator for objects in image space. If we want to find the nearest object without using a bounding or spatial subdivision algorithm, the ray tracing algorithm itself is sufficient for calculating the depth of each visible element in the image. The amount of calculation, and hence the execution time, is basically fixed by the complexity of the scene. By adding hardware, we can reduce the run time. According to statistics reported by Whitted [Whi 80], up to 95% of ray tracing time is spent on intersection calculations. To relieve this bottleneck stage in ray tracing, we could use hardware that performs the intersection test. Let's consider some of the problems in implementing this kind of hardware.
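The depth-comparator view of ray tracing described above amounts to a running minimum over all objects. The sketch below is only an illustration under simple assumptions: the objects are planes perpendicular to the z axis, and the names `intersect_plane_z` and `nearest_hit` are hypothetical.

```python
import math

def intersect_plane_z(origin, direction, z_plane):
    """Ray parameter t at which the ray meets the plane z = z_plane, or None."""
    if direction[2] == 0.0:
        return None
    t = (z_plane - origin[2]) / direction[2]
    return t if t > 0.0 else None

def nearest_hit(origin, direction, objects, intersect):
    """Test every object and keep the smallest positive depth (the comparator)."""
    nearest_t, nearest_obj, tests = math.inf, None, 0
    for obj in objects:
        tests += 1
        t = intersect(origin, direction, obj)
        if t is not None and t < nearest_t:   # compare with the depth found so far
            nearest_t, nearest_obj = t, obj
    return nearest_obj, nearest_t, tests
```

The number of intersection tests always equals the number of objects, which is why the run time of the unaccelerated algorithm is fixed by the complexity of the scene.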
The typical object space for ray tracing is made up of two types of surfaces, flat surface elements, and quadratic surfaces. For both of these types the surface equations and the surface normal equations are relatively simple [Han 89].
It may be possible to devise schemes for depth sorting using a wide range of object primitives. But it is not easy
when we consider the following example as shown in Figure 4.1. Let us consider a general polygonal surface element. Since the element may be concave, the star shown in Figure 4.1 represents the general case.
Note that the pertinent equations are:
$f_i(x, y, z) = a_i x + b_i y + c_i z = 0 \quad (i = 1, 2, \ldots, 5)$ (4.0)
To find the ray/object intersection we must
1. Calculate the intersection of the ray with the surface, as a function of the ray parameter t;
2. Find the intersection point using t and the ray equation;
3. Check the following conditions
( fl 0, f2 2 0, f3 " 0) OR ( f2 0, f3 S 0, f4 > 0) OR ( f2 0, f4 S 0, f5 : 0) OR ( fl 0, f4 > 0, f5 0) OR ( fl 0, f3 ' 0, f5 0) OR
( fl 0, f2 0, f3 < 0, f4 0, f5 0)
Tests enumerated in Step 3 can become difficult to implement. Generally the more complex the polygonal shape, the more involved the test for intersection becomes. The major problem with devising a depth comparator for arbitrary object primitives is that such objects do not have a regular (uniform) shape and whenever we employ a new primitive, the comparator must be modified to accommodate the comparisons
Figure 4.1 A general polygonal surface element (a star bounded by the lines f1 = 0, ..., f5 = 0 and pierced by the ray)
required by the new primitive. So primitive sorting is not easy and not necessarily a good idea. If all the objects are of the same type (i.e. bounding volumes), we can sort the depths of the objects with respect to the ray direction. If the bounding volume is relatively simple, ray intersections with the actual object points may not be difficult to find.
From the above example we know that the primitive depth sorting is not easy; however, bounding volume sorting could be easy. When we employ the bounding volume algorithm to sort depth with respect to the ray, we must consider the following questions.
1. What kind of bounding shape gives the simplest intersection test?
2. What kind of bounding volumes can be fast and easily sorted with simple calculations?
There are many bounding volumes with regular shapes. The box and the sphere are representative; other volumes could be more complicated than these. The intersection test for a sphere bounding volume is easier than that for a box bounding volume. A simple object such as the one shown in Figure 4.2 (a) can be bounded by a sphere as shown in Figure 4.2 (b). The void area of a bounding volume is defined as the difference in area between the orthogonal projections of the object and the bounding volume onto a plane perpendicular to the ray and passing through the origin of the ray [Weg 84]. Sphere bounding results in a very simple
Figure 4.2 Bounding Volume
intersection test. However, sphere bounding may not be suitable for many shapes. Kay and Kajiya presented a method of handling box bounding based on slabs [Kay 86]. A slab is simply the space between two parallel planes. The intersection of a set of slabs defines the bounding volume. This bounding volume method does not overcome the void area problem for the kind of object illustrated in Figure 4.2 (a). This algorithm also needs to build a hierarchy of the objects and bounding volumes in image space. Drawbacks of the hierarchical bounding volume are addressed by techniques for generating bounding volume hierarchies automatically [Nic 94]. A hardware implementation of the box bounding volume intersection results in machinery that is no simpler than that for sphere bounding volumes.
Even if we set aside the void area problem, we need to look at the volume link problem illustrated in Figure 4.3. One advantage of the ray tracing algorithm is that we can freely move the viewpoint. Objects are bounded by bounding volumes, and the bounding volumes are linked by pointers. For three bounding volumes, there are six possible link orders, as shown in Figure 4.3. For the linked list (a), the bounding volume algorithm first performs the intersection test with bounding volume 3 and then tests the objects in volume 3. Even though the ray hits volume 3, it does not hit any object inside volume 3. The ray tracing algorithm repeats the same procedure for volume 2, but fails to find any intersection.
Figure 4.3 Bounding Volume Link Lists: panels (a) to (f) show the six possible orders of the linked list boundlist for bounding volumes 1, 2, and 3 seen from the view point, from (a) 3 → 2 → 1 → null to (f) 1 → 2 → 3 → null
Finally it tests volume 1 and finds the object nearest to the viewpoint. In this worst case, the bounding volume ray tracing algorithm performs intersection tests with all of the objects and bounding volumes.
The best case is shown in Figure 4.3 (f). The algorithm performs the intersection test with volume 1 and finds the nearest object. The algorithm keeps the nearest hit distance until all intersections are performed. When the intersection of the ray with the next volume is calculated, the algorithm compares it only with the nearest distance and either updates that distance or bypasses the volume. The problem in this situation is how to keep the best linked list for any viewpoint and for all rays. The spatial subdivision algorithm gives a solution for this situation by sorting the spaces hit by the ray with respect to the ray direction. But to solve the void area problem, the spatial subdivision algorithm must divide the space more finely, which in turn requires bigger arrays. The memory representing image space is proportional to n³; e.g. if the latitude, longitude, and height axes are each divided into 10 parts, then a 10×10×10 array is required to represent the image space. Yet another problem is the inside test, which was mentioned in the previous chapter.
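The effect of the link order on the amount of work can be illustrated with a small counting sketch. Everything here is hypothetical: each bounding volume is reduced to the distance at which the ray enters it plus a list of object hit distances (`None` for a miss), and the distances are made up to mimic the situation in which only volume 1 contains an object hit by the ray.

```python
import math

def trace_list(bound_list):
    """Walk a linked list of bounding volumes, counting the tests performed.

    Each entry is (volume_entry_t, [object_hit_t or None, ...]).
    A volume that is missed, or entered beyond the nearest hit found so far,
    is bypassed without testing its objects.
    """
    nearest = math.inf
    volume_tests = object_tests = 0
    for volume_t, object_ts in bound_list:
        volume_tests += 1
        if volume_t is None or volume_t >= nearest:
            continue                      # bypass: cannot contain a nearer hit
        for t in object_ts:
            object_tests += 1
            if t is not None and t < nearest:
                nearest = t               # update the nearest hit distance
    return nearest, volume_tests, object_tests

# Made-up data: volumes 3 and 2 are hit but their objects are missed.
volume_1 = (2.0, [3.0])
volume_2 = (5.0, [None, None])
volume_3 = (8.0, [None, None])
worst = trace_list([volume_3, volume_2, volume_1])   # order as in list (a)
best = trace_list([volume_1, volume_2, volume_3])    # order as in list (f)
```

With these numbers the worst order performs five object tests and the best order performs one, although both return the same nearest distance.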
So far we found the following problems with the bounding volume algorithm and spatial subdivision algorithm:
1. Void areas in bounding volumes;
2. Link status in bounding volumes;
3. Memory space representing image space in spatial subdivision algorithm;
4. Inside test in spatial subdivision algorithm.
By bounding subspaces instead of objects we can overcome the problems listed above. A new bounding algorithm based on the above considerations is proposed in Figure 4.4.
Image space is properly divided. A divided subspace is a bounding volume if it contains at least a part of an object or an entire object. So, in this bounding volume algorithm, an object can be bounded by several bounding volumes, and one bounding volume can contain several objects. This bounding strategy eases the void area problem. The next step is to sort those bounding volumes with respect to the ray. The algorithm then inspects the objects in each sorted bounding volume in turn. Once the ray hits the nearest object, no more intersection tests are required. For example, in Figure 4.4 (b), spaces 1, 2, 3, 4, 5, 7, 9, 12, and 14 cannot be bounding volumes because they contain neither part of an object nor an entire object. The six remaining bounding volumes are sorted with respect to the ray direction as shown in (c). The algorithm inspects volume 11 to find the nearest object A. After finding the nearest object in the bounding volume nearest to the viewpoint, the algorithm quits inspection.
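The idea of bounding subspaces rather than objects can be sketched in two dimensions. This is only an illustration: each object is stood in for by its axis-aligned extent (an assumption, since the thesis does not state how occupancy is tested), the cells are numbered row by row starting from 1, and the name `occupied_cells` is hypothetical.

```python
def occupied_cells(extent_min, extent_max, nx, ny, objects):
    """Divide a 2D extent into nx * ny cells and keep the occupied ones.

    objects maps a name to its extent (xmin, ymin, xmax, ymax).
    Returns {cell_number: [names of objects overlapping the cell]}.
    """
    width = (extent_max[0] - extent_min[0]) / nx
    height = (extent_max[1] - extent_min[1]) / ny
    cells = {}
    for j in range(ny):
        for i in range(nx):
            x0 = extent_min[0] + i * width
            y0 = extent_min[1] + j * height
            x1, y1 = x0 + width, y0 + height
            inside = [name for name, (ax0, ay0, ax1, ay1) in objects.items()
                      if ax0 < x1 and ax1 > x0 and ay0 < y1 and ay1 > y0]
            if inside:                       # empty cells are not bounding volumes
                cells[j * nx + i + 1] = inside
    return cells

scene = {"A": (0.5, 0.5, 1.5, 1.5), "B": (1.5, 2.5, 2.5, 3.5), "C": (1.0, 1.0, 1.8, 1.8)}
volumes = occupied_cells((0.0, 0.0), (4.0, 4.0), 2, 2, scene)
```

In this made-up scene object B is bounded by two cells and cell 1 contains two objects, which are the two situations the paragraph above allows.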
Figure 4.4 New Algorithm and Data structure (panel (b): numbered subspaces and the boundlist; panel (c): the sortlist of bounding volumes ordered along the ray)
4.3 Possible Problems with the New Bounding Volumes
The new bounding volume algorithm considers two problems in ray tracing. The first one is to make bounding volumes compact to avoid the void area problem by allowing objects to overlap in the bounding volumes. So any shape could be a possible candidate for bounding volumes. We choose shapes that give easy intersection tests and are easily sorted with respect to the ray. We also need to consider possible hardware construction implications. Let's consider two bounding volumes, boxes and spheres.
4.3.1 Box Bounding Volumes
A box bounding volume is defined by six planes and is therefore described by six plane equations. To find the intersection parameter values with a ray, the algorithm performs at least 6 division operations for each box.
In most implementations, division takes three or four times as long to compute as multiplication. Furthermore, division tends to be difficult to pipeline due to the dependencies inherent in selecting quotient bits [Flo 89] [Erc 94]. To avoid division operations in the sort procedure, we compare the depths of box bounding volumes using the coefficients of the plane equations. In this case the inside test is not easy. For example, Figure 4.5 shows a two-dimensional box bounding volume containing simple objects A and B. Ray R1 starts from outside of the two
Figure 4.5 Inside test and overlap test: (a) box bounding volumes with ray R1, (b) sphere bounding volumes with ray R2
bounding boxes 1 and 2. Some suitable algorithm sorts the two bounding boxes with respect to the direction of ray R1. When we calculate the intersection of ray R1 with the objects in Box 1, ray R1 hits object A, but the hit lies in Box 2. In this case ray R1 must return the color information of object B. To prevent this situation, we must perform an inside test, as we remarked in the previous chapter.
When the ray hits object A, we know only the intersection distance, basically a parameter value t. From this value t we can find the intersection location using the ray equation. After finding the intersection location, we need to check whether that location is in the bounding volume or not. These procedures need many operations (multiplications, additions, comparisons). This inside test is not easy for box bounding. The hardware implementation also becomes more complicated than the box sorter alone when the inside test is added to the hardware.
4.3.2 Sphere Bounding Volumes
The sphere bounding volume has some of the same problems as the box bounding volume. One advantage of the sphere is that the volume is easier to specify, i.e. a single equation is enough to represent a sphere. Because the bounding volume equation is quadratic, we need a square root function to solve for the intersection distance.
While the design of fast and efficient adders and multipliers is well understood, division and square root remain serious design challenges. The reasons are the intrinsic dependence among the iteration steps and the complexity of the result-digit generation function [Erc 94]. So sorting depth by ray intersection distance may not be a good idea. However, comparing the coefficients of the bounding volume equation and the ray direction provides a clue for sorting the depths of the bounding volumes. This approach will be given in section 4.5, where we discuss the object intersection with the ray. Using coefficients from the bounding volume equations and the object intersection depth parameter t, we can easily perform the inside test. The other problem is to avoid repeating the intersection test with an object already found not to intersect the ray. Figure 4.5 (b) shows two sphere bounding volumes, one containing object A and the other containing objects A and B. Ray R2 passes through volumes 1 and 2. A sort algorithm sorts the two volumes. At the object intersection stage, the intersection algorithm tests whether R2 hits object A in SP1. In this example R2 misses object A. For the next bounding volume, another intersection test will be carried out. We want to avoid the object intersection test for A there, because we already know that R2 misses object A. This problem will also be considered in section 4.6.
4.4 Bounding of Objects
The new bounding algorithm consists of two steps:
1. The object bounding stage;
2. The ray calculation stage.
In the first stage the objects in the image space are partitioned by invisible spherical surfaces. After the first stage, the algorithm computes the ray at each pixel. The main objective of the object bounding stage is to reduce the void volume in each sphere bounding volume. Generally we know that if we employ many bounding volumes, we could make the void volume very small. But if the sorting time involved in the examination of the bounding volume is larger than the calculation of the object intersections, there is no advantage. Even though we may implement hardware for fast sorting of bounding volumes, this hardware will require a large memory to hold the data for the many bounding volumes.
In the new bounding algorithm, sorting is done for finding the nearest bounding volume for each sphere. The sorting ultimately should be done in hardware and the intersection tests with objects and ray will be done by software. The complexity of the hardware will be dictated by the number of spherical bounding volumes it is to handle. Let's assume that the depth sorter (an implemented hardware for sorting sphere bounding volumes) can handle N sphere bounding volumes.
There are two ways to assign bounding spheres:
1. A priori assignment, typically by the user's
examination of the structure of the object assembly;
2. Automatic bounding, using an explicit algorithm to
cluster the objects in the scene.
We examine the algorithm for the clustering process. Usually we know the locations and the nature of the objects in the scene. We partition image space into N boxes. By shrinking each axis of the rectangular boxes, we make each box as compact as possible. When we bound the objects, the best rule for minimizing void areas is to make the bounding box cubical. After making boxes of the desired shape, we tightly wrap each box in a circumscribing sphere. This wrapping procedure is the same as that of the automatic bounding discussed later. After assigning the sphere bounding volumes, the algorithm checks, for each sphere bounding volume, which objects are included in it. For a simple image space, bounding by visual inspection is very easy. When the image space has many objects and the space partition is not simple, the object bounding step may take many calculations.
The automatic bounding procedure is simpler to employ than bounding made by inspection. Figure 4.6 shows the automatic bounding procedure for a two-dimensional object space. (We use two-dimensional space for illustration only; the algorithm is designed for three-dimensional application.)
Figure 4.6 Bounding procedure of New Algorithm
This space has 4 objects, A, B, C, and D. Let's assume N=6, i.e. this algorithm can employ up to 6 sphere bounding volumes (in 3-space).
To bound objects, we must know the extent of the object space. By checking each object, we find the maximum extent of the objects. Figure 4.6 (a) shows this extent. The aspect ratio of this frame is approximately 3:2, so this space is divided into 6 boxes. Each box in the example has large void areas. To reduce those void areas, we shrink each box in the principal view coordinate dimensions such that the shrunk boxes just enclose the objects or parts of objects. We may also find that some boxes are empty, i.e. contain no objects from the scene. These boxes are removed from the list of potential bounding boxes. In this example box 6 in Figure 4.6 (b) is removed from the candidate list for bounding volumes. We can easily wrap a sphere around each rectangular box. For the two-dimensional case used as an example, the center of the circle is the intersection point of the two diagonals and the radius is half the length of the box diagonal. Employing these circles as the bounding volumes, we can bound objects in the image space. For the three-dimensional case, the three-dimensional bounding spheres become circles in the viewing space.
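The shrinking and wrapping steps described above can be sketched directly. This is a minimal illustration that assumes the part of an object inside a box is available as a list of points; the names `shrink_box` and `circumscribing_sphere` are hypothetical.

```python
import math

def shrink_box(points):
    """Tightest axis-aligned box (min_corner, max_corner) around the points."""
    mins = tuple(min(p[a] for p in points) for a in range(3))
    maxs = tuple(max(p[a] for p in points) for a in range(3))
    return mins, maxs

def circumscribing_sphere(box_min, box_max):
    """Sphere through the box corners: center at the intersection of the
    diagonals, radius equal to half the length of the box diagonal."""
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(box_min, box_max))
    radius = 0.5 * math.dist(box_min, box_max)
    return center, radius
```

For a box with corners (0, 0, 0) and (2, 4, 4) the diagonal has length 6, so the wrapping sphere has center (1, 2, 2) and radius 3.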
A visual inspection for tailored bounding volumes can employ all N bounding volumes because the partition is performed by the human operator. Even though the partition of
space is not easy, there are many alternate methods for assigning the bounding spheres. In general it takes a long time to bound objects by inspection for an arbitrary scene. The automatic process of assigning bounding volumes takes a short time to bound objects in the image space. It can employ up to N bounding volumes. The automatic bounding algorithm shown in the example in Figure 4.6 uses 5 of the 6 possible bounding volumes, so one bounding volume is idle during the subsequent sorting process. One could develop a tighter algorithm, at the expense of additional execution time, that leaves fewer bounding volumes idle and packs the bounding volumes more densely to reduce the void area problem. However, the additional running time was found to reduce the overall efficiency. Hence the procedure shown here is a "best compromise" solution leading to the greatest observed improvements.
4.5 Algorithm
4.5.1 Filtering and Comparison
There are many sphere bounding volumes which were constructed from the ideas presented in the previous section. In the ray calculation stage, we need to find the spherical bounding volumes which are intersected by the ray and sort the distances of the points where the ray intersects the bounding volume with respect to the ray direction. After discarding the unintersected sphere bounding volumes, we need a norm to
compare depths, two spheres at a time. This norm can be derived from a comparison between the coefficient pairs of quadratic equations that describe the spheres.
Let d be the ray direction unit vector from the view point:
$d = (d_x, d_y, d_z)$ and $d_x^2 + d_y^2 + d_z^2 = 1$
Furthermore,
V is the viewpoint $(V_x, V_y, V_z)$;
BC is the center of the bounding sphere $(x_0, y_0, z_0)$; and
R is the radius of the bounding sphere.
The bounding volume equation is
$(x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2 = R^2$ (4.1)
The ray equation is
$V + t d = (V_x + t d_x,\; V_y + t d_y,\; V_z + t d_z)$ (4.2)
Substituting Eq. (4.2) into Eq. (4.1), we get
$(V_x + t d_x - x_0)^2 + (V_y + t d_y - y_0)^2 + (V_z + t d_z - z_0)^2 = R^2$ (4.3)
Reorder Eq. (4.3) with respect to t:
$t^2 (d_x^2 + d_y^2 + d_z^2) + t\,[2 d_x (V_x - x_0) + 2 d_y (V_y - y_0) + 2 d_z (V_z - z_0)] + (V_x - x_0)^2 + (V_y - y_0)^2 + (V_z - z_0)^2 - R^2 = 0$ (4.4)
Let
$b = 2 d_x (V_x - x_0) + 2 d_y (V_y - y_0) + 2 d_z (V_z - z_0)$ (4.5)
$c = (V_x - x_0)^2 + (V_y - y_0)^2 + (V_z - z_0)^2 - R^2$ (4.6)
Then, since d is a unit vector, Eq. (4.4) becomes
$t^2 + b t + c = 0$ (4.7)
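As a quick check of Eqs. (4.5) to (4.7), the coefficient pair can be computed directly from the viewpoint, the unit ray direction, and the sphere. The function name `sphere_coefficients` is hypothetical; the sketch assumes the direction has already been normalized, as the derivation does.

```python
def sphere_coefficients(view_point, direction, center, radius):
    """Return (b, c) of t^2 + b*t + c = 0 for a unit-length ray direction."""
    offset = [v - x0 for v, x0 in zip(view_point, center)]      # V - BC
    b = 2.0 * sum(d * o for d, o in zip(direction, offset))     # Eq. (4.5)
    c = sum(o * o for o in offset) - radius * radius            # Eq. (4.6)
    return b, c
```

For a viewpoint at the origin looking along +z at a unit sphere centered at (0, 0, 5), this gives b = -10 and c = 24, so Eq. (4.7) has the roots t = 4 and t = 6, the entry and exit distances.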
The intersection of the parametric ray with the sphere is characterized by the quadratic equation (4.7). Each bounding volume will have a different set of coefficients {b, c} for every ray. Let $t^2 + b_1 t + c_1 = 0$ and $t^2 + b_2 t + c_2 = 0$ be the parametric intersection equations for sphere bounding volumes 1 and 2 respectively. Using the coefficients b, c and only simple mathematics, we can compare the depths of the sphere bounding volumes with respect to the ray direction. Before developing the algorithm, we define the discriminant D and four state variables $x_1$, $x_2$, $x_3$, $x_4$.
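$x_1 = 1$ if $b_2 < b_1$; $x_1 = 0$ if $b_2 \ge b_1$
$x_2 = 1$ if $4(c_1 - c_2) < b_1^2 - b_2^2$; $x_2 = 0$ if $4(c_1 - c_2) \ge b_1^2 - b_2^2$
$x_3 = 1$ if $b_1 b_2 < 2(c_1 + c_2)$; $x_3 = 0$ if $b_1 b_2 \ge 2(c_1 + c_2)$
$x_4 = 1$ if $(b_2 c_1 - b_1 c_2)(b_1 - b_2) < (c_1 - c_2)^2$; $x_4 = 0$ if $(b_2 c_1 - b_1 c_2)(b_1 - b_2) \ge (c_1 - c_2)^2$
$D = b^2 - 4c$ (4.8)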
Let's consider the following two quadratic equations.
$f_1(t) = t^2 + b_1 t + c_1$ (4.9)
$f_2(t) = t^2 + b_2 t + c_2$ (4.10)
Because a nonpositive intersection value is always meaningless in ray tracing, we are interested only in the smallest positive root of each equation.
Three spherical bounding volumes (sp1, sp2, and sp3) are shown in Figure 4.7(b). The ray R starts at the origin P with a given direction. Each bounding volume contains objects. The ray R has no intersection with sp2 or sp3, so no object in sp2 or sp3 will be intersected by the ray R. Physically, the sphere bounding volume sp1 does not meet the ray, but mathematically that volume has an intersection with the ray. The curve f1(t) shows the relationship between the physical and the mathematical interpretations. The intersection of sp1 with the ray R occurs for negative values of the parameter t. Two negative real roots of f1(t) mean that the sphere bounding volume is located behind the ray origin. Since we are only interested in the objects which lie in the direction of the ray, we need to remove those bounding volumes at the presort stage of the ray tracing algorithm.
Figure 4.7(a) gives a clue to the identity of the unintersected bounding volumes. A negative discriminant D in equation (4.8) means that the volume has no intersection with the ray. Even when the intersections with the ray can be calculated, two negative intersection values of the parameter t are useless as intersection points in the positive direction of the ray, as can be ascertained from f1(t) in (a).
Figure 4.7 Unintersected bounding volumes: (a) curves of f(t), (b) bounding spheres sp1, sp2, sp3 and ray R
The center (axis of symmetry) of f1(t) is in the left half plane and its intersection with the y axis is positive. So when the coefficient pair satisfies $b_i \ge 0$ and $c_i > 0$, we know that the sphere bounding volume is behind the origin of the ray. We filter out the following two cases in a presort stage with simple tests of the quadratic function coefficients:
1. f(t) has no real root;
2. f(t) has two negative roots.
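The two presort tests above need only the coefficient pair and no square root. A minimal sketch, with the hypothetical name `passes_presort`:

```python
def passes_presort(b, c):
    """Keep a bounding sphere only if the ray can reach it for some t > 0."""
    if b * b - 4.0 * c < 0.0:
        return False        # case 1: negative discriminant, no real root
    if b >= 0.0 and c > 0.0:
        return False        # case 2: both roots negative, sphere behind the origin
    return True
```

The second test works because the sum of the roots is -b and their product is c: a positive product with a nonpositive sum means that both roots are negative.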
Case 1 : $c_1 > 0$ and $c_2 > 0$
Consider the two sphere bounding volumes in the image space shown in Figure 4.8(b). The ray R starts outside of both sphere bounding volumes. The related mathematical curves are shown in (a). All roots of these quadratic equations are positive real. This means that the two bounding volumes lie in the direction of the ray and that neither sphere bounding volume includes the origin of the ray P. The ray R starts at P and hits sp1 at A and sp2 at B. We define depth as the length from the origin of the ray P to the nearest ray intersection position on the sphere bounding volume. For example, the length PA is the depth of sp1 and the length PB is that of sp2. The related intersection values of the parameter t in Figure 4.8(a) are $t_A$ and $t_B$. These intersection values are proportional to the actual depths. The two bounding volumes can be overlapping or separated as seen in Figure 4.8(b). In either case we can compare the depth with respect to the ray path without using
Figure 4.8 Case 1: (a) curves of f1(t) and f2(t), (b) bounding spheres sp1 and sp2
the square root function. The Appendix gives details of the mathematical steps. Using a flow chart and state variables, we summarize these mathematical steps in Figure 4.9. The table in Figure 4.9(b) gives the selection condition for $t_1$ using only the state variables:
$x_1 x_3 + x_1 \bar{x}_4 + x_2 \bar{x}_3 x_4 = x_1 (x_3 + \bar{x}_4) + x_2 \bar{x}_3 x_4$ (4.11)
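The selection condition can be checked numerically against a direct comparison of the smallest roots. The sketch below is an illustration, not the thesis implementation: it assumes the state variables are the strict inequalities $x_1 = [b_2 < b_1]$, $x_2 = [4(c_1 - c_2) < b_1^2 - b_2^2]$, $x_3 = [b_1 b_2 < 2(c_1 + c_2)]$, $x_4 = [(b_2 c_1 - b_1 c_2)(b_1 - b_2) < (c_1 - c_2)^2]$, and that Eq. (4.11) reads $x_1(x_3 + \bar{x}_4) + x_2 \bar{x}_3 x_4$. The function names are hypothetical.

```python
import math

def state_variables(b1, c1, b2, c2):
    """Four comparisons on the coefficient pairs; no division, no square root."""
    x1 = b2 < b1
    x2 = 4.0 * (c1 - c2) < b1 * b1 - b2 * b2
    x3 = b1 * b2 < 2.0 * (c1 + c2)
    x4 = (b2 * c1 - b1 * c2) * (b1 - b2) < (c1 - c2) ** 2
    return x1, x2, x3, x4

def volume1_is_nearer(b1, c1, b2, c2):
    """Selection condition for Case 1 (both spheres ahead of the ray origin)."""
    x1, x2, x3, x4 = state_variables(b1, c1, b2, c2)
    return (x1 and (x3 or not x4)) or (x2 and not x3 and x4)

def smallest_root(b, c):
    """Reference value with a square root, used only to check the result."""
    return (-b - math.sqrt(b * b - 4.0 * c)) / 2.0
```

For example, a sphere entered at t = 1 and left at t = 3 has (b, c) = (-4, 3), and one entered at t = 2 and left at t = 5 has (b, c) = (-7, 10); the condition then selects the first volume. Equal entry depths are ties, and the condition makes no claim about them.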
Consider one more thing about Figure 4.8. sp1 contains a part of a triangle and sp2 also contains a part of the same triangle. When we perform the intersection test for objects in sp1, we get the intersection point I. But I is not inside sp1. How do we know whether an intersection point is in a bounding volume? We call this the "inside" test. If we know the intersection values of the bounding volumes ($t_A$, $t_B$, $t_C$, $t_D$ in this example), the comparison of these values ($t_A < t_I < t_C$ or $t_B < t_I < t_D$) gives the proper information. But we want to avoid using the square root function and use only the coefficients b, c to compare depths.
This comparison test indicated above does not work directly for objects enclosed in multiple spherical bounding volumes. Let's consider the meaning of fi(t). fi(t) = 0 means that ray R is on the surface of volume i at the parameter value t. fi(t) < 0 means that t is between two intersection values, so the ray R is inside of the volume i. fi(t) > 0 means that the ray R is outside of volume i. From this
Figure 4.9 State variables and Depth comparison: (a) flow chart (1 = yes, 0 = no), (b) selection table

x3x4 \ x1x2    00    01    11    10
00             t2    t2    t1    t1
01             t2    t1    t1    t2
11             t2    t2    t1    t1
10             t2    t2    t1    t1
criterion we know that the intersection happens in sp2 because $f_1(t_I) > 0$ and $f_2(t_I) < 0$, where
$f_i(t) = (t + b_i)\,t + c_i \quad (i = 1, 2)$ (4.12)
Even though we did not calculate the intersection values of the sphere bounding volumes with the ray, we can compare their depths and perform the inside test using only a pair of quadratic equation coefficients.
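The sign test on Eq. (4.12) can be sketched as follows. The function names `f` and `hit_is_inside` are hypothetical, and counting a point exactly on the sphere surface as inside is an assumption of this sketch.

```python
def f(t, b, c):
    """Eq. (4.12): one multiplication and two additions."""
    return (t + b) * t + c

def hit_is_inside(t_hit, b, c):
    """Inside test: the object hit at parameter t_hit lies in the bounding
    sphere with coefficients (b, c) when f is not positive there."""
    return f(t_hit, b, c) <= 0.0
```

For instance, with sp1 spanning t in [1, 3], so (b, c) = (-4, 3), and sp2 spanning [2, 5], so (b, c) = (-7, 10), an object hit at t = 4 gives f1 = 3 > 0 and f2 = -2 < 0: the hit belongs to sp2 and not to sp1.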
Case 2: c1 ≤ 0 and c2 ≤ 0
Each function has one positive and one negative root. This means that the ray starts in the region common to both sphere bounding volumes. Figure 4.10(b) shows this situation. We defined the depth as the length from the origin of the ray to the nearest intersection point on the bounding volume, and we apply that definition to this situation. The purpose of the depth definition is to find the order of the sphere bounding volumes with respect to the ray path. Adding this concept to the depth definition, we easily avoid the complicated selection equations shown in the previous case.
When we calculate the reflected color, the origin of the ray changes from the view point to the intersection point. If the intersection point is in the common part of volume 1 and volume 2, the secondary ray starts from that intersection point. In Figure 4.10 the depth of sp1 is less than that of sp2, but the ray will intersect both sp1 and sp2. We cannot determine an exact order for these two volumes, since both bounding volumes contain the origin of the ray. When we compare the depths of the volumes we may select either one, so we give the two volumes the same order by defining the depth to be zero when the sphere bounding volume contains the origin of the ray. The selection condition t1 is a don't-care.

Figure 4.10 Case 2: panels (a) and (b)
Case 3: c1 ≤ 0 and c2 > 0

Because function f1(t) has a negative root and a positive root, we can infer that the ray starts inside volume 1 in Figure 4.11. Function f2(t), however, has two positive roots. These two curves mean that sphere bounding volume 2 does not contain the origin of the ray, but bounding volume 1 does. The sphere bounding volumes can be overlapping or separated. In this figure, even though the ray meets volume 2 at tB, we cannot give higher priority to volume 2. Since volume 1 contains the origin of the ray, its depth is zero, so volume 1 must be selected in this comparison. The selection equation is

$I_3 = t_1 = 1$ (4.13)
Case 4: c1 > 0 and c2 ≤ 0

This is the opposite of the situation in case 3. Bounding volume 2, rather than volume 1, contains the ray starting point, so the selection equation must select volume 2. This situation is shown in Figure 4.12. The selection equation is

$I_4 = t_1 = 0$ (4.14)

Figure 4.11 Case 3: panels (a) and (b)

Figure 4.12 Case 4: panels (a) and (b)

If we define two more state variables y1 and y2, we can combine these four cases with respect to t1:

$y_1 = 1 \ (c_1 > 0), \quad y_1 = 0 \ (c_1 \le 0)$
$y_2 = 1 \ (c_2 > 0), \quad y_2 = 0 \ (c_2 \le 0)$

$t_1 = I_1 y_1 y_2 + I_3 \bar{y}_1 y_2 + I_4 y_1 \bar{y}_2 = I_1 y_1 y_2 + \bar{y}_1 y_2 = y_2 (I_1 y_1 + \bar{y}_1)$ (4.15)

If t1 = 1 then select f1, else select f2. Using only the coefficients b and c, we can compare the depths of two bounding volumes with respect to the ray path. Repeating this comparison, we can sort the depths with respect to the ray for any pixel or for any direction.
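The selection logic of equations (4.11) and (4.15) can be written directly as Boolean functions. The sketch below is illustrative: the function names are hypothetical, and the state variables x1 to x4 are taken as inputs because their definitions (Figure 4.9(a)) are not reproduced in this text.

```python
def select_case1(x1, x2, x3, x4):
    """Equation (4.11): I1 = x1 (x3 + not x4) + x2 (not x3) x4."""
    return bool((x1 and (x3 or not x4)) or (x2 and not x3 and x4))


def select_volume1(c1, c2, i1):
    """Equation (4.15): t1 = y2 (I1 y1 + not y1).

    y1 and y2 are 1 when the corresponding c coefficient is positive,
    i.e. when the ray origin is outside that volume. Returns True when
    volume 1 (f1) should be selected and False when volume 2 should be.
    """
    y1 = c1 > 0
    y2 = c2 > 0
    return bool(y2 and ((i1 and y1) or not y1))


# Case 3: the ray starts inside volume 1 only, so volume 1 is selected
# regardless of the value of I1.
print(select_volume1(c1=-1.0, c2=2.0, i1=False))
```

In case 2 (both c values non-positive) this expression evaluates to 0 and therefore picks volume 2, which is acceptable because that case is a don't-care.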
4.5.2 Sorting
This is the process of arranging the sphere bounding volumes with respect to depth. The arrangement of sphere bounding volumes is undertaken so that succeeding processes may find the nearest object from the ray origin with fewer intersection tests than is needed in the basic ray tracing algorithm. Even though we employ many spherical bounding volumes, only a few of them intersect with a given ray.
The sort is performed with the exchange algorithm [Lor 75]. The inputs of this sort are the number of intersected bounding volumes n and n sets of three numbers (the volume identifier and coefficients b, c) which represent a sphere bounding volume. This sort algorithm is presented in Figure 4.13.
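An exchange sort of this kind can be sketched as follows. This is an illustration under stated assumptions: each volume is a tuple (identifier, b, c), every volume in the input is already known to be intersected by the ray, and the comparator `nearer` uses a square root purely as a readable stand-in for the square-root-free comparison described in Section 4.5.1. All names are hypothetical.

```python
import math


def depth(volume):
    """Depth of a volume (identifier, b, c); zero if it contains the ray origin."""
    _, b, c = volume
    if c <= 0:
        return 0.0
    return (-b - math.sqrt(b * b - 4 * c)) / 2


def nearer(first, second):
    """Stand-in comparator: True if `first` is strictly nearer than `second`."""
    return depth(first) < depth(second)


def sort_volumes(volumes, nearer=nearer):
    """Exchange sort: for each position, select the nearest remaining
    volume by pairwise comparison and swap it into place."""
    ordered = list(volumes)
    for i in range(len(ordered) - 1):
        select = i
        for j in range(i + 1, len(ordered)):
            if nearer(ordered[j], ordered[select]):
                select = j
        ordered[i], ordered[select] = ordered[select], ordered[i]
    return ordered


volumes = [("far", -24.0, 140.0), ("near", -10.0, 21.0), ("origin", -2.0, -3.0)]
print([v[0] for v in sort_volumes(volumes)])
```

Because only a few bounding volumes intersect a given ray, the quadratic cost of this sort is applied to a short list.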
4.5.3 Ray Tracing Algorithm
Ray tracing is an algorithm that works entirely in object space. At a given point in the image plane, the visible surfaces are obtained by tracing a ray backwards from the eye through the imaging point into the scene. If this ray intersects an object, then local color calculations will determine the color that is the result of illumination at that point. This is light from the light sources directly reflected at the surface. If the object is partially reflective, partially transparent, or both, then the color of the point in the image plane should include a contribution from reflected and transmitted rays. These must be traced backwards to discover their origin, and hence the light they contribute. Determining a color for each of these rays may require the tracing of further rays and other intersections with objects. However, the ray tracing algorithm spends most of its time in the intersection calculations.
procedure SORT(sort, n)
  array sort[n];
  integer n;
begin
  do i = 1 to n
    select ← 0;
    temp[select].set ← sort[i];
    temp[select].id ← i;
    if (i = n) return(sort);
    do j = i+1 to n
      temp[!select].set ← sort[j];
      temp[!select].id ← j;
      select ← compare(temp[select], temp[!select]);
    enddo
    sort[temp[select].id] ← sort[i];
    sort[i] ← temp[select].set;
  enddo
end
Figure 4.13 Sort Algorithm

To relieve this bottleneck, the new algorithm partitions the object space and bounds the partitioned space using sphere bounding volumes at the ray tracing initialization stage. In calculating the intersection with an object, the new algorithm first performs a ray intersection with the sphere bounding volumes. After the intersection tests with the sphere bounding volumes, the new algorithm sorts the sphere bounding volumes intersected by the ray with respect to depth. After sorting the volumes, a ray intersection test is performed with the objects in the first sorted sphere bounding volume. Once the algorithm finds the nearest object from the ray origin, the intersection test is exited. If not, the ray intersection test continues with the objects in the following bounding volumes until the last bounding volume has been processed.
Figure 4.14 shows the basic ray tracing algorithm. The suggested new algorithm is presented in Figure 4.15. Trace, shade and intersect routines form the heart of the ray tracing algorithm. The trace routine is emphasized in Figure 4.15 because shade and intersect routines could be the same as in the basic ray tracing algorithm. Each of these algorithms works with a set of object primitives, often just a collection of triangular surface facets. We can define many kinds of primitives using simple mathematical equations. Some primitives can not be bounded by a finite number of sphere bounding volumes because they are infinite objects. For example the xy plane or a cylinder defined by its radius and orientation (without giving a length) can not be bounded by a finite number of spherical bounding volumes. If we consider such primitives, we need to employ two lists, one for sphere
procedure Raytrace (start, direction, depth, color)
  vector : start, direction
  integer : depth
  colors : color
begin
  vector : intersection_point, reflected_direction, transmitted_direction
  colors : local_color, reflected_color, transmitted_color
  color ← black;
  if (depth ≤ maxdepth)
  begin
    color ← background_color;
    [intersect ray with all objects and find intersection point (if any) that is closest to start of ray]
    if (intersection)
    begin
      local_color ← [contribution of local color model at intersection point]
      { Calculate direction of reflected ray }
      Raytrace (intersection_point, reflected_direction, depth+1, reflected_color)
      { Calculate direction of transmitted ray }
      Raytrace (intersection_point, transmitted_direction, depth+1, transmitted_color)
      Combine (color, local_color, local_weight_for_surface,
               reflected_color, reflected_weight_for_surface,
               transmitted_color, transmitted_weight_for_surface)
    end
  end
end
Figure 4.14 Ray Tracing Algorithm
procedure Raytrace (start, direction, depth, color)
  vector : start, direction
  integer : depth
  colors : color
begin
  vector : intersection_point, reflected_direction, transmitted_direction
  colors : local_color, reflected_color, transmitted_color
  if depth > maxdepth then color ← black
  else
  begin
    color ← background_color;
    { intersect ray with all sphere bounding volumes }
    if (bounding volume intersection)
    begin
      { sort n sphere bounding volumes which are intersected with ray }
      do i = 1 to n
        { 1. intersect ray with all objects in the i-th depth's sphere bounding volume
          2. find intersection point (if any) that is nearest to origin of ray
          3. check inside test for nearest point whether it is true or not
          4. update the nearest point if item #3 is true
          5. quit do loop if nearest point found in the i-th depth's sphere bounding volume }
      enddo
    end
    [intersect ray with all objects in infinite object list and update intersection point (if any) that is nearest to origin of ray]
    if (intersection)
    begin
      local_color ← [contribution of local color model at intersection point]
      { Calculate direction of reflected ray }
      Raytrace (intersection_point, reflected_direction, depth+1, reflected_color)
      { Calculate direction of transmitted ray }
      Raytrace (intersection_point, transmitted_direction, depth+1, transmitted_color)
      Combine (color, local_color, local_weight_for_surface,
               reflected_color, reflected_weight_for_surface,
               transmitted_color, transmitted_weight_for_surface)
    end
  end
end
Figure 4.15 New Algorithm
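The nearest-hit search at the core of Figure 4.15 could look roughly like the sketch below. It is a simplified illustration, not the dissertation's implementation: the object primitives are assumed to be spheres, the ray direction is assumed to have unit length, the class and function names are hypothetical, and the depth ordering uses a square root as a stand-in for the coefficient-based comparison of Section 4.5.1. The infinite object list, shading and recursion are omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Sphere:
    center: tuple
    radius: float


@dataclass
class Volume:
    bound: Sphere                      # sphere bounding volume
    objects: list = field(default_factory=list)


def coefficients(origin, direction, sphere):
    """b and c of t^2 + b*t + c = 0 for a unit-length direction."""
    oc = [o - s for o, s in zip(origin, sphere.center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - sphere.radius ** 2
    return b, c


def roots(b, c):
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    s = math.sqrt(disc)
    return (-b - s) / 2.0, (-b + s) / 2.0


def nearest_hit(origin, direction, volumes):
    """Return ((t, object) or None, number of object intersection tests)."""
    hit = []
    for volume in volumes:
        b, c = coefficients(origin, direction, volume.bound)
        r = roots(b, c)
        if r is None or r[1] < 0.0:    # missed, or entirely behind the origin
            continue
        depth = 0.0 if c <= 0.0 else r[0]   # zero depth if origin is inside
        hit.append((depth, b, c, volume))
    hit.sort(key=lambda entry: entry[0])    # stand-in for the depth sorter
    tests = 0
    for _, b, c, volume in hit:
        best = None
        for obj in volume.objects:
            tests += 1
            r = roots(*coefficients(origin, direction, obj))
            if r is None:
                continue
            t = r[0] if r[0] > 0.0 else r[1]
            if t <= 0.0:
                continue
            if (t + b) * t + c <= 0.0 and (best is None or t < best[0]):
                best = (t, obj)        # passed the inside test (4.12)
        if best is not None:
            return best, tests         # quit at the first volume with a hit
    return None, tests


a = Sphere((0.0, 0.0, 5.0), 1.0)
far = Sphere((0.0, 0.0, 12.0), 1.0)
scene = [Volume(Sphere((0.0, 0.0, 12.0), 2.0), [far]),
         Volume(Sphere((0.0, 0.0, 5.0), 2.0), [a])]
print(nearest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene))
```

In this example only one object intersection test is needed, because the hit in the nearest volume passes the inside test and the loop exits.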

Full Text 
ON SPEEDUP PROCEDURES IN RAY TRACING
BY
JUN LEE
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1994
ACKNOWLEDGMENTS

I would like to express my appreciation to my advisor and supervisory committee chairman, Dr. John Staudhammer, for the guidance and encouragement he provided me on this project. I am also grateful to the other members of my supervisory committee, Dr. Panos E. Livadas, Dr. Jack R. Smith, Dr. A. Antonio Arroyo, and Dr. Paul W. Chun, for their encouragement and their commitment.

I wish to express my sincere gratitude to the following organizations for providing resources which made this research possible. IBM Corporation loaned an RS6000 computer system for this research. Financial support was provided by the Republic of Korea Air Force.

Finally, I express my sincere appreciation to my family for their dedicated support through all phases of my life, especially at Gainesville, Florida, in the USA.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
CHAPTERS
1. INTRODUCTION
  1.1 Motivation
  1.2 Problem Definition
  1.3 Overview of Dissertation
2. THE RAY TRACING PROCEDURE
  2.1 Background
  2.2 The Ray Tracing Process
  2.3 Uniprocessor Implementation
  2.4 Possible Bottlenecks
  2.5 Possible Speedups
    2.5.1 Image Clipping
    2.5.2 Simplified Sorting
    2.5.3 Parallelization
  2.6 Summary
3. FAST RAY TRACING ALGORITHMS
  3.1 Introduction
  3.2 Bounding Volume Algorithms
  3.3 Hierarchical Bounding Volume Algorithm
  3.4 Nonuniform Spatial Subdivision Algorithm
  3.5 Uniform Spatial Subdivision Algorithm
  3.6 Inside Test
4. DEPTH SORTER
  4.1 Introduction
  4.2 General Depth Comparator
  4.3 Possible Problems with the New Bounding Volume
    4.3.1 Box Bounding Volumes
    4.3.2 Sphere Bounding Volumes
  4.4 Bounding of Objects
  4.5 Algorithm
    4.5.1 Filtering and Comparison
    4.5.2 Sorting
    4.5.3 Ray Tracing Algorithm
  4.6 Data Structure
    4.6.1 Primitives
    4.6.2 Link List
    4.6.3 Sorting
  4.7 Hardware Considerations
    4.7.1 Introduction
    4.7.2 The Ray Intersector
    4.7.3 The Depth Sorter
5. EXPERIMENTAL RESULTS
6. CONCLUSION
  6.1 Summary
  6.2 Remarks on the New Algorithm
  6.3 Recommendations for Future Research
REFERENCES
APPENDIX
BIOGRAPHICAL SKETCH
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ON SPEEDUP PROCEDURES IN RAY TRACING
By
JUN LEE
December 1994

Chairman: Dr. John Staudhammer
Major Department: Electrical Engineering

This study critically examines the ray tracing process used in the generation of high-complexity images in computer graphics and provides design parameters for hardware which will alleviate bottlenecks inherent in the ray tracing procedure. A ray tracing algorithm is developed, and bottleneck points in the ray tracing algorithm are identified which can be eliminated by hardware implementation.

To develop the new algorithm, the traditional fast algorithms are studied. By combining the strengths and considering the weak points of various algorithms, a new procedure is proposed that eliminates the inherent limitations of the basic ray tracing process. The new algorithm employs sphere bounding volumes to reduce the number of ray-object intersections in the basic ray tracing algorithm. Traditional bounding volumes are used to bound objects. Sphere bounding volumes in the new algorithm are used to bound subspaces which could contain whole objects or parts of objects. The sphere bounding volumes are sorted with respect to ray direction for each ray. Traditional sorting of sphere bounding volumes needs to calculate a square root. To avoid square root calculations, we develop a comparison algorithm which uses the coefficients of quadratic equations for sorting bounding volumes.

In traditional ray tracing algorithms the greatest computational load arises from the calculation of ray-object intersections. In the new algorithm, ray-object intersection tests start from the nearest bounding volume. If the ray hits an object in the bounding volume, the intersection test is terminated. If not, an intersection test is performed on the next nearest bounding volume. Since the new bounding volume is established in the image space, not in object space, we must check whether the intersection point is in the bounding volume when a ray hits an object. For this test we develop a simple procedure using the coefficients of bounding volumes.

The performance of the new algorithm is verified with computer simulations. We compare the two outputs produced by each algorithm (the traditional ray tracing algorithm and the new algorithm) and show a substantial reduction in overall ray tracing calculation time. Characteristics of hardware modules are developed which can further reduce the image rendering time.
CHAPTER 1
INTRODUCTION

1.1 Motivation

The quest for visual realism continues to be a major research area in computer graphics. The thrust is to achieve images indistinguishable from a look at a real scene, i.e. we desire to effect a visual environment of artificial reality. Efforts continue to devise techniques that can ever more faithfully account for visual effects in computer-produced images. Concurrent with the search for more realistic image detail calculations is a search for more effective computational techniques for well-understood basic techniques for image calculation. Computer graphics developers are continually looking for computationally economical techniques to simulate virtual reality.

As computers have become more powerful and graphical hardware I/O devices more prevalent, photo realism has been achieved. Photo realism is achieved by painting on the display surface an image that focusses onto the retina of an observer a picture that would normally be produced by a natural environment. The basic underlying technique is to simulate, as far as possible within the constraints of the resolution imposed by the display hardware, a view of the natural scene.
The basic technique is to put on the screen image values that are those produced from the natural scene by the rays that ultimately are focussed on the retina. The real-world environment produces a plethora of rays, scattering light in every direction, and only a tiny fraction ever finds its way to an observer's eye, thus producing a directional view of the real scene. To calculate all the rays in the real world is wasteful for producing a computer-displayed image: that image is formed by only a tiny part of the infinity of rays emanating from the real scene, and all the other views are not observable from the observer's viewpoint. The technique of calculating all rays is termed forward ray tracing; however, the visible scene, made up of the rays entering the observer's retina, requires far fewer rays to be considered. These rays are the ones entering the eye, and their production from elements of the real-world scene can be mimicked by following these rays from their destination (a spot on the retina) to the sources of the light whence they came. This technique is called reverse ray tracing, often simply termed "ray tracing".

Ray tracing is an image rendering method which processes each pixel in turn and finds the surface point in the three-dimensional scene, of which a view is being presented, that determines the pixel's intensity and color. The image is not painted on the retina, but rather on a display screen, which in turn is focussed by the observer. On the screen we paint a set of pixels, which depict the desired view of the real world. The
ray tracing method is based on following rays from the viewpoint through each pixel until the rays meet a surface of an object. It is the coloring of that surface point that is painted as the color of that pixel. The ray tracing algorithm itself allows the incorporation of many visual effects in a straightforward manner. Adding the same effects into other three-dimensional computer graphics techniques is much more difficult, if not impossible [Lin 92].

The technique of ray tracing resulted from the endless pursuit of photo realism. Ray tracing produces high quality images, at a high computational cost. One of the biggest costs is the calculation of the visible object element at each pixel location. The algorithm must find the nearest object point from the location of the view point. Therefore the heart of any ray tracing package is the set of ray intersection routines. No matter what kind of techniques are applied to ray tracing, there is always the need to find the intersection point of a ray and an object. The basic ray tracing algorithm is

for (each pixel on the display)
    for (each object)
        find the nearest surface point
    retain the nearest surface point
    calculate the color of that point
For example, if the screen resolution is 1000 x 1000 and if there are 100 objects in the scene, the basic algorithm will require at least 100 million ray intersection calculations, of which only one per pixel is used to calculate the picture coloring. If the objects themselves are defined by complex methods, each intersection calculation will also take a considerable amount of time. Even the largest supercomputer would find this computing requirement hard to satisfy within a reasonable running time. Since the determination of each pixel color does not depend on the other pixels, parallel processing is possible pixel by pixel. But because the intersection calculation of some pixels takes a very long time, real-time interactive simulation is impossible for realistic image scenes.

To get the high quality picture desired for virtual reality applications using the ray tracing algorithm, it is critically important to avoid wasting computation time on checking the ray against objects that have no intersection and which can be trivially eliminated. The reduction of the number of intersection calculations may be done by many software optimization methods. All of those methods depend on eliminating those unintersected objects.

Still, one of the greatest challenges of ray tracing is efficient execution. Despite its impressive image rendering capability, ray tracing is often dismissed as being too computationally exorbitant to be useful. Therefore efficiency
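The cost of the basic algorithm can be made concrete with a small brute-force sketch. It is an illustration only: the objects are assumed to be spheres, the rays are assumed to be parallel to the z axis (one through each pixel center), and all names are hypothetical. The loop performs one intersection test per pixel per object, which is where the figure of 1000 x 1000 x 100 = 100 million tests comes from.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Near root of the ray-sphere quadratic if it is positive, else None.

    The direction is assumed to have unit length.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None


def render_counts(width, height, spheres):
    """Basic algorithm: test every object for every pixel, keep the nearest."""
    tests = 0
    nearest = {}
    for py in range(height):
        for px in range(width):
            origin = (px + 0.5, py + 0.5, 0.0)
            direction = (0.0, 0.0, 1.0)
            best = None
            for index, (center, radius) in enumerate(spheres):
                tests += 1
                t = hit_sphere(origin, direction, center, radius)
                if t is not None and (best is None or t < best[0]):
                    best = (t, index)
            nearest[(px, py)] = best
    return tests, nearest


spheres = [((2.0, 2.0, 5.0), 1.5), ((2.0, 2.0, 10.0), 1.5)]
tests, nearest = render_counts(4, 4, spheres)
print(tests)                 # 4 * 4 pixels * 2 objects
print(1000 * 1000 * 100)     # the example in the text
```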
is a critical issue and has been the focus of much research from the beginning. This has led to many creative approaches. Decreasing computing time can be achieved both by software improvements and by hardware additions.

1.2 Problem Definition

The reduction of the number of intersection calculations may be done by the following four approaches.
1. Bounding Volume Method
2. Hierarchical Bounding Volume Method
3. Uniform Spatial Subdivision Method
4. Non-Uniform Spatial Subdivision Method

Each approach has its own advantages and disadvantages. Because the intersection test is simple and no hierarchies are required, the sphere bounding volume algorithm is the simplest one. Here objects are bounded by a sphere extending to the objects' maximal extent in the image space. One needs to check only whether a bounding volume lies in the pixel's position. This simple test can be used to reduce the number of candidate objects that need be considered for the full intersection calculation. Only those objects that meet the location test are then considered further.

Consider the case shown in Figure 1.1. Here a long thin object, say a slender rod, is being rendered. Since the bounding volume has a characteristic of an easy intersection
Figure 1.1 Worst Case of Bounding Volume Algorithm
test with the ray, the possible bounding volume could be a sphere. When we apply the sphere to this object as a bounding volume, most of the rays which intersect the bounding volume will in fact not intersect the object. Thus many of the calculations will be for naught. In fact, using the bounding volume approach may easily INCREASE the number of required operations.

Let us consider the other case, shown in Figure 1.2. Here the bounding volume encloses several objects. In this case most of the rays which intersect the bounding volume will intersect one or a few objects. The sphere bounding volume in Figure 1.2 shows the main idea of the bounding volume method. The critical question is how to apply the sphere bounding volume to every environment in the image space, so as to achieve the best efficiency as shown in Figure 1.2. The goal of the present work is to address this problem.

The specific objectives of this study are as follows:
1. Analyze the ray tracing process to assess the computing requirements in its various phases.
2. Develop a new speedup procedure for ray tracing.
3. Implement this approach.
4. Compare the efficiency of the proposed approach experimentally with that of the original ray tracing algorithm.
Figure 1.2 Best Case of Bounding Volume Algorithm
1.3 Overview of the Dissertation

The dissertation consists of six chapters. Chapter 1 explains the uses of ray tracing and discusses the computational problems inherent in it. Chapter 2 critically reviews the ray tracing process. Chapter 3 discusses the traditional fast algorithms used in implementing the ray tracing procedure. The main problems of each fast algorithm are also summarized. A new fast algorithm for ray tracing is presented in Chapter 4. Results of simulations which were used to verify the performance of the new fast algorithm are presented in Chapter 5. The final chapter summarizes this study and suggests directions for future research.
CHAPTER 2
THE RAY TRACING PROCEDURE

2.1 Background

The objective of the ray tracing process is to calculate an image that is a faithful reproduction of a scene, be it natural or an imagined one. The test of a well-rendered ray traced image is in the fidelity with which a natural scene can be rendered from a geometric specification of objects in the scene, the surface properties of those objects, and a description of the illumination of the scene. The image is displayed on a workstation screen and is viewed by an observer. The image of a natural scene seen by the observer should be indistinguishable from a view of a natural scene. It is therefore important to review just what the eye can see.

Normal color vision perceives images in many colors and with a high degree of image fineness (acuity). There are basically two parts of the field of vision: peripheral and foveal. It is the foveal vision that has a high degree of spatial precision; peripheral vision is less spatially acute, but has the ability of detecting motion (without detail) and conveys to the user a sense of presence in the scene. Foveal acuity and color resolution have been studied in detail [Sou 61]; peripheral vision is less acute and less discriminating
in color. Foveal vision exists in the visual center of a view field, and subtends only about one degree of arc. The view field for a normal eye is almost 180° left-to-right and almost 180° top-down. Normal human vision is binocular: the left and the right eyes perceive slightly different images. The central part of the visual field is common to both eyes, resulting in an overlapped binocular field of some 150° high and about 120° laterally. The visual cortex, the image receptor in the human observer, derives three-dimensional information from the parallax between the two views [Gra 63].

The spacing of receptors in the foveal region is about 2 to 3 μm, and the focal distance of the eye is 15 mm. This results in a physiological resolution of about 0.5 to 0.7 minutes of arc. The eye in fact can resolve even finer details on structured targets [Lux 68].

In forming an image on its receptor surface, the retina, the human eye is basically an optical device. Every optical device suffers from chromatic aberration. The eye will focus on green-yellow light, making the red-light focus slightly behind the retina, and the blue-light focal plane slightly in front of the retina. Most measurements of visual acuity, image flicker rate and similar data use white light with implicit color aberrations.

The central-field visual acuity varies with the type of image. Astronomers, before the availability of telescopes,
depended heavily on acute eyesight; a job requirement was exceptionally keen vision, so that they could tell stars apart on a dark background. The normal discrimination for this task is about 2.9 x 10^-4 radians (about 1 minute of arc); for long lines on a self-luminous background (spider webs lit from behind), the minimum visual angle is about 8 arc-seconds [Lux 68].

The distribution of visual receptors is not uniform in the human eye. The density decreases markedly toward the periphery. Rods (light intensity detectors) number 110 to 130 million, and cones (color detectors) 3 to 7 million. Hence if it were possible to focus a computer-produced image directly on the visual detectors of an observer's eye, one should be able to evoke a full monocular image consistent with the real-world environs with about 100 million picture elements. Since the "normal" workstation display has about 1 million pixels, technology is within two orders of magnitude of producing images that a human observer would not be able to distinguish, at least in image acuity, from a view of the real world.

Unfortunately the eye is in constant small motion when a view is perceived. This small motion is a tiny random, or nearly random, motion, the saccadic motion of the eye. We should note that during saccadic eye motion the observer (at least the human observer) is blind, else a motion blur would interfere with the image being "seen", i.e. processed for understanding. The mechanism of this brief visual blackout is
not well understood at this time, but indicates a highly complex interaction between the pure visual processing centers in the brain and the eye motion control centers. In this eye motion, apparently the vision detectors are "scanned" relative to the stationary image; image data processing in the nerve net underlying the receptors allows the eye to detect features in an image which are somewhat finer than the spacing of rods in the retina.

Saccadic motion requires that a picture be rendered on the computer display with as fine a resolution as the best eye resolving power, i.e. about 1/10 minute of arc in the areas that the eye might be roving over during normal vision. At the most this will be a total view angle of 180°. Hence the number of pixels required will be about 180 x 60 x 10 ≈ 100,000 in each of the horizontal and vertical directions for a hemispheric panoramic display. This would require 10^10 pixels, about four orders of magnitude higher than contemporary "high resolution" workstation displays. Thus even with the best of current interactive display devices we can produce only a defocussed approximation of a real-world image.

In general, color is perceived from the stimulation of various color receptors in the light detection organ of the perceiver; these are vastly different in various living organisms [Cro 94]. Hence the sensation of color is a species-dependent phenomenon. In some creatures there is a highly developed color detection system favoring blue colors, as in
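The pixel count quoted here can be checked with a short worked calculation. It assumes that the factors in the product 180 x 60 x 10 stand for 180 degrees of view angle, 60 minutes of arc per degree, and ten samples per minute of arc, and that a workstation display has about 10^6 pixels as stated earlier in the chapter.

```latex
\begin{align*}
N_{\text{side}}   &= 180 \times 60 \times 10 = 108{,}000 \approx 10^{5}
                     \quad \text{pixels per direction} \\
N_{\text{pixels}} &\approx \left(10^{5}\right)^{2} = 10^{10} \\
\frac{N_{\text{pixels}}}{N_{\text{workstation}}}
                  &\approx \frac{10^{10}}{10^{6}} = 10^{4}
                     \quad \text{(four orders of magnitude)}
\end{align*}
```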
fish. Thus when we speak of color, we really need to specify what organism we use as a referent. Clearly, in computer graphics, we mean "color" to be a human experience.

Humans perceive color from three primary color receptors, which have relatively broad frequency responses that overlap in the visible spectrum. Each has a primary response, in the Red (590 nm), the Green (500 nm) and the Blue (470 nm) [Cro 94]. The combination of the responses evokes the recognition of color. This is the basis of the three-color (tristimulus) system of the common TV, which defines the standard colors used in a normal workstation display. Any color shown is a mixture of the three primary components. In generating a color image, three primary color components must be produced for each display pixel.

The color resolution of the human visual system is usually measured in terms of color saturation and color purity. Saturation measures the number of steps that are perceived between a "pure" color (such as a spectral color) and white (composed of equal parts of the three Red-Green-Blue primaries). Such a measurement shows that the human can distinguish around a hundred different levels of saturation [Gra 63]. The color purity discrimination is a bit sharper; a change of wavelength of around 3 nm can be perceived in the yellow-green part of the spectrum (about 550 nm). Hence the color purity is more critical, amounting to about ½ % in color
fidelity [Moo 61]. Thus specifying color information to an 8-bit accuracy is consistent with human color perception; however, for precise applications, when the nonlinear characteristics of the phosphors in the display surface need to be considered, a primary color specification of at least ten bits is necessary [Mar 82].

Thus the color display that is to produce an image of the real world should show a pair of images, one for each eye; each display should subtend a visual angle of some 180°, each should have about 10^10 pixels, and each pixel should be capable of displaying three primary colors with a color resolution of about 10 bits. Clearly we do not now have affordable technology to approach these numbers; currently we are 4 to 5 orders of magnitude away from them.

One other aspect of a monocular view of a natural scene is the inherent depth information in the wavefront. An observer can easily focus on near objects, or far ones, thus deriving some depth information. This is easily demonstrated in a view out of a window: the observer can easily ignore (i.e. NOT focus on) smudges on the glass pane, but sharply observe distant scenery. The image which one normally produces on a workstation display surface lacks the focussing feature. Normally a fixed-focus distance (usually infinite) is used for the production of a scene. Such an image is then painted to a screen at a fixed distance from the observer. In this respect a computer-produced image is yet another approximation
to a natural scene. This refinement in imaging detail is usually ignored in artificially produced imaging.

Therefore, at best, current technology can produce only an approximation, notably of lower spatial resolution, but close to real-life colors. Most of the work in this dissertation will deal with workstation displays, with a resolution of about 1000 x 1000 pixels. The visual image normally subtends about 30° to 60°.

To maintain the illusion of stationary, or slowly moving, images, an image sequence is painted on the display surface. The human visual cortex will fuse sequences of images and evoke the sensation of a stationary or smoothly moving image, without the appearance of image flicker, if sequential images are repeated at a high enough rate. This image fusion frequency depends on the overall image brightness. For a darkened room, such as a movie auditorium, a frequency of 24/second is adequate. For a dimly-lit daytime room, normal for television viewing, the rate rises to around 30/second; and for normal daytime brightness the rate may exceed 60/second. Current high-performance workstations use display refresh rates of about 60 to 70/second.

We would, of course, like to have our display image calculated AND displayed at the maximum rate required by these physiological demands. Thus we need to contemplate the calculation of image data at a rate of at least 30 megapixels per second, each pixel having Red, Green and Blue components.
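The rate of 30 megapixels per second follows from the figures given in this section, assuming a 1000 x 1000 pixel display recomputed at the 30 frames per second quoted for television-like viewing conditions:

```latex
\[
10^{3} \times 10^{3}\ \tfrac{\text{pixels}}{\text{frame}}
\times 30\ \tfrac{\text{frames}}{\text{s}}
= 3 \times 10^{7}\ \tfrac{\text{pixels}}{\text{s}}
\]
```

At the 60 to 70 frames per second used by workstation displays, the same calculation would give roughly twice this rate.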
The last item we need to discuss is just what should be on each pixel. In the natural world each receptor area will receive light from a small cone segment, a surface patch of about 1/10 arcminute across. Each of these patches is the sum of all light rays that are focussed on that small area. In calculating an artificial image of a real scene, we need to remember that each image pixel is a small-area integral that we need to calculate. We seldom have the luxury of integrating over angular cones, however small; we normally sample the scene at one (or a few) point(s) for each pixel. This is the major cause of picture aliasing: the image elements are defined with higher resolution than it is possible to paint on the display surface. Antialiasing techniques, not a subject explored in this work, deal with refining the coloring information in an attempt to account for the lack of spatial display ability.

Thus the computational requirements for rendering even a modest scene may easily exceed any reasonable computing resources. It is therefore very important to look at procedures that ease the image computation task. Two promising major avenues of approach are:

1. Ways of parallelizing the computation task;
2. Algorithmic approaches to identify the computational bottlenecks and find effective speedup procedures.
This dissertation focusses on the second of these approaches. We will introduce a process for speeding up the basic ray tracing process. We will use a uniprocessor for accomplishing the basic ray tracing task, and will use the same processor to implement the speedup. Certainly the process can be adapted to parallel computations, thus conveying an added benefit to inherently faster computational complexes.

2.2 The Ray Tracing Process

In a natural environment all rays that make up a visible scene in the observer's eye start at various light sources, and are then scattered, reflected and directed into the pupil of the observer's eye. Clearly, most of the light is lost to the observer; those parts of the scene image are not observable from the location of the observer. To mimic this process exactly on a computer, i.e. to trace all light rays emanating from the scene, would require the calculation of a great many light paths which would then not be utilized in forming an observer's image. This wasteful process is known as Direct Ray Tracing; the procedure models the action of light sources on the scene to produce all observable views of it. The procedure is not practical.

For reasons of economy in calculation it is necessary to compute only those rays that in fact will be used in forming the ultimate display image. Hence one starts with the
rays at the image plane and traces the rays backwards to their sources. The process is known as Reverse Ray Tracing. It avoids the computational inefficiency inherent in Direct Ray Tracing. It is the Reverse Ray Tracing process which is commonly known as Ray Tracing; it is the process referred to as Ray Tracing in this work.

As a further concession to practicality, the scene focussing is replaced by a simple pinhole camera, rather than getting involved in optical aperture complications [Pot 82]. Hence the image to be produced will be a pinhole camera view of a scene, thus with a fixed focus, obtained by tracing the rays that make up the image back to the light sources and accounting for interactions of objects in the scene with the rays.

The rays that make up the image are taken to be one for each pixel that appears on the final image rendering. Hence for a VGA image, one would need 480 rows x 640 picture elements = 307,200 pixels (i.e. rays). In a "normal workstation" there would be about 1000 rows and 1250 pixels/row, or about 1.2 million pixels. For each of these rays we need to calculate where it intersects the nearest object in the scene. It is not uncommon for a scene to contain several hundred objects. The process basically is:

For each ray:
    determine where the ray intersects each object;
    determine the color at the nearest of these points;
    paint this color to the pixel;
Display the whole image.
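The loop outlined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the program used in this work: the scene is assumed to contain only spheres, each sphere carries a single flat color in place of a rendering calculation, and the names intersect_sphere and trace_image are hypothetical.

```python
import math


def intersect_sphere(origin, direction, center, radius):
    """Return the smallest positive ray parameter t of a hit, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if t > 1e-9:
            return t
    return None  # the sphere lies behind the ray origin


def trace_image(spheres, width, height, background=(0, 0, 0)):
    """Exhaustive ray tracing: one ray per pixel, every object tested per ray."""
    eye = (0.0, 0.0, 0.0)  # pinhole position (assumed)
    image, tests = [], 0
    for row in range(height):
        scanline = []
        for col in range(width):
            # Image plane at z = -1, spanning [-1, 1] in x and y (assumed).
            x = 2.0 * (col + 0.5) / width - 1.0
            y = 1.0 - 2.0 * (row + 0.5) / height
            direction = (x, y, -1.0)
            nearest_t, color = None, background
            for center, radius, sphere_color in spheres:
                tests += 1
                t = intersect_sphere(eye, direction, center, radius)
                if t is not None and (nearest_t is None or t < nearest_t):
                    nearest_t, color = t, sphere_color
            scanline.append(color)  # "paint this color to the pixel"
        image.append(scanline)
    return image, tests


scene = [((0.0, 0.0, -3.0), 1.0, (255, 0, 0)),
         ((0.0, 0.0, -6.0), 2.5, (0, 0, 255))]
image, tests = trace_image(scene, 8, 6)
print(tests)  # width * height * number of objects
```

The counter tests makes the cost visible: every ray is tested against every object, so the number of intersection tests is the pixel count multiplied by the number of objects.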
If the scene consists of N objects, the number of intersection calculations for a "simple VGA" image is basically 300,000 x N; for a workstation image, there may be four times as many calculations. Each of these calculations may require the determination of the intersection point of a ray with a surface, which may be a curved surface in three dimensions. These calculations are involved and may require many thousands of machine cycles [Bli 80].

Once the intersection point is determined, calculation of the color of that point may involve a great many additional calculations. These are called the rendering calculations; they determine the color reflected or emitted at the point where the viewing ray intersects the nearest object. Depending on the nature of the visible surface point, this calculation may be simple or very complex, especially if that visible point reflects light, or is perhaps on a light-refracting surface. For reflected and refracted light, secondary rays need to be examined which result from the light reflection/refraction properties of that particular surface point. Conceptually, one needs to determine the surface normal relative to the incident ray, and construct rays that further follow the imaged ray, all the way to the source of the light that produces the surface coloring. There may be a number of secondary rays; their intersections with the objects in the scene then need to be determined. This calculation is
basically the same as the calculation of the ray emanating from the display plane; however, since there may be many such secondary rays, the amount of calculation may become quite large.

It is the objective of this dissertation to examine the ray-tracing image production process, to identify those of its steps that are computation intensive, and to offer palliative measures for the computational bottlenecks.

2.3 Uniprocessor Implementation

The fundamental ray tracing program has three basic parts: input of scene and viewing parameters, calculation of the visible picture elements, and production of the visible image to a display device or a storage medium. There is considerable interaction between the three segments, especially when the image needs to be produced in real time, as with dynamic virtual reality environments. However, the conceptual program flow can be considered to be the sequential execution of the three parts listed above. Together with the computational tasks, the fundamental ray tracing program is diagrammed in Figure 2.1 [Hec 89].

Since the objective of this dissertation is the examination of the computational loads for the ray tracing process, we will concentrate on the second phase of the program and omit critical examination of the initialization, input and output tasks. Furthermore, we assume that the
Initialize storage, input and output files
Input object geometry data, object surface properties, view parameters
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color
[Figure 2.1: fundamental ray tracing program; the remainder of the listing is missing from the source text]
computational task is not further burdened with data access limitations, i.e. there is sufficient memory to hold all required data for the various calculations.

We note from Figure 2.3 that the entire computational task is dominated by the intersection test, which determines the intersection point of a given ray with the nearest object. Fundamentally, this intersection test must be performed for all objects in the scene, but only one object will be retained for further calculation of reflected or refracted (transmitted) light.

We note that for a uniprocessor implementation the various computational tasks are performed sequentially. For timing the intersection calculations we need to ensure that the various other sections are excluded from the timing measurements. We also note that, in establishing the coloring of an image pixel, no shading calculations can be made until the visible surface element is found.

2.4 Possible Bottlenecks

From an analysis of the algorithm we note that the predominant computational bottleneck is in the intersection calculations, with secondary choke points in calculating reflected and transmitted (i.e. refracted) rays, as well as in calculating the coloring of a surface element. If the surfaces of the objects are curved, then the calculation of surface normals, needed for establishing the apparent surface
coloring, can become computationally complex. These secondary effects have been studied extensively [Bli 80] and have been the subject of extensive research [Whi 80]. While we do not wish to minimize these computational tasks, we note that the first choke point is finding the nearest object surface element, which is needed for any further coloring calculations.

The problem is basically finding the intersection points of all objects with a ray, and then finding that point which is nearest to the origin of the ray. This is implicitly a sorting process; however, we can make the list of elements to be sorted very short if we find a computationally efficient way of ignoring all those objects which cannot intersect a given ray (because they are located far away from the ray as it radiates into the object scene from the position of the observer). The following chapters outline such a method.

2.5 Possible Speedups

From an examination of the basic ray tracing process, there are several parts of the process which can easily lead to computational speedups. These are image clipping, shortcuts in the object sorting process, and parallelization. We briefly discuss each of these in the following paragraphs.

2.5.1 Image Clipping

Since basic ray tracing examines each object of the natural scene to determine which one is closest to the
viewpoint, the viewpoint can be located anywhere, including inside the object assembly that makes up the final scene. This is merely a computerized version of building a scene, for example a movie set, and moving a camera toward, around and into it. At various positions along the path of the camera, images are constructed, which become the sequence of frames that define a dynamic image, an animated image sequence. There is no need to discard any of the objects that make up the scene, even if those parts are invisible to the viewer, such as elements of the scene that may become located behind the observer.

One could cull the objects and retain only those in the general direction of the view, and thereby greatly reduce the required calculations. Various image clipping algorithms exist which can provide a list of objects, and parts of the original objects, which will appear in the final image. Some objects will be only partly in the final scene; these elements must be clipped and the parts outside of the image extent must be discarded. Since the objects in the image may be anything, the clipping must be able to handle arbitrary primitives. A clipping algorithm which is so general is itself a complicated procedure and may be computationally expensive. For general object primitives there is no solution available, short of a variation of the ray tracing algorithm itself. Hence the search for such a panacea is likely to be fruitless.
If we restrict the primitives to objects having polygonal facets only, the Weiler-Atherton algorithm will work well [Wei 77]. This algorithm will determine the visible parts of all objects defined by finite polygonal or infinite planar facets. The procedure is complex and computationally expensive.

However, a much simpler approach is more promising. We note that the visibility of the objects will be determined by the ray tracing process itself; we merely would like to reduce the number of elements outside of the viewing frustum that need to be tested. We simply discard entire objects which have all of their vertices outside of the viewing image, beyond one and the same edge. This requires a minor modification: each test for being outside the top/left/bottom/right of the view is applied, in view coordinates, to the lowest/rightmost/highest/leftmost vertex of each object, respectively. These tests can be applied when the objects are placed into the view coordinate system: for each object the indicated tests are performed at the end of that object's coordinate transformation.

Thus there is a relatively simple test to reduce the number of candidate objects for the ray intersection calculations. This step alone can noticeably reduce the total amount of required calculations. We assume that such a preliminary pruning is done on the original object list, and that the list of objects has already been reduced.
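The four vertex tests described above can be sketched as follows. This is an illustration only: it assumes each object is given as a list of (x, y) vertices already transformed into view coordinates with y increasing upward, it ignores depth (objects behind the observer), and the names outside_view and prune are hypothetical.

```python
def outside_view(vertices, left, right, bottom, top):
    """True if all vertices lie beyond one and the same edge of the view."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(ys) > top         # lowest vertex is above the top edge
            or max(xs) < left     # rightmost vertex is left of the left edge
            or max(ys) < bottom   # highest vertex is below the bottom edge
            or min(xs) > right)   # leftmost vertex is right of the right edge


def prune(objects, left, right, bottom, top):
    """Keep only the objects that may contribute to the image."""
    return [obj for obj in objects
            if not outside_view(obj, left, right, bottom, top)]


inside = [(-0.5, -0.5), (0.5, -0.5), (0.0, 0.5)]
far_right = [(2.0, 0.0), (3.0, 0.0), (2.5, 1.0)]
straddling = [(0.5, 0.0), (1.5, 0.0), (1.0, 0.5)]
# Every vertex is outside the view, yet the triangle covers the whole view.
covering = [(-5.0, -5.0), (5.0, -5.0), (0.0, 5.0)]

kept = prune([inside, far_right, straddling, covering], -1.0, 1.0, -1.0, 1.0)
print(len(kept))
```

The last triangle shows why the test is made per edge: all of its vertices lie outside the view, but not beyond a single edge, so the object is kept and its visibility is left to the ray tracing process itself.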
2.5.2 Simplified Sorting

The next problem is to sort objects in a way that avoids many comparisons. A given object may be nowhere in the direction of the ray being used for construction of a pixel. Finding these "inactive" objects with a computationally efficient process is highly desirable.

2.5.3 Parallelization

One of the easiest concepts for speeding up the ray tracing calculation is to realize that physically every ray is independent of the other rays; thus each ray can be assigned to a separate processor. Then the total image generation time is merely the time taken by the most involved ray calculation, i.e. the tracing of the most complex ray. For an "ordinary" VGA image one could use some 300,000 processors, assuming that few secondary rays need be calculated. For more complex scenes on workstations, over a million processors could be employed. There is also the problem of defining the image for each processor's use (by a data broadcasting process), and assembling the results into a single image buffer (so that calculation of the next scene can be started while the old image is displayed).

This parallelization would require each processor to calculate the ray path for each pixel AND would ignore any information the neighboring processors may have about the image structure, i.e. image coherence is ignored. Thus the
process may be the fastest, but it is also the most wasteful of computing resources. A reasonable solution is to map the ray tracing process to the engines that comprise the computing environment. This dissertation merely notes this, and will not attempt to delve into it. We will concentrate on finding speedups in the uniprocessor solution, and leave the mapping of the improved process as a desirable extension for further work.

Work in defining image composition hardware has been completed [Fie 88]; that work, together with the structures defined in this dissertation, points toward a class of image generation machinery for calculating rich visual environments for use in dynamic artificial reality scenes.

2.6 Summary

This chapter reviewed the parameters involved in human vision and the ideas underlying the basic ray tracing process. It showed that reasonable computing machinery and existing affordable display technology can present but an approximation to a naturally observed scene. The image quality is limited by the resolution of the display device and by the computational capabilities of the machinery that may be used to produce the actual image. However, within these limitations, images can be produced that are visually stunning.

It is the intent of this dissertation to examine one of the computational bottlenecks in the calculation of computer
produced images: that of decreasing the number of objects that need to be considered during the calculation of visible parts of a scene. In achieving this goal we also specify design parameters for simple computational structures which can be cast in VLSI hardware. We will verify the correctness of these designs and use simulation to estimate their impact on the performance of the basic ray tracing procedure.
CHAPTER 3
FAST RAY TRACING ALGORITHMS

3.1 Introduction

The basic philosophy of ray tracing is that an observer sees a point on a display surface as a result of the interaction of the surface at that point with rays emanating from the scene. In general a light ray may reach the surface indirectly via reflection at other surfaces, transmission through partially transparent objects, or a combination of these. Ray tracing is the most complete simulation of an illumination/reflection model in computer graphics. Ray tracing procedures have produced the most realistic images to date in computer graphics. But the method has a number of disadvantages, the most critical being extremely high processing requirements. Image generation time can be hours or even days, even on powerful computer systems. For this reason ray tracing is often considered impractical.

Much research has been done to improve the procedure. There are several techniques that have gained wide acceptance. In this chapter we will discuss four algorithms and point out the problems of each of them.
Faced with the task of accelerating the process of ray tracing, there are three very distinct strategies to consider [Arv 89]:

1) reducing the average cost of finding the intersection of a ray with the environment;
2) reducing the total number of rays intersected with the environment;
3) replacing individual rays with a more general entity.

These strategies are called "faster intersections", "fewer rays" and "generalized rays", respectively. The last strategy (generalized rays) places some constraint on the environment that can be considered, such as restricting the type of primitive objects, or it may require abandoning the notion of exact intersection calculations and accepting an approximation instead [Arv 87]. Examples are beam tracing, cone tracing and pencil tracing [Fol 90] [Arv 89]. These procedures are peripheral to the study being presented here and will not be discussed further.

The other two strategies, faster intersections and fewer rays, can be combined under the heading of speedup algorithms. In this case obtaining faster intersection of rays with objects is still the key to a fast algorithm. The strategy used to get faster intersections separates this class into the subcategories of "faster" and "fewer" ray-object intersections. The former consists of efficient
algorithms for intersecting rays with specific primitive objects, while the latter addresses the larger problem of intersecting a ray with an environment using a minimum of ray-object intersection tests. The following four algorithms are representative fast algorithms which can be placed into the above two subcategories:

1) Bounding volume algorithms;
2) Hierarchical bounding volume algorithms;
3) Uniform spatial subdivision algorithms;
4) Nonuniform spatial subdivision algorithms.

3.2 Bounding Volume Algorithms

The most fundamental and ubiquitous tool for ray tracing acceleration is the bounding volume. The idea of using a bounding volume takes advantage of the fact that rays usually miss the objects they are tested against. By enclosing complicated objects in invisible bounding volumes that are easy to intersect, one can avoid complicated object intersection calculations. This leads to fewer calculations, especially if the ray misses the bounding volume. Because the bounding volume intersection test is simpler than the object intersection test, this algorithm may reduce the computation by a considerable amount, but cannot improve upon the linear time complexity of exhaustive ray tracing [Arv 89]. The reason can be explained with the objects in Figure 3.1. How do we bound
Figure 3.1 Bounding Volumes
these objects? How many bounding volumes are employed? What kinds of shapes are best as bounding volumes? The simplest bounding volume could be a sphere (a circle in this two-dimensional example) because its intersection test is easier than that of any other type. When we employ one spherical bounding volume, the bounding volume is too large, and every ray will hit it. In this case the bounding volume algorithm must pay the cost of three object intersection tests in addition to the cost of the bounding volume intersection test. When we employ three bounding volumes, one for each object, the bounding volumes for B and C have problems similar to those of the single bounding volume case. Even though empty space is reduced by employing three bounding volumes, volumes 2 and 3 still contain large empty spaces.

The other solution is to employ a box as a bounding volume (in the two-dimensional case, a rectangle). By employing fitted bounding volumes we reduce the empty space in each bounding volume. The problem in this case arises with object C. The ray intersection test for object C requires three ray-line intersection calculations, but the ray intersection test for its bounding volume requires four ray-line intersection tests. There is no advantage if intersection with the bounding volume is as expensive as intersection with the object.

Weghorst points this out as the key problem of the bounding volume algorithm [Weg 84]. The void area of a bounding volume is defined as the difference in the projected
areas of the bounding volume and the item. He considers the following two problems when bounding volumes are employed:

1. the void area of the bounding volume;
2. the cost of intersecting the bounding volume.

In his computer experiment, Weghorst applied two bounding volumes (sphere, box) to the same object to decide which one is better. He considered only the case of a changing view point and he measured two performance indices for each type of bounding volume. From his experiments he concluded that the optimal choice of bounding volume is ray dependent. However, in the general case, as the illumination becomes more complex or the environment becomes specular or transparent, ray dependency becomes more difficult to consider since rays are more likely to arrive from any direction.

The best bounding volume depends on both the expense of performing tests on the bounding volume itself and on how well the volume protects the enclosed object from tests that do not yield an intersection. The criterion for the selection of the bounding volume is to minimize the total cost function T of the intersection test for an object. This total cost function is defined by Weghorst, Hooper, and Greenberg as follows [Weg 84]:

    T = b * B + i * I

where
    T : total cost function
    b : number of times that the bounding volume is tested for intersection
    B : the cost of testing the bounding volume for intersection
    i : the number of times that the item is tested for intersection
    I : the cost of testing the item for intersection.

For a specific item in an environment, with a given view, b and I are constant. However, by manipulating the shape and size of the bounding volume, B and i can be varied to reduce the total cost function. The two elements B and i are generally interdependent. For example, reducing B by reducing the complexity of the bounding volume will almost certainly increase i.

To minimize the total cost function, we only assign bounding volumes to those items whose intersection tests are sufficiently complex to warrant one. Certain items, such as spheres, cylinders, and rectangular parallelepipeds, need not be bounded. In this case the ray tracing algorithm maintains two lists (one is the bounding volume list, the other the simple object list). The bounding volume algorithm can be added to the original ray tracing algorithm. Figure 3.2 shows the bounding volume algorithm. This algorithm employs a bounding volume list and performs intersection tests with the objects after finishing the ray-bounding volume intersection tests. From a theoretical point of view the bounding volume algorithm may reduce the computation by a constant factor, but cannot improve upon
procedure Ray_trace (start, direction, depth, color)
    vector : start, direction
    integer : depth
    colors : color
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color
[Figure 3.2: bounding volume algorithm; the remainder of the listing is missing from the source text]
the linear time complexity of exhaustive ray tracing. The main problem of bounding volumes is that defining the optimal bounding volume is difficult.

3.3 Hierarchical Bounding Volume Algorithm

A common extension to bounding volumes is an attempt to impose a hierarchical structure of bounding volumes on the scene. Where possible, objects in close spatial proximity are allowed to form clusters, and the clusters are themselves enclosed in bounding volumes. Figure 3.3 shows a bounding volume A which contains one large object B and another bounding volume C, which has four small bounding volumes (C1, C2, C3, C4) inside it. The tree represents the hierarchical relationship between the seven boundary extents A, B, C, C1, C2, C3, C4.

By enclosing a number of bounding volumes within a larger bounding volume it is possible to eliminate many objects from further consideration with a single intersection check. If a ray does not intersect the parent volume, there is no need to test it against the bounding volumes or objects contained within. Tracing a ray against bounding volumes means that such a tree is traversed from the topmost level. A ray that happens to intersect C1 in Figure 3.3 is tested against the bounding volumes C1, C2, C3 and C4, but only because it intersects the bounding volume representing that cluster. A ray that misses bounding volume A need not be tested against the bounding volumes inside of A. This intersection
Figure 3.3 Bounding Volume Hierarchies (panels: Bounding Volume; Tree Structure)
algorithm is implemented in Figure 3.4. The hierarchical bounding volume algorithm employs an intersection procedure (HBV_intersect). The data structure for this hierarchy is assumed to be a tree with an arbitrary branching factor at each internal node. Thus bounding volumes may enclose any number of other bounding volumes. Each leaf node of the tree is a single primitive object while each interior node consists of a bounding volume. The procedure Intersect in HBV_intersect performs a ray-object intersection for the given ray information (origin, direction) and an object. The function "Intersect_P" is very similar to "Intersect" except that it returns a boolean value indicating whether an intersection was found or not. The intersection process of the hierarchical bounding volume algorithm begins with the root node of the tree.

To alleviate the bounding volume problem, Rubin and Whitted [Rub 80] introduced the hierarchical bounding volume algorithm to ray tracing in order to attain a theoretical time complexity which is logarithmic in the number of objects instead of linear. But constructing a bounding volume hierarchy involves two considerations:

1) which bounding volumes to enclose;
2) what type of bounding volume to enclose them with.

This is a challenging problem because the number of possible hierarchical groupings of objects grows exponentially with the number of objects, making an exhaustive search totally impractical. There are some suggestions on
procedure Ray_trace (start, direction, depth, color)
    vector : start, direction
    integer : depth
    colors : color
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color
[Figure 3.4: hierarchical bounding volume algorithm; the remainder of the listing is missing from the source text]
constructing the hierarchy of bounding volumes [Gol 80] [Rub 80] [Weg 84]. The potential clustering and the depth of the hierarchy depend on the nature of the scene. The problem with bounding volume hierarchies is that they are not convenient for a user to specify. That drawback is addressed by techniques for generating bounding volume hierarchies automatically [Nic 94].

3.4 Nonuniform Spatial Subdivision Algorithm

Bounding volume hierarchies provide a means of recursively narrowing the focus of the search to more promising candidates for intersection. Bounding volume hierarchies organize objects bottom-up; in contrast, spatial subdivision algorithms (uniform or nonuniform) begin with a different philosophy. Spatial partitioning subdivides space top-down, i.e., we rely on simple volumes to identify objects which are good candidates for intersection, but these simple volumes are constructed by applying a divide-and-conquer technique to the space surrounding the objects instead of considering the objects themselves.

One may construct the volumes in a top-down fashion by partitioning a volume bounding the environment into smaller pieces. The smaller volumes are assigned the collection of objects which are totally or partially contained within them. The spatial subdivision algorithm selects sets of objects based on given volumes. This
small volume is an axis-aligned rectangular prism. This is called a "voxel". A preprocessing step is responsible for constructing nonoverlapping voxels.

The basic idea of the spatial subdivision algorithm is that a ray imposes a strict ordering on the pierced voxels based on the distance to the point at which the ray first enters each voxel. Because the objects in a voxel are closer to the ray origin than those in all subsequent voxels, if we process the voxels in the order in which they are encountered along the ray, we need not consider the contents of any further voxels once we have found a point of intersection.

There are two types of spatial subdivision schemes: uniform spatial subdivision and nonuniform spatial subdivision. Nonuniform spatial subdivision techniques discretize space into regions of varying size in order to conform to features of the environment. This variation in size allows more subdivisions to be formed in densely populated regions of space and it allows large voxels to cover regions which are sparsely populated or are entirely void.

An octree is one possible data structure for creating and organizing such a collection of voxels. Glassner [Gla 84] introduced an octree variation for use in ray tracing. In the creation of the octree, a box containing the environment is recursively subdivided until each voxel contains fewer than some threshold number of intersection candidates or until a storage limitation is reached. After
constructing the octree, we trace rays using the algorithm in Figure 3.5. In Glassner's approach, nodes of the octree are linked and accessed by uniquely defined names rather than by storing explicit pointers to descendant nodes. To access data associated with a node name, the name is used to retrieve a pointer from a hash table. Glassner observed that simply computing the name modulo the size of the hash table serves as a good hashing function. If a ray hits nothing within a voxel we must proceed to the next voxel pierced by that ray. Glassner's algorithm accomplishes this task by keeping the minimum edge length of the voxels (the voxel resolution) in the nodes of the octree. The movement to the next voxel is accomplished by finding a point within the next voxel and performing the lookup.

Another type of data structure for creating and organizing such a collection of voxels, suggested by Kaplan and by Jansen, uses binary space partitioning trees (BSP trees). The BSP tree obviates the need for voxel names and hashing at the expense of a potential increase in storage. Figure 3.6 shows a spatial subdivision algorithm based on BSP trees. This algorithm was suggested by Jansen [Arv 89]. The main differences between Glassner's and Jansen's approaches are the data structure for voxels and the movement to the next voxel. Instead of finding the next voxel by creating a point guaranteed to fall within it and traversing the hierarchical structure from the root, Jansen's algorithm
procedure Ray_trace (start, direction, depth, color)
    vector : start, direction
    integer : depth
    colors : color
begin
    vector : intersection_point, reflected_direction, transmitted_direction
    colors : local_color, reflected_color, transmitted_color
    color ← black;
    if (depth ≤ maxdepth)
    begin
        color
[Figure 3.5: octree-based spatial subdivision algorithm; the remainder of the listing is missing from the source text]
PAGE 52
46 procedure Ray_trace (start, direction, depth, color) vector : start, direction integer : depth colors : color begin vector : inter section_point, ref lected_direction, transmitted_direction colors : local_color, ref lected_color , transmitted_color color
PAGE 53
recursively descends all the branches of the BSP tree that terminate at pierced voxels, making use of each partition node only once per ray. Figure 3.7 shows the nonuniform spatial subdivision algorithm via an octree. Ray A visits five voxels and examines the objects in them; three of the eight objects need to be tested for intersection. Ray B visits only one voxel and performs one ray-object intersection test, whereas the original ray tracing algorithm would perform eight. Finer subdivision can decrease the number of ray-object intersection tests at the expense of additional voxel processing overhead. This algorithm requires enormous amounts of data storage [Wat 89].

3.5 Uniform Spatial Subdivision Algorithm

Fujimoto [Fuj 86] introduced a uniform spatial subdivision algorithm in which voxels of uniform size are organized in a regular three-dimensional grid. The overall strategy is quite similar to that of the nonuniform spatial subdivision algorithm. The voxels are processed in the order in which they are pierced. When a voxel is processed, the candidate objects in it are intersected with the ray. To step from voxel to voxel, Fujimoto developed a three-dimensional digital differential analyzer (3DDDA) that incrementally computes successive voxel indices in the same way that efficient line rasterization algorithms incrementally compute pixel coordinates. This is
Figure 3.7 Nonuniform Spatial Subdivision (TO: Tested Object; PV: Processed Voxels)
similar to the line drawing algorithm [Fol 90]. The 3DDDA eliminates floating point multiplications and divisions. Figure 3.8 shows this algorithm. The uniform spatial subdivision algorithm differs from the nonuniform one in the following ways: 1) the subdivision strategy does not depend on the structure of the environment; 2) access to the ray-pierced voxels is very fast due to the incremental calculations. Figure 3.9 shows a 2D analog of uniform spatial subdivision. Ray A visits 14 voxels, and only one object is tested for ray intersection. But there are many empty voxels in this example. Since this algorithm does not depend on the structure of the environment, there are many voxels that point to nothing. This kind of disadvantage does not outweigh the advantage of fast access. The big disadvantage is that a huge amount of memory may be required. Although paged memory techniques can be used to implement the scheme, there is a large memory management overhead in paging, and many modest images cannot be handled expeditiously.

3.6 Inside Test

Whatever fast algorithm is used in ray tracing, it is vitally important that the ray-object intersections be correct. The big advantage of the spatial subdivision algorithm
procedure Ray_trace (start, direction, depth, color)
  vector  : start, direction
  integer : depth
  colors  : color
begin
  vector : intersection_point, reflected_direction, transmitted_direction
  colors : local_color, reflected_color, transmitted_color
  color <- black;
  if (depth <= maxdepth) begin
    color <- back_ground_color;
    Grid_intersect (start, direction, node);
    if (intersection) begin
      local_color <- [ contribution of local color model at intersection point ]
      { Calculate direction of reflected ray }
      Ray_trace (intersection_point, reflected_direction, depth+1, reflected_color)
      { Calculate direction of transmitted ray }
      Ray_trace (intersection_point, transmitted_direction, depth+1, transmitted_color)
      Combine (color, local_color, local_weight_for_surface,
               reflected_color, reflected_weight_for_surface,
               transmitted_color, transmitted_weight_for_surface)
    end
  end
end

procedure Grid_intersect (origin, direction, node)
  vector : origin, direction
  globalpointer : *node
begin
  [compute i,j,k for the voxel containing origin];
  [set up 3DDDA based on direction and origin]
  repeat
    for each object in voxel[i,j,k] do
      Intersect (origin, direction, object);
    endfor
    if (no intersection) then
      update i,j,k using 3DDDA;
    endif
  until an intersection is found or outside of environment
end

Figure 3.8 Uniform Spatial Subdivision Algorithm
Figure 3.9 Uniform Spatial Subdivision (circles denote objects)
is that sorting of the ray-pierced voxels is built into the algorithm. The bounding volume algorithm, by contrast, includes no sorting procedure. Even if we sort the bounding volumes, their order does not determine the order of the objects. For example, ray A pierces two bounding volumes, BV1 and BV2, in Figure 3.10 (a). The sorted order of the bounding volumes with respect to ray A is (BV1, BV2), but the order of the objects is different. There is a ray-object intersection in BV1, but that intersection is not the correct one for ray A. The reason is that bounding is performed on objects, not on space. This is a problem of the bounding volume algorithm (including hierarchical bounding volumes).

The spatial subdivision algorithm has a different kind of problem. Ray B in Figure 3.10 (b) pierces the voxels in the following order: voxel 4, voxel 3, voxel 7, voxel 6, voxel 5. Since object OBJ3 is in voxel 4, the ray-object intersection test is performed between the ray and OBJ3. But the intersection point is not in voxel 4. In the spatial subdivision algorithm, the inside test makes sure that an intersection point lies in a particular voxel. The inside test procedure performs the following steps after a ray-object intersection has been detected:
1. find the intersection point using the intersection parameter value (t) and the ray equation;
Figure 3.10 Inside Test
2. check whether the intersection point is in the x, y, z intervals of the voxel;
3. make sure that the intersection point is in the voxel.
To obtain a fast ray tracing algorithm, we must find an inside test that is fast and easily computed. To save memory in the spatial subdivision algorithm, we can combine the bounding volume algorithm and the spatial subdivision algorithm. In this case bounding is performed on subspaces rather than on objects, and the inside test must be done on every intersection point. This new algorithm will be developed in Chapter 4.
CHAPTER 4
DEPTH SORTER

4.1 Introduction

The Depth Sorter discussed in this chapter is a procedure for substantially speeding up ray tracing calculations. The core of the algorithm consists of two ideas: 1) sort the bounding volumes; 2) avoid unnecessary intersection tests, even for objects in bounding volumes that the ray intersects. The algorithm is modeled after other fast algorithms and avoids the drawbacks inherent in them. This chapter shows the development of the fast algorithm and discusses some of its problems. The data structure is also an important factor when memory must be saved or when the scene contains a great many objects. A critical factor in the computational efficiency of ray tracing is the ease with which object bounding can be accomplished. After objects are assigned to bounding volumes, sorting those volumes with respect to the ray is a key problem. The main sorting idea and aspects of its hardware implementation are considered. The following sections discuss these topics step by step.
4.2 General Depth Comparator

The ray tracing algorithm finds the object nearest to the ray origin. It first performs an intersection calculation. After a ray meets an object, the depth of the intersection is compared with the previously found depth. Repeating this step for every object, the comparator finds the nearest object at each pixel. We can therefore interpret the ray tracing algorithm as a depth comparator for objects in image space. If we want to find the nearest object without using a bounding or spatial subdivision algorithm, the ray tracing algorithm itself is sufficient for calculating the depth of each visible element in the image. The amount of calculation, and hence the execution time, is basically fixed by the complexity of the scene. By adding hardware, we can reduce the run time. According to statistics reported by Whitted [Whi 80], 95% of ray tracing time is spent on intersection calculations. To relieve this bottleneck stage of ray tracing, we could use hardware that performs the intersection test. Let us consider some of the problems in implementing this kind of hardware.

The typical object space for ray tracing is made up of two types of surfaces: flat surface elements and quadratic surfaces. For both types the surface equations and the surface normal equations are relatively simple [Han 89]. It may be possible to devise schemes for depth sorting using a wide range of object primitives. But it is not easy
when we consider the example shown in Figure 4.1. Let us consider a general polygonal surface element. Since the element may be concave, the star shown in Figure 4.1 represents the general case. The pertinent equations are:

\[ f_i(x, y, z) = a_i x + b_i y + c_i z = 0 \quad (i = 1, 2, \ldots, 5) \qquad (4.0) \]

To find the ray/object intersection we must
1. Calculate the intersection of the ray with the surface, as a function of the ray parameter t;
2. Find the intersection point using t and the ray equation;
3. Check the following conditions
( fi ^ 0, f2 ^ 0, fj > 0) OR ( f2 ^ 0, fj ^ 0, f4 > 0) OR ( f2 ^ 0, f4 ^ 0, fs ^ 0) OR ( fi < 0, f4 > 0, fj > 0) OR ( fi ^ 0, fj < 0, fj < 0) OR ( fi ^ 0, f2 ^ 0, fj < 0, f4 ^ 0, fj ^ 0)
The tests enumerated in Step 3 can become difficult to implement. Generally, the more complex the polygonal shape, the more involved the intersection test becomes. The major problem with devising a depth comparator for arbitrary object primitives is that such objects do not have a regular (uniform) shape, and whenever we employ a new primitive, the comparator must be modified to accommodate the comparisons
Figure 4.1 A general polygonal surface element
required by the new primitive. So primitive sorting is not easy and not necessarily a good idea. If all objects are of the same type (i.e. bounding volumes), we can sort their depths with respect to the ray direction. If the bounding volume is relatively simple, ray intersections with the actual object points may not be difficult to find. From the above example we know that primitive depth sorting is not easy; however, bounding volume sorting could be easy. When we employ the bounding volume algorithm to sort depth with respect to the ray, we must consider the following questions.
1. What kind of bounding shape gives the simplest intersection test?
2. What kind of bounding volumes can be sorted quickly and easily with simple calculations?
There are many bounding volumes with a regular shape. Boxes and spheres are representative. Other volumes could be more complicated than these. The intersection test for a sphere bounding volume is easier than that for a box. A simple object such as the one shown in Figure 4.2 (a) can be bounded by a sphere as shown in Figure 4.2 (b). The void area of a bounding volume is defined as the difference in area between the orthogonal projections of the object and of the bounding volume onto a plane perpendicular to the ray and passing through the origin of the ray [Weg 84]. Sphere bounding results in a very simple
Figure 4.2 Bounding Volume
intersection test. However, sphere bounding may not be suitable for many shapes. Kay and Kajiya presented a method of handling box bounding based on slabs [Kay 86]. A slab is simply the space between two parallel planes. The intersection of a set of slabs defines the bounding volume. This bounding volume method does not overcome the void area problem for the kind of object illustrated in Figure 4.2 (a). The algorithm also needs to build a hierarchy of the objects and bounding volumes in image space. Drawbacks of hierarchical bounding volumes are addressed by techniques for generating bounding volume hierarchies automatically [Nic 94]. Hardware for box bounding volume intersection is not simpler than that for sphere bounding volumes.

Even if we disregard the void area problem, we need to look at the volume link problem, as can be seen from Figure 4.3. One advantage of the ray tracing algorithm is that we can freely move the viewpoint. Objects are bounded by bounding volumes, and bounding volumes are linked by pointers. For three bounding volumes, there are six possible link orders, as shown in Figure 4.3. For linked list (a), the bounding volume algorithm first performs an intersection test with bounding volume 3 and then tests the objects in volume 3. Even though the ray hits volume 3, it does not hit any object inside volume 3. The ray tracing algorithm repeats the same procedure for volume 2, but again fails to find any intersection.
Figure 4.3 Bounding Volume Link Lists (panels (a) to (f), each a null-terminated linked list)
Finally it tests volume 1 and finds the object nearest to the viewpoint. In this worst case, the bounding volume ray tracing algorithm performs every intersection test with the objects and the bounding volumes. The best case is shown in Figure 4.3 (f): the algorithm performs the intersection test with volume 1 first and finds the nearest object. The algorithm keeps the nearest hit distance until all intersections have been performed. When the next intersection of the ray with a volume is calculated, the algorithm compares it only with the nearest distance and either updates that distance or bypasses those steps. The problem in this situation is how to maintain the best linked list for any viewpoint and for all rays. The spatial subdivision algorithm gives a solution by sorting the spaces hit by the ray with respect to the ray direction. But to solve the void area problem, the spatial subdivision algorithm must divide the space more finely, which in turn requires bigger arrays. The memory representing image space is proportional to $n^3$; e.g. if the latitude, longitude, and height axes are each divided into 10 parts, then a 10x10x10 array is required to represent the image space. Yet another problem is the inside test, which was mentioned in the previous chapter. So far we have found the following problems with the bounding volume algorithm and the spatial subdivision algorithm:
1. Void areas in bounding volumes;
2. Link status in bounding volumes;
3. Memory space representing image space in the spatial subdivision algorithm;
4. Inside test in the spatial subdivision algorithm.
By bounding subspaces instead of objects we can overcome the problems listed above. A new bounding algorithm based on the above considerations is proposed in Figure 4.4. Image space is suitably divided. A subspace is a bounding volume if it contains at least part of an object. So in this bounding volume algorithm an object can be bounded by several bounding volumes, and one bounding volume can contain several objects. This bounding strategy eases the void area problem. The next step is to sort the bounding volumes with respect to the ray. The algorithm inspects the objects in each bounding volume in sorted order. Once the ray hits the nearest object, no more intersection tests are required. For example, in Figure 4.4 (b), spaces 1, 2, 3, 4, 5, 7, 9, 12, and 14 cannot be bounding volumes because they contain neither part of an object nor an entire object. The six bounding volumes are sorted with respect to the ray direction as shown in (c). The algorithm inspects volume 11 to find the nearest object, A. After finding the nearest object in the bounding volume nearest to the viewpoint, the algorithm stops inspection.
Figure 4.4 New Algorithm and Data Structure (panels (a) and (b), with the boundlist and sortlist linked lists, each null-terminated)
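The control flow of the proposed algorithm (sort the pierced bounding volumes, inspect them front to back, accept a hit only if it passes the inside test, and stop at the first volume that yields a hit) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the thesis implementation: bounding volumes and objects are both spheres, volumes are plain dictionaries, all names are hypothetical, and the entry distances are computed explicitly with a square root for clarity. The early exit also assumes that every volume lists every object that reaches into it.

```python
import math


def ray_sphere_t(origin, direction, center, radius):
    """Smallest positive t at which the ray meets the sphere, or None.
    `direction` is assumed to be a unit vector."""
    off = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * k for d, k in zip(direction, off))
    c = sum(k * k for k in off) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-9:
            return t
    return None


def inside(point, volume):
    """Inside test: is the point within the bounding sphere?"""
    dist2 = sum((p - c) ** 2 for p, c in zip(point, volume["center"]))
    return dist2 <= volume["radius"] ** 2 + 1e-9


def entry_depth(origin, direction, volume):
    """Depth of a volume: 0 if it contains the ray origin, else entry distance."""
    if inside(origin, volume):
        return 0.0
    return ray_sphere_t(origin, direction, volume["center"], volume["radius"])


def nearest_hit(origin, direction, volumes, objects):
    """Return (t, object_name) for the nearest accepted hit, or None."""
    pierced = []
    for volume in volumes:
        depth = entry_depth(origin, direction, volume)
        if depth is not None:            # discard volumes the ray misses
            pierced.append((depth, volume))
    pierced.sort(key=lambda pair: pair[0])
    for _, volume in pierced:            # front-to-back inspection
        best = None
        for name in volume["objects"]:
            center, radius = objects[name]
            t = ray_sphere_t(origin, direction, center, radius)
            if t is None:
                continue
            point = [o + t * d for o, d in zip(origin, direction)]
            if inside(point, volume) and (best is None or t < best[0]):
                best = (t, name)
        if best is not None:
            return best                  # no later volume can hold a nearer hit
    return None
```

For a ray along the x axis and two volumes centered at x = 5 and x = 10, the volume at x = 5 is inspected first regardless of the order in which the volumes are stored.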
4.3 Possible Problems with the New Bounding Volumes

The new bounding volume algorithm considers two problems in ray tracing. The first is to make bounding volumes compact, avoiding the void area problem by allowing objects to overlap in the bounding volumes. So any shape could be a candidate for the bounding volumes. We choose shapes that have simple intersection tests and are easily sorted with respect to the ray. We also need to consider the implications for possible hardware construction. Let us consider two bounding volumes, boxes and spheres.

4.3.1 Box Bounding Volumes

A box bounding volume is defined by six planes and is described by six plane equations. To find the intersection parameter values with a ray, the algorithm performs at least 6 division operations for each box. In most implementations division takes three or four times as long to compute as multiplication. Furthermore, division tends to be difficult to pipeline due to the dependencies inherent in selecting quotient bits [Flo 89] [Ere 94]. To avoid division operations in the sort procedure, we can compare the depths of box bounding volumes using the coefficients of the plane equations. In this case the inside test is not easy. For example, Figure 4.5 shows a two-dimensional box bounding volume containing simple objects A and B. Ray R1 starts from outside of the two
Figure 4.5 Inside test and overlap test
bounding boxes 1 and 2. A suitable algorithm sorts the two bounding boxes with respect to the direction of ray R1. When we calculate the intersection of ray R1 with the objects in Box 1, ray R1 hits object A in Box 2. In this case ray R1 must return the color information of object B. To prevent this situation, we must perform the inside test, as remarked in the previous chapter. When the ray hits object A, we know only the intersection distance, basically a parameter value t. From this value we can find the intersection location using the ray equation. We then need to check whether that location is in the bounding volume. These steps need many operations (multiplications, additions, comparisons), so the inside test is not easy for box bounding. The hardware also becomes more complicated than the box sorter alone when the inside test is added to it.

4.3.2 Sphere Bounding Volumes

The sphere bounding volume has some of the same problems as the box bounding volume. One advantage of the sphere is that the volume is easier to specify, i.e. a single equation is enough to represent a sphere. Because the bounding volume equation is quadratic, we need a square root function to solve for the intersection distance.
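To make the costs discussed above concrete, the following Python sketch computes the entry and exit parameters of a ray for an axis-aligned box (two divisions per axis, six in total) and for a sphere (one square root). It is an illustration under assumptions, not the thesis code: the function names are hypothetical, the box is assumed to be axis-aligned, and the ray direction is assumed to be a unit vector for the sphere case.

```python
import math


def box_entry_exit(origin, direction, bmin, bmax):
    """Slab test for an axis-aligned box: (t_near, t_far) or None."""
    t_near, t_far = float("-inf"), float("inf")
    for o, d, lo, hi in zip(origin, direction, bmin, bmax):
        if d == 0.0:
            if o < lo or o > hi:
                return None          # parallel to this slab and outside it
            continue
        t1 = (lo - o) / d            # first division for this axis
        t2 = (hi - o) / d            # second division for this axis
        if t1 > t2:
            t1, t2 = t2, t1
        t_near = max(t_near, t1)
        t_far = min(t_far, t2)
        if t_near > t_far:
            return None              # the slab intervals do not overlap
    if t_far < 0.0:
        return None                  # the box is behind the ray origin
    return t_near, t_far


def sphere_entry_exit(origin, direction, center, radius):
    """Quadratic solution for a sphere: (t_near, t_far) or None."""
    off = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * k for d, k in zip(direction, off))
    c = sum(k * k for k in off) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)           # the square root the text refers to
    t_near, t_far = (-b - root) / 2.0, (-b + root) / 2.0
    if t_far < 0.0:
        return None                  # the sphere is behind the ray origin
    return t_near, t_far


def hit_is_inside(t_hit, interval, eps=1e-9):
    """Inside test by comparing parameters instead of coordinates."""
    t_near, t_far = interval
    return t_near - eps <= t_hit <= t_far + eps
```

Once the entry and exit parameters are known, the inside test reduces to two comparisons, but obtaining those parameters is exactly the division or square root cost at issue.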
While the design of fast and efficient adders and multipliers is well understood, division and square root remain serious design challenges. The reasons are the intrinsic dependence among the iteration steps and the complexity of the result-digit generation function [Ere 94]. So sorting depth by the ray intersection distance may not be a good idea. However, comparing the coefficients of the bounding volume equation and the ray direction provides a way to sort the depths of the bounding volumes. This approach is given in Section 4.5, where we discuss the object intersection with the ray. Using the coefficients of the bounding volume equations and the object intersection depth parameter t, we can easily perform the inside test.

The other problem is to avoid repeating the intersection test with an object already found not to intersect the ray. Figure 4.5 (b) shows two sphere bounding volumes, one containing object A and the other containing objects A and B. Ray R2 passes through volumes 1 and 2. A sort algorithm sorts the two volumes. At the object intersection stage, the intersection algorithm tests whether R2 hits object A in SP1. In this example R2 misses object A. For the next bounding volume, another intersection test will be carried out. We want to avoid the object intersection test for A there, because we already know that R2 misses object A. This problem will also be considered in Section 4.6.
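One common way to avoid retesting an object that is shared by several bounding volumes is to remember, per object, the last ray it was tested against and the result. The Python sketch below illustrates that general idea (often called a mailbox); it is not necessarily the method developed in Section 4.6, and the wrapper name, the `ray_id` argument, and the `intersect(ray, name)` callback are assumptions made for the example.

```python
def make_mailbox_intersect(intersect):
    """Wrap intersect(ray, name) so each object is tested at most once per ray."""
    mailbox = {}            # object name -> (ray_id, result of the last test)
    stats = {"tests": 0}    # number of real intersection tests performed

    def cached(ray_id, ray, name):
        entry = mailbox.get(name)
        if entry is not None and entry[0] == ray_id:
            return entry[1]              # reuse the stored hit or miss
        stats["tests"] += 1
        result = intersect(ray, name)    # the expensive ray-object test
        mailbox[name] = (ray_id, result)
        return result

    return cached, stats
```

With this wrapper, when ray R2 reaches the second volume of Figure 4.5 (b), the stored miss for object A is returned without another intersection calculation.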
4.4 Bounding of Objects

The new bounding algorithm consists of two steps:
1. The object bounding stage;
2. The ray calculation stage.
In the first stage the objects in the image space are partitioned by invisible spherical surfaces. After the first stage, the algorithm computes the ray at each pixel. The main objective of the object bounding stage is to reduce the void volume in each sphere bounding volume. Generally, if we employ many bounding volumes, we can make the void volume very small. But if the time spent sorting the bounding volumes is larger than the time spent calculating the object intersections, there is no advantage. Even though we may implement hardware for fast sorting of bounding volumes, this hardware will require a large memory to hold the data for the many bounding volumes. In the new bounding algorithm, sorting is done for finding the nearest bounding volume for each sphere. The sorting ultimately should be done in hardware, and the intersection tests between objects and the ray will be done in software. The complexity of the hardware will be dictated by the number of spherical bounding volumes it is to handle. Let us assume that the depth sorter (hardware implemented for sorting sphere bounding volumes) can handle N sphere bounding volumes.
There are two ways to assign bounding spheres:
1. A priori assignment, typically by the user's examination of the structure of the object assembly;
2. Automatic bounding, using an explicit algorithm to cluster the objects in the scene.
We examine the algorithm for the clustering process. Usually we know the location and the nature of the objects in the scene. We partition image space into N boxes. By shrinking each axis of the rectangular boxes, we make each box as compact as possible. When we bound the objects, the best rule for minimizing void areas is to make the bounding boxes cubical. After making boxes of the desired shape, we tightly wrap each box in a circumscribing sphere. This wrapping procedure is the same as that of the automatic bounding discussed later. After assigning the sphere bounding volumes, the algorithm checks, for each sphere bounding volume, which objects are included in it. For a simple image space, bounding by visual inspection is very easy. When the image space has many objects and the space partition is not simple, the object bounding step may take many calculations. The automatic bounding procedure is simpler to employ than bounding by inspection. Figure 4.6 shows the automatic bounding procedure for a two-dimensional object space. (We use two-dimensional space for illustration only; the algorithm is designed for three-dimensional application.)
Figure 4.6 Bounding Procedure of New Algorithm (panels (a) and (b))
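The shrinking and wrapping steps of the automatic bounding procedure can be sketched in Python as follows. The sketch is hedged: it assumes each object is described by an axis-aligned extent (a pair of minimum and maximum corners), which over-estimates the part of an object lying in a cell, and the function names are hypothetical. It works in any dimension, so it covers both the two-dimensional illustration and the three-dimensional case.

```python
import math


def shrink_cell(cell_min, cell_max, object_boxes):
    """Shrink a partition cell so it just encloses the object extents
    that reach into it. Returns (lo, hi), or None if the cell is empty."""
    lo = [math.inf] * len(cell_min)
    hi = [-math.inf] * len(cell_min)
    found = False
    for omin, omax in object_boxes:
        clip_lo = [max(a, b) for a, b in zip(omin, cell_min)]
        clip_hi = [min(a, b) for a, b in zip(omax, cell_max)]
        if any(l > h for l, h in zip(clip_lo, clip_hi)):
            continue                     # this object does not reach the cell
        found = True
        lo = [min(a, b) for a, b in zip(lo, clip_lo)]
        hi = [max(a, b) for a, b in zip(hi, clip_hi)]
    return (lo, hi) if found else None   # empty cells are dropped by the caller


def circumscribing_sphere(box_min, box_max):
    """Sphere through the box corners: the center is the box center and
    the radius is half the length of the box diagonal."""
    center = [(a + b) / 2.0 for a, b in zip(box_min, box_max)]
    diagonal = math.sqrt(sum((b - a) ** 2 for a, b in zip(box_min, box_max)))
    return center, 0.5 * diagonal
```

A cell for which `shrink_cell` returns `None` corresponds to an empty box that is removed from the list of candidate bounding volumes.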
This space has 4 objects: A, B, C, D. Let us assume N = 6, i.e. the algorithm can employ up to 6 sphere bounding volumes (in 3-space). To bound the objects, we must know the extent of the object space. By checking each object, we find the maximum extent of the objects. Figure 4.6 (a) shows this extent. The aspect ratio of this frame is approximately 3:2, so the space is divided into 6 boxes. Each box in the example has large void areas. To reduce them, we shrink each box along the principal view coordinate dimensions so that the shrunk boxes just enclose the objects or parts of objects. We may also find that some boxes are empty, i.e. contain no objects from the scene. These boxes are removed from the list of potential bounding boxes. In this example box 6 in Figure 4.6 (b) is removed from the candidate list for bounding volumes. We can easily wrap a sphere around each rectangular box. In the two-dimensional example, the center of the sphere is the intersection point of the two diagonals, and the radius is half the length of the box diagonal. Employing these circles as the bounding volumes, we can bound the objects in the image space. In the three-dimensional case, the three-dimensional bounding spheres become circles in the viewing space.

A visual inspection for tailored bounding volumes can employ all N bounding volumes because the partition is performed by the human operator. Even though the partition of
space is not easy, there are many alternative methods for assigning the bounding spheres. In general it takes a long time to bound objects for an arbitrary scene. The automatic process of assigning bounding volumes takes a short time to bound the objects in the image space. It can employ up to N bounding volumes. The automatic bounding algorithm in the example of Figure 4.6 uses 5 of the 6 possible bounding volumes, so one bounding volume is idle during the subsequent sorting process. One could develop a tighter algorithm, at the expense of additional execution time, that leaves fewer bounding volumes idle: the bounding volumes become denser, which reduces the void area problem. However, the additional running time was found to reduce the overall efficiency. Hence the procedure shown here is a "best compromise" solution leading to the greatest observed improvements.

4.5 Algorithm

4.5.1 Filtering and Comparison

The ideas presented in the previous section yield many sphere bounding volumes. In the ray calculation stage, we need to find the spherical bounding volumes that are intersected by the ray and sort the points where the ray intersects them by distance along the ray direction. After discarding the unintersected sphere bounding volumes, we need a norm to
compare depths, two spheres at a time. This norm can be derived from a comparison between the coefficient pairs of the quadratic equations that describe the spheres. Let d be the unit vector of the ray direction from the viewpoint,

\[ d = (d_x, d_y, d_z), \qquad d_x^2 + d_y^2 + d_z^2 = 1 \]

Furthermore, $V_0 = (V_x, V_y, V_z)$ is the viewpoint, $B_0 = (X_0, Y_0, Z_0)$ is the center of the bounding sphere, and $R$ is the radius of the bounding sphere. The bounding volume equation is

\[ (x - X_0)^2 + (y - Y_0)^2 + (z - Z_0)^2 = R^2 \qquad (4.1) \]

The ray equation is

\[ (x, y, z) = (V_x + t d_x,\; V_y + t d_y,\; V_z + t d_z) \qquad (4.2) \]

Substituting Eq. (4.2) into Eq. (4.1), we get

\[ (V_x + t d_x - X_0)^2 + (V_y + t d_y - Y_0)^2 + (V_z + t d_z - Z_0)^2 = R^2 \qquad (4.3) \]

Reordering Eq. (4.3) with respect to t gives

\[ t^2 (d_x^2 + d_y^2 + d_z^2) + t\,[\,2 d_x (V_x - X_0) + 2 d_y (V_y - Y_0) + 2 d_z (V_z - Z_0)\,] + (V_x - X_0)^2 + (V_y - Y_0)^2 + (V_z - Z_0)^2 - R^2 = 0 \qquad (4.4) \]

Let

\[ b = 2 d_x (V_x - X_0) + 2 d_y (V_y - Y_0) + 2 d_z (V_z - Z_0) \qquad (4.5) \]
\[ c = (V_x - X_0)^2 + (V_y - Y_0)^2 + (V_z - Z_0)^2 - R^2 \qquad (4.6) \]

Because d is a unit vector, the coefficient of $t^2$ is 1, and Eq. (4.4) becomes

\[ t^2 + b t + c = 0 \qquad (4.7) \]

The intersection of the parametric ray with the sphere is characterized by the quadratic equation (4.7). Each bounding volume will have a different set of coefficients {b, c} for every ray. Let $t^2 + b_1 t + c_1 = 0$ and $t^2 + b_2 t + c_2 = 0$ be the parametric intersection equations for sphere bounding volumes 1 and 2, respectively. Using the coefficients b and c and only simple mathematics, we can compare the depths of the sphere bounding volumes with respect to the ray direction. Before developing the algorithm, we define the discriminant D and four state variables $X_1, X_2, X_3, X_4$:

\[ X_1 = \begin{cases} 1 & b_2 < b_1 \\ 0 & b_2 \ge b_1 \end{cases} \]
\[ X_2 = \begin{cases} 1 & 4(c_1 - c_2) \le b_1^2 - b_2^2 \\ 0 & 4(c_1 - c_2) > b_1^2 - b_2^2 \end{cases} \]
\[ X_3 = \begin{cases} 1 & b_1 b_2 < 2(c_1 + c_2) \\ 0 & b_1 b_2 \ge 2(c_1 + c_2) \end{cases} \]
\[ X_4 = \begin{cases} 1 & (b_2 c_1 - b_1 c_2)(b_1 - b_2) < (c_1 - c_2)^2 \\ 0 & (b_2 c_1 - b_1 c_2)(b_1 - b_2) \ge (c_1 - c_2)^2 \end{cases} \]

\[ D = b^2 - 4c \qquad (4.8) \]

Let us consider the following two quadratic equations:

\[ f_1(t) = t^2 + b_1 t + c_1 \qquad (4.9) \]
\[ f_2(t) = t^2 + b_2 t + c_2 \qquad (4.10) \]

Because a nonpositive intersection value is always meaningless in ray tracing, we are interested only in the smallest positive root of each equation. Three spherical bounding volumes (sp1, sp2, and sp3) are shown in Figure 4.7 (b). The ray R starts at the origin P with a given direction. Each bounding volume contains objects. The ray R has no intersection with sp2 or sp3, so no object in sp2 or sp3 will be intersected by the ray R. Physically, the sphere bounding volume sp1 does not meet the ray, but mathematically that volume has an intersection with the ray. The curve $f_1(t)$ shows the relationship between the physical and the mathematical interpretation. The intersection of sp1 with the ray R occurs for negative values of the parameter t. Two negative real roots of $f_1(t)$ mean that the sphere bounding volume is located behind the ray. Since we are interested only in objects that lie in the direction of the ray, we need to remove such bounding volumes at the presort stage of the ray tracing algorithm. Figure 4.7 (a) gives a clue to the identity of the unintersected bounding volumes. A negative discriminant D in equation (4.8) means that the volume has no intersection with the ray. Even where we can calculate intersections with the ray, two negative values of the parameter t are useless as intersection points in the positive ray direction, as can be ascertained from $f_1(t)$ in (a).
Figure 4.7 Unintersected bounding volumes
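The coefficient pair of Eqs. (4.5) and (4.6) and the two presort rejections discussed above can be sketched in Python as follows. This is an illustrative sketch: the function names are hypothetical, and the second rejection uses the standard root relations for $t^2 + bt + c$ (the roots sum to $-b$ and multiply to $c$), so two negative roots correspond to $b > 0$ and $c > 0$.

```python
def sphere_coefficients(view, d, center, radius):
    """Coefficients (b, c) of t^2 + b t + c = 0 for a unit direction d."""
    off = [v - x for v, x in zip(view, center)]
    b = 2.0 * sum(di * oi for di, oi in zip(d, off))
    c = sum(oi * oi for oi in off) - radius * radius
    return b, c


def survives_presort(b, c):
    """Keep a bounding sphere only if the ray can meet it at some t >= 0."""
    if b * b - 4.0 * c < 0.0:
        return False        # D < 0: no real root, the ray misses the sphere
    if b > 0.0 and c > 0.0:
        return False        # two negative roots: the sphere is behind the ray
    return True
```

Neither test needs a division or a square root, which is the property the presort stage relies on.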
The center of $f_1(t)$ is in the left half plane and its intersection with the y axis is positive. So when the coefficient pair satisfies $b_i \ge 0$ and $c_i > 0$, we know that the sphere bounding volume is behind the origin of the ray. We filter out the following two cases in a presort stage with simple tests of the quadratic function coefficients:
1. f(t) has no real root;
2. f(t) has two negative roots.

Case 1: c1 > 0 and c2 > 0

Consider the two sphere bounding volumes in the image space shown in Figure 4.8 (b). The ray R starts outside both sphere bounding volumes. The related mathematical curves are shown in (a). All roots of these quadratic equations are positive real. This means that the two bounding volumes lie in the direction of the ray and that neither contains the ray origin P. The ray R starts at P and hits sp1 at A and sp2 at B. We define depth as the length from the origin of the ray P to the nearest ray intersection position on the sphere bounding volume. For example, the length PA is the depth of sp1 and the length PB is that of sp2. The related intersection values of the parameter t in Figure 4.8 (a) are $t_A$ and $t_B$. These intersection values are proportional to the actual depths. The two bounding volumes can be overlapping or separated, as seen in Figure 4.8 (b). In either case we can compare the depths with respect to the ray path without using
Figure 4.8 Case 1
the square root function. The Appendix gives details of the mathematical steps. Using a flow chart and state variables, we summarize these mathematical steps in Figure 4.9. The table in Figure 4.9 (b) gives the selection condition $t_1$ using only the state variables:

\[ t_1 = X_1 X_3 + X_1 \bar{X}_4 + X_2 \bar{X}_3 X_4 \qquad (4.11) \]

Consider one more thing about the curves in Figure 4.8. sp1 contains part of a triangle, and sp2 contains another part of the same triangle. When we perform the intersection test for objects in sp1, we get the intersection point I. But I is not inside sp1. How do we know whether an intersection position is in a bounding volume? We call this the "inside" test. If we know the intersection values of the bounding volumes ($t_A$, $t_B$, $t_C$, $t_D$ in this example), a comparison of these values ($t_A \le t_I \le t_C$ or $t_B \le t_I \le t_D$) gives the proper information. But we want to avoid using the square root function and to use only the coefficients b and c to compare depths. The comparison indicated above does not work directly for objects enclosed in multiple spherical bounding volumes. Let us consider the meaning of $f_i(t)$. $f_i(t) = 0$ means that the ray R is on the surface of volume i at the parameter value t. $f_i(t) < 0$ means that t lies between the two intersection values, so the ray R is inside volume i. $f_i(t) > 0$ means that the ray R is outside volume i. From this
Figure 4.9 State variables and Depth comparison
(a) flow chart
(b) selection table (rows: X3X4; columns: X1X2)

X3X4 \ X1X2 |  00  |  01  |  11  |  10
00          |  t2  |  t2  |  t1  |  t1
01          |  t2  |  t1  |  t1  |  t2
11          |  t2  |  t2  |  t1  |  t1
10          |  t2  |  t2  |      |
criterion we know that the intersection happens in sp2 because f_1(t_I) > 0 and f_2(t_I) < 0, where

f_i(t) = (t + b_i) t + c_i (i = 1, 2) (4.12)

Even though we did not calculate the intersection values of the sphere bounding volumes with the ray, we can compare the depths and perform the inside test using only a pair of quadratic equation coefficients.

Case 2: c1 < 0 and c2 < 0

Each function has one positive and one negative root. This means that the ray starts in the common (overlapping) part of the two sphere bounding volumes. Figure 4.10(b) shows this situation. We defined the depth as the length from the origin of the ray to the nearest intersection point of the bounding volume, and we apply that definition to this situation. The purpose of the depth definition is to find the order of the sphere bounding volumes with respect to the ray path. Adding this concept to the depth definition, we easily avoid the complicated selection equations shown in the previous case. When we calculate the reflected color, the origin of the ray changes from the view point to the intersection point. If the intersection point is in the common part of volume 1 and volume 2, the secondary ray starts from that intersection point. In Figure 4.10 the depth of sp1 is less than that of sp2, but the ray will intersect both sp1 and sp2. We do not know how to find the exact order of these two volumes,
Figure 4.10 Case 2
since both bounding volumes contain the origin of the ray. When we compare the depths of the volumes we can select either volume, so we give the same order to both volumes by defining the depth to be zero when the sphere bounding volume contains the origin of the ray. The selection condition t_2 is a don't care.

Case 3: c1 < 0 and c2 > 0

Because function f_1(t) has a negative root and a positive root, we can infer that the ray starts from the inside of volume 1 in Figure 4.11. But function f_2(t) has two positive roots. These two curves mean that sphere bounding volume 2 does not contain the origin of the ray, but bounding volume 1 does. These sphere bounding volumes can be overlapping or separated. In this figure, even though the ray meets volume 2 at t_B, we cannot give higher priority to volume 2. Since volume 1 contains the origin of the ray, its depth is zero. So volume 1 must be selected in this comparison. The selection equation is

t_3 = 1 (4.13)

Case 4: c1 > 0 and c2 < 0

This is the opposite of the situation of Case 3. Bounding volume 2, rather than volume 1, contains the ray starting point, so the selection equation must select volume 2. This situation is shown in Figure 4.12. The selection equation is
Figure 4.11 Case 3

Figure 4.12 Case 4
t_4 = 0 (4.14)

If we define two more state variables y_1 and y_2, we can combine these four cases into a single selection condition t:

y_1 = 1 if c_1 > 0, and 0 if c_1 ≤ 0
y_2 = 1 if c_2 > 0, and 0 if c_2 ≤ 0

t = t_1 y_1 y_2 + t_3 \bar{y}_1 y_2 + t_4 y_1 \bar{y}_2 = t_1 y_1 y_2 + t_3 \bar{y}_1 y_2 = y_2 (t_1 y_1 + \bar{y}_1) (4.15)

if t = 1 then select f_1 else select f_2.

Using only the coefficients b and c, we can compare the two bounding volume depths with respect to the ray path. Repeating this comparison, we can sort the depths with respect to the ray for any pixel or for any direction.

4.5.2. Sorting

This is the process of arranging the sphere bounding volumes with respect to depth. The arrangement of sphere bounding volumes is undertaken so that succeeding processes may find the nearest object from the ray origin with fewer intersection tests than are needed in the basic ray tracing algorithm. Even though we employ many spherical bounding volumes, only a few of them intersect a given ray.
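The square-root-free depth comparison of Section 4.5.1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names are hypothetical, the four x expressions are assumed forms derived from comparing the nearest roots of f_i(t) = t^2 + b_i t + c_i without a square root, and the sign conventions (x_1, x_3, x_4 true when their expression is negative, x_2 true when its expression is nonnegative) are likewise assumptions chosen to agree with the selection table of Figure 4.9.

```python
def state_variables(b1, c1, b2, c2):
    """Sign-based state variables for f_i(t) = t*t + b_i*t + c_i.

    The expressions and sign conventions are assumptions (see lead-in).
    """
    x1 = (b2 - b1) < 0
    x2 = ((b1 * b1 - b2 * b2) - 4 * (c1 - c2)) >= 0   # D1 - D2 >= 0
    x3 = (b1 * b2 - 2 * (c1 + c2)) < 0
    x4 = ((b2 * c1 - b1 * c2) * (b1 - b2) - (c1 - c2) ** 2) < 0
    y1 = c1 > 0   # ray origin lies outside volume 1
    y2 = c2 > 0   # ray origin lies outside volume 2
    return x1, x2, x3, x4, y1, y2


def select_first(b1, c1, b2, c2):
    """Return True when volume 1 should be ordered before volume 2.

    Both volumes are assumed to have passed the presort stage
    (real roots, not entirely behind the ray origin).
    """
    x1, x2, x3, x4, y1, y2 = state_variables(b1, c1, b2, c2)
    # Case 1 selection condition: both origins outside, compare nearest roots.
    t1 = (x1 and x3) or (x1 and not x4) or (x2 and not x3 and x4)
    # Combination of the four cases: t = y2 * (t1 * y1 + not y1).
    return y2 and ((t1 and y1) or not y1)
```

For example, a volume whose roots are t = 2 and t = 5 (b = -7, c = 10) is ordered before one whose roots are t = 3 and t = 4 (b = -7, c = 12), and a volume that contains the ray origin (c < 0) is always ordered before one that does not.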
The sort is performed with the exchange algorithm [Lor 75]. The inputs of this sort are the number of intersected bounding volumes n and n sets of three numbers (the volume identifier and the coefficients b, c), each of which represents a sphere bounding volume. This sort algorithm is presented in Figure 4.13.

4.5.3 Ray Tracing Algorithm

Ray tracing is an algorithm that works entirely in object space. At a given point in the image plane, the visible surfaces are obtained by tracing a ray backwards from the eye through the imaging point into the scene. If this ray intersects an object, then local color calculations determine the color that results from illumination at that point. This is light from the light sources directly reflected at the surface. If the object is partially reflective, partially transparent, or both, then the color of the point in the image plane should include a contribution from reflected and transmitted rays. These must be traced backwards to discover their origin, and hence the light they contribute. Determining a color for each of these rays may require the tracing of further rays and other intersections with objects. However, the ray tracing algorithm spends most of its time in the intersection calculations. To relieve this bottleneck, the new algorithm partitions the object space and bounds the partitioned space using sphere bounding volumes at the ray tracing
procedure SORT (sort, n)
array sort[n];
integer n;
begin
  do i = 1 to n
    select ← 0;
    temp[select].set ← sort[i];
    temp[select].id ← i;
    if (i = n) return (sort);
    do j = i+1 to n
      temp[!select].set
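The linear selection with exchange described in Section 4.5.2 can be sketched in Python as below. This is a hedged sketch, not a transcription of Figure 4.13: the `Candidate` record, the function names and the pluggable comparator are illustrative assumptions, and the demonstration comparator computes the nearest root with an explicit square root purely as a stand-in for the square-root-free test of Section 4.5.1.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    vol_id: int   # identifier of the sphere bounding volume
    b: float      # coefficient b of t^2 + b*t + c
    c: float      # coefficient c of t^2 + b*t + c


def depth_with_sqrt(v: Candidate) -> float:
    """Reference depth: 0 if the ray origin is inside, else the nearest root."""
    if v.c <= 0:
        return 0.0
    return (-v.b - math.sqrt(v.b * v.b - 4.0 * v.c)) / 2.0


def reference_nearer(p: Candidate, q: Candidate) -> bool:
    # Stand-in comparator (assumption); a sqrt-free test can be swapped in.
    return depth_with_sqrt(p) < depth_with_sqrt(q)


def sort_by_depth(cands: List[Candidate],
                  nearer: Callable[[Candidate, Candidate], bool]) -> List[Candidate]:
    """Linear selection with exchange: keep the index of the nearest volume
    seen so far, then exchange it with position i."""
    out = list(cands)
    n = len(out)
    for i in range(n - 1):
        nearest = i
        for j in range(i + 1, n):
            if nearer(out[j], out[nearest]):
                nearest = j
        out[i], out[nearest] = out[nearest], out[i]
    return out


volumes = [Candidate(1, -13.0, 30.0), Candidate(2, -6.0, 8.0), Candidate(3, -9.0, 8.0)]
ordered_ids = [v.vol_id for v in sort_by_depth(volumes, reference_nearer)]
```

Passing the comparator as a parameter keeps the sort independent of how two depths are compared, which mirrors the separation between the sorter and the state-variable comparison in the text.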
initialization stage. In calculating the intersection with an object, the new algorithm first performs a ray intersection test with the sphere bounding volumes. After these tests, the new algorithm sorts the sphere bounding volumes intersected by the ray with respect to depth. After sorting the volumes, a ray intersection test is performed with the objects in the first sorted sphere bounding volume. Once the algorithm finds the nearest object from the ray origin, the intersection test is exited. If not, the ray intersection test continues with the objects in the following bounding volumes until the last bounding volume is processed. Figure 4.14 shows the basic ray tracing algorithm. The suggested new algorithm is presented in Figure 4.15.

Trace, shade and intersect routines form the heart of the ray tracing algorithm. The trace routine is emphasized in Figure 4.15 because the shade and intersect routines can be the same as in the basic ray tracing algorithm. Each of these algorithms works with a set of object primitives, often just a collection of triangular surface facets. We can define many kinds of primitives using simple mathematical equations. Some primitives cannot be bounded by a finite number of sphere bounding volumes because they are infinite objects. For example, the xy plane or a cylinder defined by its radius and orientation (without giving a length) cannot be bounded by a finite number of spherical bounding volumes. If we consider such primitives, we need to employ two lists, one for sphere
procedure Ray_trace (start, direction, depth, color)
vector : start, direction
integer : depth
colors : color
begin
  vector : intersection_point, reflected_direction, transmitted_direction
  colors : local_color, reflected_color, transmitted_color
  color

procedure Ray_trace (start, direction, depth, color)
vector : start, direction
integer : depth
colors : color
begin
  vector : intersection_point, reflected_direction, transmitted_direction
  colors : local_color, reflected_color, transmitted_color
  if depth > maxdepth then color ← black
  else begin
    color
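The traversal order described in Section 4.5.3 (test the bounding volumes, sort the ones that are hit, test objects volume by volume, apply the inside test, and stop at the first volume that yields a hit) can be sketched as follows. This is an illustrative Python sketch under several assumptions: only sphere primitives are modelled, the ray direction is a unit vector, the class and function names are hypothetical, infinite objects are omitted, and the sort uses an explicit square root for brevity rather than the coefficient-only comparison of Section 4.5.1.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class Sphere:
    center: Vec
    radius: float

    def coeffs(self, origin: Vec, direction: Vec) -> Tuple[float, float]:
        """Coefficients (b, c) of f(t) = t^2 + b*t + c for a unit direction."""
        oc = sub(origin, self.center)
        return 2.0 * dot(direction, oc), dot(oc, oc) - self.radius ** 2

    def hit(self, origin: Vec, direction: Vec) -> Optional[float]:
        """Nearest positive ray parameter, or None on a miss."""
        b, c = self.coeffs(origin, direction)
        disc = b * b - 4.0 * c
        if disc < 0:
            return None
        t = (-b - math.sqrt(disc)) / 2.0
        return t if t > 1e-9 else None


@dataclass
class BoundingVolume:
    bound: Sphere
    objects: List[Sphere]


def nearest_hit(origin: Vec, direction: Vec, volumes: List[BoundingVolume]):
    # Presort stage: keep volumes with real roots that are not behind the ray.
    cands = []
    for vol in volumes:
        b, c = vol.bound.coeffs(origin, direction)
        disc = b * b - 4.0 * c
        if disc >= 0 and (b < 0 or c < 0):
            depth = 0.0 if c <= 0 else (-b - math.sqrt(disc)) / 2.0
            cands.append((depth, vol, b, c))
    cands.sort(key=lambda entry: entry[0])
    # Visit volumes in depth order; leave at the first volume with a hit.
    for _, vol, b, c in cands:
        best = None
        for obj in vol.objects:
            t = obj.hit(origin, direction)
            # Inside test: the hit lies in this volume when f(t) <= 0.
            if t is not None and (t + b) * t + c <= 0:
                if best is None or t < best[0]:
                    best = (t, obj)
        if best is not None:
            return best
    return None


near = BoundingVolume(Sphere((0.0, 0.0, 5.0), 2.0), [Sphere((0.0, 0.0, 5.0), 1.0)])
far = BoundingVolume(Sphere((0.0, 0.0, 12.0), 2.0), [Sphere((0.0, 0.0, 12.0), 1.0)])
result = nearest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [far, near])
```

In this example the volumes are supplied in far-to-near order, and the sort places the nearer volume first so that its object is found without testing the object in the farther volume.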
bounding volumes and the other for infinite objects. The ray intersection test needs to consider both lists. We find the nearest object from each list and decide the nearest object overall by comparing the two intersection values. We must also apply the inside test to finite objects. These points are handled by the new algorithm outlined in Figure 4.15.

4.6 Data Structure

Many kinds of data types are required to process ray tracing algorithms efficiently. To describe the objects in the image space we need to define and link primitives. Vector operations are also required in the ray tracing algorithm. The basic data structures for ray tracing were studied by Heckbert [Hec 89]. Our algorithm is designed to avoid testing the ray for intersection with every object. This new algorithm employs bounding volumes and allows objects to overlap between bounding volumes. By defining new data structures for the primitives and the link list, we can save memory space and reduce ray tracing execution time.

4.6.1. Primitives

One of the goals of writing a ray tracing program is to make it easily extensible so that anybody can use it to try out various advanced ray tracing techniques. This requirement calls for object oriented programming. Rather than group the
software by procedures, we group it by data structures. Thus instead of collecting all the intersection methods into one file and all the normal vector formulas into another, we split the problem another way, by collecting the procedures for each primitive type into a file of its own. For example, there is one file containing sphere related routines, another file for polygon related routines, etc. This has the advantage that primitive-dependent information can be hidden in data structures local to each file, and the procedure interfaces can be very simple and generic. Adding new primitives to the system becomes easy with this scheme. Since the details of each primitive's data structure are local to that submodule, all operations on the primitive must be supported by generic procedures, the most important of which are procedures for intersection, for normal calculations, and for reading the specification for a primitive [Hec 89].

We must also check the ray field descriptor in the object to determine whether the object was hit by the ray or not. Figure 4.5 shows two primitives in bounding volume 1. The test result is that ray R2 misses object A. The next sphere bounding volume in the order produced by the depth sort algorithm is volume 2, so the ray intersection test for the objects in volume 2 is the next procedure. At this stage we know that ray R2 does not meet object A as a result of the previous stage of the calculations. However, the ray tracing algorithm does not keep track of previous intersections, so the ray tracing
algorithm again performs the ray intersection test for object A. The ray intersection test takes more time than a comparison operation. By employing a ray identification field in the primitive, we can overcome this problem. When the ray R2 is to be tested for intersection with object A, the algorithm checks whether the ray id field of object A is R2 or not. If the field is not R2, then the algorithm performs the ray intersection test with object A. If the ray misses object A, then the algorithm writes R2 in the ray id field of object A. If the ray hits object A in another volume (i.e., a ray intersection with object A happened but the inside test failed), then the algorithm does not write R2 in the ray id field of object A. In volume 2, ray R2 does not need the intersection test with object A: when the algorithm checks the ray id field, it finds that object A has been tested in a previous volume and that ray R2 missed it. Thus we can avoid unnecessary intersection tests by employing a ray id field with each finite primitive.

4.6.2 Link List

There are three kinds of objects in the new algorithm. One is the finite object, which can be bounded by a finite number of sphere bounding volumes, and the second is the infinite object, which cannot be bounded by a finite number of sphere bounding volumes. The last is a light list. Finite objects
are partitioned by bounding procedures and each sphere bounding volume has its own link list. The new depth sort algorithm allows objects to overlap between sphere bounding volumes. Figure 4.16 shows four objects which are bounded by three sphere bounding volumes. To represent this in the image space we use eight objects, two sets of A, B, C, D. Each object needs more memory than two pointer fields, because each object has its own primitive specification fields as well as pointer fields for the intersect procedure and the normal vector procedure. This kind of link structure is suitable for a small number of objects in an image space. For a large number of objects in an image space we need to employ a new link structure to save memory space. Such a new link structure is presented in Figure 4.16(c). There are object lists which link every finite object in the image space. Each sphere bounding volume is linked in the new data structure. The new data structure has two fields: one links to the next item, and the other points to the corresponding object. So instead of eight objects, eight pointers are enough to represent the objects in the image.

4.6.3 Sorting

The sort procedures require two buffer registers to compare the depths of two sphere bounding volumes. Each sphere bounding volume in the sort procedure is represented by three items: the two coefficients b, c of the quadratic equation and an
Figure 4.16 Data Structure of New Algorithm
identifier of the sphere bounding volume. The candidate bounding volumes are stored in an array in memory. Because the sorting strategy is a linear selection with exchange, we need to keep the array index of the nearest volume during the comparison procedure. The data structure for the two buffer registers consists of a triple (b, c, id) for each register and the array position of the nearest volume. Two buffer registers make the exchange between two sphere bounding volumes easy.

4.7 Hardware Consideration

4.7.1 Introduction

The basic idea of the new algorithm is to reduce the number of ray intersections with objects in an image space that need to be considered, because the ray tracing algorithm spends most of its execution time in the intersection stage. We employ sphere bounding volumes to find the nearest object. Since the sphere bounding volume strategy has the void volume problem, we allow an object to overlap a few sphere bounding volumes. This requires many intersection point calculations of sphere bounding volumes to sort depths with respect to the ray path. We know that the nearest bounding volume has a high probability of containing the nearest object. We need not calculate the intersection points of the bounding volumes; to sort depths and reduce execution time, we only need to determine the intersection status, that is, whether the ray hits a bounding volume or not.
When we employ a small number of bounding volumes, each bounding volume contains more objects than when we employ many bounding volumes, and the ray-object intersection calculation time will be considerable for each sphere bounding volume. When we employ a large number of bounding volumes, the void area of each bounding volume will be smaller than when we employ a small number of bounding volumes. We already discussed a possible hardware implementation of the ray-object depth sorter in Section 4.2. The bottleneck caused by employing a large number of bounding volumes can be solved in the design of the hardware. Since this hardware only considers the sphere bounding volumes, it is unaffected when we employ new primitives to describe image parts. So using this hardware, we can extend the ray tracing algorithm through software. The hardware performs two functions:

1. a ray intersection test with every sphere bounding volume for the given ray information (direction, origin);
2. depth sorting of the output of item 1.

Whenever the ray direction changes, the intersection part must perform tests for all bounding volumes. This task may be computationally very intensive because there are many rays which must be considered during the execution of the ray tracing algorithm. Since the intersection part and the depth test procedure are always in a fixed order, we can implement this intersection part as a pipeline.
But the output of the intersection part is not stable. For example, in some cases the ray does not hit any sphere bounding volume, and in some cases the ray hits some of the sphere bounding volumes. Depth sorting of the sphere bounding volumes is performed on the output of the intersection part (the presort part). The number of bounding volumes being sorted varies for each ray, and the time for sorting is not fixed. So a direct connection between the ray intersection part and the depth sorter would not be a good design. There must be an array of memory between them to act as a buffer. This array is used for output storage of the intersection part and also as input to the depth sorter. Figure 4.17 shows the block diagram of this hardware. The contents of the sphere bounding volume array in Figure 4.17 do not change during the execution of ray tracing; those values are assigned at the initialization stage. Whenever the ray information changes, the contents of the buffer change.

4.7.2 The Ray Intersector

This device performs the ray intersection tests with all spherical bounding volumes. Since the information about these volumes is always used for testing ray intersections, the information is stored in a memory-resident array. The information consists of the center of the sphere, the radius of the sphere, and a number that identifies the sphere.
Figure 4.17 Block Diagram of Hardware. Blocks: array of SBVs (entries i, V_i, R_i for i = 1, ..., n), ray intersector, buffer (entries b_i, c_i, id_i), depth sorter. SBV: sphere bounding volume. Input: d (ray direction), x (ray origin). Output: sorted SBVs.
The ray intersector has two parts, as shown in Figure 4.18(a): the Bounding Volume Coefficient Generator (BVCG) and the ray intersector itself. The design details of the BVCG are given in Figure 4.18(b). A ray is defined by two pieces of information: the origin and the direction. These pieces of information are used to generate the coefficients of the quadratic equations (4.5) and (4.6) for each bounding sphere. To obtain b from b/2 we perform a left shift, an operation which can be implemented by shifting the wiring (or merely labeling the bus wires with a one-higher subscript). Five steps are required to get b; the coefficient c requires four steps. Each row in the figure represents one step. When we implement these steps as parallel pipeline elements, the coefficients are produced at the same pipeline stage.

Using these two coefficients (b and c), the root position test (RPT) of the ray intersector hardware tests whether both roots are real or not. When the root tests show possible real intersection(s), the coefficients and the volume identifier are moved to a buffer for the sorting calculations. The discriminant D in Equation (4.8) yields information on whether the roots are real or not. The most important part of that calculation is the sign of the result, because a nonnegative sign means that the ray hits the bounding volume. This makes the hardware needed for the determination of real roots very simple.
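The coefficient generation and the root position test can be sketched in software as follows. Equations (4.5), (4.6) and (4.8) are not reproduced in this section, so the sketch assumes the usual forms for a unit ray direction d, ray origin x, sphere center V and radius R: b/2 = d · (x - V), c = |x - V|^2 - R^2, and D/4 = (b/2)^2 - c. The function names are hypothetical, and the switch condition (real roots, and either b < 0 or c < 0) is an assumed reading of the sign-bit relation discussed in the text.

```python
from typing import Sequence, Tuple


def bounding_coefficients(origin: Sequence[float], direction: Sequence[float],
                          center: Sequence[float], radius: float) -> Tuple[float, float]:
    """Return (b, c) of f(t) = t^2 + b*t + c for a unit-length direction."""
    oc = [o - v for o, v in zip(origin, center)]            # x - V
    half_b = sum(d * e for d, e in zip(direction, oc))      # b/2 = d . (x - V)
    c = sum(e * e for e in oc) - radius * radius            # |x - V|^2 - R^2
    return 2.0 * half_b, c                                  # doubling is a shift in hardware


def root_position_switch(b: float, c: float) -> bool:
    """True when the volume should be passed on to the depth sorter."""
    quarter_d = (b / 2.0) ** 2 - c       # D/4 has the same sign as D
    # Real roots, and the volume is not entirely behind the ray origin.
    return quarter_d >= 0 and (b < 0 or c < 0)


origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
ahead = bounding_coefficients(origin, direction, (0.0, 0.0, 5.0), 2.0)
behind = bounding_coefficients(origin, direction, (0.0, 0.0, -5.0), 2.0)
aside = bounding_coefficients(origin, direction, (5.0, 0.0, 0.0), 2.0)
around = bounding_coefficients(origin, direction, (0.0, 0.0, 1.0), 2.0)
```

Only the signs of D/4, b and c are inspected, which is why the text describes the corresponding hardware as very simple.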
Figure 4.18 Block Diagram of Ray Intersector: (a) overview (BVCG: Bounding Volume Coefficient Generator; RPT: Root Position Tester); (b) BVCG; (c) RPT; (d) pipelined ray intersector
We can obtain the values of b/2 and c directly from Figure 4.18(b). The details of the design of the RPT are shown in Figure 4.18(c). By using D/4 we can reduce the pipeline stages by one. To test for the root position in the RPT module, we need three inputs (the sign of b/2, the sign of D, and the sign of c). Different implementations may use varying formats for mantissa length, exponent length, radix, encoding of negative numbers, and possible use of a floating point hidden bit. However, in microcomputer systems the 1985 IEEE floating point standard is becoming widely established. In this standard the sign bit is zero for a nonnegative number, and it is one when the number is negative. Figure 4.19 shows the relationship between the sign bits and the related quadratic curves. From the associated table (a) we derive the switch relationship:

switch = \bar{D} b + \bar{D} c = \bar{D}(b + c) (4.16)

When the switch is TRUE, the coefficients b and c, as well as the volume identifier, are moved to the output buffer of this stage. When the switch is FALSE, the output will be overwritten by the result of the next test.

A serial connection of the BVCG and the RPT does not complete the intersection pipeline. Each bounding volume must be processed at the same level of this pipeline. To keep the relevant information we need to employ registers in the BVCG as well as in the RPT. Also in the Ray Intersector registers are
Figure 4.19 Relationship between sign bits and the related quadratic curve

(a) switch = \bar{D} c + \bar{D} b = \bar{D}(b + c)

D \ bc    00   01   11   10
0         0    1    1    1
1         0    0    0    0
needed to keep track of the volume identifier numbers. With these considerations, the ray intersector pipeline is implemented as shown in Figure 4.18(d). The pipeline has six stages. The first stage forms the difference between the ray origin and the center of the sphere. The coefficients b and c in equations (4.5) and (4.6) are generated in the second through fourth stages. The discriminant equation (Eq. 4.8) is evaluated in the fifth and sixth stages. Also in the sixth stage the root position test is performed. The second and fifth stages are multipliers and the other stages are adders/subtracters.

When this pipeline is implemented we need to consider the throughput of the pipeline and the required machine cycle times. The throughput is the rate of completion of instructions in the pipeline. Since the pipeline stages are sequential, all stages must operate at the same cycle time. The cycle time is the instruction completion time plus the time required to initialize the next stage. The maximum rate of this operation is determined by the slowest pipeline stage. Multiplication could be accomplished with a massively parallel multiplier. However, hardware considerations normally limit multiplication hardware to no more than repeated additions. Hence we can infer the slowest pipeline stage. Stage six consists of a subtracter and two levels of logic gates. Generally subtraction is much faster than multiplication, so the work in stage six can be performed
during the multiplication time needed in the other stages. Hence the performance of the ray intersector will be limited by the multiplier times required in stages two and five. The ray intersection hardware produces one ray intersection result per multiplication execution time. Hence to process N bounding spheres, the hardware will need time equivalent to approximately N multiplications.

4.7.3 The Depth Sorter

We can easily compare the depths of volumes from the characteristics of the spheres, the centers and radii. The basic idea of the depth sorter is embedded in the state variables of Section 4.5.1. The block diagram of the depth sorter is shown in Figure 4.20.

The information about a sphere bounding volume which intersects the ray is fetched from the output buffer of the previous stage and moved to one of the input registers (tempA or tempB). The selection of which register to use is made by the output of the Select Module (SM). When the output of the SM is 0, tempA is used; otherwise tempB is used. The input register receives the information about the bounding volume (which intersected the ray). Depth comparisons continue until the last buffer entry is received. The control unit issues an "end-of-list" signal on completion of the bounding volumes list. The results from the depth comparisons are stored in
Figure 4.20 Block Diagram of Depth Sorter (SVG: State Variable Generator)
a buffer through a 2:1 MUX. Figure 4.13 explains the data flow and the data storage in the depth sorter.

The key function of the depth sorter is depth comparison. There are two modules involved in this. One module (SVG) generates the state variables x_1, x_2, x_3, x_4, y_1, and y_2. The state variables are generated from the results of the earlier calculations. Using these state variables, the other module (Select) selects the particular temp register (tempA or tempB).

Figures 4.21 (a), (b), (c) and (d) show the calculation of the state variables. x_1 results from a simple subtraction between the two b's. The calculations for x_2 and x_3 require three steps. Each row represents one step. Figures 4.21 (b) and (c) indicate left shift operations. Twice an integer value is that value left shifted by one place. For floating point values the same is achieved by adding one to the exponent of a radix-2 floating point number. Although the algorithms indicate multiplications, we get the same results from shifting (which is equivalent to bus-wire rerouting) or from a short-word integer addition. The state variable x_4 is the result of four operations. This is the most complicated of the calculations that go into producing the state variables. The time taken by the state variable generator depends on the calculation of x_4.
Figure 4.21 State Variables x_1 and x_2

Figure 4.21 Continued (x_3 and x_4)
The following are the operations necessary for the state variables x_1 through x_4:

x_1 = b_2 - b_1 (4.17)
x_2 = (b_1^2 - b_2^2) - 4(c_1 - c_2) (4.18)
x_3 = b_1 b_2 - 2(c_1 + c_2) (4.19)
x_4 = (b_2 c_1 - b_1 c_2)(b_1 - b_2) - (c_1 - c_2)^2 (4.20)

Since the comparisons of the state variables are always with the value zero, we can use the sign bits of x_1, x_3, and x_4 directly in the indicated operations. For x_2 the sign needs to be inverted, potentially a trivial hardware solution using at most a single gate. The variables y are treated similarly to the variables x. With IEEE floating point numbers, in the representation of zero the mantissa part is always zero, and when the number is not zero, the first bit of the mantissa is not zero. Using this property the state variables y are:

y = \bar{s} f (4.21)

where
s: sign bit of coefficient c
f: first bit of c's mantissa part

The relationship between the state variables x_1 through x_4 is summarized in Figure 4.21. In the pipeline, the state
Figure 4.22 State Variable Y: (a) truth table for y = \bar{s} f; (b) diagram labeled mantissa, exponent, y
variable y_1 is obtained from c_1 in tempA and y_2 is obtained from c_2 in tempB. The relationships between the various x state variables are presented in Figure 4.23. This figure also details the steps we need to take to design the SVG (State Variable Generator) as a pipeline. The process requires four steps. Note that x_1 is completed in stage 1, but x_4 is not complete until stage 4. Hence the selection cannot be completed until stage 4 is finished. Because of the basic characteristics of a sort function, it is not easily implemented in a pipeline.

The six state variables are generated in the SVG module, and those six are input into the Select module, which performs simple logic operations. Figure 4.24 presents the logic diagram of the Select module. We replace the AND gate A in Figure 4.24 with an XOR gate by replacing the negation gate in Figure 4.23. Thus the depth sorter in Figure 4.23 will sort the depths of the sphere bounding volumes using only two modules, SVG and Select.

The ray tracing algorithm uses only information in memory buffers where information about those depth sorted volumes is stored. The number of bounding volumes which are employed in the ray tracing algorithm depends on the characteristics of the hardware; ideally it should hold in its own buffers all the information needed for all images it is to handle.
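Reading a state variable directly from the bits of a floating point coefficient can be illustrated in Python with the standard `struct` module. This is a sketch under assumptions rather than the hardware design: it uses the IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 fraction bits), and because that format keeps the leading mantissa bit implicit, the zero test here checks whether any exponent or fraction bit is set instead of a literal first mantissa bit. The function names are hypothetical.

```python
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x stored as an IEEE 754 single precision number."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def sign_bit(x: float) -> int:
    """1 for a negative number (or -0.0), 0 otherwise."""
    return float32_bits(x) >> 31


def nonzero_bit(x: float) -> int:
    """Stand-in for f: 1 when x is not zero (any exponent or fraction bit set)."""
    return 1 if (float32_bits(x) & 0x7FFFFFFF) != 0 else 0


def state_y(c: float) -> int:
    """y = (not s) and f, i.e. 1 exactly when the coefficient c is positive."""
    return (1 - sign_bit(c)) & nonzero_bit(c)
```

With these helpers, `state_y(c)` is 1 for a positive c and 0 for zero or a negative c, which matches the definition of y_1 and y_2 in terms of the sign of c.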
Figure 4.23 Relationship between Variables

Figure 4.24 Select Module
CHAPTER 5
EXPERIMENTAL RESULTS

The main problem of the ray tracing algorithm is the number of ray intersections with the various objects in the scene. To find one intersection point, the ray must be tested for intersection with all the objects in the image space. This is the most severe bottleneck in the ray tracing algorithm. The traditional fast algorithms attack this bottleneck by employing bounding volumes or by partitioning the image space. Each algorithm has its own limitations, as explained in Chapter 3. To overcome these problems (including the problem of each fast algorithm and that of the bottleneck), we suggested a new algorithm in Chapter 4 for the general ray tracing procedure. In this chapter we present the results of extensive simulations of the performance of the proposed algorithm. We pay special attention to the following two aspects:

1. the location of the bottleneck in the procedure;
2. the performance of the new algorithm at the bottleneck.

This simulation was performed on an RS6000 workstation. To compare the performance of the original ray tracing algorithm and the new algorithm, we employ simple primitives.
The primitives are the sphere, the plane, and the triangle. In the simulation we employ two models. One model consists of 2 planes and 264 spheres; the other model consists of 4096 triangles. The big difference between the two models is that one has a small number of objects in the image space and the other has a large number of objects. The simulation is first performed using a small image resolution (80x50), then the tests are repeated with a much higher resolution (640x400). Analyzing the two sets of results, we find relations between the two models. For convenience we give a name to each model. The first model, which has the small number of primitives, is called "trees" (see Figure 5.1). The second model, shown in Figure 5.2, is called "delta". The resolution of both displayed pictures is 1024 x 768. Before we calculate these pictures we simulate both at a lower resolution. Because computation of a high resolution image takes a long time, we start from a small resolution in order to save computation time (and to check for possible program errors).

In these simulations, we measure the initialization time, the ray tracing time and the number of ray intersections with the objects for each model and given image resolution. Before starting ray tracing, the algorithm needs to read the objects and link them to each other or to bounding volumes. The time taken by this step is called the initialization time.
Figure 5.1 Trees

Figure 5.2 Delta
After initializing the environment, the algorithm traces the rays to draw the entire image on the screen. The time taken by this step is called the ray tracing time. Generally the ray tracing time consists of four parts:

1. ray time: total time for making new rays;
2. trace time: total time for ray-object intersections;
3. shade time: total time for color selection at the ray intersection point;
4. write time: total time for writing the output to a file.

To measure the times listed above, we use two different times (calendar time and process time). We record the start time and the end time; the difference between these two times gives the process time. In some cases the measured total process time overflows because the process time resolution is very fine and the processing time is very long. In these cases we use the calendar time (wall-clock time) to measure the performance.

On a workstation there are many processes running on the system. Even though we remove the unnecessary processes in order to measure the exact processing time, there are still some processes which will be included in the system time. Since the CPU distributes its time among the processes, the measured process time shows some variation for the same job. For example, the initialization time in Figure 5.3
must be the same as the initialization time in Figure 5.6. (The figures are grouped at the end of this chapter.) But a slight difference is found when we compare the corresponding two columns in the tables of measurements. This small deviation is not an important factor when we analyze the performance of each algorithm. The dominant factor (trace time) is not affected by this deviation. The measured calendar time is longer than the process time, because the calendar time also includes the time spent in the clock function calls that report the process time.

In the tables (Figures 5.3, 5.6, 5.9, and 5.12), a "0" for the number of bounding volumes means that ray tracing is performed by the original algorithm.

Based on Section 4.4, automatic bounding is performed on the model "trees". When we compare the initialization times for the two resolutions (80x50, 640x400), we find that the initialization stage is not affected by the resolution, whereas the ray tracing time depends on the image resolution. For a small window (80x50), initialization takes a large portion of the whole process time. For example, initialization is about 10% of the whole processing time when many bounding volumes are used, as in the last row of Figure 5.3. The initialization time shown in the last row of Figure 5.6 is a much smaller portion of the total processing time than that shown in Figure 5.3. Since the initialization stage is performed in the image space, this time is not affected by resolution but is affected by the
number of bounding volumes. The amount of initialization time is almost constant across resolutions.

The process time consists of four components, as remarked before. Figure 5.4(a) compares the initialization time and each component of the ray tracing time. Figure 5.4(b) compares the same quantities as Figure 5.4(a) except the trace time. The other factors are not significant in Figure 5.4(b). The same holds in Figures 5.7 and 5.13.

When we compare the corresponding four columns in Figures 5.3 and 5.6, trace time is found to be the dominant factor in the ray tracing calculations. In the traditional ray tracing algorithm, ray-object intersection tests are done during trace time. In the new algorithm, not only the ray-object intersection tests but also the sorting of bounding volumes is performed during trace time.

In the traditional ray tracing algorithm, trace time depends on the number of ray-object intersections. As we employ more bounding volumes, there are fewer ray-object intersections. The ray tracing time (and likewise the trace time) decreases as the number of bounding volumes increases, until some limiting number of bounding volumes is reached. When the number of bounding volumes is increased further, the process time increases. The reason is that the sorting procedure for the bounding volumes then takes more time than the ray-object intersection tests. Figure 5.5 shows the relation between trace time and the number of ray-object intersections.
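The timing scheme described in this chapter (process time and calendar time recorded per phase as end time minus start time) can be illustrated with a short Python sketch. This is not the C instrumentation used in the simulations: the class name PhaseTimer and the stand-in workloads are hypothetical, and Python's time.process_time and time.perf_counter are assumed here as substitutes for the RS 6000 process clock and the calendar clock.

```python
import time


class PhaseTimer:
    """Accumulate process (CPU) time and wall-clock time per named phase."""

    def __init__(self):
        self.cpu = {}   # phase name -> accumulated process time in seconds
        self.wall = {}  # phase name -> accumulated wall-clock time in seconds

    def measure(self, phase, func, *args):
        # Record both clocks before and after the call; the differences
        # are added to the running totals of the given phase.
        cpu_start, wall_start = time.process_time(), time.perf_counter()
        result = func(*args)
        self.cpu[phase] = self.cpu.get(phase, 0.0) + time.process_time() - cpu_start
        self.wall[phase] = self.wall.get(phase, 0.0) + time.perf_counter() - wall_start
        return result


# Hypothetical stand-ins for two of the four phases (ray, trace).
timer = PhaseTimer()
rays = timer.measure("ray", lambda: list(range(10000)))
hits = timer.measure("trace", lambda: [r * r for r in rays])

for phase in ("ray", "trace"):
    print(phase, "cpu:", timer.cpu[phase], "wall:", timer.wall[phase])
```

Because other processes share the CPU, repeated runs of such a sketch give slightly different values, which is the same variation noted above for the initialization times.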
The basic ideas of the new ray tracing algorithm are the following:

1. the sort procedure for bounding volumes is performed by the Depth Sorter (hardware);
2. ray-object intersection is performed by software.

Thus in the new algorithm the sorting time can be ignored if we employ a reasonably small number of bounding volumes. The trace time (and likewise the ray tracing time) then depends only on the number of ray-object intersections. When light sources are in the image space, the time taken for shading calculations depends on the number of ray-object intersections, because the procedure for checking whether the intersection point is in shadow requires ray-object intersections. Ray time and write time do not depend on the number of bounding volumes.

In this simple model, "trees", we did not put any light sources in the space occupied by the objects. Shade time is therefore not affected by the number of bounding volumes. Using software simulation only, we reduce the ray tracing time; it is three times faster than that of the traditional ray tracing algorithm. If we put a few light sources in the image space, we get more efficient results than those of Figures 5.3 and 5.6.

The model "delta" has 4096 triangles. The image of this model is calculated for two resolutions (80x50, 640x400). To keep the comparisons simple we did not put light sources in this model. As we
expected, the number of ray-object intersections decreases when we increase the number of bounding volumes. This is shown in Figures 5.11 and 5.14. Because of the behavior of the sort procedure for bounding volumes, trace time increases when the number of bounding volumes exceeds 442. Using only software simulation we get a result 16 times faster than that of the traditional algorithm for the model "delta". When the number of bounding volumes is 794, the initialization time is longer than the tracing time in Figure 5.9, but this initialization time is not a large portion of the total time shown in Figure 5.12. The initialization time depends not only on the number of bounding volumes but also on the number of objects in the image space.

In comparing the pictures produced by the standard procedure and by the new algorithm, yet another test must be made: we must compare the pictures pixel by pixel to assess their fidelity. In the following we discuss the results of this comparison. Two image files are generated, one by the traditional ray tracing algorithm and one by the new algorithm. The resolution of both images is 1024x768. The new algorithm uses 42 bounding volumes to produce the image file. Each pixel of an image file is a 24-bit color specification. We compare the two files pixel by pixel. The number of color pixels is 176,703. Of these color pixels, 263 are different. To analyze these pixels, we highlight them in white as shown in Figure 5.16.
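The pixel-by-pixel comparison can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program used for Figure 5.16: images are represented as nested lists of (r, g, b) tuples, the function name compare_images is hypothetical, and a "color pixel" is assumed to mean a pixel that differs from the background color in at least one of the two images.

```python
WHITE = (255, 255, 255)


def compare_images(img_a, img_b, background=(0, 0, 0)):
    """Compare two equally sized images of 24-bit (r, g, b) tuples.

    Returns (n_color, n_diff, diff_image), where diff_image is a copy of
    img_a in which every differing pixel is highlighted in white.
    """
    if len(img_a) != len(img_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")

    n_color = 0   # pixels that are not background in at least one image
    n_diff = 0    # pixels whose colors differ between the two images
    diff_image = []
    for row_a, row_b in zip(img_a, img_b):
        out_row = []
        for pix_a, pix_b in zip(row_a, row_b):
            if pix_a != background or pix_b != background:
                n_color += 1
            if pix_a != pix_b:
                n_diff += 1
                out_row.append(WHITE)
            else:
                out_row.append(pix_a)
        diff_image.append(out_row)
    return n_color, n_diff, diff_image


# Tiny 2x2 example: the two images disagree in one pixel only.
RED, BLUE, BLACK = (255, 0, 0), (0, 0, 255), (0, 0, 0)
image_traditional = [[BLACK, RED], [BLUE, BLACK]]
image_new = [[BLACK, RED], [RED, BLACK]]
n_color, n_diff, diff = compare_images(image_traditional, image_new)
print(n_color, n_diff)
```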
Aliasing and round-off error are the main reasons for the difference. Let us consider the aliasing problem using Figure 5.15. Two planes A and B intersect at I, and the ray R hits the intersection point I. If plane A is red and plane B is blue, what color must be selected by the ray R? If the objects are linked by linklist 1, then the ray R selects the color of A. If the objects are linked by linklist 2, the ray R returns the color of B. If the traditional algorithm uses the order of linklist 1 and the new algorithm uses linklist 2, a pixel difference occurs at the intersection point when we compare the two files. 144 pixels of 176,703 pixels is 0.08% of the total color pixels. This number is reasonable because the image files are produced by two different algorithms. Generally, to overcome the aliasing problem we may use a supersampling algorithm. Supersampling can also be applied to the new algorithm to reduce aliasing problems.

The whole processing time of the original algorithm is 21 hours 24 minutes 17 seconds. The new algorithm with 545 bounding volumes takes 1 hour 32 minutes 21 seconds to produce the image at 1024x768 resolution.
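The dependence of the selected color on the link-list order can be reproduced with a minimal Python sketch. It assumes, purely for illustration, that the nearest hit is found by a linear scan with a strict "less than" comparison, so that the first object in the list wins a tie, and that linklist 1 stores plane A before plane B; the function name nearest_hit and the distance value 5.0 are hypothetical.

```python
def nearest_hit(hits):
    """Return the name of the nearest object.

    hits is a list of (name, t) pairs in link-list order. The strict
    comparison keeps the first object encountered when distances tie.
    """
    best_name, best_t = None, float("inf")
    for name, t in hits:
        if t < best_t:
            best_name, best_t = name, t
    return best_name


# The ray R hits both planes at the same distance (the intersection line I).
linklist1 = [("A (red)", 5.0), ("B (blue)", 5.0)]
linklist2 = [("B (blue)", 5.0), ("A (red)", 5.0)]

print(nearest_hit(linklist1))
print(nearest_hit(linklist2))
```

Under these assumptions the two list orders return different planes for the same ray, which is the kind of single-pixel disagreement highlighted in Figure 5.16.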
Resolution: 80 x 50        Model: Trees

# of BVs   initialization time   ray tracing time   # of intersections
0          1 sec                 30 sec             2984254
2          1 sec                 20 sec             1173790
4          1 sec                 21 sec             1128705
8          1 sec                 14 sec             557034
12         1 sec                 12 sec             345099
18         1 sec                 11 sec             253588
30         1 sec                 10 sec             160896
100        2 sec                 19 sec             254072
140        3 sec                 16 sec             87019
208        5 sec                 20 sec             63245

# of BVs   initialization   ray   trace   shade   write
0          59               21    2723    40      34
2          66               41    1661    39      31
4          63               34    1707    50      42
8          67               28    1032    51      24
12         68               31    797     45      30
18         71               32    713     42      36
30         76               32    633     60      33
100        108              36    1520    61      27
140        137              25    1266    46      31
208        218              31    1654    45      34
(Unit: RS6000 process time)

Figure 5.3 Experimental Results for Trees (80x50)
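The process-time columns of Figure 5.3 can be examined with a few lines of Python. The sketch below is an added illustration, not part of the original measurements; the variable names are arbitrary. It locates the number of bounding volumes with the smallest trace time and computes the share of initialization in the total process time of the last row.

```python
# Process-time columns of Figure 5.3 (Trees, 80x50), one entry per table row.
bvs = [0, 2, 4, 8, 12, 18, 30, 100, 140, 208]
init = [59, 66, 63, 67, 68, 71, 76, 108, 137, 218]
ray = [21, 41, 34, 28, 31, 32, 32, 36, 25, 31]
trace = [2723, 1661, 1707, 1032, 797, 713, 633, 1520, 1266, 1654]
shade = [40, 39, 50, 51, 45, 42, 60, 61, 46, 45]
write = [34, 31, 42, 24, 30, 36, 33, 27, 31, 34]

# Total process time per row (initialization plus the four ray tracing parts).
totals = [sum(parts) for parts in zip(init, ray, trace, shade, write)]

# Row with the smallest trace time, and the trace-time ratio against BV = 0.
best = min(range(len(bvs)), key=lambda k: trace[k])
best_bv = bvs[best]
trace_speedup = trace[0] / trace[best]

# Share of initialization in the last row (208 bounding volumes).
init_share_last = init[-1] / totals[-1]

print("smallest trace time at", best_bv, "bounding volumes")
print("trace-time ratio vs. original:", round(trace_speedup, 2))
print("initialization share, last row:", round(init_share_last, 3))
```

For this table the minimum falls at 30 bounding volumes, beyond which the trace time rises again, and initialization accounts for roughly a tenth of the last row's total.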
[Plot (a): process time versus # of BVs for init, ray, trace, shade, write. Plot (b): process time versus # of BVs for init, ray, shade, write.]
Figure 5.4 Process Time for Trees (80x50)

[Plot: trace time and # of intersections versus # of BVs (0, 2, 4, 8, 12, 18, 30, 100, 140, 208). Test picture resolution: 80 x 50. Unit of Y axis: 0.1 million.]
Figure 5.5 Trace Time and Number of Intersections for Trees (80x50)
Resolution: 640 x 400        Model: Trees

# of BVs   initialization time   ray tracing time   # of intersections
0          1 sec                 33 min 7 sec       190917776
2          1 sec                 20 min 44 sec      74222342
4          1 sec                 21 min 24 sec      71700158
8          1 sec                 13 min 41 sec      34928037
12         1 sec                 11 min 19 sec      21587964
18         1 sec                 10 min 32 sec      15933920
30         1 sec                 9 min 45 sec       10115195
100        2 sec                 19 min 5 sec       15906173
140        1 sec                 16 min 49 sec      5451464
208        3 sec                 20 min 37 sec      3982947

# of BVs   initialization   ray    trace    shade   write
0          57               1856   172627   3007    1949
2          62               1924   105869   2885    2004
4          64               1842   109968   3003    1861
8          66               1812   64234    2895    1698
12         68               1932   50059    2891    1843
18         70               1890   45834    2838    1798
30         78               1839   40668    2980    1971
100        105              1997   96166    2905    1893
140        126              1965   82748    2847    1873
208        188              1877   104953   2887    1997
(Unit: RS6000 process time)

Figure 5.6 Experimental Results for Trees (640x400)
[Plot (a): process time versus # of BVs for init, ray, trace, shade, write. Plot (b): process time versus # of BVs without the trace component.]
Figure 5.7 Process Time for Trees (640x400)

[Plot: trace time and # of intersections versus # of BVs (0, 2, 4, 8, 12, 18, 30, 100, 140, 208). Unit of Y axis: 10 million.]
Figure 5.8 Trace Time and Number of Intersections for Trees (640x400)
Resolution: 80 x 50        Model: Delta

# of BVs   initialization time   ray tracing time   # of intersections
0          14 sec                6 min 41 sec       22118400
2          17 sec                5 min 35 sec       17354968
8          19 sec                3 min 4 sec        8864860
10         21 sec                2 min 15 sec       6681306
42         38 sec                47 sec             2114012
272        2 min 54 sec          24 sec             584025
442        4 min 25 sec          24 sec             346039
542        5 min 50 sec          27 sec             287197
794        8 min 29 sec          33 sec             239093

# of BVs   initialization   ray   trace   shade   write
0          1370             34    39519   12      27
2          1446             24    33113   11      28
8          1606             32    17354   2       29
10         1665             35    13144   9       28
42         2601             27    4456    13      37
272        10324            28    2196    5       34
442        16592            25    2173    9       32
542        21273            31    2374    12      37
794        30375            30    3017    13      31
(Unit: RS6000 process time)

Figure 5.9 Experimental Results for Delta (80x50)
[Plot (a): process time versus # of BVs for init, ray, trace, shade, write. Plot (b): process time versus # of BVs (0, 2, 8, 10, 42, 272, 442, 542, 794) for init, ray, shade, write.]
Figure 5.10 Process Time for Delta (80x50)

[Plot: trace time and # of intersections versus # of BVs (0, 2, 8, 10, 42, 272, 442, 542, 794). Unit of Y axis: million. BV: Bounding Volumes.]
Figure 5.11 Trace Time and Number of Intersections for Delta (80x50)
Resolution: 640 x 400        Model: Delta

# of BVs   initialization time   ray tracing time      # of intersections
0          14 sec                7 hr 3 min 48 sec     1403117568
2          15 sec                5 hr 56 min 12 sec    1104201523
8          17 sec                3 hr 7 min 7 sec      561840100
10         17 sec                2 hr 22 min 22 sec    423209110
42         28 sec                49 min 20 sec         133634620
272        1 min 43 sec          24 min 54 sec         36554101
442        2 min 45 sec          24 min 52 sec         21799319
542        3 min 22 sec          26 min 59 sec         18026867
794        5 min 6 sec           33 min 47 sec         14892317

# of BVs   initialization   ray    trace     shade   write
0          1381             2038   2520375   547     2200
2          1448             2059   2111824   536     2043
8          1610             2224   1103345   581     1968
10         1669             2100   836415    533     2014
42         2599             1942   282839    593     1826
272        10211            2107   137558    586     1869
442        16440            1981   137222    637     1999
542        21103            2065   149953    590     1883
794        30374            1993   190332    619     1929
(Unit: RS6000 process time)

Figure 5.12 Experimental Results for Delta (640x400)
[Plot (a): process time versus # of BVs (0, 2, 8, 10, 42, 272, 442, 542, 794) for init, ray, trace, shade, write. Plot (b): process time versus # of BVs for init, ray, shade, write.]
Figure 5.13 Process Time for Delta (640x400)

[Plot: trace time and # of intersections versus # of BVs (0, 2, 8, 10, 42, 272, 442, 542, 794). Picture resolution: 640 x 400. Unit of Y axis: 100 million. BV: Bounding Volumes.]
Figure 5.14 Trace Time and Number of Intersections for Delta (640x400)

[Diagram: Linklist 1 and Linklist 2, each terminated by Null.]
Figure 5.15 Aliasing

Figure 5.16 Differences between two outputs
CHAPTER 6
CONCLUSION

6.1 Summary

The objective of this study was to develop a new fast algorithm for ray tracing and to specify hardware which will substantially ease computational bottlenecks in the ray tracing procedure. The algorithm which was developed employs sphere bounding volumes to avoid having to check each object for a possible intersection with each ray that determines a pixel of the final image. Traditional bounding volumes are used to bound objects. The sphere bounding volumes of the new algorithm are used to bound subspaces, which may contain whole objects and/or parts of an object. The sphere bounding volumes are sorted with respect to ray direction for each ray in order to find the object nearest to the ray origin. Traditional sorting of sphere bounding volumes requires the calculation of square roots. To avoid square root calculations, we developed a comparison algorithm which uses the coefficients of quadratic equations for sorting bounding volumes.

In the traditional algorithms the computational bottleneck is caused by the high number of ray-object intersections. In the new algorithm, the ray-object intersection test starts from the nearest bounding volume. If a ray hits an object in the
bounding volume, the intersection test is terminated. If not, the intersection test considers the next nearest bounding volume. This algorithm gives us a smaller number of ray-object intersections when we employ more bounding volumes. Even though the number of intersections is decreased, the computational requirements for sorting sphere bounding volumes are increased. To eliminate this extra load we use a depth sorter, which can potentially be implemented in simple hardware. The basic ideas of the new algorithm consist of the following:

1. sorting of bounding volumes is done by the depth sorter;
2. ray-object intersection is done by software.

The performance of the new algorithm was verified by extensive computer simulations. When we employ many bounding volumes, the number of ray-object intersections is decreased at the expense of sphere intersections. Since the new procedure greatly simplifies the sphere sorting process, there is a substantial saving in overall computation time, amounting to a sixfold decrease in many of the test cases. To verify the new algorithm we compared the two outputs produced by the two algorithms (the traditional ray tracing algorithm and the new algorithm).
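The traversal summarized above (sort the bounding spheres hit by a ray front to back, test the objects of the nearest volume first, and stop at the first volume that yields a hit) can be sketched in Python. This is a simplified illustration, not the thesis's C implementation: all function and variable names are hypothetical, every object is a sphere, an ordinary software sort stands in for the depth sorter, square roots are used for clarity, and overlapping bounding volumes or objects split across subspaces are not handled.

```python
import math


def ray_sphere_t(origin, direction, center, radius):
    """Smallest positive t of t^2 + b*t + c = 0 for a unit-length direction,
    or None if the ray misses the sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-9:
            return t
    return None


def trace(origin, direction, volumes):
    """volumes: list of (center, radius, objects); objects: list of
    (name, center, radius). Returns (name, t, number_of_object_tests)."""
    entered = []
    for center, radius, objects in volumes:
        t_in = ray_sphere_t(origin, direction, center, radius)
        if t_in is not None:
            entered.append((t_in, objects))
    entered.sort(key=lambda item: item[0])  # software stand-in for the depth sorter

    tests = 0
    for _, objects in entered:
        best = None
        for name, center, radius in objects:
            tests += 1
            t = ray_sphere_t(origin, direction, center, radius)
            if t is not None and (best is None or t < best[1]):
                best = (name, t)
        if best is not None:
            return best[0], best[1], tests  # terminate at the first volume with a hit
    return None, None, tests


# Three bounding spheres, listed out of depth order; the ray runs along +z.
volumes = [
    ((0.0, 0.0, 20.0), 2.0, [("far", (0.0, 0.0, 20.0), 1.0)]),
    ((10.0, 0.0, 5.0), 2.0, [("off-axis", (10.0, 0.0, 5.0), 1.0)]),
    ((0.0, 0.0, 5.0), 2.0, [("near", (0.0, 0.0, 5.0), 1.0)]),
]
name, t, tests = trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), volumes)
print(name, t, tests)
```

In this small scene only one ray-object test is performed, because the off-axis volume is never entered and the far volume is skipped once the near one produces a hit.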
6.2 Remarks on the New Algorithm

This new algorithm was simulated on an RS 6000 workstation. The C programming language was used for the implementation. Even without a hardware implementation, the new algorithm shows a noticeable reduction in computational requirements. If the model includes light sources or refraction, the new algorithm shows an even greater computational improvement. Although designed for hardware implementation, the new algorithm can be used to speed up the general ray tracing process.

The new algorithm need not build bounding volume hierarchies in the initialization stage, because hierarchies with respect to ray direction are built by the sorter for each ray. Glassner's algorithm [Gla 84] cannot efficiently utilize memory for subspace divisions. Fujimoto's algorithm needs a very large memory because it needs memory assignments even for empty subspaces. Since a subspace which contains no object or part of an object needs no bounding volume, much less memory is needed than is required by the spatial subdivision algorithm. The reduced memory requirements are advantages of the hierarchical bounding volume algorithm and of the spatial subdivision algorithm.

The new algorithm may be applied to any kind of object model. Even though the hardware discussed is for bounding volume sorting, any kind of primitive can be included in the new algorithm. Because the sort is performed not on the objects
but on bounding volumes, we can easily extend the ray tracing algorithm. This new algorithm shows great speed improvements, needs a relatively small, fixed memory space for the bounding volumes, and is expandable whether or not it is implemented in hardware.

6.3 Recommendations for Future Research

Many follow-up studies are possible. In particular, studies of the automatic assignment of the proper number of bounding volumes and of more efficient bounding algorithms may yield further improvements. Several aspects of various bounding algorithms are considered in Chapter 4. To gain more computational efficiency we need a robust sphere bounding algorithm based on the following considerations:

1. reduction of void areas;
2. reduction of overlap areas.

To solve these problems, we must understand how many bounding volumes are optimal for a given model and, when the memory for bounding volumes is fixed, how to subdivide the image space for efficient use of the available memory. To devise a computationally efficient procedure for these tasks one needs yet another bounding algorithm.
REFERENCES

[Arv 87] Arvo, J. and D. Kirk, "Fast Ray Tracing by Classification", Computer Graphics, 21(4), pp. 55-64, July 1987

[Arv 89] Arvo, J. and D. Kirk, "A Survey of Ray Tracing Acceleration Techniques", in A. Glassner, Introduction to Ray Tracing, Academic Press, San Diego, CA, pp. 201-262, 1989

[Bli 80] Blinn, J., L. Carpenter, J. Lane and T. Whitted, "Scan Line Methods for Displaying Parametrically Defined Surfaces", Communications of the ACM, 23(1), pp. 23-34, January 1980

[Bur 89] Burger, P. and D. Gillies, Interactive Computer Graphics, Addison-Wesley Publishing Company, Reading, MA, 1989

[Cro 94] Cronin, T., J. Marshall, and M. Land, "The Unique Visual System of the Mantis Shrimp", American Scientist, 82(4), pp. 356-365, July 1994

[Erc 94] Ercegovac, M. and T. Lang, Division and Square Root, Kluwer Academic Publishers, Boston, MA, 1994

[Fle 88] Fleischman, R., Augmentable Parallel Processor Architectures for Real-Time Computer Generated Imagery, PhD Dissertation, University of Florida, December 1988

[Flo 89] Fowler, D. et al., "An Accurate, High Speed Implementation of Division by Reciprocal Approximation", Proceedings of the 9th International Symposium on Computer Arithmetic, pp. 60-67, 1989

[Fol 90] Foley, J., A. van Dam, and S. Feiner, Computer Graphics: Principles and Practice, Second Edition, Addison-Wesley, Reading, MA, 1990

[Fuj 86] Fujimoto, A., T. Tanaka, and K. Iwata, "ARTS: Accelerated Ray Tracing System", IEEE Computer Graphics and Applications, 6(4), pp. 16-26, April 1986

[Gla 84] Glassner, A., "Space Subdivision for Fast Ray Tracing", IEEE Computer Graphics and Applications, 4(10), pp. 15-22, October 1984

[Gol 80] Goldsmith, J. and Salmon, J., "Automatic Creation of Object Hierarchies for Ray Tracing", IEEE Computer Graphics and Applications, 7(5), pp. 14-21, May 1981

[Gra 63] Graham, C., Vision and Visual Perception, John Wiley and Sons, New York, NY, 1965

[Hai 89] Haines, E., "Essential Ray Tracing Algorithms", in A. Glassner, Introduction to Ray Tracing, Academic Press, San Diego, CA, pp. 39-67, 1989

[Han 89] Hanrahan, P., "A Survey of Ray Surface Intersection Algorithms", in A. S. Glassner, Introduction to Ray Tracing, Academic Press, San Diego, CA, pp. 85-92, 1989

[Hec 89] Heckbert, P., "Writing a Ray Tracer", in A. Glassner, Introduction to Ray Tracing, Academic Press, San Diego, CA, pp. 263-293, 1989

[Kay 86] Kay, T. and Kajiya, J., "Ray Tracing Complex Scenes", SIGGRAPH'86 Proceedings, pp. 269-278, August 1986

[Lin 92] Lindley, C., Practical Ray Tracing in C, John Wiley and Sons, New York, NY, 1992

[Lor 75] Lorin, H., Sorting and Sort Systems, Addison-Wesley Publishing Company, Reading, MA, 1975

[Lux 68] Luxenberg, H. and R. Kuehn, Display Systems Engineering, McGraw-Hill Book Co, New York, NY, 1968

[Mar 82] Marr, D., Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, Freeman Press, San Francisco, CA, 1982

[Moo 61] Moon, P., The Scientific Basis of Illuminating Engineering, Dover Publications, New York, NY, 1961

[Pot 82] Potmesil, M. and I. Chakravarty, "Synthetic Image Generation with a Lens and Aperture Camera Model", ACM Transactions on Graphics, vol. 1, pp. 85-108, 1982

[Rub 80] Rubin, S. and Whitted, T., "A Three-dimensional Representation for Fast Rendering of Complex Scenes", Computer Graphics, 14(3), pp. 110-116, July 1980

[Sou 61] Southall, J. P., Introduction to Physiological Optics, Dover Publications, New York, NY, 1961

[Wat 89] Watt, A., Fundamentals of Three Dimensional Computer Graphics, Addison-Wesley Publishing Company, Reading, MA, 1989, pp. 190-199

[Weg 84] Weghorst, H., G. Hooper and D. Greenberg, "Improved Computational Methods for Ray Tracing", ACM Transactions on Graphics, 3(1), pp. 52-69, January 1984

[Wei 77] Weiler, K. and P. Atherton, "Hidden Surface Removal Using Polygon Area Sorting", Computer Graphics, 11, pp. 214-222, August 1977

[Whi 80] Whitted, T., "An Improved Illumination Model for Shaded Display", Communications of the ACM, 23(6), pp. 343-349, June 1980

[Wil 94] Wilt, N., Object-Oriented Ray Tracing in C++, John Wiley & Sons, New York, NY, 1994
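APPENDIX

Let $t_1, t_2$ be the smallest positive roots of $f_1(t)$ and $f_2(t)$, respectively, as shown in Fig. 4.8:

\[ t_1 = \frac{-b_1 - D_1^{1/2}}{2}, \qquad t_2 = \frac{-b_2 - D_2^{1/2}}{2}, \qquad D_i = b_i^2 - 4c_i \]

Let us find the smaller $t$ without calculating $t$ exactly. This may be done by comparing only the coefficients $b$, $c$, in order to sort the bounding volumes by depth along the ray path.

Assume that $t_1 > t_2$:

\[ (-b_1 - D_1^{1/2}) > (-b_2 - D_2^{1/2}) \iff b_2 - b_1 > D_1^{1/2} - D_2^{1/2} \]

i) $b_2 - b_1 > 0$, $D_1^{1/2} - D_2^{1/2} < 0$: select $t_2$.

ii) $b_2 - b_1 > 0$, $(D_1^{1/2} - D_2^{1/2} \ge 0 \iff D_1^{1/2} \ge D_2^{1/2} \iff b_1^2 - 4c_1 \ge b_2^2 - 4c_2)$:

\[ (b_2 - b_1)^2 > (D_1^{1/2} - D_2^{1/2})^2 \iff b_2^2 - 2b_1b_2 + b_1^2 > D_1 + D_2 - 2D_1^{1/2}D_2^{1/2} \iff 2D_1^{1/2}D_2^{1/2} > 2(b_1b_2 - 2c_1 - 2c_2) \]

if $(b_1b_2 - 2c_1 - 2c_2) < 0$, then select $t_2$.

if $(b_1b_2 - 2c_1 - 2c_2) \ge 0$ and
\[ (b_1^2 - 4c_1)(b_2^2 - 4c_2) > (b_1b_2 - 2c_1 - 2c_2)^2 \iff (b_1c_2 - b_2c_1)(b_2 - b_1) > (c_1 - c_2)^2 \]
then select $t_2$. (a)

if $(b_1b_2 - 2c_1 - 2c_2) \ge 0$ and
\[ (b_1^2 - 4c_1)(b_2^2 - 4c_2) = (b_1b_2 - 2c_1 - 2c_2)^2 \iff (b_1c_2 - b_2c_1)(b_2 - b_1) = (c_1 - c_2)^2 \]
then select $t_1$ ($\because t_1 = t_2$). (b)

From (a) and (b): if $(b_1c_2 - b_2c_1)(b_2 - b_1) \ge (c_1 - c_2)^2$, then select $t_2$.

From conditions (a) and (b) we can construct the state variable $X_4 = 0$.

if $(b_1b_2 - 2c_1 - 2c_2) > 0$ and
\[ (b_1^2 - 4c_1)(b_2^2 - 4c_2) < (b_1b_2 - 2c_1 - 2c_2)^2 \iff (b_1c_2 - b_2c_1)(b_2 - b_1) < (c_1 - c_2)^2 \]
then select $t_1$.

iii) $b_2 - b_1 < 0$, $(D_1^{1/2} - D_2^{1/2} < 0 \iff b_1^2 - 4c_1 < b_2^2 - 4c_2)$: [...]

if $(b_1b_2 - 2c_1 - 2c_2) > 0$ and
\[ (b_1^2 - 4c_1)(b_2^2 - 4c_2) > (b_1b_2 - 2c_1 - 2c_2)^2 \iff (b_1c_2 - b_2c_1)(b_2 - b_1) > (c_1 - c_2)^2 \]
then select $t_1$. (c)

if $(b_1b_2 - 2c_1 - 2c_2) > 0$ and
\[ (b_1^2 - 4c_1)(b_2^2 - 4c_2) = (b_1b_2 - 2c_1 - 2c_2)^2 \iff (b_1c_2 - b_2c_1)(b_2 - b_1) = (c_1 - c_2)^2 \]
then select $t_1$ ($\because t_1 = t_2$). (d)

From (c) and (d): if $(b_1c_2 - b_2c_1)(b_2 - b_1) \ge (c_1 - c_2)^2$, then select $t_1$.

From conditions (c) and (d) we can construct the state variable $X_4 = 0$.

iv) $b_2 - b_1 < 0$, $D_1^{1/2} - D_2^{1/2} \ge 0$: the assumption is wrong; select $t_1$.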
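The case analysis of this appendix can be checked numerically with a short Python sketch. It is an added illustration under stated assumptions, not the hardware depth sorter: the function names are hypothetical, both discriminants are assumed to be non-negative, the boundary $b_2 - b_1 = 0$ (which the four cases do not list) is folded into cases i and ii, and a square-root version is included only as a reference for comparison.

```python
import math


def select_nearest(b1, c1, b2, c2):
    """Return the index (1 or 2) of the smaller root t_i = (-b_i - sqrt(D_i)) / 2,
    with D_i = b_i**2 - 4*c_i >= 0, using only the coefficients (no square root).
    When t1 == t2 either index may be returned."""
    p = b2 - b1                                   # t1 > t2  <=>  p > sqrt(D1) - sqrt(D2)
    s = (b1 * b1 - 4 * c1) - (b2 * b2 - 4 * c2)   # same sign as sqrt(D1) - sqrt(D2)
    m = b1 * b2 - 2 * c1 - 2 * c2
    lhs = (b1 * c2 - b2 * c1) * (b2 - b1)
    rhs = (c1 - c2) ** 2
    if p >= 0:
        if s < 0:
            return 2                              # case i
        if m < 0 or lhs >= rhs:
            return 2                              # case ii, conditions (a) and (b)
        return 1                                  # case ii, remaining condition
    if s < 0:
        if m > 0 and lhs < rhs:
            return 2                              # case iii, assumption t1 > t2 holds
        return 1                                  # case iii, conditions (c) and (d)
    return 1                                      # case iv


def smallest_root(b, c):
    """Reference value computed with a square root, for checking only."""
    return (-b - math.sqrt(b * b - 4 * c)) / 2


# Compare the coefficient test with the direct computation for one pair.
b1, c1, b2, c2 = -10, 1, -4, 3
print(select_nearest(b1, c1, b2, c2), smallest_root(b1, c1), smallest_root(b2, c2))
```

Integer coefficients are used in the example so that the coefficient comparison is exact; with floating-point coefficients near-ties may be resolved differently by the two methods.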
BIOGRAPHICAL SKETCH

Jun Lee was born on August 2, 1957, in Seoul, Korea. He received his Bachelor of Aeronautical Engineering from the Korean Air Force Academy, and his MS in electrical engineering from the US Naval Postgraduate School, Monterey, CA. He is a major in the Korean Air Force, and has received a Korean Air Force Fellowship to complete his Ph.D. in electrical engineering. He is married to Younghi Lee and they have two energetic daughters, Jarang and Utom. Upon completion of his doctoral studies he and his family will return to his military career in Korea.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

John Staudhammer, Chairman
Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Panos E. Livadas, Cochairman
Assistant Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Jack R. Smith
Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

A. Antonio Arroyo
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Paul W. Chun
Professor of Biochemistry and Molecular Biology

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

December 1994

Winfred M. Phillips
Dean, College of Engineering

Karen A. Holbrook
Dean, Graduate School

