## Citation

- Permanent Link:
- http://ufdc.ufl.edu/UF00082283/00001
## Material Information
- Title:
- Augmentable object-oriented parallel processor architectures for real-time computer-generated imagery
- Creator:
- Fleischman, Ross Morris (*Dissertant*)
- Staudhammer, John (*Thesis advisor*)
- Doty, Keith L. (*Reviewer*)
- Smith, Jack R. (*Reviewer*)
- Principe, Jose C. (*Reviewer*)
- Duffy, Joseph (*Reviewer*)
- Place of Publication:
- Gainesville, Fla.
- Publisher:
- University of Florida
- Publication Date:
- 1988
- Copyright Date:
- 1988
- Language:
- English
- Physical Description:
- xii, 154 leaves : ill. ; 28 cm.
## Subjects
- Subjects / Keywords:
- Atmospheric attenuation ( jstor )
- Block diagrams ( jstor )
- Buffer storage ( jstor )
- Display devices ( jstor )
- Image processing ( jstor )
- Integers ( jstor )
- Opacity ( jstor )
- Pixels ( jstor )
- Rectangles ( jstor )
- Simulations ( jstor )
- Computer architecture ( lcsh )
- Computer graphics ( lcsh )
- Dissertations, Academic -- Electrical Engineering -- UF
- Electrical Engineering thesis Ph. D
- Parallel processing (Electronic computers) ( lcsh )
- Three-dimensional display systems ( lcsh )
- Genre:
- bibliography ( marcgt )
- non-fiction ( marcgt )
- theses ( marcgt )
## Notes
- Abstract:
- The hardware architecture of a system for real-time computer-generated imagery (CGI) is presented that combines augmentability, modularity, organizational simplicity, and parallelism. This architecture is a functional, highly-modular, parallel processor approach that is well suited for employing VLSI technology. It is a generic structure that can grow with technological advances and can accommodate a full range of CGI systems that demand different performance requirements through one basic set of modules. The CGI process contains five fundamental components: input, modeling, rendering, compositing, and output. This architectural approach extends specialized hardware into both the compositing and output components, which allows the definition of a generic framework for building systems appropriate for many simulations. The system architecture performs image synthesis in parallel by partitioning the image generation task in object space with each partition assigned to an individual autonomous object generator. Objects are rendered independently from each other, and when complete they are automatically composited by the hardware for display. This process is repeated at a rate suitable for real-time animation. The picture representation accepts transparent, semi-transparent, and fully opaque surfaces. Hardware facilities perform automatic hidden surface removal with antialiasing and atmospheric attenuation inclusion. An approximation for surface intersection is performed, and a subpixel control mechanism is provided. The parallel hardware algorithm is classified as a compute-aggregate-broadcast paradigm: a compute phase generates objects, an aggregate phase combines the objects into a scene, and a broadcast phase displays the scene. Its system framework maintains a synchronous feed-through structure that allows enlargement by either dynamic or static additions. System improvement is accommodated by adding modules that incrementally improve system performance and scope. This reduces difficulties associated with the incorporation of new systems to introductions of new modules, thereby lengthening system life.
- Thesis:
- Thesis (Ph. D.)--University of Florida, 1988.
- Bibliography:
- Includes bibliographical references.
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by Ross Morris Fleischman.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 001123865 ( ALEPH )
- 20071125 ( OCLC )
- AFM0917 ( NOTIS )
## Full Text

AUGMENTABLE OBJECT-ORIENTED PARALLEL PROCESSOR ARCHITECTURES FOR REAL-TIME COMPUTER-GENERATED IMAGERY

BY ROSS MORRIS FLEISCHMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1988

Copyright 1988 by Ross Morris Fleischman

ACKNOWLEDGMENTS

I would like to express my appreciation to my advisor and supervisory committee chairman, Dr. John Staudhammer, for the guidance and encouragement he provided me on this project. I am also grateful to the other members of my supervisory committee, Dr. Keith L. Doty, Dr. Jack R. Smith, Dr. Jose C. Principe, and Dr. Joseph Duffy, for their commitment. I also wish to thank the members of the UF Computer Graphics Research Group for their suggestions. This dissertation is dedicated to my mother, Ruth Koegel Fleischman, and to the memory of my father, Erwin Lewis Fleischman.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT

CHAPTERS
I INTRODUCTION
  Problem Definition
  Dissertation Project
  Overview of Dissertation
II TYPICAL REAL-TIME CGI ARCHITECTURE
  Scene Manager
  Geometric Processor
  Video Processor
  Display Device
III ALTERNATE REAL-TIME CGI ARCHITECTURE
  System Model
    Underlying Idea
    Supporting Architecture
  Advantages of Approach
  Target Applications
IV COMPOSITING NETWORK
  Compositing Methodology
  RGBZA Compositing Algorithm
  Network Structure
  Compositing Processing Node
    General Compositing Processing Node
      Depth computation unit
      Opacity computation unit
      Color computation unit
    Specialized Compositing Processing Node
      Depth computation unit
      Color computation unit
  Analysis
    Complexity
    Performance
V VIDEO GENERATION NODE
  Configuration
  Atmospheric Attenuation Unit
  Pixel Cache
  Double-Buffered Frame Buffer
  Video Shift Registers
  Color Palette
  Digital-to-Analog Converters
  System Controller
  Analysis
    Complexity
    Performance
VI DISPLAY DEVICE NODE
  Display Device Approaches
  Raster Scan Conversion
  Image Aspect Ratio
  Display Device Performance
VII OBJECT GENERATION NODE
  Configuration
  Object Generation Node Nucleus
    Double-buffered image buffer
    Intensity multiplication unit
    Nucleus controller
  Object Generation Unit
  Secondary Memory Unit
  Analysis
    Complexity
    Performance
VIII MAINTENANCE MANAGEMENT NODE
  Configuration
  Operating Functions
  System Boot Operation
  System Normal Operation
  Simulation Debugging
  Analysis
IX CONCLUSION
  System Simulator
  System Simulation
  Discussion of System Features
  Summary
BIBLIOGRAPHY
BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 Functional logic block equivalent of the general CPN and the specialized CPN
4-2 Pin requirement for the general CPN, the specialized CPN, and each CPN computation unit
4-3 Gate equivalent and package pin count of various functional logic blocks
4-4 Estimated complexity of the general CPN, the specialized CPN, and each CPN computation unit
4-5 CPN processing-time for various image space resolutions. The image update rate is 10 frames per second
LIST OF FIGURES

2-1 Block diagram of a typical real-time CGI system organization
3-1 Composition of an opaque background 3-D object and an opaque foreground 3-D object to produce a composite 3-D scene
3-2 Block diagram of the proposed augmentable real-time CGI system organization
4-1 Three distinct types of pixel coverage, with respect to the ALPHA value: a) no coverage, b) full coverage, and c) partial coverage. The subpixel shape of the pixel with partial coverage is arbitrary and is only shown in this manner for conceptual clarity
4-2 Two pixel opacity values are composited. The values were derived from coverage information from two different objects. The coverage depictions are arbitrary; they are given specific subpixel forms to clarify the composite operation. The coverage areas are actually averaged across the pixel
4-3 The RGBZA Compositing Algorithm
4-4 A fully balanced three level compositing tree
4-5 The general RGBZA compositing algorithm for a fully balanced tree. Note that lower case letters designate the product of intensity and opacity
4-6 The specialized RGBZ compositing algorithm of a fully balanced tree
4-7 An iterative building block depiction of a compositing processing node (CPN)
4-8 The algorithm performed by a general CPN depth computation unit
4-9 Block diagram of a general CPN depth computation unit
4-10 The algorithm performed by a general CPN opacity computation unit
4-11 Block diagram of a general CPN opacity computation unit
4-12 The algorithm performed by a general CPN color computation unit
4-13 Block diagram of a general CPN color computation unit
4-14 The algorithm performed by a specialized CPN color computation unit
4-15 Block diagram of a specialized CPN color computation unit
5-1 Block diagram of a video generation node with respect to its seven modules
5-2 Block diagram of the atmospheric attenuation unit, used to include atmospheric effects in a scene
7-1 Block diagram of an object generation node with respect to its three modules
7-2 Block diagram of the intensity multiplication unit, used to condition the color and opacity values for input to the compositing network
9-1 View of rectangle A and rectangle B in object space
9-2 Contents of the simulated frame buffer of OGN1 after scan converting rectangle A
9-3 Contents of the simulated frame buffer of OGN2 after scan converting rectangle B
9-4 Contents of the simulated frame buffer of the VGN for the first system simulation
9-5 Contents of the simulated frame buffer of the VGN for the second system simulation
9-6 Two system tree configurations: a) fully balanced system tree and b) unbalanced system tree
LIST OF ABBREVIATIONS

2-D -------- Two-Dimensional
3-D -------- Three-Dimensional
AAU -------- Atmospheric Attenuation Unit
CGI -------- Computer-Generated Imagery
CN --------- Compositing Network
CPN -------- Compositing Processing Node
DAC -------- Digital-to-Analog Converter
DDN -------- Display Device Node
I/O -------- Input/Output
IPU -------- Intensity Multiplication Unit
LSH -------- Least Significant Half
MSH -------- Most Significant Half
MMN -------- Maintenance Management Node
MMU -------- Maintenance Management Unit
OGN -------- Object Generation Node
OGNN ------- Object Generation Node Nucleus
OGU -------- Object Generation Unit
RGB -------- RED, GREEN, and BLUE
SMU -------- Secondary Memory Unit
VGN -------- Video Generation Node
VLSI ------- Very Large-Scale Integration

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

AUGMENTABLE OBJECT-ORIENTED PARALLEL PROCESSOR ARCHITECTURES FOR REAL-TIME COMPUTER-GENERATED IMAGERY

By Ross Morris Fleischman
December 1988
Chairman: Dr. John Staudhammer
Major Department: Electrical Engineering

The hardware architecture of a system for real-time computer-generated imagery (CGI) is presented that combines augmentability, modularity, organizational simplicity, and parallelism. This architecture is a functional, highly-modular, parallel processor approach that is well suited for employing VLSI technology. It is a generic structure that can grow with technological advances and can accommodate a full range of CGI systems that demand different performance requirements through one basic set of modules.

The CGI process contains five fundamental components: input, modeling, rendering, compositing, and output. This architectural approach extends specialized hardware into both the compositing and output components, which allows the definition of a generic framework for building systems appropriate for many simulations. The system architecture performs image synthesis in parallel by partitioning the image generation task in object space with each partition assigned to an individual autonomous object generator. Objects are rendered independently from each other, and when complete they are automatically composited by the hardware for display. This process is repeated at a rate suitable for real-time animation. The picture representation accepts transparent, semi-transparent, and fully opaque surfaces. Hardware facilities perform automatic hidden surface removal with antialiasing and atmospheric attenuation inclusion. An approximation for surface intersection is performed, and a subpixel control mechanism is provided.

The parallel hardware algorithm is classified as a compute-aggregate-broadcast paradigm: a compute phase generates objects, an aggregate phase combines the objects into a scene, and a broadcast phase displays the scene. Its system framework maintains a synchronous feed-through structure that allows enlargement by either dynamic or static additions. System improvement is accommodated by adding modules that incrementally improve system performance and scope. This reduces difficulties associated with the incorporation of new systems to introductions of new modules, thereby lengthening system life.

CHAPTER I
INTRODUCTION

A computer-generated imagery (CGI) system is a specialized computer system that provides a visual simulation of an artificial environment.
Conceptually, a CGI system consists of a window in multidimensional space with which an observer may look into a world. The window is presented by a computer driven display device, while the world is modeled by a database that the computer can access. Thus, the visual simulation may be regarded as a generation of an out-the-window view, in real-time, according to the simulated position and orientation of the observer with respect to the simulated changes of the artificial environment.

A popular application of real-time computer-generated imagery visual simulators concerns vehicle training simulation [FIS85, PAN86, SCH81, SCH83b, YAN85, ZYD88]. For this application, an observer's visual experience is created by a generated perspective projection of a 3-D world rendered onto a 2-D display device [BEN83], with associated special effects. Other simulation tasks [SUG83] may have variations of the visual simulation requirement as a function of the world structure, but the real-time performance and rendering problems remain constant.

Real-time operation, which defines a computation process where the execution time of the computer is synchronized with the physical event time or wall-clock time, is a major requirement of these systems [FOR83]. Also, the associated image rendering problems are computationally demanding. Thus, real-time CGI system organizations typically mandate custom-designed, special-purpose, high-speed computers, with general-purpose computers for their control [SCH81, SCH83b, YAN85].

Problem Definition

Traditional CGI architectures utilize both pipelining and parallelism technologies to achieve real-time performance for image synthesis. The system architectures are usually highly specialized and constrain the types of graphics primitives that can be employed [ENG86]. These special-purpose architectures usually involve a fixed graphics pipeline that is difficult to enhance for increased performance or for inclusion of additional graphics primitives.

The realization of major CGI architectural revisions that exhibit improved performance with substantial hardware reduction is a subject of research. Innovative CGI architectures will employ unique organizational structures that realize algorithmic improvements with respect to implementation with massive memory, gate arrays, and custom VLSI. Thus, improvements in both VLSI memory chips [COL87, TUN87a] and VLSI computational chips [BUR87, COL87, GRI86, MOK87], plus parallel processing trends [SCH87], are good indicators that the evolution of CGI system organizational philosophies will become VLSI-oriented through parallelism.

Dissertation Project

The general research objective is to develop the guidelines and philosophies of a VLSI-oriented real-time CGI architecture that combines augmentability, modularity, organizational simplicity, and parallelism. This proposed architecture will be a functional, highly-modular, parallel processor approach that will be suited for employing VLSI technology. It will be a generic structure that can grow with technological advances. The investment in such a system will hypothetically never be discarded; system improvement can be accommodated by adding modules that incrementally improve the performance and scope of a system. The introduction of new systems will be reduced to introductions of new modules, thereby resisting system obsolescence.
Therefore, such a system will be continuously expandable and never totally outmoded, thus providing performance, development, and economic benefits.

Overview of Dissertation

This dissertation is organized into nine chapters. Chapter I is an introductory chapter that covers objectives and background about the dissertation subject. Chapter II describes a typical real-time CGI architecture. Chapter III presents an overview and introduction of the proposed augmentable CGI architectures, along with the fundamental driving idea for the approach. Chapters IV through VIII describe each major subsystem of the augmentable CGI architectural approach. Chapter IV describes the compositing network. Chapter V describes the video generation node. Chapter VI describes the display device node. Chapter VII describes the object generation node. Chapter VIII describes the maintenance management node. Chapter IX is a concluding chapter that contains a discussion of the system simulation, along with a summary of the dissertation results.

CHAPTER II
TYPICAL REAL-TIME CGI ARCHITECTURE

A typical real-time CGI system organization, popular among vehicle training simulators, is shown in Figure 2-1. This structure provides a single field-of-view of the artificial environment, termed a channel. Its organization consists of a cascade of four major subsystems: the scene manager, the geometric processor, the video processor, and the display device [SCH83b, YAN85]. The first three subsystems form a specialized computer graphics pipeline for image rendering. The last subsystem provides a specialized display for viewing.

Figure 2-1. Block diagram of a typical real-time CGI system organization.

Scene Manager

The overall function of the scene manager is to provide scene elements to the system pipeline that lie in the observer's field-of-view, within the artificial environment, given observer position and orientation. Observer position and orientation information are provided to the scene manager by a host simulator [FOR83, SCH83b]. This information directs dynamic extraction of database scene elements from mass storage that are loaded into an active database memory for sorting [PAN86]. These scene elements represent the observer's panorama and are examined to determine if they are potentially visible within the field-of-view of the observer [PAN86, YAN85]. Scene elements satisfying this condition are provided with an appropriate level-of-detail, while the remainder are culled [PAN86, YAN85]. The resultant scene elements are sent down the system pipeline, at the image update rate, to the geometric processor [YAN85]. Subsystem processing load is continuously monitored by the scene manager to avoid overloading the processing capacity of the pipeline. Processing load reduction techniques utilize various dynamic scene content-control mechanisms that usually degrade image quality gracefully [SCH83a, YAN85].

The mass storage device contains a database, which models an artificial environment, that drives the hardware. Features of a simulated scene (natural and cultural) are modeled to be of the same size, shape, location, and color as their real-world counterparts [SCH81, SCH83a]. Database modeling primitives for the typical CGI system consist of planar polygons as a major primitive and quadric surfaces as an option for both man-made curved objects and natural curvilinear objects [YAN85].
The database also contains scene element attributes such as color and texture.

Geometric Processor

The geometric processor is a special-purpose pipelined computer that operates on the scene element output from the scene manager. These operations usually produce the projected geometry of the scene with associated geometric gradient and color gradient parameters. In general, the fixed coordinate system of the scene elements is transformed (via translation, rotation, and scaling) to the momentary eye-based coordinate system (origin located at the observer's eye). Within the eye-based coordinate system, a visibility frustum is defined. Then, a 3-D clipping algorithm is applied to determine where the 3-D scene intersects the bounding planes of the visibility frustum. Scene parts within the visibility frustum are projected to the image plane with the computed geometric gradient and color gradient parameters, while the rest are deleted [BEN83]. Issues relating to color can be found in Rogers [ROG85]. A matrix multiplier was presented by Meares et al. [MEA74] and a three-dimensional coordinate transformation device was presented by Newarikar [NEW82]. Clipping algorithms, geometric transformations, and perspective projection can be found in Rogers [ROG85] with an interesting VLSI solution presented by Clark [CLA82]. Clark discusses a four-component vector, floating point VLSI processor for accomplishing matrix transformations, clipping, and mapping to output device coordinates.
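The eye-space transformation and perspective projection performed by the geometric processor can be made concrete with a short sketch. The following Python fragment builds a world-to-eye transform and projects a point onto the image plane; the column-vector convention, the look_at/project names, and the focal-length parameter are illustrative assumptions for this sketch, not the dissertation's hardware formulation.

```python
import numpy as np

def look_at(eye, target, up):
    """Build a 4x4 world-to-eye transform: translate the eye to the
    origin and rotate the view direction onto the -Z axis."""
    f = target - eye
    f = f / np.linalg.norm(f)                       # forward direction
    s = np.cross(f, up); s = s / np.linalg.norm(s)  # side direction
    u = np.cross(s, f)                              # true up direction
    m = np.eye(4)
    m[0, :3], m[1, :3], m[2, :3] = s, u, -f         # rotation rows
    m[:3, 3] = -m[:3, :3] @ eye                     # fold in the translation
    return m

def project(point_world, view, focal=1.0):
    """Transform one 3-D point to eye space, then perspectively
    project it onto an image plane at distance `focal` from the eye."""
    p = view @ np.append(point_world, 1.0)          # homogeneous 4-vector
    x, y, z = p[0], p[1], -p[2]                     # z: distance in front of eye
    if z <= 0:
        return None                                 # behind the eye: clip away
    return (focal * x / z, focal * y / z, z)        # image x, y plus depth

eye = np.array([0.0, 0.0, 10.0])
view = look_at(eye, np.array([0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
print(project(np.array([1.0, 2.0, 0.0]), view))     # -> (0.1, 0.2, 10.0)
```

Clipping against the full visibility frustum would test the remaining bounding planes in the same eye-based coordinate system; only a near-plane test is shown here.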
Video Processor

The video processor is a special-purpose computer that operates on the resultant projected geometry, geometric gradient, and color gradient output from the geometric processor for subsequent display. The video processor computes each pixel color produced on the picture plane representing visible portions of scene element surfaces [SCH83b, YAN85]. Pixel color computation is a function of various items: geometric gradient parameters (surface normals), color gradient parameters (scene element native color), texture maps, atmospheric attenuation (haze color), scene element illumination (both natural and cultural light sources), antialiasing techniques, shadows, and shading techniques. During, before, or after pixel computation, visible portions of the scene are identified through a hidden surface removal technique.

This processor also provides timing and control of the display device, which relate to the video processor organizational philosophy [YAN85]: scan-line-based or frame-buffer-based. Scan-line-based units perform video processing one scan-line at a time in synchronism with each raster of the display device; one row of the visible scene's pixel codes is stored. Frame-buffer-based units perform video processing independent of the raster display; a complete frame of the visible scene's pixel codes is stored. Algorithms and techniques used by the video processor are well known and can be found in the literature, such as Rogers [ROG85]. Examples of antialiasing include Booth's [BOO87] presentation concerning the human factors relation to antialiasing and Carpenter's [CAR84] presentation of an interesting A-buffer approach. Real-time hardware approaches to texture mapping can be found in the literature, such as one approach presented by Fant [FAN86].

Display Device

Display device technology primarily consists of two variations: calligraphic displays and raster displays [SCH83a]. The color calligraphic display is characterized by a continuous layered phosphor surface (RED and GREEN phosphor layers) used to present a color picture with beam penetration control (electron beam velocity) of sequentially refreshed straight lines (vectors or strokes) and points (zero length vectors). The raster display contains a regular grid of phosphor triads (RED, GREEN, and BLUE) that are used to present a color picture by modulated illumination of each phosphor triad point (pixel) with refresh in a regular pattern. Calligraphic systems maintain high quality light points with color limitations, while raster systems maintain high quality painted faces without color limitations [YAN85, SCH83a].

CHAPTER III
ALTERNATE REAL-TIME CGI ARCHITECTURE

This chapter presents an alternate real-time CGI architectural approach as compared to the traditional approach briefly presented in Chapter II. This discussion is meant as an overview to give an understanding of the new approach before delving into its details. The system model is presented to illustrate the underlying idea and its supporting architecture. Following is a discussion of the advantages of the new approach and typical applications.

System Model

A system model is presented that exhibits the premise of this research. First, the underlying idea with respect to the image generation problem is presented. Second, the supporting architecture that can realize the underlying idea is described.

Underlying Idea

This field of architectural research is driven by the fundamental idea that an individual scene is composed of separable objects. Therefore, a scene can be produced from the summation of every object existing in that scene; this process is called compositing. An example of compositing is presented in Figure 3-1, which illustrates the composite of two opaque 3-D objects. As shown, an opaque background 3-D object and an opaque foreground 3-D object are merged to form a composite 3-D scene. This process indicates that there is an alternative to producing an entire complex scene directly. The generation of simpler objects can be done individually, followed by compositing the simpler objects to produce an entire complex scene [POR84, STA83].

Figure 3-1. Composition of an opaque background 3-D object and an opaque foreground 3-D object to produce a composite 3-D scene.

The approach taken by this research separates the image compositing process from the image synthesis process of the image generation problem. The compositing is reduced to the pixel level, where a procedure is defined that can blend images through a pixel-by-pixel process. This compositing process is extended further, at the pixel level, to include the effect of atmospheric attenuation.

The compositing process performs antialiased blending of images utilizing a mixing factor. This mixing factor defines the average opacity of a pixel, which defines the average pixel reflectivity. It is useful for performing surface edge antialiasing and for rendering surfaces that are either transparent, semi-transparent, or opaque. Along with the mixing factor is the depth value for determining the depth location of a pixel in space. This information is used for a comparison to determine whether a pixel, as compared to another pixel, is in front of, behind, or at the same distance. Also, the horizontal and vertical position is determined by the pixel location in the image array and the color value is defined as the additive tri-stimulus colors: RED, GREEN, and BLUE.

The atmospheric attenuation process is performed by a procedure that calculates attenuation as a function of distance from the viewpoint with respect to a fading constant and a horizon color. The fading constant is adjusted for varying atmospheric conditions such as foggy, hazy, and murky atmospheres. The horizon color is adjusted for varying background lighting conditions. This process is performed following the compositing process, which results in a pixel value representing the composite tri-stimulus color value with the effect of atmospheric attenuation included.
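As a rough illustration of this attenuation procedure, the sketch below blends a pixel color toward a horizon color as a function of its depth. The dissertation specifies only that attenuation depends on distance, a fading constant, and a horizon color; the exponential visibility term and all names here are assumptions chosen for the sketch.

```python
import math

def attenuate(color, z, horizon, fading=0.05):
    """Fade a composited pixel color toward the horizon color with
    depth z. The exp() visibility model is a common fog approximation,
    assumed here; it is not necessarily the dissertation's exact form."""
    v = math.exp(-fading * z)   # visibility: 1.0 at the eye, toward 0 far away
    return tuple(v * c + (1.0 - v) * h for c, h in zip(color, horizon))

# A red surface fades toward a grey horizon as it recedes.
print(attenuate((1.0, 0.0, 0.0), z=5.0,  horizon=(0.7, 0.7, 0.7)))
print(attenuate((1.0, 0.0, 0.0), z=50.0, horizon=(0.7, 0.7, 0.7)))
```

A larger fading constant models a murkier atmosphere, pulling even nearby pixels toward the horizon color.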
Supporting Architecture

The fundamental idea of compositing focuses on allowing a scene to be blended by computer. Hypothetically, the objects would be visually computer-generated in their proper position and orientation, then they would be merged by computer for display. Therefore, instead of having an individual total database for an artificial environment, the total database would be partitioned by objects to provide multiple partial databases. This would allow each object or group of similar objects in a scene to be assigned an individual processor, which would have the advantage of distributing the image generation task, thus reducing the performance requirement for each processor and secondary memory unit. Also, the task of merging or compositing the collection of objects would be performed by separate processors. As a result, an organization of this nature would produce multiple data-streams and multiple instruction-streams, thereby speeding up both computational processing and I/O processing. Also, the separable nature of objects existing in a scene points to the goal of expandability without affecting other elements of the system.

The abstract organization of the proposed augmentable CGI architecture, which logically follows from the above discussion, is illustrated in Figure 3-2. Major components of the proposed real-time CGI machine consist of multiple object generation nodes (OGNs), a compositing network (CN), a video generation node (VGN), a display device node (DDN), and a maintenance management node (MMN). This system can handle opaque, transparent, and semi-transparent images. A short description of each subsystem is discussed below with a more detailed discussion of each subsystem presented subsequently in Chapters IV through VIII.

Figure 3-2. Block diagram of the proposed augmentable real-time CGI system organization.

The object generation node is a VLSI-oriented image synthesis processor with an optional local secondary memory, which can execute computer graphics algorithms to render an assigned object. OGNs operate autonomously and concurrently with respect to the complete system, but in synchronism with it. They are assigned a partition, termed an object, of the entire image generation task. An OGN interfaces to the compositing network through its image memory, which contains an image space view of the assigned object. Each element of the image memory contains three pixel attributes: color, opacity, and depth. The X, Y coordinates are derived from a pixel's position in the image memory.
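A minimal sketch of one image-memory element may help fix this representation: each entry carries color, opacity, and depth, while X and Y are implied by array position. The field names, value ranges, and the Python rendering below are illustrative, not the actual hardware word format.

```python
from dataclasses import dataclass

@dataclass
class Pixel:
    """One element of an OGN image memory. X and Y are implied by the
    element's position in the array, so they are not stored."""
    r: float = 0.0           # tri-stimulus color components
    g: float = 0.0
    b: float = 0.0
    a: float = 0.0           # ALPHA: average opacity, 0 (clear) to 1 (opaque)
    z: float = float('inf')  # depth; "infinitely far" marks an empty pixel

# An OGN image memory is then a 2-D array of such elements:
image_memory = [[Pixel() for _x in range(640)] for _y in range(480)]
```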
The compositing network is a pixel-by-pixel hardware compositor, which is an expandable ensemble of interconnected compositing processing nodes, that produces a computer graphics picture through blending independently rendered objects into a full image. This network is a synchronous feed-forward structure. It simultaneously reads each image memory area, of every OGN, pixel-by-pixel in a row-by-row manner and writes the composite result to the VGN pixel-by-pixel.

The video generation node processes composite object digital image data from the compositing network and then converts it to analog video data for display. Pixels are individually received from the compositing network. While pixels are received, the VGN includes the effect of atmospheric attenuation to each pixel and then writes the result to its frame buffer pixel-by-pixel in a row-by-row manner. Simultaneously, the frame buffer data is read and converted to analog data for driving the display device node. Also, the timing of the entire system is derived from the VGN.

The display device node is a raster scan type monitor. It receives three primary colors from the VGN: RED, GREEN, and BLUE. The video timing of the monitor is also controlled directly from the VGN.

The maintenance management node provides central control and health assurance of the system. It is an autonomous processor that provides self-maintenance operations and system support functions. Included are a computational unit, a secondary memory unit, and a console. MMN communication to and from the nodes of the system is provided by a general interface to which all system nodes are connected.

Advantages of Approach

The improvements of this CGI architectural approach as compared to existing CGI architectural approaches encompass a reduced complexity of the individual image synthesis processors, ease of system expansion, ease of including different graphic primitives, decoupling of the rendering process from the compositing process, and ease of system understanding. The reduced complexity of image synthesis processors is due to three factors: 1) the image generation task is distributed among many processors (OGNs), 2) the hidden surface removal with antialiasing is included in the architectural structure (compositing network), and 3) the effect of atmospheric attenuation is included in the architectural structure (VGN). The automatic processing of 2 and 3 above is relegated to the machine structure and the distributed processing of 1 above is shared among many image synthesis processors. The result is a simplified database and a reduction of the amount of geometry required to render an object. This relieves individual processing performance requirements of each object generation node, thus allowing modest processors, e.g., off-the-shelf VLSI, to perform their image synthesis tasks. Thus, OGNs do not have to be the same type. The decoupling of the rendering process from the compositing process is done through independent memory areas; this enhances system performance by keeping both processes running in parallel. The system expansion is eased, since it is done by merely adding additional CPNs and OGNs. New graphics primitives can be easily added to the system by additional OGNs that have special hardware. The basic goal, which may raise load balance issues, is to add more processors when performance demands increase. The system understanding is simplified, since the complex task of merging many objects is done through the generic machine structure.
An underlying advantage is the trivial pixel level solution to the intersection problem. A solution to a set of simultaneous equations is usually done to solve the intersection of two or more surfaces. This would require a large amount of calculations. The pixel level approach of this new architecture reduces the geometry that is typically involved for solving intersection problems to the comparison of depth values. The solution is an approximation; however, it is visually correct. Also, the hidden surface problem is solved in a similar pixel level manner. Since reflectivity is handled by a mixing factor, the opaque, transparency, semi-transparency, and edge-antialiasing problems associated with computer graphics are also consolidated to the pixel level. Along with this is the inclusion, at the pixel level, of the atmospheric attenuation effects. Thus, a compact pixel-by-pixel method allows the solution to complex geometrical problems and the inclusion of complex realistic image effects.

This organization will allow a full range of CGI systems that demand different performance requirements to be accommodated through one basic set of modules. It will be a generic structure that can grow with technological advances. System improvement is accommodated by adding modules as opposed to a system redesign that is usually associated with typical real-time CGI systems. Thus, in contrast to current fixed performance brute-force real-time CGI architectures, a variable performance and expandable real-time CGI architectural approach is presented here.

Target Applications

Target applications of this device will not be restricted to any specific real-time simulation task, e.g., vehicle simulation. A goal of this research is to extend the architecture for inclusion of other real-time simulation applications, e.g., process and system simulations. It will be a general purpose framework to simulate many things, in real-time, with visual output. In short, the object generation nodes can be thought of as processing logical objects. Objects can be a single item or a collection of items. For instance, an object can be as abstract as a blade of grass or a total field of grass, or it can be a physical object. Therefore, some or all of the tasks of simulation can be moved to each object, so long as these tasks are separable. However, the tasks do not have to be cleanly separable. For instance, overlaps could exist, which would be resolved in compositing. This would allow simple splitting of some objects into two somewhat overlapping ones without the need of calculating new intersections due to an artificial division cut. Also, true-color, pseudo-color, or both can be applied for the visual simulation with respect to the problem domain.

CHAPTER IV
COMPOSITING NETWORK

The compositing network (CN) is a hardware compositor that produces a computer graphics picture through blending heterogeneously rendered objects into a full image. These separately rendered objects are reductions of a total modeled environment into pieces that rely on compositing techniques for accumulation. Each object is produced by an individual object generation node (OGN), which is in itself a computer image generation device. The network configuration is in the form of a synchronous feed-forward tree that is connected to a multiplicity of object generation nodes (OGNs) for input and to a single video generation node (VGN) for output. Therefore, many object images are composited simultaneously.
The composite of additional object images is done through enlarging the compositing network and through including additional object generation nodes. There is no fixed configuration, but rather a general framework to configure a compositing network utilizing a collection of basic building blocks, called compositing processing nodes (CPNs).

The compositing network operation requires the simultaneous input of all instances of pixels with the same X and Y cartesian coordinate per unit time. Each instance of a pixel is part of an individual object rendered by an object generation node. These pixel values flow in a synchronous feed-forward manner through the compositing network, while being merged pixel-by-pixel at particular stages. The last stage of the network provides a single surviving pixel as output, which has an implied X coordinate and Y coordinate.

To summarize, the compositing process is carried out pixel-by-pixel through three steps: 1) every pixel value is simultaneously read from each image array of the object generation nodes at a specified X coordinate and Y coordinate, 2) the compositing process operates on the collection of pixel values read from the OGNs to produce a single composite pixel resultant, and 3) the single composite pixel resultant is written to the resident image array of the video generation node at the same X coordinate and Y coordinate used for the read operation. This process is repeated at every X coordinate and Y coordinate of the image array to produce every composite pixel value of the image array within the video generation node. The entire compositing network action, for each collection of pixels, can be generally characterized by

Pc = oper(P1, P2, P3, ..., Pi)    (1)

at every pixel with identical X, Y cartesian coordinates in the i image arrays. The value Pc represents a single surviving pixel after compositing. The "oper" operator is a general operation that symbolizes the compositing action due to the entire compositing network. Specifics of the compositing algorithm that the compositing network realizes are developed and described in the following sections.

Compositing Methodology

Guidelines for the generation of 2-D pictures and arithmetic for their 2-D compositing were discussed by Porter and Duff [POR84]. Their compositing method produced antialiased composite images through a pixel-by-pixel process. The antialiased composite or antialiased blending of images requires information about the subpixel overlap and object opacity. This information, as discussed by Porter and Duff [POR84], is given by adding a mixing factor to the color channels, which is called an ALPHA value. Therefore, a pixel is defined by four independent variables: RED, GREEN, BLUE, and ALPHA. Thus, the interplay of alpha values must be considered for compositing objects to accumulate a final image [POR84].

The ALPHA portion of an object representation provides two pieces of information for compositing: 1) the single ALPHA value represents the extent of coverage of an object within a pixel and 2) the collection of ALPHA values representing an object provides coverage information that designates the shape of an object within the image space. The pixel coverage information provides a mixing factor to control linear interpolation of foreground and background colors at every pixel. The object shape information, which is termed a matte, identifies the object from what is not the object within an isolated image array.
The ALPHA value represents the opacity of a pixel, which is a fractional value that ranges from zero to one. The antithesis of ALPHA, which is the transparency of a pixel, is defined as (1 - ALPHA). Therefore, the transparency value also ranges from zero to one. Figure 4-1 illustrates this coverage information, pictorially, for three distinct coverage types of opacity: no coverage, full coverage, and partial coverage. As shown, no coverage is indicated by an ALPHA value of zero, full coverage is indicated by an ALPHA value of one, and partial coverage is indicated by a fractional ALPHA value between zero and one [POR84].

Figure 4-1. Three distinct types of pixel coverage, with respect to the ALPHA value: a) no coverage (ALPHA = 0), b) full coverage (ALPHA = 1), and c) partial coverage (0 < ALPHA < 1, arbitrary depiction). The subpixel shape of the pixel with partial coverage is arbitrary and is only shown in this manner for conceptual clarity.

The pixel coverage information consists of an average value of opacity. Therefore, the subpixel distribution of opacity is not known or, in other words, the subpixel shape is not known. Thus, some pixel coverage information is missing, but the ALPHA value information is still useful for rendering transparent objects, semi-transparent objects, and performing non-commutative object edge antialiasing for rendering opaque, semi-transparent, or transparent objects. Also, since the ALPHA value represents the average coverage of an object within a pixel, the pixel color is determined by the product of ALPHA and the object's true color.

Porter and Duff [POR84] discussed many operators for the compositing of two-dimensional images. The operator of interest to this research is the "over" operator. This operator computes a composite pixel color due to one pixel in front of another. The composite pixel color is given by

cc = cf + (1 - Af)cb    (2)

and the composite opacity

Ac = Af + (1 - Af)Ab    (3)

where c denotes one of three tri-stimulus color values, A denotes the opacity value ALPHA, the subscript c denotes the composite, subscript f denotes the foreground, and subscript b denotes the background. Also, the true foreground color Cf is multiplied by the foreground opacity Af to produce cf and the true background color Cb is multiplied by the background opacity Ab to produce cb. This is done to keep the computation of cc similar to the computation of Ac. The derivation of the "over" operator is presented by Porter and Duff [POR84]. A similar development of "over," adjusted for this research, is presented in the following section of this chapter.
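The "over" computation of Equations (2) and (3) is compact enough to state directly in code. The sketch below assumes premultiplied colors (cf = Af*Cf, cb = Ab*Cb), as in the text; the function name and the (color, alpha) data layout are illustrative choices.

```python
def over(fg, bg):
    """Porter-Duff "over": composite a foreground pixel in front of a
    background pixel. Each pixel is (rgb, alpha) with rgb already
    multiplied by alpha, matching Equations (2) and (3)."""
    cf, af = fg
    cb, ab = bg
    c = tuple(f + (1.0 - af) * b for f, b in zip(cf, cb))  # Eq. (2), per channel
    a = af + (1.0 - af) * ab                               # Eq. (3)
    return c, a

# Half-opaque red in front of opaque green:
fg = ((0.5, 0.0, 0.0), 0.5)      # premultiplied: 0.5 * (1, 0, 0)
bg = ((0.0, 1.0, 0.0), 1.0)
print(over(fg, bg))              # -> ((0.5, 0.5, 0.0), 1.0)
```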
Porter and Duff's approach has a drawback of requiring the priority of images to be manually entered. Therefore, Duff [DUF85] introduced the depth variable, Z, as an extension to the earlier image composition algorithm to correct this drawback. The approach extended each pixel in the image space to contain five independent variables: RED, GREEN, BLUE, ALPHA, and Z. From this representation an RGBAZ algorithm was developed that combined the "over" operator of Porter and Duff [POR84] with a Z-buffer algorithm. Before discussing Duff's [DUF85] approach, the Z-buffer algorithm is presented and discussed.

A Z-buffer is a depth buffer that stores the Z cartesian coordinate, which is also termed the depth coordinate, of every visible pixel in image space. It is used in conjunction with a frame buffer, which is an attribute buffer that stores the intensity of each pixel in image space. A Z-buffer algorithm is a hidden-surface algorithm that operates on the RGB intensity information and the depth coordinate, Z, stored at each pixel in image space. The Z-buffer algorithm is described by Catmull [CAT74]. It functions by comparing the depth value of a new pixel, which is to be written into the frame buffer, with the depth value of the pixel that is currently stored in the Z-buffer. If the comparison indicates that the new pixel is closer to the viewpoint than the current pixel, then the new pixel's intensity value is written into the frame buffer and its depth value is written into the Z-buffer [ROG85]. If the comparison does not indicate the new pixel is closer to the viewpoint than the current pixel, then the current pixel values remain in the frame buffer and in the Z-buffer.

To recapitulate, the Z-buffer algorithm is a search over X, Y in 3-D space for the value of Z(X,Y) that is closest to the viewpoint in image space. The Z-buffer operation can be defined as RGBZ = zmin(L,M), where L is an image array, M is an image array, and RGBZ is the survivor pixel from either M or L according to the algorithm. The collection of resultant RGBZ survivors over X, Y produces an image space that is a composite image of the rendered objects. This composite operation [DUF85] is more compactly characterized as

Zc = zmin(ZL, ZM)    (4)
RGBc = RGBL, if ZL = zmin, else RGBM    (5)

at every pixel in the two image arrays. The subscript c denotes the composite. The "zmin" operator has two useful properties: it is both commutative and associative.

The Z-buffer algorithm allows pixels to be written into the frame buffer in arbitrary order. Therefore, the computation time associated with a depth sort operation is eliminated [ROG85]. Unfortunately, the algorithm has inherent aliasing problems due to its point sampling nature [DUF85]. It also fails for rendering transparent objects, but it is fast and simple [CAR84].

Duff's approach utilized the depth value at each of the four corners of a pixel to compute a fraction called BETA. This value is computed through linearly interpolating the four depth corner values. The composite color is computed by

cc = B(cf over cb) + (1 - B)(cb over cf)    (6)

and

Zc = min(Zf, Zb)    (7)

where c denotes one of three tri-stimulus color values multiplied by its respective opacity value, Z denotes the depth value, B denotes Duff's BETA value, the subscript c denotes the composite, subscript f denotes the foreground, and subscript b denotes the background. This approach combines the pixels by area sampling. A drawback of this 3-D compositing approach and of the previously discussed 2-D compositing approach is that they do not apply when the edges of more than one object are projected onto a single pixel. The compositing algorithm developed in this research, which is discussed in the following section of this chapter, addresses this problem.

Another interesting approach to compositing was discussed by Carpenter [CAR84], with the introduction of the A-buffer. An A-buffer is an antialiased hidden surface mechanism that is an enhancement to the Z-buffer through inclusion of a mask that contains subpixel coverage information. Therefore, the mask provides more pixel coverage information than the ALPHA value, but it is more memory intensive.
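For contrast with "over", here is a minimal rendition of the "zmin" composite of Equations (4) and (5). An equal-depth tie is resolved arbitrarily in favor of the first array, a simplification; the RGBZA algorithm developed below treats that case explicitly.

```python
def zmin_composite(L, M):
    """Z-buffer composite of two RGBZ image arrays: at each (X, Y) the
    survivor is whichever input pixel lies closer to the viewpoint.
    Pixels are (rgb, z) pairs; rows are lists of pixels."""
    return [[lp if lp[1] <= mp[1] else mp      # keep the nearer pixel
             for lp, mp in zip(lrow, mrow)]
            for lrow, mrow in zip(L, M)]

FAR = float('inf')                             # depth of an uncovered pixel
L = [[((1, 0, 0), 2.0), ((0, 0, 0), FAR)]]     # red surface at depth 2
M = [[((0, 0, 1), 5.0), ((0, 0, 1), 5.0)]]     # blue surface at depth 5
print(zmin_composite(L, M))                    # red where both cover, else blue
```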
The compositing techniques specified in the reviewed literature had specific ideal objectives, which are listed as follows:

1. Must not induce spatial aliasing in the image, which implies that soft edges of objects must be respected in computing the final image [POR84].
2. Provide facilities for arbitrary dissolves, fades, darkening, and attenuation of objects [POR84].
3. Exploit the full associativity of the compositing process, which implies accumulation of several foreground objects into an aggregate foreground can be inspected over different backgrounds [POR84].
4. Allow various object representations: transparent, semi-transparent, and opaque [POR84].
5. Visibility technique must support all conceivable geometric modeling primitives: polygons, quadrics, patches, fractals, and so on [CAR84, DUF85].
6. Must handle opaque intersecting surfaces and transparent intersecting surfaces [CAR84].
7. Must handle hidden surface removal [CAR84, DUF85].

The proposed new architectural approach attempts to satisfy these compositing technique objectives. Unfortunately, due to trade-offs taken to keep the approach within hardware limits, some of these objectives are not entirely met. The constraints and trade-offs associated with the approach addressed through this research, which concern the stated ideal objectives, are discussed in later sections and chapters.

RGBZA Compositing Algorithm

The proposed compositing method is developed to allow any number of images to be composited with hidden surface removal and antialiasing. The compositing algorithm realized by the compositing network is based on Porter and Duff's [POR84] "over" operator, but is modified through the introduction of the depth value. This rendition modifies the "over" operator through incorporating the "zmin" operator for identifying the foreground pixel from the background pixel. The algorithm is labeled an RGBZA algorithm, akin to Duff's RGBAZ [DUF85], but it differs from that formulation. It is developed and described in the subsequent paragraphs.

Consider opacity values, A1 and A2, belonging to a pair of semi-transparent pixels, P1 and P2, that have identical X and Y coordinates, but differ in the Z coordinate where the Z1 value is less than that of the Z2 value. The composite Z value for this situation, in accordance with the Z-buffer algorithm utilizing Equation 4, is given by

Zc = Z1    (8)

where Z denotes the depth value, and subscript c identifies the composite resultant.

The depth comparison identifies pixel P1 as being closer to the viewpoint than pixel P2. Therefore, pixel P1 is identified as the foreground pixel and pixel P2 is identified as the background pixel. The opacity representation designates the opaqueness of pixel P1 as A1 and its clearness as (1 - A1). Likewise, the opaqueness of pixel P2 is A2 and its clearness is (1 - A2). This implies that the composite opacity, according to the "over" operator, of the two pixels is given by

Ac = A1 + (1 - A1)A2    (9)

where A denotes the opacity value. An example of this situation is depicted in Figure 4-2.

Figure 4-2. Two pixel opacity values are composited: a background object with partial pixel coverage, a foreground object with partial pixel coverage, and the composited objects with shared partial pixel coverage. The values were derived from coverage information from two different objects. The coverage depictions are arbitrary; they are given specific subpixel forms to clarify the composite operation. The coverage areas are actually averaged across the pixel.
The composite color is calculated by realizing that pixel P1 allows (1 - A1) of its background light through and reflects A1 of its color. Likewise, pixel P2 allows (1 - A2) of its background light through and reflects A2 of its color. Therefore, P1 reflects A1 of its color and lets (1 - A1) of P2's reflected color through. This implies that the composite color, according to the "over" operator, of the two pixels is given by

cc = A1C1 + (1 - A1)A2C2    (10)

where C represents the tri-stimulus colors: RED, GREEN, and BLUE. The upper case C is used to designate the true color, which occurs when the pixel is 100% overlapped by the object. The lower case color c depicts the true color value multiplied by its opacity value, which is given by

cc = AcCc    (11)

A similar argument follows, as presented above, when Z2 is less than Z1. For this condition, substitute pixel subscript identifiers "2" for "1" and "1" for "2" in the development presented above. The composite depth, opacity, and color would then be given by

Zc = Z2    (12)
Ac = A2 + (1 - A2)A1    (13)
cc = A2C2 + (1 - A2)A1C1    (14)

respectively.

The incorporation of the "zmin" operator with the "over" operator requires an additional development for the effects of two pixels with equal depth values. This condition implies that two objects are occupying the same voxel in space. Therefore, both objects contribute to the intensity of the resultant pixel, but the intensity contribution due to each of these objects is nebulous. This condition can be understood by considering the quantization error due to the use of finite depth values. The opacity contributions from the input pixels may be due to pixel overlap. But, the foreground and background object can not be discerned, since the difference in depth is within the limits of the quantization error. The development of this condition will consider the pixel as a small cubic volume, instead of a small surface. This model will allow the edges of two objects to be projected into its space. Only the viewable or reflective front surface of this small cubic volume is of interest for determination of the opacity and color values.

The composite opacity is found by first considering the condition, Z1 < Z2, where (Z2 - Z1) is within the quantization error. The composite opacity would then be equal to Equation 9. Now, consider the condition, Z1 > Z2, where (Z1 - Z2) is within the quantization error. The composite opacity would then be equal to Equation 13. The probability of either of these conditions occurring within the small cubic volume is the same. Therefore, the composite opacity and color values are computed through a simple average of the two possible conditions, which are given by

Ac = [(A1 + (1 - A1)A2) + (A2 + (1 - A2)A1)]/2 = A1 + A2 - A1A2    (15)

and

cc = [(A1C1 + (1 - A1)A2C2) + (A2C2 + (1 - A2)A1C1)]/2 = A1C1 + A2C2 - (C1 + C2)A1A2/2    (16)

Also, the composite depth is given by

Zc = Z1 = Z2    (17)

It is interesting to note that Equations 9, 13, and 15 are equal. Boundary analysis of Equation 16 is performed to check its validity, which is presented as follows:

cc = A1C1, if A2 = 0    (18)
cc = A2C2, if A1 = 0    (19)
cc = (C1 + C2)/2, if A1 = A2 = 1    (20)

The first two boundary examples demonstrate a reduction to a single pixel case, which is to be expected. The last boundary example reduces to an average color that does not become amplified, which is also to be expected. A pseudocode outline of this RGBZA compositing algorithm with respect to a pair of image arrays is given in Figure 4-3.
As shown, each pixel of the two image arrays is composited to produce a composite image array for display.

```
The RGBZA Compositing Algorithm

given
    An array RGBZA1[x,y]
    An array RGBZA2[x,y]
    An array rgbZAc[x,y]
begin
    for each element (x,y) of array rgbZAc[x,y] do
        Ac = A1 + A2 - A1A2
        if Z1 < Z2 then
            rc = A1R1 + (1-A1)A2R2
            gc = A1G1 + (1-A1)A2G2
            bc = A1B1 + (1-A1)A2B2
            Zc = Z1
        endif
        if Z1 > Z2 then
            rc = A2R2 + (1-A2)A1R1
            gc = A2G2 + (1-A2)A1G1
            bc = A2B2 + (1-A2)A1B1
            Zc = Z2
        endif
        if Z1 = Z2 then
            rc = A1R1 + A2R2 - (R1 + R2)A1A2/2
            gc = A1G1 + A2G2 - (G1 + G2)A1A2/2
            bc = A1B1 + A2B2 - (B1 + B2)A1A2/2
            Zc = Z1
        endif
    endfor
    Display rgbc array of the rgbZAc array
end
```

Figure 4-3. The RGBZA Compositing Algorithm.

The composite opacity and depth values are not needed for display; they are included so that the resultant image array can be composited with other image arrays. This subject is discussed in the following section.
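For readers who want to execute the rule of Figure 4-3 directly, the following is a runnable Python rendition of its per-pixel case. A pixel is modeled as an (r, g, b, A, Z) tuple with color premultiplied by opacity; that tuple layout is an illustrative choice.

```python
def rgbza(p1, p2):
    """Composite two pixels by the RGBZA rule of Figure 4-3. Each pixel
    is (r, g, b, A, Z) with r, g, b already multiplied by A."""
    r1, g1, b1, a1, z1 = p1
    r2, g2, b2, a2, z2 = p2
    ac = a1 + a2 - a1 * a2                 # composite opacity, all three cases
    if z1 < z2:                            # p1 is the foreground pixel
        out = (r1 + (1-a1)*r2, g1 + (1-a1)*g2, b1 + (1-a1)*b2, ac, z1)
    elif z1 > z2:                          # p2 is the foreground pixel
        out = (r2 + (1-a2)*r1, g2 + (1-a2)*g1, b2 + (1-a2)*b1, ac, z2)
    else:                                  # equal depth: average the two orders
        out = (r1 + r2 - (a1*r2 + a2*r1)/2,
               g1 + g2 - (a1*g2 + a2*g1)/2,
               b1 + b2 - (a1*b2 + a2*b1)/2, ac, z1)
    return out

# A half-opaque white pixel in front of an opaque red pixel:
print(rgbza((0.5, 0.5, 0.5, 0.5, 1.0), (1.0, 0.0, 0.0, 1.0, 4.0)))
```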
    [Figure 4-4: a binary tree with the video generation node (VGN)
    above level 1; levels 1 through 3 contain CPNs, and the eight
    terminal connections (1 through 8) come from deeper levels of
    compositing processing nodes (CPNs) or from eight object
    generation nodes (OGNs).]

    Figure 4-4. A fully balanced three-level compositing tree.

The compositing network can be modeled as a pipelined machine, where each level in the system's binary tree structure is a pipeline stage. At every machine cycle, a collection of pixels with identical X, Y coordinates is routed to the nodes within the lowest level of the system tree. The machine operation proceeds in a synchronous feed-through manner for every machine cycle, where a collection of pixels at a particular level in the system tree is computed to produce a collection of composite pixel values as a result. These results are routed, before the next machine cycle, to the inputs of the succeeding level in the system tree. Therefore, each succeeding level in the system tree produces half as many pixel values (fully balanced tree) as were provided for input. The output of this machine is a single composite pixel value, which is produced from the highest level of the system tree. This structure is classified as a synchronous feed-forward configuration, where CPN operation is synchronous with the image update rate. Therefore, the machine cycle-time is a function of both the image space resolution and the image update rate. The pipeline is considered full when every CPN in the system tree has a valid input. During a full pipeline state, each level of the tree is processing a set of pixels that have identical X, Y coordinates. Therefore, the start-up time through a tree will be a function of the tree depth and the number of pipeline stages within an individual CPN. The effect of the feed-through structure of the compositing network has to be considered with regard to the RGBZA algorithm. This structure has a cumulative effect that directly influences the compositing operation. Therefore, the RGBZA algorithm has to be adjusted to accommodate this fact. The compositing network is a subtree of the system tree, and the OGNs are terminal nodes of the system tree that provide input to the compositing network. Now, consider the evaluation of the composite opacity value from a fully balanced system tree with i CPNs and i+1 OGNs, where the total number of tree nodes is 2i+1. The CPNs are located at binary tree positions 1 through i. The OGNs are located at binary tree positions i+1 through 2i+1. Note that a fully balanced system tree is used to simplify this development. However, the system tree can be unbalanced to accommodate a number of OGNs that is not a power of two. The criterion is for all of the OGNs to exist at the same level within the system tree. This subject is discussed further in the system features discussion of the conclusion. For the fully balanced system tree, the composite opacity defined from the first CPN or root node, 1, to the last CPN, i, for all cases, is given as follows:

    A1 = A2 + A3 - A2A3
    A2 = A4 + A5 - A4A5
    A3 = A6 + A7 - A6A7
      ...
    Ai = A2i + A2i+1 - A2iA2i+1                             (21)

where the subscript identifies the tree node number. The result is a recursive relation for the evaluation of the opacity value. The composite color value and depth value are defined through a similar development for each of the three depth value comparisons.
The condition Z2i < Z2i+1 gives

    Zi = Z2i                                                (22)
    ci = c2i + (1 - A2i)c2i+1                               (23)

the condition Z2i > Z2i+1 gives

    Zi = Z2i+1                                              (24)
    ci = c2i+1 + (1 - A2i+1)c2i                             (25)

and the condition Z2i = Z2i+1 gives

    Zi = Z2i = Z2i+1                                        (26)
    ci = c2i + c2i+1 - (A2ic2i+1 + A2i+1c2i)/2              (27)

where the lower case color "c" depicts the true color value multiplied by its opacity value. This form of the compositing functions requires each color value entering the compositing network to be multiplied by its respective opacity. Also, each composite color value exiting the network will be the composite color multiplied by its composite opacity. The recursive relations are handled by iterative techniques utilizing CPNs. The RGBZA compositing algorithm that each CPN should execute is depicted in Figure 4-5. This algorithm, which is termed the general RGBZA compositing algorithm, includes the image arrays and the multiplication operation of the OGNs. It also includes the image array of the VGN and a reference to the DDN. The second loop within the main loop is the actual network algorithm. This task inputs a collection of pixel values for processing, according to their respective depth relationship, to produce a single surviving composite pixel value for output. The loop counts down in order to obviate the start-up time that would be associated with a hardware pipeline.

    General RGBZA Compositing Algorithm

    const
      n = total number of CPNs in tree
    given
      Array rgbZAi, where i = 1,2,3,...,2n+1       {node registers}
      Array RGBZAj,x,y, where j = 1,2,3,...,n+1    {OGN memory}
      Array rgbx,y                                 {VGN memory}
    begin
      for each element (x,y) of RGBZAj,x,y and rgbx,y do
        for i = n+1 to 2n+1 do                     {load OGN output registers}
          j = i - n
          ri = Aj,x,y Rj,x,y
          gi = Aj,x,y Gj,x,y
          bi = Aj,x,y Bj,x,y
          Ai = Aj,x,y
          Zi = Zj,x,y
        endfor                                     {end load OGN output registers}
        for i = n downto 1 do                      {CPN compositing operation}
          Ai = A2i + A2i+1 - A2iA2i+1
          if Z2i < Z2i+1 then
            ri = r2i + (1 - A2i)r2i+1
            gi = g2i + (1 - A2i)g2i+1
            bi = b2i + (1 - A2i)b2i+1
            Zi = Z2i
          endif
          if Z2i > Z2i+1 then
            ri = r2i+1 + (1 - A2i+1)r2i
            gi = g2i+1 + (1 - A2i+1)g2i
            bi = b2i+1 + (1 - A2i+1)b2i
            Zi = Z2i+1
          endif
          if Z2i = Z2i+1 then
            ri = r2i + r2i+1 - (A2ir2i+1 + A2i+1r2i)/2
            gi = g2i + g2i+1 - (A2ig2i+1 + A2i+1g2i)/2
            bi = b2i + b2i+1 - (A2ib2i+1 + A2i+1b2i)/2
            Zi = Z2i
          endif
        endfor                                     {end CPN compositing operation}
        rgbx,y = rgb1                              {write composite result to VGN}
      endfor
      Display rgbx,y array                         {DDN}
    end

    Figure 4-5. The general RGBZA compositing algorithm for a fully
    balanced tree. Note that lower case letters designate the
    product of intensity and opacity.

The operation of the entire compositing network reduces to a special case when only opaque objects are involved without the inclusion of special effects (e.g., dissolves, darkening, antialiasing, etc.). This is given as follows:

         { Z2i,              if Z2i < Z2i+1
    Zi = { Z2i+1,            if Z2i > Z2i+1                 (28)
         { Z2i,              if Z2i = Z2i+1

         { C2i,              if Z2i < Z2i+1
    Ci = { C2i+1,            if Z2i > Z2i+1                 (29)
         { (C2i + C2i+1)/2,  if Z2i = Z2i+1

at every pixel in the i+1 image arrays. The composite pixel has either full coverage or no coverage. Therefore, the opacity information is not needed. Also, the matte information is implied by a depth value that is not the maximum. The specialized RGBZ algorithm is presented in Figure 4-6. As for Figure 4-5, the second loop within the main loop is the actual network algorithm. This task inputs a collection of pixel values for processing, according to their respective depth relationship, to produce a single surviving composite pixel value for output. A mixture of both the specialized and the general forms of the compositing algorithm for CPNs can compose a compositing network.
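The CPN loop of Figure 4-5 maps naturally onto an array-stored binary tree. The following C sketch mirrors the two inner loops for a single (x,y) element, reusing the hypothetical PixelRGBZA type and composite() function from the earlier sketch; the heap-style indexing (children of node i at positions 2i and 2i+1) and the function name composite_pixel_tree are illustrative assumptions, not the dissertation's hardware description.

    /* Software analogue of the inner loops of Figure 4-5 for one
       (x,y) element. The system tree is stored as an array of node
       registers indexed 1..2n+1: positions 1..n are CPNs, and
       positions n+1..2n+1 receive the n+1 OGN outputs (colors
       already premultiplied by opacity). node[1] ends up holding
       the surviving composite pixel for the VGN. */
    void composite_pixel_tree(PixelRGBZA node[], int n,
                              const PixelRGBZA ogn_out[] /* n+1 entries */)
    {
        for (int i = n + 1; i <= 2 * n + 1; i++)   /* load OGN output registers */
            node[i] = ogn_out[i - n - 1];

        for (int i = n; i >= 1; i--)               /* CPN compositing operation */
            node[i] = composite(node[2 * i], node[2 * i + 1]);
    }

Counting i downward plays the role of the hardware's level-by-level, deepest-first evaluation; for a fully balanced tree, n+1 must be a power of two, and the unbalanced variants discussed in the text would change only the index arithmetic.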
The OGNs can be specialized for opaque objects without antialiasing and special effects. These nodes would be assigned to the section of the tree that contains the specialized CPNs. Also, OGNs that process objects with antialiasing and special effects can be assigned to the section of the tree that contains the general CPNs. Configurations could include a mix of general and specialized CPNs. The purpose of mixing CPNs would be to reduce the system complexity, since the specialized CPNs are of a simpler form than the general CPNs.

    Specialized RGBZ Compositing Algorithm

    const
      n = total number of CPNs in tree
    given
      Array RGBZi, where i = 1,2,3,...,2n+1        {node registers}
      Array RGBZj,x,y, where j = 1,2,3,...,n+1     {OGN memory}
      Array RGBx,y                                 {VGN memory}
    begin
      for each element (x,y) of RGBZj,x,y and RGBx,y do
        for i = n+1 to 2n+1 do                     {load OGN output registers}
          j = i - n
          Ri = Rj,x,y
          Gi = Gj,x,y
          Bi = Bj,x,y
          Zi = Zj,x,y
        endfor                                     {end load OGN output registers}
        for i = n downto 1 do                      {CPN compositing operation}
          if Z2i < Z2i+1 then
            Ri = R2i
            Gi = G2i
            Bi = B2i
            Zi = Z2i
          endif
          if Z2i > Z2i+1 then
            Ri = R2i+1
            Gi = G2i+1
            Bi = B2i+1
            Zi = Z2i+1
          endif
          if Z2i = Z2i+1 then
            Ri = (R2i + R2i+1)/2
            Gi = (G2i + G2i+1)/2
            Bi = (B2i + B2i+1)/2
            Zi = Z2i
          endif
        endfor                                     {end CPN compositing operation}
        RGBx,y = RGB1                              {write composite result to VGN}
      endfor
      Display RGBx,y array                         {DDN}
    end

    Figure 4-6. The specialized RGBZ compositing algorithm of a
    fully balanced tree.

Compositing Processing Node

The purpose of a CPN is to perform pixel-by-pixel compositing. It is a fundamental iterative hardware building block used to construct a CN tree. A generic CPN configuration is depicted in Figure 4-7.

    [Figure 4-7: a CPN with pixel inputs P2i and P2i+1 from the
    previous stage of the system tree, pixel output Pi to the next
    stage, and a system clock input (CLK) from the VGN, where

        Pi    = {ri, gi, bi, Ai, Zi}
        P2i   = {r2i, g2i, b2i, A2i, Z2i}
        P2i+1 = {r2i+1, g2i+1, b2i+1, A2i+1, Z2i+1}

    CPNi is a CPN located in the CN at node position "i," and CLK
    is the system timing input.]

    Figure 4-7. An iterative building block depiction of a
    compositing processing node (CPN).

The subscript "i" is a node number that identifies a node within the system tree, which consists of CPNs, OGNs, and a VGN. The CN is a subset of the system tree that contains only CPNs. The OGNs are the terminal nodes of the system tree. The VGN is connected to the root node of the CN and is identified through node number zero of the system tree. As shown, the CPN structure maintains a 2-to-1 configuration. The data inputs consist of two pixel values, P2i and P2i+1, which can be routed to the CPN by either two preceding CPNs or by two preceding OGNs. The data output consists of a single pixel value, which can be routed to the input of a succeeding CPN or to the input of a video generation node. The CPN input clock, CLK, is driven by a system clock that synchronizes the internal CPN operation with the entire system. This signal is provided by the video generation node (VGN), which maintains the entire system timing and control. A pixel is represented by five independent variables: RED, GREEN, BLUE, ALPHA, and Z. The tri-stimulus color or intensity is represented by the values of RED, GREEN, and BLUE. The ALPHA value represents the average opaqueness of the pixel or the average light blocking characteristic of the material that the pixel represents. The Z value represents the Z coordinate, in cartesian space, where the pixel exists.
The X, Y cartesian coordinates are implied as identical for both input pixels, but may be different within the same clock cycle for the single output pixel due to the hardware pipeline approach. Schemes for realizing the previously discussed RGBZA compositing algorithms are developed that are fast and inexpensive to implement in hardware, but which produce results of numerically high quality. These schemes honor two considerations: machine and numerical. Machine considerations concern the speed and cost of the physical device. Numerical considerations concern the closest approximations to the exact numbers. The schemes attempt to maintain a balance between both. Also, the effects of roundoff error accumulation due to the feed-through operation of the binary tree of CPNs are taken into consideration. Finite precision fixed-point numbers are used in this machine for representation of the pixel values. This representation allows pixel values to be stored within the local image buffers of the OGNs and of the VGN as integers, which simplifies the image buffer organization. Also, the hardware complexity for realization of the compositing algorithms and of the video generation processing algorithms is reduced, along with accommodation of faster cycle times for an implementation. Therefore, the compositing algorithms that are depicted in Figures 4-5 and 4-6 must be adjusted to accommodate the fixed-point representation of a pixel value, which relates to the machine precision of a number represented within and operated on by a CPN. The tri-stimulus color variables RED, GREEN, and BLUE are usually represented in rendering algorithms as fixed-point numbers. Therefore, they do not create an initial problem. But roundoff error amplification can occur due to the repetitive modification of these values through the compositing network. Therefore, to enhance the numeric accuracy of the final result, the roundoff error has to be controlled. Representation of the opacity value, ALPHA, presents a similar problem, but differs slightly since its initial value is defined as a fractional number. The depth variable, Z, is usually represented within a rendering algorithm as a floating-point number. The compositing network handles the depth value as an integer and does not modify its value; therefore, its floating-point value can be rounded or truncated. The algorithm modification and the CPN conceptual hardware organization are discussed in the following sections. Two CPN organizations are presented: the general CPN and the specialized CPN. The combinatorial hardware layout of the conceptual block diagrams consists of breaks in the logic for registers, termed stages. This is done to maintain a pipeline of partial operations, which enhances performance. Maximum system performance is then achieved by matching the clock cycle to the longest delay through the slowest stage. The stage delay is calculated by totaling the delay through the logic and conductors that exist between the two registers of a stage. Also, the stage with the longest delay becomes the bandwidth-limiting section.

General Compositing Processing Node

A general CPN performs pixel-by-pixel compositing of various object types: opaque, transparent, and semi-transparent. It also handles antialiasing and special effects, such as fade-outs and fade-ins. The hardware organization contains three distinct functional units: the depth computation unit, the opacity computation unit, and the color computation unit.
These functional units are discussed with respect to finite fixed-point pixel value representation. The algorithm and the conceptual hardware organization of each unit are presented.

Depth computation unit

The depth computation unit discerns the foreground pixel from the background pixel or identifies both as foreground pixels. This unit functions according to the algorithm presented in Figure 4-8, which is a subset of the general RGBZA compositing algorithm of Figure 4-5. The depth value, Z, is represented by a single z-bit integer, where 0 ≤ Z ≤ (2^z - 1). Therefore, the floating point representation of this value is initially truncated or rounded. The task performed by this unit, as the algorithm indicates, consists of 1) receiving a pair of depth values, 2) performing a comparison of the depth values, 3) providing status information, and 4) outputting the smallest depth value. Status information consists of the LESS bit and the EQUAL bit, which are used by the color computation unit. The LESS bit, when set, indicates that the Z2i value is smaller than the Z2i+1 value. The EQUAL bit, when set, indicates that the Z2i value and the Z2i+1 value are equal in magnitude.

    CPN Depth Computation Unit Algorithm

    given
      literal Z2i                     {z-bit integer}
      literal Z2i+1                   {z-bit integer}
    begin
      LESS = 0
      EQUAL = 0
      if Z2i < Z2i+1 then
        Zi = Z2i
        LESS = 1
      else
        Zi = Z2i+1
        if Z2i = Z2i+1 then
          EQUAL = 1
        endif
      endif
    end                               {result is a z-bit integer}

    Figure 4-8. The algorithm performed by a general CPN depth
    computation unit.

A block diagram of the CPN depth computation unit is depicted in Figure 4-9. Stage 1 performs an initial load of the incoming pair of depth values from the CPN interconnect. Stage 2 performs a comparison of the two depth values for status information and passes the two depth values along with the status information. Stage 3 routes the surviving depth value, which is the composite depth value, to succeeding stages utilizing a 2:1 multiplexer with the LESS status bit as a selector. The succeeding stages are waiting stages that allow a final result to occur simultaneously with the remaining CPN computation unit results.

    [Figure 4-9: block diagram with depth inputs Z2i and Z2i+1, a
    comparator (inputs A and B), and staging registers clocked at
    the image update rate.]

    Figure 4-9. Block diagram of a general CPN depth computation
    unit.

Opacity computation unit

The opacity computation unit produces a single composite opacity value from two opacity values that are provided as input. The opacity is defined as a positive fractional value that ranges from zero to one. Each opacity value is stored in the image buffers of this machine as a fixed-point binary number. Therefore, the opacity value is represented by a positive fixed-fractional value given by

    0 ≤ A/Amax ≤ 1                                          (30)

where A is a binary integer such that 0 ≤ A ≤ Amax, and Amax is a constant that defines the range of opacity. The local image buffers store the integer value, A, while the fixed-fractional interpretation is incorporated by the hardware. Substituting the opacity representation of Equation 30 into Equation 21 and collecting terms gives

    Ai = A2i + (Amax - A2i)A2i+1/Amax                       (31)

The division required in Equation 31 is eliminated by defining Amax as 2^(m-1), where m is the number of bits in A. This transforms the division operation into a shift operation. A trade-off occurs with this technique, since each image buffer within the OGNs will require an extra bit-plane and an extra signal line to represent the opacity value for a particular range.
Substituting the value of Amax into Equation 31 gives

    Ai = A2i + (2^(m-1) - A2i)A2i+1/2^(m-1)                 (32)

In order to have a more accurate result, the hardware unit represents each opacity value as a higher precision number, which reduces the roundoff error accumulation through the compositing network. This is shown by multiplying both sides of Equation 32 by 2^m and adjusting the product term, which gives

    2^m Ai = 2^m A2i + (2^(2m-1) - 2^m A2i)(2^m A2i+1)/2^(2m-1)    (33)

Equation 33 shows that the opacity value can be handled as a double precision number, if the opacity value is shifted left by m bits and the least-significant half of the word is padded with zeroes before entering the CN tree. Therefore, the opacity computation with the opacity values defined as double precision numbers is given by

    Ai = A2i + (2^(2m-1) - A2i)A2i+1/2^(2m-1)               (34)

where the opacity value, Ai, is a binary integer such that 0 ≤ Ai ≤ 2^(2m-1). The opacity computation unit functions according to the algorithm presented in Figure 4-10, which follows the developed relations. The task performed by this unit consists of 1) receiving a pair of opacity values, 2) performing an opacity compositing operation, and 3) outputting the composite opacity result.

    CPN Opacity Computation Unit Algorithm

    const
      m = number of bits of initially stored opacity value
    given
      literal A2i                     {2m-bit integer}
      literal A2i+1                   {2m-bit integer}
    begin
      Ai = A2i + [(2^(2m-1) - A2i)A2i+1] shr (2m-1)
      if [(2^(2m-1) - A2i)A2i+1 AND 2^(2m-2)] = 2^(2m-2) then
        Ai = Ai + 1                   {roundoff error}
      endif
    end                               {result is a 2m-bit integer}

    Figure 4-10. The algorithm performed by a general CPN opacity
    computation unit.

A block diagram of the opacity computation unit is depicted in Figure 4-11. Stage 1 performs an initial load of the pair of opacity values from the CPN interconnect. Stage 2 performs a subtraction operation and passes the two opacity values along with the subtraction result. Stage 3 performs a multiplication of the subtraction result with the A2i+1 opacity value and shifts the multiplication result right by 2m-2 bits (division). It also passes the A2i opacity value along with the shifted multiplication result. Stage 4 sums the A2i value with the shifted multiplication result and performs rounding, which produces the composite opacity. Note that eliminating the signal input to the carry bit and setting the carry bit to 0 will cause chopping of the multiplication result, instead of rounding. The succeeding stages are waiting stages that allow a final result to occur simultaneously with the remaining CPN computation unit results.

    [Figure 4-11: block diagram with opacity inputs A2i and A2i+1,
    a subtractor forming 2^(2m-1) - A2i, a multiplier, an adder
    whose carry input receives the LSB rounding bit, and staging
    registers clocked at the image update rate.]

    Figure 4-11. Block diagram of a general CPN opacity computation
    unit.
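The fixed-point opacity computation just developed (Equation 34, with the rounding rule of Figure 4-10) can be sketched in C. This is an illustrative sketch, not the gate-level design: the function name composite_opacity and the constant M are assumptions, M = 8 is chosen to match the example system defined later, and the uint32_t arithmetic stands in for the hardware's subtract, multiply, shift, and carry-in rounding stages.

    #include <stdint.h>

    /* Opacities enter the network as 2m-bit integers in
       [0, 2^(2m-1)]: the m-bit stored value shifted left by m bits
       with zero padding. m = 8 here. */
    enum { M = 8 };                              /* bits of stored opacity   */

    static uint32_t composite_opacity(uint32_t a2i, uint32_t a2i1)
    {
        const uint32_t full = 1u << (2 * M - 1); /* 2^(2m-1): opacity of 1.0 */
        uint32_t prod = (full - a2i) * a2i1;     /* (2^(2m-1) - A2i)A2i+1    */
        uint32_t ai = a2i + (prod >> (2 * M - 1));  /* shift replaces divide */
        if (prod & (1u << (2 * M - 2)))          /* dropped half-bit set?    */
            ai += 1;                             /* round rather than chop   */
        return ai;                               /* 0 <= Ai <= 2^(2m-1)      */
    }

For example, with M = 8 two half-opacities (A = 16384) compose to 16384 + 16384*16384/32768 = 24576, i.e., 0.75, exactly as Equation 21 predicts for 0.5 over 0.5.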
Color computation unit

The color computation unit produces a composite color value from two color values that are provided as input. The color values are defined as the true tri-stimulus color values multiplied by their respective opacity values. Each primary color (intensity) value, c, is stored in the machine's image buffers as an n-bit integer, where 0 ≤ c ≤ 2^n - 1. But this hardware unit handles the opacity and the color values as higher precision numbers (2m and m+n-1 bits, respectively) in order to reduce the roundoff error accumulation through the compositing network. The composite color operation for the Z2i < Z2i+1 condition is developed by substituting the higher precision representations of both the intensity and the opacity values into Equation 23, which gives

    ci = c2i + (2^(2m-1) - A2i)c2i+1/2^(2m-1)               (35)

where each primary color value, c, is defined as an m+n-1 bit value within the CN tree. Therefore, each n-bit primary color value requires multiplication by the m-bit opacity value before entering the CN tree. When exiting the CN tree, each color value requires shifting right by m-1 bits with rounding or chopping to provide an n-bit result. The Z2i > Z2i+1 condition is obtained through a similar development, but with the use of Equation 25. It is given in final form by

    ci = c2i+1 + (2^(2m-1) - A2i+1)c2i/2^(2m-1)             (36)

The Z2i = Z2i+1 condition is also obtained through a similar development, but with the use of Equation 27. It is given in final form by

    ci = c2i+1 + (2^(2m-1) - A2i+1/2)c2i/2^(2m-1)
               - (A2i/2)c2i+1/2^(2m-1)                      (37)

Note that the first two terms of Equation 37 are similar to Equation 36. This reduces the hardware requirement for a realization. The color computation unit functions according to the algorithm presented in Figure 4-12, which follows the developed relations, where the color triad is represented by c. The task performed by this unit consists of 1) receiving a pair of color triads, a pair of opacity values, and status information, 2) performing a compositing operation according to the depth comparison, and 3) outputting the result.

    CPN Color Computation Unit Algorithm

    const
      n = number of bits of initially stored intensity value
      m = number of bits of initially stored opacity value
    given
      literal c2i       {m+n-1 bit integer: red, green, or blue}
      literal c2i+1     {m+n-1 bit integer: red, green, or blue}
      literal A2i       {2m-bit integer}
      literal A2i+1     {2m-bit integer}
      literal LESS      {1 bit}
      literal EQUAL     {1 bit}
    begin
      if LESS = 1 then                          {Z2i < Z2i+1}
        ci = c2i + [(2^(2m-1) - A2i)c2i+1] shr (2m-1)
        if [(2^(2m-1) - A2i)c2i+1 AND 2^(2m-2)] = 2^(2m-2) then
          ci = ci + 1                           {roundoff error}
        endif
      else
        if EQUAL = 1 then                       {Z2i = Z2i+1}
          ci = c2i+1 + [(2^(2m-1) - (A2i+1 shr 1))c2i] shr (2m-1)
                     - [(A2i shr 1)c2i+1] shr (2m-1)
          if [(2^(2m-1) - (A2i+1 shr 1))c2i AND 2^(2m-2)] = 2^(2m-2) then
            ci = ci + 1                         {roundoff error}
          endif
          if [(A2i shr 1)c2i+1 AND 2^(2m-2)] = 2^(2m-2) then
            ci = ci - 1                         {roundoff error}
          endif
        else                                    {Z2i > Z2i+1}
          ci = c2i+1 + [(2^(2m-1) - A2i+1)c2i] shr (2m-1)
          if [(2^(2m-1) - A2i+1)c2i AND 2^(2m-2)] = 2^(2m-2) then
            ci = ci + 1                         {roundoff error}
          endif
        endif
      endif
    end                                         {result is an m+n-1 bit integer}

    Figure 4-12. The algorithm performed by a general CPN color
    computation unit.

A block diagram of the color computation unit is depicted in Figure 4-13.

    [Figure 4-13, with continuation sheets: block diagram with
    inputs A2i, A2i+1, LESS, and EQUAL, adders with overflow and
    carry inputs, a transparency path, and staging registers
    clocked at the image update rate.]

    Figure 4-13. Block diagram of a general CPN color computation
    unit.

Stage 1 performs an initial load of the incoming pair of color values from the CPN interconnect. Stage 2 is a waiting stage for the status result of the depth computation unit. Stage 3 routes the color and opacity values according to their depth priority. It also shifts the routed opacity value right by 1, if the two depth values are equal. Included is the multiplication of the c2i+1 color value with the halved A2i opacity value. Also, the multiplication result is shifted right by 2m-2 bits (division).
Stage 4 performs the opacity subtraction operation and rounds the shifted multiplication result of stage 3. It also passes the routed color values, the EQUAL status bit, and the subtraction result (transparency). Stage 5 performs a multiplication of the subtraction result (transparency) with one routed color value and passes the other routed color value along. Included is shifting of the multiplication result right by 2m-2 bits (division). It also routes the rounded value of stage 4, if the two depth values are equal, but routes all zeroes, if the two depth values are not equal. Stage 6 sums the shifted multiplication result with the passed color value, along with rounding, and passes the multiplexer result of stage 5. Stage 7 subtracts the multiplexer result from the addition result of stage 6, which produces the composite color value.

Specialized Compositing Processing Node

A specialized CPN performs pixel-by-pixel compositing of opaque objects without antialiasing or special effects. The hardware organization contains two distinct functional units: the depth computation unit and the color computation unit. An opacity computation unit is unnecessary, since this specialized CPN does not incorporate antialiasing, semi-transparency, transparency, or special effects. These two computation units are discussed with respect to finite fixed-point pixel value representation. The algorithm and the conceptual hardware organization of each unit are presented.

Depth computation unit

The depth computation unit discerns the foreground pixel from the background pixel or identifies both as foreground pixels. This unit conceptually functions identically to the depth computation unit of the general CPN. Therefore, it functions according to the algorithm presented in Figure 4-8. As for the general CPN, the depth value, Z, is represented by a single z-bit integer, where 0 ≤ Z ≤ (2^z - 1). Therefore, the floating point representation of this value is initially truncated or rounded. The algorithm discussion can be found in the general CPN section. The block diagram of the depth computation unit is identical to the unit of the general CPN. Therefore, its depiction is shown in Figure 4-9. The hardware discussion can be found in the general CPN section.

Color computation unit

The color computation unit produces a single composite color value from the pixel values that are provided as input. The color value, C, is defined as the true tri-stimulus color values. This machine stores each primary color (intensity) value in its image buffers as an n-bit integer, where 0 ≤ C ≤ 2^n - 1. The color value is passed as an n-bit integer in the hardware; therefore, no shifting is necessary before entering a specialized CPN. The unit functions according to the algorithm presented in Figure 4-14, which follows the relations developed in Equation 29, where the color triad is represented by C and each color is an n-bit integer. The task performed by this unit consists of 1) receiving a pair of color triad values and status information, 2) performing a composite operation of the color triad according to the depth comparison, and 3) outputting the result. A block diagram of the color computation unit is depicted in Figure 4-15. Stage 1 performs an initial load of the incoming pair of color values from the CPN interconnect. Stage 2 sums both color values, along with shifting the result right by one. It is also a waiting stage for the status result of the depth computation unit.
Stage 3 routes a color triad according to its depth priority, utilizing a 4:1 multiplexer with both the LESS and EQUAL status bits as selectors. The succeeding stages are waiting stages that allow a final result to occur simultaneously with the general CPNs. The result is a point-sampled composite color.

    Specialized CPN Color Computation Unit Algorithm

    given
      literal C2i                     {n-bit integer}
      literal C2i+1                   {n-bit integer}
    begin
      if LESS = 1 then                {Z2i < Z2i+1}
        Ci = C2i
      else
        if EQUAL = 1 then             {Z2i = Z2i+1}
          Ci = (C2i + C2i+1) shr 1
        else                          {Z2i > Z2i+1}
          Ci = C2i+1
        endif
      endif
    end                               {result is an n-bit integer}

    Figure 4-14. The algorithm performed by a specialized CPN color
    computation unit.

    [Figure 4-15, with continuation sheets: block diagram with LESS
    and EQUAL selector inputs, an adder, a 4:1 multiplexer, and
    staging registers clocked at the image update rate.]

    Figure 4-15. Block diagram of a specialized CPN color
    computation unit.

Analysis

The compositing network analysis examines two areas: complexity and performance. Complexity is estimated for discrete construction, for VLSI fabrication, and for gate-array construction. Performance is examined with respect to image space resolution, CPN processing speed, and compositing network tree depth.

Complexity

The compositing network complexity is a function of both the CPN complexity and the quantity of CPNs configuring a network. CPN complexity is measured utilizing two metrics: a gate count estimate and an I/O signal pin count estimate. The gate count estimate is determined through partitioning the CPN conceptual hardware organization into individual functional logic blocks, which are off-the-shelf SSI, MSI, and LSI components. Then, the estimated gate count of each functional logic block is determined and totaled to provide a gate count estimate of a CPN. This technique provides an estimate of expected complexity for integrated circuit fabrication. It also provides an estimate of board-level complexity for an off-the-shelf integrated circuit implementation, which is determined through totaling the functional logic block package types used. It should be noted that performance is usually enhanced for a realization by judicious use of additional gating, which may alter the estimated gate count. The partitioning of the design into functional logic blocks allows the examination of implementation trade-offs that are offered between different technologies. It reduces the organization to a logic format that can be matched to the logic resources of a target device. The hardware synthesis can be individually optimized around each vendor's library and design rule guidelines for a VLSI or a gate-array realization. The functional block equivalents are listed in Table 4-1, the I/O pin counts are listed in Table 4-2, and the gate equivalents of various standard size functional logic blocks are listed in Table 4-3. The functional block equivalents and I/O pin counts were compiled from the conceptual hardware organizations. The standard logic device gate counts were estimated by counting the gates within the functional block diagrams given in TTL data books [FAI84, SIG84]. The gate count of the 16-bit multiplier was estimated by considering it to be a full adder tree without input and output registers [KUC78]. This was done since the input and output registers are taken into account when estimating gate equivalences of the staging registers. These gate counts are expected to be close to an upper bound.

Table 4-1. Functional logic block equivalent of the general CPN and the specialized CPN.
    Logic                         General CPN              Specialized CPN
    Function                   DCU    OCU    CCU             DCU    CCU
    -----------------------------------------------------------------------
    Comparator (z-bit)          1      0      0               1      0
    Adder (m-bit)               0      4      8               0      0
    Adder (n-bit)               0      0      6               0      3
    2:1 Multiplexer             z      0    6m+6n-6           z      0
    4:1 Multiplexer             0      0    5m+3n-3           0      3n
    D Flip-Flop               9z+2   22m+1  52m+48n-34      9z+2    30n
    Multiplier (2m X 2m)        0      1      0               0      0
    Multiplier ((m+n-1) X 2m)   0      0      6               0      0

    Note: Inverters are not included, since inversion can be
    produced through flip-flop output selection.

Table 4-2. Pin requirements for the general CPN, the specialized CPN, and each CPN computation unit.

    Signal              General CPN                     Specialized CPN
    Name          DCU    OCU    CCU      GCPN           DCU    CCU   SCPN
    -----------------------------------------------------------------------
    RED2i          0      0    m+n-1    m+n-1            0      n      n
    GREEN2i        0      0    m+n-1    m+n-1            0      n      n
    BLUE2i         0      0    m+n-1    m+n-1            0      n      n
    Z2i            z      0      0        z              z      0      z
    ALPHA2i        0     2m     2m       2m              0      0      0
    RED2i+1        0      0    m+n-1    m+n-1            0      n      n
    GREEN2i+1      0      0    m+n-1    m+n-1            0      n      n
    BLUE2i+1       0      0    m+n-1    m+n-1            0      n      n
    Z2i+1          z      0      0        z              z      0      z
    ALPHA2i+1      0     2m     2m       2m              0      0      0
    REDi           0      0    m+n-1    m+n-1            0      n      n
    GREENi         0      0    m+n-1    m+n-1            0      n      n
    BLUEi          0      0    m+n-1    m+n-1            0      n      n
    Zi             z      0      0        z              z      0      z
    ALPHAi         0     2m      0       2m              0      0      0
    CLK            1      1      1        1              1      1      1
    LESS           1      0      1        0              1      1      0
    EQUAL          1      0      1        0              1      1      0
    POWER          1      1      1        1              1      1      1
    GROUND         1      1      1        1              1      1      1
    -----------------------------------------------------------------------
    total pins   3z+5   6m+3  13m+9n-4  15m+9n+3z-6    3z+5   9n+5  9n+3z+3

Table 4-3. Gate equivalent and package pin count of various functional logic blocks.

    Logic                                    Package       Gate
    Function                                   Pins      Equivalent
    ----------------------------------------------------------------
    4-bit Magnitude Comparator (74F85)          16           31
    4-bit Binary Full Adder (74F283)            16           36
    Quad 2:1 Multiplexer (74F157)               16           15
    Dual 4:1 Multiplexer (74F153)               16           16
    Octal D-Type Flip-Flop (74F273)             20           48
    16x16 Bit Multiplier (29517A)               64         4320*

    * This approximate number excludes the input and output
      registers, which would account for about 288 additional gate
      equivalent units.

The procedure for estimating the total gate count for a specific design is to apply the tables after determining the CPN parameter values m, n, and z. The total CPN gate count can be estimated for a different set of functional blocks by changing Table 4-3 and then performing the suggested procedure. A graphics system is defined to exemplify the suggested technique for estimation of CPN complexity. The graphics device has three criteria: it will be a full-color system, it will have better than a one-percent incremental change in opacity, and it will have high-precision depth resolution. A full-color device requires the tri-stimulus colors to provide 16.7 million simultaneous colors, which is about at the limits of human visual perception [ROG85]. Therefore, the required value of n is 8, which provides 8 bits for each primary color: RED, GREEN, and BLUE. The resolution of opacity that would provide better than a one-percent incremental change requires m to equal 8, since its range would then be 0 ≤ A ≤ 2^(m-1) = 128, an incremental change of 1/128. The value of z is selected as 24, since a depth resolution of 24 bits is satisfactory for high-end graphics devices. Relating these selected parameter values of m, n, and z to Tables 4-1 through 4-3 produces the estimated complexity of the specified system, which is presented in Table 4-4. The general CPN has a complexity of about 12 times that of the specialized CPN. Therefore, a compositing network that can utilize a mixture of both the general and the specialized CPNs would be the most efficient configuration.

Table 4-4. Estimated complexity of the general CPN, the specialized CPN, and each CPN computation unit.

    Type of           General CPN                   Specialized CPN
    Count         DCU    OCU    CCU    GCPN         DCU    CCU   SCPN
    -------------------------------------------------------------------
    Pins           77     51    172     258          77     77    147
    Gates        1585   5670  33362   40617        1585   1848   3433
    Packages       40     32    251     323          40     48     88
      16-Pin       12      8    149     169          12     18     30
      20-Pin       28     23     96     147          28     30     58
      64-Pin        0      1      6       7           0      0      0

    Note: The CPN parameters m, n, and z are 8, 8, and 24.
Table 4-4 indicates that a board-level CPN implementation would have a reasonable package count for the general CPN and a very reasonable package count for the specialized CPN. This indicates that a CPN implemented using off-the-shelf parts is within bounds. At the time of this writing, a 16,000-gate bipolar ECL/TTL array with 100-ps delays and 292 input/output cells was available [COL88]. Chip densities of HCMOS arrays are as high as 237,000 gates, with 400-ps switching delays [BUR88]. Therefore, the CPN gate counts and pin counts are within bounds for a single-chip VLSI implementation or a single-chip gate-array implementation.

Performance

The CPN performs pixel-by-pixel processing that is independent of scene complexity. Therefore, its processing-time is a function of both the image space resolution and the image update rate, which is given by

    Processing-Time = 1/[(Image Update Rate)(Resolution)]   (38)

The image update rate is considered real-time at 10 frames per second, since images sequenced at this rate appear to have a smooth visual flow. The image space resolution is defined as the total number of visible pixels. The CPN processing-time for various image space resolutions is presented in Table 4-5. As shown, to double the final resolution while maintaining the same level of performance, the speed of the CPNs must be increased by a factor of four.

Table 4-5. CPN processing-time for various image space resolutions. The image update rate is 10 frames per second.

    Image Space Resolution        CPN Processing-Time
    (pixels)                      (ns)
    --------------------------------------------------
     640 X 480                    325.5
    1280 X 960                     81.4
    1280 X 1024                    76.3
    1600 X 1280                    48.8
    2048 X 2048                    23.8

The CPNs of the compositing network operate in lock-step. When the tree structure is maximally unbalanced, so that each node has a left descendant but none has a right one, the compositing network degenerates into a linear pipeline. This implies that each stage of the linear pipeline must perform its function within the CPN processing-time to successfully composite a collection of pixels. Therefore, the slowest stage in the pipeline is what determines the peak performance of the compositing network. This is the multiplication stage for the general CPN, which implies that the multiplier parts determine the compositing network performance if any CPN is of the general type. In contrast, the comparator stage determines the compositing network performance if all CPNs are of the specialized type. Consider the example of the constraints section, where all of the CPNs are of the general type. The 16-bit multiplier, which had been specified, maintains a 45-ns multiply time, including set-up time [ADV85]. This part has internal input and output registers; therefore, the multiply time can be considered the total pipeline stage time. The compositing network would thus have a maximum bandwidth of 22.2 million results per second. From Table 4-5, all but the last entry could be supported with a single compositing network. The last entry could be supported if two CNs were used, where each CN would be dedicated to a separate half of the image array while operating at half the image update rate. The computational performance of a compositing network that is configured with all general CPNs is measured by calculating the total number of additions and multiplications that every general CPN performs per unit time.
A general CPN performs, as a lower bound (all Z's not equal), eight additions and four multiplications. As an upper bound (all Z's equal), a general CPN performs eleven additions and seven multiplications. Therefore, the range of computational performance for a compositing network configured with all general CPNs is given by

    8(CPNs)(BW) ≤ additions/s ≤ 11(CPNs)(BW)                (39)
    4(CPNs)(BW) ≤ multiplications/s ≤ 7(CPNs)(BW)           (40)
    12(CPNs)(BW) ≤ operations/s ≤ 18(CPNs)(BW)              (41)

where BW refers to the general CPN bandwidth, or general CPN results per second, and CPNs refers to the total number of compositing processing nodes that comprise a compositing network. For example, consider an augmentable system architecture configured with a three-level CN tree of all general CPNs and a 1600 X 1280 resolution display device node. It will maintain a CN processing performance of between 1720 MOPS and 2580 MOPS (million operations per second). This throughput is what supercomputers provide, which demonstrates the potential of distributed simultaneous calculations. The performance of a CN configured with all specialized CPNs is measured through the total bandwidth of the CN, which is equal to the bandwidth of a single specialized CPN. This metric is used since specialized CPNs primarily route data, as opposed to performing a computation on the data. If all depth values are equal, each specialized CPN will perform one addition. This provides an additions-per-second rate that is computed as the product of the number of CPNs configured and the CPN bandwidth. The performance-limiting stage of a specialized CPN is its comparison stage, but depending on word size it could be the addition stage instead. Consider the example presented in the constraints section, but where all of the CPNs are of the specialized type. The comparison stage, utilizing the components specified in Table 4-3, maintains a 42-ns propagation delay from clock to output. Therefore, the system would have a
