AUGMENTABLE OBJECT-ORIENTED
PARALLEL PROCESSOR ARCHITECTURES
FOR REAL-TIME COMPUTER-GENERATED IMAGERY
BY
ROSS MORRIS FLEISCHMAN
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1988
Copyright 1988
by
Ross Morris Fleischman
ACKNOWLEDGMENTS
I would like to express my appreciation to my advisor
and supervisory committee chairman, Dr. John Staudhammer,
for the guidance and encouragement he provided me on this
project. I am also grateful to the other members of my
supervisory committee, Dr. Keith L. Doty, Dr. Jack R.
Smith, Dr. Jose C. Principe, and Dr. Joseph Duffy, for
their commitment. I also wish to thank the members of the
UF Computer Graphics Research Group for their suggestions.
This dissertation is dedicated to my mother,
Ruth Koegel Fleischman, and to the memory of my father,
Erwin Lewis Fleischman.
TABLE OF CONTENTS

                                                           Page

ACKNOWLEDGMENTS ........................................... iii

LIST OF TABLES ............................................ vii

LIST OF FIGURES .......................................... viii

LIST OF ABBREVIATIONS ....................................... x

ABSTRACT ................................................... xi

CHAPTERS

I     INTRODUCTION .......................................... 1

          Problem Definition ................................ 2
          Dissertation Project .............................. 3
          Overview of Dissertation .......................... 4

II    TYPICAL REAL-TIME CGI ARCHITECTURE .................... 5

          Scene Manager ..................................... 5
          Geometric Processor ............................... 8
          Video Processor ................................... 9
          Display Device ................................... 10

III   ALTERNATE REAL-TIME CGI ARCHITECTURE ................. 12

          System Model ..................................... 12
              Underlying Idea .............................. 12
              Supporting Architecture ...................... 15
          Advantages of Approach ........................... 19
          Target Applications .............................. 22

IV    COMPOSITING NETWORK .................................. 23

          Compositing Methodology .......................... 25
          RGBZA Compositing Algorithm ...................... 33
          Network Structure ................................ 40
          Compositing Processing Node ...................... 50
              General Compositing Processing Node .......... 54
                  Depth computation unit ................... 54
                  Opacity computation unit ................. 58
                  Color computation unit ................... 63
              Specialized Compositing Processing Node ...... 70
                  Depth computation unit ................... 71
                  Color computation unit ................... 72
          Analysis ......................................... 77
              Complexity ................................... 77
              Performance .................................. 84

V     VIDEO GENERATION NODE ................................ 90

          Configuration .................................... 90
              Atmospheric Attenuation Unit ................. 92
              Pixel Cache .................................. 95
              Double-Buffered Frame Buffer ................. 96
              Video Shift Registers ........................ 98
              Color Palette ................................ 99
              Digital-to-Analog Converters ................ 100
              System Controller ........................... 100
          Analysis ........................................ 101
              Complexity .................................. 101
              Performance ................................. 102

VI    DISPLAY DEVICE NODE ................................. 105

          Display Device Approaches ....................... 105
          Raster Scan Conversion .......................... 106
          Image Aspect Ratio .............................. 107
          Display Device Performance ...................... 107

VII   OBJECT GENERATION NODE .............................. 109

          Configuration ................................... 110
              Object Generation Node Nucleus .............. 110
                  Double-buffered image buffer ............ 112
                  Intensity multiplication unit ........... 114
                  Nucleus controller ...................... 115
              Object Generation Unit ...................... 118
              Secondary Memory Unit ....................... 119
          Analysis ........................................ 121
              Complexity .................................. 121
              Performance ................................. 122

VIII  MAINTENANCE MANAGEMENT NODE ......................... 124

          Configuration ................................... 124
          Operating Functions ............................. 125
              System Boot Operation ....................... 125
              System Normal Operation ..................... 126
              Simulation Debugging ........................ 127
          Analysis ........................................ 127

IX    CONCLUSION .......................................... 129

          System Simulator ................................ 129
          System Simulation ............................... 130
          Discussion of System Features ................... 139
          Summary ......................................... 147

BIBLIOGRAPHY .............................................. 149

BIOGRAPHICAL SKETCH ....................................... 154
LIST OF TABLES

Table                                                      Page

4-1   Functional logic block equivalent of the general
      CPN and the specialized CPN .......................... 79

4-2   Pin requirement for the general CPN, the
      specialized CPN, and each CPN computation unit ....... 80

4-3   Gate equivalent and package pin count of various
      functional logic blocks .............................. 81

4-4   Estimated complexity of the general CPN, the
      specialized CPN, and each CPN computation unit ....... 83

4-5   CPN processing time for various image space
      resolutions. The image update rate is 10 frames
      per second ........................................... 86
LIST OF FIGURES

Figure                                                     Page

2-1   Block diagram of a typical real-time CGI system
      organization .......................................... 6

3-1   Composition of an opaque background 3D object and
      an opaque foreground 3D object to produce a
      composite 3D scene ................................... 14

3-2   Block diagram of the proposed augmentable
      real-time CGI system organization .................... 17

4-1   Three distinct types of pixel coverage, with
      respect to the ALPHA value: a) no coverage,
      b) full coverage, and c) partial coverage. The
      subpixel shape of the pixel with partial coverage
      is arbitrary and is only shown in this manner for
      conceptual clarity ................................... 27

4-2   Two pixel opacity values are composited. The
      values were derived from coverage information from
      two different objects. The coverage depictions are
      arbitrary; they are given specific subpixel forms
      to clarify the composite operation. The coverage
      areas are actually averaged across the pixel ......... 35

4-3   The RGBZA compositing algorithm ...................... 39

4-4   A fully balanced three-level compositing tree ........ 42

4-5   The general RGBZA compositing algorithm for a
      fully balanced tree. Note that lowercase letters
      designate the product of intensity and opacity ....... 47

4-6   The specialized RGBZ compositing algorithm for a
      fully balanced tree .................................. 49

4-7   An iterative building-block depiction of a
      compositing processing node (CPN) .................... 51

4-8   The algorithm performed by a general CPN depth
      computation unit ..................................... 56

4-9   Block diagram of a general CPN depth computation
      unit ................................................. 57

4-10  The algorithm performed by a general CPN opacity
      computation unit ..................................... 61

4-11  Block diagram of a general CPN opacity
      computation unit ..................................... 62

4-12  The algorithm performed by a general CPN color
      computation unit ..................................... 65

4-13  Block diagram of a general CPN color computation
      unit ................................................. 66

4-14  The algorithm performed by a specialized CPN
      color computation unit ............................... 73

4-15  Block diagram of a specialized CPN color
      computation unit ..................................... 74

5-1   Block diagram of a video generation node with
      respect to its seven modules ......................... 91

5-2   Block diagram of the atmospheric attenuation
      unit, used to include atmospheric effects in a
      scene ................................................ 94

7-1   Block diagram of an object generation node with
      respect to its three modules ........................ 111

7-2   Block diagram of the intensity multiplication
      unit, used to condition the color and opacity
      values for input to the compositing network ......... 116

9-1   View of rectangle A and rectangle B in object
      space ............................................... 131

9-2   Contents of the simulated frame buffer of OGN1
      after scan converting rectangle A ................... 134

9-3   Contents of the simulated frame buffer of OGN2
      after scan converting rectangle B ................... 135

9-4   Contents of the simulated frame buffer of the VGN
      for the first system simulation ..................... 136

9-5   Contents of the simulated frame buffer of the VGN
      for the second system simulation .................... 137

9-6   Two system tree configurations: a) fully balanced
      system tree and b) unbalanced system tree ........... 144
LIST OF ABBREVIATIONS

2D   - Two-Dimensional
3D   - Three-Dimensional
AAU  - Atmospheric Attenuation Unit
CGI  - Computer-Generated Imagery
CN   - Compositing Network
CPN  - Compositing Processing Node
DAC  - Digital-to-Analog Converter
DDN  - Display Device Node
I/O  - Input/Output
IPU  - Intensity Multiplication Unit
LSH  - Least Significant Half
MSH  - Most Significant Half
MMN  - Maintenance Management Node
MMU  - Maintenance Management Unit
OGN  - Object Generation Node
OGNN - Object Generation Node Nucleus
OGU  - Object Generation Unit
RGB  - RED, GREEN, and BLUE
SMU  - Secondary Memory Unit
VGN  - Video Generation Node
VLSI - Very Large-Scale Integration
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
AUGMENTABLE OBJECT-ORIENTED
PARALLEL PROCESSOR ARCHITECTURES
FOR REAL-TIME COMPUTER-GENERATED IMAGERY
By
Ross Morris Fleischman
December 1988
Chairman: Dr. John Staudhammer
Major Department: Electrical Engineering
The hardware architecture of a system for real-time
computer-generated imagery (CGI) is presented that combines
augmentability, modularity, organizational simplicity, and
parallelism. This architecture is a functional, highly
modular, parallel processor approach that is well suited
to employing VLSI technology. It is a generic structure
that can grow with technological advances and can
accommodate, through one basic set of modules, a full
range of CGI systems with different performance
requirements.
The CGI process contains five fundamental components:
input, modeling, rendering, compositing, and output. This
architectural approach extends specialized hardware into
both the compositing and output components, which allows
the definition of a generic framework for building systems
appropriate for many simulations. The system architecture
performs image synthesis in parallel by partitioning the
image generation task in object space, with each partition
assigned to an individual autonomous object generator.
Objects are rendered independently of each other and,
when complete, are automatically composited by the
hardware for display. This process is repeated at a rate
suitable for real-time animation.
The picture representation accepts transparent, semi-
transparent, and fully opaque surfaces. Hardware
facilities perform automatic hidden surface removal with
antialiasing and include atmospheric attenuation. An
approximation for surface intersection is performed, and a
subpixel control mechanism is provided.
The parallel hardware algorithm is classified as a
compute-aggregate-broadcast paradigm: a compute phase
generates objects, an aggregate phase combines the objects
into a scene, and a broadcast phase displays the scene.
The system framework maintains a synchronous feed-through
structure that allows enlargement by either dynamic or
static additions. System improvement is accommodated by
adding modules that incrementally improve system
performance and scope. This reduces the difficulty of
incorporating a new system to that of introducing new
modules, thereby lengthening system life.
CHAPTER I
INTRODUCTION
A computer-generated imagery (CGI) system is a
specialized computer system that provides a visual
simulation of an artificial environment. Conceptually, a
CGI system consists of a window in multidimensional space
through which an observer may look into a world. The
window is presented by a computer-driven display device,
while the world is modeled by a database that the computer
can access. Thus, the visual simulation may be regarded as
the generation of an out-the-window view, in real time,
according to the simulated position and orientation of the
observer with respect to the simulated changes of the
artificial environment.
A popular application of real-time computer-generated
imagery visual simulators is vehicle training simulation
[FIS85, PAN86, SCH81, SCH83b, YAN85, ZYD88]. For this
application, an observer's visual experience is created by
a generated perspective projection of a 3D world rendered
onto a 2D display device [BEN83], with associated special
effects. Other simulation tasks [SUG83] may have
variations of the visual simulation requirement as a
function of the world structure, but the real-time
performance and rendering problems remain constant.
Real-time operation, which defines a computation
process in which the execution time of the computer is
synchronized with the physical event time, or wall-clock
time, is a major requirement of these systems [FOR83].
Also, the associated image rendering problems are
computationally demanding. Thus, real-time CGI system
organizations typically mandate custom-designed,
special-purpose, high-speed computers, with
general-purpose computers for their control [SCH81,
SCH83b, YAN85].
Problem Definition
Traditional CGI architectures utilize both pipelining
and parallelism to achieve real-time performance for image
synthesis. The system architectures are usually highly
specialized and constrain the types of graphics primitives
that can be employed [ENG86]. These special-purpose
architectures usually involve a fixed graphics pipeline
that is difficult to enhance for increased performance or
for inclusion of additional graphics primitives.
The realization of major CGI architectural revisions
that exhibit improved performance with substantial hardware
reduction is a subject of research. Innovative CGI
architectures will employ unique organizational structures
that realize algorithmic improvements with respect to
implementation with massive memory, gate arrays, and custom
VLSI. Thus, improvements in both VLSI memory chips [COL87,
TUN87a] and VLSI computational chips [BUR87, COL87, GRI86,
MOK87], plus parallel processing trends [SCH87], are good
indicators that the evolution of CGI system organizational
philosophies will become VLSI-oriented through parallelism.
Dissertation Project
The general research objective is to develop the
guidelines and philosophies of a VLSI-oriented real-time
CGI architecture that combines augmentability, modularity,
organizational simplicity, and parallelism. This proposed
architecture will be a functional, highly modular, parallel
processor approach that will be suited to employing VLSI
technology. It will be a generic structure that can grow
with technological advances. The investment in such a
system will hypothetically never be discarded; system
improvement can be accommodated by adding modules that
incrementally improve the performance and scope of a
system. The introduction of new systems will be reduced to
the introduction of new modules, thereby resisting system
obsolescence. Therefore, such a system will be
continuously expandable and never totally outmoded, thus
providing performance, development, and economic benefits.
Overview of Dissertation
This dissertation is organized into nine chapters.
Chapter I is an introductory chapter that covers objectives
and background for the dissertation subject. Chapter II
describes a typical real-time CGI architecture. Chapter
III presents an overview and introduction of the proposed
augmentable CGI architecture, along with the fundamental
driving idea for the approach. Chapters IV through VIII
describe each major subsystem of the augmentable CGI
architectural approach: Chapter IV the compositing
network, Chapter V the video generation node, Chapter VI
the display device node, Chapter VII the object generation
node, and Chapter VIII the maintenance management node.
Chapter IX is a concluding chapter that contains a
discussion of the system simulation, along with a summary
of the dissertation results.
CHAPTER II
TYPICAL REAL-TIME CGI ARCHITECTURE
A typical real-time CGI system organization, popular
among vehicle training simulators, is shown in Figure 2-1.
This structure provides a single field-of-view of the
artificial environment, termed a channel. Its organization
consists of a cascade of four major subsystems: the scene
manager, the geometric processor, the video processor, and
the display device [SCH83b, YAN85]. The first three
subsystems form a specialized computer graphics pipeline
for image rendering. The last subsystem provides a
specialized display for viewing.
Scene Manager
The overall function of the scene manager is to
provide the system pipeline with scene elements that lie in
the observer's field-of-view within the artificial
environment, given the observer's position and orientation.
Observer position and orientation information are provided
to the scene manager by a host simulator [FOR83, SCH83b].
This information directs dynamic extraction of database
scene elements from mass storage, which are loaded into an
active database memory for sorting [PAN86]. These scene
elements represent the observer's panorama and are examined
to determine if they are potentially visible within the
field-of-view of the observer [PAN86, YAN85]. Scene
elements satisfying this condition are provided with an
appropriate level-of-detail, while the remainder are culled
[PAN86, YAN85].

Figure 2-1. Block diagram of a typical real-time CGI
system organization.
The resultant scene elements are sent down the system
pipeline, at the image update rate, to the geometric
processor [YAN85]. Subsystem processing load is
continuously monitored by the scene manager to avoid
overloading the processing capacity of the pipeline.
Processing load reduction techniques utilize various
dynamic scene content control mechanisms that usually
degrade image quality gracefully [SCH83a, YAN85].
The mass storage device contains a database, modeling
an artificial environment, that drives the hardware.
Features of a simulated scene (natural and cultural) are
modeled to be of the same size, shape, location, and color
as their real-world counterparts [SCH81, SCH83a]. Database
modeling primitives for the typical CGI system consist of
planar polygons as the major primitive, with quadric
surfaces as an option for both man-made curved objects and
natural curvilinear objects [YAN85]. The database also
contains scene element attributes such as color and
texture.
Geometric Processor
The geometric processor is a special-purpose pipelined
computer that operates on the scene element output from the
scene manager. These operations usually produce the
projected geometry of the scene with associated geometric
gradient and color gradient parameters. In general, the
fixed coordinate system of the scene elements is
transformed (via translation, rotation, and scaling) to the
momentary eye-based coordinate system (origin located at
the observer's eye). Within the eye-based coordinate
system, a visibility frustum is defined. Then, a 3D
clipping algorithm is applied to determine where the 3D
scene intersects the bounding planes of the visibility
frustum. Scene parts within the visibility frustum are
projected to the image plane with the computed geometric
gradient and color gradient parameters, while the rest are
deleted [BEN83].
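The transform-then-project sequence just described can be sketched in software as a homogeneous matrix multiply followed by a perspective divide. This is only an illustrative model of the pipeline stage, not the hardware described in the cited systems; the matrix layout, focal distance, and function names are assumptions chosen for clarity.

```python
def xform(m, p):
    """Apply a 4x4 homogeneous matrix m (row-major, column-vector
    convention) to the 3D point p, then perform the perspective divide."""
    x, y, z = p
    v = (x, y, z, 1.0)
    xp, yp, zp, w = (sum(m[i][j] * v[j] for j in range(4)) for i in range(4))
    return (xp / w, yp / w, zp / w)

# Translation that places the scene 3 units in front of the eye along z
# (a stand-in for the full translate/rotate/scale to eye coordinates).
translate = [[1, 0, 0, 0],
             [0, 1, 0, 0],
             [0, 0, 1, 3],
             [0, 0, 0, 1]]

# Simple perspective onto the plane z = 1: the bottom row copies z into
# the homogeneous w, so the divide yields x/z and y/z.
perspective = [[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 1, 0]]
```

For example, `xform(translate, (2, 2, 1))` yields the eye-space point `(2.0, 2.0, 4.0)`, and projecting it with `xform(perspective, ...)` yields `(0.5, 0.5, 1.0)` on the image plane. Clipping against the visibility frustum would be performed between these two steps.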
Issues relating to color can be found in Rogers
[ROG85]. A matrix multiplier was presented by Meares et
al. [MEA74] and a threedimensional coordinate transforma
tion device was presented by Newarikar [NEW82]. Clipping
algorithms, geometric transformations, and perspective
projection can be found in Rogers [ROG85] with an interest
ing VLSI solution presented by Clark [CLA82]. Clark
discusses a fourcomponent vector, floating point VLSI
processor for accomplishing matrix transformations, clip
ping, and mapping to output device coordinates.
Video Processor
The video processor is a special-purpose computer that
operates on the projected geometry, geometric gradient, and
color gradient output from the geometric processor for
subsequent display. The video processor computes each
pixel color produced on the picture plane representing
visible portions of scene element surfaces [SCH83b, YAN85].
Pixel color computation is a function of various items:
geometric gradient parameters (surface normals), color
gradient parameters (scene element native color), texture
maps, atmospheric attenuation (haze color), scene element
illumination (both natural and cultural light sources),
antialiasing techniques, shadows, and shading techniques.
During, before, or after pixel computation, visible
portions of the scene are identified through a hidden
surface removal technique.
This processor also provides timing and control of the
display device, which relate to the video processor
organizational philosophy [YAN85]: scan-line-based or
frame-buffer-based. Scan-line-based units perform video
processing one scan line at a time in synchronism with each
raster line of the display device; one row of the visible
scene's pixel codes is stored. Frame-buffer-based units
perform video processing independent of the raster display;
a complete frame of the visible scene's pixel codes is
stored.
Algorithms and techniques used by the video processor
are well known and can be found in the literature, such as
Rogers [ROG85]. Examples of antialiasing include Booth's
[BOO87] presentation concerning the relation of human
factors to antialiasing and Carpenter's [CAR84]
presentation of an interesting A-buffer approach.
Real-time hardware approaches to texture mapping can be
found in the literature, such as the approach presented by
Fant [FAN86].
Display Device
Display device technology primarily consists of two
variations: calligraphic displays and raster displays
[SCH83a]. The color calligraphic display is characterized
by a continuous layered phosphor surface (RED and GREEN
phosphor layers) used to present a color picture with beam
penetration control (electron beam velocity) of
sequentially refreshed straight lines (vectors or strokes)
and points (zero-length vectors). The raster display
contains a regular grid of phosphor triads (RED, GREEN, and
BLUE) that are used to present a color picture by modulated
illumination of each phosphor triad point (pixel) with
refresh in a regular pattern. Calligraphic systems
maintain high quality light points with color limitations,
while raster systems maintain high quality painted faces
without color limitations [YAN85, SCH83a].
CHAPTER III
ALTERNATE REAL-TIME CGI ARCHITECTURE
This chapter presents an alternate real-time CGI
architectural approach, as compared to the traditional
approach briefly presented in Chapter II. This discussion
is meant as an overview to give an understanding of the new
approach before delving into its details. The system model
is presented to illustrate the underlying idea and its
supporting architecture. A discussion of the advantages of
the new approach and of typical applications follows.
System Model
A system model is presented that exhibits the premise
of this research. First, the underlying idea with respect
to the image generation problem is presented. Second, the
supporting architecture that can realize the underlying
idea is described.
Underlying Idea
This field of architectural research is driven by the
fundamental idea that an individual scene is composed of
separable objects. Therefore, a scene can be produced from
the summation of every object existing in that scene; this
process is called compositing. An example of compositing
is presented in Figure 3-1, which illustrates the composite
of two opaque 3D objects. As shown, an opaque background
3D object and an opaque foreground 3D object are merged
to form a composite 3D scene. This process indicates that
there is an alternative to producing an entire complex
scene directly. The generation of simpler objects can be
done individually, followed by compositing the simpler
objects to produce an entire complex scene [POR84, STA83].
The approach taken by this research separates the
image compositing process from the image synthesis process
of the image generation problem. The compositing is
reduced to the pixel level, where a procedure is defined
that can blend images through a pixel-by-pixel process.
This compositing process is extended further, at the pixel
level, to include the effect of atmospheric attenuation.
The compositing process performs antialiased blending
of images utilizing a mixing factor. This mixing factor
defines the average opacity of a pixel, which defines the
average pixel reflectivity. It is useful for performing
surface edge antialiasing and for rendering surfaces that
are transparent, semi-transparent, or opaque. Along with
the mixing factor is the depth value for determining the
depth location of a pixel in space. This information is
used in a comparison to determine whether one pixel is in
front of, behind, or at the same distance as another
pixel. Also, the horizontal and vertical
Figure 3-1. Composition of an opaque background 3D object
and an opaque foreground 3D object to produce a composite
3D scene.
position is determined by the pixel location in the image
array, and the color value is defined as the additive
tristimulus colors: RED, GREEN, and BLUE.
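The per-pixel blend just described, combining color, the opacity mixing factor, and depth, can be modeled by the familiar depth-ordered "over" operation from the compositing literature [POR84]. The sketch below is a minimal software reading of that idea, not the dissertation's hardware algorithm; the tuple layout and function name are illustrative assumptions.

```python
def composite(p, q):
    """Depth-ordered 'over' composite of two RGBZA pixels.

    Each pixel is a tuple (r, g, b, alpha, z): color, average opacity
    (the mixing factor), and depth, where a smaller z is nearer the eye.
    """
    # The nearer pixel becomes the front surface of the blend.
    front, back = (p, q) if p[4] <= q[4] else (q, p)
    fr, fg, fb, fa, fz = front
    br, bg, bb, ba, _ = back
    # The front surface covers fraction fa of the pixel; the back
    # surface shows through the remaining (1 - fa).
    a = fa + ba * (1.0 - fa)
    r = fr * fa + br * ba * (1.0 - fa)
    g = fg * fa + bg * ba * (1.0 - fa)
    b = fb * fa + bb * ba * (1.0 - fa)
    return (r, g, b, a, fz)
```

An opaque front pixel completely hides the back pixel (hidden surface removal falls out of the depth comparison), while a half-transparent front pixel, alpha of 0.5, yields an even mix of the two colors with a fully opaque result.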
The atmospheric attenuation process is performed by a
procedure that calculates attenuation as a function of
distance from the viewpoint with respect to a fading
constant and a horizon color. The fading constant is
adjusted for varying atmospheric conditions such as foggy,
hazy, and murky atmospheres. The horizon color is adjusted
for varying background lighting conditions. This process
is performed following the compositing process, which
results in a pixel value representing the composite
tristimulus color value with the effect of atmospheric
attenuation included.
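As a concrete reading of that procedure, the sketch below fades a composited color toward the horizon color as a function of distance. The text fixes only the inputs (distance, a fading constant, a horizon color), so the exponential fall-off and the names used here are assumptions chosen for illustration, in the style of common exponential fog models.

```python
import math

def attenuate(color, z, fading, horizon):
    """Blend a composited RGB color toward the horizon color with depth.

    'fading' models atmospheric density (larger = foggier, murkier);
    'horizon' models the background lighting condition. The exponential
    form is an assumption, not taken from the hardware design.
    """
    v = math.exp(-fading * z)  # fraction of the native color surviving
    return tuple(c * v + h * (1.0 - v) for c, h in zip(color, horizon))
```

At zero distance the native color passes through unchanged; as distance grows, the pixel converges to the horizon color, which matches the intended visual effect of haze swallowing distant objects.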
Supporting Architecture
The fundamental idea of compositing focuses on
allowing a scene to be blended by computer. Hypothetically,
the objects would be visually computer-generated in their
proper position and orientation, then merged by computer
for display. Therefore, instead of having one total
database for an artificial environment, the total database
would be partitioned by objects to provide multiple partial
databases. This would allow each object or group of
similar objects in a scene to be assigned an individual
processor, which would have the advantage of distributing
the image generation task, thus reducing the performance
requirement for each processor and secondary memory unit.
Also, the task of merging or compositing the collection of
objects would be performed by separate processors. As a
result, an organization of this nature would produce
multiple data streams and multiple instruction streams,
thereby speeding up both computational processing and I/O
processing. Also, the separable nature of objects existing
in a scene points to the goal of expandability without
affecting other elements of the system.
The abstract organization of the proposed augmentable
CGI architecture, which logically follows from the above
discussion, is illustrated in Figure 3-2. Major components
of the proposed real-time CGI machine consist of multiple
object generation nodes (OGNs), a compositing network (CN),
a video generation node (VGN), a display device node (DDN),
and a maintenance management node (MMN). This system can
handle opaque, transparent, and semi-transparent images. A
short description of each subsystem is given below, with a
more detailed discussion of each subsystem presented
subsequently in Chapters IV through VIII.
The object generation node is a VLSI-oriented image
synthesis processor with an optional local secondary
memory, which can execute computer graphics algorithms to
render an assigned object. OGNs operate autonomously and
concurrently with respect to the complete system, but in
synchronism with it. They are assigned a partition, termed
an object, of the entire image generation task. An OGN
interfaces to the compositing network through its image
memory, which contains an image space view of the assigned
object. Each element of the image memory contains three
pixel attributes: color, opacity, and depth. The X, Y
coordinates are derived from a pixel's position in the
image memory.

Figure 3-2. Block diagram of the proposed augmentable
real-time CGI system organization.
The compositing network is a pixel-by-pixel hardware
compositor, an expandable ensemble of interconnected
compositing processing nodes, that produces a computer
graphics picture by blending independently rendered objects
into a full image. This network is a synchronous
feed-forward structure. It simultaneously reads each image
memory area of every OGN, pixel by pixel in a row-by-row
manner, and writes the composite result to the VGN pixel by
pixel.
The video generation node processes composite object
digital image data from the compositing network and
converts it to analog video data for display. Pixels are
individually received from the compositing network. As
pixels are received, the VGN applies the effect of
atmospheric attenuation to each pixel and then writes the
result to its frame buffer pixel by pixel in a row-by-row
manner. Simultaneously, the frame buffer data is read and
converted to analog data for driving the display device
node. Also, the timing of the entire system is derived
from the VGN.
The display device node is a raster scan type monitor.
It receives three primary colors from the VGN: RED, GREEN,
and BLUE. The video timing of the monitor is also
controlled directly from the VGN.
The maintenance management node provides central
control and health assurance of the system. It is an
autonomous processor that provides self-maintenance
operations and system support functions. It includes a
computational unit, a secondary memory unit, and a console.
Communications between the MMN and the nodes of the system
are provided by a general interface to which all system
nodes are connected.
Advantages of Approach
The improvements of this CGI architectural approach
over existing CGI architectural approaches encompass a
reduced complexity of the individual image synthesis
processors, ease of system expansion, ease of including
different graphics primitives, decoupling of the rendering
process from the compositing process, and ease of system
understanding. The reduced complexity of image synthesis
processors is due to three factors: 1) the image
generation task is distributed among many processors
(OGNs), 2) hidden surface removal with antialiasing is
included in the architectural structure (compositing
network), and 3) the effect of atmospheric attenuation is
included in the architectural structure (VGN). The
automatic processing of 2 and 3 above is relegated to the
machine structure, and the distributed processing of 1
above is shared among many image synthesis processors. The
result is a simplified database and a reduction of the
amount of geometry required to render an object. This
relieves the individual processing performance requirements
of each object generation node, thus allowing modest
processors, e.g., off-the-shelf VLSI, to perform their
image synthesis tasks; OGNs do not have to be of the same
type. The decoupling of the rendering process from the
compositing process is done through independent memory
areas; this enhances system performance by keeping both
processes running in parallel. System expansion is eased,
since it is done by merely adding CPNs and OGNs. New
graphics primitives can easily be added to the system
through additional OGNs that have special hardware. The
basic goal, which may raise load balance issues, is to add
more processors when performance demands increase. System
understanding is simplified, since the complex task of
merging many objects is done through the generic machine
structure.
An underlying advantage is the trivial pixel-level
solution to the intersection problem. Solving the
intersection of two or more surfaces usually requires the
solution of a set of simultaneous equations, which demands
a large amount of calculation. The pixel-level approach of
this new architecture reduces the geometry typically
involved in solving intersection problems to the comparison
of depth values. The solution is an approximation;
however, it is visually correct. Also, the hidden surface
problem is solved in a similar pixel-level manner. Since
reflectivity is handled by a mixing factor, the opacity,
transparency, semi-transparency, and edge-antialiasing
problems associated with computer graphics are also
consolidated to the pixel level. Along with this is the
inclusion, at the pixel level, of atmospheric attenuation
effects. Thus, a compact pixel-by-pixel method allows the
solution of complex geometrical problems and the inclusion
of complex realistic image effects.
This organization will allow a full range of CGI
systems that demand different performance requirements to
be accommodated through one basic set of modules. It will
be a generic structure that can grow with technological
advances. System improvement is accommodated by adding
modules as opposed to a system redesign that is usually
associated with typical real-time CGI systems. Thus, in
contrast to current fixed-performance, brute-force real-
time CGI architectures, a variable-performance and
expandable real-time CGI architectural approach is
presented here.
Target Applications
Target applications of this device will not be
restricted to any specific real-time simulation task, e.g.,
vehicle simulation. A goal of this research is to extend
the architecture for inclusion of other real-time simu-
lation applications, e.g., process and system simulations.
It will be a general-purpose framework to simulate many
things, in real time, with visual output.
In short, the object generation nodes can be thought
of as processing logical objects. Objects can be a single
item or a collection of items. For instance, an object may
be as abstract as a single blade of grass or a total field
of grass, or it may be an entire physical object.
Therefore, some or all of the tasks of simulation can be
moved to each object, so long as these tasks are separable.
However, the tasks do not have to be cleanly separable.
For instance, overlaps could exist, which would be resolved
in compositing. This would allow simple splitting of some
objects into two somewhat overlapping ones without the need
of calculating new intersections due to an artificial
division cut. Also, true color, pseudo-color, or both can
be applied for the visual simulation with respect to the
problem domain.
CHAPTER IV
COMPOSITING NETWORK
The compositing network (CN) is a hardware compositor
that produces a computer graphics picture by blending
heterogeneously rendered objects into a full image. These
separately rendered objects are reductions of a total
modeled environment into pieces that rely on compositing
techniques for accumulation. Each object is produced by an
individual object generation node (OGN), which is in itself
a computer image generation device. The network configura
tion is in the form of a synchronous feedforward tree that
is connected to a multiplicity of object generation nodes
(OGNs) for input and to a single video generation node
(VGN) for output. Therefore, many object images are com
posited simultaneously. Compositing additional object
images is done by enlarging the compositing network and by
including additional object generation nodes.
There is no fixed configuration, but rather a general
framework to configure a compositing network utilizing a
collection of basic building blocks, called compositing
processing nodes (CPNs).
The compositing network operation requires the simul
taneous input of all instances of pixels with the same X
and Y cartesian coordinate per unit time. Each instance of
a pixel is part of an individual object rendered by an
object generation node. These pixel values flow in a
synchronous feedforward manner through the compositing
network, while being merged pixelbypixel at particular
stages. The last stage of the network provides a single
surviving pixel as output, which has an implied X coordi
nate and Y coordinate.
To summarize, the compositing process is carried out
pixel by pixel through three steps: 1) every pixel value is
simultaneously read from each image array of the object
generation nodes at a specified X coordinate and Y coordi
nate, 2) the compositing process operates on the collection
of pixel values read from the OGNs to produce a single
composite pixel resultant, and 3) the single composite
pixel resultant is written to the resident image array of
the video generation node at the same X coordinate and Y
coordinate used for the read operation. This process is
repeated at every X coordinate and Y coordinate of the
image array to produce every composite pixel value of the
image array within the video generation node.
The entire compositing network action, for each
collection of pixels, can be generally characterized by
Pc = oper(P1, P2, P3, ..., Pi) (1)
at every pixel with identical X, Y cartesian coordinates in
the i image arrays. The value Pc represents the single
surviving pixel after compositing. The "oper" operator is a
general operation that symbolizes the compositing action
due to the entire compositing network. Specifics of the
compositing algorithm that the compositing network realizes
are developed and described in the following sections.
Compositing Methodology
Guidelines for the generation of 2D pictures and
arithmetic for their 2D compositing were discussed by
Porter and Duff [POR84]. Their compositing method produced
antialiased composite images through a pixelbypixel
process. The antialiased composite or antialiased blending
of images requires information about the subpixel overlap
and object opacity. This information, as discussed by
Porter and Duff [POR84], is given by adding a mixing factor
to the color channels, which is called an ALPHA value.
Therefore, a pixel is defined by four independent varia
bles: RED, GREEN, BLUE, and ALPHA. Thus, the interplay of
alpha values must be considered for compositing objects to
accumulate a final image [POR84].
The ALPHA portion of an object representation provides
two pieces of information for compositing: 1) the single
ALPHA value represents the extent of coverage of an object
within a pixel and 2) the collection of ALPHA values repre
senting an object provides coverage information that desig
nates the shape of an object within the image space. The
pixel coverage information provides a mixing factor to
control linear interpolation of foreground and background
colors at every pixel. The object shape information, which
is termed a matte, identifies the object from what is not
the object within an isolated image array.
The ALPHA value represents the opacity of a pixel,
which is a fractional value that ranges from zero to one.
The antithesis of ALPHA, which is the transparency of a
pixel, is defined as (1 - ALPHA). Therefore, the transpar-
ency value also ranges from zero to one. Figure 4-1 illus-
trates this coverage information, pictorially, for three
distinct coverage types of opacity: no coverage, full
coverage, and partial coverage. As shown, no coverage is
indicated by an ALPHA value of zero, full coverage is
indicated by an ALPHA value of one, and partial coverage is
indicated by a fractional ALPHA value between zero and one
[POR84].
The pixel coverage information consists of an average
value of opacity. Therefore, the subpixel distribution of
opacity is not known or, in other words, the subpixel shape
is not known. Thus, some pixel coverage information is
missing, but the ALPHA value information is still useful
for rendering transparent objects, semitransparent ob
jects, and performing noncommutative object edge anti
aliasing for rendering opaque, semitransparent, or trans
parent objects. Also, since the ALPHA value represents
Figure 4-1. Three distinct types of pixel coverage, with
respect to the ALPHA value: a) no coverage
(ALPHA = 0), b) full coverage (ALPHA = 1), and
c) partial coverage (0 < ALPHA < 1). The
subpixel shape of the pixel with partial
coverage is arbitrary and is only shown in
this manner for conceptual clarity.
the average coverage of an object within a pixel, the pixel
color is determined by the product of ALPHA and the
object's true color.
Porter and Duff [POR84] discussed many operators for
the compositing of twodimensional images. The operator
of interest to this research is the "over" operator. This
operator computes a composite pixel color due to one pixel
in front of another. The composite pixel color is given by
cc = cf + (1 - Af)cb (2)
and the composite opacity
Ac = Af + (1 - Af)Ab (3)
where c denotes one of three tristimulus color values, A
denotes the opacity value ALPHA, the subscript c denotes
the composite, subscript f denotes the foreground, and
subscript b denotes the background. Also, the true fore
ground color Cf is multiplied by the foreground opacity Af
to produce cf and the true background color Cb is multi
plied by the background opacity Ab to produce cb. This is
done to keep the computation of cc similar to the computa
tion of Ac. The derivation of the "over" operator is
presented by Porter and Duff [POR84]. A similar develop
ment of "over," adjusted for this research, is presented
in the following section of this chapter.
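As a concrete sketch, Equations 2 and 3 can be written in software as follows. The function name and tuple representation are illustrative assumptions, not part of Porter and Duff's formulation; colors are premultiplied, i.e., cf = Af·Cf and cb = Ab·Cb.

```python
def over(cf, af, cb, ab):
    """Composite a premultiplied foreground pixel over a background pixel.

    cf, cb -- premultiplied (r, g, b) tuples; af, ab -- opacities in [0, 1].
    Returns the premultiplied composite color and the composite opacity,
    per Equations 2 and 3.
    """
    cc = tuple(f + (1.0 - af) * b for f, b in zip(cf, cb))  # Equation 2
    ac = af + (1.0 - af) * ab                               # Equation 3
    return cc, ac
```

For example, a half-covered red foreground (Cf = (1, 0, 0), Af = 0.5, so cf = (0.5, 0, 0)) over an opaque green background yields a composite opacity of one, as expected for any foreground over a fully opaque background.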
Porter and Duff's approach has a drawback of requiring
the priority of images to be manually entered. Therefore,
Duff [DUF85] introduced the depth variable, Z, as an ex
tension to the earlier image composition algorithm to
correct this drawback. The approach extended each pixel
in the image space to contain five independent variables:
RED, GREEN, BLUE, ALPHA, and Z. From this representation
an RGBAZ algorithm was developed that combined the "over"
operator of Porter and Duff [POR84] with a Zbuffer algo
rithm. Before discussing Duff's [DUF85] approach, the Z
buffer algorithm is presented and discussed.
A Zbuffer is a depth buffer that stores the Z car
tesian coordinate, which is also termed the depth coordi
nate, of every visible pixel in image space. It is used in
conjunction with a frame buffer, which is an attribute
buffer that stores the intensity of each pixel in image
space. A Zbuffer algorithm is a hiddensurface algorithm
that operates on the RGB intensity information and the
depth coordinate, Z, stored at each pixel in image space.
The Zbuffer algorithm is described by Catmull [CAT74]. It
functions by comparing the depth value of a new pixel,
which is to be written into the frame buffer, with the
depth value of the pixel that is currently stored in the Z
buffer. If the comparison indicates that the new pixel is
closer to the viewpoint than the current pixel, then the
new pixel's intensity value is written into the frame
buffer and its depth value is written into the Zbuffer
[ROG85]. If the comparison does not indicate the new pixel
is closer to the viewpoint than the current pixel, then the
current pixel values remain in the frame buffer and in the
Zbuffer.
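The Z-buffer update described above can be sketched in software as follows. The array layout and names are illustrative assumptions, and a smaller Z value is taken to mean closer to the viewpoint.

```python
def zbuffer_write(frame, zbuf, x, y, rgb, z):
    """Write a pixel only if it is closer than the stored depth."""
    if z < zbuf[y][x]:
        frame[y][x] = rgb   # new pixel survives: update intensity
        zbuf[y][x] = z      # and record its depth
    # otherwise the current frame-buffer and Z-buffer values remain

# Usage: a 2x2 frame buffer with the Z-buffer initialized to the
# maximum (farthest) depth, so the first write always succeeds.
INF = float("inf")
frame = [[(0, 0, 0)] * 2 for _ in range(2)]
zbuf = [[INF] * 2 for _ in range(2)]
zbuffer_write(frame, zbuf, 0, 0, (255, 0, 0), 10.0)  # accepted: closer
zbuffer_write(frame, zbuf, 0, 0, (0, 255, 0), 20.0)  # rejected: farther
```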
To recapitulate, the Zbuffer algorithm is a search
over X, Y in 3D space for the value of Z(X,Y) that is
closest to the viewpoint in image space. The Zbuffer
operation can be defined as RGBZ = zmin(L,M), where L is an
image array, M is an image array, and RGBZ is the survivor
pixel from either M or L according to the algorithm. The
collection of resultant RGBZ survivors over X, Y produces
an image space that is a composite image of the rendered
objects. This composite operation [DUF85] is more com
pactly characterized as
Zc = zmin(ZL, ZM) (4)
RGBc = RGBL, if ZL = zmin, else RGBM (5)
at every pixel in the two image arrays. The subscript c
denotes the composite. Two properties of the "zmin" oper-
ator are that it is both commutative and associative.
The Zbuffer algorithm allows pixels to be written
into the frame buffer in arbitrary order. Therefore, the
computation time associated with a depth sort operation is
eliminated [ROG85]. Unfortunately, the algorithm has
inherent aliasing problems due to its point sampling nature
[DUF85]. It also fails for rendering transparent objects,
but it is fast and simple [CAR84].
Duff's approach utilized the depth value at each of
the four corners of a pixel to compute a fraction called
BETA. This value is computed through linearly interpolat
ing the four depth corner values. The composite color is
computed by
cc = B(cf over cb) + (1 - B)(cb over cf) (6)
and
Zc = min(Zf, Zb) (7)
where c denotes one of three tristimulus color values
multiplied by its respective opacity value, Z denotes the
depth value, B denotes Duff's BETA value, the subscript c
denotes the composite, subscript f denotes the foreground,
and subscript b denotes the background. This approach
combines the pixels by area sampling. A drawback of this
3-D compositing approach and of the previously discussed
2-D compositing approach is that they do not apply when the
edges of more than one object are projected onto a single
pixel. The compositing algorithm developed in this re
search, which is discussed in the following section of this
chapter, addresses this problem.
Another interesting approach to compositing was dis
cussed by Carpenter [CAR84], with the introduction of the
Abuffer. An Abuffer is an antialiased hidden surface
mechanism that is an enhancement to the Zbuffer through
inclusion of a mask that contains subpixel coverage infor
mation. Therefore, the mask provides more pixel coverage
information than the ALPHA value, but it is more memory
intensive.
The compositing techniques specified in the reviewed
literature had specific idealized objectives, which are
listed as follows:
1. Must not induce spatial aliasing in the image,
which implies that soft edges of objects must be
respected in computing the final image [POR84].
2. Provide facilities for arbitrary dissolves, fades,
darkening, and attenuation of objects [POR84].
3. Exploit the full associativity of the compositing
process, which implies accumulation of several
foreground objects into an aggregate foreground can
be inspected over different backgrounds [POR84].
4. Allow various object representations: transparent,
semitransparent, and opaque [POR84].
5. Visibility technique must support all conceivable
geometric modeling primitives: polygons, quadrics,
patches, fractals, and so on [CAR84, DUF85].
6. Must handle opaque intersecting surfaces and trans
parent intersecting surfaces [CAR84].
7. Must handle hidden surface removal [CAR84, DUF85].
The proposed new architectural approach attempts to
satisfy these compositing technique objectives. Unfortu
nately, due to tradeoffs taken to keep the approach within
hardware limits, some of these objectives are not entirely
met. The constraints and tradeoffs associated with the
approach addressed through this research, which concern the
stated idealized objectives, are discussed in later
sections and chapters.
RGBAZ Compositing Algorithm
The proposed compositing method is developed to allow
any number of images to be composited with hidden surface
removal and antialiasing. The compositing algorithm real-
ized by the compositing network is based on Porter and
Duff's [POR84] "over" operator, but is modified through the
introduction of the depth value. This rendition modifies
the "over" operator through incorporating the "zmin" oper
ator for identifying the foreground pixel from the back
ground pixel. The algorithm is labeled an RGBAZ algorithm,
as was Duff's [DUF85], but differs from that formulation.
It is developed and described in the subsequent paragraphs.
Consider opacity values, A1 and A2, belonging to a
pair of semitransparent pixels, P1 and P2, that have
identical X and Y coordinates, but differ in the Z coordi
nate where the Z1 value is less than that of the Z2
value. The composite Z value for this situation, in ac
cordance with the Zbuffer algorithm utilizing Equation 4,
is given by
Zc = Z1 (8)
where Z denotes the depth value, and subscript c identifies
the composite resultant.
The depth comparison identifies pixel P1 as being
closer to the viewpoint than pixel P2. Therefore, pixel P1
is identified as the foreground pixel and pixel P2 is
identified as the background pixel. The opacity represen
tation designates the opaqueness of pixel P1 as A1 and its
clearness as (1 - A1). Likewise, the opaqueness of pixel
P2 is A2 and its clearness is (1 - A2). This implies that
the composite opacity, according to the "over" operator, of
the two pixels is given by
Ac = A1 + (1 - A1)A2 (9)
where A denotes the opacity value. An example of this
situation is depicted in Figure 4-2.
The composite color is calculated by realizing that
pixel P1 allows (1 - A1) of its background light through
and reflects A1 of its color. Likewise, pixel P2 allows
(1 - A2) of its background light through and reflects A2 of
Figure 4-2. Two pixel opacity values are composited. The
values were derived from coverage information
from two different objects: a foreground
object and a background object, each with
partial pixel coverage, yielding composited
objects with shared partial pixel coverage.
The coverage depictions are arbitrary; they
are given specific subpixel forms to clarify
the composite operation. The coverage areas
are actually averaged across the pixel.
its color. Therefore, P1 reflects A1 of its color and lets
(1 - A1) of P2's reflected color through. This implies
that the composite color, according to the "over" operator,
of the two pixels is given by
cc = A1C1 + (1 - A1)A2C2 (10)
where C represents the tristimulus colors: RED, GREEN, and
BLUE. The upper case C is used to designate the true
color, which occurs when the pixel is 100% overlapped by
the object. The lower case color c depicts the true color
value multiplied by its opacity value, which is given by
cc = AcCc (11)
A similar argument follows, as presented above, when
Z2 is less than Z1. For this condition, substitute pixel
subscript identifiers "2" for "1" and "1" for "2" in the
development presented above. The composite depth, opacity,
and color would then be given by
Zc = Z2 (12)
Ac = A2 + (1 - A2)A1 (13)
cc = A2C2 + (1 - A2)A1C1 (14)
respectively.
The incorporation of the "zmin" operator with the
"over" operator requires an additional development for the
effects of two pixels with equal depth values. This condi
tion implies that two objects are occupying the same voxel
in space. Therefore, both objects contribute to the in
tensity of the resultant pixel, but the intensity contribu
tion due to each of these objects is nebulous. This
condition can be understood by considering the quantization
error due to the use of finite depth values. The opacity
contributions from the input pixels may be due to pixel
overlap. But, the foreground and background object can not
be discerned, since the difference in depth is within the
limits of the quantization error.
The development of this condition will consider the
pixel as a small cubic volume, instead of a small surface.
This model will allow the edges of two objects to be
projected into its space. The viewable or reflective front
surface of this small cubic volume is only of interest for
determination of the opacity and color values.
The composite opacity is found by first considering
the condition, Z1 < Z2, where (Z2 Z1) is within the
quantization error. The composite opacity would then be
equal to Equation 9. Now, consider the condition, Z1 >
Z2, where (Z1 Z2) is within the quantization error. The
composite opacity would then be equal to Equation 13. The
probabilities of these two conditions occurring within
the small cubic volume are equal. Therefore, the composite
opacity and color values are computed through a simple
average of the two possible conditions, which are given by
Ac = [(A1 + (1 - A1)A2) + (A2 + (1 - A2)A1)]/2
   = A1 + A2 - A1A2 (15)
and
cc = [(A1C1 + (1 - A1)A2C2) + (A2C2 + (1 - A2)A1C1)]/2
   = A1C1 + A2C2 - (C1 + C2)A1A2/2 (16)
Also, the composite depth is given by
Zc = Z1 = Z2 (17)
It is interesting to note that Equations 9, 13, and 15 are
equal.
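The equal-depth results of Equations 15 and 16 can be checked numerically with a short sketch; the function name and variable names are illustrative, not from the original development.

```python
def equal_depth_composite(a1, c1, a2, c2):
    """Composite opacity and color for two pixels at equal depth.

    a1, a2 -- opacities; c1, c2 -- true color values (one channel).
    Returns (Ac, cc) per Equations 15 and 16.
    """
    ac = a1 + a2 - a1 * a2                             # Equation 15
    cc = a1 * c1 + a2 * c2 - (c1 + c2) * a1 * a2 / 2   # Equation 16
    return ac, cc
```

With A2 = 0 the color reduces to A1C1, and with A1 = A2 = 1 it reduces to the average (C1 + C2)/2, matching the boundary cases discussed in the text.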
Boundary analysis of Equation 16 is performed to check
its validity, which is presented as follows:
cc = A1C1, if A2 = 0 (18)
cc = A2C2, if A1 = 0 (19)
cc = (C1 + C2)/2, if A1 = A2 = 1 (20)
The first two boundary examples demonstrate a reduction to
a single pixel case, which is to be expected. The last
boundary example reduces to an average color that does not
become amplified, which is also to be expected. A pseudo-
code outline of this RGBZA compositing algorithm with
respect to a pair of image arrays is given in Figure 4-3.
As shown, each pixel of the two image arrays is composited
to produce a composite image array for display. The
RGBZA Compositing Algorithm
given
    An array RGBZA1[x,y]
    An array RGBZA2[x,y]
    An array rgbZAc[x,y]
begin
    for each element (x,y) of array rgbZAc[x,y] do
        Ac = A1 + A2 - A1A2
        if Z1 < Z2 then
            rc = A1R1 + (1-A1)A2R2
            gc = A1G1 + (1-A1)A2G2
            bc = A1B1 + (1-A1)A2B2
            Zc = Z1
        endif
        if Z1 > Z2 then
            rc = A2R2 + (1-A2)A1R1
            gc = A2G2 + (1-A2)A1G1
            bc = A2B2 + (1-A2)A1B1
            Zc = Z2
        endif
        if Z1 = Z2 then
            rc = A1R1 + A2R2 - (R1 + R2)A1A2/2
            gc = A1G1 + A2G2 - (G1 + G2)A1A2/2
            bc = A1B1 + A2B2 - (B1 + B2)A1A2/2
            Zc = Z1
        endif
    endfor
    Display rgbc array of the rgbZAc array
end
Figure 4-3. The RGBZA compositing algorithm.
composite opacity and depth values are not needed for
display; they are included so that the resultant image
array can be composited with other image arrays. This
subject is discussed in the following section.
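The pairwise compositing step outlined in Figure 4-3 can be sketched in software as follows. This is a hypothetical function over (r, g, b, A, Z) tuples, with a smaller Z taken as nearer; it is an illustration, not part of the hardware design.

```python
def composite_rgbaz(p1, p2):
    """Composite two pixels (r, g, b, a, z); a smaller z is nearer.

    Colors here are true colors, multiplied by their opacities inside
    the function, as in Figure 4-3.
    """
    r1, g1, b1, a1, z1 = p1
    r2, g2, b2, a2, z2 = p2
    ac = a1 + a2 - a1 * a2                     # composite opacity
    if z1 < z2:                                # p1 is the foreground
        rgb = tuple(a1 * u + (1 - a1) * a2 * v
                    for u, v in ((r1, r2), (g1, g2), (b1, b2)))
        zc = z1
    elif z1 > z2:                              # p2 is the foreground
        rgb = tuple(a2 * u + (1 - a2) * a1 * v
                    for u, v in ((r2, r1), (g2, g1), (b2, b1)))
        zc = z2
    else:                                      # equal depth: average both orderings
        rgb = tuple(a1 * u + a2 * v - (u + v) * a1 * a2 / 2
                    for u, v in ((r1, r2), (g1, g2), (b1, b2)))
        zc = z1
    return rgb + (ac, zc)
```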
Network Structure
The compositing operation described in the previous
section dealt with compositing two pixels, each produced
from two separate objects, to form a single composite pixel
as a result. A method of compositing many pixels, where
each pixel is produced from many objects, would be to
create a hierarchy of compositing operations. At the
bottom of the hierarchy, compositing operations would
simultaneously accept pixel values from separate image
arrays as input. The multiple outputs of the bottom level
in the hierarchy would be used as inputs to the next level
in the hierarchy. This process would continue until an
individual output is produced at the top of the hierarchy
of compositing operations. The result would be a composite
pixel value of every pixel value used as input to the
lowest level of the hierarchy. This composite pixel value
would then be written into an image array at the same X, Y
coordinate that was used for the input pixels. The same
procedure would be done for all succeeding composite pixel
value outputs of the hierarchy, which would produce a
complete composite image array of many objects.
A hardware synthesis of the hypothetical hierarchy of
compositing operations is what defines the compositing
network. It is created through interconnecting an ensemble
of fundamental hardware compositing units that realize the
compositing operation. These units are termed compositing
processing nodes (CPNs). The defined function of a CPN is
to produce a single composite pixel value from a pair of
input pixel values. It maintains a 2-to-1 configuration,
where the output of one CPN can supply an input to a suc
ceeding CPN. This is an iterative property, which is the
property required to realize the hierarchy of compositing
operations in hardware. The structure of the entire
compositing network is driven by the structure of an
individual CPN. Therefore, the interconnection of CPNs
forms a binary tree, which realizes the compositing net
work. A depiction of this structure is shown in Figure
4-4, which illustrates a fully balanced 3-level compositing
tree that has 7 CPNs and 8 connections for deeper CPN
levels or for terminal OGN connection. The general struc-
ture for a fully balanced tree with N terminal connections
maintains n = log2(N) levels with N-1 CPNs for the network
configuration. However, the compositing network does not
have to be a fully balanced tree. It can be unbalanced as
long as all of the OGNs are connected at the same level
within the system tree.
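The bookkeeping for a fully balanced tree can be sketched as follows; the helper is hypothetical and simply restates the n = log2(N) and N-1 relations above.

```python
import math

def tree_size(n_terminals):
    """Return (levels, cpn_count) for a fully balanced compositing tree.

    n_terminals is the number of terminal (OGN) connections, which must
    be a power of two for a fully balanced tree.
    """
    levels = int(math.log2(n_terminals))
    assert 2 ** levels == n_terminals, "terminal count must be a power of two"
    return levels, n_terminals - 1

# Figure 4-4's example: 8 terminal connections -> 3 levels and 7 CPNs.
```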
Figure 4-4. A fully balanced three-level compositing
tree. Level 1 is the root CPN, whose output
goes to the video generation node (VGN);
levels 2 and 3 fan out to eight terminal
connections (1 through 8), which accept input
from deeper levels of compositing processing
nodes (CPNs) or from eight object generation
nodes (OGNs).
The compositing network can be modeled as a pipelined
machine, where each level in the system's binary tree
structure is a pipeline stage. At every machine cycle, a
collection of pixels, with identical X, Y coordinates, are
routed to the nodes within the lowest level of the system
tree. The machine operation proceeds in a synchronous
feedthrough manner for every machine cycle, where a col
lection of pixels at a particular level in the system tree
is computed to produce a collection of composite pixel
values as a result. These results are routed, before the
next machine cycle, to the inputs of a succeeding level in
the system tree. Therefore, each succeeding level in the
system tree produces half as many pixel values (fully
balanced tree) as were provided for input. The output of
this machine provides a single composite pixel value as a
result, which is produced from the highest level of the
system tree.
This structure is classified as a synchronous feed
forward configuration, where CPN operation is synchronous
with the image update rate. Therefore, the machine cycle
time is a function of both the image space resolution and
the image update rate. The pipeline is considered full
when every CPN in the system tree has a valid input.
During a full pipeline state, each level of the tree is
processing a set of pixels that have identical X, Y coordi
nates. Therefore, the startup time through a tree will be
a function of the tree depth and the number of pipeline
stages within an individual CPN.
The effect of the feedthrough structure of the
compositing network has to be considered regarding the
RGBZA algorithm. This structure has a cumulative effect
that directly influences the compositing operation. There-
fore, the RGBZA algorithm has to be adjusted to accommodate
this fact.
The compositing network is a subtree of the system
tree and the OGNs are terminal nodes of the system tree
that provide input to the compositing network. Now, con-
sider the evaluation of the composite opacity value from a
fully balanced system tree with i CPNs and i+1 OGNs, where
the total number of tree nodes is 2i+1. The CPNs are
located at binary tree positions 1 through i. The OGNs are
located at binary tree positions i+1 through 2i+1. Note
that a fully balanced system tree is used to simplify this
development. However, the system tree can be unbalanced to
accommodate a collection of OGNs that are not a binary
multiple. The criterion is for all of the OGNs to exist at
the same level within the system tree. This subject is
discussed further in the system features discussion of the
conclusion. For the fully balanced system tree, the
composite opacity defined at the first CPN or root node,
1, to the last CPN, i, for all cases, is given as follows
A1 = A2 + A3 - A2A3
A2 = A4 + A5 - A4A5
A3 = A6 + A7 - A6A7
. . .
Ai = A2i + A2i+1 - A2iA2i+1 (21)
where the subscript identifies the tree node number. The
result is a recursive relation for the evaluation of the
opacity value. The composite color value and depth value
are defined through the use of a similar development for
each of the three depth value comparisons. The condition
Z2i < Z2i+1 gives
Zi = Z2i (22)
ci = c2i + (1 - A2i)c2i+1 (23)
and the condition Z2i > Z2i+1 gives
Zi = Z2i+1 (24)
ci = c2i+1 + (1 - A2i+1)c2i (25)
and the condition Z2i = Z2i+1 gives
Zi = Z2i = Z2i+1 (26)
ci = c2i + c2i+1 - (A2ic2i+1 + A2i+1c2i)/2 (27)
where the lower case color "c" depicts the true color value
multiplied by its opacity value. This form of the compos-
iting functions requires each color value entering the
compositing network to be multiplied by its respective
opacity. Also, each composite color value exiting the
network will be the composite color multiplied by its
composite opacity.
The recursive relations are handled by iterative
techniques utilizing CPNs. The RGBZA compositing algorithm
that each CPN should execute is depicted in Figure 4-5.
This algorithm, which is termed the general RGBZA compos-
iting algorithm, includes the image arrays and the multi-
plication operation of the OGNs. It also includes the
image array of the VGN and a reference to the DDN. The
second loop within the main loop is the actual network
algorithm. This task inputs a collection of pixel values
for processing, according to their respective depth re
lationship, to produce a single surviving composite pixel
value for output. The loop counts down in order to obviate
the startup time that would be associated with a hardware
pipeline.
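The recursive relations of Equations 21 through 27 can be evaluated in software with the same countdown loop the hardware uses. The sketch below is a hypothetical function over a heap-ordered list of node registers (node i's children at positions 2i and 2i+1), with premultiplied colors assumed on entry; names are illustrative.

```python
def composite_tree(leaves):
    """Evaluate a fully balanced compositing tree; return the root pixel.

    leaves -- list of OGN pixel values (r, g, b, a, z), with r, g, b
    already multiplied by a; the list length must be a power of two.
    """
    n = len(leaves) - 1                        # number of CPNs
    node = [None] * (2 * n + 2)                # 1-indexed node registers
    node[n + 1:] = list(leaves)                # OGN outputs at nodes n+1..2n+1
    for i in range(n, 0, -1):                  # CPN operation, bottom up
        r0, g0, b0, a0, z0 = node[2 * i]
        r1, g1, b1, a1, z1 = node[2 * i + 1]
        a = a0 + a1 - a0 * a1                  # Equation 21
        if z0 < z1:                            # Equations 22-23
            r, g, b, z = (r0 + (1 - a0) * r1, g0 + (1 - a0) * g1,
                          b0 + (1 - a0) * b1, z0)
        elif z0 > z1:                          # Equations 24-25
            r, g, b, z = (r1 + (1 - a1) * r0, g1 + (1 - a1) * g0,
                          b1 + (1 - a1) * b0, z1)
        else:                                  # Equations 26-27
            r = r0 + r1 - (a0 * r1 + a1 * r0) / 2
            g = g0 + g1 - (a0 * g1 + a1 * g0) / 2
            b = b0 + b1 - (a0 * b1 + a1 * b0) / 2
            z = z0
        node[i] = (r, g, b, a, z)
    return node[1]
```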
The entire compositing network action is reduced to a
special case when only opaque objects are involved, with-
out the inclusion of special effects (e.g., dissolves,
darkening, antialiasing, etc.). This is given as follows
     { Z2i,   if Z2i < Z2i+1
Zc = { Z2i+1, if Z2i > Z2i+1 (28)
     { Z2i,   if Z2i = Z2i+1
General RGBZA Compositing Algorithm
const
    n = total number of CPNs in tree
given
    Array rgbZAi, where i=1,2,3,...,2n+1 {node registers}
    Array RGBZAj,x,y, where j=1,2,3,...,n+1 {OGN memory}
    Array rgbx,y {VGN memory}
begin
    for each element (x,y) of RGBZAj,x,y and rgbx,y do
        for i=n+1 to 2n+1 do {load OGN output registers}
            j = i - n
            ri = Aj,x,yRj,x,y
            gi = Aj,x,yGj,x,y
            bi = Aj,x,yBj,x,y
            Ai = Aj,x,y
            Zi = Zj,x,y
        endfor {end load OGN output registers}
        for i=n downto 1 do {CPN compositing operation}
            Ai = A2i + A2i+1 - A2iA2i+1
            if Z2i < Z2i+1 then
                ri = r2i + (1-A2i)r2i+1
                gi = g2i + (1-A2i)g2i+1
                bi = b2i + (1-A2i)b2i+1
                Zi = Z2i
            endif
            if Z2i > Z2i+1 then
                ri = r2i+1 + (1-A2i+1)r2i
                gi = g2i+1 + (1-A2i+1)g2i
                bi = b2i+1 + (1-A2i+1)b2i
                Zi = Z2i+1
            endif
            if Z2i = Z2i+1 then
                ri = r2i + r2i+1 - (A2ir2i+1 + A2i+1r2i)/2
                gi = g2i + g2i+1 - (A2ig2i+1 + A2i+1g2i)/2
                bi = b2i + b2i+1 - (A2ib2i+1 + A2i+1b2i)/2
                Zi = Z2i
            endif
        endfor {end CPN compositing operation}
        rgbx,y = rgb1 {write composite result to VGN}
    endfor
    Display rgbx,y array {DDN}
end
Figure 4-5. The general RGBZA compositing algorithm for a
fully balanced tree. Note that lower case
letters designate the product of intensity and
opacity.
     { C2i,             if Z2i < Z2i+1
Cc = { C2i+1,           if Z2i > Z2i+1 (29)
     { (C2i + C2i+1)/2, if Z2i = Z2i+1
at every pixel in the i+1 image arrays. The composite
pixel has either full coverage or no coverage. Therefore,
the opacity information is not needed. Also, the matte
information is implied by a depth value that is not the
maximum. The specialized RGBZ algorithm is presented in
Figure 4-6. As for Figure 4-5, the second loop within the
main loop is the actual network algorithm. This task
inputs a collection of pixel values for processing, ac
cording to their respective depth relationship, to produce
a single surviving composite pixel value for output. A
mixture of both the specialized and the general forms of
the compositing algorithm for CPNs can compose a
compositing network. The OGNs can be specialized for
opaque objects without antialiasing and special effects.
These nodes would be assigned to the section of the tree
that contain the specialized CPNs. Also, OGNs that process
objects with antialiasing and special effects can be
assigned to the section of the tree that contain the
general CPNs. Configurations could include a mix of
general and specialized CPNs. The purpose of mixing CPNs
would be to reduce the system complexity, since the
specialized CPNs are of a simpler form than the general
CPNs.
Specialized RGBZ Compositing Algorithm
const
    n = total number of CPNs in tree
given
    Array RGBZi, where i=1,2,3,...,2n+1 {node registers}
    Array RGBZj,x,y, where j=1,2,3,...,n+1 {OGN memory}
    Array RGBx,y {VGN memory}
begin
    for each element (x,y) of RGBZj,x,y and RGBx,y do
        for i=n+1 to 2n+1 do {load OGN output registers}
            j = i - n
            Ri = Rj,x,y
            Gi = Gj,x,y
            Bi = Bj,x,y
            Zi = Zj,x,y
        endfor {end load OGN output registers}
        for i=n downto 1 do {CPN compositing operation}
            if Z2i < Z2i+1 then
                Ri = R2i
                Gi = G2i
                Bi = B2i
                Zi = Z2i
            endif
            if Z2i > Z2i+1 then
                Ri = R2i+1
                Gi = G2i+1
                Bi = B2i+1
                Zi = Z2i+1
            endif
            if Z2i = Z2i+1 then
                Ri = (R2i + R2i+1)/2
                Gi = (G2i + G2i+1)/2
                Bi = (B2i + B2i+1)/2
                Zi = Z2i
            endif
        endfor {end CPN compositing operation}
        RGBx,y = RGB1 {write composite result to VGN}
    endfor
    Display RGBx,y array {DDN}
end
Figure 4-6. The specialized RGBZ compositing algorithm of
a fully balanced tree.
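The specialized per-CPN step of Figure 4-6 reduces to a plain depth comparison with averaging at equal depth. A small software sketch follows; the function and its tuple representation are illustrative assumptions.

```python
def composite_opaque(p0, p1):
    """Composite two opaque pixels (r, g, b, z); the smaller z survives.

    At equal depth the colors are averaged, per the Z2i = Z2i+1 case.
    """
    *rgb0, z0 = p0
    *rgb1, z1 = p1
    if z0 < z1:
        return p0                      # p0 is nearer: it survives intact
    if z0 > z1:
        return p1                      # p1 is nearer: it survives intact
    return tuple((u + v) / 2 for u, v in zip(rgb0, rgb1)) + (z0,)
```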
Compositing Processing Node
The purpose of a CPN is to perform pixelbypixel
compositing. It is a fundamental iterative hardware build
ing block used to construct a CN tree. A generic CPN
configuration is depicted in Figure 4-7. The subscript "i"
is a node number that identifies a node within the system
tree, which consists of CPNs, OGNs, and a VGN. The CN is a
subset of the system tree that contains only CPNs. The
OGNs are the terminal nodes of the system tree. The VGN is
connected to the root node of the CN and is identified
through node number zero of the system tree. As shown, the
CPN structure maintains a 2-to-1 configuration. The data
inputs consist of two pixel values, P2i and P2i+1, which
can be routed to the CPN by either two preceding CPNs or by
two preceding OGNs. The data output consists of a single
pixel value, which can be routed to the input of a suc
ceeding CPN or to the input of a video generation node.
The CPN input clock, CLK, is driven by a system clock that
synchronizes the internal CPN operation with the entire
system. This signal is provided by the video generation
node (VGN), which maintains the entire system timing and
control.
A pixel is represented by five independent variables:
RED, GREEN, BLUE, ALPHA, and Z. The tristimulus color or
intensity is represented by the values of RED, GREEN, and
BLUE. The ALPHA value represents the average opaqueness of
[Building-block diagram omitted: pixel output to the next stage of the system tree above, the system clock from the VGN at the side, and pixel inputs from the previous stage of the system tree below]
where
    Pi = {ri, gi, bi, Ai, Zi}
    P2i = {r2i, g2i, b2i, A2i, Z2i}
    P2i+1 = {r2i+1, g2i+1, b2i+1, A2i+1, Z2i+1}
CPNi is a CPN located in the CN at node position "i."
CLK is the system timing input.
Figure 4-7. An iterative building block depiction of a
compositing processing node (CPN).
the pixel or the average light-blocking characteristic of
the material that the pixel represents. The Z value repre-
sents the Z coordinate, in Cartesian space, where the pixel
exists. The X, Y Cartesian coordinates are implied as
identical for both input pixels, but may differ within the
same clock cycle for the single output pixel due to the
hardware pipeline approach.
Schemes for realizing the previously discussed RGBZA
compositing algorithms are developed that are fast and
inexpensive to implement in hardware, but which produce
results of numerically high quality. These schemes honor
two considerations: machine and numerical considerations.
Machine considerations concern the speed and cost of the
physical device. Numerical considerations concern the
closest approximations to the exact numbers. The schemes
attempt to maintain a balance between both. Also, the
effects of roundoff error accumulation due to the feed-
through operation of the binary tree of CPNs are taken into
consideration.
Finite-precision fixed-point numbers are used in this
machine for representation of the pixel values. This
representation allows pixel values within the local image
buffers of the OGNs and of the VGN to be stored as inte-
gers, which simplifies the image buffer organization.
Also, the hardware complexity for realization of the com-
positing algorithms and of the video generation processing
algorithms is reduced, along with accommodation of faster
cycle times for an implementation. Therefore, the compos-
iting algorithms that are depicted in Figures 4-5 and 4-6
must be adjusted to accommodate the fixed-point repre-
sentation of a pixel value, which relates to the machine
precision of a number represented within and operated on by
a CPN.
The tristimulus color variables RED, GREEN, and BLUE
are usually represented in rendering algorithms as fixed-
point numbers. Therefore, they do not create an initial
problem. But roundoff error amplification can occur due
to the repetitive modification of these values through the
compositing network. Therefore, to enhance the numeric
accuracy of the final result, the roundoff error has to be
controlled. Representation of the opacity value, ALPHA,
presents a similar problem, but differs slightly since its
initial value is defined as a fractional number. The depth
variable, Z, is usually represented within a rendering
algorithm as a floating-point number. The compositing
network handles the depth value as an integer and does not
modify its value; therefore, its floating-point value can
be rounded or truncated.
The algorithm modification and the CPN conceptual
hardware organization are discussed in the following
sections. Two CPN organizations are presented: the general
CPN and the specialized CPN. The combinatorial hardware
layout of the conceptual block diagrams consists of breaks
in the logic for registers, termed stages. This is done to
maintain a pipeline of partial operations, which enhances
performance. Maximum system performance is then achieved
by matching the clock cycle to the delay through the
slowest stage. The stage delay is calculated by totaling
the delay through the logic and conductors that exist
between the two registers of a stage. Also, the stage with
the longest delay becomes the bandwidth-limiting section.
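The clock-matching rule above reduces to taking the maximum over the per-stage delays; the short Python sketch below illustrates it (the stage delay figures are hypothetical):

```python
def pipeline_clock_ns(stage_delays_ns):
    """The clock period must cover the slowest pipeline stage; that
    stage also sets the pipeline bandwidth in results per second."""
    period = max(stage_delays_ns)     # bandwidth-limiting stage
    bandwidth = 1.0e9 / period        # one result per clock
    return period, bandwidth

# Hypothetical per-stage delays (logic plus conductors), in ns:
period, bw = pipeline_clock_ns([12.0, 45.0, 30.0])
# the 45-ns stage limits the clock, giving about 22.2 million
# results per second
```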
General Compositing Processing Node
A general CPN performs pixel-by-pixel compositing of
various object types: opaque, transparent, and semitrans-
parent. It also handles antialiasing and special effects,
such as fade-outs and fade-ins. The hardware organization
contains three distinct functional units: the depth compu-
tation unit, the opacity computation unit, and the color
computation unit. These functional units are discussed
with respect to finite fixed-point pixel value repre-
sentation. The algorithm and the conceptual hardware
organization of each unit are presented.
Depth computation unit
The depth computation unit discerns the foreground
pixel from the background pixel or identifies both as
foreground pixels. This unit functions according to the
algorithm presented in Figure 4-8, which is a subset of the
general RGBZA compositing algorithm of Figure 4-5. The
depth value, Z, is represented by a single z-bit integer,
where 0 ≤ Z ≤ (2^z - 1). Therefore, the floating-point
representation of this value is initially truncated or
rounded. The task performed by this unit, as the algorithm
indicates, consists of 1) receiving a pair of depth values,
2) performing a comparison of the depth values, 3) providing
status information, and 4) outputting the smallest depth
value. Status information consists of the LESS bit and the
EQUAL bit, which are used by the color computation unit.
The LESS bit, when set, indicates that the Z2i value is
smaller than the Z2i+1 value. The EQUAL bit, when set,
indicates that the Z2i value and the Z2i+1 value are equal
in magnitude.
A block diagram of the CPN depth computation unit is
depicted in Figure 4-9. Stage 1 performs an initial load
of the incoming pair of depth values from the CPN intercon-
nect. Stage 2 performs a comparison of the two depth
values for status information and passes the two depth
values along with the status information. Stage 3 routes
the surviving depth value, which is the composite depth
value, to succeeding stages utilizing a 2:1 multiplexer
with the LESS status bit as a selector. The succeeding
stages are waiting stages that allow a final result to
CPN Depth Computation Unit Algorithm
given
    literal Z2i      {z-bit integer}
    literal Z2i+1    {z-bit integer}
begin
    EQUAL = 0
    if Z2i < Z2i+1 then
        Zi = Z2i
        LESS = 1
    else
        Zi = Z2i+1
        LESS = 0
        if Z2i = Z2i+1 then
            EQUAL = 1
        endif
    endif
end    {result is a z-bit integer}
Figure 4-8. The algorithm performed by a general CPN depth
computation unit.
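A behavioral model of Figure 4-8 is only a comparison with two status bits; the Python sketch below (names are illustrative) mirrors it:

```python
def depth_unit(z2i, z2i1):
    """General CPN depth computation unit (Figure 4-8): select the
    nearer (smaller) depth and report the LESS and EQUAL bits used
    by the color computation unit."""
    equal = 0
    if z2i < z2i1:
        zi, less = z2i, 1       # Z2i is the foreground pixel
    else:
        zi, less = z2i1, 0      # Z2i+1 survives (or both are equal)
        if z2i == z2i1:
            equal = 1           # both are foreground pixels
    return zi, less, equal
```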
[Block diagram omitted: a z-bit comparator with inputs A and B operating on Z2i and Z2i+1; registers are clocked at the image update rate.]
Figure 4-9. Block diagram of a general CPN depth computa-
tion unit.
occur simultaneously with the remaining CPN computation
unit results.
Opacity computation unit
The opacity computation unit produces a single com-
posite opacity value from two opacity values that are
provided as input. The opacity is defined as a positive
fractional value that ranges from zero to one. Each opac-
ity value is stored in the image buffers of this machine as
a fixed-point binary number. Therefore, the opacity value,
A, is represented by a positive fixed-fractional value
given by

    0 ≤ A/Amax ≤ 1    (30)

where A is a binary integer such that 0 ≤ A ≤ Amax, and
Amax is a constant that defines the range of opacity. The
local image buffers store the integer value, A, while the
fixed-fractional value is incorporated by the hardware.
Substituting the opacity representation of Equation 30 into
Equation 21 and collecting terms gives

    Ai = A2i + (Amax - A2i)A2i+1 / Amax    (31)
The division required in Equation 31 is eliminated by
defining Amax as 2^(m-1), where m is the number of bits in A.
This transforms the division operation into a shift opera-
tion. A tradeoff occurs with this technique, since each
image buffer within the OGNs will require an extra bit
plane and an extra signal line to represent the opacity
value for a particular range. Substituting the value of
Amax into Equation 31 gives

    Ai = A2i + (2^(m-1) - A2i)A2i+1 / 2^(m-1)    (32)

In order to have a more accurate result, the hardware unit
represents each opacity value as a higher precision number,
which reduces the roundoff error accumulation through the
compositing network. This is shown by multiplying both
sides of Equation 32 by 2^m and adjusting the product
term, which gives

    2^m Ai = 2^m A2i + (2^(2m-1) - 2^m A2i)(2^m A2i+1) / 2^(2m-1)    (33)

Equation 33 shows that the opacity value can be handled as
a double-precision number, if the opacity value is shifted
left by m bits and the least-significant half of the
word is padded with zeroes before entering the CN tree.
Therefore, the opacity computation with the opacity values
defined as double-precision numbers is given by

    Ai = A2i + (2^(2m-1) - A2i)A2i+1 / 2^(2m-1)    (34)

where the opacity value, Ai, is a binary integer such that
0 ≤ Ai ≤ 2^(2m-1).
The opacity computation unit functions according to
the algorithm presented in Figure 4-10, which follows the
developed relations. The task performed by this unit
consists of 1) receiving a pair of opacity values, 2)
performing an opacity compositing operation, and 3) output-
ting the composite opacity result.
A block diagram of the opacity computation unit
is depicted in Figure 4-11. Stage 1 performs an initial
load of the pair of opacity values from the CPN intercon-
nect. Stage 2 performs a subtraction operation and passes
the two opacity values along with the subtraction result.
Stage 3 performs a multiplication of the subtraction
result with the A2i+1 opacity value and shifts the multi-
plication result right by 2m-2 bits (division). It also
passes the A2i opacity value along with the shifted multi-
plication result. Stage 4 sums the A2i value with the
shifted multiplication result and performs rounding, which
produces the composite opacity. Note that eliminating the
signal input to the carry bit and setting the carry bit to
0 will cause chopping of the multiplication result instead
of rounding. The succeeding stages are waiting stages that
allow a final result to occur simultaneously with the
remaining CPN computation unit results.
CPN Opacity Computation Unit Algorithm
const
    m = number of bits of initially stored opacity value
given
    literal A2i      {2m-bit integer}
    literal A2i+1    {2m-bit integer}
begin
    Ai = A2i + [(2^(2m-1) - A2i)A2i+1] shr (2m-1)
    if [(2^(2m-1) - A2i)A2i+1 AND 2^(2m-2)] = 2^(2m-2) then
        Ai = Ai + 1    {roundoff error}
    endif
end    {result is a 2m-bit integer}
Figure 4-10. The algorithm performed by a general CPN
opacity computation unit.
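The shift-and-round arithmetic of Figure 4-10 can be sketched in Python for checking; m = 8 and the function name are assumptions for illustration, with full opacity at 2^(2m-1) in the double-precision representation:

```python
M = 8                        # bits of initially stored opacity (assumed)
AMAX = 1 << (2 * M - 1)      # double-precision full-opacity value

def opacity_unit(a2i, a2i1, m=M):
    """General CPN opacity computation unit (Figure 4-10):
    Ai = A2i + (2^(2m-1) - A2i) * A2i+1 / 2^(2m-1), with the
    division done as a right shift and the shifted-out bit
    used to round (rather than chop) the result."""
    prod = ((1 << (2 * m - 1)) - a2i) * a2i1
    ai = a2i + (prod >> (2 * m - 1))
    if prod & (1 << (2 * m - 2)):    # test the bit below the shift
        ai += 1                      # round up instead of chopping
    return ai
```

A fully opaque foreground stays fully opaque, and compositing two half-opacities gives three quarters, matching the exact fractional result of Equation 21.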
[Block diagram omitted: inputs A2i and A2i+1 with the constant 2^(2m-1); subtract, multiply, and add stages, with the shifted-out LSB feeding the adder carry-in for rounding; registers are clocked at the image update rate; output Ai.]
Figure 4-11. Block diagram of a general CPN opacity compu-
tation unit.
Color computation unit
The color computation unit produces a composite color
value from two color values that are provided as input.
The color values are defined as the true tristimulus color
values multiplied by their respective opacity value. Each
primary color (intensity) value, c, is stored in the
machine's image buffers as an n-bit integer, where 0 ≤ c ≤
2^n - 1. But this hardware unit handles the opacity and the
color values as higher precision numbers (2m and m+n-1 bits,
respectively) in order to reduce the roundoff error
accumulation through the compositing network.
The composite color operation for the Z2i < Z2i+1
condition is developed by substituting the higher preci-
sion representations of both the intensity and the opacity
values into Equation 23, which gives

    ci = c2i + (2^(2m-1) - A2i)c2i+1 / 2^(2m-1)    (35)

where each primary color value, c, is defined as an m+n-1
bit value within the CN tree. Therefore, each n-bit prima-
ry color value requires multiplication by the m-bit opacity
value before entering the CN tree. When exiting the CN
tree, each color value requires shifting right by m-1 bits
with rounding or chopping to provide an n-bit result.
The Z2i > Z2i+1 condition is obtained through a simi-
lar development as above, but with the use of Equation 25.
It is given in final form by

    ci = c2i+1 + (2^(2m-1) - A2i+1)c2i / 2^(2m-1)    (36)

The Z2i = Z2i+1 condition is also obtained through a
similar development as above, but with the use of Equation
27. It is given in final form by

    ci = c2i+1 + (2^(2m-1) - A2i+1/2)c2i / 2^(2m-1)
              - (A2i/2)c2i+1 / 2^(2m-1)    (37)

Note that the first two terms of Equation 37 are similar to
Equation 36. This reduces the hardware requirement for a
realization.
The color computation unit functions according to the
algorithm presented in Figure 4-12, which follows the
developed relations, where the color triad is represented
by c. The task performed by this unit consists of 1)
receiving a pair of color triads, a pair of opacity values,
and status information, 2) performing a compositing opera-
tion according to the depth comparison, and 3) outputting
the result.
A block diagram of the color computation unit is
depicted in Figure 4-13. Stage 1 performs an initial load
of the incoming pair of color values from the CPN intercon-
nect. Stage 2 is a waiting stage for the status result of
the depth computation unit. Stage 3 routes the color and
CPN Color Computation Unit Algorithm
const
    n = number of bits of initially stored intensity value
    m = number of bits of initially stored opacity value
given
    literal c2i      {m+n-1 bit integer: red, green, or blue}
    literal c2i+1    {m+n-1 bit integer: red, green, or blue}
    literal A2i      {2m-bit integer}
    literal A2i+1    {2m-bit integer}
    literal LESS     {1 bit}
    literal EQUAL    {1 bit}
begin
    if LESS = 1 then    {Z2i < Z2i+1}
        ci = c2i + [(2^(2m-1) - A2i)c2i+1] shr (2m-1)
        if [(2^(2m-1) - A2i)c2i+1 AND 2^(2m-2)] = 2^(2m-2) then
            ci = ci + 1    {roundoff error}
        endif
    else
        if EQUAL = 1 then    {Z2i = Z2i+1}
            ci = c2i+1 + [(2^(2m-1) - (A2i+1 shr 1))c2i] shr (2m-1)
                       - [(A2i shr 1)c2i+1] shr (2m-1)
            if [(2^(2m-1) - (A2i+1 shr 1))c2i AND 2^(2m-2)] = 2^(2m-2) then
                ci = ci + 1    {roundoff error}
            endif
            if [(A2i shr 1)c2i+1 AND 2^(2m-2)] = 2^(2m-2) then
                ci = ci - 1    {roundoff error}
            endif
        else    {Z2i > Z2i+1}
            ci = c2i+1 + [(2^(2m-1) - A2i+1)c2i] shr (2m-1)
            if [(2^(2m-1) - A2i+1)c2i AND 2^(2m-2)] = 2^(2m-2) then
                ci = ci + 1    {roundoff error}
            endif
        endif
    endif
end    {result is an m+n-1 bit integer}
Figure 4-12. The algorithm performed by a general CPN
color computation unit.
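The three branches of Figure 4-12 can be sketched behaviorally in Python; m = 8 and the function name are assumptions for illustration, and colors are premultiplied by opacity as the text prescribes:

```python
def color_unit(c2i, c2i1, a2i, a2i1, less, equal, m=8):
    """General CPN color computation unit (Figure 4-12). Colors are
    premultiplied by opacity; opacities are 2m-bit integers with
    full opacity at 2^(2m-1)."""
    full = 1 << (2 * m - 1)      # double-precision full opacity
    rnd = 1 << (2 * m - 2)       # bit tested for rounding
    s = 2 * m - 1                # divide-by-shift amount
    if less:                     # Z2i < Z2i+1: pixel 2i is in front
        p = (full - a2i) * c2i1
        ci = c2i + (p >> s) + (1 if p & rnd else 0)
    elif equal:                  # Z2i = Z2i+1: blend both pixels
        p1 = (full - (a2i1 >> 1)) * c2i
        p2 = (a2i >> 1) * c2i1
        ci = (c2i1 + (p1 >> s) + (1 if p1 & rnd else 0)
                   - (p2 >> s) - (1 if p2 & rnd else 0))
    else:                        # Z2i > Z2i+1: pixel 2i+1 is in front
        p = (full - a2i1) * c2i
        ci = c2i1 + (p >> s) + (1 if p & rnd else 0)
    return ci
```

An opaque foreground hides the background, a fully transparent foreground (premultiplied color zero) passes the background through, and two identical equal-depth pixels blend to the same color.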
[Block diagram omitted: inputs A2i, A2i+1, LESS, and EQUAL; adder blocks with overflow and carry-in; the TRANSPARENCY signal carries the subtraction result between sheets; registers are clocked at the image update rate.]
Figure 4-13. Block diagram of a general CPN color compu-
tation unit (continued over several sheets).
opacity values according to their depth priority. It also
shifts the routed opacity value right by 1 if the two
depth values are equal. Included is the multiplication of
the c2i+1 color value with the halved A2i opacity value.
Also, the multiplication result is shifted right by 2m-2
bits (division). Stage 4 performs the opacity subtraction
operation and rounds the shifted multiplication result of
stage 3. It also passes the routed color values, the
EQUAL status bit, and the subtraction result (transpar-
ency). Stage 5 performs a multiplication of the subtrac-
tion result (transparency) with a routed color value
and passes the other routed color value. Included is
shifting the multiplication result right by 2m-2 bits
(division). It also routes the rounded value of stage 4 if
the two depth values are equal, but routes all zeroes if
the two depth values are not equal. Stage 6 sums the
shifted multiplication result with the passed color value,
with rounding, and passes the multiplexer result of stage
5. Stage 7 subtracts the multiplexer result from the
addition result of stage 6, which produces the composite
color value.
Specialized Compositing Processing Node
A specialized CPN performs pixel-by-pixel compositing
of opaque objects without antialiasing or special effects.
The hardware organization contains two distinct functional
units: the depth computation unit and the color computation
unit. An opacity computation unit is unnecessary, since
this specialized CPN does not incorporate antialiasing,
semitransparency, transparency, or special effects.
These two computation units are discussed with respect to
finite fixed-point pixel value representation. The algo-
rithm and the conceptual hardware organization of each unit
are presented.
Depth computation unit
The depth computation unit discerns the foreground
pixel from the background pixel or identifies both as
foreground pixels. This unit conceptually functions iden-
tically to the depth computation unit of the general CPN.
Therefore, it functions according to the algorithm pre-
sented in Figure 4-8. As for the general CPN, the depth
value, Z, is represented by a single z-bit integer, where
0 ≤ Z ≤ (2^z - 1). Therefore, the floating-point repre-
sentation of this value is initially truncated or rounded.
The algorithm discussion can be found in the general CPN
section.
The block diagram of the depth computation unit is
identical to the unit of the general CPN. Therefore, its
depiction is shown in Figure 4-9. The hardware discussion
can be found in the general CPN section.
Color computation unit
The color computation unit produces a single composite
color value from the pixel values that are provided as
input. The color value, C, is defined as the true tri-
stimulus color values. This machine stores each primary
color (intensity) value in its image buffers as an n-bit
integer, where 0 ≤ C ≤ 2^n - 1. The color value is passed
as an n-bit integer in the hardware; therefore, no shifting
is necessary before entering a specialized CPN. The unit
functions according to the algorithm presented in Figure 4-
14, which follows the relations developed in Equation 29,
where the color triad is represented by C and each color
is an n-bit integer. The task performed by this unit
consists of 1) receiving a pair of color triad values and
status information, 2) performing a composite operation of
the color triad according to the depth comparison, and 3)
outputting the result.
A block diagram of the color computation unit is
depicted in Figure 4-15. Stage 1 performs an initial load
of the incoming pair of color values from the CPN intercon-
nect. Stage 2 sums both color values and shifts the
result right by one. It is also a waiting stage for
the status result of the depth computation unit. Stage 3
routes a color triad according to its depth priority
utilizing a 4:1 multiplexer with both the LESS and EQUAL
status bits as selectors. The succeeding stages are
CPN Color Computation Unit Algorithm
given
    literal C2i      {n-bit integer}
    literal C2i+1    {n-bit integer}
begin
    if LESS = 1 then    {Z2i < Z2i+1}
        Ci = C2i
    else
        if EQUAL = 1 then    {Z2i = Z2i+1}
            Ci = (C2i + C2i+1) shr 1
        else    {Z2i > Z2i+1}
            Ci = C2i+1
        endif
    endif
end    {result is an n-bit integer}
Figure 4-14. The algorithm performed by a specialized CPN
color computation unit.
[Block diagram omitted: LESS and EQUAL multiplexer select inputs; an adder with overflow and carry-in; registers are clocked at the image update rate.]
Figure 4-15. Block diagram of a specialized CPN color
computation unit (continued over several sheets).
waiting stages that allow a final result to occur simulta-
neously with the general CPNs. The result is a point-
sampled composite color.
Analysis
The compositing network analysis examines two areas:
complexity and performance. Complexity is estimated for
discrete construction, for VLSI fabrication, and for gate-
array construction. Performance is examined with respect
to image space resolution, CPN processing speed, and
compositing network tree depth.
Complexity
The compositing network complexity is a function of
both the CPN complexity and the quantity of CPNs config-
uring a network. CPN complexity is measured utilizing two
metrics: a gate count estimate and an I/O signal pin count
estimate. The gate count estimate is determined by
partitioning the CPN conceptual hardware organization into
individual functional logic blocks, which are off-the-shelf
SSI, MSI, and LSI components. Then, the estimated gate
count of each functional logic block is determined and
totaled to provide a gate count estimate of a CPN. This
technique provides an estimate of expected complexity for
integrated circuit fabrication. It also provides an
estimate of board-level complexity for an off-the-shelf
integrated circuit implementation, which is determined by
totaling the functional logic block package types used.
It should be noted that performance is usually enhanced
for a realization by judicious use of additional gating,
which may alter the estimated gate count.
The partitioning of the design into functional logic
blocks allows the examination of implementation tradeoffs
that are offered between different technologies. It
reduces the organization to a logic format that can be
matched to the logic resources of a target device. The
hardware synthesis can be individually optimized around
each vendor's library and design rule guidelines for a VLSI
or a gate-array realization.
The functional block equivalents are listed in Table 4-
1, the I/O pin counts are listed in Table 4-2, and the gate
equivalents of various standard-size functional logic blocks
are listed in Table 4-3. The functional block equivalents
and I/O pin counts were compiled from the conceptual
hardware organizations. The standard logic device gate
counts were estimated by counting the gates within the
functional block diagrams given in TTL data books [FAI84,
SIG84]. The gate count of the 16-bit multiplier was esti-
mated by considering it to be a full adder tree without
input and output registers [KUC78]. This was done since
the input and output registers are taken into account when
estimating gate equivalences of the staging registers.
These gate counts are expected to be close to an upper
Table 4-1. Functional logic block equivalent of the
general CPN and the specialized CPN.

                                General CPN              Specialized CPN
Logic Function               DCU    OCU    CCU            DCU    CCU
Comparator (z-bit)            1      0      0              1      0
Adder (m-bit)                 0      4      8              0      0
Adder (n-bit)                 0      0      6              0      3
2:1 Multiplexer               z      0    6m+6n-6          z      0
4:1 Multiplexer               0      0    5m+3n-3          0      3n
D Flip-Flop                 9z+2   22m+1  52m+48n-34     9z+2    30n
Multiplier (2m x 2m)          0      1      0              0      0
Multiplier ((m+n-1) x 2m)     0      0      6              0      0

Note: Inverters are not included, since inversion can be
produced through flip-flop output selection.
Table 4-2. Pin requirement for the general CPN, the
specialized CPN, and each CPN computation unit.

                      General CPN                      Specialized CPN
Signal Name     DCU    OCU    CCU      GCPN         DCU    CCU    SCPN
RED2i            0      0    m+n-1    m+n-1          0      n      n
GREEN2i          0      0    m+n-1    m+n-1          0      n      n
BLUE2i           0      0    m+n-1    m+n-1          0      n      n
Z2i              z      0      0        z            z      0      z
ALPHA2i          0     2m     2m       2m            0      0      0
RED2i+1          0      0    m+n-1    m+n-1          0      n      n
GREEN2i+1        0      0    m+n-1    m+n-1          0      n      n
BLUE2i+1         0      0    m+n-1    m+n-1          0      n      n
Z2i+1            z      0      0        z            z      0      z
ALPHA2i+1        0     2m     2m       2m            0      0      0
REDi             0      0    m+n-1    m+n-1          0      n      n
GREENi           0      0    m+n-1    m+n-1          0      n      n
BLUEi            0      0    m+n-1    m+n-1          0      n      n
Zi               z      0      0        z            z      0      z
ALPHAi           0     2m      0       2m            0      0      0
CLK              1      1      1        1            1      1      1
LESS             1      0      1        0            1      1      0
EQUAL            1      0      1        0            1      1      0
POWER            1      1      1        1            1      1      1
GROUND           1      1      1        1            1      1      1

total pins     3z+5   6m+3  9n+13m-4  9n+15m       3z+5   9n+5   9n+3z
                                      +3z-6                      +3
Table 4-3. Gate equivalent and package pin count of
various functional logic blocks.

Logic Function                          Package Pins    Gate Equivalent
4-bit Magnitude Comparator (74F85)           16               31
4-bit Binary Full Adder (74F283)             16               36
Quad 2:1 Multiplexer (74F157)                16               15
Dual 4:1 Multiplexer (74F153)                16               16
Octal D-Type Flip-Flop (74F273)              20               48
16x16 Bit Multiplier (29517A)                64             4320*

* This approximate number excludes the input and output
registers, which would account for about 288 additional
gate equivalent units.
bound. The procedure for estimating the total gate count
for a specific design utilizes the tables after the CPN
parameter values m, n, and z are determined. The total CPN
gate count can be estimated for a different set of
functional blocks by changing Table 4-3 and then performing
the suggested procedure.
A graphics system is defined to exemplify the sug-
gested technique for estimation of CPN complexity. The
graphics device has three criteria: it will be a full-color
system, it will have better than one-percent incremental
change in opacity, and it will have a high-precision depth
resolution. A full-color device requires the tristimulus
colors to provide 16.7 million simultaneous colors, which
is about at the human visual perception limits [ROG85].
Therefore, the required value of n is 8, which provides 8
bits for each primary color: RED, GREEN, and BLUE. The
resolution of opacity that provides better than a one-
percent incremental change requires m to equal 8, since the
opacity range 0 ≤ A ≤ 2^(m-1) then provides more than 100
increments. The value of z is selected as 24, since a depth
resolution of 24 bits is satisfactory for high-end graphics
devices. Relating these selected parameter values of m, n,
and z to Tables 4-1 through 4-3 produces the estimated
complexity of the specified system, which is presented in
Table 4-4.
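The estimation procedure can be sketched in Python for one unit; the per-bit gate figures below are prorated from the package counts of Table 4-3 (an assumption of this sketch), and the functional-block quantities follow the depth computation unit column of Table 4-1:

```python
def dcu_gate_estimate(z):
    """Estimate the general-CPN depth computation unit gate count
    from Table 4-1's block quantities: one z-bit comparator,
    z 2:1 multiplexer bits, and 9z+2 D flip-flops. Per-bit gate
    figures are prorated from Table 4-3's packages: 31 gates per
    4-bit comparator, 15 per quad multiplexer, 48 per octal
    flip-flop (assumes z is a multiple of 4)."""
    comparator = (z // 4) * 31          # z-bit magnitude comparator
    muxes = z * 15 // 4                 # z single 2:1 multiplexer bits
    flipflops = (9 * z + 2) * 48 // 8   # staging registers
    return comparator + muxes + flipflops

# For z = 24 this gives 186 + 90 + 1308 = 1584 gates, within one
# gate of the DCU figure in Table 4-4.
```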
The general CPN has a complexity of about 12 times
that of the specialized CPN. Therefore, a compositing
Table 4-4. Estimated complexity of the general CPN, the
specialized CPN, and each CPN computation unit.

                     General CPN                  Specialized CPN
Type of Count    DCU    OCU    CCU    GCPN      DCU    CCU    SCPN
Pins              77     51    172     258       77     77     147
Gates           1585   5670  33362   40617     1585   1848    3433
Packages          40     32    251     323       40     48      88
  16-Pin          12      8    149     169       12     18      30
  20-Pin          28     23     96     147       28     30      58
  64-Pin           0      1      6       7        0      0       0

Note: The CPN parameters m, n, and z are 8, 8, and 24.
network that can utilize a mixture of both the general and
the specialized CPNs would be the most efficient configura-
tion. Table 4-4 indicates that a board-level CPN imple-
mentation would have a reasonable package count for the
general CPN and a very reasonable package count for the
specialized CPN. This indicates that a CPN implemented
using off-the-shelf parts is within bounds. At the time of
this writing, a 16,000-gate bipolar ECL/TTL array with 100-
ps delays and 292 input/output cells was available [COL88].
Chip densities of HCMOS arrays are as high as 237,000
gates, with 400-ps switching delays [BUR88]. Therefore,
the CPN gate counts and pin counts are within bounds for a
single-chip VLSI implementation or a single-chip gate-array
implementation.
Performance
The CPN performs pixel-by-pixel processing that is
independent of scene complexity. Therefore, its pro-
cessing time is a function of both the image space reso-
lution and the image update rate, which is given by

    Processing Time = 1 / [(Image Update Rate)(Resolution)]    (38)

The image update rate is considered real-time at 10 frames
per second, since images sequenced at this rate appear to
have a smooth visual flow. The image space resolution is
defined as the total number of visible pixels. The CPN
processing time for various image space resolutions is
presented in Table 4-5. As shown, doubling the final
resolution in each dimension while maintaining the same
level of performance requires the speed of the CPNs to
increase by a factor of four.
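Equation 38 is a one-line computation; the sketch below (function name is illustrative) reproduces the per-pixel time budgets of Table 4-5:

```python
def cpn_processing_time_ns(width, height, update_rate_hz=10):
    """Equation 38: the per-pixel time budget, in nanoseconds,
    available to a CPN at a given image space resolution and
    image update rate."""
    return 1.0e9 / (update_rate_hz * width * height)

# 640 x 480 at 10 frames/s leaves about 325.5 ns per pixel,
# while 2048 x 2048 leaves only about 23.8 ns
```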
The CPNs of the compositing network operate in lock-
step. When the tree structure is maximally unbalanced, so
that each node has a left descendant but none has a right
one, the compositing network degenerates into a linear
pipeline. This implies that each stage of the linear
pipeline must perform its function within the CPN pro-
cessing time to successfully composite a collection of
pixels. Therefore, the slowest stage in the pipeline
determines the peak performance of the compositing
network. This is the multiplication stage for the general
CPN, which implies that the multiplier parts determine
the compositing network performance if any CPN is of
the general type. In contrast, the comparator stage
determines the compositing network performance if all CPNs
are of the specialized type.
Consider the example of the constraints section, where
all of the CPN's are of the general type. The 16bit
multiplier, which had been specified, maintains a 45ns
multiply time (including setup time) [ADV85]. This part
has internal input and output registers, therefore the
multiply time can be considered the total pipeline stage
Table 4-5. CPN processing time for various image space
resolutions. The image update rate is 10
frames per second.

Image Space Resolution    CPN Processing Time
(pixels)                  (ns)
640 x 480                 325.5
1280 x 960                 81.4
1280 x 1024                76.3
1600 x 1280                48.8
2048 x 2048                23.8
time. Therefore, the compositing network would have a
maximum bandwidth of 22.2 million results per second. From
Table 4-5, all but the last entry could be supported with a
single compositing network. The last entry could be sup-
ported if two CNs were used, where each CN would be
dedicated to a separate half of the image array while
operating at half the image update rate.
The computational performance of a compositing network
that is configured with all general CPNs is measured by
calculating the total number of additions and
multiplications that every general CPN performs per unit
time. A general CPN performs, as a lower bound (all Z's
not equal), eight additions and four multiplications. As
an upper bound (all Z's equal), a general CPN performs
eleven additions and seven multiplications. Therefore, the
range of computational performance for a compositing net-
work configured with all general CPNs is given by

    8(CPNs)(BW) ≤ additions/s ≤ 11(CPNs)(BW)    (39)
    4(CPNs)(BW) ≤ multiplications/s ≤ 7(CPNs)(BW)    (40)
    12(CPNs)(BW) ≤ operations/s ≤ 18(CPNs)(BW)    (41)

where BW refers to the general CPN bandwidth, or general
CPN results per second, and CPNs refers to the total number
of compositing processing nodes that comprise a compositing
network.
For example, consider an augmentable system archi-
tecture configured with a three-level CN tree with all
general CPNs and a 1600 x 1280 resolution display device
node. It will maintain a CN processing performance of
between 1720 MOPS and 2580 MOPS (million operations per
second). This throughput is comparable to what super-
computers provide, which demonstrates the potential of
distributed simultaneous calculations.
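The worked example above follows directly from Equations 39 through 41; the sketch below (function name is illustrative) assumes a full binary tree, so a three-level CN holds seven general CPNs:

```python
def cn_ops_per_second(levels, width, height, update_rate_hz=10):
    """Equations 39-41: operation-rate bounds for a compositing
    network built entirely from general CPNs arranged as a full
    binary tree."""
    cpns = (1 << levels) - 1                  # 3 levels -> 7 CPNs
    bw = update_rate_hz * width * height      # results per second
    return 12 * cpns * bw, 18 * cpns * bw     # lower, upper bound

low, high = cn_ops_per_second(3, 1600, 1280)
# about 1720 and 2580 million operations per second, matching
# the figures quoted in the text
```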
The performance of a CN configured with all special-
ized CPNs is measured through the total bandwidth of the
CN, which is equal to the bandwidth of a unitary special-
ized CPN. This metric is used since specialized CPNs
primarily route data, as opposed to performing a computa-
tion on the data. If all depth values are equal, each
specialized CPN will perform one addition. This provides
an additions-per-second rate that is computed as the
product of the number of CPNs configured and the CPN
bandwidth. The performance-limiting stage of a specialized
CPN is its comparison stage, but depending on word size it
could be the addition stage instead.
Consider the example presented in the constraints
section, but where all of the CPNs are of the specialized
type. The comparison stage, utilizing the components
specified in Table 4-3, maintains a 42-ns propagation delay
from clock to output. Therefore, the system would have a