















AUGMENTABLE OBJECT-ORIENTED
PARALLEL PROCESSOR ARCHITECTURES
FOR REAL-TIME COMPUTER-GENERATED IMAGERY
















BY

ROSS MORRIS FLEISCHMAN


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN
PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1988



































Copyright 1988

by

Ross Morris Fleischman












ACKNOWLEDGMENTS


I would like to express my appreciation to my advisor

and supervisory committee chairman, Dr. John Staudhammer,

for the guidance and encouragement he provided me on this

project. I am also grateful to the other members of my

supervisory committee, Dr. Keith L. Doty, Dr. Jack R.

Smith, Dr. Jose C. Principe, and Dr. Joseph Duffy, for

their commitment. I also wish to thank the members of the

UF Computer Graphics Research Group for their suggestions.

This dissertation is dedicated to my mother,

Ruth Koegel Fleischman, and to the memory of my father,

Erwin Lewis Fleischman.














TABLE OF CONTENTS


ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

ABSTRACT


CHAPTERS

I     INTRODUCTION
          Problem Definition
          Dissertation Project
          Overview of Dissertation

II    TYPICAL REAL-TIME CGI ARCHITECTURE
          Scene Manager
          Geometric Processor
          Video Processor
          Display Device

III   ALTERNATE REAL-TIME CGI ARCHITECTURE
          System Model
          Underlying Idea
          Supporting Architecture
          Advantages of Approach
          Target Applications

IV    COMPOSITING NETWORK
          Compositing Methodology
          RGBZA Compositing Algorithm
          Network Structure
          Compositing Processing Node
          General Compositing Processing Node
              Depth computation unit
              Opacity computation unit
              Color computation unit
          Specialized Compositing Processing Node
              Depth computation unit
              Color computation unit
          Analysis
              Complexity
              Performance

V     VIDEO GENERATION NODE
          Configuration
          Atmospheric Attenuation Unit
          Pixel Cache
          Double-Buffered Frame Buffer
          Video Shift Registers
          Color Palette
          Digital-to-Analog Converters
          System Controller
          Analysis
              Complexity
              Performance

VI    DISPLAY DEVICE NODE
          Display Device Approaches
          Raster Scan Conversion
          Image Aspect Ratio
          Display Device Performance

VII   OBJECT GENERATION NODE
          Configuration
          Object Generation Node Nucleus
              Double-buffered image buffer
              Intensity multiplication unit
              Nucleus controller
          Object Generation Unit
          Secondary Memory Unit
          Analysis
              Complexity
              Performance

VIII  MAINTENANCE MANAGEMENT NODE
          Configuration
          Operating Functions
          System Boot Operation
          System Normal Operation
          Simulation Debugging
          Analysis

IX    CONCLUSION
          System Simulator
          System Simulation
          Discussion of System Features
          Summary

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH












LIST OF TABLES


4-1  Functional logic block equivalent of the general CPN and the
     specialized CPN

4-2  Pin requirement for the general CPN, the specialized CPN, and
     each CPN computation unit

4-3  Gate equivalent and package pin count of various functional
     logic blocks

4-4  Estimated complexity of the general CPN, the specialized CPN,
     and each CPN computation unit

4-5  CPN processing time for various image space resolutions. The
     image update rate is 10 frames per second












LIST OF FIGURES


2-1  Block diagram of a typical real-time CGI system organization

3-1  Composition of an opaque background 3-D object and an opaque
     foreground 3-D object to produce a composite 3-D scene

3-2  Block diagram of the proposed augmentable real-time CGI system
     organization

4-1  Three distinct types of pixel coverage, with respect to the ALPHA
     value: a) no coverage, b) full coverage, and c) partial coverage.
     The subpixel shape of the pixel with partial coverage is arbitrary
     and is only shown in this manner for conceptual clarity

4-2  Two pixel opacity values are composited. The values were derived
     from coverage information from two different objects. The coverage
     depictions are arbitrary; they are given specific subpixel forms to
     clarify the composite operation. The coverage areas are actually
     averaged across the pixel

4-3  The RGBZA Compositing Algorithm

4-4  A fully balanced three level compositing tree

4-5  The general RGBZA compositing algorithm for a fully balanced tree.
     Note that lower case letters designate the product of intensity
     and opacity

4-6  The specialized RGBZ compositing algorithm of a fully balanced tree

4-7  An iterative building block depiction of a compositing processing
     node (CPN)

4-8  The algorithm performed by a general CPN depth computation unit

4-9  Block diagram of a general CPN depth computation unit

4-10 The algorithm performed by a general CPN opacity computation unit

4-11 Block diagram of a general CPN opacity computation unit

4-12 The algorithm performed by a general CPN color computation unit

4-13 Block diagram of a general CPN color computation unit

4-14 The algorithm performed by a specialized CPN color computation unit

4-15 Block diagram of a specialized CPN color computation unit

5-1  Block diagram of a video generation node with respect to its seven
     modules

5-2  Block diagram of the atmospheric attenuation unit, used to include
     atmospheric effects in a scene

7-1  Block diagram of an object generation node with respect to its
     three modules

7-2  Block diagram of the intensity multiplication unit, used to
     condition the color and opacity values for input to the
     compositing network

9-1  View of rectangle A and rectangle B in object space

9-2  Contents of the simulated frame buffer of OGN1 after scan
     converting rectangle A

9-3  Contents of the simulated frame buffer of OGN2 after scan
     converting rectangle B

9-4  Contents of the simulated frame buffer of the VGN for the first
     system simulation

9-5  Contents of the simulated frame buffer of the VGN for the second
     system simulation

9-6  Two system tree configurations: a) fully balanced system tree and
     b) unbalanced system tree












LIST OF ABBREVIATIONS


2-D -------- Two-Dimensional
3-D -------- Three-Dimensional
AAU -------- Atmospheric Attenuation Unit
CGI -------- Computer-Generated Imagery
CN --------- Compositing Network
CPN -------- Compositing Processing Node
DAC -------- Digital-to-Analog Converter
DDN -------- Display Device Node
I/O -------- Input/Output
IPU -------- Intensity Multiplication Unit
LSH -------- Least Significant Half
MSH -------- Most Significant Half
MMN -------- Maintenance Management Node
MMU -------- Maintenance Management Unit
OGN -------- Object Generation Node
OGNN ------- Object Generation Node Nucleus
OGU -------- Object Generation Unit
RGB -------- RED, GREEN, and BLUE
SMU -------- Secondary Memory Unit
VGN -------- Video Generation Node
VLSI ------- Very Large-Scale Integration












Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

AUGMENTABLE OBJECT-ORIENTED
PARALLEL PROCESSOR ARCHITECTURES
FOR REAL-TIME COMPUTER-GENERATED IMAGERY

By

Ross Morris Fleischman

December 1988

Chairman: Dr. John Staudhammer
Major Department: Electrical Engineering

The hardware architecture of a system for real-time

computer-generated imagery (CGI) is presented that combines

augmentability, modularity, organizational simplicity, and

parallelism. This architecture is a functional, highly-

modular, parallel processor approach that is well suited

for employing VLSI technology. It is a generic structure

that can grow with technological advances and can

accommodate a full range of CGI systems that demand

different performance requirements through one basic set of

modules.

The CGI process contains five fundamental components:

input, modeling, rendering, compositing, and output. This

architectural approach extends specialized hardware into

both the compositing and output components, which allows

the definition of a generic framework for building systems

appropriate for many simulations. The system architecture

performs image synthesis in parallel by partitioning the

image generation task in object space with each partition












assigned to an individual autonomous object generator.

Objects are rendered independently from each other, and

when complete they are automatically composited by the

hardware for display. This process is repeated at a rate

suitable for real-time animation.

The picture representation accepts transparent, semi-

transparent, and fully opaque surfaces. Hardware

facilities perform automatic hidden surface removal with

antialiasing and atmospheric attenuation inclusion. An

approximation for surface intersection is performed, and a

subpixel control mechanism is provided.

The parallel hardware algorithm is classified as a

compute-aggregate-broadcast paradigm: a compute phase

generates objects, an aggregate phase combines the objects

into a scene, and a broadcast phase displays the scene.

Its system framework maintains a synchronous feed-through

structure that allows enlargement by either dynamic or

static additions. System improvement is accommodated by

adding modules that incrementally improve system

performance and scope. This reduces the incorporation

of new systems to the introduction of new

modules, thereby lengthening system

life.




















CHAPTER I
INTRODUCTION


A computer-generated imagery (CGI) system is a

specialized computer system that provides a visual simu-

lation of an artificial environment. Conceptually, a CGI

system consists of a window in multidimensional space through

which an observer may look into a world. The window is

presented by a computer driven display device, while the

world is modeled by a database that the computer can

access. Thus, the visual simulation may be regarded as a

generation of an out-the-window view, in real-time, ac-

cording to the simulated position and orientation of the

observer with respect to the simulated changes of the

artificial environment.

A popular application of real-time computer-generated

imagery visual simulators concerns vehicle training simu-

lation [FIS85, PAN86, SCH81, SCH83b, YAN85, ZYD88]. For

this application, an observer's visual experience is

created by a generated perspective projection of a 3-D

world rendered onto a 2-D display device [BEN83], with

associated special effects. Other simulation tasks [SUG83]

may have variations of the visual simulation requirement as












a function of the world structure, but the real-time per-

formance and rendering problems remain constant.

Real-time operation, which defines a computation

process where the execution time of the computer is syn-

chronized with the physical event time or wall-clock time,

is a major requirement of these systems [FOR83]. Also, the

associated image rendering problems are computationally

demanding. Thus, real-time CGI system organizations typi-

cally mandate custom-designed, special-purpose, high-speed

computers, with general-purpose computers for their control

[SCH81, SCH83b, YAN85].


Problem Definition

Traditional CGI architectures utilize both pipelining

and parallelism technologies to achieve real-time perfor-

mance for image synthesis. The system architectures are

usually highly specialized and constrain the types of

graphics primitives that can be employed [ENG86]. These

special-purpose architectures usually involve a fixed

graphics pipeline that is difficult to enhance for in-

creased performance or for inclusion of additional graphics

primitives.

The realization of major CGI architectural revisions

that exhibit improved performance with substantial hardware

reduction is a subject of research. Innovative CGI archi-

tectures will employ unique organizational structures that

realize algorithmic improvements with respect to imple-












mentation with massive memory, gate arrays, and custom

VLSI. Thus, improvements in both VLSI memory chips [COL87,

TUN87a] and VLSI computational chips [BUR87, COL87, GRI86,

MOK87], plus parallel processing trends [SCH87], are good

indicators that the evolution of CGI system organizational

philosophies will become VLSI-oriented through parallelism.


Dissertation Project

The general research objective is to develop the

guidelines and philosophies of a VLSI-oriented real-time

CGI architecture that combines augmentability, modularity,

organizational simplicity, and parallelism. This proposed

architecture will be a functional, highly-modular, parallel

processor approach that will be suited for employing VLSI

technology. It will be a generic structure that can grow

with technological advances. The investment in such a

system will hypothetically never be discarded; system

improvement can be accommodated by adding modules that

incrementally improve the performance and scope of a sys-

tem. The introduction of new systems will be reduced to

introductions of new modules, thereby resisting system

obsolescence. Therefore, such a system will be continu-

ously expandable and never totally outmoded, thus providing

performance, development, and economic benefits.











Overview of Dissertation

This dissertation is organized into nine chapters.

Chapter I is an introductory chapter that covers objectives

and background about the dissertation subject. Chapter II

describes a typical real-time CGI architecture. Chapter

III presents an overview and introduction of the proposed

augmentable CGI architectures, along with the fundamental

driving idea for the approach. Chapters IV through VIII

describe each major subsystem of the augmentable CGI

architectural approach. Chapter IV describes the compos-

iting network. Chapter V describes the video generation

node. Chapter VI describes the display device node.

Chapter VII describes the object generation node. Chapter

VIII describes the maintenance management node. Chapter IX

is a concluding chapter that contains a discussion of the

system simulation, along with a summary of the dissertation

results.

















CHAPTER II
TYPICAL REAL-TIME CGI ARCHITECTURE


A typical real-time CGI system organization, popular

among vehicle training simulators, is shown in Figure 2-1.

This structure provides a single field-of-view of the

artificial environment, termed a channel. Its organization

consists of a cascade of four major subsystems: the scene

manager, the geometric processor, the video processor, and

the display device [SCH83b, YAN85]. The first three sub-

systems form a specialized computer graphics pipeline for

image rendering. The last subsystem provides a specialized

display for viewing.


Scene Manager

The overall function of the scene manager is to

provide scene elements to the system pipeline that lie in

the observer's field-of-view, within the artificial envi-

ronment, given observer position and orientation. Observer

position and orientation information are provided to the

scene manager by a host simulator [FOR83, SCH83b]. This

information directs dynamic extraction of database scene

elements from mass storage that are loaded into an active

database memory for sorting [PAN86]. These scene elements

represent the observer's panorama and are examined to











Figure 2-1. Block diagram of a typical real-time CGI system organization.
(Diagram labels: Data From Host Simulator; 3-D Data Blocks; Analog Video.)













determine if they are potentially visible within the field-

of-view of the observer [PAN86, YAN85]. Scene elements

satisfying this condition are provided with an appropriate

level-of-detail, while the remainder are culled [PAN86,

YAN85].

The resultant scene elements are sent down the system

pipeline, at the image update rate, to the geometric

processor [YAN85]. Subsystem processing load is continu-

ously monitored by the scene manager to avoid overloading

the processing capacity of the pipeline. Processing load

reduction techniques utilize various dynamic scene content-

control mechanisms that usually degrade image quality

gracefully [SCH83a, YAN85].

The mass storage device contains a database, which

models an artificial environment, that drives the hardware.

Features of a simulated scene (natural and cultural) are

modeled to be of the same size, shape, location, and color

as their real-world counterparts [SCH81, SCH83a]. Database

modeling primitives for the typical CGI system consist of

planar polygons as a major primitive and quadric surfaces

as an option for both man-made curved objects and natural

curvilinear objects [YAN85]. The database also contains

scene element attributes such as color and texture.












Geometric Processor

The geometric processor is a special-purpose pipelined

computer that operates on the scene element output from the

scene manager. These operations usually produce the pro-

jected geometry of the scene with associated geometric

gradient and color gradient parameters. In general, the

fixed coordinates of the scene elements are trans-

formed (via translation, rotation, and scaling) to the

momentary eye-based coordinate system (origin located at

the observer's eye). Within the eye-based coordinate

system, a visibility frustum is defined. Then, a 3-D

clipping algorithm is applied to determine where the 3-D

scene intersects the bounding planes of the visibility

frustum. Scene parts within the visibility frustum are

projected to the image plane with the computed geometric

gradient and color gradient parameters, while the rest are

deleted [BEN83].
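
As a rough software illustration of the per-point work this subsystem
performs, the sketch below (an assumption-laden model, not the
dissertation's pipelined hardware) transforms a point into the eye-based
coordinate system, tests it against a symmetric visibility frustum, and
projects the survivors to the image plane; the matrix, field-of-view, and
resolution parameters are hypothetical.

import numpy as np

def process_point(p_world, world_to_eye, fov_y_deg, aspect, near, far,
                  width, height):
    """Transform a world-space point to eye space, frustum-test it, and
    project it to pixel coordinates (illustrative sketch only)."""
    # Transform into the momentary eye-based coordinate system; the eye
    # is assumed to look down the +z axis.
    x, y, z, _ = world_to_eye @ np.append(p_world, 1.0)

    # Trivial test against the bounding planes of the visibility frustum.
    half_h = z * np.tan(np.radians(fov_y_deg) / 2.0)
    half_w = half_h * aspect
    inside = (near <= z <= far and -half_w <= x <= half_w
              and -half_h <= y <= half_h)
    if not inside:
        return None          # culled; a real pipeline clips partial primitives

    # Perspective projection onto the image plane, then to pixel coordinates.
    sx = (x / half_w * 0.5 + 0.5) * (width - 1)
    sy = (0.5 - y / half_h * 0.5) * (height - 1)
    return sx, sy, z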

Issues relating to color can be found in Rogers

[ROG85]. A matrix multiplier was presented by Meares et

al. [MEA74] and a three-dimensional coordinate transforma-

tion device was presented by Newarikar [NEW82]. Clipping

algorithms, geometric transformations, and perspective

projection can be found in Rogers [ROG85] with an interest-

ing VLSI solution presented by Clark [CLA82]. Clark

discusses a four-component vector, floating point VLSI












processor for accomplishing matrix transformations, clip-

ping, and mapping to output device coordinates.


Video Processor

The video processor is a special-purpose computer that

operates on the resultant projected geometry, geometric

gradient, and color gradient output from the geometric

processor for subsequent display. The video processor

computes each pixel color produced on the picture plane

representing visible portions of scene element surfaces

[SCH83b, YAN85]. Pixel color computation is a function of

various items: geometric gradient parameters (surface

normals), color gradient parameters (scene element native

color), texture maps, atmospheric attenuation (haze color),

scene element illumination (both natural and cultural light

sources), antialiasing techniques, shadows, and shading

techniques. During, before, or after pixel computation,

visible portions of the scene are identified through a

hidden surface removal technique.

This processor also provides timing and control of the

display device, which relate to the video processor organ-

izational philosophy [YAN85]: scan-line-based or frame-

buffer-based. Scan-line-based units perform video pro-

cessing one scan-line at a time in synchronism with each

raster of the display device; one row of the visible

scene's pixel codes is stored. Frame-buffer-based units

perform video processing independent of the raster display;












a complete frame of the visible scene's pixel codes is

stored.

Algorithms and techniques used by the video processor

are well known and can be found in the literature, such as

Rogers [ROG85]. Examples of antialiasing include Booth's

[B0087] presentation concerning the human factors relation

to antialiasing and Carpenter's [CAR84] presentation of an

interesting A-buffer approach. Real-time hardware ap-

proaches to texture mapping can be found in the literature,

such as one approach presented by Fant [FAN86].


Display Device

Display device technology primarily consists of two

variations: calligraphic displays and raster displays

[SCH83a]. The color calligraphic display is characterized

by a continuous layered phosphor surface (RED and GREEN

phosphor layers) used to present a color picture with beam

penetration control (electron beam velocity) of sequen-

tially refreshed straight lines (vectors or strokes) and

points (zero length vectors). The raster display contains

a regular grid of phosphor triads (RED, GREEN, and BLUE)

that are used to present a color picture by modulated

illumination of each phosphor triad point (pixel) with

refresh in a regular pattern. Calligraphic systems main-

tain high quality light points with color limitations,











while raster systems maintain high quality painted faces

without color limitations [YAN85, SCH83a].

















CHAPTER III
ALTERNATE REAL-TIME CGI ARCHITECTURE


This chapter presents an alternate real-time CGI

architectural approach as compared to the traditional

approach briefly presented in Chapter II. This discussion

is meant as an overview to give an understanding of the new

approach before delving into its details. The system model

is presented to illustrate the underlying idea and its

supporting architecture. Following, is a discussion of the

advantages of the new approach and typical applications.


System Model

A system model is presented that exhibits the premise

of this research. First, the underlying idea with respect

to the image generation problem is presented. Second, the

supporting architecture that can realize the underlying

idea is described.


Underlying Idea

This field of architectural research is driven by the

fundamental idea that an individual scene is composed of

separable objects. Therefore, a scene can be produced from

the summation of every object existing in that scene; this

process is called compositing. An example of compositing











is presented in Figure 3-1, which illustrates the composite

of two opaque 3-D objects. As shown, an opaque background

3-D object and an opaque foreground 3-D object are merged

to form a composite 3-D scene. This process indicates that

there is an alternative to producing an entire complex

scene directly. The generation of simpler objects can be

done individually, followed by compositing the simpler

objects to produce an entire complex scene [POR84, STA83].

The approach taken by this research separates the

image compositing process from the image synthesis process

of the image generation problem. The compositing is

reduced to the pixel level, where a procedure is defined

that can blend images through a pixel-by-pixel process.

This compositing process is extended further, at the pixel

level, to include the effect of atmospheric attenuation.

The compositing process performs antialiased blending

of images utilizing a mixing factor. This mixing factor

defines the average opacity of a pixel, which defines the

average pixel reflectivity. It is useful for performing

surface edge antialiasing and for rendering surfaces that

are either transparent, semi-transparent, or opaque. Along

with the mixing factor is the depth value for determining

the depth location of a pixel in space. This information

is used for a comparison to determine whether a pixel, as

compared to another pixel, is in front of, behind, or at

the same distance. Also, the horizontal and vertical












Figure 3-1. Composition of an opaque background 3-D object and an opaque
foreground 3-D object to produce a composite 3-D scene.
(Panels: Background 3-D Object; Foreground 3-D Object; Composite 3-D Scene.)











position is determined by the pixel location in the image

array and the color value is defined as the additive tri-

stimulus colors: RED, GREEN, and BLUE.

The atmospheric attenuation process is performed by a

procedure that calculates attenuation as a function of

distance from the viewpoint with respect to a fading

constant and a horizon color. The fading constant is

adjusted for varying atmospheric conditions such as foggy,

hazy, and murky atmospheres. The horizon color is adjusted

for varying background lighting conditions. This process

is performed following the compositing process, which

results in a pixel value representing the composite tri-

stimulus color value with the effect of atmospheric

attenuation included.
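
The exact attenuation function belongs to the video generation node and
is detailed in Chapter V; purely as an illustration of the idea described
here, the sketch below blends a composited pixel color toward a horizon
color by a visibility factor that decays with depth. The exponential form
and the parameter names are assumptions, not the author's specification.

import math

def attenuate(composited_rgb, depth, fading_constant, horizon_rgb):
    """Blend a composited color toward the horizon color as a function of
    distance from the viewpoint (illustrative sketch, assumed form)."""
    visibility = math.exp(-fading_constant * depth)   # 1.0 at the eye, falls off with depth
    return tuple(visibility * c + (1.0 - visibility) * h
                 for c, h in zip(composited_rgb, horizon_rgb))

# A larger fading constant models a foggier or murkier atmosphere.
print(attenuate((0.8, 0.2, 0.1), depth=200.0, fading_constant=0.005,
                horizon_rgb=(0.7, 0.7, 0.75)))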


Supporting Architecture

The fundamental idea of compositing focuses on allow-

ing a scene to be blended by computer. Hypothetically, the

objects would be visually computer-generated in their

proper position and orientation, then they would be merged

by computer for display. Therefore, instead of having an

individual total database for an artificial environment,

the total database would be partitioned by objects to

provide multiple partial databases. This would allow each

object or group of similar objects in a scene to be

assigned an individual processor, which would have the

advantage of distributing the image generation task, thus













reducing the performance requirement for each processor and

secondary memory unit. Also, the task of merging or

compositing the collection of objects would be performed by

separate processors. As a result, an organization of this

nature would produce multiple data-streams and multiple

instruction-streams, thereby speeding-up both computational

processing and I/O processing. Also, the separable nature

of objects existing in a scene points to the goal of

expandability without affecting other elements of the

system.

The abstract organization of the proposed augmentable

CGI architecture, which logically follows from the above

discussion, is illustrated in Figure 3-2. Major components

of the proposed real-time CGI machine consist of multiple

object generation nodes (OGNs), a compositing network (CN),

a video generation node (VGN), a display device node (DDN),

and a maintenance management node (MMN). This system can

handle opaque, transparent, and semi-transparent images. A

short description of each subsystem is discussed below with

a more detailed discussion of each subsystem presented

subsequently in Chapters IV through VIII.

The object generation node is a VLSI-oriented image

synthesis processor with an optional local secondary

memory, which can execute computer graphics algorithms to

render an assigned object. OGNs operate autonomously and

concurrently with respect to the complete system, but in























Figure 3-2. Block diagram of the proposed augmentable real-time CGI
system organization.
(Diagram labels: Compositing Network; General Communications.)












synchronism with it. They are assigned a partition, termed

an object, of the entire image generation task. An OGN

interfaces to the compositing network through its image

memory, which contains an image space view of the assigned

object. Each element of the image memory contains three

pixel attributes: color, opacity, and depth. The X, Y

coordinates are derived from a pixel's position in the

image memory.

The compositing network is a pixel-by-pixel hardware

compositor, which is an expandable ensemble of intercon-

nected compositing processing nodes, that produces a

computer graphics picture through blending independently

rendered objects into a full image. This network is a

synchronous feed-forward structure. It simultaneously

reads each image memory area, of every OGN, pixel-by-pixel

in a row-by-row manner and writes the composite result to

the VGN pixel-by-pixel.

The video generation node processes composite object

digital image data from the compositing network and then

converts it to analog video data for display. Pixels are

individually received from the compositing network. While

pixels are received, the VGN includes the effect of

atmospheric attenuation to each pixel and then writes the

result to its frame buffer pixel-by-pixel in a row-by-row

manner. Simultaneously, the frame buffer data is read and

converted to analog data for driving the display device













node. Also, the timing of the entire system is derived

from the VGN.

The display device node is a raster scan type monitor.

It receives three primary colors from the VGN: RED, GREEN,

and BLUE. The video timing of the monitor is also control-

led directly from the VGN.

The maintenance management node provides central

control and health assurance of the system. It is an

autonomous processor that provides self-maintenance

operations and system support functions. Included is a

computational unit, a secondary memory unit, and a con-

sole. The MMN communications to and from the nodes of the

system are provided by a general interface to which all

system nodes are connected.


Advantages of Approach

The improvements of this CGI architectural approach as

compared to existing CGI architectural approaches encompass

a reduced complexity of the individual image synthesis

processors, ease of system expansion, ease of including

different graphic primitives, decoupling of the rendering

process from the compositing process, and ease of system

understanding. The reduced complexity of image synthesis

processors is due to three factors: 1) the image genera-

tion task is distributed among many processors (OGNs), 2)

the hidden surface removal with antialiasing is included in












the architectural structure (compositing network), and 3)

the effect of atmospheric attenuation is included in the

architectural structure (VGN). The automatic processing of

2 and 3 above is relegated to the machine structure and

the distributed processing of 1 above is shared among many

image synthesis processors. The result is a simplified

database and a reduction of the amount of geometry required

to render an object. This relieves individual processing

performance requirements of each object generation node,

thus allowing modest processors, e.g., off-the-shelf VLSI,

to perform their image synthesis tasks. Thus, OGNs do not

have to be the same type. The decoupling of the rendering

process from the compositing process is done through

independent memory areas; this enhances system performance

by keeping both processes running in parallel. The system

expansion is eased, since it is done by merely adding

additional CPNs and OGNs. New graphics primitives can be

easily added to the system by additional OGNs that have

special hardware. The basic goal, which may raise load

balance issues, is to add more processors when performance

demands increase. The system understanding is simplified,

since the complex task of merging many objects is done

through the generic machine structure.

An underlying advantage is the trivial pixel level

solution to the intersection problem. A solution to a set

of simultaneous equations is usually done to solve the











intersection of two or more surfaces. This would require a

large amount of calculations. The pixel level approach of

this new architecture reduces the geometry that is

typically involved for solving intersection problems to the

comparison of depth values. The solution is an approxima-

tion, however it is visually correct. Also, the hidden

surface problem is solved in a similar pixel level manner.

Since reflectivity is handled by a mixing factor, the

opaque, transparency, semi-transparency and edge-anti-

aliasing problems associated with computer graphics are

also consolidated to the pixel level. Along with this is

the inclusion, at the pixel level, of the atmospheric

attenuation effects. Thus, a compact pixel-by-pixel method

allows the solution to complex geometrical problems and the

inclusion of complex realistic image effects.

This organization will allow a full range of CGI

systems that demand different performance requirements to

be accommodated through one basic set of modules. It will

be a generic structure that can grow with technological

advances. System improvement is accommodated by adding

modules as opposed to a system redesign that is usually

associated with typical real-time CGI systems. Thus, in

contrast to current fixed performance brute-force real-time

CGI architectures, a variable performance and expandable

real-time CGI architectural approach is presented here.












Target Applications

Target applications of this device will not be

restricted to any specific real-time simulation task, i.e.,

vehicle simulation. A goal of this research is to extend

the architecture for inclusion of other real-time simu-

lation applications, e.g., process and system simulations.

It will be a general purpose framework to simulate many

things, in real-time, with visual output.

In short, the object generation nodes can be thought

of as processing logical objects. Objects can be a single

item or a collection of items. For instance, an object is

an abstraction: a blade of grass, a total field of grass, or a physical

object. Therefore, some or all of the tasks of simulation

can be moved to each object, so long as these tasks are

separable. However, the tasks do not have to be cleanly

separable. For instance, overlaps could exist, which would

be resolved in compositing. This would allow simple

splitting of some objects into two somewhat overlapping

ones without the need of calculating new intersections due

to an artificial division cut. Also, true-color, pseudo-

color, or both can be applied for the visual simulation

with respect to the problem domain.

















CHAPTER IV
COMPOSITING NETWORK


The compositing network (CN) is a hardware compositor

that produces a computer graphics picture through blending

heterogeneously rendered objects into a full image. These

separately rendered objects are reductions of a total

modeled environment into pieces that rely on compositing

techniques for accumulation. Each object is produced by an

individual object generation node (OGN), which is in itself

a computer image generation device. The network configura-

tion is in the form of a synchronous feed-forward tree that

is connected to a multiplicity of object generation nodes

(OGNs) for input and to a single video generation node

(VGN) for output. Therefore, many object images are com-

posited simultaneously. The composite of additional object

images is done through enlarging the compositing network

and through including additional object generation nodes.

There is no fixed configuration, but rather a general

framework to configure a compositing network utilizing a

collection of basic building blocks, called compositing

processing nodes (CPNs).

The compositing network operation requires the simul-

taneous input of all instances of pixels with the same X











and Y cartesian coordinate per unit time. Each instance of

a pixel is part of an individual object rendered by an

object generation node. These pixel values flow in a

synchronous feed-forward manner through the compositing

network, while being merged pixel-by-pixel at particular

stages. The last stage of the network provides a single

surviving pixel as output, which has an implied X coordi-

nate and Y coordinate.

To summarize, the compositing process is carried out

pixel-by-pixel through three steps: 1) every pixel value is

simultaneously read from each image array of the object

generation nodes at a specified X coordinate and Y coordi-

nate, 2) the compositing process operates on the collection

of pixel values read from the OGNs to produce a single

composite pixel resultant, and 3) the single composite

pixel resultant is written to the resident image array of

the video generation node at the same X coordinate and Y

coordinate used for the read operation. This process is

repeated at every X coordinate and Y coordinate of the

image array to produce every composite pixel value of the

image array within the video generation node.

The entire compositing network action, for each

collection of pixels, can be generally characterized by


Pc = oper(P1, P2, P3, ..., Pi) (1)

at every pixel with identical X, Y cartesian coordinates in











the i image arrays. The value Pc represents a single sur-

viving pixel after compositing. The "oper" operator is a

general operation that symbolizes the compositing action

due to the entire compositing network. Specifics of the

compositing algorithm that the compositing network realizes

are developed and described in the following sections.
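
In software terms, Equation 1 corresponds to the loop sketched below,
with a generic oper callable standing in for the compositing action
developed in the following sections. The sequential loop is only a model;
the network itself operates on all image arrays simultaneously.

def composite_network(image_arrays, oper):
    """Apply a compositing operator to corresponding pixels of several
    image arrays, one array per object generation node (model only)."""
    height = len(image_arrays[0])
    width = len(image_arrays[0][0])
    vgn_image = [[None] * width for _ in range(height)]
    for y in range(height):              # read the OGN arrays row by row
        for x in range(width):
            pixels = [image[y][x] for image in image_arrays]
            vgn_image[y][x] = oper(pixels)   # Pc = oper(P1, P2, ..., Pi)
    return vgn_image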


Compositing Methodology

Guidelines for the generation of 2-D pictures and

arithmetic for their 2-D compositing were discussed by

Porter and Duff [POR84]. Their compositing method produced

antialiased composite images through a pixel-by-pixel

process. The antialiased composite or antialiased blending

of images requires information about the subpixel overlap

and object opacity. This information, as discussed by

Porter and Duff [POR84], is given by adding a mixing factor

to the color channels, which is called an ALPHA value.

Therefore, a pixel is defined by four independent varia-

bles: RED, GREEN, BLUE, and ALPHA. Thus, the interplay of

alpha values must be considered for compositing objects to

accumulate a final image [POR84].

The ALPHA portion of an object representation provides

two pieces of information for compositing: 1) the single

ALPHA value represents the extent of coverage of an object

within a pixel and 2) the collection of ALPHA values repre-

senting an object provides coverage information that desig-

nates the shape of an object within the image space. The












pixel coverage information provides a mixing factor to

control linear interpolation of foreground and background

colors at every pixel. The object shape information, which

is termed a matte, identifies the object from what is not

the object within an isolated image array.

The ALPHA value represents the opacity of a pixel,

which is a fractional value that ranges from zero to one.

The antithesis of ALPHA, which is the transparency of a

pixel, is defined as (1-ALPHA). Therefore, the transpar-

ency value also ranges from zero to one. Figure 4-1 illus-

trates this coverage information, pictorially, for three

distinct coverage types of opacity: no coverage, full

coverage, and partial coverage. As shown, no coverage is

indicated by an ALPHA value of zero, full coverage is

indicated by an ALPHA value of one, and partial coverage is

indicated by a fractional ALPHA value between zero and one

[POR84].

The pixel coverage information consists of an average

value of opacity. Therefore, the subpixel distribution of

opacity is not known or, in other words, the subpixel shape

is not known. Thus, some pixel coverage information is

missing, but the ALPHA value information is still useful

for rendering transparent objects, semi-transparent ob-

jects, and performing non-commutative object edge anti-

aliasing for rendering opaque, semi-transparent, or trans-

parent objects. Also, since the ALPHA value represents















Figure 4-1. Three distinct types of pixel coverage, with respect to the
ALPHA value: a) no coverage (ALPHA = 0), b) full coverage (ALPHA = 1),
and c) partial coverage (0 < ALPHA < 1; arbitrary depiction). The
subpixel shape of the pixel with partial coverage is arbitrary and is
only shown in this manner for conceptual clarity.












the average coverage of an object within a pixel, the pixel

color is determined by the product of ALPHA and the

object's true color.

Porter and Duff [POR84] discussed many operators for

the compositing of two-dimensional images. The operator

of interest to this research is the "over" operator. This

operator computes a composite pixel color due to one pixel

in front of another. The composite pixel color is given by


cc = cf + (1 - Af)cb (2)


and the composite opacity


Ac = Af + (1 - Af)Ab (3)


where c denotes one of three tri-stimulus color values, A

denotes the opacity value ALPHA, the subscript c denotes

the composite, subscript f denotes the foreground, and

subscript b denotes the background. Also, the true fore-

ground color Cf is multiplied by the foreground opacity Af

to produce cf and the true background color Cb is multi-

plied by the background opacity Ab to produce cb. This is

done to keep the computation of cc similar to the computa-

tion of Ac. The derivation of the "over" operator is

presented by Porter and Duff [POR84]. A similar develop-

ment of "over," adjusted for this research, is presented

in the following section of this chapter.
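
A minimal software sketch of the "over" operator of Equations 2 and 3,
assuming each pixel is a (red, green, blue, ALPHA) tuple whose color
channels have already been multiplied by the pixel's opacity:

def over(foreground, background):
    """Porter and Duff "over" for premultiplied pixels (Equations 2 and 3)."""
    k = 1.0 - foreground[3]              # transparency of the foreground pixel
    return tuple(f + k * b for f, b in zip(foreground, background))

# Example: a half-covered red foreground over an opaque blue background.
print(over((0.5, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)))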












Porter and Duff's approach has a drawback of requiring

the priority of images to be manually entered. Therefore,

Duff [DUF85] introduced the depth variable, Z, as an ex-

tension to the earlier image composition algorithm to

correct this drawback. The approach extended each pixel

in the image space to contain five independent variables:

RED, GREEN, BLUE, ALPHA, and Z. From this representation

an RGBAZ algorithm was developed that combined the "over"

operator of Porter and Duff [POR84] with a Z-buffer algo-

rithm. Before discussing Duff's [DUF85] approach, the Z-

buffer algorithm is presented and discussed.

A Z-buffer is a depth buffer that stores the Z car-

tesian coordinate, which is also termed the depth coordi-

nate, of every visible pixel in image space. It is used in

conjunction with a frame buffer, which is an attribute

buffer that stores the intensity of each pixel in image

space. A Z-buffer algorithm is a hidden-surface algorithm

that operates on the RGB intensity information and the

depth coordinate, Z, stored at each pixel in image space.

The Z-buffer algorithm is described by Catmull [CAT74]. It

functions by comparing the depth value of a new pixel,

which is to be written into the frame buffer, with the

depth value of the pixel that is currently stored in the Z-

buffer. If the comparison indicates that the new pixel is

closer to the viewpoint than the current pixel, then the

new pixel's intensity value is written into the frame












buffer and its depth value is written into the Z-buffer

[ROG85]. If the comparison does not indicate the new pixel

is closer to the viewpoint than the current pixel, then the

current pixel values remain in the frame buffer and in the

Z-buffer.

To recapitulate, the Z-buffer algorithm is a search

over X, Y in 3-D space for the value of Z(X,Y) that is

closest to the viewpoint in image space. The Z-buffer

operation can be defined as RGBZ = zmin(L,M), where L is an

image array, M is an image array, and RGBZ is the survivor

pixel from either M or L according to the algorithm. The

collection of resultant RGBZ survivors over X, Y produces

an image space that is a composite image of the rendered

objects. This composite operation [DUF85] is more com-

pactly characterized as


Zc = zmin(ZL, ZM) (4)

RGBc = RGBL, if ZL = zmin, else RGBM (5)


at every pixel in the two image arrays. The subscript c

denotes the composite. Two properties of the "zmin" oper-

ator are that it is both commutative and associative.
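
A corresponding sketch of the "zmin" composite of Equations 4 and 5, with
each pixel represented as a dictionary holding its color and depth (an
illustrative representation, not the hardware word format):

def zmin_composite(pixel_l, pixel_m):
    """Z-buffer composite (Equations 4 and 5): the pixel nearer the
    viewpoint survives with its color and depth."""
    return pixel_l if pixel_l["z"] <= pixel_m["z"] else pixel_m

# Because zmin is commutative and associative, image arrays may be
# composited two at a time in any order.
front = {"rgb": (1.0, 0.0, 0.0), "z": 3.0}
back = {"rgb": (0.0, 1.0, 0.0), "z": 7.0}
print(zmin_composite(front, back))       # the pixel with z = 3.0 survives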

The Z-buffer algorithm allows pixels to be written

into the frame buffer in arbitrary order. Therefore, the

computation time associated with a depth sort operation is

eliminated [ROG85]. Unfortunately, the algorithm has

inherent aliasing problems due to its point sampling nature












[DUF85]. It also fails for rendering transparent objects,

but it is fast and simple [CAR84].

Duff's approach utilized the depth value at each of

the four corners of a pixel to compute a fraction called

BETA. This value is computed through linearly interpolat-

ing the four depth corner values. The composite color is

computed by


cc = B(cf over cb) + (1 - B)(cb over cf) (6)

and

Zc = min(Zf, Zb) (7)


where c denotes one of three tri-stimulus color values

multiplied by its respective opacity value, Z denotes the

depth value, B denotes Duff's BETA value, the subscript c

denotes the composite, subscript f denotes the foreground,

and subscript b denotes the background. This approach

combines the pixels by area sampling. A drawback of this

3-D compositing approach and of the previously discussed 2-

D compositing approach is that they do not apply when the

edges of more than one object are projected onto a single

pixel. The compositing algorithm developed in this re-

search, which is discussed in the following section of this

chapter, addresses this problem.
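
For comparison, Duff's published blend of Equations 6 and 7 can be
sketched as follows, assuming the BETA fraction has already been
interpolated from the four corner depths and that pixels carry
premultiplied colors; this is a reading of the cited formulation, not
part of the architecture proposed here.

def duff_blend(fg, bg, beta):
    """Duff's RGBAZ composite (Equations 6 and 7).  fg and bg are
    (r, g, b, A, Z) tuples with premultiplied color channels; beta is the
    fraction of the pixel in which fg is in front (computed elsewhere)."""
    def over(f, b):
        k = 1.0 - f[3]
        return tuple(fc + k * bc for fc, bc in zip(f[:4], b[:4]))
    fg_over_bg = over(fg, bg)
    bg_over_fg = over(bg, fg)
    rgba = tuple(beta * p + (1.0 - beta) * q
                 for p, q in zip(fg_over_bg, bg_over_fg))
    return rgba + (min(fg[4], bg[4]),)   # Zc = min(Zf, Zb)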

Another interesting approach to compositing was dis-

cussed by Carpenter [CAR84], with the introduction of the

A-buffer. An A-buffer is an antialiased hidden surface












mechanism that is an enhancement to the Z-buffer through

inclusion of a mask that contains subpixel coverage infor-

mation. Therefore, the mask provides more pixel coverage

information than the ALPHA value, but it is more memory

intensive.

The compositing techniques specified in the reviewed

literature had specific idealized objectives, which are

listed as follows:


1. Must not induce spatial aliasing in the image,

which implies that soft edges of objects must be

respected in computing the final image [POR84].


2. Provide facilities for arbitrary dissolves, fades,

darkening, and attenuation of objects [POR84].


3. Exploit the full associativity of the compositing

process, which implies accumulation of several

foreground objects into an aggregate foreground can

be inspected over different backgrounds [POR84].


4. Allow various object representations: transparent,

semi-transparent, and opaque [POR84].


5. Visibility technique must support all conceivable

geometric modeling primitives: polygons, quadrics,

patches, fractals, and so on [CAR84, DUF85].












6. Must handle opaque intersecting surfaces and trans-

parent intersecting surfaces [CAR84].


7. Must handle hidden surface removal [CAR84, DUF85].


The proposed new architectural approach attempts to

satisfy these compositing technique objectives. Unfortu-

nately, due to trade-offs taken to keep the approach within

hardware limits, some of these objectives are not entirely

met. The constraints and trade-offs associated with the

approach addressed through this research, which concern the

stated idealized objectives, are discussed in later sections

and chapters.


RGBAZ Compositing Algorithm

The proposed compositing method is developed to allow

any number of images to be composited with hidden surface

removal and antialiasing. The compositing algorithm real-

ized by the compositing network is based on Porter and

Duff's [POR84] "over" operator, but is modified through the

introduction of the depth value. This rendition modifies

the "over" operator through incorporating the "zmin" oper-

ator for identifying the foreground pixel from the back-

ground pixel. The algorithm is labeled an RGBAZ algorithm,

as was Duff's [DUF85], but differs from that formulation.

It is developed and described in the subsequent paragraphs.

Consider opacity values, A1 and A2, belonging to a

pair of semi-transparent pixels, P1 and P2, that have












identical X and Y coordinates, but differ in the Z coordi-

nate where the Z1 value is less than that of the Z2

value. The composite Z value for this situation, in ac-

cordance with the Z-buffer algorithm utilizing Equation 4,

is given by


Zc = Z1 (8)

where Z denotes the depth value, and subscript c identifies

the composite resultant.

The depth comparison identifies pixel P1 as being

closer to the viewpoint than pixel P2. Therefore, pixel P1

is identified as the foreground pixel and pixel P2 is

identified as the background pixel. The opacity represen-

tation designates the opaqueness of pixel P1 as A1 and its

clearness as (1 - A1). Likewise, the opaqueness of pixel

P2 is A2 and its clearness is (1 - A2). This implies that
the composite opacity, according to the "over" operator, of

the two pixels is given by


Ac = A1 + (1 - A1)A2 (9)


where A denotes the opacity value. An example of this

situation is depicted in Figure 4-2.

The composite color is calculated by realizing that

pixel P1 allows (1 - A1) of its background light through

and reflects A1 of its color. Likewise, pixel P2 allows (1

- A2) of its background light through and reflects A2 of


















Figure 4-2. Two pixel opacity values are composited. The values were
derived from coverage information from two different objects. The
coverage depictions are arbitrary; they are given specific subpixel forms
to clarify the composite operation. The coverage areas are actually
averaged across the pixel.
(Panels: Background Object, partial pixel coverage A2; Foreground Object,
partial pixel coverage A1; Composited Objects, shared partial pixel
coverage with uncovered fraction (1-A1)(1-A2).)













its color. Therefore, P1 reflects A1 of its color and lets

(1 - A1) of P2's reflected color through. This implies

that the composite color, according to the "over" operator,

of the two pixels is given by


cc = A1C1 + (1 - A1)A2C2 (10)


where C represents the tri-stimulus colors: RED, GREEN, and

BLUE. The upper case C is used to designate the true

color, which occurs when the pixel is 100% overlapped by

the object. The lower case color c depicts the true color

value multiplied by its opacity value, which is given by


cc = AcCc (11)


A similar argument follows, as presented above, when

Z2 is less than Z1. For this condition, substitute pixel

subscript identifiers "2" for "1" and "1" for "2" in the

development presented above. The composite depth, opacity,

and color would then be given by


Zc = Z2 (12)

Ac = A2 + (1 - A2)A1 (13)

cc = A2C2 + (1 - A2)A1C1 (14)

respectively.

The incorporation of the "zmin" operator with the

"over" operator requires an additional development for the

effects of two pixels with equal depth values. This condi-












tion implies that two objects are occupying the same voxel

in space. Therefore, both objects contribute to the in-

tensity of the resultant pixel, but the intensity contribu-

tion due to each of these objects is nebulous. This

condition can be understood by considering the quantization

error due to the use of finite depth values. The opacity

contributions from the input pixels may be due to pixel

overlap. But, the foreground and background object can not

be discerned, since the difference in depth is within the

limits of the quantization error.

The development of this condition will consider the

pixel as a small cubic volume, instead of a small surface.

This model will allow the edges of two objects to be

projected into its space. The viewable or reflective front

surface of this small cubic volume is only of interest for

determination of the opacity and color values.

The composite opacity is found by first considering

the condition, Z1 < Z2, where (Z2 Z1) is within the

quantization error. The composite opacity would then be

equal to Equation 9. Now, consider the condition, Z1 >

Z2, where (Z1 Z2) is within the quantization error. The

composite opacity would then be equal to Equation 13. The

probability of either of these conditions occurring within

the small cubic volume is equal. Therefore, the composite

opacity and color values are computed through a simple

average of the two possible conditions, which are given by













Ac = [(A1 + (1 - A1)A2) + (A2 + (1 - A2)A1)]/2

   = A1 + A2 - A1A2                                        (15)


and


cc = [(A1C1 + (1 - A1)A2C2) + (A2C2 + (1 - A2)A1C1)]/2

   = A1C1 + A2C2 - (C1 + C2)A1A2/2                         (16)


Also, the composite depth is given by


Zc = Z1 = Z2                                               (17)


It is interesting to note that Equations 9, 13, and 15 are

equal.

Boundary analysis of Equation 16 is performed to check

its validity, which is presented as follows:


Cc = A1C1, if A2 = 0 (18)

Cc = A2C2, if A1 = 0 (19)

Cc = (C1 + C2)/2, if A1 = A2 = 1 (20)

The first two boundary examples demonstrate a reduction to

a single pixel case, which is to be expected. The last

boundary example reduces to an average color that does not

become amplified, which is also to be expected. A pseudo-

code outline of this RGBZA compositing algorithm with

respect to a pair of image arrays is given in Figure 4-3.

As shown, each pixel of the two image arrays is compos-

ited to produce a composite image array for display. The



















RGBZA Compositing Algorithm
given
    An array RGBZA1[x,y]
    An array RGBZA2[x,y]
    An array rgbZAc[x,y]
begin
    for each element (x,y) of array rgbZAc[x,y] do
        Ac = A1 + A2 - A1A2
        if Z1 < Z2 then
            rc = A1R1 + (1-A1)A2R2
            gc = A1G1 + (1-A1)A2G2
            bc = A1B1 + (1-A1)A2B2
            Zc = Z1
        endif
        if Z1 > Z2 then
            rc = A2R2 + (1-A2)A1R1
            gc = A2G2 + (1-A2)A1G1
            bc = A2B2 + (1-A2)A1B1
            Zc = Z2
        endif
        if Z1 = Z2 then
            rc = A1R1 + A2R2 - (R1 + R2)A1A2/2
            gc = A1G1 + A2G2 - (G1 + G2)A1A2/2
            bc = A1B1 + A2B2 - (B1 + B2)A1A2/2
            Zc = Z1
        endif
    endfor
    Display rgbc array of the rgbZAc array
end


Figure 4-3. The RGBZA Compositing Algorithm.












composite opacity and depth values are not needed for

display; they are included so that the resultant image

array can be composited with other image arrays. This

subject is discussed in the following section.
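For the same illustrative assumptions as the sketch above, applying
composite_pair to corresponding elements of two image arrays produces the
composite image array rgbZAc of Figure 4-3; the array traversal itself is
a straightforward double loop over the (x,y) coordinates.

    /* Hypothetical driver over two W x H image arrays (names assumed). */
    enum { W = 640, H = 480 };

    static void composite_arrays(const PixelRGBAZ a1[H][W],
                                 const PixelRGBAZ a2[H][W],
                                 PixelComposite out[H][W])
    {
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                out[y][x] = composite_pair(a1[y][x], a2[y][x]);
    }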


Network Structure

The compositing operation described in the previous

section dealt with compositing two pixels, each produced

from two separate objects, to form a single composite pixel

as a result. A method of compositing many pixels, where

each pixel is produced from many objects, would be to

create a hierarchy of compositing operations. At the

bottom of the hierarchy, compositing operations would

simultaneously accept pixel values from separate image

arrays as input. The multiple outputs of the bottom level

in the hierarchy would be used as inputs to the next level

in the hierarchy. This process would continue until an

individual output is produced at the top of the hierarchy

of compositing operations. The result would be a composite

pixel value of every pixel value used as input to the

lowest level of the hierarchy. This composite pixel value

would then be written into an image array at the same X, Y

coordinate that was used for the input pixels. The same

procedure would be done for all succeeding composite pixel

value outputs of the hierarchy, which would produce a

complete composite image array of many objects.
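A software model of this hierarchy, again under illustrative assumptions
(floating-point values, and colors carried premultiplied by their
opacities, a form developed later in this chapter), reduces N input pixels
for one X, Y coordinate level by level:

    typedef struct { double r, g, b, A; unsigned Z; } Pixel; /* premultiplied */

    static Pixel cpn(Pixel a, Pixel b)          /* one compositing operation */
    {
        Pixel c;
        c.A = a.A + b.A - a.A * b.A;
        if (a.Z < b.Z)      { c.r = a.r + (1-a.A)*b.r; c.g = a.g + (1-a.A)*b.g;
                              c.b = a.b + (1-a.A)*b.b; c.Z = a.Z; }
        else if (a.Z > b.Z) { c.r = b.r + (1-b.A)*a.r; c.g = b.g + (1-b.A)*a.g;
                              c.b = b.b + (1-b.A)*a.b; c.Z = b.Z; }
        else                { c.r = a.r + b.r - (a.A*b.r + b.A*a.r)/2;
                              c.g = a.g + b.g - (a.A*b.g + b.A*a.g)/2;
                              c.b = a.b + b.b - (a.A*b.b + b.A*a.b)/2;
                              c.Z = a.Z; }
        return c;
    }

    /* Reduce n pixels (n a power of two) through log2(n) levels of pairing. */
    static Pixel composite_tree(Pixel p[], int n)
    {
        while (n > 1) {
            for (int i = 0; i < n / 2; i++)
                p[i] = cpn(p[2*i], p[2*i + 1]);   /* one level of the tree */
            n /= 2;
        }
        return p[0];
    }

The hardware described next performs these levels simultaneously rather
than sequentially.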












A hardware synthesis of the hypothetical hierarchy of

compositing operations is what defines the compositing

network. It is created through interconnecting an ensemble

of fundamental hardware compositing units that realize the

compositing operation. These units are termed compositing

processing nodes (CPNs). The defined function of a CPN is

to produce a single composite pixel value from a pair of

input pixel values. It maintains a 2-to-1 configuration,

where the output of one CPN can supply an input to a suc-

ceeding CPN. This is an iterative property, which is the

property required to realize the hierarchy of compositing

operations in hardware. The structure of the entire

compositing network is driven by the structure of an

individual CPN. Therefore, the interconnection of CPNs

forms a binary tree, which realizes the compositing net-

work. A depiction of this structure is shown in Figure 4-

4, which illustrates a fully balanced 3 level compositing

tree that has 7 CPNs and 8 connections for deeper CPN

levels or for terminal OGN connection. The general struc-

ture for a fully balanced tree with N terminal connections

maintains n = log2N levels with N-1 CPNs for the network

configuration. However, the compositing network does not

have to be a fully balanced tree. It can be unbalanced as

long as all of the OGNs are connected at the same level

within the system tree.











[Figure 4-4 depicts a three level binary tree of CPNs: the Level 1 root CPN
feeds the video generation node (VGN), two Level 2 CPNs feed the root, and
four Level 3 CPNs provide eight terminal connections (1 through 8) for
deeper levels of compositing processing nodes (CPNs) or for eight object
generation nodes (OGNs).]

Figure 4-4.  A fully balanced three level compositing tree.












The compositing network can be modeled as a pipelined

machine, where each level in the system's binary tree

structure is a pipeline stage. At every machine cycle, a

collection of pixels, with identical X, Y coordinates, are

routed to the nodes within the lowest level of the system

tree. The machine operation proceeds in a synchronous

feed-through manner for every machine cycle, where a col-

lection of pixels at a particular level in the system tree

is computed to produce a collection of composite pixel

values as a result. These results are routed, before the

next machine cycle, to the inputs of a succeeding level in

the system tree. Therefore, each succeeding level in the

system tree produces half the number of pixel values (fully

balanced tree) than were provided as input. The output of

this machine provides a single composite pixel value as a

result, which is produced from the highest level of the

system tree.

This structure is classified as a synchronous feed-

forward configuration, where CPN operation is synchronous

with the image update rate. Therefore, the machine cycle-

time is a function of both the image space resolution and

the image update rate. The pipeline is considered full

when every CPN in the system tree has a valid input.

During a full pipeline state, each level of the tree is

processing a set of pixels that have identical X, Y coordi-

nates. Therefore, the start-up time through a tree will be











a function of the tree depth and the number of pipeline

stages within an individual CPN.

The effect of the feed-through structure, of the

compositing network, has to be considered regarding the

RGBZA algorithm. This structure has a cumulative effect

that directly influences the compositing operation. There-

fore, the RGBZA algorithm has to be adjusted to accommodate

this fact.

The compositing network is a subtree of the system

tree and the OGNs are terminal nodes of the system tree

that provide input to the compositing network. Now, con-

sider the evaluation of the composite opacity value from a

fully balanced system tree with i CPNs and i+1 OGNs, where

the total number of tree nodes is 2i+1. The CPNs are

located at binary tree positions 1 through i. The OGNs are

located at binary tree positions i+1 through 2i+1. Note

that a fully balanced system tree is used to simplify this

development. However, the system tree can be unbalanced to

accommodate a collection of OGNs that are not a binary

multiple. The criterion is for all of the OGNs to exist at

the same level within the system tree. This subject is

discussed further in the system features discussion of the

conclusion. For the fully balanced system tree, the

composite opacity defined at the first CPN or root node,

1, to the last CPN, i, for all cases, is given as follows












A1 = A2 + A3 - A2A3

A2 = A4 + A5 - A4A5
A3 = A6 + A7 - A6A7

    .
    .
    .

Ai = A2i + A2i+1 - A2iA2i+1                                (21)

where the subscript identifies the tree node number. The

result is a recursive relation for the evaluation of the

opacity value. The composite color value and depth value

are defined through the use of a similar development for

each of the three depth value comparisons. The condition

Z2i < Z2i+1 gives

Zi = Z2i                                                   (22)

ci = c2i + (1 - A2i)c2i+1                                  (23)

and the condition Z2i > Z2i+1 gives


Zi = Z2i+1                                                 (24)

ci = c2i+1 + (1 - A2i+1)c2i                                (25)

and the condition Z2i = Z2i+1 gives


Zi = Z2i = Z2i+1                                           (26)

ci = c2i + c2i+1 - (A2ic2i+1 + A2i+1c2i)/2                 (27)

where the lower case color "c" depicts the true color value

multiplied by its opacity value. This form of the compos-

iting functions requires each color value entering the











compositing network to be multiplied by its respective

opacity. Also, each composite color value exiting the

network will be the composite color multiplied by its

composite opacity.
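An OGN-side sketch of this pre-weighting, under the same illustrative
floating-point assumptions as the earlier sketches, is shown below; the
type names and the routine name enter_tree are assumptions. The tree then
operates entirely on the opacity-weighted values, and the root output
written to the VGN remains opacity-weighted, as in Equation 11.

    typedef struct { double R, G, B, A; unsigned Z; } TruePixel;     /* C and A */
    typedef struct { double r, g, b, A; unsigned Z; } WeightedPixel; /* c = A*C */

    /* Performed once per pixel by an OGN before the pixel enters the CN tree. */
    static WeightedPixel enter_tree(TruePixel p)
    {
        WeightedPixel q = { p.A * p.R, p.A * p.G, p.A * p.B, p.A, p.Z };
        return q;
    }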

The recursive relations are handled by iterative

techniques utilizing CPNs. The RGBZA compositing algorithm

that each CPN should execute is depicted in Figure 4-5.

This algorithm, which is termed the general RGBZA compos-

iting algorithm, includes the image arrays and the multi-

plication operation of the OGNs. It also includes the

image array of the VGN and a reference to the DDN. The

second loop within the main loop is the actual network

algorithm. This task inputs a collection of pixel values

for processing, according to their respective depth re-

lationship, to produce a single surviving composite pixel

value for output. The loop counts down in order to obviate

the start-up time that would be associated with a hardware

pipeline.

The operation of the entire compositing network

is reduced to a special case when only opaque objects are

involved without the inclusion of special effects (e.g.,

dissolves, darkening, antialiasing, etc.). This is given

as follows


     { Z2i,    if Z2i < Z2i+1
Zi = { Z2i+1,  if Z2i > Z2i+1                              (28)
     { Z2i,    if Z2i = Z2i+1













General RGBZA Compositing Algorithm
const
    n = total number of CPNs in tree
given
    Array rgbZAi, where i=1,2,3,...,2n+1        {node registers}
    Array RGBZAj,x,y, where j=1,2,3,...,n+1     {OGN memory}
    Array rgbx,y                                {VGN memory}
begin
    for each element (x,y) of RGBZAj,x,y and rgbx,y do
        for i=n+1 to 2n+1 do                    {load OGN output registers}
            j = i - n
            ri = Aj,x,y Rj,x,y
            gi = Aj,x,y Gj,x,y
            bi = Aj,x,y Bj,x,y
            Ai = Aj,x,y
            Zi = Zj,x,y
        endfor                                  {end load OGN output registers}
        for i=n downto 1 do                     {CPN compositing operation}
            Ai = A2i + A2i+1 - A2i(A2i+1)
            if Z2i < Z2i+1 then
                ri = r2i + (1-A2i)r2i+1
                gi = g2i + (1-A2i)g2i+1
                bi = b2i + (1-A2i)b2i+1
                Zi = Z2i
            endif
            if Z2i > Z2i+1 then
                ri = r2i+1 + (1-A2i+1)r2i
                gi = g2i+1 + (1-A2i+1)g2i
                bi = b2i+1 + (1-A2i+1)b2i
                Zi = Z2i+1
            endif
            if Z2i = Z2i+1 then
                ri = r2i + r2i+1 - (A2ir2i+1 + A2i+1r2i)/2
                gi = g2i + g2i+1 - (A2ig2i+1 + A2i+1g2i)/2
                bi = b2i + b2i+1 - (A2ib2i+1 + A2i+1b2i)/2
                Zi = Z2i
            endif
        endfor                                  {end CPN compositing operation}
        rgbx,y = rgb1                           {write composite result to VGN}
    endfor
    Display rgbx,y array                        {DDN}
end



Figure 4-5. The general RGBZA compositing algorithm for a
fully balanced tree. Note that lower case
letters designate the product of intensity and
opacity.












     { C2i,              if Z2i < Z2i+1
Ci = { C2i+1,            if Z2i > Z2i+1                    (29)
     { (C2i + C2i+1)/2,  if Z2i = Z2i+1

at every pixel in the i+1 image arrays. The composite

pixel has either full coverage or no coverage. Therefore,

the opacity information is not needed. Also, the matte

information is implied by a depth value that is not the

maximum. The specialized RGBZ algorithm is presented in

Figure 4-6. As for Figure 4-5, the second loop within the

main loop is the actual network algorithm. This task

inputs a collection of pixel values for processing, ac-

cording to their respective depth relationship, to produce

a single surviving composite pixel value for output. A

mixture of both the specialized and the general forms of

the compositing algorithm for CPNs can compose a

compositing network. The OGNs can be specialized for

opaque objects without antialiasing and special effects.

These nodes would be assigned to the section of the tree

that contains the specialized CPNs. Also, OGNs that process

objects with antialiasing and special effects can be

assigned to the section of the tree that contains the

general CPNs. Configurations could include a mix of

general and specialized CPNs. The purpose of mixing CPNs

would be to reduce the system complexity, since the

specialized CPNs are of a simpler form than the general

CPNs.















Specialized RGBZ Compositing Algorithm
const
    n = total number of CPNs in tree
given
    Array RGBZi, where i=1,2,3,...,2n+1         {node registers}
    Array RGBZj,x,y, where j=1,2,3,...,n+1      {OGN memory}
    Array RGBx,y                                {VGN memory}
begin
    for each element (x,y) of RGBZj,x,y and RGBx,y do
        for i=n+1 to 2n+1 do                    {load OGN output registers}
            j = i - n
            Ri = Rj,x,y
            Gi = Gj,x,y
            Bi = Bj,x,y
            Zi = Zj,x,y
        endfor                                  {end load OGN output registers}
        for i=n downto 1 do                     {CPN compositing operation}
            if Z2i < Z2i+1 then
                Ri = R2i
                Gi = G2i
                Bi = B2i
                Zi = Z2i
            endif
            if Z2i > Z2i+1 then
                Ri = R2i+1
                Gi = G2i+1
                Bi = B2i+1
                Zi = Z2i+1
            endif
            if Z2i = Z2i+1 then
                Ri = (R2i + R2i+1)/2
                Gi = (G2i + G2i+1)/2
                Bi = (B2i + B2i+1)/2
                Zi = Z2i
            endif
        endfor                                  {end CPN compositing operation}
        RGBx,y = RGB1                           {write composite result to VGN}
    endfor
    Display RGBx,y array                        {DDN}
end





Figure 4-6. The specialized RGBZ compositing algorithm of
a fully balanced tree.













Compositing Processing Node

The purpose of a CPN is to perform pixel-by-pixel

compositing. It is a fundamental iterative hardware build-

ing block used to construct a CN tree. A generic CPN

configuration is depicted in Figure 4-7. The subscript "i"

is a node number that identifies a node within the system

tree, which consists of CPNs, OGNs, and a VGN. The CN is a

subset of the system tree that contains only CPNs. The

OGNs are the terminal nodes of the system tree. The VGN is

connected to the root node of the CN and is identified

through node number zero of the system tree. As shown, the

CPN structure maintains a 2-to-1 configuration. The data

inputs consist of two pixel values, P2i and P2i+1, which

can be routed to the CPN by either two preceding CPNs or by

two preceding OGNs. The data output consists of a single

pixel value, which can be routed to the input of a suc-

ceeding CPN or to the input of a video generation node.

The CPN input clock, CLK, is driven by a system clock that

synchronizes the internal CPN operation with the entire

system. This signal is provided by the video generation

node (VGN), which maintains the entire system timing and

control.

A pixel is represented by five independent variables:

RED, GREEN, BLUE, ALPHA, and Z. The tri-stimulus color or

intensity is represented by the values of RED, GREEN, and

BLUE. The ALPHA value represents the average opaqueness of












[Figure 4-7 shows CPNi with a pixel output routed to the next stage of the
system tree, two pixel inputs from the previous stage of the system tree,
and a system clock input supplied by the VGN.]

where,
    Pi    = {ri, gi, bi, Ai, Zi}
    P2i   = {r2i, g2i, b2i, A2i, Z2i}
    P2i+1 = {r2i+1, g2i+1, b2i+1, A2i+1, Z2i+1}

    CPNi is a CPN located in the CN at node position "i."

    CLK is the system timing input.


Figure 4-7.  An iterative building block depiction of a
             compositing processing node (CPN).













the pixel or the average light blocking characteristic of

the material that the pixel represents. The Z value repre-

sents the Z coordinate, in cartesian space, where the pixel

exists. The X, Y cartesian coordinates are implied as

identical for both input pixels, but may be different

within the same clock cycle for the single output pixel due

to the hardware pipeline approach.

Schemes for realizing the previously discussed RGBAZ

compositing algorithms are developed that are fast and

inexpensive to implement in hardware, but which produce

results of numerically high quality. These schemes honor

two considerations: machine and numerical considerations.

Machine considerations concern speed and cost of the physi-

cal device. Numerical considerations concern the closest

approximations to the exact numbers. The schemes attempt

to maintain a balance between both. Also, the effects of

roundoff error accumulation due to the feed-through opera-

tion of the binary tree of CPNs are taken into consider-

ation.

Finite precision fixed-point numbers are used in this

machine for representation of the pixel values. This

representation allows storage of pixel values within the

local image buffers of the OGNs and of the VGN to be inte-

gers, which simplifies the image buffer organization.

Also, the hardware complexity for realization of the com-

positing algorithms and of the video generation processing












algorithms is reduced, along with accommodation of faster

cycle times for an implementation. Therefore, the compos-

iting algorithms that are depicted in Figures 4-5 and 4-6

must be adjusted to accommodate the fixed-point repre-

sentation of a pixel value, which relates to the machine

precision of a number represented within and operated by a

CPN.

The tri-stimulus color variables RED, GREEN, and BLUE,

are usually represented in rendering algorithms as fixed-

point numbers. Therefore, they do not create an initial

problem. But, roundoff error amplification can occur due

to the repetitive modification of these values through the

compositing network. Therefore, to enhance the numeric

accuracy of the final result, the roundoff error has to be

controlled. Representation of the opacity value, ALPHA,

presents a similar problem, but differs slightly since its

initial value is defined as a fractional number. The depth

variable, Z, is usually represented within a rendering

algorithm as a floating-point number. The compositing

network handles the depth value as an integer and does not

modify its value; therefore, its floating-point value can be

rounded or truncated.

The algorithm modification and the CPN conceptual

hardware organization are discussed in the following

sections. Two CPN organizations are presented: the general

CPN and the specialized CPN. The combinatorial hardware












layout of the conceptual block diagrams consists of breaks

in the logic for registers, termed stages. This is done to

maintain a pipeline of partial operations, which enhances

performance. Maximum system performance is then achieved

by matching the clock cycle to the longest delay through

the slowest stage. The stage delay is calculated by

totaling the delay through the logic and conductors that

exist between the two registers of a stage. Also, the

stage with the longest delay becomes the bandwidth limiting

section.


General Compositing Processing Node

A general CPN performs pixel-by-pixel compositing of

various object types: opaque, transparent, and semi-trans-

parent. It also handles antialiasing and special effects,

such as fade-outs and fade-ins. The hardware organization

contains three distinct functional units: the depth compu-

tation unit, the opacity computation unit, and the color

computation unit. These functional units are discussed

with respect to finite fixed-point pixel value repre-

sentation. The algorithm and the conceptual hardware

organization of each unit is presented.

Depth computation unit

The depth computation unit discerns the foreground

pixel from the background pixel or identifies both as

foreground pixels. This unit functions according to the











algorithm presented in Figure 4-8, which is a subset of the

general RGBZA compositing algorithm of Figure 4-5. The

depth value, Z, is represented by a single z-bit integer,

where 0 ≤ Z ≤ (2^z - 1). Therefore, the floating point repre-

sentation of this value is initially truncated or rounded.

The task performed by this unit, as the algorithm indi-

cates, consists of 1) receiving a pair of depth values, 2)

performing a comparison of depth values, 3) providing

status information, and 4) outputting the smallest depth

value. Status information consists of the LESS bit and

EQUAL bit, which are used by the color computation unit.

The LESS bit, when set, indicates that the Z2i value is

smaller than the Z2i+1 value. The EQUAL bit, when set,

indicates that the Z2i value and Z2i+1 value are equal in

magnitude.

A block diagram of the CPN depth computation unit is

depicted in Figure 4-9. Stage 1 performs an initial load

of the incoming pair of depth values from the CPN intercon-

nect. Stage 2 performs a comparison of the two depth

values for status information and passes the two depth

values along with the status information. Stage 3 routes

the surviving depth value, which is the composite depth

value, to succeeding stages utilizing a 2:1 multiplexer

with the LESS status bit as a selector. The succeeding

stages are waiting stages that allow a final result to



























CPN Depth Computation Unit Algorithm


given
    literal Z2i                       {z-bit integer}
    literal Z2i+1                     {z-bit integer}

begin
    EQUAL = 0
    if Z2i < Z2i+1 then
        Zi = Z2i
        LESS = 1
    else
        LESS = 0
        Zi = Z2i+1
        if Z2i = Z2i+1 then
            EQUAL = 1
        endif
    endif
end                                   {result is a z-bit integer}


Figure 4-8. The algorithm performed by a general CPN depth
computation unit.








































[Figure 4-9 shows the staged logic of the depth computation unit: input
registers for Z2i and Z2i+1, a comparator producing the LESS and EQUAL
status bits, and a 2:1 multiplexer that routes the surviving depth value.
Registers are clocked at the image update rate.]

Figure 4-9.  Block diagram of a general CPN depth computa-
             tion unit.












occur simultaneously with the remaining CPN computation

unit results.
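For reference, the behavior of Figure 4-8 can be stated in a few lines of
C; the struct used here to bundle the outputs is an illustrative
assumption, not part of the hardware description.

    typedef struct { unsigned Z; int LESS, EQUAL; } DepthResult;

    static DepthResult depth_unit(unsigned Z2i, unsigned Z2i1)
    {
        DepthResult d = { 0, 0, 0 };
        if (Z2i < Z2i1) {             /* input 2i is the foreground pixel */
            d.Z = Z2i;
            d.LESS = 1;
        } else {
            d.Z = Z2i1;               /* survivor is the smaller (or equal) depth */
            if (Z2i == Z2i1)
                d.EQUAL = 1;
        }
        return d;
    }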

Opacity computation unit

The opacity computation unit produces a single com-

posite opacity value from two opacity values that are

provided as input. The opacity is defined as a positive

fractional value that ranges from zero to one. Each opac-

ity value is stored in the image buffers of this machine as

fixed-point binary numbers. Therefore, the opacity value,

A, is represented by a positive fixed-fractional value

given by


0 ≤ A/Amax ≤ 1                                             (30)

where A is a binary integer such that 0 ≤ A ≤ Amax, and

Amax is a constant that defines the range of opacity. The
local image buffers store the integer value, A, while the

fixed-fractional value is incorporated by the hardware.

Substituting the opacity representation of Equation 30 into

Equation 21 and collecting terms, gives


            (Amax - A2i)A2i+1
Ai = A2i + -------------------                             (31)
                  Amax

The division required in Equation 31 is eliminated by

defining Amax as 2^(m-1), where m is the number of bits in A.

This transforms the division operation to a shift opera-












tion. A trade-off occurs with this technique, since each

image buffer, within the OGNs, will require an extra bit-

plane and an extra signal line to represent the opacity

value for a particular range. Substituting the value of

Amax into Equation 31, gives


            (2^(m-1) - A2i)A2i+1
Ai = A2i + ----------------------                          (32)
                  2^(m-1)


In order to have a more accurate result, the hardware unit

represents each opacity value as a higher precision number,

which reduces the roundoff error accumulation through the

compositing network. This is shown by multiplying both

sides of Equation 32 with 2m and adjusting the product

term, which gives

                     (2^(2m-1) - 2^m A2i)(2^m A2i+1)
2^m Ai = 2^m A2i +  ---------------------------------      (33)
                                2^(2m-1)

Equation 33 shows that the opacity value can be handled as

a double precision number, if the opacity value is shifted

left by m bits and if the least-significant-half of the

word is padded with zeroes before entering the CN tree.

Therefore, the opacity computation with the opacity values

defined as double precision numbers is given by

            (2^(2m-1) - A2i)A2i+1
Ai = A2i + -----------------------                         (34)
                  2^(2m-1)












where the opacity value, Ai, is a binary integer such that

0 ≤ Ai ≤ 2^(2m-1).
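A fixed-point sketch of Equation 34, including the round-to-nearest
adjustment used by the hardware unit (Figure 4-10), is given below. The
choice m = 8 and the C integer widths are illustrative assumptions; the
stored m-bit opacity is assumed to have been shifted left by m bits and
zero padded, as described above, before entering the tree.

    #include <stdint.h>

    enum { M = 8 };                              /* bits of stored opacity     */
    #define AMAX  ((uint32_t)1 << (2*M - 1))     /* 2^(2m-1), double precision */

    static uint32_t opacity_unit(uint32_t A2i, uint32_t A2i1)
    {
        uint64_t prod = (uint64_t)(AMAX - A2i) * A2i1;  /* (2^(2m-1) - A2i)A2i+1 */
        uint32_t Ai   = A2i + (uint32_t)(prod >> (2*M - 1));
        if (prod & ((uint64_t)1 << (2*M - 2)))          /* round to nearest      */
            Ai += 1;
        return Ai;                                      /* 2m-bit composite opacity */
    }

Because division is replaced by a shift, the routine contains only a
subtraction, a multiplication, a shift, and an addition, mirroring the
stages of the hardware unit.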

The opacity computation unit functions according to

the algorithm presented in Figure 4-10, which follows the

developed relations. The task performed by this unit

consists of 1) receiving a pair of opacity values, 2)

performing an opacity compositing operation, and 3) output-

ting the composite opacity result.

A block diagram of the compositing computation unit

is depicted in Figure 4-11. Stage 1 performs an initial

load of the pair of opacity values from the CPN intercon-

nect. Stage 2 performs a subtraction operation and passes

the two opacity values along with the subtraction result.

Stage 3 performs a multiplication of the subtraction

result with the A2i+l opacity value and shifts the multi-

plication result right by 2m-2 bits (division). It also

passes the A2i opacity value along with the shifted multi-

plication result. Stage 4 sums the A2i value with the

shifted multiplication result and performs rounding, which

produces the composite opacity. Note that eliminating the

signal input to the carry bit and setting the carry bit to

0 will cause chopping of the multiplication result, instead

of rounding. The succeeding stages are waiting stages that

allow a final result to occur simultaneously with the

remaining CPN computation unit results.

























CPN Opacity Computation Unit Algorithm

const
    m = number of bits of initially stored opacity value

given
    literal A2i                       {2m-bit integer}
    literal A2i+1                     {2m-bit integer}

begin
    Ai = A2i + [(2^(2m-1) - A2i)A2i+1] shr (2m-1)

    if [(2^(2m-1) - A2i)A2i+1 AND 2^(2m-2)] = 2^(2m-2) then
        Ai = Ai + 1                   {roundoff error}
    endif
end                                   {result is a 2m-bit integer}



















Figure 4-10. The algorithm performed by a general CPN
opacity computation unit.












[Figure 4-11 shows the staged logic of the opacity computation unit: input
registers for A2i and A2i+1, a subtractor forming (2^(2m-1) - A2i), a
multiplier, a shifter, and an adder whose carry input performs the
rounding. Registers are clocked at the image update rate.]

Figure 4-11.  Block diagram of a general CPN opacity compu-
              tation unit.











Color computation unit

The color computation unit produces a composite color

value from two color values that are provided as input.

The color values are defined as the true tri-stimulus color

values multiplied by their respective opacity value. Each

primary color (intensity) value, c, is stored in the

machine's image buffers as an n-bit integer, where 0 < c <

2n-l. But, this hardware unit handles the opacity and the

color values as higher precision numbers (2m and m+n-1 bits

respectively) in order to reduce the roundoff error

accumulation through the compositing network.

The composite color operation for the Z2i < Z2i+1

condition is developed by substituting the higher preci-

sion representations of both the intensity and the opacity

values into Equation 23, which gives


           (2^(2m-1) - A2i)c2i+1
ci = c2i + -----------------------                         (35)
                 2^(2m-1)


where each primary color value, c, is defined as an n+m-1

bit value within the CN tree. Therefore, each n-bit prima-

ry color value requires multiplication by the m-bit opacity

value before entering the CN tree. When exiting the CN

tree, each color value requires shifting right by m-1 bits

with rounding or chopping to provide an n-bit result.

The Z2i > Z2i+1 condition is obtained through a simi-

lar development as above, but with the use of Equation 25.












It is given in final form by

             (2^(2m-1) - A2i+1)c2i
ci = c2i+1 + -----------------------                       (36)
                   2^(2m-1)



The Z2i = Z2i+1 condition is also obtained through a

similar development as above, but with the use of Equation

27. It is given in final form by

             (2^(2m-1) - A2i+1/2)c2i     (A2i/2)c2i+1
ci = c2i+1 + ------------------------ - ---------------    (37)
                     2^(2m-1)               2^(2m-1)


Note that the first two terms of Equation 37 are similar to

Equation 36. This reduces the hardware requirement for a

realization.
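The equal-depth branch can be sketched in fixed point under the same
assumptions as the opacity sketch (m = 8 bits of stored opacity, n = 8
bits of stored intensity, opacities held as 2m-bit values and colors as
(m+n-1)-bit opacity-weighted values); the function name and integer widths
are illustrative, and the shift-and-round treatment follows Figure 4-12.

    #include <stdint.h>

    enum { M = 8, N = 8 };
    #define AMAX   ((uint32_t)1 << (2*M - 1))    /* 2^(2m-1)               */
    #define RBIT   ((uint64_t)1 << (2*M - 2))    /* rounding bit, 2^(2m-2) */

    static uint32_t color_equal_depth(uint32_t c2i, uint32_t c2i1,
                                      uint32_t A2i, uint32_t A2i1)
    {
        uint64_t t1 = (uint64_t)(AMAX - (A2i1 >> 1)) * c2i;  /* added term      */
        uint64_t t2 = (uint64_t)(A2i >> 1) * c2i1;           /* subtracted term */
        uint32_t ci = c2i1 + (uint32_t)(t1 >> (2*M - 1))
                           - (uint32_t)(t2 >> (2*M - 1));
        if (t1 & RBIT) ci += 1;                              /* round added     */
        if (t2 & RBIT) ci -= 1;                              /* round subtracted*/
        return ci;                                           /* m+n-1 bit result */
    }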

The color computation unit functions according to the

algorithm presented in Figure 4-12, which follows the

developed relations, where the color triad is represented

by c. The task performed by this unit consists of 1)

receiving a pair of color triads, a pair of opacity values,

and status information, 2) performing a compositing opera-

tion according to the depth comparison, and 3) outputting

the result.

A block diagram of the color computation unit is

depicted in Figure 4-13. Stage 1 performs an initial load

of the incoming pair of color values from the CPN intercon-

nect. Stage 2 is a waiting stage for the status result of

the depth computation unit. Stage 3 routes the color and













CPN Color Computation Unit Algorithm

const
    n = number of bits of initially stored intensity value
    m = number of bits of initially stored opacity value

given
    literal c2i                 {m+n-1 bit integer: red, green, or blue}
    literal c2i+1               {m+n-1 bit integer: red, green, or blue}
    literal A2i                 {2m bit integer}
    literal A2i+1               {2m bit integer}
    literal LESS                {1 bit}
    literal EQUAL               {1 bit}

begin
    if LESS = 1 then                                    {Z2i < Z2i+1}
        ci = c2i + [(2^(2m-1) - A2i)c2i+1] shr (2m-1)
        if [(2^(2m-1) - A2i)c2i+1 AND 2^(2m-2)] = 2^(2m-2) then
            ci = ci + 1                                 {roundoff error}
        endif
    else
        if EQUAL = 1 then                               {Z2i = Z2i+1}
            ci = c2i+1 + [(2^(2m-1) - (A2i+1 shr 1))c2i] shr (2m-1)
                       - [(A2i shr 1)c2i+1] shr (2m-1)
            if [(2^(2m-1) - (A2i+1 shr 1))c2i AND 2^(2m-2)] = 2^(2m-2) then
                ci = ci + 1                             {roundoff error}
            endif
            if [(A2i shr 1)c2i+1 AND 2^(2m-2)] = 2^(2m-2) then
                ci = ci - 1                             {roundoff error}
            endif
        else                                            {Z2i > Z2i+1}
            ci = c2i+1 + [(2^(2m-1) - A2i+1)c2i] shr (2m-1)
            if [(2^(2m-1) - A2i+1)c2i AND 2^(2m-2)] = 2^(2m-2) then
                ci = ci + 1                             {roundoff error}
            endif
        endif
    endif
end                                         {result is an m+n-1 bit integer}






Figure 4-12. The algorithm performed by a general CPN
color computation unit.
















[Figure 4-13 shows the staged logic of the color computation unit: input
registers for the color and opacity values, multiplexers controlled by the
LESS and EQUAL status bits that route values by depth priority, a
subtractor that forms the transparency term, multipliers and shifters, and
adder stages with carry-input rounding, followed by a final subtractor
that produces the composite color value. Registers are clocked at the
image update rate.]

Figure 4-13.  Block diagram of a general CPN color compu-
              tation unit.














































































































opacity values according to their depth priority. It also

shifts the routed opacity value right by 1, if the two

depth values are equal. Included is the multiplication of

the c2i+l color value with the halved A2i opacity value.

Also, the multiplication result is shifted right by 2m-2

bits (division). Stage 4 performs the opacity subtraction

operation and rounds the shifted multiplication result of

stage 3. It also passes the routed color values, the

EQUAL status bit, and the subtraction result (transpar-

ency). Stage 5 performs a multiplication of the subtrac-

tion result (transparency) with a routed color value

and passes a routed color value. Included is shifting the

multiplication result by 2m-2 bits (division). It also

routes the rounded value of stage 4, if the two depth

values are equal, but routes all zeroes, if the two depth

values are not equal. Stage 6 sums the shifted multipli-

cation result with the passed color value along with

rounding and passes the multiplexer result of stage 5.

Stage 7 subtracts the multiplexer result from the addition

result of stage 6, which produces the composite color

value.


Specialized Compositing Processing Node

A specialized CPN performs pixel-by-pixel compositing

of opaque objects without antialiasing or special effects.

The hardware organization contains two distinct functional












units: the depth computation unit and the color computation

unit. An opacity computation unit is unnecessary, since

this specialized CPN does not incorporate antialiasing,

semi-transparency, transparency, and special effects.

These two computation units are discussed with respect to

finite fixed-point pixel value representation. The algo-

rithm and the conceptual hardware organization of each unit

is presented.

Depth computation unit

The depth computation unit discerns the foreground

pixel from the background pixel or identifies both as

foreground pixels. This unit conceptually functions iden-

tically to the depth computation unit of the general CPN.

Therefore, it functions according to the algorithm pre-

sented in Figure 4-8. As for the general CPN, the depth

value, Z, is represented by a single z-bit integer, where 0

≤ Z ≤ (2^z - 1). Therefore, the floating point repre-

sentation of this value is initially truncated or rounded.

The algorithm discussion can be found in the general CPN

section.

The block diagram of the depth computation unit is

identical to the unit of the general CPN. Therefore, its

depiction is shown in Figure 4-9. The hardware discussion

can be found in the general CPN section.











Color computation unit

The color computation unit produces a single composite

color value from the pixel values that are provided as

input. The color value, C, is defined as the true tri-

stimulus color values. This machine stores each primary

color (intensity) value in its image buffers as an n-bit

integer, where 0 < C < 2n-1. The color value is passed as

an n-bit integer in the hardware; therefore no shifting is

necessary before entering a specialized CPN. The unit

functions according to the algorithm presented in Figure 4-

14, which follows the relations developed in Equation 29,

where the color triad is represented by C, and each color

is an n-bit integer. The task performed by this unit

consists of 1) receiving a pair of color triad values and

status information, 2) performing a composite operation of

the color triad according to the depth comparison, and 3)

outputting the result.

A block diagram of the color computation unit is

depicted in Figure 4-15. Stage 1 performs an initial load

of the incoming pair of color values from the CPN intercon-

nect. Stage 2 sums both color values along with shifting

the result right by one. It is also a waiting stage for

the status result of the depth computation unit. Stage 3

routes a color triad according to its depth priority

utilizing a 4:1 multiplexer with both the LESS and EQUAL

status bits as selectors. The succeeding stages are



























CPN Color Computation Unit Algorithm


given
    literal C2i                       {n-bit integer}
    literal C2i+1                     {n-bit integer}
    literal LESS                      {1 bit}
    literal EQUAL                     {1 bit}

begin
    if LESS = 1 then                  {Z2i < Z2i+1}
        Ci = C2i
    else
        if EQUAL = 1 then             {Z2i = Z2i+1}
            Ci = (C2i + C2i+1) shr 1
        else                          {Z2i > Z2i+1}
            Ci = C2i+1
        endif
    endif
end                                   {result is an n-bit integer}

Figure 4-14. The algorithm performed by a specialized CPN
color computation unit.


































[Figure 4-15 shows the staged logic of the specialized CPN color
computation unit: input registers for the two color triads, an adder and
shifter that form the averaged color, and a 4:1 multiplexer selected by
the LESS and EQUAL status bits. Registers are clocked at the image update
rate.]

Figure 4-15.  Block diagram of a specialized CPN color
              computation unit.









































































































waiting stages that allow a final result to occur simulta-

neously with the general CPNs. The result is a point

sampled composite color.
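The routing just described can be stated compactly in C; the assumption of
n = 8 bits per primary and the function name are illustrative.

    #include <stdint.h>

    static uint8_t specialized_color(uint8_t C2i, uint8_t C2i1,
                                     int LESS, int EQUAL)
    {
        if (LESS)  return C2i;                      /* Z2i < Z2i+1          */
        if (EQUAL) return (uint8_t)(((unsigned)C2i + C2i1) >> 1);  /* average */
        return C2i1;                                /* Z2i > Z2i+1          */
    }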


Analysis

The compositing network analysis examines two areas:

complexity and performance. Complexity is estimated for

discrete construction, for VLSI fabrication, and for gate-

array construction. Performance is examined with respect

to image space resolution, CPN processing speed, and

compositing network tree depth.


Complexity

The compositing network complexity is a function of

both the CPN complexity and the quantity of CPNs config-

uring a network. CPN complexity is measured utilizing two

metrics: gate count estimate and I/O signal pin count

estimate. The gate count estimate is determined through

partitioning the CPN conceptual hardware organization into

individual functional logic blocks, which are off-the-shelf

SSI, MSI, and LSI components. Then, the estimated gate

count of each functional logic block is determined and

totaled to provide a gate count estimate of a CPN. This

technique provides an estimate of expected complexity for

integrated circuit fabrication. It also provides an

estimate of board-level complexity for an off-the-shelf

integrated circuit implementation, which is determined











through totaling the functional logic block package types

used. It should be noted that performance is usually

enhanced for a realization by judicious use of additional

gating, which may alter the estimated gate count.

The partitioning of the design into functional logic

blocks allows the examination of implementation trade-offs

that are offered between different technologies. It

reduces the organization to a logic format that can be

matched to the logic resources of a target device. The

hardware synthesis can be individually optimized around

each vendor's library and design rule guidelines for a VLSI

or a gate-array realization.

The functional block equivalent is listed in Table 4-

1, the I/O pin count is listed in Table 4-2, and the gate

equivalents of various standard-size functional logic blocks

are listed in Table 4-3. The functional block equivalent

and I/O pin counts were compiled from the conceptual

hardware organizations. The standard logic device gate

counts were estimated by counting the gates within the

functional block diagrams given in TTL data books [FAI84,

SIG84]. The gate count of the 16-bit multiplier was esti-

mated by considering it to be a full adder tree without

input and output registers [KUC78]. This was done since

the input and output registers are taken into account when

estimating gate equivalences of the staging registers.

These gate counts are expected to be close to an upper-























Table 4-1. Functional logic block equivalent of the
general CPN and the specialized CPN.

Logic General CPN Specialized CPN
Function DCU OCU CCU DCU CCU

Comparator (z-bit) 1 0 0 1 0

Adder (m-bit) 0 4 8 0 0

Adder (n-bit) 0 0 6 0 3

2:1 Multiplexer z 0 6m+6n-6 z 0

4:1 Multiplexer 0 0 5m+3n-3 0 3n

D Flip-Flop 9z+2 22m+l 52m+48n-34 9z+2 30n

Multiplier (2mX2m) 0 1 0 0 0

Multiplier 0 0 6 0 0
((m+n-l)X2m)

Note: Inverters are not included, since inversion can be
produced through flip-flop output selection.












Table 4-2. Pin requirement for the general CPN, the
           specialized CPN, and each CPN computation unit.

Signal               General CPN                    Specialized CPN
Name          DCU   OCU    CCU       GCPN          DCU    CCU    SCPN

RED2i          0     0    m+n-1     m+n-1           0      n      n
GREEN2i        0     0    m+n-1     m+n-1           0      n      n
BLUE2i         0     0    m+n-1     m+n-1           0      n      n
Z2i            z     0      0         z             z      0      z
ALPHA2i        0    2m     2m        2m             0      0      0
RED2i+1        0     0    m+n-1     m+n-1           0      n      n
GREEN2i+1      0     0    m+n-1     m+n-1           0      n      n
BLUE2i+1       0     0    m+n-1     m+n-1           0      n      n
Z2i+1          z     0      0         z             z      0      z
ALPHA2i+1      0    2m     2m        2m             0      0      0
REDi           0     0    m+n-1     m+n-1           0      n      n
GREENi         0     0    m+n-1     m+n-1           0      n      n
BLUEi          0     0    m+n-1     m+n-1           0      n      n
Zi             z     0      0         z             z      0      z
ALPHAi         0    2m      0        2m             0      0      0
CLK            1     1      1         1             1      1      1
LESS           1     0      1         0             1      1      0
EQUAL          1     0      1         0             1      1      0
POWER          1     1      1         1             1      1      1
GROUND         1     1      1         1             1      1      1

total pins   3z+5  6m+3  9n+13m-4  9n+15m+3z-6    3z+5   9n+5   9n+3z+3




















Table 4-3. Gate equivalent and package pin count of
various functional logic blocks.

Logic Package Gate
Function Pins Equivalent

4-bit Magnitude Comparator (74F85) 16 31

4-bit Binary Full Adder (74F283) 16 36

Quad 2:1 Multiplexer (74F157) 16 15

Dual 4:1 Multiplexer (74F153) 16 16

Octal D-Type Flip-Flop (74F273) 20 48

16x16 Bit Multiplier (29517A) 64 4320*

* This approximate number excludes the input and output
registers, which would account for about 288 additional
gate equivalent units.












bound. The total gate count for a specific design is

estimated by first determining the CPN parameter values m,

n, and z and then applying the tables. The total CPN gate

count can be estimated for a

different set of functional blocks by changing Table 4-3,

followed by performing the suggested procedure.

A graphics system is defined to exemplify the sug-

gested technique for estimation of CPN complexity. The

graphics device has three criteria: it will be a full-color

system, it will have better than one-percent incremental

change in opacity, and it will have a high-precision depth

resolution. A full-color device requires the tri-stimulus

colors to provide 16.7 million simultaneous colors, which

is about at the human visual perception limits [ROG85].

Therefore, the required value of n is 8, which provides 8

bits for each primary color: RED, GREEN, and BLUE. The

resolution of opacity that would provide better than a

one-percent incremental change requires m to equal 8, since

its range would be 127 (0 < A < 2m-1). The value of z is

selected as 24, since the depth resolution of 24-bits is

satisfactory for high-end graphics devices. Relating these

selected parameter values of m, n, and z with Tables 4-1

through Table 4-3 produces the specified system estimated

complexity, which is presented in Table 4-4.

The general CPN has a complexity of about 12 times

that of the specialized CPN. Therefore, a compositing























Table 4-4. Estimated complexity of the general CPN, the
specialized CPN, and each CPN computation unit.

Type of General CPN Specialized CPN
Count DCU OCU CCU GCPN DCU CCU SCPN

Pins 77 51 172 258 77 77 147

Gates 1585 5670 33362 40617 1585 1848 3433

Packages 40 32 251 323 40 48 88
16-Pin 12 8 149 169 12 18 30
20-Pin 28 23 96 147 28 30 58
64-Pin 0 1 6 7 0 0 0

Note: The CPN parameters for m, n, and z, are 8, 8, and 24.












network that can utilize a mixture of both the general and

the specialized CPNs would be the most efficient configura-

tion. Table 4-4 indicates that a board level CPN imple-

mentation would have a reasonable package count for the

general CPN and a very reasonable package count for the

specialized CPN. This indicates that a CPN implemented

using off-the-shelf parts is within bounds. At the time of

this writing, a 16,000-gate bipolar ECL/TTL array with 100-

ps delays and 292 input/output cells was available [COL88].

Chip densities of HCMOS arrays are as high as 237,000

gates, with 400-ps switching delays [BUR88]. Therefore,

the CPN gate counts and pin counts are within bounds for a

single chip VLSI implementation or a single-chip gate-array

implementation.


Performance

The CPN performs pixel-by-pixel processing that is

independent of scene complexity. Therefore, its pro-

cessing-time is a function of both the image space reso-

lution and the image update rate, which is given by


                                    1
Processing-Time = ---------------------------------        (38)
                   (Image Update Rate)(Resolution)


The image update rate is considered real-time at 10 frames

per second, since images sequenced at this rate appear to

have a smooth visual flow. The image space resolution is

defined as the total number of visible pixels. The CPN













processing-time for various image space resolutions is

presented in Table 4-5. As shown, to double the final

resolution while maintaining the same level of performance,

the speed of the CPNs must be increased by a factor of

four.
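As a check on Equation 38 and Table 4-5, a few lines of C reproduce the
listed processing times at the 10 frame-per-second update rate; the
program itself is, of course, only illustrative.

    #include <stdio.h>

    int main(void)
    {
        const double rate = 10.0;                   /* image updates per second */
        const long res[][2] = { {640,480}, {1280,960}, {1280,1024},
                                {1600,1280}, {2048,2048} };
        for (int i = 0; i < 5; i++) {
            double pixels = (double)res[i][0] * res[i][1];
            double t_ns = 1.0e9 / (rate * pixels);  /* Equation 38, in ns */
            printf("%ld x %ld : %.1f ns\n", res[i][0], res[i][1], t_ns);
        }
        return 0;
    }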

The CPNs of the compositing network operate in lock-

step. When the tree structure is maximally unbalanced so

that each node has a left descendant but none has a right

one, the compositing network degenerates into a linear

pipeline. This implies that each stage of the linear

pipeline must perform its function within the CPN pro-

cessing-time to successfully composite a collection of

pixels. Therefore, the slowest stage in the pipeline is

what determines the peak performance of the compositing

network. This is the multiplication stage for the general

CPN, which implies that the multiplier parts are what

determine the compositing network performance if any CPN is

of the general type. In contrast, the comparator stage

determines the compositing network performance if all CPNs

are of the specialized type.

Consider the example of the constraints section, where

all of the CPNs are of the general type. The 16-bit

multiplier, which had been specified, maintains a 45-ns

multiply time (including set-up time) [ADV85]. This part

has internal input and output registers, therefore the

multiply time can be considered the total pipeline stage



















Table 4-5. CPN processing-time for various image space
resolutions. The image update rate is 10
frames per second.

Image Space Resolution CPN Processing-Time
(pixels) (ns)

640 X 480 325.5

1280 X 960 81.4

1280 X 1024 76.3

1600 X 1280 48.8

2048 X 2048 23.8













time. Therefore, the compositing network would have a

maximum bandwidth of 22.2 million results per second. From

Table 4-5, all but the last entry could be supported with a

single compositing network. The last entry could be sup-

ported if two CN's were used, where each CN would be

dedicated to a separate half of the image array, while

operating at half the image update rate.

The computational performance of a compositing network

that is configured with all general CPNs is measured

through calculating the total number of additions and

multiplications that every general CPN performs per unit

time. A general CPN performs, as a lower bound (all Z's

not equal), eight additions and four multiplications. As

an upper bound (all Z's equal), a general CPN performs

eleven additions and seven multiplications. Therefore, the

range of computational performance for a compositing net-

work configured with all general CPNs is given by


8(CPNs)(BW) < additions/s < 11(CPNs)(BW) (39)

4(CPNs)(BW) < multiplications/s < 7(CPNs)(BW) (40)

12(CPNs)(BW) < operations/s < 18(CPNs)(BW) (41)


where BW refers to the general CPN bandwidth or general CPN

results per second. The CPNs refer to the total number of

compositing processing nodes that comprise a compositing

network.











For example, consider an augmentable system archi-

tecture configured with a three level CN tree with all

general CPNs and a 1600 X 1280 resolution display device

node. It will maintain a CN processing performance of

between 1720-MOPS and 2580-MOPS (million operations per

second). This throughput is what supercomputers provide,

which demonstrates the potential of distributed simulta-

neous calculations.
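As a check on these figures: a fully balanced three level tree contains 7
CPNs, and the 1600 X 1280 resolution at 10 frames per second requires one
result every 48.8 ns, or about 20.5 million results per second per CPN.
Equation 41 then bounds the throughput by 12 x 7 x 20.5M ≈ 1720 MOPS and
18 x 7 x 20.5M ≈ 2580 MOPS.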

The performance of a CN configured with all special-

ized CPNs is measured through the total bandwidth of the

CN, which is equal to the bandwidth of a unitary special-

ized CPN. This metric is used since specialized CPNs

primarily route data, as opposed to performing a computa-

tion with regards to the data. If all depth values are

equal, each specialized CPN will perform one addition.

This will provide an additions per second rate that is

computed through the product of the number of CPNs

configured and the CPN bandwidth. The performance limiting

stage of a specialized CPN is its comparison stage, but

depending on word size it could be the addition stage

instead.

Consider the example presented in the constraints

section, but where all of the CPNs are of the specialized

type. The comparison stage, utilizing the components

specified in Table 4-3, maintains a 42-ns propagation delay

from clock to output. Therefore, the system would have a



